Programming rules to live by
Preface
After the first two years of college, I was eager and excited to apply every fancy new data structure and optimization to all of my projects.
As part of the yearly technical fest, I was building out a simulation of a sports game. Naturally, I used all sorts of complex data structures, optimized every single array search and tried to tune the app to be as fast as possible.
While it was, to an extent, fast, I also learned valuable lessons while building it out - lessons that I will keep with me throughout my career. This post is about some of them.
Rob Pike’s programming rules
Rob Pike (part of the Go team) summarized this very well here. At first glance, some of his rules seemed counter-intuitive to me.
Fancy algorithms can be tuned to give really good performance. Personally, I have written custom two-pointer search scripts that helped me clear all the hard test cases on <generic-coding-platform>. But in practice, that kind of cleverness has caused me a lot of weird errors - and almost always only in production.
The rules aren’t complex to understand and I’d like to give some examples to explain why I’ve started following them religiously. As with a lot of things in life - your mileage may vary.
Rule 1. You can’t tell where a program is going to spend its time.
You can’t tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.
This is very, very important. In hindsight, quite a few of the bottlenecks I’ve seen while tuning apps for performance have happened in places where I didn’t bother to spend too much time.
Case in point - process X was performing some computation and writing the results to the database. I attempted to parallelize the computation with thread pools and all sorts of locks, and ensured clean error handling. However, none of this improved performance significantly.
What actually got the best performance out of the system was chunking the database writes into batches. It sounds very obvious when I say it now, but in the heat of the moment, things like this are easily missed.
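To make that concrete, here is a minimal sketch of batched writes using plain JDBC. The table, columns and record type are hypothetical; the point is that addBatch/executeBatch turn one round trip per row into one per chunk.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedWriter {
    // Hypothetical shape of the computed results.
    public record Result(int id, double value) {}

    // Write the results in chunks instead of one INSERT per row.
    public static void writeInBatches(Connection conn, List<Result> results, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO results (id, value) VALUES (?, ?)"; // hypothetical table
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (Result r : results) {
                ps.setInt(1, r.id());
                ps.setDouble(2, r.value());
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch(); // one round trip for the whole chunk
                }
            }
            ps.executeBatch(); // flush any remainder
        }
    }
}
```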
Rule 2. Measure.
Measure. Don’t tune for speed until you’ve measured, and even then don’t unless one part of the code overwhelms the rest.
This brings me to another curious case. A webpage had to be optimized to the smallest possible size for mobile users, and I spent hours switching the bundler and minifier from browserify to webpack, tree-shaking - the works. The main JavaScript file dropped to ~6KB, but we still had users complaining about bad UX.
The webpage itself was not very complex - a couple of dropdowns, an input bar and a button. We compressed all the API calls into one single API call, so not too many network interactions either. When we tested locally, the page rendered pretty fast (~3s) even under throttling.
The culprit - fonts. Locally, the fonts were cached and loaded in seconds. In practice, a ~100KB font file was always loaded from an external site so that we could show fancy icons.
We were able to get custom sprites from our UX folks, and users were now very happy with the performance.
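The broader habit, sketched in Java for consistency with the other examples here: put a crude timer around each suspect stage before optimizing any of them (a real profiler, or the browser's network panel in the webpage case, gives a fuller picture). The stage names and timings below are made up.

```java
public class MeasureFirst {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        loadFonts(); // stand-in for the stage we never suspected
        long t1 = System.nanoTime();
        renderPage(); // stand-in for the stage we kept optimizing
        long t2 = System.nanoTime();

        System.out.printf("loadFonts:  %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("renderPage: %d ms%n", (t2 - t1) / 1_000_000);
    }

    static void loadFonts() { sleep(800); } // simulated slow external fetch
    static void renderPage() { sleep(50); } // simulated fast local work

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```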
Rule 3. Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy.
(Even if n does get big, use Rule 2 first.)
There was a task where a list of strings was being transformed. We were using library X, which could perform the same task at twice the speed. The caveat - the library took some time to initialize.
For large inputs, which was a rare occurrence, the library initialization time didn’t matter too much. But since we used it for all inputs, we saw an increase in latency for small inputs as well.
The fix - simply check the size of the input before deciding to use library X. In practice, for smaller inputs, even brute-forcing the transforms and computations was much, much faster.
Again, this is not to say you should never use library X, but to show that all of these are just tools. At the end of the day, use the best tool for the job - and check whether your job can actually be split into two jobs (one for small inputs and one for large ones).
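A sketch of that split, with hypothetical names and a threshold you would pick by measuring (Rule 2), not by guessing:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SplitByInputSize {
    // Hypothetical cutoff; find the real one by measuring.
    private static final int THRESHOLD = 1_000;

    public static List<String> transform(List<String> input) {
        if (input.size() < THRESHOLD) {
            return bruteForce(input); // no setup cost, fast enough for small n
        }
        return withLibraryX(input); // pays initialization once, wins on large n
    }

    // Stand-in for the straightforward implementation.
    static List<String> bruteForce(List<String> input) {
        return input.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Stand-in for "library X": imagine an expensive one-time init here,
    // followed by a faster per-element path.
    static List<String> withLibraryX(List<String> input) {
        return input.stream().map(String::toUpperCase).collect(Collectors.toList());
    }
}
```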
Rule 4. Use simple algorithms as well as simple data structures.
Fancy algorithms are buggier than simple ones, and they’re much harder to implement. Use simple algorithms as well as simple data structures.
Most of the time, there are libraries to solve commonly occurring problems.
When I first started writing simple C++ scripts to solve algorithmic challenges in school, I wrote a method to dynamically grow an array by creating a new array and copying the contents over. I also created similar wrappers around stacks and queues.
Later on, when I found out about the STL, I felt like an idiot. Vectors were a breeze to use and handled growth far more efficiently than my naive copy-on-every-resize approach. In hindsight, it helped me understand how some of the internals work, but I would have saved a ton of time by just finding the right tool and the right algorithm.
This is also why I prefer using String.contains() over writing my own KMP implementation to check for a substring.
Note: this does not mean you should npm install isOdd to check whether an integer is odd. Moderation is key - factor in effort and complexity before making any decisions.
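The standard library already covers everything above - a minimal sketch in Java (the C++ equivalents would be std::vector, std::stack, std::queue and std::string::find):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Queue;

public class UseTheLibrary {
    public static void main(String[] args) {
        // Growable array: no hand-rolled resize-and-copy needed.
        List<Integer> scores = new ArrayList<>();
        scores.add(42);

        // Stack and queue wrappers already exist.
        Deque<String> stack = new ArrayDeque<>();
        stack.push("top");
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("first");

        // Substring search: no custom KMP required.
        System.out.println("the quick brown fox".contains("quick")); // true
    }
}
```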
Rule 5. Data dominates. Data structures, not algorithms, are central to programming.
Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
I’ve seen a situation where three parallel arrays were being maintained and continuously re-sorted to keep track of a “minimum”. Refactoring this into objects with a comparator ended up making the code cleaner, much more maintainable and faster (although not by much).
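A sketch of that refactor, with a hypothetical record standing in for whatever the three arrays held; a PriorityQueue keeps the minimum at the head without any re-sorting:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class MinTracker {
    // Hypothetical fields; imagine these were once three parallel arrays.
    public record Job(String name, int cost, long submittedAt) {}

    public static void main(String[] args) {
        PriorityQueue<Job> byCost = new PriorityQueue<>(Comparator.comparingInt(Job::cost));
        byCost.add(new Job("re-index", 7, 1L));
        byCost.add(new Job("compact", 2, 2L));
        byCost.add(new Job("backup", 5, 3L));

        // The cheapest job is always at the head - no sorting required.
        System.out.println(byCost.peek().name()); // compact
    }
}
```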
Always think thoroughly when deciding your schema or storage structure. In my experience, the implementations running off of these data storage solutions can be switched out with far more ease than actually migrating / transforming the data.
Fin.
I’ve only been working professionally as an engineer for around 3 years now. I’ve learned so much and I hope to learn much more.
Some additional learnings:
- KISS - Keep it simple, stupid. Make code easily readable and ensure large methods are appropriately refactored. This makes code easy to follow after a year, easy to test and much easier to maintain.
- Trust, but verify. Quite a few times, “issues” have turned out to be configuration that someone put in in a hurry. Verify until your trust is well founded.
- Search, learn and execute. Search online for best practices and see how best you can implement them.
- Testing is very important. Ensure that you test all pieces of software. Keeping functionality in small, testable components should be a top priority (see the sketch below). Never try to build a huge black box and then test only the black box.
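To make “small, testable components” concrete, here is a minimal sketch: a tiny pure method and a JUnit 5 test for it. The component and the numbers are hypothetical.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class PriceCalculatorTest {
    // A small, pure component: trivial to test in isolation.
    static int applyDiscount(int cents, int percent) {
        return cents - (cents * percent) / 100;
    }

    @Test
    void appliesPercentageDiscount() {
        assertEquals(900, applyDiscount(1000, 10));
        assertEquals(1000, applyDiscount(1000, 0));
    }
}
```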