Zebra Codes

Premature Optimisation is Necessary, not Evil

21st of October, 2023

According to the oft-repeated advice of Donald Knuth, “premature optimisation is the root of all evil.” In my opinion this is poor advice. Let me tell you why.

Conventional wisdom holds that you should optimize only at the end of the project, when the architecture is settled, the algorithms implemented, and the bugs fixed. Only then is the program stable enough to warrant optimization. You should use a profiler to identify the “hot loop”: the portion of the code where the majority of the time is being spent, and focus your optimization effort there.

This may have been good advice when it was written in 1974, but programs are vastly more complex now, and that complexity comes at a cost.

You may notice that despite computers becoming ever faster, application performance tends to remain the same. If you were to attempt profile-guided optimization of, say, a word processor, then you would likely be disappointed. The application spends 90% of its time idle while waiting for user input, and the rest of the time is spent “nowhere in particular” – you will not find a smoking-gun “hot loop” to optimize, but thousands of tiny function calls each of which takes virtually no time at all.

Allow me to introduce you to another proverb:

“Look after the pennies, and the pounds will look after themselves.”

The reason the application performs slowly is death by a million cuts: a pointer indirection here, a virtual function there… things so trivial that they feel free on anything other than an antique CPU. That is an illusion, however, and they are not free. Any one of them (or all of them!) may cause a cache miss, wasting hundreds or thousands of cycles as code or data is fetched from memory, or the instruction pipeline is flushed and refilled. Multiply this by the many times it occurs and you start to see why your application is not as fast as it should be.
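
To make those cuts concrete, here is a minimal C++ sketch. The names are mine and the exact cost will vary with your CPU and allocator, but both functions compute the same sum; the second simply pays one pointer indirection per element:

```cpp
#include <memory>
#include <numeric>
#include <vector>

// Contiguous data: a linear sweep that the hardware prefetcher can
// stream from memory ahead of the loop.
long sum_flat(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0L);
}

// The same values, each behind its own heap allocation. Every element
// is now a pointer indirection, and each dereference is a potential
// cache miss costing hundreds of cycles.
long sum_indirect(const std::vector<std::unique_ptr<int>>& values) {
    long total = 0;
    for (const auto& p : values) total += *p;
    return total;
}
```

On typical hardware the indirect version can be several times slower once the data outgrows the cache, despite doing “the same work”.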

At this point many people will be thinking, “So what? It may not be instant but it’s still perfectly acceptable.” This is, frankly, a selfish attitude. You are asserting that it is OK to waste other people’s resources. Your application is not the only one running on the system. The user may not be able to afford a machine as powerful as your development box. You may be needlessly draining the battery on someone’s portable device. In short: the time is not yours to waste.

If you have made it this far without major disagreement then you are doubtless expecting me to provide the solution round about now. Unfortunately there is no easy fix: it is a question of mindset. Just as security is something to be considered at every juncture, not bolted on at the end, so must it be with performance. Question your data structures: does the O(1) retrieval from a hash map save you time, or do you actually hold only five items, in which case a linear scan over a vector would be quicker and avoid the cache misses? Do you really need another virtual function, or can the call be resolved at compile time? Could items be sorted so that you are not constantly flushing the instruction cache as you iterate over a heterogeneous list?
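
As a sketch of that first question, consider a flat lookup table. FlatLookup is an illustrative name of my own, and “five items” is an assumption; as always, measure your own workload:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// For a handful of entries, a flat vector searched linearly often beats
// a node-based map: the whole table fits in a cache line or two, while
// each map node is a separate heap allocation reached by pointer chasing.
class FlatLookup {
public:
    void insert(int key, int value) { entries_.emplace_back(key, value); }

    const int* find(int key) const {
        auto it = std::find_if(entries_.begin(), entries_.end(),
                               [key](const auto& e) { return e.first == key; });
        return it != entries_.end() ? &it->second : nullptr;
    }

private:
    std::vector<std::pair<int, int>> entries_;  // ~5 items, all contiguous
};
```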

Modern programming languages may have something to answer for here. Programmers are encouraged to write for an abstract machine presented by the compiler, but the truth of the matter is that your code will run on real hardware with real performance characteristics.

Object Orientated Programming in particular encourages this kind of overly optimistic thinking. The Liskov substitution principle tells you that you can iterate over a list of pointers to different types of object as long as they share the same interface; it neglects to mention that each iteration will likely cause a data cache miss when it fetches the object, an instruction cache miss when it fetches the appropriate virtual function, and a flush of the execution pipeline.
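
Here is the pattern in miniature; Shape, Circle, and Square are my own illustrative stand-ins:

```cpp
#include <memory>
#include <vector>

// The textbook object-orientated layout: heterogeneous objects behind
// pointers, dispatched through a vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    double radius;
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.141592653589793 * radius * radius; }
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Each step of this loop chases a pointer to wherever the object happens
// to live on the heap (data cache miss), then makes an indirect call
// through the vtable (branch misprediction, instruction cache miss).
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += s->area();
    return total;
}
```

One cheap mitigation is the sorting trick mentioned earlier: group the container by concrete type, so the branch predictor and instruction cache see long runs of identical calls.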

One potential ray of hope is Data Oriented Design. Essentially this treats your application data as a database, holding similar data contiguously in memory. It is a radical departure from the usual object orientated approach, and one that I humbly suggest you look into. Being an entirely different way of structuring an application, it is not well suited to retrofitting into an existing project; it may, however, be worth considering when designing a fresh one.
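
To give a flavour of the idea, here is a sketch contrasting the two layouts. The particle system is my example, not a prescription, and real data-oriented designs go considerably further:

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: the conventional layout. A pass that only updates
// positions still drags velocities and health through the cache.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    int health;
};

// Struct-of-arrays: the data-oriented layout. Each field is its own
// contiguous array, so a pass over positions touches nothing else.
struct Particles {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<int> health;
};

void integrate(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```

Because integrate touches only the position and velocity arrays, every byte the CPU pulls into the cache is a byte it actually needs.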

In conclusion I would like to reiterate that performance, like security, is something that should be considered at every stage of development and not relegated to a check-list item at the end. Consider the real hardware on which your application will run, not the compiler’s abstract fantasy, and design accordingly. It is hard work, but your users and I will thank you.