Thursday, February 21, 2019

Software Optimizations

Most software can execute faster.

There are many ways that software can be optimized to improve its performance. Most of these techniques are well-understood, but they still need to be used with caution, in that they can accidentally harm other attributes of the system.

The most obvious way to speed up code is to no longer do useless work.

One common form of wasted effort is to redundantly copy the same data to many different areas of memory. Another is to parse the data into smaller pieces and then reassemble it later or vice versa. Removing these from code should get it closer to the minimal effort, but not necessarily the minimum.

Sometimes, however, extra copies of data are necessary for security, locking or architectural reasons. These types of redundancies can stay, if they are justified. For sanity reasons, most incoming data for a large system should be fully parsed as soon as possible. Exporting this data may legitimately require reassembling it. This too is fine.

Switching to a better algorithm, quite often, can afford very large time savings. There is a huge amount of knowledge available about algorithms and their performance attributes. Significant research is always required.

Sometimes shifting between time and space works really well. We can rebalance the code to shift this resource usage. In some cases though, there are natural boundaries for reductions, so the embedded information doesn’t exist to optimize the code.

Algorithmic optimizations are often the most effective, but they require a great deal of knowledge and often a lot of time to implement properly.

Beyond that, memoization which is the reuse of earlier computations can produce decent optimizations. Caching is the most famous of these, but care must be taken to distinguish between read-only and write-through caching. They are very different from each other and frequently confused. A bad implementation can cause weird bugs.

The big trick with memoization is not in saving the value, but rather in knowing ‘precisely’ when that value is no longer of any use. These types of optimizations require a strong understanding of usage and data frequency. Generalized solutions can help (and hurt), but specific solutions will always produce better results.

An example of this is compression. Data can be taken down close to its information theoretic minimum, beyond that some data is lost (which can also be acceptable). The act of reducing the size of the data is accomplished by utilizing these redundancies. This is also a classic time vs space tradeoff.

Parallelizing computation is another strong form of optimization. To make it work on interconnected data usually requires synchronization primitives like locking. Locking can be coarse-grained or fine-grain, with the latter usually providing better performance at the cost of more management overhead. Locking gets extraordinarily challenging when it occurs outside of a single process.

Locking algorithms spread across different computers are bounded by TGP (two generals problem) which in itself influences impossibility results like CAP and FLP. Generally, this is caused by an inherent ambiguity (missing information) within the communication between the separated computations (getting worse as the underlying reliability of the communication weakens). This sometimes described as transactional integrity as well.

In general, we can optimize code by seeking out data independence. If for a given computation, there is some dependence for the result on some other piece of data, then that relationship bounds the minimum amount of work. All outputs must be produced from some finite set of inputs. This is true for all computations. As there is a precise minimum for information, there also exists one for computation.

Optimization attempts then can start by observing for a given context that there will never be ties between any specific set of variables and using that information to reorder the work involved to get closer to the minimum. That is, we can conjecture that for any specific output, there are a finite number of both computations and data that form a minimum directed acyclic graph (DAG) with all inputs as leaves. Then there should exist a minimal such DAG (relative to the computational primitives). This can be applied mechanically to any set of instructions, for a given set of data, as it is bounded by a specific context. Fill in these unknowns and the minimal set of work is explicit.

Some algorithmic optimizations are tricky in that they would require currently unknown relationships to exist in order to find the actual minimum effort. We can, however, come close to the minimum, even if we can’t get there yet.

Most other optimizations are easier, in that they really come from understanding the data, its usage and the underlying functioning of the computers themselves (sometimes optimizations at one level exist to counterbalance bad optimizations at a lower level).

Most code is written as the ‘obvious first try’, so most of the time there is plenty to optimize. However, most programmers do not fully understand the data or the context, which is why we warn younger coders to not prematurely attempt to optimize. They do not have a full enough understanding yet to do it correctly and bad optimizations, by definition, will use more resources not less. Poor optimizations can impair readability or extendability. Only good optimizations will help.

No comments:

Post a Comment

Thanks for the Feedback!