The Programmer's Paradox: December 2017

Sunday, December 31, 2017

Visualizing Code

In its simplest sense, programming is about assembling a list of instructions for a computer to follow.

It has become somewhat more complicated lately because some of these instructions are quite high-level and the underlying details are somewhat convoluted. We like to use badly designed libraries and erratic frameworks to accomplish most of the tasks that we no longer feel obligated to work on ourselves.

That might seem to define programming as just utilizing the knowledge embedded into both the programming language and underlying dependencies. Learn enough of the details and off you go.

But there seems to be another dimension to programming.

In order to fully comprehend the behavior of the code, it helps a great deal if we can visualize the running code. To be able to ‘see’ it in our heads, somehow.

There are plenty of debuggers and techniques for embedding print statements, but these alone, for debugging, are often not sufficient for understanding really complex bugs. And they don’t help at all with design issues. They only show us what is there and step by step, how it runs.

The ability to ‘see’ the running code in our heads, however, allows us to be able to work through the possible corner-cases, instead of having to acquire all of our knowledge from the runtime. Programmers that can visualize their code, can quickly diff their understandings against the actual behavior. It’s this skill that amplifies some programmers to new levels.

The basic underlying act of programming is not particularly creative since generally the problems should be analyzed before the coding began. Most design aspects are fairly routine if the underlying technologies are understood.

But even on an organized and well-thought-out development project, being able to visualize the code before it is being built is a huge advantage. That is, to be able to see the code running in our heads, we need to imagine it in some quasi-tangible way, which needs to be simpler than the actual operation, but not so simple that it doesn’t reflect the reality of the code running. That, because we are human, gives us some internal representations that allow us to understand how we see the code operating versus how it is actually running.

That ability brings creativity back into the mix since it requires imagination, and as it is a slowly growing skill over time it explains why programmers can sometimes accidentally exceed their abilities. That is, someone who is quite comfortable debugging a problem in 5K code might suddenly flounder with a problem that spans 20K. It also explains why a software developer that can design a 50K system might completely fail to design one at 150K or larger. What happens is that they become unable to see these increases in complexity, and if the issues span across the full width of the problem then they are unlikely to make ‘reasonable’ decisions.

What seems true is that visualization size is a very slowly accumulating skill set. It takes plenty of time to keep increasing this ability, little by little. It doesn’t come right away, most people aren’t even aware of its existence and there is no magical way to just hyper-jump it into a broader ability. In that sense, although occasionally an interview question almost taps into it, we generally focus on the exact opposite during interviews. We ask specific ‘definition’ questions, not general visualization questions (a rare exception is Fermi estimations).

There have been plenty of anti 10x programmer movements, where we insist that no such beings exist, but that doesn’t seem to match experience.

If 10x programmers exist, since their ability to type in code is effectively bound then it would be exactly this ability to visualize 10x larger contexts that would separate them from the pack. That makes sense, in that they would be able to see where more of the code should be before they start coding and thus save a huge amount of time by not going down wrong paths. They would also be able to diagnose bugs a whole lot faster and make fewer nasty ones. Both of these account for more of our time than actually typing in the code, which while it comes in bursts is not what usually consumes most programming effort. Someone with stronger visualization skills would obviously be able to work through complicated programs a lot faster than someone with lesser abilities. And it wouldn’t be surprising that this ability rests on an exponential scale, given that increases in code size do as well. Small increases in visualization would give large advantages.

So, it does seem that there is a significant creativity aspect to programming, but it isn’t revolving around finding “creative” solutions when unneeded, rather it is about being able to visualize larger and larger solutions and bind them better to reality.

If we accept this, then it is clear that we need to respond by doing two things. The first is that we need a way of identifying this ability. This helps with interviewing, but it also helps in assigning development work to people. Problems that exceed programmers abilities will not likely get solved well. If we are realistic about the size of our problems, then we can do a better job of matching up the coders to them. We can also assign problems that help push a programmers ability a little bit farther each time. Just a bit beyond their current level, but not far enough that it is demoralizing.

The second thing we should do is find some training methods that help increase this ability. We’ve always focused on simple decomposition methods like data structures or objects, but we’ve never reoriented this approach with respect to how it might help a programmer actually grow their abilities. It’s just considered base useful knowledge. We just drown them with definitions. It also makes a huge amount of sense to learn more alternative means of decomposition and different programming paradigms. These help to strengthen our visualizations by giving us different ways of seeing the same things. Thus we need to revisit the way we are teaching programming with an intent to find new ways to expand this ability.

This issue with visualization may actually help explain both the software crisis and modern bloat. Older programmers always lament that the kids don’t pay attention to the details anymore, and this oddly has been a long-running sentiment for over half a century. That makes sense in that each new generation of tools took some of that cognitive effort away from the coders while exchanging it for the ability to work on larger systems. In this exchange, the underlying visualization requirements are similar but are transferred to different areas. Thus the younger programmers work at a higher level, but what they do spans way more underlying code. So the basic effort stays relatively constant, but for each older generation, it looks like the kids are focussing on the wrong issues.

Indirectly this also aligns with many of Bret Victor’s ideas about reducing the lag between working and results. It is a more explicit external approach to achieving the same goal. It is just that even with his advances in technology, our need to visualize will still supersede them because there will always be a need for some larger visualizations that are beyond the current abilities of hardware. It’s a never-ending problem of growing complexity and context.

The underlying mechanics of programming seem to be fairly straightforward, but we are too focused on the instructions themselves. As we try to build bigger systems, this complexity quickly builds up. In order to cope with it, we develop the skills to visualize the behavior. As we grow this secondary skill, we are able to cope with larger and larger programming issues. This is the ability that gives some programmers an edge over the others. People who can see more of the full solution are far more likely to actually solve a lot more of the problem with fewer resources and better quality.

Saturday, December 30, 2017

Immutability

All too often software development practices are driven by fads.

What starts out as a good idea gets watered down to achieve popularity and then is touted as a cure-all or silver bullet. It may initially sound appealing, but eventually, enough evidence accrues that it is, as expected, far from perfect. At that point the negatives can overwhelm the positives, resulting in a predictable backlash, so everyone rushes off to the next big thing. We’ve been through this cycle many, many times over the last few decades.

That is unfortunate, in that technologies, practices, idioms, ideas, etc. are always a mixture of both good and bad. Rather than continually searching for the next, new, easy, thoughtless, perfect approach we really should just accept that both sides of the coin exist, while really focusing on trying to enhance our work. That is, no approach is completely good or completely bad, just applicable or not to a given situation. New isn’t any better than old. Old isn’t intrinsically worse. Technologies and ideas should be judged on their own merits, not by what is considered popular.

It’s not about right or wrong, it is about finding a delicate balance that improves the underlying quality at a reasonable cost. Because in the end, it's what we’ve built that really matters, not the technologies or process we used.

Sometimes, a technique like immutability is really strong and will greatly enhance the code, making it easier and more likely to be correct. But it needs to be used in moderation. It will fix some problems, but it can easily make others far worse.

Recently, I’ve run into a bunch of situations where immutability was abused, but still, the coders were insisting that it was helping. Once people are committed, it isn’t unusual for them to turn a blind eye to the flaws; to stick to a bad path, even in the face of contradictory evidence.

The idea of immutability became popular from the family of functional programming languages. There are good reasons for this. In general, spaghetti code and globals cause unknown and unintentional side-effects that are brutal to track down. Bug fixing becomes long, arduous and dangerous. The core improvement is that if there are no side-effects if the scope is tightly controlled, it is far easier to reason about the behavior of subparts of the code. Thus, we can predict the impact of any change.

That means that the underlying code then can no longer be allowed to modify any of its incoming data. It's not just about global variables, it's also any internal data and the inputs to functions. To enforce immutability, once some data has been created, anywhere, it becomes untouchable. It can’t be changed. If the data is wrong later, then there should be only one place in the system where it was constructed, thus only one place in the system where it should be fixed. If a whole lot of data has that attribute, then if it is buggy, you can reason quickly about how to repair it with a great deal of confidence that it won’t cascade into other bugs.

While that, for specific types of code, is really great and really strong, it is not, as some believe, a general attribute. The most obvious flaw is that there could be dozens of scattered locations were the immutable data is possibly created, thus diminishing its scope benefits. You might still have to spend a lot of time tracking down where a specific instance was created incorrectly. You’ve protected the downstream effects but still have an upstream problem.

So, immutability does not magically infer better programmatic qualities. But it can be far worse. There are rather obvious places in a system where it makes the code slower, more resource intensive, more complex and way less readable. There is a much darker side to immutability as well.

Performance is the obvious first issue.

It was quite notable in Java that its immutable strings were severely hurting its performance. Code often has to do considerable string fiddling and quite often this is distributed in many locations throughout the program as the final string gets built up by many different computations.

Minimizing this fiddling is a great way to get a micro-boost in speed, but no matter how clever, all of that fiddling never really goes away. Quite obviously, if strings are immutable, then for each fiddle you temporarily double the string, which is both memory usage and CPU time spent copying and cleaning up.

Thus pretty quickly the Java language descended into having special string buffers which were mutable and later an optimization to hide them when compiling. The takeaway from this is that setting things that need to change ‘frequently’ to be immutable is extraordinarily costly. For data that doesn’t change often, it is fine. For data that changes a lot, it is a bad idea if there are any time or resource constraints.

Performance isn’t nearly as critical as it used to be, but it still matters enough that resources shouldn’t just be pissed away. Most systems, most of the time, could benefit from better performance.

In a similar manner to the Java strings, there was a project that decided that all of the incoming read-only data it was collecting should be stored immutability in the database. It was not events, but just fairly simple static data that was changeable externally.

The belief was that this should provide both safety and a ‘history’ of any incoming changes, so basically they believed it was a means of making the ETL easier (only using inserts) and getting better auditing (keeping every change).

In general, strong decompositions of an underlying problem make them easier and safer to reason about the behavior and tend towards fewer side-effects, but intermixing two distinctly different problems causes unintentional complexity and makes it harder to work through the corner-cases. Decomposition is about separating the pieces, not combining them. We’ve known that for a long time, in that a tangle of badly nested if statements are usually described as spaghetti code. It is hard to reason about and expensive to fix. They need to be separated to be usable.

So it’s not surprising then that this persistent immutable collection of data required overly complex gymnastics in the query language just to be read. Each select from the database needs to pick through a much larger set of different but essentially redundant records, each time the data is accessed. The extra storage is huge (since there are extra management costs like secondary indices), the performance is really awful and the semantics of querying the data are convoluted and hopelessly unreadable. They could have added another stage to copy that data into a second schema, but besides being resource intensive and duplicating a lot of data, it would also add in a lag time to the processing. Hardly seems better.

If the data was nicely arranged in at least a 3rd NF schema and the data flow was idempotent (correctly using inserts or updates as needed) then simple logging or an audit log would have easily solved that secondary tracking problem with the extra benefit that it might be finite for a far shorter timeframe then the rest of the data. That easily meets all of the constraints and is a far more manageable approach to solving both problems. Immutability, in this case, is going against the grain of far older and more elegant means of dealing with the requirements.

Since immutability is often sold as a general safety mechanism, it is not uncommon for people to leverage that into misunderstandings as well. So, for instance, there was one example of a complex immutable data structure but it spanned two completely different contexts. Memory and persistence. The programmers assumed that the benefits of immutability in one context automatically extended to the other.

A small subset of the immutable structure was in memory with the full version persisting. This is intrinsically safe. You can build up new structures on top of this in memory and then persist later when completed.

The flaw is to assume however that any and all ‘operations’ on this arrangement would preserve that safety just because the data is immutable. If one copies the immutable structure in memory but does not mirror that copy within the other context then just because the new structure is immutable doesn’t mean it will stay in sync and not get corrupted. There aren’t two things to synchronize, now there are three and that is the problem.

Even if the memory copy is not touched, the copy operation introduces a disconnect between the three contexts that amounts to the data equivalent of a race condition. That is the copies are not independent of the original or the persisted version; mutable or immutable that ‘dependence’ dominates every other attribute. So, a delete on the persistent side would not be automatically reflected in the copy. A change on the original would not be reflected either. It is very easy to throw the three different instances out of whack, the individual immutability of each is irrelevant. To maintain its safety, the immutability must span all instances of the data at all structural levels, but that is far too resource intensive to be a viable option. Thus partial immutability causes far more problems and is harder to reason about than just making the whole construct mutable.

Immutability does, however, have some very strong and important usages. Quite obviously for a functional program in the appropriate language, being able to reason correctly about its behavior is huge. To get any sort of programming quality, at bare minimum, you have to be able to quickly correct the code after its initially written and we’ve yet to find an automated means to ensure that code is bug free with respect to its usage in the real world (it can still be proven to be mathematically correct though, but that is a bit different).

In object-oriented languages, using a design pattern like flyweights, for a fixed and finite set of data can be a huge performance boost with a fairly small increase in complexity. Immutable objects also help with concurrency and locking. As well, for very strict programming problems, compile-time exceptions for immutable static data are great and runtime exceptions when touching immutable data help a lot in debugging.

Immutability has its place in our toolboxes, but it just needs to be understood that it is not the first approach you should choose and that it never comes for free. It’s an advanced approach that should be fully understood before utilizing it.

We really need to stop searching for silver bullets. Programming requires knowledge and thought, so any promise of what are essentially thoughtless approaches to quickly encode an answer to a sub-problem is almost always dangerous.

In general, if a programmer doesn’t have long running experience or extensively deep knowledge then they should err on the side of caution (which will obviously be less efficient) with a focus on consistency.

Half-understanding a deep and complex approach will always result in a mess. You need to seek out deep knowledge and/or experience first; decades have passed so trying to reinvent that experience on your own in a short time frame is futile; trying to absorb a casual blog post on a popular trick is also futile. The best thing to do is to ensure that the non-routine parts of systems are staffed with experience and that programmers wishing to push the envelope of their skills seek out knowledge and mentors first, not just flail at it hoping to rediscover what is already known.

Routine programming itself is fairly easy and straightforward, but not if you approach it as rocket science. It is really easy to make it harder than it needs to be.

Saturday, December 9, 2017

Sophistication

Computers aren’t smart, but they are great at remembering things and can be both quick and precise. Those qualities can be quite helpful in a fast-paced modern world. What we’d like is for the computer to take care of our problems, while deferring to us on how or when this work should be executed. What we’d like then, is for our software running on those computers to be ‘sophisticated’ enough to make our lives easier.

A simple example of this is a todo list. Let’s say that this particular version can schedule conference calls. It sends out a time and medium to a list of participants, then collects back their responses. If enough or the right people agree, then the meeting is scheduled. That’s somewhat helpful, but the program could go farther. If it is tied to the phone used in the teleconference, it could detect that the current discussion is getting close to going over time and will impact the schedule of any following meetings. At that point, it could discretely signal the user and inquire if any of the following meetings should be rescheduled. A quite smart version of this might even negotiate with all of the future participants to reoptimize the new schedule to meet most of their demands all on its own.

That type of program would allow you to lean on it in the way that you might rely on a trusted human secretary. They often understand enough of the working context to be able to take some of the cognitive load off their boss, for specific issues. The program's scope is to understand all of the interactions and how they intertwine and to ensure that any rescheduling meets most of the constraints. In other words, it’s not just a brain-dead todo app; it doesn’t just blissfully kick off timers and display dialogues with buttons, instead, it has a sophisticated model of how you are communicating with other people and the ability to rearrange these interactions if one or more of those meetings exceed some threshold. So it’s not really intelligent, but it is far more sophisticated than just an egg timer.

It might seem that it would be a complicated program to write, but that upper-level logic isn’t hard. If the current meeting is running late, prompt the user and possibly reschedule. It just needs to know the current meeting status, have a concept of late, be able to interact with the user and then reschedule the other meetings.

A programmer might easily get lost in the details that would be necessary to craft this logic. It might, for example, be an app running on a phone that schedules a lot of different mediums, like conference calls, physical meetings, etc. It would need timers to check the meeting status, an interface to prompt the user and a lot of code to deal with the intrinsic complexities of quickly rescheduling, including plenty of data to deal with conflicts, unavailable contacts, etc.

It is certainly a non-trivial piece of code and to my knowledge, although parts of it probably appear spread across lots of different software products, none of them have it all together in one place. As one giant piece of code, it would be quite the tangle of timers, events, widgets, protocols, resources, contacts, business logic, conditional blocks of code and plenty of loops. It would be so complicated that it would be nearly impossible to get right and brutal to test. Still, conceptually it’s not rocket science, even if it is a lot of individual moving parts. It’s just sophisticated.

So how would we go about building something this sophisticated? Obviously, as one huge list of instructions, it would be impossible. But clearly, we can describe the higher level logic quite easily. So what we need to do is to make it easy to encode that logic as described:

If the current meeting is running late, prompt the user and possibly reschedule.

If that is all there is to the logic, then the problem is simple. We basically have four ‘parts’ that need to be tied together in a higher level block of code. The current meeting is just some data with respect to time. Running late is a periodic event check. Prompting the user is a simple interface and rescheduling while tricky, is a manageable algorithm. Each of these components wraps code and data. This is where encapsulation becomes critical. For each part, if it is a well-defined black box that really encapsulates all of the underlying issues, then we are able to build that sophistication on top. If not, then we’ll get lost in the details.

Some of the underlying data is shared, so the data captured and its structure need to span these different components. These components are not independent. The current meeting, for example, overlaps with the rescheduling, but in a way that both rely on the underlying list of contacts, meeting schedule and communications medium. That implies that under those two components there are at least three more. Having both higher level components anchor on using the same underlying code and data ensures that they will interoperate together correctly, thus we need to acknowledge these dependencies and even leverage them as reusable sub-components.

And of course, we want each of these upper-level components to be as polymorphic as possible. That is, we want a general concept of medium, without having to worry about whether that means a teleconference, video conference or a physical meeting. We don’t need to care, so we shouldn’t care.

So, in order to achieve sophistication, we clearly need encapsulation and polymorphism. These three are all tied together, and the last two fall directly under the banner of abstraction. This makes a great deal of sense in that for large programs to build up the necessary high-level capabilities you have to be both tightly organized and to generalize at that higher level.

Without organization and generalization, the code will grow so complex that it will eventually become unmanageable by humans. There is a real physical limit to our abilities to visual larger and larger contexts. If we fail to understand the behavior of enough code, then we can then no longer predict what it will do, which essentially is the definition of a fragile and buggy system.

It is worth noting here that we quite easily get to a reasonable decomposition by going at the problem from the top down, but we are actually just building up well-encapsulated components from the bottom up. This contradiction often scares software developers away from wanting to utilize concepts like abstraction, encapsulation, polymorphism, generalization, etc. in their efforts, but without them, they are really limited in what they can achieve.

Software that isn’t sophisticated is really just a fancy means of displaying collected data. Initially, that was useful, but as we collect so much more data from so many sources, having that distributed across dozens of different systems and interfaces becomes increasingly useless. It shifts any burden of stitching it all back together If over to the user, and that too will eventually exceed their capacity. So all of that extra effort to collect that data is heavily diminished. It doesn’t solve problems, rather it just shifts them around.

What the next generation of software needs to do is to acknowledge these failings, and start leveraging the basic qualities of computers to be able to really help the users with sophisticated programs. But to get there, we can’t just pound out brute force code on a keyboard. Our means of designing and building this new generation of software has to become sophisticated as well. We have to employ higher-level approaches to designing and building systems. We need to change the way we approach the work.