Thursday, April 6, 2023

Waterloo Style

When I first started learning how to program, I stumbled onto an extremely strong programming philosophy that was fairly dominant at the University of Waterloo in the 1980s.

Before I was exposed, I would struggle with crafting even simple programs. Afterward, for pretty much anything codable, including systems-level stuff like databases, operating systems, and distributed systems, building it was just a matter of carving out enough time to get the work done.

Over and over again I’ve tried different ways to explain it in this blog. But I think I keep getting lost in definitions, which probably makes it inaccessible to most people.

So, I’ll try again.

The primary understanding is that you should ignore the code. It doesn’t matter. It is just a huge list of instructions for the stupid computer to follow.

If you try to code by figuring out increasingly larger lists of instructions, you’ll go stark raving mad long before you get anything to work properly. So, don’t do that.

Instead, focus on the data. Figure out how it should flow around.

Right now it is stored on a disk somewhere, tucked into some type of database. But the user wants to see it on their screen, wrapped in some pretty little interface.

You’ve got stuff running in the middle. You can contact and engage the database with enough information that it can just give you the data you care about. You take it from there and fiddle with it so that it will fit in the middle of the screen. Maybe that prompts the user to mess with the data. In that case, you have to grab it from the screen and tell the database technology to update it.
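
To make that concrete, here is a minimal sketch of that middle layer in Go. The ‘users’ table, its columns, and the sqlite driver are all assumptions made up for illustration; the shape of the flow is the point, not the specifics.

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/mattn/go-sqlite3" // hypothetical driver choice
    )

    // User is the data we care about, packaged together.
    type User struct {
        ID   int
        Name string
    }

    // fetchUser engages the database with just enough information
    // that it gives back only the data we care about.
    func fetchUser(db *sql.DB, id int) (User, error) {
        var u User
        err := db.QueryRow("SELECT id, name FROM users WHERE id = ?", id).Scan(&u.ID, &u.Name)
        return u, err
    }

    // saveUser tells the database technology to update the data
    // after the user has messed with it on the screen.
    func saveUser(db *sql.DB, u User) error {
        _, err := db.Exec("UPDATE users SET name = ? WHERE id = ?", u.Name, u.ID)
        return err
    }

    func main() {
        db, err := sql.Open("sqlite3", "app.db")
        if err != nil {
            panic(err)
        }
        u, err := fetchUser(db, 1)
        if err != nil {
            panic(err)
        }
        fmt.Println("render:", u.Name) // stand-in for the pretty interface
        u.Name = "New Name"            // the user edits it on the screen
        if err := saveUser(db, u); err != nil {
            panic(err)
        }
    }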

If you need some data from another system, you call out to it, get a whack load of stuff, then slowly start putting it in your database. If you have to send it elsewhere, you do the opposite. If you don’t want to call other systems, you can have them push the stuff to you instead.
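
Sketched the same way in Go, with the endpoint, the JSON shape, and the ‘items’ table all invented for illustration, the pull-then-store mechanics look roughly like this:

    package integration

    import (
        "database/sql"
        "encoding/json"
        "net/http"
    )

    // Item is the packaged form of the stuff coming from the other system.
    type Item struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    // pullItems calls out to the other system, gets a whack load of
    // stuff, then slowly starts putting it into our own database.
    func pullItems(db *sql.DB, url string) error {
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        var items []Item
        if err := json.NewDecoder(resp.Body).Decode(&items); err != nil {
            return err
        }
        for _, it := range items {
            _, err := db.Exec("INSERT INTO items (id, name) VALUES (?, ?)", it.ID, it.Name)
            if err != nil {
                return err
            }
        }
        return nil
    }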

If you take any complex system and view it by how the data flows around it, you’ll find that it isn’t so complex anymore. A data perspective is the dual of a coding perspective (I can’t resist throwing that in), but it is intrinsically less complicated.

Okay, so if you get that perspective, how do you apply it?

In the object-oriented paradigm, you take each and every different type of data that you are moving around, wrap it in an object, and put all of the little functions that fiddle with it in there too. Then you just move around the objects. It is literally a one-to-one mapping.
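
In Go terms (used for all of the sketches here), that one-to-one mapping might look like the following; the Invoice type and its fields are invented for the example.

    package billing

    // Invoice wraps one type of data, with the little functions
    // that fiddle with it kept right beside it.
    type Invoice struct {
        Subtotal float64
        TaxRate  float64
    }

    // Total is one of those little fiddling functions.
    func (i Invoice) Total() float64 {
        return i.Subtotal * (1 + i.TaxRate)
    }

    // ApplyDiscount changes the data, but only through the wrapper.
    func (i *Invoice) ApplyDiscount(pct float64) {
        i.Subtotal *= 1 - pct
    }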

It gets messy in that some of the technologies aren’t object-oriented. But fortunately, one of the core influences on OO was ADTs, or abstract data types. People tend to know these as lists, trees, stacks, and such. But they are just “nearly” objects without the Object ideology stamped into the programming language. That is, any collection of data has some structure to it that you have to accommodate, which is, not surprisingly, called ‘data structures’. Capture that structure, and the underlying mechanics will be able to correctly hold any instance of that data. Screw it up, and at least one instance won’t fit; you’ll have to hack it, and you will regret that later.
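
A stack, for instance, is just a structure plus the handful of functions that keep every instance of it valid; a minimal sketch in Go:

    package adt

    // Stack is a classic ADT: the structure, plus the only
    // functions allowed to touch its internals.
    type Stack struct {
        items []int
    }

    // Push accommodates any new instance of the data.
    func (s *Stack) Push(v int) {
        s.items = append(s.items, v)
    }

    // Pop handles the empty case explicitly, so there is no
    // instance that fails to “fit” the structure.
    func (s *Stack) Pop() (int, bool) {
        if len(s.items) == 0 {
            return 0, false
        }
        v := s.items[len(s.items)-1]
        s.items = s.items[:len(s.items)-1]
        return v, true
    }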

What’s interesting about the ADT philosophy is that it fits on top of any procedural language, and it helps if the language has composite abilities like typedefs or structs. They allow you to package a group of variables together, and keep them together as they move around the system. You don’t need to package them, but it really helps to prevent mistakes.

If you package the data together, and you build some little functions that understand and work correctly with that data and its structure, you can keep them all together, effectively “encapsulating” them from everything else. That is basically a loosely defined object.
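
For example, with an invented Timesheet type, compare passing four loose variables around to passing one package:

    package payroll

    // Four loose variables invite mistakes; one struct keeps them
    // together as they move around the system.
    type Timesheet struct {
        Customer string
        Rate     float64
        Hours    float64
        TaxRate  float64
    }

    // amountOwing is a little function that understands this data
    // and its structure, kept right next to it: a loosely defined object.
    func amountOwing(t Timesheet) float64 {
        return t.Rate * t.Hours * (1 + t.TaxRate)
    }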

So, you get right back to the main issue. You move around the data, and you can do that with objects, with structures, or in a language like Go, with interfaces. It is all the same thing in the end, since it is all derived from that original ADT philosophy. Different names and different language capabilities, but all part of the same data structure coding style.
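
In Go, for instance, an interface lets callers move the data around without caring which wrapping sits underneath; a small sketch, with all of the names invented here:

    package store

    // Store is the behaviour; callers flow data through it
    // without knowing the underlying data structure.
    type Store interface {
        Get(id int) (string, bool)
        Put(id int, value string)
    }

    // memStore is one possible wrapping: a plain map.
    type memStore map[int]string

    func (m memStore) Get(id int) (string, bool) {
        v, ok := m[id]
        return v, ok
    }

    func (m memStore) Put(id int, value string) {
        m[id] = value
    }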

Now the only other trick is that you need to keep this organized as you build it, so it takes a little discipline. When some new type of data enters the system, you wrap it in objects or structs ‘first’, before you start using it everywhere else. You build from the bottom up. You don’t skip that, you don’t cheat the game, and you don’t make exceptions.

Once that data is available, you just flow it to wherever you need it. Since you wrapped it, you will nicely reuse that wrapping everywhere, and if you are missing some little function to fiddle with the data, you add it very close to that data: in the object, or with the other data structure code. It is all together in the same place.

When you see it that way, an operating system is just a very large collection of data structures. So is a relational database. So are compilers and interpreters. Pretty much everything, really.

Most domain applications are relatively small and constrained sets of data structures. Build the foundation, and move the stuff around where you need it.

Very occasionally, you do need to think about code. Some systems have a small kernel of complexity, usually less than 10%, where they need some really difficult code. It’s best to do a bit of research before you tackle it, but the mechanics are simple. Get all of the data you need, feed it to the algorithm, ideally in a stateless way, then get all of the outputs from it and move them around as necessary. Easy peasy.
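
As a sketch of that shape in Go, with a moving average standing in for the genuinely difficult algorithm: all of the data goes in, all of the results come out, and nothing is hidden in between.

    package kernel

    // movingAverage is a stateless kernel: feed it everything it
    // needs, take all of the outputs, and move them around as necessary.
    func movingAverage(samples []float64, window int) []float64 {
        if window <= 0 || len(samples) < window {
            return nil
        }
        out := make([]float64, 0, len(samples)-window+1)
        var sum float64
        for i, v := range samples {
            sum += v
            if i >= window {
                sum -= samples[i-window] // drop the sample leaving the window
            }
            if i >= window-1 {
                out = append(out, sum/float64(window))
            }
        }
        return out
    }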

The most common objection to this style of coding is that it creates too many objects or too many little functions. That is ironic, in that when it is consistently applied, it usually produces far fewer objects and functions than just slamming out reams of endless redundant code. It is often at least 1/10 of the amount of code. It may seem like more work, but it is less.

And while it is harder to see the full width of all of the instructions that will be executed, it is actually far easier to vet that the underlying parts are acting correctly. That is, as the code degenerates, if you end up with a problem buried deep in the middle of hundreds of lines of tangled code, it is unlikely that you’ll be able to fix it cleanly. But if you have some little function with an obvious bug, you can fix it simply, then start walking up the call stack to make sure that nothing above breaks with the fix. It’s usually less work and easier, and since you can properly assess the impact of the change, it is about 1000x safer. It also has the added benefit that it will fix a ‘class’ of bugs, not just the bug itself, which leads to far better quality.

Now, wrapping all of the data down low is a bit of extra overhead. It doesn’t make sense for small throwaway programs; you write those too quickly for it to matter. But for a medium-sized system it usually pays off within the first 6 months, and for large or huge systems it cuts the development time down almost exponentially. A huge saving.

That is, you may come out of the gate a little slower, but the payoff over the life of the project is massive. You can see the difference in large systems, quite clearly. After a year or so, either they are quickly degrading into an impossible mess, or you can keep extending them with a lot more functionality.

There is more. But you need to accept the foundations first. Then you can deal with difficult data, and maybe even understand why design patterns exist (and how not to abuse their usage).

Figuring out how to flow the data around is a much more powerful means of understanding computer systems than just trying to throw together massive lists of instructions. Once mastered, it means that most of the other problems with designing and building complex systems are really about people, processes, and politics. If you can easily build anything you want, then you have a lot more choices for what you build, and why you build it.

8 comments:

  1. Can you do a walkthrough on a concrete example?

  2. "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

  3. Is this the same thing as dataflow programming?

  4. Interesting. I am a big fan of OO, but never loved the OO languages. They had too much complicated shit (C++) and too many mental models, because they insist on making everything an object. If you have a hammer that is OO, everything becomes a nail. What I wanted out of OO was extensible ADTs (say, adding complex numbers to C while sticking with C syntax and operator overloading).
    If I think about my design style, that is, "how to organize data so I have efficient execution and little code", it is not too far off Waterloo design.

    Replies
    1. Actually, what I wanted out of OO was an extensible compiler that can handle custom ADTs. Guess this is a more precise definition.

  5. > Instead, focus on the data.
    After reading this, I immediately recalled a good post I read before. It’s about functional programming, but they’re talking about the same principles. https://www.lihaoyi.com/post/WhatsFunctionalProgrammingAllAbout.html#functional-programming-recipes

