Thursday, April 16, 2009

The End of Coding as We Know It

It's time for a bold prediction.

I know that most software developers see computer language programming as the essential element of software creation. For most people it is inconceivable that we might one day automate large chunks of this effort.

Automation, it is believed, can only be applied to tasks that are repetitive and require little thinking. Programming requires deep concentration, complex visualizations, and often some pretty creative thinking to get around both the technical and the domain-based problems.

Thus, it is easy to assume that because it is rooted in intellectual work, programming in computer languages will always be an essential part of software development.

However, I think that assumption is false.


THE LAST OF THE TYPESETTERS

I can remember my Dad -- an editor for a printing-industry magazine -- taking me on a tour of one of the last vestiges of movable type. In its day, movable type was a technical marvel: a super-fast way of laying out printable pages.

Movable type involved the careful positioning of thousands of small metal letters, called glyphs. A typesetter could quickly create page after page, and reuse the glyphs for more layouts after they were finished printing.

It was a huge step forward for printing, a major leap up from the hand-carved printing blocks of the past: a page could now go to print in a matter of days.

A typesetter was responsible for hand-positioning each element. Good quality workmanship was difficult, and the occupation was considered skilled. A great deal of knowledge and understanding was often required to get a high-quality, readable layout, including hyphenation, kerning and leading. Typesetters, a subset of typographers (a group that also includes font designers), were considered master craftsmen.

The future looked bright for typesetters. Their highly skilled profession involved the careful layout of huge amounts of text. Printing was a major growth area, one of the great technical endeavors of the industrial revolution. It was a great job, well-paying and esteemed. It was a smart career choice, surely a job that would be required forever.

Forever didn't last long.

Over time the technologies improved, yet right up to the 1960s typesetters were still required. From movable type to hot type, then later to cold type, the work endured while the technologies changed around it. Still skilled, still complex, but gradually becoming less and less significant.

Finally, the growth of desktop publishing killed it. Much of the beauty and art of the layout was lost. Computers, even with good font hints, don't generate the same quality of output; consumers, however, no longer appreciate the difference. High-quality typesetting became a lost art.

Typesetting still occurs in some contexts, but it is very different from its origins. It is a specialty now, used on very limited occasions. My dad lamented the end of typesetting, often telling me that desktop-published documents were not nearly as readable or classy.


LESSONS LEARNED

So what does this have to do with computers?

It is an excellent example of how easily technology creates and removes occupations; how easily it changes things. Typesetters can be forgiven for believing that their positions would survive far longer. It was skilled labor that required a visual eye for detail, a considerable amount of intelligence, and a good knowledge of spelling and grammar for hyphenating the text. It was far better than laboring in a factory.

Way back then, with our current frame of reference, it would be easy to assume that the intellectual effort involved in typesetting rendered it impossible to automate. Anybody who has delved into the murky world of fonts and layout knows that it is far messier and more complex than most people realize. But we know that aspects of the problem gradually got redistributed to fonts and layout programs, or were simply lost. The constraints of the original efforts disappeared, and people gradually accepted the reduced quality of the output. Automation brought mediocrity. Average.

Programming has aspects that require intelligence, but it also contains large amounts of rather mindless effort. We code up a big mess, then spend a lot of time finding fiddly little problems or reworking it. Intellectual work is intellectual work, and no computer can do it for us, but that doesn't mean it has to be redone each and every time we build a system. That is the false assumption.

While the core of the intellectual work will never change, how it's done, how much of it really exists, and whether or not we'll have to keep coding forever are all up for grabs. All we need to do is restructure the way we look at programming and we can see some fairly simple ways to collapse the effort, or at the very least get far more reuse out of our current efforts.

I'll start with a simple model.


CONTEXTS

Consider the idea of a 'data context': basically, a pool of data. In a context, each datum has a very specific structure and type. The overall collection of data may be complex, but it is finite and deterministic. There are a fixed number of things in a context.

Getting data from one context to another is a simple matter. The data in one context has a very specific type, while it may be quite different in another context. To go from one to the other, the data must pass through a finite series of transformations. Each transformation takes a list of parameters and returns a list of modified values. Each transformation is well-defined.

We can see modern computers as a whole series of different contexts. There is a persistent data context, an application model context, a domain model context, and often a lot of temporary in-between contexts while some computation is underway or the data is being moved about. Data appears in many different contexts, and these persist for different lengths of time.

A context is simply a well-defined discrete pool of data.
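
To make this a little more concrete, here is a rough sketch in Python of what a context might look like; the class, the names and the types are purely illustrative, not any real library:

    # A minimal sketch of a 'data context': a discrete pool of named, typed data.
    class Context:
        def __init__(self, name, schema):
            self.name = name        # e.g. "persistent", "domain-model", "screen"
            self.schema = schema    # fixed mapping: datum name -> expected type
            self.data = {}          # the pool itself

        def put(self, key, value):
            expected = self.schema[key]          # only a fixed set of things may live here
            if not isinstance(value, expected):
                raise TypeError(f"'{key}' must be of type {expected.__name__}")
            self.data[key] = value

        def get(self, key):
            return self.data[key]

    # The 'same' value lives under different types in different contexts;
    # moving it between them requires a (tiny) transformation.
    persistent = Context("persistent", {"bond_yield": float})
    screen = Context("screen", {"bond_yield_text": str})

    persistent.put("bond_yield", 4.3)
    screen.put("bond_yield_text", f"{persistent.get('bond_yield'):.1f}%")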


EXTENDED TYPE STRUCTURES

We often talk of 'type' in the sense of programming language variables being strongly typed or loosely typed. Type, even though it may be based on a hierarchy, is generally a reference to a specific data type of a specific variable. It is usually a singular thing.

In a general context, we do use it with a broader, structure-based definition, such as when referring to Lists, Trees and Hash Tables as abstract data structures, but most people don't classically associate an ADT like List with the 'type' of a variable. They tend to see them as 'typeless' containers.

For this discussion we need to go higher, and think more in terms of an 'extended type': a fully structural arrangement of a much larger set of variables, where the interaction isn't restricted to just hierarchies. The full structural information for an extended type includes all of the possible generalizations of the type itself, including any alternative terminology (such as language translations).

The type of any variable, then, is all of the information necessary to be either very explicit or completely generalized in the handling of any collection of explicitly related data. The extended type information is a structure.

We can take 'type' to mean a specific node in this complex graph of inter-related type information -- a place in a taxonomy, for instance.

Type, then, includes any other reasonable alternative "terms" or aliases for the underlying names of the data. For example, floating-point-number, number, value, percentage, yield, bond-yield, bond-statistic, financial-instrument-statistic, Canadian-bond-statistic or Canadian-bond-yield may all refer to the same underlying value: 4.3. Each name is just another way of mentioning the same thing, although the references range from very generalized to very specific.
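
As a hedged sketch (the structure below is only my own illustration, not a standard of any kind), one node in that extended type graph might carry its value along with all of its more general names and aliases:

    # One node in an 'extended type' graph: the same underlying value, 4.3,
    # reachable under increasingly general (or translated) names.
    # Every name here is illustrative only.
    canadian_bond_yield = {
        "canonical": "Canadian-bond-yield",
        "generalizations": [          # from most specific to most general
            "bond-yield", "yield", "percentage",
            "bond-statistic", "financial-instrument-statistic",
            "number", "floating-point-number",
        ],
        "aliases": {"fr": "rendement obligataire"},   # alternative terminology, e.g. translations
        "value": 4.3,
    }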

Type can also include a restricted sub-range of the fully expressible type. For example, it may only be the integers between 2 and 28. Thus the integer 123 cannot be mindlessly cast to an integer_2..28; it does not belong to that 'type', but the integer 15 does.
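
A rough sketch of such a restricted type in Python might look like the following; the class name and its methods are my own invention, purely to illustrate the idea:

    # A range-restricted type: only the integers between 2 and 28 belong to it.
    class IntRange:
        def __init__(self, low, high):
            self.low, self.high = low, high

        def accepts(self, value):
            return isinstance(value, int) and self.low <= value <= self.high

        def cast(self, value):
            if not self.accepts(value):
                raise ValueError(f"{value} is not an integer in {self.low}..{self.high}")
            return value

    int_2_28 = IntRange(2, 28)
    print(int_2_28.cast(15))    # fine: 15 belongs to this 'type'
    try:
        int_2_28.cast(123)      # 123 cannot be mindlessly cast down
    except ValueError as err:
        print(err)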

Data of one type can be moved effortlessly to data of any other type in the same structure; they are one and the same. Data that is not within the same type structure requires some explicit transformation to convert it.


TRANSFORMATIONS

A transformation is a small amount of manipulation required to move some data from one unique type to a different one. Consider it to be a mini-program: a very specific function, procedure or method that takes data of type A and converts it to type B.

A transformation is always doable. Any data that comes in can be transformed into something outbound (although the results may not make sense to humans). Transformations are finite, deterministic and discrete, although they don't have to be reversible. The average value of a set of numbers, for example, is a non-reversible (one-way) transformation.

Transformations can have loops, possibly apply slightly different calculations based on their input, and could run for a long time. The output is a set of things: basically, anything that has changed in some way from the input. There are no side-effects; everything modified is passed in, everything changed is returned.

The transformation is specific: its input is a set of specific values of specific types, and its output is another set of values of specific types. No conditional processing, no invalid input, no side-effects. A transformation takes a given set of variables, applies some simple logic and then produces a set of resulting variables.
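
In code, a transformation could be as small as a pure function. Here is a hedged sketch in Python, using the average example from above; the function names and types are just for illustration:

    # Transformations: tiny, well-defined, side-effect-free functions.
    # Everything modified is passed in; everything changed is returned.

    def average(values: list[float]) -> float:
        # One-way: the original list cannot be recovered from the result.
        return sum(values) / len(values)

    def yield_to_display(bond_yield: float) -> str:
        # A trivial, specific conversion from a domain value to a screen value.
        return f"{bond_yield:.1f}%"

    print(average([4.1, 4.3, 4.5]))   # approximately 4.3
    print(yield_to_display(4.3))      # 4.3%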

The underlying language for the transformations could be any programming language, such as C, Java, Perl, etc. I think most of the modern functional programming languages, such as Haskell and Erlang, define their functions in the same manner (although I am just guessing), but Perl is the only language I am personally aware of that can return lists (as native variables) from function calls.


PUTTING IT ALL TOGETHER

The three simple concepts -- contexts, types and transformations -- form a complete computational model for utilizing computer hardware.

We can skip over any essential proofs if we accept that the model itself is just a way to partition an underlying Turing-complete language, in the same way that Object Oriented programming doesn't make anything more or less Turing complete.

I think that higher-level structural decompositions do not intrinsically change the expressibility of the underlying semantics. In other words, nothing about this model constrains or changes the usability of the underlying transformation programming language; it just restructures the overall perspective. It is an architectural decomposition, not a computational one.

A path from context to context involves a long pipeline of simple transformations. Each one takes a specific set of input, which it converts into output. To further simplify things, each transformation is actually the smallest transformation possible given the data. Each one does a near-trivial change, and then returns the data. If there are conditional elements to the path, that processing takes place outside of the transformations, at a higher level. The transformations are always a simple path from one context to another.

In that way, the entire system could consist of millions and millions of transformations, some acting on general data types, others gradually getting more specific as the transformations require. Each one is well defined, and the path from context to context for each datum is also well-understood.

From a particular context, working backwards, there is an entirely straightforward and deterministic pathway back to some known starting context. That is, the computer can easily assemble the transformations required for a specific pipeline if the start and end contexts are known.
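
To show how mechanical that assembly could be, here is a hedged little sketch in Python: a registry of transformations keyed by input and output type, and a breadth-first search that builds the pipeline from one context's types to another's. All of the names and types are hypothetical:

    from collections import deque

    # Hypothetical registry: (input type, output type) -> transformation.
    registry = {
        ("raw_row", "bond_record"): lambda row: {"yield": float(row["y"])},
        ("bond_record", "bond_yield"): lambda rec: rec["yield"],
        ("bond_yield", "display_text"): lambda y: f"{y:.1f}%",
    }

    def assemble_pipeline(start_type, end_type):
        # Breadth-first search over the graph of known transformations.
        queue = deque([(start_type, [])])
        seen = {start_type}
        while queue:
            current, path = queue.popleft()
            if current == end_type:
                return path
            for (src, dst), fn in registry.items():
                if src == current and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, path + [fn]))
        raise LookupError(f"no path from {start_type} to {end_type}")

    def run(pipeline, datum):
        for transform in pipeline:
            datum = transform(datum)
        return datum

    pipeline = assemble_pipeline("raw_row", "display_text")
    print(run(pipeline, {"y": "4.3"}))   # 4.3%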

There is no limit to the number of contexts or the length of time they stay around. There could be a small number of them or, since we cache a lot in modern systems, a very large number of smaller ones.

We can build massive computer systems from a massive number of these transformations that help the system move data from one context to another. It would not take a huge amount of effort -- in comparison to normal programming efforts -- to break down all of the domain-specific data into explicit data types and then map out a huge number of transformations between the different types. We do this work constantly anyway when building a big system; this just allows us the ultimate 'reuse' for our efforts.

Users of this type of system would create a context for themselves. They would fill it with all of the references to the various different bits of data they want to access, and then for each, map it back to a starting context. In a very real sense, the users can pick from a sea of data, and assemble their own screens as they see fit. A big browser, and some drag and drop capabilities would be more than enough to allow the users to create their own specific 'context' pages in the system.

We already see this type of interface with portal web applications like iGoogle, but instead of little gadgets, the users get to pick actual data from their accessible contexts. No doubt they would also be able to apply a few presentation transformations to the data to change how it appears in their own context. Common contexts could be shared (or act as starting templates).
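
A user's own context might amount to nothing more than a small description of their picks, something like this hypothetical sketch:

    # A hypothetical user-assembled context: just references to data held in
    # other contexts, each with an optional presentation transformation.
    my_page = {
        "name": "trading-desk-overview",
        "picks": [
            {"source": "persistent", "datum": "Canadian-bond-yield",
             "present": "as_percentage"},
            {"source": "domain-model", "datum": "portfolio-value",
             "present": "as_currency"},
        ],
    }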

As an interface, it is simple and no more complicated than many web-based apps. Other than the three core concepts, there are no unknown technologies, algorithms or other bits necessary to implement this.


RAMIFICATIONS

Software would no longer be a set of features in an application. Instead it would be millions and millions of simple transformations, which could be conveniently mixed and matched as needed.

Upgrading a computer would involve dumping in more transformations. The underlying software could be sophisticated enough to performance-test different pipeline variations, so you could get newer, more optimized transformations over time. Bad pipeline combinations could be marked as unusable.

Big releases from different vendors or even different domains could be mixed and matched as needed. One could easily write a simple series of patching transformations to map different sets of obscure data onto each other. All of our modern vertical silo problems would go away.

Distributed and parallel programming are also easier in this model, since they become a smaller question of how the individual pipelines are synchronized. Once reasonable algorithms get developed -- since they don't change -- the overall quality will be extremely high and very dependable.

In fact, the system will stabilize quickly as more and more transformations get added, quantified and set into place. Unlike modern systems, the changes will get less and less significant over time, meaning the quality will intrinsically get better and better. That is something we definitely don't have now.

Of course, the transformations themselves are still programming. But the scope of the programming has gone from having to create hugely complex, massive programs to creating a massive number of hugely simple, small ones. The assembly is left to the computer (and indirectly to the user, who picks the data).

Eventually, though, the need for new transformations would slow down, as all the major data types for all of the various different domains would get added. Gradually, creating new transformations would be rarer and rarer, although there would always be some need to create a few.

Just quickly skipping back to typesetting, it should be noted that Graphic Designers still occasionally manually tweak kerning or leading to make some graphic layouts have a high quality appearance. The job disappeared, but some vestiges of it still remain.

Of course, we will still need data analysis and operations people to handle setting up and running big systems in production, but the role of the programmer agonizing over line after line of code is not necessary in this model. The computer just assembles the code dynamically as needed.


SUMMATION

I presented these ideas to show that there is at least one simple model that could eliminate programming as we know it. Although these ideas are fairly simple, building such a system involves a great deal more complexity than I have addressed.

It is entirely possible, but even if these ideas are picked up right away, don't expect to see anything commonly in production for a long time. Complex ideas generally need about twenty years -- a generation -- to find acceptance, and some ideas need to sit on the bench a lot longer before people are willing to accept them.

Even if we built the best distributed transformation-pipeline systems with near-perfect quality, it would still take decades for the last of the old software to die out. People become rather attached to their old ways, even when they are shown not to work very well. Technology rusts quickly, but fades slowly, it seems.

Programming, while doomed, will be around for a while yet.

18 comments:

  1. First, do you want a proofreader? Its-it's, usage of 'way', etc.

    This touches on various previous efforts in computing. One is Flow Based Programming (http://en.wikipedia.org/wiki/Flow-based_programming), and another is Applicative programming (e.g. LISP and ML).

    I suspect that even with this kind of system, there would be no end to the writing of programs, just as there is no end to the writing of books. As we acquire knowledge and understanding, our context and focus shifts, and our requirements change. Thus, our software must keep shifting to keep pace.

    Unix offers the most simple and long-held system for doing what you suggest: sockets provide I/O handles for a given component to receive input and send reprocessed output.

    I would advise caution concerning functions that return differences rather than new data - these are limited to returning to the original system/machine and to static states, rather than to pipelining and dynamic systems.

    How do you envision this overlapping with web services, which could function in the way you describe at the macro-level?

    Or are you suggesting everything happens in xslt?!

  2. I have the feeling that it has already begun: functional programming is clearly just about that: types and transformations.

    IMO, Haskell is to functional programming what assembly was to early computer programming: weird, requiring arcane knowledge (assembly: of the chip; Haskell: of algebra) and low-level.

  3. Until then the next step would be the rise of Application Designers.

    They will operate with various Domain Specific Languages and Application Generators.

    http://clair.ro/blog/2009/03/30/towards-to-automagically-generated-web-applications/

  4. Thanks for all of the comments everyone!

    @Phil,

    If these ideas were to be developed, they would displace virtually all of our modern software. A context would probably be anchored to a 'machine' but beyond that clients, servers, OSes, sockets, etc. are all artifacts of our current technology model, not this new one.

    If my poor spelling and grammar get to you, I'm fine with you posting corrections. I'm often rushed, so I edit with as much time as I have and then out it goes (otherwise I'd never publish).

    @Astrobe,

    Yes, there is really nothing new about any of these ideas, except that I isolated them with the intent to get rid of legacy complexity. Gradually, I am sure we will naturally drift in this direction, but it may take us a long time to get there. The hard part is not getting bounded by the earlier complexities.

    @csbartus,

    The one-man show idea has motivated developers for decades. The problem seems to be that as the technology improves, the demands of the users improve just as quickly. Our current technology path does nothing to remove or reduce this race condition. At the same time, each new generation of technology forces a rewrite of the existing systems, we have no way to heavily leverage the past.

  5. The 'programming without programming' idea is one of the oldest in computing. Richard Hamming called it 'automatic programming' back in the '50s. It's gone through many variations; high-level languages were supposed to let anyone write software, then 4GLs were supposed to let business analysts write software, then UML was supposed to make software creation trivial, and nowadays Domain Driven Development and Domain Specific Languages are supposed to let any domain expert create software.

    IMO, the issue isn't programming and developing software. The hard thing is creating solutions. However that's done or whatever you call it, it will always be 'programming'.

  6. Ben Pierce's work on what he calls 'lenses' touches on this. You would really benefit from digging up the papers or reading the docs (the super-super alpha language implementation is called "Boomerang").

    The underlying idea with lenses is this (I might fudge some trivial details, but nothing material):

    Most interactive computer programs look something like this:

    - take some underlying representation (ie, in a text editor, you might start with a raw text file)

    - sequence it through some series of transformations (eg: fix the encoding, move it into a better data structure for handling edits)

    - finally, (last transformation), expose it to the user in some way, so they can edit (eg the on-screen text buffer)

    ...then, where the magic comes in, is that the user makes changes to the data as it's presented to them, and those changes then propagate back up the chain of transformations.

    The research work goes through this in much more detail (like: what if a transformation is lossy in one or both directions?) but it does cover ~90% of program logic: read a datafile in, possibly transform it several times, present it in editable form to the user, then backport the user's edits to the underlying data format.

    It's a great formalism.

    The other area of investigation you might look into are some of the exotic languages (i'd recommend things like APL, Amazon's FPS language, and so on).

    If you want to get away from programming as such you want to move away from a language that specifies algorithms and more towards a language that specifies results.

    APL is a good example of this: it has a funny alphabet with a bunch of special symbols, and there's ~50+ 'primitives' you need to learn to really know all the basics (I can't think of any language with more, honestly).

    The downside is it's a pain to learn (and for that reason has basically evaporated from use except from a handful of niches) and requires special keyboards to use effectively (this might be less necessary if more stuff like iphones or tablets with software keyboards become widespread).

    The upside is that the basic language primitives are sufficiently flexible that you can express ~90% of simple transformations or calculations in a handful of basic language symbols, without really needing to define a new function.

    It gets you a lot closer to saying:

    - construct the 100x100 matrix where row(i,j) is the j-th root of the i'th prime #

    instead of writing a sequence of operations that another human can read and understand that will construct that matrix once it finishes operating.

    This is basically the polar opposite of the 'domain specific languages' approach: the DSL approach assumes you'll have some ur-language in which you implement specialized languages for a specific problem space.

    The consequence of the DSL approach is that you never get away from programming: there's always a new DSL to learn and new DSLs to write, and you're still programming even as a DSL-consumer, you're just programming in some rare language.

    Replies
    1. English:

      - construct the 100x100 matrix where row(i,j) is the j-th root of the i'th prime #

      APL: {b p←⍵⋄b*-(primesupto 600)[p]}¨⍳100 100

      Shorter, just as understandable, one just needs a little education into the protocols.

  7. I admit I got lost with your discussion of extended types and whatnot. I do think that the field of Software Development is ripe for a change and I think I may agree with a lot of what you say.

    I think what you are trying to describe is another way of describing a problem that can move software development further up the stack. What I mean is software first began as machine-level instructions that were abstracted to assembler. Assembler gave way to languages like Fortran, Cobol and C. C++ began to supplant C, but the job was finished off by Java. While Java has been a success and removed a lot of the troubles of lower-level languages, its success has allowed people to focus on other problems that it is not as well disposed to address. So now we are looking to what is next. I know that I left out a lot of computer language development (like Smalltalk, Python, Objective C, etc...), but it was meant as an example, not as flame bait.

    One thing to note though: even though systems continue to change and the languages that describe them continue to evolve, there is always going to be work cleaning that mess up, or rather learning to deal with "legacy systems". Whoever can crack that tough nut best will probably eventually write themselves a huge check, and the tools that "crack that nut" will probably better facilitate a future with "coding not as we know it" than anything else.

    Still, your example in the beginning of typesetters reminded me more of the way that old DBAs must be feeling these days with all the ORM engines out there (particularly Hibernate and JPA in the Java world). They sound more like the typesetters of old: "bah, your schema isn't in proper n-normal form" or "those generated schemas aren't nearly as efficient as what I can whip up." They definitely must feel underappreciated, that is for sure.

  8. Once again, thanks for all of the comments! :-)

    @Anonymous the first,

    Most programmers like to think about computers in terms of 'functions', but I've been pushing for years for a data-centric view. What we're doing is building up larger and larger piles of data, the code is just how we accomplished that. There are easy ways to collect and organize information.

    @Anonymous the second,

    Lenses are very similar to what I suggested; it's a very interesting reference.

    Given that APL was heavily favored by insurance and financial programmers, in many ways it was an early prototype for a DSL. The vector/matrix paradigm was far more natural for many of those problems, so the developers needed fewer translations.

    While higher-level instructions accomplish more, they still need to be iterated into functionality. Each new layer is built on the lower ones, and each lower problem propagates through the upper layers. Eventually it becomes unstable.

    @James,

    Yes, a move up the stack is often what I refer to as a higher abstraction, except in this case I've gone up and over quite a ways.

    Legacy systems will only stay around if they have some value. Most of our modern solutions have been so flaky, that the legacy systems have actually been the safe and secure option. That would change (slowly) if there existed a new system that was clearly better than the older one (as happened to movable type).

  9. I don't know if this has been commented already, but programming and typesetting are two wholly different things.

    Typesetting is a very mechanical process. There is room for creativity, yes, but not much.

    Whereas programming is truly a creative process. As Fred Brooks said in the Mythical Man Month, it's something that we do solely in our minds that produces something subtly tangible on a screen.

    Therefore, no matter what happens, programming will never be obsolete so long as computers are not as creative as human beings. It would be the same as saying that computers will one day make painting, writing, or musical composition obsolete.

    Now you mention that it's "coding" that becomes obsolete. Well, it won't ever. Code as we know it (Java, C++, etc...) may become obsolete, but we will still need some kind of language to express the creative process called programming. The language itself will change, but there still needs to be some way to express what we humans want to happen in a machine.

  10. The analogy of typesetting <-> programming is flawed. Typesetting operates on a fixed set of content under a fixed set of constraints and rules; programming does not. If you want to make an analogy to the world of the written word, then compare programming to writing literature.

    The paradigms you propose as building blocks are already present and used in many programming languages/systems, to good effect too. Yet they have not made programming as we know it obsolete, therefore more of the same will not automagically bring about the golden programming-free future.

    You do fall into the classic framework trap, which is: "If we could just get that framework (of paradigms/patterns) right, then all our problems would be easy and we could start describing them abstractly".

    Frameworks succeed in their domain yet fail utterly outside it. There is no big unified be-all/end-all framework (and there never will be one). In the absence of the universal framework, if you try to press any work into the limited frame, you will end up being able to do less, and do it less satisfactorily than with "manual coding". Often the good/elegant solutions that make a problem manageable exist outside a fixed set of paradigms/patterns/frameworks.

    Again you hit a wall in terms of intellectual work, because even choosing what kind of paradigms/patterns you want to employ to solve your problem follows no fixed set of rules. And unless we have a true strong AI that we can talk to and describe what we want, there will never be a time when we do not write code ourselves.

  11. This has already happened. Programming languages as we know them are already the desktop publisher, just as the hardware or assembly programmer was the typesetter. These skills are still needed and transform themselves.

    There will still need to be some domain design, input/output and state management, persistence, etc. that is always different and always dynamic, because technology is ever changing.

    Programming languages are already a highly designed entity for formulating applications and tools. They will simplify even further. Look at assembly compared to Python or Ruby. I would argue the latter languages might be even more precise than the English language in many cases, and in the end an application is still going to have to be described and created from some language -- or transformations, as you call them. The transformations already take place numerous times in the layers of software and the OSI framework. More layers are continually added.

    Coding will never end; just like everything else, it will simplify, but it will really just provide a simple facade to the complexity that is hidden below.

  12. OWL and RDF have been promising the same thing for the past decade! Semantic systems with data descriptors (OWL) would seem to allow transformations from one meta-object to another as and when the context or the application changes.
    These kinds of predictions have been going on since the DCOM/CORBA days -- semantic systems and so on...
    They all seem accessible with the current stream of technologies, but still...

    The woods are lovely, dark and deep,
    But I have promises to keep,
    And miles to go before I sleep,
    And miles to go before I sleep.

  13. Just because you can translate data from one format to another, it doesn't mean you can have a computer assemble a meaningful and useful user interface and an efficient program behind it.

    You're nowhere near ending computer programming.

  14. "to show that there is at least one simple model"

    Sorry, you completely lost me on the 'simple' part. It sounds a bit like the 'its structure is simple, so its application must be, too' claim of XML and other languages for the masses (or those for smart people, but they don't make that claim).

    See also: Panel two of this xkcd. I would very much like an automaton that rids me of the obvious parts of my work.

  15. I find the analogy with writing a book best. Can you automate the writing of a novel? Sure! Would anybody's heart be moved by it?

    Programming is a business plan for a business problem. Once completed, it becomes the story. Can this be automated? Will you trust your paycheck to it?

    When you automate programming, you might just automate the human being.

    Ready for a wire plugged in the back of your head?

    The analogy with typesetting is false. Typesetting was a minuscule part in the delivery of words. You might compare that with software deployment. Don't confuse it with the book itself.

  16. There is a pattern in creating software: 1) it is never finished, and 2) there are always more things you can do.

    If you just make the bulk of what we write code for today easier to do, you will still end up writing the same amount of code, because you'll have more time to write it for the parts of the work that aren't finished and that couldn't be made easier.

  17. I love it when folks get together and share opinions.
    Great website, stick with it!

