A very common programming and architectural problem comes from what I  like to call the 'edit loop'. 
It is a software programming  construct that exists when getting the user's data out of some form of  long-term persistent storage and then putting it into a user interface  for display or editing. The second half of the loop comes from taking  any changes made by the user and getting them back into storage. 
Mostly  these days the interfaces are GUIs, and the storage is a relational  database, but all of the same problems apply even if the interface is a  command line or text based, and if the storage is some other type of  persistent storage such as a key/value data-store. For this discussion  we'll just assume the interface is a standard GUI with panels and  widgets, and the database is relational. 
These loops are  generally triggered by some event mechanism, which is started by a user  navigating to a specific screen-full of data in their application. The  loops happen often, and are the main form of processing for the  application. They usually constitute the bulk of the functionality and  of the code.
Millions of programmers have been coding these types  of loops for decades. Virtually every system has some type of  interface, and in all of these there are often many types of edit loop. 
The number of times this code has been written, rewritten, hacked or refactored is staggering. It generally accounts for at least 80% of the work of most application development. It is where most programmers spend most of their time.
This post will start with the standard practice for building such loops in large systems, and then examine some of the common problems with this approach. It will then get into ideas about how to reduce the work and redundancies. Finally it will deal with more complex architectures.
STANDARD DESIGN
The  easiest approach to building an edit loop is to start with the database.  
Since the major strength of a relational database is to allow  several different applications (such as reporting and mining) to share  the same underlying data, the first thing that should be done is to  create a fourth normal form schema in a 'universal' representation  usable by all of the applications.
Some applications have more  stringent performance requirements, so a few of the tables may have to  be de-normalized as required. 
From the database, the programmers need to get the data into their running code. Since most applications have slightly different requirements than the universal schema, there is usually some finessing of the data to put it into an application-specific model.
The current popular programming paradigm is  object-oriented, which generally relates the tables in the database to  specific objects in the application. 
There could be a one-to-one  correspondence between the tables and the objects, but it is more  likely that the programmers will source the same underlying tables  within many different objects in the system. There is usually a lot of  repetition. This collection of objects in the application is often  loosely referred to as the "model", although it is rarely so precise.
Once the programmers have created a large number of different models of the data, most system architectures are layered into distinct pieces, either as a client/server architecture or just as some lower data level with an interface level sitting on top of it.
In either case, the data  constructed in the modeling part of the code needs to get transported to  the interface part of the code. 
The predominant convention for this type of 'transportation' code is to make it strongly typed. Each variable loaded from the database has a consistent data type that stays with it as it travels throughout the system. If the field is an integer in the database, it is read as an integer, passed through the system as an integer and edited as an integer. The type stays consistent.
Beyond the transportation code lies the user interface. In most systems this is predominantly the largest chunk of code. Although the convention is to not decorate the data with 'presentation', normally the different models in the back-end have been specifically created based on different presentation needs, so some of that presentation information is already implicitly encoded into the data.
This mostly occurs because  the panel/widget code is often thick, ugly and confusing, so the  programmers push the different views of the data back down to the  database layer where it won't get mixed into the GUI code. What starts  as good intentions quickly gets obfuscated.
THE REVERSE TRIP
So  far, the data has worked its way up from the database, through to the  interface. It has been strongly typed right out of the database and  throughout its journey.
Once into the interface, if it is only a  half loop, the data is just further spruced up for presentation,  annotated (with things like links) and then dumped to the screen. 
If  the data is involved in a full loop, it is usually displayed on the  screen in a series of widgets that allow for it to be modified. Once the  user's editing work is deemed complete, the data is then heavily  validated. 
Usually this means some set of checks that is more  stringent than just its simple data-type. For example, if the field is  an integer, perhaps the only valid values for it are between 1 and 4. It  is highly restrictive.
Many fields are also cross-checked with  each other, to make sure that the whole set of data fields is consistent  with some set of external rules. 
If this process fails, the  problem is pushed back to the user, to try again. And again if required.  Until the data is finally validated enough to start the journey back to  the database.
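The two kinds of checks described above can be sketched roughly as follows. This is only a minimal illustration: the `FieldRule` class, the `validate` function and the `priority`, `start` and `end` fields are all hypothetical names invented for the example, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldRule:
    """Hypothetical per-field rule: stricter than the bare data type."""
    field: str
    check: Callable[[object], bool]
    message: str


def validate(form, field_rules, cross_rules):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for rule in field_rules:
        if not rule.check(form.get(rule.field)):
            errors.append(f"{rule.field}: {rule.message}")
    # Cross-checks receive the whole form so they can compare fields.
    for check, message in cross_rules:
        if not check(form):
            errors.append(message)
    return errors


FIELD_RULES = [
    # An integer alone is not enough; only 1 through 4 is acceptable.
    FieldRule(
        "priority",
        lambda v: isinstance(v, int) and 1 <= v <= 4,
        "must be an integer from 1 to 4",
    ),
]

CROSS_RULES = [
    (lambda f: f["start"] <= f["end"], "start must not be after end"),
]
```

If `validate` returns any messages, they go back to the user for another try; only an empty list lets the data start its trip back to the database.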
On the reverse trip the data is also commonly  strongly typed. It goes from the interface code through some  transportation back to the database code. Many systems use entirely  different sections of code and different transportation mechanisms for  the second half of the loop. It is frequently seen as a different  problem from fetching the data.
Now, most schemas have some serious validation checks encoded into the database as well. It is an essential part of a well-designed schema. If the application requires a relational database, it should use that database properly and to its fullest extent.
The incoming data must be consistent in  order to get added or updated in the database. However, most database  code assumes that the incoming data is fine, and then blindly tries to  save it. 
A common problem is basing user edits on stale data --  data that has changed since it was retrieved -- but this one is rarely  solved in most systems, usually just ignored.
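For the stale data case, one widely used approach (not something this post prescribes) is to keep a version number on each row and refuse the update if it has changed since the user loaded it. The sketch below assumes a hypothetical `customer` table with a `version` column, and uses an in-memory SQLite database purely for illustration.

```python
import sqlite3


def save_if_fresh(conn, row_id, new_name, version_read):
    """Update only if the row still has the version the user originally loaded."""
    cur = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, row_id, version_read),
    )
    conn.commit()
    # No row matched means someone else changed it first: the edit is stale.
    return cur.rowcount == 1


# Hypothetical table, created here only so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 1)")
```

A `False` result can travel back to the user along the same path as any other failed save.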
In the database  code, the data is jammed into the database API, and then added or  updated as needed. If this fails, the code will generally just chuck an  error straight up towards the user. If it succeeds then a success  message is sent instead. 
Sometimes you'll find a second layer of validation checks, just ahead of the database. Usually this is to allow for a slightly different but more specific message than would have been issued by the database in the error case.
INTRINSIC  WEAKNESSES
While this type of code construction is very common,  it makes a trade-off between just getting the code written quickly and  working smartly as a software programmer. 
Simply put, it is more often a result of the programmers needing to get something working now for a deadline, than it is a result of the programmers sitting around spending lots of time thinking about the ultimate way to build their systems. It is a by-product of the pressures to get something built fast.
In these types of systems, there are essentially three main  sections: a) the database code, b) the transportation code, and c) the  interface code. 
Starting backwards, the interface code is the  mechanism to allow the users to use some functionality within a specific  context. That is, the user really wants to do something to the data in  the system; their choices are to edit it, or to view it in some fashion  (that may take significant computing power). Because of this, we can  view the interface code as just remembering the user's context, and as  an entry-point to launch some functionality for the user. 
In  most big systems, there are literally thousands of different individual  'user-functions' that can be accessed. These can be large like editing  data in a form, or small like re-sorting some data in a table. Each and  every action by the user executes some user-function of some type.
This  is usually the most code in the system, and the worst code in the  system.
Generally, most programmers short-cut the  object-orientation of their user interfaces so that the code becomes  very big and bulky. There is a tendency to create larger objects that  contain more underlying screen element construction code, like setting  up panels or widgets. Full object-orientation is very rare, particularly  in the more modern programming languages like Java and C#. 
Most  of this interface code is very specific to individual screens in the  system. Most screens are only used once within the system. There are a  huge number of redundancies.
The transportation code moves the data back and forth between the interface and the database. Although the transportation code is simple, because of an insistence on strong typing, it is not uncommon to find systems out there with lots and lots of very specific transportation code.
Usually it comes across  as just glue code, binding one interface to another, but with a large  number of specific variable copies. Sometimes you'll find a  number of  different data transformations as well, where the programmers have gone  to extra work to fiddle with the data in some way, in the middle of  transport. 
At the opposite end of the spectrum, sometimes the data is just expanded right out to a full, but highly redundant representation that could have been easily compressed, because the programmers thought it was too much effort to pack the data a little tighter. Simple tricks like pulling out any common absolute partial strings, leaving only relative ones, can help cut down on data duplication, but are rarely used.
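The common-string trick mentioned above could look something like this. It is a minimal sketch with hypothetical function names; `os.path.commonprefix` from the standard library does the work, and note that it compares character by character rather than by path segment.

```python
import os.path


def pack_strings(strings):
    """Send the shared leading text once, plus the part unique to each string."""
    prefix = os.path.commonprefix(strings)
    return prefix, [s[len(prefix):] for s in strings]


def unpack_strings(prefix, tails):
    """Rebuild the original absolute strings on the receiving side."""
    return [prefix + tail for tail in tails]
```

The saving grows with the number of strings and the length of what they share, which in most business data is considerable.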
Mostly,  the bulk of the transportation work goes into setting it up on one  side, and then re-sorting it into a different shape on the other.  Different schools of philosophy and coding style have very different  preferences in how to build this part. Many prefer brute force.
The final section of code is the database code. In a language like Java, much of it is direct SQL issued through APIs like JDBC. In C#, the programmers can save a bit of the ugly syntax with LINQ. Most software development environments allow for some type of direct SQL access to the database.
Later implementations of alternative  paradigms like object-relational-mapping (OR/M) try to put an abstracted  layer over this code to make it less redundant and more consistent. 
Still,  one key problem is that the data representation in a relational  database is both universal and less expressive than the application  representation. As such, there are a lot of things that the code can do,  that are not easy, convenient or even possible with the database. 
The  difference in expressibility leads to a very common software  development mistake, where some developers try to build the system  backwards, starting from the interface, and doing the database layer  last. 
This top-down approach is easier to code, but it is more likely that there will be issues when trying to force the application perspective into a database, than if they had started with a bottom-up approach.
Also, the top-down approach leaves the data in a  highly de-normalized state which is application specific. If the data is  only usable in its persistent form by one application then it is a  technological waste to store it into a complex container like a  relational database, when there are far simpler and more expressive ways  to store the data. Still the convention is to not think that way.
CODE  BLOAT
These three sections, in most systems, get quickly filled  with lots and lots of redundant code. Most programming efforts expect  and allow this. That is, they set up the architecture, add in some  common libraries, and they start letting these three sections grow large  and fat.
There is usually more re-use initially, but as the work  progresses and the teams change slightly, more and more programmers  tend to avoid the old code, and just write new stuff. Bloat is normal.
What  inevitably happens is that the code gets more and more redundant. Each  new user-function gets its own specific interface, transport and  database code. Most times, the teams are conscious enough to not  duplicate data in the actual schema, but even then you can often find  the same data stored in multiple different tables.
Redundancies become commonplace. One clear way to see this is by breaking the code up into two. Given two distinct sets of screens, how easily can you split the system into two separate programs? How much code will they share as common libraries?
If this decomposition is easy and  includes the bulk of the code, then the screens are 'independent' from  each other. 
In a non-independent system, most of the code in the  system is shared between the different screens. Each screen has very  little unique code.
In an independent system, you could keep  breaking the system into two new pieces, recursively, until all of the  screens exist in their own programs. If it wasn't for context, you could  also break down all of the user-functions into their own programs. A  thousand little programs would work just the same as one big one  (although navigation from program to program would be tricky).
Independence  means that there is more and more specific code that has a 1-to-1  correspondence with some user-function. The code is only executed by a  very specific entry-point into the system. As that code is replicated  throughout the system, it becomes worse and worse. 
We can define  a metric to count how well the code is being re-used. We'll call it  'leverage'. The leverage value L, for a line of code in the system, is  the number of user-functions which require it. If the code is used only  by one specific piece of functionality, then L=1. If it is shared in two  places, then L=2. If it is never used then L=0. If there are N  user-functions that need it, then L=N. It is a straight-forward count.
We  can combine the different L values for the various code blocks to get  one overall value for the system. Independent systems have a lower  average L value. The more independent the code, the more it is  redundantly doing the same work over and over again. The more likely  that changes will cause new bugs.
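A rough sketch of the leverage count follows. The per-block L is exactly the count defined above; how to combine the values into one number for the system is not pinned down here, so the line-weighted average used below is only one assumed choice. The block names, line counts and user-functions are made up for illustration.

```python
def leverage(block, usage):
    """L for one code block: the number of user-functions that require it."""
    return len(usage.get(block, ()))


def average_leverage(line_counts, usage):
    """Line-weighted mean of L over all blocks (the weighting is an assumption)."""
    total_lines = sum(line_counts.values())
    if total_lines == 0:
        return 0.0
    weighted = sum(
        lines * leverage(block, usage) for block, lines in line_counts.items()
    )
    return weighted / total_lines


# Hypothetical system: block -> lines of code, block -> user-functions needing it.
LINE_COUNTS = {"transport": 200, "edit_customer_screen": 600, "dead_helper": 200}
USAGE = {
    "transport": {"edit_customer", "view_customer", "edit_order", "view_order"},
    "edit_customer_screen": {"edit_customer"},
}
```

In this made-up system the shared transport block has L=4, the screen code has L=1 and the dead helper has L=0, so most of the lines sit at the low end and the average stays close to 1.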
Dense systems have a higher L  value. If the value closely matches the amount of functionality in the  system, then the bulk of the code is being greatly re-used, and even  minimal testing has large coverage. Systems that have higher L values  are more likely to have higher overall quality. There is way less code  in the system, and more bugs are found with less testing. All good  attributes to have in a code base.
We clearly want to try to  maximize the leverage in our systems. Less work, means we can do a  better job. Nothing is worse than flailing away at poorly structured  code.
FIXES AND SOLUTIONS
If you have a big system with thousands of user-functions, then even if the functions are all relatively simple and only require a thousand lines of code each, a completely independent system is easily over a million lines.
And  it is that multiplier effect that is so poorly understood by many  programmers. 
There is a limit to the speed at which we can  write, and as the application becomes bigger and more redundant, our  coding speed gets slower and slower. Inconsistencies get larger and more  dangerous, causing the bug fixing to become another huge problem. 
Mostly  the lower the leverage value in the system, the harder future  development will become, often further lowering it. It is a downward  spiral.
The only real solution is to try to increase the leverage to its highest possible value. Only in a highly leveraged system will the developers be able to keep up with the work load, and most development will get easier with time, not harder.
The first,  most obvious change is to the transportation code. Wherever it is, in  any system, it really doesn't need to know the type of data it is  handling. It is crucial to minimize parsing, and conversions, but we  also want to leverage any block of code as highly as possible.
Even  though code is handling the data, the code should always have a minimal  understanding of its data. That is, a container like a tree, doesn't  need to know about its underlying objects, it just needs to hold them.  And the same is true in the transportation section.
The data has  to come from the model, and the model has to alter the incoming data,  but from there all data is the same as any other data. It should go into  one large generic container. 
In that way, each and every  user-function that uses an edit loop, should go through the same  transportation code. If there are fifty user-functions that need a full  or partial loop, then L should equal 50 for the transportation code. 
Modern  languages allow for introspection, which allows for further  encapsulation of the packaging and unpacking code. The method of  transportation for the system should be identical for all parts of the  system. Writing duplicate, but slightly different code sections is  neither necessary nor reasonable.
From a programming standpoint,  it should be trivial to package all or parts of an underlying model into  a transport container. It should be a one-line statement. It should  also be trivial to get the data from a screen and send it back to the  back-end. These are hard, but good qualities to achieve.
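As a minimal sketch of what a one-line packaging call might look like, the example below uses introspection (`vars`) so the same two functions serve every model. The `to_transport` and `from_transport` names and the `Customer` and `Order` classes are hypothetical, and JSON is just one convenient container; it only handles values that JSON can represent.

```python
import json


def to_transport(model):
    """Package any model object without knowing what fields it has."""
    # vars() reads the instance's attributes, so no per-model copy code exists.
    return json.dumps({"kind": type(model).__name__, "fields": vars(model)})


def from_transport(payload):
    """Return the (kind, fields) pair on the other side of the transport."""
    data = json.loads(payload)
    return data["kind"], data["fields"]


class Customer:  # hypothetical model
    def __init__(self, name, credit_limit):
        self.name = name
        self.credit_limit = credit_limit


class Order:  # hypothetical model
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items
```

Every user-function that moves a `Customer`, an `Order` or any later model goes through the same two functions, so their leverage rises with each screen added.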
STRONG  TYPING
Strongly typed code is often viewed as the "correct" way  to build systems these days, because it pushes some of the validation  work onto the compiler. While that can be useful in detecting some types  of compile-time bugs, the compiler cannot check the running content of  the data, so there are an awful lot of bugs that can't get caught this  way. 
On the other hand, any programmers with experience in  loosely typed languages know that there is always less code, and that  the underlying code is always intrinsically more reusable. Loose typing  makes for flexible code. Strongly typing makes it more rigid, which is  sometimes useful, but only in specific circumstances.
If we are  really interested in significantly reducing the size of our systems,  then we are interested in any type of approach that reduces code. 
By  keeping the data as loosely typed as possible, we can move it through  the system without having to create a lot of specific code to manage it.  For instance, in Java, since everything can be of type Object, then  everything can be loosely typed. Even the containers like lists and  trees can be packaged as Objects.
When we finally need to use the data, we can convert it to a more convenient type. This process is natural in a loosely typed language -- usually being handled quietly and automatically -- but can easily be emulated in a strongly typed one. If the interface needs a date from the database we can grab it, cast it to an Object, and move it through the system and into position. Only at the last moment do we need to re-convert it back into an actual date string for display.
At the end, this may seem like an  unnecessary conversion, but truthfully it is a small cost compared to  managing all of the strongly typed code in between. Software development  is always about trade-offs, and this is one of the better ones.
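Here is a small sketch of carrying the data loosely and converting only at the end. The three functions are hypothetical stand-ins for the database, transport and interface layers; the point is only that the middle one never looks at the types.

```python
from datetime import date


def fetch_row():
    """Stand-in for the database layer; returns values of mixed types."""
    return {"name": "Ada", "joined": date(2010, 3, 29), "visits": 12}


def transport(row):
    """Moves the values along without inspecting or converting any of them."""
    return dict(row)


def display(value):
    """Convert only at the last moment, when the screen needs a string."""
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d")
    return str(value)
```

Adding a new field to the row requires no change to `transport` at all, and a change to `display` only if the new type needs special formatting.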
The presentation layer generally needs to 'stringify' the data anyways, so most often it isn't even extra, and it might just be an optimization in some cases. If your underlying transportation code is forcing all of the data into strings, then the final conversion might just be moot.
But  it shouldn't matter to the programmer, because the transportation code  should abstract away all of the underlying details. String, integer or  other data-type, the programmer shouldn't know or have to know about the  in-transport data representation, it should be encapsulated. 
All  they need to know is what their final data-type for the user should be,  and if they are using some interface abstraction, they may not even  need to know that much.
The less a programmer needs to know about  what is underneath, the less they should know. Many programmers tend  towards being too specific in their handling of data, often writing  large amounts of excessive, unnecessary code. It's a common form of  over-complicating the code. Stopping this behavior, or at least  containing it, helps in keeping the code from getting under-leveraged.
DUPLICATE  DATA VALIDATIONS
For most systems, the interface has very strict validation requirements and so does the database. As previously mentioned, the database validates and returns either an error or success during an update or save.
It is not possible to keep the database from  ever returning an error, as there will always be unforeseen events like  being out of disk space, or communication problems. 
Because of  this, there must be a passageway through the system for database errors,  which one might as well utilize for other things as well. The more we  reuse the existing mechanisms, the less specific testing we'll need. 
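One way to picture that single passageway: every database failure, expected or not, is wrapped in one error type that the interface already knows how to show. The `SaveError` class, the `person` table and the function names are hypothetical, and SQLite is used only to keep the sketch self-contained.

```python
import sqlite3


class SaveError(Exception):
    """Single error type that carries any save failure up to the interface."""


def save(conn, name):
    try:
        conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        conn.commit()
    except sqlite3.Error as exc:
        # Constraint violations and unexpected failures alike take this route.
        conn.rollback()
        raise SaveError(str(exc)) from exc


def save_and_report(conn, name):
    """What the interface calls: one success message, one error route."""
    try:
        save(conn, name)
        return "saved"
    except SaveError as exc:
        return f"could not save: {exc}"


# Hypothetical table whose schema does the validating.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL UNIQUE)")
```

Because the schema's own constraints report through this route, there is no need for a second copy of those checks sitting just ahead of the database.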
Code  re-use always cuts down on testing.
So ultimately, we really  don't want to do much duplicate validation at the database level, and we  certainly don't want to do any of it at the transport level. Display  data is validated by the fact that it came directly from the database. 
Since  editing always requires strong validation, if we stack our validation  code into one place, then it is not duplicated or spread all over the  system. Editing is the best candidate.
Implicitly it does also  exist in the database schema, but that is OK. The database schema  encodes a universal view of the data, which can have its own different  validation, and differences should be accommodated with the reading and  transformation of the data. The differences may be small, but they need  to be encapsulated together.
In that way, although we want strict  validations in the interface, the rest of the system shouldn't know,  and shouldn't care about its data. That is, it should just be some  loosely typed 'thing' that is going someplace. 
Code shouldn't know any more about the data than what is absolutely necessary.
Now  once at the front-end, both the presentation and editing do require the  data to be strongly typed. In fact, they require 'strict' typing, where  the type is far more limited than just a simple data-type, it is tied  by its domain and by being cross-referenced to one or more other  variables.
FRONT-END ABSTRACTIONS
One place where  programmers spend way too much time is on the front end. The code is  highly repetitive, and none of the current framework paradigms attempt  to abstract that down to something smaller. 
Common practice is to create a small set of unique objects per screen, or action, and then fill these with highly-redundant low-level GUI code, such as allocating widgets, or handling events.
Model-view-controller (MVC)  frameworks bring this down marginally to dealing with actions, but since  they usually have a one-to-one correspondence with screens, there is  still a large amount of nearly identical code.
Mostly, all of the  interface code is about displaying data or editing it. The displays can  be textual or image based. They include embedded user-functionality to  help drive the appearance of interactivity. They are all very similar in  nature, and always need to have some overriding consistency.
But  if the code is spread out into a large collection of redundant  sections, keeping it neat and tidy quickly becomes impossible. What's  needed is an overlaying abstraction to minimize the code and enforce  consistency.
A good abstraction provides a higher-level  structure, in an attempt to reduce the amount of code necessary to work  with an encapsulated set of behavior. That is, the programmer should  have to do way less work, when working with a good abstraction. It not  only provides structure, but also reduces effort. 
An abstraction that works nicely is to fit a 'form' over all of the interfaces. Read-only code fits into a read-only form, while editing works mostly within a standard forms model.
In a reasonable implementation, the programmer would need some minimum way of defining the form, and then populating it with initial data. A good abstraction would hide most of the normal interaction with the user, allowing for only special cases to flow through. This is necessary because the programmers shouldn't have to reinvent the mechanics for handling simple technical things like paging lists of data from the database.
Keeping the programmers  away from re-inventing the smaller technical solutions, also helps to  enforce consistency in the behavior of the interface.
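To make the idea concrete, here is a minimal sketch of such a form abstraction. The `Form` class and its parameters are hypothetical; a real one would also render widgets and handle events, which is left out here. The parts shown are the minimal definition, the population with initial data, and paging handled once for every screen.

```python
class Form:
    """Hypothetical form abstraction: declared once, reused by every screen."""

    def __init__(self, title, fields, read_only=False, page_size=20):
        self.title = title
        self.fields = fields        # ordered list of field names to show
        self.read_only = read_only
        self.page_size = page_size  # obvious default, rarely overridden
        self.rows = []

    def populate(self, rows):
        """Load initial data, keeping only the fields the form declared."""
        self.rows = [{f: row.get(f) for f in self.fields} for row in rows]
        return self

    def page(self, number):
        """Paging lives here once, not re-implemented per screen."""
        start = number * self.page_size
        return self.rows[start:start + self.page_size]
```

A screen then becomes a short declaration such as `Form("Customers", ["id", "name"], read_only=True).populate(rows)`, rather than a block of widget construction code.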
Some care  needs to be established because frameworks can occasionally make things  too implicit. That is, APIs often present the wrong types of options to  the programmers. Simple obvious values should be set into the defaults,  and the only options that need explicit overriding should be those that  really represent a degree of freedom within the method call (although  multiple calls are a better choice sometimes). 
Restricted  validation and some cross-conditional handling is always necessary. When  ready, a good abstraction will present the final and completed data to  the programmer, ready for transportation.
Of course, many  applications will have special corner-cases, so the abstraction will  need to allow the programmer to 'hook' in code at specific points. This  can be complex because these hooks are intrinsically disconnected from  the rest of the system, making them less obvious and harder to  understand. A good interface should read very simply, and so the nature  and purpose of the hooks should be obvious and easy to grasp. 
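A small sketch of hook points, assuming a hypothetical `EditForm` class with a fixed, named set of places where screen-specific code may be attached. Keeping the set small and rejecting unknown names is one way to keep the hooks obvious.

```python
class EditForm:
    """Hypothetical form with named hook points for corner cases."""

    HOOK_POINTS = ("before_validate", "after_validate")

    def __init__(self):
        self._hooks = {point: [] for point in self.HOOK_POINTS}

    def hook(self, point, func):
        """Attach screen-specific code; only the published points are allowed."""
        if point not in self._hooks:
            raise ValueError(f"unknown hook point: {point}")
        self._hooks[point].append(func)

    def submit(self, data):
        """Run the standard edit flow, calling hooks at the two fixed points."""
        data = dict(data)
        for func in self._hooks["before_validate"]:
            data = func(data)
        if not data.get("name"):
            raise ValueError("name is required")
        for func in self._hooks["after_validate"]:
            data = func(data)
        return data
```

A screen with a special case registers one function, for example to trim whitespace before validation, and the rest of the flow stays shared.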
BACK-END  COMPRESSION
One of the hardest places to reduce overly redundant  code is in the back-end data models. Still, there are some key data  elements contained in the database which are of interest to the  application. If you're strict about minimizing these, and keeping out  presentation information, the code can be reduced. 
Most schemas have inherent redundancies in their tables, such as date stamps and user auditing information. These can be encapsulated, and reused over and over again. Small convenience libraries can be used to make any of these common fields share one implementation.
As well, even if the schema is fourth normal form, some of the tables themselves can be brought together at higher levels. Although this type of generalization can cut down on code, you have to be very careful not to over-do it and make the schema itself impossible to share across applications. It becomes a set of very hard trade-offs.
If the data is only ever used in one application, and has some alternative import/export format for compatibility with external systems, then using some other non-relational format may be a better choice. Ideally, the best solution is the one that requires the least work to load and finesse the data.
Thus in an object-oriented language, a real object-oriented persistence mechanism that is super-simple is the best choice, although consideration has to be made in correctly handling system data upgrades at some later point.
The best solution is to just boil it down to its absolute minimum, and take advantage of techniques like polymorphism wherever possible. Less code means less duplication, less redundancy, less testing and fewer bugs.
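For instance, the audit columns could be handled by one small helper shared by every table's save code. The `stamp` function and the four column names below are hypothetical; the `now` parameter exists only so the behaviour can be checked with fixed times.

```python
from datetime import datetime, timezone


def stamp(record, user, now=None):
    """Fill the shared audit columns on any record, whatever table it is for."""
    now = now or datetime.now(timezone.utc)
    stamped = dict(record)
    # Creation details are set once and never overwritten.
    stamped.setdefault("created_at", now)
    stamped.setdefault("created_by", user)
    # Update details change on every save.
    stamped["updated_at"] = now
    stamped["updated_by"] = user
    return stamped
```

Every model that carries these columns calls the same function, so the auditing rules are written and tested in exactly one place.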
SUMMARY
Inexperienced programmers are generally more concerned with getting their code to run than they are with keeping it so. Because of this, they rely on the simpler and more obvious brute force approaches towards development, which significantly increase the amount of code, reduce the leveraging and increase the work in testing it.
A system with a few  hundred user-functions might start out OK, but if most of the code is  poorly leveraged, that quickly changes as the project matures. 
Poorly leveraged, highly redundant code is the most common critical problem with most software development. You see it everywhere. An initial release might be successful, but the accrued technical debt grows more rapidly than the resources to offset it. Stagnation or implosion are the expected consequences.
Ultimately, if the system is so highly  leveraged that just logging into it tests a large swath of the  transportation and back-end code, then the overall quality of the system  will be high. It will be high because it was intrinsically built into  the architecture. A few simple tests will cover large code sections,  which makes for a trivially well-tested system.
Ultimately, it's  not how many bugs you have in the system, it is how easy they are to  detect that really matters. 
As well, writing the initial code is  not particularly hard. Given enough concentration most people can put  together a sequence of instructions to tell the computer to do  something. The real trick to programming, is to be able to keep this  code sane, version after version. 
Without structure, and with a  low leverage value, the code will degrade rapidly as it gets pushed and  expanded. 
On the other hand, strong abstractions in highly leveraged code are the gold standard of programming. They make extending the code -- any part of it -- easy. A sign of good code is that it should be easier to grow the system than it is to re-write it.
We  don't want huge, inconsistent systems. We don't want independent code.  Programmers never start out with this as their goal, but it is the  inevitable consequence of many of our standard programming best  practices. 
Given that, with a little more effort and thought, the work involved in most systems can easily be reduced by orders of magnitude, it is surprising to see how rare this is in practice. Programmers are usually their own worst enemies, even though few stay in the profession long enough to correctly understand this.
Software is a static list of instructions, which we are constantly changing.
Monday, March 29, 2010
Monday, March 15, 2010
The Value Proposition
Suppose for a moment that you had a great idea for a new but  non-trivial piece of software. Somehow, through a bit of luck, you  managed to secure just enough money to get it built and released, but  only enough for you to do all of the work by yourself.
That is, you've come into possession of an empty company with just enough money to pay your salary for a couple of years, but nothing else. The money is guaranteed, but until you find some other source of income that's all you are going to get.
It's a simple scenario, but a strong one in being able to evaluate software development methodologies. It works well, because your time is limited, and for every choice you make there is an opportunity cost in terms of what you could have been doing instead. It forces you to make better choices.
If you spend your days writing long-winded comments for instance, you've severely cut down on your development time. Given that you can't afford another programmer, comments aren't all that useful, are they? Who's going to read them?
Of course, in this scenario you can't win until you get the software finished, packaged, and shipped to people. A whole bunch of people. But initially you don't even have a market, or a way to generate sales, just a great idea.
For the sake of simplicity, we'll say the idea is so great that in the hands of a reasonable company it will definitely generate revenue with some type of reasonable profit. That is, it is a winner, even if it's not a big one.
To get it going, you'll have to get several versions built and released to a number of different clients before the revenue becomes consistent enough to hire new people to replace yourself.
Still, just knowing that the idea can win, doesn't lead one to being able to make it so. Initially you'll need demos, marketing materials, and enough time set aside to practice a song and dance about why people should commit to this product. You are not only the software developer, you're also the salesperson, or at least the pre-sales engineer.
PRIORITIES
You realize that this is a chance of a lifetime, to be able to create your own unique product from scratch, so you'd be crazy to turn it down. Some things are just worth doing.
The first priority would be to set up an environment in which to work. A development machine is obvious, but some type of source code control would definitely help in tracing problems, keeping track of changes, and tracking down bugs. A separate source server would be easier to back up.
You'll have to set up the whole environment, including the development and test machines, so you'll probably want something really simple and easy to maintain. You don't want to lose a lot of effort to system admin tasks, and you don't have enough money to outsource this issue.
Of course for tools, you don't want to waste a lot of time learning some fancy environment; the simplest, but most straight-forward tools are always the best. You need to edit, and search the code, but you'll also need some powerful debugging from time-to-time. You really don't want to lose a week or two in trying to track down a stupid bug with only print statements.
Assembling a few lines of working code is not a hard problem, but as that code grows and grows, you'll quickly find that "structure" at multiple levels is important. It is there to separate the code, encapsulating it so it doesn't come back to haunt you. It also makes it way easier to triage bugs, and relate them to specific sections of code, something that will come in really handy later, when the first support calls start to eat into development time.
Structure is fine, but given that you'll be working on this system for years, and given that you'll need to do the sales yourself, there will be plenty of interruptions, so you'll need to write down your plans.
Of course, spending six months to produce the perfect documentation, with arrows and charts and what not, is a killer waste of time. You only need enough documentation to remind you of exactly what you planned to do. Just enough to keep you honest. But enough to remind you what is important on extremely busy and disruptive days.
Early on, demos will be necessary, so version control plays an increasingly important role. You must be able to code full steam, but fall back to a consistent version for a demo. Buggy demos hurt, and getting sales is your only chance at getting help. Of course a separate demo machine is important, and so is being able to update it quickly to the latest and greatest version. Another side issue is being able to quickly fill it with demo-related configuration and persistent data. An empty system doesn't demo well.
Now, in thinking about your design, you realize that if you just belt out the code you'll need way over half a million lines. Given that you've only got a couple of years, and you can't pound out that much code that quickly, you have to resort to re-using as much code as possible to reduce the code size into something manageable. Abstraction is the only way to do this. You find an intelligent way to implement some internal generic structure that allows you to hang all of the required functionality onto the same infrastructure, but without continuously duplicating each piece separately. Done well, the code comes down into the thousands of lines range.
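As a rough illustration of that kind of generic structure, here is a minimal Python sketch. Everything in it is hypothetical and invented for the example (the `Field` class, the `SCREENS` table, the two screens and their fields); it is one possible shape for the idea, not a description of any particular system. The point is only that each new screen becomes a few lines of description, while the display and update logic is written once.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Field:
    """Describes one editable value: storage name, display label, parser."""
    name: str
    label: str
    parse: Callable[[str], Any] = str


# Hypothetical screens, described as data rather than as separate code paths.
SCREENS = {
    "customer": [Field("name", "Name"),
                 Field("credit_limit", "Credit limit", float)],
    "invoice": [Field("number", "Invoice #", int),
                Field("total", "Total", float)],
}


def render(screen: str, record: dict) -> list[str]:
    """Produce display lines for any screen from the same generic code."""
    return [f"{f.label}: {record.get(f.name, '')}" for f in SCREENS[screen]]


def apply_edits(screen: str, record: dict, raw: dict[str, str]) -> dict:
    """Return a copy of the record with the user's raw text edits parsed in."""
    updated = dict(record)
    for f in SCREENS[screen]:
        if f.name in raw:
            updated[f.name] = f.parse(raw[f.name])
    return updated
```

Adding a third screen in this sketch means adding one more entry to `SCREENS`; `render` and `apply_edits` stay untouched, which is where the reduction in code size comes from.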
Still, there are way more features than you initially need to get a sale, and you really want to get sales early to build up both experience and capital. As such, the 1.0 version of the system has got to be ready early, and you'll keep extending it by adding in new features until it gets closer and closer to the product you imagined. While structure was really great in encapsulation and bug fixing, it really becomes crucial in handling how you will extend the system. The structure delineates what you think are the permanent lines, not the ones along which you'll end up replacing or refactoring code. In this way, the modules, libraries, components and other pieces make it far easier to do the extensions, particularly in small improvements. The pieces also make it easier to test just a small subset of the overall system. Testing, too, is another time-consuming problem that you have to watch out for.
Given that the environment will be chaos once the product is out there, but you'll still need to be making a large number of improvements, any effort in structure or in abstraction will pay huge dividends. In fact, you won't be able to win unless you invest enough effort in both.
VALUE
The common denominator, again and again, is in doing only the most minimal amount of work possible to get to the results. If you get caught up on a side-track or a make-work project, it's a painful waste of your already limited resources.
The easy way to really value work, then, is by tracing it backwards from some necessary requirement. Something you just can't live without. You need software to sell, so you have to write the code. People want it to work, so you'll have to test it. Other people will need to understand it, so there'll be documentation.
You need documentation for the users, so you'll have to both make the interface simple, and provide some trivial documentation. Of course, if you make interface changes frequently you certainly don't want to create a huge amount of extra work by having to keep the user documentation up-to-date. It should be simple, and as resistant to change as possible.
Since you have to sell, you'll need marketing documentation. People always want to know about the features, or how to use the product, or a tutorial, or any number of other necessary 'sales aids'. These documents are usually summaries, but they too need to be kept in sync. In fact, they often need to be updated ahead of the code, given that you are trying to generate interest in upcoming features.
Nothing looks worse than poorly packaged software. While it's common with some shops, vendors do not look professional until they've wrapped their works in nice installers. It may not seem like much, but generally even a simple installer can take weeks if not months, and as the systems progress they need continuous updating as well. Many modern technologies are utterly pathetic to install, so it makes the problems even worse. Suddenly you have to be concerned about the dependencies, like databases, containers and libraries, not just for their technical abilities, but also for their capacity to interfere with the installer.
Of course, while you're on dependencies, you can't forget the massive number of sticky, icky licenses that are floating about. You need revenue, and you don't want to make your competitors' lives easier by giving them the system, so you'd like to retain control of the source code. That means you have to be careful in choosing which libraries you use underneath. Some of them are legal landmines, just waiting to bite.
One easy thing to forget is support. In the initial days, for your first sales, you still have to add new features, but you'll also be both customer service and support. You'll be the on-site post-sales engineer, and will definitely be there for the first few installs; it is both necessary and prudent. System administrators will call with technical problems, but also with other weird and wonderful issues, like which standards you support. Users will call, and it won't matter how simple the interface is, or how good the documentation is, they will still call. They do this mostly because they know you are small, and they can be lazy. And of course there will be bugs.
The worst part about many of the bugs is that they come out of nowhere, when you are busiest, and just eat up huge segments of time. Just trying to understand and replicate them takes effort, but then if the problem requires documentation and/or patching, a few days or a week can just disappear. No matter how good the coding and testing is, there will always be bugs, and some will be very difficult to find. Support is always underestimated, and can be a huge drain on resources.
The initial panic to set up the environment, then create a design and get coding as fast as possible, gives way to a much more disruptive environment once the first sales get going. Of course, you have to sell to get more people, and you really don't want to just hire someone immediately, because the initial cashflow is unsteady. Hiring costs time, and laying off your first employees is both time consuming, because you procrastinate, and demoralizing, even if they take it well.
SUMMARY
Winning in a scenario like this takes an ability to really value work. In an environment with a lot of resources, it is very easy to place a value on useless work because it is deemed to fit some idea or process. But when there is just enough between you and success, you quickly find that a lot of what people think helps them is really just excess baggage.
Of course, some things that you might not think have value come into play really strongly. Two examples are simple code and simple interfaces.
Simple code doesn't mean brute force, but it does mean that the code does no more than exactly what it should. Abstractions massively cut down on the code size, so they are absolutely necessary, but intricate, fragile and complex code is not. Code needs to be neat, consistent and easily readable, particularly late at night or when you are really busy.
Simple interfaces are another important feature, in that they both cut down on user documentation and support calls. If your system contains some uber-complex algorithm for handling stuff, it will provoke significant effort to educate the users, and that could cost you the win. A bad interface can eat a lot of time in creating it, but also a lot more in trying to support it. These are not resources you can afford.
What's interesting about a lot of our best practices in the software development industry is that they fail very quickly in this type of scenario. The same bloat and excess we put into the code also goes into the making of the code. At the same time, you'll also find developers that have swung too far the other way. They panic, becoming cowboy coders, avoiding all process and organization. Either too much or too little causes problems. In both cases, as the projects grow, the complexity increases and the resources get squandered on poor value propositions. If you have plenty of resources to waste, this might not be a problem, but even in large companies few developers have any real spare cycles.
Tuesday, March 9, 2010
97 Things Every Programmer Should Know
It is probably a good time for me to plug the latest in the 97 Things series. It came out last month and I managed to get a story of mine included:
97 Things Every Programmer Should Know, edited by Kevlin Henney.
This is another great work containing a wide range of writers interested in sharing their experiences and understandings. Something our industry desperately needs.
For this edition I threw in one of my favorite stories from around the age when I really learned programming. I was lucky enough to have a mentor, which made a huge difference to my abilities. I had no real idea what I was doing when I started.
You can work as hard as you want, but you can't get beyond just pounding out the obvious until you gain access to a larger knowledge base. That is, until you climb up on others' shoulders, you're just getting the same view as everyone else. You're just learning to do what they did already.
We may produce tonnes of reference material, and have pretty good online Q&A forums, but they are really only band-aids to help programmers fix things when they go wrong. Our biggest problems come from the way coders are building their systems. From the way they are just belting out code.
A mass of ugly poorly structured code may not be immediately visible, but you can always tell the poor construction from the disorderly interfaces, annoying over-complexity and quirky behavior. The true nature of code always shines through!
Things won't change until we find ways to teach the "higher" principles to new coders; there is far more to programming than just assembling massive lists of instructions. Things won't change until people understand that flailing away at their code is unnecessary, unproductive and just adds to the problem; brute force always produces messy systems. Things won't change until we learn how to pass on our skill sets; left on their own, most new programmers will just pound out the same messes over and over again.
That is why the 97 Things series is such a great thing for an industry that is clearly having trouble trying to grow. A huge number of our software development problems stem from the fact that each new generation of programmers poorly reinvents the same wheels, while strategically avoiding the real problems frozen into our mammoth backward-compatible tar pits.
In an age where we have access to massive data, powerful machines and can connect to computers from anywhere, it is only our software that continues to be a letdown.
