Saturday, November 21, 2020

Keep Your Eye on the Ball

 It’s so easy to be confused while developing software. 

Each little piece is tiny and not particularly hard to understand, but the real complexity comes from the fact that there are often millions of them.


The end goal of writing software is to get code out there that does a good job of solving one or more users' problems. If you figure out what they need, build it, and then get it into a stable operating environment, they can use the software successfully.


Time is a funny issue in software development. These days, there is always an intense impatience with getting the work done. It's commonly believed that issuing low-quality work now is better than waiting for a higher-quality version. 


That might be true if each piece were really small and independent: you could get an initial version out there and then build on it as necessary. But code actually 'stacks'. Newer code builds on the older stuff. If the older stuff is flaky, then no matter how nice the newer stuff is, it is also flaky. If you try to avoid stacking the code, you either make it massively siloed or totally redundant. The siloing hurts usability, while the redundancies waste huge amounts of time. There is no easy, side-effect-free shortcut that will decrease the time and effort needed. We've known this for at least 50 years.


But it's not just the code itself that can be the victim of impatience, or of the other side of the coin: bureaucracy. It's the whole end-to-end process, which starts with the user talking about some way the computer could help them, and ends with their frequent interactions with the code as they apply it to their tasks. This is a very long series of little steps, where coding is just one small part of what needs to happen. If this 'process' is disorganized, or awkward, or deliberately slowed down, it applies a great deal of pressure onto the code itself, forcing it to be of lower quality. 


Low-quality code is a multiplier. A little bit of it is okay and manageable, but as you get more and more of it, the negative effects are magnified. If you get a large enough pile of low-quality workmanship, it forms an operational black hole that sucks in all other efforts and is nearly impossible to escape from. A really bad system doesn’t have any viable short-term options left to fix it. 


It's super important to be able to 'assess' the quality of organization and workmanship for any ongoing software development project, and it is equally important to be able to 'correct' any issues that are hurting the quality. However, it is extremely easy for people to miss the importance of this. In the rush, the quality issues only seem to get a tiny bit worse each time, and people believe success will come from just getting something crude out there today. But success today comes at the expense of more pain tomorrow. If you keep that up for long enough, eventually the costs are inescapable. 

Sunday, November 1, 2020

Software Development Workflow

 When someone needs a computer to help them, the very first thing that should happen is that all of the specifics of the problem are written down. 

A vague idea about the problem, with a potential way to solve it, is a reasonable place to start, but in order to build something that fully works, all of the details eventually have to be gathered, and it is far more efficient to do that earlier rather than later.


This analysis includes a definition of the problem, the data it involves, the structure and frequency of the data, any calculations that need to be made, expected user workflows, and all of the interconnections to other systems. There are always lots of moving parts even for simple solutions. These details are then put together in a document. This document is added to the library of previous analysis documents used to build up the whole system.
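
As a sketch, the kind of record such an analysis document might capture could look like the following; the fields and the example contents are hypothetical, just illustrating the moving parts listed above:

```python
from dataclasses import dataclass

# Hypothetical sketch of the fields an analysis document might capture.
@dataclass
class AnalysisDocument:
    problem: str                   # definition of the problem
    data: list[str]                # the data it involves
    structure_and_frequency: str   # shape of the data and how often it changes
    calculations: list[str]        # any calculations that need to be made
    workflows: list[str]           # expected user workflows
    interconnections: list[str]    # links to other systems
    version: int = 1

# The library of previous analysis documents that builds up the whole system.
library: list[AnalysisDocument] = []

doc = AnalysisDocument(
    problem="Track monthly invoice totals",
    data=["invoices", "customers"],
    structure_and_frequency="tabular, updated daily",
    calculations=["sum of invoice amounts per customer per month"],
    workflows=["clerk reviews monthly totals"],
    interconnections=["billing system export"],
)
library.append(doc)
```

Even a minimal structure like this makes it obvious when a detail (a workflow, an interconnection) was never gathered.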


The analysis documents are the input for design, both technical and graphical.


A system spans a related series of problems, and a design needs to set out a matching set of solutions. The architecture needs to be laid out, the interfaces arranged coherently, and the interconnections with other systems or components prototyped and tested.


There are a number of overall documents that put caveats, requirements, and restrictions on the final design of a system. Sometimes the pieces are brand new, and sometimes they are just modifications to existing work, so for many of the components there is some effort needed to go back to the older analysis and designs as well. 


The actual technical designs include high, mid, and low-level specifications. The depth of the design is relative to the programmers who will be tasked with building it. Difficult issues or less experienced teams need lower-level detail. 


For interfaces, issues like colors, fonts, design grids, screen layouts, navigation, and widget handling all need to be specified. As well, an interface only comes together when its individual features are easily interoperable. If they are intentionally siloed, the program will be awkward or difficult to use. 


All of this design work is added to the library of previous designs for the system.


At the coding stage, the programmers now have a good idea of what they are writing, how it fits into the existing system, and what data they need from where. The depth of the specifications should match their background experience. They combine all of these different references to produce the code that will be used for both the system and for any diagnostics/testing that occurs in development or operations.


As they work, they often find missing details or ambiguous specifications. They push these issues back to the originating sources, either design or analysis. 


For the base mechanics of their code, they perform very specific tests on issues like obvious data problems, fault handling, logic flow through the permutations, etc. They also perform operational testing on starting stuff up, basic failures, slow resources, big data, etc. Once they feel that the code is sufficiently stable, they submit it to integration testing via the source code repository. 
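
As a small sketch of those 'base mechanics' tests, where `parse_amount` and its rules are hypothetical stand-ins for whatever the code actually does:

```python
# Hypothetical example of testing base mechanics: obvious data problems,
# fault handling, and the normal logic flow.
def parse_amount(text: str) -> float:
    """Parse a monetary amount, rejecting obvious data problems."""
    if text is None or not text.strip():
        raise ValueError("empty amount")
    value = float(text)  # raises ValueError on garbage like "abc"
    if value < 0:
        raise ValueError("negative amount")
    return value

def rejects(bad_input) -> bool:
    """Fault handling: confirm bad input is refused, not silently accepted."""
    try:
        parse_amount(bad_input)
    except ValueError:
        return True
    return False

assert rejects("")        # empty input
assert rejects("abc")     # garbage input
assert rejects("-5")      # negative amount
assert parse_amount("12.50") == 12.5   # normal logic flow
```

The point is that each category of failure named above gets its own very specific check before the code goes anywhere near integration testing.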


Development and integration testing are done relative to the technical issues within the code. It's testing to see that the code is operational and that it performs as the technical specifications require. It does not test the fitness of the solution in solving the user's problems. That is handled by QA. 


After the code is built and integrated into the existing system, it needs to be run through a broader series of tests that relate it back to its usage. The input to these tests is the analysis and design that preceded the code. The things that need to be tested now are the ones that were specified. That is, if the analysis identified a specific workflow, then there should be at least one test that verifies that workflow behaves as expected. So, the user acceptance and correctness testing is driven from the deliverables that existed long 'before' the code was written. These tests should be independent of the programmers, to ensure that their assumptions and biases were not baked into the tests.
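
One hypothetical way to keep that traceability honest is to record which specified workflow each acceptance test verifies, and then check the specifications (not the code) for gaps; the workflow and test names here are invented for illustration:

```python
# Workflows identified in the analysis documents (hypothetical names).
specified_workflows = {"create_invoice", "review_monthly_totals"}

# Each acceptance test records which specified workflow it verifies.
acceptance_tests = {
    "test_clerk_creates_invoice": "create_invoice",
    "test_clerk_reviews_totals": "review_monthly_totals",
}

# Every specified workflow should have at least one test behind it.
covered = set(acceptance_tests.values())
missing = specified_workflows - covered
assert not missing, f"workflows with no acceptance test: {missing}"
```

Coverage is measured against what was specified before coding began, which is exactly what keeps the programmers' assumptions out of the tests.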


Whenever testing is completed, a set of defects is identified. These are prioritized: some need to be corrected before the code can be used operationally, while some are just livable annoyances. This type of testing happens in 'rounds': problems are found and corrected, and a new round is started. Quite obviously, the list of defects should be as close to complete as possible, so that the work involved here is minimized. Testing is most efficient when handled in batches.


At some point, the software still has issues, but everyone can live with them, so it is moved into operations. For some commercial software it is published; for SaaS or in-house code it is put into production. This means that it is monitored by a special group of people to ensure that it is running, that it isn't hitting resource issues, and that it is satisfying the users' needs. 


The code will have common problems; these are documented, and the reaction/solution to each as it occurs is also documented. When operational issues happen, they are tracked, and this tracking is fed back to the developers. If a new, unexpected problem occurs, then the programmers might have to get involved in diagnosing it. If the problem has occurred at least once already, then the programmers should not need to be involved in investigating it. Operations keeps this library of common problems and their mitigations.
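
That library of common problems is, in effect, a runbook. A minimal sketch, with invented problem names and mitigations, might look like:

```python
# Hypothetical operations runbook: known problems mapped to documented mitigations.
runbook = {
    "disk_full": "rotate and archive logs, then restart the ingest job",
    "queue_backlog": "scale up the consumers until the backlog drains",
}

def handle_incident(problem: str) -> str:
    """Known problems get the documented mitigation; only new, unexpected
    problems get escalated to the programmers for diagnosis."""
    mitigation = runbook.get(problem)
    if mitigation is not None:
        return f"operations: {mitigation}"
    return "escalate: new problem, involve the developers to diagnose"
```

Once a problem has been diagnosed once, its mitigation goes into `runbook`, and the programmers never need to see it again.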


It should be noted that a software developer is involved in all five stages, while a programmer is often just working during the coding portion. As the system grows, the library of analysis and designs should grow as well. The code isn’t the only resource that needs to be collected into a repository.  


For a large or complicated system, the process is very complex. Since the time it takes to do the work exceeds the time available, development happens in a very large series of iterations. It is often running in various parallel stages, which significantly amplifies the complexity (and threatens the quality). Deficiencies in any part of the work manifest as bugs in the system. This is expected; we cannot build complex systems perfectly, but we can at least ensure that they converge on being more correct as the project matures. 


The work involved in initially setting up a brand-new project (analysis, design, architecture, environment, configurations, etc.) is substantial and is often rushed. This causes foundational problems that can really slow down the progress of the development over the years. As such, doing a better job of initially setting up the project has a significant impact on its success over its lifetime. Correcting problems early is essential. Also, it is never safe to assume that the initial setup was fine; it's better to double-check that it was set up reasonably, since most often it was not.


Consistency, in any part of the project, cuts down on the total amount of work that is necessary. If the work is getting harder as the project progresses, it is usually because the process itself is disorganized. Keeping everything organized is a huge time saver. Avoiding redundancy is also a huge time saver. Many projects get caught in a vicious cycle of taking too many shortcuts to save time, only to have the earlier shortcuts waste more time than they saved. Once that has started, it can be very difficult to break the cycle.


Sometimes, for rapid prototyping, the analysis, design, and coding stages will all get intermingled. One coder or a small group of people will do all three at once, in order to expedite the development. That gets the code done really fast, but it obliterates any reasonable ability to extend that code. It should be thrown away or at least heavily refactored after it has served its immediate purpose. Weaknesses in organization, consistency, architecture, etc. can force later work to wrap around the outside. New layers effectively freeze or disable the original code, causing fragmentation and preservation of bugs. That can start another cycle, where the existence of the older code makes it increasingly harder to make any new code work properly, eventually halting the development.


If there is no documentation, or if the documentation was lost, you can reverse engineer the specifications from the code, but any of the driving context is lost. You know what the code is doing, but you don’t know why someone chose to implement it that way. That means the logic is either now locked in stone, or you will have to rediscover what someone else already went through. Neither outcome is good, so building up libraries of analysis, design, and problem mitigations is quite beneficial.


It is crucial to never put untested code into production. Since there is always a need for rapid bug fixing, for most systems the shortest time to write code, test it, and then deploy it is less than a day. This is predicated on reproducing the bug in the dev or test environment so that the potential change can be shown to work as expected. For normal development work, the time it takes to go from analysis to production is weeks, months, or even years. The two workflows are very different; both are necessary.


When software development is proceeding well, it is a reasonably smooth occupation. When it is degenerating or trapped in a bad cycle, it is very frustrating and stressful. The amount of friction and yak shaving in the work is an indicator of the suitability of the process used for the effort. If everything is late, and everybody is angry and tired, then there are serious process problems with the development project that can and should be fixed.