Thursday, February 17, 2011

Big Data from Small Source Code

Editor’s Note: This is a guest post by Denton Gentry. He’s an experienced software developer and fellow blogger from California. This year’s New Year’s resolution for Denton was to cross-post to other blogs. He’s off to a good start. Read more from Denton at his blog, Coding Relic.

"Nobody would be interested in this code, why open source it?"

"I'm not really looking for people to help out on this code, it's just a simple little thing."

"I wrote this to learn the language, it's not good enough for anyone to see."

I suspect that most developers are proud of the code they've written, but most of that code is owned by an employer and considered proprietary. We couldn't open source it even if we wanted to. Code written on one's own time and not related to the employer's business could be open sourced, but rationalizations such as these keep us from doing so.

I know I've used them.

The traditional reasons for open sourcing code revolved around collecting a community of like-minded developers to collaborate with. It is a lot of work to build such a project, and even then most such efforts fail to get traction and simply languish. Driving such a project would take a lot of time and effort, time not worth investing for something one doesn't feel truly passionate about.

The thing is, the world has changed.

"When it comes to hiring, I'll take a Github commit log over a resume any day." (John Resig, 5 Feb)

Very subtly over the last few years, we've transitioned to a new model for information on the Internet. It's a world of Big Data, where we can draw inferences from collections of data even if the individual pieces go mostly unexamined.

  • Code search engines mean someone can reference your code to figure out arguments to a particular API, even if they don't use the code itself or care about the project it is a part of.
  • Framework and platform developers can use statistics from crawlers to know what parts of their API are widely used versus what is not getting traction. This can inform decisions about deprecation or API evolution.
  • Aggregate volumes of code make it possible to compare the rise and fall in popularity of different programming languages, useful information for developers looking to keep their skills current.
  • Resumes are dry reading, and get stale. Code commits, blog posts, etc. are a living resume of one's work and skills.
  • Also, quite frankly, you may be surprised at how many people benefit from a posting of code. Someone stuck on a problem will go through many pages of search engine results looking for an answer.

The question now should be whether there are any reasons not to publish the source of personal coding projects. Is there a reason to keep it private? Tools like GitHub and Stack Overflow have made programming into a social activity, with new opportunities for personal and professional advancement. The only cost in taking advantage of them is time, and even then only a little.

The question now of whether to publish the source revolves around internal factors, not external. "Am I proud of this code?" not "will anyone use this code?"

Tuesday, February 8, 2011

Top-down and Bottom-up

When I was younger, there was a lot of debate about how to design and build software. The prevailing theories were that you either started at the top and worked your way into the details, or that you started at the bottom and built up the pieces.

Software systems are tools used by people, so it is important to understand what their problems are and what is necessary to solve them. The only way to do this is to see it from their perspective. If you understand what the users are trying to accomplish and how they are doing that, you can find the best design that simplifies their lives.

Functionality should not be arbitrarily buried in disorganized menus; it needs to be at the users' fingertips right when they need it, and out of the way the rest of the time.

Design, then, only comes together if you’re looking down at the problem. You have to start at the 10,000-foot view and then wind your way through all of the steps necessary for someone to complete their work. Empathy for the users, and a deep understanding of both their environment and their goals, is key to creating tools that actually make their lives better.

But, from the other perspective, software is extremely slow and expensive to write. As user expectations have increased, even the small systems of today are significantly larger than those of just a few decades ago. And the more we depend on underlying libraries, the faster the complexity increases. Even if we don’t have to write all of the code, each underlying dependency brings with it a unique set of problems that takes time to understand and time to manage properly. Declining standards, poor design, forgotten knowledge and sloppy release procedures in these external pieces don’t help.

Our only defense against spiraling complexity is to try to isolate the work as much as possible. Highly redundant code that haphazardly calls the underlying components from anywhere is the type of spaghetti that quickly burns through development resources, wasting time on chasing sloppy mistakes. A well-architected system that correctly encapsulates all of its underlying parts allows the scope of change to be controlled. Once a problem has been settled, it should no longer crop back up in arbitrary places. Each piece must fully encapsulate a specific section of the system. This is the only way to provide a solid foundation on which bigger and better functionality can be added without fear of setting off a chain reaction of cascading bugs.
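As a rough illustration of what encapsulating an underlying part can look like, here is a minimal sketch. It assumes a made-up application that keeps its settings in a JSON file; the names SettingsStore and set_theme are hypothetical, and Python's standard json module merely stands in for any external dependency. The point is only that one class is allowed to touch the dependency, so a change in it stays confined to one place.

```python
import json
from pathlib import Path


class SettingsStore:
    """Hypothetical wrapper: the only place in the system that touches `json`."""

    def __init__(self, path):
        self._path = Path(path)

    def load(self):
        # A missing file is treated as empty settings, so callers never
        # need to know how or where the data is stored.
        if not self._path.exists():
            return {}
        with self._path.open("r", encoding="utf-8") as handle:
            return json.load(handle)

    def save(self, settings):
        with self._path.open("w", encoding="utf-8") as handle:
            json.dump(settings, handle, indent=2, sort_keys=True)


def set_theme(store, theme):
    """Higher-layer code: depends on SettingsStore, never on json directly."""
    settings = store.load()
    settings["theme"] = theme
    store.save(settings)
    return settings
```

If the storage format ever had to change, only SettingsStore would need to be rewritten; code like set_theme would be left alone.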

This type of design can only be achieved by building upwards. Starting with the depths of the system, not unlike a real building, each layer is carefully designed, built and stacked. The lines between the layers need to be well understood and carefully mapped out. This type of bottom-up construction ensures that the whole does not become unstable as it gets pushed and extended. It also ensures that the momentum of the project doesn’t grind to a halt because of exponential explosions in complexity. It requires a bit more effort and foresight, but it is the only way to build large systems that are dependable and can continue to grow as the needs of the users increase.

So, is it top-down or bottom-up? Really it is both. The design and requirements need to be seen from the top, but the construction needs to focus on building up solid and reliable pieces from the bottom. Foresight and experience are necessary to understand the scale of the system, but projects still need to be adaptable to changes in the environment, technologies or the users’ priorities. A solid foundation is necessary to ensure that the system will survive for its maximum lifespan while continuing to grow.

Although our industry pretends to turn over every five years, by now it is clear that the life expectancy for big systems is ten to twenty years, and some systems may be around far longer than that. With that understanding of our history, it seems even more reckless to hack together something without a design, apply duct tape excessively as leaks appear, or just hope to catch the flaws in testing. Good software development demands a longer-term focus as well as seeing the whole process from both the ground and the aerial view.