Thursday, February 17, 2011

Big Data from Small Source Code

Editor’s Note: This is a guest post by Denton Gentry. He’s an experienced software developer and fellow blogger from California. This year’s New Year’s resolution for Denton was to cross-post to other blogs. He’s off to a good start. Read more from Denton at his blog, Coding Relic.



"Nobody would be interested in this code, why open source it?"

"I'm not really looking for people to help out on this code, it's just a simple little thing."

"I wrote this to learn the language, it's not good enough for anyone to see."

I suspect that most developers are proud of the code they've written, but most of that code is owned by an employer and considered proprietary. We couldn't open source it even if we wanted to. Code written on one's own time and not related to the employer's business could be open sourced, but rationalizations such as these keep us from doing so.

I know I've used them.

The traditional reasons for open sourcing code revolved around gathering a community of like-minded developers to collaborate with. Building such a project is a lot of work, and even then most efforts fail to get traction and simply languish. That is time not worth investing in something one doesn't feel truly passionate about.

The thing is, the world has changed.

"When it comes to hiring, I'll take a Github commit log over a resume any day." (John Resig, @jeresig, 5 Feb)

Very subtly over the last few years, we've transitioned to a new model for information on the Internet. It's a world of Big Data, where we can draw inferences from collections of data even if the individual pieces go mostly unexamined.

  • Code search engines mean someone can reference your code to figure out arguments to a particular API, even if they don't use the code itself or care about the project it is a part of.
  • Framework and platform developers can use statistics from crawlers to know what parts of their API are widely used versus what is not getting traction. This can inform decisions about deprecation or API evolution.
  • Aggregate volumes of code can compare the rise and fall in popularity of different programming languages, useful information for developers looking to keep their skills current.
  • Resumes are dry reading, and get stale. Code commits, blog posts, etc. are a living resume of one's work and skills.
  • Also, quite frankly, you may be surprised at how many people benefit from a posting of code. Someone stuck on a problem will go through many pages of search engine results looking for an answer.

The question now should be whether there are any reasons not to publish the source of personal coding projects. Is there a reason to keep it private? Tools like GitHub and Stack Overflow have made programming into a social activity, with new opportunities for personal and professional advancement. The only cost in taking advantage of them is time, and even then only a little.

The question of whether to publish the source now revolves around internal factors, not external ones: "Am I proud of this code?" rather than "Will anyone use this code?"