Monday, July 27, 2020

Defensive Coding: Names and Order

A fundamental rule of programming is that the code should never lie. 


One consequence is that a variable name should correctly describe the data it holds. So, a variable called ‘user’ that holds reporting data is lying. Report data is not user data, so the name is incorrect.


A second consequence is that if most of the variables holding data about people interacting with the system are called ‘user’, then pretty much every variable in the system that holds that same data should have the same name. Well, almost. If one part of the system only deals with the subset of users who have admin privileges, then it’s okay for the variable name to be ‘admin_user’. Semantically, they are admin users, which are a subset of all users. The variable name can be scoped a little more tightly if that improves its readability or conveys its intent. Obviously though, if in another part the user data is passed into a variable called ‘report’, then that code is busted.
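As a minimal sketch of that naming rule in Python: the User class, its fields, and both functions below are hypothetical, made up only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical entity; the fields are assumptions for this sketch.
    name: str
    is_admin: bool = False


def greet(user: User) -> str:
    # The parameter holds user data, so it is called 'user' everywhere.
    return f"Hello, {user.name}"


def find_admin_users(users: list[User]) -> list[User]:
    # 'admin_user' is a tighter name for the same kind of data:
    # an admin user is still a user, just a subset of them.
    return [admin_user for admin_user in users if admin_user.is_admin]


users = [User("ann", is_admin=True), User("bob")]
admin_users = find_admin_users(users)
```

Nothing in the sketch ever holds a User in a variable called ‘report’ or ‘data’; the only variation allowed is the narrower ‘admin_user’.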


In some lower-level parts, we might just take the user data, the reporting data, and the event data, convert them all into a common format, and pass that somewhere else. So, the name of the common format needs to be appropriate for the polymorphic converted data that is moving around. It might be the format, like ‘json’, or it could be even more generic, like ‘buffer’. Use whatever best helps readability and informs other programmers about the content. General names are fine when the code has been generalized.
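A rough sketch of what generalized names might look like in Python, assuming the entities are simple dataclasses. The names ‘entity’, ‘json_text’, and ‘outbox’, and the two functions, are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str


@dataclass
class Report:
    title: str


def to_json(entity) -> str:
    # Generalized code: 'entity' may be a user, a report, or an event,
    # so a specific name like 'user' would be lying here.
    return json.dumps({"type": type(entity).__name__, "data": asdict(entity)})


def send(json_text: str, outbox: list) -> None:
    # Named after the common format, because at this level the code
    # no longer knows which entity the text came from.
    outbox.append(json_text)
```

A caller would pass any entity through the same path, for example send(to_json(User("ann")), outbox), and the lower-level names stay honest no matter what was converted.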


In most systems, the breadth of data is actually not that wide. There are a few dozen major entities at most. Some people consider naming things to be one of the harder parts of coding, but if variables are named properly for their data, the system doesn’t have that many varieties of data anyway, and not a lot of new ones are coming in, then the naming problem quickly loses its difficulty. If the data already exists and it is named properly, a new variable holding it elsewhere should not have a new name. There is no need for creativity, since the name has already been established. So, naming should be a rather infrequent issue, and where readability dictates that the name should be tightened or generalized, those variations are mostly known or easy to derive.


The other interesting aspect is that there is an intrinsic order to the entities. 


If we were writing a system that delivers reports to users, then the first and most important thing we need to know is “who is the user?” That is followed by “what report do they get?” Basically, we are walking down through the context that surrounds the call. A user logs in, then they request a report.


What that means is that there is a defined order between those two entities. So for any function or method call in the language, if both variables are necessary, they should be specified in that order. If there are 3 similar functions, ‘user’ always comes before ‘report’ in their argument lists.


Otherwise, it gets messy when some of the calls are (user, report) and others are (report, user). The backward order is somewhat misleading. It is not exactly incorrect, but some pertinent information is lost.
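To make the ordering concrete, here is a small Python sketch with 3 similar functions. The function names and the permission rule are hypothetical, and plain strings stand in for the real entities.

```python
def can_view(user: str, report: str) -> bool:
    # Hypothetical rule: only 'ann' may see the 'payroll' report.
    return report != "payroll" or user == "ann"


def render(user: str, report: str) -> str:
    return f"{report} report for {user}"


def deliver(user: str, report: str) -> str:
    # Same (user, report) order as every other call in this sketch,
    # mirroring the context: a user logs in, then requests a report.
    if not can_view(user, report):
        raise PermissionError(f"{user} may not view {report}")
    return render(user, report)
```

If render had been declared as (report, user) instead, the code could still work, but every call site would need a second look to confirm the arguments were not swapped.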


Now if the system has a couple of dozen entities, the context may not be enough to make the order unambiguous, but there is also an order that comes from the domain. Generally, the two together have been enough to cover every permutation.


The corollary to this is that if we are focused on preserving order, but we have code that passes around the low-level attributes individually instead of passing around the entities, it becomes clear that we should fix that code. Attributes that are dependent should always travel together. They have no meaning on their own, so there is no good reason to keep them separated. That is, we are missing an object, or a typedef, or whatever other approach the language has for treating related variables as a single thing. When that is done, the loose attributes disappear, there are fewer things to pass around, and of course there are fewer naming issues.
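One way to picture the missing object, as a Python sketch: the attribute names and both functions are hypothetical, and the "before" version is included only for contrast.

```python
from dataclasses import dataclass


# Before: dependent attributes travel separately, and their order
# in the argument list is arbitrary and easy to get wrong.
def deliver_loose(first_name: str, last_name: str, email: str, title: str) -> str:
    return f"To: {first_name} {last_name} <{email}>; Report: {title}"


@dataclass(frozen=True)
class User:
    first_name: str
    last_name: str
    email: str


@dataclass(frozen=True)
class Report:
    title: str


# After: two entities, one established order, and two names to get right.
def deliver(user: User, report: Report) -> str:
    return (
        f"To: {user.first_name} {user.last_name} <{user.email}>; "
        f"Report: {report.title}"
    )
```

Both versions produce the same text, but the second one only has to keep ‘user’ ahead of ‘report’, rather than keeping four loose strings in line.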


What’s interesting about all of this is that if we set a higher-level principle like ‘code should never lie’, its consequences trickle down into a lot of smaller issues. Names and order both relate back to keeping the code honest. To be correct about one means being consistent about the other two.
