Andrey Romanov
Some time ago, making a program function correctly was the developer’s primary task, and fulfilling that requirement was commendable in itself. Times have changed: a program can now stay in continuous development for several years. The work is done by different developers, each of whom has no connection with the stages of development that took place before they arrived. To develop an application successfully you need to know how it works, and that requires reading and understanding the code. Because programs have to be read, and because computer performance keeps growing rapidly, correct functioning is becoming a secondary task, while the primary goal is simple code and an application that is easy to change.
This article offers some recommendations for improving program code, intended to ensure a long and healthy life cycle for the application you are developing now.
Initialize every variable where it is defined
This rule is not difficult to follow, but it requires care and discipline. Developers are often tempted to skip initializing a variable that is defined just before a branch in which it is assigned a value. The variable then has to be initialized in every branch. As development goes on, the branching changes and new branches appear (some of them executed very rarely), and the variable is left uninitialized in them. As a result, the application behaves unpredictably. To avoid this, initialize every variable right where it is defined.
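A minimal sketch of this rule in Objective-C (the language of the snippets in this article), assuming Foundation and ARC. The function name, status codes and day counts are made up for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: estimated repair time in days for a status code.
static NSInteger RepairDaysForStatus(NSInteger statusCode) {
    // Initialized where it is defined, so every path starts from a known
    // value, including branches that are added later.
    NSInteger repairDays = 0;

    if (statusCode == 1) {
        repairDays = 3;
    } else if (statusCode == 2) {
        repairDays = 10;
    }
    // A status code that no branch handles still yields a defined result.
    return repairDays;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%ld", (long)RepairDaysForStatus(1));
        NSLog(@"%ld", (long)RepairDaysForStatus(7));
    }
    return 0;
}
```

A scalar is used on purpose: without the initializer, an unhandled status code would return whatever happened to be in memory.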
Avoid nested method calls
When we take in new information, our mind always tries to divide it into many small, interconnected parts, each of which carries some meaning. A line of code is an example of such a part. The smaller the part, the easier it is to understand. The size of a part can be defined as the amount of meaning it carries in the current context. Nesting method calls significantly increases the amount of meaning packed into a single line. The more complex the nested function, the harder the line is to read. Nesting simple functions does not add much difficulty, but in many cases we see rather sophisticated methods being nested (e.g. a method that returns a sorted list of keys wrapped around a method that obtains the unsorted list of keys).
When reading such code, it is very hard to understand what the line is for, because you have to unpack a large amount of information and divide it into small parts yourself.
Debugging such code is nearly impossible, because you cannot see the moment the error occurs, and there is no way to inspect the results of the nested calls (except by stepping through them one at a time).
Sometimes a developer wants to write several method calls on a single line in order to consolidate them into one logical block. In that case it is necessary to give such blocks definite and easily understood names.
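The difference can be sketched with the sorted-keys case mentioned above. This is an illustration only, assuming Foundation and ARC; the function names and the dictionary contents are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Nested version: one line carries three steps, and none of the
// intermediate results can be inspected without stepping into the calls.
static NSString *FirstPlateNested(NSDictionary<NSString *, NSNumber *> *hoursByPlate) {
    return [[[hoursByPlate allKeys]
        sortedArrayUsingSelector:@selector(compare:)] firstObject];
}

// Un-nested version: each line carries one small piece of meaning,
// and each intermediate result has a name a debugger can show.
static NSString *FirstPlateStepByStep(NSDictionary<NSString *, NSNumber *> *hoursByPlate) {
    NSArray<NSString *> *plates = [hoursByPlate allKeys];
    NSArray<NSString *> *sortedPlates =
        [plates sortedArrayUsingSelector:@selector(compare:)];
    NSString *firstPlate = [sortedPlates firstObject];
    return firstPlate;
}

int main(void) {
    @autoreleasepool {
        NSDictionary<NSString *, NSNumber *> *hoursByPlate =
            @{ @"B222": @5, @"A111": @2, @"C333": @8 };
        NSLog(@"%@ %@", FirstPlateNested(hoursByPlate),
              FirstPlateStepByStep(hoursByPlate));
    }
    return 0;
}
```

Both functions return the same value; the second simply gives the reader and the debugger a place to stop after every step.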
Name variables, methods and classes wisely
Frankly speaking, naming a variable, method or class is not as easy as it seems. First, we often don’t feel like inventing long and complicated names; second, our vocabulary is limited. And during development the developer’s mind is occupied with other things.
It is often worth stopping to think about names. Here are some hints for creating good ones.
Take variables first. Variables store data, often objects of classes, so the first idea that comes to mind is to give the variable the same name as the class (e.g. Cars *_cars). Such a name carries no information about the purpose of the variable. In most cases the purpose is clear from the context, but only when the class and method names are well chosen and the class does not have too many variables. It makes sense to stop and understand the purpose of the variable. For example, if it holds a reference to the car being repaired, give the variable a name that says so.
Methods (or functions): what are they for? Methods perform operations on data, obviously. So perhaps they should be named after what they do? That seems right, but there is a small problem: many methods contain additional actions and conditions. For example, a method is called (void)writeToDisc. It seems clear that the method writes data to the disk. But inside it we find some conditions being checked before the write, and a counter being incremented after it. Why not put all of this into the method name? Because the method breaks the single responsibility rule: it is responsible not only for writing, but also for checking whether writing is possible and for incrementing a counter. In this example the method should be moved to a separate class that is responsible for disk writing and everything connected with it.
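One possible shape for such a class is sketched below, assuming Foundation and ARC. The class name DiskWriter, its methods, the "non-empty data" check and the temporary file name are all hypothetical; the article does not prescribe them:

```objc
#import <Foundation/Foundation.h>

// Hypothetical class that owns everything connected with writing to disk.
@interface DiskWriter : NSObject
@property (nonatomic, readonly) NSUInteger writeCount;
- (instancetype)initWithPath:(NSString *)path;
- (BOOL)writeData:(NSData *)data;
@end

@implementation DiskWriter {
    NSString *_path;
}

- (instancetype)initWithPath:(NSString *)path {
    self = [super init];
    if (self) {
        _path = [path copy];
    }
    return self;
}

// Each step has its own name: check, write, count.
- (BOOL)writeData:(NSData *)data {
    if (![self canWriteData:data]) {
        return NO;
    }
    BOOL written = [data writeToFile:_path atomically:YES];
    if (written) {
        [self registerWrite];
    }
    return written;
}

- (BOOL)canWriteData:(NSData *)data {
    return data.length > 0;  // assumed precondition, for illustration only
}

- (void)registerWrite {
    _writeCount += 1;
}
@end

int main(void) {
    @autoreleasepool {
        NSString *path = [NSTemporaryDirectory()
            stringByAppendingPathComponent:@"disk-writer-demo.bin"];
        DiskWriter *writer = [[DiskWriter alloc] initWithPath:path];
        [writer writeData:[@"engine log" dataUsingEncoding:NSUTF8StringEncoding]];
        [writer writeData:[NSData data]];  // rejected by the check, not counted
        NSLog(@"successful writes: %lu", (unsigned long)writer.writeCount);
    }
    return 0;
}
```

The checking and the counter no longer hide inside a method whose name promises only a write; they belong to a class whose single job is disk writing.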
Now for classes. A class is something between a variable and a method. On the one hand, a class can describe some data and operations; on the other, it can describe a use case (a “precedent”, in UML terminology). Given that, the name of a class can describe either its purpose or the object that the class models.
If you cannot come up with a name for an entity (a class, variable or method), stop and ask whether you need it at all: is it really necessary here?
Keep the meaning of a variable the same for its whole lifetime
The variable naming problem mentioned above has one more aspect: sometimes a variable changes its meaning during its lifetime.
This is likely to happen in large methods, where variables hold intermediate results or temporary data and the final result is stored only at the end. A change of meaning breaks the naming rule and makes the code much harder to understand.
For intermediate results it is better to use separate variables whose names convey what those results mean.
Keeping the meaning of a variable constant makes the code easier to understand and avoids confusion.
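A small sketch of the difference, assuming Foundation. The hourly rate, the tax rate and the function names are invented for the example:

```objc
#import <Foundation/Foundation.h>

static const double kHourlyRate = 40.0;  // hypothetical labor rate
static const double kTaxRate = 0.25;     // hypothetical tax rate

// One variable, three meanings: labor cost, then subtotal, then final total.
static double RepairTotalReusingOneVariable(double partsCost, double laborHours) {
    double total = laborHours * kHourlyRate;
    total = total + partsCost;
    total = total + total * kTaxRate;
    return total;
}

// Each intermediate result gets its own name and keeps its meaning.
static double RepairTotal(double partsCost, double laborHours) {
    double laborCost = laborHours * kHourlyRate;
    double subtotal = partsCost + laborCost;
    double tax = subtotal * kTaxRate;
    return subtotal + tax;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%.2f %.2f",
              RepairTotalReusingOneVariable(100.0, 2.5),
              RepairTotal(100.0, 2.5));
    }
    return 0;
}
```

The two functions compute the same number, but only in the second can a reader tell what a variable holds without tracing every assignment above it.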
Comment the program
Classic comments are bad, and you should accept that. Comments inflate the code, and they tend to become outdated and therefore cause confusion. So what should we do instead? How should code be commented?
By commenting I mean using “speaking” names for variables, methods and classes. For example, suppose the line [cars addObject:a] is repeated several times in the code. What happens here? At first sight it is obvious: a car is added to the array of cars. But what does that mean? What is the “car array” for? Or maybe it is not an array at all? Now look at another line: [self addCarToRepairQueue:car]. Now the meaning is completely clear, and we can see what the problem was: “cars” is not an array but a queue. Sometimes whole classes (plain wrappers around library classes that inherit nothing from them) can be used for this purpose.
Using “speaking” names keeps the “comments” up to date and makes the code simpler.
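A wrapper class of the kind mentioned above might look like the following sketch (Foundation and ARC assumed). The Car and RepairQueue classes and their method names are hypothetical:

```objc
#import <Foundation/Foundation.h>

@interface Car : NSObject
@property (nonatomic, copy) NSString *plate;
@end
@implementation Car
@end

// A thin wrapper around NSMutableArray. It inherits nothing from the
// array class and exposes only the operations a queue should have.
@interface RepairQueue : NSObject
@property (nonatomic, readonly) NSUInteger waitingCarCount;
- (void)addCar:(Car *)car;
- (Car *)takeNextCar;  // returns nil when the queue is empty
@end

@implementation RepairQueue {
    NSMutableArray<Car *> *_cars;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        _cars = [NSMutableArray array];
    }
    return self;
}

- (NSUInteger)waitingCarCount {
    return _cars.count;
}

- (void)addCar:(Car *)car {
    [_cars addObject:car];
}

- (Car *)takeNextCar {
    Car *next = [_cars firstObject];
    if (next != nil) {
        [_cars removeObjectAtIndex:0];
    }
    return next;
}
@end

int main(void) {
    @autoreleasepool {
        RepairQueue *repairQueue = [[RepairQueue alloc] init];
        Car *car = [[Car alloc] init];
        car.plate = @"A111";
        [repairQueue addCar:car];
        NSLog(@"next: %@", [repairQueue takeNextCar].plate);
    }
    return 0;
}
```

The call site now says what happens in the problem domain, and the wrapper makes it impossible to treat the queue as a general-purpose array by accident.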
Group the methods from general to specific
Programming is like writing a book: both have a plot and a cast of characters. But unlike a book, program code has to be understood here and now, at every moment. In a book we often see several disconnected actions, and the whole plot emerges only towards the end. In a program the situation is just the opposite: every action has to be understood at the moment it is executed. When someone reads a program, we must give them all the information they need without making them navigate through the whole code.
So how should code be organized for easy reading? Remember that people usually read from top to bottom. The general description of the class (in our case, its public methods) should therefore be at the top. Public methods should be ordered from creation/initialization to dealloc (which is treated as public here, although calling it directly is not recommended). This order follows the life cycle of the class and therefore feels natural to the reader.
The implementation of a public method should read like a book’s table of contents, listing everything that is going to be executed. Looking at such a method, we can easily understand the sequence of execution. Methods that are used to implement other methods should be placed further down in the text.
A person reading the code may be satisfied with the general information about a method, which saves time and earns the developers some gratitude. A reader who needs a more detailed description goes further into the text and explores the code of the methods that are called. The more detail they need, the further down they go. This approach lets the reader assess the class quickly and understand its purpose without wasting time on secondary details.
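The ordering can be sketched as follows, assuming Foundation and ARC. RepairReport and its methods are hypothetical names chosen for the example:

```objc
#import <Foundation/Foundation.h>

@interface RepairReport : NSObject
- (instancetype)initWithPlate:(NSString *)plate hours:(double)hours;
- (NSString *)text;
@end

@implementation RepairReport {
    NSString *_plate;
    double _hours;
}

#pragma mark - Public methods, in life-cycle order

- (instancetype)initWithPlate:(NSString *)plate hours:(double)hours {
    self = [super init];
    if (self) {
        _plate = [plate copy];
        _hours = hours;
    }
    return self;
}

// Reads like a table of contents: header first, then the labor line.
- (NSString *)text {
    NSString *header = [self headerLine];
    NSString *labor = [self laborLine];
    return [@[header, labor] componentsJoinedByString:@"\n"];
}

#pragma mark - Details, for readers who need them

- (NSString *)headerLine {
    return [NSString stringWithFormat:@"Repair report for %@", _plate];
}

- (NSString *)laborLine {
    return [NSString stringWithFormat:@"Labor: %.1f h", _hours];
}
@end

int main(void) {
    @autoreleasepool {
        RepairReport *report =
            [[RepairReport alloc] initWithPlate:@"A111" hours:2.5];
        NSLog(@"%@", [report text]);
    }
    return 0;
}
```

A reader who only wants to know what the report contains can stop after the text method; the formatting details sit below it for those who need them.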
Use the data source
By “data source” we mean a rather broad notion that includes not only the object that supplies data, but also wrapper methods and the places where data is refreshed and has to be requested again.
A class holds two kinds of data: data required for the class’s own work (internal data) and the data for whose sake the class was created (data that is significant in the context of the whole application). There will inevitably be methods that change the significant data, either by direct editing or through an internal change.
Developers are often tempted to change the data directly inside methods. But this approach has a serious drawback: such code is really hard to maintain. The difficulty lies in not being sure that the data is intact, since we may forget to put a breakpoint in some other method that also changes the data directly (bypassing the setter). Writing a setter method solves part of the problem; moreover, a breakpoint can be placed inside the setter.
Maintaining code means not only fixing detected errors, but also changing business logic. If the class has no data source, such changes become very difficult because significant data is modified in a very large number of places.
If the conditions become too complex, it makes sense to create a separate class that encapsulates all the conditions for obtaining the data.
Moving data refreshing into a separate method lowers the overall difficulty of making changes, because the business logic is concentrated in one method that can be called in response to various actions.
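The setter and the refresh method can be sketched like this, assuming Foundation and ARC. CarListModel, the PlateLoader block type and the change counter are hypothetical; the block stands in for whatever object actually supplies the data:

```objc
#import <Foundation/Foundation.h>

// Stand-in for the object that supplies the data (hypothetical).
typedef NSArray<NSString *> *(^PlateLoader)(void);

@interface CarListModel : NSObject
@property (nonatomic, copy, readonly) NSArray<NSString *> *plates;  // significant data
@property (nonatomic, readonly) NSUInteger changeCount;             // internal data
- (instancetype)initWithLoader:(PlateLoader)loader;
- (void)refreshPlates;
@end

@interface CarListModel ()
@property (nonatomic, copy, readwrite) NSArray<NSString *> *plates;
@end

@implementation CarListModel {
    PlateLoader _loader;
}

- (instancetype)initWithLoader:(PlateLoader)loader {
    self = [super init];
    if (self) {
        _loader = [loader copy];
        _plates = @[];
    }
    return self;
}

// The only place where the significant data is assigned after init,
// so one breakpoint here catches every change.
- (void)setPlates:(NSArray<NSString *> *)plates {
    _plates = [plates copy];
    _changeCount += 1;
}

// The rules for obtaining the data live in one method that can be
// called in response to any action that requires a refresh.
- (void)refreshPlates {
    NSArray<NSString *> *loaded = _loader();
    self.plates = [loaded sortedArrayUsingSelector:@selector(compare:)];
}
@end

int main(void) {
    @autoreleasepool {
        CarListModel *model = [[CarListModel alloc] initWithLoader:^{
            return @[@"B222", @"A111"];
        }];
        [model refreshPlates];
        NSLog(@"%@ (changes: %lu)", model.plates,
              (unsigned long)model.changeCount);
    }
    return 0;
}
```

If the sorting rule or the loading conditions change later, only refreshPlates has to be edited.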
Describe closed (“looped”) operations instead of inheriting and overriding
A specification often contains actions that begin with the presentation of some fragment and have a clearly marked beginning and end. Such an action is closed (“looped”): it cannot transform into another action without being finished first (e.g. selecting and sending a photo cannot transform into viewing without being completed first).
There are several ways to meet this requirement, but we will consider only two. The first is to create a ViewController that contains the logic for selecting the photo, transforming it and sending it to the server. If you later need to add a “Select photo” button (without sending to the server, for example to create a new object), that will take a lot of effort, because you will have to either copy the code or extract a base class.
The other option is to analyze the task and divide it into three parts: selection, processing and sending. It is tempting to write three separate classes and describe their interaction in the calling controller, but it is better to create one more class that is responsible for their interaction. This approach requires some additional steps but brings a number of advantages: it lets you concentrate on details, it lets you build combinations from the classes already developed, and it lets you move the selection and upload functionality to any part of the application without using inheritance.
The controlling class described above is a description of a closed operation that uses other operations in its work. To use it in different parts of the application, we create the operation object and start the operation.
Using operations lets you avoid inheritance and “template” methods, which makes the code clearer and separates its independent parts. Separating semantically independent parts lets you concentrate on one specific task.
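A rough sketch of such an operation follows, assuming Foundation and ARC. All class names are hypothetical, photos are represented by plain strings, and the sender only records what it was given instead of talking to a server:

```objc
#import <Foundation/Foundation.h>

@interface PhotoSelector : NSObject
- (NSString *)selectPhoto;
@end
@implementation PhotoSelector
- (NSString *)selectPhoto { return @"holiday.jpg"; }  // stands in for a picker
@end

@interface PhotoProcessor : NSObject
- (NSString *)processPhoto:(NSString *)photo;
@end
@implementation PhotoProcessor
- (NSString *)processPhoto:(NSString *)photo {
    return [@"resized-" stringByAppendingString:photo];
}
@end

@interface PhotoSender : NSObject
@property (nonatomic, readonly) NSMutableArray<NSString *> *sentPhotos;
- (void)sendPhoto:(NSString *)photo;
@end
@implementation PhotoSender
- (instancetype)init {
    self = [super init];
    if (self) { _sentPhotos = [NSMutableArray array]; }
    return self;
}
- (void)sendPhoto:(NSString *)photo {
    [_sentPhotos addObject:photo];  // stands in for a network call
}
@end

// The closed operation: it alone knows how the three steps interact.
@interface PhotoUploadOperation : NSObject
- (instancetype)initWithSender:(PhotoSender *)sender;
- (void)start;
@end
@implementation PhotoUploadOperation {
    PhotoSelector *_selector;
    PhotoProcessor *_processor;
    PhotoSender *_sender;
}
- (instancetype)initWithSender:(PhotoSender *)sender {
    self = [super init];
    if (self) {
        _selector = [[PhotoSelector alloc] init];
        _processor = [[PhotoProcessor alloc] init];
        _sender = sender;
    }
    return self;
}
- (void)start {
    NSString *photo = [_selector selectPhoto];
    NSString *processed = [_processor processPhoto:photo];
    [_sender sendPhoto:processed];
}
@end

int main(void) {
    @autoreleasepool {
        PhotoSender *sender = [[PhotoSender alloc] init];
        // Any screen can run the whole action: create the operation, start it.
        PhotoUploadOperation *operation =
            [[PhotoUploadOperation alloc] initWithSender:sender];
        [operation start];
        NSLog(@"%@", sender.sentPhotos);
    }
    return 0;
}
```

A “Select photo” button that does not upload anything can use PhotoSelector on its own, with no base class to extract and nothing to override.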
Use inheritance only when the class functionality is carried over in full
Sometimes there are objects that look alike (often they are lists), e.g. settings pages in an application with a common core built for different platforms. For the first platform (let’s call it platform A) the settings page has already been created. On platform A this page consists of a table with five cells, three of which change the application’s state directly, while the other two open new screens for changing settings. Our task is to create a settings page for another platform (platform B), and the behaviour of the corresponding elements must be similar to platform A. Platform B is very close to platform A, so close that the code could simply be copied without losing functionality. However, because the platforms differ slightly (in resolution and performance), the page design has to change: one of the cells that change the application directly will disappear, and some functions not implemented on platform A will appear. The design and order of the cells also change a little. Since most of the functionality has already been implemented for A, it seems to make sense to inherit from the existing page, override the “excessive” methods and implement our own.
The inheritance-based approach is sometimes quite good (e.g. when the cells change but everything else stays the same). But the base class often includes functions that the descendant class does not need, and these have to be disabled (by a flag or by overriding). This may result in working code, but the code becomes hard to understand and hard to change. Inheritance “binds” the code, preventing changes and hiding the internal connections between data (because methods are overridden in descendant classes).
If only part of the functionality is carried over, it makes sense to extract a class that encapsulates the required logic (e.g. as an operation or as a base class for A and B) and create objects of this class wherever they are needed. If that is impossible, consider copying the code, because understanding a copy is much simpler than hunting for problems in “template” methods that contain unused commands. When copying code, note that much of it can usually be moved into static methods of a separate class. If you keep moving pieces of code into a separate class, at some point the logical way to divide the classes will become apparent.
Of course, creating an object in many parts of the application means copying code, but is that really so bad? It is bad to copy business logic code (the significant code); all other code is a wrapper.
A wrapper here is any code that is not reflected in the specification and exists only to call the business logic code. A wrapper carries no business logic; the customer cannot order a change to it (because that would require a radical change to the technical specification); and it tends to be simple and clear (often just one line to create the object and one more to start the action). If creating an object takes too much effort, consider a factory that creates it in a single call.
Avoiding inheritance where the functionality is not carried over in full drastically improves the clarity of the code.
Use the single responsibility rule and class composition
This means that every class or method must be responsible for only one function.
For example, we have a class responsible for choosing a photo, a class responsible for sending it, and a class that controls the choose-and-send procedure. This not only allows components to be grouped, but also makes it possible to debug an isolated piece of code. But how can we tell whether a class is responsible for too little or too much? There are several ways to do this, all of them based on the question of reasons for change: what reason could make me change the code of this class? If there is more than one reason, the class is “over-responsible” and needs to be divided.
For example, let’s look at choosing and sending a photo. This can obviously be divided into two sub-tasks: choosing and sending (uploading). Imagine that a class called ImageSelector is responsible for choosing the image and a class called ImageSender is responsible for uploading it to the server. To coordinate them we use a class called ImageUploader. Clients use ImageUploader without knowing anything about the other two classes. Let’s try to find the possible reasons for changing the code of our classes. The ImageSender code may change only if the protocol or the destination changes. The code of the choosing class may change if the sources of photos change (e.g. a new online source is introduced, although in that case it is better to represent ImageSelector as a composition of two classes). If a photo needs additional processing before uploading, only ImageUploader needs to change: a new object responsible for editing is added to it.
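The remark about a new online source can be sketched as follows, assuming Foundation and ARC. Only the name ImageSelector comes from the text; the ImageSource protocol, the two source classes and the file names are hypothetical, and the online source is a stub that makes no network request:

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol: anything that can offer images to choose from.
@protocol ImageSource <NSObject>
- (NSArray<NSString *> *)imageNames;
@end

@interface LocalImageSource : NSObject <ImageSource>
@end
@implementation LocalImageSource
- (NSArray<NSString *> *)imageNames {
    return @[@"local-1.jpg", @"local-2.jpg"];
}
@end

@interface OnlineImageSource : NSObject <ImageSource>
@end
@implementation OnlineImageSource
- (NSArray<NSString *> *)imageNames {
    return @[@"online-1.jpg"];  // stub: a real source would query a server
}
@end

// ImageSelector is composed of sources instead of knowing each of them.
@interface ImageSelector : NSObject
- (instancetype)initWithSources:(NSArray<id<ImageSource>> *)sources;
- (NSArray<NSString *> *)selectableImageNames;
@end

@implementation ImageSelector {
    NSArray<id<ImageSource>> *_sources;
}

- (instancetype)initWithSources:(NSArray<id<ImageSource>> *)sources {
    self = [super init];
    if (self) {
        _sources = [sources copy];
    }
    return self;
}

- (NSArray<NSString *> *)selectableImageNames {
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    for (id<ImageSource> source in _sources) {
        [names addObjectsFromArray:[source imageNames]];
    }
    return [names copy];
}
@end

int main(void) {
    @autoreleasepool {
        ImageSelector *selector = [[ImageSelector alloc] initWithSources:@[
            [[LocalImageSource alloc] init],
            [[OnlineImageSource alloc] init],
        ]];
        NSLog(@"%@", [selector selectableImageNames]);
    }
    return 0;
}
```

With this arrangement, adding another source means writing one new class; ImageSelector, ImageSender and ImageUploader each keep a single reason to change.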
Classes contain specific data required for internal use. To provide a service, a class needs some data connected with that service. As a result, if a class supports many use cases (“precedents”), it will contain a huge amount of unrelated data. Despite being unrelated, this data can be used directly in any method and can be changed without regard to any particular rules. Following the single responsibility principle lets you divide semantically unrelated data among classes. Reducing the amount of data makes the code easier to change, because a smaller class makes the consequences of a change easier to analyze.
Such division not only makes the architecture more flexible, but also allows the parts of a subsystem to be tested and developed independently of each other.
If it’s bad – make it better!
We all like to use third-party code. It saves development time, and it is encouraging to see that code lives on after the project it was written for is finished. But there is an interesting consequence: together with the code we take on its problems and the style it was written in.
When faced with a complicated task, a developer is usually told to search the internet for ready-made solutions. If a solution exists, and integrating it into the project takes much less time than writing your own, the ready-made solution quickly migrates into the project. But many open solutions (especially for very specific tasks) suffer from awful design and implementation (I once came across graph code implemented as a single method seven screens long; although it could have been integrated into the project, I chose another one simply because that code was impossible to understand). When integrating such solutions it makes sense to consider refactoring, because otherwise all subsequent changes (after the component has been integrated) will require much more effort. The refactoring pays off for the project as a whole, because most tasks take longer to implement from scratch than to integrate a ready-made solution (if it is the other way round, there is no point in thinking about refactoring).
This does not apply to the code of large libraries, because large libraries are generally written quite well (although their “religion” may differ from yours).
And now a short story. I was once creating a textView with a placeHolder (text that is shown when the textView is empty). Searching the internet, I found a ready-made, working component. But its code was really awful: the component could not be maintained, and internally it did not work properly. I decided to fix it. Of course, it took some time. But when everything was done, I had a clear class that worked well and could easily be changed without fatal consequences. Moreover, it no longer seemed alien; it fitted harmoniously among the other classes.
Improving existing code often makes it clearer and simpler, and lets you find and eliminate vulnerabilities and errors buried deep inside the source.
Invest in the future, but remember the present
There are two kinds of profit that we gain from our activity: immediate and delayed. If you concentrate too much on immediate profit, you will stumble. If you stake everything on delayed profit, you will have nothing to eat right now. Writing bad code (without composition and single responsibility) is like taking an immediate profit: you work quickly and easily, but after some time you get lost in your own code. Only a balance between the immediate and the delayed leads to success.
Do not follow your development religion blindly
When following a development paradigm, a developer inevitably starts “optimizing” the task to fit their own vision of things. Optimizing is obviously necessary, but doing it fanatically blinds you to other solutions that depart from the common rules. In that case optimizing seriously harms the stability of the application and its clarity. Always keep your mind free of “religion”. Everything is good in moderation.
Epilogue
In our work we often come across complicated, incomprehensible code and have to make a choice: change everything to match our vision or leave it “as is”. There are many reasons for leaving the code “as is”, and some are quite objective (time, effort), but there are also subjective ones. Among them is the fear of “punished initiative”, which comes from our previous experience of being blamed for doing good things. But if people had not overcome that fear and changed the world for the better, we would all still be living in the dark ages. I hope this philosophy helps someone to see code more clearly and to make it better and clearer.
Don’t be afraid to be a hero and change something for the better!
This entry was posted on Tuesday, January 24th, 2012 at 2:58 am and is filed under Code.