Code style as a development standard

Let's get this out of the way: this is not about brackets. Here we will talk about how our brain works and why code style helps keep a project developing linearly, significantly speeds up the onboarding of new employees and, in general, shapes and nurtures a development culture. I tried to bring together in one article several studies and principles on how a developer's brain works and how programmers read code, and I also share the results of a personal experiment.

Interested? Read on.

Hello! My name is Anton, and I write backend code at ManyChat. We recently held a meetup dedicated to PHP development, where I gave a talk about code style as one of the development standards. Judging by the feedback, it went over well with the guests, so we decided to publish a transcript of the talk on Habr. For those who value the effect of being there in person and would rather watch a video than read, we ran a broadcast, and the recording is available now. You can find it here. For those who love long reads, welcome aboard.

Our brain is a neural network

It is worth starting with a description of how the brain works in general and why you need everything I will talk about later. Many of you are familiar with the concept of a neural network, or have at least heard the phrase, which for several years has been almost a symbol of hype in IT and marketing. Even now, many companies add "AI based" to their products, like a "Non-GMO" sticker on dumplings. A neural network operates on the principle of pattern recognition and is extremely effective on homogeneous data: images, video, sound and so on. The main point that relates to code style is the principle on which these networks were built. Perceptrons, the cells of a network, were described as models of brain processes long before they could be assembled into a functional, efficient network, because computing power was lacking. Our brain, although far more powerful, works in a similar way. As a result, a person uses the power of a trained neural network from birth to death, and the training is almost seamless.

So, we perceive only images. For the brain there are no texts, sounds or anything else as such: we perceive all of it only after decomposition. Roughly speaking, we take images apart "pixel by pixel", store them in memory, and then recognize various objects with the help of associative thinking. Thanks to this, we can tell whether we should run away from a snake or get on a bicycle and start pedaling. Here is a diagram of a standard perceptron, which illustrates how objects are classified. For those who are not familiar with how a perceptron works, it may look a little chaotic. But thanks to this scheme of weighing and balancing, a neural network recognizes images quickly and efficiently.

From programming, you can recall duck typing: if an object swims like a duck, flies like a duck and quacks like a duck, then it is a duck. It is a kind of approximate object detection. For our brain, recognizing an object happens instantly; training takes much longer. You can picture that process as teaching a small child who is just learning words. You take an apple, point and say: "This is an apple." You take another and repeat: "This is an apple."

And so on until the result sticks. Green, yellow, red, round, elongated, with and without stems, in black and white. Year after year, the child builds up a knowledge base from experience. I think the principle is clear. We do the same thing when training a neural network: we go from object to object or, more precisely, from image to image.
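To make the "this is an apple, and this is an apple" loop concrete, here is a minimal sketch of a single perceptron trained by repetition. It is an illustration only: the task (learning logical AND), the function names and the training parameters are assumptions chosen for brevity, not something taken from the talk.

```python
# A single perceptron: weigh the inputs, sum them, fire if above the threshold.
def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(samples, epochs=20, learning_rate=1):
    """Classic perceptron rule: nudge the weights by the prediction error."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):  # show the same labelled examples again and again
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training set: the label is 1 only when both features are present.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

Recognition (`predict`) is a single cheap pass, while training needs many repetitions over the same examples, which is the asymmetry described above.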

How a programmer reads code

The same thing happens when we work with code. We open a page, study it, break it down into blocks and identify the different parts: we easily tell properties from functions, spot the visibility levels of methods and properties, find constants and so on. We identify all of this block by block, so an important aspect of code style is bringing all the code to one uniform shape. This is the key factor that determines code readability.

Why does it matter? Because a programmer spends most of their time reading code. No exact figure has been established; it depends largely on qualifications, the language (Python, for example, relies on indentation, and incorrectly indented code simply will not run), code quality and so on. But at least 50% of the time goes to reading your own and other people's code. If the code is complex, the figure can reach 75%, and if the code is really bad, 95% of the time can go to reading and trying to work out where to add a single line to fix some flaw.

And now a question. What would you do if you saw that your code spends 75% of its time reading from disk or allocating memory for doubly linked lists? A programmer would try to optimize that process: change the algorithm, use a more efficient storage structure and so on. So when the time I spent reading became critical, I started to examine the process. One could object that this is the brain, it is far more powerful than any computer and can store about a petabyte of data. But while developing my own mental framework for working with code (the hidden process of forming code reading habits), I came across studies in which sensors tracked the eye movements of beginner and experienced programmers. It looks like this:

Beginner

Experienced

Pay attention to what the experienced programmer does. They pick out blocks of code, decompose them and read them block by block, highlighting the key parts and analyzing how they work. The beginner darts around line by line, just trying to understand what is going on. The lion's share of the time goes to assembling the big picture, and the code is read with constant cross-checking.

The video is quite old, from 2012, and the recording was made to study how a programmer's brain works. You can read a little more here and here.

Now is the time to return to the description of how perceptrons work. The recordings are a clear demonstration of a trained and an untrained neural network at work. Drawing on their knowledge base, an experienced programmer steps through the code like an interpreter, often without even realizing it. We can approach this problem the same way we approach the problem of training neural networks.

There are 2 ways to speed up code reading:

1. Constantly build up the developer's knowledge base of what code can look like. This means lifelong learning that keeps expanding the network's capacity.
2. Bring all code to one standard by setting a single code style.

It is fairly obvious that the first method is very labor-intensive: the programmer would have to read repositories constantly. Given staff turnover and the emergence of new technologies and practices, this becomes almost impossible. By elimination, that leaves code styling, which reduces the effort spent on decomposition and leaves attention only for the business logic.

Number of Piano Tuners

Imagine a team of 5 people who each write however they like across all modules and components of the system, constantly picking up overlapping tasks and editing each other's code. How many different ways of writing an if condition can you end up with, given that each person writes in their own way? 5. And how many kinds of blocks do we have? All the language constructs, calls of single-argument and multi-argument functions, namespaces, classes, traits, interfaces, block positioning, statics, visibility scopes and so on. And that assumes each person uses only one writing style that differs from everyone else's. What if there are 10 people? What about 50? Clearly, as the head count grows, the number of variants levels off, because there is a limited number of ways to write the same block. And this is only the first level, which does not even include one of the main problems of programming: naming variables. The multiplier effect and all the possible combinations are not taken into account either. Throw in separator styles, indentation with spaces versus tabs, a love of if spaghetti and so on. Meanwhile new specialists keep arriving and old ones keep leaving. The output is a huge unreadable codebase that is impossible to get used to, and the programmers in the company can never develop a well-trained neural network for it.
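The multiplier effect can be estimated with a rough back-of-the-envelope sketch. The counts below are made-up assumptions (how many plausible ways exist to write each construct), not measurements; the point is only how quickly the combinations multiply and where they level off.

```python
from math import prod

# Hypothetical numbers of plausible ways to write each kind of block.
WAYS_PER_CONSTRUCT = {
    "if condition": 6,
    "function call": 4,
    "class layout": 5,
    "indentation": 3,
}


def styles_in_use(team_size, ways):
    """Each developer brings one personal style, capped by how many ways exist."""
    return min(team_size, ways)


def style_combinations(team_size):
    """Upper bound on the distinct style mixes a reader may run into."""
    return prod(styles_in_use(team_size, ways)
                for ways in WAYS_PER_CONSTRUCT.values())


solo = style_combinations(1)        # one author, one style everywhere
small_team = style_combinations(5)  # five personal styles per construct
large_team = style_combinations(50) # capped by the ways that exist at all
```

Under these assumed counts a single shared standard corresponds to `style_combinations(1)`: the reader's "network" has exactly one shape to learn per construct.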

Our brains are lazy

Another interesting property of the central processor in our skull is its extremely negative reaction to new information. Our brain is very lazy. At about 1.5–2% of total body weight, the brain consumes roughly 20–25% of the body's energy. One of the most resource-intensive operations for the brain was, is and remains concentration. You can hold maximum concentration for 20-25 minutes (hello, Pomodoro technique), and in that time the brain will burn through as much glucose as it would in a whole day of subjective rest. Processing new data is extremely expensive, and one of the brain's main goals is to save those resources.

This comes from how our mental model works. It is a kind of psychological blocker, and it looks something like this. You start learning something new. The new material comes hard, and with no tangible sense of progress your self-esteem starts to drop, with the thought "I'm stupid!" hovering in the background. Our psyche is built so that our whole outlook depends on self-esteem, and our success in society depends, in turn, on our outlook. So, in order not to break the basic social elevators that rest on these adaptive dependencies, the brain begins to resist activities that lower self-esteem. As a result, you feel like watching a TV series, reading Habr, going for a coffee, scrolling social networks and so on. Anything except studying Landau's field theory. Or fixing a hard-to-reproduce bug. Or ... reading low-quality code that is hard to figure out.

A great example of how the brain behaves under these constraints is people of about 70-80 who flatly refuse to learn a smartphone or the 2 buttons on a robot vacuum cleaner. The resources are running out. The brain desperately blocks the learning process and falls back on rules of thumb. While you are young, these resources are plentiful. As time passes and you grow older, there are fewer and fewer of them, and the decline is fairly linear. Fortunately, this is offset by the growing power of our neural networks. Unless we are talking about deliberate, targeted degradation, of course.

Some stereotypes

A common misconception you may have heard: "Well, I'm a pro, I can read any code. I write fast and well, solving the company's problems. It doesn't matter what my code looks like if it solves the problem. If the others can't figure it out quickly, that's on them; I have no problem with it." If you have a coder like that around you, you can tell them: "Dude, I have bad news for you. You are affecting the whole team, and this approach is the reason others write more slowly while the super-efficient programmer quickly cranks out code."

A super-efficient programmer who writes without formatting and standardization is actually super-efficient in 2 things:

1. Closing tasks super efficiently without bothering about formatting and style.
2. Slowing down everyone else super efficiently, because after their changes people spend a long time trying to understand what happened here.

Why code style is needed

Although this should already be obvious, it is worth spelling out the main points.

Code style:

1. Ensures linear development of the project regardless of the size of the codebase. If understandable code has been written consistently throughout the project's history, then no matter how many developers come and go, code quality stays even, which lets the project grow dynamically whatever its size.

2. Significantly speeds up the onboarding of new programmers. If your code is written clearly, a new specialist will quickly train their neural network to identify blocks and start being useful. There is such a thing as an employee's break-even point: the point from which the employee brings only benefit. If the code is written clearly, new employees only need to learn the business logic rather than learn to read your code. And the faster they do that, the sooner they stop asking other specialists tons of questions and taking up their time. The time of specialists who have passed the break-even point is far more expensive for the team and the company in terms of the potential product value they deliver.

3. Removes dependence on particulars. You no longer have to keep stumbling over someone else's originality and idiosyncratic formatting.

4. Minimizes the effect of the mental blocker when studying new code. Your brain resists less because there is no need to dig into someone else's style. Reading clear code takes far fewer mental resources.

5. Minimizes reputational losses. Very soon after joining a new company, a programmer starts sharing impressions with former colleagues and friends. They will either say that everything is great here or point out the downsides of working with the code. In a sense, this gives you an HR bonus: if you want strong programmers to work with you, build a good project. This does not always matter, and some companies do not look at code quality at all, only at shipping features to production, but it is a nice bonus. It is no secret that a frequent reason for leaving is fatigue from constantly hacking through a poor-quality codebase.

6. Shapes and nurtures a development culture. A programmer's task sits at a lower level than the future of the whole company, but it is important to get across that the clarity and readability of code today affects the pace of development later. If code is hard to read and not standardized, and can be refactored and scaled only with pain and suffering, then development speed will drop as the project's codebase grows. The more low-quality code there is, the harder it is to write new code, the slower the product develops, the harder it is for the company to grow, and the harder it is to pay you more, because a lot of money goes into keeping the project alive with ever more new employees whose onboarding time is mostly spent not on learning how to benefit the company but on classifying the drugs under which this code was written.

Perfect code

We all understand that it does not exist. In the canonical sense, code style is about code formatting. And all would be well if that were the whole story. In long-term development, code style is a broader concept that also includes development principles. Splitting code into logically isolated blocks in classes or files is also a matter of design. So are naming, visibility levels and inheritance. All of these tools are not only about interaction but also about the design of your complex mechanism, where every brick plays a role.

Concepts change from language to language. Particulars come to the fore. But the foundation remains unchanged. Quality code is determined by just two criteria:

1. The code is anonymous.
2. The code reads like a book.

Anonymous code

The ideal state you should arrive at in this process is called anonymous code. I am sure many of you, having opened a working project and a random component, can say without hesitation which of your colleagues wrote it. Almost everyone has a style. Someone likes if spaghetti, someone sprinkles lambdas everywhere whether they are needed or not, someone likes to shorten variable names to save a single character. With anonymous code, you cannot do this. In ideal, depersonalized code it is impossible to identify the author. And when that is the case, you have reached the final point of your code style.

Book Level Readability

Remember the 2 main problems of programming? Cache invalidation and naming variables. We will skip the dark past of caching and go straight to our pain: naming variables, classes, objects, files, methods, constants and so on. This problem has 2 extremes: names that are too short and names that are too long. Common sense tells us that the answer lies somewhere near the middle. Here the super-efficient programmers knock on our door and tell us that they don't care. But why theorize? Tell me, what does this code do?

And this one?

The second version reads directly and easily. With naming like this, you do not need to know the business logic to understand what the code does. In this example, we simply fetch everything from the repository, then use a builder to create the collection, then apply a filter. You do not even need to dig into the code to understand what is happening. It is like reading a book by its headings. But naming is not everything.
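The contrast between terse and descriptive naming can be sketched as follows. This is not the code shown in the talk; every class and function name here (`SubscriberRepository`, `ActiveSubscriberFilter` and so on) is a hypothetical stand-in that follows the repository, builder and filter steps described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    name: str
    is_active: bool


class SubscriberRepository:
    """Hypothetical repository: hides where subscribers are stored."""

    def __init__(self, subscribers):
        self._subscribers = list(subscribers)

    def find_all(self):
        return list(self._subscribers)


class SubscriberCollectionBuilder:
    """Hypothetical builder: assembles an immutable collection."""

    def build(self, subscribers):
        return tuple(subscribers)


class ActiveSubscriberFilter:
    """Hypothetical filter: keeps only active subscribers."""

    def apply(self, collection):
        return tuple(s for s in collection if s.is_active)


# Terse version: it works, but the reader has to reverse-engineer the intent.
def p(d):
    t = tuple(d)
    return tuple(i for i in t if i.is_active)


# Descriptive version: the same steps, readable like a table of contents.
def find_active_subscribers(repository):
    all_subscribers = repository.find_all()
    collection = SubscriberCollectionBuilder().build(all_subscribers)
    return ActiveSubscriberFilter().apply(collection)


subscribers = [Subscriber("Anna", True), Subscriber("Boris", False)]
terse_result = p(subscribers)
readable_result = find_active_subscribers(SubscriberRepository(subscribers))
```

Both functions return the same result; the only difference is how much the reader has to decode before trusting it.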

Design patterns

You are asked about design patterns at almost every interview. Their stated purpose is a repeatable architectural solution. Their side effect is predictability and consistency. The moment you switch to an architecture based on design patterns, you get a fairly predictable system. That is, from the name alone you easily understand the purpose of a class or object. What does the Repository pattern do? It is an abstraction over storage with methods for fetching data. What does the Builder pattern do? It assembles a complex object or structure. And so on across the board.

There is also a pattern called Command. What can we do with it? Only execute it, of course! You do not need to study what is inside it. It is clear that it can be executed. Design patterns help you understand how your project works. If you wrote everything well, you will not be wondering: "What do I have in these directories?" From the names you can easily tell what is located where.
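A minimal sketch of that idea: the caller knows nothing about a command except that it can be executed. The names `SendWelcomeMessage` and `run_all` and the in-memory `outbox` list are hypothetical, chosen only to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """The whole contract of a command: it can be executed."""

    @abstractmethod
    def execute(self) -> None:
        ...


class SendWelcomeMessage(Command):
    """Hypothetical command that 'sends' a greeting into an in-memory outbox."""

    def __init__(self, outbox, recipient):
        self._outbox = outbox
        self._recipient = recipient

    def execute(self) -> None:
        self._outbox.append(f"Welcome, {self._recipient}!")


def run_all(commands):
    # The runner does not look inside a command; it only executes it.
    for command in commands:
        command.execute()


outbox = []
run_all([SendWelcomeMessage(outbox, "Anna"), SendWelcomeMessage(outbox, "Boris")])
```

The name of the class already tells the reader what will happen, and the base class tells them how to trigger it.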

In a way, every design pattern was born from a developer's pain over a particular problem. Someone already lived through that pain, gave it a shape and arrived at a solution in the form of one of the existing architectural models. What's more, all of this was built on top of the well-known SOLID.

SOLID

SOLID is an acronym for five principles of object-oriented programming. Note: five principles. Object-oriented. Programming. This is not a practice. This is not a recommendation. This is not some old programmer's wish to lecture the young. You will hear in many places that SOLID is a set of principles. But if you look at the SOLID manifesto, which described 6 signs of a bad project, you will find that SOLID is more than principles. These are the conditions under which your project will have flexibility, extensibility and behavior that is predictable for the developer. Failing to follow any one of the principles is a breach of your contract with the codebase, and it costs you one of those qualities. Whether it is ease of understanding, flexibility or dependence on abstractions: whatever you lose, it is the beginning of a project or component that you will end up rewriting. From scratch. Because in the resulting implementation it is much harder to do things correctly than to quickly add the next incorrect patch.
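As one illustration of "dependence on abstractions" (the dependency inversion principle, the D in SOLID), here is a small sketch. `MessageSender`, `InMemorySender` and `Greeter` are hypothetical names; the point is only that the high-level class is written against an abstraction and never names a concrete transport.

```python
from typing import Protocol


class MessageSender(Protocol):
    """Abstraction the business code depends on."""

    def send(self, recipient: str, text: str) -> None:
        ...


class InMemorySender:
    """Hypothetical implementation, handy for tests; a real one could use HTTP."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, recipient: str, text: str) -> None:
        self.sent.append((recipient, text))


class Greeter:
    """High-level logic: knows what to say, not how it is delivered."""

    def __init__(self, sender: MessageSender) -> None:
        self._sender = sender

    def greet(self, recipient: str) -> None:
        self._sender.send(recipient, "Hello!")


sender = InMemorySender()
Greeter(sender).greet("Anna")
```

Swapping the delivery mechanism means writing another class with a `send` method; `Greeter` stays untouched, which is the flexibility the paragraph above talks about.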

How many rewritten projects have you seen? How many of them have you taken part in yourself? More than one or two. In every company you will find a component or an entire system that was designed without a development culture: without a good code style, without design patterns, without regard for the SOLID conditions. Development departments rewrite code every day. Double work. Often the code is rewritten into the same hard-to-read components. And the only benefit of such rewrites is that the specialist becomes familiar with the business logic. That is, you rewrote the code and you think you understand it, but not because it has become simpler, lighter or better designed. In most cases people understand it now simply because they rewrote it themselves.

The conclusion to draw from this is that code style is a form of social contract within your community in the company. You agree on how to write code, taking into account the specifics of the business, the code itself, the architecture and the plans for changing the codebase. The agreement implies that everyone honors it. Teamwork is not so much working together as working in the interests of the team. It is not coworking; it is a common goal that everyone moves towards. As a result, one of the goals is clean code as a tool for further development.

How do you choose a code style? It does not matter! This is not a debate about taste. We are training a neural network, so what exactly goes into the training set is not important. What matters is that the blocks are the same; how they look does not. But it is better to adopt one of the styles established in open source, simply because more people know PSR than any custom in-house style. Not to mention the extremely widespread symfony2 code style, which is essentially an improvement on and extension of PSR. This, of course, applies to PHP. These styles have a well-thought-out block model built in that is easy to read, easy to extend and easy to maintain.

Why all this?

You probably want to know where all this leads. I will give an example from one of my previous jobs. It was a clean experiment, since code style had not concerned the team at all before I raised the question. When I came to the team and said that starting tomorrow such a funny beast as phpcs, along with a code style agreement, would be set up in everyone's PhpStorm, everyone got a little gloomy. Pushing innovation through by force is not my way, so I tried to make the case by presenting that pile of research on how programmers recognize patterns. Even so, I found support from only a couple of people who, like me, suffered badly from reading low-quality code.

I shrugged and started writing clean, well-designed (subjectively), isolated components. I could be wrong, right? Certainly. So I began giving the same people tasks of identical complexity, some in the new clean components and some in the old ones, gradually and carefully collecting statistics. After 2 months, I pulled up my spreadsheet in Google Sheets and showed it to the team. With clean code, the programmers' performance was 23% higher. Figuring that 7 people is not a representative sample, I quietly continued the practice in a parallel project where about 20 freelance programmers worked. The results were slightly worse. But a small group out of those 20-plus people wrote fairly clean and well-designed code, and on top of their code identical tasks were solved by the others 21% faster. I wanted to go further, but that was enough to win the team's support.

Slowly but surely, we cleaned up the poor design of the working project, and at some point 90% of the code followed all the development canons we had defined. What did this lead to? We hired our next middle PHP developer, and his onboarding took one month instead of the usual 3. And then another one ... Complex business logic and asynchronous interaction are always hard to pick up. But not in this case. Our onboarding turned into a quick study of the business logic rather than a classification of the drugs under which each piece of code was written. We reached a state where 1 month is enough before even a junior PHP developer is given a big, serious task. Conversations along the lines of "how does this work" were reduced to a minimum. And this really was a success story.

So, simply by not breaking rules that were written long before today, we managed to raise team performance by almost 25% practically for free and cut onboarding time by ⅔. At the same time, "dirty code" is effectively ruled out inside clean code, because no one will let you commit something incomprehensible into a debugged, working system. If the quality of the code is low from the start, though, no one will notice new worthless fragments appearing. In addition, there are plenty of tools today that monitor code quality and cleanliness automatically, for example a PhpStorm plugin such as SonarLint.

In many companies, poor-quality code is the reason for keeping a huge staff of senior developers whose main value is that they know the project and can untangle any mess. Think about it: the resources of highly qualified specialists are spent on fighting the consequences of a missing development culture rather than on building technically complex systems. To cope with the same problem, more money goes to a huge staff of senior QA engineers, with all the additional costs that come with them.

Sometimes I myself find it hard to believe that efficiency can grow by 21-27%. Okay, suppose our result was something out of the ordinary. But even if we assume savings of just 5%, then with 5 real working hours a day (excluding snacks, smoke breaks and so on), a team of 5 people saves about 75 minutes a day. Over roughly 250 working days, simple multiplication gives about 18,750 minutes per year, which is around 40 eight-hour person-days. Why not spend one of them installing phpcs in PhpStorm? And I am not even talking about the results of optimizing the work of 50 developers.
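The estimate is easy to re-derive. The sketch below uses the inputs stated above (a 5% saving, 5 productive hours a day, a team of 5) plus two assumptions the text does not spell out: roughly 250 working days per year and an 8-hour person-day. The exact minute counts shift with those assumptions, but the order of magnitude stays around 40 person-days.

```python
def yearly_savings(team_size, productive_hours_per_day, saving_percent,
                   working_days_per_year=250, hours_per_person_day=8):
    """Return (minutes saved per day, minutes saved per year, person-days per year)."""
    team_minutes_per_day = team_size * productive_hours_per_day * 60
    minutes_per_day = team_minutes_per_day * saving_percent / 100
    minutes_per_year = minutes_per_day * working_days_per_year
    person_days = minutes_per_year / 60 / hours_per_person_day
    return minutes_per_day, minutes_per_year, person_days


# Team of 5, 5 productive hours a day, 5% of reading time saved.
per_day, per_year, person_days = yearly_savings(5, 5, 5)

# The same saving for a hypothetical team of 50 developers.
_, _, person_days_for_50 = yearly_savings(50, 5, 5)
```

Because the saving scales linearly with head count, the 50-developer case is simply ten times the 5-developer one.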

PS: If there is interest in this topic, then in the next article I will share the story of how we set up this process at ManyChat, where to start and how to proceed.
