What I learned from a leading programmer

A year ago, I started working full time at Bloomberg. And then I decided to write this article. I thought I would be full of ideas that I could throw out on paper when the time comes. But a month later I realized that everything will not be so simple: I have already begun to forget what I learned. Either the knowledge was so well acquired that my mind made me believe that I always knew it, or they just flew out of my head. ^one

This is one of the reasons I started keeping a diary. Every day, getting into interesting situations, I described them. And all thanks to the fact that I was sitting next to a leading programmer. I could closely observe his work, and saw how different it was from what I would do. We programmed a lot together, which made my observations even easier. Moreover, our team does not condemn “spying” on people writing code. When it seemed to me that something interesting was happening, I turned and looked. Thanks to constant rising, I was always aware of what was happening.

I spent a year next to a leading programmer. This is what I learned.

Content

Code writing

How to name things in code

One of my first tasks was working on React UI. We had a main component that contained all the other components. I like to add a bit of humor to the code, and I wanted to name the main component of GodComponent

. The moment came to review the code, and I realized why it is so difficult to give names.

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

- Leon Bambrick (@secretGeek) January 1, 2010

Each piece of code that I have dubbed has gained an implicit purpose. GodComponent

? This is the component that gets all the rubbish that I don’t want to put in the right place. It contains everything. Name it LayoutComponent

, and the future I would decide that this component assigns a layout. That it does not contain a state.

Another important lesson I learned was that if something looks too big, like a LayoutComponent

with a bunch of business logic, then it's time to refactor it, because there shouldn't be business logic here. And in the case of the name GodComponent

presence of business logic will not matter.

Need to name clusters? Calling them after the services that run on them will be a great idea until you run something else on these clusters. We gave them a name in honor of our team.

The same thing applies to functions. doEverything()

is a terrible name with many consequences. If the function does everything, it will be damn difficult to test its individual parts. No matter how big such a function may become, it will never seem too strange to you, because it should do everything. So change the name. Refact.

A meaningful name has a downside. Suddenly the name will be too meaningful and hide some kind of nuance? For example, closing sessions does not close the database connection when session.close()

called in SQLAlchemy. I should have read the documentation and prevented this bug, more about this in the bike section.

From this point of view, naming functions as x

, y

, z

instead of count()

, close()

, insertIntoDB()

does not allow me to put a certain meaning into them and makes me carefully monitor what these functions do. ²

I never thought that I would write about the principles of naming more than one line of text.

Inherited code and the next developer

Have you ever looked at the code and it seems strange to you? Why did you write that? This does not make sense.

I had a chance to work with the inherited code base. This, you know, with comments like "Uncomment the code when Muhammad understands the situation." What are you doing here? Who is Muhammad?

I can switch roles and think about the person who will then be given my code, will it seem strange to him? Partially solving this problem helps peers review your code. This led me to think about the context: I need to remember the context in which my team works.

If I forget this code, return to it later and cannot restore the context, I will say: “What the hell did they do? This is stupid ... Ah, wait, I did it. ”

And here the documentation and comments in the code come into play.

Documentation and comments in code

They help maintain context and convey knowledge. As Lee said in How to Build Good Software :

The main value of software is not in the created code, but in the knowledge accumulated by the people who created this software

You have an open client API endpoint that no one seems to have ever used. Does it just need to be deleted? Generally speaking, it is a technical duty. And if I tell you that in one of the countries 10 journalists send their reports to this endpoint once a year? How to check it? If this is not mentioned in the documentation (it was), then there is no way to check it. We have not checked. They removed it, and after a few months the same annual moment came. Ten journalists were unable to send their important reports because the endpoint no longer existed. And people with product knowledge have already left the team. Of course, now there are comments in the code explaining why this is necessary.

As far as I know, each team is fighting with the documentation. Moreover, with the documentation not only by code, but also by the processes associated with it.

We have not yet come up with the perfect solution. Personally, I like the way Antirez divided comments in code into different types of values .

Atomic commits

If you need to roll back (and you need it. See the Testing chapter), will this commit make sense as a single module?

How to confidently delete lousy code

I was very unpleasant to delete the lousy or outdated code. It seemed to me that everything written centuries ago is sacred. I thought: "They had something in mind when they wrote like that." This is a confrontation between tradition and culture on the one hand, and thinking in the style of the “primary principle” on the other. This is the same as with the removal of the annual endpoint. I learned a special lesson. ³

I would try to get around the code, and the leading developers would try to get through it. Erase it. An if expression that cannot be accessed? Yeah, we erase. What have I done? I just wrote my function on top of it. I have not reduced technical debt. Anyway, I just increased code complexity and forwarding. It will be even more difficult for the next person to put together the pieces of the picture.

Empirically, I came to the conclusion: there is a code that you don’t understand, but there is a code that you will definitely never contact. Erase the code that you don’t address, and be careful with the code that you don’t understand.

Code review

A code review is a great tool for self-education. This is an external feedback loop showing how they would write the code and how you wrote it. What is the difference? One way better than another? I asked myself about this with each review: “Why did they write this way?” And if I couldn’t find a suitable answer, I went and asked.

After the first month, I began to find errors in the code of my colleagues (as they found in mine). It was some kind of madness. The review became much more interesting for me, it turned into a game that I was missing, a game that improved my "sense of code."

In my experience, you don’t have to approve the code until I understand how it works.

My github statistics.

Testing

I fell in love with testing so much that it’s unpleasant for me to write code in a codebase without tests.

If your application does only one thing (like all my school projects), then you can still test manually. ⁴ That's exactly what I did. But what happens if an application performs 100 different tasks? I don’t want to spend half an hour testing, and sometimes I lose sight of something. Nightmare.

Tests and test automation help here.

I treat testing as documentation. This is the documentation of my ideas about code. Tests tell me how I (or anyone else before me) represent the work of the code and where something expected should go wrong.

Today, when I write tests, I try:

Show how to use the test class, function or system.
Show what, in my opinion, might go wrong.

As a result, most often I test behavior, but not implementation ( here is an example that I chose during the breaks at Google).

In paragraph 2, I did not mention the sources of bugs.

When I notice a bug, I make sure that the fix has an appropriate test (this is called regression testing) to document the information. This is another reason why something might go wrong. ⁵

Of course, the quality of my code improves not because I write tests, but because I write code. But reading the tests helps me better understand situations and write better code.

This is the general situation with testing.

But this is not the only kind of testing that I apply. I am talking about deployment environments. You may have ideal unit tests, but if you do not have system tests, something like this may happen:

This also applies to well-tested code: if you do not have the necessary libraries on your computer, then everything will crash.

There are machines you are developing on (the source of all memes like “It worked on my computer!”).
There are machines on which you are testing (may coincide with the previous ones).
Finally, there are machines on which you are deploying (they should not match the machines on which you developed).

If the testing and deployment environments do not match on the machines, you will have problems. Deployment environments will help to avoid this.

We are doing local development in Docker on our computer.

We have a development environment, these computers are equipped with a set of libraries (and development tools), and here we install the written code. Here it can be tested with all the necessary systems. We also have a beta / staging environment that fully repeats the operational environment. Finally, we have an operational environment — machines that run code for our customers.

The idea is to catch errors that have not surfaced during unit and system testing. For example, the difference in API between the requesting and responding systems. I think in the case of a personal project or a small company, the situation may be completely different. Not everyone has the opportunity to create their own infrastructure. However, you can resort to cloud services such as AWS and Azure.

You can configure individual clusters for development and operation. AWS ECS uses Docker images to deploy, so processes in different environments will be relatively consistent. There are nuances in terms of integration between different AWS services. Are you invoking the right endpoint from the right environment?

You can go even further: download alternative container images for other AWS services and configure a local, fully functional Docker-Compose-based environment. This speeds up the feedback loop. ⁶ Perhaps I will gain more experience when I create and launch my side project.

Risk reduction

What steps can you take to reduce the risk of disaster? If we are talking about a new radical change, then how can we verify the minimum duration of downtime, if something goes wrong? “We don’t need to fully deploy the system because of all these new changes.” Really? And why didn’t I think about it!

Architecture

Why am I talking about architecture after writing code and testing? It can be installed first, but if I had not programmed and tested in the environment I used, I probably would not have succeeded in creating an architecture that takes into account the features of this environment. ⁷

You need to think a lot when creating an architecture.

How will numbers be used?
How many users will there be? How much can their number increase (the number of rows in the database depends on this)?
What reefs can meet?

I need to turn this into a checklist called “Requirements Collection”. Right now I don’t have enough experience, I’ll try to do it next year at Bloomberg. This process is largely contrary to Agile: how much can you design the architecture before moving on to implementation? It's all about balance, you need to choose when and what you will do. When does it make sense to rush forward, and when - to step back? Of course, collecting requirements is not tantamount to considering all issues. I think it pays off if you include development processes in the design. For example:

How will local development proceed?
How will we pack and deploy?
How will we conduct end-to-end testing?
How will we conduct stress testing of the new service?
How will we keep secrets?
CI / CD integration?

We recently developed a new search engine for BNEF . It was wonderful to work on it, I organized a local development and found out about DPG (packages and their deployment), levying deployment of secrets.

Who would have thought that deploying secrets to a prod could be so nontrivial:

They cannot be placed in code, because someone can notice them
To store them as an environment variable as the spec offers 12 application factors? Not a bad idea, but how to put them there? (Going to the prod to fill in environment variables every time the car starts is a pain)
Deploy them as files? But where will they come from and how to fill them?

We do not want to do everything manually.

As a result, we came to a database with role-based access control (only we and our computers can communicate with the database). Our code receives secrets from the database at startup. This approach is well replicated within the framework of development, staging and operation environments; secrets are stored in appropriate databases.

Again, with cloud services like AWS, the situation may be completely different. You do not need to take care of secrets somehow. Get an account for your role, enter the secrets in the interface, and your code will find them when they are needed. This greatly simplifies everything, but I'm glad that I got the experience thanks to which I can appreciate this simplicity.

We create architecture, not forgetting about maintenance

System design is inspiring. And the escort? Not too much. My journey through the world of escorts led me to the question: why and how do systems degrade? The first part of the answer is not related to the decommissioning of all obsolete, but only the addition of a new one. The tendency to add rather than delete (doesn’t it remind anything?). The second part is design with the thought of the ultimate goal. A system that, over time, begins to do what it was not intended for, will not necessarily work as well as a system that was originally designed for the same tasks. This is a step backward style approach, not tricks and tricks.

I know at least three ways to reduce the rate of degradation.

Separate business logic and infrastructure: infrastructure usually degrades - load increases, frameworks become obsolete, zero-day vulnerabilities appear, etc.
Create processes for future support. Apply the same updates for old and new bits. This will prevent differences between the old and the new and keep all the code in a “modern” state.
Make sure to throw away everything that is unnecessary and old.

Deployment

Will I pack features together or deploy them one at a time? Depending on the current process, if you pack them together, then wait for trouble. Ask yourself why you want to pack features together?

Deployment takes a lot of time?
Is code review not too fun?

Whatever the reason, this situation needs to be fixed. I know at least two problems associated with packaging:

You yourself block all features if one of them has a bug.
You increase the risk of problems.

Whatever deployment process you choose, you always want your cars to be like livestock, not like pets. They are not unique. You know exactly what is executed on each machine, how to recreate them in case of death. You will not be upset if any car dies, you just pick up a new one. You graze them, not grow.

When something goes wrong

In case something goes wrong - and it does - there is a golden rule: minimize the impact on customers. In the event of a failure, my first desire was always to fix it. This does not seem to be the optimal solution. Instead of fixing, even if it can be done in one line, you need to roll back first. Return to the previous operating state. This is the fastest way to return customers to a working version. Only then I find out what the problem is and fix it.

The same applies to the “damaged” machine in your cluster: turn it off, mark it as inaccessible, before finding out what happened to it. I find it strange how much my natural desire and instincts contradict the optimal solution.

I think this instinct also led to the fact that I fixed bugs longer. Sometimes I realized that something didn’t work, because the code I wrote was somehow wrong, and I climbed into the wilds, looking at each line. Something like a search “first in depth”. And when it turned out that the problem arose due to a configuration change, that is, I did not check it in the first place, this situation unsettled me. I wasted my time looking for a bug.

Since then I have learned to search “first in breadth”, and therefore already “first in depth”, in order to exclude higher-level reasons. What exactly can I confirm with current resources?

Does the car work?
Is the code installed correctly?
Is there a configuration?
<Code-specific configuration>, such as whether routing is spelled out correctly?
Is the schema version correct?
And then I plunge into the code.

We thought that nginx was installed incorrectly, but it turned out just the configuration was disabled

Of course, I do not need to do this every time. Sometimes just an error message is enough to immediately get down to the code. When I cannot determine the cause, I try to minimize the number of changes-in-code-in order to find the reason. The fewer changes, the faster I can find the real root of the problem. In addition, now I have a memo for bugs that saved me more than an hour thinking “what did I miss?” Sometimes I forget about the simplest checks, such as setting up routing, matching scheme versions and service, etc. This is another step in mastering the technology stack that I use, and what you gain only with experience is intuition in determining what exactly does not work.

Bike

This article cannot be complete without a story. I like to read them, and I want to share one of them with you. SQLAlchemy. BNEF , . . SQLAlchemy, , Solr. - .

«MYSQL server has gone away.» . , , . , . , . , ?

, ? , , . , , __exit__()

session.close()

.

, , . . . . .

Session.close()

MySQL- SQLAlchemy , NullPool. . , , . : StackOverflow (, !) , , SQL- . , . , , (), .

«» , 1 8. , , — .

, .

Monitoring

, . , , . , .

, , , . , . , . « ?! , ? ".

, : , . , , . , , . , . — ? , , . . , -, , , , , .. .

. , , ? (, AWS CloudWatch Grafana). .

. , , 50 %, — . ? . , — (, ?).

. , , , , ? ? , ?

, . , . — - .

Conclusion

. , , , . , - !

. , !

, . , — How to Build Good Software .

. : ! , .

?
? , ? , ?
. , ? ?
(utils) (, , , ) , « »?
?
, , - ?
— API , ?
? , .
, , . , , .
PR: « , , 52 , , , , , ». ?
. ?
?
?

Notes

. , ? - ? , ?
, x()

, y()

z(),

x()

, y()

z()

. , WYSIATI .
.
. , ?
, - , , . , .
, , Docker- AWS.
.

All Articles