Using strict modules in large-scale Python projects: Instagram's experience. Part 1

This is the first part of our translation of the next article in a series on how Instagram works with Python. The first article in the series covered the characteristics of Instagram's server code: it is a monolith that changes frequently, and static type checking tools help keep it manageable. The second article was about typing the HTTP API. This one discusses approaches to some of the problems Instagram ran into when using Python in its project. The author hopes that Instagram's experience will be useful to others who may face similar problems.







Situation overview



Let's look at the following module, which, at first glance, looks completely innocent:



import re
from mywebframework import db, route

VALID_NAME_RE = re.compile("^[a-zA-Z0-9]+$")

@route('/')
def home():
    return "Hello World!"

class Person(db.Model):
    name: str
      
      





What code will be executed if someone imports this module?





Problem #1: slow server startup and restart



Compiling the regular expression, calling the route decorator and executing the body of the Person class (together with whatever db.Model does when it is subclassed) all happen at import time. The only line of code in this module that (probably) does not execute on import is return "Hello World!". Even that we cannot say with certainty! As a result, by importing this simple eight-line module (without even using it in our program) we may cause hundreds or even thousands of lines of Python code to run. And that is not to mention that importing this module modifies a global URL mapping located somewhere else in the program.
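To make the import-time behaviour concrete, here is a minimal sketch. The article does not show mywebframework, so the route decorator, the Model base class, URL_MAP and MODELS below are hypothetical stand-ins; only the module source is taken from the example above.

```python
import sys
import types

# Hypothetical stand-ins for the article's `mywebframework` (assumptions).
URL_MAP = {}   # global URL mapping, modified by the decorator
MODELS = []    # records every subclass of Model as it is created


def route(path):
    def decorator(func):
        URL_MAP[path] = func  # side effect: runs when the decorator is applied
        return func
    return decorator


class Model:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        MODELS.append(cls.__name__)  # side effect: runs at class creation


framework = types.ModuleType("mywebframework")
framework.route = route
framework.db = types.SimpleNamespace(Model=Model)
sys.modules["mywebframework"] = framework

MODULE_SOURCE = '''
import re
from mywebframework import db, route

VALID_NAME_RE = re.compile("^[a-zA-Z0-9]+$")

@route('/')
def home():
    return "Hello World!"

class Person(db.Model):
    name: str
'''


def import_from_source(name, source):
    """Execute module source the way an import would, without a file on disk."""
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    sys.modules[name] = module
    return module


mod = import_from_source("innocent_module", MODULE_SOURCE)
print(sorted(URL_MAP))  # ['/']
print(MODELS)           # ['Person']
```

Nothing in the module was called explicitly, yet the regular expression is compiled, the URL mapping has a new entry and the model class is registered. The body of home() is the only part that has not run.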



So what? This is partly a consequence of Python being a dynamic, interpreted language. That dynamism lets us solve all sorts of problems with metaprogramming techniques. But what, exactly, is wrong with this code?



In fact, nothing is wrong with it, as long as it is used in a relatively small codebase maintained by a small team, and as long as everyone who uses it can be relied on to keep some discipline in how Python's features are used. But some aspects of this dynamism become a problem when a project has millions of lines of code and hundreds of programmers working on it, many of whom do not have deep Python knowledge.



For example, one of the great things about Python is the speed of iterative development: the result of a code change is visible almost immediately, with no compilation step. But in a project of several million lines (with a rather tangled dependency graph), this advantage starts to turn into a disadvantage.



It takes more than 20 seconds to start our server. And sometimes, when we do not pay enough attention to optimization, this grows to about a minute. This means a developer needs 20-60 seconds to see the result of a code change. That applies both to what is visible in the browser and to running unit tests. Unfortunately, that is long enough for a person to get distracted and forget what they were doing. Most of this time is literally spent importing modules and creating functions and classes.
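A rough way to see where import time goes is to time individual imports. The sketch below is only an illustration: timed_import is a hypothetical helper, and standard-library modules stand in for application modules. CPython also offers the -X importtime command-line option (Python 3.7 and later), which prints a per-module breakdown.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, elapsed_seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return module, elapsed


# Standard-library modules stand in for application modules here.
for name in ("json", "decimal", "email.parser"):
    _, seconds = timed_import(name)
    print(f"{name}: {seconds * 1000:.2f} ms")
```

A module that is already cached in sys.modules is returned almost instantly, so the numbers only mean something for the first import in a fresh process.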



In a way, this is the same as waiting for a program in some other language to compile. But compilation can usually be done incrementally: you recompile only what has changed and what directly depends on the changed code. As a result, compilation after a small change is usually quick. In Python, however, because imports can have arbitrary side effects, there is no reliable and safe way to restart the server incrementally. However small the change, we have to restart the server completely every time, importing all modules, re-creating all classes and functions, recompiling all regular expressions, and so on. Usually 99% of the code has not changed since the last restart, but we still have to redo all of this work to pick up the changes.
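One reason incremental reloading is unsafe can be shown with importlib.reload, which re-executes a module's body and therefore repeats its side effects. This is a minimal sketch; demo_registry and demo_views are hypothetical names, with the registry standing in for something like a framework's global URL mapping.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical global registry, standing in for a framework's URL mapping.
registry = types.ModuleType("demo_registry")
registry.HANDLERS = []
sys.modules["demo_registry"] = registry

VIEWS_SOURCE = (
    "import demo_registry\n"
    "demo_registry.HANDLERS.append('home')  # import-time side effect\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_views.py").write_text(VIEWS_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        sys.modules.pop("demo_views", None)
        views = importlib.import_module("demo_views")
        after_import = list(registry.HANDLERS)
        importlib.reload(views)  # re-executes the module body
        after_reload = list(registry.HANDLERS)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_views", None)

print(after_import)  # ['home']
print(after_reload)  # ['home', 'home']
```

The handler ends up registered twice. A reloader cannot know in general which side effects are safe to repeat and which must be undone first, which is why a full restart is the only dependable option.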



Besides slowing developers down, this also wastes a significant amount of system resources. We deploy changes continuously, which means the production servers are constantly reloading code.



So here is our first problem: slow server startup and restart. It arises because the system has to repeat a large amount of the same work every time code is imported.



Problem #2: side effects of unsafe imports



Here is another thing that, as it turned out, developers often do at import time: load settings from a network configuration store.



 MY_CONFIG = get_config_from_network_service()
      
      





Besides slowing server startup, this is also unsafe. If the network service is unavailable, we do not merely get errors for the requests that depend on it: the server fails to start at all.
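The difference between failing at import and failing at first use can be sketched as follows. This is a common mitigation and not necessarily the approach the article goes on to propose. get_config_from_network_service is taken from the example above, but its body here, along with SERVICE_STATE, ServiceUnavailable and get_config, is a hypothetical stand-in.

```python
import functools


class ServiceUnavailable(Exception):
    """Raised by the stand-in service when it is 'down' (assumption)."""


# Mutable flag that simulates whether the network service is reachable.
SERVICE_STATE = {"up": False}


def get_config_from_network_service():
    # Hypothetical stand-in for a real network call.
    if not SERVICE_STATE["up"]:
        raise ServiceUnavailable("config service is down")
    return {"feature_x": True}


# Import-time version: with the service down, this line would raise while the
# module is being imported, and the server would never start.
# MY_CONFIG = get_config_from_network_service()


@functools.lru_cache(maxsize=None)
def get_config():
    """Fetch the config on first use and cache the successful result."""
    return get_config_from_network_service()
```

With the deferred version, an outage affects only the requests that actually call get_config(); lru_cache does not cache exceptions, so a later call retries once the service is back.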



Let's make things worse and imagine that someone has added code that runs at import time to the module responsible for initializing an important network service. The developer simply did not know where to put this code, so they placed it in a module that is imported early in server startup. It seemed to work, so the solution was accepted and work went on.



But then someone else added a seemingly harmless import statement somewhere else. Through a chain of imports twelve modules deep, this caused the module that loads settings from the network to be imported before the module that initializes the corresponding network service.



Now we are trying to use the service before it has been initialized, and the system naturally crashes. In the best case, where the interactions are completely deterministic, a developer spends an hour or two working out how a minor change broke something seemingly unrelated to it. In more complex situations, this can bring the project down in production. And there is no general way to detect or prevent such problems with a linter.



The root of the problem lies in two factors, the interaction of which leads to devastating consequences:



  1. Python allows modules to have arbitrary and unsafe side effects that occur during import.
  2. The import order of the codebase is not explicitly defined or controlled. At the scale of a project, it is an emergent property of the import statements in all of the modules, and it can vary depending on which entry point to the system is used.
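The interaction of these two factors can be reproduced in a few lines. In the sketch below, demo_service, demo_bootstrap and demo_settings are hypothetical modules written to a temporary directory; the two import orders stand in for two different entry points.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical modules: a service, the code that initializes it at import
# time, and a settings module that uses the service at import time.
MODULES = {
    "demo_service": (
        "_client = None\n"
        "def init():\n"
        "    global _client\n"
        "    _client = 'connected'\n"
        "def fetch_config():\n"
        "    if _client is None:\n"
        "        raise RuntimeError('service used before initialization')\n"
        "    return {'debug': False}\n"
    ),
    "demo_bootstrap": "import demo_service\ndemo_service.init()\n",
    "demo_settings": (
        "import demo_service\n"
        "MY_CONFIG = demo_service.fetch_config()\n"
    ),
}


def run_entry_point(import_order):
    """Simulate a fresh process that imports modules in the given order."""
    for name in MODULES:
        sys.modules.pop(name, None)
    try:
        for name in import_order:
            importlib.import_module(name)
    except RuntimeError as exc:
        return f"failed: {exc}"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    for name, source in MODULES.items():
        Path(tmp, f"{name}.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        result_a = run_entry_point(["demo_bootstrap", "demo_settings"])
        result_b = run_entry_point(["demo_settings", "demo_bootstrap"])
    finally:
        sys.path.remove(tmp)
        for name in MODULES:
            sys.modules.pop(name, None)

print(result_a)  # ok
print(result_b)  # failed: service used before initialization
```

The module sources are identical in both runs; only the import order differs, and that alone decides whether startup succeeds.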


To be continued…



Dear readers! Have you encountered problems regarding slow startup of Python projects?







