Using strict modules in large-scale Python projects: Instagram's experience. Part 2

This is the second part of a translated article on how Instagram works with modules in its Python projects. The first part gave an overview of the situation and described two problems: slow server startup and the side effects of unsafe import statements. This part continues the discussion. We look at one more problem and then at an approach to solving all three.







Issue #3: mutable global state



Take a look at another category of common mistakes.



    def myview(request):
        SomeClass.id = request.GET.get("id")





Here we are in a view function, and we attach an attribute to a class based on data received from the request. You have probably already spotted the problem: classes are global singletons, and here we are putting request-dependent state into a long-lived object. In a long-running web server process, this can pollute every future request handled by that process.
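The leak can be reproduced in a few lines. This is only a sketch: FakeRequest and other_view are hypothetical stand-ins invented for the illustration, not part of any real framework or of Instagram's code.

```python
class SomeClass:
    id = None  # class attribute: one value shared by the whole process


class FakeRequest:
    """Hypothetical stand-in for a web framework's request object."""

    def __init__(self, params):
        self.GET = params


def myview(request):
    # Bug: request-specific data is stored on a process-wide object.
    SomeClass.id = request.GET.get("id")
    return SomeClass.id


def other_view(request):
    # This view never sets SomeClass.id, yet it sees the earlier request's value.
    return SomeClass.id


myview(FakeRequest({"id": "42"}))
leaked = other_view(FakeRequest({}))
print(leaked)  # the id that belonged to the previous, unrelated request
```

Storing the value in a local variable, or on an object created per request, would keep it from outliving the request.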



The same thing can easily happen in tests, in particular when programmers monkey-patch objects without using a context manager such as mock.patch. Here the result is pollution not of requests but of every test that later runs in the same process. This is a major source of unreliable behavior in our test suite. It is a significant problem, and one that is very hard to prevent. As a result, we gave up on running all tests in one shared process and switched to a test isolation scheme that can be described as “one test per process”.
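The difference between a bare monkey patch and a patch applied through a context manager can be sketched as follows. Config and the two test functions are hypothetical names made up for this illustration. mock.patch.object is the standard library helper that restores the original attribute when the with block exits.

```python
from unittest import mock


class Config:
    timeout = 30


def test_with_bare_monkey_patch():
    # The patch is never undone, so it outlives this test.
    Config.timeout = 1
    assert Config.timeout == 1


def test_with_context_manager():
    # mock.patch.object restores the original value on exit.
    with mock.patch.object(Config, "timeout", 1):
        assert Config.timeout == 1


test_with_context_manager()
after_safe_patch = Config.timeout  # back to the original value

test_with_bare_monkey_patch()
after_bare_patch = Config.timeout  # still patched: later tests see 1
```

Any test that runs after test_with_bare_monkey_patch in the same process now sees the patched value, which is exactly the kind of cross-test pollution described above.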



So this is our third problem. Mutable global state is not unique to Python; it can be found anywhere. Here we are talking about classes, modules, lists or dictionaries attached to modules or classes, and singleton objects created at module level. Working in such an environment requires discipline: to avoid polluting global state while the program runs, you need to know Python very well.



Introducing strict modules



One root cause of our problems may be that we are using Python for tasks it was not designed for. In small teams and small projects, where everyone follows the rules, Python works just fine. So perhaps we should switch to a stricter language.



But our codebase has long outgrown the size at which rewriting it in another language could even be discussed. More importantly, despite all the problems we face, Python has a lot going for it: it gives us more good than bad, and our developers really like the language. So it is up to us to make Python work at our scale and to make sure we can keep working on the project as it grows.



The search for solutions to these problems led us to one idea: strict modules.



Strict modules are a new kind of Python module, marked by the statement __strict__ = True at the top of the module. They are implemented using many of the low-level extensibility mechanisms that Python already has. A special module loader parses the code with the ast module, performs an abstract interpretation of the loaded code to analyze it, applies various transformations to the AST, and then compiles the AST back into Python bytecode with the built-in compile function.
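The parse, analyze, transform, compile sequence can be sketched with the standard library alone. This is a minimal illustration, not Instagram's loader: is_marked_strict and load_module are hypothetical helpers, and the analysis and transformation steps are left as a placeholder comment.

```python
import ast
import types


def is_marked_strict(tree):
    """Return True if the module body has a top-level `__strict__ = True`."""
    for node in tree.body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "__strict__"
            and isinstance(node.value, ast.Constant)
            and node.value.value is True
        ):
            return True
    return False


def load_module(name, source):
    """Parse the source, leave room for analysis, compile, and execute."""
    filename = f"<{name}>"
    tree = ast.parse(source, filename=filename)
    if is_marked_strict(tree):
        # A real loader would analyze the tree and rewrite it here.
        ast.fix_missing_locations(tree)
    code = compile(tree, filename, "exec")  # compile() accepts an AST directly
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


SOURCE = '''
__strict__ = True
MY_LIST = [1, 2, 3]
MY_DICT = {x: x + 1 for x in MY_LIST}
'''

demo = load_module("demo", SOURCE)
print(demo.MY_DICT)
```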



No side effects on import



Strict modules impose some restrictions on what can happen at module level. All module-level code (including decorators and any functions or initializers called at module level) must be pure, that is, free of side effects and of I/O. The abstract interpreter checks these conditions by static analysis at compile time.



This means that importing a strict module causes no side effects. Code executed during module import can no longer cause unexpected problems. Because we check this at the level of abstract interpretation, with tools that understand a large subset of Python, we do not need to restrict Python's expressiveness too much. Many kinds of dynamic code that are free of side effects can safely be used at module level. This includes various decorators and module-level constants defined with list or dictionary comprehensions.



To make this clearer, consider an example. Here is a valid strict module:

    """Module docstring."""
    __strict__ = True

    from utils import log_to_network

    MY_LIST = [1, 2, 3]
    MY_DICT = {x: x + 1 for x in MY_LIST}


    def log_calls(func):
        def _wrapped(*args, **kwargs):
            log_to_network(f"{func.__name__} called!")
            return func(*args, **kwargs)
        return _wrapped


    @log_calls
    def hello_world():
        log_to_network("Hello World!")





In this module we can use ordinary Python constructs, including dynamic code such as the comprehension that builds the dictionary and the module-level decorator. Accessing the network inside the _wrapped or hello_world functions is also perfectly fine, because those functions are not called at module level.



But if we moved the log_to_network call into the outer log_calls function, or used a decorator with side effects (like @route from the previous example), or called hello_world() at module level, the module would no longer be a valid strict module.



How do we know that it is unsafe to call the log_to_network or route functions at module level? We assume that everything imported from a non-strict module is unsafe, with the exception of some standard library functions that are known to be safe. If the utils module is itself a strict module, we can rely on the analysis of that module to tell us whether the log_to_network function is safe.
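The "unsafe unless known to be safe" rule can be illustrated with a deliberately crude checker built on the ast module. It is far simpler than the abstract interpreter described here: it only handles from-imports, it does not follow imports into other strict modules, and it ignores things like default argument values. SAFE_IMPORTS and all function names below are assumptions made up for this sketch.

```python
import ast

# Hypothetical allowlist of (module, name) pairs assumed to be side-effect free.
SAFE_IMPORTS = {("functools", "wraps"), ("dataclasses", "dataclass")}


def unsafe_imported_names(tree):
    """Collect names from `from X import Y` that are not on the allowlist."""
    names = set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if (node.module, alias.name) not in SAFE_IMPORTS:
                    names.add(alias.asname or alias.name)
    return names


def _called_names(node):
    """Names called directly, e.g. `f(...)`, anywhere under `node`."""
    return [
        sub.func.id
        for sub in ast.walk(node)
        if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
    ]


def _referenced_names(node):
    """Every plain name that appears anywhere under `node`."""
    return [sub.id for sub in ast.walk(node) if isinstance(sub, ast.Name)]


def module_level_violations(source):
    """Return (line, name) pairs for unsafe names used at import time."""
    tree = ast.parse(source)
    unsafe = unsafe_imported_names(tree)
    violations = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # The body runs later; only the decorators run at import time.
            used = [n for dec in node.decorator_list for n in _referenced_names(dec)]
        else:
            used = _called_names(node)
        violations.extend((node.lineno, name) for name in used if name in unsafe)
    return violations


GOOD = '''"""Module docstring."""
__strict__ = True
from utils import log_to_network

def log_calls(func):
    def _wrapped(*args, **kwargs):
        log_to_network(f"{func.__name__} called!")
        return func(*args, **kwargs)
    return _wrapped

@log_calls
def hello_world():
    log_to_network("Hello World!")
'''

BAD_CALL = "from utils import log_to_network\nlog_to_network('imported!')\n"
BAD_DECORATOR = "from utils import route\n@route\ndef index():\n    pass\n"

print(module_level_violations(GOOD))
print(module_level_violations(BAD_CALL))
print(module_level_violations(BAD_DECORATOR))
```

The first module passes because log_to_network is only called inside function bodies. The other two are flagged because the unsafe name runs during import, once as a direct call and once as a decorator.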



Besides making code more reliable, side-effect-free imports remove a serious barrier to safe incremental code reloading. They also open up other avenues for speeding up imports. If module-level code is free of side effects, we can safely execute individual module statements lazily, on demand, when the module's attributes are accessed. This is much better than the eager approach, in which all of the module's code is executed up front. And since the shape of every class in a strict module is fully known at compile time, in the future we may even try to persist the module metadata (classes, functions, constants) produced by executing the code. This would allow fast imports of unchanged modules without re-executing the module-level bytecode.
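The idea of running definitions lazily, on attribute access, can be sketched without touching the import system at all. LazyNamespace is a hypothetical class written for this illustration: each definition is a zero-argument callable that runs on first access and is then cached.

```python
class LazyNamespace:
    """Evaluate each named definition on first attribute access, then cache it."""

    def __init__(self, thunks):
        self._thunks = dict(thunks)  # name -> zero-argument callable
        self.evaluated = []          # evaluation order, kept for illustration

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for names not yet cached.
        thunks = self.__dict__.get("_thunks", {})
        if name not in thunks:
            raise AttributeError(name)
        value = thunks[name]()
        self.evaluated.append(name)
        setattr(self, name, value)  # later lookups bypass __getattr__
        return value


ns = LazyNamespace({
    "MY_LIST": lambda: [1, 2, 3],
    "MY_DICT": lambda: {x: x + 1 for x in ns.MY_LIST},
    "UNUSED": lambda: list(range(1_000_000)),
})

first = ns.MY_DICT
print(first)
print(ns.evaluated)
```

Accessing MY_DICT pulls in MY_LIST because it depends on it, while UNUSED is never computed. Reordering or skipping work like this is only safe when none of the definitions has side effects, which is what the strict module rules guarantee.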



Immutability and the __slots__ attribute



Strict modules, and the classes declared in them, are immutable once created. Modules are made immutable by internally transforming the module body into a function in which all global variables are accessed as closure variables. These changes greatly reduce the room for accidental changes to global state, although mutable global state is still possible if you deliberately opt into it through module-level mutable containers.
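A rough way to picture this: the module body becomes a function, its former globals become local variables captured by closures, and the resulting module object refuses attribute assignment. The sketch below is an assumption about how this could look, not the actual transformation. ImmutableModule, _module_body and make_module are hypothetical names.

```python
import types


class ImmutableModule(types.ModuleType):
    """Hypothetical module type that rejects attribute assignment and deletion."""

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot set {name!r}: module is immutable")

    def __delattr__(self, name):
        raise AttributeError(f"cannot delete {name!r}: module is immutable")


def _module_body():
    # Former module globals are now locals captured by closures,
    # so code outside this function cannot rebind them.
    greeting = "Hello"
    cache = []  # a mutable container still allows deliberate shared state

    def greet(name):
        cache.append(name)
        return f"{greeting}, {name}!"

    return {"greet": greet, "CACHE": cache}


def make_module(name, body):
    module = ImmutableModule(name)
    # Populate the module once, at creation time. Writing to the dict
    # directly bypasses __setattr__, so this sketch is not airtight.
    vars(module).update(body())
    return module


greetings = make_module("greetings", _module_body)
message = greetings.greet("World")

try:
    greetings.greet = None
except AttributeError:
    blocked = True
else:
    blocked = False

print(message, blocked, greetings.CACHE)
```

Rebinding greet on the module fails, but the CACHE list can still be mutated, which mirrors the caveat about module-level mutable containers.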



All instance attributes of classes declared in strict modules must be defined in __init__. They are automatically added to the __slots__ attribute during the AST transformation performed by the module loader, so no additional attributes can later be attached to an instance of the class. Here is an example of such a class:



    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age





During the AST transformation performed when a strict module is processed, the assignments to the name and age attributes in __init__ are detected, and an attribute of the form __slots__ = ('name', 'age') is added to the class. This prevents any other attributes from being added to an instance of the class. (If type annotations are used, class-level type information such as name: str is also taken into account and added to the list of slots.)
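A transformation of this kind can be approximated with ast.NodeTransformer. The sketch below is an assumption about how such a pass might look, not the loader's real code. AddSlots is a hypothetical name, and the pass only handles direct self.attr assignments in __init__ plus value-less class-level annotations.

```python
import ast


class AddSlots(ast.NodeTransformer):
    """Add __slots__ built from `self.<name> = ...` assignments in __init__."""

    def visit_ClassDef(self, node):
        self.generic_visit(node)
        names = []
        for item in node.body:
            # Class-level annotations without a value, such as `name: str`.
            if (
                isinstance(item, ast.AnnAssign)
                and isinstance(item.target, ast.Name)
                and item.value is None
            ):
                names.append(item.target.id)
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                self_name = item.args.args[0].arg
                for sub in ast.walk(item):
                    if (
                        isinstance(sub, ast.Attribute)
                        and isinstance(sub.ctx, ast.Store)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == self_name
                    ):
                        names.append(sub.attr)
        unique = list(dict.fromkeys(names))  # drop duplicates, keep order
        slots = ast.Assign(
            targets=[ast.Name(id="__slots__", ctx=ast.Store())],
            value=ast.Tuple(
                elts=[ast.Constant(value=n) for n in unique], ctx=ast.Load()
            ),
        )
        node.body.append(slots)  # appended last so a docstring stays first
        return node


SOURCE = '''
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
'''

tree = AddSlots().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {"__name__": "strict_demo"}
exec(compile(tree, "<strict_demo>", "exec"), namespace)

Person = namespace["Person"]
person = Person("Ada", 36)
print(Person.__slots__)

try:
    person.email = "ada@example.com"
except AttributeError:
    rejected = True
else:
    rejected = False
```

After the transformation, the class behaves as if __slots__ had been written by hand: the declared attributes work, and assigning any other attribute raises AttributeError.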



These restrictions not only make the code more reliable; they also help it run faster. Automatically adding the __slots__ attribute to classes makes them more memory-efficient and removes the per-instance dictionary lookups, which speeds up attribute access. In addition, we can keep optimizing these patterns in the Python runtime, which will let us improve the system further.
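The memory effect comes from the fact that instances of a slotted class carry no per-instance dictionary. The comparison below uses two hypothetical classes and shows only the structural difference. Actual savings depend on the interpreter version and are not measured here.

```python
class PlainPerson:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class SlottedPerson:
    __slots__ = ("name", "age")

    def __init__(self, name, age):
        self.name = name
        self.age = age


plain = PlainPerson("Ada", 36)
slotted = SlottedPerson("Ada", 36)

# A regular instance stores its attributes in a per-instance dict.
plain_has_dict = hasattr(plain, "__dict__")

# A slotted instance has no __dict__; attributes live in fixed slots
# that are exposed through descriptors on the class.
slotted_has_dict = hasattr(slotted, "__dict__")
slot_kind = type(SlottedPerson.__dict__["name"]).__name__

print(plain_has_dict, slotted_has_dict, slot_kind)
```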



Summary



Strict modules are still an experimental technology. We have a working prototype and are in the early stages of rolling it out in production. We hope that once we have gained enough experience with strict modules, we will be able to say more about them.



Dear readers! Do you think the features offered by strict modules would come in handy in your Python projects?







