The road to type checking 4 million lines of Python code. Part 2

Today we are publishing the second part of our translation of the story of how Dropbox brought type checking to several million lines of Python code.







→ Read the first part



Formal Type Support (PEP 484)



We ran the first serious experiment with mypy at Dropbox during Hack Week 2014. Hack Week is a week-long event during which Dropbox employees can work on anything they like! Some of Dropbox's best-known technology projects began at events like this. We concluded from this experiment that mypy looked promising, although the project was not yet ready for widespread use.



At that time, the idea of standardizing a type hinting system for Python was in the air. As I said, starting with Python 3.0 you could use type annotations on functions, but they were just arbitrary expressions, with no defined syntax or semantics. At run time these annotations were, for the most part, simply ignored. After Hack Week we started working on standardizing the semantics. That work led to PEP 484 (Guido van Rossum, Łukasz Langa, and I collaborated on this document).



Our motives could be viewed from two sides. First, we hoped that the entire Python ecosystem would adopt a common approach to type hints ("type hint" is the term Python uses for what is elsewhere called a "type annotation"). Given the possible risks, that would be better than many mutually incompatible approaches. Second, we wanted to openly discuss type annotation mechanisms with the wider Python community. This desire was partly driven by not wanting to look like "apostates" from the core ideas of the language in the eyes of the broad mass of Python programmers. Python is a dynamically typed language, known for "duck typing," and at first some suspicion toward the idea of static typing was bound to arise in the community. But that attitude eventually faded, once it became clear that static typing was not going to be mandatory (and once people realized that it was genuinely useful).



The resulting syntax for type hints was very similar to what mypy supported at the time. PEP 484 shipped with Python 3.5 in 2015. Python was no longer a language that supported only dynamic typing. I like to think of this event as a significant milestone in the history of Python.
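As a brief illustration, here is what the PEP 484 syntax looks like in practice (the functions themselves are invented for this example):

```python
from typing import Optional

def greeting(name: str, excited: bool = False) -> str:
    """The annotations are checked statically by mypy and ignored at run time."""
    message = 'Hello, ' + name
    if excited:
        message += '!!!'
    return message

def find_user(user_id: int) -> Optional[str]:
    """Optional[str] means "either a str or None"."""
    return None
```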



Start of migration



At the end of 2015, Dropbox created a three-person team to work on mypy: Guido van Rossum, Greg Price, and David Fisher. From that moment things started to move extremely quickly. The first obstacle to mypy's growth was performance. As I hinted above, early in the project I had considered porting the mypy implementation to C, but that idea had been shelved for the time being. We were stuck running the system on the CPython interpreter, which is not fast enough for tools like mypy. (PyPy, an alternative Python implementation with a JIT compiler, did not help us either.)



Fortunately, some algorithmic improvements came to our aid here. The first powerful "accelerator" was incremental checking. The idea was simple: if none of a module's dependencies have changed since the previous mypy run, we can reuse the data cached for those dependencies during the previous session. We only needed to type check the modified files and the files that depend on them. Mypy went a little further still: if a module's external interface had not changed, mypy assumed that other modules importing it did not need to be checked again.
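A minimal sketch of this decision logic might look as follows (the helper names and data layout are mine, not mypy's actual cache format):

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def file_fingerprint(path: str) -> str:
    """Hash a source file so we can tell whether it changed since the last run."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def needs_recheck(module_path: str,
                  deps: Dict[str, List[str]],
                  cached: Dict[str, str]) -> bool:
    """A module must be re-checked if it, or any of its dependencies, changed.

    `deps` maps a module to the files it imports; `cached` holds the
    fingerprints recorded on the previous run.
    """
    for path in [module_path] + deps.get(module_path, []):
        if cached.get(path) != file_fingerprint(path):
            return True
    return False  # everything unchanged: reuse the cached results
```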



Incremental checking helped us enormously when annotating large volumes of existing code. The point is that this process usually involves many iterative runs of mypy, as annotations are gradually added and refined. The first run of mypy was still very slow, since it had to check many dependencies. To improve the situation, we implemented a remote caching mechanism. If mypy detects that the local cache is probably out of date, it downloads a recent cache snapshot for the entire codebase from a centralized repository. It then performs an incremental check on top of that snapshot. This was another big step toward better mypy performance.
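Conceptually, the client-side half of such a mechanism could be sketched like this (the URL, cache path, and revision scheme are hypothetical, invented for illustration):

```python
import os
import tarfile
import urllib.request
from typing import Optional

CACHE_DIR = os.path.expanduser('~/.mypy_cache')                  # hypothetical path
SNAPSHOT_URL = 'https://ci.example.com/mypy-cache/{rev}.tar.gz'  # hypothetical service

def warm_cache(current_rev: str, cached_rev: Optional[str]) -> None:
    """If the local cache is stale, download a cache snapshot built centrally
    for the current revision, so the next run is incremental rather than
    starting from scratch."""
    if cached_rev == current_rev:
        return  # the local cache is already up to date
    archive, _ = urllib.request.urlretrieve(SNAPSHOT_URL.format(rev=current_rev))
    with tarfile.open(archive) as tar:
        tar.extractall(CACHE_DIR)
```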



This was a period of rapid, organic adoption of type checking at Dropbox. By the end of 2016 we already had approximately 420,000 lines of Python code with type annotations. Many users were enthusiastic about type checking, and more and more Dropbox development teams were using mypy.



Everything looked good then, but we still had much to do. We began running periodic internal user surveys to identify the project's pain points and decide which issues to tackle first (this practice is still used at the company today). Two tasks, it became clear, mattered most: first, we needed more type coverage of the code; second, we needed mypy to run faster. Clearly our work on speeding up mypy and rolling it out across the company's projects was far from complete. Fully aware of the importance of these two tasks, we set about solving them.



More performance!



Incremental checking made mypy faster, but the tool was still not fast enough. Many incremental runs took about a minute. The culprit was cyclic imports. This will probably not surprise anyone who has worked on a large Python codebase. We had sets of hundreds of modules, each of which indirectly imported all the others. If any file in an import cycle was modified, mypy had to process every file in that cycle, and often also any modules importing modules from that cycle. One such cycle was the infamous "tangle of dependencies" that caused plenty of trouble at Dropbox. At one point this structure contained several hundred modules and was imported, directly or indirectly, by many tests; it was also used in production code.
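The smallest version of the problem looks like this (module names invented): two files that import each other form a cycle, so neither can be re-checked in isolation.

```python
# a.py
import b

def helper() -> int:
    return b.compute() + 1

# b.py -- imports a back, closing the cycle: a change to either file forces
# the checker to reprocess both (and everything that imports them)
import a

def compute() -> int:
    return 41
```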



We considered "untangling" the cyclic dependencies, but we did not have the resources for it: there was too much code we were not familiar with. So we took an alternative approach. We decided to make mypy fast even in the presence of "dependency tangles". We achieved this with the mypy daemon. The daemon is a server process with two interesting properties. First, it keeps information about the entire codebase in memory, so each mypy run does not have to load cached data for thousands of imported dependencies. Second, it tracks the relationships between functions and other entities at a fine-grained level. For example, if the function foo calls the function bar, then foo depends on bar. When a file changes, the daemon first processes the changed file in isolation. It then looks at that file's externally visible changes, such as changed function signatures. The daemon uses the detailed dependency information only to re-check the functions that actually use the changed function. Typically, with this approach, very few functions need re-checking.



Implementing all of this was not easy, since the original mypy implementation was heavily oriented toward processing one file at a time. We had to deal with many edge cases where a change in the code required repeated checks. For example, this happens when a class is given a new base class. Once we had done what we set out to do, we were able to reduce the run time of most incremental checks to a few seconds. That felt like a great victory.
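The base class case is subtle because changing a base silently changes the interface of every subclass, so code that never mentions the base must still be re-checked. A small invented example:

```python
class Base:
    def size(self) -> int:
        return 0

class NewBase:
    def size(self) -> str:   # same method name, different return type
        return ''

class Thing(NewBase):        # previously: class Thing(Base)
    pass

def total(thing: Thing) -> int:
    return thing.size() + 1  # this call site must be re-checked: it is now an error
```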



Even more performance!



Together with the remote caching described above, the mypy daemon almost completely solved the problems of a programmer who runs type checking frequently while modifying a small number of files. But performance in the least favorable usage scenario was still far from optimal: a clean run of mypy could take more than 15 minutes, far longer than we would have liked. And each week the situation got worse, as programmers kept writing new code and adding annotations to existing code. Our users were still hungry for more performance, and we were happy to meet them halfway.



We decided to return to one of my earliest ideas for mypy: converting Python code to C. Experiments with Cython (a system that translates Python code into C) had not given any visible speedup, so we decided to revive the idea of writing our own compiler. Since the mypy codebase (written in Python) already contained all the necessary type annotations, using those annotations to speed up the system seemed worth a try. I quickly built a prototype to test the idea. On various microbenchmarks it showed more than a 10-fold performance gain. The idea was to compile Python modules into CPython C extension modules, and to turn type annotations into type checks performed at run time (normally type annotations are ignored at run time and used only by type checkers). In effect, we planned to translate the mypy implementation from Python into a bootstrapped, statically typed language that would look (and, for the most part, behave) exactly like Python. (This kind of cross-language migration had become something of a tradition for the mypy project: the initial mypy implementation was written in Alore, then in a syntactic hybrid of Java and Python.)
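To make the idea concrete, here is my own rough approximation of it, written in Python for readability (the real compiler emits C extension modules, not Python, and the function names are invented):

```python
# What the programmer writes:
def total_length(items: list) -> int:
    return sum(len(item) for item in items)

# Roughly the boundary check a compiled module performs at run time;
# ordinary Python would simply ignore the annotation:
def total_length_compiled(items):
    if not isinstance(items, list):
        raise TypeError('list object expected; got %s' % type(items).__name__)
    return sum(len(item) for item in items)
```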



Targeting the CPython extension API was key to keeping the project manageable. We did not need to implement a virtual machine or any of the libraries mypy depends on. In addition, the entire Python ecosystem and all its tooling (such as pytest) would remain available to us. That meant we could keep using interpreted Python code during development, retaining a very fast edit-and-test cycle instead of waiting for code to compile. It looked like we could have our cake and eat it too, and we liked that.



The compiler, which we named mypyc (since it uses mypy as a frontend for type analysis), turned out to be a very successful project. Overall, we made frequent mypy runs without caching about 4 times faster. Developing the core of mypyc took a small team, which included Michael Sullivan, Ivan Levkivskyi, Hugh Han, and me, about 4 calendar months. That was far less work than rewriting mypy in, say, C++ or Go would have required, and we had to make far fewer changes to the project than a rewrite in another language would have demanded. We also hoped to bring mypyc to a level where other Dropbox programmers could use it to compile and speed up their own code.



To achieve this level of performance, we had to apply some interesting engineering solutions. The compiler can speed up many operations by using fast low-level C constructs. For example, a call to a compiled function is translated into a C function call, which is much faster than calling an interpreted function. Some operations, such as dictionary lookups, still came down to ordinary CPython C-API calls, which after compilation were only slightly faster. Compilation removed the interpretation overhead, but for such operations that gave only a small performance gain.
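An annotated sketch of which operations benefit, using invented functions (the comments describe the general pattern, not mypyc's exact code generation):

```python
def add_one(n: int) -> int:
    # With annotations available, the compiler can use fast low-level C
    # constructs for arithmetic like this instead of interpreter dispatch.
    return n + 1

def lookup(d: dict, key: str) -> int:
    # A dictionary lookup still comes down to a generic CPython C-API call,
    # so compilation makes it only slightly faster.
    value = d[key]
    # A call to another compiled function becomes a plain C function call,
    # which is much cheaper than calling an interpreted function.
    return add_one(value)
```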



To identify the most common "slow" operations, we profiled the code. Armed with that data, we either tweaked mypyc to generate faster C code for those operations, or rewrote the corresponding Python code using faster operations (sometimes there simply was no easy fix for a given problem). Rewriting the Python code often turned out to be easier than implementing the same transformation automatically in the compiler. In the long run we wanted to automate many of these transformations, but at that point we were focused on speeding up mypy with minimal effort, and in places we cut a few corners.
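One standard way to do this kind of profiling, using only the standard library (the entry point is a stand-in, and this is not necessarily the exact workflow the team used):

```python
import cProfile
import pstats

def check_codebase() -> None:
    """Stand-in for the real entry point that runs a type check."""
    ...

# Profile a representative run, then list the functions with the largest
# cumulative cost: these are the candidates for faster generated C code
# or for rewriting in terms of cheaper operations.
cProfile.run('check_codebase()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(20)
```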



To be continued…



Dear readers! What were your impressions of the mypy project when you learned about its existence?







