We at Virtuozzo have a small but already quite popular project called CRIU (Checkpoint/Restore In Userspace). It consists of a rather complicated system utility and a set of kernel patches (most of which, by the way, have already been accepted into the mainline kernel). With it you can do things like live-migrate containers or update the kernel without restarting applications.
We started this project back in 2011. The utility raised a lot of questions at first, and some considered it impossible to implement, but CRIU gradually turned into a mature tool. To date, more than 150 contributors from several dozen companies, including giants such as Google and IBM, have taken part in it. Even so, the search for new contributors continues, and this year we finally made it into Google Summer of Code (GSoC).
GSoC is an annual Google-sponsored event whose goal is to attract students to open source projects. Two groups want to take part: teams from open source projects on the one hand, and on the other, students who want to contribute to the community and prove their skills on real projects.
To take part in GSoC, a project team must submit an application that includes a description of the project, several topics that students can work on, and a list of so-called "mentors": active project participants who will help the students with their difficult work. Students, in turn, select one or more projects and send their resumes to the mentors.
In the middle of the school year, Google reviews the teams' applications and selects the projects that will take part. Closer to the summer holidays, the teams choose the students they are ready to work with, after which Google does the final filtering and assigns students to teams. Work begins in the summer and lasts three months. Once every 30 days, students submit interim reports, and their mentors evaluate the results and recommend whether the work should continue or be terminated.
Memory optimization and implementation of binary logs
I admit that 2019 was not our first attempt to get into GSoC. Until then, we simply had not made it through Google's project selection stage. But we did not give up (submitting an application is not difficult, after all), and finally everything worked out: Google recognized the development of our project as important and admitted CRIU to GSoC.
We had a lot of topics for students, each more interesting and more complex than the last. A pleasant surprise was that for every topic the community had people who knew the problem well and were ready to act as mentors. At the student application stage we ended up with a real "competition": two students applied for each topic, and almost all of them submitted strong applications. After the final selection we got two students. One took on optimizing the code that saves the memory of a running process, and the other took on implementing binary logs.
Since CRIU is a system for live application migration, it has a mode in which the memory used by a process is read and written to image files in parallel with the execution of the process itself. We call this "operating on a beating heart", because the process keeps running without stopping. Before the GSoC round, all of the memory was pulled into pipes using the vmsplice system call (a call that caused quite a stir in its day). The process then continued to execute, while CRIU slowly dumped this memory into files (or into a network channel, in the case of live migration). In principle this approach works, but the problem is that memory sitting in pipes is effectively pinned (as with mlock), so the kernel cannot swap it out to disk if it needs to.
From an architectural point of view, we wanted to replace the pipes with copying memory in small portions by calling process_vm_readv. This system call appeared in the Linux kernel relatively recently (by the way, it has a twin brother called process_vm_writev). It can already make life much easier and faster for tools such as strace and debuggers, which need to poke around in the memory of other processes for their own purposes.
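As a rough illustration of what "copying memory in small portions" looks like, here is a minimal Python sketch that calls process_vm_readv through ctypes. It is not CRIU's code (CRIU itself is written in C). The helper name read_process_memory, the IoVec wrapper and the 4096-byte portion size are assumptions made for this example, and it only runs on Linux with a libc that exports the call.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of the C `struct iovec`: a base address and a length."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


# Load the symbols of the running program, which include libc on Linux.
libc = ctypes.CDLL(None, use_errno=True)
libc.process_vm_readv.restype = ctypes.c_ssize_t
libc.process_vm_readv.argtypes = [
    ctypes.c_int,                           # pid
    ctypes.POINTER(IoVec), ctypes.c_ulong,  # local_iov, liovcnt
    ctypes.POINTER(IoVec), ctypes.c_ulong,  # remote_iov, riovcnt
    ctypes.c_ulong,                         # flags, must be 0
]

CHUNK = 4096  # illustrative portion size, not a value taken from CRIU


def read_process_memory(pid, address, length, chunk=CHUNK):
    """Copy `length` bytes starting at `address` in process `pid`, chunk by chunk."""
    out = bytearray()
    buf = ctypes.create_string_buffer(chunk)
    while len(out) < length:
        want = min(chunk, length - len(out))
        local = IoVec(ctypes.addressof(buf), want)
        remote = IoVec(address + len(out), want)
        got = libc.process_vm_readv(
            pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0
        )
        if got < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        if got == 0:
            break  # nothing more could be read
        out += buf.raw[:got]  # a partial read simply shortens this step
    return bytes(out)


if __name__ == "__main__":
    # Demo: read a buffer of our own process, so no extra privileges are needed.
    payload = ctypes.create_string_buffer(b"criu" * 3000)
    try:
        data = read_process_memory(
            os.getpid(), ctypes.addressof(payload), len(payload)
        )
        print("copied", len(data), "bytes, identical:", data == payload.raw)
    except OSError as exc:
        print("process_vm_readv is not permitted here:", exc)
```

Reading another process works the same way, but the kernel then applies the same permission checks as for ptrace, so the call may fail with EPERM. Unlike the pipe-based scheme, only one portion-sized buffer is held by the reader at any moment.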
The optimization work was complicated by the fact that the code that handles process memory is one of the most central parts of the utility, so it must be absolutely reliable. Any mistake in saving pages can leave the process with an inconsistent state of its internal objects (about which CRIU, of course, "knows" nothing), and after restore it will crash without any clear diagnostics.
The second difficulty was that working with memory is involved in almost every CRIU feature: the ordinary checkpoint and restore procedures as well as their optimized variants, such as pre-dump and lazy-restore. At one point during a reporting week we even planned to "dismiss" the student from the project, but fortunately we did not, and the long-awaited optimization is now in our development branch.
The second task within GSoC 2019 was the design and implementation of so-called binary logs. Here's the thing: when CRIU runs, it writes messages about what it is doing to a file (or to the screen, but a file is better). The importance of these messages is huge! If a dump or restore procedure fails for some reason, the only way to understand why is to analyze each step in as much detail as possible, and for that you need information from the utility. Ideally, an investigation requires the most detailed logs and the image files, if there are any. In practice, such requirements are difficult to satisfy.
To get the most detailed logs, CRIU provides an appropriate mode, and the vast majority of users (maybe even all of them) always turn it on. But the amount of information that CRIU generates along the way is so large that logging itself starts to noticeably affect the speed of the system. A small study showed that about 90% of the time spent in logging operations goes to formatting the output, that is, to those very %d, %08s, %.2f and other modifiers of the printf function family. Turning the logs off reduces the time needed to save and restore processes by 10 to 30 percent, depending on the size of the processes themselves.
To stop spending that many system resources on logging while keeping the logs as informative as possible, we decided to get rid of formatting and save binary data to the log files. After all, it can be formatted later, if necessary. This task was handled by the second student, whose patches have also already been accepted into the development branch.
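The idea of deferring formatting can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the on-disk format that was implemented in CRIU: the message table, the record header (message id plus payload size) and the function names log_binary and decode_log are all hypothetical.

```python
import io
import struct

# Hypothetical message table: id -> (printf-style text, binary layout of its arguments).
MESSAGES = {
    1: ("Dumping %d pages at 0x%08x", "<IQ"),
    2: ("Restore took %.2f s", "<d"),
}

# Hypothetical record header: message id and payload size, both 16-bit.
HEADER = struct.Struct("<HH")


def log_binary(stream, msg_id, *args):
    """Fast path: store the raw argument values, no string formatting at all."""
    payload = struct.pack(MESSAGES[msg_id][1], *args)
    stream.write(HEADER.pack(msg_id, len(payload)) + payload)


def decode_log(stream):
    """Slow path, run later and only if needed: rebuild the text messages."""
    lines = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of log
        msg_id, size = HEADER.unpack(header)
        text, layout = MESSAGES[msg_id]
        args = struct.unpack(layout, stream.read(size))
        lines.append(text % args)
    return lines


if __name__ == "__main__":
    log = io.BytesIO()
    log_binary(log, 1, 512, 0x7F0000)
    log_binary(log, 2, 1.5)
    log.seek(0)
    for line in decode_log(log):
        print(line)
```

The writer only packs numbers, so the cost of the printf-style modifiers moves to the decoder, which runs only when someone actually needs to read the log. The price is that the decoder must have the same message table as the program that wrote the log.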
And not only at GSoC
By the way, another interesting outcome of taking part in GSoC is that a third student came to us wanting to solve the problem of anonymization. Indeed, it is often impossible to obtain image files because they contain sensitive information that the user rightly does not want to share with anyone: the contents of memory, the files the process works with, the contents of queues in network connections, and so on. To solve this problem, we included a topic called "anonymization of images" in our application, but Google did not accept it.
Nevertheless, the topic has not lost its relevance, and the student who wanted to work on it within GSoC eventually decided to tackle it independently, outside of the Google event.
Conclusion
Taking part in GSoC was certainly a positive experience. Our CRIU tool, which we love and appreciate, received a couple more powerful boosts to its development and became even more mature and convenient. So if it can be useful to you, enjoy using it!
We were also convinced that getting into such events is a matter of perseverance and confidence in your project. If you need developers, just do not get tired of submitting applications and formulating new, interesting topics. You may find completely unexpected contributors from another country or even from another company.