Tarantool Cartridge: sharding a Lua backend in three lines





At Mail.ru Group we have Tarantool: an application server in Lua that also comes with a database (or is it the other way around?). It is fast and cool, but the capabilities of a single server are still not unlimited. Vertical scaling is not a panacea either, so Tarantool has tools for horizontal scaling: the vshard module [1]. It lets you shard data across several servers, but you have to tinker with it to configure it and wire the business logic on top.



Good news: we have collected our share of bumps and bruises (for example [2], [3]) and built yet another framework that significantly simplifies the solution to this problem.



Tarantool Cartridge is a new framework for developing complex distributed systems. It allows you to focus on writing business logic instead of solving infrastructure problems. Under the cut, I’ll tell you how this framework is organized and how to write distributed services with it.



And what, in fact, is the problem?



We already have Tarantool, we have vshard: what more could you want?



First, it is a matter of convenience. vshard is configured through Lua tables, and for a distributed system of several Tarantool processes to work correctly, that configuration must be the same everywhere. Nobody wants to maintain it by hand, so all sorts of scripts, Ansible playbooks, and deployment systems come into play.
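To get a feel for what that means in practice, here is a rough sketch of a hand-maintained vshard configuration (the UUIDs, URIs and names below are invented for illustration). This whole table has to be kept identical on every instance:

-- vshard is configured with a plain Lua table shared by all instances.
local cfg = {
    bucket_count = 30000,
    sharding = {
        ['aaaaaaaa-0000-4000-a000-000000000001'] = {    -- a replica set
            replicas = {
                ['aaaaaaaa-0000-4000-a000-000000000002'] = {
                    uri = 'storage:secret@192.168.0.101:3301',
                    name = 'storage_1_a',
                    master = true,
                },
                ['aaaaaaaa-0000-4000-a000-000000000003'] = {
                    uri = 'storage:secret@192.168.0.102:3301',
                    name = 'storage_1_b',
                },
            },
        },
        -- ... more replica sets ...
    },
}

-- Routers then call vshard.router.cfg(cfg), storages call
-- vshard.storage.cfg(cfg, instance_uuid). Keeping this table in sync
-- across the whole cluster by hand is exactly the chore described above.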



Cartridge manages the vshard configuration itself, and it does so based on its own distributed configuration. In essence, this is a simple YAML file, a copy of which is stored on each Tarantool instance. The simplification is that the framework itself watches over this configuration and makes sure it is the same everywhere.



Second, it is again a matter of convenience. This configuration has nothing to do with developing business logic and only distracts the programmer from their work. When we discuss the architecture of a project, we are most often talking about individual components and their interaction; it is too early to think about rolling a cluster out across three data centers.



We solved these problems over and over, and at some point we managed to work out an approach that simplifies working with an application throughout its entire life cycle: creation, development, testing, CI/CD, maintenance.



Cartridge introduces the concept of a role for every Tarantool process. Roles let the developer concentrate on writing code: all the roles available in the project can be run on a single Tarantool instance, and that is enough for tests.



Key features of Tarantool Cartridge:

- integrated vshard orchestration: the framework manages the sharding configuration for you;
- roles as the unit for packaging business logic;
- a distributed configuration kept identical on all instances via a two-phase commit;
- automatic failover based on the SWIM protocol;
- an application template plus packaging and deployment tooling (cartridge-cli);
- a web interface for administering the cluster.





Hello World!



I'm eager to show the framework itself, so let's leave the story about the architecture for later and start with something simple. Assuming Tarantool itself is already installed, all that remains is to run:



$ tarantoolctl rocks install cartridge-cli
$ export PATH=$PWD/.rocks/bin/:$PATH





These two commands will install the command line utilities and allow you to create your first application from the template:



 $ cartridge create --name myapp
      
      





And here is what we get:



myapp/
├── .git/
├── .gitignore
├── app/roles/custom.lua
├── deps.sh
├── init.lua
├── myapp-scm-1.rockspec
├── test
│   ├── helper
│   │   ├── integration.lua
│   │   └── unit.lua
│   ├── helper.lua
│   ├── integration/api_test.lua
│   └── unit/sample_test.lua
└── tmp/





This is a git repository with a ready-made "Hello, World!" application. Let's try to run it right away, after first installing the dependencies (including the framework itself):



$ tarantoolctl rocks make
$ ./init.lua --http-port 8080





So, we have launched one node of the future sharded application. An inquisitive reader can immediately open the web interface, click together a single-node cluster with the mouse and enjoy the result, but it is too early to rejoice. So far the application cannot do anything useful, so I will talk about deployment later; now it is time to write some code.



Application Development



Imagine that we are designing a project which has to receive data, save it, and build a report once a day.







We start drawing a diagram and place three components on it: gateway, storage, and scheduler. Then we refine the architecture further. Since we use vshard for storage, we add vshard-router and vshard-storage to the scheme. Neither gateway nor scheduler will access the storage directly; that is what the router is for, that is what it was made for.







This scheme still does not quite reflect what we will actually build in the project, because the components look abstract. We also need to see how all of this maps onto real Tarantool, so we group our components by process.







Keeping vshard-router and gateway on separate instances makes little sense. Why go over the network one more time when this is already the router's responsibility? They should run within the same process; that is, both the gateway and vshard.router.cfg are initialized in the same process and interact locally.



At the design stage it was convenient to work with three components, but as a developer, while I am writing code, I do not want to think about launching three instances of Tarantool. I need to run tests and verify that I wrote the gateway correctly. Or maybe I want to demonstrate a feature to my colleagues. Why should I suffer through deploying three copies? That is how the concept of roles was born. A role is a regular Lua module whose life cycle is managed by Cartridge. In this example there are four of them: gateway, router, storage, scheduler. Another project may have more. All the roles can be launched in one process, and that will be enough.
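To make this concrete, here is roughly what the single entry point of such an application looks like: every Tarantool process runs the same init.lua and lists every role it could potentially host. This is a sketch along the lines of the generated template; the app.roles.* module names are the ones from this example, not something the template ships with:

#!/usr/bin/env tarantool
-- init.lua: the single entry point for every instance of the application.
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    roles = {
        'cartridge.roles.vshard-router',
        'cartridge.roles.vshard-storage',
        'app.roles.gateway',      -- hypothetical role modules
        'app.roles.storage',      -- from this article's example
        'app.roles.scheduler',
    },
})

assert(ok, tostring(err))

Which of these roles are actually enabled on a given instance is decided later, through the cluster configuration, without touching the code.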







And when it comes to deploying to staging or to production, we will assign each Tarantool process its own set of roles depending on the hardware capabilities:







Topology management



Information about which roles run where has to be stored somewhere, and this "somewhere" is the distributed configuration I mentioned above. The most important thing in it is the cluster topology. Here are three replication groups spread over five Tarantool processes:







We do not want to lose data, so we treat information about running processes with care. Cartridge keeps track of the configuration using a two-phase commit. As soon as we want to update the configuration, it first checks that all instances are reachable and ready to accept the new configuration; only then does the second phase apply the config. So even if one instance turns out to be temporarily unavailable, nothing terrible happens: the configuration simply is not applied and you see the error up front.



The topology section also specifies such an important parameter as the leader of each replication group. Usually this is the instance that accepts writes, while the rest are most often read-only, although there can be exceptions. Sometimes brave developers are not afraid of conflicts and write data to several replicas in parallel, but some operations must not, under any circumstances, be performed twice. That is what the leader flag is for.







Role life



For abstract roles to exist in such an architecture, the framework must somehow manage them. Naturally, this management happens without restarting the Tarantool process. There are four callbacks for managing roles. Cartridge calls them itself, depending on what the distributed configuration says, thereby applying that configuration to specific roles.



function init()
function validate_config()
function apply_config()
function stop()





Each role has an init function. It is called once, either when the role is enabled or when Tarantool restarts. It is a convenient place, for example, to create a box.space, or the scheduler role can start a background fiber that will do its work at certain intervals.



The init function alone may not be enough. Cartridge allows roles to take advantage of the distributed configuration it uses for storing the topology: in that same configuration we can declare a new section and keep a fragment of the business configuration in it. In my example, this could be a data schema or schedule settings for the scheduler role.
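For instance, the scheduler's timetable could live in its own section of that same clusterwide config. A minimal sketch of the idea, assuming an invented section named scheduler:

-- Somewhere inside the scheduler role. Suppose the clusterwide
-- configuration contains a custom section like:
--
--   scheduler:
--     report_time: "03:00"
--
local cartridge = require('cartridge')

local function get_report_time()
    -- config_get_readonly returns the requested section of the
    -- clusterwide configuration, or nil if it is absent.
    local conf = cartridge.config_get_readonly('scheduler') or {}
    return conf.report_time or '00:00'
end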



The cluster calls validate_config and apply_config every time the distributed configuration changes. When a configuration is applied by the two-phase commit, the cluster verifies that each role is ready to accept the new configuration and, if necessary, reports an error to the user. When everyone has agreed that the configuration is fine, apply_config is called.



Roles also have a stop method, which is needed to clean up after the role's activity. If we decide that the scheduler is no longer needed on this server, it can stop the fibers it started in init.
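Putting the callbacks together, a role module for the scheduler might look roughly like this. The callback contract (init, validate_config, apply_config, stop and the returned table) is what Cartridge expects from a role; the fiber logic, the build_report function and the scheduler config section are invented for this example:

-- app/roles/scheduler.lua (a sketch)
local fiber = require('fiber')

local report_fiber

local function build_report()
    -- ... gather data from the storages and save the report ...
end

local function init(opts)
    -- Called once, when the role is enabled or when Tarantool restarts:
    -- start a background fiber that builds the report once a day.
    report_fiber = fiber.create(function()
        while true do
            fiber.sleep(24 * 60 * 60)
            build_report()
        end
    end)
    return true
end

local function validate_config(conf_new, conf_old)
    -- First phase of the two-phase commit: reject obviously bad configs here.
    return true
end

local function apply_config(conf, opts)
    -- Second phase: pick up our (hypothetical) section of the clusterwide config.
    local section = conf['scheduler'] or {}
    -- ... reschedule the background job according to the new settings ...
    return true
end

local function stop()
    -- The role is being disabled on this instance: clean up after init().
    if report_fiber ~= nil then
        report_fiber:cancel()
        report_fiber = nil
    end
end

return {
    role_name = 'scheduler',
    init = init,
    validate_config = validate_config,
    apply_config = apply_config,
    stop = stop,
    build_report = build_report,    -- exposed so other roles can call it remotely
    dependencies = {'cartridge.roles.vshard-router'},
}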



Roles can interact with each other. We are used to writing plain function calls in Lua, but it may turn out that the role we need is not in this process. To make calls over the network easier, we use the auxiliary rpc (remote procedure call) module, which is built on top of the standard netbox that ships with Tarantool. This can be useful if, for example, your gateway wants to ask the scheduler directly to do its job right now instead of waiting a day.
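For example, the gateway could trigger an out-of-schedule report roughly like this (a sketch: build_report is the hypothetical function exported by the scheduler role above, and rpc_call routes the call to an instance that actually runs that role):

-- Somewhere inside the gateway role.
local log = require('log')
local cartridge = require('cartridge')

local _, err = cartridge.rpc_call('scheduler', 'build_report', {}, {
    leader_only = true,    -- run it on the leader of the corresponding replica set
    timeout = 5,
})
if err ~= nil then
    log.error('building the report failed: %s', err)
end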



Another important point is fault tolerance. Cartridge uses the SWIM protocol [4] to monitor health. In short, the processes exchange "rumors" with each other over UDP: each process tells its neighbors the latest news, and they respond. If no answer comes, Tarantool begins to suspect something is amiss, and after a while it declares the node dead and starts spreading this news to everyone around.







Based on this protocol, Cartridge organizes automatic failover. Each process monitors its surroundings, and if the leader suddenly stops responding, a replica can take over its role, and Cartridge reconfigures the running roles accordingly.







You have to be careful here, because frequent switching back and forth can lead to data conflicts during replication. Turning automatic failover on blindly is certainly not worth it: you need to clearly understand what is happening and be sure that replication will not break once the leader recovers and the crown is handed back to it.



From everything said so far it may seem that roles are similar to microservices. In a sense they are, but only as modules inside Tarantool processes, and there are a number of fundamental differences. First, all the project's roles must live in the same code base, and all Tarantool processes should be launched from that same code base, so that there are no surprises like trying to initialize the scheduler only to find it simply is not there. You should also not allow differences in code versions, because the system's behavior in such a situation is very hard to predict and debug.



Unlike Docker, we cannot just take an "image" of a role, carry it to another machine and run it there; our roles are not as isolated as Docker containers. Second, we cannot run two identical roles on the same instance: a role is either there or it is not, in a sense it is a singleton. And third, roles must be the same within an entire replication group, because otherwise it would be absurd: the data is the same, but the configuration differs.



Deployment tools



I promised to show how Cartridge helps to deploy applications. To make life easier, the framework can pack an application into an RPM package:



$ cartridge pack rpm myapp    # creates ./myapp-0.1.0-1.rpm
$ sudo yum install ./myapp-0.1.0-1.rpm





The installed package contains almost everything you need: the application itself along with its installed dependencies. Tarantool also arrives on the server as a dependency of the RPM package, and our service is ready to launch. This is done through systemd, but first you need to write a little configuration: at a minimum, the URI of each process. Three will do for our example.



$ sudo tee /etc/tarantool/conf.d/demo.yml <<CONFIG
myapp.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
myapp.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": False}
myapp.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": False}
CONFIG





There is an interesting nuance here. Instead of specifying only the binary protocol port, we specify the public address of the whole process, including the hostname. This is necessary so that the cluster nodes know how to connect to each other. It is a bad idea to use 0.0.0.0 as the advertise_uri: it should be an external IP address, not a socket bind address. Nothing will work without it, so Cartridge simply will not let a node with a wrong advertise_uri start.



Now that the configuration is ready, you can start the processes. Since a regular systemd unit does not allow starting more than one process, Cartridge-based applications install so-called instantiated units, which work like this:



$ sudo systemctl start myapp@router
$ sudo systemctl start myapp@storage_A
$ sudo systemctl start myapp@storage_B





In the configuration we specified the HTTP port on which Cartridge serves the web interface: 8080. Let's open it and have a look:







We see that although the processes are running, they are not configured yet. Cartridge does not know who should replicate with whom and cannot decide on its own, so it is waiting for our actions. Our choice is not big: the life of a new cluster begins with configuring the first node. Then we add the other nodes to the cluster, assign roles to them, and at this point the deployment can be considered successfully completed.



Pour yourself a glass of your favorite drink and relax after a long working week. The application is ready to be put to work.







Summary



So what is the bottom line? Try it, use it, leave feedback, and file tickets on GitHub.



References



[1] Tarantool » 2.2 » Reference » Rocks reference » Module vshard



[2] How we implemented the core of Alfa-Bank's investment business based on Tarantool



[3] Next-Generation Billing Architecture: Transitioning to Tarantool



[4] SWIM - cluster building protocol



[5] GitHub: tarantool/cartridge-cli



[6] GitHub: tarantool/cartridge


