Tarantool Kubernetes Operator





Kubernetes greatly simplifies the operation of applications. It takes responsibility for deploying, scaling, and failover, while the declarative nature of resource descriptions simplifies the management of complex applications.



Tarantool can act as an application server, executing stateless applications. But its real value comes from using it as a database and an application server at the same time. Tarantool is not used where a couple of MySQL servers would do. It is used where the network is saturated under load, where one extra field in a table costs hundreds of gigabytes of space, and where sharding is not a distant business aspiration but an urgent need.



We build solutions based on Tarantool, Tarantool Cartridge, and their ecosystem. How did we get to running a database on Kubernetes? It is very simple: delivery speed and cost of operation. Today we present the Tarantool Kubernetes Operator; the details are below.



Table of contents:



  1. Instead of a thousand words
  2. What does the operator actually do
  3. A little about the nuances
  4. How the operator works
  5. What the operator deploys
  6. Summary




Tarantool is not only an open source database and application server, but also a team of engineers building turnkey enterprise systems.



Globally, our tasks fall into two areas: developing new systems and augmenting existing solutions. For example, there is a large database from a well-known vendor. To scale it for reads, an eventually consistent cache on Tarantool is placed behind it. Or the other way around: to scale writes, Tarantool is deployed in a hot/cold configuration, where cooling data is flushed to cold storage and, in parallel, to an analytics queue. Or, to back up an existing system, a lightweight version of it (a functional standby) is written, which protects the main "hot" system with data replicated from it. You can learn more from the talks at T+ 2019.



All of these systems have one thing in common: they are rather difficult to operate. Quickly roll out a cluster of 100+ instances with redundancy across 3 data centers; update an application that stores data, without downtime or maintenance windows; perform backup and restore after a disaster or human error; ensure unobtrusive failover of components; organize configuration management... Plenty of interesting work, in short.



Tarantool Cartridge, which has only just been open-sourced, greatly simplifies the development of distributed systems: it ships with clustering, service discovery, configuration management, instance failure detection and automatic failover, replication topology management, and sharding components.



It would be great if all of this could be operated as simply as it is developed. Kubernetes makes the desired result achievable, and a specialized operator makes life even easier.



Today we are announcing the alpha version of the Tarantool Kubernetes Operator.



Instead of a thousand words



We have prepared a small example based on Tarantool Cartridge, and we will work with it. It is a simple application: a distributed key-value store with an HTTP interface. After launch, we get this picture:












Kubernetes Ingress is used to balance incoming HTTP traffic across the routers. Data is distributed across the storages at the level of Tarantool itself by the vshard component.



You will need Kubernetes 1.14+; minikube will do. Having kubectl available will not hurt either. To start the operator, create a ServiceAccount, a Role, and a RoleBinding for it:



$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/service_account.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/role.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/role_binding.yaml





The Tarantool Operator extends the Kubernetes API with its own resource definitions; let's create them:



$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_cluster_crd.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_role_crd.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_replicasettemplate_crd.yaml





Everything is ready to launch the operator, let's go:



 $ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/operator.yaml
      
      





Once the operator is up, we can proceed to launch the application:



 $ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/examples/kv/deployment.yaml
      
      





The example's YAML file declares an Ingress for the web UI; it is available at cluster_ip/admin/cluster. Once at least one Pod behind the Ingress is up, you can open it and watch new instances join the cluster and its topology change.
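For reference, such an Ingress declaration might look roughly like this. This is a hedged sketch: the service name and port are illustrative assumptions, not taken verbatim from the example's deployment.yaml.

```yaml
# Hypothetical sketch of the Ingress for the web UI;
# serviceName and servicePort are illustrative assumptions.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: examples-kv-cluster
spec:
  rules:
    - http:
        paths:
          - path: /
            backend:
              serviceName: examples-kv-cluster   # assumed service name
              servicePort: 8081                  # assumed HTTP port of the routers
```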



Wait for the cluster to become ready for use:



 $ kubectl describe clusters.tarantool.io examples-kv-cluster
      
      





The cluster Status should contain the following:



…
Status:
  State:  Ready
…





That's it, the application is ready to use!



Need more storage space? Add shards:



 $ kubectl scale roles.tarantool.io storage --replicas=3
      
      





The shards cannot cope with the load? Increase the number of instances in each shard by editing the replicaset template:



 $ kubectl edit replicasettemplates.tarantool.io storage-template
      
      





Set .spec.replicas to 2, for example, to increase the number of instances in each replicaset to two.
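After the edit, the template might look roughly like this. This is a sketch; only spec.replicas is confirmed by the text above, the other field names are assumptions about the CRD schema.

```yaml
# Hypothetical fragment of the storage-template resource;
# only spec.replicas is confirmed by the surrounding text.
apiVersion: tarantool.io/v1alpha1
kind: ReplicasetTemplate
metadata:
  name: storage-template
spec:
  replicas: 2   # two instances in each replicaset
```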



No longer need the cluster? Delete it along with all its resources:



 $ kubectl delete clusters.tarantool.io examples-kv-cluster
      
      





Something went wrong? File a ticket, and we will sort it out quickly. :)



What does the operator actually do



Launching and operating a Tarantool Cartridge cluster is a matter of performing specific actions, in a specific order, at specific moments.



The cluster itself is managed primarily through its admin API: GraphQL over HTTP. You can, of course, drop a level lower and issue commands directly in the console, but that is rare. For example, this is how a cluster starts:



  1. Start the required number of Tarantool instances, for example under systemd.

  2. Join the instances into a membership group:



     mutation {
       probe_instance: probe_server(uri: "storage:3301")
     }





  3. Assign roles to the instances and set instance and replicaset identifiers, again through the GraphQL API:



     mutation {
       join_server(
         uri: "storage:3301",
         instance_uuid: "cccccccc-cccc-4000-b000-000000000001",
         replicaset_uuid: "cccccccc-0000-4000-b000-000000000000",
         roles: ["storage"],
         timeout: 5
       )
     }





  4. Bootstrap the component responsible for sharding, also through the API:



     mutation {
       bootstrap_vshard
       cluster { failover(enabled: true) }
     }







Easy, right?



Everything becomes more interesting when it comes to cluster expansion. The router role from the example scales simply: spin up more instances, attach them to the existing cluster, and you are done. The storage role is trickier. The storage is sharded, so when adding or removing instances the data must be rebalanced onto the new instances or off the removed ones. If this is not done, we either end up with underloaded instances or lose data. And what if not one but a dozen such clusters with different topologies are in operation?



The Tarantool Operator takes care of all of this. The user describes the desired state of the Tarantool Cartridge cluster, and the operator translates it into a set of actions on k8s resources and into specific calls to the Tarantool cluster admin API, in a specific order, at specific moments, generally hiding the nuances from the user.



A little about the nuances



When working with the Tarantool Cartridge cluster admin API, both the order of calls and where they are sent matter. Why is that?



Tarantool Cartridge ships with its own topology store, its own service discovery component, and its own configuration component. Each instance of the cluster keeps a copy of the topology and configuration in a YAML file:



 servers:
   d8a9ce19-a880-5757-9ae0-6a0959525842:
     uri: storage-2-0.examples-kv-cluster:3301
     replicaset_uuid: 8cf044f2-cae0-519b-8d08-00a2f1173fcb
   497762e2-02a1-583e-8f51-5610375ebae9:
     uri: storage-0-0.examples-kv-cluster:3301
     replicaset_uuid: 05e42b64-fa81-59e6-beb2-95d84c22a435
 …
 vshard:
   bucket_count: 30000
 ...





Updates happen in concert via a two-phase commit mechanism. A successful update requires a 100% quorum: every instance must respond, otherwise the change is rolled back. What does this mean for operations? All admin API requests that modify cluster state are most reliably sent to a single instance, the leader; otherwise we risk ending up with different configs on different instances. Tarantool Cartridge cannot perform leader election (yet), but the Tarantool Operator can, and you only need to know this as an entertaining fact, because the operator sorts it all out itself.



Also, each instance must have a fixed identity: a set of instance_uuid and replicaset_uuid, as well as advertise_uri. If a storage suddenly restarts and one of these parameters changes, you risk breaking the quorum; the operator takes care of this as well.



How the operator works



The operator's task is to bring the system to the state specified by the user and to keep it there until new instructions arrive. To do its job, the operator needs:



  1. Description of the system status.

  2. The code that brings the system to this state.

  3. A mechanism for integrating this code into k8s (for example, to receive notifications of changes in state).



The Tarantool Cartridge cluster is described in k8s terms through Custom Resource Definitions (CRDs); the operator needs three such custom resources, united under the tarantool.io/v1alpha1 group:

  - Cluster describes the Tarantool Cartridge cluster as a whole;
  - Role describes a user role: a group of replicasets with the same function;
  - ReplicasetTemplate is a template for the StatefulSets that get created.

All of these resources directly mirror the Tarantool Cartridge cluster description model. With a shared vocabulary, it is easier for an operations engineer to communicate with developers and to understand what they want to see in production.
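To make this concrete, a user-facing Cluster manifest might look roughly like this. This is a sketch under assumptions: the spec fields shown are illustrative, not the actual CRD schema.

```yaml
# Hypothetical Cluster manifest; the selector field is an
# illustrative assumption about how a Cluster finds its Roles.
apiVersion: tarantool.io/v1alpha1
kind: Cluster
metadata:
  name: examples-kv-cluster
spec:
  selector:
    matchLabels:
      tarantool.io/cluster-id: examples-kv-cluster
```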



The code that brings the system to the desired state is, in k8s terms, a Controller. In the case of the Tarantool Operator, there are several controllers:

  - ClusterController, responsible for Cluster resources;
  - RoleController, responsible for Role resources.

What does a controller look like? It is code that gradually brings the world around it into order. A ClusterController can be schematically depicted like this:







The entry point is a check: does a Cluster resource exist for the event that occurred? If not, we leave. If it does, we move to the next block: taking Ownership of the user Roles. We capture one, exit, and capture the next on the following pass, and so on until all of them are captured. All Roles captured? Then we move to the next block of operations, and so on until the last one; at that point we can consider the managed system to be in the desired state.



In general, everything is simple. What matters is defining the success criteria for each stage. For example, we consider a cluster join operation successful not only when it returns a conventional success = true, but also when it returns an error like "already joined".



The last part of this mechanism is the integration of the controller with k8s. From a bird's-eye view, the whole of k8s consists of a set of controllers that generate events and react to them. Events travel through queues that we can subscribe to. Schematically, it can be represented as follows:







The user runs kubectl create -f tarantool_cluster.yaml, and the corresponding Cluster resource is created. The ClusterController is notified of the creation of the Cluster resource. The first thing it tries to do is find all the Role resources that should be part of this cluster. If it finds them, it assigns the Cluster as Owner of each Role and updates the Role resources. The RoleController receives a Role update notification, sees that the resource has an Owner, and begins creating StatefulSets. And so on in a circle: the first triggers the second, the second triggers the third, until someone stops. You can also trigger on a timer, for example every 5 seconds, which is sometimes useful.



That's the whole operator: create a custom resource and write code that responds to events on resources.



What the operator deploys



The operator's actions ultimately lead to k8s creating Pods and containers. In a Tarantool Cartridge cluster deployed on k8s, all Pods are grouped into StatefulSets.



Why StatefulSet? As mentioned earlier, each Tarantool Cartridge instance keeps a copy of the cluster topology and configuration, and every now and then even an application server ends up using some space of its own, say for a queue or for reference data, and that is already full-fledged state. StatefulSet also guarantees stable Pod identities, which matters when joining instances into a cluster: instance identities must be fixed, otherwise we risk losing the quorum on restart.



When all cluster resources are created and brought to the desired state, they form the following hierarchy:







The arrows indicate the Owner-Dependant relationship between resources. It is needed so that the Garbage Collector cleans up after us if, for example, the Cluster is deleted.
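In k8s, this relationship is expressed through ownerReferences on the dependent resource. A StatefulSet owned by a Role might carry metadata roughly like this; the names and UID here are illustrative assumptions, not values from the example.

```yaml
# Hypothetical metadata of a StatefulSet owned by a Role resource;
# name, owner name, and uid are illustrative assumptions.
metadata:
  name: storage-0
  ownerReferences:
    - apiVersion: tarantool.io/v1alpha1
      kind: Role
      name: storage
      uid: d9607b5c-0000-4000-b000-000000000000
      controller: true
      blockOwnerDeletion: true
```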



In addition to StatefulSets, the Tarantool Operator creates a Headless Service, which is needed for leader election and through which the instances communicate with each other.
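A headless Service is simply a Service with clusterIP: None, which gives each Pod a stable DNS name instead of a load-balanced virtual IP. A sketch, with the name, label, and port being illustrative assumptions (the port matches the 3301 seen in the config example above):

```yaml
# Hypothetical headless Service; name, selector label, and port
# are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: examples-kv-cluster
spec:
  clusterIP: None   # headless: DNS records point directly at the Pods
  selector:
    tarantool.io/cluster-id: examples-kv-cluster
  ports:
    - name: app
      port: 3301
```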



Under the hood of the Tarantool Operator is the Operator Framework; the operator code itself is written in Go, nothing extraordinary there.



Summary



That's all, in general! We are waiting for your feedback and tickets; an alpha version can't do without them. What's next? A lot of work remains to bring all of this up to scratch:





Each of these topics is extensive in itself and deserves an article of its own, so stay tuned for updates!


