Kubernetes greatly simplifies the operation of applications. It takes responsibility for deploying, scaling, and failover, while the declarative nature of resource descriptions simplifies the management of complex applications.
Tarantool can act as an application server, running stateless applications. But its real value shows only when it is used as a database and an application server at the same time. Tarantool is not used where a couple of MySQL servers would suffice. It is used where the network is bursting under the load, where a single extra field in the tables costs hundreds of gigabytes of storage, and where sharding is not a nod to a bright business future but an urgent necessity.
We develop solutions based on Tarantool, Tarantool Cartridge, and their ecosystem. How did we get to running a database on Kubernetes? It is very simple: speed of delivery and cost of operation. Today we present the Tarantool Kubernetes Operator; the details are below the cut.
Table of contents:
- Instead of a thousand words
- What the operator actually does
- A little about the nuances
- How the operator works
- What the operator deploys
- Summary
Tarantool is not only an open source database and application server, but also a team of engineers engaged in the development of turnkey enterprise systems.
Globally, our tasks can be divided into two areas: developing new systems and augmenting existing solutions. For example, there is a large database from a well-known vendor. To scale it for reads, they put an eventually consistent cache built on Tarantool behind it. Or the other way around: to scale writes, they deploy Tarantool in a hot/cold configuration where, as the data cools down, it is dumped to cold storage and, in parallel, to an analytics queue. Or, to back up an existing system, a lightweight version of it (a functional standby) is written, which protects the main "hot" system, with data replicated from the main system. You can learn more from the
T+ 2019 conference talks.
All of these systems have one thing in common: they are rather difficult to operate. Quickly rolling out a cluster of 100+ instances with redundancy across 3 data centers, updating an application that stores data without downtime or maintenance windows, backing up and restoring after a disaster or human error, ensuring unnoticeable failover of components, organizing configuration management... In short, lots of interesting work.
Tarantool Cartridge, which has literally just been open-sourced, greatly simplifies the development of distributed systems: it brings on board clustering, service discovery, configuration management, instance failure detection and automatic failover, replication topology management, and sharding components.
And it would be great if all of this were as simple to operate as it is to develop. Kubernetes makes the desired result achievable, but a specialized operator makes life even easier.
Today we are announcing the alpha version of the Tarantool Kubernetes Operator.
Instead of a thousand words
We have prepared a small example based on Tarantool Cartridge, and we will work with it. It is a simple application: a distributed key-value store with an HTTP interface. After launch we get the following picture:
Where:
- Routers - the part of the cluster that is responsible for accepting and processing incoming HTTP requests;
- Storages - the part of the cluster responsible for storing and processing data; 3 shards come up out of the box, each with a master and a replica.
To balance incoming HTTP traffic across the routers, the Kubernetes Ingress is used. The data is distributed across the storages at the level of Tarantool itself, using
the vshard component.
We need Kubernetes 1.14+; minikube will do. Having kubectl around will not hurt either. To start the operator, you need to create a ServiceAccount, a Role, and a RoleBinding for it:
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/service_account.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/role.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/role_binding.yaml
The Tarantool Operator extends the Kubernetes API with its own resource definitions; let's create them:
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_cluster_crd.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_role_crd.yaml
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/crds/tarantool_v1alpha1_replicasettemplate_crd.yaml
Everything is ready to launch the operator, let's go:
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/deploy/operator.yaml
We wait for the operator to start, and then we can move on to launching the application:
$ kubectl create -f https://raw.githubusercontent.com/tarantool/tarantool-operator/0.0.1/examples/kv/deployment.yaml
The example yaml file declares an Ingress for the web UI; it is available at cluster_ip/admin/cluster. As soon as at least one Pod behind the Ingress is up, you can go there and watch new instances being added to the cluster and its topology changing.
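If you are running on minikube, make sure the ingress addon is enabled; the cluster_ip in that case is simply the address of the minikube node:

$ minikube addons enable ingress
$ minikube ip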
We wait for the cluster to become ready for use:
$ kubectl describe clusters.tarantool.io examples-kv-cluster
We expect the cluster Status to contain the following:
…
Status:
  State:  Ready
…
That's it - the application is ready to use!
Need more storage space? Add shards:
$ kubectl scale roles.tarantool.io storage --replicas=3
Shards can't cope with the load? Increase the number of instances in each shard by editing the replica set template:
$ kubectl edit replicasettemplates.tarantool.io storage-template
Set .spec.replicas to, for example, 2 to increase the number of instances in each replica set to two.
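For reference, the edited template might then look roughly like this (a sketch only: .spec.replicas is the field mentioned above, while the surrounding layout is an assumption about the resource):

apiVersion: tarantool.io/v1alpha1
kind: ReplicasetTemplate
metadata:
  name: storage-template
spec:
  # one master and one replica per replica set
  replicas: 2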
The cluster is no longer needed? Delete it along with all its resources:
$ kubectl delete clusters.tarantool.io examples-kv-cluster
Something went wrong?
File a ticket, and we will sort it out quickly. :)
What the operator actually does
Launching and operating a Tarantool Cartridge cluster is a matter of performing specific actions, in a specific order, at specific moments.
The cluster itself is managed primarily through the admin API: GraphQL over HTTP. You can, of course, go a level lower and issue commands directly in the console, but that rarely happens. For example, this is how a cluster starts:
- We raise the required number of Tarantool instances, for example, under systemd.
- Join the instances into a membership:
mutation {
    probe_instance: probe_server(uri: "storage:3301")
}
- Assign roles to the instances and set the instance and replica set identifiers. The GraphQL API is used for this as well:
mutation {
    join_server(
        uri: "storage:3301",
        instance_uuid: "cccccccc-cccc-4000-b000-000000000001",
        replicaset_uuid: "cccccccc-0000-4000-b000-000000000000",
        roles: ["storage"],
        timeout: 5
    )
}
- Bootstrap the component responsible for sharding, also through the API:
mutation {
    bootstrap_vshard
    cluster {
        failover(enabled: true)
    }
}
Easy, right?
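By the way, "GraphQL over HTTP" means literally that: each of the mutations above can be sent as an ordinary HTTP request. A minimal sketch from the command line (the routers:8081 host is a placeholder, and the /admin/api path is an assumption about Cartridge's default GraphQL endpoint):

$ curl -s http://routers:8081/admin/api \
    -H 'Content-Type: application/json' \
    -d '{"query": "mutation { probe_instance: probe_server(uri: \"storage:3301\") }"}'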
Everything gets more interesting when it comes to scaling the cluster. The Routers role from the example scales simply: bring up more instances, hook them up to the existing cluster - done! The Storages role is somewhat trickier. The storage is sharded, so when adding or removing instances the data has to be rebalanced: moved onto the new instances, or moved off the instances being removed. If this is not done, in one case we end up with underloaded instances, in the other we lose data. And what if there is not just one, but a dozen such clusters with different topologies in operation?
In general, the Tarantool Operator takes care of all of this. The user describes the desired state of the Tarantool Cartridge cluster, and the operator translates it into a set of actions on k8s resources and into particular calls to the Tarantool cluster admin API - in a specific order, at specific moments - and generally tries to hide all the nuances from the user.
A little about the nuances
When working with the Tarantool Cartridge cluster admin API, both the order of the calls and where they are sent matter. Why is that?
Tarantool Cartridge carries on board its own topology storage, its own service discovery component, and its own configuration component. Each cluster instance keeps a copy of the topology and the configuration in a yaml file:
servers:
  d8a9ce19-a880-5757-9ae0-6a0959525842:
    uri: storage-2-0.examples-kv-cluster:3301
    replicaset_uuid: 8cf044f2-cae0-519b-8d08-00a2f1173fcb
  497762e2-02a1-583e-8f51-5610375ebae9:
    uri: storage-0-0.examples-kv-cluster:3301
    replicaset_uuid: 05e42b64-fa81-59e6-beb2-95d84c22a435
  …
vshard:
  bucket_count: 30000
  ...
Updates are applied in a coordinated fashion using a two-phase commit mechanism. A successful update requires a 100% quorum: every instance must respond, otherwise the update is rolled back. What does this mean for operation? All requests to the admin API that modify the cluster state are safest to send to a single instance, the leader; otherwise we risk ending up with different configs on different instances. Tarantool Cartridge cannot do leader election (not yet, anyway), but the Tarantool Operator can - and for you this is merely an entertaining fact, because the operator sorts all of it out.
Also, each instance must have a fixed identity, i.e. a set of instance_uuid and replicaset_uuid, as well as an advertise_uri. If a storage instance suddenly restarts and one of these parameters changes, you risk breaking the quorum - the operator takes care of this, too.
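As an illustration only (not necessarily how the operator actually implements it), identity can be kept stable by deriving it deterministically from the Pod name, which a StatefulSet keeps fixed across restarts. A minimal sketch in Go with the hypothetical helper identityFor and the google/uuid package:

package main

import (
	"fmt"

	"github.com/google/uuid"
)

// A fixed namespace UUID so the same Pod name always maps to the same instance UUID.
var identityNamespace = uuid.MustParse("00000000-0000-0000-0000-000000000000")

// identityFor derives a stable identity from the Pod name guaranteed by a
// StatefulSet (e.g. "storage-0-0") and the headless service name. Restarting
// the Pod keeps its name, hence the same instance_uuid and advertise_uri.
func identityFor(podName, serviceName string) (instanceUUID, advertiseURI string) {
	instanceUUID = uuid.NewSHA1(identityNamespace, []byte(podName)).String()
	advertiseURI = fmt.Sprintf("%s.%s:3301", podName, serviceName)
	return instanceUUID, advertiseURI
}

func main() {
	// Same output on every run - that is the point.
	fmt.Println(identityFor("storage-0-0", "examples-kv-cluster"))
}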
How the operator works
The operator's task is to bring the system to the state specified by the user and to keep it in that state until new instructions arrive. For the operator to be able to do its job, it needs:
- Description of the system status.
- The code that brings the system to this state.
- A mechanism for integrating this code into k8s (for example, to receive notifications of changes in state).
The Tarantool Cartridge cluster is described in k8s terms through a
Custom Resource Definition (CRD); the operator needs 3 such custom resources, united under the tarantool.io/v1alpha1 group (a sketch of these resources follows the list below):
- Cluster is a top-level resource that corresponds to one Tarantool Cartridge cluster.
- Role - in terms of Tarantool Cartridge, this is a user role.
- ReplicasetTemplate - a template from which StatefulSets will be created (why stateful - more on that a bit later); not to be confused with the k8s ReplicaSet.
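To make this more concrete, here is a hypothetical sketch of how the example cluster could be described with these resources (field names beyond what has already been shown above are assumptions, not the exact CRD schema):

apiVersion: tarantool.io/v1alpha1
kind: Cluster
metadata:
  name: examples-kv-cluster
---
apiVersion: tarantool.io/v1alpha1
kind: Role
metadata:
  name: storage
  labels:
    tarantool.io/cluster-id: examples-kv-cluster   # ties the Role to its Cluster (hypothetical label)
spec:
  replicas: 2   # number of replica sets (shards) for this role, as in the kubectl scale example above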
All of these resources directly mirror the Tarantool Cartridge cluster description model. With a shared vocabulary, it is easier for operations engineers to communicate with developers and understand what they want to see in production.
The code that brings the system to the specified state is, in k8s terms, a Controller. The Tarantool Operator has several of them:
- ClusterController - responsible for interacting with the Tarantool Cartridge cluster; it joins instances to the cluster and disconnects them from it.
- RoleController - the user role controller; responsible for deploying StatefulSets from the template and keeping their number at the specified count.
What does a controller look like? It is a piece of code that step by step brings the world around it into order. ClusterController can be depicted schematically like this:
The entry point is a check whether the Cluster resource the event refers to exists. It doesn't? We leave. It does? We move on to the next block: taking Ownership of the user roles. We take one - we leave; on the next pass we take the second. And so on until all of them are taken. Are all the roles owned? Then we move on to the next block of operations. And so on until the very last one; at that point we can assume the managed system is in the specified state.
In general, everything is simple. But it is important to define the success criteria for each stage. For example, we consider the cluster join operation successful not only when it returns the nominal success = true, but also when it returns an error like "already joined".
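Since the operator itself is written in Go, here is a rough sketch of such a loop in that language. Everything in it (the types, getCluster, listRoles, takeOwnership, joinInstances) is a hypothetical stand-in, only meant to illustrate the "check, act a little, requeue" pattern and the "already joined counts as success" idea described above:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real operator types and helpers.
type Cluster struct{ Name string }
type Role struct{ Owned bool }

// ErrAlreadyJoined stands in for the "already joined" response: for an
// idempotent reconcile loop it counts as success, not failure.
var ErrAlreadyJoined = errors.New("already joined")

func getCluster(name string) (*Cluster, error) { return &Cluster{Name: name}, nil } // look up the Cluster resource
func listRoles(c *Cluster) []*Role             { return nil }                       // roles that should belong to the cluster
func takeOwnership(c *Cluster, r *Role) error  { r.Owned = true; return nil }       // set c as the Owner of r
func joinInstances(c *Cluster) error           { return ErrAlreadyJoined }          // call the Cartridge admin API

// reconcile is called on every event: it nudges the world one step closer to
// the desired state and reports whether another pass is needed.
func reconcile(name string) (requeue bool, err error) {
	cluster, err := getCluster(name)
	if err != nil || cluster == nil {
		return false, err // the resource is gone: nothing to do
	}
	for _, role := range listRoles(cluster) {
		if !role.Owned {
			if err := takeOwnership(cluster, role); err != nil {
				return true, err
			}
			return true, nil // captured one role, come back for the next
		}
	}
	if err := joinInstances(cluster); err != nil && !errors.Is(err, ErrAlreadyJoined) {
		return true, err // a real failure: retry later
	}
	return false, nil // the managed system is in the specified state
}

func main() {
	fmt.Println(reconcile("examples-kv-cluster"))
}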
And the last part of this mechanism is integrating the controller with k8s. From a bird's-eye view, all of k8s consists of a set of controllers that generate events and react to them. The events travel through queues that we can subscribe to. Schematically, it can be represented as follows:
The user runs kubectl create -f tarantool_cluster.yaml, and the corresponding Cluster resource is created. The ClusterController is notified that a Cluster resource has been created. The first thing it tries to do is find all the Role resources that should be part of this cluster. If it finds them, it assigns the Cluster as the Owner of each Role and updates the Role resources. The RoleController receives the Role update notification, sees that the resource has an Owner, and starts creating StatefulSets. And so on around the circle: the first triggers the second, the second triggers the third - and so on until someone stops. You can also trigger on a timer, for example every 5 seconds, which is sometimes useful.
That's the whole operator: create a custom resource and write code that responds to events on resources.
What the operator deploys
The operator's actions ultimately lead to k8s creating Pods and containers. In a Tarantool Cartridge cluster deployed on k8s, all Pods are grouped into StatefulSets.
Why StatefulSet? As I wrote earlier, every Tarantool Cartridge instance keeps a copy of the cluster topology and configuration, and quite often the application server ends up using a space or two of its own - say, for a queue or for reference data - and that is already full-fledged state. A StatefulSet also guarantees that Pod identities are preserved, which matters when joining instances into a cluster: instance identity must be fixed, otherwise we risk losing the quorum on restart.
When all cluster resources are created and brought to the desired state, they form the following hierarchy:
The arrows indicate the Owner-Dependent relationship between resources. It is needed so that the
Garbage Collector cleans up after us if, for example, the Cluster is deleted.
In addition to the StatefulSets, the Tarantool Operator creates a Headless Service, which is needed for leader election; the instances also communicate with each other through it.
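For illustration, such a headless Service looks roughly like the sketch below (the selector label is a hypothetical example). clusterIP: None is what makes it headless, and together with the StatefulSets it gives every Pod a stable DNS name such as storage-0-0.examples-kv-cluster - exactly the kind of advertise_uri values seen in the config excerpt above:

apiVersion: v1
kind: Service
metadata:
  name: examples-kv-cluster
spec:
  clusterIP: None   # headless: no virtual IP, DNS resolves straight to Pod addresses
  selector:
    tarantool.io/cluster-id: examples-kv-cluster   # hypothetical label on the cluster's Pods
  ports:
    - name: app
      port: 3301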
Under the hood, the Tarantool Operator is built on the
Operator Framework; the operator code itself is written in golang, and there is nothing extraordinary about it.
Summary
That's about it! We are waiting for your feedback and tickets - it is an alpha version, after all, so we can hardly do without them. What's next? Next, there is a lot of work to polish all of this:
- Unit, E2E testing;
- Chaos Monkey testing;
- Stress Testing;
- backup / restore;
- external topology provider.
Each of these topics is extensive in its own right and deserves a separate article, so stay tuned for updates!