Databases at HighLoad++ 2019

Working with a database is what significantly affects the performance of any web service. If you try hard enough, you can create a highload problem without any real load at all.







And if you do everything wisely, you can handle the requests of many thousands of users. That is why the HighLoad++ schedule traditionally features many talks on databases. We have tracks on PostgreSQL, MySQL, and ClickHouse, plus several talks on MongoDB (in the best tradition, one of the speakers is a performance engineer at MongoDB). There are also talks comparing different approaches or examining specialized solutions. And for completeness, let's add Tarantool and in-memory databases. In total, 33 talks relate directly to the "Databases and storage systems" track and at least 10 more indirectly. And that is not counting the meetups, of which there are no fewer than ten, with new ones being added along the way.



We will try to help you navigate all this diversity and not miss the truly unique talks. To be safe, we asked the opinion of Nikolay Samokhvalov, the Program Committee member responsible for this track. Don't be put off by the fact that Nikolay is the founder of Postgres.ai and a Postgres man through and through: he is well-versed in the wider database world, knows interesting behind-the-scenes stories, and follows the trends.



The review is organized as simply as possible: we opened the list of talks and went through it from top to bottom, dwelling on the topics that definitely deserve attention.



ClickHouse Query Profiler



A query profiler in an analytical database is a very interesting thing. The approach has to be quite different from OLTP databases because, as a rule, queries in analytical databases run for a long time. Whereas in PostgreSQL the query execution plan is static, here it almost makes sense to monitor a single query while it is running.



In this talk, Nikita Lapkov will describe the design of such a query profiler in ClickHouse, which lets you determine exactly which section of code is slow for a particular query, and take appropriate measures to uphold the famous "ClickHouse does not slow down."
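
For context, ClickHouse's sampling profiler writes its samples to the system.trace_log table. Below is a minimal sketch of using it, assuming a recent ClickHouse server and the clickhouse-driver Python package (the query, sampling period, and query_id are illustrative):

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client("localhost")

# Run the query to be profiled under a known query_id, sampling its
# stack every 10 ms of wall-clock time.
client.execute(
    "SELECT count() FROM numbers(100000000) WHERE number % 7 = 0",
    query_id="profile-me",
    settings={"query_profiler_real_time_period_ns": 10_000_000},
)

# system.trace_log is flushed periodically, so a real script should
# wait a few seconds before reading the samples back.
rows = client.execute(
    """
    SELECT count() AS samples,
           demangle(addressToSymbol(trace[1])) AS top_frame
    FROM system.trace_log
    WHERE query_id = 'profile-me'
    GROUP BY top_frame
    ORDER BY samples DESC
    LIMIT 5
    """,
    settings={"allow_introspection_functions": 1},
)
for samples, frame in rows:
    print(samples, frame)  # the hottest code locations for that query
```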



Backup in modern infrastructure



This talk is from the "adjacent to databases" series: it examines the problem at the systems level, but most of it is devoted to backups in MySQL. Anton Turetsky's story will definitely be interesting because it is Badoo's experience, which means a huge number of servers. At that scale, backing everything up and, most importantly, verifying everything is a non-trivial task. On top of that, they managed to reconcile modern trends and system design paradigms with backups, so as not to lose confidence that the necessary data can be recovered in any, even the most critical, case.



NB: Backups without automatic verification are not backups.
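
A backup only counts once it has been restored and checked. Here is a minimal sketch of such automatic verification, with hypothetical paths and a generic single-server MySQL setup rather than Badoo's actual pipeline:

```python
import subprocess

BACKUP_DUMP = "/backups/db-latest.sql.gz"  # hypothetical dump location
VERIFY_HOST = "verify-db.internal"         # scratch MySQL instance

def restore_and_verify() -> bool:
    """Restore the latest dump to a scratch instance and smoke-test it."""
    # Restore: stream the compressed dump into a throwaway server.
    restore = subprocess.run(
        f"gunzip -c {BACKUP_DUMP} | mysql -h {VERIFY_HOST} app_db",
        shell=True,
    )
    if restore.returncode != 0:
        return False

    # Smoke check: the restored data must at least be non-empty.
    check = subprocess.run(
        ["mysql", "-h", VERIFY_HOST, "-N", "-e",
         "SELECT COUNT(*) FROM app_db.users"],
        capture_output=True, text=True,
    )
    return check.returncode == 0 and int(check.stdout) > 0

if __name__ == "__main__":
    print("backup OK" if restore_and_verify() else "backup BROKEN")
```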


Moving to ClickHouse: 3 years later



ClickHouse is confidently gaining ground, but few third-party developers have managed to accumulate solid experience with it. Alexander Zaitsev and Altinity are ClickHouse pioneers: three years ago they spoke at HighLoad++ about migrating a multi-petabyte analytics system to ClickHouse.



What has changed since then? Alexander will share his experience and know-how, which cannot be found in the documentation.



MongoDB vs. Postgres benchmarks



Two guests will talk about MongoDB at HighLoad++. Álvaro Hernández's talk has an interesting, even scandalous, backstory. When Álvaro built and published benchmarks comparing MongoDB and PostgreSQL, a skirmish broke out on the Internet. MongoDB later published benchmarks of their own.



In the end, each of the two worlds simply has its own philosophy. PostgreSQL adherents have a hard time accepting such a loose attitude towards data, yet MongoDB solutions are in demand on the market. Comparing them head-on is almost impossible, and that is what makes Álvaro's talk so exciting. It is easy to blindly stick to one point of view, but it is much better to know and understand both.



A fun fact for everyone: Michael Stonebraker got involved. He noticed the Postgres vs. Mongo dispute and published several articles about the problems of the document model. That is, the founder of Postgres, who once said that one size does not fit all and thereby kicked off the creation of specialized databases, including NoSQL, is now essentially coming back to Postgres. He describes the remaining problems, suggests marrying the data models, and generally argues that Postgres is doing fine. It is fascinating to watch this twenty-year cycle.


MongoDB distributed transactions from top to bottom



The second MongoDB talk will be given by Henrik Ingo, a MongoDB solutions architect specializing in improving MongoDB performance and ensuring high availability. Before MongoDB, Henrik worked in the MySQL world for many years, so he knows the arguments of the different camps first-hand.



At HighLoad++, Henrik will explain how transactions in a distributed NoSQL database can be made to satisfy ACID, and why you might need that at all.
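
Multi-document ACID transactions have been available in MongoDB since 4.0 on replica sets (and on sharded clusters since 4.2). A minimal sketch with pymongo, assuming a local replica set named rs0 and a hypothetical accounts collection:

```python
from pymongo import MongoClient

# Transactions require a replica set or a sharded cluster.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.bank

with client.start_session() as session:
    # Everything inside this block commits or aborts atomically.
    with session.start_transaction():
        db.accounts.update_one(
            {"_id": "alice"}, {"$inc": {"balance": -100}}, session=session
        )
        db.accounts.update_one(
            {"_id": "bob"}, {"$inc": {"balance": 100}}, session=session
        )
# Leaving the start_transaction() block normally commits the transaction;
# an exception inside it aborts both updates.
```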



Odyssey roadmap: what else we want from a connection pooler



Three weeks ago, the main limitation of PgBouncer, the one companies most often run into, was finally removed, but by then it had already managed to annoy everyone. Not least because it was nearly impossible to get improvements into the open source project: patches from Yandex and Avito went unaccepted for years.



Yandex did not wait for those changes and built its own connection pooler, Odyssey. It is multithreaded and has extra features, which Andrei Borodin will cover in detail in his talk. It will also be a chance to discuss the roadmap: which pooler features the community would like to see in future versions.



DBA bot Joe. Relieving the pain of backend development



With this talk, Postgres.ai proposes to fundamentally change the approach to backend development: instead of checking code and queries on small databases, check them on large ones and see the result immediately. It sounds logical: if a query is slow, it will be caught right away. The tricky part is how to do it, since giving everyone a full copy of the production database is very inconvenient. This is where the artificial DBA bot Joe comes to the rescue.



You can send Joe a query or ask him to create an index, and he will perform all the actions on a full-size copy of the production database. At any time you can start over, rolling back all changes in a few seconds and dropping the OS and DBMS caches. And ten developers working in parallel do not need 10x the disk space. How this magic works, and which open source components you can try to assemble it from yourself, is what Anatoly Stansler will explain.
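
The usual trick behind such instant full-size copies is copy-on-write snapshots. A minimal sketch of the idea with ZFS follows; the pool and dataset names are hypothetical, and this shows the general technique, not Joe's actual implementation:

```python
import subprocess

POOL = "dbpool/pgdata"  # hypothetical ZFS dataset holding a replica's PGDATA

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

def make_thin_clone(name: str) -> str:
    """Create a writable copy-on-write clone of the production data.

    The clone appears in seconds and consumes almost no extra space:
    only the blocks a developer actually changes get copied.
    """
    run(f"zfs snapshot {POOL}@{name}")
    run(f"zfs clone {POOL}@{name} dbpool/clone-{name}")
    # Point a throwaway Postgres instance at the clone's mountpoint.
    return f"/dbpool/clone-{name}"

def destroy_thin_clone(name: str) -> None:
    """Start over in seconds: drop the clone and its snapshot."""
    run(f"zfs destroy dbpool/clone-{name}")
    run(f"zfs destroy {POOL}@{name}")
```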



Dear DELETE. Typical errors when performing massive operations in highly loaded PostgreSQL databases



If you think there is nothing wrong with deleting several million rows with a single DELETE when the condition is known and there is a suitable index, you need to hear Nikolay Samokhvalov's talk. Try such an operation under those conditions and the service will most likely go down, for several reasons at once: the DBAs did not do their job well, the developers acted carelessly, and the organizational approach was wrong.



Nikolay will show how Postgres.ai helps solve these problems, how to set up protection even without it, and how to always act reliably without bringing down production. All of it is based on real experience of pain and huge financial losses. It may seem obvious that you should not delete everything at once but, say, mark data for deletion first, yet everyone still runs into operations that lock a million rows.
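
The standard remedy is to split one huge DELETE into small batches, committing after each so locks stay short and replicas keep up. A minimal sketch with psycopg2; the table, column, and batch size are hypothetical:

```python
import time
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True  # each batch commits on its own, keeping locks short

BATCH = 10_000

def delete_in_batches() -> None:
    with conn.cursor() as cur:
        while True:
            # DELETE has no LIMIT in Postgres, so pick a small batch of
            # ids with a subquery and delete only those rows.
            cur.execute("""
                DELETE FROM events
                WHERE id IN (
                    SELECT id FROM events
                    WHERE created_at < now() - interval '1 year'
                    LIMIT %s
                )
            """, (BATCH,))
            if cur.rowcount == 0:
                break        # nothing left to delete
            time.sleep(0.1)  # breathing room for vacuum and replicas

delete_in_batches()
```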



Patroni on GitLab.com



GitLab uses PostgreSQL to the fullest, recently dropped MySQL, and switched from repmgr to Patroni for high availability. Patroni was developed at Zalando; its job is to fail over automatically if something happens to the master and to keep the service available.



Patroni is now the de facto standard, and GitLab has deployed it in their cloud service, which handles, mind you, 25,000,000 git pull operations per day, and is preparing a solution for validating installations. Jose Cores Finotto will share this super-interesting experience at HighLoad++ on November 7.



StackGres: Cloud-Native PostgreSQL on Kubernetes



Álvaro Hernández will also present StackGres, a product that is essentially an RDS replacement for PostgreSQL. It makes it possible to deploy an RDS-like setup in Kubernetes much more cheaply and, with minimal effort, get configured backups, Patroni for automatic failover, health checks, and a bunch of other tools thrown in.



This is a promising undertaking in a direction similar to the Linux story: there is the Linux kernel, and many different distributions built around it. We are seeing the same thing with PostgreSQL, which can be considered the kernel of a DBMS, with distributions appearing around it. StackGres has a good chance of gaining popularity, because there is a lively team behind it and clients on whom to battle-test the solution.



PostgreSQL locks



Locks are, simply put, a topic everyone who works with PostgreSQL should hear about. All the more so because Egor Rogov, who has established himself as a brilliant lecturer, will be the one talking about them. He knows the material deeply and will help you sort out the types of locks, learn how to read pg_locks and pg_stat_activity, and avoid a number of mistakes in system design. Egor's talk at HighLoad++ is a great opportunity not just to listen but to talk with an expert, ask your own questions, and perhaps discuss entirely different problems.
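
As a warm-up for the topic, here is one classic pattern, finding out who is blocking whom, using pg_blocking_pids() (PostgreSQL 9.6+) joined to pg_stat_activity. A sketch with psycopg2 and a hypothetical connection string:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")

# For every waiting backend, show its blockers and both queries.
# pg_blocking_pids() hides the tricky self-join on pg_locks.
SQL = """
SELECT waiting.pid   AS waiting_pid,
       waiting.query AS waiting_query,
       blocker.pid   AS blocking_pid,
       blocker.query AS blocking_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY(pg_blocking_pids(waiting.pid))
"""

with conn.cursor() as cur:
    cur.execute(SQL)
    for wpid, wquery, bpid, bquery in cur.fetchall():
        print(f"pid {wpid} waits for pid {bpid}: {bquery!r} blocks {wquery!r}")
```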



Backing up heavily loaded DBMSs



Andrey Borodin and George Rylov work at Yandex and are developing WAL-G, an open source backup tool.



WAL-G started as a PostgreSQL tool developed at Citus (curiously, Microsoft recently absorbed Citus, in effect buying a piece of the PostgreSQL world). But it turned out that the way WAL-G organizes its work suits other databases well too. Andrey and George will talk precisely about the functionality for MySQL, Redis, and MongoDB, and the prospects this opens up.
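
For reference, the basic WAL-G workflow on the Postgres side looks roughly like this; the bucket and data directory below are placeholders, and the wal-g binary is driven from Python purely for illustration:

```python
import os
import subprocess

env = dict(
    os.environ,
    WALG_S3_PREFIX="s3://my-backup-bucket/pg",  # placeholder bucket
    PGDATA="/var/lib/postgresql/12/main",       # placeholder data dir
)

# Push a full base backup of the data directory to object storage.
subprocess.run(["wal-g", "backup-push", env["PGDATA"]], env=env, check=True)

# List the backups now available for restore.
subprocess.run(["wal-g", "backup-list"], env=env, check=True)

# Continuous WAL archiving is normally wired into postgresql.conf:
#   archive_command = 'wal-g wal-push %p'
```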



Vitess: Fearlessly Scaling in the Cloud



Sugu Sougoumarane is a co-founder of PlanetScale. You may not have heard of the company yet, but it recently raised $25 million in funding to develop its open source product, Vitess. You may not have heard of Vitess either. Well, Vitess is a sharding system for MySQL, and you certainly know more than one large company that uses Vitess for heavily loaded databases.



It all started at YouTube. It was there that Sugu and his team created what later became the open source Vitess system. Notably, they chose Go, a very young language at the time. Sugu can tell many interesting stories about the early years of Go and its development in general: at Google, his team became the first major user of the language.



These days, in addition to YouTube, Vitess is used by companies such as GitHub, Pinterest, Slack, and Square. After leaving Google, Sugu founded PlanetScale and continues to develop Vitess, keeping the code open. Come hear about sharding at planetary scale and using Go under true highload. And don't forget to ask about plans for a Postgres version of Vitess; Sugu really likes that question.



Patroni Failure Stories or How to crash your PostgreSQL cluster



Funnily enough, we will hear the main Patroni maintainer on a different topic, since he has already told us about Patroni itself. But Alexey Lesovsky can tell us how Patroni is operated outside Zalando and what bumps people pick up along the way. Those bumps can be very diverse, and Alexey promises to share the details of real crash cases. From the talk we will learn what problems exist, what lessons Data Egret has learned, and how to configure Patroni (and possibly PostgreSQL) correctly. And, of course, we will get an idea of how to spot emerging problems quickly and fix them just as quickly.



SQL/JSON: implementing the standard and not stopping there



Lately, the border between relational and document-oriented DBMSs has been blurring. The SQL standard now includes functions for working with JSON, and PostgreSQL is a pioneer of efficient JSON support among relational DBMSs. Largely thanks to Postgres Professional, the standard has already been partially implemented.



Alexander Korotkov's talk is a first-hand account of the SQL/JSON implementation and its "heart", jsonpath, in PostgreSQL. It is a chance to learn about the internals, operating experience, and plans for the future.
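
The jsonpath core shipped in PostgreSQL 12, so you can already try it. A small taste of it, sketched with psycopg2 against a hypothetical database on a 12+ server:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")

with conn.cursor() as cur:
    # jsonb_path_query (PostgreSQL 12+) applies a jsonpath expression;
    # here it picks the array elements greater than 2.
    cur.execute("""
        SELECT jsonb_path_query(
            '{"readings": [1, 2, 3, 4]}'::jsonb,
            '$.readings[*] ? (@ > 2)'
        )
    """)
    print([row[0] for row in cur.fetchall()])  # -> [3, 4]
```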



PostgreSQL on K8s in Zalando: two years in battle



Alexander Kukushkin is a co-author of Patroni, but this year he will talk about another interesting Zalando project. Two years ago they started developing Postgres-Operator, and with its help DBAs now service more than 1000 Postgres clusters running on Kubernetes.



While some still doubt whether databases belong in Kubernetes at all, large companies are already running them there. Getting acquainted with such experience and learning from it is a rare opportunity.



Enterprise challenges for Postgres



Large companies increasingly use PostgreSQL, very often expecting from it what they are used to in the enterprise world. A typical example: we need a logical replication solution, so we turn to a vendor. Some vendors have even added such support: where there used to be Oracle, PostgreSQL now appears. But once you start digging in, it turns out a lot of things work differently.



We are witnessing a collision between the open source and enterprise worlds. Andreessen Horowitz recently published a study showing that investor interest in open source has grown substantially and will keep growing. So vendors need to move to open source and new monetization models, and this will be better for a number of reasons.


Ivan Panchenko will explain exactly which difficulties of enterprise migration to PostgreSQL are subjective, of the "clumsy hands" variety, and which are genuine challenges that PostgreSQL must address as it evolves. The abstract promises discussion of topics such as scaling factors (table size, number of objects, memory, connections, replication), storage features (heap, pluggable storage), temporary tables, vacuum, and interaction with the OS.



And on that note, the future is open source, we wrap up our detailed review of the talks. Unfortunately, MySQL was left almost entirely behind the scenes. If that is your topic, check out the talks by Vittorio Cioe and Alkin Tezuysal.



ClickHouse also appears in more talks than we covered, and, as always, the meetup is especially valuable: there you can ask any question about ClickHouse, work through problems together with the developers, and discuss capabilities and plans.



We also did not cover Tarantool, since it is a database and an application server in one package, and the talks in the HighLoad++ 2019 program focus precisely on that versatility. Vasily Tyubek will talk about the Tarantool Kubernetes Operator for running the database in Kubernetes, and Yaroslav Dynnikov will show how convenient it is to build distributed systems with Tarantool. And don't miss the chance to clarify all the details with the developers themselves; it is much more productive and interesting than reading the documentation.



In general, we consider questions to speakers, backstage discussions, and networking a very important part of the conference, if not the most important. So we create every condition for informal communication and try to have a good time.



On November 7 and 8, HighLoad++ will fill SKOLKOVO to the brim and spill over. There will be HighLoad++ satellite venues in Novosibirsk and St. Petersburg with a video link to the Main Hall and all the networking benefits of the conference. On YouTube we will run an open broadcast of the most anticipated talks and the HighLoad++ Prize, and in the Telegram channel our live text coverage will follow a trajectory of its own. In short, even if you are not coming to HighLoad++ (a pity: you can still change your mind, grab a ticket, and fly over), you can still get plenty of benefit and fun through our channels.


