PostgreSQL and record consistency settings for each particular connection

The translation of the article was prepared especially for students of the "Database" course. Is it interesting to develop in this direction? We invite you to Open Day , where we talk in detail about the program, features of the online format, competencies and career prospects that await graduates after training.













PostgreSQL and record consistency settings for each particular connection

At Compose, we have to deal with many databases, which is what gives us the opportunity to get to know our functionality and weaknesses better. As we learn to love the functional features of new databases, we sometimes start to think about how good it would be if similar functions were present in more mature tools that we have been working with for a long time. One of the new features that PostgreSQL wanted to see was its custom write consistency across the entire cluster. And as it turned out, we already have it, and today we want to share with you information about how you can use it.







Why do I need it?



How a cluster should behave depends on your application. Take, for example, an application for paying bills. You will need 100% consistency in the cluster, so you have to enable synchronous commits so that your database waits for all changes to be made. However, if your application is a fast-growing social network, then you will probably prefer 100% consistency with a quick response. To achieve this, you can use asynchronous commits in your cluster.







Meet the compromise



You have to compromise between data consistency and performance. PostgreSQL steps from consistency, since the default configuration in this case is predictable and without unexpected surprises. And now we will get acquainted with compromises.







Compromise 1: Performance



If the PostgreSQL cluster does not require consistency, it may well work asynchronously. Recording is done on the cluster leader, and updates will be sent to his replicas after a few milliseconds. When consistency is required for a PostgreSQL cluster, it should work synchronously. The record will be made in the leader of the cluster, which will send updates to the replicas and wait for confirmation that everyone made a record before sending confirmation to the client who initiated the record that it was successful. The practical difference between these approaches is that the asynchronous method requires two network jumps, while the synchronous method requires four.







Compromise 2: Consistency



The result in the event of a failure in the work of the leader in these two approaches will also be different. If the work is performed asynchronously, then when such an error occurs, not all records will be committed by replicas. How much will be lost? Depends on the application itself and replication efficiency. Compose replication will prevent the replica from becoming a leader if the amount of information in it is 1 MB less than in the leader, that is, up to 1 MB of records can potentially be lost during asynchronous operation.







In synchronous mode, this does not happen. If the leader fails, all replicas are updated, since any record confirmed on the leader must be confirmed in the replicas. Here it is - coherence.







It makes sense to use synchronous behavior in an application to pay bills, where consistency has a clear advantage in finding a compromise between consistency and performance. The most important thing for such an application is valid data. Now remember about the social network, in which the main task is to keep the user's attention, responding to requests as quickly as possible. In this case, performance with fewer network hopping and fewer wait commits will be a priority. However, the tradeoff between performance and consistency is not the only one to think about.







Compromise 3: Failures



It is very important to understand how the cluster behaves during a failure. Consider a situation where one or more replicas fail. When commits are processed asynchronously, the leader will continue to function, that is, receive and process records without waiting for missing replicas. When the replicas return to the cluster, they catch up with the leader. In synchronous replication, if the replicas do not respond, then the leader will have no choice and he will continue to wait for confirmation of the commit until the replica returns to the cluster and can accept and confirm the record.







One connection per transaction?



Each application needs a special type of combination of consistency and performance. Unless of course it is our billing application, which we think is completely consistent, or our almost ephemeral social network application. In all other cases, there will be times when some operations must be synchronous and some asynchronous. You may not want the system to wait until the message sent to the chat is committed, but if the payment is made in the same application, you will have to wait.







All these decisions, of course, are made by the application developer. The right decisions about when to apply this or that approach will help to get the most out of the cluster. It is important that the developer can switch between them at the SQL level for connections and for transactions.







Providing control in practice



By default, PostgreSQL provides consistency. This is controlled by the synchronous_commit



server parameter. By default, it is on



, but it has three other options: local



, remote_write



or off



.







When the parameter is set to off



, all synchronous commits are stopped, even in the local system. The parameter in local determines the synchronous mode for the local system, but writes to replicas are asynchronous. Remote_write



goes even further: writes to replicas are made asynchronously, but are returned when the replica accepted the record but did not write it to disk.







Considering the available range of options, we choose the behavior and, remembering that on



are synchronous records, we select local



for asynchronous commits over the network, while leaving local commits synchronous.







Now, we will tell you how to configure this in an instant, but imagine that we set synchronous_commit



to local



for the server. We asked ourselves whether it is possible to change the synchronous_commit



parameter on the fly, and it turned out that it’s not just possible, there are even two ways to do this. The first is to set up your connection session as follows:







 SET SESSION synchronous_commit TO ON; // Your writes go here
      
      





All subsequent records in the session will confirm the write operations for the replicas before returning a positive result to the connected client. Unless, of course, you change the synchronous_commit



setting again. You can omit the SESSION



part of the command because it will be in the default value.







The second way is good when you just want to make sure you get synchronous replication for a single transaction. In many NoSQL databases, the concept of transactions does not exist, but it exists in PostgreSQL. In this case, you start the transaction, and then set synchronous_commit



to on



before writing to the transaction. COMMIT



the transaction using any value of the synchronous_commit



parameter that was set at that moment, although it is best to set the variable in advance to make sure that other developers understand that the records are not asynchronous.







 BEGIN; SET LOCAL synchronous_commit TO ON; // Your writes go here COMMIT;
      
      





All transaction commits will now be confirmed as written to the replicas even before the database returns a positive response to the connected client.







PostgreSQL setup



Before that, we imagined a PostgreSQL system with synchronous_commit



set to local



. For this to be real on the server side, you will need to set two server configuration parameters. Another synchronous_standby_names



parameter will take over when synchronous_commit



is on



. It determines which replicas are eligible for synchronous commits, and we will set it to *



, which will mean that all replicas are enabled. These values ​​are usually configured in the configuration file by adding:







 synchronous_commit = local synchronous_standby_names='*'
      
      





By setting the synchronous_commit



parameter to local



, we create a system in which local drives remain synchronous, but network replica commits are asynchronous by default. Unless, of course, we decide to make these commits synchronous, as shown above.







If you followed the development of the Governor project , you may have noticed some recent changes ( 1 , 2 ), which allowed Governor users to test these parameters and control their consistency.







A few more words ...



Just a week ago, I would tell you that it is not possible to fine-tune PostgreSQL so finely. It was then that Kurt, a member of the Compose platform team, insisted that there was such an opportunity. He pacified my objections and found the following in the PostgreSQL documentation:













This parameter can be changed at any time. The behavior for any transaction is determined by the setting in effect when committing. Therefore, it is possible and useful for commits to be committed synchronously for some transactions, and asynchronously for others. For example, to force a single multistatement



transaction to commit asynchronously when the default value of the parameter is the opposite, set
SET LOCAL synchronous_commit TO OFF



in the transaction.







With this small modification in the configuration file, we gave users the ability to control their consistency and performance.








All Articles