PostgreSQL locks: 2. String locks

Last time we talked about object-level locks , in particular, about locks on relationships. Today we will see how row locks are arranged in PostgreSQL and how they are used together with object locks, we will talk about waiting queues and about those who climb out of turn.







Row locks



Device



Let me remind you of several important conclusions from the last article.





We certainly want the change of one row not to block other rows of the same table. But we cannot afford to start each line with our own lock.



There are different ways to solve this problem. In some DBMSs, an increase in the level of locking occurs: if there are too many row level locks, they are replaced by one more general lock (for example, page level or the whole table).



As we will see later, PostgreSQL also uses this mechanism, but only for predicate locks. Line locks are different.



In PostgreSQL, information that a row is locked is stored solely and exclusively in the version of the row inside the data page (and not in RAM). That is, this is not a block at all in the usual sense, but just a sign. This sign is actually the xmax transaction number in combination with additional information bits; a little later we will see in detail how this works.



The plus is that we can block as many lines as we like without consuming any resources.



But there is a minus : since information about the lock is not presented in RAM, other processes cannot stand in line. And there is no monitoring possibility (in order to calculate the locks, you need to read the entire table).



Well, monitoring is fine, but something needs to be done with the queue. To do this, you still have to use "regular" locks. If we need to wait until the row is released, in fact, we must wait until the end of the blocking transaction - all locks are released upon commit or rollback. And for this, you can request a blocking number of a blocking transaction (which, I recall, is held by the transaction itself in exceptional mode). Thus, the number of locks used is proportional to the number of simultaneously running processes, and not to the number of rows being changed.



Exceptional Modes



In total there are 4 modes in which you can lock the line. Of these, two modes represent exclusive locks that only one transaction can hold at a time.





The UPDATE command itself selects the minimum appropriate locking mode; usually rows are locked in FOR NO KEY UPDATE mode.



As you remember , when deleting or changing a line, the current transaction version number is written in the xmax field of the current current version. It shows that the version of the row has been deleted by this transaction. So, the same xmax number is used as a sign of blocking. In fact, if xmax in the version of the line corresponds to an active (not yet completed) transaction and we want to update this particular line, then we must wait for the transaction to complete, so an additional sign is not needed.



Let's get a look. Create a table of accounts, the same as in the previous article.



=> CREATE TABLE accounts( acc_no integer PRIMARY KEY, amount numeric ); => INSERT INTO accounts VALUES (1, 100.00), (2, 200.00), (3, 300.00);
      
      





To look at the pages, of course, we need the already familiar pageinspect extension.



 => CREATE EXTENSION pageinspect;
      
      





For convenience, create a view showing only the information we are interested in: xmax and some information bits.



 => CREATE VIEW accounts_v AS SELECT '(0,'||lp||')' AS ctid, t_xmax as xmax, CASE WHEN (t_infomask & 128) > 0 THEN 't' END AS lock_only, CASE WHEN (t_infomask & 4096) > 0 THEN 't' END AS is_multi, CASE WHEN (t_infomask2 & 8192) > 0 THEN 't' END AS keys_upd, CASE WHEN (t_infomask & 16) > 0 THEN 't' END AS keyshr_lock, CASE WHEN (t_infomask & 16+64) = 16+64 THEN 't' END AS shr_lock FROM heap_page_items(get_raw_page('accounts',0)) ORDER BY lp;
      
      





So, we start the transaction and update the amount of the first account (the key does not change) and the number of the second account (the key changes):



 => BEGIN; => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1; => UPDATE accounts SET acc_no = 20 WHERE acc_no = 2;
      
      





We look into the view:



 => SELECT * FROM accounts_v LIMIT 2;
      
      



  ctid | xmax | lock_only | is_multi | keys_upd | keyshr_lock | shr_lock -------+--------+-----------+----------+----------+-------------+---------- (0,1) | 530492 | | | | | (0,2) | 530492 | | | t | | (2 rows)
      
      





The locking mode is determined by the keys_updated information bit.



The same xmax field is also used when locking a row with the SELECT FOR UPDATE command, but in this case an additional information bit (xmax_lock_only) is put down, which indicates that the version of the row is only locked, but not deleted and is still relevant.



 => ROLLBACK; => BEGIN; => SELECT * FROM accounts WHERE acc_no = 1 FOR NO KEY UPDATE; => SELECT * FROM accounts WHERE acc_no = 2 FOR UPDATE;
      
      





 => SELECT * FROM accounts_v LIMIT 2;
      
      



  ctid | xmax | lock_only | is_multi | keys_upd | keyshr_lock | shr_lock -------+--------+-----------+----------+----------+-------------+---------- (0,1) | 530493 | t | | | | (0,2) | 530493 | t | | t | | (2 rows)
      
      





 => ROLLBACK;
      
      







Shared Modes



Two more modes represent shared locks that can be held by several transactions.





We'll see.



 => BEGIN; => SELECT * FROM accounts WHERE acc_no = 1 FOR KEY SHARE; => SELECT * FROM accounts WHERE acc_no = 2 FOR SHARE;
      
      





In row versions we see:



 => SELECT * FROM accounts_v LIMIT 2;
      
      



  ctid | xmax | lock_only | is_multi | keys_upd | keyshr_lock | shr_lock -------+--------+-----------+----------+----------+-------------+---------- (0,1) | 530494 | t | | | t | (0,2) | 530494 | t | | | t | t (2 rows)
      
      





In both cases, the keyshr_lock bit is set, and the SHARE mode can be recognized by looking at one more information bit.



Here's what the general mode compatibility matrix looks like.



mode FOR KEY SHARE FOR SHARE FOR NO KEY UPDATE FOR UPDATE
FOR KEY SHARE X
FOR SHARE X X
FOR NO KEY UPDATE X X X
FOR UPDATE X X X X


It shows that:





Multi-transaction



Until now, we thought that the lock is represented by the number of the blocking transaction in the xmax field. But shared locks can be held by multiple transactions, and multiple numbers cannot be written to the same xmax field. How to be



For shared locks, the so-called multi- transactions (MultiXact) are used. This is a transaction group that is assigned a separate number. This number has the same dimension as a regular transaction number, but the numbers are allocated independently (that is, the system can have the same transaction and multi-transaction numbers). To distinguish one from the other, another information bit (xmax_is_multi) is used, and detailed information about the members of such a group and the locking modes are in the files in the $ PGDATA / pg_multixact / directory. Naturally, the last used data is stored in buffers in the server’s shared memory for faster access.



Add to the existing locks another exceptional one executed by another transaction (we can do this, because the FOR KEY SHARE and FOR NO KEY UPDATE modes are compatible with each other):



 | => BEGIN; | => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      





 => SELECT * FROM accounts_v LIMIT 2;
      
      



  ctid | xmax | lock_only | is_multi | keys_upd | keyshr_lock | shr_lock -------+--------+-----------+----------+----------+-------------+---------- (0,1) | 61 | | t | | | (0,2) | 530494 | t | | | t | t (2 rows)
      
      





In the first line, we see that the usual number has been replaced by a multitransaction number - this is evidenced by the xmax_is_multi bit.



In order not to delve into the internals of multitransaction implementation, you can use another extension that allows you to see all the information about all types of row locks in a convenient way.



 => CREATE EXTENSION pgrowlocks; => SELECT * FROM pgrowlocks('accounts') \gx
      
      



 -[ RECORD 1 ]----------------------------- locked_row | (0,1) locker | 61 multi | t xids | {530494,530495} modes | {"Key Share","No Key Update"} pids | {5892,5928} -[ RECORD 2 ]----------------------------- locked_row | (0,2) locker | 530494 multi | f xids | {530494} modes | {"For Share"} pids | {5892}
      
      





 => COMMIT;
      
      





 | => ROLLBACK;
      
      





Freeze setting



Since separate numbers are allocated for multitransactions, which are written in the xmax field of row versions, because of the counter bit limit, they have the same xid wraparound problem as with a regular number.



Therefore, for multi-transaction numbers, it is also necessary to perform an analogue of freezing - replace the old numbers with new ones (or with a regular transaction number, if at the time of freezing the lock is held by only one transaction).



Note that the freezing of ordinary transaction numbers is performed only for the xmin field (since if the version of the line has a non-empty xmax field, then it is either an irrelevant version and it will be cleared, or the xmax transaction is canceled and its number does not interest us). But for multitransactions, we are talking about the xmax field of the current version of the line, which can remain relevant, but it is constantly blocked by different transactions in a shared mode.



For the freezing of multitransactions, parameters similar to the parameters of the usual freezing are responsible : vacuum_multixact_freeze_min_age , vacuum_multixact_freeze_table_age , autovacuum_multixact_freeze_max_age .



Who is the extreme?



Gradually approach the sweet. Let's see what the pattern of locks is when several transactions are going to update the same row.



Let's start by building a view over pg_locks. First, let's make the conclusion a little more compact, and secondly, we will restrict ourselves to interesting locks (in fact, we discard the locks of virtual transaction numbers, the index on the accounts table, pg_locks and the view itself - in general, everything that is irrelevant and only distracting).



 => CREATE VIEW locks_v AS SELECT pid, locktype, CASE locktype WHEN 'relation' THEN relation::regclass::text WHEN 'transactionid' THEN transactionid::text WHEN 'tuple' THEN relation::regclass::text||':'||tuple::text END AS lockid, mode, granted FROM pg_locks WHERE locktype in ('relation','transactionid','tuple') AND (locktype != 'relation' OR relation = 'accounts'::regclass);
      
      





Now start the first transaction and update the row.



 => BEGIN; => SELECT txid_current(), pg_backend_pid();
      
      



  txid_current | pg_backend_pid --------------+---------------- 530497 | 5892 (1 row)
      
      



 => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      



 UPDATE 1
      
      





What about locks?



 => SELECT * FROM locks_v WHERE pid = 5892;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+----------+------------------+--------- 5892 | relation | accounts | RowExclusiveLock | t 5892 | transactionid | 530497 | ExclusiveLock | t (2 rows)
      
      





The transaction holds the table and own number locks. So far, everything is expected.



We start the second transaction and try to update the same line.



 | => BEGIN; | => SELECT txid_current(), pg_backend_pid();
      
      



 | txid_current | pg_backend_pid | --------------+---------------- | 530498 | 5928 | (1 row)
      
      



 | => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      





What about second transaction locks?



 => SELECT * FROM locks_v WHERE pid = 5928;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+------------+------------------+--------- 5928 | relation | accounts | RowExclusiveLock | t 5928 | transactionid | 530498 | ExclusiveLock | t 5928 | transactionid | 530497 | ShareLock | f 5928 | tuple | accounts:1 | ExclusiveLock | t (4 rows)
      
      





And here it is more interesting. In addition to locking the table and own number, we see two more locks. The second transaction found that the row was locked first and “hung” waiting for its number (granted = f). But where and why did the line version lock (locktype = tuple) come from?



Do not confuse row version lock (tuple lock) and row lock (row lock). The first is the usual tuple type lock, which is visible in pg_locks. The second is a mark in the data page: xmax and information bits.



When a transaction is about to change a row, it performs the following sequence of actions:



  1. Captures an exclusive lock on a mutable version of a string (tuple).
  2. If xmax and information bits indicate that the row is locked, it asks for locking the xmax transaction number.
  3. Prescribes its xmax and necessary information bits.
  4. Releases row version lock.


When the row was updated by the first transaction, it also grabbed the lock of the row version (step 1), but immediately released it (step 4).



When the second transaction arrived, she captured the row version lock (item 1), but was forced to request a lock on the number of the first transaction (item 2) and hung on this.



What happens if a third similar transaction appears? She will try to capture the lock of the version of the string (item 1) and will hang already at this step. Check it out.



 || => BEGIN; || => SELECT txid_current(), pg_backend_pid();
      
      



 || txid_current | pg_backend_pid || --------------+---------------- || 530499 | 5964 || (1 row)
      
      



 || => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      





 => SELECT * FROM locks_v WHERE pid = 5964;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+------------+------------------+--------- 5964 | relation | accounts | RowExclusiveLock | t 5964 | tuple | accounts:1 | ExclusiveLock | f 5964 | transactionid | 530499 | ExclusiveLock | t (3 rows)
      
      





The fourth, fifth, etc. transactions that wish to update the same row will not be any different from transaction 3 — they will all “hang” on the same row version lock.



Add another transaction to the heap.



 ||| => BEGIN; ||| => SELECT txid_current(), pg_backend_pid();
      
      



 ||| txid_current | pg_backend_pid ||| --------------+---------------- ||| 530500 | 6000 ||| (1 row)
      
      



 ||| => UPDATE accounts SET amount = amount - 100.00 WHERE acc_no = 1;
      
      





 => SELECT * FROM locks_v WHERE pid = 6000;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+------------+------------------+--------- 6000 | relation | accounts | RowExclusiveLock | t 6000 | transactionid | 530500 | ExclusiveLock | t 6000 | tuple | accounts:1 | ExclusiveLock | f (3 rows)
      
      





A general picture of current expectations can be seen in the pg_stat_activity view, adding information about blocking processes:



 => SELECT pid, wait_event_type, wait_event, pg_blocking_pids(pid) FROM pg_stat_activity WHERE backend_type = 'client backend';
      
      



  pid | wait_event_type | wait_event | pg_blocking_pids ------+-----------------+---------------+------------------ 5892 | | | {} 5928 | Lock | transactionid | {5892} 5964 | Lock | tuple | {5928} 6000 | Lock | tuple | {5928,5964} (4 rows)
      
      





It turns out a kind of “queue”, in which there is the first (the one who holds the lock version of the string) and all the others that line up behind the first.



Why do we need such a sophisticated design? Suppose we would not have a version lock for the string. Then the second and third (and so on) transactions would wait for the blocking of the number of the first transaction. At the moment of completion of the first transaction, the blocked resource disappears ( and what are you doing here, eh? The transaction has ended ) and now it all depends on which of the waiting processes will be first woken up by the operating system and, accordingly, will have time to lock the line. All other processes will also be awakened, but they will have to queue up again - now after another process.



This is fraught with the fact that some of the transactions can wait indefinitely for its turn if, due to an unfortunate combination of circumstances, it will always “go round” other transactions. In English, this situation is called lock starvation.



In our case, it turns out about the same, but still a little better: the transaction that came in the second is guaranteed that it will get access to the next resource. But what happens to the following (third and forth)?



If the first transaction ends with a rollback, everything will be fine: the incoming transactions will go in the order they were lined up.



But - this is bad luck - if the first transaction completes with a commit, then not only the transaction number disappears, but also the version of the line! That is, the version, of course, remains, but ceases to be relevant, and it will be necessary to update a completely different, latest version (of the same line). The resource behind the queue disappears, and everyone arranges a race for the possession of a new resource.



Let the first transaction complete with commit.



 => COMMIT;
      
      





The second transaction will be woken up and execute paragraphs. 3 and 4.



 | UPDATE 1
      
      





 => SELECT * FROM locks_v WHERE pid = 5928;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+----------+------------------+--------- 5928 | relation | accounts | RowExclusiveLock | t 5928 | transactionid | 530498 | ExclusiveLock | t (2 rows)
      
      





What about the third transaction? She skips step 1 (because the resource has disappeared) and gets stuck on step 2:



 => SELECT * FROM locks_v WHERE pid = 5964;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+----------+------------------+--------- 5964 | relation | accounts | RowExclusiveLock | t 5964 | transactionid | 530498 | ShareLock | f 5964 | transactionid | 530499 | ExclusiveLock | t (3 rows)
      
      





And the same thing happens with the fourth transaction:



 => SELECT * FROM locks_v WHERE pid = 6000;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+----------+------------------+--------- 6000 | relation | accounts | RowExclusiveLock | t 6000 | transactionid | 530498 | ShareLock | f 6000 | transactionid | 530500 | ExclusiveLock | t (3 rows)
      
      





That is, both the third and fourth transactions are awaiting completion of the second. The line turned into a pumpkin crowd.



We complete all started transactions.



 | => COMMIT;
      
      





 || UPDATE 1
      
      



 || => COMMIT;
      
      





 ||| UPDATE 1
      
      



 ||| => COMMIT;
      
      





More details about blocking strings can be found in README.tuplock .



You were not here



So, the idea of ​​a two-level blocking scheme is to reduce the likelihood of an eternal wait for a "bad luck" transaction. Nevertheless, as we have already seen, such a situation is quite possible. And if the application uses shared locks, everything can become even sadder.



Let the first transaction lock the row in shared mode.



 => BEGIN; => SELECT txid_current(), pg_backend_pid();
      
      



  txid_current | pg_backend_pid --------------+---------------- 530501 | 5892 (1 row)
      
      



 => SELECT * FROM accounts WHERE acc_no = 1 FOR SHARE;
      
      



  acc_no | amount --------+-------- 1 | 100.00 (1 row)
      
      





The second transaction tries to update the same row, but cannot - the SHARE and NO KEY UPDATE modes are incompatible.



 | => BEGIN; | => SELECT txid_current(), pg_backend_pid();
      
      



 | txid_current | pg_backend_pid | --------------+---------------- | 530502 | 5928 | (1 row)
      
      



 | => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      





The second transaction waits for the completion of the first and holds the row version lock - for now, everything is like last time.



 => SELECT * FROM locks_v WHERE pid = 5928;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+-------------+------------------+--------- 5928 | relation | accounts | RowExclusiveLock | t 5928 | tuple | accounts:10 | ExclusiveLock | t 5928 | transactionid | 530501 | ShareLock | f 5928 | transactionid | 530502 | ExclusiveLock | t (4 rows)
      
      





And then a third transaction appears that wants a shared lock. The trouble is that it does not try to capture the lock on the version of the line (because it is not going to change the line), but simply crawls out of turn - it is compatible with the first transaction.



 || BEGIN || => SELECT txid_current(), pg_backend_pid();
      
      



 || txid_current | pg_backend_pid || --------------+---------------- || 530503 | 5964 || (1 row)
      
      



 || => SELECT * FROM accounts WHERE acc_no = 1 FOR SHARE;
      
      



 || acc_no | amount || --------+-------- || 1 | 100.00 || (1 row)
      
      





And now two transactions block the row:



 => SELECT * FROM pgrowlocks('accounts') \gx
      
      



 -[ RECORD 1 ]--------------- locked_row | (0,10) locker | 62 multi | t xids | {530501,530503} modes | {Share,Share} pids | {5892,5964}
      
      





What happens now when the first transaction is completed? The second transaction will be woken up, but will see that the row lock has not disappeared anywhere, and will again stand in the “queue” - this time for the third transaction:



 => COMMIT; => SELECT * FROM locks_v WHERE pid = 5928;
      
      



  pid | locktype | lockid | mode | granted ------+---------------+-------------+------------------+--------- 5928 | relation | accounts | RowExclusiveLock | t 5928 | tuple | accounts:10 | ExclusiveLock | t 5928 | transactionid | 530503 | ShareLock | f 5928 | transactionid | 530502 | ExclusiveLock | t (4 rows)
      
      





And only when the third transaction is completed (and if no other shared locks appear during this time), the second will be able to perform the update.



 || => COMMIT;
      
      





 | UPDATE 1
      
      



 | => ROLLBACK;
      
      





Perhaps it's time to draw some practical conclusions.







Asked not to borrow



Typically, SQL commands expect to release the resources they need. But sometimes you want to refuse to execute the command if the lock could not be obtained immediately. To do this, commands such as SELECT, LOCK, ALTER, allow you to use the phrase NOWAIT.



For example:



 => BEGIN; => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 1;
      
      





 | => SELECT * FROM accounts FOR UPDATE NOWAIT;
      
      



 | ERROR: could not obtain lock on row in relation "accounts"
      
      





The command immediately fails if the resource is busy. In the application code, such an error can be intercepted and processed.



You cannot specify the NOWAIT phrase for the UPDATE and DELETE commands, but you can first execute SELECT FOR UPDATE NOWAIT, and then, if possible, update or delete the line.



There is another option not to wait - use the SELECT FOR command with the phrase SKIP LOCKED. Such a command will skip locked lines, but process free ones.



 | => BEGIN; | => DECLARE c CURSOR FOR | SELECT * FROM accounts ORDER BY acc_no FOR UPDATE SKIP LOCKED; | => FETCH c;
      
      



 | acc_no | amount | --------+-------- | 2 | 200.00 | (1 row)
      
      





In this example, the first - blocked - line was skipped and we immediately received (and blocked) the second.



In practice, this allows you to organize multi-threaded processing of queues. You should not come up with another application for this command - if you want to use it, then most likely you will lose sight of some simpler solution.



 => ROLLBACK;
      
      



 | => ROLLBACK;
      
      





To be continued.



All Articles