Locks in PostgreSQL: 3. Locks on other objects

We have already talked about some object-level locks (in particular, relation locks), as well as row-level locks, their interaction with object locks, and the lock wait queue, which is not always fair.



Today we have a hodgepodge. We start with deadlocks (actually, I was going to discuss them last time, but that article turned out indecently long as it was), then go over the remaining object locks, and finish with predicate locks.



Deadlocks



When using locks, a deadlock situation is possible. It occurs when one transaction tries to acquire a resource already acquired by another transaction, while that second transaction tries to acquire a resource held by the first. This is illustrated in the left part of the figure below: solid arrows show acquired resources, dashed arrows show attempts to acquire an already occupied resource.



It is convenient to visualize a deadlock by building a wait-for graph. To do this, we remove the specific resources and keep only the transactions, marking which transaction waits for which. If the graph contains a cycle (that is, a vertex is reachable from itself by following the arrows), it is a deadlock.







Of course, a deadlock is possible not only between two transactions, but between any larger number as well.



If a deadlock occurs, the transactions involved can do nothing about it themselves: they would wait indefinitely. Therefore, all DBMSs, including PostgreSQL, track deadlocks automatically.



However, the check requires a certain effort, which we would rather not expend every time a new lock is requested (deadlocks are quite rare, after all). So when a process tries to acquire a lock and cannot, it joins the queue and goes to sleep, but starts a timer set to the value of the deadlock_timeout parameter (1 second by default). If the resource is freed earlier, fine: we saved on the check. But if the wait continues past deadlock_timeout, the waiting process is woken up and initiates the check.



If the check (which consists of building the wait-for graph and searching it for cycles) does not find a deadlock, the process goes back to sleep, this time for good.



Earlier in the comments I was rightly reproached for not mentioning the lock_timeout parameter, which applies to any statement and avoids an indefinitely long wait: if a lock cannot be obtained within the specified time, the statement fails with a lock_not_available error. It should not be confused with statement_timeout, which limits the total execution time of a statement, whether it is waiting for a lock or simply doing its work.
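For example, a minimal sketch of the difference between the two parameters (the session holding a conflicting lock and the timing values are assumed):

```sql
-- Session 1 (assumed) holds a conflicting lock:
--   BEGIN; LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE;

-- Session 2: give up after 2 seconds of waiting for the lock
SET lock_timeout = '2s';
SELECT * FROM accounts;
-- ERROR:  canceling statement due to lock timeout

-- statement_timeout, by contrast, limits the whole statement,
-- lock waits and actual execution alike:
SET lock_timeout = 0;
SET statement_timeout = '2s';
SELECT pg_sleep(10);
-- ERROR:  canceling statement due to statement timeout
```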



If a deadlock is detected, one of the transactions (in most cases the one that initiated the check) is forcibly aborted. The locks it held are released, and the remaining transactions can continue working.



Deadlocks usually mean that the application is designed incorrectly. There are two ways to detect such situations: first, messages appear in the server log, and second, the value of pg_stat_database.deadlocks increases.
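A sketch of checking the counter for the current database (the column exists in pg_stat_database; the count you see depends on your server's history):

```sql
SELECT datname, deadlocks
FROM pg_stat_database
WHERE datname = current_database();
```

The pg_blocking_pids function (available since PostgreSQL 9.6) is also handy for seeing which processes block a given process while it is still waiting.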



Deadlock example



A common cause of deadlocks is the different order in which rows in tables are locked.

A simple example. The first transaction intends to transfer 100 rubles from the first account to the second. To do this, it first decreases the first account:



=> BEGIN;
=> UPDATE accounts SET amount = amount - 100.00 WHERE acc_no = 1;



 UPDATE 1
      
      





Meanwhile, the second transaction intends to transfer 10 rubles from the second account to the first. It starts by decreasing the second account:



 | => BEGIN;
 | => UPDATE accounts SET amount = amount - 10.00 WHERE acc_no = 2;



 | UPDATE 1
      
      





Now the first transaction tries to increase the second account, but discovers that the row is locked.



 => UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 2;
      
      





Then the second transaction tries to increase the first account, but is also blocked.



 | => UPDATE accounts SET amount = amount + 10.00 WHERE acc_no = 1;
      
      





A circular wait has arisen that will never resolve on its own. After a second, the first transaction, still unable to get the resource, initiates a deadlock check and is aborted by the server.



 ERROR:  deadlock detected
 DETAIL:  Process 16477 waits for ShareLock on transaction 530695; blocked by process 16513.
 Process 16513 waits for ShareLock on transaction 530694; blocked by process 16477.
 HINT:  See server log for query details.
 CONTEXT:  while updating tuple (0,2) in relation "accounts"





Now the second transaction can continue.



 | UPDATE 1
      
      



 | => ROLLBACK;
      
      





 => ROLLBACK;
      
      





The correct way to perform such operations is to lock resources in a consistent order. For example, in this case the accounts can be locked in ascending order of their numbers.
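A possible sketch of this approach for the transfer above (FOR NO KEY UPDATE matches the row lock mode a plain UPDATE takes; the fixed ordering is the point):

```sql
BEGIN;
-- Lock both rows in a fixed (ascending acc_no) order first,
-- regardless of the direction of the transfer:
SELECT amount FROM accounts
WHERE acc_no IN (1, 2)
ORDER BY acc_no
FOR NO KEY UPDATE;

UPDATE accounts SET amount = amount - 100.00 WHERE acc_no = 1;
UPDATE accounts SET amount = amount + 100.00 WHERE acc_no = 2;
COMMIT;
```

With both transactions following this protocol, whichever starts second simply waits on the first row instead of forming a cycle.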



Deadlock for two UPDATE commands



Sometimes you can get a deadlock where it would seem there should not be one. For example, it is convenient and customary to think of SQL commands as atomic, but take UPDATE: this command locks rows as it updates them, not all at once. So if one command updates rows in one order and another command in a different order, they can deadlock.



Such a situation is unlikely, but it can happen. To reproduce it, we will create an index on the amount column, built in descending order:



 => CREATE INDEX ON accounts(amount DESC);
      
      





To have time to see what is happening, we will write a function that increments the passed value, but slowly, taking a whole second:



 => CREATE FUNCTION inc_slow(n numeric) RETURNS numeric AS $$
      SELECT pg_sleep(1);
      SELECT n + 100.00;
    $$ LANGUAGE SQL;





We also need the pgrowlocks extension.



 => CREATE EXTENSION pgrowlocks;
      
      





The first UPDATE command will update the entire table. The execution plan is obvious: a sequential scan.



 | => EXPLAIN (costs off)
 | UPDATE accounts SET amount = inc_slow(amount);



 | QUERY PLAN
 | ----------------------------
 |  Update on accounts
 |    ->  Seq Scan on accounts
 | (2 rows)





Since the row versions on the page of our table are laid out in increasing order of amount (exactly as we added them), they will be updated in that same order. We start the update:



 | => UPDATE accounts SET amount = inc_slow(amount);
      
      





Meanwhile, in another session we forbid sequential scans:



 || => SET enable_seqscan = off;
      
      





In this case, the planner decides to use an index scan for the following UPDATE statement:



 || => EXPLAIN (costs off)
 || UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100.00;



 || QUERY PLAN
 || --------------------------------------------------------
 ||  Update on accounts
 ||    ->  Index Scan using accounts_amount_idx on accounts
 ||          Index Cond: (amount > 100.00)
 || (3 rows)





The second and third rows fall under the condition, and, since the index is built in descending order, the rows will be updated in the reverse order.



We launch the next update.



 || => UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100.00;
      
      





A quick glance at the table page shows that the first statement has already managed to update the first row (0,1), and the second statement the last one (0,3):



 => SELECT * FROM pgrowlocks('accounts') \gx
      
      



 -[ RECORD 1 ]-----------------
 locked_row | (0,1)
 locker     | 530699  <-
 multi      | f
 xids       | {530699}
 modes      | {"No Key Update"}
 pids       | {16513}
 -[ RECORD 2 ]-----------------
 locked_row | (0,3)
 locker     | 530700  <-
 multi      | f
 xids       | {530700}
 modes      | {"No Key Update"}
 pids       | {16549}





Another second passes. The first statement has updated the second row; the second statement would like to do the same, but cannot.



 => SELECT * FROM pgrowlocks('accounts') \gx
      
      



 -[ RECORD 1 ]-----------------
 locked_row | (0,1)
 locker     | 530699  <-
 multi      | f
 xids       | {530699}
 modes      | {"No Key Update"}
 pids       | {16513}
 -[ RECORD 2 ]-----------------
 locked_row | (0,2)
 locker     | 530699  <-
 multi      | f
 xids       | {530699}
 modes      | {"No Key Update"}
 pids       | {16513}
 -[ RECORD 3 ]-----------------
 locked_row | (0,3)
 locker     | 530700  <-
 multi      | f
 xids       | {530700}
 modes      | {"No Key Update"}
 pids       | {16549}





Now the first statement would like to update the last row of the table, but it is already locked by the second. Here is the deadlock.



One of the transactions is aborted:



 || ERROR: deadlock detected || DETAIL: Process 16549 waits for ShareLock on transaction 530699; blocked by process 16513. || Process 16513 waits for ShareLock on transaction 530700; blocked by process 16549. || HINT: See server log for query details. || CONTEXT: while updating tuple (0,2) in relation "accounts"
      
      





And the other completes the execution:



 | UPDATE 3
      
      





Interesting details about detecting and preventing deadlocks can be found in the lock manager's README.



That's all about deadlocks, and we proceed to the remaining object locks.







Locks on objects other than relations



When a resource that is not a relation in the PostgreSQL sense needs to be locked, object locks are used. Such a resource can be almost anything: tablespaces, subscriptions, schemas, roles, enumerated data types... Roughly speaking, anything that can be found in the system catalog.



Let's look at a simple example. We start the transaction and create a table in it:



 => BEGIN;
 => CREATE TABLE example(n integer);





Now let's see what locks of type object appeared in pg_locks:



 => SELECT database,
      (SELECT datname FROM pg_database WHERE oid = l.database) AS dbname,
      classid,
      (SELECT relname FROM pg_class WHERE oid = l.classid) AS classname,
      objid, mode, granted
    FROM pg_locks l
    WHERE l.locktype = 'object' AND l.pid = pg_backend_pid();



  database | dbname | classid |  classname   | objid |      mode       | granted
 ----------+--------+---------+--------------+-------+-----------------+---------
         0 |        |    1260 | pg_authid    | 16384 | AccessShareLock | t
     16386 | test   |    2615 | pg_namespace |  2200 | AccessShareLock | t
 (2 rows)





To understand what exactly is locked here, you need to look at three fields: database, classid, and objid. Let's start with the first row.



database is the OID of the database to which the locked resource belongs. In our case this column is zero, which means we are dealing with a global object that does not belong to any particular database.



classid contains an OID from pg_class corresponding to the system catalog table that determines the resource type. In our case it is pg_authid, so the resource is a role (a user).



objid contains an OID from the system catalog table that classid pointed us to.



 => SELECT rolname FROM pg_authid WHERE oid = 16384;
      
      



  rolname
 ---------
  student
 (1 row)





Thus, the lock is taken on the student role, under which we are working.



Now let's deal with the second row. The database field is set, and it is the test database we are connected to.



classid points to the pg_namespace table, which contains schemas.



 => SELECT nspname FROM pg_namespace WHERE oid = 2200;
      
      



  nspname
 ---------
  public
 (1 row)





Thus, the public schema is blocked.



So we saw that when an object is created, the owner role and the schema in which the object is created are locked (in shared mode). Which is logical: otherwise someone could drop the role or the schema before the transaction completes.



 => ROLLBACK;
      
      





Relation extension lock



When the number of rows in a relation (that is, a table, an index, or a materialized view) grows, PostgreSQL can use free space in existing pages for inserts, but obviously at some point new pages have to be added. Physically they are added at the end of the corresponding file. This is referred to as extending the relation.



To prevent two processes from rushing to add pages at the same time, extension is protected by a special lock of type extend. The same lock is used during index vacuum, so that other processes cannot add pages while the scan is in progress.



Of course, this lock is released without waiting for the end of the transaction.



Previously, tables were extended only one page at a time. This caused problems when several processes inserted rows simultaneously, so starting with PostgreSQL 9.6 tables are extended by several pages at once (proportionally to the number of processes waiting for the lock, but no more than 512).



Page Lock



A page-level lock is used in only one case (apart from predicate locks, which are discussed below).



GIN indexes speed up searches in compound values, for example, words in text documents (or elements in arrays). To a first approximation, such an index can be thought of as a regular B-tree that stores not the documents themselves, but the individual words of those documents. So when a new document is added, the index has to be updated quite heavily, with an entry for every word in the document.



To improve performance, GIN indexes support delayed insertion, enabled by the fastupdate storage parameter. New words are first quickly added to an unordered pending list, and after a while everything accumulated there is moved into the main index structure. The savings come from the fact that different documents are likely to contain duplicate words.
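For illustration, fastupdate can be controlled per index; the docs table, its body column, and the index name below are hypothetical:

```sql
-- Delayed insertion is on by default; disable it for this index:
CREATE INDEX ON docs USING gin (to_tsvector('english', body))
  WITH (fastupdate = off);

-- Or tune how large the pending list may grow (in kB) before
-- it is flushed into the main index structure:
ALTER INDEX docs_to_tsvector_idx SET (gin_pending_list_limit = 512);
```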



To prevent several processes from moving the pending list into the main index at the same time, the index metapage is locked in exclusive mode for the duration of the move. This does not interfere with normal use of the index.



Advisory locks



Unlike other locks (such as relation locks), advisory locks are never taken automatically: they are managed by the application developer. They are convenient, for example, when an application needs locking logic for some purpose that does not fit the standard logic of ordinary locks.



Suppose we have some resource that does not correspond to any database object (which we could lock with commands like SELECT ... FOR UPDATE or LOCK TABLE). We need to come up with a numeric identifier for it. If the resource has a unique name, a simple option is to take its hash code:



 => SELECT hashtext('1');
      
      



  hashtext
 -----------
  243773337
 (1 row)





This is how we capture the lock:



 => BEGIN;
 => SELECT pg_advisory_lock(hashtext('1'));





As usual, lock information is available in pg_locks:



 => SELECT locktype, objid, mode, granted
    FROM pg_locks WHERE locktype = 'advisory' AND pid = pg_backend_pid();



  locktype |   objid   |     mode      | granted
 ----------+-----------+---------------+---------
  advisory | 243773337 | ExclusiveLock | t
 (1 row)





For the lock to actually work, other processes must also acquire it before accessing the resource. Observing this rule must, obviously, be ensured by the application.



In the example above, the lock is held until the end of the session, not the end of the transaction as usual.



 => COMMIT;
 => SELECT locktype, objid, mode, granted
    FROM pg_locks WHERE locktype = 'advisory' AND pid = pg_backend_pid();



  locktype |   objid   |     mode      | granted
 ----------+-----------+---------------+---------
  advisory | 243773337 | ExclusiveLock | t
 (1 row)





It needs to be explicitly released:



 => SELECT pg_advisory_unlock(hashtext('1'));
      
      





There is a large set of functions for working with advisory locks, for all occasions. pg_advisory_lock and pg_advisory_lock_shared acquire a session-level lock in exclusive or shared mode; pg_try_advisory_lock and pg_try_advisory_lock_shared do the same but return immediately instead of waiting; pg_advisory_unlock, pg_advisory_unlock_shared, and pg_advisory_unlock_all release session-level locks. The pg_advisory_xact_lock, pg_advisory_xact_lock_shared, pg_try_advisory_xact_lock, and pg_try_advisory_xact_lock_shared variants work at the transaction level and are released automatically when the transaction ends.

The try family of functions provides yet another way to avoid waiting for a lock, in addition to those listed in the previous article.
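A short sketch of the non-waiting and transaction-level variants (the resource name here is arbitrary):

```sql
-- Try to take the lock without waiting; returns true or false:
SELECT pg_try_advisory_lock(hashtext('resource1'));

-- Transaction-level variant: no explicit unlock needed,
-- the lock is released at COMMIT or ROLLBACK:
BEGIN;
SELECT pg_advisory_xact_lock(hashtext('resource1'));
COMMIT;
```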



Predicate locks



The term predicate locking appeared long ago, in the first attempts of early DBMSs to implement full isolation based on locks (the Serializable level, although the SQL standard did not yet exist at the time). The problem they ran into was that even locking all read and modified rows does not provide full isolation: new rows satisfying the same selection conditions can appear in the table, which leads to phantoms (see the article on isolation).



The idea of predicate locks is to lock predicates rather than rows. If, while executing a query with the condition a > 10, we lock the predicate a > 10, no new rows satisfying the condition can be added to the table, and phantoms are avoided. The problem is that in the general case this is a computationally hard task; in practice it can be solved only for predicates of a very simple form.



In PostgreSQL, the Serializable level is implemented differently, on top of the existing snapshot-based isolation. The term predicate lock remains, but its meaning has changed radically. In fact, such "locks" do not block anything; they are used to track data dependencies between transactions.



It has been proved that snapshot isolation admits the write skew anomaly and the read-only transaction anomaly, but no other anomalies are possible. To understand that we are dealing with one of these two anomalies, we can analyze the dependencies between transactions and look for certain patterns in them.



We are interested in dependencies of two kinds: WR dependencies, where the first transaction writes a row that the second transaction then reads, and RW dependencies (anti-dependencies), where the first transaction reads a row that the second transaction then changes.

WR dependencies can be tracked with the existing regular locks, but RW dependencies have to be tracked separately.



Let me repeat: despite the name, predicate locks do not block anything. Instead, when a transaction commits, a check is performed, and if a "bad" sequence of dependencies that may indicate an anomaly is found, the transaction is aborted.



Let's see how predicate locks are set. To do this, create a table with a sufficiently large number of rows and an index on it.



 => CREATE TABLE pred(n integer);
 => INSERT INTO pred(n) SELECT g.n FROM generate_series(1,10000) g(n);
 => CREATE INDEX ON pred(n) WITH (fillfactor = 10);
 => ANALYZE pred;





If the query is executed as a sequential scan of the entire table, the predicate lock is set on the whole table (even if not all rows satisfy the filter conditions).



 | => SELECT pg_backend_pid();
      
      



 | pg_backend_pid
 | ----------------
 |          12763
 | (1 row)





 | => BEGIN ISOLATION LEVEL SERIALIZABLE;
 | => EXPLAIN (analyze, costs off)
 | SELECT * FROM pred WHERE n > 100;



 | QUERY PLAN
 | ----------------------------------------------------------------
 |  Seq Scan on pred (actual time=0.047..12.709 rows=9900 loops=1)
 |    Filter: (n > 100)
 |    Rows Removed by Filter: 100
 |  Planning Time: 0.190 ms
 |  Execution Time: 15.244 ms
 | (5 rows)





All predicate locks are taken in one special mode, SIReadLock (Serializable Isolation Read):



 => SELECT locktype, relation::regclass, page, tuple
    FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;



  locktype | relation | page | tuple
 ----------+----------+------+-------
  relation | pred     |      |
 (1 row)





 | => ROLLBACK;
      
      





But if the query is executed with an index scan, the situation changes for the better. In the case of a B-tree, it is enough to lock the table rows that were read and the scanned leaf pages of the index: this locks not just the specific values, but the whole range that was read.



 | => BEGIN ISOLATION LEVEL SERIALIZABLE;
 | => EXPLAIN (analyze, costs off)
 | SELECT * FROM pred WHERE n BETWEEN 1000 AND 1001;



 | QUERY PLAN
 | ------------------------------------------------------------------------------------
 |  Index Only Scan using pred_n_idx on pred (actual time=0.122..0.131 rows=2 loops=1)
 |    Index Cond: ((n >= 1000) AND (n <= 1001))
 |    Heap Fetches: 2
 |  Planning Time: 0.096 ms
 |  Execution Time: 0.153 ms
 | (5 rows)





 => SELECT locktype, relation::regclass, page, tuple
    FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;



  locktype |  relation  | page | tuple
 ----------+------------+------+-------
  tuple    | pred       |    3 |   236
  tuple    | pred       |    3 |   235
  page     | pred_n_idx |   22 |
 (3 rows)





You may notice several difficulties.



First, a separate lock is created for each row version that is read, and potentially there can be very many of them. The total number of predicate locks in the system is limited by the product of the parameter values max_pred_locks_per_transaction × max_connections (the defaults are 64 and 100, respectively). Memory for these locks is allocated at server start; trying to exceed the total results in errors.



Therefore, for predicate locks (and only for them!) escalation is used. Prior to PostgreSQL 10 the limits were hardcoded; since that version escalation can be controlled with parameters. If the number of row version locks related to one page exceeds max_pred_locks_per_page, they are replaced with a single page-level lock. Here is an example:



 => SHOW max_pred_locks_per_page;
      
      



  max_pred_locks_per_page
 -------------------------
  2
 (1 row)





 | => EXPLAIN (analyze, costs off)
 | SELECT * FROM pred WHERE n BETWEEN 1000 AND 1002;



 | QUERY PLAN
 | ------------------------------------------------------------------------------------
 |  Index Only Scan using pred_n_idx on pred (actual time=0.019..0.039 rows=3 loops=1)
 |    Index Cond: ((n >= 1000) AND (n <= 1002))
 |    Heap Fetches: 3
 |  Planning Time: 0.069 ms
 |  Execution Time: 0.057 ms
 | (5 rows)





Instead of three tuple locks we now see one lock of type page:



 => SELECT locktype, relation::regclass, page, tuple
    FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;



  locktype |  relation  | page | tuple
 ----------+------------+------+-------
  page     | pred       |    3 |
  page     | pred_n_idx |   22 |
 (2 rows)





Similarly, if the number of page locks related to one relation exceeds max_pred_locks_per_relation, they are replaced with a single relation-level lock.



There are no other levels: predicate locks are taken only on relations, pages, and row versions, and always in SIReadLock mode.



Of course, lock escalation inevitably means that more transactions will end with a false serialization error, and system throughput will decrease as a result. Here you need to balance memory consumption against performance.



The second difficulty is that various operations on the index (for example, page splits caused by inserting new rows) can change the number of leaf pages covering the read range. But the implementation takes this into account:



 => INSERT INTO pred SELECT 1001 FROM generate_series(1,1000);
 => SELECT locktype, relation::regclass, page, tuple
    FROM pg_locks WHERE mode = 'SIReadLock' AND pid = 12763;



  locktype |  relation  | page | tuple
 ----------+------------+------+-------
  page     | pred       |    3 |
  page     | pred_n_idx |  211 |
  page     | pred_n_idx |  212 |
  page     | pred_n_idx |   22 |
 (4 rows)





 | => ROLLBACK;
      
      





By the way, predicate locks are not always released immediately when the transaction completes, because they are needed to track dependencies across several transactions. In any case, they are managed automatically.



Not all index types in PostgreSQL support predicate locks. Previously only B-trees could boast of this, but PostgreSQL 11 improved the situation: hash, GiST, and GIN indexes were added to the list. If index access is used and the index does not support predicate locks, the lock is taken on the entire index. This, of course, also increases the number of false transaction aborts.



In conclusion, I note that it is predicate locks that give rise to the restriction that, to guarantee full isolation, all transactions must run at the Serializable level. If a transaction uses a different level, it simply will not set (or check) predicate locks.



By tradition, I'll leave a link to the README on predicate locks, a good starting point for studying the source code.



To be continued .


