# Commit order as a way to define a database snapshot
## Traditional way to define a database snapshot using a copy of the TIP

The state of every transaction in the database is recorded on the **Transaction Inventory Pages (TIP)**.
This allows any active transaction to know the state of another transaction that created a record
version the active transaction is going to read or change. If the other transaction is committed, the
active transaction is allowed to see that record version.

A **snapshot (concurrency)** transaction takes a *copy of the TIP* at its start and keeps it privately
until commit (or rollback). This private copy of the TIP allows the transaction to see every record in
the database as it was at the moment the transaction started, i.e. it defines a database *snapshot*.

A **read-committed** transaction does not use a stable snapshot view of the database and keeps no TIP
copy of its own. Instead, it asks the TIP for the *most current* state of another transaction.
SuperServer has a shared TIP cache to optimize TIP access for read-committed transactions.
## Another way to define a database snapshot

The main idea: to know the state of any transaction at the moment a snapshot is created, it is
enough to know the *order of commits*.

### Define **commit order** for the database
- a per-database counter, **Commit Number (CN)**, is introduced
- it is initialized when the database is started
- when any transaction is committed, the database Commit Number is incremented, and the new value is
  associated with this transaction and can be queried later. Let us call it the
  **"transaction commit number"**, or **transaction CN**.
- there are special CN values for active, dead and in-limbo transactions.
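
The counter described above can be modelled in a few lines. This is a minimal, hypothetical sketch: the class and method names are invented for illustration, and starting the counter at CN_PREHISTORIC is an assumption, not something the engine is documented to do.

```python
import threading

CN_PREHISTORIC = 1  # value listed under "Possible values of transaction Commit Number"


class CommitOrder:
    """Hypothetical model of the per-database Commit Number counter."""

    def __init__(self):
        self._lock = threading.Lock()
        # Assumption: the counter starts at CN_PREHISTORIC when the database is started.
        self._global_cn = CN_PREHISTORIC
        self._tra_cn = {}  # transaction number -> transaction CN

    def commit(self, tra_num):
        """Increment the global CN and associate the new value with the committing transaction."""
        with self._lock:
            self._global_cn += 1
            self._tra_cn[tra_num] = self._global_cn
            return self._global_cn

    def transaction_cn(self, tra_num):
        """Return the transaction CN, or None if this model has recorded no CN for it."""
        return self._tra_cn.get(tra_num)

    def current(self):
        """Current value of the global Commit Number."""
        with self._lock:
            return self._global_cn


order = CommitOrder()
print(order.commit(10))  # 2: transaction 10 commits first
print(order.commit(7))   # 3: transaction 7 commits second, despite its lower number
```

The point of the example is that commit order is independent of transaction numbers: the CN records *when* a transaction committed relative to all others.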
### Possible values of transaction Commit Number
- Transaction is active
  - CN_ACTIVE = 0
- Transaction was committed before the database was started (i.e. it is older than OIT)
  - CN_PREHISTORIC = 1
- Transaction was committed while the database is running
  - CN_PREHISTORIC < CN < CN_DEAD
- Transaction is dead
  - CN_DEAD = MAX_TRA_NUM - 2
- Transaction is in limbo
  - CN_LIMBO = MAX_TRA_NUM - 1
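
The table of values above can be read as a decoding function from a CN to a transaction state. The sketch below assumes a placeholder value for MAX_TRA_NUM (48-bit transaction numbers); the real constant is defined by the engine, and the function name is hypothetical.

```python
# Assumption: placeholder for MAX_TRA_NUM; the engine defines the real value.
MAX_TRA_NUM = (1 << 48) - 1

CN_ACTIVE = 0
CN_PREHISTORIC = 1
CN_DEAD = MAX_TRA_NUM - 2
CN_LIMBO = MAX_TRA_NUM - 1


def transaction_state(cn):
    """Map a transaction Commit Number to the transaction state it encodes."""
    if cn == CN_ACTIVE:
        return "active"
    if cn == CN_DEAD:
        return "dead"
    if cn == CN_LIMBO:
        return "limbo"
    if CN_PREHISTORIC <= cn < CN_DEAD:
        # CN_PREHISTORIC itself means "committed before the database was started"
        return "committed"
    raise ValueError(f"invalid Commit Number: {cn}")


print(transaction_state(0), transaction_state(42), transaction_state(CN_DEAD))
```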

A **database snapshot** is defined by the value of the global Commit Number at the moment the
snapshot is created. To create a database snapshot, it is enough to read (and keep) the current value
of the global Commit Number.
### The record version visibility rule
- let the **database snapshot** be the snapshot used by the current transaction,
- let the **other transaction** be the transaction that created a given record version,
- if the other transaction is **active, dead or in limbo**:
  the record version is **not visible** to the current transaction
- if the other transaction is **committed**, consider *when it was committed*:
  - **before** the database snapshot was created:
    the record version is **visible** to the current transaction
  - **after** the database snapshot was created:
    the record version is **not visible** to the current transaction

Therefore, to decide whether a given record version is visible in the scope of a database snapshot, it
is enough to compare the CN of the other transaction with the CN of the snapshot. This also requires
maintaining a list of all known transactions with their associated Commit Numbers.
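
The rule reduces to a single comparison. In this hedged sketch, a committing transaction takes the newly incremented global CN, so "committed before the snapshot" becomes `creator_cn <= snapshot_cn`; that `<=` boundary follows from this counter model and is an assumption about the engine's exact comparison. The function name and the MAX_TRA_NUM value are placeholders.

```python
MAX_TRA_NUM = (1 << 48) - 1  # assumption: placeholder value

CN_ACTIVE = 0
CN_PREHISTORIC = 1
CN_DEAD = MAX_TRA_NUM - 2
CN_LIMBO = MAX_TRA_NUM - 1


def is_visible(creator_cn, snapshot_cn):
    """Apply the record version visibility rule.

    creator_cn  -- Commit Number of the transaction that created the record version
    snapshot_cn -- global Commit Number captured when the database snapshot was created
    """
    if creator_cn in (CN_ACTIVE, CN_DEAD, CN_LIMBO):
        return False  # active, dead or in limbo: never visible
    # A transaction that committed before the snapshot got a CN <= the captured value;
    # one that committed later got a larger CN.
    return creator_cn <= snapshot_cn


print(is_visible(5, 5), is_visible(6, 5), is_visible(CN_ACTIVE, 5))
```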
### Implementation details
The list of all known transactions with their associated Commit Numbers is maintained in shared
memory. It is implemented as an array where the index is the transaction number and the item value is
the corresponding Commit Number. The whole array is split into blocks of a fixed size. The array
contains CNs for all transactions between the OIT and Next markers, so a new block is allocated when
Next moves beyond the highest block, and a lower block is released once OIT moves past it.

The block size can be set in firebird.conf using the new setting **TipCacheBlockSize**. The default
value is 4MB, which holds 512K transactions.
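
The block layout can be sketched as a map from block number to a fixed-size list. Assumptions, labelled as such: each CN occupies 8 bytes (consistent with 4MB holding 512K transactions), transactions below OIT are reported as CN_PREHISTORIC, and the class and method names are hypothetical. This is not the engine's shared-memory structure.

```python
CN_ACTIVE = 0
CN_PREHISTORIC = 1

DEFAULT_BLOCK_BYTES = 4 * 1024 * 1024  # default TipCacheBlockSize (4MB)
CN_BYTES = 8  # assumption: one 64-bit Commit Number per transaction
DEFAULT_TRANS_PER_BLOCK = DEFAULT_BLOCK_BYTES // CN_BYTES  # 524288, i.e. 512K


class TipCacheSketch:
    """Hypothetical model of the block-split array: transaction number -> Commit Number."""

    def __init__(self, trans_per_block=DEFAULT_TRANS_PER_BLOCK):
        self.per_block = trans_per_block
        self.blocks = {}  # block number -> list of CNs for that range of transactions
        self.oit = 0

    def _locate(self, tra_num):
        return divmod(tra_num, self.per_block)  # (block number, offset inside the block)

    def start_transaction(self, tra_num):
        """Next advanced to tra_num: allocate its block if needed and mark it active."""
        block_no, offset = self._locate(tra_num)
        block = self.blocks.setdefault(block_no, [CN_ACTIVE] * self.per_block)
        block[offset] = CN_ACTIVE

    def set_cn(self, tra_num, cn):
        block_no, offset = self._locate(tra_num)
        self.blocks[block_no][offset] = cn

    def get_cn(self, tra_num):
        if tra_num < self.oit:
            return CN_PREHISTORIC  # assumption: anything older than OIT counts as prehistoric
        block_no, offset = self._locate(tra_num)
        return self.blocks[block_no][offset]

    def advance_oit(self, new_oit):
        """Release every block that lies entirely below the new OIT."""
        self.oit = new_oit
        for block_no in [b for b in self.blocks if (b + 1) * self.per_block <= new_oit]:
            del self.blocks[block_no]


cache = TipCacheSketch(trans_per_block=4)  # tiny blocks so the example stays readable
for tra in range(10):
    cache.start_transaction(tra)
cache.set_cn(9, 5)
cache.advance_oit(8)
print(sorted(cache.blocks))  # [2]: blocks 0 and 1 (transactions 0..7) were released
print(cache.get_cn(9), cache.get_cn(3))  # 5 1
```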

**CONCURRENCY** transactions now use the **database snapshot** described above. Instead of taking a
private copy of the TIP at start, such a transaction just keeps the value of the global Commit Number
at that moment.
# Statement level read consistency for read-committed transactions
## Inconsistent read problem
The current implementation of read-committed transactions suffers from one important problem: a single
statement (such as SELECT) could see different views of the same data during its execution.

For example, imagine two concurrent transactions:
- the first inserts 1000 rows and commits,
- the second runs SELECT COUNT(*) against the same table.

If the second transaction is read-committed, its result is hard to predict. It could be any of:
- the number of rows in the table before the first transaction started, or
- the number of rows in the table after the first transaction committed, or
- any number between the two numbers above.

Which case takes place depends on how the two transactions interact with each other:
- the second transaction finished counting before the first transaction committed
  - the second transaction sees no records inserted by the first transaction
    (as no new records were committed yet)
- the second transaction reached the new records only after the first transaction committed
  - the second transaction sees all records inserted (and committed) by the first transaction
- any other case
  - the second transaction could see some, but not all, records inserted (and committed) by the
    first transaction

This is the problem of inconsistent reads at the *statement level*.

It is important to speak about the *statement level* because, by definition, each *statement* in a
*read-committed* transaction is allowed to see its own view of the database. The problem with the
current implementation is that this view is not stable and can change while the statement is executing.
*Snapshot* transactions do not have this problem, as they use the same stable database snapshot for
all executed statements. Different statements within a read-committed transaction can, of course, see
different views of the database.
## Solution for the inconsistent read problem
The obvious solution is to make a read-committed transaction use a stable database snapshot while a
statement is executed. Each new top-level statement creates its own database snapshot to see recently
committed data. With snapshots based on commit order, this is a very cheap operation. Let us call this
snapshot the **statement-level snapshot** from here on. Nested statements (triggers, nested stored
procedures and functions, dynamic statements, etc.) use the same statement-level snapshot created by the
top-level statement.
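
The difference between the two behaviors can be simulated with a toy COUNT(*). In this hypothetical model (all names invented), each row is identified only by the transaction that created it; the legacy mode checks the *current* state of each creator, while the consistent mode compares against one CN captured at statement start. A concurrent commit is injected halfway through the scan.

```python
CN_PREHISTORIC = 1


class ToyDatabase:
    """Hypothetical stand-in: global CN plus the CN of each committed transaction."""

    def __init__(self):
        self.global_cn = CN_PREHISTORIC
        self.tra_cn = {}  # committed transaction number -> CN; absent means still active

    def commit(self, tra_num):
        self.global_cn += 1
        self.tra_cn[tra_num] = self.global_cn


def count_rows(db, row_creators, legacy, on_row=None):
    """Simulate SELECT COUNT(*) over rows, each identified by its creator transaction.

    legacy=True  -- old read-committed behavior: ask for the *current* state of each creator
    legacy=False -- READ CONSISTENCY: compare against one statement-level snapshot
    on_row       -- callback invoked before each row is read, used to inject concurrent commits
    """
    snapshot_cn = db.global_cn  # statement-level snapshot, taken once at statement start
    count = 0
    for i, creator in enumerate(row_creators):
        if on_row is not None:
            on_row(i)
        cn = db.tra_cn.get(creator)
        if cn is None:
            continue  # creator still active: not visible in either mode
        if legacy or cn <= snapshot_cn:
            count += 1
    return count


def scenario(legacy):
    db = ToyDatabase()
    db.commit(1)              # transaction 1 committed 4 rows long ago
    rows = [1] * 4 + [2] * 6  # transaction 2 inserted 6 rows, not yet committed
    # Transaction 2 commits while the scan is positioned at row 7.
    return count_rows(db, rows, legacy, on_row=lambda i: db.commit(2) if i == 7 else None)


print(scenario(legacy=True))   # 7: neither "before" (4) nor "after" (10)
print(scenario(legacy=False))  # 4: the statement sees one stable view
```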

To support this solution, a new transaction isolation level is introduced: **READ COMMITTED READ
CONSISTENCY**.

The old read-committed isolation modes (**RECORD VERSION** and **NO RECORD VERSION**) are still
allowed and work as before (i.e. they do not use statement-level snapshots), but they could be
considered legacy in future versions of Firebird.

So, there are now three kinds of read-committed transactions:
- READ COMMITTED READ CONSISTENCY
- READ COMMITTED NO RECORD VERSION
- READ COMMITTED RECORD VERSION
### Update conflicts handling
When a statement is executed within a READ COMMITTED READ CONSISTENCY transaction, its database view
does not change (similar to a snapshot transaction). Therefore, it is useless to wait for a concurrent
transaction to commit in the hope of re-reading the newly committed record version. On read, the
behavior is similar to a READ COMMITTED *RECORD VERSION* transaction: the engine does not wait for the
active transaction and walks the back-version chain looking for the record version visible to the
current snapshot.
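
The read path can be sketched as a walk from the newest version back. This is a hypothetical model: versions are plain objects ordered newest first, a missing CN entry means the creator is still active, and the names `RecordVersion` and `read_visible` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CN_ACTIVE = 0


@dataclass
class RecordVersion:
    creator: int  # number of the transaction that created this version
    data: str


def read_visible(chain: List[RecordVersion], tra_cn: Dict[int, int],
                 snapshot_cn: int) -> Optional[str]:
    """Return the data of the newest version visible to snapshot_cn, or None.

    chain is ordered newest first; tra_cn maps transaction number -> transaction CN
    (a missing entry is treated as CN_ACTIVE).
    """
    for version in chain:
        cn = tra_cn.get(version.creator, CN_ACTIVE)
        # No waiting: an active creator, or one that committed after the snapshot
        # (dead and limbo CNs are also larger than any snapshot), just means "step back".
        if cn != CN_ACTIVE and cn <= snapshot_cn:
            return version.data
    return None  # no version of this record exists for this snapshot


chain = [RecordVersion(30, "v3"), RecordVersion(20, "v2"), RecordVersion(10, "v1")]
tra_cn = {10: 2, 20: 4}  # transaction 30 is still active
print(read_visible(chain, tra_cn, snapshot_cn=3))  # v1
print(read_visible(chain, tra_cn, snapshot_cn=5))  # v2
```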

For READ COMMITTED *READ CONSISTENCY* mode, the handling of update conflicts by the engine is changed
significantly. When an update conflict is detected, the following is performed:

a) the transaction isolation mode is temporarily switched to READ COMMITTED *NO RECORD VERSION* mode
b) the engine puts a write lock on the conflicting record
c) the engine continues to evaluate the remaining records of the update/delete cursor and puts write
   locks on them too
d) when there are no more records to fetch, the engine starts to undo all actions performed since the
   top-level statement execution started, preserving the already taken write locks for every
   updated/deleted/locked record; all inserted records are removed
e) then the engine restores the transaction isolation mode to READ COMMITTED *READ CONSISTENCY*,
   creates a new statement-level snapshot and restarts execution of the top-level statement.

This algorithm ensures that, after the restart, the already updated records remain locked, are
visible to the new snapshot, and can be updated again with no further conflicts. Also, because of the
read consistency mode, the set of modified records remains consistent.
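
The control flow of steps (a)–(e) can be sketched as a retry loop. Everything here is hypothetical: `TxState`, `UpdateConflict` and the statement interface (`run`, `lock_remaining`, `undo`) are invented for illustration, the sketch only tracks write locks taken by the restart algorithm, and treating "10 attempts" as "the first execution plus up to 10 restarts" is an assumption.

```python
MAX_RESTARTS = 10  # the engine gives up after 10 attempts (exact counting is assumed here)


class UpdateConflict(Exception):
    def __init__(self, record_id):
        super().__init__(f"update conflict on record {record_id}")
        self.record_id = record_id


class TxState:
    """Hypothetical per-transaction state used by this sketch."""

    def __init__(self, current_cn):
        self.current_cn = current_cn  # callable returning the global Commit Number
        self.mode = "READ CONSISTENCY"
        self.snapshot_cn = None
        self.restart_locks = set()  # write locks taken by the restart algorithm


def execute_top_level(stmt, tx):
    """Run a top-level statement, restarting it on update conflict (steps a-e)."""
    restarts = 0
    tx.snapshot_cn = tx.current_cn()  # statement-level snapshot
    while True:
        try:
            return stmt.run(tx)
        except UpdateConflict as conflict:
            if restarts >= MAX_RESTARTS:
                tx.restart_locks.clear()  # release the write locks taken while restarting
                tx.mode = "READ CONSISTENCY"
                raise  # report the update conflict
            restarts += 1
            tx.mode = "NO RECORD VERSION"             # (a)
            tx.restart_locks.add(conflict.record_id)  # (b)
            # (c) errors raised here are not caught: they stop the restart algorithm
            tx.restart_locks.update(stmt.lock_remaining(tx))
            stmt.undo(tx)                             # (d) write locks are preserved
            tx.mode = "READ CONSISTENCY"              # (e)
            tx.snapshot_cn = tx.current_cn()          # new statement-level snapshot


class FakeStatement:
    """Test double: conflicts on its first `conflicts` runs, then succeeds."""

    def __init__(self, conflicts):
        self.conflicts = conflicts
        self.runs = 0

    def run(self, tx):
        self.runs += 1
        if self.runs <= self.conflicts:
            raise UpdateConflict(record_id=self.runs)
        return "done"

    def lock_remaining(self, tx):
        return [100]  # pretend record 100 was fetched and locked in step (c)

    def undo(self, tx):
        pass  # nothing to undo in this toy


tx = TxState(current_cn=lambda: 42)
stmt = FakeStatement(conflicts=2)
print(execute_top_level(stmt, tx), stmt.runs, sorted(tx.restart_locks))  # done 3 [1, 2, 100]
```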

Notes:
- the restart algorithm above is applied to UPDATE, DELETE, SELECT WITH LOCK and MERGE statements,
  with and without a RETURNING clause, executed directly by the user application or as part of some
  PSQL object (stored procedure/function, trigger, EXECUTE BLOCK, etc.)
- if an UPDATE/DELETE statement is positioned on an explicit cursor (WHERE CURRENT OF), the engine
  skips step (c) above, i.e. it does not fetch, and does not put write locks on, the remaining records
  of the cursor
- if the top-level statement is selectable and the update conflict happens after one or more records
  were returned to the application, the update conflict error is reported as usual and no restart is
  initiated
- restart is not initiated for statements in autonomous blocks (IN AUTONOMOUS TRANSACTION DO ...)
- after 10 attempts, the engine aborts the restart algorithm, releases all write locks, restores the
  transaction isolation mode to READ COMMITTED *READ CONSISTENCY* and reports the update conflict
- any unhandled error at step (c) above stops the restart algorithm, and the engine continues
  processing in the usual way; for example, the error could be caught and handled by a PSQL WHEN
  block, or reported to the application if not handled
- UPDATE/DELETE triggers will fire multiple times for the same record if statement execution was
  restarted and the record is updated/deleted again
- statement restart is usually fully transparent to applications, and developers need not take any
  special action to handle it. The only exception is code with side effects outside of transactional
  control, which could be executed more than once if an update conflict happens, such as:
  - usage of external tables, sequences or context variables;
  - sending e-mails using a UDF;
  - committed autonomous transactions or external queries, and so on.
- there are no special tools to detect a restart, but it can easily be done using code with side
  effects as described above, for example, using a context variable
- for historical reasons, isc_update_conflict is reported as a secondary error code, with the primary
  error code isc_deadlock.
### Read-committed read only transactions
READ COMMITTED *READ ONLY* transactions are marked as committed immediately when the transaction
starts. Such transactions also do not inhibit regular garbage collection and do not delay the advance
of the OST marker. READ CONSISTENCY READ ONLY transactions are still marked as committed on start but,
to prevent regular garbage collection from breaking future statement-level snapshots, they delay the
movement of the OST marker in the same way as SNAPSHOT transactions do. Note that this delays only
*regular* (traditional) garbage collection; *intermediate* GC (see below) is not affected.
### Support for the new READ COMMITTED READ CONSISTENCY isolation level
#### SQL syntax
The new isolation level is supported at the SQL level:

*SET TRANSACTION READ COMMITTED READ CONSISTENCY*
#### API level
To start a read-committed read consistency transaction using the ISC API, use the new constant in the
Transaction Parameter Buffer (TPB):

*isc_tpb_read_consistency*
#### Configuration setting
It is recommended to use READ COMMITTED READ CONSISTENCY mode whenever read-committed isolation is
feasible. To help test existing applications with the new READ COMMITTED READ CONSISTENCY isolation
level, a new configuration setting is introduced:

*ReadConsistency*

If ReadConsistency is set to 1 (the default), the engine ignores the [NO] RECORD VERSION flags and
makes all read-committed transactions READ COMMITTED READ CONSISTENCY.

If ReadConsistency is set to 0, the [NO] RECORD VERSION flags take effect as in previous Firebird
versions. The READ COMMITTED READ CONSISTENCY isolation level must then be specified explicitly by
the application, in the TPB or using SQL syntax.
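
The effect of the setting can be summarized as a small mapping. This is a hypothetical sketch: the function name and the string labels are invented, and it models only the rules stated above, not how the engine parses the TPB.

```python
def effective_isolation(isolation, rc_flavor=None, read_consistency=1):
    """Return (isolation, read-committed flavor) per the ReadConsistency rules above.

    isolation        -- e.g. "SNAPSHOT" or "READ COMMITTED", as requested by the application
    rc_flavor        -- "READ CONSISTENCY", "RECORD VERSION" or "NO RECORD VERSION"
    read_consistency -- value of the ReadConsistency setting (1 by default)
    """
    if isolation != "READ COMMITTED":
        return (isolation, None)  # the setting only affects read-committed transactions
    if read_consistency == 1:
        # the [NO] RECORD VERSION flags are ignored
        return ("READ COMMITTED", "READ CONSISTENCY")
    return ("READ COMMITTED", rc_flavor)  # flags take effect as in previous versions


print(effective_isolation("READ COMMITTED", "RECORD VERSION"))
print(effective_isolation("READ COMMITTED", "RECORD VERSION", read_consistency=0))
```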

The setting is per-database.
# Garbage collection of intermediate record versions
Let us see how garbage collection should be done with commit-order-based database snapshots.
From the *record version visibility rule*, the following can be derived:
- if a snapshot with some CN can see a record version, then all snapshots with numbers greater than
  that CN can also see the same record version;
- if *all existing snapshots* can see some record version, then all its back versions can be removed,
  or
- if the *oldest active snapshot* can see some record version, then all its back versions can be
  removed.

The last statement is an exact copy of the well-known rule for garbage collection!

This rule allows removing record versions at the *tail of the version chain*, starting from some
"mature" record version. The rule allows finding that "mature" record version and cutting the whole
tail after it.
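
Tail cutting can be sketched over a chain represented only by the creator CNs of its committed versions, newest first. The function name and this list representation are assumptions for illustration; visibility uses the `creator_cn <= snapshot_cn` comparison of the commit-counter model.

```python
def cut_tail(chain_cns, oldest_active_snapshot):
    """Keep versions up to and including the newest one the oldest active snapshot can see.

    chain_cns -- Commit Numbers of the creators of committed versions, newest first
    """
    for i, cn in enumerate(chain_cns):
        if cn <= oldest_active_snapshot:  # the "mature" version
            return chain_cns[: i + 1]     # everything after it is garbage
    return list(chain_cns)  # no mature version: nothing can be cut


print(cut_tail([9, 7, 4, 2], oldest_active_snapshot=5))  # [9, 7, 4]
```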

Commit-order-based database snapshots also allow removing some record versions placed at
*intermediate positions* in the version chain. To do this, mark every record version in the chain with
the value of the *oldest active snapshot* that can see *that* record version. If several consecutive
versions in a chain get the same mark, then all of them after the first one can be removed: every
active snapshot that can see one of the later versions can also see the first one, and reads it
instead. This keeps version chains short.

To make this work, the engine maintains a list of all active database snapshots. This list is kept in
shared memory. The initial size of the shared memory block can be set in firebird.conf using the new
setting **SnapshotsMemSize**. The default value is 64KB; it grows automatically when necessary.

When the engine needs to find the "*oldest active snapshot* that can see a *given* record version", it
just searches for the CN of the transaction that created that record version in the sorted array of
active snapshots.
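
The marking and the sorted-array search can be sketched with Python's `bisect` module. Assumptions: the chain contains only committed versions, represented by their creator CNs, newest first; a version that no active snapshot can see gets the mark `None` (only the newest such version matters, for snapshots created later); the function names are hypothetical.

```python
from bisect import bisect_left


def oldest_snapshot_seeing(creator_cn, active_snapshots):
    """Smallest active snapshot CN that can see a version created with creator_cn.

    active_snapshots must be sorted ascending. Returns None if no active snapshot sees it.
    """
    i = bisect_left(active_snapshots, creator_cn)  # first snapshot >= creator_cn
    return active_snapshots[i] if i < len(active_snapshots) else None


def prune_intermediate(chain_cns, active_snapshots):
    """Drop every version whose mark equals the mark of the (newer) version before it."""
    kept = []
    prev_mark = object()  # sentinel that never equals a real mark
    for cn in chain_cns:  # newest first, committed versions only
        mark = oldest_snapshot_seeing(cn, active_snapshots)
        if mark != prev_mark:
            kept.append(cn)
        prev_mark = mark
    return kept


active = [5, 8]
print(prune_intermediate([10, 9, 7, 6, 4, 2], active))  # [10, 7, 4]
```

In the example, snapshot 5 reads the version with CN 4, snapshot 8 reads CN 7, and new snapshots read CN 10, so nothing any reader needs is removed. Tail cutting falls out as a special case: every version at or below the oldest active snapshot shares that snapshot's mark.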

Garbage collection of intermediate record versions is performed by:
- sweep
- the background garbage collector in SuperServer
- every user attachment after it updates or deletes a record
- the table scan during index creation

The traditional way of garbage collection (regular GC) is not changed and still works the same way as
in previous Firebird versions.