WL#3523: Repeatable-Read Isolation in Falcon is implemented as Consistent Read

Affects: Falcon-6.0 — Status: Code-Review — Priority: Medium

Note: Much of the following was written and/or described by Ann Harrison in a 
series of emails. The rest is by Kevin Lewis.

Falcon implementation of Repeatable Read Isolation

The Repeatable Read transaction isolation that is selected through the SQL 
syntax is implemented in Falcon as Consistent Read.  

The SQL standard defines "Repeatable Read" in a way that is not repeatable.  
The standard's definitions of transaction modes are based on the artifacts of 
locking systems.  Repeatable read is exactly read/write record locking without 
range locking, so new records can appear in repeated queries.  

"Consistent read" is what you get with Mult-Version Concurrency Control (MVCC) 
which is the way Falcon handles concurrency.  Locks are not needed to isolate 
transactions from one another.  The isolation can be done in either a 
Consistent Read or Read Committed fashion with minimal extra code.  Using the 
Falcon engine, when you select 'Repeatable Read', the only changes you see are 
those you make.

Each record that undergoes changes in an MVCC database engine has a chain of 
prior records versions from newest to oldest.  The transactions that wrap these 
changes are identifiable by their age.  The chain never branches because only 
one transaction can change a record at a time.  There are three tasks in the 
engine that need to use this prior record Chain;
 
	Reading the correct record version.
	Deleting (or Scavenging) old record versions.
	Checking to see if a duplicate key value exists.

We will discuss those three tasks referencing the following hypothetical record 
chain.  Each record version is identified by the transaction ID that made it;

	T34->T19->T18->T17->T10->T5->T3

	Reading the correct record version.

Assume that T34 is still active/pending and the rest were previously 
committed.  T34 must have used the record version committed by T19 to make it's 
changes to this record.  If transaction T20 is a Consistent Read transaction 
and needs to read this record, it will need to use a record version that was 
visible when it started.  Assuming T19 was committed before T20 started it will 
use T19.  But if T19 was still active when T20 started, T20 will have to use 
the record version committed by T18. If T19 was a Consistent Read Transaction, 
T18 must have committed before T19 started.  If T19 is Read Committed, then it 
is possible for both T19 and T18 to be running when T20 started, which means 
T20 will need to read an even earlier version.  In the same way, T15 will need 
to use T10, T5, or T3.  T8 will need to use T5 or T3. 

	Deleting (or Scavenging) old record versions.

Periodically a background thread in Falcon will come along and clean up the 
record cache by deleting old record versions that are no longer needed.  The 
trick is to determine which records may still be needed.  If the oldest active 
transaction is T15, then you need at least record T10 in this chain.  But it is 
more complicated than that.

Suppose that T10 was active when T15 started.  In that case, T15 would not see 
the change made by T10 to this record.  The scavenger thread needs to know not 
to delete record versions made by T10 even if T15 is currently the oldest 
active transaction.  So it is important for the scavenger thread to know what 
the oldest active transaction was when the currently oldest active transaction 
started.  

Falcon does this in the constructor of each Consistent Read transaction.  It 
makes a list of each of the active transactions at the time each new Consistent 
Read transaction starts.  In this way, it knows which committed changes are 
visible and which are not.  Also, using this list, the Scavenger can determine 
the oldest transaction that was active when the oldest active transaction 
started, thereby keeping all record versions that may be needed by that oldest 
active transaction, and any other active transaction, and deleting all the rest 
of the old record versions.

	Checking to see if a duplicate key value exists.

Transactions cannot insert or update a record to create two occurrences of the 
same key value.  If this is attempted. an update conflict error will occur on 
the insert or update.  There are three types of record versions which can cause 
a duplicate conflict.  

The first is a pending record with a duplicate value in a unique index.  When 
this conflict occurs, Falcon will cause the transaction to wait on the 
completion of the conflicting transaction.  This is appropriate since you don't 
know if the other transaction will commit or rollback.  It is not appropriate 
to block on a unique key value that gets rolled back, so that it appears to the 
user that the conflict was a mistake.

The second kind of conflict is with the currently committed record.  Although 
Read Committed transactions can always read this record, even if it was 
committed after the transaction started, Consistent Read transactions may not 
be able to read this record.  But a duplicate entry cannot be allowed because 
that record is committed, whether it is visible or not.

The third type of update conflict is when a Consistent Read transaction 
attempts to create a key value that is the same as an older, but still visible 
record version.  Even though a newer key value has been committed by a 
concurrent transaction, the Consistent Read transaction does not see it.  
Instead, it sees the  old value.  If this transaction tried to make another 
record with this value, it would 'appear' as if there are duplicates in a 
unique index.  So this also, is not allowed in the Falcon storage engine.

Let's take the previous record version stack as an example of these three kinds 
of duplicate conflicts.

Record Versions	T34->	T19 ->	T18 ->	T17 ->	T10 ->	T5 ->	T3
Key values     	6  	5	4	3	2	1	0

As before, let's assume T34 is still active, T15 started while T10 was active 
and T15 is still active.  Now T15 inserts a new record.  What unique key values 
are not permissible?  First, value 6 is still pending, so if it would be 
committed at some point in the future, it would become the only allowed key 
value 6.  In this case T15 would wait for T34 to complete and then determine if 
there is a conflict.  Second, value 5 is the currently committed key value, so 
it is not allowed.   Third, a select of this record by T15 would see value 1, 
previously committed by T5.  Thus, a value of 1 would also cause an update 
conflict.  Of course, if T15 were a read committed transaction, it would only 
see value block on values 6 (with a wait state) and 5 (with an update conflict 
error).

	Changes to the Falcon Engine

BUG#22211 had to do with the wait state that occured when a conflict was found 
with a pending record version.  In this case, the pending record version is on 
a savepoint that is rolled back. But the transaction is committed.  So when the 
waiting thread wakes up, it immediately returned the Update Conflict error, 
since the transaction committed.  The fix is to note if a wait state occurred, 
and if so, retry the duplicate search from the current head of the record 
chain, just in case the duplicate found was not actually committed.

As part of the fix for BUG#22211, it was noticed that Consistent Read 
transactions were not blocking on duplicate values within visible but old 
records.  This gave the impression of having duplicate values within a unique 
index while the transaction was active.  The change involved being able to skip 
previously committed record versions that need not be checked.  

A third aspect of these improvements to Falcon's Consistent Read behavior is 
that the scavenger thread was using the oldest active transaction instead of 
the oldest transaction that was active when the oldest active transaction was 
started.  This is a subtle difference, and it would not be noticed unless a 
Consistent Read transaction lasted a very long time on a very busy server.


You must be logged in to tag this worklog

No Comments yet

Votes

Not yet rated.
You must be logged in to vote.

Watches

0 members are watching this worklog
You must be logged in to track this worklog.

Provide Feedback

Please note:
HTML will be purified, but we allow for a number of HTML tags so that you have the flexibility to decorate your comment text to some extent. The comments allow the following HTML tags:

strong, b, em, blockquote, a, code, pre

To put code into your comment, simply encapsulate your code with
[code language="XXX"][/code], where XXX is any common language, for instance "PHP", "SQL", "C", etc.



You must be logged in to comment