A transaction is a way for an application to group several reads and writes together into a logical unit. It is a valuable database feature, no matter which data model is used.
Conceptually, all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback). If it fails, the application can safely retry.
Transactions are not a law of nature; they were created with a purpose, namely to simplify the programming model for applications accessing a database. By using transactions, the application is free to ignore certain potential error scenarios and concurrency issues, because the database takes care of them instead (we call these safety guarantees).
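As a concrete illustration of the commit/abort/retry cycle, here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the transfer function are made up for this example.

```python
import sqlite3

def transfer(conn, from_id, to_id, amount, max_retries=3):
    """Group two writes into one transaction; if it aborts, retry it safely."""
    for _ in range(max_retries):
        try:
            with conn:  # begins a transaction; commits on success, rolls back on exception
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                             (amount, from_id))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                             (amount, to_id))
            return  # committed: both updates are visible, or neither is
        except sqlite3.OperationalError:
            continue  # aborted (e.g. the database was locked); safe to retry
    raise RuntimeError("transfer failed after retries")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 500)])
conn.commit()
transfer(conn, 1, 2, 100)
```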
Here, we talk about transactions mostly in the context of a database running on a single machine. Transactions in distributed databases open a new set of difficult challenges, which we’ll discuss later.
Databases have long tried to hide concurrency issues from application developers by providing transaction isolation.
Serializable isolation means that the database guarantees that transactions have the same effect as if they ran serially (i.e., one at a time, without any concurrency). Serializable isolation has a performance cost, and many databases don’t want to pay that price. It’s therefore common for systems to use weak (nonserializable) isolation.
We will discuss several widely used isolation levels, in particular read committed, snapshot isolation (sometimes called repeatable read), and serializable.
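How an application asks for a particular isolation level varies by database. As a hedged sketch, assuming a PostgreSQL-style server and the psycopg2 driver (the connection string and the accounts table are purely illustrative):

```python
import psycopg2  # assumes a PostgreSQL server and the psycopg2 driver are available

conn = psycopg2.connect("dbname=example")  # hypothetical connection string
with conn:  # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # Request a stronger level than the common default of read committed.
        cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
        row = cur.fetchone()
```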
Isolation levels are characterized by the set of race conditions they prevent. Some race conditions are handled by the database, whereas others must be handled by the application accessing the database.
Race conditions:
- dirty read: a transaction reads data that another transaction has written but not yet committed
- dirty write: a transaction overwrites data that another transaction has written but not yet committed
- read skew: a transaction issues two queries and sees inconsistent data, because between the first and second query other transactions inserted, updated, or deleted data and committed (the read skew case is sketched after this list).
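A minimal in-memory sketch of the read skew case (not a real database; the two "transactions" are simply interleaved by hand):

```python
balances = {"account_1": 500, "account_2": 500}  # true total is always 1000

first = balances["account_1"]   # observer's first query sees 500

# Meanwhile, another transaction transfers 100 from account_2 to account_1 and commits.
balances["account_2"] -= 100
balances["account_1"] += 100

second = balances["account_2"]  # observer's second query sees 400
print(first + second)           # 900: a total that never actually existed
```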
Weak isolation levels protect against some of these anomalies but leave you, the application developer, to handle others manually (e.g., by using explicit locking).
Read committed makes two guarantees:
- When reading from the database, you will only see data that has been committed (no dirty reads).
- When writing to the database, you will only overwrite data that has been committed (no dirty writes). That is, a dirty write cannot happen (see the sketch after this list).
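A toy in-memory sketch of one way these guarantees can be enforced: writers take a per-key lock (preventing dirty writes) and readers are always served the last committed value (preventing dirty reads). The class and its simplification of raising instead of blocking are mine, purely for illustration.

```python
import threading

class ReadCommittedStore:
    def __init__(self):
        self.committed = {}     # key -> last committed value
        self.pending = {}       # txid -> {key: value} written but not yet committed
        self.write_locks = {}   # key -> txid holding the write lock
        self.mutex = threading.Lock()

    def read(self, key):
        with self.mutex:
            return self.committed.get(key)  # never exposes uncommitted data

    def write(self, txid, key, value):
        with self.mutex:
            holder = self.write_locks.get(key)
            if holder is not None and holder != txid:
                # a real database would block here until the holder commits or aborts
                raise RuntimeError("write lock held by another transaction")
            self.write_locks[key] = txid
            self.pending.setdefault(txid, {})[key] = value

    def commit(self, txid):
        with self.mutex:
            for key, value in self.pending.pop(txid, {}).items():
                self.committed[key] = value
                self.write_locks.pop(key, None)
```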
Potential concurrency bug under read committed: read skew. For example, Alice has $1,000 split across two accounts of $500 each. While a transaction transfers $100 from one account to the other, she checks both balances: she sees one account before the incoming payment arrives ($500) and the other after the outgoing transfer has been made ($400), so her total appears to be only $900, a state that never actually existed.
Snapshot isolation (sometimes called repeatable read) is the most common solution to the read skew problem described above.
The idea is that each transaction reads from a consistent snapshot of the database—that is, the transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the old data from that particular point in time.
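A minimal sketch of the multi-version idea commonly used to implement snapshot isolation: every write creates a new version tagged with the writing transaction, and each transaction only sees versions committed before it started. The structure and names are illustrative (for brevity a transaction here does not even see its own writes).

```python
import itertools

class SnapshotStore:
    def __init__(self):
        self.versions = {}                 # key -> [(txid, value), ...] in commit order
        self.committed = set()             # txids whose writes are committed
        self._txids = itertools.count(1)

    def begin(self):
        txid = next(self._txids)
        snapshot = set(self.committed)     # what was committed when we started
        return txid, snapshot

    def read(self, snapshot, key):
        for txid, value in reversed(self.versions.get(key, [])):
            if txid in snapshot:           # ignore anything committed after our snapshot
                return value
        return None

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def commit(self, txid):
        self.committed.add(txid)

store = SnapshotStore()
store.write(0, "x", "old"); store.commit(0)     # initial committed data
t1, snap1 = store.begin()                       # t1 takes its snapshot now
t2, snap2 = store.begin()
store.write(t2, "x", "new"); store.commit(t2)   # t2 changes x and commits
print(store.read(snap1, "x"))                   # t1 still sees "old"
```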
Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency. In other words, the database prevents all possible race conditions.
The following are three example techniques for implementing serializable isolation: actual serial execution, two-phase locking, and serializable snapshot isolation.
The simplest way of avoiding concurrency problems is to remove the concurrency entirely: to execute only one transaction at a time, in serial order, on a single thread.
A system designed for single-threaded execution can sometimes perform better than a system that supports concurrency, because it can avoid the coordination overhead of locking. However, its throughput is limited to that of a single CPU core.
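A toy single-threaded executor (not modeled on any particular database's design): clients submit whole transactions as functions, and a single worker runs them strictly one at a time, so no locking is needed inside a transaction.

```python
from queue import Queue
from threading import Thread

data = {"x": 0, "y": 0}
requests = Queue()

def worker():
    while True:
        txn, done = requests.get()
        txn(data)            # each transaction runs to completion before the next starts
        done.put(True)

Thread(target=worker, daemon=True).start()

def submit(txn):
    done = Queue(maxsize=1)
    requests.put((txn, done))
    done.get()               # block until the transaction has finished

submit(lambda db: db.update(x=db["x"] + 1, y=db["y"] - 1))  # atomic w.r.t. other transactions
print(data)
```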
In two-phase locking, several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write (modify or delete) an object, exclusive access is required (a toy reader/writer lock in this style is sketched after this list):
- If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue. (This ensures that B can’t change the object unexpectedly behind A’s back.)
- If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue. (Unlike under snapshot isolation, reading an old version of the object is not acceptable here.)
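A toy shared/exclusive lock in this spirit (a real two-phase locking implementation also holds every lock until the transaction commits or aborts, and must detect deadlocks; this sketch only shows the blocking rules above):

```python
import threading

class SharedExclusiveLock:
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0          # transactions holding the lock in shared mode
        self.writer = False       # whether a transaction holds it exclusively

    def acquire_shared(self):     # taken before reading an object
        with self.cond:
            while self.writer:
                self.cond.wait()  # a writer holds it; reading now would be a dirty read
            self.readers += 1

    def acquire_exclusive(self):  # taken before writing an object
        with self.cond:
            while self.writer or self.readers > 0:
                self.cond.wait()  # wait until readers and writers commit or abort
            self.writer = True

    def release(self, shared):    # called when the owning transaction commits or aborts
        with self.cond:
            if shared:
                self.readers -= 1
            else:
                self.writer = False
            self.cond.notify_all()
```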
Serializable snapshot isolation (SSI) provides full serializability, but has only a small performance penalty compared to snapshot isolation.
It is an optimistic concurrency control technique. Optimistic in this context means that instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that everything will turn out all right. When a transaction wants to commit, the database checks whether anything bad happened (i.e., whether isolation was violated); if so, the transaction is aborted and has to be retried. Only transactions that executed serializably are allowed to commit.
As the name suggests, SSI is based on snapshot isolation—that is, all reads within a transaction are made from a consistent snapshot of the database. On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.
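A much-simplified sketch of the optimistic check, in the spirit of SSI but not the full algorithm (real implementations track read-write dependencies between concurrent transactions more precisely): each transaction reads from a snapshot, records what it read, and at commit time is aborted if anything it read has since been overwritten by a committed transaction.

```python
class OptimisticStore:
    def __init__(self):
        self.data = {}           # key -> value
        self.last_commit = {}    # key -> sequence number of the commit that last wrote it
        self.commit_seq = 0

    def begin(self):
        return {"snapshot": dict(self.data), "start": self.commit_seq,
                "reads": set(), "writes": {}}

    def read(self, txn, key):
        txn["reads"].add(key)
        return txn["snapshot"].get(key)   # consistent snapshot, as in snapshot isolation

    def write(self, txn, key, value):
        txn["writes"][key] = value        # buffered until commit

    def commit(self, txn):
        # Validation: if anything we read was overwritten after our snapshot, abort.
        for key in txn["reads"]:
            if self.last_commit.get(key, 0) > txn["start"]:
                raise RuntimeError("serialization conflict: abort and retry")
        self.commit_seq += 1
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.last_commit[key] = self.commit_seq
```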
Source: Designing Data-Intensive Applications by Martin Kleppmann