This week we had a bug report for one of our products regarding some strange deadlocks in our database access. For those of you who don't know what a deadlock is, I'll try to summarize what a transaction is in a relational database environment, why those transactions might lead to these nasty errors, and what was causing this particular deadlock in our SQL Server 2005 engine.
One of the most interesting features of any good database engine is what we call concurrency. Concurrency means that the database engine should be able to perform many data operations at the same time. This leads to some interesting problems that have a critical impact on another feature that any database engine must ensure: data consistency.
Sometimes you need to perform more than one non-atomic operation in a database, and those operations only make sense as a block. That is, they should be executed from start to end without interference from other operations, because that interference could have an undesired impact on your results. Databases simulate the atomic execution of more than one operation through transactions. When you wrap a set of operations in a transaction, you are telling the database engine that you want them executed as if they ran in isolation, that is, with no interference from other operations.
While this might seem easy to accomplish, it can also have a big negative performance impact: you can't just execute all the transactions that arrive at the same time sequentially, at least not unless you want your database engine to be slow, very slow.
This is where locks and isolation levels come in. Locks are protections placed on resources to allow or forbid further access to them by other processes. An exclusive lock taken by one process, for example, means that no other process can access that resource until the first process has released the lock. There are many different lock types in SQL Server, which you can review at http://msdn2.microsoft.com/en-us/library/ms175519.aspx.
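If you want to see those locks in action, you can query them while your transactions run. This is a minimal sketch using the sys.dm_tran_locks dynamic management view that SQL Server 2005 introduced:

```sql
-- Sketch: inspect the locks currently held or requested by sessions.
-- sys.dm_tran_locks is the SQL Server 2005 DMV that supersedes sp_lock.
SELECT request_session_id,
       resource_type,         -- e.g. KEY, PAGE, OBJECT
       request_mode,          -- e.g. S (shared), U (update), X (exclusive)
       request_status         -- GRANT or WAIT
FROM sys.dm_tran_locks
ORDER BY request_session_id;
```

Running this from a third connection while two transactions are fighting over a row makes it very easy to see who holds what and who is waiting.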
One of the problems we can face when working with transactions is the well-known non-repeatable read. Suppose a process A starts a transaction and reads a row in a table with a select clause. After that, another transaction comes in and modifies that data. When the first transaction reads the same data again, it gets a different value, because it was modified by the second transaction. This is inconsistent: since we are executing a set of operations inside a transaction, we expect them to behave as if no one else were modifying the data we need to use.
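The anomaly is easy to reproduce in two query windows. A minimal sketch, assuming a hypothetical Accounts table, run under SQL Server's default READ COMMITTED level:

```sql
-- Hypothetical table for illustration:
-- CREATE TABLE Accounts (Id INT PRIMARY KEY, Balance MONEY);

-- Session A (default READ COMMITTED isolation):
BEGIN TRANSACTION;
SELECT Balance FROM Accounts WHERE Id = 1;   -- first read returns, say, 100

-- Session B, meanwhile, in its own connection:
-- UPDATE Accounts SET Balance = 200 WHERE Id = 1;   -- commits right away

-- Session A, still inside the same transaction:
SELECT Balance FROM Accounts WHERE Id = 1;   -- now returns 200: non-repeatable read
COMMIT TRANSACTION;
```

Under READ COMMITTED the shared lock taken by the first SELECT is released as soon as the read finishes, which is exactly what lets session B slip its update in between the two reads.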
Because of that, there's a special isolation level called, precisely, Repeatable Read, whose locking policy avoids those problems. But be careful, because used the wrong way it can cause a nasty deadlock in your SQL Server.
A deadlock is a special situation where two or more transactions are waiting for each other. Imagine transaction A starts some operations and locks some resources. Later, transaction B comes in, locks another set of resources, and then tries to access resources locked by A, so it waits for A to free them. After that, A tries to lock resources held by B and waits for B to free them. A is waiting for B to free its resources, and B is waiting for A to do the very same thing. We have a deadlock. One of the transactions must be killed by the database engine and rolled back so the other one can continue. Although you may be able to handle this in code and relaunch the killed transaction, it is usually something you don't want to happen.
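If you do want to relaunch the killed transaction, the victim receives error 1205, which you can catch and retry. A minimal sketch in T-SQL, using the TRY/CATCH blocks that SQL Server 2005 introduced (the operations inside are placeholders):

```sql
-- Sketch: retry a transaction that may be chosen as a deadlock victim.
DECLARE @retries INT;
SET @retries = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... the operations that may deadlock go here ...
        COMMIT TRANSACTION;
        SET @retries = 0;                 -- success: stop looping
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205          -- we were the deadlock victim
            SET @retries = @retries - 1;  -- wait our turn and try again
        ELSE
            SET @retries = 0;             -- a different error: give up
    END CATCH
END
```

In real code you would also log the failure and probably pause briefly before retrying, but the shape of the loop is the important part.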
I'm going to talk now about a deadlock you may face when using the Repeatable Read transaction isolation level with SQL Server 2005. Imagine you have two transactions that do the same thing: read a value, modify it, and read it again. Note that you will be modifying the value, yet you set the isolation level to Repeatable Read (that is, choosing the wrong isolation level by mistake or by ignorance). Transaction A starts and reads the value. SQL Server puts a shared lock on that resource. Transaction B starts and reads the value. Because A's lock was shared, B can also read that value, gaining a shared lock too. Now transaction B tries to write the value. Because A's shared lock is read-only, B has to wait for A to release it before it can gain write access, so B blocks. After that, A tries to write the value, and because transaction B also holds a shared lock on the resource, A has to wait too. And there it is, the deadlock.
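The whole scenario fits in one batch. A sketch, assuming a hypothetical Counters table; run this same script simultaneously from two connections and one of them will be killed as a deadlock victim:

```sql
-- Both sessions run this same batch under REPEATABLE READ.
-- Assumed table: CREATE TABLE Counters (Id INT PRIMARY KEY, Value INT);
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

SELECT Value FROM Counters WHERE Id = 1;   -- A and B each take a shared (S) lock,
                                           -- and Repeatable Read keeps it until commit

UPDATE Counters SET Value = Value + 1
WHERE Id = 1;                              -- each now needs an exclusive (X) lock,
                                           -- but the other's S lock blocks it:
                                           -- A waits for B, B waits for A: deadlock

COMMIT TRANSACTION;
```

The crucial detail is that Repeatable Read holds the shared locks until the end of the transaction, so neither session ever releases the S lock the other one is waiting on.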
The key to avoiding this is noticing that, although we perform a read-write-read sequence, we don't need the second read to match the first, because we are modifying that resource ourselves. In this case, the lock our transactions should take is an update lock, which signals that we'll be modifying the data after reading it. This way, when transaction A locks the resource, the lock will not be shared, so transaction B will have to wait until A has written the new value before it gets the lock, thus avoiding the deadlock. Another option is to use an isolation level that releases the lock right after the value is read instead of holding it until the update.
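In SQL Server you can request that update lock explicitly with the UPDLOCK table hint on the first read. The same sketch as before, fixed:

```sql
-- Same pattern, but the first read takes an update (U) lock via UPDLOCK.
-- Only one transaction at a time can hold a U lock on a row, so the second
-- session simply waits at the SELECT instead of deadlocking at the UPDATE.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

SELECT Value FROM Counters WITH (UPDLOCK) WHERE Id = 1;

UPDATE Counters SET Value = Value + 1
WHERE Id = 1;                              -- the U lock converts to an X lock here

COMMIT TRANSACTION;
```

Run from two connections, the second SELECT blocks until the first transaction commits, and both transactions finish; slower than the broken version when it didn't deadlock, but correct every time.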
The conclusion is: be careful about which isolation level you use when working with transactions, and make sure it's the right one if you don't want bug reports when a lot of concurrency starts to stress the database engine!