A while back I was involved in an email thread where people were wondering about some ‘weird’ behavior from SQL Server. The problem was occurring on SQL Server 2016 and used the default isolation level of read committed. The scenario was as follows:
- Create a table with a few columns
- Batch 1: In one SSMS window, do the following (which takes 10 seconds to run):
- Start a transaction
- Insert 1,000 rows into the table, with a 0.01 second WAITFOR DELAY between each insert
- Commit the transaction
- Batch 2: In a second SSMS window:
- Select * from the table
The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started, during the 10-seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. This behavior had also been reported on earlier versions of SQL Server as well. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).
Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.
So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.
Unfortunately, that assumption is not correct when using read committed. The read committed isolation level guarantees that only committed data will be read; it does NOT guarantee how much of the committed data will be returned and is prone to inconsistent analysis problems. If you want to avoid inconsistent analysis, you’d need to increase your isolation level (if using locking) or change to versioning for read committed (or even snapshot isolation).
However, I do agree that the assumption is reasonable, even though it’s not correct.
The definition of the requirement is straightforward however. SQL Server guarantees that only committed data will be read, not how much of it will be read, and so the amount of committed data returned in this case is variable. Furthermore, depending on the table structure, SQL Server version, and configuration option (which I’ve mentioned twice now, but haven’t explained as I don’t want to spoil my story…), the number of rows returned will vary wildly.
So, what’s going on? Why the discrepancy in the number of rows?
It’s because the table is a heap.
The ‘weird’ behavior manifests itself when the heap has an extent allocated to it immediately, from which the first data pages are allocated. When the allocation-order scanner for the select starts, it looks at the PFS bytes for the 8 pages in the extent to see which ones are allocated. These will be scanned. Depending on when the select starts within the 10 seconds that “batch 1” executes, there will be more rows or fewer rows read by the scanner (and returned in the “batch 2” results) because more or fewer of the PFS byte 0x40 bits will have been set indicating the page in the extent is allocated.
For a table to have an extent allocated to it immediately, mixed pages have to be disabled, which is the default for SQL Server 2016 onward, which is why the ‘weird’ behavior manifests with a small number of rows on SQL Server 2016. However, if you’ve enabled trace flag 1118, you’ll see this behavior in all versions. This is why some people report seeing the ‘weird’ behavior on versions earlier than SQL Server 2016.
When mixed extents are NOT disabled, i.e. in earlier versions than SQL Server 2016 when 1118 is not enabled, the first 8 pages allocated are mixed pages, and so the allocation order scanner has to pick them up individually and sees them all (without going into details of the synchronization around the single-page slot array on the first IAM page in the IAM chain/allocation unit for the table), and so as long as no more than 8 data pages are used by the insert, all the rows on them will be returned by the select in this example.
So there you have it, the behavior is “by design” and hasn’t changed at all, but is understandably confusing unless you know exactly how the Engine is performing the select under the covers.
And if you create a clustered index, the ‘weird’ behavior doesn’t occur for this example in any version. This is because the select gets hung up on the S(hare) row lock for the first row in the clustered index, and by the time the select is granted the S lock, the insert has completed and the index leaf scan picks up all 1,000 rows.
Don’t think that using a clustered index stops ‘weird’ behavior using read committed; it doesn’t. However, it can reduce some forms of inconsistencies. In this old blog post, I demonstrate a scenario where a scan of clustered index returns four rows when only three rows exist, using read committed. Kimberly also describes and shows “inconsistent analysis” in these blog posts:
So to summarize: the read committed isolation guarantees that only committed data is read; however, there are many forms of potential concerns (known as “inconsistent analysis”). It really pays to understand what these are and what the various isolation levels that SQL Server provides do and do not allow. In addition to the blog posts above, check out the MSDN lesson titled: Understanding the Available Transaction Isolation Levels here.
12 thoughts on “Read committed doesn’t guarantee much…”
It is unfortunate that the term “inconsistent analysis” is rarely used nowadays, when discussing isolation levels. Instead, the phrasing in courses, books etc tend to be something like “repeatable read prevents data modifications of the read data until commit”. Using the term inconsistent analysis might trigger the developers to be a bit more careful than just accepting read committed as an OK isolation level for whatever they do.
I aggree. It even seems to me that knowledge about the different Isolation levels and the phenomenas that can occur is spread very thinly.. and is not taught and learned much any more because maybe it seems to “old” and “super basic” and can easily become ignored when never noticed.
Thanks for this! Just to clarify it, it’s not triggered by TF1118 behaviour per se for all cases, but triggered by TF1118 behaviour for small number of rows until the 8 data pages allocation is not exceeded. The symptom will always occur no matter what SQL Server versions we have and TF behaviour is on, above 8 data pages allocation, it will always be extent based allocation, right?
Yes. The issue will show up when there are mixed pages, so TF 1118 affects it but doesn’t cause it.
Create Table Tbl
(ID Int Not Null)
Insert Into Tbl Values(1),(2),(3)
Update Tbl set ID = ID + 100
select * from Tbl
100, 101, 102
When update statement is done, the insert statement has not been committed.
Why the above result is obtained?
Because the update is part of the same transaction as the insert…. this is the expected and correct result.
OK, I agree, this is the expected and correct result.
My question is that:
When update statement is done, What is the insert statement status? Committed or Uncommitted ?
Each sentence alone is a transaction.
You started an explicit transaction, so nothing is committed until you do a ‘commit tran’.
No – each statement is not a transaction in your example. There is only one transaction in your example, the explicit one you started.
I reproduced this on my local SQL Server 2017.
But on our production SQL Server 2005 with TF1118 enabled (yes, we use this old version unfortunately) I cannot reproduce it. 1,000 rows are returned every time…
What can be the reasons for such behaviour?
It’s to do with which extents the pages are in. There’s a case in older versions where the pages are spread over multiple extents, with only one page in the first extent allocated. In that case, the scanner be forced to read all the pages in the not-first extents – giving your conditions. It doesn’t always happen, and I’ve even seen it on a vanilla 2016 SP1 system too.