A while back I was involved in an email thread where people were wondering about some ‘weird’ behavior from SQL Server. The problem was occurring on SQL Server 2016 and used the default isolation level of read committed. The scenario was as follows:
- Create a table with a few columns
- Batch 1: In one SSMS window, do the following (which takes 10 seconds to run):
- Start a transaction
- Insert 1,000 rows into the table, with a 0.01 second WAITFOR DELAY between each insert
- Commit the transaction
- Batch 2: In a second SSMS window:
- Select * from the table
The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started, during the 10-seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. This behavior had also been reported on earlier versions of SQL Server as well. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).
Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.
So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.
Unfortunately, that assumption is not correct when using read committed. The read committed isolation level guarantees that only committed data will be read; it does NOT guarantee how much of the committed data will be returned and is prone to inconsistent analysis problems. If you want to avoid inconsistent analysis, you’d need to increase your isolation level (if using locking) or change to versioning for read committed (or even snapshot isolation).
However, I do agree that the assumption is reasonable, even though it’s not correct.
The definition of the requirement is straightforward however. SQL Server guarantees that only committed data will be read, not how much of it will be read, and so the amount of committed data returned in this case is variable. Furthermore, depending on the table structure, SQL Server version, and configuration option (which I’ve mentioned twice now, but haven’t explained as I don’t want to spoil my story…), the number of rows returned will vary wildly.
So, what’s going on? Why the discrepancy in the number of rows?
It’s because the table is a heap.
The ‘weird’ behavior manifests itself when the heap has an extent allocated to it immediately, from which the first data pages are allocated. When the allocation-order scanner for the select starts, it looks at the PFS bytes for the 8 pages in the extent to see which ones are allocated. These will be scanned. Depending on when the select starts within the 10 seconds that “batch 1” executes, there will be more rows or fewer rows read by the scanner (and returned in the “batch 2” results) because more or fewer of the PFS byte 0x40 bits will have been set indicating the page in the extent is allocated.
For a table to have an extent allocated to it immediately, mixed pages have to be disabled, which is the default for SQL Server 2016 onward, which is why the ‘weird’ behavior manifests with a small number of rows on SQL Server 2016. However, if you’ve enabled trace flag 1118, you’ll see this behavior in all versions. This is why some people report seeing the ‘weird’ behavior on versions earlier than SQL Server 2016.
When mixed extents are NOT disabled, i.e. in earlier versions than SQL Server 2016 when 1118 is not enabled, the first 8 pages allocated are mixed pages, and so the allocation order scanner has to pick them up individually and sees them all (without going into details of the synchronization around the single-page slot array on the first IAM page in the IAM chain/allocation unit for the table), and so as long as no more than 8 data pages are used by the insert, all the rows on them will be returned by the select in this example.
So there you have it, the behavior is “by design” and hasn’t changed at all, but is understandably confusing unless you know exactly how the Engine is performing the select under the covers.
And if you create a clustered index, the ‘weird’ behavior doesn’t occur for this example in any version. This is because the select gets hung up on the S(hare) row lock for the first row in the clustered index, and by the time the select is granted the S lock, the insert has completed and the index leaf scan picks up all 1,000 rows.
Don’t think that using a clustered index stops ‘weird’ behavior using read committed; it doesn’t. However, it can reduce some forms of inconsistencies. In this old blog post, I demonstrate a scenario where a scan of clustered index returns four rows when only three rows exist, using read committed. Kimberly also describes and shows “inconsistent analysis” in these blog posts:
So to summarize: the read committed isolation guarantees that only committed data is read; however, there are many forms of potential concerns (known as “inconsistent analysis”). It really pays to understand what these are and what the various isolation levels that SQL Server provides do and do not allow. In addition to the blog posts above, check out the MSDN lesson titled: Understanding the Available Transaction Isolation Levels here.