Read committed doesn’t guarantee much…

A while back I was involved in an email thread where people were wondering about some ‘weird’ behavior from SQL Server. The problem was occurring on SQL Server 2016 and used the default isolation level of read committed. The scenario was as follows:

  • Create a table with a few columns
  • Batch 1: In one SSMS window, do the following (which takes 10 seconds to run):
    • Start a transaction
    • Insert 1,000 rows into the table, with a 0.01 second WAITFOR DELAY between each insert
    • Commit the transaction
  • Batch 2: In a second SSMS window:
    • Select * from the table
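The scenario above can be sketched in T-SQL as follows (table and column names are hypothetical, not from the original thread):

```sql
-- Setup: a heap with a few columns (no clustered index)
CREATE TABLE dbo.HeapTest (c1 INT, c2 VARCHAR (100));
GO

-- Batch 1: insert 1,000 rows over ~10 seconds in one transaction
BEGIN TRANSACTION;
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
    INSERT INTO dbo.HeapTest VALUES (@i, REPLICATE ('a', 100));
    WAITFOR DELAY '00:00:00.010';
    SET @i = @i + 1;
END;
COMMIT TRANSACTION;
GO

-- Batch 2: run in a second window while Batch 1 is still executing
SELECT * FROM dbo.HeapTest;
```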

The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started during the 10 seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. The same behavior had been reported on earlier versions of SQL Server too. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).

Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.

So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.

Unfortunately, that assumption is not correct when using read committed. The read committed isolation level guarantees that only committed data will be read; it does NOT guarantee how much of the committed data will be returned and is prone to inconsistent analysis problems. If you want to avoid inconsistent analysis, you’d need to increase your isolation level (if using locking) or change to versioning for read committed (or even snapshot isolation).
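For instance, assuming a test database and heap table like those in the scenario (names hypothetical), those remedies look like this:

```sql
-- Option 1: raise the locking isolation level for the reading session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM dbo.HeapTest;

-- Option 2: switch read committed to use versioning, database-wide
ALTER DATABASE TestDB SET READ_COMMITTED_SNAPSHOT ON;

-- Option 3: allow snapshot isolation and opt in per session
ALTER DATABASE TestDB SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```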

However, I do agree that the assumption is reasonable, even though it’s not correct.

The definition of the guarantee, however, is straightforward: SQL Server guarantees that only committed data will be read, not how much of it will be read, and so the amount of committed data returned in this case is variable. Furthermore, depending on the table structure, the SQL Server version, and a configuration option (which I’ve mentioned twice now, but haven’t explained as I don’t want to spoil my story…), the number of rows returned will vary wildly.

So, what’s going on? Why the discrepancy in the number of rows?

It’s because the table is a heap.

The ‘weird’ behavior manifests itself when the heap has an extent allocated to it immediately, from which the first data pages are allocated. When the allocation-order scanner for the select starts, it looks at the PFS bytes for the 8 pages in the extent to see which ones are allocated; those are the pages it will scan. Depending on when the select starts within the 10 seconds that “Batch 1” executes, more or fewer rows will be read by the scanner (and returned in the “Batch 2” results), because more or fewer of the pages in the extent will have had the 0x40 (page-is-allocated) bit set in their PFS bytes.
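One way to watch this happening (using an undocumented DMF, so for investigation only; the table name is from my hypothetical repro) is to query the heap’s page allocations while the insert runs:

```sql
-- Undocumented DMF: lists the pages allocated to the heap, reflecting
-- the PFS allocation status that the allocation-order scanner reads
SELECT
    allocated_page_file_id,
    allocated_page_page_id,
    is_allocated,
    is_mixed_page_allocation
FROM sys.dm_db_database_page_allocations (
    DB_ID (), OBJECT_ID (N'dbo.HeapTest'), NULL, NULL, 'DETAILED');
```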

For a table to have an extent allocated to it immediately, mixed-page allocations have to be disabled, which is the default from SQL Server 2016 onward; this is why the ‘weird’ behavior manifests with a small number of rows on SQL Server 2016. If you’ve enabled trace flag 1118 on earlier versions, you’ll see the behavior there too, which is why some people report seeing it on versions earlier than SQL Server 2016.
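The relevant settings are as follows (the database name is hypothetical):

```sql
-- SQL Server 2016+: uniform-extent allocation is the default, controlled
-- per database with this option (OFF = no mixed pages, the 2016+ default)
ALTER DATABASE TestDB SET MIXED_PAGE_ALLOCATION OFF;

-- Earlier versions: trace flag 1118 disables mixed-extent allocations
-- server-wide
DBCC TRACEON (1118, -1);
```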

When mixed extents are NOT disabled (i.e. on versions earlier than SQL Server 2016, with trace flag 1118 not enabled), the first 8 pages allocated are mixed pages. The allocation-order scanner has to pick them up individually and sees them all (without going into the details of the synchronization around the single-page slot array on the first IAM page in the IAM chain/allocation unit for the table). So, as long as no more than 8 data pages are used by the insert, all the rows on them will be returned by the select in this example.

So there you have it, the behavior is “by design” and hasn’t changed at all, but is understandably confusing unless you know exactly how the Engine is performing the select under the covers.

And if you create a clustered index, the ‘weird’ behavior doesn’t occur for this example in any version. This is because the select gets hung up on the S(hare) row lock for the first row in the clustered index, and by the time the select is granted the S lock, the insert has completed and the index leaf scan picks up all 1,000 rows.
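Continuing the hypothetical repro, creating the index is a one-liner:

```sql
-- With a clustered index, the select scans the index leaf level in key
-- order and blocks on the first row's lock, rather than doing an
-- allocation-order scan of the heap
CREATE CLUSTERED INDEX HeapTest_CI ON dbo.HeapTest (c1);
```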

Don’t think that using a clustered index stops ‘weird’ behavior using read committed; it doesn’t. However, it can reduce some forms of inconsistencies. In this old blog post, I demonstrate a scenario where a scan of a clustered index returns four rows when only three rows exist, using read committed. Kimberly also describes and shows “inconsistent analysis” in these blog posts:

So to summarize: the read committed isolation level guarantees that only committed data is read; however, there are many potential forms of inconsistency (known as “inconsistent analysis” problems). It really pays to understand what these are, and what the various isolation levels that SQL Server provides do and do not allow. In addition to the blog posts above, check out the MSDN lesson titled: Understanding the Available Transaction Isolation Levels here.

SQLskills SQL101: Why does repair invalidate replication subscriptions?

SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out

Whenever I’m teaching and recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:

  • Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB‘s access to a portion of the database)
  • Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
  • Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
  • Reevaluate your HA/DR strategy so you don’t have to use repair again in future
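As a sketch of the first two steps (database and table names hypothetical; assessing the errors before choosing repair is deliberately omitted here):

```sql
-- Repair requires single-user mode
ALTER DATABASE TestDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'TestDB', REPAIR_ALLOW_DATA_LOSS);

-- Step 1: re-check that repair fixed everything and exposed nothing new
DBCC CHECKDB (N'TestDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Step 2: validate constraints on any repaired table
DBCC CHECKCONSTRAINTS (N'dbo.RepairedTable');

ALTER DATABASE TestDB SET MULTI_USER;
```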

One question I’m often asked is about why replication can be broken by running repair.

There are two ways that replication can be affected: repairs on replication metadata tables, and repairs on anything else to do with a subscription.

Repairs on replication metadata tables

This is the simplest case to explain. If the repair operation affects any of the replication metadata tables (i.e. deleted some data from them), the entire replication publication will be in an inconsistent state and you should remove replication completely from the database and reconfigure it. This isn’t limited to a single subscription – all replication should be reconfigured.

Repairs on anything else

Transactional replication captures changes to the publication database by analyzing the transaction log, looking for transactions that change data in any of the publications, and converting those operations into logical operations that can be applied to the subscribers. Merge replication captures changes to the publication database using DML triggers, converting those operations into logical operations that can be applied to the subscribers.

Neither of these mechanisms can capture operations performed by repair. Repair operations are always direct physical changes to the database structures to fix inconsistencies in the structures (e.g. a database page, a table record, or a linkage between two pages), as opposed to physical changes resulting from queries performing inserts, updates, or deletes on tables.

These repair operations cannot be translated into logical operations that can be applied to replication subscribers, because there are no logical operations that can be expressed using T-SQL for the equivalent of the direct structural changes that repair is performing. Replication does not preserve the exact physical location of a particular record between the publication and subscription databases, so a direct change to record Y on page X in the publication database could not be replayed on the subscription database (remember, replication ships logical changes, not physical changes). This means that if any table that is part of a replication publication is changed by a repair operation, the replication subscription is no longer valid and must be reinitialized.

As an example, imagine a repair operation is forced to remove a data page from a unique clustered index (essentially deleting some table records), and the subscription is NOT reinitialized. Those records would still exist on the replicated copy of the table. If a subsequent insert operation inserted records with cluster key values corresponding to the records deleted by the repair operation, the Distribution Agent would fail when trying to apply the inserts to the replicated copy: because the repair operation was not applied to the subscription database, a duplicate key violation error occurs when the insert is applied to the replicated table.

A replication subscription must always be reinitialized if any table in the publication is affected by a repair operation, or the replication metadata tables are repaired.


New live online training class on using Query Store in May!

Continuing our series of live, online classes, Erin will be delivering her new IEQS: Immersion Event on Solving Common Performance Problems with Query Store in May! The class will be delivered live via WebEx on May 22-23 (roughly 12-13 hours of content including Q&As; about the same as two full workshop days!) and the recording will be available to attendees for six months following the end of the class.

Rather than have people try to watch a full day of training at their computer for one or more days, the class will run from 10am to 3pm PST each day, with two 90-minute teaching sessions, each followed by Q&A, and a lunch break. We chose to do this, and to spread the class over a few days, so the times work pretty well for those in the Americas, Africa, and Europe. We also realize that this is complex content, so we want to give attendees time to digest each day’s material, plus extensive Q&A.

Here are some select quotes from prior attendees of Erin’s in-person classes:

  • “Extremely pleased with the course. FAR exceeded my expectations.”
  • “Well worth the time and expense to attend. Would highly recommend this to others.”
  • “I though this was extremely worthwhile. I have been a DBA for a few years and this was a great refresher for a lot of things I don’t do very often but should.”
  • “Found class extremely helpful and plan to take the next one.”
  • “Great course – very informative – very great instructors – I am sure to be back!”
  • “I was familiar with all of the topics in this course, but barely understood most. I feel I will be able to begin to do all of the things I need to take simpler tasks from our DBAs to allow them to focus on the more advanced tasks. That was exactly my goal for taking this class.”
  • “Great course. Good new info for me, plus refresher on other info. Thanks!”
  • “Both Erin and Jon have a vast knowledge of not only SQL Server & tools, but also effective presentation.”
  • “Thanks for taking the time to better my knowledge of SQL and allow me to better my career.”

The modules covered will be:

  • Query Store Fundamentals
  • Understanding the Query Store Data
  • Workload Characteristics and Query Store Performance
  • Finding Performance Issues
  • Forcing Plans
  • Automatic Tuning
  • Other Uses of Query Store
  • Visualizing Query Store Data

The price of the class is US$795 (or US$695 for prior live, online attendees) and you can get all the details here.

The class was also announced in our newsletter today, with a US$100 discount for those people who received that newsletter, valid for two weeks. All future live, online classes will always feature a two-week discount for newsletter subscribers.

We decided to start teaching some live, online classes as we recognize that not everyone can travel to our in-person classes, or take that time away from work or family, or simply doesn’t have a travel budget as well as a training budget. People also have different ways they learn, some preferring in-person training, some preferring recorded, online training, and some preferring live, online training.

We’ll be doing more of these in 2018, on a variety of topics, so stay tuned for updates (and discounts through the newsletter).

We hope you can join us!