The Curious Case of… how to find which page an index row is on

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or was asked in a random question from the community.)

I was working recently on a client’s corruption issue where I needed to know how to find which page an index row is on. The client was getting weird errors from DBCC CHECKDB around missing nonclustered index records, but SELECTs using that index on the supposedly missing key values would work. I suspected a stale read problem from the I/O subsystem as part of building the database snapshot that DBCC CHECKDB uses, but needed to prove which page the issue was on.

There’s a cool undocumented column that you can select called %%PHYSLOC%% that gives the file:page:slot location of the selected record, and I’ve used it many times when working on corruption/data recovery issues for clients. It gives the information back in hex, so there’s a companion function called sys.fn_PhysLocCracker that formats the output nicely. Both of these have been in the product since SQL Server 2005.

Let me show you an example using the old AdventureWorks sample database.

Let’s say there’s a corruption issue where it says there’s a missing row for StateProvinceID = 1 and AddressID = 519 in the IX_Address_StateProvinceID nonclustered index (which is index ID 4) of the Person.Address table. If I want to prove that that row is NOT missing, here’s what I can do:

SELECT
	[StateProvinceID]
	, [AddressID]
	, [physloc].*
FROM [Person].[Address]
WITH (INDEX = 4)
CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS [physloc]
WHERE [StateProvinceID] = 1
	AND [AddressID] = 519;
GO

I’m selecting the index keys for the nonclustered index and using an index hint to force that index to be used.

And if I get any output, the row exists. In this case, I get:

StateProvinceID AddressID   file_id     page_id     slot_id
--------------- ----------- ----------- ----------- -----------
1               519         1           8120        9

Pretty cool!

I’ve also used this to find rows that I need to bit-twiddle using DBCC WRITEPAGE to work around corruptions to allow data recovery from trashed databases – tedious, but possible if you know what you’re doing.
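As a sketch of how the %%PHYSLOC%% output feeds into that kind of repair: DBCC WRITEPAGE takes a database, file ID, page ID, byte offset, length, and the bytes to write. The file and page here are the ones from the example output above, but the offset and data bytes are purely illustrative – never run this against a database you care about.

```sql
-- WARNING: DBCC WRITEPAGE is undocumented, unsupported, and dangerous.
-- Sketch only: patch 2 bytes at offset 60 on file 1, page 8120 (the
-- location returned by %%physloc%% above). Offset and data are made up
-- for illustration.
DBCC WRITEPAGE (N'AdventureWorks', 1, 8120, 60, 2, 0x0000);
GO
```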

Bottom line: there are quite a few little undocumented columns and functions you can use to easily get internals information about SQL Server, with real-world applications.

The Curious Case of… very long failover times with a large ad hoc workload


Erin worked with a client recently who was experiencing very long failover times with a large ad hoc workload and needed to understand why. By very long, we’re talking more than 30 minutes for an availability group failover, and they’d tested failovers under load before and never had it take more than a few minutes. They wondered whether their workload had changed or some configuration setting was off, because they’d also recently upgraded.

Obviously Erin didn’t want them to do another failover and take more downtime. Luckily, they have a process in place to capture wait statistics all the time, so Erin could see what waits happened while the failover was in progress. One stood out to her right away: QDS_LOADDB.
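If you don’t have a continuous wait-statistics capture process, a rough after-the-fact check is to look at the aggregated waits for the instance – this only works if the instance hasn’t been restarted (or the wait stats cleared) since the failover, since the DMV is cumulative:

```sql
-- Cumulative QDS_LOADDB waits since instance startup. Large values mean
-- queries were blocked waiting for Query Store data to load.
SELECT
    [wait_type],
    [waiting_tasks_count],
    [wait_time_ms] / 1000.0 AS [wait_time_s]
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'QDS_LOADDB';
```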

The QDS_LOADDB wait happens when a database is being brought online, Query Store is enabled, and the Query Store data is being loaded from disk. No queries can execute until that data has finished loading. In this case, the client had enabled Query Store after upgrading, and with their large workload of ad hoc queries, there was 100GB of Query Store data to load. That was the cause of the long failover time; they didn’t realize that Query Store has this behavior.

Luckily there is a workaround other than disabling Query Store: you can enable documented trace flag 7752, which makes the Query Store data load asynchronous with database startup. This means that queries can run before the data load completes, but Query Store won’t capture any information about them (Query Store is essentially read-only until the data load completes). Erin discusses this and other Query Store trace flags in her post here, and Query Store settings and data size in general in this post.
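As a sketch, you can check how much Query Store data a database would have to load at startup, and enable the trace flag globally (the DBCC TRACEON setting only lasts until the next restart; add -T7752 as a startup parameter to make it permanent):

```sql
-- How much Query Store data would have to load when this database starts up?
SELECT
    [actual_state_desc],
    [current_storage_size_mb],
    [max_storage_size_mb]
FROM sys.database_query_store_options;
GO

-- Make the Query Store data load asynchronous with database startup.
DBCC TRACEON (7752, -1);
GO
```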

Bottom line: make sure you understand how a feature changes behavior before enabling it, and HA/DR testing should be performed regularly to catch unexpected behaviors such as this.

The Curious Case of… does SQL Server use a read/write thread per LUN?


I had a question in email this morning from someone that was basically: does SQL Server use a read/write thread per LUN?

The short answer is no.

My friend Bob Dorr from CSS actually debunked this publicly way back in 2007 in the blog post SQL Server Urban Legends Discussed, where he also explains how some of the wording in earlier versions of Books Online led to the myth that SQL Server uses one read/write thread per LUN or per data file. This myth leads people to believe that adding more data files will increase I/O performance because more threads will be used to do the writing. While it’s true that more data files can increase performance, depending on your I/O subsystem and the file/filegroup layout on that I/O subsystem, it has nothing to do with that myth.

Any thread in SQL Server that’s processing a query can issue an asynchronous read request. When a thread makes a call into the buffer pool to get a pointer to a data file page copy in memory, if the page isn’t already in memory then it has to issue a read and then wait for the read to complete (called a physical I/O). On the other hand, if the page is already in memory, the thread pops back out to where it came from (usually the Access Methods code) and that’s a logical I/O, with no accompanying wait. Pages can also be read in larger chunks by things like index seek/scan readahead and DBCC CHECKDB’s readahead.
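You can see the physical vs. logical I/O distinction for yourself with SET STATISTICS IO (on a test system – DBCC DROPCLEANBUFFERS empties the buffer pool; the table name is just the AdventureWorks one from the first example):

```sql
-- Test systems only: empty the buffer pool so the next read must hit disk.
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
GO
-- First run: physical reads and/or read-ahead reads, plus logical reads.
SELECT COUNT(*) FROM [Person].[Address];
GO
-- Second run: the pages are now in the buffer pool, so logical reads only.
SELECT COUNT(*) FROM [Person].[Address];
GO
```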

Pages can be written to disk in a variety of ways, including:

  • Through a regular checkpoint by a checkpoint background thread
  • Through lazy writing, when there’s memory pressure on the buffer pool, by a lazy writer background thread
  • Through eager writing, when performing a minimally-logged operation (guaranteeing that the full result of the operation is durable when it finishes)
  • Through database snapshot real-time maintenance: when a page is first changed after a database snapshot is created, it is synchronously written to the snapshot
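The checkpoint path is the only one of these you can trigger directly. As a sketch, you can watch a checkpoint clean dirty pages using the sys.dm_os_buffer_descriptors DMV:

```sql
-- Count dirty (modified-in-memory) pages for the current database.
SELECT COUNT(*) AS [dirty_pages]
FROM sys.dm_os_buffer_descriptors
WHERE [database_id] = DB_ID()
    AND [is_modified] = 1;
GO
-- Force a checkpoint, which writes the dirty pages to disk...
CHECKPOINT;
GO
-- ...and count again: the number should drop to at or near zero.
SELECT COUNT(*) AS [dirty_pages]
FROM sys.dm_os_buffer_descriptors
WHERE [database_id] = DB_ID()
    AND [is_modified] = 1;
GO
```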

There’s one exception though: backup and restore *will* use a separate reader or writer thread per device to aid in I/O parallelism, but backup and restore don’t use the buffer pool; they open direct channels to the devices themselves.
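For example, striping a backup across multiple devices gets you one writer thread per device (the database name and file paths here are just illustrative – in practice the stripes should be on separate physical devices to benefit):

```sql
-- Striping across four files gives backup one writer thread per device.
BACKUP DATABASE [AdventureWorks]
TO  DISK = N'D:\Backups\AW_1.bak',
    DISK = N'E:\Backups\AW_2.bak',
    DISK = N'F:\Backups\AW_3.bak',
    DISK = N'G:\Backups\AW_4.bak'
WITH INIT;
GO
```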

Bottom line: for regular operations, the one-thread-per-file/LUN is a big myth that’s still around today.