The Curious Case of… what is the wait resource (0:0:0)?

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

On Twitter today under the #sqlhelp hash tag, a former student asked what is the wait resource (0:0:0)?

This can show up in a couple of places:

  • In the output from sys.dm_exec_requests in the wait_resource column when the last_wait_type is PAGELATCH_EX or PAGELATCH_SH
  • In the output from sys.sysprocesses in the waitresource column when the lastwaittype is PAGELATCH_EX or PAGELATCH_SH

(as sys.sysprocesses is basically the same code as the DMV under the covers)

The simple answer is that in some cases of PAGELATCH_SH or PAGELATCH_EX waits in tempdb, the wait resource simply isn’t preserved by the code and so the wait resource for the last wait type is set to (0:0:0) in the output of those queries. It doesn’t mean anything special, and I’ve only ever seen it in tempdb. It’s been that way since SQL Server 2005 and is still there in all the latest versions today.

Simple answer: ignore it and troubleshoot the other waits.

The Curious Case of… why the transaction log has to be zero-initialized

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

And it’s an auspicious day to be posting, as SQLskills/SYSolutions turns 24 today on 7/24/19!

I had an email from someone last week asking why the transaction log has to be zero-initialized and a I realized I’ve never written about this before, so here goes. (Note: I do explain and demonstrate this in my Pluralsight course on SQL Server: Logging, Recovery, and the Transaction Log.)

It’s all to do with crash recovery. SQL Server knows where crash recovery has to start for a database, but not where it ends – i.e. SQL Server does not persist the ‘most recent LSN’ for a database anywhere. This means it has to work out where the end of the log is (and by end, I mean the most recent log record persisted on disk, not the physical end of the log file).

Some background before the explanation:

  • The log is split up internally into chunks called virtual log files, or more commonly just VLFs.
  • The first time a VLF is activated and used, all used sections of it are stamped with parity bits 64 (the mechanism for this is not important)
  • Eventually the VLF will be marked inactive, and eventually reused. The second time a VLF is activated, all used sections of it are stamped with parity bits 128
  • And then 64 again
  • And then 128 again
  • Ad infinitum…

Why 64 and 128 as the alternating parity bits you may ask? Why not? is my response. I can’t think of a reason to use a different bit pattern pair.

Back to the question at hand…

The most common case when crash recovery happens is that the log has wrapped around a few times and so the various VLFs have been activated and deactivated a few times. Crash recovery goes to where it must start: either the most recent checkpoint, or the start of the oldest active transaction at the time the most recent checkpoint happened. It follows the sequence of active VLFs until it comes to a point where a section of a VLF has the wrong parity bits. This means a VLF is active and says all valid sections should have parity bits X, and crash recovery finds an old section of the VLF from its previous use that has parity bits Y. That’s the end of the log. (Yes, when a VLF is reactivated, it is not zeroed out, because the overwriting of the old sections with new sections with new parity bits works instead.)

The much rarer case is when the log *hasn’t* yet wrapped around and not all the VLFs in the log have been used. In that case, crash recovery proceeds from the start until it find a section of an active VLF that is full of zeroes. And that’s the end of the log in that case.

New physical portions of the log file have to zero-initialized as the previous bits and bytes on disk might just happen to look like a section of a VLF with the ‘correct’ parity bits, causing crash recovery to try to use it and most likely crash SQL Server. It’s highly improbable, but there’s a very small possibility.

Hence the log cannot use instant file initialization.

PS: note that in SQL Server 2016, the ‘zeroing’ pattern changed from 0x00 (hexadecimal zero) to 0xc0, for reasons unrelated to what we’re discussing here.

The Curious Case of… how to find which page an index row is on

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I was working recently on a client’s corruption issue where I needed to know how to find which page an index row is on. The client was getting weird errors from DBCC CHECKDB around missing nonclustered index records, but SELECTs using that index on the supposedly missing key values would work. I suspected a stale read problem from the I/O subsystem as part of building the database snapshot that DBCC CHECKDB uses, but needed to prove which page the issue was on.

There’s a cool undocumented column that you can select called %%PHYSLOC%% that gives the page:record:slot of the selected record, and I’ve used it many times when working on corruption/data recovery issues for clients. It gives the information back in hex, so there’s a companion function called sys.fn_PhysLocCracker that formats the output nicely. Both of these have been in the product since SQL Server 2005.

Let me show you an example using the old AdventureWorks sample database.

Let’s say there’s a corruption issue where it says there’s a missing row for StateProvinceID = 1 and AddressID = 519 in the IX_Address_StateProvinceID nonclustered index (which is index ID 4) of the Person.Address table. If I want to prove that that row is NOT missing, here’s what I can do:

SELECT
	[StateProvinceID]
	, [AddressID]
	, [physloc].*
FROM [Person].[Address]
WITH (INDEX = 4)
CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS [physloc]
WHERE [StateProvinceID] = 1
	AND [AddressID] = 519;
GO

I’m selecting the index keys for the nonclustered index and using an index hint to force that index to be used.

And if I get any output, the row exists. In this case, I get:

StateProvinceID AddressID   file_id     page_id     slot_id
--------------- ----------- ----------- ----------- -----------
1               519         1           8120        9

Pretty cool!

I’ve also used this to find rows that I need to bit-twiddle using DBCC WRITEPAGE to work around corruptions to allow data recovery from trashed databases – tedious, but possible if you know what you’re doing.

Bottom line: there are quite a few little undocumented columns and functions you can use to easily get internals information about SQL Server, with real-world applications.