(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)
We had an interesting case with a client last week where one of their SQL Server instances was creating memory dumps. Jonathan investigated the memory dumps and all of the call stacks pointed to memory corruption, however there were no 832 errors in the error log. Why not?
First off, what is error 832?
When a data file page is written to disk, a checksum is calculated and stored in the page header. Until that page is changed again, the checksum calculation on the page should create the same checksum value. This means that when a page is read from disk, it’s checksum value can be validated, and if a discrepancy is found, the buffer pool throws an 824 error and you know that something in the I/O subsystem corrupted the page. (Well, technically, the buffer pool retries the read four times before declaring failure, but I digress…)
But what about 832? Once the page is in memory and hasn’t been changed, the checksum calculation should give the checksum value that’s in the page header. When the page is about to be changed for the first time, the buffer pool validates the checksum and if the page has been corrupted in memory (remember, it must already have been read from disk correctly), then an 832 error occurs.
So if an 832 error indicates memory corruption, and in this case there was definitely memory corruption, why weren’t there any 832 errors?
Simple: the checksum validation of a page in memory only occurs if the page hasn’t changed. So for pages that have already changed and then subsequently corrupted by something, memory corruption may not be discovered unless by other run-time assertion checks in the SQL Server code, leading to memory dumps, as in this case.