This is a case that happened on a client system last year: occasionally a common query on a tiny table appeared to ‘hang’ and had to be killed and re-run. What’s going on?
The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, and the query usually ran in a few seconds, but occasionally the query would ‘hang’ and would either be killed or take tens of minutes to run. Troubleshooting instrumentation when the issue happened showed no out-of-the-ordinary waits occurring, no pressure on the server, and the query plan generated when the query took a long time was essentially the same.
The only thing noticeable was that when the problem occurred, a column statistics update happened as part of query compilation, but with such a tiny table, how could that be the root cause of the issue? The calculated disk space for the row size and count worked out to be about 250MB, but with a statistics sample rate of only 4%, extended events showed an auto_stats event taking close to an hour!
Check It Out!
SQLskills Blackbelt Bundle and 2022 Updates - 1 Year Access
$2,495.00 Original price was: $2,495.00.$699.00Current price is: $699.00.
Further investigation showed that although the table only had a few hundred MB of data in it, it was taking up more than 25GB of disk space! Jonathan and I were stumped as to how that could be the case. True, there had been bugs in earlier versions of SQL Server, such as with LOB data types where only one page from each dedicated extent (of 8 pages) was actually used, but none were currently known, and even something like that wouldn’t account for a table taking up more than 100x more disk space than data in the table.
More and more curious, all the space was in a nonclustered index. How could *that* be possible?
I suggested looking at what the pages were that were allocated to the table. Bingo! 25GB of IAM pages (one-per-4GB allocation bitmaps) were allocated to that single nonclustered index.
Further investigation showed that the nonclustered index had thousands of partitions, each with a handful of index pages and around 1,500 IAM pages. The nonclustered index leading key was a ROWVERSION column and the churn rate on the table was very high, so essentially the schema and usage pattern were creating a long IAM chain for each partition, with hardly any data.
In a nutshell, this meant a statistics update on a column covered by the nonclustered index would have needed to read and process 25GB of IAM pages, looking for allocated extents to then process the index records from to produce the statistic.
Solution? The initial super-quick fix was to drop and recreate the nonclustered index to remove the long IAM chains (rather than having an index rebuild have to go through the same, laborious task of reading all the IAM pages!), and then to implement regular index maintenance to prevent the IAM chains from becoming long in the first place.
PS I’ll blog the script we used to prove it next week.
3 thoughts on “The Curious Case of… occasional query failure on a tiny table”
It sounds like you used sp_spaceused but that doesn’t break it down. As you know, it does point to an aggregate of the index and data sizes and unused.
I tried this instead:
SELECT
t.name AS [TableName],
i.name AS [IndexName],
i.type_desc AS [IndexType],
p.rows AS [RowCount],
(sum(a.total_pages) * 8) / 1024 AS [TotalSpaceMB],
(sum(a.used_pages) * 8) / 1024 AS [UsedSpaceMB],
(sum(a.data_pages) * 8) / 1024 AS [DataSpaceMB]
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.object_id = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
WHERE
t.name = ‘session’
GROUP BY
t.name, i.object_id, i.index_id, i.name, i.type_desc, p.rows;
That appears to get me further but without a mockup of it, I’m curious how this was uncovered since I don’t have anything to test against.
“25GB of IAM pages (one-per-4GB allocation bitmaps) were allocated to that single nonclustered index.”
My best guess is this:
SELECT
allocated_page_file_id AS [FileID],
allocated_page_page_id AS [PageID],
page_type_desc,
index_id,
partition_id,
is_allocated
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(‘YourTableName’), NULL, NULL, ‘DETAILED’)
WHERE page_type_desc = ‘IAM_PAGE’;
But I’d be curious to know if I’m on the right track.
No – once we realized it couldn’t possibly be index pages we dove down into the guts and walked the various chains with dm_db_page_info. I’ll blog the script we used next week.