When DBCC DROPCLEANBUFFERS doesn’t work…

Here’s an interesting question that came up in our IEPTO1 class in Chicago this week (paraphrasing):

I was doing a demo recently where I was demonstrating physical I/Os occurring. I used DBCC DROPCLEANBUFFERS, then enabled SET STATISTICS IO ON and performed a SELECT operation. I was expecting to see physical reads occurring but I only saw logical reads. What’s going on? Why weren’t there any physical reads after I’d flushed the buffer pool with DBCC DROPCLEANBUFFERS?

It’s a very interesting question. If you ask most people what DBCC DROPCLEANBUFFERS does, you’ll get the response that it clears out the buffer pool (as my student expected).

But it doesn’t.

It drops *clean* pages from the buffer pool only.

A clean page is one that has not been changed since it was read into memory or last written to disk. A dirty page is one that has not been written to disk since it was last changed. Dirty pages are not dropped by DBCC DROPCLEANBUFFERS, they are only made clean by writing them to disk (either through one of the various kinds of checkpoints or by the lazy writer – or one of the per-NUMA node lazy writers if you have NUMA configured).

The demo in question had updated the table being selected, and so when DBCC DROPCLEANBUFFERS was executed, the pages from that table remained in memory – hence no physical reads were required for the subsequent SELECT.

If you want to ensure that all pages from a database are flushed from memory, you need to first perform a manual CHECKPOINT of that database and then run DBCC DROPCLEANBUFFERS.

You can then verify that there are no pages in memory for the database in question using the following code:

SELECT *,
	[DirtyPageCount] * 8 / 1024 AS [DirtyPageMB],
	[CleanPageCount] * 8 / 1024 AS [CleanPageMB]
FROM
	(SELECT 
		(CASE WHEN ([database_id] = 32767)
			THEN N'Resource Database'
			ELSE DB_NAME ([database_id]) END) AS [DatabaseName], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 1 ELSE 0 END) AS [DirtyPageCount], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 0 ELSE 1 END) AS [CleanPageCount]
	FROM sys.dm_os_buffer_descriptors
	GROUP BY [database_id]) AS [buffers]
ORDER BY [DatabaseName]
GO 

Hope this helps some of you out there!

Problems from having lots of server memory

A month ago I kicked off a survey asking how much memory is installed on your largest server that’s running SQL Server. Thank you to everyone that responded.

Here are the results:

memory

 

 

 

 

 

 

 

 

 

 

 

The “other” values are:

  • 3 more for the ‘128 GB or more, but less than 256 GB’ count
  • 1 more for the ‘Less than 16 GB’ count
  • One poor soul who only has 512 MB in their server!

This is very interesting:

  • I expected the majority of servers to fall into the middle of the range (around 128GB), and it’s actually only 37% that fit into the 64 GB to 256 GB range.
  • I’m surprised at the percentage of servers (41%) of servers with 256 GB or more.
  • I didn’t know what percentage would have more than 1 TB, so almost 10% is really cool to see.

So what do these results mean? Well, the number of servers out there with lots (more than 128GB) of memory is more than half of all respondents. The more memory you have, the more important it is that you make sure that the memory is being used efficiently and that you’re not wasting space in the buffer pool (see here) and that you’re not churning the buffer pool with poor query plans causing lots of reads (see here).

What other things could be problems with large amounts of memory?

  • Shutting down the instance. This will checkpoint all the databases, which could take quite a long time (minutes to hours) if suddenly all databases have lots of dirty pages that all need to be flushed out to disk. This can eat into your maintenance window, if you’re shutting down to install an SP or a CU.
  • Starting up the instance. If the server’s POST checks memory, the more memory you have, the longer that will take. This can eat into your allowable downtime if a crash occurs.
  • Allocating the buffer pool. We’ve worked with clients with terabyte+ buffer pools where they hit a bug on 2008 R2 (also in 2008 and 2012) around NUMA memory allocations that would cause SQL Server to take many minutes to start up. That bug has been fixed in all affected versions and you can read about in KB 2819662.
  • Warming up the buffer pool. Assuming you don’t hit the memory allocation problem above, how do you warm up such a large buffer pool so that you’re not waiting a long time for your ‘working set’ of data file pages to be memory resident? One solution is to analyze your buffer pool when it’s warm, to figure out which tables and indexes are in memory, and then write some scripts that will read much of that data into memory quickly as part of starting up the instance. For one of the same customers that hit the allocation bug above, doing this produced a big boost in getting to the steady-state workload performance compared to waiting for the buffer pool to warm up naturally.
  • Complacency. With a large amount of memory available, there might be a tendency to slacken off proactively looking for unused and missing index tuning opportunities or plan cache bloat or wasted buffer pool space (I mentioned above), thinking that having all that memory will be more forgiving. Don’t fall into this trap. If one of these things becomes such a problem that it’s noticeable on your server with lots of memory, it’s a *big* problem that may be harder to get under control quickly.
  • Disaster recovery. If you’ve got lots of memory, it probably means your databases are getting larger. You need to start considering the need for multiple filegroups to allow small, targeted restores for fast disaster recovery. This may also mean you need to think about breaking up large tables, using partitioning for instance, or archiving old, unused data so that tables don’t become unwieldy.

Adding more memory is one of the easiest ways to alleviate some performance issues (as a band-aid, or seemingly risk-free temporary fix), but don’t think it’s a simple thing to just max out the server memory and then forget about it. As you can see, more memory leads to more potential problems, and these are just a few things that spring to mind as I’m sitting in the back of class here in Sydney.

Be careful out there!

 

Capturing IO latencies for a period of time

In both my wait statistics pre-conference workshops at the PASS Summit and SQLintersection I promised to do a bunch of blog posts. The second one on the list is a simple script to allow you to capture all the reads, writes, and I/O latencies that occurred over a period of time.

The script does the following:

  • Creates two temporary tables
  • Captures the output from sys.dm_io_virtual_file_stats into the first table
  • Waits for a configurable delay (line 41 in the script – I made it 30 minutes in the example)
  • Captures the output from sys.dm_io_virtual_file_stats into the second table
  • Provides my usual virtual file stats output on the results

The original code and explanation about using the DMV are here.

Enjoy!

/*============================================================================
  File:     ShortPeriodIOLatencies.sql

  Summary:  Short snapshot of I/O latencies

  SQL Server Versions: 2005 onwards
------------------------------------------------------------------------------
  Written by Paul S. Randal, SQLskills.com

  (c) 2014, SQLskills.com. All rights reserved.

  For more scripts and sample code, check out http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes (e.g. in a
  for-sale commercial tool). Use in your own environment is encouraged.
  You may republish altered code as long as you include this copyright and
  give due credit, but you must obtain prior permission before blogging
  this code.

  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/
 

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats1')
    DROP TABLE [##SQLskillsStats1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats2')
    DROP TABLE [##SQLskillsStats2];
GO

SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
	   [num_of_writes], [io_stall_write_ms], [io_stall],
	   [num_of_bytes_read], [num_of_bytes_written], [file_handle]
INTO ##SQLskillsStats1
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

WAITFOR DELAY '00:30:00';
GO

SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
	   [num_of_writes], [io_stall_write_ms], [io_stall],
	   [num_of_bytes_read], [num_of_bytes_written], [file_handle]
INTO ##SQLskillsStats2
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

WITH [DiffLatencies] AS
(SELECT
-- Files that weren't in the first snapshot
        [ts2].[database_id],
        [ts2].[file_id],
        [ts2].[num_of_reads],
        [ts2].[io_stall_read_ms],
		[ts2].[num_of_writes],
		[ts2].[io_stall_write_ms],
		[ts2].[io_stall],
		[ts2].[num_of_bytes_read],
		[ts2].[num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
        ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NULL
UNION
SELECT
-- Diff of latencies in both snapshots
        [ts2].[database_id],
        [ts2].[file_id],
        [ts2].[num_of_reads] - [ts1].[num_of_reads] AS [num_of_reads],
        [ts2].[io_stall_read_ms] - [ts1].[io_stall_read_ms] AS [io_stall_read_ms],
		[ts2].[num_of_writes] - [ts1].[num_of_writes] AS [num_of_writes],
		[ts2].[io_stall_write_ms] - [ts1].[io_stall_write_ms] AS [io_stall_write_ms],
		[ts2].[io_stall] - [ts1].[io_stall] AS [io_stall],
		[ts2].[num_of_bytes_read] - [ts1].[num_of_bytes_read] AS [num_of_bytes_read],
		[ts2].[num_of_bytes_written] - [ts1].[num_of_bytes_written] AS [num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
        ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NOT NULL)
SELECT
	DB_NAME ([vfs].[database_id]) AS [DB],
	LEFT ([mf].[physical_name], 2) AS [Drive],
	[mf].[type_desc],
	[num_of_reads] AS [Reads],
	[num_of_writes] AS [Writes],
	[ReadLatency(ms)] =
		CASE WHEN [num_of_reads] = 0
			THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END,
	[WriteLatency(ms)] =
		CASE WHEN [num_of_writes] = 0
			THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,
	-- [Latency] =
		-- CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
			-- THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,
	[AvgBPerRead] =
		CASE WHEN [num_of_reads] = 0
			THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END,
	[AvgBPerWrite] =
		CASE WHEN [num_of_writes] = 0
			THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END,
	-- [AvgBPerTransfer] =
		-- CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
			-- THEN 0 ELSE
				-- (([num_of_bytes_read] + [num_of_bytes_written]) /
				-- ([num_of_reads] + [num_of_writes])) END,
	[mf].[physical_name]
FROM [DiffLatencies] AS [vfs]
JOIN sys.master_files AS [mf]
	ON [vfs].[database_id] = [mf].[database_id]
	AND [vfs].[file_id] = [mf].[file_id]
-- ORDER BY [ReadLatency(ms)] DESC
ORDER BY [WriteLatency(ms)] DESC;
GO

-- Cleanup
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats1')
    DROP TABLE [##SQLskillsStats1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats2')
    DROP TABLE [##SQLskillsStats2];
GO