Just over a year ago I blogged about the enhancements that were made to the sqlserver.page_split Event in SQL Server 2012 to make it easier to identify what the splitting object was and the type of split that was being performed.  Sadly what I discovered writing that post was that even with the extra information about the split type, the event didn’t give you enough information to really focus on the problematic splits that lead to fragmentation and page density issues in the database.  I didn’t do a whole lot with this again until recently when a question was posted by Ami Levin (Blog | Twitter) on the MVP email list that commented that the page_split event was broken in SQL Server 2012 based on a presentation he’d seen by Guy Glantser (Blog | Twitter).

Let me start off by saying, the event isn’t broken, it tracks page splits, but it doesn’t differentiate between an end page split that occurs for an ever increasing index, versus a mid-page split for a random index that leads to fragmentation and page density issues in the database.  Both of these are technically splits inside the storage engine, even if we as DBA’s don’t really care about the end-page split for a increasing key value like an IDENTITY column in the database.  I had Ami pass my information along to the presenter and we traded a few emails on the subject of tracking splits with the specific focus on trying to pull out the mid-page, fragmenting splits.  While going through things for the third time, it dawned on me that this is incredibly simple, based one of the demo’s that was sent to me.  Just over a year ago, I also blogged about tracking transaction log activity in SQL Server 2012 using the sqlserver.transaction_log event, which can be used to track mid-page splits in a database.

Last year when I wrote about the sqlserver.transaction_log event, there were 10 columns output by the event in CTP1, but as of RC0, the events output has changed and only 9 columns are output by the event.

SELECT 
    oc.name, 
    oc.type_name, 
    oc.description
FROM sys.dm_xe_packages AS p
INNER JOIN sys.dm_xe_objects AS o
    ON p.guid = o.package_guid
INNER JOIN sys.dm_xe_object_columns AS oc
    ON oc.object_name = o.name
        AND oc.object_package_guid = o.package_guid
WHERE o.name = 'transaction_log'
  AND oc.column_type = 'data';

image

For the purposes of identifying the mid-page splits, we want to look at the operation column that is output by the event, which contains the specific operation being logged.  In the case of a mid-page split occurring, the operation will be a LOP_DELETE_SPLIT, which marks the delete of rows from a page as a result of the split.  To build our event session, we are going to need the map_key for the LOP_DELETE_SPLIT log_op map.  This can be obtained from the sys.dm_xe_map_values DMV:

SELECT *
FROM sys.dm_xe_map_values
WHERE name = 'log_op'
  AND map_value = 'LOP_DELETE_SPLIT';

With the map_key value, we have a couple of ways to collect the information with our targets.  We could collect everything into an event_file, but that doesn’t really make sense for this event.  Instead the best target for this type of information is the histogram target which will bucket our results based on how we configure the target and tell us how frequently the event fires based on our bucketing criteria.  If we don’t know anything about the server in question, we can start off with a very general event session that has a predicate on the operation only, and then aggregate the information in the histogram target based on the database_id to find the databases that have the most mid-page splits occurring in them in the instance.

-- If the Event Session exists DROP it
IF EXISTS (SELECT 1 
            FROM sys.server_event_sessions 
            WHERE name = 'SQLskills_TrackPageSplits')
    DROP EVENT SESSION [SQLskills_TrackPageSplits] ON SERVER

-- Create the Event Session to track LOP_DELETE_SPLIT transaction_log operations in the server
CREATE EVENT SESSION [SQLskills_TrackPageSplits]
ON    SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT 
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0, -- Event Column
        source = 'database_id');
GO
        
-- Start the Event Session
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

This event session will allow you to track the worst splitting database on the server, and the event data can be parsed out of the histogram target.  To demonstrate this, we can create a database that has tables and indexes prone to mid-page splits and run a default workload to test the event session:

USE [master];
GO
-- Drop the PageSplits database if it exists
IF DB_ID('PageSplits') IS NOT NULL
BEGIN
    ALTER DATABASE PageSplits SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE PageSplits;
END
GO
-- Create the database
CREATE DATABASE PageSplits
GO
USE [PageSplits]
GO
-- Create a bad splitting clustered index table
CREATE TABLE BadSplitsPK
( ROWID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
  ColVal INT NOT NULL DEFAULT (RAND()*1000),
  ChangeDate DATETIME2 NOT NULL DEFAULT CURRENT_TIMESTAMP);
GO
--  This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_BadSplitsPK_ColVal ON BadSplitsPK (ColVal);
GO
--  This index should end-split based on the DEFAULT column value
CREATE INDEX IX_BadSplitsPK_ChangeDate ON BadSplitsPK (ChangeDate);
GO
-- Create a table with an increasing clustered index
CREATE TABLE EndSplitsPK
( ROWID INT IDENTITY NOT NULL PRIMARY KEY,
  ColVal INT NOT NULL DEFAULT (RAND()*1000),
  ChangeDate DATETIME2 NOT NULL DEFAULT DATEADD(mi, RAND()*-1000, CURRENT_TIMESTAMP));
GO
--  This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_EndSplitsPK_ChangeDate ON EndSplitsPK (ChangeDate);
GO
-- Insert the default values repeatedly into the tables
WHILE 1=1
BEGIN
    INSERT INTO dbo.BadSplitsPK DEFAULT VALUES;
    INSERT INTO dbo.EndSplitsPK DEFAULT VALUES;
    WAITFOR DELAY '00:00:00.005';
END
GO

If we startup this workload and allow it to run for a couple of minutes, we can then query the histogram target for our session to find the database that has the mid-page splits occurring.

-- Query the target data to identify the worst splitting database_id
SELECT 
    n.value('(value)[1]', 'bigint') AS database_id,
    DB_NAME(n.value('(value)[1]', 'bigint')) AS database_name,
    n.value('(@count)[1]', 'bigint') AS split_count
FROM
(SELECT CAST(target_data as XML) target_data
 FROM sys.dm_xe_sessions AS s 
 JOIN sys.dm_xe_session_targets t
     ON s.address = t.event_session_address
 WHERE s.name = 'SQLskills_TrackPageSplits'
  AND t.target_name = 'histogram' ) as tab
CROSS APPLY target_data.nodes('HistogramTarget/Slot') as q(n)

image

With the database_id of the worst splitting database, we can then change our event session configuration to only look at this database, and then change our histogram target configuration to bucket on the alloc_unit_id so that we can then track down the worst splitting indexes in the database experiencing the worst mid-page splits.

-- Drop the Event Session so we can recreate it 
-- to focus on the highest splitting database
DROP EVENT SESSION [SQLskills_TrackPageSplits] 
ON SERVER

-- Create the Event Session to track LOP_DELETE_SPLIT transaction_log operations in the server
CREATE EVENT SESSION [SQLskills_TrackPageSplits]
ON    SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT 
      AND database_id = 8 -- CHANGE THIS BASED ON TOP SPLITTING DATABASE!
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0, -- Event Column
        source = 'alloc_unit_id');
GO

-- Start the Event Session Again
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

With the new event session definition, we can now rerun our problematic workload for a 2 minute period and look at the worst splitting indexes based on the alloc_unit_id’s that are in the histogram target:

 

-- Query Target Data to get the top splitting objects in the database:
SELECT
    o.name AS table_name,
    i.name AS index_name,
    tab.split_count,
    i.fill_factor
FROM (    SELECT 
            n.value('(value)[1]', 'bigint') AS alloc_unit_id,
            n.value('(@count)[1]', 'bigint') AS split_count
        FROM
        (SELECT CAST(target_data as XML) target_data
         FROM sys.dm_xe_sessions AS s 
         JOIN sys.dm_xe_session_targets t
             ON s.address = t.event_session_address
         WHERE s.name = 'SQLskills_TrackPageSplits'
          AND t.target_name = 'histogram' ) as tab
        CROSS APPLY target_data.nodes('HistogramTarget/Slot') as q(n)
) AS tab
JOIN sys.allocation_units AS au
    ON tab.alloc_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id = p.partition_id
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
        AND p.index_id = i.index_id
JOIN sys.objects AS o
    ON p.object_id = o.object_id
WHERE o.is_ms_shipped = 0;

image

With this information we can now go back and change our FillFactor specifications and retest/monitor the impact to determine whether we’ve had the appropriate reduction in mid-page splits to accommodate the time between our index rebuild operations:

-- Change FillFactor based on split occurences
ALTER INDEX PK__BadSplit__97BD02EB726FCA55 ON BadSplitsPK REBUILD WITH (FILLFACTOR=70)
ALTER INDEX IX_BadSplitsPK_ColVal ON BadSplitsPK REBUILD WITH (FILLFACTOR=70)
ALTER INDEX IX_EndSplitsPK_ChangeDate ON EndSplitsPK REBUILD WITH (FILLFACTOR=80)
GO

-- Stop the Event Session to clear the target
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=STOP;
GO

-- Start the Event Session Again
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

With the reset performed we can again start up our workload generation and begin monitoring the effect of the FillFactor specifications on the indexes with our code.  After another 2 minute period, the following splits were noted.

image

With this information we can go back and again attempt to tune our FillFactor values for the worst splitting indexes and rinse/repeat until we determine the best FillFactor for each of the indexes to minimize splits.  This is an incredibly powerful tool for the DBA moving into SQL Server 2012, and will definitely change how we perform index fragmentation analysis and troubleshoot problems with excessive log generation in SQL Server 2012 onwards.

Cheers!

Friday of last week, Steve Jones (Blog|Twitter) asked a question on the Twitter #sqlhelp tag about SQL Server and Soft NUMA that prompted me to write the long waiting follow up to my first post Understanding Non-Uniform Memory Access/Architectures (NUMA). The question that Steve asked was:

@way0utwest: “Do you always have one soft NUMA node? Say a one CPU PC or SMP architecture? #sqlhelp”

The whole start of my real work looking into the different NUMA configurations was actually around how to configure SQL Server for Soft NUMA so that I would see the benefits that are mentioned in two of the topics in the Books Online. The Understanding Non-uniform Memory Access topic states (or at least at the time I wrote this it stated):

"The benefits of soft-NUMA include reducing I/O and lazy writer bottlenecks on computers with many CPUs and no hardware NUMA. There is a single I/O thread and a single lazy writer thread for each NUMA node. Depending on the usage of the database, these single threads may be a significant performance bottleneck. Configuring four soft-NUMA nodes provides four I/O threads and four lazy writer threads, which could increase performance."

I remembered reading something along these lines after recently looking at a number of client servers that had 24 cores and up to 128GB RAM installed in them but were not hardware NUMA configurations. Having the extra lazy writer threads sounded like a good thing so I went about trying to figure out how to setup Soft NUMA which lead me to the BOL Topic How to: Configure SQL Server to Use Soft-NUMA which provided the following scenario:

“Instance A, which experiences significant I/O, now has two I/O threads and two lazy writer threads, while instance B, which performs processor-intensive operations, has only one I/O thread and one lazy writer thread. "

Perfect this is exactly what I was thinking, so now off to do some testing on one of my VM’s, and this is where I hit my first bump. I needed to figure out the appropriate CPUMask values for a 24 core system to subdivide it into four Soft NUMA nodes of 6 cores each, but the Books Online, and the available blog posts for the topic don’t make that very easy to accomplish and they don’t explain the basis of the CPUMask very well either. I was able to figure out with some testing, that the CPUMask is just a bitmask of the CPUMask values of the CPU’s that contribute to Soft NUMA node. Clear as mud right? 

To make this a bit easier, I am going to paste some tables that I created in Excel below that demonstrate what I am talking about, but for a 8 core example and not the 24 core solution I was originally working on. The first thing you have to keep in mind is that CPU’s are numbered with a CPUID starting with a base of zero, and each CPU has its own mask value that is two to the power of the CPUID. For the example in the How to: Configure SQL Server to Use Soft-NUMA Books Online Topic, this works out logically to the table below:

CPUID Mask Soft Node CPU BitMask Node Registry CPUMask
0 1 1 3 0x03
1 2
2 4 2 12 0x0C
3 8
4 16 3 240 0xF0
5 32
6 64
7 128

The CPU Mask Sum column is the sum of the Masks for each of the CPU’s in the Soft Node and then the Node Registry CPUMask is the hexadecimal representation of the decimal value. This can be obtained using the DEC2HEX() function in Excel, or by placing the Windows Calculator into Programmer Mode from the File Menu.

image  image

So now that we have an easier to understand example covered, here is what the CPUMask for a 24 core server with four 6 core Soft NUMA Nodes would be:

CPUID Mask Soft Node CPU BitMask Node Registry CPUMask
0 1 1 63 0x3F
1 2
2 4
3 8
4 16
5 32
6 64 2 4032 0xFC0
7 128
8 256
9 512
10 1024
11 2048
12 4096 3 258048 0x3F000
13 8192
14 16384
15 32768
16 65536
17 131072
18 262144 4 16515072 0xFC0000
19 524288
20 1048576
21 2097152
22 4194304
23 8388608

To test this configuration out, I went to one of my SQL Server 2008R2 VM’s on my laptop and made use of the –Pn startup trace flag which tells SQL Server to create n schedulers during the SQLOS boot up even if the actual hardware does not exist for those schedulers.  This is an undocumented startup parameter and it should NEVER BE USED in a production server.  Just because the SQLOS creates additional schedulers, doesn’t mean that you physically have additional processors in the server, and if you want to see how bad this startup parameter can cause performance to drag, create 24 schedulers on your laptop and then run a couple of concurrent queries in SQL Server.  After adding ;-P24 to startup parameters and restarting my SQL instance I now have 24 schedulers in sys.dm_os_schedulers as shown below:

image

Now all I have to do is add the registry keys to create my Soft NUMA nodes. At this point I realized that I had really complicated figuring out the hexadecimal CPUMask values. The DWORD registry value accepts the Decimal or Hexadecimal value in regedit, so you can skip converting the CPU BitMasks to hexadecimal, if you ever actually configure Soft NUMA for one of your servers.

image

After setting all of the registry keys for my instance, and then restarting my instance I went back in to sys.dm_os_nodes to make sure that my NUMA configuration changes were correctly made.

image

Everything looks perfect, I now have four Soft NUMA nodes with 6 schedulers in each node, so lets go look at our new Lazy Writer Threads in sys.dm_exec_requests:

image

This is only a subset of the system sessions that actually exist on the instance but trust me when I tell you, all of the Lazy Writer threads are shown above; there is only one.  So what happened here? I have four Soft NUMA nodes, we confirmed that so why don’t I have any additional Lazy Writer threads?  I was certain I had done something wrong here, so I sent tweet to Amit Banerjee (Blog|Twitter) a good friend of mine from the MSDN Forums who happens to be on the Product Support Team in India. It turns out that the Books Online is wrong about the benefits of using Soft NUMA, and this was pointed out a few years ago by Bob Dorr, another escalation engineer in Product Support on his blog post How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer Workers and Memory Nodes. Two key points are made by Bob in this blog post; first that Lazy Writers are only assigned to Hard NUMA nodes, which map to SQLOS memory nodes, and second the IO Completion threads that are created per Soft NUMA node have nothing to do with Disk I/O, but are instead for connection requests and TDS traffic over the network.

Based on this, I filed a connect feedback for the incorrect entries in the Books Online for Soft NUMA that will eventually see this incorrect information corrected. At the point that I realized this, I had just over four hours of time dedicated to figuring out how to make something work that will never actually do what I expected based on the documentation, but I had a much better understanding of how NUMA actually worked inside of SQL Server, which is something that had up to this point been a big mystery to me.

This leaves open the question, when would you actually make use of Soft NUMA in SQL Server, if it doesn’t actually provide CPU to Memory masking like Hard NUMA does, or provide any benefits of having additional Lazy Writer Threads?  One use case would be to provide port affinity to drive specific connections to a specific node as discussed in the How It Works: SQL Server 2005 NUMA Basics blog post, also by Bob Dorr, but this only applies to serial execution plans and not to parallel execution plans as detailed by Bob in the NUMA Connection Affinity and Parallel Queries blog post. An example of where Soft NUMA usage is the ETL World Record set by SQL Server back in February 2008 which used Soft NUMA to balance the connection load as a part of achieving the world record of 1.18TB of flat file data loaded in 1794 seconds.

Categories:
Internals | NUMA

At SQL Connections, I presented a session titled “Learn SQL Server Internals with Extended Events” where I demonstrated a number ways to use Extended Events to learn about the internal workings of the database engine for SQL Server.  The morning of the session I was chatting with someone about a problem they had seen and the topic of proportional fill came up and how the database engine stripes data across multiple files in a user database.  From this discussion, I got the idea to play around with multiple database files and built a demo using Extended Events that showed how proportional fill worked inside of the database engine.  This wasn’t a planned demo for my presentation, and I had plenty of other demo’s that showed various SQL Server Internals, but it became a really good demo that I decided to throw into the mix, which put me way over budget for time.  I decided to leave it up the audience which demo they wanted to see for the last demo of the session, an originally planned one, or the one I wrote that morning for proportional fill; the majority wanted to see the one on proportional fill and it turned out to be the best demo of the entire session based on the crowd interest and feedback.  What was most interesting to me was the number of people attending the session that had never heard of the concept of proportional fill with SQL Server.  Proportional fill is the algorithm used by SQL Server to determine how much information is written to each of the files in a multi-file filegroup based on the proportion of free space within each file; which allows the files to become full at approximately the same time.  Proportional fill has nothing to do with the actual file sizes, it is strictly based on the free space within a file. 

To demonstrate proportional fill in SQL Server, we’ll take a look at how the page writes and I/O operations are distributed across varying configurations for a test database using the same test and event session for each configuration

Basic Example

To look at how proportional fill functions, we’ll start with a basic example, using a database with a separate filegroup for user objects that has been marked as the default filegroup for the database.  The UserObjects filegroup will have four data files, each 32MB in size.

-- Create a multi-file database with evenly sized files
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

With the database created, we’ll create a table to load data into for our tests based on the AdventureWorks SalesOrderHeader table which will be the source for the test data:

- Create a table to load data into for the tests
USE [MultipleDataFiles]
GO
IF OBJECT_ID('SalesOrderHeader') IS NOT NULL
    DROP TABLE [dbo].[SalesOrderHeader]
GO
SET NOCOUNT ON
GO
CREATE TABLE [dbo].[SalesOrderHeader](
    [SalesOrderID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [RevisionNumber] [tinyint] NOT NULL,
    [OrderDate] [datetime] NOT NULL,
    [DueDate] [datetime] NOT NULL,
    [ShipDate] [datetime] NULL,
    [Status] [tinyint] NOT NULL,
    [OnlineOrderFlag] [bit] NOT NULL,
    [SalesOrderNumber] [nvarchar](25) NOT NULL,
    [PurchaseOrderNumber] [nvarchar](25) NULL,
    [AccountNumber] [nvarchar](15) NULL,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [CurrencyRateID] [int] NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL,
    [Freight] [money] NOT NULL,
    [TotalDue] [money] NOT NULL,
    [Comment] [nvarchar](128) NULL,
    [rowguid] [uniqueidentifier] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) 
GO

Now we’ll create our event session using dynamic SQL to add the database_id for our test database to the predicate for each of the events, restricting them to firing only for our test database to minimize the need to filter through the event session data later on.  The event session is going to collect the sqlserver.checkpoint_begin, sqlserver.checkpoint_end. sqlserver.file_written, sqlserver.file_write_completed, sqlserver.physical_page_write, sqlos.async_io_requested, and sqlos.async_io_completed events.  The checkpoint events are included in the event session to show that writes don’t immediately begin to occur to the data files, but instead occur in response to the checkpoint operations in the database engine.

-- Create our Event Session dynamically
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='MultipleDataFiles')
    DROP EVENT SESSION [MultipleDataFiles] ON SERVER;
DECLARE @sqlcmd nvarchar(4000) = '
CREATE EVENT SESSION MultipleDataFiles
ON SERVER
ADD EVENT sqlserver.checkpoint_begin
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.checkpoint_end
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_written
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_write_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.physical_page_write
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_requested
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+'))--,
ADD TARGET package0.asynchronous_file_target(
     SET filename=''C:\SQLskills\MultipleDataFiles.xel'',
         metadatafile=''C:\SQLskills\MultipleDataFiles.xem'')
WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, TRACK_CAUSALITY = ON, MAX_DISPATCH_LATENCY=5SECONDS)'
EXEC (@sqlcmd)

With the event session created we can run our data load:

-- Start the Event Session
ALTER EVENT SESSION MultipleDataFiles
ON SERVER
STATE=START
GO
 
-- Load data into the test database
INSERT INTO dbo.SalesOrderHeader
SELECT RevisionNumber, 
    DATEADD(DD, 1126+number, OrderDate), 
    DATEADD(DD, 1126+number, DueDate), 
    DATEADD(DD, 1126+number, ShipDate), 
    soh.Status, OnlineOrderFlag, SalesOrderNumber, 
    PurchaseOrderNumber, AccountNumber, 
    CustomerID, SalesPersonID, TerritoryID, 
    BillToAddressID, ShipToAddressID, 
    ShipMethodID, CreditCardID, CreditCardApprovalCode, 
    CurrencyRateID, SubTotal, TaxAmt, Freight, 
    TotalDue, Comment, rowguid, 
    DATEADD(DD, 1126+number, ModifiedDate)
FROM AdventureWorks2008R2.Sales.SalesOrderHeader AS soh
CROSS JOIN master.dbo.spt_values AS sv
WHERE sv.type = N'P'
  AND sv.number > 0 AND sv.number < 6
GO 3
 
-- Flush all dirty pages to disk
CHECKPOINT
GO
 
-- Stop the Event Session
ALTER EVENT SESSION MultipleDataFiles
ON SERVER
STATE=STOP
GO

The above script will load roughly 92MB of data into the test table which makes the insert operation smaller than the size of all four data files.  Note that there is an explicit checkpoint in the test to force all dirty pages to be written to the appropriate data files on disk.  Without this checkpoint, the file writes may appear to be imbalanced incorrectly due to the timing of the last automatic checkpoint and the dirty pages in cache when it occurred.  By manually checkpointing the database before ending the event session we ensure we have the file writes captured accurately.  To view the information captured by the event session we will read the event data into a table and then shred the XML into another intermediate table to allow for further analysis of the detailed information if you so desire.  Finally we’ll aggregate the events and pivot the results based on the file_id to see how SQL Server wrote to the database files for the database. 

-- Drop Results tables if they exist
IF OBJECT_ID('MultipleDataFileResults') IS NOT NULL
    DROP TABLE MultipleDataFileResults 
GO
IF OBJECT_ID('MultipleDataFileResultsParsed') IS NOT NULL
    DROP TABLE MultipleDataFileResultsParsed 
GO
 
-- Create results table to load data from XE files
CREATE TABLE MultipleDataFileResults
(RowID int identity primary key, event_data XML)
GO
 
-- Load the event data from the file target
INSERT INTO MultipleDataFileResults(event_data)
SELECT
    CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\SQLskills\MultipleDataFiles*.xel', 
                                     'C:\SQLskills\MultipleDataFiles*.xem', 
                                     null, null)
GO
 
-- Parse the event data
SELECT 
    RowID,
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
    COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
             event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
    event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(4000)') AS [mode],
    event_data.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(4000)') AS [file_handle],
    event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint') AS [offset],
    event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') AS [page_id],
    event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
    event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int') AS [file_group_id],
    event_data.value('(event/data[@name="wait_type"]/text)[1]', 'nvarchar(100)') AS [wait_type],
    event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS [duration],
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS [sql_text],
    event_data.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
    event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint') AS [reads],
    event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint') AS [writes],
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
INTO MultipleDataFileResultsParsed
FROM MultipleDataFileResults
ORDER BY Rowid
 
-- Aggregate the results by the event name and file_id
SELECT file_id, [file_write_completed],[file_written], [physical_page_write]
FROM
(    SELECT event_name, file_id, COUNT(*) AS occurences
    FROM MultipleDataFileResultsParsed
    WHERE event_name IN ('file_write_completed', 'file_written', 'physical_page_write')
    GROUP BY event_name, file_id
) AS tab
PIVOT
(    MAX(occurences) 
    FOR event_name IN ([file_write_completed],[file_written], [physical_page_write])) AS pvt

The output of our pivot operation shows that the four data files in the UserObjects filegroup are written to relatively evenly, which should be expected based on the amount of free space being equal across the four data files.  One item that should become incredibly apparent is the importance of the transaction log which has twenty-three times the file writes occurring to it than the nearest data file. 

image

Different file size but same free space

As previously stated in the intro to this blog post, proportional fill is based on the amount of free space in each file in relation to the other files and not the actual file sizes themselves.  To demonstrate this, we’ll create a single file database with our table and then load approximately 31MB of data into the table.  Then we’ll increase the size of the first file to 63MB and add three additional files that are 32MB each to the database.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the test database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a single-file database 
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO
 
-- Create a table to load data into for the tests
USE [MultipleDataFiles]
GO
CREATE TABLE [dbo].[SalesOrderHeader](
    [SalesOrderID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [RevisionNumber] [tinyint] NOT NULL,
    [OrderDate] [datetime] NOT NULL,
    [DueDate] [datetime] NOT NULL,
    [ShipDate] [datetime] NULL,
    [Status] [tinyint] NOT NULL,
    [OnlineOrderFlag] [bit] NOT NULL,
    [SalesOrderNumber] [nvarchar](25) NOT NULL,
    [PurchaseOrderNumber] [nvarchar](25) NULL,
    [AccountNumber] [nvarchar](15) NULL,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [CurrencyRateID] [int] NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL,
    [Freight] [money] NOT NULL,
    [TotalDue] [money] NOT NULL,
    [Comment] [nvarchar](128) NULL,
    [rowguid] [uniqueidentifier] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) 
GO
 
-- Load ~31MB of data into the test database
INSERT INTO dbo.SalesOrderHeader
SELECT RevisionNumber, 
    DATEADD(DD, 1126+number, OrderDate), 
    DATEADD(DD, 1126+number, DueDate), 
    DATEADD(DD, 1126+number, ShipDate), 
    soh.Status, OnlineOrderFlag, SalesOrderNumber, 
    PurchaseOrderNumber, AccountNumber, 
    CustomerID, SalesPersonID, TerritoryID, 
    BillToAddressID, ShipToAddressID, 
    ShipMethodID, CreditCardID, CreditCardApprovalCode, 
    CurrencyRateID, SubTotal, TaxAmt, Freight, 
    TotalDue, Comment, rowguid, 
    DATEADD(DD, 1126+number, ModifiedDate)
FROM AdventureWorks2008R2.Sales.SalesOrderHeader AS soh
CROSS JOIN master.dbo.spt_values AS sv
WHERE sv.type = N'P'
  AND sv.number > 0 AND sv.number < 6
 
-- Grow the first data file to 63MB leaving 32MB free space
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILE (    NAME = N'MultipleDataFiles_UserObjects1', 
                SIZE = 64512KB )
GO
 
-- Add second file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects2', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO
 
-- Add third file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects3', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO
 
-- Add fourth file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects4', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO

With new database setup, the exact same test from the first demo can be run to view how proportional fill functions with one data file larger than the others, but with the same free space.

image

The impact of different free space amounts

Since proportional fill is free space based, lets look at the impact that having different free space in one file has to the writes that occur.  To setup this test, the database will be created with one file sized at 64MB and remaining files sized at 32MB.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a multi-file database with one file larger than the others
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 65536KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

The exact same test can be rerun and when the event data is parsed, the 64MB file will show roughly twice as many write operations and pages as the 32MB files, right in proportion to its free space.

  image

The impact of autogrowth

One question I had after beginning to look at this was, what impact autogrowth would have on data files that were evenly sized, and using the same sizes for autogrowth? We could easily test this by upping the number of executions for our insert operation from the original test to force the database to grow, but I was really interested in the impact over repeated growth operations, and being impatient I wasn’t really willing to wait for large insert operations to complete.  I instead went back a recreated the database using 8MB data files set to grow by 8MB.  Then I changed the GO 3 batch terminator for the INSERT in the test to a GO 5 to retest the impact of autogrowth.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a multi-file database with one file larger than the others
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

The outcome of the autogrowth tests at five iterations for the INSERT batch seemed odd to me because one of the files was lagging the other three significantly.

image

I wasn’t satisfied with the initial results of a couple of iterations with five batch executions for the INSERT so I decided to validate the results at various scales including ten, twenty and fifty iterations of the INSERT operation.

Using GO 10

image

Using GO 20

image

Using GO 50

image

At the end of each of these series of tests, the files were always within a single autogrowth size of each other, so it is obvious that proportional fill is keeping things relatively equal throughout the tests, but there is the potential for hot spotting of a single data file when auto grow occurs, at least until the other data files grow as well.  In this test the auto grow numbers were kept small, primarily due to storage limitations on the SSD in my laptop, and in the configuration of the VM I was working on, but I am definitely going to make it a point to test this again at a later date using larger autogrowth numbers to see what the impact is longer term and whether the hot spots caused by autogrowth can impact performance significantly?  I have always espoused manual file size management, even in large environments so that certain factors like a filegroup with multiple files can be addressed at the same time.

So there you have it, evidence of how proportional fill functions inside of SQL Server.  Hope you found it interesting.

See you on the playground!

T-SQL Tuesday

This post is part of the monthly community event called T-SQL Tuesday started by Adam Machanic (blog|twitter) and hosted by someone else each month. This month the host is Sankar Reddy (blog|twitter) and the topic is Misconceptions in SQL Server. You can follow posts for this theme on Twitter by looking at #TSQL2sDay hashtag.

It is fairly common knowledge these days that proper Transaction Log management includes properly sizing the log file so that the number of VLF’s is balanced with their size for optimum performance.  If this is a foreign concept or news to you, I’d recommend that you jump over to SQLSkills and read Kimberly Tripp’s blog posts on the subject.

Transaction Log VLFs - too many or too few?

Back in July, Crys Manson (Blog|Twitter) asked about the importance of VLF count on the TempDB Log, and a conversation ensued about what the impact of tempdb recreation would be on the VLF count for the tempdb log file.

@crysmanson Does # of VLFs matter as much with tempdb log? Recommend reading? If you grow out tempdb (for VLF) everything is lost on restart. #sqlhelp

@SQLSarg

@crysmanson if you just let tempdb autogrow out then it is reset to the configured size at startup #sqlhelp

@SQLSoldier

@SQLSarg So the question is will it create new VLF's? #sqlhelp

Not knowing the answer, I did some quick testing to see, and found that when the instance restarts, the number of VLF's inside the tempdb log file is reset based on the size of the file, following the information contained in Kimberly's blog post.  If the initial size of the tempdb log file is 1GB or larger, it is evenly divided into 16 VLF’s regardless of how large the file and resulting VLF’s might be.  To validate this, I wrote a Powershell script to test various sizes for the tempdb log, following the prescribed best practices for VLF sizing from Kimberly’s post both before and after restart.

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO")  | Out-Null

function ResetTest ([int]$size)
{
    $log.Refresh();

    if ($log.Size -gt $size)
    {
        $log.set_Size($size);
        $log.Alter();
        Restart-Service -Force -Name "MSSQLSERVER" -WarningAction SilentlyContinue |Out-Null
    }
}

function RunTest ([string]$testnumber, [int]$setsize, [int]$growsize)
{
    $log.Refresh();
    $size = $log.Size;
    while ($size -lt $setsize)
    {
        $log.set_Size(($size + $growsize));
        $log.Alter();
        $log.Refresh();
        $size = $log.Size;
    }
    
    $vlfbefore = $tempdb.ExecuteWithResults("DBCC LOGINFO").Tables[0].Rows.Count
    
    Restart-Service -Force -Name "MSSQLSERVER" -WarningAction SilentlyContinue | Out-Null
    
    Start-Sleep -Seconds 60 #Wait for Instance Recovery to Complete 
    
    $tempdb.Refresh();

    $vlfafter = $tempdb.ExecuteWithResults("DBCC LOGINFO").Tables[0].Rows.Count
        
    $results = New-Object PSObject -Property @{            
        TestNumber                = $testnumber                 
        LogFileSize               = $log.Size              
        VLFBeforeRestart         = $vlfbefore
        VLFAfterRestart          = $vlfafter}    

    $results | select TestNumber, LogFileSize, VLFBeforeRestart, VLFAfterRestart
}
        
cls

$smosvr = New-Object ('Microsoft.SqlServer.Management.Smo.Server') "(local)"
$tempdb = $smosvr.Databases['tempdb']
$log = $tempdb.LogFiles.Item(0);

ResetTest 8387584
RunTest "16GB Log" 16775168 8387584
ResetTest 8387584
RunTest "24GB Log" 25162752 8387584
ResetTest 8387584
RunTest "32GB Log" 33550336 8387584

Regardless of how you manually grew the log file to set appropriate VLF sizes, at server restart the log is always reset to 16 VLF’s of equal size, which is not necessarily ideal according to additional discussion that occurred on Twitter.  However, there is not much that you can do about this, aside from setting the log file to 8GB for its initial size and then setting a reasonable size for Autogrowth so that it can grow if necessary, and still minimize the VLF count which once again may not be ideal for all environments.

Categories:
Internals | tempdb

Theme design by Nukeation based on Jelle Druyts