Certificate Signing Stored Procedures in Multiple Databases

Yesterday I received an email asking a question related to my article Using a Certificate Signed Stored Procedure to Execute sp_send_dbmail on SQL Server Central, and it intrigued me enough that I had to put a couple of minutes into figuring out exactly what problem was being encountered.  In the article, I show how to create a stored procedure in a user database that can be used to call sp_send_dbmail without the calling user being a member of the DatabaseMailUserRole role in msdb, and there are a number of uses for this implementation beyond the example provided in the article.  I use this same method to allow Service Broker activation procedures to send me emails through Database Mail all the time, and there are numerous other cases where you might need to create a stored procedure in one database that executes code in a separate database to which the calling user should not have access.

I don’t know the exact use case behind the question, but the gist of it was that a stored procedure in one database needed to be called from multiple databases, and the intent was to use the same certificate to sign stored procedures in multiple databases so they could call this centralized stored procedure without giving the users access to the central database.  Is that confusing to you?  It confused me initially when I read the email, and it took a second pass for me to actually understand the problem being encountered and then build out a repro of the issue to begin looking at what the problem might be.  To keep things as simple and non-confusing as possible, let’s first create two databases and two stored procedures, and build out the basic framework from the article.

-- Create the calling database
CREATE DATABASE a;
GO
-- Create the target database
CREATE DATABASE b;
GO
-- Switch to the target database
USE b
GO
-- Create a table and insert 10 rows into the table
CREATE TABLE SomeTable (RowID int identity primary key)
GO
INSERT INTO SomeTable DEFAULT VALUES
GO 10
-- Create the target stored procedure that selects from the table
-- in the target database.
CREATE PROCEDURE Select_SomeTable
AS
SELECT * FROM SomeTable
GO
-- Switch to the calling database
USE a
GO
-- Create a stored procedure that calls the stored procedure
-- in the target database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create a certificate to sign the calling stored procedures with
CREATE CERTIFICATE [SigningCertificate]
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd'
WITH SUBJECT = 'Certificate for signing a Stored Procedure';
GO
-- Back up the certificate so it can be created in the master database
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Add Certificate to Master Database
USE [master]
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Create a login from the certificate
CREATE LOGIN [SigningLogin]
FROM CERTIFICATE [SigningCertificate];
GO
-- The login must have AUTHENTICATE SERVER permission
-- per http://msdn.microsoft.com/en-us/library/ms190785.aspx
GRANT AUTHENTICATE SERVER TO [SigningLogin]
GO
-- Create a user in database b for the Login
USE b
GO
CREATE USER [SigningLogin] FROM LOGIN [SigningLogin]
GO
-- Grant EXECUTE on the target stored procedure to the 
-- certificate based login
GRANT EXECUTE ON [dbo].[Select_SomeTable] TO [SigningLogin]
GO
-- Switch to the calling database
USE a
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create a test login to test that the certificate signed procedure
-- can properly execute the target procedure without the login having
-- access to the target database or target procedure directly
USE master;
GO
CREATE LOGIN [testuser] WITH PASSWORD=N't3stp@$$', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO
-- Create the test user for the test login in the calling database and
-- grant it EXECUTE privileges on the calling stored procedure
USE [a]
GO
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE a.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

In the above code, database “a” has a stored procedure that calls a stored procedure in database “b”, and the “testuser” login only has access to database “a” and EXECUTE permission on the stored procedure in database “a”.  The certificate from database “a” is backed up and created in the “master” database so that a certificate-mapped login can be created from it, and that login is granted EXECUTE on the stored procedure in database “b”.  The “testuser” login has no access to database “b” or any of the objects inside of it, but because the stored procedure in database “a” is signed by the certificate, and the login created in master from that certificate can execute the stored procedure in database “b”, the “testuser” login can execute the stored procedure in database “a” and get the information from database “b” without having explicit access to database “b” or the stored procedure it contains.
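
If you want to verify that the pieces are wired together correctly, a couple of quick catalog checks (my addition here, not part of the original article) confirm that the procedure in database “a” carries the signature and that the certificate-mapped user holds EXECUTE in database “b”:

-- Confirm the calling procedure carries the certificate's signature
USE a
GO
SELECT OBJECT_NAME(cp.major_id) AS signed_object, c.name AS certificate_name
FROM sys.crypt_properties AS cp
JOIN sys.certificates AS c ON cp.thumbprint = c.thumbprint
WHERE cp.class_desc = 'OBJECT_OR_COLUMN';
GO
-- Confirm the certificate-mapped user holds EXECUTE on the target procedure
USE b
GO
SELECT dp.name AS grantee, perm.permission_name, OBJECT_NAME(perm.major_id) AS object_name
FROM sys.database_permissions AS perm
JOIN sys.database_principals AS dp ON perm.grantee_principal_id = dp.principal_id
WHERE dp.name = 'SigningLogin';
GO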

At this point everything is essentially in line with the information contained in the SQL Server Central article I wrote, and it works exactly as shown in the article, even though it is all being applied to user databases and objects.  Now let’s try to extend this functionality to a third database as a caller of the stored procedure in database “b” by creating a new database, creating the same stored procedure that exists in database “a”, creating the certificate in the new database from the previous backup taken in database “a”, and signing the stored procedure with that certificate.

-- Create a third database to create another calling database
-- and stored procedure to the target database and stored procedure
CREATE DATABASE c;
GO
-- Create the calling stored procedure in the third database
USE c
GO
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create the certificate in the third database from the 
-- previous certificate backup to allow signing of the procedure
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO

This all works great, right up to the point that we try to sign the stored procedure using the certificate we created from our backup from database “a”.  When we try to sign the stored procedure in database “c” we get the following error back from SQL Server:

Msg 15556, Level 16, State 1, Line 2

Cannot decrypt or encrypt using the specified certificate, either because it has no private key or because the password provided for the private key is incorrect.

So what exactly happened here?  The process worked the first time, but now we can’t duplicate it for additional databases that need to call this stored procedure.  The problem is that when we backed up the certificate from database “a”, we only backed up the certificate itself; we didn’t back up the certificate’s private key to a separate file, so we can’t use this same certificate in additional databases to sign stored procedures unless we go back and back up the private key as well.
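
You can actually see this state in the catalog.  The following check is my addition to the repro, but the pvt_key_encryption_type_desc column in sys.certificates makes it obvious that the copy of the certificate in database “c” has no private key while the original in database “a” does:

-- NO_PRIVATE_KEY for the copy in database "c" explains the signing error
USE c
GO
SELECT DB_NAME() AS database_name, name, pvt_key_encryption_type_desc
FROM sys.certificates
WHERE name = 'SigningCertificate';
GO
USE a
GO
SELECT DB_NAME() AS database_name, name, pvt_key_encryption_type_desc
FROM sys.certificates
WHERE name = 'SigningCertificate';
GO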

To fix the problem, we first have to drop the stored procedure in database “c”, and then drop the certificate, which has no private key, from that database as well.  Then we need to take a new backup of the certificate from database “a”, specifying the WITH PRIVATE KEY clause to back up the private key to a separate file, so that we can recreate the certificate in database “c” with the correct private key to sign the stored procedure.  Once we have done this, we can recreate our stored procedure and sign it with the certificate, create the database user for the “testuser” login, grant that user the EXECUTE permission on the stored procedure in database “c”, and test the configuration by executing the stored procedure in database “c” to ensure that we get the results from database “b”.

-- Fix the problem
USE c
GO
-- Drop the calling procedure with the invalid signature
DROP PROCEDURE ExecuteDatabaseB_Select_SomeTable
GO
-- Drop the certificate without the private key
DROP CERTIFICATE [SigningCertificate]
GO
-- Backup the certificate with its private key so it can be used in
-- additional databases to sign stored procedures calling the target
-- stored procedure.
USE a
GO
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Create the certificate in our new database with the private key
-- from the new certificate backup with the private key.
USE c
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Recreate the calling stored procedure in the third database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create the database user for the testuser login and grant it
-- the EXECUTE privilege on the calling stored procedure
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE c.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

This all works as expected, and we’ve resolved our problem.  This was a really interesting problem because I had never anticipated that someone would use the same certificate in multiple databases to sign multiple stored procedures that perform cross-database operations without granting the calling users access to the target database.  This repro shows that it is technically possible, as long as you back up the private key for the certificate and create the certificate, along with its private key, in each additional database.
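
If you built this repro on a test instance, the following cleanup is a sketch of what I would run to remove everything created above (it assumes nothing else depends on these objects; the certificate backup files in C:\SQLskills can simply be deleted from disk):

-- Clean up the repro databases, logins, and certificate
USE master
GO
DROP DATABASE a;
DROP DATABASE b;
DROP DATABASE c;
GO
DROP LOGIN [testuser];
DROP LOGIN [SigningLogin];
GO
DROP CERTIFICATE [SigningCertificate];
GO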

See you on the playground!

Missing The Details Can Cause Such A Mess

This blog post is brought to you compliments of tonight’s dinner.  My entire life I have been eating Greek food, mainly because I come from a Greek background on my father’s side, but also because it’s really good food.  I can’t tell you how many times in my life I have seen someone in my family, usually my mom or my Aunt Anna, prepare and cook spanakopita.  In the past year I’ve helped my mom make it twice in attempts to learn everything she does, but tonight was the first time I actually attempted to prepare the dish from scratch myself.  If you have never had spanakopita, it is basically a spinach pie made using phyllo (filo) dough, a paper-thin pastry dough that can be incredibly difficult to work with if you don’t know what you are doing. 

Today I decided to take my first shot at making spanakopita, and while, taste-wise, it wasn’t the best I’d ever had, it was far from being the worst.  I’d almost call the attempt a success, except for the important detail I learned as soon as I pulled the pan out of the oven and tried to cut into it.  If you recall, the dish is made with a paper-thin dough for the crust, and my entire life I can remember my mom cutting the portions in the pan before baking the dish, but only tonight did I learn that this is an important step.  If you don’t cut the portions before you bake the dish, you won’t get a clean cut through the upper portion of the crust, because it becomes crispy and the paper-thin layers shatter when you attempt to slice through them.  It doesn’t ruin the dish taste-wise, but presentation is out the window, as shown in the picture below of tonight’s leftovers.

[Photo: the leftover spanakopita, with the top layers of the crust shattered]

You’ve probably noticed that this is not a technical post at all; or is it?  As a trainer, mentor, and student, this is a really good example of how the simplest detail can have a significant impact on the final outcome.  The same task may get accomplished, but it may not have the same fit and finish that it would have had the detail been emphasized.  In the Army, we have a training doctrine that provides a POI (Program Of Instruction) that is incredibly detailed.  As a TAITC (Total Army Instructor Training Course) certified instructor, I can tell when an instructor at a course I am attending is following the POI for the course verbatim, not because it sounds robotic and incredibly rehearsed, but because there are NOTES sections in the training that hit very important details that people often miss if they aren’t following the POI.  Keep in mind that these are the training packages used Army-wide for the specific phase and level of training being conducted; they have been reviewed numerous times at multiple levels inside the Army, and they are under constant revision based on experiences from the instructors in the field. 

Consider all the things that you do day to day without really considering the impact of the specific steps that you are taking.  Do you think that you can teach someone how to do those tasks and ensure that you don’t miss some simple step that is important to the final outcome?  My favorite example of this is unfortunately another military example, but it occurred when I first reported to my current unit as a Drill Sergeant Candidate and learned about a horrific little black book known as the Drill Sergeant Module Book.  This book is a verbatim reference for how each of the Drill and Ceremony Modules is supposed to be taught to a new recruit, to ensure that they get the exact same detail of instruction as any other soldier entering the Army.  The first module, the Position of Attention, made me laugh initially; I mean, how difficult can it really be to teach someone to stand straight, not move, face forward, and not talk while standing in a formation?  Then I tried it and failed miserably, and tried again three more times, each time missing other details that are important to the final position.  The module for the position covers every detail head to toe:

7. On the command FALL IN or on the command of execution ATTENTION of Demonstrator, ATTENTION.

8. Bring the heels together sharply on line, with the toes pointing out equally, forming an angle of 45-degrees. Rest the weight of the body evenly on the heels and balls of both feet. Keep the legs straight without locking the knees. Hold the body erect with the hips level, chest lifted and arched, and the shoulders square.

9. Keep the head erect and face straight to the front with the chin drawn in so that the alignment of the head and neck is vertical.

10. Let the arms hang straight without stiffness. Curl the fingers so that the tips of the thumbs are alongside and touching the first joint of the forefingers. Keep the thumbs straight along the seams of the trouser leg with the first joint of the fingers touching the trousers.

11. Remain silent and do not move unless otherwise directed. RELAX.

If you know the module segment that is listed above, you will never miss a single detail required to teach the position of attention to someone.  I had to use the first sentence from the Introduction of the module to actually find the above link to the module book; the memorization of modules is no longer a requirement in the current Drill Sergeant School POI.

I’ve tried, and failed, a number of times when working on training materials, presentations, or just generally trying to mentor someone, to make the instruction follow a similar pattern so that I always hit the detail points.  However, unless you are willing to constrain yourself to sounding like a robot, or like you memorized your presentation entirely, you aren’t going to fit this type of pattern.  Today, Army Drill Sergeants don’t generally memorize the modules for Drill and Ceremony instruction.  In fact, it has never been a requirement that a Drill Sergeant memorize the entire module verbatim; you just have to hit the key points and details inside the content, and you always have a demonstrator working with you, so if you miss something and they know it, they can exaggerate an incorrect motion that makes the position look wrong so that you see the mistake. 

In technology, we have demonstrations as a part of most of our presentations, and I try to make use of notes in the demo code that I write to provide details that will trigger coverage of a specific topic if I’ve failed to cover it already.  I’ve seen some of my mom’s recipe cards from when I was a kid, and there are notes on those as well that provide a tip or trick that worked best for the specific recipe.  The problem is that, in our day-to-day work, it is really easy to miss making these little notes known to others because they have become so ingrained in our way of doing things. 

This post went from dinner to the Army and finally back into technology.  I hope the non-technical nature of the content still makes you think about how you perform some of your technical work as well.

Looking at multiple data files and proportional fill with Extended Events

At SQL Connections, I presented a session titled “Learn SQL Server Internals with Extended Events” where I demonstrated a number of ways to use Extended Events to learn about the internal workings of the database engine.  The morning of the session I was chatting with someone about a problem they had seen, the topic of proportional fill came up, and we discussed how the database engine stripes data across multiple files in a user database.  From this discussion, I got the idea to play around with multiple database files and built a demo using Extended Events that showed how proportional fill works inside the database engine.  This wasn’t a planned demo for my presentation, and I had plenty of other demos that showed various SQL Server internals, but it became a really good demo that I decided to throw into the mix, which put me way over budget for time.  I decided to leave it up to the audience which demo they wanted to see last: an originally planned one, or the one I wrote that morning on proportional fill.  The majority wanted to see the one on proportional fill, and it turned out to be the best demo of the entire session based on the crowd interest and feedback.

What was most interesting to me was the number of people attending the session who had never heard of the concept of proportional fill in SQL Server.  Proportional fill is the algorithm SQL Server uses to determine how much data is written to each of the files in a multi-file filegroup, based on the proportion of free space within each file, so that the files become full at approximately the same time.  Proportional fill has nothing to do with the actual file sizes; it is based strictly on the free space within each file. 
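
As a rough illustration of the math, each file’s share of new allocations is simply its free space divided by the total free space in the filegroup.  The numbers below are hypothetical and not taken from the demos that follow; they just show the proportion calculation:

-- Hypothetical free space figures for four files in one filegroup
SELECT file_name, free_mb,
    CAST(100.0 * free_mb / SUM(free_mb) OVER() AS decimal(5,2)) AS pct_of_new_allocations
FROM (VALUES ('File1', 32), ('File2', 32), ('File3', 16), ('File4', 48)) AS f(file_name, free_mb);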

To demonstrate proportional fill in SQL Server, we’ll take a look at how page writes and I/O operations are distributed across varying configurations of a test database, using the same test and event session for each configuration.

Basic Example

To look at how proportional fill functions, we’ll start with a basic example, using a database with a separate filegroup for user objects that has been marked as the default filegroup for the database.  The UserObjects filegroup will have four data files, each 32MB in size.

-- Create a multi-file database with evenly sized files
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

With the database created, we’ll create a table to load data into for our tests based on the AdventureWorks SalesOrderHeader table which will be the source for the test data:

-- Create a table to load data into for the tests
USE [MultipleDataFiles]
GO
IF OBJECT_ID('SalesOrderHeader') IS NOT NULL
    DROP TABLE [dbo].[SalesOrderHeader]
GO
SET NOCOUNT ON
GO
CREATE TABLE [dbo].[SalesOrderHeader](
    [SalesOrderID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [RevisionNumber] [tinyint] NOT NULL,
    [OrderDate] [datetime] NOT NULL,
    [DueDate] [datetime] NOT NULL,
    [ShipDate] [datetime] NULL,
    [Status] [tinyint] NOT NULL,
    [OnlineOrderFlag] [bit] NOT NULL,
    [SalesOrderNumber] [nvarchar](25) NOT NULL,
    [PurchaseOrderNumber] [nvarchar](25) NULL,
    [AccountNumber] [nvarchar](15) NULL,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [CurrencyRateID] [int] NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL,
    [Freight] [money] NOT NULL,
    [TotalDue] [money] NOT NULL,
    [Comment] [nvarchar](128) NULL,
    [rowguid] [uniqueidentifier] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) 
GO

Now we’ll create our event session using dynamic SQL to add the database_id for our test database to the predicate for each of the events, restricting them to firing only for our test database and minimizing the need to filter through the event session data later on.  The event session collects the sqlserver.checkpoint_begin, sqlserver.checkpoint_end, sqlserver.file_written, sqlserver.file_write_completed, sqlserver.physical_page_write, sqlos.async_io_requested, and sqlos.async_io_completed events.  The checkpoint events are included in the event session to show that writes don’t immediately begin to occur to the data files, but instead occur in response to checkpoint operations in the database engine.

-- Create our Event Session dynamically
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='MultipleDataFiles')
    DROP EVENT SESSION [MultipleDataFiles] ON SERVER;
DECLARE @sqlcmd nvarchar(4000) = '
CREATE EVENT SESSION MultipleDataFiles
ON SERVER
ADD EVENT sqlserver.checkpoint_begin
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.checkpoint_end
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_written
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_write_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.physical_page_write
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_requested
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+'))
ADD TARGET package0.asynchronous_file_target(
     SET filename=''C:\SQLskills\MultipleDataFiles.xel'',
         metadatafile=''C:\SQLskills\MultipleDataFiles.xem'')
WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, TRACK_CAUSALITY = ON, MAX_DISPATCH_LATENCY=5SECONDS)'
EXEC (@sqlcmd)
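
If you want to sanity-check what the dynamic SQL actually created, the event session definition is visible in the server catalog (this verification step is my addition, not part of the original demo):

-- Verify the event session and its events were created with the expected predicates
SELECT s.name AS session_name, e.package, e.name AS event_name, e.predicate
FROM sys.server_event_sessions AS s
JOIN sys.server_event_session_events AS e
    ON s.event_session_id = e.event_session_id
WHERE s.name = 'MultipleDataFiles';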

With the event session created we can run our data load:

-- Start the Event Session
ALTER EVENT SESSION MultipleDataFiles
ON SERVER
STATE=START
GO
 
-- Load data into the test database
INSERT INTO dbo.SalesOrderHeader
SELECT RevisionNumber, 
    DATEADD(DD, 1126+number, OrderDate), 
    DATEADD(DD, 1126+number, DueDate), 
    DATEADD(DD, 1126+number, ShipDate), 
    soh.Status, OnlineOrderFlag, SalesOrderNumber, 
    PurchaseOrderNumber, AccountNumber, 
    CustomerID, SalesPersonID, TerritoryID, 
    BillToAddressID, ShipToAddressID, 
    ShipMethodID, CreditCardID, CreditCardApprovalCode, 
    CurrencyRateID, SubTotal, TaxAmt, Freight, 
    TotalDue, Comment, rowguid, 
    DATEADD(DD, 1126+number, ModifiedDate)
FROM AdventureWorks2008R2.Sales.SalesOrderHeader AS soh
CROSS JOIN master.dbo.spt_values AS sv
WHERE sv.type = N'P'
  AND sv.number > 0 AND sv.number < 6
GO 3
 
-- Flush all dirty pages to disk
CHECKPOINT
GO
 
-- Stop the Event Session
ALTER EVENT SESSION MultipleDataFiles
ON SERVER
STATE=STOP
GO

The above script loads roughly 92MB of data into the test table, which makes the insert operation smaller than the combined size of the four data files.  Note that there is an explicit checkpoint in the test to force all dirty pages to be written to the appropriate data files on disk.  Without this checkpoint, the file writes may incorrectly appear imbalanced due to the timing of the last automatic checkpoint and the dirty pages in cache when it occurred.  By manually checkpointing the database before ending the event session, we ensure the file writes are captured accurately.  To view the information captured by the event session, we will read the event data into a table and then shred the XML into another intermediate table to allow for further analysis of the detailed information if you so desire.  Finally, we’ll aggregate the events and pivot the results by file_id to see how SQL Server wrote to the files in the database. 

-- Drop Results tables if they exist
IF OBJECT_ID('MultipleDataFileResults') IS NOT NULL
    DROP TABLE MultipleDataFileResults 
GO
IF OBJECT_ID('MultipleDataFileResultsParsed') IS NOT NULL
    DROP TABLE MultipleDataFileResultsParsed 
GO
 
-- Create results table to load data from XE files
CREATE TABLE MultipleDataFileResults
(RowID int identity primary key, event_data XML)
GO
 
-- Load the event data from the file target
INSERT INTO MultipleDataFileResults(event_data)
SELECT
    CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\SQLskills\MultipleDataFiles*.xel', 
                                     'C:\SQLskills\MultipleDataFiles*.xem', 
                                     null, null)
GO
 
-- Parse the event data
SELECT 
    RowID,
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
    COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
             event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
    event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(4000)') AS [mode],
    event_data.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(4000)') AS [file_handle],
    event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint') AS [offset],
    event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') AS [page_id],
    event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
    event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int') AS [file_group_id],
    event_data.value('(event/data[@name="wait_type"]/text)[1]', 'nvarchar(100)') AS [wait_type],
    event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS [duration],
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS [sql_text],
    event_data.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
    event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint') AS [reads],
    event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint') AS [writes],
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
INTO MultipleDataFileResultsParsed
FROM MultipleDataFileResults
ORDER BY Rowid
 
-- Aggregate the results by the event name and file_id
SELECT file_id, [file_write_completed],[file_written], [physical_page_write]
FROM
(    SELECT event_name, file_id, COUNT(*) AS occurences
    FROM MultipleDataFileResultsParsed
    WHERE event_name IN ('file_write_completed', 'file_written', 'physical_page_write')
    GROUP BY event_name, file_id
) AS tab
PIVOT
(    MAX(occurences) 
    FOR event_name IN ([file_write_completed],[file_written], [physical_page_write])) AS pvt

The output of our pivot operation shows that the four data files in the UserObjects filegroup are written to relatively evenly, which is expected since the amount of free space is equal across the four data files.  One item that should be incredibly apparent is the importance of the transaction log, which has twenty-three times as many file writes as the nearest data file. 
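
If you want the raw numbers behind that comparison, including the transaction log, a simple aggregate over the parsed table (my addition alongside the pivot above) shows the counts per file; file_id 2 is the log file:

-- Raw file_write_completed counts per file, including the log (file_id = 2)
SELECT file_id, COUNT(*) AS file_writes
FROM MultipleDataFileResultsParsed
WHERE event_name = 'file_write_completed'
GROUP BY file_id
ORDER BY file_writes DESC;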

[Image: pivot results showing writes spread evenly across the four UserObjects files, with the log file far ahead]

Different file size but same free space

As previously stated in the intro to this blog post, proportional fill is based on the amount of free space in each file in relation to the other files, not the actual file sizes themselves.  To demonstrate this, we’ll create the database with a single file in the UserObjects filegroup, create our table, and then load approximately 31MB of data into it.  Then we’ll increase the size of the first file to 63MB and add three additional files of 32MB each to the filegroup.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the test database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a database with a single file in the UserObjects filegroup 
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO
 
-- Create a table to load data into for the tests
USE [MultipleDataFiles]
GO
CREATE TABLE [dbo].[SalesOrderHeader](
    [SalesOrderID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [RevisionNumber] [tinyint] NOT NULL,
    [OrderDate] [datetime] NOT NULL,
    [DueDate] [datetime] NOT NULL,
    [ShipDate] [datetime] NULL,
    [Status] [tinyint] NOT NULL,
    [OnlineOrderFlag] [bit] NOT NULL,
    [SalesOrderNumber] [nvarchar](25) NOT NULL,
    [PurchaseOrderNumber] [nvarchar](25) NULL,
    [AccountNumber] [nvarchar](15) NULL,
    [CustomerID] [int] NOT NULL,
    [SalesPersonID] [int] NULL,
    [TerritoryID] [int] NULL,
    [BillToAddressID] [int] NOT NULL,
    [ShipToAddressID] [int] NOT NULL,
    [ShipMethodID] [int] NOT NULL,
    [CreditCardID] [int] NULL,
    [CreditCardApprovalCode] [varchar](15) NULL,
    [CurrencyRateID] [int] NULL,
    [SubTotal] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL,
    [Freight] [money] NOT NULL,
    [TotalDue] [money] NOT NULL,
    [Comment] [nvarchar](128) NULL,
    [rowguid] [uniqueidentifier] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) 
GO
 
-- Load ~31MB of data into the test database
INSERT INTO dbo.SalesOrderHeader
SELECT RevisionNumber, 
    DATEADD(DD, 1126+number, OrderDate), 
    DATEADD(DD, 1126+number, DueDate), 
    DATEADD(DD, 1126+number, ShipDate), 
    soh.Status, OnlineOrderFlag, SalesOrderNumber, 
    PurchaseOrderNumber, AccountNumber, 
    CustomerID, SalesPersonID, TerritoryID, 
    BillToAddressID, ShipToAddressID, 
    ShipMethodID, CreditCardID, CreditCardApprovalCode, 
    CurrencyRateID, SubTotal, TaxAmt, Freight, 
    TotalDue, Comment, rowguid, 
    DATEADD(DD, 1126+number, ModifiedDate)
FROM AdventureWorks2008R2.Sales.SalesOrderHeader AS soh
CROSS JOIN master.dbo.spt_values AS sv
WHERE sv.type = N'P'
  AND sv.number > 0 AND sv.number < 6
 
-- Grow the first data file to 63MB leaving 32MB free space
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILE (    NAME = N'MultipleDataFiles_UserObjects1', 
                SIZE = 64512KB )
GO
 
-- Add second file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects2', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO
 
-- Add third file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects3', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO
 
-- Add fourth file with 32MB size
ALTER DATABASE [MultipleDataFiles] 
ADD FILE (    NAME = N'MultipleDataFiles_UserObjects4', 
            FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
            SIZE = 32768KB , 
            FILEGROWTH = 32768KB ) 
TO FILEGROUP [UserObjects]
GO

With the new database set up, the exact same test from the first demo can be run to see how proportional fill functions with one data file larger than the others but with the same free space.
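
Before rerunning the test, you can confirm that the free space really is the same across all four files even though the first file is now roughly twice the size of the others (a quick check I have added here, not part of the original demo):

-- Compare size and free space for each file in the UserObjects filegroup
USE [MultipleDataFiles]
GO
SELECT df.name,
    df.size / 128 AS size_mb,
    (df.size - FILEPROPERTY(df.name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files AS df
JOIN sys.filegroups AS fg ON df.data_space_id = fg.data_space_id
WHERE fg.name = 'UserObjects';
GO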

[Image: pivot results showing writes still spread evenly across the files, despite the larger first file]

The impact of different free space amounts

Since proportional fill is based on free space, let’s look at the impact that having different free space in one file has on the writes that occur.  To set up this test, the database will be created with one file sized at 64MB and the remaining files sized at 32MB.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a multi-file database with one file larger than the others
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 65536KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

The exact same test can be rerun, and when the event data is parsed, the 64MB file shows roughly twice as many write operations and page writes as each of the 32MB files, right in proportion to its free space.

[Image: pivot results showing roughly twice as many writes to the 64MB file as to each 32MB file]

The impact of autogrowth

One question I had after beginning to look at this was what impact autogrowth would have on data files that were evenly sized and used the same autogrowth increment.  We could easily test this by upping the number of executions of our insert operation from the original test to force the database to grow, but I was really interested in the impact over repeated growth operations, and being impatient, I wasn’t willing to wait for large insert operations to complete.  I instead went back and recreated the database using 8MB data files set to grow by 8MB, and then changed the GO 3 batch terminator for the INSERT in the test to GO 5 to retest the impact of autogrowth.

-- Delete target files from previous tests
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
EXECUTE sp_configure 'xp_cmdshell', 1; RECONFIGURE; 
EXEC xp_cmdshell 'DEL C:\SQLskills\MultipleDataFiles*';
EXECUTE sp_configure 'xp_cmdshell', 0; RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0; RECONFIGURE;
 
-- Drop the database from the server
USE [master]
GO
IF DB_ID('MultipleDataFiles') IS NOT NULL
BEGIN
    ALTER DATABASE [MultipleDataFiles] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [MultipleDataFiles];
END
GO
-- Create a multi-file database with small, evenly sized files to force autogrowth
CREATE DATABASE [MultipleDataFiles] ON  PRIMARY 
( NAME = N'MultipleDataFiles', 
    FILENAME = N'H:\SQLData\MultipleDataFiles.mdf' , 
    SIZE = 32768KB , 
    FILEGROWTH = 32768KB ), 
FILEGROUP [UserObjects] 
( NAME = N'MultipleDataFiles_UserObjects1', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects1.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects2', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects2.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects3', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects3.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB ), 
( NAME = N'MultipleDataFiles_UserObjects4', 
    FILENAME = N'H:\SQLData\MultipleDataFiles_UserObjects4.ndf' , 
    SIZE = 8192KB , 
    FILEGROWTH = 8192KB )
 LOG ON 
( NAME = N'MultipleDataFiles_log', 
    FILENAME = N'L:\SQLLogs\MultipleDataFiles_log.ldf' , 
    SIZE = 131072KB , 
    FILEGROWTH = 32768KB )
GO
ALTER DATABASE [MultipleDataFiles] 
MODIFY FILEGROUP [UserObjects] DEFAULT
GO

The outcome of the autogrowth tests at five iterations for the INSERT batch seemed odd to me because one of the files was lagging the other three significantly.

[Image: pivot results for the GO 5 autogrowth test, with one file lagging the other three]
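
To dig into which file actually grew and when during a test like this, the Data File Auto Grow events in the default trace can be correlated with the event session data.  This is an approach I am adding here, and it assumes the default trace is still enabled on the instance:

-- Review Data File Auto Grow events from the default trace
DECLARE @tracefile nvarchar(260);
SELECT @tracefile = path
FROM sys.traces
WHERE is_default = 1;
 
SELECT t.StartTime, t.DatabaseName, t.FileName,
    t.IntegerData * 8 / 1024 AS growth_mb
FROM sys.fn_trace_gettable(@tracefile, DEFAULT) AS t
JOIN sys.trace_events AS te ON t.EventClass = te.trace_event_id
WHERE te.name = 'Data File Auto Grow'
  AND t.DatabaseName = 'MultipleDataFiles'
ORDER BY t.StartTime;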

I wasn’t satisfied with the initial results of a couple of iterations with five batch executions for the INSERT, so I decided to validate the results at larger scales: ten, twenty, and fifty iterations of the INSERT operation.

Using GO 10

[Image: pivot results using GO 10]

Using GO 20

[Image: pivot results using GO 20]

Using GO 50

[Image: pivot results using GO 50]

At the end of each of these series of tests, the files were always within a single autogrowth increment of each other, so it is obvious that proportional fill keeps things relatively equal throughout the tests, but there is the potential for hot spotting of a single data file when an autogrow occurs, at least until the other data files grow as well.  In these tests the autogrowth increments were kept small, primarily due to storage limitations on the SSD in my laptop and the configuration of the VM I was working on, but I am definitely going to make it a point to test this again at a later date using larger autogrowth increments to see what the longer-term impact is and whether the hot spots caused by autogrowth can impact performance significantly.  I have always espoused manual file size management, even in large environments, so that things like growing all of the files in a multi-file filegroup can be handled at the same time.
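
If you do manage file sizes manually, growing every file in the filegroup by the same amount at the same time keeps the free space, and therefore the writes, balanced.  As a sketch using the file names from this demo, doubling the original 8MB files would look something like this (the new SIZE must be larger than each file's current size):

-- Grow all of the UserObjects files together so free space stays balanced
ALTER DATABASE [MultipleDataFiles] MODIFY FILE (NAME = N'MultipleDataFiles_UserObjects1', SIZE = 16384KB);
ALTER DATABASE [MultipleDataFiles] MODIFY FILE (NAME = N'MultipleDataFiles_UserObjects2', SIZE = 16384KB);
ALTER DATABASE [MultipleDataFiles] MODIFY FILE (NAME = N'MultipleDataFiles_UserObjects3', SIZE = 16384KB);
ALTER DATABASE [MultipleDataFiles] MODIFY FILE (NAME = N'MultipleDataFiles_UserObjects4', SIZE = 16384KB);
GO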

So there you have it, evidence of how proportional fill functions inside of SQL Server.  Hope you found it interesting.

See you on the playground!