Plagiarism online is unfortunately a very common problem.  Over the last few years, there have been numerous circumstances where a blog or website has done a verbatim word for word copy of blog posts written by other authors/bloggers online.  These are clear-cut cases where plagiarism has occurred.  However, plagiarism doesn’t have to be a verbatim copy of a blog post, it can simply be the reuse of ideas or solutions that are not common without attribution.  According to the Plagiarism.org website, and the Merriam Webster Online Dictionary, plagiarizing covers any of the following:

  • to steal and pass off (the ideas or words of another) as one's own
  • to use (another's production) without crediting the source
  • to commit literary theft
  • to present as new and original an idea or product derived from an existing source

My undergraduate degree is in History, and during my time in school there were numerous times where someone failed to appropriately cite sources and as a result was failed from a class and either placed on academic probation, or even dismissed from the school entirely.  As a result I make it a point to provide attribution when I blog and two examples of how I do this are in the opening paragraph of this blog post.  It is really easy to cite your sources and as a blogger it is your responsibility to do so.

This applies to paraphrasing/summarizing blog material as well, though I am not a big fan of taking another person's work and rewriting it so that the same concept is shown, just to keep from plagiarizing the content word for word. Earlier this week I read a blog post and left a comment with a link to a blog post I wrote that provided the correct solution to the problem.  I even traded DM’s with the blog author on Twitter, so the last thing I expected was that this person would rewrite the solution I provided in my post on their blog the next day without providing any form of attribution.  A link has since been added after a number of DM’s pointing out that this was still plagiarism.  I don’t necessarily agree with someone reposting an entire concept without adding anything meaningful to the content, but if you are going to do this, at a bare minimum you have to provide attribution to your original source.

If you are a blogger and you are unsure, provide a reference as it only takes a second or two to do so.  Not citing sources doesn’t make you appear any smarter, quite to the contrary, not citing sources and getting caught for it makes you appear dishonest. 

There is no valid excuse for plagiarizing content or ideas.

Categories:
General

Traditionally questions about how much memory SQL Server needs were aimed at how to appropriately set the 'max server memory' sp_configure option in SQL Server, and in my book the recommendation that I make is to reserve 1 GB of RAM for the OS, 1 GB for each 4 GB of RAM installed from 4–16 GB, and then 1 GB for every 8 GB RAM installed above 16 GB RAM.  This has typically worked out well for servers that are dedicated to SQL Server.  You can also get much more technical with determining where to set 'max server memory' by working out the specific memory requirements for the OS, other applications, the SQL Server thread stack, and other multipage allocators.  Typically this would be  ((Total system memory) – (memory for thread stack) – (OS memory requirements ~ 2-4GB) – (memory for other applications) - (memory for multipage allocations; SQLCLR, linked servers, etc)), where the memory for thread stack = ((max worker threads) *(stack size)) and the stack size is 512KB for x86 systems, 2MB for x64 systems and 4MB for IA64 systems.  The value for 'max worker threads' can be found in the max_worker_count column of sys.dm_os_sys_info.  However, the assumption with either of these methods is that you want SQL Server to use everything that is available on the machine, unless you've made reservations in the calculations for other applications.

As more shops move towards virtualizing SQL Servers in their environment this question is more and more geared towards determining what is the minimum amount of memory that a SQL Server will need to run as a VM.  Unfortunately there is no way to calculate out what the ideal amount of RAM for a given instance of SQL Server might actually be since SQL Server is designed to cache data in the buffer pool, and it will typically use as much memory as you can give it.  One of the things to keep in mind when you are looking at reducing the RAM allocated to a SQL Server instance is that you will eventually get to a point where the lower memory gets traded off for higher disk I/O access in the environments.

If you need to figure out the ideal configuration for SQL Server memory in an environment that has been over provisioned the best way to try to go about doing this is start off with a baseline of the environment and the current performance metrics.  Counters to begin monitoring would include:

  • SQL Server:Buffer Manager\Page Life Expectancy
  • SQL Server:Buffer Manager\Page reads/sec
  • Physical Disk\Disk Reads/sec

Typically if the environment has excess memory for the buffer pool, the Page Life Expectancy value will continue to increase by a value of one every second and it won't typically drop off under the workload because all of the data pages end up being cached.  At the same time, the number of SQL Server:Buffer Manager\Page reads/sec will be low after the cache ramp up occurs which will also correspond to a low value for Physical Disk\Disk Reads/sec. 

Once you have your baseline for the environment, make a change to the sp_configure 'max server memory' option to reduce the size of the buffer pool by 1GB and then monitor the impact to the performance counters after things stabilize from the initial cache flushing that may typically occur when RECONFIGURE is run in the environment.  If the level of Page Life Expectancy remains acceptable for your environment (keeping in mind that a fixed target of >= 300 is ridiculous for servers with large amounts of RAM installed), and the number of SQL Server:Buffer Manager\Page reads/sec is within what the disk I/O subsystem can support without performance degradation, repeat the process of reducing the sp_configure value for 'max server memory' by 1GB and continuing to monitor the impact to the environment. 

I recently did some drive upgrades in my personal laptops and as a result I had a couple of SSD drives sitting around that I wanted to make use of.  A few weeks ago I purchased a new OCZ Agility 3 240GB SATA III SSD from Newegg when it was on sale and at the same time I also purchased two MassCool USB 3 external enclosures to make use of the two 120GB OCZ Vertex 2 SSD’s that I would have sitting around.  The cost for the MassCool enclosures was only $14.99 so I figured it would be a good way to reuse the SSD’s at the time.  When the package from Newegg arrived, I immediately installed the SSD into one of my personal machines and then later I went about testing the performance of the USB 3 external enclosures with the older SSD’s.  Initially the performance was impressive, and I posted a few tweets that attracted some attention and a few people requested that I blog my results.

To make this blog post something worth looking at, I ran a number of additional tests over the last few days using SQLIO and a varying configuration of drives with external enclosures.  Specifically I tested two different USB 3 2.5” SATA enclosures, a separate eSATA 2.5” SATA external enclosure, as well as two USB 3 HDD’s that I already owned to get some performance information from each of the setups.

The external USB 3.0 HDD’s being tested are as follows:

The external enclosures being tested in these tests are as follows:

The SSD’s used with the external enclosures listed above are:

At the time that I bought the 120GB drive I paid close to $2.42 per megabyte for it, and at the time this was a good deal.  When I purchased the 240GB SSD I paid right around $1.08 per megabyte, showing how much the prices have decreased nearly 3 years later.  I happen to own a number of OCZ drives at home and they are one of my favorite brands personally because of the overall reliability I’ve had with them the last three years as well as the performance to cost ratio I’ve experienced. Initially I was just testing the MassCool USB 3 enclosures that I purchased, and I was incredibly happy with the performance that I had from them.  However, after tweeting about the results, my friend Jose Chinchilla (Blog | Twitter) mentioned that I should also try out eSATA because it performed significantly better in his own tests.  Based on this tweet, I set out to my local MicroCenter computer store to purchase a eSATA enclosure and while I was buying it, one of the sales representatives, a guy named Chris, approached me and asked what I was planning to do with the enclosure because USB 3.0 should be faster for SSD’s. The numbers he quoted didn’t match my previous testing, so he made a recommendation for a specific USB 3.0 enclosure, and I figured, for $20 it was worth testing, so I bought an extra Hornettek Panther USB 3.0 device for comparison testing along with the Vantec NexStar CX eSATA enclosure I had already selected.

For the tests, I ran a short set of tests using SQLIO that I previously blogged about on my blog post about the Powershell parser for SQLIO output.  The results from the tests are below:

image

image

The two external USB 3.0 HDD drives had very similar performance characteristics, and they beat their USB 2.0 counterparts performance wise significantly.  I’ve been incredibly happy with my USB 3.0 HDDs overall for the last few years and I only purchase USB 3.0 HDD drives based on my performance tests a few years ago.  However, when compared with the USB 3.0 external enclosures with the SSD’s, the performance difference is quite significant.  One of the problems I’ve had lately is being able to fit all of the virtual machine hard drives, VHD’s for Hyper-V when using my dual boot Hyper-V host VHD, as well as my original VMware Workstation VMDK’s for classes, and even the VirtualBox virtual disk images (VDI) from my blog series on building a completely free playground.  Using the SSD’s with USB 3.0 definitely makes storing my virtual hard disk files on an external array much more feasible performance wise, and my only real limitation from testing is the size of the external device. 

However, the performance of the eSATA external enclosure is incredibly better, with the side trade off that I can only have 1 of them attached to my laptops at a time, and for my personal laptop, the fact that the eSATA enclosure requires not only the eSATA port for throughput, but also the additional USB 2.0 port for power really limits what I can do as far as having multiple disks connected to the laptop.  To be honest, this is something that I can live with given that I also have swapped out my CD/DVD ROM bay for a replacement New Mode US second HDD conversion. This means that I can have two 240GB OCZ SSD’s in my E5420, which is a similar configuration to the dual 256GB Dell SSD configuration in my Precision M6500 for work, while being able to move VM images between the two machines using my older 120GB SSD’s with fantastic performance.

If you are looking at really high performance external hard disk configurations for scalability, I would highly recommend looking at either eSATA with an extra SSD drive or if you need multiple devices and you have USB 3.0, consider going that route.  Either will outperform your existing options significantly.  For the time being, I am going to stick with one eSATA device, which is compatible with both of my laptops, and one USB 3.0 device, which only works with my M6500 at USB 3.0 speeds.  My E5420 only has USB 2.0 onboard, so it makes more sense for me to stick with eSATA for the main shared drive.

A year ago today, Paul Randal (Blog | Twitter) first blogged about my joining the SQLskills team as the first full time employee.  I can still remember the first time that I met Paul at SQL PASS in 2008, and I am still amazed that I work with the best in the industry over a year after working out the details of my employment with SQLskills.  I remember the first time that Paul and I discussed the potential for me joining SQLskills, and how hard it was to tell him at the time that there was no way it would work out for either of us.  Joshua Harris once said, “The right thing at the wrong time is the wrong thing” and nothing could be further from the truth as far as my employment with SQLskills.  When Paul and I first discussed the possibility of me joining the SQLskills team, I knew ahead of time that my Army Reserve unit was pending a year long mobilization in early 2011 so that became the first, and as it turned out, last point of discussion about employment feasibility.  Legally there is no requirement for me to disclose this type of information when seeking employment, and further if disclosed this information can not be used as a part of the selection criteria, but I have always not only been proud of my service in the US Army Reserves, but have also been incredibly sensitive to the needs of potential employers when discussing job opportunities.  The end result was that we mutually agreed that the timing wasn’t right and we’d keep open communications about changes should they occur, such that if an employment possibility reopened we’d discuss it again further.  Fast forward a few months and the opportunity once again posed itself, but at this point it was clear the circumstances worked out to be mutually beneficial and we ultimate struck a deal that lead to me joining SQLskills full time as employee number three (it actually says this on my semi-annual review). 

For nearly six years, Paul has been a fabulous mentor of mine, dating back to before he left the product group at Microsoft, based on our forums interactions.  We actually have a very interesting background for solving problems related to SQL Server that have happened in very public venues, for example the Diskeeper 2010 issues that we figured out on Twitter late in 2010.  One of the highlights of my career as a DBA was the first time that Paul asked me to call him out of the blue, when he first broached the topic of joining SQLskills as a consultant.  I remember discussing things, and at the end of the call where we mutually agreed that it wasn’t a good fit at the time, wondering what in the world I had just done.  Looking back, that had to be one of the best decisions of my life, though I didn’t realize it at the time.  Over the ensuing months, I had the opportunity to grow as a SQL Server professional, working in one of the most challenging environments of my life, while learning more about the internals of SQL Server than I ever thought I would.  Before I ever accepted a position with SQLskills, I privately took the written exam for the new SQL Server MCM and passed, and upon passing the written exam, I privately scheduled my lab exam with Joe Sack.  The day after Paul first blogged about me joining SQLskills, I passed the Lab Exam for the SQL Server MCM and became the fourth, and youngest, SQL Server MCM under the new program.

In the year since the initial announcement I’ve made a number of changes.  First off, being a consultant is not what it might seem from the outside looking in.  This is without a doubt one of the hardest jobs I’ve ever had, and every day brings forth new challenges both technically and personally.  Additionally I’ve had to learn how to be an efficient trainer as an instructor for not only the SQLskills Immersion Events, but also the Microsoft SharePoint MCM program as well.  I definitely have the best mentors around for learning how to not only be an effective consultant, but also how to be an effective trainer as well.  Since joining the SQLskills team I have had a top rated session at SQLRally in Orlando, FL, presented at multiple Immersion Events, both public and private, as well as doing my first pre-conference seminar at PASS 2011 on Extended Events.  Along the way I’ve experienced both success and failure, but no matter what the feedback, I’ve been able to grow from the experiences in an incredibly positive manner. 

Personally I can’t imagine having a better year than the last one overall, but I am sure going to try!

Categories:
General

Yesterday I was asked by email about a problem that someone encountered associated with a SQL Server Failover Cluster configuration that I have run into a number of times myself, and I have had questions about repeatedly in the past.  The problem is that during the SQL Server Setup Validation of the environment, a warning is raised stating that the Network Binding Order is incorrect for the environment.  If you click on the specific Setup Validation Warning you will get a box like the following:

image

What this warning is telling you, is that you have multiple network interfaces configured on the server, and the default binding order for the interfaces places the current Domain access interface in a position that is not the first interface for the server.  This can be changed by adjusting the network binding order for the server in the Network Properties.  To make this change, open up the Network and Sharing Center by clicking on the network connection on the system tray and then clicking the Open Network and Sharing Center link.

image

When the Network and Sharing Center opens, click on the Change adapter settings link on the left hand side.

image

Then when the Network Connections window opens, you have to press the Alt key to open up the window context menu so that you can then click on Advanced and Advanced Settings.

image

When the Advanced Settings window opens, you will be on the Adapters and Bindings tab, which allows you to change the binding order for the network interfaces on the server.

image

If you click on the appropriate interface for the domain connection, you can move it up to the first position in the binding order which will eliminate the warning in SQL Server Setup.

image

If changing the binding order does not resolve the warning, there may be a disabled or ghost network adapter in the system as discussed in the following KB article. (http://support.microsoft.com//kb/955963)

My latest article on Simple-Talk was published this morning.  In this article I dig into the actual meaning of one of the performance counters I often see mentioned on the forums, but in a completely incorrect context.

Great SQL Server Debates: Buffer Cache Hit Ratio

After a recent discussion about Lock Pages in Memory on Twitter, I wrote an article that talks about the history of Lock Pages in Memory and the differing opinions on the appropriate usage of Lock Pages in Memory for 64 bit instances of SQL Server. 

Great SQL Server Debates: Lock Pages in Memory

Thanks to Brent Ozar (Blog|Twitter) and Glenn Berry (Blog|Twitter) for their part in discussing this on twitter, by email, and for reviewing the article before I submitted it.

While setting up my new Availability Group using SQL Server 2012 RC0 tonight, I noticed an interesting new addition to Extended Events associated with Availability Group configuration in the Release Candidate.  When you setup an Availability Group in RC0, another default Event Session is created on the servers that participate in the Availability Group to provide monitoring of the health of the Availability Group overall.  The definition of the monitoring session is as follows:

CREATE EVENT SESSION [AlwaysOn_health] ON SERVER
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.error_reported(
    WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
ADD EVENT sqlserver.lock_redo_blocked
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

This Extended Event Session monitors a number of critical events in the system but one of the problems with figuring out what exactly this Event Session is monitoring is to figure out what all the predicate values on the sqlserver.error_reported event are actually firing on.  To that aspect of things, we can do a quick reuse of the predicate on this event be doing a replace SSMS on the [error_number] value with a replace for an alias to a query to sys.messages, on the message_id column from the DMV as follows:

SELECT message_id, severity, is_event_logged, text
FROM sys.messages AS m
WHERE m.language_id = SERVERPROPERTY('LCID')
  AND  (m.message_id=(9691)
        OR m.message_id=(35204)
        OR m.message_id=(9693)
        OR m.message_id=(26024)
        OR m.message_id=(28047)
        OR m.message_id=(26023)
        OR m.message_id=(9692)
        OR m.message_id=(28034)
        OR m.message_id=(28036)
        OR m.message_id=(28048)
        OR m.message_id=(28080)
        OR m.message_id=(28091)
        OR m.message_id=(26022)
        OR m.message_id=(9642)
        OR m.message_id=(35201)
        OR m.message_id=(35202)
        OR m.message_id=(35206)
        OR m.message_id=(35207)
        OR m.message_id=(26069)
        OR m.message_id=(26070)
        OR m.message_id>(41047)
        AND m.message_id<(41056)
        OR m.message_id=(41142)
        OR m.message_id=(41144)
        OR m.message_id=(1480)
        OR m.message_id=(823)
        OR m.message_id=(824)
        OR m.message_id=(829)
        OR m.message_id=(35264)
        OR m.message_id=(35265)
)

This will give us a list of the error messages that the Event Session will actually fire events for:

message_id

severity is_event_logged text
823 24 1 The operating system returned error %ls to SQL Server during a %S_MSG at offset %#016I64x in file '%ls'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
824 24 1 SQL Server detected a logical consistency-based I/O error: %ls. It occurred during a %S_MSG of page %S_PGID in database ID %d at offset %#016I64x in file '%ls'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
829 21 1 Database ID %d, Page %S_PGID is marked RestorePending, which may indicate disk corruption. To recover from this state, perform a restore.
1480 10 0 The %S_MSG database "%.*ls" is changing roles from "%ls" to "%ls" because the mirroring session or availability group failed over due to %S_MSG. This is an informational message only. No user action is required.
9642 16 0 An error occurred in a Service Broker/Database Mirroring transport connection endpoint, Error: %i, State: %i. (Near endpoint role: %S_MSG, far endpoint address: '%.*hs')
9691 10 0 The %S_MSG endpoint has stopped listening for connections.
9692 16 0 The %S_MSG endpoint cannot listen on port %d because it is in use by another process.
9693 16 0 The %S_MSG endpoint cannot listen for connections due to the following error: '%.*ls'.
26022 10 1 Server is listening on [ %hs <%hs> %d].
26023 16 1 Server TCP provider failed to listen on [ %hs <%hs> %d]. Tcp port is already in use.
26024 16 1 Server failed to listen on %hs <%hs> %d. Error: %#x. To proceed, notify your system administrator.
26069 10 1 Started listening on virtual network name '%ls'. No user action is required.
26070 10 1 Stopped listening on virtual network name '%ls'. No user action is required.
28034 10 0 Connection handshake failed. The login '%.*ls' does not have CONNECT permission on the endpoint. State %d.
28036 10 0 Connection handshake failed. The certificate used by this endpoint was not found: %S_MSG. Use DBCC CHECKDB in master database to verify the metadata integrity of the endpoints. State %d.
28047 10 1 %S_MSG login attempt failed with error: '%.*ls'. %.*ls
28048 10 1 %S_MSG login attempt by user '%.*ls' failed with error: '%.*ls'. %.*ls
28080 10 0 Connection handshake failed. The %S_MSG endpoint is not configured. State %d.
28091 10 0  Starting endpoint for %S_MSG with no authentication is not supported.
35201 10 0 A connection timeout has occurred while attempting to establish a connection to availability replica '%ls' with id [%ls]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
35202 10 0 A connection for availability group '%ls' from availability replica '%ls' with id  [%ls] to '%ls' with id [%ls] has been successfully established.  This is an informational message only. No user action is required.
35204 10 0 The connection between server instances '%ls' with id [%ls] and '%ls' with id [%ls] has been disabled because the database mirroring endpoint was either disabled or stopped. Restart the endpoint by using the ALTER ENDPOINT Transact-SQL statement with STATE = STARTED.
35206 10 0 A connection timeout has occurred on a previously established connection to availability replica '%ls' with id [%ls].  Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
35207 16 0 Connection attempt on availability group id '%ls' from replica id '%ls' to replica id '%ls' failed because of error %d, severity %d, state %d.
35264 10 0 AlwaysOn Availability Groups data movement for database '%.*ls' has been suspended for the following reason: "%S_MSG" (Source ID %d; Source string: '%.*ls'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.
35265 10 0 AlwaysOn Availability Groups data movement for database '%.*ls' has been resumed. This is an informational message only. No user action is required.
41048 10 1 AlwaysOn Availability Groups: Local Windows Server Failover Clustering service has become unavailable. This is an informational message only. No user action is required.
41049 10 1 AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is no longer online. This is an informational message only. No user action is required.
41050 10 1 AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering service to start. This is an informational message only. No user action is required.
41051 10 1 AlwaysOn Availability Groups: Local Windows Server Failover Clustering service started. This is an informational message only. No user action is required.
41052 10 1 AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
41053 10 1 AlwaysOn Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.
41054 10 1 AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
41055 10 1 AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
41142 16 0 The availability replica for availability group '%.*ls' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group, or the WSFC cluster was started in Force Quorum mode. If the cluster was started in Force Quorum mode or the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
41144 16 0 The local availability replica of availability group '%.*ls' is in a failed state.  The replica failed to read or update the persisted configuration data (SQL Server error: %d).  To recover from this failure, either restart the local Windows Server Failover Clustering (WSFC) service or restart the local instance of SQL Server.

Based on this output, and the output of the following query:

SELECT name, description
FROM sys.dm_xe_objects
WHERE NAME IN (
'alwayson_ddl_executed',
'availability_group_lease_expired',
'availability_replica_automatic_failover_validation',
'availability_replica_manager_state_change',
'availability_replica_state_change',
'error_reported',
'lock_redo_blocked')

We can deduce the following about the Event Session:

CREATE EVENT SESSION [AlwaysOn_health] ON SERVER
--Occurs when AlwaysOn DDL is executed including CREATE, ALTER or DROP
ADD EVENT sqlserver.alwayson_ddl_executed,
--Occurs when there is a connectivity issue between the cluster and the Availability Group resulting
--in a failure to renew the lease
ADD EVENT sqlserver.availability_group_lease_expired,
--Occurs when the failover validates the readiness of replica as a primary. For instance, the failover
--validation will return false when not all databases are synchronized or not joined
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
--Occurs when the state of the Availability Replica Manager has changed.
ADD EVENT sqlserver.availability_replica_manager_state_change,
--Occurs when the state of the Availability Replica has changed.
ADD EVENT sqlserver.availability_replica_state_change,
--Occurs when an error is reported based on the previously listed table
ADD EVENT sqlserver.error_reported(
    WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
--Occurs when the redo thread blocks when trying to acquire a lock.
ADD EVENT sqlserver.lock_redo_blocked
--Writes to the file target for persistence in the system beyond failovers and service restarts
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=ON)
GO

What is really cool is that this Event Session is used by the Availability Groups Dashboard to provide an overall status of the health of the Availability Group in Management Studio.

This morning Paul received an email from a member of the community that was asking if it is safe to run SQL Server inside of a VM and whether we would suggest running databases in virtual machines or not.  The root of the question being asked was an article that the person had read back in January titled SQL Server and VMware: A Potentially Fatal Combination.  This article is an amazing example of how misinformation on the internet can lead to confusion and an incorrect decisions about what options are feasible for a given environment or not.  Most people that read my blog know that I have been running SQL Server virtualized on VMware in production environments since late 2004/early 2005.  In all that time, not once have I ever experienced a data corruption issue associated with SQL Server running inside of a virtual machine, even on my laptop where the machine has powered off due to loss of battery life and the machines experienced a hard crash during operation.  Why?

Analyzing the Source of the Misinformation

Lets start out at the top of the article and debunk the misinformation that is contained in it about running SQL Server on VMware.  The first thing is that the article makes some very broad claims about VMware based on the use of VMware Player, which is akin to Microsoft Virtual PC for Windows XP Mode in Windows 7.  This is not a server class hypervisor, nor is it in any way representative of VMware’s main products ESX/ESXi or vSphere.  This is like comparing SQL Server to Microsoft Access, they aren’t even in the same class of functionality or features.

The next claim in the article is that the database is at risk due to hypervisor caching of the writes being performed by SQL Server, which uses the FILE_FLAG_WRITE_THROUGH flag when calling the CreateFile function to open the database, instructing the system to write through any cache directly to disk.  The article claims that VMware Player caches I/O operations and can result in lost writes.  Unfortunately, the example provided is a machine running on Windows Vista which as a hosted platform always uses unbuffered I/O for virtual machines, so it is not caching the writes being performed.  You don’t have to take my word for this, it is documented in the Storage IO crash consistency with VMware products KB article on their site:

For hosted products, write handling depends on the host operating system.

On Linux hosts, VMware does not use unbuffered IO, because it is not safe or supported across all the Linux versions that VMware supports. So currently, VMware hosted products on Linux hosts always use buffered IO.

On Windows hosts, VMware hosted products use unbuffered IO by default.

Unfortunately the entire foundation of the article is based on misinformation that can be proven by reading VMware’s documentation.  The VMware ESX hypervisors DO NOT cache I/O under any circumstance, which is also covered in the same KB article:

VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers. 

If this is the case, you might be wondering how the database in the article became corrupt? Unfortunately it is impossible to know because the necessary parameters surrounding the test configuration are not provided in the article.  Instead the article provides a Disclaimer that tells you the specifics are “rather unimportant”

Disclaimer:
In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

I disagree that this kind of information is unimportant, especially when making claims that SQL Server will suffer database corruption issues if running under VMware. 

Setting up a Repeatable Test Scenario

For scientific purposes, I decided to take one of my standard SQL Server 2008R2 VMs running inside of my Dell M6500 laptop and run a series of repeatable tests using VMware Workstation 7 on Windows 7.  The VM is configured with 2 vCPUs and 2GB RAM, and like the article a thumb drive will be used to store the database data and log files.  To setup the environment for the tests, the following script will be used:

-- Create a "Safe" database on the VMDK stored on my laptop SSD
CREATE DATABASE [VMSafe]
ON PRIMARY
( NAME = N'VMSafe',
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe.mdf',
  SIZE = 6144KB,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N'VMSafe_log',
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe_log.ldf',
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

-- Create a "Test" database on a removable thumb drive
CREATE DATABASE [VMTest]
ON PRIMARY
( NAME = N'VMTest',
  FILENAME = N'E:\SQLskills\VMTest.mdf',
  SIZE = 6144KB ,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N'VMTest_log',
  FILENAME = N'E:\SQLskills\VMTest_log.ldf' ,
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

-- Create the baseline table in the "Safe" database
USE [VMSafe];
GO
IF OBJECT_ID('dbo.TestTable') IS NOT NULL
BEGIN
    DROP TABLE dbo.TestTable;
END
GO

CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT('ABC123'),
Col3 NCHAR(400) DEFAULT('789XYZ'))
GO
DECLARE @StartTime DATETIME = CURRENT_TIMESTAMP
DECLARE @loop INT = 1
BEGIN TRANSACTION
WHILE @loop <=100000
BEGIN
    INSERT INTO TestTable (Col1) VALUES (@loop);
    SET @loop = @loop + 1;
END
COMMIT TRANSACTION
SELECT DATEDIFF(ms, @StartTime, CURRENT_TIMESTAMP) AS Table1_Time
GO
UPDATE TestTable
SET Col1 = RowID;
GO

-- Create a duplicate of the table in the "Test" database
USE [VMTest]
GO
CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT('ABC123'),
Col3 NCHAR(400) DEFAULT('789XYZ'))
GO
SET IDENTITY_INSERT dbo.TestTable ON
INSERT INTO TestTable (RowID, Col1, Col2, Col3)
SELECT RowID, Col1, Col2, Col3
FROM VMSafe.dbo.TestTable
SET IDENTITY_INSERT dbo.TestTable OFF
GO

-- Issue CHECKPOINT to flush dirty buffer pages to disk
CHECKPOINT
GO

Now that we have our databases created and identical data in them so that we can validate the consistency of the environment in the event of a forced crash by removing the USB thumbdrive from the computer, lets issue a UPDATE against all the rows in the VMTest databases to prefix them with a 1_ for the Col2 column.

image

With the UPDATE complete, pull out the USB thumbdrive and then issue another CHECKPOINT operation to force a flush of dirty buffers to the disk.  This will raise a 9001 error for the VMTest database since the log is not available to write the CHECKPOINT log records to.

image

At this point we have a crashed database, so we can plug in our USB thumbdrive and restart the SQL Server instance to allow crash recovery to run for the databases.  Immediately upon instance restart we will see:

image

OMG!  It has the same status as the article, that must mean we have a problem!  NO! The In Recovery status tells you that crash recovery is running for the database.  If the database is In Recovery you should not set it to Emergency Mode as the article states.  Instead open up the ErrorLog for the instance and look at what is happening with the recovery, the database engine logs informational messages about the progression of crash recovery for the databases while it is running.

image

image

In this case, within a few minutes of opening the database the estimated time for crash recovery to be performed is already at 86 seconds.  The only thing you have to do here is be patient and allow the Database Engine to perform the Undo/Redo operations the way it is designed to.

image

image

All it took was 103 seconds for crash recovery to be performed and the database is available online and in a consistent manner.  The log records describing the transaction were hardened in the transaction log before the commit of the update operation occurred ensuring that the database could be recovered in a consistent state.

I have run this series of tests a number of different ways and the database always starts up in a crash consistent state.  Feel free to test this yourself by pulling the USB thumbdrive out while the UPDATE is running, and it will force a rollback of the transaction during crash recovery.

Summary

In Summary, there is nothing inherently unsafe about running SQL Server on VMware.  Microsoft has supported SQL Server on VMware for nearly three years at this point as a validated Server Virtualization Validation Program (SVVP) configuration (http://support.microsoft.com/?id=956893).  This article is further evidence of the danger of misinformation on the internet, and the need to validate information online before making decisions based on the information.  A SQL Server running as a virtual machine in a SVVP validated configuration is going to be as safe as a SQL Server running on physical hardware.

Just over two years ago, right before PASS Summit 2009, I got the insane idea after finishing Chapter 3 for the SQL Server 2008 Internals and Troubleshooting book that I wanted to write another book based on my experiences answering questions on the MSDN forums that would be based on the top 10 most frequently asked questions that I had seen over the years repeatedly.  At the time I had no idea how to contact a publisher (the Wrox book was handled entirely by Christian Bolton and I was honored to be asked to contribute Chapter 3 to it). I didn’t want to write this book for profit, and I really liked the way that Redgate and SimpleTalk publishing gave away free copies of the books that they published at events as well as online eBooks that could be downloaded for free.  At PASS I asked Grant Fritchey how he got hooked up with SimpleTalk to publish his book on Execution Plans, and he introduced me to the editor at Redgate, Tony Davis, who really liked the idea behind the book, and within weeks of PASS finishing, I was fast at work outlining chapters and writing the beginnings of what would ultimately become Troubleshooting SQL Server: A Guide for the Accidental DBA.

While the title of this says that it is a Guide for the Accidental DBA, the the book is far more in-depth than just covering the basics. The first chapter in the book provides a methodology for troubleshooting problems in SQL Server that I use every day while working on client machines, and that I have used time and again over the last 6 years that I have been a DBA for SQL Server.  One of the key points that we try to make clear in this chapter is that a problem is rarely identified correctly based on a single piece of information. Appropriate identification of the root cause of a problem is critical in applying the appropriate fix, and this chapter covers how to look at all the available information that SQL Server provides to find the root cause of a problem as fast as possible.   A slightly modified for web preview of the first chapter is available on the SimpleTalk website as an article titled A Performance Troubleshooting Methodology for SQL Server.  This chapter sets up the content for the remainder of the book, and the remaining chapters target specific problem areas that we see repeatedly on the various forums online.

  • High Disk I/O –RAID misconfiguration, inadequate I/O throughput, poor workload distribution, SAN issues, disk partition misalignment and more
  • High CPU usage –insufficient memory, poorly written queries, inadequate indexing, inappropriate configuration option settings, and so on
  • Memory mismanagement – the advent of 64-bit SQL Server removes the memory allocation "shackles" placed on its 32-bit predecessors, but issues arising from incorrect memory configuration are still common
  • Missing indexes – arguably the number one cause of wasteful resource usage in SQL Server
  • Blocking – caused mainly by poorly designed databases that lack proper keys and indexing, and applications that apply needlessly restrictive transaction isolation levels
  • Deadlocking – covering the Bookmark Lookup deadlock, the Serializable Range Scan deadlock, the Cascading Constraint deadlock and more
  • Full transaction logs – lack of log backups, hefty index maintenance operations, long running transaction, problems with replication and mirroring environments, and more.
  • Accidentally-lost data – "oops, wrong database!" Let's hope you've got backups!

This book would have never made it to print if it weren’t for the amazing efforts of the other people involved in the project.  The technical editor for the book was Gail Shaw, I will never understand how she was able to work this into her busy schedule.  When I had to pick a technical editor for the book, the depth of content made it incredibly hard to choose the right person to work with.  Luckily Gail agreed to take on the task and I am forever indebted to her for all of the ideas, additional content, and the countless hours she has worked over the last month and a half while we sprinted to the finish to have the book in print by PASS 2011. The coauthor of the book, Ted Krueger, agreed to step in to help by writing two of the chapters last year after I had fallen behind on the writing repeatedly, and made it possible for the book to be copy complete in time for PASS.  Last but absolutely not least is the editor Tony Davis, who has an amazing amount of patience and has probably spent more time on phone calls trying to work through copy and technical edits to provide clarity to the information being presented. 

In the last 48 hours I have seen two different people having the exact same problem so I thought that I would go about blogging some code I’ve had lying around for a while and been meaning to blog that would help them with finding the root cause of their problems.  In both cases, the question started because the person asking it noticed that Page Life Expectancy was below 300, which is based on dated information published by Microsoft before I began working with SQL Server, and consequently is something I’ll discuss a little at the end of this blog post.  In the first case I started asking questions about the amount of memory installed, and what was using the data cache in the server, (see Paul’s blog post Performance issues from wasted buffer pool memory).  In the second case this information had already been presented and as a matter of the troubleshooting of both of the problems it was noted that a single index was using a majority of the data pages in the buffer cache on the servers and the question became, “How do I find out what is using this index?”  and a solution that would help them with figuring out what queries are using a specific index in a database. 

Another Case for Querying the Plan Cache?

In the past I have written a number of scripts that can be used to find interesting information in the plan cache of SQL Server and this is just another one of those.  I originally wrote this back at the end of March when my good friend Thomas LaRock (Blog|Twitter) sent me an email asking if I would mind him providing my contact information to someone that had a question about Extended Events. What the person wanted to do was identify all the queries that were using a specific index using Extended Events, but it turned out this isn’t possible in Extended Events.  Bummer…  However I could offer an alternate solution that involved querying the plan cache using XQuery and then monitoring what was found over time to determine a majority of the queries that use a specific index. 

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @IndexName AS NVARCHAR(128) = 'PK__TestTabl__FFEE74517ABC33CD';

-- Make sure the name passed is appropriately quoted
IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
--Handle the case where the left or right was quoted manually but not the opposite side
IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '['+@IndexName;
IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

-- Dig into the plan cache and find all plans using this index
;WITH XMLNAMESPACES
   (DEFAULT '
http://schemas.microsoft.com/sqlserver/2004/07/showplan')   
SELECT
stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
cp.plan_handle,
query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj)
OPTION(MAXDOP 1, RECOMPILE);


Now I have been asked this before so I will go ahead and defend it ahead of time.  I use transaction isolation level read uncommitted when querying the plan cache just to make sure I don’t cause any problems because this can be a long running query depending on the size of the plan cache and I don’t care if I get some phantom plan as a result of being read uncommitted.  I also use MAXDOP 1 for the query because it tends to be a CPU resource hog, and a lot of times I am running this on a production system or telling someone to run this on a production system and it has the potential for impacting performance.  If your server is considerably beefy hardware wise, it probably won’t matter, and for most people it doesn’t negatively impact their environment, but like anything else, I apply a rule of “First do no harm” to something that might be run against a production system.

Where have I used this code since originally writing it back in March?  All over the place, but generally when working with index tuning so I can find what is using a specific index in the system to evaluate what impact changing that index or removing it might have to the environment.  It is also useful for troubleshooting the problem that a specific index is using all of the buffer pool and causing buffer pool flushing to occur on a constant basis resulting in a low Page Life Expectancy for the server.  Using this query you can easily find the plans and statements that have been using this index and begin working to review the query stats information to find out where the problem is in the code, design, or indexing to try and improve the situation.

What’s Wrong about Page Life Expectancy >= 300?

Aside from being a performance metric that was recommended by Microsoft that has never been updated to reflect the changes that have occurred in hardware over the last 12 years, there is nothing wrong with this recommendation (i.e. sarcasm… :-)).  The problem with this fixed value is that it was determined when servers generally had 4GB of RAM installed in them, and servers with 16GB+ of RAM installed in them were extremely expensive and fairly rare. 

This performance counter tells you the number of seconds, at the current point in time, a page will remain in memory without being referenced (so a value of 300 means your buffer pool is flushing every 5 minutes).  So 10 years ago when you were reading anywhere from 1.7GB up to 12GB of data (depending on your server’s memory) from disk into the buffer cache every 5 minutes it was a sign of memory pressure on the server and something you needed to investigate. 

Fast forward to today, where it is not uncommon for a SQL Servers to have anywhere from 48-144GB+ RAM installed in them.  These RAM values equate to 32-132GB of buffer cache size depending on the ‘max server memory’ sp_configure option setting and the amount of memory being used for the plan cache, but you probably get the point by now.  If reading 1.7GB-12GB of data every 5 minutes was bad, how bad would it have to be to read 32GB-132GB of data from disk every 5 minutes consistently? 

Today the value 300 is ridiculously small as a threshold for when to start worrying about buffer pool pressure.

Now from my own experiences as a DBA and over the past 6 months as a consultant, the I/O subsystem is one of the most undersized components in a majority of SQL Server implementations, so the last thing I want to be doing is hammering the disks in my server because I relied on a performance counter that was out of date years ago but never updated.  For the last 3-4 years I have relied on the amount of memory being used by the data cache in SQL Server to determine when Page Life Expectancy was a sign of impending problems.  This means I replaced the old 300 threshold with a more reasonable value of (DataCacheSizeInGB/4GB *300) as a basis for when to begin investigating things. 

Keep in mind that short drops in Page Life Expectancy can be the result of a large query executing and are not a sign of problems in the system (though it may be a sign that you have a tuning opportunity).

Summary: don’t use the value 300 as any kind of Page Life Expectancy threshold. Anyone that continues to recommend doing so is doing you a disservice. Use an adaptive formula like (DataCacheSizeInGB/4GB*300). Even better – monitor the steady-state value of PLE and react when it dips *and stays* below your steady-state value.

For the last few years, I have always had a personal virtual playground for SQL Server setup on my laptop that allowed me to not only learn about new features in SQL Server, but also demonstrate complex topics hands on during presentations.  I recently had to rebuild my playground from the ground up and as a part of doing this; I decided that I would blog about all the steps I took to build a completely FREE environment to play with SQL Server, including a virtual iSCSI “SAN” to support failover clustering.  I have been using VMware Workstation personally for the last few years as the platform for my playground, but only because I received a free license a few years ago when I spoke about virtualizing SQL Server using ESX at the VMware Open Forum in Orlando, FL.  One of the requirements for building this playground is that the virtual machines have to support 64 bit guests, which is not supported by Microsoft Virtual PC but is supported by VMware Workstation.  However, as much as I love my copy of VMware Workstation, it isn’t free and you may not have the budget to purchase it as a base platform, so for the purposes of this series I decided to go with VirtualBox, which is a completely free alternative that also supports 64 bit guest VMs.

To get started you are going to have to download and install VirtualBox from their site:

http://www.virtualbox.org/

While we are covering downloads, the following additional items will be needed to setup the environment:

Windows Server 2008 R2 with Service Pack 1 Evaluation Edition
SQL Server 2008 R2 Evaluation Edition
SQL Server Denali CTP3
Microsoft iSCSI Software Target 3.3 for Windows Server 2008 R2
VMware ESXi VSphere 4.1 Evaluation 

With VirtualBox installed, I start out by building a “base” VM installation that I clone to create all of the other VMs that I need in my playground.  By creating a standardized “base” or template VM I save a ton of time in the long term by not having to install an patch Windows Server 2008 R2 a half dozen times, I can do it once and then by using the sysprep functionality in Windows, I can reuse the image for each of my servers.  I can also add the basic set of features to the template, for example .NET Framework 3.5.1, Failover Clustering, and Multipath I/O to keep from having to manually add them to my VMs that specifically need them.  Some of the VMs that get created from the clone will not need these features specifically, for example the Active Directory Domain Controller we are going to create, but these features don’t impact the ability of the machine to perform as a Domain Controller for the environment.

To get started building the base VM template, create a new VM in VirtualBox (Machine > New) and configure the VM options.  For my template, I chose to allocate 1024MB RAM and create a dynamically expanding virtual hard disk that was 30GB in size, using the Windows Server 2008 x64 template for the VM.  Once the VM is created, you will need to edit the VMs settings to configure the additional networking, and if appropriate additional vCPUs.   For the Networking configuration, I leave NAT enabled for Adapter 1, and then configure internal networking for Adapter 2, 3 and 4 with Adapter 2 as a Internal Network named “Domain Network” and Adapter 3 and 4 on another Internal Network named “iSCSI Network”. 

image      image

Once the virtual networking has been configured for the VM, the last thing to do before powering the VM on is to set the CD/DVD device to the Windows Server 2008 R2 Evaluation Edition ISO file that was downloaded from Microsoft.

image

Once the VM boots it will immediately begin running the Windows Server 2008 Setup off the ISO and you will be on your way to setting up the template VM to base the rest of the servers needed in the environment off of.  Once Windows Server 2008 R2 is installed, I add the three features I mentioned previously in this post to the VM; .NET Framework 3.5.1, Failover Clustering, and Multipath I/O.  Once these have been installed, I run Windows Update to download and apply the most recent updates for Windows to the Server and then allow it to reboot.  Once the VM reboots, depending on how anal retentive I intend to be with the environment, I may or may not disable the Windows Firewall, but as a best practice it should be left turned on.  After I complete the configuration of the “base” template I want for the OS, it is time to run sysprep and reset the VM image for cloning to create multiple VMs based on its image.  Sysprep can be executed from Start > Run.  The options for sysprep should be set to “Enter System Out-of-Box Experience (OOBE)” and the checkbox for Generalize should be checked.

image

Once the VM shuts down it can be used to create clone VMs to build out the necessary infrastructure for the playground.  In the next post in this series, we’ll build the VM clones for the environments Active Directory Domain Controller and iSCSI “virtual” SAN using the Windows Server iSCSI Target.

Yesterday I received an email asking me a question related to my Using a Certificate Signed Stored Procedure to Execute sp_send_dbmail on SQL Server Central that intrigued me enough that I had to put a couple of minutes into actually figuring out what the problem the being encountered was.  In the article, I show how to create a stored procedure in a user database that can be used to call sp_send_dbmail without the calling user being in the DatabaseMailUserRole role in msdb, and there are actually a number of uses for this implementation beyond the example that is provided in the article.  I use this same method to allow Service Broker activation procedures to send me emails through Database Mail all the time, and there are numerous other cases where you might need to create a stored procedure in one database that executes code in a separate database, where the calling user should not have access to the separate database. 

I don’t know the exact use case for the problem that was posed in the email question, but the basic tenet of the question was that there was a stored procedure in one database that needed to be called from multiple databases and the intent was to use the same certificate to sign stored procedures in multiple databases to call this centralized stored procedure without giving the users access to the central database.  Is that confusing to you?  I confused me initially when I read the email, and it took a second pass for me to actually understand the problem being encountered and then build out a repro of the issue to begin looking at what the problem might be.  To keep things simple and non-confusing if that is possible, lets first create two databases, two procedures and build out the basic framework from the article for two user stored procedures.

-- Create the calling database
CREATE DATABASE a;
GO
-- Create the target database
CREATE DATABASE b;
GO
-- Switch to the target database
USE b
GO
-- Create a table and insert 10 rows into the table
CREATE TABLE SomeTable (RowID int identity primary key)
GO
INSERT INTO SomeTable DEFAULT VALUES
GO 10
-- Create the target stored procedure that selects from the table
-- in the target database.
CREATE PROCEDURE Select_SomeTable
AS
SELECT * FROM SomeTable
GO
-- Switch to the calling database
USE a
GO
-- Create a stored procedure that calls the stored procedure
-- in the target database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create a certificate to sign the calling stored procedures with
CREATE CERTIFICATE [SigningCertificate]
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd'
WITH SUBJECT = 'Certificate for signing a Stored Procedure';
GO
-- Backup certificate so it can be create in master database
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Add Certificate to Master Database
USE [master]
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Create a login from the certificate
CREATE LOGIN [SigningLogin]
FROM CERTIFICATE [SigningCertificate];
GO
-- The Login must have Authenticate Sever access
-- per http://msdn.microsoft.com/en-us/library/ms190785.aspx
GRANT AUTHENTICATE SERVER TO [SigningLogin]
GO
-- Create a user in database b for the Login
USE b
GO
CREATE USER [SigningLogin] FROM LOGIN [SigningLogin]
GO
-- Grant EXECUTE on the target stored procedure to the 
-- certificate based login
GRANT EXECUTE ON [dbo].[Select_SomeTable] TO [SigningLogin]
GO
-- Switch to the calling database
USE a
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create a test login to test that the certificate signed procedure
-- can properly execute the target procedure without the login having
-- access to the target database or target procedure directly
USE master;
GO
CREATE LOGIN [testuser] WITH PASSWORD=N't3stp@$$', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO
-- Create the test user for the test login in the calling database and
-- grant it EXECUTE privileges on the calling stored procedure
USE [a]
GO
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE a.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

In the above code, database “a” has a stored procedure that calls a stored procedure in database “b” and the “testuser” login only has access to database “a” and EXECUTE privileges on the stored procedure in database “a”.  The certificate from database “a” is backed up and created in the “master” database so that a certificate login can be created from the certificate which has the EXECUTE privilege on the stored procedure in database “b”.  The “testuser” login does not have access to database “b” or any of the objects inside of the database but because the stored procedure in database “a” is signed by the certificate and a login has been created in master from the certificate that has access to EXECUTE the stored procedure in database “b”, the “testuser” login can execute the stored procedure in database “a” and get access to the information in database “b” without having to have explicit access to database “b” or the stored procedure contained in that database.

At this point everything is essentially in line with the information contained in the SQL Server Central article I wrote, and it works exactly as shown in the article, even though it is all being applied to user databases and objects.  Now let’s try to extend this functionality to a third database as a caller of the stored procedure in database “b” by creating a new database, the same stored procedure that was created in database “a”, and then creating the certificate in the new database from the previous backup from database “a” and signing the stored procedure using the same certificate.

-- Create a third database to create another calling database
-- and stored procedure to the target database and stored procedure
CREATE DATABASE c;
GO
-- Create the calling stored procedure in the third database
USE c
GO
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create the certificate from in the third database from the 
-- previous certificate backup to allow signing of the procedure
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO

This all works great, right up to the point that we try to sign the stored procedure using the certificate we created from our backup from database “a”.  When we try to sign the stored procedure in database “c” we get the following error back from SQL Server:

Msg 15556, Level 16, State 1, Line 2
Cannot decrypt or encrypt using the specified certificate, either because it has no private key or because the password provided for the private key is incorrect.

So what exactly happened here, the process worked the first time, but now we can’t duplicate it for additional databases that need to call this stored procedure?  The problem is that when we backed up the certificate from database “a” we only backed up the certificate, we didn’t backup the certificate’s private key to a separate file, so we can’t use this same certificate in additional databases to sign stored procedures unless we go back and backup the private key from the certificate as well.

To fix the problem, we first have to drop the stored procedure in database “c” and then drop the certificate without its private key from the database as well.  Then we will need to take a new backup of the certificate from database “a” and specify the WITH PRIVATE KEY clause to backup the private key for the certificate to a separate private key file so that we can recreate the certificate in database “c” with the correct private key to sign the stored procedure.  Once we have done this, we can recreate our stored procedure and sign it with the certificate, then create the database user for the “testuser” login and grant the database user the EXECUTE privilege on the stored procedure in database “c”, and test the configuration by executing the stored procedure in database “c” to ensure that we get the results from database “b”.

-- Fix the problem
USE [c]
GO
-- Drop the calling procedure with the invalid signature
-- Create the calling stored procedure in the third database
DROP PROCEDURE ExecuteDatabaseB_Select_SomeTable
GO
-- Drop the certificate without the private key
DROP CERTIFICATE [SigningCertificate]
GO
-- Backup the certificate with its private key so it can be used in
-- additional databases to sign stored procedures calling the target
-- stored procedure.
USE a
GO
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Create the certificate in our new database with the private key
-- from the new certificate backup with the private key.
USE c
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Recreate the calling stored procedure in the third database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create the database user for the testuser login and grant it
-- the EXECUTE privilege on the calling stored procedure
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE c.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

This all works as expected and we’ve resolved our problem.  This was a really interesting problem because I had never actually anticipated that someone would use the same certificate in multiple databases to sign multiple stored procedures that perform cross database operations without granting the calling users access to the target database.  This repro shows that this is technically possible if you backup the private key for the certificate and restore the private key in the additional databases with the certificate.

See you on the playground!

This blog post is brought to you complements of tonight's dinner meal.  My entire life I have been eating Greek food, mainly because I come from a Greek background on my fathers side, but also because its really good food.  I can’t tell you how many times in my life I have seen someone in my family, usually my mom or my Aunt Anna, prepare and cook spanakopita.  In the past year I’ve helped my mom make it twice in attempts to learn everything she did, but tonight was the first time I actually attempted to prepare the dish from scratch myself.  If you have never had spanakopita, it is basically a spinach pie that is made using phyllo (filo) dough, a paper thin pastry dough that can be incredibly difficult to work with if you don’t know what you are doing. 

Today I decided to take my first shot at making spanakopita and while taste wise it wasn’t the best I’d ever had, it was far from being the worst.  I’d almost call the attempt a success, except for the important detail I learned as soon as I pulled the pan out of the oven and tried to cut into it.  If you recall, the dish is made with a paper thin dough for the crust, and my entire life I can remember my mom cutting the portions in the pan before baking the dish, but I only tonight learned that this is an important step.  If you don’t cut the portions before you bake the dish, you won’t get a clean cut through the upper portion of the crust because it becomes crispy and the paper think layers shatter when you attempt to slice through them.  It doesn’t ruin the dish taste wise, but presentation is out the window as shown in the below picture of the tonight’s left overs.

IMG_2700

You’ve probably noticed that this is not a technical post at all; or is it?  As a trainer, mentor, and student this is a really good example of how the simplest detail can have a significant impact to the final outcome.  The same task may get accomplished, but it may not have the same fit and finish that it would had the detail been emphasized.  In the Army, we have a training doctrine that provides a POI (Program Of Instruction) that is incredibly detailed.  As a TAITC (Total Army Instructor Training Course) certified instructor, I can tell when an instructor at a course I am attending is following the POI for the course verbatim, not due to it sounding robot like and incredibly rehearsed, but because there are NOTES sections in the training that hit very important details that people often miss if they aren’t following the POI.   Keep in mind that these are the training packages that are used Army wide for the specific phase and level of training being conducted, and they have been reviewed numerous times at multiple levels inside of the Army, and they are under constant revision based on experiences from the instructors in the field. 

Consider all the things that you do day to day without really considering the impact of the specific steps that you are taking.  Do you think that you can teach someone how to do those tasks and ensure that you don’t miss some simple step that is important to the final outcome?  My favorite example of this is unfortunately another military example, but it occurred when I first reported to my current unit as a Drill Sergeant Candidate, and I learned about this horrific little black book known as the Drill Sergeant Module Book.  This book is a verbatim reference of how each of the Drill and Ceremony Modules is supposed to be taught to a new recruit to ensure that they get the exact same detail of instruction as any other soldier entering the Army.  The first module made me laugh initially, the Position of Attention, I mean how difficult can it really be to teach someone to stand straight, not move, face forward and not talk while standing in a formation, until I tried it and failed miserably, and then tried again three more times; each time missing other details that are important to the final position.  The module for the position covers every detail head to toe:

7. On the command FALL IN or on the command of execution ATTENTION of Demonstrator, ATTENTION.

8. Bring the heels together sharply on line, with the toes pointing out equally, forming an angle of 45-degrees. Rest the weight of the body evenly on the heels and balls of both feet. Keep the legs straight without locking the knees. Hold the body erect with the hips level, chest lifted and arched, and the shoulders square.

9. Keep the head erect and face straight to the front with the chin drawn in so that the alignment of the head and neck is vertical.

10. Let the arms hang straight without stiffness. Curl the fingers so that the tips of the thumbs are alongside and touching the first joint of the forefingers. Keep the thumbs straight along the seams of the trouser leg with the first joint of the fingers touching the trousers.

11. Remain silent and do not move unless otherwise directed. RELAX.

If you know the module segment that is listed above, you will never miss a single detail required to teach the position of attention to someone.  I had to use the first sentence from the Introduction of the module to actually find the above link to the module book; the memorization of modules is no longer a requirement in the current Drill Sergeant School POI.

I’ve tried and failed a number of times when working on training materials, presentations, or just generally trying to mentor someone, to make the instruction follow a similar pattern to make certain that I always hit the detail points.  However, unless you are willing to constrain yourself to sounding like a robot or sounding like you memorized your presentation entirely, you aren’t going to fit this type of pattern.  Today, Army Drill Sergeants don’t generally memorize the modules for Drill and Ceremony instruction.  In fact it has never been a requirement that a Drill Sergeant had to memorize the entire module verbatim, you just had to hit the key points and details inside the content, and you always have a demonstrator working with you, so if you miss something and they know it, they can exaggerate an incorrect motion that makes the position look wrong so that you see the mistake. 

In technology, we have demonstrations as a part of most of our presentations, and I try to make use of notes in the demo code that I write to provide details that will trigger coverage of a specific topic if I’ve failed to cover it already.  I’ve seen some of my mom’s recipe cards from when I was a kid, and there are notes on those as well that provide a tip or trick that worked best for the specific recipe.  The problem with this is, in our day to day work, it is really easy to miss making these little notes known to others because they have become so ingrained in our way of doing things. 

This post went from dinner to the Army and finally back into technology.  I hope the non-technical nature of the content still makes you think about how you perform some of your technical work as well.

Categories:
General

When Microsoft announced that changes were being made to the Microsoft Certified Masters program for SQL Server 2008 last year I was initially pretty bummed out. I had been hoping to be able to attend one of the onsite MCM rotations at Microsoft. I wasn’t looking forward to the expense associated with that, but I was really interested in the training that was a part of the program more than anything else. When the new MCM exams became available at PASS Summit, I immediately signed up to take the written exam, initially on November 19th, and then due to some technical problems in Orlando, I ended up having to reschedule for January 21st. I found out last week that I passed the written portion of the certification, and immediately scheduled the Lab Exam for this morning, and I found out this afternoon that I had passed the Lab Exam. There has been a lot of discussion about the MCM exams and what someone might have to do to be able to pass them, so in this post I’d like to share my thoughts based on my experiences.

The Written Exam

Out of the two exams you have to take I’d say that the written exam was the worst of the two. Not because it was more difficult in level of content, but because it is multi-guess and those types of exams stress me out. When you arrive at the Prometric testing center, the only two items you should take into the building with you are 2 forms of ID and your car keys. Speaking of the two forms of ID, make sure that you read the requirements well; only one has to have a photo on it, but both have to have your name and a signature. When you walk into the testing center, they check your two ID’s, your keys go into a locker, you have to turn all your pockets inside out to prove there is nothing in them before you can go back to the testing center where they validate your information again, and then told you that you can put put your pockets back where they belong. From there, the test is like any other Microsoft written exam you’ve ever taken; only more intimidating. 

I didn’t feel that the questions asked were overly difficult; they fit inside of the concepts that I was expecting to be on the exam. To be honest I’ve worked harder problems on the MSDN Forums before, but there I don’t have a list of possible answers to choose from and I can always ask for more information before deciding what the correct way for someone to fix a problem is.  The problem with multiple-guess technology exams is that there are more incorrect answers than there are correct ones, and at least to me a lot of times multiple answers seem like they are correct answers and that gets me second guessing myself. I left the written exam uncertain about whether I had passed it or not, and I was feeling like I might have ended up on the wrong side of things.

If you plan to take the written exam, read everything that’s on the SQL Server 2008 MCM Pre-reading and the MCM: SQL Server online training lists, and understand the core concepts behind the subjects. When that list was originally published, I was surprised at how little of the material I hadn’t already read at some point in the past.  A lot of it was read trying to understand a problem someone had asked about on the forums.  Someone today asked me if I’d blog about how I studied and prepared for the exams. I’d have to say that my studying and preparation began four years ago when I started answering questions on the forums, and has continued ever since. People do all kinds of stuff with SQL Server that I would never dream of trying or doing.  They also get to upgrade and play with newer hardware and configurations faster than I ever will be able to, so when something new pops up as a problem, for instance the Power Management default in Windows Server 2008 R2 being Balanced cutting CPU speed in half, I’ve generally seen it by the time I am making similar changes in my own environments.

The Lab Exam

The Lab Exam for the MCM was AWESOME! I really excel at hands on work. And, generally you know when/if you got a question right by the results of the operations being performed. Taking the Lab Exam this morning was like having a week’s worth of the random stuff that pops up at work to make it a bad day, crammed into five and half hours. The exam is pretty much yours to run as you see fit, and I bounced all over the place the first half hour until I had resolved a order of priority for the questions based on what topics I knew the best to what I knew the least.  You do have access to the Books Online inside of the lab environment, but unless it’s just a quick look up and you know exactly what you are going after, it is going to do you absolutely no good.  You actually have to know SQL Server at a 400+ level and you have to know what you are doing from past hands-on experience so that it’s almost second nature.

I spent the last few nights moonlighting trying to watch as many of the MCM: Microsoft SQL Server 2008 Microsoft readiness videos as I could, and I even watched most of them at 1.4 to 1.6 times the regular speed.  If you think Paul Randal is hard to keep up with when he talks normally, give that a shot, you’ll have a whole new perspective the next time you listen to him speak in real life.  I actually don’t recommend that you watch the training videos at a higher speed if you are trying to actually learn from them. I did this because I found that I already knew a majority of the content and I was just trying to catch the stuff I didn’t know in a compressed amount of time.  Watching the videos alone won’t get you past the exam; you have to be able to apply the information covered in the videos and have real practical experience.  Also, you can’t know everything. If I had practical experience in every aspect of SQL Server, the last hour of my lab exam wouldn’t have been quite as stressful as it was. But, don’t worry; you don’t have to get every answer right to pass. Microsoft knows that even masters don’t know everything too!

While working on a problem today I happened to think about what the impact to startup might be for a really large tempdb transaction log file.  Its fairly common knowledge that data files in SQL Server 2005+ on Windows Server 2003+ can be instant initialized, but the transaction log files can not.  If this is news to you see the following blog posts:

Kimberly L. Tripp | Instant Initialization - What, Why and How?
In Recovery... | Misconceptions around instant file initialization
In Recovery… | Search Engine Q&A #24: Why can't the transaction log use instant initialization?
Tibor Karaszi : Do you have Instant File Initialization?

The thought occurred to me today that despite having log files 8GB in size for tempdb, I’ve never really noticed that it takes that long for SQL Server to startup.  So I jumped on twitter and shot a tweet out to Paul Randal (Blog | Twitter) and I also included the #sqlhelp hash tag to see what others in the community thought.  I got a couple of comments, one linking me to Paul Randal’s blog post, another saying test it, and another saying the transaction log is always zero initialized.

When I got home tonight I was still thinking about this and as I went to go test it, I remembered that I had read a blog post once about a trace flag that would output information about zero file initialization.  A quick search and I found the post on the Premier Field Engineers Blog.  So I jumped on a test system I have and added the –T3004 and –T3605 trace flags to the startup parameters.  Since the PFE blog provided the following disclaimer about using these trace flags so will I.

WARNING: These trace flags should be used under the guidance of Microsoft SQL Server support. They are used in this post for discussion purposes only and may not be supported in future versions.

After setting the flags I checked tempdb and it was currently setup with a 4GB transaction log file.  With this information in hand I restarted the instance and once it was online I opened the ErrorLog to look at what –T3004 could tell me about tempdb log file initialization.

2010-05-13 18:42:13.52 spid12s     Clearing tempdb database.
<……….skipped content…………>
2010-05-13 18:42:30.93 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 18:42:30.93 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 18:42:30.93 spid12s     Starting up database 'tempdb'.
2010-05-13 18:42:30.96 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 18:42:30.96 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 18:42:30.96 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

Hmm, so the log file is zeroed for tempdb, but it doesn’t take a rocket scientist to notice that the page counts being zeroed out (17-2=15 total pages) don’t add up to 4GB of space.  To check this, I created a new user database with a 8GB data file, and a 4GB log file to see the output for zeroing out a 4GB transaction log.

2010-05-13 18:45:42.61 spid54      Zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from page 0 to 524288 (0x0 to 0x100000000)
2010-05-13 18:46:20.92 spid54      Zeroing completed on L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf
2010-05-13 18:46:36.35 spid54      Starting up database 'ZeroLog'.
2010-05-13 18:46:36.36 spid54      FixupLogTail(progress) zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from 0x5000 to 0x6000.
2010-05-13 18:46:36.36 spid54      Zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 18:46:36.37 spid54      Zeroing completed on L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf

So a user database with a 4GB log file will zero out 524288 pages which is 4GB of space.  I got a private message on twitter about the topic from Remus Rusanu (Blog | Twitter), telling me that the entire file isn’t initialized at startup, but if you use ALTER DATABASE to grow the size of the tempdb log, the space you grow by will be zero initialized entirely, so to test that I grew the log file out to 8GB in size:

ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'templog', SIZE = 8388608KB )

When I ran this, it returned completed immediately, so I jumped over to the ErrorLog to pull the results and was momentarily confused by what I got back. 

2010-05-13 19:20:24.02 spid57      Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 524288 to 524319 (0x100000000 to 0x10003e000)
2010-05-13 19:20:24.02 spid57      Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

It only initialized 31 pages??  Can that be correct?  So I went back and checked my log files actual size and I remembered this little problem mentioned by Kimberly Tripp on her blog post Kimberly L. Tripp | Transaction Log VLFs - too many or too few? that occurs when you grow the file in 4GB increments.  So I reran the ALTER DATABASE statement and this time it took a minute for the command to complete.  Much better, now I know we did some zeroing out of the file, and the ErrorLog output proved it:

2010-05-13 19:21:17.47 spid57      Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 524319 to 1048576 (0x10003e000 to 0x200000000)
2010-05-13 19:21:49.54 spid57      Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

Much better, so now the begging question is, why isn’t the log file zero initialized in its entirety at startup?  I can’t speak with complete certainty on this, but I would guess that it has to do with the fact that the tempdb transaction log is never used for crash recovery, so it doesn’t really matter that the space isn’t zero initialized.  Paul Randal explains in his blog post, In Recovery… | Search Engine Q&A #24: Why can't the transaction log use instant initialization?, how the parity bits are used during crash recovery to identify where recovery should stop processing log records.  Perhaps full zero initialization is skipped for the tempdb log at startup because the log is never used for crash recovery, but that doesn’t explain why the log, when grown, does perform full zero initialization, unless it is due to the way that the log can wrap around, for example based on the last image above (VLF Usage After Log Reuse), if the log space continues to be used without truncation when the log gets back to FSeqNo 29, the log will have to grow causing the allocation to become non-sequential since FSeqNo 30-35 are still active.  Maybe someone else will explain the reason why the tempdb log has to be zero initialized when grown but not at startup better.

EDIT:

After posting this, I jumped back over to twitter and saw some interesting comments from Brent Ozar (Blog | Twitter) that made me go back and test the impact of deleting the tempdb files from disk and then starting the instance up.  When I did this, the entire file was zero initialized

2010-05-13 20:21:06.21 spid12s     Clearing tempdb database.
2010-05-13 20:21:06.23 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 0 to 1048576 (0x0 to 0x200000000)
<……….skipped content…………>
2010-05-13 20:22:22.81 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 20:22:22.84 spid12s     Starting up database 'tempdb'.
2010-05-13 20:22:22.85 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 20:22:22.85 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 20:22:22.86 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

So the answer is YES the tempdb transaction log is completely initialized when it is first physically created, but after that, its not zero initialized entirely as the instance starts up.  There is a definite difference in startup times on my server when I deleted the files. 

With Files

2010-05-13 18:42:12.54 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 18:42:31.04 spid8s      Recovery is complete. This is an informational message only. No user action is required.

Without Files

2010-05-13 20:21:05.49 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 20:22:22.97 spid9s      Recovery is complete. This is an informational message only. No user action is required.

When the files preexist, instance startup only took 19 seconds, without the files it took 1 minute and 17 seconds.  This actually makes the zero initialization during log growth make sense.  I’d say a fair bit of my misunderstanding of this is the way tempdb is often referred to be as being recreated at restart.  Its not actually recreated based on these tests, but it is cleared, as show by the “Clearing tempdb database.” log entries.  Interesting stuff.

ANOTHER EDIT:

So after thinking about this some more, curiosity got the best of me and I wanted to know what would happen if the size of the tempdb log file on disk was different from the size configured for tempdb.  To test this, I used ALTER DATABASE to change the size of the log file from 8GB to 1GB and restarted SQL Server.  This reduced the size of the log file on startup from 8GB to 1GB and the log showed that it didn’t zero initialize the entire file:

2010-05-13 21:20:45.92 spid12s     Clearing tempdb database.

<……….skipped content…………>
2010-05-13 21:21:02.59 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 21:21:02.59 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:21:02.62 spid12s     Starting up database 'tempdb'.
2010-05-13 21:21:02.63 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 21:21:02.63 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 21:21:02.64 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

In addition to this, the startup time was fast, taking only 17 seconds to complete recovery of the instance.

2010-05-13 21:20:45.23 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 21:21:02.78 spid9s      Recovery is complete. This is an informational message only. No user action is required.

So I shut down SQL Server and renamed the log file to templog_small.ldf and then restarted SQL.  Once recovery completed, I grew the transaction log back to 8GB and once again shutdown SQL Server.  Then I renamed the current 8GB log file to templog_big.ldf and renamed templog_small.ldf to templog.ldf, replacing the 8GB log file with a 1GB log file.  SQL Server recognized the change during startup, and once again zero initialized the entire log.

2010-05-13 21:25:11.72 spid12s     Clearing tempdb database.
2010-05-13 21:25:11.73 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 0 to 1048576 (0x0 to 0x200000000)
<……….skipped content…………>
2010-05-13 21:26:17.95 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:26:28.05 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 21:26:28.05 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:26:28.07 spid12s     Starting up database 'tempdb'.
2010-05-13 21:26:28.09 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 21:26:28.09 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 21:26:28.10 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

This once again impacted the time required to recover the instance.

2010-05-13 21:25:11.01 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 21:26:28.23 spid9s      Recovery is complete. This is an informational message only. No user action is required.

So that leaves me asking, does SQL Server really recreate tempdb from the model database every time it starts?  According to KB Article 307487 :

“When SQL Server starts, the tempdb is re-created by using a copy of the model database and is reset to its last configured size.”

It would certainly seem as if this statement is wrong based on testing.  What I find the most interesting is if I reverse the process and replace a 1GB log file with a previously created 8GB log file, SQL Server doesn’t zero initialize the 8GB file, it just shrinks it back to 1GB.  It only performs the zero initialization as a part of having to grow the preexisting log file during instance startup.

Theme design by Nukeation based on Jelle Druyts