In this survey I’d like to know what the oldest version of SQL Server is in your environment – it has to be an instance that’s being used for a production workload in some way (even just occasionally).
I’ll report on the results next week.
Thanks!
In this survey I’d like to know what the oldest version of SQL Server is in your environment – it has to be an instance that’s being used for a production workload in some way (even just occasionally).
I’ll report on the results next week.
Thanks!
As of this afternoon all eight classes we’re doing in Chicago in April/May are sold out – that’s 160 people we’ll be training!
The reason for this post is to remind those of you who are planning to come to one of our London classes in June that they’re all more than half full already and we expect them to sell out in the next 2-3 weeks. We had multiple people wait too long to register for Chicago and we were able to add extra IE1 and IEBI classes to cope with demand, but we can’t do that in London.
Don’t forget that the early-bird price for IE1 in London expires Friday 19th!
Check out the London classes here - hope to see you there!
A while back I kicked off a survey asking what mechanism you use for running your regular SQL Server database maintenance.
Here are the results:
The “Other” responses were:
The main purpose of this survey is to show people that there are freely-available and comprehensive scripts that you can download to help run your regular database maintenance, and that many, many people use them in production.
I don’t like to recommend using the SSMS Maintenance Plan Wizard. It has quite limited options, has had a number of high profile bugs in the past, and to this day, even in SQL Server 2012, it still allows you to perform regular shrink operations without any warning as to the side-effects.
Home grown scripts are OK, but time after time when I’m reviewing client maintenance scripts I see coding errors, lack of logging of what happened, and lack of useful error handling. An example of a common error is in code to figure out which indexes are fragmented, where the results from sys.dm_db_index_physical_stats are not filtered by alloc_unit_type_desc, so there are false positives from LOB_DATA and ROW_OVERFLOW_DATA allocation units. In one client, many of their large clustered indexes had no logical fragmentation but were being rebuilt every night needlessly because of benign LOB_DATA fragmentation, generating a ton of extra transaction log that had to be backed up.
More than 40% of the almost 500 respondents use some or all of Ola Hallengren’s Maintenance Solution, and I always recommend our clients download and play around with Ola’s scripts before asking us to write customized code for them, or at least let us use Ola’s code where possible to save them consulting time. Ola’s Maintenance Solution has won multiple awards and is very widely used in the SQL Server community. There are also some popular scripts for performing index maintenance from SQL Server MVPs Michelle Ufford and Tara Kizer, but Ola’s are the gold standard and allow you to do backups, index and statistics maintenance, and consistency checks.
One of the cool things about using Ola’s scripts is that they’re tested constantly by thousands of installations around the world and they’re very robust. If you’ve never seen them, or want to upgrade your database maintenance, check them out!
I haven’t done any surveys this year and I’d like to get back into doing one every week or so for the next few months.
In this survey I’d like to know what mechanism(s) you use to perform your regular database maintenance.
I’ll editorialize the results in a week or so.
Thanks!
Back in February I ran a bunch of performance tests of DBCC CHECKDB on SSDs, to evaluate the effects of degree-of-parallelism (DOP) and various options and traceflags, and now I’m finally getting around to presenting the results. Make sure to also read the recent post where I talk about the detrimental effect of computed-column indexes on DBCC CHECKDB performance.
The two variables I altered during my tests were:
Trace flags 2549 and 2562 are described in KB 2634571, and can be summarized as:
These two trace flags really only apply to DBCC CHECKDB when the WITH_PHYSICAL_ONLY option is used.
My test system is as follows:
The complete results are shown below:
There are some very interesting results from this graph, for the test I’m doing using SSDs:
For the scalability result running a full DBCC CHECKDB, some latch contention for structures inside DBCC CHECKDB starts to show up at DOP 8, but not enough to give zero scalability at higher DOPs. This comes from a much higher time spent waiting for CXPACKET waits, roughly doubling from DOP 8 to DOP 16, and again from DOP 16 to DOP 32. I blame the query processor for that :-) In all seriousness, I didn’t expect to see a scalability gain with DOP higher than 8, based on what I’ve seen in the past.
The results without DOP 1 and DOP 2 are shown below, which allows us to focus on the PHYSICAL_ONLY results:
Again, for the test I’m doing using SSDs, there are some interesting results:
When running with a database on SSDs and very fast processors, full DBCC CHECKDB performance scales to DOP 8, and the main performance gain comes from using the WITH PHYSICAL_ONLY option. The two trace flags introduced in late 2011 don’t have a pronounced effect on PHYSICAL_ONLY performance. This is all to be expected, as the SSDs really remove the I/O component of DBCC CHECKDB performance and scalability from consideration.
Although these tests are not conclusive until I’ve run similar tests on a slower I/O subsystem, I’m tempted to recommend limiting DBCC CHECKDB‘s DOP to 8 if you can (maybe using Resource Governor) to limit its resource drain on your system.
I obviously also have to recommend using SSDs to vastly reduce or effectively eliminate (in my case) I/O waits.
During April and May I’ll perform the same set of tests using 15k SCSI drives and also 7.2k SATA drives to see if and how the results differ.
Hope you found this interesting!
While Kimberly was selecting underwater photos for our new website, she came across this one from our dive trip to the remote NE of Indonesia in January 2012.
After several people tried to do an underwater hand-stand, I gave one of our buddies my camera and showed them how it’s done in the middle of an archway. Not the most flattering position for a photo :-)
Click the image for a full-size version. We were at a dive site called Boo Jendela where there are fantastic swim-throughs. Click here for a Google Map of where we were based on the GPS reading I took.
Late last year we finally decided to make time to revamp our blogs and website, and move onto WordPress. I really wish we’d done it a lot sooner as WordPress is such a joy to work with compared to hand-coded ASP.NET, but client work always seemed to get in the way. Now we’ve upgraded, it’s so much easier to change website content and write blog posts.
Kimberly sourced our new logo using the 99 Designs website, which allows artists around the world to compete with each other to win a design competition. Our new blogs went live in January and we just launched our new website on Wednesday. It took us almost five months of elapsed time to do it, mostly waiting for me to populate content, but we did it all ourselves, on our own server hosted by Amazon. Jonathan and I spent many, many hours on con-calls figuring stuff out, and Jonathan now knows more about WordPress and WordPress plug-ins than he ever wanted to know and was the driving force behind getting this all done (he really is utterly fabulous – thank you Jon!). Overall, we’re really proud of what we achieved together – it’s a huge improvement over the old site.
By the way, if you haven’t seen them yet, check out the image rotators on the home page and on the training pages which showcase some of Kimberly’s finest underwater photography. You can get hi-res version of those photos here.
To celebrate getting our new website completely live, we’re offering all new clients a special offer. If you sign up for our signature SQL Server Health Check service during the month of April, it will only cost you US$2500. This covers the health check of a single SQL Server instance, plus up to an hour of email/phone review with us about the results, once you’ve read our comprehensive report. That’s more than a 1/3 saving over the usual price!
Check out our SQL Server Health Check description and then let us know that you’re interested.
We’re looking forward to helping you. It’s what we do best.
It’s no secret that DBCC CHECKDB has some performance quirks based on the schema of the database being checked and various kinds of corruptions. I was recently doing some scalability testing of DBCC CHECKDB for a blog post and discovered quite a nasty performance issue that exists in all versions of SQL Server back to SQL Server 2005. This isn’t a bug, it’s just the way things work.
The issue occurs when a nonclustered index exists that has a computed column as part of the index key or as one of the INCLUDEd columns and affects DBCC CHECKDB, DBCC CHECKFILEGROUP, and DBCC CHECKTABLE.
As a bit of background, here’s an excerpt from the SQL Server 2012 Internals book where I describe one of the ways nonclustered indexes are checked for consistency:
The nonclustered index cross-checks verify that
The mechanism to carry out these checks efficiently has changed in every release since SQL Server 7.0—becoming progressively more and more efficient. In SQL Server 2012, two hash tables are created for each partition of each nonclustered index—one hash table is for the actual records in that partition of the nonclustered index, and the other is for the records that should exist in that partition of the nonclustered index (as calculated from the existing data records in the table).
When a nonclustered index record is processed, all columns in the record are hashed together into a BIGINT value. This includes
The resulting value is added to the master hash value for actual records for the nonclustered index partition of which the record is part.
DBCC CHECKDB knows which nonclustered indexes exist for the table and what the complete nonclustered index record composition should be for each. When a data record is processed, the following algorithm is run for each matching nonclustered index record that should exist for the data record (taking into account any filter predicates for filtered nonclustered indexes):
The premise that this algorithm works on is that if no corruptions exist, the master hash values for the actual records and “should exist” records for each nonclustered index partition should match exactly at the end of the DBCC CHECKDB batch.
When a nonclustered index uses a computed column, the value of the computed column has to be computed based on the column definition. To do that, an internal mechanism called an ‘expression evaluator’ is created. The expression evaluator is provided by the Query Processor code and its behavior is entirely outside the control of DBCC CHECKDB. The drawback of the expression evaluator is that every time it is used, an exclusive latch must be held by the thread using it. This creates an incredible bottleneck and drastically affects performance.
My test system is the Dell R720 we bought last year, with 64GB of memory and 2 E5-2670 CPUs with 8 physical cores and hyperthreading enabled. It’s running SQL Server 2012 SP1 CU3.
The test database is AdventureWorks that Jonathan expanded to 500GB for me using his cool scripts. The database is split over 8 data files stored on two 320GB Fusion-io drives, with tempdb and its log placed on two more 320GB Fusion-io drives. I set things up this way to remove I/O waits from the test.
The first few runs of my testing, with unlimited parallelism produced DBCC CHECKDB elapsed times of around 340 minutes. This seemed incredibly slow to me, so I ran another test and looked at the output of sys.dm_os_waiting_tasks. I discovered that half the threads at any time were waiting for exclusive access to the DBCC_OBJECT_METADATA latch. Adding in some diagnostics to capture waits and latches gave the data below:
WaitType Wait_S Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S --------------- ---------- ----------- --------- ---------- ----------- ---------- --------- --------- CXPACKET 684482.45 682667.13 1815.32 60294236 34.90 0.0114 0.0113 0.0000 OLEDB 659325.08 659325.08 0.00 207661462 33.62 0.0032 0.0032 0.0000 LATCH_EX 615168.39 605357.28 9811.11 798224634 31.37 0.0008 0.0008 0.0000 LatchClass Wait_S WaitCount Percentage AvgWait_S ------------------------ ---------- ---------- ----------- ---------- DBCC_OBJECT_METADATA 611768.00 764845636 99.16 0.0008
This didn’t make any sense to me as I couldn’t believe that such an incredible bottleneck existed – almost 1ms per latch wait and 800 million latch waits! – surely we’d have heard about it by now? I pinged my friend Ryan Stonecipher who inherited the DBCC CHECKDB code from me and he reminded me about the expression evaluator behavior.
I went back to my test system and looked for nonclustered indexes using computed columns using the code below:
SELECT
[s].[name],
[o].[name],
[i].[name],
1.[name],
[ic].*
FROM sys.columns 1
JOIN sys.index_columns [ic]
ON [ic].[object_id] = 1.[object_id]
AND [ic].[column_id] = 1.[column_id]
JOIN sys.indexes [i]
ON [i].[object_id] = [ic].[object_id]
AND [i].[index_id] = [ic].[index_id]
JOIN sys.objects [o]
ON [i].[object_id] = [o].[object_id]
JOIN sys.schemas [s]
ON [o].[schema_id] = [s].[schema_id]
WHERE 1.[is_computed] = 1;
GO
And I found 6 nonclustered indexes with computed columns in, on some of the largest tables in the database. I then disabled those nonclustered indexes and re-ran the DBCC CHECKDB tests.
17-18 minutes per run.
Wow! The bottleneck from the expression evaluator was causing DBCC CHECKDB to run about 20 times slower.
The wait and latch details for the test runs with the indexes disabled are as below, which is what I expected to see from running DBCC CHECKDB:
WaitType Wait_S Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S --------------- ---------- ----------- --------- ---------- ----------- ---------- --------- --------- CXPACKET 33064.56 29331.74 3732.83 42034533 48.14 0.0008 0.0007 0.0001 OLEDB 28883.78 28883.78 0.00 173940154 42.05 0.0002 0.0002 0.0000 LATCH_EX 5021.20 4605.50 415.70 30340659 7.31 0.0002 0.0002 0.0000 LatchClass Wait_S WaitCount Percentage AvgWait_S ------------------------ ---------- ---------- ----------- ---------- DBCC_CHECK_AGGREGATE 5039.36 30267451 98.82 0.0002
There’s nothing we can do about this bottleneck except to disable nonclustered indexes on computed columns while DBCC CHECKDB is running and then rebuild them afterwards, which is not a very palatable solution. I wanted to blog about this just to make the issue known and help any of you who are beating your head against a wall trying to figure out poor DBCC CHECKDB performance.
For those of you who were planning to register for our IE1 Immersion Event (on Internals and Designing for Performance) in Chicago at the end of April, I have bad news and good news…
The bad news is that the class reached maximum capacity (30 students) and is sold out.
The good news is that we’ve added another IE1 in Chicago from May 20-24, which opens up another 30 IE1 seats!
This means we have seven classes in Chicago over the four weeks from April 29 – May 24. You can see all the details here.
We hope to see you there!
PS Please note: we do not plan to add any other Immersion Events in 2013 other than what’s currently scheduled. We chose to add another IE1 course in Chicago because we had a few folks on the wait list for which these dates work and, we’re already in Chicago during that week. Adding a class was relatively easy to do because the hotel had availability and all of our gear would already be onsite. We will NOT be adding any other courses or locations in 2013. We are planning a new location in 2014 as well as a new course – IE5: Immersion Event on Relational Data Warehousing. Stay tuned and we hope to see you at an Immersion Event soon!
Have you ever wondered why DBCC CHECKDB, for instance, doesn’t flush the buffer pool when it runs on a database that is larger than the memory capacity of the server?
There’s cool mechanism that it uses called disfavoring. When a buffer is no longer required, the buffer can be marked as the least recently used of all the buffers in the buffer pool, meaning that it will used next when the buffer pool needs a new buffer to read a page from disk. This means that a small number of buffers will be used repeatedly instead of filling the buffer pool with pages from the database being consistency checked.
This mechanism isn’t anything to do with DBCC, but DBCC makes good use of it. DBCC CHECK* commands, DBCC SHOWCONTIG, DBCC UPDATEUSAGE, and the sys.dm_db_index_physical_stats DMV (which is really part of DBCC) all use this.
Let me show you…
On our fast test machine at home (rack-mounted Dell R720) I’ve got a 500GB database called AdventureWorks_Big residing on four Fusion-io SSDs. First off I’ll show you the table I’m going to use:
SELECT [page_count] FROM sys.dm_db_index_physical_stats ( DB_ID (), OBJECT_ID (N'Production.TransactionHistoryEnlarged'), 1, NULL, N'LIMITED'); GO
page_count ----------- 18114867
It’s more than 138GB, and my buffer pool is 56,000MB. Now I’ll make sure the buffer pool is empty, and then run DBCC CHECKTABLE on that table:
DBCC DROPCLEANBUFFERS; GO DBCC CHECKTABLE (N'Production.TransactionHistoryEnlarged') WITH NO_INFOMSGS; GO
Five minutes later it completes. But how much of the table did it read into the buffer pool?
SELECT *, [DirtyPageCount] * 8 / 1024 AS [DirtyPageMB], [CleanPageCount] * 8 / 1024 AS [CleanPageMB] FROM (SELECT (CASE WHEN ([database_id] = 32767) THEN N'Resource Database' ELSE DB_NAME ([database_id]) END) AS [DatabaseName], SUM (CASE WHEN ([is_modified] = 1) THEN 1 ELSE 0 END) AS [DirtyPageCount], SUM (CASE WHEN ([is_modified] = 1) THEN 0 ELSE 1 END) AS [CleanPageCount] FROM sys.dm_os_buffer_descriptors GROUP BY [database_id]) AS [buffers] ORDER BY [DatabaseName]; GO
DatabaseName DirtyPageCount CleanPageCount DirtyPageMB CleanPageMB ------------------- -------------- -------------- ----------- ----------- AdventureWorks_Big 0 188 0 1 master 3 77 0 0 model 0 32 0 0 msdb 1 39 0 0 Resource Database 0 24 0 0 tempdb 260 179 2 1
Cool eh?
There are some other places that disfavor buffers when they’re done with them, including:
And backups don’t use the buffer pool at all so they’re not going to cause buffer pool flushing either.
You can watch disfavoring happening for table scans using the leaf_page_disfavored Extended Event.
Bottom line – some of your common maintenance activities don’t damage the buffer pool, which is a good thing.