Friday, October 19, 2007

During SQL Server 2005 development I did a comprehensive rewrite of the Books Online entries concerned with estimating index, table, and database size (see http://msdn2.microsoft.com/en-us/library/ms187445.aspx). I was recently having a discussion with Øystein Sundsbø about a bug in one of the formulas I wrote and it turns out he's written a neat tool that codifies all the formulas.

In case you're interested, the part of the formula I had incorrect was for figuring out how many pages would be needed for the upper levels of an index b-tree. My formula was:

Num_Index_Pages = ∑Level (Index_Rows_Per_Page)Level – 1
where 1 <= Level <= Levels

and Øystein came up with a better formula:

Num_Index_Pages = ∑Level (Total_Num_Rows/(Index_Rows_Per_Page)Level )
where 1 <= Level <= Levels

where in both formulas:

Levels = 1 + log Index_Rows_Per_Page (Total_Num_Rows / Index_Rows_Per_Page)

Check out Øystein's cool tool at http://dbgoby.blogspot.com/2007/10/db-goby-v10.html!

Friday, October 19, 2007 11:26:04 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, October 18, 2007

Today I finally managed to hook-up with SQL Server MVP and Regional Director Greg Low so he could interview me for his podcast show - we'd spent the last few months juggling schedules and time-zone differences (he's 17 hours ahead of Redmond) but today the stars aligned and we made it. You can download the interview at www.SQLDownUnder.com - show #24. Thanks Greg!

Ok - this is really geeky, but I love making models, and especially the harder Lego models. Lego is one of the coolest toys ever and I've been a big fan since I was a small child. Lego announced in the Spring their biggest Lego model ever - a large scale Ultimate Collector's Millenium FalconTM with 5195 pieces (check it out here on the Lego site). I pre-ordered mine right away!

Kimberly was away teaching for Microsoft in India the first two weeks of October this year. So what does Paul do when Kimberly's away? Well, apart from feverishly blogging on my new blog, I made the Falcon, which arrived the day before she left. I reckon it took me about 25-30 hours total time over the course of two weeks.

Here's a picture of the finished model, with a Diet Pepsi can alongside to give you an idea of the scale (click on it to get to a hi-res image you can scroll around to see the detail). Below is a bit of detail blow-up featuring my nick-name-sake Chewie.

 

      Pretty cool eh? :-)

 

 

 

 

 

 

Thursday, October 18, 2007 6:17:11 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

There's been some discussion over on the SQL Server Central forums about database mirroring performance, and one question in particular on how to do index rebuilds on mirrored VLDBs.

Remember that in database mirroring, the database has to be in the full recovery mode, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

  • In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror's transaction log, and thus led to a decrease in transaction throughput on the principal.
  • In asynchronous mirroring, the additional log being generated could again overload the network link - but this time there's no requirement for transactions to harden in the mirror before they can commit on the principal, so instead the SEND queue on the principal gets really large. Now, in asynchronous mirroring the SEND queue represents the amount of work that will be lost if a failover occurs, as its all the transaction log that hasn't yet been sent to the mirror. So, in asynchronous mode, a large index rebuild operation could lead to increase exposure to data loss in the event of a failover.

In SS2008, the log stream compression I blogged about here should go a long way to alleviating this problem. There are three ways I can think of to reduce the amount of transaction log generated by and index maintenance plan in SS2005:

  1. Use a potentially less expensive (in terms of logging) solution for removing index fragmentation. Doing an index rebuild is guaranteed to generate an equivalent amount of transaction log to the size of the index being rebuilt, no matter how much fragmentation there is - because an index rebuild always rebuilds the entire index. The alternative is to do an index reorganize, either using my old DBCC INDEXDEFRAG or the new ALTER INDEX ... REORGANIZE. These will only generate transaction log when index pages are compacted and reorganized - so for less heavily fragmented indexes. There's no hard and fast rule here but I generally say where Logical Scan Fragmentation/Average Fragmentation in Percent from DBCC SHOWCONTIG/sys.dm_db_index_physical_stats, respectively, is less than 30%. You also need to consider page density too - but really this is a topic for a whole other post.
  2. Be very selective on which indexes you choose to rebuild/reorganize. A lot of people have a maintenance plan that does this for every index every week, without checking whether the index is even fragmented or whether removing fragmentation for an index improves workload performance. Again, this a whole other topic but there is an old whitepaper for SS2000 I helped with that describes some of this - Microsoft SQL Server 2000 Index Defragmentation Best Practices.
  3. Partition the tables/indexes so that the changing portion of the data is the only portion that's affected by index maintenance. If most of your data is read-only, there's no point in having it included in reindex/reorganize operations, right? Kimberly recently wrote a blog post about such an architecture here.

For more info on database mirroring performance considerations, checkout the whitepaper Database Mirroring: Best Practices and Performance Considerations. There's also a slide deck presentation based on this whitepaper that's been presented at various TechEds (I did it in China and Hong Kong last year and Kimberly did it in South Africa last year) - you can download it from the Hong Kong website here.

The bottom line is when mirroring is in the mix, you need to be more intentional with your database maintenance.

Thursday, October 18, 2007 4:37:43 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, October 17, 2007

Thanks for your patience and to all those who emailed to let me know. All the SQLskills.com blogs have been updated to the latest dasBlog version and everything's working again. I'd appreciate you taking the time to go back and re-enter any comments you tried to over the last few days.

Thanks!

Wednesday, October 17, 2007 4:52:22 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, October 15, 2007

(I've heard from many of you that the Comments feature of my blog isn't working. I know - there's an issue with our blog engine that we're fixing. My apologies - I'll post a quick note when it's fixed.)

After posting last week about a BACKUP feature that I don't like (WITH NO_LOG - see here), I thought I'd do a quick post this week about a feature that was introduced in SS2005 for BACKUP that I DO like - the COPY_ONLY option to BACKUP DATABASE and BACKUP LOG.

Here's a situation I've seen several times that really screws people up. A savvy DBA of a busy web-fronted sales business has a rigorous backup schedule setup - daily full backups at midnight and differential backups every 4 hours. Everything's working perfectly. One day a disaster strikes at 7pm and the storage for the database is destroyed. The DBA starts restoring the backups using WITH NORECOVERY, gets to the noon differential backup and gets the following message:

RESTORE DATABASE production FROM DISK = 'c:\sqlskills\production-diff12pm.bck' WITH NORECOVERY;

GO

Msg 3136, Level 16, State 1, Line 1
This differential backup cannot be restored because the database has not been restored to the correct earlier state.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Uh-oh. That's not good. That says that the 12pm differential backup does not have the same differential base as the 4am and 8am ones. How can that have happened? The DBA takes a look in the backup history tables in msdb (stripping out all the log backups):

SELECT name, backup_start_date, type, first_lsn, database_backup_lsn

FROM msdb.dbo.backupset WHERE database_name = 'production';

GO

name                           backup_start_date       type first_lsn            database_backup_lsn
------------------------------ ----------------------- ---- -------------------- --------------------
production Full 10/14/07       2007-10-14 00:00:00.000 D    88000000025300001    0
production Diff 4am 10/14/07   2007-10-14 04:00:00.000 I    118000000003000160   88000000025300001
production Diff 8am 10/14/07   2007-10-14 08:00:00.000 I    144000000070500160   88000000025300001
NULL                           2007-10-14 10:29:50.000 D    161000000056100147   88000000025300001
production Diff 12pm 10/14/07  2007-10-14 12:00:00.000 I    161000000062800034   161000000056100147
production Diff 4pm 10/14/07   2007-10-14 16:00:00.000 I    173000000054100144   161000000056100147

Aha! Look at the highlighted date in the output - someone took a full database backup of the database at 10.29am. The DBA checks and finds that one of the developers wanted a copy of the production database to play with so took a database backup. He restored the backup and then deleted both it and the database. Looking at the database_backup_lsn field, we can see that all the backups up till the accidental backup (look at the LSNs highlighted blue) have the differential base equal to the first_lsn of the full backup from midnight. The two backups after that have the differential base equal to the first_lsn of the accidental full backup (the LSNs highlighted red).

Oops! That means that the production database cannot be rolled forward any further than the last log backup before the accidental full backup was taken - losing more than 8 hours of data completely. Even though all the subsequent backups are intact, the initial full backup for them no longer exists so they're useless!

So how can a developer get a copy of the database without screwing up a recovery from a potential disaster? Using the new COPY_ONLY option. Taking a full backup with this option does not make the new backup a differential base - it does not clear any of the differential bitmaps and basically doesn't interfere with the regularly scheduled backups. Apart from that, it's a regular full backup of the database. One thing to bear in mind is that it's a one-off - you can't use one of these backups as a differential base, so you can't take COPY_ONLY differential backups. If you specify COPY_ONLY with DIFFERENTIAL, the COPY_ONLY option is ignored.

One other cool thing is that you can specify this option for a BACKUP LOG command too. This behaves the same way - it takes a log backup, but does not change the transaction log at all (i.e. it doesn't make any portion of the log inactive and permit log truncation). This is useful for doing online file restores without having the necessary backup of the tail of the log affect the log backup chain. More on that in a later post...

[Edit: In the initial version of this post, the DBA's backup strategy included log backups. I went through a couple of versions of this post before settling on full + diffs, but I forgot to remove the reference to log backups. In the first comment, Mark House correctly points out that an accidental full backup doesn't prevent a DBA with a complete log backup chain from recovering to any point in time. Apologies for the confusion!]

Monday, October 15, 2007 1:37:13 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Friday, October 12, 2007

All the razzamatazz about new releases go on about what's new but hardly ever is there discussion of what's been removed. So that's the topic of this short post.

In the Books Online that comes with the July CTP of SS2008 (here's a link to the download page), the topic Discontinued Database Engine Functionality in SQL Server 2008 lists the following:

Category Discontinued feature Replacement

Aliases

sp_addalias

Replace aliases with a combination of user accounts and database roles. For more information, see CREATE USER (Transact-SQL) and CREATE ROLE (Transact-SQL). Remove aliases in upgraded databases by using sp_dropalias (Transact-SQL).

Backup and restore

DUMP statement

BACKUP

Backup and restore

LOAD statement

RESTORE

Backup and restore

BACKUP LOG WITH NO_LOG

None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.

Backup and restore

BACKUP LOG WITH TRUNCATE_ONLY

None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.

Backup and restore

BACKUP TRANSACTION

BACKUP LOG

Backup and restore

sp_helpdevice

Query the sys.backup_devices catalog view.

Compatibility level

60, 65, and 70 compatibility levels

Databases must be set to at least compatibility level 80.

DBCC

DBCC CONCURRENCYVIOLATION

None

Groups

sp_addgroup

Use roles.

Groups

sp_changegroup

Use roles.

Groups

sp_dropgroup

Use roles.

Groups

sp_helpgroup

Use roles.

Sample databases

Northwind and pubs

Use AdventureWorks. Northwind and pubs are available as downloads, or can be copied from a previous installation of SQL Server.

For more information, see AdventureWorks Sample Databases.

Most of these aren't going to bother people much I would guess but removing BACKUP LOG WITH NO_LOG is going to cause problems. Don't get me wrong, I think it's great that it's being removed (see my previous post for why I think it's evil) but it's going to break a lot of people who rely on it now for managing their transaction logs instead of taking log backups or switching to SIMPLE recovery mode.

Friday, October 12, 2007 11:26:00 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This has come up several times over the last few days, and is something that Kimberly and I discuss (and deplore) when we lecture - the use of BACKUP LOG WITH NO_LOG (or TRUNCATE_ONLY - they're synonymous) to allow log truncation.

How is it used?

The common use is when the transaction log grows to be inordinately large on a database that's in FULL (or BULK_LOGGED) recovery mode. This happens after a database backup has been taken, which switches the log into a mode where it won't truncate until its been backed up. In these circumstances, if you don't take a transaction log backup, the log will continue to grow until it runs out of disk space.

The alternative to taking a real log backup is to issue a BACKUP LOG dbname WITH NO_LOG command. Let's see this in action - note that I'm not advocating its use but I want to show you what it does. First off I'll create some transaction log after taking a full database backup:

USE nologtest;
GO

BACKUP DATABASE nologtest TO DISK = 'c:\sqlskills\nologtest.bck' WITH INIT;
GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (8000))
GO

SET NOCOUNT ON
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO t1 VALUES (@a, REPLICATE ('a', 8000));
   
SELECT @a = @a + 1;
END;
GO

How large is the transaction log now?

SELECT name, size FROM sys.database_files;
GO

name            size
--------------- --------
nologtest       90264
nologtest_log   104128

A little bit larger than the data file (which is what I'd expect after the operation I just performed) and they're both around 100MB. I'll pretend that I haven't been paying attention to the size of the database and log and now I don't have any space to perform a backup. Can I just shrink the log?

DBCC SHRINKFILE (nologtest_log, 2);
GO

Cannot shrink log file 2 (nologtest_log) because all logical log files are in use.
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           102944      63          102944      56

No. What's stopping me (well DBCC SHRINKFILE just told me, but let's double-check)?

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

LOG_BACKUP

Ok - so I can't perform a backup so I'll use BACKUP LOG WITH NO_LOG.

BACKUP LOG nologtest WITH NO_LOG;
GO

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

NOTHING

Now it looks like I can do the shrink:

DBCC SHRINKFILE (nologtest_log, 2);
GO

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           256         63          256         56

Success!

Hold on, are you sure?

Why is it bad?

Do you realize what just happened? We discarded the contents of the transaction log since the last full/differential/log backup. That means that until the next full or differential backup completes, all work since the last backup completed will be lost if the database is destroyed. So if your business relies on not losing data/work in the event of a disaster - then what I just did is sacrilege!

The whole point of FULL (and BULK_LOGGED) recovery modes is to preserve the transaction log so that the database can be recovered right up to the point of a failure, or to do point-in-time restores. Using BACKUP LOG WITH NO_LOG negates all of that. (For completeness, note that you can't do a point-in-time restore to any time in a transaction log backup containing a bulk-logged transaction).

What are the alternatives?

If you don't want the FULL recovery mode behavior, then don't use FULL recovery mode - switch to SIMPLE mode. But be aware that you lose the ability to restore to any point in between your full or differential backups.

If you want the FULL recovery mode behavior, but don't want to run out of log space - then the answer is again simple (pun intended :-), take log backups! If your log is growing too quickly, take more frequent log backups! If you don't have enough disk space to store the log backups, talk to your management about the options - buying more disks and continuing to use the FULL recovery options, or being forced to move to SIMPLE recovery mode.

One thing to bear in mind - you actually have to monitor the size of your log to tell whether its growing. That's what gets people into trouble in the first place - a combination of:

  • Not knowing that the database is in FULL recovery mode
  • Not tracking the size of the log

The problem comes for those people who just use an app and have no idea that the developer put the database (maybe for a web application log) into FULL recovery mode, or even that they have SQL Server on their system. For these people there is no good answer unfortunately apart from educating developers.

How to prevent it being used

If you're a sysadmin who wants to stop your database owners and backup operators from using the NO_LOG option, there is way to do it. Trace flag 3231 in SS2000 and SS2005 will turn the NO_LOG and TRUNCATE_ONLY options into no-ops in FULL/BULK_LOGGED recovery mode, and will clear the log in SIMPLE recovery mode. Trace flag 3031 in SS2005 turns them in checkpoints in all recovery modes. These trace flags are undocumented but they're allowed to be publicized.

This way no-one can truncate the log unless you allow them to, and when they come to you to complain, you can educate them!

Friday, October 12, 2007 9:09:43 AM (Pacific Standard Time, UTC-08:00)  #    Comments [13]  | 
Thursday, October 11, 2007

It gives me great pleasure to announce two new additions to the SQLskills team - Stacia Misner and Simon Sabin. Stacia's a BI expert who will be working alongside Liz Vitt, and Simon's a developer expert who will be working alongside Bob Beauchemin. Bringing Stacia and Simon on-board really strengthens the capabilities of the SQLskills team as we now have two widely-known and respected Subject Matter Experts in each of the three areas we operate in:

  • DBA/ITPro - me and Kimberly
  • BI - Liz and Stacia
  • Developer - Bob and Simon

Their blogs are at http://www.sqlskills.com/blogs/stacia and http://sqlblogcasts.com/blogs/simons/, respectively (with a SQLskills.com-hosted blog for Simon coming soon) and their bios are below (in their own words).

Welcome!

Stacia Misner

Stacia Misner has over 23 years of experience with improving business practices through technology and has been providing consulting and education services for Microsoft’s business intelligence technologies since 2000. Prior to becoming an independent consultant, she spent nearly six years at Aspirity  (acquired by Hitachi Consulting) as a member of their Microsoft Gold-Certified Business Intelligence practice for which she developed and delivered a variety of BI courses including Microsoft’s SQL Server Accelerator for Business Intelligence, SQL Server 2005 Business Intelligence Ascend, and Business Intelligence Voyage courses.  Stacia has presented at the Professional Association for SQL Server (PASS), Microsoft’s TechEd, and SQL Server Magazine’s SQL Server 2005 roadshows. She has 8 years of experience in business intelligence architecture and implementation, data warehousing, OLAP, ETL, and reporting and analysis, as well as 15 years of experience in technical project management. Stacia is a Microsoft Certified IT Professional – Business Intelligence Developer, and a Microsoft Certified Technology Specialist – SQL Server 2005 Business Intelligence Development.

Stacia Misner is the author of:

and co-author of:

Simon Sabin

Simon runs his own database consultancy company Onarc Consulting. He specialises in SQL Server focusing on the areas of search, distributed architectures, business intelligence and application development. He has worked with SQL Server since 1998, and has always focused on high-performance, reliable systems.

Simon graduated from Nottingham University in 1996 and joined CMG as a consultant working on many varied systems over 7 years. He has spent the last 2 ½ years working for the largest Internet job board company in the UK developing a passion for search and the discoverability of data.

He was awarded MVP status by Microsoft in 2006, and is a regular speaker at SQL Server events, as well as writing for the SQL Server Central and Simple-Talk.com websites. In 2007 he co-founded SQLBits aimed at delivering SQL Server to the masses with the first free conference held in October 2007 for over 300 professionals.

Thursday, October 11, 2007 1:09:17 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, October 10, 2007

As I mentioned in a previous post, one of the new features for database mirroring in SQL Server 2008 is log stream compression. There's a good explanation of what this is (along with some example workloads and performance graphs) on the SQL Customer Advisory Team blog so I'm not going to duplicate all that here.

Basically, the way mirroring works is by shipping all the transaction log from the principal to the mirror, and then replaying the log records on the mirror database. Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

Log compression is a way to alleviate this problem. SS2008 Books Online states that compression rates of at least 12.5% are achieved - obviously the compression ratio could be much higher than that and is dependant on what's being compressed - i.e. the data that is being processed by the application. One of the tests they did for the post above shows a 5x compression ratio - that's pretty good.

The downside of this is that compression is not free - extra CPU must be used. The obvious CPU load increases comes from having to compress the transaction log on the principal and then decompress it again on the mirror before hardening it in the log. The non-obvious increase will come because the principal and mirror should end up being able to process more transactions/second, which means more CPU is needed.

This extra CPU could be a problem for some systems that are already pegged in terms of CPU load, and so they may actually see a drop in performance when log stream compression is enabled. Now, it's on by default when you upgrade to SS2008 but there is a way to turn it off. The blog post above divulges that there is a trace flag, 1462, that turns off log stream compression and effectively reverts the behavior back to SS2005.

Apart from the obvious benefits of this feature for systems that can afford the extra CPU, I'm excited because it may allow more mirrored databases to perform effective database maintenance - in the form of fragmentation control and index maintenance. When database mirroring is enabled, the database has to be running in the full recovery mode. This means that operations such as index rebuilds are fully logged - that's a lot of extra log being generated if you're used to switching to bulk-logged recovery mode to perform index operations. Of course, this isn't an issue for reorganizing an index (with ALTER INDEX ... REORGANIZE) as this has been fully logged (and has to be that way because of how it works) since I wrote the old DBCC INDEXDEFRAG for SS2000.

Anyway, for some customers the fact that these two operations are fully logged means that running them produces so much additional log that, in conjunction with the log from the workload itself, the network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput. The addition of log stream compression means that for these systems, the log may be compressed enough so that the network is no longer a bottleneck and more regular fragmentation control can take place. This in turn will increase workload throughput as well - a double benefit!

Wednesday, October 10, 2007 4:06:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

It's not like me to post on security - that's one of Kimberly's areas - but this is an interesting example of a real-life injection attack :-)

Wednesday, October 10, 2007 9:59:48 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Tuesday, October 09, 2007

This is another question that came up on the Disaster Recovery forum on MSDN. Paraphrasing - ' I have a backup file containing full backups for 45 databases. How can I restore them all using a script?'

The answer is pretty straightforward. Let's create the situation described, using 3 databases for clarity rather than 45:

BACKUP DATABASE tinylogtest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'tinylogtest 10/09/07', INIT;

BACKUP DATABASE pagesplittest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'pagesplittest 10/09/07';

BACKUP DATABASE ghostrecordtest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'ghostrecordtest 10/09/07';

GO

You now can't do a regular restore from that backup file for any database except the first one in the file:

RESTORE DATABASE pagesplittest FROM DISK = 'c:\sqlskills\mixedbackups.bck' WITH REPLACE;

GO

Msg 1834, Level 16, State 1, Line 1

The file 'c:\tinylogtest\tinylogtest.mdf' cannot be overwritten. It is being used by database 'tinylogtest'.

Msg 3156, Level 16, State 4, Line 1

File 'fgt_mdf' cannot be restored to 'c:\tinylogtest\tinylogtest.mdf'. Use WITH MOVE to identify a valid location for the file.

Msg 1834, Level 16, State 1, Line 1

The file 'c:\tinylogtest\tinylogtest.ldf' cannot be overwritten. It is being used by database 'tinylogtest'.

Msg 3156, Level 16, State 4, Line 1

File 'fgt_log' cannot be restored to 'c:\tinylogtest\tinylogtest.ldf'. Use WITH MOVE to identify a valid location for the file.

Msg 3119, Level 16, State 1, Line 1

Problems were identified while planning for the RESTORE statement. Previous messages provide details.

Msg 3013, Level 16, State 1, Line 1

RESTORE DATABASE is terminating abnormally.

So how to restore the other databases apart from the first one? The answer is that you need to specify which of the backups within the file you're after. You can find the positions using the RESTORE HEADERONLY command:

RESTORE HEADERONLY FROM DISK = 'c:\sqlskills\mixedbackups.bck';

GO

This returns lots of information:

and then a whole bunch more columns and then ending with:

For our purposes, the interesting things to note are the Position, DatabaseName, and BackupTypeDescription. In our simple example we have 3 full backups. To restore the pagesplittest database as we tried to do above, we have to specifiy the position within the file of the backup we want to restore - just the database name isn't enough.

RESTORE DATABASE pagesplittest FROM DISK = 'c:\sqlskills\mixedbackups.bck' WITH REPLACE, FILE = 2;

GO

And it works. So the question asked how to do this, plus how to do it in a script. Below is a script I've adapted from the example I wrote for the Books Online for DBCC SHOWCONTIG back in 1999 when I rewrote DBCC SHOWCONTIG for SQL Server 2000.

Enjoy!

-- Create a temporary table to hold the output from RESTORE HEADERONLY

CREATE TABLE master.dbo.restoreheaderonly (

BackupName NVARCHAR (128), BackupDescription NVARCHAR (255), BackupType SMALLINT, ExpirationDate DATETIME,

Compressed TINYINT, Position SMALLINT, DeviceType TINYINT, UserName NVARCHAR (128), ServerName NVARCHAR (128),

DatabaseName NVARCHAR (128), DatabaseVersion INT, DatabaseCreationDate DATETIME, BackupSize NUMERIC (20, 0),

FirstLSN NUMERIC (25, 0), LastLSN NUMERIC (25,0), CheckpointLSN NUMERIC (25,0), DatabaseBackupLSN NUMERIC (25, 0),

BackupStartDate DATETIME, BackupFinishDate DATETIME, SortOrder SMALLINT, CodePage SMALLINT, UnicodeLocaleId INT,

UnicodeComparisonStyle INT, CompatibilityLevel TINYINT, SoftwareVendorId INT, SoftwareVersionMajor INT,

SoftwareVersionMinor INT, SoftwareVersionBuild INT, MachineName NVARCHAR (128), Flags INT, BindingID UNIQUEIDENTIFIER,

RecoveryForkID UNIQUEIDENTIFIER, Collation NVARCHAR (128), FamilyGUID UNIQUEIDENTIFIER, HasBulkLoggedData BIT,

IsSnapshot BIT, IsReadOnly BIT, IsSingleUser BIT, HasBackupChecksums BIT, IsDamaged BIT, BeginsLogChain BIT,

HasIncompleteMetaData BIT, IsForceOffline BIT, IsCopyOnly BIT, FirstRecoveryForkID UNIQUEIDENTIFIER,

ForkPointLSN NUMERIC (25, 0) NULL, RecoveryModel NVARCHAR (60), DifferentialBaseLSN NUMERIC (25, 0) NULL,

DifferentialBaseGUID UNIQUEIDENTIFIER, BackupTypeDescription NVARCHAR (60), BackupSetGUID UNIQUEIDENTIFIER NULL);

GO

 

-- Populate the table

INSERT INTO master.dbo.restoreheaderonly EXEC ('RESTORE HEADERONLY FROM DISK = ''C:\sqlskills\mixedbackups.bck''') ;

GO

 

DECLARE @Position SMALLINT;

DECLARE @DatabaseName NVARCHAR (128);

DECLARE @ExecString NVARCHAR (255);

 

-- Declare a cursor to iterate over the results

DECLARE databases CURSOR FOR

SELECT Position, DatabaseName FROM master.dbo.restoreheaderonly WHERE BackupTypeDescription = 'Database';

 

-- Open the cursor.

OPEN databases;

 

-- Loop through the databases.

FETCH NEXT FROM databases INTO @Position, @DatabaseName;

WHILE @@FETCH_STATUS = 0

BEGIN

SELECT @ExecString = 'RESTORE DATABASE ' + RTRIM (@DatabaseName) +

' FROM DISK = ''C:\sqlskills\mixedbackups.bck''' +

' WITH REPLACE, FILE = ' + RTRIM (CONVERT (VARCHAR (10), @Position));

SELECT 'Restoring database ' + RTRIM (@DatabaseName);

EXEC (@ExecString);

FETCH NEXT FROM databases INTO @Position, @DatabaseName;

END;

 

-- Close and deallocate the cursor.

CLOSE databases;

DEALLOCATE databases;

 

-- Delete the temporary table.

DROP TABLE master.dbo.restoreheaderonly;

GO

Tuesday, October 09, 2007 5:35:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, October 08, 2007

Lots of people have been asking for us to create some aggregate feeds of the various blogs on the SQLskills site - well, now I've done it.

The three new feeds are:

SQLskills BI Team Blog

SQLskills SQL Server 2008 Category Aggregate Feed

SQLskills All Blogs Aggregate Feed

http://pipes.yahoo.com/pipes/pipe.run?_id=vv9PBOp13BGAc67kLO2fWQ&_render=rss. There's no FeedBurner equivalent for this feed as its larger than the maximum feed size that FeedbBurner can handle (512k).

Enjoy!

Monday, October 08, 2007 5:35:39 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Over the years I was in the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions  (see these KB articles - 932115 and 815594) and there's very little info available on it. For some reason I didn't get around to posting about it on my old blog but today I want to go into some depth on it.

So what is ghost cleanup? It's a background process that cleans up ghost records - usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, actually it gets more complicated if snapshot isolation of some form is enabled but for now, a record in an index is a good start). Such a delete operation never physically removes records from pages - it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to rollback more quickly because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed - the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.

The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.

When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps - the PFS page (post coming soon!) - and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to cleanup - somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on - yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.

The ghost cleanup task doesn't just start up when it's told to - it starts up in the background every 5 seconds and looks for ghost records to cleanup. Remember that it won't be told to go cleanup a specific page by a delete operation - it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see if its been told to cleanup a page - if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to cleanup. It will check through or cleanup a limited number of pages each time it wakes up - I think I remember the limit is 10 pages - to ensure it doesn't swamp the system. So - the ghost records will eventually be removed - either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.

How can you tell its running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:

SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%'

SELECT @a = COUNT (*) FROM myexecrequests

END;

GO

SELECT * FROM myexecrequests;

GO

And on SQL Server 2000 you need to use sysprocesses (well, on SQL Server 2005 this works as well but its fake view derived from the DMVs):

SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'

SELECT @a = COUNT (*) FROM mysysprocesses

END;

GO

SELECT * FROM mysysprocesses;

GO

The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):

session_id request_id  start_time              status       command
---------- ----------- ----------------------- ------------ ----------------
15         0           2007-10-05 16:34:49.653 background   GHOST CLEANUP

So how can you tell if a record is ghosted? Let's engineer some and look at it with DBCC PAGE - I've stripped out the uninteresting bits of the output and highlighted the interesting ghost parts:

CREATE TABLE t1 (c1 CHAR(10))

CREATE CLUSTERED INDEX t1c1 on t1 (c1)

GO

BEGIN TRAN

INSERT INTO t1 VALUES ('PAUL')

INSERT INTO t1 VALUES ('KIMBERLY')

DELETE FROM t1 WHERE c1='KIMBERLY';

GO

DBCC IND ('ghostrecordtest', 't1', 1);

GO

DBCC TRACEON (3604);

GO

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130    m_reservedCnt = 0   m_lsn = (20:88:20)
m_xactReserved = 0  m_xdesId = (0:518)  m_ghostRecCnt = 1
m_tornBits = 0

<snip>

Slot 0 Offset 0x71 Length 17

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C071

00000000:   1c000e00 4b494d42 45524c59 20200200 †....KIMBERLY  ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = KIMBERLY

Slot 1 Offset 0x60 Length 17

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C060

00000000:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 1 Column 1 Offset 0x4 Length 10

c1 = PAUL

Let's see what goes on the transaction log during this process (remember this is undocumented and unsupported - do it on a test database) - I've stripped off a bunch of the columns in the output:

DECLARE @a CHAR (20)

SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name]='PaulsTran'

SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;

GO