Friday, November 09, 2007

This is a question I was asked multiple times over the last week: the Microsoft guidelines for database mirroring say not to mirror more than 10 databases per instance - why is that and is it true?

The answer is my favorite 'it depends!'. The number 10 is a rough guess at the sweet-spot for the majority of customers. The factors that need to be considered are:

  • How much memory do the principal and mirror instances have? (hopefully the same)
  • How much processing-power do the principal and mirror instances have? (hopefully the same)
  • How much bandwidth does the IO subsystem have on the mirror instance? (hopefully the same as on the principal)
  • How much transaction log does the workload on each database generate?
  • How much network bandwidth is available between the principal and the mirror instances?

The last two factors are the most critical. If the network bandwidth available between the two instances is not enough to handle the combined transaction log generation rate per second from all databases being mirrored then performance will drop on the principal databases. SQL Server 2008 does alleviate some of this with log stream compression - see here for details. The next most critical thing to consider is the memory and thread requirements for mirroring - each mirrored database takes one thread plus some memory, so on low-powered servers, lots of mirrored databases may be too much load on the server when combined with the regular workload.

Here are some examples that I've seen:

  • A customer with 150 databases, all of which have very small amounts of activity, and not all at the same time, has them all mirrored with no problem.
  • A customer with only 3 heavily-loaded databases but a poor network connection, who can barely mirror one of the databases before the lack of network bandwidth starts degrading the workload.

The key to success here is to do the log generation calculation and then if it seems that the available network bandwidth will support the number of databases you want to mirror, test it first before relying on it in production.
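One way to do the log generation calculation is to sample the cumulative Log Bytes Flushed/sec counter. SQL Server exposes this counter per database through sys.dm_os_performance_counters. The sketch below takes two samples 60 seconds apart and reports the average log generation rate for each database. The one-minute interval is an arbitrary choice. You'd want to sample during your peak workload, because an average over a quiet period will understate what the network has to carry.

```sql
-- Sketch: estimate the per-database log generation rate (bytes/second).
-- 'Log Bytes Flushed/sec' is a cumulative counter, so the rate is the
-- difference between two samples divided by the elapsed seconds.
DECLARE @first TABLE (db SYSNAME, log_bytes BIGINT);

INSERT INTO @first (db, log_bytes)
SELECT RTRIM (instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Databases%'
  AND counter_name = 'Log Bytes Flushed/sec'
  AND instance_name <> '_Total';

WAITFOR DELAY '00:01:00';  -- sampling interval: 60 seconds

SELECT f.db AS database_name,
       (p.cntr_value - f.log_bytes) / 60 AS avg_log_bytes_per_sec
FROM @first AS f
JOIN sys.dm_os_performance_counters AS p
  ON RTRIM (p.instance_name) = f.db
WHERE p.object_name LIKE '%:Databases%'
  AND p.counter_name = 'Log Bytes Flushed/sec'
ORDER BY avg_log_bytes_per_sec DESC;
```

To use the results, add up the rates for the databases you plan to mirror. Multiply the total by 8 to get bits per second, and compare that with the bandwidth actually available between the two instances. Leave headroom for any other traffic on the link.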

I guess the bottom-line here is that any broad guidance is only that - your mileage may (and probably will) vary. Always do your own calculations and testing.

Friday, November 09, 2007 3:58:53 PM (Pacific Standard Time, UTC-08:00)

In one of the sessions Kimberly and I taught this week at SQL Connections, we discussed how to choose efficient data-types - I'd like to share the discussion here with an example.

The crux of the matter is in the amount of space required to store the data-types in the schema, and ensuring that the data-type choice matches the data being stored without incurring any unnecessary overhead. Here are a couple of examples:

  • A field that stores a person's age will usually be able to use a TINYINT type, which can store values from 0 to 255 - unless it's an archeological or historical database, for instance. Without putting a lot of thought into it, however, someone may choose to use an INTEGER type. What's the difference? A TINYINT takes a single byte to store, whereas an INTEGER takes *4* bytes to store - wasting 3 bytes per record.
  • A field that stores a person's city of residence in the USA needs to be able to hold 24 characters (see my previous post on how long fields have to be) - so what data-type should you use? A CHAR (24) will be able to store all the possible values, but will *always* take up 24 bytes in the record as it's a fixed-length column. A VARCHAR (24), on the other hand, will only store the number of bytes equal to the number of characters in the city name, so using the fixed-length type will waste a varying number of bytes per record.
  • In the USA, the names of the various States are commonly abbreviated to two characters - for instance, Washington = WA and California = CA. So is the best type to use CHAR (2) or VARCHAR (2)? Given that the abbreviations are always 2 characters, they'll both store 2 bytes all the time. BUT, a variable length column has a two-byte overhead (the two-byte pointer for the column that's stored in the variable-length column offset array in the record) - so in this case the best data-type to use is CHAR (2).
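You can see the storage needed by each of these types with DATALENGTH, which returns the number of bytes used by a value. It reports only the data bytes. The two-byte entry that a VARCHAR value also needs in the variable-length column offset array isn't included. The values 42, 'Seattle' and 'WA' are just examples.

```sql
-- Bytes used to store each example value (data bytes only)
SELECT
    DATALENGTH (CAST (42 AS TINYINT))             AS tinyint_age,     -- 1
    DATALENGTH (CAST (42 AS INT))                 AS int_age,         -- 4
    DATALENGTH (CAST ('Seattle' AS CHAR (24)))    AS char24_city,     -- 24 (space-padded)
    DATALENGTH (CAST ('Seattle' AS VARCHAR (24))) AS varchar24_city,  -- 7
    DATALENGTH (CAST ('WA' AS CHAR (2)))          AS char2_state,     -- 2
    DATALENGTH (CAST ('WA' AS VARCHAR (2)))       AS varchar2_state;  -- 2, plus 2 offset bytes in a record
```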

You can see how choosing data-types without considering whether it's the best type can lead to wasted space. Let's look at an example schema to support a population census of the USA.

CREATE TABLE CensusSchema1 (
    SSN CHAR (256),
    StateName CHAR (256),
    Sex INTEGER,
    Employed INTEGER,
    SelfEmployed INTEGER,
    EthnicOrigin INTEGER,
    MaritalStatus INTEGER,
    NumberOfDependents INTEGER,
    Age INTEGER,
    CountryOfBirth INTEGER,
    HouseholdIncomeGroup INTEGER,
    ZipCode5 INTEGER);
GO

At first glance this may look reasonable, but digging in you'll see that many of the columns are over-sized. Here's a cleaned-up schema to compare against, with notes on the per-column savings:

CREATE TABLE CensusSchema2 (
    SSN CHAR (9),                   -- saving 247 bytes
    StateName VARCHAR (256),        -- saving at least 240 bytes (longest state name is 14 characters, plus the 2-byte varchar overhead)
    Sex BIT,
    Employed BIT,
    SelfEmployed BIT,               -- saving 11 bytes altogether over these three columns (up to 8 BIT columns share a single byte)
    EthnicOrigin TINYINT,           -- saving 3 bytes
    MaritalStatus TINYINT,          -- saving 3 bytes
    NumberOfDependents TINYINT,     -- saving 3 bytes
    Age TINYINT,                    -- saving 3 bytes
    CountryOfBirth TINYINT,         -- saving 3 bytes
    HouseholdIncomeGroup TINYINT,   -- saving 3 bytes
    ZipCode5 INTEGER);              -- no saving
GO

The bad schema gives a per-row size of 574 bytes and the cleaned-up schema is no more than 48 bytes per-row. I designed these two schemas to support a US census. The population of the USA is approx. 300 million. This means the bad schema would take around 190GB to store all the info, and the cleaned-up schema only takes around 15GB - more than 12 times more efficient! Now we're starting to see how poor data-type choice can lead to poor performance.
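You can also measure the difference instead of calculating it. The sketch below assumes the two tables above exist in a scratch database. It loads the same 10,000 made-up rows into each table and then compares the average record size and page count reported by sys.dm_db_index_physical_stats. Record sizes need the SAMPLED or DETAILED mode, because LIMITED mode returns NULL for them. The row values are arbitrary test data.

```sql
-- Load identical synthetic rows into both tables (arbitrary test values)
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO CensusSchema1 VALUES ('123456789', 'North Carolina', 1, 1, 0, 3, 2, 2, 42, 1, 5, 98052);
    INSERT INTO CensusSchema2 VALUES ('123456789', 'North Carolina', 1, 1, 0, 3, 2, 2, 42, 1, 5, 98052);
    SET @i = @i + 1;
END;
GO

-- Compare average record size and pages used by the two heaps
SELECT OBJECT_NAME (object_id) AS table_name,
       avg_record_size_in_bytes,
       record_count,
       page_count
FROM sys.dm_db_index_physical_stats (DB_ID (), NULL, NULL, NULL, 'DETAILED')
WHERE object_id IN (OBJECT_ID ('CensusSchema1'), OBJECT_ID ('CensusSchema2'));
GO
```

The measured sizes include the per-record overhead (record header, null bitmap, column counts). For that reason they won't exactly match the sum of the column sizes.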

Wider rows mean:

  • Fewer rows can fit on an 8k page.
  • More CPU is necessary to crack open a record due to the number of CPU data cache line invalidations necessary to read the record into the CPU's various caches (every time a cache line is invalidated it takes a bunch of CPU cycles - see here for an explanation of CPUs and cache lines).

Fewer rows per page mean:

  • More pages are needed to store the data
  • Indexes could have a smaller fan-out (if the index keys are wider than they need to be), leading to more levels in the index and less efficient index navigation from the index root page to the index leaf-level.

More pages mean:

  • More IOs are necessary to read the same amount of actual data
  • More buffer pool memory is necessary to hold the same amount of actual data
  • More disk space is necessary to hold the same amount of actual data

Clearly there's a link between various aspects of workload performance and badly chosen data-types.

Friday, November 09, 2007 2:31:10 PM (Pacific Standard Time, UTC-08:00)
Tuesday, November 06, 2007

A couple more questions from the last couple of classes.

Q1) Why doesn't performing an index rebuild alter the fragmentation?

A1) Here are the possibilities - all of which I've seen happen:

  • There isn't an index - either DBCC DBREINDEX or ALTER INDEX ... REBUILD are being run on a table that only has a heap, and so the (extent) fragmentation of the heap isn't changing because there's no way to rebuild a heap (except by the heavily NOT recommended method of creating and then dropping a clustered index).
  • The index is too small. An index with only a handful of pages may not show any changes in fragmentation because all the pages are single, mixed pages (see my previous post on extent types for more info) and so rebuilding the index does nothing.
  • The workload and schema are such that by the time the rebuild has finished and the fragmentation calculation has been done again, the index is already getting fragmented again.
  • The Extent Scan Fragmentation result from DBCC SHOWCONTIG is being used to gauge fragmentation for an index stored in a filegroup with multiple files. The Extent Scan Fragmentation in DBCC SHOWCONTIG does not cope with multiple files (as is documented in Books Online) and so the value may even go UP in some cases!
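To rule out the first two possibilities, it can help to use sys.dm_db_index_physical_stats (SQL Server 2005 and later) instead of DBCC SHOWCONTIG. It reports the structure type and page count alongside the fragmentation, and it handles multiple files. A sketch for the current database:

```sql
-- Fragmentation for every heap and index in the current database.
-- Rows with index_type_desc = 'HEAP' can't be fixed by an index rebuild,
-- and for structures with a low page_count the figure isn't worth acting on.
SELECT OBJECT_NAME (ips.object_id) AS table_name,
       ips.index_id,
       ips.index_type_desc,
       ips.page_count,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID (), NULL, NULL, NULL, 'LIMITED') AS ips
WHERE ips.alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Where to draw the line for "too small to matter" is a judgement call for your own system.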

Q2) What operations take advantage of minimal-logging when the recovery mode is BULK_LOGGED?

A2) The list is very small - four classes of operations:

  • Index builds, rebuilds, or drop of a clustered index (NOT index defrags with DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE - this is a common misconception).
  • Bulk load operations (i.e. BCP, INSERT ... SELECT * FROM OPENROWSET (BULK...), and BULK INSERT).
  • Inserts or appends of LOB data (either using WRITETEXT/UPDATETEXT for TEXT/NTEXT/IMAGE data types, or UPDATE with a .WRITE clause).
  • SELECT INTO operations on permanent tables.

For these operations, only the allocations are logged in the transaction log. Any extents that are allocated and changed through a minimally-logged operation are marked in the ML bitmaps (one for every 4GB of each file) and then the next transaction log backup will also read all those extents and include them in the backup.
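Here's a sketch of running one of these operations (SELECT INTO) under BULK_LOGGED and then switching back. SalesDB, dbo.Orders and the backup path are hypothetical. The sketch assumes the database normally runs in FULL recovery and already has a full backup. DBCC SQLPERF (LOGSPACE) shows log usage before and after the operation.

```sql
-- Hypothetical database, table and backup path.
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;
GO
USE SalesDB;
GO
DBCC SQLPERF (LOGSPACE);  -- log space used before
GO
-- SELECT INTO a new permanent table: only the allocations are logged
SELECT * INTO dbo.OrdersCopy FROM dbo.Orders;
GO
DBCC SQLPERF (LOGSPACE);  -- log space used after
GO
ALTER DATABASE SalesDB SET RECOVERY FULL;
GO
-- This log backup also reads the extents flagged in the ML bitmaps,
-- so it can be much bigger than the amount of log generated
BACKUP LOG SalesDB TO DISK = 'c:\sqlskills\SalesDB_log.bck' WITH INIT;
GO
```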

Tuesday, November 06, 2007 3:17:02 PM (Pacific Standard Time, UTC-08:00)

This is a question that came up yesterday in our Disaster Recovery class, so I'm typing it up in between attending sessions at Microsoft Day here at SQL Connections. It's an interesting experience watching all the MS speakers walking around in the distinctive blue shirts and no longer having to wear one myself.

The question is the following - why does DBCC CHECKDB terminate with an out-of-space error in SS2005? A customer had a 500GB database spread over 17 LUNs on a SAN, with each LUN having only 5GB free. There's a heavy concurrent workload running while CHECKDB is running, and very often CHECKDB doesn't complete but instead stops with an error. What's going on?

The reason for this is the way that CHECKDB gets a transactionally consistent view of the database. In a nutshell, it creates an internal database snapshot of the database and then checks the database snapshot (you can read more about this in my previous post detailing all the steps of CHECKDB). A database snapshot needs to have one snapshot file for each file in the source database. In the case of CHECKDB, the snapshot files are created as alternate-streams of the existing database files - i.e. stored on the same disk volume - and you have no control over this. This means any changes to the database while CHECKDB is running will cause these alternate streams to grow.
 
If there's a significant concurrent workload while CHECKDB is running, then these alternate streams can get very big very quickly. In the case described in the question, the workload caused one of the alternate streams to take up all available space on the LUN and then it ran out of space. When this happens the snapshot is no longer valid and so CHECKDB has to stop.
 
There are two solutions to this. The obvious first one is to run CHECKDB in a period with low concurrent workload. That's not feasible for many 24x7 shops so the better solution is to create your own database snapshot (so you can control the placement of it) and then run CHECKDB on that. This is no different than having CHECKDB create its own snapshot.
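A sketch of the second approach. It assumes a database SalesDB with a single data file whose logical name is SalesDB_Data, and a volume E: with plenty of free space. All names and paths are hypothetical. A snapshot needs one ON entry per data file in the source database. On SQL Server 2005, database snapshots require Enterprise (or Developer) Edition.

```sql
-- Create the snapshot on a volume you choose (one entry per source data file)
CREATE DATABASE SalesDB_CheckSnapshot
ON (NAME = SalesDB_Data, FILENAME = 'E:\Snapshots\SalesDB_Data.ss')
AS SNAPSHOT OF SalesDB;
GO
-- Run the consistency checks against the snapshot
DBCC CHECKDB (SalesDB_CheckSnapshot) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
-- Drop the snapshot once the check completes
DROP DATABASE SalesDB_CheckSnapshot;
GO
```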
 
There's another reason that CHECKDB may stop with an out-of-space error. CHECKDB has to validate the contents of the whole database. As such, it needs to store info about things it has seen at one point during database scanning so that it can match that info against things it sees at later points. These bits of info are stored in an in-memory worktable. In many cases, though, the amount of info being stored exceeds the available memory and the worktable needs to spill over to tempdb. (For example, when running CHECKDB against a 1TB database on a machine with 16GB of memory, the intermediate info CHECKDB needs to store will likely exceed the memory available to SQL Server.) So - if tempdb isn't big enough to store the worktable, it has to grow. If it can't grow, then CHECKDB will fail because it needs the worktable to operate.
 
You can check in advance how much (estimated) space CHECKDB will need by running it with the WITH ESTIMATEONLY option. Then make sure that tempdb is sized to accommodate the CHECKDB requirements AND the regular tempdb space requirements. See Capacity Planning for tempdb in Books Online for more info.
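For example, you can get the estimate and then compare it with the current size of tempdb's files. SalesDB is a hypothetical database name.

```sql
-- Estimated tempdb space CHECKDB would need (reported in KB)
DBCC CHECKDB (SalesDB) WITH ESTIMATEONLY;
GO
-- Current tempdb file sizes; size is stored in 8KB pages
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth
FROM tempdb.sys.database_files;
GO
```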
Tuesday, November 06, 2007 2:23:12 PM (Pacific Standard Time, UTC-08:00)
Monday, November 05, 2007

Sitting here in our Disaster Recovery class at SQL Connections and Kimberly's on till lunch so I'm banging out a quick blog post covering the database mirroring (DBM) specific questions.

Q1) Can I use IP addresses instead of server names when using the DBM Monitor?

A1) Unfortunately not.

Q2) Is there any in-built throttling mechanism in DBM to allow the mirror to catch-up when synchronizing a synchronous mirroring session?

A2) Yes, if there's more than 1MB of transaction log on the principal that hasn't been sent to the mirror, the mirroring session state will be switched from SYNCHRONIZED to SYNCHRONIZING and the principal itself will start adding a few milliseconds delay to transaction commits until the amount of unsent log drops below 1MB.
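To watch for this on the principal, one option is to look at the mirroring state in sys.database_mirroring. Pair that with the Log Send Queue KB counter from the Database Mirroring performance object, which shows how much log hasn't yet been sent to the mirror. A sketch:

```sql
-- Mirroring state for each mirrored database on this instance
SELECT DB_NAME (database_id)       AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,        -- e.g. SYNCHRONIZED or SYNCHRONIZING
       mirroring_safety_level_desc  -- FULL = synchronous mirroring
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;

-- Unsent log (in KB) per mirrored database
SELECT RTRIM (instance_name) AS database_name,
       cntr_value            AS log_send_queue_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Database Mirroring%'
  AND counter_name = 'Log Send Queue KB';
```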

Q3) Are there any tips when setting up a mirroring session using backups?

A3) Yes, make sure that all the backups are restored WITH NORECOVERY on the mirror. The database has to be unrecovered otherwise the mirroring session cannot start and you're back to square one with restoring the mirror database.
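A minimal sketch of the sequence follows. The database name, backup paths and server names are hypothetical. It assumes the database mirroring endpoints already exist on both instances (port 5022 is an assumption). It also assumes the full and log backups were taken on the principal while it was in FULL recovery. Add WITH MOVE to the restores if the file paths differ between the two servers.

```sql
-- On the mirror instance: restore everything WITH NORECOVERY
RESTORE DATABASE SalesDB FROM DISK = 'c:\sqlskills\SalesDB_full.bck' WITH NORECOVERY;
GO
RESTORE LOG SalesDB FROM DISK = 'c:\sqlskills\SalesDB_log.bck' WITH NORECOVERY;
GO
-- Still on the mirror: point it at the principal's endpoint
ALTER DATABASE SalesDB SET PARTNER = 'TCP://principal.example.com:5022';
GO
-- Then, on the principal, point it at the mirror's endpoint to start the session:
-- ALTER DATABASE SalesDB SET PARTNER = 'TCP://mirror.example.com:5022';
```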

Q4) What are the performance considerations with DBM - both in terms of the impact on the application workload and on DBM itself?

A4) Here are some links to resources that discuss this:

Q5) Any other resources?

A5) Blog posts...

Enjoy!

Monday, November 05, 2007 11:45:16 AM (Pacific Standard Time, UTC-08:00)

Quick post this morning before this gets knocked out of my head by the Disaster Recovery session we're doing today at SQL Connections. This came up yesterday for a few people both on 2000 and 2005 - the database is running in SIMPLE recovery mode but the log isn't getting cleared as it usually does with checkpoints. There are two possibilities here - one for 2000 and one for both 2000 and 2005.

Very large VLFs

If the transaction log is comprised of very large (multi-GB) virtual log files, then the log may not clear as frequently as when it's comprised of more, smaller VLFs. Log truncation after a checkpoint can only free a whole VLF, and a VLF can't be freed while any part of it still contains active log. Kimberly's blog post yesterday touched on this plus a solution.
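On SQL Server 2005 you can ask the server why the log isn't clearing, using the log_reuse_wait_desc column of sys.databases. You can see the VLF layout with DBCC LOGINFO, which is undocumented but widely used and returns one row per VLF. SalesDB is a hypothetical name.

```sql
-- Why can't the log be reused? (SQL Server 2005 and later)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'SalesDB';
GO
USE SalesDB;
GO
-- One row per VLF: FileSize is the VLF size in bytes, Status = 2 means active
DBCC LOGINFO;
GO
```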

SQL 2000 checkpoint bug

This is one I hadn't heard of. There's a bug in 2000 SP4 that stops checkpoint working properly on some configurations, so the log file can grow until it runs out of space. A few people in yesterday's Database Maintenance class were hitting this. Searching through the Knowledge Base articles turned up KB 909369, which explains the issue and has a pointer to the hotfix.

Hope this helps some of you out.

Monday, November 05, 2007 6:59:01 AM (Pacific Standard Time, UTC-08:00)
Sunday, November 04, 2007

As promised, here's the first of the grab-bag of questions we were asked during conferences. I'm blogging a selection of the stuff I noted down - Kimberly has a bunch of stuff too that she's blogging about as I type - see here. These are some of the questions we were asked during our pre-con at SQL Connections on Database Maintenance: From Planning to Practice to Post-Mortem. It was cool that people came prepared with so many great questions - most of which we answered during the session, and some I need to follow up with the Product Team about.

Q1) I have a filegroup with two files. I add a third file, use some of the space, and then do a rebuild of the index that takes up most of the space in the original two files. Why doesn't the newly rebuilt index get spread evenly across the three files? I.e. why doesn't SQL Server rebalance the data across the files?

A1) There are two things to consider here. The first is the way that SQL Server allocates space from multiple files in a filegroup. It uses a mechanism called proportional-fill that allocates space from files in round-robin fashion, but weights the allocations towards files that are larger and have more free space. In the example above, the space in the newly-added third file will be used before the first and second files are grown to add more space. The second thing to consider is that rebuilding an index requires building a new copy of the index before dropping the old one - so in the example above, the existing allocated space can't be reused until after the index rebuild operation completes.

The concept of adding a file and having SQL Server rebalance the data across the files doesn't exist. It was something I proposed during SQL Server 2005 development but we (seriously) didn't have time to do it. The solution I recommend is to create a new filegroup with as many files as you need, rebuild the index into the new filegroup using the CREATE INDEX WITH DROP_EXISTING command, and then drop the old filegroup.
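Here's a sketch of that approach for one table. All names are hypothetical: a database SalesDB, a table dbo.Orders with a unique clustered index CIX_Orders on OrderID, and three new files on drive E:. Moving the clustered index moves the table's data. Each non-clustered index needs the same treatment. LOB data isn't moved by rebuilding the clustered index, so check for it before planning to drop the old filegroup.

```sql
-- Hypothetical names, sizes and paths throughout.
ALTER DATABASE SalesDB ADD FILEGROUP SalesFG2;
GO
ALTER DATABASE SalesDB ADD FILE
    (NAME = SalesFG2_1, FILENAME = 'E:\Data\SalesFG2_1.ndf', SIZE = 10GB),
    (NAME = SalesFG2_2, FILENAME = 'E:\Data\SalesFG2_2.ndf', SIZE = 10GB),
    (NAME = SalesFG2_3, FILENAME = 'E:\Data\SalesFG2_3.ndf', SIZE = 10GB)
TO FILEGROUP SalesFG2;
GO
USE SalesDB;
GO
-- Rebuild the clustered index into the new filegroup; with equally-sized
-- empty files, proportional fill spreads the new copy evenly across them
CREATE UNIQUE CLUSTERED INDEX CIX_Orders ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON)
ON SalesFG2;
GO
-- Once nothing is left in the old filegroup, remove its files with
-- ALTER DATABASE ... REMOVE FILE, then the filegroup with REMOVE FILEGROUP.
```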

Q2) Multiple questions about whether a non-clustered index gets rebuilt on SQL Server 2005 under different circumstances

A2) See the blog post I wrote here which goes into all the different combinations. A lot of the confusion comes from the fact that on SQL Server 2000, for non-unique clustered indexes where SQL Server has to generate a uniquifying value (called a uniquifier), when it gets rebuilt all the non-clustered indexes have to be rebuilt too as the uniquifier values are regenerated. On SQL Server 2005 this is not the case - a BIG improvement.

Q3) A lot of the features we discussed (e.g. partitioning and online operations) are in Enterprise Edition only. What's the complete list of features that are in Enterprise vs Standard Editions for SQL Server 2005?

A3) The best list we know of is in MSDN - http://msdn2.microsoft.com/en-us/library/ms143761.aspx

Q4) In a maintenance plan, is it a good idea to do an index rebuild followed by an update statistics?

A4) No! An index rebuild will do the equivalent of an update stats with a full scan. A manual update stats will use whichever sampling rate was set for that particular set of statistics. So - not only does doing an update stats after an index rebuild waste resources, you may actually end up with a worse set of stats if the manual update stats only does a sampled scan.
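You can see this by comparing the Rows and Rows Sampled columns of the statistics header after a rebuild. In this sketch, dbo.Orders and CIX_Orders are hypothetical names. The rebuild only refreshes the statistics that belong to the rebuilt index. Column statistics that aren't tied to an index are not touched.

```sql
-- Rebuild the index: its statistics are recreated from a full scan
ALTER INDEX CIX_Orders ON dbo.Orders REBUILD;
GO
-- After the rebuild, Rows Sampled should equal Rows
DBCC SHOW_STATISTICS ('dbo.Orders', CIX_Orders) WITH STAT_HEADER;
GO
-- A default (sampled) UPDATE STATISTICS run now would replace those
-- full-scan statistics with sampled ones, so it's deliberately left out:
-- UPDATE STATISTICS dbo.Orders CIX_Orders;
```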

Q5) How can you tell whether DBCC CHECKDB is doing a deep-dive that's going to take much longer than usual?

A5) There are several algorithms in DBCC CHECKDB that are designed to quickly tell whether a corruption exists or not, at the expense of not being able to tell exactly where the corruption is. The justification is that corruptions are not very common (considering the millions of times per day that DBCC CHECKDB is run), so it makes sense to engineer for the success case and take the hit of a longer run-time in the error case. Until SQL Server 2005 SP2, there was no way to tell that DBCC CHECKDB had discovered a corruption and was going to run longer than usual. In SP2 a new error, 5268, was added that will be output to the errorlog when one of the deep-dive algorithms is triggered. I advise you to add an alert on this error.
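A sketch of such an alert using SQL Server Agent. The alert and operator names are hypothetical, and the operator must already exist. SQL Agent alerts only fire for messages that are written to the Windows application event log, so it's worth confirming that 5268 is logged there on your build.

```sql
USE msdb;
GO
-- Alert on error 5268 (@severity must be 0 when @message_id is used)
EXEC dbo.sp_add_alert
    @name = N'CHECKDB deep-dive started (5268)',
    @message_id = 5268,
    @severity = 0,
    @include_event_description_in = 1;  -- include the message text in the e-mail
GO
-- Notify an existing (hypothetical) operator by e-mail
EXEC dbo.sp_add_notification
    @alert_name = N'CHECKDB deep-dive started (5268)',
    @operator_name = N'DBA Team',
    @notification_method = 1;
GO
```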

Q6) What are the various forums where Paul answers questions on corruptions?

A6) There are 4 forums that I (and others with good advice also) frequent:

I'm also happy for you to send me email! Just beware that on forum posts and emails you may not get a reply from anyone for a day or more - so if you have a critical server-down issue then you should contact SQL Product Support.

Ok - that's enough for tonight. More tomorrow from the Disaster Recovery workshop. Thanks to everyone who came to the workshop today!

Sunday, November 04, 2007 7:01:40 PM (Pacific Standard Time, UTC-08:00)
Saturday, November 03, 2007

After all the build-up over the last few weeks (putting finishing touches to decks and demos), we're finally off to the last set of conferences for the year. First up is SQL Connections in Las Vegas, with over 5000 attendees! We flew down yesterday from Seattle to hang out for an extra few days before the conference, as unbelievably after 9 years in the US it's my first time ever in Las Vegas.

It's a kind of a weird place - you're flying along over desert and mountains and suddenly there's a city in the middle of nowhere. The first landmark I could see that positively ID'd it as Vegas was the distinctive black pyramid of the Luxor hotel - we're actually looking down on it from our room in THE hotel (totally pretentious name :-) but nice rooms). Looking out at the Strip, it seems like every second or third car is a limo of some description.

Now that we're here, we're making full use of the extra time to take in some of the sights and sounds. We had dinner last night at Mix with our good friends Gert Drapers and his wife Karen, plus Michele Leroux Bustamante and her husband Andres. The food was great - I had calamari risotto plus beef tenderloin with foie gras and truffles - and Andres chose some really nice wine for us all. Unfortunately, Kimberly didn't enjoy her entree at all - but she did digress from her usual shrimps and scallops choices (her appetizer was a stunning scallop dish) so I wasn't totally surprised. Actually I tried some of her dish too - cod with sweet crumbly pastry over capers, eggplant, olives, and lemon - not how it was described on the menu and totally not something I'd recommend - and I *love* seafood. Tonight we're heading out again with Michele and Andres to see Elton John playing at Caesars Palace. Tomorrow the work begins...

So given that we're doing a pre-pre con tomorrow (see my previous post here for all the details), we needed to head over to the conference centre in the Mandalay Bay to pick up our speaker-shirts and badges. After walking for what seemed like miles through casinos and corridors - Vegas is *busy* - we arrived. Here's the sight that greeted us - a small army of conference crew stuffing bags and finalizing registration details.

This is a pretty exciting conference for us - it's our first as a married couple! Almost all of our sessions are together and for the few that it's only Kimberly speaking I'll either sit at the back and heckle or maybe try my hand at blackjack or some other cunning way of giving all our money away very quickly. I suspect Kimberly would rather I heckle :-)

Seriously though, we *love* presenting together, which is why I left Microsoft in the first place, and over the next few weeks we'll certainly be doing a lot of it - with TechEd IT Forum in Barcelona straight after SQL Connections and then a TechNet Deep-Dive on Database Maintenance in Zurich the week after that. When we're done we can finally crash for the year as the next work trip isn't until mid-January when we head to China to teach some more classes (Kimberly's turn to experience somewhere new).

One thing we're definitely going to do while we're on the road is blog lots - at least one of us will post every day with some tidbit of info or answer to an interesting question that came up during a session or workshop.

Hopefully we'll see you at one of these events - stop by and say hi!

Saturday, November 03, 2007 2:26:13 PM (Pacific Standard Time, UTC-08:00)
Wednesday, October 31, 2007

This is a quick answer to a question I was sent today by someone who'd read Kimberly's partitioning whitepaper - Partitioned Tables and Indexes in SQL Server 2005 - and is implementing a "sliding-window" scenario. (This is a mechanism to allow fast insertion and deletion of significant portions of data into/from a partitioned production table. Insertion is done by taking a table and making it a new partition of the production table - called switching-in. Deletion is done by removing a partition from the production table and making it into a stand-alone table - called switching-out.)

The question is - what indexes are required on the staging table to prevent the ALTER TABLE ... SWITCH PARTITION statement from failing with a message like that below:

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'PartitionTest.dbo.StagingTable' for the index 'NC_Birthday' in target table 'PartitionTest.dbo.ProductionTable'.

The answer is that the staging table has to have the exact same indexes - clustered and non-clustered - as the production table. I asked Kimberly if it has to have the same constraints too - the answer is yes, plus the staging table has to have a trusted constraint on it such that SQL Server can tell (without checking all the data in the staging table) that all the data satisfies the partitioning function for the partition that you're switching-in (i.e. the partition that the staging table will become in the production table). If it doesn't, the switching-in will fail with the following error:

Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'PartitionTest.dbo.StagingTable' allow values that are not allowed by range defined by partition 4 on target table 'PartitionTest.dbo.ProductionTable'.

One thing that confuses people is that SQL Server does not create the target table for you when doing a switch-out of a partition. The target table has to exist and have the exact same schema as the production table. Also, it has to be completely empty - otherwise you'll get an error like:

Msg 4905, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. The target table 'PartitionTest.dbo.StagingTable' must be empty.

The must-be-empty requirement also holds for switching-in operations - the partition that will be created has to be empty otherwise a similar 4904 error results.
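A sketch of a switch-in that satisfies these rules. It assumes a ProductionTable partitioned on an OrderDate column by a RANGE RIGHT partition function. Partition 4 covers March 2007 and lives on filegroup FG4. ProductionTable has a clustered index on OrderDate and a non-clustered index NC_Birthday on Birthday. All of these names and boundaries are hypothetical. The staging table must be on the same filegroup as the target partition.

```sql
-- Same columns, types and nullability as ProductionTable,
-- on the same filegroup as the target partition
CREATE TABLE dbo.StagingTable (
    OrderDate DATETIME NOT NULL,
    Birthday  DATETIME NULL,
    Amount    MONEY NULL) ON FG4;
GO
-- The same indexes as ProductionTable
CREATE CLUSTERED INDEX CIX_OrderDate ON dbo.StagingTable (OrderDate) ON FG4;
CREATE NONCLUSTERED INDEX NC_Birthday ON dbo.StagingTable (Birthday) ON FG4;
GO
-- ... load the new data into dbo.StagingTable here ...

-- WITH CHECK validates the existing rows, so the constraint is trusted
ALTER TABLE dbo.StagingTable WITH CHECK
    ADD CONSTRAINT CK_StagingTable_OrderDate
    CHECK (OrderDate >= '20070301' AND OrderDate < '20070401');
GO
-- Switch in: partition 4 of ProductionTable must be empty
ALTER TABLE dbo.StagingTable SWITCH TO dbo.ProductionTable PARTITION 4;
GO
-- Switch-out works the other way round. The (hypothetical) ArchiveTable must
-- already exist, be empty, match the schema and sit on partition 1's filegroup:
-- ALTER TABLE dbo.ProductionTable SWITCH PARTITION 1 TO dbo.ArchiveTable;
```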

Hope this helps!

Wednesday, October 31, 2007 6:05:57 PM (Pacific Standard Time, UTC-08:00)
Tuesday, October 30, 2007

There's a well known problem that every time a backup operation succeeds a message is written into the error log and Windows event log. If you're taking very frequent log backups (say every 5 minutes) of multiple databases, that's a significant amount of clutter in the logs. Well - now there's a fix!

Kevin Farlee, the Storage Engine PM responsible for (among many other things) BACKUP and RESTORE has just blogged about a trace flag - 3226 - that's been in the product since SQL Server 2000 that will suppress the success messages. He's planning to document this (and other) trace flags in this area starting in SQL Server 2008. Excellent!
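To try it out, you can turn the trace flag on globally with DBCC TRACEON. To have it survive a restart, add -T3226 as a startup parameter. A sketch:

```sql
-- Enable trace flag 3226 for the whole instance (until the next restart)
DBCC TRACEON (3226, -1);
GO
-- Confirm it's enabled globally
DBCC TRACESTATUS (3226, -1);
GO
```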

Tuesday, October 30, 2007 11:26:29 AM (Pacific Standard Time, UTC-08:00)
Thursday, October 25, 2007
I've been involved in a few conversations today that have highlighted some big misconceptions about how backups work. I'd like to use this blog post to debunk them. I checked everything I say here with my friend Steve Schmidt, the developer on the Storage Engine team responsible for BACKUP/RESTORE for the last ten years.

Myth 1: A full database backup only contains the transaction log from the start of the backup to the end of the backup

When you restore a full database backup, you get a transactionally consistent database. Consider the case where there's an active transaction that doesn't commit until after the backup completes. If the backup only contained the log that occurred while the database was being backed up, how would it roll back the active transaction? It *has* to include enough transaction log to roll back the active transaction. The start LSN of the log included in a database backup is the minimum of:

  • LSN of the last checkpoint
  • LSN of the start of the oldest active transaction
  • LSN of the last replicated transaction

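Let me prove it to you. I'm going to create a database, put it into FULL recovery mode, take a first full backup (until that exists, the database doesn't really behave as FULL recovery), start a transaction, and then take another backup from a second connection. The checkpoint that the backup performs ensures the page I've altered is flushed to disk.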

CREATE DATABASE stopattest;
GO
ALTER DATABASE stopattest SET RECOVERY FULL;
GO
BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;
GO
USE stopattest;
GO
CREATE TABLE t1 (c1 INT);
GO
BEGIN TRAN;
INSERT INTO t1 VALUES (1);
GO

Now in another connection I'll take another full database backup.

BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;
GO

The msdb.dbo.backupset table will tell us the relevant LSNs in the backup (I added some spaces to delineate the parts of the LSN for clarity):

SELECT last_lsn, checkpoint_lsn, database_backup_lsn FROM msdb.dbo.backupset
WHERE database_name = 'stopattest';
GO

last_lsn              checkpoint_lsn        database_backup_lsn
--------------------- --------------------- ---------------------
21 0000000190 00001   21 0000000174 00037    21 0000000058 00037

[Edit: After swapping some email with Kalen Delaney, I realized that when I originally put this together I had more log records in the post, and when I removed them I messed up the description of the (21:174:37) LSN - it's now corrected below.]

So you can see the checkpoint that begins the backup was at (21:174:37). The LSN of the first log record that the backup contains is (21:58:37), which is before the start of the backup. And the backup contains all the log from then until (21:190:1). Now let's look at the actual transaction log to see what these LSNs correspond to.

SELECT [Current LSN], Operation, [Transaction Name] FROM fn_dblog (NULL, NULL);
GO

Here's some selected output:

Current LSN              Operation        Transaction Name
------------------------ ---------------- ------------------
00000015:0000003a:0025   LOP_BEGIN_CKPT   NULL
                (this is the calculated minimum LSN the backup must contain (21:58:37) - which is (15:3a:25) in hex)
.
.
00000015:00000061:0001   LOP_BEGIN_XACT   user_transaction
                (here's my transaction starting - before the backup started but within the LSN range contained in the backup)
.
.
00000015:000000ab:0004   LOP_BEGIN_XACT   Backup:InvalidateDiffMaps
                (this is the backup clearing the differential bitmaps)
.
.
00000015:000000ae:0025  LOP_BEGIN_CKPT    NULL
                (this is the checkpoint that BACKUP does - matching the checkpoint LSN above)
.
.

So - this clearly shows that the backup contains more than just the log from the time the backup was running.
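The LSNs in msdb.dbo.backupset are stored as decimal numbers. fn_dblog shows the same three parts (VLF sequence number, log block, slot) in hex. As a sketch, here's one way to convert the (21:58:37) value from the backupset output above into fn_dblog's format. It relies on CONVERT style 2 for binary-to-string conversion, which needs SQL Server 2008 or later.

```sql
-- '21 0000000058 00037' with the spaces removed
DECLARE @lsn BIGINT, @vlf BIGINT, @block BIGINT, @slot BIGINT;
SET @lsn   = 21000000005800037;
SET @vlf   = @lsn / CAST (1000000000000000 AS BIGINT);         -- leading digits: 21
SET @block = (@lsn / 100000) % CAST (10000000000 AS BIGINT);   -- middle 10 digits: 58
SET @slot  = @lsn % 100000;                                    -- last 5 digits: 37

-- Format as 8:8:4 hex digits, the way fn_dblog displays [Current LSN]
SELECT @vlf AS vlf_seq_no,
       @block AS log_block,
       @slot AS slot_no,
       LOWER (CONVERT (VARCHAR (8), CONVERT (BINARY (4), CONVERT (INT, @vlf)), 2) + ':'
            + CONVERT (VARCHAR (8), CONVERT (BINARY (4), CONVERT (INT, @block)), 2) + ':'
            + CONVERT (VARCHAR (4), CONVERT (BINARY (2), CONVERT (SMALLINT, @slot)), 2)) AS fn_dblog_lsn;
-- 21, 58, 37, 00000015:0000003a:0025
```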

Myth 2: It's possible to do a STOPAT with only a full database backup

This myth is that it's possible to use STOPAT with a full database backup to stop during the time the backup was being taken. The argument FOR this myth is that the backup contains the log for all the changes that happened while the backup was being taken, so it must be possible to stop at any point in time. Technically, that's correct, but in practice it's wrong - you cannot stop at a point while the backup was running, using only the database backup.

This one's more complicated to explain. Doing a STOPAT operation means getting the database to a state where operations later than the time or LSN specified in the STOPAT clause haven't affected the database yet. A database backup reads pages that may or may not have been changed while the backup was running. If they are changed, it could be at any point while the backup is running.

Consider the case where page X is changed at LSN (10:45:12), *just* before the backup completes and is read by the backup at the time equivalent to LSN (10:45:13). The backup will contain the changed page image, plus the log record for the change. What if I want to stop at a point while the backup was running but *before* the change to page X, say at LSN (10:44:00). The backup only contains the image of page X at LSN (10:45:12) - how can it be put back to the image at the time we want to stop at? The argument is that we have the log record for the change - can't SQL Server just undo it?

No. It won't even see it. STOPAT works by recovering the database up to the point that the STOPAT specified. If we ask to stop at LSN (10:44:00), then the log will only be read and recovered up to that point. However, because the database backup didn't read page X until LSN (10:45:13), it only has the image of it from when it was altered at (10:45:12). This clearly won't give a database image as of (10:44:00).

The only way to stop at a particular time/LSN is to have images of *all* database pages from before that time/LSN (i.e. from the *previous* database backup) and then restore all the transaction logs up to and including the time/LSN to stop at.
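To make that rule concrete, here's a minimal Python sketch of picking a restore sequence for a STOPAT. It assumes LSNs simplified to integers and a hypothetical `Backup` record (nothing here is a SQL Server API). It chooses the most recent full backup that *finished* before the stop point, then chains log backups until one of them covers it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str       # "FULL" or "LOG" (labels invented for this sketch)
    first_lsn: int  # first LSN covered, simplified to an integer
    last_lsn: int   # last LSN covered


def plan_stopat_restore(backups, stop_lsn):
    """Return the backups to restore, in order, to stop at stop_lsn.

    A full backup can only be the starting point if it *ended* at or before
    stop_lsn, since its page images may already include later changes.
    Log backups are then chained until one of them covers stop_lsn.
    """
    fulls = [b for b in backups if b.kind == "FULL" and b.last_lsn <= stop_lsn]
    if not fulls:
        raise ValueError("no full backup finished before the stop point")
    base = max(fulls, key=lambda b: b.last_lsn)
    chain, reached = [base], base.last_lsn
    logs = sorted((b for b in backups if b.kind == "LOG"), key=lambda b: b.first_lsn)
    for log in logs:
        if log.last_lsn <= base.last_lsn:
            continue  # entirely before the chosen full backup
        if log.first_lsn > reached:
            raise ValueError("gap in the log backup chain")
        chain.append(log)
        reached = log.last_lsn
        if reached >= stop_lsn:
            return chain  # this log backup contains the stop point
    raise ValueError("log backups do not reach the stop point")


# Full backup 2 ran from LSN 400 to 500; we want to stop at 450, during it.
full1, log1 = Backup("FULL", 100, 200), Backup("LOG", 50, 300)
full2, log2 = Backup("FULL", 400, 500), Backup("LOG", 300, 600)
print(plan_stopat_restore([full1, log1, full2, log2], 450))
```

With a stop point of 450, the plan is full1, log1, log2. The full backup that was still running at LSN 450 can't be the starting point. In T-SQL terms, that plan corresponds to RESTORE DATABASE ... WITH NORECOVERY from the earlier full backup, followed by RESTORE LOG ... WITH STOPAT on the last log backup.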

Hopefully that makes sense.

Thursday, October 25, 2007 1:42:04 PM (Pacific Standard Time, UTC-08:00)

Ok - this post is a little strange and fun. I was thinking about word length and how it relates to designing software/schemas to support multiple languages. How far do you have to go in your research to figure out the maximum string length to support? So I started digging about and found some interesting things about words. Here are some examples.

  • If you're putting together a schema to support hospital patient records, you might have a field for disease name. In that case, you'd have to allow for pneumonoultramicroscopicsilicovolcanoconiosis, which has 45 letters (caused by breathing in siliceous volcanic dust). A field for surgical procedure would have to support hepaticocholangiocholecystenterostomies, which has 39 letters (creating a connection between the gall bladder and the hepatic duct). What about a field for how a measurement was obtained? That one would need room for electroencephalographically, with 27 letters (using an electroencephalograph).
  • A schema to support chemical names could really be unlimited given the nature of systematic names for chemicals. The longest one in the dictionary is an acid called tetramethyldiaminobenzhydrylphosphinous with 39 letters (and given a few minutes I could probably draw its chemical structure by following the systematic method I learned at school :-)). The longest published chemical name is a kind of tobacco mosaic virus - ACETYL-SERYL-TYROSYL-SERYL-ISO-LEUCYL-THREONYL-SERYL-PROLYL-SERYL-GLUTAMINYL-PHENYL-ALANYL-VALYL-PHENYL-ALANYL-LEUCYL-SERYL-SERYL-VALYL-TRYPTOPHYL-ALANYL-ASPARTYL-PROLYL-ISOLEUCYL-GLUTAMYL-LEUCYL-LEUCYL-ASPARAGINYL-VALYL-CYSTEINYL-THREONYL-SERYL-SERYL-LEUCYL-GLYCYL-ASPARAGINYL-GLUTAMINYL-PHENYL-ALANYL-GLUTAMINYL-THREONYL-GLUTAMINYL-GLUTAMINYL-ALANYL-ARGINYL-THREONYL-THREONYL-GLUTAMINYL-VALYL-GLUTAMINYL-GLUTAMINYL-PHENYL-ALANYL-SERYL-GLUTAMINYL-VALYL-TRYPTOPHYL-LYSYL-PROLYL-PHENYL-ALANYL-PROLYL-GLUTAMINYL-SERYL-THREONYL-VALYL-ARGINYL-PHENYL-ALANYL-PROLYL-GLYCYL-ASPARTYL-VALYL-TYROSYL-LYSYL-VALYL-TYROSYL-ARGINYL-TYROSYL-ASPARAGINYL-ALANYL-VALYL-LEUCYL-ASPARTYL-PROLYL-LEUCYL-ISOLEUCYL-THREONYL-ALANYL-LEUCYL-LEUCYL-GLYCYL-THREONYL-PHENYL-ALANYL-ASPARTYL-THREONYL-ARGINYL-ASPARAGINYL-ARGINYL-ISOLEUCYL-ISOLEUCYL-GLUTAMYL-VALYL-GLUTAMYL-ASPARAGINYL-GLUTAMINYL-GLUTAMINYL-SERYL-PROLYL-THREONYL-THREONYL-ALANYL-GLUTAMYL-THREONYL-LEUCYL-ASPARTYL-ALANYL-THREONYL-ARGINYL-ARGINYL-VALYL-ASPARTYL-ASPARTYL-ALANYL-THREONYL-VALYL-ALANYL-ISOLEUCYL-ARGINYL-SERYL-ALANYL-ASPARAGINYL-ISOLEUCYL-ASPARAGINYL-LEUCYL-VALYL-ASPARAGINYL-GLUTAMYL-LEUCYL-VALYL-ARGINYL-GLYCYL-THREONYL-GLYCYL-LEUCYL-TYROSYL-ASPARAGINYL-GLUTAMINYL-ASPARAGINYL-THREONYL-PHENYL-ALANYL-GLUTAMYL-SERYL-METHIONYL-SERYL-GLYCYL-LEUCYL-VALYL-TRYPTOPHYL-THREONYL-SERYL-ALANYL-PROLYL-ALANYL-SERINE - with 1185 letters.
  • Probably the one that's going to catch most people out is place names. The bank Kimberly and I use won't allow a town/city name of more than 30 characters. That's fine for the USA, where the longest place names are 24 characters long (Winchester-on-the-Severn in Maryland or Washington-on-the-Brazos in Texas). However, if the back-end database is coded to only support 30 characters, that wouldn't work around the world:
    • In Wales, the two longest names are Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch with 58 letters and Gorsafawddachaidraigodanheddogleddolonpenrhynareurdraethceredigion with 66 letters.
    • In New Zealand, there's a hill called Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu (85 letters), and that name used to be in general use.
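If you want to check sample data against a proposed column width, a few lines of Python will do it. This sketch assumes the bank's 30-character town/city limit as the column size. Note that `len()` counts hyphens as well as letters, so the number a word-length record quotes and the number your column has to hold can differ.

```python
# Hypothetical limit, matching the bank's 30-character town/city field.
CITY_COLUMN_LIMIT = 30

place_names = [
    "Winchester-on-the-Severn",
    "Washington-on-the-Brazos",
    "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch",
    "Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu",
]


def letter_count(value):
    """Letters only, which is what word-length records usually quote."""
    return sum(ch.isalpha() for ch in value)


def too_long(values, limit):
    """Values whose character count (hyphens included) exceeds the column limit."""
    return [v for v in values if len(v) > limit]


for name in place_names:
    print(f"{len(name):3d} chars, {letter_count(name):3d} letters: {name}")
print("Rejected by a", CITY_COLUMN_LIMIT, "character column:",
      too_long(place_names, CITY_COLUMN_LIMIT))
```

Both US names fit, with 24 characters including hyphens. The Welsh and New Zealand names don't.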

Pretty interesting - or, as my kids like to say, supercalifragilisticexpialidocious! (34 letters :-))

I'd be interested to hear about the longest words in languages other than English - please leave a comment. Thanks!

Thursday, October 25, 2007 11:38:59 AM (Pacific Standard Time, UTC-08:00)

(Been a few days since I posted - had some real work to do :-) Today I'll post a few things from the queue that's been building up)

This is part Q&A and part follow-on from my last post about running index maintenance when a database is mirrored.

A customer has a maintenance plan that involves running regular ALTER INDEX ... REORGANIZE on a 100GB clustered index to remove fragmentation. Three weeks ago they added database mirroring, with the database set up for synchronous mirroring. Every so often, they see the state of the mirror change from SYNCHRONIZED to SYNCHRONIZING and then a bit later back to SYNCHRONIZED. What's going on? Once a synchronously-mirrored database is synchronized, it should never get out of sync, right?

Well, not quite - if the communication link between the principal and the mirror is broken, then the mirror becomes unsynchronized. The exact behavior in this situation depends on how mirroring is set up and what's failed:

  1. If there's no witness instance, then transactions will continue on the principal database but the transaction log starts to grow, because the transactions can't be cleared from the principal's log (even after a log backup) until they've been sent to the mirror. The database is running 'exposed'. Once the link is reestablished, the mirror will synchronize again.
  2. If there's a witness, and the witness can still talk to the principal, then everything continues as in #1.
  3. If there's a witness, and the communication link between it and the principal is also broken, then the principal will stop serving the database - transactions will stop. In this case, if the mirror and the witness can still see each other, then a failover will occur.
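The three cases can be summarized as a small decision function. This is a simplified Python model that assumes synchronous (high-safety) mirroring. The function and its flags are hypothetical, and it ignores edge cases that Books Online covers, such as manual failover or losing only the witness while the partners stay connected.

```python
def principal_outcome(has_witness, principal_sees_mirror,
                      principal_sees_witness=False, mirror_sees_witness=False):
    """Simplified model of the three cases above for synchronous mirroring.

    Returns "normal", "exposed", "failover" or "offline".
    """
    if principal_sees_mirror:
        return "normal"    # partners connected; the mirror stays (or gets back) in sync
    if not has_witness or principal_sees_witness:
        return "exposed"   # cases 1 and 2: principal keeps serving, unsent log accumulates
    # Case 3: the principal has lost both the mirror and the witness, so it stops serving.
    if mirror_sees_witness:
        return "failover"  # mirror and witness can still talk, so the mirror takes over
    return "offline"       # no partner can serve the database


for args in [(False, False), (True, False, True),
             (True, False, False, True), (True, False, False, False)]:
    print(args, principal_outcome(*args))
```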

There are some great Books Online entries that describe all of this - see http://msdn2.microsoft.com/en-us/library/ms179344.aspx to start with.

The customer had situation #1. Every so often the mirror would change state and it seemed to coincide with the defrag job. Looking in the error log shows messages like:

2007-10-24 11:43:36.21 spid23s     Error: 1474, Severity: 16, State: 1.

2007-10-24 11:43:36.21 spid23s     Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://roadrunnerpr.sqlskills.com:5022'.

So the network link was dying sometimes when the defrag was running - that explains the switch between SYNCHRONIZED and SYNCHRONIZING. Why the network link was dying is still under investigation but it seems like the additional transaction log generated by the defrag job was causing the network to become overloaded and some component of it wasn't behaving correctly under load.

There are a few things to learn from this:

  1. Not only do you need to make sure that your IO subsystem can handle the load on it correctly, you also need to make sure your network can handle the load on it. There are a bunch of tools available to stress-test network paths - one simple one is TrafficEmulator.
  2. When you're running on your test system before going into production, make sure you test *everything* as if you were running in production - including maintenance jobs because they can add significant load to a production system.
  3. When you implement an HA solution such as mirroring, consider all the ways that transaction log will be generated when figuring out the required network bandwidth to support your HA configuration - something like a defrag or rebuild can cause an enormous spike in log generation.
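Here is a back-of-the-envelope version of the bandwidth calculation in point 3, as a Python sketch. The rates are made-up examples, not figures from this customer. In practice you might measure log generation per database with the SQLServer:Databases 'Log Bytes Flushed/sec' performance counter, both with and without the maintenance jobs running.

```python
def link_utilization(log_mb_per_sec, link_mbit_per_sec):
    """Fraction of the principal-to-mirror link needed to ship the log.

    log_mb_per_sec: peak log generation rate (MB/s) of each mirrored database,
        measured while maintenance jobs such as defrags/rebuilds are running.
    link_mbit_per_sec: usable bandwidth between principal and mirror (Mbit/s).
    Ignores protocol overhead and SQL Server 2008 log stream compression.
    """
    return sum(log_mb_per_sec) * 8 / link_mbit_per_sec  # MB/s -> Mbit/s


# Hypothetical numbers: 2 MB/s normally, 10 MB/s more while the defrag runs,
# over a 100 Mbit/s link.
normal = link_utilization([2.0], 100)
during_defrag = link_utilization([2.0 + 10.0], 100)
print(f"normal: {normal:.0%}, during defrag: {during_defrag:.0%}")
```

With these numbers the link goes from 16% to 96% busy while the defrag runs. That leaves almost no headroom, which is exactly the kind of load under which a flaky network component can start dropping connections.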
Thursday, October 25, 2007 9:04:07 AM (Pacific Standard Time, UTC-08:00)
Sunday, October 21, 2007

Two weeks to go until the conferences start...

I've revamped the main Upcoming Events page on SQLskills.com to list all the DBA/IT-Pro, Developer, and BI sessions we're collectively doing at SQL Connections Fall, Microsoft TechEd Developers, Microsoft TechEd IT Forum, and a Microsoft TechNet Deep-Dive in November. (Now that I've finished my Lego model, what else am I going to do on a wet Sunday afternoon?:-))

And there's a lot of it - 4 full-day workshops, 17 sessions, 9 chalk-talks, 3 instructor-led labs, and 1 lunchtime demo session - but they're filling up fast and it looks like we'll be doing a few repeats at IT Forum. Anyone coming to any of our full-day workshops will get one of our DVDs packed full of Hands-On labs - incidentally, the Always-On DVD will be the base for our post-con workshop at SQL Connections.

One of the events I haven't mentioned before is the TechNet Deep-Dive. This is being organized by Microsoft in Switzerland and will be a full-day workshop on database maintenance taught by Kimberly and me. Check out the link above for details on how to register.

We've had some questions about which of the full-day workshops are applicable to customers running SQL Server 2000. Well, a lot of the technology we talk about was introduced in SQL Server 2005, but the concepts and best-practices remain the same. I'd say that the database maintenance workshop probably has the most content that will transfer to SQL Server 2000. However, if you're planning to upgrade to SQL Server 2005 or 2008, these workshops are packed full of info on new features and syntax that you'll need to know.

Look forward to seeing you at one of the events!

Sunday, October 21, 2007 1:35:25 PM (Pacific Standard Time, UTC-08:00)
Friday, October 19, 2007

Two of the cool features in SQL Server 2005 are CROSS APPLY and DMVs (Dynamic Management Views). Now, far be it from me to get my hands dirty explaining developer stuff like CROSS APPLY :-) but I was having a discussion with Colin Leversuch-Roberts in the UK about the composability limitations of the sys.dm_db_index_physical_stats DMV. (Btw - you should check out Colin's blog post series on Analysing Indexes - lots of useful stuff.)

So CROSS APPLY lets you invoke a table-valued function once for each row of an outer table, passing columns from that row as the function's parameters - which you can't do using JOIN. This works for most of the DMVs, but some of them are written to an older internal implementation that doesn't support CROSS APPLY, and sys.dm_db_index_physical_stats is one of them. If you try it you'll get an error like:

Msg 413, Level 16, State 1, Line 26

Correlated parameters or sub-queries are not supported by the inline function "master.sys.dm_db_index_physical_stats".

Fortunately there's a way around this - wrap the DMV in an artificial TVF of your own, and then CROSS APPLY to that. Here's an example:

 

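CREATE FUNCTION my_index_physical_stats (
    @database_id INT,
    @object_id INT,
    @index_id INT,
    @partition_number INT,
    @mode NVARCHAR(20))  -- the DMV's mode is a string (N'LIMITED', N'SAMPLED', N'DETAILED'), not an INT
RETURNS @result TABLE (
    database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,
    index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR(60) NULL, index_depth TINYINT NULL,
    index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,
    avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,
    record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,
    min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,
    forwarded_record_count BIGINT NULL)
AS
BEGIN
    -- Copy the DMV output into the table variable; this multi-statement TVF wrapper
    -- accepts correlated parameters, so it can be used with CROSS APPLY.
    -- SELECT * assumes the SQL Server 2005 column list above; later versions add columns
    -- (e.g. compressed_page_count in SQL Server 2008), so name the columns explicitly there.
    INSERT INTO @result SELECT * FROM
        sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode);
    RETURN;
END;
GO

-- Example: fragmentation of every index on every user table in the current database
SELECT OBJECT_NAME (i.object_id) AS table_name, i.name AS index_name,
    ps.index_level, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.indexes AS i
CROSS APPLY my_index_physical_stats (DB_ID (), i.object_id, i.index_id, NULL, N'LIMITED') AS ps
WHERE OBJECTPROPERTY (i.object_id, 'IsUserTable') = 1;
GO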
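If the CROSS APPLY semantics are unfamiliar, here is a rough Python model of what it does. This is an illustration only. `fake_index_stats` is a hypothetical stand-in for the wrapper function and returns made-up rows.

```python
def cross_apply(outer_rows, table_function):
    """Model of CROSS APPLY: call table_function once per outer row, passing that
    row, and yield the outer row combined with each row the function returns.
    Outer rows that produce no output are dropped (OUTER APPLY would keep them)."""
    for outer in outer_rows:
        for inner in table_function(outer):
            yield {**outer, **inner}


indexes = [{"object_id": 101, "index_id": 1}, {"object_id": 202, "index_id": 2}]


def fake_index_stats(row):
    # Hypothetical stand-in for my_index_physical_stats: made-up rows, one per index level.
    if row["object_id"] == 202:
        return []  # an index for which the function returns nothing
    return [{"index_level": 0, "avg_fragmentation_in_percent": 35.0},
            {"index_level": 1, "avg_fragmentation_in_percent": 0.0}]


for row in cross_apply(indexes, fake_index_stats):
    print(row)
```

The point the error message is making is that the parameters change per outer row. An inline function that can't accept correlated parameters can't be called this way, but a multi-statement wrapper can.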

Friday, October 19, 2007 5:35:26 PM (Pacific Standard Time, UTC-08:00)

During SQL Server 2005 development I did a comprehensive rewrite of the Books Online entries concerned with estimating index, table, and database size (see http://msdn2.microsoft.com/en-us/library/ms187445.aspx). I was recently having a discussion with Øystein Sundsbø about a bug in one of the formulas I wrote and it turns out he's written a neat tool that codifies all the formulas.

In case you're interested, the part of the formula I had incorrect was for figuring out how many pages would be needed for the upper levels of an index b-tree. My formula was:

$$\text{Num\_Index\_Pages} = \sum_{\text{Level}=1}^{\text{Levels}} (\text{Index\_Rows\_Per\_Page})^{\text{Level} - 1}$$

and Øystein came up with a better formula:

$$\text{Num\_Index\_Pages} = \sum_{\text{Level}=1}^{\text{Levels}} \frac{\text{Total\_Num\_Rows}}{(\text{Index\_Rows\_Per\_Page})^{\text{Level}}}$$

where in both formulas:

$$\text{Levels} = 1 + \log_{\text{Index\_Rows\_Per\_Page}} \left( \frac{\text{Total\_Num\_Rows}}{\text{Index\_Rows\_Per\_Page}} \right)$$
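To see how the two formulas differ in practice, here's a small Python sketch that evaluates both as written. Rounding Levels up (with a minimum of 1) and rounding each term of the revised sum up to whole pages are my assumptions. The row counts are made up.

```python
def levels(total_rows, rows_per_page):
    """Levels = 1 + log_IRP(Total_Num_Rows / IRP), i.e. log_IRP(Total_Num_Rows),
    rounded up (assumed), minimum 1. Integer arithmetic avoids float log rounding."""
    count, capacity = 1, rows_per_page
    while capacity < total_rows:
        capacity *= rows_per_page
        count += 1
    return count


def pages_original(total_rows, rows_per_page):
    """Original formula: sum of IRP^(Level - 1) for Level = 1..Levels."""
    n = levels(total_rows, rows_per_page)
    return sum(rows_per_page ** (lvl - 1) for lvl in range(1, n + 1))


def pages_revised(total_rows, rows_per_page):
    """Revised formula: sum of ceil(Total_Num_Rows / IRP^Level) for Level = 1..Levels."""
    n = levels(total_rows, rows_per_page)
    return sum(-(-total_rows // rows_per_page ** lvl) for lvl in range(1, n + 1))


print(pages_original(50_000, 100), pages_revised(50_000, 100))
print(pages_original(1_000_000, 100), pages_revised(1_000_000, 100))
```

With 50,000 rows at 100 index rows per page, the original formula gives 10,101 pages and the revised one gives 506. The two only agree when every level is completely full, as with 1,000,000 rows, where both give 10,101. The original formula effectively assumes a full tree.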

Check out Øystein's cool tool at http://dbgoby.blogspot.com/2007/10/db-goby-v10.html!

Friday, October 19, 2007 11:26:04 AM (Pacific Standard Time, UTC-08:00)
Thursday, October 18, 2007

Today I finally managed to hook up with SQL Server MVP and Regional Director Greg Low so he could interview me for his podcast show - we'd spent the last few months juggling schedules and time-zone differences (he's 17 hours ahead of Redmond) but today the stars aligned and we made it. You can download the interview at www.SQLDownUnder.com - show #24. Thanks Greg!

Ok - this is really geeky, but I love making models, and especially the harder Lego models. Lego is one of the coolest toys ever and I've been a big fan since I was a small child. Lego announced in the Spring their biggest Lego model ever - a large-scale Ultimate Collector's Millennium Falcon™ with 5195 pieces (check it out here on the Lego site). I pre-ordered mine right away!

Kimberly was away teaching for Microsoft in India the first two weeks of October this year. So what does Paul do when Kimberly's away? Well, apart from feverishly blogging on my new blog, I made the Falcon, which arrived the day before she left. I reckon it took me about 25-30 hours total time over the course of two weeks.

Here's a picture of the finished model, with a Diet Pepsi can alongside to give you an idea of the scale. Below is a bit of detail blow-up featuring my nick-name-sake Chewie.

 

Pretty cool eh? :-)

Thursday, October 18, 2007 6:17:11 PM (Pacific Standard Time, UTC-08:00)

There's been some discussion over on the SQL Server Central forums about database mirroring performance, and one question in particular on how to do index rebuilds on mirrored VLDBs.

Remember that in database mirroring, the database has to be in the full recovery model, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

  • In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror's transaction log, and thus lead to a decrease in transaction throughput on the principal.
  • In asynchronous mirroring, the additional log being generated could again overload the network link - but this time there's no requirement for transactions to harden in the mirror before they can commit on the principal, so instead the SEND queue on the principal gets really large. Now, in asynchronous mirroring the SEND queue represents the amount of work that will be lost if a failover occurs, as it's all the transaction log that hasn't yet been sent to the mirror. So, in asynchronous mode, a large index rebuild operation could lead to increased exposure to data loss in the event of a failover.
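A simple way to picture the asynchronous case is to model the SEND queue second by second. This is a Python sketch with hypothetical rates. It treats the link as a fixed MB/s pipe and ignores compression and protocol overhead.

```python
def send_queue_history(log_mb_per_sec, link_mb_per_sec):
    """Simplified SEND queue model for asynchronous mirroring.

    Each second the principal adds that second's log (MB) to the queue and the
    network link ships up to link_mb_per_sec of it to the mirror. The queue size
    is the log that would be lost if a failover happened at that moment.
    """
    queue, history = 0.0, []
    for generated in log_mb_per_sec:
        queue = max(0.0, queue + generated - link_mb_per_sec)
        history.append(queue)
    return history


# Hypothetical workload: 1 MB/s normally, 15 MB/s during a 5-minute index
# rebuild, over a link that can ship 10 MB/s.
workload = [1] * 60 + [15] * 300 + [1] * 600
history = send_queue_history(workload, 10)
print(f"peak unsent log: {max(history):.0f} MB")
```

Here the queue grows by 5 MB every second of the rebuild, peaking at 1,500 MB of potential data loss. Once the rebuild finishes, it drains at 9 MB/s.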

In SS2008, the log stream compression I blogged about here should go a long way to alleviating this problem. There are three ways I can think of to reduce the amount of transaction log generated by an index maintenance plan in SS2005:

  1. Use a potentially less expensive (in terms of logging) solution for removing index fragmentation. An index rebuild always rebuilds the entire index, so it is guaranteed to generate transaction log equivalent in size to the index being rebuilt, no matter how much fragmentation there is. The alternative is to do an index reorganize, using either my old DBCC INDEXDEFRAG or the new ALTER INDEX ... REORGANIZE. These only generate transaction log for the index pages they actually compact and reorganize, so for less heavily fragmented indexes they can generate far less log than a rebuild. There's no hard and fast rule here, but I generally say to reorganize where Logical Scan Fragmentation/Average Fragmentation in Percent from DBCC SHOWCONTIG/sys.dm_db_index_physical_stats, respectively, is less than 30%. You also need to consider page density - but really this is a topic for a whole other post.
  2. Be very selective about which indexes you choose to rebuild/reorganize. A lot of people have a maintenance plan that does this for every index every week, without checking whether the index is even fragmented or whether removing fragmentation for an index improves workload performance. Again, this is a whole other topic, but there is an old whitepaper for SS2000 I helped with that describes some of this - Microsoft SQL Server 2000 Index Defragmentation Best Practices.
  3. Partition the tables/indexes so that the changing portion of the data is the only portion that's affected by index maintenance. If most of your data is read-only, there's no point in having it included in reindex/reorganize operations, right? Kimberly recently wrote a blog post about such an architecture here.
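Points 1 to 3 can be combined into a per-partition decision, since sys.dm_db_index_physical_stats reports one row per partition. Here is a Python sketch of that logic. The 30% rebuild threshold comes from the rule of thumb above. The 5% lower bound and the 1,000-page minimum are assumed values for illustration, page density is ignored, and the stats are made up.

```python
def choose_maintenance(avg_fragmentation_percent, page_count,
                       reorganize_from=5.0, rebuild_from=30.0, min_pages=1000):
    """Pick an action for one index (or one partition of an index).

    rebuild_from follows the ~30% rule of thumb above; reorganize_from and
    min_pages are assumed values for illustration. Page density is ignored.
    """
    if page_count < min_pages or avg_fragmentation_percent < reorganize_from:
        return None  # leave it alone: small or barely fragmented
    if avg_fragmentation_percent < rebuild_from:
        return "REORGANIZE"
    return "REBUILD"


# Made-up per-partition stats for one partitioned index: only the "hot" partition
# needs maintenance, so the read-only partitions generate no log at all.
partitions = [
    {"partition_number": 1, "avg_fragmentation_in_percent": 0.4, "page_count": 500_000},
    {"partition_number": 2, "avg_fragmentation_in_percent": 1.1, "page_count": 480_000},
    {"partition_number": 3, "avg_fragmentation_in_percent": 18.0, "page_count": 90_000},
]
for p in partitions:
    action = choose_maintenance(p["avg_fragmentation_in_percent"], p["page_count"])
    if action:
        print(f"ALTER INDEX ... {action} PARTITION = {p['partition_number']}")
```

Only partition 3 gets touched here, with a reorganize rather than a rebuild. Both ALTER INDEX ... REORGANIZE and ALTER INDEX ... REBUILD accept a PARTITION = n clause in SQL Server 2005.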

For more info on database mirroring performance considerations, check out the whitepaper Database Mirroring: Best Practices and Performance Considerations. There's also a slide deck presentation based on this whitepaper that's been presented at various TechEds (I did it in China and Hong Kong last year and Kimberly did it in South Africa last year) - you can download it from the Hong Kong website here.

The bottom line is when mirroring is in the mix, you need to be more intentional with your database maintenance.

Thursday, October 18, 2007 4:37:43 PM (Pacific Standard Time, UTC-08:00)
Wednesday, October 17, 2007

Thanks for your patience and to all those who emailed to let me know. All the SQLskills.com blogs have been updated to the latest dasBlog version and everything's working again. I'd appreciate you taking the time to go back and re-enter any comments you tried to over the last few days.

Thanks!

Wednesday, October 17, 2007 4:52:22 PM (Pacific Standard Time, UTC-08:00)