Monday, March 31, 2008

(Redmond, WA: For immediate release worldwide)

Today, in a surprise development that has stunned industry analysts, SQLskills.com announced a new technology for DBAs that will help in the never-ending battle against human-error and unforeseen disasters. The patent-pending Time-Setback technology allows DBAs of SQL Server to literally rewind time and avoid disasters before they happen.

Renowned SQL expert Kimberly Tripp said in an interview earlier today: "This will be a real boost for harried DBAs. All this time I've been going on and on and on about how DBAs should have a comprehensive backup strategy to cope with disasters. Now they can just forget all of that, throw caution to the wind, and rely on a Time-Setback device!"

Asked how the R&D department developed the technology, a spokesman for the company said "We got the idea after reading Harry Potter and The Prisoner of Azkaban, where Hermione is given a Time-Turner device from Professor Dumbledore. We figured there had to be some scientific basis for it, just like all those books that explain how Star Trek and the X-Files are based on real physics too. So, we had a crack at creating it and it worked! I'm not sure the color's quite right though. Maybe we'll change that in V2. Anyway, cool eh?"

A further disclosure, from a major software company, explained that it is in talks with SQLskills.com to purchase almost a thousand of the devices to hand out to developers to "ensure we ship this year and don't have to change the name". The spokesman wouldn't name the product when pressed.

The device will launch on April 1st, 2008, and will be available for immediate delivery. Although the company has only manufactured 4 of the devices, it will use one of them to do just-in-time manufacturing as orders stream in. For further details, please send email to: AprilFools@SQLskills.com

(Redmond, WA: For immediate release worldwide)

Monday, March 31, 2008 1:55:52 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

Fixed now - thanks so much!

Folks - a couple of people have setup their systems to refresh *all* our category feeds every five minutes - this is putting an undue load on our server (and screws up our server stats). I know I post a lot but not *that* often. I've WHOIS'd the IP addresses - one person is in Brazil and another in Korea - I can't limit how often you do this but I can ban your ISP's IP addresses if you continue to put this load on us - which I don't want to do. Please change your refresh rate to be 60 minutes or just subscribe to a whole blog feed rather than all the category feeds.

Thanks!

Monday, March 31, 2008 3:50:53 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Sunday, March 30, 2008

Hot on the heels of a frenzied few weeks of teaching for Microsoft and we're off again - this time to Iceland! I've been wanting to come here for as long as I can remember but Kimberly's already been here 3 times before. We're teaching some seminars in conjunction with Miracle Iceland next week but decided to come a few days early to hang out in Reykjavik and see some sights.

Today we headed out of town with our friends Gunnar Bjarnason (Miracle Iceland's chief) and his wife Thorun with the aim of getting to the top of the Skjaldbreidur volcano (dormant of course!). First up we headed through the Thingvellir National Park to check out the fault line. Iceland sits on the boundary between the American and Eurasian tectonic plates - hence all the volcanic activity - checkout this link to see how Iceland sits in relation to the plates. In the park you can actually see where the plates come together as the fault line is a very obvious cleft lined with basalt formations. At one point there's a little bridge across the fault line and the strip-lake that formed in the cleft - not many places you can stand on a plate boundary. Apparently there's a tunnel that you can scuba-dive through to the nearby Lake Thingvallavatn, and the lake has amazing visibility for diving as it's fed almost exclusively from springs (see here). This is also where the oldest parliament in the world was established in 930AD. The Icelanders would meet here for a few weeks every year and the new laws would be memorized as there was a shortage of writing materials.

Next we headed out on road 52 for about 20km into the snowy wilderness, until we came to a set of power lines heading to one of the aluminum smelters on the island. Electricity is pretty cheap here (because it can be generated from steam from the geothermal activity) and so it's actually more economic to ship bauxite (the mineral that aluminum is smelted from) from Australia to here to be smelted and then shipped back again. Electricity here must be really cheap! Anyway, we followed the power lines across country along the side of the volcano and then Gunnar decided 'here!' and we simply turned and drove directly up the mountain in the snow. We got about 880m up before we finally got bogged down 200m from the top, even with the balloon tires down to 2 psi so we very carefully turned round and sat admiring the stunning view over lunch before heading back down.

Due to the remoteness of the area and the possibility of things going awry it's essential to have multiple radios and other emergency gear. We had no bother though, mostly due to Gunnar's excellent off-road driving skills, and the rugged Land Cruiser we were driving. We had towed along a snow-mobile part of the way with the idea of racing to the top of the mountain but the -7C temperature with *amazing* wind-chill killed that idea. Once back down by the lake we did a spot of bird-watching to add some more species to my life-list (Teal, Barrow's Goldeneye, Goosander). Kimberly and I dozed through our jet-lag on the drive back to Reykjavik, hitting the hotel 8 hours after we left.

Here's a couple of photos - click for bigger images.

 Gunnar and Paul unhooking the snowmobile.

 The view from up the mountain looking at other volcanoes.

Sunday, March 30, 2008 4:05:03 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, March 21, 2008

This is a really interesting question that came up in the Microsoft Certified Architect class I'm teaching at present - if a database has torn-page protection enabled, and page checksums are enabled, is all the existing torn-page detection lost?

This is an important question, because enabling page checksums doesn't suddenly make all allocated pages be protected by page checksums (it's not until a page is read into the buffer pool, modified, and then written back to disk, that it gets a page checksum). If all the existing torn-page protection is discarded when page checksums are enabled, then the pages would be unprotected until they got page checksums on. I couldn't remember the answer, so I experimented!

My idea was to create a database with torn-page protection, create a table with a simulated torn-page in it, then enable page checksums and see if the torn-page was still reported.

-- Create the test database
USE master;
GO
CREATE DATABASE ChecksumTest;
GO
USE ChecksumTest;
GO

-- Explicitly set the database to have torn-page detection
ALTER DATABASE ChecksumTest SET PAGE_VERIFY TORN_PAGE_DETECTION;
GO

-- Create a test table and insert a row.
CREATE TABLE BrokenTable (c1 INT, c2 CHAR (1000));
INSERT INTO BrokenTable VALUES (1, 'a');
GO

-- Ensure the page is written to disk and then tossed from the buffer pool
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO

Now I'm going to examine the page. There are two bits in the page header that specify whether the page is protected by torn-page detection or with a page checksum. Specifically, the m_flagBits field will have 0x100 set if the page is encoded for torn-page protection, and 0x200 set if the page has a page-checksum stored on it, and the page has not been modified (i.e. the checksum is stillvalid). You should not see the 0x100 bit set as torn-page encoding is removed when the page is read into the buffer pool - UNLESS the page IS actually torn, in which case the encoding is NOT removed.

sp_allocationmetadata 'BrokenTable';
GO
DBCC TRACEON (3604);
GO
DBCC PAGE ('ChecksumTest', 1, 143, 3);
GO

<snip>

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 67     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042318848                                
Metadata: PartitionId = 72057594038321152                                 Metadata: IndexId = 0
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 1008                       m_slotCnt = 2                        m_freeCnt = 6070
m_freeData = 2118                    m_reservedCnt = 0                    m_lsn = (28:183:2)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 770
      

<snip>     

In this case the torn-page encoding has been removed, and the page is fine. Once I've corrupted the page on disk, it's tricky to be able to see it with DBCC PAGE. I managed to catch it once and saw the following:

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8100
m_objId (AllocUnitId.idObj) = 67     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042318848                                
Metadata: PartitionId = 72057594038321152                                 Metadata: IndexId = 0
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 1008                       m_slotCnt = 1                        m_freeCnt = 7083
m_freeData = 1107                    m_reservedCnt = 0                    m_lsn = (28:81:20)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 41949233

Now if I try to select from the table I get:         

SELECT * FROM BrokenTable;
GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0xaaaaa82a). It occurred during a read of page (1:143) in database ID 8 at offset 0x0000000011e000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\ChecksumTest.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The crux of the question is whether this will still be reported if the database switches to page checksums - let's try:

ALTER DATABASE checksumtest SET PAGE_VERIFY CHECKSUM;
GO

SELECT * FROM BrokenTable;
GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0xaaaaa82a). It occurred during a read of page (1:143) in database ID 8 at offset 0x0000000011e000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\ChecksumTest.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Cool! The answer is YES - the torn-page is still detected, because the bit in the page header specifies which page protection algorithm the page is using. In fact, it even works if you turn off page checksums and torn-page detection completely.

Friday, March 21, 2008 3:23:07 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, March 19, 2008

One of the drawbacks of not being in the SQL team at Microsoft any longer is that I don't know about all the undocumented features in the next release - I have to hunt around for them like everyone else :-(

So I was poking about in SSMS in 2008 CTP-6 and noticed a function called sys.fn_PhysLocCracker that I'd never heard of. Doing an sp_helptext on it gets the following output:

-- Name: sys.fn_PhysLocCracker
--
-- Description:
-- Cracks the output of %%physloc%% virtual column
--
-- Notes:
-------------------------------------------------------------------------------
create function sys.fn_PhysLocCracker (@physical_locator binary (8))
returns @dumploc_table table
(
 [file_id] int not null,
 [page_id] int not null,
 [slot_id] int not null
)
as
begin

 declare @page_id binary (4)
 declare @file_id binary (2)
 declare @slot_id binary (2)

 -- Page ID is the first four bytes, then 2 bytes of page ID, then 2 bytes of slot
 --
 select @page_id = convert (binary (4), reverse (substring (@physical_locator, 1, 4)))
 select @file_id = convert (binary (2), reverse (substring (@physical_locator, 5, 2)))
 select @slot_id = convert (binary (2), reverse (substring (@physical_locator, 7, 2)))
 
 insert into @dumploc_table values (@file_id, @page_id, @slot_id)
 return
end

Cool - but something else I've never heard of %%physloc%% - what's that? After playing around for a while, I figured out how to make it work.  Just to be confusing, there's another identical version of the function called sys.fn_PhysLocFormatter - and that's the only one I could get to work. Here's an example:

CREATE TABLE TEST (c1 INT IDENTITY, c2 CHAR (4000) DEFAULT 'a');
GO
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES;
GO

SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS [Physical RID], * FROM TEST;
GO

Physical RID       c1
-----------------  -----------
(1:411:0)          1
(1:411:1)          2
(1:413:0)          3

It's a physical-record locator function! Undocumented and unsupported (obviously), but hey, some of the best features are :-) It gives the database file, page within the file, and slot number on the page in the format (file:page:slot). I can think of a *bunch* of uses for this which I'll be exploring over the next few months.

How cool is that?!?!

Wednesday, March 19, 2008 4:25:13 PM (Pacific Standard Time, UTC-08:00)  #    Comments [6]  | 
Thursday, March 13, 2008

Well this one is well overdue and I'm in the middle of writing a class where I want to reference this blog post - so I suppose I'd better write it!! This is an updated post from my old Storage Engine blog that now covers DIFF and ML map pages.

In some previous posts in this series I built up the storage basics in database files:

The final pieces in the allocation puzzle are the other allocation-tracking map pages - GAM, SGAM, PFS, ML map, and DIFF map pages. All of the following explanation holds for SQL Server 2000 and all subsequent releases so far. For any of these pages you can do a dump-style 3 DBCC PAGE dump and it will interpret the page and give you a human readable form of the allocation tracking data.

GAM pages

GAM stands for Global Allocation Map. If you remember from before, database data files are split up into GAM intervals (don't get confused - they're not split physically, just conceptually). A GAM interval is equivalent to the amount of space that the bitmaps in GAM, SGAM, ML map, DIFF map, and IAM pages track - 64000 extents or almost 4GB. These bitmaps are the same size in each of these five page types and have one bit per extent, but they mean different things in each of the different allocation pages.

The bits in the GAM bitmap have the following semantics:

  • bit = 1: the extent is available for allocation (you could think of it as currently allocated to the GAM page)
  • bit = 0: the extent is already allocated for use

These semantics are the same for mixed and dedicated/uniform extents.

One thing to note, at the start of every GAM interval is a GAM extent which contains the global allocation pages that track that GAM interval. This GAM extent cannot be used for any regular page allocations. The first GAM extent starts at page 0 in the file and has the following layout:

  • Page 0: the file header page (another post!)
  • Page 1: the first PFS page
  • Page 2: the first GAM page
  • Page 3: the first SGAM page
  • Page 4: Unused in 2005+
  • Page 5: Unused in 2005+
  • Page 6: the first DIFF map page
  • Page 7: the first ML map page

SGAM pages

I remember last year having an email discussion about what the 'S' stands for in SGAM. Various names have been used over the years inside and outside Microsoft but the official name that Books Online uses is Shared Global Allocation Map. To be honest, we always just call them 'es-gams' and never spell it out.

As I said above, the SGAM bitmap is exactly the same as the GAM bitmap in structure and the interval it covers, but the semantics of the bits are different:

  • bit = 1: the extent is a mixed extent and has at least one unallocated page available for use
  • bit = 0: the extent is either dedicated or is a mixed extent with no unallocated pages (essentially the same situation given that the SGAM is used to find mixed extents with unallocated pages)

Combining GAM, SGAM, and IAM pages

So, taking the GAM, SGAM and IAM pages together (remember that in the IAM bitmap, the bit is set if the extent is allocated to the IAM chain/allocation unit), the various combinations of bits are:

GAM

SGAM

Any IAM

Comments

0

0

0

Mixed extent with all pages allocated

0

0

1

Dedicated extent (must be allocated to only a single IAM page)

0

1

0

Mixed extent with >= 1 unallocated page

0

1

1

Invalid state

1

0

0

Unallocated extent

1

0

1

Invalid state

1

1

0

Invalid state

1

1

1

Invalid state

You can see that only 4 of the 8 possible bit combinations for any particular extent are valid. Anything else constitutes a corruption of some sort and can lead to all kinds of horrible situations.

ML map pages

ML stands for Minimally Logged. These pages track which extents have been modified by minimally-logged operations since the last transaction log backup when using the BULK_LOGGED recovery model. The idea is that the next transaction log backup will backup the log as usual, and then also include all the extents marked as changed in these bitmaps. The combination of these extents, plus the transaction log in the backup gives the differences that have occured in the database since the previous transaction log backup. The ML page bitmaps are cleared once they've been read. If you don't ever use the BULK_LOGGED recovery model, these pages are never used.

The ML page bitmap is exactly the same as the GAM bitmap in structure and the interval it covers, but the semantics of the bits are different:

  • bit = 1: the extent has been changed by a minimally logged operation since the last transaction log backup
  • bit = 0: the extent was not changed

DIFF map pages

DIFF stands for differential. These pages track which extents have been modified since the last full backup was taken. It is a common misconception that the bitmaps track the changes since the last differential backup. The idea is that a differential backup will contain all the extents that have changed since the last full backup. Restore time can be cut down significantly by using differential backups to avoid having to restore all the log backups in the period between the full and last differential backup - more on this in a later post. The bitmaps are not cleared until the next full backup. Note that I don't say full database backup in the explanation above. The full and differential backups can be database, filegroup, or file level backups.

The DIFF page bitmap is exactly the same as the GAM bitmap in structure and the interval it covers, but the semantics of the bits are different:

  • bit = 1: the extent has been changed since the last full backup
  • bit = 0: the extent was not changed

PFS pages

PFS stands for Page Free Space, but the PFS page tracks much more than that. As well as GAM intervals, every database file is also split (conceptually) into PFS intervals. A PFS interval is 8088 pages, or about 64MB. A PFS page doesn't have a bitmap - it has a byte-map, with one byte for each page in the PFS interval (not including itself).

The bits in each byte are encoded to mean the following:

  • bits 0-2: how much free space is on the page
    • 0x00 is empty
    • 0x01 is 1 to 50% full
    • 0x02 is 51 to 80% full
    • 0x03 is 81 to 95% full
    • 0x04 is 96 to 100% full
  • bit 3 (0x08): is there one or more ghost records on the page?
  • bit 4 (0x10): is the page an IAM page?
  • bit 5 (0x20): is the page a mixed-page?
  • bit 6 (0x40): is the page allocated?
  • Bit 7 is unused

For instance, an IAM page will have a PFS byte value of 0x70 (allocated + IAM page + mixed page).

Free space is only tracked for pages storing LOB values (i.e. text/image in SQL Server 2000, plus (n)varchar(max), varbinary(max), XML, and row-overflow data in SQL Server 2005) and heap data pages. This is because these are the only pages that store unordered data and so insertions can occur anywhere there's space. For indexes, there's an explicit ordering so there's no choice in the insertion point.

The point at which a PFS byte is reset is not intuitive. PFS bytes are not fully reset until the page is reallocated. On deallocation, the only bit in the PFS byte that's changed is the allocation status bit - this makes it very easy to rollback a deallocation.

Here's an example. Using a database with a simple table with one row. A DBCC PAGE of the IAM page includes:

PFS (1:1) = 0x70 IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL

If I drop the table in an explicit transaction and then do the DBCC PAGE again, the output no includes:

PFS (1:1) = 0x30 IAM_PG MIXED_EXT 0_PCT_FULL

And if I rollback then transaction, the DBCC PAGE output reverts to:

PFS (1:1) = 0x70 IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL

Ok - four blog posts in one day is quite enough! :-)

Thursday, March 13, 2008 7:31:54 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

It seems that today is going to be one of those days where I get lost in forums and blogging - I can live with that :-)

One of the questions that came up on a forum today was about choosing an HA solution - based solely on the hardware that was running the database! Given that single piece of info, it's impossible to come up with any kind of sensible answer. The other thing I see a lot is someone saying 'just use a cluster' - well, if you're trying to protect against damage to the data, just using a cluster won't do it because of the single-point-of-failure in a failover cluster - the shared disks.

So where do you start? The key to choosing an HA solution is to work out your requirements first and then choose a technology that allows you to meet as many of them as you can, within your available budget. Here are some of the questions I like to ask (not an exhaustive list):

  • What is the maximum application downtime SLA (service-level agreement)? In other words, if a disaster happens, how long can the application be off-line while failover occurs or the disaster is fixed?
  • What is the maximum acceptable data-loss SLA? If a disaster happens, how much can you afford to lose in terms of data or work? You might require up-to-the minute recovery for instance, or you might be able to cope with losing the last day's worth of transactions.
  • What are you trying to protect? Site, server, instance, database, filegroup, partition, table, group of tables?
  • What is the transaction log generation rate of your workload? If it's very high, that means you're going to have problems with backup up the log and with getting transaction log over to your redundant server/site.
  • What recovery model are you running your database(s) in? If you're in SIMPLE, then you can't get point-in-time recovery and so you're looking at losing all the work since your last full backup, and it also means you can't use any of the HA technologies which rely on the transaction log.
  • What’s your current backup strategy? If the answer is 'what backup strategy?' then you've got bigger problems than just getting an HA solution in place...
  • Are you trying to achieve site-level redundancy? If so, do you have a second site? Where is it? Does it have the same protection as the main site (in terms of security, HVAC, power, etc)
  • What’s the network bandwidth and latency to the second site? If your transction log generation rate is MBs/second, but your second site is 2000 miles away through a 720KB/second link, you're not going to be doing any kind of HA solution involving the second site that comes close to your downtime and data-loss requirements...
  • What’s the hardware at the second site?
  • Can you alter the application at all? If you can't alter the application then you may have a hard time getting it to gracefully failover to a redundant server. You also won't be able to use explicit redirection with database mirroring.
  • What's the application eco-system? In other words, what all has to failover so the application can run properly.

All of these figure into the choice of HA solution. Work these out, prioritize them, and then evaluate HA technologies (or combinations of technologies) to see which requirements you can meet. Don't just jump at failover clustering first!

Over the next few months I'll be posting more on designing for high-availability - let me know if there's anything in particular you want to see.

Thursday, March 13, 2008 1:14:46 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

This came up several times during the week so I thought it was about time to blog about it. One of the new features we put into SQL Server 2005 was storing the last time that DBCC CHECKDB completed successfully (called the last-known good time). What does successfully mean? This is the catch - if DBCC CHECKDB runs to completion then it considers that a successful run - EVEN if it found some corruptions. However, the last-known good time is ONLY updated if there were NO corruptions found. Slightly confusing I know.

Cool - but how can you see it? Well, the only time it's ever reported is when the database starts up. Not too useful if the database has been running for months. So how to see it??? The trick is to use DBCC PAGE. The last-known good time is stored in the boot page of the database - page 9. The following code will dump page 9 for you:

-- you need this to get the DBCC PAGE output to your console
DBCC TRACEON (3604);
GO

-- page 9 is the boot page
DBCC PAGE (dbname, 1, 9, 3);
GO

You need to look for the dbi_dbccLastKnownGood field. That was the last time that DBCC CHECKDB ran without finding any corruptions.

Now - what about if you're trusting your SQL Agent jobs to run DBCC CHECKDB for you every so often (how often is a whole other discussion...) and relying on the Agent job failing if DBCC CHECKDB finds corruptions. Well, if all you do is run the DBCC command, you're never going to know about corruption unless DBCC CHECKDB itself fails for some reason - remember, it returns successfully even it it found corruptions.The key is to add another statement after you run DBCC that checks the value of @@ERROR (the last error code reported by SQL Server). Although DBCC CHECKDB doesn't stop with an error when it finds corruption, it will set @@ERROR if there are any error messages in its output.

Edit: The stuff I've struck-through above is not true. I thought I'd remembered it from testing previously and I've heard plenty of anecdotal evidence from customers too BUT an Agent job WILL fail if a DBCC CHECKDB within it fails. I went back and forth with Tara Kizer on SQLteam.com and we tested on 2005 and 2000 - and it worked. Now here is a real issue - the Job History that's captured will not contain all the output from DBCC CHECKDB - especially if you didn't use the WITH NO_INFOMSGS option. You should make sure you capture the output to a file to avoid having to go back and run DBCC CHECKDB all over again.

Summary - make sure you really know when DBCC CHECKDB runs successfully, without finding any corruptions!

Thursday, March 13, 2008 12:32:17 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

Here's an interesting question that came in to our questions line (questions@SQLskills.com - no guarantee of an answer - I check it every so often):

I have seen demonstrations where a large database being broken down into smaller ones using synonym names. I think it was used on a data warehouse and allowed smaller database backups instead of doing one large one and simulating file group backups. Is there somebody who has worked with this variation and can identify when this would be an advantage over file group backups if there is even an advantage.

My answer will always be to keep the VLDB (Very Large DataBase) as a single unit and go with filegroups if you need to. Breaking the VLDB into smaller databases has some serious issues:

  • Queries become more complicated as they're now potentially cross-database. This means you need to keep all the security settings in all the databases synchronized.
  • Referential integrity becomes a big problem as you can't create foreign key constraints across databases
  • You have multiple transaction logs to manage instead of one. This means you need to be doing log backups of ALL the databases, vastly increasing the number of backup files to manage.
  • Point-in-time recovery becomes very hard as you have to restore ALL the databases to a single point-in-time. Now, this may not be too much of a problem if the data in the VLDB is essentially read-only, and gets updated en-masse every so often from your OLTP system - but for changing data it's a nightmare.
  • Implementing a high-availability solution becomes very challenging. As soon as you start to think of multiple databases that need to be in sync, you can pretty much forget about log shipping and database mirroring. You're going to need whole-instance failure protection - which means failover clustering. Then if you want to mitigate the single-point-of-failure in a failover cluster (the shared disks), you're going to need SAN replication to a remote failover cluster too - expensive!!!

These are just the ones that spring to mind in 5 minutes - I'm sure there are more if I sat and thought about it longer (e.g. how to create a database snapshot, run a consistency check, ...)

So - IMHO it's always going to be easier to backup and restore a single VLDB split into filegroups than a VLDB split into multiple databases.

PS If there's something you'd like to see me do a blog post on, shoot me an email here.

Thursday, March 13, 2008 9:41:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, March 12, 2008

It's been a long few days of building slide decks and other content and I just had to stop for a bit and take care of my internals-hacking withdrawal symptoms!

While I was at Microsoft, I wrote some code in the Storage Engine to very easily return all the IAM chains/allocation units (see this post for more details of the internals of those), what type they are, and the relevant page IDs (first, root, first-IAM) so I could go spelunking with DBCC PAGE. Since I left six months ago, it's one of the things I've been missing using when poking around on customer sites, so this afternoon I sat down and wrote the equivalent in T-SQL, using the undocumented sys.system_internals_allocation_units DMV. The output is easy to match up to sys.partitions but the page IDs are formatted in byte-reversed hex so a little tweaking was needed to extract the fields and make them human readable - I've put them into the same format that all SQL Server error messages use when giving a page number.

So - I present to you sp_AllocationMetadata. I was having all kinds of trouble using it in other databases (trying to figure out a way to change database contexts in the SP) until I remembered that you can create an SP in master and mark it as a system object using the undocumented sys.sp_MS_marksystemobject SP. This makes any SP execute in the context of the database from where it is called - extremely useful when you're querying against a database's system catalog views.

[Edit: Kalen pointed out that the DMV *is* documented, but just not in the BOL index. Even better - that means the SP below isn't doing anything dodgy :-) - thanks Kalen!]

The SP can be called with an optional object name parameter, in which case it will only give you back the allocation metadata for that object. If you don't specify a parameter, it gives you back the allocation metadata for all objects in the database. Here's an example of the output:

USE AdventureWorks;
GO

EXEC sp_AllocationMetadata 'HumanResources.Employee';
GO

Object Name   Index ID   Alloc Unit ID       Alloc Unit Type   First Page   Root Page   First IAM Page
------------- ---------- ------------------- ----------------- ------------ ----------- ----------------
Employee      1          72057594050379776   IN_ROW_DATA       (1:588)      (1:594)     (1:593)
Employee      2          72057594055491584   IN_ROW_DATA       (1:2141)     (1:2144)    (1:2142)
Employee      3          72057594055557120   IN_ROW_DATA       (1:2146)     (1:2149)    (1:2147)
Employee      4          72057594055622656   IN_ROW_DATA       (1:2150)     (1:2150)    (1:2151)
Employee      5          72057594055688192   IN_ROW_DATA       (1:2153)     (1:2153)    (1:2154)

You'll notice there are only IN_ROW_DATA allocation units - that's because this table doesn't have any LOB data or an variable-length columns that have been pushed off-row (producing LOB_DATA and ROW_OVERFLOW_DATA allocation units, respectively). So - it only shows what actually exists (rather than creating NULL values, for instance).

Below is the script that creates the SP, and I've included it as an attachment too.

Ah - that feel's better :-) Happy spelunking!

USE master;
GO

IF OBJECT_ID ('sp_AllocationMetadata') IS NOT NULL
   DROP PROCEDURE sp_AllocationMetadata;
GO

CREATE PROCEDURE sp_AllocationMetadata
(
   
@object VARCHAR (128) = NULL
)
AS
SELECT
   
OBJECT_NAME (sp.object_id) AS [Object Name],
   
sp.index_id AS [Index ID],
   
sa.allocation_unit_id AS [Alloc Unit ID],
   
sa.type_desc AS [Alloc Unit Type],
   
'(' CONVERT (VARCHAR (6),
      
CONVERT (INT, SUBSTRING (sa.first_page, 6, 1) +
      
   SUBSTRING (sa.first_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT, SUBSTRING (sa.first_page, 4, 1) +
      
   SUBSTRING (sa.first_page, 3, 1) +
         
SUBSTRING (sa.first_page, 2, 1) +
         
SUBSTRING (sa.first_page, 1, 1))) +
   
')' AS [First Page],
   '(' + CONVERT (VARCHAR (6),
      
CONVERT (INT,
         
SUBSTRING (sa.root_page, 6, 1) +
         
SUBSTRING (sa.root_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT,
         
SUBSTRING (sa.root_page, 4, 1) +
         
SUBSTRING (sa.root_page, 3, 1) +
         
SUBSTRING (sa.root_page, 2, 1) +
         
SUBSTRING (sa.root_page, 1, 1))) +
   
')' AS [Root Page],
   
'(' + CONVERT (VARCHAR (6),
      
CONVERT (INT,
         
SUBSTRING (sa.first_iam_page, 6, 1) +
         
SUBSTRING (sa.first_iam_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT,
         
SUBSTRING (sa.first_iam_page, 4, 1) +
         
SUBSTRING (sa.first_iam_page, 3, 1) +
         
SUBSTRING (sa.first_iam_page, 2, 1) +
         
SUBSTRING (sa.first_iam_page, 1, 1))) +
   
')' AS [First IAM Page]
FROM
   
sys.system_internals_allocation_units AS sa,
   
sys.partitions AS sp
WHERE
   
sa.container_id = sp.partition_id
   AND sp.object_id =
      
(CASE WHEN (@object IS NULL)
         THEN sp.object_id
         
ELSE OBJECT_ID (@object)
      
END);
GO

EXEC sys.sp_MS_marksystemobject sp_AllocationMetadata;
GO

sp_AllocationMetadata.zip (.69 KB)
Wednesday, March 12, 2008 2:38:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

A couple of weeks ago I blogged about the three tracks of the SQL Server 2008 JumpStart course that SQLskills.com taught internally for Microsoft and some MVPs - see here for details. Well, the content is now available to download! Note that this was based on CTP-5 (November 2007 CTP) and there have been *lots* of behavioral changes since then (with more planned for CTP-6 Refresh and RTM), but if you want a high-level overview of a bunch of the features (albeit in bullet-point summaries on slides) then this is a good place to start.

For me, what's *REALLY* cool is that the site also has a downloadable VPC plus lab manuals for all of the AlwaysOn High-Availability hands-on labs that SQLskills.com wrote. We originally wrote these labs for SQL Server 2005 and I updated them all for CTP-5. The VPC has a long lab on each of the following:

  • Database Snapshots
  • Data Recovery and Preventative Techniques
  • Instant Initialization
  • Peer-to-Peer Replication (including the new Topology Wizard I blogged about here)
  • Table and Index Partitioning
  • Snapshot Isolation
  • Online Operations
  • Database Mirroring (including a demo I wrote of Automatic Page Repair, described here)
  • Service Oriented Database Architecture

There is some great depth in each of these - Kimberly blogged more info about the exercises in each lab here. There's also another VPC image with some higher-level labs on a variety of 2008 features and written by a number of different people- including some labs on Policy-Based Management and Performance Data Collection that Kimberly wrote.

So - where can you get these from? Go to http://sqlserver2008jumpstart.microsofttraining.com/ and hit the Download link on the right-hand side. Register and then you can get to the materials. The AlwaysOn VPC image is Collection 2 at the bottom of the page, and you'll see all the slide decks as you scroll down the page.

Enjoy!

Wednesday, March 12, 2008 9:23:23 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, March 11, 2008

The second part of our radio interview with TechNet has been released (see here for part 1). In this installment we discuss troubleshooting and manageability in SQL Server 2008. You can get to it by going to the March 11th 2008 show here.

Enjoy!

Tuesday, March 11, 2008 3:45:21 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, March 10, 2008

(Cross-posted on Paul and Kimberly's blogs)

With the Spring SQL Server Connections show coming up next month, its time to start planning for the Fall show. Once again Kimberly and I will be Co-Chairs of the Conference and we'd like to invite you to submit abstracts for the Fall 2008 SQL Connections conference, to be held in Las Vegas, November 10-14th, 2008.

The conference will take place shortly after the SQL Server 2008 launch (when it actually RTMs, not the 'launch' that happened February 27th), and will focus heavily on SQL Server 2008. Abstracts are still welcome on best practices for SQL Server 2005 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 we'll consider any level - especially depending on the newness of the release.

Note: You will not be able to speak at a DevConnections show if you are also presenting at a competitor’s show, in the same state, within 30 days of the DevConnections show. To be more specific, if you wish to speak at SQL Connections or any other DevConnections show this Fall in Las Vegas, you are precluded from speaking at any competing conferences in the state of Nevada between October 10, 2008 and December 13, 2008.

For submitting session abstracts, please use this URL:  http://www.deeptraining.com/devconnections/abstracts

The tool will be open from March 10th to midnight EST April 9th, after which we won't accept any abstracts. If you have used this site before and have forgotten your password, you can have your secret word mailed to you. It's better to do this then to create a new account. Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. There's one slight change for this conference only - we won't have a fourth track for speakers just delivering a single conference session, so if you only submit a single session abstract, you're unlikely to be picked.

What you will get if selected:

  • $500 per conference talk. (Additional compensation for pre/post conference workshops.)
  • Coach airfare and hotel stay paid by the conference
  • Free admission to all of the co-located conferences
  • Speaker party
  • The adoration of attendees :-)
  • etc.

Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny new abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

Monday, March 10, 2008 1:29:38 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Sunday, March 09, 2008

During the various courses I've been teaching, people are interested in how FILESTREAM performance compares with storing BLOBs in the database itself. I have some performance graphs based on measurements the dev team made - these have been presented publicly by myself and the dev team so I can share them with you here.

There are three graphs below, showing the relative performance for read, insert, and update of:

  • BLOB data stored in FILESTREAM format and accessed through the WIN32 streaming APIs. The times include getting a transaction context from SQL Server, getting the file path, doing the operation, closing the file, and committing the transaction in SQL Server.
  • BLOB data stored in FILESTREAM format and manipulated through T-SQL
  • BLOB data stored in varbinary(max) format (and obviously manipulated through T-SQL)

The data is the same in each test. The tests were performed on a 4-way box with a cold buffer pool. One interesting point to note is that for smaller data sizes, it's faster to manipulate them through T-SQL than through the file system - this is expected based on research Jim Gray did when putting together the original TerraServer.

Sunday, March 09, 2008 9:45:15 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Friday, March 07, 2008

Just over a month ago I posted on how to enable FILESTREAM in CTP-5 and the pre-CTP-6 build I was working with. Now that's all changed! CTP-6 is a hybrid of the CTP-5 method and what will eventually be the methodology in RTM. The changes were made to separate the OS-level configuration from the SQL-level configuration, so a SQL admin can't invoke OS-level changes. This makes sense to me.

In a nutshell, the new way of enabling FILESTREAM will be:

  • OS admin enables FILESTREAM when installing SQL Server or through the SQL Server Configuration Manager
  • SQL admin enables FILESTREAM in the instance using sp_configure. This will always succeed, but if this is done before the OS-level enabling, then FILESTREAM operations will fail.

I won't go into all the details as Joanna Omel (the Program Manager for FILESTREAM in the Storage Engine) has blogged about them on the Storage Engine PM team blog - see here for her post.

More on CTP-6 changes as I hear about them (or trip over them!)

Friday, March 07, 2008 11:10:52 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, March 04, 2008

Wow - almost 10 days without a blog post - that must be a record for me! :-) Never fear - I'll be posting more over the next couple of weeks. Kimberly flew off to India yesterday to teach some Microsoft classes and unfortunately I couldn't join her this time as I'm teaching 3 classes myself:

  • an internal Microsoft class on Designing for High Availability
  • another internal Microsoft class on SQL Server 2008 for DBAs (similar to the JumpStart class I posted about here)
  • 3 days of content for the new Microsoft Certified Architect: Database qualification - see the Microsoft Learning site here for details

Anyway - the subject of this post is to let you know that last week, Kimberly and I did two interviews for TechNet Radio on SQL Server 2008 technologies. Part 1 has just been released where we discuss security and availability features. You can get to it by going to the March 4th 2008 show here. Tune in and find out how I lull myself to sleep when Kimberly's out of town...

Enjoy!

Tuesday, March 04, 2008 6:31:23 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Sunday, February 24, 2008

SQL Connections is in less than two months now and our pre-con and post-con workshops are filling up fast - checkout my previous blog post here for the full list of what we're presenting.