Monday, April 28, 2008

We're sitting here in St. Pete Beach in Florida visiting some of Kimberly's family and having some sun-kissed R&R before heading back up to Seattle on Wednesday, and I thought I'd get the next post in my sparse columns mini-series out. Before I start though, Kimberly just posted the resources for the Accidental DBA class we taught at SQL Connections last week and in Iceland at the end of March - see here.

In my first post on sparse columns (see here) I introduced the concepts and explained why sparse columns are a useful feature. In this post I'm going to use the example I gave - a document repository with 50 document types and 20 unique attributes per document-type. Yes, it's a contrived example, but scale it up be a factor of 100+ (think Sharepoint Server) and methods like normalization no longer apply.

I'm using a CTP-6 VPC on a Lenovo T60P laptop with 4GB, a dual-core 2.2GHz CPU, and the VPC is running off a 6200RPM drive. Your mileage may vary for run-times of the example scripts. The VPC is the one we gave out at SQL Connections and Iceland, and you can download the scripts and a VPC (maybe only CTP-5) from the Microsoft JumpStart site (see here for details).

The first test I'll do is just creating the schema necessary to store the 1000+ columns of information in the test scenario. I'll do one with sparse columns and one without:

-- Create two tables, one with 1000 columns and one with 1000 columns but 997 sparse.
CREATE TABLE TableWithoutSparseColumns (
   DocID INT IDENTITY, DocName VARCHAR (100), DocType INT,
   c0004 INT NULL, c0005 INT NULL, c0006 INT NULL, c0007 INT NULL, c0008 INT NULL, c0009 INT NULL,
   ...
   c0994 INT NULL, c0995 INT NULL, c0996 INT NULL, c0997 INT NULL, c0998 INT NULL, c0999 INT NULL,
   c1000 INT NULL);
GO

CREATE TABLE TableWithSparseColumns (
   DocID INT IDENTITY, DocName VARCHAR (100), DocType INT,
   c0004 INT SPARSE NULL, c0005 INT SPARSE NULL, c0006 INT SPARSE NULL, c0007 INT SPARSE NULL,
   ...
   c0996 INT SPARSE NULL, c0997 INT SPARSE NULL, c0998 INT SPARSE NULL, c0999 INT SPARSE NULL,
   c1000 INT SPARSE NULL);
GO

I won't list all the column names for the sake of brevity. Next I'll insert some values into each table (the same values in each table):

-- Insert a few rows in each
INSERT INTO TableWithSparseColumns (DocName, Doctype) VALUES ('aaaa', 1);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0945) VALUES ('bbbb', 2, 46);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0334) VALUES ('cccc', 3, 44);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0233, c0234) VALUES ('dddd', 4, 12, 34);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0233, c0234,c0235,c0236) VALUES ('eeee', 4, 12, 34, 46, 66);
GO

INSERT INTO TableWithoutSparseColumns (DocName, Doctype) VALUES ('aaaa', 1);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0945) VALUES ('bbbb', 2, 46);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0334) VALUES ('cccc', 3, 44);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0233, c0234) VALUES ('dddd', 4, 12, 34);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0233, c0234,c0235,c0236) VALUES ('eeee', 4, 12, 34, 46, 66);
GO

Now let's see how big each table is:

-- Now lets see how big the rows are
SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   DB_ID ('SparseColumnsTest'), OBJECT_ID ('TableWithoutSparseColumns'), NULL, NULL, 'DETAILED');

SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   
DB_ID ('SparseColumnsTest'), OBJECT_ID ('TableWithSparseColumns'), NULL, NULL, 'DETAILED');
GO

avg_record_size_in_bytes page_count
------------------------ --------------------
4135                     5

(1 row(s) affected)

avg_record_size_in_bytes page_count
------------------------ --------------------
40.6                     1

(1 row(s) affected)

Ok - so that's not a huge difference in page count (because we've only got 5 rows), but it's a *massive* difference in average record size. Scaled up to hundreds of thousands or millions of records, the space savings will be astronomical!

Now let's try selecting the data back using results-to-grid mode and a simple SELECT * statement. It takes 20 seconds to return - solely because the client still has to retrieve the metadata for 1000+ columns. Even though the columns are still defined as SPARSE, they show up in a SELECT * resultset, and that makes extracting out the non-NULL values pretty difficult...

Time for another new feature - column sets. There's a new column type available for use with sparse columns - an XML COLUMN_SET. This is a column that is only materialized when selected, and will return all the non-NULL sparse columns in a row as an XML BLOB. It will also change the behavior of a SELECT * operation - removing all the sparse columns from the resultset and replacing them with itself, representing all the non-NULL sparse columns. Redefining our TableWithSparseColumns to have an XML COLUMN_SET column called SparseColumns (using the syntax 'SparseColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS'), and re-inserting the same values then gives the following results for a SELECT * operation:

DocID   DocName   DocType   SparseColumns
------- --------- --------- ---------------
1       aaaa      1         NULL
2       bbbb      2         <c0945>46</c0945>
3       cccc      3         <c0334>44</c0334>
4       dddd      4         <c0233>12</c0233><c0234>34</c0234>
5       eeee      4         <c0233>12</c0233><c0234>34</c0234><c0235>46</c0235><c0236>66</c0236>

Pretty cool - and it returns virtually instantaneously (obviously scaling up to hundreds of thousands or millions of rows would take longer due to the time necessary to read the pages into the buffer pool). One downside is that the XML blob only returns the column name and value - not the datatype - but if your application can cope with that then not having to wade through hundreds (or thousands by RTM) of NULL columns values is great.

Next time I'll discuss the internals of how sparse columns are stored.

Monday, April 28, 2008 3:05:37 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, April 23, 2008

Today's the final day of SQL Connections proper (tomorrow is post-conference workshops). We've really enjoyed the conference (as usual!) and it's been a blast answering everyone's questions. We'll be posting resources over the next week or so (most likely on Kimberly's blog) and we did a video interview with our good friends Richard and Carl (from DotNetRocks) which should be available for download soon.

We spoke with many conference attendees and one request was the ability to give online feedback. While the conference itself doesn't have that ability (yet), we'd love to hear any feedback you have about the conference - the good, the bad, and the ugly! As we're the co-chairs of the SQL conference, all the feedback comes to us anyway, and we'll pass on any non-SQL feedback to the Connections organizers.

So - send feedback email to me with anything you'd like to share. If it's about a specific session/speaker/topic, please try to give as much detail as you can. All feedback will be in strictest confidence.

Thanks!

Wednesday, April 23, 2008 12:58:27 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, April 19, 2008

Many times I've been asked to do a blog post about creating Agent alerts, and given that today I demo'd it as part of our Accidental DBA workshop at Connections, it seemed a good time to do the blog post too!

I demo this in the context of alerting a DBA when an 823 or 824 IO error occurs. One of my early blog posts (see here) explains what these are, as well as providing a corrupt database that you can use to see these errors happening.

The idea is that I want to know as soon as an IO error occurs so I can start recovering and take preventative action to stop it happening again. I don't want to rely on users telling me when a query hits an IO error, and I don't want to have to scan the SQL error logs to find them. So I'm going to create an alert.

The first step is to fire up Management Studio and make sure SQL Server Agent is running. Next we need to make there's actually an Operator defined - so the new alert has someone to actually alert! - so we'll use the New Operator wizard (see below for how to get there).

In the New Operator Wizard that appears, I've created an operator named 'SysAdmin'. There are three Notification Options you can use - email, net send, and pager. I've setup SysAdmin to use net send to my local machine. You need to make sure the Messenger service is enabled otherwise net send will not work. Also, be aware the net sends will NOT work unless the machine has a network connection - even if the net send source and destination are the same machine! Given the various issues with net send, it's better to use email or pager alerts - but for the purposes of this blog post its the easiest option.

Now let's create the new alert - using the New Alert wizard (see the below for how to get there).

In the New Alert Wizard that appears, I've created an alert named 'IO Errors' for all severity 24 errors on all databases. Below is a portion of the General tab of the wizard showing these settings:

I also need to specify what happens. In the Response tab of the wizard I've set the SysAdmin operator to be notified using net send. Again, see below.

In the Options tab I've checked the box to include the error text in the net send.

Now let's test it. Using the database called 'broken' that I provide as an example (see here), I'll force an IO error to occur. In my query window I get:

SELECT * from broken..brokentable;
GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7232c940; actual: 0x720e4940). It occurred during a read of page (1:143) in database ID 10 at offset 0x0000000011e000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\broken.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

And a few seconds later I get the net send:

Pretty cool!

You can also use the WMI Provider to do this - see Creating a SQL Server Agent Alert by Using the WMI Provider for Server Events.

Saturday, April 19, 2008 7:05:23 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, April 17, 2008

It's been quite a while since I wrote a blog post on a flight but I happened to be playing with a CTP-6 VPC on the way down to Orlando and thought I'd do the first in a few posts on a cool feature of SQL Server 2008 - Sparse Columns.
 
One problem in database schema design is how to store heterogenous data types with many properties efficiently. Here's an example (contrived) scenario - consider a document repository that can store up to 50 different kinds of documents, with a a few common properties (like document type, document name, last modification time) and 20 totally different attributes for each column type. The document repository needs to store the common fields, plus the per-document-type attributes for each document.

[Edit: I had several comments - thank you - that this example is too contrived and that normalization could give some benefit to it - ok- the example is *very* simple to illustrate the concepts involved. Now imagine the same example with thousands of document types and thousands of user-defined attributes per document - normalization is no longer applicable. This is the Sharepoint Server scenario that drives this feature.]
 
What are some of the options for this in SQL Server 2005?

Single table
Define a table with 1000+ columns to have allow all the document types to be stored in a single table. The first 20 columns could be for the common properties and then each subsequent set of columns stores the attributes for a single document type (e.g. columns 20-39 store the attributes for documents of type 1, columns 40-59 store the attributes for documents of type 2, and so on).

Comments on this architecture:

  • There is a huge amount of wasted space for each table row - as only a maximum of 40 columns (common fields plus per-document-type attributes) will have values in each 1000+ column record. Even if all the attributes are stored as nullable variable length columns (e.g. SQLVARIANT) then there's still a minimum of 1-bit of storage required per column (for the null bitmap entry). There's also the CPU overhead of having to crack the SQLVARIANT columns, and the storage overhead of having them in the first place.
  • The 8060 byte record size limit effectively limits the number and datatypes of columns that can be defined per record. You could easily have 1000 4-byte INT columns per record, for instance, but combinations of wider data-types becomes tricky.
  • There is a 1024 column limit per table. This puts a hard stop on the number of document types we can stores in our example.
  • Efficient indexes are impossible. Even if the index is defined to only contain the columns representing the attributes for a particular document type, they would still have a row for every document, regardless of the document type.

Vertically partition
Vertically partition the document repository such that each document type has its own table.

Comments on this architecture:

  • This allows an effectively unlimited number of document types to be supported, with a larger number of attributes for each document type, and with much more efficient indexes.
  • However, any operation that needs to operate over the entire document repository has to join all the tables (e.g. select all documents with a last modification date in the last 7 days)

Property bag
Use a table where the per-document-type properties are stored in a LOB value (somtimes called a property bag).

Comments on this architecture:

  • This also allows an effectively unlimited number of document types and attributes.
  • Accessing attributes is very slow and expensive, requiring reading into an offset inside the LOB column.
  • Indexing over attributes is going to be very expensive - requiring a computed column (to extract the attribute from the LOB value) for each attribute to be indexed

XML
Define an XML column which effectively acts as a property bag.

Comments on this architecture:

  • This also allows an effectively unlimited number of document types and attributes.
  • Accessing attributes is faster than using a LOB property bag but slower than regular columns, and requires XQuery operations.
  • Indexing is possible using XML indexes, but they're very space inefficient (a primary XML index shreds the entire XML column, and the XML column remains)

Basically - there's no good way to do it in SQL Server 2005 or before.
 
Enter SQL Server 2008 with sparse columns.
 
A sparse column is a nullable column that's declared with the SPARSE attribute. This means that when the value is null, it takes zero space - not even the single bit for the null bitmap is required - and this works even for fixed-length columns! The trade-off is that non-null sparse columns take an extra 4-bytes of space over regular columns. Here's an example for INT columns:

  • Non-null regular INT column: 4 bytes
  • Null regular INT column: 4 bytes
  • Non-null sparse INT column: 8 bytes
  • Null regular INT column: 0 bytes

Books Online has a table showing the potential space savings for the various data types using sparse columns. You can get to this table by looking for 'Sparse Columns' in the SQL Server 2008 Books Online index.
 
In my document repository example above, declaring each of the per-document type attributes as SPARSE would allow each record to only store the attributes needed for the document it represents, rather than every defined column - a huge space saving!
 
But what about the limit on the number of columns? Well, SQL Server 2008 is also bumping the number of columns per table to 30000 (see Kimberly's blog post from yesterday) - although not until the next CTP is available.
 
But how would indexing work? SQL Server 2008 has another new feature that helps here - filtered indexes. Conor's blogged about these recently (see here).
 
Over the next few weeks I'll post more on sparse columns - using them, comparisons with other schemas, and anything else I can come up with.

Thursday, April 17, 2008 5:46:03 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Wednesday, April 16, 2008

Here's a quickie just before we head off to SQL Connections in Orlando (see here for all out pre/post cons and sessions).

On one of the internal MS forums was the question - how can I tell through T-SQL the last time SQL Server restarted (i.e. the current 'uptime')? The answer relies on the fact that all the background tasks that start when SQL Server starts must record a 'login time'.

You can get this from:

SELECT [login_time] FROM sysprocesses WHERE spid = 1;
GO

Or more simply:

SELECT MIN ([login_time]) FROM sysprocesses;
GO

Pretty neat trick!

As with the last few conferences, I'll try to blog every day during SQL Connections under the Conference Questions Pot-Pourri category.

Hope to see a bunch of you there!

PS Some people have suggested that checking the creation date of tempdb will also do the trick. That's not a *guaranteed* method as PSS could have used T3609 to recover tempdb instead of recreating it (if they're troubleshooting some tempdb issue). In that case the creation date of tempdb will *not* be the time the server started. Checking the time in sysprocesses is the only infallible method.

Wednesday, April 16, 2008 5:27:19 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]  | 
Tuesday, April 08, 2008

Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?

No such code exists as far as I know - until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland so I started playing around and this morning I completed the code.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the DIFF map page using DBCC PAGE
      Interpret the DIFF bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9

I've tested it with databases with multiple files and up to around 700GB for a single file size. There's the potential for an issue with file sizes of 4TB and beyond (where PFS intervals and GAM intervals map to the same extent every 4TB or so, but I think it'll be ok and the position of the DIFF map in the extent won't change - if someone could test it with a 4+TB file I'd be grateful). It's been tested on SQL Server 2005 and 2008. It will not work on SQL Server 2000 - I'll do a 2000 version sometime soon.

Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files - 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.

Anyway - here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!

/*============================================================================
   
File: SQLskillsDIFForFULL.sql

   Summary: This script creates a system-wide SP SQLskillsDIFForFILL that
   
works out what percentage of a database has changed since the
   
previous full database backup.

   Date: April 2008

   SQL Server Versions:
         
10.0.1300.13 (SS2008 February CTP - CTP-6)
         
9.00.3054.00 (SS2005 SP2)
------------------------------------------------------------------------------
   
Copyright (C) 2008 Paul S. Randal, SQLskills.com
   
All rights reserved.

   For more scripts and sample code, check out 
      
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   
republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
   
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
   
TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   
PARTICULAR PURPOSE.

============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   
DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--

CREATE FUNCTION SQLskillsConvertToExtents (
   
@extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   
DECLARE @extentTotal   INT;
   
DECLARE @colon         INT;
   
DECLARE @firstExtent   INT;
   
DECLARE @secondExtent  INT;

   SET @extentTotal = 0;
   
SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      
SET @extentTotal = 1;
   
ELSE
      
-- We're in the multi-extent case
      --
      BEGIN
      
SET @firstExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @colon = CHARINDEX (':', @extents, @colon + 1);
      
SET @secondExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @extentTotal = (@secondExtent - @firstExtent) / 8;
   
END

   RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsDIFForFULL') IS NOT NULL
   
DROP PROCEDURE sp_SQLskillsDIFForFULL;
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be round 20.
--
CREATE PROCEDURE sp_SQLskillsDIFForFULL (
   
@dbName VARCHAR (128))
AS
BEGIN
   
SET NOCOUNT ON;

   -- Create the temp table
   
--
   
IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      
[ParentObject] VARCHAR (100),
      
[Object]       VARCHAR (100),
      
[Field]        VARCHAR (100),
      
[VALUE]        VARCHAR (100));

   DECLARE @fileID         INT;
   
DECLARE @fileSizePages  INT;
   
DECLARE @extentID       INT;
   
DECLARE @pageID         INT;
   
DECLARE @DIFFTotal      INT;
   
DECLARE @sizeTotal      INT;
   
DECLARE @total          INT;
   
DECLARE @dbccPageString VARCHAR (200);

   SELECT @DIFFtotal = 0;
   
SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   
--
   
DECLARE files CURSOR FOR
      
SELECT [file_id], [size] FROM master.sys.master_files
      
WHERE [type_desc] = 'ROWS'
      
AND [state_desc] = 'ONLINE'
      
AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   
BEGIN
      
SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      
-- pages - we need to convert to extents
      
--
      
SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      
BEGIN
         
-- There may be an issue with the DIFF map page position
         
-- on the four extents where PFS pages and GAM pages live
         
-- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         
-- but I think we'll be ok.
         
-- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         
-- GAM extents are every 511232 pages
         
--
         
SELECT @pageID = @extentID + 6;

         -- Build the dynamic SQL
         
--
         
SELECT @dbccPageString = 'DBCC PAGE ('
            
+ @dbName + ', '
            
+ CAST (@fileID AS VARCHAR) + ', '
            
+ CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         
--
         
DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         
INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         
--
         
SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         
FROM msdb.dbo.SQLskillsDBCCPage
            
WHERE [VALUE] = '    CHANGED'
            
AND [ParentObject] LIKE 'DIFF_MAP%';

         SET @DIFFtotal = @DIFFtotal + @total;

         -- Move to the next GAM extent
         
SET @extentID = @extentID + 511232;
      
END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   
END;

   -- Clean up
   
--
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   
CLOSE files;
   
DEALLOCATE files;

   -- Output the results
   
--
   
SELECT
      
@sizeTotal AS [Total Extents],
      
@DIFFtotal AS [Changed Extents],
      
ROUND (
         
(CONVERT (FLOAT, @DIFFtotal) /
         
CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsDIFForFULL;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Tuesday, April 08, 2008 8:49:20 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Monday, April 07, 2008

Now we're back from Iceland and I have a week to catch up with some content development before the MVP Summit next week and then SQL Connections the following week.

One of the things I struggled with earlier in the year while writing a SQL Server 2008 training course for Microsoft was how to get FILESTREAM to work with partitioning. There wasn't (and still isn't) any information in Books Online that I could find so I had to play around to figure it out.

I should say that the CTP-6/February CTP version of Books Online *does* have a bunch of code examples around using FILESTREAM, so I'm not going to write a blog post about that. Look in the Getting Started with FILESTREAM Storage section (or paste this link into the Books Online URL: window).

Back to partitioning - first I created a test database:

CREATE DATABASE FileStreamTestDB
ON PRIMARY
   
(NAME = FileStreamTestDB_data,
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
   
(NAME = FileStreamTestDBDocuments,
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\Documents')
LOG ON
   
(NAME = 'FileStreamTestDB_log',
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_log.ldf');
GO

Then I tried the obvious, knowing that I can't partition on the ROWGUIDCOL:

CREATE PARTITION FUNCTION MyPartFunction (INT) AS RANGE RIGHT FOR VALUES (1000, 2000);

CREATE PARTITION SCHEME MyPartScheme AS PARTITION MyPartFunction ALL TO ([PRIMARY]);

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   
Customer INT,
   
[
Name] VARCHAR (25),
   
[
Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer);
GO

Partition scheme 'MyPartScheme' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'MyPartScheme'.
Msg 1921, Level 16, State 4, Line 8
Invalid filegroup 'default' specified.

Eventually I worked out that you have to define a separate partitioning scheme just for FILESTREAM data. This is because the regular data is stored on non-FILESTREAM filegroups, so trying to use the regular partitioning scheme for FILESTREAM would mean telling the Engine to store the FILESTREAM data in non-FILESTREAM filegroups. Clearly a non-starter. Ok - try again with a separate partitioning scheme (the prior MyPartFunction partition function and MyPartScheme partition scheme already exist now remember):

CREATE PARTITION SCHEME MyFSPartScheme AS PARTITION MyPartFunction ALL TO ([FileStreamFileGroup]);

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   
Customer INT,
   
[Name] VARCHAR (25),
   
[Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

Partition scheme 'MyFSPartScheme' has been created successfully. 'FileStreamFileGroup' is marked as the next used filegroup in partition scheme 'MyFSPartScheme'.
Msg 1908, Level 16, State 1, Line 1
Column 'Customer' is partitioning column of the index 'UQ__FileStreamTest__03317E3D'. Partition columns for a unique index must be a subset of the index key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

Hmm - I can't partition on Customer because there's already a unique index over TestId - UNLESS I specifically set the unique index on TestId to be non-partitioned by setting a filegroup for it:

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
   Customer INT,
   
[Name] VARCHAR (25),
   
[Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

That works! Now - the BIG issue with this setup is that switching partitions won't work while the unaligned index is enabled. So how to disable it? First we need to find out what it's called:

SELECT * FROM sys.indexes WHERE [object_id] = OBJECT_ID ('FileStreamTest');
GO

And then disable it:

ALTER INDEX UQ__FileStre__8CC33161060DEAE8 ON FileStreamTest DISABLE;

Now you can do partition switching. Here's the catch - to re-enable the index you need to REBUILD it - which is a size of data operation! The upshot of all this is that partitioning can be made to work with FILESTREAM data but partition switching is no longer a metadata-only operation.

Hopefully this will be addressed for V2.

Monday, April 07, 2008 3:07:33 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, April 01, 2008

 

(And this isn't an April Fool...) I'm very pleased to announce that I've been made a SQL Server MVP for 2008. For the eight years or so before leaving the SQL team last August, I was involved a lot with the SQL Server MVPs. It's going to be *really* interesting being on the 'other side of the fence' in the MVP community and be part of the group providing the product feedback instead of the group receiving the product feedback!

As the MVP award is based on community participation, I have to thank all of those who read my blog posts, and those who post questions on the various forums and websites I post on - keep'em coming!

Thanks!

Tuesday, April 01, 2008 8:08:22 AM (Pacific Standard Time, UTC-08:00)  #    Comments [9]  | 
Monday, March 31, 2008

(Redmond, WA: For immediate release worldwide)

Today, in a surprise development that has stunned industry analysts, SQLskills.com announced a new technology for DBAs that will help in the never-ending battle against human-error and unforeseen disasters. The patent-pending Time-Setback technology allows DBAs of SQL Server to literally rewind time and avoid disasters before they happen.

Renowned SQL expert Kimberly Tripp said in an interview earlier today: "This will be a real boost for harried DBAs. All this time I've been going on and on and on about how DBAs should have a comprehensive backup strategy to cope with disasters. Now they can just forget all of that, throw caution to the wind, and rely on a Time-Setback device!"

Asked how the R&D department developed the technology, a spokesman for the company said "We got the idea after reading Harry Potter and The Prisoner of Azkaban, where Hermione is given a Time-Turner device from Professor Dumbledore. We figured there had to be some scientific basis for it, just like all those books that explain how Star Trek and the X-Files are based on real physics too. So, we had a crack at creating it and it worked! I'm not sure the color's quite right though. Maybe we'll change that in V2. Anyway, cool eh?"

A further disclosure, from a major software company, explained that it is in talks with SQLskills.com to purchase almost a thousand of the devices to hand out to developers to "ensure we ship this year and don't have to change the name". The spokesman wouldn't name the product when pressed.

The device will launch on April 1st, 2008, and will be available for immediate delivery. Although the company has only manufactured 4 of the devices, it will use one of them to do just-in-time manufacturing as orders stream in. For further details, please send email to: AprilFools@SQLskills.com

(Redmond, WA: For immediate release worldwide)

Monday, March 31, 2008 1:55:52 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

Fixed now - thanks so much!

Folks - a couple of people have setup their systems to refresh *all* our category feeds every five minutes - this is putting an undue load on our server (and screws up our server stats). I know I post a lot but not *that* often. I've WHOIS'd the IP addresses - one person is in Brazil and another in Korea - I can't limit how often you do this but I can ban your ISP's IP addresses if you continue to put this load on us - which I don't want to do. Please change your refresh rate to be 60 minutes or just subscribe to a whole blog feed rather than all the category feeds.

Thanks!

Monday, March 31, 2008 3:50:53 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Sunday, March 30, 2008

Hot on the heels of a frenzied few weeks of teaching for Microsoft and we're off again - this time to Iceland! I've been wanting to come here for as long as I can remember but Kimberly's already been here 3 times before. We're teaching some seminars in conjunction with Miracle Iceland next week but decided to come a few days early to hang out in Reykjavik and see some sights.

Today we headed out of town with our friends Gunnar Bjarnason (Miracle Iceland's chief) and his wife Thorun with the aim of getting to the top of the Skjaldbreidur volcano (dormant of course!). First up we headed through the Thingvellir National Park to check out the fault line. Iceland sits on the boundary between the American and Eurasian tectonic plates - hence all the volcanic activity - checkout this link to see how Iceland sits in relation to the plates. In the park you can actually see where the plates come together as the fault line is a very obvious cleft lined with basalt formations. At one point there's a little bridge across the fault line and the strip-lake that formed in the cleft - not many places you can stand on a plate boundary. Apparently there's a tunnel that you can scuba-dive through to the nearby Lake Thingvallavatn, and the lake has amazing visibility for diving as it's fed almost exclusively from springs (see here). This is also where the oldest parliament in the world was established in 930AD. The Icelanders would meet here for a few weeks every year and the new laws would be memorized as there was a shortage of writing materials.

Next we headed out on road 52 for about 20km into the snowy wilderness, until we came to a set of power lines heading to one of the aluminum smelters on the island. Electricity is pretty cheap here (because it can be generated from steam from the geothermal activity) and so it's actually more economic to ship bauxite (the mineral that aluminum is smelted from) from Australia to here to be smelted and then shipped back again. Electricity here must be really cheap! Anyway, we followed the power lines across country along the side of the volcano and then Gunnar decided 'here!' and we simply turned and drove directly up the mountain in the snow. We got about 880m up before we finally got bogged down 200m from the top, even with the balloon tires down to 2 psi so we very carefully turned round and sat admiring the stunning view over lunch before heading back down.

Due to the remoteness of the area and the possibility of things going awry it's essential to have multiple radios and other emergency gear. We had no bother though, mostly due to Gunnar's excellent off-road driving skills, and the rugged Land Cruiser we were driving. We had towed along a snow-mobile part of the way with the idea of racing to the top of the mountain but the -7C temperature with *amazing* wind-chill killed that idea. Once back down by the lake we did a spot of bird-watching to add some more species to my life-list (Teal, Barrow's Goldeneye, Goosander). Kimberly and I dozed through our jet-lag on the drive back to Reykjavik, hitting the hotel 8 hours after we left.

Here's a couple of photos - click for bigger images.

 Gunnar and Paul unhooking the snowmobile.

 The view from up the mountain looking at other volcanoes.

Sunday, March 30, 2008 4:05:03 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, March 21, 2008

This is a really interesting question that came up in the Microsoft Certified Architect class I'm teaching at present - if a database has torn-page protection enabled, and page checksums are enabled, is all the existing torn-page detection lost?

This is an important question, because enabling page checksums doesn't suddenly make all allocated pages be protected by page checksums (it's not until a page is read into the buffer pool, modified, and then written back to disk, that it gets a page checksum). If all the existing torn-page protection is discarded when page checksums are enabled, then the pages would be unprotected until they got page checksums on. I couldn't remember the answer, so I experimented!

My idea was to create a database with torn-page protection, create a table with a simulated torn-page in it, then enable page checksums and see if the torn-page was still reported.

-- Create the test database
USE master;
GO
CREATE DATABASE ChecksumTest;
GO
USE ChecksumTest;
GO

-- Explicitly set the database to have torn-page detection
ALTER DATABASE ChecksumTest SET PAGE_VERIFY TORN_PAGE_DETECTION;
GO

-- Create a test table and insert a row.
CREATE TABLE BrokenTable (c1 INT, c2 CHAR (1000));
INSERT INTO BrokenTable VALUES (1, 'a');
GO

-- Ensure the page is written to disk and then tossed from the buffer pool
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO

Now I'm going to examine the page. There are two bits in the page header that specify whether the page is protected by torn-page detection or with a page checksum. Specifically, the m_flagBits field will have 0x100 set if the page is encoded for torn-page protection, and 0x200 set if the page has a page-checksum stored on it, and the page has not been modified (i.e. the checksum is stillvalid). You should not see the 0x100 bit set as torn-page encoding is removed when the page is read into the buffer pool - UNLESS the page IS actually torn, in which case the encoding is NOT removed.

sp_allocationmetadata 'BrokenTable';
GO
DBCC TRACEON (3604);
GO
DBCC PAGE ('ChecksumTest', 1, 143, 3);
GO

<snip>

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 67     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042318848                                
Metadata: PartitionId = 72057594038321152                                 Metadata: IndexId = 0
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 1008                       m_slotCnt = 2                        m_freeCnt = 6070
m_freeData = 2118                    m_reservedCnt = 0                    m_lsn = (28:183:2)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 770
      

<snip>     

In this case the torn-page encoding has been removed, and the page is fine. Once I've corrupted the page on disk, it's tricky to be able to see it with DBCC PAGE. I managed to catch it once and saw the following:

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8100
m_objId (AllocUnitId.idObj) = 67     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042318848               &nb