Monday, January 28, 2008

This kind of follows on from my previous post about making sure you have character column lengths that can handle data from different countries (e.g. city names that may be longer in one country than another). A question on the forums today asked what info there is available to help in designing a global-ready database.

It turns out that there's a wealth of information right there under your nose - type in 'globalization' in the Index of Books Online. It'll get you to a section titled 'International Considerations for SQL Server' that has a link to a sub-section for every component of SQL Server! Very impressive. For instance, the one for the Database Engine has everything you need (I've made these links to the latest online BOL entries on MSDN):

Check it out and save yourself some pain when your database/application suddenly needs to support customers outside your home country.

Monday, January 28, 2008 12:33:25 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Sunday, January 27, 2008

Microsoft has announced that the RTM date of SQL Server 2008 has slipped out to Q3 of this year. See this official blog post here. The official launch will still go ahead as planned on February 27th.

While some people may see this as disappointing, I don't see many large customers going straight into SS2008 when it comes out so I, for one, am glad they're taking the time to ensure a high-quality release.

Sunday, January 27, 2008 5:59:02 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Start of the second week in China. We flew up from Shanghai to Beijing yesterday, and just in time it seems. As we flew out of Shanghai it was being enveloped in a snowstorm. Beijing's much colder than Shanghai though - being further north and inland quite a way compared to Shanghai.

As I said a few days ago, the last time I was here I stayed in the Jiu Hua International Conference & Exhibition Center Hotel, which has to be one of the worst hotel experiences ever. I remember going to the buffet breakfast one day and picking up a pork bun to find some dead moths stuck to the bottom - that was my breakfast appetite gone for the day. Whenever I had laundry done they demanded cash before they'd give it back to me - no such thing as charging to the room. This time we're in the Grand Hyatt Beijing which is right in the middle of the city and is far more pleasant. Our room's not so high as in Shanghai as the height of buildings in the old part of Beijing is limited. Here's the view from our room this afternoon. (Click on it for a bigger version)

It's looking out on the Peking Union Medical College Hospital, formerly the Yuwang Residence (what our guide said was a house/palace for a prince). The hotel's just a few blocks from the Forbidden City and Tiananmen Square, and we can see into the Forbidden City (to the left of the view above). We're going to do a tour of these on Friday, along with the stunning Temple of Heaven, before catching the late flight back to Vancouver and on to Seattle.

Today was a free day for us so we decided to take a tour out to the Great Wall - something we were both extremely excited to do and somewhere I've wanted to go since I was a child. We decided to go to the Mutianyu section about an hour outside Beijing as we'd been told it's the best section to see - and boy, where they right! The Wall is just *stunning*. Pictures or words don't do it justice. If you're ever in the area, I urge you to make time to go if it's the only thing you see in Beijing. I really think it has to be one of the most (if not *the* most) incredible things I've ever seen. Wow.

Well, now it's dinner time so I'll leave you with a shot of Kimberly and I on the Great Wall. (Again, click for a larger version.) The wall actually continues on over to the mountains at the top left of the picture - apparently some of the sections of wall are so steep and remote that they haven't been visited for a *long* time. It's amazing to think that all the rocks - some weighing up to a ton - were carried up the mountains by hand...

 

Sunday, January 27, 2008 1:36:01 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, January 26, 2008

It seems like all I've been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.

Rebuilding an index will update statistics with the equivalent of a full scan - doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX ... REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while its doing the rebuild.

Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX ... REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.

The problem I've been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you're left with sampled statistics. You've wasted resources doing the sampled scan AND you've lost the 'free' full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don't lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what's the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don't get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

Hope this helps.

Saturday, January 26, 2008 4:01:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [8]  | 
Friday, January 25, 2008

Well it's the end of our first week in China - tomorrow we fly to Beijing for another week of teaching. Hopefully we'll get to do some sightseeing on Sunday - the last time I was in Beijing (for TechEd '06) the conference hotel was about 25 miles outside the city centre - so I haven't really been to Beijing yet.

During this week I've been playing with FILESTREAM for some demos I'm writing for a Microsoft class I'll be teaching when I get back to Redmond. The class is about SQL Server 2008 for DBAs and the attendees will be a bunch of Microsoft Field personnel and SQL Server MVPs. Anyway, as I was playing, I realized that I hadn't blogged anything about FILESTREAM yet, so this is the start of a series of posts about the feature. First up - what is it?

One problem that SQL Server users face is how to store related structured and unstructured data (BLOBs) together while:

  • Providing a way to keep the data in sync (transactionally consistent)
  • Providing fast streaming access to the BLOBs
  • Keeping costs low
  • Enabling scalability
  • Providing ease of management

There's also been a strong desire for a data type that supports BLOB values greate than 2GB for many years.

Before SQL Server 2008, the solutions centered around:

  1. Storing the BLOBs in the file system
    • Advantages: low cost per GB; great streaming performance
    • Disadvantages: difficult to integrate with structured data; complex app development/deployment
  2. Storing the BLOBs on a dedicated BLOB store
    • Advantages: good scale/expandability; cost decreases as scale increases
    • Disadvantages: difficult to integrate with structured data; complex app development/deployment; streaming support/performance is solution dependent
  3. Storing the BLOBs in a database
    • Advantages: integrated management; BLOBs integrated with structured data; easy app development/deployment
    • Disadvantages: high cost per GB; poor streaming performance; 2GB size limit per BLOB

Enter FILESTREAM. It provides the following:

  • BLOB data is stored in the file system but rooted in the database (in the table of which it is part)
  • BLOB data is kept transactionally consistent with structured data
  • BLOB data is accessible through T-SQL and the NTFS Streaming APIs - with great streaming performance
  • BLOB size is limited only by the NTFS volume size
  • Manageability is integrated into SQL Server

Sounds pretty good eh? Well, it is - mostly. There are a few drawbacks with v1 however:

  • Database mirroring cannot be configured on databases with FILESTEAM data
  • Database snapshots don't snapshot FILESTEAM data
  • FILESTREAM data can't be natively encrypted by SQL Server

Personally, I think the first of these could be a major barrier to adoption in the enterprise - hopefully this restriction will be lifted in v2.

So - there's a taster. In the next few posts I'll detail how to enable FILESTREAM and create FILESTREAM data.

Friday, January 25, 2008 1:51:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, January 21, 2008

Nothing technical in this blog post for a change, just some info about the first trip of the year for us. We're in China for two weeks, teaching classes at the Microsoft campuses in Shanghai and Beijing. We flew in Sunday afternoon on a direct flight from Vancouver, Canada which was just wonderful. When I came to Shanghai in September 2006 for TechEd I flew to Tokyo first and then on to Shanghai but the direct flight is great. We flew Air Canada and their business class beats the crap out of United's, which we usually fly - much bigger seats with much more legroom.

Last time I was here I stayed at the Le Meridien She Shan which was about 20 miles outside the city center. This time we booked a room at the Grand Hyatt Shanghai which is right in the middle of the city. The hotel is amazing - it starts on the 53rd floor of the Jin Mao Tower in Pudong - the new side of Shanghai. Our room is on the 77th floor looking out over the Huangpu River and over towards The Bund. Here's the view from our room this morning - pretty stunning! (Click on it for a bigger version)

The space-age looking tower on the left is the Oriental Pearl TV Tower - now a famous Shanghai landmark.

Today was our only free day in Shanghai so we decided to sleep in, have breakfast, and then take a private half-day tour of the city. We had a great guide (Marco) who took us round a few cool places and was a fountain of knowledge about Shanghai and Chinese culture. First up was the 400 year old Yuyuan Gardens in the old part of the city (built by a government official as a present to his elderly parents) and then on to the Jade Buddha Temple (built in 1882 to house two solid jade statues brought back by a monk from Burma) - quite different to the Hindu temples we'd seen on our dive trip to Bali at the end of 2006.

Later in the week I've got some good blog posts lined up on new 2008 features. For now, it's dinner time so I'll leave you with a shot of Kimberly and I just before heading into the Yuyuan Garden. (Again, click for a larger version)

 

Monday, January 21, 2008 1:23:25 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, January 17, 2008

One of the cool features of SQL Server 2008 for developers is spatial data support. There have been some great posts recently about using spatial (see Simon Sabin's blog here), which is all developer stuff - but what I'm interested in are the implications of spatial support for DBAs, and they are focused on spatial indexes

Note: all images in this post are taken from November CTP Books Online

There are two kinds of spatial data that 2008 supports - planar (i.e. points, lines, polygons on a single 2-D plane) and geodetic (i.e. points, lines, polygons on a geodetic ellipsoid - for example, the Earth). These are presented in SQL as geometry and geography data respectively. A common operation that's performed on spatial data is comparing two spatial values to see if they intersect at all. Now, this is a complicated calculation, which gets more computationally expensive as the complexity of the spatial values increase. Given a problem of 'which spatial values in this table does this spatial value X intersect with', it would be great to have some way of quickly pruning out spatial values in the table that cannot possibly intersect with X, and so avoid doing the expensive calculation for them. Enter spatial indexes.

Here's the basic idea behind a spatial index:

  • A plane is broken up into a grid of cells.
  • Each spatial value is evaluated to see which cells in the grid it intersects with
  • The list of cells is stored along with the primary key of the table row that the spatial value is part of
  • Comparing two spatial values for intersection is a matter of comparing the list of grid cells - if there are no matching cells, the spatial values do not intersect, and there's no need to do the expensive intersection calculation

In practice its a bit more complicated. For planar data (i.e. the geometry data type), you need to define a bounding box (i.e. 4 corner points that define a rectangle of space in which you're interested on the 2-D plane). That bounding box on the plane will be broken down into a grid of cells. The top-level grid can be up to 16x16, giving 256 cells. The next level of granularity breaks each of those top-level grid cells into a further grid, again up to 16x16. So now there could be (16x16) x (16x16) cells in the grid - or 65536 cells. This obviously allows a more exact description of a spatial value in the list of cells. And so on and so on. There are actually 4 levels of grid that the bounding box is broken up into - and each can be 16x16, for a possible total of 168 or 4 billion cells. The picture below illustrates this with a grid size of 4x4 at each level.

The bounding box and the size of the grid at each level are specified when the spatial index is created, as well as the maximum number of matching grid cells to store in the spatial index per spatial value - to a max of 8192. Once the bounding box has been decomposed into the various levels of grid, each value in the spatial column is evaluated against the grid. The value is first decomposed against the first level grid. If the number of cells it matches is less than the max per spatial value, the decomposition then moves to the second level grid. This decomposition continues until the maximum number of matching grid cells is reached. If the max is reached while processing a deeper level for a cell, (e.g. in the middle of processing the 2nd level grid of 4x4 for cell #13 in the 1st level), the deeper level matches are thrown away and only the coarser granularity matching cell is stored (e.g. continuing that example, the 2nd level grid matches are discarded and only cell #13 in the 1st level will be stored). The picture below helps to illustrate this.

So, each geometric spatial value is approximated in the spatial index by a list of matching cells in the defined bounding box. As there is a limit to the number of matching cells that can be stored in the approximation, it is an optimistic representation. This means that if two values are compared using the approximations, there will be no false negatives, as the approximations map a larger space than the actual spatial values. There can, however, be false positives. A false (or real) positive means the spatial values need to then be compared using the complex, computationally expensive intersection algorithm using the actual spatial values. So again, the spatial index serves as a way of pruning out the need to run the expensive algorithm.

The algorithm is very similar for geodetic data (i.e. the geography data type), however there's no bounding box. Instead, the entire geodetic ellipsoid is projected onto a 2-D plane and then the grid decomposition algorithm is applied to that plane in exactly the same way as for planar data. The picture below describes how the projection is done.

You may have already realized that the effectiveness of the spatial index in pruning is directly proportional to how exactly the approximations in the index actually describe the spatial values. In other words, the higher the number of grid cells at each level, and the higher the number of grid cell matches that are stored per spatial value in the index, the better the index is at pruning. More exact approximations require storing more matching grid cells at deeper granularities - i.e. taking MORE SPACE. Creating a more exact spatial index takes more space.

With all that in mind, the interesting thing for DBAs here is that there's a trade-off between CPU use to do the real intersection algorithm and spatial index size to use in pruning calls to the algorithm. It's too early to know what best practices there are - but I'll blog them as I here about them.

Thursday, January 17, 2008 11:59:06 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, January 16, 2008

Despite the fact that I was in the Storage Engine, and there's always been humorous rivalry between the Storage Engine team and the Relational Engine (a.k.a. the Query Processor) team, I did actually get along with some of the QP guys :-)

One of my good friends, Conor Cunningham, has been wanting to get back into blogging and we're extremely pleased that he's now blogging on SQLskills.com - see his new blog at http://www.SQLskills.com/blogs/conor. Before Conor left Microsoft last year to head home to Texas (the Seattle rain gets the non-natives or rain-hardened Scots eventually), he was the Development Lead of the Query Optimizer team and influenced much of the Query Processor architecture. Conor's probably forgotten more about how the Optimizer works than I'll ever know! :-)

So - I for one will be following his blog avidly.

Welcome Conor!

Wednesday, January 16, 2008 9:48:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, January 14, 2008

Back in October 2007 I blogged about partition-level lock escalation in SQL Server 2008 (see here) and I promised to do a follow-up once CTP-5 came out with syntax etc. So here it is.

A brief recap - lock escalation in SQL Server 2005 and before only allowed table-level lock escalation. If you have a partitioned table with queries going against different partitions, then table-level escalation is a pain because the whole table is suddenly locked and concurrent queries against distinct partitions can't run. SQL Server 2008 gives the ability to escalate to a parttition lock, which won't affect the queries on the other partitions.

The lock escalation policy can only be set with ALTER TABLE after a table has been created, and the policy can only be set at the table level. The syntax is

ALTER TABLE TableName SET (LOCK_ESCALATION = TABLE | AUTO | DISABLE);

The options mean:

  • TABLE - escalation will always be to the table level. This is the default.
  • AUTO - escalation will be to the partition level if the table is partitioned; otherwise it will be to the table level
  • DISABLE - escalation will be disabled. This does not guarantee that it will NEVER occur - there are some cases where it is necessary (Books Online gives the example of scanning a heap in the SERIALIZABLE isolation level)

The only way I could find to check what the escalation policy for a table is set to is to use the sys.tables catalog view:

SELECT lock_escalation_desc FROM sys.mytables WHERE name = 'TableName';

Let's try it out. Here's a script that creates a database with an example table with 3 partitions. The partition ranges are negative infinity to 7999, 8000 to 15999, 16000 to positive infinity.

CREATE DATABASE LockEscalationTest;
GO

USE LockEscalationTest;
GO

-- Create three partitions: -7999, 8000-15999, 16000+
CREATE PARTITION FUNCTION MyPartitionFunction (INT) AS RANGE RIGHT FOR VALUES (8000, 16000);
GO

CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionFunction
ALL TO ([PRIMARY]);
GO

-- Create a partitioned table
CREATE TABLE MyPartitionedTable (c1 INT);
GO

CREATE CLUSTERED INDEX MPT_Clust ON MyPartitionedTable (c1)
ON MyPartitionScheme (c1);
GO

-- Fill the table
SET NOCOUNT ON;
GO

DECLARE @a INT = 1;
WHILE (@a < 17000)
BEGIN
INSERT INTO MyPartitionedTable VALUES (@a);
SELECT @a = @a + 1;
END;
GO

Now I'm going to explicitly set the escalation to TABLE and start a transaction that should cause lock escalation.

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = TABLE);
GO

BEGIN TRAN
UPDATE
MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

We should be able to see the locks being held:

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    X              LOCK           GRANT

Just as we expected - an X table lock. Trying any query against the table fails now. Now I'll rollback that transaction, set the escalation to partition-level and try again.

ROLLBACK TRAN;
GO

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = AUTO);
GO

BEGIN TRAN
UPDATE
MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable');
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Excellent - the object lock is now IX rather than X, and the X lock is at the partition (HOBT) level for partition 1 (see the bold highlighting to match the partition ID with the lock resource). (For an explanation of HOBTs, see my post Inside The Storage Engine: IAM pages, IAM chains, and allocation units.) So now we should be able to do something with another partition - let's see if we can cause another partition level X lock in another connection:

USE LockEscalationTest;
GO

BEGIN TRAN
UPDATE
MyPartitionedTable set c1 = c1 WHERE c1 > 8100 AND c1 < 15900
;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable'
);
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE'
;
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039107584             X              LOCK           GRANT
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Now we have two partition X locks, for partitions 1 and 2 (as expected - use the color coding above to match up the IDs), plus two table-level IX locks (one for each  connection, as expected). Very cool!

Now I'm going to force a deadlock - by having each connection try to read a row from the other locked partition:

Connection 1:

SELECT * FROM MyPartitionedTable WHERE c1 = 8500;
GO

Conneciton 2:

SELECT * FROM MyPartitionedTable WHERE c1 = 100;
GO

Connection 2 succeeds but on connection 1 we get (as expected):

(local)\SQLDEV01(SQLHAVPC\Administrator): Msg 1205, Level 13, State 18, Line 1
Transaction (Process ID 51) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This illustrates a potential problem with this new mechanism - applications that used to rely on the blocking nature of X table locks may now exhibit deadlocks if partition-level escalation is turned on in production without any testing. In fact, this mode was specifically chosen NOT to be the default setting for new tables because some trial workloads exhibited deadlocks during testing. Don't just turn it on in production without testing - as with any other option or feature.

Monday, January 14, 2008 2:10:24 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Saturday, January 12, 2008

SQLskills.com has released its first ever utility! :-)

After several years of being asked to make available the Dual Database Monitor application (that's present inside the popular AlwaysOn Hands-On Labs we give out), I've spent a bunch of time making it configurable, work outside the VPC environment, and available in an easy kit form for people to use. It comes with easy-to-follow instructions and example SQL scripts to get you going. It supports SQL Server 2005 and SQL Server 2008.

This is an invaluable tool to use when setting up a Database Mirroring partnership to check that mirroring is working ok, without having to hack up your own application to do it. It can also be used to monitor two nodes of a peer-to-peer replication topology (or more using multiple instances of the monitor). It comes in two versions - single-user or unlimited use within a single company.

Check out the DDM webpage here for more details and ordering info.

Saturday, January 12, 2008 11:28:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, January 10, 2008

Just got an email notification of a new whitepaper from the SQL Customer Advisory Team on Database Mirroring and Log Shipping Working Together. It covers:

  • Converting a log shipping setup to a database mirroring partnership
  • Setting up log shipping to a 3rd destination (i.e. warm standby to go with the mirroring hot standby)
  • Swapping the roles of the mirroring server and the log shipping secondary server

It's short at 8 pages but has some good info in it. It's available to download here.

While I was poking about for a better download location, I discovered another new whitepaper (from last year) on database mirroring, this time on Implementing Application Failover with Database Mirroring. Its concerned with how make applications failover gracefully when a mirroring failover happens. Again, its not very log but there's some useful code examples for ADO.NET and JDBC. You can download it here.

I've added both of these to our whitepapers page too. Enjoy!

Thursday, January 10, 2008 6:02:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

There's a new whitepaper on TechNet that I've just come across (even though it was published 6 months ago!) called Predeployment I/O Best Practices. It's really good - discussing the following:

  • Guidelines for determining I/O capacity
  • Disk configuration best practices and common pitfalls
  • Using SQLIO to determine capacity and interpreting its results
  • Using System Monitor to monitor an IO subsystem

You can read it here and I've added a link to our whitepapers page.

While I'm on the subject of I/O, Bob Dorr (A Senior Escalation Engineer in PSS) published a blog post last year that debunks a couple of urban legends around SQL Server's IO, specifically:

  • The myth that SQL Server used one thread per data file
  • The myth that a disk queue length greater than 2 indicates an I/O bottleneck

Check out his blog post and the subsequent discussion here.

Thursday, January 10, 2008 2:38:38 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, January 09, 2008

Quickie today as I'm preparing to speak at the Pacific Northwest SQL Server User Group meeting tonight on the MS Campus here in Redmond (my trademark Detection and Recovery from Database Corruptions talk).

SQL Server 2005 introduced the concept of hot-add memory, to allow for dynamic workload handling. SQL Server 2008 increases these capabilities by adding hot-add CPU as well. 'Hot-add' means being able to plug in a CPU while the machine is running and then reconfigure SQL Server to make use of the CPU ONLINE! (i.e. no application downtime required at all)

There are a few restrictions:

  • You need a 64-bit system that support hot-add CPU (obviously :-))
  • You need Enterprise Edition of SQL Server 2008
  • You need Windows Server Datacenter or Enterprise Edition

When you plug in the new CPU, SQL Server won't automatically start using it. If you think about it, it can't - you may not want that CPU to be used by SQL Server - so it has to be told that it can use it. You do that by setting the appropriate affinity masks and then running a RECONFIGURE.

Ok - now we come to the bit that needs the "(and affinity masks)" in the title. What's an affinity mask? In a nutshell, it's a bitmap of all the CPUs on the machine that specifies which ones are available for general SQL Server Engine use, and which ones are available only for SQL Server I/O. There are two kinds of masks, a regular affinity mask, and an affinity I/O mask. The regular affinity mask, if all zeroes, says that Windows decides who get's what CPU when. If the affinity mask is non-zero, then there's a bit per CPU. If it's set to 1 then SQL Server can use the CPU. If a bit is set to 1 in the affinity I/O mask, then the CPU can only be used for I/O. A bit cannot be set in both masks. Now - an affinity mask is 32-bits wide, so if you have more than 32 CPUs, you need to use two more affinity masks, called affinity64 and affinity64 I/O. These do the same thing but for CPUs 32-63 on the machine.

So, very cool, especially for those of you that can afford such hardware - I can't so I don't have a box to test it on (the 64-bit server we have here at SQL skills doesn't support it).

Wednesday, January 09, 2008 4:11:12 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 

Back in September last year I blogged about the native Backup Compression that is in SQL Server 2008 (see here) and promised to blog more when CTP-5 came out and I ran some tests. Well, it's here and I have so here's some data for you.

I expanded the AdventureWorks database to be 322Mb (random size, but big enough to get a decent sized run-time on my server). I used System Monitor to capture %user-mode CPU time, plus backup/restore throughput for a compressed and uncompressed backup operation, and then restores.

1) For the uncompressed backup the average CPU was 5% (the green line at the bottom), the run-time was 39.5s, and, of course, it took 322Mb to store the backup.

2) For the compressed backup the average CPU was way higher at 25%, BUT the run-time was 21.6s (a 45% improvement), and the backup was stored in 76.7MB (a 4.2x compression ratio). Very cool.

3) For the restore of the uncompressed backup the average CPU was 8%, and the run-time was 71.0s.

4) For the restore of the compressed backup the average CPU was 14.5%, and the run-time was 36s (a 50% improvement).

So - to summarize, turning on compression means more CPU and smaller run-times - just what was expected. Note that if you try this on your database you will see different results - the compression ratio and CPU usage is entirely dependent on the data being compressed.

Wednesday, January 09, 2008 2:26:58 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Friday, January 04, 2008

Hit another issue last night while playing with the latest CTP - VS2005 can't cope with some of the changes in SS2008. The problem arose while defining an ASP.NET connection string that was pointing to a 2008 instance (while playing with query notifications and cache invalidation, but that's unrelated) - it failed with a slightly misleading error message from VS that my SQL Server instance had to be SQL Server 2005 Beta 2 or above!

Turns out there's a fix already (which works perfectly) - see http://www.microsoft.com/downloads/details.aspx?FamilyID=e1109aef-1aa2-408d-aa0f-9df094f993bf&displaylang=en. Thanks to Bob Beauchemin for helping me out.

Friday, January 04, 2008 10:05:51 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, January 03, 2008

I came across an interesting bug in Management Studio in the latest Katmai CTP today - when a connection disconnects from a database, SSMS doesn't release the shared database lock that it holds. This prevents any operations that need exclusive database access (like a RESTORE) and can be somewhat disconcerting if you don't realize what's going on. Try executing the following in SSMS in the November CTP:

USE master;
GO

CREATE DATABASE MySSMSTest;
GO

USE MySSMSTest;
GO

CREATE TABLE test (c1 INT);
GO

USE master;
GO

DROP DATABASE MySSMSTest;
GO

And you'll see:

Msg 3702, Level 16, State 4, Line 1
Cannot drop database "MySSMSTest" because it is currently in use.

It works perfectly in SQLCMD in the same CTP, and also in SQL Server 2005. You can work around this by doing something like:

ALTER DATABASE MySSMSTest SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

GO

Not exactly ideal - but, hey, it's only a CTP. This has already been reported in Connect as issue 320135 - I added a workaround.

I'll post any further bugs that I find here as well as making sure they're on Connect.

Thursday, January 03, 2008 7:07:02 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

120x240_SQLConn_IBT.jpg120x240_SQLConnSpring08.jpg

 

No sooner has Fall conference season finished then it's New Year and we start everything all over again! SQLskills (Me, Kimberly, Bob, and Stacia) is doing a *ton* of stuff at SQL Connections this Spring. The conference is in Orlando as usual and runs from April 20th to 23rd, with pre-con workshops on the 20th. From this conference onwards Kimberly and I are the Co-Chairs of SQL Connections - it's been great fun putting together a killer line-up of speakers and sessions for you.

We've got so much cool stuff to talk about that as well as doing a pre-con on the 20th, we're also doing a pre-pre-con on the 19th (and so is Bob) AND a post-con workshop on the 24th! And if that's not enough to help us lose our voices during the week, we're also doing 10 conference sessions between the four of us! Tuesday 21st is Microsoft day and the session line-up is shaping up well - lots of juicy details about SQL Server 2008 - as well as some best-practices sessions for those of you who are happy with SQL Server 2000 or 2005 for now.

 

Here's what we're doing:

Workshops

  • April 19 - Pre-pre-con: SPR301: The Accidental DBA: Survival Tips, Tricks, and Techniques

(Paul S. Randal & Kimberly L. Tripp)

Have you been nominated as "the SQL person" on your team? Are you a developer who's suddenly found their test database has become critical for your company's business? Have you become a DBA—even only accidentally—and do you find yourself managing SQL Server database(s) more and more? Are you sure your data is protected? Are you sure your applications can scale? The one thing you NEED now, to manage this system correctly—is knowledge! We'll cover all of the critical components related to configuring, implementing, and maintaining a SQL Server system. Topics will include an overview of SQL Server components, protecting and maintaining the data, writing effective server-side components (e.g., procedures and transactions), and many other items that all require server-side smarts. Come to this workshop to find out the things you need to know to successfully manage SQL Server from the beginning—a day spent here will save you many more!

  • April 19 - Pre-pre-con: SPR302: SQL Server 2008 Overview for Developers

(Bob Beauchemin)

SQL Server 2008 introduces a number of exciting new features for developers, from support for Spatial Data types to a mechanism to store SQL BLOB data using the NTFS file system, to improvements in any development-related area from T-SQL to SQLCLR to XML. This one-day seminar is meant to get you up to speed quickly on the new features and give you some insight into how to most effectively use them to your advantage with either new or existing development projects. Some of the topics covered include:

  • Spatial data—how to geocode existing data, import spatial reference data, and use spatial queries and indexes for best performance.
  • Filestream data—when to store large binary data in the database or in SQL Server's filestream data storage. How to read and write filestream data with the system streaming I/O functions.
  • Extended date/time data type support—the specifics of SQL Server 2008's four new temporal data types and extensions to T-SQL date/time functions.
  • T-SQL enhancements—learn the "zen" behind the new T-SQL MERGE statement, grouping set support, table-valued parameters, change tracking, metadata tracking, and improved syntax.
  • Query Performance Improvements—besides covering T-SQL syntax, I'll cover how the new T-SQL statements help to improve performance, as well as covering performance and query plan guide improvements to not only "make SQL run faster" but also to stabilize query plans.
  • SQLCLR improvements—covering how to use the improvements in this area, including nullable type support, large UDTs and UDAggs, multi-input UDAggs, and ordered table-valued functions.
  • Service Broker enhancements—diagnose Service Broker setup problems easily using the new SSBDiag utility and set message priorities on a service/contract.
  • April 20 - Pre-con: SPR303: SQL Server 2008 Overview for DBAs

(Paul S. Randal & Kimberly L. Tripp)

SQL Server 2008 offers an impressive array of capabilities for professional developers that build upon key innovations introduced in SQL Server 2005. The use of many of these will have manageability and infrastructure implications for a database—and hence the DBA! There are also enhancements to existing high-availability technologies, plus a variety of new tools to aid in managing performance, scalability, administration, and troubleshooting. This workshop will help you understand how to exploit the new toolset and how to manage a database that makes use of the new features in SQL Server 2008. Come to this workshop so your developers don't surprise you with new demands once your company upgrades!

  • April 24 - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

(Paul S. Randal & Kimberly L. Tripp)

After a week of learning and watching demos—spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment setup with hands-on lab exercises to walk you through some of our most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and will expect a reasonable laptop configuration in order to participate: * Virtual Server or Virtual PC—already installed * At least 1 GB of physical memory w/512 MB dedicated to the VPC environment (2 GB is preferred w/1 GB dedicated to VPC) * 12 GB of physical disk space (20+ GB is preferred) * DVD Drive.

Sessions

  • SDB403: Diagnosis with Extended Events in SQL Server 2008

(Bob Beauchemin)

When I'm doing problem solving, it's always good to have too much information rather than too little. With this in mind, you'll need to look at SQL Server Extended Events (XEvent support) in SQL Server 2008. SQL Server 2008 adds support for extended events that works by creating and activating EVENT SESSIONS with DDL statements. In this session, I'll cover the different event providers, including the event provider for ETW (Event Tracing for Windows) and go over setting up an extended event trace and deciphering the diagnostic information provided.

  • SDB304: PowerShell in SQL Server 2008

(Bob Beauchemin)

PowerShell scripting has become the command shell and scripting interface of choice in Windows, from Exchange administration through Windows Management Instrumentation (WMI) and everywhere in between. In SQL Server 2008, this functionality comes to SQL Server by means of a PowerShell provider for SQL Server and built-in PowerShell functionality from SQL Server Management Studio's Object Explorer. This session will focus on how to use the provider to your best advantage and unique features that separate PowerShell scripting from traditional scripting.

  • SDB307: Learn the XML You Need to Manage Your Database

(Bob Beauchemin)

SQL Server 2005 includes support for an XML data type and XML Query languages. Although at first glance it appears that XML support is only a developer tool, looking at the "XML landscape" in SQL Server 2005, this is not the case. There are not only needs for a DBA to know how to manage database objects like XML SCHEMA COLLECTIONS and XML Indexes, but DBA-specific features that require XML and XQuery knowledge. The DBA-specific items that use an XML format in SQL Server 2005 include Query Plans, Deadlock Graph/Blocked Process Information, Eventdata() function—DDL Triggers and Query Notifications, Bulkcopy—XML format files and bulk copying XML, SQL Server Surface Area Configuration tool format, Query Memory Grant Information, Command line input and output From Database Tuning Advisor, and more! With all of the DBA-specific interest items in XML format, it's not hard to see that knowing XML and XQuery not only makes the DBA able to better manage developer database objects, but enhances the DBA debugging, troubleshooting, and configuration abilities. XML is not a "nice to have" for DBAs any more, with SQL Server 2005 it’s a "must have."

  • SDB308: Follow the Rabbit: Interactive Session on Database Mirroring

(Paul S. Randal & Kimberly L. Tripp)

If there's something you want to know about Database Mirroring, we've got the answer for you! Bring along your questions as we discuss how Database Mirroring works while setting up and implementing a mirroring solution. We'll also discuss monitoring and troubleshooting, plus the various failover scenarios. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

  • SDB309: Follow the Rabbit: Interactive Session on Backup and Restore

(Paul S. Randal & Kimberly L. Tripp)

If there's something you want to know about Backup or Restore, we've got the answer for you! Bring along your questions as we discuss how Backup and Restore work, planning a backup strategy, and various restore scenarios. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

  • SDV307: Index Internals and Usage

(Paul S. Randal & Kimberly L. Tripp)

Indexes are arguably the most important structures in a database yet they are often poorly understood and neglected. In this session you'll be reminded of the internals of indexes but our focus will be on how they are used by the SQL Server engine and what you need to do to make sure they're kept in optimum health. A myriad of tips, tricks, and optimizations will be discussed and demo'd so that you improve performance immediately.

  • SBI201: Searching Business Intelligence Data in Microsoft Office SharePoint Server 2007

(Stacia Misner)

One great reason to use MOSS as your front-end to business intelligence is the ability to search for all relevant documents regardless of the format used to present the data and thereby eliminate information silos. However, deploying workbooks and reports or creating dashboards only satisfies some requests for information and only if document titles and properties have been created with search in mind. What if you could also search the data itself? In other words, what if you could find all BI reports that include information about a particular product? With a little extra effort, you can enhance MOSS’s search capabilities using the Business Data Catalog to search inside all those Excel 2007 and Reporting Services reports you’ve been deploying to MOSS. Come to this session to understand the technical architecture required to search business intelligence data and to learn how to create Business Data Catalog applications for business intelligence.

  • SBI202: Data Mining for the Rest of Us

(Stacia Misner)

Don’t worry if you don’t hold a PhD in data mining or even if you didn’t take statistics in college. You can still use and, better yet, understand data mining now that Data Mining Add-Ins for Office 2007 is available. Whether you regularly analyze data now or provide technical support for those who do, it’s time to learn how to take business intelligence to the next level in your organization. This session will show you specific examples for exploring common data sets, such as sales and financial data, to find the hidden information in your business. You’ll understand the technical architecture requirements for the Data Mining Add-Ins, learn how to prepare data for data mining, and learn how to apply data mining techniques to specific business problems.

  • SBI307: Troubleshooting MDX Query Performance

(Stacia Misner)

There are lots of ways to optimize your Analysis Services environment, but tuning the server or improving the database design doesn’t help if your MDX queries are not efficient. Learn how to determine whether a query is the root cause of your performance issues and how to use MDX best practices to improve query performance.

  • SBI308: Follow the Rabbit—Interactive Q&A on Analysis Services Performance

(Stacia Misner)

The focus of this discussion is how to adjust database design and server tuning to help you get better performance from your Analysis Services solution. There will be only 5-10 slides covering best practices to get the conversation started, but most of the session time is open for your questions. Come participate in this session for an informative and interactive experience that will give you practical advice to put into practice in your own environment.

So, serious amounts of info, tips, and tricks for you to geek-out on with us and take home to use. We hope to see a lot of you there!

Paul

Thursday, January 03, 2008 2:58:33 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, December 31, 2007

Ok - so we did more partying than we thought so blog posts have been a little sparse this month, but here's one to end off the year.

There's a popular impression that turning on snapshot isolation and then rebuilding indexes will cause all rows in the table to get the extra 14-byte versioning tags. Fact or fiction? Let's find out.

First I'll create a test database, containing a small table with a clustered index and few rows:

CREATE DATABASE SItest;
GO

USE SItest;
GO

CREATE TABLE SmallTable (c1 INT, c2 INT);
CREATE CLUSTERED INDEX SmallTableCI ON SmallTable (c1);
GO

INSERT INTO SmallTable VALUES (1, 1);
INSERT INTO SmallTable VALUES (2,2);
GO

Next I'll turn on READ_COMMITTED_SNAPSHOT and rebuild the index to see if statement level versioning does the trick:

ALTER DATABASE SItest SET READ_COMMITTED_SNAPSHOT ON;
GO

ALTER INDEX SmallTableCI ON SmallTable REBUILD;
GO

Now let's look at the data page holding the two rows to see if there's any versioning info (the output is snipped short a little for brevity):

DBCC IND (SItest, SmallTable,