I got bored on the first leg of the journey from Seattle to London so thought I’d bang out a quick blog post.

After my previous post on checking whether a SQL instance is able to use instant initialization (see here), I had a discussion with Scott R., who regularly comments on blog articles. He proposed an alternative method of checking whether the SQL service account has the Perform volume maintenance tasks privilege (AKA SeManageVolumePrivilege) (or other useful privileges like Lock pages in memorySeLockMemoryPrivilege) and outlined the pros and cons of the various methods. With his permission, I’ve turned the information he provided into this blog post.

To use whoami /priv to find the SQL service account privileges you need to enable xp_cmdshell to do it from within SQL, or be logged into the Windows box as the SQL service account. Scott suggested using an alternative tool called AccessChk, written by my friend Mark Russinovich (formerly of Sysinternals, now a Technical Fellow at Microsoft). Using it you can find the privileges assigned to other users, services, or processes. The example syntax to do this for a SQL instance would be:

AccessChk –p sqlservr.exe –f –q -v

This allows you to find the privileges of the SQL service account without having to enable xp_cmdshell or login as the service account itself.

Let’s compare the whoami + xp_cmdshell combination with the AccessChk + command window combination.

Whoami + xp_cmdshell

Advantages:

  • You don’t need access to a command window.
  • You don’t need an administrator-capable user account.
  • Whoami may already be installed on a given Windosw server.

Disadvantages:

  • You need to enable xp_cmdshell (for a brief period – and have the authority to do so) to run whoami, which is often a separate security issue in many IT organizations.
  • Whoami can’t be run from a separate command window, because it can’t get the privileges from any user account other than the one which is running the command.  Since the goal is to get the privileges of the SQL service account, and that service account is often locked down from unnecessary privileges (such as interactive logon, using command windows, etc.), whoami can’t easily be used to derive this information outside the context of running from SQL Server via xp_cmdshell.
  • Whoami may not already be installed on a given Windows server.  This could be a barrier to using it for this purpose in IT organizations with rigid program install guidelines.

AccessChk + command window

Advantages:

  • You don’t need to enable xp_cmdshell, bypassing this separate security issue.
  • You don’t need to use the SQL service account as AccessChk can report on privileges of other running processes (and their implied user account) or other explicit user accounts, without requiring the security context of the desired user account to run AccessChk.
  • This approach may be used to automate capture of such information for configuration management and reporting purposes.

Disadvantages:

  • You need access to a command window on the Windows server.
  • You may need an administrator-capable user account (which you may not have) – I am not sure of this requirement, but it may be the case.
  • AccessChk is most likely not already installed on a given Windows server.  This could be a barrier to using it for this purpose in IT organizations with rigid program install guidelines, just like for whoami.

Summary

So is there a recommendation here? No, just a quick examination of the trade-offs with the two methods. Once again, I recommend you grant this privilege to the SQL service account for the massive performance boost it gives with data file creation or growth operations.

This is one I've been trying to get to since I started blogging a couple of years ago: how to re-attach a detached suspect database. This is a pretty common scenario I see on the forums - a database goes suspect so the DBA tries to detach/attach, which fails. I wrote a demo for my corruption session at TechEd this year that shows how to create a suspect database with hex editor, then detaches it and shows how to re-attach and fix it. It's going to be a long blog post, but I've got everything wrapped up before we fly out to the UK tomorrow so I've got a bit of spare time.

Creating a Suspect Database

First off I'm going to create a simple database to use, called DemoSuspect with a table and some random data.

USE MASTER
GO

CREATE DATABASE DemoSuspect
GO

USE DemoSuspect;
GO

CREATE TABLE Employees (FirstName VARCHAR (20), LastName VARCHAR (20), YearlyBonus INT);
GO
INSERT INTO Employees VALUES ('Paul', 'Randal', 10000);
INSERT INTO Employees VALUES ('Kimberly', 'Tripp', 10000);
GO

Now I'll perform an update in an explicit transaction and force it to be written out to disk with a CHECKPOINT. I've accidentally deleted Kimberly's bonus!

-- Simulate an in-flight transaction
BEGIN TRAN;
UPDATE Employees SET YearlyBonus = 0 WHERE LastName = 'Tripp';
GO

CHECKPOINT;
GO

Then in another window, I'll simulate a crash using:

SHUTDOWN WITH NOWAIT;
GO

Now that SQL Server is shutdown, I'm going to simulate an I/O failure that corrupts the log file. I'm going to use a hex editor to do this - my editor of choice is the immensely popular and useful XVI32, written by Christian Maas. I opened the log file, filled the first section with zeroes, and then saved it again. See the screenshot below.

When I startup SQL Server again, it will try to run recovery on the DemoSuspect database and fail. This will put the database into the SUSPECT state.

So I restarted SQL Server, let's try getting in to the DemoSuspect database.

USE DemoSuspect;
GO

Msg 945, Level 14, State 2, Line 1
Database 'DemoSuspect' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

Now let's check the database status:

SELECT DATABASEPROPERTYEX ('DemoSuspect', 'STATUS') AS 'Status';
GO

Status
--------

SUSPECT

Now at this point, the correct procedure is to restore from backups. If there are no backups available, then the next best thing is to get the database into EMERGENCY mode and extract as much data as possible, or run emergency-mode repair. However, I'm going to try the detach/attach route instead.

Detaching the Database

I'll try detaching the database using sp_detach_db:

EXEC sp_detach_db 'DemoSuspect';
GO

Msg 947, Level 16, State 1, Line 1
Error while closing database 'DemoSuspect'. Check for previous additional errors and retry the operation.

Hmm - did it work or didn't it?

SELECT * FROM sys.databases WHERE NAME = 'DemoSuspect';
GO

And there are no results, so the detach must have succeeded.

Re-attaching the Database

Let's try the obvious sp_attach_db:

EXEC sp_attach_db @dbname = N'DemoSuspect', 
   
@filename1 = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf', 
   
@filename2 = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf';
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf' is not a valid database file header. The PageAudit property is incorrect.

Hmm. How about using the ATTACH_REBUILD_LOG option on CREATE DATABASE? That should create a new log file for me:

CREATE DATABASE DemoSuspect ON
   (NAME = DemoSuspect, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf')
FOR ATTACH_REBUILD_LOG;
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf' is not a valid database file header. The PageAudit property is incorrect.
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 1813, Level 16, State 2, Line 1
Could not open new database 'DemoSuspect'. CREATE DATABASE is aborted.

Hmm. The database knows that there was an active transaction. Using the ATTACH_REBUILD_LOG command only works if the database was cleanly shut down and the log is missing. What about if I actually remove the log file and try that - maybe I could fool it? I took a copy of the data and log file, and then deleted the original log file (this is getting more and more dodgy...). Let's try again:

CREATE DATABASE DemoSuspect ON
   
(NAME = DemoSuspect, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf')
FOR ATTACH_REBUILD_LOG;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 1813, Level 16, State 2, Line 1
Could not open new database 'DemoSuspect'. CREATE DATABASE is aborted.

Well, that was a long-shot but SQL Server's smarter.

Basically the problem is that the database wasn't cleanly shutdown, which means that recovery HAS to run and complete before the database can be attached again. Given that our log file is corrupt, that's impossible. So, never detach a suspect database.

The only way to get the database back into SQL Server is to use a hack. I'm going to create a new dummy database with the exact same file layout and as close as possible to the file sizes of the detached database. Then I'm going to shutdown SQL Server, swap in the corrupt database files, and re-start SQL Server. If all goes well, the corrupt, suspect database will be attached again.

The one major downside of this is that if the SQL Server instance doesn't have instant initialization enabled (see How to tell if you have instant initialization enabled?), then creating the dummy database could take a long time if the data files are very big. This means that your application is offline while the files are created and zero'd out.

So, I've already got a copy of the corrupt database files, so now I need to delete the data file too. Before doing this you want to make absolutely sure you've got multiple copies of the corrupt database files... just in case. After deleting the data file, I can create my dummy database.

CREATE DATABASE DemoSuspect
GO

If you forget to delete the existing corrupt files first, you'll get the following error:

Msg 1802, Level 16, State 4, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
Msg 5170, Level 16, State 1, Line 1
Cannot create file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf' because it already exists. Change the file path or the file name, and retry the operation.

Ok - delete the data file and try again. Now I need to check the dummy database's files are there (at this point I'm totally paranoid), shutdown SQL Server, delete the dummy databases' files (this just seems wrong!), and swap in the corrupt files. I took another copy of the corrupt files before swapping them back in, just in case something goes wrong.

After restarting SQL Server, I can check the database state:

SELECT DATABASEPROPERTYEX ('DemoSuspect', 'STATUS') AS 'Status';
GO

Status
--------
SUSPECT

Woo-hoo - I'm back to having a suspect database attached again - after having to shutdown the server and mess about deleting and copying files around. Not good. Now I can actually fix it.

Fixing the Database

If you don't have any backups, then the only way to get into the database is to use EMERGENCY mode. This lets you into the database but you need to be aware that recovery has not completed so the contents of the database are transactionally (and possibly structurally) inconsistent. I'm going to choose to repair the database using emergency-mode repair. See CHECKDB From Every Angle: EMERGENCY mode repair - the very, very last resort for a detailed description of this tool.

ALTER DATABASE DemoSuspect SET EMERGENCY;
GO
ALTER DATABASE DemoSuspect SET SINGLE_USER;
GO
DBCC CHECKDB (DemoSuspect, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DemoSuspect_log.LDF' is not a valid database file header. The PageAudit property is incorrect.
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DemoSuspect_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
The Service Broker in database "DemoSuspect" will be disabled because the Service Broker GUID in the database (9E879BFC-B742-4A69-AB14-4D6BD6F99E02) does not match the one in sys.databases (B4568D23-7018-40CF-B189-9C29DE697C09).
Warning: The log for database 'DemoSuspect' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

First off it tries to do the regular ATTACH_REBUILD_LOG. When that fails, DBCC CHECKDB takes over and forces the log to be rebuilt, after trying to force as much of the damaged log to be processed for recovery as it can. It then runs a full repair, in case there's anything corrupt in the database - in this case there isn't so there are no corruption messages in the output.

Notice the line about the Service Broker GUID being wrong. I had to use the hack method to get the database attached again, but when I created the dummy database, it created a Service Broker GUID for the DemoSuspect database in master.sys.databases. When I swapped in the corrupt database, it has a different GUID - so now I can't use Service Broker! And all because I detached the suspect database instead of repairing it.

So what's the state of the data?

USE DemoSuspect;
GO

SELECT * FROM Employees;
GO

FirstName   LastName   YearlyBonus
----------  ---------  ------------
Paul        Randal     10000
Kimberly    Tripp      0

Kimberly doesn't get a bonus this year - she won't be happy! This is contrived and flippant, of course, but it illustrates the point that after doing an emergency-mode repair, transactions that were active at the time the log was damaged will not get a chance to roll-back, most likely. In this case, I know what was going on when the crash occured, but what about on a busy OLTP system with hundreds or thousands of active transactions? What state will the data be in?

Summary

Yes, you can recover from a detached suspect database, but it's not pretty and you have to be very careful. The best course of action is always to have a comprehensive backup strategy that allows you to restore as quickly as possible. If you do have a suspect database and no backups, use EMERGENCY mode to access and/or repair the database. Hopefully this article will help people that find themselves in these situations.

When I started blogging, way back in 2006 :-), the third post I made on the old Storage Engine blog was about rebuilding the msdb database in 2005 (see here). This no longer works in 2008 (fellow MVP Tibor Karaszi explains why here), and in fact the information in Books Online about how to rebuild any of the 2008 system databases is incorrect. It's not something I've tried yet but people have already needed to do it (including Tibor!). So what to do?

Step in Bob Ward, a Principal Escalation Engineer with PSS, and a very good friend of mine. He's just researched and published a comprehensive blog post giving the procedure for rebuilding the system databases in 2008 using setup.exe. Checkout his great post here.

Here's hoping that you never need to do it!

On Tuesday I recorded a couple of sessions with Richard and Greg on RunAs Radio. It's been a while since either Kimberly or I have been on the show - last November at TechEd in Barcelona. In the first session we discussed what it is to be an "involuntary DBA" - someone who's thrown into the DBA role with no training. This goes along nicely with the TechNet Magazine article I wrote for the August 2008 issue on Effective Database Maintenance (for the involuntary DBA). Kimberly also pops in from time to time when she's not coughing, we make fun of her, and I describe one of my favorite Calvin and Hobbes cartoons around collecting spit in a jar - all the usual silliness.

The show is 35 minutes long and you can download it at http://www.runasradio.com/default.aspx?showNum=72.

Enjoy!

Just go watch it - LEGO stop-go animation of a Metallica concert - it's fantastic!

http://www.sarahlacy.com:80/sarahlacy/2008/08/stop-what-you-a.html

Thanks to Jason Massie for the heads-up (tons of cool SQL stuff on Jason's site btw...)

Categories:
Personal

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

If you've been following my blog for a while, you'll know that I'm a big fan of LEGO (see SQL Down Under podcast interview - CHECKDB, 2008, and what does Paul get up to when Kimberly's away?). In fact the kids got me the new LEGO Volkswagen Beetle for my birthday last month - just need the time to make it!

Yesterday, fellow MVP and our good friend Greg Linwood set me some pictures of LEGO Olympics. The Hong Kong LEGO User's Group (can you believe there's a user group for LEGO - that's so cool!) built a complete miniature set of Beijing Olympic buildings from LEGO. Kimberly found a link to a site with a bunch of pictures (saves me uploading them here) - check it out at http://designyoutrust.com/2008/07/29/lego-sports-city-recreates-2008-beijing-olympics/. Here's one for you - awesome!

Categories:
Personal

This is a quick post inspired by a question I was sent in email (thanks Marcos!) which very neatly lets me show a DMV I've been meaning to blog about for a while. And the weather here in Redmond really sucks right now so I can't go outside - blogging will serve as my work-avoidance strategy this afternoon :-).

The (paraphrased) question is:

A checkpoint is a process that writes all dirty pages to disk, and is per-database. So, if the data cache can hold a page from any database, how does checkpoint know which pages to check for a dirty status? Does it scan through buffer pool looking for pages for database X and process only those? Or is data cache somehow partitioned by database? I'd like to know a bit better how it works under the covers.

The answer is that pages are stored in buffers in the buffer pool (aka buffer cache or data cache), and the buffers are indeed hashed so they can easily be found by database. You can see what pages are currently in the buffer pool, and their status using the sys.dm_os_buffer_descriptors DMV in 2005:

SELECT * FROM sys.dm_os_buffer_descriptors;
GO

database_id file_id  page_id  page_level  allocation_unit_id   page_type      row_count   free_space_in_bytes is_modified
----------- -------- -------- ----------- -------------------- -------------- ----------- ------------------- -----------
1           1        9        0           6488064              BOOT_PAGE      1           7362                0
1           1        6        0           6488064              DIFF_MAP_PAGE  2           6                   0
1           1        7        0           6488064              ML_MAP_PAGE    2           6                   0
1           1        104      0           262144               DATA_PAGE      100         4196                0
1           1        105      0           851968               DATA_PAGE      65          5041                0
1           1        106      0           262144               DATA_PAGE      197         413                 0
1           1        107      0           262144               DATA_PAGE      207         23                  0
1           1        108      1           262144               INDEX_PAGE     7           7949                0
.
.

I cut off the output rather than list all 3258 pages in the buffer pool on my laptop. The DMV gives you back some info from the pages themselves as well as you can see(remember all this is in memory so it's quick to find).

I played around with the DMV a little bit and came up with a neat script that will tell you may many clean and dirty pages there are in the buffer pool per-database.

SELECT
   
(CASE WHEN ([is_modified] = 1) THEN 'Dirty' ELSE 'Clean' END) AS 'Page State',
   
(CASE WHEN ([database_id] = 32767) THEN 'Resource Database' ELSE DB_NAME (database_id) END) AS 'Database Name',
   
COUNT (*) AS 'Page Count'
FROM sys.dm_os_buffer_descriptors
   
GROUP BY [database_id], [is_modified]
   
ORDER BY [database_id], [is_modified];
GO

Page State Database Name             Page Count
---------- ------------------------- -----------
Clean      master                    302
Dirty      master                    1
Clean      tempdb                    88
Dirty      tempdb                    52
Clean      model                     56
Clean      msdb                      622
Dirty      msdb                      5
Clean      adventureworks            110
Clean      DemoRestoreOrRepair       64
Clean      DBMaint2008               88
Clean      DemoFatalCorruption1      64
Clean      DemoFatalCorruption2      64
Clean      broken                    64
Clean      DemoFatalCorruption3      64
Clean      DemoCorruptMetadata       111
Clean      DemoDataPurity            88
Clean      SalesDB                   123
Clean      DemoNCIndex               88
Clean      shrinktest                88
Clean      DemoRestoreOrRepairCopy   64
Clean      DemoSuspect               64
Clean      FileHeaderTest            96
Clean      MultiFileDB               96
Clean      HA2008                    88
Clean      SalesDB_Snapshot          21
Clean      BootPageTest              88
Clean      Resource Database         599

Later this week I'll try to blog a script that can tell you how much of a particular table is in memory. Enjoy!

Now this one's sure to spark some controversy...

I was checking my RSS feeds of the SQL blogs that I follow and noticed an interesting post on Kalen's blog (the post is here). She'd been lambasted for suggesting in the weekly SQL Server Magazine newsletter that *Microsoft* shouldn't provide a log reader tool and asked for comments. It's worth reading her original newsletter too, as it provides some interesting history of the development of log readers.

Donning my flame-proof suit, I wholeheartedly agree. I don't think Microsoft should spend engineering resources on a log reader tool that tries to show the SQL that generated the log records. So why shouldn't Microsoft build one? Well, IMHO, in a perfect world with no mistakes and perfect applications, there really isn't a need for a log reader tool. In an imperfect world, there is a need - but should Microsoft be the one to provide it? I think there are way more important tools and features that Microsoft should spend engineering resources on.

So why do people want a log reader tool, apart from curiosity? The three uses I see are change tracking, auditing, and the ability to rollback mistakes.

The first use, change tracking, is viable, and in fact the change data capture feature in 2008 is built on top of the transactional replication log reader Agent job (I'll post more on this, and I've just written an article on tracking changes in 2008 for the November issue of TechNet Magazine).

For auditing, how can a log reader tell whether the SQL statement was being run under a different security context, such as after an EXECUTE AS statement? How can it tell the difference between a single statement UPDATE with a multi-part WHERE clause, and multiple UPDATE statements of single rows? And on top of that, it needs to read through all the transaction log, causing contention on the log drive. In 2008 there's an in-built, synchronous auditing solution (SQL Server Audit), although it has issues with parameterized queries. In 2005, you could roll your own auditing by having all DML done through stored-procs that log what they did, for instance, or using DML triggers.

For the ability to rollback mistakes... don't get me started! Recovering from user mistakes is not a situation you want to be in - you can setup your system to avoid mistakes altogether. DDL triggers, DML triggers, schema separation, no direct access to tables for DML, and so on. Any of these are better to do than having to figure out what someone did so you can undo a mistake. Ok - so people make mistakes and you'd like to rollback one statement. How about using your backups? Oh, you don't have a good backup strategy. Well, that's where a log reader can help, if it works. But should Microsoft have to provide it?

Yes, I know the alternatives I mentioned above are sometimes easier said than done, especially with 3rd-party applications, but that's for the application writers to fix. Microsoft shouldn't have to provide a tool because of broken applications, or 3rd-party license agreements that preclude adding triggers, or DBAs that haven't implemented safe-guards. It already provides features that can stop mistakes happening, and allow auditing to happen. Why provide a log reader tool that allows other companies to then produce tools on top of that which do what SQL Server already provides out-of-the-box? And why provide a log reader at all when other companies do it already?

If anything, I'd like to see the existing tools be made to work in all circumstances. AFAIK there isn't a log reader tool on the market today that copes with absolutely everything 2005 can put into the log. Although the log internals are supposedly proprietary, there's an internals document that Microsoft licenses for free (at least when I was on the team until last summer) to companies wishing to build such tools, and they're not *secret*. There's plenty of info about the log internals on the web (some provided by me) and you can poke about to your heart's content using the undocumented tools (that's what DBCC LOG and fn_dblog are for).

However, what I'd *really* like to see is the need for a log reader tool to slowly die away as more devs and DBAs are educated and implement techniques for preventing the problems that log reader tools help to rectify (sometimes). One thing I haven't mentioned above is to have a log-shipping secondary with a load-delay - that way you have a redundant copy of the data that's always several hours behind your primary system. Or even using regular database snapshots.

Ok - that was a bit of a rant, and this is the same view when I worked for Microsoft too. To summarize, I don't think Microsoft should provide a log-reader tool. I see the need for them, when a system isn't setup to prevent mistakes happening, and there's no good backup or redundancy strategy, but I think that need can be filled by 3rd-party vendors.

Happy to hear well-thought-out arguments on this either way, either privately or as comments.

Over the last few weeks I've had lots of questions about FILESTREAM performance and how to coax NTFS into scaling nicely. I just finished writing a 30-page whitepaper about FILESTREAM for the SQL Server team, which should be published before PASS in November (I'll blog the link when I have it). Although my whitepaper isn't strictly about performance, there is a long section about setting up your system to get high-performance from FILESTREAM. What I want to do in this blog post is give a bullet list of things to do that will help you get good performance. All of these are explained in more detail in the whitepaper.

Here you go, in no particular order:

  • Make sure you're storing the right-sized data in the right way. Jim Gray (et al) published a research paper a couple of years ago based called To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?. To summarize the findings, BLOBs smaller than 256-KB should be stored in a database, and 1-MB or larger should be stored in the file-system. For those in-between, "it depends" - my favorite answer. The upshot of this is that you won't get good performance if you store lots of small BLOBs in FILESTREAM.
  • Use an appropriate RAID level for the NTFS volume that will host the FILESTREAM data container (the NTFS directory structure corresponding to the FILESTREAM filegroup in the database). Don't use RAID-5, for instance, for a write-intensive workload.
  • Use an appropriate disk technology. SCSI will be usually be faster than SATA/IDE, but more expensive. This is because SCSI drives usually have higher rotational speeds, so lower latency and seek times.
  • Whichever disk technology you choose, if SATA, ensure it supports NCQ, and if SCSI, ensure it supports CTQ. Both of these allow the drives to process multiple, interleaved IOs concurrently.
  • Separate the data containers. Separate them from each other, and separate them from other database data and log files. This avoids contention for the disk heads.
  • Defrag the NTFS volume if needed before setting up FILESTREAM, and periodically to maintain good scan performance
  • Turn off 8.3 name generation on the NTFS volume. This is an order-N algorithm that has to check that the new name generated doesn't collide with any existing names in the directory. This slows insert and update performance down *a lot*. Do this using the command line fsutil utility.
  • Turn off tracking of last access time using fsutil.
  • Set the NTFS cluster size appropriately. For BLOBs 1-MB or large, use a cluster size of 64-KB. This will help to reduce fragmentation.
  • A partial update of FILESTREAM data creates a new file. Batch lots of small updates into one large update to reduce churn.
  • When streaming the data back to the client, use an ~60-KB SMB buffer size (or multiples thereof). This is so that the buffers don't get overly fragmented as TCP/IP buffer are 64-KB.

Hope this helps!

Theme design by Nukeation based on Jelle Druyts