Last week I kicked off a survey about the size of the largest single SQL Server database in your company – see here for the survey. I was interested to see how the distribution has changed since I did the same survey a few years ago.

2011 results:

 2011sizes Updated database size survey results

2009 results:

 2009sizes Updated database size survey results

Very interesting! As far as raw percentages are concerned, the distribution looks really the same at first glance. The raw numbers are higher, but that just reflects increased blog readership I think.

Looking a bit more carefully, there's a slight 2% increase in databases > 5TB, a 4% shift out of the 1-10GB range into the 10-100GB range, and a 3% shift from the 500GB-1TB range into the 1TB to 5TB range.

I must admit that I expected to see a more pronounced shift in the distribution towards the higher database sizes, so I was surprised. However, there's a definite overall increase in the distribution of database sizes out there – which means more people are wrestling with the challenges of maintaining a true VVVLDB – very very very large database :-)

How do you take backups of a 10TB+ database?

How do you do index maintenance on a 10+TB database?

How do you do consistency checks on a 10+TB database?

None of these operations are going to fit inside your maintenance window any longer. The trick is splitting the database up (using filegroups and/or partitioning) and rearchitecting all these processes to be done piecemeal over a period of time.

For example, one of the things I always help clients with VVVLDBs to do is architect an index maintenance process that does the absolute minimum of work, with tracking of what operations are being performed on what indexes, and heuristics to automatically change how often and what operations are performed on what indexes – almost like a self-tuning index maintenance process. This takes time to implement and get right, and makes things more complicated.

Backup and restore become much more complicated because you have to start dealing with filegroup-level full and diffferential backups, which the majority of people do not do.

I already blogged about consistency checking options for VLDBs back at the end of 2007 – I stand by those recommendations still.

Bottom line – as your databases grow in size, your database maintenance and disaster recovery practices have to change to be smarter so they don't start to take prohibitively long.

Thanks to all who responded!