Search Engine Q&A #13: Difference between database version and database compatibility level


This has been causing some problems on the various groups and forums over the last few days so I thought I’d repost this from my old Storage Engine blog. The questions have been around attaching 2005 databases to 2000 servers – even databases that are in 80 compat mode – and it doesn’t work. Why?


The confusion is between database compatibility level and database version. Here’s a quick explanation of the difference.


Database version


The database version is a number stamped in the boot page of a database that indicates the SQL Server version of the most recent SQL Server instance the database was attached to. The database version number does not equal the SQL Server version. For example, doing the following:



SELECT @@version;
GO


on one SQL Server instance on my laptop returns:


Microsoft SQL Server 2005 – 9.00.3054.00 (Intel X86)   Feb 13 2007 23:02:48   Copyright (c) 1988-2005 Microsoft Corporation  Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 2)


However, the database version of databases on that instance is 611. You don't normally see the database version, but if you attach a database from an earlier version of SQL Server, you'll see these numbers in the error log as SQL Server reports which upgrade steps it's performing. You can also see the database version by doing the following:



USE master;
GO


SELECT DatabaseProperty ('dbccpagetest', 'version');
GO
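
If you're curious where the number actually lives, the undocumented DBCC DBINFO command dumps the boot page fields, including dbi_version and dbi_createVersion. This is just an illustrative sketch (reusing the same hypothetical database name as above); being undocumented, its output format can change between builds:

DBCC DBINFO (N'dbccpagetest') WITH TABLERESULTS;
GO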


Some things to note about database version:



  • SQL Server is not up-level compatible. You cannot attach a database that was created on (or has been upgraded to) SQL Server 2005 to any earlier version of SQL Server (also true for trying to attach a 2000 database to 7.0, and so on).
  • You cannot attach a database that was created on an earlier version without going through the proper upgrade procedures – a plain, supported attach (sketched below) runs those upgrade steps for you. Forcibly attaching a database using various hacky methods will result in all kinds of weird errors, and possibly crashes.
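
For reference, here's what a supported attach looks like – a plain CREATE DATABASE ... FOR ATTACH, which lets SQL Server run its upgrade steps and report them in the error log. The database name and file paths below are hypothetical placeholders:

CREATE DATABASE dbccpagetest
ON (FILENAME = N'C:\SQLData\dbccpagetest.mdf'),
   (FILENAME = N'C:\SQLData\dbccpagetest_log.ldf')
FOR ATTACH;
GO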

Database compatibility level


The database compatibility level determines how certain database behaviors work. For instance, in 90 compatibility, you need to use the OUTER JOIN syntax to do an outer join, whereas in earlier compatibility levels you can use '*=' and '=*'. Contrary to popular myth, all of the behavioral differences ARE documented – in the Books Online section for sp_dbcmptlevel – the SP used to set the compatibility level.
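
As a quick illustration – a minimal sketch reusing the hypothetical dbccpagetest database name from above – the old-style outer join syntax parses at compatibility level 80 but fails at 90:

USE master;
GO
EXEC sp_dbcmptlevel N'dbccpagetest', 80;
GO
USE dbccpagetest;
GO
-- Old-style outer join: only allowed at compatibility level 80 or earlier
SELECT o.name, c.name
FROM sysobjects o, syscolumns c
WHERE o.id *= c.id;
GO
USE master;
GO
EXEC sp_dbcmptlevel N'dbccpagetest', 90;
GO
-- At level 90 the query above raises an error; rewrite it as
-- ... FROM sysobjects o LEFT OUTER JOIN syscolumns c ON o.id = c.id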


There are five compatibility levels supported by SQL Server 2005:



  • 60 = SQL Server 6.0
  • 65 = SQL Server 6.5
  • 70 = SQL Server 7.0
  • 80 = SQL Server 2000
  • 90 = SQL Server 2005


You can see the compatibility level of all databases by doing:



SELECT name AS 'DB Name', compatibility_level AS 'Compatibility Level'
FROM master.sys.databases;
GO


Some things to note about compatibility levels:



  • A database created on SQL Server 2005 will have a default compatibility level of 90, unless the model database has a different compatibility level, in which case the new database inherits the compatibility level of model (see the sketch after this list).
  • New features may work under older compatibility levels, but beware of SET options.
  • An upgraded database retains its compatibility level. For example, a database that was created on SQL Server 2000, and didn't have its compatibility level altered, will stay at compatibility level 80 when it's upgraded to SQL Server 2005.
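
Here's a minimal sketch of the first bullet (the database name is made up purely for illustration):

-- See what model is set to; a new database inherits this level
SELECT compatibility_level FROM master.sys.databases WHERE name = N'model';
GO
CREATE DATABASE CompatLevelTest;
GO
SELECT compatibility_level FROM master.sys.databases WHERE name = N'CompatLevelTest';
GO
DROP DATABASE CompatLevelTest;
GO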

Summary


This was just a quick – and by no means comprehensive – explanation of the difference between the two terms. Basically, there’s no relationship between them.

The frustration of sweeping generalizations – follow-on from Search Engine Q&A #12


A bit more traffic on the thread (see previous post here) prompted me to give my thoughts on the many sweeping generalizations that plague the computer industry and make it difficult sometimes to give advice in forums and blogs. I'd like to repost them here (with a few tweaks for clarity).


Some examples of questions that breed sweeping generalizations:



  • Should you have clustered indexes on all tables? (The well-known 'clustered index debate', as Kimberly likes to call it.)
  • Should you rebuild or reorganize indexes to remove fragmentation?
  • Which high-availability solution should you use?

The problem – as with most advice – is that it's extremely hard to make generalizations, for two reasons:



  1. without lots of evidence, many people (quite rightly) don't believe sweeping generalizations, as they may have been bitten by one in the past
  2. nearly every situation is different, so many generalizations are useless

What I'd love to see (and I tried to do this when at MS, and like to think I do it here, when teaching classes, or at conferences) is for people to provide the justification for generalizations, plus some idea of the exceptions and the circumstances under which they arise.

As for this case (whether to create multiple files because there are multiple cores/CPUs), I think we’ve about done this one to death. The sweeping generalizations here are:



  1. for non-tempdb you usually don’t need multiple files, unless you have a very high-end workload of the specific nature I described in my first post (rare)
  2. for tempdb you usually do, as long as your workload merits it on a multi-core/CPU box
  3. IO vendors may recommend it for increased IO throughput *on their specific hardware*
  4. there exist sweeping generalizations from various sources that dispute all of the above

Unfortunately, you're not going to get a definitive, authoritative answer to a design/strategy question such as this, and you'll continue to find contradictions to anything anyone says on the forums – and even MS contradicting itself (sigh).

What I would suggest is the following:
1) go with the majority opinion of responses to the questions asked, based on the respondents' collective experience with many customers, databases, and workloads
2) do your own testing, on your own hardware, with your own workload, and see what works for you (but beware that testing in a vacuum can prove or disprove anything you want – which is why you see so many contradictory statements)


One last thing on MS – it’s a very big company, with lots of groups. Anyone can sponsor a whitepaper, write a blog post/MSDN article/technet article and publish it, or reply on a forum as a visible MS person and it has the ‘official stamp’ of coming from MS. When I was in the product group I was continually dismayed by the misinformation, bad advice, contradictions, and baseless assertions that I saw coming from MS employees who were just trying to be helpful.


Once something’s published on the internet, it’s *incredibly* hard to undo the damage done. There’s a fundamental element of mistrust sometimes on forums and newsgroups which can be wearying when you’re trying to help people out. It can be very hard to convince people that someone else’s advice isn’t the best to follow – I remember several times arguing with people about how CHECKDB works or what a corruption error message means and finally having to resort to ‘I wrote that code – I’m afraid you *are* wrong’ – which I really hate doing.


Anyway – rant over :-)

Search Engine Q&A #12: Should you create multiple files for a user DB on a multi-core box?

There’s been a very interesting discussion going on over at SQLServerCentral.com about whether to create multiple files for a user database because the server has multiple CPUs – see the thread here. I wrote a couple of long responses during the thread that I wanted to duplicate here as I think it’s of broad interest.

My first response was:

It doesn't make sense to split any database into multiple files for per-CPU performance, except tempdb, which can suffer from multiple CPUs all trying to modify the same allocation bitmaps under high load when small temp tables are being created/deleted (see http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx for details).

Now, saying that, there’s an exception – and that’s when a non-tempdb database has the same allocation bitmap contention issues, but that only happens on the very high-end with thousands of single-row inserts per second happening on each CPU. It’s pretty rare. I’ve never seen it but Kimberly has.

What you should be considering is splitting a database into filegroups to be able to isolate critical tables within separate filegroups, or splitting a partitioned table across multiple filegroups. For SS2005, these aren't for performance but for manageability (e.g. being able to restore a single filegroup while the rest of the database remains online, or setting several partitions of a table to be read-only). For SS2008, there are improvements to query plan generation over multiple partitions that could allow for better parallelism, but it's still not optimal compared to non-partitioned data.
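
To make that concrete, here's a hedged sketch of the filegroup approach – all database, filegroup, table names and paths below are hypothetical: add a filegroup, give it a file, and create a critical table on it so it can be managed and restored separately:

USE master;
GO
ALTER DATABASE SalesDB ADD FILEGROUP CriticalData;
GO
ALTER DATABASE SalesDB
ADD FILE (NAME = N'SalesDB_Critical1', FILENAME = N'D:\SQLData\SalesDB_Critical1.ndf', SIZE = 512MB)
TO FILEGROUP CriticalData;
GO
USE SalesDB;
GO
-- Place a critical table on its own filegroup for piecemeal restore/manageability
CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL
) ON CriticalData;
GO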

So – complex topic and these are simple guidelines. Hope they help.

This was followed by a discussion pointing out that various hardware vendors imply the opposite, as do several MS sources. So here was my latest response:

Hi folks,

What's really not made clear in the MS literature is that the advice to use multiple files per database really only applies to tempdb. Even for tempdb, on 2005 you don't need one file per core – more like 1/4 to 1/2 as many files as there are cores.

The tempdb problem is this – common workloads create and drop many worktables per second. The allocations that happen for a table are initially single pages (as opposed to whole extents). This means that a search of the SGAM page for that 4GB interval needs to be done to find a mixed extent with a free page to allocate. Multiple CPUs all hammering this page cause contention on it and performance problems. Then a page needs to be allocated for the first IAM page – same thing happens. Then these pages need to be marked allocated in a PFS page – same thing happens. And then these pages need to be inserted into the sysindexes row for the table – more contention. On 2000 this was particularly bad – so T1118 plus multiple files was the solution, where SQL Server would round-robin the single-page allocations across the files in tempdb, alleviating the contention somewhat.

In SQL Server 2005, we changed the temp table mechanism so that whenever a temp table is dropped, one data page, one IAM page, and the system table entries (no longer in sysindexes, but in a 'hidden' table exposed as sys.allocation_units) are cached. When a new temp table is created, if there's a cached 'template' temp table it is picked up and used, without so much contention on the various allocation bitmaps. On a heavily loaded system there can still be contention, so you still need multiple files on an SMP box, just not so many. And you don't need T1118 any more for user databases, but you still do for tempdb.
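
As a rough sketch of what that configuration change looks like – file names, paths, and sizes below are made-up placeholders, and you should size the files equally and according to your own workload – add extra tempdb data files, and optionally enable trace flag 1118:

-- Add extra, equally sized tempdb data files
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2', FILENAME = N'T:\SQLData\tempdev2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
GO
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev3', FILENAME = N'T:\SQLData\tempdev3.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
GO
-- Trace flag 1118 forces uniform-extent allocations, avoiding mixed-extent (SGAM) contention.
-- -1 = enable globally; add -T1118 as a startup parameter to make it survive a restart.
DBCC TRACEON (1118, -1);
GO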

So – this is more prevalent on tempdb, but CAN happen on a user database under extreme load on monster hardware. Testing should show whether this is happening to you – if not, don’t create multiple files for performance.
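
One way to check while the workload is running is to look for PAGELATCH waits on the allocation bitmap pages – here's a sketch against tempdb (database_id 2) using the sys.dm_os_waiting_tasks DMV:

-- Waits on 2:1:1 (PFS), 2:1:2 (GAM), 2:1:3 (SGAM) and the other bitmap pages
-- in tempdb are a sign of allocation bitmap contention
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH%'
  AND resource_description LIKE N'2:%';
GO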

Now, in terms of what works for your particular vendor's IO scalability – that's beyond me, and you may need to think about it if it's something they recommend. However, I'd still take it with a pinch of salt and do your own testing. See the whitepaper at http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx for some testing/benchmarking info.

Hope this helps clarify a bit more – great discussion!

I’m interested to hear any other opinions on this – especially cases where you’ve found it necessary to create multiple files for performance.

Thanks!