Federation Enhancements/Changes in SQL Server 2005

SQL Server 2005 implements quite a few technology pieces to help database developers and administrators acheive scale-out while acheiving the appearance of (or actually accomplishing) application or server federation. The information is scattered throughout the books online, but you can read a nice whitepaper by Roger Wolter on how to choose among the features here. I'll be doing some events with SQLskills about scale-out applications in Chicago and NYC, and partners in Reading UK, and I didn't want to miss anything. One technology piece that was a big hit in SQL Server 2000 stayed mostly the same in SQL Server 2005, but I wanted to qualify the word "mostly". That feature is distributed partitioned views and distributed queries.

Distributed queries and distributed partitioned views use OLE DB as a communication mechanism. OLE DB access is built-into the query engine, data can be retrieved by the same basic mechanism from the SQL Server storage engine or from a "remote storage engine". When you're talking to another SQL Server, the OLE DB provider used in SQL Server 2000 is SQLOLEDB. In SQL Server 2005 the OLE DB provider changed to SQLNCLI (or SQL Native Client). There are some subtle differences (remote errors, ability to run out of process) between providers, and, if you upgraded to SQL Server 2005, all your Linked Servers now use SQLNCLI.

SQL Server 2005 does, however provide nice tracing information in SQLProfiler regarding which OLE DB calls the engine makes to the linked server. Because OLE DB is interface based, you can even see the query interface calls, if you want to go to that level of abstraction. You can trace these on either or both sides of a SQL Server linked server; outgoing trace works regardless of the destination provider. You do have to grok the OLE DB interfaces to understand the events, but it's like the query plan, there are various levels of understanding.

Distributed partitioned views remote part of the query tree to the remote database. In SQL Server 2000 the portions of the query would never execute in parallel. In SQL Server 2005, the startup commands can execute in parallel and accomodations are made for caching the remote connections and using overlapped I/O to optimize things. And SQL Server's OLE DB providers have always been able to remote "statistics rowsets" to the other side to help the query processor out.

For distributed query, there's a version of the SQL EXECUTE statement that implements passthrough queries using an AT keyword. You could execute passthrough queries in previous versions by using OPENQUERY, but this required the results to be treated as a rowset (SELECT * FROM OPENQUERY(…)). I've gotten EXECUTE AT to support remote queries that use the "MAX" data types and CLR data types (the assembly must be catalogued to both remote and local database), but the XML data type isn't yet supported for remote queries even though the BOL says so. So cast it to a MAX type and cast back when you get it back on the local server.

That's enhancing one piece of the puzzle, real federated servers. Although this won a TPC benchmark (in the SQL Server 2000 release timeframe, with 32 federated servers) there's LOTS more. As always, the devil is in the details, that's where I usually come in.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.