I've said before how thrilled I am will the new extended event UI in SSMS for SQL Server 2012. However, you might be one of the early adopters who made up their own scripts to define extended event sessions, and use hardcoded scripts to harvest the results. So, you may run into this problem if you're using what I call "lazy XML" in the event harvesting script. Take, as an example, an extended event session defined as follows with 3 actions:

create event session errorsession on server
add event sqlserver.error_reported
(
action
(
package0.callstack,   
sqlserver.session_id,  
sqlserver.sql_text    
)
where error = 547 and package0.counter <= 3 
)
add target package0.ring_buffer
with (max_dispatch_latency = 1 seconds)
go

In previous versions, you could pretty much depend on the XML presenting the actions in order. So the following XML would return a subset of the event information in rows and columns.

SELECT
  Data2.Results.value ('(data/.)[1]', 'int') AS ErrorNumber,
  Data2.Results.value ('(action/.)[2]', 'int') AS Session,
  Data2.Results.value ('(action/.)[3]', 'nvarchar(max)') AS SQLStatement
from
(
select CAST(xet.target_data as xml) as data
from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'errorsession') events
cross apply Data.nodes ('//RingBufferTarget/event')
  AS Data2 (Results)

The XML above is based on the assumption that action 2 is the sqlserver.session_id and action 3 is the sqlserver.sql_text, because it doesn't check names, just blindly uses the ordinal number in XPath. It makes the XML easier to write and a bit faster to execute, but its "lazy" XML. This order of actions was a valid assumption in SQL Server 2008; its not valid in SQL Server 2012. The data fragment containing the action data appears like this:

    <action name="sql_text" package="sqlserver">
      <type name="unicode_string" package="package0" />
      <value> 
        DELETE pubs.dbo.jobs
      </value>
    </action>
    <action name="session_id" package="sqlserver">
      <type name="uint16" package="package0" />
      <value>53</value>
    </action>
    <action name="callstack" package="package0">
      <type name="callstack" package="package0" />
      <value>..callstack elided...</value>
    </action>

So now, sql_text (the third action defined) is the first action presented. The fragile harvesting script will break. So, to be one the safer side, if you have any such scripts change them to actually look for the element for want using a named XPath predicate, like this:

SELECT
  Data2.Results.value ('(data[@name="error_number"]/.)[1]', 'int') AS ErrorNumber,
  Data2.Results.value ('(action[@name="session_id"]/.)[1]', 'int') AS Session,
  Data2.Results.value ('(action[@name="sql_text"]/.)[1]', 'nvarchar(max)') AS SQLStatement
from
(
select CAST(xet.target_data as xml) as data
from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'errorsession') events
cross apply Data.nodes ('//RingBufferTarget/event')
  AS Data2 (Results);

Note that it's probably better to do this with the "data" items too (ErrorNumber in the query above), although you may be a little safer with these, as they have a defined schema per-event. But BOL does point out that events have a "versioned" schema. The actions can be defined in any order, so make your scripts more robust. Use XPath predicates. And if you're not on SQL Server 2012 yet, change your scripts now.

@bobbeauch

OK, back to PathName() with AlwaysOn, which I started on in the previous blog post. PathName() in SQL Server 2012 doesn't return the computer name by default ("AlwaysOn1" or "AlwaysOn2" in my example) but returns the VNN name (virtual network name). That is, it returns the availability group "listener share" name. In my case, the PathName() would start with \\AlwaysOnAG1. There's an additional option in SQL Server 2012 PathName(), that allows you to return the current replica name. So, when "AlwaysOn1" is the current primary replica, it returns that; when we fail over to "AlwaysOn2", that's what name is returned. In addition, GetFileNamespacePath() and FileTableRootPath() always return the availability group listener name, they don't have an option to return the current replica name.

Finally, I'll discuss using AlwaysOn, FilePath(), T-SQL access and secondary replicas.In theory, ADO.NET 4.02 (and, of course, SQLNativeClient 11ODBC/OLE DB and Microsoft JDBC 4.0 driver) allows you to declare your ApplicationIntent as ReadOnly or ReadWrite. ReadWrite is the default. Adding either ApplicationIntent=ReadOnly or ApplicationIntent=ReadWrite in the SSMS Connection dialog is accepted, but either one allows connection to the secondary replica. This is because, by default, secondary replicas are configured with "Readable Secondary=Yes" rather than "Readable Secondary=Read-intent". Configuring a secondary as "Readable secondary=Read-Intent" disallows connections that do not specify ApplicationIntent=ReadOnly in the connection string. Obviously, connecting as "ReadWrite" to a ReadOnly replica errors out if you try and do something other than read. Check out Sunil's excellent blog post on the motivations for these secondary replica settings.

Now that we're connected (in T-SQL) to the secondary replica, let's try and get the PathName() from here:

SELECT Name, file_stream.PathName() FROM NWTab;

Msg 33447, Level 16, State 1, Line 1
Cannot access file_stream column in FileTable 'NWTab', because FileTable doesn't support row versioning. Either set transaction level to something other than READ COMMITTED SNAPSHOT or SNAPSHOT, or use READCOMMITTEDLOCK table hint.

Now, this was surprising...I checked the snapshot info in sys.databases and also DBCC USEROPTIONS. Neither snapshot isolation level was turned on or being used. And setting the iso level to, say, repeatable read or even the dreaded read uncommitted didn't help. But using the READCOMMITTEDLOCK did. Hmmm...

But of course, Sunil had the answer for this one too..."To avoid this blocking completely, all transaction isolation levels used in an application on the secondary replica are mapped transparently to Snapshot Isolation..." And this behavior is more obliquely documented in the BOL, here. But Sunil's blog post does go on to say "You may ask how about locking hints? Well, all locking hints are ignored." But the READCOMMITTEDLOCK locking hint does work in this case (modulo the effects it might have on performance, as Sunil mentions). So at this point I wonder if the READCOMMITTEDLOCK locking hint on a FileTable has a special dispensation (other locking hints, do, in fact, fail to have the desired effect), but I'll need to revisit this in 2012 RTM. BTW, it's only using the file_stream column of the FileTable that causes this, other columns work fine. But PathName() and GetFileNamespacePath() are methods on the column, so you can't use these, or access the column in T-SQL, on readonly secondaries without the hint.

Back to BOL comment, changing the NON_TRANSACTED_ACCESS to READ_ONLY (on the primary replica) changes the behavior on the secondary as promised, but now I can't write to the FileTable using the file system. As expected.

Two more things to mention. Firstly, Msg 33447 isn't a mistaken error message. You can, of course, attempt to access the file_stream column in a FileTable from the *primary* replica using Snapshot isolation. Then you'll receive the error, and changing the iso level WILL help. When enabling read-committed snapshot or snapshot you DO get this useful (but slightly misleading, IMHO) message on the ALTER DATABASE DDL statement: "When the FILESTREAM database option NON_TRANSACTED_ACCESS is set to FULL and the READ_COMMITTED_SNAPSHOT or the ALLOW_SNAPSHOT_ISOLATION options are on, T-SQL and transactional read access to FILESTREAM data in the context of a FILETABLE is blocked.". That's not quite true; if the options are on and you *USE* snapshot iso levels, you'll get the error. Changing the iso level to a non-snapshot iso level or using the READCOMMITTEDLOCK hint WILL succeed on the primary.

Finally, remember that "ordinary" (i.e. non-FileTable) tables containing filestream columns DO support snapshot isolation levels as of SQL Server 2008 R2. It should be only the FileTable, with its non-transacted access, that has this behavior with snapshot isolation on the primary and readonly secondaries. And, in fact, this does test out to be the case, an "ordinary" filestream column works fine, even in the same database.

Enough? ;-) I even turned moderated blog comments back on (against my better judgement, the amount of spam comments is amazing), if this is a useful discussion vehicle. Or, you know where to find me... Cheers.

@bobbeauch

When I was covering the FileTable-specific functions and methods, I didn't mention PathName(), a filestream-related function, that (naturally) can be used with FileTables. This posting is about PathName(), but, mostly about what happens with FileTable in an AlwaysOn availability group configuration. There were some surprises there. None of the FileTable-specific functions and methods mention AlwaysOn, but PathName() mentions it. There is an additional parameter you can specify that matters if you use PathName() in an AlwaysOn availability group environment.

Remember that, in addition to FileTable, there have been enhancements to filestream in SQL Server 2012. Namely, you can have multiple filestream containers (filegroups for filestream) per-database (note: this is an Enterprise-only feature). This has an effect on the paths returned by PathName(). In SQL Server 2008/R2, PathName() returns a path that looks something like this:

\\ZMV08\MSSQLSERVER\v1\Northwind\dbo\Employees2\Photo\DF707B77-9FA4-4837-91C6-363E23BCEBDD.

In SQL Server 2012, an analogous request would produce a path that looks like this:

\\ZMV04\MSSQLSERVER\v02-A60EC2F8-2B24-11DF-9CC3-AF2E56D89593\FTTest\dbo\Documents\file_stream\D46814E2-633D-E111-982F-000C2959648F\VolumeHint-HarddiskVolume1.

So, for at least one reason, there's a new "version" of filestream namespace names (I'm just looking at the v01 vs. v02 part of the pathname).

So let create an AlwaysOn Availability Group setup to try it out with FileTable and PathName(). Availability group is a new feature in SQL Server 2012 that appears, at first glance, to be an big extension to the database mirroring feature. An availability group can contain multiple databases, and up to 4 secondary replicas, in addition to the primary replica. The secondary replicas can be (optionally) read-only. Using availability groups requires using Windows Failover Clustering. And, perhaps because of this, availability groups support filestream storage (and by extension, FileTables). Availability groups also can have "listeners", separate endpoints that abstract connecting to an availability group.

So I've defined two SQL Server instances, each a default instance. On machines named "AlwaysOn1" and "AlwaysOn2" (disclaimer: I never proported to have an imagination for making up names). My availability group is named "AlwaysOnAG1" and encompasses the pubs and northwind database. 'Cause we all know sample databases need to be highly available. ;-) For reference, the availability group is using synchronous-commit mode and is set up for automatic failover (which means it could do automatic, manual, or even forced failover, if need be. Both nodes are configured to be read-only when they are the secondary node. I also have a listener set up named "AlwaysOnAG1" that has a hardcoded IP and listens on port 7001.

I've decided to put a FileTable on Northwind. And, from a previous experiement, I have a FileTable on a database named "FTTest". This database does NOT participate in the availability group and only exists on the "AlwaysOn1" instance, not on "AlwaysOn2". But it produced one of the surprises.

After enabling Northwind for FileTables (AlwaysOn1 is the current primary), I define a FileTable named "NWTest". With this configuration:
 --When SSMS connected to "AlwaysOn1"
    --The share opens from SSMS Object Explorer
    --Machine "AlwaysOn2" has an MSSQLServer share, but nothing is visible underneath.
    --Using the share \\AlwaysOn1\mssqlserver, I see both Northwind's filetable and FTTest's filetable.
 

--When SSMS connected to "AlwaysOn2"
    --Machine "AlwaysOn2" has an MSSQLServer share, but nothing is visible underneath.
    --Attempting to open the share from ObjectExplorer AlwaysOn2 node produces error: "The file location cannot be opened. Either access is not enabled or your do not have permission for the same."

--When Connected to the "listener share" (i.e. \\AlwaysOnAG1\mssqlserver) I see both Northwind's filetable and FTTest's filetable. This was surprising to me because FTTest is not a member of the availability group and I'm using it's "listener share" name. I can see both FileTables even when I open the listener share from a different machine on the network (like the "AlwaysOn2" machine). Just an observation...

The file system and SMB are more tolerant of temporary outages than SQL Server connections. But that's a subject for another blog post. So, using the "listener share" I create open/edit close Somefile.txt in Notepad. Now, open the file again. Make some changes in the text. Now failover the availability group from "AlwaysOn1" to "AlwaysOn2"...

During failover, I get the "Network resource unavailable" message for my Notepad. Then, as failover completes, the message goes away. Now I can save my changes. Without incident.

So, after failover, what do things look like?
--When SSMS connected to "AlwaysOn1" (now secondary)
   --The share receives an error from SSMS Object Explorer.
   --Machine "AlwaysOn2" has an MSSQLServer share, with Northwind FileTable subdirectory.
   --Using the share \\AlwaysOn1\mssqlserver, I see FTTest's filetable (this database is not in the AG, remember), but not Northwind's FileTable subdirectories.
 

--When SSMS connected to "AlwaysOn2"(now primary)
   --The share opens from SSMS Object Explorer
   --Using the share \\AlwaysOn2\mssqlserver, I see Northwind's FileTable subdirectories/files.
 

--When connected using the listener share
   --I see only Northwind's FileTable subdirectories/files.

Notice that this is only observations with one specific configuration of the availability group. No guarentees as to what happens with other configurations.

OK, long-ish post again. But this post began with discussion of PathName() method on the filestream column. I'll conclude with that next.

@bobbeauch

So finally, we get to using FileTable's path_locator column with the methods of hierarchyid. To pick up where we left off, remember that hierarchyid has a method named GetLevel(). Given the following directory structure in the FileTable's share:

File1.txt
File2.txt
SQLFiles
  SubdirFile1.txt
  SubdirFile2.txt

I have a few more levels of subdirectory here, but hopefully, you get the idea. Issuing the query

SELECT path_locator.GetLevel() as Level, Name
FROM dbo.Documents;

show, unsurprisingly, File1.txt, File2.txt, and SQLFiles directory at level 1, and the subdirectory files at level 2. So to be sure we're getting the "right" SQLFiles directory, we could change that query in the previous blog entry to:

SELECT @pathstring = path_locator.ToString() FROM dbo.Documents WHERE Name = 'SQLFiles' AND path_locator.GetLevel() = 1;

Note also that, the way the FileTable namespace works, the dbo.Documents directory is the root of the hierarchy. We can prove it using hierarchyids.

-- Level Number of table is 0. Table PathLocator is root
SELECT FileTableRootPath('dbo.Documents') as RootPath,
       GetPathLocator(FileTableRootPath('dbo.Documents')) as Path,
       GetPathLocator(FileTableRootPath('dbo.Documents')).GetLevel() as Level;

As long are we're on that subject, let's use the rest of the hierarchyid methods on the FileTable as well.

-- an interesting way to get all direct children of the top-level SQLFiles directory node
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.path_locator.GetAncestor(1) = p.path_locator;

-- but we have a persisted computed column for that...
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.parent_path_locator = p.path_locator;

How about all children of that node, at any subdirectory level?

-- child of the top-level SQLFiles node, any
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.path_locator.IsDescendantOf(p.path_locator) = 1;

And of course, wind our way up the tree:

-- parents of a specific node
SELECT p.name as ParentName, c.name as ChildName, p.is_directory,
 p.file_stream.GetFileNamespacePath() as ParentPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON c.name = 'SQLSubdirSomeText.txt'
AND c.path_locator.IsDescendantOf(p.path_locator) = 1;

And, we can see the obvious use of GetReparentedValue. It moves a file from one subdirectory to another. You can't just set parent_path_locator because that's a computed column.

DECLARE @oldpath hierarchyid, @newpath hierarchyid;
SELECT @oldpath = path_locator from documents where name = 'SQLServerSubdir' AND path_locator.GetLevel() = 2;
SELECT @newpath = path_locator from documents where name = 'SQL Server' AND path_locator.GetLevel() = 1;
UPDATE dbo.Documents SET path_locator = path_locator.GetReparentedValue(@oldpath, @newpath)
WHERE name = 'SubSyntheticFile2.txt';

Enjoy!

@bobbeauch

So the functions/methods that I wrote about in previous post are needed because FileTables don't store the UNC path name of the file, they store the path_locator as a hierarchyid data type. Wonder what encoding scheme they're using. Let's see, by doing

SELECT path_locator.ToString(), Name
FROM dbo.Documents

We get hierarchyid strings that look like this: "/192992825631153.73945086322524.2119705196/" Turns out that the encoding scheme involves newid() as you can see by looking at the definition for the default constraint for the path_locator column. It looks like this:

convert(hierarchyid, '/' +    
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 1, 6))) + '.' +
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 7, 6))) + '.' +
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 13, 4))) + '/')

In fact, almost all of the columns in a FileTable have defaults or are computed columns. So, to create a row in a FileTable, let's say, a file named "Testfile1.txt" in the root of the file share, all that's required is:

INSERT INTO dbo.Documents(Name, file_stream) VALUES('Testfile1.txt', 0x);

But, if I have a directory named "SQLFiles" at the root? How do I create a file in that directory? You might think parent_path_locator, but that's a computed column. Well, how about...

SELECT @pathstring = path_locator.ToString() from documents where name = 'SQLFiles';
SET @newpath = @pathstring +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 1, 6))) + '.'
                           +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 7, 6))) + '.'
                           +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 13, 4))) + '/';

INSERT INTO dbo.Documents(Name, path_locator, file_stream) VALUES('SQLFilesTest.txt', @newpath, 0x);

You could also just use "@pathstring + '1/'" but this sticks to the "native" encoding scheme. And no, newid() can't be used in a function, if that's what you're thinking.

Now that I've got a zero-length file, can I open it and edit it with Notepad.exe? Well, you can if its empty. But once there is data in the file, attempting to edit with Notepad.exe returns an error, "This request is not supported". This is because Notepad.exe uses memory-mapped files, a win32 feature which isn't supported by FileTable. Paint.exe uses them too. So you'd need to open it from a remote location. Actually, connecting the share as a network drive will work as well.

But what if there are directories named "SQLFiles" at different subdirectory levels? How can we distinguish between them? That's where the hierarchyid comes in. I'll continue with that next.

@bobbeauch

I've been working with the SQL Server 2012 FileTable feature lately. Besides learning to appreciate the esoteric features of the NTFS file system and SMB protocol, only some of which are supported by FileTables, I've been trying to work with FileTables in SQL Server using T-SQL. This turns out to be an interesting exercise, especially if you're trying to brush up on your skills with the hierarchyid data type.

It turns out that a FileTable is just like a normal SQL Server table with a filestream varbinary(max) column (named, unsurprisingly, file_stream). It uses computed columns and constraints rather extensively. Behind the scenes, SQL Server functions as a Win32 namespace owner and exposes a virtual file share. Although the namespace management and fitting non-transactional access into a transaction-based system, and its easy to think of the share as a real file share, SQL Server manages everything. And you can manage the FileTable data completely with T-SQL. This one's for the folks that think *everything* ought to be managed in T-SQL. But before trying to create and manage files with T-SQL, I looked into the FileTable specific functions, GetFileNamespacePath(), FileTableRootPath(), and GetPathLocator(). These will be useful and they have some interesting options.

GetFileNamespacePath() is exposed as a method on the file_stream column.Besides being used without options, it has two options. The first option allows you to get a full namespace path instead of a relative path. Relative path is the default. So if, for example, you have a FileTable named "Documents" on a database named FTTest on a machine named ZMV04, your full path for a file at the root named Testfile1.txt will be "\\ZMV04\MSSQLSERVER\FTTest\Documents\Testfile1.txt", and relative path would be "\Documents\Testfile1.txt". A second option lets you decide whether you want a NETBIOS name, Machine name, or Domain name in your path.

FileTableRootPath() is a global function. It produces slightly different names for the root depending on whether or not you use the first parameter, the name of the filetable. Without this parameter, FileTableRootPath() doesn't include the part of the path that includes the filetable. So, in our case its "\\ZMV04\MSSQLSERVER\FTTest". When the filetable name is specified this function returns "\\ZMV04\MSSQLSERVER\FTTest\Documents. This option makes a difference when using this method to construct full paths, as we'll soon see. There's a second option for path using NETBIOS name, Machine name, FQDN is analogous to GetFileNamespacePath().

Finally there is another global function, GetPathLocator(). This one was kind of strange, even though the use case provided by the BOL was clear enough. You'd use it if you have existing (pre-FileTable) SQL Server tables that use path locators. Path locators, if you're unfamiliar with the term, refer to storing a path name as a column value in a table. You'd use that name, stored in the database, to locate the file to open and read/write. The drawback to traditional path locators is that its up to you to keep the file location in the database in sync with the filesystem. And, of course, the files that file locators point to have no intergrated backup/restore with the database. With FileTables, management is taken care of by the database engine, along with coordinated backup and restore and other nice database features.

The twist is that FileTables don't directly expose file paths, although you can derive them using our first two functions. FileTables use the hierarchyid data type, stored in a column path_locator. The GetPathLocator() function can get you a path_locator (hierarchyid) given a valid string data type FileTable-produced path name. So, if I have a file "TestFile1.txt" in the root of a FileTable named "Documents", I can get its path locator hierarchyid by using GetPathLocator(file_stream.GetFileNamespacePath(1)). Of course, you can simply use the column value in the path_locator column, but this function may come in handy later on.

This posting has gone a little long, so I'll continue it in a subsequent posting.

@bobbeauch

Just confirmed that I'll be speaking at TechDays 2012 NL in Den Haag. I'll be speaking on Feb 16th and 17th, and I'm really looking forward to it. I'll be doing some talks on SQL Server 2012 and SQL Azure Database features including T-SQL improvements, filetable/filestream, SQL Azure Federations, and others. It will be nice to catch up with friends, and I hope to see you'all there. Stop by and say hi.

@bobbeauch

Categories:

I've had the good fortune to be invited back to Stockholm as part of Addskills Expertseminarium Series. This seminar, An In-Depth Look at Developer Features and Performance in SQL Server 2012, will be two days of studying performance, diagnostic tools and other features on Microsoft's latest release of SQL Server. We'll look at T-SQL coding improvements, examine new information available in query plans and extended events, and other features that could change how you work with SQL Server in applications. Join me and the folks at Addskills for two information-packed days on 3-4 Apr. See you in the spring!

@bobbeauch

Categories:

Just back from a small hiatus and getting my schedule together for the first half of the year. One thing I'm really looking forward to is the two public SQLskills Immersion Events for developers. These are week-long events taught specifically for developers, but that doesn't mean I'm doing shallow coverage by any means. One of the folks last year in Chicago made illusions to drinking from a firehose; another delegate said he was afraid to leave his seat for 5 minutes because he'd miss too much. All of the feature are presented with performance in mind, naturally. And, like last year, I've grafted in all new SQL Server 2012 developer features, so we'll look into some comparative use cases. The overall outline is available on the SQLskills website.

This year there will be one in Chicago Apr 23-27 and also, as Paul announced just before year-end, a developer event in London the week on May 14-18. Hope to see you there!

@bobbeauch

Categories:

Late tonight I tried something that I didn't think would work because @@version on SQL Azure didn't report anything different from last week. (Microsoft SQL Azure (RTM) - 11.0.1814.30 Nov 21 2011 16:46:09 Copyright (c) Microsoft Corporation). It didn't work last week. But it does work now. That's the last time I listen to @@version... ;-) This thing that works is...

CREATE FEDERATION foo (id bigint range);

That's right...SQL Azure federations are up and appear to be working. At least on my server. They do a rolling upgrade in SQL Azure, so they be not yet be in all servers. If you've missed reading about these in the recent past, Cihan Biyikoglu is one of the "fathers" of the SQL Azure Federation implementation, and he's written all about it, for months, in his blog. He was a special guest at SQLskills' week 4 immersion event that I gave in August and gave a talk there, mostly on federations. The CREATE FEDERATION DDL statement isn't available on the web yet, so that's where to get it to try it out.

Note: You can also do this on the new SQL Azure Management portion of the Windows Azure portal. Select a database on your SQL Azure server, choose "Manage" off the ribbon at the top. You'll need to sign on again. Choose the "Administration" entry at the bottom right, then on administration page, it's the "New" button on the left of the top riboon (whew, that was harder to describe than it was to do).

BIG NOTE: Since I get the impressions that federations will likely be cost items, THIS WILL LIKELY COST YOU MONEY. Don't say I didn't warn you, if it does. I don't know how much. The corollary is DROP FEDERATION foo;

Folks that are familiar with patitioning can think of federations as similar to partitioning, but with federations in SQL Azure you can't do SQL statements that cross federation members yet (i.e. no built-in fan-out queries or joins between federated tables and tables that live outside the federation member). Good thing we all know what map-reduce means from the Dr David Dewitt talk at SQLPASS this year. Folks that are more familiar with data-dependent routing (a.k.a. data-directed routing) like to draw parallels to this technology and its use in scale-out. There's lots of information on data-directed routing, I even wrote about it in a whitepaper called "Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005", once upon a time. And some folks like to liken federations to NoSQL variant that uses the, ah, SQL language.

Whatever you'd like to compare it to, if you want federations to work with your application, you'll need to design for it. You can start by thinking in terms of three different types of tables: federated tables, reference tables, and global tables. Federated tables must have the federation key as part of the table's key in the SQL Azure implementation. So if the federation key is a customer id, orders can be a federated table if you make customerid part of its key. You can even stretch it to order_details containing the composite orders key that contains customer id. I have a harder time thinking of the products table working that way.

If you want to join order_details and products, you can make products a reference table. A reference table lives in every federation member. When you split a federation into two members, both members contain a full copy of the products table. BTW, these copies are not automatically kept in sync, so you'd have to update all of them yourself. In case you're thinking as I was, we'll need to find out if Sync services for Azure will support reference tables in federation members right away.

Finally, you can have global tables, that don't know anything about the federation. These could be in the federation root (which keeps track of the federation member metadata but doesn't know about individual federated tables), or in a different SQL Azure database. Joins between federated tables on global tables, however, happen on the client/middle-tier.

So, if you're interested in using SQL Azure Federations, the first thing to do is to think about what would make a good federation key. And what type of table each of your tables would be; federated, reference, or global. Cihan probably was a blog entry about this somewhere.

Cheers. And a big congrats to Cihan, Michael, and the rest of the team.

@bobbeauch

Categories:
SQL Azure Database

Theme design by Nukeation based on Jelle Druyts