Tuesday, July 08, 2008

I've noticed that some folks have written PowerShell scripts that execute against a list of servers. In the scripts, they read the names of the servers from XML files. But the SQL Server PowerShell provider in RC0 has a useful "component" called SQLRegistration; its "path" is SQLSERVER:\SQLRegistration that they can use instead.

This path permits enumeration and manipulation of the (SQL) servers and server groups that are defined using SQL Server Management Studio. The SQLRegistration path is not specific to server or instance, but to the SSMS user that's signed on. For example, on the same physical machine, SQLRegistration for the Windows user bobb reflects bobb's settings in SSMS. SQLRegistration for user mary (on the same machine) would reflect mary's SSMS settings. There is no SQLSERVER:\SQLRegistration\{machinename}\{instancename} path, just SQLSERVER:\SQLRegistration.

SQLRegistration contains two "subdirectories", called "Central Management Server Group" and "Database Engine Server Group". These are equivalent to SSMS' "Central Management Servers" and "Local Server Groups" folders, respectively.

So rather than write custom XML files that hold information about groups of servers you can use the built-in Registration store. You can import and export registration information to keep team members' view of SQL Servers and groups consistant using SSMS.

Because I've been writing PowerShell scripts against the provider's Policy and DataCollection stores, I thought it would be fun to program SQLRegistration. I found the Microsoft.SqlServer.Management.Registration namespace in BOL (which is RC0 is really called Microsoft.SqlServer.Management.Smo.Registration, BOL says this will change before RTM) and was about to code against the RegisteredServer and ServerGroup classes, when I realized that Registration was a simple hierarchies of servers and groups. Just like directories and files. And the built-in groups each have a "mode" property that's value is "d". Registered Servers have a blank mode property. Hmmm...

So, at the PowerShell prompt
> cd 'SQLSERVER:\sqlregistration\Database Engine Server Group'
> new-item MyNewGroup

creates a new directory (ServerGroup) named MyNewGroup. Well then, a registered server should be a file.

> new-item MyNewServer -itemtype file

New-Item : SQL Server PowerShell provider error: This provider only supports the
creation of new Server Groups or Server Registrations. Please specify "directory"
to create a new Server Group or "registration" to create a new Server Registration.

So this should work?

> new-item MyNewServer -itemtype registration

New-Item : SQL Server PowerShell provider error: Please specify a connection string using the -Value parameter.

> new-item MyNewServer -itemtype registration -Value "server=mynewserver;integrated security=true"
>

Yep. This creates a new server registration using the connection parameters that you specify. How cool is that? No other part of the SQL SERVER PowerShell provider that I'm aware of supports the new-item cmdlet/operation. Because the SQLRegistration represent fairly simple items and hierarchies new-item is supported here. No custom programming needed (although you can use the SMO classes if you'd rather), just treat it as you would file system.

A couple of caveats. In RC0 SSMS and PowerShell SQLRegisgtration sync up when you open SSMS but if you're using both at once, the sync to SSMS isn't immediate. Sometimes refresh doesn't appear to show the changes you made in PowerShell. Eventually it will show them if you refresh multiple times or restart SSMS. And after deleting a Server Registration in SSMS, I still had to run "rm MyNewServer" in PowerShell to get rid of it there. Also, I use "integrated security=sspi" rather than "integrated security=true" in my ADO.NET connection strings. This seems to bother SSMS which reports an error in configuration. Use "true" rather than "sspi" when creating registrations through PowerShell.

Monday, July 07, 2008 11:23:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I received a question today about whether I'd converted my Policy-Based Management examples using SMO (see the multi-part "Programming Policy-Based Management with SMO" series, starting here) from C# to PowerShell yet. I did do this a while ago; they're available as a script download on the SQLskills website (look on the "Past Conferences" page under TechEd 2008). But...

Since then RC0 has changed PBM a bit. The multipart name policy can't use ExecutionMode.Enforce any more, so I changed it to ExecutionMode.None which equates to "On Demand" in the SSMS dialog. There was a change to one of the enumerated constant names too. So the updated scripts for RC0 are posted as part of this blog entry.

As long as I was working on parts of the PowerShell provider for SQL Server and SMO, I decided to put together an example of using the DataCollection APIs as well. The script creates a custom DataCollection Collection Set. That's pretty straightforward. Using the CollectionStore instance just use the correct location in the provider hierarchy and get a CollectionStore instance.

# set a collectionstore object for the default instance on local machine
$colpath = ('SQLSERVER:\DataCollection\' + (get-item env:\computername).Value + '\default')
$col = get-item $colpath

Then create a CollectionSet instance with (your CollectionStore $col is specifed as a parameter to new-object) and one or more CollectionItems in the CollectionSet. Then set the CollectionSet and CollectionItem proprties and call Create. This script enclosed as well, named CreateCustomCollectionSet.ps1.

As an aside, I've enclosed a script called start_smo_sql.ps1. This is a script that I call at PowerShell startup if I know the SQL Server provider is already installed. It sets convenience variables for long SMO namespace names, paths to the default Server, CollectionStore, etc. It's not invoked from my main PowerShell profile %UserProfile%\My Documents\WindowsPowerShell\profile.ps1 which is executed by all PowerShell shells/this user. Instead it's invoked from %UserProfile%\My Documents\WindowsPowerShell\Microsoft.SqlServer.Management.PowerShell.sqlps_profile.ps1. This profile is only executed when I'm using SQLPS.exe, the SQL Server-specific custom shell. It's quite cool that PowerShell accomodates separate profiles for custom shells.

The toughest part in creating a custom collection set is that the XML schema for collection set properties has been updated in one of the CTPs, but not updated in Books Online RC0. The updated XML schema can be obtained by executing:

use msdb
select * from syscollector_collector_types

and using the parameter_schema column to get the schema you need to figure out how to create a validatable parameter. My T-SQL custom collection set used a parameter with an XML namespace on the root element (only) that looked like this:

<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
        <Query>
          <Value>select * from sys.dm_exec_query_stats</Value>
          <OutputTable>dm_exec_query_stats</OutputTable>
        </Query>
</ns:TSQLQueryCollector>

ps_demo_scripts.zip (3.23 KB)
Monday, July 07, 2008 11:11:57 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, June 09, 2008

Last week at TechEd Developers, I gave a talk on PowerShell and SQL Server. I mentioned some upcoming changes in RC0, and have just had a chance to check them out.

The PowerShell provider for SQL Server has been expanded to handle not only a "SQL" subdirectory (which enumerates database objects) and "SQLPolicy" (which enumerates the policy-based management objects) but also two new "directories": SQLRegistration and DataCollection.

SQLRegistration covers the groups and members of "Registered Servers" and before you ask, yes this is the same Registered Servers that show up in SQL Server Management Studio. So if you're managing hundreds of servers, you can easier script against any server or group of "Registered Servers". You don't have to navigate to a different machine and instance by referencing the name, as I described in an earlier blog entry. Be aware that the registration service takes a while to refresh registrations. If you define a new registered server in SSMS, it won't show up in the PowerShell list immediately. Just wait a minute or so...

The DataCollection "subdirectory" allows you to enumerate the database objects and properties for the data collection feature that is used to populate and control the performance data warehouse in SQL Server 2008. You navigate through these collections/values the same way you'd navigate SQL and SQLPolicy. This one isn't doc'd as being part of the PowerShell provider in  BOL yet.

A couple of other miscellanous changes I noticed:
1. There are no longer separate custom drives for SQL and SQLPolicy as in previous versions. If you liked these custom drives its easy to add them:
New-PSDrive -Name SQL -Root SQLSERVER:\SQL
New-PSDrive -Name SQLPolicy -Root SQLSERVER:\SQLPolicy
..etc

2. Collection Names are no longer case-sensitive. So:

dir SQLServer:\SQL\{Computer}\{Instance}\databases

(where you replace {Computer} and {Instance} with real names) works now with "databases" being case insensitive.

Monday, June 09, 2008 1:59:08 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

SQL Server 2008 RC0 comes with a Readme file and a Release Notes file. It's always good to read both. In this RC, the readme file contains information on installation and upgrade, and the Release Notes file contains information about things that have changed from previous releases.

In RC0, they've changed the syntax/format of a few relevent new features. The changes affect T-SQL MERGE, the HierarchyID data type's methods, and the Geometry data type's Latitude/Longitude order. These won't be doc'd in BOL under What's New, because the What's New section lists new features, not features that have changed since the last CTP. Reading up on features that have changed will (hopefully) reduce the number of "why doesn't this code example that I downloaded from a reputable source work?" questions.

In addition to the two new performance features I just blogged about, RC0 also contains the new "Spatial Builder" API. Isaac has an excellent blog post about this API, with a (now-)working example. Check out his blog, I didn't find this feature mentioned in either "What's New" or "Release Notes".

Monday, June 09, 2008 1:20:39 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

Another performance-related feature (actually its additional information) allows you to have better visibility into performance-affecting queries. This information is available as columns query_hash and query_plan_hash in the sys.dm_exec_query_stats and sys.dm_exec_requests DMV. There's a really nice illustrative example in Books Online, so I won't repeat it here. You can use the queries in my last blog post to experiment with this as well.

The queries:

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30

and

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 40

are similar enough that they could be parameterized (although they aren't autoparameterized). Parameterization would save query plan entry storage, rather than allocating a separate plan for each incantation of the similar query. You can also use this feature, along with the additional information in sys.dm_exec_query_stats to produce cumulative statistics (such as avg IOs, elasped time, etc. BOL has a nice query to accumulate similar plans, the relavent part is simply "GROUP BY query_hash".

But because you have similar queries, should you always have a parameterized query? What if "...having  sum(qty) = 30" has a completely different plan than "...having sum(qty) = 130"? You can find this information by looking at query_plan_hash. If two plans have similar structure but a different query plan, the query_hash value will be identical but query_plan_hash will be different. That's an indication that parameterization might not be the way to go; the first query to executed will cache it's plan and the other query will use the same plan. This is because of a SQL Server behavior known as parameter sniffing.

Because parameterizing queries is usually one of the first things programmers can do to affect performance, it helps to know when to parameterize. In addition, many folks like to start query tuning by tuning the queries (including similar queries with different parameter values) that are executed most frequently. Query_plan_hash and query_hash gives you visibility into this important information.

Monday, June 09, 2008 1:02:36 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

There's a couple of features that show up in RC0 that are performance related. One is a server setting 'optimize for ad hoc workloads'. This one tells SQL Server to save a "compiled plan stub" in the query cache for adhoc queries, rather than the query plan that's usually saved.

You can see these stubs as cacheobjtype 'Compiled Plan Stub' in sys.dm_exec_cached_plans. And you can see them because trying to resolve a plan_handle to a query plan (using sys.dm_exec_cached_plans or sys.dm_exec_query_stats) doesn't work for stubs, like it does for "real" plan handles. Saving a stub rather than the entire plan saves precious plan cache memory.

Here's a little script to try it out.

sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',1
reconfigure
go
-- don't use on a production system, clears whole query cache
DBCC FREEPROCCACHE
go
use pubs
go
-- background plans to produce this DMV, note there is a stub for one of these plans
select * from sys.dm_exec_cached_plans
go
-- not autoparameterized, stub produced
SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30
go
-- you can see the stub
select * from sys.dm_exec_cached_plans
go
-- autoparameterized, whole query plan for parameterized version
SELECT *
FROM titles
WHERE price = 19.99
go

-- stub for the non-parameterized version, plan for the parameterized version
SELECT usecounts, cacheobjtype, objtype, [text]
FROM sys.dm_exec_cached_plans P
   CROSS APPLY sys.dm_exec_sql_text(plan_handle)
   WHERE [text] NOT LIKE '%dm_exec%'
ORDER BY p.usecounts DESC
-- query plan for stub query handle is not saved returns NULL
SELECT sql.text, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) sql
CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
WHERE text NOT LIKE '%sys_dm_exec%' AND text NOT LIKE '%msparam_0%'
ORDER BY qs.EXECUTION_COUNT DESC
go

-- set it back, same experiment produces plans not stubs
sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',0
reconfigure
go

Monday, June 09, 2008 12:40:47 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, May 30, 2008

In a blog posting about a few weeks ago, I'd written about noticing a DataSet being created in an Entity Framework program by using the ADO.NET client trace facility. Entity Framework programs do (indirectly) cause a DataSet to be created, but its only for resolving the ProviderInvariantName of the underlying data provider. NOT for anything related to the functioning of the EntityClient or ObjectContext in any way. I've revised the original blog posting to remove the DataSet reference, but just in case you already read it....

The connection string for the EntityClient data provider contains a parameter that refers to the ProviderInvariantName of the underlying data provider (e.g. System.Data.SqlClient). Entity Framework validates the underlying data provider name by using the ADO.NET provider enumerator class DbProviderFactories. DbProviderFactories creates the DataSet to store the provider information obtained from the machine.config file.

I might have guessed this...as I'd written a whitepaper on the functionality of the ADO.NET provider model, "Generic Coding with the ADO.NET 2.0 Base Classes and Factories" not more than a few years ago. Using the DbProviderFactories class is the only way to enumerate the ADO.NET data providers installed on your machine.

Come to my "End-To-End Tracing with SQL Server 2008 and ADO.NET" talk at TechEd next week and I'll show you the trace.

Friday, May 30, 2008 3:48:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I'm getting ready to head out to TechEd in Orlando. I'll be doing a full day of SQL Server 2008 for Developers on Monday, a talk on SQL Server client and server-side tracing (including SQL Server 2008 Extended Events), one on PowerShell and SQL Server, and finishing up with a talk on SQL Server Security for Developers and Architects. I'll also be hanging around the SQL Server Technical Learning Center (on the showfloor) at least part of the time I'm there.

If you're around, stop by and say hi...

Friday, May 30, 2008 2:38:13 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

It's a good idea when talking to a database to save on network roundtrips. The table-valued parameter in SQL Server 2008 is an example of a feature that can reduce them in the "1 order, 1-n detail items" use case.

So its always been mildly irritating that in order to insert a row with a filestream column, you'd need to make 2 database roundtrips. One roundtrip is to execute the INSERT statement, inserting an empty value in the filestream column. This causes the file to be created, a NULL value won't cause file creation, and you need the PathName to open a file handle or use the new SqlFileStream .NET class. The second roundtrip is used to get the FILESTREAM_TRANSACTION_CONTEXT and PathName. Then you have the info you need for the handle/SqlFileStream.

However, with the OUTPUT clause introduced in SQL Server 2005 you can do it all in one roundtrip. Here's the SQL statement, based on the BOL filestream example. You can run this code using an ADO.NET DataReader to retrieve the values, and construct a SqlFileStream instance to write to.

INSERT INTO dbo.student
OUTPUT Inserted.Resume.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
VALUES (newid (), 'Mary', CAST ('' as varbinary(max)))";

Of course, if you're writing a large value into a file anyway, the savings of one database roundtrip in the overall scheme of things is questionable, but using minimum roundtrips is a good habit to get into in any case.

Friday, May 30, 2008 10:29:50 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, May 16, 2008

This item was mentioned in other blogs too, but just to mention it again (so my SQLCE and Sync Services friends won't think I overlooked it)...

Visual Studio 2008 SP1 Beta contains direct support for using SQL Server 2008 Change Tracking. When you use ADO.NET 1.0 Sync Services with a SQL Server 2008 database, the Sync Services designer (that's Add/New Item/Local Database Data Cache) adds a checkbox that allows you simply to "Use SQL Server Change Tracking". No extra triggers, no tombstone tables, change tracking does it all for you. Visual Studio generates some scripts to enable change tracking at a database level and at a table level for the tables you select.

There's some nice generated code that warns you if you need to resync because you haven't synchronized often enough. And you can tweak the SQL scripts for more control. You can specify a CHANGE_RETENTION (the default is 2 days) and whether or not AUTO_CLEANUP is ON (it's on by default) in your ALTER DATABASE statement. And if you want to write your own, more granular synchronization or conflict resolution code, you can specify WITH TRACK_COLUMNS_UPDATED = ON (default is OFF).

Something that puzzled me was, although the SQL Server Books Online states: "Using snapshot isolation is the easiest way for you to help ensure that all change tracking information is consistent. For this reason, we strongly recommend that snapshot isolation be set to ON for the database.", the ALTER DATABASE DDL generated by Visual Studio did not include the option to SET ALLOW_SNAPSHOT_ISOLATION ON. Enabling snapshot isolation in SQL Server does involve a non-trivial amount of overhead, so it looks like they set it to the least overhead option and let you decide if you need snapshot isolation after reading the arguments/scenarios where it's useful in Books Online. Probably a wise choice.

Synchronization Framework is an exciting technology, given that there's so many possible uses of synchronization with "Local Data Caches" and, with the support for Sync Services in mobile devices. The next release of the Synchronization Framework (2.0, there's a beta out already) even contains support for Peer-to-Peer synchronization through the model. Perhaps this will fit some additional "replication via services" in databases. We'll have to wait and see if the Peer-to-Peer sync provider even supports databases, I guess. Always something interesting going on...

Friday, May 16, 2008 3:12:43 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Last fall I'd asked the folks on the data access team about the possibility of including the new SQL Server 2008 data types in the new data access object mapping layers. At that time they said "no" but they also asked "why do you want it?" and "are people expecting it?". Someone must have had some compelling arguments.

I'm sure you've read this in other blogs, but in Visual Studio 2008 SP1 Beta, which arrived Monday, there was...drumroll please...support for all four date and time types in not only LINQ to SQL, but in ADO.NET Entity Framework. Thanks! I'm still a bit surprised about the inclusion of DateTimeOffset in EF (its a SQL Server-specific data type and EF is more platform-neutral than LINQ to SQL), but it was a cool surprise to have them there. 

Everyone's blog entry also points out there is some special support for Filestream storage (including a System.Data.Types.SqlFilestream class), but they identify Filestream as a data type. Filestream is a storage format rather than a data type, the actual data type is varbinary(max). Minor terminology nit. You can read and write FILESTREAM based types without resorting to PInvoke. Cool.

Of course, there were some SQL Server 2008 data types (the SQLCLR-based types) that aren't "in there". These are the spatial types (Geometry and Geography) and the HierarchyID data type. Perhaps for EF/LINQ to SQL version 2. These types are correctly identified in Server Explorer in a table but attempting to add a table containing them to an EF diagram fails (it actually leaves the type out of the diagram) and in LINQ to SQL throws an error "the data type is not supported".

That means if you have one of these types in your SQL Server 2008 database, you need to do something funky like treat it as a varbinary(max) and deserialize it (think IBinarySerialize) on the client yourself. Because the SQLCLR-based data types would have a one-to-one mapping to .NET classes (Geometry = Microsoft.SqlServer.Types.SqlGeometry, Geography = SqlGeography, HierarchyID = SqlHierarchyID) these would possibly be straightforward to implement in the next release. They WOULD be a potential model-breaker in that you would have "columns" (entity properties) that have their own properties and methods. Another whole level of indirection.

I've said before that IMHO, spatial support is the "killer developer feature" in SQL Server 2008. Let's hope for more API support in future. But THANKS for the inclusion of the date and time types.

Friday, May 16, 2008 2:43:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

After posting the last blog entry, I realized that I hadn't blogged in about a month. Hmmmm...

Well, since last month I posted the demos for my SQLConnections talks on the SQLskills website (under PastConferences). I really had a good time at SQLConnections, its been a while since I'd been invited to speak there. Thanks to those folks who showed up and said hi.

I've been updating my classes for SQL Server 2008 while waiting for the next CTP. Don't know when that's coming, but everyone's guess is "sometime in June". And I've found a few more neat things in SQL Server 2008 that I'll get around to writing about some day soon. And did some experimenting with the Visual Studio 2008 SP1 Beta (see previous post).

And I've done a lot of yardwork and gardening. I've been outside almost every day; might even lose that pastey-white "programmer who never sees the sun" look. But its 97 degrees in Portland today (yes, 97, really) and its a bit hot for gardening.

Next stop...TechEd Developer Week in June. I'm doing a precon there on SQL Server 2008 and a couple of other interesting talks. Wonder if it will be 97 degrees in Orlando... If you're in Orlando stop by and say hi.

Friday, May 16, 2008 2:20:37 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Every once in a while I'll get inquires on a paper I wrote in 2004 about the ETW trace providers for ADO.NET (named "Tracing Data Access"). I got an inquiry today, and since I'd recently installed Visual Studio 2008 SP1 Beta, I thought I'd try it out on a LINQ to SQL program and an Entity Framework program.

If you haven't read the paper, the original is still available on MSDN, but an embellished version (including SNAC tracing in addition to ADO.NET) was released in 2006. I'm also doing a talk at TechEd on "End-to-End tracing in ADO.NET and SQL Server 2008". If you're at all interested in client tracing or SQL Server 2008 Extended Events, it will be worth your while to attend.

Unless the LINQ to SQL program has its own ETW provider that I missed, I got no LINQ to SQL specific trace events. Since LINQ to SQL is a thin layer over System.Data.SqlClient, I do get the events for SqlClient however.

Entity Framework however, is another matter. I not only get EF-specific trace events, but there appears to be some new trace namespace abbreviations for EF as well. These are the ones I've seen while going through one trace of one simple EF program. I may have missed some.

dobj = System.Data.Objects
ec   = System.Data.EntityClient
cqt  = System.Data.Common.CommandTrees
pc   = System.Data.Query.PlanCompiler

There's quite a bit of trace information for some parts of EF, like the CommandTree portion. But after a whole half-hour and a few traces, it appears that not all items in EF are as thoroughly instrumented with trace events as, say, SqlClient. On the other hand, EF provider for SQL Server will show lots of SqlClient-specific and System.Data.Common events, as does LINQ to SQL.

Friday, May 16, 2008 1:37:31 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Thursday, April 17, 2008

I'm heading off to SQLConnections in Orlando early tomorrow morning. I'll see starting with a preconference "Day of SQL Server 2008 for Developers" and covering, well, every new feature that could interest a developer. I'll also being doing a series of talks about everything from Spatial data support to Extended Events to PowerShell in SQL Server to XML for DBAs.

If you'll be in Orlando for the conference, stop by and say hi.

Thursday, April 17, 2008 6:42:51 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Tuesday, April 08, 2008

One of the uses for sparse columns will likely be replacing entity-attribute-value designs. In an EAV design, the "sparse attribute" values are represented by a separate table. The attribute value can either be declared as (n)varchar or sql_variant. An example I've always used is the hardware store that wants to keep track of its products in a "products" table, but each product has different attributes. If you use the minimum number of columns to simplify the example, the EAV design looks like this.

create table products -- base table
(
 id int primary key,
 name varchar(max)
)
go
create table properties -- sparse attribute (name-value pair) table
(
 id int,
 name varchar(50),
 value varchar(max),
 CONSTRAINT PK_properties PRIMARY KEY (id, name),
 FOREIGN KEY (id) REFERENCES products (id)
)
go

Here's a straightforward way to convert the table to use SQL Server 2008's sparse columns. It uses dynamic SQL, but in this case there's no user input (SQL injection worries).

declare @tab nvarchar(max),
        @sql nvarchar(max)

set @tab=N'create table products2 (id int primary key, name nvarchar(max) '
select  @tab=@tab+','+ name + ' varchar(max) sparse' from properties
group by name
set @tab += ' ,col_values xml column_set for all_sparse_columns);'
-- select @tab
exec(@tab)

Populating it is also straghtforward using the pivot operator, introduced in SQL Server 2005. In the case where each item has only one of each sparse property (the table constraint enforces this), and our properties table has only three columns, there's no real aggregation with pivot. The aggregate is just required by the pivot operator syntax. This populates the table:

declare @col nvarchar(max),
        @sql nvarchar(max)

set @cols=N''
select  @cols=@cols+','+ name from properties
group by name
 select @cols
set @cols=substring(@cols,2,datalength(@cols)/2 - 1)
set @sql=N'insert into products2 (id, name,' + @cols + ') select p.id, p.name, '+@cols+
      N' from (
      select id, name, value
      from properties
      ) as q
      pivot
      (
      max(value)
      for name in ('+@cols+
      ')
      ) as PivotTable'
set @sql += ' join products p on PivotTable.id = p.id'
-- select @sql
exec (@sql)

What remains to be done now is to choose better data types for the sparse columns, if the data isn't really a string. If you haven't enforced value type correctness in the application, this may require some data cleansing. Full example as an attachment. Remember that currently (in CTP6) you can only have 1024 total columns in a table; but the limit will be increased to 30000 sparse columns before RTM.

eav_to_sparse.zip (1.23 KB)
Tuesday, April 08, 2008 9:50:47 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, April 07, 2008

It's easier than you'd think... When you use the SQL Server 2008 CREATE EVENT SESSION DDL statement with extended events, you specify:

Events to be captured (e.g. sqlserver.error_reported)
Actions to be fired to add more information (e.g. sqlserver.sql_text)
Predicates for conditional capture (e.g. where sqlserver.error_reported.error = 547)
Event target to collect the events (e.g. package0.ring_buffer)
Addtional options (e.g. MAX_MEMORY)

In general, the way you look for events, actions, predicates, etc to use is to query against the metadata views:

select p.name + '.' + o.name as [Full Name],
       o.description s [Description]
from sys.dm_xe_objects o
join sys.dm_xe_packages p on o.package_guid = p.guid
-- @type can be 'event', 'action', etc
where o.object_type = @type order by p.name, o.name

Two of the actions listed have to do with activity (causality): package0.attach_activity_id and package0.attach_activity_id_xfer. In addition there is an option TRACE_CAUSALITY. I thought you'd have to set the TRACE_CAUSALITY option and add the actions to collect activity information. But attempting to add either action produced an error.

It's easier than that. Simply turning on the TRACE_CAUSALITY option is enough. The actions can't be specified in CREATE EVENT SESSION (or ALTER EVENT SESSION) because they are for internal use. The activity_ids show up at the target without explicitly naming the actions.

Monday, April 07, 2008 10:54:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, April 05, 2008

It appears that there are some changes in .NET 3.5 System.Transactions (or System.Transactions.dll and System.Data.dll version 2.0.50727.1433 if you're looking at assembly versions). Florin Lazar blogs about a change to the syntax here. And Alazel Acheson blogs about changes to SqlClient changes to accomodate using lightweight transactions with less promotion to distributed transactions in the ADO.NET Team blog.

It is interesting to read how the latest version of SqlClient can combine automatic transactions with connection pooling to use promotion to a distributed only when necessary. It was reminiscent of how COM+ used the connection pool; the difference is that COM+ always used distibuted transactions. Some of the changes use a transaction-aware connection reset mode that's only supported in SQL Server 2008. So server changes were required as well.

At the end of the blog entry Alazel mentions that you can't use this facility with two open connections at the same time; it requires that there is a free connection with the appropriate transaction scope in the pool. I immediately thought of SQL Server's sp_getbindtoken and sp_bindsession, the manual way to allow two connections to share a single transaction context. But sp_getbindtoken and sp_bindsession are on the deprecation list (to be removed in a future version of SQL Server) in SQL Server 2008. The books online entry for these calls recommends "using MARS or distributed transactions instead". But why MARS?

MARS (multiple active resultsets) made its appearence in SQL Server 2005 and its data access stacks. It permits you to have multiple interleaved sessions while using a single SQL Server connection. These multiple sessions share the same transaction, modulo an interesting but escoteric behavior known as batch-scoped transactions. MARS also does not support named savepoints.

What MARS and sp_getbindtoken/sp_bindsession have in common is that they are both solutions to the same "problem". I prefer calling this a SQL Server "behavior" because its not technicall a problem, just how the underlying network stack works. SQL Server does not allow other activity on a connection (like an UPDATE or a second SELECT) while reading a rowset is in progress. Pre-MARS, if you wanted to have multiple commands in a single transaction scope, you would use two connections and "bind" the transaction scopes together with the transaction token. With the MARS capability this is no longer necessary.

So, the OpenConnction1/DoCommand1/Close then OpenConnection2/DoCommand2/Close pattern is now acommodated by System.Transactions and SQL Server 2008 without transaction promotion. To do two commands with the same transaction scope without closing the connection, use a single connection and multiple sessions with MARS. With either data access pattern, only a single physical connection to SQL Server is needed.

Saturday, April 05, 2008 2:14:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, March 24, 2008

Short post this evening...

Just in case anyone else flails around looking for this feature, its right under your nose. When you use the PowerShell SQL Server 2008 provider, you have visibility to a single, local machine (and all its SQL Server instances you can access with integrated security) by default. To get access to multiple machines using the provider, simply reference a SQL provider path that contains that machine name. If the (Windows) principal has access to the other machine's SQL Server instance, it will open a connection using Windows auth.

For example, say that I'm on a machine named zmv20. I have access to machine zmv21's SQL Server instance as well.

>cd SQLSERVER:\SQL
>dir

MachineName
-----------
zmv20

>dir zmv21  <---- makes a connection to zmv21

Instance Name  (on zmv21)
-------------
DEFAULT

>dir           <---- now you can "see" zmv21 too

MachineName
-----------
zmv20
zmv21

Using the test-path cmdlet also works.

>test-path SQLSERVER:\SQL\zmv22  <---- can I login to this machine too?

But bear in mind that this is subject to a connection timeout lag. The error message indicates that its first using WMI to obtain the machine connection.

Of course, all this is doc'd in SQL Server Books Online. Where I missed it a few times...I was looking for the equivalent of a "connect" command. You don't need one.

Monday, March 24, 2008 11:21:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Thursday, March 20, 2008

I'm still getting used to the new sparse column feature in SQL Server 2008.

I'd just read in the BOL definition of ALTER TABLE that you can add a column_set to an existing table. I was converting a sample app from an EAV (entity attribute value) design to sparse columns. I used the existing "attribute-value" table to create the sparse columns, then created the table. I then went back to add the column_set with ALTER TABLE after the fact.

alter table sparsetest3 add spcolset xml column_set for all_sparse_columns

Received this error message:
Msg 1734, Level 16, State 1, Line 1
Cannot create the sparse column set 'spcolset' in the table 'sparsetest3' because the table already contains a sparse column set. A table cannot have more than one sparse column set.

Huh? I figured that I should be able to add the column_set and filed the lack of this capability as a bug. It came back as "by design".  And doc'd in BOL that way. The fact that its by design makes perfect sence and helped to solidify in my mind how the column_set works.

In a "normal" table (even with sparse columns) without a column_set, "select * from table" returns all of the columns. If a table has a sparse column_set, "select * from..." behaves differently, and returns only the non-sparse columns and the column_set column. NOT the individual sparse columns. You can INSERT or UPDATE this table by using only the column set. When you update using a column_set, all of the sparse columns that you don't specify are set to NULL. You can even update to column_set to NULL itself, which NULLs out all of the sparse columns.

The reason that "A column set cannot be added to a table if that table already contains sparse columns" (BOL exact wording under "Guidelines for using sparse columns") is that it could break existing code that uses "select * from...". Imagine:

create table sparsetab (
 id int identity primary key,
 col1 int,
 spcol2 int sparse
);

select * from sparsetab -- returns spcol2
-- add column_set (this is disallowed, but imagine it DID work)
select * from sparsetab -- doesn't return spcol2, only the column_set, can break code that relies ont spcol2

And BOL is right in ALTER TABLE as well.

create table sparsetab2 (
 id int identity primary key,
 col1 int
);
go
-- add first sparse column and column_set at the same time, works fine.
alter table sparsetab2
 add spcol1 int sparse,
       spcolset xml column_set for all_sparse_columns
go

You can even, as BOL indicates, add a column_set to a table that does not yet have a sparse column.

create table sparsetab4 (
 id int identity primary key,
 spcolset xml column_set for all_sparse_columns
);
go
-- Now you can add sparse columns, they use the column_set

The ONLY issue I have is with the error message 1734 at the beginning of the post. It's misleading, because I DON'T already have a column_set. But I DO already have sparse columns.

In last sparse column correction from a long-ago post. I'd heard (early on) that you would be able to have over 4 million sparse columns (actually sizeof(int) of them). This turns out to be incorrect, it was announced lately that the limit will be 30,000 sparse columns. And, in the current CTP6 you can only have 1024 total columns (in as previous versions); the sparse column limit will be changed sometime before RTM.

Amazingly, when I quoted the 30,000 column limit to a class last week, there was a groan. One student told me his EAV table already had over 60,000 unique attributes. 30,000 wouldn't be enough...must be a HUGE EAV table. That's the motivation for sparse column. Also, his EAV table had the "value" column defined as SQL_VARIANT. That's the other motivation, sparse columns are strongly typed. Although inserting through the column_set always uses a string (nvarchar) as the value, and attempts to convert string to the definied data type for specific columns.

Thursday, March 20, 2008 10:03:07 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, March 17, 2008

This is the last part of a series on programming policy-based management. The series starts here.

In the previous installment, I created a policy that was constrained to a single database. To accomplish this, I used a Condition that called out the database by name, and tied it to the TargetSet using TargetSet's SetLevelCondition method.

An alternative consists of creating a policy as part of a PolicyCategory. Each Policy is a member of exactly one PolicyCategory. The default PolicyCategory is the only one that "ships with the system", but you can define your own. If you don't specify otherwise in the code (SSMS has a Category dropdown on the Description page of the new Category dialog), your policy is a member of the default category.

Each PolicyCategory has a property that indicates whether it's manditory that a database subscribe to that category. If this property is true, each database has an implicit subscription to the category. If not, a database must explicitly subscribe. The code to define a PolicyCategory is straightforward:

static void CreateCategory(PolicyStore ps)
{
    PolicyCategory cat = new PolicyCategory(ps, "MyNewCategory");
    cat.MandateDatabaseSubscriptions = false;
    cat.Create();
}

To create a Policy that's a member of the PolicyCategory, simply use the aptly-named PolicyCategory property. If you're using a named PolicyCategory you may not want to restrict that policy to a specific database. Here's the changes to the Policy definition.

// No Condition On This One, applies to all databases, but must be subscribed to
//ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

// Name the category
p2.PolicyCategory = "MyNewCategory";

To create a PolicyCategorySubscription (for those categories that are not Manditory), the only interesting part is that you need a SqlSmoObject. You can use a concrete subclass (like Database) or make up a SqlSmoObject by using a URN. Dan Sullivan's and my book "Developer's Guide to SQL Server 2005" covers both ways to make an SqlSmoObject. Here's the code for PolicyCategorySubscription.

static void CreatePolicyCategorySubscription(PolicyStore ps)
{
    Server svr = new Server(); // open a connection to default instance, local server
    Database db = new Database(svr, "pubs");
    PolicyCategorySubscription subs = new PolicyCategorySubscription(ps, db);
    subs.PolicyCategory = "MyNewCategory";
    subs.Create();
}

This means that only the pubs database now follows the policies in "MyNewCategory". BTW, there's currently the PowerShell provider in CTP6 throws an error when attempting list a PolicyCategorySubscription (its in the hierarchy at the same level as Policy). The PolicyCategorySubscription still works as advertised though.

This concludes the series on Programming Policy-Based Management with SMO. Hope it was useful.

Monday, March 17, 2008 6:58:47 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post is part of a series on programming policy-based management. The series begins here.

So, we were working with a policy that required an ObjectSet. ObjectSets contain TargetSets. For example, the ObjectSet for the naming policy (IMultipartName) we were working on needs a TargetSet for Procedure, Synonym, Table, Function, Type, View, and XmlSchemaCollection.

Note that this collection is similar to what you'd see for a MultipartName policy in the SSMS designer dialog. The title for it in SSMS is "Against Targets:". We want our policy to apply only to Tables. After fumbling around for a while attempting to define TargetSets and add them to the ObjectSet's collection of them, I found that the seven TargetSets I needed were *already* defined. This reduces enabling only the table's TargetSet to two lines of additional code. Notice that you can reference a specific TargetSet in an ObjectSet by using an indexer. The indexer is an SMO URL.

TargetSet ts1 = os1.TargetSets["Server/Database/Table"];
ts1.Enabled = true;

But how to restrict this policy to a single database? For this we need a Condition to name the database. Because this condition is simple we can use ExpressionNode.Parse(). Here's the Condition code.

// Create a condition to enforce
Condition con = new Condition(ps, "FinanceDB");
con.Facet = "Database";
// Note: Using Parse() treats the string as an SMO URL. Only works for simplest cases
string s = "@Name = 'finance'";    
// try-catch code omitted for brevity
con.ExpressionNode = ExpressionNode.Parse(s);
con.Create();

Back to our TargetSet. We restrict the TargetSet to a specific database by using TargetSet.SetLevelCondition. SetLevelCondition takes a TargetSetLevel, and we get the appropriate TargetSetLevel (which is prepopulated) by.... you guessed it...using a SMO URL. After ts1.Enabled, this limits the policy to a single database defined by our Condition.

// FinanceDB is the name of our Condition that "limit/defines" this policy only to Finance
ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

Having initialized the ObjectSet correctly, we now create the ObjectSet, tie it to the Policy and voila...

os1.Create();
p2.ObjectSet = "CheckFinanceTab_ObjectSet";
p2.Create();

Check this policy by using the following code in a query window:

use finance
go
create table dbo.foo (id int);
go

You get the expected error:
Policy 'CheckFinanceTab' has been violated by '/Server/(local)/Database/finance/Table/dbo.foo'.
This transaction will be rolled back.
Policy description: ''
Additional help: '' : ''.
Msg 3609, Level 16, State 1, Procedure sp_syspolicy_dispatch_event, Line 50
The transaction ended in the trigger. The batch has been aborted.

Monday, March 17, 2008 2:44:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is part of a series on programming policy-based management. The series starts here.

Now, we'll tackle programming a little bit more complicated policy. The table-naming standard that applies to a set of database objects. For this, we need three items:
1. Condition for defining the policy itself.
2. Policy that uses the condition and contains....
3. Condition that specifies a set of database objects to which the policy should be applied.

The first condition is straightforward

Condition con = new Condition(ps, "TablePattern");
con.Facet = "IMultipartNameFacet";
ExpressionNode exp = new ExpressionNodeOperator(OperatorType.LIKE,
    new ExpressionNodeAttribute("Name"),
    new ExpressionNodeConstant("fintbl%")
    );
con.ExpressionNode = exp;
con.Create();

Note that, as with the complex condition we specified previously (part 3), you can't specify 'LIKE' by using the Parse() method, but 'LIKE' does appear when you access the condition by using ToString(). Perhaps, in future, Parse and ToString will be reflexive on an ExpressionNode. Or maybe I'm just missing something.

The first part of the policy is straightforward too:

Policy p2 = new Policy(ps, "CheckFinanceTab");
p2.Condition = "TablePattern";
p2.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.Enforce;
p2.Enabled = true;

But we want this policy to apply to to the TABLES (not views, for example) and only those tables in the finance database. For this, we need an ObjectSet.

ObjectSet os1 = new ObjectSet(ps, "CheckFinanceTab_ObjectSet");
os1.Facet = "IMultipartNameFacet";

Note that ObjectSets reference Facets too. Note also that ObjectSets are a "top-level" object, that is, in the PowerShell provider they appear at the same level as Conditions and Policies. Note that Facets do NOT appear at that level.

Before we go any further, cavaet... there is no function in SSMS to define an ObjectSet standalone. Therefore, no way to delete a "half-baked" ObjectSet from SSMS if you don't want it. To delete an ObjectSet you don't want, you need to go into the PowerShell provider, navigate to SQLSERVER:\SQLPolicy\{server}\{instance}\ObjectSets and delete it. "del MyObjectSet". In SSMS, deleting a policy that refers to an ObjectSet deletes the ObjectSet too. Not sure if you can have multiple policies refer to the same ObjectSet yet, but if you could I assume you'd have to delete all the policies before an ObjectSet would go away.

Monday, March 17, 2008 2:04:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is part of a series on programming policy-based management. The series starts here.

So, to initialize my Condition's ExpressionNode I need more than just a string. It doesn't look like, at this time, every ExpressionNode CAN initialized with a string. But we can use the subclasses. The ones I need here are Operator, Attribute, and Function. I need: DatabaseMailEnabled (Attribute), Equals (Operator) and false (Function). BTW, ExpressionNodeFunction appears to be what you are programming when you use the "Advanced Functions" dialog in the SSMS Condition dialog.

Putting one expression together first looks like this:

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );

Note that, unlike in SSMS, you don't need '@' before the attribute name. But I need to specify both DatabaseMail and SqlMail. That's just a little more complex.

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp2 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("SqlMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp_both = new ExpressionNodeOperator(OperatorType.AND, exp1, exp2);

con1.ExpressionNode = exp_both;
con1.Create();

And, nice as you please. Looks just like the one defined in SSMS using the GUI. Make sure that you remove (or comment out) the original ExpressionNode.Parse statement.

Now, on to the policy.

// Create a policy that uses the condition we just created
Policy p1 = new Policy(ps, "OffByDefaultSMO");
p1.Condition = "MailOffSMO";
p1.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.None;
p1.Enabled = false;
p1.Create();

Looks like the one created by the GUI and works like it too.

This is a pretty simple policy, because it can only be applied at the instance level. Next, we'll look at a policy that can be applied to a set of database objects, e.g. All tables in a particular database. This requires that we investigate ObjectSets.

Cavaet. It's pretty easy to make a mistake, sans docs. Some mistakes produce a generic "I can't do this"-type message. So ALWAYS drill into the INNER exception if you get an error. That is:

try
{
    con1.ExpressionNode = exp_both;
    con1.Create();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

Stack trace might even be helpful.

Monday, March 17, 2008 1:15:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is the second part in a series about programming policy-based management. The series starts here.

To build our MailOffByDefault policy we need:
   Condition that specifies properties and settings
   Policy that uses the condition

Condition first. This looks pretty straighforward.

Condition con1 = new Condition(ps, "MailOffSMO");
con1.Facet = "ISurfaceAreaFacet";
con1.ExpressionNode = ExpressionNode.Parse(
                        "@DatabaseMailEnabled = 0 and @SqlMailEnabled = 0");
con1.Create();

The Condition class uses PolicyStore instance (ps) and names the condition (MailOffSMO). You initialize a facet with a string. So where did the string come from? It is named in the SMO library but I "found" it by making an equivalent condition with SSMS and inspecting it.

foreach (Condition c in ps.Conditions)
 // ... look at c in the VS visualizer

ExpressionNode should "define" my condition, and since there is no ExpressionNode constructor, I first tried Parse(). This created the Condition, but it was unusable in SSMS. SSMS wanted "false" not "0". I searched around for how to specify "false" in the parse string for a while, then came upon something better.

ExpressionNode has six subclasses that can be used in combination to specify any set of expressions that you need. These are
  ExpressionNodeAttribute
  ExpressionNodeConstant
  ExpressionNodeChildren (with subclasses)
    ExpressionNodeFunction, ExpressionNodeGroup, and ExpressionNodeOperation

That's next.

Monday, March 17, 2008 12:46:24 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I've been looking at the new Policy-Based Management (was: Declarative Management Framework) in SQL Server BOL. All of the BOL examples use the SSMS user-interface to define and maintain the policy store. While there will likely be 3 ways to configure PBM as there is with, say, replication (SSMS, SMO, and system stored procs), I thought I'd take a try at programming it using the new SMO libraries.

Because the docs are sparse (there are listing of the new classes but no description of what they do in BOL), I figured I'd start by writing C# code, because I like the visualizers (those components that allow you to drill into a heavily nested structures while debugging) in Visual Studio. Later on, I'll port these to PowerShell. I can also use PowerShell reflection capabilities and the new SQL Server provider to get a quick look a the structures.

Be aware of the fact that, since this API is so sparsely documented it could change by RTM. Always a consideration.

The new classes live, for the most part, in two libraries:
  Microsoft.SqlServer.Dmf
  Microsoft.SqlServer.Management.Sdk.Sfc

I also added references in the project to:
  Microsoft.SqlServer.ConnectionInfo
  Microsoft.SqlServer.PolicyEnum
  Microsoft.SqlServer.Smo

I mean to start by replicating the two examples in the PBM (is that its new acronym?) books online tutorial. But first, we need a starting point. It's the PolicyStore class. The PolicyStore is also the machine-root of the PowerShell provider drive SQLSERVER:SQLPolicy\{machine}\{instance}.

You can initialize the PolicyStore's connection with an instance of SqlStoreConnection from the ...Management.Sdk.Sfc namespace. Not sure what Sfc stands for, but being an old C++ programmer, perhaps its SQL Foundation Classes ;-) ? No matter. Luckily you can initialize a SqlStoreConnection with a plain old SqlConnection. So, lets connect to the store.

SqlConnection conn = new SqlConnection("server=zmv32;integrated security=sspi");
PolicyStore ps = new PolicyStore();
ps.SqlStoreConnection = new SqlStoreConnection(conn);
conn.Open();

Where to go from here? In the PowerShell provider, the subdirectories of the policy store are Conditions, Policies, ObjectSets, PolicyCategories, and PolicyCategorySubscriptions. To create the first policy (MailOffByDefault) we need a Condition and a Policy. That's next.

Monday, March 17, 2008 12:43:55 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, March 07, 2008

This is the last post in the series, at least for now. I'll update it (or post more on the topics discussed here) as the products involved evolve and mature. This one's about:

LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

T-SQL is a declarative language, allowing you the ability to rewrite queries for better performance. A whole cottege industry has grown up around this (I teach it myself), and it usually consist of changing the SQL to get the plan you want, based on your intimate knowledge of the (current) data and the (current) use cases. As one of the simplest examples, you can switch between joins, correlated subqueries, and nested subqueries to see which one gives best performance. Or use EXISTS rather than a correlated subquery or IN clause.

Because the queries (LINQ and/or ESQL) are programmatically transformed in SQL queries there is not (that I'm currently aware of) the ability to "rephrase" LINQ/ESQL queries to produce subtlely different SQL queries and thus better performance. If you can produce rephrased SQL by changing a LINQ/ESQL query (not just rewriting a LINQ/EF query to produce different results that are more optimal), I'd be interested in hearing about it. Perhaps another cottege industry awaits...

BTW, although most/many SQL queries can be rewritten (sometimes many different ways) and tested for best generated query plan/best performance, the limitation is that, in future, the query processor can get smarter, thus making your past years' work unncessary. Usually though, you've benefited from rewriting SQL for that extra 6 mos-5 years until the query processor changed anyway.

Besides query rewrites, you can also "hint" queries, in most dialects of SQL I've seen. This helps when the query processor chooses a suboptimal plan (uncommon, but not unheard of) and you have intimate knowledge or data and use cases. Or when you're trying to service different use cases with the same query; SQL queries only have one plan at a time (modulo parallelized plans) and you might have to satisfy different use case by differently hinting the same query. Because the translation to SQL is deeply imbedded in the LINQ/EF source code, if I find a performance problem that can be helped with a hint, I can't hint in the LINQ/ESQL code. This means going back to using stored procedures (they work with hints) and away from the model.

Hinting is usually not preferred over rewriting the SQL because hints "tie the query processor's hands", i.e. if the statistics change so that a different plan would work better, the query processor can't use this information because you've told it how to accomplish the query. You've changed SQL from a declarative language to an imperative language. It's best not to put query hints in code, but separate them to a separate layer. SQL Server 2005 calls this separate layer plan guides. The plan guide is a named database object that relates a hint to a query without changing the underlying code. You can add/drop plan guides or turn them on/off at will. Or re-evalute them when things (the statistics or use cases) change.

Can you use plan guides with LINQ/EF queries? Two things to keep in mind. First, a plan guide for a SQL statement requires an exact match on a batch-by-batch basis. Machine-generated SQL will likely make exact match easier, but you will have to check that the guides are being used each time LINQ/EF libraries changes. Second, plan guides work best if you have a limited number of them in your database. They're meant to be special-case... not to add another level of complexity to an already complex (and getting more so as the layers of abstraction increase) situation. So use these with care.

So, is this an issue worth worrying about? I think we'll need to wait and see. Fix a few "bad (generated) SQL or bad queries" problems before giving up entirely. Or, fix performance problems (in the generated SQL) by going to stored procedures and see how many procs you have after a year. Are the folks who are licking their chops in anticipation of LINQ/EF related perf problems justified? Well, its not me that thinks optimizing declarative languages will always have its place.

MHO.

Hope you enjoyed this series. As implementations of these models take hold, I'll be watching for items that would change my opinions. Or prove them...

Friday, March 07, 2008 3:08:08 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, March 06, 2008


This post is part of a series about worries when implementing LINQ to SQL or ADO.NET Entity Framework from a SQL database-centric programmer's perspective. The last two worries are related. It's mostly about either level of abstraction being one level away from the "real SQL code" that's being executed.

First off...
LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

A stored procedure is always thought of by stored procedure afficianados as representing a "contract" between consumer and provider. That is, the database metadata tells me exactly what I'm going to get. Although the database metadata does indicate number, type, etc of parameters, this is absolutely not true for rowsets returned by stored procedures. There is NO database metadata that records anything about the returned rowsets, or even how many rowsets a stored procedure will return. Actually the number of rowsets returned is part of the ANSI standard but SQL Server implement it. In addition, errors that might happen in the middle of a stored procedure might result of rowsets being missing. And than there's always the possibility of returning multiple and/or different rowsets by using a stored procedure with conditional code... Not much of a rowset contract at all.

One way to ameliorate this problem (in SQL Server) is to use multistatement table-valued functions to return one rowset with known metadata. The main hassle with this is performance; a multistatement table-valued function is the equivalent of filling a table variable in code and then returning it. There are I/O considerations (the I/O of reading the base tables + I/O of reading the table variable at the consumer) The are also performance considertations as SQL Server table variables have no statistics, if the table-valued function is used as a row source in a larger query (composable queries), there is no way to estimate the number of rows returned by the TVF.

SQL Server's strongly typed table-valued parameters in SQL Server 2008 would be an analogous concept, but currently these are limited in being "input only" in procedures. No strong typed results yet. Oracle is an exception to this "no contract for rowsets" concept. Because Oracle doesn't return rowsets from stored procedures, they introduced a special parameter type called refcursor. Refcursors can appear in database APIs as a parameter (of type Refcursor or more generically "table"). And you can have strongly typed Refcursors, providing the needed contract. We'll have to wait for Oracle's (or DataDirect Technologies') EF provider or LINQ abstraction product to see how they use this.

So now that we've determined that there is no more of a rowset contract for stored procedures than ad-hoc SQL (the difference really is in SQL encapsulation and support of ownership chains, but that's another story), what about extentions that ESQL doesn't support? There are database-specifc extensions like SQL Server's PIVOT operator, or ANSI SQL standards, like ranking and windowing.

LINQ folks are quick to talk about implementation thorugh "extension methods" but the long and short of this is that these are a LINQ-ism, unrelated to LINQ to SQL. That is, the LINQ construct to SQL dialect statement mapping is fixed and embedded in the LINQ product. Using extensions to the *SQL statement mapping* (ie changing what SQL statement is produced) require either going deep into the framework (if this can be done at all) or implementing equivalent concepts on the client side, leaving the generated database code alone.

EF may have a little better story with this because each provider-writer implements the ESQL to query mapping, conceivably you could write a custom provider to encapsulate the supplied provider with extensions. However, the ESQL language itself does not have to capability of ODBC-like "escape clauses", so there'd be no way to express this extended SQL-based functionality in ESQL.

So I'd classify the "subset of SQL" and "stored procedure rowset is an anonymous type" problem as something that might be worked out in future releases of databases and frameworks. Until LINQ to SQL or EF provides escape clauses in the framework, the easiest way out is the ultimate escape clause, using the stored procedure that returns (anonymous) rowsets. And the more stored procedures are used (not CUD procedures, which enhance the model, but rowset-returning procedures), the farther away from the model you get. Which interferes with the usefulness of the model in general.

MHO.

Thursday, March 06, 2008 12:39:03 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post covers LINQ to SQL and EF worry #4. That is:

LINQ to SQL and EF will write code that gets too much or too little data at a time. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

I really put this in for completeness. Both LINQ to SQL and EF have good mechanisms to deal with this one. In addition, its not necessarily (only) an ORM problem. In a file system graphic user interface, do you prefetch all of the (perhaps thousands) or files' information throughout the entire file system when someone wants to look at the content of the C drive? The answer with this one is "it depends". If you know you're going to eventually display all of the related entities' information you likely do want to get them. If not, perhaps you want to get related entities all at one, when the first child entity is selected. Or get the children one at a time when each child entity is selected.

LINQ to SQL addresses this by implementing a property on the DataContext, the DeferredLoadingEnabled property. It is True by default, retrieving only the Customer object when the Customer has Orders. The related Orders objects are retrieve with an extra roundtrip to the database, one row at a time, when the Customers' Orders property is accessed in code. There is a related property, LoadOptions, on the DataContext that takes a DataLoadOptions instance that allows you to control exactly how much related data is retrieved. That is, do I want only related Orders or Orders, OrderDetails, and associated Products in a single round trip? The DataLoadOptions also allows you to filter the amount of data you get from related tables, that is, I want each Customers' associated Orders, but only OrderID and OrderDate.

ADO.NET Entity Framework does this a little differently. They don't have a property for whether deferred loading is enabled, they just load deferred by default. In order to load associated entites, there is a separate Load method, and an IsLoaded property that you can check before loading. EF also has an Include property of the query of also you to specify which related entities can be loaded, if eager loading is desired. With EF you can also use Entity-Splitting in your design if you know you always want to retrieve OrderID and OrderDate, but no other properties, from the Orders table. Object purists may frown on composing object based only on commonly-used queries, however.

You can also retrieve only certain columns from an object (ie all the fields in Customers but the Customers' picture) with either a related or anonymous type. And, of course, you can always specify a join that returns an anonymous type is desired to get just the properties you need from related tables.

So I'd say that this worry not only is completely unwarrented, but that LINQ to SQL and EF make programmers think more about lazy loading vs eager loading, and make it clearer and more maintainable than a join, which always returns an "anonymous rowset" with columns from all tables interspersed. That is, you know exactly what related data (at an object level) is being requested and retrieved.

MHO.

Thursday, March 06, 2008 11:41:33 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Theme design by Jelle Druyts

Pick a theme: