I’ve been interested in exploring what changes may be coming through in SQL Azure (Windows Azure – “SQL Database” now) around query execution cost models and cardinality estimates being fed to them.  This was motivated by a couple of things…

First of all, I saw a blog post from Grant Fritchey where he noticed that the estimated costs for a few plan operators were different.

Secondly, I read a paper called “Testing Cardinality Estimation Models in SQL Server” by Campbell Fraser, Leo Giakoumakis, Vikas Hamine, and Katherine F. Moore-Smith.  This was a for-fee article, but the non-member price of $15.00 was worth it.  One particularly interesting quote was as follows:

“The new CE model is planned as a future service release of the Microsoft SQL Azure service.”

That quote was a tipping point for further investigation, so, collaborating with Jonathan Kehayias, we discussed a testing approach and set up two different Azure databases, with one database on Web Edition and the other Business Edition.  The intention wasn’t to perform “formal” tests, but I did want to sniff around and see what variations in cost and cardinality estimates I could find (if any) between SQL Azure (version 11.0.2006) and SQL Server 2012 (version 11.0.2316) across various types of queries.  I used the Credit database for a variety of test queries – with identical schema and data in all three databases (one engine DB and two Azure DBs).

One thing I’ve learned so far is that you should watch out for is misinterpreting cost differences for “identical” databases.  Even if you load the same exact schema and rows, you will likely have a different data page count between Engine and Azure (think of the Azure fillfactor and RCSI behavior).  For example, after loading my dbo.member table in SQL Azure, it had 159 pages versus the 142 pages in my SQL Server 2012 version.  So testing an initial Clustered Index Scan query against that table showed me an estimated IO of 0.1075694 in SQL Server 2012 versus 0.120162 for SQL Azure.  So assuming one random I/O and the rest sequential, I see that my SQL Azure cost is still calculated the same for the Clustered Index Scan:

— Random I/O – 0.003125
— Sequential – 0.000740741
SELECT    0.003125 +
0.000740741 * (159-1);

So the key will be to make sure I’m looking at true apples-to-apples comparisons here.  I’ll be testing when I have a few spare moments between other tasks – but in the meantime I’m very interested to learn more about what new changes will come in to SQL Azure in the future.  I’ll share anything interesting I find on the blog – and if you find noteworthy QO items, please share on the comments of this blog as well.

Thanks!