{"id":429,"date":"2008-01-27T14:12:04","date_gmt":"2008-01-27T14:12:04","guid":{"rendered":"\/blogs\/conor\/post\/SQL-Server-Query-Optimization-and-Data-Warehouses.aspx"},"modified":"2008-01-27T14:12:04","modified_gmt":"2008-01-27T14:12:04","slug":"sql-server-query-optimization-and-data-warehouses","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/","title":{"rendered":"SQL Server Query Optimization and Data Warehouses"},"content":{"rendered":"<p>For a number of years, the marketing for SQL Server&#8217;s competitors made a large number of claims that SQL Server couldn&#8217;t handle &#8220;Enterprise&#8221;-scale deployments.&nbsp; Over time, those claims became much quieter as Microsoft used SQL Server extensively to run their whole business, through multi-terrabyte reference cases, and the hugely successful TerraServer project where Jim Gray led a team trying to build a huge database of satellite images using SQL Server.<\/p>\n<p>This doesn&#8217;t mean that there aren&#8217;t challenges for all of the major vendors in adding features to help their products scale.&nbsp; Each of them have some failings in some of their product offerings, so please don&#8217;t interpret this as a &#8220;Microsoft SQL Server can&#8217;t do the job&#8221; post.&nbsp; On the contrary, I think that there is little that it can&#8217;t do, but I know where the limitations get hit and I have some expertise in working around the current limitations.&nbsp; Each Data Warehouse problem is a bit different, so the kinds of problems I&#8217;ll discuss here may or may not happen for any given deployment.<\/p>\n<p>In my previous post, I discussed how subqueries are approached by the query optimizer (as joins), and this can lead to problems when the cardinality or cost estimates are incorrect.&nbsp; Often when an estimate is incorrect, &#8220;unrolling&#8221; the subquery by hand and creating a temporary table for a portion of the query will correct the mistake.&nbsp; Splitting a larger query like this makes the operation more procedural and makes each piece smaller.&nbsp; It also happens to give the optimizer better cardinality estimates for the portion of the query that has been unrolled &#8211; these better estimates may help the rest of the query perform more to your expectations.&nbsp; This all assumes that:<br \/>1. You have the skills and desire to unroll a subquery<br \/>2. You know how to unroll it to give the optimizer information to help avoid cardinality estimate errors<br \/>3. That the query is known <i>a priori<\/i>.&nbsp; Dynamically-generated queries are harder to hint<\/p>\n<p>The best way to track down estimation errors is to find some time when you can run a problematic query with &#8220;set statistics profile on&#8221; enabled.&nbsp; While this does make the query run more slowly, it captures useful information to help identify areas of a query where the estimated and actual cardinality are materially incorrect.&nbsp; I&#8217;ve blogged about this previously (in previous blogs, not this one), so I&#8217;ll do a post linking some of the useful backgrounders on this area.<\/p>\n<p>Data Warehouses are usually the largest of the large in terms of SQL Server deployments, and usually they have a dedicated DBA to help keep it working.&nbsp; This is partially because any business problem worth hundreds of thousands of dollars in hardware is also probably worth the salary of a DBA to keep it working.&nbsp; However, the other side of this coin is that there aren&#8217;t enough data warehouses in the world to make it truly a commodity software purchase.&nbsp; As such, lots of little things can and do go wrong in data warehouse deployments that often require expertise to fix.<\/p>\n<p>Let&#8217;s talk about some of the reasons why the query optimizer may make incorrect guesses on data warehouse tables, starting with some of the easier ones<\/p>\n<p>1. missing statistics.&nbsp; I&#8217;ve seen a bunch of customers disable auto-create statistics because &#8220;someone told them to do so&#8221; or &#8220;I don&#8217;t trust those things&#8221;.&nbsp; Unless you are going to hand-tune every query, you should likely leave auto-create enabled.<\/p>\n<p>2. out-of-date statistics.&nbsp; This is a bit trickier.&nbsp; Column statistics are updated when a certain number of changes are made, and sometimes that hasn&#8217;t been triggered.&nbsp; You can manually update statistics in those cases.&nbsp; For customers who disable auto-update statistics (&lt;2%, but this happens a lot on the higher-end deployments), you need to have a schedule to update those stats so that the optimizer has a fighting chance.<\/p>\n<p>3. a statistics sample that does not reflect the whole data set.&nbsp; There are a number of assumptions made during statistics creation that are more pragmatic than mathematically proven.&nbsp; To make auto-stats possible, only a small percentage of the pages are actually used to create statistics for non-indexed columns.&nbsp; If those pages don&#8217;t cover the whole data range well, queries over the missing ranges could get poor estimates.<\/p>\n<p>4. Fundamental limits in the statistics object.&nbsp; Currently there can be up to 200 steps in the histogram that is created.&nbsp; If your table&#8217;s data has less than 200 interesting spikes in the distribution curve, then you are probably fine.&nbsp; However, your fact table may benefit from more.&nbsp; Unfortunately, this can be a problem even with full stats enabled in SQL 2005 and, from what I have seen so far, in the CTPs of SQL 2008.<\/p>\n<p>5. model limits in the optimizer.&nbsp; Optimizers make assumptions so that they can compile a query in a reasonable amount of time.&nbsp; For example, an optimizer may assume that columns are, in general, statistically independent of each other for the purposes of cardinality estimation because it&#8217;s the only computationally feasible assumption to make.&nbsp; This assumption is very often not true, and it can introduce errors in cardinality estimation that can lead to poor plan selection.&nbsp; As databases get larger, these &#8220;out of model&#8221; errors can get larger and require more hand-holding to make sure that a given query gets a good plan from the optimizer.<\/p>\n<p>I hope this gives some insight into how cardinality estimation impacts join optimization and thus subquery optimization in SQL Server 200x.<\/p>\n<p>Thanks,<\/p>\n<p>Conor<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a number of years, the marketing for SQL Server&#8217;s competitors made a large number of claims that SQL Server couldn&#8217;t handle &#8220;Enterprise&#8221;-scale deployments.&nbsp; Over time, those claims became much quieter as Microsoft used SQL Server extensively to run their whole business, through multi-terrabyte reference cases, and the hugely successful TerraServer project where Jim Gray [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-429","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.9.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>SQL Server Query Optimization and Data Warehouses - Conor Cunningham<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SQL Server Query Optimization and Data Warehouses - Conor Cunningham\" \/>\n<meta property=\"og:description\" content=\"For a number of years, the marketing for SQL Server&#8217;s competitors made a large number of claims that SQL Server couldn&#8217;t handle &#8220;Enterprise&#8221;-scale deployments.&nbsp; Over time, those claims became much quieter as Microsoft used SQL Server extensively to run their whole business, through multi-terrabyte reference cases, and the hugely successful TerraServer project where Jim Gray [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/\" \/>\n<meta property=\"og:site_name\" content=\"Conor Cunningham\" \/>\n<meta property=\"article:published_time\" content=\"2008-01-27T14:12:04+00:00\" \/>\n<meta name=\"author\" content=\"Conor Cunningham\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Conor Cunningham\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/\",\"url\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/\",\"name\":\"SQL Server Query Optimization and Data Warehouses - Conor Cunningham\",\"isPartOf\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/#website\"},\"datePublished\":\"2008-01-27T14:12:04+00:00\",\"dateModified\":\"2008-01-27T14:12:04+00:00\",\"author\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"SQL Server Query Optimization and Data Warehouses\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/#website\",\"url\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/\",\"name\":\"Conor Cunningham\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3\",\"name\":\"Conor Cunningham\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g\",\"caption\":\"Conor Cunningham\"},\"url\":\"https:\/\/www.sqlskills.com\/blogs\/conor\/author\/conor\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"SQL Server Query Optimization and Data Warehouses - Conor Cunningham","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/","og_locale":"en_US","og_type":"article","og_title":"SQL Server Query Optimization and Data Warehouses - Conor Cunningham","og_description":"For a number of years, the marketing for SQL Server&#8217;s competitors made a large number of claims that SQL Server couldn&#8217;t handle &#8220;Enterprise&#8221;-scale deployments.&nbsp; Over time, those claims became much quieter as Microsoft used SQL Server extensively to run their whole business, through multi-terrabyte reference cases, and the hugely successful TerraServer project where Jim Gray [&hellip;]","og_url":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/","og_site_name":"Conor Cunningham","article_published_time":"2008-01-27T14:12:04+00:00","author":"Conor Cunningham","twitter_misc":{"Written by":"Conor Cunningham","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/","url":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/","name":"SQL Server Query Optimization and Data Warehouses - Conor Cunningham","isPartOf":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#website"},"datePublished":"2008-01-27T14:12:04+00:00","dateModified":"2008-01-27T14:12:04+00:00","author":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3"},"breadcrumb":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/sql-server-query-optimization-and-data-warehouses\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.sqlskills.com\/blogs\/conor\/"},{"@type":"ListItem","position":2,"name":"SQL Server Query Optimization and Data Warehouses"}]},{"@type":"WebSite","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#website","url":"https:\/\/www.sqlskills.com\/blogs\/conor\/","name":"Conor Cunningham","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.sqlskills.com\/blogs\/conor\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3","name":"Conor Cunningham","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g","caption":"Conor Cunningham"},"url":"https:\/\/www.sqlskills.com\/blogs\/conor\/author\/conor\/"}]}},"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts\/429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/comments?post=429"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts\/429\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/media?parent=429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/categories?post=429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/tags?post=429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}