{"id":401,"date":"2008-03-26T21:01:48","date_gmt":"2008-03-26T21:01:48","guid":{"rendered":"\/blogs\/conor\/post\/Local-Global-Aggregation.aspx"},"modified":"2008-03-26T21:01:48","modified_gmt":"2008-03-26T21:01:48","slug":"local-global-aggregation","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/","title":{"rendered":"Local-Global Aggregation"},"content":{"rendered":"<p>I don&#8217;t know about you, but GROUP BY is one of my favorite operators.&nbsp; There are a TON of interesting optimizations that a query processor (QP) considers once you add GROUP BY to a query, especially when the query joins several tables and then groups the result.&nbsp; TPC-H benchmark wars among the large database vendors are won and lost on many of these optimizations.<\/p>\n<p>So, if you are doing relational OLAP (ROLAP) or are otherwise running GROUP BY queries over lots of well-normalized data, I suggest you brush up a little on your knowledge of GROUP BY &#8211; it will help you understand when queries are behaving and when they are not.<\/p>\n<p><a href=\"http:\/\/citeseer.ist.psu.edu\/jaedicke97framework.html\">Here&#8217;s<\/a> the paper I tell people to read on the subject.&nbsp; It&#8217;s written for the person implementing a database, but anyone who can read query plans should get the basics from it.&nbsp; It is the basis for most of the hard-core optimizations (and tricky problems) that face all query processors today.<\/p>\n<p>The basic idea comes from the observation that an aggregate function can often be split into multiple operations and computed in parts.&nbsp; Some of those parts can be computed earlier in a query, which saves a lot of work.&nbsp; If the partial results can be combined correctly later, you can speed up the query by computing them early and combining them at the end.&nbsp; The usual saving is that you don&#8217;t have to materialize the full result of a join when you only care about an aggregate over some piece of it.<\/p>\n<p>Most of the core aggregate functions defined in SQL can be decomposed.&nbsp; If you have a set of rows {x} := concat({y}, {z}), then<br \/>SUM({x}) == SUM({y}) + SUM({z})<br \/>COUNT({x}) == COUNT({y}) + COUNT({z})<br \/>MAX({x}) == MAX(MAX({y}), MAX({z}))<br \/>&#8230;<\/p>\n<p>Not all aggregate operations can be decomposed in this manner, but many of them can.&nbsp; For example, two partial AVG values can&#8217;t be combined on their own: you have to carry SUM and COUNT as the partial state and divide at the end.&nbsp; Partial COUNT(DISTINCT) results can&#8217;t simply be added either, because the same value may appear in both parts.<\/p>\n<p>If you take this concept and apply it to a query with some joins:<\/p>\n<p><code>select sum(col1) from a join b join c<\/code><\/p>\n<p>The idea of local-global aggregation is that you can compute part of the sum before the joins (the local step).&nbsp; You group the rows by their join columns and pass up one partial SUM per group instead of all the rows in that group.&nbsp; The global step above the joins then combines those partial sums into the final result.<\/p>\n<p>This idea becomes more powerful when you add more complex operations to a query processor, such as partitioning or parallelism.&nbsp; Often, you want to perform partial aggregation within each partition or thread to minimize the amount of data you have to send between threads, or perhaps between nodes on a NUMA machine.&nbsp; All of this is the fun stuff that makes databases interesting.<\/p>\n<p>Not every aggregate can be pushed below every join &#8211; there are rules about which transformations preserve the results of a query.&nbsp; For example, you may need to consider whether the aggregate function still returns the same result when it sees additional NULL values.&nbsp; Similarly, if a row can match several rows on the other side of a join, the global step has to account for that duplication.&nbsp; A partial SUM, for instance, has to be multiplied by the number of matching rows.<\/p>\n<p>If you look at the definition of a SQL CLR user-defined aggregate, you&#8217;ll see some of this capability exposed from the SQL Server query optimizer.&nbsp; I won&#8217;t spoil all of your fun, but go take a look.<\/p>\n<p>Happy Querying!<\/p>\n<p>Conor<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I don&#8217;t know about you, but groupby is one of my favorite operators.&nbsp; There are a TON of interesting optimizations that a QP considers when you start adding group by into queries, especially when you have joins and then a group by.&nbsp; TPC-H 
benchmark wars among the large database vendors are won and lost on [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-401","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Local-Global Aggregation - Conor Cunningham<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Local-Global Aggregation - Conor Cunningham\" \/>\n<meta property=\"og:description\" content=\"I don&#8217;t know about you, but groupby is one of my favorite operators.&nbsp; There are a TON of interesting optimizations that a QP considers when you start adding group by into queries, especially when you have joins and then a group by.&nbsp; TPC-H benchmark wars among the large database vendors are won and lost on [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/\" \/>\n<meta property=\"og:site_name\" content=\"Conor Cunningham\" \/>\n<meta property=\"article:published_time\" content=\"2008-03-26T21:01:48+00:00\" \/>\n<meta name=\"author\" content=\"Conor Cunningham\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Conor Cunningham\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/\"},\"author\":{\"name\":\"Conor Cunningham\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/#\\\/schema\\\/person\\\/f9106e03423de6b5157295891b8c3ae3\"},\"headline\":\"Local-Global Aggregation\",\"datePublished\":\"2008-03-26T21:01:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/\"},\"wordCount\":544,\"commentCount\":0,\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/\",\"url\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/\",\"name\":\"Local-Global Aggregation - Conor 
Cunningham\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/#website\"},\"datePublished\":\"2008-03-26T21:01:48+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/#\\\/schema\\\/person\\\/f9106e03423de6b5157295891b8c3ae3\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/local-global-aggregation\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Local-Global Aggregation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/#website\",\"url\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/\",\"name\":\"Conor Cunningham\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/#\\\/schema\\\/person\\\/f9106e03423de6b5157295891b8c3ae3\",\"name\":\"Conor 
Cunningham\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g\",\"caption\":\"Conor Cunningham\"},\"url\":\"https:\\\/\\\/www.sqlskills.com\\\/blogs\\\/conor\\\/author\\\/conor\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Local-Global Aggregation - Conor Cunningham","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/","og_locale":"en_US","og_type":"article","og_title":"Local-Global Aggregation - Conor Cunningham","og_description":"I don&#8217;t know about you, but groupby is one of my favorite operators.&nbsp; There are a TON of interesting optimizations that a QP considers when you start adding group by into queries, especially when you have joins and then a group by.&nbsp; TPC-H benchmark wars among the large database vendors are won and lost on [&hellip;]","og_url":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/","og_site_name":"Conor Cunningham","article_published_time":"2008-03-26T21:01:48+00:00","author":"Conor Cunningham","twitter_misc":{"Written by":"Conor Cunningham","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/#article","isPartOf":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/"},"author":{"name":"Conor Cunningham","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3"},"headline":"Local-Global Aggregation","datePublished":"2008-03-26T21:01:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/"},"wordCount":544,"commentCount":0,"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/","url":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/","name":"Local-Global Aggregation - Conor Cunningham","isPartOf":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#website"},"datePublished":"2008-03-26T21:01:48+00:00","author":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3"},"breadcrumb":{"@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/local-global-aggregation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.sqlskills.com\/blogs\/conor\/"},{"@type":"ListItem","position":2,"name":"Local-Global Aggregation"}]},{"@type":"WebSite","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#website","url":"https:\/\/www.sqlskills.com\/blogs\/conor\/","name":"Conor 
Cunningham","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.sqlskills.com\/blogs\/conor\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.sqlskills.com\/blogs\/conor\/#\/schema\/person\/f9106e03423de6b5157295891b8c3ae3","name":"Conor Cunningham","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d9c37eff231ec89c1b244347d966860875eea8b55b366911d2694e8cd9913e57?s=96&d=mm&r=g","caption":"Conor Cunningham"},"url":"https:\/\/www.sqlskills.com\/blogs\/conor\/author\/conor\/"}]}},"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts\/401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/comments?post=401"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/posts\/401\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/media?parent=401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/conor\/wp-json\/wp\/v2\/categories?post=401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/bl
ogs\/conor\/wp-json\/wp\/v2\/tags?post=401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}