Let’s say you have a heap table with 1,000,000 rows in it.  Let’s also say that your automatic creation of statistics are disabled, as well as updates to the statistics (and in this scenario, there are NO existing indexes or statistics).

What kind of selectivity guess would the optimizer make for a query like the following? (I copied over the FactInternetSales table with no indexes and put 1,000,000 rows in it – tested on SQL Server 2012, 11.0.2316)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 537;

First of all, the actual plan on my system was as follows:

SNAGHTML1904f4f

If I look at the properties of the table scan, I see the actual number of rows was 23,042 versus the estimated of 31,622.8.  (As an aside – execute SELECT POWER(1000000.0,.75) and you’ll see how the estimated number of rows was derived).  I also see that the table cardinality is known (and correct) at 1,000,000.

SNAGHTML19c0871

What if I have a predicate referencing another column, like OrderDateKey?  This too results in an estimate of 31,622.8 rows.

The following table shows a sampling of various queries and associated search conditions, along with their associated row estimates:

Query Estimated Rows Percentage (based on 1,000,000 rows)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 537;

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE OrderDateKey = 20040224;

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey BETWEEN 537 AND 600;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey BETWEEN 537 AND 537;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost < 100.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > 100.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > 100.00 AND
      TotalProductCost < 200.00;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost >= 100.00 AND
      TotalProductCost <= 200.00;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost LIKE 100.00;

100,000 10%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE '%A%';

539,232 53.9232%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE 'A%';

269,616 26.9616%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE '%A';

269,616 26.9616%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > (ProductStandardCost + 10.00);

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1);

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1,2);

62,245.6 6.22456%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 OR
      ProductKey = 2;

62,245.6 6.22456%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1,2,3);

91,900 9.19%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 OR
      ProductKey = 2 OR
      ProductKey = 3;

91,900 9.19%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 AND
      ProductKey = 2 AND
      ProductKey = 3;

10,746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 AND
      ProductKey = 2;

13,335.2 1.33352%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE (TaxAmt * UnitPrice) > 10.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1);

968,377 96.8377%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey <> 1;

968,377 96.8377%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1,2);

13,335.2 1.33352%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1,2,3);

10746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey <> 1 AND
      ProductKey <> 2 AND
      ProductKey <> 3;

10746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE EXISTS (SELECT 1);

1,000,000 100%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE NOT EXISTS (SELECT 1);

0 0%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NOT NULL;

1,000,000 100%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NULL;

0 0%

ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NOT NULL;

968,377 96.8377%

ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NULL;

31,622.8 3.16228%

There are numerous other scenarios I could have put in this table, but you probably get the point by now that the search conditions help define the which “guess” calculation is used in absence of statistics.  When there are additional details beyond table cardinality, such as the nullability of a column or constraints, SQL Server can often leverage it accordingly. 

Categories:
Performance

This post is just a reminder to be attentive to the locking overhead of your Transact-SQL server cursors.

For example, the following cursor is using default options in SQL Server 2012 to iterate row-by-row through the Employee table.  I’m declaring a variable and populating it with the BusinessEntityID from each row (and I’m not doing anything with it – as I just wanted to avoid the chatter back to my SQL Server Management Studio window):

   1:  DECLARE Employee_Cursor CURSOR FOR
   2:   
   3:  SELECT BusinessEntityID
   4:  FROM [HumanResources].[Employee];
   5:   
   6:  SELECT properties
   7:  FROM sys.dm_exec_cursors(52);
   8:   
   9:  OPEN Employee_Cursor;
  10:   
  11:  DECLARE @BusinessEntityID int;
  12:   
  13:  FETCH NEXT FROM Employee_Cursor
  14:  INTO @BusinessEntityID;
  15:   
  16:  WHILE @@FETCH_STATUS = 0
  17:  BEGIN
  18:     
  19:      FETCH NEXT FROM Employee_Cursor
  20:      INTO @BusinessEntityID;
  21:   
  22:  END
  23:   
  24:  CLOSE Employee_Cursor;

Now what kind of locking behavior did I see?  Using extended events and tracking lock_acquired and lock_released, I saw 2,340 events total.  Below is just a bit of what I saw – with the IS mode acquired for object, then IS mode lock for page, then S lock mode for the key, released for that key, acquired for the next key, released, and then released for the page and object and then starting all over again until reaching the final row of the table.

SNAGHTML1610fa0

I also included a call to sys.dm_exec_cursors for my test session id (52).  This returned the following property value:

TSQL | Dynamic | Optimistic | Global (0)

What about using SCROLL_LOCKS? (TSQL | Dynamic | Scroll Locks | Global (0)).  Here is the slightly modified T-SQL code and abridged result set:

   1:  DECLARE Employee_Cursor  CURSOR SCROLL_LOCKS
   2:  FOR
   3:   
   4:  SELECT BusinessEntityID
   5:  FROM [HumanResources].[Employee];
   6:   
   7:  SELECT properties
   8:  FROM sys.dm_exec_cursors(55);
   9:   
  10:  OPEN Employee_Cursor;
  11:   
  12:  FETCH NEXT FROM Employee_Cursor;
  13:   
  14:  WHILE @@FETCH_STATUS = 0
  15:  BEGIN
  16:     
  17:      FETCH NEXT FROM Employee_Cursor;
  18:   
  19:  END
  20:   
  21:  CLOSE Employee_Cursor;
  22:  DEALLOCATE Employee_Cursor;

SNAGHTML1712dcb

Now we see IX (object), IU (page) and U (key) modes enter the mix.  By the way, the 1237579447 is associated with the Employee table.  The 72057594045136896 value is the container_id – which is a type “1” (IN_ROW_DATA), so we can cross reference to sys.partitions.hobt_id, which then correlated to the Employee table.

Are the locks necessary? Maybe, depending on what you wish to achieve, but I find that often the cursor settings are not configured intentionally. 

As an exercise, you can check sys.dm_exec_cursor to see cursors running on your server at that moment in time.  This DMF will surface the session IDs, cursor name, properties and the associated sql_handle.  You may see settings that are unexpected or are overkill for what is actually required.

Categories:
Performance

Last November I blogged about how index usage stats don’t get updated when the associated index statistics (but not index) are used.

This post will describe another scenario that you should be aware of (the topic came up today in class while Kimberly was teaching – as we were trying to recall tricks to clearing stats for sys.dm_db_index_usage_stats)…

Imagine that I’ve queried a specific table as follows:

SELECT member_no, lastname, firstname, middleinitial, street, city, state_prov, country
FROM dbo.member
WHERE member_no = 1;

If I check sys.dm_db_index_usage_stats for any reference to the member table, I’ll see the following:

SELECT i.index_id, i.name, u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
     u.object_id = i.object_id AND
     u.index_id = i.index_id
WHERE u.object_id=object_id('dbo.member')

This returns:

image

Now let’s say that you have a weekly rebuild of specific indexes (for example):

ALTER INDEX member_ident
ON dbo.member REBUILD

If I check for usage stats after rebuilding the query (and before anyone has accessed the member table specifically) – the stats have been cleared out for that table. 

image

Why does this matter? 

If you’re using the sys.dm_db_index_usage_stats to determine which indexes should be removed, you’re running the risk of making decisions based on recently cleared out statistics.  This is similar to the case where a SQL Server instance has been recently restarted.  You should not be dropping indexes without knowing whether the accumulated statistics represent the full set of critical workloads. 

For tables with frequent index rebuilds, be sure to capture data from sys.dm_db_index_usage_stats before these jobs run.  This DMV is definitely a useful tool, but if you’re not careful, you could be dropping indexes based on missing information.

A few other noteworthy items:

  • Rebuilding an index only clears the stats for the index involved.  For example – if I have two rows in the DMV for two different indexes on the same object, rebuilding one index will only clear stats for that object.
  • Reorganizing the index does NOT clear the stats from sys.dm_db_index_usage_stats.
  • This is not the only circumstance where the stats will clear.
  • I have only tested this recently on SQL Server 2008 R2 and SQL Server 2012.  At one point when I was at Microsoft I blogged about seeing the stats persist after a REBUILD (more in the context of REBUILD itself – and I had mentioned the DMV in passing).  Which makes me think that perhaps this behavior wasn’t always the case.  If you’ve found a different behavior on different versions, please post your comment here.

Thanks!

Categories:
indexing | Performance

Let’s say you are querying a partitioned table and you would like to see which partitions were accessed by looking at the graphical execution plan:

SNAGHTML59cf4d6

“Actual Partition Count” shows a value of 1 and “Actual Partitions Accessed” shows a value of 50.  The “Actual Partitions Accessed” property name could cause confusion though, since what you’re actually looking at is the partition numbers accessed (not the count of partitions accessed). 

I prefer the XML naming convention instead:


  <PartitionsAccessed PartitionCount="1">
    <PartitionRange Start="50" End="50" />
  </PartitionsAccessed>

The name mapping is as follows from Graphical-to-XML Plan formats:

“Actual Partition Count” = PartitionsAccessed element & PartitionCount attribute

“Actual Partitions Accessed” = PartitionRange

If I modify the query to access two partitions, I see the following (graphical and XML plan output):

image

<PartitionsAccessed PartitionCount="2">
               <PartitionRange Start="1" End="1" />
               <PartitionRange Start="50" End="50" />
</PartitionsAccessed>

And here is an example accessing all partitions in the table:

image
 

<PartitionsAccessed PartitionCount="63">
    <PartitionRange Start="1" End="63" />
  </PartitionsAccessed>

Once you realize the mapping, its no big deal to understand what’s going on, although I do see it causing confusion (hence this blog post).

Categories:
Performance | Execution Plan

Yesterday I was working on a SSMS SQLCMD-mode script and I noticed that periodically my session_id changed across test executions.  I was tracking wait statistics based on specific session IDs, so I had to periodically add new session ids to my extended events session definition to make sure I captured my session’s wait stats accordingly. It was an isolated instance, so I could rely on the various session ids only being associated with my test.

It got me thinking about capturing session_id and scheduler assignment using SQLCMD mode. I wasn’t so much interested in the new session ids being used, but rather, I was interested in where my session ids were landing from a scheduler perspective. Certainly you could test this through an application, PowerShell script, etc. But SQLCMD mode offered a quick way to do this and I thought I would explore further. 

There are a few scenarios that I thought would be interesting, but for now I’ll just show one of them. For this scenario I had SQL Server instance (10.50.2500) with four visible schedulers and without multiple NUMA nodes (stay tuned on that for another time).  I started off by creating a table in tempdb to track session_id, scheduler_id, parent_node_id, cpu_id and a counter of connects:

:CONNECT CAESAR

USE tempdb;

CREATE TABLE [dbo].[SQLCMDRoundRobin](

    [sqlcmdroundrobin_id] [int] NOT NULL PRIMARY KEY CLUSTERED IDENTITY(1,1),

      [session_id] [smallint] NOT NULL,

      [scheduler_id] [int] NOT NULL,

      [parent_node_id] [int] NOT NULL,

      [cpu_id] [smallint] NOT NULL,

      [connect_counter] [int] NOT NULL

) ON [PRIMARY];

GO

Before I executed my test, I launched a separate background set of query executions (looping) which put the system under CPU pressure across all schedulers.

clip_image001[4]

Next, I connected using SQLCMD mode and used MERGE to INSERT/UPDATE specific session_id/scheduler_id combos.  I used “GO 10000” to connect-and-execute 10,000 times:

:CONNECT CAESAR

USE tempdb;

SET NOCOUNT ON;

MERGE  dbo.SQLCMDRoundRobin AS target

USING (

      SELECT      r.session_id,

                  s.scheduler_id,

                  s.parent_node_id,

                  s.cpu_id

      FROM sys.dm_exec_requests AS r

      INNER JOIN sys.dm_os_schedulers AS s ON

            r.scheduler_id = s.scheduler_id

      WHERE r.scheduler_id IS NOT NULL AND

            r.session_id = @@SPID) AS source

                  (session_id, scheduler_id, parent_node_id, cpu_id)

ON (target.session_id = source.session_id AND

    target.scheduler_id = source.scheduler_id)

WHEN MATCHED THEN

      UPDATE SET connect_counter = target.connect_counter  + 1

WHEN NOT MATCHED THEN

      INSERT (session_id, scheduler_id, parent_node_id, cpu_id, connect_counter )

      VALUES (source.session_id, source.scheduler_id,

             source.parent_node_id, source.cpu_id,  1);

GO 10000

Then I executed the following query to look at the distribution based on every SQLCMD connect:

:CONNECT CAESAR

USE tempdb;

SELECT session_id, scheduler_id, parent_node_id, cpu_id, connect_counter

FROM [dbo].[SQLCMDRoundRobin];

GO

This returned:

clip_image003[4]

Removing all concurrent activity, I saw the following distribution instead:

clip_image005[4]

I also tested the workload and CPU driving background activity with the processor affinity mask explicitly set, and the distribution was as follows:

clip_image007[4]

I’d like to apply this framework to more scenarios – including for those on NUMA systems and also against a wider variety of workloads, since this has an impact on how the sessions will get distributed.  When I get the chance, I’ll write about the observations here, but I thought I would share the capturing method in the meantime.

Categories:
Performance

You may have already heard that SQL Server 2012 adds automatic generation of statistics for AlwaysOn availability group readable secondaries, storing the statistics in tempdb.  This functionality was also extended to other areas beyond just AlwaysOn availability groups as I’ll demonstrate next…

Using the AdventureWorksDWDenali database, I’ll start by checking the current statistics on the FactInternetSales table:

USE [AdventureWorksDWDenali];

GO

-- Confirm statistics on FactInternetSales

EXEC sp_helpstats 'FactInternetSales';

/*

statistics_name statistics_keys

_WA_Sys_00000008_1273C1CD  SalesTerritoryKey

_WA_Sys_0000000A_1273C1CD  SalesOrderLineNumber

_WA_Sys_0000000B_1273C1CD  RevisionNumber

_WA_Sys_0000000C_1273C1CD  OrderQuantity

_WA_Sys_0000000D_1273C1CD     UnitPrice

_WA_Sys_0000000E_1273C1CD  ExtendedAmount

_WA_Sys_0000000F_1273C1CD  UnitPriceDiscountPct

_WA_Sys_00000010_1273C1CD  DiscountAmount

_WA_Sys_00000011_1273C1CD  ProductStandardCost

_WA_Sys_00000012_1273C1CD  TotalProductCost

_WA_Sys_00000013_1273C1CD  SalesAmount

_WA_Sys_00000014_1273C1CD  TaxAmt

_WA_Sys_00000015_1273C1CD  Freight

_WA_Sys_00000016_1273C1CD  CarrierTrackingNumber

_WA_Sys_00000017_1273C1CD  CustomerPONumber

_WA_Sys_00000018_1273C1CD  OrderDate

_WA_Sys_00000019_1273C1CD  DueDate

_WA_Sys_0000001A_1273C1CD  ShipDate

*/

Next, I’m going to add a UnitPrice value to an existing row (creating just one row out of the table with this value):

UPDATE TOP (1) dbo.FactInternetSales

SET UnitPrice = 777.77;

After that, I’ll drop the statistics for the UnitPrice column.  Why?  I want demonstrate a scenario where automatic statistics may not have already been generated due to incoming query requests – but may still be needed for queries against the database snapshot:

-- UnitPrice stats

DROP STATISTICS FactInternetSales._WA_Sys_0000000D_1273C1CD;

GO

Now I’ll create a database snapshot on the database:

-- Create the database snapshot

CREATE DATABASE AdventureWorksDWDenali_AWDW_S1

ON

( NAME = AdventureWorksDWDenali_Data,

FILENAME =

'C:\Program Files\Microsoft SQL Server\MSSQL11.CAESAR\MSSQL\Data\AdventureWorksDWDenali_AWDW_S1.ss' )

AS SNAPSHOT OF AdventureWorksDWDenali;

GO

What happens if I use the database snapshot to query for a specific UnitPrice value?  There are no statistics on the read-write database and now that I’m using the database snapshot, I’m not triggering automatic-statistics on the source database:

USE AdventureWorksDWDenali_AWDW_S1;

GO

SELECT ProductKey, ProductStandardCost

FROM dbo.FactInternetSales

WHERE UnitPrice = 777.77;

Looking at the actual plan for this query, I see that the estimated rows vs. actual do indeed match (and as an aside – the optimizer “guess” for this predicate would have been 3,852.72 without the supporting stats):

clip_image001

But I can also check to see if there are automatic statistics generated by qualifying is_temporary = 1:

SELECT name, stats_id

FROM sys.stats

WHERE OBJECT_ID = object_id('FactInternetSales')

     AND is_temporary = 1;

This returns:

clip_image003

The stats name even gave me an additional hint of its origins via the “readonly” naming convention.  If I look up the column for the automatically generated temporary stats, I see the following:

SELECT     object_name(object_id) tblname,

           COL_NAME(object_id, column_id) columnname

FROM sys.stats_columns

WHERE stats_id = 512000;

clip_image005

If you need a supporting index, you still need to provide this structure in the read-write database, but when a query can be optimized using statistics alone - this is definitely a welcome new feature.

I gave a presentation on columnstore indexing last week and one question I received was regarding the performance of a (hypothetical) narrow, supporting nonclustered index versus a columnstore index.  We discussed how nonclustered indexes were still going to be more effective for specific types of queries (for example – singleton operations, smaller range scans, etc.).  But what about a situation where we’re aggregating data across the entire fact table? Was there any way that a narrow nonclustered index could measure up against a columnstore index?

The ability to contrast the performance is of course very much dependent on the query construction, table design, data distribution and other factors.  With that said, I thought I would test out a scenario as follows…

First of all, I used FactInternetSales - pumped up to 123,695,104 rows.  The query I used for comparing performance was as follows (with a few hint-modifications to force index and parallelism caps that I used later):

SELECT     p.ProductLine,

           SUM(f.SalesAmount) TotalSalesAmount

FROM [dbo].[FactInternetSales] f

INNER JOIN [dbo].[DimProduct] p ON

     f.ProductKey = p.ProductKey

GROUP BY p.ProductLine

ORDER BY p.ProductLine;

 

I created the following nonclustered indexes (index on joining keys – and then an INCLUDE on the SalesAmount which is being aggregated):

CREATE NONCLUSTERED INDEX [NCI_FactInternetSales_ProductKey]

ON [dbo].[FactInternetSales]

([ProductKey] ASC)

INCLUDE ( [SalesAmount]) ON [PRIMARY];

GO

 

CREATE NONCLUSTERED INDEX [NCI_DimProduct_ProductKey_ProductLine] ON [dbo].[DimProduct]

(    [ProductKey] ASC,

     [ProductLine] ASC

)ON [PRIMARY];

GO

 

I also created a columnstore index that covered all columns in the fact table.  To force my query to ignore that index for the comparison test, I added the following line:

OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

 

So how did the performance compare between a supporting nonclustered index versus the columnstore index? (By the way, I used a warm cache for both tests – executing twice):

Index Support

Elapsed Time

CPU Time

Columnstore

256 ms.

1045 ms.

Nonclustered index

34189 ms.

33322 ms

 

The columnstore index plan was as follows:

clip_image002

The nonclustered index plan was as follows:

clip_image004

One obvious difference is that the columnstore index execution leveraged parallelism.  The query also benefited from batch execution mode:

clip_image006

If I hobble the columnstore index by capping max degree of parallelism, the results (in comparison) are as follows (third row represents the test):

Intended Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

 

Looking at the plan, we see the columnstore index wasn’t used at all:

clip_image008

If I force use of the columnstore index with MAXDOP 1, I see:

Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

Columnstore index (MAXDOP 1) – forced columnstore index

29859 ms.

30107 ms.

Row

 

The plan shows the index is used, but it is row mode (due to capping parallelism):

clip_image010

clip_image012

Looping back to the inciting discussion – even with a supporting nonclustered index, this specific query which scanned all 123 million rows and returned a smaller result set with aggregated values performed significantly better when using a columnstore index in conjunction with batch execution mode.

When moving from SQL Server 2008+ to SQL Server 2012, be aware that the sampling algorithm has changed.

For example, I created a partitioned clustered index on two identical tables (same schema, rows and distribution), one in SQL Server 2008 R2 SP1 and the other in SQL Server 2012 RC0. 

Below shows the partial output (STAT_HEADER) of a DBCC SHOW_STATISTICS on SQL Server 2008 R2 SP1:

clip_image001

All rows were evaluated for the statistics generation.

Now compare this to SQL Server 2012 RC0:

clip_image002

SQL Server 2012 now uses default sampling for the partition creation and rebuild.

Is this good or bad?

Certainly if you’ve required a full scan in the past in order to get higher quality stats, this may not be a good thing.  Otherwise you may not notice anything at all from a query performance perspective if the default is good (or accurate) enough.

You can see the number of steps decreased in my examples in this post between 2008 R2 SP1 to SQL Server 2012.  My histograms for this identical table also changed of course.  For example, moving from average rows on one step of 5504 down to 5305, distinct range rows from 343 up to 361... The real question is, will these changes be enough to turn a good plan bad. 

Post-upgrade, if you see changes in query performance against partitioned tables, evaluate the histograms and test performance of the workload after the default sampling versus after a full scan update of the statistics (via UPDATE STATISTICS for example).  

I was interested in exploring various questions about columnstore indexing this morning – and I realized I should probably blog about what I observed.  This truly was just an exploration and not a formal test (call it semi-structured fiddling).  Also, some of my “questions” were really just confirmations.  While I believe what I read, I like to see things firsthand whenever possible.

Today I thought I would take a few minutes to explore columnstore index behavior on a 123,695,104 row fact table.  My previous tests had been limited to < 10 million rows, which doesn’t really showcase columnstore index capabilities.  So I thought I would experiment with higher numbers – although I still want to start the billion row testing scenarios at some point.

The experimentations took place on a SQL Server instance capped at 8GB of RAM max server memory and 8 logical processors.  I used the FactInternetSales table from AdventureWorksDWDenali – increasing it to 123,695,104 rows. 

I covered the entire fact table for my initial exploration (although as you’ll see, I made a few changes both to this index and also moving from a heap table to a clustered index):

CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex-20120225-092018] ON [dbo].[FactInternetSales]

(

       [ProductKey],

       [OrderDateKey],

       [DueDateKey],

       [ShipDateKey],

       [CustomerKey],

       [PromotionKey],

       [CurrencyKey],

       [SalesTerritoryKey],

       [SalesOrderNumber],

       [SalesOrderLineNumber],

       [RevisionNumber],

       [OrderQuantity],

       [UnitPrice],

       [ExtendedAmount],

       [UnitPriceDiscountPct],

       [DiscountAmount],

       [ProductStandardCost],

       [TotalProductCost],

       [SalesAmount],

       [TaxAmt],

       [Freight],

       [CarrierTrackingNumber],

       [CustomerPONumber],

       [OrderDate],

       [DueDate],

       [ShipDate]

) WITH (DROP_EXISTING = OFF)

GO

Remember that we can have up to 1,024 columns and that the concept of key columns and INCLUDE don’t apply here.  Nor does the ASC or DESC keywords or creation of the columnstore index as clustered (only nonclustered is supported at this point). 

Regarding the data types, we can cover the standard “business” ones – but not the “max”, LOB and xml flavors.

<quick aside> By the way – a shout out to Eric Hanson’s Columnstore Index references on the TechNet Wiki.  These have been invaluable in understanding columnstore indexing and also revealing tools to help fish for your own answers.  I love seeing this kind of high quality content coming directly from the Program Managers.  I really hope that more PMs start updating their associated feature areas on TechNet Wiki in the future.  When I first heard about TechNet Wiki I was more than a little dubious.  We already have blogs, forums, BOL, papers, videos, etc.  Why do we need another information channel?  Seeing Eric’s content has totally shifted my opinion on this.  The information is timely, fresh (you see it get updated frequently) and easily discoverable.  It also allows for community participation and questions – which, if managed well – only helps the associated quality of the content.   I don’t believe TechNet Wiki replaces BOL or other information channels, but I feel it has definitely earned a seat at the table.  Again, I hope other PMs follow this lead – or alternatively use their blogs to heavily describe their feature areas (much like the great blog post series on the AlwaysOn Readable Secondary feature from Sunil Agarwal).   Just doing periodic updates on their pet features can help do wonders for feature adoption and comprehension. </quick aside>

Back on task… The following are the various questions I asked and the various observations made….

Was tempdb used in creating the columnstore index on a heap table?

I intentionally had tempdb configured to an initial small size with auto-growth enabled in order to see if it was used at all during the CREATE COLUMNSTORE INDEX against the 123 million row table (I could have looked through counters and other ways too – this was just a side effect observation). The answer? It was not used (staying at a very small value of 10 MBs).

How do things look in sys.indexes and sys.index_columns?

Nothing terribly interesting - except that the is_included_column is a value of “1”:

clip_image002

Anything interesting in sys.column_store_dictionaries?

Most definitely… This was a pretty illuminating catalog to query.  A few of the observations:

·        There were 149 rows across the 26 columns defined in the index.

·        25 columns had one row – and 1 column had 125 rows.  Not surprisingly – the SalesOrderNumber was the column with 125 entries.  This is also the column I populated with a random GUID that I converted to nvarchar(20) for new entries to the table.  The values for this column, while not constrained, were nearly unique in my test (with the exception of the base data).

·        The on disk (dictionary) storage for SalesOrderNumber was by far the worst (not surprisingly).  It was 2,328,411,337 in bytes.  Compare that with 21,384 bytes for an integer column, 74,584 bytes for a datetime column and 1,088 bytes for a money data type column.  So this would point to a design decision around not including 100% unique values as part of the index – or if you do, choosing a much more compression friendly data type mapped to an associated dimension table.

·        The entry_count column_store_dictionaries column was also very interesting.  For example, the int data type ProductKey had 158 entries.  The ShipDate datetime column had 1,126 entries.  The SalesOrderNumber had 123,662,365 entries. No magic here (and the difference between those entries and the 123,695,104 row count was that the initial data populated didn’t use my random GUID values).

·        The type column from the results told me about the type of dictionary… For the columns with “1”, that maps to hash dictionary for integer based columns.  A type of “4” represented my float columns. Type “3” represented the string values.

·        The flags (“internal use only”) column from column_store_dictionaries was “0” for all columns except for UnitPriceDiscountPct (float), DiscountAmount (float), OrderDate/DueDate/ShipDate (datetime).  The flags column was “2” for the CustomerPONumber (nvarchar(25)) and CarrierTrackingNumber (nvarchar(25)).  Flags was 0 for everything else, so I’m definitely curious about this.

Before I continued, I dropped and recreated the columnstore index, this time without the SalesOrderNumber.    It wasn’t going to provide any value for my next set of explorations.

With the SalesOrderNumber column removed, leaving the other columns on the columnstore index, what was the on-disk footprint?

310,400 bytes for the dictionary size – for 123 million rows (per sys.column_store_dictionaries).  I double checked sys.dm_db_partition_stats as well and the lob_used_page_count was 304,579.  So the page count is around 2.3 GB – and the dictionary size is 310 KB.  Again, this is for 123,695,104 rows.  The heap itself is 2,876,461 pages – or roughly 22.4 GB.

Anything interesting in sys.column_store_segments?

Another great catalog view to explore with many interesting ways to slice and dice this data:

·        All 26 columns had 120 segments associated with them. But what is interesting is that this is for 26 columns – even though my columnstore index was recreated with just 25 columns.  So it was segmenting based on each column of the heap (seemingly).  If you enable trace flag 646 and 3605, you’ll see during index creation a message like “Column 25 was omitted from VertiPaq during column index build.” So it made me wonder if this really did represent 120 segments for the omitted column?

·        The average row count per segment was 1,030,792 rows and a maximum of 1,048,576. 

·        The minimum rows for a segment was 2,416. 

·        The 2,416 row count was for all columns on segment number 118. This was not the last segment.  Segment 119 was the last segment (first segment is 0-based) for each column and each one of these had 1,040,352.  Eric Hanson gave an explanation on a forum which talks about segments being built on parallel threads and when hitting the end of the table, not having enough rows to completely fill segments.

·        As for encoding_type, that was also interesting.  Some columns had both an encoding_type of 1 AND 2.  Only the first five columns of the table had an encoding type of “1” and then columns 1 through 25 had an encoding type of “2”.  Columns 21 and 22 (Freight and CarrierTrackingNumber) had an encoding of 3.  Freight is money data type and CarrierTrackingNumber is nvarchar(25).  Column 26 (ShipDate datetime) had an encoding type of “4”.  In BOL – encoding_type is defined as “type of encoding used for that segment”.  Sigh.

·        The min_data_id and max_data_id value rangers were also interesting.  Sometimes they reflected the actual column data ranges (for example in the date columns) and sometimes not.  Looking at “Understanding Segment Elimination” – Eric Hanson describes this behavior – saying that the values in these ranges can also reference “into a dictionary”.

Anything interesting in sys.column_store_index_stats?

The only noteworthy value was the number_of_segments (3,120).   

Can I see some segment elimination for my queries? (for non-string types on a heap table)

Remember that I created a columnstore index on a heap.  Let’s say we look at the segment ranges for the ProductKey column (column id 1).  Here are the associated ranges:

SELECT column_id, min_data_id, max_data_id, segment_id

FROM sys.column_store_segments

WHERE column_id = 1

ORDER BY column_id, min_data_id, max_data_id, segment_id;

All 120 segments have identical min and max ranges:

clip_image003

What about other columns – like the DueDate (datetime)?

clip_image004

Same thing applies – each segment, with the exception of a trailing row, have the same ranges. 

What about segment elimination on a table with a clustered index?

So next I dropped the columnstore index, and created a clustered index on ProductKey.  I’m not saying this is a great choice for the clustered index key – but rather, I’m just trying to understand the behavior in comparison to a heap and also look at any potential segment elimination.

In the following query, I’m pulling total order quantity for three products:

-- Segment elimination written to error log

DBCC TRACEON(3605, -1);

GO

DBCC TRACEON(646, -1);

GO

SELECT ProductKey,

             [DueDateKey],

             SUM(OrderQuantity) Total_OrderQuantity

FROM [dbo].[FactInternetSales]

GROUP BY ProductKey, DueDateKey

HAVING ProductKey IN

             (478, 343, 574);

I first validated that batch mode was indeed being used:

clip_image005

I then looked at the SQL Error Log (per the trace flag) – and I saw the storage engine skipped 28 “row group” (segments):

clip_image007

If I look at the segment meta data though for column_id 1, and I see blocks of segments covering the same ranges (instead of the same ranges across all segments):

SELECT segment_id, min_data_id, max_data_id

FROM sys.column_store_segments

WHERE column_id = 1

ORDER BY segment_id

clip_image009

By the way, the 28 segment elimination I saw in the error log for column 1 translated to 27,226,112 rows across (row_count from sys.column_store_segments).  The query itself takes less than 1 second to return 569 rows – from a 120 million+ row table. 

Will I get segment elimination on other columns not part of the clustered index key?

I tried the following query that filtered on DueDateKey:

SELECT ProductKey,

             [DueDateKey],

             SUM(OrderQuantity) Total_OrderQuantity

FROM [dbo].[FactInternetSales]

GROUP BY ProductKey, DueDateKey

HAVING DueDateKey = 20040522

Nope.  While the query still ran in less than a second – using batch instead of row mode for the columnstore index scan, segment elimination event did not occur.  Looking at the min and max ranges from sys.column_store_segments helps answer why segment elimination wasn’t possible:

SELECT column_id, min_data_id, max_data_id, segment_id

FROM sys.column_store_segments

WHERE column_id = 3 AND

       20040522 BETWEEN min_data_id AND max_data_id

This returns all 120 rows:

clip_image011

What if the DueDateKey is a secondary key column on the clustered index? 

I dropped the columnstore index and recreated the clustered index with ProductKey and DueDateKey.  I then recreated the columnstore index.

Checking the DueDateKey range in sys.column_store_segments, I see the ranges are no longer identical:

clip_image013

That seemed promising, so I executed the query filtering by DueDateKey:

clip_image015

Sure enough – I had segment elimination, even though this was defined as the second column in the clustered index key.  And what’s more, my ProductKey column segment elimination was still working (I tested the earlier queries again).  This raises some interesting questions around new clustered index key strategies when you know your table will primarily be used with a columnstore index.

Okay – that’s enough for today.  I still have many other scenarios I’d like to try out, and I’ll share here when I get the opportunity.

This post idea was prompted by a discussion I had this week with Jonathan Kehayias about an environment that had multiple transactional replication publications defined with overlapping table articles. 

In other words, a table was defined as an article in more than one publication.

While I can think of some cases where you would want to leverage different article options or filters, in this particular case the articles had no differences in how they were defined.  I’ve seen this in other environments in the past – and as I recall it wasn’t a conscious decision, but rather a lack of coordination across application teams and projects.

For small databases with lower volumes of modifications, this overlap could likely go unnoticed.   For larger tables with high amounts of data modifications, well, consider the following scenario:

·        You have two transactional replication publications that each reference the same table as an article.  No other article properties are changed between the two publications and articles.

·        Each publication maps to a single subscriber.

·        Your table article setting for this scenario use the default - propagating INSERTs, UPDATEs and DELETEs via the default statement delivery method (spMSins_ / sp_MSupd_ sp_MSdel) etc. (And while we are propagating changes made directly to the table, we’re not using stored procedure execution articles.)

So let’s say we execute the following single statement batch update against the redundantly published table.  This is one statement that updates 3,120 rows:

UPDATE dbo.charge

SET charge_amt = charge_amt * .97

WHERE provider_no = 386;

If we used sp_replcmds in the publisher database (I had the log reader agent job stopped in order to step through the scenario), how many command transactions would you expect to see marked for replication?

The answer is – 6,240.  One call per row updated, multiplied by two separate publications (and we’re still only in the publication database):

clip_image002

And as you may expect, those 6,240 rows move on to the distribution database (you can validate via  sp_browsereplcmds or MSrepl_commands):

clip_image004

Now had you instead just created ONE publication with that article sent to the two different subscribers, you would see just 3,120 in the publication database for the original update – and 3,120 as well at the distributor prior to multicasting the update to the two subscribers.

Coupled with the already “chatty” nature of transactional replication – you can imagine scenarios where performance rapidly degrades for large batch updates, particularly on already-constrained topologies.

Categories:
Performance | Replication

Theme design by Nukeation based on Jelle Druyts