Querying the VMware vCenter Database (VCDB) for Performance and Configuration Information

In my previous post I showed how DBAs can be provided with read-only access to Virtual Center for monitoring VM performance in the data center.  One thing that many administrators don’t realize is that Virtual Center uses a database for storing information about the virtual data center, and the most common database platform used for this purpose is SQL Server.  Virtual Center can use an existing SQL Server in the environment, or it can install SQL Server Express during setup.

As a consultant, I do a lot of reviews of SQL Servers running on VMware, and as part of that work I have to understand the underlying virtualization configuration – not just for SQL Server, but for the virtualized data center.  This includes information about the host configuration, the other VMs running on the host, and performance metrics from the historical VM data stored in the vCenter database, which is named VIM_VCDB by default.
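
If you’re not sure whether vCenter is storing its data on a SQL Server instance you manage, a quick check of sys.databases is a reasonable first step (a minimal sketch; VIM_VCDB is only the default name and may have been changed during setup):

-- Check for the vCenter database on the instance (VIM_VCDB is the default name)
SELECT name, create_date
FROM sys.databases
WHERE name LIKE N'VIM%';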

The VIM_VCDB database isn’t entirely documented, but a list of the views and their definitions can be found in the Using VirtualCenter Database Views technical note from VMware.  Using this information, and additional information collected using a server-side trace against the database while working in vCenter, I’ve compiled a series of Transact-SQL scripts for pulling information from the VIM_VCDB database in an environment.
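
As a starting point for exploring the database yourself, you can enumerate the VPXV_* views that the technical note documents (a simple sketch against the default database name):

USE [VIM_VCDB];
GO
-- List the public views exposed in the vCenter database
SELECT name
FROM sys.views
WHERE name LIKE N'VPXV%'
ORDER BY name;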

The first time I ever demonstrated these scripts was last week during the Virtualization module of our IE3: High Availability and Disaster Recovery course in Chicago.  I’m also using these scripts in the Pluralsight course on SQL Server Virtualization that I am currently recording.  When I demonstrated the scripts last week, the first question the class asked was where they could get them.  We always provide the scripts from our demos to students, but I also promised to blog them this week.

Host Configuration

The VPXV_HOSTS view provides host-level information, and I commonly JOIN it to the VPXV_VMS view to aggregate VM information per host as part of an environment health check.

-- Host Configuration
SELECT
    vh.NAME AS HOST_NAME,
    HOST_MODEL,
    CPU_MODEL,
    CPU_COUNT,
    CPU_CORE_COUNT,
    CPU_HZ,
    CPU_THREAD_COUNT,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END) AS VM_VCPU_ACTIVE,
    MEM_SIZE,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END)*1./CPU_THREAD_COUNT AS THREAD_OVERCOMMIT,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END)*1./CPU_CORE_COUNT AS CORE_OVERCOMMIT,
    CAST(MEM_SIZE AS BIGINT)/1024/1024 AS MEM_SIZE_MB,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.MEM_SIZE_MB ELSE 0 END) AS VM_MEM_SIZE_MB,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.MEM_SIZE_MB ELSE 0 END)*1./(CAST(MEM_SIZE AS BIGINT)/1024/1024) AS MEM_OVERCOMMIT,
    SUM(CAST(vm.MEMORY_OVERHEAD AS BIGINT)) AS VM_MEMORY_OVERHEAD,
    SUM(vm.MEM_SIZE_MB) AS VM_MEM_SIZE_MB_POTENTIAL,
    SUM(vm.NUM_VCPU) AS VM_VCPU_ALLOC_POTENTIAL,
    NIC_COUNT,
    HBA_COUNT,
    SUM(CASE WHEN vm.VMMWARE_TOOL = N'OK' THEN 1 ELSE 0 END) AS VM_TOOLS_OK,
    SUM(CASE WHEN vm.VMMWARE_TOOL = N'Old' THEN 1 ELSE 0 END) AS VM_TOOLS_OUT_OF_DATE,
    SUM(vm.NUM_VCPU) AS VM_VCPU_ALLOC
FROM dbo.VPXV_HOSTS AS vh
INNER JOIN dbo.VPXV_VMS AS vm
    ON vh.HOSTID = vm.HOSTID
GROUP BY vh.NAME, HOST_MODEL, CPU_MODEL, CPU_COUNT, CPU_CORE_COUNT, CPU_HZ,
    CPU_THREAD_COUNT, MEM_SIZE, NIC_COUNT, HBA_COUNT;

Performance Counter Information

Performance counters are maintained historically in the VIM_VCDB database at multiple roll-up levels. The VPXV_HIST_STAT_DAILY view has the daily roll-up values for each of the VMs and the VPXV_HIST_STAT_WEEKLY view has the weekly roll-up values.
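
If you want to see which counter groups and counter names are available in your environment before writing queries, a quick exploratory query against the daily roll-up view works well (a minimal sketch using the same columns the scripts below rely on):

-- Explore the counter groups and names captured in the daily roll-up
SELECT DISTINCT STAT_GROUP, STAT_NAME
FROM dbo.VPXV_HIST_STAT_DAILY
ORDER BY STAT_GROUP, STAT_NAME;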

Querying individual counter values and making decisions is not appropriate for all counters. An example of this is the CPU Ready counter which has to be calculated from the roll-up summation to determine if a problem actually exists (see CPU Ready Time in VMware and How to Interpret its Real Meaning). Two counters that I look at for the virtual data center and all of the VMs are CPU Ready and Memory Ballooned (vmmemctl).
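
As a worked example with illustrative numbers: for a 4 vCPU VM with a 300-second sample interval and a CPU Ready summation of 90,000 ms, the calculation works out to 7.5% ready time, which falls into the WARN range used in the queries below:

-- CPU Ready % = (summation_ms / (vCPUs * interval_seconds * 1000)) * 100
SELECT (90000 / (4 * 300.0 * 1000)) * 100 AS READY_PERCENT; -- returns 7.5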

CPU Ready Values

-- Daily %RDY values
SELECT
    vh.NAME AS HostName,
    vv.NAME AS GuestName,
    SAMPLE_TIME,
    SAMPLE_INTERVAL,
    (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 AS READY_PERCENT,
    CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 >= 10 THEN N'RED'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5 THEN N'WARN'
        ELSE N'OK'
    END AS CPURDY_State,
    STAT_VALUE AS CPUReady_Summation,
    NUM_VCPU
FROM dbo.VPXV_HIST_STAT_DAILY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR(10))
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
WHERE STAT_GROUP = N'cpu'
  AND STAT_NAME = N'ready'
  AND CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 >= 10 THEN N'RED'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5 THEN N'WARN'
        ELSE N'OK'
        END <> N'OK'
ORDER BY CPURDY_State, READY_PERCENT DESC;

-- Weekly %RDY values
SELECT
    vh.NAME AS HostName,
    vv.NAME AS GuestName,
    SAMPLE_TIME,
    SAMPLE_INTERVAL,
    (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 AS READY_PERCENT,
    CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 >= 10 THEN N'RED'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5 THEN N'WARN'
        ELSE N'OK'
    END AS CPURDY_State,
    STAT_VALUE AS CPUReady_Summation,
    NUM_VCPU
FROM dbo.VPXV_HIST_STAT_WEEKLY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR(10))
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
WHERE STAT_GROUP = N'cpu'
  AND STAT_NAME = N'ready'
  AND CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 >= 10 THEN N'RED'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5 THEN N'WARN'
        ELSE N'OK'
        END <> N'OK'
ORDER BY CPURDY_State, READY_PERCENT DESC;

Memory Ballooning Counters

-- Weekly Memory Ballooned
SELECT
    vh.NAME AS HostName,
    vv.NAME AS VMName,
    Start,
    Finish,
    tab.SAMPLE_INTERVAL,
    MIN(STAT_VALUE)/1024. AS MinBallooned_MB,
    MAX(STAT_VALUE)/1024. AS MaxBallooned_MB,
    AVG(STAT_VALUE)/1024. AS AvgBallooned_MB,
    COUNT(*) * (tab.SAMPLE_INTERVAL/60) AS MinutesBallooned
FROM dbo.VPXV_HIST_STAT_WEEKLY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR(10))
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
CROSS JOIN (SELECT
                MIN(SAMPLE_TIME) AS Start,
                MAX(SAMPLE_TIME) AS Finish,
                SAMPLE_INTERVAL
            FROM dbo.VPXV_HIST_STAT_WEEKLY
            WHERE STAT_NAME = N'vmmemctl'
              AND STAT_VALUE > 0
            GROUP BY SAMPLE_INTERVAL) AS tab
WHERE STAT_NAME = N'vmmemctl'
  AND STAT_VALUE > 0
GROUP BY vh.Name, vv.Name, Start, Finish, tab.SAMPLE_INTERVAL
ORDER BY HostName, MinutesBallooned DESC;

-- Daily Memory Ballooned
SELECT
    vh.NAME AS HostName,
    vv.NAME AS VMName,
    Start,
    Finish,
    tab.SAMPLE_INTERVAL,
    MAX(STAT_VALUE)/1024. AS MaxBallooned_MB,
    AVG(STAT_VALUE)/1024. AS AvgBallooned_MB,
    COUNT(*) * (tab.SAMPLE_INTERVAL/60) AS MinutesBallooned
FROM dbo.VPXV_HIST_STAT_DAILY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR(10))
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
CROSS JOIN (SELECT
                MIN(SAMPLE_TIME) AS Start,
                MAX(SAMPLE_TIME) AS Finish,
                SAMPLE_INTERVAL
            FROM dbo.VPXV_HIST_STAT_DAILY
            WHERE STAT_NAME = N'vmmemctl'
                AND STAT_VALUE > 0
            GROUP BY SAMPLE_INTERVAL) AS tab
WHERE STAT_NAME = N'vmmemctl'
  AND STAT_VALUE > 0
GROUP BY vh.Name, vv.Name, Start, Finish, tab.SAMPLE_INTERVAL
ORDER BY HostName, MinutesBallooned DESC;

Summary

The VIM_VCDB database contains a lot more information than I’ve covered in this post.  These are simply examples of the types of information you can query from the database as part of troubleshooting problems or evaluating the configuration of a virtual data center.  Whenever you have access to this data, always validate that you understand how the values are collected and rolled up before acting on them.

Read-Only Access to Virtual Center for DBAs

Most environments using VMware for server virtualization will also have Virtual Center Server installed for administration and monitoring.  As a DBA, I always had access to Virtual Center with limited permissions.  It is incredibly easy for VM administrators to provide read-only access to Virtual Center to non-administrators so that they can monitor the performance counters for their servers.  In this post I’ll show how easy it is to configure permissions for Virtual Center, as a reference for DBAs to provide to VM administrators when requesting access.

Configuring Read-Only Access to Virtual Center

The VM administrator can provision accounts for access to vCenter in two ways: by creating named accounts with passwords directly in Virtual Center, or through Active Directory authentication.  The Permissions tab on any object in Virtual Center allows an administrator to provision access as shown below.

[Screenshot: the Permissions tab on an object in Virtual Center]

Right-clicking inside the window and choosing Add Permission will open the Assign Permissions dialog shown below.

[Screenshot: the Assign Permissions dialog]

Here an account can be selected from the Virtual Center users or through Active Directory, and then permissions can be assigned to the user or group to allow access into Virtual Center.  For simplicity, Virtual Center ships with a number of default roles that can be used to provide access with limited permissions.

Default Roles

The full definition of the default roles can be found on page 93 of the Datacenter Administrator Guide. While this guide is for vSphere 4.1, the same role configurations exist in the most recent version of Virtual Center Server.  The two key roles for DBAs are Read Only and Virtual Machine User.  As a DBA, I had Virtual Machine User access to all of my SQL Server VMs, which were grouped in a folder inside of Virtual Center to simplify permissions management.

Read Only

The Read Only role provides the bare minimum set of access to Virtual Center to allow monitoring performance information and viewing configuration information for a VM.  It provides the following level of access to Virtual Center:

  • View the state and details about the object
  • View all the tab panels in the vSphere Client except the Console tab
  • Cannot perform any actions through the menus and toolbars

Virtual Machine User

The Virtual Machine User role provides all of the access to vCenter that the Read Only role provides, but also provides the following additional permissions in vCenter:

  • Interact with a virtual machine’s console, insert media, and perform power operations
  • Does not grant privileges to make virtual hardware changes to the virtual machine
  • All privileges for the scheduled tasks privileges group
  • Selected privileges for the global items and virtual machine privileges groups
  • No privileges for the folder, datacenter, datastore, network, host, resource, alarms, sessions, performance, and permissions privileges groups

Summary

As a DBA, I would push for Virtual Machine User permissions inside of Virtual Center, but I’d want Read Only access as the bare minimum.  Configuring permissions is incredibly easy to do, and providing the ability to track performance and configuration information for VMs makes it easier to diagnose and track problems when they occur.  In the past, having the ability to control the VM power has prevented after-hours calls to VM administrators.  Additionally, having console access to the VM has allowed multiple team members to view the console remotely during troubleshooting rather than having to be in the office.  The fact that these permissions prevent configuration changes should negate any arguments against providing access to DBAs.

New Article – How expensive are column-side Implicit Conversions?

Last week I blogged about Implicit Conversions that cause Index Scans, and showed two charts for where implicit conversions could result in an index scan operation instead of an index seek during query execution.  As a part of writing that blog post I thought it would be interesting to also look at the performance overhead of column-side implicit conversions, given that I’ve seen their effects a number of times in the last few weeks.  Today my article How expensive are column-side Implicit Conversions? went live on the SQLPerformance.com site; it uses Performance Monitor data on CPU usage and batch requests per second to show the performance overhead when column-side implicit conversions cause index scans during query execution.

Implicit Conversions that cause Index Scans

Implicit conversions seem to be a growing trend in the SQL Server performance tuning work that I’ve been engaged in recently, and I’ve blogged in the past about ways to identify when implicit conversions are occurring using the plan cache.

For those of you who don’t know, implicit conversions occur whenever values of two different data types are compared, based on the data type precedence.  The precedence establishes the hierarchy of the types, and the lower precedence data type will always be implicitly converted up to the higher precedence type.  These conversions increase CPU usage for the operation, and when the conversion occurs on a table column it can also result in an index scan where an index seek would otherwise have been possible.
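
As a simple illustration (a sketch that uses the test table created later in this post): comparing a varchar column to an nvarchar literal forces the conversion onto the column side, because nvarchar has the higher precedence, and under a SQL collation that conversion produces an index scan instead of a seek:

-- The N'' literal is nvarchar, so the varchar column is implicitly converted
-- on the column side; under SQL_Latin1_General_CP1_CI_AS this results in a
-- scan of the index on VarcharCol rather than a seek.
SELECT RowID
FROM dbo.TestImplicitConverts
WHERE VarcharCol = N'SomeValue';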

For a long time I’ve wanted to map out the most common data types and whether a column-side implicit conversion results in an index seek or an index scan, and recently I finally got around to mapping it all out.

Setting up the tests

To map out the implicit conversion effects, I created two databases using different collations, one using SQL_Latin1_General_CP1_CI_AS and the other using Latin1_General_CI_AS, and then created the following table in each of the databases.

CREATE TABLE dbo.TestImplicitConverts
(
	RowID int NOT NULL IDENTITY (1, 1),
	BigIntCol bigint NOT NULL,
	BitCol bit NOT NULL,
	CharCol char(10) NOT NULL,
	DateCol date NOT NULL,
	DateTimeCol datetime NOT NULL,
	DateTime2Col datetime2(7) NOT NULL,
	DateTimeOffsetCol datetimeoffset(7) NOT NULL,
	DecimalCol decimal(10, 2) NOT NULL,
	FloatCol float(53) NOT NULL,
	IntCol int NOT NULL,
	MoneyCol money NOT NULL,
	NCharCol nchar(10) NOT NULL,
	NumericCol numeric(10, 2) NOT NULL,
	NVarchrCol nvarchar(50) NOT NULL,
	RealCol real NOT NULL,
	SmallDateTimeCol smalldatetime NOT NULL,
	SmallIntCol smallint NOT NULL,
	SmallMoneyCol smallmoney NOT NULL,
	TimeCol time(7) NOT NULL,
	TinyIntCol tinyint NOT NULL,
	GUIDCol uniqueidentifier NOT NULL,
	VarcharCol varchar(50) NOT NULL,
	CONSTRAINT PK_TestImplicitConverts PRIMARY KEY CLUSTERED (RowID)
);
GO

I then created a nonclustered index on each of the columns in the test table so that a single column query filtering on each column could then generate an execution plan with a single index seek when using the matching data type for the filtering.

-- Create nonclustered indexes on all columns to test implicit conversion effects
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_BigIntCol ON dbo.TestImplicitConverts (BigIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_BitCol ON dbo.TestImplicitConverts (BitCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_CharCol ON dbo.TestImplicitConverts (CharCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateCol ON dbo.TestImplicitConverts (DateCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTime2Col ON dbo.TestImplicitConverts (DateTime2Col);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTimeCol ON dbo.TestImplicitConverts (DateTimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTimeOffsetCol ON dbo.TestImplicitConverts (DateTimeOffsetCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DecimalCol ON dbo.TestImplicitConverts (DecimalCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_FloatCol ON dbo.TestImplicitConverts (FloatCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_GUIDCol ON dbo.TestImplicitConverts (GUIDCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NVarcharCol ON dbo.TestImplicitConverts (NVarchrCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_RealCol ON dbo.TestImplicitConverts (RealCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallDateTimeCol ON dbo.TestImplicitConverts (SmallDateTimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallIntCol ON dbo.TestImplicitConverts (SmallIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_IntCol ON dbo.TestImplicitConverts (IntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_MoneyCol ON dbo.TestImplicitConverts (MoneyCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NCharCol ON dbo.TestImplicitConverts (NCharCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NumericCol ON dbo.TestImplicitConverts (NumericCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallMoneyCol ON dbo.TestImplicitConverts (SmallMoneyCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_TimeCol ON dbo.TestImplicitConverts (TimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_TinyIntCol ON dbo.TestImplicitConverts (TinyIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_VarcharCol ON dbo.TestImplicitConverts (VarcharCol);
GO

Finally, each of the tables was loaded with 467,000 rows of random data, and the indexes were rebuilt to remove any fragmentation before testing started.

INSERT INTO dbo.TestImplicitConverts
(	BigIntCol, BitCol, CharCol, DateCol, DateTimeCol, DateTime2Col, DateTimeOffsetCol,
	DecimalCol, FloatCol, IntCol, MoneyCol, NCharCol, NumericCol, NVarchrCol, RealCol,
	SmallDateTimeCol, SmallIntCol, SmallMoneyCol, TimeCol, TinyIntCol, GUIDCol, VarcharCol)
SELECT a.number, a.number%1, CAST(b.name AS CHAR(10)), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()),
	a.number, a.number, a.number, a.number, CAST(b.name AS NCHAR(10)), a.number, b.name, a.number,
	DATEADD(ms, -1*a.number, GETDATE()), a.number, a.number, DATEADD(ms, -1*a.number, GETDATE()), a.number%255, NEWID(), b.name
FROM master.dbo.spt_values AS a
CROSS JOIN master.dbo.spt_values AS b
WHERE a.type = N'P'
  AND a.number < 1000
  AND b.name IS NOT NULL;
GO
ALTER INDEX ALL ON TestImplicitConverts REBUILD;
GO

The Test Harness

The test harness first connects to the instance of SQL Server and retrieves the column list for the TestImplicitConverts table, storing it in an ArrayList that is then used to iterate over the tests, generating dynamic statements that use a different filter data type for each test.  The types are selected from the SqlDbType enumeration in .NET, filtering out the types that are not being tested as a part of this investigation.  Each test is executed with SET STATISTICS XML ON to collect the actual execution plan, which is then processed to determine whether the implicit conversion caused a scan to occur.

#Load the Showplan as an XML document for parsing
$xml = New-Object System.Xml.XmlDocument;
$xml.LoadXml($String);

#Setup the XmlNamespaceManager and add the ShowPlan Namespace to it.
$nsMgr = New-Object 'System.Xml.XmlNamespaceManager' $xml.NameTable;
$nsMgr.AddNamespace("sm", "http://schemas.microsoft.com/sqlserver/2004/07/showplan");

#Search for a Index Scan or Clustered Index Scan operation that is an Implicit Conversion
$HasImplictConvert = $xml.SelectNodes("//sm:RelOp[@LogicalOp='Index Scan' or @LogicalOp='Clustered Index Scan']/sm:IndexScan/sm:Predicate//sm:Convert[@Implicit='1']/sm:ScalarOperator/sm:Identifier/sm:ColumnReference", $nsMgr).Count -gt 0;

The results of each test are then output to the host as a comma-separated list that can be placed in Excel and used to generate a chart of the implicit conversion effects.

Note:  The full test harness and scripts are attached to this post.

Results

The results of the tests are mapped out in the two charts below. The green blocks show where an implicit conversion occurs but still results in an index seek operation. The yellow blocks show where the implicit conversion causes an index scan to occur. The red blocks show where an operand type clash occurs, meaning implicit conversion is not actually supported by SQL Server for that pair of data types, and the grey blocks are the same data type, so no conversion was required.


Figure 1 – Implicit Conversions using SQL_Latin1_General_CP1_CI_AS

There is a difference between the SQL_Latin1_General_CP1_CI_AS collation and the Latin1_General_CI_AS collation for the char and varchar data types when converting to nchar or nvarchar.  In the SQL collation the conversion results in an index scan as shown above, but in the Windows collation the scan does not occur and an index seek is still used to execute the query.


Figure 2 – Implicit Conversions using Latin1_General_CI_AS

Conclusion

While implicit conversions occur in SQL Server to allow data evaluations against different data types, they can introduce performance problems for specific data type conversions that result in an index scan occurring during the execution.  Good design practices and code reviews can easily prevent implicit conversion issues from ever occurring in your design or workload.

Download scripts here!

Finding Key Lookups inside the Plan Cache

This is actually a blog post I thought I’d written more than two years ago, but this morning, when I went looking for it after receiving a question by email, I realized that I’d never blogged about it before.  At PASS Summit 2010 I presented a session on performance tuning SQL Server by digging into the plan cache, and since then I have blogged a number of other plan cache parsing scripts.  One of the demo scripts I showed at PASS was a query to find plans and statements in the plan cache that perform a Key Lookup operation.  An example statement that performs a Key Lookup is shown below.

USE [AdventureWorks];
GO

SELECT NationalIDNumber, HireDate, MaritalStatus
FROM HumanResources.Employee
WHERE NationalIDNumber = N'14417807';
GO

[Screenshot: graphical execution plan showing the Key Lookup operation]

Taking a look at the execution plan XML, we can see that there isn’t an actual Key Lookup operator; instead, the IndexScan element associated with the Clustered Index Seek has a Lookup attribute that is set to a value of 1, as shown in the XML snippet below:

<RelOp NodeId="3" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek" EstimateRows="1" EstimateIO="0.003125" EstimateCPU="0.0001581" AvgRowSize="17" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="290" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row">
  <OutputList>
    <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="MaritalStatus" />
    <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="HireDate" />
  </OutputList>
  <IndexScan Lookup="1" Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" ForceScan="0" NoExpandHint="0" Storage="RowStore">
    <DefinedValues>
      <DefinedValue>
        <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="MaritalStatus" />
      </DefinedValue>
      <DefinedValue>
        <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="HireDate" />
      </DefinedValue>
    </DefinedValues>
    <Object Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Index="[PK_Employee_EmployeeID]" TableReferenceId="-1" IndexKind="Clustered" />
    <SeekPredicates>
      <SeekPredicateNew>
        <SeekKeys>
          <Prefix ScanType="EQ">
            <RangeColumns>
              <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="EmployeeID" />
            </RangeColumns>
            <RangeExpressions>
              <ScalarOperator ScalarString="[AdventureWorks].[HumanResources].[Employee].[EmployeeID]">
                <Identifier>
                  <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="EmployeeID" />
                </Identifier>
              </ScalarOperator>
            </RangeExpressions>
          </Prefix>
        </SeekKeys>
      </SeekPredicateNew>
    </SeekPredicates>
  </IndexScan>
</RelOp>

As a side note, the ShowplanXML uses the IndexScan XML element for both Seeks and Scans.  An Index Seek is identified by the SeekPredicates child element, whereas an Index Scan would have a Predicates child element instead. Another important note when dealing with ShowplanXML is that the attribute values of a cached execution plan differ from those of an Actual execution plan collected from SSMS.  In the above XML from the plan cache, the Lookup attribute has a value of 1.  However, in an Actual plan collected by SSMS, the value would be true as shown in the Actual XML below:

<IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false">

When building an XML parser for querying the plan cache, the parser must check for a value of 1, not true, when looking for a Lookup attribute in the cached plan XML.  Given this information we can build a parser to search the plan cache and find the statements and columns associated with Key Lookups occurring.


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

WITH XMLNAMESPACES
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS sql_text,
    n.query('.'),
    i.value('(@PhysicalOp)[1]', 'VARCHAR(128)') AS PhysicalOp,
    i.value('(./IndexScan/Object/@Database)[1]', 'VARCHAR(128)') AS DatabaseName,
    i.value('(./IndexScan/Object/@Schema)[1]', 'VARCHAR(128)') AS SchemaName,
    i.value('(./IndexScan/Object/@Table)[1]', 'VARCHAR(128)') AS TableName,
    i.value('(./IndexScan/Object/@Index)[1]', 'VARCHAR(128)') as IndexName,
    i.query('.'),
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./OutputList/ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS output_columns,
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./IndexScan/SeekPredicates/SeekPredicateNew//ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS seek_columns,
    i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)') as Predicate,
	cp.usecounts,
    query_plan
FROM (  SELECT plan_handle, query_plan
        FROM (  SELECT DISTINCT plan_handle
                FROM sys.dm_exec_query_stats WITH(NOLOCK)) AS qs
        OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp
      ) as tab (plan_handle, query_plan)
INNER JOIN sys.dm_exec_cached_plans AS cp 
    ON tab.plan_handle = cp.plan_handle
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/*') AS q(n)
CROSS APPLY n.nodes('.//RelOp[IndexScan[@Lookup="1"] and IndexScan/Object[@Schema!="[sys]"]]') as s(i)
OPTION(RECOMPILE, MAXDOP 1);

The first CROSS APPLY above breaks out the individual statements so that we can tie the Key Lookup operation to the specific statement causing it to occur.  The second CROSS APPLY breaks out the actual IndexScan element in the XML and filters out any Key Lookups happening in the ‘sys’ schema, since many of the system tables have frequent Key Lookup operations.  Using the two XML fragments created by the CROSS APPLY operations, we can extract the statement, the table and index the lookup occurs against, the output columns of the lookup, and the seek predicate used to perform it.  With this information we can look at the indexing on our tables and determine whether a change to a nonclustered index to make it covering is warranted to improve performance.
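
For example, based on the output and seek columns identified for the AdventureWorks query shown earlier, a covering index change might look like the following sketch (the index name is hypothetical):

-- The seek column becomes the index key and the Key Lookup's output columns
-- become INCLUDEd columns, eliminating the lookup for this query.
CREATE NONCLUSTERED INDEX IX_Employee_NationalIDNumber_Covering
ON HumanResources.Employee (NationalIDNumber)
INCLUDE (HireDate, MaritalStatus);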

Hope this is helpful!

DBCC CHECKDB Execution Memory Grants – Not Quite What You Expect

I was recently working on a client system that has problems running DBCC CHECKDB normally due to the concurrent volume of data changes occurring against their 2TB OLTP database.  The challenge with running DBCC CHECKDB on this system is that the database snapshot (which DBCC CHECKDB creates) quickly exceeds the sparse file limits in Windows and their check fails with error 665.

To work around this I wrote a custom procedure for them that breaks down the DBCC CHECKDB checks into individual checks and spreads the process over a period of roughly two weeks, within a 2-3 hour window each morning when the system has the least amount of other activity.

While I was working on changes to this custom script, I noticed that DBCC CHECKTABLE was taking a ~96GB execution memory grant during its execution.  The server is a Dell R720 with 2 x Intel E5-2690 processors (2.9GHz, 8 cores, and with Hyper-Threading enabled, 16 logical cores per socket) and 512GB of RAM.  I mentioned this to Paul and it became something that we wanted to investigate further.

We have a similar R720 with multiple Fusion-IO PCI-X SSDs installed in it as a test server, but it only has 64GB of RAM.  I had already set up a 500GB test database for Paul’s DBCC CHECKDB performance tests on SQL Server 2012 SP1 with CU3, using AdventureWorks and my scripts to enlarge it by adding new objects and creating a number of enlarged versions of the SalesOrder* tables, so when Paul finished his tests I decided to take a deeper look at how DBCC CHECKDB uses execution memory in SQL Server.

The SQL instance was configured with ‘max server memory’ set to 54000, which leaves roughly 4GB of available memory on the server at all times.  I then wrote a test harness to perform DBCC CHECKDB with ‘max degree of parallelism’ set at 32, 16, 8, and 4 that logged the start and end time of each test, the tempdb usage for each test, and the deltas of the wait, latch, and spinlock stats for each test.  Under the default configuration, DBCC CHECKDB acquired a 10GB memory grant from the instance.  I ran the test harness through four iterations of tests and had the following averages:

[Results table image: Default Configuration Results]

I then configured Resource Governor on the instance and created a Resource Pool with MAX_MEMORY_PERCENT set at 10%, and a Workload Group in the pool with REQUEST_MAX_MEMORY_GRANT_PERCENT set at 25%, which yields a ~1GB maximum execution grant size for sessions assigned to the group.

CREATE RESOURCE POOL [Maint]
WITH (min_cpu_percent=0,
      max_cpu_percent=100,
      min_memory_percent=0,
      max_memory_percent=10,
      cap_cpu_percent=100,
      AFFINITY SCHEDULER = AUTO);
GO
CREATE WORKLOAD GROUP [wg_Maint]
WITH (group_max_requests=0,
      importance=Medium,
      request_max_cpu_time_sec=0,
      request_max_memory_grant_percent=25,
      request_memory_grant_timeout_sec=0,
      max_dop=0)
USING [Maint];
GO

Next I created a dbcc_user login on the server, specifically for performing DBCC CHECKDB, added it to the sysadmin server role, and created a classifier function to place connections from this login into the wg_Maint group for testing the effects of reducing the memory grant for DBCC CHECKDB.

USE [master];
GO
CREATE LOGIN [dbcc_user] WITH PASSWORD=N'R3@lly$tr0ngP@$$w0rd!', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF;
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [dbcc_user];
GO

--- Create the classifier function
CREATE FUNCTION fnRGClassifier()
RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
     DECLARE @group SYSNAME;
     IF(SUSER_NAME() = 'dbcc_user')
     BEGIN
          SET @group =  N'wg_Maint';
     END
--- Use the default workload group if there is no match on the lookup.
     ELSE
        BEGIN
             SET @group =  N'default';
       END
       RETURN @group;
END
GO

--- Reconfigure the Resource Governor to use the new function
ALTER RESOURCE GOVERNOR with (CLASSIFIER_FUNCTION = dbo.fnRGClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
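
To sanity-check the configuration, it’s worth confirming that dbcc_user sessions actually land in the wg_Maint group and watching the resulting grant while a check runs.  Here’s a minimal sketch (my own verification queries, not part of the original test harness) using the Resource Governor and memory grant DMVs:

-- Confirm which workload group the current session was classified into
SELECT s.session_id, wg.name AS workload_group
FROM sys.dm_exec_sessions AS s
INNER JOIN sys.dm_resource_governor_workload_groups AS wg
    ON s.group_id = wg.group_id
WHERE s.session_id = @@SPID;

-- Watch the requested/granted memory while DBCC CHECKDB is running
SELECT session_id, requested_memory_kb, granted_memory_kb, ideal_memory_kb
FROM sys.dm_exec_query_memory_grants;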

Next I changed the test harness to log in as the dbcc_user and reran the series of tests an additional four times, with the following averages for the results of each test:

[Results table image: 1GB Memory Grant Results]

Reviewing the data, you will see that the elapsed time decreased for all of the tests, while the tempdb usage did not increase significantly as a result of running with only 1GB of execution memory granted.  This reduction in execution memory allowed the remaining 9GB to be used by the data cache for database pages during the tests, which was confirmed by capturing the Memory Manager and Buffer Manager performance counters for the instance during the testing.

I then decided to see what effect different grant sizes had, and ran the tests at 2GB, 520MB, and 240MB by changing the Workload Group REQUEST_MAX_MEMORY_GRANT_PERCENT to values of 50%, 13%, and 6% respectively, running the same series of tests four additional times per configuration.  The following averages were observed after all of the tests were completed:

[Results table image: Additional Lower Memory Grant Results]

From the results, the only time that tempdb usage increased was for the 240MB execution grant size, and then only for the DOP 32 and DOP 16 tests.  The total increase in tempdb usage was roughly 300MB in the worst case, which is considerably smaller than the data cache space saved by reducing the execution memory grant size for DBCC CHECKDB.  The fastest average execution time for this specific database was obtained by using 520MB for all of the levels of DOP that were tested.

After completing this testing, Paul and I spent an hour on a call reviewing the spinlock stats, the wait stats, and the latch stats for each of the tests to try to identify why performance improved by reducing the execution memory grant with Resource Governor like this.  The only thing that pointed to an explanation for the improvement in performance is the higher amount of memory available for database pages as a result of reducing the execution memory grant.

Additionally, using Extended Events, I attempted to collect the query_post_execution_showplan events for the executions under the default configuration and under the constrained Resource Governor configuration.  I was only able to collect parts of the actual execution plan information, but the plans returned were identical with the exception of the memory grant information for RequestedMemory, which matched the limitations that were in place for the Resource Governor tests.

The most interesting item from the execution plan, however, was the value for the desired memory, which was 45,182,976,776 KB, or roughly 43TB.  I don’t yet have an answer for why this occurs, but I hope to be able to figure it out at some point.  It appears that the costing model for DBCC CHECKDB has limitations that weren’t apparent years ago, when servers didn’t have the amounts of memory available in today’s hardware.  Even running DBCC CHECKDB against the master database, which is 5MB in size on this instance, requests a ~2GB memory grant for execution, so it really seems to point to an excessive costing algorithm for DBCC CHECKDB inside of SQL Server.

(From Paul: the way that costing is performed for DBCC CHECKDB is based on the expected number of bits of information DBCC CHECKDB will create in the rowset it gives to the query processor to sort, hash, and give back to DBCC again. This cardinality estimation method has been the same since SQL Server 2000.)

In summary, I’ll be reviewing our clients’ systems and implementing the Resource Governor method I described above on systems where it will make a difference to performance by not taking so much of the buffer pool. I suggest you consider doing this too.

Tracking SQL Server Database Usage

One of the challenges with inheriting an existing set of SQL Servers and databases when you start a new job can be determining which of the databases are actively being used, especially on development and testing systems where there may be multiple copies of the same database that were used by different projects over time. This can also be a challenge for multi-tenant software as a service (SaaS) providers that create a new database for each client they service. An easy way to track whether a database is being used is with Extended Events and the lock_acquired event, filtering for the shared transaction workspace (SharedXactWorkspace) lock that is acquired any time a user connects to the database.

To start off, we first need to look up the columns returned by the lock_acquired event, and also look up the map values associated with any of the columns so that we know the correct values to use in our event session definition.

-- Look up the lock_acquired event columns
SELECT 
	name,
	column_id,
	type_name
FROM sys.dm_xe_object_columns
WHERE object_name = N'lock_acquired' AND
	column_type = N'data';

-- Look up the values for the Lock Resource Type and the Lock Owner Type
SELECT 
	name,
	map_key,
	map_value
FROM sys.dm_xe_map_values
WHERE name IN (N'lock_resource_type',
N'lock_owner_type');

From this, we can see that the DATABASE lock_resource_type has a map_key of 2 and the SharedXactWorkspace lock_owner_type has a map_key of 4. With these values, we can define our event session to track how frequently this lock is acquired by database_id, and leverage the bucketizer/histogram target to bucket the data automatically. Since the target name and output changed slightly in SQL Server 2012, two version-specific examples of the event session and event parsing code are presented below:

SQL Server 2008 Event Session

-- If the Event Session Exists, drop it first
IF EXISTS (SELECT 1 
			FROM sys.server_event_sessions 
			WHERE name = 'SQLskills_DatabaseUsage')
	DROP EVENT SESSION [SQLskills_DatabaseUsage] 
	ON SERVER;

-- Create the Event Session
CREATE EVENT SESSION [SQLskills_DatabaseUsage] 
ON SERVER 
ADD EVENT sqlserver.lock_acquired( 
	WHERE owner_type = 4 -- SharedXactWorkspace
	  AND resource_type = 2 -- Database level lock
	  AND database_id > 4 -- non system database
	  AND sqlserver.is_system = 0 -- must be a user process
) 
ADD TARGET package0.asynchronous_bucketizer
( SET slots = 32, -- Adjust based on number of databases in instance
	  filtering_event_name='sqlserver.lock_acquired', -- aggregate on the lock_acquired event
	  source_type=0, -- event data and not action data
	  source='database_id' -- aggregate by the database_id
)
WITH (MAX_DISPATCH_LATENCY = 1 SECONDS); -- dispatch immediately and don't wait for full buffers
GO

-- Start the Event Session
ALTER EVENT SESSION [SQLskills_DatabaseUsage] 
ON SERVER 
STATE = START;
GO

-- Parse the session data to determine the databases being used.
SELECT  slot.value('./@count', 'int') AS [Count] ,
        DB_NAME(slot.query('./value').value('.', 'int')) AS [Database]
FROM
(
	SELECT CAST(target_data AS XML) AS target_data
	FROM sys.dm_xe_session_targets AS t
    INNER JOIN sys.dm_xe_sessions AS s 
		ON t.event_session_address = s.address
	WHERE   s.name = 'SQLskills_DatabaseUsage'
	  AND t.target_name = 'asynchronous_bucketizer') AS tgt(target_data)
CROSS APPLY target_data.nodes('/BucketizerTarget/Slot') AS bucket(slot)
ORDER BY slot.value('./@count', 'int') DESC

GO

SQL Server 2012 Event Session

-- If the Event Session Exists, drop it first
IF EXISTS (SELECT 1 
			FROM sys.server_event_sessions 
			WHERE name = 'SQLskills_DatabaseUsage')
	DROP EVENT SESSION [SQLskills_DatabaseUsage] 
	ON SERVER;

-- Create the Event Session
CREATE EVENT SESSION [SQLskills_DatabaseUsage] 
ON SERVER 
ADD EVENT sqlserver.lock_acquired( 
	WHERE owner_type = 4 -- SharedXactWorkspace
	  AND resource_type = 2 -- Database level lock
	  AND database_id > 4 -- non system database
	  AND sqlserver.is_system = 0 -- must be a user process
) 
ADD TARGET package0.histogram
( SET slots = 32, -- Adjust based on number of databases in instance
	  filtering_event_name='sqlserver.lock_acquired', -- aggregate on the lock_acquired event
	  source_type=0, -- event data and not action data
	  source='database_id' -- aggregate by the database_id
);
GO

-- Start the Event Session
ALTER EVENT SESSION [SQLskills_DatabaseUsage] 
ON SERVER 
STATE = START;
GO

-- Parse the session data to determine the databases being used.
SELECT  slot.value('./@count', 'int') AS [Count] ,
        DB_NAME(slot.query('./value').value('.', 'int')) AS [Database]
FROM
(
	SELECT CAST(target_data AS XML) AS target_data
	FROM sys.dm_xe_session_targets AS t
    INNER JOIN sys.dm_xe_sessions AS s 
		ON t.event_session_address = s.address
	WHERE   s.name = 'SQLskills_DatabaseUsage'
	  AND t.target_name = 'histogram') AS tgt(target_data)
CROSS APPLY target_data.nodes('/HistogramTarget/Slot') AS bucket(slot)
ORDER BY slot.value('./@count', 'int') DESC

GO

One thing to keep in mind with this event session is that while an end user might not actually use a database, other tasks like maintenance, backups, CHECKDB, or even using IntelliSense in SQL Server Management Studio will. It is therefore expected that databases not being used by end users will still show up inside of the histogram target, but their frequency of usage will be significantly lower than that of the databases actively being used by end users or applications.

SQL Saturday #192 Content Uploaded

I know that it is somewhat late, but I uploaded my slide decks and demos from SQL Saturday #192 in Tampa, FL to the SQL Saturday website earlier today. For simplicity in finding the information, the links to each of the sessions are below:

Performance Tuning by Digging into the Plan Cache

Successfully Virtualizing SQL Server in Production

Free Tools for the DBA

For those that attended the sessions, once again thank you for your time and I hope you gained something from the content that was presented.  Next up is SQL Saturday #202 in Edinburgh, UK where I’ll be co-presenting a Precon with Joe Sack on SQL Server 2012 New Features.  Hope to see you there if you are local.

Execution Plan Sanitizer v2

Almost two years ago I blogged about Sanitizing Execution Plans using PowerShell. The original intent behind this idea was to be able to provide completely sanitized execution plans that could be shared outside of your business environment without the risk of exposing any design elements that may be considered intellectual property. Since writing that post I’ve had many people contact me with suggestions for the sanitized output, and so I’ve made a number of minor changes to the code to fix various output issues and implement those suggestions.

Fast forward two years and I believe this idea is even more useful today, especially since you can now upload execution plans to the SQLPerformance.com website using SQL Sentry Plan Explorer to get assistance from the community in diagnosing performance problems or query plan regressions. When I found out about this feature being included in Plan Explorer, my first question was whether the ability to sanitize or obfuscate the plan would be included, and I found that it wasn’t part of the initial launch plans, though it may eventually be part of the product. To be perfectly honest, the users that worry about schema information in an execution plan are a minority, so from a business standpoint I totally understand it not being in the first release of this functionality. We’ve already seen a few plans posted to the SQLPerformance.com site show up on the #sqlhelp tag on Twitter, so I know that the new upload feature is being used.

Earlier today I was asked if I realized that the script was broken and didn’t sanitize the output XML properly. Unfortunately I had never taken the time to upload the changes I’ve made over the last two years, so I ran my local script through some plans and then asked my good friend Aaron Bertrand (Blog|Twitter) to test out a new version of it for me. He sent me back some feedback, which I was able to quickly implement, and now there is a v2 version of the script that should handle most of the plan sanitizing issues that existed in the initial version of the PowerShell script.

2013_2_27_ExecutionPlanSanitizer

Graphically Viewing Extended Events Deadlock Graphs

I’ve previously blogged about the changes to the xml_deadlock_report in Extended Events to support multi-victim deadlock analysis.  One of the side effects of this change was that the XML document for the deadlock report had to be modified to accommodate multiple victim processes in the graph.  As a result, the deadlock graphs provided by Extended Events in SQL Server 2008 cannot be saved and opened graphically inside of SQL Server Management Studio.  If you attempt to open an Extended Events deadlock graph you will get an error similar to the following:


Failed to initialize deadlock control.
        There is an error in XML document (1, 2).
                <deadlock xmlns=""> was not expected

Until recently there wasn’t anything you could do about this, but today there are two options for viewing the deadlock graph in graphical format; SQL Sentry Plan Explorer Pro and SQL Server 2012 Management Studio.

Plan Explorer Pro

One of the new Pro features added to SQL Sentry Plan Explorer at the beginning of November 2012 was the ability to open deadlock graph .xdl files in Plan Explorer.  The display is exactly the same one that you get from Performance Advisor for deadlocks proactively collected during its monitoring, which really simplifies the analysis of the deadlock graph information.  While I was beta testing the changes ahead of the public release, I mentioned that the XML from the deadlock graphs generated by Extended Events has a different format to support multi-victim deadlock analysis, and Brooke (the developer for Plan Explorer at SQL Sentry) asked for an example.  A few hours later I had another beta build that handled the multi-victim deadlock graph format provided by Extended Events, which is shown below.

[Screenshot: a multi-victim Extended Events deadlock graph opened in Plan Explorer Pro]

SQL Server 2012 Management Studio

I didn’t actually realize this until I was recording the next Insider Demo Video for our SQLskills Insiders Newsletter, but SQL Server 2012 Management Studio has also been updated to handle the XML format output by Extended Events, and it also understands the multi-victim deadlock information.

[Screenshot: the same multi-victim deadlock graph opened in SQL Server 2012 Management Studio]

However, if you compare the relative amount of information provided here, even if you use the hover tips on the individual processes, there is still a significant amount of information that is missing.  For example, you don’t get the isolation levels of the processes in the graphical display at all, which can be important for troubleshooting specific types of deadlocks.

If you haven’t given SQL Sentry Plan Explorer Pro a spin, I’d really recommend it.  I had the opportunity to take the next Beta release of Plan Explorer Pro for a spin last night, and there are some really awesome new features coming that will make plan analysis faster and easier than you’ve ever experienced.