Query Store Settings

In SQL Server 2017 there are nine (9) settings related to Query Store.  While these are documented in sys.database_query_store_options, I often get asked what the value for each setting “should” be.  I’ve listed out each setting below, along with the default value and considerations around changing the setting.

 

OPERATION_MODE

The default value for a new or upgraded database in SQL Server 2016 or SQL Server 2017 is OFF.  For Azure SQL Database, the default value is READ_WRITE.

If you want to enable Query Store, this needs to be set to READ_WRITE, which is the desired state.

You also have the option of READ_ONLY, whereby new queries, new plans, runtime statistics, and wait statistics (in SQL Server 2017) will not be captured, but any forced plans will still be forced.  This state can occur if you reach the MAX_STORAGE_SIZE_MB limit (see below).  You can check actual state against desired state using the query below:

SELECT [actual_state_desc], [desired_state_desc]
FROM [sys].[database_query_store_options];
GO

It’s recommended to always run in a READ_WRITE state.  I have heard of some environments which switch between READ_WRITE and READ_ONLY.  If you want to understand your workload and have the data needed to troubleshoot performance issues, you need to be capturing information on a continual basis.
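
If you need to enable Query Store, or flip it back to READ_WRITE, it’s a single ALTER DATABASE statement.  A minimal example ([YourDatabase] is a placeholder for your database name):

ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
GO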

 

QUERY_CAPTURE_MODE

The default value for SQL Server 2016 and SQL Server 2017 is ALL.  For Azure SQL Database, the default value is AUTO.

With AUTO, queries that are insignificant from a resource utilization perspective, or executed infrequently, are not captured.  If you need to capture queries that may only execute a few times, or those that use very few resources, then use ALL.  Otherwise, use AUTO, as this will capture the relevant majority of your workload.

There is a third option, NONE, where no new queries are captured.  Runtime and wait statistics will continue to be captured for queries that are already in Query Store.

I recommend setting this option to AUTO, as the number of queries in your environment that need tuning/your attention is a small percentage of the total number of queries that execute.  You won’t miss out on important data if you exclude queries that don’t use a lot of resources or don’t execute very often.

 

MAX_PLANS_PER_QUERY

The default value for SQL Server 2016, SQL Server 2017, and Azure SQL Database is 200.

This setting is an integer, so theoretically you can set it to 2,147,483,647!  If you don’t know how many distinct plans you might have for a query, you can use sys.dm_exec_query_stats and get a count of distinct query_plan_hash values for a given query_hash:

SELECT [query_hash], COUNT (DISTINCT [query_plan_hash])
FROM [sys].[dm_exec_query_stats]
GROUP BY [query_hash]
ORDER BY 2 DESC;
GO

While I would like to believe that 200 distinct plans for a query is really high, I’ve talked to several DBAs who confirmed they had counts in the thousands.  Thus, you may need to increase this setting if you have queries that are unstable and generate a lot of different plans, and you want to capture each different plan.  Understand that a workload with a large number of plans for a query will require more space, hence the limitation.  You can set the limit lower than the possible number of plans to control the size, with the understanding that you won’t capture every plan variation.  The value of 200 is a good starting point for most environments.
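
Once Query Store is collecting data, you can run a similar check against the Query Store catalog views themselves.  A quick sketch that counts plans per query:

SELECT [query_id], COUNT([plan_id]) AS [PlanCount]
FROM [sys].[query_store_plan]
GROUP BY [query_id]
ORDER BY [PlanCount] DESC;
GO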

 

MAX_STORAGE_SIZE_MB

For SQL Server 2016 and SQL Server 2017 the default value is 100MB.  For Azure SQL Database, the default value is specific to the tier (Basic = 10MB, Standard = 100MB, Premium = 1GB).

The Query Store data is stored in internal tables in the user database (in the PRIMARY filegroup, like other system tables) and exposed through catalog views.  You can configure how much disk space can be used by Query Store.

This setting should be increased for an on-premises solution.  It may also need to be increased for Azure SQL Database; there are multiple factors that affect how much space you will need for Query Store data.  These factors are:

  • The value for QUERY_CAPTURE_MODE; if you’re capturing ALL queries, you will have more information than if using AUTO.  The amount of data is difficult to predict – it depends on your workload (Do you have a lot of queries that run just one time?  Do you have a lot of queries that use very little resources?).
  • The length of time you retain data in Query Store (CLEANUP_POLICY).  The more data you keep, the more space you will need.
  • Whether you’re running SQL Server 2017 and capturing wait statistics information (WAIT_STATISTICS_CAPTURE_MODE).  The wait statistics information is extremely valuable, but it is more data to keep and retain.
  • The value for INTERVAL_LENGTH_MINUTES.  The lower this value, the more runtime statistics data you will have and thus you will need more space.
  • Type of workload.  If you have an ad-hoc workload with high variation in query text, then you will have more individual queries stored, and thus more plans and more runtime and wait statistics for that information.  If you have a stable workload that does not have ad-hoc queries or queries generated by dynamic strings or ORM tools (like NHibernate or Entity Framework), then you will have fewer queries and less data overall.

As you can see, there is no “answer” for what the value for MAX_STORAGE_SIZE_MB should be.  I recommend starting with 2GB allocated, and then monitor via sys.database_query_store_options and Extended Events.  For some solutions, 1GB is plenty.  For other solutions, you may need 5GB or more.
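
To monitor space use, compare the current size against the configured maximum in sys.database_query_store_options.  A simple sketch:

SELECT [current_storage_size_mb], [max_storage_size_mb], [readonly_reason]
FROM [sys].[database_query_store_options];
GO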

 

CLEANUP_POLICY (STALE_QUERY_THRESHOLD_DAYS)

The default value for SQL Server 2016, SQL Server 2017, and Azure SQL Database is 30, with the exception of the Basic tier for Azure SQL Database, which defaults to 7 days.

How much historical data do you want to keep?  If you’re a shop that develops in production, you might want to keep more history.  If your workload is pretty stable and you only roll out changes quarterly or less frequently, 30 days may be enough information for you.  The more data that you retain, the more disk space you will need.  If you’re not certain about your workload, I recommend starting with at least 30 days for this setting, and over the first couple months of use you’ll figure out whether you want to keep older data.

 

SIZE_BASED_CLEANUP_MODE

The default value for SQL Server 2016, SQL Server 2017, and Azure SQL Database is AUTO, and I recommend leaving it as such.

With a value of AUTO, as Query Store gets close to the storage size allocated by MAX_STORAGE_SIZE_MB it will automatically purge out the oldest data to make sure there is enough space for new data.  There is a possibility for data that has not reached the CLEANUP_POLICY to be removed (e.g. if MAX_STORAGE_SIZE_MB is 2GB and CLEANUP_POLICY is 30 days, and you reach 2GB in 15 days, data will start to be removed).

You can set this to OFF, but in that scenario, if the MAX_STORAGE_SIZE_MB is reached the OPERATION_MODE will change to READ_ONLY and you will no longer capture new data.  It is recommended to leave this set to AUTO and adjust MAX_STORAGE_SIZE_MB as appropriate.

 

DATA_FLUSH_INTERVAL_SECONDS

The default value for SQL Server 2016, SQL Server 2017, and Azure SQL Database is 900 (15 minutes).

It is recommended to leave this value at the default.

 

INTERVAL_LENGTH_MINUTES

The default value for SQL Server 2016, SQL Server 2017, and Azure SQL Database is 60.

This is a critical setting, as it determines the window of time across which runtime statistics will be aggregated.  You can only select fixed values for this setting (1, 5, 10, 15, 30, 60, 1440).  The smaller this value, the smaller the window of time for which you will have runtime stats.  This will allow you to look at data at a more granular level.  However, the smaller the value the more data you will capture and thus the more space that is needed.

For the client environments that I support, I’ve set this to 30, as I like a smaller window of time for analysis and based on the performance issues I’ve had to troubleshoot thus far, that’s been a good window.  If you have space constraints or concerns, then leave it at the default of 60.

 

WAIT_STATISTICS_CAPTURE_MODE

The default value for SQL Server 2017 and Azure SQL Database is ON (this setting does not exist in SQL Server 2016).

If you upgrade a database which has Query Store enabled from SQL Server 2016 to SQL Server 2017, the WAIT_STATISTICS_CAPTURE_MODE will be enabled on upgrade.  If you have a database on SQL Server 2017 and enable Query Store, this option will be enabled.

I recommend enabling this option if you’re on SQL Server 2017 as this information can be extremely valuable when troubleshooting query performance.  Note that you may need to increase MAX_STORAGE_SIZE_MB to accommodate this additional data.
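
Pulling the recommendations above together, here is an example of configuring all of these settings with my suggested starting values, for a hypothetical database named [YourDatabase] on SQL Server 2017 (on SQL Server 2016, omit the last option; note that the T-SQL keyword for wait statistics capture is WAIT_STATS_CAPTURE_MODE):

ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON
    (
    OPERATION_MODE = READ_WRITE,
    QUERY_CAPTURE_MODE = AUTO,
    MAX_PLANS_PER_QUERY = 200,
    MAX_STORAGE_SIZE_MB = 2048,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
    SIZE_BASED_CLEANUP_MODE = AUTO,
    DATA_FLUSH_INTERVAL_SECONDS = 900,
    INTERVAL_LENGTH_MINUTES = 30,
    WAIT_STATS_CAPTURE_MODE = ON
    );
GO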

Query Store and Availability Groups

Last week at the PASS Summit I presented a pre-con and general session on Query Store. I had several questions related to Query Store and Availability Groups, so I thought I’d pull them all together and answer them here.

 

Q: What happens to the Query Store data when a database is in an Availability Group?

A: You can enable Query Store for any user database, including one in an AG, and since that data is stored in internal tables in the database, it exists in the replica copies as well (just like other system tables and user tables).

 

Q: Can you enable Query Store for a read-only replica?

A: No.  Because the replica is read-only, and Query Store inherently writes data TO the database, you cannot enable it to capture queries that are executed against that read-only copy.  I did create a Connect item for this request.  If this is of interest to you, please up-vote it: Enable Query Store for collection on a read-only replica in an Availability Group.  The more votes this has, the better the possibility that Microsoft will implement it, so feel free to share with your friends and have them vote too!

 

Q: Can you force a plan for a query on a read-only replica if you force it first for that query on the primary?

A: No.  With a read-only database this is not possible.

 

Q: Can you create a separate user database on the secondary, enable Query Store for it, and then issue queries against the read-only replica through that user database and have them captured in Query Store?

A: No.  (But it’s a really creative idea.)

 

Have other questions related to Query Store and AGs?  Let me know!  Happy to answer them.

PASS Summit 2017: Day 2

Day 2 is starting here at PASS Summit in Seattle – watch this space for updates the next hour and a half!

Grant Fritchey is kicking off day 2!  I’m a big fan of Grant.  He talks about the community and how people should take advantage of what’s available here at Summit.  I couldn’t agree more – I wouldn’t be where I am in my career without this conference, nor would I have the friends I do all over the world.  Grant hands it over to Denise McInerney, VP of Marketing, and Denise shares her journey within PASS (she’s a 6 year Board Member).  Denise continues driving the point about the value of this conference and PASS.  She then talks about the volunteers for PASS, and announces the winner of the PASSion Award.  This year’s winner is Roberto Fonseca.

Denise is talking about the survey feedback from the past couple years, and session evaluations at PASS.  *PLEASE* complete these by the end of next week (if not earlier) – both PASS and the speakers truly value this feedback.  If you want to provide additional feedback, there is a Board Q&A tomorrow (Friday) at 2PM. Today is the WIT lunch, and Denise announces that next year’s Summit is November 6 – November 9, 2018.  Denise introduces Rimma Nehme, a Group Product Manager/Architect for Azure Cosmos DB and Azure HDInsight at Microsoft.  Today’s keynote is going to uncover Cosmos DB.  It will be technical!  Let’s go!!

Do we need another database?

This is the wrong question to ask.  If you look at the rising data problems and needs, most of the production systems today (particularly the ones designed in the 70s and 80s) were not built to address them.  90% of the world’s data was created in the last 2 years alone.  The estimated growth in the next 3-5 years is 50x.  Another trend is that data is global, and another is that data is big.  Not just a few TB – companies are processing hundreds of TB to petabytes.  Every 60 seconds 200 million emails are generated.  Rimma is throwing out ridiculous numbers about the amount of data being generated.  I can’t keep up!

Data is also interconnected.  What you do in Seattle can be connected to another location in the world.  This is the butterfly effect.  We are experiencing about 125 exabytes of data (that’s a lot of zeroes).  Companies are looking at ways of extracting that data and monetizing that information.  Another trend is that the balance of power continues to shift from structured to unstructured data.  About 80% of data originates as unstructured data. Never push the data to the computation – push the computation to the data.

When dealing with distributed systems, you need to deal with a lot of differences.  For example, different architectures.  In 2010 an engineer inside Microsoft observed this and identified that we need a different architecture to deal with these fundamental differences in data at scale.  This is how Project Florence was born, which is the base of what is now Cosmos DB.  It was one of the exemplary partnerships between Microsoft Research and the Azure Data team.

At the time they were working to address the problem of the data for large scale applications within Microsoft (e.g. XBox).  They tried the “earthly databases”, building something on their own, and these options weren’t working.  Hence project Florence to meet the internal needs.  A basic set of requirements were laid out:

  1. Turnkey global distribution
  2. Guaranteed low latency at the 99th percentiles, worldwide
  3. Guaranteed HA within region and globally
  4. Guaranteed consistency
  5. Elastically scale throughput and storage, at any time, on demand, and globally
  6. Comprehensive SLAs (availability, latency, throughput, consistency)
  7. Operate at low cost (this is crucial and important! – first customers were Microsoft departments)
  8. Iterate and query without worrying about schemas and index management (applications evolve over time, and rapidly)
  9. Provide a variety of data model and API choices

This manifests into three principles that have evolved:

  1. Global distribution from the ground up
  2. Fully resource-governed stack
  3. Schema-agnostic service

It is very hard to build any service (particularly with such requirements).

If it was easy, everyone would do it (via NASA).  So this is how Cosmos DB was built.  This is used internally by Microsoft.  It is one of the fastest services in the cloud.  It is a ring-0 service, meaning it is available in all regions by default.  It is millions of lines of C++ code.  It is 7 years in the making; it is truly not a new service.  Here is what it looks like (a bit of marketing here).

The foundations of this globally distributed, massively scalable, multi-model database service are:

  1. Comprehensive SLAs
  2. Five well-defined consistency models
  3. Guaranteed low latency at the 99th percentile
  4. Elastic scale out of storage and throughput
  5. and…

Fine-grained multi-tenancy.  This cannot be an afterthought.  From left to right, you can take a physical resource like a cluster and dedicate it to a single tenant (e.g. customer or database).  You can take an entire machine and dedicate it.  You can go another step and share a machine across homogeneous tenants.  The final level of granularity is taking that machine and dividing it between heterogeneous tenants while still providing performance and scalability.

In terms of global distribution, Azure has 42 regions worldwide…36 are available, 6 are still being built out.  You can span your Cosmos DB across all of those regions.

Within a database account you have a database.  Within that you have users and permissions.  Within that, the Cosmos DB unit is a container: a container of data with a particular data model.  Below that is other user-defined code.  The database may span multiple clusters and regions and you can scale it in terms of these containers.  It is designed to scale throughput and storage INDEPENDENTLY.  How is the system designed behind the scenes (10K foot view)?  Within regions there are data centers, within data centers there are stamps, within those there are fault domains, within those there are containers, and within those the replicas.  Within the replicas is the data.  The database engine is where the secret sauce comes in – Bw-indexes, resource manager, log manager, IO manager, etc.  Any cluster will see hundreds or thousands of tenants.  They need to make sure that none of the tenants are noisy.

Another tenet that is important is the concept of partitioning.  How does Cosmos DB solve this?  The tenants create containers of data and behind the scenes these are partitions.  Each partition is comprised of 4 replicas.  This is consistent and reliable.  Each one is a smart construct.  Out of those partitions, you can create partition sets.  These can then span clusters, federations, data centers, and regions.  You can overlay topologies to implement solutions that span multiple regions across the planet.  You need to make sure that the applications then work really well (particularly when partition sets are merged or split).  You have the partition, which is a data block, and then you can build partition sets with various topologies.

What are some of the best practices?

  1. Always select a partition key that provides even distribution
  2. Use a location-aware partition key for local access
  3. Select a partition key that can be a transaction scope
  4. Don’t use a timestamp as the partition key for write-heavy workloads

The resource model summary: resources, resource model, partitioning model.

Core capabilities: Turnkey global distribution – this is adding regions with a click.  You can go to the Azure portal, see a map of the entire world, and pick the regions where you want your data to be.  The data is replicated behind the scenes and then it’s available for access.  You’re not dealing with VMs or cores.  You can add and remove regions at any time and the application does not need to be re-deployed.  The connection between application and database is logical.  This is enabled by the multi-homing capability of the API, rather than connecting physically to an endpoint.  Another capability is that you can associate priorities with each of the regions.  If there is an outage or failover in a region, the failover will occur in the order of priority, and that can be changed at any time.  Something added for customers is the ability to simulate a regional outage (but don’t go crazy with this, says Rimma!).  You can test HA of the entire application stack.

Another capability is being able to provide geo-fencing.  In other parts of the world there can be regulations where data has to be present in particular regions, so if data needs to stay within a location for compliance requirements, that capability is provided.

How does AlwaysOn work?  By virtue of partitioning you have multiple locations.  If one replica goes down, the application will be unaffected.  If a partition goes down, the application will go to a partition in another region.  If an entire region goes down, the application will go to another region.  The data is always available.

Another area of interest is active writers and active readers in any region.  Right now this is provided turnkey at the database level, but they are looking to push this down to the partition key level (a true Active-Active topology).  Online backups are available; they are stored in Azure blob storage.  The key capability is that it’s intended for “oops, I deleted my data” – it’s not for a data center going down (that’s why you have multiple regions).

The second capability is elastic scale out.  Storage and throughput scale independently.  You could start out with a small amount of data and keep adding more and more; the back end will seamlessly scale.  Transactional data tends to be small, web and content data is medium sized, and social data/machine-generated data is much larger.  As data size or throughput grows, scaling occurs and this happens seamlessly behind the scenes.  This is done with SSDs behind the scenes.

Resource governance is the next component.  As operations occur, they consume RUs (Request Units).  You provision the RUs that you need (how many transactions/sec do you need?).  All replicas (just a partitioning of data) get a certain budget of RUs.  If you exceed it, you’ll get rate limited.  At any time you can increase provisioned throughput and then support more transactions/sec.  You can also decrease it at any time.

An RU is a rate-based currency, partitioned at a granularity of 1 second.  It is normalized across DB operations.  Operations are costed via machine learning pipelines that cost queries (e.g. scans, lookups, etc.).  They have run machine learning models on telemetry data, and then calibrated the cost model accordingly for RUs (DATA DRIVEN).  Back to the partitioning model: at any time you can change throughput, and behind the scenes you specify the throughput (RUs) you want.  Behind the scenes the re-partitioning will occur, and each partition will get more RUs to provide the throughput asked for.  This is where splitting/merging partitions matters, but it happens behind the scenes and you don’t have to worry about it.

What about when you add regions?  You want to add more RUs so you don’t starve existing regions.  Those RUs are spread across all partitions and regions.  Rimma shows how one customer elastically provisioned resources during the holiday season to size up to meet demand, and then size down when no longer needed.  In a 3 day period, Rimma shows a graph of RUs.  At the top end there are 3 trillion RUs.  (THREE TRILLION IN THREE DAYS PEOPLE)

Various nodes have varying numbers of RUs serving different workloads, and you can look at the different tenants and partitions in there.  Multi-tenancy and global distribution at that level is incredible.

Another tenet: guaranteed low latency at the 99th percentile.  This was a core requirement because time is money.  From the business point of view, twice as much revenue is lost to slowdowns.  So the system is designed for this.  At the 99th percentile, reads take less than 10ms, measured with a 1KB document (which is 80-90% of the workload).  On average, you will observe lower latency (less than 2ms for reads and 6ms for writes).  How is this accomplished?  Reads and writes are done from the local region and from SSDs.  The database is designed to be write-optimized, using a latch-free database engine.  All data is indexed by default.  This is a fundamental difference from relational databases; here we have automatically indexed SSD storage.  Customer example: an application in California and data in the Far East.  They added another region and latency dropped.  Over Black Friday/Cyber Monday, latencies were less than 10ms for reads and 15ms for writes.  Azure Cosmos DB allows you to beat the speed of light.  If you have a database in Chicago and friends in Paris who want to read your data, with a central database they would request the data from Paris, and getting that data from Chicago to Paris takes 80-100ms.  With Cosmos DB you get it in less than 10ms because of those regional locations.

The last item here is the consistency model in Cosmos DB.  How does it go?  When you are dealing with any distributed system, whether databases or other systems, you are typically faced with a fundamental trade-off between latency, availability, consistency, and throughput.  With a centralized database, all requests go against the primary copy.  With global distribution and geo-replication you get HA and low latency.  But what happens if one replica can’t communicate with the others and updates are being made?  What kind of consistency guarantees are made?  This can be a problem!  Do you need to wait for data to be synchronized before you serve it?  Do you want strong consistency or eventual consistency?  Do you want the red pill or the blue pill?  With a relational database you get perfect consistency: the data won’t be served until the quorum is in agreement.  The price there is latency.  On the other hand, the whole movement toward no consistency guarantees means low latency.  But real-world consistency is not a binary choice as just described.

What about something in between?  The research literature talks about the wild west of consistency models (not one or the other).  A recommended paper is Replicated Data Consistency Explained Through Baseball by Doug Terry of Microsoft Research.  It uses real-world examples from baseball.  Depending on who you are in the game, you might get value out of different consistency models.  The engineers asked the question: can we pick intermediate consistency models that are easy to configure, programmable, and present clear trade-offs?  Most real-life applications don’t fall into the two extremes.  The intermediate models are bounded staleness, session (monotonic reads and writes local to the geo location), and consistent prefix (when updates are applied, the prefixes are guaranteed to be consistent).

How is this accomplished?  They use TLA+ specifications to specify the consistency models.  If you don’t know TLA+, check out the video by Leslie Lamport, who is an important figure in how the system was designed.  Leslie is a Turing Award winner, known for the Paxos algorithm, and a founding father of what is used in the stack.

They operationalized the five different consistency models, and use telemetry to see how those models are used.  Only 3% use strong consistency, 4% use eventual, and 93% are using the three models in between.  Depending on the consistency model specified, more computational work might be needed, which requires RUs.  You have to make trade-offs accordingly, and you can monetize this and decide what’s best.

Comprehensive SLAs…high availability SLAs are not enough.  Microsoft is in the service business, not just the software business.  Typically services don’t give any SLA, or only give one for HA.  When they approached this problem, they asked “What are all the things that developers and customers really care about?”  They care about performance, throughput, consistency, availability, and latency.  So this is the first service in the market that has published comprehensive SLAs backed financially by Microsoft.  If you come in and run a workload, you get guaranteed performance, throughput, consistency, availability, and latency.  Availability is tracked at the tenant and partition level at 5-minute granularity.  Customers can see their runtime statistics against their SLA.

The service is also multi-model.  They wanted to enable native integration with different data models.  Behind the scenes it is just the ARS model (atom-record-sequence); all models get translated to the ARS model.  This makes it very easy for the service to on-board other data models now and in the future.  If you want document and graph, you do not need two copies of the data; it can be handled by the same set of data.  This is a powerful combination: looking at the data through different lenses.

Why a schema-agnostic approach?  Modern applications that are built in the cloud are not static.  You can start with one schema, then add more tables/new columns…you need a robust approach to handle these scenarios.  The object model is schema-free.  The data gets stored as-is.  How do you query this data?  Behind the scenes the data is ARS.  At global scale, manually dealing with indexes, schema, etc. is a nonstarter.  In Cosmos DB there is schema-agnostic indexing.  The indexes are a union of all the document trees, which can then be consolidated into one, keeping only the unique values.  All of this structure info is then normalized.  It is an inverted index which gives optimal write performance.  It can identify where documents are located and then serve them up.  The index overhead in practice is pretty small.  There is a published paper, Schema-Agnostic Indexing with Azure Cosmos DB, go read it!

The Bw-tree in Azure Cosmos DB is highly concurrent, non-blocking, and optimized for SSDs.  It avoids random writes.  There are three layers: the B-tree, the cache, and the log-structured store (see the paper).  Rimma is going faster now. I’m not kidding.  The Bw-tree is implemented as delta updates; there is a mapping table to the updates, and updates are stored as deltas (sounds like an in-memory index structure?).

Rimma shows the architecture of query processing; there are different semantics but they share the same underlying components (compiler, optimizer, etc.).  The engine is very flexible, and the expectation is that in the future it will host other runtimes.  The multi-API approach allows native support for multiple APIs.  If you want to store data in the cloud but not re-write your app, you can.  There are more APIs coming in the future.  What does this strategy enable?

  • No recompilation needed
  • Better SLAs, lower TCO
  • Leverage the existing OSS tool-chain, ecosystem, and development/IT expertise
  • Lift and shift from on-premises to cloud
  • No vendor lock-in
  • Symmetric on-premises and cloud database development

All happy databases are alike, each unhappy database is unhappy in its own way (Kyle Kingsbury via Leo Tolstoy).

How do they run the service?  Weekly deployments worldwide.  16 hours of stress testing every day.  It’s like a magic factory across the globe.  They also run variant and invariant checks, with bots fixing nodes that might have issues, and consistency checking and reporting going over the data continually.

In conclusion…Rimma wanted to put herself in our shoes. It’s a lot of information to digest…particularly if you’re not invested in this technology.  Why should you care?  Rimma brings up a quote from Darwin:

It is not the strongest species that survive, nor the most intelligent, but the ones most responsive to change.

You can try Cosmos DB for free, no need for credit card info, etc.  Your childhood dream of going to the cosmos (space) will be fulfilled.

Key points to remember:

  • global distribution, horizontal partitioning and fine grained multi-tenancy cannot be an afterthought
  • schema agnostic database engine design is crucial for a globally distributed database
  • intermediate consistency models are extremely useful
  • globally distributed database must provide comprehensive SLAs beyond just HA

This is a hidden gem, but the bigger message to remember is that the entire NoSQL movement is a counter-culture movement. The question is: how would we build databases if we started today? Without the legacy that we know, would we look at things differently?  Would we focus on limitations or focus on the bigger picture?  Sometimes it is ok to break the rules and try different things.  Life is too short to build something that nobody wants.  If we focus on real pain points, not the “nice to have” things, but really look at the issues and abandon our constraints and self-imposed limits, we can create a modern SQL.  Rimma ends by thanking Dharma Shukla and the entire Cosmos DB team.

<collapses>

PASS Summit 2017: Day 1

Hey friends!  After a one-year hiatus I am BACK at the PASS Summit and ready to blog the day 1 keynote :)  I will update this post throughout the morning so refresh every so often to see the changes.  You can also follow along on Twitter – check out the #PASSsummit hashtag.

Reminder: My session is today at 1:30 in 6B, Query Store and Automatic Tuning in SQL Server, I hope to see you there!

Today’s keynote is headlined by Rohan Kumar (who I just got to meet – thank you Mark Souza) and he’s stated that it will be a lot of fun – you can preview what’s coming here.  Rohan is the General Manager of Database Systems Engineering for Microsoft, and there are a fair number of demos coming our way.

PASS News

PASS President Adam Jorgensen starts off the day – this is the 19th PASS Summit.  Holy cow.  The PASS keynote is being streamed live via PASStv if you’re not available to be here in person.  If you are at the Summit this week and you have any problem with your SQL Server implementation that you need answered, go to the Microsoft Clinic.  It is on the 4th floor near the Community Zone, and there are numerous Microsoft Engineers available to help.  It’s an amazing resource at this conference.

Adam takes a moment to thank the individuals that volunteer for PASS – the organization is primarily run by volunteers, and that includes the PASS Board.  The Board will have an open meeting on Friday at 2PM which anyone can attend. If you have feedback or want to better understand how things work, definitely attend.  Outgoing PASS Board members are Jen Stirrup and Chris Woodruff.  Newly elected members are John Martin, Diego Nogare, and Chris Yates.  Adam takes a moment to thank outgoing Past President Tom LaRock and Exec Board member Denise McInerney as their time on the Board comes to a close.

Please take time to meet our sponsors in the Exhibit Hall.  The support of our sponsors makes *so* many things possible not just at Summit, but throughout the year.

And Rohan takes the stage…

SQL Server 2017

Data, AI, and Cloud are three disruptive technology trends…and we need to figure out how to better migrate data to the cloud (I’m asking: how do we make it easier?).

At its core, the modern data estate enables simplicity.  It takes in any type of data, and allows a hybrid setup between on-premises and the cloud.  Rohan asks how many people believe they can move their data/solution to the cloud.  About 1% of the attendees raise their hand.  He then asks how many people think that the option to deploy to the cloud or on-prem is what’s needed in the future.  The majority of people raise their hands.

SQL Server 2017 was released October 2, 2017, and SQL Server 2016 was released April 1, 2016…that’s a very fast release cycle for Microsoft, and that’s been possible because of the cloud-first approach, which translates to an increased cadence of SP and CU releases.  Reminder: in SQL Server 2017 there’s a shift to CU releases every month, and no more SPs.  Glenn blogged about this in September.  Rohan brings Bob Ward and Conor Cunningham on stage for the first demo.  They’re wearing Cowboys jerseys.  *sigh*  If you see Bob this week ask him how the Rangers did this year…

Bob and Conor step through a demo showing the performance benefit of a new HPE DL580 Gen 10, using persistent scalable memory NVDIMMs – a query that takes 15 seconds on SSDs takes about 2 seconds on the HP storage.  And it’s cheaper?  I’m deferring to Glenn for the hardware details!!

Bob introduces a “typical” parameter sniffing issue – and then shows how to use Automatic Plan Correction (which relies on Query Store under the covers)…which I’ll be showing today in my session as well!

New features in SQL Server 2017:

  • Advanced Machine Learning with R and Python
  • Support for graph data and queries
  • Native T-SQL scoring
  • Adaptive Query Processing and Automatic Plan Correction

There is much more available in 2017, as noted in the release notes.

Docker Containers

Tobias Ternstrom and Mihaela Blendea take the stage to talk about containers running SQL Server.  Mihaela shows the build definition, which starts a container based on the SQL Server build.  On top of that, they restore a production database to it and run any additional scripts (e.g. to obfuscate and remove some data), then push out the images.  Tobias starts typing in a command line window…this I love.  He knows what he’s doing, but he’s always kind of winging it.  Tobias gives a sneak peek of a tool that shows up as being named Carbon, but Rohan introduces it as Microsoft SQL Operations Studio.  It works on Windows, Linux, and Mac to connect to a SQL Server database.  So at some point SSMS will be deprecated?  Yeah…just like Profiler 😉

Rohan comes back and talks a bit more about the cloud-first approach.  Azure SQL Database is updated regularly, and on a monthly basis new CUs are being pushed out (CU1 for SQL Server 2017 has ALREADY been released).  Multiple terabytes (yes TERABYTES) of telemetry data are captured every day from Azure implementations.  This feedback goes right into making the product better (how else do you think they’re able to release builds and CUs faster?).

Managed Instances

New deployment option in Azure: Managed Instances.  It’s currently in private preview, but you get an entire SQL Server instance with PaaS benefits.  This allows for much more of a lift and shift migration with minimal code changes.  Microsoft is also working on a database migration service – this will not be easy and may not work for every solution, but it’s a step in the direction of making that process better and more reliable.

Working with Large Data/BI Solutions

The next demo shows performance and scale with Azure SQL Database, hosted by Danielle Dean, a Principal Data Scientist at Microsoft.  Reading in a lot of data – ingesting patient vitals into an Azure database (1.4 million rows/sec via columnstore and in-memory).  Azure Machine Learning Workbench is then used to take an existing model and put it into Azure SQL Database.  Switching to SSMS (it’s not dead yet folks!!) you can query that model (it “looks” like a table), and use a stored procedure to predict against the model.

Scott Currie, the creator of Biml, is on stage to talk about using the new Azure Data Factory with Biml.  I’ll admit, this isn’t a technology I know, so I’m just listening at this point :)

Azure SQL Data Warehouse is designed from the ground up to separate out storage and compute so that you can scale each independently.  This design is very flexible and powerful, provides significant ability to scale (up to 60 nodes currently), and it’s secure.  Also launched early in October: the Azure SQL Data Warehouse Compute-Optimized Tier.  This was a result of feedback from customers who had some DW queries that were running REALLY slow in Azure.  The solution caches column segments (data) locally, and this cache survives failures, which then provides high performance for DW queries.  Julie Strauss, a Principal Group Program Manager, comes on stage to demo this.

Why are these behavioral analytic queries so compute-intensive?  It’s a combination of the data that’s needed and the complexity of the query.  Two kinds of analysis – funnel and cohort.  Both use telemetry from customer interactions/purchases from web site clicks.  The sophistication of the query is in taking a vast amount of data (100TB) and then folding it many times to create the different cohorts – the query takes about 6 seconds to read through that 100TB of data.  I’d like to know how this is done…

A quick PowerBI demo against data with 100+ million rows.  The model is built from Visual Studio, sourcing data from Azure SQL Data Warehouse – it’s very easy to deploy the model and then generate different visuals in PowerBI (clicky clicky drop was the “official” term used…I’m not kidding).  There’s also the ability to scale in Azure very quickly so you only use the resources you really need (and thus only pay for what you need and use).

Ok, there was one more demo but I’ll admit, I’m fading.  :)

Rohan is wrapping up the keynote and talks about community and all of us working together and lifting each other up.  Rohan gives a shout out to members of the community that have really given a lot back to others.  He also mentioned Denny Cherry, a member of the community who had surgery a couple weeks ago.  I had a recent update from a colleague that Denny is doing well – please send good thoughts his way!

And that’s a wrap.  Off for a day of learning – have a great one friends!

Query Store Pre-Con at PASS Summit: More Details

In the time since my last post about my Query Store pre-con at the PASS Summit (it’s just about a month away!) I’ve gotten several more questions about the session.  I’ve consolidated them and provided answers here, and if anyone still wants to know more, please email me!

What version of SQL Server will you be running?

With the announcement on Monday from Ignite about SQL Server 2017, I can now say for certain that is the release I will be using.  I will call out differences between SQL Server 2016 and SQL Server 2017 as needed.  I don’t expect everyone who’s attending to be running 2016 or 2017, but I do expect that you’re in the planning stages for an upgrade, are going to start planning soon, or you’re looking for some solid reasons to share with your company about why you should upgrade.  I would argue that Query Store is a very good reason to upgrade, but that is definitely not the only one (check out the new features list for SQL Server 2016 and SQL Server 2017 – including enhancements to Query Store).

Will you have labs during your session?

No, but I will make all scripts available for attendees, and if you want to follow along during the day, you can get the scripts that morning.  I tweak demos right up until a session – it’s what I expect to be doing the Sunday before.  I’ll have all the scripts on a share that attendees can access the morning of the 30th.  I *love* the idea of labs during a pre-con, and I know that some other presenters do them.  Huge props, as it’s not easy to do.  In order to cover everything I want to cover, I don’t have enough time for in-depth labs that everyone can walk through from start to finish in the time allotted.  But again, feel free to bring a laptop with SQL Server 2017 installed, along with the latest SSMS, and you can follow right along.

Are you giving anything away?

Besides knowledge?  😊  Yes, I will have codes for 30 days of free access to Pluralsight, which gives you access to over 50 courses from the SQLskills team, including my 3-hour course on Query Store.  If there’s something you miss during the pre-con, you can watch the Pluralsight course to pick it up, or further enhance what you learned.  I’m also working on a new course about the enhancements to Query Store in SQL Server 2017 and Automatic Tuning, which I’m also covering in my full day session.

Additional thoughts…

I’m also hosting a panel session at Summit and the organizer has emailed me to discuss how we handle questions during the session.  In the case where there are a large number of attendees, sometimes people hesitate to ask questions, and sometimes it can slow down the flow of the session.  While I am not setting records in terms of attendance, there are a lot of people that have signed up (which is fantastic) and I do want to make sure that everyone has a chance to ask their questions.  I’ve thought about using Twitter, and I know that Slack has been suggested as an option.  I’m still thinking about it.  If you’re signed up and have a preference, feel free to let me know in the comments!

That’s it for the moment, but please keep those questions coming.  Again, I want to make sure this is the right session for you to attend – there are a lot of great choices this year!  Either way, I look forward to seeing you at Summit next month!

 

Changes to query text and the effects in Query Store

In this week’s Insider newsletter, Paul talked about an interesting customer issue I ran into last week related to Query Store, and an inadvertent change to a query’s text.  I thought I’d provide some demo code so that those of you using (or planning to use) Query Store could see how this shows up in your database.

Recreating the issue

I started by restoring a copy of the WideWorldImporters database, which you can download here.  I enabled Query Store and cleared out the data, just as a precaution.

USE [master];
GO
RESTORE DATABASE [WideWorldImporters]
FROM  DISK = N'C:\Backups\WideWorldImporters-Full.bak' WITH  FILE = 1,
MOVE N'WWI_Primary' TO N'C:\Databases\WideWorldImporters\WideWorldImporters.mdf',
MOVE N'WWI_UserData' TO N'C:\Databases\WideWorldImporters\WideWorldImporters_UserData.ndf',
MOVE N'WWI_Log' TO N'C:\Databases\WideWorldImporters\WideWorldImporters.ldf',
MOVE N'WWI_InMemory_Data_1' TO N'C:\Databases\WideWorldImporters\WideWorldImporters_InMemory_Data_1',
NOUNLOAD,
REPLACE,
STATS = 5;
GO

ALTER DATABASE [WideWorldImporters]
SET QUERY_STORE = ON;
GO

ALTER DATABASE [WideWorldImporters]
SET QUERY_STORE CLEAR;
GO

I then created a stored procedure with just one query in it:

USE [WideWorldImporters];
GO

CREATE PROCEDURE [Sales].[usp_GetFullProductInfo]
@StockItemID INT
AS

SELECT
[o].[CustomerID],
[o].[OrderDate],
[ol].[StockItemID],
[ol].[Quantity],
[ol].[UnitPrice]
FROM [Sales].[Orders] [o]
JOIN [Sales].[OrderLines] [ol]
ON [o].[OrderID] = [ol].[OrderID]
WHERE [ol].[StockItemID] = @StockItemID
ORDER BY [o].[OrderDate] DESC;
GO

Then I executed the stored procedure twice, with two different input parameters.  I purposely used WITH RECOMPILE here because I knew the values would generate different plans.

EXEC [Sales].[usp_GetFullProductInfo] 90 WITH RECOMPILE;
GO

EXEC [Sales].[usp_GetFullProductInfo] 224 WITH RECOMPILE;
GO

Within Object Explorer I then drilled into the Query Store reports (Databases | WideWorldImporters | Query Store) and double-clicked to run Top Resource Consuming Queries.  My query has two different plans:

Initial query execution, two plans

Plan_id 1 has a clustered index scan with a filter, that feeds into the probe phase of a hash match (with a columnstore index scan for the build). This was generated with the input value of 90.

Plan 1 for query_id 1

Plan_id 2 has the same columnstore index scan, but it feeds into a nested loop, with a clustered index seek for inner input, and was generated with the input value of 224.

Plan 2 for query_id 1

Now let’s suppose I decide to force one plan for that query. After analysis, I decide to force the second plan (plan_id 2), which I can do either through the UI, or using a stored procedure. Since change control and documentation is a good thing, we’ll use the stored procedure:

EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 2;
GO
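
As a side note, if you ever want to see which plans are currently forced in a database, a quick check against sys.query_store_plan does the trick (a simple sketch):

SELECT [plan_id], [query_id], [is_forced_plan], [last_force_failure_reason_desc]
FROM [sys].[query_store_plan]
WHERE [is_forced_plan] = 1;
GO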

At this point, if I re-run the stored procedure with the two different input parameters and WITH RECOMPILE, I will get the plan with the nested loop.

EXEC [Sales].[usp_GetFullProductInfo] 90 WITH RECOMPILE;
GO

EXEC [Sales].[usp_GetFullProductInfo] 224 WITH RECOMPILE;
GO
SP execution after plan is forced

Now…let’s assume that this forced plan is used going forward and there are no issues. And then let’s assume that a developer decides to add some comments to the stored procedure, and they know they should use ALTER to keep the object_id the same…but somehow, a couple extra spaces end up in the query.

ALTER PROCEDURE [Sales].[usp_GetFullProductInfo]
@StockItemID INT
AS
/*
This is getting information based on ID
*may need to add more columns per BL
*/
SELECT
[o].[CustomerID],
[o].[OrderDate],
[ol].[StockItemID],
[ol].[Quantity],
[ol].[UnitPrice]
FROM [Sales].[Orders] [o]
JOIN [Sales].[OrderLines] [ol]
ON [o].[OrderID] = [ol].[OrderID]
WHERE [ol].[StockItemID] = @StockItemID
ORDER BY [o].[OrderDate] DESC;

GO

I re-run the stored procedure again with both input values (WITH RECOMPILE) and now I get the other plan again for the input value of 90:

SP execution after ALTER, spaces inadvertently added

If I look in Query Store, I can see I now have two queries that look nearly identical:

Two queries in QS after SP was changed

There are different query_id values in sys.query_store_query, as well as different entries for each query in sys.query_store_query_text. Both queries are still tied to the stored procedure (object_id 1483152329) which can be confusing if you don’t understand what occurred. In order to confirm which query has executed most recently, you can use the query below, which joins both system views (replace the query_ids as appropriate):

SELECT
[qst].[query_text_id],
[qst].[query_sql_text],
[qst].[statement_sql_handle],
[qs].[query_id],
[qs].[query_hash],
[qs].[last_execution_time],
[qs].[object_id]
FROM [sys].[query_store_query] [qs]
JOIN [sys].[query_store_query_text] [qst]
ON [qs].[query_text_id] = [qst].[query_text_id]
WHERE [qs].[query_id] IN (1, 14);
GO
Information for both queries from Query Store views

Summary

The take home message here is that you need to be careful when changing objects (stored procedures, functions) where you have queries with forced plans. Inadvertent changes in spacing or casing create entirely separate queries in Query Store, which means that the original query with a forced plan is no longer in use.  You’ll either have to ALTER the procedure again without the extra spaces in the query (which may not be as easy as it sounds), or get the new version of the query to execute the “right” plan, and then force that plan.  It may be appropriate to set up monitoring for object changes, or teach your developers how to identify which queries are forced and to which objects they belong (hint: use the Queries With Forced Plans report that’s available in SSMS 17.x). Hope this helps, and if you want to learn more about Query Store, consider my full day pre-con at Summit next month where we’ll cover this type of scenario and a whole lot more! Feel free to reach out if you have questions about it :)
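
If you do end up in this situation, the fix is along these lines: un-force the plan for the original query, then force the equivalent plan for the new query_id.  The IDs below are placeholders; pull the real values from the Query Store views or the Queries With Forced Plans report:

/* un-force the plan on the original (now unused) query */
EXEC sp_query_store_unforce_plan @query_id = 1, @plan_id = 2;
GO

/* force the equivalent plan for the new query_id created by the ALTER (placeholder IDs) */
EXEC sp_query_store_force_plan @query_id = 14, @plan_id = 16;
GO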

p.s. did you see that they announced the release date for SQL Server 2017? I’ll be running that build at my pre-con!

Query Store Pre-Con at the PASS Summit: Is it right for you?

I received an email over the weekend asking about my pre-con at the PASS Summit, my general session at the Summit, and my Query Store course on Pluralsight. The individual wanted to know the requirements for the pre-con, and what overlap exists between these three. Great question.

First, feel free to review the abstracts for all:

PASS Summit Pre-Con, October 30, 2017 (Seattle, WA)
Solving Common Performance Problems Using Query Store

PASS Summit General Session, Date TBA (Seattle, WA)
Query Store and Automatic Tuning in SQL Server

Pluralsight, online
SQL Server: Introduction to Query Store

None of these courses require pre-existing knowledge of Query Store.

The Pluralsight course starts at the beginning and walks you through configuring and using Query Store in SQL Server 2016 to find performance issues both retroactively and proactively (3 hours total).

The general session at the Summit discusses Query Store at a high level (what it is and how to use it), and talks about the new Automatic Tuning feature in SQL Server 2017 (Automatic Plan Correction) as well as Automatic Index Management in Azure SQL Database (75 minutes).

My full day pre-con covers everything from Pluralsight and the general session, and a whole lot more. If you want:

  • all the gory details on what Query Store is, how to configure it, and what data exists
  • best practices
  • a slew of demos about how you can use it to find performance issues (out-of-the-box Query Store methods and custom options not documented elsewhere)
  • tons of scripts that you can take home and use immediately
  • details about the wait statistics data added to Query Store in SQL Server 2017
  • additional uses for Query Store (beyond what’s documented)
  • how to use Automatic Plan Correction in SQL Server 2017 (automatic and manual) and what Automatic Index Management looks like in Azure SQL Database
  • an overview of how to visualize Query Store data

then the pre-con is going to give you all that and whatever else I can find time to fit in.  It is a packed day and I am in the thick of pulling all the slides and demos together (so if there’s something you want to see/know that you don’t see mentioned here, feel free to email me!).

I hope this answers any questions you have about my pre-con at Summit and other Query Store sessions available, but if not, just contact me!

Have a great week, and to my friends in the south – the Stellato family is keeping you in our thoughts.

SQLskills 101: The Other Bad Thing About Clearing Procedure Cache

Every time you execute a query against SQL Server, it goes through optimization and compilation and a query plan is generated. Most of the time that plan goes into cache (there’s always exceptions, which is a discussion for a separate time), which means that the next time that query executes, SQL Server can use the same plan.  Now, re-using the same plan is typically a good thing; SQL Server doesn’t have to go through full optimization and compilation again.  But sometimes that plan isn’t ideal for the query and you want to remove it from the plan cache.  You can do this manually.  Plans can also age out if they’re not being used, but if I want to purposely evict a plan from cache, I have several options (Glenn covers them in the post referenced below).

Ideally, you should remove only what’s absolutely necessary.  Using DBCC FREEPROCCACHE is a sledgehammer approach and typically creates a spike in CPU as all subsequent queries need to have their plans re-generated.  Glenn gives examples on how to use each statement (and others) in his post Eight Different Ways to Clear the SQL Server Plan Cache, and I want to show you one more thing that happens when you clear a plan (or all plans) from cache.
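
As a sketch of the targeted approach: find the plan_handle for the specific query (the LIKE filter here is just an example) and free only that plan.

DECLARE @plan_handle VARBINARY(64),
        @sql NVARCHAR(200);

/* find the plan_handle for the query you want to evict */
SELECT TOP (1) @plan_handle = [cp].[plan_handle]
FROM sys.dm_exec_cached_plans [cp]
CROSS APPLY sys.dm_exec_sql_text([cp].[plan_handle]) [t]
WHERE [t].[text] LIKE '%usp_GetCountryInfo%'
AND [t].[text] NOT LIKE '%dm_exec_cached_plans%';  /* ignore this query itself */

/* remove only that plan from cache */
IF @plan_handle IS NOT NULL
BEGIN
    SET @sql = N'DBCC FREEPROCCACHE (' + CONVERT(NVARCHAR(130), @plan_handle, 1) + N');';
    EXEC (@sql);
END
GO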

For this demo script, I recommend running it against a TEST/DEV/QA environment because I am removing plans from cache which can adversely affect performance.

We’ll  run one statement and one stored procedure multiple times against the WideWorldImporters database:

/*
Create a stored procedure to use for testing
*/
USE [WideWorldImporters];
GO

DROP PROCEDURE IF EXISTS [Application].[usp_GetCountryInfo];
GO

CREATE PROCEDURE [Application].[usp_GetCountryInfo]
@Country_Name NVARCHAR(60)
AS
SELECT *
FROM [Application].[Countries] [c]
JOIN [Application].[StateProvinces] [s]
ON [s].[CountryID] = [c].[CountryID]
WHERE [c].[CountryName] = @Country_Name;
GO

/*
Remove everything from cache
*/
DBCC FREEPROCCACHE;
GO

/*
Run the stored procedure 20 times
*/
EXECUTE [Application].[usp_GetCountryInfo] N'United States';
GO 20

/*
Run the query 20 times
*/
SELECT
[s].[StateProvinceName],
[s].[SalesTerritory],
[s].[LatestRecordedPopulation],
[s].[StateProvinceCode]
FROM [Application].[Countries] [c]
JOIN [Application].[StateProvinces] [s]
ON [s].[CountryID] = [c].[CountryID]
WHERE [c].[CountryName] = 'United States';
GO 20

Now let’s see what’s in cache, and what’s also in sys.dm_exec_query_stats (which gives us execution statistics).

SELECT
[qs].[last_execution_time],
[qs].[execution_count],
[qs].[total_logical_reads]/[qs].[execution_count] [AvgLogicalReads],
[qs].[max_logical_reads],
[t].[text],
[p].[query_plan]
FROM sys.dm_exec_query_stats [qs]
CROSS APPLY sys.dm_exec_sql_text([qs].sql_handle) [t]
CROSS APPLY sys.dm_exec_query_plan([qs].[plan_handle]) [p]
WHERE [t].[text] LIKE '%Application%';
GO
Text, plan, and query stats from the DMOs

Cool, I see that each one executed 20 times, and in addition to the text and the plan I pulled information about I/Os.

Now let’s clear procedure cache, and we’ll take the good old sledgehammer approach because we’re lazy and run DBCC FREEPROCCACHE.  Then run the query to see what’s in sys.dm_exec_query_stats…

DMO information *after* running DBCC FREEPROCCACHE

Do you see what happened?  In addition to removing the plan, we also lost all the query execution stats from dm_exec_query_stats.  We have no history about query performance (unless we were writing it to another table on a regular basis, or have a third-party tool that captures baselines).  If you don’t believe me, just run SELECT * FROM sys.dm_exec_query_stats…all that data is gone (but will start to re-populate as you run queries and plans go into cache).

This is important to know! If you’re troubleshooting a performance issue and you want to try removing the plan from cache, be aware that you’re removing execution statistics too (and maybe save those off first in case you need them for reference/comparison later).
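
If you think you might want that history later for comparison, a simple option is to snapshot the DMV into a table before you clear anything (the table name here is just an example):

/* save off execution statistics before clearing the cache */
SELECT GETDATE() AS [capture_time], *
INTO [dbo].[QueryStatsSnapshot]
FROM sys.dm_exec_query_stats;
GO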

Migrating to In-Memory OLTP…what’s stopping you?

In the past few months I’ve had several clients reach out about migrating to In-Memory OLTP solutions.  When the second or third request rolled in I remember thinking, “Finally!”  As in, I’ve been wondering why businesses haven’t been looking to implement In-Memory sooner.  In fact, I added a section on In-Memory to our IEPTO2 course because with the availability of the feature in SQL Server 2016 SP1 (yes, there are memory limitations, but it’s a good place to start) I figured we would see an uptick in interest.  But here we are, half way through 2017 and over 6 months since the release of SQL Server 2016 SP1, and I still see a lot of hesitation around it.

I wrote a post over on SQLPerformance last week, Testing DML Statements for In-Memory OLTP, and that generated some discussions on Twitter.  So I figured it was time for a post to find out what’s holding companies back.  This isn’t a true poll – it’s a fill-in-the-blank.  As in: post a comment.  If you have considered migrating to In-Memory and then didn’t, I want to understand why.  I recognize there are limitations – the product is still new and it’s evolving.  But perhaps if we understand the largest inhibitors to migration we can help move them up on Microsoft’s list via Connect and other pathways.  Understand I am asking for specifics here, for example: we can’t use In-Memory tables because they don’t support spatial data types.  Whatever the reason, share it via a comment.

I’m giving this a week or so, then I’ll write a follow up either discussing trends, debunking myths, showing some testing, or perhaps just talking about why I’m sad that no one uses In-Memory OLTP.  :)  In all seriousness, I have no idea what the follow up will look like, it all depends on whether you and your friends share info!  Until then, thanks for reading and commenting!

(And if you’re in the US, please have a safe 4th of July…no one wants to go to the hospital after trying to set off fireworks at home…just ask my friend who’s a hand surgeon, it’s her least favorite time of year!)

Fireworks from the professionals

DBCC CLONEDATABASE Cannot insert duplicate key Error

If you’ve been using DBCC CLONEDATABASE at all, you might have run into a cannot insert duplicate key error (or something similar) when trying to clone a database:

Database cloning for ‘YourDatabase’ has started with target as ‘COPY_YourDatabase’.
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object ‘sys.sysschobjs’ with unique index ‘clst’. The duplicate key value is (1977058079).

If you do some searching, you’ll probably end up at this Connect item: DBCC DATABASECLONE fails on sys.sysowners.

The Connect item states that the problem exists because of user objects in model.  That’s not the case here.

I’m working with a database created in SQL Server 2000…now running on SQL Server 2016.  It turns out that when you create new user tables in a SQL Server 2000 database, the first objects have the IDs 1977058079, 2009058193, and 2041058307.  (Yes, we actually tested this…I had a copy of SQL 2000 in a VM but Jonathan installed one and dug into it to figure out the IDs.)  There were a lot of changes between SQL Server 2000 and SQL Server 2005, and that included changes to model.  In the model database in SQL Server 2005 and higher, the aforementioned IDs are used for QueryNotificationErrorsQueue, EventNotificationErrorsQueue and ServiceBrokerQueue respectively…Service Broker objects.  As a result, my user database – created in SQL Server 2000 – has user objects with IDs that are the same as system objects in model (and of course when the system objects like QueryNotificationErrorsQueue, EventNotificationErrorsQueue and ServiceBrokerQueue got created in my user database, they got completely different IDs).

Edit 2017-06-15: This issue is fixed in CU3 for SQL Server 2016 SP1 and CU5 for SQL Server 2014 SP2! See KB 4016238, FIX: DBCC CLONEDATABASE is unsuccessful if the source database has an object originally created in SQL Server 2000.

To determine if this is an issue for any of your user databases you can use the following queries (change user_database to the name of your DB):

/*
lists tables in the user database that have an ID that matches an object in model
*/
SELECT m.id, m.name, c.name, c.id, m.type
FROM model.sys.sysobjects m
FULL OUTER JOIN user_database.sys.sysobjects c
ON m.id = c.id
JOIN user_database.sys.objects o
ON c.id = o.object_id
WHERE o.is_ms_shipped <> 1
AND m.id IS NOT NULL;

/*
lists system objects in the user database that do *not* match the id of the same object in model
(FYI only)
*/
SELECT m.id, m.name, c.name, c.id, m.type
FROM model.sys.sysobjects m
FULL OUTER JOIN user_database.sys.sysobjects c
ON m.name = c.name
JOIN user_database.sys.objects o
ON c.id = o.object_id
WHERE m.id != c.id

To workaround this, you need different IDs for the affected objects in the user database.  Now, I could recreate my tables in the user database, confirm the ID isn’t used for any system object in model, and then move the data over…  But I have some tables that are larger than 100GB, so it’s not an easy fix.
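
For a smaller table, the recreate-and-move approach looks roughly like this.  It’s a sketch with a hypothetical table name (the real object is whichever one has the colliding ID), and you would also need to recreate indexes, constraints, and permissions on the new table:

/* copy the data into a new table, which gets a brand new object_id */
SELECT *
INTO [dbo].[AffectedTable_New]
FROM [dbo].[AffectedTable];
GO

/* drop the original and rename the copy back to the original name */
DROP TABLE [dbo].[AffectedTable];
GO
EXEC sp_rename N'dbo.AffectedTable_New', N'AffectedTable';
GO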

If you’re interested in reproducing the issue, sample code is below (thanks Jonathan).  You need SQL Server 2000 installed, plus SQL Server 2005, 2008, or 2008R2 for the intermediate step.  You cannot restore the SQL 2000 backup to SQL Server 2016 directly; you must do an intermediate upgrade to 2005, 2008, or 2008R2, then go to 2016.  Paul has tested upgrading a SQL Server 2005 database to a newer instance here; but for SQL Server 2000 you have to perform the intermediate upgrade.

Code to create the issue

DROP DATABASE New2000
GO

CREATE DATABASE New2000
GO

USE New2000
GO

DECLARE @loop INT
SET @loop = 0

WHILE @loop  < 3
BEGIN

	DECLARE @table NVARCHAR(4000)

	SET @table = 'table_' + REPLACE(REPLACE(REPLACE(REPLACE(CONVERT(VARCHAR(30), GETDATE(), 121), ':', ''), '-', ''),  ' ', ''), '.', '')

	DECLARE @sql NVARCHAR(4000)

	SET @sql = 'CREATE TABLE ' + @table+ ' (RowID INT)';

	EXEC(@sql)

	SET @Loop = (select COUNT(*) FROM sysobjects WHERE id IN (
		1977058079,
		2009058193,
		2041058307)
		)

	IF @loop = 0
	BEGIN
		PRINT 'Dropping Table'
		SET @sql = 'DROP TABLE '+ @table
		EXEC(@sql)
	END

	WAITFOR DELAY '00:00:00.010'

END

/*
	Backup the database in SQL Server 2000
	Restore in SQL Server 2016
*/

DBCC CLONEDATABASE ('New2000', 'COPY_New2000');
GO