Make sure you’re eligible to vote in the PASS elections this year!

Last year PASS introduced some eligibility requirements for voting in the 2014 PASS elections, but a lot of people (including me) didn’t get the message for various reasons before the deadline expired. PASS did the right thing and allowed people to become eligible by delaying the elections.

This year there are similar eligibility requirements and you’ve got until June 1st to make sure you’re eligible.

Check out the blog post from Bill Graziano (Immediate Past President of PASS) that explains what to do and how to verify your eligibility.

I’ve checked and I’m eligible (early this year! :-)

Thanks


Identifying queries with SOS_SCHEDULER_YIELD waits

One of the problems with the SOS_SCHEDULER_YIELD wait type is that it’s not really a wait. When it occurs, it’s because a thread exhausted its 4ms scheduling quantum and voluntarily yielded the CPU, going straight to the bottom of the scheduler’s Runnable Queue and bypassing the Waiter List. A wait type has to be registered whenever a thread goes off the processor, though, so SOS_SCHEDULER_YIELD is used.

You can read more about this wait type:

  • Here on my blog
  • Here in my post on the SQLPerformance.com blog

You’ll want to investigate these waits if they’re prevalent on your server, as they could be an indicator of large scans happening (of data that’s already in memory) where you’d really rather have small index seeks.
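
An easy first check is to look at the aggregate wait statistics since the server started (or since the wait stats were last cleared). Here’s a minimal sketch using sys.dm_os_wait_stats – note that for this wait type the signal wait time is essentially the entire wait time, as the thread goes straight to the Runnable Queue:

SELECT
    [wait_type],
    [waiting_tasks_count],
    [wait_time_ms],
    [signal_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'SOS_SCHEDULER_YIELD';
GO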

The problem is that because it’s not a real wait type, you can’t use my script that looks at sys.dm_os_waiting_tasks to get the query plans of the threads incurring it: those threads aren’t waiting for a resource, so they don’t show up in the output of sys.dm_os_waiting_tasks at all!

The solution is to use the sys.dm_exec_requests DMV, as that will show the last_wait_type for all running requests. Below is a script you can use.

SELECT
    [er].[session_id],
    [es].[program_name],
    [est].[text],
    [er].[database_id],
    [eqp].[query_plan],
    [er].[cpu_time]
FROM sys.dm_exec_requests [er]
INNER JOIN sys.dm_exec_sessions [es] ON
    [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
    AND [er].[last_wait_type] = N'SOS_SCHEDULER_YIELD'
ORDER BY
    [er].[session_id];
GO

That will give you the code and query plan of what’s happening, but even then it might not be obvious which exact operator is causing the wait, so you may need to resort to capturing SQL Server call stacks, as I explain in the first blog post linked above.

Enjoy!

New 4-day Immersion Event on Practical Data Science with Rafal Lukawiecki

We’re very excited to announce that we’re offering a new Immersion Event on Practical Data Science using Azure Machine Learning, SQL Data Mining, and R, presented by our great friend Rafal Lukawiecki.

We’ve known Rafal well since 2003, and apart from being an expert in BI and data mining, he’s a superb, highly engaging speaker who’s presented keynotes, workshops, and sessions to high acclaim at conferences around the world, including TechEds and Microsoft’s brand-new Ignite conference. In fact, you can watch Rafal’s 75-minute, 300-level session from Ignite earlier this week, on the same topic as the new Immersion Event, on Channel 9 here – it’s really interesting!

Rafal’s course will be four days long, with the following modules:

  1. Overview of Practical Data Science for Business
  2. Data
  3. Process
  4. Algorithm Overview
  5. Tools and Getting Started
  6. Segmentation
  7. Classification
  8. Basic Statistics
  9. Model Validation
  10. Classifier Precision
  11. Regressions
  12. Similarity Matching and Recommenders
  13. Other Algorithms (Brief Overview)
  14. Production and Model Maintenance

You can read through the detailed curriculum here.

The first offering of this new class will be in Chicago, at our usual location, from November 16-19, 2015 – all the details are here.

Cool stuff – hope to see you there!

When DBCC DROPCLEANBUFFERS doesn’t work…

Here’s an interesting question that came up in our IEPTO1 class in Chicago this week (paraphrasing):

I was doing a demo recently where I was demonstrating physical I/Os occurring. I used DBCC DROPCLEANBUFFERS, then enabled SET STATISTICS IO ON and performed a SELECT operation. I was expecting to see physical reads occurring but I only saw logical reads. What’s going on? Why weren’t there any physical reads after I’d flushed the buffer pool with DBCC DROPCLEANBUFFERS?

It’s a very interesting question. If you ask most people what DBCC DROPCLEANBUFFERS does, you’ll get the response that it clears out the buffer pool (as my student expected).

But it doesn’t.

It drops *clean* pages from the buffer pool only.

A clean page is one that has not been changed since it was read into memory or last written to disk. A dirty page is one that has not been written to disk since it was last changed. Dirty pages are not dropped by DBCC DROPCLEANBUFFERS; they are only made clean by being written to disk (either through one of the various kinds of checkpoint or by the lazy writer – or by one of the per-NUMA-node lazy writers if you have NUMA configured).

The demo in question had updated the table being selected, and so when DBCC DROPCLEANBUFFERS was executed, the pages from that table remained in memory – hence no physical reads were required for the subsequent SELECT.
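
To illustrate, here’s a minimal repro sketch of that situation – the database, table, and column names are hypothetical placeholders:

USE [YourDatabase];   -- hypothetical database name
GO

-- Dirty some of the table's pages by changing data
UPDATE [dbo].[TestTable] SET [SomeColumn] = [SomeColumn] + 1;
GO

-- Only *clean* pages are dropped; the dirty pages from the update survive
DBCC DROPCLEANBUFFERS;
GO

SET STATISTICS IO ON;

-- This reports logical reads only, as the table's pages are still in memory
SELECT * FROM [dbo].[TestTable];

SET STATISTICS IO OFF;
GO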

If you want to ensure that all pages from a database are flushed from memory, you need to first perform a manual CHECKPOINT of that database and then run DBCC DROPCLEANBUFFERS.
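
In code, the sequence looks like this (again, YourDatabase is a placeholder):

USE [YourDatabase];   -- hypothetical database name
GO

-- Write all the database's dirty pages to disk, making them clean...
CHECKPOINT;

-- ...so that they can then all be dropped from memory
DBCC DROPCLEANBUFFERS;
GO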

You can then verify that there are no pages in memory for the database in question using the following code:

SELECT *,
	[DirtyPageCount] * 8 / 1024 AS [DirtyPageMB],
	[CleanPageCount] * 8 / 1024 AS [CleanPageMB]
FROM
	(SELECT 
		(CASE WHEN ([database_id] = 32767)
			THEN N'Resource Database'
			ELSE DB_NAME ([database_id]) END) AS [DatabaseName], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 1 ELSE 0 END) AS [DirtyPageCount], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 0 ELSE 1 END) AS [CleanPageCount]
	FROM sys.dm_os_buffer_descriptors
	GROUP BY [database_id]) AS [buffers]
ORDER BY [DatabaseName];
GO

Hope this helps some of you out there!

SQLintersection coming up in May

We’re just about to leave for three weeks of Immersion Events in Chicago, starting next week, and if that wasn’t enough, we’ve got our Spring 2015 SQLintersection conference right after that!

This time we’re in Scottsdale (Phoenix), Arizona (May 18-21), which I’m really looking forward to as I’ve never been to Arizona (it’ll be the 23rd State I’ve visited).

We’ve got a fantastic speaker line-up as usual, with Brent Ozar, Allen White, Aaron Bertrand, Kevin Kline, SQL CAT team members Shep Shephard and Denzil Ribeiro, SQL PFEs David Pless and Tim Chapman, and four of us from SQLskills (me, Kimberly, Erin, and Tim).

I’ll be doing my favorite pre-con workshop on Performance Troubleshooting using Waits and Latches, plus there are other workshops from Brent Ozar, Kimberly, and Allen White. We’ve also got four tracks of sessions, including three from me:

  • Transaction Log Performance
  • Advanced Data Recovery Techniques
  • DBA Mythbusters

It’s been five years since I last put together a mythbusters session, so there are plenty of new myths to bust :-)

I’m really looking forward to the conference and hope to see you there!

Cheers

PS We’ll be running our SQL After Dark evening quiz bowl and raffle again – that was a blast when we did it last time!

New SQLskills Pluralsight courses

We’ve released a couple of new Pluralsight courses recently:

The first is the conclusion of Glenn’s DMVs course series, SQL Server 2014 DMV Diagnostic Queries – Part 3, based on the very popular set of DMV queries he publishes every month.

The second course is Jonathan’s long-awaited SQL Server: Change Data Capture course. The short abstract is:

Learn how to capture SQL Server table data changes for downstream consumption using the Change Data Capture feature of SQL Server. This course is applicable to developers, DBAs, and architects on SQL Server 2008 onward.

The module list is:

  • Introduction
  • Configuring Change Data Capture
  • Querying Change Data Using Transact-SQL
  • Administration
  • Performance Tuning and Optimization
  • SQL Server 2012 SSIS Components

Jonathan’s been using CDC quite a bit with clients recently and this course is a distillation of all his knowledge and experience.

Check them both out!

We’ve got a lot more courses in production and planned for the remainder of 2015 around server consolidation, Service Broker (3 courses planned), more stored procedure performance, fragmentation, and more. Watch this space (and the Insider Newsletter) for details.

Thanks!

REPLICATION preventing log reuse but no replication configured

Last week, for the second time in as many weeks, I was sent a question in email from someone who had a transaction log that was growing out of control. He’d already queried log_reuse_wait_desc for the database (see this post for some more background) and the result was REPLICATION.

The problem was, there was no replication configured for that database. Just for completeness, he tried turning off the publish and merge publish settings for the database with sp_replicationdboption, ran sp_removedbreplication on the database, and then when those didn’t work, he also tried configuring and then removing replication. Nothing worked and the transaction log kept growing.

The problem turned out to be Change Data Capture. CDC uses the replication log scanning mechanism to harvest changes from the database, either piggy-backing on replication’s Log Reader Agent job or creating its own capture job if replication isn’t configured. If CDC is configured but the capture job isn’t running, the log_reuse_wait_desc will show as REPLICATION, as the log manager doesn’t have any way to know *why* the replication log scanner is configured, just that it is, and that it hasn’t run.

So, if you ever see REPLICATION as the log_reuse_wait_desc and don’t have replication configured, check the is_cdc_enabled flag in sys.databases too. And then either figure out why the CDC capture job isn’t running correctly (see Change Data Capture Agent Jobs in this BOL entry), or remove CDC if it’s not supposed to be there (see this BOL entry).
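
As a quick check, here’s a minimal sketch that shows the log reuse wait and the CDC flag side by side for any database reporting REPLICATION:

-- If is_cdc_enabled is 1 but no replication is configured,
-- CDC is the likely culprit for the REPLICATION log reuse wait
SELECT
    [name],
    [log_reuse_wait_desc],
    [is_cdc_enabled]
FROM sys.databases
WHERE [log_reuse_wait_desc] = N'REPLICATION';
GO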

Hope that helps a few people!

PASS 2014 top-10 sessions available to watch online

PASS has just released the top-10 rated sessions from the 2014 PASS Summit for everyone to watch for free!

My Advanced Data Recovery Techniques session is in there, as is Erin’s session on Five Execution Plan Patterns To Watch For.

Congratulations to everyone who made the top-10, and very cool to see that SQLskills is the only company that has more than one speaker in the top-10 – woohoo!

You can watch all the sessions online on-demand at http://www.sqlpass.org/Events/PASSSummit/BestofPASSSummit2014.aspx.

Enjoy!

Incomplete checkpoints and recovery

Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.

The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?

The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)

The purpose of a checkpoint is to bring the pages in the data files up to date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long-running transaction that started before that LSN.

When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then if in the SIMPLE recovery mode, any log clearing/truncating can occur).
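
If you’re curious, you can see that LSN for yourself by dumping the boot page with the undocumented DBCC PAGE command and looking for the checkpoint LSN field (dbi_checkptLSN) in the output. A quick sketch, with YourDatabase as a placeholder name:

-- DBCC PAGE is undocumented and unsupported, so only do this on a test system
DBCC TRACEON (3604);   -- send DBCC PAGE output to the console
GO
DBCC PAGE (N'YourDatabase', 1, 9, 3);   -- file 1, page 9 = the boot page
GO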

So if a checkpoint started but didn’t end before a crash, its LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there’s no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn’t be able to work correctly if it started only at the beginning of the incomplete checkpoint.

A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?

The answer is that a VLF containing an LOP_BEGIN_CKPT log record cannot be cleared/truncated until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur after the last log clearing/truncation before the next one can happen. If such a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this, but you might if there’s a very long-running checkpoint (e.g. a very large update on a system with a slow I/O subsystem, so flushing the data file pages takes a long time) combined with very frequent log backups, such that two log backups occur over the time taken by a single checkpoint operation. It could also happen if you’ve messed with the sp_configure ‘recovery interval’ option and set it higher than the default.

Interesting, eh?

Monkeys and bananas: stifling innovation in the workplace

The story I wrote for our SQLskills Insider newsletter on Monday last week resonated so well with people that I thought I’d publish it here on my blog too (which I only do rarely). I try to mix things up in the newsletter so it’s not all about SQL Server technical matters; people seem to really like career topics too. You can sign up for the bi-weekly newsletter here (and get all 100 past issues too) – more than 12,000 people get it now!

Here’s the story, from the newsletter section I call Paul’s Ponderings. Enjoy!

Last week Jonathan pointed me at an interesting story about a psychology experiment involving monkeys and bananas and the reinforcement of constrained, negative thinking. The experiment actually never happened, but the story is quite illustrative. You can see a graphic of the story here: http://i.stack.imgur.com/MyQki.jpg (totally safe for work), and I’ll paraphrase quickly below:

  • A group of monkeys are trained to know that if any one of them attempts to climb a stepladder with a banana at the top, cold water is sprayed on all of them (i.e. temptation is prevented by the spraying of cold water, so the group prevents any individual from trying to get the banana).
  • When a new monkey is introduced, and tries to get the banana, all the other monkeys prevent it as they don’t want to be sprayed with water. The new monkey learns that it can’t get the banana.
  • Eventually all the monkeys are replaced, and they all know not to try to get the banana, but none of them know why, as they’d only ever been told by the others not to get it.

It’s a really interesting story of how conditioning can become groupthink (even if the experiment didn’t actually happen).

There are obvious parallels in human society, and specifically in the work environments of many of you reading this newsletter. Let me explain:

  • A new person (A) joins a team. The new person has a great idea for how to do something differently/better, and everyone else on the team prevents them from expressing their opinion because that won’t be allowed, it won’t work, and/or they could get into trouble (e.g. from an intransigent and influential boss/architect/senior developer).
  • Eventually all the original people leave the team and only A and the people who joined after A are left. They all have the ingrained knowledge that they can’t try anything new, or use a specific technology, because they won’t be allowed to, and so on.

In other words, being constrained to incumbent technologies and methodologies becomes “the way it’s done around here, because it’s always been that way.” It doesn’t matter if the wider world knows that’s maybe not the best way to do it, the historical groupthink wins out.

We see this with new clients over and over, and it can be really difficult to educate the client team as to different (usually better) ways of approaching a problem. Sometimes it’s just one person who stymies all thoughts of innovation or change in the team, and sometimes it’s the collective groupthink that does it.

In fact, there was a client (years ago) that Kimberly actually “fired” because of this. One meeting was enough, but she told her contact (“the management”) that she’d try again and meet for a second session. In her post-mortem with management, her main message was: your problems are more than technical (specifically, person “X”), and until that’s resolved, I can’t help you.

Here are some simple examples of ingrained behavior we’ve seen and corrected:

  • Always use NOLOCK to work around blocking and deadlocking problems (and then it’s used everywhere, and no one even realizes the negatives)
  • Only ever rebuild an index to remove fragmentation, without considering FILLFACTORs or trying to reorganize rather than rebuild
  • Always use a heap and never use a clustered index because heaps are faster
  • When creating a primary key, always use a clustered index as that’s the default so Microsoft must know that’s always the right way to do it
  • For each column that’s used in a WHERE clause, create a single-column nonclustered index
  • Always use OPTION (RECOMPILE) to prevent plans being put into the plan cache and taking precious memory away from the buffer pool
  • Always use sp_executesql to put a plan into cache for ad hoc statements so that their plans get reused
  • Always create one file per processor core for tempdb, because that was the Microsoft guidance for SQL Server 2000

As I hope you can see, these are all very blinkered (and in some cases downright wrong) views on a variety of SQL Server topics. There are cases when some of these are the right thing to do, but not ALL the time just because ‘that’s the way we do it here’.

Does this remind you of anywhere? Or of anyone?

It’s a dangerous way for a team to work because it can lead to all kinds of problems (such as poor performance, failed disaster recovery, good people leaving the team, business failure) that can be extremely hard to fix unless someone takes a stand or someone from outside the team helps to break the impasse. And even then, it’s a delicate process to provide education and slowly change the thinking of the team, or of that one person who dominates the team’s thinking.

Call to action: Consider the environment you work in and whether the situation described above is how your team operates. Are you the person to take the stand to try to change the groupthink? Do you do it or do you move on? Or are you the person who’s unreasonably dominating the team’s thinking? Do you allow yourself to be changed or continue in your intransigence? It’s a difficult situation to recognize you’re in, whatever your role is, and a hard choice to decide what to do about it.