New 4-day Immersion Event on Practical Data Science with Rafal Lukawiecki

We’re very excited to announce that we’re offering a new Immersion Event on Practical Data Science using Azure Machine Learning, SQL Data Mining, and R, presented by our great friend Rafal Lukawiecki.

We’ve known Rafal well since 2003, and apart from being an expert in BI and data mining, he’s a superb, highly engaging speaker who’s presented keynotes, workshops, and sessions to high acclaim at conferences around the world, including TechEds and Microsoft’s brand-new Ignite conference. In fact, you can watch Rafal’s 75-minute, 300-level session from Ignite earlier this week, on the same topic as the new Immersion Event, on Channel 9 here – it’s really interesting!

Rafal’s course will be four days long, with the following modules:

  1. Overview of Practical Data Science for Business
  2. Data
  3. Process
  4. Algorithm Overview
  5. Tools and Getting Started
  6. Segmentation
  7. Classification
  8. Basic Statistics
  9. Model Validation
  10. Classifier Precision
  11. Regressions
  12. Similarity Matching and Recommenders
  13. Other Algorithms (Brief Overview)
  14. Production and Model Maintenance

You can read through the detailed curriculum here.

The first offering of this new class will be in Chicago, at our usual location, from November 16-19, 2015 – all the details are here.

Cool stuff – hope to see you there!


Here’s an interesting question that came up in our IEPTO1 class in Chicago this week (paraphrasing):

I was doing a demo recently where I was demonstrating physical I/Os occurring. I used DBCC DROPCLEANBUFFERS, then enabled SET STATISTICS IO ON and performed a SELECT operation. I was expecting to see physical reads occurring but I only saw logical reads. What’s going on? Why weren’t there any physical reads after I’d flushed the buffer pool with DBCC DROPCLEANBUFFERS?

It’s a very interesting question. If you ask most people what DBCC DROPCLEANBUFFERS does, you’ll get the response that it clears out the buffer pool (as my student expected).

But it doesn’t.

It drops *clean* pages from the buffer pool only.

A clean page is one that has not been changed since it was read into memory or last written to disk. A dirty page is one that has not been written to disk since it was last changed. Dirty pages are not dropped by DBCC DROPCLEANBUFFERS; they are only made clean by writing them to disk (either through one of the various kinds of checkpoints or by the lazy writer – or one of the per-NUMA node lazy writers if you have NUMA configured).

The demo in question had updated the table being selected, and so when DBCC DROPCLEANBUFFERS was executed, the pages from that table remained in memory – hence no physical reads were required for the subsequent SELECT.
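
If you want to reproduce the behavior, here’s a minimal sketch of the scenario (with hypothetical table and column names) – the final SELECT reports only logical reads, because the pages dirtied by the UPDATE survive the drop:

UPDATE [dbo].[SomeTable] SET [SomeColumn] = [SomeColumn] + 1;  -- dirty the table's pages in memory
GO

DBCC DROPCLEANBUFFERS;  -- drops *clean* pages only; the dirty pages from the update stay in memory
GO

SET STATISTICS IO ON;
GO

SELECT * FROM [dbo].[SomeTable];  -- shows logical reads only, no physical reads
GO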

If you want to ensure that all pages from a database are flushed from memory, you need to first perform a manual CHECKPOINT of that database and then run DBCC DROPCLEANBUFFERS.
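
For example (again with a hypothetical database name):

USE [MyDatabase];
GO

CHECKPOINT;  -- write all dirty pages for this database to disk, making them clean
GO

DBCC DROPCLEANBUFFERS;  -- now all the (clean) pages can be dropped from the buffer pool
GO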

You can then verify that there are no pages in memory for the database in question using the following code:

SELECT
    [DatabaseName],
    [DirtyPageCount] * 8 / 1024 AS [DirtyPageMB],
    [CleanPageCount] * 8 / 1024 AS [CleanPageMB]
FROM
    (SELECT
        (CASE WHEN ([database_id] = 32767)
            THEN N'Resource Database'
            ELSE DB_NAME ([database_id]) END) AS [DatabaseName],
        SUM (CASE WHEN ([is_modified] = 1)
            THEN 1 ELSE 0 END) AS [DirtyPageCount],
        SUM (CASE WHEN ([is_modified] = 1)
            THEN 0 ELSE 1 END) AS [CleanPageCount]
    FROM sys.dm_os_buffer_descriptors
    GROUP BY [database_id]) AS [buffers]
ORDER BY [DatabaseName];

Hope this helps some of you out there!

SQLintersection coming up in May

We’re just about to leave for three weeks of Immersion Events in Chicago, starting next week, and if that wasn’t enough, we’ve got our Spring 2015 SQLintersection conference right after that!

This time we’re in Scottsdale (Phoenix), Arizona (May 18-21), which I’m really looking forward to as I’ve never been to Arizona (it’ll be the 23rd State I’ve visited).

We’ve got a fantastic speaker line-up as usual, with Brent Ozar, Allen White, Aaron Bertrand, Kevin Kline, SQL CAT team members Shep Shephard and Denzil Ribeiro, SQL PFEs David Pless and Tim Chapman, and four of us from SQLskills (me, Kimberly, Erin, and Tim).

I’ll be doing my favorite pre-con workshop on Performance Troubleshooting using Waits and Latches, plus there are other workshops from Brent Ozar, Kimberly, and Allen White. We’ve also got four tracks of sessions, including three from me:

  • Transaction Log Performance
  • Advanced Data Recovery Techniques
  • DBA Mythbusters

It’s been five years since I last put together a mythbusters session, so there are plenty of new myths to bust :-)

I’m really looking forward to the conference and hope to see you there!


PS We’ll be running our SQL After Dark evening quiz bowl and raffle again – that was a blast when we did it last time!

New SQLskills Pluralsight courses

We’ve released a couple of new Pluralsight courses recently:

The first is the conclusion of Glenn’s DMVs course series, SQL Server 2014 DMV Diagnostic Queries – Part 3, based on the very popular set of DMV queries he publishes every month.

The second course is Jonathan’s long-awaited SQL Server: Change Data Capture course. The short abstract is:

Learn how to capture SQL Server table data changes for downstream consumption using the Change Data Capture feature of SQL Server. This course is applicable to developers, DBAs, and architects on SQL Server 2008 onward.

The module list is:

  • Introduction
  • Configuring Change Data Capture
  • Querying Change Data Using Transact-SQL
  • Administration
  • Performance Tuning and Optimization
  • SQL Server 2012 SSIS Components

Jonathan’s been using CDC quite a bit with clients recently and this course is a distillation of all his knowledge and experience.

Check them both out!

We’ve got a lot more courses in production and planned for the remainder of 2015 around server consolidation, Service Broker (3 courses planned), more stored procedure performance, fragmentation, and more. Watch this space (and the Insider Newsletter) for details.


REPLICATION preventing log reuse but no replication configured

Last week, for the second time in as many weeks, I was sent a question in email from someone who had a transaction log that was growing out of control. He’d already queried log_reuse_wait_desc for the database (see this post for some more background) and the result was REPLICATION.

The problem was, there was no replication configured for that database. Just for completeness, he tried turning off the publish and merge publish settings for the database with sp_replicationdboption, ran sp_removedbreplication on the database, and then when those didn’t work, he also tried configuring and then removing replication. Nothing worked and the transaction log kept growing.

The problem turned out to be Change Data Capture. CDC uses the replication log scanning mechanism to harvest changes from the database, either piggy-backing on replication’s Log Reader Agent job or creating its own capture job if replication isn’t configured. If CDC is configured but the capture job isn’t running, the log_reuse_wait_desc will show as REPLICATION, as the log manager doesn’t have any way to know *why* the replication log scanner is configured, just that it is, and it hasn’t run.

So, if you ever see REPLICATION as the log_reuse_wait_desc and don’t have replication configured, check the is_cdc_enabled flag in sys.databases too. And then either figure out why the CDC capture job isn’t running correctly (see Change Data Capture Agent Jobs in this BOL entry), or remove CDC if it’s not supposed to be there (see this BOL entry).
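
As a quick first check, something like this (a minimal sketch) lists any databases whose log reuse is waiting on REPLICATION, along with whether CDC is enabled for them:

SELECT [name], [log_reuse_wait_desc], [is_cdc_enabled]
FROM sys.databases
WHERE [log_reuse_wait_desc] = N'REPLICATION';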

Hope that helps a few people!

PASS 2014 top-10 sessions available to watch online

PASS has just released the top-10 rated sessions from the 2014 PASS Summit for everyone to watch for free!

My Advanced Data Recovery Techniques session is in there, as is Erin’s session on Five Execution Plan Patterns To Watch For.

Congratulations to everyone who made the top-10, and very cool to see that SQLskills is the only company that has more than one speaker in the top-10 – woohoo!

You can watch all the sessions online on-demand at


Incomplete checkpoints and recovery

Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.

The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?

The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)

The purpose of a checkpoint is to bring the pages in the data files up-to-date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.

When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then if in the SIMPLE recovery mode, any log clearing/truncating can occur).
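
If you’re curious, you can see that checkpoint LSN in the boot page yourself using the undocumented (and unsupported) DBCC PAGE command – a sketch, with a hypothetical database name, where the field to look for in the output is dbi_checkptLSN:

DBCC TRACEON (3604);  -- send DBCC PAGE output to the client
GO

DBCC PAGE (N'MyDatabase', 1, 9, 3);  -- page 9 in file 1 is the boot page
GO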

So if a checkpoint started but didn’t end before a crash, its LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there’s no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn’t work correctly if it started only from the beginning of the incomplete checkpoint.

A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?

The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_CKPT log record cannot happen until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur since the last log clearing/truncation before the next one can happen. If a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this occur, but you might see it if there’s a very long running checkpoint (e.g. a very large update on a system with a slow I/O subsystem so the flushing of data file pages takes a long time) and very frequent log backups, so two log backups occur over the time taken by a single checkpoint operation. It could also happen if you’ve messed with the sp_configure recovery interval and set it higher than the default.
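
If you suspect the recovery interval has been changed from the default, a quick check (a sketch) is:

EXEC sp_configure N'show advanced options', 1;  -- recovery interval is an advanced option
RECONFIGURE;
GO

EXEC sp_configure N'recovery interval (min)';  -- default is 0, meaning SQL Server manages checkpoint frequency itself
GO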

Interesting, eh?

Monkeys and bananas: stifling innovation in the workplace

The story I wrote for our SQLskills Insider newsletter on Monday last week resonated so well with people that I thought I’d publish it here on my blog too (which I only do rarely). I try to mix things up in the newsletter so it’s not all about SQL Server technical matters; people seem to really like career topics too. You can sign up for the bi-weekly newsletter here (and get all 100 past issues too) – more than 12,000 people get it now!

Here’s the story, from the newsletter section I call Paul’s Ponderings. Enjoy!

Last week Jonathan pointed me at an interesting story about a psychology experiment involving monkeys and bananas and the reinforcement of constrained, negative thinking. The experiment actually never happened, but the story is quite illustrative. You can see a graphic of the story here: (totally safe for work), and I’ll paraphrase quickly below:

  • A group of monkeys are trained to know that if any one of them attempts to climb a stepladder with a banana at the top, cold water is sprayed on all of them (i.e. temptation is prevented by the spraying of cold water, so the group prevents any individual from trying to get the banana).
  • When a new monkey is introduced, and tries to get the banana, all the other monkeys prevent it as they don’t want to be sprayed with water. The new monkey learns that it can’t get the banana.
  • Eventually all the monkeys are replaced, and they all know not to try to get the banana, but none of them know why, as they’d only been told by the others not to get it.

It’s a really interesting story of how conditioning can become groupthink (even if the experiment didn’t actually happen).

There are obvious parallels in human society, and specifically in the work environments of many of you reading this newsletter. Let me explain:

  • A new person (A) joins a team. The new person has a great idea for how to do something differently/better, and everyone else on the team prevents them from expressing their opinion because that won’t be allowed, it won’t work, and/or they could get into trouble (e.g. from an intransigent and influential boss/architect/senior developer).
  • Eventually all the original people leave the team and it’s only A and the people who joined the team after A that are left. They all have the ingrained knowledge that they can’t try anything new, or try a specific technology because they won’t be allowed to etc.

In other words, being constrained to incumbent technologies and methodologies becomes “the way it’s done around here, because it’s always been that way.” It doesn’t matter if the wider world knows that’s maybe not the best way to do it, the historical groupthink wins out.

We see this with new clients over and over, and it can be really difficult to educate the client team as to different (usually better) ways of approaching a problem. Sometimes it’s just one person who stymies all thoughts of innovation or change in the team, and sometimes it’s the collective groupthink that does it.

In fact, years ago there was a client that Kimberly actually “fired” because of this. One meeting was enough, but she told her contact (“the management”) that she’d try again and meet for a second session. In her post-mortem with management, her main point was: your problems are more than technical (specifically person “X”), and until that’s resolved, I can’t help you.

Here are some simple examples of ingrained behavior we’ve seen and corrected:

  • Always use NOLOCK to work around blocking and deadlocking problems (and then it’s used everywhere, and no one even realizes the negatives)
  • Only ever rebuild an index to remove fragmentation, without considering FILLFACTORs or trying to reorganize rather than rebuild
  • Always use a heap and never use a clustered index because heaps are faster
  • When creating a primary key, always use a clustered index as that’s the default so Microsoft must know that’s always the right way to do it
  • For each column that’s used in a WHERE clause, create a single-column nonclustered index
  • Always use OPTION (RECOMPILE) to prevent plans being put into the plan cache and taking precious memory away from the buffer pool
  • Always use sp_executesql to put a plan into cache for ad hoc statements so that their plans get reused
  • Always create one file per processor core for tempdb, because that was the Microsoft guidance for SQL Server 2000

As I hope you can see, these are all very blinkered (and in some cases downright wrong) views on a variety of SQL Server topics. There are cases when some of these are the right thing to do, but not ALL the time just because ‘that’s the way we do it here’.

Does this remind you of anywhere? Or of anyone?

It’s a dangerous way for a team to work because it can lead to all kinds of problems (such as poor performance, failed disaster recovery, good people leaving the team, business failure) that can be extremely hard to fix unless someone takes a stand or someone from outside the team helps to break the impasse. And even then, it’s a delicate process to provide education and slowly change the thinking of the team, or of that one person who dominates the team’s thinking.

Call to action: Consider the environment you work in and whether the situation described above is how your team operates. Are you the person to take the stand to try to change the groupthink? Do you do it or do you move on? Or are you the person who’s unreasonably dominating the team’s thinking? Do you allow yourself to be changed or continue in your intransigence? It’s a difficult situation to recognize you’re in, whatever your role is, and a hard choice to decide what to do about it.

Mentoring: the Class of 2015

A couple of weeks ago I kicked off a competition of sorts, to find some people to mentor to celebrate the community voting me “Person You’d Most Like to be Mentored By” in the annual Redgate Tribal Awards.

If you remember my terms, all someone had to do to be considered was write a blog post and post a link (and the link had to work!). Below is the list of everyone who qualified, with links to their posts, in the order they were received.

  1. Garry Bargsley
  2. Dean Savović
  3. Anthony Nocentino
  4. Shawn Melton
  5. Shane Bryant
  6. Chris Bell
  7. Peter Baillie
  8. Martha Schofield
  9. Dennis Cheung
  10. Tim Moss
  11. Max Vernon
  12. Snehabandakavi
  13. Robert Bishop
  14. Peter Tran
  15. Clive Strong
  16. Marcos Galvani
  17. Sarah Falkavage
  18. Derek Czarny
  19. Siva Ramasamy
  20. Mark Sinkinson
  21. Derek Hammer
  22. Pat ? (Incessantlearner880)
  23. Steph Locke
  24. Arun ? (blobeater)
  25. Giacomo Gozzi
  26. Steve Hall
  27. Charles Olatunji
  28. Terri Champlin
  29. Sandra Gaw
  30. Todd Kleinhans
  31. Sina Hassanpour
  32. Dan Stefan
  33. Ryan Desmond
  34. Kevin Feasel
  35. Mika Sutinen
  36. Matt Bowler
  37. Kyle Hayes
  38. Paul McHenry
  39. Praveen D’sa
  40. Josh ?
  41. Gilbert Quevauvilliers
  42. John ? (logosphos16)
  43. Ognjen ?
  44. Sam Bryant
  45. Lalitha ?
  46. Tracy Boggiano
  47. Paul Popovich Jr
  48. Paulo dos Santos
  49. Priya ?
  50. JP Godoy
  51. Mark Wilkinson
  52. John Wells
  53. Travis Gan
  54. LongHairedWeirdo

I said I’d read through the blog posts and pick three men and three women to mentor over the next couple of months.

How could I figure out who to pick? Is X more worthy than Y? Would A benefit more than B? Who am I to judge that?



Yes, I cheated a bit. I had no intention of trying to arbitrarily pick someone over someone else – that would just leave people disappointed. So I told myself that I’d mentor anyone who was motivated enough to write a blog post and submit it for consideration. That was the first test.

Yup. I’m going to mentor all 54 people on the list. And I’ll spread it out over the rest of the year, rather than trying to do it all in the next two months. Yes, it’ll take a bunch of my time, but it’ll be fun.

The second test for all the people on the list is that they have to send me an email using the link in this post, from a non-work email address (as we’ll probably end up discussing your work), by the end of February. You don’t need to say anything in the email – I just need an email address to contact you to kick things off.

Congratulations to everyone on the list!

(Hopefully this will work out really well and maybe I’ll do something similar next year)

Physical security


This month’s T-SQL Tuesday (hosted by Kenneth Fisher – @sqlstudent144) is about security. This hasn’t been my area of expertise for a long time, although I did write a long TechNet Magazine article about common security issues and solutions back in 2009.

There’s a huge amount that’s been written about implementing security in SQL Server and Windows – working towards the security of the data while it’s in the database, being sent to the client application, and within the client application. This can be incredibly important for your business and your clients and so the focus there is justifiable.

However, I think there’s an aspect to data security that’s often completely overlooked: physical security.

Consider the infrastructure in your environment, and ask yourself the following questions:

  • Are the servers running SQL Server physically secured so only authorized people have access to them? I’m not just talking about whether someone can walk out with a server under their arm (and then get at the hard drives with the data on them – the actual server hardware isn’t a physical security risk if there is no data storage in it), although this is something you should consider. I also want you to consider whether an unauthorized person can walk up to such a server and insert a USB drive that could have an auto-run program on it that installs some kind of security hole.
  • And what about if the server has server-local storage? An unauthorized person could grab a hard drive from a server and clone it really quickly, maybe overnight so no-one’s available onsite to see why the server went down. Here’s a link on Amazon to a machine we use for quickly cloning laptop hard drives when we upgrade them. Really useful, but also useful in the hands of someone with nefarious aims.
  • Are the storage arrays where the data resides physically secured so only authorized people have access to them? And what about the routers? Here is a thread from the Dell support forums about making an MD3000i password reset cable from scratch. You don’t want someone to be able to physically reset the password on some storage array, and then make a connection to it from an unauthorized server on the network and gain access to the data on the drives. And then there’s the question of someone just popping out some of the drives and walking away with them…
  • Are there cameras monitoring all of the above?
  • For the questions above, now ask them about your failover data center. And what if your data center is hosted? Does the hoster guarantee physical security of your data?
  • Now let’s think about your admin users. What kind of physical security protects the desktops of the people with secure access to the data? Is it possible for them to walk away and leave their screen unlocked? Is it possible for someone to walk up to their computer and plug in a USB drive with some auto-run software on it?
  • Now let’s think about your admin users’ laptops. Same questions as above. What about if they take their laptops home? Or they use their own systems at home? Are they physically secured so someone can’t access your data from these people’s systems?

Still think your data is secure?