New SQLskills Pluralsight courses

We’ve released a couple of new Pluralsight courses recently:

The first is the conclusion of Glenn’s DMVs course series, SQL Server 2014 DMV Diagnostic Queries – Part 3, based on the very popular set of DMV queries he publishes every month.

The second course is Jonathan’s long-awaited SQL Server: Change Data Capture course. The short abstract is:

Learn how to capture SQL Server table data changes for downstream consumption using the Change Data Capture feature of SQL Server. This course is applicable to developers, DBAs, and architects on SQL Server 2008 onward.

The module list is:

  • Introduction
  • Configuring Change Data Capture
  • Querying Change Data Using Transact-SQL
  • Administration
  • Performance Tuning and Optimization
  • SQL Server 2012 SSIS Components

Jonathan’s been using CDC quite a bit with clients recently and this course is a distillation of all his knowledge and experience.

Check them both out!

We’ve got a lot more courses in production and planned for the remainder of 2015 around server consolidation, Service Broker (3 courses planned), more stored procedure performance, fragmentation, and more. Watch this space (and the Insider Newsletter) for details.

Thanks!

REPLICATION preventing log reuse but no replication configured

Last week, for the second time in as many weeks, I was sent a question in email from someone who had a transaction log that was growing out of control. He’d already queried log_reuse_wait_desc for the database (see this post for some more background) and the result was REPLICATION.

The problem was, there was no replication configured for that database. Just for completeness, he tried turning off the publish and merge publish settings for the database with sp_replicationdboption, and ran sp_removedbreplication against the database. When those didn’t work, he even tried configuring and then removing replication. Nothing worked, and the transaction log kept growing.

The problem turned out to be Change Data Capture. CDC uses the replication log scanning mechanism to harvest changes from the database, either piggy-backing on replication’s Log Reader Agent job or creating its own capture job if replication isn’t configured. If CDC is configured but the capture job isn’t running, the log_reuse_wait_desc will show as REPLICATION, as the log manager doesn’t have any way to know *why* the replication log scanner is configured, just that it is, and that it hasn’t run.

So, if you ever see REPLICATION as the log_reuse_wait_desc and don’t have replication configured, check the is_cdc_enabled flag in sys.databases too. And then either figure out why the CDC capture job isn’t running correctly (see Change Data Capture Agent Jobs in this BOL entry), or remove CDC if it’s not supposed to be there (see this BOL entry).
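
To make that concrete, here’s a minimal sketch of the check (the database name in the commented-out part is just a placeholder):

-- Check what's preventing log reuse, and whether CDC is enabled, for each database
SELECT
    [name],
    [log_reuse_wait_desc],
    [is_cdc_enabled]
FROM sys.databases;

-- If CDC is enabled but shouldn't be, disabling it from within the affected
-- database drops the CDC objects and jobs, and the log can then clear on the
-- next checkpoint or log backup
-- USE [YourDatabase];
-- EXEC sys.sp_cdc_disable_db;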

Hope that helps a few people!

PASS 2014 top-10 sessions available to watch online

PASS has just released the top-10 rated sessions from the 2014 PASS Summit for everyone to watch for free!

My Advanced Data Recovery Techniques session is in there, as is Erin’s session on Five Execution Plan Patterns To Watch For.

Congratulations to everyone who made the top-10, and very cool to see that SQLskills is the only company that has more than one speaker in the top-10 – woohoo!

You can watch all the sessions online on-demand at http://www.sqlpass.org/Events/PASSSummit/BestofPASSSummit2014.aspx.

Enjoy!

Incomplete checkpoints and recovery

Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.

The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?

The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)

The purpose of a checkpoint is to bring the pages in the data files up-to-date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.

When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then, if the database is using the SIMPLE recovery model, any log clearing/truncation can occur).
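
If you want to see that for yourself, here’s a quick sketch using the undocumented DBCC PAGE command to dump the boot page and look at the dbi_checkptLSN field in the output (the database name is just a placeholder):

DBCC TRACEON (3604);    -- send DBCC PAGE output to the client instead of the error log
GO
-- Dump the boot page (page 9 in file 1); dbi_checkptLSN is the LSN of the
-- LOP_BEGIN_CKPT log record of the most recent *completed* checkpoint
DBCC PAGE (N'YourDatabase', 1, 9, 3);
GO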

So if a checkpoint started but didn’t end before a crash, its LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there’s no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn’t work correctly if it started only from the beginning of the incomplete checkpoint.

A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?

The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_CKPT log record cannot happen until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur after the last log clearing/truncation before the next one can happen. If such a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this, but you might if there’s a very long-running checkpoint (e.g. a very large update on a system with a slow I/O subsystem, so flushing the data file pages takes a long time) combined with very frequent log backups, so that two log backups occur over the time taken by a single checkpoint operation. It could also happen if you’ve messed with the sp_configure recovery interval option and set it higher than the default.
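
As a quick sketch, you can check for both of those conditions with something like the following:

-- Which databases are currently waiting on a checkpoint before the log can clear?
SELECT [name], [log_reuse_wait_desc]
FROM sys.databases
WHERE [log_reuse_wait_desc] = N'CHECKPOINT';

-- Has the recovery interval been changed from the default of 0 (self-tuning)?
SELECT [name], [value_in_use]
FROM sys.configurations
WHERE [name] = N'recovery interval (min)';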

Interesting, eh?

Monkeys and bananas: stifling innovation in the workplace

The story I wrote for our SQLskills Insider newsletter on Monday last week resonated so well with people that I thought I’d publish it here on my blog too (which I only do rarely). I try to mix things up in the newsletter so it’s not all about SQL Server technical matters; people seem to really like career topics too. You can sign up for the bi-weekly newsletter here (and get all 100 past issues too) – more than 12,000 people get it now!

Here’s the story, from the newsletter section I call Paul’s Ponderings. Enjoy!

Last week Jonathan pointed me at an interesting story about a psychology experiment involving monkeys and bananas and the reinforcement of constrained, negative thinking. The experiment actually never happened, but the story is quite illustrative. You can see a graphic of the story here: http://i.stack.imgur.com/MyQki.jpg (totally safe for work), and I’ll paraphrase quickly below:

  • A group of monkeys are trained to know that if any one of them attempts to climb a stepladder with a banana at the top, cold water is sprayed on all of them (i.e. temptation is prevented by the spraying of cold water, so the group prevents any individual from trying to get the banana).
  • When a new monkey is introduced, and tries to get the banana, all the other monkeys prevent it as they don’t want to be sprayed with water. The new monkey learns that it can’t get the banana.
  • Eventually all the monkeys are replaced, and they all know not to try to get the banana, but none of them know why, as they’d only been told not to get the banana by the others.

It’s a really interesting story of how conditioning can become groupthink (even if the experiment didn’t actually happen).

There are obvious parallels in human society, and specifically in the work environments of many of you reading this newsletter. Let me explain:

  • A new person (A) joins a team. The new person has a great idea for how to do something differently/better, and everyone else on the team prevents them from expressing their opinion because that won’t be allowed, it won’t work, and/or they could get into trouble (e.g. from an intransigent and influential boss/architect/senior developer).
  • Eventually all the original people leave the team and only A and the people who joined the team after A are left. They all have the ingrained knowledge that they can’t try anything new or use a particular technology, because they won’t be allowed to.

In other words, being constrained to incumbent technologies and methodologies becomes “the way it’s done around here, because it’s always been that way.” It doesn’t matter if the wider world knows that’s maybe not the best way to do it, the historical groupthink wins out.

We see this with new clients over and over, and it can be really difficult to educate the client team as to different (usually better) ways of approaching a problem. Sometimes it’s just one person who stymies all thoughts of innovation or change in the team, and sometimes it’s the collective groupthink that does it.

In fact, years ago there was a client that Kimberly actually “fired” because of this. One meeting was enough, but she told her contact (“the management”) that she’d try a second session anyway. In her post-mortem with management, her main point was: your problems are more than technical (specifically, person “X”), and until that’s resolved, I can’t help you.

Here are some simple examples of ingrained behavior we’ve seen and corrected:

  • Always use NOLOCK to work around blocking and deadlocking problems (and then it’s used everywhere, and no one even realizes the negatives)
  • Only ever rebuild an index to remove fragmentation, without considering FILLFACTORs or trying to reorganize rather than rebuild
  • Always use a heap and never use a clustered index because heaps are faster
  • When creating a primary key, always use a clustered index as that’s the default so Microsoft must know that’s always the right way to do it
  • For each column that’s used in a WHERE clause, create a single-column nonclustered index
  • Always use OPTION (RECOMPILE) to prevent plans being put into the plan cache and taking precious memory away from the buffer pool
  • Always use sp_executesql to put a plan into cache for ad hoc statements so that their plans get reused
  • Always create one file per processor core for tempdb, because that was the Microsoft guidance for SQL Server 2000

As I hope you can see, these are all very blinkered (and in some cases downright wrong) views on a variety of SQL Server topics. There are cases when some of these are the right thing to do, but not ALL the time just because ‘that’s the way we do it here’.

Does this remind you of anywhere? Or of anyone?

It’s a dangerous way for a team to work because it can lead to all kinds of problems (such as poor performance, failed disaster recovery, good people leaving the team, business failure) that can be extremely hard to fix unless someone takes a stand or someone from outside the team helps to break the impasse. And even then, it’s a delicate process to provide education and slowly change the thinking of the team, or of that one person who dominates the team’s thinking.

Call to action: Consider the environment you work in and whether the situation described above is how your team operates. Are you the person to take the stand to try to change the groupthink? Do you do it or do you move on? Or are you the person who’s unreasonably dominating the team’s thinking? Do you allow yourself to be changed or continue in your intransigence? It’s a difficult situation to recognize you’re in, whatever your role is, and a hard choice to decide what to do about it.

Mentoring: the Class of 2015

A couple of weeks ago I kicked off a competition of sorts, to find some people to mentor, to celebrate the community voting me “Person You’d Most Like to be Mentored By” in the annual Redgate Tribal Awards.

If you remember my terms, all someone had to do to be considered was write a blog post and post a link (and the link had to work!). Below is the list of everyone who qualified, with links to their posts, in the order they were received.

  1. Garry Bargsley
  2. Dean Savović
  3. Anthony Nocentino
  4. Shawn Melton
  5. Shane Bryant
  6. Chris Bell
  7. Peter Baillie
  8. Martha Schofield
  9. Dennis Cheung
  10. Tim Moss
  11. Max Vernon
  12. Snehabandakavi
  13. Robert Bishop
  14. Peter Tran
  15. Clive Strong
  16. Marcos Galvani
  17. Sarah Falkavage
  18. Derek Czarny
  19. Siva Ramasamy
  20. Mark Sinkinson
  21. Derek Hammer
  22. Pat ? (Incessantlearner880)
  23. Steph Locke
  24. Arun ? (blobeater)
  25. Giacomo Gozzi
  26. Steve Hall
  27. Charles Olatunji
  28. Terri Champlin
  29. Sandra Gaw
  30. Todd Kleinhans
  31. Sina Hassanpour
  32. Dan Stefan
  33. Ryan Desmond
  34. Kevin Feasel
  35. Mika Sutinen
  36. Matt Bowler
  37. Kyle Hayes
  38. Paul McHenry
  39. Praveen D’sa
  40. Josh ?
  41. Gilbert Quevauvilliers
  42. John ? (logosphos16)
  43. Ognjen ?
  44. Sam Bryant
  45. Lalitha ?
  46. Tracy Boggiano
  47. Paul Popovich Jr
  48. Paulo dos Santos
  49. Priya ?
  50. JP Godoy
  51. Mark Wilkinson
  52. John Wells
  53. Travis Gan
  54. LongHairedWeirdo

I said I’d read through the blog posts and pick three men and three women to mentor over the next couple of months.

How could I figure out who to pick? Is X more worthy than Y? Would A benefit more than B? Who am I to judge that?

So…….

EVERYBODY WINS!

Yes, I cheated a bit. I had no intention of trying to arbitrarily pick someone over someone else – that would just leave people disappointed. So I told myself that I’d mentor anyone who was motivated enough to write a blog post and submit it for consideration. That was the first test.

Yup. I’m going to mentor all 54 people on the list. And I’ll spread it out over the rest of the year, rather than trying to do it all in the next two months. Yes, it’ll take a bunch of my time, but it’ll be fun.

The second test for all the people on the list is that they have to send me an email using the link in this post, from a non-work email address (as we’ll probably end up discussing your work), by the end of February. You don’t need to say anything in the email – I just need an email address to contact you to kick things off.

Congratulations to everyone on the list!

(Hopefully this will work out really well and maybe I’ll do something similar next year)

Physical security

This month’s T-SQL Tuesday (hosted by Kenneth Fisher – @sqlstudent144) is about security. This hasn’t been my area of expertise for a long time, although I did write a long TechNet Magazine article about common security issues and solutions back in 2009.

There’s a huge amount that’s been written about implementing security in SQL Server and Windows – protecting data while it’s in the database, while it’s being sent to the client application, and within the client application itself. This can be incredibly important for your business and your clients, so the focus there is justifiable.

However, I think there’s an aspect to data security that’s often completely overlooked: physical security.

Consider the infrastructure in your environment, and ask yourself the following questions:

  • Are the servers running SQL Server physically secured so only authorized people have access to them? I’m not just talking about whether someone can walk out with a server under their arm (and then pull out the hard drives with the data on them – the server hardware itself isn’t a physical security risk if there’s no data storage in it), although this is something you should consider. I also want you to consider whether an unauthorized person could walk up to such a server and insert a USB drive with an auto-run program on it that installs some kind of security hole.
  • And what if the server has server-local storage? An unauthorized person could grab a hard drive from a server and clone it really quickly, maybe overnight so no-one’s available onsite to see why the server went down. Here’s a link on Amazon to a machine we use for quickly cloning laptop hard drives when we upgrade them. Really useful, but also useful in the hands of someone with nefarious aims.
  • Are the storage arrays where the data resides physically secured so only authorized people have access to them? And what about the routers? Here is a thread from the Dell support forums about making an MD3000i password reset cable from scratch. You don’t want someone to be able to physically reset the password on some storage array, and then make a connection to it from an unauthorized server on the network and gain access to the data on the drives. And then there’s the question of someone just popping out some of the drives and walking away with them…
  • Are there cameras monitoring all of the above?
  • For the questions above, now ask them about your failover data center. And what if your data center is hosted? Does the hoster guarantee the physical security of your data?
  • Now let’s think about your admin users. What kind of physical security protects the desktops of the people with secure access to the data? Is it possible for them to walk away and leave their screen unlocked? Is it possible for someone to walk up to their computer and plug in a USB drive with some auto-run software on it?
  • Now let’s think about your admin users’ laptops. Same questions as above. What about if they take their laptops home? Or they use their own systems at home? Are they physically secured so someone can’t access your data from these people’s systems?

Still think your data is secure?

Want to be mentored by me?

Every year Redgate hands out a series of Tribal Awards, voted for by the SQL Server community around the world. In the 2013 awards I won the “Most Authoritative Voice” category (I think just because I’m very tall and have a deep voice :-) and in the 2014 awards, announced a few weeks ago, I won the “Person You’d Most Like to be Mentored By” category.

Cool! I like to mentor people and that’s one of the most enjoyable parts of managing teams, both when I was at Microsoft, and my team here at SQLskills.

Given that you all voted for me, I’m going to give you the opportunity to be mentored by me. I’ll mentor up to three men and three women during March and April to help with career guidance, skills growth, or whatever I can help you with (disclaimers: I do all mentoring over email – asynchronicity rules! I reserve the right to change the mix of mentees if I’m short on applicants).

How do you qualify? Write a blog post explaining why you should be mentored, and then post a link to your post as a comment on this post (it won’t appear right away as comments are moderated due to the amount of spam comments).

Don’t make the post about me, make it about you. I already know about me :-)

You’ve got until Sunday 15th February at 23:59:59 PST. I’ll pick the winners based on my sole judgment and announce them on Monday 16th February. I don’t care who you are, where you are, who you work for, or how good your English is; you have an equal chance of being picked as anyone else. It’s all down to the blog post.

Note: the contest is over, please do not submit any further entries.

Have fun!

PS So far people are focusing on what they know about SQL Server. I’m not going to teach you what I know about SQL Server in two months – that’s what my blog and training materials are for – but I could help you figure out *how* to learn more, if that’s really the most important thing in your career that you need help with. A blog post about what you do and have done with SQL Server tells me nothing about you or what you want to get out of being mentored….

PPS Remember that mentoring means you working to figure out what you need help with first of all – if you can’t tell me what you need help with (and I’ll provide a whole bunch of suggestions) then there’s nothing I can do…

PPPS Mentoring (as far as I’m concerned) isn’t about firing off a bunch of questions about day-to-day SQL Server problems you’re having – that’s not what I’ll help you with…

Important change to VLF creation algorithm in SQL Server 2014

Since SQL Server 2014 was released back in April last year, there have been some rumblings about changes to how many VLFs are created when the log is grown or auto-grown (I’ll just say auto-grown from now on, as that’s the most common scenario). I experimented a bit and thought I’d figured out the algorithm change. Turns out I hadn’t. There was a question on the MVP distribution list last week that rekindled the discussion and we collectively figured out that the algorithm was behaving non-deterministically… in other words we didn’t know what it was doing. So I pinged my friends in CSS who investigated the code (thanks Bob Ward and Suresh Kandoth!) and explained the change.

The change is pretty profound, and is aimed at preventing lots of auto-growth from creating huge numbers of VLFs. This is cool because having too many (it depends on the log size, but many thousands is too many) VLFs can cause all manner of performance problems around backups, restores, log clearing, replication, crash recovery, rollbacks, and even regular DML operations.

Up to 2014, the algorithm for how many VLFs you get when you create, grow, or auto-grow the log is based on the size in question:

  • Less than 1 MB: complicated; ignore this case.
  • Up to 64 MB: 4 new VLFs, each roughly 1/4 the size of the growth
  • 64 MB to 1 GB: 8 new VLFs, each roughly 1/8 the size of the growth
  • More than 1 GB: 16 new VLFs, each roughly 1/16 the size of the growth

So if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you’d have 8 + ((200 – 1) x 2 x 8) = 3192 VLFs. (8 VLFs from the initial creation, then 200 – 1 = 199 GB of growth at 512 MB per auto-grow = 398 auto-growths, each producing 8 VLFs.)

For SQL Server 2014, the algorithm is now:

  • Is the growth size less than 1/8 the size of the current log size?
  • Yes: create 1 new VLF equal to the growth size
  • No: use the formula above

So on SQL Server 2014, if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you’d have:

  • 8 VLFs from the initial log creation
  • All growths up to the log reaching 4.5 GB would use the formula, so the growths occurring while the log is 1, 1.5, 2, 2.5, 3, 3.5, and 4 GB would each add 8 VLFs = 7 x 8 = 56 VLFs
  • All growths from 4.5 GB onward only create 1 VLF per growth = (200 – 4.5) x 2 = 391 VLFs
  • Total = 391 + 56 + 8 = 455 VLFs

455 is a much more reasonable number of VLFs than 3192, and will be far less of a performance problem.
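
As an aside, if you want to check how many VLFs a log currently has, a simple way on these versions is the undocumented DBCC LOGINFO command, which returns one row per VLF:

USE [YourDatabase];    -- placeholder database name
GO
-- One row is returned per VLF in the current database's transaction log,
-- so the number of rows returned is the number of VLFs
DBCC LOGINFO;
GO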

A commenter asked whether compatibility level affects this. No – compatibility level is ignored by the Storage Engine internals.

I think this is an excellent change and I can’t see any drawbacks from it (apart from that it wasn’t publicized when SQL Server 2014 was released). CSS will be doing a comprehensive blog post about this soon, but they were cool with me making people aware of the details of the change ASAP to prevent confusion.

You might think that it could lead to very large VLFs (e.g. you set a 4 GB auto-growth size with a 100 GB log), and it can. But so what? Having very large VLFs is only a problem if they’re created initially and then you try to shrink the log down. At a minimum you can only have two VLFs in the log, so you’d be stuck with two giant VLFs at the start of the log and then smaller ones after you’d grown the log again. That can be a problem that prevents the log being able to wrap around and avoid auto-growth, but that’s not anywhere near as common as having too many VLFs. And that’s NOT a scenario that the new algorithm creates. (As an aside, you can fix that problem by creating a database snapshot and then reverting to it, which deletes the log and creates a 0.5 MB log with two tiny VLFs… it’s a bug/feature that’s been there since 2005, but it breaks your log backup chain when you do it.)

There’s certainly more that can be done in future around VLF management (e.g. fixing the problem I describe immediately above), but this is a giant step in the right direction.

Enjoy!

Updated sys.dm_os_waiting_tasks script

Over the holidays I was playing around with parallelism and updated my sys.dm_os_waiting_tasks script to add in the scheduler distribution of the waiting threads. Here it is for your use.

Enjoy!

(Note that ‘text’ on one line does not have delimiters because that messes up the code formatting plugin):

/*============================================================================
  File:     WaitingTasks.sql

  Summary:  Snapshot of waiting tasks

  SQL Server Versions: 2005 onwards
------------------------------------------------------------------------------
  Written by Paul S. Randal, SQLskills.com

  (c) 2015, SQLskills.com. All rights reserved.

  For more scripts and sample code, check out
  http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes. You may
  republish altered code as long as you include this copyright and give due
  credit, but you must obtain prior permission before blogging this code.
  
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/
SELECT
    [owt].[session_id],
    [owt].[exec_context_id],
    [ot].[scheduler_id],
    [owt].[wait_duration_ms],
    [owt].[wait_type],
    [owt].[blocking_session_id],
    [owt].[resource_description],
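    -- For CXPACKET waits, the parallelism node ID is the part of the
    -- resource description after the final '=', so extract just that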
    CASE [owt].[wait_type]
        WHEN N'CXPACKET' THEN
            RIGHT ([owt].[resource_description],
                CHARINDEX (N'=', REVERSE ([owt].[resource_description])) - 1)
        ELSE NULL
    END AS [Node ID],
    --[es].[program_name],
    [est].text,
    [er].[database_id],
    [eqp].[query_plan],
    [er].[cpu_time]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_os_tasks [ot] ON
    [owt].[waiting_task_address] = [ot].[task_address]
INNER JOIN sys.dm_exec_sessions [es] ON
    [owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
    [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
ORDER BY
    [owt].[session_id],
    [owt].[exec_context_id];
GO