Finding a transaction in the log for a particular user

In the last IEHADR class we just had in Chicago, I was doing a demo of looking in the transaction log to find the point at which a table was dropped so a restore could be performed (as described in this blog post). One of the students asked how to find a transaction for a particular user, so I demo’d that and thought it would make a good little post.

This can be done using fn_dblog or, if the relevant log isn’t available on the system any more, using fn_dump_dblog against log backups (albeit more slowly).

The two pieces of information needed are the user and the rough time that the transaction occurred.

The user can’t be used to search in the log directly, but every LOP_BEGIN_XACT log record contains the SID of who ran the transaction. The SID can be obtained from the username using the SUSER_SID function (remember that it’s more complicated if someone’s used EXECUTE AS, as that can mask who they really are – details in this post).

For instance, on my laptop:
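(The login name below is just a placeholder; substitute whichever login you’re interested in.)

SELECT SUSER_SID (N'MYDOMAIN\SomeUser') AS [SID]; -- placeholder login name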


Then you can use that SID as a filter for fn_dblog (or fn_dump_dblog), like so:

SELECT
	[Current LSN],
	[Operation],
	[Transaction ID],
	[Begin Time],
	LEFT ([Description], 40) AS [Description]
FROM
	fn_dblog (NULL, NULL)
WHERE
	[Transaction SID] = SUSER_SID (N'MYDOMAIN\SomeUser'); -- placeholder login name
Current LSN             Operation                       Transaction ID Begin Time               Description
----------------------- ------------------------------- -------------- ------------------------ ----------------------------------------
00000021:0000049d:0001  LOP_BEGIN_XACT                  0000:000006c8  2015/06/03 11:18:13:790  Backup:CommitDifferentialBase;0x01050000
00000021:000004a5:0001  LOP_BEGIN_XACT                  0000:000006c9  2015/06/03 11:18:13:810  Backup:CommitLogArchivePoint;0x010500000
00000021:000004a5:0002  LOP_BEGIN_XACT                  0000:000006ca  2015/06/03 11:18:13:810  Backup:CommitLogArchivePoint;0x010500000
00000021:000004a7:0003  LOP_BEGIN_XACT                  0000:000006cb  2015/06/03 11:18:13:820  INSERT;0x0105000000000005150000008e888d4
00000021:000004a7:0004  LOP_BEGIN_XACT                  0000:000006cc  2015/06/03 11:18:13:820  AllocHeapPageSimpleXactDML;0x01050000000
00000021:000004a7:0007  LOP_BEGIN_XACT                  0000:000006cd  2015/06/03 11:18:13:820  AllocFirstPage;0x01050000000000051500000
00000021:000004ad:0002  LOP_BEGIN_XACT                  0000:000006ce  2015/06/03 11:18:13:820  INSERT;0x0105000000000005150000008e888d4
00000021:000004ae:0001  LOP_BEGIN_XACT                  0000:000006cf  2015/06/03 11:18:16:112  INSERT;0x0105000000000005150000008e888d4
00000021:000004af:0001  LOP_BEGIN_XACT                  0000:000006d0  2015/06/03 11:19:17:306  INSERT;0x0105000000000005150000008e888d4
00000021:000004b0:0001  LOP_BEGIN_XACT                  0000:000006d1  2015/06/03 11:22:35:451  DELETE;0x0105000000000005150000008e888d4
00000021:000004b1:0001  LOP_BEGIN_XACT                  0000:000006d2  2015/06/03 11:27:42:998  INSERT;0x0105000000000005150000008e888d4
00000021:000004b2:0001  LOP_BEGIN_XACT                  0000:000006d3  2015/06/03 11:29:56:044  DELETE;0x0105000000000005150000008e888d4


Obviously the transactions above are a contrived example. You can imagine the case of lots of transactions spread out over a few hours (or even over a day, investigated through log backups with fn_dump_dblog). To narrow things down to the transaction you want, you could look through the list manually for the rough time, or specify a time range on the SELECT using predicates on the Begin Time column in the fn_dblog output.

For example:

SELECT
	[Current LSN],
	[Operation],
	[Transaction ID],
	[Begin Time],
	LEFT ([Description], 40) AS [Description]
FROM
	fn_dblog (NULL, NULL)
WHERE
	[Transaction SID] = SUSER_SID (N'MYDOMAIN\SomeUser') -- placeholder login name
	AND ([Begin Time] > '2015/06/03 11:18:15' AND [Begin Time] < '2015/06/03 11:18:25');
Current LSN             Operation                       Transaction ID Begin Time               Description
----------------------- ------------------------------- -------------- ------------------------ ----------------------------------------
00000021:000004ae:0001  LOP_BEGIN_XACT                  0000:000006cf  2015/06/03 11:18:16:112  INSERT;0x0105000000000005150000008e888d4
00000021:000004af:0001  LOP_BEGIN_XACT                  0000:000006d0  2015/06/03 11:19:17:306  INSERT;0x0105000000000005150000008e888d4
00000021:000004b0:0001  LOP_BEGIN_XACT                  0000:000006d1  2015/06/03 11:22:35:451  DELETE;0x0105000000000005150000008e888d4

And if you knew what the operation was, you could narrow it down by the Description too.
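For example, a sketch that keeps the placeholder login from above and only returns transactions whose Description starts with DELETE:

SELECT
	[Current LSN],
	[Operation],
	[Transaction ID],
	[Begin Time],
	LEFT ([Description], 40) AS [Description]
FROM
	fn_dblog (NULL, NULL)
WHERE
	[Transaction SID] = SUSER_SID (N'MYDOMAIN\SomeUser') -- placeholder login name
	AND [Description] LIKE 'DELETE%';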

Then it’s a simple case of taking the Current LSN of the LOP_BEGIN_XACT log record of the transaction you’re interested in, and restoring a copy of the database using the STOPBEFOREMARK trick (that I showed in my previous post on using this stuff) to restore to a point just before that transaction.
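As a rough sketch of that restore sequence (the database, file, and backup names here are placeholders, the LSN is the Current LSN of the DELETE transaction from the output above, and the earlier post describes the exact LSN format that STOPBEFOREMARK expects):

RESTORE DATABASE [CompanyCopy]
FROM DISK = N'D:\Backups\Company_Full.bak'
WITH
	MOVE N'Company' TO N'D:\Data\CompanyCopy.mdf',
	MOVE N'Company_log' TO N'D:\Data\CompanyCopy_log.ldf',
	NORECOVERY, REPLACE;

RESTORE LOG [CompanyCopy]
FROM DISK = N'D:\Backups\Company_Log.bak'
WITH
	STOPBEFOREMARK = 'lsn:0x00000021:000004b2:0001', -- just before the LOP_BEGIN_XACT of the DELETE
	NORECOVERY;

RESTORE DATABASE [CompanyCopy] WITH RECOVERY;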


New course: part 2 of Kimberly’s Optimizing Stored Procedure Performance

Kimberly’s latest Pluralsight course is live!

This is part 2 of her very popular course: SQL Server: Optimizing Stored Procedure Performance. The course is more than 3 hours long and has the following modules:

  • Introduction
  • Session Settings
  • Session Settings (section 2)
  • Special Considerations
  • Troubleshooting Techniques

Check it out here.

We now have more than 130 hours of SQLskills online training available, all for as little as $29/month through Pluralsight (including more than four thousand other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.


How would indexes on AG readable secondaries work?

Last weekend there was a suggestion on the MVP distribution list about having temporary nonclustered indexes on AG readable secondaries, in the same way that you can have temporary statistics on them. I replied that in my opinion it would be extremely difficult to do that, and said I’d blog about why. Here’s my explanation. Note that this isn’t an exhaustive list of all the problems, just the major ones that I see.

Where to store them?

There are several options that spring to mind for how to store these temporary indexes, with varying degrees of difficulty:

  1. Automatically create a temporary filegroup on the readable secondary
    1. This option has the problem that the readable secondary is read-only, and adding a filegroup would mean adding entries in a bunch of system tables (including sys.sysfiles1, sys.sysdbfiles, sys.sysbrickfiles, sys.sysprufiles). Even if this problem could be surmounted, there’s still the problem of…
    2. The readable secondary is read-only, so where to store all the information about the indexes themselves? There are a large number of system tables that have information about an index (including sys.sysallocunits, sys.sysrowsets, sys.sysrscols, sys.sysidxstats, sys.sysiscols). Even if this problem could be surmounted, there’s still the problem of maintaining the index (see below).
  2. Create an empty filegroup on the primary replica so there’s an empty filegroup to use on the readable secondary
    1. This only solves 1.a above.
  3. Store them in tempdb, the same as the temporary statistics
    1. This solves 1.a and 1.b, but then has the added difficulty of…
    2. Keeping track of the fact that there are indexes in tempdb that are really for another database, which already happens for temporary statistics, so that’s doable but there’s the much more complex problem of…
    3. Making the Storage Engine (specifically the Access Methods) get data from the tempdb index instead of an index in the real database. That’s not a code change in the Storage Engine (because the Access Methods just creates and uses a data set over whatever the Query Processor asks it to), but the query plan will have to know that the index it’s referencing which purports to be on a table in the real database is actually in tempdb, so it asks the Storage Engine to read from the correct place. I can imagine this being quite a challenge for things like a key lookup based on a nonclustered index seek/scan.
    4. This is the only potentially practical solution in my mind for where to create the temporary indexes.

That’s the easier part taken care of.

How to maintain them?

Assuming they are stored as in #3 above, then there’s the really hard problem of how to maintain the indexes, to keep them up-to-date with the underlying table (i.e. when an INSERT, UPDATE, or DELETE happens, make sure that all nonclustered indexes are updated accordingly). Creating the index would be relatively trivial using an online index build-like method, so I won’t go into that.

Back in SQL Server 2000, it would be much easier (although there weren’t any AGs back then :-) because the Storage Engine was responsible for real-time maintenance of nonclustered indexes. Since SQL Server 2005, however, this has been the purview of the Query Processor, so it drives maintaining indexes and the Storage Engine just does what it’s asked (insert a record here, update this column, etc.). There is functionality available to the Storage Engine to ask the Query Processor to manipulate nonclustered indexes without an operation occurring on the table – DBCC CHECKDB uses it for fixing missing or extra records in nonclustered indexes during repair operations.

Here’s the major problem with temporary indexes: the only data coming from the primary replica are physical log records. How to translate those physical log records into nonclustered index maintenance operations?

The only log records that are interesting are those related to the table itself (i.e. changes to the heap or clustered index). But log records are replayed by the recovery portion of the Storage Engine, not the Access Methods, so in the recovery portion there is no context at all about the table, its columns, indexes, and so on. To make this work, the following would be required:

  1. The recovery portion would have to know that log records (LOP_INSERT_ROWS, LOP_DELETE_ROWS, LOP_MODIFY_ROW, LOP_MODIFY_COLUMNS, plus anything else like truncating the table, or changing from a heap to a clustered index and vice versa) for certain allocation units (i.e. those for any table with a temporary nonclustered index) need to be passed to a new piece of code that does the temporary index maintenance.
  2. The new piece of code would have to know about the new indexes and do the index maintenance. In other words, Storage Engine nonclustered index maintenance like in SQL Server 2000 would have to be built again. There’s a complication with table record modifications, as the LOP_MODIFY_ROW and LOP_MODIFY_COLUMNS log records don’t say which columns are changing – just the offset and length of the change (they’re physical log records remember). The log record could potentially be changed to have a bitmap of column IDs being changed, that’s only present when there’s a temporary index on a readable secondary. More conditional code.
  3. Another option is for the Query Processor on the primary replica to know that there’s a temporary index and generate special logical operation log records to aid in the maintenance of the index on the readable secondary (in the same way that transactional replication works). I think this would be easier than having to interpret regular log records and figure out what to do.
  4. All this new code would potentially slow down the replay of log records, with the slowdown increasing with each temporary index that’s added.


To summarize, there are possible solutions to how to store the temporary indexes and how to maintain them, but it’s a lot of tricky work and involves a big change to the recovery code, which is always fraught with danger as it’s such a critical portion of the Storage Engine. And I haven’t touched on partitioning, rebuild/reorganize, and other things you can do with indexes.

IMHO, the easiest solution is to not use temporary indexes and just create the indexes on the primary replica that you want to be there on the readable secondary. This doesn’t require any code changes and is available today. Of course there’s the potential downside of extra space being taken up, extra logging, and extra time to send that log to the secondaries, but I think this is the way to go.
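For example, something like this run on the primary replica (a hypothetical table and index name); the index is then present on every readable secondary once the log records have been redone there:

CREATE NONCLUSTERED INDEX [IX_Sales_CustomerID]
ON [dbo].[Sales] ([CustomerID]); -- hypothetical index: create it on the primary and it is redone on all secondaries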

Hope you found this interesting!

Survey: page life expectancy

It’s been a while since I’ve done a survey, and this one is inspired by a discussion I had with Matt Slocum (b | t) yesterday that led me to update my PLE post Page Life Expectancy isn’t what you think… with the discovery that on NUMA systems, Buffer Manager: Page Life Expectancy is a harmonic mean, not an arithmetic mean – big difference.

Anyway, in this survey I want some information about your page life expectancy and other supporting information:

  • SQL Server version
  • Number of logical processor cores
  • Buffer Node: Page Life Expectancy and Target/Total Node Memory for each buffer pool partition
  • Buffer Manager: Page Life Expectancy

Feel free to run the code below any way you want, and also add a single preceding column to the result set (e.g. server name or number) if you want, but PLEASE do not add any *rows* of data apart from what I’ve asked for, otherwise it makes the data processing very time consuming, especially if you send results from hundreds of servers. I know people who do that are trying to be helpful, but I really don’t need any other data apart from what I’ve asked for.

You can send me results in email in a text file or spreadsheet, or leave a comment below. The code will work on SQL Server 2005 onwards.

[Edit: 6/17/15 – the survey is closed – please don’t send any more results – thanks!]

I’ll editorialize the results in a week or two.


SELECT
	LEFT (@@version, 25),
	'Cores available:',
	COUNT (*)
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'; -- only schedulers available for user queries

SELECT
	[object_name] AS [ObjectName],
	[counter_name] AS [CounterName],
	[cntr_value] AS [CounterValue]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Memory Node%'
	AND [counter_name] = 'Target Node Memory (KB)';

SELECT
	[object_name] AS [ObjectName],
	[counter_name] AS [CounterName],
	[cntr_value] AS [CounterValue]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Memory Node%'
	AND [counter_name] = 'Total Node Memory (KB)';

SELECT
	[object_name] AS [ObjectName],
	[counter_name] AS [CounterName],
	[cntr_value] AS [CounterValue]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Node%'
	AND [counter_name] = 'Page life expectancy';

SELECT
	[object_name] AS [ObjectName],
	[counter_name] AS [CounterName],
	[cntr_value] AS [CounterValue]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
	AND [counter_name] = 'Page life expectancy';

Make sure you’re eligible to vote in the PASS elections this year!

Last year PASS introduced some eligibility requirements for voting in the 2014 PASS elections, but a lot of people (including me) didn’t get the message for various reasons before the deadline expired. PASS did the right thing and allowed people to become eligible by delaying the elections.

This year there are similar eligibility requirements and you’ve got until June 1st to make sure you’re eligible.

Check out the blog post from Bill Graziano (Immediate Past President of PASS) that explains what to do and how to verify your eligibility.

I’ve checked and I’m eligible (early this year! :-)



Identifying queries with SOS_SCHEDULER_YIELD waits

One of the problems with the SOS_SCHEDULER_YIELD wait type is that it’s not really a wait type. When this wait type occurs, it’s because a thread exhausted its 4ms scheduling quantum and voluntarily yielded the CPU, going directly to the bottom of the Runnable Queue for the scheduler, bypassing the Waiter List. A wait has to be registered though when a thread goes off the processor, so SOS_SCHEDULER_YIELD is used.

You can read more about this wait type:

  • Here on my blog
  • Here in my post on the blog

You want to investigate these waits if they’re a prevalent wait on your server, as they could be an indicator of large scans happening (of data that’s already in memory) where you’d really rather have small index seeks.

The problem is that it’s not a real wait type, so you can’t use my script that looks at sys.dm_os_waiting_tasks to get the query plans of threads incurring it: these threads aren’t waiting for a resource, so they don’t show up in the output of sys.dm_os_waiting_tasks at all!

The solution is to use the sys.dm_exec_requests DMV, as that will show the last_wait_type for all running requests. Below is a script you can use.

SELECT
	[er].[session_id],
	[es].[program_name],
	[est].[text] AS [QueryText],
	[eqp].[query_plan] AS [QueryPlan] -- pick whichever columns you need
FROM sys.dm_exec_requests [er]
INNER JOIN sys.dm_exec_sessions [es] ON
	[es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
	[es].[is_user_process] = 1
	AND [er].[last_wait_type] = N'SOS_SCHEDULER_YIELD';

That will give you the code and query plan of what’s happening, but even then it might not be obvious exactly which operator is causing the wait, so you may need to resort to capturing SQL Server call stacks, as I explain in the first blog post linked above.


New 4-day Immersion Event on Practical Data Science with Rafal Lukawiecki

We’re very excited to announce that we’re offering a new Immersion Event on Practical Data Science using Azure Machine Learning, SQL Data Mining, and R, presented by our great friend Rafal Lukawiecki.

We’ve known Rafal well since 2003, and apart from being an expert in BI and data mining, he’s a superb, highly-engaging speaker, who’s presented keynotes, workshops, and sessions to high acclaim at conferences around the world including TechEds and Microsoft’s brand new Ignite conference. In fact, you can watch Rafal’s 75-minute, 300-level session from Ignite earlier this week, on the same topic as the new Immersion Event, on Channel 9 here – it’s really interesting!

Rafal’s course will be four days long, with the following modules:

  1. Overview of Practical Data Science for Business
  2. Data
  3. Process
  4. Algorithm Overview
  5. Tools and Getting Started
  6. Segmentation
  7. Classification
  8. Basic Statistics
  9. Model Validation
  10. Classifier Precision
  11. Regressions
  12. Similarity Matching and Recommenders
  13. Other Algorithms (Brief Overview)
  14. Production and Model Maintenance

You can read through the detailed curriculum here.

The first offering of this new class will be in Chicago, at our usual location, from November 16-19, 2015 – all the details are here.

Cool stuff – hope to see you there!


Here’s an interesting question that came up in our IEPTO1 class in Chicago this week (paraphrasing):

I was doing a demo recently where I was demonstrating physical I/Os occurring. I used DBCC DROPCLEANBUFFERS, then enabled SET STATISTICS IO ON and performed a SELECT operation. I was expecting to see physical reads occurring but I only saw logical reads. What’s going on? Why weren’t there any physical reads after I’d flushed the buffer pool with DBCC DROPCLEANBUFFERS?

It’s a very interesting question. If you ask most people what DBCC DROPCLEANBUFFERS does, you’ll get the response that it clears out the buffer pool (as my student expected).

But it doesn’t.

It drops *clean* pages from the buffer pool only.

A clean page is one that has not been changed since it was read into memory or last written to disk. A dirty page is one that has not been written to disk since it was last changed. Dirty pages are not dropped by DBCC DROPCLEANBUFFERS, they are only made clean by writing them to disk (either through one of the various kinds of checkpoints or by the lazy writer – or one of the per-NUMA node lazy writers if you have NUMA configured).

The demo in question had updated the table being selected, and so when DBCC DROPCLEANBUFFERS was executed, the pages from that table remained in memory – hence no physical reads were required for the subsequent SELECT.

If you want to ensure that all pages from a database are flushed from memory, you need to first perform a manual CHECKPOINT of that database and then run DBCC DROPCLEANBUFFERS.
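For example (the database name is a placeholder):

USE [YourDatabase]; -- placeholder database name
GO
CHECKPOINT;            -- write this database's dirty pages to disk, making them clean
GO
DBCC DROPCLEANBUFFERS; -- now drop all clean pages from the buffer pool
GO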

You can then verify that there are no pages in memory for the database in question using the following code:

SELECT
	[DatabaseName],
	[DirtyPageCount] * 8 / 1024 AS [DirtyPageMB],
	[CleanPageCount] * 8 / 1024 AS [CleanPageMB]
FROM
	(SELECT
		(CASE WHEN ([database_id] = 32767)
			THEN N'Resource Database'
			ELSE DB_NAME ([database_id]) END) AS [DatabaseName], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 1 ELSE 0 END) AS [DirtyPageCount], 
		SUM (CASE WHEN ([is_modified] = 1)
			THEN 0 ELSE 1 END) AS [CleanPageCount]
	FROM sys.dm_os_buffer_descriptors
	GROUP BY [database_id]) AS [buffers]
ORDER BY [DatabaseName];

Hope this helps some of you out there!

SQLintersection coming up in May

We’re just about to leave for three weeks of Immersion Events in Chicago, starting next week, and if that wasn’t enough, we’ve got our Spring 2015 SQLintersection conference right after that!

This time we’re in Scottsdale (Phoenix), Arizona (May 18-21), which I’m really looking forward to as I’ve never been to Arizona (it’ll be the 23rd State I’ve visited).

We’ve got a fantastic speaker line-up as usual, with Brent Ozar, Allen White, Aaron Bertrand, Kevin Kline, SQL CAT team members Shep Shephard and Denzil Ribeiro, SQL PFEs David Pless and Tim Chapman, and four of us from SQLskills (me, Kimberly, Erin, and Tim).

I’ll be doing my favorite pre-con workshop on Performance Troubleshooting using Waits and Latches, plus there are other workshops from Brent Ozar, Kimberly, and Allen White. We’ve also got four tracks of sessions, including three from me:

  • Transaction Log Performance
  • Advanced Data Recovery Techniques
  • DBA Mythbusters

It’s been five years since I last put together a mythbusters session, so there are plenty of new myths to bust :-)

I’m really looking forward to the conference and hope to see you there!


PS We’ll be running our SQL After Dark evening quiz bowl and raffle again – that was a blast when we did it last time!

New SQLskills Pluralsight courses

We’ve released a couple of new Pluralsight courses recently:

The first is the conclusion of Glenn’s DMVs course series, SQL Server 2014 DMV Diagnostic Queries – Part 3, based on the very popular set of DMV queries he publishes every month.

The second course is Jonathan’s long-awaited SQL Server: Change Data Capture course. The short abstract is:

Learn how to capture SQL Server table data changes for downstream consumption using the Change Data Capture feature of SQL Server. This course is applicable to developers, DBAs, and architects on SQL Server 2008 onward.

The module list is:

  • Introduction
  • Configuring Change Data Capture
  • Querying Change Data Using Transact-SQL
  • Administration
  • Performance Tuning and Optimization
  • SQL Server 2012 SSIS Components

Jonathan’s been using CDC quite a bit with clients recently and this course is a distillation of all his knowledge and experience.

Check them both out!

We’ve got a lot more courses in production and planned for the remainder of 2015 around server consolidation, Service Broker (3 courses planned), more stored procedure performance, fragmentation, and more. Watch this space (and the Insider Newsletter) for details.