Paul’s awesome ribs recipe

On Friday I made my signature pork ribs for our good friend Bob Ward (along with Kimberly’s truffled cauliflower mash and caramelized broccoli) and I promised to blog the ribs recipe. I’ve been making ribs like this for about ten years – they’re dry and you have to tear the meat off the ribs with your teeth, rather than the usual American ribs where the meat falls off on its own. I don’t remember which BBQ recipe book I originally got the recipe from but it’s evolved over the years into what’s below.

Let me know if you try it – enjoy!

Ingredients:

  • 1 large rack of baby-back pork ribs
  • 1 full bulb of garlic
  • 1/2 cup soy sauce
  • 1/4 cup cream sherry (e.g. Sheffield)
  • 2 tablespoons honey

Prepare the ribs: wash the rack and cut it into individual ribs. Arrange in a deep rectangular dish (e.g. Pyrex or Le Creuset) on their side – laid out like ( ( ( ( (. Optional: discard weird-shaped or extra pieces of meat.

Prepare the sauce: pour the soy sauce into a Pyrex measuring jug. Pour the sherry in too. Pour the honey into the mixture. It’ll settle at the bottom of the jug. Peel the garlic (vary the amount based on your garlic-loving preference) and use a garlic press (I use this one) to get the ‘meat’ of each garlic clove into the sauce jug. Discard the skin of each garlic clove.

Marinate the ribs: Mix up the sauce with a small fork to get the honey mixed in. Pour the sauce evenly over the ribs. Cover the rib dish with cling wrap and let it sit in the fridge for 4-24 hours.

Prepare to cook: set the oven to broil and set the top oven shelf to be one level down from the very top. Line a deep rectangular baking/broiling pan with aluminum foil. Arrange the ribs on their side on the foil in two rows like ( ( ( ( (.

Broil: broil the ribs as prepared for 5 minutes. Remove from the oven and turn the ribs over in place. Drizzle the remains of the sauce and garlic over each rib using a small spoon. Broil for 4 more minutes. Remove from the oven.

Bake: after broiling the ribs the second time, set the same oven to bake at 325F. Turn the ribs a second time and drizzle any remaining sauce on them. Bake on the bottom shelf for 20 minutes. Increase heat to 350F and bake for 10 more minutes. Increase heat to 375F and bake for 10 more minutes. You want the outside of the ribs to be blackening and caramelizing.

Remove and place in a serving dish. Eat immediately with your hands.

For garlic lovers, get a teaspoon and scoop the concentrated garlic mush and sauce from the bottom of the baking pan – yummmmm!

Extra ribs can be eaten cold over the next few days or reheated by baking for 5-8 minutes at 375F.

Delayed Durability in SQL Server 2014

One of the cool new features in SQL Server 2014 is delayed durability (available in all Editions), which is described in detail in Books Online here.

I think I’m going to see a lot of people turn this on, as you can get a profound increase in transaction throughput with the right workload. However, I also think a lot of people are going to turn this on without realizing the potential for data loss and consciously making that trade-off.
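
For reference, turning this on at the database level is a single ALTER DATABASE; here's a minimal sketch with a hypothetical database name (DISABLED is the default, ALLOWED permits the per-transaction option, and FORCED applies it to all transactions):

-- Hypothetical database name; requires SQL Server 2014 or later.
-- Use ALLOWED to opt in per transaction, or FORCED to apply it to everything.
ALTER DATABASE [SalesDB] SET DELAYED_DURABILITY = FORCED;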

Why can it give a throughput boost?

I put together a contrived workload with a small table where 50 concurrent clients are updating the same rows, and the database log is on a slow I/O subsystem. Here’s a graph showing my test:

[Graph: Transactions/sec and Log Flushes/sec for the test workload, before and after enabling delayed durability]

The obvious change point is where I enabled delayed durability, with all transactions forced to use it. Before the change, the number of Transactions/sec is equal to the number of Log Flushes/sec, as each transaction is holding locks that block all other transactions (I told you it’s a contrived workload). So why the profound jump in Transactions/sec when I forced delayed durability?

Under normal circumstances, when a transaction commits, the commit doesn’t complete until the log block (see this blog post for more details) containing the LOP_COMMIT_XACT log record for the transaction has been flushed to disk and the write is acknowledged back to SQL Server as having completed, providing the durability of the transaction (the D in the ACID properties of the transaction). The transaction’s locks cannot be dropped until the log flush completes.

In my workload, all the other transactions are waiting for the one that is committing, as they all need the same locks, so Transactions/sec is tied to Log Flushes/sec in this case.
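
To make that concrete, each of the 50 clients was running something along these lines; this is just a sketch of the shape of the workload, with made-up table and column names rather than my actual test harness:

-- Every client loops over the same row, so each transaction's exclusive
-- row lock blocks all the others until that commit's log flush completes.
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;
    UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterID = 1;
    COMMIT TRANSACTION;
END;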

With delayed durability, the transaction commit proceeds without the log block flush occurring – hence the act of making the transaction durable is delayed. Under delayed durability, log blocks are only flushed to disk when they reach their maximum size of 60KB, or roughly every 1ms, whichever comes first. This means that transactions commit a lot faster, hold their locks for less time, and so Transactions/sec increases greatly (for this workload). You can also see that the Log Flushes/sec decreased greatly as well, as previously it was flushing lots of tiny log blocks and then changed to only flush maximum-sized log blocks.

Note:

  • I was forcing all transactions to be delayed durable, but the facility exists to make the delayed durability choice per transaction too (see Books Online for more details, and the sketch after this list).
  • There’s a bit more to the log block flushing too: under delayed durability, a log block will flush when it fills up, or if a non-delayed durable transaction commits, or if the new sp_flush_log proc is executed, or after 1ms.
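
Here's what the per-transaction flavor looks like: a sketch assuming the database is set to DELAYED_DURABILITY = ALLOWED, using the same hypothetical table as above:

BEGIN TRANSACTION;
UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterID = 1;
-- Only this commit is delayed durable; the option is honored when the
-- database is set to ALLOWED.
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);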

My good friend Aaron Bertrand over at SQL Sentry has a long post about delayed durability that looks into its performance implications in a little bit more depth so I recommend you check out his post as well.

So this looks great, for the right type of workload. But I bet you’re thinking:

What’s the catch?

Your transactions aren’t durable when they commit. Simple.

Now you may be thinking that if the system crashes, the most you’re going to lose is up to 60KB of transaction log. Wrong. If that last log block contains the LOP_COMMIT_XACT log record for a long-running transaction, and that log block isn’t on disk when the system crashes, that whole transaction will roll back during crash recovery. So the potential for work/data loss is greater than just 60KB.

And there’s more:

  • Log backups will not back up that unflushed log block, as it’s not on disk, so non-durable committed transactions may not be contained within a log backup.
  • Non-durable transactions that have committed are not protected by synchronous database mirroring or a synchronous availability group either, as these rely on log block flushes (and transmission to the mirror/replica).

For critical transactions, sp_flush_log can be used, or per-transaction delayed durability can be used instead.
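
For example, running the new proc right after a critical transaction commits flushes the current log block to disk, making everything committed up to that point durable:

-- Force the current log block to disk so that all delayed durable
-- transactions that have already committed become durable.
EXEC sys.sp_flush_log;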

So the million-dollar question is:

Should I enable delayed durability?

It depends. Is your business comfortable making the throughput vs. durability trade-off? Does enabling it give a throughput boost? If yes to both, go ahead. If no to either, don’t enable it. That’s a very simplistic way of thinking about it, but that’s what it boils down to really.

There are lots of other things you can do to increase the throughput and performance of the transaction log, which I explained in a previous blog post series.

As I stated above though, I think a lot of people are going to be wowed by the throughput boost from delayed durability (if their workload benefits) and see this as a no-brainer, without considering the potential for data loss.

Tempting, isn’t it?

Bug: Error: 3449 and server restart during DBCC CHECKDB

This is a quick post to let you know about a bug that a few people are hitting when running DBCC CHECKDB. The symptoms are a series of errors in the error log plus SQL Server forcibly shuts itself down and restarts. I’ve heard of people hitting the bug on SQL Server 2014 and SQL Server 2012 SP1.

[Update 2/24/15] Microsoft has confirmed that it’s a bug in 2014 and 2012 and they’re planning a fix for the next CU of both.

[Update 3/18/15] The fix for SQL Server 2012 is in CU5 of SP2 and is described in KB 3044958.

The error log symptoms look something like I show below:

2014-11-18 09:04:15.69 spid64      The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x00001EE6FD8000 in file 'C:\SQLskills\\Company_file2.ndf:MSSQL_DBCC23'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
2014-11-18 09:04:15.72 spid64      Error: 3314, Severity: 17, State: 3.
2014-11-18 09:04:15.72 spid64      During undoing of a logged operation in database 'Company', an error occurred at log record ID (887654:3321:14). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
2014-11-18 09:04:34.38 spid64      Error: 831, Severity: 20, State: 1.
2014-11-18 09:04:34.38 spid64      Unable to deallocate a kept page.
2014-11-18 09:04:34.40 spid64      Error: 3449, Severity: 21, State: 1.
2014-11-18 09:04:34.40 spid64      SQL Server must shut down in order to recover a database (database ID 23). The database is either a user database that could not be shut down or a system database. Restart SQL Server. If the database fails to recover after another startup, repair or restore the database.

The 665 error comes from the snapshot file that DBCC CHECKDB creates hitting an NTFS limitation on the number of file fragments in a sparse file. This causes the snapshot creation to fail. The failure causes the undo of a log record in the snapshot to fail (remember that a database snapshot undergoes crash recovery to make it transactionally consistent). This failure then leads to SQL Server thinking it has to forcibly restart to recover the snapshot database, which it should never do for a snapshot – and that’s the bug.

I’ll update this post as soon as I hear about the builds that the fix is in.

This is a rare bug to hit, but it’s a regression (judging from the builds people are reporting), and you can help yourself to avoid it by:

  • Creating your own database snapshot, on a volume without file-system free space fragmentation, and running DBCC CHECKDB against the snapshot (see the sketch after this list)
  • OR, trying to run DBCC CHECKDB when there isn’t a significant amount of change occurring in the database, so the database snapshot doesn’t have to become very large
  • OR, using the backup-copy-restore-check method of running DBCC CHECKDB on a restored backup of the database on another server
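
For the first option, the steps look something like this; it's a sketch only, with hypothetical logical file and path names (a snapshot needs one sparse file for each data file in the source database):

-- Create your own snapshot on a volume with little free-space fragmentation.
CREATE DATABASE [Company_CheckSnap]
ON (NAME = N'Company_Data', FILENAME = N'E:\CheckSnapshots\Company_Data.ss')
AS SNAPSHOT OF [Company];
GO

-- Run the consistency checks against the snapshot instead of the live database.
DBCC CHECKDB (N'Company_CheckSnap') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Clean up afterwards.
DROP DATABASE [Company_CheckSnap];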

Thanks