Remote DB maintenance auditing promotion

Okay, this post is describing some new services we provide (we are a consulting firm after all :-), but there's a promotion at the bottom where you can save $$$ on one of the new services.

We're starting to offer a set of standalone auditing services that we can perform remotely, greatly reducing the cost of having someone come on-site to evaluate your SQL Server environment. The new standalone audit services we are providing are:

  • Database maintenance and operations audit of your SQL Server environment
  • Disaster recovery and high-availability audit of your SQL Server environment

The remote audits take the form of us giving you some non-intrusive scripts to run on each server, plus a detailed questionnaire to fill in. We analyze the results and send you a list of recommendations, with supporting explanations and links to deeper information.

As part of a regular remote consulting engagement, we'd also be engaging with the DBA team through web meetings, phone calls, and email conversations to undertake performance tuning, design work, and so on. However, many people just want to buy a small, limited block of time to run a quick audit and get either the all-clear or a set of changes to implement to improve operations – this is where the standalone audits come in.

The standalone audit services we offer take a much shorter time (depending on environment size) than a regular consulting engagement, as they don't involve web meetings and in-depth interaction with the DBA team, so they're a lot more cost-effective when you're on a tight budget. Of course, we also provide remote performance tuning, available in small blocks of time rather than an open-ended engagement, as well as the traditional 'figure out all sorts of problems' consulting engagements.

These audits are usually billed at our regular offsite consulting rate, but as an initial promotion I'll perform a database maintenance audit for $500 to the first 10 new customers who sign up before the end of January, limited to 4 hours of my time (a 50% saving over our regular rates).

Shoot me an email (through the contact button at the top of the page) if you're interested in any of these services and/or want to be one of the lucky few to take part in the promotion.

We look forward to working with you!

Interesting case of watching log file growth during a perf test

I'm running some performance tests on the hardware we have (more details on the first of these tomorrow) and I was surprised to see some explosive transaction log growth while running in the SIMPLE recovery model with single row insert transactions!

Without spoiling tomorrow's thunder too much, I've got a setup with varying numbers of connections populating a 1TB table with default values, with the single data and log files on the same 8-drive RAID-10 array (again, more details on why tomorrow). I was looking at perfmon while the test was running with 128 concurrent connections, and noticed some weird-looking behavior – basically what looked like a never-ending checkpoint occurring. So I dug in deeper and discovered the transaction log had grown by over 7GB since the start of the test. Huh?
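If you want to put a number on this kind of growth yourself, SQL Server can report log file size and usage directly. A quick sketch (the database name here is hypothetical, not from my test setup):

```sql
-- Report log size (MB) and space used (%) for every database
DBCC SQLPERF (LOGSPACE);

-- On newer versions of SQL Server, a DMV gives the same info
-- for the current database
USE PerfTestDB;  -- hypothetical test database name
SELECT
    total_log_size_in_bytes / 1048576.0 AS TotalLogMB,
    used_log_space_in_bytes / 1048576.0 AS UsedLogMB,
    used_log_space_in_percent           AS UsedLogPercent
FROM sys.dm_db_log_space_usage;
```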

I restarted the test from scratch, with a 1TB data file and a 256MB log and watched what happened in perfmon; here's the screenshot of the first 90 seconds:

[Perfmon screenshot: log growths, pages allocated/sec, disk write bytes/sec, and checkpoint pages/sec over the first 90 seconds of the test]
This is *so* interesting. The black line is the number of log growths, so you can see the log grows every time the line level goes up. The bright blue line is the number of pages being allocated per second to hold all the table rows my 128 connections are inserting. You can clearly see that every time there's a log growth, the allocations take a nose-dive – because no transactions can commit while the new portion of the log is being zeroed (remember that instant file initialization does not apply to the log – see this blog post). The green line (disk write bytes per second) spikes when the log grows because of the zeroes being written out to the log file by SQL Server (remember that SQL Server does the zeroing).

But why is the log growing in the first place? I'm in the SIMPLE recovery model and doing single-row implicit transactions (yes, I'm deliberately doing this) so there shouldn't be anything stopping the log from clearing during a checkpoint, right?
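A quick way to check whether anything is actively preventing log clearing is the log_reuse_wait_desc column in sys.databases – a value of NOTHING means nothing was holding the log up as of the last clearing attempt (database name hypothetical):

```sql
-- Why can't the log clear? NOTHING means nothing is holding it up
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'PerfTestDB';  -- hypothetical test database name
```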

Wrong. The log starts out small (256MB), so one of the thresholds for triggering a checkpoint gets hit pretty fast (70% of the log file is used). So a checkpoint occurs (you can see the checkpoints occurring when the pink line at the bottom of the perfmon screen is above zero) and starts writing out to the data file, which is on the same disk as the log (see How do checkpoints work and what gets logged for an explanation of checkpoint mechanics). But it can't write fast enough (because of disk contention) to get to the log-clearing part before the log fills up completely and has to grow (because transactions are continuing at break-neck speed). So the log grows, and the insert transactions stop while the new portion is zeroed. And then the log starts to fill up again very quickly, another checkpoint is triggered, and so on and so on.
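The individual growth events – and how long each one took, which is essentially the zeroing time – can be pulled from the default trace, which captures Log File Auto Grow as event class 93. A sketch:

```sql
-- Log auto-grow events from the default trace (event class 93)
DECLARE @TracePath NVARCHAR (260);
SELECT @TracePath = [path] FROM sys.traces WHERE is_default = 1;

SELECT
    DatabaseName,
    [FileName],
    StartTime,
    Duration / 1000        AS DurationMs,   -- Duration is in microseconds
    IntegerData * 8 / 1024 AS GrowthMB      -- IntegerData is 8KB pages
FROM sys.fn_trace_gettable (@TracePath, DEFAULT)
WHERE EventClass = 93
ORDER BY StartTime;
```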

Eventually a steady state is reached where there's enough free log space during a checkpoint that no new log is required for the concurrent transactions to commit. You might ask why the checkpoint is so slow in the first place. Because I deliberately put the data file on the same RAID array as the log file, and both are being steadily hammered with writes – classic disk contention. Even though the RAID array is RAID-10 with 8 x 300GB 15k SCSI drives, the average disk queue write length is over 20 most of the time during the 128-way test because I'm simply trying to do too much.
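That contention also shows up in SQL Server's own per-file I/O statistics – high average write stalls on both the data and log files of the same database point the finger at the shared array. A sketch of how you could check (the interpretation threshold is a judgment call, not a rule):

```sql
-- Per-file write stalls since instance startup; high stalls on
-- both data and log files suggest disk contention
SELECT
    DB_NAME (vfs.database_id) AS DatabaseName,
    mf.name                   AS LogicalFileName,
    vfs.num_of_writes,
    vfs.io_stall_write_ms,
    vfs.io_stall_write_ms / NULLIF (vfs.num_of_writes, 0) AS AvgWriteStallMs
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
    AND mf.file_id = vfs.file_id
ORDER BY AvgWriteStallMs DESC;
```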

The point of my blog post? Just because you don't have any of the classic causes of transaction log growth going on, doesn't mean you're going to be immune. In this case my (deliberate) poor physical layout of the database files and workload growth up to 128 concurrent connections caused the log to grow. What started out working when I was running 16 connections didn't work any more at 128 (actually I went back and re-ran some of the earlier tests and even with only 64 connections, the log grew to over 1GB before reaching steady-state).
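If you do have to run a workload like this, one mitigation is to pre-size the log so all the growing (and zeroing) happens once, up front, rather than repeatedly mid-workload. Something like this, with a hypothetical database and file name and an illustrative size:

```sql
-- Grow the log once, up front, to a size the workload won't exceed;
-- the zeroing cost is paid here instead of during the test
ALTER DATABASE PerfTestDB
MODIFY FILE (NAME = N'PerfTestDB_log', SIZE = 8192MB);
```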

Interesting eh?

Tomorrow I'll be continuing the perf/benchmarking series by creating my first benchmark and then tweaking the setup to see how I can improve performance (for instance with multiple data files, separation of log and data files – all the things I preach but have never *demonstrated*), but this behavior merited a post all on its own.

New blog on Paul’s Electronics

At the start of the year I decided to get seriously back into playing with electronics after a long hiatus. I agonized about where to blog about all this stuff (as I like to blog a *lot*, as you all well know) and I initially thought it would be cool to have everything intermingled on this blog. Then I started to play and realized that I have a *ton* of stuff I want to blog about around electronics and microcontrollers – photos, videos, code – and that it would really interfere with the purpose of this blog – disseminating SQL Server info. Couple that with the fact that this blog is picked up by lots of SQL Server blog aggregators, and I started to worry that people would get a little miffed by *lots* of non-SQL stuff coming from this blog.

Soooo I decided to create a completely new blog on the site today, with a new 'theme'. I now know lots about CSS style sheets, ASP.NET, IIS, and BlogEngine.Net – probably way more than I really wanted to. But I have a nice shiny new blog dedicated to my electronics adventures. I've deleted the two posts since 1/1/10 about electronics and re-posted on the new blog (sorry to the 3 people who'd commented!). There's a link to the new blog at the top-left of this one – feel free to subscribe, drop in from time to time, or ignore it completely :-) 

You can get to the new blog at Paul's Electronics (in homage to the name of the electronics kit my Dad made me when I was 10).

Normal service will be resumed here very shortly…