Page Life Expectancy isn’t what you think…

Updated 5/11/2015…

There’s a lot of controversy about the Buffer Manager performance object counter Page Life Expectancy – mostly around people continuing to quote 300 as a threshold for starting to worry about there being a problem (which is just utter nonsense these days). That’s far too *low* to be the point at which to start worrying if your PLE dips and stays there. Jonathan came up with a better number to use – based on the size of your buffer pool – see the bottom of his post here.

But that’s not why I’m writing today: I want to explain why most of the time now Page Life Expectancy is really not giving you useful information.

Most new systems today use NUMA, and so the buffer pool is split up and managed per NUMA node, with each NUMA node getting it’s own lazy writer thread, managing it’s own buffer free list, and dealing with node-local memory allocations. Think of each of these as a mini buffer pool.

The Buffer Manager:Page Life Expectancy counter is calculated by adding the PLE of each mini buffer pool and then calculating the mean. But it’s not the arithmetic mean as we’ve all thought forever, it’s the harmonic mean (see Wikipedia here), so the value is lower than the arithmetic mean. (5/11/2015: Thanks to Matt Slocum (b | t) for pointing out a discrepancy from the arithmetic mean on a large NUMA system and making me dig into this more, and my friend Bob Dorr from CSS for digging into the code.)

What does this mean? It means that the overall PLE is not giving you a true sense of what is happening on your machine as one NUMA node could be under memory pressure but the *overall* PLE would only dip slightly. One of my friends who’s a Premier Field Engineer and MCM just had this situation today, which prompted this blog post. The conundrum was how can there be 100+ lazy writes/sec occurring when overall PLE is relatively static – and this was the issue.

For instance, for a machine with 4 NUMA nodes, with PLE of each being 4000, the overall PLE is 4000.

The calculation is: add the reciprocals of (1000 x PLE) for each node, divide that into the number of nodes and then divide by 1000.

In my example, this is 4 / (1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 4000.

Now, if one of them drops to 2200, the overall PLE only drops to: 4 / (1/(1000 x 2200) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 3321.

If you had an alert set watching for a 20% drop in PLE then that wouldn’t fire, even though one of the buffer nodes was under high pressure.

And you have to be careful not to overreact too. If one of them drops to 200, the overall PLE only drops to: 4 / (1/(1000 x 200) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 695, which might make you think that the server is suffering hugely across the board.

On NUMA machines, you need to be looking at the Buffer Node:Page Life Expectancy counters for all NUMA nodes otherwise you’re not getting an accurate view of buffer pool memory pressure and so could be missing or overreacting to performance issues. And adjust Jonathan’s threshold value according to the number of NUMA nodes you have.

You can see the lazywriter activity for each NUMA node by looking for the lazywriter threads in sys.dm_exec_requests.

Hope this helps!

SQL Connections Spring 2012: Call for Abstracts

Yes, it's that time again. We're just about to have the Fall show so its time to start planning for the Spring 2012 show. Once again Kimberly and I will be Co-Chairs of the Conference and we'd like to invite you to submit abstracts for the Spring 2012 SQL Connections conference, to be held in Las Vegas, March 26th – 29th, 2012. Pre-cons will be March 25th, post-cons on March 30th.

The conference will focus heavily on SQL Server 2012. Abstracts are still welcome on best practices for SQL Server 2005 – 2008 R2 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008/2008R2. However, we will consider all topics – from design to performance to troubleshooting to architectures to programming. You should aim for the intermediate to advanced audience for any SQL Server 2005-2008 R2 materials and for SQL Server 2012 we'll consider any level – especially depending on the newness of the release.

The themes to consider for your abstracts are:

  • Design & Architecture
  • Performance
  • Troubleshooting
  • Monitoring
  • Best Practices
  • Myths and Misconceptions
  • Understanding/Introduction To…

For the session level, use the following examples as a guideline:

  • 100 – Beginner (e.g. what does 'corruption' mean?
  • 200 – Intermediate (e.g. what do I do when corruption is detected?)
  • 300 – Advanced (e.g. how do I do take advantage of partial database availability and online piecemeal restore?)
  • 400 – Master (e.g. how can I fix broken system tables using the DAC and server single-user mode?)
  • 500 – SQL Server Internals (e.g. how does the read-ahead in DBCC CHECKDB differ from regular adaptive range-scan read-ahead?)

For submitting session abstracts, please use this URL: http://www.deeptraining.com/devconnections/abstracts

The tool will be open from now until to November 1st, after which we won't accept any abstracts – no exceptions. If you have used this site before and have forgotten your password, you can have your password emailed to you. It's better to do this then to create a new account.

Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. Please do not submit abstracts for sessions that you have previously presented at the Spring 2011 show. We must accept a minimum of three of your abstracts for you to be considered as one of the speakers – there are 7 open speaker slots, not including Kimberly and myself.

What you will get if selected:

  • $500 per conference talk. (Additional compensation for pre/post conference workshops.)
  • Coach airfare and hotel stay paid by the conference
  • Free admission to all of the co-located conferences
  • Speaker party
  • The adoration of attendees
  • etc. 

Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny *new* abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

(Comments are disabled for this post – send email with questions.)

Spring 2012 classes in Tampa, FL open for registration

We’ve started to confirm our 2012 classes, and we’re coming to the south-east of the US by popular demand in Spring 2012! We’re teaching three 5-day classes back-to-back in Tampa, FL so not only can you get your learning on, you can also escape the winter weather and get your shorts and sunglasses on too!

We don’t allow anyone else to license and teach our material so you’re guaranteed the best instruction possible – unbeatable ROI for your training budget!

All classes are 5 days from 8.30 to 5.30 and include catered breakfast and lunch every day. The early bird price for each class is US$2,995.

We hope to see you there!

Here are the details:

Tampa, FL, February 27-March 2, 2012

IE1: Immersion Event on Internals and Performance

  • Data Storage Internals, Designing for Performance, and Indexing for Performance 
  • Instructors: Paul S. Randal, Kimberly L. Tripp

Tampa, FL, March 5-9, 2012

IE2: Immersion Event on Performance Tuning

  • IO Subsystems, Workload Analysis, and Performance Tuning Methodologies 
  • Instructors: Paul S. Randal, Kimberly L. Tripp, Jonathan Kehayias, Joe Sack

Tampa, FL, March 12-16, 2012

IE3: Immersion Event on High Availability and Disaster Recovery

  • HA/DR strategy, replication, clustering, mirroring, virtualization, consolidation, corruption 
  • Instructors: Paul S. Randal, Kimberly L. Tripp, Jonathan Kehayias, Joe Sack