(New for 2020: we’ve published a range of SQL Server interview candidate screening assessments with our partner Kandio, so you can avoid hiring an ‘expert’ who ends up causing problems. Check them out here.)
There’s a lot of controversy about the Buffer Manager performance object counter Page Life Expectancy – mostly around people continuing to quote 300 as a threshold for starting to worry about there being a problem (which is just utter nonsense these days). That’s far too *low* to be the point at which to start worrying if your PLE dips and stays there. Jonathan came up with a better number to use – based on the size of your buffer pool – see the bottom of his post here.
But that’s not why I’m writing today: I want to explain why most of the time now Page Life Expectancy is really not giving you useful information.
Most new systems today use NUMA, and so the buffer pool is split up and managed per NUMA node, with each NUMA node getting it’s own lazy writer thread, managing it’s own buffer free list, and dealing with node-local memory allocations. Think of each of these as a mini buffer pool.
The Buffer Manager:Page Life Expectancy counter is calculated by adding the PLE of each mini buffer pool and then calculating the mean. But it’s not the arithmetic mean as we’ve all thought forever, it’s the harmonic mean (see Wikipedia here), so the value is lower than the arithmetic mean. (5/11/2015: Thanks to Matt Slocum (b | t) for pointing out a discrepancy from the arithmetic mean on a large NUMA system and making me dig into this more, and my friend Bob Dorr from CSS for digging into the code.)
What does this mean? It means that the overall PLE is not giving you a true sense of what is happening on your machine as one NUMA node could be under memory pressure but the *overall* PLE would only dip slightly. One of my friends who’s a Premier Field Engineer and MCM just had this situation today, which prompted this blog post. The conundrum was how can there be 100+ lazy writes/sec occurring when overall PLE is relatively static – and this was the issue.
For instance, for a machine with 4 NUMA nodes, with PLE of each being 4000, the overall PLE is 4000.
The calculation is: add the reciprocals of (1000 x PLE) for each node, divide that into the number of nodes and then divide by 1000.
In my example, this is 4 / (1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 4000.
Now, if one of them drops to 2200, the overall PLE only drops to: 4 / (1/(1000 x 2200) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 3321.
If you had an alert set watching for a 20% drop in PLE then that wouldn’t fire, even though one of the buffer nodes was under high pressure.
And you have to be careful not to overreact too. If one of them drops to 200, the overall PLE only drops to: 4 / (1/(1000 x 200) + 1/(1000 x 4000) + 1/(1000 x 4000) + 1/(1000 x 4000)) / 1000 = 695, which might make you think that the server is suffering hugely across the board.
On NUMA machines, you need to be looking at the Buffer Node:Page Life Expectancy counters for all NUMA nodes otherwise you’re not getting an accurate view of buffer pool memory pressure and so could be missing or overreacting to performance issues. And adjust Jonathan’s threshold value according to the number of NUMA nodes you have.
You can see the lazywriter activity for each NUMA node by looking for the lazywriter threads in sys.dm_exec_requests.
Hope this helps!
34 thoughts on “Page Life Expectancy isn’t what you think…”
Thanks Paul. I would guess this one should go to myth a day series:)
Do you have any suggestions for a good monitoring metric or two that could encompass this? Maybe the combination of average PLE and the standard deviation per node? That way if a single NUMA node dropped the standard deviation would spike, throwing an alert.
Would the URL below help us accidental DBA’s understand whether we have NUMA and if the hardware has it, whether SQL Server xxxx uses it?
If there’s a better resource to help determine our real-world-NUMA-status……
Must admit, NUMA is new to me. I’m concerned because the DBAs (real ones, purportedly) at Rackspace quoted the magic 300 number a few months ago. We’re (still) on SQL 2005 EE but on a Windows 2008 R2 Enterprise Server with a dual-4-core CPU w/ 36GB RAM config (14-month-old Dell PowerEdge platform). Your post and the one referenced above _seem_ to indicate that we’re not NUMA-enabled/benefitted. Does this still render the magic-300 defunct?
Thanks for raising the topic – helpful as always and much appreciated.
Yes – look at Jonathan’s post that I link to…
Thanks Paul, an amazing insight on PLE on NUMA nodes.
That’s left me wondering that "what else is different on NUMA nodes in regards CPU/Memory/IO" which once should be knowing. May be decicted blog post with some pointers "If you are on NUMA, you should be knowing this….", will surely help DBA’s.
Apologies if this already exits somewhere, which I cannot Google.
Cheers!! (you are my hero)
I haven’t looked at mine for a while, just checked production averages:
node 0 – 262,179
node 1-6 – 411,946
node 7 – 379,091
This server was rebooted Saturday night (10/15) for patching.
Thanks, Jonatham, but being on SQL 2005, the DMV doesn’t exist in master… By running:
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
I get Nodes 0 and 1. Guess I’m non-NUMA.
You can tell if your system uses NUMA or not by querying sys.dm_os_memory_nodes. A non-NUMA system will have 2 nodes listed, 0 and 64 (which is the DAC resource and not a true node). A NUMA system will have additional node entries for each of the hardware NUMA nodes in the system. If you manually configured Soft-NUMA sys.dm_os_nodes would have the independent CPU nodes under Soft-NUMA, but you don’t get the extra lazywriters or memory partitioning per node under Soft-NUMA.
Does this only applies to SQL Servers installed on physical machines or does it applies also to virtual machines working on a physical machine that uses NUMA?
By the way, great article as usual.
Unless you are running ESX 5.0 already, this only applies to physical hardware as there is no concept of virtual NUMA prior to ESX 5.0. With ESX 5.0, you can configure the VM to use virtual NUMA which is how I assume they were able to let a VM be larger than the physical resources in one of the hardware nodes, but I haven’t had time to look at it in depth yet.
I don’t have a 2005 machine immediately available to look at to tell if the DAC used node 64 like it does in 2008 or not. The DAC runs under its own node_id which is 64 in 2008. See if you have a scheduler in dm_os_schedulers that is ONLINE DAC for one of the two nodes, and you will know if either is the DAC node. If it is, you don’t have hardware NUMA. If you have an even number of schedulers that are ONLINE VISIBLE for both node_id’s and you have a separate scheduler for the ONLINE DAC, then you have hardware NUMA unless you or someone else manually configured Soft-NUMA for the system.
I will take that in consideration.
Thanks for the great explanation.Can you please let me know about the value shown by performance dashboard PLE? is this the overall PLE or is this the NUMA wise.
Hi Aditya – it’s the overall PLE.
thanks for your great article from turkey…
Thanks Paul for shedding light on this !
I just wrote a query to see and demonstrate that to a colleague.
Here is the result.
nice article, good explanation. I was running some checks on one of the servers I work with (where we suffer perf. bottlenecks), and the results are quite scary. The PLE value moves between 40 and 110, SQL Server has 96 GB allocated and the box has 64 cores (16 physical). So I assume we are doomed. I have the impression this problem is 99% related to DB design and indexes mis-usage (too many, too big), but my DBA insists the problem is hardware based. I do see some I/0 bottlenecks, mostly in the drives where the monster indexes live. This is a large OLTP system, with some large volumes involved. Should we panic?
Thanks in advance for any response!
Yes, SQL Server is reading 90+GB of data every minute or two – it’s not a hardware problem. You need to figure out why SQL Server is reading all that data.
How to PLE (in each node) is calculated?
That algorithm isn’t public.
Hi, Paul, we have a VM that has 198 GB memory, the PLE is around 2482113, but the query memory grant constantly stays on 20-35, what is wrong ?
result from sys.dm_os_memory_nodes
0 16320 15912 0 3667688 11384 0 0 31 31 0
1 206491696 193831812 0 3667696 239176 0 0 992 992 0
64 2624 2552 0 3667696 2472 0 0 31 31 0
What do you mean by “query memory grant constantly stays on 20-35”?
I am having a sql server with PLE 20-50 all time, It’s a 32 bit SQL 2008 machine having 15 Gb allocated non numa node.
We performed db maintenance tasks, but the performance is the same. The SP’s are getting timed out.
object_name counter_name cntr_value
SQLServer:Buffer Manager Page life expectancy 13
Can you please help me on this
PS: We are having a different machine with same configurations and data but it not having any performance issues.
With PLE that low, you’re thrashing the buffer pool. Look to see what’s doing so many page reads – see http://sqlperformance.com/2014/06/io-subsystem/knee-jerk-waits-pageiolatch-sh
Very nice article and good explanation, really appreciated for sharing. I was just checking PLE for one of my Servers, the findings were bit worrying. The SQL Server Buffer Pool allocated 25GB, the PLE this morning at 09:28:17 was 20913 when checked again at 10:49:34 it was 25798. As of my understanding it is cumulative value so to calculate it to per second, I find the difference between two times in seconds (which is 4877 seconds) and then difference in PLE values between those times which is 4876. It tell me that PLE is 1/second which is very scary. I’m just wondering if my calculation is right or I’m missing something please?
Thanks a lot in advance for your response.
Yes, PLE ticks up once per second when there’s no memory pressure.
Hi Paul, I was looking at this article to see if it might help with with some curious results I’m seeing with a physical instance with 12 buffer nodes where the figures for Buffer Manager and Buffer Node seem very skewed. For Buffer Manager, PLE is 5,059 and for Nodes 0 and 1 it’s 3,913 and 12,106 respectively. I was wondering what might cause this skew.
First off, if you do the harmonic mean calculation using all 12 buffer node PLEs, you’ll get the buffer manager PLE. The skew is because of different amounts of memory pressure in each NUMA node. Very common to see and because of different queries running on the different NUMA nodes.
For a NUMA based system, if i want to just get PLE for the entire server, do i just add PLE for each NUMA node?
You should monitor the PLE for each NUMA node.
Hey Paul, in your example you said 8 Numa nodes but your calculation uses only 4 – which would render the same overall PLE of 4000 either way UNTIL one of your NUMA nodes is suffering. With 8 nodes in play, the overall PLE calc when only one node drops to 2200 comes out to 3628 compared to 3321 with only 4 nodes, so the more NUMA nodes you have will make things seem even better – yet one or two of them could definitely be memory stressed…
Or did I miss something?
Great article, thanks for that!
Oops – yes, 4 is the right number – I’ve corrected the post – thanks!
LOL!!! Good to know! I wasn’t sure if I was just being dopey and missed the boat somewhere!!
Hey Paul was wondering if this is an issue. I ran the queries at the bottom of this post and got a PLE between 600 and 1200 over the course of 2 hours
The total GBs in the buffer pool ranged from 320-400 resulting in a suggested Threshold of around 27k using Jonathan’s formula. I am not the DBA but a consultant that was called in to do development work so have limited experience with SQL Server troubleshooting but am ramping up as quickly as I can.
This is an on-prem box. The 2 biggest db culprits are using about 140GB of buffer pool each. I just wanted to get confirmation from you or one of your readers that this is actually an issue before I go to the client.
SELECT [object_name], [counter_name], [cntr_value] AS Page_Life_Expectency
WHERE [object_name] LIKE ‘%Manager%’ AND [counter_name] = ‘Page life expectancy’
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
(COUNT(*) * 8.0) / 1024 / 1024 / 4 * 300 AS SuggestedThreshold
That low of a PLE is definitely an issue, as the server will be doing a lot of physical reads, slowing down queries. However, those two tables may not be the culprits – you’ll need to examine wait stats (see https://www.sqlskills.com/blogs/paul/updated-sys-dm_os_waiting_tasks-script/) and work out which tables and queries are driving the reads, and then examine the workload to see what really needs to be memory resident. Might be that some queries are doing table scans – all kinds of possible issues.