There’s a lot of controversy about the Buffer Manager performance object counter Page Life Expectancy – mostly around people continuing to quote 300 as a threshold for starting to worry about there being a problem (which is just utter nonsense these days). That’s far too *low* to be the point at which to start worrying if your PLE dips and stays there. Jonathan came up with a better number to use – based on the size of your buffer pool – see the bottom of his post here.
But that’s not why I’m writing today: I want to explain why most of the time now Page Life Expectancy is really not giving you useful information.
Most new systems today use NUMA, and so the buffer pool is split up and managed per NUMA node, with each NUMA node getting it’s own lazy writer thread, managing it’s own buffer free list, and dealing with node-local memory allocations. Think of each of these as a mini buffer pool.
The Buffer Manager:Page Life Expectancy counter is calculated by adding the PLE of each mini buffer pool and then getting the average.
What does this mean? It means that the overall PLE is not giving you a true sense of what is happening on your machine as one NUMA node could be under memory pressure but the *overall* PLE would only dip slightly. One of my friends who’s a Premier Field Engineer and MCM just had this situation today, which prompted this blog post. The conundrum was how can there be 100+ lazy writes/sec occurring when overall PLE is relatively static – and this was the issue.
For instance, for a machine with 8 NUMA nodes, with PLE of each being 4000, the overall PLE is 4000. If one of them drops to 1000, the overall PLE only drops to 3625, which likely wouldn’t trigger your alerting as it hasn’t even dropped 10%.
On NUMA machines, you need to be looking at the Buffer Node:Page Life Expectancy counters for all NUMA nodes otherwise you’re not getting an accurate view of buffer pool memory pressure and so could be missing performance issues. And adjust Jonathan’s threshold value according to the number of NUMA nodes you have.
You can see the lazywriter activity for each NUMA node by looking for the lazywriter threads in sys.dm_exec_requests.
Hope this helps!
16 Responses to Page Life Expectancy isn’t what you think…
Thanks Paul. I would guess this one should go to myth a day series:)
Do you have any suggestions for a good monitoring metric or two that could encompass this? Maybe the combination of average PLE and the standard deviation per node? That way if a single NUMA node dropped the standard deviation would spike, throwing an alert.
Yes – look at Jonathan’s post that I link to…
Would the URL below help us accidental DBA’s understand whether we have NUMA and if the hardware has it, whether SQL Server xxxx uses it?
http://blogs.technet.com/b/beatrice/archive/2008/04/21/to-numa-or-not-to-numa.aspx
If there’s a better resource to help determine our real-world-NUMA-status……
Must admit, NUMA is new to me. I’m concerned because the DBAs (real ones, purportedly) at Rackspace quoted the magic 300 number a few months ago. We’re (still) on SQL 2005 EE but on a Windows 2008 R2 Enterprise Server with a dual-4-core CPU w/ 36GB RAM config (14-month-old Dell PowerEdge platform). Your post and the one referenced above _seem_ to indicate that we’re not NUMA-enabled/benefitted. Does this still render the magic-300 defunct?
Thanks for raising the topic – helpful as always and much appreciated.
Thanks Paul, an amazing insight on PLE on NUMA nodes.
That’s left me wondering that "what else is different on NUMA nodes in regards CPU/Memory/IO" which once should be knowing. May be decicted blog post with some pointers "If you are on NUMA, you should be knowing this….", will surely help DBA’s.
Apologies if this already exits somewhere, which I cannot Google.
Cheers!! (you are my hero)
VarunD
I haven’t looked at mine for a while, just checked production averages:
node 0 – 262,179
node 1-6 – 411,946
node 7 – 379,091
This server was rebooted Saturday night (10/15) for patching.
Thanks, Jonatham, but being on SQL 2005, the DMV doesn’t exist in master… By running:
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
I get Nodes 0 and 1. Guess I’m non-NUMA.
Stephen,
You can tell if your system uses NUMA or not by querying sys.dm_os_memory_nodes. A non-NUMA system will have 2 nodes listed, 0 and 64 (which is the DAC resource and not a true node). A NUMA system will have additional node entries for each of the hardware NUMA nodes in the system. If you manually configured Soft-NUMA sys.dm_os_nodes would have the independent CPU nodes under Soft-NUMA, but you don’t get the extra lazywriters or memory partitioning per node under Soft-NUMA.
Does this only applies to SQL Servers installed on physical machines or does it applies also to virtual machines working on a physical machine that uses NUMA?
By the way, great article as usual.
Vitor,
Unless you are running ESX 5.0 already, this only applies to physical hardware as there is no concept of virtual NUMA prior to ESX 5.0. With ESX 5.0, you can configure the VM to use virtual NUMA which is how I assume they were able to let a VM be larger than the physical resources in one of the hardware nodes, but I haven’t had time to look at it in depth yet.
Stephen,
I don’t have a 2005 machine immediately available to look at to tell if the DAC used node 64 like it does in 2008 or not. The DAC runs under its own node_id which is 64 in 2008. See if you have a scheduler in dm_os_schedulers that is ONLINE DAC for one of the two nodes, and you will know if either is the DAC node. If it is, you don’t have hardware NUMA. If you have an even number of schedulers that are ONLINE VISIBLE for both node_id’s and you have a separate scheduler for the ONLINE DAC, then you have hardware NUMA unless you or someone else manually configured Soft-NUMA for the system.
Cheers,
Jonathan
Thanks Jonathan,
I will take that in consideration.
Hello Paul,
Thanks for the great explanation.Can you please let me know about the value shown by performance dashboard PLE? is this the overall PLE or is this the NUMA wise.
Hi Aditya – it’s the overall PLE.
hi paul,
thanks for your great article from turkey…
[...] subset thereof) will be table scans, which can put pressure on the buffer pool (see my post on Page Life Expectancy) and lead to contention on the ACCESS_METHODS_DATASET_PARENT latch (all manifesting as a high [...]
Thanks Paul for shedding light on this !
I just wrote a query to see and demonstrate that to a colleague.
Here is the result.