The pros and cons of trace flags

If I told you about an undocumented mod you could make to your car's brakes to make them more efficient, would you try it? What if a random mechanic told you? What if someone who's an expert in your make and model of car told you? What if the car manufacturer asked you to try it to solve a brake problem – and just drive around a bit to see what happens and see if it stops the car crashing again?

:-)

A month ago I kicked off a survey about your comfort level with using trace flags in production and now it's time to look at the results.

 

The Other values are:

  • 15 x "It depends!!!!"
  • 7 x "We use them where specifically needed for short amounts of time example – the deadlock trace flags (1204 and 1222) to dump the details to the error log. in some cases as needed we will use startup trace flags – example 845 on 2005 x64 standard to allow."
  • 6 x "If Paul says they're okay."
  • 4 x "We use Trace flags which well documented by MS *and* reputable sources and tested in a controlled manner to ensure continued stability."
  • 3 x "Deadlock full details – because deadlocks are rare, I need full details every time because lightening doesn't strike twice."
  • 3 x "I feel comfortable, each trace does different things, if we need it we'll use it."
  • 3 x "I rarely if ever use a trace flag in production but I'm comfortable using one if it's documented by a reputable source and it makes sense for us."
  • 2 x "Enable it if css asks, or it its documented in ms official documentation but test before implementing."
  • "Either from MS source or from a non-MS but reputable source."
  • "If I hear/read about it from a non-MS, but reputable source, to fix a bug or boost performance."
  • "If some doc is evident and MS knows what the TF actually does, instead of their telling you to turn it on without realizing it affects more than they thought (ex/ TF2330 stops all index usage stats gathering)."
  • "Only after testing in a dev/test environment first."
  • "Will use them on need basis – for example – deadlock trace 1222 to log the deadlock info in the error log."

Trace flags are interesting things. They change the behavior of SQL Server in a huge variety of sometimes useful, sometimes dangerous, and sometimes bizarre ways. There are many hundreds of trace flags, most of which are undocumented – and usually for the very good reasons that they're only supposed to be used for testing, for debugging problems with Product Support, or to enable specific bug fixes that may be detrimental to others.

I never advise turning on a trace flag (either when teaching a class or discussing with a client) unless I'm rock-solid 100% sure about what it does and what the effects are. You should do the same when considering trace flag use on a production system.

The example I like to use is trace flag 661 that disables the ghost cleanup background task. It's documented in KB 920093 and I've blogged about it here (and in a few other posts). It's a very useful trace flag which is safe to use, is documented by MS, is referenced by trustworthy sources (well, me :-) and can give a performance boost under some circumstances. However, it can have horrible side-effects if you don't understand the ramifications of turning it on – deleted data and index records stay around until the next index rebuild/reorganize, leading to page splits and data file growth.

Where trace flags are concerned, you need to test and validate before enabling in production – no matter *who* you hear about it from.

Be careful out there!

9 thoughts on “The pros and cons of trace flags

  1. Paul,

    Thank you for posting the results of this survey.

    Based on the singleton choices, I had to choose "Other" as my answer since I apply several factors to my potentially using a Trace Flag.

    For me, there has to be enough documentation on the MS website (BOL, whitepapers, blogs from teams and individuals, and forum answers) to understand the positive and negative affects of enabling a Trace Flag; including any potential interactions with other Trace Flags. I do not accept MS whitepapers by themselves as we have seen individual whitepaper are not always trustworthy — I stand-up the latest SharePoint whitepaper as evidence, though it is not the first offender and not the last).

    Not wanting to trust the Microsoft website as my sole source of information, I also need input from external sources I deem trustworthy; which includes the SQLskills website.

    Finally, I have to try the functionality and feel comfortable with it.

  2. And yet this is how the Linux and many other CS movements are based: random advice from someone else. I’d like to think most people perform a "sniff test" to see if something makes sense, but many do not.

    So here’s the followup question: how cane we be 100% of what something does?

    MS documentation is poor, in many cases, and even trusting you, Paul, means going to an unofficial source. I do tend to value your insight into the product since you used to write the code, but what about things you don’t write about? MCMs, and many MVPs, give advice, which is generally good, but sometimes wrong. Or incomplete, or lacks details on the implications.

    I would really like to see some thoughts from you in another blog post on ways, or even sources, that are good for picking and choosing those flags that are safe (or unsafe) for production systems.

  3. Hi Paul,

    Thanks for the insight on this issue. Can you tell me if you feel that -T272 is a safe/well documented trace flag? I have seen some posts on blogs by Microsoft staff indicating that the use of this flag is very specific to the issue it addresses, and that it is safe. But I don’t know if that is enough.

    Also wondering if you can point to some M$ documentation on this particular flag.

    Many thanks in advance!
    -Jeremy

  4. Hi Paul
    What is your opinion of -T1800? I have a case for using it but I don’t have anywhere to do a proper test to be confident it won’t delay start up of the instance or cause other issues. We have production AG on 512 physical sectors and are restoring the backups to 4k VMs. My case is not the documented AG typical scenario that is reported a lot but the logic still seems to apply. Unfortunately I don’t have another 512 machine to be able to test it on.

    Thanks in advance

    Angeline

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.