The frustration of sweeping generalizations – follow on from Search Engine Q&A #12

A bit more traffic on the thread (see previous post here) prompted me to give my thoughts on the many sweeping generalizations that plague the computer industry and make it difficult sometimes to give advice in forums and blogs. I’d like to repost here (with a few tweaks for clarity).

Some examples of questions that breed sweeping generalizations:

  • Should you have clustered indexes on all tables? The well-known clustered-index debate as Kimberly likes to call it.
  • Should you rebuild or reorganize indexes to remove fragmentation?
  • Which high-availabilty solution should you use?

The problem – as with most advice – is that it’s extremely hard to make generalizations. This is both because:

  1. without lots of evidence many people (quite rightly) don’t believe sweeping generalizations as they may have been bitten by one in the past
  2. nearly every situation is different so many generalizations are useless

What I’d love to see, (and I tried to do this when at MS, and like to think I do it here or when teaching classes or conferences) is for people to provide the justification for generalizations, plus some idea of the exceptions and the circumstances under which they arise.

As for this case (whether to create multiple files because there are multiple cores/CPUs), I think we’ve about done this one to death. The sweeping generalizations here are:

  1. for non-tempdb you usually don’t need multiple files, unless you have a very high-end workload of the specific nature I described in my first post (rare)
  2. for tempdb you usually do, as long as your workload merits it on a multi-core/cpu box
  3. IO vendors may recommend it for increased IO throughput *on their specific hardware*
  4. there exist sweeping generalizations from various sources that dispute all of the above

Unfortunately, you’re not going to get a definitive, authoritative answer to a design/strategy question such as this and you’ll continue to find contradictions to anything anyone says on the forums, and even MS contradicting itself (sigh).

What I would suggest is the following:
1) go with the majority opinion of responses to questions asked, based on the respondents collective experience with many customers, databases, and workloads
2) do your own testing, on your own hardware, with your own workload and see what works for you (but beware that testing in a vacuum can prove or disprove anything you want – which is why you see so many contradictory statements)

One last thing on MS – it’s a very big company, with lots of groups. Anyone can sponsor a whitepaper, write a blog post/MSDN article/technet article and publish it, or reply on a forum as a visible MS person and it has the ‘official stamp’ of coming from MS. When I was in the product group I was continually dismayed by the misinformation, bad advice, contradictions, and baseless assertions that I saw coming from MS employees who were just trying to be helpful.

Once something’s published on the internet, it’s *incredibly* hard to undo the damage done. There’s a fundamental element of mistrust sometimes on forums and newsgroups which can be wearying when you’re trying to help people out. It can be very hard to convince people that someone else’s advice isn’t the best to follow – I remember several times arguing with people about how CHECKDB works or what a corruption error message means and finally having to resort to ‘I wrote that code – I’m afraid you *are* wrong’ – which I really hate doing.

Anyway – rant over :-)

6 thoughts on “The frustration of sweeping generalizations – follow on from Search Engine Q&A #12

  1. Well, with such endorsement I’m with you!

    No, seriously though, it is hard to know what or who to follow. In my work as an independent DBA I try and keep up with it all, reading books, talking with ‘experts’, taking selective training (with people like your good lady, Kimberly), following forums, specific blogs, etc. I acknowledge that some of my skills and knowledge are from ‘riding on the shoulders of giants.’ This is because I don’t have the time or resources at my disposal to try out every test or have personal access to such giants (tho I’ve just acquired an nice PE2900 to start some experimentation/testing of my own). So I rely in-part on guys like you. I have a number of people I track, and whilst not taking their word as gospel, their work and advice holds strong credence with me in my work and when looking for solutions to possible problems.

    Keep up the great work. And thanks.


  2. Hi Paul,

    You posted this article on my birthday, but I’m reading it only today.

    I’m from a country (Brazil) where English isn’t our mother language, so reading this kind of article I’ve been learning about technology and my English skills is getting better, I hope so…lol

    Some years ago I’ve decided to follow the best of the best, and be sure, you and Kim are on the top of the list.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.