In one of the sessions Kimberly and I taught this week at SQL Connections, we discussed how to choose efficient data-types - I'd like to share the discussion here with an example.
The crux of the matter is in the amount of space required to store the data-types in the schema, and ensuring that the data-type choice matches the data being stored without incurring any unnecessary overhead. Here are a couple of examples:
You can see how choosing data-types without considering whether it's the best type can lead to wasted space. Let's look at an example schema to support a population census of the USA.
CREATE
SSN CHAR (256),
StateName CHAR (256),
Sex INTEGER,
Employed INTEGER,
SelfEmployed INTEGER,
EthnicOrigin INTEGER,
MaritalStatus INTEGER,
NumberOfDependents INTEGER,
Age INTEGER,
CountryOfBirth INTEGER,
HouseholdIncomeGroup INTEGER,
ZipCode5 INTEGER);
GO
At first glance this may look reasonable, but digging in you'll see that many of the columns are over-sized. Here's a cleaned-up schema to compare against, with notes on the per-column savings:
SSN CHAR (9), -- saving 244 bytes
StateName VARCHAR (256), -- saving at least 240 bytes (longest state name is 14 + 2-byte varchar overhead)
Sex BIT,
Employed BIT,
SelfEmployed BIT, -- saving 11 bytes altogether over these three fields
EthnicOrigin TINYINT, -- saving 3 bytes
MaritalStatus TINYINT, -- saving 3 bytes
NumberOfDependents TINYINT, -- saving 3 bytes
Age TINYINT, -- saving 3 bytes
CountryOfBirth TINYINT, -- saving 3 bytes
HouseholdIncomeGroup TINYINT, -- saving 3 bytes
ZipCode5 INTEGER); -- no saving
The bad schema gives a per-row size of 574 bytes and the cleaned-up schema is no more than 48 bytes per-row. I designed these two schemas to support a US census. The population of the USA is approx. 300 million. This means the bad schema would take around 190GB to store all the info, and the cleaned-up schema only takes around 15GB - more than 12 times more efficient! Now we're starting to see how poor data-type choice can lead to poor performance.
Wider rows means:
Less rows per page means:
More pages means:
Clearly there's a link between various aspects of workload performance and badly chosen data-types.
Remember Me
a@href@title, strike
Theme design by Jelle Druyts
Pick a theme: BlogXP sqlx BlogXP sqlx
Powered by: newtelligence dasBlog 2.0.7226.0
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2008, Paul S. Randal
E-mail