And the EAV winner is …. sparse columns

Many of you have already heard the "hardware store" story. What's the best way to model products in a hardware store, where new items arrive at the hardware store each day. Each item has a "short list" of similar properties (e.g. UPC, price) and a long list of dissimilar properties (e.g. paint has color, type, amount and curtain rods have width, metal, etc). How to model the dissimilar properties for each item in relational table(s)?

This isn't as unusual of a problem as you might think, examples I've heard lately include:
 Items in a directory system (like AD)
 Readings for lab test results
 Attributes for Sharepoint items

I've always thought of the main contenders as:
1. Sparse tables – one per product
2. Sparse columns – 90% of the column values would be NULL
3. Model as XML – similar properties are subelements, sparse properties are attributes
4. Entity-attribute-value (EAV) – also known as open schema. A separate "properties" table with name-value pairs.

EAV is one of the most popular solutions, even supposedly endorsed by standard schemas in some industries. Many relational purists detest EAV because its non-relational. It's main drawbacks are that the "name-value pair" table gets huge fast, with the corresponding lack of performance, the need for careful editing (color and colour would be two different attributes), and the fact that the "value" column of name-value must have a data type of nvarchar or SQL-variant.

SQL Server 2005 added the PIVOT keyword. One use for PIVOT is the change the EAV tables into something that looks like sparse tables.

I even had the opportunity to ask Joe Celko (no fan of EAV) which he prefers, trying to ease him towards the "model as XML" mechanism. He stood up for sparse tables or sparse columns.

SQL Server 2008 will include support for sparse columns. You can designate a column as

SPARSE in the DDL, like this:

CREATE TABLE products (product_num int, item_num int, price decimal(7,2), …,
                       color char(5) SPARSE, width float SPARSE…)

You can have a huge number of sparse columns per table, although the number of non-sparse columns remains at 1024. In addition, SQL Server 2008 will support sparse indexes (aka filtered indexes) defined like:

CREATE INDEX coloridx ON products(color) WHERE product_num IN (21,22,42…)

Finally, you can have an XML "COLUMN SET" column for each table; this exposes the sparse properties (or perhaps a subset of them?) for each item as a collection of XML elements, for those folks that like to model these as XML.

ALTER TABLE products ADD COLUMN properties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

It's an interesting idea; the proof will be in the perf as well as the usability.

2 thoughts on “And the EAV winner is …. sparse columns

  1. Mladen,

    The issue with your approach, at least in 2005, is that you have no maintenance path in case you need to add additional attributes. XML schema collections in 2005 are difficult or impossible to alter without changing every piece of code that uses them. Sparse columns appear to not have this maintenance issue. But then again, apparently the schema collections will also be easier to work with in 2008. So it’s hard to say which option will be better…

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.