I've been talking with folks for (it seems like) a long time about modeling sparse attributes in a relational database. Seems like I run into a new design where there's a need for sparse attributes every few weeks. If you remember the "hardware store" example (where each new sales item may have unique properties), that's just the tip of the iceberg. Basically, your choices boil down to: Sparse Columns (a new column for each new attribute), Sparse Tables (a new table for each new set of related attributes, if they are related), Entity-Attribute-Value (the "traditional" design, often eschewed because of scalability concerns), and XML (attributes model the sparse attributes, elements model the common attributes).
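To make the EAV option concrete, here's a minimal sketch of the hardware-store example as entity-attribute-value (the table and column names are mine, just for illustration), including the pivot query you end up writing to get one row per product back out:

```sql
-- One row per product, plus one row per (product, attribute) pair
CREATE TABLE Product (
  ProductID int PRIMARY KEY,
  Name      nvarchar(100) NOT NULL
);

CREATE TABLE ProductAttribute (
  ProductID int           NOT NULL REFERENCES Product(ProductID),
  AttrName  nvarchar(50)  NOT NULL,
  AttrValue nvarchar(200) NULL,  -- everything stored as strings: typing is lost
  PRIMARY KEY (ProductID, AttrName)
);

-- Reassembling a product's attributes means pivoting rows back into columns,
-- one CASE expression per attribute you care about
SELECT p.Name,
       MAX(CASE WHEN a.AttrName = 'Voltage'     THEN a.AttrValue END) AS Voltage,
       MAX(CASE WHEN a.AttrName = 'BladeLength' THEN a.AttrValue END) AS BladeLength
FROM Product p
LEFT JOIN ProductAttribute a ON a.ProductID = p.ProductID
GROUP BY p.Name;
```

The pivoting (and the loss of column-level data types and constraints) is a big part of why folks eschew this design, even though it never hits a column or table limit.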

Last week I ran into a person with a modeling decision like this. He informed me that he'd tried the sparse table design and run into SQL Server's hard limit of 256 tables in a join (he wanted all the sparse attributes for all products). Wow. I can't imagine what a 256-table join would look like, or how the query processor would even have time to load all the statistics for that one. He'd also run into the 1024-column limit with the sparse column design. I told him to wait on that one; SQL Server 2008 will have sparse columns.
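Based on what's been shown publicly for SQL Server 2008 (so the syntax could well change before release), declaring sparse columns might look something like this; the table and column names are mine:

```sql
-- Sketch of SQL Server 2008 sparse columns; pre-release syntax, subject to change
CREATE TABLE ProductSparse (
  ProductID    int PRIMARY KEY,
  Name         nvarchar(100) NOT NULL,     -- common attribute: an ordinary column
  Voltage      int           SPARSE NULL,  -- sparse: NULLs take (almost) no storage
  BladeLength  decimal(9,2)  SPARSE NULL,
  TeethPerInch int           SPARSE NULL
);
```

The idea is that you keep the relational model (real columns, real types, real constraints) without paying row storage for the attributes a given product doesn't have.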

The XML design is interesting too, because you can search for which rows have which sparse attributes and spit out the right data. It's what the XML VALUE index was designed for. And you can hoist the common attributes into persisted computed columns for best query performance. Seems that sparse columns may have that covered too, if it's implemented like what was shown at TechEd US. There would be an optional column defined as "XML COLUMN_SET FOR ALL_SPARSE_COLUMNS". A VALUE index on this should do the trick for fast searches too.
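Here's roughly what the XML variation looks like today, in SQL Server 2005 syntax: an XML column carrying the sparse attributes, a secondary VALUE index to speed the "which rows have this attribute" searches, and a schemabound UDF to hoist a common attribute into a persisted computed column (you can't call the xml methods directly in a computed column definition). Again, the names here are my own made-up example:

```sql
-- Sparse attributes as XML attributes on a single element,
-- e.g. <attrs voltage="110" bladeLength="10.5"/>
CREATE TABLE ProductXml (
  ProductID int PRIMARY KEY,
  Name      nvarchar(100) NOT NULL,
  Attrs     xml NULL
);

-- A primary XML index, then a secondary VALUE index for attribute searches
CREATE PRIMARY XML INDEX PXI_ProductXml_Attrs ON ProductXml(Attrs);
CREATE XML INDEX XI_ProductXml_Attrs_Value ON ProductXml(Attrs)
  USING XML INDEX PXI_ProductXml_Attrs FOR VALUE;

-- Which products have a voltage attribute of 110?
SELECT ProductID, Name
FROM ProductXml
WHERE Attrs.exist('/attrs[@voltage="110"]') = 1;
GO

-- Hoisting a common attribute: wrap the extraction in a schemabound UDF
-- so it can be used in a PERSISTED computed column
CREATE FUNCTION dbo.GetVoltage(@a xml) RETURNS int
WITH SCHEMABINDING
AS BEGIN
  RETURN @a.value('(/attrs/@voltage)[1]', 'int')
END;
GO
ALTER TABLE ProductXml ADD Voltage AS dbo.GetVoltage(Attrs) PERSISTED;
```

If sparse columns ship with the column set shown at TechEd, you'd get a similar search-everything surface without hand-writing any of this.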

I (and quite a few other folks, if my networking is correct) can hardly wait…