More about sparse columns and column_sets

I'm still getting used to the new sparse column feature in SQL Server 2008.

I'd just read in the BOL definition of ALTER TABLE that you can add a column_set to an existing table. I was converting a sample app from an EAV (entity attribute value) design to sparse columns. I used the existing "attribute-value" table to create the sparse columns, then created the table. I then went back to add the column_set with ALTER TABLE after the fact.

alter table sparsetest3 add spcolset xml column_set for all_sparse_columns

Received this error message:
Msg 1734, Level 16, State 1, Line 1
Cannot create the sparse column set 'spcolset' in the table 'sparsetest3' because the table already contains a sparse column set. A table cannot have more than one sparse column set.

Huh? I figured that I should be able to add the column_set and filed the lack of this capability as a bug. It came back as "by design".  And doc'd in BOL that way. The fact that its by design makes perfect sence and helped to solidify in my mind how the column_set works.

In a "normal" table (even with sparse columns) without a column_set, "select * from table" returns all of the columns. If a table has a sparse column_set, "select * from…" behaves differently, and returns only the non-sparse columns and the column_set column. NOT the individual sparse columns. You can INSERT or UPDATE this table by using only the column set. When you update using a column_set, all of the sparse columns that you don't specify are set to NULL. You can even update to column_set to NULL itself, which NULLs out all of the sparse columns.

The reason that "A column set cannot be added to a table if that table already contains sparse columns" (BOL exact wording under "Guidelines for using sparse columns") is that it could break existing code that uses "select * from…". Imagine:

create table sparsetab (
 id int identity primary key,
 col1 int,
 spcol2 int sparse
);

select * from sparsetab — returns spcol2
— add column_set (this is disallowed, but imagine it DID work)
select * from sparsetab — doesn't return spcol2, only the column_set, can break code that relies ont spcol2

And BOL is right in ALTER TABLE as well.

create table sparsetab2 (
 id int identity primary key,
 col1 int
);
go
— add first sparse column and column_set at the same time, works fine.
alter table sparsetab2
 add spcol1 int sparse,
       spcolset xml column_set for all_sparse_columns
go

You can even, as BOL indicates, add a column_set to a table that does not yet have a sparse column.

create table sparsetab4 (
 id int identity primary key,
 spcolset xml column_set for all_sparse_columns
);
go
— Now you can add sparse columns, they use the column_set

The ONLY issue I have is with the error message 1734 at the beginning of the post. It's misleading, because I DON'T already have a column_set. But I DO already have sparse columns.

In last sparse column correction from a long-ago post. I'd heard (early on) that you would be able to have over 4 million sparse columns (actually sizeof(int) of them). This turns out to be incorrect, it was announced lately that the limit will be 30,000 sparse columns. And, in the current CTP6 you can only have 1024 total columns (in as previous versions); the sparse column limit will be changed sometime before RTM.

Amazingly, when I quoted the 30,000 column limit to a class last week, there was a groan. One student told me his EAV table already had over 60,000 unique attributes. 30,000 wouldn't be enough…must be a HUGE EAV table. That's the motivation for sparse column. Also, his EAV table had the "value" column defined as SQL_VARIANT. That's the other motivation, sparse columns are strongly typed. Although inserting through the column_set always uses a string (nvarchar) as the value, and attempts to convert string to the definied data type for specific columns.

One thought on “More about sparse columns and column_sets

  1. 60000? You’ve got to be kidding me? Wow…I wanna know what his app does.

    Did he consider using 2 tables (or however many were needed), with the maximum 30000 in each? I guess its open to opinion whether using more than 1 table to store data for one entity is better than an EAV table. I think it is, I would prefer to have the benefits of sparse columns:
    Strongly-typed
    More performant than pivoting the data

    Mind you…I wouldn’t enjoy writing the CREATE TABLE statements 🙂

    -Jamie

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.