Column order doesn’t matter… generally, but – IT DEPENDS!

OK, for years I've been saying that SQL Server doesn't care about the order in which you define the columns of your table, because internally SQL Server re-arranges your columns to store all of the fixed-width columns first and the variable-width columns last. Within both the fixed-width portion of the row and the variable-width portion, the columns are stored in the order in which they are declared. So, what does matter?

It's all in the cost of the variable-length column offset array. If the large majority of your NULLable columns are at the end of the row, then SQL Server doesn't need to completely populate the offset array (which saves 2 bytes per column). If you have a table where 36 columns are NULLable and they are generally NULL, then defining those columns at the end of the row can save you space.

The following script will show you how the maximum length of the row changes based on whether or not a later column in the variable block is NOT NULL – even when most/all of the prior columns are NULL!

CREATE TABLE RowSizeVariableBlock
(
ID int NOT NULL identity,
c01 char(10) NOT NULL default 'test',
c02 datetime2(7) NOT NULL default sysdatetime(),
c03 char(80) NOT NULL default 'junk',
c04 varchar(100) NULL,
c05 varchar(100) NULL,
c06 varchar(100) NULL,
c07 varchar(100) NULL,
c08 varchar(100) NULL,
c09 varchar(100) NULL,
c10 varchar(100) NULL,
c11 varchar(100) NULL,
c12 varchar(100) NULL,
c13 varchar(100) NULL,
c14 varchar(100) NULL,
c15 varchar(100) NULL,
c16 varchar(100) NULL,
c17 varchar(100) NULL,
c18 varchar(100) NULL,
c19 varchar(100) NULL,
c20 varchar(100) NULL,
c21 varchar(100) NULL,
c22 varchar(100) NULL,
c23 varchar(100) NULL,
c24 varchar(100) NULL,
c25 varchar(100) NULL,
c26 varchar(100) NULL,
c27 varchar(100) NULL,
c28 varchar(100) NULL,
c29 varchar(100) NULL,
c30 varchar(100) NULL,
c31 varchar(100) NULL,
c32 varchar(100) NULL,
c33 varchar(100) NULL,
c34 varchar(100) NULL,
c35 varchar(100) NULL,
c36 varchar(100) NULL,
c37 varchar(100) NULL,
c38 varchar(100) NULL,
c39 varchar(100) NULL,
c40 varchar(100) NULL
)
go

insert RowSizeVariableBlock DEFAULT VALUES
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 114
go

insert RowSizeVariableBlock (c01, c03, c20)
values ('med row', 'up to c20', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 154
go

insert RowSizeVariableBlock (c01, c03, c30)
values ('med+ row', 'up to c30', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 174
go

insert RowSizeVariableBlock (c01, c03, c40)
values ('large row', 'up to c40', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 194
go
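The four "max" record sizes above can be reproduced with a little arithmetic. The following is a back-of-the-envelope sketch (in Python, since the T-SQL demo already shows the real numbers) of the data-record layout: a 4-byte record header, the fixed-width data, a 2-byte column count, a NULL bitmap with 1 bit per column, and then – only if at least one variable-width column has a value – a 2-byte variable-column count plus a 2-byte offset for every variable column up to and including the LAST non-NULL one, plus the data itself. The constants below are assumptions derived from this table's definition (int = 4 bytes, char(10) = 10, datetime2(7) = 8, char(80) = 80).

```python
import math

FIXED_BYTES = 4 + 10 + 8 + 80   # ID int, c01 char(10), c02 datetime2(7), c03 char(80)
TOTAL_COLS = 41                 # ID + c01..c40 (the NULL bitmap covers all columns)

def record_size(last_nonnull_var_col=0, var_data_bytes=0):
    """Model the row size. last_nonnull_var_col is the 1-based position of the
    last non-NULL column within the VARIABLE block (c04 is 1, c20 is 17, ...);
    0 means no variable column has a value."""
    size = 4 + FIXED_BYTES                  # record header + fixed-width data
    size += 2 + math.ceil(TOTAL_COLS / 8)   # column count + NULL bitmap (6 bytes here)
    if last_nonnull_var_col:
        size += 2                           # count of entries in the offset array
        size += 2 * last_nonnull_var_col    # one 2-byte offset per column up to the last non-NULL
        size += var_data_bytes              # the data itself ('test' = 4 bytes)
    return size

print(record_size())        # all variable columns NULL      -> 114
print(record_size(17, 4))   # c20 is the 17th variable column -> 154
print(record_size(27, 4))   # c30 is the 27th                 -> 174
print(record_size(37, 4))   # c40 is the 37th                 -> 194
```

Note that the 40-byte jump from 114 to 154 is almost entirely offset overhead: only 4 bytes of it is actual data.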

While there are some other optimizations at this level, most tables can't benefit from this because their data populations aren't as predictable, nor are most tables filled with so many variable-width, NULLable columns. However, if you do have this data pattern, defining these columns at the end of your table's definition MIGHT save a tremendous amount of space – especially when the table is very large!

Paul's blogged more on these structures as well as the NULL bitmap here: http://www.sqlskills.com/blogs/PAUL/post/Misconceptions-around-null-bitmap-size.aspx.

Enjoy! And, thanks for reading,
kt

8 thoughts on “Column order doesn’t matter… generally, but – IT DEPENDS!”

  1. Good information, Kimberly.

    Column order is more of an issue with other DBMSs, so it comes up often during our data modeling/database design classes. I like that you mentioned that it might have an impact on certain types of tables, especially very large ones. However, in my experience you need to explain what you mean by "very large". Many newish developers think that very large means more than a few hundred rows.

    They read something about a VLDB and think it applies to our relatively tiny tables that hold a few tens of thousands of rows or so.

    So perhaps you could qualify your post with where you think, in the overall scheme of things, designers should even start to worry about column order.

  2. I’m thinking that the table-level compression available in SQL Server 2008 would be the first choice for reducing space and perhaps increasing performance for large, sparsely populated tables.

  3. Hey there Karen – Yeah, originally I had "with millions of rows" but in all honesty, it depends on how wide the rows are with/without the variable block. What I probably should have said is that it’s more related to overall size and certainly depends on your specific number of rows – but if you estimate this out, then a savings of 50% on a 100,000-row table with really wide rows… would actually make a huge difference!

    So… I think of table size more related to MB of total row size and # of records. But, on smaller systems (like Express) you could argue that even 10s of thousands of rows can have an impact.

    Maybe I need another post on capacity planning, row size and "what really matters" (how to figure out what and why you should care wrt to table design :). LOL.

    Cheers,
    kt

  4. Hey there Ali – While compression is certainly an "easy to implement" option for reducing space (and will reduce this example quite drastically!), therefore potentially/drastically reducing IOs, compression does have other trade-offs (in terms of CPU, and the fact that it’s only in SQL Server 2008 Enterprise Edition). Having said that, good design principles are always welcome – even with new features. ;-). This is definitely a rare case, but one that I’m better off knowing about (and can use in any environment); still, compression would be easier (if I can meet the requirements!).

    So, *great* point for this example.

    One thing I’ll add though is that while compression works really well for this example, it might not always work as well – so before you implement compression, see what compression ratio you will get by using sp_estimate_data_compression_savings. Other tables might not compress as well, and then all the work to compress would be lost and paid for with more CPU!

    THANKS!!
    kt

  5. Hey there Chris – Feel free to post some comments and a link – but, not the text. It’s better to share opinions and links rather than copying content!

    Cheers,
    kt

  6. Need some kind help of yours on this post.

    When you say "[quote]It’s in the cost of the variable array’s offset values. If the large majority of your NULLable records are at the end, then SQL Server doesn’t need to completely populate the variable block array (which saves 2 bytes per column). If you have a table where 36 columns are NULLable and generally they are NULL, then defining those columns at the end of the row can save you space.[/quote]"

    What I’m not aware of (or am confused by) is what it means that SQL Server doesn’t need to completely populate the variable block array (which saves 2 bytes per column). I’m aware of the fact that SQL Server uses 2 extra bytes for a variable-length column, but with all such columns NULL, won’t the storage engine still use 2 bytes per column?

    Basically, how is the array affected by grouping all such columns at the very end?

    Thanks a lot for helping me out on these.

  7. When a value actually exists in the variable block array, it requires 2 bytes to store the offset. If you have 5 variable-width columns and only columns 1, 2 and 3 have values, then only those must be in the variable block array (meaning that you have 6 additional bytes of overhead). If another row has ONLY the third variable column populated, then SQL Server still needs 2 bytes for EACH of columns 1, 2 and 3 – even though the values for 1 and 2 are NULL. Basically, the variable block does not have column numbers (because with column numbers and offsets, it would take 4 bytes – not 2); it has only offsets. And it needs ALL column offsets up to the last NON-NULL value populated, because it’s the position within the variable block array that defines which column the offset points to.
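    The rule in that answer can be sketched in a few lines. This is a hypothetical model (the function name is mine, not SQL Server's): offsets carry no column numbers, so every slot up to the last non-NULL variable column must be materialized at 2 bytes apiece, even for NULLs in between.

    ```python
    def offset_array_bytes(values):
        """values: a row's variable-width columns in declared order (None = NULL).
        Returns the bytes spent on offset slots alone (excluding the 2-byte
        entry count and the data itself)."""
        last = 0
        for i, v in enumerate(values, start=1):
            if v is not None:
                last = i          # remember the position of the last non-NULL column
        return 2 * last           # 2 bytes per offset slot up to that position

    # Columns 1-3 populated: three offset slots -> 6 bytes of overhead.
    print(offset_array_bytes(['a', 'b', 'c', None, None]))    # 6
    # ONLY column 3 populated: still three slots -> still 6 bytes.
    print(offset_array_bytes([None, None, 'c', None, None]))  # 6
    # All NULL: the array can be skipped entirely -> 0 bytes.
    print(offset_array_bytes([None, None, None, None, None])) # 0
    ```

    The two 6-byte results are the whole point: moving a rarely populated column earlier in the declaration forces offset slots for every NULL column before it.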

    Cheers!
    kt
