Well, I spun the wheel of database topics I have here in my room, and today I think I’ll talk about updates and views…. specifically inserts through non-indexed views.  Since I haven’t blogged about this previously, I’ll start at the beginning.  There are many, many update topics, so don’t feel left out – comment if there’s an update topic that interests you and I’ll get to it.

UPDATEs are challenging for different reasons than SELECT statements.  While a SELECT statement can have huge combinatorical challenges in terms of the number of different plan choices to consider, UPDATEs often have a relatively small number of plan choices but instead have a number of very difficult performance tradeoffs about which kind of plan tweaks will cause the plan to finish the fastest.  You can imagine that there is a labor tradeoff between making a system work really well for SELECT plan exploration and making a system that can handle all of the detailed tweaks for UPDATE plans – most things are neutral, but getting that last 10% of performance out in one area may impact the other’s ability to innovate.  So, balance is required on the part of the database implementor.

So if I were building my first database engine and I wanted to update a base table (a “heap”), I may have the following:

create table z1 (col1 int, col2 decimal, col3 nvarchar(100));
insert into z1 (col1, col2, col3) values (1, 2, 'i like cheese')

So if I implement my storage engine with a series of equally sized pages, I can probably figure out how to load each of them into memory and look for a place to store a record that’s a linearization of (1, 2, ‘i like cheese’) on disk. 

Let’s start making things more complex.

create table z2 (col1 int primary key, col2 decimal, col3 nvarchar(100));
insert into z2 (col1, col2, col3) values (1, 2, 'i like cheese')

So if I add a primary key, this is implemented in SQL Server as a clustered index.  This replaces the heap.  So, now to build my own storage engine that does this I’d have a B-Tree implemented and I would find the right place in my B-Tree to insert my new record.  I may need to split some pages to make things fit.  Here’s the plan in SQL Server:

  |–Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

This is nice because there is still just the one structure to manage.  I just load it up, insert my row, and I am a happy man.

Now, I’ll blog some other day about what happens with multiple indexes and such – today I want to talk about inserts against views, as this has lots of nasty details in its implementation.

create view v1 as select col1, col2, col3 from z2 where col2 between 2 and 100

ok, so I’ve added a view against z2 that shows a subset of the rows.  That’s not too bad, or is it?  Well, if I insert/update the base table, my plan is the same as before, so that’s easy enough.  However, if I try to run an insert against the _view_, what should happen?  The view isn’t stored anywhere.  However, in this case, there is a single table underneath it and the rows from the base table can be “mapped” up through the view, so perhaps I could reverse that operation and translate a request to update the view to be an update against the base table. 

insert into v1 (col1, col2, col3) values (1, 2, 'foo')

  |–Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

Hey, that’s pretty neat – SQL Server performs an insert against the base table that looks very similar to what we saw before.  Very nice of them :).

Ok, let’s talk about the case when the row doesn’t meet the filter:

insert into v1 (col1, col2, col3) values (3, 250, 'foo')

  |–Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

Hrm.  Well, that seems to work too… The ANSI committee got this far when working on their ANSI SQL specification, and they added a neat little keyword “with check option” that allows inserts, updates, and deletes against views to make sure that the resulting row would continue to be exposed through the view after the change before it is allowed.

create view v2 as select col1, col2, col3 from z2 where col2 between 2 and 100 with check option


insert into v2 (col1, col2, col3) values (1, 250, 'foo')

So if we run the same query as before against this view with “with check option” defined, we get this crazy looking plan:

What’s all this then?  Well, some of this requires a few more operators of the day before I can fully explain the plan.  However, in this case it’s doing the following:

1. create a dummy row that has the values you want to insert into it. 
2. insert it into the clustered index.
3. perform a check to see if this row matches the filter condition(s)
4. Assert that the check succeeded.  If not, fail the query and roll back the transaction.

All of that from a harmless little insert statement…

Well, let’s do something else that is theoretically invertable and see how far this support goes:

create view v3 as select col1 + 1 as a, col2, col3 from z2 
insert into v3 (a, col2, col3) values (1, 250, 'foo')

Msg 4406, Level 16, State 1, Line 1
Update or insert of view or function ‘v3′ failed because it contains a derived or constant field.

So I claim that this could be supported as an updatable view type because the domain of col1 is integer and the scalar expression on it is invertable.  Unfortunately, none of the vendors really support this to my knowledge.  So, instead of it being a straight reference to a column in z2, we could create an expression that is the proper query needed to make the view v3 consistent by inserting a different row into z2. 

I’ve done two operators – basically the two easiest ones in any query processor.  However, lots of different operators can be supported and have some set of rules about invertability that can be used to perform inserts, updates, and deletes against them.

So that’s scratching the surface on updates.  I’ll try to post up a few more entries on different parts of the system, but I hope you’ve learned something today about how view updatability is implemented.

Thanks,

Conor