MERGE, JOINS, and determinism

SQL Server 2008 will contain an ANSI SQL standard MERGE statement with extensions. Listening to the webcast last Friday, there's a fairly straightforward way to describe how this works.

Let's go back to first principals. Relational databases support two ways to do UPDATE and DELETE; positioned updates and searched updates. In a positioned update, you open an updatable cursor over a set of rows, navigate to the row you want, and issue an "UPDATE…WHERE CURRENT OF" statement. The searched update (UPDATE foo SET x=y WHERE z=1) is a SELECT and UPDATE in one statement. You specify the rows you want and also update them. The way you specify/gather the rows to update is mostly up to you, SQL Server even supports using a JOIN to do this.

update t
set t.name = s.name, t.age  = s.age
from [target] t
join [source] s on t.id = s.id;
go

MERGE in SQL Server 2008 is a searched operation that can operate on up to three different rowsets depending on how you specify the MERGE statement. These are actually the three rowsets you can get from a FULL OUTER JOIN. The statement:

select [target] t
inner join [source] s on t.id = s.id;

gets the rows in table T with a matching id value in table S. Let's call this rowset1. A left outer join will also include the rows in T that do not match S (rowset2); right outer join includes the rows in T that don't match S (rowset3), and FULL OUTER JOIN contains all three rowsets.

In MERGE, you can get up to all three rowsets (INNER, LEFT, and RIGHT join rowsets)

merge [target] t
 using [source] s on t.id = s.id
 when matched then update t.name = s.name, t.age = s.age — use "rowset1"
 when not matched then insert values(id,name,age) — use "rowset2"
 when source not matched then delete; — use "rowset3"

The query processor will do the appropriate type of join to gather the rowsets that you need, as all clauses are not required. This optimizes performance over multiple statements, as you only have to gather the rows to process once.

Even, if you use only "when matched", MERGE is an improvement over our first "update using a join". If more than one row in the source matches one row in the target…

insert into t values(1, 'Fred', 42)
insert into s values(1, 'Buddy', 43)
insert into s values(1, 'Sam', '95)

The update using a join non-deterministically picks a row in the source to do the update, it could be Buddy or Sam. MERGE throws the following exception.

Msg 8672, Level 16, State 1, Line 1
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to  ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

MERGE can actually do more than three operations using predicates in the "match/no match clauses", but that's it for now.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.