(Apologies to Star Trek).
I received a question about trigger performance, especially the do’s and dont’s about how they are used.
Let’s first start by saying that triggers can be very useful tools to facilitate server-side programming in OLTP systems. They are powerful ways to do things like audit changes to a table, validate business logic, or extend the features that the database server provides to you. So, in the proper hands, these can really provide a lot of business value.
The problem with this area is that there is a great temptation to think about databases procedurally, as you would with a programming language like C++ or C#. You can write code that looks like a procedural function call and have it get called for each insert into table! Before you know it, non-database programmers are checking in code to your production sysem. Suddenly, your application grinds to a halt because a trigger plan has no index or is poorly designed.
Databases are great tools for working on SETS of rows. You can define queries that can query the whole data efficiently. If you do it properly, you can solve a lot of problems very easily. So, for any given query, the startup time for a query is relatively high, but the per-row cost is optimized by lots of smart people. I can assure you that they count instructions, worry about CPU architectures, memory access trends, cache hierarchies, and every other trick in the book to make the per-row cost of a query as low as possible. So, it may take a few million instructions to start a query vs. a few thousand or tens of thousands to process a row in a query.
When you add a trigger, you should think that you are adding another query whenever you run the first one. Furthermore, if you have written your application to call the database more often than necessary (say, once per row instead of batching things up as sets), then you are now adding the cost of that second query to the PER-ROW cost of the query. That is pretty close to the definition of a “bad thing” in the dictionary…
In addition to the number of times that a trigger is called, the trigger plan itself may not be set up to run efficiently. If the trigger enforces business logic that touches a lot of tables or rows, you need to consider what indexes to add. I usually create an equivalent SELECT statment and run it to look at the query plan (I’d fake data and a table for the inserted table, for example). This will help me find slow queries before the trigger is deployed – you probably want to look closely at every table scan to see if it is really necessary).
I’ve put together some examples that you can try to see how the system behaves under different scenarios with triggers. I would publish some numbers to give you relative ideas about how fast things are, but I think that the current CTP of SQL Server that I am running has a few issues with memory on my machine that cause it to page wildly occassionally, so I will let you go run these on your own. The basic idea is that things will get slower as you do more and more “bad practices” with triggers.
create table table1 (col1 int primary key identity, col2 int, col3 binary(2000), col4 binary(4000)) declare @a int set @a=0 while @a < 20000 begin insert into table1 (col2) values (@a/100); set @a+=1 end go -- target table on which we are creating triggers create table table2 (col1 int primary key identity, col2 int, col3 binary(2000), col4 binary(4000)) -- audit table create table table3(col1 int, col2 binary(100)) -- try to insert 20,000 rows with no triggers defined - should be pretty fast. begin transaction insert into table2 select col2, col3, col4 from table1 rollback go -- now create a simple trigger that reads all the inserted rows and copies them to table3 (an audit table) create trigger trig1 ON table2 FOR INSERT AS INSERT INTO table3(col1, col2) select col1, null from inserted begin transaction insert into table2 select col2, col3, col4 from table1 rollback -- this will run the trigger query once with all 20,000 rows. go -- here is where triggers starts to hurt. Let's write the same query except that -- we read each row separately in its own query. (This will be even worse if you remove the begin transaction -- since each row will need to commit the log to disk). begin transaction declare @i int set @i=0 while @i < 20000 begin insert into table2 select col2, col3, col4 from table1 where col1 = @i set @i+=1 end rollback -- This is a very poor use of the database - In almost every case, using a set-based approach will -- yield better performance. You avoid multiple transactions. You avoid running through a long codepath -- to start each query. You give the optimizer to consider more advanced plans. go -- let's remove trig1 for our next example drop trigger trig1 go -- now let's create a trigger that copies ~100 rows every time it is called. Furthermore, let's -- restrict on a non-indexed column in the source table. This will lead to a table scan for each call. -- This should simulate your average "bad" trigger formulation. create trigger trig2 ON table2 FOR INSERT AS INSERT INTO table3(col1, col2) select col1, null from table1 where col2=5; go -- and, for fun, we'll run the one row at a time example again. -- (go get some coffee, or whatever else you do to amuse yourself - this will take awhile.) begin transaction declare @i int set @i=0 while @i < 20000 begin insert into table2 select col2, col3, col4 from table1 where col1 = @i set @i+=1 end rollback
So what is a responsible DBA or database developer to do???
Well, there is hope. A few things to consider:
1. Look at that trigger – do you need it? really? Perhaps not. Consider each one and make sure that you really, really need it.
2. Make sure that you call any trigger the minimum number of times. SET STATISTICS PROFILE ON can help you see how many calls are happening for any given query.
3. You can often pull the logic for a trigger up into the stored procedure that is calling your query. This works if you can control all of the codepaths into the system for changes to the system – this is often the case in OLTP applications.
4. SQL Server 2005 added some neat features that help with benchmarks. However, they are also quite useful for you if you have the time to consume them. Look up INSERT… with OUTPUTs. This lets you insert rows into one table and then perform an operation with the rows after the insertion (return them as a select query, insert them into another table, etc). This is a very cool feature. It takes quite a bit of knowledge to wrap your head around what the database system is doing to make this all work (too much blogging for this post, to be sure), but I can assure you that it is pretty fast :).
Bottom line – I recommend that you have one person at your company who is responsible to understand the schema, including things like triggers. This means that someone can think through the tradeoffs and make reasonable decisions to avoid the most common pitfalls.
Happy Querying, folks!