SQL101: Introduction to SQL Server Transactions

(The original version of this post first appeared on the now-deleted SentryOne blog at the start of 2022.)

One of the most fundamental concepts in any relational database management system (RDBMS), such as SQL Server, is the transaction. During my consulting career, I’ve seen many instances of performance problems caused by developers not understanding how transactions work in SQL Server, so in this tutorial, I’ll explain what transactions are and why they’re necessary, plus some details of how they work in SQL Server. There are nuances to some of this when Accelerated Database Recovery (ADR) is in use – topics for future articles.

What is a Transaction?

A transaction is a unit of work in the database. Every transaction has a defined starting point and a defined ending point. The ending point is either when the transaction commits (i.e. completes successfully) or when it finishes rolling back (i.e. does not complete successfully), and I'll discuss the meaning of those terms a little later.

The basic syntax for transactions is as follows:

  • BEGIN TRANSACTION (or BEGIN TRAN) starts the transaction.
  • COMMIT TRANSACTION (or COMMIT TRAN) ends the transaction successfully.
  • ROLLBACK TRANSACTION (or ROLLBACK TRAN) causes the transaction to end unsuccessfully so all operations performed in the transaction are reversed.

You can also specify a transaction name, but this is not required and I don't see names used very often.
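For completeness, here's a tiny named explicit transaction (the table and column names here are just placeholders for illustration):

-- Optional transaction name on an explicit transaction
BEGIN TRANSACTION UpdateLastName;

UPDATE MyDatabase.dbo.MyTable
SET [LastName] = N'Randal'    -- hypothetical column
WHERE [RowID] = 1;            -- hypothetical column

COMMIT TRANSACTION UpdateLastName;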

It is important to think about what a unit of work is. It means all changes to the database within the confines of the transaction, typically data modification (DML) operations like INSERT, UPDATE, and DELETE statements. This might be a single T-SQL statement or multiple statements, depending on the kind of transaction being used. And even if it's a single statement, that doesn't necessarily mean a single change. Consider a table with 1,000 rows where someone runs an UPDATE statement with a WHERE clause that matches every row. This is a single statement, but it changes all the rows in the table, so there are at least 1,000 changes to the database within the transaction.
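To make that concrete, here's a sketch (again using placeholder table and column names) of a single statement that makes at least 1,000 changes within its transaction:

-- One UPDATE statement whose WHERE clause qualifies all 1,000 rows, so
-- this single statement makes at least 1,000 changes to the database
UPDATE MyDatabase.dbo.MyTable
SET [LastName] = UPPER ([LastName])
WHERE [RowID] <= 1000;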

Even if the UPDATE statement only operates on a single row in the table, there are still at least two changes to the database: the update of the row itself on a data file page, and the update of the differential bitmap page to mark that portion of the database as changed so the next differential backup will back up the extent that the page is part of. There are plenty more examples I've seen over the years of where a single statement can cause many changes to the database, depending on factors like the data types involved (varchar, for instance), the table's structure, whether nonclustered indexes exist, and so on.

Why Are Transactions Necessary?

Transactions are part of how SQL Server implements the ACID properties of a database (Atomicity, Consistency, Isolation, and Durability), along with mechanisms like locking and logging.

A transaction guarantees that its unit of work is either wholly present in the database or wholly not present. This is the atomicity in the ACID properties, and I’ll explain how this is done later. This means transactions are very useful for SQL Server developers to control whether a set of operations (e.g. implementing some business logic) completely succeeds or does not succeed at all, so there are no partially-executed sets of operations that would leave the database inconsistent from a business perspective.

A classic example is moving money from a checking account to a deposit account. This involves a debit from the checking account and a credit to the deposit account. This must be implemented as a single transaction so that if the debit succeeds and the credit fails, the transaction as a whole fails and the debit is reversed when the transaction rolls back.
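As a sketch of that pattern (the Accounts table and its columns are hypothetical, and real code would also validate balances), it could look something like this:

BEGIN TRY
    BEGIN TRANSACTION;

    -- Debit the checking account
    UPDATE MyDatabase.dbo.Accounts
    SET [Balance] = [Balance] - 100
    WHERE [AccountID] = 1;    -- checking

    -- Credit the deposit account
    UPDATE MyDatabase.dbo.Accounts
    SET [Balance] = [Balance] + 100
    WHERE [AccountID] = 2;    -- deposit

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- If either statement fails, undo everything so money isn't lost or duplicated
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;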

In all cases, changes to the database are performed under locks held by the transaction and these locks are not released until the transaction ends. Using the default isolation level, which is called read committed, other transactions will not be able to see these changes until the transaction has committed (ended), hence the name of the isolation level. This is the isolation in the ACID properties.

For instance, a change to a row will involve the row being exclusively locked by the transaction. Another transaction that wants to read the row will usually require a share lock on the row, and so will be blocked. This behavior can be changed if the reading transaction switches to the read uncommitted isolation level (or uses the NOLOCK hint on the SELECT statement), which doesn't require share locks for reading rows but introduces the possibility of anomalies occurring.
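For instance, a reading session could do either of the following (accepting the risk of reading uncommitted, possibly-inconsistent data):

-- Option 1: change the isolation level for the whole session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM MyDatabase.dbo.MyTable;

-- Option 2: use the NOLOCK table hint on just this one statement
SELECT * FROM MyDatabase.dbo.MyTable WITH (NOLOCK);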

Types of Transactions in SQL Server

There are three basic types of transactions in SQL Server:

  1. Explicit transactions, as the name suggests, must be explicitly started with a BEGIN TRANSACTION statement and explicitly ended with either a COMMIT TRANSACTION statement or a ROLLBACK TRANSACTION statement. In other words, the SQL Server developer controls when the unit of work is committed or not.
  2. Autocommit transactions are where the developer does not control the starting and ending points of the transaction. Each T-SQL statement is its own transaction that SQL Server begins and commits automatically under the covers. There is no concept of being able to make a change to a SQL Server database without a transaction being started, as SQL Server must have the ability to roll back the change if something goes wrong.
  3. Implicit transactions are when a transaction is automatically started by SQL Server as soon as a change is made to the database, but remains active until it is explicitly ended. At that point a new transaction is automatically started. This behavior is not the default and must be specifically enabled using a SET IMPLICIT_TRANSACTIONS statement (there's a small sketch of this just below), which is not normally done except to allow behavioral compatibility with another RDBMS where this is the default behavior. I've seen this be a problem when developers don't realize implicit transactions are enabled and don't think they need to explicitly commit the transaction. More on that in the 'common mistakes' section below.

With all three of these transaction types, if SQL Server encounters a problem, the entire transaction will automatically roll back.
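Here's a minimal sketch of implicit transactions in action (again with placeholder table and column names):

-- Enable implicit transactions for this session
SET IMPLICIT_TRANSACTIONS ON;

-- This statement silently starts a transaction...
UPDATE MyDatabase.dbo.MyTable
SET [LastName] = N'Randal'
WHERE [RowID] = 1;

-- ...and that transaction stays open (holding locks and preventing log
-- truncation) until it is explicitly ended
COMMIT TRANSACTION;

SET IMPLICIT_TRANSACTIONS OFF;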

There are also two more advanced kinds of transactions in SQL Server, which are beyond the scope of this article:

  1. Batch-scoped transactions, which are only used during Multiple Active Result Sets sessions.
  2. Distributed transactions, which are used when a local transaction needs to coordinate with multiple SQL Server instances, for instance to run stored procedures with business logic on different servers. This is done using a Distributed Transaction Coordinator, or by the service itself on Azure Managed Instance.

How Does Commit Work in SQL Server?

Consider a very simple example of an explicit transaction that inserts a record into a table, using the code:

BEGIN TRANSACTION;
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (1, 'Paul', 'Randal');
COMMIT TRANSACTION;

The insert statement causes some locks to be acquired, which provides the isolation portion of the ACID properties of a database. These will only be released once the transaction has committed. When the COMMIT TRANSACTION statement is executed, I know that the insert is now durable. How does this actually happen?

All changes to a database are logged. Simply put, this means that when a change is made to a data file page, a description of the change, called a log record, is generated and entered into the database's transaction log. A log record is also generated when a transaction begins, and another when it commits. This means our simple explicit transaction will have three log records in the transaction log, all with the same transaction ID, one for each of the three statements executed. In fact, if I'd used an autocommit transaction instead of an explicit transaction (executing just the insert statement), SQL Server would have automatically started and committed the transaction, and there would still be three log records in the transaction log for the transaction. One interesting fact you might not know is that SQL Server usually names transactions that it starts; in this case it would have been named simply 'INSERT'.
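If you're curious, you can see these log records using the undocumented (and unsupported, so test systems only) fn_dblog function; something like the following shows the begin, insert, and commit records, along with the transaction name:

USE [MyDatabase];
GO

-- Undocumented function that returns the log records in the active
-- portion of the current database's transaction log
SELECT
    [Current LSN],
    [Operation],           -- e.g. LOP_BEGIN_XACT, LOP_INSERT_ROWS, LOP_COMMIT_XACT
    [Transaction ID],
    [Transaction Name]     -- 'INSERT' for the autocommit case above
FROM fn_dblog (NULL, NULL);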

When a transaction commits, SQL Server has to make sure that all the log records for the transaction are in the transaction log on disk and not just in memory, so in the event of a crash, the transaction can be replayed, guaranteeing its durability. It does this by making sure that all the transaction log in memory up to the log record for the COMMIT TRANSACTION is flushed to disk before the commit is acknowledged back to the user or application. The sequence of operations when a commit occurs is:

  • Make sure the log is flushed to disk
  • If there is a synchronous database mirror or a synchronous availability group replica, make sure the log has also been written to disk on the remote server(s)
  • Release the locks the transaction is holding
  • Acknowledge the commit has happened

There is no need to also flush the changed data file pages to disk at this point, as the transaction has already been made durable by making sure the description of all the changes are on disk. The data file pages will be written out later by a checkpoint operation – a topic for a future article.

How Does Rollback Work in SQL Server?

When a transaction must be rolled back, all operations that were part of the transaction must be essentially reversed so none of the data modifications from the transaction are present in the database. This is done using the transaction log, as the log records for a transaction are linked together in reverse order and this allows the transaction’s changes to be undone in reverse order.

Consider another simple example:

BEGIN TRANSACTION;
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (1, 'Paul', 'Randal');
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (2, 'Kimberly', 'Tripp');

At this point there are three log records for the transaction. If I then decide to execute a rollback command, SQL Server does the following:

  1. Find the most recent log record for the ‘forward’ part of the transaction, work out what operation will undo the change described by the log record, perform that operation, and generate a log record (known as a compensation log record) describing the undo.
  2. Find the previous log record, pointed to by the ‘previous log record’ LSN.
  3. Repeat until the begin log record is reached. At this point the rollback has been completed, so another log record is generated that indicates that the transaction has successfully aborted.

This will generate three more log records for my example. As you can see, rolling back a transaction takes a lot of work under the covers.

It is also possible to define a savepoint using the SAVE TRANSACTION statement and roll back to that named point in the transaction rather than rolling the entire transaction back.
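Here's a minimal sketch of a savepoint, continuing the example above:

BEGIN TRANSACTION;

INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (1, 'Paul', 'Randal');

-- Mark a point we can roll back to without abandoning the whole transaction
SAVE TRANSACTION AfterFirstInsert;

INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (2, 'Kimberly', 'Tripp');

-- Undo only the second insert; the transaction is still open
ROLLBACK TRANSACTION AfterFirstInsert;

-- Commit what remains (just the first insert)
COMMIT TRANSACTION;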

Common Mistakes That Can Cause Transaction Log Problems

The first mistake is to forget to commit a transaction. This means everything that subsequently happens on that connection is part of the same transaction. As more changes are made, more log records are generated and more transaction log space is required. The space used to store log records from earlier in the transaction cannot be reused (i.e. the log cannot truncate), as those log records must remain in case the transaction rolls back (and they're needed for the mechanism I described above). The transaction log will likely grow… and grow… and grow, until someone finally commits the long-running transaction and allows the log to be brought back under control.
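If you suspect a forgotten commit, one quick way to find the oldest active transaction in a database (and which session owns it) is DBCC OPENTRAN:

-- Reports the oldest active transaction in the database, including its
-- start time and the SPID that owns it
DBCC OPENTRAN (N'MyDatabase');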

The second mistake is to inadvertently execute some code that does a lot more work than you thought, for instance performing an update on a very large table (e.g. a billion rows) and forgetting the WHERE clause. For every row that's updated, there's at least one log record generated, so there will be at least a billion log records generated for the transaction, and that will likely cause explosive transaction log growth. A DBA who doesn't know how rollback works might be tempted to immediately cancel the update. But a knowledgeable DBA will know that rolling back a very long-running transaction will generate at least the same number of log records as have already been generated, and take a lot more time, and may decide that the prudent course of action is to let the update complete.
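If a rollback is already underway, you can get a rough idea of its progress from sys.dm_exec_requests (the percent_complete column is only populated for certain commands, rollback being one of them):

-- Show progress of in-flight operations that report it, such as a rollback
SELECT
    [session_id],
    [command],
    [percent_complete],
    [estimated_completion_time]    -- milliseconds
FROM sys.dm_exec_requests
WHERE [percent_complete] > 0;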

If you have a transaction log that is seemingly growing out of control, you can see why by running this code:

SELECT
     [log_reuse_wait_desc]
FROM [master].[sys].[databases]
WHERE
    [name] = N'MyDatabase';

If one of these two mistake scenarios is the culprit, the output will look like this:

log_reuse_wait_desc
-------------------
ACTIVE_TRANSACTION

If not, you can read about the other possible values and what they mean in the Microsoft documentation, in the section 'Factors that can delay log truncation'.

Importance of Understanding Transactions in SQL Server

Not only is it important to understand what transactions are and to design your code so it appropriately implements your business logic, it's also important to understand some of the internals I've described, because the mistakes above can cause real problems for database administrators. In my experience, database administrators often need to know how transactions work, and what mistakes developers can make with them, so they can troubleshoot issues around the transaction log.

There are many more facets to using transactions, such as specifying isolation levels and designing efficient code, but I hope this initial primer has given you a good grounding in why transactions are needed and how they work. I know it’s a bit of a cliche, but with SQL Server, it’s definitely a case of the more you know, the further you’ll go!
