A question came up on the Data Platform MVP email list last night asking what the FCB_REPLICA_SYNC spinlock is. I answered the question and then promised to do a quick blog post, as there’s no information online about it that I could find.
Explanation
In a nutshell, this spinlock is used to synchronize access to the list of pages that are present in a database snapshot, as follows:
- If a page in a database with one or more database snapshots is being updated, check each snapshot’s list to see if the page is already in the snapshot. If yes, nothing to do. If no, copy the pre-change image of the page into the snapshot.
- If a query is reading a page in the context of a database snapshot, check the list of pages to see whether to read from the snapshot or the source database.
This synchronization ensures that the correct copy of a page is read by a query using the snapshot, and that updated pages aren’t copied to the snapshot more than once.
The original question was because the person was seeing trillions of spins for the FCB_REPLICA_SYNC spinlock. That’s perfectly normal if there’s at least one database snapshot, a read workload on the snapshot, and a concurrent heavy update workload on the source database.
Example
For example, using our sample SalesDB database (zip file here), I created this query and set it running:
WHILE (1=1) BEGIN UPDATE [SalesDB].[dbo].[Sales] SET [Quantity] = [Quantity] + 1; END; GO
Then I took my script to capture spinlock metrics over a period of time (see this post), changed it to capture over 20 seconds, and then ran a DBCC CHECKDB on the SalesDB database, which took 18 seconds.
The spinlock metrics returned were:
Spinlock DiffCollisions DiffSpins SpinsPerCollision DiffSleepTime DiffBackoffs ------------------------- -------------- --------- ----------------- ------------- ------------ BUF_HASH 2 500 250 0 0 DBTABLE 5 1250 250 0 0 FCB_REPLICA_SYNC 5716270 1513329500 264 0 154380 LOCK_HASH 12 3500 291 0 1 LOGCACHE_ACCESS 6 387 64 0 3 LOGFLUSHQ 4 75840 18960 0 3 LOGPOOL_HASHBUCKET 15 3750 250 0 0 LOGPOOL_SHAREDCACHEBUFFER 32 8000 250 0 0 LSLIST 8 2000 250 0 0 SOS_SCHEDULER 3 1114 371 0 0 SOS_TASK 1 356 356 0 0
You can see that even for a 20-second test, a single DBCC CHECKDB produced 1.5 billion spins on the FCB_REPLICA_SYNC spinlock.
This is perfectly normal.
One of the dangers of looking at spinlock metrics is that the numbers involved can be so high that it’s easy to convince yourself that there’s some kind of problem, especially as there’s so little information available online about what the spinlocks actually mean. The vast majority of the time, there’s no problem, but it takes a lot of internals knowledge to know what’s going on.
About the only thing I’d be concerned about is if there are multiple concurrent snapshots on a database with heavy update workload, as that will cause synchronous writes to all the snapshots when a page in the source database is first updated, slowing down the workload.
Investigation
One thing you can always do if you’re interested in what a specific spinlock means is to investigate with Extended Events. There’s a whitepaper I helped review called Diagnosing and Resolving Spinlock Contention on SQL Server that you can download here. In it there’s an Extended Event session that I use to see where spinlocks backoffs occur.
Here’s the session I used for FCB_REPLICA_SYNC (which maps to the type value of 136 in sys.dm_xe_map_values):
-- Drop the session if it exists. IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE [name] = N'WatchSpinlocks') DROP EVENT SESSION [WatchSpinlocks] ON SERVER GO CREATE EVENT SESSION [WatchSpinlocks] ON SERVER ADD EVENT [sqlos].[spinlock_backoff] (ACTION ([package0].[callstack]) WHERE [type] = 136) -- FCB_REPLICA_SYNC only ADD TARGET [package0].[asynchronous_bucketizer] ( SET filtering_event_name = N'sqlos.spinlock_backoff', source_type = 1, -- source_type = 1 is an action source = N'package0.callstack') -- bucketize on the callstack WITH (MAX_MEMORY = 50MB, MAX_DISPATCH_LATENCY = 5 seconds) GO -- Start the session ALTER EVENT SESSION [WatchSpinlocks] ON SERVER STATE = START; GO -- TF to allow call stack resolution DBCC TRACEON (3656, -1); GO -- Cause some spinlock backoffs -- Get the callstacks from the bucketizer target SELECT [event_session_address], [target_name], [execution_count], CAST ([target_data] AS XML) FROM sys.dm_xe_session_targets [xst] INNER JOIN sys.dm_xe_sessions [xs] ON ([xst].[event_session_address] = [xs].[address]) WHERE [xs].[name] = N'WatchSpinlocks'; GO -- Stop the event session ALTER EVENT SESSION [WatchSpinlocks] ON SERVER STATE = STOP; GO
You’ll need to download the debug symbols for the build you’re using – see here for my instructions on how to do this.
I started the event session and re-ran the test. A sampling of the call stacks is below, with matching explanations.
Explanation:
- Pushing a page into a snapshot just before it gets modified in the source database
- Reading a page from a snapshot (in this case, from one of DBCC CHECKDB‘s parallel threads performing readahead)
- Pulling a page in to a snapshot while crash recovery is running on the new snapshot to make it a transactionally-consistent view of the source database (in this case, it’s a ‘transient’ database snapshot that DBCC CHECKDB has created)
And plenty more similar call stacks.
Call stack:
- XeSosPkg::spinlock_backoff::Publish+0x138
SpinlockBase::Sleep+0xc5
SpinlockBase::Backoff+0x145
Spinlock<136,4,1>::SpinToAcquireWithExponentialBackoff+0x169
FCBReplicaSync::StartWrite+0x7f
FCB::CopyPageToReplicas+0x212
BUF::CopyOnWrite+0x60
BPool::PrepareToDirty+0x180
IndexPageRef::Modify+0x146
BTreeRow::UpdateRecord+0x20ab
IndexDataSetSession::SetDataInternal+0x9a03
DatasetSession::SetData+0x16d
RowsetNewSS::SetData+0x6a
CValRow::SetDataX+0x63
CEsExec::GeneralEval4+0xe7
CQScanUpdateNew::GetRow+0x24b
CQueryScan::GetRow+0x81
CXStmtQuery::ErsqExecuteQuery+0x5be
CXStmtDML::XretDMLExecute+0x31c
CXStmtDML::XretExecute+0xad
CMsqlExecContext::ExecuteStmts<0,1>+0x8bd
CMsqlExecContext::FExecute+0xa68
CSQLSource::Execute+0x86c
CStmtPrepQuery::XretPrepQueryExecute+0x464 - XeSosPkg::spinlock_backoff::Publish+0x138
SpinlockBase::Sleep+0xc5
Spinlock<136,4,1>::SpinToAcquireWithExponentialBackoff+0x169
FCBReplicaSync::StartRead+0x86
FCB::ScatterRead+0x1b3
RecoveryUnit::ScatterRead+0xa9
BPool::GetFromDisk+0x719
BPool::ReadAhead+0x7e
MultiObjectScanner::GetNextPageAndReadAhead+0x38e
MultiObjectScanner::GetNext+0x98
MultiObjectScanner::GetNextPageAndBatch+0x2fc
CheckTables::ProcessNextData+0x1bb
CheckAggregateSingleInstance::GetNextFact+0x28e
CTRowsetInstance::FGetNextRow+0x3c
CUtRowset::GetNextRows+0xa0
CQScanRmtScanNew::GetRowHelper+0x3b8
CQScanXProducerNew::GetRowHelper+0x53
CQScanXProducerNew::GetRow+0x15
FnProducerOpen+0x57
FnProducerThread+0x8c3
SubprocEntrypoint+0xa7f
SOS_Task::Param::Execute+0x21e
SOS_Scheduler::RunTask+0xab
SOS_Scheduler::ProcessTasks+0x279 - XeSosPkg::spinlock_backoff::Publish+0x138
SpinlockBase::Sleep+0xc5
Spinlock<136,4,1>::SpinToAcquireWithExponentialBackoff+0x169
FCBReplicaSync::StartWrite+0x7f
FCB::PullPageToReplica+0x35
FCB::CopyPageToReplicas+0x12c
BUF::CopyOnWrite+0x60
BPool::PrepareToDirty+0x180
PageRef::ModifyRow+0x24a
IndexPageRef::Modify+0x19f2
BTreeRow::UpdateRecord+0x20ab
IndexDataSetSession::UndoSetData+0x4d9
XdesRMReadWrite::IndexModify+0x61
XdesRMReadWrite::UndoPageOperation+0x10da
XdesRMReadWrite::RollbackToLsn+0x7d6
RecoveryMgr::UndoRegularXacts+0xb09
RecoveryMgr::RollbackRemaining+0x137
RecoveryUnit::DoRollbackRecovery+0x19
RecoveryUnit::CompleteRecovery+0x6b8
RecoveryUnit::PhaseStart+0x87
DBTABLE::StartupPostRecovery+0x4d
DBTABLE::ReplicaCreateStartup+0x284
DBMgr::SyncAndLinkReplicaRecoveryPhase+0x787
DBMgr::CreatePhasedTransientReplica+0x717
Summary
So there you have it. The FCB_REPLICA_SYNC spinlock is to do with database snapshot reads and writes, and high numbers around it are expected with concurrent updates in the source database and reads in the snapshot.