[Edit 2016: Check out my new resource – a comprehensive library of all wait types and latch classes – see here.]
In some of the wait statistics data I’ve been analyzing, some servers have been showing very long ASYNC_IO_COMPLETION waits, which I had a hunch about but wanted proof.
The official definition of ASYNC_IO_COMPLETION is ‘Occurs when a task is waiting for I/Os to finish.’ Very helpful – NOT!
Using the code I blogged yesterday (How to determine what causes a particular wait type), I set up an XEvent session to track all the ASYNC_IO_COMPLETION waits and their durations, changing to a ring_buffer target so I get the wait durations too:
CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
(
    ACTION ([package0].[callstack])
    WHERE [wait_type] = 98 -- ASYNC_IO_COMPLETION only, same on all versions
    AND [opcode] = 1 -- Just the end wait events
)
ADD TARGET [package0].[ring_buffer]
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS
)
GO
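To start the session and pull the raw results back out of the ring_buffer target, something like the following works (a sketch; the XQuery paths are my assumption based on the shape of the event XML shown further down):

ALTER EVENT SESSION [InvestigateWaits] ON SERVER
STATE = START;
GO

-- Shred the ring buffer into one row per wait, with duration and call stack
SELECT
    [xed].[event_data].value (
        '(data[@name=''duration'']/value)[1]', 'bigint') AS [Duration_ms],
    [xed].[event_data].value (
        '(action[@name=''callstack'']/value)[1]', 'nvarchar(max)') AS [CallStack]
FROM
(
    SELECT CAST ([t].[target_data] AS XML) AS [RingBufferXML]
    FROM sys.dm_xe_session_targets AS [t]
    JOIN sys.dm_xe_sessions AS [s]
        ON [s].[address] = [t].[event_session_address]
    WHERE [s].[name] = N'InvestigateWaits'
    AND [t].[target_name] = N'ring_buffer'
) AS [rb]
CROSS APPLY [rb].[RingBufferXML].nodes (
    '//RingBufferTarget/event[@name=''wait_info'']') AS [xed] ([event_data]);
GO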
And then I ran some commands. Although ASYNC_IO_COMPLETION pops up in various locations, the long-running wait happens during a data backup. In fact, three ASYNC_IO_COMPLETION waits occurred during each full backup I ran:
<snip>
<data name="duration">
  <type name="uint64" package="package0" />
  <value>7</value>
</data>
<snip>
XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
CheckpointDB2+2cd [ @ 0+0x0
BackupDatabaseOperation::PerformDataCopySteps+c2 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0
<snip>
<data name="duration">
  <type name="uint64" package="package0" />
  <value>0</value>
</data>
<snip>
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
BackupOperation::GenerateExtentMaps+34c [ @ 0+0x0
BackupDatabaseOperation::PerformDataCopySteps+179 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0
<snip>
<data name="duration">
  <type name="uint64" package="package0" />
  <value>1958</value>
</data>
<snip>
XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
BackupOperation::BackupData+272 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0
One wait for each of the following:
- The initial checkpoint at the start of the backup
- Reading the GAM pages from the data files to figure out what to back up
- Reading the actual data from the data files (the writes to the backup files are tracked by BACKUPIO waits)
The third wait lasts for almost the full duration of the backup. As there are three of these waits per data backup, the per-wait average will be a bit skewed.
The aggregate waits for the single backup operation are as follows, showing the BACKUPIO and BACKUPBUFFER waits each adding up to almost the total backup run time of about 2 seconds:
WaitType                 Wait_S  Resource_S  Signal_S  WaitCount  Percentage  AvgWait_S  AvgRes_S  AvgSig_S
BACKUPBUFFER             2.69    2.68        0.01      505        37.23       0.0053     0.0053    0.0000
ASYNC_IO_COMPLETION      2.33    2.33        0.00      3          32.29       0.7773     0.7773    0.0000
BACKUPIO                 1.96    1.96        0.00      171        27.08       0.0114     0.0114    0.0000
PREEMPTIVE_OS_WRITEFILE  0.20    0.20        0.00      2          2.78        0.1005     0.1005    0.0000
BACKUPTHREAD             0.04    0.04        0.00      15         0.50        0.0024     0.0024    0.0000
PREEMPTIVE_OS_FILEOPS    0.00    0.00        0.00      5          0.06        0.0008     0.0008    0.0000
WRITELOG                 0.00    0.00        0.00      11         0.06        0.0004     0.0004    0.0000
So there you have it. XEvents and a little internals knowledge let us understand what could otherwise be seen as a worryingly high-duration wait type. Long-duration ASYNC_IO_COMPLETION waits commonly come from data backups (they don't occur for regular log backups).
Enjoy!
12 thoughts on “A cause of high-duration ASYNC_IO_COMPLETION waits”
Hi Paul.
That was a good explanation of the wait type.
However, I have a small question related to this.
I'm seeing a high level of ASYNC_IO_COMPLETION waits in one of my environments, which is resulting in backups of small databases (around 1.5 GB) taking more than 15 minutes.
I have a feeling it's related to low performance of the I/O subsystem, but I'm not 100% confident in that diagnosis, so I'm hesitant to ask for analysis of, or changes to, the subsystem.
Is there a way to be conclusive about a performance issue in the I/O subsystem, maybe by correlating I/O performance counters with others?
Thanks in advance.
Yup – that's what it is. Try doing a backup to nul, as that will give you the time taken to read from the data files. The difference from your regular backup time is the time being taken to write out the backup files – there's your ammo.
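For instance, a minimal sketch of a backup to the nul device (the database name is hypothetical; no backup file is written, so only the data-file reads are being timed):

-- Hypothetical example: time just the reads by discarding the backup output
BACKUP DATABASE [YourDatabase]
TO DISK = N'NUL'
WITH STATS = 10;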
Hi Paul,
Great post! How did you get the last result set, though? Which DMV did you use? Do you mind posting it? I'm talking about the aggregate waits for the single backup operation.
Thanks.
My regular wait stats query – see the start of the Waits Stats category.
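For anyone reading along, a minimal sketch of that kind of aggregation over sys.dm_os_wait_stats; this is not the exact published query, which also filters out a long list of benign wait types:

-- Optionally clear the stats first so the numbers cover just the backup
-- DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
WITH [Waits] AS
(
    SELECT
        [wait_type] AS [WaitType],
        [wait_time_ms] / 1000.0 AS [Wait_S],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [Resource_S],
        [signal_wait_time_ms] / 1000.0 AS [Signal_S],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER () AS [Percentage]
    FROM sys.dm_os_wait_stats
    WHERE [waiting_tasks_count] > 0
)
SELECT
    [WaitType], [Wait_S], [Resource_S], [Signal_S], [WaitCount], [Percentage],
    [Wait_S] / [WaitCount] AS [AvgWait_S],
    [Resource_S] / [WaitCount] AS [AvgRes_S],
    [Signal_S] / [WaitCount] AS [AvgSig_S]
FROM [Waits]
ORDER BY [Wait_S] DESC;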
Interesting article. About the backup to NUL: I'd suggest WITH COPY_ONLY to avoid disturbing your backup chain, otherwise your DR could be jeopardized. Adding my 2 cents.
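A minimal sketch of that variant (database name hypothetical; COPY_ONLY means the backup doesn't reset the differential base, so the existing backup chain is unaffected):

BACKUP DATABASE [YourDatabase]
TO DISK = N'NUL'
WITH COPY_ONLY;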
Please, do all asynchronous operations fall into this category, like read-ahead, writing to the error log, etc.?
No
Hey, have you ever seen this case: ASYNC_IO_COMPLETION waits of more than 50 seconds and increasing, with no entries in sys.dm_io_pending_io_requests?
Also, FlushCache messages are logged in the errorlog, but the disks are performing well. FlushCache says 600 MB takes more than 2 minutes, yet when I test the disks and look at metrics in perfmon, the maximum latency is just 1 ms.
Can ASYNC_IO_COMPLETION reflect other problems, like contention in kernel mode?
Are you performing a full or differential backup at that time? IIRC backup I/Os don’t show up in that DMV as backup uses its own channels to the data and backup files rather than going through the buffer pool and regular thread scheduling.
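For reference, a common way to look at that DMV, joined to the virtual file stats so a pending I/O can be tied to a database file (a sketch, not from the original post):

SELECT
    [vfs].[database_id],
    [vfs].[file_id],
    [pir].[io_type],
    [pir].[io_pending],           -- 1 = still pending in the OS
    [pir].[io_pending_ms_ticks]   -- how long it has been outstanding
FROM sys.dm_io_pending_io_requests AS [pir]
JOIN sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs]
    ON [pir].[io_handle] = [vfs].[file_handle]
ORDER BY [pir].[io_pending_ms_ticks] DESC;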
Hey Paul!
Thanks for your reply.
I'm not running backups at the time I see this wait. I see it being reported by the redo thread of an Always On secondary. Redo was slow, and analyzing waits showed this as the top wait (I collected waits for a specific redo session using this post: https://techcommunity.microsoft.com/t5/sql-server-support/troubleshooting-redo-queue-build-up-data-latency-issues-on/ba-p/318488).
But if backups can use other channels that aren't tracked by sys.dm_io_pending_io_requests, can I assume other components, like the redo thread, can do that too?
No – the redo thread uses the buffer pool and the regular log flushing mechanisms. The only way to know more is to capture call stacks while it's occurring.
I had this issue: a constantly recurring 40-60 second wait on ASYNC_IO_COMPLETION on DB STARTUP threads as part of redo on the secondary of my Always On availability group, which impacted the redo rate of the busiest database on the secondary. It turned out to be the target recovery time for the database. This is set by default when creating a database on later versions of SQL Server, but may not be set on databases restored/migrated from earlier versions. This article was the start of my journey; the extended event helped identify which databases were affected, and the setting followed from there. Thanks Paul.
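A sketch of how that can be checked and changed (the database name is hypothetical; 60 seconds is the default for databases created on SQL Server 2016 and later, while 0 means the legacy automatic-checkpoint behavior):

-- Check which databases still have the legacy setting
SELECT [name], [target_recovery_time_in_seconds]
FROM sys.databases;

-- Enable indirect checkpoints for a database
ALTER DATABASE [YourDatabase]
SET TARGET_RECOVERY_TIME = 60 SECONDS;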