A cause of high-duration ASYNC_IO_COMPLETION waits

[Edit 2016: Check out my new resource – a comprehensive library of all wait types and latch classes – see here.]

In some of the wait statistics data I’ve been analyzing, some servers have been showing very long ASYNC_IO_COMPLETION waits, which I had a hunch about but wanted proof.

The official definition of ASYNC_IO_COMPLETION is ‘Occurs when a task is waiting for I/Os to finish.’ Very helpful – NOT!

Using the code I blogged yesterday (How to determine what causes a particular wait type), I set up an XEvent session to track all the ASYNC_IO_COMPLETION waits and their durations, changing to a ring_buffer target so I get the wait durations too:

CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
(
    ACTION ([package0].[callstack])
    WHERE [wait_type] = 98 -- ASYNC_IO_COMPLETION only; check the map value for your version in sys.dm_xe_map_values
    AND [opcode] = 1 -- Just the end wait events
)
ADD TARGET [package0].[ring_buffer]
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS
)
GO
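To actually collect data, the session needs to be started and the ring_buffer target queried afterwards. Here's a minimal sketch of that, assuming the session above exists; the XML paths match the default wait_info payload:

```sql
-- Start the session, run the workload, then pull the ring_buffer contents.
ALTER EVENT SESSION [InvestigateWaits] ON SERVER STATE = START;
GO

-- ... run the backup or other workload here ...

-- Shred the ring_buffer XML into one row per wait_info event.
SELECT
    [xml_event].value ('(@timestamp)[1]', 'datetime2') AS [Time],
    [xml_event].value ('(data[@name="duration"]/value)[1]', 'bigint') AS [Duration_ms]
FROM
(
    SELECT CAST ([t].[target_data] AS XML) AS [ring_buffer]
    FROM sys.dm_xe_session_targets AS [t]
    JOIN sys.dm_xe_sessions AS [s]
        ON [s].[address] = [t].[event_session_address]
    WHERE [s].[name] = N'InvestigateWaits'
        AND [t].[target_name] = N'ring_buffer'
) AS [rb]
CROSS APPLY [ring_buffer].nodes ('//event') AS [n] ([xml_event]);
GO
```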

And then I ran some commands. Although ASYNC_IO_COMPLETION pops up in various locations, the long-running wait happens during a data backup. In fact, three ASYNC_IO_COMPLETION waits occurred during each full backup I ran:

.
<snip>
.
    <data name="duration">
      <type name="uint64" package="package0" />
      <value>7</value>
    </data>
.
<snip>
.
XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
CheckpointDB2+2cd [ @ 0+0x0
BackupDatabaseOperation::PerformDataCopySteps+c2 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0</value>
.
<snip>
.
    <data name="duration">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
.
<snip>
.
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
BackupOperation::GenerateExtentMaps+34c [ @ 0+0x0
BackupDatabaseOperation::PerformDataCopySteps+179 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0</value>
.
<snip>
.
    <data name="duration">
      <type name="uint64" package="package0" />
      <value>1958</value>
    </data>
.
<snip>
.
XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal<SuspendQueueSLock>::Wait+1f9 [ @ 0+0x0
AsynchronousDiskPool::WaitUntilDoneOrTimeout+fb [ @ 0+0x0
BackupOperation::BackupData+272 [ @ 0+0x0
BackupEntry::BackupDatabase+2ca [ @ 0+0x0
CStmtDumpDb::XretExecute+ef [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0</value>

One wait for each of the following:

  1. The initial checkpoint at the start of the backup
  2. Reading the GAM pages from the data files to figure out what to back up
  3. Reading the actual data from the data files (the writes to the backup files are tracked by BACKUPIO waits)

The third wait lasts for almost the full duration of the backup. As there are three of these waits per data backup, the per-wait average will be a bit skewed.

The aggregate waits for the single backup operation are as follows, showing the BACKUPIO and BACKUPBUFFER waits each adding up to almost the total backup run time of about 2 seconds:

WaitType                Wait_S   Resource_S   Signal_S   WaitCount   Percentage   AvgWait_S   AvgRes_S   AvgSig_S
BACKUPBUFFER            2.69     2.68         0.01       505         37.23        0.0053      0.0053     0.0000
ASYNC_IO_COMPLETION     2.33     2.33         0.00       3           32.29        0.7773      0.7773     0.0000
BACKUPIO                1.96     1.96         0.00       171         27.08        0.0114      0.0114     0.0000
PREEMPTIVE_OS_WRITEFILE 0.20     0.20         0.00       2           2.78         0.1005      0.1005     0.0000
BACKUPTHREAD            0.04     0.04         0.00       15          0.50         0.0024      0.0024     0.0000
PREEMPTIVE_OS_FILEOPS   0.00     0.00         0.00       5           0.06         0.0008      0.0008     0.0000
WRITELOG                0.00     0.00         0.00       11          0.06         0.0004      0.0004     0.0000
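The aggregate table above is the kind of output you get by snapshotting sys.dm_os_wait_stats before the operation and diffing it afterwards. A minimal sketch of that approach (the derived column names are assumptions chosen to mirror the table headers; the percentage and average columns follow trivially):

```sql
-- Snapshot the cumulative wait stats before the backup...
SELECT * INTO #before FROM sys.dm_os_wait_stats;
GO

-- ... run the backup ...

-- ... then diff the counters to isolate just this operation's waits.
SELECT
    [a].[wait_type] AS [WaitType],
    ([a].[wait_time_ms] - ISNULL ([b].[wait_time_ms], 0)) / 1000.0 AS [Wait_S],
    ([a].[signal_wait_time_ms] - ISNULL ([b].[signal_wait_time_ms], 0)) / 1000.0 AS [Signal_S],
    [a].[waiting_tasks_count] - ISNULL ([b].[waiting_tasks_count], 0) AS [WaitCount]
FROM sys.dm_os_wait_stats AS [a]
LEFT JOIN #before AS [b]
    ON [b].[wait_type] = [a].[wait_type]
WHERE [a].[waiting_tasks_count] - ISNULL ([b].[waiting_tasks_count], 0) > 0
ORDER BY [Wait_S] DESC;
GO
```

Note this diffs server-wide stats, so it's only clean on a quiet test instance.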

So there you have it. XEvents and a little internals knowledge let us understand what could otherwise be seen as a worryingly high-duration wait type. Long-duration ASYNC_IO_COMPLETION waits commonly come from data backups (they don’t occur for regular log backups).

Enjoy!

11 thoughts on “A cause of high-duration ASYNC_IO_COMPLETION waits”

  1. Hi Paul.

    That was a good explanation about the wait type.

    However, I have a small question related to this.

    I’m facing a high level of ASYNC_IO_COMPLETION waits in one of my environments, which is resulting in backups of small databases (around 1.5 GB) taking more than 15 minutes.

    I have a feeling that it is related to low performance of the I/O subsystem, but I’m not 100% confident in this diagnosis, so I can’t yet ask for analysis of, or changes to, that subsystem.

    Is there a way to be conclusive about the performance issue on the I/O subsystem, maybe by cross-referencing I/O performance counters with others?

    Thanks in advance.

  2. Yup – that’s what it is. Try doing a backup to nul, as that will give you time taken to read from the data files. The difference is the time being taken to write out the backup files – there’s your ammo.
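    A sketch of that test (the database name is a placeholder; COPY_ONLY stops the throwaway backup from resetting the differential base):

    ```sql
    -- Nothing is written anywhere, so elapsed time is dominated by
    -- the time taken to read the data files.
    BACKUP DATABASE [YourDatabase]
    TO DISK = N'NUL'
    WITH COPY_ONLY;
    GO
    ```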

  3. Hi Paul,

    Great post! How did you get the last result set, though – which DMV did you use for that? Do you mind posting it? I am talking about the aggregate waits for the single backup operation.

    Thanks.

  4. Interesting article. About backing up to NUL: I would suggest adding WITH COPY_ONLY to avoid disturbing your backup chain, otherwise your DR could be jeopardized. Adding my 2 cents.

  5. Hey, have you ever seen this case: ASYNC_IO_COMPLETION with more than 50 seconds and increasing, and no entries in sys.dm_io_pending_io_requests?

    Also, FlushCache messages are logged in the errorlog… But the disks are performing well – FlushCache says 600 MB takes more than 2 minutes, but running tests on the disks and looking at some metrics in perfmon, the maximum latency is just 1 ms…

    Can ASYNC_IO_COMPLETION reflect other problems, like contention in kernel mode?

    1. Are you performing a full or differential backup at that time? IIRC backup I/Os don’t show up in that DMV as backup uses its own channels to the data and backup files rather than going through the buffer pool and regular thread scheduling.

      1. Hey Paul!

        Thanks for your reply.

        I’m not running backups at the time I see this wait. I see it being reported by the redo thread of an Always On secondary… Redo was slow, and when I analyzed waits it showed this as the top wait (I collected waits for a specific redo session using this post: https://techcommunity.microsoft.com/t5/sql-server-support/troubleshooting-redo-queue-build-up-data-latency-issues-on/ba-p/318488)

        But if backups can use other channels that are not tracked by sys.dm_io_pending_io_requests, can I assume other components, like the redo thread, can also do that?

        1. No – the redo thread will use the buffer pool and regular log flushing mechanisms. Only way to know more is to capture call stacks for when it’s occurring.
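          A sketch of capturing those call stacks, reusing the session pattern from the post but scoped to one session (the session_id of 42 is a placeholder, and the map value 98 is an assumption – check both on your build):

          ```sql
          CREATE EVENT SESSION [InvestigateRedoWaits] ON SERVER
          ADD EVENT [sqlos].[wait_info]
          (
              ACTION ([package0].[callstack], [sqlserver].[session_id])
              WHERE [wait_type] = 98 -- ASYNC_IO_COMPLETION; check sys.dm_xe_map_values
              AND [opcode] = 1 -- Just the end wait events
              AND [sqlserver].[session_id] = 42 -- Placeholder: the redo thread's session
          )
          ADD TARGET [package0].[ring_buffer]
          WITH (MAX_MEMORY = 50 MB, MAX_DISPATCH_LATENCY = 5 SECONDS);
          GO
          ```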
