60 responses

  1. Ray Herring
    April 11, 2011

    WriteLatency = CASE WHEN io_stall_write_ms = 0
    THEN 0 ELSE (io_stall_write_ms / num_of_writes) END

    I think you want to test num_of_writes = 0 in this case.

  2. paul
    April 11, 2011

    Indeed – blame messing around with formatting. Thanks!

  3. Manjot
    April 11, 2011

    Hi,

    Thanks for the script.
    How can I find out "I/O subsystem configuration"? Do I need to ask the windows engineers who look after the server?

  4. paul
    April 11, 2011

    @Manjot Yes – you’ll need to ask whoever manages the I/O subsystem.

  5. alex
    April 11, 2011

    This will display the file name instead of outputting the full path:

    SUBSTRING(mf.physical_name, len(mf.physical_name)-CHARINDEX(‘\’,REVERSE(mf.physical_name))+2, 100)

    I find this script to be quite useful when capturing data in segments especially during a job execution and peak usage.

  6. Wes Brown
    April 11, 2011

    Hey Paul!
    I blogged about this earlier in the year too. I’ve got a method to collect the data over time.

    http://sqlserverio.com/2011/02/08/gather-virtual-file-statistics-using-t-sql-tsql2sday-15/

  7. amosquera
    April 12, 2011

    Hi,

    Could be operations like DBCC CEHCKDB, REBUILD, REORGANIZE REBUILD INDEX, UPDATE STATISTICS, … spoil this measures ?.

  8. paul
    April 12, 2011

    Well they won’t spoil the measurement, but they will add a lot of extra I/O workload and that needs to be taken into account when designing your I/O subsystem. An alternative is to offload the CHECKDB workload to another system after restoring a full backup there.

  9. Aukcje Alegro
    April 16, 2011

    I am quite interesting in this topic hope you will write more on it in future posts

  10. scorellis
    March 21, 2013

    hi Paul,

    i am trying to estimate the total bytes/sec of throughput that i am getting from my sql server and am wondering if it is accurate to say that my total bytes/sec of throughput would be = num_of_bytes_written/io_stall_write_ms?

    thank you
    scorellis

    • Paul Randal
      March 23, 2013

      No – dividing by sample_ms is the correct way, but that only gives you an average, not your peak throughput. You need to use a benchmarking tool to get that.

      • Jean-Philippe
        March 27, 2013

        Hi
        I often use the sys.dm_io_virtual_file_stats DMV to calculate Avg Service Time for a file and also for a drive (by aggregation).
        Recently, i have compared DMV latency results (ReadLatency and WriteLatency for each drive) and perfmon counters (Avg. Disk sec/Read and Avg. Disk sec/Write)
        It showed [Avg Time ms/Read] was more important in DMV results than perfmon counters.
        what is the reason for that ?

        Thanks

      • Paul Randal
        March 27, 2013

        Not sure what you mean by ‘more important’? If you mean that the avg time was higher in the DMV than in perfmon, could be that there are other uses of that drive doing I/Os that will be taken into account by perfmon, but not by the DMV, which is only measuring I/O latency for those files managed by SQL on that drive. Remember also that the DMV is aggregated data since the database came online whereas perfmon is a rolling average – that could also make perfmon lower if there have been some spikes of high I/O latency in the past that the DMV is still taking into account.

      • Jean-Philippe
        March 28, 2013

        Hello and thank you for your reply
        In fact, I have a process that collects data from the dmv dm_io_virtual_file_stats every minute and that makes the difference between each collection.
        I therefore compare the results of the dmv (select Drive, sum (io_stall_read_ms) / sum (num_of_reads) from group by dm_io_virtual_file_stats_diff Drive) with perfmon counters (Avg. Disk sec / Read)
        From a general way, we have similar time but when high latency disk, the time seen through my collection “DMV” are more important than the perfmon data (20-40%).
        I am sure there is no other activity on the monitored drive (just SQLServer activity)
        Is it possible to explain this phenomenon?
        Is that these 2 two types of counters (perfmon and DMV) is based on the same metric?
        I hope they are not stupid questions

        Thank you for your help
        Jean-Philippe

        NB : sorry for my english

      • Paul Randal
        March 28, 2013

        They’re not actually the same metric. The DMV is measuring everything from SQL Server to the disk and back. The PerfMon counters are just measuring the time Windows sees for the I/O. There could things like filter drivers etc in the way. Any significant difference between them can be explained by something taking time between SQL Server and Windows issuing the I/O.

  11. Karthik
    April 2, 2013

    Dear Paul,
    I am seeing lot of negative values in Latency value of the given query. Is that the indication of an IO issue? Also I am seeing lot of events like this in eventlog and sql error log. SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file. Please advice.

    • Paul Randal
      April 2, 2013

      I don’t see how you can get negative values for latencies, unless the DMV is returning negative values for total latencies, which would be a bug. If you’re seeing I/O stall warnings in the error log, that’s usually a sign of problems with your I/O subsystem.

    • Peter
      May 28, 2013

      Hello,

      Since the dmv gives cumulative numbers, I persist the results each hour and compare sequential results. This way we found out our latencies are “acceptable” during production hours, while they are terrible during night due to index rebuilds, integrity checks, database backups, system backups, etl processes etc.

      While creating the view (eg ([io_stall_read_ms] – previous.[io_stall_read_ms]) / ([num_of_reads] – previous.[number_of_reads]) I also saw negative values, caused by the fact the services had been restarted (counters are reset) between 2 measurements. After testing on service restart using (SELECT crdate FROM sys.sysdatabases WHERE lower(NAME) = ‘tempdb’) I still saw some negative numbers. It seems to me this is caused by database restore, so now I just test whether bytesread and byteswritten are higher in the previous measurement. In that case I assume the values represent the first measurement, I can compare with.

      Kind Regards
      Peter.

      • Allen McGuire
        March 20, 2014

        I just started doing the same Peter. Doing any sort of computation on the DMV results is very, very misleading. Where the DMV was telling me I had IO wait times for writes of >200ms for each of my TempDB data files, my new hourly/differential report indicates the actual to be <5ms throughout the day with very few exceptions thus far. Since I just set this up three days ago, I need to gather more data but I see nothing that would explain why the DMV is returning such a high value. I don't do any crazy maintenance on the weekends – time will tell I suppose.

  12. Eric M
    July 1, 2013

    I’m seeing high avg. disk sec /read , and sec /write, but no high page waits in perfmon. Trying to figure out how to compare those values, vs dm_io_pending_io_pending requests, and sys.dm_io_virtual_file_stats. My issue is I have four data files two split across two LUNS each total 660 Gigs. Plan is to move off to four LUNS put one data file per drive on an upgraded o/s, with more memory, cpu etc. My question is there a rule of thumb as to when to add more data files to your databases. And during a rebuild operation does it split the load across the number of data files defined in your database?

    • Paul Randal
      July 3, 2013

      What do you mean by ‘page waits in perfmon’? Are you looking at wait statistics? You should be seeing PAGEIOLATCH_XX waits. Too complex a topic for a comment. Read through my benchmarking and IO subsystem series. Yes, during a rebuild the allocations will come from all files in the filegroup where the index is stored.

  13. Brian ODwyer
    December 19, 2013

    The sample_ms column has a bug http://connect.microsoft.com/SQLServer/feedback/details/795556 which does return negagive numbers.

    There are two fixes depends on version of SQL server, 2005 uses one method 2008 and higher another.

  14. Binny Mathew
    January 27, 2014

    Thank you for the query in the blog. I ran the above query and found that the write latency is quite high(532) for
    tempdb and for another user database which stores images in this case 19. We have a clustered environment running
    in sql2008R2. Unfortunately all the mdf ldf and tempdb had been installed in one lun(Drive:E). How can we improve the
    I/o performance although the users have not complained so far.

    • Paul Randal
      January 27, 2014

      Move the files to different parts of the I/O subsystem, for instance.

  15. dan
    March 6, 2014

    Dear Paul,
    I ran your query on my laptop, got:
    ReadLatency 29
    wrtieLatency 422
    Latency 57
    AvgBPerRead 63666
    AvgBPerWrite 8192
    AvgBperTranfer 59745

    I think wrtieLatency wrtieLatency 422 is high on AdventureWorks .mdf file. Now what should I do to fix it? Can you please help me to understand?

    • Paul Randal
      March 9, 2014

      Yup – that’s very high. You’re going to have to follow the steps in the blog post to reduce the amount of writes or address a shortfall in the I/O subsystem.

  16. Nick
    April 23, 2014

    Paul, thanks for your ever useful information. I was wondering if you could suggest where to look in my situation. We have our 64bit Enterprise, 96GBRam DWH server, connected to an array of SAN disks. the disk IO hasn’t been great so I ran your query – the top 13 files:

    ReadLatency WriteLatency Latency AvgBPerRead AvgBPerWrite AvgBPerTransfer Drive
    29 4857 2598 65169 68134 66747 G:
    29 4856 2598 65167 68138 66748 G:
    28 4833 2585 65175 68132 66749 G:
    28 4820 2576 65156 68127 66736 G:
    726 535 694 427252 129080 376745 G:
    548 4 525 316994 34130 305151 G:
    403 5 401 199901 28582 199270 E:
    74 680 376 63511 65044 64275 E:
    375 16 375 265342 8192 265248 E:
    356 10 350 285793 8199 281173 E:
    331 11 330 217433 8192 216726 E:
    272 7 261 180452 18909 173827 E:
    289 28 260 197289 11691 176490 E:

    Which look pretty awful.

    However, I then used SQLIO to test performance straight on the disks (G: and E:). The performance was excellent for example:

    IOs/sec: 106183.79
    MBs/sec: 829.56
    latency metrics:
    Min_Latency(ms): 0
    Avg_Latency(ms): 0
    Max_Latency(ms): 785
    histogram:
    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
    %: 94 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    There’s a mismatch between the actual performance of the SAN (which is excellent) and SQL Server’s experience of it and I cannot figure out what steps to take next. What would impact SQL Server’s use of the disks?

    Thanks for your help !!!!

    N

    • Paul Randal
      April 24, 2014

      Is your SQLIO test properly simulating your production workload? That’s all I can think of. You might also try taking two snapshots of the DMV data and then diff’ing them to see what I/O characteristics are over a period of time, rather than for all time. Could be that performance sucked earlier and now it’s fine.

      • Nick
        April 28, 2014

        Thanks, Paul. I guess it isn’t. Maybe SQLIOSim is a better tool to try.
        Cheers,
        Nick

  17. Juan Cabrera
    July 11, 2014

    Paul, for calculating read(or write) latency you have the following:
    [ReadLatency] =
    CASE WHEN [num_of_reads] = 0
    THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END

    Why are we dividing by num_of_reads? Shouldn’t we dividing by sample_ms instead? Don’t we want to see the average number of read stalls in the span of time? I am sure there is an explanation, and it is possible that sleep depravity has caused brain latency in my head :)

    thanks,

    Juan Cabrera.

    • Paul Randal
      July 11, 2014

      No – we’re looking for ms per read and ms per write – i.e. average read/write latency.

      • Juan Cabrera
        July 14, 2014

        Yup, sleep depravation alright. After a restful weekend it makes perfect sense.

        thanks,

        Juan.

  18. Pawel
    September 23, 2014

    Has anyone taken this script and turned it into a way to collect and aggregate the stats? Would be interested in how this data trends over time. As opposed to seeing just the total.

    • Paul Randal
      September 24, 2014

      I don’t know – should be easy to do.

  19. eric81
    October 8, 2014

    Paul, does the DMV io stats match to what we’re seeing in windows perfmon? for example I noticed 123 ms read latency but when I go to the DMV I don’t see anything that high. Just asking

    • Paul Randal
      October 8, 2014

      You have to remember that the DMV is the aggregate over the time the database has been online, so the average may mask any transient latency spikes.

  20. Robert
    October 27, 2014

    Just to confirm, these are cumulative stats.

    So to get a true estimate of latency, if I sample every hour, subtract sample 1 (baseline) from sample 2 (an hour later). At least this will get me the latency at that point in time, correct?

    • Paul Randal
      October 28, 2014

      Correct

  21. chung
    October 30, 2014

    Does any know how to get Avg. Disk sec / Read from DMV instead of Perform Monitor? I don’t see this information in any DMVs.

    • Paul Randal
      October 30, 2014

      That’s what the script in the blog post gives you – avg. read latency.

  22. chung
    October 30, 2014

    Ah, I see. The name is confusing. Thanks Paul!!!!

  23. chung
    October 30, 2014

    Paul,

    One more question. We are using AZURE DB and I find out these DMVs won’t work anymore. Do you have other script for AZURE? Thanks in advance.

    Chung.

    • Paul Randal
      October 30, 2014

      Sorry – I don’t do Azure at all so I don’t know how to do that in Azure.

  24. chung
    October 31, 2014

    Paul,

    You mentioned that ” if you design an I/O subsystem to support 1000 IOPS (I/O operations per second) and SQL Server is trying to push 2000 IOPS, performance is going to suffer.” Is there a DMVs I can use to find the IOPS of my current DB is pushing? How do I find out if I push more IOPS than the system can support?

    Thanks in advance.

    Chung.

    • Paul Randal
      October 31, 2014

      You can use the IO count fields in the DMV from this article to see how many IOs per second, but to get a really detailed view, you may need to move to using Extended Events and aggregating the IO data from there – but that’s likely overkill.

  25. Awadh
    February 12, 2015

    Hi there,

    when I run the script, I got two entries as follows:
    1-
    17 ,0 ,17, 515665, 8192, 515665 ,C: local C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Research2.mdf

    2-
    8 ,0, 8, 5347, 3379, 5335, C: local C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Research2_log.ldf
    the query is read intensive one as you can see from entry 1 but why there is ave write latency when there is no write latency?

    for entry 2 as you can see, latency time is shorter than entry 1 should I include its data in my analysis?

    the query runs for 9 days before network failure occurs.

    thank for your help

    • Paul Randal
      February 19, 2015

      I don’t see any avgwritelatency in the data you posted above…

  26. Harsh Jain
    February 13, 2015

    Hi Paul,

    Is there any DMV/script by which we can know, which query is consuming high I/O of storage?

    • Paul Randal
      February 15, 2015

      Look in one of the query stats DMVs for the I/O counts.

  27. dan
    March 10, 2015

    Hi Paul,
    I have first row with:
    Readlatency 623
    Writelatency 266
    Latency 569
    AvgBPerRead 51856
    AvgBperwrite 40577
    AvgBperTransfer 50166
    What do you see what’s happening? What are the suggestions?

    • dan
      March 10, 2015

      ++ Physical name
      G:\DATA\mydb.mdf

    • dan
      March 10, 2015

      sorry, forgot to add below:
      Physical name
      G:\DATA\mydb.mdf

Leave a Reply

 

 

 

Back to top
mobile desktop