Today Joe Webb (Blog|Twitter) blogged about The Power of Regex in Powershell, and in his post he shows how to parse the SQL Server Error Log for events of interest.  At the end of his blog post Joe asked about other places where Regular Expressions have been useful in PowerShell so I thought I’d blog my script for parsing SQLIO output using Regex in PowerShell, to populate an Excel worksheet and build charts based on the results automatically.

If you’ve never used SQLIO, Brent Ozar (Blog|Twitter) has a article on SQLServerPedia titled SAN Performance Tuning with SQLIO that includes a lot of information as well as a video tutorial on its use.  I have been using SQLIO quite a bit this year, mainly to gather performance information for systems before and after reconfiguring their storage to show percent impact information to the business for making changes in storage like migrating from RAID 5 to 10, and aligning disk partitions used for database and log file storage.  I have also used it in benchmarking new SAN configurations and the SAN configuration of our Virtual Machine environment to show team members why I have recommended that a number of our physical servers be migrated onto Virtual Machines.

I generally use a param.txt file similar to:

G:\MSSQL\testfile1.dat 4 0×0 32768

which creates a 32GB test file and then have a DOS batch file that runs the actual SQLIO tests using a script like:

sqlio -kW -t16 -s10 -o8 -fsequential -b8 -BH -LS -Fparam.txt

sqlio -kW -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat > post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kR -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kW -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kR -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

Which runs the gamit of tests using 8, 32, 64, 128, and 256K block sizes, with 16 threads, and 8 pending IO.  You can certainly expand the test set much further than this and do a very exhaustive testing regimen, I am by no means saying that my tests above are the best fit, but they have sufficiently met my needs for the tuning I have been doing.  Once the tests have been run the post.txt file will contain an output entry similar the following for every test that was run.

sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
16 threads writing for 60 secs to file d:\MSSQL\testfile1.dat
    using 8KB random IOs
    enabling multiple I/Os per thread with 8 outstanding
    buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: d:\MSSQL\testfile1.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 18527.91
MBs/sec:   144.74
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 6
Max_Latency(ms): 4900
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 78  6  8  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2

Linchi Shea (Blog) wrote a blog post three years ago that showed how to Parse the sqlio.exe Output using Perl.  I found this in my search for a smarter way of parsing these long output files I was generating other than manually, but I don’t have Perl installed anywhere and I really felt that I should be able to do this natively using the tools readily available to me on my Windows 7 laptop, so I looked to PowerShell for a solution. 

Reading the file data in is accomplished using Get-Content which returns a collection of strings.  At first I tried casting this to a [string] object directly, but soon found out that when I did this the resulting string was missing the carriage returns and line feeds that existed in the original file.  This posed a couple of problems for me in parsing the outputs initially, but doing a [string]::Join operation using [Environment]::NewLine as the concatenater allowed an appropriately CRLF delimited string to be returned. 

$filedata = [string]::Join([Environment]::NewLine,(Get-Content $FileName))

Now that I had my file data loaded into a the $filedate object, the next step was to split this string based on the “sqlio v1.5.SG” header that is output at the beginning of each test run.  Since PowerShell is .NET based, this is easily accomplished by using the Split() method of the System.String object which is System.Type for the $filedata object.

$Results = $filedata.Split( [String[]]"sqlio v1.5.SG", [StringSplitOptions]::RemoveEmptyEntries )

At first I started to write a foreach loop with this object, lets face it old habits die hard, but instead I made use of piping which allows a collection like the one returned by the Split() method to be worked on in a single operation.  Piping in PowerShell is similar to a manufacturing process where an object gets transformed and then passed on until the ultimate result is achieved.  Once an object is piped, it becomes accessible through the use of the $_ variable name, allowing the object to be transformed along the pipeline.  By piping the result of the Split() I was able to write a SELECT statement that utilized Regex to get the information of interest out of each test run.

     select @{Name="Threads"; Expression={[int]([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[1].Value)}},`
                @{Name="Operation"; Expression={switch ([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[2].Value)
                                                {
                                                    "reading" {"Read"}
                                                    "writing" {"Write"}
                                                }    }},`
                @{Name="Duration"; Expression={[int]([regex]::Match($_, "for\s(\d+)?\ssecs").Groups[1].Value)}},`
                @{Name="IOSize"; Expression={[int]([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[1].Value)}},`
                @{Name="IOType"; Expression={switch ([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[2].Value)
                                                {
                                                    "random" {"Random"}
                                                    "sequential" {"Sequential"}
                                                }  }},`
                @{Name="PendingIO"; Expression={[int]([regex]::Match($_, "with\s(\d+)?\soutstanding").Groups[1].Value)}},`
                @{Name="FileSize"; Expression={[int]([regex]::Match($_, "\s(\d+)?\sMB\sfor\sfile").Groups[1].Value)}},`
                @{Name="IOPS"; Expression={[decimal]([regex]::Match($_, "IOs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
                @{Name="MBs_Sec"; Expression={[decimal]([regex]::Match($_, "MBs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
                @{Name="MinLat_ms"; Expression={[int]([regex]::Match($_, "Min.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
                @{Name="AvgLat_ms"; Expression={[int]([regex]::Match($_, "Avg.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
                @{Name="MaxLat_ms"; Expression={[int]([regex]::Match($_, "Max.{0,}?\:\s(\d+)?").Groups[1].Value)}}`

The select returns a new type of object that has Properties named Threads, Operation, Duration, IOSize, IOType, PendingIO, FileSize, IOPS, MBs_sec, MinLat_ms, AvgLat_ms, and MaxLat_ms.  This alone is sufficient to proceed to creating an Excel file for the information, but I wanted the information sorted in the same manner every time I ran this, primarily because I change how I run my tests sometimes, and I like consistency in the output.  Since I am so anal retentive thorough, I fixed this problem by piping the output from the select to the Sort-Object commandlet and sorted by the IOSize, IOType, Operation, and Threads.

Getting the data into Excel was actually fairly simple to do, thanks to Buck Woody(Blog|Twitter) and SQL Rockstar Tom LaRock(Blog|Twitter).  Buck wrote a blog post titled Create an Excel Graph of your Big Tables – With Powershell, and Tom wrote a Simple Talk Article, Monitoring SQL Server Virtual Log File Fragmentation, that use the Interop Excel objects to create an Excel workbook from PowerShell, populate the workbook with data, and build charts using the data.  My code for the Excel integration is based entirely on their examples.

$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$WorkBook = $Excel.WorkBooks.Add()
$WorkBook.WorkSheets.Item(1).Name = "RawData"
$WorkBook.WorkSheets.Item(3).Delete()
$WorkBook.WorkSheets.Item(2).Delete()
$WorkSheet = $WorkBook.WorkSheets.Item("RawData")
$x = 2
$WorkSheet.Cells.Item(1,1) = "Threads"
$WorkSheet.Cells.Item(1,2) = "Operation"
$WorkSheet.Cells.Item(1,3) = "Duration"
$WorkSheet.Cells.Item(1,4) = "IOSize"
$WorkSheet.Cells.Item(1,5) = "IOType"
$WorkSheet.Cells.Item(1,6) = "PendingIO"
$WorkSheet.Cells.Item(1,7) = "FileSize"
$WorkSheet.Cells.Item(1,8) = "IOPS"
$WorkSheet.Cells.Item(1,9) = "MBs/Sec"
$WorkSheet.Cells.Item(1,10) = "Min_Lat(ms)"
$WorkSheet.Cells.Item(1,11) = "Avg_Lat(ms)"
$WorkSheet.Cells.Item(1,12) = "Max_Lat(ms)"
$WorkSheet.Cells.Item(1,13) = "Caption"

$Results | % {
    $WorkSheet.Cells.Item($x,1) = $_.Threads
    $WorkSheet.Cells.Item($x,2) = $_.Operation
    $WorkSheet.Cells.Item($x,3) = $_.Duration
    $WorkSheet.Cells.Item($x,4) = $_.IOSize
    $WorkSheet.Cells.Item($x,5) = $_.IOType
    $WorkSheet.Cells.Item($x,6) = $_.PendingIO
    $WorkSheet.Cells.Item($x,7) = $_.FileSize
    $WorkSheet.Cells.Item($x,8) = $_.IOPS
    $WorkSheet.Cells.Item($x,9) = $_.MBs_Sec
    $WorkSheet.Cells.Item($x,10) = $_.MinLat_ms
    $WorkSheet.Cells.Item($x,11) = $_.AvgLat_ms
    $WorkSheet.Cells.Item($x,12) = $_.MaxLat_ms
    $WorkSheet.Cells.Item($x,13) = [string]$_.IOSize + "KB " + [string]$_.IOType + " " + `
                                [string]$_.Operation + " " + [string]$_.Threads + `
                                " Threads " + [string]$_.PendingIO + " pending"
    $x++}

$WorkBook.Charts.Add() | Out-Null
$Chart = $WorkBook.ActiveChart
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "IOPS"

$WorkBook.Charts.Add() | Out-Null
$WorkBook.ActiveChart.SetSourceData($WorkSheet.Range("I1:I$x"))
$Chart = $WorkBook.ActiveChart
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "MBs Sec"

I don’t know why, but the first Chart doesn’t refresh properly unless I call SetSourceData() and set the SeriesCollection xValues properties twice.  Since the code block that loads the data into the worksheet tracks the rows with the $x variable, the Range for the chart is always row 1 to $x.  The H column happens to be the column that holds the IOPS values and the I column holds the MBs Sec values in the workbook.  The only thing missing at this point a parameter declaration at the head of the script file to require an input parameter $FileName which is passed to Get-Content to read the information.

param(    [Parameter(Mandatory=$TRUE)]
        [ValidateNotNullOrEmpty()]
        [string]
        $FileName)

The complete script is attached to this blog post.  If you use it let me know your thoughts.

SQLIOResults.ps1 (4.98 kb)