Collection of Baseline Scripts

The topic of baselines in SQL Server is one that I’ve had an interest in for a long time.  In fact, the very first session I ever gave back in 2011 was on baselines.  I still believe they are incredibly important, and most of the data I capture is still the same, but I have tweaked a couple of things over the years.  I’m in the process of creating a set of baseline scripts that folks can use to automate the capture of this information, in the event that they do not have, or cannot afford, a third-party monitoring tool (note: a monitoring tool such as SQL Sentry’s Performance Advisor can make life WAY easier, but I know that not everyone can justify the cost to management).  For now, I’m starting with links to all relevant posts, and then I’ll update this post once I have everything finalized.

These scripts are just a starting point for what to monitor.  One thing I like to point out in our IEPTO2: Performance Tuning and Optimization course is that there is A LOT of data you can capture related to your SQL Server environment.  Your options include Performance Monitor (using Custom Data Collectors), events captured via Extended Events or Trace (depending on version), and any data from the DMVs or system views within SQL Server. You have to decide what to capture based on

1) What problem you might be trying to solve in your environment, and

2) What information is most important for you to have.  Start simple, and then work your way up.

Figure out the one or two most critical things to capture, and start there, and then add on.

If you find there’s something missing from my scripts, let me know and I’ll try to get it added!

Monitoring in general

Configuration Information

Disk and I/O





Capture Blocking Information with Extended Events and the Blocked Process Report

I am a big fan of Adam Machanic’s WhoIsActive script, and when customers have issues with performance, it’s one of the first tools I recommend because it’s so simple to use and provides great information.  Very often it helps with quickly determining an issue, but sometimes there’s a need to capture more information, particularly when locking and blocking are part of the issue.  Adam’s script has an option to include blocking information, for example including the [blocking_session_id] column in the output and using @find_block_leaders = 1 as a parameter.  But sometimes you need more information, like the blocked process report.  If you’re running SQL Server 2005 and higher, you can use Event Notifications to capture the blocked process report.  This option is nice because you are notified when the problem occurs.  For those of you using SQL Server 2008R2 and below, you also have the option of capturing the blocked process report event through a server-side Trace.  But if you’re on SQL Server 2012 and higher, one of the easiest ways to get the report is with Extended Events.  Note: the blocked_process_report event does not exist in SQL Server 2008 or SQL Server 2008R2, which is why Trace is the method there.  The drawback to Extended Events is that you don’t get a notification that blocking occurred, but for those who are not as comfortable with Event Notifications – for whatever reason – Extended Events is a very simple alternative.
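As a quick sketch of that WhoIsActive option (this assumes sp_WhoIsActive is already installed in master; the sort order shown is one common choice, not the only one):

```sql
-- Show active requests with blocking rolled up to the lead blockers;
-- @find_block_leaders = 1 adds a [blocked_session_count] column, and the
-- [blocking_session_id] column shows who each session is waiting on.
EXEC sp_WhoIsActive
    @find_block_leaders = 1,
    @sort_order = '[blocked_session_count] DESC';
```

Sorting by [blocked_session_count] descending puts the sessions at the head of each blocking chain at the top of the output.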

The Setup

In order to capture a blocked process report, you must have the blocked process threshold system configuration option enabled.  A good starting value is 15, which is the threshold in seconds at which the report is generated.  To set this value, run the following code:

EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE sp_configure 'blocked process threshold', 15;
RECONFIGURE;
EXECUTE sp_configure 'show advanced options', 0;
RECONFIGURE;
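To confirm the setting took effect, you can check sys.configurations (a quick sanity check I’m adding here; it wasn’t part of the original setup steps):

```sql
-- value_in_use reflects the running value after RECONFIGURE
SELECT [name], [value], [value_in_use]
FROM sys.configurations
WHERE [name] = 'blocked process threshold (s)';
```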

The following code will create the event session and then start it.  Note that you can create the event session and just have it defined in your system without running it.  Then, if you start to have blocking you can set the blocked process threshold and start the event session.

-- check to see if the event session exists
IF EXISTS ( SELECT 1
            FROM sys.server_event_sessions
            WHERE name = 'Capture_BlockedProcessReport' )
    DROP EVENT SESSION [Capture_BlockedProcessReport] ON SERVER;
GO

-- create the event session
-- edit the filename entry if C:\temp is not appropriate
CREATE EVENT SESSION [Capture_BlockedProcessReport]
ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file(
    SET filename = N'C:\Temp\Capture_BlockedProcessReport.xel'
);
GO

-- start the event session
ALTER EVENT SESSION [Capture_BlockedProcessReport]
ON SERVER
STATE = START;

Capturing Data

Once the event session is started, then you just wait until the blocking occurs.  The following code can be used to generate an example in your test/dev environment:

-- create a table and insert
-- one row without committing
USE [tempdb];
GO

CREATE TABLE [BlockingTest] (
    [INFO] VARCHAR(10)
);
GO

BEGIN TRANSACTION;
INSERT INTO [BlockingTest] ([INFO]) VALUES ('SQLskills');

-- run the following statement in a different window
USE [tempdb];
SELECT *
FROM [BlockingTest];


After about 15 seconds, run the following code back in the original window:

-- clean up (run in original window)
USE [tempdb];
COMMIT TRANSACTION;
DROP TABLE [BlockingTest];

You can then stop the event session, and either leave it there until you need it again, or drop it entirely:

-- stop the event session
ALTER EVENT SESSION [Capture_BlockedProcessReport]
ON SERVER
STATE = STOP;

-- drop the event session
DROP EVENT SESSION [Capture_BlockedProcessReport]
ON SERVER;

Viewing the Report

To view the output from Extended Events you can open the .xel file in Management Studio, or query the data using the sys.fn_xe_file_target_read_file function. I typically prefer the UI, but there’s currently no great way to copy the blocking report text and view it in the format you’re used to.  But if you use the function to read and parse the XML from the file, you can extract the blocked process report for each event:

SELECT
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    event_data.query('(event/data[@name="blocked_process"]/value/blocked-process-report)[1]') AS [blocked_process_report]
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\Temp\Capture_BlockedProcessReport*.xel', NULL, NULL, NULL)
) AS sub;
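If you want a quick summary rather than the full XML, you can shred a couple of attributes out of each report.  The output column names below are my own, but the attribute paths follow the blocked-process-report schema:

```sql
-- Pull the blocked and blocking session IDs out of each report
SELECT
    event_data.value('(event/@timestamp)[1]', 'datetime2') AS [event_time],
    event_data.value('(event/data[@name="blocked_process"]/value/blocked-process-report/blocked-process/process/@spid)[1]', 'int') AS [blocked_spid],
    event_data.value('(event/data[@name="blocked_process"]/value/blocked-process-report/blocking-process/process/@spid)[1]', 'int') AS [blocking_spid]
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\Temp\Capture_BlockedProcessReport*.xel', NULL, NULL, NULL)
) AS sub;
```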

Depending on how long you let the blocking continue, you may have captured more than one event and therefore have multiple reports in the output:

Retrieving the blocked process report



You can click on the output for any row to see the blocked process in XML format, and then work through the blocking:

The blocked process report


[Huge thanks to Jonathan for help with the XML.  I don’t think XML and I will ever be friends.  Geez.]


If you’re in need of the blocked process report and running SQL Server 2012 or higher, you now have another option for getting that information.  If you’re still new to Extended Events, check out the first two stairways in my XE series on SQLServerCentral.

Happy New Year!

Taking Risks

risk \’risk\ noun : the possibility that something bad or unpleasant (such as an injury or a loss) will happen


There are risks in life every day.  Some we see very clearly.  Others we don’t even notice.  Some are related to relationships with family and friends.  Some are related to our careers.  And some involve the hundreds of other components in our daily lives.

When I first started attending user group meetings in Cleveland, every month Allen White would say, “If you are interested in speaking, please consider submitting.  Everyone has something to share, and everyone else has something they can learn from you.”  I admit, at first I kind of thought it was just rhetoric.  I was wrong.  If you know Allen, you know that he really means it when he says it.  And I know he’s right.  I love asking people what they do in their job every day, because rarely do people do the same thing (especially in the SQL Server world) and I always learn something new.  Everyone in the SQL Server community is extremely well-versed in some SQL Server topic – enough so that they could put together a presentation and talk about it for an hour.  But many don’t, for a variety of reasons.  Some people just have no desire to speak in front of a group, and that’s fine.  You can share knowledge in other ways (hello blog posts).

But for those of you that have considered speaking, or are just a little bit interested, I give you:

Evelyn Maxwell

I tweeted about her SQLSaturday Cleveland submission yesterday (it’s on Improving Your PowerPoint Skills, in case you didn’t click through), but a lot of people aren’t on Twitter so I wanted to mention it here, particularly because many people commented that if a 7th grader has the chutzpah (my word, not anyone else’s) to submit to a SQLSaturday, then others can too.  Yes.  Exactly yes.

Now, Evelyn’s not all alone; her dad is David Maxwell (who just won Speaker Idol at the PASS Summit) and I’m sure she’s getting some guidance from him.  Anyone who is speaking at a SQLSaturday for the first time is hopefully getting some mentoring – it’s a daunting task to take on all alone!  But if you want to try it, then do it.  Submit to your local SQLSaturday.  Find a mentor.  Take that risk.  I know there’s a fear of failure there.  Your session may not get accepted.  Evelyn’s may not, and she knows that.  But she tried.

Fly...  photo credit: Jonathan Kehayias




Use of the C: drive by the Profiler UI and XE Live Data Viewer

I had an email from a fellow MVP this week who is in the process of learning Extended Events (hooray!). One question this person had was whether Extended Events had the same issue as Profiler, where the C: drive can be heavily used and potentially run out of space.

To clarify, with regard to the Profiler UI: if you are using the UI to capture events (rather than a server-side trace that writes to a file, which is the preferred method), the Profiler UI caches events in a local file as it runs against a SQL Server instance. It also performs caching when reading an existing file. These cached events are stored on the C: drive by default, unless you have changed the User TMP location in Environment Variables (Control Panel | System | Edit the system environment variables | Advanced | Environment Variables…):

Accessing the User TMP variable


Depending on what events you have configured for Profiler, your filter(s), the workload, and how long you run Profiler, you could generate more events than the UI can handle. Therefore, they’ll start buffering to the User TMP location. If you’re not paying attention, you can fill up the C: drive. This can cause applications (including SQL Server) to generate errors or stop working entirely. Not good.


Now, back to the original question. Does the same problem exist for Extended Events? Only if you’re using the Live Data Viewer.  After you have an event session created (you can just use system_health for this example), within Management Studio, go to Management | Extended Events | Sessions, select the session and right-click and select Watch Live Data:

Using the Live Data Viewer in XE


As events are captured, they will show up in the data view. As with the Profiler UI, the number of events that appear will depend on the session configuration and the workload. The Live Data Viewer will only show a maximum of one million (1,000,000) events. Once that number has been exceeded, it will start to cache events to the User TMP location, just like the Profiler UI. And just like the Profiler UI, that can fill up the C: drive if that is still the User TMP location. Note that the Live Data Viewer will automatically disconnect and stop displaying events if the engine determines it’s negatively affecting performance. If the event session’s internal memory buffers fill up and the events cannot be dispatched to the event stream for consumption, the engine will disconnect the UI from the event stream.
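If you’re wondering whether a running event session is dropping events, or how much memory its buffers are using, you can check the session DMV (a quick check I find handy; this isn’t required for the viewer):

```sql
-- Check buffer size and dropped events for running event sessions
SELECT [name],
       [total_buffer_size],
       [dropped_event_count],
       [dropped_buffer_count]
FROM sys.dm_xe_sessions;
```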

[More on the viewer if you’re interested: Introducing the Extended Events Reader]

There are two action items for you:

  1. Don’t use the Profiler UI unless it’s a NON-production environment.
    1. If you refuse to give up that woobie (name the movie reference), at least change the User TMP location to something other than C:
  2. If you use the Live Data Viewer in Extended Events for event sessions that generate a lot of events, change the User TMP location to something other than C:

Shout out to Jonathan for a review of this one.




Be the change

My day had a wonderful start…hot yoga at 6AM with one of my favorite instructors. Homemade cinnamon rolls for breakfast.  Great progress made with some client work. Then I read the email from Tom LaRock and the post from Wendy Pastrick regarding a harassment issue at this year’s PASS Summit. My heart is heavy.

I’ve read both texts multiple times. This line from Wendy keeps sticking in my head, “I declined, telling him I was fine.”

I understand. I’ve been there. And it’s an awful place. Any person who has been harassed knows this. Whether the harassment was physical – having someone grab your ass (or part of your body) is never funny, whether it was verbal – a sly comment with a lewd look that makes you go “ew”, doesn’t matter. The emotional response that comes with it is the indicator that you are not fine, and that you need to do something.

Very often we are taught to not “rock the boat.” Pull up your boots, put it behind you, and move on. It’s as if there is shame in experiencing that discomfort, and we must wholeheartedly deny that. If we do not, when we do NOT call out the offender, we let the offense continue. That person does it to someone else, who may or may not speak up, and the cycle continues.

I applaud Wendy for realizing that she was not fine, and for reporting it. For anyone who might think she over-reacted, I’ll strongly tell you to sit down and just stop. If you have ever experienced that feeling of discomfort, where your body temperature rises and you feel embarrassed – EVEN THOUGH YOU DID NOTHING WRONG – then you have been harassed. And if shame or fear has stopped you from saying anything, then I ask you – not encourage you, but implore you – to act differently if it occurs again. Do not wrap up those feelings inside a blanket and hide in a corner. Be brave and step forward.

I believe in going to any event with someone you trust – particularly events at the PASS Summit because there are so many people and because it’s a city where you probably don’t live. That person that goes with you is your wingman. You have his/her back, he/she has yours. You never, ever leave your wingman (if that sounds familiar, yes I’m quoting Top Gun…those pilots are on to something). If what happened to Wendy happens to you, you go right to your wingman. Do not say that you are fine. Let your wingman help you figure out next steps. One of those steps is reporting the event to PASS (or the proper governing body – HR, another Board – depending on the event) because this behavior will not change unless we begin to speak out and condemn it.

I leave you with this:

Be the change you wish to see in the world.


It starts with each one of us. Wendy has taken that path. In the unfortunate event that this happens to you, I hope you will follow.


With Thanksgiving just around the corner, I wanted to write about the appreciation I have for some colleagues I’ve had throughout the years, as well as several that I have now.  We often take time at the end of November to think about the things for which we are thankful.  And while that’s a very good thing, my goal this year is to take it one step further and make sure I tell the people in my life that I am grateful for them, and why.

This week at our Immersion Event, I went running on Wednesday morning with a few attendees. One of them mentioned that he had three daughters, and wondered whether it was worth encouraging them to go into IT, as he noticed that in the Immersion Event classes the ratio of women to men was pretty low. He suggested that perhaps IT wasn’t a great place for women. I immediately said that I would absolutely recommend it. There may not be a lot of women in IT, but that doesn’t mean that it’s not a good place for them and won’t continue to be.  And things are always changing.  Getting women into IT, and retaining them, is a continual conversation we have, particularly in the SQL Server community.  Perhaps I’m unique, but I don’t need to work with a large percentage of women, or a majority of women, to feel comfortable with my team. Perhaps that’s because I’ve never encountered some of the issues that I’ve heard from other women in technology.  The issues where a male colleague was not supportive, perhaps purposely kept a female colleague out of the loop, was very negative, or maybe even avoided or ignored female teammates entirely.  I don’t know if I’ve been lucky, or if, when I’ve encountered those men, I didn’t take it personally, figured that person was just a jerk, and simply worked through it.  There are jerks everywhere – within IT and outside of it – and those jerks can be men or women.

I have been fortunate to have supportive individuals, both male and female, in every job I’ve ever had, in both leadership and peer roles.  Maybe I’m unique, maybe not. But I’d like to take a minute to thank the individuals who supported me, and who continue to support and stand up for me and for other women in IT, and in their lives. To those of you who have daughters…my unsolicited advice is to absolutely encourage them to go into IT if that is something in which they are interested. And I would encourage fathers and mothers to reach out to women and men already in the field to ask for guidance and mentorship. Most people are happy to provide their experience and any insight they have; you just need to ask.

I won’t list all of the colleagues to whom I am grateful; there are just so many, and I’m afraid I might miss someone.  But if I’ve ever looked at you and said “thank you”, or given you a handshake or a hug with a “thank you”, or sent you an email or tweet with those words, or mentioned you in a post here or a post on Facebook, then YOU are one of those people who I appreciate, who I value, and who I am grateful to have in my circle.  Thank you, and Happy Thanksgiving.

Filtering Events in Trace/Profiler and Extended Events

It seems very wrong to write a post that talks about Trace, after all I’ve done to advocate that everyone start using Extended Events (XE). However, I know there are a lot of you who still use Trace because you’re running SQL Server 2008R2 and earlier, so you all get a free pass. Until you upgrade. If you’re running SQL Server 2012 or higher, I strongly recommend that you use XE. But that’s not the main point of this post. What I really want to do is step through filtering a .trc or .xel file to remove selected events.


Now, if you’ve worked with Trace for a long time, you may be wondering why you would ever filter events – because, let’s be honest, you might not know you can do that (I didn’t for ages).  You can! The Profiler UI isn’t where you typically perform data analysis, but one reason you might filter out events is if you’re using Distributed Replay and you need to remove events to avoid generating errors during the replay. To do this, open the .trc file in the Profiler UI, then select File | Properties… Within the Trace File Properties window, go to the Events Selection tab, then select Column Filters… Within the Edit Filter window, you can choose a column (or multiple columns) on which to filter your data. If I want to remove all events before a specific time, I would edit the EndTime:

EndTime filter in Trace


This removes all events before 11:31PM on November 12, 2015. After you have removed the events, you can save the remaining data as a new .trc file through File | Save As | Trace File…
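If you’d rather filter with T-SQL than the Profiler UI, you can also load the .trc file into a query with fn_trace_gettable.  This is just a sketch; the file path and time are examples you would replace for your own trace:

```sql
-- Read the trace file and keep only events at or after a given time;
-- DEFAULT tells the function to read any rollover files as well
SELECT *
FROM sys.fn_trace_gettable(N'C:\Temp\MyTrace.trc', DEFAULT)
WHERE [StartTime] >= '2015-11-12 23:31:00';
```

The results can then be written to a table with SELECT…INTO for further analysis.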

Extended Events

Filtering events from an Extended Events file is even easier. Open the .xel file within Management Studio, then select Extended Events | Filters (you can also select the Filters icon in the Extended Events toolbar). Within the Filters window, you can choose to filter on time, and/or any of the other fields captured for the events:

Date and logical_reads filter in Extended Events


Once you select Apply, all events before 11:31PM will be removed, as well as those with less than 1000 logical_reads. The remaining events can again be saved to a new file (Extended Events | Export to | XEL File…), or you can just run analysis against the filtered data. You can remove the filter at any time by going back to the Filters window and selecting Clear All.
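The same kind of filtering can be done in T-SQL against the .xel file, if you prefer working outside the UI.  The file path and threshold values below are examples, and the output column aliases are my own:

```sql
-- Keep events after a given time with at least 1000 logical_reads
SELECT
    event_data.value('(event/@timestamp)[1]', 'datetime2') AS [event_time],
    event_data.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') AS [logical_reads]
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\Temp\MyEventSession*.xel', NULL, NULL, NULL)
) AS sub
WHERE event_data.value('(event/@timestamp)[1]', 'datetime2') >= '2015-11-12 23:31:00'
  AND event_data.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') >= 1000;
```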


Hopefully this helps if you ever need to remove events from a .trc or .xel file.  Note that I always save the filtered data as a new file – I prefer to keep the original just in case I need all the events for some reason.

PASS Summit 2015: The Recap

Another PASS Summit has ended, and again the days and evenings just flew by.  I’m not intending this to be a long-winded post, just some notes and a few highlights.  We’ll see how I do.

First, a huge thank you to those of you who attended my sessions on Wednesday.  I had wonderful crowds and great questions – it is such an honor to present at Summit, and I so enjoyed both sessions.  If you are looking for the decks and the scripts: I have sent updated slide decks to PASS (I admit, I tweak them until the very end).  When they are uploaded you can pull them from the Summit site:

Kicking and Screaming: Replacing Profiler with Extended Events

Statistics and Query Plans

As for the demos, they can be found on the SQLskills Resources page, under the PASS Summit 2015 section.  The link is also listed in the slide decks.

A few weeks ago I had mentioned that I was supporting Argenis Without Borders 2.0 by wearing a costume at Summit.  The cool news first: Argenis and Kirsten raised $25,000.  Yes I have the comma in the right place: TWENTY-FIVE THOUSAND DOLLARS.  Ah-mazing.  As for my costume?  I went as Perry.  You may know that I take a stuffed Perry with me when I travel and take pictures of him to send to my kids, so why not dress up as him?  It was great, and huge thanks to Jes (Wonder Woman in the photo below) for the help.  Sadly, you can’t see my orange shoes in these pictures.

Argenis (as Ted) and me



Wonder Woman and Perry, supporting Argenis Without Borders 2.0



With my sessions finished on Wednesday, I had time on Thursday and Friday to catch up with friends, meet some new people, and sit in on some sessions.  There are some fantastic features coming in SQL Server 2016, and I’ll be making time over the next few months to start working with a few of them, as well as getting better acquainted with recently-released features that are starting to mature.  I am not at MVP Summit this week, which is a bummer on multiple levels.  However, I have been gone for three out of four weeks in October, and being away from my family for another week would have been a bigger bummer.  Life is about choices and balance :)

On a final note, I saw on Twitter that David Maxwell was the winner of Speaker Idol, meaning that he will present at the 2016 PASS Summit (to be held October 25-28, 2016).  David has worked hard over the last couple of years – presenting at user groups and SQLSaturdays – steadily working to improve and hone his skills and style.  He is a great example of a community member who decided to step up and start speaking, and now he’ll have a spot at Summit next year.  Congratulations David!  And to those of you who have ever sat in a session and thought, “I could do that,” or “I would like to try that,” then let me ask you, “What are you waiting for?”  There are many people in the community who are willing to help new speakers (including me), but you need to take initiative and reach out to them when you’re ready.  Who knows, you could be speaking at Summit, or another conference, down the road.

For those of you that were at Summit, I hope you catch up on sleep and email in the next couple days, and I hope to see you again next year!


PASS Summit 2015: Women in Technology Luncheon

It’s Thursday at the PASS Summit, so that means it’s time for the Women in Technology Luncheon.  As in years past (I’ve lost count of how many), the luncheon is sponsored by SQL Sentry.  The SQL Sentry team is here at Summit in full force, and I have Allen White at the blogger’s table with me.  But while I’m at it, let me give a shout out to a few members of the SQL Sentry team that have been supportive of not just this event, but of myself and some fellow colleagues.  These gentlemen have provided feedback, suggestions, and good old-fashioned support whenever asked or needed.  Thank you Aaron Bertrand, Kevin Kline, Nick Harshbarger, and Greg Gonzalez for all you do for me, my colleagues, and this community.

For those of you at home, you can watch the luncheon live on PASSTV.  Finally, if you want more rapid-fire commentary from the luncheon (as I’ll refresh this post every 5-10 minutes), I recommend following Mark Vaillancourt on Twitter (@markvsql).

Today’s luncheon features guest Angie Chang, VP of Strategic Partnerships at Hackbright Academy, and we start with PASS Board VP of Marketing Denise McInerney welcoming us to today’s lunch (it’s the 13th one).

Angie starts by talking about her path from undergrad to her position today.  She started the Girl Geek Dinner chapter in San Francisco, and Hackbright sought her out to help celebrate the first graduating class of Hackbright.  Hackbright has graduated around 300 women over the past 3 years, and a few of those women now hold technical management positions.  Hackbright was started by some women who attended a coding camp.  The group started with an experiment of 12 women, teaching them to code in 10 weeks.  Since then they have grown the classes and the curriculum has evolved.  Right now they teach Python, along with some Java and Angular – students are taught not just the language, but also to ask questions.  Each engineering fellow has three mentors.  There are 100 software engineers who mentor those students for one hour a week.  This mentorship helps enhance the experience, and the students also get to visit other technical companies (e.g. Twitter, Dropbox).

Hackbright uses pair programming.  The community aspect is important – particularly because it’s an all-women environment.  The environment is very casual.  The students at Hackbright are very diverse and come from a variety of backgrounds.  Hackbright has a high rate of job placements.  Angie highlights some graduates of Hackbright who have been promoted to engineer management positions within their company.  SurveyMonkey has hired the most “Hackbright’s” of any company and one of the engineers is a manager there now.

Hackbright works with partner companies by inviting them to career day events and the Hackbright graduation.  Facebook sponsors a scholarship once a quarter, and Denise’s company, Intuit, also provides a scholarship.  Girl Geek Dinner started in London in about 2006, and Angie was working at a startup at that time.  Angie started up the Girl Geek Dinner in Mountain View, sponsored by Google – they had 400 people in 5 days.  They are booked into 2017 for dinners, with 2-3 per month.

Denise shifts to talking about the pipeline problem.  One Hackbright instructor, Rachel Thomas, wrote a post, If you think this is a pipeline issue then you haven’t been paying attention.  The article has suggestions for how to improve the pipeline – it’s not about getting women in, it’s about retaining them.  Denise asks Angie if she feels retention will be an issue for those graduating from Hackbright – and Angie states that they create a good network for each graduating engineer – their classmates at Hackbright, their mentors, etc. – which gives each person a set of resources to turn to when they’re struggling.

If you have questions you can come up to the microphone or use the #passwit hashtag on Twitter.

Documentary from Technovation called Codegirl which will stream on YouTube from November 1-5, check out the trailer.

Want to see if you have any unconscious biases?  Check out these tests on Harvard’s site.

PASS Summit 2015: Day 2

8:20 AM

We’re off and running with Adam Jorgensen, PASS EVP of Finance.  Adam takes this opportunity to provide an update on the financial status of PASS, as this satisfies the requirements of the by-laws.  The largest source of revenue is the PASS Summit (not a surprise), bringing in just over 7 million dollars (of the 8 million generated in the 2015 fiscal year).  The finances continue to trend upward, which is great.  Finances support the community through events all year long.  This year, 78% of every dollar taken in goes back to a community program.  PASS is in great financial health, with reserves increased to 1.14 million dollars.  Starting this year, portfolio-level budget summaries will be published to make the process more transparent to the community.  Last year’s goals for 2015 were to focus on support for SQLSaturdays and Chapters, among others.  PASS Summit will be in Seattle through 2019.  The SQLSaturday website was relaunched this past year to help better support the events.  This year, goals include the BA Community Portfolio, refocused investments in community programs, a global growth program, the sales portfolio, and technology investments (including a re-design – ELS: this makes Jes happy).  Adam wraps up by thanking Amy Lewis, outgoing board member.

8:33 AM

Adam finishes up and EVP Denise McInerney comes on stage.  Denise takes a minute to thank Bill Graziano, who is the outgoing Immediate Past President.  Bill has been a member of the board for 10 years.  ELS: I’m personally a big fan of Bill, I worked with him on the NomCom.

Denise moves on to the PASSion Award.  There were 71 Outstanding Volunteers this past year.  This year’s PASSion Award goes to Lance Harra.  He runs the Kansas City SQLSaturday and was an integral part of the program committee.  If you are interested in becoming a part of the SQL Server leadership team, stop by the Community Zone this week.  There are always ways to get involved with PASS.

There are over 150,000 members of PASS.  There are 3000 people from over 95 countries tuning in live.  Yesterday PASS introduced foundation sessions, which were offered by Microsoft (four of them yesterday).  Over the years PASS has grown its offerings to meet its members needs – virtual chapters, 24 Hours of PASS, SQLSaturday, user groups, and more.

Today is the Women in Technology lunch (11:45) sponsored by SQLSentry, and the keynote speaker is Angie Chang.  It will be live streamed on PASSTV.  Today is the Board Q&A at 3:30.  Tonight is the Community Appreciation Party at the EMP Museum at 7 PM.

PASS Summit next year is scheduled for October 25 – October 28 – early bird pricing is available!

Today’s keynote speakers are Dr. David DeWitt and Dr. Rimma Nehme. (ELS: TWO OF MY FAVORITES!!!)  They are both at the Microsoft Jim Gray Systems Lab in Madison.  Data Management for the Internet of Things.

8:45 AM

Dr. Nehme takes the stage.  She mentions that it’s harder to present a keynote together than individually.  She will start, Dr. DeWitt will come in, then Dr. Nehme will wrap up (dessert!).  The what, why, and how of IOT.

Disclaimer: not announcing a product.  Goal is to inform, educate, and inspire (and entertain a bit).

Wants to begin with a new reality.  Things around us have a voice and can communicate with us.  IOT is a collection of devices and services that work together to do something useful.  Basic formula: take a basic object; add a controller, sensor, and actuator; add the internet; and then you get the Internet of Things.

Take the sensors and actuators, add connectivity and big data analytics, and then you can provide new services and optimization.  The target is to create value (make money).  What does that typically look like?  Collect data from sensors, aggregate it, analyze it, then act on it – a continuous loop.  There are two types of IOT that people agree upon: on one side you have the consumer Internet of Things – things that are wearable or related to us as humans (phone, watch, etc.) – and on the other side you have things that are industrial (cars, factories, etc.).

Consumer IOT: Fitbit, Nest, Lumo.  What can they reveal about us?  Health information, house information, driving habits.  You can analyze that information and make predictions/revelations.  The Industrial Internet of Things can also be connected, and significant value can be realized, particularly in industry – but it is still in its infancy.  There are four types of IOT capability: monitoring, control, optimization, and autonomy.  She draws an analogy to human development: we are in the “terrible twos” of IOT development.

Why IOT?  We are at the peak of the hype right now (based on Gartner), and there is steady growth in the number of “things” connected to the internet.  In 2003 there were about 500 million devices connected to the internet; by 2010 there were 12.5 billion.  Around 2008, the number of things connected to the internet exceeded the number of people, and in 2015 we are at 25 billion things connected.  The value to customers is huge.  The power of 1%: if you can get a 1% improvement in fuel savings in an industry like aviation, health care, or power generation, that’s extremely significant.

Why is this happening now?  More mobile devices, better sensors and actuators, and big data analytics.

For the “how” of IOT, Dr. DeWitt comes on stage.  He is going to talk about the services available.  There are a lot of challenges: a large number (and variety) of sensors, and A LOT of devices sending data.  Sensors are frequently dirty, and it’s hard to distinguish between dirty readings and true anomalies.  And then there is just the volume of data being sent into the cloud.  One of the biggest challenges is device security – how do you prevent devices from overwhelming the cloud infrastructure, or someone from impersonating a device?  And then there’s cloud-to-device messaging: sometimes the device is not online and may miss a message, so a persistent queue and reliable delivery are needed.  How do you deploy all of this and get the IOT set up?  We’re not going to tackle that today.

There are differences between consumer and industrial IOT.  In consumer IOT you have to worry about battery and power failure, devices are more cost-sensitive, a device might be a simple embedded device or a powerful sensor, and consumer devices are wireless.  Industrial devices have unlimited power, are full-fledged and wired, and depend on the needed functionality.  The rest of the talk will focus on industrial.  Note: one size fits none.

Today’s IOT: Just Do It Yourself.  The state of the art is still rather primitive.  What are the ingredients that go into IOT?  In the basic block diagram, out in the field you have devices with a sensor and an actuator (e.g., sensing temperature and humidity in a Nest thermostat).  Up in the cloud, you have an event/data aggregator.  Device to Cloud (D2C) is how the data gets from the device up to the cloud.  You can feed this data into an application, into event/data storage, or into a real-time processing engine, and that *can* use a device controller to send commands back to the device (C2D = Cloud to Device).  Azure IOT services exist today, with two main components: Azure IoT Hubs and Azure Event Hubs.  The data management is done through Azure Stream Analytics, DocumentDB, SQL Azure and SQL-DW, Azure HDInsight, and Azure Machine Learning, and then PowerBI and Excel are used to visualize the data.

Azure IoT Hub (an Azure PaaS service) is the cornerstone of IOT.  It receives events and routes them.  It is scalable to millions of devices, it provides per-device-instance authentication, and it can send commands back to the devices.  Within the hub, every device has its own Send Endpoint, to which the sensors send events.  On the output side is a set of partitions into which data gets routed; the number of partitions is fixed when the service is created in the cloud, and a hash function routes each event to a partition.  Event consumers then “pull” events from the Receive Endpoint.  There is also a C2D Send Endpoint that can send messages out, which get routed to a message queue that guarantees delivery out to the device’s actuator.
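To make the partition-routing idea concrete, here is a minimal sketch in Python.  It is an illustration only, not IoT Hub’s actual implementation: the partition count, device IDs, and choice of SHA-256 as the hash are all assumptions for the example.

```python
import hashlib

NUM_PARTITIONS = 4  # fixed when the hub/service is created (value is hypothetical)

def route_to_partition(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the device id to pick a partition. Because the hash is
    deterministic, every event from a given device lands in the same
    partition, so a consumer reading that partition sees the device's
    events in order."""
    digest = hashlib.sha256(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

A consumer assigned to partition `route_to_partition("boiler-17")` would pull all of that boiler’s events from the Receive Endpoint.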

One thing you can do with events is pull them out of the IoT Hub and send them to an event consumer such as SQL Azure (which doesn’t have as sexy a symbol as SQL Server), HDFS, Azure Storage, or DocumentDB (these are examples).  Analyzing the events, then, can be done via SQL Server, SQL-DW and Polybase, Hadoop over HDFS (or Hive/Storm), or DocumentDB.  All of these are great options for storing events.  A neat thing to do with IOT data is LEARN from it (e.g., predict when the boiler might explode).

Options for a real-time query engine include Azure Stream Analytics or Apache Storm on HDInsight.  What’s a real-time query engine?  A traditional RDBMS has data on disk: you send in a query, you get data back.  In Dr. DeWitt’s mind, real-time streaming is taking a sequence of events and some queries that operate over those events – say, a query that finds the IDs of boilers that are about ready to explode based on PSI.  As the query processes stream events, it continuously produces results.  You can have multiple queries operating over the same set of events, or over different streams.  Dr. DeWitt encourages us to learn about stream analytics.

There is no data stored; the queries are just continually running, data flows through the query, and it outputs results.  When you see something important, what do you do?  Send a message to the IoT Hub to take an action (e.g., open a pressure release valve).  Field gateway: a Raspberry Pi running Windows 10 with WiFi is a field gateway.  There are two primary use cases: when a sensor/device cannot itself connect to the internet, or for complex objects (e.g., smart cars) with multiple sensors/actuators.  There are two flavors: opaque (only the field gateway has an identity in the IoT Hub) and transparent (each device is registered in the IoT Hub).  Field gateways are full computers, with processors and memory.
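The “query runs continuously over a stream, storing nothing” idea can be sketched with a Python generator.  This is only an analogy for what an engine like Stream Analytics or Storm does; the boiler IDs and the 300 PSI threshold are made up for the example.

```python
def boilers_near_limit(events, psi_limit=300):
    """Continuous-query sketch: consume a stream of (boiler_id, psi)
    readings and emit the ids whose pressure exceeds the limit.
    Nothing is stored; each event flows through the query exactly once."""
    for boiler_id, psi in events:
        if psi > psi_limit:
            yield boiler_id

# Feed a few readings through the "query"
readings = [("b1", 120), ("b2", 310), ("b1", 305), ("b3", 90)]
alerts = list(boilers_near_limit(iter(readings)))  # ["b2", "b1"]
```

In a real engine the result stream would feed the C2D path – e.g., a message back to the IoT Hub to open a pressure release valve.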

How do you manage IOT metadata?  Per-device metadata is not stored in a database system at present, so there is no query support.

Device security is super critical for an IOT deployment.  Devices must have unique identities, and must PULL to obtain C2D commands (no open ports, to reduce the attack surface).  Main takeaway: it is PUSH to the cloud – all the IOT events get pushed up into the cloud.  It was a good first effort, but what are the problems with pushing everything to the cloud?  Not enough bandwidth, required connectivity, latency, data deluge (from boring sensor readings), storage constraints (storing EVERY event), and speed.  The main point: it wastes network bandwidth, computational resources, storage capacity, and processing on NON-INTERESTING events.

Go back to the boiler example… Running the same query over and over, you waste bandwidth sending a reading every second, and centralizing all data from multiple systems might overload the system.  Here is their insight: exploit the capability of the field gateway – it can do local processing and control.  You have the boiler with its sensor and actuator, and then you have a field gateway; in that gateway you run a streaming database system, install the boiler gateway control program, and run the data through it.  If you run the streaming engine there, you can run any number of queries, and might send only the average pressure reading over 60 seconds of data up to the IoT Hub.  This is a better approach – it reduces what is pushed up to the cloud and what needs to be stored.
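A minimal sketch of the local aggregation a field gateway might do: instead of forwarding every per-second reading, collapse each 60-second window into one average.  Timestamps, readings, and the window size are hypothetical.

```python
def average_per_window(readings, window_seconds=60):
    """readings: iterable of (timestamp_seconds, psi) pairs.
    Emit one (window_start, average_psi) pair per window, so only
    the aggregate goes up to the cloud instead of every raw reading."""
    windows = {}
    for ts, psi in readings:
        start = ts - (ts % window_seconds)          # bucket by window start
        total, count = windows.get(start, (0.0, 0))
        windows[start] = (total + psi, count + 1)
    for start in sorted(windows):
        total, count = windows[start]
        yield (start, total / count)

# Three raw readings collapse into two upstream messages
raw = [(0, 100), (30, 110), (61, 200)]
summary = list(average_per_window(raw))  # [(0, 105.0), (60, 200.0)]
```

With one reading per second, this cuts the device-to-cloud traffic for each boiler from 60 messages a minute to one.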

How can we do better?  Dr. Nehme comes back on stage… (she has changed her outfit… but don’t tweet about it… she’s a jeans and t-shirt girl (I KNEW IT))

Fog computing – all about computing on the edge.  It is not cloud vs. fog, it is cloud + fog.

What’s the fog?  It’s like “predicate pushdown”: never move the data to the computation; move the computation to the data.  Devices perform some data pre-processing and compression, while the cloud is a big gorilla that can do the management, processing, and machine learning.  How can we do better?  Real-time response, scalability, metadata management, GeoDR of IoT hubs.  IOT is a database problem, not just a networking problem.  It hasn’t been approached database-centrically before, but they are trying to address that.

They want to take the existing Azure IOT services and expand on them.  They propose Polybase for IOT (not a product announcement, just an idea).  What is the vision?  A declarative language, complex object modeling, scalable metadata management, discrete and continuous queries, multi-purpose querying, and computation pushdown.

Declarative language: if you’re dealing with IOT today, your only choice is an imperative language – you have to explicitly specify how you want to see something.  What about IOT-SQL, a declarative language where you can select information from the sensors?  You might have tables for buildings, rooms, temperature sensors, etc., where the temperature sensors table has columns that look like a regular database table.  They need to figure out how to model complex objects – for example, a room on a floor in a building needs a model.  They have the notion of a shell database: a regular database that stores metadata, statistics, and access privileges, against which authentication, authorization, and query optimization can be performed.  As far as those processes are concerned, they don’t need the actual data.  Now expand this to the devices: the IOT shell also gives a simple abstraction for sensors, actuators, and distributors.  The shell can be stored in SQL Azure, DocumentDB, etc. – it’s JUST a database.

What about querying devices?  One query type is ExecuteOnce: push the SELECT to the device, it sends results, and we’re done.  With ExecuteForever, you push the SELECT to the device and the device continually sends results back to the client; when done, you send a signal and the query stops running.  Then there is ExecuteAction: send a SELECT and an action, and the action fires when the predicate is met.  Actions can execute once, or forever.

Back to the temperature sensor table… we need some declarative queries.  ExecuteOnce: get the count of all hot locations.  The optimized plan is generated, data is moved, and then the work is done up in the cloud – not a lot of pushdown here.  For an ExecuteForever query – record all hot locations up in the cloud, and execute forever – the optimizer might produce a different plan (it does some partial aggregation before pushing data up into the cloud, so the larger computation is done in the “fog”).

ExecuteAction: turn on the AC in all the hot locations.  The larger computation and the action are pushed down into the fog, and only interesting events are pushed up into the cloud.  Multi-purpose querying: based on the results, some could go to one location and some could go to another.
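Since IOT-SQL is only a proposed idea, there is no real API to call; but the ExecuteAction semantics can be sketched in Python.  The room data, the 80-degree threshold, and the `execute_action` helper are all hypothetical, purely to illustrate predicate-plus-action evaluation at the edge.

```python
def execute_action(readings, predicate, action, once=True):
    """Sketch of ExecuteAction semantics: evaluate a predicate against
    each reading and fire the action when it matches.  once=True mimics
    a one-shot query; once=False keeps the query running 'forever'."""
    fired = []
    for reading in readings:
        if predicate(reading):
            fired.append(action(reading))
            if once:
                break
    return fired

# "Turn on the AC in all the hot locations" (threshold is hypothetical)
hot = execute_action(
    [{"room": "101", "temp": 68}, {"room": "204", "temp": 86}],
    predicate=lambda r: r["temp"] > 80,
    action=lambda r: "AC on in " + r["room"],
    once=False,
)
# hot == ["AC on in 204"]
```

Note that only the matching event produces work: the boring reading from room 101 is filtered out locally, which is exactly the pushdown argument from the keynote.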

The Polybase for IOT wrap-up: use a SQL front end with Polybase for sensor/actuator metadata management and querying, and exploit Polybase’s external attribute mechanism to allow SQL queries to reference sensor values… and then one more thing I didn’t get :)

Why should we, as data professionals, care?  When a new technology rolls over you, you’re either part of the steamroller or part of the road (didn’t catch the attribution).  Key takeaway: the amount of data to manage is going up exponentially.  We need to step back and see what success looks like.

Dr. Nehme announces that this is their last keynote.  Why?  Dr. DeWitt: they have done 7 of these.  There are a lot of great speakers at Microsoft, and he is sure there are people who are better speakers.  Dr. DeWitt and Dr. Nehme are “parting ways”: she is finishing up her MBA and moving on, and Dr. DeWitt is starting to think about retirement – after 40 years he thinks it’s about time to give up the full-time gig.  In 10 years…  We have not seen the last of Dr. Nehme, whether it’s at Microsoft or at a competitor.  Dr. DeWitt says this has been one of the brightest spots in his career, and a terrific experience.  He will think about this community for many years to come.  (ELS: I admit, I’m a little teary.)