PASS Summit 2015: Day 2

8:20 AM

We’re off and running with Adam Jorgensen, PASS EVP of Finance.  Adam takes this opportunity to provide an update about the financial status of PASS as this satisfies the requirements of the by-laws.  The largest source of revenue is the PASS Summit (not a surprise), bringing in just over 7 million dollars (of the 8 million generated in the 2015 fiscal year).  The finances continue trend upward, which is great.  Finances support the community through events all year long.  This year, 78% of every dollar taken in goes back to a community program.  PASS is in great financial health, increased reserves to 1.14 million dollars.  Starting this year, portfolio-level budget summaries will be published, to make the process more transparent to the community.  Last year goals for 2015 were to focus on support for SQLSaturdays and Chapters, among others.  PASS Summit will be in Seattle through 2019.  SQLSaturday website was relaunched this past year to help better support the events.  This year, goals include the BA Community Portfolio, refocus investments to community profiles, global growth program, sales portfolio, technology investments (including a re-design of sqlpass.org ELS: this makes Jes happy).  Adam wraps up by thanking Amy Lewis, outgoing board member.

8:33 AM

Adam finishes up and EVP Denise McInerney comes on stage.  Denise takes a minute to thank Bill Graziano, who is the outgoing Immediate Past President.  Bill has been a member of the board for 10 years.  ELS: I’m personally a big fan of Bill, I worked with him on the NomCom.

Denise moves on to the PASSion Award.  There were 71 Outstanding Volunteers this past year.  This year’s PASSion Award goes to Lance Harra.  He runs the Kansas City SQLSaturday and was an integral part of the program committee.  If you are interested in becoming a part of the SQL Server leadership team, stop by the Community Zone this week.  There are always ways to get involved with PASS.

There are over 150,000 members of PASS.  There are 3000 people from over 95 countries tuning in live.  Yesterday PASS introduced foundation sessions, which were offered by Microsoft (four of them yesterday).  Over the years PASS has grown its offerings to meet its members needs – virtual chapters, 24 Hours of PASS, SQLSaturday, user groups, and more.

Today is the Women in Technology lunch (11:45) sponsored by SQLSentry, and the keynote speaker is Angie Chang.  It will be live streamed on PASSTV.  Today is the Board Q&A at 3:30.  Tonight is the Community Appreciation Party at the EMP Museum at 7 PM.

PASS Summit next year is scheduled for October 25 – October 28 – early bird pricing is available!

Today’s keynote speakers are Dr. David DeWitt and Dr. Rimma Nehme. (ELS: TWO OF MY FAVORITES!!!)  They are both at the Microsoft Jim Gray Systems Lab in Madison.  Data Management for the Internet of Things.

8:45 AM

Dr. Nehme takes the stage.  She mentions that it’s harder to present a keynote together than individually.  She will start, Dr. DeWitt will come in, then Dr. Nehma will wrap up (dessert!).  What, why, how and of IOT.

Disclaimer: not announcing a product.  Goal is to inform, educate, and inspire (and entertain a bit).

Wants to begin with a new reality.  Things around us have a voice that can communicate to us.  IOT is a collection of devices and services that work together to do something useful.  Basic formula: take a basic object, add controller, sensor and actuator, add the internet, and then you get the internet of things.

Take the sensors and actuators, add connectivity and big data analytics, and then you can provide new services and optimization.  The target is to create value (make money).  What does that typically look like?  Collect data from sensors, aggregate it, analyze, then act on it. This is a continuous loop.  There are 2 types of IOT that people agree upon.  On one side have a consumer internet of things – things that are wearable, related to us as humans (phone, watch, etc.) then have things that are industrial (cars, factories, etc.).

Consumer IOT: fitbut, Nest, Lumo.  What can they reveal about us?  Health info, house information, driving habits.  You can analyze that information and make predications/revelations.  The Industrial Internet of Things (IOT) can be connected, and then significant value can be realized, particularly in Industry.  It is still in its infancy.  There are four types of IOT capabilities: Monitoring, Control, Optimization, Autonomy.  The analogy of this to human development..  We are in the “terrible twos” of the IOT development.  Why IOT?  We are at the peak of the hype right now (based on Gartner).  There is a growth of “things” connected to the internet.  In 2003 had about 500 million devices connected to the internet.  Have 12.5 billion by 2010.  Around 2008, the number of things connected to the internet exceeded the number of people.  In 2015, at 25 billion things connected to the internet.  The value to customers is huge.  The power of 1% – if you can improve 1% in fuel savings in an industry like aviation, health care, or power generation, that’s extremely significant.

Why is this happening now?  More mobile devices, better sensors and activators, and BI analysis.

For IOT How?, Dr. DeWitt comes on stage.  Dr. DeWitt is going to talk about the services available.  There are a lot of challenges – a large number (and variety) of sensors.  There are A LOT of devices sending data.  Sensors are frequently dirty, and it’s hard to distinguish between dirty readings and anomalies.  And then there is just the volume of data that’s being sent into the cloud.  One of the biggest challenges is device security.  How do you prevent them from overwhelming cloud infrastructure or impersonating a device?  And then there’s cloud-to-device messaging.  Sometimes the device is not online.  Therefore the device may miss a message, so persistent queue and reliability is needed.  How do you deploy this and get the IOT set up?  We’re not going to tackle that today.

There are differences between consumer and industrial IOT.  In consumer IOT have to worry about battery and power failure, more cost-sensitive, and might be a simple embedded device, or it could be a powerful sensor, and finally, consumers have wireless (industrial has unlimited power, full-fledged, wired, and depends on needed functionality).  Rest of talk will focus on industrial.  Note: one size fits none.

Today’s IOT: Just Do It Yourself.  The state of the art is still rather primitive.  What are the ingredients that go into IOT?  The basic block diagram, out in the field you have devices with a sensor and actuator (e.g. sense temp, humidity, in a Nest thermostat).  Up in the cloud, have event/data aggregator.  Device to Cloud (D2C) is how the data gets from the device up to the cloud.  You can feed this data into an application, into event/data storage, into a real-time processing engine (real time), and that *can* use a device controller and send it back to the device (C2D = Cloud to Device).  Azure IOT services exist.  Two main components: Azure Iot Hubs and Azure Event Hubs.  The data management is done through Azure Stream Analytics, DocumentDB, SQL Azure and SQL-DW, Azure HDInsight and Azure Machine Learning.  and then use PowerBI and Excel to visualize the data.

Azure IOT Hub (an Azure PaaS Service), this is the cornerstone of IOT.  It receives events and routes them.  It is scalable to millions of devicees, and it provides per-device instance authentication.  It can send commands back to the devices.  Within the hug, every device has it’s own send endpoint, to which the sensors will send events.  On the output side, is a set of partitions, into which data gets routed.  The number of partitions is created when the service is created in the cloud.  A hash function routes it to a partition.  Event consumers then “pull” events from the Receive EndPoint.  There is a C2D Send Endpoint that can send messages out, and then get routed to a message queue that guarantee once delivery out to the device’s actuator.

One thing you can do with events is pull them out of the IOT HUB and they go to the Event Consumer such as SQL Azure (doesn’t have a nicer sexy symbol like SQL Server), into HDFS, into Azure Storage, or into DocDB (these are examples).  Analyzing the events, then, can be done via SQL Server, or use SQL-DW and Polybase, Hadoop from HDFS (or Hvie/Storm), or DocDB.  All of these are great opportunities to store events.  A neat thing to do with IOT data is LEARN from it (e.g. when the boiler might explode).

Options for real-time query engine include Azure Stream Analytics or Apache Storm on HDInsight.  What’s a real-time query engine?  Traditional RDBMS with data on disk, send in a query, get data back.  In Dr. DeWitt’s mind, the real-time streaming is taking a sequence of events, and some queries that will operate over those events, and the query will find IDs of boilers that are about ready to explode based on PSi.  As query processes stream events, it will eventually produce results.  Can have multiple queries operating over the same set of events, or different streams.  Dr DeWitt encourages us to learn about stream analytics.

There is no data stored, the queries are just continually running, data flows through the query, outputs results.  When you see something important, what do you do?  Send a message to IOT hub to do an action (e..g open pressure release valve).  Field gateway – Raspberry Pi, running Windows 10, has WiFi – that’s a field gateway.  There are two primary use cases: when a sensor/device cannot itself connect to the internet, or for complex objects (e.g. smart cars) with multiple sensors/actuators.  Two flavors: opaque (only field gateway has identity in IOT hub) and transparent (each device is registered in IOT hub.  The field gateway are processors with memory and processors.

How to manage IOT metadata?  per-device metadata is not stored in a database system at present time so no query support.

Device security is super critical for IOT deployment.  Devices must have unique identities, and must PULL to obtain C2D commands (no ports open to reduce attacks).  Main takeway: it is PUSH to the cloud.  All the IOT events get pushed up into the cloud.  It was a good first effort.  But what are the problems with pushing everything to the cloud? Not enough bandwidth, requires connectivity, latency, data deluge (from boring sensor readings), storage constraints (storing EVERY event), speed, main point: wastes network bandwidth, computational resources, storage capacity and bandwidth processing for NON-INTERESTING events.

Go back to boiler example…Running the same query over and over, waste bandwidth sending the reading every second.  Centralizing all data from multiple systems might overload the system.  Here is their insight: exploit the capability of the field gateway.  It can do local processing and control.  Have the boiler with sensor and actuator.  Then you have a field gateway, and in that, going to run a streaming database system, and install on that boiler gateway control program, and run data through.  If run streaming engine there, can run any number of queries, might send average pressure reading for 60 seconds of data up to the IOT.  This is a better approach – reduce what pushing up to the cloud, and what needs to be stored.

How can we do better?  Dr. Nehme comes back on stage…  (she has changed her outfit…but don’t tweet about it…she’s a jeans and tshirt girl (I KNEW IT)

Fog computing – all about computing on the edge.  It is not cloud vs. fog, it is cloud + fog.

What’s the fog?  It’s like “predicate pushdown”.  Never move the data to the computation, move the computation to the data.  Devices perform some data pre-processing and compression, the cloud is a big gorilla that can do the management, processing, and machine learning.  How can we do better?  Real-time response, scalability, metadata management, GeoDR of IOT hubs.  IOT is a database problem, not just a networking problem.  It hasn’t been database-centric before, but trying to address that.

Want to take existing IOT Azure services and expand on them.  Proposing Polybase for IOT (not a product announcement, just an idea).  What is vision? Declarative language, complex object modeling, scale able metadata management, discrete and continuous queries, multi-purpose querying, computation pushdown.

Declarative language: if dealing with IOT, only choice is to use imperative language.  Have to explicitly specify how you want to see something.  What about IOT-SQL?  A declarative language where you can select information from the sensors.  If have tables specified as buildings, room, temperature sensors, etc.  With temperature sensors, have columns that looks like regular database.  Need to figure out how to model complex objects – for example, a room on a floor in a building, – need a model for this.  Have a notion of a shell database – it is a regular database that stores metadata, statistics, and access privileges – can perform authentication, authorization and query optimization against that database.  As far as these processes are concerned, they don’t need the actual data.  Now expand this to the devices.  The IOT shell also gives a simple abstraction for sensors, actuators, and distributors.  The shell can be stored in SQL Azure, DocDB, etc.  It’s JUST a database.

What about querying devices?  One query is ExecuteOnce: push select to device, it sends results, we’re done.  ExecuteForever, push SELECT to device, then the device continually sends results back to client.  When done, send signal we’re done and query stops running.  Then have ExecuteAction: send a SELECT and then an action, and the action gets fired when predicate is met.  Can do execution once, or forever.

Back to temperature sensor table…need some delcarative queries.  ExecuteOnce – get the count of all hot locations.  The optimized plan is generated, data is moved, and then work is done up in the cloud.  Not a lot of pushdown here.  ExecuteForever query – record all hot locations up in the cloud, and execute forever, the optimizer might produce a different plan (does some partial aggregation before pushing data up into the cloud – larger computation is done in the “fog”).

ExecuteAction: turn on AC in all the hot locations.  Larger computation and the action is pushed down in the fog, and only interesting events are pushed up into the cloud.  Multi-purpose query – based on results, some could go to one location, some could go to another location.

The Polybase for IOT Wrapup – use SQL front end with Polybase for sensor/actuator metadata management and querying.  Exploit Polybase’s external attribute mechanism to allow SQL queries to reference sensor values…and then one more thing I didn’t get 🙂

Why should we, as data professionals, care?  When a new technology rolls over you, you’re either part of the steamroller or part of the road (didn’t get the attribute).  Key takeway: the amount of data to manage is exponentially going up.  Need to step back to see what success looks like.

Dr. Nehme has announced that this is their last keynote.  Why? Dr. DeWitt…they have done 7 of these.  There are a lot of great speakers at MS, and he is sure there are people who are better speakers.  Dr. DeWitt and Dr. Rehme are “parting ways”.  She is finishing up her MBA and moving on.  Dr. DeWitt is starting to think about retirement.  After 40 years thinks it’s about time to give up the full time gig.  In 10 years…  Have not seen the last of Dr. Rehma – whether it’s at Microsoft or at a competitive.  Dr. DeWitt says this has been one of his brightest spots in his career.  He says it’s been a terrific experience.  He will think about this community for many years to come.  (ELS: I admit, I’m a little teary.)

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

A Fond Farewell

If you haven’t guessed from the title, I’m writing this post because I am leaving SQLskills. This Friday, January 14th, is my last day, and

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.