We’re kicking off Thursday’s PASS Summit keynote in about 5 minutes, and the good news is that I have network connectivity, so I’ll really be live-blogging today. Stay tuned for updates throughout the morning!
Ok, I guess I need to start with this picture of Brent Ozar and Grant Fritchey:
Brent and Grant will be wearing these lovely leggings today when they present as part of an effort by to help raise money for Doctors Without Borders. You can still donate!
8:15 AM
We’re off and running with Adam Jorgensen, PASS EVP of Finance. Adam is going to provide an update about the financial status of PASS. Funnily enough, doing this at Summit satisfies the requirements of the by-laws. The largest source of revenue is the PASS Summit (not a surprise): 96% of PASS revenue is generated by Summit and the Business Analytics Conference. The money raised funds activities throughout the community, and there are reserves of over one million dollars (pretty good). These funds protect the PASS community in case the Summit is cancelled due to a natural disaster. In fiscal year 2016, PASS wants to focus on projects already scoped and also to provide funds for new projects that are yet to be determined. PASS publishes its budget every year and community members have access to it; starting in 2016, portfolio-level budgets will be published so it’s easier to drill into the areas of PASS that interest members the most. The focus in 2015 includes conferences, the Global Alliance Program, investing in IT, community events, data culture, and the business and data analytics community.
8:22 AM
Adam finishes up and PASS President Tom LaRock comes on stage. Tom takes a few minutes to say goodbye to members who are ending their terms on the Board, including Sri Sridharan and Olivier Matrat. Sri managed the Volunteer portfolio within PASS and did a phenomenal job working to bring more volunteers into the community to help PASS. Tom also introduces the new Board members: Sanja Mishra and Grant Fritchey. Next up is Denise McInerney, EVP of Marketing.
Denise mentions that over 5000 people in over 113 countries are watching today’s keynote online on PASStv. Denise starts by talking about her involvement with PASS, which began back in 2002 with a session she attended by Kimberly Tripp. (ES: SQLskills shout out!) Denise then got involved locally and at the national level. When you volunteer for PASS, you help other members and you broaden your own network. A point from Denise: many of the people she met in the beginning are the ones she still turns to. Denise announces this year’s PASSion Award winner: Andrey Korshikov, a Russia-based SQL Server MVP and BI developer. Andrey is a PASS Regional Mentor and the founder of the Russian Virtual Chapter (VC). He has managed four SQLSaturdays and three Russian editions of 24 Hours of PASS.
Denise also mentions the other PASSion Award nominees, but I couldn’t type their names fast enough 🙂 She then highlights the PASS Outstanding Volunteers who have been recognized throughout the year and asks them to stand and be recognized. (ES: It takes a village – there are so many fantastic people who contribute to this community.)
On Friday, from 2:15 to 2:45, in room 307/308, there will be a Business Analytics Direction Board Discussion. If you want to provide feedback about the Business Analytics Conference (taking place April 20-22, 2015 in Santa Clara, CA) and/or the direction PASS is going, please attend the discussion. Denise also reminds people to update their PASS profile, particularly if you want to volunteer and get involved.
The next PASS Summit will be in Seattle, October 27-30, 2015. Registration is already open!
8:35 AM
Dr. Rimma Nehme finally takes the stage for her keynote: Cloud Databases 101. She is a Principal Research Engineer at the Microsoft Jim Gray Systems Lab.
Dr. Nehme has been watching this conference for the past 5 years. She starts by thanking the organizers for inviting her and mentions Dr. DeWitt: she was thinking about how she could be like Dr. DeWitt, and then realized that “trying to be a man is a waste of a woman”. So she won’t try to be like Dr. DeWitt; she will just be herself. Yes. Dr. Nehme was born in Belarus, she knows a little bit about databases from both an academic and a real-world perspective, and she is learning a little bit about business too. (Dr. Nehme is getting her MBA in her “spare time”…seriously…and did I mention that she’s also a mom of two kids? SO impressive.) Dr. Nehme is a big fan of the PASS community.
Today’s topic is: What is a cloud database? Our roadmap for today:
- Why Cloud?
- What’s a Cloud Database?
- How are they built?
- What’s my role as a DBA?
- Summary
Cloud technology is still relatively new, and it has “Shiny Object Syndrome” around it. Dr. Nehme’s goal is to explain why cloud is special. The basic equation to remember is cloud = service. More precisely: the cloud is computing and software resources delivered on demand, as a service that is always on and accessible from anywhere, at any time. This is also known as the fifth utility. Why is it called cloud computing? Blame the network people (not the database people). Cloud computing characteristics:
- on-demand self-service – demand for resources can be filled automatically
- location-transparent resource pooling – resources are pooled across multiple customers
- ubiquitous network access – all resources are available over a network that allows data exchange
- rapid elasticity – capability is provisioned on demand when needed, then released
- measured service with pay-per-use – resources are charged based on the quantity used
Think about it: one woman or man with a credit card can tap into some of the largest computing resources in the world.
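To make the elasticity and pay-per-use points concrete, here is a minimal Python sketch comparing the cost of provisioning for peak demand against paying only for the server-hours actually used. The demand profile and the hourly rate are hypothetical numbers made up for illustration, not figures from the keynote or any vendor’s price list.

```python
# Illustrative only: the demand profile and rate are hypothetical.
from typing import List

RATE_PER_SERVER_HOUR = 0.50  # hypothetical price in dollars


def fixed_capacity_cost(hourly_demand: List[int], rate: float) -> float:
    """Buy enough capacity for the peak hour and pay for it every hour."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * rate


def pay_per_use_cost(hourly_demand: List[int], rate: float) -> float:
    """Elastic capacity: acquire servers when needed, release them afterwards."""
    return sum(hourly_demand) * rate


if __name__ == "__main__":
    # Servers needed in each hour of a hypothetical day with a 4-hour spike.
    demand = [2] * 20 + [10] * 4
    print(f"fixed:       ${fixed_capacity_cost(demand, RATE_PER_SERVER_HOUR):.2f}")
    print(f"pay per use: ${pay_per_use_cost(demand, RATE_PER_SERVER_HOUR):.2f}")
```

With these made-up numbers, provisioning for the peak costs $120.00 for the day versus $40.00 when you only pay for what you use.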
A brief history: the wave of computing started in the 1960s, when the concept of utility computing was proposed by one of the MIT professors. In the 1990s, the first cloud application was offered. In 2002, Amazon Web Services was launched, and Microsoft (Windows Azure) and Google launched offerings in the 2008 timeframe.
Question: Where does the cloud live? In a data center. Let’s go on a virtual tour of a Microsoft data center. The data center in Chicago looks like a fancy trailer park. What’s inside those big containers? Lots and lots of servers. When we think of a data center we think of lots of servers, raised floors, etc. But there is more to it: there are transformers, cooling towers, chillers, UPSs, power, and people. One way to describe a data center is by its efficiency. Optimizing for energy efficiency is a good thing; we have a social responsibility to pay attention to our use of resources. One way that efficiency is calculated is PUE (power usage effectiveness): total facility power divided by IT equipment power. This is valuable as a broad efficiency ratio. The PUE ratio for a modular data center (hosting cloud resources) is 1.15, whereas for a traditional data center it is 2.0. Interestingly enough, the cooling for the modular data center is 0%. One example of how this is done is swamp cooling (aka evaporative cooling)…put cold water in front of fans. (ES: Rob Farley tells me this is how it’s done in Australia.) Data centers have evolved significantly since the late 1980s. There are over 100 data centers in more than 40 countries, with more than 1 million servers. What does Microsoft consider in site selection? There are over 35 factors; the top 3 are proximity to customers, energy and fiber infrastructure, and a skilled workforce.
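The PUE formula is simple enough to check by hand, but here it is as a short Python sketch. The 1,000 kW IT load is a hypothetical figure; the two ratios are the ones quoted in the keynote.

```python
def pue(total_facility_power_kw: float, it_equipment_power_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_power_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_power_kw / it_equipment_power_kw


def overhead_kw(it_equipment_power_kw: float, pue_ratio: float) -> float:
    """Power drawn by everything that isn't IT gear (cooling, UPS losses, lighting, ...)."""
    return it_equipment_power_kw * (pue_ratio - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical 1 MW of servers, storage, and network gear
    for label, ratio in [("modular", 1.15), ("traditional", 2.0)]:
        total_kw = it_load_kw + overhead_kw(it_load_kw, ratio)
        print(f"{label:12} PUE={pue(total_kw, it_load_kw):.2f} "
              f"overhead={overhead_kw(it_load_kw, ratio):.0f} kW")
```

Put another way: at a PUE of 2.0, every watt delivered to IT equipment costs another watt of overhead, while at 1.15 the overhead is about 0.15 W per IT watt (roughly 150 kW instead of 1,000 kW for a 1 MW IT load).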
The main takeaways for why cloud: elasticity, no capital expenditure (capex), pay-per-use, focus on the business, and fast time to market. This is why cloud computing is special.
What is a cloud database? Everything in the cloud is a service, so you’re getting a database as a service. Cloud services have 3 layers: infrastructure, platforms, and applications (and these are also all services). The Microsoft cloud has the same layers: infrastructure services, platform services (e.g. Windows Azure, SQL Azure), and application services. When you have a data center on site, you manage everything. With infrastructure as a service, part of that stack is outsourced to a vendor. With platform as a service, you’re responsible for the application and data, and everything else is outsourced to the vendor. With software as a service, you outsource everything. Dr. Nehme takes this and does a “pizza-as-a-service” analogy:
- On-premises = you buy everything and make the pizza at home
- IaaS = take and bake (pick up the pizza, you cook it at home)
- PaaS = pizza delivered
- SaaS = dining in the restaurant
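The pizza analogy boils down to who manages which layers of the stack. Here is a small Python sketch of that split. The nine layer names are a commonly used breakdown assumed for illustration; the keynote only said that IaaS outsources part of the stack, PaaS leaves you the application and data, and SaaS outsources everything, so treat the exact IaaS boundary as an approximation.

```python
from typing import Dict, List, Tuple

# Assumed, commonly used breakdown of the stack (top to bottom); not from the keynote.
STACK: List[str] = [
    "applications", "data", "runtime", "middleware", "operating system",
    "virtualization", "servers", "storage", "networking",
]

# How many layers, counted from the top of STACK, the customer still manages.
CUSTOMER_LAYERS: Dict[str, int] = {
    "on-premises": len(STACK),  # make the pizza at home: you manage everything
    "IaaS": 5,                  # take and bake: hardware and virtualization outsourced
    "PaaS": 2,                  # delivered: you own only the application and data
    "SaaS": 0,                  # dining in: everything is outsourced
}


def responsibilities(model: str) -> Tuple[List[str], List[str]]:
    """Split STACK into (customer-managed, vendor-managed) layers for a model."""
    n = CUSTOMER_LAYERS[model]
    return STACK[:n], STACK[n:]


if __name__ == "__main__":
    for model in CUSTOMER_LAYERS:
        yours, vendors = responsibilities(model)
        print(f"{model:12} you: {', '.join(yours) or '-'} | vendor: {', '.join(vendors) or '-'}")
```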
IaaS in the database world: you must still manage provisioning, backups, security, scaling, failover, replication, tuning, performance, etc. Dr. Nehme calls this “lift and shift”: take an “earth” (on-premises) version of a database and put it in the cloud. Existing applications don’t need to be modified; they just need to point to the version of the DBMS in the cloud.
PaaS = DBMS as a service: select the cloud vendor, select a DBMS. Here, the cloud vendor manages provisioning, backups, security, tuning, failover, etc. There might be some changes to the language surface compared to an earth version of the database.
SaaS = select the cloud vendor, select a cloud app (e.g. SharePoint). The whole stack is outsourced to the cloud vendor.
Database as a service examples:
- Managed RDBMS (SQL Server)
- Managed NoSQL (DocumentDB, MongoHQ)
- Cloud-only DBaaS (DynamoDB, Google F1)
- Analytics-as-a-Service (HDInsight, EMR)
- Object stores (Azure Storage, S3)
Why virtualization? It’s a huge enabler for cloud computing. Unfortunately, many servers are grossly underutilized, and virtualization was developed to put those resources back to work. However, there are bottlenecks with these resources. What can be virtualized? CPU, network, memory, and disk. Keep in mind that there is no free lunch: virtualization comes with limitations. You lose direct access to the computing resources and now have an indirect path. Hiding the details of the physical resources is also unfortunate when it comes to configuration. In addition, virtualization always causes some degree of performance penalty. Use cases are consolidation, migration and load balancing, and high availability. For consolidation, if CPU requirements are high for one server and IO requirements are high for a second server, consolidating those two might be ideal (and can also equate to energy savings).
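Here is a minimal Python sketch of the consolidation idea: two workloads can share a host when their combined demand on each resource fits. The demand figures and the 90% headroom threshold are hypothetical; real consolidation planning also has to consider memory, network, and how peaks line up over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demand:
    cpu: float  # fraction of one host's CPU capacity (0.0 - 1.0)
    io: float   # fraction of one host's IO capacity (0.0 - 1.0)


def can_consolidate(a: Demand, b: Demand, headroom: float = 0.9) -> bool:
    """Two workloads fit on one host if their combined demand on every
    resource stays at or under the (hypothetical) headroom threshold."""
    return a.cpu + b.cpu <= headroom and a.io + b.io <= headroom


if __name__ == "__main__":
    cpu_heavy = Demand(cpu=0.70, io=0.10)
    io_heavy = Demand(cpu=0.10, io=0.70)
    print(can_consolidate(cpu_heavy, io_heavy))   # True: complementary profiles
    print(can_consolidate(cpu_heavy, cpu_heavy))  # False: both compete for CPU
```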
With migration and load balancing, assume one machine hosts a VM running an RDBMS that gets overloaded. The VM could be migrated to another machine to help maintain performance. And with high availability, there is one machine with a VM and a backup machine with the VM image; when a failure is detected, the image is restarted on the backup so the server stays up and available.
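A similarly simplified sketch of the migration and load-balancing use case: if a host is overloaded, move its busiest VM to the least-loaded host that can absorb it. The greedy policy, the 85% threshold, and the host and VM names are all assumptions for illustration, not how any particular hypervisor manager decides.

```python
from typing import Dict, Optional, Tuple

Placement = Dict[str, Dict[str, float]]  # host -> {vm name: fraction of host CPU used}


def host_load(placement: Placement, host: str) -> float:
    return sum(placement[host].values())


def plan_migration(placement: Placement,
                   threshold: float = 0.85) -> Optional[Tuple[str, str, str]]:
    """Return (vm, source host, target host) for one migration, or None."""
    for source in placement:
        if host_load(placement, source) <= threshold:
            continue  # this host is fine
        # Try to move the busiest VM off the overloaded host.
        vm, load = max(placement[source].items(), key=lambda kv: kv[1])
        candidates = [
            h for h in placement
            if h != source and host_load(placement, h) + load <= threshold
        ]
        if candidates:
            target = min(candidates, key=lambda h: host_load(placement, h))
            return vm, source, target
    return None


if __name__ == "__main__":
    hosts = {
        "host-a": {"rdbms-vm": 0.60, "web-vm": 0.35},  # 95% busy: overloaded
        "host-b": {"batch-vm": 0.30},
        "host-c": {},
    }
    print(plan_migration(hosts))  # ('rdbms-vm', 'host-a', 'host-c')
```

Here host-b is skipped because adding the 60% RDBMS VM would push it to 90%, over the threshold, so the idle host-c is chosen.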
There are four common approaches to multi-tenancy (with a lodging analogy):
- Private OS (SQL Server in a VM) – private apartment
- Private Process/DB (MongoHQ) – private room
- Private Schema (Azure SQL DB) – shared room
- Shared Schema (Salesforce) – shared bed
What’s the big deal with this? When you consider database as a service, what are the requirements for your database and your data? If application independence is important, don’t go with a shared schema approach. You must do a cost-benefit analysis: given the pros and cons, what works best for you?
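To see why the shared schema model (the “bed” in the lodging analogy) needs care, here is a minimal Python sketch using the standard library’s sqlite3 module: all tenants’ rows sit in one table and are separated only by a tenant_id column, so every query has to filter on it. The table, column, and tenant names are hypothetical; this is not how Salesforce or any other vendor actually implements it.

```python
import sqlite3


def make_shared_schema() -> sqlite3.Connection:
    """One in-memory database, one table, rows for every tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " tenant_id TEXT NOT NULL,"
        " order_id INTEGER NOT NULL,"
        " amount REAL NOT NULL,"
        " PRIMARY KEY (tenant_id, order_id))"
    )
    return conn


def add_order(conn: sqlite3.Connection, tenant_id: str, order_id: int, amount: float) -> None:
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (tenant_id, order_id, amount))


def tenant_total(conn: sqlite3.Connection, tenant_id: str) -> float:
    # Isolation depends entirely on this WHERE clause: drop it and one
    # tenant's query would see every other tenant's rows.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_shared_schema()
    add_order(conn, "contoso", 1, 10.0)
    add_order(conn, "contoso", 2, 5.5)
    add_order(conn, "fabrikam", 1, 99.0)  # same order_id, different tenant: allowed
    print(tenant_total(conn, "contoso"), tenant_total(conn, "fabrikam"))
```

The private-schema or private-database options trade this single-table efficiency for stronger separation, which is exactly the cost-benefit question above.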
Service Level Agreements…when people talk about the cloud, they talk about SLAs. An SLA is a common understanding about services, guarantees, and responsibilities, and it has a legal component and a technical component. Service level objectives are measurable characteristics such as performance goals, deadlines, constraints, etc. Think of this in terms of availability and “nines”. If you require four nines (99.99%) of uptime, that’s about 4 minutes of downtime per month. Three nines (99.9%) is about 43 minutes per month. Just one nine can make a big difference. Container-based hardware is three-nines reliable, but with SQL DB they are delivering four-nines reliability.
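The nines arithmetic is easy to check. Here is a minimal Python sketch, assuming a 30-day month (43,200 minutes):

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month: 43,200 minutes


def allowed_downtime_minutes(nines: int, period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Downtime permitted by an availability of `nines` nines (e.g. 4 -> 99.99%)."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001
    return period_minutes * unavailability


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        availability = 100 * (1 - 10 ** (-n))
        print(f"{availability:.3f}% -> {allowed_downtime_minutes(n):8.2f} min/month")
```

Four nines allow 43,200 × 0.0001 ≈ 4.3 minutes per month and three nines about 43.2 minutes, which matches the numbers quoted in the keynote.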
Three main concepts behind Azure SQL DB:
- Account – 0 or more servers
- Server – 1 or more databases
- Database – standard SQL objects
This was designed with high availability in mind, which means there are multiple replicas of the data: a primary and two secondaries. If the primary’s node goes down, a secondary becomes the primary, and the data is replicated again so that there are once more two secondaries. Reads are completed on the primary, and writes are replicated to the secondaries. There are four layers:
- client – used by the application to communicate directly with SQL Database
- services – the gateway between the clients connecting to SQL DB and the platform layer where computation occurs; handles provisioning, billing, and connection routing
- platform – the physical servers and services that support the services layer above, including SQL Server and management services
- infrastructure – IT administration of the physical hardware and OS
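The primary/secondary replication scheme described above can be sketched as a tiny state machine in Python. This is a deliberate simplification: the node names are hypothetical, promotion simply picks the first secondary, and the real service’s failure detection and election logic are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaSet:
    primary: str
    secondaries: List[str]
    spare_nodes: List[str] = field(default_factory=list)

    def read_node(self) -> str:
        # Reads are served by the primary.
        return self.primary

    def write_nodes(self) -> List[str]:
        # Writes land on the primary and are replicated to each secondary.
        return [self.primary] + self.secondaries

    def fail(self, node: str) -> None:
        """Handle a lost node, then re-replicate back up to two secondaries."""
        if node == self.primary:
            if not self.secondaries:
                raise RuntimeError("no secondary left to promote")
            self.primary = self.secondaries.pop(0)  # promote a secondary
        elif node in self.secondaries:
            self.secondaries.remove(node)
        else:
            return  # node is not part of this replica set
        while len(self.secondaries) < 2 and self.spare_nodes:
            self.secondaries.append(self.spare_nodes.pop(0))  # new copy on a spare


if __name__ == "__main__":
    rs = ReplicaSet("node1", ["node2", "node3"], spare_nodes=["node4", "node5"])
    rs.fail("node1")
    print(rs.read_node(), rs.write_nodes())
```

After node1 fails, node2 is promoted to primary and a fresh secondary is built on node4, so the set is back to one primary and two secondaries.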
Applications connect over the internet to the Azure cloud, reach the load balancers, and hit the gateways, which are connected to the SQL DB nodes; underneath all that is the scalability and availability fabric, which handles failover, replication, and load balancing. What does a SQL node look like? It’s a machine with a SQL instance that holds a single physical database for the entire node. The database files and log are shared across every logical database, so you might be sharing log files with someone else. Each logical database is a silo with its own independent schema (the shared room analogy).
What if we create a database or run a query; how does it work? When you create a new database, the Azure service identifies where to put the primary, then puts secondaries on two other machines. When a user runs a query, the SQL Azure gateway service identifies where the primary is located, routes the query to it, performs the computation, and returns the results to the user. If you want to know more, go to the sessions here at PASS 🙂
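The create-and-route flow can be sketched in Python as a toy gateway that keeps a map from each logical database to its replica locations. Random placement and the machine and database names are assumptions; the keynote didn’t describe how the real service chooses machines.

```python
import random
from typing import Dict, List


class Gateway:
    """Toy model of database creation and query routing.
    Placement policy and names are hypothetical."""

    def __init__(self, machines: List[str], seed: int = 0) -> None:
        if len(machines) < 3:
            raise ValueError("need at least three machines: one primary + two secondaries")
        self.machines = machines
        self.rng = random.Random(seed)
        # logical database name -> [primary, secondary, secondary]
        self.placement: Dict[str, List[str]] = {}

    def create_database(self, name: str) -> List[str]:
        # Pick three distinct machines: the first hosts the primary,
        # the other two host the secondaries.
        replicas = self.rng.sample(self.machines, 3)
        self.placement[name] = replicas
        return replicas

    def route_query(self, name: str) -> str:
        # Look up where the primary lives and send the query there.
        return self.placement[name][0]


if __name__ == "__main__":
    gw = Gateway(["m1", "m2", "m3", "m4", "m5"])
    replicas = gw.create_database("salesdb")
    print("replicas:", replicas)
    print("query routed to:", gw.route_query("salesdb"))
```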
Next up: my role as a DBA. From Dr. Nehme: “I have to be honest, I tried to put myself in your shoes.” She asks: do we still need a DBA in the cloud era? Dr. Nehme says yes. Cloud doesn’t have to be an either/or choice. You can augment on-premises systems with the cloud (remember the stretched tables example from yesterday). This is the time to refresh your skills and adjust to the cloud era. The cloud was not designed to be a threat to DBAs. Compare the number of DBAs with the number of database apps and with hardware compute capacity: the number of DBAs is much smaller than both. This is where cloud computing can help, by addressing underutilized hardware and alleviating some of the work of overburdened DBAs. Dr. Nehme’s recommendation is to take your current skills, add cloud skills, and call yourself a Cloud DBA.
Some key things to remember:
- cloud database = a service, designed to reduce administrative and operational costs (pay as you go, elasticity); there is a wide spectrum of solutions (rent a database, cloud database).
- If you get confused about cloud deployment options, remember the pizza analogy
- Do the cost-benefit analysis, and embrace the cloud: it presents a lot of opportunities.
Dr. Nehme finishes up and takes a minute to thank Dr. DeWitt and has him come on stage. She hints at the possibility of a keynote with both of them in the future. I’d love that, but I’d also be happy to just hear Dr. Nehme again 🙂 Great session. Perry is overwhelmed…
Edit 10:09 AM: In the originally published post I referred to Dr. Nehme as Rimma…I think because Dr. DeWitt always refers to her that way (and she refers to him as David). I updated the post to fix that. I also wanted to add a new pic (Dr. Nehme’s keynote, and having the chance to chat with her, was one of this week’s highlights):