Taking the Azure SQL Database row-level security preview for a spin

The security announcements around Azure SQL Database keep coming. Auditing was implemented a few months ago, and today it was followed by a preview of row-level security. There were also announcements around transparent data encryption (TDE) and a new dynamic data masking feature in future but these aren’t available, even in preview, yet. The row-level security announcements like the one in the SQL Server Team Blog preceded posting the documentation page, so if you originally got a “page not found” it’s up there now.

So I thought I’d provision-up a database (takes about 30 seconds if you have an Azure subscription) and take RLS for a spin. It’s quite important to read the documentation page first, not only because there’s a nice demo in there, but also to get a feel for the implementation details, i.e. what the feature is and what exactly it’s documented to do. I’ll try not to simply regurgitate the documentation here. That’s difficult because it’s quite a nice one-page summary with additional info in the “other resources” section.

RLS is implemented by means of a new schema-scoped object, the SECURITY POLICY object, and inline table-valued functions (TVFs) that implement the policy. The CREATE SECURITY POLICY DDL uses filter predicate clauses to tie the TVFs to individual tables, one filter predicate per table. The TVFs return a one-column table, with a column name of fn_securitypredicate_result. You can use any of the mechanisms and built-in security functions to determine which principals you’re dealing with and you can access which rows they can access with the TVF parameters can be column names. I’d stay away from security functions marked deprecated in books online. Remember, an inline TVF only contains a single SQL SELECT statement.

There a couple of nice things about using inline table-valued functions. First off, the functions need to be defined WITH SCHEMABINDING so there’s no way for someone to simply alter the function unless they drop and re-create the SECURITY POLICY object. In addition, the fact that it’s an inline TVF means that the function shows up as a specific “Filter” iterator in query plans because it’s inlined (unlike multi-statement and scalar TVFs which are not inlined, i.e. have a separate query plan of their own).

You can use ALTER SECURITY POLICY to turn the filters on and off, and to ADD and DROP filters. There are two new metadata tables, sys.security_policies and sys.security_predicates to allow visibility.

Naturally, DDL against SECURITY POLICY and the TVFs can be audited (I did try this out to check that it appeared in the audit file). Because the SECURITY POLICY object references the TVFs, I actually had to grant a number of permissions to allow “Manager” (from the first example in the docs) to create the policy (note: I changed the example a bit to define the SECURITY POLICY in the “Security” schema instead of dbo schema):

GRANT SELECT ON Security.fn_securitypredicate TO Manager;
GRANT REFERENCES ON Security.fn_securitypredicate TO Manager;
GRANT REFERENCES ON SCHEMA::dbo TO Manager;  — Needed because the table lives in DBO schema

— This allows ALTER or DROP, but not to CREATE unless ALTER ANY SECURITY POLICY is also given
— This may not be granular enough, unless the RLS objects are the only ones in the schema.

It looks like this feature has the potential to make it into the box-product SQL Server in future (the docs refer to it not being supported with memory-optimized tables for example, which Azure SQL Database doesn’t support yet), so even if you don’t use Azure SQL Database, the documentation page provides a good opportunity to CAREFULLY evaluate whether it will meet your needs, and get your comments/requests in now. There are four specific places in the documentation page to look at:

1. Limitations during the preview (e.g. limitation on views)
2. Limitations and restrictions (e.g. DBCC SHOW_STATISTICS)
3. The Description section (e.g. INSERT anywhere is allowed, not filtered)
4. A special section called “Security Note: Side-Channel Attacks”

I’d propose that the “INSERT anywhere not blocked” limitation is the biggest worry (modulo the side-channel attacks), especially in the multi-tenant (listed as a specific use-case) scenario. Although conceivably you might ameliorate this with an INSERT trigger. This doesn’t address the problem of someone changing CONTEXT_INFO though (as is mentioned in the doc), you’d need to set up different roles for each tenant as well.

Cheers, Bob (@bobbeauch)

Azure SQL Database V12 Preview – Spatial Fully Functional

Yesterday’s blog post about Azure SQL Database V12 mentioned that one of the features I was particularly interested in seeing/testing were the spatial features. Interestingly, this was not even mentioned as an enhancement in the technical “What’s New” page. This may be due to the fact that the only place where a list of the limitations of existing Azure SQL Database spatial implementation was documented is Ed Katibah’s article about “Updated Spatial Features in the SQL Azure Q4 2011 Service Release“. Q4 2011 was the last update for ASD spatial that I was aware of. As an example, the Transact-SQL books online page for CREATE SPATIAL INDEX makes no mention of the fact that ASD doesn’t support AUTO_GRID spatial indexes and spatial index compression.

So I fired up an S1 instance of both a current (V11) database and a new V12 preview database (to ensure that I hadn’t missed an update to V11), and tested out all the features that Ed mentioned. They were never updated in pre-V12, but EVERYTHING is there in V12. Very cool!

You can refer to Ed’s article for a complete list (not supported in V11, supported in V12 and SQL Server 2012/2014), but here’s some of the highlights:
Spatial objects larger than a hemisphere
FULLGLOBE, circular/curve types, and associated methods
AutoGrid spatial index
Compression for spatial indexes
Spatial_Windows_Max_Cells query hint
Spatial Analysis Stored Procedures and Histograms

As long as I was experimenting in a V12 database, I thought I’d try the Codeplex SQL Spatial Tools (http://sqlspatialtools.codeplex.com/) an extention library for spatial built using SQLCLR. As I’d suspected, you can run CREATE ASSEMBLY by with a file system or Azure blob storage reference. Tooling is coming soon, but until then, I had to catalog the assembly from the file system to a “box version” SQL Server and then use SSMS Object Explorer to locate and right-click on the “box” SQL Server Assembly and use “Script Assembly as CREATE” to obtain a “CREATE ASSEMBLY … [from bits]” statement. The create-from-bits worked fine. The failure message using “local DLL” syntax is kind of interesting: “Msg 6585, Level 16, State 1, Line 2 Could not impersonate the client during assembly file operation.”

The SQL Spatial Tools assembly uses SQLCLR UDT, UDAggs, and UDFs, all seemed to work fine. Using SQLCLR along with spatial enables writing speedy spatial manipulation functions because each operation on a spatial object is not a separate T-SQL statement. So these are two features (SQLCLR and SQL spatial) that work well together.

That’s it for now.

Cheers, @bobbeauch

Not a “me-too announcement” blog on Azure SQL Database V12 preview

In general, I usually hate “me too” announcement blog posts. Over the years, I’ve considered it less than useful to simply repeat “Product XXX released to RTM” when nice marketing/technical announcements have already been posted for the folks that are responsible. Or even to rehash books online pages, as another example.

In keeping with that concept, it was announced today that Azure SQL Database (the PAAS offering) is previewing a V12-level (that’s SQL Server 2014-major version number) version of the database. If you’ve ever heard me harping (politely, I think 😉 on the fact that Azure SQL Database (ASB) has been V11-level since its release, that’s great news by itself.

The marketing announcement is here. The somewhat more technical “what’s new” announcement is here.

So what’s there to add? Before actually trying this out, here’s some things that came to mind.

1. There’s no full-text search feature included in ASB V12 that I can see. BUT… because Change Tracking works with ASB V12, we should be able to use Liam’s lovely sample (that syncs with Azure Search but requires SQL Server Change Tracking enabled) to sync ASB V12 with Azure Search now. Great.

2. We’re going to need new tools (SSMS, SSDT, PowerShell Azure cmdlets) to work with the new ASB V12 features. Updates, the tools are here: Azure PowerShell 0.8.12 that supports ASB V12 is available as of 12/16/2014. SQL Server 2014 CU5 contains support for ASB V12 in SSMS. Previews of SSDT and DAC that support ASB V12 were announced on 12/17/2014.

3. There’s SAFE assembly SQLCLR support. Since I don’t see CREATE ASSEMBLY functionality that works with Azure Blob as DLL, I think we’ll need to use CREATE ASSEMBLY…. [from bits] for now. There’s no EXTERNAL ACCESS support. I can see that because you shouldn’t attempt to trundle around in a file system or registry that isn’t yours. But how about the ability to call a web service? That’s EXTERNAL ACCESS and a (sort-of) natural for a web-based offering. I always caution against doing this (web service thing) synchronously in on-prem SQL Server (what if the website is down?), but there’s no service broker in ASB to make it asynchronous. But anyway, it’s not supported.

4. Spatial support in ASB was sort-of a 75/25 WRT new SQL Server 2014 features. Can’t wait to try these out… as well as trying out the Extended Event sessions for real. They should work much better than when I got overly excited about some early artifacts a few months ago.

5. This is the first version of ASB (that I can remember) that has some in-database functions segregated to premium databases (e.g. columnstore, parallel queries). Ancillary service DB functions (e.g point-in-time restore) were always different-by-edition.

That’s all for now. Guess there’s a lot of tires to kick here. Thanks for all the fish.

Cheers, @bobbeauch

AzureML: What components are used by the sample experiments?

A few months ago, I embarked on a project to learn more about data mining, machine learning and, as a prerequisite, statistics. I was tired of hearing “statistics show that…” without enough “proof” and have taken a long side-trip toward Statistical Inference and Reproducible Research. I’d be much more interested in reading a scientific paper that comes with code and data than one with paper and forgone conclusions (possibly influenced by grant-money) only. And after hearing “this would be clearer with an understanding of calculus”, I brushed up on my calculus and linear algebra as well (it had been over 40 years ago, after all). I’d been working in languages like Octave and R and when Azure Machine Learning (AzureML) arrived, the idea that you could use R scripts in AzureML experiments was intriguing.

AzureML features a drag-and-drop canvas-based project system (the projects are called “experiments”). I tend to like programming with code over drag-drop-cut-paste, but the killer feature of AzureML may be that you can publish an endpoint that enables singleton or batch scoring, using your work. But back to the R scripts.

AzureML currently has about 25 sample experiments to use as templates or exemplars. After looking at a few of them, I thought it would be nice to find the experiments that use R scripts and see how the samples used them. Turns out that you run into problems doing this automatically. You can’t, at this point, persist an experiment using, say, an XML vocabulary or some JSON format. At all. To me, this was quite strange, because you can’t pick them up and move them, like you can with say, an SSIS package. And since the only artifacts in the Azure storage account are the blob containers “experimentoutput” and “uploadedresources”, neither one of which, as far as I could see, contains the experiment “definition” (i.e. the project).

I validated this observation (on the AzureML forum) and put in a request on the AzureML improvement suggestions site. Since there already is a fairly standard XML vocabulary called PMML, I suggested using that. Other reasons for having this available would be for version control and providing an offline format in the event that you inadvertently delete your AzureML storage account. In addition to the uses I’ve already mentioned.

To end this post with something useful, I did do the search for R Scripts in the sample AzureML experiments “by hand”, by copying each sample opening the canvas and typing in each unique component use (but not the parameters or scripts) in notepad. Enclosed is a file that contains this information, I hope you’ll find it useful. Please excuse any inadvertent typos.

Hopefully in future, this would be a simple case of querying the set of “experiment projects” with XPath. Or if you’re XML-query-phobic, with grep.

Cheers, @bobbeauch

azureml_sample-component xref

Book Review: Chris Webb’s “Power Query for Power BI and Excel”

I’ve been experimenting with Power Query and the rest of the Power BI suite in Excel (desktop edition) since it was originally released. But, like most folks that aren’t completely centered in that part of the data-based offerings, the plethora of new functionality and new releases made it difficult to keep up. The online help has improved, but the distinction between Office 365 Power BI  and Power BI desktop and the fast release cadence (updates appear almost monthly), made it difficult to write books (which require pre-production time). Having met Chris Webb at a few conferences, when I heard he stepped up to this challenge with a Power Query book entitled “Power Query for Power BI and Excel“, I was interested in reading.

The book weighs in at 272 pages, and I thought it might take a couple of weeks to do it justice. Once I started however, I was hooked and consumed it in a couple of days. Chapter 1 was a nice introduction that told the story of Power Query’s reason for existence, and described the differences between the experience with different versions (Office 365 Power BI vs Excel 2013 vs Excel 2010). It wouldn’t be too big of a stretch to call this an incantation of SSIS-like functionality for self-service BI. Reading quickly through chapters 2-4, about data sources-transformations-destinations was a thorough demonstration of the functionality of the product, and clarified exactly what was supported. I was introduced to the “Excel Data Model” concept here as well.

It was the chapter on M that was the gem of the book for me. I’d realized that you could extend your Power Query queries with M functions, but didn’t realize that every query was actually a script written in M. Chapter 5 describes M the language and how you can use it to not only embellish Power Query queries, but to even write them by hand. M code that comprises Power Query queries looks vaguely reminiscent of Hadoop Pig scripts, although obviously the surface area isn’t the same. I was impressed by the author’s treatment in “Introduction to M”.

The chapter on Power Query in Power BI Office 365 told me what I wanted to know about how Office 365 extends the functionality of Excel desktop Power Query into a shared, more enterprise-y offering (e.g. query sharing), as well as how it integrates with Sharepoint. And the Multiple Queries and Power Query Recipes chapters helped get me into the “thinking of solving problems with Power Query” mindset.  The first sentence of the recipes chapter “…when you are learning about a new tool it isn’t enough to know what each button does when you press it, or what each function returns when you call it” was right on target.

I had a few minor quibbles. I’d like to have seen a mention of the exact version of Power Query (version number and release date) that the book covers. I knew that new releases are coming out monthly, and was working with the version that appeared a few days ago, but looking at a screenshot that said “Apply and Close” when my version said “Close and Load” (as well as other differences) in Chapter 1 was a bit disconcerting. But that’s more of a “don’t constantly change the GUI on something that’s supposed to be consumer-focused” comment on the product. Or maybe it looks different in the Office 365 version, I didn’t know.

And about the product…I was also surprised, product-wise, that they didn’t support OLE DB (at least for the Analysis Services OLAP provider), or even ODBC, although you can shoehorn these in via the vanilla Excel “data sources” tab. I guess it was to keep the GUI rich and consistent, functionality-wise, but if you can do it for OData… I did especially like the non-traditional data sources like HTML tables/pages and Facebook, although a generic JSON data source would also be useful. Maybe I just didn’t see where the generic JSON functionality lives. The “data sources” book chapter did make the level of support very clear, though.

Overall, I’d recommend this highly recommend this book if you’re getting started, or even if you think you’re pretty far along with this tool. The M and Office 365 Power Query chapters themselves may be worth the price of admission.

Cheers, Bob (@bobbeauch)

New Azure services and evolution of the Service/SQL Server relationship

Today, two new service offerings for the Azure platform were introduced, DocumentDB and Azure Search. These are exciting in themselves and worth a look-see but, to me, they are also occurrences of the phasing out of the “database as an application server” concept, formalized in the SQL Server 2005 timeframe. At least, in the cloud. Did everybody already notice, or was it too subtle?

Services moving inside the database (where they are more tightly integrated with the data they use) predates SQL Server 2005. Rather than rely on the OS for scheduling and alerting activities, SQL Server has always had its own facility (and separate Windows service) for this, known as SQL Agent. A mail component is included in the database as well. But possibly the first big service to be moved into the database was Microsoft Search Server. It was originally grafted on, but over time came to be completed integrated in SQL Server as the fulltext search feature. SQL Server 2005 was the watershed for this concept, including the XML/XQuery, Service Broker, event and query notifications and HTTP endpoints, with SQLCLR providing a programming substrate to be used in conjunction with T-SQL for the new services. This concept was formalized in the Dave Campbell’s paper “Service Oriented Database Architecture App-Server Lite”. Interestingly, these features are often artifices over relational constructs (e.g. XML and fulltext indexing are relational side-tables). There was even lots of work put into using SQL Server as a substrate for the Windows file system (WinFS).

The first move away from this concept of “database as substrate for all” was the removal of HTTP endpoints and it’s replacement by what finally came to be called the OData protocol. OData serves data as either an XML format (AtomPub) or JSON format, which is why, although it’s often requested, it’s doubtful we’ll see “native JSON support” inside SQL Server soon.

Then….to the cloud…

One of Microsoft’s first PaaS offerings, now known as Azure SQL Database, solidified what was to come. Azure SQL Database is a relational database engine only, shipped without most of the “app-server” services inside SQL Server proper. Features that do not exist inside Azure SQL Database include not only the features mentioned above (XML/XQuery survives, but XML indexes, as far as I’m aware, do not), but other nice infrastructure features such as replication. These are to be offered by “the platform”, as in PaaS.

DocumentDB and Azure Search are just the latest pieces of the platform. SQL Agent functionality can be implemented using Azure scheduling and notification services. Service Broker is implemented/subsumed by Azure Service Bus. OData services are part of the service layer, not the database layer. JSON is tightly coupled in DocumentDB, JSON documents subsuming XML documents in many use cases, e.g. document databases rather than XML databases.

A discussion of what precipitated the move from application servers and service-based architectures to database as application server to separate services with service coordination programmed into each application is a whole other set of blog entries…or discussions in bars.  As is the comparing-and-contrasting of how this is being done in other ecosystems, e.g. cluster as substrate for all and a central resource negotiator (YARN) that manages the cluster. Note: SQL Server (and likely SQL Azure Database) uses it’s own Resource Governor to divvy-up the resources. And there’s always the question of who (application and administration-wise) manages and orchestrates the integration of these disparate services; now that’s a occupation that will certainly be in demand.

As far as SQL Server “proper” goes, DBAs need not worry about having less to do. The latest versions now integrate a column-based “database” and its optimizations, and an in-memory “database” (with different storage and isolation paradigms) into the mix. But these are data engines rather than services. Also note the cross-pollination between different parts of “the platform”; the XVelocity engine also lives quite nicely in SQL Server Analysis Services and in Excel. And there’s already rumblings of the implementation of DocumentDB using lessons learned with SQL Server’s in-memory implementation. So far, it’s interesting that neither columnstore nor hekaton have yet to make an appearance in Azure SQL Database. The same “redistribution, integration, and evolution of services” is going on in the ETL/Analysis/DataMining/Machine Learning Microsoft ecosystem too.

One thing about the technologies surrounding software; we always “live in interesting times”.

Cheers, Bob (@bobbeauch)

Don’t drop empty Azure SQL Database servers

In a previous blog entry, I mentioned changes in SQL Azure Database’s new tiers. Another one, that I think is indirectly related to the new tiers, is the disappearance of a message that used to appear (on the portal) when you deleted the last database on a particular Azure SQL Database server. That message asked “Do you want to delete the server as well?”. Lately, this message doesn’t appear.

I’d always answered “no” to the message in the past; having a server hanging around without databases didn’t cost you anything and if I restored (imported) the database in future, I didn’t want to change all my connection strings. The server would, it was said, go away after a certain number of days (60?) without databases.

With the introduction of the new tiers and restore/recovery there’s a better reason not to delete servers with no databases. Your collection of “Dropped but restorable databases” (the PowerShell property for these is RestorableDroppedDatabases) is tied to the server. When the server is deleted, you can’t restore them any more. And, as far as I know (the cost was likely amortized up-front) servers with only RestorableDroppedDatabases still don’t cost you money.

Because the Azure Portal has more mention of “create a new database”, the server concept has sort of taken a back seat. A server in Azure SQL Database is like a “virtual SQL Server instance”. It consists of a master database (similar in concept, but not completely in implementation to SQL Server’s master), an associated administrative login (as well as other logins you’ve added) and a set of firewall rules. In addition to associated user databases. For a complete list of associated properties (and methods and events), use the following lines of code in PowerShell, after getting a credential:

$ctx = New-AzureSqlDatabaseServerContext -ServerName “myserver” -credential $cred
$ctx | get-member

Currently (AzurePS 0.83), attempting to retrieve all the property values returns an exception (you get the same exception trying to access the RecoverableDatabases property so it might be that), so you’ll need to retrieve the values that you want by name.

So don’t forget that servers are important in the new tier world, and dont drop ’em unless you don’t care about (or you don’t have any, because of time limits) RestorableDroppedDatabases.

Cheers, @bobbeauch

A few tweaks to Azure SQL Database’s new tiers

One of the things that make Azure SQL Database difficult to keep track of is that details of some database behaviors can change, usually without notice or announcement. There were supposed to be announcements on one of the Azure blogs, but that hasn’t happened for Azure SQL Database in over a year, except for the “big announcements” this April. There is no way to confirm that “things used to work this way, and now they don’t”. This is particularly difficult with preview features that are more likely to change during the preview.

That being said, there are a few changes that have happened with respect to the new Basic/Standard/Premium tiers preview. The first one is that you can mix “new tier” and “old tier” databases on the same server now. Previously they didn’t mix; a server could either contain old or new tier databases, but not both. I currently have a server with a Web edition and a Basic edition database. I first the server after the preview started (PowerShell identifies these as “V2″ servers); I don’t have an “old tier” server (V1?) to test out whether it supports the new tiers. Using an “old tier” database on the same server as a “new tier” database doesn’t convey “new tier” behaviors. That is, deleting the “new tier” (Basic edition) database makes it available for restore under the “Deleted Databases” portal tab. Deleted the “old tier” (Web edition) database doesn’t.

Another change is in the implementation of restore/recovery of databases (and deleted databases). Not only have the Basic and Standard tiers “length of available backup” changed (Basic from “most recent daily backup” to “point in time within 7 days” and Standard’s point-in-time from 7 to 14 days), but all editions (including Basic) support point-in-time restore. This also means that the Start-AzureSqlDatabaseRestore (for editions that support point-in-time) now works on all new editions, whereas Start-AzureSqlDatabaseRecovery (for editions that don’t support point-in-time) doesn’t seem to work at all, with any edition.

Finally, the “Automated Export” feature (where you want to backup on a schedule and also keep your bacpac file) is now available for the new tiers as well as the old. Check it out under the “Configure” tab on any SQL Azure Database, old tier or new.

Cheers, @bobbeauch

Extended Events In Azure SQL Database. For real.

I’ve just returned home after teaching two weeks of SQLskills Immersion Events. Total post-event exhaustion for two days. Trying to produce some nice demos on the new SQL Azure Database tiers and services for my SQL Azure Database A-Z preconference ssession at SQLTuneIn. And then this happens…

About a year or so ago, after attending a TechEd talk on XEvents in Azure SQL Database (ASD, the PaaS offering), I looked for and found metadata and blogged about it. The feature never seemed to come to fruition and after a while, I stopped looking.

This morning, I received email from Egbert Schagen, a person I’d met in my conference travels. He let me know that XEvents now appeared to work in ASD. They appeared in the SSMS object explorer. Egbert said he’d seen no announcement for it, and by web search he came across my old blog entry. He thought I might like to know. And he was right.

And so they DO work. I tried this in an Standard S1 Edition database I had and Egbert said they also worked in Web/Business edition. As with almost everything ASD, these event sessions exist at a database level, rather than in master.

There are 12 pre-configured event sessions. It doesn’t look like you can create your own. The sessions are:

The event sessions with an asterisk can be turned on and you can “Watch Live Data” in SSMS. The rest of them look like they require a credential because they write to Azure blob storage. More on that as I try and figure through this with the metadata. It appears you can only run one event session at a time. The second concurrent one always produces a “system is currently busy” message on attempting to start it.

I’m not sure what the “official” status is, but for now it looks like you *can* trace in Azure SQL Database. Great! Thanks, ASD developers. Thanks, Egbert.


Installing and Running HDP 2.1 on Windows

And now, for something completely different. Hadoop.

I’ve been playing with everything I can get my hands on (and can find time for) with Hadoop for a while now. Have tried Hortonworks on CentOS (seems like the Hortonworks “reference” platform), Hortonworks HDP on Windows, and HDInisght. I even think I may have a clue about what big data is and why it matters. But that’s a topic for another time. It’s intruiging that Hadoop on Linux has a built-in GUI dev tools (like Hue) but the Windows’ versions (put out only by Hortonworks/Microsoft) have no GUI except for the NameNode/YARN/HBase status pages, and almost mandate you work on PowerShell. I thought the Linux folks were the bigger command-line wonks. I haven’t tried any other distos (e.g. Cloudera) yet (or tried building it myself for that matter).

Over the weekend, I installed Hortonworks new GA distro for Windows. The main reason I installed it was to try out Hive 0.13 (with vectorized query – think SQL Server columnstore batch mode, ORC files are the columnstore) and Hive under Tez. Think of Tez as a more flexible, powerful, faster MapReduce. I did a single-node install for simplicity. Ran into some problems and anomolies that I’ll mention here. But did get most of what there is working. I say “what there is” because the reference diagram at http://hortonworks.com/hdp/whats-new/ mentions all the things I do see, but the Windows distro doesn’t include Accumulo (another fast data storage offering based on BigTable) and Solr (a search offering). Perhaps the CentOS distro does. And, of course, no Hue. And, although there’s an exe and it’s in the list of Windows services, HWI (Hive Web Interface) doesn’t start up. That’s been reported by others too.

BTW, this version isn’t available on HDInight yet. Maybe soon. Microsoft allocated some engineers to Hive 0.13 (and other phases of the Stinger Initiative), so I’m pretty sure they want it in HDInsight too.

There were the usual potholes in the docs, if you’re trying to follow the docs by rote. First, the docs warn you not to use path names with spaces in the Java or Python installs (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1-Win/bk_installing_hdp_for_windows/content/win-software-install-gui.html). Good idea in any case (don’t have to put quotes around names in command lines) but especially with Unix ports to Windows. But they warn you in the Python directions and Python 2.7 doesn’t install in “Program Files”. Java does. And one of the instructions says you should add (step 5c) e.g C:\Java\jdk1.7.0_45\bin as the value for JAVA_HOME. Don’t use the “bin”. Step 6b is wrong too; the path should include the “bin” directory. Easy to fix. The install reports “JAVA_HOME is not set” if you mess it up. Ask me how I know…I should have been paying attention.

If you choose “Run Hive Under Tez” you must peform Section 6.1 “Setting up Tez for Hive”. But there’s a glitch there too. Step 5: “Copy the Tez home directory on the local machine into the HDFS /apps/tez directory” is wrong too.
%HADOOP_HOME%\bin\hadoop.cmd dfs -put %TEZ_HOME%* /apps/tez
should be:
%HADOOP_HOME%\bin\hadoop.cmd dfs -put %TEZ_HOME%/* /apps/tez
(note the slash after %TEZ_HOME% and before the *)

Until you do that, Hive fails starting up the command line app with /apps/tez/lib not found.

And its likely that step 6 should be:
%HADOOP_HOME%\bin\hadoop.cmd dfs -rmr -skipTrash /apps/tez/conf
because there is no conf3 directory.

With the right copy command the Tez smoke test and Hive under Tez run just fine. I did have trouble with the HCatalog and Hive smoke tests at first (something about hadoop user needing a GRANT, GRANTs are expanded in Hive 0.13), but after enough tries (3 reinstalls of the distro and 7-8 runs of smoke tests) it just started working. Don’t know exactly what I did to make that start happening. So all the smoke tests (including optional components) succeed.

So THANKS folks! Lots more to try. There’s a lovely Hive/Tez/vectorized query tutorial (do it sans-GUI) that shows the new features off. And I’m glad that I can specify precision/scale for decimal types. And all other Stinger features and additional ecosystem components too.

BTW, I ran this on Windows 2012 R2 OS, even though the docs say only Windows 2008 R2 an 2012 OS are officially supported. It lists these under minimal requirements.

Hopefully this information will be helpful to someone else.

Cheers, @bobbeauch