SQL Server 2008: The New Report Designer and Tablix


I’ve had some time to experiment with the new report designer in the November CTP and continue to believe this will be a reasonable alternative to the Visual Studio interface for business users who participate in the report development process, as I mentioned in my post Looking Forward to Report Designer Features in Katmai’s Reporting Services. I think hard-core report developers will be happy about some changes and not happy about others which I’ll explain shortly. In the November CTP, the new report designer features (as well as the Tablix report item) are available only in a separate client tool. Today I’ll comment on my general experiences without getting into a deep dive on everything there is to know about the report designer and Tablix. For that sort of detailed information, check out Books Online here for the report designer and here for Tablix.

 

The first thing that tends to throw people is the absence of a Tablix control on the Insert menu.

 


 

You actually get it when you insert either a Table or a Matrix into your report. There’s no such thing as a table or matrix element anymore. They have both been replaced by the tablix element, which is a hybrid of the two. By choosing table or matrix, you define your starting layout for the tablix, but you can change your mind as you go. I like this feature a lot because one of my clients inevitably wants a report that started its life as a table to turn into a matrix.

 


Here’s a screenshot of a report I was working on:

 


 

The next thing I really like is the ability to put captions into the column headers for rows in my matrix. In the example above, the captions I added are Products and Total. My client who likes to see data in a matrix more often than not also likes column captions – so I am constantly fussing with rectangles and textboxes in the matrix to get it right. (It’s not hard the first time, and it’s not hard to add new column “headers,” but I have found it’s a pain to take them away. Thank goodness SQL Server 2008 Reporting Services offers a new and better solution!)

 

Another nice feature is the ability to have different sets of static columns. Traditionally (that is, in SQL Server 2000 and SQL Server 2005), you could present columns of numeric data only as a single static group underneath the group(s) defined on columns. Here you can see I’ve created something independent of the dynamic columns. The Avg Sale column is to the right of the CalendarYear group. You can tell it’s outside of the group because the bracket in the column handles includes only CalendarYear. Compare this with the brackets on rows – one group is indicated for EnglishProductSubcategoryName and a second group including the first group is indicated for EnglishProductCategoryName.

 

Notice also you don’t see standard expressions in the textboxes. Instead there are expression placeholders, like &lt;&lt;Expr&gt;&gt;. I’m not sure I’m wild about this, but it may be less intimidating to business users. For hard-core types, you can open the Properties pane and see the expression you’re accustomed to seeing in the Value property box. The Properties pane isn’t visible by default, so timid business users don’t need to be frightened by its presence unnecessarily. To set properties, they can use Office-like dialog boxes. I found that a relatively tedious process (by my standards) for setting something simple like a currency format with 0 decimal places, as compared to typing a simple C0 in the Properties pane, but that’s the hard-core developer type in me coming out.

 

Speaking of panes, notice the Data pane on the left. It contains your data source and dataset information. To edit, you just right-click the applicable item. The Built-in Fields folder contains all the global variables like ReportName, PageNumber, etc., making it much easier to enhance your report with these items.

 

Another huge improvement is the addition of the Row Groups and Column Groups panes. Instead of having to navigate through the dialog boxes to get to your group properties – or even to see the hierarchical arrangement of them when using multiple groups – everything is available right next to the report design surface. Just right-click on a group and click Edit to access its properties. To create a group, by the way, you just drag a field from the Data pane into the Row Groups or Column Groups pane and position it where you like. That’s about as easy as it gets!

 

Other cool features – when you enable the ruler, you get more than markings on the vertical and horizontal edges. The position of your cursor becomes a crosshair that travels along those edges so you can get incredible precision on the report design surface. And if you’re having difficulty seeing the details, the slider in the bottom right corner lets you zoom in as much as you like. I can’t tell you how many times I wished I could do this in earlier versions.

 

The report designer has the same paradigm as the standard BIDS designer – design view versus preview. Here’s a section of the report in preview mode:

 


 

Nothing unfamiliar here. However, I did notice that the ReportName global variable didn’t show up here – a little CTP bug – although the ExecutionTime global variable did appear correctly. ReportName did show up when I deployed the report to the server.

 

So now that you see the final report, you can see more clearly that this is a hybrid of a table plus a matrix… table+(matr)ix…tablix! How so? In a standard matrix in previous versions, the row groups would appear in separate columns. Here they appear in a single column – which I can easily do with grouping in a table. I have standard matrix style with dynamic columns but a new capability with the non-grouped column for Avg Sale at the end. Of course, you can nest a tablix inside of a tablix, which can make for some very interesting possibilities, I’m sure!

 

That’s it for today. More to come on Tablix in future blog posts. –Stacia

SQL Server 2008: Data Profiling Task – the Quick Version


Maintaining data quality is important throughout the organization, but certainly BI applications are expected to maintain an even higher standard because we have greater control over the data going into the data warehouse. As with most things in life, the ideal seldom matches the real for a number of reasons. In SQL Server 2008 (November CTP), the new Data Profiling Task in Integration Services should help a lot because not only can you perform data quality assessments on an ad hoc basis, but you can also automate data quality checks by integrating the assessment into the ETL process. One important caveat about this task is that it works only with SQL Server sources (2000 or higher).

 

This one little task has lots of options which I will address over a series of blog posts. Today, I will introduce the Quick Profile feature of the task and show you the output.

 

To get started with the Data Profiling task – whether you’re using the Quick Profile or not – you must define either a file connection or a variable to hold the output. The output will be generated in XML format. In the editor, there is a Quick Profile button to open the Quick Profile Editor. Consider this the quick and easy approach to profiling a single table (or view).

 

Now I must say I did read BOL to get an overview of the task, but one little detail about the connection for the Quick Profile escaped detection. I had dutifully created an OLE DB connection manager to AdventureWorks, because how else does one connect to SQL Server databases? But that connection did not show up in the connection drop-down list in the Quick Profile Editor. (Neither did it give me the option to create one, which might have given me a clue. Most other places to set connections give me the ability to create a new connection on the spot, so this seemed odd to me, but it IS a CTP…). Then I went back and re-read BOL to discover that the Quick Profile editor requires an ADO.NET connection manager using the SqlClient provider. Not particularly intuitive, in my opinion, but it is documented so I can’t complain too loudly. Once I set up the correct connection type, it appeared in the drop-down list and I was set to continue.

 

Once the connection is set, you have the opportunity to define which profiles you want to create. Just for kicks, I decided to profile the Product table in AdventureWorks and kept the default profile selections:

 


 

 

The Quick Profile editor adds all the profiles you selected to the Profile Requests page of the Data Profiling Task Editor.

If you click on a particular request, you can view, and of course change, the properties of the selected request:

 


 

You then execute the package and get the results. If you output the results to a file, you can use the Data Profile Viewer. Just click Start, click Run, type DataProfileViewer, and click OK. (It’s actually found in C:\Program Files\Microsoft SQL Server\100\DTS\Binn if you’re curious.) Use the Open button there to navigate to your output file from the package execution.

 

Here’s an example of the Candidate Key Profile:

 


 

Now in the case of an existing table, this profile isn’t particularly enlightening. After all, the key has already been defined. It’s more useful for data we haven’t seen before, perhaps an extract sent to us by a business partner, when we want to evaluate the contents so we can build a table appropriately for it. In terms of data quality, this profile on the Product table tells us that the combination of columns – ProductID, Name, and ProductNumber – uniquely identifies each record in the table. That’s a good result because that’s what we’d expect! If there were any key violations, they would be listed in the viewer.
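If you want to verify a candidate key yourself outside the viewer, the check is a simple GROUP BY. Here’s a minimal sketch against the AdventureWorks Production.Product table (adjust schema and column names to whatever source you’re actually profiling):

```sql
-- Manual candidate key check: any rows returned mean the combination is NOT unique.
SELECT ProductID, Name, ProductNumber, COUNT(*) AS DuplicateCount
FROM Production.Product
GROUP BY ProductID, Name, ProductNumber
HAVING COUNT(*) > 1;
```

An empty result set is roughly the equivalent of a clean key result in the profile output.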

 

Here’s an example of the Column Null Ratio Profile for the Color column:

 


 

This profile tells us there are 248 records that have a null value in the column, which represents 49% of the total records in the table. That seems like a high number of records, so this could be an opportunity to do some data clean-up in the source. If that’s not an option, then you might want to consider replacing NULL with some value acceptable to business users before loading into your data warehouse.
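For reference, here’s roughly the calculation the profile is doing, expressed as a query you could run by hand (a sketch against AdventureWorks Production.Product; adjust names for your own source):

```sql
-- Null count and null ratio for the Color column, computed manually.
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN Color IS NULL THEN 1 ELSE 0 END) AS NullRows,
    100.0 * SUM(CASE WHEN Color IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS NullPercent
FROM Production.Product;
```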

 

Here’s an example of the Column Value Distribution Profile for the Color column:

 


 

This information shows the breakdown of records by distinct value, showing both the actual count and the percentage. This is useful for validating distinct values and ensuring you don’t have any surprises. This is exactly the kind of dirty data that likes to pop up after you’ve built an Analysis Services database. Now you have an opportunity to spot potential problems and fix them before the data gets to the data warehouse.
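Again, the profile is essentially doing the work of a GROUP BY for you. A hand-written equivalent might look like this (sketch against AdventureWorks Production.Product):

```sql
-- Distinct values of Color with counts and percentage of total rows.
SELECT Color,
       COUNT(*) AS RecordCount,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS PercentOfTotal
FROM Production.Product
GROUP BY Color
ORDER BY RecordCount DESC;
```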

 


Here’s an example of drilldown:

 


 

When examining a profile, you can select a profile row and click the drilldown button to view the records associated with that profile characteristic. This drilldown ability lets you know exactly which records fit this aspect of the profile – no need to stop and write a query to figure it out. (Not all profiles – at least in CTP5 – have drilldown capability.)

 

Here’s an example of the Column Statistics Profile on the StandardCost column:

 


 

This profile is useful for checking the min and max values in a column as well as the mean and standard deviation. If you see ranges you don’t expect, you should investigate and clean up accordingly.
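If you’d rather spot-check a numeric column without running the package, the same statistics are easy to compute directly (sketch against AdventureWorks Production.Product):

```sql
-- Min, max, mean, and standard deviation for StandardCost, computed manually.
SELECT MIN(StandardCost)   AS MinCost,
       MAX(StandardCost)   AS MaxCost,
       AVG(StandardCost)   AS MeanCost,
       STDEV(StandardCost) AS StdDevCost
FROM Production.Product;
```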

 

Here’s an example of the Column Length Distribution Profile:

 


 

This profile identifies the distinct lengths of values in the column and provides counts and percentages of total records for each length found. This is most useful for columns in which you expect a constant length. For example, if Style should have only values of 1 character, but some records have 2-character values, then you need to find and fix those records with invalid values.
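Here too, a quick query approximates what the profile computes, which is handy when you just want to check one suspect column (sketch against AdventureWorks Production.Product; note that LEN ignores trailing spaces):

```sql
-- Distribution of value lengths for the Style column.
SELECT LEN(Style) AS ValueLength,
       COUNT(*)   AS RecordCount,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS PercentOfTotal
FROM Production.Product
WHERE Style IS NOT NULL
GROUP BY LEN(Style)
ORDER BY ValueLength;
```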

 

So, you can see the default Quick Profile of the Data Profiling Task provides quite a lot of information without a lot of setup. Many of the profiles can be configured to fine-tune the profiling process, which I will explain in a future blog post. –Stacia

SQL Server 2008: More to Know About Script Task and Script Component


The most anticipated change to the Script Task and Script Component has certainly been the change from VSA to VSTA for developing the script code. As a result of this change, the script language is no longer limited to Visual Basic .NET; now developers can choose to code using Visual C# .NET instead. Virtually every introduction to Integration Services I’ve conducted generates at least one complaint from the audience about the lack of support for Visual C# .NET in SQL Server 2005, so I’m glad to see this capability added.

 

All the buzz I’ve seen around the Script Task and Script Component focuses on the C# capability, but misses other changes that I think are also important to know about. Let’s take a closer look at other features introduced in these items with SQL Server 2008 Integration Services.

 

ReadOnlyVariables and ReadWriteVariables properties

In the editor, the ReadOnlyVariables and ReadWriteVariables properties in SQL Server 2005 required you to type in the variable names. You can still do this, but I always ask, why type when you can select an item from a list? It saves keystrokes, and more importantly, ensures a correct value! In SQL Server 2008 there is an ellipsis button you can click to open a Select Variables dialog box, which displays all the variables in the System and User namespaces. The downside (at least in the November CTP, in which I most recently tested this feature) is that this list seems to be a bit random and you can’t sort by name or type.


Precompilation


The PrecompileScriptIntoBinaryCode property in SQL Server 2005 is removed in SQL Server 2008. The removal of this property means you no longer get the choice of whether to precompile the code before package execution (which gets better performance at the expense of a larger package) or to compile just-in-time.


The default behavior in SQL Server 2008 is to precompile your script. When you’re editing the script, a set of VSTA project files is created (or re-opened), but those files are deleted from your hard disk and the script is persisted only in the package.


Edit Script button (formerly known as the Design Script button)


This is such a little change, but it means a lot to me and I’m sure to a lot of others. While intellectually I understand that it doesn’t require a lot of physical effort to click the mouse a few extra times, I am invariably annoyed when I feel I’m clicking much more than necessary to accomplish a task or to reach a destination. With regard to the Script Task and Script Component, I appreciate that the Edit Script button is now on the first page of the editor. Hooray! What really happened here is that the order of the pages was changed. In SQL Server 2005, the pages in the editor are General, Script, and Expressions, whereas in SQL Server 2008, the pages are Script, General, and Expressions. Microsoft acknowledged that the most frequent reason people open the editor is to edit the script and thus shortened the path to get there.


Access to External Assemblies


If you want to add a reference to a DLL in your script in SQL Server 2005, the VSA interface limits access to external assemblies in the Windows\Microsoft.NET\Framework\v2.0.50727 folder on your development machine and to the GAC or this folder on the machine running the package. The Add Reference dialog presents a list of DLLs which you can add to your script, but there is no way to browse to another location where you might prefer to (or must) store your DLLs. Furthermore, there is no way to add a Web reference if you want to interact with a Web service. Instead, you have to create your own proxy class first, register the class on the dev and production machines, and then reference that class in your script. Not impossible, but lots of extra steps are required.


In SQL Server 2008, the VSTA interface in the Script Task and Script Component makes development much easier. First, the Add Reference dialog box has all the tabs you’d expect in a full IDE – .NET, COM, Projects, Browse, and Recent. The key point here is you can browse to the location of the DLL you want to use now. Second, you have the ability to add a Web reference and thereby get the proxy class created automatically for you without having to go through all the steps required to get this behavior in SQL Server 2005. Jamie Thomson has posted a video demonstrating how to consume a Web service with VSTA in SQL Server 2008 Integration Services which you should check out if this sort of functionality is useful in your environment. –Stacia

SQL Server 2008: Reporting Services Configuration Changes

One of the benefits of SQL Server 2008 Reporting Services is the removal of the dependency on IIS. This architectural change is huge because it opens the door to Reporting Services for those IT shops that wouldn’t allow installations of SQL Server components on a Web server. Plus it removes one more layer of potential configuration problems to troubleshoot when connectivity issues arise. This architectural change also affects the steps we normally follow to configure the server. So I decided today to do a little exploring to see what’s different.


 


Now, it’s important to note the July CTP doesn’t provide the complete configuration and management experience, because after all Katmai is still a work in progress. But enough is there for me to start poking at the parts to see what’s new and different. After installing Katmai on a Windows Server 2003 server, the Reporting Services Windows service is running under the Network Service account. I have not installed IIS (and won’t), but the default installation sets up my report server databases so I should be good to go. No dependency on IIS means I don’t have to set up application pool identities. As an initial test, I try to access Report Manager using the standard URL (localhost/reports) and see that it displays the Home page, so everything is working under the default configuration. So far, so good.


 


Next, I want to peruse the Reporting Services Configuration to see what the current settings are, so I open the configuration tool and connect to the local report server instance. I can immediately see a difference in the configuration tool layout (including the absence of little green or red buttons to indicate whether that page has been configured):


 



 


I only need to have one service account configured for the Reporting Services service, rather than configure one for its Windows service and another for the Web service. On the Service Account page of the configuration tool, I can change to a different built-in account (which I don’t recommend) or to a domain account (which I do recommend).


 


Although IIS is no longer required, you still need to configure an IP address, TCP port, a URL, a virtual directory, and optionally an SSL certificate to create a URL reservation. A URL reservation is the mechanism by which http.sys – the operating system API required to run the Web service without IIS – allows users to access a specific URL. You can configure these settings on the Web Service URL page. An Advanced button on this page displays a dialog box that I can use to configure a variety of IP addresses, ports, host headers, and SSL ports if necessary. (I’ll delve into your options with URL reservations in more detail in a future blog.) When you apply the configuration settings, the applicable URL reservations are created.  If you’re curious about how http.sys enables applications to run without IIS, see this article from MSDN Magazine.


 


The Database page of the configuration tool does what it always did. You can create a new report server database, connect the current report server instance to a different existing report server database, or even switch to a different mode (native or SharePoint integrated). The main difference here is the interface. Configuring the database is now a series of pages in a wizard. I’m not sure whether this is a good thing or not – no opinion, really – except for one thing. Once you walk through the wizard, there’s no option to save the database script. I hope this is fixed in a future CTP, as I have a current client where one group always builds the report server, but the database needs to be built on a separate server to which that server-building group doesn’t have the permissions needed to execute the script. So we generate the script and hand it off to the DBAs. This omission is pretty big in the product but, as I mentioned earlier, Katmai is a work in progress. I’ll keep my eye on this one.


 


The next page is the Report Manager URL which you use to set the virtual directory for, well, Report Manager. I’m not sure why this isn’t positioned after the page for the Web Service URL. It doesn’t matter in the grand scheme, but it feels out of place here to me. This page also includes an Advanced button allowing you to set up IP address, ports, host headers, and SSL ports if needed.


 


The Email Settings page hasn’t changed from SQL Server 2005’s configuration tool. All it does is allow you to put in a sender address and set up an SMTP server, so I wouldn’t expect a change here. However, if I could submit a wish, it would be nice to have other configurable settings for SMTP on this page. Currently in SQL Server 2005 – and it appears this won’t change in Katmai – you have to change the configuration file for important properties like SendUsing or SMTPAuthenticate (see the RSReportServer configuration file for other SMTP-related properties).


 


The Execution Account settings page hasn’t changed either, but I can live with that. There’s nothing I would change. :-) Similarly, the Encryption Keys page hasn’t changed functionally, but the UI has been modified to include text to better explain each option (Backup, Restore, Change, Delete).


The Scale-out Deployment page is the new name for SQL Server 2005’s Initialization page and, in my opinion, is a better name for it. I don’t currently have an environment set up to test setting up a report server farm, so I can’t comment on what differences you might find here, but I would not expect much different from the SQL Server 2005 experience. If I find otherwise in the future, I’ll blog about it.


Setting the authentication method no longer occurs in IIS, obviously. Now authentication configuration happens only in configuration files which I’ll be exploring in much greater detail in a forthcoming blog (because it’s near and dear to my heart at the moment since I’m speaking on SSRS and authentication configuration next week at SQL Server Magazine Connections – I need to find out what changes in that presentation once Katmai releases!). For now, you should be aware the default configuration of Reporting Services requires users to have a Windows domain account. Authentication is set to Negotiate, much like IIS, which will use Kerberos if it’s enabled or NTLM if Kerberos is not enabled. You can force Kerberos only or NTLM only by changing the report server configuration. Alternatively, you can use Basic authentication (although this feature will come in a future CTP) or Anonymous authentication if you’re adding in custom security like forms authentication. Note that the report server will reject anonymous authentication unless you are explicitly using custom security. Also, Single Sign-On (SSO), Passport, and Digest authentication will not be supported. More to come soon! –Stacia

SQL Server 2008: Attribute Relationship Designer


The dimension editor in SQL Server 2008 Analysis Services (in the July CTP) is supposed to improve the development experience by making it easier to work with attribute relationships. I read a comment from Vladimir Chtepa on Mosha’s blog that no UI improvement was needed for attribute relationship design, but from my experience this opinion is not the norm. At least not in my classrooms. The people who generally have the hardest time understanding the arrangement of attribute relationships in SQL Server 2005 Analysis Services are those who had experience with Analysis Services 2000 (AS2K). Compared to defining levels in AS2K, which displays the relationship between levels in a top-down approach with higher levels on top, the attribute relationships in Analysis Services 2005 (Yukon) seem upside-down because they take a bottom-up approach. That is, you connect a “higher-level” attribute to a “lower-level” attribute. For example, in a time dimension in AS2K, you define Year-Quarter-Month-Day in that order from top to bottom, whereas in Yukon you add Month as an attribute relationship to Day, add Quarter as an attribute relationship to Month, and Year as an attribute relationship to Quarter. My students who started their Analysis Services experience with Yukon had less trouble with this concept, but still commented that it seemed backwards and required some extra thought. I think the problem is related to how people visualize hierarchical information, and the Yukon UI does not lend itself well to visualization of attribute relationships, particularly if there are a lot of attributes in a dimension and the related attributes are spread far apart from one another. Therefore, I think the new attribute relationship designer will help a lot.

 

Before I started working with the new designer, I created a dimension from the snowflaked tables – DimProduct, DimProductSubcategory, and DimProductCategory. I set EnglishProductName as the Name column for the key attribute. I also selected the following columns: ProductAlternateKey (with enable browsing cleared), Color, ModelName, ProductLine, Size, ProductSubcategoryKey, and ProductCategoryKey. I renamed attributes as follows: Product Key to Product, Product Subcategory Key to Subcategory, and Product Category Key to Category. Then I set the Name column for Subcategory to EnglishProductSubcategoryName and for Category to EnglishProductCategoryName.

 

Next I wanted to see what the attribute relationship designer looks like before I start adding user-defined hierarchies. The designer has a diagram pane and an attribute relationships pane, which contains a list of attributes and a list of attribute relationships. If I click on an attribute in the list, the corresponding attribute in the diagram is highlighted with a bold outline. In the list of attribute relationships, I can also see how each attribute is related to either the Product or Subcategory attribute (each is the key attribute of a table in the snowflake structure), as I would expect.

 

 


 

After setting up hierarchies like this:

 


The attribute relationship diagram designer looks like this:

 


 

Nothing has changed in terms of the actual attribute relationships, but the levels of each hierarchy are represented. The correct attribute relationships are defined for the Products hierarchy because of the foreign key relationships defined between the tables in the snowflake. However, Product Line and Model Name come from the same table, and therefore by default are related only to the key attribute, Product. I need to fix this. In fact, note the warning in the picture above which lets me know there are no attribute relationships defined between the levels of the Models hierarchy. I can improve query performance by creating the attribute relationship. To do this, I click on the “lower-level” attribute Model Name, and drag it on top of the “higher-level” attribute Product Line. This sets up the attribute relationship properly, as shown here:

 


Pretty easy and pretty clear graphically. I can see exactly how each level in the hierarchy relates back to the dimension’s key attribute – and if I click Expand All in the designer’s toolbar, I can see all other attributes that are related to Product as well.

 

There may also be cases when an attribute that isn’t in a hierarchy shouldn’t be related directly to the key attribute. An example is in the Time dimension, in which I have EnglishMonthName and MonthNumberOfYear. This is not a hierarchical relationship as I showed in the previous example, but a member property relationship that I can use for sorting purposes. To get the behavior I want for sorting, I need to define an attribute relationship without the hierarchy. In the Time dimension designer, MonthNumberOfYear is related by default to the Time Key. In the diagram, I expand the Time Key object, then click English Month Name, and drag it onto Month Number of Year inside the Time Key object. The relationship is adjusted like this:

 


 

As a side note, because the cardinality in this relationship is one-to-one, I need to change the Cardinality property in the property pane from Many (the default) to One. Now that the relationship exists, I can set the OrderBy property on EnglishMonthName to AttributeKey and the OrderByAttribute property to Month Number Of Year.

 

While I don’t think moving the interface to another tab really simplifies my work, or enables me to build relationships faster in terms of mouse clicks or keystrokes, I do feel the movements to set up the attribute relationships are more intuitive. The part I like best about the attribute relationship diagram is the visualization of flexible and rigid relationships. Best practice design says relationships should be rigid when members aren’t shifting around. So, for example, March is always part of calendar quarter 1 and will never, ever be part of calendar quarter 4. Therefore, the relationship between month and calendar quarter should be rigid. Now, as I mentioned in my previous post, I think rigid should be the default, but it’s not, so I have to deal with it here. I simply right-click on the arrow representing the attribute relationship, point to Relationship Type, and select Rigid. Notice how the diagram reflects the Rigid relationship with the black arrow tip as compared to the Flexible relationship with the white arrow tip:

 


I like this feature of the attribute relationship designer best because one of the first things I do when cleaning up someone else’s cube is to fix up the relationship types. In Yukon, I would have to look at each relationship individually, whereas the diagram here allows me to see at a glance whether I need to fix anything. Currently, there are no warnings to draw my attention to potential problems in the dimension when some attributes are defined with flexible relationships and others with rigid relationships. –Stacia

SQL Server 2008: Best Practice Design Alerts

Once upon a time, there was such a thing as a talking car. I never owned one, but I did get to drive one for a week in Quebec while a colleague and I were working with a client up there back in the late 80s. Normally, we were supposed to rent a compact car when out on business, but we had to pick up a bunch of computer equipment at air cargo and there was no way our luggage and the equipment were fitting into a compact. As it turned out, the only car that accommodated us was a New Yorker (and even then it was pretty tight). We quickly discovered that the New Yorker was one of those talking cars - with a French male voice. We named him Pierre and proceeded to try out things to see what he would say and add to our French vocabulary while we were at it. I don’t think we had an owner’s manual to simply peruse the list of errors we could commit (and should presumably avoid) for which Pierre would gently scold us. As time has shown, demand for Pierre and his counterparts simply didn’t hold up in the market. Maybe people can accept warning lights, but not a warning voice?


In SQL Server 2008, the cube and dimension designers in Analysis Services now come with best practice design warnings, but fortunately Dev Studio doesn’t read them aloud to you. A visual indicator – which I’ll call the blue squiggly – appears on screen to highlight the offending object. The first warning you’re likely to see when you create a dimension is associated with the dimension object at the top of the attribute tree. This warning says (in the July CTP), “Create hierarchies in non-parent child dimensions.” As soon as you create a user hierarchy, the blue squiggly goes away, right? Nope… now you probably have a new warning on the dimension object if the attributes you selected are all visible – “Avoid visible attribute hierarchies for attributes used as levels in user-defined hierarchies.” And the hierarchy object now probably has a blue squiggly to let you know that there are no attribute relationships defined between one or more levels in the hierarchy. (Remember, this is a brand new dimension.)


Don’t worry about more warnings appearing as you do your design work. Just go about your normal business, and hopefully all will clear up before you’re ready to deploy the project. Many of the 48 warnings (in the July CTP) are well-known best practices to experienced Analysis Services developers. So what’s the point of including best practices if they are so well-known? Well, not everyone implementing SQL Server for the first time has access to experienced developers, so their experience will be much more positive with Analysis Services if they are warned about the pitfalls before they fall in.


Rather than haphazardly trying out something to see whether or not it conforms to best practices, as I did with Pierre, you can jump straight to Books Online to see the complete list of the warnings (including links to more information about each). Search for the topic “Design Warning Rules.” The warnings are organized into categories (in the July CTP BOL) as follows: Dimensions, Aggregations, Partitions, Attributes and Attribute Relationships, Measures and Measure Groups, User-defined Hierarchies, ROLAP and MOLAP storage, Data Providers, and Error Handling. Some warnings come with better explanations about best practices than others. I hope this will improve over time, because for the uninitiated these warnings without explanation are little more than “because I said so” instead of the educational opportunity they could be.


Like Pierre’s reminders that we were doing something contrary to the established best practices of driving, the Analysis Services design warnings are there to alert you to potential hazards, but won’t stop you from ignoring them. For example, I’m not certain that I agree that one should always “Avoid visible attribute hierarchies for attributes used as levels in user-defined hierarchies.” This is a matter best decided in conjunction with users, in my opinion, after explaining the pros and cons of this approach. Some implementations may not have this luxury, in which case I would defer to the best practice recommendation.


Some best practices earn a chuckle from me, such as “Define a time dimension.” I have yet to meet a cube without one. I had a student insist once that they had seen one, but when pressed could not describe the purpose of the cube. I’m still waiting for a cube without a time dimension. I’m not saying it’s not possible, but I can’t imagine why you would want one as time-series analysis is one of the most compelling reasons to build a cube in the first place.


Some best practices contradict default values for dimensions (in the July CTP), which also amuses me, such as “Change the UnknownMember property of dimensions from Hidden to None” or “Define attribute relationships as ‘Rigid’ where appropriate”. It seems to me the Analysis Services dev team could easily change the default values to accommodate these best practices, as they did with “Do not ignore duplicate key errors. Change the KeyDuplicate property of the error configuration so that it is not set to IgnoreError”. To clarify, in SQL Server 2005, the default KeyDuplicate property value is IgnoreError, but this is changed to ReportAndStop in SQL Server 2008.


As mentioned earlier, before you deploy your project, you should clean up – to the extent you wish – the current warnings in your project. Warnings won’t stop your deployment, but you should make a conscious decision whether to ignore the surfaced warnings. A comprehensive list of all warnings in your project can be found in the Error List window (which you can open with Ctrl+E). Double-click on an error to access the designer and fix the problem. Alternatively, you can right-click the error and click dismiss to clear it off the list if you don’t intend to fix it. You can even add a comment to document your reason for ignoring this error. This method of clearing the error is instance-based and will not clear the same error if it’s found in a different dimension or cube. To globally dismiss a particular type of error, whether proactively before you start development or after the fact, you can access the new Warnings tab in the Database editor (which you can open on the Database menu by clicking Edit Database). Incidentally, the Warnings tab also contains a list of the warnings dismissed individually and the related comment.


All in all, I think this is a nice feature in SQL Server 2008 Analysis Services, particularly for the many folks out there who are just getting started with this technology. Just as long as the warnings stay visual. As much as I like technology in general, I still don’t think I’m ready for Dev Studio to start talking to me like Pierre, and I suspect many other people feel the same way. –Stacia

SQL Server 2008: Building a Time Dimension

In my previous post, I covered the new dimension wizard and mentioned there were options for creating time dimensions that I would cover later. Now I’ll explain those options further.


Time Dimension Options in SQL Server 2005 Analysis Services


Let’s start with a quick review of what happens in SQL Server 2005 (referred to as Yukon hereafter). On the Select the Dimension Type page of the dimension wizard, you can choose Standard, Time Dimension, and Server Time Dimension.


If you select Time Dimension, you identify the time table in your DSV and then you map your time columns to the Analysis Services time properties. For example, you map a CalendarYear column in your time table to the Year property. This association of a table column to a property helps Analysis Services know how to handle time-related MDX functions like YTD or PeriodToDate in your queries. I admit I find this mapping process tedious, but necessary. When you use a time dimension table, you have to manage the processing of the Analysis Services dimension to add new time members if you incrementally add members to the table (instead of populating it well into the future as some people prefer to do). The benefit of this approach is the ability to include time attributes that mean something to your industry, such as a flag for a holiday or weekend versus weekday. You also can confirm the inclusion of hierarchies based on the columns you map to time properties. I never liked the inability to change the hierarchy names here, but that’s just a nit. You can, of course, add your own hierarchy or modify the hierarchy name later in the dimension editor.


If your time-related analysis is pretty simple and you don’t want to manage a time dimension in your data source, you can create a Server Time Dimension instead. This is a pretty handy feature that lets you define a date range for the dimension, the type of attributes you want to include (year, quarter, month, etc.), and the calendars to support (calendar, fiscal, etc.). You can confirm the default hierarchies just like you can with a table-based time dimension. The generated dimension includes several additional attributes, such as Day of Month and Day of Year, and there are other optional attributes you can add using the editor, such as Day of Week or Trimester of Year. When you want to add the Server Time dimension to a cube, you have to add it on the Cube Structure page of the cube designer because the cube wizard doesn’t have a way for you to add it there. You still need to ensure the end date of the range of your Server Time dimension is equal to or greater than the maximum date in your fact table.


There is a third option available. You could generate a time table by selecting the option to build the dimension without a data source. Select the Date template and you’ll get an interface similar to that for the Server Time Dimension. When you complete the dimension wizard, you have the option to generate the schema on the spot, or you can run the Schema Generation Wizard at a later time. The Schema Generation Wizard lets you create a new data source view for your time table or select an existing DSV. You can even choose to have the wizard populate the time table or you can leave it empty. You’re also given the opportunity to specify naming conventions. By the way, your credentials are used to create the database objects, so you’ll need to be sure you have the correct permissions on the data source. This is a nice way to get started with a time table, but you’ll need to keep it up-to-date with an ETL tool, and you can’t customize it to have more or fewer columns.
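If you do go the table route and need to keep it current, the incremental load doesn’t have to be fancy. Here’s a minimal sketch that appends missing dates to a hand-built time table; dbo.DimTime and its columns are hypothetical names I made up for illustration, not what the Schema Generation Wizard produces:

```sql
-- Append any missing dates to a custom time table, one row per day.
DECLARE @NextDate datetime, @EndDate datetime;
SELECT @NextDate = DATEADD(DAY, 1, MAX(FullDate)) FROM dbo.DimTime;
SET @EndDate = '20101231';   -- extend as far into the future as you like

WHILE @NextDate <= @EndDate
BEGIN
    INSERT INTO dbo.DimTime
        (TimeKey, FullDate, CalendarYear, CalendarQuarter, MonthNumberOfYear, EnglishMonthName)
    VALUES
        (CONVERT(int, CONVERT(char(8), @NextDate, 112)),   -- yyyymmdd surrogate key
         @NextDate,
         YEAR(@NextDate),
         DATEPART(QUARTER, @NextDate),
         MONTH(@NextDate),
         DATENAME(MONTH, @NextDate));
    SET @NextDate = DATEADD(DAY, 1, @NextDate);
END;
```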


Time Dimension Options in SQL Server 2008 Analysis Services


As I mentioned in my previous blog entry, SQL Server 2008 Analysis Services (which I’ll call Katmai from now on) gives you four choices for creating a dimension:


  • Use an existing table
  • Generate a time table in the data source
  • Generate a time table on the server
  • Generate a non-time table in the data source (using a template)


The second and third options relate specifically to a time dimension. If you select “Generate a time table on the server,” you get the same result as the Server Time Dimension in Yukon and an almost identical interface in the wizard. The exception is that the hierarchy confirmation page is missing in Katmai – which is fine for me as I’d rather fine-tune the hierarchy in the editor anyway.


The new option, “Generate a time table in the data source,” is the same as Yukon’s option to build the dimension from the Date template. You run the Schema Generation Wizard to design the table and optionally to populate it the first time.


So what do you do if you have a time dimension in your data source? The only option is to choose “Use an existing table.” On the Select Dimension Attributes page of the dimension wizard, which you use to select the columns from your table to include in your dimension, you have the ability to change the attribute type. This is the equivalent of mapping columns to time properties in Yukon, although I must say the interface is not ideal for this task. In short, more clicks are required to set up your time dimension from a table, but you have the benefit of getting a table and all the attributes exactly the way you want them.


Conclusion


So, functionally, nothing has really changed much for time dimensions in Katmai apart from renaming of options and some slight interface adjustments. If you base your time dimension on a table, the interface changes make the process to create the time dimension in the Analysis Services database a bit more tedious, in my opinion. Fortunately, these aren’t tasks that you have to repeat every day and, if you want to reproduce the time dimension in another cube, you can always script it out rather than build it through the wizard.  –Stacia

SQL Server 2008: The New Dimension Wizard for Analysis Services

While there are several new features slated for Analysis Services that haven’t been released in a CTP yet, the July CTP does include a new dimension wizard. This wizard is intended to simplify your work by streamlining the steps involved to set up a new dimension. Today, as I walk you through the new wizard in SQL Server 2008 (which I’ll call Katmai throughout this article), I’ll explain how it lives up to the promise of an improved design experience and remind you how it’s different from the dimension wizard in SQL Server 2005 (which I’ll refer to as Yukon).


Of course, before you can add a dimension, you need to add a data source and data source view (DSV). Nothing has changed here. If an attribute doesn’t exist in the format you want in the physical data source, you will still need to add a named calculation (or a derived column in a named query) in the DSV before you add the attribute to the dimension. For example, if you have FirstName and LastName columns in a customer dimension table, but want to display “LastName, FirstName”, you’ll need to concatenate the columns in the DSV.
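The named calculation itself is just a SQL expression that Analysis Services evaluates against the source table, so the concatenation in this example is a one-liner. A sketch follows; the table and columns are illustrative, and only the expression goes into the named calculation dialog:

```sql
-- The expression LastName + ', ' + FirstName is what you'd paste into the named calculation;
-- it's shown inside a full query here only so you can test it against the source first.
SELECT FirstName,
       LastName,
       LastName + ', ' + FirstName AS FullName
FROM dbo.DimCustomer;
```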


Step 1: What’s the source for your dimension?


Once the DSV is just right, you can kick off the dimension wizard. In Yukon, the first main page of the wizard is “Select Build Method,” which gives you two choices for creating a dimension. I use the bottom-up approach most often – that is, I build the dimension using a data source, which in turn is associated with a DSV that includes one or more tables for the dimension. Alternatively, there is the top-down approach, or more officially “Build the dimension without using a data source,” which lets you describe the design and generate a table schema in your data source. If you leave the Auto build check box selected, then the wizard recommends the key column for the dimension and looks for hierarchies (although my experience with auto-detected hierarchies has been inconsistent). You then click Next to select the DSV and click Next again to specify whether you’re creating a standard dimension, a time dimension based on a table in your DSV, or a server-based time dimension. To recap, not counting the welcome page of the wizard, you go through three pages to define the type and source of the dimension you want to build.


In Katmai, the three pages have been consolidated into one page – Select Creation Method – which gives you the following choices:



  • Use an existing table
  • Generate a time table in the data source
  • Generate a time table on the server
  • Generate a non-time table in the data source (using a template)

The first and fourth options are equivalent to the options you have in Yukon. I’ll discuss the time table options in a future blog entry. For now, I’ll continue through the wizard using an existing table.


Step 2: Which table is the main dimension table and which are the key and name columns?


In Yukon, on the Select the Main Dimension Table page of the wizard, you first select the dimension table (or the most granular table in a snowflake schema). Then you select one or more key columns in the table to uniquely identify each dimension member. Optionally, you select a column to represent the member name.


In Katmai, the only change here is that one page – Specify Source Information – allows you to select both the DSV and the dimension table. You also specify the Key and Name columns on this page. The interface is slightly different if you want to use a composite key – a drop-down list instead of check boxes. I think this will wind up requiring more mouse movement than the previous interface, so I’m not wild about this last change, but practically speaking I rarely use composite keys in a dimension so it’s probably a negligible change.


Step 3: Which columns are dimension attributes?


On the Select Dimension Attributes page (the next page in both Yukon and Katmai), a list of all remaining columns displays. In Yukon, you won’t see the key or name columns in this list, but in Katmai the key column is included in the list. In Yukon, all attributes are selected by default (if you kept Auto Build enabled) whereas in Katmai only the key column is selected by default.


In Yukon, by default you see the same column name in the list’s Attribute Key Column and Attribute Name Column columns. I liked being able to update the name columns here for snowflaked schemas. Unfortunately, this feature goes away in Katmai. You’ll have to update the name column in the dimension editor directly. Not the end of the world, I suppose, but it’s a feature I use enough to really notice it’s missing.


Katmai adds another feature to this page which I’ll concede compensates for the inability to specify the Attribute Name Column. Specifically, there is an Enable Browsing check box for each attribute. This is a quick and efficient way to set the AttributeHierarchyEnabled property to False, which means the attribute can’t be placed on an axis in a query (i.e. you can’t put it in rows or columns or in the filter). Disabled attributes are useful for things like phone numbers or addresses – you don’t really analyze this information, but your client application can make it available to the end user as a tooltip, for example. On this page, you can also specify the attribute type, although I don’t know too many people who actually use this often for non-time dimensions.


Step 4: What is the dimension name?


The final page in Katmai allows you to name the dimension and you’re done after going through a grand total of four pages! Before I get to this point in Yukon, I have to specify a dimension type (which is usually Regular), define a parent-child hierarchy (which should self-detect anyway and which I avoid whenever possible), work through two pages for hierarchies (detecting and reviewing), and then I reach the final page to give the dimension a name. For a standard dimension with auto build enabled in Yukon, you have to go through ten pages. That’s quite a difference, and therefore Katmai considerably streamlines the basic development of a dimension with the new dimension wizard.


There are a few more Analysis Services features in the July CTP that I’ll review in future blog entries. Check back soon!  –Stacia

SQL Server 2008: Lookup Transformation and Caching

In theory, using a Lookup transformation in SQL Server 2005 Integration Services is faster than trying to achieve the same goal with a SQL query. If you frequently use the Lookup transformation, you are probably aware of some of its limitations. One limitation in particular: the larger the reference table used in the lookup, the more likely you’re going to experience performance degradation. There are several ways to mitigate performance issues, but for the sake of this discussion, I will focus on better performance through caching.
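For context, the SQL alternative I have in mind is resolving reference values with a join in the source query, for example looking up a surrogate key while loading a fact table. A sketch with made-up staging table and column names:

```sql
-- Set-based equivalent of a Lookup: resolve the product surrogate key with a join.
-- stg.FactSales and ProductBusinessKey are hypothetical; DimProduct plays the reference table.
SELECT s.OrderDateKey,
       s.SalesAmount,
       p.ProductKey                    -- the value the Lookup transformation would return
FROM stg.FactSales AS s
LEFT JOIN dbo.DimProduct AS p
    ON p.ProductAlternateKey = s.ProductBusinessKey;
```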

 

To review, in SQL Server 2005 you have three caching options:


  • Full caching (default): loads the entire reference dataset into memory before the lookups begin.
  • Partial caching: loads a subset of the reference dataset into memory (which can be constrained to a specific size), queries the source database when a lookup match isn’t found in the existing cache, and loads new rows into the cache if a match is found in the source database.
  • No caching: the source dataset is queried for each row in the pipeline.

While you can tune the caching for the lookup to get optimal performance, that cache goes away when the package finishes. So if you have a second package that needs to perform lookups against the same reference table, you have to pay the overhead cost of loading up the cache (if you’re using it) again. That’s where SQL Server 2008 comes to the rescue with “persistent lookups.”


The new Lookup transformation hasn’t been made available yet in a public CTP, so I haven’t had a chance to benchmark the caching, but I think the improvement of persistent lookups shows promise. Here’s my understanding of this feature (which of course is subject to change before SQL Server 2008 goes RTM):



  • Caching as it was in SQL Server 2005 continues to work as described with full, partial, and no cache (only better).
  • A new type of cache - a “persistent” cache – can be created and re-used.

One big change to caching in general is that you can have a cache larger than 4 GB, on both 32-bit and 64-bit systems, which should help the scalability of your Lookups.


Let’s explore the idea of a persistent cache further. Obviously, it’s not useful when the reference dataset is highly volatile. But for a relatively stable reference dataset, it’s got possibilities. Essentially, you start the process by populating the cache. You can do this in a separate data flow from the one containing your Lookup transformation. The cache stays in memory to be used by as many Lookups as you like until the package stops executing.


But wait – there’s more! As an alternative, you could populate the cache in its own package and store it in a .caw file. Then that cache file can be used in as many other packages as needed. Here’s the reusability factor that was missing in SQL Server 2005. The .caw file can be read into memory faster than reading in a table, view, or query for a full cache. It’s like the Raw File you can use for asynchronous processing: optimized for reads by Integration Services. In fact, you can use the Raw File Source to load the cache contents into a data flow, do whatchya gotta do to the data, and land the results someplace else if you use the data for more than just Lookups.


Another benefit of storing the cache in a file is the ability to deploy a package to multiple environments and ensure the contents of each cache are identical. Simply add it to the Miscellaneous folder and the package deployment utility will include it in the set of files to be transferred to each target server.


An important change with partial caching is the ability to store rows in the cache that don’t match rows in the data flow. You will even be able to specify how much of the cache you want to devote to this purpose.


Given that a considerable part of the ETL process in data warehousing depends on the Lookup transformation, it’s good to see this part of Integration Services has received attention in the next release of SQL Server. Watch for the new Lookup transformation, Cache transformation, and Cache connection manager in a future CTP. –Stacia

Looking Forward to Report Designer Features in Katmai’s Reporting Services

As I was slogging through updating reports for a client recently, I was really wishing I could fast-forward in time, install SQL Server 2008 in my client’s environment, and use the new report designer demonstrated by Jason Carlson (Product Unit Manager of Reporting Services at Microsoft) at PASS 2007 in Denver a few weeks ago. In particular, the 2000/2005 report designer interface isn’t very friendly when you want to work with matrix subtotal properties. When I teach Reporting Services classes, I often ask students to come up with a name for the green triangle that is the single point of entry to the Subtotal properties. No one has yet come up with anything better than “that green thingie…”. Somehow that strikes me as so much more amusing than the more matter-of-fact “green triangle.” (But then I’m easily amused…) I’ve been creating reports in Reporting Services for at least 4 years now and still have yet to master the precise click-motion required to nail that green thingie the first time. Compound that with trying to accomplish this feat over a Remote Desktop Connection and I was quickly frustrated before I had finished updating the first report! Only twenty more to go…sigh.


While Katmai won’t help me with my current problem, I am delighted to see that it will resolve one of the most frustrating aspects of working with the matrix data region. I haven’t done an official count, but I do believe I use a matrix much more often than a table, so my encounters with the green thingie are more numerous than I would like. I suspect many other people are frequent users of the matrix and therefore feel my pain. Fortunately, our collective frustration goes away as soon as Katmai releases (and we can convince everyone to make the leap right away – I AM an optimist after all!).


In fact, not only does the matrix improve, but the whole report designer interface changes. The version that Jason showed at PASS was a separate client tool – outside of Visual Studio, that is – which he said will ultimately be merged with Report Builder (but don’t expect that to happen in the Katmai release). The beautiful thing about the report designer is that it’s less intimidating to non-developer types than the Visual Studio report designer interface. The Properties window is still there for the hard-core folks. For everyone else, a right-click will get you what you need. There’s also definitely an Office 2007 flavor to the new designer, including ribbons. Business users who are responsible for report development will LOVE this tool.


I have to say it all looks pretty, but my personal favorite is the disappearance of the green thingie. Nothing personal, but good riddance. Subtotal areas in a matrix will now have a place right alongside every other object you place in the designer. And you can use subtotal areas a lot more flexibly, too, because a matrix is no longer really a matrix. Now it’s a tablix. (Is that tay-blicks with a long a or tab-blicks with a short a…..?) More about tablix in a future blog entry. Watch for the new report designer in a future CTP release. –Stacia