Wednesday, October 24, 2007

While there are several new features slated for Analysis Services that haven't been released in a CTP yet, the July CTP does include a new dimension wizard. This wizard is intended to simplify your work by streamlining the steps involved to set up a new dimension. Today, as I walk you through the new wizard in SQL Server 2008 (which I'll henceforth call Katmai throughout this article),  I'll explain how it's lived up to the promise of an improved design experience and remind you how it's different from the dimension wizard in SQL Server 2005 (which I'll refer to as Yukon).

Of course, before you can add a dimension, you need to add a data source and data source view (DSV). Nothing has changed here. If an attribute doesn't exist in the format you want in the physical data source, you will still need to add a named calculation (or a derived column in a named query) in the DSV before you add the attribute to the dimension. For example, if you have FirstName and LastName columns in a customer dimension table, but want to display "LastName, FirstName", you'll need to concatenate the columns in the DSV.

Step 1: What's the source for your dimension?

Once the DSV is just right, you can kick off the dimension wizard. In Yukon, the first main page of the wizard is "Select Build Method" which gives you two choices for creating a dimension. I use the bottom-up approach most often - that is, I build the dimension using a data source which in turn is associated with a DSV which includes one or more tables for the dimension. Alternatively, there is the top-down approach, or more officially "Build the dimension without using a data source" which lets you describe the design and generate a table schema in your data source. If you leave the Auto build check box selected, then the wizard recommends the key column for the dimension and looks for hierarchies (although my experience with auto-detected hierarchies has been inconsistent). You then click Next to select the DSV and click Next again to specify whether you're creating a standard dimension, a time dimension based on a table in your DSV, or a server-based time dimension. To recap, not counting the welcome page of the wizard, you go through three pages of the wizard to define the type and source of the dimension you want to build.

In Katmai, the three pages have been consolidated into one page - Select Creation Method - which gives you the following choices:

  • Use an existing table
  • Generate a time table in the data source
  • Generate a time table on the server
  • Generate a non-time table in the data source (using a template)

The first and fourth options are equivalent to the options you have in Yukon. I'll discuss the time table options in a future blog entry. For now, I'll continue through the wizard using an existing table.

Step 2: Which table is the main dimension table and which are the key and name columns?

In Yukon, on the Select the Main Dimension Table page of the wizard, you first select the dimension table (or the most granular table in a snowflake schema). Then you select one or more key columns in the table to uniquely identify each dimension member. Optionally, you select a column to represent the member name.

In Katmai, the only change here is that one page - Specify Source Information - allows you to select the DSV and the dimension table selection. You also specify the Key and Name columns on this page. The interface is slightly different if you want to use a composite key - using a drop-down list instead of check boxes. I think this will wind up requiring more mouse movement than the previous interface, so I'm not wild about this last change, but practically speaking I rarely use composite keys in a dimension so it's probably a negligible change.

Step 3: Which columns are dimension attributes?

On the Select Dimension Attributes page (the next page in both Yukon and Katmai), a list of all remaining columns displays. In Yukon, you won't see the key or name columns in this list, but in Katmai the key column is included in the list. In Yukon, all attributes are selected by default (if you kept Auto Build enabled) whereas in Katmai only the key column is selected by default.

In Yukon, you see the same column name in the list's columns labeled Attribute Key Column and Attribute Name Column. I liked this feature to update name columns for snowflaked schemas. Unfortunately, this feature goes away in Katmai. You'll have to update the name column in the dimension editor directly. Not the end of the world, I suppose, but it's a feature I use enough to really notice it's missing.

Katmai adds another feature to this page which I'll concede compensates for the inability to specify the Attribute Name Column. Specifically, there is a Enable Browsing check box for each attribute. This is a nice quick way to quickly and efficiently set the AttributeHierarchyEnabled property to False which means the attribute can't be placed on an axis in a query (i.e. you can't put it in rows or columns or in the filter).  Disabled attributes are useful for things like phone numbers or addresses - you don't really analyze this information but your client application can make it available to the end user as a tooltip, as an example. On this page, you can also specify the attribute type, although I don't know too many people who actually use this often for non-time dimensions.

Step 4: What is the dimension name?

The final page in Katmai allows you to name the dimension and you're done after going through a grand total of four pages! Before I get to this point in Yukon, I have to specify a dimension type (which is usually Regular), define a parent-child hierarchy (which should self-detect anyway and which I avoid whenever possible), two pages for hierarchies (detecting and reviewing) and then I reach the final page to give the dimension a name. For a standard dimension with auto build enabled in Yukon, you have to go through ten pages. That's quite a difference and therefore Katmai considerably streamlines the basic development of a dimension with the new dimension wizard.

There are a few more Analysis Services features in the July CTP that I'll review in future blog entries. Check back soon!  --Stacia

Wednesday, October 24, 2007 7:15:02 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]  | 
Wednesday, October 17, 2007
In theory, using a Lookup transformation in SQL Server 2005 Integration Services is faster than trying to achieve the same goal with a SQL query. If you frequently use the Lookup transformation, you are probably aware of some of its limitations. One limitation in particular--the larger the reference table used in the lookup, the more likely you're going to experience performance degradation. There are several ways to mitigate performance issues, but for the sake of this discussion, I will focus on better performance through caching .
 
To review, in SQL Server 2005 you have three caching options:
  • Full caching (default): loads the entire reference dataset into memory before the lookups begin.
  • Partial caching: loads a subset of the reference dataset into memory (which can be constrained to a specific size), queries the source database when a lookup match isn't found in the existing cache, and loads new rows into the cache if a match is found in the source database.
  • No caching: the source dataset is queried for each row in the pipeline.

While you can tune the caching for the lookup to get optimal performance, that cache goes away. So if you have a second package that needs to lookup to the same reference table, you have to pay the overhead cost of loading up the cache (if you're using it) again. That's where SQL Server 2008 comes to the rescue with "persistent lookups."

The new Lookup transformation hasn't been made available yet in a public CTP, so I haven't had a chance to benchmark the caching, but I think the improvement of persistent lookups shows promise. Here's my understanding of this feature (which of course is subject to change before SQL Server 2008 goes RTM):

  • Caching as it was in SQL Server 2005 continues to work as described with full, partial, and no cache (only better).
  • A new type of cache - a "persistent" cache - can be created and re-used.

One big change to caching in general is that you can have a cache larger than 4GB, on both 32-bit and 64-bit systems which should help the scalability of your Lookups.

Let's explore the idea of a persistent cache further. Obviously, it's not useful when the reference dataset is highly volatile. But for a relatively stable reference dataset, it's got possibilities. Essentially, you start the process by populating the cache. You can do this in a separate data flow from the one containing your Lookup transformation. The cache stays in memory to be used by as many Lookups as you like until the package stops executing.

But wait - there's more! As an alternative, you could populate the cache in its own package and store it in a .caw file. Then that cache file can be used in as many other packages as needed. Here's the resusability factor that was missing in SQL Server 2005. The .caw file can be read faster into memory than reading in a table, view, or query for a full cache. It's like the Raw File you can use for asynchronous processing and optimized for reads by Integration Services. In fact, you can use the Raw File Source to load the cache contents into a data flow, do whatchya gotta do to the data, and land the results someplace else if you use the data for more than just Lookups.

Another benefit of storing the cache in a file is the ability to deploy a package to multiple environments and ensure the contents of each cache are identical. Simply add it to the Miscellaneous folder and the package deployment utility will include it in the set of files to be transferred to each target server.

An important change with partial caching is the ability to store rows in the cache that don't match rows in the data flow. You will even be able to specify how much of the cache you want to devote to this purpose.

Considering a considerable part of the ETL process in data warehousing depends on the Lookup transformation, it's good to see this part of Integration Services has received attention in the next release of SQL Server. Watch for the new Lookup transformation, Cache transformation, and Cache connection manager in a future CTP. --Stacia

Wednesday, October 17, 2007 2:01:23 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  | 
Wednesday, October 10, 2007

As I was slogging through updating reports for a client recently, I was really wishing I could fast-forward in time, install SQL Server 2008 in my client's environment, and use the new report designer demonstrated by Jason Carlson (Product Unit Manager of Reporting Services at Microsoft) at PASS 2007 in Denver a few weeks ago. In particular, the 2000/2005 report designer interface isn't very friendly when you want to work with matrix subtotal properties. When I teach Reporting Services classes, I often ask students to come up with a name for the green triangle that is the single point of entry to the Subtotal properties. No one has yet come up with anything better than "that green thingie...". Somehow that strikes me as so much more amusing than the more matter-of-fact "green triangle." (But then I'm easily amused...) I've been creating reports in Reporting Services for at least 4 years now and still have yet to master the precise click-motion required to nail that green thingie the first time. Compound that with trying to accomplish this feat over a Remote Desktop Connection and I was quickly frustrated before I had finished updating the first report! Only twenty more to go...sigh.

While Katmai won't help me with my current problem, I am delighted to see that it will resolve one of the most frustrating aspects of working with the matrix data region. I haven't done an official count, but I do believe I use a matrix much more often than a table, so my encounters with the green thingie are more numerous than I would like. I suspect many other people are frequent users of the matrix and therefore feel my pain. Fortunately, our collective frustration goes away as soon as Katmai releases (and we can convince everyone to make the leap right away - I AM an optimist after all!).

In fact, not only does the matrix improve, the whole report designer interface changes. The version that Jason showed at PASS was a separate client tool - outside of Visual Studio, that is - which he said will ultimately be merged with Report Builder (but don't expect that to happen in the Katmai release). The beautiful thing about the report designer is that it's less intimidating to non-developer types than the Visual Studio report designer interface. The Properties window is still there for the hard-core folks. For everyone else, a right-click will get you what you need. There's also definitely an Office 2007 flavor to the new designer, including ribbons. Business users who are responsible for report development will LOVE this tool.

I have to say it all looks pretty, but my personal favorite is the disappearance of the green thingie. Nothing personal, but good riddance. Subtotal areas in a matrix will now have a place right alongside every other object you place in the designer. And you can use subtotal areas a lot more flexibly, too, because a matrix is no longer really a matrix. Now it's a tablix. (Is that tay-blicks with a long a or tab-blicks with a short a.....?) More about tablix in a future blog entry. Watch for the new report designer in a future CTP release. --Stacia

Wednesday, October 10, 2007 3:23:34 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  | 
Friday, October 05, 2007

dfyrthrth

Friday, October 05, 2007 7:25:07 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  | 

Theme design by Jelle Druyts