Erik Veerman's Blog, page 7

April 1, 2007

Old blog entries

I have yet to move my old blog entries over to our new blog site... but you can find them here:   http://www.solidqualitylearning.com/Blogs/erik/  

Erik Veerman

...on practical SSIS (and SSAS)

erik (at) solidq.com 

 

 •  0 comments  •  flag
Share on Twitter
Published on April 01, 2007 19:47

March 20, 2007

SSAS Many to many optimization technique #1

This has been something brewing for a few weeks that I am just getting to writing down… and yes, naming this idea #1 does imply that I have another idea :) so stay tuned.  Before diving into the details, the summary is that this approach when applied against the AdventureWorksDW database was able to take the intermediate measure group rowcount down from approx 16,000 to only 19.  Wow… because the size of the intermediate measure group has a big impact on performance.   Lets call this the "Many to Many Matrix Relationship Optimization" approach for SSAS M:N relationships… I've been working on this with Dan Hardan (Msft) on this because of a need to make the Many to Many relationships perform better.   As a background, there are many kinds of many to many relationships, but if I can summarize two of the common kinds…   1.)  If you have a many to many relationship that only relates one dimension to another without the m-n tied to the main fact table, then you are lucky!  For example, you have a vendor dimension that contains all your company vendors and then a second dimension that is a vendor role dimension that has different roles (such as shipper, buyer, etc) and a vendor ties to one or more roles directly through an intermediate fact table.  Your main fact table, say inventory fact ties directly to the vendor dimension but you don't know which specific vendor role that an inventory record relates to, so the vendor to vendor role relationship is only between the two dimension.  In this case, your intermediate fact table will be very small as is.   2.) The second many to many relationship is what this (long) blog entry is about… and you can look at the AdventureWorksDW SalesReason implementation to see it.   It's the situation when you have a fact table to dimension M:N relationship with an intermediate fact table that has just as many rows (if not more) than your main fact table… such as through a degenerate dimension.  A single sale can be tied to zero, one or more Sales Reasons… and there is no "normal" dimension that the sales fact table has to go through to get to the intermediate fact table (measure group).  Instead, the intermediate fact table also has the degenerate sales key included in it (so its in both fact tables).  I have run into this situation 2-3 times at difference clients.  In SSAS, a "fact dimension" is created and is tied to each measure group (the sales measure group and sales reason intermediate measure group).  The sales reason dimension is directly tied to the intermediate measure group and tied to the sales measure group as a many to many dimension relationship.  Look at the AdventureWorks cubes to see how this works.  So essentially the shared dimension is huge (the same number of rows as the sales fact) AND the intermediate measure group is potentially bigger than the sales fact table because a single sale can have more than one sales reason! The solution described in scenario 2 surely works, but only for smaller size cubes where the fact tables don't exceed approx 20-50M records (depending on your hardware and SLA).   There's a much better solution for this scenario… and its based on the premise that the relationships can be collapsed to common sets of the many to many dimension records.  For example, if in my sales fact table, I may have 25 records that have the same combination of sales reasons… such as "internet" and "friend."  In the current solution, there are 50 records in the intermediate fact table for those 25 records.  One per sale for the "internet" reason and one per sale for the "friend" reason.  However, this can be collapsed to 2 records if you are able to associate all the 25 sales fact records together and identify them as having these two specific combinations of sales reasons.   How?  Through creating a matrix relationship key, meaning that you get all the different combinations of the sales reason members (not the cross product of them, only the ones that are actually used) and then assign a key to the "matrix" of records.  Here's a great visualization that Dan Hardan put together:   M2M Matrix   These are the steps to make it happen (and the code for AdventureWorks is below).   1.) Create a new dimension the called Sales Reason Matrix Relationship Dimension and it will need to have a column that has a delimited list of surrogate key combinations from the Sales Reason AND an IDENTITY column that is the Matrix Relationship Key.   2.) Add a new dimension key to the fact table that ties the sales records to this new dimension (meaning, look at the old intermediate fact table and find the right combination and then lookup the right new dimension record)   3.) Create an intermediate measure group that has two SKs… the Matrix Dimension Key and the associated Sales Reason Key.  So you will have potentially more than one record per Matrix Dimension Key if you have.   4.) Finally, redo your SSAS design to use the new intermediate measure group and the new Matrix Dimension.  Hide the Matrix Dimension as it's only there to resolve the many to many relationship.  Here's what the Dimension Usage l;ooks like in SSAS... and the XMLA is below.

DimUsage

  OK!  Now, here's some TSQL code and the SSAS XMLA to test it out in AdventureWorks:
TSQL to create the M2M Matrix structures   XMLA to build the new cube with the Matrix Relationship included   FYI, I've only made this work for the Sales fact table… but obviously this can extend beyond that.
  I am very interested in hearing your feedback on this.
  Kind regards,
Erik
 •  0 comments  •  flag
Share on Twitter
Published on March 20, 2007 14:49

Erik Veerman's Blog

Erik Veerman
Erik Veerman isn't a Goodreads Author (yet), but they do have a blog, so here are some recent posts imported from their feed.
Follow Erik Veerman's blog with rss.