Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Optimize CROSSJOINS and Calculated Measures with NONEMPTY

Posted by thomasivarssonmalmo on February 3, 2008

 
This post, like many previous about MDX, is mainly for the readers, like me, who do a lot of work in other MS BI applications(SSIS, the database,SSRS, MOSS and PerformancePoint) and are not top 10 MDX experts but would like to know more about this interesting multidimensional language.  My ambition is to write about a real business problem with SSAS cubes and how you find a solution by going from the gurus blogs to implementing it, by trial and error, by yourself.
 
In the SSAS newsgroup I read a question of why a crossjoin of leaf level products and single days takes so long when you add a calculated member.
 
The MDX Select was close to this statement that I have changed to apply to the Adventure Works cube. We can neglet the known issue with ParallelPeriod and time dimensions that start in the second half calendar year.
 
WITH Member Measures.[PPInternetSales] As
SUM(ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),
[Measures].[Internet Sales Amount])
 
Select NON EMPTY{[Measures].[Internet Sales Amount], Measures.[PPInternetSales]} On Columns,
NON EMPTY CrossJoin({[Date].[Calendar].[Date].Members},{[Product].[Product Categories].[Product].Members}) On Rows
From [Adventure Works];
 
This query takes 7 minutes and 22 seconds on my laptop.
 
If we do the same crossjoin but without the calculated member, like this:
 
Select {[Measures].[Internet Sales Amount]} On Columns,
NON EMPTY CrossJoin({[Date].[Calendar].[Date].Members},{[Product].[Product Categories].[Product].Members}) On Rows
From [Adventure Works]; 
 
It takes 5 seconds on my laptop in management studio. This means that it is the calculated measure that is the problem but I have never realized that they slow down a MDX query this much, from five seconds to more than seven minutes.
 
If we do exactly the same in TSQL I will get an answere in less than 1 second on my laptop and 23 797 records(all products sold on all the the days in the Adventure Works cube), but without a parallelperiod value.
 
Select t.FullDateAlterNateKey, p.ProductAlterNateKey + ‘-‘ + EnglishProductName as Product,
Sum(f.SalesAmount) as InternetSalesAount
From dbo.DimTime t Join dbo.FactInternetSales f On
t.TimeKey = f.OrderDateKey Join dbo.DimProduct p On
f.ProductKey = p.ProductKey
Group By  t.FullDateAlterNateKey, p.ProductAlterNateKey + ‘-‘ + EnglishProductName
Order By  t.FullDateAlterNateKey, p.ProductAlterNateKey + ‘-‘ + EnglishProductName
 
With the TSQL query I know how many records I should get. The first MDX select above returns the same number of records from the cube as the TSQL version. To be fair when comparing the query time with the cube I get the parallelperiod value with a cost that is only nine seconds more than the TSQL query, as you will see soon. When I do not use calculated members in the MDX Select, MDX is faster than I expected, but statistically you can say that five times slower than TSQL.
 
After reading a few blogs I decided to start with the NONEMPTY MDX function that Mosha has written about.
 
WITH Member Measures.[PPInternetSales] As
SUM(ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),[Measures].[Internet Sales Amount])
 
Select {[Measures].[Internet Sales Amount], Measures.[PPInternetSales]} On Columns,
NONEMPTY(CrossJoin({[Date].[Calendar].[Date].Members},{[Product].[Product Categories].[Product].Members})) On Rows
From [Adventure Works];
 
Query time is down to 4 seconds on my laptop but I get only 3575 records. Something is wrong. Another version was even worse and returned 13 361 records in 12 seconds.
 
WITH Member Measures.[PPInternetSales] As
SUM(ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),[Measures].[Internet Sales Amount])
Select {[Measures].[Internet Sales Amount], Measures.[PPInternetSales]} On Columns,
CrossJoin({NONEMPTY([Date].[Calendar].[Date].Members)},{NONEMPTY([Product].[Product Categories].[Product].Members)}) On Rows
From [Adventure Works]; 
 
Changing NONEMPTY to EXISTS returned the right number of records but response time was 4 minutes and 40 seconds.
 
WITH Member Measures.[PPInternetSales] As
SUM(Exists(ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),[Measures].[Internet Sales Amount]))
Select {[Measures].[Internet Sales Amount], Measures.[PPInternetSales]} On Columns,
NON EMPTY Exists(CrossJoin({[Date].[Calendar].[Date].Members},{[Product].[Product Categories].[Product].Members})) On Rows
From [Adventure Works];
 
After this execise NONEMPTY seemed to be hard to trust so I had searched the SSAS2005 newsgroup on NONEMPTY and got over 300 hits. Lycky me that I found a comment from Chris Webb regarding NONEMPTY on cubes with several measure groups. Chris commented that you will get members that exists in all measure groups unless you are pointing to the exact measure that should be used.
I changed my MDX Select to this.
 
WITH Member Measures.[PPInternetSales] As
SUM(ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),[Measures].[Internet Sales Amount])
 
Select {[Measures].[Internet Sales Amount],Measures.[PPInternetSales]} On Columns,
NONEMPTY(CrossJoin({[Date].[Calendar].[Date].members}, {[Product].[Product Categories].[Product].members}),[Measures].[Internet Sales Amount]) On Rows
From [Adventure Works];
 
Finally I get the correct number of records and a MDX select with a calculated member in 10 seconds.
 
I you have an idea of how to make this MDX select even quicker do not hesitate to comment.
 
Edit: Seems that there are differences of how fast the query is returned. My last query with SUM in the calculated member takes 5,7 seconds on my laptop with another tool, MDX-Studio. Darrens suggestion, that you can read in the comment, takes 2,9 seconds in MDX-Studio.

6 Responses to “Optimize CROSSJOINS and Calculated Measures with NONEMPTY”

  1. Darren said

    Hi Thomas,
     
    Sometimes you can get a performance boost by also doing NonEmpty on the sets inside the crossjoin. If your data is sparse enough, this means that you end up with less members to crossjoin, but I did not any boost from this against Adventure Works. What did help was to remove the Sum() function around the calculated member. Which I am guessing was casting the member returned from ParallelPeriod as a single member set before summing it.
     

    WITH Member Measures.[PPInternetSales] As
    (ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember),[Measures].[Internet Sales Amount])
    Select {[Measures].[Internet Sales Amount],Measures.[PPInternetSales]} On Columns,
    NONEMPTY(CrossJoin({[Date].[Calendar].[Date].members}
    , {[Product].[Product Categories].[Product].members})
    ,[Measures].[Internet Sales Amount]) On Rows
    From [Adventure Works];

  2. Thomas said

    Hi Darren! As an MDX rookie I would greatly appreciate if you have a further comment on this: "What did help was to remove the Sum() function around the calculated member. Which I am guessing was casting the member returned from ParallelPeriod as a single member set before summing it."
     
    Have Mosha written about this? MDX is certainly a puzzle sometimes!

  3. Thomas said

    Darren, your suggestion to leave out SUM in the calculated member did improve performance even further, 6 seconds on an empty cache. Thanks for that!

  4. Vidas said

    Hi Thomas,
     
    I was playing with SSAS 2008 and needed a query on AdventureWorks that runs long. So I grabbed your query to see how long this will run. I like what I see in 2008. Your tweaks are just right for SSAS 2005. But in SSAS 2008 results are very close. Results on my VirtualPC on cold cache:
    Your first query runs 15seconds
    Your last query runs 15seconds
    Darrens query run 11seconds.
     
    Just thought I’ll share this here after "borrowing" your query for my tests.
     
    Vidas
     

  5. Thomas said

    Hi Vidas. It is nice to hear about the improvements in SSAS 2008 and that the queries are useful examples.

  6. Luke said

    Hi Thomas – Thank you for sharing this. This small change did wonders on a slow running report (20 mins to 28 secs)Luke

Leave a comment