|
Spaces home Thomas.I Microsoft BI & ...ProfileFriendsBlogMore ![]() | ![]() |
|
Thomas.I Microsoft BI & AnalyticsAugust 18 Product Sales, Top and BottomCount in the same reportHere is a MDX code snippet that will show you how to combine named set based on TopCount and BottomCount. This type of report is sometimes refered to as "winners and loosers". This means that the report will show the best selling products and the worst for a single year.
It is also easy to use this code in others types of analytics like the products with most and least errors.
I will start this time with showing the final result.
I have the top 10 product sales for 2003 as the first ten members followed by the bottom 10 product sales. On the columns I have Internet Sales Amount for 2003 , the market share for each product and finally the ranking of these products over all sold products in 2003. Here is the code to produce this result. I am sure that there are many other ways to do the same. This time I have checked the query in MDX Studio and the number of Storage Engine queries was 7. WITH Member Measures.[Market Share] As Set Top10Products2003 as TopCount([Product].[Product Categories].[Product].members,10,(Measures.[Internet Sales Amount],[Date].[Calendar Year].&[2003])) Set Bottom10Products2003 as Set Top10Bottom10Products as Union(Top10Products2003,Bottom10Products2003) Set AllSoldProducts2003 as NonEmpty([Product].[Product Categories].[Product].members,([Measures].[Internet Sales Amount],[Date].[Calendar Year].&[2003])) Member Measures.[RankTopBotton10Products] as Select {Measures.[Internet Sales Amount],Measures.[Market Share],Measures.[RankTopBotton10Products]} On 0, A few comments regaring this code. The BottomCount function needs the help of the NonEmpty function to not pick the products with null sales in 2003. The AllSoldProducts2003 is used to rank these 20 products over all sold products. I have also checked that the number of sold products was 133 in 2003. The ranking of products is tied to Internet Sales Amount but it will also work with the market share calculated measure. You should compare the amount of MDX code required to build this report with a SQL based solution. August 03 One master warned us ten years agoThere is one article that I frequently read and mail the link to colleagues and friends in the BI-business.
It is written ten years ago when I entered the BI business and I still see these mistakes being repeated today.
Do not try to build a complete solution for all your BI requirements in one large project.
Next, the stovepipe approach or building information silos is still to frequent together with the monolitic approach. Stovepipes will only copy the Excel anarchy approach by centralizing the same problem, that people go to meetings with different data about the same business. The monolitic approach will be to slow on delivering anything useful.
If a project is built according to one of these approaches it will spill over to your cubes and force you to create MDX code to support structures that were not good enough from the start.
The third point of that article is to distinguish between transaction cubes and snapshot cubes. I have been part of projects were we tried to build snapshot truth on top of transactions and had a cube with thousand of lines of MDX on top of transaction fact table. This is a frequent issue when you are trying to get budgets, forecasts and transaction views of data to fit in to the same fact table.
Chew the elephant one bite at a time! Avoid the single bite approach. July 27 What happend to the top 10 products previous years ?I have written previously about the TOPCOUNT() function and in this blog post I will extend that with the previous development for the top 10 products. This is also the second part of my previous blog post about analyzing product volatility.
Let us start with the TOPCOUNT function and later extend that with a calculated measure using the RANK() function.
Here we have the top 10 selling products, for the year 2004, in the Adventure Works cube.
WITH Set TopTenProducts2004 As
TopCount(Descendants([Product].[Product Categories],[Product].[Product Categories].[Product]) , 10, ([Measures].[Internet Sales Amount],[Date].[Calendar Year].&[2004])) Select {[Date].[Calendar Year].&[2004]} On 0,
TopTenProducts2004 On 1 From [Adventure Works] Where (Measures.[Internet Sales Amount]); I have been explicit about putting the year 2004 on column since you have the freedom to put any year you want there.
If you would like to check this with the data source you can use this TSQL statement:
Select p.ProductAlternateKey, p.EnglishProductName, Sum(f.SalesAmount)as SalesAmount
From DimProduct p Join dbo.FactInternetSales f On p.ProductKey = f.ProductKey join dimTime t On f.OrderDateKey = t.TimeKey Where Year(t.FullDateAlterNateKey) = 2004 Group By p.ProductAlternateKey, p.EnglishProductName Order By Sum(f.SalesAmount) DESC And now to the final example. How can I show the previous years ranking for the top 10 products in 2004?
WITH Set TopTenProducts2004 As
TopCount(Descendants([Product].[Product Categories],[Product].[Product Categories].[Product]) , 10, ([Measures].[Internet Sales Amount],[Date].[Calendar Year].&[2004])) MEMBER [Measures].[ProductRank2004] AS
RANK([Product].[Product Categories].CurrentMember, TopTenProducts2004) Set TopTenProducts2003 As
TopCount(Descendants([Product].[Product Categories],[Product].[Product Categories].[Product]) , 10, ([Measures].[Internet Sales Amount],[Date].[Calendar Year].&[2004].Lag(1))) MEMBER [Measures].[ProductRank2003] AS
RANK([Product].[Product Categories].CurrentMember, TopTenProducts2003) Set TopTenProducts2002 As
TopCount(Descendants([Product].[Product Categories],[Product].[Product Categories].[Product]) , 10, ([Measures].[Internet Sales Amount],[Date].[Calendar Year].&[2004].Lag(2))) MEMBER [Measures].[ProductRank2002] AS
RANK([Product].[Product Categories].CurrentMember, TopTenProducts2002) Select {[Measures].[Internet Sales Amount],[Measures].[ProductRank2004],[Measures].[ProductRank2003],[Measures].[ProductRank2002]} On 0, TopTenProducts2004 On 1 From [Adventure Works] Where ([Date].[Calendar Year].&[2004]); What I do is to create sets for the previous years simply by repeating the same named sets for each year of interest but by only adding the LAG() function to each named set. This gives me the top 10 products for 2004 but their sales for each previous year I add. Each top 10 set is placed in the calculated measure with the RANK() function. If the product has not been sold the RANK() function will return 0.
In the result you will see that some of the top 10 products in 2004 had another ranking in 2003 and none of them were sold in 2002.
I am sure that the code above can be improved and optimized so do not not hesitate to add you comments.
July 22 Product Volatility in Adventure WorksSales for products are going up and down. In this blog post I will simply show how you can count the number of products with increasing and decreasing sales between each month in the Adventure Works cube. Volatility is simply about how many changes in sales you have for products. Sometimes you do this by comparing current month with the previous month, as in this example. Sometimes you compare this month with the same month one year ago.
This is a MDX snippet written for beginners and medium experienced users. I will also use some TSQL to make you sure of that the starting point for this analysis is correct.
I will start in the relational world and the source for the Adventure Works cube. If you run:
Select * From DimProduct
It will return 606 products. Not all of these are final products sold to customer. If you run this select and relate the product table to the product subcategory table:
Select Count(*) From DimProduct dp
Join DimProductSubCategory ds on dp.ProductSubCategoryKey = ds. ProductSubCategoryKey You will see 397 products.
But how many products have been sold each month in this data base? Run this TSQL:
Select Distinct Year(t.FulldateAlternateKey) * 100 + Month(t.FulldateAlternateKey) as YearMonth, Count(Distinct ProductKey) as NoProducts
From DimTime t Join dbo.FactInternetSales f On t.TimeKey = f.OrderDateKey Group By Year(t.FulldateAlternateKey) * 100 + Month(t.FulldateAlternateKey) Order By Year(t.FulldateAlternateKey) * 100 + Month(t.FulldateAlternateKey) What I am doing with dates is simply to create a integer version of year and month. Remember this resultset when we move to the MDX-world.
I want to see the total number of products and the number of sold products in the Adventure Works cube. This MDX will help you with that and show you the same result, for the number of sold products as above:
WITH
MEMBER Measures.TotNumberOfProducts AS Count( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] )) MEMBER Measures.TotNumberOfSoldProducts AS Count( NonEmpty ( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] )) SELECT { Measures.TotNumberOfProducts ,Measures.TotNumberOfSoldProducts } ON 0 ,[Date].[Calendar].[Month] ON 1 FROM [Adventure Works]; You will see exacly the same resultset as in the TSQL query above.
The nice thing with MDX is that it is not that hard, as in TSQL, to compare what happend the month before the current one in the resultset. Here is the final snippet of how many products that increased, decreased and had the same sales as the previous month in the Adventure Works cube.
WITH
MEMBER Measures.TotNumberOfProducts AS Count( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] )) MEMBER Measures.TotNumberOfSoldProducts AS Count( NonEmpty ( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] )) MEMBER Measures.NumberOfSoldProductsIncreasing AS Count(Filter( NonEmpty ( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] ),([Date].[Calendar].CurrentMember,[Measures].[Internet Sales Amount]) > ([Date].[Calendar].PrevMember,[Measures].[Internet Sales Amount])) ) MEMBER Measures.NumberOfSoldProductsDecreasing AS Count(Filter( NonEmpty ( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] ),([Date].[Calendar].CurrentMember,[Measures].[Internet Sales Amount]) < ([Date].[Calendar].PrevMember,[Measures].[Internet Sales Amount])) ) MEMBER Measures.NumberOfSoldProductsNonChanging AS Count(Filter( NonEmpty ( Descendants ( [Product].[Product Categories] ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] ),([Date].[Calendar].CurrentMember,[Measures].[Internet Sales Amount]) = ([Date].[Calendar].PrevMember,[Measures].[Internet Sales Amount])) ) SELECT { Measures.TotNumberOfProducts, Measures.TotNumberOfSoldProducts, Measures.NumberOfSoldProductsIncreasing, Measures.NumberOfSoldProductsDecreasing, Measures.NumberOfSoldProductsNonChanging } ON 0 ,[Date].[Calendar].[Month] ON 1 FROM [Adventure Works]; The basic idea is the same for the calculated measures in the last example. Use NonEmpty to get the products that were sold and the combination of Count and Filter to get the number of products that increased, decreased and stayed stable between months. It should not be that hard to add ratios for each of them compared with the number of sold products.
Another perspective might be to return the names of the products that had the highest number of increases, decreases or both during a time period. I will return with that later.
Edit: Mosha Pasumansky kindly asked me if he could use this blog post as an example for optimizing the query with MDX studio. Since I have focused on the business problem part of the problem I was happy to get the help from the best MDX mentor to tune this query. You should always think about performance as the second part in this process. Have a look at the performance angle of this query at Mosha's blog. July 19 SQL Server 2008 book releases coming onI did a quick search om Amazon for SQL Server 2008 books and it returned 3 pages of titles. Most of them will be issued during the last few months this year and the first months, next year. It is only Teo Lachev that has already released his Reporting Service 2008 book.
When I search for SQL Server books I look for well known names in the business and authors that have written good books earlier. This does not mean that my list have rejected other books because they are bad, it is imply beacuse I cannot buy all titles.
Here is the list.
Since the changes from SQL Server 2005 to 2008 is not that big most of the books you already have will not be outdated the day after the release of SQL Server 2008. I also think that most SQL Server 2005 customers will wait until SP1 before they upgrade. Finally, What I am missing is a good book on MDX, like an updated version of "Fast tracks to MDX". I have not found that yet but I guess that a MDX book will not be a block buster and that I must find a general SSAS 2008 book with best MDX coverage. July 13 Recursive hierarchies or non recursive, pros and consRecursive dimensions in SSAS 2005 is supported only in the form of parent-child dimensions. You can check my previous entry about parent-child dimensions but shortly they rely on a single parent at the top of the dimension. The more complex form of recursive hierarchies allows several parents at the top and are called Bill Of Materials hierarchies. The later is not supported in SSAS 2005 unless you model it as many-to-many dimensions.
Lately I have met some developers that prefer to use parent-child hierarchies as a standard structure for all dimension in a data warehouse and in SSAS 2005. Even if Microsofts advice is to be careful with parent-child dimensions, like you can read in the SSAS 2005 "Performance Guide", some developers thinks that this is valid general design approach. Still they will never build a recursive hierarchy of the time dimension because even they know that you need named dimensions levels, like Year, Month and Date, in order for time calculations to work.
Let us have a look of how you can model a product dimension as a recursive structure and later the other way, non-recursive, or natural hierachies. This is a simplified version of a recursive model:
In the product table you keep all records for all levels in the product dimension. Members and parents are kept in the same table. You have also a product level table that can tell on what level the different records are in the product table. The third table, ProductStructure keep the relations between all levels in the product dimension. The ProductStrucure table has a primary key and aforeign key relation between the ProductId and the ParentProductId, but if you keep several alternate hierarchies in this structure the foreign key constraint will not work.
What are the pros of this design?
What are the cons?
It think that the cons outperforms the pros by a very high magnitude. How do you model the product dimension in a non recursive way? You can put all attributes in a single dimension table. If you have large dimension tables and frequent changes this will not be the best choice. Here is a simple description that is a copy of the structure in the Adventure Works DW database. If you would like to add new parent attributes to the product table you will need to add a new key for that and new tables for levels above the product level. When the structure change you will need to add new tables. If you use a single product table for all attributes you add new columns. When you write a query, with this last structure, you will only need to write standard joins and the report side of combining attributes will be much more simple, than with recursive hierarchies. Keep you design simple and good enough. June 29 Nothing new in SSAS 2008 RC0 regarding mulitiselect ?I recently installed SQL Server 2008 RC with SSAS 2008. My first idea was to check if anything have changed regarding multiselect in SSAS 2005. I have tried many of the examples that Mosha has on his blog but without having seeing any changes compared to SSAS 2005.
Mosha also says here that he would return with a solution i february 2008.
Edit: Mosha has a comment on this blog post that will tell you more about this subject.
Dynamic sets in SSAS 2008 can be a solution. Anyone else out there that have found any news regarding multiselect in SSAS 2008? June 27 Null processing of measures in SSAS2005A few days ago a question regarding null processing of fact records was posted on the SSAS forum. It was also about nulls in dimension tables but let us focus on what to do with empty measures in a fact record. Should you put zero in them or allow null? I answered that post with the assumption that you should never feed a cube with zeros because it will make the NON EMPTY keyword useless in MDX Selects. Still I never checked the options in SSAS 2005 so this post is the outcome of the need to check this more carefully.
I recommend you to think twice about your fact table design because if you have null or zeros for measures you might have designed it wrong. If not you can continue with this technical solution that SSAS 2005 offers.
We will need to build a simple cube to understand how this works. Run the following TSQL script in management studio in order to create a time dimension and a simple fact table.
---------------------------------------------
Create Table TimeDim1
( FullDate datetime, MonthNo int, MonthNameT Char(3), YearNo Int) Go
Insert Into TimeDim1 (FullDate) Values ('2008-01-01')
Insert Into TimeDim1 (FullDate) Values ('2008-02-01') Insert Into TimeDim1 (FullDate) Values ('2008-03-01') Insert Into TimeDim1 (FullDate) Values ('2008-04-01') Insert Into TimeDim1 (FullDate) Values ('2008-05-01') Insert Into TimeDim1 (FullDate) Values ('2008-06-01') Go
Update TimeDim1
Set MonthNo = Month(FullDate), MonthNameT = Left(DateName(mm,FullDate),3), YearNo = Year(FullDate) Go
Create Table FactsWithNull
( FullDate datetime, SalesValue int, SalesQty int) Go
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-01-01', 100, 1)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-02-01', Null, Null) Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-03-01', 100, 1) Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-04-01', 100, 1) Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-05-01', 100, 1) Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-06-01', 0, 0) -----------------------------------------
When you run select * on the fact table you will have february 2008 with two null measures and june 2008 with two zero measures.
Next, you build the simple cube with one time dimension like this.
No it is time to run the test on the two measures in the cube. The nullprocessing property of a measure is placed under the source property in the cube structure pane in BIDS. The settings available are automatic, preserve, error and zeroOrblank. I will test all except for error.
The first case is NullProcessing = Automatic. This is also the deafult. You will see a result like this, below, in ProClarity or the cube browser in BIDS. The Null measures will be converted to zeros. If you add NON EMPTY on columns, february 2008 will still be seen. This is also the outcome of NullProcessing = ZeroOrBlank.
The next case is Nullprocessing = Preserve. In this case, Nulls will be kept and presented as empty cells and when you apply NON EMPTY on columns the empty month, february 2008, will be removed.
Conclusion: I still do not see any reason for entering zero in empty measures because you have setting in SSAS 2005 that can convert null to zero. If you would like to use the NON EMPTY keyword, to remove empty rows or columns, you should keep the nulls with Preserve.
Edit: This geeky blog reached 20 000 hits this week. Next time i mention this will be when i reach 50 000 hits.
June 21 Named Sets and TopCount, some thoughts of how this workVery often questions about named sets and the TopCount version of that are posted on the SSAS2005 newsgroup.
This blog post is written so I will not have to write the same answer over and over and even an experienced MDX developer can sometimes get confused about how this works.
You can see MDX Selects like this and a developer that is not sure about what is getting returned.
WITH SET TenBestProducts AS
TopCount([Product].[Product Categories].[Product].Members,10, [Measures].[Internet Sales Amount]) Select {[Customer].[Customer Geography].[Country]} On Columns,
TenBestProducts On Rows From [Adventure Works] In this Select you actually ask to the top 10 best selling products for all countries and all times. All dimensions that are not mentioned in TopCount part will point to their all level. How can point to a specific year and a specific country? Let us try France and the calendar year 2004.
WITH SET TenBestProducts AS
TopCount([Product].[Product Categories].[Product].Members,10, ([Measures].[Internet Sales Amount],[Date].[Calendar].[Calendar Year].&[2004],[Customer].[Customer Geography].[Country].&[France])) Select {[Customer].[Customer Geography].[Country]} On Columns,
TenBestProducts On Rows From [Adventure Works] Where([Measures].[Internet Sales Amount]) ; You change the measure part of TopCount into what is technically called a tuple or cube coordinate. Here is the point where end users and developer gets confused.
The best selling product for France in 2004 is Mountain-200 Silver, 46 with the value of $74,239.68 but in the grid you will see France, the correct top selling product and the Internet Sales Amount of $102,079.56. The reason that this result gets returned is that the named set, made with TopCount, is independent of what you put in the slicer. The named sets shows the ten best selling products in France in 2004, on rows, but the values in the cells for France points to the All Date member.
If you would like to show the sales (Internet Sales Amount) for the year 2004 in the cells, you will have to put that in the slicer like this. Remember that the slicer will have no impact on the product members on rows, only the cell values that are returned.
WITH SET TenBestProducts AS
TopCount([Product].[Product Categories].[Product].Members,10, ([Measures].[Internet Sales Amount],[Date].[Calendar].[Calendar Year].&[2004],[Customer].[Customer Geography].[Country].&[France])) Select {[Customer].[Customer Geography].[Country]} On Columns,
TenBestProducts On Rows From [Adventure Works] Where([Measures].[Internet Sales Amount],[Date].[Calendar].[Calendar Year].&[2004]) ; There is another way to check your TopCount results with the Order function and validate each country. Be aware of that the Order function can be a performance killer.
Select {[Measures].[Internet Sales Amount]} On columns,
NON EMPTY ORDER([Product].[Product Categories].[Product],[Measures].[Internet Sales Amount],BDESC) On Rows From [Adventure Works] Where ([Customer].[Country].&[France],[Date].[Calendar Year].&[2004]) The Rank function is also related to the way of doing TopCount analysis with MDX. TopCount is a simplification of doing Rank and Order in MDX, if I am not uncorrect.
WITH
SET OrderedProducts As ORDER([Product].[Product Categories].[Product].Members,[Measures].[Internet Sales Amount],BDESC) MEMBER [Measures].[ProductRank] AS
RANK([Product].[Product Categories].CurrentMember, OrderedProducts) Select {[Measures].[ProductRank],[Measures].[Internet Sales Amount]} On columns,
ORDER([Product].[Product Categories].[Product].members,[ProductRank],BASC) On Rows From [Adventure Works] Where ([Customer].[Country].&[France],[Date].[Calendar Year].&[2004]); Finally, if you would like to remove empty products at the end(with null in the cells) you can use this select instead. I have added the Filter function for this.
WITH
SET OrderedProducts As ORDER([Product].[Product Categories].[Product].Members,[Measures].[Internet Sales Amount],BDESC) MEMBER [Measures].[ProductRank] AS
RANK([Product].[Product Categories].CurrentMember, OrderedProducts) Select {[Measures].[ProductRank],[Measures].[Internet Sales Amount]} On columns,
ORDER(FILTER([Product].[Product Categories].[Product].members,[Measures].[Internet Sales Amount] > 0),[ProductRank],BASC) On Rows From [Adventure Works] Where ([Customer].[Country].&[France],[Date].[Calendar Year].&[2004]); I recommend to play around with these examples in Management Studio and see the effects of changes in the named set and the slicer. The book "Fast Track to MDX" has a dedicated chapter about this problem in AS2000. June 10 How many products do we have in the Adventure Works cube ?This is my first blog post about Counts in SSAS 2005 cubes. I have an idea of posting MDX code snippets that can serve as a help, mainly for beginners and medium experienced developers. The final goal is to create a good enough newbies education in MDX.
Lets get started on MDX and Count from the dimension perspective. This post might also help you with connecting the relational database source world with the SSAS multidimensional world.
If you should answer the question of how many products you have in the Adventure Works cube and you should answere this with MDX you have two opportunities.
Check this by counting the number of products in the natural hierarchy within the product dimension like this:
WITH MEMBER measures.X AS
[Product].[Product Categories].[Product].members.count SELECT Measures.X ON 0 FROM [Adventure Works The answer is 397 products.
What answer will you expect if you use the attribute hierarchy instead? The same number of products?
WITH MEMBER measures.X AS
[Product].[Product].Children.Count SELECT Measures.X ON 0 FROM [Adventure Works] ; The answer is 606 products!
What is correct then? If you use TSQL on the dimension tables you will get the same two different answers.
Select Count(*) from DimProduct will return 606 products.
Since the product dimension have two other tables in the join, productcategory and productsubcategory you should join all three to check the result.
Select Count(*) from dimProduct p Join DimProductSubCategory sc On
p.ProductSubCategoryKey = sc.ProductSubCategoryKey Join DimProductCategory pc On sc.ProductCategoryKey = pc.ProductCategoryKey This last SQL will return 397 products.
Conclusion: If you have a snowflaked dimension in SSAS2005 you will only see the number of members, in a natural hierarchy, that is the joined number of records in all the tables. If you use the attribute hierarchy you can get a different number of leaf level records if the table, connected to the fact table have records with no parents in the snowflaked dimensions above.
In the Adventure Works cube I suspect that this is by design and that the leaf level records without parents are spareparts.
Anyway I hope you get an idea of how to check the number of dimension members in a cube and the related numbers in the data source.
This started an idea of snowflaking dimension with a top level table that tells if a product or customer is still active or not...
|
|
||||||||||||||||||||||||||
|
|