Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

The single version of the truth in PowerPivot

Posted by thomasivarssonmalmo on August 29, 2010

With the arrival of PowerPivot and the concept of self service BI might bring fear among experienced BI developers for the known spreadsheet anarchy of data consolidation and integration within a business. This is founded on the way classic Excel data repositories(that is any spreadsheet) work with the data part, the business rules part(formulas) and the presentation part in the same application. End users can collect data in diifferent ways, apply different filters and create their own business logic in the Excel spreadsheet and at a meeting you end up with endless discussions of what is the most correct Excel version of the truth.
 
How can an organization avoid the situation of their Excel reporting being beyond control and a black box where you put in some data and receive a result that you cannot check if it is correct or wrong? And will PowerPivot make this even worse?
 
Well to be honest, any application, even centrally managed SSAS, can get out of hand. I have seen SSAS cubes with 4000-5000 lines of MDX in a single cube. Add to that business logic in data source views by named queries and named calculations.
 
Also add an ETL system of the data warehouse were we have no standards for documenting how data is changed when it goes from the source system into the data warehouse. These systems mostly have a technical documentation that will not help end users and report creators.
 
All of the issues that I have described appears because of the lack of a blueprint of how information relates in a business and how it should be reported to get a view that can be shared among the whole business.
 
Back to PowerPivot. Todays PowerPivot data collections can become tomorrows centralized view that works if there is a framework for structuring data that is known among end users.  I think that this means that end users should think about their report requirements in terms of centralized dimensions and centralized facts. When I have started SSAS project with customers that do not have SSAS cubes(or cubes from other suppliers) we have started with workshops where we try to write down the dimensions and measures they are using in their business. After the dimensions have been identfied I continue with the hierarchies within each dimension.
 
If you have dimensions that work across spreadsheets, Pivot Tables, PowerPivot models and SSAS cubes you will cause less doubt on what you report. You will also have a better opportunity to consolidate this information across different business processes.
 
Let me bring a simple example of how the idea of this blueprint and thinking can work.
 
Date or Time is a frequent dimension in all reporting. If we discuss a calendar hierarchy in a date dimension that will consist of levels like Date->Month->Quarter->Year. If we add week over date we will have a problem that the entire organization will have to create consensus on. The levels in a week hierarchy can be Date->Week->Year but the rules of how weeks relates to years can be very different across countries. You have the ISO week with years having 52 or 53 weeks depending on when new years day is positioned in as the day in the week and you have the US system. Make a blueprint in the organization of which standard to use and use that standard in all your reports including PowerPivot.
 
You should think in the same way about your organization structure, your products or services. Make a blueprint of these structures and use them in your reports.
 
Actually I have not seen the end users like business controllers having problems with understanding how to build a data framework, with dimensions and fact tables, for reporting since I use their business terms. These discussions do not fit in to a classical IT versus end users conflict since I do not use strange relational diagrams with foreign keys.
 
One of the most interesting features in PowerPivot is that you can import a dimensional model and if your developers have done their job you do not have to care about BI-stuff like starschemas. The data in the dimension tables will be attached to a fact table if the definitions are correct in the backroom database. In PowePivot I can select to import from a relational database and SQL Server and point at the fact table and mark select related tables.
 
 
 
 
 The spreadsheet anarchy can find its way into PowerPivot, like all other uncoordinated efforts to build a report if the end users choose to select the linked table option in PowerPivot. In this case end users can apply different filters and the result can be an information pizza or chaos. Remember that this is not the outcome of the tool but a misuse of the tool.
 
The ideas here are not new but comes from many years reeding of Ralph Kimball’s books on Business Intelligence.
 

Leave a comment