Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Sampling data with the Excel 2007 data mining add in

Posted by thomasivarssonmalmo on June 19, 2009

I have used the sample tool in the Excel 2007 before but I never thought that I could do this directly on a data source, only on an Excel 2007 table, but you can. I am using SSAS 2008, as the data source, and the Excel 2007 data mining tools for SSAS 2008.  This can be useful even if you do not want to use the data mining add ins and only collect data for a pivot table. Think about building a pivot table on top of 100 thousand records or do it with a random sample of 10 thousand records.
 
You cannot join tables in this tool, only select single tables or views.
 
Edit: I had the wrong order of the pictures but that is fixed now.
 
Open Excel 2007 with an empty sheet and go to the data mining menu.
 
 
 
Click on the button next to the data source name box. You will see the select data source page.
 
 
You will se the empty data source query editor below.
 
  
 Click on the button next to server data source listbox. You will see the new analysis services data source form. Enter the data source name and the taget server name and browse for the database at the bottom in the catalog name listbox. I will select the AdventureWorksDW database in the cataloge name listbox below.
 
 
After we have selected the database we will return to the data source query editor. I will select the well known target mail view in the AdventureWorksDW database on the next page below.
 
 
 
 You will come back to the select data source page and see the query in the bottom test box. Click next. You will arrive to the select sample type page where you do not have any choice but accepting random sampling.
 

 
 
 In the random sampling option page you will only have the option to select the number of records for the random sample.
 
 
In the final page you enter the name of the spreadsheet tab. I will not show that. Click finish to get the random sample records into the spreadsheet.
 

 
Another approach is to write a TSQL select and use the random sampling features that has been available since SQL Server 2005. The built in sampling in the data mining tools can be interesting for end users without TSQL coding skills.

Leave a comment