This tutorial is based on one of the earlier tutorials – How to Create a Star Schema Model. I strongly recommend that you go through it first to become comfortable with using Power Query to create facts and dimensions for your Star schema before we start working on ID generation.
The data set that we’ll use in this tutorial is very similar to the one we used before with one exception, I have removed the Product ID field from it.
This is a very common scenario when only business attributes are exposed to end users in data exports which makes building dimensions and linking them with fact tables a little challenging.
Please download the Lab Data File here.
Since we already covered the basics of reorganizing similar data sets in facts and dimensions and loading these data into a Power Pivot model, this tutorial will focus on the following objectives:
- Create a Product dimension based on our data set
- Generate a unique id for each product
- Create a fact table that is compatible with our new Product dimension using the same Product ID
Step 1 – Create Product Dimension and generate Product ID
- Open the data file and click on the Excel table on the SalesData tab
-
Click on Power Query->From Table in order to create a new Power Query based on that excel table
-
Shift click on Product related attributes (Product, Category, Segment, Manufacturer) to select them, then right click on any of the selected columns’ header and select “Remove Other Columns”
-
Before we generate a unique Product ID, we need to decide what field (or combination of fields) are unique for each product. In our case Product is such field. If no one field is unique, then we need to add a new one by concatenating the data attributes that create the desired unique value by selecting those attributes and then clicking Add Column->Merge Columns. For the sake of simplicity, we will just assume that Product field is unique in this tutorial. Once we have identified/created the unique field in our Product dimension, we need to make sure that it does not contain any duplicate values. In order to remove duplicates, click on Product field and then Home->Remove Duplicates
- Now we are ready to add an ID field. Click on Add Column->Add Index Column->From 1 to add a new column with incremental values starting from 1
-
Right click on the header of the newly created column, and click Rename
-
Let’s name our new column Product ID
-
Now let’s click on the Product ID header and then drag it to be the very first column in our Product dimension. We can also rename our Query as ProductDim
-
Let’s click on Home->Close & Load to save our Query and load it into a new worksheet.
Step 2 – Creating a Fact table and linking it to Product dimension
- Make sure you are in the SalesData tab
-
Click on Power Query->From Table
-
Our objective now is to remove all of the unnecessary product attributes from the Fact table and replace them with our new Product ID field. In order to do that, we will merge our Fact Power Query with our Product Query by clicking Home->Merge Queries
-
Make sure that your Merge criteria look the same as the picture below
-
Now we can Expand the new column that was created as a result of the Merge operation and pick Product ID from the list (note that we unchecked the default column prefix to save us from having to rename this column later on to get rid of the prefix)
-
Now we can delete the unnecessary product attributes, move the Product ID field to the very beginning, rename the Query as SalesDataFact
- Click Home->Close & Load
We now have what we set out to achieve in the beginning of this tutorial, a Product dimension with a unique Product ID and a Fact table with a corresponding Product ID in it as well. Although it is outside of the scope of this tutorial, you can now add the two power queries to a Power Pivot model and then link them up by Product Id field.
Please download the final lab file here.
Hello, very interesting, helped me a lot. Is this good practice, today?
Thanks! This solved my problem, like no other solutions did 🙂
Thank you my dude, this was exactly what I needed.
Thank you for the demo, however, I am having an issue with this concept — when we create a future query that pulls in new SalesData with new products added in addition to the old products, won’t the ProductID (Index) be re-generated, making the previous ProductIDs obsolete because they are no longer in the original order? Is there a way to preserve the Product ID (fact) table and append new products to the bottom to preserve the Index order?