Do you want data analysis to be your cup of tea? That’s great because 2021 is a good year for that. If you’re starting out, part of data analysis is modeling your data. And you can model your data using a mature approach called a star schema. It’s also beginner-friendly. So today, you’ll learn how to create a star schema in Power BI.
But why 2021?
Because data is the gameplay of many industries in 2021 moving forward. In fact, companies are increasing their budgets on data-driven marketing by 40%. Aside from that, Netflix already increased its value to $50 billion in 2020. That’s despite the pandemic. Why? Because their decisions are data-driven.
So, data analysis is one of the sought-after skills in 2021. And we want a slice of that 40% budget. Learning about data modeling through a star schema is also a good start.
But first, what exactly is a star schema?
What is a Star Schema in Power BI?
Star schema is the simplest of the data models used in many data warehouses today. In its basic form, it consists of a fact table at the center surrounded by dimension tables.
But what is a fact table?
A fact table is a table that consists of all the facts about a business entity or process. It can be movie ticket sales, daily rainfall data, daily COVID-19 new cases, and many more.
There can be several fact tables in a business. Each defines different business processes in one data warehouse.
But what is a dimension table?
A dimension table categorizes information in a fact table. For example, in movie ticket sales, the dimensions can be the movie title, date and time shown, the genre of the movie, and many more.
Dimensions help answer business questions like “How many movie ticket sales did we earn last month?” or “Which movie title sold the most tickets?”
But there’s something called a measure inside a fact table. What is a measure?
A measure is a column in the fact table that can be calculated. You can get the sum, average, variance, and more with this. Examples of this can be the total movie ticket sales or average movie theater expenses.
Here’s how a star schema will appear in a diagram in Figure 1.
The formation of the fact table and dimensions is like a star. But note that it doesn’t fix the number of dimensions to 5 tables. It can be less. It can be more.
Now, let’s talk about the good and the bad about star schema.
Star Schema Advantages and Disadvantages
Characteristics of a star schema include:
- A fact table at the center. It acts as a parent table.
- Dimension tables are like child tables. They surround the fact table.
- Dimension tables can’t have their child tables.
- Fact tables have columns that reference a column in each dimension table.
- The design is denormalized because a dimension table can’t reference another dimension table.
These characteristics offer advantages and disadvantages.
- Simpler design that is friendly to beginners and experts.
- Faster queries because of the simpler design and denormalized state.
- No many-to-many relationships.
- Requires strict control to update denormalized dimension tables to maintain data integrity.
- Repeated data in dimension tables may consume more disk space.
Why Start in Power BI?
Power BI Desktop is used to design dashboards and analytical reports. It’s FREE and easy to use. You can also use a variety of data sources. For example, you can build a Power BI report from a SQL Server data warehouse.
But not just SQL Server. It can also be in Excel, MySQL, PostgreSQL, and many more. See a screenshot of Power BI when selecting a data source in Figure 2.
Aside from various data sources, Power BI can use the existing relationships from within a relational database. This will help in modeling your data. And with a good design in your data source, you may not need to work on relationships in your data model. Or at least minimize relating the dimensions to your fact table.
You will see an example of this later.
How to Make a Star Schema in Power BI in 3 Easy Steps
Now let’s build a star schema in Power BI. This will take you a few minutes with our sample data.
The sample data is a SQL Server data warehouse. You can download it from here. After downloading, you’ll notice that it’s a database backup. So, you need to restore it using SQL Server Management Studio. Name your copy of the database SportsCarSalesDW.
(Note: Building a data warehouse in SQL Server is out of the scope of this article. Check out this good article on how to do that.)
If you don’t have Power BI Desktop yet, you can download it here and install it in your PC.
STEP 1: Open Power BI Desktop and Connect to the Data Source
Start by running Power BI Desktop. Then, click Get Data. See a screenshot of this in Figure 3.
There will be further on-screen instructions to pick your specific data source. Then, the next screen will be the same as Figure 2. So, pick SQL Server database and click Connect.
Then, Figure 4 shows the window for specifying the SQL Server and database names. Check the name of your SQL Server for this. For the database name, enter SportsCarSalesDW.
Then, click OK. Enter the credentials to log in to your SQL Server.
STEP 2: Select the Fact and Dimension Tables
Now that you’re connected to our sample data warehouse, it’s time to select the fact and dimension tables. Check out Figure 5.
After selecting the needed tables as indicated in Figure 5, click Load.
STEP 3: Refine Your Data Model
After loading the tables in Power BI, you need to check the table relationships and form a star schema. As you can see in Figure 6 below, dimSportsCarStyles is not yet related to the fact table. So, you need to relate it.
First, click the SportsCarID in dimSportsCarStyles. Then, drag it to the SportsCarID in FactSportsCarSales. A Create Relationship window will appear. Check it out in Figure 7.
This will make a one-to-many relationship between the 2 tables. All the default values are correct at this point. You just need to click OK.
The final star schema diagram in Power BI is shown in Figure 8.
Finally, you need to save your work. Click the disk icon in the upper left and name the report file SportsCarSales.
After you’re done with the data model, you can now create the report you want. In the star schema data model we made, here’s a possible report you can make. Check out Figure 9.
The report above shows total sales and quantity sold. The total for the year 2020 is a whopping $254.25M! Hey, we’re selling some of the coolest sports cars from McLaren and Ferrari!
But there’s nothing yet about profit, expenses, returns, and more. Our sample data source lacks that information. So, I’ll leave that to your analytic, left-brain minds to improve the model.
2 More Points Worth Considering
When I was starting out, I also got confused about how to approach the data model. At some point, it didn’t turn out to be a star schema.
So, consider the points below.
Power BI Star Schema vs. Snowflake Schema
Consider the sample data model we have again. There’s this dimSportsCarStyles table. In the data warehouse, there are separate dimStyles and dimSportsCars tables. But instead of using it, we pick a database view dimSportsCarStyles. This is a combined data set of the styles and sports car tables.
We are discussing the star schema data model. If we use those 2, the dimSportsCars (a dimension table) will need to be related to the dimStyles table. That will contradict the star schema model.
And if we do that, it becomes a Snowflake schema.
Snowflake schemas have a more normalized table design. Because dimension tables can be related to another dimension table. This is closer to the design of transactional databases. It is also more compact and saves space compared to a star schema.
But note that using the database view as a dimension will avoid storing repeated data. Since the data set is small, this approach is good too. There’s also no need to worry about data integrity. And lastly, it satisfies our purpose of creating a star schema.
So, which is best?
It depends on your requirements. For simple, straightforward requirements, you can stick to a star schema data model. Meanwhile, a star schema no longer fits if you need many-to-many relationships.
Power BI Star Schema vs. Flat Table
But you may ask, “Can I just use 1 table for everything?”
That will mean taking out the dimension tables altogether. It will be just one big fact table with all the descriptive fields in it. No IDs and foreign keys at all.
Is that bad?
When you take an Excel worksheet to Power BI, you can do some analysis of it too. It’s flat but doable.
But there’s a catch.
It will be a high-maintenance data model. Why? If you need to add details to it, you keep making the table big and fat. Details are repeated. And not all information can be included. What if we include expenses for the sports car sales we have earlier? That will be another entity.
In the end, the big table will be slow. And you will finally decide it’s time to break it up into a star schema. Do you know what you’re up against when you do this?
So, a flat table is a big NO. It’s not flexible for growing businesses.
Data modeling using a star schema is a part of data analysis. It is a good modeling approach for starters and professionals alike.
Meanwhile, Power BI is a powerful tool to help you model your data and create reports. It’s easy connecting to any data source and creating table relationships.
So, at the end of the day, bosses will make data-driven decisions better.
Did this star schema Power BI tutorial help you? If so, please share it on your favorite social media platforms.Last modified: November 03, 2022