Introduction to Dimensions

Today, we are going to quickly cover one of the basic concepts included in dimensional modeling.  We are going over the basics of dimensions, attributes and hierarchies.  We will review how they are related and how they work within Data Warehouses.  If you are looking for an introduction to building a data warehouse click here.

What is a Dimension Table?

A dimension table provides the description behind the analytic numbers.  It describes the who, what, when, where and why behind the facts. Dimensions are normally broken down into groups (tables) and they contain several attributes (columns).   Unlike a fact table the dimension table is not normalized.  Generally, dimension tables have many columns but a limited amount of rows. Dimension tables normally provide two purposes in a data warehouse, it can be used to filter queries and to select data.

Several data warehouses include the following dimension tables products, employees, customers, time, and location.  Lets say your business requirement is to provide an  time tracking data warehouse.  You would to want to implement a employee dimension table that included at least the following attributes (columns) first name, last name, status, start date, hire date, end date, department name title, salary etc….   we could actually go on and on and this is okay as long as the attributes support your business requirements.

image image

What are dimension hierarchies?

Dimension hierarchies provide a way to define a relationship between multiple attributes within a dimension. They are commonly defined as a structure to provide drill up and drill down capabilities.  You can have multiple hierarchies within a dimension. For example, in a time dimension you might want to have two separate hierarchies, one for fiscal year and another for calendar year.  These two dimension could contain the following attributes year, quarter, month, day.

image

Best Practices for designing a Dimension table

  • Use a unique identifier integer column that is auto incremental as your primary key.  This is commonly known as a surrogate key.
  • Use the source’s primary key as an alternating key.
  • Any additional attributes (columns) that describe the business entity

Slowly Changing Dimensions

Now that we know how to build a dimension we need to consider how the data is stored.  Yes, occasionally the attribute data changes.  For example, lets pretend you are a die hard Pirates fan and built a Data Warehouse to do analysis on your favorite players.  How would you handle promotions like Andrew McCutchen getting called up from AAA Indianapolis to the Pirates?

The following are options to handle slowly changing dimensions:

  1. Do nothing (type 0).  This approach is highly not recommended.  This cannot guarantee history preservation and includes the least control over managing changed attributes.
  2. Overwrite old data with new data (type 1)
    This completely overwrites the previous attribute value.  This could be ideal if you had a spelling error or the historical value of this column is not crucial. This is the easiest method to implement but it can be hard to manage.
  3. Create multiple records with different key values (type 2)
    This provides you with the ability to have an unlimited amount of history.  It is usually implemented by adding a start and end date column.  If you have to make changes to past records this could require you to also update fact records.
  4. Create a new column for previous values (type 3)
    This allows only a fixed amount of history to be retained.  If you create two extra columns for the team attribute you can only store three team values (the current value and last two previous values)

I hope you enjoyed this post.  Next we will cover an introduction to facts and measures.