Tag Archives: Data Warehouse

Pittsburgh SQL User Group: Data Warehousing Presentation

Today, I am giving my first presentation at my local SQL Server User Group.  Below are links to the slides.  I will also add some more resource here after todays presentation.

Introduction to Data Warehousing slides

DEMO: Build Your First Cube

Below is the description of the discussion.

November’s presentation will include an Introduction to Data Warehousing with SQL Server presented by John Sterrett, a Database Administrator and Web Engineering Specialist for Orrick, Herrington & Sutcliffe LLP.  If you are interested in learning how Analysis Services, Integration Services and Reporting Services can provide Business Intelligence this meeting is for you.  We will start with the basic concepts of data warehousing including key terms and definitions.  Next, we will cover some sample business cases while we walk through the process of designing a data warehouse.  Once this session is complete you will be able to develop your first cube.

Date: Monday, November 16, 2009

Time: 11:45 a.m. Registration; 12:00 PM – 2:00 p.m.

Venue: Council Training Room, 2000 Technology Drive, Pittsburgh, PA 15219

Cost: No charge

RSVP: Email Heather Jones at Heather_Jones@Dell.com

If you have any questions, need more information, or would be willing to participate in any upcoming presentations or discussions, please Heather Jones at Heather_Jones@Dell.com.

Data Warehouse: Facts and Measures

A few days ago I wrote a post that gave an introduction to dimensions.  Today, we are going to continue covering the basic concepts included in dimensional modeling by covering an introduction to fact tables and measures.  These posts are all part of the introduction to building a data warehouse with sql server series.

What is a Fact Table?

A fact table is a table that joins dimension tables with measures.  For example, Lets say you wanted to know the time worked by employees, by location, by project and by task.  If you had a dimension for employees, location, project and task you would create a composite primary key using these foreign keys and add an additional column for the time worked measure. (more on measures in a little bit)

Keep in mind that fact tables are huge unlike dimension tables.  Fact tables are usually built to contain a hundred thousand records on the low side up to billions of records on the high side.  Therefore, these tables must be normalized to be efficient.

A fact table is normalized when it consists of the following two things:

    1. A composite primary key using the foreign keys to the dimension tables.
    2. Measure(s) used for numerical analysis.

What is a measure?

A measure is normally an additive numerical value that represents a business metric.  You are also not limited to a single measure.  You can have multiple measures within a fact table.  For example, if your fact table is used to track purchases internationally you might have measures for each type of currency.  If you are building a fact table for the retail industry you might also have the following measures cost, list price, average sale price.

Next, you can take a look at my demo on building your first cube.

Introduction to Dimensions

Today, we are going to quickly cover one of the basic concepts included in dimensional modeling.  We are going over the basics of dimensions, attributes and hierarchies.  We will review how they are related and how they work within Data Warehouses.  If you are looking for an introduction to building a data warehouse click here.

What is a Dimension Table?

A dimension table provides the description behind the analytic numbers.  It describes the who, what, when, where and why behind the facts. Dimensions are normally broken down into groups (tables) and they contain several attributes (columns).   Unlike a fact table the dimension table is not normalized.  Generally, dimension tables have many columns but a limited amount of rows. Dimension tables normally provide two purposes in a data warehouse, it can be used to filter queries and to select data.

Several data warehouses include the following dimension tables products, employees, customers, time, and location.  Lets say your business requirement is to provide an  time tracking data warehouse.  You would to want to implement a employee dimension table that included at least the following attributes (columns) first name, last name, status, start date, hire date, end date, department name title, salary etc….   we could actually go on and on and this is okay as long as the attributes support your business requirements.

image image

What are dimension hierarchies?

Dimension hierarchies provide a way to define a relationship between multiple attributes within a dimension. They are commonly defined as a structure to provide drill up and drill down capabilities.  You can have multiple hierarchies within a dimension. For example, in a time dimension you might want to have two separate hierarchies, one for fiscal year and another for calendar year.  These two dimension could contain the following attributes year, quarter, month, day.

image

Best Practices for designing a Dimension table

  • Use a unique identifier integer column that is auto incremental as your primary key.  This is commonly known as a surrogate key.
  • Use the source’s primary key as an alternating key.
  • Any additional attributes (columns) that describe the business entity

Slowly Changing Dimensions

Now that we know how to build a dimension we need to consider how the data is stored.  Yes, occasionally the attribute data changes.  For example, lets pretend you are a die hard Pirates fan and built a Data Warehouse to do analysis on your favorite players.  How would you handle promotions like Andrew McCutchen getting called up from AAA Indianapolis to the Pirates?

The following are options to handle slowly changing dimensions:

  1. Do nothing (type 0).  This approach is highly not recommended.  This cannot guarantee history preservation and includes the least control over managing changed attributes.
  2. Overwrite old data with new data (type 1)
    This completely overwrites the previous attribute value.  This could be ideal if you had a spelling error or the historical value of this column is not crucial. This is the easiest method to implement but it can be hard to manage.
  3. Create multiple records with different key values (type 2)
    This provides you with the ability to have an unlimited amount of history.  It is usually implemented by adding a start and end date column.  If you have to make changes to past records this could require you to also update fact records.
  4. Create a new column for previous values (type 3)
    This allows only a fixed amount of history to be retained.  If you create two extra columns for the team attribute you can only store three team values (the current value and last two previous values)

I hope you enjoyed this post.  Next we will cover an introduction to facts and measures.

Building a Data Warehouse with SQL Server

Today, I am going to start a series of blogs on one of my favorite topics Data Warehousing with SQL Server.  I will start by defining a data warehouse.  I will then give some business cases that explain the benefits of a data warehouse.  Finally, I will start writing about the built in tools SQL Server provides to implement a data warehouse with SQL Server.

What is a Data Warehouse?

In its simplest form a Data Warehouse is a way to store data information and facts in an format that is informational.  Hopefully, you were able to pull this information from the photos above.   Personally, I like to think of a Data Warehouse as a tool used by decision makers to improve decision‐making.

Business cases for a Data Warehouse

The following are several reasons business cases that explain how “insert company name here” can benefit from a data warehouse.

  • A Data Warehouse is fast to query. After the initial setup is complete, queries can run up to 1000% faster in an OLAP database than in an OLTP database.
  • A Data Warehouses improves ROI by allowing end users to make more efficient use of enterprise information so many companies have all the information they need.
  • A Data Warehouse is a good solution for application’s that are great for data entry but lacks in depth reporting and drilldown capabilities.
  • A Data Warehouse could be used to bring several applications and/or data sources together.
  • If you are a service company a data warehouse could be used to analyze work completed to estimate future flat fee engagements. (I can go on and on on these examples…)
  • If you already own SQL Server you can implement a data warehouse solution with the built in tools.  This means no additional cost for software is needed.

What tools are needed to Build a Data Warehouse?

Both SQL Server 2005 and SQL Server 2008 include three tools that are very helpful towards implementing a Data Warehouse. These three tools allow you to create an killer data warehouse.

  • Analysis Services (SSAS)
  • Integration Services (SSIS)
  • Reporting Services (SSRS).

We will go over these tools in great detail in future posts.  For now, let’s just think of SSAS as the tool that provides storage for the data used in cubes for your data warehouse. SSIS is the tool we will use to extract, transform and load (ETL) data into our data warehouse.  SSRS is a tool used to view the data in our data warehouse.

Next post : Introduction to Data Warehouse Dimensions