Basic Concepts

Dimensional Modelling

Dimensional modeling is widely accepted as the preferred technique for presenting analytic data because it addresses two simultaneous requirements:

  • Deliver that that’s understandable to the business users
  • Deliver fast query performance

Dimensional modelling always uses the concepts of facts and dimensions.

Facts, or measures are typically (but not always) numeric values that can be aggregated; Dimensions are groups of hierarchies and descriptors that define the facts.

For example, in a sale transaction dataset, sales amount is a fact; timestamp, product, region number, store number, etc. are elements of dimensions.

Normalisation vs Denormalisation

Normalised Data

A schema design to store non-redundant and consistent data.

  • Data Integrity is maintained.
  • Little to no redundant data.
  • Many tables.

Denormalised Data

A schema that combines data so that accessing data (querying) is fast.

  • Data Integrity is not maintained.
  • Redundant data is common.
  • Fewer tables.

Normalisation vs Denormalisation

Typical Data Warehouse Architecture

data warehouse arch

Staging / Integration Layer

  • Create data in particular day
  • Ensure all data comes from the same format

Data Warehouse / Data Vault

  • All Data are having the same format
  • Everything in 2NF

Data Marts

  • Different department or team can have their dedicated Data Mart, with their own business logic applied.
  • Provide information more quickly
  • One user does data manipulation in his Data Mart will not affect other user in their Data Mart

ETL and Data Marts

  • Extraction, Transformation and Loading
    • Extraction. Get the Data
    • Transformation. make it useful.
    • Loading. Save it to the warehouse.
  • Data Marts (Sub-sets of the Data Warehouse)
    • Don’t mess with my data.
    • Keep it simple for the user.
    • Small problems are easier to solve

Schema Design

The dimensional model is built on a star-like schema or snowflake schema, with dimensions surrounding the fact table. To build the schema, the following design model is used:

  1. Choose the business process.
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the fact

Step 1: Define business process

The basic in the design build on the actual business process which the data warehouse should cover. Therefore, the first step in the model is to describe the business process which the model builds on. To describe the business process, one you choose to do this in plain text or use basic Business Process Modeling Notation (BPMN) or other design guides like the Unified Modeling Language (UML).

Step 2: Declare the grain

The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it w

Step 3: Identify the dimensions

Step 4: Identify the fact

Multidimensional Schemas

  • Star Schema
  • Snowflake Schema
  • Kimball Model

Star Schema

Star Schema is the simplest type of Data Warehouse schema. In the Star Schema, the center of the star can have one fact tables and multiple associated dimension tables. It is also known as Star Join Schema and it optimised for querying large dataset.

Not in 3NF (Big Data)

  • The data should be de-normalised to 2NF
  • This means you get data redundancy
  • This means you need more storage
  • But you can get at the data more quickly
  • The purposes of a DW is to provide aggregate data which is in suitable format for decision making.

Slowly Changing Dimensions (SCD)

Type 0 SCD

Type 1 SCD

Type 2 SCD

Reference