Multidimensional Schema was inspired by Data Warehousing Systems. The schemas are intended to meet the requirements of large and complex databases used for analytical purposes (OLAP). Data Warehouse Schemas come in various sizes and shapes. There are three primary types of Multidimensional Schemas – Star Schema, Snowflake Schema, and Fact Constellation Schema. Here we will learn in detail the most commonly used schema i.e. Star Schema
Table of Contents
- What is Star Schema Data Modelling?
- What are Fact Tables?
- What are Dimension Tables?
- How does Start Schema Works?
- Pros and Cons of Start Schema Data Modelling
What is Star Schema Data Modelling?
The Star Schema Data Modelling is the easiest and simplest of the Data Warehouse Schemas. Data Warehouses are frequently built using this schema. It is made up of one or more Facts Tables, each of which indexes an infinite number of Dimensional Tables. The Star Schema is required for the Snowflake Schema to exist. Star Schema Data Modelling has numerous advantages and aids in data modeling for warehouses. Because the model looks like a star physically, it is called Star Schema with a Fact Table in the center and Dimension Tables around it to represent the star’s points.
An example of Star Schema Data Modelling is given below to understand it better
A Fact Table is a Table in a Star Schema that contains Facts and is linked to Dimensions.
A Fact Table has two types of columns:
- Fact Columns & Foreign Keys to the Dimension Table
- The Primary Key in Fact Tables is typically a Composite Key composed of all of the Foreign Keys.
Detail Level Facts or Aggregated Facts could be found in a Fact Table (Fact Tables that include Aggregated facts are often instead called Summary Tables). A Fact Table typically contains facts with the same level of aggregation.
As seen in the above example, SALES is a Fact Table that contains properties such as (Product ID, Time ID, Store ID, Employee ID, Sale Type ID, Price, and Quantity) that refer to Dimension Tables.
In the Star Schema Data Modelling, Dimension tables help in the description of dimensions, such as dimension values, attributes, and keys. It is generally small in size and can range from a few hundred to thousands of rows. It describes the items in the fact table.
A dimension table is a collection or group of data related to any measurable event. They serve as a basis for dimensional modeling. It includes a column that serves as a primary key column, allowing each dimension row or record to be uniquely identified. Through this key, it is linked to the fact tables. When it is created, a system-generated key called the surrogate key is used to uniquely identify the rows in the dimension.
For example, in the above diagram Product Dimension table has attributes like Product Id, Product Name, and Product Type.
The Fact table stores both the numerical as well as dimension attribute values. Let’s understand this with an example:
Numeric value cells are distinct to each row or data point and have no correlation or relationship to data in other rows. These may include transactional information such as the order ID, total amount, net profit, order quantity, or exact time.
Instead of data, the foreign key value for a row in a related dimensional table is stored in the dimension attribute values. This type of data will be referenced in numerous rows of the fact table. It could store, for example, the sales employee ID, a date value, a product ID, or a branch office ID.
Dimension tables hold the supporting data for the fact table. Each star schema database contains at least one dimension table, but many more are common. Each dimension table will be linked to a fact table column that contains a dimension value and will contain additional information about that value.
For example, the employee dimension table can contain information such as the employee’s name, gender, address, or phone number, as well as the employee ID as a key value.
Organizations should construct a star schema with vigilance. In each table, avoid combining fact and dimension data. Consider the total number of dimension tables when optimizing performance. Consider the granularity of the data captured when optimizing for the types of queries that will be run. Decide whether to use the exact time or date and whether monetary values should be recorded to the dollar or rounded to the thousandth place, for example.
|Pros of Star Schema Data Modelling
|Cons of Star Schema Data Modelling
|Simple and Efficient Querying as the Fact Table connects the dimensions in the schema
|Data Integrity is not properly enforced because the Schema is severely de-normalized.
|Streamlined business reporting logic as it simplifies reporting and period-over-period reporting compared to standardized transactional schema
|It is less versatile in terms of analytical demands than a normalized Data Model.
|In-built Referential Integrity as a correct key cannot be assigned in the fact table if not associated with dimension
|Does not encourage Many-to-many linkages between business entities
|Easy Maintenance and Load Performance as dimension tables filled once are renewed on a regular basis and you can add new facts on a regular and selective basis
|Data integrity isn’t well-enforced because of its denormalized state.
|Easy to understand and navigate, with simply the Fact Table connecting the Dimensions.
Also Read – Star and Snowflake Schema
To summarise, Star Schema Data Modelling is the preferred schema by businesses among various schemas because it is easy to understand and build, accessing data is faster as compared to other schema models, simpler to drive business insights, and works well with many analytical tools.