10,573 total views, 3 views today
Data model is a process of organizing data from different data sources to a single design schema which helps to insight the analysis around combined data. A data model is a collection of Fact and Dimension tables connected through common fields called key fields. This data model can help exploring impact of one data entity over different data entities in different ways. We can easily slice and dissect the data in many ways accurately by having a proper data model.
The best common practices that involved to build a proper data model are as given below:
Finding the targeted source tables and fields:
Before building the data model we need to ensure that for what purpose or business need we are building the model. We need to know the list of source tables and fields which holds the descriptive or metric data which helps in analyzing the business needs. Preparing a document with source table and field information helps in keep the tracking the data model with the actual business need.
Fact Tables & Dimension Tables
At the heart of a schema or any data model is a fact table, which holds data entries that comprise a set of facts related to a company’s business operations. Each row in a fact table represents an individual transaction or event. A fact table stores quantitative information for analysis.
A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes, or dimensions, that describe the objects in a fact table. In data warehousing, a dimension is a collection of reference information about a measurable event. Identify the Fact and Dimensional tables, understand the data in these tables and document which fields can be useful for making the analysis in the final application.
Before joining the two tables, we need to find the appropriate fields to be considered to join the two tables. A key field refers to the unique value of a table which can identify each record of a table distinctly. Do the joins with appropriate primary and foreign keys. The key field of a table refers the most unique data that helps to make an association to another table, which has the same granularity.
Removing Synthetic Keys
The redundant fields among the tables in the model may cause forming the synthetic keys. Forming more number of synthetic keys results to ambiguity results and performance issue of an application.
We can remove the synthetic keys by the following ways:
- Renaming the redundant fields
- Dropping/ Commenting the fields
Removing Circular loops
When we do any improper joins or having multiple common fields among the table, a circular loop happens. Circular loop occurs when we can traverse from one table to other table in multiple paths. This leads to ambiguity results in the data model.
Circular loop can be eliminated by the following ways:
- Removing multiple common fields. Or renaming the field which are causing to multiple traverse from One table to another table.
- Dropping the duplicate key field.
Find the example demonstration below
Below is the code to eliminate the circular reference. We can remove the field Region Name from EMP table as it is also there in REGION table.
When there are situations where there are multiple common fields among multiple fact tables then, we need to build an intermediate table with composite keys which can identically associate each fact table. This intermediate table can be called as link table. Without a link table, the data model leads to either circular loops or synthetic key formation issues.
This can be implemented by concatenating all distinct key field values into a single field. And preparing appropriate composite keys to join each Fact table distinctly and dropping the original key field from each fact table.
- Load all distinct key values from all fact tables and concatenate into single table.
- Create composite keys in Link table and Fact table.
- Drop original key field from each fact table.
The sample code for building the LinkTable is as follows. In the below script we can see multiple common fields from each fact table and built a unique field by creating a composite of two fields which brings the unique association property.
Find sample code below:
Creating a Calendar
In any Qlik application, the Date and Time is an important factor to explore the business patterns in many ways. Sales or any metric factors by dimension variables like Year on Year, Month on Month, period, quarter wise analysis helps end users to understand the business growth. To do the same analysis, we need an extra table which can constitute of all these data variable. To fetch the same from any data point, we need an extended table with Year, Month, Quarter, Week, etc. fields based on a Date field from a Fact table. Creating a Master calendar can be done as given below:
- Find minimum date and maximum date from a Fact table which has a Date field in it.
- Create a loop to iterate from minimum date to maximum date to bring all unique dates. Name it as Master Calendar.
- Write a script to find year, month, quarter, week, day fields based on the generate unique Date field.
- Link the Calendar table to Fact table based on common Date field.
The basic sample script for creating Master Calendar is as follows:
Remove the fields which will not be used in the Final application by any business user. Removing the unused fields improves the performance of the application.
It is recommended to use a pre-built Document analyzer application from Qlik Tech to improve the performance of huge data/size application.
This document analyzer helps in finding the application issues like, unused key fields, redundant data points, redundant expressions and master items. Unused UI objects, number of Objects created etc.
Developer Qlik View, Qlik Sense and Tableau
Suman is working as a Developer (QlikView, QlikSense and Tableau) for the past five years, involved in development of Business Intelligence (BI) solutions. Responsible for all activities related to the development, implementation, administration and support of ETL processes for large scale data warehouses and Performance optimization of the BI applications. He has expertise in complex data modelling, designing the scorecard dashboards, Ad-hoc reporting and building KPI’s. In the past two years, he has worked in several domains viz., Financial, Sales, Transportation, Insurance, Media, Healthcare, Telecom, Hospitality, Retail and Crime Analytics.