This article is focused on the beginners who are confused with the term Slowly Changing Dimensions(SCD) . Working on a ETL or any Data Management tool Slowly Changing Dimension is the term which we come across every now and then, in this article I will try to explain what are SCD types and how we can implement them using SAP BODS.
First lets understand what is Slowly Changing Dimension, as the name suggests it involves the dimensions which change over a period of time and we as a Data Consultant have to take care of these dimensions as per the business requirements. Depending on how we want to preserve or manage these changing dimensions we have three Slowly Changing Dimension types.
- SCD Type I,
- SCD Type II, and
- SCD Type III
SCD Type I
In this type of SCD the dimensions which are changing are not required to be used in future and hence we overwrite the data, in other terms we can say that no history is preserved in this type of SCD. Below is the example for the same.
EMP_ID | EMP_NAME | EMP_CITY | EMP_DEPT |
---|---|---|---|
123 | Shubham | Bangalore | Insights and Data |
456 | Manish | Patna | SAP |
EMP_ID | EMP_NAME | EMP_CITY | EMP_DEPT |
---|---|---|---|
123 | Shubham | Bangalore | BODS |
456 | Manish | Patna | BW |
In the above example we can observe that EMP_DEPT for both the records is changed, but in the final table the department name is overwritten and we cannot determine what was there earlier and the new value will be considered.
SCD Type II
In this type of SCD implementation history is maintained and at any given point of time we can see the historic records. For each update a new record is inserted here and a flag or effective date is used for identifying the current or the active record.
EMP_ID | EMP_NAME | EMP_CITY | isActive |
---|---|---|---|
123 | Shubham | Bangalore | 1 |
456 | Manish | Patna | 1 |
EMP_ID | EMP_NAME | EMP_CITY | isActive |
---|---|---|---|
123 | Shubham | Bangalore | 0 |
123 | Shubham | Pune | 1 |
456 | Manish | Patna | 0 |
456 | Manish | Bangalore | 1 |
In the above example you can observe that the EMP_CITY is changed for both the employees but instead of updating the records a new record is inserted with the new city value and also the previous record is updated with isActive as ‘0’. In this manner the historic data is maintained and can be used later if required. In some cases people prefer using effective dates instead of flag which is sometimes more useful in manipulating the data. Below is the example with effective dates:
EMP_ID | EMP_NAME | EMP_CITY | ValidFrom | ValidTo |
---|---|---|---|---|
123 | Shubham | Bangalore | 01/01/1900 | 01/01/9999 |
456 | Manish | Patna | 01/01/1900 | 01/01/9999 |
EMP_ID | EMP_NAME | EMP_CITY | ValidFrom | ValidTo |
---|---|---|---|---|
123 | Shubham | Bangalore | 01/01/1900 | 18/12/2016 |
123 | Shubham | Pune | 19/12/2016 | 01/01/9999 |
456 | Manish | Bangalore | 01/01/1900 | 18/12/2016 |
456 | Manish | Patna | 19/12/2016 | 01/01/9999 |
From the above example gives the more clear picture of the data. It clearly explains till what time a particular employee was in a particular city and when it changed. When in some cases evaluation is done on historic data also it is always advised to use effective dates.
SCD Type III
This is the type of SCD implementation where only limited history is preserved and for preserving history addition columns are added. Taking the initial data from the above SCD Type II example in SCD Type III the data will something like the below table.
EMP_ID | EMP_NAME | EMP_CITY_Current | EffectiveDate | EMP_CITY_Previous |
---|---|---|---|---|
123 | Shubham | Pune | 19/12/2016 | Bangalore |
456 | Manish | Bangalore | 19/12/2016 | Patna |
The limitation of Slowly Changing Dimension Type III over Type II is that in Type III we have the immediate previous data only, whereas in Type II we have the complete history. For example if an employee changes 3 cities there will be 3 records in the table with proper effective dates whereas in Type III implementation only the current and the previous city will be stored.
Slowly Changing Dimension Type II is the most commonly used, but since it inserts records for every update, database performance tuning steps should be taken to avoid querying huge database.
I hope I was clear in explaining the concept above, in the next article we will see how to implement them using SAP BODS.