How to implement SCD type 2 in Abinitio
SCD-type 2 in Abinitio | SCD type 2 in Abinitio | How to implement SCD type 2 in Abinitio
For example . we have an input dataset which is already loaded in as dimension table already.
vehicle dimension dataset
sk, veh_cd, curr_driver,hist_driver,start_dt, end_dt, curr_flag
1, veh01, dr0, dr0, 19000101000000,39991231235959,Y
2, veh02, dr1, dr1, 19000101000000,39991231235959,Y
3, veh03, dr2, dr2, 19000101000000,39991231235959,Y
4, veh04, dr3, dr3, 19000101000000,39991231235959,Y
5, veh05, dr5, dr4, 19000101000000,20210220235959,N
6, veh05, dr5, dr5, 20210221000000,39991231235959,Y
Sk is Logical Surrogate Key created by Abinitio Process
veh_cd - vehicle code of the driven vehicle
curr_driver - current driver of the vehicle
hist_driver - driver who drove the vehicle in past between start_dt and end_dt
curr_flag - Y/N , based on the current record of the vehicle
curr_flag=Y meaning the vehicle is currently driven by the curr_driver
curr_flag=N, meaning the vehicle was driven by hist_driver between start_dt and end_dt
Now,
lets say we received the current dataset from the source as below:
we will not have hist_driver and curr_flag in the incoming dataset.
Assumption is that we received the current dataset on 5th June 2021.
sk, veh_cd,curr_driver,start_dt, end_dt,
1, veh01, dr0, 19000101000000,39991231235959
2, veh02, dr1, 19000101000000,39991231235959
3, veh03, dr2, 19000101000000,39991231235959
4, veh04, dr3, 19000101000000,39991231235959
5, veh05, dr6, 19000101000000,39991231235959
note: do not gain any meaning of the start_dt,end_dt and curr_flag columns in the current dataset, as
this data set was prepared to match the layout of the dimension dataset.
Observation:
we see that we received a updated record for the veh05 with driver as dr6.
so we need to close the record #6 of the dimension dataset and insert an new record with the dr6 as current driver of the vehicle.
Our Target dataset will look like as below:
sk, veh_cd,curr_driver,hist_driver,start_dt, end_dt, curr_flag
1, veh01, dr0, dr0, 19000101000000,39991231235959,Y
2, veh02, dr1, dr1, 19000101000000,39991231235959,Y
3, veh03, dr2, dr2, 19000101000000,39991231235959,Y
4, veh04, dr3, dr3, 19000101000000,39991231235959,Y
5, veh05, dr6, dr4, 19000101000000,20210220235959,N
6, veh05, dr6, dr5, 20210221000000,20210604235959,N
7, veh05, dr6, dr6, 20210605000000,39991231235959,Y
we also update all values of curr_driver for historical records with the new driver.
the design of the Abinitio Graph will look like as below:
input port0: $AI_DML/vehicle.dml ==>rec1
input port1: $AI_DML/vehicle_incoming.dml===>rec2
Output port of JOIN component:
PDL : $[join_types(in0_metadata,in1_metadata)]
Join output will be passed through the REFORMAT component:
out:output-indexes(in)=
begin
let date(YYYYMMDD") HIGH_END_DATE ="39991231235959";
out:1: if(in.rec1.curr_driver!=in.rec2.curr_driver and in.rec1.end_dt!=HIGH_END_DATE) [vector 1];
out:2: if(in.rec1.curr_driver!=in.rec2.curr_driver and in.rec1.end_dt==HIGH_END_DATE) [vector 2,3]
out:: [vector 0];
end;
transform0 (no update, exact match )
out:reformat(in)=
begin
out.*::in.rec1.*;
end;
transform1 (update the historical records with curr_driver value)
out:reformat(in)=
begin
out.*::in.rec1.*;
out.curr_driver::in.rec2.curr_driver;
end;
transform12 (close the current record)
out:reformat(in)=
begin
out.*::in.rec2.*;
out.hist_driver::in.rec2.curr_driver;
out.end_date::now1();
out.curr_flag=N;
end;
transform13 (create the new record)
out:reformat(in)=
begin
out.*::in.rec1.*;
out.hist_driver::in.rec2.curr_driver;
out.start_dt::now1();
out.end_dt::HIGH_END_DATE;
out.curr_flag::Y
end;
Collect the output of transform0 to TRASH
Collect the output of transform1,2 to Update the Dimension TABLE.
Collect the output of transform3 to insert into the Dimension TABLE.
Please also view my follow-up video on SCD type 2
SCD type 2 dimension video for more metaprogramming detail:
SCD type 2 dimension video for more metaprogramming detail:
Comments
Post a Comment