Abinitio Phases and Checkpoints
The phases and checkpoints mechanism in Abinitio works on the following tradeoffs:
Speed/control - more control makes the graph slower
Speed/safety - more safety (from a recovery point of view) makes the graph slower
The primary objective of creating phases in a graph is to make efficient use of available resources such as CPU, main memory and disk space.
Phase break: the boundary between two phases is called a phase break.
The first phase must complete before the second phase can start.
When the first phase completes, the Abinitio component immediately before the phase break writes all the data passing through it to temporary files under the layout of the component immediately after the phase break. The component after the phase break then reads these files to begin the next phase.
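The hand-off at a phase break can be pictured with a small plain-Python analogy. This is only a sketch of the idea, not Abinitio code: the function names, the `phase0.spill` file name and the doubling step are all hypothetical stand-ins.

```python
import json
import os
import tempfile


def run_phase_0(records, spill_path):
    """Phase 0: process every record and land the output in a temporary file."""
    with open(spill_path, "w", encoding="utf-8") as spill:
        for rec in records:
            rec = dict(rec, amount=rec["amount"] * 2)  # stand-in for real work
            spill.write(json.dumps(rec) + "\n")


def run_phase_1(spill_path):
    """Phase 1: reads the landed file; it never sees a partial phase 0 result."""
    total = 0
    with open(spill_path, encoding="utf-8") as spill:
        for line in spill:
            total += json.loads(line)["amount"]
    return total


def run_graph(records):
    """Run both phases in order, with the temporary file as the phase break."""
    with tempfile.TemporaryDirectory() as workdir:
        spill_path = os.path.join(workdir, "phase0.spill")  # hypothetical name
        run_phase_0(records, spill_path)  # phase break: all data is on disk now
        return run_phase_1(spill_path)
```

Calling `run_graph([{"amount": 1}, {"amount": 2}])` runs the two phases strictly one after the other. The cost of the break is the extra write and read of the temporary file, which is the speed/control tradeoff described at the top.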
Checkpoint: a point where the Co>Operating System saves all the information it would need to restore the job to its state at that point.
After a failure, the job can be restored to the last completed checkpoint.
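Restart from the last completed checkpoint can be sketched in plain Python as below. This is an analogy under stated assumptions, not how the Co>Operating System is implemented: `run_with_checkpoints`, the JSON state file and its `last_completed` key are hypothetical, and each phase is modelled as a simple callable.

```python
import json
import os


def run_with_checkpoints(phases, state_path):
    """Run phases in order, resuming after the last completed checkpoint.

    Returns the indexes of the phases that were actually executed in this run.
    """
    done = -1
    if os.path.exists(state_path):
        # A previous run failed part-way: pick up its last completed checkpoint.
        with open(state_path, encoding="utf-8") as f:
            done = json.load(f)["last_completed"]

    executed = []
    for index, phase in enumerate(phases):
        if index <= done:
            continue  # already covered by a checkpoint, do not redo the work
        phase()  # if this raises, the state file still names the last good phase
        executed.append(index)
        with open(state_path, "w", encoding="utf-8") as f:
            json.dump({"last_completed": index}, f)

    os.remove(state_path)  # clean finish: nothing left to recover
    return executed
```

If the second of two phases fails, the next call with the same state file skips the first phase and runs only the second. Writing the state after every phase is the extra work that buys this safety.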
Use Case 1:
Phase 0 - Lookup preparation is done, or data is pulled from a non-native server such as AWS S3, HDFS, SFTP or a web server.
Phase 1 - Main processing happens, involving memory-intensive components such as SORT, ROLLUP, JOIN etc.
Benefits:
Resource usage: resources are used efficiently because the data pull and the memory-intensive processing run in separate phases.
Recovery: local Abinitio (AI) processing starts only once all the data is available on the AI server, which makes recovery easier if the graph fails in the second phase (Phase 1).
Use Case 2:
Phase 0 - Graph processing completes and the load-ready file is created.
Phase 1 - Only the DB load happens.
Benefits:
Resource usage: heavy, memory-intensive native processing is done in Phase 0 without holding a connection to a non-native server, which in this case is a database server.
Recovery: if the DB connection fails for some reason, the data is already in the load-ready file, so recovery does not have to repeat the entire data preparation.
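The split in this use case can be illustrated with a short Python sketch. It is an analogy only: `sqlite3` stands in for the real database server, and the function names, the `sales` table, the CSV layout and the 10% uplift are all hypothetical.

```python
import csv
import os
import sqlite3


def prepare_load_ready(rows, path):
    """Phase 0: heavy transformation; no database connection is opened here."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for cust_id, amount in rows:
            writer.writerow([cust_id, round(amount * 1.1, 2)])  # stand-in logic


def load_to_db(path, conn):
    """Phase 1: only the database load, driven entirely by the load-ready file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = [(int(cust_id), float(amount)) for cust_id, amount in csv.reader(f)]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales (cust_id INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return len(rows)
```

If `load_to_db` fails because the connection drops, only that function needs to be rerun against the same file; `prepare_load_ready` is not repeated.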
Use Case 3:
Phase 0 - The usual data preparation phase, such as lookup, ICFF etc.
Phase 1 (before the checkpoint) - Preparation of the data to be replicated. It is wise to place the checkpointed phase before the REPLICATE component, not after it.
Phase 2 (after the checkpoint) - Data is replicated into 2 flows that are used differently, which makes it easy to fall back if either branch fails.
Benefits:
Easy recovery.
Use Case 4:
It is wise to place a phase without a checkpoint so that data processing stays consistent after the REFORMAT.
If the upper flow fails, the fallback point is the checkpointed phase, which is Phase 1,
so the graph is reprocessed from the REFORMAT component.