Abinitio Phases and Checkpoints

January 21, 2023

Abinitio Phases and Checkpoints | Why Phases | Abinitio Checkpoints | Recovery and Resource Control

The phase and checkpoint mechanism in Abinitio involves the following tradeoffs:

Speed vs. control: more control makes the graph slower.

Speed vs. safety: more safety (from a recovery point of view) makes the graph slower.

The primary objective of creating phases in a graph is to make efficient use of available resources such as CPU, main memory and disk space.

Phase break: the boundary between two phases is called a phase break.

The first phase must complete before the second phase can start.

During the first phase, the component immediately before the phase break writes all the data passing through it to temporary files under the layout of the component immediately after the phase break. Once the first phase completes, the component after the phase break reads these files to begin the next phase.
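To make the phase-break mechanics concrete, here is a minimal Python sketch. It is not Abinitio code: the functions `phase0_upstream`, `phase1_downstream` and `run_graph` are hypothetical, and a single temporary file stands in for the downstream component's layout. It only shows the ordering: everything crossing the break lands on disk, and the downstream step reads it back after the upstream phase has finished.

```python
import json
import os
import tempfile


def phase0_upstream(records, temp_dir):
    """Upstream of the phase break (hypothetical): every record crossing
    the break is written to a temporary file in the downstream 'layout'."""
    path = os.path.join(temp_dir, "phase_break_0_1.dat")
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def phase1_downstream(path):
    """Downstream of the phase break (hypothetical): runs only after phase 0
    has finished, reading the spilled records back into a SORT-like step."""
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return sorted(rows, key=lambda r: r["id"])


def run_graph(records):
    with tempfile.TemporaryDirectory() as temp_dir:
        spilled = phase0_upstream(records, temp_dir)  # phase 0 completes first
        return phase1_downstream(spilled)             # then phase 1 reads its output


print(run_graph([{"id": 3}, {"id": 1}, {"id": 2}]))
```

Because phase 1 starts only when phase 0 is done, the two phases never hold CPU and memory at the same time; the price is the extra disk I/O for the temporary files.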

Checkpoint: a point at which the Co>Operating System saves all the information it needs to restore the job to that state. If the job fails, it can be restored to the point of the last completed checkpoint.
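The restore-from-last-checkpoint behaviour can be modelled in a few lines of Python. This is a conceptual sketch only, not how the Co>Operating System actually stores its recovery information: the JSON checkpoint file, `run_with_checkpoints` and the three toy phases are all hypothetical. A simulated failure in phase 1 is followed by a rerun that skips phase 0.

```python
import json
import os
import tempfile


def run_with_checkpoints(phases, checkpoint_path):
    """Run phases in order, recording the last completed phase in a
    checkpoint file. On a rerun, phases up to that checkpoint are skipped."""
    last_done = -1
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            last_done = json.load(f)["last_completed_phase"]
    executed = []
    for i, phase in enumerate(phases):
        if i <= last_done:
            continue  # restored from the checkpoint, not re-run
        phase()  # if this raises, the checkpoint still names the previous phase
        executed.append(i)
        with open(checkpoint_path, "w") as f:
            json.dump({"last_completed_phase": i}, f)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # job succeeded: next run starts from scratch
    return executed


calls = []


def extract():
    calls.append("extract")


def load():
    calls.append("load")
    if calls.count("load") == 1:
        raise RuntimeError("simulated failure in phase 1")


def report():
    calls.append("report")


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "graph.ckpt")
    try:
        run_with_checkpoints([extract, load, report], ckpt)
    except RuntimeError:
        pass  # job failed; the checkpoint says phase 0 is complete
    rerun = run_with_checkpoints([extract, load, report], ckpt)

print(rerun, calls)
```

On the rerun `extract` is not called again: the cost of the failure is limited to the work after the last completed checkpoint.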

Use Case 1

Phase 0: lookup preparation is done, or data is pulled from a non-native source such as AWS S3, HDFS, an SFTP server or any web server.

Phase 1: main processing happens, where memory-intensive components such as SORT, ROLLUP and JOIN take part.

Benefits:

Resource usage: pulling data from the remote source and the memory-intensive processing do not compete for CPU and memory at the same time, so resources are used more efficiently.

Recovery: local Abinitio processing starts only once all the data is available on the Abinitio server. If the graph fails in the second phase (phase 1), recovery is easier because the data does not have to be pulled from the remote source again.

Use Case 2

Phase 0: graph processing completes and the load-ready file is created.

Phase 1: only the database load happens.

Benefits:

Resource usage:

Heavy, memory-intensive native processing is done in phase 0 without holding a connection to a non-native server (in this case, the database server).

Recovery:

If the database connection fails for some reason, the load-ready data already exists, so during recovery we do not have to repeat the entire data preparation.
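A sketch of Use Case 2 under the same caveats: `prepare_load_ready`, `run_load_graph` and the `FlakyDB` class (a stand-in for a database whose first connection attempt fails) are hypothetical, and the load-ready file is plain text. The preparation step writes to a `.partial` name and renames it with `os.replace`; this is an assumption added here so that a crash mid-write never leaves a file that looks load-ready.

```python
import os
import tempfile


def prepare_load_ready(path, source_rows):
    """Phase 0 (hypothetical): heavy processing with no DB connection.
    Writes to a temporary name first, then renames it into place."""
    partial = path + ".partial"
    with open(partial, "w") as f:
        for row in source_rows:
            f.write(row.upper() + "\n")  # stand-in for the real transformation
    os.replace(partial, path)  # the file only appears once it is complete


class FlakyDB:
    """Stand-in for a database whose first connection attempt fails."""

    def __init__(self):
        self.rows = []
        self.attempts = 0

    def load(self, rows):
        self.attempts += 1
        if self.attempts == 1:
            raise ConnectionError("simulated DB connection failure")
        self.rows = list(rows)  # full replace keeps a reload idempotent


def run_load_graph(path, source_rows, db):
    """Returns True if phase 0 had to run, False if it was skipped."""
    prepared = False
    if not os.path.exists(path):  # recovery: reuse an existing load-ready file
        prepare_load_ready(path, source_rows)
        prepared = True
    with open(path) as f:  # phase 1: only the DB load
        db.load(line.rstrip("\n") for line in f)
    return prepared


with tempfile.TemporaryDirectory() as d:
    load_ready = os.path.join(d, "customers.load_ready")
    db = FlakyDB()
    try:
        run_load_graph(load_ready, ["a", "b"], db)
    except ConnectionError:
        pass  # the load failed, but the load-ready file is already on disk
    reprepared = run_load_graph(load_ready, ["a", "b"], db)

print(reprepared, db.rows)
```

The full replace in `FlakyDB.load` is what makes a second load attempt safe in this toy; a real database load would need its own strategy, such as truncate-and-reload or an upsert.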

Use Case 3

Phase 0: the usual data preparation phase, such as building lookups, ICFF, etc.

Phase 1 (before the checkpoint): the data to be replicated is prepared. It is wise to place the checkpointed phase break before the REPLICATE component, not after it.

Phase 2 (after the checkpoint): the data is replicated into two flows that are used differently, which makes it easy to fall back in case of a failure in either branch.

Benefits:

Easy recovery: a failure in either branch restarts from the checkpoint, without redoing phases 0 and 1.

Use Case 4

It is wise to place a phase break without a checkpoint after the REFORMAT to keep the data processing consistent.

If the upper flow fails, the fallback point is the last checkpointed phase, which is phase 1.

This means the graph is reprocessed from the REFORMAT component onward.
