Posts

Metaprogramming in Abinitio

1. How to create an output DML dynamically based on the input DML
2. How to create an intermediate transform (XFR) dynamically based on the input DML

Please see my YouTube video for an explanation: PART 1, PART 2

For example, let us say we have an input DML as below:

    cat input.dml
    record
      string(",") name;
      decimal(",") age;
      decimal(",") salary;
      string(",") aadharid;
      date("yyyy-mm-dd")(",") joiningdate;
      string(",") mobileno=NULL;
      string(",") dlno=NULL;
      string(",") passportnumber=NULL;
    end;

(1.1) Prepare a new DML based on the input DML, i.e. add fields, remove fields, or modify types.

Graph-based parameters:
=====================================================================
    input_dml=$DIR/input.dml  [interpretation = constant]
    input_dml_info =$[read_type(input_dml)]   [interpretation = pdl]

    /* Define functions in AB_DML_DEFS */
    AB_DML_DEFS= $[ out:: dynamic_dml_gen(input_dml_info)= beg...

Populate Streaming data into AWS Redshift Kinesis Data Stream

Reading Real-Time Data for Analytics using AWS Redshift

See my YouTube video for a detailed explanation.

1. Amazon Kinesis Data Stream
2. Create an IAM role for Amazon Redshift
3. Launch Redshift and associate the Redshift cluster with the IAM role created in step 2
4. Create an external schema in Redshift
5. Create a materialized view
6. Schedule the materialized view

My YouTube channel: DataPundit

Project 1 - Putting a randomly generated payload onto a Kinesis stream and into the AWS Redshift cluster for analytics

--External Python program to produce streaming data
    import boto3
    import random

    client = boto3.client('kinesis',aws_access_key...
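The excerpt above is cut off before the payload is built, so here is a minimal sketch of what a randomly generated record for a Kinesis stream might look like. The field names `sensor_id` and `reading` and the helper `make_record` are hypothetical and are not taken from the original program; no AWS call is made in this block.

```python
import json
import random


def make_record():
    """Build one random record shaped for a Kinesis put_record call.

    The payload fields (sensor_id, reading) are hypothetical placeholders.
    """
    payload = {
        "sensor_id": random.randint(1, 10),
        "reading": round(random.uniform(0, 100), 2),
    }
    return {
        # Kinesis expects the data blob as bytes.
        "Data": json.dumps(payload).encode("utf-8"),
        # Records with the same partition key go to the same shard.
        "PartitionKey": str(payload["sensor_id"]),
    }


record = make_record()
print(record)
```

With a configured boto3 Kinesis client, a record like this could be sent with `client.put_record(StreamName=..., **record)`, where the stream name is whatever stream you created in step 1.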

AWS Redshift - Creating working Data Warehouse in AWS Redshift ( 4 Major Steps)

AWS Redshift cluster - Star Schema Benchmark

AWS Redshift has several advantages for OLAP analytics, so in this tutorial we create a small data warehouse replica.

The complete video with explanation is given here.

The DWH architecture we are going to create is as follows; it can be understood as a star-schema-based data warehouse with a single fact table.

The following steps show how anyone can create the star-schema-based data warehouse in AWS Redshift. We use multiple AWS services to prepare this mini data warehouse.

Step 1. IAM role: create a role for the AWS Redshift service that has the S3 full-access policies.

Step 2. S3 bucket: create an S3 bucket and copy the file from the zip location below to your S3 location. ...

Machine Learning - Reading the JSON data using Python in 3 ways

Dear Learners,

This blog gives a glimpse of how simple Python code can be used to extract data from a Web API (JSON) into a tabular (pandas) format. As a learner, I ran into many hurdles finding ways to read JSON effectively, so I am blogging here to share my approaches to the problem. For further reference, please visit my Git link given below; I have also shared my LinkedIn profile and YouTube channel.

Here are the 3 different ways to do it:

Approach 1:

    #Importing Python Libraries
    import requests
    import pandas as pd
    #json_normalize is a top-level pandas function since pandas 1.0
    from pandas import json_normalize
    #used this example URL for example json
    url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
    resp = requests.get(url)
    resp
    #Read the JSON
    data = resp.json()
    #Declare a Dict Variable
    data_flat = {}
    #Define the custom function named flatten_dict, which is responsible for flattening the JSON:
    def flatten_dict(obj,name=''):
        for k,v in obj.items():   ...
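Since the excerpt stops inside `flatten_dict`, the following is a minimal, self-contained sketch of one way to flatten nested JSON recursively. It is not the author's original function: the `parent_key` and `sep` parameters, the underscore-joined key names, and the sample record are assumptions made for illustration, and it returns a new dict rather than filling a global one.

```python
def flatten_dict(obj, parent_key="", sep="_"):
    """Return a flat dict whose keys join the nested path with `sep`."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects.
            flat.update(flatten_dict(value, full_key, sep))
        elif isinstance(value, list):
            # Index list items so each one gets its own column.
            for i, item in enumerate(value):
                item_key = f"{full_key}{sep}{i}"
                if isinstance(item, dict):
                    flat.update(flatten_dict(item, item_key, sep))
                else:
                    flat[item_key] = item
        else:
            flat[full_key] = value
    return flat


# Hypothetical record, loosely shaped like one item of an issues API response.
sample_issue = {
    "id": 1,
    "user": {"login": "octocat", "id": 7},
    "labels": [{"name": "bug"}, {"name": "docs"}],
}
flat_issue = flatten_dict(sample_issue)
print(flat_issue)
```

A list of such flat dicts can be passed directly to `pd.DataFrame(...)` to get one row per record.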

Using awk for data manipulation - Unix & Linux Stack Exchange

Important AWK commands:

This blog is useful for beginner and intermediate data engineers who want to advance in the field of data engineering and data administration.

To watch the YouTube videos, please click the links below: AWK A few More Cases Day 2 Day usage

The following are 5 examples that may be used in daily work to solve real-world problems.

1. Many times we deal with configuration files such as .config and .YAML files (mostly used in Java projects). Data is stored as key-value pairs in these configuration files; the key and value are separated by a separator such as a colon (:) or an equals sign (=). For example, weather_detail.config is shown below. We need to retrieve the value for the key "Country".

weather_detail.config
    Region:APAC
    Country:India

Solution A.
    awk -F : '{ if ($1 == "Country") print $2 }' weather_detail.config

Solution B.
    cat weather_detail.config | grep 'Country' | awk -F : '{print $2}'

2.  In si...
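For readers who want the same key lookup from example 1 outside the shell, here is a minimal Python sketch. The helper name `read_config_value` is hypothetical, and the sample lines simply mirror the weather_detail.config contents shown above. Unlike `print $2` in awk, this sketch returns everything after the first separator, so values that themselves contain a colon are kept whole.

```python
def read_config_value(lines, key, sep=":"):
    """Return the value for `key` in 'key<sep>value' lines, or None if absent."""
    for line in lines:
        name, found, value = line.strip().partition(sep)
        # `found` is empty when the line has no separator at all.
        if found and name.strip() == key:
            return value.strip()
    return None


sample_config = ["Region:APAC", "Country:India"]
country = read_config_value(sample_config, "Country")
print(country)
```

To read from the real file, pass an open file object instead of the list, for example `read_config_value(open("weather_detail.config"), "Country")`.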

Reformat - Parameter used in abinitio reformat

Reformat - Parameter Description

Common uses of the Reformat component:
1. Transforming the data
2. Dropping fields
3. Datatype conversion

Parameters:
1. count - 1, 2, 3
2. select - expression to filter the data based on a condition
3. transform 0, 1, 2, ...
4. output_index

    out::output_index(in)=
    begin
      out.x::if(in.region=='Europe') 0
      out.x::if(in.region=='APAC') 1
      out.x::if(in.region=='North America') 2
    end

5. output_indexes

    out::output_indexes(in)=
    begin
      out.x::if(in.region=='Europe') [vector 0]
      out.x::if(in.region=='APAC') [vector 1,2]
      out.x::if(in.region=='North America') [vector 2]
    end

6. logging - true/false, LOGX
7. reject threshold - abort on first reject, never abort, limit/ramp

YouTube videos can be seen below: Reformat - Parameter Description

Authored by datapuditeducation@gmail.com
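To make the routing idea behind output_index concrete, here is a small Python sketch that mimics it outside of Abinitio. It is only an analogy, not Abinitio code: the names `REGION_TO_PORT`, `output_index` and `route` are hypothetical, the region-to-port mapping is copied from the output_index example above, and skipping records with an unmapped region is an assumption of this sketch rather than documented component behavior.

```python
# Region-to-port mapping copied from the output_index example.
REGION_TO_PORT = {"Europe": 0, "APAC": 1, "North America": 2}


def output_index(record):
    """Return the output port number for a record, or None if no rule matches."""
    return REGION_TO_PORT.get(record["region"])


def route(records, count=3):
    """Distribute records across `count` output ports, like count=3 in Reformat."""
    ports = [[] for _ in range(count)]
    for record in records:
        idx = output_index(record)
        if idx is not None:  # assumption: unmatched records are skipped
            ports[idx].append(record)
    return ports


rows = [
    {"name": "a", "region": "Europe"},
    {"name": "b", "region": "APAC"},
    {"name": "c", "region": "North America"},
    {"name": "d", "region": "APAC"},
]
ports = route(rows)
print([len(p) for p in ports])
```

output_indexes works the same way except that the function returns a list of ports, so one record can be written to several outputs at once.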