Posts

Showing posts from September, 2022

Populate Streaming Data into AWS Redshift from a Kinesis Data Stream

Reading real-time data analytics using AWS Redshift. See my YouTube video for a detailed explanation.

1. Amazon Kinesis Data Stream
2. Creating an IAM role for Amazon Redshift
3. Launch Redshift and associate the cluster with the IAM role created in Step 2
4. Create an external schema in Redshift
5. Create a materialized view
6. Schedule the materialized view

My YouTube channel: DataPundit

Project 1 - Putting randomly generated payloads onto a Kinesis stream and into the AWS Redshift cluster for analytics.

-- External Python program to produce streaming data (access keys redacted; never publish real AWS credentials)
import boto3
import random

client = boto3.client('kinesis',
                      aws_access_key_id='<REDACTED>',
                      aws_secret_access_key='<REDACTED>',
                      region_name='ap-south-1')

for x in range(1, 6):
    v = x * random.randint(1, 4)
    t = x * random.randint(1, 3)
    p = x * random.randint(
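The producer excerpt above is cut off, so here is a minimal, hedged sketch of the same idea: build randomly generated payloads and send them to a Kinesis stream with `put_record`. The payload field names and the stream name are my assumptions, not taken from the original post.

```python
import json
import random

def build_record(x):
    """Build one randomly generated payload (field names are illustrative)."""
    return {
        "id": x,
        "value": x * random.randint(1, 4),          # mirrors v in the post
        "temperature": x * random.randint(1, 3),    # mirrors t in the post
        "pressure": x * random.randint(1, 5),       # mirrors the truncated p
    }

def put_records(client, stream_name):
    """Send five JSON records to the given Kinesis stream."""
    for x in range(1, 6):
        payload = build_record(x)
        client.put_record(
            StreamName=stream_name,
            Data=json.dumps(payload),
            PartitionKey=str(payload["id"]),
        )

# Usage (requires valid AWS credentials; the stream name is an assumption):
# import boto3
# client = boto3.client("kinesis", region_name="ap-south-1")
# put_records(client, "my-data-stream")
```

Prefer an IAM role or the default credential chain over hard-coded keys, which is why the example leaves credential setup to `boto3`'s standard resolution.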

AWS Redshift - Creating working Data Warehouse in AWS Redshift ( 4 Major Steps)

AWS Redshift cluster - Star Schema Benchmark

As we know, AWS Redshift has multiple advantages for OLAP analytics, so we have decided to create a small data warehouse replica in this tutorial. The complete video with an explanation is given here.

The DWH architecture we are going to create can be understood as a star schema with a single fact table. Following are the steps anyone can follow to create the star-schema-based data warehouse in AWS Redshift; we are going to use multiple AWS services to prepare this mini data warehouse.

Step 1. IAM role: create a role for the AWS Redshift service that has S3 full-access policies attached.

Step 2. S3 bucket: create an S3 bucket and copy the files from the zip location below to your S3 location. Click here: S3 Data to Copy From. You will find data for the customer, part, and dwdate tables. Unzip and copy the data from this location to your S3 bucket
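Once the IAM role and S3 bucket exist, the tables are typically loaded with Redshift's COPY command. The following Python sketch only builds the COPY statements; the bucket name, prefix, role ARN, and pipe delimiter (standard for Star Schema Benchmark files) are assumptions for illustration.

```python
def build_copy_sql(table, bucket, prefix, iam_role_arn, region="ap-south-1"):
    """Build a Redshift COPY statement to load one table from S3 via an IAM role."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}/{table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"DELIMITER '|' REGION '{region}';"
    )

# One COPY per dimension table mentioned in the post (names/ARN are placeholders):
for t in ["customer", "part", "dwdate"]:
    print(build_copy_sql(t, "my-ssb-bucket", "ssb-data",
                         "arn:aws:iam::123456789012:role/myRedshiftRole"))
```

Each generated statement would then be run in the Redshift query editor (or via a client library) against the cluster associated with the role from Step 1.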

Machine Learning - Reading the JSON data using Python in 3 ways

Dear Learners,

This blog gives a glimpse of how a simple Python script can be used to extract data from a web API (JSON) into a tabular (pandas) format. As a learner, I myself experienced many hurdles finding ways to read JSON effectively, so I am blogging here to share my approaches to the problem. For further references, please visit my Git link given below; I have also shared my LinkedIn profile and YouTube channel.

Here are the 3 different ways to do it:

Approach 1:

# Importing Python libraries
import requests
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

# Example JSON URL
url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
resp = requests.get(url)
resp

# Read the JSON
data = resp.json()

# Declare a dict variable
data_flat = {}

# Define the custom function flatten_dict, which is responsible for flattening the JSON:
def flatten_dict(obj, name=''):
    for k, v in obj.items():
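The `flatten_dict` function in the excerpt is cut off, so here is one possible self-contained reconstruction of the idea: recursively flatten nested dicts into a single level, joining nested keys with underscores. The key-joining convention is my assumption, not necessarily the original author's.

```python
def flatten_dict(obj, name=""):
    """Recursively flatten a nested dict into one level,
    joining nested keys with underscores (a sketch of the truncated function)."""
    flat = {}
    for k, v in obj.items():
        key = f"{name}_{k}" if name else k
        if isinstance(v, dict):
            flat.update(flatten_dict(v, key))  # recurse into nested dicts
        else:
            flat[key] = v
    return flat

# A GitHub-issue-like nested record:
record = {"user": {"login": "octocat", "id": 1}, "title": "Bug report"}
print(flatten_dict(record))
# {'user_login': 'octocat', 'user_id': 1, 'title': 'Bug report'}
```

A list of such flattened dicts can then be passed straight to `pd.DataFrame(...)` to get the tabular form the post is after.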

Using awk for data manipulation - Unix & Linux Stack Exchange

Important AWK commands:

This blog is useful for beginner and intermediate data engineers who want to advance in the field of data engineering and data administration.

To watch the YouTube video, please click the link below: AWK - A Few More Cases, Day-to-Day Usage

Following are 5 examples which may be used in daily use cases to solve real-time problems.

1. Many times we deal with configuration files such as .config and .yaml files (the latter mostly used in Java projects). Data is stored as key-value pairs in these files; the key and value are separated by a separator such as a colon (:) or an equals (=) sign. For example, weather_detail.config is shown below:

Region:APAC
Country:India

We need to retrieve the value for the key "Country".

Solution A.
awk -F : '{ if ($1 == "Country") print $2 }' weather_detail.config

Solution B.
grep 'Country' weather_detail.config | awk -F : '{print $2}'

2. In si
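For readers following the rest of this blog in Python, the same key-value lookup can be sketched there too. The function name and the in-memory config string are illustrative, not part of the original post.

```python
def lookup_config(text, key, sep=":"):
    """Return the value for `key` in simple key<sep>value config text,
    mimicking the awk one-liner above; None if the key is absent."""
    for line in text.splitlines():
        if sep in line:
            k, _, v = line.partition(sep)
            if k.strip() == key:
                return v.strip()
    return None

config = """Region:APAC
Country:India"""
print(lookup_config(config, "Country"))  # India
```

Like Solution A, this matches the key exactly rather than substring-matching the whole line, so a key such as "Country" will not accidentally match "CountryCode".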

Reformat - Parameter used in abinitio reformat

Reformat - Parameter Description

Common uses of the Reformat component:
1. Transforming the data
2. Dropping fields
3. Datatype conversion

Parameters:

1. count - 1, 2, 3 ...
2. select - expression to filter the data based on a condition
3. transform0, transform1, transform2 ...
4. output_index

out:output_index(in)=
begin
  out.x :: if (in.region == 'Europe') 0
  out.x :: if (in.region == 'APAC') 1
  out.x :: if (in.region == 'North America') 2
end

5. output_indexes

out:output_indexes(in)=
begin
  out.x :: if (in.region == 'Europe') [vector 0]
  out.x :: if (in.region == 'APAC') [vector 1, 2]
  out.x :: if (in.region == 'North America') [vector 2]
end

6. logging - true/false, LOGX
7. reject-threshold - abort on first reject, never abort, limit/ramp

The YouTube video can be seen below: Reformat - Parameter Description

Authored by datapuditeducation@gmail.com
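The routing behavior of output_index and output_indexes can be sketched outside Ab Initio as well. This Python analogue (function names and port numbers are illustrative, matching the transforms above) shows the difference: output_index sends each record to exactly one output port, while output_indexes can send it to several.

```python
def output_index(record):
    """Route a record to a single output port by region (mirrors the transform above)."""
    ports = {"Europe": 0, "APAC": 1, "North America": 2}
    return ports.get(record["region"])

def output_indexes(record):
    """Route a record to one or more output ports (mirrors the vector transform above)."""
    ports = {"Europe": [0], "APAC": [1, 2], "North America": [2]}
    return ports.get(record["region"], [])

records = [{"region": "APAC"}, {"region": "Europe"}, {"region": "North America"}]
print([output_index(r) for r in records])    # [1, 0, 2]
print([output_indexes(r) for r in records])  # [[1, 2], [0], [2]]
```

In the component itself, each port number corresponds to one of the numbered transforms (transform0, transform1, ...) set by the count parameter.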