AWS Redshift - Creating a Working Data Warehouse in AWS Redshift (7 Steps)
AWS Redshift cluster - Star Schema Benchmark
AWS Redshift has multiple advantages for OLAP analytics, so we have decided to create a small data warehouse replica in this tutorial.
The complete video with the explanation is given here.
The DWH architecture we are going to create is as follows; it can be understood as a star schema data warehouse with a single fact table.
Following are the steps anyone can use to create the star schema based data warehouse in AWS Redshift. We are going to utilize multiple AWS services to prepare this mini data warehouse.
Step 1. IAM role - Create a role for the AWS Redshift service with an S3 access policy attached. The role referenced in the COPY commands below is named s3redshiftreadonly, so read-only access to S3 is sufficient.
Step 2. S3 bucket - Create an S3 bucket and copy the files from the zip location below to your S3 location. Click here: S3 Data to Copy From. You will find data for the customer, part, and dwdate tables. Unzip the files and copy the data from this location to your S3 bucket, which must reside in the same AWS region where you will create the AWS Redshift cluster.
We will copy the supplier and lineorder table data from other AWS S3 buckets which were created by AWS for learning purposes (these buckets reside in other AWS regions).
Step 3. Redshift cluster - Create an AWS Redshift cluster, make it publicly accessible, and make sure the security group linked to it has an inbound rule which allows traffic from the 0.0.0.0/0 IP range. See the video above.
Step 4. SQL Workbench/J (use SQL Workbench/J, not MySQL Workbench) - We need a SQL client to connect, create the tables, and copy data into them. We could use the query editor supplied by the AWS Redshift console as well, but to automate we would need to connect to AWS Redshift via some API (Java or Python), so we need JDBC or ODBC connectivity anyway. For connectivity we will need the Redshift JDBC driver, the JDBC endpoint of the AWS Redshift cluster (of the form jdbc:redshift://<cluster-endpoint>:5439/<database>), and a username and password. See the YouTube video for the visual explanation.
The table creation scripts are given below. Run them in SQL Workbench/J:
CREATE TABLE part
(
  p_partkey     INTEGER NOT NULL,
  p_name        VARCHAR(22) NOT NULL,
  p_mfgr        VARCHAR(6),
  p_category    VARCHAR(7) NOT NULL,
  p_brand1      VARCHAR(9) NOT NULL,
  p_color       VARCHAR(11) NOT NULL,
  p_type        VARCHAR(25) NOT NULL,
  p_size        INTEGER NOT NULL,
  p_container   VARCHAR(10) NOT NULL
);

CREATE TABLE supplier
(
  s_suppkey     INTEGER NOT NULL,
  s_name        VARCHAR(25) NOT NULL,
  s_address     VARCHAR(25) NOT NULL,
  s_city        VARCHAR(10) NOT NULL,
  s_nation      VARCHAR(15) NOT NULL,
  s_region      VARCHAR(12) NOT NULL,
  s_phone       VARCHAR(15) NOT NULL
);

CREATE TABLE customer
(
  c_custkey     INTEGER NOT NULL,
  c_name        VARCHAR(25) NOT NULL,
  c_address     VARCHAR(25) NOT NULL,
  c_city        VARCHAR(10) NOT NULL,
  c_nation      VARCHAR(15) NOT NULL,
  c_region      VARCHAR(12) NOT NULL,
  c_phone       VARCHAR(15) NOT NULL,
  c_mktsegment  VARCHAR(10) NOT NULL
);

CREATE TABLE dwdate
(
  d_datekey          INTEGER NOT NULL,
  d_date             VARCHAR(19) NOT NULL,
  d_dayofweek        VARCHAR(10) NOT NULL,
  d_month            VARCHAR(10) NOT NULL,
  d_year             INTEGER NOT NULL,
  d_yearmonthnum     INTEGER NOT NULL,
  d_yearmonth        VARCHAR(8) NOT NULL,
  d_daynuminweek     INTEGER NOT NULL,
  d_daynuminmonth    INTEGER NOT NULL,
  d_daynuminyear     INTEGER NOT NULL,
  d_monthnuminyear   INTEGER NOT NULL,
  d_weeknuminyear    INTEGER NOT NULL,
  d_sellingseason    VARCHAR(13) NOT NULL,
  d_lastdayinweekfl  VARCHAR(1) NOT NULL,
  d_lastdayinmonthfl VARCHAR(1) NOT NULL,
  d_holidayfl        VARCHAR(1) NOT NULL,
  d_weekdayfl        VARCHAR(1) NOT NULL
);

CREATE TABLE lineorder
(
  lo_orderkey        INTEGER NOT NULL,
  lo_linenumber      INTEGER NOT NULL,
  lo_custkey         INTEGER NOT NULL,
  lo_partkey         INTEGER NOT NULL,
  lo_suppkey         INTEGER NOT NULL,
  lo_orderdate       INTEGER NOT NULL,
  lo_orderpriority   VARCHAR(15) NOT NULL,
  lo_shippriority    VARCHAR(1) NOT NULL,
  lo_quantity        INTEGER NOT NULL,
  lo_extendedprice   INTEGER NOT NULL,
  lo_ordertotalprice INTEGER NOT NULL,
  lo_discount        INTEGER NOT NULL,
  lo_revenue         INTEGER NOT NULL,
  lo_supplycost      INTEGER NOT NULL,
  lo_tax             INTEGER NOT NULL,
  lo_commitdate      INTEGER NOT NULL,
  lo_shipmode        VARCHAR(10) NOT NULL
);
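Once the statements have run, you can quickly confirm that the five tables exist by querying the catalog. A minimal check (note that pg_table_def only lists tables in schemas on your search_path, typically public):

-- List the tables created in the public schema
select distinct tablename
from pg_table_def
where schemaname = 'public';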
Step 5. Load the data - Run the COPY commands one by one in SQL Workbench/J as below. First, copy the data from your own S3 bucket to the Redshift tables:
1. Copy part table
copy part from 's3://myredshiftbucket111/load/part-csv.tbl'
credentials 'aws_iam_role=arn:aws:iam::598245567716:role/s3redshiftreadonly'
csv;
2. Copy dwdate table
copy dwdate from 's3://myredshiftbucket111/load/dwdate-tab.tbl'
credentials 'aws_iam_role=arn:aws:iam::598245567716:role/s3redshiftreadonly'
delimiter '\t'
dateformat 'auto';
3. Copy customer table
copy customer from 's3://myredshiftbucket111/load/customer-fw.tbl'
credentials 'aws_iam_role=arn:aws:iam::598245567716:role/s3redshiftreadonly'
fixedwidth 'c_custkey:10, c_name:25, c_address:25, c_city:10, c_nation:15, c_region:12, c_phone:15, c_mktsegment:10'
ACCEPTINVCHARS
maxerror 10;
Now copy the two tables whose data is placed in AWS sample S3 buckets in other AWS regions:
4. Copy supplier table
copy supplier from 's3://awssampledbuswest2/ssbgz/supplier.tbl'
credentials 'aws_iam_role=arn:aws:iam::598245567716:role/s3redshiftreadonly'
delimiter '|'
gzip
region 'us-west-2';
5. Copy lineorder table
copy lineorder from 's3://awssampledb/load/lo/lineorder-multi.tbl'
credentials 'aws_iam_role=arn:aws:iam::598245567716:role/s3redshiftreadonly'
gzip
compupdate off
region 'us-east-1';
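With all five COPY commands run, it is worth sanity-checking that each table actually received rows. A quick check:

-- Row counts per table; a zero here means the corresponding COPY failed
select count(*) from part;
select count(*) from supplier;
select count(*) from customer;
select count(*) from dwdate;
select count(*) from lineorder;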
After running the counts above and confirming the loads, you can run different SQL analytics queries against the star schema; one example is sketched below.
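Here is a minimal analytics query in the spirit of the Star Schema Benchmark, joining the lineorder fact table to the dwdate dimension (the year, discount, and quantity filters are illustrative values, not from the original tutorial):

-- Revenue from discounted, low-quantity orders in one year
select sum(lo_extendedprice * lo_discount) as revenue
from lineorder
join dwdate on lo_orderdate = d_datekey
where d_year = 1997
  and lo_discount between 1 and 3
  and lo_quantity < 24;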
You will find the STL_LOAD_ERRORS system table useful for seeing errors raised while copying data into Redshift tables, as in the query below.
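A sketch of how to inspect the most recent load errors (the selected columns are a common subset of STL_LOAD_ERRORS):

-- Show the latest load errors: which file, line, and column failed, and why
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;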
Step 6. Vacuum and analyze the database - Run the commands below to reclaim space, re-sort the rows, and gather the table statistics in the AWS Redshift data warehouse tables.
vacuum;
analyze;
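If vacuuming the whole database takes too long, both commands also accept a single table name; for example, for the largest table here:

-- Re-sort rows and reclaim space in lineorder, then refresh its statistics
vacuum lineorder;
analyze lineorder;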
Step 7. Clean up your resources - If you are using the Free Tier, delete all the AWS resources created above, else you may incur unwanted charges. Delete the resources created in all four of the AWS services used above.
Please visit my YouTube channel, datapunditeducation, for other AWS, Ab Initio, and Data Engineering tutorials.
Thanks,
datapunditeducation