Building a Data Warehouse with AWS Redshift, Matillion and Yellowfin

Post on 18-Jan-2017

Building a Data Warehouse on AWS

Amazon S3

Amazon Redshift

Collect Process Analyze Store

Data Answers

Visualize

@Lynn Langit

Building a Data Warehouse on AWS

Move data into Redshift from S3 for analysis

Amazon S3

Amazon Redshift

AWS Marketplace Partners

Matillion

Visualize

Yellowfin

Collect Process Analyze Store

Data Answers

Setup

Our Scenario and Source Files

File Types
-- Text - .csv
-- Compressed - .gz

File Categories

Details / Events
-- Flights
-- Weather

Metadata
-- Airports
-- Carriers

“In this scenario we will use Matillion ETL for Redshift to prepare two separate data sources ready for analysis. The sample data is US airport flight information from 1995 -> 2008. Every flight to or from a US airport (and whether it left on time or not) is included.

The second data set is weather data, taken from NOAA, including the daily weather readings for each US Airport.”

Loading Data from S3 into Redshift

Using Matillion ETL for Redshift

• Create an instance (AMI/EC2) of Matillion from AWS Marketplace
• Connect Matillion to Redshift

Loading Data in Redshift
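Loading from S3 into Redshift is done with the COPY command. As a rough sketch of what such a load could look like for the .csv and .gz source files, the Python helper below only assembles the statement text. The table name, bucket path and IAM role ARN are hypothetical placeholders, not values from this deck, and the helper does no escaping, so it is suitable for illustration only.

```python
def build_copy_statement(table, s3_path, iam_role, gzip=False, ignore_header=0):
    """Return the text of a Redshift COPY statement for CSV files under an S3 prefix."""
    options = ["CSV"]                      # source files are comma-separated text
    if gzip:
        options.append("GZIP")             # files are gzip-compressed (.gz)
    if ignore_header:
        options.append(f"IGNOREHEADER {ignore_header}")  # skip header rows
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        + " ".join(options) + ";"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="flights",
    s3_path="s3://example-bucket/flights/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    gzip=True,
    ignore_header=1,
)
print(statement)
```

The printed statement could then be run from any SQL client connected to the cluster. An ETL tool such as Matillion would normally generate an equivalent load step for you.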

Table distribution styles

Distribution Key: same key to same location
All: all data on every node
Even: round robin distribution

Each diagram: Node 1 (Slice 1, Slice 2), Node 2 (Slice 3, Slice 4); keys key1, key2, key3, key4
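The three distribution styles can be illustrated with a small Python simulation of the two-node, four-slice layout from the diagram. This is a toy model under stated assumptions: the CRC32 hash stands in for whatever hash Redshift uses internally, and the sample rows and the `carrier` column are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import cycle

# Layout from the diagram: two nodes with two slices each.
SLICES = {"Slice 1": "Node 1", "Slice 2": "Node 1",
          "Slice 3": "Node 2", "Slice 4": "Node 2"}
SLICE_NAMES = list(SLICES)


def distribute_key(rows, key):
    """KEY style: hash the distribution key so equal keys share a slice."""
    placement = defaultdict(list)
    for row in rows:
        # CRC32 is an assumption standing in for the real hash function.
        index = zlib.crc32(str(row[key]).encode()) % len(SLICE_NAMES)
        placement[SLICE_NAMES[index]].append(row)
    return dict(placement)


def distribute_even(rows):
    """EVEN style: deal rows out to the slices in round robin order."""
    placement = defaultdict(list)
    for row, name in zip(rows, cycle(SLICE_NAMES)):
        placement[name].append(row)
    return dict(placement)


def distribute_all(rows):
    """ALL style: every node holds a full copy of the table."""
    return {node: list(rows) for node in sorted(set(SLICES.values()))}


# Hypothetical sample rows: 12 flights across 4 carriers.
rows = [{"carrier": c, "flight": i}
        for i, c in enumerate(["AA", "UA", "DL", "WN"] * 3)]

by_key = distribute_key(rows, "carrier")
by_even = distribute_even(rows)
by_all = distribute_all(rows)

for name, placed in sorted(by_even.items()):
    print(name, len(placed))
```

KEY distribution keeps rows that join on the same key together on one slice, ALL suits small lookup tables such as airports or carriers, and EVEN is the fallback when no obvious join key exists.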

Sort Keys

• Single Column - [ SORTKEY ( date ) ]

• Queries that use 1st column (i.e. date) as primary filter

• Compound - [ COMPOUND SORTKEY ( date, region, country ) ]

• Queries that use 1st column as primary filter, then other columns

• Interleaved - [ INTERLEAVED SORTKEY ( date, region, country ) ]

• Queries that use different columns in filter
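Why a compound sort key favors filters on its first column can be sketched with a simplified model of block pruning. Redshift keeps minimum and maximum values per block and skips blocks whose range cannot match a filter. The Python below mimics that with tiny blocks. The block size, the integer stand-ins for dates and regions, and the function names are all assumptions made for illustration.

```python
def build_zone_maps(rows, block_size):
    """Split rows into blocks and record (min, max) per column for each block."""
    zone_maps = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        columns = list(zip(*block))          # transpose rows into columns
        zone_maps.append([(min(col), max(col)) for col in columns])
    return zone_maps


def blocks_to_scan(zone_maps, column, value):
    """Count blocks whose min/max range for `column` could contain `value`."""
    return sum(1 for zone in zone_maps
               if zone[column][0] <= value <= zone[column][1])


# Hypothetical table: 100 dates x 10 regions, stored in compound
# sort order (date first, then region).
rows = sorted((date, region) for date in range(100) for region in range(10))
zone_maps = build_zone_maps(rows, block_size=100)

date_blocks = blocks_to_scan(zone_maps, column=0, value=42)    # filter on 1st sort column
region_blocks = blocks_to_scan(zone_maps, column=1, value=3)   # filter on 2nd sort column only
print(date_blocks, "of", len(zone_maps), "blocks for the date filter")
print(region_blocks, "of", len(zone_maps), "blocks for the region filter")
```

In this model the date filter touches a single block while the region-only filter touches every block, which is the case interleaved sort keys are meant to improve.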

Time Series Data – Vacuum Operation

Unsorted Region

Sorted Region

Sorted

Append in Sort Key Order

Sort Unsorted Region

Merge
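The two vacuum steps named above (sort the unsorted region, then merge it with the sorted region) can be sketched in Python as follows. This is a conceptual model only, not how Redshift implements VACUUM, and the integer values stand in for hypothetical date sort keys.

```python
import heapq


def vacuum(sorted_region, unsorted_region):
    """Sort the unsorted region, then merge it into the sorted region."""
    newly_sorted = sorted(unsorted_region)                   # step 1: sort unsorted region
    return list(heapq.merge(sorted_region, newly_sorted))    # step 2: merge


def appended_in_sort_key_order(sorted_region, unsorted_region):
    """True when new rows arrived in order and after all existing rows."""
    if not unsorted_region:
        return True
    in_order = all(a <= b for a, b in zip(unsorted_region, unsorted_region[1:]))
    follows = not sorted_region or sorted_region[-1] <= unsorted_region[0]
    return in_order and follows


existing = [20080101, 20080102, 20080103]      # already sorted rows
time_series_load = [20080104, 20080105]        # appended in sort key order
backfill_load = [20080102, 20071231]           # arrives out of order

print(appended_in_sort_key_order(existing, time_series_load))
print(appended_in_sort_key_order(existing, backfill_load))
print(vacuum(existing, backfill_load))
```

When time series data is appended in sort key order, the new rows already sit in the right place, so there is little or nothing left to sort or merge. Out-of-order loads leave real work for the vacuum.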

Visualizing with Yellowfin

Automate – https://github.com/lynnlangit/AWSDataWarehouse
