Building a Data Warehouse with AWS Redshift, Matillion and Yellowfin


Upload: lynn-langit

Post on 18-Jan-2017


TRANSCRIPT

Page 1: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Building a Data Warehouse on AWS

Amazon S3

Amazon Redshift

Collect, Process, Analyze, Store

Data Answers

Visualize

@Lynn Langit

Page 3: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Building a Data Warehouse on AWS

Move data into Redshift from S3 for analysis

Amazon S3

Amazon Redshift

AWS Marketplace Partners

Matillion

Visualize

Yellowfin

Collect, Process, Analyze, Store

Data Answers

Page 4: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Setup

Page 5: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Our Scenario and Source Files

File Types
-- Text - .csv
-- Compressed - .gz

File Categories

Details / Events
-- Flights
-- Weather

Metadata
-- Airports
-- Carriers

“In this scenario we will use Matillion ETL for Redshift to prepare two separate data sources ready for analysis. The sample data is US airport flight information from 1995 -> 2008. Every flight to or from a US airport (and whether it left on time or not) is included.

The second data set is weather data, taken from NOAA, including the daily weather readings for each US Airport.”

Page 6: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Loading data from S3 into Redshift
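Redshift loads files from S3 with the COPY command. The sketch below is a minimal illustration that only builds the SQL text for a gzip-compressed CSV load; it does not connect to a cluster. The table name, bucket path, and IAM role ARN are hypothetical placeholders, not values from the deck.

```python
def build_copy_statement(table, s3_path, iam_role, gzip=False, ignore_header=1):
    """Return a Redshift COPY statement for CSV files stored in S3."""
    options = ["CSV"]
    if gzip:
        # GZIP tells COPY that the source files are gzip-compressed (.gz).
        options.append("GZIP")
    if ignore_header:
        # Skip the given number of header rows in each file.
        options.append(f"IGNOREHEADER {ignore_header}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        + " ".join(options) + ";"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="flights",
    s3_path="s3://example-bucket/flights/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    gzip=True,
)
print(statement)
```

A tool such as Matillion issues comparable COPY statements on your behalf; writing one by hand is mainly useful for understanding what the load step does.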

Page 7: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Using Matillion ETL for Redshift
• Create Instance (AMI/EC2) of Matillion/AWS Marketplace
• Connect Matillion to Redshift

Page 8: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Loading Data in Redshift

Page 9: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Table distribution styles

• Distribution Key - Same key to same location
• All - All data on every node
• Even - Round robin distribution

(Diagram: for each style, a cluster of two nodes with two slices each. Node 1 holds Slice 1 and Slice 2; Node 2 holds Slice 3 and Slice 4. In the Distribution Key panel, key1 through key4 each map to one location.)
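The three distribution styles can be compared with a small simulation. This is a rough sketch, not Redshift's implementation: it treats nodes and slices as generic "locations", uses `zlib.crc32` as a stand-in for Redshift's internal hash function, and the `flights` rows are made-up sample data.

```python
import zlib
from collections import defaultdict


def distribute(rows, style, num_locations=4, key=None):
    """Assign rows to locations under a KEY, ALL, or EVEN distribution style."""
    locations = defaultdict(list)
    for index, row in enumerate(rows):
        if style == "EVEN":
            # Round robin: rows are dealt out in turn, regardless of content.
            locations[index % num_locations].append(row)
        elif style == "KEY":
            # The same key value always hashes to the same location.
            # crc32 is an assumption; Redshift's real hash function differs.
            digest = zlib.crc32(str(row[key]).encode("utf-8"))
            locations[digest % num_locations].append(row)
        elif style == "ALL":
            # Every location receives a full copy of the table.
            for location in range(num_locations):
                locations[location].append(row)
        else:
            raise ValueError(f"unknown distribution style: {style}")
    return dict(locations)


# Made-up sample rows, for illustration only.
carriers = ["AA", "DL", "UA", "AA", "DL", "UA", "AA", "WN"]
flights = [{"carrier": c, "flight_no": n} for n, c in enumerate(carriers)]

even = distribute(flights, "EVEN")
by_key = distribute(flights, "KEY", key="carrier")
everywhere = distribute(flights, "ALL")
```

With KEY distribution, rows that share a join column land together, so joins on that column avoid moving data between locations. ALL suits small lookup tables such as airports or carriers, and EVEN is a reasonable fallback when no single column dominates the joins.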

Page 10: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Sort Keys

• Single Column - [ SORTKEY ( date ) ]
  • Queries that use 1st column (i.e. date) as primary filter

• Compound - [ COMPOUND SORTKEY ( date, region, country ) ]
  • Queries that use 1st column as primary filter, then other columns

• Interleaved - [ INTERLEAVED SORTKEY ( date, region, country ) ]
  • Queries that use different columns in filter
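As a small illustration of the three sort key forms, the helper below builds the clause that would follow a column list in a Redshift CREATE TABLE statement. In Redshift's CREATE TABLE syntax the COMPOUND or INTERLEAVED keyword precedes SORTKEY. The function name is hypothetical and the column names are taken from the slide.

```python
def sortkey_clause(columns, style=None):
    """Build a Redshift sort key clause for the given columns.

    style is None for a plain SORTKEY, or "COMPOUND" / "INTERLEAVED".
    """
    if not columns:
        raise ValueError("at least one sort key column is required")
    if style is None:
        prefix = ""
    elif style.upper() in ("COMPOUND", "INTERLEAVED"):
        # The style keyword comes before SORTKEY in Redshift DDL.
        prefix = style.upper() + " "
    else:
        raise ValueError(f"unknown sort key style: {style}")
    return f"{prefix}SORTKEY ({', '.join(columns)})"


single = sortkey_clause(["date"])
compound = sortkey_clause(["date", "region", "country"], "compound")
interleaved = sortkey_clause(["date", "region", "country"], "interleaved")
```

A compound key favors filters on a leading prefix of the column list, while an interleaved key gives each column equal weight, so the choice depends on how the queries filter.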

Page 11: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Time Series Data – Vacuum Operation

(Diagram: a table made up of sorted regions followed by an unsorted region.)

• Append in Sort Key Order
• Sort Unsorted Region
• Merge
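The sort and merge steps can be pictured with a toy model. This is a conceptual sketch only, assuming each region is a simple list of sort key values (ISO date strings here, all made up); it says nothing about how Redshift stores blocks internally.

```python
import heapq


def vacuum(sorted_region, unsorted_region):
    """Sort the unsorted region, then merge it into the sorted region."""
    newly_sorted = sorted(unsorted_region)
    # heapq.merge combines two already-sorted sequences in one pass.
    return list(heapq.merge(sorted_region, newly_sorted))


# Made-up sort key values, for illustration only.
existing = ["2008-01-01", "2008-01-03", "2008-01-05"]

# Late-arriving rows that fall between existing values need a real merge.
out_of_order = ["2008-01-04", "2008-01-02"]
merged = vacuum(existing, out_of_order)

# Time series rows appended in sort key order are already in place:
# the merge result is just the old data followed by the new data.
in_order = ["2008-01-06", "2008-01-07"]
appended = vacuum(existing, in_order)
```

This is why loading time series data in sort key order keeps vacuum work small: when new rows all sort after the existing ones, there is little or nothing left to reorder.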

Page 12: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Visualizing with Yellowfin

Page 13: Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Automate – https://github.com/lynnlangit/AWSDataWarehouse