
Page 1: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Greg Khairallah, Business Development Manager, AWS

Malini Saxena, Senior Consultant, AWS

Raj Chary, VP of Technology / Architecture, WagglePractice

Lige Hensley, Chief Technology Officer, Ivy Tech

June 20, 2016

Easy Analytics with AWS

Page 2: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What to expect from this session

• AWS toolkit for analytics

• Understand stakeholders

• Demo

• Case Study – WagglePractice

• Case Study – Ivy Tech

• Q&A

Page 3: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Big data portfolio: but what do I recommend?

[Diagram: AWS big data services arranged by stage (Collect, Store, Analyze). Services shown: Amazon Glacier, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Aurora, AWS Data Pipeline, Amazon CloudSearch, Amazon EMR, Amazon EC2, Amazon Redshift, Amazon Machine Learning, Amazon Elasticsearch Service, AWS Database Migration, Amazon Kinesis Analytics, Amazon Kinesis Firehose, AWS Import/Export, AWS Direct Connect, Amazon Kinesis Streams, Amazon QuickSight]

Page 4: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Match toolset to right persona

• Business intelligence (BI) analyst
  • Primary tool is SQL
  • Historical data resides in a data warehouse such as Amazon Redshift

• Data scientist: uses programmatic languages such as R or Python

• Application developer: requires an API to integrate with AWS services

Page 5: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

BI analyst

Page 6: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

BI analyst with existing BI tools

• Primary tool is SQL

• Data is largely structured, with well-known data sources

• Primary concern is fast, consistent performance

• Need to extend SQL with custom functions

[Diagram labels: BI analyst; BI tools on Amazon EC2; Amazon Redshift; QuickSight API; Amazon QuickSight]

Page 7: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Amazon Redshift system architecture

Leader node

• SQL endpoint

• Stores metadata

• Coordinates query execution

Compute nodes

• Local, columnar storage

• Execute queries in parallel

• Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH

Two hardware platforms

• Optimized for data processing

• DS2: HDD; scale from 2 TB to 2 PB

• DC1: SSD; scale from 160 GB to 356 TB

[Diagram labels: JDBC/ODBC; 10 GigE (HPC)]

Page 8: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

New SQL functions

We add SQL functions regularly to expand Amazon Redshift’s query capabilities

Added 25+ window and aggregate functions since launch, including:

LISTAGG

[APPROXIMATE] COUNT

DROP IF EXISTS, CREATE IF NOT EXISTS

REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE

PERCENTILE_CONT, _DISC, MEDIAN

PERCENT_RANK, RATIO_TO_REPORT

We’ll continue iterating but also want to enable you to write your own

Window function examples: http://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_examples.html
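
To make two of the functions named above concrete, here is a minimal Python sketch of what RATIO_TO_REPORT and PERCENT_RANK compute over a single window. It only illustrates the arithmetic and is not how Amazon Redshift implements them; the helper names are made up for this example, and NULL handling and PARTITION BY are left out.

```python
from typing import List


def ratio_to_report(values: List[float]) -> List[float]:
    """Each value divided by the sum of all values in the window."""
    total = sum(values)
    return [v / total for v in values]


def percent_rank(values: List[float]) -> List[float]:
    """(rank - 1) / (rows - 1), where tied values share the lowest rank."""
    n = len(values)
    if n == 1:
        return [0.0]
    ordered = sorted(values)
    # ordered.index(v) counts the values strictly smaller than v,
    # which is exactly rank - 1 when ties share the lowest rank.
    return [ordered.index(v) / (n - 1) for v in values]


sales = [10, 30, 30, 60]
print(ratio_to_report(sales))
print(percent_rank(sales))
```

In SQL the same results would come from expressions such as `RATIO_TO_REPORT(sales) OVER ()` and `PERCENT_RANK() OVER (ORDER BY sales)`.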

Page 9: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Scalar user-defined functions

You can write UDFs using Python 2.7

• Syntax is largely identical to PostgreSQL UDF

• Python execution is performed in parallel

• System and network calls within UDFs are prohibited

Comes integrated with Pandas, NumPy, SciPy, DateUtil, and Pytz analytic libraries

• Import your own libraries for even more flexibility

• Take advantage of thousands of functions available through Python libraries to perform operations not easily expressed in SQL
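
As a rough illustration of a scalar Python UDF, the sketch below shows a tiny function in plain Python next to the `CREATE FUNCTION` statement that would register the same logic in Amazon Redshift. The function name `f_py_greater` and its arguments are chosen for this example only; the body is kept simple so that it behaves the same under Python 2.7, which the slide says UDFs use.

```python
def greater(a, b):
    """Return the larger of two numbers (plain Python, runnable locally)."""
    if a > b:
        return a
    return b


# The same logic wrapped as a Redshift scalar UDF. The text between the
# $$ markers is the Python function body; Redshift supplies the arguments.
CREATE_UDF_SQL = """
CREATE FUNCTION f_py_greater (a float, b float)
  RETURNS float
STABLE
AS $$
  if a > b:
      return a
  return b
$$ LANGUAGE plpythonu;
"""

print(greater(3.0, 5.0))
print(CREATE_UDF_SQL)
```

Once created, the function could be called like any built-in, for example `SELECT f_py_greater(list_price, sale_price) FROM sales;` where the table and column names are hypothetical.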

Page 10: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What is Amazon QuickSight?

A very fast, cloud-powered business intelligence service for 1/10 the cost of traditional BI software

Page 11: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

[Diagram: Amazon QuickSight architecture.
• Consumers: business users, via the QuickSight UI (mobile devices, web browsers) and the QuickSight API (partner BI products)
• Components: SPICE, Connectors, Data Prep, Metadata, Suggestions
• Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, files, third-party]

Page 12: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Data scientist

Page 13: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Data scientist with existing toolsets

• Work with unstructured datasets

• Use existing toolsets to connect to Amazon Redshift

[Diagram labels: data scientist; toolkits like SAS or R Studio installed with Amazon EC2; unstructured data in Amazon S3; structured data in Amazon Redshift]

Page 14: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Querying Amazon Redshift with R packages

• RJDBC: supports SQL queries

• dplyr: uses R code for data analysis

• RPostgreSQL: R-compliant driver or database interface (DBI)

[Diagram labels: R user; R Studio on Amazon EC2; unstructured data in Amazon S3; user profile in Amazon RDS; Amazon Redshift]

Connecting R with Amazon Redshift blog post: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift

Page 15: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Querying Amazon Redshift with R packages example

Page 16: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Application developer

Page 17: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Application developers can build smart applications using Amazon Machine Learning

• All skill levels

• Amazon Machine Learning technology is accessed through APIs and SDKs

• Embed visualizations in applications

[Diagram labels: application; Amazon Machine Learning (generate/query predictions); Amazon Redshift (structured data/predictions); Amazon QuickSight (visualize)]
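
As one way to picture the API access mentioned above, the following sketch wraps the real-time `predict` call that the boto3 `machinelearning` client exposes (`MLModelId`, `Record`, `PredictEndpoint`). Everything else is an assumption made for illustration: the model ID, the endpoint URL, and the record fields are placeholders, and `FakeMLClient` is a hypothetical stand-in so the example runs without AWS credentials.

```python
from typing import Any, Dict


def predict_record(ml_client: Any, model_id: str, endpoint_url: str,
                   record: Dict[str, Any]) -> Dict[str, Any]:
    """Request one real-time prediction and return its 'Prediction' part."""
    # Amazon ML expects every attribute value to be passed as a string.
    string_record = {name: str(value) for name, value in record.items()}
    response = ml_client.predict(
        MLModelId=model_id,
        Record=string_record,
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]


class FakeMLClient:
    """Hypothetical stand-in that returns a canned response."""

    def predict(self, MLModelId: str, Record: Dict[str, str],
                PredictEndpoint: str) -> Dict[str, Any]:
        return {"Prediction": {"predictedLabel": "1",
                               "predictedScores": {"1": 0.87}}}


# With real credentials this would be: boto3.client("machinelearning")
client = FakeMLClient()
prediction = predict_record(
    client,
    model_id="ml-EXAMPLE",  # placeholder model ID
    endpoint_url="https://realtime.machinelearning.us-east-1.amazonaws.com",
    record={"grade_level": 5, "hints_used": 2},  # hypothetical attributes
)
print(prediction["predictedLabel"])
```

Passing the client in as a parameter keeps the application code testable and leaves credential handling outside the function.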

Page 18: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Demo

Page 19: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016
Page 20: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Raj Chary, WagglePractice
Vice President of Technology/Architecture

Page 21: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What is Waggle?

Smart, responsive practice

Math and ELA (Grades 2-8)

Provides students the right challenge at the right time

Page 22: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What is Waggle?

Right Challenge, Right Time

Waggle looks for more than correct answers. Waggle continually analyzes each student’s decisions and progress. That way, students get tougher material right when they’re ready.

Page 23: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What is Waggle?

Productive Struggle

Waggle motivates students to push themselves forward. How? Through helpful hints, supportive feedback, and achievement badges that build grit and confidence.

Page 24: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What is Waggle?

Constructive Grouping

Waggle’s insights mean you can easily group students together based on learning needs. All without sacrificing the quality of individual instruction.

Page 25: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Waggle: Product Demo

• Data Creators
  • Differentiated learning experience
  • Fun and engaging

• Data Visualizers
  • Seamless integration with application
  • Analytics with a Story
  • Actionable Data

Page 26: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Redshift: Data Warehouse Layout

[Diagram: three Amazon Redshift clusters exchanging data through Amazon S3.
• Clusters: Write cluster (Compute, dw2.large); Read cluster (Compute, dw2.large); History cluster (Density, dw1.xlarge)
• Cluster contents: nodes, staging, data mart (aggregations)
• Data sources: APIs, OLTP
• Data movement: S3 COPY; S3 UnLoad and Load; S3 UnLoad and Load + UPSERTS
• Annotations: initial and incremental {processed} data loads; periodic data snapshots for historical analysis; for serving Jaspersoft reports]

Page 27: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Results and Lessons Learned

• Performance Metrics

– Millions of records are processed in <1 minute
  • LOAD/UNLOAD commands | UPSERTS | S3 COPY Command

– Report queries average < 1 to ~1.5 seconds

– {compression} – gained 20+% efficiencies in data retrieval

• Best Practices

– {sort keys} – lens-based data model: visualize data in variety of ways

– {commit stats} – Redshift is not a transactional system

– {nested loops} – no Cartesian products, ensure joins well managed

– {queries that queue} – tune the WLM configuration

– {query runtimes} – faster query means less queuing

– {stats missing} – analyze and vacuum when possible

– {alerts with tables} – monitor to ensure queries running optimally
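
The UPSERTS mentioned under the performance metrics are commonly done in Amazon Redshift with a staging table: load new rows into staging, delete the matching rows from the target, then insert everything from staging, all inside one transaction. The sketch below only builds those SQL statements as strings. It is an assumption about the general pattern, not Waggle's actual code, and the table and column names are hypothetical.

```python
from typing import List


def build_upsert_statements(target: str, staging: str,
                            key_column: str) -> List[str]:
    """Return the statements for a delete-then-insert merge.

    Identifiers are interpolated directly, so they must come from
    trusted configuration and never from user input.
    """
    return [
        "BEGIN;",
        # Remove target rows that are about to be replaced.
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key_column} = {staging}.{key_column};",
        # Insert both the replacements and the brand-new rows.
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END;",
    ]


for statement in build_upsert_statements(
        "student_scores", "student_scores_staging", "score_id"):
    print(statement)
```

Running both statements in a single transaction keeps report queries from seeing the target table in a half-updated state.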

Page 28: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016
Page 29: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Thank You

Page 30: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Ivy Tech & Amazon Redshift

May 25, 2016

Page 31: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

What We’re Doing

• Transforming the culture of the College to be more data driven

• Moving from reporting silos to an Integrated Analytics system, which we call a Data Democracy

• Collecting and analyzing a vast variety of data at a scale that no one else in Higher Ed is doing

• Using machine learning tools to identify students who may need further assistance

• Starting this fall, we are implementing a one-on-one coaching initiative for the students we identified with the machine learning tools

Page 32: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

96% of organizations in the United States use data in the same way.

…and it’s wrong.

But it’s not just education…

Page 33: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

The “Standard” Approach

VIP

Page 34: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Relevant Data for Everyone

Page 35: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Data Regimes

Data Dictatorship: Data is controlled and its use is restricted. There is asymmetric distribution of information based on your position

Data Aristocracy: Data analysts, scientists and PhDs are needed to do anything meaningful. Power concentrates in the hands of these employees and their supervisors

Data Anarchy: Business users feel underserved and take matters into their own hands. They create “shadow IT” systems and work around the “unresponsive” IT group

Data Democracy: Everybody gets timely and equitable access to data. Line of business users are empowered and “own” the data. Executives and IT get out of the way

1 Shash Hegde, Mariner, “The Rise of Data Regimes”, 9/12/13, http://www.mariner-usa.com/rise-data-regimes/ (image substitution for Mao Zedong)

Page 36: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Every organization moves through increasingly complex stages of data accessibility.

Data Maturity Model

… very few complete the transition to Integrated Analytics

Page 37: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Stage 1: Report Silos

[Diagram: source systems: Request Tracker, Banner, Blackboard, Luminis, Starfish, SCCM, CAS Authentication]

This is what we have had for decades at Ivy Tech…

Page 38: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Stage 2: Data Warehousing

[Diagram: source systems: Request Tracker, Banner, Blackboard, Luminis, Starfish, SCCM, CAS Authentication]

This is what most companies do… but we are taking this a step further…

Page 39: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Stage 3: Integrated Analytics

[Diagram: source systems: Request Tracker, Banner, Blackboard, Luminis, Starfish, SCCM, CAS Authentication. Curated collections: Students by Financial Aid, Students by Award, Students by Term, Students by Class, Classes by Class Section, Students]

These curated collections of data are designed to enable direct access to the data you need, regardless of where it came from. Quickly. Easily.

Page 40: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

GPA Graduation - Cumulative

Graduation Grade Point Average (Cumulative) is an indication of a student's academic progress for all semester credit classes for all registered terms up to and including the selected term. Letter grades are assigned points (A=4, B=3, C=2, D=1, F=0) and the GPA is calculated by taking the number of grade points a student earned in a selected period of time divided by the total number of classes taken during that same period.

GPA Graduation Cumulative = Sum of a student's total grade points earned in credit classes for all classes for all registered terms up to and including the selected term / Sum of student's total classes taken during that same period

NOTES ON USING THIS TERM: GPA Graduation - Cumulative does not include grades from remedial classes.

Related Terms: [GPA Graduation - Term]
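
The definition above can be checked with a short worked example. This sketch follows the slide's wording literally: grade points are summed and divided by the number of classes (not credit hours), and remedial classes are skipped. The function name and the `(letter_grade, is_remedial)` input shape are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# Point values as given in the definition.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa_graduation_cumulative(
        classes: Iterable[Tuple[str, bool]]) -> Optional[float]:
    """Cumulative GPA from (letter_grade, is_remedial) pairs.

    Returns None when no non-remedial classes have been taken.
    """
    points = [GRADE_POINTS[grade]
              for grade, is_remedial in classes
              if not is_remedial]
    if not points:
        return None
    return sum(points) / len(points)


# Three credit classes (A, B, F) and one remedial class that is ignored.
history = [("A", False), ("B", False), ("C", True), ("F", False)]
print(round(gpa_graduation_cumulative(history), 2))  # (4 + 3 + 0) / 3
```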

Page 41: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Questions?

Page 42: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Resources

Amazon Redshift Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html

Scalar UDF Documentation: http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html

Introduction to Python UDFs in Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1IHV1G67CY53T/Introduction-to-Python-UDFs-in-Amazon-Redshift

Connecting R with Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift

Databricks Apache Spark–Amazon Redshift Tutorial: https://github.com/databricks/spark-redshift/tree/master/tutorial

Amazon ML Getting Started Guide: https://aws.amazon.com/machine-learning/getting-started/

Amazon QuickSight (Preview Registration): https://aws.amazon.com/quicksight/

Page 43: Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS Public Sector Summit 2016

Thank you!