Using AWS EMR, Redshift, and Spark to Power Your Analytics

Mark Balkenende & Ashwin Viswanath, Talend Product Team; Mick Bass, Co-founder and AWS Professional Certified Solutions Architect, 47Lining

Upload: talend

Post on 16-Apr-2017


TRANSCRIPT

1

Using AWS EMR, Redshift, and Spark to Power Your Analytics

Mark Balkenende & Ashwin Viswanath, Talend Product Team

Mick Bass, Co-founder and AWS Professional Certified Solutions Architect, 47Lining

2

Agenda

• Different Types of Analytics

• Predictive Analytics – what is it?

• Use Cases

• Enabling Technologies

• Introduction to 47Lining

• Importance of AWS Technologies

• Demo

• Getting a Successful Predictive Analytics POC

• Next Steps

• Q&A

3

Different Types of Analytics

4

Different Types of Analytics

Descriptive Analytics: What just happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What might happen?

Prescriptive Analytics: What should I do about it?

5

Predictive Analytics – What is it?

6

Predictive Analytics is NOT Just One Of These Areas

Predictive Analytics

7

Defining Predictive Analytics

Algorithms

Consuming App

Statistical Models

Integration & Cleansing

Predictive Analytics

8

Use Cases for Predictive Analytics

Preventative Maintenance

Health Risk Management

Customer Churn

Product Recommendations

Fraud Detection

9

Key Enabling Technologies

Data Ingestion

Data Storage: Amazon S3

Data Processing: Amazon EMR

Data Warehousing: Amazon Redshift

10


Amazon Economics: 560% ROI Over 5 Years

IDC discovered that the five-year total cost of ownership (TCO) of developing, deploying, and managing critical applications in AWS delivered a 64.3% savings when compared with deploying the same resources on-premises or in hosted environments. The findings also showed a 560% ROI over five years and 81.7% less downtime.

IDC: Quantifying the Business Value of Amazon Web Services, May 2015

https://d0.awsstatic.com/analyst-reports/IDC_Business_Value_of_AWS_May_2015.pdf

11

Introduction to 47Lining

Mick Bass, Co-founder and AWS Professional Certified Solutions Architect

12

47Lining is a Talend and AWS Advanced Consulting Partner with Big Data Competency designation. We develop big data solutions and deliver big data managed services built from underlying Talend and AWS big data building blocks like Talend Studio, Talend Integration Cloud, Amazon Redshift, Kinesis, S3, DynamoDB, Machine Learning and Elastic MapReduce (EMR). We help customers build, operate and manage breathtaking “Data Machines” for their data-driven businesses.

47Lining recently enabled a major gaming customer to ingest a billion rows a day into Redshift for a co-branded movie launch.

www.47lining.com | [email protected] | @47lining

Why AWS is So Important for Analytics and Which AWS Services Are Most Relevant?

13

AWS provides customers with:

• Business agility – zero lead time for establishment of resources

• Elastic Pricing – pay only for what you use

• Near-infinite Scalability with ability to Burst based on business demand

• Rich set of Big-Data ingest, storage and analytics services

The more relevant big data services include:

Amazon Kinesis: Ingest

Amazon S3: Data Lake, Source of Truth

Amazon Redshift: Warehouse

Amazon EMR: Analytics


14

Demo

Mark Balkenende, Talend Product Team

15

Demonstration Take Aways

1. Review a Recommendation Pipeline using Elastic Cloud Services

2. Talend’s Value on Cloud Services

3. Talend Spark on EMR with Machine Learning

16

Continuous Delivery of Analytics

Data Scientist: Create a prediction, model, score (full access to Data Lake for modeling)

IT: Operationalize analytics

Continuous delivery

17

Data & Services Flow

Customer Account

Raw Events Ingest

• All Consumers

• Clickstream Data; or Consumption & Usage Data

Incremental Load & Maintenance <nightly>

Amazon Redshift: Daily Events, Per-User Behavior, ...

Recommendations Data Prep <nightly>

Elastic MapReduce Service, Transient Cluster: Nightly Recommendations (input, results)
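The nightly data preparation step in this flow can be pictured with a small sketch. The code below is an illustration in plain Python only, not the Talend job or the Redshift SQL shown in the demo. The event fields (`user_id`, `item_id`, `event_type`) and the weights are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each event type signals interest.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def per_user_behavior(events):
    """Roll raw events up into one interest score per (user, item) pair."""
    scores = defaultdict(float)
    for event in events:
        weight = EVENT_WEIGHTS.get(event["event_type"], 0.0)
        scores[(event["user_id"], event["item_id"])] += weight
    # Drop pairs whose events carried no known signal.
    return {pair: score for pair, score in scores.items() if score > 0}


# Made-up sample of one day's raw events.
daily_events = [
    {"user_id": "u1", "item_id": "movie_a", "event_type": "view"},
    {"user_id": "u1", "item_id": "movie_a", "event_type": "purchase"},
    {"user_id": "u2", "item_id": "movie_b", "event_type": "view"},
    {"user_id": "u2", "item_id": "movie_c", "event_type": "share"},
]

behavior = per_user_behavior(daily_events)
```

In the architecture on the slide, a roll-up of this kind would more likely run inside Redshift over the Daily Events table, with the result handed to the recommendations job as input.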

18

Data & Services Flow

[Same diagram as on slide 17, with two callouts added]

Elastic Start & Stop: Amazon EMR

Elastic Start & Stop: Amazon Redshift
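One way to picture "Elastic Start & Stop" on the EMR side is a transient cluster that runs a single Spark step and then terminates on its own. The sketch below only builds the request for boto3's EMR `run_job_flow` call and does not contact AWS. The cluster name, release label, instance types, and S3 paths are assumptions for illustration, not values from the demo.

```python
def transient_spark_cluster_request(job_script_s3_uri, log_s3_uri, core_nodes=2):
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    The cluster runs one Spark step and then shuts itself down.
    """
    return {
        "Name": "nightly-recommendations",  # hypothetical cluster name
        "ReleaseLabel": "emr-5.4.0",  # assumed release; choose one that suits you
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m4.large",  # assumed instance types
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 1 + core_nodes,  # one master plus the core nodes
            # False makes the cluster transient: it terminates after the last step.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "build-recommendations",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode",
                        "cluster",
                        job_script_s3_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_spark_cluster_request(
    "s3://example-bucket/jobs/recommend.py",  # hypothetical paths
    "s3://example-bucket/emr-logs/",
)
# With AWS credentials configured, this could be submitted as:
#   boto3.client("emr").run_job_flow(**request)
```

Because the cluster exists only for the duration of the nightly step, compute is paid for only while the job runs, which is the point of the callout.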

19

Data & Services Flow

[Same diagram as on slide 17, with one callout added]

Build the Spark Recommendations: Amazon EMR
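The slides do not say which algorithm the Spark job uses, so the following is a stand-in rather than the demo's method: a simple item co-occurrence recommender in plain Python. It is meant only to show the shape of the nightly input (per-user items) and results (a short ranked list per user). All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count how often each pair of items shares a user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, user_items, counts, top_n=2):
    """Rank items the user has not seen by co-occurrence with their items."""
    seen = user_items[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Sort by score (highest first), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up input: the items each user interacted with.
user_items = {
    "u1": {"movie_a", "movie_b"},
    "u2": {"movie_a", "movie_b", "movie_c"},
    "u3": {"movie_b", "movie_c"},
    "u4": {"movie_a", "movie_d"},
}
counts = cooccurrence_counts(user_items)
nightly = {user: recommend(user, user_items, counts) for user in user_items}
```

On a real cluster the same idea would be expressed with Spark's distributed APIs or a library model, so that the counting scales across the full event history instead of a handful of in-memory sets.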

20

Take Aways

• Lower Costs
  • No large hardware, software investment
  • Fewer IT resources to manage
  • Buy capacity as you need it

• Faster Time-to-Market
  • Secure, hosted service
  • Up-and-running in minutes
  • Instant computing and DW Resources

• Improved Agility
  • Instant capacity
  • Quicker Iterative Cycles
  • Easier to access

21

Next Steps

22

Next Steps

• Download Talend Studio at https://www.talend.com/download/talend-open-studio

• Take a trial of Talend Integration Cloud at https://www.talend.com/products/integration-cloud

• Contact your Talend Account Manager

• Free 47Lining roadmap session on developing your first Predictive Analytics proof-of-concept (POC)

Getting Started with Your First Predictive Analytics PoC

23

47Lining makes Talend shine in AWS. We enable customers to quickly reap elasticity and price / performance benefits of AWS for data warehousing & analytics in the Cloud at a fraction of the price of traditional solutions.

Data Sources: On-Premise, Partners, Public Data Sets, SaaS Providers, Social Media

Predictors: Fuse/Visualize, Machine Learning

AWS Enablers: Scalability, Automation, Agility, Cost effectiveness

Results:

1) Enhance real-time customer engagement

2) Decision support to optimize processes


47Lining can jumpstart your first predictive analytics PoC leveraging the combined power of Talend and AWS.

Getting Started with Your First Predictive Analytics PoC

24

Proof of Concept (PoC)

Build / Launch

Run


47Lining can help you prove, launch and operate new capabilities in AWS or extend your existing capabilities to the Cloud

Do any of these statements ring true?

• My company runs large-scale processes that could benefit from predictive analytics, but I’m not sure how to start

• My company already runs big data workloads and would like to extend to AWS’ on-demand capacity and elastic pricing

• My company would like to accelerate time to business benefit by working closely with AWS and Talend experts

If so, contact 47Lining for a free Consultative Jumpstart Working Session covering:

• Predictive Analytics Value Exploration, PoC Focus, Approach and Business Case, Strategic Roadmap

25

Q&A

Mark Balkenende & Ashwin Viswanath, Talend Product Team

Mick Bass, Co-founder and AWS Professional Certified Solutions Architect, 47Lining