AWS Summit Barcelona - Data Analysis on AWS

AWS Summit 2013 Barcelona, Oct 24, Barcelona, Spain. Carlos Conde, Sr. Mgr. Solutions Architecture. DATA ANALYSIS ON AWS

Upload: amazon-web-services

Post on 11-May-2015


TRANSCRIPT

Page 1: AWS Summit Barcelona - Data Analysis on AWS

AWS Summit 2013 Barcelona Oct 24 – Barcelona, Spain

Carlos Conde

Sr. Mgr. Solutions Architecture

DATA ANALYSIS ON AWS

Page 2: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Page 3: AWS Summit Barcelona - Data Analysis on AWS

THE COST OF DATA GENERATION IS FALLING

Page 6: AWS Summit Barcelona - Data Analysis on AWS

THE MORE DATA YOU COLLECT, THE MORE VALUE YOU CAN DERIVE FROM IT

Page 11: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Lower cost, higher throughput

Page 12: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Lower cost, higher throughput

Highly constrained

Page 13: AWS Summit Barcelona - Data Analysis on AWS

Chart: DATA VOLUME (series: Generated data, Available for analysis)

Sources:

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Page 14: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Page 15: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

ACCELERATE

Page 16: AWS Summit Barcelona - Data Analysis on AWS

+ ELASTIC AND HIGHLY SCALABLE

+ NO UPFRONT CAPITAL EXPENSE

+ ONLY PAY FOR WHAT YOU USE

+ AVAILABLE ON-DEMAND

= REMOVE CONSTRAINTS

Page 17: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Page 18: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

AWS Import / Export

AWS Direct Connect

Page 19: AWS Summit Barcelona - Data Analysis on AWS

Generated and stored in AWS

Inbound data transfer is free

Multipart upload to S3

Physical media

AWS Direct Connect

Regional replication of AMIs and snapshots
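
To make the "Multipart upload to S3" item concrete, here is a minimal sketch of how an object could be cut into parts before uploading. It only plans the byte ranges and does not call S3. The function name `plan_parts` and the 8 MiB default are assumptions for illustration; the 5 MiB minimum part size and the 10,000-part limit are the documented S3 constraints.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering total_size bytes.

    Hypothetical helper: it only computes the ranges that a multipart
    upload would send, one range per part.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple could then be uploaded independently and in parallel, which is where the throughput gain of multipart upload comes from.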

Page 20: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Amazon S3,

Amazon Glacier,

Amazon DynamoDB,

Amazon RDS,

Amazon Redshift,

AWS Storage Gateway,

Data on Amazon EC2

Page 21: AWS Summit Barcelona - Data Analysis on AWS

AMAZON S3 SIMPLE STORAGE SERVICE

Page 23: AWS Summit Barcelona - Data Analysis on AWS

AMAZON DYNAMODB

HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE

Page 24: AWS Summit Barcelona - Data Analysis on AWS

DURABLE & AVAILABLE

CONSISTENT, DISK-ONLY WRITES (SSD)

Page 25: AWS Summit Barcelona - Data Analysis on AWS

LOW LATENCY

AVERAGE READS < 5 MS, WRITES < 10 MS

Page 26: AWS Summit Barcelona - Data Analysis on AWS

NO ADMINISTRATION

Page 27: AWS Summit Barcelona - Data Analysis on AWS

500,000 WRITES PER SECOND DURING SUPER BOWL

Page 28: AWS Summit Barcelona - Data Analysis on AWS

AMAZON REDSHIFT

FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS

Page 30: AWS Summit Barcelona - Data Analysis on AWS

AMAZON REDSHIFT

DESIGN OBJECTIVES: A petabyte-scale data warehouse service that was…

A Whole Lot Simpler

A Lot Cheaper

A Lot Faster

Page 31: AWS Summit Barcelona - Data Analysis on AWS

AMAZON REDSHIFT

RUNS ON OPTIMIZED HARDWARE

HS1.8XL: 128 GB RAM, 16 Cores, 16 TB compressed user storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 Cores, 2 TB compressed customer storage

Page 34: AWS Summit Barcelona - Data Analysis on AWS

30 MINUTES DOWN TO 12 SECONDS

Page 36: AWS Summit Barcelona - Data Analysis on AWS

AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG

Extra Large Node (HS1.XL): Single Node (2 TB), or Cluster of 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL): Cluster of 2-100 Nodes (32 TB – 1.6 PB)
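
The node and cluster sizes on this slide imply a simple sizing rule: divide the data volume by the storage per node and round up, within the allowed cluster range. The sketch below works that out. The helper `nodes_needed` and the dictionary layout are assumptions for illustration, and the calculation ignores headroom for growth, temporary space, and differing compression ratios.

```python
import math

# Figures taken from the slide: storage per node and allowed cluster sizes.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_cluster": 2, "max_cluster": 32, "single_node": True},
    "HS1.8XL": {"tb_per_node": 16, "min_cluster": 2, "max_cluster": 100, "single_node": False},
}


def nodes_needed(data_tb, node_type):
    """Return the smallest node count of node_type that holds data_tb terabytes."""
    spec = NODE_TYPES[node_type]
    # HS1.XL can also run as a single node when the data fits on one node.
    if spec["single_node"] and data_tb <= spec["tb_per_node"]:
        return 1
    nodes = max(spec["min_cluster"], math.ceil(data_tb / spec["tb_per_node"]))
    if nodes > spec["max_cluster"]:
        raise ValueError(f"{data_tb} TB does not fit in a {node_type} cluster")
    return nodes
```

For example, `nodes_needed(10, "HS1.XL")` gives 5 nodes, while 100 TB needs 7 HS1.8XL nodes.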

Page 37: AWS Summit Barcelona - Data Analysis on AWS

CREATE A DATA WAREHOUSE IN MINUTES

Page 48: AWS Summit Barcelona - Data Analysis on AWS

JDBC/ODBC

Page 52: AWS Summit Barcelona - Data Analysis on AWS

                      Price Per Hour for    Effective Hourly    Effective Annual
                      HS1.XL Single Node    Price Per TB        Price Per TB

On-Demand             $0.850                $0.425              $3,723

1 Year Reservation    $0.500                $0.250              $2,190

3 Year Reservation    $0.228                $0.114              $999
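
The effective per-TB prices on this slide follow from the hourly node price: an HS1.XL node holds 2 TB, and a year has 8,760 hours. A short sketch of that arithmetic, using the slide's own figures (the function name `effective_prices` is an assumption for illustration):

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours
TB_PER_NODE = 2             # HS1.XL storage per node, from the hardware slide


def effective_prices(price_per_node_hour):
    """Return (price per TB per hour, price per TB per year) for one HS1.XL node."""
    per_tb_hour = price_per_node_hour / TB_PER_NODE
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


# Hourly node prices as listed on the slide.
for label, hourly in [
    ("On-Demand", 0.850),
    ("1 Year Reservation", 0.500),
    ("3 Year Reservation", 0.228),
]:
    per_tb_hour, per_tb_year = effective_prices(hourly)
    print(f"{label:<20} ${per_tb_hour:.3f}/TB/hour  ${per_tb_year:,.0f}/TB/year")
```

The 3-year figure works out to $998.64, which the slide rounds to $999.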

Page 53: AWS Summit Barcelona - Data Analysis on AWS

DATA WAREHOUSING DONE THE AWS WAY

No upfront costs, pay as you go

Really fast performance at a really low price

Open and flexible with support for popular tools

Easy to provision and scale up massively

Page 54: AWS Summit Barcelona - Data Analysis on AWS

USAGE SCENARIOS

Page 55: AWS Summit Barcelona - Data Analysis on AWS

S3

EMR

Redshift

Reporting and BI

Page 56: AWS Summit Barcelona - Data Analysis on AWS

DynamoDB: OLTP, Web Apps

Redshift: Reporting and BI

Page 57: AWS Summit Barcelona - Data Analysis on AWS

RDBMS: OLTP, ERP

Redshift: Reporting & BI

Page 58: AWS Summit Barcelona - Data Analysis on AWS

+

RDBMS: OLTP, ERP

Redshift: Reporting & BI

Page 59: AWS Summit Barcelona - Data Analysis on AWS

Social Point Analytics in AWS Marc Canaleta (CTO)

@mcanaleta AWS Summit Barcelona 2013

Page 60: AWS Summit Barcelona - Data Analysis on AWS

Social games developer for mobile and Facebook

Founded in 2008, offices in Barcelona (22@), 170 people.

Top #20 mobile grossing games worldwide

Top #3 Facebook developer

Page 61: AWS Summit Barcelona - Data Analysis on AWS

Social games: interaction between friends, virality

Freemium model: playing is free, some items are paid

Midcore segment

Leader in Breeding & Collecting strategy games

Page 62: AWS Summit Barcelona - Data Analysis on AWS

Top 20 grossing in the iOS App Store worldwide

Recently launched on Android, featured on Google Play

6M DAU on Facebook

Page 63: AWS Summit Barcelona - Data Analysis on AWS

No hardware to maintain or plan: increases the speed of the business

Flexible: pay per use

Eases scalability: Auto Scaling

Eases high availability: multiple availability zones

Managed components: load balancers, databases, …

Page 64: AWS Summit Barcelona - Data Analysis on AWS

Analytics driven. Analytics are needed by almost all our teams:

Engineers: real-time analytics, monitoring, problem detection

Product: decision making, A/B testing, game balancing, …

Marketing: campaign optimization

Finance: business tracking

Page 65: AWS Summit Barcelona - Data Analysis on AWS

FLASH CLIENT, IOS CLIENT, ANDROID CLIENT

BACKEND SERVERS: Symfony 2

ANALYTICS QUEUES: Redis

LOGFILES STORAGE: AWS S3

ANALYTICS DATABASE: AWS Redshift

Page 66: AWS Summit Barcelona - Data Analysis on AWS

REDIS

The backend writes events to Redis lists

Why Redis? Cost and performance: 10K events/second/server

Problem: it is an in-memory database, so the queues must be drained constantly

Scaling and HA: N servers, randomly distributed

Diagram: BACKEND, REDIS, REDIS
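
A minimal sketch of the producer side described on this slide: the backend serializes an event and appends it to a list on one of N Redis servers picked at random. To keep the example runnable without a server, `InMemoryRedis` is a stand-in that mimics only the RPUSH, LPOP, and LLEN list commands. The key name, the JSON event format, and the per-event random choice are assumptions, not details given in the talk.

```python
import json
import random
from collections import defaultdict, deque


class InMemoryRedis:
    """Tiny stand-in exposing only the list commands used here."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, key, value):
        self._lists[key].append(value)
        return len(self._lists[key])

    def lpop(self, key):
        queue = self._lists[key]
        return queue.popleft() if queue else None

    def llen(self, key):
        return len(self._lists[key])


QUEUE_KEY = "analytics:events"  # hypothetical key name


def track_event(servers, name, user_id, **properties):
    """Append one event to the queue on a randomly chosen server."""
    event = {"name": name, "user_id": user_id, **properties}
    server = random.choice(servers)  # spreads load across the N servers
    server.rpush(QUEUE_KEY, json.dumps(event))
    return server


servers = [InMemoryRedis() for _ in range(3)]
for user_id in range(100):
    track_event(servers, "level_up", user_id, level=2)
```

Because any server can take any event, adding capacity means adding servers to the list, and losing one server only loses the events still queued on it.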

Page 67: AWS Summit Barcelona - Data Analysis on AWS

Python processes consume the queues constantly and:

Compute real-time metrics

Store event logfiles to upload them to S3

Enqueue the S3 object URL in SQS

Diagram:

EVENT GENERATION: Redis Queue

DATA LOADING: Consumer

LPOP event (from Redis Queue)

INCR counter (Redis Real Time)

write event (Event Log File)

put object (Amazon S3)

enqueue S3 object URL (Amazon SQS)
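
The consumer loop on this slide can be sketched as follows. Plain Python containers stand in for the external services so the example runs on its own: a `deque` for the Redis queue (LPOP), a `dict` for the Redis real-time counters (INCR), a `dict` for S3 (put object), and a second `deque` for SQS. The function names, key formats, and JSON event format are assumptions for illustration.

```python
import json
from collections import deque


def drain_queue(queue, counters, max_events=1000):
    """Pop up to max_events events, update real-time counters, return log lines."""
    lines = []
    while queue and len(lines) < max_events:
        raw = queue.popleft()                     # stands in for LPOP
        event = json.loads(raw)
        key = "count:" + event["name"]
        counters[key] = counters.get(key, 0) + 1  # stands in for INCR
        lines.append(raw)                         # "write event" to the logfile
    return lines


def flush_logfile(lines, bucket, object_key, object_store, url_queue):
    """Store one logfile and enqueue its URL for the importers."""
    if not lines:
        return None
    object_store[object_key] = "\n".join(lines) + "\n"  # stands in for S3 put object
    url = f"s3://{bucket}/{object_key}"
    url_queue.append(url)                                # stands in for SQS enqueue
    return url
```

Batching events into a logfile before uploading keeps the number of S3 objects and SQS messages small compared with the number of events.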

Page 68: AWS Summit Barcelona - Data Analysis on AWS

Python is well suited to writing workers and processing data

Redis: structures such as counters, sets, and sorted sets for real-time metrics

S3: virtually unlimited space, scalable, highly available

SQS: reliability and availability, at a higher price than Redis


Page 69: AWS Summit Barcelona - Data Analysis on AWS

EVENT PROCESSING

Diagram: Amazon S3, Amazon SQS, Importer, TSV, Redshift

The importers read URLs from SQS

They download the logfiles from S3

They convert them to TSV

They bulk-import into Redshift (N logfiles at a time)
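
The conversion and bulk-load steps of the importer might look like the sketch below. It assumes logfiles hold one JSON event per line and uses a hypothetical three-column schema; neither detail is stated in the talk. The load uses Redshift's `COPY` command, which reads every object under an S3 prefix and so can load several files in one statement. The statement is only built as a string here, and the bucket, table, and role values in the usage lines are placeholders.

```python
import json

COLUMNS = ["name", "user_id", "ts"]  # hypothetical event schema


def logfile_to_tsv(logfile_text, columns=COLUMNS):
    """Convert a logfile of JSON lines into tab-separated rows."""
    rows = []
    for line in logfile_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Replace tabs and newlines so a value cannot break the row layout.
        fields = [
            str(event.get(col, "")).replace("\t", " ").replace("\n", " ")
            for col in columns
        ]
        rows.append("\t".join(fields))
    return "".join(row + "\n" for row in rows)


def copy_statement(table, s3_prefix, iam_role):
    """Build a COPY statement that loads all TSV objects under s3_prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' DELIMITER '\\t';"
    )


tsv = logfile_to_tsv('{"name": "login", "user_id": 1, "ts": 100}\n')
sql = copy_statement("events", "s3://example-bucket/tsv/", "arn:aws:iam::123456789012:role/example")
```

Loading by prefix rather than row by row matters because a columnar warehouse is far more efficient at bulk loads than at single-row inserts.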

Page 70: AWS Summit Barcelona - Data Analysis on AWS

Lets us stay flexible: schema changes without downtime

Very scalable (with write downtime)

Low deployment risk: offline system, backups

Minimal maintenance: vacuums, disk space

Good SQL support, unlike other columnar databases

Page 71: AWS Summit Barcelona - Data Analysis on AWS

Daily transformations and calculations implemented in SQL

Example: UPDATE USER SET total_revenues = (SELECT SUM(amount) FROM transaction t WHERE t.user_id = user.user_id);

Why not Hadoop?

Much more complex and slower; for now, SQL operations meet all our requirements
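
To show what the daily UPDATE on this slide computes, here is a small runnable sketch of the same correlated-subquery pattern. It runs against an in-memory SQLite database rather than Redshift, and the tables are named `users` and `transactions` (an assumption, chosen to avoid clashing with SQL keywords). `COALESCE` is also an addition, so users without transactions get 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, total_revenues REAL);
    CREATE TABLE transactions (user_id INTEGER, amount REAL);
    INSERT INTO users (user_id) VALUES (1), (2), (3);
    INSERT INTO transactions VALUES (1, 4.99), (1, 9.99), (2, 0.99);
""")

# For every user, sum that user's transactions and store the total.
conn.execute("""
    UPDATE users
    SET total_revenues = (
        SELECT COALESCE(SUM(t.amount), 0)
        FROM transactions AS t
        WHERE t.user_id = users.user_id
    )
""")
conn.commit()

totals = dict(conn.execute("SELECT user_id, total_revenues FROM users"))
```

The whole aggregation is one declarative statement, which is the simplicity the speaker contrasts with writing a Hadoop job.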

Page 72: AWS Summit Barcelona - Data Analysis on AWS

Would you like to work in the video game industry?

We are looking for talent. Talent attracts talent.

www.socialpoint.es/jobs

THANK YOU!

Page 73: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Amazon EC2

Amazon Elastic MapReduce

Page 75: AWS Summit Barcelona - Data Analysis on AWS

AMAZON ELASTIC MAPREDUCE

HADOOP AS A SERVICE

Page 76: AWS Summit Barcelona - Data Analysis on AWS

• A FRAMEWORK

• SPLITS DATA INTO PIECES

• LETS PROCESSING OCCUR

• GATHERS THE RESULTS
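
The three steps listed above (split, process, gather) can be illustrated with a word count in plain Python. This is only a single-process sketch of the idea: Hadoop runs the per-piece work on many machines in parallel, which this example does not attempt. The function names are assumptions for illustration.

```python
from collections import Counter
from functools import reduce


def split(records, pieces):
    """Split the input into `pieces` roughly equal parts."""
    return [records[i::pieces] for i in range(pieces)]


def map_piece(lines):
    """Process one piece independently: count the words it contains."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, pieces=3):
    """Split the data, process each piece, then gather the partial results."""
    partials = [map_piece(piece) for piece in split(lines, pieces)]
    return reduce(lambda a, b: a + b, partials, Counter())


result = word_count(["a b a", "b c", "a"])
```

Because each piece is processed without reference to the others, the per-piece step can be spread over as many nodes as are available.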

Page 77: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Page 78: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Application data and logs for analysis pushed to S3

Page 79: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Amazon Elastic MapReduce name node to control analysis

Page 80: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Hadoop cluster started by Elastic MapReduce

Page 81: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Adding many hundreds or thousands of nodes

Page 82: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Disposed of when job completes

Page 83: AWS Summit Barcelona - Data Analysis on AWS

Corporate Data Center

Elastic Data Center

Results of analysis pulled back into your systems

Page 85: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Amazon S3,

Amazon DynamoDB,

Amazon RDS,

Amazon Redshift,

Data on Amazon EC2

Page 86: AWS Summit Barcelona - Data Analysis on AWS

PUBLIC DATA SETS http://aws.amazon.com/publicdatasets

Page 90: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Page 104: AWS Summit Barcelona - Data Analysis on AWS

GENERATE STORE ANALYZE SHARE

Page 105: AWS Summit Barcelona - Data Analysis on AWS

FROM DATA TO ACTIONABLE INFORMATION