Experiences with Serverless Big Data

Munich, 17.10.16 Markus Schmidberger, Head of Data Service Experiences with Serverless Big Data AWS Meetup – Munich 2016


Page 1: Experiences with ServerlessBig Data · Be serverless and serve data Amazon S3 AWS Lambda AWS Lambda Amazon API Gateway. glomex – A company of ProSiebenSat.1 Media SE 10 Be serverless

Experiences with Serverless Big Data
AWS Meetup – Munich 2016

Munich, 17.10.16
Markus Schmidberger, Head of Data Service


glomex – A company of ProSiebenSat.1 Media SE

Key Components of our Data Service

Content Discovery: Find the most relevant content for our customers and their users.

Real-Time Monitoring: Enable our development teams to serve our content to our users in the best quality possible.

Analytics: Provide our teams access to the data to enable data-driven development of new features and products.


Micro-Service Architecture

[Figure: Data Platform - MicroService Layout. Stages: INGEST, STORE, PROCESS & ANALYSE, VISUALIZE & SERVE.]

Services shown in the figure:
• AdProxy Log Import Service
• Player Feedback Import Service
• VAS Log Import Service
• CDN Log Import Service
• External Data Import Service
• Content Import Service
• Metadata Service
• Data Management Service
• Data Quality Service
• Data Platform Monitoring Service
• Technical Monitoring Service
• Dev / Ops Analytics Service
• Data Science Analytics Service
• KPI & Analytics Service
• Content Discovery Service

Other elements shown: Data Layer, Data Lake, Data API, Content API, Data Platform Access, Data Science UI, Real-Time Dashboards, Portal, CDN files, data streams, Team, other modules.


Lambda Architecture

Batch Processing
• KPIs for MES
• MES Billing

Data Science
• CDN Bills
• Data Insights

Real-time
• Real-time player monitoring
• Internal dashboard


Lambda Architecture

Graphic provided by http://lambda-architecture.net

≠ AWS Lambda


AWS Lambda

[Figure: New object uploaded → Amazon S3 → Notification → AWS Lambda processes the object → Amazon DynamoDB]
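
The flow in this figure can be illustrated with a minimal sketch. It is not the glomex implementation: it assumes the function stores one item per uploaded object in DynamoDB (as the figure suggests), the item layout is invented for illustration, and the hypothetical `put_item` callable stands in for the real DynamoDB write so the sketch runs without AWS credentials. The event fields follow the S3 event notification message structure.

```python
from typing import Callable, Dict, List
from urllib.parse import unquote_plus


def extract_objects(event: Dict) -> List[Dict]:
    """Turn an S3 event notification into one item per uploaded object."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return items


def make_handler(put_item: Callable[[Dict], None]):
    """Build a Lambda-style handler.

    put_item is a hypothetical writer (assumption), for example a thin
    wrapper around a DynamoDB table's put_item call.
    """
    def handler(event, context=None):
        items = extract_objects(event)
        for item in items:
            put_item(item)
        return {"processed": len(items)}
    return handler
```

Injecting the writer keeps the parsing logic testable locally, which helps given that the talk later lists the lack of a local development environment as a limitation.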


AWS Lambda Execution


Be serverless for ETL


Be serverless and serve data

[Figure: Amazon S3, AWS Lambda, AWS Lambda, Amazon API Gateway]
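
As a rough illustration of serving data through Amazon API Gateway and AWS Lambda, the sketch below returns a response in the shape expected by API Gateway's Lambda proxy integration (`statusCode`, `headers`, `body`). Whether the original service used proxy integration is not stated on the slide; that choice, the `name` query parameter, and the in-memory `FAKE_STORE` (standing in for data read from Amazon S3) are all assumptions.

```python
import json
from typing import Dict, Optional

# Hypothetical stand-in for data that would be read from Amazon S3.
FAKE_STORE = {"kpi/daily": {"plays": 123}}


def _response(status: int, payload: Dict) -> Dict:
    """Build a response in the Lambda proxy integration format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # the body must be a string
    }


def handler(event: Dict, context: Optional[object] = None) -> Dict:
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name")
    if name not in FAKE_STORE:
        return _response(404, {"error": "not found"})
    return _response(200, FAKE_STORE[name])
```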


Be serverless for Recommendations

Recommendation Pipeline

[Figure: pipeline elements: Publisher’s URL, Recommender System, Search, Download, Page content, Content, Playlist]


Be serverless everywhere

• Read data from Kinesis Firehose / S3
• Server downtime / scheduler
• Load to ElasticSearch
• Clean ElasticSearch and Redshift
• Advanced Redshift monitoring
• EBS Snapshots


Agile DevOps

Continuous Delivery of Micro-services – "Automate all the things"


Agile Cloud Deployment

Glomex Cloud Deployment Tools

• Kumo: AWS CloudFormation
• Ramuda: AWS Lambda
• Yugen: AWS API Gateway
• Tenkai: AWS CodeDeploy

• In cooperation with Operations Team
• Used by other teams
• Simplify cloud deployments drastically
• Slack and monitoring integration


Agile DevOps

Cross-functional team responsible for pushing components to production themselves


AWS Lambda Limits

• 5 min: AWS Lambda timeout
• 512 MB: AWS Lambda temp disk

• How to process an 800 MB gzipped log file?
• How to split compressed gzip files?
• Splitter using Amazon SQS and Amazon EC2 Spot Instances
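
The slide names Amazon SQS and EC2 Spot Instances for the splitter but gives no further detail. As one possible illustration of the splitting step alone, the sketch below decompresses a gzip stream incrementally and cuts the output into parts that end on line boundaries, so each part could be handed to a separate worker. The function name, the chunked input, and the size budget are assumptions, not the glomex splitter.

```python
import zlib
from typing import Iterable, Iterator


def _cut_point(buffer: bytes, max_part_bytes: int) -> int:
    """Return the index just after the last newline within the budget.

    Falls back to the first newline if a single line exceeds the budget.
    Returns 0 when the buffer holds no complete line yet.
    """
    cut = buffer.rfind(b"\n", 0, max_part_bytes) + 1
    if cut == 0:
        cut = buffer.find(b"\n") + 1
    return cut


def split_gzip_lines(compressed_chunks: Iterable[bytes],
                     max_part_bytes: int) -> Iterator[bytes]:
    """Yield uncompressed parts, each ending on a line boundary."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    # Note: a single decompressobj handles one gzip member only.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    buffer = b""
    for chunk in compressed_chunks:
        buffer += decompressor.decompress(chunk)
        while len(buffer) >= max_part_bytes:
            cut = _cut_point(buffer, max_part_bytes)
            if cut == 0:
                break  # wait for more data to complete the line
            yield buffer[:cut]
            buffer = buffer[cut:]
    buffer += decompressor.flush()
    while len(buffer) >= max_part_bytes:
        cut = _cut_point(buffer, max_part_bytes)
        if cut == 0:
            break
        yield buffer[:cut]
        buffer = buffer[cut:]
    if buffer:
        yield buffer  # final, possibly short, part
```

Because only a bounded buffer is held in memory and nothing is written to local disk, this style of processing stays clear of the 512 MB temp disk limit; the 5 min timeout still applies to whoever runs the loop.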


More AWS Lambda limits

• Lambda function deployment package size (.zip/.jar file)
  • 50 MB
• Total size of all the deployment packages that can be uploaded per region
  • 75 GB
• CreateLogGroup
  • 500 log groups / account / region
• Lambda functions have to be wired
• Be aware of retries
  • 3, not configurable
• Traceback and error output available via CloudWatch Logs
  • https://github.com/jorgebastida/awslogs
• No local development environment
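
One practical consequence of the retry behaviour noted above is that a function may see the same event more than once, so handlers benefit from being idempotent. The sketch below shows the idea with a small wrapper. The names `idempotent`, `seen`, and `key_of` are hypothetical, and the in-memory set is an assumption for illustration only; a real deployment would need a durable store shared across invocations.

```python
from typing import Callable, Dict, MutableSet, Optional


def idempotent(handler: Callable[[Dict, Optional[object]], object],
               seen: MutableSet[str],
               key_of: Callable[[Dict], str]):
    """Wrap a handler so that a repeated event is processed only once."""
    def wrapper(event, context=None):
        key = key_of(event)
        if key in seen:
            return None  # duplicate delivery, nothing to do
        result = handler(event, context)
        # Mark the event only after success, so a failed attempt
        # can still be retried.
        seen.add(key)
        return result
    return wrapper
```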


More Facts

• 20 GB: per day click-stream data IN (player, vas, adproxy)
• 5 billion: click-stream records processed per day
• ~100 ms: data freshness to S3

• 25 GB: per day as zipped CDN log files
• 300 million: CDN records processed per day
• < 1 min: data freshness to API


More Facts

• 600 rec/sec: processing time
• 1 $ / hour: cost for 25 GB/day CDN processing
• 6: parallel AWS Lambda functions
• 2.3 min: average run-time of AWS Lambda
• ~ 89% - 97.8%: accuracy

[Charts: AWS Lambda duration, Redshift CPU]
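
To put the daily volumes from the "More Facts" slides into perspective, they can be converted into average per-second rates. The short calculation below does only that. The final line additionally assumes that the 600 rec/sec figure is the throughput of a single function and that it applies to the 6 parallel functions; the slides do not state this, so treat that comparison as a guess.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(records_per_day: float) -> float:
    """Average records per second for a given daily volume."""
    return records_per_day / SECONDS_PER_DAY


# Daily volumes as stated on the slides.
clickstream_rate = average_rate(5_000_000_000)  # click-stream records
cdn_rate = average_rate(300_000_000)            # CDN records

# Assumption (not from the slides): 600 rec/sec per function, 6 in parallel.
assumed_parallel_capacity = 6 * 600
```

The averages come out at roughly 57,870 click-stream records and 3,472 CDN records per second. Under the stated assumption, the combined capacity of 3,600 records per second would sit just above the average CDN rate.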


Key Takeaways

Lambda Architecture

• Enrich your traditional, batch-driven BI workflow with real-time analytics
• Use Lambda Architecture as a guiding principle and adapt it to your needs


Key Takeaways

1. Analyze
2. Take Actions
3. Automate


Key Takeaways

• AWS managed services provide a robust way to run complex big data infrastructures
• Follow best practices provided by AWS and the community
• Focus on feature development and robust pipelines, not on infrastructure management


Munich, 17.10.16
Markus Schmidberger, Head of Data Service
@cloudHPC, [email protected]

We are hiring…

• Data Engineers
• Product Owner
• Big Data Product Line Manager