AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Vyom Nagrani, Sr. Product Manager, AWS Lambda May 21, 2015 Streaming Data Processing with Amazon Kinesis and AWS Lambda


Page 1: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Vyom Nagrani, Sr. Product Manager, AWS Lambda

May 21, 2015

Streaming Data Processing with Amazon Kinesis and AWS Lambda

Page 2: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Amazon Kinesis: A managed service for streaming data ingestion and processing

[Diagram: Amazon Web Services, three Availability Zones (AZ)]

• Millions of sources producing 100s of terabytes per hour

• Front end: authentication, authorization

• Durable, highly consistent storage replicates data across three data centers (Availability Zones)

• Ordered stream of events supports multiple readers

• Aggregate and archive to S3

• Real-time dashboards and alarms

• Machine learning algorithms or sliding window analytics

• Aggregate analysis in Hadoop or a data warehouse

• Inexpensive: $0.028 per million puts

Page 3: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Benefits of Amazon Kinesis for stream data ingestion and continuous processing

Real-time Ingest

• Highly Scalable

• Durable

• Elastic

• Replay-able Reads

Continuous Processing FX

• Elastic

• Load-balancing incoming streams

• Fault-tolerance, Checkpoint / Replay

• Enable multiple processing apps in parallel

• Enable data movement into Stores / Processing Engines

Managed Service

Low end-to-end latency

Page 4: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

AWS Lambda: A compute service that runs your code in response to events

Lambda functions: Stateless, event-driven code execution

Triggered by events:

• Put to an Amazon S3 bucket

• Record in an Amazon Kinesis stream

• Direct sync and async invocations

Makes it easy to

• Build back-end services that perform at scale

• Perform data-driven auditing, analysis, and notification

Page 5: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Benefits of AWS Lambda for building a serverless data processing engine

“Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”

High performance at any scale; Cost-effective and efficient

• Pay only for what you use: Lambda automatically matches capacity to your request rate. Purchase compute in 100ms increments.

No Infrastructure to manage

• Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.

Bring Your Own Code

• Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
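The 100ms purchase increment mentioned on this slide can be illustrated with a small calculation. This is a minimal sketch based only on the increment stated in the slide; the function name and the sample durations are hypothetical, and it does not model prices or the free tier.

```python
import math

# From the slide: compute is purchased in 100 ms increments.
BILLING_INCREMENT_MS = 100


def billed_duration_ms(actual_ms):
    """Round an execution time up to the next 100 ms increment."""
    if actual_ms <= 0:
        raise ValueError("actual_ms must be positive")
    return math.ceil(actual_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS


# Hypothetical sample durations, in milliseconds.
for sample in (12.5, 100, 101, 250.1):
    print(sample, "ms runs are billed as", billed_duration_ms(sample), "ms")
```

Under this rule a function that finishes in 101 ms is billed the same as one that takes 200 ms, so shaving a few milliseconds only pays off when it crosses an increment boundary.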

Page 6: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

What you can do with Kinesis+Lambda

Kinesis: Capture the stream (applies to every row)

Lambda: Process the stream (applies to every row)

Data Input               Action         Data Output
IT application activity  Audit          SNS
Metering records         Condense       Redshift
Change logs              Backup         S3
Financial data           Store          RDS
Transaction orders       Process        SQS
Server health metrics    Monitor        EC2
User clickstream         Analyze        EMR
IoT device data          Respond        Backend endpoint
Custom data              Custom action  Custom application

Page 7: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Today’s demo: Workflow of a simple real-time data analytics setup

[Diagram: Amazon Kinesis, AWS Lambda, Amazon SNS, Amazon CloudWatch]

Page 8: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Create different Lambda functions for each task, associate to same Kinesis stream

• Log to CloudWatch Logs

• Push to SNS
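The two tasks on this slide could look roughly like the following pair of handlers, each deployed as its own Lambda function and attached to the same stream. This is a sketch, not the code used in the demo: the JSON payload shape (`{"value": ...}`), the threshold, and the topic ARN are hypothetical. The record layout (`Records[].kinesis.data`, base64-encoded) is the event format Lambda delivers for Kinesis, and `publish` is the standard SNS client call.

```python
import base64
import json


def decode_records(event):
    """Yield (partition_key, payload_bytes) for each Kinesis record in a Lambda event."""
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Lambda delivers the record payload base64-encoded.
        yield kinesis["partitionKey"], base64.b64decode(kinesis["data"])


def log_handler(event, context):
    """Task 1: print each record; Lambda forwards stdout to CloudWatch Logs."""
    count = 0
    for key, payload in decode_records(event):
        print(f"partitionKey={key} payload={payload.decode('utf-8', 'replace')}")
        count += 1
    return count


def make_alert_handler(sns_client, topic_arn, threshold=90):
    """Task 2: build a handler that publishes to SNS when a reading is too high.

    The payload shape, threshold, and topic ARN are hypothetical.
    """
    def alert_handler(event, context):
        published = 0
        for key, payload in decode_records(event):
            reading = json.loads(payload)
            if reading.get("value", 0) > threshold:
                sns_client.publish(
                    TopicArn=topic_arn,
                    Subject="Kinesis alert",
                    Message=json.dumps({"partitionKey": key, "reading": reading}),
                )
                published += 1
        return published

    return alert_handler
```

In a deployed function the client would typically be created once at module level, for example `boto3.client("sns")`, and passed to `make_alert_handler`; injecting it here keeps the sketch testable without AWS credentials.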

Page 9: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Demo: Real-time processing of Amazon Kinesis data streams with AWS Lambda

Page 10: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Things to remember when creating a Kinesis stream

• Streams are made of Shards

• Each Shard ingests data up to 1 MB/sec

• Each Shard emits data up to 2 MB/sec

• All data is stored for 24 hours; you can replay data within that 24-hour window

• A Partition Key is supplied by the producer and used to distribute the PUTs across Shards

• A unique Sequence # is returned to the producer upon a successful PUT call
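The per-shard limits and the role of the partition key can be sketched in a few lines. The throughput constants come from the slide. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it; the equal-sized ranges below are a simplifying assumption, and both function names are hypothetical.

```python
import hashlib
import math

INGEST_MB_PER_SHARD = 1.0  # from the slide: each shard ingests up to 1 MB/sec
EMIT_MB_PER_SHARD = 2.0    # from the slide: each shard emits up to 2 MB/sec


def shards_needed(ingest_mb_per_sec, read_mb_per_sec):
    """Smallest shard count that covers both the write and the read throughput."""
    for_writes = math.ceil(ingest_mb_per_sec / INGEST_MB_PER_SHARD)
    for_reads = math.ceil(read_mb_per_sec / EMIT_MB_PER_SHARD)
    return max(1, for_writes, for_reads)


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index.

    Assumes equal-sized, contiguous hash key ranges; real streams can have
    uneven ranges after shard splits and merges.
    """
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // shard_count
    return min(hash_value // range_size, shard_count - 1)


# Hypothetical workload: 3.5 MB/sec written, 10 MB/sec read by all consumers.
print("shards needed:", shards_needed(3.5, 10))
for key in ("user-1", "user-2", "user-3"):
    print(key, "-> shard", shard_for_key(key, 5))
```

Because the mapping depends only on the key, all records with the same partition key land on the same shard, which is what keeps them in order relative to each other.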

Page 11: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Attaching a Lambda function to a Kinesis stream

• Shards: One Lambda function invocation runs concurrently per Kinesis shard

  • Increasing the number of shards causes more Lambda function invocations to run concurrently

  • Each individual shard follows ordered processing

[Diagram: Source, Kinesis (shards), Lambda (pollers, functions), Destination 1, Destination 2]

Scale Kinesis by adding shards; Lambda will scale automatically

Page 12: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Performance tuning Kinesis as an event source

• Batch size: Maximum number of records that AWS Lambda retrieves from Kinesis for each invocation of your function

  • Increasing batch size causes fewer Lambda function invocations, with more data processed per invocation

• Starting Position: The position in the stream where Lambda starts reading

  • Set to “Trim Horizon” to start from the oldest record still available in the stream

  • Set to “Latest” to start from the most recent data and process only records added after that point

  • In both cases, records within a shard are processed in order
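Batch size and starting position are both set when the stream is attached to the function. The sketch below builds the arguments for the `create_event_source_mapping` call of the boto3 Lambda client; the helper names, the stream ARN, and the function name are hypothetical, and the validation covers only the two starting positions named on this slide.

```python
def event_source_mapping_params(stream_arn, function_name,
                                batch_size=100, starting_position="TRIM_HORIZON"):
    """Build the arguments for attaching a Kinesis stream to a Lambda function."""
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError("starting_position must be TRIM_HORIZON or LATEST")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,                # records retrieved per invocation
        "StartingPosition": starting_position,  # where Lambda starts reading
        "Enabled": True,
    }


def attach_stream(lambda_client, **params):
    """Create the mapping; lambda_client would be boto3.client("lambda")."""
    return lambda_client.create_event_source_mapping(**params)


# Hypothetical ARN and function name, for illustration only.
params = event_source_mapping_params(
    "arn:aws:kinesis:us-east-1:123456789012:stream/demo-stream",
    "demo-function",
    batch_size=50,
    starting_position="LATEST",
)
print(params)
```

A larger `BatchSize` trades latency for fewer invocations; a reasonable approach is to start small and raise it while watching function duration against the configured timeout.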

Page 13: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Best practices for creating Lambda functions

• Memory: CPU and disk are allocated in proportion to the memory configured

  • Increasing memory makes your code execute faster (if CPU bound)

  • Increasing memory allows larger records to be processed

• Timeout: Increasing the timeout allows for longer-running functions, but means a longer wait in case of errors

• Retries: For Kinesis, Lambda retries a failed batch without limit (until the data expires)

• Permission model: Lambda pulls data from Kinesis, so no invocation role is needed, only an execution role
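Because Lambda pulls from the stream, the execution role is what needs permission to read it. A minimal sketch of such a permissions policy follows, expressed as a Python dictionary and printed as JSON. The action names are standard Kinesis and CloudWatch Logs IAM actions; the `"*"` resources are a simplification to keep the example short and should be narrowed to a specific stream and log group in practice.

```python
import json


def kinesis_execution_policy():
    """Permissions an execution role needs to read a stream and write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:GetRecords",
                    "kinesis:GetShardIterator",
                    "kinesis:ListStreams",
                ],
                # "*" keeps the sketch short; narrow it to your stream ARN.
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(kinesis_execution_policy(), indent=2))
```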

Page 14: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Monitoring and Debugging Lambda functions

• Monitoring: available in Amazon CloudWatch Metrics

  • Invocation count

  • Duration

  • Error count

  • Throttle count

• Debugging: available in Amazon CloudWatch Logs

  • All Metrics

  • Custom logs

  • RAM consumed

  • Search for log events
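The four monitoring metrics listed above can also be read programmatically. The sketch below builds requests for the CloudWatch `get_metric_statistics` call using the `AWS/Lambda` namespace and the metric names `Invocations`, `Duration`, `Errors`, and `Throttles`. The helper names, the choice of statistic per metric, the 5-minute period, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Metric name -> statistic to request (the statistic choice is an assumption).
LAMBDA_METRICS = {
    "Invocations": "Sum",
    "Duration": "Average",
    "Errors": "Sum",
    "Throttles": "Sum",
}


def metric_request(function_name, metric_name, minutes=60, now=None):
    """Build get_metric_statistics arguments for one Lambda metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint
        "Statistics": [LAMBDA_METRICS[metric_name]],
    }


def fetch_metrics(cloudwatch_client, function_name):
    """Return the datapoints for each metric, keyed by metric name.

    cloudwatch_client would be boto3.client("cloudwatch").
    """
    results = {}
    for name in LAMBDA_METRICS:
        response = cloudwatch_client.get_metric_statistics(
            **metric_request(function_name, name)
        )
        results[name] = response.get("Datapoints", [])
    return results
```

A steadily rising `Errors` sum on a Kinesis-triggered function usually deserves attention first, since failed batches are retried until the data expires and hold up the rest of the shard.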

Page 15: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Customers running real-time data stream processing on Kinesis+Lambda

[Diagram: Kinesis Stream, AWS Lambda, aggregate statistics, real-time analytics]

“I want to apply custom logic to process data being uploaded through my Kinesis stream.”

• Client activity tracking

• Metrics generation

• Data cleansing

• Log filtering

• Indexing and searching

• Log routing

• Live alarms and notifications

Page 16: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Three Next Steps

1. Create your first Kinesis stream. You can configure hundreds of thousands of data producers to continuously put data into an Amazon Kinesis stream. Examples include data from website clickstreams, application logs, and social media feeds.

2. Create and test your first Lambda function. With AWS Lambda, there are no new languages, tools, or frameworks to learn. You can use any third party library, even native ones. And the first 1M requests each month are on us!

3. Use AWS Lambda to process Amazon Kinesis streams: there is no infrastructure to manage, and you can set up real-time analytics in minutes!
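For the first step, a producer that puts data into a stream can be sketched as follows. It uses the `put_record` call of the boto3 Kinesis client, which returns the shard ID and sequence number of the stored record. The helper name, the stream name, and the clickstream event shape are hypothetical.

```python
import json


def put_event(kinesis_client, stream_name, event, partition_key):
    """Send one JSON event to a stream and return (shard_id, sequence_number).

    kinesis_client would be boto3.client("kinesis").
    """
    response = kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return response["ShardId"], response["SequenceNumber"]


def example_click(user_id, page):
    """Hypothetical clickstream event used only for illustration."""
    return {"user": user_id, "page": page}
```

Using the user ID as the partition key, for example `put_event(client, "demo-stream", example_click("u1", "/home"), "u1")`, keeps each user's clicks on one shard and therefore in order for the Lambda function that consumes them.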

Page 17: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

AWS Summit – Chicago: An exciting, free cloud conference designed to educate and inform new customers about the AWS platform, best practices and new cloud services.

Details

• July 1, 2015

• Chicago, Illinois

• @ McCormick Place

Featuring

• New product launches

• 36+ sessions, labs, and bootcamps

• Executive and partner networking

Registration is now open

• Come and see what AWS and the cloud can do for you.

Page 18: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

- If you are interested in learning more about how to navigate the cloud to grow your business, then attend the AWS Summit Chicago, July 1st.

- Register today to learn from technical sessions led by AWS engineers, hear best practices from AWS customers and partners, and participate in some of the 30+ paid sessions and labs.

- Simply go to https://aws.amazon.com/summits/chicago/?trkcampaign=summit_chicago_bootcamps&trk=Webinar_slide to register today.

- Registration is FREE.

Page 19: AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda

Thank you!

Visit http://aws.amazon.com/kinesis, the AWS Big Data blog, and the Kinesis forum to learn more and get started using Kinesis.

Visit http://aws.amazon.com/lambda, the AWS Compute blog, and the Lambda forum to learn more and get started using Lambda.