Transcript

AWS LAMBDA FROM THE TRENCHES

what you should know before you go to production

hi, my name is Yan Cui

@theburningmonk

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

security

complexity OUTSIDE the code

deployment

load balancing

caching

monitoring

config management

https://www.infoq.com/presentations/complexity-simplicity-esb

centralised logging

elastic scaling

setup server

THERE IS NO SERVER

automatic scaling

minimise undifferentiated heavy-lifting

simple, fast deployment

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

cost saving

not paying for idle servers

energy efficiency in DCs

easy to get started

fuelling the Yubl platform evolution

completely rebuilt search

Legacy Monolith Amazon Kinesis Amazon Lambda

Amazon CloudSearch Amazon API Gateway Amazon Lambda

analytics pipeline

Legacy Monolith Amazon Kinesis Amazon Lambda

Google BigQuery

1 developer, 2 days

design → production

(his 1st serverless project)

“nothing ever got done this fast at Skype!”

- Chris Twamley

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

Facebook login

Amazon Lambda GrapheneDB Amazon API Gateway

Amazon API Gateway Amazon Lambda Facebook Graph API

and many more…

GET PRODUCTION-READY

USE A DEPLOYMENT FRAMEWORK

http://serverless.com

http://apex.run

https://github.com/claudiajs/claudia

TESTING

Amazon Lambda

Amazon Kinesis Amazon IOT Amazon IOT

“I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages.”

- Alan Kay

“OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.”

- Alan Kay

amzn.to/29Lxuzu

Level of Testing

1. Unit: do our objects do the right thing? are they easy to work with?

Level of Testing

1. Unit

2. Integration: does our code work against code we can’t change?

handler

test by invoking the handler
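
Testing by invoking the handler could look like the sketch below, assuming a Python function. The `handler`, its event shape and `FakeContext` are hypothetical stand-ins for illustration, not code from the talk.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler: greets the caller named in the event."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}


class FakeContext:
    """Minimal stand-in for the context object Lambda passes to a handler."""

    function_name = "hello"
    aws_request_id = "test-request-id"


def test_handler_greets_caller():
    # Call the handler directly, the same way the Lambda runtime would.
    response = handler({"name": "Yan"}, FakeContext())
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"message": "hello, Yan"}
```

The same test can serve as an integration test if the handler is left to talk to the real downstream services instead of mocks.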

Level of Testing

1. Unit

2. Integration

3. Acceptance: does the whole system work?

Level of Testing

unit

integration

acceptance

can do all 3 with Lambda

“…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.

The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…”

Don’t Mock Types You Can’t Change

“…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…”

Don’t Mock Types You Can’t Change

Don’t Mock Services You Can’t Change

“…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code.

An end-to-end test interacts with the system only from the outside: through its interface…”

Testing End-to-End

Legacy Monolith Amazon Kinesis Amazon Lambda

Amazon CloudSearch Amazon API Gateway Amazon Lambda

Test Input

Validate

“…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed…

This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…”

Testing End-to-End

Jenkins build config deploys and tests

unit + integration tests

deploy

acceptance tests

build.sh allows repeatable builds on both local & CI

TEAM WORK

shared environments

GOALS

easily propagate environmental changes

PRO TIP: don’t ignore _meta

centralised config service

config service goes here

APP SECRETS

GOALS

sensitive data are encrypted at rest (credentials, connection strings, etc.)

has to work on CI

role-based access

hand-rolled with KMS (encrypted at rest)
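
A hand-rolled KMS approach might look roughly like this. It is a sketch under assumptions: the secret is stored as a base64-encoded KMS ciphertext, and `kms_client` behaves like `boto3.client("kms")`. The function name `decrypt_secret` is hypothetical.

```python
import base64


def decrypt_secret(ciphertext_b64, kms_client):
    """Decrypt a base64-encoded KMS ciphertext and return it as text.

    kms_client is assumed to behave like boto3.client("kms"): its
    decrypt(CiphertextBlob=...) call returns a dict with a "Plaintext" entry.
    """
    ciphertext = base64.b64decode(ciphertext_b64)
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")
```

Only the ciphertext lives in config or source control, which covers encryption at rest. Access is then controlled by which IAM roles are allowed to call decrypt on the key, so the same code works on CI as long as the CI role has that permission.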

plug-ins

serverless-plugin-kmsvariables

serverless-secrets

serverless-meta-sync

centralised config service

DOCUMENTATION

set goals

choose a way

evaluate

document

create project templates/scaffolds

share

LOGGING

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

UTC Timestamp, API Gateway Request Id, your log message
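
The three parts of a log line can be pulled apart with a small parser. This is a sketch that assumes every line follows the layout shown above; `parse_log_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# timestamp, 36-character request id, then the free-form message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<request_id>[0-9a-f-]{36})\s+(?P<message>.*)$",
    re.DOTALL,
)


def parse_log_line(line):
    """Split a log line into timestamp, request id and message.

    Returns None when the line does not follow the expected layout.
    """
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    )
    return fields
```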

organised by Function + Version

LOG OVERLOAD

centralise your logs

CloudWatch Logs AWS Lambda

Logstash Elasticsearch

AWS Elasticsearch

Elastic Cloud

?
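
The Lambda function in the middle of that pipeline could be sketched as follows. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzip-compressed JSON under `awslogs.data`. The `ship` parameter is a hypothetical callable standing in for whichever log destination is chosen.

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Unpack the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    if payload.get("messageType") == "CONTROL_MESSAGE":
        # Control messages carry no application logs.
        return []
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def handler(event, context, ship=print):
    """Forward every log event to `ship` and return how many were sent."""
    documents = decode_log_events(event)
    for document in documents:
        ship(document)
    return len(documents)
```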

correlation IDs

MONITORING

PRO TIP: set up dashboards

PRO TIP: don’t forget to set up alarms

PRO TIP: add application-level metrics
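
One way to publish application-level metrics, sketched under assumptions: `cloudwatch` behaves like `boto3.client("cloudwatch")`, and the `MyApp` namespace, metric names and helper names are made up for illustration.

```python
import time


def record_metric(cloudwatch, name, value, unit="Count", namespace="MyApp"):
    """Publish a single data point to a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": name, "Value": value, "Unit": unit}],
    )


def timed(cloudwatch, name, func, *args, **kwargs):
    """Run func and publish how long it took, in milliseconds."""
    started = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000
        record_metric(cloudwatch, name, elapsed_ms, unit="Milliseconds")
```

Publishing synchronously adds latency to every invocation, so whether to do it inline is a trade-off to weigh per function.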

ERROR HANDLING

“how do I return HTTP error codes?”

{ "status": 404, "errorMessage": "oops" }

s-templates.json

PRO TIP: map timeouts to 504

every Lambda function has a timeout setting

use an error regex to map it to an HTTP 504

s-templates.json
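
The idea behind the error regex can be simulated in plain Python. This sketch is not the s-templates.json configuration itself: the patterns, the helper names and the assumption that a pattern has to match the whole error message are illustrative.

```python
import json
import re

# Hypothetical mapping from error regex to HTTP status, checked in order.
ERROR_PATTERNS = [
    (re.compile(r'.*"status"\s*:\s*404.*', re.DOTALL), 404),
    (re.compile(r".*Task timed out.*", re.DOTALL), 504),
]


def not_found(message):
    """Build an error whose message carries the status as JSON."""
    return Exception(json.dumps({"status": 404, "errorMessage": message}))


def status_for(error_message, default=500):
    """Return the HTTP status of the first pattern matching the whole message."""
    for pattern, status in ERROR_PATTERNS:
        if pattern.fullmatch(error_message):
            return status
    return default
```

The handler fails with `not_found("oops")`, the JSON ends up in the error message, and the first regex maps it to a 404. A timed-out function never gets to build its own error, so the second pattern matches the timeout message instead.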

PRO TIP: avoid using the 128 MB setting for production

continuous timeout loop…

PRO TIP: proactively time out your function
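
Proactively timing out could be sketched like this for a Python function, using the `get_remaining_time_in_millis()` method on the Lambda context object. The event shape, the 500 ms safety margin and `process` are assumptions for illustration.

```python
def process(record):
    """Hypothetical unit of work."""
    return record.upper()


def handler(event, context, safety_margin_ms=500):
    """Process records until time runs low, then return cleanly."""
    processed = []
    for record in event["records"]:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            # Stop while there is still time to send a meaningful response.
            return {"status": "timed_out", "processed": processed}
        processed.append(process(record))
    return {"status": "ok", "processed": processed}
```

Returning before the hard limit lets the function decide what the caller sees, instead of the caller getting a generic timeout error.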

“what’s the retry strategy with Kinesis and SNS?”

“…If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record…”

http://aws.amazon.com/lambda/faqs

effort (low to high):

• do nothing: retry forever

• swallow errors: no retry

• track retry count: retry N times

PRO TIP: use local state to track no. of retries; move on after N retries

PRO TIP: record CloudWatch metrics for error count; alarm if necessary

retried 3-5 times

KEEP WARM

functions are unloaded if idle for a while

noticeable cold start time (package size matters)

CloudWatch Event AWS Lambda

ping
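
On the function side, handling the ping could be as small as the sketch below. It assumes the ping arrives as a CloudWatch scheduled event, which carries `"source": "aws.events"` and `"detail-type": "Scheduled Event"`. `do_real_work` is a hypothetical placeholder.

```python
def is_keep_warm_ping(event):
    """True when the event looks like a CloudWatch scheduled event."""
    return (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def do_real_work(event):
    """Hypothetical placeholder for the function's actual job."""
    return f"handled {event.get('id', 'unknown')}"


def handler(event, context=None):
    if is_keep_warm_ping(event):
        # Return straight away; the invocation itself keeps the function loaded.
        return {"warmed": True}
    return {"warmed": False, "result": do_real_work(event)}
```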

HEALTH CHECKS?

even then…

functions are recycled every few hours

PRO TIP: don’t make hard assumptions about function lifetime

KNOW YOUR LIMITS

max 50 MB deployment package size

max 75 GB total deployment package size*

* limit is per AWS region

Janitor Monkey

Janitor Lambda

max 5 mins execution time

max 6 MB request payload size*

max 6 MB response payload size

* for a request-response event type

default max 100 concurrent executions*

* soft-limit, can be raised via support ticket

looking ahead

.NET Core?

SQS support?

v1.0 (coming soon)

MULTI-CLOUD FUTURE?

IBM OpenWhisk

Amazon Lambda Azure Web Functions

Google Cloud Functions

competition

faster innovation, lower prices

@theburningmonk

theburningmonk.com

github.com/theburningmonk
