AWS Lambda from the trenches (Serverless London)

Posted on 23-Jan-2018

TRANSCRIPT

<ul><li><p>AWS LAMBDA from the TRENCHES</p><p>what you should know before you go to production</p></li><li><p>hi, I'm Yan Cui</p><p>AWS user since 2009</p></li><li><p>apr, 2016</p></li><li><p>hidden complexities and dependencies</p><p>low utilisation to leave room for traffic spikes</p><p>EC2 scaling is slow, so scale earlier</p><p>lots of cost for unused resources</p><p>up to 30 mins for deployment</p><p>deployment required downtime</p></li><li><p>lead time to someone saying thank you is the only reputation metric that matters.</p><p>- Dan North</p></li><li><p>what would good look like for us?</p></li><li><p>DEPLOYMENTS SHOULD...</p><p>be small, be fast</p><p>have zero downtime, have no lock-step</p></li><li><p>FEATURES SHOULD...</p><p>be deployable independently</p><p>be loosely-coupled</p></li><li><p>WE WANT TO...</p><p>minimise cost for unused resources</p><p>minimise ops effort, reduce tech mess</p><p>deliver visible improvements faster</p></li><li><p>nov, 2016</p></li><li><p>170 Lambda functions in prod</p><p>1.2 GB deployment packages in prod</p><p>95% cost saving vs EC2</p><p>15x no. of prod releases per month</p></li><li><p>[chart: "is a good fit" over time, marking the 1st function in prod! and a "?"]</p></li><li><p>Principles: what is good?</p><p>Practices: how to make it good?</p><p>Tools: with what?</p></li><li><p>Principles outlast Tools</p></li><li><p>ALERTING</p><p>CI / CD</p><p>TESTING</p><p>LOGGING</p><p>MONITORING</p></li><li><p>170 functions</p><p>WOOF!</p><p>? 
?</p><p>[chart: "is a good fit" over time, since the 1st function in prod!]</p></li><li><p>SECURITY</p><p>DISTRIBUTED TRACING</p><p>CONFIG MANAGEMENT</p></li><li><p>evolving the PLATFORM</p></li><li><p>rebuilt search</p></li><li><p>Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch</p><p>Amazon API Gateway, AWS Lambda</p></li><li><p>new analytics pipeline</p></li><li><p>Legacy Monolith, Amazon Kinesis, AWS Lambda, Google BigQuery</p><p>1 developer, 2 days, design to production</p><p>(his 1st serverless project)</p></li><li><p>nothing ever got done this fast at Skype!</p><p>- Chris Twamley</p></li><li><p>lead time to someone saying thank you is the only reputation metric that matters.</p><p>- Dan North</p></li><li><p>Rebuilt with Lambda</p></li><li><p>grapheneDB, BigQuery</p></li><li><p>getting PRODUCTION READY</p></li><li><p>CHOOSE A DEPLOYMENT FRAMEWORK</p></li><li><p>http://serverless.com</p></li><li><p>https://github.com/awslabs/serverless-application-model</p></li><li><p>http://apex.run</p></li><li><p>https://apex.github.io/up</p></li><li><p>https://github.com/claudiajs/claudia</p></li><li><p>https://github.com/Miserlou/Zappa</p></li><li><p>http://gosparta.io/</p></li><li><p>TESTING</p></li><li><p>http://amzn.to/29Lxuzu</p></li><li><p>Level of Testing</p><p>1. Unit: do our objects do 
the right thing? are they easy to work with?</p></li><li><p>Level of Testing</p><p>2. Integration: does our code work against code we can't change?</p></li><li><p>handler</p><p>test by invoking the handler</p></li><li><p>Level of Testing</p><p>3. Acceptance: does the whole system work?</p></li><li><p>Level of Testing: unit, integration, acceptance</p><p>[chart: feedback vs. confidence at each level]</p></li><li><p>We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.</p><p>The mess in such tests is telling us that the design isn't right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test.</p><p>Don't Mock Types You Can't Change</p></li><li><p>The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do.</p><p>Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries.</p><p>Don't Mock Types You Can't Change</p></li><li><p>Don't Mock <s>Types</s> Services You Can't Change</p></li><li><p>The serverless approach to testing is different and may actually be easier.</p><p>- Paul Johnston</p><p>http://bit.ly/2t5viwK</p></li><li><p>Lambda, API Gateway, DynamoDB</p><p>Unit Tests: Mock/Stub</p></li><li><p>what unit tests will not tell you:</p><p>is our request correct?</p><p>is the request mapping set up correctly?</p><p>are the API resources configured correctly?</p><p>are we assuming the correct schema?</p><p>is Lambda proxy configured correctly?</p><p>is the IAM policy set up correctly?</p><p>is the table created?</p></li><li><p>most Lambda functions are simple and have a single purpose, so the risk of shipping 
broken software has largely shifted to how they integrate with external services.</p><p>observation</p></li><li><p>But it slows down my feedback loop</p><p>IT'S NOT ABOUT YOU!</p></li><li><p>IT'S CHINA. NOT SCHINA.</p></li><li><p>Your users shouldn't be the ones to pay the price for your faster feedback loop. Optimise for working software. Test your software end-to-end.</p><p>- me</p></li><li><p>Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code.</p><p>An end-to-end test interacts with the system only from the outside: through its interface.</p><p>Testing End-to-End</p></li><li><p>Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch</p><p>Amazon API Gateway, AWS Lambda</p><p>Test Input</p><p>Validate</p></li><li><p>integration tests exercise the system's integration with its external dependencies</p></li><li><p>acceptance tests exercise the system end-to-end from the outside</p></li><li><p>integration tests differ from acceptance tests only in HOW the Lambda functions are invoked</p><p>observation</p></li><li><p>CI + CD PIPELINE</p></li><li><p>the earlier you consider CI + CD, the more time you save in the long run</p><p>- me</p></li><li><p>We prefer to have the end-to-end tests exercise both the system and the process by which it's built and deployed.</p><p>This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software's lifetime.</p><p>Testing End-to-End</p></li><li><p>deployment scripts that only live on the CI box are a disaster waiting to happen</p><p>- me</p></li><li><p>Jenkins build config deploys and tests:</p><p>unit + integration tests</p><p>deploy</p><p>acceptance 
tests</p></li><li><pre><code>#!/bin/bash
# stop on the first failing command (e.g. a failed npm install)
set -e

usage() {
  echo "usage: ./build.sh deploy|int-test|acceptance-test STAGE REGION AWS_PROFILE"
}

if [ "$1" = "deploy" ] &amp;&amp; [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  # deploy the service to the given stage and region
  npm install
  AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s "$STAGE" -r "$REGION"
elif [ "$1" = "int-test" ] &amp;&amp; [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  # run the integration tests for the given stage
  npm install
  AWS_PROFILE=$PROFILE npm run "int-$STAGE"
elif [ "$1" = "acceptance-test" ] &amp;&amp; [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  # run the acceptance tests against the deployed stage
  npm install
  AWS_PROFILE=$PROFILE npm run "acceptance-$STAGE"
else
  usage
  exit 1
fi</code></pre></li><li><p>build.sh allows repeatable builds on both local &amp; CI</p></li><li><p>Auto, Auto, Manual</p></li><li><p>LOGGING</p></li><li><p>2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?</p><p>UTC timestamp, Lambda request ID, your log message</p></li><li><p>function name</p><p>date</p><p>function version</p></li><li><p>LOG OVERLOAD</p></li><li><p>CENTRALISE LOGS</p><p>MAKE THEM EASILY SEARCHABLE</p></li><li><p>Elasticsearch + Logstash + Kibana: the ELK stack</p></li><li><p>CloudWatch Logs, AWS Lambda, ELK stack</p></li><li><p>CloudWatch Events</p></li><li><p>http://bit.ly/2f3zxQG</p></li><li><p>DISTRIBUTED TRACING</p></li><li><p>my followers didn't receive my new post!</p><p>- a user</p></li><li><p>where could the problem be?</p></li><li><p>correlation IDs*</p><p>* e.g. 
request-id, user-id, yubl-id, etc.</p></li><li><p>ROLL YOUR OWN CLIENTS</p></li><li><p>kinesis client</p><p>http client</p><p>sns client</p></li><li><p>http://bit.ly/2k93hAj</p></li><li><p>ROLL YOUR OWN CLIENTS</p><p>X-RAY</p></li><li><p>AWS X-Ray</p></li><li><p>traces do not span over API Gateway</p></li><li><p>http://bit.ly/2s9yxmA</p></li><li><p>MONITORING + ALERTING</p></li><li><p>where do I install monitoring agents?</p></li><li><p>you can't</p></li><li><p>CloudWatch:</p><p>invocation count</p><p>error count</p><p>latency</p><p>throttling</p><p>granular to the minute</p><p>supports custom metrics</p></li><li><p>Datadog:</p><p>same metrics as CW</p><p>better dashboard</p><p>supports custom metrics</p><p>https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/</p></li><li><p>how do I batch up and send logs in the background?</p></li><li><p>you can't (kinda)</p></li><li><pre><code>console.log("hydrating yubls from db");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");</code></pre><p>the first two lines are logs; the last two are metrics, in the format MONITORING|timestamp|metric value|metric type|metric name</p></li><li><p>CloudWatch Logs, AWS Lambda: logs go to the ELK stack, metrics go to CloudWatch</p></li><li><p>http://bit.ly/2gGredx</p></li><li><p>DASHBOARDS</p><p>SET ALARMS</p><p>TRACK APP-LEVEL METRICS</p></li><li><p>Not Only CloudWatch</p></li><li><p>you really don't want your monitoring system to fail at the same time as the system it monitors</p><p>- me</p></li><li><p>CONFIG MANAGEMENT</p></li><li><p>easily and quickly propagate config changes</p></li><li><p>CENTRALISED CONFIG SERVICE</p></li><li><p>config service goes here</p></li><li><p>SSM Parameter Store</p></li><li><p>sensitive 
data should be encrypted in-flight and at rest</p><p>(credentials, connection strings, etc.)</p></li><li><p>role-based access</p></li><li><p>SSM Parameter Store</p><p>HTTPS: encrypted in-flight</p><p>role-based access</p><p>encrypt: encrypted at-rest</p></li><li><p>CENTRALISED CONFIG SERVICE</p><p>CLIENT LIBRARY</p></li><li><p>fetch &amp; cache at cold start</p></li><li><p>invalidate at interval + signal</p></li><li><p>http://bit.ly/2yLUjwd</p></li><li><p>PRO TIPS</p></li><li><p>max 75 GB total deployment package size*</p><p>* limit is per AWS region</p></li><li><p>Janitor Monkey</p></li><li><p>Janitor Lambda</p><p>http://bit.ly/2xzVu4a</p></li><li><p>disable versionFunctions in</p></li><li><p>install the Serverless framework as a dev dependency at project level</p><p>dev dependencies are excluded from the deployment package since 1.16.0</p></li><li><p>http://bit.ly/2vzBqhC</p></li><li><p>http://amzn.to/2vtUkDU</p></li><li><p>UNDERSTAND COLD STARTS</p></li><li><p>AWS X-Ray: 1st invocation (cold start) vs. 2nd invocation</p></li><li><p>source: http://bit.ly/2oBEbw2</p></li><li><p>EMBRACE NODE.JS &amp; PYTHON</p></li><li><p>[charts: C#, Java, NodeJs, Python]</p><p>http://bit.ly/2rtCCBz</p></li><li><p>what about type safety?</p></li><li><p>complexity ceiling of a Node.js app</p><p>[chart axis: complexity]</p><p>referential transparency, immutability as 
default</p><p>type inference, option types, union types</p></li><li><p>for managing complexity: referential transparency, immutability as default, type inference, option types, union types</p><p>[chart axis: complexity; complexity ceiling of a Node.js app]</p></li><li><p>[chart axis: complexity]</p><p>complexity ceiling of a Node.js app</p><p>complexity ceiling of a Node.js Lambda function</p></li><li><p>if you can limit the complexity of your solution, maybe you won't need the tools for managing that complexity.</p><p>- me</p></li><li><p>AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME</p></li><li><p>USE STATE FOR OPTIMISATION</p></li><li><p>AVOID COLD STARTS</p></li><li><p>CloudWatch Event, AWS Lambda</p><p>ping, ping, ping, ping</p><p>HEALTH CHECKS?</p></li><li><p>max 5 mins execution time</p></li><li><p>USE RECURSION FOR LONG RUNNING TASKS</p></li><li><p>CONSIDER PARTIAL FAILURES</p></li><li><p>AWS Lambda polls your stream and invokes your Lambda function. 
Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires.</p><p>http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html</p></li><li><p>should the function fail on partial/any failures?</p></li><li><p>SNS</p><p>Kinesis</p><p>SQS</p><p>after 3 attempts</p><p>share processing logic</p><p>events are processed in chronological order</p><p>failed events are retried out of sequence</p></li><li><p>PROCESS SQS WITH RECURSIVE FUNCTIONS</p></li><li><p>http://bit.ly/2npomX6</p></li><li><p>AVOID HOT KINESIS STREAMS</p></li><li><p>Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.</p><p>http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html</p></li><li><p>If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.</p><p>http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html</p></li><li><p>when the no. of processors goes up</p></li><li><p>ReadProvisionedThroughputExceeded</p><p>can have too many Kinesis read operations</p></li><li><p>GetRecords.IteratorAgeMilliseconds</p><p>unpredictable spikes in read latency</p></li><li><p>can kinda work around it</p></li><li><p>http://bit.ly/2uv5LsH</p></li><li><p>clever, but costly</p></li><li><p>for subsystems that don't have to be realtime, or are task-based (i.e. 
order doesn't matter), consider other triggers such as S3 or SNS.</p><p>- me</p></li><li><p>@theburningmonk</p><p>theburningmonk.com</p><p>github.com/theburningmonk</p><p>all my blog posts on Lambda: http://bit.ly/2yQZj1H</p></li><li><p>sign up here: http://bit.ly/2xIO23O</p></li></ul>
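<p>The observation that integration tests differ from acceptance tests only in <em>how</em> the Lambda functions are invoked suggests writing the test steps once and swapping only the invocation. Here is a minimal sketch. It assumes an API Gateway proxy-style handler. <code>createInvoker</code>, the <code>"handler"</code>/<code>"http"</code> mode values and the request shape are hypothetical names chosen for illustration. <code>viaHttp</code> assumes Node.js 18+, where <code>fetch</code> is a global.</p>

```javascript
// Hypothetical test helper: the same test steps run either as an integration
// test (invoke the handler in-process) or as an acceptance test (call the
// deployed API over HTTP). Only the invocation differs.

// Build an API Gateway proxy-style event and call the handler directly.
async function viaHandler(handler, request) {
  const event = {
    httpMethod: request.method,
    path: request.path,
    headers: request.headers || {},
    body: request.body === undefined ? null : JSON.stringify(request.body),
  };
  const response = await handler(event, { awsRequestId: "test-request" });
  return {
    statusCode: response.statusCode,
    body: response.body ? JSON.parse(response.body) : null,
  };
}

// Call the deployed endpoint (global fetch, Node 18+).
async function viaHttp(baseUrl, request) {
  const response = await fetch(baseUrl + request.path, {
    method: request.method,
    headers: { "content-type": "application/json", ...(request.headers || {}) },
    body: request.body === undefined ? undefined : JSON.stringify(request.body),
  });
  const text = await response.text();
  return { statusCode: response.status, body: text ? JSON.parse(text) : null };
}

// Pick the invocation style once; test steps just call the returned function.
function createInvoker({ mode, handler, baseUrl }) {
  if (mode === "handler") return (request) => viaHandler(handler, request);
  if (mode === "http") return (request) => viaHttp(baseUrl, request);
  throw new Error(`unknown test mode: ${mode}`);
}
```

<p>In the pipeline, the same test file could run in <code>handler</code> mode before deployment and in <code>http</code> mode against the freshly deployed stage. The mode might be selected through an environment variable set by <code>build.sh</code>.</p>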
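<p>The "correlation IDs" and "roll your own clients" slides can be illustrated with a small module. It captures correlation IDs when an invocation starts and adds them to every outgoing HTTP request and log line. This is a minimal sketch, not the implementation behind the linked post. The <code>x-correlation-</code> header prefix, the function names, and falling back to <code>context.awsRequestId</code> when no ID is sent are all assumptions. For the Kinesis and SNS clients, the same IDs would have to travel inside the record payload or message attributes rather than HTTP headers.</p>

```javascript
// Hypothetical correlation-ID helpers. Module-level state is overwritten at the
// start of every invocation, so IDs do not leak from one request to the next.
const PREFIX = "x-correlation-";
let current = {};

// Call this first thing in the handler.
function captureCorrelationIds(event, context) {
  const ids = {};
  const headers = (event && event.headers) || {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase(); // HTTP header names are case-insensitive
    if (key.startsWith(PREFIX)) ids[key] = value;
  }
  // Start a new chain when the caller did not send a correlation ID.
  if (!ids[PREFIX + "id"]) ids[PREFIX + "id"] = context.awsRequestId;
  current = ids;
  return ids;
}

// Merge the captured IDs into the headers of an outgoing HTTP request.
function withCorrelationHeaders(headers = {}) {
  return { ...headers, ...current };
}

// Structured log line that always carries the correlation IDs.
function logWithIds(message, extra = {}) {
  const line = JSON.stringify({ message, ...current, ...extra });
  console.log(line);
  return line;
}
```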
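<p>The <code>MONITORING|timestamp|metric value|metric type|metric name</code> convention turns ordinary <code>console.log</code> calls into metrics. The log-shipping function subscribed to CloudWatch Logs picks out those lines as metrics, while everything else goes to the ELK stack. Here is a minimal sketch of both ends of the convention. <code>formatMetric</code> and <code>parseMetricLine</code> are hypothetical helper names, and publishing the parsed metric (e.g. to CloudWatch) is left out.</p>

```javascript
// Hypothetical helpers for the "MONITORING|timestamp|value|type|name" log format.

// Emit side: build the line a function writes with console.log.
function formatMetric(name, value, type, date = new Date()) {
  const epochSeconds = Math.floor(date.getTime() / 1000);
  return ["MONITORING", epochSeconds, value, type, name].join("|");
}

// Ship side: CloudWatch Logs prefixes each line with a timestamp and request ID,
// so look for the marker anywhere in the line, not only at the start.
function parseMetricLine(line) {
  const start = line.indexOf("MONITORING|");
  if (start === -1) return null; // an ordinary log message
  const parts = line.slice(start).trim().split("|");
  if (parts.length !== 5) return null;
  const [, timestamp, value, type, name] = parts;
  const metric = { timestamp: Number(timestamp), value: Number(value), type, name };
  if (!Number.isFinite(metric.timestamp) || !Number.isFinite(metric.value)) return null;
  return metric;
}
```

<p>Inside a function this would be used as <code>console.log(formatMetric("user-api-latency", 27.4, "latency"))</code>. Because writing to stdout is synchronous from the function's point of view, no metrics need to be batched up and sent in the background.</p>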
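<p>The config client library slides ("fetch &amp; cache at cold start", "invalidate at interval + signal") describe a caching pattern, sketched below under several assumptions. <code>createConfigClient</code> is a hypothetical name. <code>loader</code> stands in for whatever actually reads SSM Parameter Store, since no AWS SDK call is shown. The three-minute TTL and the stale-on-error fallback are illustrative choices, not details from the talk.</p>

```javascript
// Hypothetical config client: the first get() (normally during the cold start)
// loads config, later calls reuse the cached copy until the TTL expires or
// invalidate() is called (e.g. when a "config changed" signal arrives).
function createConfigClient(loader, { ttlMs = 3 * 60 * 1000, now = () => Date.now() } = {}) {
  let cached = null;
  let expiresAt = 0;
  let inFlight = null; // share one load between concurrent callers

  function get() {
    if (cached !== null && now() < expiresAt) {
      return Promise.resolve(cached);
    }
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(() => loader())
        .then((value) => {
          cached = value;
          expiresAt = now() + ttlMs;
          return value;
        })
        .catch((err) => {
          // Keep serving the last good config if a refresh fails.
          if (cached !== null) return cached;
          throw err;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  }

  function invalidate() {
    expiresAt = 0; // the next get() reloads
  }

  return { get, invalidate };
}
```

<p>Create the client outside the handler so the cache survives across warm invocations. Inside the handler, <code>await config.get()</code> returns the cached config. Calling <code>config.invalidate()</code> in response to a change notification forces the next call to reload.</p>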
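<p>The 75 GB per-region limit on total deployment package size is easy to hit when every deployment publishes a new function version. That is the clutter a janitor function cleans up. Its core decision can be sketched as a pure function: keep the N most recent versions plus any version an alias points to, and delete the rest. This is a sketch, not the code behind the Janitor Lambda link. The name <code>selectVersionsToDelete</code> and the default <code>keepLatest = 3</code> are assumptions. In practice the inputs would come from the Lambda <code>ListVersionsByFunction</code> and <code>ListAliases</code> APIs, and each selected version would be removed with <code>DeleteFunction</code> and its <code>Qualifier</code>.</p>

```javascript
// Hypothetical helper: decide which published versions of one function can be
// deleted. "$LATEST" is never deleted, nor is any version an alias points to.
function selectVersionsToDelete(versions, aliasedVersions, keepLatest = 3) {
  const protectedVersions = new Set(aliasedVersions.map(String));
  const published = versions
    .filter((v) => v !== "$LATEST")
    .map(Number)
    .filter(Number.isInteger)
    .sort((a, b) => b - a); // newest (highest number) first
  return published
    .slice(keepLatest) // always keep the most recent versions
    .map(String)
    .filter((v) => !protectedVersions.has(v));
}
```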
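<p>When a scheduled CloudWatch Event pings a function to avoid cold starts, the handler should recognise the ping and return early instead of doing real work. Here is a minimal sketch. It assumes the rule sends the default scheduled-event payload, whose <code>source</code> is <code>aws.events</code> and whose <code>detail-type</code> is <code>Scheduled Event</code>. <code>isWarmUpPing</code> and the placeholder response are hypothetical. One ping per interval only keeps about one container warm, so concurrent requests can still hit cold starts.</p>

```javascript
// Scheduled CloudWatch Events arrive with source "aws.events" and
// detail-type "Scheduled Event".
function isWarmUpPing(event) {
  return Boolean(event) &&
    event.source === "aws.events" &&
    event["detail-type"] === "Scheduled Event";
}

async function handler(event, context) {
  if (isWarmUpPing(event)) {
    // Keep the container warm without touching downstream dependencies.
    return { warmedUp: true };
  }
  // Real work would go here; this placeholder just echoes the request path.
  return { statusCode: 200, body: JSON.stringify({ path: event.path }) };
}
```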
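<p>With a 5-minute cap on execution time, a long-running task can work until it nears the limit and then hand the remainder to a fresh invocation of the same function. Here is a minimal sketch. <code>context.getRemainingTimeInMillis()</code> is the Node.js Lambda context method. <code>processItem</code>, <code>invokeSelf</code> and the 10-second safety margin are assumptions. In a real function, <code>invokeSelf</code> would call the Lambda Invoke API asynchronously (InvocationType <code>Event</code>) with <code>context.functionName</code> and the remaining payload.</p>

```javascript
// Stop starting new work once less than this much time is left.
const SAFETY_MARGIN_MS = 10 * 1000;

// Hypothetical recursive worker: processes event.items in order and, when the
// invocation is about to time out, passes the unprocessed items to a new
// invocation through the injected invokeSelf function.
async function processWithRecursion(event, context, { processItem, invokeSelf }) {
  const remaining = [...(event.items || [])];
  let processed = 0;
  while (remaining.length > 0) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      await invokeSelf({ ...event, items: remaining });
      return { processed, handedOff: remaining.length };
    }
    await processItem(remaining.shift());
    processed += 1;
  }
  return { processed, handedOff: 0 };
}
```

<p>The safety margin must be larger than the time a single item can take. Otherwise the function can still time out partway through an item.</p>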
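<p>Throwing an error makes Lambda retry the whole Kinesis batch until the data expires. One option from the "CONSIDER PARTIAL FAILURES" slides is therefore to not fail the batch on a partial failure, and instead send the failed events elsewhere (e.g. SNS) to be retried out of sequence. Here is a minimal sketch. The <code>event.Records[].kinesis.data</code> (base64) and <code>sequenceNumber</code> fields follow the Kinesis event shape. <code>processRecord</code> and <code>sendToRetry</code> are hypothetical injected functions, standing in for your processing logic and an SNS publish.</p>

```javascript
// Hypothetical batch handler: process each record independently, collect the
// failures and hand them to a retry channel instead of failing the batch.
async function handleKinesisBatch(event, { processRecord, sendToRetry }) {
  const records = event.Records || [];
  const failed = [];
  for (const record of records) {
    try {
      const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
      await processRecord(JSON.parse(json));
    } catch (err) {
      failed.push({
        sequenceNumber: record.kinesis.sequenceNumber,
        data: record.kinesis.data, // keep the raw payload for the retry
        error: err.message,
      });
    }
  }
  if (failed.length > 0) {
    // If this call throws, the whole batch fails and Lambda retries it.
    await sendToRetry(failed);
  }
  return { total: records.length, failed: failed.length };
}
```

<p>The trade-off is the one on the slide: records that succeed keep their chronological order, but failed records are retried out of sequence. This only suits events whose processing does not depend on strict ordering.</p>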