ops for noops - operational challenges for serverless apps

Ops for NoOps: Operational challenges for serverless apps. Eric Windisch, CTO, IOpipe, Inc.

Upload: eric-windisch

Posted on 19-Jan-2017


TRANSCRIPT

Page 1: Ops for NoOps - Operational Challenges for Serverless Apps

Ops for NoOps: Operational challenges for serverless apps

Eric Windisch CTO IOpipe, Inc.


ERIC WINDISCH

@ewindisch

Founder & CTO of IOpipe, Inc. www.iopipe.com

ex-Docker, ex-Cloudscaling.

Builder of clouds, destroyer of monoliths.


EVOLUTION CREATES CHALLENGES

➤ Fear, uncertainty, and doubt for new users:

➤ What problems will I run into with this new platform?

➤ What will I do when those problems happen?

➤ Will I know about those problems when they happen?

➤ Is it secure?

➤ What tools should I use?


SERVERLESS DEVELOPER PROFILES

➤ Frameworks: SLS (the Serverless Framework), Zappa, Apex, DIY, others.

➤ Event sources: API Gateway, SNS, S3, Kinesis, others. (Alexa and AWS IoT sources are relatively infrequent)

➤ Languages: Node, Python, Java, Go, C, Ruby.

➤ Regions: all of them (us-east, us-west, etc.); several users are moving to the newer international regions (Sydney, etc.)

➤ Events: 0 to 100M+ events per day

➤ Stage: dev/test through production
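The arithmetic below is a back-of-the-envelope sketch that puts the top of that event range in perspective. It assumes events arrive evenly over the day. It uses Little's law (concurrent executions ≈ arrival rate × average duration). The 200 ms average duration is a hypothetical figure, and real traffic from bursty sources will peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: float) -> float:
    """Mean invocations per second, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def estimated_concurrency(events_per_day: float, avg_duration_s: float) -> float:
    """Little's law: concurrent executions ~= arrival rate x average time in the function."""
    return average_rate(events_per_day) * avg_duration_s


if __name__ == "__main__":
    # Hypothetical workload at the top of the range: 100M events/day, 200 ms each.
    rate = average_rate(100_000_000)
    concurrency = estimated_concurrency(100_000_000, 0.2)
    print(f"{rate:.1f} invocations/s, ~{concurrency:.0f} concurrent executions")
```

Run as a script, it prints `1157.4 invocations/s, ~231 concurrent executions`. Bursts above that average are what tend to collide with AWS account-level limits.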


CLOUDWATCH

➤ Basic “super-outside” metrics:

  ➤ Errors

  ➤ Logs

  ➤ Invocations/time

  ➤ Duration

  ➤ Memory

➤ This is what Datadog, Sumo Logic, etc. ingest.
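The sketch below pulls these outside metrics directly from CloudWatch. It assumes boto3 is installed and AWS credentials are configured. It uses boto3's `get_metric_statistics` against the `AWS/Lambda` namespace. The function name `my-function` and the helper names are placeholders. The query builder is kept separate so it can be inspected without touching AWS.

```python
from datetime import datetime, timedelta, timezone

# Per-function metrics Lambda publishes to CloudWatch under the AWS/Lambda namespace.
# Memory usage is not among them; it appears in the REPORT line of each invocation's logs.
LAMBDA_METRICS = ("Invocations", "Errors", "Duration", "Throttles")


def metric_query(function_name, metric, hours=1, period=300, now=None):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    # Duration is a latency, so average it; the others are counts, so sum them.
    statistic = "Average" if metric == "Duration" else "Sum"
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": [statistic],
    }


def fetch_lambda_metrics(function_name, hours=1):
    """Fetch recent datapoints for each metric (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so metric_query works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    return {
        metric: cloudwatch.get_metric_statistics(
            **metric_query(function_name, metric, hours=hours)
        )["Datapoints"]
        for metric in LAMBDA_METRICS
    }
```

For example, `fetch_lambda_metrics("my-function")` returns a dict mapping each metric name to its list of datapoints for the last hour.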


HARD PROBLEMS

➤ Cold-starts

  ➤ Especially painful for Java users.

➤ Relationship of metrics vs. logs.

➤ Lack of profiling & tracing tools, or difficulty using them. When do GCs happen?

➤ Retries: why and when they happen, and how they relate to event sources

➤ AWS account-level limits (& when to bump them up)

➤ Difficulty of managing unsupported languages: C, C++, Go, Ruby, etc.

➤ Debugging of & visibility into distributed systems

  ➤ Are failures at the event source or in the Lambda function?

  ➤ Kinesis!!!

➤ Cross-invocation leaks

  ➤ Memory leaks

  ➤ File descriptor leaks

  ➤ Backend process visibility

  ➤ Thread/callback leaks

  ➤ etc.
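Per-invocation duration and memory show the metrics-vs-logs gap concretely. Lambda writes them in a REPORT line to CloudWatch Logs at the end of each invocation, not as standard CloudWatch metrics. The sketch below parses that line. It assumes the tab-separated "Name: value unit" layout shown in the sample; field names and spacing may differ between runtimes and over time. `parse_report` and `memory_headroom` are illustrative names.

```python
import re

# Matches one "Name: value unit" field of a Lambda REPORT log line.
FIELD = re.compile(r"^(?P<name>[A-Za-z ]+): (?P<value>[\d.]+) (?P<unit>ms|MB)$")


def parse_report(line):
    """Parse a Lambda REPORT log line into {field_name: float}.

    Assumes the tab-separated layout, e.g.
    'REPORT RequestId: <id>\tDuration: 102.25 ms\tBilled Duration: 200 ms ...'.
    Returns None for lines that are not REPORT lines.
    """
    if not line.startswith("REPORT "):
        return None
    metrics = {}
    for part in line.split("\t"):
        match = FIELD.match(part.strip())
        if match:  # skips "REPORT RequestId: ..." and empty trailing fields
            metrics[match["name"]] = float(match["value"])
    return metrics


def memory_headroom(metrics):
    """Fraction of the configured memory left unused by this invocation."""
    return 1 - metrics["Max Memory Used"] / metrics["Memory Size"]


SAMPLE = (
    "REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t"
    "Duration: 102.25 ms\tBilled Duration: 200 ms \t"
    "Memory Size: 128 MB\tMax Memory Used: 35 MB\t"
)
```

Each warm container generally writes to its own log stream. So if Max Memory Used keeps rising across consecutive REPORT lines in one stream, that is a rough signal of a cross-invocation memory leak.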


INSIDE THE PROCESS

➤ We install into your process, around your functions.

➤ Import a library, use a decorator (or a low-level reporting API)

➤ Gathers runtime info via the Node.js process object, Python's sys module, etc.

➤ Timing information for wrapped function(s).

➤ Stacktrace reporting.

➤ Extra logging / events pushed by developers.

➤ & looks outside…
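The decorator below is a minimal, hypothetical Python sketch of wrapping a handler from inside the process. It is not IOpipe's actual library, and `instrumented` is our own name. It times each invocation, captures the stack trace on failure, and records peak memory and open file descriptors. With those numbers, cross-invocation leaks appear as values that climb from one warm invocation to the next. It assumes a Linux runtime for `/proc` and the `resource` module. It prints one JSON line per invocation, which Lambda forwards to CloudWatch Logs, instead of shipping data to a backend.

```python
import functools
import json
import os
import sys
import time
import traceback

try:
    import resource  # Unix-only; used for peak memory (ru_maxrss)
except ImportError:
    resource = None


def _open_fds():
    """Approximate count of open file descriptors via /proc (Linux only); None elsewhere."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def _peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux)."""
    if resource is None:
        return None
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def instrumented(report=print):
    """Wrap a handler(event, context) and emit one JSON record per invocation."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            record = {"function": handler.__name__, "python": sys.version.split()[0]}
            start = time.perf_counter()
            try:
                return handler(event, context)
            except Exception:
                record["error"] = traceback.format_exc()  # full stack trace
                raise
            finally:
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                record["peak_rss_kb"] = _peak_rss_kb()
                # A steady climb across warm invocations hints at an FD leak.
                record["open_fds"] = _open_fds()
                report(json.dumps(record))
        return wrapper
    return decorator


@instrumented()
def handler(event, context):
    """Example Lambda handler wrapped by the sketch."""
    return {"echo": event}
```

Passing a different `report` callable, for example one that pushes to a queue, is how this sketch would stand in for the "low-level reporting API" mentioned above.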


METRICS & ANALYTICS


INTO THE BLACK BOX


GITHUB.COM/IOPIPE/LAMBDA-SHELL


OUTSIDE THE FUNCTION - INSIDE THE BLACK BOX

➤ Reuse of containers and VMs

➤ Cold-starts by VM, container, and app process.

➤ Tenancy of VMs (how many containers per VM)

➤ Host VM processes(!!) & processes in other containers(!!!)

➤ Limited, and very likely to go away… probably per-tenant VMs anyway

➤ Spawned processes
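Some of this is observable from inside the function itself. The sketch below assumes a Python runtime on Linux. Module-level code runs once per container, so a module-scope counter separates cold starts from warm reuse. Walking `/proc` lists whatever processes the sandbox lets the function see, including processes it spawned and left running. `CONTAINER_ID` and the returned field names are illustrative, not part of any AWS API. The sketch cannot observe VM reuse or VM tenancy directly.

```python
import os
import time
import uuid

# Module scope runs once per container (on a cold start); warm invocations reuse it.
CONTAINER_ID = str(uuid.uuid4())  # hypothetical per-container tag
CONTAINER_STARTED = time.time()
_invocations = 0


def visible_processes():
    """List (pid, command) pairs visible in /proc; empty if /proc is unavailable."""
    procs = []
    try:
        entries = os.listdir("/proc")
    except OSError:
        return procs
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                procs.append((int(entry), f.read().strip()))
        except OSError:
            continue  # the process exited or is not readable
    return sorted(procs)


def handler(event, context):
    """Report whether this invocation reused a warm container."""
    global _invocations
    _invocations += 1
    return {
        "container_id": CONTAINER_ID,
        "cold_start": _invocations == 1,
        "invocation_in_container": _invocations,
        "container_age_s": round(time.time() - CONTAINER_STARTED, 3),
        "processes": visible_processes(),  # spawned or leftover processes show up here
    }
```

Logging the same `container_id` across many invocations shows how long containers are kept warm. A process list that grows over time points at spawned processes that were never reaped.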


SECURITY

➤ I founded the Docker Security Team…

➤ FYI - Lambda’s not Docker!

➤ Lambda’s not perfect! (Security never is!)

➤ Amazon did a good job.

➤ Re-inventing the wheel means repeating some mistakes solved elsewhere…

➤ Still… AWS did a pretty good job.

➤ Don’t worry about it.

➤ Some questions can only be answered by AWS or with more data! TBD!


APP MANAGEMENT

➤ Actionable metrics from inside & outside the function.

➤ Ingest CloudTrail for context-aware intelligence.

➤ Where events originate, retries, etc.

➤ Alarms -> Lambda invocation

➤ triggers AWS services, PagerDuty, IFTTT, Zapier, etc.

➤ Real-time visibility. Daily, weekly, and monthly reporting.
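A CloudWatch alarm can invoke a Lambda function by publishing to an SNS topic that the function subscribes to. The sketch below assumes the standard SNS event shape: each `Records[].Sns.Message` holds the alarm as a JSON string with `AlarmName`, `NewStateValue` and `NewStateReason`. `notify` is a hypothetical placeholder for whatever integration you actually wire in (PagerDuty, IFTTT, Zapier, another AWS service). It only prints.

```python
import json


def parse_alarms(event):
    """Extract CloudWatch alarm notifications from an SNS-triggered Lambda event."""
    alarms = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        alarms.append({
            "name": message.get("AlarmName"),
            "state": message.get("NewStateValue"),  # "ALARM", "OK" or "INSUFFICIENT_DATA"
            "reason": message.get("NewStateReason"),
        })
    return alarms


def notify(alarm):
    """Hypothetical hook: replace with a real PagerDuty/IFTTT/Zapier/AWS integration."""
    print(f"[{alarm['state']}] {alarm['name']}: {alarm['reason']}")


def handler(event, context):
    """Entry point subscribed to the SNS topic the alarm publishes to."""
    firing = [a for a in parse_alarms(event) if a["state"] == "ALARM"]
    for alarm in firing:
        notify(alarm)
    return {"notified": len(firing)}
```

Filtering on `ALARM` means recoveries (`OK`) are parsed but not forwarded. A real integration might send those too, so it can resolve open incidents.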


GETTING HELP

➤ Gitter…

➤ https://gitter.im/serverless/serverless

➤ Slack…

➤ https://serverless-forum.slack.com/signup

➤ IOpipe Slack (for registered users!)

➤ Forums…

➤ Amazon - https://forums.aws.amazon.com/index.jspa


Eric Windisch CTO IOpipe, Inc.

Register for FREE beta access:

www.iopipe.com

Q&A