(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR


Upload: amazon-web-services

Post on 20-Jan-2017


Page 1: (SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Will Kruse, AWS IAM Senior Security Engineer

October 2015

SEC403

Timely Security Alerts and Analytics: Diving into AWS CloudTrail Events by Using Apache Spark on Amazon EMR

Page 2

What to expect from this session

Why are we here? To learn how to:

• Audit AWS activity across multiple AWS accounts for compliance and security.

• Analyze AWS CloudTrail events as they arrive (in your Amazon S3 bucket).

• Build profiles of AWS activity for users, origins, etc.

• Send alerts when an unexpected or interesting event, or series of events, occurs.

• Use Apache Spark, a cutting-edge big data platform, on AWS for security and compliance auditing.

Page 3

Expected technical background

• You are generally familiar with “big data” processing frameworks (e.g., Hadoop).

• You are familiar with CloudTrail.

• You can read object-oriented code (e.g., Java, Scala, Python, or Ruby).

• You are comfortable with a command line.

Page 4

CloudTrail schema

Page 5

Demo: SQL queries over CloudTrail

Page 6

Agenda

• SQL queries over CloudTrail logs

  • Demo using spark-sql + Hive tables

  • Architecture

  • Code

  • Demo of code using Scala

• Processing CloudTrail logs as they arrive

  • Architecture

  • Demo

  • Code

• Wrap-up

You are here

Page 7

Our architecture

[Diagram components: CloudTrail objects; Amazon EMR cluster running Apache Spark; security or compliance analyst]

Page 8

Recipe for SQL queries over CloudTrail logs

Write a Spark application that:

1. “Discovers” CloudTrail logs by calling CloudTrail.

  • Alternatively, put all your CloudTrail logs in one or more buckets known ahead of time.

2. Creates a list of CloudTrail trails + S3 objects.

3. Loads the data from each S3 object into an RDD.

4. Splits into individual CloudTrail event JSON objects.

5. Loads this RDD into a Spark DataFrame.

6. Registers this DataFrame as a table (for querying).

Page 9

Introduction to Apache Spark

• Big data processing framework

• Supported languages: Scala, Python, and Java

• Cluster management: Hadoop YARN, Apache Mesos, or standalone

• Distributed storage: HDFS, Apache Cassandra, OpenStack Swift, and S3

Page 10

Why Spark?

• Fast

• Only does the work it needs to do (transformations are evaluated lazily, when a result is actually requested)

• Stores final and intermediate results in memory

• Supports batch and streaming processing

• Supports SQL queries, machine learning (ML), graph data processing, and an R interface

• Provides 20+ high-level operators that would otherwise be left as an exercise to the coder

• Compatible with much of your existing Hadoop ecosystem

Page 11

RDDs = Resilient Distributed Datasets

[Diagram: CloudTrail objects in S3 (Log #1, Log #2, ... Log #N) are loaded with parallelize into an RDD of JSON arrays of CloudTrail events, as strings (Log #1 string, Log #2 string, ... Log #N string). flatMap then produces an RDD of CloudTrail events as individual JSON strings (Event #1, Event #2, ... Event #M).]
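The flatMap step above can be sketched without a cluster. The following is a minimal plain-Python illustration, not the talk's Scala code: it assumes each log file is a JSON document with a top-level "Records" array (the CloudTrail log file layout), and the sample logs and the helper names `split_log` and `flat_map` are hypothetical.

```python
import json
from itertools import chain

# Two hypothetical CloudTrail log files, each a JSON document whose
# "Records" array holds the individual events.
log_strings = [
    json.dumps({"Records": [
        {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
        {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
    ]}),
    json.dumps({"Records": [
        {"eventSource": "iam.amazonaws.com", "eventName": "CreateUser"},
    ]}),
]


def split_log(log_string):
    """Turn one log file (a string) into a list of single-event JSON strings."""
    records = json.loads(log_string).get("Records", [])
    return [json.dumps(event) for event in records]


def flat_map(func, items):
    """Apply func to every item and concatenate the resulting lists."""
    return list(chain.from_iterable(func(item) for item in items))


event_strings = flat_map(split_log, log_strings)
print(len(event_strings))  # 3 events from 2 log files
```

In Spark, the same per-log function would be passed to the RDD's `flatMap`, so that each executor splits the logs in its own partitions.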

Page 12

DataFrames = Relational table abstraction

[Diagram: an RDD of CloudTrail events (Event #1, Event #2, ... Event #M) is converted by SQLContext.read.json into a DataFrame with one row per event:]

            Service     API  Timestamp        Source IP  Principal
Event #1    ec2         D…   2015/08/31 1:10  1.2.3.4    AIDA1…
Event #2    s3          P…   2015/08/31 1:11  1.2.3.5    AIDA2…
Event #3    swf         S…   2015/08/31 1:12  1.2.3.6    AROA1…
Event #4    iam         C…   2015/08/31 1:13  1.2.3.7    AROA2…
…           …           …    …                …          …
Event #M    CloudTrail  D…   2015/08/31 2:43  1.2.3.8    AIDA3…
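To make the "events as a queryable table" idea concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for a Spark DataFrame registered as a table. It is not Spark and not the talk's code: Spark infers the schema from the JSON, whereas this sketch picks five columns by hand. The sample events, the table name `cloudtrail`, and the helper `to_row` are hypothetical; the field names follow the CloudTrail record format.

```python
import json
import sqlite3

# Hypothetical events, one JSON string per CloudTrail event.
event_strings = [
    json.dumps({"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances",
                "eventTime": "2015-08-31T01:10:00Z", "sourceIPAddress": "1.2.3.4",
                "userIdentity": {"principalId": "AIDAEXAMPLE1"}}),
    json.dumps({"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
                "eventTime": "2015-08-31T01:11:00Z", "sourceIPAddress": "1.2.3.4",
                "userIdentity": {"principalId": "AIDAEXAMPLE1"}}),
    json.dumps({"eventSource": "iam.amazonaws.com", "eventName": "CreateUser",
                "eventTime": "2015-08-31T01:13:00Z", "sourceIPAddress": "1.2.3.7",
                "userIdentity": {"principalId": "AROAEXAMPLE1"}}),
]


def to_row(event_string):
    """Flatten one event into the five columns used in the table."""
    event = json.loads(event_string)
    identity = event.get("userIdentity", {})
    return (event.get("eventSource"), event.get("eventName"), event.get("eventTime"),
            event.get("sourceIPAddress"), identity.get("principalId"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cloudtrail (eventSource TEXT, eventName TEXT, eventTime TEXT, "
    "sourceIPAddress TEXT, principalId TEXT)"
)
conn.executemany("INSERT INTO cloudtrail VALUES (?, ?, ?, ?, ?)",
                 [to_row(e) for e in event_strings])

# Example query: how many calls did each principal make?
rows = conn.execute(
    "SELECT principalId, COUNT(*) AS calls FROM cloudtrail "
    "GROUP BY principalId ORDER BY calls DESC, principalId"
).fetchall()
print(rows)  # [('AIDAEXAMPLE1', 2), ('AROAEXAMPLE1', 1)]
```

The point of the table abstraction is the last statement: once events are rows, an analyst can ask ad hoc questions in SQL instead of writing a new program per question.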

Page 13

Spark cluster components

[Diagram: a master node and two core nodes. Each core node runs executors that hold RDD partitions. The application driver sends tasks (serialized Java/Scala) to the executors.]

Page 14

Recommended CloudTrail configuration

• Turn on CloudTrail logging in all regions.

• Enable S3 bucket logging for all buckets as well.

• Collect the CloudTrail logs for all your accounts in one bucket (per region).

  • Either have CloudTrail deliver them there or copy them.

• Disallow deletes from CloudTrail buckets.

Page 15

Needed AWS IAM permissions

• Getting started recommendation

  • Launch an EMR cluster with default roles.

  • Attach the CloudTrailReadOnly policy to the EMR_EC2_DefaultRole.

• Least privilege improvements

  • Restrict s3:GetObject and s3:ListBucket to CloudTrail buckets.

  • Remove EMR’s Amazon DynamoDB, Amazon Kinesis, Amazon RDS, Amazon SimpleDB, Amazon SNS, and Amazon SQS permissions.
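As an illustration of the S3 restriction above, the following sketch builds an IAM policy document that allows listing and reading only one bucket. The bucket name and the function name are hypothetical, and the policy is a starting point to review, not the policy used in the talk. Note that s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to the objects inside it.

```python
import json

# Hypothetical bucket name; replace with the bucket that holds your CloudTrail logs.
CLOUDTRAIL_BUCKET = "example-cloudtrail-logs"


def cloudtrail_read_policy(bucket):
    """Build a policy document that only allows reading the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so the resource is the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is an object-level action, so the resource is the bucket's objects.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = cloudtrail_read_policy(CLOUDTRAIL_BUCKET)
print(json.dumps(policy, indent=2))
```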

Page 16

Tour through code to query CloudTrail logs with SQL

Page 17

Discover CloudTrail data

Page 18

Transform CloudTrail data

Page 19

Register CloudTrail data as a table

Page 20

Demo: Querying CloudTrail logs with Scala prompt

Page 21

Agenda

• SQL queries over CloudTrail logs

  • Demo using spark-sql

  • Architecture

  • Code

  • Demo of code using Scala

• Processing CloudTrail logs as they arrive

  • Architecture

  • Demo

  • Code

• Wrap-up

You are here

Page 22

Analytics as soon as CloudTrail data arrives in S3

Page 23

Introduction to Spark Streaming

[Diagram components: CloudTrail; SNS topic; Spark application containing the Spark CloudTrail receiver, store, and executors; Batch N-1 RDD; Batch N RDD; alerts; alert topic. Flow shown for each batch: previous profile + new activity = updated profile.]

Page 24

Spark Streaming and micro-batches

[Diagram: a timeline of events (Event 1 through Event 8) divided into three 3-second intervals. Each interval becomes one RDD (micro-batch #1, #2, #3), and the sequence of RDDs forms a discretized stream (DStream).]
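The micro-batch idea can be sketched in a few lines of plain Python. This is only an illustration of grouping events by a fixed batch interval, not Spark Streaming itself; the arrival times are hypothetical (the slide does not say which events fall into which interval), and `micro_batches` is a made-up helper name.

```python
BATCH_INTERVAL_SECONDS = 3  # the interval shown on the slide

# Hypothetical (arrival_time_in_seconds, event_name) pairs.
arrivals = [
    (0.5, "Event 1"), (1.2, "Event 2"), (2.8, "Event 3"),
    (3.1, "Event 4"), (4.0, "Event 5"),
    (6.2, "Event 6"), (7.5, "Event 7"), (8.9, "Event 8"),
]


def micro_batches(arrivals, interval):
    """Group events into consecutive fixed-length intervals, in time order."""
    if not arrivals:
        return []
    batches = {}
    for arrival_time, event in sorted(arrivals):
        batches.setdefault(int(arrival_time // interval), []).append(event)
    # Intervals with no events still produce an (empty) batch.
    return [batches.get(index, []) for index in range(max(batches) + 1)]


batches = micro_batches(arrivals, BATCH_INTERVAL_SECONDS)
for number, batch in enumerate(batches, start=1):
    print(f"micro-batch #{number}: {batch}")
```

A shorter interval gives faster alerts but more, smaller batches to schedule; a longer one does the opposite.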

Page 25

Recipe

Write a Spark Streaming application that:

1. Uses a CloudTrail log receiver to learn about new logs from CloudTrail’s SNS feed.

  • Logs are delivered to S3, usually in less than 15 minutes.

2. Store()s each event from CloudTrail logs.

3. Analyzes events in micro-batches.

• Size based on the “batch interval.”

4. Generates alarms on suspicious behavior.
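Step 1 of the recipe depends on reading CloudTrail's SNS notifications. The sketch below shows one way a receiver might extract the new log locations from a notification. It assumes the notification arrives in the standard SNS JSON envelope with a "Message" string, and that the message body carries "s3Bucket" and "s3ObjectKey" fields; treat both as assumptions to verify against your own notifications. The sample values and the function name are hypothetical.

```python
import json


def new_log_objects(sns_envelope_json):
    """Return the (bucket, key) pairs named in one CloudTrail SNS notification."""
    envelope = json.loads(sns_envelope_json)
    # The notification body is itself a JSON document stored as a string.
    message = json.loads(envelope["Message"])
    return [(message["s3Bucket"], key) for key in message.get("s3ObjectKey", [])]


# Hypothetical notification for illustration only.
sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "s3Bucket": "example-cloudtrail-logs",
        "s3ObjectKey": ["AWSLogs/111122223333/CloudTrail/us-east-1/example.json.gz"],
    }),
})

log_objects = new_log_objects(sample)
print(log_objects)
```

A receiver would then download each listed object, split it into events, and store() every event so Spark Streaming can place it in the next micro-batch.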

Page 26

Scenarios we want to know about ASAP

• Connections from unusual geographies

• Connections from anonymizing proxies

• Use of dormant AWS access keys

• Use of dormant AWS principals (users, roles, root)
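One of these scenarios, use of a dormant access key, can be sketched as a profile update: keep the last time each key was seen, and alert when a key reappears after a long gap. This is a minimal plain-Python illustration, not the talk's implementation. The 90-day threshold, the key IDs, and the function name are hypothetical, and the batch is assumed to already contain (access key ID, event time) pairs extracted from each event.

```python
from datetime import datetime, timedelta

DORMANCY_THRESHOLD = timedelta(days=90)  # hypothetical threshold


def update_profile(last_seen, batch, threshold=DORMANCY_THRESHOLD):
    """Merge one micro-batch into the profile and return (new_profile, alerts)."""
    profile = dict(last_seen)  # leave the previous profile untouched
    alerts = []
    for access_key_id, event_time in sorted(batch, key=lambda item: item[1]):
        previous = profile.get(access_key_id)
        if previous is not None and event_time - previous > threshold:
            alerts.append((access_key_id, previous, event_time))
        if previous is None or event_time > previous:
            profile[access_key_id] = event_time
    return profile, alerts


# Previous profile: when each (hypothetical) key was last used.
previous_profile = {
    "AKIAEXAMPLEOLD": datetime(2015, 1, 1),
    "AKIAEXAMPLENEW": datetime(2015, 8, 30),
}
# New activity from the current micro-batch.
new_activity = [
    ("AKIAEXAMPLEOLD", datetime(2015, 8, 31, 1, 10)),
    ("AKIAEXAMPLENEW", datetime(2015, 8, 31, 1, 11)),
]

updated_profile, alerts = update_profile(previous_profile, new_activity)
for key, last_used, seen_again in alerts:
    print(f"ALERT: {key} was dormant since {last_used}, used again at {seen_again}")
```

The same shape (previous profile + new activity = updated profile) works for the other scenarios by changing the key, for example the principal or the source geography.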

Page 27

Demo: Streaming analysis of CloudTrail logs

Page 28

Creating stream of CloudTrail events

Page 29

Build profiles and send alerts

Page 30

Agenda

• SQL queries over CloudTrail logs

  • Demo using spark-sql

  • Architecture

  • Code

  • Demo of code using Scala

• Processing CloudTrail logs as they arrive

  • Architecture

  • Demo

  • Code

• Wrap-up

You are here

Page 31

How to use these tools

1. Build your threat model.

2. Configure and customize this streaming application.

3. Use Spark-on-EMR for ad hoc log analysis.

4. Use Spark Streaming for regular analysis and alerts.

Page 32

How I use these tools

1. Keep my engineering teams honest.

2. Identify noncompliant usage.

3. Review actors and their actions in my accounts.

4. Craft least privilege policies by analyzing historical usage.
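Item 4 can be approximated by listing which API calls each principal has actually made. The sketch below derives candidate "service:Action" names from the eventSource and eventName fields of each event. This mapping is an assumption: the service prefix and action name in IAM do not always match those two fields, so the output is a starting point for review rather than a finished policy. The sample events and the function name are hypothetical.

```python
from collections import defaultdict


def observed_actions(events):
    """Map each principal ARN to the sorted list of actions seen in its events."""
    usage = defaultdict(set)
    for event in events:
        principal = event["userIdentity"]["arn"]
        # Assumption: "ec2.amazonaws.com" corresponds to the IAM prefix "ec2".
        service = event["eventSource"].split(".")[0]
        usage[principal].add(f"{service}:{event['eventName']}")
    return {principal: sorted(actions) for principal, actions in usage.items()}


# Hypothetical historical events.
events = [
    {"userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
     "eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
    {"userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
     "eventSource": "s3.amazonaws.com", "eventName": "ListBuckets"},
    {"userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
     "eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
]

usage_by_principal = observed_actions(events)
for principal, actions in usage_by_principal.items():
    print(principal, actions)
```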

Page 33

Take action

• See who is active in your AWS accounts and when.

• Run queries over your logs in EMR.

• Configure and extend the sample application to meet your specific needs.

• Find the demo code here: https://github.com/awslabs/timely-security-analytics

Page 34

Remember to complete your evaluations!

Page 35

Thank you!