(SEC313) Security & Compliance at the Petabyte Scale



Page 1: (SEC313) Security & Compliance at the Petabyte Scale

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Security and Compliance at the Petabyte Scale

Lessons from the National Cancer Institute’s Cancer Genomics Cloud Pilot

Igor Bogicevic, CTO

Angel Pizarro, AWS Scientific Computing

October 2015

Page 2: (SEC313) Security & Compliance at the Petabyte Scale

What to expect from this session

• Background: Unique challenges for securing genomics information

• Case study: Democratizing access to The Cancer Genome Atlas (TCGA) through the Seven Bridges Cancer Genomics Cloud

• Deep dives: How we’ve leveraged AWS to support secure and compliant genomics research

Page 3: (SEC313) Security & Compliance at the Petabyte Scale

Why is securing genomics information hard?

Page 4: (SEC313) Security & Compliance at the Petabyte Scale

i) Genomics data is big…and getting bigger

Between 2014 and 2018, production of new NGS data is projected to exceed 2 exabytes.

[Chart axes: # sequencers; genomic data (Tb)]

NGS: Next Generation Sequencing. NGS sequencers include machines from Illumina, Life Technologies, and Pacific Biosciences. Human genome data is based on estimates of whole human genomes sequenced.

Sources: Financial reports of Illumina, Life Technologies, Pacific Biosciences; revenue guidances; JP Morgan; The Economist; Seven Bridges Analysis.

Page 5: (SEC313) Security & Compliance at the Petabyte Scale

ii) Genomes are inherently sensitive

Very personal (including your relatives…)

Can’t fully anonymize information

Can’t take it back once it’s out there

Page 6: (SEC313) Security & Compliance at the Petabyte Scale

iii) Research is highly collaborative and diverse

It occurs in large teams...

...with numerous analytical tools

Page 7: (SEC313) Security & Compliance at the Petabyte Scale

The Challenge

Enable thousands of researchers

using hundreds of (custom) tools

to analyze petabytes of highly sensitive data

in a secure and compliant environment

Page 8: (SEC313) Security & Compliance at the Petabyte Scale

Case study: Bringing the Cancer Genome Atlas (TCGA) to the Cloud

This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.

Page 9: (SEC313) Security & Compliance at the Petabyte Scale

TCGA is one of the richest and most complete genomics data sets in the world

34 tumor types from thousands of patients…

…analyzed across multiple dimensions…

…by researchers across the US…

…at a cost of $375 million.

1.5+ petabytes, growing to 3.5 petabytes in the next year

Page 10: (SEC313) Security & Compliance at the Petabyte Scale

But learning from this data is challenging

Page 11: (SEC313) Security & Compliance at the Petabyte Scale

The Cancer Genomics Cloud Pilots seek to directly address these difficulties

• Initiated by Dr. Harold Varmus in 2013

• BAA issued in January 2014

• 3 pilots awarded September 2014

o Broad Institute

o Institute for Systems Biology

o Seven Bridges Genomics

Early access: November 2015

Open release: January 2016

www.CancerGenomicsCloud.org

Page 12: (SEC313) Security & Compliance at the Petabyte Scale

Our approach to democratizing access to TCGA data

Page 13: (SEC313) Security & Compliance at the Petabyte Scale

The components of democratized access – Data

• Immediately and securely access petabytes of open-access and controlled-access cancer genomics data.

• Analyze data from your private cohorts alongside public data.

• Data access governed by the NIH Genomic Data Sharing Policy.

• As an NIH trusted partner, Seven Bridges is able to authorize approved researchers.

• First controlled-access genomic dataset on AWS.

• Coming soon: http://aws.amazon.com/public-data-sets/tcga/

Page 14: (SEC313) Security & Compliance at the Petabyte Scale

The components of democratized access – Reproducibility

[Figure: tool version labels 1.1.2, 2.0a, 2.3Lite]

• Execute workflows from primary analysis through visualization.

• Each result is always associated with a complete snapshot of the tool versions, parameters, and input files.

Page 15: (SEC313) Security & Compliance at the Petabyte Scale

The components of democratized access – Open standards

• Native execution of Docker-based Common Workflow Language (CWL) pipelines allows portability and sharing of custom tools.

• APIs support workflow automation and enhance interoperability.

Page 16: (SEC313) Security & Compliance at the Petabyte Scale

...implemented through our genomics platform

Page 17: (SEC313) Security & Compliance at the Petabyte Scale

How we’ve leveraged AWS to support secure and compliant genomics research

Page 18: (SEC313) Security & Compliance at the Petabyte Scale

Security and compliance―connected, but separate.

Page 19: (SEC313) Security & Compliance at the Petabyte Scale

Security

• Network and data security overview

• Parallel file access at scale

• Enabling secure computation using researcher-contributed tools

• Enabling secure user access and collaboration

Page 20: (SEC313) Security & Compliance at the Petabyte Scale

Simplified system architecture

[Architecture diagram, components as labeled:]

• Encrypted Amazon S3 buckets

• Virtual private cloud (Development environment): dynamic worker instances, infrastructure server

• Virtual private cloud (Production environment): dynamic worker instances, infrastructure server, Seven Bridges website

• Seven Bridges offices, connected over IPsec VPN (AWS IPsec)

• Remote workforce, connected through an OpenVPN gateway

• User: access platform, download data

• Legend: data flow; secure access point

Page 21: (SEC313) Security & Compliance at the Petabyte Scale

Securing the network

• Extensive use of virtual private clouds (VPCs)

• Separate dev and production environments

• Built-in IPsec allows easy network integration

• OpenVPN to secure remote user access

• Each instance and VPC is individually firewalled

Page 22: (SEC313) Security & Compliance at the Petabyte Scale

Securing data

• At rest: encryption

o Amazon S3 SSE, SSE-KMS

o Amazon EBS encryption

o Ephemeral storage

• In transit

o TLS exclusively on S3

• From other users

o AWS IAM to access other users’ buckets

Page 23: (SEC313) Security & Compliance at the Petabyte Scale

Controls to support secure data

• Atomic data access

• Data locality

• Dedicated tenancy on computation instances

• Using only encrypted storage

• Strict data purging

Amazon S3 | Amazon EBS | Amazon EC2

Example Amazon S3 bucket policy (denies uploads that do not request AES256 server-side encryption):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "112",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
      }
    }
  ]
}

dm-crypt
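The effect of the bucket policy on this slide can be checked offline. The sketch below is a deliberately simplified evaluator, not AWS's policy engine: it understands only a `Deny` statement with `StringNotEquals` conditions and skips principal and resource matching. The function name `is_put_denied` is hypothetical. It follows the documented IAM behavior that a negated operator such as `StringNotEquals` matches when the condition key is absent from the request, so uploads with no encryption header are denied as well.

```python
import json

# Policy from the slide: deny any PutObject that does not request SSE-S3 (AES256).
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "112",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
      }
    }
  ]
}
""")


def is_put_denied(policy, request_headers):
    """Return True if a Deny statement in `policy` matches an s3:PutObject request.

    Simplified on purpose: only StringNotEquals conditions are understood, and
    Principal and Resource matching are skipped.
    """
    # The S3 condition key is derived from the HTTP header of the same name.
    context = {
        "s3:x-amz-server-side-encryption":
            request_headers.get("x-amz-server-side-encryption"),
    }
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Deny" or stmt["Action"] != "s3:PutObject":
            continue
        not_equals = stmt.get("Condition", {}).get("StringNotEquals", {})
        # A missing key (None) never equals the expected value, so it matches too.
        if all(context.get(key) != value for key, value in not_equals.items()):
            return True
    return False
```

For example, `is_put_denied(POLICY, {})` is true (no encryption header), while a request carrying `x-amz-server-side-encryption: AES256` is not denied. As written, the policy would also reject uploads that request `aws:kms`, so a bucket that accepts SSE-KMS would need a different condition.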

Page 24: (SEC313) Security & Compliance at the Petabyte Scale

Security

• Network and data security overview

• Parallel file access at scale

• Enabling secure computation using researcher-contributed tools

• Enabling secure user access and collaboration

Page 25: (SEC313) Security & Compliance at the Petabyte Scale

Parallel file access at scale

The Challenge:

Many bioinformatics tasks require sharing of intermediate results between multiple instances.

Page 26: (SEC313) Security & Compliance at the Petabyte Scale

Parallel file access at scale – NFS

Observed network saturation at ~8 NFS clients.

Page 27: (SEC313) Security & Compliance at the Petabyte Scale

Hypothesis

• Amazon S3 would remove the bandwidth bottleneck of a single NFS server.

• Presenting a user’s S3 objects as a local filesystem could provide an elegant abstraction that any application could use.

• Cumulative S3 read/write speed should scale mostly linearly with the number of workers.

• Total read/write speed on shared S3 objects should significantly exceed the speed of the NFS server solution with more than 10 workers.
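The hypothesis above can be made concrete with a toy model: a single NFS server caps aggregate throughput at its own network bandwidth, while an object store that scales out gives each worker its own share. The sketch below is only an illustration. Every number in it (125 MB/s per NFS client, a 1000 MB/s server cap, 90 MB/s per S3 worker) is an assumption chosen so that the NFS curve flattens at 8 clients; none of them are measurements from the talk, and the function names are hypothetical.

```python
def nfs_aggregate_mb_s(workers, per_client_mb_s=125.0, server_cap_mb_s=1000.0):
    """Aggregate throughput when all workers share one NFS server's bandwidth."""
    return min(workers * per_client_mb_s, server_cap_mb_s)


def s3_aggregate_mb_s(workers, per_worker_mb_s=90.0):
    """Aggregate throughput if each worker gets an independent share (linear scaling)."""
    return workers * per_worker_mb_s


def crossover_workers(max_workers=1000):
    """Smallest worker count at which the linear model beats the capped model."""
    for n in range(1, max_workers + 1):
        if s3_aggregate_mb_s(n) > nfs_aggregate_mb_s(n):
            return n
    return None
```

Under these assumed parameters the NFS model saturates at 8 clients and the linear model overtakes it at 12 workers. The point of the sketch is the shape of the two curves, not the specific values.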

Page 28: (SEC313) Security & Compliance at the Petabyte Scale

Parallel access at scale – SBG-FS/Amazon S3

Amazon S3

Page 29: (SEC313) Security & Compliance at the Petabyte Scale

SBG-FS single worker performance

[Chart: throughput (MB/s) vs. number of compute instances. Series: 1st read (SBG-FS Prefetch), Write (SBG-FS Upload), 2nd read (SBG-FS Cache). Data labels shown: 90, 215, 894.]
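The series names in the chart ("1st read (SBG-FS Prefetch)", "2nd read (SBG-FS Cache)", "Write (SBG-FS Upload)") suggest a read-through cache in front of Amazon S3. The sketch below shows that general pattern only. It is not the SBG-FS implementation, which the slides do not describe, and `FakeObjectStore` and `ReadThroughCache` are hypothetical names, with an in-memory dictionary standing in for S3.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store such as Amazon S3 (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.get_calls = 0  # counts round trips to the "remote" store

    def put(self, key, data):
        self.objects[key] = bytes(data)

    def get(self, key):
        self.get_calls += 1
        return self.objects[key]


class ReadThroughCache:
    """Serve repeated reads locally; go to the object store only on a miss."""

    def __init__(self, store):
        self.store = store
        self._cache = {}

    def read(self, key):
        if key not in self._cache:          # first read: fetch from the store
            self._cache[key] = self.store.get(key)
        return self._cache[key]             # later reads: local copy

    def write(self, key, data):
        self.store.put(key, data)           # upload so other workers can see it
        self._cache[key] = bytes(data)      # keep a local copy for this worker
```

In this pattern the first read of an object pays the network cost and the second read is served from the local copy, which is consistent with the cached read being the fastest series in the chart.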

Page 30: (SEC313) Security & Compliance at the Petabyte Scale

SBG-FS cumulative worker performance

[Chart: cumulative throughput (GB/s) vs. number of compute instances. Series: 1st read (SBG-FS Prefetch), Write (SBG-FS Upload), 2nd read (SBG-FS Cache).]

Page 31: (SEC313) Security & Compliance at the Petabyte Scale

SBG-FS auditing capabilities

Amazon S3

Page 32: (SEC313) Security & Compliance at the Petabyte Scale

Security

• Network and data security overview

• Parallel file access at scale

• Enabling secure computation using researcher-contributed tools

• Enabling secure user access and collaboration

Page 33: (SEC313) Security & Compliance at the Petabyte Scale

Enabling secure computation using researcher-contributed tools

The Challenge:

• 10,000+ bioinformatics tools

• 50+ tools used in a single TCGA marker paper

Our Approach:

• Common Workflow Language (CWL) wrapper

• Seven Bridges Platform

Page 34: (SEC313) Security & Compliance at the Petabyte Scale

Benefits of using Docker to deploy user-contributed tools

• Enables solid resource isolation at the container level

• Simplifies deploying and managing tools at scale

Page 35: (SEC313) Security & Compliance at the Petabyte Scale

Security risks posed by use of Docker

• Docker daemon runs with root privileges

• Users can intentionally or unintentionally add malicious apps

• If resource management is not set properly, apps could do damage outside their containers

Page 36: (SEC313) Security & Compliance at the Petabyte Scale

Enabling secure use of Docker containers

• Know your private vs. public resources

• Isolate network resources for each container (firewalling)

• Be careful with linking containers

• Aggregate logs (forensics)
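One way to apply the points above when launching a researcher-contributed tool is to build a restrictive `docker run` command line. The sketch below only assembles the argument list. The helper name `docker_run_args`, the default limits, and the user ID are assumptions for illustration, and the flags are standard options of the current Docker CLI rather than anything the slides say Seven Bridges uses.

```python
def docker_run_args(image, command, memory="4g", cpus="2", allow_network=False):
    """Build a `docker run` argv that limits what a contributed tool can do."""
    args = [
        "docker", "run", "--rm",
        "--memory", memory,                     # cap RAM so one tool cannot starve the host
        "--cpus", cpus,                         # cap CPU share
        "--pids-limit", "512",                  # bound the number of processes
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--read-only",                          # root filesystem is read-only
        "--user", "1000:1000",                  # do not run as root inside the container
    ]
    if not allow_network:
        args += ["--network", "none"]           # no network access unless explicitly allowed
    return args + [image] + list(command)
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Tools that need scratch space would also need a writable volume or `--tmpfs` mount, which is left out here.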

Page 37: (SEC313) Security & Compliance at the Petabyte Scale

Security

• Network and data security overview

• Parallel file access at scale

• Enabling secure computation using researcher-contributed tools

• Enabling secure user access and collaboration

Page 38: (SEC313) Security & Compliance at the Petabyte Scale

Enabling secure access

• Organizations have diverse models of internal structure and responsibilities

• Roles and authentication models are very diverse

• Federated authentication and SSO

Page 39: (SEC313) Security & Compliance at the Petabyte Scale

Supporting federated login for controlled data access

[Diagram labels: SAML, Verify, Approved Researchers, cron x 24hr, Metadata service, ELK stack, Error Message]
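One plausible reading of the diagram labels is that, after a SAML login, the user is verified against a list of approved researchers that a cron job refreshes every 24 hours. The sketch below illustrates that reading only. The class name, the `fetch_approved_ids` callable, and the fail-closed behavior for a stale list are all assumptions, not details taken from the slides.

```python
import time


class ApprovedResearcherList:
    """Locally cached list of approved researcher IDs, refreshed periodically."""

    def __init__(self, fetch_approved_ids, max_age_seconds=24 * 3600):
        self._fetch = fetch_approved_ids   # hypothetical callable returning an iterable of IDs
        self._max_age = max_age_seconds
        self._ids = frozenset()
        self._fetched_at = None

    def refresh(self, now=None):
        """Reload the list; intended to be called from a scheduled (cron-like) job."""
        self._ids = frozenset(self._fetch())
        self._fetched_at = time.time() if now is None else now

    def is_approved(self, user_id, now=None):
        """Check an authenticated user ID; deny if the list is missing or stale."""
        current = time.time() if now is None else now
        if self._fetched_at is None or current - self._fetched_at > self._max_age:
            return False  # fail closed rather than trust an outdated list
        return user_id in self._ids
```

A denied check would map to the "Error Message" path in the diagram, and each decision could be logged for later inspection.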

Page 40: (SEC313) Security & Compliance at the Petabyte Scale

Enabling collaboration

• SBG Platform provides isolation of resources at project level

• Users can share projects and control access through roles

• The basic role provides read access only; write/copy privileges are separate from execution privileges

[Diagram: users participate in projects and can provide funding; one billing group per project; multiple users and roles per project]

• Project-specific user roles

• Multiple users per project

• Clear funding/payment responsibility
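The project model above (project-level isolation, per-project roles, read access separate from write/copy and from execution, one billing group per project) can be sketched with a small permission model. The role names and the `Project` class below are hypothetical and are not the Seven Bridges API.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    COPY = auto()
    WRITE = auto()
    EXECUTE = auto()


# Hypothetical role names; the basic role grants read access only.
ROLES = {
    "viewer": Permission.READ,
    "contributor": Permission.READ | Permission.COPY | Permission.WRITE,
    "analyst": Permission.READ | Permission.EXECUTE,
}


class Project:
    """A project isolates resources and ties all usage to one billing group."""

    def __init__(self, name, billing_group):
        self.name = name
        self.billing_group = billing_group
        self.members = {}  # user -> Permission flags, scoped to this project only

    def add_member(self, user, role):
        self.members[user] = ROLES[role]

    def can(self, user, permission):
        granted = self.members.get(user)
        return granted is not None and permission in granted
```

Because roles are stored per project, the same user can be a viewer in one project and an analyst in another, and a user who is not a member has no access at all.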

Page 41: (SEC313) Security & Compliance at the Petabyte Scale

Overall system security is enabled by monitoring and testing

• Penetration testing

• Patch management

• Software and infrastructure vulnerability assessments

• Monitoring of platform performance and availability

o Pandora FMS/OSSEC/Sysdig

• Auditing and logs at a project and platform level

o Logs aggregated and available for inspection with ELK stack

Page 42: (SEC313) Security & Compliance at the Petabyte Scale

Putting it all together

1. User logs on to the platform

2. Platform creates a unique signed URL for the user

3. Using the signed URL, data is uploaded to an encrypted Amazon S3 bucket

4. After the user starts a computation, the Seven Bridges Platform calculates the optimal execution plan and starts dedicated task worker instances

5. Worker instances securely pull data from Amazon S3

6. Worker instances are able to securely share intermediate data

7. Final results are uploaded to Amazon S3

[Diagram: User, Seven Bridges Platform, encrypted Amazon S3 bucket, and Amazon EC2 instances in the computation environment, with arrows numbered to match steps 1 to 7; data sharing between instances is step 6]
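Steps 2 and 3 rely on a signed URL: a link that carries its own expiry time and a signature, so the storage side can accept an upload without the user holding long-lived credentials. The sketch below shows the general idea with a plain HMAC. It is not the AWS Signature Version 4 scheme that real Amazon S3 presigned URLs use (those are normally generated by an AWS SDK), and the function names, parameter layout, and example host are assumptions for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(secret, base_url, user, expires):
    message = f"{base_url}|{user}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url, user, secret, ttl_seconds=900, now=None):
    """Return a URL for `user` that stops verifying after `ttl_seconds`."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({
        "user": user,
        "expires": expires,
        "signature": _signature(secret, base_url, user, expires),
    })
    return f"{base_url}?{query}"


def verify_url(url, secret, now=None):
    """Check that the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        user = params["user"][0]
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(secret, base_url, user, expires)
    current = time.time() if now is None else now
    # compare_digest avoids leaking how many characters matched
    return hmac.compare_digest(signature.encode(), expected.encode()) and current <= expires
```

Changing the user, the path, or the expiry invalidates the signature, which is what makes the URL unique to one user and one upload target.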

Page 43: (SEC313) Security & Compliance at the Petabyte Scale

Lessons learned from petabyte-scale security

• Isolate resources as much as possible

• Encrypt everything―it will make your life easier

• Understand the scale of the data

• Measure everything

• Leverage the infrastructure

Page 44: (SEC313) Security & Compliance at the Petabyte Scale

Compliance

Page 45: (SEC313) Security & Compliance at the Petabyte Scale

When we talk about compliance, we talk about

• Building trust

• Shared language

Page 46: (SEC313) Security & Compliance at the Petabyte Scale

Compliance frameworks

dbGaP: Protects against risk associated with release of genomes of individuals consenting to participate in research studies.

HIPAA: Protects against risk associated with release of Protected Health Information (PHI).

ISO 27001: Provides a framework for general security management of assets across the organization and is a general specification for an information security management system (ISMS).

Page 47: (SEC313) Security & Compliance at the Petabyte Scale

Shared responsibility == compliance coordination

Stacked responsibility (diagram):

• Researcher: Users | Groups | Projects | Applications

• Seven Bridges Genomics: Data Security; Data Provenance; Application Monitoring; OS, Network, etc.

• AWS: Facilities; Infrastructure; Virtualization; API and Service Endpoints

• Auditor

Page 48: (SEC313) Security & Compliance at the Petabyte Scale

Shared responsibility across frameworks

[Matrix: compliance frameworks (dbGaP, HIPAA, ISO 27001) against responsible parties (Researcher, AWS, Seven Bridges)]


Page 51: (SEC313) Security & Compliance at the Petabyte Scale

Securely integrating with platforms

Page 52: (SEC313) Security & Compliance at the Petabyte Scale

Security and compliance in practice

Stacked responsibility (diagram layers):

• Data Security; Data Provenance; Application Monitoring; OS, Network, etc.

• Users | Groups | Projects | Applications

• Facilities; Infrastructure; Virtualization; API and Service Endpoints

Horizontal responsibility: Seven Bridges Genomics, Researcher, Amazon Web Services

Page 53: (SEC313) Security & Compliance at the Petabyte Scale

Use case: Analyze Personal Genome Project data

http://personalgenomes.org

VPC subnet

Dedicated instance

1000 Genomes

Page 54: (SEC313) Security & Compliance at the Petabyte Scale

Strategies to follow

• Rely on the platform as much as possible

• Follow security best practices outlined in the AWS documentation

• Have a checklist!

Page 55: (SEC313) Security & Compliance at the Petabyte Scale

Compliance checklist

AWS security

VPC, security groups, encrypted storage

Protect AWS credentials

Protect platform credentials

SOPs for OS and application updates

Audit and logging of the activities outside of platform

Data provenance and lifecycle

Page 56: (SEC313) Security & Compliance at the Petabyte Scale

AWS architecture

[Diagram labels: Virtual private cloud, VPC subnet, Security group, IAM instance role]

• Access platforms via Internet or VPC peering

• DevOps for instance and application management

• Protect credentials with AWS IAM and AWS KMS

Page 57: (SEC313) Security & Compliance at the Petabyte Scale

Secure bootstrapping with instance UserData

Page 58: (SEC313) Security & Compliance at the Petabyte Scale

AWS Command Line Interface

Page 59: (SEC313) Security & Compliance at the Petabyte Scale

Secure and format local storage
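The original content of this slide is not in the transcript. As a hedged illustration of what securing and formatting instance-local storage can look like, the sketch below builds (but does not run) a command sequence that maps an ephemeral disk through dm-crypt in plain mode with a throwaway random key, then formats and mounts it. The device path, mapper name, and mount point are hypothetical, and the helper name is an assumption; the commands use standard `cryptsetup`, `mkfs.ext4`, `mkdir`, and `mount` options.

```python
def ephemeral_encryption_commands(device, mapper_name, mount_point):
    """Return the commands to encrypt, format, and mount an ephemeral disk.

    The key is read from /dev/urandom and never stored, so the data is
    unrecoverable once the mapping is closed or the instance stops.
    """
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        ["cryptsetup", "open", "--type", "plain",
         "--cipher", "aes-xts-plain64", "--key-size", "256",
         "--key-file", "/dev/urandom", device, mapper_name],
        ["mkfs.ext4", mapped],            # format the encrypted mapping, not the raw disk
        ["mkdir", "-p", mount_point],
        ["mount", mapped, mount_point],
    ]
```

Each command could be run as root with `subprocess.run(cmd, check=True)`, for example from an instance bootstrap script. A random, unsaved key fits scratch data that should not outlive the instance.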

Page 60: (SEC313) Security & Compliance at the Petabyte Scale

Compliance checklist

AWS security

VPC, security groups, encrypted storage

Protect AWS credentials

Protect platform credentials

SOPs for OS and application updates

❑ Audit and logging of the activities outside of platform

❑ Data provenance and lifecycle

Page 61: (SEC313) Security & Compliance at the Petabyte Scale

Thank you!
