How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes w/ MongoDB Atlas on AWS


Upload: mongodb

Post on 16-Apr-2017


TRANSCRIPT

Page 1: How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes w/ MongoDB Atlas on AWS

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

DAT204

How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes with MongoDB & AWS

Page 2

World leader in serving science

Revenues of $17 billion

50,000 employees

50 countries

Page 3

A Mass Spectrometer tells you…

What’s in there and how much

Page 5

Making the world cleaner and safer

Page 6

Mars Organic Molecule Analyzer (MOMA) will take a modified Thermo Linear Ion Trap Mass Spectrometer to Mars in 2020

Page 8

What beer looks like in a mass spec

Page 11

Demo

Page 12

Instrument

MongoDB

MS Instrument Connect

Demo: instrument connect

Page 13

Demo: remote monitoring a mass spectrometer

Page 14

Why does Thermo use MongoDB?

Page 15

Thermo Fisher apps using MongoDB

• Starting on MongoDB
• XML → MongoDB
• Oracle → MongoDB
• SQLite → MongoDB
• Postgres → MongoDB
• Amazon DynamoDB → MongoDB Atlas

Page 16

Scientific apps = humongous data

Page 17

Big molecules = big data

Page 18

instrument {
    UserId : "[email protected]",
    MachineName : "TRACEFINDER8",
    Location : "Austin",
    AcquisitionStationName : "TSQ 8000",
    LastErrorEventDate : "2016-09-05",
    LastErrorEventValue : null,
    RuntimeEstimate : {
        MeasuredElaspedDuration : 0.21966,
        Confidence : HighConfidence
    },
    RunManagerStatus : {
        Status : "Acquire",
        Sequence : "Testosterone",
        SampleName : "Drugx",
        VialPosition : "1",
        Rawfile : "2pg_161029205505",
        Instmethod : "1x.meth",
        Instrument : "TSQ 8000",
        IsPaused : false,
        Operator : "Fred",
    }
}
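As a rough illustration of how a nested document like this one can be addressed, here is a minimal Python sketch. It uses a subset of the fields on the slide; `get_path` and `matches` are hypothetical helpers written for this example, and treating `Confidence` as a string is an assumption. The dotted-path form mirrors the dot notation MongoDB accepts in query filters.

```python
# Subset of the instrument document from the slide, as a Python dict.
instrument = {
    "MachineName": "TRACEFINDER8",
    "Location": "Austin",
    "AcquisitionStationName": "TSQ 8000",
    "RuntimeEstimate": {
        "MeasuredElaspedDuration": 0.21966,  # field name spelled as on the slide
        "Confidence": "HighConfidence",      # assumed to be a string value
    },
    "RunManagerStatus": {
        "Status": "Acquire",
        "SampleName": "Drugx",
        "IsPaused": False,
        "Operator": "Fred",
    },
}


def get_path(doc, dotted_path):
    """Resolve a dotted path such as 'RunManagerStatus.Status' in a nested dict."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc, query):
    """Hypothetical helper: equality-only matching on dotted paths."""
    return all(get_path(doc, path) == expected for path, expected in query.items())


# A filter of this shape could also be handed to a driver's find() call.
acquiring_in_austin = {"Location": "Austin", "RunManagerStatus.Status": "Acquire"}
is_match = matches(instrument, acquiring_in_austin)
```

The whole device state travels as one document, so a status check reads a single record instead of joining several tables.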

Why MongoDB was chosen

• Performance
• Developer productivity
• Cost effective
• Runs anywhere
• Rich feature set
• Achieved legal and regulatory approval

Page 19

MongoDB is a Swiss army knife

• Hierarchical data
• Relational data
• Queues
• File storage
• Device state

Amazon SQS, Amazon S3, Amazon IoT

Page 20

Join example

• Version 3.2 introduced the $lookup operator

• SQL query

• MongoDB C# driver query
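The SQL and C# queries from this slide did not survive in the transcript. As a stand-in, here is a hedged Python sketch of what a `$lookup` stage looks like and what it computes. The collection and field names (`samples`, `compounds`, `compound_id`, `compound_docs`) are hypothetical, and `emulate_lookup` is a pure-Python approximation of the equality form of the stage, not the server's implementation.

```python
# Hypothetical collections: "samples" documents reference "compounds" by compound_id.
lookup_pipeline = [
    {
        "$lookup": {
            "from": "compounds",          # collection to join against
            "localField": "compound_id",  # field in the input documents
            "foreignField": "_id",        # field in the "from" collection
            "as": "compound_docs",        # output array field
        }
    }
]


def emulate_lookup(local_docs, foreign_docs, stage):
    """Approximate an equality $lookup, which behaves like a left outer join."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        key = doc.get(spec["localField"])
        found = [f for f in foreign_docs if f.get(spec["foreignField"]) == key]
        # Every input document is kept; unmatched ones get an empty array.
        joined.append({**doc, spec["as"]: found})
    return joined


samples = [{"_id": 1, "compound_id": 10}, {"_id": 2, "compound_id": 99}]
compounds = [{"_id": 10, "name": "Testosterone"}]
result = emulate_lookup(samples, compounds, lookup_pipeline[0])
```

With a driver such as PyMongo, the same pipeline list would be passed to the collection's `aggregate` method. The closest SQL counterpart is a `LEFT OUTER JOIN` on `samples.compound_id = compounds._id`.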

Page 21

MongoDB has caught up to relational DBs

“Notably, we show that the MUPG (match, unwind, project, group) fragment is already at least as expressive as full relational algebra over (the relational view of) a single collection, and in particular able to express arbitrary joins.”

– Bolzano University in Italy

Page 22

The evolution of MongoDB

Headline Features by Release

1.0 (2009)

2.4 (GA 2013): Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring

2.6 (GA 2014): $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing

3.0 (GA 2015): Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager

3.2 (GA 2015): Document Validation, $lookup, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System

3.4 (GA 2016) and Atlas: Linearizable reads, Intra-cluster compression, Views, Log Redaction, Graph Processing, Decimal, Collations, Faceted Navigation, Spark Connector ++, Zones ++, Aggregation ++, Auto-balancing ++, ARM, Power, zSeries, BI Connector ++, Compass ++, Hardware Monitoring, Server Pool, LDAP Authorization, Encrypted Backups, Cloud Foundry Integration

Page 23

MySQL vs. MongoDB

Page 24

Database schema

MySQL schema

MongoDB schema

Page 25

Inserting data: MongoDB vs. MySQL

• Inserting 1,615 chemical compound records into two parent-child tables.
• To optimize the MySQL query, we turned off foreign keys during insert and used a string builder to create a bulk insert SQL statement. This improved insert performance by a factor of 360.
• Compare to MongoDB.

Database | Milliseconds | Lines of code
MySQL not optimized | 147,600 (2.5 minutes) | 21
MySQL optimized | 410 | 40
MongoDB | 68 | 1
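The slide's actual insert code is not in the transcript. The sketch below only illustrates the general idea of batching parent and child rows instead of inserting them one at a time. It makes several assumptions: Python's built-in `sqlite3` stands in for MySQL, parameterized `executemany` batches replace the string-built SQL statement the slide describes, and the table and field names are invented. The last line checks the "factor of 360" against the table.

```python
import sqlite3

# Hypothetical data: 1,615 compounds, each with two child "fragment" rows.
compounds = [
    {"name": f"compound-{i}", "fragments": [{"mz": 100.0 + i}, {"mz": 200.0 + i}]}
    for i in range(1615)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fragment (compound_id INTEGER REFERENCES compound(id), mz REAL)"
)

# Flatten the nested records into one batch per table.
parent_rows = [(i, c["name"]) for i, c in enumerate(compounds)]
child_rows = [(i, f["mz"]) for i, c in enumerate(compounds) for f in c["fragments"]]

with conn:  # a single transaction covers both batches
    conn.executemany("INSERT INTO compound (id, name) VALUES (?, ?)", parent_rows)
    conn.executemany("INSERT INTO fragment (compound_id, mz) VALUES (?, ?)", child_rows)

parent_count = conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
child_count = conn.execute("SELECT COUNT(*) FROM fragment").fetchone()[0]

# Speed-up reported on the slide: unoptimized vs. optimized MySQL, in milliseconds.
mysql_speedup = 147_600 / 410
```

In a document database the `compounds` list would not need flattening, since each parent carries its children. That is presumably why the MongoDB row in the table needs only one line of code.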

Page 26

Inserting data: MongoDB vs. MySQL

Page 27

Selecting data: MongoDB vs. MySQL

• Query 600,000 rows of SampleCompound result data.
• To optimize the MySQL select query, we created a dictionary to look up child records for each parent. This improved performance by a factor of 300. Optimization effort: 2 engineers and 2 weeks.

Database | Seconds | Lines of code
MySQL not optimized | 2,400 (40 minutes) | 20
MySQL optimized | 8.2 | 29
MongoDB | 17.5 | 7
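The dictionary optimization mentioned above can be sketched as follows. This is an assumption about the general technique (index the child rows by parent key once, then attach them in a single pass), not the team's code; the row shapes and the names `sample_id` and `compounds` are hypothetical.

```python
from collections import defaultdict

# Hypothetical parent and child rows as they might come back from two SELECTs.
samples = [{"id": 1, "name": "S1"}, {"id": 2, "name": "S2"}, {"id": 3, "name": "S3"}]
sample_compounds = [
    {"sample_id": 1, "compound": "Testosterone", "area": 1200.5},
    {"sample_id": 1, "compound": "Cortisol", "area": 830.0},
    {"sample_id": 2, "compound": "Testosterone", "area": 990.1},
]

# Build the lookup once: O(children) instead of rescanning children per parent.
children_by_parent = defaultdict(list)
for row in sample_compounds:
    children_by_parent[row["sample_id"]].append(row)

# Attach children to each parent with a constant-time dictionary access.
results = [
    {**sample, "compounds": children_by_parent.get(sample["id"], [])}
    for sample in samples
]
```

Scanning all child rows for every parent costs on the order of parents × children comparisons. The dictionary brings that down to roughly parents + children, which is the kind of change that can account for a speed-up in the hundreds.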

Page 28

Update: MongoDB vs. MySQL

Page 29

Migrating to MongoDB reduced code by about 3.4x

 | SQLite | MongoDB
Data Layer Lines of Code | 4271 | 1260

Page 30

MongoDB compared to DynamoDB

MongoDB | DynamoDB
Anywhere | AWS
Rich Ad-hoc Query Language + IDE | No Ad-hoc query language
Many operators (Joins, Aggregation, etc.) | Fewer operators
Excellent Performance | Excellent Performance
Easy to deploy (with Atlas) | Easy to Deploy each table
Adding tables requires no configuration changes | Adding tables requires additional configuration and cost
Easy to use from AWS services but not natively integrated | Native integration with AWS Services: IAM, VPC, Lambda, Kinesis
Released in 2009 | Released in 2012

Page 31

MongoDB vs. S3 performance

Downloading a 220 KB object from MongoDB was 7x faster than from Amazon S3 when cold, and 3x faster when warm

 | MongoDB | Amazon S3
Retrieve document first time | 68 ms | 468 ms
Retrieve document second time | 13 ms | 38 ms

Page 32

MongoDB vs. S3 performance

MongoDB was 11x faster than S3 for partial document loading

 | MongoDB | S3
Data size | 400 Bytes | 2.1 MB
Performance | 19 ms | 214 ms
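The headline ratios on the two S3 comparison slides can be checked from the measured times. The short sketch below does only that arithmetic; the function name `speedup` is made up for this example. The data-size row suggests MongoDB returned just the 400 bytes that were requested while S3 had to deliver the whole 2.1 MB object, though the transcript does not state this.

```python
def speedup(slower_ms, faster_ms):
    """Ratio of two latencies measured in milliseconds."""
    return slower_ms / faster_ms


# 220 KB object, first (cold) and second (warm) retrieval: S3 vs. MongoDB.
cold = speedup(468, 68)
warm = speedup(38, 13)

# Partial document loading: S3 vs. MongoDB.
partial = speedup(214, 19)

summary = {name: round(value, 1) for name, value in
           [("cold", cold), ("warm", warm), ("partial", partial)]}
```

Rounded to whole numbers, these come out at 7x, 3x, and 11x, matching the slides.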

Page 33

Reducing processing from days to minutes

Page 34

Frameworks used to parallelize algorithms

• AWS Lambda
• Docker and Amazon ECS
• Spark and Amazon Elastic MapReduce
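All three frameworks apply the same fan-out pattern: split the data into independent units, process them concurrently, and collect the results. The sketch below shows that pattern in a single process with Python's standard `concurrent.futures`, as a stand-in rather than an example of Lambda, ECS, or Spark. The scan data and the `process_scan` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scans: each is a list of (m/z, intensity) peaks.
scans = [
    [(100.1, 5.0), (150.2, 42.0), (200.3, 7.5)],
    [(101.0, 12.0), (151.5, 3.0)],
    [(99.9, 1.0), (175.0, 88.0), (300.2, 64.0)],
]


def process_scan(scan):
    """Hypothetical per-scan work: return the most intense peak."""
    return max(scan, key=lambda peak: peak[1])


def process_all(all_scans, workers=4):
    """Fan the scans out to a worker pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_scan, all_scans))


base_peaks = process_all(scans)
```

Because each scan is processed independently, the same function could in principle be run as one Lambda invocation, container task, or Spark partition per batch of scans.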

Page 35

Parallel data processing

Page 36

Why Atlas?

• Easy
• Performant
• Seamless Migration
• Robust
• No downtime, even when scaling up

Page 37

Building MongoDB Atlas on Amazon Web Services

Page 38

Operations burden

PATCHES

UPGRADES

SECURITY

BACKUPS

RECOVERY

99.999% UPTIME

UPSCALE

DOWNSCALE

PERFORMANCE

UAT

STAGING

MONITORING

ALERTS

PROVISION

CONFIGURE

INSTALL

Page 39

Database as a service for MongoDB

• Automated
• Available On-Demand
• Secure
• Highly Available
• Automated Backups
• Elastically Scalable

Page 40

Fully managed MongoDB clusters

Customer only needs to choose the shape and size of the cluster

● Instance size (CPU and RAM)

● Replication factor

● Number of shards

● Disk space

● Disk speed

Screenshot of create dialog

Cluster features

Page 41

VPC peering

IP address whitelist

SCRAM-SHA-1 authentication

readWriteAnyDatabase

enableSharding

clusterMonitor

SSL

Using well-known CA

Trust system CAs by default

Security features

Page 42

Backup

Automation

Monitoring

Key components

Page 43

AWS Account X—Region Y

VPC (Customer N)

Availability Zone A

Availability Zone B

Availability Zone C

Subnet A Subnet B Subnet C

mongod—27017

mongod—27017

mongod—27017

Customer container with replica set
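One reason to place the three mongod members in three Availability Zones is that a replica set needs a majority of voting members to elect a primary. The sketch below works through that arithmetic; the function names are made up for this example.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, unavailable):
    """True if the reachable members still form a majority."""
    return voting_members - unavailable >= majority(voting_members)


# Three members, one per Availability Zone, as in the diagram.
survives_one_zone_outage = can_elect_primary(3, unavailable=1)
survives_two_zone_outage = can_elect_primary(3, unavailable=2)
```

With three members the majority is two, so the set can lose one zone and still elect a primary, but not two.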

Page 44

AWS Account X—Region Y

VPC (Customer N)

Availability Zone A

Availability Zone B

Availability Zone C

Subnet A Subnet B Subnet C

Customer container with sharded cluster

shard0

S

shard1

S

shard2 config

shard0

S

shard1

S

shard2 config

shard0

S

shard1

S

shard2 config

Page 45

mongod—27017

mongod—27017

mongod—27017

One security group per VPC applied to all Amazon EC2 instances

Three classes of security rules:

● MongoDB traffic between cluster members

● MongoDB traffic between application and clusters

● SSH traffic between production support jump box and EC2 instance

App Server Jump Box

IP firewall using security groups

Page 46

172.31.248.0/21

10.0.0.0/16

VPC peering

Your VPC

Elastic LB

CIDR Block: 10.0.0.0/16

Atlas VPC

AZ 1 AZ 2 AZ 3

CIDR Block: 172.31.248.0/21
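VPC peering requires that the two VPCs use non-overlapping CIDR blocks. The two blocks on this slide can be checked with Python's standard `ipaddress` module; the variable names are chosen for this example only.

```python
import ipaddress

# CIDR blocks as labeled on the slide.
your_vpc = ipaddress.ip_network("10.0.0.0/16")
atlas_vpc = ipaddress.ip_network("172.31.248.0/21")

# Peering only works when the address ranges do not overlap.
blocks_overlap = your_vpc.overlaps(atlas_vpc)

# A /21 leaves 11 host bits, so the Atlas block spans 2**11 addresses.
atlas_addresses = atlas_vpc.num_addresses

# Both ranges fall inside private (RFC 1918) address space.
both_private = your_vpc.is_private and atlas_vpc.is_private
```

If the application VPC already used a range that collides with the Atlas block, one of the two would have to be created with a different CIDR before peering.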

Page 48

“We want Prime to be such a good value, you’d be irresponsible not to be a member.”

– Jeff Bezos

Page 49

Questions?

Page 50

Thank you!

Page 51

Remember to complete your evaluations!