bionimbus - an overview (2010-v6)

An Overview of Bionimbus and the Open Cloud Consortium. Robert Grossman (Open Cloud Consortium; Institute for Genomics & Systems Biology, University of Chicago; Laboratory for Advanced Computing, University of Illinois at Chicago)


Post on 17-May-2015


DESCRIPTION

Bionimbus is an open-source, cloud-based system for managing, analyzing and sharing genomic data.

TRANSCRIPT

Page 1: Bionimbus - An Overview (2010-v6)

An Overview of Bionimbus and the Open Cloud Consortium

Robert Grossman
Open Cloud Consortium

Institute for Genomics & Systems Biology
University of Chicago

Laboratory for Advanced Computing
University of Illinois at Chicago

Page 2: Bionimbus - An Overview (2010-v6)

Part 1. Bionimbus

www.bionimbus.org

Page 3: Bionimbus - An Overview (2010-v6)

Database Services

Analysis Pipelines & Re-analysis

Services

Web Portal & Widgets

Large Data Cloud Services

Ingestion Services

Elastic Cloud Services

Scalable data transport

Page 4: Bionimbus - An Overview (2010-v6)

Case Study 1: Cistrack

• Resource for cis-regulatory data.
• Integrates databases and large data clouds.
• Open source.
• Contains raw, intermediate, and analyzed data from approximately 300 experiments from Agilent, Affymetrix and Solexa platforms.

Page 5: Bionimbus - An Overview (2010-v6)

Flynet Provides Web 2.0 Access to Cistrack

Page 6: Bionimbus - An Overview (2010-v6)

Cube is an Elastic Cloud For Re-analysis

Page 7: Bionimbus - An Overview (2010-v6)

Case Study 2: modENCODE Worm/Fly peak calling reanalysis

[Diagram: private cloud (Eucalyptus & Cube). Virtual machines (App/OS stacks) run on hypervisors on racks of hardware, with working space and simple persistent storage (glusterfs). Access is via ftp and ssh.]

Page 8: Bionimbus - An Overview (2010-v6)

Hybrid Clouds

[Diagram: private / community cloud with virtual machines (App/OS stacks) running on hypervisors on a hardware cluster. A virtual machine containing (small) data & pipelines is also available in the public cloud as ami-efa24c86.]

Page 9: Bionimbus - An Overview (2010-v6)

Case Study 3

SNP concordance:

Alignment against gene models: 46%

TopHat alignment: 91%

71 rare, deleterious SNP genotypes were validated by Sequenom.

• Ran TopHat in Bionimbus using VMs and Cube.
• Total time went from 25 days to 1 day.

Page 10: Bionimbus - An Overview (2010-v6)

Bionimbus Delivery Mechanisms

• Log in and use the Bionimbus cloud.
• Use Bionimbus Virtual Machine Images in a) your private cloud; b) the Bionimbus cloud; c) public clouds such as Amazon.
• Bionimbus is open source and you can build your own cloud (and interoperate with ours). (First release of integrated system 3Q 2010.)
• Bionimbus data services for genomic data, even for large datasets.

Page 11: Bionimbus - An Overview (2010-v6)

HPC. Goal: Minimize latency and control heat.

Large Data Clouds. Goal: Maximize data (with matching compute) and control cost.

Elastic Clouds. Goal: Minimize cost of virtualized machines & provide on-demand.

Page 12: Bionimbus - An Overview (2010-v6)

A successful cloud will…

• Persist & refresh data over the long term
• High speed network to move & share the data
• Web 2.0/3.0 user interface
• Compute services at the scale of a data center

Page 13: Bionimbus - An Overview (2010-v6)

Part 2.

www.opencloudconsortium.org


Page 14: Bionimbus - An Overview (2010-v6)

• 501(c)(3) not-for-profit corporation.
• Develops standards, interoperability frameworks, and reference implementations.
• Operates clouds.
• Develops benchmarks.
• One area of focus: bridge between private and public clouds.

www.opencloudconsortium.org

Page 15: Bionimbus - An Overview (2010-v6)

Operates Clouds

• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Cloud-based Disaster Relief Services
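
To give a feel for the scale quoted on this slide (500 nodes, 3000 cores, 1.5+ PB, 10 Gbps), the following is a rough back-of-the-envelope sketch. It assumes decimal units, treats 1.5 PB as an exact figure although the slide gives it as a lower bound, and assumes a single, fully utilized 10 Gbps link with no protocol overhead. None of these assumptions come from the slides, and the function names are illustrative only.

```python
# Figures quoted on the slide.
NODES = 500
CORES = 3000
STORAGE_PB = 1.5   # the slide says "1.5+ PB", so this is a lower bound
LINK_GBPS = 10

# Assumption: decimal units (1 PB = 10**15 bytes, 1 Gbps = 10**9 bits/s).
BYTES_PER_PB = 10**15
BITS_PER_GBPS = 10**9
SECONDS_PER_DAY = 86400


def cores_per_node(cores, nodes):
    """Average number of cores per node."""
    return cores / nodes


def storage_tb_per_node(storage_pb, nodes):
    """Average storage per node in TB (1 PB = 1000 TB)."""
    return storage_pb * 1000 / nodes


def transfer_days(storage_pb, link_gbps):
    """Idealized time, in days, to move the full store over one link.

    Assumes the link is fully utilized with no protocol overhead.
    """
    bits = storage_pb * BYTES_PER_PB * 8
    seconds = bits / (link_gbps * BITS_PER_GBPS)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"cores per node:   {cores_per_node(CORES, NODES):.1f}")
    print(f"storage per node: {storage_tb_per_node(STORAGE_PB, NODES):.1f} TB")
    print(f"full copy at {LINK_GBPS} Gbps: {transfer_days(STORAGE_PB, LINK_GBPS):.1f} days")
```

Under these assumptions a full copy of the store takes on the order of two weeks even on a dedicated 10 Gbps link, which is one way to read the emphasis elsewhere in the talk on scalable data transport and on keeping compute next to the data.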

Page 16: Bionimbus - An Overview (2010-v6)

OCC Members

• Companies: Yahoo, Cisco, Aerospace Corp., Booz Allen Hamilton, InfoBlox, Open Data Group, Raytheon

• Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Chicago, University of Illinois at Chicago

• Government agencies: NASA


Page 17: Bionimbus - An Overview (2010-v6)

Open Cloud Consortium Perspective

• Vendor neutral
• Open, interoperable architecture
• Experiment at scale
• Operate infrastructure at the scale of a small data center
• Long term point of view (think like a library not cloud service provider)
• Think public, private & hybrid clouds

Page 18: Bionimbus - An Overview (2010-v6)

Raywulf rack

Condo Clouds

Page 19: Bionimbus - An Overview (2010-v6)

Open Cloud Testbed

Phase 2
• 9 racks
• 250+ nodes
• 1000+ cores
• 10+ Gb/s

[Diagram labels: networks MREN, CENIC, Dragon, C-Wave; software Hadoop, Sector/Sphere, Thrift, KVM VMs, Eucalyptus VMs.]

Page 20: Bionimbus - An Overview (2010-v6)

Open Science Data Cloud

• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief

Page 21: Bionimbus - An Overview (2010-v6)

[Architecture diagram: a layered cloud stack labeled IaaS, PaaS, and Apps. Components shown: Applications, Data Services, Storage Services, Compute Services, Cloud Metadata Services, Identity Manager, Virtual Machine Manager, Virtual Network Manager, Network Transport.]

Page 22: Bionimbus - An Overview (2010-v6)

Standards

Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)

Platform as a Service
– Cloud Compute Services
– Data/Table Cloud Services
– Cloud Storage Services

Open Virtualization Format (OVF)

Open Cloud Computing Interface (OCCI)

SNIA Cloud Data Management Interface (CDMI)

Large Data Cloud Interoperability Framework

Page 23: Bionimbus - An Overview (2010-v6)

OCC Benchmarks

                       MalStone A    MalStone B
Large Data Cloud 1a    455m 13s      840m 50s
Large Data Cloud 1b    87m 29s       142m 32s
Large Data Cloud 2     33m 40s       43m 44s

There are surprises.
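
The size of the gaps in the benchmark table is easier to see as ratios. The sketch below converts the quoted running times to seconds and expresses each configuration's time relative to a chosen baseline. The timings are copied from the slide. The helper names (`to_seconds`, `speedups`) and the choice of "Large Data Cloud 1a" as the baseline are illustrative assumptions, not part of the OCC benchmark.

```python
import re

# Running times as quoted on the slide: (MalStone A, MalStone B).
TIMINGS = {
    "Large Data Cloud 1a": ("455m 13s", "840m 50s"),
    "Large Data Cloud 1b": ("87m 29s", "142m 32s"),
    "Large Data Cloud 2": ("33m 40s", "43m 44s"),
}


def to_seconds(text):
    """Convert a string such as '87m 29s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def speedups(timings, baseline):
    """Return {name: (ratio_A, ratio_B)} of baseline time over each time."""
    base_a, base_b = (to_seconds(t) for t in timings[baseline])
    return {
        name: (base_a / to_seconds(a), base_b / to_seconds(b))
        for name, (a, b) in timings.items()
    }


if __name__ == "__main__":
    # Hypothetical choice: use the slowest configuration as the baseline.
    for name, (ratio_a, ratio_b) in speedups(TIMINGS, "Large Data Cloud 1a").items():
        print(f"{name}: MalStone A {ratio_a:.1f}x, MalStone B {ratio_b:.1f}x")
```

Read this way, the fastest configuration is more than an order of magnitude quicker than the slowest on both benchmarks, and the gap is wider on MalStone B than on MalStone A.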

Page 24: Bionimbus - An Overview (2010-v6)

Acknowledgements

Page 25: Bionimbus - An Overview (2010-v6)

Thank You

• For more information:
– www.bionimbus.org
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)