2014 BioIT Trends From The Trenches


Upload: the-bioteam-inc

Post on 15-Jun-2015


TRANSCRIPT

Page 1: 2014 BioIT Trends From The Trenches

1

Trends from the trenches: 2014

slideshare.net/chrisdag/ [email protected] @chris_dag #BioIT14

Wednesday, April 30, 14

Page 2: 2014 BioIT Trends From The Trenches

2

I’m Chris. I’m an infrastructure geek.

I work for the BioTeam.

Page 3: 2014 BioIT Trends From The Trenches

Apologies in advance

3

If you have not heard me speak ...

‣ ‘Infamous’ for speaking very fast and carrying a huge slide deck

‣ In 2014 CHI finally gave up and just gave me a 60min talk slot

‣ Aiming to end with enough time for questions & discussions

By the time you see this slide I’ll be on my ~4th espresso


Page 4: 2014 BioIT Trends From The Trenches

Who, What, Why ...

4

BioTeam

‣ Independent consulting shop

‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done

‣ 12+ years bridging the “gap” between science, IT & high performance computing

‣ Our wide-ranging work is what gets us invited to speak at events like this ...


Page 5: 2014 BioIT Trends From The Trenches

5

Why I do this talk every year ...

‣ BioTeam works for everyone

• Pharma, Biotech, EDU, Nonprofit, .Gov, etc.

‣ We get to see how groups of smart people approach similar problems

‣ We can speak honestly & objectively about what we see “in the real world”


Page 6: 2014 BioIT Trends From The Trenches

Listen to me at your own risk

6

Standard Disclaimer

‣ I’m not an expert, pundit, visionary or “thought leader”

‣ There are ~2000 smart people at this event; I don’t presume to speak for us as a whole

‣ All career success entirely due to shamelessly copying what actual smart people do

‣ I’m biased, burnt-out & cynical

‣ Filter my words accordingly


Page 7: 2014 BioIT Trends From The Trenches

7

What’s new?

I’ve seen your slides before. <yawn>

Page 8: 2014 BioIT Trends From The Trenches

aka ‘spreading the blame ...’

8

What’s new 1: Acknowledgements

‣ This talk used to be made in a vacuum each year

• ... often mere minutes before the scheduled talk time

‣ Not this year

• Heavily influenced by peer group of smarter people who get chatty when given beer

‣ Non-comprehensive blame gang:

• Ari Berman

• Aaron Gardner

• Adam Kraut

• Chris Botka (Harvard)

• Chris Dwan (Broad)

• James Cuff (Harvard)

• ... many more ...

Page 9: 2014 BioIT Trends From The Trenches

What has not changed in recent talks

Not new 2: Recycled Content

‣ The core Bio-IT ‘meta’ issue remains unchanged

‣ Minor updates to report for cloud landscape

‣ Compute landscape largely unchanged

• ... a few updates to share in this space but nothing earth shattering


Page 10: 2014 BioIT Trends From The Trenches

10

Why are we all here?


Page 11: 2014 BioIT Trends From The Trenches

11

The #1 ‘meta issue’ is unchanged in 2014


Page 12: 2014 BioIT Trends From The Trenches

12

It’s a risky time to be doing Bio-IT


Page 13: 2014 BioIT Trends From The Trenches

13

Meta: Science evolving faster than IT can refresh infrastructure & practices


Page 14: 2014 BioIT Trends From The Trenches

This is what keeps Bio-IT folks up at night

The Central Problem Is ...

‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure

• Bench science is changing month-to-month ...

• ... while our IT infrastructure only gets refreshed every 2-7 years

‣ Our job is to design systems TODAY that can support unknown research requirements & workflows over multi-year spans (gulp ...)

Page 15: 2014 BioIT Trends From The Trenches

The Central Problem Is ...

‣ The easy period is over

‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary

‣ That does not work any more; real solutions required


Page 16: 2014 BioIT Trends From The Trenches

16

This is our “new normal” for informatics


Page 17: 2014 BioIT Trends From The Trenches

17

The Central Problem Is ...

‣ Lab technology is being refreshed, upgraded and replaced at an astonishing rate

• Bigger, faster, parallel

• Requiring increasingly sophisticated IT support

• Cheap and easily obtainable

Page 18: 2014 BioIT Trends From The Trenches

18

The Central Problem Is ...

‣ ... and IT still being caught by surprise in 2014

• Procurement practices and cheaper instrument prices result in situations where IT is bypassed or not consulted in advance


Page 19: 2014 BioIT Trends From The Trenches

True Story - 48 Hours Ago


Page 20: 2014 BioIT Trends From The Trenches

A conversation with a client Just 48 hours ago ...

‣ Scientists tell IT that they are getting a new PacBio sequencing platform

• Gave IT a 5-node cluster quote that PacBio provided as blueprint for SMRT Portal

• Wanted confirmation that everything was cool with IT support


Page 21: 2014 BioIT Trends From The Trenches

A conversation with a client Just 48 hours ago ...

‣ Partial “Minor” Issue List:

• Scientists had no clue about power requirements. A pair of 60amp 220v power outlets = multi-month facility project

• ... assumed IT would be cool accepting and supporting a one-off HPC system sized for 1 instrument & 1 workgroup

• ... also appeared to believe that storage was infinite and free. At least that is what their budget assumed.

Page 22: 2014 BioIT Trends From The Trenches

One more thing ...


Page 23: 2014 BioIT Trends From The Trenches

We can’t blame the science/lab side for everything

One more thing ...

‣ Can’t blame the lab-side for all our woes

‣ IT innovation is causing headaches in research and program management

‣ Grant funding agencies, regulatory rules and internal risk/program management practices not updated to reflect current and emerging IT capabilities, architectures & practices

• Rules & policies often simply do not cover what we are capable of doing right now


Page 24: 2014 BioIT Trends From The Trenches

24

A related problem ...


Page 25: 2014 BioIT Trends From The Trenches

This also hurts ...

‣ It has never been cheaper or easier to acquire vast amounts of data

‣ Growth rate of data creation/ingest exceeds rate at which the storage industry is improving disk capacity

‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers

• ... ideally without punching holes in your firewall or consuming all available internet bandwidth


Page 26: 2014 BioIT Trends From The Trenches

The future is not looking pretty for the ill prepared


Page 27: 2014 BioIT Trends From The Trenches

High Costs For Getting It Wrong

‣ Lost opportunity

‣ Missing capability

‣ Frustrated & very vocal scientific staff

‣ Problems in recruiting, retention, publication & product development


Page 28: 2014 BioIT Trends From The Trenches

28

Enough groundwork. Let’s Talk Trends

Page 29: 2014 BioIT Trends From The Trenches

29

Trends: DevOps & Org Charts


Page 30: 2014 BioIT Trends From The Trenches

30

The social contract between scientist and IT is changing forever

Page 31: 2014 BioIT Trends From The Trenches

31

You can blame “the cloud” for this


Page 32: 2014 BioIT Trends From The Trenches

32

DevOps & Scriptable Everything

‣ On (real) clouds, EVERYTHING has an API

‣ If it’s got an API you can automate and orchestrate it

‣ “scriptable datacenters” are now a very real thing


Page 33: 2014 BioIT Trends From The Trenches

33

DevOps & Scriptable Everything

‣ Incredible innovation in the past few years

‣ Driven mainly by companies with massive internet ‘fleets’ to manage

‣ ... but the benefits trickle down to us mere mortals


Page 34: 2014 BioIT Trends From The Trenches

34

DevOps will conquer the enterprise

‣ Over the past few years cloud automation/orchestration methods have been trickling down into our local infrastructures

‣ This will have significant impact on careers, job descriptions and org charts


Page 35: 2014 BioIT Trends From The Trenches

2014: Continue to blur the lines between all these roles

35

Scientist/SysAdmin/Programmer

‣ Radical change in how IT is provisioned, delivered, managed & supported

• Technology Driver: Virtualization & Cloud

• Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation

‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant

www.opscode.com


Page 36: 2014 BioIT Trends From The Trenches

2014: Continue to blur the lines between all these roles

36

Scientist/SysAdmin/Programmer

‣ When everything has an API ...

‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely

‣ And by the way ...

‣ The APIs (‘knobs & buttons’) are accessible to all, not just the expert practitioners sitting in that room next to the datacenter


Page 37: 2014 BioIT Trends From The Trenches

2014: Continue to blur the lines between all these roles

37

Scientist/SysAdmin/Programmer

‣ IT jobs, roles and responsibilities are changing

‣ SysAdmins must learn to program in order to harness automation tools

‣ Programmers & Scientists can now self-provision and control sophisticated IT resources


Page 38: 2014 BioIT Trends From The Trenches

2014: Continue to blur the lines between all these roles

38

Scientist/SysAdmin/Programmer

‣ My take on the future ...

• SysAdmins (Windows & Linux) who can’t code will have career issues

• Far more control is going into the hands of the research end user

• IT support roles will radically change -- no longer owners or gatekeepers

‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices

‣ Research will control the “what”, “when” and “how big”

Page 39: 2014 BioIT Trends From The Trenches

2014 Summary

Trend: DevOps & Automation

‣ Almost every HPC project (all sizes) BioTeam worked on in 2014 included

• A bare-metal OS provisioning service (Cobbler, etc.)

• A ‘next-gen’ configuration management service (Chef, Puppet, Saltstack, etc.)

‣ Gut feeling: This is going to be very useful for regulated environments

• Not BS or empty hype: IT infrastructure and server/OS/service configuration encoded as text files

• Easy to version control, audit, revert, rebuild, verify and fold into existing change management & documentation systems
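To make the "configuration encoded as text files" point concrete, here is a minimal sketch of why text is easy to audit and verify. It is not taken from the talk or from any real Chef, Puppet or Salt repository: the node definition, its keys and the function names are all hypothetical, and only the Python standard library is used.

```python
import difflib
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a short SHA-256 digest identifying one exact config revision."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()[:12]


def audit_diff(approved: str, deployed: str) -> list[str]:
    """Return unified-diff lines showing how a deployed config differs from the approved one."""
    return list(difflib.unified_diff(
        approved.splitlines(), deployed.splitlines(),
        fromfile="approved", tofile="deployed", lineterm=""))


# Hypothetical node definition (illustration only).
APPROVED = "role: compute\nntp_server: ntp.example.org\nsshd_root_login: no\n"
DEPLOYED = "role: compute\nntp_server: ntp.example.org\nsshd_root_login: yes\n"

if __name__ == "__main__":
    print("approved:", fingerprint(APPROVED))
    print("deployed:", fingerprint(DEPLOYED))
    print("\n".join(audit_diff(APPROVED, DEPLOYED)))
```

An empty diff means the deployed state matches the approved text, which is the kind of evidence a change-management or audit process can consume directly.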

Page 40: 2014 BioIT Trends From The Trenches

40

Trends: Compute


Page 41: 2014 BioIT Trends From The Trenches

Compute related design patterns largely static

41

Core Compute

‣ Linux compute clusters are still the baseline compute platform

‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers

‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014


Page 42: 2014 BioIT Trends From The Trenches

Defensive hedge against Big Data / HDFS

42

Compute: Local Disk Matters

‣ This slide is from 2013; trend is continuing

‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available

‣ Why? Hadoop & Big Data

‣ This is a defensive hedge against future HDFS or similar requirements

• Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play.

‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count


Page 43: 2014 BioIT Trends From The Trenches

Faster networks are driving compute config changes

43

Compute: NICs and Disks

‣ One pain point for me in 2013-2014:

• Network links to my nodes are getting faster

• It’s embarrassing my disks are slower than the network feeding them

• Need to be careful about selecting and configuring high speed NICs

- Example: that dual-port 10Gig card may not actually be able to drive both ports if the card was engineered for an active:passive link failover scenario

• Also need to re-visit local disk configurations
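A rough back-of-the-envelope sketch of the "disks slower than the network" problem. The per-disk streaming rate below is an assumption chosen for illustration (it is not a figure from the talk), and protocol overhead is ignored.

```python
import math

# ASSUMPTION: sustained sequential throughput of one spinning disk, in MB/s.
ASSUMED_DISK_MB_S = 150.0


def link_bytes_per_sec(gigabits: float) -> float:
    """Convert a nominal link speed in Gbit/s to bytes/s (no protocol overhead)."""
    return gigabits * 1e9 / 8


def spindles_to_match(gigabits: float, disk_mb_per_sec: float = ASSUMED_DISK_MB_S) -> int:
    """Number of disks that must stream in parallel to keep up with the link."""
    return math.ceil(link_bytes_per_sec(gigabits) / (disk_mb_per_sec * 1e6))


if __name__ == "__main__":
    for speed in (1, 10, 40):
        print(f"{speed:>3} Gbit/s link needs about {spindles_to_match(speed)} disks streaming in parallel")
```

Under that assumed disk rate, a single 10 Gigabit link already outruns a handful of local spindles, which is why local disk layout deserves a second look when the NICs get upgraded.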

Page 44: 2014 BioIT Trends From The Trenches

New and refreshed HPC systems running many node types

44

Compute: Huge trend in ‘diversity’

‣ Accelerated trend since at least 2012 ...

• HPC compute resources no longer homogeneous; many types and flavors now deployed in single HPC stacks

‣ Newer clusters mix-and-match to match the known use cases:

• GPU nodes for compute

• GPU nodes for visualization

• Large memory nodes (512GB +)

• Very Large memory nodes (1TB +)

• ‘Fat’ nodes with many CPU cores

• ‘Thin’ nodes with super-fast CPUs

• Analytic nodes with SSD, FusionIO, flash or large local disk for ‘big data’ tasks

Page 45: 2014 BioIT Trends From The Trenches

GPUs, Coprocessors & FPGAs

45

Compute: Hardware Acceleration

‣ Specialized hardware acceleration has its place but will not take over the world

• “... the activation energy required for a scientist to use this stuff is generally quite high ...”

‣ GPU, Phi and FPGA best used in large scale pipelines or as specific solution to a singular pain point


Page 46: 2014 BioIT Trends From The Trenches

Compute: Big Data & Analytics

‣ BioTeam is starting to build “Big Data” labs and environments for clients

‣ The most interesting trend:

• We are not designing for specific analytic use cases; in most projects we are adding in basic “capabilities” with the expectation that the apps and users will come later

• ... defensive IT hedge against rapidly changing science requirements, remember?

Page 47: 2014 BioIT Trends From The Trenches

Compute: Big Data & Analytics

‣ This translates to infrastructure designed to support certain capabilities rather than specific software or application.

‣ Example:

• Beefy HDFS friendly servers

• 100% bare metal provisioning and dynamic system reconfiguration

• Systems for ingest

• Very large RAM systems

• Big PCIx bus systems

• Memory-resident database systems

• Mix of very fast and capacity optimized storage

• Very fast core, top-of-rack and server networking

Page 48: 2014 BioIT Trends From The Trenches

Also known as hybrid clouds

Emerging Trend: Hybrid HPC

‣ No longer “utter crap” or “cynical vendor-supported reference case”

• small local footprint

• large, dynamic, scalable, orchestrated public cloud component

‣ DevOps is key to making this work

‣ High-speed network to public cloud required

‣ Software interface layer acting as the mediator between local and public resources

‣ Good for tight budgets, has to be done right to work

‣ Still best approached very carefully

Page 49: 2014 BioIT Trends From The Trenches

BioIT World Homework

‣ We’ve got interesting hardware vendors on the show floor this week; check them out

• Silicon Mechanics, Thinkmate, Microway: cool commodity

• Intel, IBM, Dell, SGI: Large & enterprise

• Timelogic: hardware acceleration

• ...

Page 50: 2014 BioIT Trends From The Trenches

50

Trends: Network


Page 51: 2014 BioIT Trends From The Trenches

51

Big trouble ahead ...


Page 52: 2014 BioIT Trends From The Trenches

52

Network: Speed @ Core and Edge

‣ Huge potential pain point

‣ May surpass storage as our #1 infrastructure headache

‣ Petascale data is useless if you can’t move it or access it fast enough

‣ Don’t be smug about 10 Gigabit - folks need to start thinking *now* about 40 and even 100 Gigabit Ethernet

‣ You may need 10Gig to some desktops for data ingest/export
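To put rough numbers on "petascale data is useless if you can’t move it", the sketch below estimates best-case transfer times at different link speeds. It assumes decimal units (1 PB = 10^15 bytes) and a perfectly utilized link unless an `efficiency` factor is given; both are simplifying assumptions, not figures from the talk.

```python
PETABYTE = 1e15  # bytes, decimal units assumed


def transfer_days(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case days needed to move size_bytes over a link of link_gbps Gbit/s.

    efficiency is the fraction of nominal bandwidth actually achieved (1.0 = ideal).
    """
    seconds = size_bytes * 8 / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100):
        print(f"1 PB over {gbps:>3} Gbit/s: {transfer_days(PETABYTE, gbps):6.1f} days")
```

Even with an ideal 10 Gigabit path, a petabyte takes more than a week to move, and real-world efficiency is usually well below 1.0.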


Page 53: 2014 BioIT Trends From The Trenches

53

Network: Speed @ Core and Edge

‣ Remember ~2004 when research storage requirements started to dwarf what the enterprise was using?

‣ Same thing is happening now for networking

‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on


Page 54: 2014 BioIT Trends From The Trenches

Massive data movement needs are driving innovation pain

This is going to be painful

‣ Enterprise networking folks are even more aloof than storage admins we battled in ’04

‣ Often used to driving requirements and methods; unhappy when science starts to drive them out of their comfort zones

‣ Research needs to start pushing harder and faster for network speeds above 10GbE

• This will take a long time so start now!

Page 55: 2014 BioIT Trends From The Trenches

Not sure how this will play out

‣ It will be interesting to see what large-scale data movement does to our local infrastructure and desktop experience

‣ Especially with other trends like BYOD

‣ My $.02

• Speeds to our desktops are going to get very fast, or

• We give up on growing massive bandwidth to the client and embrace a full VDI model where the users just “remote desktop” into a well-networked scientific informatics environment

Page 56: 2014 BioIT Trends From The Trenches

BioIT World Homework

‣ Visit the Internet2 booth to chat high speed networking

• Ask about their free or low-cost training events and technical workshops; start thinking about how you can get your internal networking teams/leadership to attend

• Ask them about the new trend of private/corporate links into I2 and other fast research networks

‣ Arista is here. Talking and exhibiting. They are not Cisco. Listen, visit & talk to them.


Page 57: 2014 BioIT Trends From The Trenches

Significant new trend in networking

Science DMZs

Page 58: 2014 BioIT Trends From The Trenches

It’s real and becoming necessary

Network: ‘ScienceDMZ’

‣ BioTeam building them in 2014 and beyond

‣ Central premise:

• Legacy firewall, network and security methods architected for “many small data flows” use cases

• Not built to handle smaller #s of massive data flows

• Also very hard to deploy ‘traditional’ security gear on 10Gigabit and faster networks

‣ More details, background & documents at http://fasterdata.es.net/science-dmz/

[Figure: network diagram with 10GE links, labeled “Background traffic or competing bursts” and “DTN traffic with wire-speed bursts”]

Page 59: 2014 BioIT Trends From The Trenches

Network: ‘ScienceDMZ’

‣ Start thinking/discussing this sooner rather than later

‣ Just like “the cloud” this may fundamentally change internal operations and technology

‣ Will also require conscious buy-in and support from senior network, security and risk management professionals

• ... these talks take time. Best to plan ahead


Page 60: 2014 BioIT Trends From The Trenches

Network: ‘ScienceDMZ’

‣ A Science DMZ has 3 required components:

1. Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows

2. Dedicated, high performance data transfer nodes (“DTNs”) highly optimized for high speed data xfer

3. Dedicated network performance/measurement nodes

Page 61: 2014 BioIT Trends From The Trenches

Network: ‘ScienceDMZ’

‣ Implementation specifics are complex; the basic concept is not:

1. Research need to move scientific data at high speeds is already being negatively affected by networks not designed for this requirement

2. Likely to force fundamental changes in core enterprise architectures on a similar disruptive scale as what genome data storage forced in ~2004

3. Firewalls/IDS and security in particular will be affected

Page 62: 2014 BioIT Trends From The Trenches

62

Simple Science DMZ:

Image source: “The Science DMZ: Introduction & Architecture” -- esnet


Page 63: 2014 BioIT Trends From The Trenches

Network: ‘ScienceDMZ’

‣ My gut feeling:

1. The fanciest and most complex Science DMZ architectures in the literature right now are not suitable for our world

• Expensive specialized equipment; expensive specialist staff expertise required

• Often still experimental, not something enterprise IT would want to drop into a production environment

2. Science DMZ concepts are sound and simple implementations are possible today

3. Start small:

• Incorporate these sorts of concepts/ideas into long term planning ASAP

• Start adding network performance monitoring nodes to research networks, DMZs and external circuit connections now; this entire concept falls over without actionable flow and performance data

• Start work on policies and procedures for manual bypass of firewall/IDS rules when known sender/receivers are freighting high speed data; automation comes later!

Page 64: 2014 BioIT Trends From The Trenches

BioIT World Homework

‣ Bookmark http://fasterdata.es.net and check out the published materials and advice

‣ Monitor http://www.oinworkshop.com/ to see when a workshop/event may be coming near you (send your networking people ...)

‣ Both ESNet and Internet2 run training and technical workshops that deliver far more value for price than the usual training junkets


Page 65: 2014 BioIT Trends From The Trenches

Check out this talk

BioIT World Homework

‣ Track 1 - 3:10pm today:

• Christian Todorov talks “Accelerating Biomedical Research Discovery: The 100G Internet2 Network – Built and Engineered for the Most Demanding Big Data Science Collaborations”

Page 66: 2014 BioIT Trends From The Trenches

Not very significant trend in 2014:

Software Defined Networking (“SDN”)

Page 67: 2014 BioIT Trends From The Trenches

More hype than useful reality at the moment

67

Network: SDN Hype vs. Reality

‣ Software Defined Networking (“SDN”) is the new buzzword

‣ It WILL become pervasive and will change how we build and architect things

‣ But ...

‣ Not hugely practical at the moment for most environments

• We need far more than APIs that control port forwarding behavior on switches

• More time needed for all of the related technologies and methods to coalesce into something broadly useful and usable


Page 68: 2014 BioIT Trends From The Trenches

More hype than useful reality at the moment

68

Network: SDN

‣ My gut feeling:

• It is the future but right now we are still in the “mostly empty hype” phase if you wanna be cynical about it; best to wait and watch

• Production enterprise use: OpenFlow and similar stuff does not provide value relative to implementation effort right now

• Best bang for the buck in ’14 will be getting ‘SDN’ features as part of some other supported stack

- OpenStack, VMWare, Cloud, etc.

Page 69: 2014 BioIT Trends From The Trenches

69

Trends: Storage


Page 70: 2014 BioIT Trends From The Trenches

70

Storage

‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments

‣ Many of the pain points we’ve talked about for years are still in place:

• Explosive growth forcing tradeoffs in capacity over performance

• Lots of monolithic single tiers of storage

• Critical need to actively manage data through its full life cycle (just storing data is not enough ...)

• Need for post-POSIX solutions such as iRODS and other metadata-aware data repositories

Page 71: 2014 BioIT Trends From The Trenches

71

Storage Trends

‣ The large but monolithic storage platforms we’ve built up over the years are no longer sufficient

• Do you know how many people are running a single large scale-out NAS or parallel filesystem? Most of us!

‣ Tiered storage is now an absolute requirement

• At a minimum we need an active storage tier plus something far cheaper/deeper for cold files

‣ Expect the tiers to involve multiple vendors, products and technologies

• The Tier1 storage vendors tend to have higher-end pricing for their “all in one” tiered data management solutions


Page 72: 2014 BioIT Trends From The Trenches

72

Storage - The Old Way

‣ Single tier of scale-out NAS or parallel FS

‣ Why?

• Suitable for broadest set of use cases

• Easy to procure/integrate

• Lowest administrative & operational burden

‣ Example:

• 400TB - 1PB of ‘something’ stores ‘everything’

Page 73: 2014 BioIT Trends From The Trenches

73

Storage - The New Way

‣ Multiple tiers; potentially from multiple vendors

‣ Why?

• Way more cost efficient (size the tier to the need)

• Single tier no longer capable of supporting all use cases and workflow patterns

• Single tiers waste incredible money at large scale

‣ Example:

• 10-40 TB SSD/Flash for ingest & IOPS-sensitive workloads

• 50-400 TB tier (SATA/SAS/SSD mix) for active processing

• Multi-petabyte tier (Cloud, Object, SATA) for cost and operationally efficient long term (yet reachable) storage of scientific data at rest
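One way to picture the three-tier example is as a placement policy. The sketch below is purely illustrative: the `FileInfo` fields, the tier names and the 90-day threshold are hypothetical choices, not anything prescribed in the talk, and a real policy would more likely live in a data management layer than in a script.

```python
from dataclasses import dataclass

# ASSUMPTION: files untouched for longer than this are treated as "cold".
COLD_AFTER_DAYS = 90


@dataclass
class FileInfo:
    path: str
    days_since_access: int
    iops_sensitive: bool = False


def pick_tier(f: FileInfo) -> str:
    """Map a file to one of the three example tiers."""
    if f.iops_sensitive:
        return "flash"            # ingest & IOPS-sensitive workloads
    if f.days_since_access <= COLD_AFTER_DAYS:
        return "active"           # active processing tier
    return "cheap-and-deep"       # long-term, still reachable data at rest


if __name__ == "__main__":
    samples = [
        FileInfo("/ingest/run42.raw", 0, iops_sensitive=True),
        FileInfo("/projects/aln/sample.bam", 12),
        FileInfo("/archive/2011/reads.fastq.gz", 800),
    ]
    for s in samples:
        print(f"{s.path} -> {pick_tier(s)}")
```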

Page 74: 2014 BioIT Trends From The Trenches

Sticking 100% with Tier 1 vendors gets expensive

74

Storage: Disruptive stuff ahead

‣ BioTeam has built 1 petabyte ZFS-based storage pools from commodity whitebox kit for about $100,000 in direct hardware costs (engineering effort & admin not included in this price ...)

‣ There are many storage vendors in the middle tier who can provide storage systems that are less ‘risky’ than DIY homebuilt setups yet far less expensive than the traditional Tier 1 enterprise storage options

• Several of these vendors are here at the show!

‣ Companies like Avere Systems are producing boxes that unify disparate storage tiers and link them to cloud and object stores

• This is a route to unifying “tier 1” storage with the “cheap & deep” storage


Page 75: 2014 BioIT Trends From The Trenches

Infinidat aka http://izbox.com The new thumper.

‣ 1 petabyte usable NAS shipped as a single integrated rack

• List price: $500 per usable terabyte

‣ More expensive than DIY ZFS on commodity chassis but less expensive than current mainstream products

‣ Lots of interesting use cases for ‘cheap & deep’
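For a quick sense of scale, the arithmetic below compares the two price points quoted in this talk: roughly $100,000 of hardware for a 1 PB DIY ZFS pool, and a $500 per usable terabyte list price for the integrated rack. It assumes 1 PB = 1000 TB. The talk does not say whether the DIY petabyte is raw or usable, and the DIY figure excludes engineering and admin effort, so treat the ratio as a rough sketch rather than a like-for-like comparison.

```python
TB_PER_PB = 1000  # decimal units assumed


def cost_per_tb(total_cost_usd: float, capacity_tb: float) -> float:
    """Simple cost-per-terabyte figure."""
    return total_cost_usd / capacity_tb


# Figures quoted in the talk.
diy_zfs_per_tb = cost_per_tb(100_000, 1 * TB_PER_PB)  # hardware only
integrated_rack_per_tb = 500.0                         # list price per usable TB

if __name__ == "__main__":
    print(f"DIY ZFS (hardware only): ${diy_zfs_per_tb:,.0f} per TB")
    print(f"Integrated rack: ${integrated_rack_per_tb:,.0f} per usable TB, "
          f"${integrated_rack_per_tb * TB_PER_PB:,.0f} per PB")
    print(f"Ratio: {integrated_rack_per_tb / diy_zfs_per_tb:.1f}x")
```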


Page 76: 2014 BioIT Trends From The Trenches

Avere Systems

Wait, I can DO that?

‣ These folks caught my eye in late 2013 for one very specific use case

‣ Since then I keep them in mind for 4-5 common problems I regularly face

‣ It can:

• Add performance layer on top of storage bought to be “cheap & deep”

• Virtualize many NAS islands into a single namespace

• Replicate & move data between tiers and sites

• Act as CIFS/NFS gateway to on-premise or offsite object stores ***

• Treat Amazon S3 and Glacier as simply another storage tier fully integrated into your environment

Page 77: 2014 BioIT Trends From The Trenches

Object Storage

‣ Object storage is the future for scientific data at rest

• Total no brainer; makes more sense than the “files and folders” paradigm, especially for automated analysis

• Plus Amazon does it for super cheap

‣ But ... There will be a long transition period due to all of our legacy codes and workflows

• This is where gateway devices can play

‣ It can:

• Provide a much better workflow design pattern than assuming “files and folders” data storage

• Save millions of dollars via efficiencies of erasure coding

• Provide a much more robust and resilient peta-scale storage framework

• Hide behind a metadata-aware layer such as iRODS to provide very interesting capabilities

Page 78: 2014 BioIT Trends From The Trenches

Object Storage

‣ Erasure coding distributed object stores are very interesting at peta-scale ...

‣ Think about how you would handle & replicate 20 petabytes of data the “traditional way”

• Purchase 2x or 3x storage capacity to handle replication overhead

• Ignore the nightmare scenario of having to restore from one of the distributed replicas


Page 79: 2014 BioIT Trends From The Trenches

Object Storage

‣ Efficiencies of erasure coding allow for LESS raw disk to be distributed across MORE geographic sites

‣ End result is a “single” usable system that is tolerant to the failure of an entire datacenter/site

‣ For the 20 petabyte problem instead of purchasing 2x disk you buy ~1.8x and use the capex savings to add an extra colo facility or increase WAN link speed
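The 2x versus ~1.8x comparison can be worked through with a small sketch. The 20 PB size and the two overhead factors come from the slides; the specific 10+8 fragment layout spread over 3 sites is a hypothetical example chosen only because it yields a 1.8x overhead, not a description of any particular product.

```python
import math


def ec_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw-to-usable ratio of a k+m erasure code: (k + m) / k."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_needed_pb(usable_pb: float, overhead: float) -> float:
    """Raw capacity to purchase for a given usable capacity and overhead factor."""
    return usable_pb * overhead


def survives_site_loss(data_fragments: int, parity_fragments: int, sites: int) -> bool:
    """True if losing the fragments held by one site still leaves enough to rebuild.

    Assumes fragments are spread as evenly as possible across sites.
    """
    worst_case_lost = math.ceil((data_fragments + parity_fragments) / sites)
    return worst_case_lost <= parity_fragments


if __name__ == "__main__":
    usable = 20.0                                   # petabytes, from the slide
    replicated = raw_needed_pb(usable, 2.0)         # one full replica
    erasure = raw_needed_pb(usable, ec_overhead(10, 8))  # hypothetical 10+8 code
    print(f"2x replication: {replicated:.0f} PB raw")
    print(f"10+8 erasure coding: {erasure:.0f} PB raw")
    print(f"Raw capacity saved: {replicated - erasure:.0f} PB")
    print("Survives loss of 1 of 3 sites:", survives_site_loss(10, 8, 3))
```

The interesting part is less the raw capacity saved than what comes with it: in this hypothetical layout the data stays readable after losing a whole site, which a single replica pair cannot offer across three locations.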


Page 80: 2014 BioIT Trends From The Trenches

Exercise

BioIT World Homework

‣ Pick a storage size that makes sense for you (100TB or 1PB suggested)

‣ Visit the various storage vendors on the show floor and price out what 100TB or 1PB would cost

‣ You will see an awesome diversity of products, performance, features and capabilities at various price points

• DO NOT fixate on price alone. This is a mistake.

‣ This is REALLY worth doing - there is incredible diversity in the mix of price/features/performance/capability out there


Page 81: 2014 BioIT Trends From The Trenches

Check out these booths

BioIT World Homework

‣ Object storage:

• Amplidata & CleverSafe

‣ Glue/Gateway/Acceleration:

• Avere Systems

‣ Enterprise:

• EMC Isilon, IBM, Dell, SGI, Hitachi, Panasas

‣ Mid-tier/Commodity:

• Silicon Mechanics, Thinkmate, RAID Inc., Xyratex

Page 82: 2014 BioIT Trends From The Trenches

Check out these talks

BioIT World Homework

‣ Track 5 - noon today:

• Aaron Gardner talks “Taming big scientific data growth with converged infrastructure”

‣ Track 1 - 2:55pm today:

• Jacob Farmer talks “Bridging the Worlds of Files, Objects, NAS, and Cloud: A Blazing Fast Crash Course in Object Storage”

‣ Track 1 - 4:30pm today:

• Dirk Petersen talks “Deploying Very Low Cost Cloud Storage Technology in a Traditional Research HPC Environment”

Page 83: 2014 BioIT Trends From The Trenches

83

Can you do a Bio-IT talk without using the ‘C’ word?


Page 84: 2014 BioIT Trends From The Trenches

84

Cloud: 2014

‣ Core advice remains the same

‣ A few new permutations ...


Page 85: 2014 BioIT Trends From The Trenches

Core Advice

85

Cloud: 2014

‣ Research Organizations need a cloud strategy, not today but yesterday

• Those that don’t will be bypassed by frustrated users or sneaky “cloud aware” devices

‣ IaaS cloud services are only a departmental credit card away ... some senior scientists are too big to be fired for violating IT policy

‣ Instrument vendors are forcing the issue

‣ Storage vendors are forcing the issue


Page 86: 2014 BioIT Trends From The Trenches

Design Patterns

86

Cloud Advice

‣ We actually need several tested cloud design patterns:

‣ (1) To handle ‘legacy’ scientific apps & workflows

‣ (2) The special stuff that is worth re-architecting

‣ (3) Hadoop & big data analytics

‣ ... and maybe (4) Regulated/sensitive efforts...

‣ ...and maybe (5) a way to evaluate Commercial solutions


Page 87: 2014 BioIT Trends From The Trenches

Cloud Advice

Legacy HPC on the Cloud

‣ MIT StarCluster

• http://star.mit.edu/cluster/

• This is your baseline

• Extend as needed

‣ Also check out Univa

• Commercially supported Grid Engine stack with compelling roadmap and native cloud capabilities

Page 88: 2014 BioIT Trends From The Trenches

Cloud Advice

“Cloudy” HPC

‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver

‣ This is where you have the most freedom

‣ Many published best practices you can borrow

‣ Warning: Cloud vendor lock-in potential is strongest here


Page 89: 2014 BioIT Trends From The Trenches

Cloud: 2014

What has changed ...

‣ Let’s revisit some of my bile from prior years

‣ “... private clouds: still utter crap”

‣ “... some AWS competitors are delusional pretenders”

‣ “... AWS has a multi-year lead on the competition”


Page 90: 2014 BioIT Trends From The Trenches

Private Clouds in 2014:

‣ I’m no longer dismissing them as “utter crap”

• However it is a lot of work and money to build a system that only has 5% of the features that AWS can deliver today (for a cheaper price). Need to be careful about the use case, justification and operational/development burden.

‣ Usable & useful in certain situations

‣ BioTeam has had positive experiences with OpenStack

‣ Starting to see OpenStack pilots among our clients

‣ Hype vs. Reality ratio still wacky

‣ Sensible only for certain shops

• Have you seen what you have to do to your networks & gear?

‣ Still important to remain cynical and perform proper due diligence


Page 91: 2014 BioIT Trends From The Trenches

Not all AWS competitors are delusional

‣ Google Compute is viable in 2014 for scientific workflows

• Compute/Memory: Late start into IaaS means CPUs and memory are current generation; we have ‘war stories’ from AWS users who probe /proc/cpuinfo on EC2 servers so they can instantly kill any instance running on older chipsets

• Price: Competitive on price, although the shooting war between IaaS providers makes it hard to pin down the current “winner”; the “sustained use” pricing is easier to navigate than AWS Reserved Instances. Overall, AWS pricing algorithms for various services seem more complicated than the Google equivalents.

• Network performance: Fantastic networking and excellent performance/latency figures between regions and zones. VPC-type features are baked into the default resource set.

• Ops: Priced in 1-minute increments; no more need to hunt and kill servers at 55 minutes past the hour. Google has a concept of “Projects” with assigned collaborators and quotas, quite different from the AWS account structure and IAM-based access control model. The project-based paradigm is easier to think about for scientific use cases.

• IaaS Building Blocks: Still far fewer features than AWS but the core building blocks that we need for science and engineering workflows are present.

‣ My $.02

• AWS is still the clear leader but Google Compute is now a viable option and it is worth ‘kicking the tires’ in 2014 and beyond ... to me AWS has had no serious competition until now
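The /proc/cpuinfo “war story” above can be sketched roughly as follows. This is a minimal illustration, not anyone's production tooling: the function names and the allowlist of CPU model substrings are hypothetical, and the step that actually terminates and relaunches the instance through the cloud provider's API is left out.

```python
import re

# Hypothetical allowlist: substrings of CPU model names considered "new enough".
WANTED_MODELS = ("E5-2670", "E5-2680")


def cpu_model(cpuinfo_text):
    """Return the first 'model name' value from /proc/cpuinfo-style text, or None."""
    match = re.search(r"^model name\s*:\s*(.+)$", cpuinfo_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def is_acceptable(model, wanted=WANTED_MODELS):
    """True if the CPU model string contains any of the wanted substrings."""
    return model is not None and any(fragment in model for fragment in wanted)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a freshly booted Linux instance, `is_acceptable(cpu_model(read_cpuinfo()))` returning False would be the signal to throw the instance away and request another one.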


Page 92: 2014 BioIT Trends From The Trenches

Cloud Science Facilitators

‣ Cycle Computing is legit

• They’ve proven themselves on some of the largest IaaS HPC grids ever built

• Experience with hybrid systems (cloud & on-premises)

‣ Smart people. Nice people.

‣ They have a booth, stop by and chat them up ...


Page 93: 2014 BioIT Trends From The Trenches


The road ahead ...


Page 94: 2014 BioIT Trends From The Trenches

POSIX Alternatives Coming

This has been a slow-moving trend for years now ...

‣ The scope of organizations faced with the limitations of POSIX filesystems will continue to expand

‣ We desperately need some sort of “metadata aware” data management solution in life science

‣ Nobody has an easy solution yet; several bespoke installations but no clear mass-market options

‣ iRODS front-ending “cheap & deep” storage tiers or object stores appears to be gaining significant interest out in our community
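To make “metadata aware” concrete: the idea is to find data by what it is rather than by where it sits in a directory tree. The toy catalog below is only a sketch under that assumption. It uses SQLite from the Python standard library, the table layout and function names are invented for illustration, and it is not the iRODS API or any product's schema.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog of (path, key, value) metadata triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (path TEXT, key TEXT, value TEXT)")
    return conn


def tag(conn, path, **attributes):
    """Attach key=value attributes to a file path."""
    conn.executemany(
        "INSERT INTO metadata (path, key, value) VALUES (?, ?, ?)",
        [(path, key, str(value)) for key, value in attributes.items()],
    )


def find(conn, **criteria):
    """Return sorted paths whose metadata matches every key=value criterion."""
    matches = None
    for key, value in criteria.items():
        rows = conn.execute(
            "SELECT path FROM metadata WHERE key = ? AND value = ?",
            (key, str(value)),
        )
        paths = {row[0] for row in rows}
        matches = paths if matches is None else matches & paths
    return sorted(matches or [])


catalog = build_catalog()
tag(catalog, "/archive/run42/sample_a.bam", project="p01", assay="rnaseq")
tag(catalog, "/scratch/tmp/x7.bam", project="p01", assay="exome")
print(find(catalog, project="p01", assay="rnaseq"))
```

The query never mentions a directory, which is the point: the files could live on any tier or in an object store as long as the catalog knows about them.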


Page 95: 2014 BioIT Trends From The Trenches

Watch out for: Containerization

Application Containers are getting interesting

‣ Application containerization via methods like http://docker.io gaining significant attention

• Docker support now in native RHEL kernel

• AWS Elastic Beanstalk recently added Docker support

‣ If broadly adopted, these techniques will stretch research IT infrastructures in interesting directions

• This is far more interesting to me than moving virtual machines around a network or into the cloud

‣ ... with a related impact on storage location, features & capability

‣ Major news and progress expected in 2014
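For a sense of why containers interact with storage location, here is a small sketch of running a containerized tool against data that stays on the host. It only builds the command line and does not execute it. The image name, tool arguments and directory paths are hypothetical; `docker run`, `--rm` and `-v` are standard Docker CLI usage.

```python
import shlex


def docker_run_command(image, tool_args, host_data_dir, container_data_dir="/data"):
    """Build (but do not execute) a `docker run` command line."""
    return [
        "docker", "run",
        "--rm",  # remove the container when the tool exits
        "-v", f"{host_data_dir}:{container_data_dir}",  # bind-mount host data into the container
        image,
        *tool_args,
    ]


# Hypothetical image and arguments, for illustration only.
cmd = docker_run_command(
    "example/aligner:1.0",
    ["align", "/data/reads.fastq"],
    host_data_dir="/mnt/project",
)
print(shlex.join(cmd))
```

The application travels as an image, but the data is still mounted from wherever it lives, so storage placement and performance remain the hard part.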


Page 96: 2014 BioIT Trends From The Trenches

Keep an eye on: Storage

‣ Data generation out-pacing technology

‣ Really interesting disruptive stuff on the market now

‣ Cheap/easy laboratory assays taking over

• Researchers largely don’t know what to do with it all

• Holding on to the data until someone figures it out

• This will cause some interesting headaches for IT

• Huge need for real “Big Data” applications to be developed


Page 97: 2014 BioIT Trends From The Trenches

Keep an eye on: Networking

‣ Unless there’s an investment in ultra-high-speed networking, we need to change how we think about analysis

‣ Data commons are becoming a precedent

• Need to minimize the movement of data

• Include compute power and analysis platform with data commons

‣ Move the analysis to the data, don’t move the data

• Requires sharing/Large core institutional resources
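A back-of-the-envelope calculation shows why moving the analysis usually beats moving the data. The sketch below assumes decimal units and a sustained link efficiency of 70%, which is an illustrative guess rather than a measured figure; real transfers depend on protocol, disk speed and contention.

```python
def transfer_hours(data_terabytes, link_gbps, efficiency=0.7):
    """Hours needed to move data over a network link.

    Uses decimal units (1 TB = 8e12 bits). `efficiency` is the assumed
    fraction of nominal bandwidth that is actually sustained.
    """
    bits = data_terabytes * 8e12
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


for gbps in (1, 10, 100):
    print(f"100 TB over {gbps} Gbps: {transfer_hours(100, gbps):.1f} hours")
```

Under these assumptions, 100 TB takes roughly 317 hours (about 13 days) on a 1 Gbps link and about 32 hours on 10 Gbps, which is why a data commons with compute attached is attractive.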


Page 98: 2014 BioIT Trends From The Trenches


Long term trends ...

‣ Compute continues to become easier

‣ Data movement and ingest (physical & network) gets harder

‣ Cost of storage will be dwarfed by “cost of managing stored data”

‣ We can see end-of-life for our current IT architecture and design patterns; new patterns will start to appear over next 2-5 years


Page 99: 2014 BioIT Trends From The Trenches


Wrap-up: Final Advice & Tips


Page 100: 2014 BioIT Trends From The Trenches

Ending Advice: 1 of 5

Embrace The Innovation

‣ Understand the ‘interesting times’ we are in

• Science is changing faster than we can refresh IT

• This is not going to change any time soon

‣ Advice:

• Spend as much time thinking about future flexibility as you spend on actual current needs & requirements

• Design for agility & responsiveness

Page 101: 2014 BioIT Trends From The Trenches

Ending Advice: 2 of 5

Capacity

‣ Many of us will need ‘petabyte capable’ storage

‣ However:

• Only some of us will ever have 1PB+ under management

• The hard part is knowing who that will be

Page 102: 2014 BioIT Trends From The Trenches

Ending Advice: 3 of 5

Tiers are in your future

‣ Tiers are now a requirement, at least long-term

• At a minimum we need an ‘active’ tier for processing & ingest

• ... and some sort of inexpensive cold/nearline/archive option as well

‣ Advice:

• It’s OK to buy a single block/tier of disk

• ... but always have a strategy for diversification
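One simple way to start thinking about tiers is to measure how much of the ‘active’ tier is actually cold. The sketch below lists files that have not been modified for a given number of days. The function name and the idle threshold are assumptions for illustration, and modification time is used because access times are often disabled or unreliable on shared filesystems.

```python
import os
import time


def cold_candidates(root, max_idle_days, now=None):
    """Yield (path, size_bytes) for files under root not modified in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only report real files
            info = os.stat(path)
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

Summing the sizes from something like `cold_candidates("/data/active", 180)` (a hypothetical path and threshold) gives a first estimate of how much capacity an archive tier could reclaim, without moving anything.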


Page 103: 2014 BioIT Trends From The Trenches

Ending Advice: 4 of 5

‣ Above a certain scale, inefficient data management & simple storage practices are hugely wasteful

‣ Advice:

• The cost of a new-hire “data manager” or curator role may be cheaper and far more beneficial to your organization than continuing to throw CapEx dollars at keeping a badly run storage platform under its capacity limit

• Many opportunities to get clever & recapture efficiency & capability: tiers, replication, cloud, dedupe, CRAM compression, iRODS

• BROADEN YOUR PERSPECTIVE
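As one small example of recapturing capacity, the sketch below finds files with identical content by comparing sizes first and SHA-256 hashes second. This is file-level duplicate detection, a much cruder cousin of the block-level dedupe that storage platforms offer, and the function names are invented for illustration.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(root):
    """Return lists of paths under root that share identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files that share a size
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

A report like this is the kind of thing a data manager or curator would act on; deciding which copy is authoritative is a people problem, not a code problem.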

Page 104: 2014 BioIT Trends From The Trenches

Ending Advice: 5 of 5

‣ You need a cloud strategy. Yesterday.

- Users, instrument makers & IT vendors are forcing the issue

- Economic trends indicate cloud storage is inescapable

- 90% of cloud is “easy”. Remaining 10% takes time & effort

‣ Advice:

• The technical aspects of using “the cloud” are trivial

• The political, policy and risk management aspects are difficult and time consuming; start these ASAP


Page 105: 2014 BioIT Trends From The Trenches


end; Thanks! slideshare.net/chrisdag/ [email protected] @chris_dag #BioIT14
