
Page 1:

IDC Update on How Big Data Is Redefining High Performance Computing

Earl Joseph – ejoseph@idc.com
Steve Conway – [email protected]
Chirag Dekate – [email protected]
Bob Sorensen – [email protected]

Page 2:

IDC Has Over 1,000 Analysts In 62 Countries

Page 3:

Agenda

A Short HPC Market Update

Big Data Challenges and Shortcomings

The High End of Big Data
• Examples of Very Large Big Data

Examples of How Big Data Is Redefining High Performance Computing

Page 4:

HPC Market Update

Page 5:

What Is HPC?

IDC uses these terms to cover all technical servers used by scientists, engineers, financial analysts, and others:
• HPC
• HPTC
• Technical Servers
• Highly computational servers

HPC covers all servers that are used for computational or data-intensive tasks
• From a $5,000 deskside server up to a supercomputer costing over $550 million

Page 6:

Top Trends in HPC

The 2013 market declined overall – by $800 million
• For a total of $10.3 billion
• Mainly due to a few very large system sales in 2012 that weren't repeated in 2013
• We expect growth in 2015 to 2018

Software issues continue to grow

The worldwide Petascale Race is at full speed

GPUs and accelerators are hot new technologies

Big data combined with HPC is creating new solutions in new areas

Page 7:

IDC HPC Competitive Segments: 2013

HPC Servers: $10.3B
• Departmental ($100K - $250K): $3.4B
• Divisional ($250K - $500K): $1.4B
• Supercomputers (over $500K): $4.0B
• Workgroup (under $100K): $1.6B

Page 8:

HPC WW Market Trends: A 17 Year Perspective

Page 9:

HPC Market Forecasts

Page 10:

HPC Forecasts
• Forecasting 7.4% yearly growth from 2013 to 2018
• 2018 should reach $14.7 billion
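As a quick sanity check on those numbers, the sketch below applies the standard compound-growth formula to the $10.3 billion 2013 base from the earlier slide; the rate and endpoint are IDC's, while the code itself is only illustrative.

```python
# Compound-growth check on IDC's HPC server forecast.
# Inputs are the slide figures: $10.3B in 2013, 7.4% yearly growth to 2018.
base_2013 = 10.3   # worldwide HPC server revenue, in $B
cagr = 0.074       # forecast compound annual growth rate
years = 5          # 2013 -> 2018

forecast_2018 = base_2013 * (1 + cagr) ** years
print(f"2018 forecast: ${forecast_2018:.1f}B")  # -> $14.7B, matching the slide
```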

Page 11:

HPC Forecasts: By Industry/Applications

Page 12:

The Broader HPC Market

Page 13:

Big Data Challenges and Shortcomings

Page 14:

Defining Big Data

Page 15:

HPDA Market Drivers

More input data (ingestion)
• More powerful scientific instruments/sensor networks
• More transactions/higher scrutiny (fraud, terrorism)

More output data for integration/analysis
• More powerful computers
• More realism
• More iterations in available time

Real-time and near-real-time requirements
• Catch fraud before it hits credit cards
• Catch terrorists before they strike
• Diagnose patients before they leave the office
• Provide insurance quotes before callers leave the phone

The need to pose more intelligent questions
• Smarter mathematical models and algorithms

Page 16:

Top Drivers For Implementing Big Data

Page 17:

Organizational Challenges With Big Data: Government Compared To All Others

Page 18:

Big Data Meets HPC And Advanced Simulation

Page 19:

High Performance Data Analysis

Needs HPC resources
• High complexity (algorithms)
• High time-criticality
• High variability
• (On premise or in cloud)

Data of all kinds
• The 4 V's: volume, variety, velocity, value
• Structured, unstructured
• Partitionable, non-partitionable
• Regular, irregular patterns

Simulation & analytics
• Search, pattern discovery
• Iterative methods
• Established HPC users + new commercial users

Page 20:

HPC Adoption Timeline (Examples)
[Timeline chart: example adoption dates from 1960 to 2012]

Page 21:

Very Large Big Data Examples

Page 22:

NASA


Page 23:

Square Kilometre Array – Radio Astronomy for Astrophysics

Page 24:

CERN
• LHC: the world's leading accelerator – multiple Nobel Prizes for particle physics work
• Innovation driven by the need to distribute massive data sets and the accompanying applications
• ATLAS, one of CERN's two general-purpose detectors, generates 1PB of data per second when running! (Not all of this is distributed.)
• Private cloud distribution to scientists in 20 member states plus observer states (the single largest user is the U.S.)
• Today: only 0.0000005% of the data is used

Page 25:

NOAA


Page 26:

Page 27:

Page 28:

HPC Will Be Used More for Managing Mega-IT Infrastructures

Page 29:

Examples of Big Data Redefining HPC

Page 30:

Use Case: PayPal Fraud Detection

The Problem: finding suspicious patterns that we don't even know exist in related data sets
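The slides never disclose PayPal's actual algorithms, so the snippet below is only a toy illustration of the kind of unsupervised outlier screen the problem statement implies: flag records that sit far from the population without knowing in advance what "suspicious" looks like. All names and thresholds are hypothetical.

```python
# Toy unsupervised outlier screen (illustrative only, NOT PayPal's method).
from statistics import mean, stdev

transactions = [
    {"id": "t1", "amount": 25.0},
    {"id": "t2", "amount": 31.5},
    {"id": "t3", "amount": 27.9},
    {"id": "t4", "amount": 5400.0},  # the anomaly we didn't know to look for
    {"id": "t5", "amount": 29.4},
]

amounts = [t["amount"] for t in transactions]
mu, sigma = mean(amounts), stdev(amounts)

# Flag anything more than 1.5 standard deviations from the mean
# (a deliberately loose toy threshold).
suspicious = [t for t in transactions if abs(t["amount"] - mu) > 1.5 * sigma]
print(suspicious)  # -> [{'id': 't4', 'amount': 5400.0}]
```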

Page 31:

What Kind of Volume?

PayPal’s Data Volumes And HPDA Requirements

Page 32:

Where PayPal Used HPC

Page 33:

The Results
$710 million saved in the first year from fraud that they wouldn't have been able to detect before

Page 34:

There Are New Technologies That Will Likely Cause A Mass Explosion In Data – Requiring HPDA Solutions

Page 35:

GEICO: Real-Time Insurance Quotes

Problem: need accurate automated phone quotes in 100ms. They couldn't do these calculations nearly fast enough on the fly.

Solution: each weekend, use a new HPC cluster to pre-calculate quotes for every American adult and household (60-hour run time)
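The slides don't show GEICO's system, but the batch-precompute-then-lookup pattern they describe is straightforward to sketch. Everything below (the pricing function, the ID scheme, the population size) is hypothetical.

```python
# Hypothetical sketch of precompute-then-lookup: price everything in an
# offline weekend batch, then serve each call with a constant-time lookup.

def price_quote(person_id: int) -> float:
    """Stand-in for the expensive actuarial model (illustrative only)."""
    return 500.0 + (person_id * 37) % 400  # fake premium, in dollars

# Weekend batch run: in the real case this loop would be partitioned across
# an HPC cluster and cover every U.S. adult and household (the 60-hour job).
precomputed = {pid: price_quote(pid) for pid in range(1_000_000)}

# Call time: the 100ms budget now only has to cover a dictionary lookup.
def quote_for_caller(person_id: int) -> float:
    return precomputed[person_id]

print(quote_for_caller(424_242))  # instant, no model evaluation on the fly
```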

Page 36:

Something To Think About -- GEICO: Changing The Way One Approaches Solving a Problem

• Instead of processing each event one at a time, process it for everyone on a regular basis
It can be dramatically cheaper and faster, and offers additional ways to be more accurate
But most of all, it can create new and more powerful capabilities

• Examples:
For home loan applications – calculate for every adult and every home in the US
For health insurance fraud – track every procedure done on every US person by every doctor – and find patterns

Page 37:

Something To Think About -- GEICO: Changing The Way One Approaches Solving a Problem

• Future Examples (continued): if you add in large-scale data collection via sensors like GPS, drones and RFID tags:
• New car insurance rules – the insurance company doesn't have to pay if you break the law, e.g. by speeding and having an accident
• You could track every car at all times – then charge $2 to see where the in-laws are in traffic if they are late for a wedding
• Google Maps could show in real time where every letter and package is located
• But crooks could also use it in many ways – e.g. watching ATM machines, looking for when guards are on break, …

Page 38:

U.S. Postal Service

Page 39:

U.S. Postal Service

Page 40:

U.S. Postal Service

MCDB = memory-centric database

Page 41:

CMS: Government Health Care Fraud

5 separate databases for the big USG health care programs under the Centers for Medicare and Medicaid Services (CMS)

Estimated fraud: $150B-$450B (<$5B caught today)

ORNL and SDSC have evaluation contracts to unify the databases and perform fraud detection on various architectures

Page 42:

Schrödinger: Cloud-Based Lead Discovery for Drug Design

Novartis/Schrödinger:

Pharmaceutical company Novartis increased the resolution of its drug discovery algorithm 10x and wanted to use it to test 21 million small molecules as drug candidates

Novartis ran the Schrödinger drug discovery application in the AWS public cloud, with the help of Cycle Computing

The initial run used 51,000 AWS cores, cost $14,000, and took <4 hours

…and it's getting cheaper: a later run used 156,000 AWS cores with comparable costs and time
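Those figures imply a striking unit cost. The arithmetic below simply divides the slide's numbers; the core-hour figure assumes the run used the full 4 hours, which is the only quantity not stated exactly.

```python
# Back-of-envelope arithmetic on the Novartis/Schrödinger cloud run,
# using the slide's figures (21M molecules, 51,000 cores, $14,000, <4 hours).
molecules = 21_000_000
cores = 51_000
cost_usd = 14_000
hours = 4  # assumption: the full "<4 hours"

print(f"Cost per molecule screened: ${cost_usd / molecules:.5f}")  # ~$0.00067
print(f"Core-hours consumed: {cores * hours:,}")                   # 204,000
print(f"Cost per core-hour: ${cost_usd / (cores * hours):.3f}")    # ~$0.069
```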

Page 43:

Schrödinger: Cloud-Based Lead Discovery for Drug Design

Page 44:

Global Financial Services: Company X

One of the most respected firms in the global financial services industry updates detailed information daily on several million companies around the world.

Clients use the firm's credit ratings and other company information in making lending decisions and for other planning, marketing, and business decision making.

The firm uses statistical models to develop a company's scores and ratings, and for years the ratings have been prepared and analyzed locally in near real time by the firm's personnel around the world.
• This practice is a major competitive advantage but resulted in the creation of hundreds of distinct databases and more than a dozen scoring environments.
• Several years ago, the company established a goal of centralizing these resources and chose SAS as the centralization mechanism, including SAS Grid Manager as part of the software stack.

Page 45:

Global Financial Services: Company X

The centralized IT infrastructure created using SAS preserves the advantages of the company's locally created ratings and reports.

The new infrastructure provides an effective environment for analytics development and accommodates multiple testing, development, and production environments in a single stack.

It is flexible enough to allow dynamic prioritization among these environments, according to a company executive. With help from SAS Grid Manager, the company can maximize the use of its computing resources. The software automatically assigns jobs to server nodes with available capacity, instead of having users wait in queue for time on fully utilized nodes.
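As an illustration of that least-loaded dispatch idea (and not of SAS Grid Manager's actual internals, which the study doesn't detail), a minimal sketch:

```python
# Minimal sketch of least-loaded job dispatch: send each job to whichever
# node has the most spare capacity instead of queueing on a busy one.
# Purely illustrative; not SAS Grid Manager's implementation.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity: int      # maximum concurrent jobs
    running: int = 0   # jobs currently assigned

    @property
    def load(self) -> float:
        return self.running / self.capacity

def dispatch(job: str, nodes: list[Node]) -> Node:
    """Assign the job to the node with the lowest fractional load."""
    target = min(nodes, key=lambda n: n.load)
    target.running += 1
    print(f"{job} -> {target.name} (load now {target.load:.0%})")
    return target

nodes = [Node("node-a", capacity=8), Node("node-b", capacity=16)]
for job in ["score-eu", "score-us", "score-apac"]:
    dispatch(job, nodes)
```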

The company executive estimates that it might cost 30% more to purchase servers with enough capacity to handle these peak workloads on their own.

Page 46:

Global Financial Services: Company X

Several million clients use the firm's credit ratings to help make lending decisions

Goal: increase efficiency for updating ratings

Result: an HPC multi-cluster grid boosted efficiency 30% -- no need to buy additional clusters yet

Page 47:

Real Estate

Worldwide vacation exchange & rental leader

Goal: update property valuations several times per day (not possible with enterprise servers)

Results:
• HPC technology enabled all updates in 8-9 hours
• Avoided a move to heuristics
• Allowed the company to focus on the rental side

Page 48:

Outcomes-Based Medical Diagnosis and

Treatment Planning

Enter the patient’s history and symptomology.

While the patient is still in the office, sift through millions of archived patient records for relevant outcomes.

Provider considers the efficacies of various treatments for “similar” patients (but is not bound by the findings).

Ergo, this functions as a powerful decision-support tool.

Benefits: better outcomes + rein in costly outlier practices.
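The slide describes a workflow rather than an algorithm; one common way to realize the "similar patients" step is a nearest-neighbor search over coded patient features. The toy version below is entirely hypothetical, with made-up features and records.

```python
# Hypothetical toy version of the "find similar patients" step: a
# nearest-neighbor search over simple numeric feature vectors. Real systems
# would use far richer clinical features and HPC-scale indexes.
import math

# (age, systolic BP, cholesterol) -> treatment outcome on record
archive = [
    ((54, 150, 240), "treatment A: good outcome"),
    ((61, 145, 260), "treatment B: poor outcome"),
    ((57, 155, 235), "treatment A: good outcome"),
    ((33, 118, 180), "treatment C: good outcome"),
]

def similar_outcomes(patient, k=2):
    """Return outcomes for the k archived patients closest to this one."""
    ranked = sorted(archive, key=lambda rec: math.dist(rec[0], patient))
    return [outcome for _, outcome in ranked[:k]]

print(similar_outcomes((56, 152, 238)))
# -> ['treatment A: good outcome', 'treatment A: good outcome']
```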

Page 49:

Page 50:

Digital Television Services

A global leader with 30 million subscribers

Goal: maximize revenue & customer satisfaction during a high-growth period

Result: HPC has added €7.5 million in annual revenue while increasing satisfaction

Page 51:

IDC HPDA Server Forecast

Fast growth from a small starting point: $933M
HPDA ecosystem >$2.6B in 2018

Page 52:

IDC HPDA Storage Forecast

Storage is the fastest-growing part of the HPC market (9% CAGR)
And HPDA storage will grow even faster (26.5% CAGR)

Page 53:

In Summary

Page 54:

Summary: HPDA Market Opportunity

HPDA: simulation + newer high-performance analytics
• IDC predicts fast growth from a small starting point

HPC and high-end commercial analytics are converging
• Algorithmic complexity is the common denominator

Economically important use cases are emerging

No single HPC solution is best for all problems

Page 55:

HPDA User Talks: HPC User Forums, UK, Germany, France, China and U.S. …

• HPC in Evolutionary Biology, Andrew Meade, University of Reading

• HPC in Pharmaceutical Research: From Virtual Screening to All-Atom Simulations of Biomolecules, Jan Kriegl, Boehringer-Ingelheim

• European Exascale Software Initiative, Jean-Yves Berthou, Electricite de France

• Real-time Rendering in the Automotive Industry, Cornelia Denk, RTT-Munich

• Data Analysis and Visualization for the DoD HPCMP, Paul Adams, ERDC

• Why HPCs Hate Biologists, and What We're Doing About It, Titus Brown, Michigan State University

• Scalable Data Mining and Archiving in the Era of the Square Kilometre Array, the Square Kilometre Array Telescope Project, Chris Mattmann, NASA/JPL

• Big Data and Analytics in HPC: Leveraging HPC and Enterprise Architectures for Large Scale Inline Transactional Analytics in Fraud Detection at PayPal, Arno Kolster, PayPal, an eBay Company

• Big Data and Analytics Vendor Panel: How Vendors See Big Data Impacting the Markets and Their Products/Services, Panel Moderator: Chirag Dekate, IDC

• Data Analysis and Visualization of Very Large Data, David Pugmire, ORNL

• The Impact of HPC and Data-Centric Computing in Cancer Research, Jack Collins, National Cancer Institute

• Urban Analytics: Big Cities and Big Data, Paul Muzio, City University of New York

• Stampede: Intel MIC And Data-Intensive Computing, Jay Boisseau, Texas Advanced Computing Center

• Big Data Approaches at Convey, John Leidel

• Cray Technical Perspective On Data-Intensive Computing, Amar Shan

• Data-intensive Computing Research At PNNL, John Feo, Pacific Northwest National Laboratory

• Trends in High Performance Analytics, David Pope, SAS

• Processing Large Volumes of Experimental Data, Shane Canon, LBNL

• SGI Technical Perspective On Data-Intensive Computing, Eng Lim Goh, SGI

• Big Data and PLFS: A Checkpoint File System For Parallel Applications, John Bent, EMC

• HPC Data-intensive Computing Technologies, Scott Campbell, Platform/IBM

• The CEA-GENCI-Intel-UVSQ Exascale Computing Research Centre, Marie-Christine Sawley, Intel

Page 57:

Big Data Software Shortcomings -- Today

Page 58:

Big Data Software:

Page 59:

Big Data Software Technology Stack