Big Data in Research: Research Analytics Industry Solution (Nov 04, 2012)

Big Data in Research: Research Analytics Industry Solution Stuart Long CTO - Oracle Systems Asia Pacific and Japan


Page 1

Big Data in Research:

Research Analytics Industry Solution

Stuart Long

CTO - Oracle Systems Asia Pacific and Japan

Page 2

Information Architecture Capability Model

Oracle Enterprise Architecture Framework / Oracle Information Architecture Framework

Data Realms

• Master data

• Reference data

• Metadata

• Transaction data

• Analytical data

• Unstructured data

• Big Data

Capabilities

• Information sharing & delivery

• Business intelligence & data warehousing

• Data integration

• Content management

• Master data management

• Enterprise data model

• Data governance, quality & lifecycle management

• Data security

• Data technology management

Page 3

The Information Architecture Spectrum: Evaluating Economic and Architecture Tradeoffs

Data realm: Master data, Transactions, Analytical data, Metadata

– Structure: Structured

– Volume: Medium - High

– Security: Database, app, & user access

– Storage & Retrieval: RDBMS / SQL

– Modeling: Pre-defined relational or dimensional modeling

– Processing/Integration: ETL/ELT, CDC, Replication, Message

– Consumption: BI & Statistical Tools, Operational Applications

Data realm: Reference data

– Structure: Structured and Semi-Structured

– Volume: Low - Medium

– Security: Platform security

– Storage & Retrieval: XML / XQuery

– Modeling: Flexible & Extensible

– Processing/Integration: ETL/ELT, Message

– Consumption: System-based data consumption

Data realm: Documents and Content

– Structure: Unstructured

– Volume: High

– Security: File system based

– Storage & Retrieval: File System / Search

– Modeling: Free Form

– Processing/Integration: OS-level file movement

– Consumption: Content Mgmt

Data realm: Big Data (Weblogs, Sensors, Social Media)

– Structure: Structured, Semi-Structured, Unstructured

– Volume: High

– Security: File system & database

– Storage & Retrieval: Distributed FS / NoSQL

– Modeling: Flexible (Key Value)

– Processing/Integration: Hadoop, MapReduce, ETL/ELT, Message

– Consumption: BI & Statistical Tools
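The difference between "Pre-defined relational or dimensional modeling" and "Flexible (Key Value)" modeling can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain dictionary as a stand-in for a key-value store. The table, keys, and field names are hypothetical and do not come from the slides.

```python
import sqlite3

# Pre-defined relational modeling: the schema is fixed before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id TEXT PRIMARY KEY, species TEXT, mass_g REAL)"
)
conn.execute("INSERT INTO sample VALUES (?, ?, ?)", ("s-001", "E. coli", 0.5))
row = conn.execute(
    "SELECT species, mass_g FROM sample WHERE sample_id = ?", ("s-001",)
).fetchone()

# Flexible key-value modeling: each value carries its own set of fields,
# so a sensor reading and a weblog entry can share one store.
kv_store = {}
kv_store["sensor:42:2012-11-04T10:00"] = {"temp_c": 18.2}
kv_store["weblog:7781"] = {"url": "/data/run1", "status": 200, "agent": "curl"}

sensor_fields = sorted(kv_store["sensor:42:2012-11-04T10:00"])
weblog_fields = sorted(kv_store["weblog:7781"])

print(row)
print(sensor_fields, weblog_fields)
```

The relational side rejects rows that do not fit the declared columns, while the key-value side accepts any shape and leaves interpretation to the consumer, which is one of the tradeoffs the spectrum describes.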

Page 4

Big Data

[Figure: Evolution of ESA's EO Data Archives between 1986-2007 and future estimates (up to 2020). Y-axis: Total Archive in TeraBytes (TB), 0 to 22,000. X-axis: Year (1986, 1989, 1993, 1995, 1998, 2000, 2003, 2005, 2007, 2015, 2020).]

Legend:

Future Data Estimates

LANDSAT 2-4 MSS (75-Dec 93)

AQUA Modis (April 03-today)

ENVISAT LR (March 02-today)

ENVISAT HR (March 02-today)

TERRA Modis (June 01-today)

QUICK SCATT (01-today) /PROBA (May 02-today)

LANDSAT 7 ETM (April 99-Dec 03)

SEA STAR SeaWifs (Apr 98-today)

ERS 2 HR (May 95-today)

ERS 2 LBR (May 95-today)

JERS SAR/OPS VNIR (92-Sep 98)

ERS 1 HR (Jul 91-Mar 00)

ERS 1 LBR (Jul 91-Mar 00)

SPOT 1-4 HRV (87-today)

MOS 1, 1b MESSR (87-Oct 93)

NOAA 9-17 AVHRR (86-today)

LANDSAT 5 TM (April 84-today)

NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78)

The LOFAR radio interferometer is producing 1.6 TB/sec (138 PB/day), setting new frontiers for radio astronomy
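The two LOFAR figures are consistent with each other, as a quick conversion shows. This is only a sanity check of the quoted numbers, assuming decimal units (1 PB = 1000 TB).

```python
TB_PER_SECOND = 1.6             # rate quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB_PER_PB = 1000                # assumption: decimal units

pb_per_day = TB_PER_SECOND * SECONDS_PER_DAY / TB_PER_PB
print(f"{pb_per_day:.2f} PB/day")
```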

The volume of earth-observation data from the European Space Agency’s satellites passed 3 PB in 2007, and the projection for 2020 is seven-fold
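As a rough illustration of what a seven-fold increase implies, the sketch below derives the projected 2020 volume and the equivalent compound annual growth rate. It assumes that "seven-fold" means seven times the 2007 volume and that growth is smooth over the 13 years; neither assumption is stated on the slide.

```python
volume_2007_pb = 3.0   # "passed 3PB in 2007", treated here as exactly 3 PB
growth_factor = 7.0    # assumption: seven times the 2007 volume
years = 2020 - 2007

projected_2020_pb = volume_2007_pb * growth_factor
annual_growth = growth_factor ** (1 / years) - 1

print(f"Projected 2020 archive: {projected_2020_pb:.0f} PB")
print(f"Implied compound growth: {annual_growth:.1%} per year")
```

Under these assumptions the archive reaches about 21 PB, in line with the 22,000 TB upper bound of the chart axis.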

In genomics:

• Cost of sequencing is dropping by 50% every 5 months

• “… analysis, not sequencing, will be the main expense hurdle” (Cambridge University, UK)
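To make the halving rate concrete, the following sketch computes the relative cost of sequencing after a given number of months, taking the "50% every 5 months" figure at face value and assuming the decline is a smooth exponential.

```python
HALVING_MONTHS = 5  # from the slide: cost drops by 50% every 5 months

def relative_cost(months: float) -> float:
    """Fraction of the starting cost remaining after `months` months."""
    return 0.5 ** (months / HALVING_MONTHS)

for months in (5, 10, 12, 24):
    print(f"after {months:2d} months: {relative_cost(months):.3f} of the original cost")
```

At this rate the cost after one year is under a fifth of the starting cost, which is why the quoted remark expects analysis rather than sequencing to dominate expenses.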

Courtesy of BERIS

The volume of worldwide climate data is expanding rapidly, creating challenges for both physical archiving and sharing, for ease of access of relevant information in a multidisciplinary environment

J T Overpeck et al. Science 2011;331:700-702

In high energy physics, the data recorded by each of the big experiments at the Large Hadron Collider will be enough to fill around 100,000 DVDs every year!

Page 5

The Challenges of Big Data

• Volume: Very large quantities of data

• Velocity: Extremely fast streams of data

• Variety: Wide range of data type characteristics

• Value: High potential value if harnessed correctly

Page 6

Intel Xeon® 5500 Series: First Platform with End-to-End HW Virtualization

Intel® Virtualization Technology: holistic, platform-centric approach for virtualization usages

• Processor: Intel® VT-x

• Chipset: Intel® VT for Directed I/O (Intel® VT-d)

• Network: Intel® VT for Connectivity (Intel® VT-c)

Page 7

Oracle: First Platform with Data Embedded Instructions

Oracle® Enabling Technology: optimised for data processing and database

• Data Processing Unit (DPU)

• Data Aware Storage

• Data Defined Network

Diagram labels: Low Latency, Smart Cache, SQL

Page 8

Economies of Real Time Analytics: Waiting for DATA

• Today’s research applications are increasingly held back by slow storage

• When requesting data, the server spends most of its time waiting for storage

• Application performance remains sluggish regardless of the Server CPU horsepower

• The traditional remedy of adding more DRAM or “short-stroking” HDDs is both expensive and inefficient
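The point that CPU horsepower does not help a storage-bound application can be sketched with a simple additive timing model. The model and all numbers below are hypothetical and chosen only for illustration: a request is assumed to spend 1 second computing and 9 seconds waiting for storage.

```python
def request_time(cpu_seconds: float, io_wait_seconds: float,
                 cpu_speedup: float = 1.0) -> float:
    """Total time for one request when compute and I/O wait do not overlap."""
    return cpu_seconds / cpu_speedup + io_wait_seconds

baseline = request_time(cpu_seconds=1.0, io_wait_seconds=9.0)
faster_cpu = request_time(cpu_seconds=1.0, io_wait_seconds=9.0, cpu_speedup=4.0)
faster_storage = request_time(cpu_seconds=1.0, io_wait_seconds=0.9)

print(f"baseline:           {baseline:.2f} s")
print(f"4x faster CPU:      {faster_cpu:.2f} s")
print(f"10x faster storage: {faster_storage:.2f} s")
```

In this toy model a CPU that is four times faster shortens the request by less than 10%, while storage that is ten times faster shortens it by more than a factor of five.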


Page 9

Big Data inside the Research Lifecycle: Oracle’s Engineered Systems Solution

Lifecycle stages: Acquire, Organize, Analyze, Visualize

Engineered systems, linked by InfiniBand: Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics

Page 10

The Research Industry Solutions

The Research Enterprise

• Research Analytics

• Research Data Management

• Research Administration & Control

Our goal: To support researchers, their communities and their organizations to do better research by providing cost-effective, reliable and open solutions

Page 11

Oracle Research Analytics

A platform that enables researchers to:

• Work collaboratively on extremely large data sets, with the performance and innovative methods needed to exploit the data

• Build workflows that best support science and the operations of complex research

• Run applications and adapt them to different scientific loads and challenges

Page 12

Challenges to address

• Exponential growth in data and the ability to access critical information

• Ability of the enterprise infrastructure to quickly accommodate new data sources

• Evolving from data analysis to predictive science

• Ability to translate raw data into information and knowledge

• Managing resources across workloads and platforms

Page 13

Oracle Differentiators

• Process high-volume, low-density information

• Support flexible data structures

• In-database deep analytics

• Perform analysis on big data

• Parallel execution for efficient processing

• Deep, rich set of analytics for extracting maximum business value

Diagram labels: Research Ecosystem, Research Infrastructure, Research Data Management, Research Administration, Research Mission

Page 14

Research Analytics Flow

Visualization Sharing Discovery Organization


Page 15

Key Capabilities

• Open standards-based environment

• Minimize development time and effort

• Ensures appropriate levels of access

• Lower cost of research

• Facilitate manipulation of extremely large data sets

• Maximize analytic performance and achieve faster results

• Access to the latest investigative methods & tools

• Enables new science

• Ability to work on extremely large data sets allowing researchers new ways to exploit data

• Ensure trust and security

• Interoperable access to distributed repositories of data

• Facilitate innovative approach to discovery and results

• Support deep rich set of analytics

• Minimize development time/effort

• Reduce time-to-discovery

• Lower cost of research

• Enables new science

Visualization Sharing Discovery

Key Benefits

• High velocity loading and organization of information

• Ability to optimize workloads and system operations

• Ingest a wide range of data types

• Data integration

• Map reduction

• Statistical tools

• Analyze data across a wide variety of data characteristics using deep analytics

• Represent analysis findings

• Transform big data into something easy to analyze

• Load data quickly

• Ensures appropriate levels of access

• Enables cross-disciplinary science & discovery

Organization

Oracle Research Analytics: overview


Page 16

People. Process. Portfolio.

Oracle’s Integrated Big Data Solution Stack

Page 17

Oracle Integrated Solution for Big Data

• ACQUIRE: Oracle NoSQL Database, HDFS

• ORGANIZE: Hadoop, Oracle Big Data Connectors

• ANALYZE: In-Database Analytics, Data Warehouse

• DECIDE: Analytic Applications

Other diagram labels: Interactive Discovery, Enterprise Applications
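The ORGANIZE stage relies on Hadoop, whose MapReduce model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API; the input lines and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical raw input: one "sensor,count" line per observation.
lines = ["sensor-a,3", "sensor-b,5", "sensor-a,4", "sensor-c,1", "sensor-b,2"]

def map_phase(line):
    """Parse one raw line into (key, value) pairs."""
    key, value = line.split(",")
    return [(key, int(value))]

def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)

mapped = [pair for line in lines for pair in map_phase(line)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(totals)
```

In a real cluster the map and reduce calls run in parallel on many nodes, close to where the data is stored; only the structure of the computation is shown here.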

Page 18

Oracle’s Big Data solution

Stages: Acquire, Organize & Discover, Analyze, Decide

Engineered systems, linked by InfiniBand: Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics

Software components:

• Endeca Information Discovery

• Cloudera Hadoop

• Oracle NoSQL

• Open-Source R

• Big Data Connectors

• Oracle Data Integrator

• Oracle Business Intelligence

• Oracle Advanced Analytics

• Oracle Database

• Oracle Spatial and Graph

Page 19

Oracle Big Data Appliance: Engineered Systems for Big Data

• Pre-configured and optimized for Big Data processing

– 18 Servers, 864 GB RAM, 648 TB Storage/Rack; easy rack expansion

– NoSQL, Cloudera Hadoop, Oracle R

– Oracle Loader, Oracle Data Integrator, HDFS Connector for integration

• Integrates into your existing architecture

– Streams data into Exadata @ 15 TB/hour
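The rack figures quoted for the Big Data Appliance can be broken down per server with simple arithmetic. The sketch below uses only the numbers on the slide; the time to stream a full rack is a hypothetical upper bound that assumes the quoted storage is completely full and the 15 TB/hour rate is sustained.

```python
SERVERS_PER_RACK = 18
RAM_GB_PER_RACK = 864
STORAGE_TB_PER_RACK = 648
STREAM_TB_PER_HOUR = 15  # quoted rate into Exadata

ram_gb_per_server = RAM_GB_PER_RACK / SERVERS_PER_RACK
storage_tb_per_server = STORAGE_TB_PER_RACK / SERVERS_PER_RACK
hours_to_stream_rack = STORAGE_TB_PER_RACK / STREAM_TB_PER_HOUR

print(f"RAM per server:     {ram_gb_per_server:.0f} GB")
print(f"Storage per server: {storage_tb_per_server:.0f} TB")
print(f"Full rack at 15 TB/hour: {hours_to_stream_rack:.1f} hours")
```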

Page 20

Oracle Exadata: Engineered Systems for Systems of Record

• Fastest Data Warehouse & OLTP

– 10X-20X faster load and query times

– 10X storage savings, 80% less power, and a lot less space

• Optimized for In-Database Analytics

– Model functions execute in storage

• Optimized for Network Throughput

– Network connections in from Big Data capture and out to In-Memory Analytics

• 1/5th to 1/8th the cost of other alternatives

Page 21

Oracle Advanced Analytics: Advanced In-Database Predictive Analytics

• Comprehensive Predictive Analytic platform built inside the Database

– Data mining, text mining

– Statistical analysis (based on R)

– Built for data analysts / scientists

• Scalable and parallel: analyzes huge volumes of data

• Tightly integrated with SQL, enabling broad usage

• Works inside Exadata and Big Data Appliance

Diagram labels: Data Mining, Statistics, Text Mining, Predictive Analytics
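The general idea behind in-database analytics is to move the computation to the data rather than the data to the application. The sketch below illustrates that idea with Python's built-in sqlite3 module as a stand-in; it does not use Oracle Advanced Analytics, and the table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (station TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [("A", 10.0), ("A", 14.0), ("B", 20.0), ("B", 22.0), ("B", 24.0)],
)

# The aggregation runs inside the database engine; only one summary row
# per station crosses back into the application.
summary = conn.execute(
    "SELECT station, COUNT(*), AVG(temp_c), MAX(temp_c) "
    "FROM reading GROUP BY station ORDER BY station"
).fetchall()

for station, n, mean, peak in summary:
    print(f"station {station}: n={n}, mean={mean:.1f}, max={peak:.1f}")
```

With five rows the saving is negligible, but the same pattern returns a handful of summary rows whether the table holds thousands or billions of readings.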

Page 22

Oracle Exalytics In-Memory Machine: In-Memory Engineered System for Analytics

• Spans Relational, Multi-Dimensional, and Unstructured analysis, combined with Financial & Operational Planning

– In-Memory Optimized Hardware

– In-Memory Oracle BI, TimesTen, Essbase, and Endeca

– Several In-Memory Software Innovations

• Tightly integrated with Exadata

Page 23

Oracle Information Discovery: In-Memory Un-Structured & Semi-Structured Analysis

• Hybrid in-memory search / analytic engine

– Combines un-structured/structured and internal/external data (big data)

– Enables search, navigation, and discovery of data and correlations

• Highly interactive UI for discovery/exploration

– Social Media Analytics

– Customer 360 Analysis

– Competitive Intelligence

Diagram labels: Unified Indexing, Data Mashup, Text Analysis, Unified Search, Faceted Navigation, Interactive Exploration; Information Discovery on the Exalytics In-Memory Machine
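Faceted navigation, one of the labels on this slide, can be illustrated with a small generic sketch. This is not Endeca's API; the documents, field names, and helper functions are hypothetical. A search narrows the result set, and facet counts show how the remaining results break down along another attribute.

```python
from collections import Counter

documents = [
    {"title": "Ocean temperature survey", "field": "climate", "year": 2011, "type": "dataset"},
    {"title": "Genome assembly notes", "field": "genomics", "year": 2012, "type": "report"},
    {"title": "Sea ice extent", "field": "climate", "year": 2012, "type": "dataset"},
    {"title": "Sequencing cost model", "field": "genomics", "year": 2012, "type": "dataset"},
]

def search(docs, **filters):
    """Keep only the documents whose attributes match every filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]

def facet_counts(docs, facet):
    """Count how many documents fall under each value of one facet."""
    return Counter(d[facet] for d in docs)

hits = search(documents, year=2012)
field_facets = facet_counts(hits, "field")
drill_down = search(documents, year=2012, field="genomics", type="dataset")

print(dict(field_facets))
print([d["title"] for d in drill_down])
```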

Page 24

People. Process. Portfolio.

Customer Success in Big Data Architecture

Page 25

Customer Success: Erasmus Medical Centre

Challenges

• Complex data processing and analysis

• Ability to:

– load huge volumes of data in minimum time

– store these data and the results of genomic DNA research on disk

– have an efficient system that delivers query performance

Results

Thanks to an Exadata-based solution, Erasmus Medical Centre achieved:

• A query that took 11 minutes ran in 1 second on Exadata, a major advantage for researchers who need immediate results

• Smart Scan and flash cards improve performance when analyzing data

• Hybrid Columnar Compression makes it possible to manipulate terabytes of data (compression from 133 GB to 11 GB), with increased performance

• Oracle Database 11g features such as partitioning give more performance when manipulating and quantifying data obtained through the study of various genomes
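The figures quoted for Erasmus Medical Centre translate into the following ratios. This is plain arithmetic on the numbers from the slide, reading the sizes as gigabytes; no further measurements are implied.

```python
query_before_s = 11 * 60  # 11-minute query, in seconds
query_after_s = 1

size_before_gb = 133
size_after_gb = 11

speedup = query_before_s / query_after_s
compression_ratio = size_before_gb / size_after_gb
space_saved = 1 - size_after_gb / size_before_gb

print(f"Query speedup:     {speedup:.0f}x")
print(f"Compression ratio: {compression_ratio:.1f}:1")
print(f"Space saved:       {space_saved:.1%}")
```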

Page 26

Customer Success: Oregon State University’s COAS

COAS: College of Oceanic and Atmospheric Sciences

Challenges

• To expand its infrastructure to support its leading-edge scientific research on the influence of the ocean and atmosphere on the Earth’s climate

• To meet the data-intensive demands of its scientific research and foster an environment that will address current and future workflows

Results

• With Oracle, COAS has an easy-to-manage, integrated system that delivers the flexibility and scalability necessary to address the exponential data increases associated with its leading-edge research, and to adjust quickly to ever-changing data availability requirements.

• As a result of extending its infrastructure with Oracle, COAS has improved data movement and performance by approximately 3 to 4 times, reduced system administration and management time, and unified research silos to gain a holistic view of integrated data sets.

• Additionally, COAS can now manage its unusually large input/output (I/O) loads, enabling the computation, storage, analysis and visualization of massive data flows.

Page 27

Customer Success: Indiana University

Challenges

• To provide researchers with a first-class database environment that is secure, reliable and easy to use

• To gain insight into the data rapidly and effectively by building and managing research-oriented, data-intensive applications

• To provide the tools, templates and plug-ins researchers need to easily leverage research data to enhance their findings and increase productivity

Results

• Enables research and effective data analysis in different fields

• Provides and runs a robust, secure and cost-effective research environment, protecting data and ensuring that researchers have access to state-of-the-art technology

• For additional insight into research data, it provides researchers with access to Oracle Data Mining, Oracle Spatial and Oracle OLAP to deliver its Database-as-a-Service to researchers both within Indiana University and at other universities around the country
