Why You Need to Govern Big Data

© 2014 IBM Corporation IBM Big Data Governance

Upload: ibm-analytics

Post on 08-Aug-2015


TRANSCRIPT


IBM Big Data Governance


What you’ll learn…

The opportunity

Big data governance:

Requirements

How it works

Capabilities

A holistic approach

Next steps


What Is Big Data?

Immense volume, variety and velocity of data, in context, beyond what was previously possible

Opportunity to derive new insights – challenged by questionable veracity

Veracity: Can I trust what I am seeing?

Volume, Velocity, Variety

• 500 million call detail records per day: Prevent customer churn
• 5 million trade events per second: Identify potential fraud
• 80% of data growth is images, video, documents: Improve customer satisfaction
• Hundreds of video feeds from surveillance cameras: Monitor events of interest
• 350 billion meter readings per annum: Predict power consumption
• 12 terabytes of Tweets created daily: Analyze product sentiment


What Can You Do With Big Data?

Utilities
• Weather analysis
• Smart grid management

Retail
• 360° View of the customer
• Real-time promotions

Law Enforcement
• Multimodal surveillance
• Cyber security detection

Transportation
• Logistics optimization
• Traffic congestion

Financial Services
• Fraud detection
• 360° View of the customer

Information Technology
• System Log Analysis
• Cybersecurity

Health & Life Sciences
• Epidemic early warning
• ICU monitoring

Telecommunications
• Geomapping/marketing
• Network monitoring


So, How Are We Doing?

• 1 in 3: Make decisions on untrusted information
• 1 in 2: Don’t have necessary information
• 60%: Have more data than they can use
• 80%: Time spent per big data project to find, prepare, understand & defend information due to lack of context


66% of Americans in a recent survey don’t want personalized online advertising, increasing to 86% when you tell them the information you collect and store in order to do it.


Context, Agility and Security are Essential Requirements to Meet Business Objectives in a Big Data Environment

Agility: A business framework (policies) for determining how and where to use big data.

Context: Flexibility to establish and maintain context independent of the volume, variety and velocity of data.

Security: Protection of data privacy and access; compliance with data security and other regulatory requirements.


Context Requires Governance; Agility Requires a Unique Big Data Approach to Governance

Traditional approach: Govern data to the highest standard. Store it, then use it for multiple purposes.
Data → Govern to Perfection → Repository → Use

Big data approach: Understand data and usage. Govern to the appropriate level. Use it, and iterate.
Data → Explore/Understand → Govern Appropriately → Use

How does an organization achieve agility in creating and continually evolving a safe and secure context in big data environments?


Big Data Governance is a Holistic Approach

PLAN
1. Define Business Problem: Begin by defining the business problem to solve with big data
2. Obtain Executive Sponsorship: Obtain executive sponsor to finalize priorities and goals
3. Align Teams: Update governance roles to account for big data
4. Understand Data Risk and Value: Categorize data to understand risk exposure

ACT
5. Implement Analytical / Operational Project(s): Implement planned projects with governed data search, preparation, defense and security (Find, Prepare, Defend, Secure and Comply)

ASSESS
6. Measure Results: Assess governance results and adjust


Key Data Scenarios for Big Data Governance

Find: Establish context to find, visualize, and understand data for improved decision making

Prepare: Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage

Defend: Build confidence in information by making it defensible against challenges

Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements

Analytical use Operational use


Key Data Scenarios for Big Data Governance

Find: Establish context to find, visualize, and understand data for improved decision making

The Cost is High: 80% of data scientists’ time on big data projects is spent finding and preparing data

Capabilities to Consider
• Connectivity to sources
• Real-time queries (SQL, etc.)
• Enterprise search
• Automated data discovery
• Data profiling
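To make the "data profiling" capability concrete, here is a minimal Python sketch of what a profile of a small table might compute: row counts, missing values, distinct values and the most frequent value per column. The function names, the sample records and the choice of statistics are illustrative assumptions, not part of the deck or of any IBM product.

```python
from collections import Counter


def profile_column(values):
    """Return simple profile statistics for one column of raw values."""
    total = len(values)
    # Treat None and empty strings as missing (an assumption for this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": total,
        "nulls": total - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_table(rows):
    """Profile every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample records, invented for illustration only.
records = [
    {"customer_id": "C1", "country": "US"},
    {"customer_id": "C2", "country": ""},
    {"customer_id": "C3", "country": "US"},
    {"customer_id": "C3", "country": "PK"},
]

report = profile_table(records)
for column, stats in report.items():
    print(column, stats)
```

Even this small profile surfaces the kind of context the Find scenario calls for: a missing country value and a repeated customer identifier that would need attention before the data is used.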


Key Data Scenarios for Big Data Governance

Prepare: Understand context to extract, cleanse, integrate and monitor data properly to increase integrity and trustworthiness for subsequent usage

The Risk is Real

Capabilities to Consider
• Highly scalable data integration
• Define terms and policies
• Data cleansing
• Quality dashboarding
• Rich annotation


Key Data Scenarios for Big Data Governance

Defend: Build confidence in information by making it defensible against challenges

The Risk is Real: 1 in 3 make decisions on untrusted information

Capabilities to Consider
• Maintain data lineage
• Data quality dashboarding
• Master data management
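One way to picture "maintain data lineage" is as a log of which datasets each derived dataset was built from, so that a figure in a report can be traced back to its sources when challenged. The sketch below is a hypothetical in-memory illustration in Python; the class names, dataset names and operations are invented for the example and do not describe any particular lineage tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One derivation step: `dataset` was produced from `inputs` by `operation`."""
    dataset: str
    operation: str
    inputs: list
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class LineageLog:
    """Append-only record of derivation steps (an illustrative structure)."""

    def __init__(self):
        self.events = []

    def record(self, dataset, operation, inputs):
        self.events.append(LineageEvent(dataset, operation, list(inputs)))

    def upstream(self, dataset):
        """Return every dataset that `dataset` was directly or indirectly derived from."""
        found = []
        seen = {dataset}
        stack = [dataset]
        while stack:
            current = stack.pop()
            for event in self.events:
                if event.dataset != current:
                    continue
                for source in event.inputs:
                    if source not in seen:
                        seen.add(source)
                        found.append(source)
                        stack.append(source)
        return found


# Hypothetical pipeline: raw call records are cleansed, then joined with master data.
log = LineageLog()
log.record("clean_calls", "cleanse", ["raw_calls"])
log.record("churn_features", "join", ["clean_calls", "customer_master"])

sources = log.upstream("churn_features")
print(sorted(sources))
```

Being able to answer "where did this number come from?" from a record like this is what makes information defensible in the sense the Defend scenario describes.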


Key Data Scenarios for Big Data Governance

Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements

The Risk is Severe: $200 million just to replace cards!

Capabilities to Consider
• Secure data at rest and in motion
• Data masking
• Governed data retention
• Test data management
• Governance reporting
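As an illustration of the "data masking" capability, the following Python sketch hides all but the last four digits of a card number and replaces a customer identifier with a keyed hash, so that analysts can still join and count records without seeing the sensitive values. The field names, the sample record and the key handling are assumptions made for this example; a production system would take the key from a managed secret store and follow its own masking policy.

```python
import hashlib
import hmac

# Hypothetical key for the sketch; in practice this would come from a secret manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_card_number(card_number):
    """Keep only the last four digits of a card number; mask the rest."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a stable keyed hash (HMAC-SHA256, truncated)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_record(record):
    """Return a masked copy of a record; the original is left unchanged."""
    masked = dict(record)
    masked["card_number"] = mask_card_number(record["card_number"])
    masked["customer_id"] = pseudonymize(record["customer_id"])
    return masked


# Invented sample record using a well-known test card number.
record = {"customer_id": "C-1001", "card_number": "4111 1111 1111 1111", "amount": 42.5}
masked = mask_record(record)
print(masked["card_number"])
```

Because the keyed hash is deterministic, the same customer always maps to the same pseudonym, which keeps masked data usable for analytics and test data management.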


IBM Big Data Governance Offers a Golden Opportunity

• 4 out of 5: Organizations rated their decision making as 7 or higher on a scale of 1 to 10
• 3X: Organizations are improving at 3 times the rate of competitors
• 77%: Of organizations show high or very high levels of trust

Source: The Big Data Imperative: Why Information Governance Must Be Addressed Now, Aberdeen Group, Dec 2012


All Hadoop Vendors Talk About Their Big “Data Lake”.

ONLY IBM Delivers Consumable Big Data From The Swamp.

Hadoop Data Swamp → Clean Hadoop Lake

IBM Big Data Governance–including quality, security, and data lineage–transforms your Hadoop Data Swamp to a consumable Big Data Lake.


A Complete Big Data Solution Is More Than Just An Engine


Vendors compared: IBM, Teradata, Pivotal, INFA, Cloudera, Horton

Capabilities compared (with the partner or tool names that appear in the chart cells):
• Hadoop Distribution: Horton
• Hadoop Available via Appliance: ORCL & HP; Teradata
• Hadoop SQL Engine: Postgre
• Streaming Data: Flume/Storm; Flume/Storm
• Data Exploration Tools
• Enterprise Reporting
• Data Provisioning Tools: IBM, INFA; Scripting; Talend
• Security Monitoring: Protegrity
• ELT, ETL & Replication: IBM, INFA; Talend
• Metadata & Lineage: Revelytix
• Profile & Cleanse (native): IBM, INFA; Talend
• Hadoop Matching (native): IBM, INFA
• Reference Data Mgmt.
• Data Masking on Hadoop: IBM, INFA
• Archiving on Hadoop


Reduces reporting time from 2 to 3 days to minutes

“The IBM analytics solution greatly improves our ability

to define and monitor business KPIs, and it brings much

greater transparency to reporting. We now have a

single version of the truth and a single comprehensive

report for each topic.”

— Irfan Zafar, Chief Technology Innovation Officer

and Senior General Manager of Customer Services,

Sui Southern Gas Company Limited

Enables timely analytics, combining real-time operational and geographic data from over 5000 sources

Single source of information that is reliable and provides better clarity into the supply chain

Chemicals & Petroleum, Energy & Utilities

The transformation: Deployed an analytics solution

that overlays digital maps with real-time operational

and financial data, enabling SSGC to analyze data in

a real-world context.

IBM Software–Information Management

Sui Southern Gas Company

Mitigates Business Risk Through Insights Into Supply and Demand

Learn more: https://ibm.biz/bigdatagovernance


Legal Disclaimer

• © IBM Corporation 2014. All Rights Reserved.

• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
