Hadoop as a Data Hub


Upload: dianna-doan

Post on 08-Apr-2017


TRANSCRIPT

Page 1: Hadoop as a Data Hub

© 2012 Datameer, Inc. All rights reserved.

Audio

• Audio will be streamed over the web for today’s webcast

• Make sure your computer speakers are turned up and the volume is adjusted

• If you are having trouble connecting, please send the host a chat message through the chat window

Page 2: Hadoop as a Data Hub


Hadoop as a Data Hub: A Sears Case Study

Page 5: Hadoop as a Data Hub


About our Speaker

Phil Shelley

Dr. Shelley is CTO at Sears Holdings Corporation (SHC), where he leads IT Operations and focuses on the modernization of IT across the company.

Phil is also CEO of Metascale, a subsidiary of Sears Holdings. Metascale is an IT managed services company that makes Big Data easy by designing, delivering and operating Hadoop-based solutions for analytics, mainframe migration and massive-scale processing, integrated into the customers’ enterprise.

Page 6: Hadoop as a Data Hub


About our Speaker

Stefan Groschupf

Stefan Groschupf is the co-founder and CEO of Datameer and one of the original contributors to Nutch, the open source predecessor of Hadoop.

Prior to Datameer, Stefan was the co-founder and CEO of Scale Unlimited, which implemented custom Hadoop analytic solutions for HP, Sun, Deutsche Telekom, Nokia and others. Earlier, Stefan was CEO of 101Tec, a supplier of Hadoop and Nutch-based search and text classification software to industry-leading companies such as Apple, DHL and EMI Music. Stefan has also served as CTO at multiple companies, including Sproose, a social search engine company.

Page 7: Hadoop as a Data Hub

Hadoop as a Data Hub: A New Approach to Data Management

Dr. Phil Shelley

CTO, Sears Holdings

CEO, MetaScale

Page 8: Hadoop as a Data Hub

Challenges & Trends

The Challenge

• Data Volume / Retention

• Batch Window Limits

• Escalating IT Costs

• Scalability

• Ever Evolving Business

• ETL Complexity / Costs

• Data Latency / Redundancy

• Tight IT Budgets

Constant pressure to lower costs, deliver faster, migrate to real time and answer more difficult questions…

• Batch → Real-Time

• Proprietary → Open Source

• Capital Expense → Cloud

• Heavy Iron → Commodity

• Linear → Parallel Processing

• Copy and Use → Source Once & Re-Use

• Costs → Down

• Power → Up

Page 9: Hadoop as a Data Hub

What is a Data Hub

A single, consolidated, fully populated data archive that gives unfettered user access to analyze and report on data, with appropriate security, as soon as the data is created by the transactional or other source system.

Page 10: Hadoop as a Data Hub

Why a Data Hub

• Most data latency is removed

• Users and analysts are put in a self-service mode

• The concept of a “data cube” is unnecessary

• Analysis at the lowest level – No need to run at the segment level

• Any question can be asked

• Business users and analysts have unrestricted ability to explore

• Correlation of any data set is immediately possible

• Significant reduction in reporting and analysis times

– Time to source the data

– Time for users to gain access to the data

• Reduction in IT labor

– Source Once – Use Many Times
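To illustrate why a pre-built "data cube" becomes unnecessary once the lowest-level detail is retained, here is a minimal Python sketch that aggregates detail records along whichever dimension the analyst picks at query time. The record fields (`store`, `category`, `amount`), the sample values and the function name `summarize` are assumptions made for illustration; they are not taken from the Sears or Datameer implementation.

```python
from collections import defaultdict

# Hypothetical lowest-level detail records (one per transaction line).
DETAIL = [
    {"store": "A", "category": "tools", "amount": 20.0},
    {"store": "A", "category": "apparel", "amount": 35.0},
    {"store": "B", "category": "tools", "amount": 15.0},
    {"store": "B", "category": "tools", "amount": 5.0},
]

def summarize(records, dimension):
    """Sum `amount` grouped by any field, chosen at query time."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["amount"]
    return dict(totals)

# The same detail answers two different questions; nothing is pre-segmented.
by_store = summarize(DETAIL, "store")
by_category = summarize(DETAIL, "category")
```

Because the grouping field is an argument rather than a fixed layout, a new question only needs a new call, not a new cube.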

Page 11: Hadoop as a Data Hub

The Traditional Approach

• Data is copied from source systems via ETL

• Subsets of data are captured

– Too expensive to keep all detail

– Takes too long to ETL all data fields from sources

• Each use of data generates more unique ETL jobs

• Data is segmented to reduce query times

• Cubes or views are generated to improve analysis speed

• Disparate data silos require ETL before users have access

• Data warehouse costs and performance limitations force archiving and data truncation

• Tends to lead to different versions of “truth”

• Time lag or latency from data generation to use

Page 12: Hadoop as a Data Hub

Benefits - Hadoop as a Data Hub

• All data is available

– All history

– All detail

• No need to filter, segment or cube before use

• Data can be consumed almost immediately

• No need to silo into different databases to accommodate performance limitations

• Users do not require IT to ETL data before use

• Security is applied via Datameer profiles

• User self-service is a reality

Page 13: Hadoop as a Data Hub

Prerequisites

• An Enterprise data architecture that has a Data Hub as a foundation

• Data sourcing must be controlled

• Metadata must be created for data sources

• A leader with the vision and capability to drive

• Willing business users to pilot and coach others

• A sustained strategy for Enterprise Data Architecture and governance

• A carefully designed Hadoop data layer architecture
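As a rough illustration of two of the prerequisites above (controlled sourcing, and metadata for every data source), the Python sketch below keeps a small catalog that records who owns each feed and refuses to register the same data set twice. The class names `SourceMetadata` and `Catalog`, the field names and the sample entry are all hypothetical; the slides do not describe how Sears actually stores its metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMetadata:
    """Descriptive record for one data source landed in the hub (hypothetical fields)."""
    name: str            # unique name of the data set in the hub
    source_system: str   # system that creates the data
    owner: str           # team accountable for the feed
    fields: tuple        # column names, in source order

class Catalog:
    """Registry that admits each source exactly once."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        # Controlled sourcing: a second feed of the same data set is rejected.
        if meta.name in self._entries:
            raise ValueError(f"{meta.name!r} is already sourced")
        self._entries[meta.name] = meta

    def describe(self, name):
        """Return the metadata recorded for a registered source."""
        return self._entries[name]

catalog = Catalog()
catalog.register(
    SourceMetadata("pos_sales", "point-of-sale", "retail-it",
                   ("sku", "store", "qty", "price"))
)
```

Rejecting duplicates at registration time is one simple way to enforce "source once"; a real deployment would likely add schema versions and access rules.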

Page 14: Hadoop as a Data Hub

Key Concepts

• A Data Hub is now reality

• Drives lower costs and reduces delays

• Time to value for data is reduced

• Business users and analysts are empowered

• The most important:

– Source Once – Re-use Many Times

– Source everything

– Retain everything

Page 15: Hadoop as a Data Hub

Key Learning

• ETL complexity is no longer needed – DATA HUB

– Source Once – Re-Use many times

– ETL is transformed to ELTTTTTT with lower data latency

– Consume data in-place with Datameer

• ETL-induced data latency is largely eliminated

– Analysis is routinely possible within minutes of data creation

• Long-running overnight workload on legacy systems

– Can be eliminated and executed at any time

– Run times are a fraction of the original clock time

• Batch processing on mainframes or other conventional batch

– Moved to Hadoop

– Runs 10, 50, even 100 times faster

• Intelligent Archive

– Put your archives/tape data on Hadoop and make it intelligent

– Archive with the ability to run analytics or join it with other data

• Modernize Legacy

– Mainframe MIPS reduction has very attractive ROI

– Move Data Warehouse workload – Reduce Cost – Go Faster
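The "Source Once – Re-Use many times" and "ELTTTTTT" points can be sketched in a few lines of Python: the raw extract is loaded a single time with every field intact, and each consumer then applies its own transform at read time. The sample data, the column names and the function names (`load_once`, `revenue_by_store`, `units_by_sku`) are assumptions for illustration only; in the deployment described in the slides the data would sit in Hadoop and be consumed in place with Datameer rather than in an in-memory list.

```python
import csv
import io

# Hypothetical raw extract, landed exactly as the source system produced it.
RAW_EXTRACT = "sku,store,qty,price\n100,A,2,9.50\n200,A,1,20.00\n100,B,3,9.50\n"

def load_once(raw_text):
    """Extract and load: keep every row and every field, untransformed."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def revenue_by_store(rows):
    """One consumer's transform, applied at read time."""
    totals = {}
    for row in rows:
        value = int(row["qty"]) * float(row["price"])
        totals[row["store"]] = totals.get(row["store"], 0.0) + value
    return totals

def units_by_sku(rows):
    """A second, independent transform over the same loaded data."""
    totals = {}
    for row in rows:
        totals[row["sku"]] = totals.get(row["sku"], 0) + int(row["qty"])
    return totals

hub = load_once(RAW_EXTRACT)      # sourced once
revenue = revenue_by_store(hub)   # transform 1
units = units_by_sku(hub)         # transform 2, no new extract job
```

Adding a third question means writing another transform function, not another extract from the source system.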

Page 16: Hadoop as a Data Hub

Sample Reports - Datameer

Page 17: Hadoop as a Data Hub


Questions and Answers

Page 18: Hadoop as a Data Hub


Online Resources

• Try Datameer: www.datameer.com

• Visit Metascale: www.metascale.com

• Follow us on Twitter: @datameer & @BigDataMadeEasy