Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Upload: hortonworks

Post on 26-Jan-2015

Category: Technology


DESCRIPTION

Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for searching and visualizing the same data. Join us to learn how you can use both platforms together to maximize the value of your big data. In this webinar we'll walk you through:
• How Elasticsearch fits in the Modern Data Architecture
• A demo of Elasticsearch and Hortonworks Data Platform
• Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data

TRANSCRIPT

Page 1: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

© Hortonworks Inc. 2013

Combine Apache Hadoop & Elasticsearch to get the most of your big data...

Page 2: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Your Presenters

• Steve Mayzak (@smayzak), Head of Sales Engineering, Seahawks fan!
• Mark Lochbihler (@mlochbihler), Partner Solutions Engineer, HUGE FC Barcelona fan!

Page 3: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Today’s Topics

• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

Page 4: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Hadoop Adoption

“Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data”

--Mike Gualtieri, Forrester

Page 5: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

A Traditional Approach Under Pressure

[Diagram: traditional data architecture]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System (Repositories): RDBMS, EDW, MPP
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Source: IDC
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020

Page 6: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Emerging Modern Data Architecture

[Diagram: modern data architecture]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Dev & Data Tools: Build & Test
• Data System (Repositories): RDBMS, EDW, MPP
• Operational Tools: Manage & Monitor
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 7: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

MDA Driver #1: A New Approach to Insight

Current Approach
• Apply schema on write
• Heavily dependent on IT
• Cycle: Determine list of questions; Design solution; Collect structured data; Ask questions from list; Detect additional questions
• Single Query Engine: SQL

Hadoop Approach
• Apply schema on read
• Support range of access patterns to data stored in HDFS: polymorphic access
• HADOOP: Iterate over structure; Transform and Analyze
• Right Engine, Right Job: batch, interactive, real-time, in-memory
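
To make the schema-on-write versus schema-on-read contrast on this slide concrete, here is a minimal Python sketch. It is an illustration only, not something from the webinar: the sample events, the field names, and the `read_with_schema` helper are all hypothetical. The raw records are kept untouched, and each question applies its own schema at read time.

```python
import json

# Raw events are stored exactly as they arrived; no schema is enforced on write.
# (Hypothetical sample data for illustration.)
RAW_EVENTS = [
    '{"user": "alice", "page": "/home", "ms": 120}',
    '{"user": "bob", "page": "/cart", "ms": 340, "referrer": "ad"}',
    '{"user": "alice", "page": "/cart", "ms": 95}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: project each raw record onto `fields`.

    Fields missing from a record come back as None instead of failing.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two different questions, two different schemas, same untouched raw data.
page_views = list(read_with_schema(RAW_EVENTS, ["user", "page"]))
referrals = list(read_with_schema(RAW_EVENTS, ["page", "referrer"]))
```

Because nothing was discarded at write time, a question nobody anticipated (here, referrers) can still be answered later by reading the same data with a new schema.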

Page 8: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

MDA Driver #2: Data Warehouse Optimization

Current Reality
• EDW at capacity; some usage from low value workloads
• Older transformed data archived, unavailable for ongoing exploration
• Source data often discarded
• EDW workload: Analytics 20%, ETL Process 30%, Operations 50%

Augment with Hadoop
• Free up EDW resources from low value tasks
• Keep 100% of source data and historical data for ongoing exploration
• Mine data for value after loading it because of schema-on-read
• HADOOP: Parse, cleanse, apply structure, transform
• EDW workload: Operations 50%, Analytics 50%

Page 9: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

The Common Journey with Hadoop

[Diagram: axes SCALE and SCOPE]
• New Analytic Apps, New Types of Data (LOB Driven)
• More data and analytic apps
• MDA/Data Lake
• Cost, Insight (IT Driven)

Page 10: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Unlock Value in New Types of Data

1. Social: Understand how people are feeling and interacting, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
+ Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long-term trends and previously hidden value

Page 11: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

20 Business Applications of Hadoop

Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social

Page 12: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

YARN Unlocks the Data Lake Vision

1st Gen of Hadoop: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
• Flexible Data Processing: Hive, Pig, others…
• Batch: MapReduce
• Batch & Interactive: Tez
• Online Data Processing: HBase, Accumulo
• Stream Processing: Storm
• others…
• Efficient Cluster Resource Management & Shared Services (YARN)
• Redundant, Reliable Storage (HDFS)

Diagram label: Classic Hadoop Apps

Store all data in one place, interact in multiple ways

Page 13: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

The Common Journey with Hadoop

[Diagram: axes SCALE and SCOPE]
• New Analytic Apps, New Types of Data (LOB Driven)
• More data and analytic apps
• MDA/Data Lake
• Cost, Insight (IT Driven)

Page 14: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Example Journey Towards a Data Lake

[Diagram: axes DATA (TBs to PBs) and VALUE]
• Risk Management, e.g., Fraud Reduction
• Operational Excellence, e.g., Network Maintenance
• New Business, e.g., Data as a Product
• Customer Intimacy, e.g., 360 Degree View of the Customer

DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale

Page 15: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Enabling Hadoop for the Enterprise

[Timeline: 2006 to 2015]

1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integration: Interoperable with existing data center investments

Page 16: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Core Capabilities of Enterprise Hadoop

• Presentation & Application: Enable both existing and new applications to provide value to the organization
• Operations: Empower current operations and security tools to manage Hadoop
• Deployment Model: Provide the efficient deployment option for your organization
• Data Governance: Integrate with existing systems and move data in/out and within the environment
• Security: Provide layered approach to security through Authentication, Authorization, Accountability and Data Protection
• Operations: Allow you to deploy and effectively manage the environment
• BROAD INSIGHT, Data Access: Access your data simultaneously in multiple ways (batch, interactive, real-time)
• EFFICIENT SCALE, Data Management: Store and process all of your Corporate Data Assets

1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all

Page 17: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Enabling Familiar and Existing Tools

• DEVELOPER: COLLECT, PROCESS, BUILD
• ANALYST: EXPLORE, QUERY, DELIVER
• OPERATOR: PROVISION, MANAGE, MONITOR

1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integration: Interoperable with existing data center investments

Page 18: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Requirements for Enterprise Hadoop

[Diagram: modern data architecture]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Dev & Data Tools: Build & Test
• Data System (Repositories): RDBMS, EDW, MPP
• Operational Tools: Manage & Monitor
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integration: Interoperable with existing data center investments

Integrate with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances

Page 19: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Elasticsearch in the Modern Data Architecture

[Diagram: modern data architecture]
• Applications
• Dev & Data Tools
• Data System: RDBMS, EDW, MPP, HANA
• Operational Tools
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
• Infrastructure

Page 20: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Today’s Topics

• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

Page 21: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited

What is Elasticsearch?

Page 22: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Elasticsearch

• real time
• search and analytics engine
• open-source
• Lucene based
• distributed
• scales massively
• high availability
• RESTful API
• JSON over HTTP
• schema free
• multi tenancy
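
As a rough illustration of what "RESTful API", "JSON over HTTP" and "schema free" mean in practice, the Python sketch below builds (but does not send) two requests using only the standard library. The host and port, the `weblogs` index name, the sample document, and the `build_json_request` helper are assumptions made for this example and are not taken from the webinar.

```python
import json
import urllib.request

# Assumption: a node listening on localhost:9200 (Elasticsearch's default HTTP port).
BASE_URL = "http://localhost:9200"


def build_json_request(method, path, body):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Index a document without declaring a mapping first ("schema free").
# "weblogs" is a hypothetical index name. The _doc endpoint is the typeless
# form used by recent Elasticsearch versions; releases from the era of this
# deck used a type name in that position instead.
index_req = build_json_request(
    "PUT",
    "/weblogs/_doc/1",
    {"host": "web01", "status": 500, "message": "upstream timeout"},
)

# Full-text search over the same index using the query DSL.
search_req = build_json_request(
    "POST",
    "/weblogs/_search",
    {"query": {"match": {"message": "timeout"}}},
)

# To send a request against a running cluster:
#     with urllib.request.urlopen(index_req) as resp:
#         print(json.load(resp))
```

Both operations are plain HTTP verbs on resource-style URLs with JSON bodies, which is why any language with an HTTP client can talk to a cluster.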

Page 23: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

The Elasticsearch ELK Stack

• Logstash: Data From Any Source
• Elasticsearch: Instantly Analyze
• Kibana: Actionable Insights

Page 24: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

What about Elasticsearch the Company?

• Support 100s of Companies in Production environments
• Training Developers and Ops around the world on ELK
• Drive the ELK Projects forward, great things to come!
• Commercial products: Marvel to monitor and manage ELK
• Backed by the best: Benchmark, Index Ventures

Page 25: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Who’s using Elasticsearch?

Page 26: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

What are people saying about Elasticsearch?

Page 27: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Real-time Search

• Europe’s largest professional social network
• Over 14 Million members
• New data available for search immediately (vs. 50 minutes)
• “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” - Dr. Daniel Olmedilla, Vice President, Data Science at XING

Page 28: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

How do they fit together?

Page 29: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

[Diagram: how Hadoop and Elasticsearch fit together]
• Raw data
• Clean, Enrich
• Elasticsearch-Hadoop Library
• Integrate Natively
• Choice
• Index seamlessly
• Elasticsearch
• Free Text Search
• Analytics

Page 30: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Elasticsearch-Hadoop Library

• Java Library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search & Real-time Analytics with Elasticsearch, Hadoop as Data Lake
• Scales with Hadoop
• Works with Apache Hadoop, Certified on HDP 1.x and 2.x (YARN-compatible binary)

Page 31: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Multiple Architectures

- Same Hardware
- 1 for 1

Page 32: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Multiple Architectures

- Separate Hardware
- Clusters of each
- Scale Independently

[Diagram: three ES Nodes]

Page 33: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Show me!

• Hortonworks HDP Sandbox - making Hadoop easy!
• Installed Elasticsearch, Marvel and Kibana on Sandbox
• Upload elasticsearch-hadoop jar as Pig Storage lib
• Index CSV data from Pig to Elasticsearch
• Query Elasticsearch from Pig - best of both
• Kibana to Visualize and Discover
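
The demo itself indexes CSV data through Pig and the elasticsearch-hadoop jar. As a stand-in that needs no Hadoop cluster, the Python sketch below shows the underlying idea only: each CSV row becomes one JSON document, packaged in the newline-delimited format that Elasticsearch's `_bulk` endpoint accepts. The sample data, the `weather` index name, and the `csv_to_bulk_body` helper are hypothetical, and this is not how elasticsearch-hadoop is implemented internally.

```python
import csv
import io
import json

# Hypothetical CSV input; in the demo this data would sit in Hadoop.
CSV_TEXT = """id,city,temp_c
1,Seattle,11.5
2,Barcelona,19.0
"""


def csv_to_bulk_body(csv_text, index_name):
    """Turn CSV rows into a newline-delimited body for the _bulk API.

    Each document takes two lines: an action line naming the target index
    and id, then the document source. The body must end with a newline.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc_id = row.pop("id")
        row["temp_c"] = float(row["temp_c"])  # CSV values arrive as strings
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


bulk_body = csv_to_bulk_body(CSV_TEXT, "weather")
```

The resulting string could be sent as the body of a POST to a cluster's `/_bulk` endpoint; once indexed, the documents are available to Kibana for the visualize and discover step.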

Page 34: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Where to find us?

• elasticsearch.com
• elasticsearch.org
• @elasticsearch
• #elasticsearch
• IRC (webchat.freenode)
• GitHub: elasticsearch/elasticsearch

Page 35: Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Try Hadoop Today… Get Involved

• Download the Hortonworks Sandbox
• Learn Hadoop
• Build Your Analytic App
• Try Hadoop 2

More about Elasticsearch & Hortonworks: hortonworks.com/partner/elasticsearch

Contact us: [email protected]