Hortonworks Presentation at Big Data London


Upload: hortonworks

Post on 10-May-2015


Page 1: Hortonworks Presentation at Big Data London

© Hortonworks Inc. 2013

Hortonworks Enterprise Apache Hadoop

March 5, 2013

Page 2: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our Approach

•  Customer Use Cases

Page 3: Hortonworks Presentation at Big Data London

Housekeeping Items

•  Restrooms on 2nd and 4th Floors

•  Hadoop Summit
–  March 20-21 in Amsterdam
–  PreConference Training on March 18-19
–  Discount Code Amst13Spon20

•  Download SandBox
–  QR Code at postcode on table

Page 4: Hortonworks Presentation at Big Data London

A Brief History of Apache Hadoop

Timeline (2004 to 2013):

•  Focus on INNOVATION, 2005: Yahoo! creates team under E14 to work on Hadoop

•  Focus on OPERATIONS, 2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters

•  STABILITY, 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!

Milestones marked on the timeline: Apache Project Established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform

Page 5: Hortonworks Presentation at Big Data London

Hortonworks Snapshot

•  We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform

•  We engineer, test & certify HDP for enterprise usage

•  We employ the core architects, builders and operators of Apache Hadoop

•  We drive innovation within Apache Software Foundation projects

•  We are uniquely positioned to deliver the highest quality of Hadoop support

•  We enable the ecosystem to work better with Hadoop

Develop Distribute Support

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

Endorsed by Strategic Partners

•  Headquarters: Palo Alto, CA
•  Employees: 180+ and growing
•  Investors: Benchmark, Index, Yahoo

Page 6: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our approach
–  Leading Open Source Hadoop innovation
–  Addressing “Enterprise Hadoop” Requirements
–  Enabling Interoperability of the Ecosystem
–  Ensuring No Lock-In: 100% Open Source

•  Patterns of Use

Page 7: Hortonworks Presentation at Big Data London

Apache Community Leadership

Apache Software Foundation Guiding Principles
•  Release early & often
•  Transparency, respect, meritocracy

Key Roles held by Hortonworkers
•  VP & PMC Members
–  Arun Murthy (Hadoop), Daniel Dai (Pig), Mahadev Konar (ZooKeeper)
•  Release Managers
–  Matt Foley (Hadoop 1.x), Arun Murthy (Hadoop 2.x), Ashutosh Chauhan (Hive), Daniel Dai (Pig), Alan Gates (HCatalog), Mahadev Konar (Ambari)
•  Committers
–  54 across all Hadoop-related projects

[Diagram: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects, each cycling through Design & Develop, Test & Patch, Release]

“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”

- Jeff Kelly: Wikibon

Page 8: Hortonworks Presentation at Big Data London

Leadership that Starts at the Core

•  Driving next generation Hadoop
–  YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery

•  420k+ lines authored since 2006
–  More than twice nearest contributor

•  Deeply integrating w/ecosystem
–  Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
–  Creating deeply engineered solutions (ex. Teradata big data appliance)

•  All Apache, NO holdbacks
–  100% of code contributed to Apache

Page 9: Hortonworks Presentation at Big Data London

Driving Enterprise Hadoop Innovation

[Chart: Lines Of Code By Company, as a percentage (0% to 100%) for AMBARI, HBASE, HCATALOG, HIVE, PIG and HADOOP CORE; series: Hortonworks, Yahoo!, Cloudera, Other. Source: Apache Software Foundation]

Hortonworks Committers / Cloudera Committers (one pair per project): 19 / 9, 5 / 1, 1 / 0, 5 / 0, 3 / 7, 14 / 0

Page 10: Hortonworks Presentation at Big Data London

Hortonworks Process for Enterprise Hadoop

[Diagram: Upstream Community Projects (Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, other Apache projects; cycle: Design & Develop, Test & Patch, Release) send Stable Project Releases to the Downstream Enterprise Product, Hortonworks Data Platform (cycle: Design & Develop, Integrate & Test, Package & Certify, Distribute), which sends Fixed Issues back upstream]

•  No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects

•  Virtuous cycle: development and issue fixes are done upstream, and stable project releases flow downstream

Page 11: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our approach
–  Leading Open Source Hadoop Innovation
–  Addressing “Enterprise Hadoop” Requirements
–  Enabling Interoperability of the Ecosystem
–  Ensuring NO LOCK-IN: 100% Open Source

•  Patterns of use

Page 12: Hortonworks Presentation at Big Data London

Enhancing the Core of Apache Hadoop

Deliver high-scale storage & processing with enterprise-ready platform services

Unique Focus Areas:
•  Bigger, faster, more flexible
–  Continued focus on speed & scale and enabling near-real-time apps
•  Tested & certified at scale
–  Run ~1300 system tests on large Yahoo clusters for every release
•  Enterprise-ready services
–  High availability, disaster recovery, snapshots, security, …

Hortonworkers are the architects, operators, and builders of core Hadoop

[Diagram: HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]

Page 13: Hortonworks Presentation at Big Data London

Data Services for Full Data Lifecycle

Provide data services to store, process & access data in many ways

Unique Focus Areas:
•  Apache HCatalog
–  Metadata services for consistent table access to Hadoop data
•  Apache Hive
–  Explore & process Hadoop data via SQL & ODBC-compliant BI tools

Hortonworks enables Hadoop data to be accessed via existing tools & systems

[Diagram: DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]

Page 14: Hortonworks Presentation at Big Data London

Operational Services for Ease of Use

Include complete operational services for productive operations & management

Unique Focus Area:
•  Apache Ambari
–  Provision, manage & monitor a cluster
–  Complete REST APIs to integrate with existing operational tools
–  Job & task visualizer to diagnose issues

Only Hortonworks provides a complete open source Hadoop management tool

[Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
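As an illustration of the Ambari REST integration point mentioned on this slide, here is a minimal Python sketch that builds, but does not send, an authenticated request to list the services of a cluster. The host name, cluster name and credentials are hypothetical placeholders, and the `/api/v1/clusters/<name>/services` path and default port 8080 should be checked against the documentation for the Ambari version in use.

```python
import base64
import urllib.request


def ambari_request(host, path, user, password, port=8080):
    """Build (but do not send) a GET request against an Ambari server.

    Assumptions: HTTP basic authentication and an API rooted at /api/v1.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical host, cluster name and credentials, for illustration only.
req = ambari_request("ambari.example.com", "clusters/demo/services", "admin", "admin")
print(req.get_method(), req.full_url)
```

An existing monitoring tool could pass such a request to `urllib.request.urlopen` and parse the JSON response; building the request separately keeps the sketch runnable without a live cluster.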

Page 15: Hortonworks Presentation at Big Data London

Deployable Across a Range of Options

Only Hortonworks allows you to deploy seamlessly across any deployment option
•  Linux & Windows
•  Azure, Rackspace & other clouds
•  Virtual platforms
•  Big data appliances

[Diagram: HORTONWORKS DATA PLATFORM (HDP), made up of OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), running on OS, Cloud, VM, Appliance]

Page 16: Hortonworks Presentation at Big Data London

HDP: Enterprise Hadoop Distribution

Hortonworks Data Platform (HDP) Enterprise Hadoop

•  The ONLY 100% open source and complete distribution

•  Enterprise grade, proven and tested at scale

•  Ecosystem endorsed to ensure interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP), made up of OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), running on OS, Cloud, VM, Appliance]

Page 17: Hortonworks Presentation at Big Data London

HDP 1.2: Data Services Improvements

Hortonworks Data Platform (HDP) Enterprise Hadoop

•  The ONLY 100% open source and complete distribution

•  Enterprise grade, proven and tested at scale

•  Ecosystem endorsed to ensure interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP) with layers OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE and PLATFORM SERVICES (Enterprise Readiness: High Availability, Disaster Recovery, Snapshots, Security, etc…), running on OS, Cloud, VM, Appliance. Components shown: HCATALOG, HIVE, PIG, HBASE, SQOOP, FLUME, OOZIE, AMBARI, HDFS, WEBHDFS, MAP REDUCE, YARN (in 2.0)]

Page 18: Hortonworks Presentation at Big Data London

Latest Hortonworks Announcements

Two releases in January 2013

•  January 15: Hortonworks Data Platform 1.2
–  Hortonworks Brings Enterprise Manageability to 100% Open Source Apache Hadoop Distribution

•  January 22: Hortonworks Sandbox
–  Hortonworks accelerates Hadoop skills development with an easy-to-use, flexible and extensible platform to learn, evaluate and use Apache Hadoop

Page 19: Hortonworks Presentation at Big Data London

Latest Hortonworks Announcements

February 2013

•  February 20: New Apache projects
–  Hortonworks fuels open source by releasing three new projects: KNOX, TEZ, STINGER

•  February 25: HDP available on Microsoft Windows
–  To help Hadoop adoption, Hortonworks releases HDP on Microsoft Windows

Page 20: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our approach
–  Leading Open Source Hadoop Innovation
–  Addressing “Enterprise Hadoop” Requirements
–  Enabling Interoperability of the Ecosystem
–  Ensuring No Lock-in: 100% Open Source

•  Patterns of use

Page 21: Hortonworks Presentation at Big Data London

Existing Data Architecture

[Diagram, three layers]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); DEV & DATA TOOLS (BUILD & TEST); OPERATIONAL TOOLS (MANAGE & MONITOR)
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS

Page 22: Hortonworks Presentation at Big Data London

An Emerging Data Architecture

[Diagram, three layers]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM; DEV & DATA TOOLS (BUILD & TEST); OPERATIONAL TOOLS (MANAGE & MONITOR)
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA

Page 23: Hortonworks Presentation at Big Data London

Interoperating With Your Tools

[Diagram, three layers]
•  APPLICATIONS
•  DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM; DEV & DATA TOOLS; OPERATIONAL TOOLS
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA

Tools named on the slide: Viewpoint, Microsoft Applications

Page 24: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our approach
–  Leading Open Source Hadoop Innovation
–  Addressing “Enterprise Hadoop” Requirements
–  Enabling Interoperability of the Ecosystem
–  Ensuring No Lock-In: 100% Open Source

•  Patterns of use

Page 25: Hortonworks Presentation at Big Data London

Hortonworks

•  Who is Hortonworks

•  Our approach

•  Patterns of use

Page 26: Hortonworks Presentation at Big Data London

Operational Data Refinery

Collect data and apply a known algorithm to it in a trusted operational process

1  Capture: Capture all data

2  Process: Parse, cleanse, apply structure & transform

3  Exchange: Push to existing data warehouse for use with existing analytic tools

Patterns of use: Refine, Explore, Enrich

[Diagram, three layers]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
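The capture, process and exchange steps of the refinery pattern can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not Hadoop code: on a cluster the same logic would more likely be written in Pig, Hive or MapReduce. The log format (Apache-style access logs), the field names and the tab-separated output are all assumptions, since the slide says only "web logs".

```python
import re
from datetime import datetime

# Assumed input format: Apache-style access-log lines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def process(lines):
    """Step 2: parse, cleanse, apply structure & transform."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:  # cleanse: drop lines that do not parse
            continue
        rec = match.groupdict()
        rows.append({
            "ip": rec["ip"],
            "time": datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z"),
            "method": rec["method"],
            "path": rec["path"],
            "status": int(rec["status"]),
            "bytes": 0 if rec["size"] == "-" else int(rec["size"]),
        })
    return rows


def exchange(rows):
    """Step 3: emit tab-separated rows, e.g. for a warehouse bulk load."""
    return ["\t".join([r["ip"], r["time"].isoformat(), r["method"], r["path"],
                       str(r["status"]), str(r["bytes"])]) for r in rows]


# Step 1: capture everything, including lines that later fail to parse.
raw = [
    '10.0.0.1 - - [05/Mar/2013:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1043',
    'corrupted line',
    '10.0.0.2 - - [05/Mar/2013:10:15:40 +0000] "POST /login HTTP/1.1" 302 -',
]
rows = process(raw)
tsv = exchange(rows)
```

Keeping the raw capture separate from the structured output mirrors the pattern on the slide: all data is retained, and only the cleansed, structured subset is pushed to the warehouse.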

Page 27: Hortonworks Presentation at Big Data London

Big Data Exploration & Visualization

Collect data and perform iterative investigation for value

1  Capture: Capture all data

2  Process: Parse, cleanse, apply structure & transform

3  Exchange: Explore and visualize with analytics tools supporting Hadoop

Patterns of use: Refine, Explore, Enrich

[Diagram, three layers]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)

Page 28: Hortonworks Presentation at Big Data London

Application Enrichment

Collect data, analyze and present salient results for online apps

1  Capture: Capture all data

2  Process: Parse, cleanse, apply structure & transform

3  Exchange: Incorporate data directly into applications

Patterns of use: Refine, Explore, Enrich

[Diagram, three layers]
•  APPLICATIONS: Custom Applications, Enterprise Applications
•  DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL; HORTONWORKS DATA PLATFORM
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)

Page 29: Hortonworks Presentation at Big Data London

Key 2013 “Enterprise Hadoop” Initiatives

Invest In:
–  Platform Services: DR, Snapshot, …
–  Data Services: In support of Refine, Explore, Enrich
–  Operational Services: Manageability, Security, …

Initiatives shown on the HORTONWORKS DATA PLATFORM (HDP) diagram (layers: OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE, PLATFORM SERVICES):
•  Tez / “Stinger”: Interactive Query
•  “Gateway”: Secure Access
•  “Continuum”: Biz Continuity
•  Ambari: Manage & Operate
•  “Herd”: Data Integration
•  HBase: Online Data

Page 30: Hortonworks Presentation at Big Data London

Stinger: Make Hive Best for All Needs

Query classes by latency (plotted against Data Size):

•  Interactive (5s – 1m)
–  Parameterized Reports
–  Drilldown
–  Visualization
–  Exploration

•  Non-Interactive (1m – 1h)
–  Data preparation
–  Incremental batch processing
–  Dashboards / Scorecards

•  Batch (1h+)
–  Operational batch processing
–  Enterprise Reports
–  Data Mining

Improve Latency & Throughput
•  Query engine improvements
•  New “Optimized RCFile” column store
•  Next-gen runtime (elim’s M/R latency)

Extend Deep Analytical Ability
•  Analytics functions
•  Improved SQL coverage
•  Continued focus on core Hive use cases

Page 31: Hortonworks Presentation at Big Data London

Flexible Support Subscription Programs

Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage

•  Developer Support: “How to” guidance for developers and archs
–  12 x 5, Web only
–  All Sev: 1 business day
–  Application Design Advice, Code Review
–  1 seat

•  Essential Support*: Operations support for small research clusters
–  12 x 5, Web only
–  All Sev: 1 business day
–  Cluster Design, Install, Maintain, Performance
–  3 Contacts
–  Patches & Updates

•  Standard Support: Operations support for dev & test clusters
–  12 x 5, Web only
–  All Sev: 1 business day
–  Cluster Design, Install, Maintain, Performance
–  3 Contacts
–  Patches & Updates

•  Enterprise Support: Operations support for critical clusters
–  24 x 7, Phone & Web
–  Sev 1: 1 Hour; Sev 2: 4 Bus Hour
–  Cluster Design, Install, Maintain, Performance
–  5 Contacts
–  Patches & Updates
–  Additional Options

* Limited in size and no expansion

Page 32: Hortonworks Presentation at Big Data London

Hortonworks: Best In Class Hadoop Support

•  Experienced enterprise support team
–  Experience supporting enterprise clients in production
–  Core engineers have real operational experience: built and supported 44+K nodes in production
–  Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere

•  Global 24x7 operation
–  Support based in Sunnyvale, UK & India

•  Stringent case management processes ensure high quality customer service & responsiveness

Page 33: Hortonworks Presentation at Big Data London

Transferring Our Hadoop Expertise to You

The expert source for Apache Hadoop training & certification

•  World class training programs designed to help you learn fast
–  Role-based hands on classes with 50% lab time

•  Expert consulting services
–  Programs designed to transfer knowledge

•  Industry leading Hadoop Sandbox program
–  Fastest way to learn Apache Hadoop
–  Multi-level tutorials for wide applicability
–  Customizable and updateable

Page 34: Hortonworks Presentation at Big Data London

Summary

•  Leading the Innovation in Core Hadoop

•  Addressing the requirements for Enterprise usage

•  Enabling interoperability of the ecosystem

•  No lock-in. 100% Open Source.

•  Best in industry support with flexible pricing model

•  Find out more
–  www.hortonworks.com
–  http://hortonworks.com/hadoop-training/