Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Upload: hortonworks

Post on 10-May-2015

DESCRIPTION

Hortonworks continues to innovate across all Hadoop-related projects, packaging the most enterprise-ready components, such as Ambari, into the Hortonworks Data Platform (HDP). Please join us in this interactive webinar as we present real-world use cases of enterprise customers that are finding success with HDP and their Big Data initiatives. We will also introduce new features in version 1.2 of the Hortonworks Data Platform and explain how it has become the leading 100% open source distribution choice for the enterprise. In this webinar we will outline how enterprise customers are successful with HDP and review some of the newest features in version 1.2, including:
- How to provision a cluster
- How to manage and monitor a cluster using completely open source tools
- How to perform diagnostics to identify issues in a cluster

TRANSCRIPT

Page 1: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

© Hortonworks Inc. 2013

Hadoop Operations & Enterprise Readiness

HDP 1.2

Jim Walker

Jeff Sposetti

Page 2: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Hortonworks Snapshot

• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform

• We engineer, test & certify HDP for enterprise usage

• We employ the core architects, builders and operators of Apache Hadoop

• We drive innovation within Apache Software Foundation projects

• We are uniquely positioned to deliver the highest quality of Hadoop support

• We enable the ecosystem to work better with Hadoop

Develop Distribute Support

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

Endorsed by Strategic Partners

Page 3: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Hortonworks Process for Enterprise Hadoop

Upstream Community Projects: Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari and other Apache projects (Design & Develop, Test & Patch, Release)

Downstream Enterprise Product: Hortonworks Data Platform (Design & Develop, Integrate & Test, Package & Certify, Distribute)

No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects

Virtuous cycle: development and issue fixes are done upstream, and stable project releases flow downstream

Page 4: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Hortonworks Data Platform 1.2

• Quarterly cadence
– HDP is aligned tightly with the open source community software releases, not a patchwork
– Regular open source innovation based on an open community

• Ecosystem validation
– Packaged and tested with our key development partner, Yahoo!, across hundreds of nodes
– Ambari is the preferred management tool for integration with Microsoft System Center and Teradata Viewpoint, today.

Page 5: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP 1.2 Summary

Hortonworks Data Platform 1.2 outpaces the competition to extend leadership through 100% open source Enterprise Apache Hadoop

Focus areas:
1. Ambari: continued innovation with a complete, free and open cluster management tool
• Existing: Provision, Manage and Monitor your Hadoop infrastructure
• New: Root Cause Analysis with job diagnostics, usage heat maps
• Improved: Ecosystem integration and user interface
2. Enhanced security model and performance for Hive and HCatalog
3. Apache Mahout: now included in the HDP distribution

Page 6: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP Certifies Latest Stable Components

Apache Project   HDP 1.2   CDH3u5            CDH4.1.2
Hadoop           1.1.2     0.20.2 +923.418   2.0.0alpha +541
Pig              0.10.1    0.8.1 +51.39      0.10.0 +48
Hive             0.10.0    0.7.1 +42.56      0.9.0 +148
HCatalog         0.5.0     n/a               n/a
HBase            0.94.2    0.90.6 +84.73     0.92.1 +154
Sqoop            1.4.2     1.3.0 +5.88       1.4.1 +51
Oozie            3.2.0     3.2.0             3.2.0
Zookeeper        3.4.5     3.3.5 +19.5       3.4.3 +25
Ambari           1.2.0     n/a               n/a
Flume            1.3.0     0.9.4 +25.46      1.2.0 +119
Mahout           0.7.0     0.5 +9.7          0.7 +4

Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf

Page 7: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

A Brief History of Apache Hadoop

Timeline (2004 to 2013), with milestones: Apache Project Established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform

Focus on INNOVATION, 2005: Yahoo! creates team under E14 to work on Hadoop

Focus on OPERATIONS, 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters

STABILITY, 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo

Page 8: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP: Enterprise Hadoop Distribution

HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness

Hortonworks Data Platform (HDP): Enterprise Hadoop

• The ONLY 100% open source and complete distribution

• Enterprise grade, proven and tested at scale

• Ecosystem endorsed to ensure interoperability

Page 9: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Next-Generation Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications

DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP) and Hortonworks Data Platform

DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), including OLTP and POS systems; New Sources (web logs, email, sensor data, social media), including mobile data

OPERATIONAL TOOLS: Manage & Monitor

DEV & DATA TOOLS: Build & Test

Page 10: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP 1.2: Operational Services Improvements

HDP stack diagram: Data Services (Store, Process and Access Data); Hadoop Core (Distributed Storage & Processing); Platform Services (Enterprise Readiness: High Availability, Disaster Recovery, Snapshots, Security, etc.)

Apache Ambari 1.2
Hortonworks open source approach continues to accelerate enterprise adoption of Hadoop

– Open Source Approach: The only 100% open source Apache Hadoop cluster management tool

– Baseline Features: Delivers all necessary tools/functions to provision, manage and monitor an Apache Hadoop cluster

– Innovation: Provides ability to zoom into cluster usage and performance metrics for jobs and tasks to identify root cause of bottlenecks or operations issues

– Interoperable: Includes APIs for integrating with Microsoft System Center, Teradata Viewpoint, and other systems

Also Upgraded Oozie & Zookeeper

Operational Services (Manage & Operate at Scale): Oozie, Ambari

Page 11: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Apache Ambari Dashboard

HDP 1.2: New Ambari Features

• Job Diagnostics: Visualize and troubleshoot Hadoop job execution and performance

• Cluster History: View historical job execution & performance

• REST interface: Provides external access to Ambari for existing tools. Facilitates integration with Microsoft System Center and Teradata Viewpoint

• Instant Insight: View health of Core Hadoop (HDFS, MapReduce) and related projects

• Cluster Navigation: “Quick link” buttons jump into the NameNode web UI for a server
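
As a rough illustration of how an existing tool might use the REST interface mentioned above, the following Python sketch builds a request that lists clusters and extracts their names from the reply. It is a minimal sketch under stated assumptions, not material from the webinar: the host name, port 8080, the credentials, the `/api/v1/clusters` path and the `items` / `Clusters` / `cluster_name` response layout are all assumptions that should be checked against the Ambari API documentation for the installed version.

```python
import base64
import json
import urllib.request


def build_clusters_request(host, user, password, port=8080):
    """Build a GET request for the cluster list (path and port are assumptions)."""
    url = f"http://{host}:{port}/api/v1/clusters"
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def cluster_names(payload):
    """Pull cluster names out of a decoded JSON reply (assumed layout)."""
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]


def fetch_cluster_names(host, user, password):
    """Send the request and return the cluster names; needs a reachable server."""
    request = build_clusters_request(host, user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return cluster_names(json.load(response))
```

A monitoring tool could call `fetch_cluster_names("ambari.example.com", "admin", "secret")`, where the host and credentials are placeholders, and then poll further endpoints for each cluster it finds.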

Page 12: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Demo

Page 13: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP 1.2: Platform Service Improvements

Security: Extend platform services for security, a KEY requirement for enterprise adoption of Hadoop

– Enhanced security architecture & pluggable authentication model controls access to Hive tables and metastore

– Aligns and improves Hive & HCatalog authentication models

High Availability: Full stack HA on Hadoop 1.0

– Extended HA to Hive & HCatalog Metastore

Page 14: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

HDP 1.2: Data Services Improvements

Data Services Updates

– Upgraded Pig and Flume

– Added Mahout (0.7.0) to distribution

Hive, HCatalog & HBase: Continue to innovate & improve the data services with open source contributions to HCatalog, Hive and HBase

– Concurrency improvements for Hive and consistent security for Hive & HCatalog

– Performance and operational enhancements for HBase

– Improved Java developer productivity via certified Cascading framework

Data Services (Store, Process and Access Data): HCatalog, Hive, Pig, HBase, Sqoop, Flume, Mahout

Page 15: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Page 16: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Page 17: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

Apache Community Leadership

Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy

Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– About 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– About 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase

“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..”

- Jeff Kelly: Wikibon

Page 18: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

True Enterprise Class Open Source

• 100% Open Source. No Holdbacks.
– Only true implementation of OSS Apache Hadoop
– Preferred by the software vendors that you rely on

• Flexible Deployment
– No License Fee for usage

• Community Open Source Mitigates Lock-In
– Proprietary Open Source = Lock-In
– Open communities always trump “open source”

Page 19: Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2

THANK YOU!!

Download Hortonworks Sandbox: www.hortonworks.com/sandbox

Download Hortonworks Data Platform: www.hortonworks.com/download

Register for Enterprise Hadoop Series: www.hortonworks.com/webinars

Follow us! @hortonworks @jaymce @jsposetti