Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2
DESCRIPTION
Hortonworks continues to innovate across all Hadoop-related projects, packaging the most enterprise-ready components, such as Ambari, into the Hortonworks Data Platform (HDP). Please join us in this interactive webinar as we present real-world use cases of enterprise customers that are finding success with HDP and their Big Data initiatives. We will also introduce new features in version 1.2 of the Hortonworks Data Platform and explain how it has become the leading 100% open source distribution for the enterprise. In this webinar we will outline how enterprise customers are successful with HDP and review some of the newest features in version 1.2, including:
- How to provision a cluster
- How to manage and monitor a cluster using completely open source tools
- How to perform diagnostics to identify issues in a cluster
TRANSCRIPT
© Hortonworks Inc. 2013
Hadoop Operations & Enterprise Readiness
HDP 1.2
Jim Walker
Jeff Sposetti
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects Downstream Enterprise Product
Hortonworks Data Platform
Design & Develop
Distribute
Integrate & Test
Package & Certify
Apache HCatalog
Apache Pig
Apache HBase
Other Apache Projects
Apache Hive
Apache Ambari
Apache Hadoop
Test & Patch
Design & Develop
Release
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
A virtuous cycle: development and issue fixes are done upstream, and stable project releases flow downstream
Stable Project Releases
Fixed Issues
Hortonworks Data Platform 1.2
• Quarterly cadence
– HDP is aligned tightly with the open source community software releases, not a patchwork
– Regular open source innovation based on an open community
• Ecosystem validation
– Packaged and tested with our key development partner, Yahoo!, across hundreds of nodes
– Ambari is the preferred management tool for integration with Microsoft System Center and Teradata Viewpoint today
HDP 1.2 Summary
Hortonworks Data Platform 1.2
Hortonworks Data Platform outpaces the competition to extend leadership through 100% open source Enterprise Apache Hadoop
Focus areas:
1. Ambari: continued innovation with a complete, free and open cluster management tool
• Existing: Provision, Manage and Monitor your Hadoop infrastructure
• New: Root Cause Analysis with job diagnostics, usage heat maps
• Improved: Ecosystem integration and user interface
2. Enhanced security model and performance for Hive and HCatalog
3. Apache Mahout: now included in the HDP distribution
HDP Certifies Latest Stable Components
Apache Project   HDP 1.2   CDH3u5             CDH4.1.2
Hadoop           1.1.2     0.20.2 +923.418    2.0.0alpha +541
Pig              0.10.1    0.8.1 +51.39       0.10.0 +48
Hive             0.10.0    0.7.1 +42.56       0.9.0 +148
HCatalog         0.5.0     n/a                n/a
HBase            0.94.2    0.90.6 +84.73      0.92.1 +154
Sqoop            1.4.2     1.3.0 +5.88        1.4.1 +51
Oozie            3.2.0     3.2.0              3.2.0
Zookeeper        3.4.5     3.3.5 +19.5        3.4.3 +25
Ambari           1.2.0     n/a                n/a
Flume            1.3.0     0.9.4 +25.46       1.2.0 +119
Mahout           0.7.0     0.5 +9.7           0.7 +4
Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf
A Brief History of Apache Hadoop
Timeline: 2004, 2006, 2008, 2010, 2012, 2013
• Focus on INNOVATION. 2005: Yahoo! creates team under E14 to work on Hadoop
• Apache Project Established
• Focus on OPERATIONS. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
• Yahoo! begins to Operate at scale
• STABILITY. 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo
• Enterprise Hadoop
• Hortonworks Data Platform
HDP: Enterprise Hadoop Distribution
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Next-Generation Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media), MOBILE DATA, OLTP, POS SYSTEMS
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
HDP 1.2: Operational Services Improvements
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale (OOZIE, AMBARI)
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Snapshots, Security, etc…)
Apache Ambari 1.2
Hortonworks open source approach continues to accelerate enterprise adoption of Hadoop
– Open Source Approach: The only 100% open source Apache Hadoop cluster management tool
– Baseline Features: Delivers all necessary tools/functions to provision, manage and monitor an Apache Hadoop cluster
– Innovation: Provides ability to zoom into cluster usage and performance metrics for jobs and tasks to identify root cause of bottlenecks or operations issues
– Interoperable: Includes APIs for integrating with Microsoft System Center, Teradata Viewpoint, and other systems
Also Upgraded Oozie & Zookeeper
Apache Ambari Dashboard
HDP 1.2: New Ambari Features
• Job Diagnostics: Visualize and troubleshoot Hadoop job execution and performance
• Cluster History: View historical job execution & performance
• REST interface: Provides external access to Ambari for existing tools. Facilitates integration with Microsoft System Center and Teradata Viewpoint
• Instant Insight: View health of Core Hadoop (HDFS, MapReduce) and related projects
• Cluster Navigation: “Quick link” buttons jump into the NameNode web UI for a server
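As a rough illustration of what "external access to Ambari" through the REST interface can look like from an existing tool, here is a minimal Python sketch. It is not taken from the slides. It assumes the Ambari server listens on port 8080, exposes its resources under /api/v1 with HTTP Basic authentication, and returns cluster entries shaped like {"items": [{"Clusters": {"cluster_name": ...}}]}. The host name, credentials and helper names (build_ambari_request, cluster_names, fetch_cluster_names) are hypothetical.

```python
import base64
import json
import urllib.request


def build_ambari_request(host, path, user, password, port=8080):
    """Build (but do not send) a GET request for an Ambari REST resource.

    Assumption: the API root is http://<host>:<port>/api/v1 and the server
    accepts HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def cluster_names(payload):
    """Extract cluster names from a decoded /clusters response body.

    Assumption: each entry in "items" carries a "Clusters" object with a
    "cluster_name" field.
    """
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]


def fetch_cluster_names(host, user, password):
    """Send the request to a live Ambari server and return its cluster names."""
    request = build_ambari_request(host, "clusters", user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return cluster_names(json.load(response))
```

Calling fetch_cluster_names("ambari.example.com", "admin", "secret") against a reachable server would list the clusters it manages. A monitoring tool could poll other resources the same way by changing the path argument.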
Demo
HDP 1.2: Platform Service Improvements
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Snapshots, Security, etc…)
Security
Extend platform services for security, a KEY requirement for enterprise adoption of Hadoop
– Enhanced security architecture & pluggable authentication model controls access to Hive tables and metastore
– Aligns and improves Hive & HCatalog authentication models
High Availability
Full stack HA on Hadoop 1.0
– Extended HA to Hive & HCatalog Metastore
HDP 1.2: Data Services Improvements
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data (HCATALOG, HIVE, PIG, HBASE, SQOOP, FLUME, MAHOUT)
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Snapshots, Security, etc…)
Data Services Updates
– Upgraded Pig and Flume
– Added Mahout (0.7.0) to distribution
Hive, HCatalog & HBase
Continue to innovate & improve the data services with open source contributions to HCatalog, Hive and HBase
– Concurrency improvements for Hive and consistent security for Hive & HCatalog
– Performance and operational enhancements for HBase
– Improved Java developer productivity via certified Cascading framework
Apache Community Leadership
Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– About 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– About 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, Other Apache Projects
Test & Patch, Design & Develop, Release
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
- Jeff Kelly, Wikibon
True Enterprise Class Open Source
• 100% Open Source. No Holdbacks.
– Only true implementation of OSS Apache Hadoop
– Preferred by the software vendors that you rely on
• Flexible Deployment
– No License Fee for usage
• Community Open Source Mitigates Lock-In
– Proprietary Open Source = Lock-In
– Open communities always trump “open source”
THANK YOU!!
Download Hortonworks Sandbox: www.hortonworks.com/sandbox
Download Hortonworks Data Platform: www.hortonworks.com/download
Register for Enterprise Hadoop Series: www.hortonworks.com/webinars
Follow us! @hortonworks @jaymce @jsposetti