lefbriefing_hortonworks_041912
TRANSCRIPT
-
7/31/2019 LEFBriefing_Hortonworks_041912
1/27
Hortonworks Inc. 2012
Hortonworks
February 2012
Enabling Apache Hadoop to be
the next-generation enterprise data platform
-
Hortonworks Vision
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
How to achieve that vision? Enable an ecosystem around an enterprise-viable open source data platform.
-
What is Apache Hadoop?
Solution for big data
  Deals with complexities of high volume, velocity & variety of data
Set of open source projects
Transforms commodity hardware into a service that:
  Stores petabytes of data reliably
  Allows huge distributed computations
Key attributes:
  Redundant and reliable (no data loss)
  Extremely powerful
  Batch processing centric
  Easy to program distributed apps
  Runs on commodity hardware
One of the best examples of open source driving innovation and creating a market
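As a rough illustration of the "easy to program distributed apps" attribute, the sketch below mimics the MapReduce programming model in plain Python on a single machine. It is not Hadoop code: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, and the in-memory grouping only stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Single-process stand-in for a job: map, group by key, then reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(["big data", "big clusters"], map_words, reduce_counts))
```

The developer writes only the two small functions; in the real framework, distribution, grouping, and recovery from failures are handled by the platform.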
-
Market Trends We're Seeing
-
Trend: Hadoop as a Data Refinery
The old way
  Operational systems keep current records, short history
  Analytics systems keep only conformed / cleaned / digested data
  Unstructured data locked away in operational silos
  Archives offline
  Inflexible, new questions require system redesigns
The new trend
  Keep all copies of multi-structured data (raw & refined) in Hadoop
  Perform immediate transformations and data refining in Hadoop
  Move refined data downstream for data discovery and BI/analytics
  Agile outcome justifies new infrastructure
-
Agile Data Refinery w/Hadoop
Connecting All of Your Big Data
[Diagram labels]
Traditional Data Warehouses, BI & Analytics
  EDW, Data Marts, BI / Analytics
CRUD / Serving systems
  Web apps, ERP
Unstructured Systems
  Serving Logs, Social Media, Sensor Data, Text Systems
Hadoop
  Store, Transform, Refine, Archive all data, Custom Analytics
-
Trend: Data-driven Development
Limited runtime logic driven by huge lookup tables
  Data computed offline on Hadoop
Machine learning, other expensive computation offline
  Personalization, classification, fraud, value analysis
Application development requires data science
  Huge amounts of actually observed data key to modern services
  Hadoop used as the science platform
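The split between offline computation and limited runtime logic can be sketched as follows. This is a toy example under stated assumptions: the click log, the segment names, and the functions `build_lookup_table` and `recommend` are all hypothetical, and the offline step here runs in memory rather than as a Hadoop job.

```python
from collections import Counter, defaultdict
from typing import Iterable


def build_lookup_table(click_log: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Offline batch step: for each user segment, find the most-clicked link."""
    clicks: dict[str, Counter] = defaultdict(Counter)
    for segment, link in click_log:
        clicks[segment][link] += 1
    return {
        segment: counter.most_common(1)[0][0]
        for segment, counter in clicks.items()
    }


def recommend(table: dict[str, str], segment: str, default: str = "top-stories") -> str:
    """Runtime step: a single dictionary lookup, with a fallback for unknown segments."""
    return table.get(segment, default)


if __name__ == "__main__":
    # Hypothetical observed data: (user segment, clicked link) pairs.
    log = [("sports", "nba"), ("sports", "nba"), ("sports", "nfl"), ("finance", "markets")]
    table = build_lookup_table(log)
    print(recommend(table, "sports"))
    print(recommend(table, "travel"))
```

All the expensive work sits in `build_lookup_table`, which can be rerun on fresh data as often as needed; the serving path stays trivial.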
-
CASE STUDY: YAHOO! HOMEPAGE
Copyright Yahoo 2011
Personalized for each visitor
  Recommended links, News Interests, Top Searches
Result: twice the engagement
  +160% clicks vs. one size fits all
  +79% clicks vs. randomly selected
  +43% clicks vs. editor selected
-
Every Market Has Big Data
Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially
-
Trend: Specialization of Data Systems
Hadoop adds new capabilities to the enterprise, especially in scale-out situations
  Does not replace existing systems
Specialization of traditional data components
  Use transactional systems for transactions
  Use analytics systems for interactive analysis
Hadoop has LOTS of bandwidth for storage and CPU
  Pull data out of transactional systems for storage and staging
  Pull ELT out of analytics systems
-
Hadoop and Transactional Systems
Online Transaction Processing
  Mission critical
  Manages transactions & serves reports
Hadoop used to Process Reports
  Free up 50+% processing power for transaction processing system
  Significant cost savings due to commodity nature of Hadoop
[Diagram labels: Web Sites, Transaction Processing Systems ($$$), Transaction Logs, Reports]
-
Hadoop and Analytics Systems
Mobile, Social, Other logs, Web
Hadoop (The Agile Data Zone)
  Fast loading, raw data staging, ELT & long-term archival
EDW (Leverages huge ecosystem of tooling)
  High-value strategic and operational intelligence
Online Archival
  Ex. Historical Black Friday data
-
Hadoop as data refinery implies
Hadoop must be an open platform; Open Data APIs
ETL Integration / Data Ingest
  Hadoop should work well with industry standard tools
Offsite Backups / DR
  HDFS Snapshots, Cloud Backup, other tools
Object / Event-level Storage APIs
Non-Relational Data
  HCatalog (for all 3 above)
Efficient / Low Cost Storage
  Compression, RAID / Reed-Solomon
No Storage Limits
  No file limits, scale beyond 10,000 computers / cluster
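To see why Reed-Solomon coding counts as efficient, low-cost storage, the arithmetic below compares raw capacity under plain replication with an erasure-coded layout. Three-way replication is the HDFS default; the (10 data, 4 parity) Reed-Solomon layout and the 1,000 TB data set are assumptions chosen for illustration and do not come from the slides.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when every block is kept `copies` times."""
    return float(copies)


def reed_solomon_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte for a Reed-Solomon (data, parity) stripe."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_capacity_needed(logical_tb: float, overhead: float) -> float:
    """Raw capacity in TB needed to hold `logical_tb` of user data."""
    return logical_tb * overhead


if __name__ == "__main__":
    logical_tb = 1000.0  # hypothetical data set size
    replicated = raw_capacity_needed(logical_tb, replication_overhead(3))
    # Assumed layout: 10 data blocks + 4 parity blocks per stripe.
    encoded = raw_capacity_needed(logical_tb, reed_solomon_overhead(10, 4))
    print(f"3x replication: {replicated:.0f} TB raw")
    print(f"RS(10, 4):      {encoded:.0f} TB raw")
```

Under these assumptions the erasure-coded layout needs less than half the raw capacity of three-way replication, at the cost of extra computation when reconstructing lost blocks.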
-
Open Platform Enables Ecosystem
-
Enabling a Broad Ecosystem
-
Open Platform Enables Ecosystem
ETL (basic & advanced)
SQL, NewSQL, NoSQL, xDBC
Integration (msg bus, )
Search, Index
Tools, Languages
Algorithms, Data Science
Analytics, EDW
BI, Reporting, Visualization
Operations
Hortonworks Data Platform
Operational APIs
-
Example: Teradata & Hortonworks
Online Customer Behavior Example
Mobile, Social, Other logs, Web
Hadoop
  Fast loading, raw data staging, ELT & long-term archival
Teradata Aster
  Frequent, iterative analysis (e.g. user behavior/response to promotions, pattern det.)
Teradata
  High-concurrency strategic & operational intelligence
Online Archival
  Ex. Historical Black Friday data
-
Example: Talend & Hortonworks
Industry's first open source big data integration software
  Feature-rich Job Designer
  Rich palette of pre-built templates
  Supports HDFS, Pig, Hive, HBase, Sqoop
  Apache-licensed, bundled with HDP
Key benefits
  Graphical development
  Robust and scalable execution
  Broadest connectivity to support all systems: 450+ components
  Real-time debugging
-
Example: Microsoft & Hortonworks
Hadoop on Windows Server / Azure
  Target most used Hadoop components
  Patches flow into Apache open source Hadoop 1.0, 0.23, and Trunk
JavaScript Framework
  Interactive JavaScript console for fast iterative development
  Fluent data query API that translates JavaScript queries to server-side Pig Latin and HiveQL
  Robust data visualization & charting
Enhanced Hive ODBC Driver
  Move data from Hive into Microsoft Excel, PowerPivot, Power View, etc.
  Analyze Hadoop data and build corporate BI solutions
[Diagram labels: Business Analysts; JavaScript Developers; Windows Server Admins; Patches to open source components; Open source client and server-side frameworks; Open source Hive ODBC Driver]
-
Hortonworks Data Platform
-
Balancing Innovation & Stability
Apache: Be aggressive - ship early and often
  Projects need to keep innovating and visibly improve
  Aim for big improvements
  Make early buggy releases
Hortonworks: Be predictable - ship when stable
  We need to ship stable, working releases
  Make packaged binary releases available
  We need to do regular sustaining engineering releases
  HDP quarterly release trains sweep in stable Apache projects
Enables HDP to stay reasonably current and predictable while minimizing risk of thrashing that coordinating large # of Apache projects can cause
-
Hortonworks Data Platform (HDP)
Fully Integrated, Extensively Tested, Enterprise Supported
Challenge:
  Integrate, manage, and support changes across a wide range of open source projects that power the Hadoop platform; each with their own release schedules, versions, & dependencies.
  Time intensive, Complex, Expensive
Solution: Hortonworks Data Platform
  Integrated certified platform distributions
  Extensive Q/A process: many apps across small, medium, & large clusters
  Industry-leading Support with clear service levels for updates and patches
  Continuity via multi-year Support and Maintenance Policy
  Technical guidance support for Universe and Multiverse components
[Diagram labels: HCatalog, Zookeeper, Hive, Pig, Hadoop Core, HBase; legend marks new versions]
-
Support & Distribution Model
Model and terminology conceptually similar to Ubuntu's model: http://www.ubuntu.com/project/about-ubuntu/licensing
Hortonworks Data Platform
  Fully supported, integrated, tested, maintained
  100% Apache license, or compatible: BSD, MIT/X11, NCSA, W3C Software license, X.Net
HDP Universe: Open Source Ecosystem
  Validated & interoperable with HDP
  Technical guidance support; work with OSS projects
  100% OSI-compliant licenses
  Optionally installed
HDP Multiverse: Commercial Ecosystem
  Validated & interoperable with HDP
  Technical guidance support; work with TSANet
  3rd-party vendor licenses and support options
  Optionally installed
-
Hortonworks Data Platform (HDP)
Key Components of Standard Hadoop Open Source Stack
[Diagram regions: Hortonworks Data Platform, HDP Universe]
HDFS (Hadoop Distributed File System)
MapReduce (Distributed Programming Framework)
Hive (SQL)
Pig (Data Flow)
HCatalog (Table & Schema Management)
HBase (Columnar NoSQL Store)
Zookeeper (Cluster Coordination)
Oozie (Workflow scheduling)
Sqoop & Other Ingest, ETL tools
Mahout & Other libraries
Ambari & Other Monitoring & Management
-
Hadoop Now, Next, and Beyond
Hadoop.Now (Hadoop 1.0), HDP1
  Most stable Hadoop ever
  HBase, security, WebHDFS
  HCatalog data APIs
Hadoop.Next (Hadoop 2.0), HDP2
  HA, Next-gen MapReduce
  Extension & Integration APIs
  Extended HCatalog data APIs
Hadoop.Beyond
  Future investments
Apache community, including Hortonworks, investing to improve Hadoop:
  Make Hadoop an Open, Extensible, and Enterprise Viable Platform
  Enable More Applications to Run on Apache Hadoop
-
Hortonworks Data Platform Timeline
[Timeline chart by quarter, Q1 to Q4]
Hortonworks Data Platform 1: 1.0 preview, 1.0, 1.1 preview, 1.1, 1.2, 1.3
Hortonworks Data Platform 2: 2.0 preview, 2.0, 2.1
36 Month support policy, from GA date
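The support window implied by the 36-month policy is simple date arithmetic. The sketch below is a minimal illustration: the helper names `add_months` and `support_end` are hypothetical, and the GA date in the usage example is invented because the slides give no actual release dates.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month (e.g. Feb 29 -> Feb 28).
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def support_end(ga_date: date, policy_months: int = 36) -> date:
    """End of the support window, counted from the GA date."""
    return add_months(ga_date, policy_months)


if __name__ == "__main__":
    hypothetical_ga = date(2012, 6, 15)  # invented GA date, for illustration only
    print(support_end(hypothetical_ga))
```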