lefbriefing_hortonworks_041912

Upload: tstacct543

Post on 04-Apr-2018

TRANSCRIPT

  • 7/31/2019 LEFBriefing_Hortonworks_041912

    1/27

    Hortonworks Inc. 2012

    Hortonworks

    February 2012

    Enabling Apache Hadoop to be the next-generation enterprise data platform

  • 2/27

    Hortonworks Vision

    We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.

    How to achieve that vision? Enable an ecosystem around an enterprise-viable open source data platform.

  • 3/27

    What is Apache Hadoop?

    Solution for big data
    • Deals with complexities of high volume, velocity & variety of data

    Set of open source projects

    Transforms commodity hardware into a service that:
    • Stores petabytes of data reliably
    • Allows huge distributed computations

    Key attributes:
    • Redundant and reliable (no data loss)
    • Extremely powerful
    • Batch processing centric
    • Easy to program distributed apps
    • Runs on commodity hardware

    One of the best examples of open source driving innovation and creating a market
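
The "easy to program distributed apps" and "batch processing centric" attributes both come from the MapReduce programming model: a job is written as a map function and a reduce function, and the framework takes care of distribution. The following is a minimal single-process sketch in plain Python, not Hadoop's API. The names `map_words`, `reduce_counts`, and `run_job` are hypothetical, and a real job would run the same two functions across many machines.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str], mapper: Callable, reducer: Callable) -> dict[str, int]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: three log lines standing in for a large data set.
    logs = ["error disk full", "warn disk slow", "error network down"]
    print(run_job(logs, map_words, reduce_counts))
```

The application author writes only the mapper and the reducer; grouping, scheduling, and recovery from machine failures are the framework's job.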

  • 4/27

    Market Trends We're Seeing

  • 5/27

    Trend: Hadoop as a Data Refinery

    The old way
    • Operational systems keep current records, short history
    • Analytics systems keep only conformed / cleaned / digested data
    • Unstructured data locked away in operational silos
    • Archives offline
    • Inflexible, new questions require system redesigns

    The new trend
    • Keep all copies of multi-structured data (raw & refined) in Hadoop
    • Perform immediate transformations and data refining in Hadoop
    • Move refined data downstream for data discovery and BI/analytics
    • Agile outcome justifies new infrastructure
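
The refinery pattern (keep the raw copy, derive a refined copy, hand the refined copy downstream) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hortonworks code: the event schema (`user`, `amount`) and the names `refine` and `run_refinery` are hypothetical, and in-memory lists stand in for storage in Hadoop.

```python
import json


def refine(raw_line: str):
    """Parse one raw JSON event; return a cleaned record, or None if unusable."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "user" not in event or "amount" not in event:
        return None
    try:
        amount = round(float(event["amount"]), 2)
    except (TypeError, ValueError):
        return None
    return {"user": str(event["user"]).strip().lower(), "amount": amount}


def run_refinery(raw_lines):
    """Keep every raw line as-is and derive a refined copy for downstream use."""
    raw_store = list(raw_lines)  # raw copy is retained, never overwritten
    refined_store = [rec for rec in map(refine, raw_store) if rec is not None]
    return raw_store, refined_store


if __name__ == "__main__":
    raw, refined = run_refinery(['{"user": " Alice ", "amount": "12.5"}', "not json"])
    print(len(raw), refined)
```

Because the raw lines are kept, a new question that needs a different refinement can be answered by rerunning a changed `refine` over the same raw store rather than redesigning the system.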

  • 6/27

    Agile Data Refinery w/Hadoop
    Connecting All of Your Big Data

    [Diagram labels]
    • Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
    • CRUD / Serving systems: Web apps, ERP
    • Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems
    • Store, Transform, Refine, Archive all data, Custom Analytics

  • 7/27

    Trend: Data-driven Development

    Limited runtime logic driven by huge lookup tables
    • Data computed offline on Hadoop

    Machine learning, other expensive computation offline
    • Personalization, classification, fraud, value analysis

    Application development requires data science
    • Huge amounts of actually observed data key to modern services
    • Hadoop used as the science platform
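
The split between expensive offline computation and thin runtime logic can be sketched as follows. This is a minimal Python illustration, assuming a click log of `(segment, item)` pairs; the function names, segment names, and the `top-stories` fallback are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict


def build_lookup_table(click_log, top_n=2):
    """Offline step: for each user segment, precompute the most-clicked items."""
    clicks_by_segment = defaultdict(Counter)
    for segment, item in click_log:
        clicks_by_segment[segment][item] += 1
    return {
        segment: [item for item, _ in counts.most_common(top_n)]
        for segment, counts in clicks_by_segment.items()
    }


def recommend(lookup_table, segment, default=("top-stories",)):
    """Runtime step: a single dictionary lookup, no computation at request time."""
    return lookup_table.get(segment, list(default))


if __name__ == "__main__":
    log = [("sports", "nba"), ("sports", "nba"), ("sports", "nfl"), ("tech", "hadoop")]
    table = build_lookup_table(log, top_n=1)  # would run as a batch job
    print(recommend(table, "sports"))         # would run inside the web application
```

All of the data science lives in `build_lookup_table`, which can be as slow and as elaborate as needed; the serving path only reads the finished table.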

  • 8/27

    CASE STUDY: YAHOO! HOMEPAGE

    Copyright Yahoo 2011

    Personalized for each visitor

    Result: twice the engagement
    • +160% clicks vs. one size fits all
    • +79% clicks vs. randomly selected
    • +43% clicks vs. editor selected

    Recommended links, News Interests, Top Searches

  • 9/27

    Every Market Has Big Data

    Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.

    Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially

  • 10/27

    Trend: Specialization of Data Systems

    Hadoop adds new capabilities to the enterprise, especially in scale-out situations
    • Does not replace existing systems

    Specialization of traditional data components
    • Use Transactional systems for transactions
    • Use Analytics systems for interactive analysis

    Hadoop has LOTS of bandwidth for storage and CPU
    • Pull data out of Transactional systems for storage and staging
    • Pull ELT out of Analytics systems

    Confidential

  • 11/27

    Hadoop and Transactional Systems

    Online Transaction Processing
    • Mission critical
    • Manages transactions & serves reports

    Hadoop used to Process Reports
    • Free up 50+% processing power for transaction processing system
    • Significant cost savings due to commodity nature of Hadoop

    [Diagram labels: Web Site (x3), Transaction Processing Systems ($$$), Transaction Logs, Reports]

  • 12/27

    Hadoop and Analytics Systems

    Data sources: Mobile, Social, Web, Other logs

    Hadoop (The Agile Data Zone)
    • Fast loading, raw data staging, ELT & long-term archival

    EDW (Leverages huge ecosystem of tooling)
    • High-value strategic and operational intelligence

    Online Archival
    • Ex. Historical Black Friday data

  • 13/27

    Hadoop as data refinery implies

    Hadoop must be an open platform; Open Data APIs

    ETL Integration / Data Ingest
    • Hadoop should work well with industry standard tools

    Offsite Backups / DR
    • HDFS Snapshots, Cloud Backup, other tools

    Object / Event-level Storage APIs
    Non-Relational Data
    • HCatalog (for all 3 above)

    Efficient / Low Cost Storage
    • Compression, RAID / Reed-Solomon

    No Storage Limits
    • No file limits, scale beyond 10,000 computers / cluster

  • 14/27

    Open Platform Enables Ecosystem

  • 15/27

    Enabling a Broad Ecosystem

  • 16/27

    Open Platform Enables Ecosystem

    ETL (basic & advanced)

    SQL, NewSQL, NoSQL, xDBC

    Integration (msg bus, )

    Search, Index

    Tools, Languages

    Algorithms, Data Science

    Analytics, EDW

    BI, Reporting, Visualization

    Operations

    Hortonworks Data Platform

    Operational APIs

  • 17/27

    Example: Teradata & Hortonworks
    Online Customer Behavior Example

    Data sources: Mobile, Social, Web, Other Logs

    Hadoop
    • Fast loading, raw data staging, ELT & long-term archival

    Teradata Aster
    • Frequent, iterative analysis (e.g. user behavior/response to promotions, pattern det.)

    Teradata
    • High-concurrency strategic & operational intelligence

    Online Archival
    • Ex. Historical Black Friday data

  • 18/27

    Example: Talend & Hortonworks

    Industry's first open source big data integration software
    • Feature-rich Job Designer
    • Rich palette of pre-built templates
    • Supports HDFS, Pig, Hive, HBase, Sqoop
    • Apache-licensed, bundled with HDP

    Key benefits
    • Graphical development
    • Robust and scalable execution
    • Broadest connectivity to support all systems: 450+ components
    • Real-time debugging

  • 19/27

    Example: Microsoft & Hortonworks

    Hadoop on Windows Server / Azure
    • Target most used Hadoop components
    • Patches flow into Apache open source
    • Hadoop 1.0, 0.23, and Trunk

    JavaScript Framework
    • Interactive JavaScript console for fast iterative development
    • Fluent data query API that translates JavaScript queries to server-side Pig Latin and HiveQL
    • Robust data visualization & charting

    Enhanced Hive ODBC Driver
    • Move data from Hive into Microsoft Excel, PowerPivot, Power View, etc.
    • Analyze Hadoop data and build corporate BI solutions

    [Diagram labels: Business Analysts; JavaScript Developers; Windows Server Admins; Patches to open source components; Open source client and server-side frameworks; Open source Hive ODBC Driver]

  • 20/27

    Hortonworks Data Platform

  • 21/27

    Balancing Innovation & Stability

    Apache: Be aggressive - ship early and often
    • Projects need to keep innovating and visibly improve
    • Aim for big improvements
    • Make early buggy releases

    Hortonworks: Be predictable - ship when stable
    • We need to ship stable, working releases
    • Make packaged binary releases available
    • We need to do regular sustaining engineering releases
    • HDP quarterly release trains sweep in stable Apache projects

    Enables HDP to stay reasonably current and predictable while minimizing the risk of thrashing that coordinating a large number of Apache projects can cause

  • 22/27

    Hortonworks Data Platform (HDP)
    Fully Integrated, Extensively Tested, Enterprise Supported

    Challenge:
    • Integrate, manage, and support changes across a wide range of open source projects that power the Hadoop platform, each with their own release schedules, versions, & dependencies
    • Time intensive, Complex, Expensive

    Solution: Hortonworks Data Platform
    • Integrated certified platform distributions
    • Extensive Q/A process: many apps across small, medium, & large clusters
    • Industry-leading Support with clear service levels for updates and patches
    • Continuity via multi-year Support and Maintenance Policy
    • Technical guidance support for Universe and Multiverse components

    [Diagram labels: Hadoop Core, Pig, Hive, HBase, Zookeeper, HCatalog; legend: = New Version]

  • 23/27

    Support & Distribution Model

    Model and terminology conceptually similar to Ubuntu's model: http://www.ubuntu.com/project/about-ubuntu/licensing

    Hortonworks Data Platform
    • Fully supported, integrated, tested, maintained
    • 100% Apache license, or compatible: BSD, MIT/X11, NCSA, W3C Software license, X.Net

    HDP Universe: Open Source Ecosystem
    • Validated & interoperable with HDP
    • Technical guidance support; work with OSS projects
    • 100% OSI-compliant licenses
    • Optionally installed

    HDP Multiverse: Commercial Ecosystem
    • Validated & interoperable with HDP
    • Technical guidance support; work with TSANet
    • 3rd-party vendor licenses and support options
    • Optionally installed

  • 24/27

    Hortonworks Data Platform (HDP)
    Key Components of Standard Hadoop Open Source Stack

    • HDFS (Hadoop Distributed File System)
    • MapReduce (Distributed Programming Framework)
    • Hive (SQL)
    • Pig (Data Flow)
    • HCatalog (Table & Schema Management)
    • HBase (Columnar NoSQL Store)
    • Zookeeper (Cluster Coordination)
    • Oozie (Workflow scheduling)
    • Sqoop & other ingest, ETL tools
    • Mahout & other libraries
    • Ambari & other monitoring & management

    [Diagram legend: Hortonworks Data Platform, HDP Universe]

  • 25/27

    Hadoop Now, Next, and Beyond

    Hadoop.Now (Hadoop 1.0): HDP1
    • Most stable Hadoop ever
    • HBase, security, WebHDFS
    • HCatalog data APIs

    Hadoop.Next (Hadoop 2.0): HDP2
    • HA, Next-gen MapReduce
    • Extension & Integration APIs
    • Extended HCatalog data APIs

    Hadoop.Beyond
    • Future investments

    Apache community, including Hortonworks, investing to improve Hadoop:
    • Make Hadoop an Open, Extensible, and Enterprise Viable Platform
    • Enable More Applications to Run on Apache Hadoop

  • 26/27

    Hortonworks Data Platform Timeline

    [Timeline chart by quarter, Q1 to Q4; release labels as shown]
    • Hortonworks Data Platform 1: 1.0 preview (label appears three times), 1.0, 1.1 preview, 1.1, 1.2, 1.3
    • Hortonworks Data Platform 2: 2.0 preview, 2.0, 2.1

    36-month support policy, from GA date

  • 27/27