Mrinal Devadas, Hortonworks: Making Sense of Big Data


Upload: patrickcrompton

Post on 10-May-2015


Page 1: Mrinal devadas, Hortonworks Making Sense Of Big Data

© Hortonworks Inc. 2013

Hortonworks: Community Driven Enterprise Apache Hadoop

Mrinal Devadas

Systems Architect

[email protected]

Page 1

Page 2

Hortonworks

• Who is Hortonworks
• Our Approach
• Patterns of Use

Page 3

A Brief History of Apache Hadoop

2005: Focus on INNOVATION. Yahoo! creates a team under E14 to work on Hadoop.

2008: Focus on OPERATIONS. The Yahoo! team extends its focus to operations to support multiple projects & growing clusters.

2011: Focus on STABILITY. Hortonworks is created to focus on “Enterprise Hadoop”, starting with 24 key Hadoop engineers from Yahoo!.

[Timeline, 2004 to 2013, also marking: Apache project established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform]

Page 4

Hortonworks Snapshot


• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform

• We engineer, test & certify HDP for enterprise usage

• We employ the core architects, builders and operators of Apache Hadoop

• We drive innovation within Apache Software Foundation projects

• We are uniquely positioned to deliver the highest quality of Hadoop support

• We enable the ecosystem to work better with Hadoop

Develop Distribute Support

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

Endorsed by Strategic Partners

Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo

Page 5

Hortonworks

• Who is Hortonworks
• Our approach
  – Leading Open Source Hadoop innovation
  – Addressing “Enterprise Hadoop” Requirements
  – Enabling Interoperability of the Ecosystem
  – Ensuring No Lock-In: 100% Open Source
• Patterns of Use

Page 6

Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy

Key Roles held by Hortonworkers
• PMC Members
  – Managing community projects
  – Mentoring new incubator projects
  – Over 20 Hortonworkers managing community
• Committers
  – Authoring, reviewing & editing code
  – Over 50 Hortonworkers across projects
• Release Managers
  – Testing & releasing projects
  – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase

[Diagram: Design & Develop → Test & Patch → Release cycle across Apache Hadoop, Apache Pig, Apache Hive, Apache HCatalog, Apache HBase, Apache Ambari and other Apache projects]

“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”

- Jeff Kelly: Wikibon

Apache Community Leadership

Page 7

Leadership that Starts at the Core

• Driving next generation Hadoop
  – YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
  – More than twice the nearest contributor
• Deeply integrating w/ ecosystem
  – Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
  – Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
  – 100% of code contributed to Apache

Page 8

Driving Enterprise Hadoop Innovation

[Chart: Hortonworks vs. Cloudera committers, and lines of code by company (Hortonworks, Yahoo!, Cloudera, Other) as a share from 0% to 100%, for Ambari, HBase, Hive/HCatalog, Pig and Hadoop Core. Source: Apache Software Foundation]

Page 9

Hortonworks Process for Enterprise Hadoop

Upstream Community Projects → Downstream Enterprise Product

[Diagram: upstream, Apache Hadoop, Pig, Hive, HCatalog, HBase, Ambari and other Apache projects cycle through Design & Develop → Test & Patch → Release; downstream, the Hortonworks Data Platform cycles through Design & Develop → Integrate & Test → Package & Certify → Distribute. The two are linked by “Stable Project Releases” and “Fixed Issues”.]

Virtuous cycle: when development & issue fixes are done upstream, stable project releases flow downstream.

No Lock-in: an integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects.

“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon

Page 10

Hortonworks

• Who is Hortonworks
• Our approach
  – Leading Open Source Hadoop Innovation
  – Addressing “Enterprise Hadoop” Requirements
  – Enabling Interoperability of the Ecosystem
  – Ensuring NO LOCK-IN: 100% Open Source
• Patterns of use

Page 11

Enhancing the Core of Apache Hadoop

Deliver high-scale storage & processing with enterprise-ready platform services

Unique Focus Areas:
• Bigger, faster, more flexible
  – Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale
  – Run ~1300 system tests on large Yahoo clusters for every release
• Enterprise-ready services
  – High availability, disaster recovery, snapshots, security, …

Hortonworkers are the architects, operators, and builders of core Hadoop

[Diagram: HADOOP CORE (Distributed Storage & Processing) on PLATFORM SERVICES (Enterprise Readiness)]

Page 12

Data Services for Full Data Lifecycle

Provide data services to store, process & access data in many ways

Unique Focus Areas:
• Apache HCatalog
  – Metadata services for consistent table access to Hadoop data
• Apache Hive
  – Explore & process Hadoop data via SQL & ODBC-compliant BI tools

Hortonworks enables Hadoop data to be accessed via existing tools & systems

[Diagram: DATA SERVICES (Store, Process and Access Data) on HADOOP CORE (Distributed Storage & Processing) on PLATFORM SERVICES (Enterprise Readiness)]

Page 13

Operational Services for Ease of Use

Include complete operational services for productive operations & management

Unique Focus Area:
• Apache Ambari
  – Provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues

Only Hortonworks provides a complete open source Hadoop management tool

[Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale) and DATA SERVICES (Store, Process and Access Data) on HADOOP CORE (Distributed Storage & Processing) on PLATFORM SERVICES (Enterprise Readiness)]
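Ambari's REST API is the integration point for existing operational tools. The Python sketch below only builds requests with the standard library and does not send them. The server address, port 8080 (Ambari's usual default), the cluster name "mycluster" and the HDFS service are assumptions for illustration, and a real call would also need Ambari credentials (e.g. HTTP basic auth). It follows the Ambari v1 API as commonly documented (`/api/v1/clusters/...`, and stopping a service by setting its state to `INSTALLED`), so check it against the Ambari version you run.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed Ambari server address (hypothetical host; 8080 is the usual default port).
AMBARI_BASE = "http://ambari.example.com:8080/api/v1"


def list_services_request(cluster: str) -> Request:
    """Build (but do not send) a GET request listing a cluster's services."""
    url = f"{AMBARI_BASE}/clusters/{quote(cluster)}/services"
    return Request(url, method="GET")


def stop_service_request(cluster: str, service: str) -> Request:
    """Build (but do not send) a PUT request asking Ambari to stop a service.

    In Ambari's model a stopped service is in the INSTALLED state.
    """
    body = json.dumps({
        "RequestInfo": {"context": f"Stop {service} via REST"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }).encode("utf-8")
    url = f"{AMBARI_BASE}/clusters/{quote(cluster)}/services/{quote(service)}"
    req = Request(url, data=body, method="PUT")
    # Ambari expects this header on modifying requests (CSRF protection).
    req.add_header("X-Requested-By", "ambari")
    return req
```

To actually send one of these, add an `Authorization` header and pass it to `urllib.request.urlopen`; a collection GET like the services listing typically returns JSON with the individual services under `items`.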

Page 14

Deployable Across a Range of Options

Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances

[Diagram: HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), deployable on OS, Cloud, VM and Appliance]

Page 15

HDP: Enterprise Hadoop Distribution

[Diagram: HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), deployable on OS, Cloud, VM and Appliance]

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

Page 16

Hortonworks

• Who is Hortonworks
• Our approach
  – Leading Open Source Hadoop Innovation
  – Addressing “Enterprise Hadoop” Requirements
  – Enabling Interoperability of the Ecosystem
  – Ensuring No Lock-in: 100% Open Source
• Patterns of use

Page 17

Existing Data Architecture

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; OLTP, POS systems) · DATA SYSTEMS (traditional repos: RDBMS, EDW, MPP) · APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); alongside: DEV & DATA TOOLS (Build & Test) and OPERATIONAL TOOLS (Manage & Monitor)]

Page 18

Next-Generation Data Architecture

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; OLTP, POS systems; New Sources: web logs, email, sensors, social media) · DATA SYSTEMS (ENTERPRISE HADOOP PLATFORM alongside traditional repos: RDBMS, EDW, MPP) · APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); alongside: DEV & DATA TOOLS (Build & Test) and OPERATIONAL TOOLS (Manage & Monitor)]

Page 19

Interoperating With Your Tools

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; New Sources: web logs, email, sensors, social media) · DATA SYSTEMS (HORTONWORKS DATA PLATFORM alongside traditional repos) · APPLICATIONS (Microsoft Applications); alongside: DEV & DATA TOOLS and OPERATIONAL TOOLS, with Viewpoint among the tools shown]

Page 20

Hortonworks

• Who is Hortonworks
• Our approach
  – Leading Open Source Hadoop Innovation
  – Addressing “Enterprise Hadoop” Requirements
  – Enabling Interoperability of the Ecosystem
  – Ensuring No Lock-In: 100% Open Source
• Patterns of use

Page 21

True Enterprise Class Open Source

• Community-driven Approach Mitigates Lock-In
  – Identify & introduce enterprise requirements into public domain
  – Work with community to advance & incubate open source projects
  – Apply Enterprise Rigor for the most stable and reliable distribution
• 100% Open Source. No Holdbacks.
  – Only true implementation of OSS Apache Hadoop
  – Preferred by the software vendors that you rely on
  – Proprietary Open Source = Lock-In
  – Open communities always trump “open source”
• Flexible Deployment
  – No License Fee for usage

Page 22

Hortonworks

• Who is Hortonworks
• Our approach
• Patterns of use

Page 23

Hadoop Common Patterns of Use

Big Data: Transactions, Interactions, Observations

[Diagram: Business Cases on the HORTONWORKS DATA PLATFORM, with three patterns: Refine (Batch), Explore (Interactive), Enrich (Online); “Right-time” Access to Data]

Page 24

Operational Data Refinery

Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin

1. Capture
2. Process
3. Distribute & Retain

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; New Sources: web logs, email, sensor data, social media) → HORTONWORKS DATA PLATFORM → DATA SYSTEMS (traditional repos: RDBMS, EDW, MPP) → APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications). Pattern highlighted: Refine.]
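The three refinery steps can be pictured with a toy, single-machine Python sketch. It is not how HDP implements them (there, capture and processing would run on HDFS with tools such as Pig or Hive); the log format, the `PageView` fields and the per-page count are hypothetical, chosen only to show the flow: raw data is retained exactly as captured, refined into structured records, and only a compact result is distributed to the traditional repositories.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical raw web-log lines in the form "<timestamp> <user_id> <path>".
RAW_LOGS = [
    "2013-03-01T10:00:00 u1 /home",
    "2013-03-01T10:00:05 u2 /products/42",
    "garbage line",
    "2013-03-01T10:01:00 u1 /products/42",
]


@dataclass(frozen=True)
class PageView:
    timestamp: str
    user_id: str
    path: str


def capture(source: Iterable[str]) -> List[str]:
    """Step 1: land raw records unchanged, so they can be retained and reprocessed."""
    return list(source)


def parse(line: str) -> Optional[PageView]:
    """Turn one raw line into a PageView, or None if it is malformed."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].startswith("/"):
        return None
    return PageView(*parts)


def process(raw: List[str]) -> List[PageView]:
    """Step 2: refine raw lines into structured records, dropping malformed ones."""
    return [pv for pv in map(parse, raw) if pv is not None]


def distribute(views: List[PageView]) -> List[Tuple[str, int]]:
    """Step 3: reduce to a small per-page summary, most viewed first, for an EDW."""
    return Counter(v.path for v in views).most_common()


if __name__ == "__main__":
    raw = capture(RAW_LOGS)  # retained as-is for later reprocessing
    print(distribute(process(raw)))  # prints [('/products/42', 2), ('/home', 1)]
```

Keeping the captured lines untouched is what makes the reservoir useful: if the parsing rules change, `process` can simply be rerun over the retained raw data.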

Page 25

Big Data Exploration & Visualization

Leverage “data lake” to perform iterative investigation for value

1. Capture
2. Process
3. Explore & Visualize

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; New Sources: web logs, email, sensor data, social media) → HORTONWORKS DATA PLATFORM → DATA SYSTEMS (traditional repos: RDBMS, EDW, MPP) → APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications). Pattern highlighted: Explore.]

Page 26

Application Enrichment

Create intelligent applications
Collect data, create analytical models and deliver to online apps

1. Capture
2. Process & Compute
3. Deliver Model

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; New Sources: web logs, email, sensor data, social media) → HORTONWORKS DATA PLATFORM → DATA SYSTEMS (NoSQL; traditional repos: RDBMS, EDW, MPP) → APPLICATIONS (Custom Applications, Enterprise Applications). Pattern highlighted: Enrich.]
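The capture, compute and deliver steps of application enrichment can be sketched in plain Python. The session data and the "most often seen together" co-occurrence model are hypothetical stand-ins for whatever analytical model a real application would compute on Hadoop, and the dictionary stands in for the NoSQL store that serves the model to online apps.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Optional

# Hypothetical captured data: the items each user interacted with.
SESSIONS: Dict[str, List[str]] = {
    "u1": ["camera", "tripod", "sd-card"],
    "u2": ["camera", "sd-card"],
    "u3": ["tripod", "lamp"],
}


def build_model(sessions: Dict[str, List[str]]) -> Dict[str, str]:
    """Process & compute (batch): map each item to the item most often seen with it."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    for items in sessions.values():
        # Count every ordered pair of distinct items within one session.
        for a, b in permutations(set(items), 2):
            co_counts[a][b] += 1
    # Highest count wins; ties are broken alphabetically so the result is stable.
    return {
        item: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for item, counts in co_counts.items()
    }


def recommend(model: Dict[str, str], item: str) -> Optional[str]:
    """Deliver (online): a single key lookup, as an app would do against a NoSQL store."""
    return model.get(item)


if __name__ == "__main__":
    model = build_model(SESSIONS)
    print(recommend(model, "camera"))  # prints sd-card
```

The split mirrors the slide: the expensive counting runs as a batch job over all captured data, while the online application only performs a key lookup against the precomputed model.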

Page 27

Flexible Support Subscription Programs

Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage

• Developer Support: “How to” guidance for developers and architects
  – 12 x 5, web only
  – All severities: 1 business day
  – Application design advice; code review
  – 1 seat

• Essential Support*: operations support for small research clusters
  – 12 x 5, web only
  – All severities: 1 business day
  – Cluster design, install, maintain, performance
  – 3 contacts
  – Patches & updates

• Standard Support: operations support for dev & test clusters
  – 12 x 5, web only
  – All severities: 1 business day
  – Cluster design, install, maintain, performance
  – 3 contacts
  – Patches & updates

• Enterprise Support: operations support for critical clusters
  – 24 x 7, phone & web
  – Sev 1: 1 hour; Sev 2: 4 business hours
  – Cluster design, install, maintain, performance
  – 5 contacts
  – Patches & updates

* Limited in size and no expansion

Additional Options

Page 28

Hortonworks: Best In Class Hadoop Support

• Experienced enterprise support team
  – Experience supporting enterprise clients in production
  – Core engineers have real operational experience: built and supported 44K+ nodes in production
  – Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
• Global 24x7 operation: support based in Sunnyvale, UK & India
• Stringent case management processes ensure high-quality customer service & responsiveness

Page 29

Transferring Our Hadoop Expertise to You

The expert source for Apache Hadoop training & certification

• World-class training programs designed to help you learn fast
  – Role-based hands-on classes with 50% lab time
• Expert consulting services
  – Programs designed to transfer knowledge
• Industry-leading Hadoop Sandbox program
  – Fastest way to learn Apache Hadoop
  – Multi-level tutorials for wide applicability
  – Customizable and updateable

Page 30

Introducing Hortonworks Data Platform for Windows

Enterprise Apache Hadoop

March 2013


Page 31

Why Apache Hadoop on Windows?

• According to IDC, Windows Server held 73% market share in 2012
  – Hadoop was traditionally built for Linux servers, so there is a large number of underserved organizations
• According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives
  – Unstructured data growth exceeds 80% year over year in most enterprises
• Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data
  – Complementary to existing Microsoft technologies
  – There is a huge untapped community of Windows developers and ecosystem partners
• A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step

Page 32

Hortonworks Data Platform for Windows

• Enterprise-grade Apache Hadoop on Windows
  – Enables the same experience for Hadoop on Windows & Linux
• More partners, more developers for Hadoop
  – Makes native Apache Hadoop available to the Windows ecosystem
  – More options for Windows-focused organizations
• Hortonworks focus: Enterprise Apache Hadoop for all platforms
  – Trusted, reliable, production-ready distribution for on-premises Hadoop on Windows deployments
• Built with joint investment and contributions from Microsoft
  – Deep engineering relationship ensures tight integration and maximum performance

HDP is the first and only distribution available on Windows & Linux

Page 33

Seamless Interoperability with Your Microsoft Tools

• Integrated with Microsoft tools for native big data analysis
  – Bi-directional connectors for SQL Server and SQL Azure through Sqoop
  – Excel ODBC integration through Hive
• Addressing demand for Hadoop on Windows
  – Ideal for Windows customers with Hadoop operational experience
• Enables the most common Hadoop workloads in the enterprise
  – Data refinement and ETL offload for high-volume data landing
  – Data exploration for discovery of new business opportunities
  – Data enrichment for fine-tuned delivery and recommendation engines

[Diagram: DATA SOURCES (Traditional Sources: RDBMS, OLTP, OLAP; OLTP, POS systems; mobile data; New Sources: web logs, email, sensor data, social media) · DATA SYSTEMS (HORTONWORKS DATA PLATFORM for Windows) · APPLICATIONS (Microsoft Applications)]

Page 34

Inside HDP for Windows

Hortonworks Data Platform (HDP) for Windows
• 100% Open Source Enterprise Hadoop
• Component and version compatible with HDInsight
• Availability
  – Beta release available now

[Diagram: HORTONWORKS DATA PLATFORM (HDP) for Windows. OPERATIONAL SERVICES (Manage & Operate at Scale): Oozie. DATA SERVICES (Store, Process and Access Data): HCatalog, Hive, Pig, Sqoop. HADOOP CORE (Distributed Storage & Processing): HDFS, WebHDFS, MapReduce. PLATFORM SERVICES.]
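WebHDFS exposes HDFS over a REST interface, so tools on Windows or Linux can reach cluster data over plain HTTP. The sketch below only builds request URLs in Python. The NameNode host, port 50070 (the usual NameNode HTTP default in Hadoop 1.x and 2.x) and the user name are assumptions, and it uses the simple `user.name` query parameter, which applies only to clusters running without Kerberos security.

```python
from urllib.parse import quote, urlencode

# Assumed NameNode HTTP address (hypothetical host name).
NAMENODE = "http://namenode.example.com:50070"


def webhdfs_url(path: str, op: str, user: str, **params: str) -> str:
    """Build a WebHDFS v1 URL, e.g. op="LISTSTATUS" for a directory listing.

    Extra keyword arguments (such as offset or length for op="OPEN") are
    appended as query parameters in the order given.
    """
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute, e.g. /user/hdp")
    query = {"op": op, "user.name": user, **params}
    return f"{NAMENODE}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    print(webhdfs_url("/user/hdp/logs", "LISTSTATUS", "hdp"))
    # prints http://namenode.example.com:50070/webhdfs/v1/user/hdp/logs?op=LISTSTATUS&user.name=hdp
```

Passing a LISTSTATUS URL to `urllib.request.urlopen` should return JSON of the form `{"FileStatuses": {"FileStatus": [...]}}`; an `op=OPEN` request is answered with a redirect to a DataNode, which `urlopen` follows for GET requests.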

Page 35

Maximize Your Hadoop Deployment Choice

• Use HDP for Windows for on-premises deployment on Windows Server
  – Ideal for Windows users with Hadoop experience
  – Perfect next step for those who are ready to move from POC to production
• Use HDInsight for Microsoft tooling, management and provisioning
  – HDInsight Service, which offers the full benefit of Windows Azure (e.g. elasticity & low cost), available in Preview today
  – HDInsight Server, for full integration of Hadoop with Microsoft tools on premises, Developer Preview available today
• Full interoperability and deployment choice across platforms
  – Implement big data applications that run on-premises & in the cloud
  – By leveraging open source HDP, enables seamless interoperability across environments: Linux, Windows, Windows Azure

Page 36

Summary

• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
  – www.hortonworks.com
  – http://hortonworks.com/hadoop-training/