Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Enterprise Hadoop
Jeff Markham
Technical Director, APAC
Page 2
Upcoming Announcements
April 2: Hortonworks Data Platform 2.1
A continued focus on innovation within the core of Enterprise Hadoop
to enable an ecosystem to flourish and cement Hadoop’s role in the
data architectures of tomorrow
• Interactive SQL Query: Final phase of Stinger delivered
• Comprehensive Features: Governance, Security, Operations
• Processing Versatility: Storm, Search
April 2: LucidWorks Partnership
A resell agreement has been signed with LucidWorks
to provide tier 2 and tier 3 support for HDP Search
April 3: Hadoop Summit Europe 2014
Sold out, with double the exhibitors and double the content year over year
April 21: Concurrent Partnership
Cascading is the proven application
development platform for building data
applications on Hadoop
Integrate and Deliver the Cascading SDK
into HDP 2.1
• Collection of tools, documentation,
libraries, tutorials and example projects
• Simplifies SQL integration and enables
Scala development for Hadoop
Hortonworks provides level 1 & 2 support
for Cascading SDK
Page 3
Hadoop within an emerging Modern Data Architecture
[Figure: Modern Data Architecture]
• APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
• DEV & DATA TOOLS: Build & Test
• OPERATIONS TOOLS: Provision, Manage & Monitor
• DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP) alongside Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)
• SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Page 4
Core Capabilities of Enterprise Hadoop
• PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization
• ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop
• GOVERNANCE & INTEGRATION: Load data and manage according to policy
• DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)
• DATA MANAGEMENT: Store and process all of your corporate data assets
• SECURITY: Provide a layered approach to security through Authentication, Authorization, Accounting, and Data Protection
• OPERATIONS: Deploy and effectively manage the platform
• DEPLOYMENT OPTIONS: Provide deployment choice across physical, virtual, cloud
Page 5
[Figure: Enterprise Hadoop component stack. Legend: marked components are delivered in Open Source]
GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
DATA ACCESS
• Batch: MapReduce
• Script: Pig
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• Others: In-Memory Analytics, ISV engines
DATA MANAGEMENT
• YARN: Data Operating System
• HDFS (Hadoop Distributed File System), nodes 1 to N
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS
• Resources: YARN
• Access: Hive, …
• Pipeline: Falcon
• Cluster: Knox
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
Page 6
HDP 2.1: Enterprise Hadoop
HDP 2.1: Hortonworks Data Platform
[Figure: the same Enterprise Hadoop component stack as on Page 5 (Governance & Integration, Data Access, Data Management, Security, Operations), with a Deployment Choice layer added underneath]
Deployment Choice: Linux, Windows, On-Premise, Cloud
Page 7
HDP 2.1 Investment Themes
HDP 2.1 represents a MAJOR step forward for Hadoop
Highlight release: delivery of Interactive Query via the Stinger Initiative, addition of Data Governance,
more Security, Stream Processing and Search
Three Key Highlights of Release
1. Stinger Initiative DELIVERED: Interactive Query in Apache Hive
2. NEW Capabilities for Hadoop
• Governance: delivered with Apache Falcon
• Security: Apache Knox extends perimeter security for Hadoop
3. NEW Engines included in HDP
• Stream processing: Apache Storm to analyze/process streams of data
• Search: via Apache Solr
Page 8
HDP 2.1: Reliable, Consistent & Current
HDP certifies most recent & stable community innovation
[Figure: Hortonworks Data Platform component versions across three releases: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)]
Components, grouped under Data Management, Data Access, Governance & Integration, Security and Operations: Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Solr, Falcon
Version labels shown in the figure: 2.2.0, 1.1.2, 0.11.0, 0.11.0, 0.12.0, 0.12.0, 2.4.0, 0.12.1, 0.13.0, 0.94.6, 0.96.1, 0.98.0, 0.9.1, 0.7.0, 0.8.0, 0.9.0, 4.7.8, 1.4.3, 1.4.4, 1.3.1, 1.4.0, 1.2.5, 1.4.4, 1.5.1, 3.3.2, 4.0.0, 3.4.5, 0.4.0, 0.4.0, 4.0.0, 1.5.1, Falcon 0.5.0
Page 9
Interactive SQL-IN-Hadoop Delivered
Stinger Initiative – DELIVERED
Next generation SQL based
interactive query in Hadoop
Speed
Hive query performance has improved by 100X to allow for
interactive query times (seconds)
Scale
The only SQL interface to Hadoop designed for queries that scale
from TB to PB
SQL
Supports the broadest range of SQL semantics for analytic applications
running against Hadoop
Apache Hive Contribution: an Open Community at its finest
• 13 months
• 1,672 Jira tickets closed
• 145 developers
• 44 companies
• ~390,000 lines of code added (2x)
[Figure: Business Analytics and Custom Apps issue SQL to Apache Hive, which runs on Apache Tez and Apache MapReduce over Apache YARN and HDFS (Hadoop Distributed File System)]
Stinger Project
Stinger Phase 1:
• Base Optimizations
• SQL Types
• SQL Analytic Functions
• ORCFile Modern File Format
Stinger Phase 2:
• SQL Types
• SQL Analytic Functions
• Advanced Optimizations
• Performance Boosts via YARN
Stinger Phase 3:
• Hive on Apache Tez
• Query Service (always on)
• Buffer Cache
• Cost Based Optimizer (Optiq)
Page 10
New: Data Governance & Integration
Apache Falcon: Simplified Data Governance
for Enterprise Hadoop
• First time included in HDP
• Provides key governance framework for:
  • Acquisition & processing of datasets
  • Replication & retention of datasets
  • Redirecting datasets to non-Hadoop extensions
• Provides audit trail & lineage
Investment Phases
Phase 1:
• Incubate Apache Falcon
• Dataset replication & retention
• Falcon tech preview
Phase 2:
• Basic dashboard for pipeline viewing
• Kerberos security support
• Ambari integration for management
• Hive/HCatalog integration
Phase 3:
• Advanced dashboard for pipeline definition & management
• Audit
• Lineage
• Data tagging
• File import via SSH & SCP
Another great example of Open Community Innovation
Originally built and contributed to Apache by InMobi
• Fastest path to innovation is the open community
• 14 months in the making
• Tested In production
• Vibrant community of developers building
Page 11
New: Apache Knox for Perimeter Security
Important Note: Security for Hadoop must be addressed within
every layer of the stack and integrated into existing frameworks.
For a full description of what is available in Enterprise Hadoop
today across Authentication, Authorization, Accountability and
Encryption, please visit our security labs page.
Apache Knox: Perimeter security for Hadoop
• A common place to perform authentication across Hadoop and all related projects
• Integrated with LDAP and AD
• Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
• Broad community effort, incubated with Microsoft, broad set of developers involved
Security Investments
Security Phase 1:
• Strong AuthN with Kerberos
• HBase, Hive, HDFS basic AuthZ
• Encryption with SSL for NN, JT, etc.
• Wire encryption with Shuffle, HDFS, JDBC
Security Phase 2:
• ACLs for HDFS
• Knox: Hadoop REST API Security
• SQL-style Hive AuthZ (GRANT, REVOKE)
• SSL support for Hive Server 2
• SSL for DN/NN UI & WebHDFS
• PAM support for Hive
Security Phase 3:
• Audit event correlation and Audit viewer
• Support Token-Based AuthN beyond Kerberos
• Data Encryption in HDFS, Hive & HBase
• Knox for HDFS HA, Ambari & Falcon
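To make the perimeter idea concrete: with Knox in front of the cluster, a WebHDFS client addresses the gateway rather than the NameNode. The sketch below only builds such a URL and sends nothing. It assumes the gateway URL layout from the Apache Knox documentation (https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/v1/...). The host knox.example.com, the topology name sandbox, and the helper name knox_webhdfs_url are hypothetical and not taken from the slides.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     gateway_port=8443, gateway_path="gateway", **params):
    """Build a WebHDFS URL that is routed through a Knox gateway.

    gateway_port and gateway_path default to commonly used Knox values;
    adjust them to match the actual gateway configuration.
    """
    # WebHDFS selects the operation with the "op" query parameter.
    query = urlencode({"op": op, **params})
    # Percent-encode the HDFS path but keep "/" as the separator.
    path = quote(hdfs_path.strip("/"))
    return (f"https://{gateway_host}:{gateway_port}/{gateway_path}/"
            f"{topology}/webhdfs/v1/{path}?{query}")


# Hypothetical host and topology name, for illustration only.
url = knox_webhdfs_url("knox.example.com", "sandbox", "/tmp", "LISTSTATUS")
print(url)
```

The client only ever sees the gateway address; authentication against LDAP or AD happens at Knox before the request is forwarded into the cluster.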
Page 12
New: Stream Processing with Apache Storm
Apache Storm: Real-time event processing for
sensor and business activity monitoring
• Unlocks new business cases for Hadoop
• Key component of a data lake architecture
• Scale: Ingest millions of events per second. Fast query on petabytes of data
• Integrated with Ambari to manage
Investment Phases
Phase 1:
• Install, Start, & Stop via Ambari
• Kafka, HBase, & HDFS Connectors
• Ganglia & Nagios based monitoring
Phase 2:
• Storm-on-YARN
• Ingest & Notification for JMS
• Data persistence: EDWs, RDBMS, Cassandra
Phase 3:
• High Availability management with Ambari
• AD/LDAP plugin for authentication
• Declarative “wiring”
• Hive update support
• Advanced scheduler
Page 13
New: Search for Hadoop
Apache Solr: Open source enterprise search for Hadoop and HDP
• Open architecture: In the community, for the community
• Simple, powerful UI for advanced search applications
• High performance indexing & sub-second search times over billions of documents
• Deep Integration Roadmap with HDP
• Partnership with LucidWorks
  • LucidWorks provides tier 3 & 4 support
  • Alignment w/ strategy of working within the community and with the core committers
  • 9 committers total (7 PMC)
Page 14
Cascading SDK & HDP 2.1
Cascading SDK
Enables the rapid development of batch
and interactive data-driven applications
Integration Roadmap
• Step 1: Integrate Cascading SDK for
customers to use with HDP 2.1
• Step 2: Integration with Tez
Page 15
Tech Preview: Apache Spark
In-memory processing is “HOT!”
However, most of the world is using it for science and machine learning
In-memory sandbox for iterative data
analytics used by a handful of data scientists
Hortonworks provides guidance for initial
applicability and scale
• Exploring key use cases with customers focused on iterative access & machine learning
• Experience thus far supports target deployments of no more than 1 TB of data, 40 nodes, and 1-3 users
• Skill set required: Scala (Java-based API Framework)
Page 16
Operating Enterprise Hadoop
Apache Ambari is the only 100% open source
framework for provisioning, managing and monitoring
Apache Hadoop clusters
[Figure: Ambari Web and other tools (Viewpoint, others) call the REST APIs of the Ambari Server, which provisions, manages and monitors the compute & storage nodes]
Integration With Existing Operations Tools
New in HDP 2.1
• Support new Data Access Engines
• Stack extensibility, Cluster Blueprints
• Rolling restarts
• Maintenance mode
• more...
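As an illustration of how an existing operations tool might drive the Ambari REST APIs, the sketch below constructs (but does not send) a request that asks Ambari to change a service's desired state. It assumes the commonly documented Ambari v1 endpoint /api/v1/clusters/{cluster}/services/{service}, the X-Requested-By header, and HTTP basic authentication. The server address, cluster name, credentials, and the helper name ambari_service_state_request are hypothetical placeholders.

```python
import base64
import json
from urllib.request import Request


def ambari_service_state_request(base_url, cluster, service, state, user, password):
    """Build a PUT request asking Ambari to move a service to a desired state.

    In Ambari, the state "INSTALLED" means stopped and "STARTED" means running.
    """
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via REST"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=f"{base_url}/api/v1/clusters/{cluster}/services/{service}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari expects this header on modifying requests.
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
    )


# Hypothetical server, cluster and credentials, for illustration only.
req = ambari_service_state_request(
    "http://ambari.example.com:8080", "mycluster", "HDFS", "INSTALLED",
    "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would submit it to a reachable Ambari Server; that step is left out here so the sketch runs without a cluster.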
Page 17
HDP 2.1 Investment Themes
HDP 2.1 represents a MAJOR step forward for Hadoop
Highlight release: delivery of Interactive Query via the Stinger Initiative, addition of Data Governance,
more Security, Stream Processing and Search
Three Key Highlights of Release
1. Stinger Initiative DELIVERED: Interactive Query in Apache Hive
2. NEW Capabilities for Hadoop
• Governance: delivered with Apache Falcon
• Security: Apache Knox extends perimeter security for Hadoop
3. NEW Engines included in HDP
• Stream processing: Apache Storm to analyze/process streams of data
• Search: via Apache Solr
AND the HDP Spark Tech Preview,
Simultaneous Linux & Windows Release,
COUNTLESS additional features
Page 18
Thank You
Jeff Markham
Technical Director, APAC