Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
@nmotgi
Nitin Motgi
Whither the Hadoop Developer Experience?
PROPRIETARY & CONFIDENTIAL
• Introduction to data applications
• Challenges with building operational data applications on Hadoop
• Motivation and Goals for CDAP
• Use-cases
• Introduction to CDAP and Architecture Overview
• Demo
Agenda
Applications that use data insights to enhance the customer/user experience, achieve a business objective, or improve a business process.
What are Data Applications?
• 360-Degree Customer View
• Recommendation Engine
• Predictive Modeling
• Fraud Analysis
• Network Threat Detection
• Telemetry Analysis
• Time Series Analysis
• Data Processing: ETL
• And many more
Examples
Challenges
Technology Explosion
• 2006: Core Hadoop (HDFS, MR)
• 2008: + HBase, ZooKeeper
• 2009: + Hive, Pig, Mahout
• 2010: + Sqoop, Whirr, Avro
• 2011: + Flume, Bigtop, Oozie, MRUnit, HCatalog
• 2012: + Spark, Impala, Solr, Kafka
• Present: + Sentry, Tez, Parquet, YARN, Knox
• Application Complexity
• Many Domains to Bridge
• Lots of Boilerplate
• Inconsistent APIs
• No Reusability
• Lack of Developer Productivity
Challenges
Application Complexity
Motivation
Motivation
• Simple yet powerful platform for developers to build applications on Hadoop
• Expose capabilities rather than features
• Make Hadoop accessible to developers with no Hadoop knowledge
Goals
• Unified platform for building solutions on Hadoop
• Simpler application development lifecycle
• Reusable Data and Processing Patterns with Abstractions
• Framework-level correctness and consistency
• Easy to use developer APIs
• Reliable and scalable real-time, business-critical analytics
• Closed-Loop Recommendation and Analytics
• Data Ingestion As A Service
• Extensible and reusable use-case blueprints
• ETL Automation: Real-time and Batch
• Data As A Service
• Reduce development and operational complexity of Hadoop
Typical Customer Use-cases
Which of these apply to you?
Introduction to Cask Data Application Platform
An open source, integrated, distributed and extensible platform for building data applications on Hadoop.
Cask Data Application Platform
Provides
Supports developers, operations, and organizations through the entire enterprise data application lifecycle.
CASK DATA APP PLATFORM
• Data Lifecycle: Ingest, Explore, Transform, Serve
• Application Lifecycle: Develop, Test, Deploy, Scale
• Enterprise Lifecycle: Secure, Manage, Monitor, Operate
Supports
Diagram (Ingest, Explore, Transform, Serve): unification over ACID Datasets; components shown include Streams, Realtime (Tigon), MR, Spark, ad-hoc query, JDBC, Query and RPC, on top of the Dataset API, SPI & Management Services.
Application Structure
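The slide lists ACID as a property of the Dataset layer, and the deployment slide names Tephra as the transaction service. As a rough illustration of what transactional Dataset access gives a developer, here is a minimal in-memory sketch. `TxKeyValueTable` and its `Tx` class are hypothetical names, not the CDAP or Tephra API: writes made inside a transaction are buffered, visible to that transaction, and either applied all at once on commit or discarded on abort.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, NOT the CDAP or Tephra API: illustrates atomic
// commit/abort of buffered writes against a key-value "dataset".
class TransactionalDatasetSketch {

    static final class TxKeyValueTable {
        private final Map<String, String> committed = new HashMap<>();

        /** Starts a new unit of work against this table. */
        Tx begin() {
            return new Tx();
        }

        /** Reads committed state only; uncommitted writes are invisible here. */
        synchronized Optional<String> read(String key) {
            return Optional.ofNullable(committed.get(key));
        }

        final class Tx {
            private final Map<String, String> buffer = new HashMap<>();
            private boolean open = true;

            void put(String key, String value) {
                ensureOpen();
                buffer.put(key, value);
            }

            /** A transaction sees its own buffered writes before committed data. */
            Optional<String> get(String key) {
                ensureOpen();
                if (buffer.containsKey(key)) {
                    return Optional.of(buffer.get(key));
                }
                return read(key);
            }

            /** Publishes every buffered write in one step. */
            void commit() {
                ensureOpen();
                synchronized (TxKeyValueTable.this) {
                    committed.putAll(buffer);
                }
                open = false;
            }

            /** Drops every buffered write; committed data is untouched. */
            void abort() {
                ensureOpen();
                buffer.clear();
                open = false;
            }

            private void ensureOpen() {
                if (!open) {
                    throw new IllegalStateException("transaction already finished");
                }
            }
        }
    }

    public static void main(String[] args) {
        TxKeyValueTable table = new TxKeyValueTable();

        TxKeyValueTable.Tx tx = table.begin();
        tx.put("user:1", "alice");
        System.out.println(table.read("user:1").isPresent()); // false
        tx.commit();
        System.out.println(table.read("user:1").orElse("?")); // alice

        TxKeyValueTable.Tx failed = table.begin();
        failed.put("user:1", "mallory");
        failed.abort();
        System.out.println(table.read("user:1").orElse("?")); // alice
    }
}
```

This covers atomicity only. A real transaction manager such as Tephra also has to detect conflicting writes between concurrent transactions, which the sketch deliberately omits.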
Building Blocks
• Dataset: Encapsulated data access patterns and data model in a reusable, domain-specific API
• Program: Standardized containers for processing paradigms
• Application: Programmatic abstraction for composing multiple Datasets and Programs that integrates ingestion, exploration, transformation and serving
Diagram: an Application composed of Datasets and Programs.
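To make the three building blocks more concrete, the following Java sketch models them with plain in-memory types. `WordCountDataset`, `Program` and `Application` here are hypothetical illustrations, not CDAP's actual classes: the Dataset hides its storage behind a domain-specific API, a Program is one processing step, and the Application composes them. A fixed list stands in for ingested events, and the final lookup plays the role of serving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT CDAP's real API: models the Dataset / Program /
// Application building blocks with plain in-memory Java types.
class BuildingBlocksSketch {

    /** Dataset: a domain-specific API that hides how the data is stored. */
    static final class WordCountDataset {
        private final Map<String, Long> counts = new TreeMap<>();

        void increment(String word) {
            counts.merge(word, 1L, Long::sum);
        }

        long count(String word) {
            return counts.getOrDefault(word, 0L);
        }
    }

    /** Program: a container for a single processing step. */
    interface Program {
        void run();
    }

    /** Application: composes Programs (which share Datasets) into one unit. */
    static final class Application {
        private final List<Program> programs = new ArrayList<>();

        Application add(Program program) {
            programs.add(program);
            return this;
        }

        void runAll() {
            for (Program program : programs) {
                program.run();
            }
        }
    }

    public static void main(String[] args) {
        WordCountDataset words = new WordCountDataset();
        // Stand-in for events that an ingestion step would deliver.
        List<String> events = List.of("hadoop cdap", "cdap spark");

        // Transform: tokenize each event and update the Dataset.
        Program wordCounter = () -> {
            for (String event : events) {
                for (String word : event.split("\\s+")) {
                    words.increment(word);
                }
            }
        };

        new Application().add(wordCounter).runAll();

        // Serve: answer a lookup from the Dataset.
        System.out.println(words.count("cdap")); // 2
    }
}
```

The point of the split is reuse: the same Dataset type can back several Programs, and callers depend on `increment`/`count` rather than on a storage layout. CDAP itself declares these pieces through its own application API; the sketch only mirrors the composition idea.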
Deployment Architecture
CDAP Server
• Services: Master, Router, Auth Server
• Highly Available (HA)
• Installed on edge node(s)
• Supports Kerberos: impersonation and perimeter security
• Manages system services in YARN
System Services (Twill Containers)
• Transactions (Tephra)
• Metrics Aggregation
• Log Aggregation
• Dataset Services
• Metadata Management Service
• Explore Service
• Stream Management Service & more
Want to Learn More?
Open-source (Apache License v2)
Website: http://cdap.io
Mailing List: [email protected] [email protected]
IRC: #cdap on freenode.net
QUESTIONS?