Turbocharging CDAP Applications with Ampool

Posted on 21-Jan-2017

Category: Data & Analytics

TRANSCRIPT

<ul><li><p>2015, Slide 1</p><p>Prepared for: BDA Meetup</p><p>Turbocharging CDAP Applications With Ampool</p><p>Milind Bhandarkar (@techmilind), Founder &amp; CEO, @AmpoolIO</p></li>
<li><p>Slide 2</p><p>Outline</p><p>1. Ampool Vision</p><p>2. Pipelines w/ CDAP</p><p>3. IMDG w/ Geode</p><p>4. Ampool w/ CDAP</p><p>5. Q &amp; A</p></li>
<li><p>Slide 3</p><p>Data Processing &amp; Storage layers have evolved for scale-out</p><p>Timeline: In the beginning; As app users &amp; data grew; Big Data/App Explosion!</p><p>[Diagram labels: Persistence, Processing; Unstructured, Structured; Immutable, Mutable; Unmanaged, Managed; Log, Publish, QTx, ETL]</p></li>
<li><p>Slide 4</p><p>Build a Processing &amp; Storage-agnostic Memory Architecture</p><p>Unify data processing</p><p>Design for Scale-out</p><p>Best of breed data engines!</p><p>[Diagram labels: ampool, Data Frame, Data Set; Persistence, Processing; Unstructured, Structured; Immutable, Mutable; Unmanaged, Managed; Log, Publish, QTx, ETL]</p></li>
<li><p>Slide 5</p><p>Ampool's Mission: To help build real-time customer experiences through high-performance analytics built for modern, commodity hardware platforms.</p><p>For the community: To speed up big, real-time analytics in a democratic way through a memory-centric architecture (complementing existing architectures), driving better interoperability between compute and storage layers.</p></li>
<li><p>Slide 6</p><p>Big Data processing pipelines use slow, persistent storage for data exchange today!</p><p>Pipeline stages: Ingest, ETL, Analytics, App Use</p></li>
<li><p>Slide 7</p><p>AMPOOL: Fast memory across distributed compute clusters, driving performance, simplicity and agility</p><p>Pipeline stages: Ingest, ETL, Analytics, App Use</p><p>[Diagram label: ampool]</p></li>
<li><p>Slide 8</p><p>Energy Management: IoT Analytics</p><p>Pipeline stages: Ingest, ETL, Analytics, App Use</p><p>Data ingestion flows: Smart meter data (Kafka)</p><p>Hive processing: De-norm, Sessionize; Aggregations</p><p>Spark processing: Linear Regression; Export to HBase</p><p>Downstream Apps: Web app integration</p><p>[Diagram labels: ampool, HDFS]</p></li>
<li><p>Slide 9</p><p>Pipeline implemented in CDAP</p></li>
<li><p>Slide 10</p><p>CDAP Application</p></li>
<li><p>Slide 11</p><p>In-memory Technology: What is Apache Geode?</p></li>
<li><p>Slide 12</p><p>How does it compare with the Big Data stack?</p><p>YCSB: Geode &amp; HBase</p></li>
<li><p>Slide 13</p><p>Ampool with CDAP</p><p>CDAP with HBase (as-is Application)</p><p>Configuration Changes</p><p>Extension modules/directory</p><p>Distributed Mode table/stream</p><p>CDAP with Ampool (powered by Geode)</p></li>
<li><p>Slide 14</p><p>CDAP Demo Pipeline (Video)</p></li>
<li><p>Slide 15</p><p>Ampool with CDAP</p><p>Pipeline Baseline: Ampool &amp; HBase</p></li>
<li><p>Slide 16</p><p>Key Takeaways: Ampool complements CDAP</p><p>CDAP simplifies the development of complex big data pipelines and offers extensibility at multiple layers</p><p>In-memory technology such as Geode promises higher performance in certain use-cases</p><p>Ampool, powered by Geode, is able to show immediate performance gains without any pipeline re-engineering!</p><p>Future</p></li>
<li><p>Slide 17</p><p>Compatible with the Future</p></li>
<li><p>Slide 18</p><p>Customer Behavior: Predictive Modeling</p><p>Pipeline stages: Ingest, ETL, Analytics, App Use</p><p>Data ingestion flows: Click streams (Kafka); Dim. tables (Sqoop)</p><p>2-stage MR pipeline: Cleanse data; Sessionize clickstream</p><p>HAWQ stages: Data import (PxF); Exp. features (MADlib)</p><p>Spark modeling stages: Feature analysis (MLlib); Scoring (R/HAWQ)</p><p>[Diagram labels: ampool, HDFS]</p></li>
<li><p>Slide 19</p><p>Security Analytics: Big Data Insights</p><p>Pipeline stages: Ingest, ETL, Analytics, App Use</p><p>Data ingestion flows: Security Logs (Flume)</p><p>Pig data processing: Joins logs w/ catalog; Stores denorm. logs</p><p>Kylin stages: Pre-aggregations; Export to HBase</p><p>Downstream Apps: Drill-down API for logs; Web app integration</p><p>[Diagram labels: ampool, HDFS]</p></li></ul>
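
The takeaway that a pipeline can move from HBase to Ampool through configuration changes alone, "without any pipeline re-engineering", can be illustrated with a minimal sketch. This is not CDAP's or Ampool's actual API: the `Table` contract, both table classes, the `table.backend` configuration key, and the sample smart-meter readings are hypothetical names chosen for illustration, and both backends are simulated with plain dictionaries.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple


class Table(Protocol):
    """Hypothetical key-value contract that the pipeline codes against."""

    def put(self, key: str, value: float) -> None: ...

    def get(self, key: str) -> Optional[float]: ...


class PersistentTable:
    """Stand-in for a persistent store such as HBase (simulated with a dict)."""

    def __init__(self) -> None:
        self._rows: Dict[str, float] = {}

    def put(self, key: str, value: float) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[float]:
        return self._rows.get(key)


class InMemoryTable(PersistentTable):
    """Stand-in for a memory-centric store such as Ampool (also simulated)."""


# Assumption: the backend is named only in configuration, never in pipeline code.
BACKENDS = {"hbase": PersistentTable, "ampool": InMemoryTable}


def open_table(config: Dict[str, str]) -> Table:
    """Create the table implementation selected by the configuration."""
    return BACKENDS[config["table.backend"]]()


def aggregate_readings(readings: Iterable[Tuple[str, float]], table: Table) -> None:
    """Keep a running kWh total per meter in whatever table is supplied."""
    for meter_id, kwh in readings:
        table.put(meter_id, (table.get(meter_id) or 0.0) + kwh)


# Made-up sample data for the sketch.
READINGS = [("meter-1", 1.5), ("meter-2", 0.5), ("meter-1", 2.0)]


def run(config: Dict[str, str]) -> Table:
    """Run the same pipeline step against the configured backend."""
    table = open_table(config)
    aggregate_readings(READINGS, table)
    return table


if __name__ == "__main__":
    for backend in ("hbase", "ampool"):
        result = run({"table.backend": backend})
        print(backend, result.get("meter-1"), result.get("meter-2"))
```

Only the configuration dictionary differs between the two runs; `aggregate_readings` is untouched, which is the property the slides attribute to running CDAP on Ampool.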