![Page 1: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/1.jpg)
Review of Calculation Paradigm and its Components
Namuk Park
Nov 18, 2014
![Page 2: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/2.jpg)
Hadoop File System
![Page 3: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/3.jpg)
Hadoop: MapReduce
![Page 4: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/4.jpg)
Hadoop 2.0
• improve scalability
• support non-MapReduce jobs
• support heterogeneous machines
• address a common cause of low cluster utilization: map slots might be full while reduce slots are empty, and vice versa
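The utilization problem in the last bullet can be illustrated with a toy model. This is a sketch, not Hadoop code: the class name, slot counts, and task counts are all hypothetical. It compares a cluster with fixed map and reduce slots against one whose capacity is a single pool of interchangeable containers.

```java
/** Toy model (not Hadoop code): fixed map/reduce slots vs. a shared container pool. */
final class SlotUtilization {

    /** Tasks that can run at once when map and reduce slots are separate and fixed. */
    static int runningWithFixedSlots(int mapSlots, int reduceSlots,
                                     int pendingMaps, int pendingReduces) {
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    /** Tasks that can run at once when all capacity is one pool of generic containers. */
    static int runningWithSharedPool(int containers, int pendingMaps, int pendingReduces) {
        return Math.min(containers, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 8 map slots + 4 reduce slots, i.e. 12 units of capacity.
        // Map-heavy phase: 20 maps waiting, no reduces ready yet.
        int fixed = runningWithFixedSlots(8, 4, 20, 0);
        int pooled = runningWithSharedPool(12, 20, 0);
        System.out.println("fixed slots: " + fixed + " of 12 busy");
        System.out.println("shared pool: " + pooled + " of 12 busy");
    }
}
```

In the map-heavy phase the fixed-slot model leaves the four reduce slots idle, while the pooled model can use all twelve units for whatever work is pending.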
![Page 5: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/5.jpg)
Hadoop 2.0
![Page 6: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/6.jpg)
Hadoop 2.0: Service Layers
![Page 7: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/7.jpg)
YARN
• split the two functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons
• a global ResourceManager (RM) and a per-application ApplicationMaster (AM)
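The division of responsibilities can be sketched with two toy classes. This is not the YARN API: `ToyResourceManager`, `ToyApplicationMaster`, and every method on them are hypothetical names chosen for illustration. The resource manager only hands out capacity; each application master decides which of its own tasks run in the capacity it was granted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not the YARN API): global bookkeeping of free containers only. */
final class ToyResourceManager {
    private int freeContainers;

    ToyResourceManager(int totalContainers) {
        this.freeContainers = totalContainers;
    }

    /** Grants up to `requested` containers; knows nothing about what will run in them. */
    synchronized int allocate(int requested) {
        int granted = Math.min(requested, freeContainers);
        freeContainers -= granted;
        return granted;
    }

    synchronized void release(int count) {
        freeContainers += count;
    }

    synchronized int free() {
        return freeContainers;
    }
}

/** Per-application scheduler and monitor for that application's own tasks. */
final class ToyApplicationMaster {
    private final String appName;
    private final List<String> pendingTasks = new ArrayList<>();
    private final List<String> runningTasks = new ArrayList<>();

    ToyApplicationMaster(String appName, List<String> tasks) {
        this.appName = appName;
        this.pendingTasks.addAll(tasks);
    }

    /** Asks the resource manager for capacity, then places its own tasks into it. */
    int schedule(ToyResourceManager rm) {
        int granted = rm.allocate(pendingTasks.size());
        for (int i = 0; i < granted; i++) {
            runningTasks.add(pendingTasks.remove(0));
        }
        return granted;
    }

    /** Marks all running tasks finished and hands their containers back. */
    void finishRunning(ToyResourceManager rm) {
        rm.release(runningTasks.size());
        runningTasks.clear();
    }

    int pending() { return pendingTasks.size(); }
    int running() { return runningTasks.size(); }
    String name() { return appName; }
}

final class YarnSplitDemo {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager(4);
        ToyApplicationMaster mr =
                new ToyApplicationMaster("mapreduce-job", Arrays.asList("m1", "m2", "m3"));
        ToyApplicationMaster other =
                new ToyApplicationMaster("non-mapreduce-job", Arrays.asList("t1", "t2"));
        mr.schedule(rm);     // takes 3 of the 4 containers
        other.schedule(rm);  // takes the 1 remaining container
        System.out.println(mr.name() + " running=" + mr.running());
        System.out.println(other.name() + " running=" + other.running()
                + " pending=" + other.pending());
    }
}
```

Because the resource manager never inspects task types, a non-MapReduce application can share the same cluster simply by bringing its own application master.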
![Page 8: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/8.jpg)
YARN: MapReduce
![Page 9: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/9.jpg)
Storm
![Page 10: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/10.jpg)
Storm
public class WordCountTopology {
    // {……} (elided on the slide)

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout with a parallelism hint of 5.
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        // Sentences are distributed across the "split" bolt (parallelism hint 8).
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        // Tuples with the same "word" value always reach the same "count" task.
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
            conf.setNumWorkers(3);
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
        }
    }
}
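The topology above relies on two stream groupings: `shuffleGrouping` spreads tuples across the tasks of a bolt, while `fieldsGrouping` sends tuples with equal field values to the same task. The sketch below is a toy model of that routing rule, not Storm's implementation; the class and method names are hypothetical, and Storm's actual hashing and load balancing may differ.

```java
import java.util.Random;

/** Toy model of the two stream groupings used in the topology (not Storm's code). */
final class GroupingSketch {

    /** Fields grouping: equal field values always map to the same task index. */
    static int fieldsGroupingTask(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    /** Shuffle grouping: any task may receive the tuple; here picked uniformly at random. */
    static int shuffleGroupingTask(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    public static void main(String[] args) {
        int countTasks = 12;  // parallelism hint of the "count" bolt on the slide
        int splitTasks = 8;   // parallelism hint of the "split" bolt on the slide
        Random random = new Random();
        for (String word : new String[] {"storm", "hadoop", "storm"}) {
            System.out.println(word + " -> count task " + fieldsGroupingTask(word, countTasks));
        }
        System.out.println("sentence -> split task " + shuffleGroupingTask(random, splitTasks));
    }
}
```

The word count needs fields grouping on "word" because each counter task keeps its own running totals; if the same word could land on different tasks, no single task would hold the full count.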
![Page 11: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/11.jpg)
Storm Architecture
![Page 12: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/12.jpg)
Storm Architecture
![Page 13: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/13.jpg)
Lambda Architecture
query = function(all data)
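One way to read this equation: a query is a pure function over the whole dataset, and the Lambda Architecture approximates it by combining a precomputed batch view with a view over data the batch has not yet absorbed. The sketch below assumes a simple count-per-key query; the class name, method names, and sample events are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy sketch of "query = function(all data)" for a count-per-key query. */
final class LambdaSketch {

    /** The query as a pure function over a list of events (here: page names). */
    static Map<String, Long> countByKey(List<String> events) {
        Map<String, Long> counts = new HashMap<>();
        for (String event : events) {
            counts.merge(event, 1L, Long::sum);
        }
        return counts;
    }

    /** Merges a precomputed batch view with a view over events the batch has not seen. */
    static Map<String, Long> mergeViews(Map<String, Long> batchView,
                                        Map<String, Long> realtimeView) {
        Map<String, Long> merged = new HashMap<>(batchView);
        realtimeView.forEach((key, value) -> merged.merge(key, value, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        List<String> alreadyBatched = Arrays.asList("home", "about", "home");
        List<String> recent = Arrays.asList("home", "contact");

        Map<String, Long> batchView = countByKey(alreadyBatched);
        Map<String, Long> realtimeView = countByKey(recent);
        System.out.println(mergeViews(batchView, realtimeView));
    }
}
```

The merge step works here because counting is additive; a query that is not decomposable in this way would need a different merge rule.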
![Page 14: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/14.jpg)
Lambda Architecture
![Page 15: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/15.jpg)
Tez
Low-Level DAG Framework
• executes a complex DAG of tasks
• runs on YARN, a more general-purpose resource management framework
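To make "a DAG of tasks" concrete, the sketch below builds a small dependency graph and derives an order in which its vertices could run. It is a toy model, not the Tez API: the `TaskDag` class and the vertex names are hypothetical, and ordering is done with Kahn's topological sort as a stand-in for a real scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy DAG of tasks executed in dependency order (not the Tez API). */
final class TaskDag {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> unmetInputs = new LinkedHashMap<>();

    void addVertex(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        unmetInputs.putIfAbsent(name, 0);
    }

    /** Declares that `to` consumes the output of `from`. */
    void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        downstream.get(from).add(to);
        unmetInputs.merge(to, 1, Integer::sum);
    }

    /** Returns one valid execution order (Kahn's algorithm); fails if there is a cycle. */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(unmetInputs);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((vertex, count) -> {
            if (count == 0) {
                ready.add(vertex);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.remove();
            order.add(vertex);  // a real engine would launch the vertex's tasks here
            for (String next : downstream.get(vertex)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        TaskDag dag = new TaskDag();
        dag.addEdge("load-a", "join");
        dag.addEdge("load-b", "join");
        dag.addEdge("join", "group");
        dag.addEdge("group", "store");
        System.out.println(dag.executionOrder());
    }
}
```

A vertex with two inputs, such as the hypothetical "join" here, cannot be expressed as a single map-then-reduce step, which is the kind of shape a general DAG framework is meant to handle directly.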
![Page 16: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/16.jpg)
Tez: Runtime API
![Page 17: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/17.jpg)
Pig: Concepts
Non-blocking operators
• LOAD / STORE
• FOREACH __ GENERATE __
• FILTER __ BY __
Blocking operators
• GROUP __ BY __
• ORDER __ BY __
• JOIN __ BY __
Blocking operators are translated to a MapReduce shuffle
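The difference between the two operator classes can be sketched in plain Java. This is neither Pig nor Hadoop code; the class name, method names, and sample records are hypothetical. A FILTER-like operator decides each record on its own, whereas a GROUP-like operator cannot finish any group until it has seen every record, which is why it needs a shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration of non-blocking vs. blocking operators (not Pig or Hadoop code). */
final class OperatorSketch {

    /** FILTER-like operator: each record is decided on its own, so output can stream. */
    static List<String[]> filterByMinScore(List<String[]> records, int minScore) {
        List<String[]> kept = new ArrayList<>();
        for (String[] record : records) {
            if (Integer.parseInt(record[1]) >= minScore) {
                kept.add(record);
            }
        }
        return kept;
    }

    /** GROUP-like operator: no group is complete until every record has been seen. */
    static Map<String, List<Integer>> groupByName(List<String[]> records) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String[] record : records) {
            // The "shuffle": route each value to the bucket for its key.
            groups.computeIfAbsent(record[0], key -> new ArrayList<>())
                  .add(Integer.parseInt(record[1]));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"alice", "3"},
                new String[] {"bob", "7"},
                new String[] {"alice", "9"});
        System.out.println(filterByMinScore(records, 5).size() + " records pass the filter");
        System.out.println(groupByName(records));
    }
}
```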
![Page 18: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/18.jpg)
Pig: Problems
Restrictions imposed by MapReduce
• Extra intermediate output on HDFS
• Artificial synchronization barriers
• Inefficient use of resources
• Multi-query optimization
![Page 19: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/19.jpg)
Pig: on Tez
![Page 20: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/20.jpg)
Pig: Tez DAG
![Page 21: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/21.jpg)
Pig: Strategies
• AM/Container Reuse
• Broadcast Edge, Object Cache
• Vertex Group
• Slow Start, Pre-launch
![Page 22: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/22.jpg)
Pig: Performance
![Page 23: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/23.jpg)
Pig: Performance
![Page 24: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/24.jpg)
Pig: Performance
![Page 25: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/25.jpg)
Complex Event Processing: Problems
• fungible data
• EDA: event-driven SOA
• EDA requires non-pipeline complex
![Page 26: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/26.jpg)
Complex Event Processing: Paradigm
[Figure: pipeline paradigm. A Job Tracker, four Task Trackers, and data blocks.]
[Figure: independent paradigm. A Job Tracker, four Task Trackers, a Message Coordinator, and data blocks.]
![Page 27: Review of Calculation Paradigm and its Components](https://reader033.vdocuments.mx/reader033/viewer/2022042819/55c5a523bb61eb222a8b45f0/html5/thumbnails/27.jpg)
References
• Hadoop YARN: The Architectural Center of Enterprise Hadoop
• Lambda Architecture
• Developing a Tez Execution Engine for Apache Pig (original title: Apache Pig를 위한 Tez 연산 엔진 개발하기)