Incoop: MapReduce for Incremental Computations
by Bhatotia et al.
What is Incoop?
● Hadoop based framework
● Designed for improved efficiency of incremental programs
● Developed at the Max Planck Institute by Bhatotia et al.
Why Incoop?
Why run incremental computation on Incoop?
● Lots of applications are incremental
  ○ Machine learning, word count over a range of documents, etc.
● Easy to write: the input is an ordinary Hadoop program
● Great speedups
How does Incoop differ from Hadoop?
● Incremental HDFS
● Incremental map and incremental reduce through contraction phase
● Memoization-aware scheduler
HDFS recap
● Large, fixed-size chunks (64 MB)
● Append-only file system
● Sequential reads and writes
What’s bad about HDFS?
● Even small changes to the input data result in unstable partitioning!
● This makes it difficult to reuse results
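To see why fixed-size chunks are unstable, the following minimal sketch splits a byte string at fixed offsets and compares chunk hashes before and after a one-byte insertion. The 8-byte `CHUNK_SIZE` and the helper names are illustrative assumptions standing in for 64 MB HDFS chunks; this is not Hadoop code.

```python
import hashlib

CHUNK_SIZE = 8  # tiny stand-in (assumption) for the 64 MB HDFS chunk size


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data at fixed offsets, as a fixed-size block layout would."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprints(chunks: list[bytes]) -> set[str]:
    """Content hash of every chunk, used to detect reusable map inputs."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


old = b"the quick brown fox jumps over the lazy dog"
new = b"X" + old  # a one-byte insertion at the front

old_fp = fingerprints(fixed_size_chunks(old))
new_fp = fingerprints(fixed_size_chunks(new))
reusable = old_fp & new_fp

print(f"{len(reusable)} of {len(new_fp)} chunks unchanged")  # 0 of 6 chunks unchanged
```

Every chunk after the insertion point shifts by one byte, so no chunk hash from the old version matches and none of the earlier map results can be reused.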
The problem with HDFS partitioning
[Diagram: input file, HDFS, mappers]
Incremental HDFS
● Splits input data based on content
● Variable-length chunks
● Done at the input creation phase
● Follows the HDFS API
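The content-based splitting described above can be sketched as follows. This is a minimal illustration, not the Inc-HDFS implementation: it uses CRC32 over a small sliding window as a stand-in fingerprint, the `WINDOW` and `DIVISOR` values and the function name are assumptions, and it omits the minimum and maximum chunk sizes a real system would likely enforce.

```python
import zlib

WINDOW = 4   # bytes of context that decide whether a boundary falls here (assumption)
DIVISOR = 8  # a boundary fires for roughly 1 in DIVISOR windows (assumption)


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut after every position whose trailing WINDOW bytes hash to 0 mod DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        window = data[end - WINDOW:end]
        if zlib.crc32(window) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


old = b"the quick brown fox jumps over the lazy dog " * 4
new = b"NEW " + old  # insertion at the front of the file

old_chunks = content_defined_chunks(old)
new_chunks = content_defined_chunks(new)
reused = set(old_chunks) & set(new_chunks)

print(f"{len(reused)} distinct chunks shared between the two versions")
```

Because each boundary depends only on the `WINDOW` bytes before it, every chunk of the old version after its first one reappears unchanged in the new version, so the corresponding map inputs keep the same hash.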
Solution with incremental HDFS
[Diagram: input file, Inc-HDFS, mappers]
How does Incoop differ from Hadoop?
● Incremental HDFS
● Incremental map/reduce and contraction phase
● Memoization-aware scheduler
Incremental Map Phase
● Persistently stores map task results between runs
● Creates a reference to each result in the memoization server (via hashing)
● Later runs fetch the results that the memoization server points to
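The three steps above can be sketched as a memoized map task. This is a simplified model under stated assumptions: two in-memory dictionaries stand in for the memoization server and the persistent result store, and `run_map_task`, `word_count_map`, and the `results/...` location format are hypothetical names, not Incoop APIs.

```python
import hashlib
from collections import Counter

memo_server: dict[str, str] = {}        # chunk hash -> location of the stored result
result_store: dict[str, Counter] = {}   # stand-in for persistent storage
map_invocations = 0                     # counts how often the map function really runs


def word_count_map(chunk: bytes) -> Counter:
    """The user's map function: count words in one input chunk."""
    global map_invocations
    map_invocations += 1
    return Counter(chunk.decode().split())


def run_map_task(chunk: bytes) -> Counter:
    """Reuse a stored result if this exact chunk content was mapped before."""
    key = hashlib.sha256(chunk).hexdigest()
    location = memo_server.get(key)
    if location is not None:
        return result_store[location]   # memoization hit: skip the computation
    result = word_count_map(chunk)
    location = f"results/{key}"         # hypothetical storage location
    result_store[location] = result
    memo_server[key] = location
    return result


first_run = [b"a b a", b"c d"]
second_run = [b"a b a", b"c d e"]       # only the second chunk changed

for chunk in first_run:
    run_map_task(chunk)
for chunk in second_run:
    run_map_task(chunk)

print(map_invocations)  # 3: two in the first run, one for the changed chunk
```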
Incremental Map Phase
Incremental Reduce phase
● More challenging than the Map Phase
● Coarse-grained memoization
  ○ Reducers copy the map outputs only if the result has not already been computed
● Fine-grained memoization
  ○ Combiners
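Coarse-grained memoization can be pictured with the sketch below, which keys a whole reduce task on the identities of its map outputs and fetches them only on a miss. The digest strings, the `store` dictionary, and the function names are assumptions made for illustration; summation stands in for an arbitrary reduce function.

```python
import hashlib

reduce_memo: dict[str, int] = {}  # hash of all input digests -> reduce result
fetches = 0                       # counts map outputs copied to the reducer


def fetch_map_output(store: dict[str, list[int]], digest: str) -> list[int]:
    """Simulate copying one map output across the network."""
    global fetches
    fetches += 1
    return store[digest]


def reduce_task(map_output_digests: list[str], store: dict[str, list[int]]) -> int:
    """Run the reducer only if this exact set of inputs was not reduced before."""
    key = hashlib.sha256("\n".join(map_output_digests).encode()).hexdigest()
    if key in reduce_memo:
        return reduce_memo[key]   # hit: nothing is fetched or recomputed
    values = [v for d in map_output_digests for v in fetch_map_output(store, d)]
    result = sum(values)
    reduce_memo[key] = result
    return result


store = {"d1": [1, 2], "d2": [3], "d3": [10]}
first = reduce_task(["d1", "d2"], store)   # computed, 2 fetches
second = reduce_task(["d1", "d2"], store)  # memoized, no new fetches
third = reduce_task(["d1", "d3"], store)   # one input changed: everything refetched

print(first, second, third, fetches)  # 6 6 13 4
```

The third call shows the limit of this granularity: a single changed input invalidates the whole reduce task, which is the gap that fine-grained memoization with combiners targets.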
What are combiners?
● A step between mappers and reducers
● Traditionally used to reduce the bandwidth between mappers and reducers
● Used in Incoop to split reduce tasks and allow for better memoization
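One way to picture how combiners split a reduce task is a tree of small combine steps, each memoized on the identity of its inputs. The sketch below is an assumption-laden illustration, not Incoop's contraction phase: it uses a fixed pairwise tree, integer addition as the combiner, and hypothetical helper names.

```python
import hashlib

combine_memo: dict[str, int] = {}  # hash of both child keys -> combined value
combine_calls = 0                  # counts combine steps that were really executed


def leaf(value: int) -> tuple[str, int]:
    """Wrap a map output as (content key, value)."""
    return hashlib.sha256(str(value).encode()).hexdigest(), value


def combine(left: tuple[str, int], right: tuple[str, int]) -> tuple[str, int]:
    """Combine two subtrees, reusing the stored result when both inputs are unchanged."""
    global combine_calls
    key = hashlib.sha256((left[0] + right[0]).encode()).hexdigest()
    if key not in combine_memo:
        combine_calls += 1
        combine_memo[key] = left[1] + right[1]
    return key, combine_memo[key]


def contraction_tree(values: list[int]) -> int:
    """Reduce values pairwise, level by level, until one result remains."""
    if not values:
        return 0
    level = [leaf(v) for v in values]
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried up unchanged
        level = nxt
    return level[0][1]


total_1 = contraction_tree([1, 2, 3, 4, 5, 6, 7, 8])
calls_after_first = combine_calls
total_2 = contraction_tree([1, 2, 3, 4, 5, 6, 7, 9])  # one input changed
calls_for_update = combine_calls - calls_after_first

print(total_1, calls_after_first)  # 36 7
print(total_2, calls_for_update)   # 37 3
```

With eight inputs, changing one of them re-executes only the three combine steps on the path from that input to the root, instead of the whole reduction.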
Incremental Reduce phase
How does Incoop differ from Hadoop?
● Incremental HDFS
● Incremental map/reduce and contraction phase
● Memoization-aware scheduler
Memoization Scheduling
● Built using memcached
● Per-node work queues to exploit data locality and memoized results
● Work stealing
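The combination of per-node queues and work stealing can be sketched as below. The `Scheduler` class, its method names, and the policy of stealing from the tail of the longest queue are assumptions chosen for illustration; the slides do not specify the exact policy.

```python
from collections import deque
from typing import Optional


class Scheduler:
    """Toy scheduler: each node has its own queue; idle nodes steal work."""

    def __init__(self, nodes: list[str]) -> None:
        self.queues: dict[str, deque] = {n: deque() for n in nodes}

    def submit(self, task: str, preferred_node: str) -> None:
        """Queue the task on the node that holds its data or memoized results."""
        self.queues[preferred_node].append(task)

    def next_task(self, node: str) -> Optional[str]:
        """Prefer local work; otherwise steal from the longest queue."""
        if self.queues[node]:
            return self.queues[node].popleft()
        victim = max(self.queues, key=lambda n: len(self.queues[n]))
        if self.queues[victim]:
            return self.queues[victim].pop()  # steal from the tail (assumed policy)
        return None


sched = Scheduler(["n1", "n2"])
for task in ["t1", "t2", "t3"]:
    sched.submit(task, "n1")   # all memoized state lives on n1

a = sched.next_task("n1")      # local work
b = sched.next_task("n2")      # n2 is idle, so it steals from n1
c = sched.next_task("n1")
d = sched.next_task("n2")      # nothing left anywhere

print(a, b, c, d)  # t1 t3 t2 None
```

Local queues keep tasks next to their memoized results, while stealing prevents a node from sitting idle when another node is overloaded.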
Results - incremental runs
Results - Scheduler
Results - Overheads
Results - Overheads
Criticisms
● Lack of comparison against other frameworks
● How were the percentage-based incremental changes generated?
● Garbage collection is fairly naïve: odd-even runtime workloads see no memoization
● How realistic are the incremental results for real-world workloads with respect to Inc-HDFS?
Questions?