Topic 6: MapReduce Applications
Cloud Computing Workshop 2013, ITU
6: MapReduce Applications
Zubair Nabi
April 18, 2013
Zubair Nabi 6: MapReduce Applications April 18, 2013 1 / 27
Outline
1 The Anatomy of a MapReduce Application
2 MapReduce Design Patterns
3 Common MapReduce Application Types
MapReduce job phases
A MapReduce job can be divided into 4 phases:
1 Input split: The input dataset is sliced into M splits, one per map task
2 Map logic: The user-supplied map function is invoked
  - In tandem, a sort phase is also applied that ensures that map output is locally sorted by key
  - In addition, the key space is also partitioned amongst the reducers
3 Shuffle: Map output is relayed to all reduce tasks
4 Reduce logic: The user-provided reduce function is invoked
  - Before the application of the reduce function, the input keys are merged to get globally sorted key/value pairs
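The four phases can be sketched as a single-process Python toy. This is only an illustration under stated assumptions: `run_job`, `partition`, `num_maps` and `num_reduces` are hypothetical names, the crc32-based partitioner is a stand-in, the splits are dealt round-robin for brevity, and a real framework runs the tasks on many machines.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, num_reduces):
    # Stand-in partitioner: a stable hash of the key modulo the reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def run_job(lines, map_fn, reduce_fn, num_maps=2, num_reduces=2):
    # Phase 1, input split: deal the numbered lines into num_maps splits.
    numbered = list(enumerate(lines))
    splits = [numbered[i::num_maps] for i in range(num_maps)]

    # Phase 2, map logic: run map_fn on every record, partition the key
    # space amongst the reducers, and sort each partition locally by key.
    map_outputs = []
    for split in splits:
        parts = [[] for _ in range(num_reduces)]
        for line_no, line in split:
            for key, value in map_fn(line_no, line):
                parts[partition(key, num_reduces)].append((key, value))
        map_outputs.append([sorted(p, key=itemgetter(0)) for p in parts])

    results = {}
    for r in range(num_reduces):
        # Phase 3, shuffle: reducer r collects partition r from every map.
        fetched = [pair for out in map_outputs for pair in out[r]]
        # Phase 4, reduce logic: bring the fetched pairs into key order, then
        # call reduce_fn once per key with all of that key's values.
        fetched.sort(key=itemgetter(0))
        for key, group in groupby(fetched, key=itemgetter(0)):
            results[key] = reduce_fn(key, [v for _, v in group])
    return results


def wc_map(line_no, line):
    return [(word, 1) for word in line.split()]


def wc_reduce(word, counts):
    return sum(counts)


counts = run_job(["a b a", "b c", "a"], wc_map, wc_reduce)
print(counts)
```

Only `wc_map` and `wc_reduce` are user logic here; everything inside `run_job` stands in for what the framework does.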
Of mappers and reducers
In the common case, programmers only need to write a map and a reduce function
The user-provided map function is invoked for every line (can be modified) in the input file and is passed the line number as key and line contents as value
The user-provided reduce function is invoked for each key output by the map phase and is passed the set of associated values as iterable values
Wordcount: High-level view
Input: A text corpus such as a Wikipedia dump, books from Gutenberg, etc.
The map function is invoked once for each text line
Map output: Words as keys and 1 as values
Reduce input: Key/value pairs of words and values (1)
The reduce function is invoked once for each word with a list of 1s
Reduce output: Words and their final counts
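To make the data at each stage concrete, here is a small trace in plain Python. The three-line corpus is made up for illustration, and the grouping step stands in for what the framework does between map and reduce.

```python
from collections import defaultdict

corpus = ["the quick fox", "the lazy dog", "the fox"]

# Map: invoked once per text line, emits (word, 1) for every word.
map_output = []
for line in corpus:
    for word in line.split():
        map_output.append((word, 1))

# Reduce input: the framework groups the 1s by word.
reduce_input = defaultdict(list)
for word, one in map_output:
    reduce_input[word].append(one)

# Reduce: invoked once per word with its list of 1s.
reduce_output = {word: sum(ones) for word, ones in reduce_input.items()}

print(map_output[:3])        # [('the', 1), ('quick', 1), ('fox', 1)]
print(reduce_input["the"])   # [1, 1, 1]
print(reduce_output["the"], reduce_output["fox"])  # 3 2
```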
Wordcount: Low-level view
A new process, called MapRunner, is created for each map task
MapRunner has a RecordReader instance that is used to read the input file
RecordReader reads the input file in chunks and parses the chunks into lines
MapRunner also has a Mapper instance with a map function, WordCountMapper in this case
For each line parsed by RecordReader, MapRunner calls WordCountMapper.map() and passes it the line
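A rough sketch of the chunk-then-parse behaviour described above, in Python. `read_records`, `chunk_size` and `RecordingMapper` are hypothetical names used only for illustration; the real RecordReader and MapRunner are framework classes whose internals are not shown in these slides.

```python
import io


def read_records(stream, chunk_size=8):
    """Yield (line_number, line) pairs from a text stream read in fixed-size chunks."""
    line_no = 0
    pending = ""  # text of a line that straddles a chunk boundary
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the rest is carried over.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line_no, line
            line_no += 1
    if pending:  # last line without a trailing newline
        yield line_no, pending


class RecordingMapper:
    """Stand-in mapper that only records the (key, value) pairs it is passed."""

    def __init__(self):
        self.seen = []

    def map(self, key, value):
        self.seen.append((key, value))


mapper = RecordingMapper()
for key, value in read_records(io.StringIO("hello world\nfoo bar baz\nlast")):
    mapper.map(key, value)  # what MapRunner does for each parsed line
print(mapper.seen)
```

The small `chunk_size` is deliberate: it forces lines to span chunk boundaries, which is the case a chunked reader has to handle.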
Wordcount: Low-level view (2)
WordCountMapper has an OutputCollector instance which maintains an in-memory buffer for each output partition (one partition per reduce)
Each time WordCountMapper.map() is invoked, it tokenizes the line into words
For each word, it writes the word as key and 1 as value to OutputCollector
OutputCollector uses the Partitioner instance to select a partition buffer for each key
Whenever the size of a partition buffer exceeds a configurable threshold, its contents are first sorted by key and then flushed to disk
This process is repeated till the map logic has been applied to all lines within the input file
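The buffering behaviour above might look roughly like the following Python sketch. It is an assumption-laden toy, not the framework's class: the threshold counts records rather than bytes, the crc32 partitioner is a stand-in for the Partitioner instance, and "disk" is modelled as in-memory lists of sorted runs (`spills`).

```python
import zlib


class OutputCollector:
    def __init__(self, num_reduces, spill_threshold=3):
        self.num_reduces = num_reduces
        self.spill_threshold = spill_threshold  # hypothetical, in records
        # One in-memory buffer per output partition (one partition per reduce).
        self.buffers = [[] for _ in range(num_reduces)]
        # Stand-in for spill files: a list of sorted runs per partition.
        self.spills = [[] for _ in range(num_reduces)]

    def partition(self, key):
        # Stand-in partitioner: stable hash of the key modulo reducer count.
        return zlib.crc32(key.encode("utf-8")) % self.num_reduces

    def collect(self, key, value):
        p = self.partition(key)
        self.buffers[p].append((key, value))
        if len(self.buffers[p]) > self.spill_threshold:
            self.flush(p)

    def flush(self, p):
        # Sort the buffer by key, then "write" it out as one run.
        if self.buffers[p]:
            self.spills[p].append(sorted(self.buffers[p]))
            self.buffers[p] = []

    def close(self):
        # At the end of the map task, flush whatever is still buffered.
        for p in range(self.num_reduces):
            self.flush(p)


collector = OutputCollector(num_reduces=2, spill_threshold=2)
for line in ["a b a", "b c a", "c c c"]:
    for word in line.split():  # the tokenizing done by WordCountMapper.map()
        collector.collect(word, 1)
collector.close()
print(collector.spills)
```

Because the partition is chosen from the key alone, every occurrence of a word lands in the same partition, which is what lets a single reduce task see all of its values later.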
Wordcount: Low-level view (3)
Once all maps have completed their execution, the reduce phase is started
For each reduce task, a ReduceRunner process is created
Each reduce task fetches its input partitions from machines on which map tasks were run
All input partitions are then merged to get a globally sorted partition of key/value pairs
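Since every fetched partition is already sorted by key on the map side, the merge step can be a k-way merge rather than a full sort. A minimal Python sketch, assuming three map tasks with made-up partition contents:

```python
import heapq
from operator import itemgetter

# Sorted partitions fetched from three map tasks (hypothetical contents).
fetched = [
    [("apple", 1), ("cat", 1), ("cat", 1)],
    [("apple", 1), ("bird", 1)],
    [("bird", 1), ("cat", 1), ("dog", 1)],
]

# k-way merge: because every input run is already sorted by key,
# heapq.merge yields the pairs in key order without re-sorting them.
merged = list(heapq.merge(*fetched, key=itemgetter(0)))
print([key for key, _ in merged])
# ['apple', 'apple', 'bird', 'bird', 'cat', 'cat', 'cat', 'dog']
```

After the merge, all values for a given word sit next to each other, so the reduce function can be fed one word at a time.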
Wordcount: Low-level view (4)
ReduceRunner contains a Reducer instance with a reduce function, WordCountReducer in this case
For each word, ReduceRunner invokes WordCountReducer.reduce() and passes it the word and a list of its values (1s)
WordCountReducer also has an OutputCollector instance with an in-memory buffer
WordCountReducer.reduce() sums the list of values it is passed and writes the word and its final count to the OutputCollector
This process is repeated till the reduce logic has been applied to all key/value pairs
At the end of the entire job, each reduce produces an output file with words and their number of occurrences
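The reduce-side loop can be sketched in Python as follows. The names `run_reduce` and `out_path`, the tab-separated output format and the file name `part-00000` are assumptions made for this illustration; a plain list stands in for the OutputCollector buffer.

```python
import os
import tempfile
from itertools import groupby
from operator import itemgetter


class WordCountReducer:
    def __init__(self):
        self.output = []  # in-memory buffer standing in for OutputCollector

    def reduce(self, key, values):
        # Sum the 1s for this word and record its final count.
        self.output.append((key, sum(values)))


def run_reduce(sorted_pairs, reducer, out_path):
    # One reduce() call per distinct key, passing that key's values.
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        reducer.reduce(key, (value for _, value in group))
    # Flush the buffer to this reduce task's output file.
    with open(out_path, "w", encoding="utf-8") as f:
        for word, count in reducer.output:
            f.write(f"{word}\t{count}\n")


merged = [("apple", 1), ("apple", 1), ("bird", 1),
          ("cat", 1), ("cat", 1), ("cat", 1)]
out_path = os.path.join(tempfile.mkdtemp(), "part-00000")
run_reduce(merged, WordCountReducer(), out_path)
with open(out_path, encoding="utf-8") as f:
    contents = f.read()
print(contents)
```

`groupby` only works here because the input is already sorted by key, which is exactly what the merge in the previous step guarantees.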
Wordcount map in Java
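// Assumed fields of the enclosing Mapper subclass (as in the Hadoop WordCount tutorial)
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(Object key, Text value, Context context)
    throws IOException, InterruptedException {
  // Tokenize the line on whitespace and emit (word, 1) for every token
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
  }
}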
Wordcount reduce in Java
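// Assumed field of the enclosing Reducer subclass
private IntWritable result = new IntWritable();

public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  // Sum the 1s emitted for this word
  int sum = 0;
  for (IntWritable val : values) {
    sum += val.get();
  }
  // Emit (word, final count)
  result.set(sum);
  context.write(key, result);
}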
Wordcount map in Python
def map(self, key, value):
    # Emit (word, 1) for every whitespace-separated word in the line
    for word in value.split():
        self._output_collector.collect(word, 1)
Wordcount reduce in Python
def reduce(self, key, values):
    # Sum the 1s emitted for this word and emit its final count
    total = 0
    for value in values:
        total += value
    self._output_collector.collect(key, total)
2 MapReduce Design Patterns
Bird’s-eye view
The MapReduce paradigm is amenable to divide-and-conquer algorithms
One way to look at MapReduce is that it is just a large-scale sorting platform
User logic is only involved at specific hook points
Algorithms must be expressed in terms of a small number of specific components that fit together in preset ways
  - Like putting together a jigsaw puzzle in which all the other pieces have already been assembled and you only need to add two pieces: the map and the reduce pieces
Fortunately, a large number of algorithms easily fit this rigid pattern
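One way to see the "large-scale sorting platform" view is that sorting itself needs no real user logic: with an identity map and an identity reduce, the framework's own partition and sort steps do all the work. The Python sketch below assumes a range partitioner with hand-picked `boundaries`, which is not something these slides describe; all function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def identity_map(key, value):
    yield value, None  # the record itself becomes the key


def identity_reduce(key, values):
    for _ in values:
        yield key  # emit the key once per occurrence


def range_partition(key, boundaries):
    # Partition i holds keys below boundaries[i]; the last one holds the rest.
    for i, bound in enumerate(boundaries):
        if key < bound:
            return i
    return len(boundaries)


def mapreduce_sort(records, boundaries):
    parts = [[] for _ in range(len(boundaries) + 1)]
    for n, record in enumerate(records):
        for key, value in identity_map(n, record):
            parts[range_partition(key, boundaries)].append((key, value))
    output = []
    for part in parts:  # one reduce task per partition, in partition order
        part.sort(key=itemgetter(0))  # the framework's sort, not user logic
        for key, group in groupby(part, key=itemgetter(0)):
            output.extend(identity_reduce(key, [v for _, v in group]))
    return output


words = ["pear", "apple", "zebra", "mango", "apple"]
print(mapreduce_sort(words, boundaries=["h", "q"]))
# ['apple', 'apple', 'mango', 'pear', 'zebra']
```

The two user-supplied pieces do nothing interesting here, which is the point: the ordering comes entirely from the pre-assembled parts of the puzzle.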
Programmer control
The programmer has no control over
1 The location of a map or reduce task in terms of nodes in the cluster
2 The start and end time of a map or a reduce task
3 The input key/value pairs processed by a specific map task
4 The intermediate key/value pairs processed by a specific reduce task
Programmer control (2)
The programmer does have control over
1 The data structures to be used as keys and values
2 Initialization code at the beginning of map/reduce tasks and termination code at the end
3 Preservation of state across multiple invocations of the map/reduce function within a task
4 The sort order of intermediate keys and, in turn, the order in which a reducer encounters keys
5 Partitioning of the key space and, in turn, the set of keys that a particular reducer encounters
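Points 4 and 5 can be sketched as two small user-supplied functions. This is a minimal illustration, assuming composite keys of the form (word, year); `partition`, `sort_key`, and `shuffle` are hypothetical names, not the API of any particular framework.

```python
import zlib


def partition(key, num_reducers):
    """Custom partitioner: route by word only, so every year of a word
    reaches the same reducer."""
    word, _year = key
    # crc32 gives a hash that is stable across runs (Python's built-in
    # hash() for strings is salted per process).
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def sort_key(key):
    """Custom sort order: by word, then by year in descending order."""
    word, year = key
    return (word, -year)


def shuffle(pairs, num_reducers):
    """Toy shuffle: apply the partitioner, then sort each reducer's input."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket, key=lambda kv: sort_key(kv[0])) for bucket in buckets]


# Hypothetical intermediate pairs: ((word, year), count).
pairs = [
    (("hadoop", 2011), 4),
    (("hadoop", 2013), 9),
    (("pig", 2012), 2),
    (("hadoop", 2012), 7),
]
reducer_inputs = shuffle(pairs, num_reducers=2)
for reducer_id, bucket in enumerate(reducer_inputs):
    print(reducer_id, bucket)
```

The partitioner decides which reducer sees a key; the sort key decides the order in which that reducer sees it. Here the reducer that handles "hadoop" receives the most recent year first.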
Multi-job algorithms
Many algorithms cannot be easily expressed as a single MapReduce job
Complex algorithms need to be decomposed into a sequence of jobs
  - The output of one job becomes the input to the next
Most iterative algorithms need to be run by an external driver program that performs the convergence check
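A driver loop of this kind can be sketched as follows. The example is an assumption chosen for illustration: breadth-first shortest paths on an unweighted graph, where each iteration is one job and the driver stops when a job leaves the state unchanged. `run_job`, `shortest_paths`, and the graph are hypothetical.

```python
from collections import defaultdict

INF = float("inf")


def run_job(records, map_fn, reduce_fn):
    """Toy in-memory job: group map output by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(pair for key in sorted(groups) for pair in reduce_fn(key, groups[key]))


def bfs_map(node, state):
    dist, neighbours = state
    yield node, ("graph", neighbours)  # pass the graph structure along
    yield node, ("dist", dist)         # keep the node's current distance
    if dist != INF:
        for neighbour in neighbours:
            yield neighbour, ("dist", dist + 1)


def bfs_reduce(node, values):
    neighbours, best = [], INF
    for tag, payload in values:
        if tag == "graph":
            neighbours = payload
        else:
            best = min(best, payload)
    yield node, (best, neighbours)


def shortest_paths(graph, source, max_iterations=50):
    """External driver: chain jobs and perform the convergence check."""
    state = {n: (0 if n == source else INF, nbrs) for n, nbrs in graph.items()}
    for iteration in range(1, max_iterations + 1):
        new_state = run_job(state, bfs_map, bfs_reduce)  # output feeds the next job
        if new_state == state:  # converged: nothing changed in this job
            return new_state, iteration
        state = new_state
    return state, max_iterations


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
final_state, iterations = shortest_paths(graph, "a")
distances = {node: dist for node, (dist, _) in final_state.items()}
print(distances, iterations)
```

The convergence test lives entirely in the driver; the map and reduce functions know nothing about iteration.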
Local aggregation
Network and disk latencies are expensive compared to other operations
Decreasing the amount of data transferred over the network during the shuffle phase improves efficiency
Aggressive use of combiners for commutative and associative operations can greatly reduce intermediate data
Another strategy, dubbed “in-mapper combining”, can decrease not only the amount of intermediate data but also the number of key/value pairs emitted by the map tasks
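The difference between a plain mapper and in-mapper combining can be sketched for term counting. This is a minimal illustration, assuming a task lifecycle of setup, repeated map calls, and cleanup; the class and function names are hypothetical.

```python
from collections import Counter


def plain_map(split):
    """Plain mapper: emit one (term, 1) pair per token."""
    return [(term, 1) for line in split for term in line.split()]


class InMapperCombiningMapper:
    """Keeps per-task state across map() calls and emits only in cleanup()."""

    def setup(self):
        self.counts = Counter()

    def map(self, line):
        # Aggregate locally instead of emitting a pair for every token.
        self.counts.update(line.split())

    def cleanup(self):
        # Emit one pair per distinct term seen by this task.
        return sorted(self.counts.items())


def run_map_task(mapper, split):
    mapper.setup()
    for line in split:
        mapper.map(line)
    return mapper.cleanup()


split = ["to be or not to be", "to see or not to see"]
plain_pairs = plain_map(split)
combined_pairs = run_map_task(InMapperCombiningMapper(), split)
print(len(plain_pairs), len(combined_pairs))
```

The trade-off is memory: the mapper must hold its partial results until cleanup, so a real implementation would typically flush once the table grows past some size.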
Outline
1 The Anatomy of a MapReduce Application
2 MapReduce Design Patterns
3 Common MapReduce Application Types
Counting and Summing
1 Problem
  - A number of documents with a set of terms
  - Need to calculate the number of occurrences of each term (word count) or some arbitrary function over the terms (average response time in log files)
2 Solution
  - Map: For each term, emit the term and “1”
  - Reduce: Take the sum (or any other operation) of each term's values
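The word-count case can be sketched as below. `run_job` is a hypothetical in-memory stand-in for the framework, and the three documents are made-up input.

```python
from collections import defaultdict


def word_count_map(doc_id, text):
    # Emit the term and 1 for every occurrence.
    for term in text.lower().split():
        yield term, 1


def word_count_reduce(term, ones):
    # Sum all the 1s emitted for this term.
    yield term, sum(ones)


def run_job(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory runner: group map output by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return [pair for key in sorted(groups) for pair in reduce_fn(key, groups[key])]


documents = [("d1", "the quick brown fox"), ("d2", "the lazy dog"), ("d3", "The fox")]
counts = dict(run_job(documents, word_count_map, word_count_reduce))
print(counts)
```

For an average rather than a sum, the reducer would divide a total by a count. Because a mean of means is not the overall mean, any combiner would need to carry (sum, count) pairs rather than partial averages.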
Collating
1 Problem
  - A set of items (for example, the terms in a number of documents) and some function of one item
  - Need to group all items that have the same value of the function, either to store the items together or to perform some computation over them
2 Solution
  - Map: For each item, compute the given function and emit the function value as key and the item as value
  - Reduce: Either save all grouped items or perform further computation
  - Example: Inverted index: items are (word, document ID) occurrences, the function returns the word, and each reducer receives the document IDs for one word
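An inverted index built with this pattern might look like the sketch below, assuming the word is the grouping key and the document ID is the grouped value. `run_job` is a hypothetical in-memory runner and the documents are made up.

```python
from collections import defaultdict


def index_map(doc_id, text):
    # Emit each distinct word once per document, keyed by the word.
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(word, doc_ids):
    # All document IDs for one word arrive together: store them as a posting list.
    yield word, sorted(doc_ids)


def run_job(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory runner: group map output by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return [pair for key in sorted(groups) for pair in reduce_fn(key, groups[key])]


documents = [("d1", "map reduce"), ("d2", "reduce shuffle"), ("d3", "map shuffle sort")]
index = dict(run_job(documents, index_map, index_reduce))
print(index)
```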
Filtering, Parsing, and Validation
1 Problem
  - A set of records
  - Need to collect all records that meet some condition or transform each record into another representation
2 Solution
  - Map: For each record, emit it if it passes the condition, or emit its transformed version
  - Reduce: Identity
  - Example: Text parsing or transformation such as word capitalization
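A combined parse, validate, and filter mapper can be sketched as follows. The "name,age" record layout and the age threshold are assumptions made up for this example.

```python
def parse_and_filter_map(line):
    """Emit a cleaned (name, age) record, or nothing if the line is rejected."""
    parts = [field.strip() for field in line.split(",")]
    if len(parts) != 2 or not parts[1].isdigit():
        return  # validation failed: drop the record
    name, age = parts[0].title(), int(parts[1])  # transformation: capitalize the name
    if age >= 18:  # filter condition (hypothetical)
        yield name, age


def identity_reduce(key, values):
    # Pass every record through unchanged.
    for value in values:
        yield key, value


lines = ["alice, 34", "bob,17", "broken line", "carol ,52", "dave,abc"]
kept = []
for line in lines:
    for key, value in parse_and_filter_map(line):
        kept.extend(identity_reduce(key, [value]))
print(kept)
```

Since the reducer does no work, jobs of this shape are commonly run as map-only jobs (in Hadoop, by setting the number of reduce tasks to zero), which skips the shuffle entirely.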
Distributed Task Execution
1 Problem
  - Large computational problem
  - Need to divide it into multiple parts and combine results from all parts to obtain a final result
2 Solution
  - Map: Perform the corresponding computation
  - Reduce: Combine all emitted results into a final one
  - Example: RGB histogram calculation of bitmap images
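The RGB histogram example can be sketched as below, assuming each image is a list of (r, g, b) pixel tuples and that intensities are grouped into a small number of bins to keep the output short. `BINS` and all names are hypothetical.

```python
from collections import Counter, defaultdict

BINS = 4  # hypothetical: 4 intensity buckets per channel, each 64 values wide


def histogram_map(image_id, pixels):
    """Compute a partial histogram for one image and emit it."""
    partial = Counter()
    for r, g, b in pixels:
        partial[("R", r * BINS // 256)] += 1
        partial[("G", g * BINS // 256)] += 1
        partial[("B", b * BINS // 256)] += 1
    yield from partial.items()


def histogram_reduce(channel_bin, partial_counts):
    """Combine the partial counts from all images into the final count."""
    yield channel_bin, sum(partial_counts)


images = {
    "img1": [(255, 0, 0), (250, 10, 5)],
    "img2": [(0, 0, 255), (130, 130, 130)],
}

# Toy stand-in for the shuffle: group partial counts by (channel, bin).
groups = defaultdict(list)
for image_id, pixels in images.items():
    for key, count in histogram_map(image_id, pixels):
        groups[key].append(count)

histogram = dict(
    pair for key in sorted(groups) for pair in histogram_reduce(key, groups[key])
)
print(histogram)
```

Each map call does the expensive per-image work independently; the reducer only adds up small partial results.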
Sorting
1 Problem
  - A set of records
  - Need to sort records in some order
2 Solution
  - Map: Identity
  - Reduce: Identity
  - Also possible to sort by value: either perform a secondary sort or a value-to-key conversion (move the value into the key so that the framework sorts on it)
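Sorting by value through a value-to-key conversion can be sketched as follows. The in-memory `sort()` call stands in for the framework's shuffle and sort; the function names and the word counts are hypothetical.

```python
def swap_map(key, value):
    # Value-to-key conversion: move the value into the key so that the
    # framework's sort orders records by it. The old key is kept as a tie-breaker.
    yield (value, key), None


def identity_reduce(key, values):
    for value in values:
        yield key, value


def run_sort_job(records):
    intermediate = [pair for k, v in records for pair in swap_map(k, v)]
    intermediate.sort(key=lambda kv: kv[0])  # stands in for the shuffle/sort phase
    output = []
    for key, value in intermediate:
        output.extend(identity_reduce(key, [value]))
    # Unpack the composite key back into (term, count) records.
    return [(term, count) for (count, term), _ in output]


word_counts = [("fox", 2), ("the", 3), ("dog", 1), ("cat", 2)]
by_count = run_sort_job(word_counts)
print(by_count)
```

With more than one reducer, each reducer's output is sorted but the concatenation is only globally sorted if the partitioner assigns contiguous key ranges to reducers (Hadoop ships a TotalOrderPartitioner for this purpose).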
References
1 Jimmy Lin and Chris Dyer. 2010. Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers.
2 MapReduce Patterns, Algorithms, and Use Cases: http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/