Map-Reduce and Hadoop
Introduction to Map-Reduce
Map Reduce operations
• Input data are (key, value) pairs
• 2 operations available: map and reduce
• Map
  • Takes a (key, value) pair and generates other (key, value) pairs
• Reduce
  • Takes a key and all associated values
  • Generates (key, value) pairs
• A map-reduce algorithm requires a mapper and a reducer
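The two operations can be sketched in plain Java, without Hadoop. This is only an in-memory illustration of map, group by key, then reduce; the class `MiniMapReduce`, its `run` and `demo` methods and the sample data are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// In-memory sketch of map -> group by key -> reduce. Not Hadoop code.
final class MiniMapReduce {

    static <K1, V1, K2 extends Comparable<K2>, V2, V3> Map<K2, V3> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, V3> reducer) {
        // Map phase: every input pair may produce any number of output pairs.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> in : input) {
            for (Map.Entry<K2, V2> out : mapper.apply(in.getKey(), in.getValue())) {
                groups.computeIfAbsent(out.getKey(), k -> new ArrayList<>()).add(out.getValue());
            }
        }
        // Reduce phase: one call per key, with all the values emitted for that key.
        Map<K2, V3> result = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> g : groups.entrySet()) {
            result.put(g.getKey(), reducer.apply(g.getKey(), g.getValue()));
        }
        return result;
    }

    // Hypothetical example: count how many values were seen for each key.
    static Map<String, Integer> demo() {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        BiFunction<String, Integer, List<Map.Entry<String, Integer>>> mapper =
                (key, value) -> List.of(Map.entry(key, 1));
        BiFunction<String, List<Integer>, Integer> reducer =
                (key, ones) -> ones.size();
        return run(input, mapper, reducer);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {a=2, b=1}
    }
}
```

The grouping step in the middle is what a real framework does between the mapper and the reducer; the user only writes the two functions.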
Map Reduce example
• Compute the average grade of students
  • For each course, the professor provides us with a text file
  • Text file format: lines of “student grade”
• Algorithm (non map-reduce)
  • For each student, collect all grades and compute the average
• Algorithm (map-reduce)
  • Mapper
    • Assume the input file is parsed as (student, grade) pairs
    • So … do nothing!
  • Reducer
    • Compute the average of all values for a given key
Map Reduce example
Course 1: Fabrice 20, Brian 10, Paul 15
Course 2: Fabrice 15, Brian 20, Paul 10
Course 3: Fabrice 10, Brian 15, Paul 20
Map: (Fabrice, 20) (Brian, 10) (Paul, 15) (Fabrice, 15) (Brian, 20) (Paul, 10) (Fabrice, 10) (Brian, 15) (Paul, 20)
Grouped by key: (Fabrice, [20, 15, 10]) (Brian, [10, 20, 15]) (Paul, [15, 10, 20])
Reduce: (Fabrice, 15) (Brian, 15) (Paul, 15)
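The average-grade example can be reproduced with a small plain-Java sketch. It does not use Hadoop: the class `AverageGrades` and its `map`, `reduce` and `run` methods are hypothetical names that only mimic the roles of the mapper, the grouping step and the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the average-grade job (no Hadoop involved).
final class AverageGrades {

    // Mapper: the input is already (student, grade), so it is passed through unchanged.
    static Map.Entry<String, Integer> map(String student, int grade) {
        return Map.entry(student, grade);
    }

    // Reducer: average of all the grades received for one student.
    static double reduce(String student, List<Integer> grades) {
        double sum = 0;
        for (int grade : grades) {
            sum += grade;
        }
        return sum / grades.size();
    }

    // Driver: map every record, group by student, reduce each group.
    static Map<String, Double> run(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            Map.Entry<String, Integer> out = map(r.getKey(), r.getValue());
            grouped.computeIfAbsent(out.getKey(), k -> new ArrayList<>()).add(out.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        grouped.forEach((student, grades) -> averages.put(student, reduce(student, grades)));
        return averages;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("Fabrice", 20), Map.entry("Brian", 10), Map.entry("Paul", 15),  // course 1
                Map.entry("Fabrice", 15), Map.entry("Brian", 20), Map.entry("Paul", 10),  // course 2
                Map.entry("Fabrice", 10), Map.entry("Brian", 15), Map.entry("Paul", 20)); // course 3
        System.out.println(run(records)); // {Brian=15.0, Fabrice=15.0, Paul=15.0}
    }
}
```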
Map Reduce example… too easy
• Ok, this was easy because
  • We didn’t care about technical details like reading inputs
  • All courses count equally, no weighted average
• Now can we do something more complicated?
• Let’s compute a weighted average
  • Course 1 has weight 5
  • Course 2 has weight 2
  • Course 3 has weight 3
• What is the problem now?
Map Reduce example
Same course files and map output as in the previous example. Grouped by key, the reducer receives:
(Fabrice, [20, 15, 10]) (Brian, [10, 20, 15]) (Paul, [15, 10, 20])
• The reducer should be able to discriminate between values, but nothing in these lists says which course a grade comes from
Map Reduce example - advanced
• How to discriminate between values for a given key?
  • We can’t … unless the values look different
• New reducer
  • Input: (Name, [course1_Grade1, course2_Grade2, course3_Grade3])
  • Strip the course indication from each value and compute the weighted average
• So, we need to change the input of the reducer, which comes from… the mapper
• New mapper
  • Input: (Name, Grade)
  • Output: (Name, courseName_Grade)
  • The mapper needs to know which input file (course) it is reading
Map Reduce example - 2
Course 1 (weight 5): Fabrice 20, Brian 10, Paul 15
Course 2 (weight 2): Fabrice 15, Brian 20, Paul 10
Course 3 (weight 3): Fabrice 10, Brian 15, Paul 20
Map: (Fabrice, C1_20) (Brian, C1_10) (Paul, C1_15) (Fabrice, C2_15) (Brian, C2_20) (Paul, C2_10) (Fabrice, C3_10) (Brian, C3_15) (Paul, C3_20)
Grouped by key: (Fabrice, [C1_20, C2_15, C3_10]) (Brian, [C1_10, C2_20, C3_15]) (Paul, [C1_15, C2_10, C3_20])
Reduce: (Fabrice, 16) (Brian, 13.5) (Paul, 15.5)
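A plain-Java sketch of the tagging mapper and the weighted reducer, assuming the tags C1, C2, C3 and the weights 5, 2, 3 given earlier. The class `WeightedAverage` and its method names are hypothetical, and no Hadoop class is used.

```java
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the weighted-average mapper and reducer (no Hadoop involved).
final class WeightedAverage {

    // Course weights: course 1 -> 5, course 2 -> 2, course 3 -> 3.
    static final Map<String, Integer> WEIGHTS = Map.of("C1", 5, "C2", 2, "C3", 3);

    // Mapper: it knows which course file it reads and tags the grade with it.
    static Map.Entry<String, String> map(String course, String student, int grade) {
        return Map.entry(student, course + "_" + grade);
    }

    // Reducer: strip the course tag from each value and compute the weighted average.
    // An unknown course tag would fail here; a real job would need to handle it.
    static double reduce(String student, List<String> taggedGrades) {
        double weightedSum = 0;
        int totalWeight = 0;
        for (String tagged : taggedGrades) {
            String[] parts = tagged.split("_", 2);
            int weight = WEIGHTS.get(parts[0]);
            weightedSum += weight * Integer.parseInt(parts[1]);
            totalWeight += weight;
        }
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        List<String> fabrice = List.of(
                map("C1", "Fabrice", 20).getValue(),
                map("C2", "Fabrice", 15).getValue(),
                map("C3", "Fabrice", 10).getValue());
        // (5*20 + 2*15 + 3*10) / (5 + 2 + 3)
        System.out.println(reduce("Fabrice", fabrice)); // 16.0
    }
}
```

Encoding the course inside the value string keeps the key unchanged, so grouping by student still works; the price is that the reducer has to parse each value.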
Introduction to Hadoop
F. Huet, Oasis Seminar, 07/07/2010
What is Hadoop?
• A set of software projects developed at Apache for distributed computing
• Many different projects
  • MapReduce
  • HDFS: Hadoop Distributed File System
  • HBase: distributed database
  • ….
• Written in Java
• Can be deployed on any cluster easily
Hadoop Job
• A Hadoop job is composed of a map operation and (possibly) a reduce operation
• Map and reduce operations are implemented in a Mapper subclass and a Reducer subclass
• Hadoop will start many instances of Mapper and Reducer
  • The number of instances is decided at runtime but can be specified
• Each Mapper instance works on a subset of the input records called a split; each Reducer instance works on a subset of the keys
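How keys end up on a given Reducer instance can be sketched as follows. The rule below mirrors what Hadoop's default hash partitioning does as far as I know (hash code made non-negative, then modulo the number of reducers); the class `KeyPartitioning` and the choice of 3 reducers are assumptions made for this illustration.

```java
import java.util.List;

// Sketch of hash partitioning: which reducer instance receives a given key.
final class KeyPartitioning {

    // Clears the sign bit so the result is never negative, then takes the modulo.
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3; // assumed value for the example
        for (String student : List.of("Fabrice", "Brian", "Paul")) {
            System.out.println(student + " -> reducer " + partition(student, numReducers));
        }
    }
}
```

Because the rule depends only on the key, every value for a given key reaches the same reducer, which is what the reduce operation requires.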
Map-Reduce workflow
Source: Hadoop: The Definitive Guide
Mapper
• Extend the default class Mapper<K1, V1, K2, V2>
  • K1, V1: types of the input (key, value)
  • K2, V2: types of the output (key, value)
• Override public void map(K1 key, V1 value, Context context) throws IOException, InterruptedException
• Output pairs are emitted using context.write
Reducer
• Extend the default class Reducer<K1, V1, K2, V2>
  • K1, V1: types of the input (key, [values])
  • K2, V2: types of the output (key, value)
• Override public void reduce(K1 key, Iterable<V1> values, Context context) throws IOException, InterruptedException
  • values is an Iterable over all the V1 values of the key
• Output pairs are emitted using context.write
Input/Output
• Hadoop abstracts data format and I/O away from the map/reduce process
• InputFormat
  • Validates the input data format (user specified)
  • Splits the input file into splits
  • Provides a RecordReader to read records from the splits
  • Default: TextInputFormat to read text files (the key is the offset of the line, the value is the line)
• OutputFormat
  • Validates the output data format
  • Provides a RecordWriter to write records to the file system
  • Default: TextOutputFormat to write plain text files
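To make the default input records concrete, here is a plain-Java sketch of the (offset, line) pairs a mapper would see for a small text file. It only imitates the behaviour described for TextInputFormat; the class `TextRecords` is hypothetical, and the sketch assumes UTF-8 text with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of text-file records: key = byte offset of the line start, value = the line.
final class TextRecords {

    static Map<Long, String> records(String text) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.put(offset, line);
            // Advance past the line and its newline character.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("Fabrice 20\nBrian 10\nPaul 15\n"));
        // {0=Fabrice 20, 11=Brian 10, 20=Paul 15}
    }
}
```

With this input format, the mapper of the grade example would ignore the offset key and parse "student grade" out of the line value.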
Hadoop Job example
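// Classes come from org.apache.hadoop.conf, org.apache.hadoop.fs, org.apache.hadoop.io,
// org.apache.hadoop.mapreduce and its lib.input / lib.output subpackages.
// SingleTextOutputFormat and MapSingleSortedFile are not part of Hadoop; presumably they are classes written for the demo.
Configuration config = new Configuration();
Job job = new Job(config, "filesplitTest");
// Input: plain text, one record per line
job.setInputFormatClass(TextInputFormat.class);
// Types of the (key, value) pairs written by the job
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(SingleTextOutputFormat.class);
Path outputDir = new Path(output);
Path inputPath = new Path(input);
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputDir);
job.setMapperClass(MapSingleSortedFile.class);
// The base Reducer class passes every (key, value) pair through unchanged
job.setReducerClass(Reducer.class);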
HDFS
• Hadoop Distributed File System
• Aggregates the local storage of the cluster nodes
• Used by Hadoop workers to read input, store temporary data and final output
• Can be accessed using the CLI
  • $> hadoop fs -command
  • put: copy a local file to HDFS
  • get: copy an HDFS file to a local directory
• Suitable for large files
  • 64 MB blocks
Demo
Scenario
• Input: a text file made of RDF data (subject, predicate, object)
• Output: 3 “files” containing the input data sorted by subject, predicate or object
• Hadoop cluster
  • eon 2-4 with HDFS
  • Only need Hadoop conf files to use this cluster
• Monitor computation using the web interface on eon2
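The expected output of the scenario can be illustrated in plain Java on a few made-up triples. This is not the demo code: the class `TripleSort`, its `SAMPLE` data and the `sortBy` method are hypothetical. In a Hadoop job, one plausible approach (an assumption, not stated in the slides) would be for the mapper to emit each triple under the field to sort by and let the framework's sort by key do the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the scenario's output: the same triples sorted by each of their three fields.
final class TripleSort {

    // Made-up triples, each stored as {subject, predicate, object}.
    static final List<String[]> SAMPLE = List.of(
            new String[] {"s2", "p1", "o3"},
            new String[] {"s1", "p3", "o2"},
            new String[] {"s3", "p2", "o1"});

    // Returns a sorted copy; field is 0 (subject), 1 (predicate) or 2 (object).
    static List<String[]> sortBy(List<String[]> triples, int field) {
        List<String[]> sorted = new ArrayList<>(triples);
        sorted.sort(Comparator.comparing((String[] t) -> t[field]));
        return sorted;
    }

    public static void main(String[] args) {
        String[] names = {"subject", "predicate", "object"};
        for (int field = 0; field < 3; field++) {
            System.out.println("Sorted by " + names[field] + ":");
            for (String[] t : sortBy(SAMPLE, field)) {
                System.out.println("  " + String.join(" ", t));
            }
        }
    }
}
```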