Hadoop, HDFS, MapReduce and Pig
DESCRIPTION
Open presentation and training material, presented at the CSIRO Big Data 2.0 workshop in September 2013, North Ryde, Australia, and accompanied by hands-on examples.
TRANSCRIPT
![Page 1: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/1.jpg)
![Page 2: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/2.jpg)
![Page 3: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/3.jpg)
![Page 4: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/4.jpg)
![Page 5: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/5.jpg)
![Page 6: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/6.jpg)
![Page 7: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/7.jpg)
![Page 8: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/8.jpg)
![Page 9: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/9.jpg)
$ hadoop fs
![Page 10: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/10.jpg)
hadoop fs
![Page 11: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/11.jpg)
$ hadoop fs
● ls
$ hadoop fs -help ls
![Page 12: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/12.jpg)
$ hadoop fs -ls <path>
$ hadoop fs -ls /
$ hadoop fs -ls
$ hadoop fs -ls /user/cloudera
![Page 13: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/13.jpg)
$ hadoop fs -mkdir data
$ hadoop fs -ls
$ cd ~/bigdata/Exercises/hadoop/data
$ ls -l
$ hadoop fs -put mammograms.zip data
![Page 14: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/14.jpg)
● http://localhost:50070
● fsck: an HDFS utility
$ hadoop fsck /user/cloudera/data/mammograms.zip \
    -blocks -locations -files
$ head -n 100 ato_centenary.txt \
    | hadoop fs -put - data/ato100.txt
![Page 15: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/15.jpg)
$ head -n 1000 ato_centenary.txt \
    | hadoop fs -put - data/ato100.txt
put: ‘data/ato100.txt': File exists
$ hadoop fs -rm data/ato100.txt
$ head -n 1000 ato_centenary.txt \
    | hadoop fs -put - data/ato100.txt
![Page 16: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/16.jpg)
$ hadoop fs -cat data/ato100.txt | less
$ hadoop fs -get data/ato100.txt ato100.txt
-mv, -cp, -rmdir, -stat ...
![Page 17: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/17.jpg)
![Page 18: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/18.jpg)
![Page 19: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/19.jpg)
![Page 20: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/20.jpg)
![Page 21: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/21.jpg)
![Page 22: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/22.jpg)
$ javac -classpath `hadoop classpath` *.java
$ jar cvf csiro.jar *.class
$ hadoop jar csiro.jar Csiro input_dir output_dir
![Page 23: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/23.jpg)
map(in_key, in_value) -> (inter_key, inter_value) list
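To make the signature concrete, here is a minimal single-JVM sketch in plain Java (16 or later, for the `record`) of the data flow it implies: each map call returns a list of intermediate pairs, the pairs are grouped by key (the shuffle), and reduce is called once per distinct key. The class `MiniMapReduce` and its `run` method are hypothetical illustrations, not part of Hadoop's API; Hadoop runs the same phases distributed across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical single-process illustration of the MapReduce data flow.
// NOT Hadoop's API: Hadoop distributes map, shuffle and reduce across nodes.
public class MiniMapReduce {

    /** A key/value pair, like the (key, value) tuples on the slides. */
    public record Pair<K, V>(K key, V value) {}

    public static <K1, V1, K2 extends Comparable<K2>, V2, V3> List<Pair<K2, V3>> run(
            List<Pair<K1, V1>> input,
            BiFunction<K1, V1, List<Pair<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, V3> reducer) {
        // Map phase plus shuffle: collect every intermediate value under its key.
        // TreeMap keeps keys sorted, mimicking the sorted input a reducer receives.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Pair<K1, V1> in : input) {
            for (Pair<K2, V2> inter : mapper.apply(in.key(), in.value())) {
                groups.computeIfAbsent(inter.key(), k -> new ArrayList<>()).add(inter.value());
            }
        }
        // Reduce phase: one reducer call per distinct intermediate key.
        List<Pair<K2, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.add(new Pair<>(group.getKey(), reducer.apply(group.getKey(), group.getValue())));
        }
        return output;
    }
}
```

For example, a mapper that emits (word, 1) for each word and a reducer that counts its values turns the input ("doc1", "a b a") into [(a, 2), (b, 1)].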
![Page 24: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/24.jpg)
![Page 25: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/25.jpg)
let map(key, value) =
    emit(key.toUpper(), value.toUpper())
('csiro', 'cci') -> ('CSIRO', 'CCI')
('csiro', 'cesre') -> ('CSIRO', 'CESRE')
('csiro', 'cmse') -> ('CSIRO', 'CMSE')
('toyota', 'yaris') -> ('TOYOTA', 'YARIS')
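A sketch of the same map function in plain Java, handling one input pair per call. `UpperCaseMap` is a hypothetical name and this is not a Hadoop `Mapper`; Java 9 or later is assumed for `List.of` and `Map.entry`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the "upper-case" map exercise (not a Hadoop Mapper).
public class UpperCaseMap {

    // Emits exactly one intermediate pair per input pair, both parts upper-cased.
    public static List<Map.Entry<String, String>> map(String key, String value) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return List.of(Map.entry(key.toUpperCase(Locale.ROOT), value.toUpperCase(Locale.ROOT)));
    }
}
```

`UpperCaseMap.map("csiro", "cci")` returns a one-element list holding the entry `CSIRO=CCI`.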
![Page 26: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/26.jpg)
let map(key, value) =
    foreach char c in value:
        emit(key, c)
('cci', 'csiro') -> ('cci', 'c'), ('cci', 's'), ('cci', 'i'), ('cci', 'r'), ('cci', 'o')
('open', 'nasa') -> ('open', 'n'), ('open', 'a'), ('open', 's'), ('open', 'a')
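This map shows that one input pair can yield many intermediate pairs. A plain-Java sketch, with the hypothetical class name `CharMap` and Java 9 or later assumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "one pair per character" map exercise.
public class CharMap {

    // Emits one (key, character) pair for each character of value, in order.
    public static List<Map.Entry<String, Character>> map(String key, String value) {
        List<Map.Entry<String, Character>> out = new ArrayList<>();
        for (char c : value.toCharArray()) {
            out.add(Map.entry(key, c));
        }
        return out;
    }
}
```

`CharMap.map("open", "nasa")` yields four pairs, two of them ('open', 'a'); map performs no deduplication.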
![Page 27: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/27.jpg)
let map(key, value) =
    emit(value.length(), value)
('csiro', 'cci') -> ('3', 'cci')
('csiro', 'cesre') -> ('5', 'cesre')
('csiro', 'cmse') -> ('4', 'cmse')
('toyota', 'yaris') -> ('5', 'yaris')
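Here the map replaces the key, so pairs that arrived under the same key ('csiro') leave under different keys, while 'cesre' and 'yaris' would meet in the same reduce call under key 5. A plain-Java sketch with the hypothetical class name `LengthMap`; unlike the slide, it uses an integer key rather than a string.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "key by length" map exercise.
public class LengthMap {

    // Emits (length of value, value); the input key is ignored.
    public static List<Map.Entry<Integer, String>> map(String key, String value) {
        return List.of(Map.entry(value.length(), value));
    }
}
```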
![Page 28: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/28.jpg)
![Page 29: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/29.jpg)
map(String input_key, String input_value):
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_values):
    set count = 0
    foreach v in intermediate_values:
        count += v
    emit(output_key, count)
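The word count pseudocode can be simulated in a single process to check the logic before writing the Hadoop classes. The sketch below is plain Java (9 or later), not the exercise's `WordCount`, `WordMapper` and `SumReducer` classes; it assumes words are separated by whitespace and counted case-sensitively, and a `TreeMap` stands in for Hadoop's sorted shuffle. The class name `WordCountSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process word count mirroring the pseudocode above.
public class WordCountSketch {

    // map: emit (w, 1) for every whitespace-separated word w in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {          // an empty line splits into [""]
                out.add(Map.entry(w, 1));
            }
        }
        return out;
    }

    // reduce: sum the values collected for a single word.
    static int reduce(String word, List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    // Runs map over every line, groups by word (the "shuffle"), then reduces.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) -> counts.put(word, reduce(word, ones)));
        return counts;
    }
}
```

For the lines "the ATO and the CSIRO" and "the end", `wordCount` returns `{ATO=1, CSIRO=1, and=1, end=1, the=3}` (upper-case words sort first in `String` order).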
![Page 30: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/30.jpg)
● Wordcount
$ cd ~/bigdata/Exercises/hadoop/wordcount; ls
$ javac -classpath `hadoop classpath` *.java
$ jar cvf wc.jar *.class
WordCount.java WordMapper.java SumReducer.java
![Page 31: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/31.jpg)
$ hadoop jar wc.jar WordCount data/ato100.txt ato_wc
$ hadoop fs -ls ato_wc
$ hadoop fs -cat ato_wc/part-r-00000 | less
$ hadoop fs -cat ato_wc/* | grep 'ATO\|CSIRO'
$ hadoop fs -rm -r ato_wc
![Page 32: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/32.jpg)
● Average max temperature
![Page 33: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/33.jpg)
$ cd ~/bigdata/Exercises/hadoop/data
$ less nsw_temp.csv
$ less bom_data_Note.txt
![Page 34: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/34.jpg)
map(String input_key, String input_value):
    set fields = input_value.split(',')
    emit(fields[3], fields[5])
('IDCJAC0010,061087,1965,01,02,32.2,1,Y') -> ('01', 32.2)
('IDCJAC0010,066062,1890,04,27,20.2,1,Y') -> ('04', 20.2)
('IDCJAC0010,066062,2012,02,03,21.0,1,Y') -> ('02', 21.0)
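A plain-Java sketch of this mapper for one CSV line. It assumes the column positions visible in the sample records (month at index 3, maximum temperature at index 5) and, as an additional assumption, skips any line whose temperature field is empty or non-numeric, such as a header row if the file has one. `MaxTempMapper` is a hypothetical name, distinct from the exercise's `AverageTempMapper`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the temperature mapper (not a Hadoop Mapper).
public class MaxTempMapper {

    // 0-based column positions, assumed from the sample records.
    static final int MONTH = 3;
    static final int MAX_TEMP = 5;

    // Emits (month, max temperature) for one record, or nothing for unusable lines.
    public static List<Map.Entry<String, Double>> map(String line) {
        String[] fields = line.split(",", -1);   // -1 keeps trailing empty fields
        if (fields.length <= MAX_TEMP) {
            return List.of();
        }
        try {
            return List.of(Map.entry(fields[MONTH], Double.parseDouble(fields[MAX_TEMP].trim())));
        } catch (NumberFormatException e) {
            return List.of();                    // empty or non-numeric temperature
        }
    }
}
```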
![Page 35: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/35.jpg)
reduce(String month, Iterator<double> values):
    set count = 0
    set sum = 0
    foreach v in values:
        sum += v
        count++
    set mean = sum / count
    emit(month, mean)
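The averaging reducer in plain Java (Java 9 or later assumed; `MeanReducerSketch` is a hypothetical name, not the exercise's `AverageReducer`):

```java
import java.util.Map;

// Hypothetical sketch of the mean reducer from the pseudocode above.
public class MeanReducerSketch {

    // Emits (month, arithmetic mean of all values seen for that month).
    public static Map.Entry<String, Double> reduce(String month, Iterable<Double> values) {
        int count = 0;
        double sum = 0.0;
        for (double v : values) {
            sum += v;
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values for month " + month);
        }
        return Map.entry(month, sum / count);
    }
}
```

Because a mean of partial means is not in general the overall mean, this reducer cannot simply be reused as a Hadoop combiner; a combiner would have to pass partial (sum, count) pairs to the reducer instead.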
![Page 36: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/36.jpg)
● $ cd ../averagetemp
$ gedit *.java &
$ cd ../wordcount
$ gedit *.java &
AverageTemp.java AverageTempMapper.java AverageReducer.java
![Page 37: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/37.jpg)
$ hadoop fs -put ../data/nsw_temp.csv data
$ javac -classpath `hadoop classpath` *.java
$ jar cvf avt.jar *.class
$ hadoop jar avt.jar AverageTemp data/nsw_temp.csv avt
![Page 38: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/38.jpg)
● $ hadoop fs -cat avt/part-r-00000
~/bigdata/Exercises/hadoop/averagetemp/sample_solution
![Page 39: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/39.jpg)
![Page 40: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/40.jpg)
![Page 41: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/41.jpg)
![Page 42: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/42.jpg)
![Page 43: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/43.jpg)
![Page 44: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/44.jpg)
![Page 45: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/45.jpg)
![Page 46: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/46.jpg)
![Page 47: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/47.jpg)
![Page 48: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/48.jpg)
![Page 49: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/49.jpg)
![Page 50: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/50.jpg)
![Page 51: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/51.jpg)
![Page 52: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/52.jpg)
![Page 53: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/53.jpg)
![Page 54: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/54.jpg)
https://github.com/tomaszbednarz/pig-abc-toilets
● We have a list of local ABC Radio stations in Australia
● We have a list of all public toilets across Australia
● We want to find the closest toilet to a radio station
Demonstration of:
● Data schemas
● Use of external libraries
● Google Maps API
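The core question, which toilet is closest to a given station, can be sketched as a brute-force nearest-neighbour search using the haversine great-circle distance. This is an assumed approach for illustration only, not necessarily what the Pig script in the linked repository does (the demonstration lists external libraries and the Google Maps API). The `Place` record and its fields are hypothetical, not the datasets' schemas; Java 16 or later is assumed.

```java
import java.util.List;

// Hypothetical sketch: nearest toilet to a station by great-circle distance.
public class NearestToilet {

    /** A named point in decimal degrees; hypothetical, not the datasets' schema. */
    public record Place(String name, double lat, double lon) {}

    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Haversine great-circle distance in kilometres between two points. */
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp so rounding can never push asin's argument above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns the toilet closest to the station, or null if the list is empty. */
    public static Place closest(Place station, List<Place> toilets) {
        Place best = null;
        double bestKm = Double.POSITIVE_INFINITY;
        for (Place toilet : toilets) {
            double km = haversineKm(station.lat(), station.lon(), toilet.lat(), toilet.lon());
            if (km < bestKm) {
                bestKm = km;
                best = toilet;
            }
        }
        return best;
    }
}
```

Checking every station against every toilet costs O(stations × toilets) distance computations, which is simple but grows quickly with the size of both lists.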
![Page 55: Hadoop, HDFS, MapReduce and Pig](https://reader034.vdocuments.mx/reader034/viewer/2022052218/5592c5461a28abdf0f8b4757/html5/thumbnails/55.jpg)