Python + Hive on AWS EMR: A Poor Man's Log Summary

Post on 21-Nov-2014

Category:

Engineering

DESCRIPTION

PyCon JP 2014

TRANSCRIPT

<ul><li> 1. Python </li> <li> 2. AWS EMR, PyCon JP, Akira Chiku </li> <li> 3. achiku. Name: Akira Chiku. Twitter: @_achiku. GitHub: @achiku. Akira Chiku Fire @kanmu </li> <li> 4. Goal: Python </li> <li> 5. EMR </li> <li> 6. Kanmu Business: Card Linked Offer (CLO) </li> <li> 7. Card Linked Offer </li> <li> 8. Card Linked Offer: A, Kanmu </li> <li> 9. Quick Survey: Hadoop, Hive, EMR </li> <li> 10. </li> <li> 11. </li> <li> 12. </li> <li> 13. </li> <li> 14. Poor man's </li> <li> 15. Kanmu Engineer Team: maki (CEO, Engineer), _ideyuta (Designer), moqada (Engineer), _achiku (Engineer) </li> <li> 16. Requirements: Ave G/day, Max G/day, GB </li> <li> 17. </li> <li> 18. Not Requirements / Requirements </li> <li> 19. Amazon Elastic MapReduce </li> <li> 20. AWS EMR: AWS, Hadoop, Hadoop SW, API, Job, S3, HDFS </li> <li> 21. Architecture: Fluentd, fluentd s3 plugin, S3, EMR (Hive), RDS </li> <li> 22. Data Analysis Flow by tagomoris. Process: Collect, Parse, Cleanup, Store, Process, Visualize. http://www.slideshare.net/tagomoris/handling-not-so-big-data </li> <li> 23. Poor man's Data Analysis Flow. Process: Collect, Parse, Cleanup, Store, Process, Visualize </li> <li> 24. Collect. Process: Collect, Parse, Cleanup, Store, Process, Visualize </li> <li> 25. Collect: Fluentd </li> <li> 26. fluentd s3 plugin: QueryString, JSON, QueryString, Hive, JSON. https://example.com/beacon (query string: sub, obj, coupon, action, click, cid). Fluentd, S3 </li> <li> 27. Process: Collect, Parse, Cleanup, Store, Process, Visualize. Store </li> <li> 28. Store: S3, S3. example.com production log. example.com production log api. Hive: example.com production log api dt </li> <li> 29. Process: Collect, Parse, Cleanup, Store, Process, Visualize. Process </li> <li> 30. Process: Hadoop </li> <li> 31. Hive, EMR: Fluentd, Hadoop, QueryString, UDF, JSON, Process, HDFS, S3, RDS, Hadoop </li> <li> 32. Hive </li> <li> 33. Presto, EMR: Presto, Hive, S3 </li> <li> 34. Process: Collect, Parse, Cleanup, Store, Process, Visualize. Visualize </li> <li> 35. Visualize: EMR, MySQL, Elasticsearch </li> <li> 36. Kibana </li> <li> 37. 
Poor man's Data Analysis Flow. Process: Collect, Parse, Cleanup, Store, Process, Visualize </li> <li> 38. YAGNI </li> <li> 39. </li> <li> 40. </li> <li> 41. Poor man's Data Analysis Flow. Process: Collect, Parse, Cleanup, Store, Process, Visualize </li> <li> 42. References: AWS Amazon EMR Best Practices. EMR, Hadoop. mixi: Apache Hive, JSON. JSON View. Batch Processing and Stream Processing by SQL. MPP: Impala, Presto, S3. Presto, Impala, S3 </li> <li> 43. Python </li> <li> 44. EMR </li> <li> 45. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 46. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 47. awscli: Ver, EMR Preview, API, Ruby ElasticMapReduce, pip, awscli, GitHub PR </li> <li> 48. We Love Python </li> <li> 49. $ mkvirtualenv pycon-emr-dev (pycon-emr-dev)$ pip install awscli (pycon-emr-dev)$ mkdir ~/.awscli (pycon-emr-dev)$ cat &lt;&lt;EOF &gt; ~/.awscli/config [profile development] aws_access_key_id= aws_secret_access_key= region=ap-northeast-1 EOF (pycon-emr-dev)$ cat &lt;&lt;EOF &gt;&gt; $VIRTUAL_ENV/bin/activate export AWS_CONFIG_FILE=~/.awscli/config export AWS_DEFAULT_PROFILE=development source aws_zsh_completer.sh EOF </li> <li> 50. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 51. $ aws emr create-cluster --ami-version 3.1.1 --name 'PyConJP 2014 (AMI 3.1.1 Hive)' --tags Name=pycon-jp-emr environment=development --ec2-attributes KeyName=yourkey --log-uri 's3://yourbucket/jobflow_logs/' --no-auto-terminate --visible-to-all-users --instance-groups file://./normal-instance-setup.json --applications file://./app-hive.json </li> <li> 52. 
normal-instance-group.json: [ { "Name": "emr-master", "InstanceGroupType": "MASTER", "InstanceCount": 1, "InstanceType": "m1.medium" }, { "Name": "emr-core", "InstanceGroupType": "CORE", "InstanceCount": 2, "InstanceType": "m1.medium" } ] app-hive.json: [ { "Name": "HIVE" } ] </li> <li> 53. result: { "ClusterId": "j-8xxxxxxxxx" } </li> <li> 54. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 55. $ aws emr add-steps --cluster-id j-8xxxxxxxxx --steps file://./hive-sample-step-1.json </li> <li> 56. hive-sample-step-1.json: [ { "Args": [ "-f", "s3n://yourbucket/hive-script/sample01.hql", "-d", "BUCKET_NAME=yourbucket", "-d", "TARGET_DATE=20140818" ], "ActionOnFailure": "CONTINUE", "Name": "Hive Sample Program 01", "Type": "HIVE" }, { "Args": [ "-f", "s3n://yourbucket/hive-script/sample02.hql", "-d", "BUCKET_NAME=yourbucket", "-d", "TARGET_DATE=20140818" ], "ActionOnFailure": "CONTINUE", "Name": "Hive Sample Program 02", "Type": "HIVE" } ] </li> <li> 57. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 58. $ aws emr add-steps --cluster-id j-8xxxxxxxxx --steps file://./s3distcp-sample-step.json </li> <li> 59. s3distcp-sample-step.json: [ { "Name": "s3distcp Sample", "ActionOnFailure": "CONTINUE", "Jar": "/home/hadoop/lib/emr-s3distcp-1.0.jar", "Type": "CUSTOM_JAR", "Args": [ "--src", "s3n://yourbucket/access_log/dt=20140818", "--dest", "s3n://yourbucket/compressed_log/dt=20140818", "--groupBy", ".*(nginx_access_log-).*", "--targetSize", "100", "--outputCodec", "gzip" ] } ] </li> <li> 60. 
use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 61. $ aws emr create-cluster --ami-version 3.1.1 --name 'PyConJP 2014 (AMI 3.1.1 Hive)' --tags Name=pycon-jp-emr environment=development --ec2-attributes KeyName=yourkey --log-uri 's3://yourbucket/jobflow_logs/' --no-auto-terminate --visible-to-all-users --instance-groups file://./normal-instance-setup.json --applications file://./app-hive-with-config.json </li> <li> 62. app-hive-with-config.json: [ { "Args": [ "--hive-site=s3://yourbucket/libs/config/hive-site.xml" ], "Name": "HIVE" } ] </li> <li> 63. hive-site.xml: &lt;configuration&gt; &lt;property&gt; &lt;name&gt;hive.optimize.s3.query&lt;/name&gt; &lt;value&gt;true&lt;/value&gt; &lt;description&gt;Optimize query on S3&lt;/description&gt; &lt;/property&gt; &lt;/configuration&gt;</li> <li> 64. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 65. $ aws emr create-cluster --ami-version 3.1.1 --name 'PyConJP 2014 (AMI 3.1.1 Hive + Presto)' --tags Name=pycon-jp-emr environment=development --ec2-attributes KeyName=yourkey --log-uri 's3://yourbucket/jobflow_logs/' --no-auto-terminate --visible-to-all-users --instance-groups file://./normal-instance-setup.json --bootstrap-actions file://./bootstrap-presto.json --applications file://./app-hive-with-config.json </li> <li> 66. 
[ { "Name": "Install/Setup Presto", "Path": "s3://yourbucket/libs/setup-presto.rb", "Args": [ "--task_memory", "1GB", "--log-level", "DEBUG", "--version", "0.75", "--presto-repo-url", "http://central.maven.org/maven2/com/facebook/presto/", "--sink-buffer-size", "1GB", "--query-max-age", "1h", "--jvm-config", "-server -Xmx2G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Dhive.config.resources=/home/hadoop/conf/core-site.xml,/home/hadoop/conf/hdfs-site.xml" ] } ] </li> <li> 67. setup-presto.rb: https://github.com/awslabs/emr-bootstrap-actions/blob/master/presto/install. AWS, Presto, EMR, Bootstrap. AMI or AMI. Hive, Hive Thrift Service </li> <li> 68. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 69. Metastore: Hive, MySQL. EMR, MySQL. Metastore, EMR, DB. EMR, DDL. DB Security Group </li> <li> 70. app-hive-with-config.json: &lt;configuration&gt; &lt;property&gt; &lt;name&gt;hive.optimize.s3.query&lt;/name&gt; &lt;value&gt;true&lt;/value&gt; &lt;description&gt;Optimize query on S3&lt;/description&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;javax.jdo.option.ConnectionURL&lt;/name&gt; &lt;value&gt;jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true&lt;/value&gt; &lt;description&gt;JDBC connect string for a JDBC metastore&lt;/description&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;javax.jdo.option.ConnectionDriverName&lt;/name&gt; &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt; &lt;description&gt;Driver class name for a JDBC metastore&lt;/description&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;javax.jdo.option.ConnectionUserName&lt;/name&gt; &lt;value&gt;username&lt;/value&gt; &lt;description&gt;Username to use against metastore database&lt;/description&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;javax.jdo.option.ConnectionPassword&lt;/name&gt; &lt;value&gt;password&lt;/value&gt; &lt;description&gt;Password to use against metastore database&lt;/description&gt; &lt;/property&gt; &lt;/configuration&gt;</li> <li> 71. use awscli to do the following: Execute HiveQL, Execute s3distcp, Config Your EMR, Bootstrap Presto, Create Cluster, Metastore Config. From Python Script: Create Cluster, JobFlow Mgmt, Execute HiveQL. EMR </li> <li> 72. Python, EMR: Celery Task. Python, EMR: boto.emr, awscli Utility </li>...</ul>
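The slides drive daily Hive steps from a Python script, with each step file carrying a target date and bucket name. As a rough illustration only, the step list from slide 56 could be generated rather than hand-edited. The function name `build_hive_steps` and its parameters are hypothetical and not from the talk; the JSON keys and the `s3n://.../hive-script/` layout are copied from the slide.

```python
import json


def build_hive_steps(bucket, target_date, scripts):
    """Build step definitions shaped like hive-sample-step-1.json (slide 56).

    `bucket`, `target_date` and `scripts` are hypothetical parameters
    introduced for this sketch; the keys mirror the slide's JSON.
    """
    steps = []
    for number, script in enumerate(scripts, start=1):
        steps.append({
            "Name": "Hive Sample Program %02d" % number,
            "Type": "HIVE",
            "ActionOnFailure": "CONTINUE",
            "Args": [
                # HiveQL script stored on S3, as on the slide
                "-f", "s3n://%s/hive-script/%s" % (bucket, script),
                # Variables handed to the script via -d
                "-d", "BUCKET_NAME=%s" % bucket,
                "-d", "TARGET_DATE=%s" % target_date,
            ],
        })
    return steps


if __name__ == "__main__":
    # Print JSON that could be saved to a file and passed to
    # `aws emr add-steps --steps file://./<that file>`.
    steps = build_hive_steps("yourbucket", "20140818",
                             ["sample01.hql", "sample02.hql"])
    print(json.dumps(steps, indent=2))
```

Saving the printed output to a file and passing it to `aws emr add-steps --cluster-id ... --steps file://...` would follow the same flow as slide 55, with only the date changing from day to day.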
