Building a Crawler / Scraper with AWS Lambda

Posted on 15-Jul-2015

TRANSCRIPT

<ul>
<li><p>AWS Lambda Meetup #0</p><p>Lambda</p><p>41222 NRI</p></li>
<li><p>NRI, Twitter: @dkfj, Facebook: takuro.sasaki, blog: http://blog.takuros.net/, AWS (S3, SQS)</p></li>
<li><p>JAWS-UG</p></li>
<li><p>Ruby</p><p>http://amzn.to/1lsJ5id</p><p>Ruby 21</p></li>
<li><p>NRI</p><p>NRI Web</p><p>Web, AWS</p></li>
<li><p>AWS Lambda</p><p>Lambda</p></li>
<li><p>(</p></li>
<li><p>S3 Event Notifications</p><p>S3</p><p>Put, Post, etc.</p><p>SQS</p><p>SNS</p><p>Lambda Function</p></li>
<li><p>Lambda /</p></li>
<li><p>Web</p></li>
<li><p>HTML HTMLA</p></li>
<li><p>Lambda</p><p>1.</p><p>2. http</p><p>3. html</p><p>4. S3 Event Call</p><p>5. S3 getObject</p><p>6. Scrape</p><p>LambdaCrawler, parseHtml</p><p>S3</p></li>
<li><p>1.</p><p>3. html</p><p>LambdaCrawler</p><p>Node.js http</p><p>AWS S3 putObject</p><p>URL</p><p>2. http</p></li>
<li><p>4. S3 Event Call</p><p>5. S3 getObject</p><p>6. Scrape</p><p>parseHtml</p><p>S3 Event → Lambda</p><p>cheerio</p></li>
<li><p>https://github.com/takuros/lambda-crawler</p><p>http://blog.takuros.net/entry/2014/12/14/053606</p></li>
<li><p>Lambda</p><p>S3 + Event Notification, Lambda → Lambda</p></li>
<li><p>Http</p><p>HttpTest</p><p>HttpTest2</p><p>1.</p><p>54.172.104.205 - - [21/Dec/2014:13:24:12 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.172.104.205 - - [21/Dec/2014:13:24:20 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.172.104.205 - - [21/Dec/2014:13:24:23 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.172.104.205 - - [21/Dec/2014:13:24:28 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.172.104.205 - - [21/Dec/2014:13:25:24 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"</p><p>2.</p></li>
<li><p>Lambda</p><p>10 × 10 = 100</p><p>ParallelCall</p><p>1. HttpTest</p><p>HttpTest</p><p>54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"</p><p>IP</p></li>
<li><p>Lambda</p><p>10 × 10 × 10 × 10 = 10,000</p><p>1.</p><p>ParallelCall</p><p>HttpTest</p><p>HttpTest</p><p>ParallelCall</p><p>HttpTest</p><p>HttpTest</p><p>ParallelChainCall</p></li>
<li><p>DDoS</p><p>IP</p><p>54.172.104.205 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"<br>
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"</p></li>
<li><p>Lambda, Hadoop</p><p>PhantomJS</p></li>
<li><p>Lambda</p><p>/ AWS Ex</p><p>Cloud Automator</p><p>NRI mPLAT</p></li>
<li><p>Google Analytics</p><p>Google</p></li>
<li><p>Lambda</p><p>AWS</p><p>Rate Exceeded</p></li>
<li><p>Lambda Lambda</p></li>
<li><p>@dkfj</p></li>
</ul>
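The slides' pipeline begins with a crawler step called LambdaCrawler. It fetches a URL with Node.js `http` and stores the HTML in S3 with `putObject`, so that an S3 Event Notification can hand the page to the parser. The step could look roughly like the minimal sketch below. This is not the code from the linked lambda-crawler repository. The `fetchText` and `putObject` parameters, the `bucket` option, and the `html/<host><path>` key layout are all assumptions. The two functions are injected so the logic runs without network access or AWS.

```javascript
// Minimal sketch of the crawler step (hypothetical, not the lambda-crawler repo code).
// Dependencies are injected so the logic can run without network access or AWS:
//   fetchText(url)    -> Promise<string>  (e.g. built on Node's http module)
//   putObject(params) -> Promise          (e.g. wraps the AWS SDK's S3 putObject)

// Map a URL to an S3 object key; the "html/<host><path>" layout is an assumption.
// Directory-style paths ending in "/" get "index.html" appended.
function urlToKey(url) {
  const u = new URL(url);
  const path = u.pathname.endsWith('/') ? `${u.pathname}index.html` : u.pathname;
  return `html/${u.hostname}${path}`;
}

// Fetch one page and store its raw HTML. The S3 write is what later
// triggers the parser through an S3 Event Notification.
async function crawl(url, { fetchText, putObject, bucket }) {
  const body = await fetchText(url);
  const key = urlToKey(url);
  await putObject({ Bucket: bucket, Key: key, Body: body, ContentType: 'text/html' });
  return key;
}
```

You could wire in real dependencies as follows. With the AWS SDK for JavaScript v2, `putObject` could be `(p) => s3.putObject(p).promise()`. On Node.js 18 or later, `fetchText` could be `(u) => fetch(u).then((r) => r.text())`.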
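The parser half of the pipeline is parseHtml. The S3 Event triggers it, it calls S3 `getObject`, and it scrapes the page with cheerio. It can be sketched in the same dependency-injected style. The bucket and key are read from the S3 event record shape (`Records[0].s3.bucket.name` and `Records[0].s3.object.key`), where the key arrives URL-encoded. To keep the example dependency-free, a naive regular expression stands in for cheerio; it is not a real HTML parser. With cheerio you would instead call `cheerio.load(html)`, iterate over `$('a')`, and read each element's `href` attribute. The `getObjectText` helper and the choice to extract links are assumptions for illustration.

```javascript
// Minimal sketch of the parser step (hypothetical, not the lambda-crawler repo code).
// getObjectText(bucket, key) -> Promise<string> is injected; in practice it would
// wrap S3 getObject and convert the returned Body to a string.

// Read bucket and key from an S3 event record. Object keys arrive
// URL-encoded, with spaces encoded as '+'.
function s3Location(event) {
  const rec = event.Records[0].s3;
  return {
    bucket: rec.bucket.name,
    key: decodeURIComponent(rec.object.key.replace(/\+/g, ' ')),
  };
}

// Naive href extraction with a regex: a stand-in for cheerio, not a real HTML parser.
function extractLinks(html) {
  const links = [];
  const re = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let m;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

// Handler-shaped entry point: locate the stored page, load it, scrape it.
async function parseHtml(event, { getObjectText }) {
  const { bucket, key } = s3Location(event);
  const html = await getObjectText(bucket, key);
  return { key, links: extractLinks(html) };
}
```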
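The access-log excerpts on the HttpTest, ParallelCall and DDoS slides are in Apache's combined log format. The point of interest is which source IPs the requests came from, and the minimal tally below counts requests per address. `countByIp` is a hypothetical helper that only assumes each line starts with an IPv4 address. The sample data is the six lines from the DDoS slide.

```javascript
// Tally requests per client IPv4 address in combined-format access log lines.
// Lines that do not start with an IPv4 address are ignored.
function countByIp(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    const m = line.match(/^(\d{1,3}(?:\.\d{1,3}){3}) /);
    if (m) counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  return counts;
}

// The six log lines from the DDoS slide.
const ddosSlide = [
  '54.172.104.205 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
  '54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
  '54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
  '54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
  '54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
  '54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"',
].join('\n');

// Counts: 54.172.104.205 -> 1, 54.173.73.201 -> 2, 54.173.132.200 -> 3
console.log(Object.fromEntries(countByIp(ddosSlide)));
```

Running the same tally over the other excerpts gives a different picture. The sequential HttpTest lines all come from 54.172.104.205, and the ParallelCall lines all come from 54.173.132.200. Only the chained run's lines spread across three addresses.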
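The parallel slides give two request counts: 10 × 10 = 100 for ParallelCall → HttpTest, and 10 × 10 × 10 × 10 = 10,000 for ParallelChainCall. These match a fan-out of 10 per level at 2 and 4 levels, which yields fanout^depth leaf requests. The exact wiring of the original functions cannot be recovered from the transcript, so the recursive sketch below is an assumption. Each level starts its children concurrently, and the hypothetical `leaf` callback stands in for one HTTP request. In a real deployment each child would be a separate Lambda invocation, for example the AWS SDK's `lambda.invoke` with `InvocationType: 'Event'`, rather than a local function call.

```javascript
// Simulated fan-out tree: every non-leaf node starts `fanout` children at once;
// each leaf performs one (simulated) request. Returns the number of leaf requests,
// which equals fanout ** depth.
async function fanOut(depth, fanout, leaf) {
  if (depth === 0) {
    await leaf();
    return 1;
  }
  const children = Array.from({ length: fanout }, () => fanOut(depth - 1, fanout, leaf));
  const counts = await Promise.all(children);
  return counts.reduce((sum, n) => sum + n, 0);
}
```

Each extra level multiplies the total by the fan-out, so one more level of 10 would mean 100,000 requests against the target.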
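The Rate Exceeded slide points to AWS throttling the Lambda invocations when too many are requested. A common mitigation, not taken from the slides, is to retry throttled calls with exponential backoff and jitter. The sketch below assumes the throttle surfaces as an error in one of two ways: its `code` is `TooManyRequestsException`, or its message contains "Rate Exceeded". The names `withBackoff` and `isThrottle`, and the options `retries`, `baseMs` and `isRetryable`, are hypothetical.

```javascript
// Retry a throttled async call with exponential backoff and "full jitter".
// Assumption: throttling shows up as code 'TooManyRequestsException'
// or a message containing "Rate Exceeded".
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function isThrottle(err) {
  return Boolean(err) &&
    (err.code === 'TooManyRequestsException' || /rate exceeded/i.test(String(err.message)));
}

async function withBackoff(fn, { retries = 5, baseMs = 100, isRetryable = isThrottle } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-throttle errors, or once the retry budget is spent.
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Full jitter: wait a random time in [0, baseMs * 2^attempt).
      await sleep(Math.random() * baseMs * 2 ** attempt);
    }
  }
}
```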