Building a Crawler/Scraper with AWS Lambda

Posted on 15-Jul-2015

TRANSCRIPT

  • AWS Lambda Meetup #0

    Lambda

    41222 NRI

  • NRI

    Twitter: @dkfj
    Facebook: takuro.sasaki
    blog: http://blog.takuros.net/
    AWS: S3, SQS

  • JAWS-UG

  • Ruby

    http://amzn.to/1lsJ5id

    Ruby 21

  • NRI

    NRIWeb

    Web AWS

  • AWS Lambda

    Lambda

  • S3 Event Notifications

    S3 events (Put, Post, etc.) can be sent to:

    SQS

    SNS

    Lambda Function
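The notification flow can be made concrete with a minimal Python sketch. The talk itself used Node.js. The sketch shows how a function invoked by an S3 event reads the bucket name and object key from the notification payload. The `Records[].s3.bucket.name` / `Records[].s3.object.key` layout follows the S3 event message structure, and object keys arrive URL-encoded. The bucket name `crawler-html` and the key are made up for illustration, and real events carry many more fields.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload.

    Object keys arrive URL-encoded (a space becomes '+'), so they are
    decoded before use.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


# Trimmed-down example notification; bucket and key names are hypothetical.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "crawler-html"},
                "object": {"key": "pages/hoge+page.html"},
            },
        }
    ]
}
print(extract_s3_objects(sample_event))  # [('crawler-html', 'pages/hoge page.html')]
```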

  • Crawler/scraping with Lambda

  • Web

  • HTML HTMLA

  • Lambda architecture

    1. Pass a URL to LambdaCrawler
    2. http request
    3. Save the HTML to S3
    4. S3 Event Call
    5. S3 getObject
    6. Scrape

    LambdaCrawler (steps 1-3) → S3 → parseHtml (steps 4-6)

  • Steps 1-3: LambdaCrawler

    Receives a URL (1), fetches it with the Node.js http module (2), and saves the HTML to S3 with the AWS SDK's s3 putObject (3).
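Steps 1-3 could look roughly like the Python sketch below. It is not the author's Node.js code from lambda-crawler. `urllib.request` stands in for the Node.js http module, and `put_object` is passed in so the logic runs without AWS. The bucket name `crawler-html` and the hash-based `key_for_url` naming scheme are hypothetical.

```python
import hashlib
import urllib.request


def key_for_url(url):
    # Hypothetical naming scheme: hash the URL so any URL becomes a safe S3 key.
    return "html/" + hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"


def fetch_url(url, timeout=10):
    # Default fetcher; plays the role of the Node.js http module in the talk.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawler_handler(event, put_object, fetch=fetch_url, bucket="crawler-html"):
    """Steps 1-3: take a URL, fetch it, and store the raw HTML in S3.

    put_object is injected; in Lambda it could be boto3's
    s3_client.put_object, which takes Bucket=, Key= and Body= keywords.
    """
    url = event["url"]  # 1. URL passed in the invocation payload
    body = fetch(url)  # 2. HTTP GET
    key = key_for_url(url)
    put_object(Bucket=bucket, Key=key, Body=body)  # 3. save the HTML to S3
    return {"bucket": bucket, "key": key, "bytes": len(body)}


# Hypothetical Lambda entry point (needs boto3, so it is not run here):
# def lambda_handler(event, context):
#     return crawler_handler(event, boto3.client("s3").put_object)

# Offline usage with fakes in place of the network and S3:
stored = {}


def fake_put_object(Bucket, Key, Body):
    stored[(Bucket, Key)] = Body


result = crawler_handler(
    {"url": "http://example.com/hoge.html"},
    fake_put_object,
    fetch=lambda url: b"<html>hoge</html>",
)
print(result["bytes"])  # 17
```

Injecting `fetch` and `put_object` keeps the handler logic testable. The real Lambda entry point only wires in the network and the S3 client.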

  • Steps 4-6: parseHtml

    Invoked by the S3 event (4), reads the stored HTML with S3 getObject (5), and scrapes it with cheerio (6).
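Here is a matching Python sketch of steps 4-6, again not the original Node.js. The standard library's `html.parser` replaces cheerio. The scraping target (page title plus link targets) is an assumption chosen for illustration, and `get_object` is injected so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_html_handler(event, get_object):
    """Steps 4-6: triggered by S3, read each new object and scrape it.

    get_object(bucket, key) must return the object's bytes; with boto3 it
    could wrap s3_client.get_object(Bucket=..., Key=...)["Body"].read().
    """
    results = []
    for record in event["Records"]:  # 4. S3 event call
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        html = get_object(bucket, key).decode("utf-8", errors="replace")  # 5.
        parser = TitleAndLinks()  # 6. scrape
        parser.feed(html)
        parser.close()
        results.append({"key": key, "title": parser.title.strip(), "links": parser.links})
    return results


page = b"<html><head><title> Hoge </title></head><body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
event = {"Records": [{"s3": {"bucket": {"name": "crawler-html"}, "object": {"key": "html/hoge.html"}}}]}
print(parse_html_handler(event, lambda bucket, key: page))
# [{'key': 'html/hoge.html', 'title': 'Hoge', 'links': ['/a', '/b']}]
```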

  • https://github.com/takuros/lambda-crawler

    http://blog.takuros.net/entry/2014/12/14/053606

  • Lambda

    S3 + Event Notification: Lambda → Lambda
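The chaining pattern can be simulated in-process with a toy bucket that calls its subscribers on every write. `FakeBucket`, `first_function` and `second_function` are hypothetical names. Unlike real S3, which delivers notifications asynchronously, this sketch invokes the handlers synchronously.

```python
class FakeBucket:
    """In-memory stand-in for an S3 bucket with event notifications.

    Each put_object call stores the body, then calls every subscribed
    handler with an S3-style ObjectCreated:Put event. This mimics how one
    function's write to S3 can trigger the next function.
    """

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def put_object(self, Key, Body):
        self.objects[Key] = Body
        event = {"Records": [{
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": self.name}, "object": {"key": Key}},
        }]}
        for handler in self.subscribers:
            handler(event, None)  # (event, context), the shape Lambda uses


bucket = FakeBucket("crawler-html")
seen = []


def second_function(event, context):
    # Downstream "Lambda": records which keys it was notified about.
    for record in event["Records"]:
        seen.append(record["s3"]["object"]["key"])


def first_function(event, context):
    # Upstream "Lambda": writing to the bucket is what triggers the next step.
    bucket.put_object(Key="html/" + event["name"] + ".html", Body=b"<html></html>")


bucket.subscribe(second_function)
first_function({"name": "hoge"}, None)
print(seen)  # ['html/hoge.html']
```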

  • Http

    HttpTest

    HttpTest2

    1.

    54.172.104.205 - - [21/Dec/2014:13:24:12 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.172.104.205 - - [21/Dec/2014:13:24:20 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.172.104.205 - - [21/Dec/2014:13:24:23 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.172.104.205 - - [21/Dec/2014:13:24:28 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.172.104.205 - - [21/Dec/2014:13:25:24 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"

    2.

  • Lambda

    10 × 10 = 100

    ParallelCall

    1. HttpTest

    HttpTest

    54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:15:57:32 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"

    IP

  • Lambda

    10 × 10 × 10 × 10 = 10,000

    1.

    ParallelChainCall
      → ParallelCall → HttpTest, HttpTest
      → ParallelCall → HttpTest, HttpTest
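The request counts multiply across levels. With a fan-out of 10 at each level, two levels give 10 × 10 = 100 calls and four give 10,000. The sketch below assumes the 10s on the slides are per-level fan-out factors. It replaces the real asynchronous invocations (for example boto3's `lambda_client.invoke(FunctionName=..., InvocationType="Event")`) with in-process recursion, so it only counts calls. `fan_out` is a hypothetical helper.

```python
from math import prod


def fan_out(factors, invoke_leaf):
    """Simulate a chain of 'parallel call' functions.

    factors[0] is how many children the top-level function starts,
    factors[1] how many each child starts, and so on. The last level
    calls invoke_leaf once per request (the HttpTest role).
    Returns the number of leaf calls made.
    """
    if not factors:
        invoke_leaf()
        return 1
    return sum(fan_out(factors[1:], invoke_leaf) for _ in range(factors[0]))


calls = []
print(fan_out([10, 10], lambda: calls.append(1)))          # 100 (ParallelCall)
print(fan_out([10, 10, 10, 10], lambda: calls.append(1)))  # 10000 (ParallelChainCall)
print(len(calls) == prod([10, 10]) + prod([10, 10, 10, 10]))  # True
```

Because each level only starts its children and returns, the total load on the target server grows as the product of the factors. A few extra levels turn a small test into a very large number of requests.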

  • DDoS

    IP

    54.172.104.205 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
    54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
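To see how many distinct source addresses reach the server, the access-log lines above can be tallied per IP. The Python sketch below assumes the Apache common/combined log layout these lines follow and skips lines that do not match it.

```python
import re
from collections import Counter

# host ident user [time] "request" status bytes ... (common/combined log format)
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def requests_per_ip(lines):
    """Count requests per client IP; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# The six log lines from the slide.
log = """\
54.172.104.205 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
54.173.73.201 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
54.173.132.200 - - [21/Dec/2014:16:08:31 +0000] "GET /hoge.html HTTP/1.1" 200 5 "-" "-"
"""
counts = requests_per_ip(log.splitlines())
print(counts.most_common())
# [('54.173.132.200', 3), ('54.173.73.201', 2), ('54.172.104.205', 1)]
```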

  • Lambda Hadoop

    PhantomJS

  • Lambda

    / AWS Ex

    Cloud Automator

    NRImPLAT

  • Google Analytics

    Google

  • Lambda

    AWS

    Rate Exceeded
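The slide only reports hitting AWS's "Rate Exceeded" limit. One common mitigation, not necessarily the one used in the talk, is to retry throttled calls with exponential backoff and jitter. In this Python sketch, `RateExceeded` is a hypothetical exception standing in for whatever throttling error the SDK raises. `sleep` and `rng` are injectable so the example is deterministic.

```python
import random
import time


class RateExceeded(Exception):
    """Hypothetical stand-in for an AWS throttling ("Rate Exceeded") error."""


def call_with_backoff(fn, max_attempts=5, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on RateExceeded, wait and retry with full-jitter backoff.

    Retry n waits rng() * min(cap, base * 2**n) seconds. After
    max_attempts failures the last RateExceeded is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateExceeded:
            if attempt == max_attempts - 1:
                raise
            sleep(rng() * min(cap, base * 2 ** attempt))


attempts = {"n": 0}


def flaky():
    # Fails twice with a throttling error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateExceeded("Rate Exceeded")
    return "ok"


delays = []
print(call_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0))  # ok
print(delays)  # [0.1, 0.2]
```

The random jitter spreads out retries from many concurrent functions, so they do not all hit the API again at the same moment.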

  • Lambda Lambda

  • @dkfj