Hadoop for Rubyists

Upload: amy-chen

Posted on 07-Apr-2018

TRANSCRIPT

  • 8/3/2019 Hadoop for Rubyists

    1/15

    Hadoop for Rubyists

    Loren [email protected]

    Friday, October 7, 2011

  • 2/15

    GOVT SEARCH

  • 3/15

    FROM LOGS TO DATA

    BizLogic

  • 4/15

    SUPER SIMPLE WINS

  • 5/15

    VERSION 1.0

  • 6/15

    One year later...

  • 7/15

  • 8/15

    HIVE = HDFS + Schema + HQL

    select ds, count(*) cnt
    from logs
    group by ds
    order by cnt

  • 9/15

    HIVE WITH RUBY

    HDFS + Schema + HQL with custom mapper

    add file /local/path/to/queries_mapper.rb;

    select transform(host, time, agent, ...)
      using './queries_mapper.rb'
      as host, time, agent, query, affiliate, locale, is_bot, ...
    from logs
    where ... group by ... having ...

  • 10/15

    STDIN & STDOUT

    % cat logfile | ./queries_mapper.rb
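Both slides rely on the same contract: with Hive's default settings, `transform` writes each input row to the script's STDIN as one line of tab-separated columns (NULL as `\N`) and reads tab-separated rows back from STDOUT. That is why the script can be tested locally with `cat` before it touches the cluster. The original `queries_mapper.rb` is not shown, so the following is a hypothetical stand-in. It assumes four input columns (`host`, `time`, `agent`, `request`), assumes `query`, `affiliate`, and `locale` arrive as URL parameters inside `request`, and uses a crude user-agent regex for `is_bot`. The helper names `parse_params` and `map_line` are made up for illustration:

```ruby
#!/usr/bin/env ruby
# queries_mapper.rb: hypothetical sketch, not the original script from the talk.
# Assumes Hive's default TRANSFORM format: one row per line, tab-separated
# columns, NULL encoded as \N. Assumed input columns: host, time, agent, request.
require "uri"

HIVE_NULL = "\\N"                  # Hive's default text encoding of NULL
BOT_AGENT = /bot|crawler|spider/i  # illustrative bot heuristic only

# Returns the URL parameters of a request such as "/search?query=x&locale=en".
def parse_params(request)
  _path, query_string = request.to_s.split("?", 2)
  return {} if query_string.nil?
  URI.decode_www_form(query_string).to_h
rescue ArgumentError
  {} # malformed %-encoding or non-ASCII input: treat as having no parameters
end

# Turns one tab-separated input row into one tab-separated output row:
# host, time, agent, query, affiliate, locale, is_bot
def map_line(line)
  host, time, agent, request = line.chomp.split("\t", -1)
  params = parse_params(request)
  is_bot = BOT_AGENT.match?(agent.to_s)
  [host, time, agent, params["query"], params["affiliate"], params["locale"], is_bot]
    .map { |v| v.nil? ? HIVE_NULL : v.to_s.tr("\t\n", "  ") } # keep one row per line
    .join("\t")
end

if __FILE__ == $PROGRAM_NAME
  $stdin.each_line { |line| puts map_line(line) }
end
```

Made executable (`chmod +x queries_mapper.rb`), it runs exactly as on the slide, `cat logfile | ./queries_mapper.rb`, provided the log file already uses the assumed tab-separated layout.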

  • 11/15

    VERSION 2.0

  • 12/15

    BUT WHAT ABOUT...

    Hive UDF

    Hadoop streaming

    Wukong/MRToolkit

    Java MR (kidding!)
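Of the alternatives listed, Hadoop streaming is closest to the Hive approach. It also pipes text through any executable via STDIN and STDOUT, and it sorts the mapper's output by key (by default, the text before the first tab) before handing it to the reducer. Here is a minimal Ruby reducer sketch. It assumes the mapper emits one `key<TAB>value` line per record (for example with `ds` as the key) and counts consecutive lines per key, roughly what `group by ds` with `count(*)` computes. The names `count_reducer.rb` and `reduce_counts` are hypothetical:

```ruby
#!/usr/bin/env ruby
# count_reducer.rb: hypothetical Hadoop streaming reducer sketch.
# Streaming delivers mapper output sorted by key (text before the first tab),
# so all lines sharing a key arrive consecutively on STDIN.

# Yields (key, count) for each run of consecutive lines with the same key.
def reduce_counts(lines)
  current_key = nil
  count = 0
  lines.each do |line|
    line = line.chomp
    next if line.empty?
    key = line.split("\t", 2).first
    if key != current_key
      yield(current_key, count) unless current_key.nil? # flush the previous key
      current_key = key
      count = 0
    end
    count += 1
  end
  yield(current_key, count) unless current_key.nil? # flush the last key
end

if __FILE__ == $PROGRAM_NAME
  reduce_counts($stdin.each_line) { |key, count| puts "#{key}\t#{count}" }
end
```

Because the reducer only depends on sorted input, a streaming job can be approximated locally with `cat logfile | ./mapper.rb | sort | ./count_reducer.rb`, where `mapper.rb` stands for a hypothetical mapper that emits the keys.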

  • 13/15

    WHERE ARE YOU?

    Add slaves

    Cache layer

    Larger boxes

    Denormalize

    Remove indexes

    Shard

  • 14/15

    Parting words

    The Hadoop ecosystem is rich and very complex

    No one piece is too hard

    You can leverage your Ruby/SQL skills with Hive

    Start somewhere, it's fun!

  • 15/15

    THANK YOU!
