big data camp intro hadoop

Upload: indoos2000

Post on 07-Apr-2018


  • 8/4/2019 Big Data Camp Intro Hadoop

    1/22

    Big Data Camp, Delhi, Sep 10, 2011

    Introduction to Hadoop / Big Data

    2/22

    Good Times < Year 2000

    Web Users

    Web Servers RDBMS

    Online Applications- OLTP

    Report Users

    Reporting Servers RDBMS DW

    Analytics and Reporting- OLAP

    3/22

    Year 2000 +

    Web Users

    Web Servers

    RDBMS

    Online Applications- OLTP

    Report Users

    Reporting Servers

    RDBMS DW

    Analytics and Reporting- OLAP

    4/22

    Big Data- Problems to Solve

    Scalability

    Storage

    Failure

    5/22

    The Knight in Shining Armor

    Engine + Logic

    File system

    6/22

    Video:

    What can Apache Hadoop Do for You?

    7/22

    Who Uses Hadoop?

    Search

    Yahoo, Amazon, Zvents

    Log processing

    Facebook, Yahoo

    Recommendation Systems

    Facebook

    Data Warehouse

    Facebook, AOL

    Video and Image Analysis

    New York Times, Eyealike

    INDIAN GOVERNMENT- UUID project

    8/22

    HDFS: Design Principles

    Hardware will Fail!

    Petabyte-Scale Store!

    9/22

    HDFS: Design Principles

    10/22

    Map Reduce

    Origin in Lisp!

    Google: MapReduce and GFS papers!

    Divide and Rule!

    11/22

    Map Reduce Programming Model

    Borrows from functional programming

    Users implement an interface of two functions:

    map (in_key, in_value) -> (out_key, intermediate_value) list

    reduce (out_key, intermediate_value list) -> out_value list
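The two signatures above can be tried out without a cluster. The following is a minimal single-process sketch in Python, not Hadoop's actual API: the names `map_fn`, `reduce_fn` and `run_mapreduce` are made up for illustration, and word counting is an assumed example task. The grouping step in the middle stands in for the shuffle that a real framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    # in_key (a line number here) is ignored; emit (word, 1) for each word.
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(intermediate_values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    # Keys are visited in sorted order, so the result dict is ordered by key.
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# Hypothetical input: (line number, line text) pairs.
lines = [(0, "big data camp"), (1, "big data and Hadoop")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

Because each call to `map_fn` sees only one record and each call to `reduce_fn` sees only one key, both phases can in principle be spread over many machines, which is the point of the model.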

    12/22

    Hadoop Map Reduce

    13/22

    Hadoop Map Reduce

    14/22

    Hadoop Example

    Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.

    Data Format:

    The data is stored using a line-oriented ASCII format, in which each line is a record. The format supports a rich set of meteorological elements, many of which are optional or with variable data lengths. For simplicity, we shall focus on the basic elements, such as temperature, which are always present and are of fixed width.
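To make the fixed-width idea concrete, here is a small Python sketch that finds the highest temperature per year from line-oriented records. The record layout is hypothetical and much simpler than any real sensor format: columns 0-3 are assumed to hold the year, columns 4-8 a signed temperature in tenths of a degree, and `+9999` is an assumed marker for a missing reading. The choice of "maximum temperature per year" as the analysis, and the names `map_record`, `reduce_max` and `max_temperature_by_year`, are likewise illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical fixed-width layout (an assumption, not the real data format):
#   columns 0-3: year
#   columns 4-8: signed temperature in tenths of a degree, e.g. "+0111"
YEAR = slice(0, 4)
TEMP = slice(4, 9)
MISSING = "+9999"  # assumed sentinel for a missing reading


def map_record(offset, line):
    """Map step: one record in, zero or one (year, temperature) pairs out."""
    raw = line[TEMP]
    if raw == MISSING:
        return []
    return [(line[YEAR], int(raw))]


def reduce_max(year, temperatures):
    """Reduce step: all temperatures for one year in, the maximum out."""
    return [max(temperatures)]


def max_temperature_by_year(lines):
    """Group mapped pairs by year in memory, then reduce each group."""
    by_year = defaultdict(list)
    for offset, line in enumerate(lines):
        for year, temp in map_record(offset, line):
            by_year[year].append(temp)
    return {year: reduce_max(year, temps)[0] for year, temps in sorted(by_year.items())}


# Made-up sample records in the hypothetical layout.
sample = ["1949+0111", "1949+0078", "1950-0011", "1950+0022", "1950+9999"]
result = max_temperature_by_year(sample)
print(result)
```

Since the fields sit at fixed positions, the map step needs only string slicing, with no tokenizing or schema lookup, which is why fixed-width elements are convenient for a first example.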

    15/22

    Hadoop Example

    16/22

    Hadoop Example

    17/22

    Hadoop Ecosystem Map

    [Diagram; labels recovered from the figure:]

    OLTP

    Java Applications

    Structured Data

    hiho

    Sqoop

    RDBMS

    File system

    Engine + Logic

    Unstructured Data

    High Level Interfaces

    JAQL

    Workflow

    Cascading

    Support

    Cascading

    More High Level Interfaces

    Monitor/manage Hadoop ecosystem

    18/22

    How can You Contribute?

    Apache Hadoop Projects

    Learn more about Hadoop

    Contribute to source code

    Participate in Mailing Lists/Forums

    Share blogs etc.

    Impetus Open Source Projects

    Github/Google code hosted projects

    Contribute to source code

    19/22

    Thank you

    Visit bigdata.impetus.com

    20/22

    Big Data in EDW

    21/22

    Building Big Data Analytics Platform

    Commercial

    Teradata/Netezza

    Greenplum/Vertica/Aster

    Informatica

    SAS/Microstrategy/BusinessObjects

    Pentaho/Jasper

    Open source

    CloverETL/Kettle/Talend

    Jaspersoft/Pentaho

    Reporting

    Hadoop

    Apache Cassandra

    Hybrid

    ETL - Open Source and Commercial

    Analytics - Open Source or Commercial

    Commercial Hadoop Versions

    22/22

    Web Analytics
