map reduce intro 130424032255 phpapp01

Upload: imran-khan

Post on 03-Jun-2018


  • 8/12/2019 Map Reduce Intro 130424032255 Phpapp01

    1/64

    MapReduce Intro

    The MapReduce Programming Model

    Introduction and Examples

    Dr. Jose María Alvarez-Rodríguez

    Quality Management in Service-based Systems and Cloud Applications

    FP7 RELATE-ITN

    South East European Research Center

    Thessaloniki, 10th of April, 2013


    MapReduce in a nutshell

    Features

    A programming model...

    1 Large-scale distributed data processing

    2 Simple but restricted

    3 Parallel programming

    4 Extensible


    Antecedents

    Functional programming

    1 Inspired

    2 ...but not equivalent

    Example in Python

    Given a list of numbers between 1 and 50, print only the even numbers

    # Python 3: filter returns an iterator, so wrap it in list(); range(1, 51) includes 50
    print(list(filter(lambda x: x % 2 == 0, range(1, 51))))

    A list of numbers (data)

    A condition (even numbers)

    A function filter that is applied to the list (map)


    ...Other examples...

    Example in Python

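    Return the sum of the squares of a list of numbers between 1 and 50

    import operator
    from functools import reduce  # Python 3: reduce lives in functools

    # range(1, 51) includes 50; 0 is the initial value of the accumulator
    reduce(operator.add, map(lambda x: x ** 2, range(1, 51)), 0)

    reduce is equivalent to foldl in other functional languages such as Haskell

    Other mathematical considerations should be taken into account, such as the kind of operator (for example, whether it is associative)...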
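    The remark about the kind of operator can be made concrete with a small sketch. Assuming Python 3, it computes the sum of squares sequentially and then by reducing independent chunks and combining the partial results, which is only valid because addition is associative. The function names and the chunk size are illustrative assumptions, not part of the original slides.

```python
import operator
from functools import reduce


def sum_of_squares(numbers):
    """Sequential version: map each number to its square, then fold with +."""
    return reduce(operator.add, map(lambda x: x ** 2, numbers), 0)


def chunked_sum_of_squares(numbers, chunk_size=10):
    """Reduce each chunk independently, then reduce the partial results.

    This imitates what a parallel framework could do. It gives the same
    answer as the sequential version only because addition is associative.
    """
    numbers = list(numbers)
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]
    partials = [sum_of_squares(chunk) for chunk in chunks]
    return reduce(operator.add, partials, 0)


if __name__ == "__main__":
    data = range(1, 51)
    print(sum_of_squares(data))
    print(chunked_sum_of_squares(data))
```

    Both calls return the same value for the numbers 1 to 50. With a non-associative operator such as subtraction, the chunked version would generally disagree with the sequential one.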


    Some interesting points...

    The MapReduce framework...

    1 Is inspired by functional programming concepts (but is not equivalent)

    2 Targets problems that can be parallelized

    3 Sometimes involves recursive solutions

    4 ...


    Basic Model

    MapReduce: The Programming Model and Practice, SIGMETRICS, Tutorials 2009, Google.


    Map Function

    Figure: Mapping creates a new output list by applying a function to individual elements of an input list.

    Module 4: MapReduce, Hadoop Tutorial, Yahoo!.


    Reduce Function

    Figure: Reducing a list iterates over the input values to produce an aggregate value as output.

    Module 4: MapReduce, Hadoop Tutorial, Yahoo!.


    MapReduce Flow

    Figure: High-level MapReduce pipeline.

    Module 4: MapReduce, Hadoop Tutorial, Yahoo!.


    Figure: Detailed Hadoop MapReduce data flow.
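    Since the figures are not reproduced here, the flow can be sketched in code. The following is a minimal single-process imitation of the map, shuffle and reduce stages, assuming Python 3. The names `run_mapreduce`, `letter_mapper` and `sum_reducer` are hypothetical and are not Hadoop APIs; a real framework distributes these stages across machines.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a single-process imitation of the map, shuffle and reduce stages."""
    # Map stage: every input record may emit zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle stage: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: one reducer call per key, keys visited in sorted order.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def letter_mapper(line):
    """Emit (letter, 1) for every alphabetic character in the line."""
    return [(ch.lower(), 1) for ch in line if ch.isalpha()]


def sum_reducer(key, values):
    """Emit the total count for one key."""
    return [(key, sum(values))]


if __name__ == "__main__":
    print(run_mapreduce(["ab", "ba", "c"], letter_mapper, sum_reducer))
```

    Counting letters is used as the example because only the mapper and the reducer carry problem-specific logic; the grouping step in the middle is generic.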


    Tip

    What is MapReduce?

    It is a framework inspired by functional programming to tackle problems in which steps can be parallelized by applying a divide-and-conquer approach.


    Thinking in MapReduce

    When should I use MapReduce?

    Query

    Index and Search: inverted index

    Filtering

    Classification

    Recommendations: clustering or collaborative filtering

    Analytics

    Summarization and statistics

    Sorting and merging

    Frequency distribution

    SQL-based queries: group-by, having, etc.

    Generation of graphics: histograms, scatter plots.

    Others

    Message passing, such as Breadth-First Search or PageRank algorithms.


    How Google uses MapReduce (80% of data processing)

    Large-scale web search indexing

    Clustering problems for Google News

    Produce reports for popular queries, e.g. Google Trends

    Processing of satellite imagery data

    Language model processing for statistical machine translation

    Large-scale machine learning problems

    . . .


    Comparison of MapReduce and other approaches

    MapReduce: The Programming Model and Practice, SIGMETRICS, Tutorials 2009, Google.


    Evaluation of MapReduce and other approaches

    MapReduce: The Programming Model and Practice, SIGMETRICS, Tutorials 2009, Google.


    Apache Hadoop

    MapReduce definition

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

    Figure: Apache Hadoop Logo.


    Tip

    What can I do in MapReduce?

    Three main functions:

    1 Querying

    2 Summarizing

    3 Analyzing

    . . . large datasets in off-line mode for boosting other on-line processes.


    Applying MapReduce

    MapReduce in Action

    MapReduce Patterns

    1 Summarization

    2 Filtering

    3 Data Organization (sort, merging, etc.)

    4 Relational-based (join, selection, projection, etc.)

    5 Iterative Message Passing (graph processing)

    6 Others (depending on the implementation):

    Simulation of distributed systems

    Cross-correlation

    Metapatterns

    Input-output

    . . .


    Overview (stages)-Counting Letters


    Summarization

    Types

    1 Numerical summarizations

    2 Inverted index

    3 Counting and counters
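    Of the three types, the inverted index is the one the slides do not illustrate further, so a minimal sketch may help. Assuming Python 3 and documents given as a dictionary from identifier to text, the mapper emits (term, document id) pairs and the reducer collects the identifiers per term. All function names here are hypothetical, and the in-memory grouping stands in for the framework's shuffle.

```python
from collections import defaultdict


def map_inverted_index(doc_id, text):
    """Emit (term, doc_id) for each distinct term in the document."""
    return [(term, doc_id) for term in set(text.lower().split())]


def reduce_inverted_index(term, doc_ids):
    """Emit the term with the sorted list of documents that contain it."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents):
    """documents: dict mapping doc_id -> text. Returns term -> [doc_id, ...]."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for term, emitted_id in map_inverted_index(doc_id, text):
            grouped[term].append(emitted_id)
    return dict(reduce_inverted_index(term, ids)
                for term, ids in grouped.items())
```

    For example, `build_inverted_index({"d1": "map reduce", "d2": "reduce only"})` maps "reduce" to both documents and the other two terms to one document each.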


    Numerical Summarization-I

    Description

    A general pattern for calculating aggregate statistical values over your data.

    Intent

    Group records together by a key field and calculate a numerical aggregate per group to get a top-level view of the larger data set.


    Numerical Summarization-II

    Applicability

    To deal with numerical data or counting.

    To group data by specific fields

    Examples

    1 Word count

    2 Record count

    3 Min/Max/Count

    4 Average/Median/Standard deviation

    5 . . .


    Numerical Summarization-Pseudocode

    class Mapper
        method Map(recordid id, record r)
            for all term t in record r do
                Emit(term t, count 1)

    class Reducer
        method Reduce(term t, counts [c1, c2,...])
            sum = 0
            for all count c in [c1, c2,...] do
                sum = sum + c
            Emit(term t, count sum)
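    The pseudocode translates almost line by line into Python. The sketch below assumes Python 3 and whitespace-separated terms; `word_count` plays the role of the framework by grouping the mapper output before calling the reducer. The function names are illustrative, not part of any MapReduce library.

```python
from collections import defaultdict


def map_word_count(record_id, record):
    """Mapper: emit (term, 1) for every whitespace-separated term."""
    for term in record.split():
        yield term, 1


def reduce_word_count(term, counts):
    """Reducer: sum all the counts emitted for one term."""
    total = 0
    for count in counts:
        total += count
    return term, total


def word_count(records):
    """Group mapper output by term (the shuffle) and reduce each group."""
    grouped = defaultdict(list)
    for record_id, record in enumerate(records):
        for term, count in map_word_count(record_id, record):
            grouped[term].append(count)
    return dict(reduce_word_count(term, counts)
                for term, counts in grouped.items())
```

    Calling `word_count(["to be or not to be"])` yields a count of 2 for "to" and "be" and 1 for "or" and "not".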


    Overview-Word Counter


    Numerical Summarization-Word Counter

    // word (a Text) and one (an IntWritable holding 1) are presumably fields
    // of the enclosing Mapper class; they are not shown on the slide.
    // Hadoop's Mapper.map declares IOException and InterruptedException, so an
    // override cannot declare the broader Exception.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one); // emit (term, 1)
        }
    }

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum)); // emit (term, total)
    }


    Example-II

    Min/Max

    Given a list of tweets (username, date, text), determine the first and last time a user commented and the number of comments.

    Implementation

    See https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro


    Overview - Min/Max

    In the map phase the min and max creation dates are the same, because each emitted tuple describes a single tweet.


    Example II-Min/Max, function Map

    // outUserId and outTuple are presumably reusable fields of the Mapper
    // class; they are not shown on the slide.
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed = MRDPUtils.parse(value.toString());
        String strDate = parsed.get(MRDPUtils.CREATION_DATE);
        String userId = parsed.get(MRDPUtils.USER_ID);
        if (strDate == null || userId == null) {
            return;
        }
        Date creationDate;
        try {
            creationDate = MRDPUtils.frmt.parse(strDate);
        } catch (ParseException e) {
            // An override of Mapper.map cannot declare the checked
            // ParseException, so records with a malformed date are skipped.
            return;
        }
        // A single tweet is both the earliest and the latest seen so far.
        outTuple.setMin(creationDate);
        outTuple.setMax(creationDate);
        outTuple.setCount(1);
        outUserId.set(userId);
        context.write(outUserId, outTuple);
    }


    Example II-Min/Max, function Reduce

    public void reduce(Text key, Iterable<MinMaxCountTuple> values,
            Context context) throws IOException, InterruptedException {
        result.setMin(null);
        result.setMax(null);
        int sum = 0;
        for (MinMaxCountTuple val : values) {
            if (result.getMin() == null
                    || val.getMin().compareTo(result.getMin()) < 0) {
                result.setMin(val.getMin());
            }
            if (result.getMax() == null
                    || val.getMax().compareTo(result.getMax()) > 0) {
                result.setMax(val.getMax());
            }
            sum += val.getCount();
        }
        result.setCount(sum);
        context.write(key, result);
    }
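    The same Min/Max/Count logic can be sketched without Hadoop to make the data flow easier to follow. This version assumes Python 3 and tweets represented as (username, date, text) tuples with `datetime` dates; the function names are hypothetical and the grouping by user stands in for the shuffle.

```python
from collections import defaultdict
from datetime import datetime


def map_min_max_count(tweet):
    """Emit (user, (min_date, max_date, count)); min and max start equal."""
    user, date, _text = tweet
    return user, (date, date, 1)


def reduce_min_max_count(user, tuples):
    """Merge partial (min, max, count) tuples for one user."""
    first = min(t[0] for t in tuples)
    last = max(t[1] for t in tuples)
    total = sum(t[2] for t in tuples)
    return user, (first, last, total)


def min_max_count(tweets):
    """Group the mapper output by user and reduce each group."""
    grouped = defaultdict(list)
    for tweet in tweets:
        user, value = map_min_max_count(tweet)
        grouped[user].append(value)
    return dict(reduce_min_max_count(user, values)
                for user, values in grouped.items())
```

    Because the reducer consumes and produces the same (min, max, count) shape, it could in principle also merge partial results computed earlier.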


    Example-III

    Average

    Given a list of tweets (username, date, text), determine the average comment length per hour of day.

    Implementation

    See https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro


    Overview - Average


    Example III-Average, function Map

    // outHour and outCountAverage are presumably reusable fields of the
    // Mapper class; they are not shown on the slide.
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed = MRDPUtils.parse(value.toString());
        String strDate = parsed.get(MRDPUtils.CREATION_DATE);
        String text = parsed.get(MRDPUtils.TEXT);
        if (strDate == null || text == null) {
            return;
        }
        Date creationDate;
        try {
            creationDate = MRDPUtils.frmt.parse(strDate);
        } catch (ParseException e) {
            // An override of Mapper.map cannot declare the checked
            // ParseException, so records with a malformed date are skipped.
            return;
        }
        // Date.getHours() is deprecated but kept as on the slide.
        outHour.set(creationDate.getHours());
        outCountAverage.setCount(1);
        outCountAverage.setAverage(text.length());
        context.write(outHour, outCountAverage);
    }


    Example III-Average, function Reduce

    public void reduce(IntWritable key, Iterable<CountAverageTuple> values,
            Context context) throws IOException, InterruptedException {
        float sum = 0;
        float count = 0;
        for (CountAverageTuple val : values) {
            sum += val.getCount() * val.getAverage();
            count += val.getCount();
        }
        result.setCount(count);
        result.setAverage(sum / count);
        context.write(key, result);
    }
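    The reducer multiplies each average by its count before summing, which is what allows partial averages to be merged correctly. A minimal sketch of that idea, assuming Python 3 and comments given as (hour, text) pairs; the function names are hypothetical.

```python
from collections import defaultdict


def map_average(hour, text):
    """Emit (hour, (count, average)) for one comment; the average of a
    single comment is simply its length."""
    return hour, (1, float(len(text)))


def merge_averages(partials):
    """Combine (count, average) pairs into one weighted (count, average)."""
    total = 0.0
    count = 0
    for partial_count, partial_average in partials:
        total += partial_count * partial_average
        count += partial_count
    return count, total / count


def average_length_per_hour(comments):
    """comments: iterable of (hour, text). Returns hour -> (count, average)."""
    grouped = defaultdict(list)
    for hour, text in comments:
        key, value = map_average(hour, text)
        grouped[key].append(value)
    return {hour: merge_averages(values) for hour, values in grouped.items()}
```

    Averaging the averages directly would be wrong whenever the groups have different sizes: merging (2, 3.0) with (1, 6.0) must give an average of 4.0, not 4.5.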


    Numerical Summarization-Other approaches

    Relation to SQL

    SELECT MIN(numcol1), MAX(numcol1), COUNT(*) FROM table GROUP BY groupcol2;

    Implementation in PIG

    b = GROUP a BY groupcol2;
    c = FOREACH b GENERATE group, MIN(a.numcol1), MAX(a.numcol1), COUNT_STAR(a);


    Filtering

    Types

    1 Filtering

    2 Top N records

    3 Bloom filtering

    4 Distinct


    Filtering-I

    Description

    It evaluates each record separately and decides, based on some condition, whether it should stay or go.

    Intent

    Filter out records that are not of interest and keep ones that are.


    Filtering-II

    Applicability

    To collate data

    Examples

    1 Closer view of dataset

    2 Data cleansing

    3 Tracking a thread of events

    4 Simple random sampling

    5 Distributed Grep

    6 Removing low-scoring data

    7 Log Analysis

    8 Data Querying

    9 Data Validation

    10 . . .


    Filtering-Pseudocode

    class Mapper
        method Map(recordid id, record r)
            field f = extract(r)
            if predicate(f)
                Emit(recordid id, value(r))

    class Reducer
        method Reduce(recordid id, values [r1, r2,...])
            //Whatever
            Emit(recordid id, aggregate(values))
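    The mapper half of this pseudocode is enough to filter. A minimal sketch, assuming Python 3 and a predicate passed in as a function; `map_filter` and `run_filter` are hypothetical names, and the reducer is omitted on purpose.

```python
def map_filter(record_id, record, predicate):
    """Mapper: emit the record unchanged if it satisfies the predicate."""
    if predicate(record):
        yield record_id, record


def run_filter(records, predicate):
    """Apply the mapper to every record; no reducer is needed to filter."""
    output = []
    for record_id, record in enumerate(records):
        output.extend(map_filter(record_id, record, predicate))
    return output
```

    For instance, `run_filter(lines, lambda r: r.startswith("ERROR"))` keeps only the log lines that start with "ERROR", each paired with its position in the input.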


    Example-IV

    Distributed Grep

    Given a list of tweets (username, date, text), determine the tweets that contain a word.

    Implementation

    See https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro


    Overview - Distributed Grep


    Example IV-Distributed Grep, function Map

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed = MRDPUtils.parse(value.toString());
        String txt = parsed.get(MRDPUtils.TEXT);
        if (txt == null) {
            return; // nothing to match against
        }
        // The search term is read from the job configuration key "mapregex".
        String mapRegex = ".*\\b"
                + context.getConfiguration().get("mapregex") + "(.)*\\b.*";
        if (txt.matches(mapRegex)) {
            // Matching records are written out as they are.
            context.write(NullWritable.get(), value);
        }
    }

    ...and the Reduce function?

    In this case it is not necessary: the map output values are written directly to the output.
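    A map-only grep can be sketched in a few lines. This version assumes Python 3 and tweets as (username, date, text) tuples, and it matches the search term as a whole word using `\b` boundaries, which is slightly stricter than the pattern built in the Java mapper (that pattern also accepts words that merely start with the term). The function names are hypothetical.

```python
import re


def make_grep_mapper(word):
    """Build a mapper that keeps tweets whose text contains `word` as a
    whole word. re.escape makes the search term match literally."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")

    def mapper(tweet):
        _user, _date, text = tweet
        if text is not None and pattern.search(text):
            yield tweet

    return mapper


def distributed_grep(tweets, word):
    """Map-only job: concatenate whatever each mapper call emits."""
    mapper = make_grep_mapper(word)
    return [match for tweet in tweets for match in mapper(tweet)]
```

    Compiling the pattern once per mapper, rather than once per record, avoids rebuilding the same regular expression for every tweet.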


    Example-V

    Top 5

    Given a list of tweets (username, date, text), determine the 5 users who wrote the longest tweets.

    Implementation

    See https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro


    Overview - Top 5


    Example V-Top 5, function Map

    // Sorted by tweet length; MAX_TOP is presumably a constant defined elsewhere.
    private TreeMap<Integer, Text> repToRecordMap = new TreeMap<Integer, Text>();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed = MRDPUtils.parse(value.toString());
        if (parsed == null) { return; }
        String userId = parsed.get(MRDPUtils.USER_ID);
        String text = parsed.get(MRDPUtils.TEXT);
        // Check for null before calling length() to avoid a NullPointerException.
        if (userId == null || text == null) { return; }
        // The "reputation" is the tweet length: longer tweets rank higher.
        int reputation = text.length();
        // Note: tweets of equal length share a key, so only the latest is kept.
        repToRecordMap.put(reputation, new Text(value));
        if (repToRecordMap.size() > MAX_TOP) {
            // Drop the shortest tweet once more than MAX_TOP are stored.
            repToRecordMap.remove(repToRecordMap.firstKey());
        }
        // The collected records are presumably emitted later (for example in
        // cleanup()); that part is not shown on the slide.
    }


    Example V-Top 5, function Reduce

    public void reduce(NullWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            Map<String, String> parsed = MRDPUtils.parse(value.toString());
            repToRecordMap.put(parsed.get(MRDPUtils.TEXT).length(),
                    new Text(value));
            if (repToRecordMap.size() > MAX_TOP) {
                repToRecordMap.remove(repToRecordMap.firstKey());
            }
        }
        for (Text t : repToRecordMap.descendingMap().values()) {
            context.write(NullWritable.get(), t);
        }
    }
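    The two-level structure of the Top N pattern (each mapper keeps its local best, one reducer merges them) can be sketched as follows. This assumes Python 3 and tweets as (username, date, text) tuples, and it swaps the sorted map for `heapq.nlargest`, so tweets of equal length are all kept as candidates. The function names are hypothetical.

```python
import heapq


def local_top_n(tweets, n=5):
    """Mapper side: keep only the n longest tweets seen by this mapper."""
    return heapq.nlargest(n, tweets, key=lambda tweet: len(tweet[2]))


def global_top_n(partial_tops, n=5):
    """Reducer side: merge the local candidates and keep the n longest."""
    candidates = [tweet for top in partial_tops for tweet in top]
    return heapq.nlargest(n, candidates, key=lambda tweet: len(tweet[2]))
```

    Each input split contributes at most n candidates, so the single reducer only has to look at n records per mapper instead of the whole data set.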


    Filtering-Other approaches

    Relation to SQL

    SELECT * FROM table WHERE colvalue < VALUE;

    Implementation in PIG

    b = FILTER a BY colvalue < VALUE;


    Tip

    How can I use and run a MapReduce framework?

    You should identify what kind of problem you are addressing and apply a design pattern to be implemented in a framework such as Apache Hadoop.


    Success Stories with MapReduce


    Tip

    Who is using MapReduce?

    All companies that are dealing with Big Data problems for analytics, such as:

    Cloudera

    Datasalt

    Elasticsearch

    . . .


    Apache Hadoop-Related Projects


    More tips

    FAQ

    MapReduce is a framework based on a simple programming model

    ...to deal with large datasets in a distributed fashion

    ...providing scalability, replication, fault tolerance, etc.

    Apache Hadoop is not a database

    New frameworks on top of Hadoop for specific tasks: querying, analysis, etc.

    Other similar frameworks: Storm, Signal/Collect, etc.

    . . .


    Summary and Conclusions


    Summary


    Conclusions

    What is MapReduce?

    It is a framework inspired by functional programming to tackle problems in which steps can be parallelized by applying a divide-and-conquer approach.

    What can I do in MapReduce?

    Three main functions:

    1 Querying

    2 Summarizing

    3 Analyzing

    . . . large datasets in off-line mode for boosting other on-line processes.

    How can I use and run a MapReduce framework?

    You should identify what kind of problem you are addressing and apply a design pattern to be implemented in a framework such as Apache Hadoop.


    What's next?

    . . .

    Concatenate MapReduce jobs

    Optimization using combiners and setting the parameters (size of partition, etc.)

    Pipelining with other languages such as Python

    Hadoop in Action: more examples, etc.

    New trending problems (image/video processing)

    Real-time processing

    . . .
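    As an illustration of the combiner optimization mentioned above, the following sketch pre-aggregates word counts inside each mapper before anything is sent to the reducer. It assumes Python 3 and input already divided into per-mapper lists of lines; the function names are hypothetical and this is not how Hadoop configures a combiner, only the idea behind it.

```python
from collections import Counter, defaultdict


def map_with_combiner(lines):
    """One mapper: count words locally (the combiner step) before emitting,
    so each distinct word is sent once per mapper instead of once per use."""
    local_counts = Counter()
    for line in lines:
        local_counts.update(line.split())
    return list(local_counts.items())


def reduce_counts(all_mapper_outputs):
    """Sum the partial counts produced by every mapper."""
    totals = defaultdict(int)
    for mapper_output in all_mapper_outputs:
        for word, partial_count in mapper_output:
            totals[word] += partial_count
    return dict(totals)
```

    Local aggregation is safe here because summing counts is associative and commutative; it reduces the amount of intermediate data without changing the final totals.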


    References

    J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, Jan. 2008.

    Jonathan R. Owens, Jon Lentz, and Brian Femiano. Hadoop Real-World Solutions Cookbook. Packt Publishing Ltd, 2013.

    C. Lam. Hadoop in Action. Manning Publications Co., Greenwich, CT, USA, 1st edition, 2010.

    J. Lin and C. Dyer. Data-intensive text processing with MapReduce. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, NAACL-Tutorials '09, pages 1-2, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.

    D. Miner and A. Shook. MapReduce Design Patterns. O'Reilly and Associates Inc, 2012.

    S. Perera and T. Gunarathne. Hadoop MapReduce Cookbook. Packt Publishing Ltd, 2013.

    T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 1st edition, 2009.

    I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.