Google Search & PageRank: A Case Study

Upload: gaurav

Post on 08-Apr-2018


  • 8/7/2019 Google Search & PageRank: A Case Study

    1/19

    GOOGLE SEARCH & PAGERANK: A CASE STUDY

    Presented by: Gaurav Kumar Srivastava

    OVERVIEW

    World Wide Web

    Google on Web

    How Google works

    Google search

    PageRank algorithm

    Issues in PageRank

    Conclusion

    WORLD WIDE WEB

    The WWW is HUGE. Approximate estimates:

    ~50 million active web sites

    ~25 billion web pages

    ~1 billion users

    There are a large number of search engines too:

    At least 3,105 search engines

    GOOGLE ON WEB

    HOW GOOGLE WORKS?

    What happens when I enter a search at Google?

    GOOGLE SEARCH

    Google consists of three distinct parts with fast parallel processing.

    Googlebot, a web crawler that finds and fetches web pages.

    The Indexer, which sorts every word on every page and stores the resulting index of words in a huge database.

    The Query Processor, which compares your search query to the index and recommends the documents that it considers most relevant.
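
The Indexer and Query Processor roles described above can be illustrated with a toy inverted index. This is only a minimal sketch, not Google's implementation: the function names `build_index` and `search`, the whitespace tokenization, and the sample pages are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Indexer sketch: map each lowercase word to the ids of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Query processor sketch: return ids of pages that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages matching the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for what a crawler would have fetched.
pages = {
    "p1": "indian restaurants in toronto",
    "p2": "italian restaurants in toronto",
    "p3": "indian cooking recipes",
}
index = build_index(pages)
print(sorted(search(index, "indian restaurants")))
```

A real index would also store word positions and handle punctuation and stemming; the sketch only shows why a lookup by word is fast once the index exists.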

    SEARCH ENGINE ARCHITECTURE

    TECHNOLOGY ASPECT

    ANATOMY OF A SEARCH ENGINE

    [Diagram. Components shown: WWW, Crawler Module, Page Repository, Indexing Module, Indexes, Query Module, Ranking Module, User Query, Results]

    RANKING MODULE

    Key is to find those pages that the user desires

    Takes a set of relevant web pages and ranks them

    Rank is generally a function of:

    Content Score &

    Popularity Score (the focus of this talk)

    E.g. What are some good Indian restaurants in Toronto?
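
As a rough illustration of rank being a function of a content score and a popularity score, the sketch below mixes the two with a weighted sum. The slide does not say how the scores are combined, so the linear formula, the `weight` parameter, the function name `rank_pages`, and the sample scores are all hypothetical.

```python
def rank_pages(content_scores, popularity_scores, weight=0.5):
    """Order page ids by a weighted mix of two scores (assumed linear formula)."""
    combined = {
        page: weight * content_scores[page]
        + (1 - weight) * popularity_scores.get(page, 0.0)
        for page in content_scores
    }
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores in [0, 1] for three pages that already matched the query.
content = {"a": 0.9, "b": 0.6, "c": 0.3}
popularity = {"a": 0.1, "b": 0.8, "c": 0.5}
print(rank_pages(content, popularity))
```

Setting `weight` to 1.0 ranks purely by content, and 0.0 ranks purely by popularity, which makes the trade-off between the two scores easy to see.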

    PIGEONRANK

    PigeonRank Overview

    The heart of Google's search technology is PigeonRank, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University.

    Low-cost pigeon clusters (PCs) could be used to compute the relative value of web pages faster than human editors or machine-based algorithms.

    PAGERANK ALGORITHM

    PageRank algorithm, given by Sergey Brin and Larry Page in 1998

    Exploits the link structure of the web for computing popularity

    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

    PR(A) is the PageRank of page A,

    PR(Ti) is the PageRank of pages Ti which link to page A,

    C(Ti) is the number of outbound links on page Ti and

    d is a damping factor which can be set between 0 and 1.
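
The formula above can be evaluated by simple repeated substitution. The following is a minimal sketch under stated assumptions: every page has at least one outbound link and no duplicate links, all pages start at 1.0, d = 0.85 is a commonly quoted value rather than one given in the slides, and the three-page graph is made up.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T whose outbound links include this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Each pass recomputes every page from the previous pass's values, so the recursive definition is resolved by iteration rather than by solving the equations directly.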

    PAGERANK DEFINITION (CONT.)

    So, first of all, we see that PageRank does not rank web sites as a whole, but is determined for each page individually.

    Further, the PageRank of page A is recursively defined by the PageRanks of those pages which link to page A.

    The Characteristics of PageRank

    Average PageRank of a web page is 1.

    The minimum PageRank of a page is given by (1-d).

    Therefore, there is a maximum PageRank for a page, which is given by dN + (1-d), where N is the total number of web pages.
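
These three characteristics follow from the formula PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)). The derivation below is a sketch that assumes every page has at least one outbound link; the extreme-case graph used for the maximum is an assumption chosen for illustration.

```latex
% Average: sum the formula over all N pages. Each page T hands PR(T)/C(T)
% to each of its C(T) link targets, so the link terms add up to the total again.
\begin{align*}
\sum_{A} PR(A) &= N(1-d) + d \sum_{T} PR(T)
  \;\Rightarrow\; \sum_{A} PR(A) = N
  \;\Rightarrow\; \text{average} = 1 \\[4pt]
% Minimum: a page with no inbound links receives nothing from the sum.
PR_{\min} &= 1 - d \\[4pt]
% Maximum (assumed extreme case): A links only to itself and the other
% N-1 pages link only to A, so each of them has PageRank 1-d.
PR(A) &= (1-d) + d\bigl(PR(A) + (N-1)(1-d)\bigr)
  \;\Rightarrow\; PR(A) = 1 + d(N-1) = dN + (1-d)
\end{align*}
```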

    PAGERANK: ISSUES

    PageRank is quite sensitive to small changes in the web. Google computes PageRank from scratch every month!

    PageRank is query independent. It means that being better linked is more important than containing the search terms!
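
The sensitivity to small changes can be seen on a toy graph by adding a single link and recomputing. This is only a sketch: the graph, d = 0.85, and the fixed iteration count are assumptions, and every page is assumed to have at least one outbound link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to all pages."""
    # Invert the graph once: for each page, list the pages that link to it.
    inbound = {page: [] for page in links}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d)
            + d * sum(pr[src] / len(links[src]) for src in inbound[page])
            for page in links
        }
    return pr


# Hypothetical web before and after page B adds one extra link to page A.
before = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
after = pagerank({"A": ["B", "C"], "B": ["C", "A"], "C": ["A"]})
top_before = max(before, key=before.get)
top_after = max(after, key=after.get)
print(top_before, top_after)
```

In this toy graph the single added link is enough to change which page ranks highest, which is why a full recomputation after the web changes matters.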

    CONCLUSION

    Google PageRank is probably one of the most important algorithms ever developed for the Web.

    Most of Google's popularity is credited to its preferred form of search engine optimization, a trademarked program Google dubbed PageRank. When PageRank was patented, the patent was assigned to Stanford University.

    PageRank, only one of hundreds of factors used by Google to determine best search results, helps to keep our search clean and efficient.

    QUERIES

    ?

    THANK YOU & KEEP GOOGLING