Working of Web Search Engines

Upload: mohammed-azzan-patni

Post on 08-Apr-2018


  • 8/7/2019 Working of Web Search Engines


    A Technical Seminar

    Working of Search Engines

    Presented by

    Mohammed Azzan Patni (4MT07IS018)

    Seminar Coordinator: Ms. RITHIKA KOTIAN

    Seminar Guide: Ms. PRAJNA M

    MANGALORE INSTITUTE OF TECHNOLOGY & ENGINEERING (Affiliated to Visvesvaraya Technological University, Belgaum)

    Badaga Mijar, Mangalore - 574225, Karnataka

    2010-2011


    Agenda

    Introduction

    A Brief History of Search Engines

    Modules of a Search Engine

    Working

    Page Ranking

    Drawbacks

    Conclusion

    References


    Introduction

    A search engine is a specialized tool that helps us find information on the World Wide Web.

    A search engine is a coordinated set of programs that includes:

    A spider (also called a "crawler" or a "bot") that goes to every page, or representative pages, on every Web site that wants to be searchable and reads it, using hypertext links on each page to discover and read the site's other pages

    A program that creates a huge index (sometimes called a "catalog") from the pages that have been read

    A program that receives your search request, compares it to the entries in the index, and returns results to you. (Whatis.com, 2001.)


    Search Engines

    Larry Page and Sergey Brin


    A Brief History of Search Engines

    1st Generation (1994):

    AltaVista, Excite, Infoseek

    Ranking based on Content

    The more rare words two documents share, the more similar they are

    Documents are treated as bags of words (no effort to understand the contents)

    2nd Generation (1996):

    Lycos

    Ranking based on Content + Structure

    Site Popularity

    3rd Generation (1998):

    Google, Yahoo, Bing

    Ranking based on Content + Structure + Value

    Page Reputation

    In the Works

    Ranking based on the need behind the query
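The first-generation idea that documents sharing rare words are similar can be illustrated with a small sketch. It assumes inverse document frequency (IDF) as the measure of rarity, which the slides do not name; the function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency: rarer words get larger weights."""
    n = len(docs)
    # Count in how many documents each word appears (once per document).
    df = Counter(word for doc in docs for word in set(doc.split()))
    return {word: math.log(n / count) for word, count in df.items()}


def similarity(doc_a, doc_b, idf):
    """Sum the IDF weight of every word the two bags of words share."""
    shared = set(doc_a.split()) & set(doc_b.split())
    return sum(idf[word] for word in shared)


# Hypothetical sample documents, treated purely as bags of words.
docs = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "the stock market fell",
]
idf = idf_weights(docs)

# "cat" is rare and shared by the first two documents, so they score higher
# than the first and third, which share only the ubiquitous word "the".
print(similarity(docs[0], docs[1], idf))
print(similarity(docs[0], docs[2], idf))
```

Word order and meaning play no role here, which is exactly the "bag of words" limitation noted above.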


    Search Engine Modules:

    A document processor

    A query processor

    A search and matching function

    A ranking capability

    Summarizing and presenting documents (SERP)


    The Web is a Graph

    ANCHOR TEXT


    High Level Design Architecture of a Web Crawler

    A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. (Wikipedia)

    The behavior of a Web crawler is the outcome of a combination of policies:

    a selection policy that states which pages to download,

    a re-visit policy that states when to check for changes to the pages,

    a politeness policy that states how to avoid overloading Web sites, and

    a parallelization policy that states how to coordinate distributed Web crawlers.
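A minimal sketch of the crawl loop behind these policies, under stated assumptions: it implements only a simple selection policy (stay on the seed's host, stop after `max_pages`), and it reads pages from a hypothetical in-memory dictionary `FAKE_WEB` instead of the network so that it runs anywhere. The names `crawl`, `LinkExtractor`, and the `example.test` URLs are illustrative, not taken from the seminar.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    allowed_host = urlparse(seed).netloc  # selection policy: one host only
    frontier = deque([seed])              # URLs waiting to be downloaded
    seen = {seed}                         # URLs already queued
    order = []                            # URLs downloaded, in visit order
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)     # resolve relative links
            if urlparse(link).netloc == allowed_host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="http://other.test/">x</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

visited = crawl("http://example.test/", FAKE_WEB.get)
print(visited)
```

A real crawler would also need the other three policies: a delay between requests to the same host and respect for robots.txt (politeness), a schedule for re-fetching pages (re-visit), and a shared frontier across workers (parallelization). None of these is shown here.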


    Web Crawling


    Document Processor

    1. Normalize the document stream to a predefined format

    2. Break the document stream into desired retrievable units

    3. Isolate and meta-tag sub-document pieces

    4. Identify potential indexable elements in documents

    5. Delete stop words

    6. Stem terms

    7. Extract index entries

    8. Compute weights

    9. Create and update the main inverted file against which the search engine searches in order to match queries to documents.
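Steps 5 to 9 can be sketched in a few lines. This is an illustration under assumptions, not the processor of any particular engine: the stop-word list is a tiny hypothetical sample, `stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the weight of a term is simply its frequency in the document.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


def stem(term):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def process(text):
    """Normalize, tokenize, delete stop words, and stem one document."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to {doc_id: weight}, with term frequency as the weight."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(process(text)).items():
            index[term][doc_id] = count
    return index


# Hypothetical two-document collection.
docs = {
    1: "The crawler is crawling the web",
    2: "Search engines index web pages",
}
index = build_inverted_index(docs)
print(dict(index))
```

Looking up a query term in `index` returns the documents that contain it together with their weights, which is the matching step the inverted file exists to support.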


    Query Processing


    What happens in Google?


    Problem..!!

    Search Engines Can't READ.


    PageRank Algorithm

    A Top 10 IEEE data mining algorithm

    A PageRank results from a mathematical algorithm based on the graph formed by all pages on the World Wide Web and the links between them.

    Other link-based ranking algorithms for Web pages include the HITS algorithm invented by Jon Kleinberg (used by Teoma and now Ask.com), the IBM CLEVER project, and the TrustRank algorithm.


    In other words, the PageRank conferred by an outbound link is equal to the document's own PageRank score divided by the normalized number of outbound links L( ) (it is assumed that links to specific URLs only count once per document).

    In the general case, the PageRank value for any page u can be expressed as:

    $$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$

    i.e. the PageRank value for a page u is dependent on the PageRank values for each page v out of the set $B_u$ (this set contains all pages linking to page u), divided by the number L(v) of links from page v.
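The rule described above, where each page v passes its PageRank divided by L(v) to every page it links to, can be computed iteratively. The sketch below makes two assumptions that the slide does not state: a damping factor (0.85 is a commonly used value; setting it to 1.0 gives the undamped sum over linking pages), and an even redistribution of rank from pages that have no outbound links. The function name, graph, and page labels are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    # A set counts each target URL once per document.
    out = {p: set(links.get(p, ())) for p in pages}
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for v in pages:
            if not out[v]:
                # Assumption: a page without outbound links shares its rank evenly.
                for u in pages:
                    new_rank[u] += damping * rank[v] / n
            else:
                share = rank[v] / len(out[v])  # PR(v) / L(v)
                for u in out[v]:
                    new_rank[u] += damping * share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this example C ends up with the highest score because three pages link to it, and D with the lowest because nothing links to it.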


    PageRanking


    The Panda Update

    Google formed its definition of low quality by asking outside testers to rate sites by answering questions such as:

    Would you be comfortable giving this site your credit card?

    Would you be comfortable giving medicine prescribed by this site to your kids?

    Do you consider this site to be authoritative?

    Would it be okay if this was in a magazine?

    Does this site have excessive ads?

    If the answers pointed to low quality, the site's ranking was lowered.


    Drawbacks

    No Real-time Search Results

    Not Intelligent

    High chance of the search being misled


    Conclusions

    Search engines play an important role in accessing content on the Internet; they fetch the pages requested by the user.

    They have made the Internet and its information just a click away.

    The need for better search engines only increases.

    Search engine sites are among the most popular websites.


    References

    Wikipedia: http://en.wikipedia.org/wiki/Web_search_engine

    How Stuff Works: http://www.howstuffworks.com

    WebReference.com

    "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page

    "How a Search Engine Works" by Elizabeth Liddy: http://www.cnlp.org/publications/02HowASearchEngineWorks.pdf


    Questions

    ???


    Thank You for Your Patient Listening!