Working of Web Search Engines
-
8/7/2019 Working of Web Search Engines
Working of Search Engines
A Technical Seminar
Presented by
Mohammed Azzan Patni (4MT07IS018)
Seminar Coordinator: Ms. RITHIKA KOTIAN
Seminar Guide: Ms. PRAJNA M
MANGALORE INSTITUTE OF TECHNOLOGY & ENGINEERING (Affiliated to Visvesvaraya Technological University, Belgaum)
Badaga Mijar, Mangalore - 574225, Karnataka
2010-2011
-
Agenda
Introduction
A Brief History of Search Engines
Modules of a Search Engine
Working
Page Ranking
Drawbacks
Conclusion
References
-
Introduction
A search engine is a specialized tool that helps us find information on the World Wide Web.
A search engine is a coordinated set of programs that includes:
A spider (also called a "crawler" or a "bot") that visits every page, or representative pages, on every Web site that wants to be searchable and reads it, using the hypertext links on each page to discover and read the site's other pages.
A program that creates a huge index (sometimes called a "catalog") from the pages that have been read.
A program that receives your search request, compares it to the entries in the index, and returns results to you. (Whatis.com, 2001)
-
Search Engines
Larry Page and Sergey Brin
-
A Brief History of Search Engines
1st Generation (1994):
AltaVista, Excite, Infoseek
Ranking based on content
The more rare words two documents share, the more similar they are
Documents are treated as bags of words (no effort to understand the contents)
2nd Generation (1996):
Lycos
Ranking based on content + structure
Site popularity
3rd Generation (1998):
Google, Yahoo, Bing
Ranking based on content + structure + value
Page reputation
In the Works
Ranking based on the need behind the query
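The first-generation rule "the more rare words two documents share, the more similar they are" can be sketched in a few lines. This is a minimal sketch assuming whitespace tokenization and the common log(N/df) inverse document frequency as the measure of rarity; neither choice comes from the slide or from any particular 1994 engine. Documents are treated as bags of words, so word order is ignored.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency: words found in fewer documents get larger weights."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # count each word once per document
    return {word: math.log(n / count) for word, count in df.items()}

def shared_rare_word_score(doc_a, doc_b, idf):
    """Sum the IDF weights of the words both documents contain (bag of words)."""
    shared = set(doc_a.lower().split()) & set(doc_b.lower().split())
    return sum(idf.get(word, 0.0) for word in shared)

docs = [
    "the crawler follows links",
    "the indexer builds an inverted index",
    "the crawler builds a frontier of links",
]
idf = idf_weights(docs)
print(round(shared_rare_word_score(docs[0], docs[2], idf), 3))
print(shared_rare_word_score(docs[0], docs[1], idf))
```

The first pair shares "crawler" and "links", so it scores higher than the second pair, which shares only "the"; a word that appears in every document gets weight log(1) = 0 and contributes nothing.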
-
Search Engine Modules:
A document processor
A query processor
A search and matching function
A ranking capability
Summarizing and presenting documents (SERP)
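The query-side modules above can be illustrated against a tiny, already-built index. This is a sketch under simplifying assumptions: `INDEX`, `TITLES`, `STOP_WORDS`, `process_query`, and `search` are hypothetical names, matching is plain Boolean AND, and ranking is summed term frequency. Real engines combine many more signals.

```python
# Hypothetical toy index: term -> {doc_id: term frequency}
INDEX = {
    "search": {1: 3, 2: 1},
    "engine": {1: 2, 2: 2, 3: 1},
    "crawler": {2: 2, 3: 4},
}
TITLES = {1: "Working of search engines", 2: "Search engine crawlers", 3: "Writing a crawler"}
STOP_WORDS = {"a", "an", "the", "of", "how", "does"}

def process_query(query):
    """Query processor: lowercase, tokenize on whitespace, drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

def search(query, index, titles, k=10):
    terms = process_query(query)
    if not terms:
        return []
    # Search and matching: keep documents that contain every query term.
    postings = [set(index.get(t, {})) for t in terms]
    matches = set.intersection(*postings)
    # Ranking: higher summed term frequency first, ties broken by doc id.
    ranked = sorted(matches, key=lambda d: (-sum(index[t][d] for t in terms), d))
    # Presenting: a results page (SERP) as (doc_id, title) pairs.
    return [(d, titles[d]) for d in ranked[:k]]

print(search("the search engine", INDEX, TITLES))
```

Here "the" is dropped by the query processor, documents 1 and 2 match both remaining terms, and document 1 ranks first because its term frequencies sum to 5 versus 3.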
-
The Web is a Graph
ANCHOR TEXT
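Treating the Web as a graph means pages are nodes and hyperlinks are directed edges; the anchor text of each link can be attached to the page it points to, as Brin and Page describe in "The Anatomy of a Large-Scale Hypertextual Web Search Engine". A minimal sketch, where `LINKS`, `build_graph`, and the example URLs are hypothetical:

```python
from collections import defaultdict

# Hypothetical links: (source page, target page, anchor text of the link)
LINKS = [
    ("a.example/home", "b.example/guide", "search engine guide"),
    ("c.example/blog", "b.example/guide", "how crawlers work"),
    ("b.example/guide", "a.example/home", "home"),
]

def build_graph(links):
    """Return an adjacency list and, per target page, the anchor texts pointing at it."""
    out_links = defaultdict(list)
    anchor_text = defaultdict(list)
    for source, target, text in links:
        out_links[source].append(target)   # directed edge source -> target
        anchor_text[target].append(text)   # the text describes the target page
    return dict(out_links), dict(anchor_text)

out_links, anchor_text = build_graph(LINKS)
print(out_links["a.example/home"])
print(anchor_text["b.example/guide"])
```

Indexing anchor text this way lets a page be found for words that other sites use to describe it, even if the page itself never contains them.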
-
High-Level Design Architecture of a Web Crawler
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. (Wikipedia)
The behavior of a Web crawler is the outcome of a combination of policies:
a selection policy that states which pages to download,
a re-visit policy that states when to check for changes to the pages,
a politeness policy that states how to avoid overloading Web sites, and
a parallelization policy that states how to coordinate distributed Web crawlers.
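Two of these policies can be shown in a small breadth-first crawler. This is a sketch, not a production design: the selection policy is assumed to be FIFO (breadth-first) order with a page limit, the politeness policy is a minimum delay per host, and re-visit and parallelization policies are omitted. `fetch_links` is a hypothetical callback standing in for downloading a page and extracting its links, so the sketch runs without network access.

```python
import time
from collections import deque
from urllib.parse import urlparse

def crawl(seed_urls, fetch_links, max_pages=100, delay=1.0):
    """Breadth-first crawl; fetch_links(url) returns the URLs linked from that page."""
    frontier = deque(seed_urls)   # selection policy: visit in FIFO (breadth-first) order
    seen = set(seed_urls)         # never enqueue the same URL twice
    last_hit = {}                 # politeness policy: last request time per host
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        host = urlparse(url).netloc
        wait = last_hit.get(host, float("-inf")) + delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)      # wait until `delay` seconds have passed for this host
        last_hit[host] = time.monotonic()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical in-memory "Web" used instead of real HTTP requests.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/x", "http://b.example/"],
    "http://a.example/x": ["http://a.example/"],
    "http://b.example/": ["http://b.example/y"],
    "http://b.example/y": [],
}
print(crawl(["http://a.example/"], lambda u: FAKE_WEB.get(u, []), delay=0.0))
```

Keeping the frontier and the `seen` set separate means a page linked from many places is fetched only once; a real crawler would also honor robots.txt as part of politeness.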
-
Web Crawling
-
Document Processor
1. Normalize the document stream to a predefined format
2. Break the document stream into desired retrievable units
3. Isolate and meta-tag sub-document pieces
4. Identify potential indexable elements in documents
5. Delete stop words
6. Stem terms
7. Extract index entries
8. Compute weights
9. Create and update the main inverted file against which the search engine searches in order to match queries to documents
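Several of these steps can be strung together in a short sketch. It assumes each document is already one retrievable unit (steps 2 and 3 are skipped), uses a hypothetical stop-word list, a deliberately crude suffix-stripping `stem` in place of a real stemmer such as Porter's, and raw term frequency as the weight. The function names are illustrative only.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "on"}  # hypothetical list

def normalize(text):
    """Step 1: lowercase and replace punctuation with spaces (a simplification)."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def stem(term):
    """Step 6: toy suffix stripping, not a real stemming algorithm."""
    for suffix in ("ing", "ers", "ed", "er", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def build_inverted_file(docs):
    """Steps 4-9: tokenize, delete stop words, stem, weight by frequency, invert."""
    inverted = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = normalize(text).split()                                  # step 4
        terms = [stem(t) for t in tokens if t not in STOP_WORDS]          # steps 5-6
        for term, tf in Counter(terms).items():                           # steps 7-8
            inverted[term][doc_id] = tf                                   # step 9
    return dict(inverted)

docs = {1: "Crawlers crawl the Web.", 2: "Indexing the crawled pages"}
print(build_inverted_file(docs))
```

After stemming, "Crawlers", "crawl", and "crawled" all map to the index entry "crawl", so a query for any of them reaches both documents.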
-
Query Processing
-
What Happens in Google?
-
Problem!
Search engines can't READ.
-
PageRank Algorithm
One of the top 10 IEEE data mining algorithms
A page's PageRank results from a mathematical algorithm based on the graph formed by all pages on the WWW.
Other link-based ranking algorithms for Web pages include the HITS algorithm invented by Jon Kleinberg (used by Teoma and now Ask.com), the IBM CLEVER project, and the TrustRank algorithm.
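For comparison with PageRank, Kleinberg's HITS gives every page two scores: a hub score (it links to good authorities) and an authority score (it is linked from good hubs). This sketch runs the basic mutual-reinforcement iteration over a whole small graph; the original algorithm applies it to a query-specific subgraph. The iteration count and the Euclidean normalization are assumptions for illustration.

```python
import math

def hits(out_links, iterations=50):
    """Return (hub, authority) score dicts for a graph given as page -> list of linked pages."""
    pages = set(out_links) | {t for targets in out_links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in out_links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in out_links.get(p, [])) for p in pages}
        # Rescale both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

graph = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(graph)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In the example, "a1" is linked from both hubs and becomes the top authority, while "h1" links to both authorities and becomes the top hub.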
-
In other words, the PageRank conferred by an outbound link is equal to the document's own PageRank score divided by its number of outbound links, written L(v) for a page v (it is assumed that links to specific URLs only count once per document).
In the general case, the PageRank value for any page u can be expressed as:
$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$
i.e. the PageRank value for a page u depends on the PageRank values of each page v in the set B_u (the set of all pages linking to page u), each divided by the number L(v) of links from page v.
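The formula above can be computed by repeated substitution (power iteration). This sketch uses the damped variant commonly applied in practice, PR(u) = (1 - d)/N + d * sum over v in B_u of PR(v)/L(v), with d = 0.85 as suggested by Brin and Page; the damping term is an addition beyond the simplified formula on the slide. Spreading the score of pages with no outbound links evenly over all pages is a further assumption, chosen so the scores always sum to 1.

```python
def pagerank(out_links, d=0.85, iterations=100, tol=1e-10):
    """Damped PageRank by power iteration; out_links maps page -> list of linked pages."""
    pages = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(pages)
    # Links to the same URL count only once per page, as the slide assumes.
    links = {p: sorted(set(out_links.get(p, []))) for p in pages}
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages without outbound links is shared by all pages (assumption).
        dangling = sum(pr[p] for p in pages if not links[p])
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for v in pages:
            for u in links[v]:
                new[u] += d * pr[v] / len(links[v])   # PR(v) / L(v), damped
        converged = sum(abs(new[p] - pr[p]) for p in pages) < tol
        pr = new
        if converged:
            break
    return pr

scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({page: round(score, 3) for page, score in scores.items()})
```

In this three-page example, C receives links from both A and B and ends up with the highest score, A (which receives all of C's score) comes next, and B is lowest.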
-
PageRanking
-
The Panda Update
Google formed its definition of low quality by asking outside testers to rate sites by answering questions such as:
Would you be comfortable giving this site your credit card?
Would you be comfortable giving medicine prescribed by this site to your kids?
Do you consider this site to be authoritative?
Would it be okay if this was in a magazine?
Does this site have excessive ads?
If the answers pointed to low quality (for example, "no" to the trust questions or "yes" to excessive ads), the site's ranking was to decrease.
-
Drawbacks
No Real-time Search Results
Not Intelligent
Greater chance of misleading search results
-
Conclusions
Search engines play an important role in accessing content over the internet: they fetch the pages requested by the user.
They have put the internet and its information just a click away.
The need for better search engines only increases.
Search engine sites are among the most popular websites.
-
References
Wikipedia, "Web search engine", http://en.wikipedia.org/wiki/Web_search_engine
How Stuff Works, http://www.howstuffworks.com
WebReference.com
Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
Elizabeth Liddy, "How a Search Engine Works", http://www.cnlp.org/publications/02HowASearchEngineWorks.pdf
-
Questions
???
-
Thank you for listening patiently!