4)search engine
8/13/2019 4)Search Engine
1/9
Assignment 4
Title: Evaluate a popular search engine (e.g. Google) on the basis of
relevancy.
Theory:
Introduction
The Web can be used as a quick and direct reference for information of almost
any kind from all over the world. However, information found on the Web needs to
be filtered, since it may include large volumes of misinformation or irrelevant
material. Internet users may not be aware of the many search engines available
for finding information on a topic quickly, and they may use different search
strategies. Finding useful information quickly on the Internet is a challenge for
both ordinary users and information professionals. Although the performance of
currently available search engines has been improving continuously, with powerful
search capabilities of various types, the lack of comprehensive coverage, the
inability to predict the quality of retrieved results, and the absence of
controlled vocabularies make it difficult for users to use search engines
effectively. The use of the Internet as an information resource needs to be
carefully evaluated, as no traditional quality standards or controls have been
applied to the Web. Librarians need to be able to give their clientele informed
recommendations on the selection of search engines and on effective search
strategies.
To evaluate an IR system is to measure how well the system meets the
information needs of its users. This is difficult, because the same result set
may be interpreted differently by different users. To deal with this problem,
metrics have been defined that, on average, correlate with the preferences of a
group of users. Without proper retrieval evaluation, one cannot determine how
well the IR system is performing, or objectively compare its performance with
that of other systems. Retrieval evaluation is a critical and integral
component of any modern IR system.
Searching the web and how of it?
The world is all over the web; the volume of transactions on the World Wide
Web (WWW) seems to justify this statement. This century has brought the whole
shopping experience from physical stores to the wires. During the last decade we
have seen an extreme shift in the online shopping paradigm, and the number of
companies that provide online shopping facilities has increased tremendously.
Users' patience threshold is falling day by day: they want their demands met in
a single click, while sellers want to push the best of their products.
These days, one of the most challenging tasks is to provide the most relevant
and meaningful search results to the customer. Normally, when users search for
a product, most of them go through only the top 10 to 15 results. So, if the
right product does not show up within this range of results, there is a high
probability that the vendor will lose out to a competitor.
What is Relevancy? And its importance
With the tremendous increase in the amount of data on the web, it has become
really tough to manage data in a way that presents the user with the most
accurate search results. The right information means a better business return
on investment. In simple words, relevancy can be defined as simplicity and
usefulness. If a specific bit of information is useful to users and they can
reach it without much effort, then that is what relevancy is.
Just as a web designer aims to build a site that captures users' attention
within seconds, the search results on the site decide whether a user will be
interested in the site or not. A user's engagement with the search experience
shapes their behavior on the site, including the likelihood of making a purchase
or completing a transaction. This calls for studying and analyzing the behavior
of site users: their action paths, decision points, interest areas and so on.
Relevancy means the relationship between things or events.
Measurements for evaluation
Precision of Search Engines
After a search, the user retrieves some relevant information and some
irrelevant information. The precision of a search engine measures how
accurately it retrieves the right information.
Precision is the fraction of the retrieved documents (the set A) which is
relevant, i.e.,
$\text{Precision} = \frac{|R_a|}{|A|}$
Consider, for an information request I,
R: the set of relevant documents
A: the answer set for I, generated by an IR system
$R_a$: the intersection of the sets R and A
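As a minimal sketch of the definition above, the following Python computes precision from two sets of document identifiers. The function names `precision` and `recall`, the empty-set convention, and the sample sets are illustrative assumptions, not part of the assignment; recall is assumed here to be the usual |Ra| / |R|, which agrees with the "1/10" recall figures quoted for the worked queries.

```python
def precision(relevant, answer):
    """Fraction of the answer set A that is relevant: |Ra| / |A|."""
    if not answer:
        return 0.0  # convention assumed here for an empty answer set
    return len(relevant & answer) / len(answer)


def recall(relevant, answer):
    """Fraction of the relevant set R that was retrieved: |Ra| / |R|."""
    if not relevant:
        return 0.0  # convention assumed here for an empty relevant set
    return len(relevant & answer) / len(relevant)


# Hypothetical example: 10 relevant documents, 4 retrieved, 2 of them relevant.
R = {"d2", "d5", "d13", "d19", "d48", "d55", "d68", "d121", "d140", "d151"}
A = {"d68", "d19", "d7", "d300"}

print(f"precision = {precision(R, A):.2f}")  # 2 / 4
print(f"recall    = {recall(R, A):.2f}")     # 2 / 10
```

Using sets keeps the code close to the notation: `relevant & answer` is the intersection Ra.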
Precision:-
Precision of Google
Google, being one of the most popular search engines on the Internet, was
selected as one of the search engines for comparison. Google focuses on the link
structure of the Web to determine relevant results and is representative of the
variety of easy-to-use search engines. This study would measure the relevance of
The E measure combines recall and precision at each position of the ranking:
$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{r(j)} + \frac{1}{P(j)}}$
where
r(j) is the recall at the j-th position in the ranking
P(j) is the precision at the j-th position in the ranking
$b \ge 0$ is a user-specified parameter
E(j) is the E measure at the j-th position in the ranking
The parameter b is specified by the user and reflects the relative importance of
recall and precision.
If $b = 0$:
$E(j) = 1 - P(j)$
Low values of b make E(j) a function of precision.
If $b \to \infty$:
$\lim_{b \to \infty} E(j) = 1 - r(j)$
High values of b make E(j) a function of recall.
For $b = 1$, the E measure becomes the complement of the F-measure,
$E(j) = 1 - F(j)$.
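The effect of the parameter b can be checked numerically. The sketch below assumes the standard form of the E measure, E(j) = 1 - (1 + b^2) / (b^2 / r(j) + 1 / P(j)); the function name `e_measure`, the zero-value convention, and the sample values are illustrative assumptions.

```python
def e_measure(precision, recall, b=1.0):
    """E measure for one ranking position; precision and recall are in [0, 1]."""
    if precision == 0 or recall == 0:
        return 1.0  # assumed convention: nothing relevant retrieved yet
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)


p, r = 0.5, 0.2  # sample precision and recall

print(e_measure(p, r, b=0))     # b = 0 leaves 1 - P(j)
print(e_measure(p, r, b=1))     # b = 1 gives 1 - F(j)
print(e_measure(p, r, b=1000))  # a large b approaches 1 - r(j)
```

Lower E values indicate better retrieval, since E is one minus a weighted harmonic mean of precision and recall.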
Search queries and Web pages retrieved
Queries in the study were designed to test various search features, including
single-word search, phrase search, and a combination of the two using a Boolean
operator. The four query types were:
Type 1: Single-word search query.
Type 2: Phrase search query.
Type 3: Two word searches connected by a Boolean AND.
Type 4: A phrase search and a word search connected by a Boolean AND.
Type 1: Search query: NASA
Type of query: Single-word search query
Rq = {d2, d5, d13, d19, d48, d55, d68, d121, d140, d151}
Rq is the set of relevant documents for the query.
Ranking for query (* marks a document counted as relevant in the calculations):
 1. d68 *      6. d151
 2. d48        7. d2
 3. d140       8. d5
 4. d19 *      9. d55 *
 5. d13       10. d121
Calculated measures for query:
- Document d68 corresponds to 10% of all the relevant documents in the set Rq.
- At d68 the precision is therefore 1/1, i.e. 100%, and the recall is 1/10,
  i.e. 10%.
- The E measure is then calculated from these two values using its formula.
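As a worked instance of that calculation, assuming the standard E measure $E(j) = 1 - \frac{1 + b^2}{b^2 / r(j) + 1 / P(j)}$ with b = 1, the values for d68 (P = 1, r = 0.1) give:

```latex
E(1) = 1 - \frac{1 + 1^2}{\frac{1^2}{0.1} + \frac{1}{1}}
     = 1 - \frac{2}{11}
     \approx 0.82
```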
Document   Precision (%)   Recall (%)   E measure
d68        100             10           0.82 (b = 1)
d19        50              20           0.73 (b = 1.1)
d55        33              30           0.68 (b = 1)
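The precision and recall columns can be reproduced from the ranking. The sketch below is a hedged illustration, assuming that the asterisks mark the documents counted as relevant and that recall is measured against the 10 documents of Rq; the helper name `precision_recall_at_hits` is hypothetical.

```python
def precision_recall_at_hits(ranking, total_relevant):
    """Yield (rank, doc, precision, recall) at each relevant document.

    ranking is a list of (doc_id, is_relevant) pairs in ranked order.
    """
    hits = 0
    for rank, (doc, is_relevant) in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            yield rank, doc, hits / rank, hits / total_relevant


# Ranking for the query "NASA"; True marks a starred document.
ranking = [
    ("d68", True), ("d48", False), ("d140", False), ("d19", True),
    ("d13", False), ("d151", False), ("d2", False), ("d5", False),
    ("d55", True), ("d121", False),
]

rows = list(precision_recall_at_hits(ranking, total_relevant=10))
for rank, doc, p, r in rows:
    print(f"{doc:>4} (rank {rank}): precision {p:.0%}, recall {r:.0%}")
```

Precision is computed over the documents seen so far (hits / rank), while recall uses the fixed size of the relevant set, which is why recall can only grow down the ranking.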
Type 2: Search query: University of Pune
Type of query: Phrase search query
Rq = {d5, d7, d19, d23, d58, d70, d99, d190}
Ranking for query:
 1. d58        5. d99
 2. d5         6. d70 *
 3. d190 *     7. d19
 4. d7         8. d23 *
Calculated measures for query:-
Document   Precision (%)   Recall (%)   E measure
d190       33              13           0.81 (b = 1)
d70        33              25           0.72 (b = 1)
d23        38              38           0.62 (b = 1)
Type 3: Search query: Passport AND office
Type of query: Two word searches connected by a Boolean AND
Rq = {d2, d5, d10, d55, d80, d90, d125, d150, d200, d250}
Ranking for query:
 1. d55 *      6. d250
 2. d5         7. d2 *
 3. d80 *      8. d90
 4. d125       9. d100
 5. d200 *    10. d150 *
Calculated measures for query:
Document d55 corresponds to 10% of all the relevant documents in the set Rq.
At d55 the precision is therefore 1/1, i.e. 100%, and the recall is 1/10,
i.e. 10%.
Document   Precision (%)   Recall (%)   E measure
d55        100             10           0.81 (b = 1)
d80        67              20           0.69 (b = 1)
d200       60              30           0.60 (b = 1)
d2         57              40           0.53 (b = 1)
d150       50              50           0.50 (b = 1)
Type 4: Search query: admission requirements AND ME
Type of query: A phrase search and a word search connected by a Boolean AND
Rq = {d10, d25, d5, d1, d70, d65, d100, d15, d150, d80}
Ranking for query:
 1. d25        6. d100 *
 2. d5         7. d10
 3. d1         8. d80
 4. d70 *      9. d15 *
 5. d65       10. d150
Calculated measures for query:
The precision at document d70 is 1/4, i.e. 25%, while the recall is 1/10,
i.e. 10%.
Document   Precision (%)   Recall (%)   E measure
d70        25              10           0.85 (b = 1)
d100       33              20           0.75 (b = 1)
d15        33              30           0.68 (b = 1)
Conclusion
Relevancy is an important concept in evaluating search. Search engine
relevancy is a key feature that often tends to be ignored, resulting in losses
of time, money and effort spent fixing and tuning the effectiveness of the
engine. By giving adequate thought to the various parameters that can boost
relevancy, organizations can achieve a self-sustaining, intelligent and useful
search system.
Precision, recall and the E measure are well-suited measurements for
evaluating a search engine.