4)search engine

Upload: monzieair

Post on 04-Jun-2018


  • 8/13/2019 4)Search Engine


    Assignment 4

    Title: Perform an evaluation of any popular search engine (e.g. Google) based on relevancy.

    Theory:

    Introduction

    The Web can be used as a quick and direct reference for any type of information from all over the world. However, information found on the Web needs to be filtered, since it may include a great deal of misinformation or irrelevant information. Internet users may not be aware of the many search engines available for finding information on a topic quickly, and they may use different search strategies. Finding useful information quickly on the Internet is a challenge for both ordinary users and information professionals. Although the performance of currently available search engines has been improving continuously, with powerful search capabilities of various types, the lack of comprehensive coverage, the inability to predict the quality of retrieved results, and the absence of controlled vocabularies make it difficult for users to use search engines effectively. The use of the Internet as an information resource needs to be evaluated carefully, because no traditional quality standards or controls have been applied to the Web. Librarians need to be able to give their clientele informed recommendations on the selection of search engines and on effective search strategies.

    To evaluate an information retrieval (IR) system is to measure how well the system meets the information needs of its users. This is difficult, given that the same result set might be interpreted differently by different users. To deal with this problem, metrics have been defined that, on average, correlate with the preferences of a group of users. Without proper retrieval evaluation, one cannot determine how well an IR system is performing or objectively compare its performance with that of other systems. Retrieval evaluation is therefore a critical and integral component of any modern IR system.

    Searching the Web and how to do it

    The whole world is on the Web, and the volume of transactions on the World Wide Web (WWW) seems to justify this statement. This century has moved much of the shopping experience from physical stores online. During the last decade there has been a dramatic shift toward online shopping, and the number of companies that provide online shopping facilities has increased tremendously. Users' patience threshold is dropping day by day: they want their needs met in a single click, while sellers want to push their best products.

    Today, one of the most challenging tasks is to provide the most relevant and meaningful search results to the customer. When users search for a product, most of them look only at the top 10 to 15 results. So if the right product does not show up within this range, there is a high probability that the vendor will lose out to a competitor.

    What is relevancy, and why is it important?

    With the tremendous increase in the amount of data on the Web, it has become very difficult to manage data in a way that presents the user with the most accurate search results. The right information means a better business return on investment. In simple words, relevancy can be defined as simplicity and usefulness: if a specific piece of information is useful to the user and they can reach it without much effort, then that information is relevant.

    Just as a web designer aims to build a site that captures users' attention within seconds, the search results on a site decide whether a user will be interested in the site or not. A user's engagement with the search experience shapes their behavior on the site, including the likelihood that they will make a purchase or complete certain transactions. This calls for studying and analyzing the behavior of site users: their action paths, decision points, areas of interest, and so on.

    In general terms, relevancy refers to how things or events relate to one another.

    Measurements for evaluation

    Precision of Search Engines

    After a search, the user sometimes retrieves relevant information and sometimes irrelevant information. How accurately a search engine retrieves the right information is expressed by its precision.

    Precision is the fraction of the retrieved documents (the set A) that is relevant, i.e.,

    $$\text{Precision} = \frac{|R_a|}{|A|}$$

    where:

    R: the set of relevant documents
    A: the answer set generated by an IR system for an information request I
    Ra: the intersection of the sets R and A
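The definitions above can be sketched in Python using sets for R and A. The document IDs are made up for illustration. Recall is included using its standard definition, |Ra| / |R|, which is the fraction of all relevant documents that were retrieved. Treating an empty set as giving 0.0 is an assumed convention.

```python
def precision_recall(relevant, answer):
    """Precision |Ra|/|A| and recall |Ra|/|R| for sets of document IDs."""
    ra = relevant & answer  # Ra: documents that are both relevant and retrieved
    precision = len(ra) / len(answer) if answer else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall


R = {"d1", "d2", "d3", "d4"}          # hypothetical relevant set
A = {"d1", "d2", "d9", "d10", "d11"}  # hypothetical answer set
print(precision_recall(R, A))         # (0.4, 0.5): 2 of 5 retrieved, 2 of 4 relevant
```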

    Precision of Google

    Google, being one of the most popular search engines on the Internet, was selected as one of the search engines for comparison. Google focuses on the link structure of the Web to determine relevant results and is representative of the variety of easy-to-use search engines. This study would measure the relevance of


    r(j) is the recall at the j-th position in the ranking
    P(j) is the precision at the j-th position in the ranking
    b ≥ 0 is a user-specified parameter
    E(j) is the E measure at the j-th position in the ranking
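These symbols belong to the E measure. Its usual form, due to van Rijsbergen, is given below. This form is consistent with the special cases for b that follow.

$$E(j) = 1 - \frac{1 + b^2}{\dfrac{b^2}{r(j)} + \dfrac{1}{P(j)}}$$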

    The parameter b is specified by the user and reflects the relative importance of

    recall and precision.

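    If b = 0:

    $$E(j) = 1 - P(j)$$

    so low values of b make E(j) a function of precision.

    If b → ∞:

    $$\lim_{b \to \infty} E(j) = 1 - r(j)$$

    so high values of b make E(j) a function of recall.

    For b = 1, the E measure becomes the complement of the F measure (the harmonic mean of precision and recall):

    $$E(j) = 1 - F(j), \qquad F(j) = \frac{2\,P(j)\,r(j)}{P(j) + r(j)}$$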
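The special cases above can be checked with a minimal Python sketch of the standard E measure, E = 1 − (1 + b²)/(b²/r + 1/P). Returning 1.0 when P or r is zero is an added convention, since the formula is undefined there; the text does not specify it.

```python
def e_measure(p, r, b=1.0):
    """E measure: 1 - (1 + b^2) / (b^2 / r + 1 / p), for b >= 0."""
    if p == 0 or r == 0:
        return 1.0  # assumed convention: nothing relevant retrieved, worst score
    b2 = b * b
    return 1 - (1 + b2) / (b2 / r + 1 / p)


p, r = 0.5, 0.2
print(e_measure(p, r, b=0))        # equals 1 - p
print(e_measure(p, r, b=1000))     # very close to 1 - r
f = 2 * p * r / (p + r)            # F measure (harmonic mean of p and r)
print(e_measure(p, r, b=1), 1 - f) # the two values agree for b = 1
```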

    Search queries and Web pages retrieved

    Queries in the study were designed to test various search features, including single word search, phrase search, and a combination of the two using a Boolean operator. The four query types were:

    Type 1: Single word search query.
    Type 2: Phrase search query.
    Type 3: Two word searches connected by a Boolean AND.
    Type 4: A phrase search and a word search connected by a Boolean AND.

    Type 1: Search query: NASA

    Type of query: Single word search query

    Rq = {d2, d5, d13, d19, d48, d55, d68, d121, d140, d151}

    Rq is the set of documents relevant to the query.


    Ranking for the query (* marks a document judged relevant):

    1. d68 *
    2. d48
    3. d140
    4. d19 *
    5. d13
    6. d151
    7. d2
    8. d5
    9. d55 *
    10. d121

    Calculated measures for the query:

    - Document d68 corresponds to 10% of all the relevant documents in the set Rq.
    - It therefore has a precision of 1/1 (100%) and a recall of 1/10 (10%).
    - The E measure is calculated from these values using the E measure formula.

    Document   Precision (%)   Recall (%)   E measure
    d68        100             10           0.82 (b = 1)
    d19        50              20           0.73 (b = 1.1)
    d55        33              30           0.68 (b = 1)
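The table can be reproduced with a short Python sketch. The sketch treats only the documents marked * as relevant hits, which is what the table does, and takes |Rq| = 10. `evaluate_ranking` is a hypothetical helper name. It computes precision, recall and E at each rank where a marked document appears.

```python
def evaluate_ranking(ranking, judged_relevant, n_relevant, b=1.0):
    """Return (doc, precision, recall, E) at each rank holding a relevant document."""
    rows, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in judged_relevant:
            hits += 1
            p = hits / rank        # precision at this rank
            r = hits / n_relevant  # recall at this rank
            e = 1 - (1 + b * b) / (b * b / r + 1 / p)  # E measure
            rows.append((doc, p, r, e))
    return rows


# Type 1 query "NASA": ranking as listed; * documents treated as relevant.
ranking = ["d68", "d48", "d140", "d19", "d13", "d151", "d2", "d5", "d55", "d121"]
starred = {"d68", "d19", "d55"}
rows = evaluate_ranking(ranking, starred, n_relevant=10)
for doc, p, r, e in rows:
    print(f"{doc}: P={p:.3f} R={r:.3f} E={e:.2f}")
```

At b = 1 the sketch gives E of about 0.82, 0.71 and 0.68. Passing `b=1.1` gives about 0.73 at d19, which is the value shown for that row.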

    Type 2: Search query: University of Pune

    Type of query: Phrase search query

    Rq = {d5, d7, d19, d23, d58, d70, d99, d190}

    Ranking for the query:

    1. d58
    2. d5
    3. d190 *
    4. d7
    5. d99
    6. d70 *
    7. d19
    8. d23 *

    Calculated measures for the query:

    Document   Precision (%)   Recall (%)   E measure
    d190       33              13           0.81 (b = 1)
    d70        33              25           0.72 (b = 1)
    d23        38              38           0.62 (b = 1)
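A self-contained Python sketch of the calculation for this query follows. It treats the documents marked * as the relevant hits and takes |Rq| = 8. `evaluate_ranking` is a hypothetical helper name. Computed without intermediate rounding, E comes out at about 0.82, 0.71 and 0.625. The table's values follow from rounding precision and recall to whole percentages first.

```python
def evaluate_ranking(ranking, judged_relevant, n_relevant, b=1.0):
    """Return (doc, precision, recall, E) at each rank holding a relevant document."""
    rows, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in judged_relevant:
            hits += 1
            p, r = hits / rank, hits / n_relevant  # precision and recall at this rank
            rows.append((doc, p, r, 1 - (1 + b * b) / (b * b / r + 1 / p)))
    return rows


# Type 2 query "University of Pune"; * documents treated as relevant, |Rq| = 8.
ranking = ["d58", "d5", "d190", "d7", "d99", "d70", "d19", "d23"]
rows = evaluate_ranking(ranking, {"d190", "d70", "d23"}, n_relevant=8)
for doc, p, r, e in rows:
    print(f"{doc}: P={p:.3f} R={r:.3f} E={e:.3f}")
```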

    Type 3: Search query: Passport AND office

    Type of query: Two word searches connected by a Boolean AND

    Rq = {d2, d5, d10, d55, d80, d90, d125, d150, d200, d250}

    Ranking for the query:

    1. d55 *
    2. d5
    3. d80 *
    4. d125
    5. d200 *
    6. d250
    7. d2 *
    8. d90
    9. d100
    10. d150 *

    Calculated measures for the query:

    Document d55 corresponds to 10% of all the relevant documents in the set Rq. It therefore has a precision of 1/1 (100%) and a recall of 1/10 (10%).


    Document   Precision (%)   Recall (%)   E measure
    d55        100             10           0.82 (b = 1)
    d80        67              20           0.69 (b = 1)
    d200       60              30           0.60 (b = 1)
    d2         57              40           0.53 (b = 1)
    d150       50              50           0.50 (b = 1)
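A self-contained Python sketch of this query's calculation follows. It treats the documents marked * as the relevant hits and takes |Rq| = 10. `evaluate_ranking` is a hypothetical helper. Computed exactly, the E values at d55, d80, d200, d2 and d150 are about 0.82, 0.69, 0.60, 0.53 and 0.50.

```python
def evaluate_ranking(ranking, judged_relevant, n_relevant, b=1.0):
    """Return (doc, precision, recall, E) at each rank holding a relevant document."""
    rows, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in judged_relevant:
            hits += 1
            p, r = hits / rank, hits / n_relevant  # precision and recall at this rank
            rows.append((doc, p, r, 1 - (1 + b * b) / (b * b / r + 1 / p)))
    return rows


# Type 3 query "Passport AND office"; * documents treated as relevant, |Rq| = 10.
ranking = ["d55", "d5", "d80", "d125", "d200", "d250", "d2", "d90", "d100", "d150"]
starred = {"d55", "d80", "d200", "d2", "d150"}
rows = evaluate_ranking(ranking, starred, n_relevant=10)
for doc, p, r, e in rows:
    print(f"{doc}: P={p:.3f} R={r:.3f} E={e:.3f}")
```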

    Type 4: Search query: admission requirements AND ME

    Type of query: A phrase search and a word search connected by a Boolean AND

    Rq = {d10, d25, d5, d1, d70, d65, d100, d15, d150, d80}

    Ranking for the query:

    1. d25
    2. d5
    3. d1
    4. d70 *
    5. d65
    6. d100 *
    7. d10
    8. d80
    9. d15 *
    10. d150

    Calculated measures for the query:

    The precision at document d70 is 1/4 (25%), while the recall is 1/10 (10%).

    Document   Precision (%)   Recall (%)   E measure
    d70        25              10           0.86 (b = 1)
    d100       33              20           0.75 (b = 1)
    d15        33              30           0.68 (b = 1)
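A self-contained Python sketch of this query's calculation follows. It treats the documents marked * as the relevant hits and takes |Rq| = 10. `evaluate_ranking` is a hypothetical helper. The third marked document is d15 at rank 9, where precision is 3/9 and recall is 3/10. The exact E values at the three marked documents are about 0.86, 0.75 and 0.68.

```python
def evaluate_ranking(ranking, judged_relevant, n_relevant, b=1.0):
    """Return (doc, precision, recall, E) at each rank holding a relevant document."""
    rows, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in judged_relevant:
            hits += 1
            p, r = hits / rank, hits / n_relevant  # precision and recall at this rank
            rows.append((doc, p, r, 1 - (1 + b * b) / (b * b / r + 1 / p)))
    return rows


# Type 4 query "admission requirements AND ME"; * documents treated as relevant.
ranking = ["d25", "d5", "d1", "d70", "d65", "d100", "d10", "d80", "d15", "d150"]
rows = evaluate_ranking(ranking, {"d70", "d100", "d15"}, n_relevant=10)
for doc, p, r, e in rows:
    print(f"{doc}: P={p:.3f} R={r:.3f} E={e:.3f}")
```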

    Conclusion

    Although relevancy is widely considered important, search engine relevancy is a key feature that often gets ignored. This leads to losses in time, money and effort later spent on fixing and tuning the effectiveness of the engine. By giving adequate thought to the various parameters that can boost relevancy, organizations can achieve a self-sustaining, intelligent and useful search system.

    Precision, recall and the E measure provide a sound basis for evaluating the relevancy of a search engine's results.