
    A Novel Algorithm to Determine the Quality of a Web Page for Improving the Search Engine Ranking Framework

    Sheikh Muhammad Sarwar, Md. Mustafizur Rahman and Mosaddek Hossain Kamal

    Abstract: This paper proposes and develops a general and formal static rank computation algorithm for ranking web documents, considering the availability, significance, appeal and relevance of the images present in them. Different types of images appear in a web document; some of them increase the content quality of the web page and some are irrelevant to the content. Moreover, some images are not appealing enough to attract the attention of users and do not necessarily improve the content. In this paper, a static ranking algorithm like PageRank is proposed, which works based on an analysis of the images appearing in the web document. A method for integrating this algorithm with a complete ranking framework based on the Markov Random Field Model is also presented. The algorithm computes a metric, IBQV (Image Based Quality Value), that indicates the extent to which the images in a web document increase its value. The theoretical and practical implications of IBQV are shown, and experimental results indicate that incorporating IBQV increases the correctness of the search results.

    Index Terms: Information Retrieval, Web Page Ranking, Image Search Engine, Markov Random Field Model

    1 INTRODUCTION

    Existing document retrieval models usually assume that all

    the documents in the collection have the same quality. The

    equal quality assumption does not, however, hold for large

    and heterogeneous web corpora [1]. As the number of web

    documents is growing exponentially, it is becoming very

    difficult to find the required web documents with respect to

    a user query. Even if there are many documents that can

    satisfy the user's information need, many web documents contain unnecessary textual and multimedia information

    which hampers readability, makes navigation difficult and

    tiresome and scatters the presentation and layout. As a result,

    the quality of a web document becomes a crucial factor when

    designing a ranking function or framework.

    Most current research focuses on mining the link structure

    of web pages to assign each page a rank value. Two page ranking

    algorithms, HITS [2] and PageRank [3], are commonly

    used in web structure mining. Both the algorithms measure

    the importance of a web page based on the non-local link

    structure. They rely solely on the votes from the neighbors

    of the document in the link graph to determine the quality of

    the document [1]. So, they are not sufficient for determining the static rank of a web page, as content representation is not considered as a quality criterion.

    Sheikh Muhammad Sarwar is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]

    Md. Mustafizur Rahman is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]

    Mosaddek Hossain Kamal is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]

    Little research has been done on integrating document quality into the

    ranking framework. Text based features of

    a web page have been integrated in a Markov Random

    Field Ranking Model [1]. Markov Random Field Model for

    Information Retrieval (MRF-IR) was proposed by Metzler

    and Croft [4]. This model can make use of features based

    on single terms, ordered phrases, and unordered phrases. MRF-IR has been used as an effective tool in different search

    tasks. It performs reasonably well for text based queries. However,

    web documents also contain quality images, which can act

    as discriminators when choosing pages to present to a user.

    A document with images that support its text content can

    be genuinely useful and engaging for the user. A study related

    to the learning process of students showed the importance of

    images in teaching, and the findings confirmed the benefit of

    incorporating images in teaching and learning. If the images

    are selected and used appropriately, they can enhance and

    lead to a deep approach to learning amongst students [5].

    So, images that are coherent with the textual content will

    enhance the quality of a web document.

    Images that are appealing increase the aesthetics of a

    web page and attract users. Finding appealing images for

    automatic album creation is a popular research topic [6].

    Several algorithms and techniques for computing the appeal

    of an image or a frame in a video have been proposed in

    the literature [7], [8]. From the user's point of view, appealing images

    increase the quality of web documents and should have an

    impact in ranking them.

    Based on the above assumptions, the design and architecture

    of a ranking model are proposed in this work, where

    a document containing images, which are appealing and

    JOURNAL OF COMPUTING, VOLUME 4, ISSUE 10, OCTOBER 2012, ISSN (Online) 2151-9617

    https://sites.google.com/site/journalofcomputing

    WWW.JOURNALOFCOMPUTING.ORG 49

    2012 Journal of Computing Press, NY, USA, ISSN 2151-9617


    coherent with the content of the document, should be given

    preference, when being ranked with respect to a user query.

    In this context, a new metric, Image Based Quality Value

    (IBQV) has been proposed and the calculation process

    of IBQV has been illustrated. A new category of image with

    respect to a web page, the Stop Image, has also been proposed.

    2 BACKGROUND

    There are some topics on which a short discussion is needed

    before describing the new image based quality model for

    integrating into the ranking framework for web pages. These

    topics form the basic idea for developing the new quality

    model. We also introduce a new concept in this context, the

    Stop Image, and provide its definition, which is part of our

    contribution in this paper.

    2.1 Vector Space Model (VSM) for Document Similarity Analysis

    Vector Space Model (VSM) treats each document as a vector.

    The dimensions of the vector are terms. If a term occurs in the

    document, its value in the vector is non-zero. Several different

    ways of computing these values, also known as (term) weights,

    have been developed. One of the best known schemes is tf-idf

    weighting. The definition of term depends on the application.

    Typically terms are single words, keywords, or longer phrases

    [9]. To measure the similarity between two documents, first,

    vector representations of the two documents are created. Then,

    the cosine similarity measure is used to compute the similarity score [9]. Other similarity measures can also be used.
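
    The two steps described above (building term-weight vectors, then comparing them with the cosine measure) can be sketched as follows. This is a minimal illustration rather than the weighting used in the experiments: the smoothed idf formula, the whitespace tokenisation and the function names `tf_idf_vectors` and `cosine_similarity` are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption) so that very common terms keep a small weight.
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents, tokenised by whitespace for the sake of the example.
docs = [
    "healthy diet and regular exercise".split(),
    "regular exercise improves a healthy heart".split(),
    "stock market and business news".split(),
]
vecs = tf_idf_vectors(docs)
sim_health = cosine_similarity(vecs[0], vecs[1])  # two health documents
sim_mixed = cosine_similarity(vecs[0], vecs[2])   # health vs. business
```

    Because tf-idf weights are non-negative, the resulting similarity always lies between 0 and 1, and the two health documents score higher against each other than against the business document.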

    2.2 Web Image Search Engine

    A web image search engine combines the functionality of

    keyword based image searching and content based image

    searching. For keyword based image searching, the user gives

    text queries to the search engine and the search engine returns

    a list of web pages where the object mentioned in the keyword

    occurs. For content based image searching, the user presents

    an image query to the search engine and the search engine uses an image distance measure to find the web pages

    which contain the matched image. An image distance measure

    compares the similarity of two images in various dimensions

    such as color, texture, shape, and others. For example, a

    distance of 0 signifies an exact match with the query, with

    respect to the dimensions that were considered. As one may

    intuitively gather, a value greater than 0 indicates various

    degrees of similarities between the images. Search results then

    can be sorted based on their distance to the queried image [10].

    In this work, we consider only exact matches, that is, results

    whose image distance to the query is 0.
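
    As an illustration of an image distance measure along a single dimension (color), the sketch below compares two images by the L1 distance between their normalised color histograms. It is only one possible measure and is not claimed to be what any particular search engine uses; the bin count and the function names are assumptions. Note that a distance of 0 here means identical color distributions, which matches the sense of "exact match with respect to the dimensions that were considered".

```python
def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram; `pixels` is a non-empty list of (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Map each channel to its bin and flatten the 3-D bin index.
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = float(len(pixels))
    return [h / total for h in hist]


def image_distance(pixels_a, pixels_b, bins=4):
    """L1 distance between color histograms: 0 for identical distributions, at most 2."""
    ha = color_histogram(pixels_a, bins)
    hb = color_histogram(pixels_b, bins)
    return sum(abs(x - y) for x, y in zip(ha, hb))


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
d_same = image_distance(red, red)
d_diff = image_distance(red, blue)
```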

    2.3 Stop Image

    Stop images are images that can be deleted from a document without the document losing any of the visual information it provides. The concept is analogous to stop words in text. A document's text contains terms that are not useful from a retrieval perspective: articles and prepositions, for example, do not add to the knowledge inside a document, and are treated as stop words. Analogously, there are images in a web document that do not necessarily improve its content. These are background images, images with low resolution, images of very small size and single color images.
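
    The four categories listed above could be checked with simple rules, as in the sketch below. The `ImageInfo` record and the numeric thresholds are hypothetical: the definition above does not give cut-off values, so the constants are placeholders that would need to be chosen for a real collection.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    width: int            # width in pixels
    height: int           # height in pixels
    distinct_colors: int  # number of distinct pixel colors in the image
    is_background: bool   # True if the page uses the image as a background


# Hypothetical thresholds, chosen only for illustration.
MIN_SIDE = 50        # a shorter side than this counts as "very small"
MIN_PIXELS = 10_000  # fewer pixels than this counts as "low resolution"


def is_stop_image(img):
    """Return True if the image falls into any of the four Stop Image categories."""
    if img.is_background:
        return True
    if img.distinct_colors <= 1:
        return True
    if img.width < MIN_SIDE or img.height < MIN_SIDE:
        return True
    return img.width * img.height < MIN_PIXELS
```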

    2.4 Image Appeal Value

    Image Appeal Value, or Image Aesthetic Value (IAV), is needed for choosing images in automatic albuming applications [6]. The appeal value of an image depends on its contrast and colorfulness [7]. There are several methods for calculating the contrast and colorfulness of an image or a region of an image, and different images can be compared using these metrics. The contrast value can be calculated using the following equations [7]:

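    $$CN_i = \left( \frac{1}{n_i - 1} \sum_{j \in \mathrm{region}_i} (x_j - \bar{x})^2 \right)^{1/2}, \qquad \bar{x} = \frac{1}{n_i} \sum_{j \in \mathrm{region}_i} x_j$$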

    An established method described in [11] is used to compute

    the colorfulness of an image. It combines chroma magnitude

    and color variance in the CIE-Lab color space. The following

    equation is used to compute the colorfulness:

    $$CF_i = \sigma_{a_i b_i} + 0.37\, \mu_{a_i b_i}$$

    In the equation above, $\sigma_{a_i b_i}$ is the trigonometric length of the standard deviation in CIE-Lab space, and $\mu_{a_i b_i}$ is the distance of the centre of gravity in CIE-Lab space to the neutral color axis [7]. In this work, IAV is calculated as the average of the contrast and colorfulness values.
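
    The following sketch shows how the contrast, colorfulness and IAV values described in this section might be computed for one image region. It assumes the region is already available as three flat lists (luminance values and the a and b channels of CIE-Lab), that the trigonometric length is the Euclidean norm of the per-channel values, and that population statistics are used for the a and b channels. These are assumptions made for the example, and the function names are illustrative.

```python
import math


def contrast(lum):
    """Sample standard deviation of the luminance values of a region (needs >= 2 values)."""
    n = len(lum)
    mean = sum(lum) / n
    return math.sqrt(sum((x - mean) ** 2 for x in lum) / (n - 1))


def _mean_and_std(values):
    """Population mean and standard deviation of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def colorfulness(a, b):
    """sigma_ab + 0.37 * mu_ab over the a and b channels of a CIE-Lab region."""
    mu_a, sd_a = _mean_and_std(a)
    mu_b, sd_b = _mean_and_std(b)
    sigma_ab = math.hypot(sd_a, sd_b)  # length of the standard-deviation vector
    mu_ab = math.hypot(mu_a, mu_b)     # distance of the centre of gravity from the neutral axis
    return sigma_ab + 0.37 * mu_ab


def image_appeal_value(lum, a, b):
    """IAV as the plain average of contrast and colorfulness."""
    return (contrast(lum) + colorfulness(a, b)) / 2.0
```

    Contrast and colorfulness are not normalised to a common range here, so in practice some rescaling may be needed before the two are averaged.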

    3 RELATED WORKS

    Ranking web documents independently of any query is one part of a ranking framework or ranking function, and is usually referred to as static ranking. There have been many approaches to calculating link-based priors, such as PageRank [3], HITS [2], and SALSA [12], which are often used in web search. Research has shown that including these priors can significantly improve the performance of the ranking function [13], [14], [15]. Click-based priors can also be a measure of relevance, as they are derived from information such as how frequently users click on a web document and how much time they spend on that page. They have been shown to be beneficial for web search [16], [17]. The quality of a document's text has also been used to calculate document quality based priors, which were incorporated into the ranking function [1].

    Link-based, click-based and document text quality based

    priors do not take into account the quality of images present


    in a web page. Features based on document text are used

    to obtain a numerical value for the quality of the

    document text [1]. There has been research on the images

    present in the web pages and their role in summarising the

    contents of the document [18]. One study showed that, in the

    evaluation of web pages, the presentation plays an effective

    role [19]. Evaluation in the context of presentation considered

    two aspects:

    • Text presentation (font size, character).

    • Multimedia presentation (image quality, image size, number of images in a page, resolution of video etc.).

    However, there was no approach for integrating the presentation

    aspect of a web page into the ranking function of a search

    engine.

    4 TWO NOVEL METRICS: ITSV AND IBQV

    In this work, we have proposed a new image based feature

    to incorporate into the ranking model, which is Image Based

    Quality Value (IBQV). To calculate IBQV, we have used

    a new metric ITSV (Image Text Similarity Value). In

    this section, the computation process of these two values is

    illustrated.

    4.1 Image Text Similarity Value (ITSV)

    ITSV is just a simple variation of document similarity value.

    If two documents D1 and D2 contain the same image, then

    their text based similarity value is defined as the Image Text

    Similarity Value (ITSV). This value lies between 0 and 1. To

    assess the document similarity, the Vector Space Model (VSM)

    is used to represent each of the documents. Then using the

    cosine similarity measure, the similarity value is obtained and

    this value is the ITSV value according to this paper.
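
    Written out, and assuming non-negative term weights $w_{t,1}$ and $w_{t,2}$ (for example tf-idf weights) for term $t$ in D1 and D2, the definition above corresponds to the usual cosine formula. The weight symbols are introduced here only for illustration:

```latex
\mathrm{ITSV}(D_1, D_2)
  = \frac{\sum_{t} w_{t,1}\, w_{t,2}}
         {\sqrt{\sum_{t} w_{t,1}^{2}}\; \sqrt{\sum_{t} w_{t,2}^{2}}}
```

    With non-negative weights the numerator is never negative and never exceeds the denominator, which is why the value stays between 0 and 1.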

    4.2 Image Based Quality Value (IBQV)

    IBQV is calculated with the help of an image search engine, and is added as a static rank value of the web document. To find the IBQV of a web document, we first extract all the images from the document. Stop Images are then identified and removed from this set, and the remaining images form the set of quality images for that web page. Each remaining image is submitted to an image search engine and the top-k documents are retrieved. For each of these top-k documents, the ITSV is calculated with respect to the web document under consideration. The ITSV values of all k document pairs are summed and divided by k to obtain the average ITSV for the image. The IAV of the image is calculated as described in Section 2.4 and combined with the average ITSV to give that image's contribution, which is accumulated into a running IBQV total. Finally, the IBQV of the document is obtained by dividing this total by the number of quality images. Figure 1 shows the process of calculating IBQV for a single web document. The pseudo-code for calculating IBQV is given in Algorithm 1.

    Algorithm 1. IBQV Calculation Algorithm

    Input:
      A web document D.
      Set of images in D, I = {i1, i2, ..., in}, where n is the number of images in D.
      Final Image Set F = {}.

    Output:
      IBQV value of D.

    Method:
      FOR each image ik ∈ I do
        if ik is not a Stop Image
          then F = F ∪ {ik}
      END FOR
      IBQV = 0
      FOR each image fk ∈ F do
        submit fk to an image search engine as query
        K = { top-k documents from the search engine }
        ITSV = 0
        FOR each element ki ∈ K do
          find ITSVi of D and ki
          ITSV = ITSV + ITSVi
        END FOR
        ITSV = ITSV / |K|
        IAV = calculate IAV(fk)
        IBQV = IBQV + (ITSV + IAV) / 2
      END FOR
      IBQV = IBQV / |F|
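
    A minimal Python sketch of Algorithm 1 is given below. The image search engine, the Stop Image test, the ITSV computation and the IAV computation are passed in as functions, because their concrete implementations are outside the scope of the pseudo-code; the parameter names are illustrative. Two behaviours are assumptions not stated in the algorithm: a page with no quality images receives an IBQV of 0, and an image with no search results contributes an average ITSV of 0. The division by 2 follows our reading of the IBQV update step in Algorithm 1.

```python
def compute_ibqv(document, images, is_stop_image, search_top_k, itsv, iav, k=10):
    """Average, over the non-stop images of `document`, of (mean ITSV + IAV) / 2."""
    quality_images = [img for img in images if not is_stop_image(img)]
    if not quality_images:
        return 0.0  # assumption: no quality images means IBQV 0

    total = 0.0
    for img in quality_images:
        neighbours = search_top_k(img, k)  # documents that contain this image
        if neighbours:
            mean_itsv = sum(itsv(document, other) for other in neighbours) / len(neighbours)
        else:
            mean_itsv = 0.0  # assumption: no hits means no textual support
        total += (mean_itsv + iav(img)) / 2.0
    return total / len(quality_images)


# Usage with stub components (all values below are made up for illustration).
stop = lambda img: img == "spacer.gif"
search = lambda img, k: ["d1", "d2"][:k]
similarity = lambda doc, other: {"d1": 0.8, "d2": 0.4}[other]
appeal = lambda img: 0.5
value = compute_ibqv("D", ["a.jpg", "spacer.gif", "b.jpg"], stop, search, similarity, appeal, k=2)
```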

    5 INTEGRATION OF IBQV WITH MARKOV RANDOM FIELD MODEL

    A Markov random field (MRF) is a graphical model where

    the nodes correspond to random variables and the edges

    are undirected. The edges define dependencies among the

    variables. MRF can represent cyclic dependencies. Metzler

    and Croft [4] proposed to model a joint relevance distribution

    over a query Q = q1,...,qn and a document D using MRF [1]. Figure 2 shows the MRF model for a document and a three

    term query. As shown in the model, given D, non-adjacent

    query terms are independent, but adjacent query terms are

    dependent on each other as there is an edge between them.

    Using the MRF, the joint distribution over the random

    variables in the graph G, can be calculated. In this process,

    a set of cliques C(G) in the graph G is found and a non-negative potential function is defined over the set

    of cliques. Given a query, Q, and document, D, the joint

    relevance distribution is expressed as:

    $$P_{G,\Lambda}(Q, D) = \frac{1}{Z} \prod_{c \in C(G)} \psi(c; \Lambda)$$


    Fig. 1. Computation procedure of IBQV

    Fig. 2. MRF model with a sequential dependence assumption [1]

    In the above equation, Z is a normalizing constant and $\Lambda$ is a set of free parameters that are used within the potential

    functions. The potential function usually takes the following

    form:

    $$\psi(c; \Lambda) = \exp\left( \sum_{i} \lambda_i f_i(c) \right)$$

    The score of a document D with respect to the query Q can

    be defined as [4]:

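    $$\begin{aligned}
    score(Q, D) &= \log P_{G,\Lambda}(D \mid Q) \\
    &= \log P_{G,\Lambda}(Q, D) - \log P_{G,\Lambda}(Q) \\
    &= \sum_{c \in C(G)} \log \psi(c; \Lambda) - \log Z - \log P_{G,\Lambda}(Q) \\
    &\stackrel{rank}{=} \sum_{c \in C(G)} \log \psi(c; \Lambda)
    \end{aligned}$$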
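
    To see why the final, rank-equivalent step preserves the ordering of documents, consider two documents $D_1$ and $D_2$ scored against the same query Q. Assuming, as is standard, that neither the normalizing constant nor the query term $\log P_{G,\Lambda}(Q)$ depends on the document, both cancel in the difference of the two scores:

```latex
\begin{aligned}
score(Q, D_1) - score(Q, D_2)
  &= \Big[ \sum_{c \in C(G)} \log \psi(c; \Lambda) \Big]_{D = D_1}
   - \Big[ \sum_{c \in C(G)} \log \psi(c; \Lambda) \Big]_{D = D_2}
\end{aligned}
```

    The sign of this difference, and hence the relative order of $D_1$ and $D_2$, is therefore determined by the clique potentials alone.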

    Now, to instantiate the MRF model, a set of cliques, C(G), and a set of potential functions, $\psi(c; \Lambda)$, over the cliques have to be defined. There are several possible instantiations,

    based on the different dependence assumptions between the

    document and the query terms [4]. The sequential dependence

    instantiation has been shown as an effective instantiation

    [4], [1]. The sequential dependence instantiation of the MRF

    model, shown in Figure 2, assumes dependence only between

    the adjacent query terms.

    There are three types of cliques that can be found when

    considering the sequential dependence assumption. The first

    type of clique involves a single term node and the document

    node. The potential function for these cliques is defined as

    follows [1]:

    $$\log \psi(q_i, D; \Lambda) = \lambda_T f_T(q_i, D)$$

    Here, $f_T(q_i, D)$ is a feature function defined over the query term $q_i$ and the document $D$, and $\lambda_T$ is a free parameter.

    The second type of clique involves two adjacent query terms and the

    document node. The potential functions over these cliques are defined as:

    $$\log \psi(q_i, q_{i+1}, D; \Lambda) = \lambda_O f_O(q_i, q_{i+1}, D) + \lambda_U f_U(q_i, q_{i+1}, D)$$

    Here, $f_O(q_i, q_{i+1}, D)$ and $f_U(q_i, q_{i+1}, D)$ are feature functions, and $\lambda_O$ and $\lambda_U$ are free parameters. These

    potentials are made up of two distinct components. The first

    considers ordered (exact phrase) matches and is denoted by the O


    subscript. The second, denoted by the U subscript, considers

    unordered matches [1].

    The third type of clique over which a potential function is defined contains the document node only. Bendersky et al. [1] defined a query independent potential function over this clique, based on a set of quality factors related to content clarity, document readability and ease of navigation. This query independent potential function, which relies only on text based features, takes the following form:

    $$\log \psi(D; \Lambda) = \sum_{L \in L(D)} \lambda_L f_L(D)$$

    In the equation above, L(D) is the set of quality based factors associated with the document node D. A feature value $f_L(D)$ is calculated for each quality factor and multiplied by its parameter $\lambda_L$; summing these products gives the quality value of the document [1].

    We propose the integration of IBQV with the final ranking

    function, and after the integration, the equation for the final

    score computation will take the following form:

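    $$\begin{aligned}
    score(Q, D) = {} & \sum_{q_i \in Q} \lambda_T f_T(q_i, D) \\
    & + \sum_{q_i, q_{i+1} \in Q} \left[ \lambda_O f_O(q_i, q_{i+1}, D) + \lambda_U f_U(q_i, q_{i+1}, D) \right] \\
    & + \sum_{L \in L(D)} \lambda_L f_L(D) \\
    & + \lambda_i \, IBQV
    \end{aligned}$$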

    Here, $\lambda_i$ is a free parameter that indicates the importance of IBQV in the ranking function. The value of $\lambda_i$ can be determined by a learning-to-rank [20] scheme. In [1], the parameter values for the other features were determined using a coordinate ascent algorithm proposed by Metzler and Croft [21]. In total, parameters for 13 features were tuned using that algorithm [1]. In that model, however, 10 features were document quality based and all of them were text based; no image based features were considered for ranking the documents. Our assumption is that including the novel metric IBQV would improve the ranking framework, since quality images are assets of a web document: they increase the quality of the document and can act as discriminating factors when the overall ranking is performed.
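
    The structure of the combined scoring function can be sketched as below. The feature functions are supplied by the caller, since their concrete definitions (for example, smoothed language model estimates) are not specified in this section, and the weight dictionary keys `"T"`, `"O"`, `"U"` and `"IBQV"` are illustrative names. The sketch assumes the score sums over all single query terms and all adjacent term pairs, as implied by the sequential dependence model.

```python
def mrf_score(query_terms, doc, f_t, f_o, f_u, quality_features, ibqv, weights):
    """Weighted sum of term, ordered, unordered, text quality and IBQV features."""
    score = 0.0
    # Single-term cliques: one feature value per query term.
    for q in query_terms:
        score += weights["T"] * f_t(q, doc)
    # Adjacent-term cliques: ordered and unordered match features.
    for q1, q2 in zip(query_terms, query_terms[1:]):
        score += weights["O"] * f_o(q1, q2, doc)
        score += weights["U"] * f_u(q1, q2, doc)
    # Document-only clique: text quality features, then the image based prior.
    for name, f_l in quality_features.items():
        score += weights[name] * f_l(doc)
    score += weights["IBQV"] * ibqv(doc)
    return score


# Usage with constant stub features (values are made up for illustration).
weights = {"T": 0.8, "O": 0.1, "U": 0.1, "readability": 0.5, "IBQV": 2.0}
example_score = mrf_score(
    ["image", "search", "engine"],
    "doc",
    lambda q, d: 1.0,
    lambda a, b, d: 2.0,
    lambda a, b, d: 3.0,
    {"readability": lambda d: 4.0},
    lambda d: 0.5,
    weights,
)
```

    Because IBQV does not depend on the query, it can be computed once per document at indexing time and only multiplied by its weight at query time.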

    6 EXPERIMENTAL RESULTS

    For the experiment, 40 documents were downloaded from the Yahoo web directory: 20 from the general health directory and 20 from the general business directory. The web pages were carefully chosen so that they contained a significant number of images. Human judgments of image based quality were collected for each web page from three experts, on a scale from 0 to 10. A score of 0 indicated that the images in the web document do not support the actual text content of the document and are not appealing. A score of 10 indicated that the images in the web document strongly support the actual text content and that the document quality is excellent considering both images and text. The average of the values provided by the human judges was calculated for each document. Our program then calculated the IBQV value for each of the documents. Apache Lucene [22] was used for calculating document similarity based on the vector space model. The Google image search engine was used to find the web documents containing a specific image.

    Figure 3 plots the quality values given by the human judges alongside the IBQV values computed by our program for the general health documents. The correlation coefficient between the two sets of values was 0.65. Figure 4 shows the same comparison for the general business documents; in this case, the correlation coefficient was 0.71.
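
    The type of correlation coefficient is not stated above; assuming the Pearson coefficient, the comparison between averaged human scores and computed IBQV values could be reproduced as in the sketch below. The score lists are made-up placeholders, not the experimental data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, not the paper's data.
human_scores = [7.0, 4.5, 8.0, 3.0, 6.5]
program_scores = [6.1, 5.0, 7.4, 3.9, 5.2]
r = pearson(human_scores, program_scores)
```

    The Pearson coefficient is unaffected by linear rescaling, so the human scores (0 to 10) and the program's IBQV values do not need to be on the same scale for this comparison.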

    The values provided by the human judges do not exactly match the values computed by our program, but in most cases the two are close. This suggests that the proposed metric IBQV can estimate the appeal of the images in a web document and the degree to which they support its content.

    7 CONCLUSIONS

    In this work, an attempt has been made to enhance the existing ranking framework used by current web search engines by integrating image based features. Images have become an indispensable part of a web page, as they provide visual information that is essential for prompt understanding of the content. The appeal and relevance of the information conveyed by images are crucial factors for determining the static rank of a web document. Moreover, some images are advertisements, spam images or images that do nothing to enrich the content. They lead users in the wrong direction and leave them dissatisfied with the low quality of information. Our proposed method can identify documents that contain quality visual information and lift them up in the ranked list presented to the user. The metrics we propose are scalable and can be easily integrated with a ranking framework because of the simplicity of their computation.

    REFERENCES

    [1] M. Bendersky, W. B. Croft, and Y. Diao, "Quality-biased ranking of web documents," in Proceedings of the fourth ACM international conference on Web search and data mining, ser. WSDM '11. New York, NY, USA: ACM, 2011, pp. 95-104. [Online]. Available: http://doi.acm.org/10.1145/1935826.1935849

    [2] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, vol. 46, pp. 604-632, September 1999. [Online]. Available: http://doi.acm.org/10.1145/324133.324140

    [3] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Comput. Netw. ISDN Syst., vol. 30, pp. 107-117, April 1998. [Online]. Available: http://dx.doi.org/10.1016/S0169-7552(98)00110-X

    [4] D. Metzler and W. B. Croft, "A markov random field model for term dependencies," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '05. New York, NY, USA: ACM, 2005, pp. 472-479. [Online]. Available: http://doi.acm.org/10.1145/1076034.1076115

    [5] S. N. Keegan, "Importance of visual images in lectures: Case study on tourism management students," Journal of Hospitality, Leisure, Sports and Tourism Education, vol. 6, pp. 58-65, 2007.


    Fig. 3. IBQV values given by both human judges and our program for general health related documents.

    Fig. 4. IBQV values given by both human judges and our program for general business related documents.

    [6] A. E. Savakis, S. P. Etz, and A. C. P. Loui, "Evaluation of image appeal in consumer photography," B. E. Rogowitz and T. N. Pappas, Eds., vol. 3959, no. 1. SPIE, 2000, pp. 111-120. [Online]. Available: http://dx.doi.org/10.1117/12.387147

    [7] P. Obrador and N. Moroney, "Low level features for image appeal measurement," pp. 72 420T-72 420T-12, 2009. [Online]. Available: http://dx.doi.org/10.1117/12.806140

    [8] A. K. Moorthy, P. Obrador, and N. Oliver, "Towards computational models of the visual aesthetic appeal of consumer videos," in Proceedings of the 11th European conference on Computer vision: Part V, ser. ECCV'10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 1-14. [Online]. Available: http://dl.acm.org/citation.cfm?id=1888150.1888152

    [9] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Commun. ACM, vol. 18, pp. 613-620, November 1975. [Online]. Available: http://doi.acm.org/10.1145/361219.361220

    [10] L. a b Shapiro, Computer Vision, 1st ed. Prentice Hall, 2001.

    [11] D. Hasler and S. E. Suesstrunk, "Measuring colorfulness in natural images," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, ser. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, B. E. Rogowitz and T. N. Pappas, Eds., vol. 5007, Jun. 2003, pp. 87-95.

    [12] M. A. Najork, "Comparing the effectiveness of hits and salsa," in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, ser. CIKM '07. New York, NY, USA: ACM, 2007, pp. 157-164. [Online]. Available: http://doi.acm.org/10.1145/1321440.1321465

    [13] N. Craswell, S. Robertson, H. Zaragoza, and M. Taylor, "Relevance weighting for query independent evidence," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '05. New York, NY, USA: ACM, 2005, pp. 416-423. [Online]. Available: http://doi.acm.org/10.1145/1076034.1076106

    [14] W. Kraaij, T. Westerveld, and D. Hiemstra, "The importance of prior probabilities for entry page search," in Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '02. New


    York, NY, USA: ACM, 2002, pp. 27-34. [Online]. Available: http://doi.acm.org/10.1145/564376.564383

    [15] J. Peng, C. Macdonald, B. He, and I. Ounis, "Combination of document priors in web information retrieval," in Large Scale Semantic Access to Content (Text, Image, Video, and Sound), ser. RIAO '07. Paris, France: LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, 2007, pp. 596-611. [Online]. Available: http://dl.acm.org/citation.cfm?id=1931390.1931446

    [16] Y. Liu, B. Gao, T.-Y. Liu, Y. Zhang, Z. Ma, S. He, and H. Li, "Browserank: letting web users vote for page importance," in Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '08. New York, NY, USA: ACM, 2008, pp. 451-458. [Online]. Available: http://doi.acm.org/10.1145/1390334.1390412

    [17] M. Richardson, A. Prakash, and E. Brill, "Beyond pagerank: machine learning for static ranking," in Proceedings of the 15th international conference on World Wide Web, ser. WWW '06. New York, NY, USA: ACM, 2006, pp. 707-715. [Online]. Available: http://doi.acm.org/10.1145/1135777.1135881

    [18] E. Baratis, E. G. M. Petrakis, and E. E. Milios, "Automatic website summarization by image content: A case study with logo and trademark images," IEEE Trans. Knowl. Data Eng., vol. 20, no. 9, pp. 1195-1204, 2008.

    [19] O. Signore, "A comprehensive model for web sites quality," in WSE, 2005, pp. 30-38.

    [20] T.-Y. Liu, "Learning to rank for information retrieval," Found. Trends Inf. Retr., vol. 3, pp. 225-331, March 2009. [Online]. Available: http://dl.acm.org/citation.cfm?id=1618303.1618304

    [21] D. Metzler and W. Bruce Croft, "Linear feature-based models for information retrieval," Inf. Retr., vol. 10, pp. 257-274, June 2007. [Online]. Available: http://dl.acm.org/citation.cfm?id=1265488.1265494

    [22] A. S. Foundation, "Apache lucene - apache lucene core," http://lucene.apache.org/core/, 2011.

    Sheikh Muhammad Sarwar is currently working as a researcher in the Department of CSE, University of Dhaka, Bangladesh. He completed his M.Sc. and B.Sc. at the University of Dhaka. His research interests include Information Retrieval, Image Processing and Quantum Computing. He has published a research paper at an international conference. He received a scholarship for his B.Sc. results from the University of Dhaka.

    Md. Mustafizur Rahman is currently working as an Associate Professor in the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. He obtained his B.Sc. and M.Sc. from the University of Dhaka. He completed his Ph.D. at Kyung Hee University, South Korea. His research interests include Mobile Ad-hoc Networks, Wireless Mesh Networks and Information Retrieval.

    Mosaddek Hossain Kamal is currently working as an Associate Professor in the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. He obtained his B.Sc. from the University of Dhaka, Bangladesh and his M.Sc. from the University of New South Wales, Australia. His research interests include Mobile Middleware, Routing Algorithms and Information Retrieval.
