A Novel Algorithm to Determine the Quality of a Web Page for Improving the Search Engine Ranking Framework
Sheikh Muhammad Sarwar, Md. Mustafizur Rahman and Mosaddek Hossain Kamal
Abstract: This paper proposes and develops a general and formal static rank computation algorithm for ranking web documents
that considers the availability, significance, appeal and relevance of the images present in them. Different types of images appear in a
web document; some of them increase the content quality of the web page, while others are irrelevant to the content.
Moreover, some images are not appealing enough to attract users, and they do not necessarily improve the content. In
this paper, a static ranking algorithm in the spirit of PageRank is proposed, which works by analysing the images appearing
in the web document. A method for integrating this algorithm with a complete ranking framework based on the Markov Random
Field model is also presented. The algorithm computes a metric, IBQV (Image Based Quality Value), that indicates the
extent to which the images in a web document increase its value. The theoretical and practical implications of IBQV are shown, and
experimental results indicate that incorporating IBQV increases the correctness of the search results.
Index Terms: Information Retrieval, Web Page Ranking, Image Search Engine, Markov Random Field Model
1 INTRODUCTION
Existing document retrieval models usually assume that all
the documents in the collection have the same quality. The
equal quality assumption does not, however, hold for large
and heterogeneous web corpora [1]. As the number of web
documents is growing exponentially, it is becoming very
difficult to find the required web documents with respect to
a user query. Even if there are many documents that can
satisfy the user's information need, many web documents
contain unnecessary textual and multimedia information,
which hampers readability, makes navigation difficult and
tiresome, and clutters the presentation and layout. As a result,
the quality of a web document becomes a crucial factor when
designing a ranking function or framework.
Most current research focuses on mining the structure
of web pages to assign them a rank value. Two page ranking
algorithms, HITS [2] and PageRank [3], are commonly
used in web structure mining. Both algorithms measure
the importance of a web page based on the non-local link
structure. They rely solely on the votes from the neighbors
of the document in the link graph to determine the quality of
the document [1]. So, they are not sufficient for determining
the static rank of a web page, as content representation is
not considered as a quality criterion.
Sheikh Muhammad Sarwar is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]
Md. Mustafizur Rahman is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]
Mosaddek Hossain Kamal is with the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. E-mail: [email protected]
Little research has been done on integrating document quality
into the ranking framework. Text based features of
a web page have been integrated in a Markov Random
Field ranking model [1]. The Markov Random Field model for
Information Retrieval (MRF-IR) was proposed by Metzler
and Croft [4]. This model can make use of features based
on single terms, ordered phrases, and unordered phrases.
MRF-IR has been used as an effective tool in different search
tasks. It performs reasonably well for text based queries. However,
web documents also contain quality images, which can act
as discriminators when choosing pages to present to a user.
A document with images that support its text content can
be really useful and engaging for the user. A study of
the learning process of students showed the importance of
images in teaching, and its findings confirmed the benefit of
incorporating images in teaching and learning. If the images
are selected and used appropriately, they can enhance learning and
lead to a deep approach to learning amongst students [5].
So, images that are coherent with the textual content will
enhance the quality of a web document.
Images that are appealing increase the aesthetics of a
web page and attract users. Finding appealing images for
automatic album creation is a popular research topic [6].
Several algorithms and techniques for computing the appeal
of an image or a frame in a video have been proposed in
the literature [7] [8]. From the user's point of view, appealing images
increase the quality of web documents and should have an
impact on ranking them.
Based on the above assumptions, the design and architecture
of a ranking model is proposed in this work, in which
a document containing images that are appealing and
coherent with the content of the document is given
preference when being ranked with respect to a user query.
In this context, a new metric, Image Based Quality Value
(IBQV), is proposed and its calculation process
is illustrated. A new kind of image with
respect to a web page, the Stop Image, is also introduced.
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 10, OCTOBER 2012, ISSN (Online) 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG 49
2012 Journal of Computing Press, NY, USA, ISSN 2151-9617
2 BACKGROUND
Some topics need a short discussion before describing
the new image based quality model and its
integration into the ranking framework for web pages. These
topics form the basis of the new quality
model. One of them, the Stop Image, is introduced here
for the first time, and its definition
can be treated as a part of our contribution in
this paper.
2.1 Vector Space Model (VSM) for Document Similarity Analysis
The Vector Space Model (VSM) treats each document as a vector.
The dimensions of the vector are terms. If a term occurs in the
document, its value in the vector is non-zero. Several different
ways of computing these values, also known as (term) weights,
have been developed. One of the best known schemes is tf-idf
weighting. The definition of a term depends on the application.
Typically, terms are single words, keywords, or longer phrases
[9]. To measure the similarity between two documents,
vector representations of the two documents are first created. Then,
the cosine similarity measure is used to compute the similarity
score [9]. Other similarity measures can be used
as well.
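As a minimal sketch of the procedure just described, the following Python code builds tf-idf vectors for tokenised documents and compares them with cosine similarity. The function names, the smoothed idf formula and the three sample documents are assumptions made for illustration; the paper does not prescribe a particular weighting variant.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenised documents into sparse tf-idf vectors (term -> weight dicts)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumed variant) keeps every weight strictly positive.
        vectors.append({t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                        for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, already split into terms.
docs = [
    "healthy diet and regular exercise".split(),
    "regular exercise improves health".split(),
    "stock market business news".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # the two documents share two terms
print(cosine_similarity(vecs[0], vecs[2]))  # the two documents share no terms
```

Because all weights are non-negative, the score always lies between 0 and 1, which matches the range stated later for ITSV.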
2.2 Web Image Search Engine
A web image search engine combines the functionality of
keyword based image searching and content based image
searching. For keyword based image searching, the user gives
a text query to the search engine, and the search engine returns
a list of web pages where the object mentioned in the keywords
occurs. For content based image searching, the user presents
an image query to the search engine, and the search engine
uses an image distance measure to find the web pages
that contain the matched image. An image distance measure
compares the similarity of two images in various dimensions
such as color, texture, shape, and others. For example, a
distance of 0 signifies an exact match with the query with
respect to the dimensions that were considered. A value
greater than 0 indicates a lower degree of similarity between
the images: the larger the distance, the less similar they are. Search results
can then be sorted based on their distance to the queried image [10].
In this work, we only consider exact matches, that is,
results for which the image distance measure is 0.
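To make the notion of an image distance measure concrete, here is a small sketch that uses only the color dimension: each image is reduced to a coarse RGB histogram and two images are compared by the L1 distance between their histograms. This is an illustrative choice, not the measure used by any particular search engine, and the function names and the bin count are assumptions. Note that a distance of 0 here means identical histograms, which is an exact match only with respect to the one dimension considered.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalised histogram over coarse RGB bins; pixels are (r, g, b) in 0..255."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)  # assumes a non-empty image
    return {bin_: count / total for bin_, count in counts.items()}


def histogram_distance(pixels_a, pixels_b, bins=4):
    """L1 distance between colour histograms, in [0, 2]; 0 means identical histograms."""
    ha = color_histogram(pixels_a, bins)
    hb = color_histogram(pixels_b, bins)
    return sum(abs(ha.get(k, 0.0) - hb.get(k, 0.0)) for k in set(ha) | set(hb))


# Hypothetical four-pixel images.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
print(histogram_distance(red, red))   # identical images
print(histogram_distance(red, blue))  # completely different colours
```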
2.3 Stop Image
Stop images are those that can be deleted from the
document without the document losing any of the visual
information it provides. A related text based
concept is the stop word. In a document's
text, there are terms and words that are not necessary
from a retrieval perspective. Articles and prepositions are examples of
words that do not improve the knowledge inside a document.
They are stop words. Analogously, there are
images in a web document that do not necessarily improve
the content of the document. These are background images,
images with low resolution, images of very small size and
single color images.
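The four Stop Image criteria listed above could be checked with a simple rule-based filter such as the sketch below. The `ImageInfo` record, its fields and the numeric thresholds `MIN_SIDE` and `MIN_PIXELS` are hypothetical; the paper names the criteria but gives no numeric cut-offs.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    width: int            # pixels
    height: int           # pixels
    distinct_colors: int  # number of distinct pixel colours in the image
    is_background: bool   # e.g. the image is used as a page or element background


# Hypothetical thresholds; the paper does not specify numeric cut-offs.
MIN_SIDE = 50        # below this side length the image counts as "very small"
MIN_PIXELS = 10_000  # below this pixel count the image counts as "low resolution"


def is_stop_image(img: ImageInfo) -> bool:
    """Return True if the image matches any of the four Stop Image criteria."""
    if img.is_background:
        return True
    if img.distinct_colors <= 1:  # single color image
        return True
    if min(img.width, img.height) < MIN_SIDE:  # very small image
        return True
    if img.width * img.height < MIN_PIXELS:  # low resolution image
        return True
    return False


print(is_stop_image(ImageInfo(640, 480, 5000, False)))  # an ordinary photograph
print(is_stop_image(ImageInfo(16, 16, 30, False)))      # an icon-sized image
```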
2.4 Image Appeal Value
Image Appeal Value or Image Aesthetic Value (IAV) is necessary
for choosing images in automatic albuming applications
[6]. The appeal value of an image depends on the contrast and
colorfulness of the image [7]. There are several methods for
calculating the contrast and colorfulness of an image or a region
of an image, and different images can be compared using
these metrics. The contrast value can be calculated using the
following equations [7]:
$$CN_i = \left( \frac{1}{n_i - 1} \sum_{j \in region_i} (x_j - \bar{x})^2 \right)^{1/2}, \qquad \bar{x} = \frac{1}{n_i} \sum_{j \in region_i} x_j$$
An established method described in [11] is used to compute
the colorfulness of an image. It combines chroma magnitude
and color variance in the CIE-Lab color space. The following
equation is used to compute the colorfulness:
$$CF_i = \sigma_{a_i b_i} + 0.37\,\mu_{a_i b_i}$$
According to the equation above, $\sigma_{a_i b_i}$ is the trigonometric
length of the standard deviation in CIE-Lab space, and $\mu_{a_i b_i}$
is the distance of the centre of gravity in CIE-Lab space to
the neutral color axis [7]. IAV is calculated as the average of
the contrast and colorfulness values in this work.
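The two formulas and their average can be sketched in Python as follows. The inputs are assumed to be plain lists of per-pixel values for one region: luminance values for the contrast, and the a and b channels of CIE-Lab for the colorfulness. Using the population standard deviation for the a and b channels, and averaging the two quantities without any rescaling, are assumptions of this sketch; the function names are illustrative.

```python
import math
from statistics import mean, pstdev, stdev


def contrast(luminance):
    """Sample standard deviation of the region's pixel values (n - 1 denominator)."""
    return stdev(luminance)  # needs at least two pixel values


def colorfulness(a_values, b_values):
    """sigma_ab + 0.37 * mu_ab computed over the a and b channels of CIE-Lab."""
    # Length of the (std a, std b) vector.
    sigma_ab = math.hypot(pstdev(a_values), pstdev(b_values))
    # Distance of the (mean a, mean b) point from the neutral axis a = b = 0.
    mu_ab = math.hypot(mean(a_values), mean(b_values))
    return sigma_ab + 0.37 * mu_ab


def image_appeal_value(luminance, a_values, b_values):
    """IAV as the plain average of contrast and colorfulness."""
    return (contrast(luminance) + colorfulness(a_values, b_values)) / 2


# Hypothetical three-pixel region.
print(image_appeal_value([10, 20, 30], [3, 3, 3], [4, 4, 4]))
```

In practice the two quantities may live on different scales, so a real implementation would probably normalise them before averaging; the paper does not say how this was handled.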
3 RELATED WORKS
Ranking web documents independently of any query is a part of the
ranking framework or ranking function. It is usually referred
to as static ranking. There have been many approaches
for calculating link-based priors, such as PageRank [3], HITS
[2], and SALSA [12], which are often used in web search.
Research has shown that including these priors can significantly
improve the performance of the ranking function [13] [14]
[15]. Click-based priors can also be a measure of relevance, as
they are derived from information such as how frequently users
click a web document and how much time they spend on that
page. They have been shown to be beneficial for web search [16]
[17]. The text quality of documents has also been taken into account to
calculate document quality based priors, which were incorporated into
the ranking function [1].
Link-based, click-based and document text quality based
priors do not take into account the quality of images present
in a web page. Features based on the document text are used
to obtain a numerical value for the text quality of the
document [1]. There has been research on the images
present in web pages and their role in summarising the
contents of the document [18]. One study showed that
presentation plays an effective role in the evaluation of
web pages [19]. Evaluation in the context of presentation considered
two aspects:
- Text presentation (font size, character).
- Multimedia presentation (image quality, image size, number of images in a page, resolution of video etc.).
However, there has been no approach for integrating the presentation
aspect of a web page into the ranking function of a search
engine.
4 TWO NOVEL METRICS: ITSV AND IBQV
In this work, we have proposed a new image based feature
to incorporate into the ranking model, which is Image Based
Quality Value (IBQV). To calculate IBQV, we have used
a new metric ITSV (Image Text Similarity Value). In
this section, the computation process of these two values is
illustrated.
4.1 Image Text Similarity Value (ITSV)
ITSV is a simple variation of the document similarity value.
If two documents D1 and D2 contain the same image, then
their text based similarity value is defined as the Image Text
Similarity Value (ITSV). This value is between 0 and 1. To
assess the document similarity, the Vector Space Model (VSM)
is used to represent each of the documents. Then, using the
cosine similarity measure, the similarity value is obtained, and
this value is the ITSV in this paper.
4.2 Image Based Quality Value (IBQV)
IBQV is calculated with the help of an image search engine.
IBQV is the value that is added as a static rank value of
the web document. To find the IBQV of a web
document, we first extract all the images from the web
document. Then, Stop Images are identified in the set of images
and removed. The remaining images form the
set of quality images for that web page. Next, each of the
remaining images is presented to an image search engine as a query,
and the top-k documents are retrieved. Each of the
top-k documents is then taken in turn,
and its ITSV is calculated with respect
to the web document for which IBQV is being
computed. The ITSV values of all these document pairs are
summed and divided by k
to obtain the average ITSV for the image. The
IAV of the image is calculated as described in Section 2.4 and combined
with the average ITSV, and the resulting per-image value
is added to a running IBQV total. Finally, IBQV is obtained
by dividing this total by the number
of quality images. Figure 1 shows the process of calculating IBQV
for a single web document. The pseudo-code for calculating
IBQV is given in Algorithm 1.
Algorithm 1. IBQV Calculation Algorithm
Input:
  A web document D.
  Set of images in D, I = {i_1, i_2, ..., i_n}, where n is the number of images in D.
  Final Image Set F = {}.
Output:
  IBQV value of D.
Method:
  1.  FOR each image i_k ∈ I do
  2.    if i_k is not a Stop Image
  3.      then F = F ∪ {i_k}
  4.  END FOR
  5.  IBQV = 0
  6.  FOR each image f_k ∈ F do
  7.    submit f_k to an image search engine as query
  8.    K = { top-k documents from the search engine }
  9.    ITSV = 0
  10.   FOR each element k_i ∈ K do
  11.     find ITSV_i of D and k_i
  12.     ITSV = ITSV + ITSV_i
  13.   END FOR
  14.   ITSV = ITSV / |K|
  15.   IAV = calculate IAV(f_k)
  16.   IBQV = IBQV + (ITSV + IAV) / 2
  17. END FOR
  18. IBQV = IBQV / |F|
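Algorithm 1 can be sketched as a runnable Python function. The image search engine, the Stop Image test, the ITSV computation and the IAV computation are passed in as callables, so the sketch stays independent of any concrete service; all parameter names are hypothetical. Two details are assumptions not stated in the paper: a page with no quality images receives an IBQV of 0, and an image with no search hits contributes an average ITSV of 0.

```python
from typing import Callable, Iterable, Sequence


def compute_ibqv(
    document: str,
    images: Iterable[str],
    is_stop_image: Callable[[str], bool],
    search_top_k: Callable[[str], Sequence[str]],
    itsv: Callable[[str, str], float],
    iav: Callable[[str], float],
) -> float:
    """Average over non-stop images of (mean ITSV of the top-k hits + IAV) / 2."""
    # Steps 1-4: build the final image set F by dropping Stop Images.
    quality_images = [img for img in images if not is_stop_image(img)]
    if not quality_images:
        return 0.0  # assumption: no quality images means IBQV = 0
    total = 0.0
    for img in quality_images:
        # Steps 7-8: query the image search engine for pages containing img.
        hits = search_top_k(img)
        # Steps 9-14: average ITSV between the document and each hit
        # (assumption: 0.0 when the search returns nothing).
        avg_itsv = sum(itsv(document, hit) for hit in hits) / len(hits) if hits else 0.0
        # Steps 15-16: combine with the image appeal value.
        total += (avg_itsv + iav(img)) / 2
    # Step 18: average over the quality images.
    return total / len(quality_images)


# Hypothetical stand-ins for the search engine, ITSV and IAV components.
hits_by_image = {"chart.png": ["pageA", "pageB"], "photo.jpg": ["pageC"]}
itsv_by_page = {"pageA": 0.8, "pageB": 0.4, "pageC": 0.5}
appeal = {"chart.png": 0.6, "photo.jpg": 0.9}

score = compute_ibqv(
    document="pageD",
    images=["chart.png", "photo.jpg", "spacer.gif"],
    is_stop_image=lambda img: img == "spacer.gif",
    search_top_k=lambda img: hits_by_image[img],
    itsv=lambda doc, hit: itsv_by_page[hit],
    iav=lambda img: appeal[img],
)
print(score)
```

Passing the components as callables also makes it easy to cache search results, which matters because each image triggers one search query plus up to k document comparisons.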
5 INTEGRATION OF IBQV WITH MARKOV RANDOM FIELD MODEL
A Markov random field (MRF) is a graphical model in which
the nodes correspond to random variables and the edges
are undirected. The edges define dependencies among the
variables. An MRF can represent cyclic dependencies. Metzler
and Croft [4] proposed to model a joint relevance distribution
over a query $Q = q_1, \ldots, q_n$ and a document $D$ using an MRF [1].
Figure 2 shows the MRF model for a document and a three
term query. As shown in the model, given D, non-adjacent
query terms are independent, but adjacent query terms are
dependent on each other, as there is an edge between them.
Using the MRF, the joint distribution over the random
variables in the graph G can be calculated. In this process,
the set of cliques C(G) in the graph G is found, and a
non-negative potential function is defined over the
cliques. Given a query Q and a document D, the joint
relevance distribution is expressed as:
$$P_{G,\Lambda}(Q,D) = \frac{1}{Z} \prod_{c \in C(G)} \psi(c;\Lambda)$$
Fig. 1. Computation procedure of IBQV
Fig. 2. MRF model with a sequential dependence assumption [1]
In the above equation, $Z$ is a normalizing constant and $\Lambda$ is a set of free parameters that are used within the potential
functions. The potential function usually takes the following
form:
$$\psi(c;\Lambda) = e^{\sum_i \lambda_i f_i(c)}$$
The score of a document D with respect to the query Q can
be defined as [4]:
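$$\begin{aligned}
score(Q,D) &= \log P_{G,\Lambda}(D \mid Q) \\
&= \log P_{G,\Lambda}(Q,D) - \log P_{G,\Lambda}(Q) \\
&= \sum_{c \in C(G)} \log \psi(c;\Lambda) - \log Z - \log P_{G,\Lambda}(Q) \\
&\stackrel{rank}{=} \sum_{c \in C(G)} \log \psi(c;\Lambda)
\end{aligned}$$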
The last step is a rank equivalence: neither $\log Z$ nor $\log P_{G,\Lambda}(Q)$
depends on $D$, so dropping them does not change the order of the documents.
Now, to instantiate the MRF model, a set of cliques, $C(G)$,
and a set of potential functions, $\psi(c;\Lambda)$, over the cliques
have to be defined. There are several possible instantiations,
based on the different dependence assumptions between the
document and the query terms [4]. The sequential dependence
instantiation has been shown to be effective
[4], [1]. The sequential dependence instantiation of the MRF
model, shown in Figure 2, assumes dependence only between
adjacent query terms.
Three types of cliques can be found
under the sequential dependence assumption. The first
type of clique involves a single term node and the document
node. The potential function for these cliques is defined as
follows [1]:
$$\log \psi(q_i, D; \Lambda) = \lambda_T f_T(q_i, D)$$
Here, $f_T(q_i, D)$ is a feature function defined over the query
term $q_i$ and the document $D$, and $\lambda_T$ is a free parameter.
The second type of clique involves two query terms and the
document node. The potential functions over these cliques are
defined as:
$$\log \psi(q_i, q_{i+1}, D; \Lambda) = \lambda_O f_O(q_i, q_{i+1}, D) + \lambda_U f_U(q_i, q_{i+1}, D)$$
where $f_O(q_i, q_{i+1}, D)$ and $f_U(q_i, q_{i+1}, D)$ are feature
functions, and $\lambda_O$ and $\lambda_U$ are free parameters. These
potentials are made up of two distinct components. The first
considers ordered (exact phrase) matches and is denoted by the O
subscript. The second, denoted by the U subscript, considers
unordered matches [1].
The third type of clique over which a potential function
is defined contains the document node only. Bendersky et al. [1] defined
the query independent potential function based on a set
of quality based factors that increase content clarity,
document readability and ease of navigation. This query
independent potential function, which relies only on
text based features, takes the following form:
$$\log \psi(D;\Lambda) = \sum_{L \in \mathcal{L}(D)} \lambda_L f_L(D)$$
In the equation above, $\mathcal{L}(D)$ is the set of quality based
factors associated with the document node $D$. A feature
value is calculated for each quality based factor and
multiplied by the corresponding parameter value. In this
way, the quality value of the document is obtained [1].
We propose the integration of IBQV with the final ranking
function, and after the integration, the equation for the final
score computation will take the following form:
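$$\begin{aligned}
score(Q,D) = {} & \lambda_T f_T(q_i, D) \\
& + \lambda_O f_O(q_i, q_{i+1}, D) + \lambda_U f_U(q_i, q_{i+1}, D) \\
& + \sum_{L \in \mathcal{L}(D)} \lambda_L f_L(D) \\
& + \lambda_i \, IBQV
\end{aligned}$$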
Here, $\lambda_i$ is a free parameter that indicates the importance
of IBQV in the ranking function. The value of $\lambda_i$
can be determined by a learning-to-rank [20] scheme. In [1], the
parameter values for the other features were determined
using a coordinate ascent algorithm proposed by
Metzler and Croft [21]. In total, parameters for 13 features
were tuned using that algorithm [1]. However, in that model, 10
features were document quality based and all of them were
text based. No image based features were considered
for ranking the documents. Our assumption is that including
the novel metric IBQV would improve the ranking
framework, as quality images are assets of a web document.
They increase the quality of the document and can
act as discriminating agents when the overall ranking is performed.
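The final scoring function is a weighted linear combination, which the following sketch makes explicit. It assumes, as the clique sum in the score derivation implies, that the term features are summed over all query terms and the ordered and unordered features over all adjacent term pairs. The function name, argument layout and all numeric values are hypothetical; the feature values themselves are taken as given.

```python
def final_score(term_feats, ordered_feats, unordered_feats, quality_feats,
                ibqv, lam_t, lam_o, lam_u, lam_quality, lam_ibqv):
    """Linear score: term, ordered, unordered and quality features plus weighted IBQV."""
    score = lam_t * sum(term_feats)          # one f_T value per query term
    score += lam_o * sum(ordered_feats)      # one f_O value per adjacent term pair
    score += lam_u * sum(unordered_feats)    # one f_U value per adjacent term pair
    # One weight per text based quality factor, keyed by factor name.
    score += sum(lam_quality[name] * value for name, value in quality_feats.items())
    score += lam_ibqv * ibqv                 # the image based static prior
    return score


# Two hypothetical documents with identical text features but different IBQV.
shared = dict(
    term_feats=[-2.1, -3.0, -1.7],
    ordered_feats=[-4.0, -5.2],
    unordered_feats=[-3.5, -4.1],
    quality_feats={"readability": 0.7},
    lam_t=0.8, lam_o=0.1, lam_u=0.1,
    lam_quality={"readability": 0.5},
    lam_ibqv=0.3,
)
doc_with_good_images = final_score(ibqv=0.9, **shared)
doc_with_poor_images = final_score(ibqv=0.2, **shared)
print(doc_with_good_images > doc_with_poor_images)
```

Because IBQV does not depend on the query, it can be computed once per document at indexing time and only multiplied by its weight at query time.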
6 EXPERIMENTAL RESULTS
For the experiment, 40 documents were downloaded from the Yahoo web
directory. Among them, 20 documents were
from the general health directory and the other 20
were from the general business directory. The web
pages were carefully chosen so that they contained a significant
number of images. Human judgment values for the image based
quality rating of each web page were collected from three experts,
and their judgment values ranged between 0 and
10. A value of 0 indicated that the images in the web document do not
possess any quality to support the actual text content of the
document and are less appealing. A value of 10 indicated that the images in
the web document highly support the actual text content of the
document and that the document quality is excellent considering
the images and text. The average of the values provided by
the human judges was calculated for each document. Then, our
program calculated the IBQV for each of
the documents. Apache Lucene [22] was used for calculating
document similarity based on the vector space model. Google
Image Search was used to find the web documents
containing a specific image.
Figure 3 shows, graphically, the IBQV values given by the human judges
and by our program for the general health related documents.
The correlation coefficient between these
values was 0.65. Figure 4 shows the corresponding
values for the general business related documents;
in this case, the correlation coefficient was 0.71.
The values provided by the human judges do not exactly match
the values computed by our program. However, in most
cases, the computed values were close to those provided by the human
judges. This suggests that the proposed metric IBQV can estimate
the appeal of the images in web documents and the degree
to which they support the content.
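The comparison between human ratings and computed values can be reproduced with a few lines of Python. The paper does not say which correlation coefficient was used; the sketch below assumes Pearson's coefficient. The two rating lists are made-up illustrative numbers, not the experimental data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical averaged judge ratings and program outputs for five pages.
human_ratings = [7.0, 4.5, 8.0, 3.0, 6.0]
program_values = [6.2, 5.0, 7.1, 4.2, 5.1]
print(pearson(human_ratings, program_values))
```

A coefficient near 1 means the program orders and spaces the pages much like the judges do, even if the two sets of values sit on different scales.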
7 CONCLUSIONS
In this work, an attempt has been made to enhance the existing
ranking framework used by current web search engines
by integrating image based features. Images have become
an indispensable part of a web page, as they provide visual
information that is essential for prompt understanding of the
content. The appeal and relevance of the information conveyed
by images are crucial factors for determining the static rank
of a web document. Moreover, some images are
advertisements, spam images or images that are unnecessary for
content enrichment. They drive users in the wrong
direction and leave them unsatisfied with the low quality of
information. Our proposed method can find documents
that contain quality visual information and lift them up in
the ranked list presented to the user. The metrics we
proposed are scalable and can be easily integrated
with a ranking framework because of the simplicity of the
computation process.
REFERENCES
[1] M. Bendersky, W. B. Croft, and Y. Diao, "Quality-biased ranking of web documents," in Proceedings of the fourth ACM international conference on Web search and data mining, ser. WSDM '11. New York, NY, USA: ACM, 2011, pp. 95-104. [Online]. Available: http://doi.acm.org/10.1145/1935826.1935849
[2] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, vol. 46, pp. 604-632, September 1999. [Online]. Available: http://doi.acm.org/10.1145/324133.324140
[3] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Comput. Netw. ISDN Syst., vol. 30, pp. 107-117, April 1998. [Online]. Available: http://dx.doi.org/10.1016/S0169-7552(98)00110-X
[4] D. Metzler and W. B. Croft, "A markov random field model for term dependencies," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '05. New York, NY, USA: ACM, 2005, pp. 472-479. [Online]. Available: http://doi.acm.org/10.1145/1076034.1076115
[5] S. N. Keegan, "Importance of visual images in lectures: Case study on tourism management students," Journal of Hospitality, Leisure, Sports and Tourism Education, vol. 6, pp. 58-65, 2007.
Fig. 3. IBQV values given by both human judges and our program for general health related documents.
Fig. 4. IBQV values given by both human judges and our program for general business related documents.
[6] A. E. Savakis, S. P. Etz, and A. C. P. Loui, "Evaluation of image appeal in consumer photography," B. E. Rogowitz and T. N. Pappas, Eds., vol. 3959, no. 1. SPIE, 2000, pp. 111-120. [Online]. Available: http://dx.doi.org/10.1117/12.387147
[7] P. Obrador and N. Moroney, "Low level features for image appeal measurement," pp. 72 420T-72 420T-12, 2009. [Online]. Available: http://dx.doi.org/10.1117/12.806140
[8] A. K. Moorthy, P. Obrador, and N. Oliver, "Towards computational models of the visual aesthetic appeal of consumer videos," in Proceedings of the 11th European conference on Computer vision: Part V, ser. ECCV'10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 1-14. [Online]. Available: http://dl.acm.org/citation.cfm?id=1888150.1888152
[9] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Commun. ACM, vol. 18, pp. 613-620, November 1975. [Online]. Available: http://doi.acm.org/10.1145/361219.361220
[10] L. a b Shapiro, Computer Vision, 1st ed. Prentice Hall, 2001.
[11] D. Hasler and S. E. Suesstrunk, "Measuring colorfulness in natural images," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, B. E. Rogowitz and T. N. Pappas, Eds., vol. 5007, Jun. 2003, pp. 87-95.
[12] M. A. Najork, "Comparing the effectiveness of hits and salsa," in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, ser. CIKM '07. New York, NY, USA: ACM, 2007, pp. 157-164. [Online]. Available: http://doi.acm.org/10.1145/1321440.1321465
[13] N. Craswell, S. Robertson, H. Zaragoza, and M. Taylor, "Relevance weighting for query independent evidence," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '05. New York, NY, USA: ACM, 2005, pp. 416-423. [Online]. Available: http://doi.acm.org/10.1145/1076034.1076106
[14] W. Kraaij, T. Westerveld, and D. Hiemstra, "The importance of prior probabilities for entry page search," in Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '02. New York, NY, USA: ACM, 2002, pp. 27-34. [Online]. Available: http://doi.acm.org/10.1145/564376.564383
[15] J. Peng, C. Macdonald, B. He, and I. Ounis, "Combination of document priors in web information retrieval," in Large Scale Semantic Access to Content (Text, Image, Video, and Sound), ser. RIAO '07. Paris, France: LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, 2007, pp. 596-611. [Online]. Available: http://dl.acm.org/citation.cfm?id=1931390.1931446
[16] Y. Liu, B. Gao, T.-Y. Liu, Y. Zhang, Z. Ma, S. He, and H. Li, "Browserank: letting web users vote for page importance," in Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '08. New York, NY, USA: ACM, 2008, pp. 451-458. [Online]. Available: http://doi.acm.org/10.1145/1390334.1390412
[17] M. Richardson, A. Prakash, and E. Brill, "Beyond pagerank: machine learning for static ranking," in Proceedings of the 15th international conference on World Wide Web, ser. WWW '06. New York, NY, USA: ACM, 2006, pp. 707-715. [Online]. Available: http://doi.acm.org/10.1145/1135777.1135881
[18] E. Baratis, E. G. M. Petrakis, and E. E. Milios, "Automatic website summarization by image content: A case study with logo and trademark images," IEEE Trans. Knowl. Data Eng., vol. 20, no. 9, pp. 1195-1204, 2008.
[19] O. Signore, "A comprehensive model for web sites quality," in WSE, 2005, pp. 30-38.
[20] T.-Y. Liu, "Learning to rank for information retrieval," Found. Trends Inf. Retr., vol. 3, pp. 225-331, March 2009. [Online]. Available: http://dl.acm.org/citation.cfm?id=1618303.1618304
[21] D. Metzler and W. Bruce Croft, "Linear feature-based models for information retrieval," Inf. Retr., vol. 10, pp. 257-274, June 2007. [Online]. Available: http://dl.acm.org/citation.cfm?id=1265488.1265494
[22] A. S. Foundation, "Apache lucene - apache lucene core," http://lucene.apache.org/core/, 2011.
Sheikh Muhammad Sarwar is currently working as a researcher in the Department of CSE, University of Dhaka, Bangladesh. He completed his M.Sc. and B.Sc. at the University of Dhaka. His research interests include Information Retrieval, Image Processing, Quantum Computing etc. He published his research paper in an international conference. He received a scholarship for his result in B.Sc. from the University of Dhaka.
Md. Mustafizur Rahman is currently working as an Associate Professor in the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. He obtained his B.Sc. and M.Sc. from the University of Dhaka. He completed his PhD at Kyung Hee University, South Korea. His research interests include Mobile Ad-hoc Network, Wireless Mesh Network, Information Retrieval etc.
Mosaddek Hossain Kamal is currently working as an Associate Professor in the Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh. He obtained his B.Sc. from the University of Dhaka, Bangladesh and his M.Sc. from the University of New South Wales, Australia. His research interests include Mobile Middleware, Routing Algorithms, Information Retrieval etc.