
Utilizing local evidence for blog feed search

Yeha Lee • Seung-Hoon Na • Jong-Hyeok Lee

Received: 18 March 2011 / Accepted: 8 August 2011 / Published online: 26 August 2011
© Springer Science+Business Media, LLC 2011

Abstract Blog feed search aims to identify a blog feed of recurring interest to users on a

given topic. A blog feed, the retrieval unit for blog feed search, comprises blog posts of

diverse topics. This topical diversity of blog feeds often causes performance deterioration

of blog feed search. To alleviate the problem, this paper proposes several approaches based

on passage retrieval, widely regarded as effective to handle topical diversity at document

level in ad-hoc retrieval. We define the global and local evidence for blog feed search,

which correspond to the document-level and passage-level evidence for passage retrieval,

respectively, and investigate their influence on blog feed search, in terms of both initial

retrieval and pseudo-relevance feedback. For initial retrieval, we propose a retrieval

framework to integrate global evidence with local evidence. For pseudo-relevance feedback, we gather feedback information from the local evidence of the top K ranked blog

feeds to capture diverse and accurate information related to a given topic. Experimental

results show that our approaches using local evidence consistently and significantly outperform traditional ones.

Keywords Blog feed search · Blog distillation · Passage-based retrieval · Pseudo-relevance feedback

A preliminary version of this work was presented in Lee et al. (2009).

Y. Lee (✉) · J.-H. Lee
Division of Electrical and Computer Engineering, POSTECH, Pohang, South Korea
e-mail: [email protected]

J.-H. Lee
e-mail: [email protected]

S.-H. Na
Department of Computer Science, National University of Singapore, Singapore, Singapore
e-mail: [email protected]


Inf Retrieval (2012) 15:157–177
DOI 10.1007/s10791-011-9176-6

1 Introduction

Many users have been using blogs (or weblogs) to express their thoughts or opinions about

a wide range of topics including political issues, product reviews and diary-like private

posts. As the number of blog users has increased, the importance of blogs as an information

source has risen. As a result, the need for a customized elaborate search system, which

aims to find useful information in the blogosphere, has grown. Several commercial search

engines such as Google1 and Technorati2 have started to provide blog search services.

Nowadays, it has become common for blog users to search for blog feeds (e.g. RSS,

ATOM) relevant to topics that interest them, and then subscribe to the feeds using a feed

reader such as an RSS reader. In this scenario, a key issue is how to identify blog feeds that

are relevant and dedicated to a given topic. This task is blog feed search, which is one of

the most important blog search services. The Blog Distillation task of TREC Blog Track

(Macdonald et al. 2008; Ounis et al. 2009; Macdonald et al. 2010) also reflects the

increasing interest in blog feed search.

A straightforward approach for blog feed search would be to apply existing retrieval

models developed in ad-hoc retrieval. For example, we can view a blog feed as a virtual

document by concatenating all constituent posts belonging to the blog feed, and then

readily apply existing retrieval models without any modification. In fact, most previous

work on blog feed search used this approach as the baseline system (Macdonald et al.

2008; Ounis et al. 2009; Macdonald et al. 2010).

However, blog feed search has some characteristics that limit the performance of the

straightforward approach. First, the retrieval unit is a blog feed, which is an aggregation of

its constituent posts, not a single blog post. In this regard, blog feed search should consider

how to model the relationship between the relevance of blog posts and the blog feed, in

response to a given topic. Second, most blog feeds contain topically diverse blog posts,

depending on a blogger’s interest. In other words, a blog feed generally addresses a large

number of topics. The topical diversity of blog feeds makes it difficult for blog feed search

systems to find out which blog feeds are relevant to users’ information needs. Third, blog

feed search has to deal with more noisy data than traditional search tasks. The blog corpus

is not as topically coherent as the news corpus, and may also have non-topical contents

such as spam blogs and blog comment spam, which advertise commercial products and

services (Kolari et al. 2006). Therefore, feed search techniques should be robust to this

noisy environment.

Among the above characteristics, this paper focuses on the performance deterioration

caused by the topical diversity of blog feeds. To mitigate this problem, our approaches are

motivated by the passage retrieval technique, which is one of the most effective techniques

to deal with topical diversity at document level for ad-hoc retrieval. We introduce global evidence and local evidence for evaluating the relevance of a blog feed in response to

a query. These two types of evidence correspond to document-level and passage-level

evidence for passage retrieval, respectively. Whereas global evidence is derived from all

the constituent posts within a feed, local evidence is defined using a few blog posts that are

highly relevant to a query.

Different from most previous studies that use only global evidence to estimate the

relevance of a blog feed, we explicitly define and take advantage of the local evidence on

both initial retrieval and pseudo-relevance feedback (PRF). For initial retrieval, we propose

1 http://blogsearch.google.com/
2 http://www.technorati.com/


an approach to integrate the global evidence with the local evidence, and verify that the

usage of local evidence is effective in mitigating the topical diversity problem of blog feed

search. Furthermore, we present a novel document selection approach for PRF, based on

the local evidence of a blog feed. While several studies have examined the initial

retrieval model for blog feed search, PRF has not been well studied for blog feed search,

despite its importance. Our approaches select feedback documents based on local evidence

of the top ranked blog feeds in order to improve the ‘‘precision’’ and ‘‘aspect recall’’

(Kurland et al. 2005) of feedback information, which are two important factors affecting

the performance of the feedback model. Experimental results show that the proposed

method achieves MAP scores that are 6, 2 and 11% better than the best results of TREC 07,

08 and 09, respectively. These results are notable in that our work is the first successful

feedback approach for blog feed search in a ‘‘closed setting’’ using only a test collection. In

general, it is of common interest to investigate whether PRF, in the context of the

closed setting, improves performance over the baseline for various retrieval tasks including

ad-hoc retrieval and web search (Rocchio 1971; Yu et al. 2003; Zhai and Lafferty 2001;

Na et al. 2008a; Lavrenko and Croft 2001). Furthermore, to the best of our knowledge,

while existing work that reported improvements from PRF for blog feed search relies on

external resources, we improve the performance of blog feed search

without resorting to any other resources.

The rest of the paper is organized as follows. In Sect. 2, we present the issue of topical

diversity that motivates our work, and address feed search models using the global and

local evidence of a blog feed. In Sect. 3, we conduct several experiments to evaluate the

performance of our proposed methods, and discuss the difference between our approach

and previous work. In Sects. 4 and 5, we describe our approaches for PRF, and compare the

results with traditional feedback approaches. In Sect. 6, we briefly survey related work on

blog feed search. Finally, we conclude the paper and discuss future work in Sect. 7.

2 Initial retrieval model for blog feed search

2.1 Motivation: topical diversity of blog feeds

Topical diversity is a problem not only for blog feed search, but also for ad-hoc retrieval at

the document level. A document can contain diverse topics, particularly when it is long. As

a result, long documents are likely to be over-penalized by a retrieval algorithm even

though they are relevant to a given topic, resulting in poor retrieval performance (Salton

et al. 1993).

Many approaches have been proposed to solve this problem. One of the effective

approaches is passage retrieval, in which the relevance score of a document is boosted by

an additional score estimated using passage-level evidence. Passage retrieval has turned

out to significantly improve the baseline using only traditional document-level evidence

(Callan 1994; Kaszkiel and Zobel 1997; Kaszkiel and Zobel 2001; Salton et al. 1993;

Na et al. 2008b; Bendersky and Kurland 2010).

Passage-level evidence has also been applied to PRF (Allan 1995; Na et al. 2008a),

namely passage-based feedback which uses passages as the context for query expansion

instead of documents. Passage-based feedback has been reported to result in significant

improvements over conventional document-based feedback.

The topical diversity has a greater negative impact on blog feed search than on ad-hoc

retrieval, because a blog feed which is the retrieval unit of blog feed search consists of


many blog posts. A blog feed usually contains more topics than a document, and the topics

of the feed are likely to be less coherent than those of the document. This means that even

if a blog feed is relevant to a given topic, a large number of posts within the feed can be

irrelevant.

In practice, most relevance judgments currently used for blog feed search regard a blog

feed as relevant even if only some of the posts within the feed are relevant. For example,

Seo and Croft (2008) introduced several criteria for relevance judgments, where relevant

feeds are divided into three levels according to the proportion of relevant posts in a feed.

Their minimum cutoff criterion to determine if a blog feed is relevant is whether at least

25% of all the posts within the feed are relevant.

2.2 Retrieval framework

To deal with the topical diversity of a blog feed, this paper proposes a novel approach

based on passage retrieval. We first define the global and local evidence of a blog feed. To

achieve this, we make correspondences between a document and a blog feed, and a passage

and a subset of blog posts within the feed. Then, we evaluate the global evidence using all

the constituent posts within the feed, corresponding to the document-level evidence in

passage retrieval. We also estimate the local evidence using a subset of the blog posts,

corresponding to passage-level evidence.

In the following sections, we address how the evidence affects the relevance of a blog

feed in response to a given query.

2.2.1 Global evidence and local evidence

Global evidence can be estimated using the overall information of a blog feed (i.e. all

constituent posts). This addresses one of the important issues for evaluating the relevance

of a blog feed to a query. Global evidence reflects how much the feed is devoted to a given

query. Given the query, we evaluate the devotedness of a blog feed using the proportion of

relevant blog posts within the feed. We assume that the more devoted a blog feed is to a

given query, the more likely it is to be relevant.

Local evidence can be evaluated using a subset of blog posts within a blog feed. As the

definition of a passage is important for passage retrieval, the way local evidence is defined

is also a critical issue. In this paper, we utilize a set of the T most relevant posts within a

blog feed to evaluate the local evidence of the feed in response to a given query. We

assume that the top-ranked posts can be a representative sample of the feed about a query

topic, conceptually corresponding to a passage in passage retrieval.

2.2.2 Combination of evidence

Global and local evidence have their own limitations in terms of blog feed search. First,

global evidence tends to prefer a small blog feed (i.e. a feed with a small number of posts)

to a large one, because small feeds are less likely to contain diverse topics. Second, local

evidence uses only a few relevant posts within a blog feed, and thus cannot identify which

blog feed is more devoted to a given query.

To overcome the limitations of each type of evidence, our retrieval model combines

both global and local evidence. Let R(Q, F) be the relevance score of a blog feed F in


response to a query Q. We use linear interpolation to combine the two types of evidence as

follows:

R(Q, F) = (1 − α)·R_G(Q, F) + α·R_L(Q, F)    (1)

where R_G and R_L are the relevance scores estimated using the global and local evidence,

respectively, and α is a weight parameter to control the relative importance of the two types

of evidence.
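As a quick illustration, the interpolation in (1) can be sketched as follows. Here `r_global` and `r_local` stand for the already-computed scores R_G(Q, F) and R_L(Q, F); the function name is ours, not the paper's:

```python
def combined_relevance(r_global: float, r_local: float, alpha: float) -> float:
    """Linear interpolation of Eq. (1): R(Q,F) = (1 - alpha)*R_G + alpha*R_L."""
    assert 0.0 <= alpha <= 1.0, "alpha is a weight in [0, 1]"
    return (1.0 - alpha) * r_global + alpha * r_local
```

With alpha = 0 the model falls back to global evidence only; with alpha = 1 it uses local evidence only.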

2.3 Basic retrieval models

Since a blog feed consists of a number of blog posts, there may be several approaches for

representing the blog feed according to the granularity level. From previous work on blog

feed search (Elsas et al. 2008; Macdonald and Ounis 2008; Seo and Croft 2008; Macdonald et al. 2008; Ounis et al. 2009), we can observe that there are two ways to represent

a blog feed. In this study, we use the models proposed by Elsas et al. (2008) in order to

represent a blog feed: ‘‘Large Document Model’’ (LDM) and ‘‘Small Document Model’’

(SDM).

Let L be a subset of a blog feed F, which will be defined differently according to the

type of evidence (i.e. global or local).

First, LDM regards a blog feed as a single large document represented by concatenating

all the constituent posts within it. Then, the relevance score of the feed is estimated using

the relevance score between the virtual document and a query. Therefore, most ad-hoc

retrieval techniques can be applied to LDM.

R_LDM(Q, F) = Score(Q, VD)    (2)

where VD is a virtual document represented by concatenating the blog posts within the set

L, and Score(Q, VD) can be evaluated using (4) to be defined later.

LDM has some problems which arise from representing a blog feed by concatenating all

posts within it without any consideration of the relationship among the posts (Seo and

Croft 2008).

Second, SDM regards a blog feed as a collection of all the blog posts within it. Then, its

relevance score is evaluated by summing up the relevance score of each post in response to

a query. The score function for SDM is defined as follows:

R_SDM(Q, F) = Σ_{D ∈ L} Score(Q, D)·P(D|L)    (3)

where P(D|L) means the probability of selecting a blog post D, given the set L, and

Score(Q, D) can be evaluated using (4).

There are many possible approaches to estimate the probability P(D|L) (Elsas et al.

2008). However, we assume the probability P(D|L) has a uniform distribution because our

interest is in exploring the influence of global and local evidence on the performance of the

blog feed search.

The remaining issue is how to define the subset L of a blog feed F. We construct the

subset L according to each evidence, as follows:

1. Global Evidence: L consists of all blog posts within a feed F, i.e. L = F.

2. Local Evidence: L is a set of top T ranked blog posts within a feed F in response to a

given query, denoted by Top(T, F).
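The four basic models defined below can be made concrete with a minimal sketch. The input shapes are our assumptions: each post has already been scored against the query with Score(Q, D), P(D|L) is uniform as stated above, and the LDM concatenation-and-rescoring step is abstracted behind a `score_virtual_doc` callback:

```python
def top_t_posts(scored_posts, t):
    """Top(T, F): the T posts of the feed most relevant to the query.
    scored_posts is a list of (score, post_text) pairs -- our convention."""
    return sorted(scored_posts, key=lambda sp: sp[0], reverse=True)[:t]

def sdm_score(scored_posts):
    """SDM of Eq. (3) with uniform P(D|L): the mean Score(Q, D) over the subset L."""
    return sum(s for s, _ in scored_posts) / len(scored_posts)

def ldm_score(scored_posts, score_virtual_doc):
    """LDM of Eq. (2): concatenate the subset L into a virtual document and
    score it; score_virtual_doc stands in for Score(Q, VD)."""
    return score_virtual_doc(" ".join(text for _, text in scored_posts))

# GSD = sdm_score(all posts)            GLD = ldm_score(all posts, f)
# LSD = sdm_score(top_t_posts(..., T))  LLD = ldm_score(top_t_posts(..., T), f)
```

The trailing comments map the two subset choices (L = F vs. L = Top(T, F)) onto the four model names introduced next.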


Then, we can define two different models for each type of evidence, depending on

which representation (LDM or SDM) is used. First, global evidence has the following two

models:

– Global Large Document Model (GLD), R_G^LDM, uses global evidence with the LDM

for feed representation (i.e. L = F in (2)). GLD was used as the baseline for many

systems on the Blog Distillation task (Macdonald et al. 2008; Ounis et al. 2009;

Macdonald et al. 2010), and the results show that this model is effective without

resorting to any other techniques or resources.

– Global Small Document Model (GSD), R_G^SDM, uses global evidence with the SDM for

feed representation (i.e. L = F in (3)).

Similarly, local evidence has the following two models:

– Local Large Document Model (LLD), R_L^LDM, uses local evidence with the LDM for

feed representation (i.e. L = Top(T, F) in (2)). Unlike the GLD, the virtual document is

represented by a concatenation of blog posts relevant to a query, not all of the posts

within the blog feed F.

– Local Small Document Model (LSD), R_L^SDM, uses local evidence with the SDM for

feed representation (i.e. L = Top(T, F) in (3)).

2.4 Combined models

Four models are possible in (1) using two global models (GLD and GSD) and two local

models (LLD and LSD) as follows:

– GLD+LLD: GLD for R_G(Q, F), and LLD for R_L(Q, F), formulated by R(Q, F) = (1 − α)·R_G^LDM(Q, F) + α·R_L^LDM(Q, F).

– GLD+LSD: GLD for R_G(Q, F), and LSD for R_L(Q, F), formulated by R(Q, F) = (1 − α)·R_G^LDM(Q, F) + α·R_L^SDM(Q, F).

– GSD+LLD: GSD for R_G(Q, F), and LLD for R_L(Q, F), formulated by R(Q, F) = (1 − α)·R_G^SDM(Q, F) + α·R_L^LDM(Q, F).

– GSD+LSD: GSD for R_G(Q, F), and LSD for R_L(Q, F), formulated by R(Q, F) = (1 − α)·R_G^SDM(Q, F) + α·R_L^SDM(Q, F).

2.5 Relevance score function

The remaining issue is the score function to evaluate the relevance between a document

and a query. Since LDM views a blog feed as a large document, we need a score function

to estimate the relevance between a large document (blog feed) and a query. For SDM, we

also need a score function to evaluate the relevance between each blog post and a query. To

this end, we use one of the representative state-of-the-art retrieval models, the KL-divergence language model (Lafferty and Zhai 2001).

Let θ_Q and θ_D be a query language model and a document language model, respectively.

We use Dirichlet smoothing (Zhai and Lafferty 2004) to estimate the document language

model. Our score function is as follows:

Score(Q, D) ≝ Σ_{w ∈ Q∩D} P(w|θ_Q) · log(1 + tf(w, D) / (μ·P(w|C))) + log(μ / (μ + |D|))    (4)


where tf(w, D) is the frequency of term w within a document D; P(w|C) = ctf_w / |C|, where ctf_w is the

number of times term w occurred in the entire collection, and μ is a smoothing parameter.
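A direct transcription of the score function (4), assuming tokenized input and simple dictionary-based collection statistics (the data structures and parameter defaults here are our choices, not the paper's):

```python
import math
from collections import Counter

def score(query_model, doc_tokens, coll_tf, coll_len, mu=1000.0):
    """Score(Q, D) of Eq. (4) with Dirichlet smoothing.
    query_model: P(w | theta_Q), e.g. maximum-likelihood term probabilities.
    doc_tokens:  the document as a list of terms.
    coll_tf:     ctf_w, the collection frequency of each term.
    coll_len:    |C|, the total number of terms in the collection.
    """
    tf = Counter(doc_tokens)
    s = 0.0
    for w, p_q in query_model.items():
        if tf[w] > 0:                       # the sum runs over w in Q ∩ D
            p_c = coll_tf[w] / coll_len     # P(w | C) = ctf_w / |C|
            s += p_q * math.log(1.0 + tf[w] / (mu * p_c))
    return s + math.log(mu / (mu + len(doc_tokens)))
```

For LDM, `doc_tokens` would be the concatenated virtual document; for SDM, each individual post.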

In the initial retrieval, a query language model is estimated by using the maximum

likelihood estimate. We then update the query language model based on feedback documents. In Sect. 4, we address novel feedback approaches to improve the performance of the

blog feed search.

3 Retrieval experiments

We investigated the influence of global and local evidence on the performance of blog feed

search while varying the weight parameter α. The experimental results show that our

models based on passage retrieval are simple, but effective for blog feed search.

3.1 Experimental setup

3.1.1 Data set

The TREC Blogs06 and Blogs08 collections (Macdonald 2006; Macdonald et al. 2010)

were used for our experiments. Each collection is a large sample of the blogosphere.

Table 1 shows the statistics of the collections. For the TREC 2009 Blog Distillation task,

we evaluated the topical relevance of a blog feed with only 39 topics3 that have at least one

relevant blog (Macdonald et al. 2010).

We only used permalinks (blog posts) for the experiments. We discarded the HTML

tags of the blog posts. The posts were also processed by stemming using the Porter

stemmer and eliminating stopwords using the INQUERY words stoplist (Allan et al.

2001).

3.1.2 Parameter setting and evaluation measures

We evaluated four basic models, GLD, GSD, LLD and LSD, using only the title field of

each topic as a query. We also evaluated four combined models, GLD+LLD, GLD+LSD,

GSD+LLD and GSD+LSD. Each model has a few parameters. The global models (GLD

and GSD) have one parameter, i.e. the parameter μ for Dirichlet smoothing. The local

models (LLD and LSD) have two parameters, i.e. the smoothing parameter μ and T, which

controls the number of posts used to estimate the local evidence of the feed. In addition to

these parameters, the combined models have a weight parameter α.

Table 1 Statistics for the test collections

Task                      Collection   # Docs       Topics
2007 Blog distillation    Blogs06      3,215,171    951–995
2008 Blog distillation                              1,051–1,100
2009 Blog distillation    Blogs08      28,488,766   1,101–1,150

3 These 39 topics were used to obtain the official evaluation results in the TREC 2009 Blog Distillation task.


We trained the parameters using the 07 topics for evaluating the performance of the 08

topics, and vice versa. Then, the parameters for the 09 topics were trained using the 07 and

08 topics. We selected the parameters resulting in the best MAP score.

Similar to the Blog Distillation task, we retrieved the 100 most relevant blog feeds in

response to each query. We used the mean average precision (MAP) and the precision at

rank 10 (Pr@10) as the evaluation measures.
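For reference, the two measures can be computed per query as follows. This is a standard formulation, not code from the paper; the cutoff of 100 retrieved feeds matches the setup above, and dividing average precision by the total number of relevant feeds is the usual TREC convention:

```python
def average_precision(ranked_feeds, relevant, cutoff=100):
    """Average precision for one query; MAP is the mean over all queries."""
    hits, ap = 0, 0.0
    for rank, feed in enumerate(ranked_feeds[:cutoff], start=1):
        if feed in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant) if relevant else 0.0

def precision_at(ranked_feeds, relevant, k=10):
    """Pr@k: the fraction of the top-k retrieved feeds that are relevant."""
    return sum(1 for feed in ranked_feeds[:k] if feed in relevant) / k
```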

3.2 Results and discussion

Table 2 shows the performance of each model. We performed the Wilcoxon signed rank

test to examine whether or not the improvement of the combined models over the baseline

(GLD) was statistically significant. The baseline outperformed other basic models for all

the topic sets. These results are similar to those from previous work (Macdonald et al.

2008; Ounis et al. 2009).

The best performance was obtained with GSD+LSD. Compared with the basic models,

all the combined models improved the performance significantly and consistently. This

confirms our hypothesis that the combined approach reduces the risk of separately using

each type of evidence, and leads to better performance. An interesting observation is that

the best performance of the basic models resulted from using Large Document Model4

(GLD), but the best performance of the combined models resulted from using Small

Document Model (GSD+LSD). This implies that the interaction of the global and local

evidence for blog feed search is better captured by SDM than LDM.

Figure 1a shows how varying the weight parameter α affects the performance of blog

feed search when using GSD+LSD, which shows the best performance. The weight

parameter α controls the relative importance of the global and local evidence. For all the

topic sets, we can obtain the best performance when the weight parameter is 0.7 or 0.8. We

can confirm that the two types of evidence should be considered together to improve the

performance of blog feed search.

Table 2 The performance of basic models and combined models

Models                     2007 (951–995)      2008 (1,051–1,100)   2009 (1,101–1,150)
                           MAP      Pr@10      MAP      Pr@10       MAP      Pr@10
Basic models
  GLD                      0.3637   0.4867     0.2729   0.4000      0.3126   0.4051
  GSD                      0.3196   0.5111     0.2419   0.3980      0.2733   0.3923
  LLD                      0.3151   0.4489     0.2333   0.3820      0.2613   0.3744
  LSD                      0.2782   0.4298     0.2152   0.3640      0.2509   0.3641
Combined models
  GLD+LLD                  0.3790†  0.4956     0.2744†  0.4180      0.3401‡  0.4282†
  GLD+LSD                  0.3885‡  0.5089†    0.2864‡  0.4360†     0.3552‡  0.4308†
  GSD+LLD                  0.3839†  0.5644‡    0.2879†  0.4400†     0.3557‡  0.4487†
  GSD+LSD                  0.3930‡  0.5667‡    0.3015‡  0.4480†     0.3635‡  0.4564‡

Statistical significance at the 0.05 and 0.01 levels for an improvement over the baseline (GLD) is indicated by † and ‡, respectively. The best performance is shown in bold.

4 LLD also outperforms LSD.


Figure 1b shows the influence of the parameter T on the performance of blog feed

search when using GSD+LSD. T controls how many blog posts are used for the local

evidence of a blog feed in response to a given query. We can obtain the best performance

when T is set to 2. This reveals that using a few highly relevant posts within a blog feed is

effective in evaluating the local evidence.

3.3 Comparison with other approaches

In the experiments, we showed that the use of local evidence is quite helpful in improving

the performance of blog feed search. Some previous researchers had already utilized

similar methods.

Macdonald and Ounis (2008), motivated by the Voting Model for the expert search task,

suggested expCombSUM. In expCombSUM, the highly relevant posts have a large effect

[Figure 1 omitted: two line plots of MAP on the 07, 08 and 09 topic sets.]

Fig. 1 MAP scores for varying the parameters α and T, under the GSD+LSD retrieval model. a MAP scores for varying the weight parameter α. b MAP scores for varying the value of T


on the relevance score of a blog feed. Due to its weighted approach using query-relevant

scores, expCombSUM plays a similar role to local evidence. Elsas et al. (2008) also

proposed the Entry Centrality Component as a part of the Small Document Model. The

component estimates a probability distribution to measure the similarity between a blog

post and its feed, and controls the weight of each post to evaluate the relevance between its

feed and a query.

However, these approaches are different from ours in some respects. They consider all

constituent posts within a blog feed. Although the posts are differently weighted, the

approaches can be regarded as using weighted global evidence. In contrast, our model

actively finds local evidence corresponding to the passage-level evidence for passage

retrieval. Furthermore, whereas their approaches can only be applied in SDM, our model

provides a more flexible and expanded framework, in the sense that two types of evidence

can be estimated regardless of representation methods (e.g. LDM or SDM).

One of the most similar approaches to our model is the PCS-GR model suggested by

Seo and Croft (2008). PCS-GR is an approach combining their Global Representation and

Pseudo-Cluster based Selection, corresponding to our GLD+LSD approach. Like our

results, they showed that the combining approach results in significant improvements in

their well-designed experiments. However, our motivation is different from theirs.

Whereas they introduced a combining approach to penalize topically-diverse feeds, we

proposed a combining approach to avoid ‘‘over-penalizing’’ topically-diverse feeds. The

local evidence of a blog feed plays a similar role to the passage-level evidence of passage

retrieval. In addition, our approach provides a general framework by integrating global and

local evidence, including PCS-GR as a special case (i.e. GLD+LSD).

4 Feedback model for blog feed retrieval

In the previous section, we showed how local evidence is explored for the initial retrieval

of blog feed search, and verified that local evidence is helpful in improving retrieval

performance. In this section, we further explore local evidence in terms of PRF, and

propose novel feedback approaches based on local evidence.

4.1 Limitations of naive feedback approaches

Before addressing our feedback methods, we present two naive approaches for PRF and

show why they are not desirable.

Because the retrieval unit of blog feed search is a blog feed, not a document, a blog feed

is also a natural feedback unit. In this regard, a naive feedback model is an All-Posts approach, which chooses all constituent posts in the top-ranked feeds as feedback documents. However, due to the topical diversity of the blog feed, even if a blog feed is

relevant, it does not mean that all of its constituent posts are relevant to a query. Furthermore, if some of the top-ranked feeds chosen for the feedback are irrelevant, almost all

of the posts within them could be irrelevant. Therefore, the All-Posts approach has a

potentially high risk of selecting many irrelevant posts, which decrease the precision of the

feedback information.

Another naive model is a Post-Level approach, which applies the traditional feedback

approach to blog feed search. The approach first performs a post-level retrieval and then

uses the top-ranked posts as feedback documents, without considering which feed they

come from. Unlike the All-Posts approach, the Post-Level one does not suffer from the low


precision of feedback information. However, the feedback information can be biased

toward a dominant aspect within the top-ranked posts. In other words, the Post-Level

approach may suffer from low ‘‘aspect recall’’ (Kurland et al. 2005), one of the important

properties which determines feedback quality.

With regard to query expansion for blog feed search, previous work has addressed some

properties of blog feed search queries: ‘‘. . . Given the nature of feed search, queries may describe more general and multifaceted topics, likely to stimulate discussion over time. If a query corresponds to a high-level description of some topic, there might be a wide vocabulary gap between the query and the more nuanced and faceted discussion in blog posts’’ (Elsas et al. 2008).

This property can make the aspect-recall problem of the Post-Level approach more

serious, because the vocabulary gap may make the top N ranked documents more likely to

be biased to a certain aspect of a given query. As a result, the feedback documents selected

using the Post-Level approach will cover only a few aspects of a query.

4.2 Feed based selection

A blog feed consists of posts with diverse topics depending on the bloggers’ interests or

inclinations. Thus, for a given query, the blog posts from different feeds may present

different perspectives or facets of a topic, although they address information about the

same topic. In other words, all (unknown) aspects of a query are scattered over all the

relevant feeds, and their relevant posts. Therefore, if we gather information from various

blog feeds, we can obtain more diverse information about a query so that it can cover the

various aspects of the query topic, and this leads to the improved performance of PRF.

However, this approach can have the same problem as the All-Posts approach. To solve

this problem, motivated by passage-based feedback, we propose Feed-Based Selection, which first selects as many feeds as possible for PRF, and then gathers only a few posts

within each of them, in order of the relevance between posts and the query. In other words,

Feed-Based Selection uses local evidence on the top-ranked blog feeds. This method

corresponds to passage-based feedback in ad-hoc retrieval where the scope of the feedback

context is narrowed into the passage, rather than using the entire document context.

The Feed-Based Selection has two important characteristics that allow it to handle the

problems of two naive approaches, All-Posts and Post-Level. First, it only uses the highly

relevant posts of a top-ranked feed (local evidence), not entire posts (global evidence). In

contrast with the All-Posts approach, it can alleviate the low precision problem caused by

the topical diversity of a blog feed. Second, it collects more diverse information from as

many feeds as possible. As a result, it allows a system to learn much more about the aspects

of a query than the Post-Level approach, and leads to an increase in aspect recall.

Similar to the initial retrieval model presented in Sect. 2, one of the most important

issues is how to define the local evidence of each blog feed. We propose two approaches

for defining local evidence: Fixed Feed Based Selection and Weighted Feed Based Selection.

4.3 Fixed feed based selection (FFBS)

FFBS uses the top K ranked feeds to gather feedback documents. It treats these K feeds as equally relevant to a given query, regardless of their individual relevance scores.


Let FBFFBS be a set of blog posts chosen by using FFBS. We can define FBFFBS as

follows:

FB_{FFBS} = \{\, d \mid d_{m,k} \in F_k,\ k = 1, \ldots, K,\ m = 1, \ldots, M \,\}    (5)

where d_{m,k} indicates the mth blog post, ranked in order of the score obtained by Eq. (4), within the kth ranked feed, and F_k represents the kth ranked feed. In this paper, FFBS-K-M denotes FB_{FFBS} with parameters K and M.
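Under this definition, FFBS reduces to a simple selection loop. The sketch below is illustrative only; the feed ranking and the per-post relevance scores (Eq. (4)) are assumed to be computed already, and the toy data is hypothetical.

```python
def ffbs_select(ranked_feeds, K, M):
    """Fixed Feed Based Selection (Eq. (5)): take the top-M posts, ranked
    by their post-query relevance score, from each of the top-K feeds."""
    feedback = []
    for feed in ranked_feeds[:K]:
        feedback.extend(feed["posts"][:M])  # posts assumed sorted by score
    return feedback

# FFBS-3-3 on a toy ranking of 7 feeds, each holding 5 pre-scored posts.
feeds = [{"feed": "F%d" % k, "posts": ["d%d,%d" % (m, k) for m in range(1, 6)]}
         for k in range(1, 8)]
print(ffbs_select(feeds, K=3, M=3))  # 3 posts from each of the top 3 feeds
```

The K cutoff provides aspect diversity (many feeds) while the M cutoff keeps only the local evidence of each feed, which is the two-sided trade-off discussed above.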

4.4 Weighted feed based selection (WFBS)

Similar to FFBS, WFBS also uses top K ranked feeds to construct feedback documents.

However, WFBS chooses a different number of blog posts from each blog feed according to their relevance scores. To achieve this, we assign different weights to the top K feeds in order of their relevancy.

Let N be the total number of feedback documents and FBWFBS be a set of blog posts

chosen by using WFBS. We can define FBWFBS as follows:

FB_{WFBS} = \{\, d \mid d_{m,k} \in F_k,\ k = 1, \ldots, K,\ m = 1, \ldots, M_k \,\}    (6)

where Mk indicates the number of blog posts selected from each feed, and we define Mk as

follows:

M_k = \frac{W_{F_k}}{\sum_j W_{F_j}} \times N    (7)

where W_{F_i} indicates the weight of the ith ranked blog feed. In practice, M_k must be an integer, so it is rounded to the nearest integer. WFBS-K-N denotes FB_{WFBS} with parameters K and N.

There may be several methods to assign the weight WFi, but this paper uses a simple

method defined as follows:

W_{F_i} = K - i + 1    (8)

where W_{F_i} decreases as the rank i increases; i.e., the blog feed with the highest score has a weight of K and the Kth-ranked feed has a weight of 1.
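Eqs. (7) and (8) amount to a small quota computation, sketched below as an illustration of the stated weighting rather than the authors' code. Note that rounding each M_k independently can make the quotas sum to slightly more or less than N.

```python
def wfbs_quotas(K, N):
    """Per-feed post quotas M_k for Weighted Feed Based Selection.
    Weights follow Eq. (8), W_{F_i} = K - i + 1, and Eq. (7) distributes
    the N feedback slots proportionally, rounding each M_k."""
    weights = [K - i + 1 for i in range(1, K + 1)]
    total = sum(weights)
    return [round(w * N / total) for w in weights]

# WFBS-3-10: weights 3, 2, 1 over N = 10 slots.
print(wfbs_quotas(K=3, N=10))  # -> [5, 3, 2]
```

So the top-ranked feed contributes the most posts, and lower-ranked feeds contribute progressively fewer, in contrast to the flat per-feed quota M of FFBS.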

5 Feedback experiments

In this section, we investigate the influence of several document selection approaches on

the performance of PRF.

5.1 Experiment setup

For feedback experiments, we used GSD+LSD as a baseline retrieval model, because it

showed the best performance among the initial feed retrieval models in Sect. 4. The

baseline model, GSD+LSD, is also used to perform PRF based on the expanded query

model.

To update the query language model, we used model-based feedback (Zhai and Lafferty

2001).


\theta_{Q'} = (1 - \alpha_F)\, \theta_Q + \alpha_F\, \theta_F    (9)

where \alpha_F controls the influence of the feedback model, and the feedback model \theta_F is estimated using a generative model of the feedback documents.
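Eq. (9) is a term-by-term linear interpolation of two unigram distributions. A minimal sketch with toy distributions follows; the example vocabularies and the value α_F = 0.5 are illustrative assumptions, not values fixed by the paper.

```python
def interpolate_query_model(theta_q, theta_f, alpha_f):
    """Eq. (9): theta_Q' = (1 - alpha_F) * theta_Q + alpha_F * theta_F,
    applied term by term over the union of both vocabularies."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha_f) * theta_q.get(w, 0.0) + alpha_f * theta_f.get(w, 0.0)
            for w in vocab}

theta_q = {"blog": 0.5, "search": 0.5}   # original query model (toy)
theta_f = {"search": 0.4, "feed": 0.6}   # feedback model from feedback docs (toy)
updated = interpolate_query_model(theta_q, theta_f, alpha_f=0.5)
print(sorted(updated.items()))
```

Since both inputs are probability distributions, the interpolated model remains a distribution (its probabilities sum to 1), and feedback terms such as "feed" enter the query model with weight α_F · θ_F(w).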

5.1.1 Document selection approaches

We built several sets of feedback documents. Each set includes 10 documents as feedback

documents. The document sets used for feedback are as follows:

– TOP-10: 10 documents are chosen according to the relevance of the document. This approach is the Post-Level document selection.

– Feed3All-Posts: All posts from the top 3 ranked feeds are chosen as feedback documents.

– Feed5All-Posts: All posts from the top 5 ranked feeds are chosen as feedback documents.

– FFBS-3-3: 10 documents are chosen using FFBS with K = 3 and M = 3.

– FFBS-5-2: 10 documents are chosen using FFBS with K = 5 and M = 2.

– WFBS-3-10: 10 documents are chosen using WFBS with K = 3 and N = 10.

– WFBS-5-10: 10 documents are chosen using WFBS with K = 5 and N = 10.

5.2 Experimental results

Table 3 shows the performance of each selection method. The experimental results show

that our feed-based selection approaches (FFBS and WFBS) significantly and consistently

outperform the baselines. FFBS and WFBS increase the MAP score by 2–3% over the

baseline models for all the topic sets. To check whether our methods show statistically

significant improvements over the baseline, we performed the Wilcoxon signed rank test at

0.05 significance level for each metric, and attached the symbol † to the scores for FFBS

and WFBS only when they showed significant results over the baseline. As shown in

Table 3, almost all runs of FFBS and WFBS showed statistically significant improvements

Table 3 The performance of the feedback models according to each feedback document selection approach

Model            2007 (951–995)        2008 (1051–1100)      2009 (1101–1150)
                 MAP       Pr@10       MAP       Pr@10       MAP       Pr@10
Baseline         0.3930    0.5667      0.3015    0.4480      0.3635    0.4564
TOP-10           0.3850    0.5133      0.3009    0.4380      0.3847    0.4821
Feed5All-Posts   0.4249    0.5600      0.2743    0.3860      0.3623    0.4615
Feed3All-Posts   0.4067    0.5578      0.3051    0.4180      0.3732    0.4641
FFBS-5-2         0.4269†   0.5882§     0.3103¶   0.4440¶     0.3863†¶  0.4744
FFBS-3-3         0.4243†¶  0.5778§     0.3253†§¶ 0.4640§¶    0.3899†   0.4718
WFBS-5-10        0.4299†   0.5844§     0.3160¶   0.4380¶     0.3852†   0.4718
WFBS-3-10        0.4215†   0.5689§     0.3238†§¶ 0.4640¶     0.3903†   0.4846

Statistical significance at the 0.05 level is indicated by †, § and ¶ for an improvement over the baseline, the Post-Level selection (TOP-10), and the All-Posts selection, respectively. The best performance is shown in bold.


over the baseline on MAP. This means that the feed-based approaches (FFBS and WFBS)

are effective to improve the performance of PRF.

Furthermore, FFBS and WFBS show better performance than the two naive approaches:

All-Posts (Feed3All-Posts and Feed5All-Posts) and Post-Level (TOP-10). To see whether

the improvement is statistically significant, we again performed the Wilcoxon signed rank

test, and attached § and ¶ only when they showed significant results over All-Posts and

Post-Level, respectively.5 We found that the majority of runs of FFBS and WFBS show

statistically significant improvements over both of the naive approaches.

The All-Posts and Post-Level methods did not show reliable performance. They did not

show any improvement over the baseline for most topic sets. First, the failure of the All-

Posts approach provides good evidence that it suffers from low precision of feedback

information. In particular, for the 08 topics, the top K feeds used for PRF are likely to

contain many irrelevant feeds, because the initial performance for the 08 topics is relatively

low. Thus, as K increases for the 08 topics, the feedback documents constructed using the

All-Posts include too many irrelevant documents to improve the performance of PRF.

Actually, when using K = 5, the performance deteriorated more seriously than when

K = 3. This result explains why we need to use local evidence for PRF.

Second, for the 07 and 08 topics, the failure of the Post-Level approach supports our

proposal for the feed-level selection. Post-Level suffers from low aspect recall so that it

can only cover a few relevant aspects of a query. In contrast, our approaches enable the

system to increase the aspect recall, because the feedback documents are chosen from

various feeds which reflect the diverse aspects relevant to a query. Finally, this leads to the

improved performance of the feedback model.

We compare our approaches with the top 3 performing runs6 of the TREC 07, 08 and 09

Distillation task in Table 4. The results are obtained from (Macdonald et al. 2008; Ounis

et al. 2009; Macdonald et al. 2010). Our feedback approaches significantly and consistently

improve the results of the best runs for all tasks. In particular, for the 07 task, WFBS-5-10

achieved about a 6% increase of the MAP score over the TREC ’07 best run. FFBS-3-3

accomplished more than a 2% increase of the MAP score over the TREC ’08 best run.

WFBS-3-10 also increased the MAP score by 12% over the TREC ’09 best run.

Note that in Table 4, we only quote the official results of the top performing runs from

TREC, and we did not implement them. Furthermore, we did not apply the significance test

such as the Wilcoxon signed rank test between our methods and the TREC runs. Therefore,

it is unclear what caused the difference in performances between our methods and the runs.

The performance differences might be caused by several factors such as the method for

preprocessing documents, the way for selecting parameters or the effectiveness of each

algorithm for blog feed search. It will be valuable to implement the top performing

algorithms and directly compare the results. We leave this issue for future work.

5.3 Influence of K and M on performance

Figure 2 shows the performance of FFBS and All-Posts according to varying K and

M parameters.

5 FFBS or WFBS were compared with All-Posts using the same K; that is, FFBS-5-2 and WFBS-5-10 were compared with Feed5All-Posts.

6 The runs are the automatic title-only runs, sorted by MAP.


The FFBS methods show more reliable and better curves than All-Posts for all the topic

sets. In particular, for the 08 and 09 topics, the performance gap between FFBS methods

and All-Posts was very big for large values of K. From these results, we can again verify

the effectiveness of local evidence to improve the performance of PRF.

The best parameter range of K for each method was different for each topic set. For the 07 topics, the best MAPs were found at relatively large K values between 5 and 7, while the MAP scores at small K (≤ 2) were not good. However, for the 08 and 09 topics, the trend for K is reversed: the best MAP scores are obtained at relatively small K values between 1 and 3, while the MAP scores at large K (≥ 5) decreased seriously.

Note that the performance curves are more robust on the 07 topics than on the 08 and 09 topics for all methods including All-Posts. In other words, on the 07 topics, even at large K values, the MAP for each method did not seriously decrease, while on the others, when K ≥ 5, the MAP of all methods decreased sharply.

One possible explanation for the differing trend and robustness between 07 topics and

08, 09 topics can be obtained by comparing the performance of the initial retrieval for each

topic set. From Table 3, we already saw that Pr@10 on the 07 topics is much better than

those on the 08 and 09 topics. That is, the number of relevant feeds among the top-ranked ones will be larger for the 07 topics than for the 08 and 09 topics. This may mean that the

deterioration of the precision from using more feeds is not severe, resulting in reliable

MAP scores. In contrast, for the 08 and 09 topics, when using a relatively large K value

(about 5), the top K ranked feeds are likely to be irrelevant due to low Pr@10, so that the

precision seriously decreases, causing a low MAP score.

When using M = 2, we obtained the most reliable performance, for all the topic sets,

among the three values.7 The results for M = 1 and M = 5 are on a case-by-case basis

Table 4 The performance of the top 3 performing runs for the TREC 07, 08 and 09 Distillation tasks

Models                     2007 (951–995)      2008 (1051–1100)    2009 (1101–1150)
                           MAP      Pr@10      MAP      Pr@10      MAP      Pr@10
TREC 2007  CMUfeedW        0.3695   0.5356     –        –          –        –
           ugoBDFeMNZP     0.2923   0.5311     –        –          –        –
           UMaTiPCSwGR     0.2529   0.5111     –        –          –        –
TREC 2008  cmuLDwikiSP     –        –          0.3056   0.4340     –        –
           KLEDistLMT      –        –          0.3015   0.4480     –        –
           uams08b1        –        –          0.2638   0.4200     –        –
TREC 2009  prisb           –        –          –        –          0.2756   0.2767
           ICTNETBDRUN2    –        –          –        –          0.2399   0.2384
           Combined        –        –          –        –          0.2326   0.2409
FFBS-5-2                   0.4269   0.5882     0.3103   0.4440     0.3863   0.4744
FFBS-3-3                   0.4243   0.5778     0.3253   0.4640     0.3899   0.4718
WFBS-5-10                  0.4299   0.5844     0.3160   0.4380     0.3852   0.4718
WFBS-3-10                  0.4215   0.5689     0.3238   0.4640     0.3903   0.4846

For comparison purposes, the performance of our approaches is included. The best performance is shown in bold.

7 Even though we did not plot the case of M = 3, the curve is similar to that of M = 2.


according to each topic set. For example, consider when M = 5. For the 07 topics, its MAP

is the best, compared with the MAPs when M = 1 or M = 2. On the other hand, for the 08

topics, its MAP becomes worse than when M = 1 or M = 2.

[Figure omitted: three line plots of MAP against K (the number of feeds) for M = 1, M = 2, M = 5, All-Posts, and the baseline.]

Fig. 2 The MAP scores of FFBS according to varying K and M parameters, using GSD+LSD as the retrieval model, compared to the baseline and All-Posts. a The MAP scores for the 07 topics. b The MAP scores for the 08 topics. c The MAP scores for the 09 topics.


5.4 Comparison of cluster centroid algorithm

As discussed in Sect. 4, our feed based selection is derived by considering the aspect recall.

There is existing work for the ad-hoc retrieval task related to increasing the aspect recall.

This work is called the Cluster Centroid approach (Shen and Zhai 2005). Cluster Centroid

clusters the feedback documents to maximize the diversity of the feedback information (i.e.

aspect recall). Cluster Centroid consists of the following 3 steps: 1) Group the top

N documents into K clusters, 2) Select a centroid document from each resulting cluster, and

3) Use all such K centroid documents as feedback documents. Since Cluster Centroid does

not use any information about the relationship between the posts and their feed, it can be

viewed as an automatic method to construct feeds by regarding a cluster as a pseudo feed.

We re-implemented the Cluster Centroid method in the same setting used in their experiment, using the K-Medoid clustering algorithm (Kaufman and Rousseeuw 1990) and J-Divergence (Lin 1991) as the distance function between clusters. For a fair comparison with the previous section, we fix the number of clusters K to 10.
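As an illustration of the distance function, the sketch below computes a symmetrized KL ("J") divergence between two term distributions. Whether the exact variant from Lin (1991) matches this symmetrized form is an assumption here, and the toy distributions are assumed smoothed so that every compared term has nonzero probability.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as dicts; assumes q
    assigns nonzero probability to every term p does (e.g. after smoothing)."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def j_divergence(p, q):
    """Symmetric J-divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

p = {"blog": 0.7, "feed": 0.3}  # toy smoothed term distributions
q = {"blog": 0.3, "feed": 0.7}
print(round(j_divergence(p, q), 4))
```

Symmetry matters for clustering: a k-medoid algorithm compares every candidate pair of documents, so an asymmetric KL divergence would make the result depend on comparison order.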

Table 5 shows the results of Cluster Centroid, according to the number of top posts

N for clustering, where the feedback model is GSD+LSD. Note that when N = 10, the

Cluster Centroid corresponds to TOP-N, since each post creates a separate cluster. The

Cluster Centroid method outperforms the baseline for some N values by about 0.2%, 0.9%

and 2.6% for the 07, 08 and 09 topics, respectively. This result of Cluster Centroid is

important, because it confirms the view we previously discussed on the aspect recall, i.e.,

using diverse information is helpful to improve the performance of PRF for blog feed

search.

Our approaches are still notable, due to the improvements over Cluster Centroid. In

particular, for the 07 and 08 topics, our best approaches show about 3.4% and 1.5%

increases of MAP over Cluster Centroid, respectively. From these results, we can verify

that the feed-level information used in our methods is important for improving the retrieval

performance, because it captures a realistic structure between posts and feeds that Cluster

Centroid cannot automatically recognize.

Table 5 The performance of K = 10 Cluster Centroid with N under the GSD+LSD method

N          2007 (951–995)      2008 (1051–1100)    2009 (1101–1150)
           MAP      Pr@10      MAP      Pr@10      MAP      Pr@10
10         0.3850   0.5133     0.3009   0.4380     0.3847   0.4821
20         0.3955   0.5422     0.3086   0.4500     0.3850   0.4795
40         0.3911   0.5356     0.3069   0.4520     0.3901   0.4897
60         0.3802   0.5133     0.3066   0.4520     0.3780   0.4846
80         0.3949   0.5356     0.3100   0.4420     0.3832   0.4795
100        0.3926   0.5333     0.3071   0.4500     0.3827   0.4692
Baseline   0.3930   0.5667     0.3015   0.4480     0.3635   0.4564
Best       0.4299   0.5844     0.3253   0.4640     0.3903   0.4846
(Approach) (WFBS-5-10)         (FFBS-3-3)          (WFBS-3-10)

The best performance is shown in bold. For comparison purposes, the performances of the baseline and our best approaches for each task are included.


6 Related work

Since the TREC blog distillation task was introduced, many approaches have been sug-

gested for blog feed search. Most approaches are motivated by other well-studied retrieval

tasks such as the expert search task (Soboroff and de Vries 2007) and the resource selection

task in distributed information retrieval.

Elsas et al. (2008) and Arguello et al. (2008), (2009) treated blog feed search as a

resource ranking problem by using the ReDDE federated search algorithm (Si and Callan

2003). They proposed two blog representations based on granularity, and also suggested a

query expansion approach using Wikipedia for blog feed search. For PRF, Elsas et al.

(2008) proposed knowledge-intensive feedback, using Wikipedia as external knowledge.

Despite its notable results, their approach, unlike ours, is not a closed solution that uses only the given test collection.

Seo and Croft (2008), (2009) dealt with blog feed search by using cluster-based retrieval

for distributed information retrieval. They also divided blog sites into three types based on

topical diversity, and considered several methods for penalizing blog sites with diverse

topics.

Macdonald and Ounis (2008) and He et al. (2009) regarded blog feed search as an

expert finding task. They used the adaptable Voting Model for the expert search task

(Macdonald and Ounis 2006), and proposed several techniques that aim to boost blog feeds

where a blogger has shown a central or recurring interest in a topic area. Carman et al.

(2009) also used a similar approach, using the Voting Model. In contrast to Macdonald and

Ounis’s work, they used non-content features for each blog in addition to existing content-

level features, and applied the Learning-to-Rank (Yue et al. 2007) approach to combine the

features and obtain a single retrieval function.

Nunes et al. (2009) suggested several strategies using temporal features for blog feed

search. They examined whether or not the maximum temporal span covered by the relevant

posts is a positive criterion in the feed search, and also investigated how the dispersion of

relevant blog posts in a blog feed would impact this task. Wang et al. (2009) proposed a

reduced document model by indexing text between certain tags, and used the PageRank of

a blog feed with its query likelihood score. Balog et al. (2008) and Weerkamp et al. (2008)

proposed two language models based on expert finding techniques, and some blog-specific

features such as document structure, social structure, and temporal structure.

7 Conclusion and future work

In this paper, we have addressed several approaches for initial retrieval and pseudo-

relevance feedback on blog feed search. Our key concern was the topical diversity of a

blog feed. Motivated by passage retrieval techniques, we presented global and local evi-

dence of blog feeds, corresponding to the document-level and passage-level evidence of

passage retrieval. We estimated global evidence using all constituent posts within a blog

feed, and local evidence using highly relevant posts within a blog feed in response to a

given query. We proposed a series of methods for evaluating the relevance between a blog

feed and a given query, using the two types of evidence.

In addition, we investigated the pseudo-relevance feedback method for blog feed search.

Our feedback approaches, motivated by passage-based feedback, gathered feedback

information using the local evidence of top K ranked feeds. The proposed methods have

two advantages. First, the usage of various feeds enables the feedback model to locate the


feeds that discuss different aspects of the topic of a given query. In other words, it increases

the aspect recall of feedback information. Second, the usage of the local evidence provides

the feedback model with information relevant to a query. That is, it increases the precision

of feedback information. Experimental results on TREC distillation for the 07, 08 and 09

topics showed that the proposed feedback approach significantly and consistently out-

performed the baseline.

Several directions remain for future work. First, for the initial retrieval, we used a simple

uniform distribution as P(D|L) in (3). It would be interesting to investigate other methods

to estimate P(D|L) such as Entry Centrality (Elsas et al. 2008). Furthermore, we would like

to investigate other techniques for blog feed search such as link analysis and temporal

profiling. These techniques have the potential to improve the performance of blog feed

search. Second, for pseudo-relevance feedback, we will explore a probabilistic approach

for selecting the relevant local posts, instead of using the current threshold-driven method.

References

Allan, J. (1995). Relevance feedback with too much data. In Proceedings of the 18th annual international ACM SIGIR conference on research and development in information retrieval (pp. 337–343).

Allan, J., Connell, M. E., Croft, W. B., Feng, F. F., Fisher, D., & Li, X. (2001). INQUERY and TREC-9. In Proceedings of the ninth text REtrieval conference (pp. 551–562).

Arguello, J., Elsas, J. L., Callan, J., & Carbonell, J. G. (2008). Document representation and query expansion models for blog recommendation. In Proceedings of the 2nd international conference on weblogs and social media.

Arguello, J., Elsas, J. L., Yoo, C., Callan, J., & Carbonell, J. G. (2009). Document and query expansion models for blog distillation. In Proceedings of the seventeenth text REtrieval conference.

Balog, K., de Rijke, M., & Weerkamp, W. (2008). Bloggers as experts: Feed distillation using expert retrieval models. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 753–754).

Bendersky, M., & Kurland, O. (2010). Utilizing passage-based language models for ad hoc document retrieval. Information Retrieval, 13, 157–187.

Callan, J. P. (1994). Passage-level evidence in document retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval (pp. 302–310).

Carman, M., Keikha, M., Gerani, S., Gwadera, R., Taibi, D., & Crestani, F. (2009). University of Lugano at TREC 2008 blog track. In Proceedings of the seventeenth text REtrieval conference.

Elsas, J. L., Arguello, J., Callan, J., & Carbonell, J. G. (2008). Retrieval and feedback models for blog feed search. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 347–354).

He, B., Macdonald, C., Ounis, I., Peng, J., & Santos, R. L. (2009). University of Glasgow at TREC 2008: Experiments in blog, enterprise, and relevance feedback tracks with Terrier. In Proceedings of the seventeenth text REtrieval conference.

Kaszkiel, M., & Zobel, J. (1997). Passage retrieval revisited. In Proceedings of the 20th annual international ACM SIGIR conference on research and development in information retrieval (pp. 178–185).

Kaszkiel, M., & Zobel, J. (2001). Effective ranking with arbitrary passages. Journal of the American Society for Information Science and Technology, 52(4), 344–364.

Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.

Kolari, P., Java, A., & Finin, T. (2006). Characterizing the splogosphere. In Proceedings of the 3rd annual workshop on weblogging ecosystem: Aggregation, analysis and dynamics, 15th World Wide Web conference.

Kurland, O., Lee, L., & Domshlak, C. (2005). Better than the real thing?: Iterative pseudo-query processing using cluster-based language models. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 19–26).


Lafferty, J., & Zhai, C. (2001). Document language models, query models, and risk minimization for information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 111–119).

Lavrenko, V., & Croft, W. B. (2001). Relevance based language models. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 120–127).

Lee, Y., Na, S. H., & Lee, J. H. (2009). An improved feedback approach using relevant local posts for blog feed retrieval. In Proceedings of the 18th ACM conference on information and knowledge management (pp. 1971–1974).

Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37, 145–151.

Macdonald, C., & Ounis, I. (2006a). Voting for candidates: Adapting data fusion techniques for an expert search task. In Proceedings of the 15th ACM international conference on information and knowledge management (pp. 387–396).

Macdonald, C., & Ounis, I. (2006b). The TREC Blogs06 collection: Creating and analysing a blog test collection. Technical report, Department of Computing Science, University of Glasgow.

Macdonald, C., & Ounis, I. (2008). Key blog distillation: Ranking aggregates. In Proceedings of the 17th ACM conference on information and knowledge management (pp. 1043–1052).

Macdonald, C., Ounis, I., & Soboroff, I. (2008). Overview of the TREC-2007 blog track. In Proceedings of the sixteenth text REtrieval conference.

Macdonald, C., Ounis, I., & Soboroff, I. (2010). Overview of the TREC-2009 blog track. In Proceedings of the eighteenth text REtrieval conference.

Na, S. H., Kang, I. S., Lee, Y., & Lee, J. H. (2008a). Applying complete-arbitrary passage for pseudo-relevance feedback in language modeling approach. In Proceedings of the 4th Asia information retrieval conference on information retrieval technology (pp. 626–631).

Na, S. H., Kang, I. S., Lee, Y., & Lee, J. H. (2008b). Completely-arbitrary passage retrieval in language modeling approach. In Proceedings of the 4th Asia information retrieval conference on information retrieval technology (pp. 22–33).

Nunes, S., Ribeiro, C., & David, G. (2009). FEUP at TREC 2008 blog track: Using temporal evidence for ranking and feed distillation. In Proceedings of the seventeenth text REtrieval conference.

Ounis, I., Macdonald, C., & Soboroff, I. (2009). Overview of the TREC-2008 blog track. In Proceedings of the seventeenth text REtrieval conference.

Rocchio, J. J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing, Prentice-Hall series in automatic computation (Chap. 14, pp. 313–323). Englewood Cliffs, NJ: Prentice-Hall.

Salton, G., Allan, J., & Buckley, C. (1993). Approaches to passage retrieval in full text information systems. In Proceedings of the 16th annual international ACM SIGIR conference on research and development in information retrieval (pp. 49–58).

Seo, J., & Croft, W. B. (2008). Blog site search using resource selection. In Proceedings of the 17th ACM conference on information and knowledge management (pp. 1053–1062).

Seo, J., & Croft, W. B. (2009). UMass at TREC 2008 blog distillation task. In Proceedings of the seventeenth text REtrieval conference.

Shen, X., & Zhai, C. (2005). Active feedback in ad hoc information retrieval. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 59–66).

Si, L., & Callan, J. (2003). Relevant document distribution estimation method for resource selection. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298–305).

Soboroff, I., & de Vries, A. P. (2007). Overview of the TREC 2006 enterprise track. In Proceedings of the fifteenth text REtrieval conference.

Wang, J., Sun, Y., Mukhtar, O., & Srihari, R. (2009). TREC 2008 at the University at Buffalo: Legal and blog track. In Proceedings of the seventeenth text REtrieval conference.

Weerkamp, W., Balog, K., & de Rijke, M. (2008). Finding key bloggers, one post at a time. In Proceedings of ECAI 2008: 18th European conference on artificial intelligence (pp. 318–322).

Yu, S., Cai, D., Wen, J. R., & Ma, W. Y. (2003). Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the 12th international conference on World Wide Web (pp. 11–18).

Yue, Y., Finley, T., Radlinski, F., & Joachims, T. (2007). A support vector method for optimizing average precision. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 271–278).


Zhai, C., & Lafferty, J. (2001). Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the tenth international conference on information and knowledge management (pp. 403–410).

Zhai, C., & Lafferty, J. (2004). A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, 22(2), 179–214.
