
Integrating term dependencies according to their utility

Jian-Yun Nie
University of Montreal

Need for term dependency

• The meaning of a term often depends on other terms used in the same context
  – Term dependency
  – E.g. computer architecture, hot dog, …

• A unigram model is unable to capture term dependency
  – hot + dog ≠ "hot dog" (see the sketch below)

• Dependency: a group of terms (here, a pair of terms)
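A minimal sketch of the problem, assuming a plain bag-of-words count stands in for the unigram model; the documents and scoring function are illustrative only:

```python
from collections import Counter

def unigram_score(query, doc):
    """Bag-of-words: every query term is matched independently."""
    tf = Counter(doc.split())
    return sum(tf[t] for t in query.split())

d1 = "she ate a hot dog at the game"        # about the food "hot dog"
d2 = "the dog lay panting in the hot sun"   # "hot" and "dog" unrelated

# Both documents get the same score: the unigram model cannot see
# that only d1 contains the dependency hot_dog.
print(unigram_score("hot dog", d1))  # 2
print(unigram_score("hot dog", d2))  # 2
```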

Previous approaches

• Phrase + unigram
  – Two representations: a phrase model and a unigram model
  – Interpolation (each model with a fixed weight)
  – Assumption: phrases represent useful dependencies between terms for IR
  – E.g. Q = the price of hot dog
    • P_unigram: price, hot, dog
    • P_phrase: price, hot_dog
    • P(price hot dog|D) = λ · P_phrase(price hot dog|D) + (1 − λ) · P_unigram(price hot dog|D)
    • or: score = λ · score_phrase + (1 − λ) · score_unigram
  – Effect: documents with the phrase "hot dog" have a higher score (see the sketch below)
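A minimal sketch of this fixed-weight interpolation, with raw counts standing in for the two smoothed models; the value of λ (lam) is an illustrative assumption, not one from the slides:

```python
def interpolated_score(query_terms, phrase, doc, lam=0.5):
    """score = lam * score_phrase + (1 - lam) * score_unigram,
    with the same fixed lam for every query."""
    tokens = doc.split()
    uni = sum(tokens.count(t) for t in query_terms)       # unigram evidence
    bi = sum(1 for i in range(len(tokens) - 1)            # phrase evidence
             if (tokens[i], tokens[i + 1]) == phrase)
    return lam * bi + (1 - lam) * uni

q = ["price", "hot", "dog"]
# The document containing the phrase "hot dog" scores higher.
print(interpolated_score(q, ("hot", "dog"), "the price of a hot dog"))  # 2.0
print(interpolated_score(q, ("hot", "dog"), "dog price on a hot day"))  # 1.5
```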

Dependency model

• Dependency language model (Gao et al. 2005)
  – Determine the strongest dependencies among query terms (a parsing process)
    • price hot dog
  – The determined dependencies define an additional requirement for documents:
    • Documents have to contain the unigrams
    • Documents have to contain the required dependencies
    • The two criteria are linearly interpolated

Markov Random Field (MRF) (Metzler & Croft)

• Two graph structures: sequential dependence and full dependence
• Potential functions defined over cliques of the graph
• Sequential model: interpolation of the unigram model, ordered bigrams, and unordered bigrams (sketch below)
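A sketch of the sequential model's three-way interpolation; the λ values and the use of raw counts in place of smoothed log-probabilities are simplifying assumptions:

```python
def sdm_score(query, doc, lt=0.8, lo=0.1, lu=0.1, window=8):
    """Sequential dependence: unigrams (f_T), ordered bigrams of
    adjacent query terms (f_O), and unordered co-occurrence of those
    pairs within a window (f_U), linearly interpolated."""
    q, d = query.split(), doc.split()
    f_t = sum(d.count(t) for t in q)
    f_o = sum(1 for a, b in zip(q, q[1:])
              for i in range(len(d) - 1)
              if (d[i], d[i + 1]) == (a, b))
    f_u = sum(1 for a, b in zip(q, q[1:])
              for i in range(len(d)) for j in range(len(d))
              if d[i] == a and d[j] == b and 0 < abs(i - j) <= window)
    return lt * f_t + lo * f_o + lu * f_u
```

Note that lt, lo and lu are fixed across all queries, which is the limitation discussed on the next slides.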

Limitations

• The importance of a (type of) dependency is fixed in the combined model in the same way for all the queries
  – A fixed weight is assigned to each component model
    • price-dog is as important as hot-dog (dependency model)
    • price-hot is as important as hot-dog (MRF ordered model)
• Are they equally strong dependencies?
  – hot-dog > price-dog, price-hot
• Intuition: a stronger dependency forms a stronger constraint

Limitations

• Can a phrase model solve this problem?
  – Some phrases form a semantically stronger dependency than some others
    • hot-dog > cute-dog
    • Sony digital-camera > Sony-digital camera, Sony-camera digital
  – Is a semantically stronger dependency more useful for IR?
    • Not necessarily
    • digital-camera could be less useful than Sony-camera
    • The importance of a dependency in IR depends on its usefulness to retrieve better documents

Limitations

• MRF sequential model
  – Only considers consecutive pairs of terms
  – No dependency between distant terms
    • Sony digital camera: Sony-digital, digital-camera
• Full model
  – Can cover long-distance dependencies
  – But large increase in complexity

Proximity: more flexible dependency

• Tao & Zhai, 2007
• Zhao & Yun, 2009

• Prox_B(w_i): proximity centrality (sketch below)
  – Min/average/sum distance to the other query terms

• However, the weight given to the proximity component is still fixed.
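A sketch of proximity centrality for one query term; the choice of aggregation (min, average or sum) is exactly the choice listed above:

```python
def proximity_centrality(term, query_terms, doc, mode="min"):
    """Prox(w_i): aggregate the distance from `term` to the nearest
    occurrence of each other query term in the document."""
    tokens = doc.split()
    pos = {t: [i for i, w in enumerate(tokens) if w == t]
           for t in query_terms}
    if not pos[term]:
        return float("inf")                 # term absent from document
    dists = [min(abs(i - j) for i in pos[term] for j in pos[other])
             for other in query_terms
             if other != term and pos[other]]
    if not dists:
        return float("inf")
    agg = {"min": min, "sum": sum, "avg": lambda xs: sum(xs) / len(xs)}
    return agg[mode](dists)

q = ["sony", "digital", "camera"]
print(proximity_centrality("sony", q, "sony makes a digital camera"))  # 3
```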

A recent extension to the MRF model

• Bendersky, Metzler, Croft, 2010
  – Weighted dependencies
  – w_j^uni and w_j^bi: the importance of the different features
  – g_j^uni and g_j^bi: the weight of each unigram and bigram according to its utility
  – However:
    • the ordered and unordered features (f_O and f_U) are mixed up
    • only dependencies between pairs of adjacent terms are considered

Go further

• Use a discriminative model instead of MRF
  – Can consider dependencies between more distant terms, without the exponential growth in complexity
• We only consider pair-wise dependencies
  – Assumption: pair-wise dependencies capture the most important part of term dependencies
• Consider several types of dependencies between query terms (sketch below)
  – Ordered bigram
  – Unordered pair of terms within some distance (2, 4, 8, 16)
    • Dependencies at different distances have different strengths
    • Co-occurrence dependency ~ variable proximity
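A sketch of the dependency types enumerated per query under these assumptions; WINDOWS mirrors the distances listed above:

```python
from itertools import combinations

WINDOWS = (2, 4, 8, 16)

def dependencies(query_terms):
    """Pair-wise dependencies only: ordered bigrams of adjacent
    query terms, plus every unordered pair of query terms to be
    matched within each distance in WINDOWS."""
    ordered = list(zip(query_terms, query_terms[1:]))
    unordered = [(pair, w) for pair in combinations(query_terms, 2)
                 for w in WINDOWS]
    return ordered, unordered

bi, co = dependencies(["sony", "digital", "camera"])
# bi: [('sony', 'digital'), ('digital', 'camera')]
# co includes distant pairs such as (('sony', 'camera'), 8),
# which the MRF sequential model cannot represent.
```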

General discriminative model

• Breaking down each component model to consider the strength/usefulness of each term dependency (sketch below)

• U, B, C_w: the importance of a unigram, a bigram, and a co-occurrence pair within distance w in documents
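A sketch of the resulting scoring function, assuming hypothetical per-dependency weight tables learnt elsewhere; raw counts again stand in for the component models:

```python
def count_ordered(d, pair):
    """Occurrences of an ordered, adjacent bigram."""
    return sum(1 for i in range(len(d) - 1) if (d[i], d[i + 1]) == pair)

def count_within(d, pair, w):
    """Unordered co-occurrences of a pair within distance w."""
    a, b = pair
    return sum(1 for i in range(len(d)) for j in range(len(d))
               if d[i] == a and d[j] == b and 0 < abs(i - j) <= w)

def score(doc_tokens, unigrams, bigram_wt, co_wt, U=1.0):
    """Unlike the fixed interpolation weights of earlier models,
    B (bigram_wt) and C_w (co_wt) vary per dependency: bigram_wt
    maps a bigram to its weight, co_wt maps (pair, w) to its weight."""
    s = U * sum(doc_tokens.count(t) for t in unigrams)
    s += sum(B * count_ordered(doc_tokens, p) for p, B in bigram_wt.items())
    s += sum(C * count_within(doc_tokens, p, w)
             for (p, w), C in co_wt.items())
    return s
```

With per-dependency weights, hot-dog can count for more than price-dog within the same query.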

An example

• corporate pension plans funds

[Figure: dependency graph over the query terms, with bi / co2 / co4 / co8 weights on each term pair (co16 omitted)]

Further development

• Set U to 1 and vary the others
• Features:

How to determine the usefulness B and C_w of a bigram and a co-occurrence pair?

– Using a learning method based on some features
– Cross-validation

Learning method

• Parameters
• Goal:
  – T_i: training data
  – R_i: document ranking using the parameters
  – E: measure of effectiveness (MAP)
• Training data:
  – {x_i, z_i}: a bigram or a pair of terms within distance w, and its best value for the query
  – The best value is found by coordinate ascent search
• Epsilon-SVM with a radial basis (RBF) kernel function (sketch below)
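A sketch of the learning step using scikit-learn's epsilon-SVR with an RBF kernel; the feature names and hyper-parameter values below are placeholder assumptions, not the actual feature set or tuned values:

```python
import numpy as np
from sklearn.svm import SVR

# x_i: feature vector of a bigram or within-w pair (illustrative
# placeholder features only).
X_train = np.array([
    # [idf_product, corpus_cooccurrence, window]
    [4.2, 1.3, 2],
    [1.1, 0.2, 16],
    [3.8, 2.1, 2],
])
# z_i: the best weight for each dependency on its training query,
# found beforehand by coordinate ascent on MAP.
z_train = np.array([0.59, 0.001, 0.20])

model = SVR(kernel="rbf", epsilon=0.01, C=1.0)
model.fit(X_train, z_train)

# Predict the usefulness (B or C_w) of an unseen dependency.
print(model.predict(np.array([[3.5, 1.8, 4]])))
```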

Features


Test collections


Results with other models


With our model


Analysis

• Some intuitively strong dependencies should not be considered as important in the retrieval process

• Disk1-query 088: "crude oil price trends"
  – Ideal weights (bi, co2..16) = 0, AP = 0.103
  – Learnt bi = 0.2, co2..16 = 0, AP = 0.060
• Disk1-query 003: "joint ventures"
  – Ideal weights (bi, co2..16) = 0, AP = 0.086
  – Learnt bi = 0.07, co2..16 = 0, AP = 0.084
• Disk1-query 094: "computer aided crime"
  – Ideal weights (bi, co2..16) = 0, AP = 0.223
  – Learnt bi = 0.3, co2..16 = 0, AP = 0.158

Analysis

• Some intuitively weakly connected words should be considered as strong dependencies:

• Disk1-query 184: "corporate pension plans funds"
  – Ideal wt. bi = 0.5, co2 = 0.7, co4 = 0.2, AP = 0.253
  – Learnt wt. bi = 0.2, co8 = 0.01, co16 = 0.001, AP = 0.201 (Uni = 0.131)
• Disk1-query 115: "impact 1986 immigration law"
  – Ideal wt. co2 = 0.1, co4 = 0.35, co8 = 0.05, AP = 0.511
  – Learnt wt. bi = 0, co16 = 0.01, AP = 0.492 (Uni = 0.437)

Disk1-query 115: "impact 1986 immigration law"

Ideal AP = 0.511, uni = 0.437, learnt = 0.492

[Figure: dependency graph over impact–1986–immigration–law, with bi / co2 / co4 / co8 weights on each term pair (co16 omitted)]

(Learnt)   imp-1986   imp-imm   imp-law   1986-imm   1986-law   imm-law
wt.bi      -          -         .14       -          -          -
wt.co2     -          -         -         -          -          .05
wt.co8     -          .01       .01       .01        -          .01
wt.co16    -          .01       .01       .01        .01        .02

Disk1-query 184: "corporate pension plans funds"

• AP: ideal = 0.253, uni = 0.132, learnt = 0.201

[Figure: dependency graph over corporate–pension–plans–funds, with bi / co2 / co4 / co8 weights on each term pair (co16 omitted)]

(Learnt)   corp-pen   corp-plan   corp-fund   pen-plan   pen-fund   plan-fund
wt.bi      -          -           -           .20        .18        -
wt.co2     -          .05         -           .59        .23        -
wt.co8     -          .01         -           .02        .02        .01
wt.co16    -          .02         .02         .04        -          .001

Typical case 1: weak bigram dependency, weak co-occurrence dependency


Typical case 2: strong dependencies


Typical case 3: weak bigram dependency, strong co-occurrence dependency

Conclusions

• Different types of dependency between query terms should be considered
• They have variable importance/usefulness for IR, and should be integrated into the IR model with different weights
  – Usefulness does not necessarily correlate with semantic dependency strength
• The new model is better than the existing models in most cases (statistically significant in some cases)
