Iterative Translation Disambiguation for Cross-Language Information Retrieval
DESCRIPTION
Iterative Translation Disambiguation for Cross-Language Information Retrieval. Advisor: Dr. Hsu. Presenter: Yu-San Hsieh. Authors: Christof Monz and Bonnie J. Dorr. SIGIR 2005, pp. 520-527. Outline: Motivation, Objective, Approach, Experiment Result, Introduction, Experiment, Conclusions.
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Iterative Translation Disambiguation for Cross-Language Information Retrieval
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Authors: Christof Monz and Bonnie J. Dorr
SIGIR 2005, pp. 520-527
N.Y.U.S.T.
I. M.
Outline
─ Motivation
─ Objective
─ Approach
─ Experiment Result
─ Introduction
─ Experiment
─ Conclusions
Motivation
Many words or phrases in one language can be translated into another language in a number of ways, so translation ambiguity is very common, and it reduces the effectiveness of information retrieval.
─ Example: Penalty (English) => Elfmeter (soccer) or Strafe (punishment)
Objective
Find a proper distribution of translation probabilities that resolves the translation ambiguity problem.
Approach
Find a proper distribution of translation probabilities.
Computing Term Weight
─ Initialization step
─ Iteration step
─ Normalization step
─ All term weights collected in a vector
─ Iteration stop
[Figure: source terms trade, union, europe and their German translation candidates gewerbe, geschaeft, handel, europa, union, gewerkschaft]
[Example: computation of one term weight $w_T(t \mid s_i)$]
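The steps listed on this slide can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the slide names the steps but not the formulas, so the update rule (previous weight plus association-weighted support from the candidates of the other source terms), the stopping threshold `tol`, and the toy association scores are all assumptions made for illustration.

```python
def disambiguate(candidates, assoc, max_iter=100, tol=1e-6):
    """candidates: {source term: [target candidates]}; assoc(t1, t2) -> float >= 0."""
    # Initialization step: uniform weights over each source term's candidates.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(max_iter):
        updated = {}
        for s, ts in candidates.items():
            raw = {}
            for t in ts:
                # Iteration step (assumed rule): add support from the
                # candidates of all other source terms.
                support = sum(assoc(t, t2) * w2
                              for s2, ws in weights.items() if s2 != s
                              for t2, w2 in ws.items())
                raw[t] = weights[s][t] + support
            # Normalization step: weights of one source term sum to 1.
            total = sum(raw.values())
            updated[s] = {t: v / total for t, v in raw.items()}
        # Iteration stop: compare all term weights as one vector.
        change = sum(abs(updated[s][t] - weights[s][t])
                     for s in updated for t in updated[s])
        weights = updated
        if change < tol:
            break
    return weights


# Toy data: the terms come from the slide, the scores are hypothetical.
toy_candidates = {
    "trade": ["gewerbe", "geschaeft", "handel"],
    "union": ["union", "gewerkschaft"],
    "europe": ["europa"],
}
toy_scores = {
    frozenset(("handel", "europa")): 0.3,
    frozenset(("handel", "union")): 0.2,
    frozenset(("handel", "gewerkschaft")): 0.5,
    frozenset(("gewerkschaft", "europa")): 0.4,
    frozenset(("geschaeft", "europa")): 0.1,
}


def toy_assoc(t1, t2):
    # Symmetric lookup; unseen pairs get an association of 0.
    return toy_scores.get(frozenset((t1, t2)), 0.0)


weights = disambiguate(toy_candidates, toy_assoc)
```

With these toy scores, candidates that are strongly associated with the translations of the other query terms gain weight over the iterations, while unsupported candidates lose weight.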
Approach
Measuring association strength
─ Pointwise mutual information
─ Dice coefficient
─ Log likelihood ratio
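The three measures can be computed from co-occurrence counts. The sketch below uses the standard textbook definitions; the exact formulations used in the paper may differ. The parameter names are assumptions: `c_xy` is the number of co-occurrences of the two terms, `c_x` and `c_y` are their individual counts, and `n` is the total number of observations.

```python
import math


def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information; assumes c_xy > 0."""
    return math.log2((c_xy * n) / (c_x * c_y))


def dice(c_xy, c_x, c_y):
    """Dice coefficient in [0, 1]."""
    return 2.0 * c_xy / (c_x + c_y)


def log_likelihood_ratio(c_xy, c_x, c_y, n):
    """G^2 statistic over the 2x2 contingency table of the two terms."""
    observed = [c_xy, c_x - c_xy, c_y - c_xy, n - c_x - c_y + c_xy]
    expected = [
        c_x * c_y / n,
        c_x * (n - c_y) / n,
        (n - c_x) * c_y / n,
        (n - c_x) * (n - c_y) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
```

All three return their neutral value (0 for PMI and the log likelihood ratio) when the terms co-occur exactly as often as independence would predict.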
Experiment Result
[Figure: differences from the baseline for individual queries (topics), showing where results improve]
Introduction
Two techniques for cross-language retrieval
─ Translate the document collection into the query (source) language and apply monolingual retrieval
─ Translate the query into the target language and retrieve with the translated query
Three approaches may be used to produce the translations
─ Machine translation system
─ Dictionary
─ Parallel corpus to estimate the translation probabilities
Introduction
A word in one language can be translated into another language in a number of ways.
─ Penalty (English) => Elfmeter (soccer) or Strafe (punishment)
Introduction
One approach to the word selection problem is to use co-occurrences between terms.
Problem (with a large number of terms)
─ Data sparseness
Ways to reduce data sparseness
─ Use very large corpora for counting co-occurrence frequencies
─ Use internet search engines
─ Smoothing
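Counting co-occurrences and smoothing the counts can be sketched as below. The slide only says "Smoothing", so the additive (add-alpha) smoothing, the document-level co-occurrence window, and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of terms."""
    pair_counts = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        # combinations() over a sorted list yields each pair once, in sorted order.
        pair_counts.update(combinations(terms, 2))
    return pair_counts


def smoothed_count(pair_counts, a, b, alpha=1.0):
    """Additive smoothing: unseen pairs get alpha instead of 0."""
    key = tuple(sorted((a, b)))
    return pair_counts[key] + alpha
```

Smoothing keeps unseen pairs from receiving a zero count, which matters when the corpus is too small to contain every plausible pair of translation candidates.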
Experiment
Test Data
─ CLEF 2003 English-to-German bilingual data
─ 56 topics chosen (title, description, narrative)
Morphological Normalization
─ Source-language words (topics) are normalized to match entries in the bilingual dictionary
─ De-compounding: 5-grams
─ Assign weights to 5-gram substrings
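Splitting a German compound into character 5-grams can be sketched as follows. The function name is hypothetical, and the slide does not say how the 5-gram substrings are weighted, so the sketch only produces the substrings.

```python
def char_ngrams(word, n=5):
    """Return all overlapping character n-grams of a word."""
    if len(word) <= n:
        # Short words are kept whole.
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


grams = char_ngrams("gewerkschaft")
```

Overlapping substrings let a compound match documents that contain only one of its parts, without needing a full morphological analyzer.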
Experiment
Retrieval Model
─ Lnu.ltc weighting scheme
─ Weighted document similarity
Statistical Significance
─ Bootstrap method
─ Bootstrap samples
─ One-tailed significance testing (comparing two retrieval methods)
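A one-tailed paired bootstrap test on per-topic scores can be sketched as below. This is one common variant (resampling the mean-centered differences) and may not match the exact procedure used in the paper; the function name, the number of samples, and the seed are assumptions.

```python
import random


def bootstrap_p_value(scores_a, scores_b, n_samples=10000, seed=0):
    """Estimate the p-value for H1: method A is better than method B."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Center the differences so the resampled data obey the null hypothesis.
    centered = [d - observed for d in diffs]
    hits = 0
    for _ in range(n_samples):
        # Bootstrap sample: draw n differences with replacement.
        sample_mean = sum(rng.choice(centered) for _ in range(n)) / n
        if sample_mean >= observed:
            hits += 1
    return hits / n_samples
```

Here `scores_a` and `scores_b` would hold the average precision of each topic under the two retrieval methods, in the same topic order.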
Experiment
Some problems found in the experiment
─ Individual average precision with the log likelihood ratio decreases for a number of queries.
Unknown words
─ The original word from the source language is included in the target-language query.
Example: Women's Conference Beijing
─ Women is normalized, but Woman is not found in the dictionary
─ Women is treated as a proper noun and kept as Women in the query
─ Assigned weight = 1
Result
1. The unknown word (Women) dominates the document similarity.
2. Most top-ranked documents contain Women as the only matching term.
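The unknown-word behavior described on this slide can be sketched as follows. The function name and the toy dictionary are hypothetical, and splitting the weight uniformly across candidates is an assumption used when no disambiguated weights are supplied.

```python
def build_target_query(source_terms, dictionary, weights=None):
    """Return {target term: weight} for a translated query."""
    query = {}
    for s in source_terms:
        candidates = dictionary.get(s)
        if not candidates:
            # Unknown word: keep the source word itself with weight 1.
            query[s] = 1.0
            continue
        for t in candidates:
            w = weights[s][t] if weights else 1.0 / len(candidates)
            query[t] = query.get(t, 0.0) + w
    return query


# Hypothetical dictionary without an entry for "women".
toy_dictionary = {"conference": ["konferenz", "tagung"], "beijing": ["peking"]}
query = build_target_query(["women", "conference", "beijing"], toy_dictionary)
```

In this toy query the untranslated word carries full weight while the weight of an ambiguous term is split across its candidates, which is consistent with the slide's observation that the unknown word can dominate the ranking.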
Conclusions
Our approach improves retrieval effectiveness compared to a baseline using bilingual dictionary lookup.
Experimental results show that the log likelihood ratio has the strongest positive impact.
My opinion
Advantage: It only requires a bilingual dictionary and a monolingual corpus in the target language.
Disadvantage: Unknown words
Apply