TRANSCRIPT
Evaluating Cross-language Information Retrieval Systems
Carol Peters
IEI-CNR
SPINN Seminar, Copenhagen, 26-27 October 2001
Outline
Why IR System Evaluation is Important
Evaluation programs
An Example
What is an IR System Evaluation Campaign?
An activity which tests the performance of different systems on a given task (or set of tasks) under standard conditions
Permits contrastive analysis of approaches/technologies
How well does the system meet the information need?
System evaluation:
how good are document rankings?
User-based evaluation:
how satisfied is the user?
Why we need Evaluation
evaluation permits hypotheses to be validated and progress assessed
evaluation helps to identify areas where more R&D is needed
evaluation saves developers time and money
CLIR systems are still at an experimental stage: evaluation is particularly important!
CLIR System Evaluation is Complex
CLIR systems consist of an integration of components and technologies:
need to evaluate single components
need to evaluate overall system performance
need to distinguish methodological aspects from linguistic knowledge
Technology vs. Usage Evaluation
Usage evaluation:
shows the value of a technology for the user
determines the technology thresholds that are indispensable for specific usage
provides directions for choice of criteria for technology evaluation
Influence of language and culture on usability of technology needs to be understood
Organising an Evaluation Activity
select control task(s)
provide data to test and tune systems
define protocol and metrics to be used in results assessment
Aim is an objective comparison between systems and approaches
Test Collection
Set of documents - must be representative of task of interest; must be large
Set of “topics” - statement of user needs from which system data structure (query) is extracted
Relevance judgments – judgments vary by assessor but no evidence that differences affect comparative evaluation of systems
Using Pooling to Create Large Test Collections
Assessors create topics.
A variety of different systems retrieve the top 1000 documents for each topic.
Form pools of the unique documents from all submissions, which the assessors judge for relevance.
Systems are evaluated using the relevance judgments.
Ellen Voorhees – CLEF 2001 Workshop
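As a rough sketch of the pooling step (function and variable names are illustrative, not from any CLEF tooling; only the pool depth of 1000 comes from the slide):

```python
def build_pool(submissions, depth=1000):
    """Form the pool of unique documents to be judged for one topic.

    submissions: ranked result lists, one per participating system.
    Only the top `depth` documents of each run enter the pool, so
    assessors judge far fewer documents than the whole collection.
    """
    pool = set()
    for run in submissions:
        pool.update(run[:depth])
    return pool

run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d4", "d5"]
build_pool([run_a, run_b], depth=2)  # {'d1', 'd2', 'd4'}
```

Documents outside every pool are assumed non-relevant, which is what makes judging a million-document collection feasible.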
Cross-language Test Collections
Consistency is harder to obtain than for monolingual:
parallel or comparable document collections
multiple assessors per topic creation and relevance assessment (for each language)
must take care when comparing different language evaluations (e.g., cross run to mono baseline)

Pooling is harder to coordinate:
need to have large, diverse pools for all languages
retrieval results are not balanced across languages
Taken from Ellen Voorhees – CLEF 2001 Workshop
Evaluation Measures
Recall: measures the ability of the system to find all relevant items

recall = no. of relevant items retrieved / no. of relevant items in collection

Precision: measures the ability of the system to find only relevant items

precision = no. of relevant items retrieved / total no. of items retrieved

A recall-precision graph is used to compare systems
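The two measures can be computed per topic as follows (a minimal sketch; the function name and example data are invented for illustration):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one topic.

    retrieved: ranked list of document ids returned by the system
    relevant:  set of document ids judged relevant by the assessors
    """
    retrieved_relevant = sum(1 for doc in retrieved if doc in relevant)
    precision = retrieved_relevant / len(retrieved) if retrieved else 0.0
    recall = retrieved_relevant / len(relevant) if relevant else 0.0
    return precision, recall

# System returns 4 documents; 2 of the 3 relevant ones are among them.
p, r = precision_recall(["d1", "d7", "d3", "d9"], {"d1", "d3", "d5"})
# p = 2/4 = 0.5, r = 2/3
```

Plotting precision at increasing recall levels over the ranked list gives the recall-precision graph mentioned above.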
Main CLIR Evaluation Programs
TIDES: sponsors TREC (Text REtrieval Conference) and TDT (Topic Detection and Tracking); Chinese-English tracks in 2000; TREC focusing on English/French to Arabic in 2001
NTCIR: Nat. Inst. of Informatics, Tokyo; Chinese-English and Japanese-English C-L tracks
AMARYLLIS: focused on French; the 98-99 campaign included a C-L track; 3rd campaign begins Sept. 2001
CLEF: Cross Language Evaluation Forum - C-L evaluation for European languages
Cross-Language Evaluation Forum
Funded by the DELOS Network of Excellence for Digital Libraries and the US National Institute of Standards and Technology (2000-2001)
Extension of CLIR track at TREC (1997-1999)
Coordination is distributed - national sites for each language in multilingual collection
CLEF Partners (2000-2001)
Eurospider, Zurich, Switzerland (Peter Schäuble, Martin Braschler)
IEEC-UNED, Madrid, Spain (Felisa Verdejo, Julio Gonzalo)
IEI-CNR, Pisa, Italy (Carol Peters)
IZ Sozialwissenschaften, Bonn, Germany (Michael Kluck)
NIST, Gaithersburg MD, USA (Donna Harman, Ellen Voorhees)
University of Hildesheim, Germany (Christa Womser-Hacker)
University of Twente, The Netherlands (Djoerd Hiemstra)
CLEF - Main Goals
Promote research by providing an appropriate infrastructure for:
CLIR system evaluation, testing and tuning
comparison and discussion of results
building of test-suites for system developers
CLEF 2001 Task Description
Four main evaluation tracks in CLEF 2001:
multilingual information retrieval
bilingual IR
monolingual (non-English) IR
domain-specific IR
plus experimental track for interactive C-L systems
CLEF 2001 Data Collection
Multilingual comparable corpus of news agencies and newspaper documents for six languages (DE,EN,FR,IT,NL,SP). Nearly 1 million documents
Common set of 50 topics (from which queries are extracted) created in 9 European languages (DE,EN,FR,IT,NL,SP+FI,RU,SV) and 3 Asian languages (JP,TH,ZH)
CLEF 2001 Creating the Queries
Title: European Industry
Description: What factors damage the competitiveness of European industry on the world's markets?
Narrative: Relevant documents discuss factors that render European industry and manufactured goods less competitive with respect to the rest of the world, e.g. North America or Asia. Relevant documents must report data for Europe as a whole rather than for single European nations.
Queries are extracted from topics: 1 or more fields
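Extracting a query from one or more topic fields can be sketched like this (the topic dict and helper function are hypothetical; the T/D/N abbreviations follow TREC/CLEF usage for title, description and narrative):

```python
# Hypothetical topic, abbreviated from the example above.
topic = {
    "title": "European Industry",
    "description": "What factors damage the competitiveness of "
                   "European industry on the world's markets?",
    "narrative": "Relevant documents discuss factors that render "
                 "European industry less competitive ...",
}

def extract_query(topic, fields="TD"):
    """Concatenate the requested topic fields into one query string."""
    names = {"T": "title", "D": "description", "N": "narrative"}
    return " ".join(topic[names[f]] for f in fields)

title_only = extract_query(topic, "T")    # shortest query
full_query = extract_query(topic, "TDN")  # longest query
```

Runs are then labelled by the field combination used (T, TD, TDN, ...), since longer queries usually behave differently from title-only ones.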
CLEF 2001 Creating the Queries
Distributed activity (Bonn, Gaithersburg, Pisa, Hildesheim, Twente, Madrid)
Each group produced 13-15 topics: 1/3 local, 1/3 European, 1/3 international
Topic selection at meeting in Pisa (50 topics)
Topics were created in DE, EN, FR, IT, NL, SP and additionally translated to SV, RU, FI and TH, JP, ZH
Cleanup after topic translation
CLEF 2001 Multilingual IR

(Diagram: topics in any one of DE, EN, FR, IT, FI, NL, SP, SV, RU, ZH, JP, TH are submitted to the participant's cross-language information retrieval system, which searches the English, German, French, Italian and Spanish document collections and returns one result list of DE, EN, FR, IT and SP documents ranked in decreasing order of estimated relevance.)
CLEF 2001 Bilingual IR
Task: query English or Dutch target document collections
Goal: retrieve documents for target language, listing results in ranked list
An easier task for beginners!
CLEF 2001 Monolingual IR
Task: querying document collections in FR|DE|IT|NL|SP
Goal: acquire better understanding of language- dependent retrieval problems
different languages present different retrieval problems
issues involved include word order, morphology, diacritic characters, language variants
CLEF 2001 Domain-Specific IR
Task: querying a structured database from a vertical domain (social sciences) in German
German/English/Russian thesaurus and English translations of document titles
Monolingual or cross-language task
Goal: understand implications of querying in a domain-specific context
CLEF 2001 Interactive C-L
Task: interactive document selection in an “unknown” target language
Goal: evaluation of results presentation rather than system performance
CLEF 2001: Participation
(Map: participating groups from Europe, North America and Asia)
34 participants, 15 different countries
Details of Experiments

| Track | # Participants | # Runs/Experiments |
| --- | --- | --- |
| Multilingual | 8 | 26 |
| Bilingual to EN | 19 | 61 |
| Bilingual to NL | 3 | 3 |
| Monolingual DE | 12 | 25 |
| Monolingual ES | 10 | 22 |
| Monolingual FR | 9 | 18 |
| Monolingual IT | 8 | 14 |
| Monolingual NL | 9 | 19 |
| Domain-specific | 1 | 4 |
| Interactive | 3 | 6 |
Runs per Topic Language

(Chart: number of runs per topic language for Dutch, English, French, German, Italian, Spanish, Chinese, Finnish, Japanese, Russian, Swedish and Thai.)
Topic Fields

(Chart: number of runs per topic-field combination, e.g. TDN, TD, T.)
CLEF 2001 Participation

CMU, Eidetica, Eurospider *, Greenwich U, HKUST, Hummingbird, IAI *, IRIT *, ITC-irst *, JHU-APL *, Kasetsart U, KCSL Inc., Medialab, Nara Inst. of Tech., National Taiwan U, OCE Tech. BV, SICS/Conexor, SINAI/U Jaen, Thomson Legal *, TNO TPD *, U Alicante, U Amsterdam, U Exeter, U Glasgow *, U Maryland * (interactive only), U Montreal/RALI *, U Neuchâtel, U Salamanca *, U Sheffield * (interactive only), U Tampere *, U Twente (*), UC Berkeley (2 groups) *, UNED (interactive only)
(* = also participated in 2000)
CLEF 2001 Approaches

All traditional approaches used:
commercial MT systems (Systran, Babelfish, Globalink Power Translator, ...); both query and document translation tried
bilingual dictionary look-up (on-line and in-house tools)
aligned parallel corpora (web-derived)
comparable corpora (similarity thesaurus)
conceptual networks (EuroWordNet, ZH-EN wordnet)
multilingual thesaurus (domain-specific task)
CLEF 2001 Techniques Tested

Text processing for multiple languages:
Porter stemmer, Inxight commercial stemmer, on-site tools
simple generic "quick & dirty" stemming
language-independent stemming
separate stopword lists vs. a single list
morphological analysis
n-gram indexing, word segmentation, decompounding (e.g. Chinese, German)
use of NLP methods, e.g. phrase identification, morphosyntactic analysis
CLEF 2001 Techniques Tested

Cross-language strategies included:
integration of methods (MT, corpora and MRDs)
pivot language to translate from L1 to L2 (DE to FR, SP, IT via EN)
n-gram based technique to match untranslatable words
prior and post-translation pseudo-relevance feedback (query expanded by associating frequent co-occurrences)
vector-based semantic analysis (query expanded by associating semantically similar terms)
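The n-gram matching idea for untranslatable words (often proper names whose spelling varies across languages) can be illustrated with a Dice coefficient over character n-gram sets (a generic sketch, not any particular group's method):

```python
def ngram_set(word, n=3):
    """Character n-grams of a word, with underscores marking boundaries."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def dice(a, b, n=3):
    """Dice coefficient between the character n-gram sets of two words;
    spelling variants of the same name score much higher than
    unrelated words."""
    x, y = ngram_set(a, n), ngram_set(b, n)
    return 2 * len(x & y) / (len(x) + len(y))

# An English and a German spelling of the same name overlap strongly:
dice("Gorbachev", "Gorbatschow") > dice("Gorbachev", "industry")  # True
```

A query word missing from the translation resources can then be matched against target-language vocabulary by this similarity instead of being dropped.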
CLEF 2001Techniques Tested
Different strategies were tried for merging results

This remains an unsolved problem
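One simple baseline for merging per-language result lists is min-max score normalization followed by re-sorting (a sketch under the assumption of score-based runs; the names are invented, and this is a baseline, not a solution to the merging problem):

```python
def merge_normalized(ranked_lists):
    """Merge per-language result lists into one multilingual ranking.

    Each list holds (doc_id, score) pairs from one language's retrieval
    run. Scores are min-max normalized within each list so that runs
    with different score scales become comparable, then all documents
    are sorted together by normalized score.
    """
    merged = []
    for results in ranked_lists:
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero for flat runs
        merged.extend((doc, (s - lo) / span) for doc, s in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

english = [("en1", 14.2), ("en2", 9.1)]
german = [("de1", 3.7), ("de2", 1.2)]
merged = merge_normalized([english, german])  # en1 and de1 both at 1.0
```

The weakness is exactly what makes merging hard: normalized scores from different collections are not truly comparable estimates of relevance.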
CLEF 2001 Workshop
Results of CLEF 2001 campaign presented at Workshop, 3-4 September 2001, Darmstadt, Germany
50 researchers and system developers from academia and industry participated.
Working Notes containing preliminary reports and statistics on the CLEF 2001 experiments were distributed.
CLEF-2001 vs. CLEF-2000
Most participants were back
Less MT, more corpus-based
People have really started to try each other's ideas/methods:
corpus-based approaches (parallel web, alignments)
n-grams
combination approaches
“Effect” of CLEF
Many more European groups
Dramatic increase of work in stemming/decompounding (for languages other than English)
Work on mining the web for parallel texts
Work on merging (breakthrough still missing?)
Work on combination approaches
CLEF 2002
Accompanying Measure under the IST programme: Contract No. IST-2000-31002. October 2001

CLEF Consortium: IEI-CNR, Pisa; ELRA/ELDA, Paris; Eurospider, Zurich; UNED, Madrid; NIST, USA; IZ Sozialwissenschaften, Bonn

Associated Members: University of Hildesheim, University of Twente, University of Tampere (?)
CLEF 2002 Task Description

Similar to CLEF 2001:
multilingual information retrieval
bilingual IR (not to English!)
monolingual (non-English) IR
domain-specific IR
interactive track
Plus feasibility study for spoken document track (within DELOS – results reported at CLEF)
Possible coordination with Amaryllis
CLEF 2002 Schedule

Call for Participation – November 2001
Document release – 1 February 2002
Topic release – 1 April 2002
Runs received – 15 June 2002
Results communicated – 1 August 2002
Papers for Working Notes – 1 September 2002
Workshop – 19-20 September 2002
Evaluation - Summing up
system evaluation is not a competition to find the best system
evaluation provides opportunity to test, tune, and compare approaches in order to improve system performance
an evaluation campaign creates a community interested in examining the same issues and comparing ideas and experiences
Cross-Language Evaluation Forum
For further information see:
http://www.clef-campaign.org
or contact:
Carol Peters - IEI-CNR
E-mail: [email protected]