Chances and Challenges in Comparing Cross-Language Retrieval Tools / Giovanna Roda, Vienna, Austria / IRF Symposium 2010, June 3, 2010


DESCRIPTION

Presentation at the IRF Symposium 2010, Vienna, June 3, 2010

TRANSCRIPT

Page 1: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Chances and Challenges in Comparing Cross-Language Retrieval Tools

Giovanna Roda, Vienna, Austria

IRF Symposium 2010 / June 3, 2010

Page 2: Chances and Challenges in Comparing Cross-Language Retrieval Tools

CLEF-IP: the Intellectual Property track at CLEF

CLEF-IP is an evaluation track within the Cross-Language Evaluation Forum (CLEF).¹

organized by the IRF

first track ran in 2009

running this year for the second time

¹ http://www.clef-campaign.org


Page 7: Chances and Challenges in Comparing Cross-Language Retrieval Tools

What is an evaluation track?

An evaluation track in Information Retrieval is a cooperative action aimed at comparing different techniques on a common retrieval task.

produces experimental data that can be analyzed and used to improve existing systems

fosters exchange of ideas and cooperation

produces a reusable test collection, sets milestones

Test collection

A test collection traditionally consists of target data, a set of queries, and relevance assessments for each query.
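
As a small illustration of these three ingredients, here is a minimal sketch of a test collection and of scoring one submitted run against it; the data structures and the Recall@k measure are simplified assumptions for illustration, not the CLEF-IP formats.

    # Minimal sketch: a test collection (target data, queries, relevance
    # assessments) plus a simple recall-at-k evaluation of a run.
    from dataclasses import dataclass
    from typing import Dict, List, Set

    @dataclass
    class TestCollection:
        documents: Dict[str, str]      # doc_id -> text      (target data)
        queries: Dict[str, str]        # query_id -> text    (here: topic patents)
        qrels: Dict[str, Set[str]]     # query_id -> relevant doc_ids (assessments)

    def recall_at_k(run: Dict[str, List[str]],
                    qrels: Dict[str, Set[str]],
                    k: int = 100) -> float:
        """Average per-query recall, where `run` maps each query_id to a
        ranked list of retrieved doc_ids."""
        per_query = []
        for qid, relevant in qrels.items():
            retrieved = set(run.get(qid, [])[:k])
            per_query.append(len(retrieved & relevant) / len(relevant))
        return sum(per_query) / len(per_query)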


Page 12: Chances and Challenges in Comparing Cross-Language Retrieval Tools

CLEF-IP 2009: the task

The main task in the CLEF-IP track was to find prior art for a given patent.

Prior art search

Prior art search consists of identifying all information (including non-patent literature) that might be relevant to a patent's claim of novelty.


Page 14: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Participants - 2009 track

1 Tech. Univ. Darmstadt, Dept. of CS, Ubiquitous Knowledge Processing Lab (DE)

2 Univ. Neuchatel - Computer Science (CH)

3 Santiago de Compostela Univ. - Dept. Electronica y Computacion (ES)

4 University of Tampere - Info Studies (FI)

5 Interactive Media and Swedish Institute of Computer Science (SE)

6 Geneva Univ. - Centre Universitaire d'Informatique (CH)

7 Glasgow Univ. - IR Group Keith (UK)

8 Centrum Wiskunde & Informatica - Interactive Information Access (NL)


Page 22: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Participants - 2009 track

9 Geneva Univ. Hospitals - Service of Medical Informatics (CH)

10 Humboldt Univ. - Dept. of German Language and Linguistics (DE)

11 Dublin City Univ. - School of Computing (IE)

12 Radboud Univ. Nijmegen - Centre for Language Studies & Speech Technologies (NL)

13 Hildesheim Univ. - Information Systems & Machine Learning Lab (DE)

14 Technical Univ. Valencia - Natural Language Engineering (ES)

15 Al. I. Cuza University of Iasi - Natural Language Processing (RO)


Page 29: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Participants - 2009 track

15 participants

48 experiments submitted for the main task

10 experiments submitted for the language tasks


Page 33: Chances and Challenges in Comparing Cross-Language Retrieval Tools

2009-2010: participants

Page 34: Chances and Challenges in Comparing Cross-Language Retrieval Tools

2009-2010: evolution of the CLEF-IP track

2009                         | 2010
1 task: prior art search     | prior art candidate search and classification task
targeting granted patents    | patent applications
15 participants              | 20 participants
all from academia            | 4 industrial participants
families and citations       | include forward citations
manual assessments           | expanded lists of relevant docs
standard evaluation measures | new measure: PRES, more recall-oriented
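
For readers unfamiliar with PRES (Patent Retrieval Evaluation Score), it was proposed by Magdy and Jones (SIGIR 2010) as a recall-oriented measure derived from normalized recall; roughly, with n relevant documents at ranks r_i and N_max the maximum number of results a user is assumed to check,

    PRES = 1 - (sum_i r_i - n(n+1)/2) / (n * N_max)

where, if I recall the definition correctly, relevant documents not retrieved within the top N_max are treated as if ranked just after N_max, so that PRES is 1 when all relevant documents appear at the top of the list and 0 when none are found.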

Page 44: Chances and Challenges in Comparing Cross-Language Retrieval Tools

What are relevance assessments?

A test collection (also known as gold standard) consists of a target dataset, a set of queries, and relevance assessments corresponding to each query.

The CLEF-IP test collection:

target data: 2 million EP patents

queries: full-text patents (without images)

relevance assessments: extended citations


Page 49: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Relevance assessments

We used patents cited as prior art as relevance assessments.

Sources of citations:

1 applicant's disclosure: the USPTO requires applicants to disclose all known relevant publications

2 patent office search report: each patent office will do a search for prior art to judge the novelty of a patent

3 opposition procedures: patents cited to prove that a granted patent is not novel


Page 54: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Extended citations as relevance assessments

direct citations and their families; direct citations of family members and their families

Page 57: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Patent families

A patent family consists of patents granted by different patent authorities but related to the same invention.

simple family: all family members share the same priority number

extended family: there are several definitions; in the INPADOC database, all documents which are directly or indirectly linked via a priority number belong to the same family


Page 60: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Patent families

Patent documents are linked by priorities (figure: INPADOC family). CLEF-IP uses simple families.
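
A minimal sketch of how such an extended citation set can be assembled, assuming hypothetical lookup tables `citations` (patent -> directly cited patents) and `simple_family` (patent -> other members of its simple family); this only illustrates the idea, it is not the actual CLEF-IP tooling.

    # Sketch: extended citations = direct citations of the topic patent and of
    # its simple-family members, each expanded to its own simple family.
    # `citations` and `simple_family` are hypothetical lookup tables.
    from typing import Dict, Set

    def extended_citations(patent: str,
                           citations: Dict[str, Set[str]],
                           simple_family: Dict[str, Set[str]]) -> Set[str]:
        family = simple_family.get(patent, set()) | {patent}
        cited = set()
        for member in family:
            cited |= citations.get(member, set())
        expanded = set()
        for doc in cited:
            expanded |= simple_family.get(doc, set()) | {doc}
        # excluding the topic's own family is an assumption of this sketch
        return expanded - family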

Page 63: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Relevance assessments 2010

Expanding the 2009 extended citations:

1 include citations of forward citations ...

2 ... and their families

This is apparently a well-known method among patent searchers.

Zig-zag search?
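
Continuing the sketch from the patent-families slide, the 2010 expansion could look roughly like this, again with hypothetical lookup tables, here also `forward_citations` (patent -> patents that cite it).

    # Sketch of the 2010 expansion: add citations of forward citations and
    # their simple families to the 2009 extended-citation set. Builds on the
    # extended_citations() sketch shown earlier; all tables are hypothetical.
    def expanded_assessments_2010(patent, citations, forward_citations, simple_family):
        relevant = extended_citations(patent, citations, simple_family)
        for fwd in forward_citations.get(patent, set()):          # patents citing the topic
            for doc in citations.get(fwd, set()):                 # what those patents cite ...
                relevant |= simple_family.get(doc, set()) | {doc} # ... and their families
        return relevant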


Page 68: Chances and Challenges in Comparing Cross-Language Retrieval Tools

How good are the CLEF-IP relevance assessments?

CLEF-IP uses families + citations:

Page 69: Chances and Challenges in Comparing Cross-Language Retrieval Tools

How good are the CLEF-IP relevance assessments?

how complete are extended citations as relevance assessments?

will every prior art patent be included in this set?

and if not, what percentage of prior art items are captured by extended citations?

when considering forward citations, how good are extended citations as a prior art candidate set?


Page 73: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Feedback from patent experts needed

Quality of prior art candidate sets has to be assessed

Page 74: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Feedback from patent experts needed

Know-how of patent search experts is needed

Page 75: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Feedback from patent experts needed

at CLEF-IP 2009, 7 patent search professionals assessed 12 search results

the task was not well defined and there were misunderstandings about the concept of relevance

the amount of data was not sufficient to draw conclusions

Page 79: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Some initiatives associated with Clef–Ip

The results of evaluation tracks are mostly useful for the research community.

This community often produces prototypes that are of little interest to the end-user.

Next I'd like to present two concrete outcomes - not of CLEF-IP directly, but arising from work in patent retrieval evaluation.


Page 82: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Soire

Page 83: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Soire

developed at Matrixware

service-oriented architecture - available as a Web service

allows replicating IR experiments based on the classical evaluation model

tested on the CLEF-IP data

customized for the evaluation of machine translation


Page 88: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Spinque

Page 89: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Spinque

a spin-off (2010) from CWI (the Dutch National Research Center in Computer Science and Mathematics)

introduces search-by-strategy

provides optimized strategies for patent search - tested on CLEF-IP data

transparency: understand your search results to improve the strategy


Page 93: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Clef–Ip 2009 learnings

Humboldt University implemented a model for patent search that produced the best results.

The model combined several strategies (see the sketch after this list):

using metadata (IPC, ECLA)

indexes built at lemma level

an additional phrase index for English

crosslingual concept index (multilingual terminological database)
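
A rough sketch of this kind of combination follows; the index objects, weights, and the IPC/ECLA filter below are illustrative assumptions, not the actual Humboldt system.

    # Sketch: fuse scores from several indexes (lemma, English phrase,
    # crosslingual concept) and keep only candidates that share a
    # classification code with the topic. All objects here are hypothetical.
    from collections import defaultdict
    from typing import Dict, List, Set, Tuple

    def combined_search(topic_id: str,
                        indexes: Dict[str, object],        # name -> index with .search(topic_id)
                        weights: Dict[str, float],         # name -> fusion weight
                        class_codes: Dict[str, Set[str]],  # doc_id -> IPC/ECLA codes
                        k: int = 1000) -> List[Tuple[str, float]]:
        scores = defaultdict(float)
        for name, index in indexes.items():
            for doc_id, score in index.search(topic_id):   # assumed to yield (doc_id, score)
                scores[doc_id] += weights[name] * score
        topic_codes = class_codes.get(topic_id, set())
        ranked = [(d, s) for d, s in scores.items()
                  if class_codes.get(d, set()) & topic_codes]   # metadata filter
        ranked.sort(key=lambda pair: pair[1], reverse=True)
        return ranked[:k]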


Page 99: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Some additional investigations

Some citations were hard to find

% runs (x)   | class
x ≤ 5        | hard
5 < x ≤ 10   | very difficult
10 < x ≤ 50  | difficult
50 < x ≤ 75  | medium
75 < x ≤ 100 | easy
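
Reading x as the percentage of submitted runs that retrieved a given citation (my interpretation of the table), such a per-citation difficulty classification can be computed roughly as in this sketch.

    # Sketch: label each cited (relevant) document by how many of the
    # submitted runs retrieved it. The thresholds mirror the table above;
    # treating "% runs" as the share of runs that found the citation is an
    # assumption.
    def difficulty_class(percent_of_runs: float) -> str:
        if percent_of_runs <= 5:
            return "hard"
        if percent_of_runs <= 10:
            return "very difficult"
        if percent_of_runs <= 50:
            return "difficult"
        if percent_of_runs <= 75:
            return "medium"
        return "easy"

    def classify_citations(runs, qrels):
        """runs: list of dicts topic_id -> set of retrieved doc_ids;
        qrels: topic_id -> set of cited doc_ids."""
        labels = {}
        for topic, cited in qrels.items():
            for doc in cited:
                found = sum(1 for run in runs if doc in run.get(topic, set()))
                labels[(topic, doc)] = difficulty_class(100.0 * found / len(runs))
        return labels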


Page 101: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Some additional investigations

We looked at the content of citations and citing patents.

Page 102: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Some additional investigations

Ongoing investigations.

Page 103: Chances and Challenges in Comparing Cross-Language Retrieval Tools

Thank you for your attention.