EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Post on 15-Mar-2016

DESCRIPTION

EcoTerm IV NBII/EioNet Demo of Federated KOS Search. Mike Frame Vienna, Austria April 2007. Discussion Topics…. Project Background NBII Thesaurus GEMET Thesaurus Prototype Client Sample Query Results Including no, 1, or both thesauri Overall Findings. - PowerPoint PPT Presentation

TRANSCRIPT

EcoTerm IV: NBII/EioNet Demo of Federated KOS Search

Mike Frame

Vienna, Austria

April 2007

Discussion Topics…
• Project Background
• NBII Thesaurus
• GEMET Thesaurus
• Prototype Client
• Sample Query Results
  • Including no, 1, or both thesauri
• Overall Findings

Biocomplexity Thesaurus
http://thesaurus.nbii.gov

EIONET GEMET Thesaurus
http://www.eionet.europa.eu/gemet/webservices?langcode=en

NBII/EIONET Thesaurus Web-service

• Background – collaboration through Ecoinformatics TWG
• Primary Goal – access distributed multi-lingual thesauri
• Results – SKOS web-service & client

Latest Client & Service capabilities
• Access to both NBII and GEMET
• Single language capability
• Results are provided by source
• All documentation is completed
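The client behaviour listed above (access to both NBII and GEMET, one language at a time, results reported by source) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory `THESAURI` table stands in for the real web services, the `Concept` type and `federated_search` function are hypothetical names, and the sample terms are toy data rather than actual thesaurus content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str
    lang: str


# Hypothetical in-memory stand-ins for the two thesaurus services (toy data).
THESAURI = {
    "NBII": [
        Concept("invasive species", "en"),
        Concept("endangered species", "en"),
    ],
    "GEMET": [
        Concept("endangered species", "en"),
        Concept("protected species", "en"),
    ],
}


def federated_search(term, lang="en", sources=None):
    """Look the term up in each selected thesaurus; group hits by source."""
    needle = term.strip().lower()
    results = {}
    for name in (sources or THESAURI):
        results[name] = [
            c.label
            for c in THESAURI[name]
            if c.lang == lang and needle in c.label.lower()
        ]
    return results


if __name__ == "__main__":
    # One query, single language, answers kept separate per source.
    print(federated_search("protected"))
```

Keeping the hits keyed by source, rather than merging them into one list, mirrors the "results are provided by source" capability and lets a caller restrict a query to one thesaurus through the `sources` argument.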

http://thesaurus.nbii.gov

Demo Client

Current State
Users
• Most aren’t aware of the underlying vocabulary
Vocabularies
• Are often unique to an organization and used more for “categorization” than retrieval
Goal
• Include all vocabularies and let the search engine handle results

Demonstration Search Retrieval
Created demonstration datasets:
• NBII Cataloged Resources
  • ~30,000 web sites, publications, images, maps, etc.
  • XML structured data – controlled subject
• NBII FGDC Metadata
  • ~22,000 resources on research studies
  • 150-200 elements
  • Semi-structured with no controlled vocabulary

NBII Catalog Records
• Based on the Dublin Core + 18 elements, of which 10 are mandatory
• In place since 2002
• Used by distributed content managers

NBII Metadata CH

Process
Added thesaurus capabilities to the development search engine for:
• NBII Thesaurus
• EIONET GEMET Thesaurus
• Used BT (broader term), RT (related term), and NT (narrower term) relationships & weighting

Performed sample queries within the test repositories for:
• No thesaurus
• GEMET-only aided searching
• NBII-only aided searching
• GEMET+NBII aided searching (X)
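A thesaurus-aided query of the kind described above can be sketched as weighted query expansion: the original term keeps full weight, and each related term is added with a weight that depends on its relationship type. This is only an illustration under stated assumptions. The slides do not give the weights per relationship type, so the `REL_WEIGHTS` values are hypothetical, as are the relationship types assigned to the toy entries and the function name `expand_query`.

```python
# Hypothetical weights per relationship type (not taken from the slides).
REL_WEIGHTS = {"NT": 0.8, "RT": 0.5, "BT": 0.2}

# Toy thesaurus: term -> list of (relationship, related term).
# The relationship types assigned here are assumptions for illustration.
THESAURUS = {
    "endangered species": [
        ("NT", "endangered animal species"),
        ("NT", "endangered plant species"),
        ("BT", "category of endangered species"),
    ],
}


def expand_query(term, thesaurus=THESAURUS, weights=REL_WEIGHTS):
    """Return {search term: weight}; the user's own term has weight 1.0."""
    expanded = {term: 1.0}
    for relationship, related in thesaurus.get(term, []):
        weight = weights.get(relationship, 0.0)
        # Keep the strongest weight if a term is reachable more than once.
        if weight > expanded.get(related, 0.0):
            expanded[related] = weight
    return expanded


if __name__ == "__main__":
    for search_term, weight in expand_query("endangered species").items():
        print(f"{weight:.1f}  {search_term}")
```

A term that is missing from the thesaurus simply comes back unexpanded, which corresponds to the "no thesaurus" baseline for that query.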

Test Repository 1: NBII Resource Catalog (Dublin Core)

No Thesauri – “invasive species”

NBII Thesaurus – “invasive species”

GEMET Thesaurus – “invasive species”

No Thesauri – “Endangered Species”

NBII Thesaurus – “endangered species”

GEMET Only – “endangered species”

No Thesaurus – “rare species”

NBII Thesaurus – “rare species”

GEMET Thesaurus – “rare species”

GEMET Thesaurus – “rare species” (expanded degrees of relevance)

No Thesauri – “protected species”

NBII Thesaurus – “protected species”

GEMET Thesaurus – “protected species”

Results – NBII Catalog Resources

Term                         None    NBII     GEMET
“invasive species”           2487    10802    2487
“endangered species”         1612    3532     1619
“rare species”               249     7186     290
“rare species” (expanded)                     5847
“protected species”          203     2345     1664
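To make the size of the effect easier to compare across terms, the catalog counts can be expressed as a ratio to the no-thesaurus baseline. The sketch below uses only the three-column counts reported above (the single "expanded" figure is left out); the names `COUNTS` and `expansion_factors` are illustrative.

```python
# Result counts from the NBII Resource Catalog test: (None, NBII, GEMET).
COUNTS = {
    "invasive species": (2487, 10802, 2487),
    "endangered species": (1612, 3532, 1619),
    "rare species": (249, 7186, 290),
    "protected species": (203, 2345, 1664),
}


def expansion_factors(counts):
    """Ratio of thesaurus-aided hits to baseline hits, as (NBII, GEMET)."""
    return {
        term: (round(nbii / none, 2), round(gemet / none, 2))
        for term, (none, nbii, gemet) in counts.items()
    }


if __name__ == "__main__":
    for term, (nbii, gemet) in expansion_factors(COUNTS).items():
        print(f"{term:20s} NBII x{nbii:<6} GEMET x{gemet}")
```

For example, "invasive species" grows from 2487 to 10802 hits with the NBII thesaurus, a factor of about 4.3, while the GEMET count is unchanged at 2487.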

Results – NBII Resource Catalog (bar chart of result counts, scale 0 to 12,000, for “invasive species”, “endangered species”, “rare species”, and “protected species”; series: None, NBII, GEMET)

Test Repository 2: NBII FGDC Metadata

Sample Queries – No vocabularies – Metadata CH – “invasive species”

Sample Queries – NBII only – Metadata CH – “invasive species”

Sample Queries – GEMET only – Metadata CH – “invasive species”

Sample Queries – No vocabularies – Metadata CH – “endangered species”

Sample Queries – NBII only – Metadata CH – “endangered species”

Sample Queries – GEMET only – Metadata CH – “endangered species”

No Thesauri – Metadata CH – “rare species”

NBII Thesaurus – Metadata CH – “rare species”

GEMET Thesaurus – Metadata CH – “rare species”

Sample Queries – No vocabularies – Metadata CH – “protected species”

Sample Queries – NBII only – Metadata CH – “protected species”

Sample Queries – GEMET only – Metadata CH – “protected species”

Results – FGDC Metadata

Term                      None    NBII    GEMET
“invasive species”        302     7884    302
“endangered species”      1008    2690    1019
“rare species”            59      4259    64
“protected species”       11      2152    1011

Results – NBII Resource Catalog (bar chart of result counts, scale 0 to 8,000, for “invasive species”, “endangered species”, “rare species”, and “protected species”; series: None, NBII, GEMET)

Overall Results: General Findings
• The assumption that a thesaurus improves the “number” of results is valid
  • The degree varies by term and mappings
• Since users search from a number of perspectives, backgrounds, and levels of expertise, multiple thesauri do improve the number of results

Overall Results: Using only GEMET Terminology
• Terms in GEMET that were not included in the NBII thesaurus improved search results
• GEMET’s strength of broad coverage aided searches
• In general, for the Metadata repository:
  • Results varied somewhat, but the top 10 results were often the same
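One way to put a number on "often the same top 10 results" is the overlap between the first ten entries of two ranked result lists. The sketch below uses the Jaccard measure (shared items divided by all distinct items); the slides do not say how the comparison was made, so this measure, the function name `top_k_overlap`, and the document identifiers in the usage lines are all assumptions.

```python
def top_k_overlap(ranked_a, ranked_b, k=10):
    """Jaccard overlap of the top-k items of two ranked result lists."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result lists are trivially identical
    return len(top_a & top_b) / len(top_a | top_b)


if __name__ == "__main__":
    # Hypothetical document identifiers, for illustration only.
    no_thesaurus = ["doc1", "doc2", "doc3"]
    gemet_aided = ["doc2", "doc3", "doc4"]
    print(top_k_overlap(no_thesaurus, gemet_aided, k=3))
```

A value of 1.0 means both searches return the same set in their top k, regardless of order; a rank-sensitive measure would be needed to capture reordering within that set.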

Overall Results: General Findings
• “No thesaurus” tests produced poorer #1 results
• Using a thesaurus changed the ordering of the results list more for the structured set than for the unstructured set (Metadata)

Issues
“Integrating” thesauri of multiple scopes and purposes presents challenges:
• Can’t turn the effort into a thesaurus project
• Degree of relevance of terms is an issue
• Concept matching or different intent
• Differing classification (RT vs. NT) across thesauri
• Differing “weighting” algorithms

Further Study Options
1.) Take multiple thesauri “as is”
2.) Do some “attempted” concept matching, i.e. “endangered animal species” – “endangered animal”
3.) If no match is present, add term and relationship as is
4.) Obtain terms from XMDR
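Options 2 and 3 above can be sketched as a simple merge: try to match each incoming term to an existing one after light normalization, and add it unchanged when nothing matches. The normalization rule used here (lowercase, drop a trailing "species") is a deliberately naive assumption chosen so that “endangered animal species” matches “endangered animal”; the names `GENERIC_TAIL`, `normalize`, and `merge_terms` are hypothetical, and relationships are omitted for brevity.

```python
# Assumed set of trailing words to ignore when comparing terms.
GENERIC_TAIL = {"species"}


def normalize(term):
    """Lowercase the term and drop a trailing generic word, if any."""
    words = term.lower().split()
    if len(words) > 1 and words[-1] in GENERIC_TAIL:
        words = words[:-1]
    return " ".join(words)


def merge_terms(base_terms, incoming_terms):
    """Return (matches, added): matched pairs and terms added as is."""
    merged = {normalize(t): t for t in base_terms}
    matches, added = [], []
    for term in incoming_terms:
        key = normalize(term)
        if key in merged:
            matches.append((term, merged[key]))  # attempted concept match
        else:
            merged[key] = term  # no match: take the term as is
            added.append(term)
    return matches, added


if __name__ == "__main__":
    print(merge_terms(["endangered animal"],
                      ["endangered animal species", "vanished species"]))
```

A rule this loose will also produce false matches, which is one face of the "concept matching or different intent" issue; in practice matched pairs would need review.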

Further Study Options – cont.
• Follow up with additional repositories
• Repeat with other query terms
• Re-look at weighting algorithms
• Do queries with a subset of terms
• Repeat with a completely integrated thesaurus, as compared to repeated queries with machine integration

Complete By June

Questions, Comments,

GEMET Control file

endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
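The control file entries above appear to follow the pattern "query term, then comma-separated expansion terms, each with a weight in square brackets". A minimal parser for that pattern might look like the sketch below. The format is inferred from the three sample lines only, so the assumptions that terms never contain commas and that every expansion carries a weight are untested; `parse_control_line` is a hypothetical name.

```python
import re

# An expansion entry: a term, optional space, then a weight such as [0.8] or [.2].
ENTRY = re.compile(r"^(?P<term>.*?)\s*\[(?P<weight>\d*\.?\d+)\]$")


def parse_control_line(line):
    """Split one control line into (query term, {expansion term: weight})."""
    head, *related = [part.strip() for part in line.split(",")]
    expansions = {}
    for item in related:
        match = ENTRY.match(item)
        if match is None:
            raise ValueError(f"malformed entry: {item!r}")
        expansions[match.group("term")] = float(match.group("weight"))
    return head, expansions


if __name__ == "__main__":
    sample = ("protected species,category of endangered species[0.2],"
              "endangered species [0.2]")
    print(parse_control_line(sample))
```

The pattern accepts both `[.2]` and `[0.2]` and tolerates a space before the bracket, since both variations occur in the sample lines.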
