![Page 1: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/1.jpg)
Turning three thesauri into a
Global Agricultural Concept Scheme
March 9, 2015
Research Data Alliance
Session II: Good Practices towards opening data in agriculture
Cynthia Parr, National Agricultural Library
@cydparr
![Page 2: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/2.jpg)
Outline
1. Background
2. Starting point: three thesauri
3. Creating GACS
4. Challenges
5. Next steps and future of GACS
![Page 3: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/3.jpg)
Background
● Food and Agriculture Organization of the UN
● CABI (UK)
● National Agricultural Library (US)
Each organization maintains a thesaurus of terms and concepts related to
agriculture -- concepts like rice, ricefield aquaculture, and plant pests.
![Page 4: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/4.jpg)
Separate thesauri, separate databases
Create GACS as a glue linking them together
![Page 5: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/5.jpg)
Global Agricultural Concept Scheme (GACS)
Agreement in October 2013 to conduct a feasibility study
1. To improve the semantic interoperability of thesauri
maintained by FAO, CABI, and NAL.
2. To identify and provide core concepts broadly
supported across the three thesauri.
3. To achieve efficiencies of scale by maintaining the core
concepts in cooperation.
![Page 6: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/6.jpg)
Consultants
Osma Suominen (Finland)
Tom Baker (Germany)
![Page 7: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/7.jpg)
Creating GACS
Phase One: Analysis of Thesauri
![Page 8: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/8.jpg)
AGROVOC CAB Thesaurus NAL Thesaurus
140,000 concepts, >1.4M terms
32,000 concepts, >1.2M terms
53,000 concepts, >200k terms
English, Spanish, Portuguese, German, Czech, Persian, Polish, Hindi, French, Italian, Russian, Japanese, Hungarian, Chinese, Slovak, Thai, Lao, Turkish, Korean, Arabic, Telugu ...
English, Spanish, Portuguese, Dutch + many languages with lower coverage
English, Spanish
All thesauri represented using SKOS
![Page 9: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/9.jpg)
Overlap estimate
Obtained via automatic mappings created using AgreementMakerLight
![Page 10: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/10.jpg)
Long tail distribution (in AGRIS)
10,000 concepts cover nearly 99% of occurrences in metadata
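A minimal sketch of how such a coverage figure could be computed, assuming a table of usage counts per concept is available. The function name, the concept IDs, and the counts are all hypothetical toy values, not AGRIS data.

```python
from collections import Counter


def coverage_of_top_n(usage_counts, n):
    """Fraction of all concept occurrences covered by the n most-used concepts."""
    total = sum(usage_counts.values())
    if total == 0:
        return 0.0
    top = sorted(usage_counts.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical toy data: concept ID -> number of records indexed with it.
usage = Counter({"c1": 500, "c2": 300, "c3": 150, "c4": 40, "c5": 10})

# The two most-used concepts account for 800 of 1,000 occurrences.
print(coverage_of_top_n(usage, 2))
```

With real indexing data, the same calculation for n = 10,000 would give the kind of figure quoted on the slide.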
![Page 11: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/11.jpg)
Requirements and Wishes
1. An integrated view and bridge of existing thesauri
2. Reuses thesaurus development work, incl. translations
3. Compatible with existing databases
4. Based on RDF technologies: URIs, SKOS etc.
5. Available as Linked Open Data
Currently building GACS Beta, a proof-of-concept
implementation attempting to fulfill most requirements
![Page 12: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/12.jpg)
Creating GACS
Phase Two: Proof of Concept
![Page 13: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/13.jpg)
Selection of top 10,000 concepts
Each partner organization provided
the 10,000 concepts most frequently
used in their respective databases.
These lists of concepts were
modified as follows:
● added all countries (from
AGROVOC)
● added organisms hierarchy all
the way to the top
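The selection step above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual tooling: the function name, the `always_include` parameter (standing in for the list of countries), and the toy data are hypothetical, and the ancestor closure is applied to whatever `broader` map is passed in, whereas the slides apply it to the organisms hierarchy.

```python
def select_core(usage_counts, broader, n, always_include=()):
    """Pick the n most-used concepts, then add every ancestor of each one.

    usage_counts: concept -> number of uses in a database
    broader: concept -> list of direct broader (parent) concepts
    always_include: concepts added regardless of usage (e.g. countries)
    """
    # Rank by descending usage; ties are broken alphabetically for determinism.
    ranked = sorted(usage_counts, key=lambda c: (-usage_counts[c], c))
    selected = set(ranked[:n]) | set(always_include)

    # Walk up the hierarchy until no new ancestors are found.
    stack = list(selected)
    while stack:
        concept = stack.pop()
        for parent in broader.get(concept, ()):
            if parent not in selected:
                selected.add(parent)
                stack.append(parent)
    return selected


# Hypothetical toy data.
usage = {"Oryza sativa": 90, "soil": 50, "Oryza glutinosa": 1}
broader = {
    "Oryza sativa": ["Oryza"],
    "Oryza": ["Poaceae"],
    "Poaceae": ["organisms"],
}
core = select_core(usage, broader, n=1, always_include=["Finland"])
print(sorted(core))
```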
![Page 14: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/14.jpg)
Automated mappings
Created using AgreementMakerLight software
between the full thesauri, for completeness
AgreementMakerLight was top performer at
OAEI 2014 ontology mapping competition!
![Page 15: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/15.jpg)
Human evaluation of mappings
Created Google Docs spreadsheets using the lists of selected concepts and
the auto-generated mappings. Three sheets with circa 10,700 rows each.
Mappings manually evaluated by
staff of partner organizations.
Evaluated 60 to 150 rows/hour,
total evaluation time over 300
hours so far.
Currently projected to take
500-600 hours for GACS Beta.
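As a rough cross-check of these figures, the sketch below estimates the time for one full pass over the three sheets at the reported evaluation rates. It assumes each row is evaluated exactly once; the slides do not say how the 500-600 hour projection was derived.

```python
SHEETS = 3
ROWS_PER_SHEET = 10_700          # "circa 10,700 rows each"
RATE_SLOW, RATE_FAST = 60, 150   # rows evaluated per hour

total_rows = SHEETS * ROWS_PER_SHEET

# Assumption: a single pass over every row, with no re-evaluation.
fastest_hours = total_rows / RATE_FAST
slowest_hours = total_rows / RATE_SLOW

print(f"{total_rows} rows: {fastest_hours:.0f} to {slowest_hours:.0f} hours")
```

Under that assumption the range is about 214 to 535 hours, so the projection sits near the slow end of the observed rates.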
![Page 16: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/16.jpg)
Forming GACS concepts
by merging the source concepts and aggregating their information
rice
UF paddy
UF paddy rice
cereals
UF feed cereals
UF small grain cereals (grain)
Oryza sativa
UF Oryza glutinosa
UF Oryza indica
UF Oryza japonica
UF Oryza sativa … (subsp, var etc.)
Oryza
UF Padia
UF rice (plant)
agrovoc:c_5435, cabt:82917, nalt:56271 (exactMatch)
agrovoc:c_5438, cabt:82935, nalt:56277 (exactMatch)
agrovoc:c_1474, cabt:26247 (exactMatch)
agrovoc:c_6599, cabt:101613, nalt:56293 (exactMatch)
(actually we use SKOS, not traditional thesaurus tags)
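One way to picture the merging step is as grouping source concepts that are connected by exactMatch links. The sketch below does this with a small union-find structure. It is an illustration only, not the GACS build code: the function name is hypothetical, the pairwise links are an assumption (the slide shows only the resulting groups), and aggregating labels and other information is left out.

```python
def merge_exact_matches(pairs):
    """Group concept IDs that are linked, directly or transitively, by exactMatch."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for concept in list(parent):
        groups.setdefault(find(concept), set()).add(concept)
    return sorted(sorted(group) for group in groups.values())


# IDs taken from the slide; the pairing into links is assumed.
exact_matches = [
    ("agrovoc:c_5435", "cabt:82917"),
    ("cabt:82917", "nalt:56271"),
    ("agrovoc:c_1474", "cabt:26247"),
]
for group in merge_exact_matches(exact_matches):
    print(group)
```

Each resulting group would become one merged concept that collects the labels of its members.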
![Page 17: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/17.jpg)
Size of GACS
GACS Beta will have around 14,000 of the most used concepts
![Page 18: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/18.jpg)
Quality evaluation
Using the qSKOS and Skosify tools that can find and correct problems in SKOS
vocabularies [1], we can detect
● missing, invalid or overlapping concept labels
● anomalies in concept hierarchy, e.g. cycles
● ...and many other kinds of problems.
Many problems are expected due to merging of concepts within GACS, but
most should be automatically corrected.
[1] Osma Suominen and Christian Mader: Assessing and Improving the
Quality of SKOS Vocabularies. JoDS, 3(1) 2014.
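To illustrate one of the checks mentioned above, here is a minimal sketch of cycle detection in a broader/narrower hierarchy using a depth-first search. It is plain Python written for illustration and does not use or reproduce the qSKOS or Skosify APIs; the function name and the toy data are hypothetical.

```python
def find_hierarchy_cycle(broader):
    """Return one cycle in the broader relation as a list of concepts, or None.

    broader: concept -> list of direct broader (parent) concepts
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in broader.get(node, ()):
            status = state.get(parent, UNVISITED)
            if status == IN_PROGRESS:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if status == UNVISITED:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in list(broader):
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Hypothetical toy hierarchies.
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
acyclic = {"rice": ["cereals"], "cereals": ["plant products"]}
print(find_hierarchy_cycle(cyclic))
print(find_hierarchy_cycle(acyclic))
```

The recursive form is fine for small examples; a very deep hierarchy would call for an iterative traversal instead.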
![Page 19: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/19.jpg)
Demo of GACS Alpha in Skosmos
http://bit.ly/1Gjf5jl
![Page 20: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/20.jpg)
Additional mapping rounds
Need to perform 2-3 more
smaller mapping rounds
in order to ensure that
all necessary concepts
have been fully mapped
between all source thesauri
![Page 21: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/21.jpg)
Lessons already learned
● It is hard to sustain focus on mapping beyond circa five hours per day.
● Mapping reveals issues with both the source and target thesauri -- areas
for improvement, or errors, fixable in collaboration.
● Starting with the 10,000 most-used concepts shines a light on parts of
thesauri that may long have lacked attention.
● Starting small, with a core, avoids the potential stress of over-committing
resources.
● Mapping provides an incentive to adopt open-data technologies that have
proven beneficial in other areas.
![Page 22: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/22.jpg)
Challenges
![Page 23: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/23.jpg)
Differences in modeling
Q: Are taxonomic organism names (e.g. ‘Bos taurus’)
different concepts than the common names (‘cattle’)?
● sometimes there is no 1:1 match
and/or context of use is different
● the source thesauri all have different policies
No final answer yet...
![Page 24: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/24.jpg)
Lumps
clusters of concepts mapped one-to-several, several-to-one, or in spirals
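A sketch of how such lumps might be flagged automatically: treat the mappings as an undirected graph, collect connected clusters, and report any cluster that contains more than one concept from the same thesaurus. The function name and the concept IDs are hypothetical, and reading the source thesaurus from the prefix before the colon is an assumption about how IDs are written.

```python
from collections import defaultdict, deque


def find_lumps(pairs):
    """Return mapping clusters that hold several concepts from one thesaurus."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, lumps = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects one connected cluster.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # A clean cluster has at most one concept per source thesaurus.
        sources = [concept.split(":", 1)[0] for concept in component]
        if len(sources) != len(set(sources)):
            lumps.append(sorted(component))
    return lumps


# Hypothetical mappings: two AGROVOC concepts point at the same CAB concept.
mappings = [
    ("agrovoc:a1", "cabt:b1"),
    ("agrovoc:a2", "cabt:b1"),
    ("agrovoc:a3", "nalt:n3"),
]
print(find_lumps(mappings))
```

Clusters flagged this way would still need human review to decide how to split or merge them.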
![Page 25: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/25.jpg)
Next steps
and future of GACS
![Page 26: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/26.jpg)
GACS system infrastructure
![Page 27: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/27.jpg)
Beyond GACS Beta?
Q: Can GACS replace existing agricultural thesauri?
● definitely not with GACS Beta due to smaller scope/size
● a future GACS may be an alternative for some
scenarios, but not all uses of existing thesauri because
o they cover areas beyond agriculture
o existing systems and processes (publication,
automatic indexing…) depend on current thesauri
In future, more partners are expected and the scope of GACS can be adjusted.
![Page 28: Turning three thesauri into a Global Agricultural Concept Scheme](https://reader034.vdocuments.mx/reader034/viewer/2022051617/55a70e701a28ab10628b4663/html5/thumbnails/28.jpg)
Thank you
Reports available on the FAO AIMS site: http://aims.fao.org/community/agrovoc/blogs/phase-one-gacs-approved-read-reports
GACS Alpha: http://tester-os-kktest.lib.helsinki.fi/gacsdemo/en/
Slides prepared by Osma Suominen and Tom Baker
@cydparr