Extracting Dewey Decimal Classifications from Dublin Core Metadata Records With the DISTIL Project: Preliminary Findings and Observations
Michael Khoo (1), Douglas Tudhope (2), Ceri Binding (2)
(1) Drexel University, (2) University of Glamorgan
NKOS Workshop/TPDL 2012, Paphos, Cyprus
DISTIL (Document Indexing & Semantic Tagging Interface for Libraries)
• Setting
  • Small(ish)-scale, DC, educational DLs
  • Large-scale information infrastructures
• Aim: Achieve efficient federated search and discovery across heterogeneous DLs
• Focus: Humanities and social sciences
• Funding: Digging Into Data Challenge
National Science Digital Library
Drexel
U. Manchester
U. Glamorgan
Stage 1: Harvesting
Some metadata is exposed – other metadata is hidden
Building the harvest requires some communication and negotiation with the original metadata curators
Stage 1: Harvesting - IPL
IPL, LII
• 1990s: Separate organizations, homebrewed metadata & SQL databases
• 2008: Merge > DC
• 2012: Dublin Core, Fedora database with multiple datastreams
exposed
hidden
Stage 1: Harvesting - Intute
Intute stores metadata for each resource in unrelated tables
• One database contains the main record
• Additional tables contain discipline-specific metadata that supports different focused search and browsing views on the collections (e.g. some collections indexed with specific controlled vocabularies)
• ‘General’ metadata: normally exposed
• ‘Specific’ metadata: hidden
Stage 1: Harvesting - NSDL
exposed
hidden
‘normalized’ metadata
NSDL Pathway metadata
‘pre-normalized’ metadata
completely hidden
Stage 1: Harvesting - NSDL
Environmental science; teacher resource; professional development; teaching awards; Professional organization; Ecology, Forestry and Agriculture; Geoscience; Social Sciences; Education; Chemistry; Physics; Space Science
Educational theory and practice; Environmental science; Policy issues; Space science; Science; Earth science; Physical sciences; Chemistry; Biology; Education (General); Physics; Astronomy; Space sciences; Education; Ecology, Forestry and Agriculture; Geoscience; Social Sciences; History/Policy/Law; Space Science; Chemistry; Physics; Life Science; Technology
Biology; Physics; Education; Life Science; Chemistry
Observation
• Easy in theory
• In practice, organizational histories and legacy factors complicate the process
• Each DL’s metadata requires:
  • Custom approaches in order to harvest and process
  • Access to specific people with specific knowledge
• Unknown unknowns …
Stage 2: Pre-processing
Select fields and remove tags …
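The field-selection step can be illustrated with a minimal Python sketch. It is not the project's actual code: the sample record, the choice of `title`, `subject` and `description` as the selected fields, and the helper name `extract_text_fields` are all assumptions made for illustration. Only the Dublin Core element namespace is standard.

```python
import html
import re
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace, in ElementTree's {uri}tag form.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Assumption: which fields are kept is not stated on the slide.
SELECTED_FIELDS = ("title", "subject", "description")

# Hypothetical record, invented for this example.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Teaching Evolution</dc:title>
  <dc:subject>Life Science</dc:subject>
  <dc:subject>Education</dc:subject>
  <dc:description>&lt;p&gt;Resources for science teachers.&lt;/p&gt;</dc:description>
  <dc:identifier>http://example.org/1</dc:identifier>
</record>"""

# Matches markup embedded in field text, such as <p> or </b>.
TAG_RE = re.compile(r"<[^>]+>")


def extract_text_fields(record_xml):
    """Return {field name: [cleaned values]} for the selected DC fields."""
    root = ET.fromstring(record_xml)
    selected = {}
    for name in SELECTED_FIELDS:
        values = []
        for element in root.iter(DC_NS + name):
            # Decode any entities that survive XML parsing.
            text = html.unescape(element.text or "")
            # Replace embedded tags with spaces, then collapse whitespace.
            text = TAG_RE.sub(" ", text)
            values.append(" ".join(text.split()))
        selected[name] = values
    return selected


fields = extract_text_fields(SAMPLE_RECORD)
print(fields)
```

Fields outside `SELECTED_FIELDS` (here `dc:identifier`) are dropped, and repeated fields such as `dc:subject` are kept as a list so that later frequency counting sees every value.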
Stage 2: Pre-processing
Frequency counts
• Sum (total occurrences) = 81
• Mean = 1.6
• Std Dev = 1.7
• Cut off (Mean + Std Dev) = 3.3
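The cut-off rule on this slide (mean plus one standard deviation of the term frequency counts) can be sketched in Python as follows. The term list is hypothetical and does not reproduce the slide's figures. Two further points are assumptions: the slide does not say whether the population or the sample standard deviation was used (the sketch uses the population form), nor whether terms above the cut-off are kept or discarded (the sketch simply returns them).

```python
from collections import Counter
from statistics import mean, pstdev


def frequency_cutoff(terms):
    """Count term occurrences and apply a mean + standard deviation cut-off.

    Returns (counts, cutoff, terms whose count exceeds the cut-off).
    """
    counts = Counter(terms)
    values = list(counts.values())
    # Assumption: population standard deviation; the slide does not specify.
    cutoff = mean(values) + pstdev(values)
    above = {term: n for term, n in counts.items() if n > cutoff}
    return counts, cutoff, above


# Hypothetical terms, invented for this example.
sample_terms = ["physics"] * 4 + ["chemistry"] * 2 + ["education", "biology"]
counts, cutoff, above = frequency_cutoff(sample_terms)
print(sum(counts.values()), round(cutoff, 2), above)
```

With so few distinct terms the choice between population and sample standard deviation shifts the cut-off noticeably; with larger vocabularies the difference shrinks.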
Stage 2: Pre-processing
Noun phrases
Frantzi, K., Ananiadou, S. and Mima, H. (2000) Automatic recognition of multi-word terms. International Journal of Digital Libraries 3(2), pp. 117-132.
http://www.nactem.ac.uk/software/termine/
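The cited paper describes the C-value measure for ranking multi-word term candidates. The sketch below is a simplified Python illustration of that measure only: it omits the linguistic filter, the frequency threshold and the NC-value context weighting, and it does not show how DISTIL configured or called TerMine, which the slides do not state. The candidate terms and their frequencies are hypothetical.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freqs):
    """Compute a simplified C-value for each candidate term.

    `freqs` maps a candidate (tuple of words) to its corpus frequency.
    A candidate nested in longer candidates has the mean frequency of
    those longer candidates subtracted before weighting by log2(length).
    """
    scores = {}
    for a, f_a in freqs.items():
        containing = [
            f_b for b, f_b in freqs.items()
            if len(b) > len(a) and is_nested(a, b)
        ]
        if containing:
            adjusted = f_a - sum(containing) / len(containing)
        else:
            adjusted = f_a
        # log2 of the length means single-word candidates score zero.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical candidates and frequencies, invented for this example.
candidate_freqs = {
    ("space", "science"): 5,
    ("space", "science", "teachers"): 2,
    ("teacher", "resources"): 3,
}
scores = c_values(candidate_freqs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

The length weighting favours longer phrases, while the nesting adjustment discounts a short phrase that mostly appears inside longer candidates.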
Stage 2: Pre-processing
National Science Teachers Association; Space science; Space sciences
teacher programs; NSTA member; teacher resources; teaching evolution; educational theory; environmental science; earth science; physical science; life science
Summary
• Work is complex but do-able (so far)
• Many subsidiary steps
• Harvesting work has a significant organizational knowledge dimension, and requires organizational communication*
• Suggests a need for organizational models, processes, and best practices to account for and address the general nature of these phenomena
Khoo, M., Hall, C. (2012). Rethinking organizational distance: Networks of practice, legacy issues, and metadata work in a digital library project. Accepted, Information and Organization.
Lagoze, C., Krafft, D. B., Cornwell, T., Dushay, N., Eckstrom, D., & Saylor, J. (2006). Metadata aggregation and ‘automated digital libraries’: a retrospective on the NSDL experience. 6th ACM-IEEE Joint Conference on Digital Libraries (JCDL), June 11–15, 2006, Chapel Hill, North Carolina, USA, pp. 230-239.
Lagoze, C., & Patzke, K. (2011). A research agenda for data curation in cyberinfrastructure. Paper presented at the 11th ACM-IEEE Joint Conference on Digital Libraries (JCDL), June 13-17, 2011, Ottawa, Canada.
Thank you – and …
Questions?