If you tag it, will they come? Metadata quality and repository management
DESCRIPTION
Presentation to Metadata Perspectives 2009, a conference held in Vienna, Austria in November 2009.
When we build collections of scholarly works, learning materials, or other educational "stuff", we want people to be able to find it. This raises a number of problems, including ensuring that resources are tagged with adequate metadata. In 2004 a pioneering paper on this issue noted: "At its best, “accurate, consistent, sufficient, and thus reliable” (Greenberg & Robertson, 2002) metadata is a powerful tool that enables the user to discover and retrieve relevant materials quickly and easily and to assess whether they may be suitable for reuse. At worst, poor quality metadata can mean that a resource is essentially invisible within the repository and remains unused." (Currier et al, 2004).
Have the five years since the above-quoted paper was published borne out its prediction: that simply expecting resource authors to create their own metadata at upload would lead to metadata of insufficient quality? Have repository managers been able to persuade funders that including professional metadata augmentation is worth the money? What has been the impact of recent Web developments allowing easier exposure, searching and sharing of resources? How is metadata being treated within the emerging domain of open educational resources? And what does all this mean for repository managers wanting to increase the discoverability of their resources, and to implement workflows for creation of good quality metadata?
Currier, S. et al (2004) Quality assurance for digital learning object repositories: issues for the metadata creation process, ALT-J, Research in Learning Technology, Vol. 12, No. 1, March 2004. http://repository.alt.ac.uk/616/1/ALT_J_Vol12_No1_2004_Quality%20assurance%20for%20digital%20.pdf
Greenberg, J. & Robertson, W. (2003) Semantic web construction: an inquiry of authors’ views on collaborative metadata generation, Proceedings of the International Conference on Dublin Core and Metadata for e-Communities 2002, 45–52. http://dcpapers.dublincore.org/ojs/pubs/article/viewArticle/693
TRANSCRIPT
If You Tag It, Will They Come? Metadata Quality and Repository Management
Presentation by Sarah Currier
Perspectives on Metadata Conference
University of Vienna, Austria, 12-13 November 2009
Who is here?
• How many librarians / information management people?
• How many IT / systems management people?
• How many software development people?
• How many from libraries?
• How many from museums?
• How many from archives?
• How many from educational support (e.g. repositories of learning & teaching resources, educational development)?
• Others?
If you tag it? What is metadata?
For the purposes of this discussion, metadata is structured data about data.
This includes metadata structured via recognised standards, local specifications and social tagging systems.
.. will they come? Who are they?
Whose requirements are you trying to meet?
What is your business case?
What is your business model?
.. will they come? Who are they?
Whose requirements are you trying to meet?
End users? Academics? Students?
Funders? Administrators?
A subject community? Some other community?
The whole wide world?
Who are your users and what are their requirements?
.. will they come? Who are they?
What is your business case?
Enabling academics to share research with subject community?
Enhancing the reputation of your institution?
Saving costs across an organisation, consortium, country?
Archiving resources for the future?
What is your business case? Have you articulated it?
.. will they come? Who are they?
What is your business model?
Consortium of institutions sharing costs?
Nationally funded service?
Institutional service?
Subject community with member organisations paying a subscription?
What is your business model?
What is “metadata quality”?
• Technical quality: adherence to local or international metadata standards, specifications and application profiles.
– I’m going to assume you know something about this.
• Semantic quality: proper use of controlled vocabularies and semantic standards.
– You may not understand everything about this (who does?), but it’s a big topic for another presentation.
• Value quality: populating metadata fields appropriately for describing the resource and its relationships, for the benefit of the user community and other stakeholders: “accurate, consistent, sufficient, and thus reliable” (Greenberg & Robertson, 2002).
– This is the quality of the values that populate the metadata fields.
Why worry about it?
"At its best, “accurate, consistent, sufficient, and thus reliable” (Greenberg & Robertson, 2002) metadata is a powerful tool that enables the user to discover and retrieve relevant materials quickly and easily and to assess whether they may be suitable for reuse. At worst, poor quality metadata can mean that a resource is essentially invisible within the repository and remains unused."
(Currier et al, 2004)
Is this still true?
Do we even need metadata?
Now we have Google ...
For some use cases, in order to maximise resource discovery and use, you need to focus on search engine optimisation, and exposure of resources to user communities via social media. Looking ahead to the Semantic Web and linked data probably won’t hurt either.
First things first
What is the problem to which the repository is a solution? And who identifies this as a problem?
What will be the measure of success for your repository?
Margaryan, Milligan and Douglas, 2007
Using “Good Intentions”
• JISC-funded Good Intentions project developed a template to gather different existing business models for sharing teaching and learning (t&l) resources, and to evaluate their affordances and successes
• Created a matrix to map different elements of business cases to different business models
– Too big to show it all here: worth following up, but here are examples
McGill et al (2008)
Business model template
Finance models
Service models
Supplier/consumer models
Issues affecting models
Impact of business cases
Impact levels used in the matrix: Significant impact; Some impact; Possible with right conditions; No impact
Business models (columns): Open; CoP; Subject-based; Institutional; National; Informal
General benefits to global community (rows):
• Supporting subject-discipline communities to share
• Encourages innovation and experimentation
• Shares expertise and resources between developed and developing countries
• Supports re-use and re-purposing
• Supports community input to metadata through tagging, notes, reviews
• Supports effective retrieval through professionally created metadata
• Ensures trust through appropriate licensing
[Matrix cell values not preserved in this transcript]
Business cases - Global
Case Subject Open
Supporting subject-based communities to share
Encourages innovation and experimentation
Shares expertise and resources between developed and developing countries
Supporting re-use and re-purposing
Supporting continued development of standards and interoperability
Supporting continued development of tools for sharing and exchange
Supporting sharing and reuse of individual assets
Helps develop critical mass of materials in particular subject areas
Supporting ease of access through search engines such as Google
Business cases - National
Case Subject Open
Cost efficiencies
Decrease in duplication
Supports cross-institutional sharing
Provides access to non-educational bodies such as employers, professional bodies, trade unions, etc
Supports a broad vision of sharing across the country
Promotes the concept of lifelong learning
Supports shared curricula
Supports discovery of most used/highest quality resources
Supports the notion that educational institutions should leverage taxpayers’ money by allowing free sharing and reuse of resources
Mitigates the cost of keeping resources closed
Mitigates the risk of doing nothing in a rapidly changing environment
Supports sustained long-term sharing
Business cases - Institutional
Case Subject Open
Increased transparency and quality of learning materials
Encourages high quality learning and teaching resources
Supports modular course development
Maintaining and building institution’s reputation - globally
Attracting new staff and students to institutions – recruitment tool for students and prospective employers
Shares expertise efficiently within institutions
Supports the altruistic notion that sharing knowledge is in line with academic traditions and a good thing to do
Likely to encourage review of curriculum, pedagogy and assessment
Enhancing connections with external stakeholders by making resources visible
Business cases - Teachers
Case Subject Open
Increased personal recognition
Supports sharing of knowledge and teaching practice
Encourages improvement in teaching practice
Supports immediate one-off instances of sharing
Supports attribution
Encourages multi-disciplinary collaboration and sharing
Supports CPD and offers evidence of this
Business cases - Learners
Case Subject Open
Easy and free access to learning material for learners
Increased access options for students enrolled on courses (particularly remote students)
Easily accessed through student-owned technologies
Increased access for non-traditional learners (widening participation)
Likely to encourage self-regulated and independent learning
Likely to increase demand for flexible learning opportunities
Likely to increase demand for assessment and recognition of competencies gained outside formal learning settings
Likely to encourage peer support, mentorship and ambassadorial programmes
What are the use cases for metadata?
• Resource discovery
• Resource selection
• Resource aggregation and manipulation
• Intellectual property rights
• Digital preservation
• Marketing
• Accessibility
• Interoperability
• Reputation (of individuals and organisations)
• Any others?
Developing your application profile
Once you have your requirements, based on your business case, business model and use cases, you can decide what metadata fields and vocabularies are necessary (if any) to meet these requirements.
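As a rough illustration of what such a decision can look like once it is written down, here is a minimal sketch of an application profile as a data structure with a conformance check. The three fields, the `EDUCATIONAL_LEVELS` vocabulary and the `check_record` helper are all hypothetical and are not taken from any particular standard or repository.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabulary, for illustration only.
EDUCATIONAL_LEVELS = frozenset({"school", "further education", "higher education"})


@dataclass(frozen=True)
class Field:
    """One element of an application profile."""
    name: str
    required: bool
    vocabulary: Optional[frozenset] = None  # None means free text


# Assumed profile: which fields exist, which are mandatory, which use a vocabulary.
PROFILE = (
    Field("title", required=True),
    Field("rights", required=True),
    Field("educational_level", required=False, vocabulary=EDUCATIONAL_LEVELS),
)


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field in PROFILE:
        value = record.get(field.name, "").strip()
        if not value:
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if field.vocabulary is not None and value not in field.vocabulary:
            problems.append(f"{field.name}: '{value}' is not in the vocabulary")
    return problems
```

Keeping the profile as data rather than scattering the rules through form code means the same definition could, in principle, drive both the deposit form and later quality checks.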
How is metadata created?
Broadly:
• Manual generation by humans, or:
• Automatic generation
How is metadata created?
• Manual generation by humans
– Created by resource authors
– Added by resource depositors
  Note: Rich and useful, but requires quality checks, and must be minimal to encourage deposit
– Created, checked, augmented by professionals, e.g.:
  • Cataloguers
  • Subject experts
  • Designated IPR gatekeepers
  Note: Expensive! Must be justified by business case and minimised by use of automatic metadata generation, search engine exposure and community metadata
– Enriched by resource users, e.g.:
  • Additional description, comments, annotations, descriptions of usage
  • Corrections
  • Enrichment (additional subject description etc.)
  • Social tagging
  • Ratings and recommendations
  Note: Can be useful for many types of resource collection, for description, community building, and supporting greater exposure and use of resources
How is metadata created?
• Automatic generation, e.g.:
– Extraction from resource files
– Inferred from resource relationships
– Creation according to system settings
– Generation of default values
– Extraction via text mining
– Other ways?
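Two of the routes listed above, extraction from resource files and generation of default values, can be sketched with nothing but the Python standard library. This is an illustrative sketch only: the `DEFAULTS` values and the `generate_metadata` function are hypothetical, and the title guessed from the file name would still need a human check.

```python
import mimetypes
import os
from datetime import datetime, timezone

# Hypothetical repository-wide defaults ("generation of default values").
DEFAULTS = {"language": "en", "publisher": "Example University"}


def generate_metadata(path: str, defaults: dict = DEFAULTS) -> dict:
    """Build a starter record from the file itself plus system defaults."""
    stat = os.stat(path)
    filename = os.path.basename(path)
    stem, _ext = os.path.splitext(filename)
    mime_type, _encoding = mimetypes.guess_type(filename)

    record = dict(defaults)  # start from the defaults, then add extracted values
    record.update({
        # A crude title guess from the file name; a human should review it.
        "title": stem.replace("_", " ").replace("-", " ").strip(),
        "format": mime_type or "application/octet-stream",
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).date().isoformat(),
    })
    return record
```

Fields such as format, size and date are cheap to extract reliably, which leaves human effort for the fields that need judgement.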
When is metadata created?
• During resource creation / editing
• During resource upload
• During metadata creation workflow
• Via post-upload metadata harvesting / combining / augmentation / checking / “cleaning”
• During or after resource use
So, basically:
• At any and many points during the resource lifecycle.
Thinking outside the repository box
“We have used the term 'service' to describe the various infrastructures that exist to support sharing, but must stress that this includes a wide range of activities including those supported by formal repositories and/or open social software services, as well as informal mechanisms within or across institutions, between lecturers and/or students. This term [...] was deliberately chosen to highlight the wide range of activities, mechanisms and support that are offered to encourage and facilitate sharing, including, but not limited to static storage of content.”
McGill, Currier, Duncan & Douglas, 2008
Thinking outside the repository box
Implications:
• Think about the places and ways your intended community works, socialises, shares and communicates
• Think about interoperability
– What if you need to migrate your content in 5 years?
– What metadata specs and standards do you need?
– Expose your content and services via open APIs.
• Think about a service-based approach (Web services, that is): what components do you need to interact with?
– Facebook? Twitter? Delicious or Diigo tagging? Widgets? RSS feeds!
– Student and staff records?
– Learning management systems? Library management systems? Other campus / organisational systems?
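RSS feeds are one of the simplest ways to expose repository content to other services. The sketch below builds a minimal RSS 2.0 document with the Python standard library; the `build_rss` function and the example.org URLs in the usage note are hypothetical, and a real feed would normally carry more elements (dates, descriptions, identifiers).

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, description, items):
    """Serialise (title, url) pairs as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the three required channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

For example, `build_rss("Recent deposits", "https://repository.example.org/", "Latest resources", [("Intro to metadata", "https://repository.example.org/1")])` returns a feed string that a departmental web page or feed reader could consume.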
Thinking about communities (1 of 5)
If you build it, will they come?
“[...] the pedagogical, social, and organisational aspects of these communities have not been at the forefront in the design and development of [learning object repositories]. Research has consistently demonstrated that the most substantial barriers in uptake of technology are rooted in these factors”
Margaryan, Milligan and Douglas, 2007
Thinking about communities (2 of 5)
Community dimensions to think about
(1) Purpose: the shared goal/interest of the community; the reason why the community was formed in the first place
(2) Composition: the number and types of (sub-)communities to be supported
(3) Dialogue: modes of participation and communication (online, face-to-face, or mixed) adopted by the community
(4) Roles and responsibilities: of community members
(5) Coherence: whether the community is close-knit or loosely confederated/transient
(6) Context: the broader ecology within which the community exists (for example, professional bodies; governments; implicit and explicit rules that govern the functioning of community; ground rules of conduct; rewards and incentives mechanisms; etc.)
(7) Pedagogy: teaching and learning approaches used in the community (for example, problem-based learning, collaborative learning, etc.)
Thinking about communities (3 of 5)
Repository dimensions to think about
(1) Purpose: including t&l repositories created to support professional development of teachers, or for the exchange of specific resource formats, such as sound files, learning designs, or student assignments
(2) Subject discipline: including t&l repositories created to support mono-disciplinary or multidisciplinary communities
(3) Scope: including t&l repositories supporting departmental, institutional, regional, national, or international communities
(4) Sector: for example school, higher education, further education, hobby-based learning, work-based, or lifelong learning
(5) Contributors: such as teachers, students, publishers, institutions, funded projects
(6) Business model: concerning the business, trading, and management framework underpinning the repository
Thinking about communities (4 of 5)
Thinking about engaging communities
• Iterative, agile design: be ready to change tack, make mistakes
• Multi-disciplinary team from the start:
– Educational development, library, staff development, learning services, technical services, academic and student representatives
• Engagement and support vital from line managers at departmental, school, faculty, institutional level: gives people permission to put time and effort into working with repository, sharing materials
• Talk to others doing the same thing (JISC CETIS Repositories Community, JISC-Repositories list, software user communities, international contacts)
• If you can, have a designated repository manager from the start.
Thinking about communities (5 of 5)
Thinking about engaging individuals
• How do they currently store, back up, share and discover resources?
• What pain points can you solve first off, to get them engaged?
• What’s juicy for them? E.g. providing an RSS feed of their own publications that can appear on their personal or departmental web page.
• Be aware of time & other pressures: sometimes engaging with new technology/processes takes more time at the start; make sure it pays off for them fairly quickly re supporting their work and saving them time.
• Identify champions in user communities to mentor others
• Mentor and support users by choosing a specific task they can easily achieve, or a specific problem they can solve with your repository
Thinking about software
Affordances to support metadata quality:
• Tried and tested support for appropriate metadata standards, and interface standards
– Web services, APIs
– OAI-PMH
– RSS / Atom / OPML
– SWORD for easy or bulk deposit
– Vocabulary interchange (SKOS, Zthes, IMS VDEX).
• Automatic metadata generation MUST be used to create as much metadata as possible at the appropriate points.
– Text mining for term extraction;
– Use of templates to populate with default values;
– Extraction of user data for authorship and IPR;
– Extraction of course data to populate educational level, educational context, subject metadata ... etc.!
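To make the OAI-PMH item above a little more concrete, here is a minimal sketch of how a harvester might form a `ListRecords` request and read Dublin Core titles from the response. The base URL in the usage note is hypothetical, and `SAMPLE` is a hand-written, heavily trimmed response included only so the parser can be exercised without a network call; real responses carry headers, dates and resumption tokens that are ignored here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace URI for Dublin Core elements, as used in oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def titles(response_xml: str) -> list:
    """Extract every dc:title from an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


# Hand-written, trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Introduction to metadata</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""
```

Calling `list_records_url("https://repository.example.org/oai")` gives the URL a harvester would fetch, and `titles(SAMPLE)` shows the kind of value that only survives harvesting if the metadata was well formed in the first place.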
Thinking about software
Affordances to support metadata quality:
• Workflow capability:
– To support different kinds of metadata being created at appropriate times by appropriate people or systems.
– To support publication of resources before all metadata is created.
• Metadata forms usability
– Technical aspects of metadata should be invisible
– Drop-down menus, text-completion, vocabulary term suggestion
– Spell-check! Some browsers do this: make sure they can use those browsers.
– Step-through wizard type approach can be helpful.
– Careful with default values though: research and experience show that users will simply leave the default selected.
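Text completion and vocabulary term suggestion, as listed above, can be sketched in a few lines. The `SUBJECTS` vocabulary and both helper functions are hypothetical; the fuzzy matching uses `difflib.get_close_matches` from the Python standard library, with a cutoff chosen arbitrarily for this example.

```python
import difflib

# Hypothetical controlled vocabulary for a subject field.
SUBJECTS = ["Social work", "Sociology", "Social policy", "Statistics"]


def complete(prefix: str, vocabulary=SUBJECTS) -> list:
    """Text completion: terms that start with what the user has typed so far."""
    p = prefix.strip().lower()
    return [t for t in vocabulary if p and t.lower().startswith(p)]


def suggest(entry: str, vocabulary=SUBJECTS) -> list:
    """Term suggestion: closest vocabulary terms to a possibly misspelt entry."""
    by_lower = {t.lower(): t for t in vocabulary}
    close = difflib.get_close_matches(entry.strip().lower(), list(by_lower), n=3, cutoff=0.8)
    return [by_lower[c] for c in close]
```

Offering the vocabulary's own spelling at entry time is usually cheaper than correcting free-text variants afterwards.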
Thinking about software
Affordances to support metadata enhancement:
• Using text mining to create / suggest metadata.
• Using tools for combining metadata from different sources:
– Other instances of the same resource;
– From related resources;
– Course information about where the resource was used;
– “Person” metadata about authors and other agents.
• Metadata “cleaning” tools: checking spelling, appropriate use of vocabularies, reducing duplication, etc.
• Registries for vocabularies, metadata elements and application profiles
– Can assist with ensuring your metadata is standardised, and mapped across different communities / languages etc.
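A very small example of the “cleaning” idea above: normalising keyword strings, removing duplicates and checking them against a vocabulary. The `clean_keywords` function is a hypothetical sketch, not a description of any of the tools referenced in this presentation.

```python
def clean_keywords(raw_keywords, vocabulary):
    """Normalise, de-duplicate and vocabulary-check a list of keyword strings.

    Returns (accepted, rejected): accepted terms use the vocabulary's own
    spelling; rejected terms are left for a human to review.
    """
    preferred = {term.casefold(): term for term in vocabulary}
    accepted, rejected, seen = [], [], set()
    for raw in raw_keywords:
        tidy = " ".join(raw.split())  # collapse runs of whitespace
        key = tidy.casefold()         # compare case-insensitively
        if not key or key in seen:
            continue                  # skip blanks and duplicates
        seen.add(key)
        if key in preferred:
            accepted.append(preferred[key])
        else:
            rejected.append(tidy)
    return accepted, rejected
```

Rejected terms are returned rather than silently dropped, since an out-of-vocabulary keyword may be a useful candidate for the vocabulary itself.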
Example of repository with metadata quality measures in place
IRISS Learning Exchange:
• Built on intraLibrary, using their open source SRU search tool
• Leeds Met Uni / others are adapting for their own use
• Social work education across Scotland (HE, now WBL/CPD and FE also)
• Started closed to members only, now open
• Professional metadata, high quality resources, but teacher sharing never took off.
http://www.iriss.org.uk/openlx/
Example of repository with metadata workflows easing sharing
EdShare (Southampton)
• Built on ePrints: first formal attempt to make ePrints a learning materials repository
• Covers all subjects at Southampton Uni, open to Web
• Worked closely from the start with academics
• Focussed on minimal metadata, maximal sharing and Web exposure (Morris, 2009)
• Problems with metadata quality? Early example: academic unable to create RSS Feed of own materials as couldn’t be distinguished from another academic of the same name!
http://www.edshare.soton.ac.uk/
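The same-name problem in the last bullet can be illustrated with a few made-up records. Everything here is hypothetical, including the `author_id` field; it is only meant to show why a name string alone cannot drive a per-person feed, not how EdShare addressed the issue.

```python
# Hypothetical records: two different academics who share a name.
RECORDS = [
    {"title": "Lecture notes on tort law", "author": "J. Smith", "author_id": "staff-0042"},
    {"title": "Organic chemistry lab guide", "author": "J. Smith", "author_id": "staff-0917"},
    {"title": "Contract law case studies", "author": "J. Smith", "author_id": "staff-0042"},
]


def feed_by_name(records, name):
    """Select items by author name string; namesakes are mixed together."""
    return [r["title"] for r in records if r["author"] == name]


def feed_by_id(records, author_id):
    """Select items by a unique author identifier; namesakes stay separate."""
    return [r["title"] for r in records if r["author_id"] == author_id]
```

Selecting by name returns all three items, including the chemistry guide that belongs to someone else, whereas selecting by the identifier `staff-0042` returns only the two law resources.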
Resources
Sarah Currier Consultancy: http://www.sarahcurrier.com/
JISC CETIS Repositories Domain: http://jisc.cetis.ac.uk/domain/metadata
JISC CETIS Repositories & Metadata list: http://www.jiscmail.ac.uk/CETIS-METADATA
Special thanks to Lou McGill and Charles Duncan for “Good Intentions” slides: http://www.loumcgill.co.uk/ and http://www.intrallect.com/
Additional thanks to Lorna M. Campbell, Phil Barker and R. John Robertson of JISC CETIS for the fabulous sessions at CETIS09 in Birmingham, UK immediately prior to this Vienna conference, also to the participants from the JISC OER Programme. These sessions have not yet been written up so cannot be referenced here, but there will be resources appearing on JISC CETIS website in due course. Here’s an initial taster: http://blogs.cetis.ac.uk/lmc/2009/11/13/orders-from-the-roundtable/
Automated metadata generation and enhancement:
FixRep Project, UKOLN: http://www.ukoln.ac.uk/projects/fixrep/
NaCTEM / Intute Project on enhancing metadata using text mining: http://www.nactem.ac.uk/intute/
JISC Automatic Metadata Generation study: http://www.intrallect.com/wiki/index.php/AMG-UC
JISC Metadata Generation Tools study: http://ie-repository.jisc.ac.uk/258/
For information on metadata augmentation, enhancement and “cleaning” post-creation/harvesting, see numerous publications by Diane Hillmann at Metadata Associates: http://managemetadata.org/otherPubs.php
For numerous publications on automatic metadata generation and enhancement in e-learning see publications of Erik Duval and his research group: https://lirias.kuleuven.be/items-by-author?author=Duval%2C+Erik%3B+U0016838
References
Currier, S. et al (2004) Quality assurance for digital learning object repositories: issues for the metadata creation process in ALT-J Research in Learning Technology Vol. 12, No. 1, March 2004. Available: http://repository.alt.ac.uk/616/1/ALT_J_Vol12_No1_2004_Quality%20assurance%20for%20digital%20.pdf
Greenberg, J. & Robertson, W. (2003) Semantic web construction: an inquiry of authors’ views on collaborative metadata generation, Proceedings of the International Conference on Dublin Core and Metadata for e-Communities 2002, 45–52. Available: http://dcpapers.dublincore.org/ojs/pubs/article/viewArticle/693
Margaryan, A., Milligan, C. and Douglas, P. (2007) CD-LOR Deliverable 9: Structured Guidelines for Setting up Learning Object Repositories. JISC. Available: http://www.academy.gcal.ac.uk/cd-lor/documents/CD-LOR_Structured_Guidelines_v1p0_000.pdf
McGill, L., Currier, S., Duncan, C., Douglas, P. (2008) Good Intentions: Improving the Evidence Base in Support of Sharing Learning Materials. JISC. Available: http://ie-repository.jisc.ac.uk/265/
Morris, D. (2009) Encouraging More Open Educational Resources with Southampton’s EdShare in Ariadne, Issue 59. Available: http://www.ariadne.ac.uk/issue59/morris/