Linked Open Data for Cultural Heritage
DESCRIPTION
This paper surveys the landscape of linked open data projects in cultural heritage, examining the work of groups from around the world. Traditionally, linked open data has been ranked using the five-star method proposed by Tim Berners-Lee. We found this ranking lacking when evaluating how cultural heritage groups not merely develop linked open datasets but also find ways to use linked data to augment user experience. Building on the five-star method, we developed a six-stage life cycle describing both dataset development and dataset usage. We use this framework to describe and evaluate fifteen linked open data projects in the realm of cultural heritage.
TRANSCRIPT
Linked Open Data Projects for Cultural Heritage:
Evolution of an Information Technology
Julia Marsden – Carolyn Li-Madeo
Jeff Edelstein – Noreen Whysel
Lola Galla – Alison Rhonemus
Cultural Heritage: Description & Access
Pratt SILS LIS 670 – Spring 2013
Prof. Cristina Pattuelli
WHAT IS LINKED OPEN DATA?
Linked Data provides a mechanism for representing databases (RDF) and a mechanism for querying those databases (SPARQL).*
Linked Open Data uses W3C Semantic Web standards to create relationships between previously isolated data silos.
Behind almost every website is a database, and although these sites are linkable, the information in their databases is left unconnected.
*From the New York Times’ OPEN blog
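To make the RDF/SPARQL pairing concrete, here is a minimal sketch in plain Python (no RDF library assumed): the graph is a set of (subject, predicate, object) tuples, and a single triple pattern with wildcards plays the role of a SPARQL variable. All URIs under http://example.org/ are hypothetical.

```python
from typing import Optional

# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace for illustration

graph: set[Triple] = {
    (EX + "person/A", EX + "wrote", EX + "letter/1"),
    (EX + "letter/1", EX + "heldBy", EX + "archive/X"),
    (EX + "person/B", EX + "wrote", EX + "letter/2"),
}


def match(g: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching one pattern; None behaves like a SPARQL variable."""
    return sorted(
        t for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly analogous to:
# SELECT ?letter WHERE { <http://example.org/person/A> <http://example.org/wrote> ?letter }
letters = [obj for (_, _, obj) in match(graph, s=EX + "person/A", p=EX + "wrote")]
print(letters)  # ['http://example.org/letter/1']
```

A real triple store indexes triples and joins several patterns at once; this sketch only shows what a single pattern means.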
REVIEW OF TERMINOLOGIES
RDF Triple
• Subject – Predicate – Object, typically identified by URIs
API
• An Application Programming Interface
• Allows software programs to interact with one another
URI
• Uniform Resource Identifier
• URLs and URNs are both kinds of URI
SPARQL Query
• SPARQL Protocol and RDF Query Language
• Query language for RDF databases
• Allows users to write unambiguous queries
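The SPARQL Protocol defines how a query travels to an endpoint over HTTP; one option is an HTTP GET carrying the URL-encoded query in a `query` parameter. The sketch below only builds such a request with Python's standard library and does not send it; the endpoint URL is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://sparql.example.org/query"  # hypothetical SPARQL endpoint

QUERY = """SELECT ?s ?label
WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which requires a real endpoint.
```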
METHODOLOGY
• Affiliation / Mission / Intended Audience
• Knowledge Organization / Data Models & Vocabulary
• Technology Platform
• Usability / Interface Design
• Discovery (search & navigation)
• Data Shareability (i.e., availability of an API)
• Sustainability (i.e., digital preservation, documentation, or available code)
• Project Leaders
• Funding Sources
• Level of Collaboration
• Analysis
• Star Rating (i.e., Tim Berners-Lee's five-star "coffee cup" scheme)
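The last criterion refers to Tim Berners-Lee's five-star scheme, in which each star presupposes the ones below it. A minimal sketch of that cumulative rule follows; the boolean field names are our own simplification of the published criteria, not an official checklist.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    # Field names are a hypothetical simplification of the five-star criteria.
    on_web_open_licence: bool     # 1 star: on the web under an open licence
    machine_readable: bool        # 2 stars: structured data (e.g., a table, not a scan)
    non_proprietary_format: bool  # 3 stars: e.g., CSV instead of a proprietary spreadsheet
    uses_uris: bool               # 4 stars: W3C open standards (RDF, SPARQL); URIs name things
    linked_to_other_data: bool    # 5 stars: links to other people's data for context


def star_rating(d: DatasetProfile) -> int:
    """Count stars cumulatively: stop at the first unmet criterion."""
    criteria = [d.on_web_open_licence, d.machine_readable, d.non_proprietary_format,
                d.uses_uris, d.linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An RDF dataset that uses URIs but is not yet linked to external datasets
print(star_rating(DatasetProfile(True, True, True, True, False)))  # 4
```

The early `break` encodes the cumulative design: a dataset that links to others but lacks an open licence still earns zero stars.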
Developing Datasets: Release one or more datasets in linked open format, expressed as RDF triples, that others may use. Projects: Library of Congress; Pan-Canadian Documentary Heritage Network
Linking Data: Cultural heritage institutions link their datasets to others (e.g., DBpedia, VIAF, GeoNames) to enhance discovery and reuse of their collections. Projects: Hungarian National Library; Civil War Data 150; Linking Lives; Bibliothèque nationale de France
Documenting Processes for Reuse: Explain linked open data and the ways cultural heritage professionals can use datasets. Projects: New York Times; Deutsche Nationalbibliothek
Developing User Interfaces: Institutional or collaborative projects use the datasets to develop applications, including interfaces, visualizations, and augmented reality. Projects: Agora; Pan-Canadian Documentary Heritage Network; Amsterdam Mobile City App; Linked Jazz
Promoting Reuse: Institutions go beyond creating their own test projects, encouraging users to develop innovative applications. Projects: Open Cultuur Data; EUScreen
Expanding the Definition of Cultural Heritage: Efforts from outside the cultural heritage framework, such as government agencies and international aid organizations, can strengthen societies and their cultural institutions. Project: Open Data for Resilience Initiative
LINKED DATA LIFE CYCLES
Stage 1. Developing Datasets
Pan-Canadian Documentary Heritage Network
• Formed in 2010; a highly collaborative effort across a broad spectrum of LAMs (libraries, archives, and museums)
• Pilot project results published July 2012:
  • RDF metadata
  • Detailed project report
  • Demonstration video, “Out of the Trenches”
• Project content submitted in various formats:
  • War songs (MARC records; BAnQ)
  • War posters (spreadsheets; McGill)
  • Newspaper articles, postcards, and wartime records (MODS XML; University of Alberta)
  • Portrait archives of CEF soldiers; WWI documents (spreadsheets; University of Calgary)
  • Archival material from the Saskatchewan War Experience Project (DC RDF; University of Saskatchewan)
• Use of external LOD datasets:
  • GeoNames, VIAF, LCSH, TGM, Rameau, LACSH
  • Metadata then mapped to ontologies (e.g., events, places, persons)
• Principal findings:
  • Good approach for resource integration and discovery
  • Considered “reuse” in terms of using element sets in multiple contexts (e.g., “role” as predicate or as object) and repurposing vocabularies
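Several partners submitted spreadsheets, which then had to be expressed as RDF and linked to external vocabularies. A minimal sketch of that step, assuming a hypothetical spreadsheet layout (the column names, base URI, and placeholder VIAF/GeoNames identifiers are invented for illustration), emits N-Triples using Dublin Core terms; it is not the network's actual mapping.

```python
import csv
import io

BASE = "http://example.org/poster/"  # hypothetical base URI for local records
DCT = "http://purl.org/dc/terms/"    # Dublin Core terms namespace

# Hypothetical spreadsheet; the VIAF and GeoNames URIs are placeholders, not real records.
SHEET = """id,title,creator_uri,place_uri
p1,Example war poster,http://viaf.org/viaf/0000000,http://sws.geonames.org/0000000/
"""


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks as N-Triples requires."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(row: dict[str, str]) -> list[str]:
    """Map one spreadsheet row to N-Triples lines."""
    subject = f"<{BASE}{row['id']}>"
    return [
        f'{subject} <{DCT}title> "{escape_literal(row["title"])}" .',
        f"{subject} <{DCT}creator> <{row['creator_uri']}> .",  # link out to an authority file
        f"{subject} <{DCT}spatial> <{row['place_uri']}> .",    # link out to a gazetteer
    ]


lines = [line for row in csv.DictReader(io.StringIO(SHEET)) for line in row_to_ntriples(row)]
print("\n".join(lines))
```

The linking step is simply using an external URI as the object of a triple, which is what turns an isolated record into linked data.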
developing datasets – linking data – documenting processes – developing user interfaces – promoting reuse – expanding definitions
LIBRARY OF CONGRESS
Screenshot callouts: Dereferenceable URI; Name Variants; Related Terms
Promotes existing Library of Congress resources as Linked Open Data web resources, uncovering and connecting related names and terms
Multiple formats are available for wider use
LC Classification numbers are related to each entry
Connects with and acknowledges other schemes
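Making a URI dereferenceable in several formats is commonly done through HTTP content negotiation: the client names the RDF serialization it wants in the Accept header. A minimal sketch that only builds the requests; the authority URI is a hypothetical placeholder rather than a real Library of Congress identifier, and whether a given service honors each media type is an assumption to check against its documentation.

```python
from urllib.request import Request

# Hypothetical placeholder URI; substitute a real dereferenceable authority URI to try it.
AUTHORITY_URI = "http://id.example.org/authorities/names/n0000000"

# Standard media types for common RDF serializations
MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
}


def negotiate(uri: str, fmt: str) -> Request:
    """Build a GET request for one serialization of the resource identified by `uri`."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


for fmt in MEDIA_TYPES:
    req = negotiate(AUTHORITY_URI, fmt)
    print(fmt, req.get_header("Accept"))
# urllib.request.urlopen(req) would send the request; servers often answer with a
# 303 redirect to a format-specific document, which urllib follows automatically.
```

The same URI names the thing in every case; only the requested representation changes.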
Stage 2. Linking Data
CIVIL WAR DATA 150
Project was designed to encourage the contribution of a wide variety of data sources, from institutions to individuals
Partnership between the Archives of Michigan, the Internet Archive, and Freebase
Celebrating the sesquicentennial of the American Civil War
CIVIL WAR DATA 150
Project Goals:
Create web apps that enable users to add to or modify shared metadata with strong identifiers
Engage the public in interacting with and adding value to the data
Identify sources and map metadata into Freebase
LOCAH and Linking Lives
• Projects of Archives Hub UK (http://archiveshub.ac.uk), which represents more than 220 institutions
• LOCAH (Linked Open Copac & Archives Hub; 2010-2011):
  • Published data from Archives Hub finding aids and Copac, a union catalog of more than 70 major UK libraries
  • Created LOD resources:
    1. SPARQL endpoint
    2. Query box for trying out SPARQL queries
    3. RDF dump of the dataset
    4. Archives Hub EAD-to-RDF XSLT stylesheet
• Linking Lives (2011-2012) expanded on LOCAH:
  • Test project focusing on biography
  • Brought in more external datasets (DBpedia, VIAF, Freebase, Open Library, BBC Programmes, Linked Open British National Bibliography)
  • Developed an interface model (wireframe)
• Principal findings:
  • Even when expressed in triples, data may lack uniformity, requiring time-consuming clean-up
  • Difficulty of firmly establishing identity when there are variant forms of names or roles (e.g., “author” vs. “writer”), and when different people share the same name
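The identity problem in these findings is usually addressed by asserting explicit equivalence links (such as owl:sameAs) to hubs like VIAF, rather than matching on name strings, which would merge different people who share a name. A minimal sketch, assuming hypothetical identifiers throughout, groups URIs connected by sameAs links using a union-find structure.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical identifiers for people described in several datasets
links = [
    ("http://example.org/hub/person/1", OWL_SAME_AS, "http://viaf.example/123"),
    ("http://dbpedia.example/Person_A", OWL_SAME_AS, "http://viaf.example/123"),
    ("http://example.org/hub/person/2", OWL_SAME_AS, "http://viaf.example/456"),
    ("http://example.org/hub/person/1", FOAF_NAME, "Person A"),  # ignored: not a sameAs link
]


def same_as_clusters(triples) -> list[list[str]]:
    """Group URIs that are connected, directly or transitively, by owl:sameAs."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)  # union the two clusters

    clusters: dict[str, set[str]] = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())


for cluster in same_as_clusters(links):
    print(cluster)
```

Because only asserted links are followed, two people who merely share a name stay in separate clusters; the quality of the result depends entirely on the quality of the sameAs assertions.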
Stage 3. Documenting Processes for Reuse
DEUTSCHE NATIONALBIBLIOTHEK
• Linked Data Service
• Led by library scientists
• Authority names and bibliographic data
• Downloadable dataset
• SRU and OAI-PMH interfaces
• Extensive documentation
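OAI-PMH harvesting comes down to HTTP GET requests with a `verb` parameter. A minimal sketch that builds ListRecords URLs, assuming a hypothetical base URL (the library's documentation gives the real endpoint and its supported metadata formats). `oai_dc` is the Dublin Core format every OAI-PMH repository must support, and a `resumptionToken` continues a paged list and must be sent without other arguments.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://oai.example.org/repository"  # hypothetical OAI-PMH base URL


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None, set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params: dict[str, str] = {"verb": "ListRecords"}
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2013-01-01"))
```

A harvester would fetch this URL, parse the XML response, and keep requesting with each returned `resumptionToken` until none is given.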
THE NEW YORK TIMES
The OPEN Blog
• Documents and contextualizes the APIs
• Platform for sharing open source code
• Forum for troubleshooting and ideas
Downloadable SKOS Files
• The entire dataset is downloadable
• Developers can also choose by topic
Users are invited to use the datasets and APIs through downloads, documentation, support, and explanations of LOD terminology, code, and uses
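SKOS files of this kind express each subject heading as a concept with preferred and alternative labels. The sketch below pulls those labels out of a small N-Triples sample with a regular expression; the sample concepts are hypothetical, the real files' serialization may differ, and a production tool should use a proper RDF parser rather than this simplified pattern.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample in N-Triples form
SAMPLE = """\
<http://example.org/subject/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Jazz"@en .
<http://example.org/subject/1> <http://www.w3.org/2004/02/skos/core#altLabel> "Jazz music"@en .
<http://example.org/subject/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Civil War"@en .
"""

# Matches only '<s> <p> "literal"(@lang)? .' lines; escape sequences are left as-is.
LINE = re.compile(r'^<([^>]+)> <([^>]+)> "((?:[^"\\]|\\.)*)"(?:@[A-Za-z-]+)? \.$')


def labels(ntriples: str) -> dict[str, dict[str, list[str]]]:
    """Collect SKOS labels per concept, keyed by label kind (prefLabel, altLabel, ...)."""
    out: dict[str, dict[str, list[str]]] = {}
    for line in ntriples.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        subject, predicate, value = m.groups()
        if predicate.startswith(SKOS):
            kind = predicate[len(SKOS):]
            out.setdefault(subject, {}).setdefault(kind, []).append(value)
    return out


print(labels(SAMPLE)["http://example.org/subject/1"])
# {'prefLabel': ['Jazz'], 'altLabel': ['Jazz music']}
```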
THE NEW YORK TIMES
Available APIs
Developer Network
The API Request Tool lets developers search the extensive list of APIs and set parameters for their request using a widget. The tool then formats the request URL and returns the results.
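What the Request Tool automates is essentially URL construction: an endpoint plus URL-encoded query parameters and the developer's key. A sketch of that step, assuming the present-day Article Search endpoint and its `q`, `begin_date`, and `api-key` parameters (the 2013 APIs may have differed; the developer documentation is authoritative), with a placeholder key.

```python
from urllib.parse import urlencode

# Assumed endpoint (present-day Article Search API); verify against the developer docs.
ARTICLE_SEARCH = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_request_url(endpoint: str, api_key: str, **params: str) -> str:
    """Append URL-encoded query parameters and the API key to an endpoint."""
    query = dict(params)
    query["api-key"] = api_key  # passed separately because "api-key" is not a valid keyword
    return f"{endpoint}?{urlencode(query)}"


url = build_request_url(ARTICLE_SEARCH, "YOUR_KEY", q="linked data", begin_date="20130101")
print(url)
```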
Stage 4. Developing User Interfaces
AUSTRALIAN WAR MEMORIAL
• Proof of concept
• Developer led
• Embedded RDF tags
• Page-based API
• No documentation or downloadable dataset
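Embedding RDF in page markup (for example as RDFa attributes) lets a crawler read statements straight from the HTML. Assuming RDFa-style `about`, `property`, and `content` attributes (the memorial's actual markup is not documented here) and a hypothetical sample page, a minimal extractor built on Python's `html.parser` could look like this; it covers only a small subset of RDFa.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical page fragment
PAGE = """
<div about="http://example.org/collection/item/1">
  <span property="http://purl.org/dc/terms/title">Example photograph</span>
  <meta property="http://purl.org/dc/terms/date" content="1917" />
</div>
"""


class RDFaSubsetExtractor(HTMLParser):
    """Collect (subject, property, value) triples from about/property/content attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.subject: Optional[str] = None  # most recent `about` seen (no nesting support)
        self.pending: Optional[str] = None  # property waiting for its element's text
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("about"):
            self.subject = a["about"]
        prop = a.get("property")
        if prop and self.subject:
            if a.get("content") is not None:
                self.triples.append((self.subject, prop, a["content"]))
            else:
                self.pending = prop  # value comes from the element's text

    def handle_data(self, data):
        if self.pending and self.subject and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


extractor = RDFaSubsetExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```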
THE AMSTERDAM MUSEUM
• Mobile app parses data from the Amsterdam Museum and linked ontologies
• Proposal for a visual interface that enables users to become tour guides
• Current problem: search and download speed
Out of the Trenches Demonstration Video
Subjects can be explored across a range of dimensions
Source: http://www.canadiana.ca/sites/pub.canadiana.ca/files/LOD-Demo-ENG_0.mp4
Stage 5. Promoting Reuse
OPEN CULTUUR DATA INITIATIVE
• Offered workshops on how cultural heritage organizations could open their data
• Hosted hackathons to encourage developers to turn datasets into apps
• Three award winners:
  • VISTORY (using the LOD Open Images dataset)
  • Rijksmonumenten.info
  • Connected Collection
Screenshot from http://www.glimworm.com/vistory.shtml
EUSCREEN
• Linked Data Pilot
• International collaboration
• Open, international standards
• Downloadable datasets
• Fully documented
• Showcase of projects in blog
• Active in promoting reuse
Stage 6. Expanding the Definition of Cultural Heritage
CONCLUSIONS
• (Most) LOD projects:
  • Proof of concept
  • No access to a dataset
  • Not highly documented
  • Highly curated
  • Experimental
  • Promising
• The number of LOD datasets continues to increase
• Actual use by cultural heritage institutions appears to remain limited
• Trust remains an obstacle
  • Compare: “A guppy is_a_Kind_of fish” (TRUE) and “A pony is_a_Kind_of fish” (UNTRUE); computers see these as equally valid
  • Verifying or identifying the source of a statement may become a best practice
  • Information added to triples? “A guppy is_a_Kind_of fish [source] DBpedia”
• Published datasets hold great potential for making the content of an archive's collections known
  • E.g., a researcher studying Person A finds that a collection of Person X's letters includes letters to or from Person A
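One way to add the source to a triple is to store a fourth element naming where the statement came from, which is the idea behind RDF named graphs (quads). A minimal sketch, using plain strings instead of URIs for readability and a hypothetical untrusted source label, shows how an application could keep only statements from sources it trusts and answer "who says so?".

```python
# Each statement is (subject, predicate, object, source): a triple plus provenance.
Quad = tuple[str, str, str, str]

statements: list[Quad] = [
    ("guppy", "is_a_Kind_of", "fish", "DBpedia"),
    ("pony", "is_a_Kind_of", "fish", "unverified-source"),  # hypothetical source label
]


def from_trusted(quads: list[Quad], trusted: set[str]) -> list[tuple[str, str, str]]:
    """Drop the provenance column, keeping only statements whose source is trusted."""
    return [(s, p, o) for (s, p, o, src) in quads if src in trusted]


def sources_of(quads: list[Quad], s: str, p: str, o: str) -> list[str]:
    """List every source that asserts the given triple."""
    return [src for (qs, qp, qo, src) in quads if (qs, qp, qo) == (s, p, o)]


print(from_trusted(statements, {"DBpedia"}))  # [('guppy', 'is_a_Kind_of', 'fish')]
print(sources_of(statements, "pony", "is_a_Kind_of", "fish"))  # ['unverified-source']
```

Provenance does not make a statement true; it only lets the consumer decide which sources to believe.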