Standing on the Shoulder? Curation and the Record of Science. Chris Rusbridge, JISC/CNI 2006. Contents: Curation; Sustainability; Data resources; Context; Access and re-use; Citation, archiving and preserving; Breaking news: OAIS Review.
a centre of expertise in data curation and preservation
Funded by: [funder logos]
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Standing on the Shoulder?
Curation and the Record of Science
Chris Rusbridge
JISC/CNI 2006
Contents
• Curation
• Sustainability
• Data resources
• Context
• Access and re-use
• Citation, archiving and preserving
• Breaking news: OAIS Review
“If I have seen further it is by standing on the shoulders of giants”
• Newton’s letter to Hooke (1676); possibly a snide remark linked to Hooke’s stature
• The image is attributed to Bernard of Chartres by John of Salisbury, 1159 (Metalogicon)
• Citation of the evidence base is fundamental
Curation
• Data increasingly important as evidence
  • Experimental verifiability (the basis of science)
  • Unrepeatable observations & experiments (particularly environmental in broadest sense)
  • Legal, compliance & transactions
  • Cultural resources
• For evidential value, data must be curated
Curation
• “Maintaining and adding value to a trusted body of digital information for current and future use”
Lynch remarks
• Closing the 2005 Curation Conference
• 3 views of digital curation
  • Collection as a living thing
  • Whole life process, evolving object(s)
  • Finite process, handover to preservation
Sustainability and exit strategy
• Most critical resource for curation: present and future money supply!
• Plan for the long term, but have a succession plan
• Sustained approach not project mentality
Data resource stages
• Curated data is created…
  • Observations? Fixed!
• Or acquired…
  • Data brought/bought from outside
  • Ingest
• Development
  • Derived, refined, combined, processed data
  • Potentially many stages
[Diagram labels: NASA; HRPT; University research group 1; University research group 2; research group 3; local decision-making body; Csat; E0; SST; PAR subscene; Csat 8-day composite and subscene; 8-day composite and subscene; Pbopt calc; Ctot calc; Zeu calc; PPeu calc]
Slide from Rajendra Bose
Some illustrations: UK census
• 1881 census (UKDA)
  • Hand-written individual return forms: data conversion issue (reference form available); digitisation and access issues
• 1961 census (TNA/NDAD)
  • First census analysed using computers (first major UK-wide computer project?); individual returns closed until 2062: data preservation issue!!!
• 2001 census (ONS/CDU)
  • Data corrections and adjustments: curation issue
Khosrow Hejazian
Student databases
• Glasgow: 1960s flat files
  • Converted to Indexed Sequential
  • Converted to IDMS-X ~1983
  • Converted to Ingres ~1994, still current
• All students since 1960s
• All prior students who have returned
• All General Council <100 years
• Think of what has changed in that time!
  • Faculties, depts, grade structures, regulations…
  • Curation problem!
Another university
• Also 3rd or 4th generation system
• Previous data not carried forward
• Available on tapes
  • Let’s hope they are properly looked after, re-tensioned, metadata & documentation available…
• Dataset preservation nightmare!
• (Urban myth? Told by senior manager!)
Curation of emails
• Lots of metadata and context (RFC 822)
• Often highly distributed
• Split conversations
• Unknown numbers of copies
• Personal choice of clients
• Legal requirements!
• Controlled filing and controlled deletion needed…
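To make the "metadata and context" point concrete, here is a minimal sketch of pulling the RFC 822 style headers out of a message with Python's standard `email` package. The sample message, the addresses and the `curation_record` helper are invented for illustration; a real filing scheme would need far more than this. The `Message-ID` and `In-Reply-To` headers are what let split conversations be stitched back together and duplicate copies be recognised.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message (addresses and content are made up).
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Re: Census data corrections
Date: Mon, 02 Oct 2006 10:15:00 +0000
Message-ID: <msg-2@example.org>
In-Reply-To: <msg-1@example.org>
References: <msg-1@example.org>

Agreed, see the attached adjustments.
"""


def curation_record(raw: str) -> dict:
    """Extract the headers a curator might file a message under."""
    msg = message_from_string(raw)
    return {
        "message_id": msg["Message-ID"],    # identifies copies of the same message
        "in_reply_to": msg["In-Reply-To"],  # links a reply to its parent
        "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
    }


record = curation_record(RAW)
print(record)
```

Keying a store on `message_id` would be one way to cope with "unknown numbers of copies", on the assumption that mail clients preserve that header.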
SDSS (Visual)
TWOMASS (Infrared)
Slide from Rajendra Bose
Example…
• National Virtual Observatory
• Johns Hopkins press release: “Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.”
Context
• Data meaningless without context
  • Linkage
  • Metadata of many kinds
  • Workflow!
• Provenance
  • Computational lineage
  • Authenticity
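Computational lineage can be pictured as a graph in which each derived dataset records the inputs it was computed from. The sketch below is only an illustration under that assumption: the dataset names and the `DERIVED_FROM` table are hypothetical, not taken from any real workflow system.

```python
# Hypothetical derivation graph: each derived dataset lists its direct inputs.
DERIVED_FROM = {
    "weekly_composite": ["daily_scene_1", "daily_scene_2"],
    "regional_subscene": ["weekly_composite"],
    "productivity_estimate": ["regional_subscene", "temperature_grid"],
}


def lineage(dataset: str, graph: dict) -> list:
    """Return every upstream dataset, depth-first, without repeats."""
    seen = []

    def visit(name):
        for parent in graph.get(name, []):
            if parent not in seen:
                seen.append(parent)
                visit(parent)

    visit(dataset)
    return seen


def original_sources(dataset: str, graph: dict) -> list:
    """Upstream datasets that were not themselves derived from anything."""
    return sorted(d for d in lineage(dataset, graph) if d not in graph)


print(lineage("productivity_estimate", DERIVED_FROM))
print(original_sources("productivity_estimate", DERIVED_FROM))
```

If such a record is kept at every processing stage, a user of the final product can trace it back to the original observations.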
Access and re-use
• Ethics and rights control access
  • Weak in expressing this long-term
• Collaboration tools
  • Annotation, discussion, review
  • Re-use leading to change and development
• “Publication”
  • Not just in “print”
  • Underlying data should be “published”, too
• Citation…
Citation
OWL Web Ontology Language Reference
W3C Proposed Recommendation 15 December 2003
This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/
Latest version: http://www.w3.org/TR/owl-ref/
Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081
• Needs a stable resource to cite…
Citation…
• The date alone (as in common web citation approaches) is not enough!
• Cited object likely to have changed…
• Citation should link to the cited object as it was!
[6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006.
Citation needs…
• An efficient way to reference and access “archived” past states of a changing dataset (work in progress, Buneman et al)
• Less important for original observations
  • Don’t mess with those data
• Less important for incremental datasets
  • Later stuff should not invalidate earlier
• Very important for revisable datasets
  • E.g. genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change
XMLArch: System Architecture
[Diagram labels: XML Archiver; Relational Database; Data Extractor; XML Snapshot at time t; Pre-processor; Version Merger; XML Archive at time t - 1; XML Archive at time t]
• Carwyn Edwards
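The component labels suggest that each new snapshot is merged into the previous archive to give the archive at time t. The sketch below illustrates that general idea only, and it is not the XMLArch implementation: it assumes a flat key/value dataset rather than XML, and the `MergedArchive` class and the gene records are hypothetical. Each value is stored once with the range of versions in which it held, so any past state can be rebuilt and cited "as it was".

```python
from dataclasses import dataclass, field


@dataclass
class MergedArchive:
    # key -> list of [value, first_version, last_version] runs
    runs: dict = field(default_factory=dict)
    latest: int = 0

    def merge(self, snapshot: dict) -> int:
        """Merge a snapshot (key -> value) as the next version; return its number."""
        self.latest += 1
        v = self.latest
        for key, value in snapshot.items():
            history = self.runs.setdefault(key, [])
            if history and history[-1][0] == value and history[-1][2] == v - 1:
                history[-1][2] = v              # unchanged: extend the current run
            else:
                history.append([value, v, v])   # new or changed: start a new run
        return v

    def snapshot(self, version: int) -> dict:
        """Rebuild the dataset exactly as it was at the given version."""
        return {
            key: value
            for key, history in self.runs.items()
            for value, first, last in history
            if first <= version <= last
        }


archive = MergedArchive()
v1 = archive.merge({"gene:1": "kinase", "gene:2": "unknown"})
v2 = archive.merge({"gene:1": "kinase", "gene:2": "transporter", "gene:3": "ligase"})
v3 = archive.merge({"gene:1": "kinase", "gene:3": "ligase"})
print(archive.snapshot(v1))  # the state a citation made at version 1 should resolve to
```

Unchanged records cost no extra storage per version, which is why this style of merging suits revisable datasets that are cited often but change slowly.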
Preservation
• Use preserves
• Money preserves
• Redundancy good, monoculture bad?
  • LOCKSS-type & other approaches…
• Bits are fragile and robust
  • Don’t rely on portable media
  • Look after them well
• Technology changes…
  • How fast? What impact?
• Metadata matters! (Know what you’ve got)
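One way to see why redundancy helps with fragile bits is a simple fixity audit: compute a checksum for each copy and flag any copy that disagrees with the majority. This is a sketch under stated assumptions, not the LOCKSS protocol; the site names, sample bytes and majority-vote rule are all illustrative, and a tie between digests is not handled.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 checksum of one copy, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit(replicas: dict) -> tuple:
    """Return the majority digest and the names of replicas that disagree with it."""
    digests = {name: digest(data) for name, data in replicas.items()}
    # Assumption: the most common digest is the correct one (ties are not resolved).
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies held at three sites; the third has a simulated corrupted byte.
replicas = {
    "site_a": b"1881 census, enumeration district 12",
    "site_b": b"1881 census, enumeration district 12",
    "site_c": b"1881 census, enumeration district 1Z",
}
majority, damaged = audit(replicas)
print(damaged)
```

A damaged copy found this way can be repaired from an agreeing one, which is only possible if more than one copy exists.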
Preservation
• We can’t do it alone
  • Collective responsibility
• We can’t rely on anyone else
  • Institutional responsibility
It’s about time…
• From the very short
  • Good management (don’t under-estimate but don’t over-estimate)
• Through the medium term
  • Curation: use it or lose it
  • Gather ye metadata while ye may!
  • Preservation relay
• To the very long term
  • High commitment, high cost, high risk
  • Harder to do en masse
OAIS
• “Announcement of a Comment Period for the Five Year Review of the Reference Model for an Open Archival Information System (OAIS) Standard”
  • “… must be reviewed every five years and a determination made to reaffirm, modify, or withdraw the existing standard.”
  • “… any revision must remain backward compatible with regard to major terminology and concepts.”
  • “… we do not plan to expand the general level of detail”
  • “… reduce ambiguities and fill in any missing or weak concepts”
• Make suggestions and express interest until 30/10/06
• [email protected]
• Are we standing on the hard shoulder (the road side) waiting for a ride?
• Or are we supporting the shoulders of giants (building the evidence bases for future science)?