Standing on the Shoulder?


Page 1: Standing on the Shoulder?

a centre of expertise in data curation and preservation

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Standing on the Shoulder?

Curation and the Record of Science

Chris Rusbridge

JISC/CNI 2006

Page 2: Standing on the Shoulder?

Contents

• Curation
• Sustainability
• Data resources
• Context
• Access and re-use
• Citation, archiving and preserving
• Breaking news: OAIS Review

Page 3: Standing on the Shoulder?

“If I have seen a little further it is by standing on the shoulders of giants”

• Newton’s letter to Hooke (1676); possibly a snide remark linked to Hooke’s stature; the image is attributed to Bernard of Chartres by John of Salisbury, 1159 (Metalogicon)
• Citation of evidence base fundamental

Page 4: Standing on the Shoulder?

Curation

• Data increasingly important as evidence
• Experimental verifiability (the basis of science)
• Unrepeatable observations & experiments (particularly environmental in broadest sense)
• Legal, compliance & transactions
• Cultural resources

• For evidential value, data must be curated

Page 5: Standing on the Shoulder?

Curation

• “Maintaining and adding value to a trusted body of digital information for current and future use”

Page 6: Standing on the Shoulder?

Lynch remarks

• Closing the 2005 Curation Conference
• 3 views of digital curation

• Collection as a living thing
• Whole life process, evolving object(s)
• Finite process, handover to preservation

Page 7: Standing on the Shoulder?

Page 8: Standing on the Shoulder?

Sustainability and exit strategy

• Most critical resource for curation: present and future money supply!

• Plan for the long term, but have a succession plan

• Sustained approach not project mentality

Page 9: Standing on the Shoulder?

Data resource stages

• Curated data is created…
• Observations? Fixed!

• Or acquired…
• Data brought/bought from outside
• Ingest

• Development
• Derived, refined, combined, processed data
• Potentially many stages

Page 10: Standing on the Shoulder?

[Figure: data-flow diagram. Labels shown: HRPT, SST, PAR subscene, E0, Csat, Csat 8-day composite and subscene, 8-day composite and subscene, Pbopt calc, Ctot calc, Zeu calc, PPeu calc. Parties shown: NASA, University research group1, University research group2, research group3, local decision-making body.]

Slide from Rajendra Bose

Page 11: Standing on the Shoulder?

Some illustrations: UK census

• 1881 census (UKDA)
• Hand-written individual return forms: data conversion issue (reference form available): digitisation and access issues

• 1961 census (TNA/NDAD)
• First to use computers for analysis (first major UK-wide computer project?); individual returns closed until 2062: data preservation issue!!!

• 2001 census (ONS/CDU)
• Data corrections and adjustments: curation issue

Page 12: Standing on the Shoulder?

Khosrow Hejazian

Page 13: Standing on the Shoulder?

Student databases

• Glasgow: 1960s flat files
• Converted to Indexed Sequential
• Converted to IDMS-X ~1983
• Converted to Ingres ~1994, still current

• All students since 1960s
• All prior students who have returned
• All General Council <100 years

• Think of what has changed in that time!
• Faculties, depts, grade structures, regulations…
• Curation problem!

Page 14: Standing on the Shoulder?

Another university

• Also 3rd or 4th generation system
• Previous data not carried forward
• Available on tapes
• Let’s hope they are properly looked after, re-tensioned, metadata & documentation available…
• Dataset preservation nightmare!
• (Urban myth? Told by senior manager!)

Page 15: Standing on the Shoulder?

Curation of emails

• Lots of metadata and context (RFC 822)
• Often highly distributed
• Split conversations
• Unknown numbers of copies
• Personal choice of clients

• Legal requirements!
• Controlled filing and controlled deletion needed…
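
As a rough illustration of the metadata an RFC 822 style message carries, the sketch below uses Python's standard `email` module to pull out header fields a curator might index, including the identifiers that let split conversations be stitched back together. The sample message and the `extract_metadata` helper are invented for illustration only.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message; addresses and IDs are purely illustrative.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Re: Census tapes
Date: Mon, 04 Dec 2006 10:15:00 +0000
Message-ID: <msg2@example.org>
In-Reply-To: <msg1@example.org>

The tapes were re-tensioned last year.
"""

def extract_metadata(raw):
    """Return the header fields useful for filing and threading."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),
        "message_id": msg["Message-ID"],
        "in_reply_to": msg["In-Reply-To"],  # None if this starts a thread
    }

meta = extract_metadata(RAW)
print(meta["message_id"], "replies to", meta["in_reply_to"])
```

Linking `In-Reply-To` back to an earlier `Message-ID` is one plausible way to reassemble a conversation whose copies sit in several mailboxes.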

Page 16: Standing on the Shoulder?

Page 17: Standing on the Shoulder?

SDSS (Visual)

TWOMASS (Infrared)

Slide from Rajendra Bose

Page 18: Standing on the Shoulder?

Slide from Rajendra Bose

Page 19: Standing on the Shoulder?

Example…

• National Virtual Observatory

• Johns Hopkins press release: “Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.”
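
The kind of search described in the press release can be pictured as a positional cross-match between two catalogues. The sketch below is not the NVO's method: the catalogue entries, the names, and the 2 arcsecond matching radius are all invented, and the brute-force comparison would not scale to millions of objects.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """Pair each object in cat_a with objects in cat_b within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        for name_b, ra_b, dec_b in cat_b:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((name_a, name_b))
    return matches

# Hypothetical (name, RA, Dec) entries in degrees.
optical = [("opt-1", 150.0000, 2.0000), ("opt-2", 150.5000, 2.3000)]
infrared = [("ir-1", 150.0002, 2.0001), ("ir-2", 151.0000, 3.0000)]

print(cross_match(optical, infrared))
```

An object seen in both catalogues can then be examined across wavelengths, which is only possible if both datasets have been kept accessible and well described.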

Page 20: Standing on the Shoulder?

Context

• Data meaningless without context
• Linkage
• Metadata of many kinds
• Workflow!

• Provenance
• Computational lineage
• Authenticity
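
One way to picture computational lineage is as a record, kept alongside each derived dataset, of which inputs and which processing step produced it. The following sketch is illustrative only: the `Dataset` class, its fields, and the example names are hypothetical and not taken from any particular provenance system.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    process: str = "observation"                 # step that produced this dataset
    inputs: list = field(default_factory=list)   # parent Dataset objects

def lineage(ds, depth=0):
    """Return indented lines tracing a dataset back to its sources."""
    lines = ["  " * depth + f"{ds.name} <- {ds.process}"]
    for parent in ds.inputs:
        lines.extend(lineage(parent, depth + 1))
    return lines

raw = Dataset("raw-scene")
composite = Dataset("8-day-composite", "compositing", [raw])
product = Dataset("derived-product", "model calculation", [composite])

print("\n".join(lineage(product)))
```

Walking the chain from a derived product back to the original observation is what lets a later user judge how far to trust it.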

Page 21: Standing on the Shoulder?

Access and re-use

• Ethics and rights control access
• Weak in expressing this long-term

• Collaboration tools
• Annotation, discussion, review
• Re-use leading to change and development

• “Publication”
• Not just in “print”
• Underlying data should be “published”, too

• Citation…

Page 22: Standing on the Shoulder?

Citation

OWL Web Ontology Language Reference

W3C Proposed Recommendation 15 December 2003

This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/
Latest version: http://www.w3.org/TR/owl-ref/
Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081

• Needs a stable resource to cite…

Page 23: Standing on the Shoulder?

Citation…

• The date alone (as in common web citation approaches) is not enough!
• Cited object likely to have changed…
• Citation should link to the cited object as it was!

[6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006.

Page 24: Standing on the Shoulder?

Citation needs…

• An efficient way to reference and access “archived” past states of a changing dataset (work in progress, Buneman et al)

• Less important for original observations
• Don’t mess with those data

• Less important for incremental datasets
• Later stuff should not invalidate earlier

• Very important for revisable datasets
• Eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change
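
A minimal sketch of what referencing past states could look like: each keyed value is stored once, together with the range of versions in which it held, so any earlier state can be rebuilt on demand. This is only loosely inspired by the merged-archive idea and is not the Buneman et al design; the `Archive` class, its methods, and the gene records are invented for illustration.

```python
class Archive:
    """Merge successive snapshots, recording when each value was current."""

    def __init__(self):
        self.version = 0
        self.history = {}   # key -> list of [value, first_version, last_version]

    def add_snapshot(self, snapshot):
        """Merge a dict snapshot into the archive; return its version number."""
        self.version += 1
        for key, value in snapshot.items():
            runs = self.history.setdefault(key, [])
            if runs and runs[-1][0] == value and runs[-1][2] == self.version - 1:
                runs[-1][2] = self.version      # value unchanged: extend the run
            else:
                runs.append([value, self.version, self.version])
        return self.version

    def as_of(self, version):
        """Rebuild the dataset exactly as it stood at the given version."""
        state = {}
        for key, runs in self.history.items():
            for value, first, last in runs:
                if first <= version <= last:
                    state[key] = value
        return state

archive = Archive()
v1 = archive.add_snapshot({"gene-1": "function unknown"})
v2 = archive.add_snapshot({"gene-1": "kinase", "gene-2": "transporter"})

# A citation can now name the version it relied on.
print(archive.as_of(v1))
print(archive.as_of(v2))
```

Storing unchanged values only once is the design choice that keeps such an archive compact even when snapshots are taken often.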

Page 25: Standing on the Shoulder?

[Figure: XMLArch: System Architecture. Components shown: Relational Database, Data Extractor, XML Snapshot at time t, Pre-processor, Version Merger, XML Archiver, XML Archive at time t - 1, XML Archive at time t.]

• Carwyn Edwards

Page 26: Standing on the Shoulder?

Preservation

• Use preserves
• Money preserves
• Redundancy good, monoculture bad?
• LOCKSS-type & other approaches…

• Bits are fragile and robust
• Don’t rely on portable media
• Look after them well

• Technology changes…
• How fast? What impact?

• Metadata matters! (Know what you’ve got)
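
The points about fragile bits and redundancy can be made concrete with a fixity check: compute a checksum for each copy and compare. The sketch below assumes three replicas held in memory and a simple majority vote; it does not implement the LOCKSS protocol, and the function and site names are made up for the example.

```python
import hashlib
from collections import Counter

def checksum(data: bytes) -> str:
    """SHA-256 digest of a copy, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def audit(replicas):
    """Return (majority_digest, names of replicas that disagree with it)."""
    digests = {name: checksum(data) for name, data in replicas.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(n for n, d in digests.items() if d != majority)
    return majority, damaged

replicas = {
    "site-a": b"1961 census summary",
    "site-b": b"1961 census summary",
    "site-c": b"1961 census summ\x00ry",   # simulated bit damage
}

majority, damaged = audit(replicas)
print("damaged copies:", damaged)
```

A damaged copy can then be replaced from one that agrees with the majority, which is why several independent copies are worth more than one carefully guarded one.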

Page 27: Standing on the Shoulder?

Preservation

• We can’t do it alone
• Collective responsibility

• We can’t rely on anyone else
• Institutional responsibility

Page 28: Standing on the Shoulder?

It’s about time…

• From the very short
• Good management (don’t under-estimate but don’t over-estimate)

• Through the medium term
• Curation: use it or lose it
• Gather ye metadata while ye may!
• Preservation relay

• To the very long term
• High commitment, high cost, high risk
• Harder to do en masse

Page 29: Standing on the Shoulder?

OAIS

• “Announcement of a Comment Period for the Five Year Review of the Reference Model for an Open Archival Information System (OAIS) Standard”
• “… must be reviewed every five years and a determination made to reaffirm, modify, or withdraw the existing standard.”
• “…any revision must remain backward compatible with regard to major terminology and concepts.”
• “… we do not plan to expand the general level of detail”
• “… reduce ambiguities and fill in any missing or weak concepts”

• Make suggestions and express interest until 30/10/06
• [email protected]

Page 30: Standing on the Shoulder?

• Are we standing on the hard shoulder (the road side) waiting for a ride?

• Or are we supporting the shoulders of giants (building the evidence bases for future science)?