2013 11-27 sustainable-history_slides

Preserving research data for the future Dr James Baker, Digital Curator @j_w_baker [email protected]

Upload: james-baker

Post on 25-May-2015

DESCRIPTION

Slides from invited talk at 'Sustainable history: ensuring today's digital history survives' event, Institute of Historical Research, 28 November 2013.

TRANSCRIPT

Page 1: 2013 11-27 sustainable-history_slides

Preserving research data for the future

Dr James Baker, Digital Curator

@j_w_baker

[email protected]

Page 2: 2013 11-27 sustainable-history_slides

www.bl.uk 2

Some admin…

You are free to:

– Copy, share, adapt, or re-mix

– Photograph, film, or broadcast

– Blog, live-blog, or post video of

this presentation, provided that:

– You attribute the work to its author and respect the rights and licences associated with its components

– You distribute the resulting work only under the same or similar license to this one

Text attribution Greg Wilson, Two Solitudes, SPLASH 2013 (29 October 2013)

http://www.slideshare.net/gvwilson/splash-2013

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License unless stated otherwise.

Page 3: 2013 11-27 sustainable-history_slides

‘the fragility of evidence in the digital era’

‘[the digital] archive is considerably more fragile than one would like’

‘The simultaneous fragility and promiscuity of digital data’

Roy Rosenzweig, Scarcity or Abundance? Preserving the Past in a Digital Era, The American Historical Review 108:3 (2003), 736, 737, 739.

Page 4: 2013 11-27 sustainable-history_slides

Page 5: 2013 11-27 sustainable-history_slides

‘The core guiding principle is simple: Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why […] Most commonly, however, that “someone” is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.’

William Stafford Noble (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424

Preservation and use

Page 6: 2013 11-27 sustainable-history_slides

‘What is often said of military strategy seems to apply to digital preservation: "the greatest enemy of a good plan is the dream of a perfect plan." We have never preserved everything; we need to start preserving something.’

Roy Rosenzweig, Scarcity or Abundance? Preserving the Past in a Digital Era, The American Historical Review 108:3 (2003), 754.

Page 7: 2013 11-27 sustainable-history_slides

and, how do we plan for the future - the unknown future of digital space, digital dissemination, and digital information?

Heather Froehlich (heatherfro). “and, how do we plan for the future - the unknown future of digital space, digital dissemination, and digital information?” 4 November 2013, 5:15 a.m. Tweet.

What demands the closest attention?

Page 8: 2013 11-27 sustainable-history_slides

Victory is mine: while ago I worked out some Clever Stuff (tm) in Excel. And I MADE NOTES ON IT. And those notes ENABLED ME TO DO IT AGAIN.

Katie Birkwood (girlinthe). “Victory is mine: while ago I worked out some Clever Stuff (tm) in Excel. And I MADE NOTES ON IT. And those notes ENABLED ME TO DO IT AGAIN.” 7 October 2013, 3:46 a.m. Tweet.

Documentation

Page 9: 2013 11-27 sustainable-history_slides

Good documentation must:

– Include ‘the archive references for the originals!’

– Explain the [source or] ‘dataset (and its limitations) accurately’.

– Be ‘clear about what it represents (eg full transcriptions, partial transcriptions, just summaries, changes, iterations)’.

– Be written ‘in a structured data format to make it machine-readable […] Plain text files (.txt) are preferable to Word docs’.

Documentation

Sharon Howard, ‘Unclean, unclean! What historians can do about sharing our messy research data’, Early Modern Notes (18 May 2013)

Page 10: 2013 11-27 sustainable-history_slides

"Word is not a digital preservation standard" -

understatement of the day #SearchSolutions2013

Helen Lippell (octodude). “"Word is not a digital preservation standard" - understatement of the day #SearchSolutions2013” 27 November 2013, 12:36 a.m. Tweet.

Documentation

Page 11: 2013 11-27 sustainable-history_slides

Page 12: 2013 11-27 sustainable-history_slides

Notes on digital books.docx NO!

2013-11-18_MS_books_documentation.txt YES!

Extensible, scalable, reusable

Page 13: 2013 11-27 sustainable-history_slides

Page 14: 2013 11-27 sustainable-history_slides

\root\

Admin

Attic

BL

Notes

Research

Teaching

\root\BL\

Admin

Attic

Data

Events

Projects

Research

Talks

Teaching

\root\BL\Talks\

2013-11_Sustainable_History

2013-11_Liverpool_John_Moores

2013-05_Going_Digital

(Extensible, scalable, reusable) Structure
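The three-level layout on this slide can be reproduced with a few lines of Python. This is only a sketch: the folder names are copied from the slide, while the function name build_skeleton and the choice of base directory are assumptions made for illustration.

```python
from pathlib import Path

# Folder names copied from the slide; everything else here is illustrative.
TOP_LEVEL = ["Admin", "Attic", "BL", "Notes", "Research", "Teaching"]
BL_LEVEL = ["Admin", "Attic", "Data", "Events", "Projects",
            "Research", "Talks", "Teaching"]
TALKS = ["2013-11_Sustainable_History",
         "2013-11_Liverpool_John_Moores",
         "2013-05_Going_Digital"]


def build_skeleton(root):
    """Create the folder tree under `root` and return the folders as a sorted list."""
    root = Path(root)
    paths = [root / name for name in TOP_LEVEL]
    paths += [root / "BL" / name for name in BL_LEVEL]
    paths += [root / "BL" / "Talks" / name for name in TALKS]
    for path in paths:
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
    return sorted(paths)
```

Calling build_skeleton on an empty directory creates all seventeen folders; calling it again changes nothing, so the same script can be rerun as the structure grows.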

Page 15: 2013 11-27 sustainable-history_slides

2013-08-11_History_Journal_Articles.tsv

2013-08-11_History_Journal_Articles.txt

(Extensible, scalable, reusable) Naming

Page 16: 2013 11-27 sustainable-history_slides

2013-08-11_History_Journal_Articles_africa.tsv

2013-08-11_History_Journal_Articles_america.tsv

2013-09-11_History_Journal_Articles_art.tsv

2013-09-11_History_Journal_Articles_britain.tsv

2013_History_Journal_Articles.txt

copy *.tsv newfile.tsv

copy 2013-08*.tsv newfile.tsv

copy 2013-0*-11_History_Journal_Articles_a*.tsv newfile.tsv

(Extensible, scalable, reusable) Naming
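The three copy commands on this slide use wildcards to pick files by name; on the Windows command line, copying several matching files to one destination combines them into that file. As a rough Python analogue, the sketch below applies the same three patterns to the filenames from the slide. The function names select and combine are assumptions for illustration, not anything from the talk.

```python
import fnmatch
from pathlib import Path

# Filenames copied from the slide.
FILES = [
    "2013-08-11_History_Journal_Articles_africa.tsv",
    "2013-08-11_History_Journal_Articles_america.tsv",
    "2013-09-11_History_Journal_Articles_art.tsv",
    "2013-09-11_History_Journal_Articles_britain.tsv",
    "2013_History_Journal_Articles.txt",
]


def select(pattern, names=FILES):
    """Return the names that match a shell-style wildcard pattern, sorted."""
    return sorted(fnmatch.filter(names, pattern))


def combine(directory, pattern, out_name):
    """Concatenate every file in `directory` matching `pattern` into `out_name`.

    Returns the names of the source files in the order they were written.
    """
    directory = Path(directory)
    out_path = directory / out_name
    # Collect sources first so the output file is never read into itself.
    sources = sorted(p for p in directory.glob(pattern) if p != out_path)
    with out_path.open("w", encoding="utf-8", newline="") as out:
        for src in sources:
            out.write(src.read_text(encoding="utf-8"))
    return [p.name for p in sources]
```

With these names, select("*.tsv") picks the four data files, select("2013-08*.tsv") picks the two August files, and select("2013-0*-11_History_Journal_Articles_a*.tsv") picks the africa, america and art files. The patterns only work because every file follows the same date-first naming scheme.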

Page 17: 2013 11-27 sustainable-history_slides

DATE_ARTIST_TITLE.FORMAT

1804-02-10_Gillray_TheKingofBrobdignagandGulliver.png

1653_Rembrandt_TheThreeCrosses.png

(Extensible, scalable, reusable) Naming
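One consequence of the DATE_ARTIST_TITLE.FORMAT pattern is that a filename can be split back into its parts by a program. The sketch below assumes underscores separate the first three fields and that the text after the last full stop is the format; the names ImageName and parse_name are hypothetical.

```python
from typing import NamedTuple


class ImageName(NamedTuple):
    """The four parts of a DATE_ARTIST_TITLE.FORMAT filename."""
    date: str
    artist: str
    title: str
    fmt: str


def parse_name(filename):
    """Split a filename into date, artist, title and format."""
    # Everything after the last full stop is treated as the format.
    stem, _, fmt = filename.rpartition(".")
    # Split on the first two underscores only, so a title may contain more.
    date, artist, title = stem.split("_", 2)
    return ImageName(date, artist, title, fmt)
```

For example, parse_name("1653_Rembrandt_TheThreeCrosses.png") gives the date "1653", the artist "Rembrandt", the title "TheThreeCrosses" and the format "png". Because the date comes first, an alphabetical sort of such filenames is also a chronological sort.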

Page 18: 2013 11-27 sustainable-history_slides

Page 19: 2013 11-27 sustainable-history_slides

1653_Rembrandt_TheThreeCrosses.png

1653_Rembrandt_TheThreeCrosses_edited.png

2013-08-11_History_Journal_Articles_africa.tsv

2013-11-18_History_Journal_Articles_africa_3column.tsv

2013-11-18_Sustainable_History_talk.docx

2013-11-18a_Sustainable_History_talk.docx

2013-11-19_Sustainable_History_talk.docx

(Extensible, scalable, reusable) Version Control lite
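Date-prefixed versions like the three talk files on this slide sort into chronological order with a plain alphabetical sort, including the "18a" revision made later on the same day. A minimal sketch, assuming the text before the first underscore is the version stamp and the rest identifies the document; the function names are hypothetical.

```python
def split_version(filename):
    """Split 'STAMP_rest' into (stamp, rest) at the first underscore."""
    stamp, _, rest = filename.partition("_")
    return stamp, rest


def latest_versions(filenames):
    """Map each document name to its most recent date-stamped filename."""
    latest = {}
    # Alphabetical order is chronological here: '2013-11-18' sorts before
    # '2013-11-18a', which sorts before '2013-11-19'.
    for name in sorted(filenames):
        _, rest = split_version(name)
        latest[rest] = name  # later names overwrite earlier ones
    return latest
```

Given the three talk files from the slide, latest_versions returns the 2013-11-19 file as the current version of Sustainable_History_talk.docx, while the two earlier files remain on disk as the history.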

Page 20: 2013 11-27 sustainable-history_slides

1. We integrate curatorial assessments of our digital collection content into preservation decisions, so that technical activities support curatorial requirements for the collections

2. We preserve metadata about our digital collections, so that we may understand and preserve the collections over time

3. We preserve the provenance of our digital collection content, so that we understand and can demonstrate its authenticity over time

4. We record any modifications to digital collection content (e.g. preservation action, normalisation) during the lifecycle, so that we can understand and demonstrate its integrity over time

5. We consistently apply and document our application of metadata standards, so that future generations can understand our collections

6. We maintain file-level integrity of our digital collections, so that we can protect against loss and damage

7. We preserve original files in our long term repository, alongside any other required representations of the content, so that we maintain the original artefacts acquired or deposited into our care as a ground truth representation of the content for future, currently unknown, preservation and access scenarios

8. We maintain Preservation Master copies of collection content in our long term repository, so that the format-based risks of preservation over time are minimised

9. We maintain and implement preservation plans for our digital collections, so that preservation actions are reliable and based on a holistic understanding of the collections and their context

10. We implement comprehensive end-to-end workflows, so that we may consistently manage and preserve our digital collections across the entire lifecycle

11. We regularly monitor our digital collection content for emergent preservation risks, so that we may mitigate against them

12. We integrate quality assurance checks into the lifecycle where appropriate, so that the authenticity and integrity of the content is maintained

Maureen Pennock, 'The Twelve Principles of Digital Preservation (and a cartridge in a repository…)', British Library Collection Care blog (3 September 2013)

(Extensible, scalable, reusable) Review
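Principle 6, file-level integrity, is one that an individual researcher can approximate on a small scale by recording a checksum for each file and rechecking it later. The sketch below is an assumption about how that might look for a personal research folder, not a description of British Library practice; the function names are hypothetical, and SHA-256 is simply one widely available hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    directory = Path(directory)
    return {
        path.relative_to(directory).as_posix(): sha256_of(path)
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List files from `manifest` that are now missing or have different content."""
    current = build_manifest(directory)
    return sorted(name for name in manifest
                  if current.get(name) != manifest[name])
```

A manifest built today and stored alongside the data (as plain text, for the reasons given earlier in the talk) can be compared against the folder months later; any name returned by changed_files has been altered, damaged or lost since the manifest was made.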

Page 21: 2013 11-27 sustainable-history_slides

Page 22: 2013 11-27 sustainable-history_slides

Thank you!

James Baker

@j_w_baker

[email protected]