Seamus Ross (Keynote Address ECDL2007, Budapest) -1-
Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries[1]

Seamus Ross
HATII at the University of Glasgow, Digital Curation Centre (UK) and DigitalPreservationEurope (DPE)
Abstract

Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet research into their long-term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited. Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges (theoretical, methodological and technological) that need attention by researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries are to remain sustainable, authentic, accessible and understandable over time. Building on this work and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long-term viability at the centre of the global information society.
1 Introduction: The Significance and Scope of Digital Preservation

Good morning. It is a pleasure to return to another ECDL Conference and, in particular, to one in Budapest, one of my favourite cities.[2]

Libraries have long played a critical role in the creation and transmission of scientific knowledge and culture.[3] As they undergo a metamorphosis from the physical to the virtual, they continue to serve this role, although their nature and reach may be very different in the future. Increasingly, though, as institutions invest in developing digital libraries they come to recognise that the digital assets on which their library depends (their capital assets, so to speak) are fragile and may require substantial continued investment of finance and effort, if the holdings themselves are to remain accessible over the longer term.[4] In fact, there is a rising buzz within the information management communities about the preservation of digital objects. In the next forty-five minutes I am going to talk about the digital preservation challenge, about some of the concepts of archival science that might add value to the design and delivery of digital libraries, and about the research agenda for digital preservation. By my conclusion I hope that I will have stimulated discussion and encouraged more digital library researchers to contribute to addressing the challenges posed in the digital preservation research agenda.

[1] Please cite as: S. Ross (2007), Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries, Keynote Address at the 11th European Conference on Digital Libraries (ECDL), Budapest (17 September 2007). Seamus Ross, HATII at the University of Glasgow.
[2] Acknowledgements: I wish to extend my thanks to the Chairs of ECDL 2007, László Kovács, Norbert Fuhr and Carlo Meghini, for inviting me to deliver the Opening Keynote Address. I am grateful to my colleagues in HATII, Sarah Jones, Perla Innocenti and Andrew McHugh, and to Professors Ross Harvey (Visiting Professor at HATII at the University of Glasgow 2007 and Digital Curation Centre UK Research Fellow), Dagobert Soergel (College of Information Studies at the University of Maryland) and Helen Tibbo (School of Information and Library Science at the University of North Carolina at Chapel Hill, NC) for their comments on this paper. I am indebted to my colleagues in DigitalPreservationEurope (DPE) who are collaborating on developing the Research Roadmap and especially to Holger Brocks of the FernUniversität in Hagen.
[3] L. Casson, Libraries in the Ancient World, (New Haven, CT: Yale University Press, 2001); W. Hoepfner (ed.), Antike Bibliotheken, (Mainz: Philipp von Zabern, 2002); M. Battles, Library: An Unquiet History, (London: Vintage, 2004).
Digital objects break. Digital materials occur in a rich array of types and representations. They are bound to varying degrees to the specific application packages (or hardware) that were used to create or manage them. They are prone to corruption. They are easily misidentified. They are generally poorly described or annotated; they often have insufficient metadata attached to them to avoid their gradual susceptibility to syntactical and semantic glaucoma. Where they do have sufficient ancillary data, these data are frequently time constrained. Beyond maintaining the intactness of the bit-stream (which is fairly straightforward), the long-term curation and preservation of digital materials can best be described as a labour-intensive artisan or craft activity. While this approach may work well when the numbers of objects are small, the diversity of their types is restricted, their complexity narrow, and the scale of digital libraries limited, there is widespread agreement that the handicraft approach will not scale to support the longevity of digital content in the diverse and large digital libraries that are emerging.
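To illustrate why maintaining the intactness of the bit-stream counts as the tractable part of the problem, here is a minimal sketch of a fixity check in Python. It assumes that a repository records a SHA-256 digest when an object is ingested and recomputes it later; the function names `sha256_of` and `fixity_ok` are hypothetical and are not taken from any toolkit discussed in this paper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest (hypothetical workflow)."""
    return sha256_of(path) == recorded_digest.lower()
```

A check of this kind can show only that the bits are unchanged. It says nothing about whether the object can still be rendered, understood or trusted, which is where the labour-intensive work described above begins.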
Digital preservation is about more than keeping the bits, those streams of 1s and 0s that we use to represent information.[5] It is about maintaining the semantic meaning of the digital object and its content, about maintaining its provenance and authenticity, about retaining its interrelatedness, and about securing information about the context of its creation and use. Measured planning and the recognition that digital curation and preservation is a risk management activity at all stages of the longevity pathway are critical aspects of the preservation process.[6] In undertaking preservation planning and action, individuals and organisations must adopt a level of risk that reflects their preservation objectives and capabilities, both organisational and technical. Our approach to preservation must be variable and digital object responsive:

- for some materials held in digital libraries retaining the content will be a sufficient outcome;
- for other material we must also retain the environment and context of creation and use; and,
- for still other materials we must be able to reproduce the experience of use if we are to ensure that the right semantic representation and information is passed to the future.

[4] S. Ross, Reflections on the Impact of the Lund Principles on European Approaches to Digitisation, in Strategies for a European Area of Digital Cultural Resources: Towards a Continuum of Digital Heritage, (Den Haag: Dutch Ministry of Culture, 2004), 88-98. http://www.digitaliseringerfgoed.nl/sites/cultuurtechnologie/contents/i000263/ocw_conclusieboekje.pdf or http://eprints.erpanet.org/103/01/sross_denhaag_dutch_paper.pdf; S. Ross, Strategies for Selecting Resources for Digitization: Source-Orientated, User-Driven, Asset-Aware Model (SOUDAAM), in T. Coppock (ed.), Making Information Available in Digital Format: Perspectives from Practitioners, (Edinburgh: The Stationery Office, 1999), 5-27.
[5] S. Ross, Approaching Digital Preservation Holistically, in A. Tough and M. Moss (eds.), Information Management and Preservation, (Oxford: Chandos Press, 2006), 115-153; S. Ross, Changing Trains at Wigan: Digital Preservation and the Future of Scholarship, (London: British Library, National Preservation Office, 2000), ISBN 0712347178, http://portico.bl.uk/services/npo/pdf/wigan.pdf; S. Ross and A. Gow, Digital archaeology? Rescuing Neglected or Damaged Data Resources, (London & Bristol: British Library and Joint Information Systems Committee, 1999), ISBN 1900508516, http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf
[6] S. Ross and A. McHugh, Audit and Certification: Creating a Mandate for the Digital Curation Centre, Diginews, vol. 9, no. 5 (2005), http://www.rlg.org/en/page.php?Page_ID=20793#article1; S. Ross and A. McHugh, The Role of Evidence in Establishing Trust in Repositories, D-Lib Magazine,
July/August, vol. 12, no. 7/8 (2006), (also published in Archivi e Computer, August 2006), http://www.dlib.org/dlib/july06/ross/07ross.html
[7] This approach is most elegantly described in UNESCO [National Library of Australia], Guidelines for the Preservation of Digital Heritage, (Paris: UNESCO, 2003), http://unesdoc.unesco.org/images/0013/001300/130071e.pdf Indeed, we have done little to provide mechanisms to establish verisimilitude between initial and subsequent instantiations. A paper presented at ECDL 2007 by Lars Clausen of the Statsbibliotek in Denmark is a good example of the kind of work that needs to be done in this area: L. Clausen, Opening Schrödinger's Library: Semi-automatic QA Reduces Uncertainty in Object Transformation, in L. Kovács, N. Fuhr and C. Meghini (eds.), Research and Advanced Technology for Digital Libraries, 11th European Conference, ECDL 2007, LNCS 4675, (Berlin: Springer, 2007), 186-197.

As examples of these three classes of preservation, consider a digital library of literary texts, one of scientific reports linked to datasets, and finally a digital library of computer games. In all these cases each rendition of a digital object must carry the same force as the initial instantiation, sometimes erroneously labelled as the "original". As every instantiation is a performance representing a range of functions and behaviours, we need ways to assess the verisimilitude of each subsequent performance to the initial one and clear definitions of acceptable variance.[7]

Although we have, as yet, no statistically substantiated grounds for making this claim, access over time to digital objects appears closely correlated to their continuous use for business purposes, and to their perceived and actual recurring value. Recurring value arises from the use of digital objects for their evidential, information or commercial value. From an evidentiary perspective they might be used to:

- limit corporate liability;
- demonstrate primary rights to an idea, invention or property;
- meet compliance or regulatory requirements;
- achieve competitive advantage;
- facilitate education and learning; or
- support new scholarship.

Recurring value may result from the re-exploitation of materials through leasing them, their sale in new kinds of packaging or contexts, or their release in some new and unexpected way. Certain datasets that are regularly exploited for commercial or research purposes, such as meteorological, diagnostic (especially medical), digital maps, or biological datasets (e.g. genomic or protein databases) are likely to benefit from a level of persistent care that will ensure their longer-term accessibility. Recurring value has variable time depth and in some instances digital objects, like their analogue counterparts, go out of fashion or use and must survive very long time periods of what Professor Helen Tibbo of the University of North Carolina at Chapel Hill has called "benign neglect" before they become the subject of scholarly or commercial interest again.[8] As a result of the constant evolution of technology, the degradation of storage media and the ever-increasing pace of semantic drift, digital objects do not, in contrast to many of their analogue counterparts, respond well to benign neglect.
2 An Appreciation of the Problem

How widespread is the appreciation of the digital preservation problem? The answer is not encouraging. Just before ERPANET (a preservation activity funded under the European Commission's Fifth Framework Programme) ended in November 2004, it completed one hundred case studies involving companies and public sector organisations in an effort to investigate this question. Some seventy-eight of these case studies are publicly available on the ERPANET website.[9] These studies provide insights into current preservation practices in different European institutional, juridical and business contexts as well as across both the public and private sectors. The case studies and results are complemented by research conducted elsewhere, including but not limited to research by InterPARES;[10] a survey of fifteen National Libraries;[11] the DPE survey of archives and libraries in the EU Member States;[12] the AIIM surveys in 2004 and 2005;[13] the 2006 Digital Preservation Coalition UK survey Mind the Gap;[14] and surveys of national and local archives which Hans Hofman reported on in Enabling Persistent and Sustainable Digital Cultural Heritage in Europe.[15] Basically, as a result of the ERPANET Case Studies, it is safe to conclude that:

- awareness of the issues surrounding digital preservation varied markedly across organisations, and even across different divisions of the same organisation;
- few organisations took a long-term perspective and those that did were either national information curating institutions (e.g. archives) or institutions from telecommunications, pharmaceuticals and transportation sectors where failure to adopt best practices creates higher levels of regulatory risk exposure than in other sectors;
- an organisational strategic approach to preservation was rare;[16]
- the lack of preservation policies and procedures within organisations was an issue that still needs a lot of attention;[17]
- retention policies were not often noted but where they were, they too were not necessarily implemented across the entire organisation;[18]
- there was a general recognition that preservation and storage problems were aggravated by the complexity, diversity of types or formats, and size of the digital entities;
- costs were poorly understood;
- benefits to be derived from long-term preservation have proved elusive and arguments which might convince commercially minded business leaders of the benefits are restricted;[19]
- the value placed on the digital materials by organisations depended on how much the organisation relied on the material for business activity, with the highest value placed on information by organisations that either saw or depended on exploiting the potential re-use of information or identified the risks associated with its not being available;
- organisations were waiting for solutions to be delivered by technology developers, researchers and service providers.

[8] H.R. Tibbo, On the Nature and Importance of Archiving in the Digital Age, Advances in Computers, vol. 57 (2003), 1-67.
[9] ERPANET conducted around 100 case studies between 2002 and the end of 2004, of which seventy-eight are published on the ERPANET website and are forthcoming in S. Ross et al., ERPANET Case Studies in Digital Preservation (Glasgow, 2007). See also S. Ross, M. Greenan and P. McKinney, Digital Preservation Strategies: The Initial Outcomes of the ERPANET Case Studies, in the Preservation of Electronic Records: New Knowledge and Decision-making, (Ottawa: Canadian Conservation Institute, 2004), ISBN 0662686209, 99-111. ERPANET, with funding from the Swiss Federal Government and the European Commission (IST-2001-32706), led by the Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow (United Kingdom) and its partners the Schweizerisches Bundesarchiv (Switzerland), ISTBAL at the Università di Urbino (Italy) and Nationaal Archief van Nederland (Netherlands), worked between November 2001 and the end of October 2004 to enhance the preservation of cultural and scientific digital objects.
[10] http://www.interpares.org; L. Duranti (ed.), The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project, (San Miniato: Archilab, 2005).
[11] I. Verheul, Networking for Digital Preservation. Current Practice in 15 National Libraries, (München: K.G. Saur, 2006), IFLA Publication Series, http://www.ifla.org/VI/7/pub/IFLAPublicationNo119.pdf
[12] http://www.digitalpreservationeurope.eu
[13] AIIM (the Enterprise Content Management industry association) reports are only available to members or for a fixed fee, http://www.aiim.org/. The 2005 study Electronic Communication Policies and Procedures was conducted by AIIM and Kahn Consulting.
[14] M. Waller and R. Sharpe, Mind the Gap: Assessing Digital Preservation Needs in the UK, (York: Digital Preservation Coalition, 2006), http://www.dpconline.org/graphics/reports/mindthegap.html. In Section 3, Background and Methodology, of the Mind the Gap report we discover that the questionnaire on which this study was based had been sent to 900 individuals, but we are not told how they were selected. As a result, the reader has no information as to whether or not these 900 constitute a representative sample of any particular constituency and if so what community that is. The project team received 104 responses, a response rate of just over 10%. The authors do not explain whether or not the 10% is representative of the cohort of 900 and if so in what way it is representative of that cohort. The problems are further exacerbated by the way the results of the survey are subsequently used. For example, how are we to understand the sentence: In contrast, fewer than 20% of the organisations surveyed have some kind
of digital preservation strategy in place. Is this 20% of the 900 or 20% of the 104 that responded? Each interpretation would tell a very different story in my view. It is fair to say that in general these kinds of methodological problems are inherent in much of the survey-based research that is done in electronic records management, digital preservation and data curation.
[15] H. Hofman and M. Lunghi, Enabling Persistent and Sustainable Digital Cultural Heritage in Europe: The Netherlands Questionnaire Responses Summary and Position Paper, presented at Towards A Continuum of Digital Heritage: Strategies for a European Area of Digital Cultural Resources, (2004), http://www.minervaeurope.org/publications/globalreport/globalrepdf04/enabling.pdf
[16] ERPANET Case Studies, (Glasgow, 2004), http://www.erpanet.org
[17] ERPANET, Policies for Digital Preservation, ERPANET Training Seminar, Paris, 29-30 January 2003, (Glasgow, 2003), http://www.erpanet.org/events/2003/paris/ERPAtrainingParis_Report.pdf, 16.
[18] The findings of ERPANET in Europe are also borne out by evidence in the USA. In legal cases involving the securities and financial sectors more generally staff often report that they were ill-advised about how they should handle records. For instance, In re Old Banc One Shareholders Securities Litigation, 2005 US Dist. LEXIS 32154 (N.D. Ill., 8 December 2005), "[b]ank employees testified they did not know missing documents should have been retained, and the bank did not inform employees of the need to retain documents for this litigation or have employees read and follow the electronic version of the policy that was established."
[19] ERPANET, Business Models Related to Digital Preservation, (Glasgow, 2003) http://www.erpanet.org/events/2004/amsterdam/Amsterdam_Report.pdf, 17.
Preservation of digital materials is a dynamic and evolving process: the methods are changing, as are the technical requirements. It is hard, and the hype surrounding digital preservation has made it even harder. We might wonder what twenty years of digital preservation research can offer to digital libraries: I fear precious little of any real value.

As I have argued elsewhere, during this period members of the archives, library, records management and research communities have worked relentlessly to create an agitating buzz about things digital.[20] Indeed, where preservation is concerned, the risk amplifiers have taken the high ground from risk attenuators, as is evident from the growth in the number of publications, conferences and conference presentations during the past ten years that stress how essential it is that we overcome the obstacles to the longevity of digital materials. Through our discussions we have socially amplified the perception of risks associated with digital entities,[21] but mainly within our own community. It would seem appropriate to conclude that we have done this with the best of intentions. As curators of our cultural and scientific memory we want to ensure that we pass our information heritage to future generations in viable form. We recognise that the accountability of individuals and public and private institutions in the digital age depends on the preservation of digital materials. We acknowledge that reuse over time of digital materials will produce opportunities for the growth of creative and knowledge economies. We know that, as the transition from in vitro to in silico science gathers pace, the longer-term viability of this new scientific paradigm requires that we curate digital materials in ways that ensure their reusability. While we might conclude that a small band of agitated buzz-makers have alone socially constructed our views of preservation risk, we know from other domains that the process of establishing risk perceptions involves complex social and cultural processes and depends on more than just the actions of individuals.[22] Indeed, as a result we might even mistakenly conclude that, in creating an agitating buzz about things digital, individuals within the preservation community have in a postmodern sense socially constructed the impression and notions of preservation risk without a basis in reality.
Nothing could be further from the truth. Preservation risk is real. It is technological. It is social. It is organisational. And it is cultural. In truth, our heritage may now be at greater risk because many in our community believe that we are making progress towards resolving the preservation challenges. If, as I have done elsewhere, one is to contrast two classic statements of the digital preservation challenges, Roberts 1994 and Tibbo 2003, it is obvious that, although our understanding of the challenges surrounding digital preservation has become richer and more sophisticated, the approaches to overcoming obstacles to preservation remain limited.[23] Ross Harvey's comprehensive examination of the landscape of preservation, Preserving Digital Materials,[24] similarly points to only a few implemented preservation methods, and the preservation approaches he examines appear to be best characterised as handicraft. The preservation community has not yet carried out sufficient underlying experimental and practical research either to deliver the range of preservation methods and tools necessary to support preservation activities or to provide us with sufficient data to reason effectively about preservation risks or how to manage them. We need to be able to reason about preservation risks in the same way as, say, an engineer might do in the construction industry, or a transport safety expert might, or an epidemiologist in a hospital might.[25] While the work that DigitalPreservationEurope (DPE),[26] the Digital Preservation Cluster of the DELOS NoE[27] and the Digital Curation Centre (UK)[28] have done in the risk management area, such as the development of the DRAMBORA[29] (Digital Repository Audit Method Based on Risk Assessment) toolkit which enables organisations to reason about risk at the repository level, is worthy of mention, we need similar tools to reason about risk at the object level as well.

3 Digital Libraries and Archival Science

Scientific communication and in silico research required a new mechanism for managing its scholarly production, dissemination and preservation. Digital Libraries appeared as a solution; there are lots of them in the realm of scientific communication: the ACM,[30] IEEE,[31] Springer[32] or Elsevier[33] digital libraries come to mind. But what exactly is a digital library? As I am certain that not all of us would agree on the same definition, I am going to use one that I prepared for the National Library of New Zealand as part of a review of their digital preservation initiatives, which as a result emphasises preservation. For our purposes here, let us think of a digital library as:

    the infrastructure, policies and procedures, and organisational, political and economic mechanisms necessary to enable access to and preservation of digital content.[34]

[20] S. Ross, Uncertainty, Risk, Trust and Digital Persistency, NHPRC Electronic Records Research Fellowships Symposium Lecture, University of North Carolina at Chapel Hill, (2006).
[21] R.E. Kasperson, O. Renn, P. Slovic, H.S. Brown, J. Emel, R. Goble, J.X. Kasperson and S. Ratick, The Social Amplification of Risk: A Conceptual Framework, Risk Analysis, vol. 8, no. 2 (1988), 177-187. See also R.E. Kasperson, The Social Amplification of Risk: Progress in Developing an Integrative Framework, in S. Krimsky and D. Golding (eds.), Social Theories of Risk (Ch. 6), (Westport, CT: Praeger, 1992).
[22] N. Pidgeon, R.E. Kasperson and P. Slovic, The Social Amplification of Risk, (Cambridge: Cambridge University Press, 2003).
[23] D. Roberts, Defining Electronic Records, Documents and Data, Archives and Manuscripts, vol. 22 (May 1994), 14-26; H.R. Tibbo, On the Nature and Importance of Archiving in the Digital Age, Advances in Computers, vol. 57 (2003), 1-67.
[24] R. Harvey, Preserving Digital Materials, (München: K.G. Saur, 2005). A reading of U.M. Borghoff, P.
Rödig, J. Scheffczyk and L. Schmitz, Long-Term Preservation of Digital Documents: Principles and Practices, (Heidelberg: Springer, 2003) gives the same impression.
[25] S. Ross, (2006) Uncertainty, Risk, Trust and Digital Persistency, (see above note 20).
[26] http://www.digitalpreservationeurope.eu
[27] http://www.dpc.delos.info
[28] http://www.dcc.ac.uk; C. Rusbridge, P. Burnhill, S. Ross, P. Buneman, D. Giaretta, L. Lyon and M. Atkinson, The Digital Curation Centre: A Vision for Digital Curation, in Proceedings IEEE's Mass Storage and Systems Technology Committee Conference on From Local to Global: Data Interoperability Challenges and Technologies, (2005); an online version is at: http://eprints.erpanet.org/archive/00000082/01/DCC_Vision.pdf
[29] A. McHugh, R. Ruusalepp, S. Ross and H. Hofman, Digital Repository Audit Method Based on Risk Assessment, Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE), (2007), http://www.repositoryaudit.eu
[30] http://portal.acm.org/dl.cfm
[31] http://ieeexplore.ieee.org/Xplore/login.jsp?url=/Xplore
[32] http://www.springerlink.com
[33] http://www.elsevier.com/
[34] S. Ross, Digital Library Development Review, (Wellington, NZ: National Library of New Zealand, 2003), http://www.natlib.govt.nz/files/ross_report.pdf, 5. This definition is broad enough to encompass the new classes of digital libraries, such as YouTube (http://www.youtube.com) and Flickr (http://flickr.com), which are interactive, participatory, dynamic and user-driven.
But if we think more carefully about digital libraries we easily observe that they may be libraries by name, but they are archives by nature. The content they hold does not really need to be held elsewhere because net-based services mean it can be provided from a single source wherever and whenever it is wanted. Digital libraries, therefore, can hold unique exemplars. When users access the content from these domains they expect to be able to trust and verify its authenticity (although not necessarily its reliability), they require knowledge of its context of creation, and they demand evidence of its provenance. These are processes to which archives respond well because they have developed an appropriate theoretical framework and have operationalised it in repository design, management and use over at least three centuries. The archival framework meets requirements surrounding the production, management, selection, dissemination, preservation and curation needs of information. It also supports a layering of services from repository services at the foundation to user services at upper levels. While these notions originate in the world of archival science, they equally well belong to the world of digital libraries.
Modern archival science began in the 17th century with the development of diplomatics.[35] Much of modern archival practice developed in the same early modern period in response to the need to manage distant conquests and distributed transnational trading companies and economies.[36] Beginning in the late 16th century there was an unprecedented information and documentary explosion and this trend has continued into the digital age. Over three centuries archival practice and science have responded well to the changing environment of information production and use. Their core principles of authenticity, trust, context, provenance, description and arrangement, and repository design and management evolved during this period and have become more and more refined as the communication and information production and use landscapes have evolved. Others such as appraisal have emerged more recently. While an effort to define a formal foundation for digital libraries within the context of archival science would require an exploration of each of these notions in turn, today I shall touch only on three topics: diplomatics as a tool, the concepts of authenticity, and provenance.
Digital library users might wish to know where the digital materials came from, who created them, why they were created, where they were created, how they were created, how they came to be deposited, how they were ingested (e.g. under what conditions, using what technology, how the success of the ingest was validated), and they may need information as to how the digital object was maintained after its acquisition by the digital library (e.g. was it maintained in a secure environment? have changes in hardware and software had an impact on the digital object in question?). If they were to require or seek such data, they could legitimately expect to be able to acquire this information relatively easily. Their need for this knowledge increases in line with the increase in the time between the point at which the digital object was created and deposited in the digital library and when it comes to be used. Diplomatics, a core tool in archival science, provides the theoretical framework to investigate such questions.
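The questions a user might ask can be read as the fields of a record that a digital library would need to hold alongside each object. The following is a minimal sketch in Python, offered only as an illustration: the class name `ProvenanceRecord`, its field names and the `unanswered` helper are all hypothetical and do not correspond to any metadata standard or system named in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """One field per question listed above; None means the library cannot answer it."""
    source: Optional[str] = None              # where the materials came from
    creator: Optional[str] = None             # who created them
    purpose: Optional[str] = None             # why they were created
    place_of_creation: Optional[str] = None   # where they were created
    method_of_creation: Optional[str] = None  # how they were created
    deposit_history: Optional[str] = None     # how they came to be deposited
    ingest_conditions: Optional[str] = None   # under what conditions they were ingested
    ingest_technology: Optional[str] = None   # using what technology
    ingest_validation: Optional[str] = None   # how ingest success was validated
    custody_events: List[str] = field(default_factory=list)  # maintenance after acquisition

    def unanswered(self) -> List[str]:
        """Return the names of the questions for which no information is held."""
        return [name for name, value in vars(self).items() if not value]
```

Calling `unanswered()` on a sparsely populated record gives a rough picture of how much contextual information a user would be unable to obtain, a gap that becomes harder to close as the time since creation and deposit grows.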
[35] J. Mabillon, De re diplomatica libri VI, (Paris: Sumtibus Caroli Robustel, 1709). (Although originally issued in 1681 I am familiar only with the revised edition of 1709.)
[36] One need only think of the 80 million pages of documents in the Archivo General de Indias (Seville) representing the records from the Conquistadores to the end of the 19th century or the 14 kilometres of records of the East India Company beginning in 1600 (and its various reincarnations after 1858) to see the scale on which documents were being created during the period (see http://www.bl.uk/collections/iorgenrl.html).
Intheirrigour,transparencyandmethodologicalprecision,themethodsofJeanMabillon,theBenedictinemonkwho solidified the foundationsofdiplomatics,mirror thoseof thegenerallybetter known scientific giants who were his contemporaries, including RobertBoyle,EdmondHalley,RobertHooke,AntonivanLeeuwenhoek,MarcelloMalpiqhiand,
of course, Isaac Newton.37
The information object domains to which theorists andpractitionershaveapplieddiplomaticshaveevolvedsince theearly thinkingofMabillonand Papenbroeck38. Early scholars, such as Thomas Madox, felt diplomatics was mostappropriately applied to instruments such as charters.39 For nearly two centuries theprevailing intellectual wind, as represented in manuals of diplomatic practice andintroductions to what we regard now as classic studies of documents, held that theconcepts of diplomatics should really only be applied to juridical documents theconservative view consistently reigned in morebroadminded thinking.40 But during the20th century attitudes firmly changed. For instance, Georges Tessier, Professor ofDiplomaticsatLcoleNationaledesChartesfrom1930until1961,arguedthatdiplomaticswasapplicabletoallclassesofdocumentsandnotjustjuridicalones.41Thisviewhasbeen
increasinglyadoptedbyotherscholars.LucianaDuranti,ProfessorofArchivalScienceattheUniversityofBritishColumbia,whohaspioneeredtherevitalisationofdiplomaticsforthe digital age, has argued for its relevance to electronic records.42 Indeed, through herleadership of InterPARES 1 and 2 she has led abroadening of the conceptualisation ofrecords from including recordsproducedand/ormaintained indatabasesanddocumentmanagementsystems to recordsproducedand/ormaintained in interactive,experientialanddynamicenvironments.43Durantihastherebybroadenedthetypesofobjectstowhichdiplomaticscouldbeeffectivelyapplied.LeonardBoyle,theeminentscholarand,rumourhasit,reluctantPrefectoftheVaticanLibrary,inanelegantessaythatIfirstreadjustthirtyyears ago this month while an undergraduate and that remains the finest succinctdiscussionofdiplomaticsofwhich Iamaware,argued: it seemsmuchmore realistic
andfarlesspreciousandselectivetodescribediplomaticsasthescholarlyinvestigationof
37 Jean Mabillon (b. 1632, d. 1707) and Daniel van Papenbroeck (b. 1628, d. 1714). Contemporaries: Robert Boyle (b. 1627, d. 1691), Edmond Halley (b. 1656, d. 1742), Robert Hooke (b. 1635, d. 1703), Antoni van Leeuwenhoek (b. 1632, d. 1723), Marcello Malpighi (b. 1628, d. 1694), and of course Isaac Newton (b. 1643, d. 1727).
38 The groundbreaking work of Daniel van Papenbroeck (b. 1628, d. 1714) is worthy of discussion, but time does not permit me to do it justice here.
39 Thomas Madox, Formulare Anglicanum, (London, 1702).
40 The views of J. Ficker, Beiträge zur Urkundenlehre, (Innsbruck, 1877-8) are a good example of this kind of conservative thinking.
41 G. Tessier, La diplomatique, (Paris: Presses universitaires de France, 1952), (3rd edition, 1966). Georges Tessier, Diplomatique, in L'Histoire et ses méthodes, in Charles Samaran (ed.), Encyclopédie de la Pléiade 11, (Paris, 1961), 633-76.
42 L. Duranti, The long-term preservation of accurate and authentic digital data: the InterPARES project, Data Science Journal, vol. 4 (2005), 106-118 (http://www.jstage.jst.go.jp/article/dsj/4/0/4_106/_article); L. Duranti, Concepts and Principles for the Management of Electronic Records, The Information Society. An International Journal, vol. 17 (2001), 1-9; L. Duranti, Diplomatics: New Uses for an Old Science. Part VI, Archivaria, vol. 33 (1991-92), 6-24; Diplomatics: New Uses for an Old Science. Part V, Archivaria, vol. 32 (1991), 7-24; Diplomatics: New Uses for an Old Science. Part IV, Archivaria, vol. 31 (1990-1), 10-25; Diplomatics: New Uses for an Old Science. Part III, Archivaria, vol. 30 (1990), 4-20; Diplomatics: New Uses for an Old Science. Part II, Archivaria, vol. 29 (1989-90), 4-17; Diplomatics: New Uses for an Old Science, Archivaria, vol. 28 (1989), 7-27.
43 L. Duranti and K. Thibodeau, The Concept of Record in Interactive, Experiential and Dynamic Environments: the View of InterPARES, Archival Science, vol. 6, no. 1 (2006), 13-68 (Online: http://dx.doi.org/10.1007/s1050200690217)
any and every written documentary source, juridical, quasi-juridical, or non-juridical.44 Moreover, there is no reason to limit its applicability to information objects represented as physical documents; it can equally well be applied to all information objects held in a digital library, whether still or moving images, audio, vector graphics, and data (and even data held in databases). Broadly speaking, diplomatics provides a critical apparatus to study any information object and this process was encapsulated for Boyle in seven mechanisms to investigate the veracity of an information object: quis?, quid?, quomodo?, quibus auxiliis?, cur?, ubi? and quando?45
Diplomatics assists us to assess a digital object's provenance, which relates its origin, lineage or pedigree. Provenance is central to archival practice and to our ability to validate, verify and contextualise digital objects.46 Within the archival context the significance of knowledge about provenance came to be reflected in how documents were managed. So, archivists beginning in the late 18th and early 19th century rejected approaches to the organisation of information objects along such lines of pertinence as subject, content and physical place of creation in favour of respecting the environment of creation and the original order in which the documents had been created and used.47 To be just a little more precise, the significance of provenance within archival practice emerged not merely in response to the flood of documents that were arriving at the doors of archives, but from a combination of experience, the cultural milieu of the period which emphasised classificatory practices and evolutionary thinking, and a belief by historians that if material was to be retained in its original order researchers would be able to hear the voices in the documents more accurately, more richly and with a more precise semantic appreciation.48 As Michael Roper, former Keeper of the Public Record Office (London), put it, the provenance or context of archives remained a vital means of assessing the source, authority, accuracy and value of the information which they contained for administrative, legal [...] research and cultural uses.49 In fact, provenance is of critical importance to another archival concept, that of appraisal,50 where the disposition of digital objects is
44 L.E. Boyle, Diplomatics, in J.M. Powell (ed.), Medieval Studies: An Introduction, (Syracuse, NY: Syracuse University Press, 1976), 69-101, at p. 75.
45 Boyle (1976), 79-90. Had, for example, Hugh Trevor-Roper (Lord Dacre) adhered to these principles of analysis, which depend upon asking questions about who, what, in what manner (e.g. form, formulae, style), with what support, aid or help, why (e.g. what purpose), where, and when, when he acted as a member of the group engaged to determine the authenticity of the Hitler Diaries in 1983, he might not have been led astray. One could cite dozens of other examples, including some in which the materials in question were held within archives. When these principles are applied they can assist scholars, as is evident in the study by L. Berlin and H. Craig Casey, Robert Noyce and the Tunnel Diode, IEEE Spectrum, (May 2005), 42-45. See especially page 43 where the authors describe the process of validating copies made from pages of Noyce's laboratory notebooks.
46 See the essays in K. Abukhanfusa and J. Sydbeck (eds.), The Principle of Provenance: Report from the First Stockholm Conference on Archival Theory and the Principle of Provenance (2-3 September 1993), Skrifter utgivna av Svenska Riksarkivet 10 (1994), ISBN: 9188366111.
47 N. de Wailly (1841). See M. Duchein, Theoretical Principles and Practical Problems of Respect des fonds in Archival Science, Archivaria, vol. 16 (Summer 1983), 64-82.
48 S. Muller, J.A. Feith and R. Fruin, Handleiding voor het ordenen en beschrijven van archieven, (Groningen, 1898).
49 M. Roper, Archival Theory and the Principle of Provenance: a Summing-up, in K. Abukhanfusa and J. Sydbeck (eds.), (1994), 187.
50 Ross Harvey, Appraisal and Selection, in S. Ross and M. Day (eds.), DCC Digital Curation Manual, (Glasgow: Digital Curation Centre, 2006), http://www.dcc.ac.uk/resource/curationmanual/chapters/appraisalandselection provides an excellent introduction to the issues of appraisal.
determined. Of course, in the digital age knowledge of provenance continues to be essential, as Peter Buneman and his colleagues at the University of Edinburgh have argued in the context of databases.51 In the flexible digital libraries (and digital archives for that matter) we can both retain the knowledge of provenance at all levels of granularity and even repackage the entities along the lines of pertinence if this is required to meet specialised user needs or expectations.
Digital preservation aims to ensure the maintenance over time of the value of digital entities. As the research of the InterPARES Task Force on Authenticity concluded, "[w]hen we work with digital objects we want to know they are what they purport to be and that they are complete and have not been altered or corrupted."52 These twin concepts are encapsulated in the terms authenticity and integrity.53 Digital objects that lack authenticity and integrity have limited value as evidence or even as a source for information. As digital objects are more easily altered and corrupted than, say, paper documents and records, creators and preservers often find it challenging to demonstrate their authenticity. How many of us would be comfortable if our doctor were to use a clinical trials data set in which he/she could not verify the authenticity of the materials it contained to plan a regime of treatment? The ability to establish authenticity of, and trust in, a digital object is crucial.54 A well-documented chain of custody is one factor that helps with establishing authenticity.55
Authenticity has become a 21st century challenge that reaches into every corner of modern life. Of course, authenticity means different things to different communities; indeed, even within a single domain its meaning can vary from rigid to flexible, as a contrast between the Warhol Foundation approach to validating authorship in Warhol works56 and the judgement in the UK legal case of Thomson vs Christie's demonstrates for the art world.57
51 P. Buneman, A. Chapman and J. Cheney, Provenance Management in Curated Databases, in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, (Chicago, IL: 2006), 539-550. http://portal.acm.org/citation.cfm?doid=1142473.1142534; P. Buneman, S. Khanna and W.C. Tan, Why and Where: A Characterization of Data Provenance, in 8th International Conference on Database Theory (ICDT 2001), (2001), 316-330, http://www.springerlink.com/content/edf0k68ccw3a22hu/; P. Buneman, S. Khanna, K. Tajima and W. Chiew Tan, Archiving Scientific Data, ACM Transactions on Database Systems (TODS), vol. 29 (2004), 2-42, http://portal.acm.org/citation.cfm?doid=974750.974752
52 InterPARES Authenticity Task Force, Authenticity Task Force Report in The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project, (Vancouver, 2004), http://www.interpares.org/book/index.cfm
53 L. Duranti, Reliability and Authenticity: The Concepts and Their Implications, Archivaria, vol. 39 (Spring, 1995), 5-10.
54 S. Ross, Position Paper on Integrity and Authenticity of Digital Cultural Heritage Objects, Integrity and Authenticity of Digital Cultural Heritage Objects, DigiCULT Thematic Issue 1, (2002), 7-8; also available at http://www.digicult.info
55 From the point of view of the police this is seen in: The National Hi-Tech Crime Unit produced for the Association of Chief Police Officers, (n.d.), Good Practice Guide for Computer Based Electronic Evidence, (version 3.0), http://www.acpo.police.uk/asp/policies/Data/gpg_computer_based_evidence_v3.pdf
56 For example, R. Brooks, Worthless Warhol Alarms Art World, Timesonline, 22 January 2006, http://www.timesonline.co.uk/tol/news/uk/article717330.ece
57 In the case of Thomson vs Christie's, 70% certainty that an object was what Christie's claimed it to be was good enough for the presiding judge, http://news.bbc.co.uk/2/hi/uk_news/england/norfolk/3727623.stm; S.G. Vyas, Is There an Expert in the House? Thomson v. Christie's: The Case of the Houghton Urns, International Journal of Cultural Property, vol. 12 (2005), 425-441.
The inability to separate the authentic from the inauthentic in the case of counterfeit drugs is creating a global public health problem causing death, disability and injury58 and the continuing growth in the production of such counterfeit products as handbags, trainers and watches raises concerns over the protection of intellectual property rights and economic returns. At the heart of establishing authenticity lies trust and this is an area where we are just beginning to understand the issues.59
We live in a postmodernist world and, as the innovative archival theorist, Terry Cook, has poignantly noted: "The postmodernist tone is one of ironical doubt, of trusting nothing at face value, of always looking behind the surface."60 Authenticity is a topic that could be the subject of much new research at both practical and theoretical levels; here we can only draw attention to the issue from the perspective of the user:
How does a user know that a digital object is an authentic instantiation of the version that was ingested (e.g. deposited) into the digital library? What tools will a user need to have at her/his disposal in this world of digital diplomatics if the user is to be able to make an independent judgement about authenticity?61 What information, functions and services should the digital library provide to enable the user to be able to authenticate a digital object?
Confronted with digital objects, those of us who were engaged in the InterPARES 1 Taskforce on Authenticity concluded that most users begin from a position of presuming that if an object is said to be authentic by the supplier then it is (a Presumption of Authenticity). Unless some evidence emerges that causes them to question the authenticity of an object, users generally assume that, because the object is held by an archives or a library, its authenticity is beyond question.
There are few ways that a user could even begin to determine whether a digital object is what it purports to be where they lack access to the details of the process by which the digital object was created, ingested and managed. They can only do this if institutions have adequately and transparently documented the processes of digital entity ingest, management and delivery.
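One small part of such documentation can be made concrete. The sketch below is illustrative only and is not a method proposed in this address or by InterPARES: it assumes a repository records a SHA-256 digest for an object at ingest and appends every later custody event to a log, so that a user can check a delivered copy against the documented record. The names CustodyEvent, IngestRecord and verify_delivery are hypothetical. A matching digest speaks only to integrity (the bits are unchanged since the last documented event); it says nothing about whether the object was what it purported to be when it was deposited.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CustodyEvent:
    """One documented step in an object's chain of custody (hypothetical)."""
    action: str      # e.g. "ingest", "migration", "delivery"
    agent: str       # who or what carried out the action
    timestamp: str   # ISO 8601, UTC
    digest: str      # SHA-256 of the object after the action


@dataclass
class IngestRecord:
    """What the repository documents about a deposited object (hypothetical)."""
    identifier: str
    events: list = field(default_factory=list)

    def log(self, action: str, agent: str, content: bytes) -> CustodyEvent:
        """Append a custody event carrying the digest of the current content."""
        event = CustodyEvent(
            action=action,
            agent=agent,
            timestamp=datetime.now(timezone.utc).isoformat(),
            digest=hashlib.sha256(content).hexdigest(),
        )
        self.events.append(event)
        return event


def verify_delivery(record: IngestRecord, delivered: bytes) -> bool:
    """True if the delivered bytes match the digest of the latest logged event."""
    if not record.events:
        return False  # nothing documented, so nothing to verify against
    return hashlib.sha256(delivered).hexdigest() == record.events[-1].digest


record = IngestRecord(identifier="object-0001")
original = b"minutes of the meeting, 17 September 2007"
record.log("ingest", "depositor", original)

print(verify_delivery(record, original))         # unchanged copy -> True
print(verify_delivery(record, original + b"!"))  # altered copy -> False
```

The empty-log case returns False deliberately: where the institution has documented nothing, the user has no basis for an independent judgement.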
58 WHO, Combating Counterfeit Drugs: A Concept Paper for Effective International Cooperation, World Health Organization, Health Technology and Pharmaceuticals, Combating Counterfeit Drugs: Building Effective International Collaboration, International Conference Rome, Italy, 16-18 February 2006, (drafted by Michele Forzley), (Rome, 27 January 2006) http://www.who.int/medicines/events/FINALBACKPAPER.pdf, p. 1. See also: http://www.who.int/medicines/services/counterfeit/overview/en/ and FDA, Combating Counterfeit Drugs: A Report of the Food and Drug Administration, (Rockville, MD: FDA, 2004), http://www.fda.gov/oc/initiatives/counterfeit/report02_04.html
59 H. MacNeil, Providing Grounds for Trust: Developing Conceptual Requirements for the Long-term Preservation of Authentic Electronic Records, Archivaria, vol. 50 (2000), 52-78; H. MacNeil, Providing Grounds for Trust II: The Findings of the Authenticity Task Force of InterPARES, Archivaria, vol. 54 (2002), 24-58.
60 T. Cook, Archival Science and Postmodernism: New Formulations for Old Concepts, Archival Science, vol. 1, no. 1 (2000), 3-24.
61 The technologies to assist with digital forensics are emerging, as W. Wang and H. Farid, Exposing Digital Forgeries in Interlaced and De-interlaced Video, IEEE Transactions on Information Forensics and Security, vol. 2, no. 3 (September 2007), 438-449 shows.
Without wishing to confuse the issues at this stage, it is worth recognising the distinction between authentic and reliable information.62 Not all authentic material held by a digital library need be reliable. Once material comes to be held in a digital library or repository it must be immutable if we are to accept it as authentic. In fact, many digital libraries contain unreliable information, but even unreliable data can tell its own story if its provenance, context and purpose can be ascertained. Additionally, we might raise the issue of content quality in terms of digital libraries; quality is a property of digital objects that needs attention alongside authenticity and reliability.63
We are just coming to grips with archival science and diplomatics as components of a theory of information object management and a foundation for digital libraries. A growing number of researchers are joining this discussion, and the debate is likely to become more and more lively.
4 Research Agenda
Given the core dependency of digital libraries on guaranteeing the authenticity, integrity, interpretability and context of the digital material across systems, time and context, digital preservation/curation action must be at the heart of any future digital library research agenda. If digital libraries are to function in this new technological environment, they will need to be transparent, accessible, and responsive to user needs and expectations. Contemporary research in digital libraries tends to emphasise such research topics as personalisation, architecture, representation, retrieval, presentation and access. The investigation of digital preservation, by contrast, has been limited. My impression from browsing through the past five years (2002-2006) of proceedings from ECDL and JCDL is that most digital library research tends to focus on the here and now. The addition of a digital preservation cluster to the DELOS Network of Excellence was a visionary move by Costantino Thanos and Vittore Casarosa (Istituto di Scienza e Tecnologie dell'Informazione, ISTI, Consiglio Nazionale delle Ricerche, CNR, at Pisa); it reflected their recognition that digital libraries were not just about communicating with the present but that they are mechanisms to facilitate communication with the future.64 Until recently, however, preservation has not been seen as central to digital library design and development. Those of us working in the team ably led by Donatella Castelli (also of ISTI at CNR Pisa) to develop the DELOS Digital Library Reference Model are only just coming to grips with how to incorporate
62 See the work of InterPARES 1, http://www.interpares.org/ip1/ip1_index.cfm
63 D.M. Strong, Y.W. Lee and R.Y. Wang, Data quality in context, Communications of the ACM, vol. 40, issue 5 (May 1997), 103-110. DOI= http://doi.acm.org/10.1145/253769.253804; A. Martinez and J. Hammer, Making quality count in biological data sources, IQIS '05: Proceedings of the 2nd international workshop on Information quality in information systems, (Baltimore, MD: ACM Press, 2005), ISBN 1595931600, 16-27. DOI= http://doi.acm.org/10.1145/1077501.1077508. Of course, as A. Even and G. Shankaranarayanan have demonstrated, the same data may be assessed by different users to have different degrees of data utility depending upon context of use (Utility-driven assessment of data quality, SIGMIS Database, vol. 38, no. 2 (May 2007), 75-93. DOI= http://doi.acm.org/10.1145/1240616.1240623).
64 DELOS: Network of Excellence on Digital Libraries (G038507618) funded under the European Commission's 6th Framework IST Programme, http://www.delos.info and http://www.dpc.delos.info
digital preservation landscape and to come up with a research agenda that might be taken forward under the Seventh Framework Programme of the European Commission, as well as at national levels within the Member States of the European Union. Based on an extensive crosswalk of existing preservation research agendas, the DPE Research Roadmap's objective is to provide a concise overview of the core issues that have to be addressed in future digital preservation research.67 To construct the framework, my colleague Holger Brocks (of the FernUniversität in Hagen) led participants in the DPE Research Roadmap Working Group (RAWG) to examine the challenges of preservation from five vantage points: digital object level, collection level, repository level, process level and organisational environment, which also encapsulates creation and use. So, for instance, at the object level we focus on migration, emulation, experimentation and acceptable loss; at the collection level we examine interoperability, metadata and standardisation; and at the process level we look at issues such as automation and workflow.
First and foremost, the DPE Research Agenda responds to the lack of progress made in the delivery of preservation solutions, methods and techniques over the past twenty years. Secondly, it recognises that, as those working in the discipline came to better understand the preservation obstacles, they extended the research domain into areas that were originally peripheral to digital preservation. This has actually hampered progress because it has fragmented research activity much too broadly. In response, DPE has proposed narrowing the research agenda and argued that as a research community we must capitalise on ancillary work carried out in other domains such as semantic-enabled information infrastructures, grid-based resources and service-oriented architectures. The DPE team have agreed that there are really nine themes that should characterise our research in preservation. These nine themes also bring digital preservation in line with traditional preservation activities in the analogue world. In addition, there is one core methodological approach that researchers in preservation need to adopt.
The nine themes are:
1. Restoration. Digital objects break. This can occur when storage media become damaged, software and hardware become obsolete, applications become inaccessible either through loss of access or through technological developments, or bit streams become corrupt. When they break and they are unique and valuable, they must be restored. What processes can we use to ensure the syntactical completeness of digital objects and what methods will enable us to address semantic opaqueness? Computer forensics research has led to some restoration methods,68 but we need more experimental research in this area to develop effective and user-friendly restoration technologies. How
Powell, A Digital Repositories Roadmap: Looking Forward (2006), http://www.eduserv.org.uk/upload/foundation/pdf/reproadmapv15.pdf; N. Beagrie, e-Infrastructure Strategy for Research: Final Report from the OSI Preservation and Curation Working Group, (Edinburgh: National e-Science Centre, November 2006, but published in 2007), http://www.nesc.ac.uk/documents/OSI/preservation.pdf
67 DigitalPreservationEurope, DPE Digital Preservation Research Roadmap (2007), http://www.digitalpreservationeurope.eu/publications/dpe_research_roadmap_D72.pdf
68 Companies such as OnTrack Data Recovery (http://ontrackdatarecovery.com) or DriveSavers (www.drivesavers.com) have developed a rich array of data recovery technologies. The methods and processes are getting better, as Scott Gaidano, co-founder of DriveSavers, points out: "eight years ago [1997], 50 percent of our drives could not be restored. Now up to 90 percent of the data can be salvaged from 85 to 90 percent of drives," E.A. Taub, Bad habits keep data recovery firms alive, International Herald Tribune, 16-17 July 2005, 14.
do we verify the completeness of a restored digital object? What is an acceptable level of loss at different syntactical and semantic levels? How do we restore content, context and experience?
2. Conservation. Whereas restoration offers ways to handle objects that have become severely damaged or exist only in fragmentary form, methods for conservation enable us to address challenges that may arise with digital entities before the damage has become too severe, much as we might conserve a post-1830s printed book by de-acidifying it before brittle book syndrome takes hold, or adopt preventive medicine. Transcoding, migration, emulation, virtualisation, information extraction, metadata enhancement, and semantic annotation technologies are all examples of methods that we might deploy to facilitate the conservation of digital objects. Here again there are few methods that we can take off the shelf; we simply have not done the research.
3. Collection and repository management. Operational and organisational research into the management of digital objects, collections and repositories is needed. Research needs to focus on planning, enacting, executing, managing and monitoring of organisational processes for repositories. For example, how do we construct collections in the digital age? What kinds of service layers do users of digital libraries require and how will these be maintained over time?
4. Preservation as risk management. We have argued elsewhere that digital preservation is a risk management problem.69 Hence, decision-making instruments are needed which will enable digital preservation practitioners to translate the uncertainties involved in digital preservation into quantifiable risks that can be managed.
5. Preserving the interpretability and functionality of digital objects. Our understanding of the properties that digital objects must retain over time if the objects are to remain semantically meaningful, authentic, reliable and usable, whether for rendering or analysis, remains limited. How do we validate verisimilitude of content, context and performance? What metrics do we have for measuring consistency of functionality and behaviour of digital objects over different digital library technical systems and environments?
6. Collection cohesion and interoperability. Digital libraries and repositories handle collections of digital objects as opposed to just discrete entities. It is the integrated nature of these collections that provides some degree of contextuality to the individual objects. Moreover, collections often only gain real value when they can be integrated with collections held by other repositories. The research that has been done into interoperability across generations of systems, time and repositories has been insufficient.
7. Automation in preservation. The sheer quantity of digital objects with which digital libraries need to deal means that we need to do much more in terms of automation of processes than we have done in the past.70 Areas where
69 Ross and McHugh, (2006a), The Role of Evidence in Establishing Trust.. (see above note 6), and Ross (2006), Uncertainty, Risk, Trust and Digital Persistency (see above note 20).
70 J.F. Gantz et al., The Expanding Universe: A Forecast of Worldwide Information Growth Through 2010, (An IDC White Paper, sponsored by EMC, 2007),
automation has promise include: metadata extraction,71 preservation planning and action,72 and selection and appraisal. To date, the tools that support automation of processes are quite limited, require human intervention, and do not scale. Again we simply have not done the underlying research, experimentation and testing.
8. Preserving the context. Establishing the semantic meaning of digital objects and even collections depends upon retention of contextual information. How was the object created? How was it used? What was the legal or social context of its value? What kinds of processes are necessary to construct context and meaning? Research into contextuality is needed.
9. Storage technologies and methods. On the one hand this is an engineering problem and on the other it is a deployment problem. The digital library community has much to offer the preservation community through its research into the GRID and its collaborative initiatives in the domain of eScience.
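Theme 4 in the list above calls for instruments that translate uncertainties into quantifiable, manageable risks. As a purely illustrative sketch (the scales, the example risks, the figures and the names Risk and rank_risks are all assumptions, not part of the DPE Roadmap), a conventional risk register scores each identified risk as likelihood multiplied by impact and ranks the results, which is one common way of making such uncertainties comparable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A single entry in a hypothetical preservation risk register."""
    name: str
    likelihood: float  # assumed probability of occurrence per year, 0.0 to 1.0
    impact: int        # assumed severity on a scale of 1 (minor) to 5 (total loss)

    @property
    def exposure(self) -> float:
        # The usual likelihood x impact product used to compare risks.
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Illustrative figures only; real values would have to come from measurement.
register = [
    Risk("storage media failure", likelihood=0.05, impact=5),
    Risk("file format obsolescence", likelihood=0.20, impact=3),
    Risk("loss of contextual metadata", likelihood=0.10, impact=2),
]

for risk in rank_risks(register):
    print(f"{risk.name}: {risk.exposure:.2f}")
```

A register of this kind is only as good as its inputs: without experimentally grounded likelihoods, the scores express perceived rather than measured risk.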
You may wonder why issues such as metadata are absent from this list.73 The reason is that metadata issues cut across many research lines from interoperability to contextualisation.
Until recently, much preservation research has been theoretically led and little of it has actually involved well-designed experimentation.74 Every aspect of preservation research, from characterisation of digital objects to preservation planning to user needs analysis, requires experimental research. Some of the post-2003 research and support activities related to digital preservation in Europe, such as the Digital Curation Centre (DCC) in the UK,75 DigitalPreservationEurope (DPE), CASPAR (Cultural, Artistic and Scientific
http://www.emc.com/about/destination/digital_universe/pdf/Expanding_Digital_Universe_IDC_WhitePaper_022507.pdf The current growth rate continues to exceed predictions. For example, contrast the data in Gantz et al. with that in P. Lyman and H.R. Varian, How Much Information?, (Berkeley, CA: University of California at Berkeley, School of Information Management and Systems, 2000), online: http://www.sims.berkeley.edu/research/projects/howmuchinfo/internet.html
71 For example, Y. Kim and S. Ross, The Naming of Cats: Automated Genre Classification, The International Journal of Digital Curation, vol. 2, no. 1 (2007), http://www.ijdc.net/./ijdc/article/view/24/27, ISSN: 1746-8256
72 For example, S. Strodl, A. Rauber, C. Rauch, H. Hofman, F. Debole and G. Amato, The DELOS Testbed for Choosing a Digital Preservation Strategy, in Proceedings of the 9th International Conference on Asian Digital Libraries (ICADL06) (Kyoto, Japan, 27-30 November 2006), (Berlin: Springer, 2006), 323-332; S. Strodl, C. Becker, R. Neumayer and A. Rauber, (2007), How to Choose a Digital Preservation Strategy: Evaluating a Preservation Planning Procedure, in Proceedings of the 2007 Conference on Digital Libraries, (Vancouver, 2007), 29-38, http://doi.acm.org/10.1145/1255175.1255181
73 W. Duff, Metadata in Digital Preservation: Foundations, Functions, and Issues, in F.M. Bischoff, H. Hofman and S. Ross (eds.), Metadata in Preservation, (Marburg: Veröffentlichungen der Archivschule Marburg, Institut für Archivwissenschaft, no. 40, 2004), ISBN 3923833776, 27-38.
74 This is not to suggest that there has been no experimentation to date, but to point out that it has been limited. Examples include M.L. Nelson, J. Bollen, G. Manepalli and R. Haq, Archive Ingest and Handling Test: The Old Dominion University Approach, D-Lib Magazine, vol. 11, no. 12 (2005); W.Y. Arms, R. Adkins, C. Ammen and A. Hayes, Collecting and Preserving the Web: The Minerva Prototype, RLG DigiNews, vol. 5, no. 2 (2001). A good summary of the publications related to both practitioner and researcher led studies in preservation is provided in the quarterly DPC/PADI What's New in Digital Preservation, http://www.nla.gov.au/padi/quarterly.html
75 http://www.dcc.ac.uk
knowledge for Preservation, Access and Retrieval),76 PLANETS (Preservation and Long-term Access through NETworked Services),77 the Digital Preservation Cluster of the DELOS Network of Excellence in Digital Libraries (DELOS DPC)78 and numerous other projects I might have mentioned, reflect the realisation that we need to be much more experimentally driven in our research endeavours if we are to progress the digital preservation research agenda.
5 Conclusion
So, what are the take-away points with which I want to leave you?
First, as a community we need to rethink how we are approaching research into digital preservation and curation.
Second, we need to engage digital libraries researchers in this process, and especially those with a strong computing science and engineering background.
Third, research in digital preservation must in general be more rigorous, methodologically founded, repeatable, verifiable, contextualised, and more effectively reported; that is, it could conform better to the scientific paradigm. It needs to be more experimental than it has been up to now, something that, as I have noted, a number of new research projects are attempting to inspire. These experimental results will provide us with mechanisms to predict more accurately the likelihood of certain conditions arising, and a better appreciation of how to measure the implications of uncertainties associated with digital objects and longevity pathways.
Fourth, not only do we need to try to better understand what we might do to alleviate obstacles to the longevity of digital materials, we must do more to define the uncertainties related to digital preservation and to convert these uncertainties into known, measurable and mitigatable risks. We should, of course, make a genuine distinction here between perceived risk and actual risk; an actual risk represents an assessed and measurable risk. We just do not know in a measurable way, in the context of digital objects, which risks are actual.
Digital libraries must adopt a theoretical stance; recent discussions about curricula for undergraduate and postgraduate education in digital libraries make this lack of a theoretical knowledge base really evident. Indeed, the team led by the School of Information and Library Science at University of North Carolina (Chapel Hill), and Department of Computer Science at Virginia Tech conducting the US National Science Foundation project to develop a curriculum for education in digital libraries have reported that, "research and development in the DL area will flourish only if it has a firm theoretical foundation."79 As I noted above, library science has not demonstrated that it has the
76 http://www.casparpreserves.eu/
77 http://www.planetsproject.eu
78 http://www.dpc.delos.info
79 Work to develop curriculum to facilitate the education of digital librarians is described by J. Pomerantz, B.M. Wildemuth, S. Yang and E.A. Fox, Curriculum development for digital libraries, in Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (Chapel Hill, NC, USA, 11-15 June 2006), JCDL '06, (New York, NY: ACM Press, 2006), 175-184. DOI = http://doi.acm.org/10.1145/1141753.1141787. See also M. Moss and S. Ross, Educating information management professionals: the Glasgow perspective, Journal of Education for Library and Information
theoretical foundations and knowledge base that are capable of providing the framework for handling digital entities and for underpinning digital libraries. Moreover, as digital libraries are more akin to archives than they are to traditional libraries we need to seek their theoretical foundations in the domain of archival science and their practices in archival and records management environments. Archival science, with its principles of uniqueness, provenance, arrangement and description, authenticity, appraisal, and its tool sets such as diplomatics and palaeography,80 may offer us a framework for a theoretical foundation for digital libraries.
Perhaps you are surprised that I have not come here today and told you that those of us working in digital preservation have solved all, or at least most, of the challenges, and we are just waiting for those of you working in digital libraries to ask us to integrate our solutions into your work. Since we have not overcome the obstacles to preservation of digital materials, I cannot hold out such a promise. So my final message is that the value of digital libraries rests very much in their ability to communicate our cultural and scientific knowledge to the future; if they are to do this, we must address the digital preservation challenges and to do this we need to be more collaborative, better coordinated and even competitive.81 Thank you.
Prof. Seamus Ross, Budapest, 17 September 2007 (revised 30 September 2007)
Science, (forthcoming).
80 I might have examined the issues surrounding digital palaeography here as well. In the same way that a palaeographer, using knowledge about different scripts (say Insular round compared with Caroline minuscule), can make inferences about the origin and production of documents, a digital palaeographer will be able to use information about the characterisation and nature of digital objects to draw conclusions about the process of production, use and authenticity. The boundaries of diplomatics and digital palaeography still need to be defined for the digital age, much as they did in the 17th century.
81 As I noted in my closing keynote talk, Connecting and Communicating our Digital Heritage with the Future, at the conference organised under the auspices of the Austrian Presidency of the European Union (Residenz zu Salzburg, 21-22 June 2006) on An Expedition to European Digital Cultural Heritage: Collecting, Connecting and Conserving?, prizes for competitions have acted as a powerful incentive to research and development in other contexts, http://dhc2006.salzburgresearch.at/images/stories/info/mp3/6_ross.mp3 For instance, the Orteig Prize for a non-stop trans-Atlantic flight (e.g. New York to Paris) and Ansari X Prize for private sector space flight using a reusable craft stimulated investment by individuals and organisations that far exceeded the value of the prizes. In 2005, when designing DigitalPreservationEurope (DPE), I proposed that we should include a Digital Preservation Challenge, http://www.digitalpreservationeurope.eu/challenge/. The first challenge was launched in the spring of 2007 and the awards were announced at ECDL2007 in Budapest.