Preserving our digital present for future memories: technical and social challenges
XXXIV Reunión Nacional de Archivos 2012, Villahermosa, Tabasco, México
Francisco Barbedo, Direção Geral do Livro, Arquivos e Bibliotecas



What is digital preservation

A case study

DP problems

Solutions: community building

conclusion

So… What is digital preservation?
• Keep digital information available (usable, authentic, reliable)
• For as long as it is operationally required and socially relevant (maybe 20 years, or forever)
• Independently of the technology originally used
▫ The objective of preservation is not only to transmit our heritage to future generations and maintain the capability to understand and reuse what we have preserved, but also to permit the ongoing use of information inside institutions
▫ We’ll get back to this...

Digital information, digital object

• Digital information is functionally similar to paper, in that both support business activities. But its features turn it into a completely different object.
• 0s and 1s are difficult to read
• Things change, disappear, cease to exist…
▫ Software, hardware, knowledge, people

Digital information, digital object

• Digital information depends on an intermediary system. It cannot be used or accessed directly by a human being.

Intermediary system

• The intermediary system is the software and hardware in which the information was produced. But it also has other components, such as the operating system, applets, Java, browsers, etc.
• Greater complexity and richness of information.
• It’s everywhere. Everybody has it and produces it


Digital information, digital object
• IT industry
• Very quickly evolving market
• Fast pace of obsolescence
• On average, 7 years of backward compatibility
• If no preservation actions are performed, the risk of obsolescence increases considerably after 7 years
• This means, of course, that the need to preserve digital information has become part of the agenda of institutions

Digital information, digital object

• The problem with obsolescence is that we can no longer access the information, because the intermediary system (IS) in which it was produced no longer exists
• We may still have the information that we can no longer access, probably stored on equally archaic media for which we no longer possess adequate devices to read…

A real case: Gabinete da Área de Sines
• Year: 1971
• Time of existence: 1971-1989
• Place: Sines, Portugal
• Mission: manage the construction of an international deep-water harbour; urbanise the surrounding region; build and run a large power plant (the biggest in Portugal at that time)
• Big project, lots of resources
• Computing resources acquired (mainframe computer: UNIVAC)


A real case: Gabinete da Área de Sines

• The documentation was ingested into the National Archives
• Among 13,864 boxes, 88 magnetic tapes containing data
• Actions taken:
• Find a company with the devices and knowledge to read the tapes. None was found in Portugal. Eventually a company in the UK committed to the job
• Data existed on the tapes: refreshed to a DVD


Case study

• We received a lot of files containing data.
• Was the problem solved?
• NO
• Because the data looked like this


� �B ñ :PÁ– � �Á“S (E(EÊ [“ Å(E(E(q‡q·0 3Žµ� � � � � � ?� � � � � � � � � � � � � � � � � � � � � Œ� � � � � � � � � � � � �

� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �˜ � � CÝ� � � �Ž (E(E(E @� � � � � <†ñ 0� � � � � � � Ü� � � È� � �

<(� � ðÿÿÿÿ� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � èØ� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �àØ� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � àØ� � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � Ø€AIRES BARROS GOMES DE VALLERA � � � � � � � � � � *� � � � � � � � � � � G� � �‘� � � � � � � � � � � � � � � � � � M� � � …� � � � � � � e 244645/OA� � � � � � çè� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � S� � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � à€CARLOS JOSE DA CONCEICAO VIEIRA � � � � � � � � � � � � � � � � � � � � � � � G� � � ±� � � � � � � � � � � � � � � � � � � G� � � .� � � � � � � e 250021/OA� � � � � � Q„� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � U� � �� � � � � � � ¶� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 3J� � � � � � � � � à€DANIEL GOMES DOS SANTOS � � � � � � �

� � � *� � � � � � � � � � � G� � � ±� � � � � � � � � � � � � � � � � � J� � � .� � � � � � � e 250023/OA� � � � � � Rþ� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � S� � � � � � � � � � � ¶� � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � z¬� � � � � � � � � à€GUILHERME LUIS FARIA CANCIO MARTINS � � � � � � �
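A dump like the one above still contains islands of readable text (names, reference codes) between bytes whose meaning is lost. As a minimal sketch, assuming the text fields happen to be ASCII-compatible, those islands can be pulled out as follows. The function name `printable_runs` and the filler bytes in `sample` are illustrative assumptions; only the name and the reference code are taken from the dump, and the real tape encoding is not known here.

```python
import re


def printable_runs(raw: bytes, min_len: int = 6) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes found in raw."""
    # [ -~] covers the printable ASCII range (space to tilde).
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    runs = (m.group().decode("ascii").strip() for m in pattern.finditer(raw))
    # Drop runs that consisted only of spaces.
    return [text for text in runs if text]


# Hypothetical byte sequence: the readable fields come from the dump above,
# the surrounding non-printable bytes are invented for the example.
sample = b"\xd8\x80AIRES BARROS GOMES DE VALLERA\x00\x00\x2a\x00e 244645/OA\x00\xe7\xe8"

for text in printable_runs(sample):
    print(text)
```

This recovers fragments, not records: without the format specification there is no way to tell which field a fragment belongs to, or what the remaining bytes mean.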

Digital information, digital object
• What do we need to perform digital preservation?
• Standards: we actually have some, and they are pretty effective (OAIS, ISO 14721)
• Certification
▫ ISO 16363 (Audit and certification of trustworthy digital repositories)
▫ European Framework for Audit and Certification of Digital Repositories
• Strategies and methods. We have fairly good methods of preserving information, although not all information can be totally preserved.
▫ E.g. migration

Digital information, digital object

• Anything else??

• Oh, that’s right…

• MONEY… lots of it and forever…

DP problems: review

1. Technological
2. Financial
3. Legal
4. Trustworthiness of repositories
5. Loss of knowledge
6. Social issues

DP: technological issues
• Software upgrades fail to support legacy files.
• The format itself is superseded by another or evolves in complexity.
• The format "take-up" is low, or industry fails to create compatible software.
• The format fails, stagnates, or is no longer compatible with the current environment.
• Software supporting the format fails in the marketplace or is bought by a competitor and withdrawn.
• Hardware also evolves very quickly. New software does not run on old hardware and vice versa.
• Storage media and systems, including compression algorithms and backup technology, are highly volatile
• In any case the information becomes technologically “isolated”, and becomes impossible to access and use.

DP: formats
• Without a format specification, a file is just a meaningless string of ones and zeros. The specification indicates the proper subdivision, encoding, sequence, arrangement, size, and internal relationships that uniquely identify the particular format and allow it to be properly interpreted and rendered.
• If the specification is open, i.e. anybody can consult it, the problem is mitigated.
• BUT… Most of the time vendors keep their specifications closed, even for formats that have been discontinued.
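To make the role of the specification concrete, here is a small sketch in which the same eight bytes are read under two different layouts. Both layouts ("spec A" and "spec B") and the byte values are hypothetical, invented for illustration; they are not the format of any real system mentioned in this talk.

```python
import struct

# Eight raw bytes as they might sit on a storage medium (made-up values).
record = bytes([0x00, 0x03, 0xBB, 0xA5, 0x4F, 0x41, 0x20, 0x20])


def read_with_spec_a(raw: bytes) -> tuple[int, str]:
    """Hypothetical spec A: 4-byte big-endian unsigned number, then a 4-character ASCII code."""
    number, code = struct.unpack(">I4s", raw)
    return number, code.decode("ascii").rstrip()


def read_with_spec_b(raw: bytes) -> tuple[int, int]:
    """Hypothetical spec B: two 4-byte little-endian unsigned numbers."""
    return struct.unpack("<II", raw)


print(read_with_spec_a(record))  # a record number and a short code
print(read_with_spec_b(record))  # two large, unrelated-looking integers
```

The bytes are identical in both calls; only the specification decides whether they are a record number with a code or two integers. When the specification is closed or lost, every reading is a guess.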

DP: legal issues
• What can we really preserve?
• The object or a representation of the object?
▫ Every time we act on the information for preservation purposes, we change it, and it becomes, in a certain way, different from the original information. That’s what happens with migration…
• The fact is that law is still being made with the paper world in mind. But paper is not digital… We cannot expect digital information to behave like paper, although it is all information.
• Plus, we must consider the digital rights over the material to be preserved

DP: financial issues
• Factors that affect DP costs:
The cost of the digital archival system (a digital depot or repository) and of the functionality for the long-term preservation of digital records
+ Personnel costs
+ The cost of developing (or procuring) software and methods for the preservation of digital records, e.g. converters
+ The cost of the actual storage of digital records
+ Other factors that influence the total cost, e.g. communications

DP: financial issues
• As DP is a permanent process, money must keep supporting that effort.
• The amount of information to be preserved grows cumulatively (just like in the paper world).
• DP is a highly expensive business, and there are no guarantees whatsoever that any organisation will be financially able to support DP in the long term.
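The combination of a permanent process and cumulative holdings can be sketched with a toy model. Everything here is an assumption made for illustration: the function name, the split into a fixed yearly cost plus a per-terabyte storage cost, and the figures in the example call. It is not a costing method proposed in this talk.

```python
def cumulative_costs(years: int, fixed_per_year: float,
                     intake_tb_per_year: float, storage_cost_per_tb: float) -> list[float]:
    """Running total of DP spending, year by year, under a toy cost model."""
    holdings_tb = 0.0
    total = 0.0
    totals = []
    for _ in range(years):
        # Holdings only ever grow: each year's intake is added to what is already kept.
        holdings_tb += intake_tb_per_year
        # Fixed costs (system, personnel) recur, and storage is paid on everything held.
        total += fixed_per_year + holdings_tb * storage_cost_per_tb
        totals.append(total)
    return totals


# Hypothetical figures in arbitrary currency units.
print(cumulative_costs(years=3, fixed_per_year=100, intake_tb_per_year=10, storage_cost_per_tb=2))
```

Even with constant intake, the yearly bill rises every year because storage is paid on the whole accumulated holding, which is why funding has to be planned as an open-ended commitment.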

DP: trustworthiness of repositories
• It is difficult to prove the effectiveness of our preservation methods in the long term, simply because not enough time has elapsed since we started digitally preserving
▫ Computing has existed for c. 61 years (1951: UNIVAC, the first commercial computer). Paper has existed for many centuries.
• How can we certify that we really are effectively preserving the information that was delivered to us?
▫ How is trust built?
Certification of digital repositories helps. The concept is identical to ISO 9000: if a repository is certified according to a recognised standard, we expect it to run its business effectively
• Standard ISO 16363 for the certification of digital repositories.

What do we expect from repositories?
• Long-term preservation of readability and accessibility in a way that is independent of any specific software or hardware
• Reliability and authenticity of digital records while carrying them across successive generations of information technologies. Because… we are talking about archival information and evidence
• Scalability to accommodate a huge amount of data and records
• Users expect… low cost, low trouble

But is all this possible?


DP: trustworthiness of repositories
• What commitment to DP can/should we make? Because we shouldn’t make promises we are not certain to keep.
• We can attest that a bit sequence has not been physically changed or corrupted
• We must be able to preserve metadata about that bit sequence, which means data about its structure, digital rights, original process, intermediary system, social environment, etc…
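Attesting that a bit sequence is unchanged is commonly done with a cryptographic checksum stored next to descriptive metadata. The following is a minimal sketch of that idea using SHA-256 from the Python standard library; the record layout, the function names and the example values are assumptions for illustration, not the scheme of any particular repository.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the SHA-256 checksum of data."""
    return hashlib.sha256(data).hexdigest()


def make_record(name: str, data: bytes, **context: str) -> dict:
    """Build a hypothetical metadata record: fixity information plus free-form context."""
    return {
        "name": name,
        "size": len(data),
        "sha256": sha256_hex(data),
        "context": context,  # e.g. intermediary system, rights, original process
    }


def is_intact(record: dict, data: bytes) -> bool:
    """True if data still matches the size and checksum stored in record."""
    return len(data) == record["size"] and sha256_hex(data) == record["sha256"]


# Made-up content and context values.
content = b"contents of a preserved file"
record = make_record("tape-001.bin", content, intermediary_system="UNIVAC mainframe")

print(is_intact(record, content))             # the stored copy is unchanged
print(is_intact(record, content[:-1] + b"X"))  # a single altered byte is detected
```

A matching checksum only shows that the bits are the same. Whether those bits can still be interpreted depends on the rest of the metadata.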

DP: Preserving knowledge
• Because…
• The social and organisational environments disappear, so how are we supposed to be able to recreate them, i.e. to preserve knowledge?
• Metadata helps a lot, because it documents all aspects of the information to be preserved, so that it can be recreated in the future.

DP: Social issues

• To better understand the social issues of digital preservation, we must take a short trip through globalisation and information growth


globalisation
• Globalisation is about things, people and countries becoming closely connected.
▫ In good times and bad!
• The actions we (individual or collective agents) perform have a global impact on everyone’s life (whether minor or major)
• This is in part due to the massification of information, which is largely explained by the possibilities offered by technology
▫ Information production increases; information is more complex and richer, and disseminates quickly.
• It is pervasive => global (everywhere and everyone).
• It impacts the whole range of human and social activities.


globalisation

• Estimated total amount of information created in 2020 = 35,000 exabytes! (in 2009 = 800 exabytes) (source: Oracle, 2012)
[Figure: scale of storage units: gigabyte, terabyte, petabyte, exabyte]
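The two figures on this slide imply a compound growth rate, which is easy to work out. The sketch below assumes smooth exponential growth between 2009 and 2020, which is a simplification; the function names are illustrative.

```python
import math


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns start into end over the given number of years."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)


# Figures from the slide: 800 exabytes in 2009, 35,000 exabytes estimated for 2020.
rate = implied_annual_growth(800, 35_000, 2020 - 2009)
print(f"{rate:.0%} per year, doubling every {doubling_time(rate):.1f} years")
```

Under that assumption the growth works out to about 41% per year, so the volume of information doubles roughly every two years.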


Globalisation: some aspects of information

• Access to devices and media that allow anyone to produce information (cameras, video)
• Institutional information relies heavily on database systems, sometimes mixed with more complex data such as multimedia (GIS, medical imaging)
• Archivists must include in their area of influence all information that supports business and constitutes evidence of it, not only “records” or “documents”


globalisation

• Powerful and widely available mechanisms for information search and retrieval (Google…)
• Shared and accessible mechanisms for knowledge sharing (e.g. Wikipedia)
• Social and professional networks available to everyone… connections.
▫ Facebook, LinkedIn


globalisation
• New ways to interconnect, socially and at work.
• More potential for cooperation and for establishing networks of activities, people, work, organisations, etc.
• Opportunity for sharing resources, maybe assets, creating horizontal structures (national or international) instead of vertical ones.


Social habits
• Remote social interconnection
• Heavy dependence on the internet for common professional and social activities (booking hotels, reservations, trips, e-commerce, social and professional relations through networks such as Facebook and LinkedIn, etc.)
• New information users: people under 25 who grew up with the internet. They have no idea that there was once a time without computers.
• The concept of physically visiting a specific place to get information, such as a library or archive, might seem a bit weird…
• What services do they expect from archives?


DP: Social issues

• New users. How will they react, and what will their attitude be, towards preserving information that has always been easy to get?
• Will they be willing to spend money on keeping something whose value they might not see?
• We value things that are difficult to obtain more than those that are easy.

DP: Social issues
• Digital information also brings DP needs to the core of organisational issues.
▫ Or at least it should…
• Unless they accept becoming informationless in the short term, institutions must formally recognise DP as an issue to deal with.
• That’s a new situation that requires adaptation from organisations and people
• They must include DP in their planning processes and budgets.

Let’s build a community!
• A community is a network of actors that share interests in a specific domain. In this case: domain = digital preservation
• All kinds of actors can exist in the network: institutions (public/private), developers, businesses, consultants, citizens

community

• It can be based on a specific platform, which must be open and freely reusable
• The advantages must be clearly perceived by the prospective community.
• Sharing services and development
• Sharing costs
▫ Holders want low cost/low trouble, so let’s divide the effort and make our job smoother.


Sustainable network
• Other issues must be considered for a community to work:
• a/ Identification of problems arising from particular realities at the national level
▫ Political commitment (for public administration)
▫ Installed base (social and technological)
▫ Installed skills and knowledge
▫ Existing platform; level of IT use
That’s because, for a network to function, its agents should be at a comparable level.


network

b/ Training and capacity building
• In order to achieve common levels of expertise and knowledge in DP: a platform of convergence
▫ In all domains: IT, archival, organisational, etc.


network

c/ Development
• Around an accepted common platform
▫ (why not RODA? Anyone can use it for free)
• That is open and reusable
• Within a strategic development plan accepted by the community
• Broad and inclusive, but specific enough that every development can be connected to the platform and developments do not overlap
▫ Quality control is necessary


Platform
• Although all actors can, and actually should, build add-ons and other software functionality, it is best that every individual effort is brought into a common, planned development strategy
• Advantages:
▫ No overlapping
▫ Usable add-ons
▫ Everyone can use and integrate those add-ons into their specific solutions


network

• d/ Involvement and dynamics
• The network must be motivated and boosted so that it can remain sustainable and keep evolving
• If the community is not actively stimulated, it will certainly disappear in the short term.


network
• Possibility of defining the best architecture for the community in terms of building digital repositories. What is the best distribution of digital repositories, for example?
and
• Shared storage capabilities
▫ Common, shared


my way or our way?


• “El camino se hace caminando…” (the path is made by walking)
• We all have the same problems. Why don’t we manage them together?
• But we cannot stick only to models and planning.
• We must act to build paths
• It may not be the best path, but it is surely better than not acting.
• Video on YouTube: Preserva Digital DGARQ
http://www.youtube.com/watch?v=47BZ6rXNcsQ