
THE DPP GUIDE TO DIGITAL ARCHIVING


CONTENTS

Foreword
Introduction: The Case for a Digital Archive
1 Why Store My Media?
    The risks inherent in the world of digital storage
    Forever is a Long Time
2 What is a Store?
    An Introduction to the OAIS Model
3 What Should I Keep?
    Identifying the Key Metadata
    Risk-Value Analysis
    Formats and Standards
4 Where Should I Keep It?
    Differentiating Characteristics
    Underlying Technologies
    Solution Design
    The Land of Lost Content
5 How Will I Know It’s Safe?
    Application of technology to provide assurance of archiving
6 How Will I Find It?
    Minimum Metadata Set
    Unique Identifiers
    Embedded Metadata
7 How Can I Stop the Wrong People Getting In?
    It’s Not Easy Being Secure
    The Threat Within
    Authentication
    Encryption at Rest
8 How Long Should I Keep It?
    Policies and Guidelines
    What is a Policy?
    What are Guidelines?
9 How Do I Know It Will Always Play?
    Migration
    Integrity Management During Migration
    Data Tape Migration
    All good things must come to an end
10 How Much Will It Cost Me?
Conclusion
Further Reading


FOREWORD

When the DPP published our very first report, The Reluctant Revolution, in 2011, we said that we had surveyed independent production companies and they’d told us digital archive was “not a key issue yet”.

Their view was reasonable: in a tape-based production environment a move to digital archiving represented nothing but pain, expense and risk. We predicted this situation would change when the industry moved to file-based delivery; and sure enough it did. Almost the moment production companies became aware of the shift programmed for October 2014, they also began to enquire about digital storage solutions.

The DPP responded to this interest by publishing an introductory guide called 10 Things You Need To Know About Digital Storage. We promised at the time that a more comprehensive guide would follow. And here it is.

The introductory guide aimed to help people understand the basic principles behind storing on file rather than tape. This fuller guide has a more specific purpose. The DPP Guide to Digital Archiving is designed for anyone who needs to create and maintain a long-term collection of digital media assets. It is intended to help independent production companies, cultural organisations and other bodies who need to migrate and then grow major collections of audio-visual media. Such companies and organisations will almost certainly hold the rights to this material, have a direct interest in exploiting it (either commercially, or by making it accessible to the public, or both), and will need to guarantee the material will still be findable and usable for many generations to come.

We have kept the format of the 10 Things guide, with the same section headings, allowing us to elaborate on specific topics to complement the original content. By maintaining the same 10 Things structure we have also made it easier for you to use this more detailed guide as a reference document.

If you don’t immediately have time to digest the entire document, we direct you to the recommendations in the Conclusion section.

Anyone creating such a digital media collection would be wise to employ the services of someone with proven experience in this area of archiving. However, this guide will provide you with the knowledge to understand what the experts are talking about – and the framework to undertake what could be the most nerve-racking investment of your professional life.

After all, who wants to be the modern-day equivalent of the person who decided in the 1970s not to keep episodes of Doctor Who? But then, how were they to know? They didn’t have the benefit of a guide like this.

Mark Harrison
Managing Director
Digital Production Partnership Ltd


INTRODUCTION

The Case for a Digital Archive

Before we take you into the detail of what a Digital Archive is and how to set one up, we ask you to consider two crucial questions:

• Why do you want to keep the content in the first place?
• Why is it worth investing in doing it properly?

It is more expensive to store digital content than to keep a tape on a shelf – not least because the susceptibility of digital content to re-versioning means there are likely to be more copies of it. It also requires more technology than a physical collection, and technology decisions and investments are rarely trivial for any company or organisation. So it is necessary to have a clear sense of purpose as to why you need an archive, and why you need one that is worthy of the name ‘archive’.

The Hoarding Reflex

It’s always tempting to keep everything, particularly in the digital age, where files are ‘invisible’ and search systems promise so much.

But this approach used to happen in the pre-digital world too. Many physical archives took an approach to ‘archiving’ that actually meant putting everything aside – and thereby putting off the awful day when the material needed to be sorted. Anyone who has ever needed to clear out a tape ‘archive’ (aka storage cupboard) will be familiar with how chaotic it tends to be. Much of the problem stems from the fact that material initially labelled for one purpose becomes impossible to identify unless decoded by the original producer or the librarian who set up the collection – both of whom are likely to have left years ago.

But let’s not kid ourselves. It is hard to throw things away. When did you last hear of a technology vendor promoting its fantastic deletion capability? It doesn’t feel overly bold to assert there is no company or organisation that creates or handles content that cheerily throws it all away as soon as it has served its original purpose.

Hidden Value

Just as we know there is always a tendency to hoard, we also know that the most common justification made for such a reflex is ‘just in case’. When it comes to digital media, ‘just in case’ might relate to the possibility of a legal or compliance issue; but more commonly it relates to the possibility the material might have some kind of future value – either for re-use or for sale.

Perhaps the greatest potential – but least actual – benefit of digital media is its reusability. Such media is inherently searchable in a way physical media just isn’t. And the first pre-requisite for reusing or retrieving content is being able to find it.

This is the heart of any decision to create a digital archive: we can’t merely delete everything we create; and if we are going to keep any of it, a digital archive makes it possible to find it and extract its value.

But just how much value does old digital content have – and is it worth the cost of maintaining it in an archive? This is an impossible question to answer, of course: it all depends on the material. If you are lucky enough to be the owner and rights holder of FA Cup footage, then the value will be rather a lot. If you are the proud possessor of hundreds of hours of interviews with people who never made the final cut for a reality show, then rather less (until one of them becomes famous, much later).

This is where the challenges of setting up a digital archive become the greatest benefit of setting up such an archive. Throwing a tape in a cupboard didn’t really entail a decision – which meant you never had to address whether it had any future value or not. But as soon as you make the decision to keep digital media in a form by which it will remain usable and findable, you are forced to make judgements about the worthwhileness of keeping it. This guide will assist you in making those judgements.


Less is More

The overall cost of keeping a programme as a tape on the shelf currently remains lower than keeping it in a server or server system – meaning the cost of digital storage is a more important consideration. The reality of making a decision to set up a digital archive is likely to be that you will, over time, actually keep less material than you otherwise would – simply because you now have the capability to make retention and deletion decisions, and because your confidence that you can retrieve the right copy of something will reduce the temptation to keep all the just-in-case copies.

But as may now be apparent, we are making one major assumption: you employ or have the benefit of a person or persons qualified to manage a digital archive. Make no mistake: a digital store with a search capability is not an archive. This guide will provide a formal explanation of what you need before you can reasonably describe your collection as an archive – and to be frank you may find the pre-requisites a little daunting. But they become a lot less daunting if you have access to a professional Media Manager or archivist. What may feel complex and off-putting to you will be second nature to them. You might think of them as your best system investment: they will give you something that no amount of technology investment can buy.

The Human Factor

Just as it’s a simple truth that we find it hard to throw content away, it’s also a simple truth that those who are employed to make content are not going to be as focused on maintaining it. It’s all about priorities, skills and motivation. No-one whose primary function is production (the creation of content) is going to be as focused on archiving (the retention or deletion of content) as someone who is employed for precisely that purpose.

So does every organisation need a specialist role? This depends on the size and complexity of the collection and workflows. The demands of the BBC or ITV Archives, for example, are very different from those of a small independent production company or facility house. But you will at the very least need your collection to be getting the benefit of a specialist individual – it’s just that that individual might work for someone else, and be overseeing your material as part of a larger archiving service.

If you do decide you need a specialist Media Manager, a sophisticated skillset is now required. Traditional information management skills are still important but should be accompanied by a practical business sense, excellent communication abilities, and particularly good influencing skills: this is the person who will need to articulate the importance of good metadata within the end-to-end process. In short, the Media Manager will be the person persuading those who might be moving on to another job next week that they should perform tasks necessary for the content they created to be kept for a lifetime – and beyond.

Digital Archiving: It’s a State of Mind

This is a fitting point to leave you with before we embark on this guide: just as the very decision to create a digital archive forces an organisation into disciplined thinking that can ultimately save money, increase value and improve efficiency, so the employment of a professional Media Manager can change the culture of the workplace.

The presence of a Media Manager will make it much easier to create frameworks for how to work with digital media, to assign roles and responsibilities and to establish policies. It signals ‘we understand what it means to work with digital media.’ And that, after all, is your business.


1 WHY STORE MY MEDIA?

The risks inherent in the world of digital storage

Technologies for the storage of content have a long history and an exciting and rapidly evolving future.

Each advance in storage technology brings with it a hitherto inconceivable increase in storage density. However, these developments are accompanied by a corresponding reduction in the lifespan of the storage medium.

The table below highlights major milestones in storage development throughout the ages and shows how each major advance delivers an increase in storage density of a factor of one thousand but a corresponding decrease in lifespan of a factor of ten.

For this and other reasons, it is often said that we are the generation likely to leave least behind us for future generations.

For cultural organisations this is a worrying trend, but for content owners it also creates a challenge to the financial and commercial security that the long-term custodianship of content can bring with it.

Moreover, moving from a physical to a digital world introduces other non-technical challenges. For legacy physical content such as film, the disposal of content required a conscious decision. In the ephemeral digital world, however, quite the opposite applies. Unless a conscious decision is made to secure content that would otherwise just flow from one ‘cloud’ to another, it has the potential to be lost through absence of a decision to capture it, rather than through a conscious decision to dispose of it.

These are not just theoretical concerns: they affect media professionals on a daily basis, whether they are aware of it or not. It is difficult to retro-fit solutions, and suitable planning must be undertaken before content loss occurs rather than after the fact.

  Medium   Storage Density (bits/cm2)   Lifespan (years)
  Stone    10                           10,000
  Paper    10,000                       1,000
  Film     10,000,000                   100
  Disc     10,000,000,000               10

Source: “Preserving Moving Pictures and Sound”, Wright 2012
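The trend in the table can be checked directly: every step up in medium multiplies density by a thousand and divides lifespan by ten. A quick sketch in Python, using only the figures quoted above:

```python
# Storage milestones as (medium, density in bits/cm2, lifespan in years),
# figures as quoted from "Preserving Moving Pictures and Sound" (Wright, 2012).
media = [
    ("Stone", 10, 10_000),
    ("Paper", 10_000, 1_000),
    ("Film", 10_000_000, 100),
    ("Disc", 10_000_000_000, 10),
]

# Ratio between each medium and its predecessor.
density_steps = [nxt[1] // prev[1] for prev, nxt in zip(media, media[1:])]
lifespan_steps = [prev[2] // nxt[2] for prev, nxt in zip(media, media[1:])]

print(density_steps)   # gain in density at each step
print(lifespan_steps)  # loss in lifespan at each step
```

Each step gains a factor of 1,000 in density and loses a factor of 10 in lifespan, exactly as the text describes.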


STORAGE VS PRESERVATION

You will hear archivists use the term preservation with regard to the long-term storage of content. Generally, storage is about keeping your content safe today, but preservation is about prolonging the life of content so that it lasts as long as you need it to, which may sometimes be forever.

Of course, not all content is worthy of such attention, and hence it is useful to consider the lifespan that you would like your content to have and act accordingly, applying a suitable level of effort and funding.

If your content will outlive the life of the hardware and/or software solution that it currently resides within, then you will need to consider the long-term preservation management approaches that we discuss within this document.

Forever is a Long Time

Many archives consider the “100 year question”: how can you be sure that your content will be safe and accessible in one hundred years’ time? This is not an unreasonable question, and small changes you can make today to the way you manage your content can make this a realistic likelihood. If you plan for such a long-term view, then short-term preservation comes as a natural consequence.

A crucial archiving principle is to document the decisions you have made and the reasons behind them, so that future custodians of your content can treat it accordingly. Unless you are planning to live for another hundred years, your content collections need to be self-describing.
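One lightweight way to make a collection self-describing is to record each decision alongside the item it concerns, for instance in a small machine- and human-readable sidecar file. This is only an illustrative sketch: the item name and field names below are invented, not a standard schema.

```python
import json
from datetime import date

# Hypothetical record documenting a retention decision for one archived item.
decision = {
    "item": "series1_ep3_final_master",   # invented item name
    "retained": True,
    "reason": "Finished programme; rights held; expected re-use value",
    "decided_by": "Media Manager",
    "decision_date": date(2016, 6, 1).isoformat(),
    "review_after_years": 10,
}

# Serialised and stored next to the media file, the record tells a future
# custodian what was kept, why, by whom, and when to revisit the decision.
sidecar = json.dumps(decision, indent=2)
print(sidecar)
```

The exact format matters less than the discipline: the decision and its rationale travel with the content rather than living only in someone's memory.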

In this document we’ll discuss the relationship you have with your content and also, by implication, with the technology it’s stored upon. Irrespective of who provides your storage solution, you should never put blind trust in the technology. You should take as much responsibility for your media storage as is necessary to feel you have adequately discharged the duty of care that you have as current custodian of the content.

Although it’s important that your content is kept safe, it’s equally important that it’s kept alive and accessible, not least because this accessibility and ability to commercialise or exploit your content is likely to justify the funding that you’ll require to ensure its continued safe and secure storage.


2 WHAT IS A STORE?

As outlined in the previous section, a store is a place where you can keep your content safe. For any organisation, not just in broadcasting, keeping material secure and accessible over an extended period of time requires a methodical approach. Even the traditional filing cabinet requires some kind of system if content is to be retrieved efficiently.

An Introduction to the OAIS Model

Fortuitously, an international standard exists to define an approach to address exactly this issue.

This standard is ISO 14721 – the Open Archival Information System – more commonly known as OAIS. It is a conceptual framework for setting the standard for archiving activities rather than a method for carrying out those activities. As such, the OAIS model does not require the use of any particular computing platform, database management system, technology, or media.

OAIS will already be well known to professionals managing large archive collections. In fact, a working knowledge of this standard could be considered an essential requirement for anyone you are considering employing to look after your content.

If you have not come across OAIS before, understanding this framework is likely to help you organise your archive. At first glance, the model might seem to apply to larger archives employing dedicated teams of people, but it is equally valid for smaller operations and can act as a checklist to ensure that no critical function has been overlooked and that each function is mapped to a specific role or individual.

It also provides a common language to use when discussing and agreeing archiving policies, and is now used as a reference model by a wide variety of organisations in the UK and internationally with digital archiving needs, including the BBC, ITV and other European archives.

The Further Reading section of this document includes a link to a thorough introductory guide to OAIS, but the key points are presented here.

STAKEHOLDERS

There are three main stakeholders or ‘entities’ defined in OAIS: the Producer, Management, and the Consumer (the Designated Community), each interacting with the OAIS (the Archive).


Producer
This doesn’t mean a television producer but rather the person or system transferring content into the Archive.

Consumer
Also known as the Designated Community: the persons or systems expected to use the information preserved in the archive.

Management
The persons defining how the Archive should operate and function.

It is a principle of OAIS that decisions are made primarily with reference to the Consumer (or “Designated Community”), as it is ultimately only for their benefit that content is being stored. This is a key point that can help you make decisions on what content you store in your archive, and how.

INFORMATION MODEL

OAIS defines three primary packages of information that get managed as part of an archive system. For some systems they may all be exactly the same file format, but they are more often tailored towards the specific practicalities, requirements and constraints of the people or systems creating, storing, and consuming the content respectively.

Submission Information Package (SIP)
The package transferred from the Producer to the Archive for Ingest. For media archives this is likely to be the item or collection of items that you want to archive, including all accompanying metadata such as title, episode and version production number.

Archival Information Package (AIP)
The form of the package that is stored within the Archival Storage. This is the content that you are preserving, accompanied by additional metadata to support its long-term management, for example checksums, additional IDs, or information regarding the classification, provenance, retention period and restrictions on usage or access.

Dissemination Information Package (DIP)
The form of the package that is delivered to Consumers for Access. This is your content as it leaves your archive, having been rendered, packaged, trimmed or assembled into a form suitable for a specific customer or consumer of your content.

It’s worth noting that in the parlance of OAIS, these packages refer to both the media and the metadata.

OAIS doesn’t mandate a single SIP, AIP, and DIP to be used within an individual archive and actively encourages different definitions for different producers and consumers. Likewise there may not be a one-to-one relationship between these packages: a DIP, for example, is often formed from multiple AIPs.
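To make the package distinction concrete, here is a minimal sketch of an ingest step that turns a submitted package into an archival one by adding fixity and management metadata. This is our own illustration with invented names and fields; OAIS itself deliberately prescribes no particular data structures.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SIP:
    """Submission Information Package: content as delivered by the Producer."""
    title: str
    episode: int
    essence: bytes  # the media file itself

@dataclass
class AIP:
    """Archival Information Package: the SIP enriched for long-term management."""
    title: str
    episode: int
    essence: bytes
    checksum: str    # fixity information for future integrity checks
    archive_id: str  # additional identifier assigned at ingest
    retention: str   # retention policy recorded with the content

def ingest(sip: SIP, archive_id: str, retention: str) -> AIP:
    # A checksum lets future custodians verify that the essence is still
    # bit-for-bit identical to what was originally ingested.
    digest = hashlib.sha256(sip.essence).hexdigest()
    return AIP(sip.title, sip.episode, sip.essence, digest, archive_id, retention)

aip = ingest(SIP("Example Programme", 3, b"placeholder media bytes"),
             archive_id="ARC-0001", retention="review after 10 years")
```

A DIP would then be derived from one or more such AIPs at access time, rendered into whatever form a particular Consumer needs.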

FUNCTIONS

The OAIS model, shown below, describes six primary services.

1 Ingest
Accepting content into the Archive from Producers.

2 Archival Storage
Management of the long-term storage and maintenance of content.

3 Data Management
Maintenance of the databases describing the content in the Archive.

4 Preservation Planning
Defining the strategy for preserving content in the presence of changing technologies and user needs.

5 Access
The process by which Consumers locate, request and receive content from the Archive.


6 Administration
Management of the day-to-day operations of the Archive and co-ordination of the five other activities.

Although we have focused on the procedural aspects of OAIS, at its heart is the need for an organisation to make a commitment to long-term digital preservation and accept the responsibilities that this brings.

MAYBE IT IS ROCKET SCIENCE

You may notice that the OAIS documentation makes reference to Space Data Systems, revealing that the primary author of the document was NASA, the American space agency.

NASA is particularly notable for running very long projects, for example the Space Shuttle programme, which ran from 1972 to 2011 – a project of almost forty years. It was crucial that all their data was kept safe and accessible throughout the project and that information created on the first day of the project was readable on the last day.

The timespan of the Space Shuttle programme naturally saw the obsolescence of many technologies, highlighting the need for continual migration of information and the need to manage the long-term life of content in the presence of changing technologies and user needs.

[Diagram: the OAIS functional model – the Producer delivers content via Ingest; Archival Storage, Data Management, Preservation Planning and Administration maintain it; Access delivers it to the Consumer; Management oversees the whole.]


3

Identifying the Key Metadata

A common misconception is that every piece of data

produced by an organisation or company has value

for the archive . In fact, much relates to a point-in-time

during the development of a programme, and can

conflict with decisions taken later . For example, the

original script may contain lines that were later edited

out; a piece of music that is included is later changed;

or a unique programme identifier is superseded when a

newer version is created .

So rather than retaining all data, you will need to decide

the relative value that each piece of data has, and the trust

that you place in it . There is no issue with storing data

created at any point in the production process so long as

you understand which pieces of metadata are known to be

accurate and universally authoritative .

It is therefore essential to have a complete picture of the

data flow through your organisation and to understand the

processes that cause data to be added or modified . You

can then decide which data is unarguably trustworthy and

which could become so with additional quality control .

What should I keep?

Accurate relevant metadata is often seen as a key contributor

to helping organisations gain the competitive edge .

Risk-Value Analysis

Given unlimited budget you might choose to apply the same

level of diligence and quality control to all your content .

However, given limited funding and resource, most archives

have to decide on a set of priorities that direct effort to that

content which is most valuable to the business . ‘Value’ in this

context might not just be commercial value: it could reflect

other priorities, such as re-use value, heritage value, legal

requirements for retention or public-access availability .

One example of a commercial model for this is illustrated in

the diagram on the right where the most detailed metadata

and widest format availability is applied to the percentage

of the collection that is predicted to generate the most sales

interest .

For example, in this scenario, it would be possible to weight

the amount of effort devoted to metadata augmentation

according to the expected exploitation and commercial

GOLDGeneratedtop sales

SILVERGenerated

very profitable sales

BRONZEGeneratedsolid sales

Page 12: THE DPP GUIDE TO DIGITAL - Amazon Web Servicesdpp-assets.s3.amazonaws.com/wp-content/uploads/... · assumption: you employ or have the benefit of a person or persons who is qualified

PAGE 12

WHAT SHOULD I KEEP?

3

activity . If clips from news programmes do not attract high volume sales, it would

not be worth using valuable journalists’ time to shot-list . In contrast, high-profile

high-value natural history footage may warrant employing a dedicated Media

Manager to apply detailed tagging information to the content .

In the preceding diagram, maximum quality control and data effort is expended on

the Gold content, ensuring that it can easily be discovered and that it is available in

a form to allow quick and cost-effective re-use .

The Bronze categories, meanwhile, would have a light-touch minimum metadata

set of information . This category might include older programmes that have

shown no commercial activity over a set period of time . If commercial interest in

this tier of content emerged over time, a business case might be made to fund

enhancement of this metadata .
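The tier-weighting policy described above can be sketched as a simple function. This is only an illustrative sketch: the tier names follow the Gold/Silver/Bronze model, but the sales thresholds and the forecast parameter are hypothetical assumptions, not DPP guidance.

```python
def metadata_tier(predicted_annual_sales: float) -> str:
    """Map a commercial forecast for an item to a metadata-effort tier.

    The monetary thresholds here are purely illustrative assumptions.
    """
    if predicted_annual_sales >= 10_000:
        return "gold"    # detailed shot-listing, widest format availability
    if predicted_annual_sales >= 1_000:
        return "silver"
    return "bronze"      # light-touch minimum metadata set

# A bronze item can later be promoted if commercial interest emerges:
print(metadata_tier(250), metadata_tier(25_000))
```

In practice the forecast input would itself come from sales history or editorial judgement; the point is only that the mapping from value to effort can be made explicit and repeatable.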

Formats and Standards

“The nice thing about standards is that you have so many to choose from”

(Andrew S. Tanenbaum)

In digital media preservation there is no single ‘correct’ answer to the question

“what file format should I use for the AIP?”

It’s often impractical to hope to standardise on a single format whilst allowing for

tiers of multiple qualities and also supporting the desire to occasionally store the

original source files .

Archiving of content at the highest quality and with the highest level of integrity management may not be relevant for all your content .

In considering technology solutions, a crucial approach is the ability to consider the relative value of all the content being stored and architect the solution appropriately .

For example, would you consider a single archived rushes clip to have equal value to a fully-finished programme file and hence do they warrant the same degree of content resilience, speed of access, or choice of file format?

Designing or procuring an archive storage solution supporting tiers of quality and service will allow more cost-effective use of limited budget than ascribing identical value to all content .

Even within a tiered model, it’s not essential that a particular piece of content exists only at a single level . It’s common for easily accessible copies to be held for all content, but additional higher-quality but less frequently required versions can also be held in a slower storage tier .

NOT ALL CONTENT IS CREATED EQUAL


Where content is being digitised from videotape, your

starting position is often the baseband video output of the

tape machine – and you need to make a conscious decision

about the quality at which you capture the content . A

natural modern choice for digitisation of content is to use

the same format that you would if the content was to be

delivered for transmission . Since future content will naturally be available at this quality, you may decide that it effectively sets an acceptable quality level for your archive .

In the UK this would therefore mean one of the family of

AS-11 DPP formats, although it’s important to understand

the distinction between the AMWA AS-11 DPP file-format

specification and the UK DPP delivery specification as you

may find that your legacy archive content can easily be

made to conform to the former but, due to being created

when a previous delivery specification prevailed, may not

conform to the latter .

If, though, as highlighted above, the content is of sufficient value or importance to be worth archiving at a higher quality, then alternative higher-quality codecs could be used, the pinnacle of quality being an uncompressed or lossless file format .

Similarly it’s possible that your legacy archive content

may need to undergo processing, such as aspect ratio

conversion, before use in a new programme and therefore

would benefit from storing at a higher quality to ensure an

acceptable result in the completed programme .

When using very high quality formats, you may find that

these are often less supported by commercial systems and

you should be careful about interchange and compatibility

before committing to a format .

For some specific digital videotape formats, notably the

DV family, it’s possible to transfer the content in its native

form without introducing an additional compression pass –

creating files which contain an exact copy of the compressed

essence from the videotape . In these cases, it’s worth being

clear what method is being used to create files from these

tapes and ensure it is optimal .

It’s also a good time to consider what additional work would

be required on the archive content following the completion

of the digitisation process, but before the future intended

use is achievable . For example, due to modern delivery

standards being different from those prevailing when video

master tapes were created, it is likely that content will need

re-versioning before sending for distribution or publication .

Likewise, digitisation from video tape can be an imperfect

process and you would want to consider the level of quality

check required before deeming the media file to be an

acceptable representation of the source video tape .

THE RAW MATERIALS

For some categories of content, you will not be in control of the format and codec choice, as these will be dictated by the programme-making process .

Even if you were to decide today on a single archive storage format, advances in content consumption technologies (e .g . Ultra High Definition and beyond) mean that you will soon need a new format . And even if you were then to migrate all content to this new format, it's likely that you will never reach an equilibrium with all your content stored in a single format within your archive, unless you are dealing with a static historic collection of content .

It’s important to consider the future life of your content,

for example whether there is any expectation that it would

be published on a future platform where the native quality

exceeds what is currently required – such as transmitting

SD programmes on an HD channel, or HD programmes as

UHD . Where such up-conversions are a possibility, it’s worth

considering the additional value that might be gained from

holding a higher quality version of the content, even if this is

not in your normal delivery format . For example you might

choose to hold a higher quality version of your content in an

edit-platform codec such as Apple ProRes or Avid DNxHD .

FROM PHYSICAL TO DIGITAL

In selecting archive formats, it's useful to make the

distinction between content which started life as a digital file,

and content migrated from analogue or tape-based media .

Content which has only ever existed as a digital file is

already at its zenith of quality, and care must be taken not

to compromise this .


RAW camera rushes are one example . Here you are faced

with the decision of converting these into a more normalised form or leaving them

in their proprietary formats . It’s worth considering the future use of the content and

how readily accessible and controlled the content needs to be .

A useful parallel is to consider film negatives, which are often held by media

archives as the highest quality raw materials associated with legacy high-value

productions, even though a more readily accessible video tape copy is held of the

content . It is accepted that there will be a greater cost in handling film negatives

than accessing traditional video tape, however this is outweighed by the benefits

brought by access to this higher quality version of the content (for example, re-

mastering for commercial sale) .

As with film negatives, there is sometimes a creative decision to be made

when converting camera raw files (e .g . REDCODE) into normalised formats .

Decisions on dynamic range, framing and grading are required and hence it is

neither cost-effective nor practical to perform this conversion work up-front

without knowledge of the final use of the content .

Similarly, the moment at which you know for sure that you want to re-use

content is the future moment at which its value has become apparent . It is at

this time that you are best placed to consider the level of investment you want to

make in converting from raw materials that you may hold . It is likely to be more

cost-effective to hold a large quantity of raw footage and accept the cost of

having to convert a small amount of this when commercialisation opportunities

arise, than to transfer all material up-front to the latest commonly-used format .

In making any decision around formats, standards, and technologies an important

factor to consider is the breadth of adoption of the product or the approach, i .e .

how many other people in the world are also using it . The more organisations and

institutions that share your specific preservation and migration needs, the more likely

that jointly you will have access to the solutions you require .

For example, Panasonic D3 video tape was only adopted by a small subset of the global

broadcast industry (notably the BBC and NHK) and migrating from this now-obsolete

format brought with it a number of practical and technical challenges . LTO data tape,

however, is used globally across a wide range of industries including medical, banking,

pharmaceutical and science . Problems with the future migration and access to this

format are shared with such a wide and varied range of users that greater confidence

can be placed in the likelihood of solutions becoming available when needed .

Similarly with file formats, although there are a large number of standards to choose

from, picking a format with wide adoption and wide product compatibility will always

make your future requirements easier to deliver . For this reason the work that the DPP

has undertaken in creating and promoting the adoption of the AS-11 DPP format makes

it worthy of consideration as a primary format within media archives .

A PROBLEM SHARED

Where should I keep it?

As a potential consumer of digital storage technologies, it is

likely you will be presented with a plethora of commercial

products and services, and you will need to choose the best fit

to your requirements and budget .

Not all content owners and creators have the necessary

technical resources to manage a functioning digital archive

in-house . Similarly, not everyone will be able to secure

ongoing technology investment if their operating model is

based upon project-focused activities with corresponding

peaks and troughs in their funding resulting from content

creation and commercialisation .

MANAGED SERVICES

Managed services are a very practical and efficient

way of delivering archiving solutions . Rather than each

potential customer needing to become an expert in

archive technologies, you can look to suppliers who

specialise in these capabilities and provide services to a

number of customers .

There are several specific factors that become relevant when

considering managed services, but which would be less

relevant for in-house offerings .


Primarily, it isn't possible to fully transfer the risk of content loss to a contracted third party: a failure by your supplier is likely to have a far greater impact on you than any contractually agreed financial damages .

If you value your content, it isn’t viable to fully devolve

responsibility of its long-term life to a managed service

provider under a purely contractual relationship . You owe

more to your content and have a duty of care to ensure that

its lifespan is managed effectively . You should therefore take

more interest in how it is being managed and how its long-

term life is being assured . It’s not unreasonable to want

to understand how your content is being stored and even

consider independent assurance of the approach, rather than

only focusing on service levels .

Nonetheless, with sufficient caution around risk-ownership,

managed service offerings can play a very important role

in a complete archive solution, especially for small and

medium sized organisations who can effectively ‘buy-in’ the

archive expertise as part of a managed service if they don’t

have this in-house .

CLOUD

The word ‘Cloud’ is often misused and misunderstood but it

is, simplistically, a general term used to describe managed

services provided over the internet . You often find this used

in relation to media and archive service offerings, however

these should simply be considered and evaluated in the way

that you would with any other managed service offering .

Generic Cloud storage, as provided by products such as

Amazon S3 and Microsoft Azure, is different in nature, and

is generally just a storage component which can be used

as part of a complete architected technical solution . For

example, end-users would not be directly accessing Amazon

S3 storage: instead it would be used as the ‘backend’ to a

more user-friendly product .

All Cloud systems add an additional layer of abstraction

between the end-user and the technology providing the

service . The advantage of this abstraction is to insulate you

from technology changes and release you from the need to

support and maintain complex infrastructure and operations .

Particularly where you require global resilience, Cloud

offerings can make this available to small and medium sized

organisations where creating this capability in-house would


be too complex and cost-prohibitive . A consequence of this

approach is that you often aren’t made aware of the actual

technology being used to provide the service and instead

the solution is described and defined purely in terms of the

Service Level Agreements (SLAs) offered by the supplier . It’s

therefore important to ensure you fully understand the nature

of these SLAs and the consequences of them not being met .

Simply getting a percentage of your monthly fee refunded

in case of downtime or content loss may not be adequate

compensation, so you may choose to design your solution

with this in mind . It's best not to be fully reliant on purely SLA-defined services if the value you place on the content outweighs the financial remedies available .

Differentiating Characteristics

Different storage media each have a range of properties, and

selecting the most relevant medium will involve balancing

each of these for your specific scenario . Factors include:

• Cost

• Read/Write performance – how fast content can flow

to and from your storage

• Access speed – the delay to retrieve content from

the system

• Data permanence (e .g . manufacturer’s expected

error rate)

• Physical degradation profile

• Environmental needs for storage of the media,

including space, temperature and humidity control

• Long-term support for the devices necessary to

provide access to the content .

• Technology requirements for continued storage (e .g .

power, tech support)

• Compatibility and interchange requirements (for

example, do you have partners or subsidiaries with

whom you may need to share your archive media?)
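One hedged way to balance the factors listed above is a simple weighted scoring matrix. Every weight and score below is an illustrative assumption for a hypothetical archive, not a measured property of these media; the exercise is to make your own trade-offs explicit, not to rank real products.

```python
# Relative importance of each factor for this hypothetical archive (sums to 1).
WEIGHTS = {"cost": 0.3, "access_speed": 0.2, "data_permanence": 0.3, "interchange": 0.2}

# Scores from 1 (poor) to 5 (good) -- illustrative assumptions only.
CANDIDATES = {
    "portable disk":        {"cost": 5, "access_speed": 4, "data_permanence": 1, "interchange": 4},
    "LTO data tape":        {"cost": 4, "access_speed": 2, "data_permanence": 5, "interchange": 3},
    "cloud object storage": {"cost": 3, "access_speed": 3, "data_permanence": 4, "interchange": 4},
}

def weighted_score(scores: dict[str, int]) -> float:
    """Combine per-factor scores using the archive's own priorities."""
    return sum(WEIGHTS[factor] * value for factor, value in scores.items())

ranking = sorted(CANDIDATES, key=lambda name: weighted_score(CANDIDATES[name]), reverse=True)
print(ranking)
```

With these example weights, permanence-heavy media come out on top; an archive that prioritised access speed would weight, and therefore rank, quite differently.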

Finding that you have worms in your archive would not seem a pleasant proposition, especially for paper archives! However WORM stands for “Write Once, Read Many” and is an option available for some storage types to provide an un-erasable permanent version of normally modifiable media .

For example, you can buy WORM versions of LTO tapes which behave in all ways like traditional media except that once data is written it cannot be modified .

This provides an additional degree of protection against unwanted system or human error which could otherwise cause content loss .

Whilst not always applicable, it’s worth considering this technology when evaluating different storage options .

THE EARLY BIRD CATCHES THE WORM


Underlying Technologies

PORTABLE DISK

Hard disk drives have a very important part to play in modern

broadcast workflows but, being complex electro-mechanical

devices, they are prone to failure .

As hard drive sizes increase, the potential to lose large amounts

of data or media in a single incident increases dramatically .

Portable hard drives have a very valid use for short-term

storage such as moving files between systems, but are a bad

choice for any form of medium and long term storage .

MANAGED DISK

Professional systems making use of hard drives such as IT

and broadcast servers or storage arrays take into account the

possibility of drive malfunction and keep sufficient copies of

the data to allow recovery from failure .

Various redundancy methods offer varying levels of protection against content loss, ranging from full mirroring of content between disks to more complex ‘parity’ arrangements, which provide the ability to recover from the failure of multiple drives without a corresponding linear increase in the storage volume used .

A commonly-used redundancy method is known as RAID –

which stands for Redundant Array of Independent Disks .

All redundancy methods add an overhead to the efficiency

of storage usage and hence you will normally see storage

server capacities quoted as both ‘raw’ and ‘usable’ figures

reflecting the differing availability of storage capacity once

these protection methods have been applied .
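The raw-versus-usable distinction can be illustrated with rough capacity arithmetic. The drive count and sizes below are hypothetical examples, and real arrays deduct further overheads (hot spares, filesystem metadata) beyond this sketch.

```python
def usable_tb(drive_count: int, drive_tb: float, scheme: str) -> float:
    """Approximate usable capacity (TB) after redundancy overhead is deducted."""
    raw = drive_count * drive_tb
    overhead_drives = {
        "mirror": drive_count // 2,  # RAID 1: half the drives hold duplicate copies
        "single-parity": 1,          # RAID 5: one drive's worth of capacity holds parity
        "double-parity": 2,          # RAID 6: survives two simultaneous drive failures
    }
    return raw - overhead_drives[scheme] * drive_tb

# A hypothetical 12 x 8 TB array (96 TB raw):
for scheme in ("mirror", "single-parity", "double-parity"):
    print(f"{scheme}: {usable_tb(12, 8.0, scheme):.0f} TB usable")
```

The gap between the 96 TB raw figure and the usable figures is exactly the overhead a vendor quote should make explicit.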

OPTICAL DISC

We are all familiar with optical discs such as CD and

DVD but, despite all looking ostensibly similar, there are

considerable differences in the physical and chemical make-

up of these discs which affect their long term stability .

Commercially mass-produced music CDs and video DVDs

are created by a pressing process similar to how vinyl discs

are made, and the resulting data is effectively held within

a robust metallic layer . Barring rare manufacturing defects,

and potential degradation from very excessive temperatures,

these are a stable long-term medium .

Recordable disc (CD-R, DVD+/-R, CD-RW, DVD-RW) can

be filled with data using two different physical processes . In

both cases, a laser is used to write the data by either ‘burning’

it into a dye-layer for permanent (-R) formats, or causing a

physical ‘phase change’ in a crystal structure for re-writeable

(-RW) formats . Although there are inherent differences in

the resulting longevity of the formats, experience shows

that neither technology provides a good long-term archiving

format . Some manufacturers sell ‘archive grade’ discs, such as the Millenniata M-DISC or JVC Archival Grade products,

which claim to offer improved life-span but it is often the

manual handling of the discs such as printing, labelling,

writing on them, and general usage which can introduce

factors causing early unwanted degradation .

A new generation of re-writable ‘phase change’ discs has

emerged as a relative of Blu-ray technologies and some

manufacturers have added a robust caddy around the discs

to remove issues relating to manual handling of the discs and

provide somewhere to safely affix labels .

Formats such as Sony XDCAM and Optical Disc Archive;

Hitachi Digital Preservation Platform; and the cross-

manufacturer Archive Disc product provide viable options

for long-term storage . Like other media such as LTO, some of

HARD DRIVES FAIL – PLAN FOR WHEN THEY DO


these formats can be ejected for storage on shelves to reduce

energy-consumption and, in this case, generally do not need

any special storage conditions such as temperature, dust and

humidity control . Many are available in WORM configurations,

allowing greater confidence in data permanence and

can provide read-verify functionality where content is

automatically re-read after writing to make sure it has been

successfully stored .

As with any storage medium, you would be advised to

compare the characteristics of each format against your

particular requirements .

For all optical discs, there is an international standard (ISO

18925) offering guidance on the correct usage and storage of

these discs to ensure continued viability .

DATA TAPE

A popular storage medium frequently used for long-term archive storage is data tape, such as LTO or Oracle T10000 .

The properties of this format are less familiar to many users

and often misunderstood, and so some key advantages and

disadvantages are presented below .

There are a number of reasons why data tape can often be a

good format to form part of an archive storage system:

• It’s a very stable format, which, if stored in suitable

conditions, does not suffer significant degradation over

time . This is partly due to the data on the tape being less

densely packed than on other media such as hard drives

and hence it is less susceptible to random corruption

through mechanisms such as thermal instability .

• The format has inherent verification built-in . Any content

written to data tape is immediately read back by a

different tape-head within the drive to ensure that the

information was written perfectly to tape, with the write

process being re-tried if issues are detected .

• The tapes themselves hold embedded digital information

(on a chip within the tape shell) regarding their history of

usage, errors and issues to allow systems to interrogate

this and act accordingly .

• Where tapes are ejected from robots and put on shelves

they are effectively decoupled from the live archive

system, and this provides some security in the event of

an issue affecting the copies in the live system .

However, data tape does have the following limitations:

• It is not a random-access medium and there can often

be many minutes of delay for content to be recalled

from tapes in a library . It’s worth remembering that your

copy on data tape doesn’t have to be the copy used for

access and additional copies on disk or cloud storage can

provide more expedient access .

You’ll notice that sometimes this word ends with a ‘c’ and sometimes with a ‘k’ .

The reasoning for this is a matter of history, but in current parlance ‘disc’ is used to refer to optical or vinyl media – generally where the physical item itself is circular . ‘Disk’ meanwhile is reserved for magnetic media such as hard drives, generally where the physical items are hidden within rectangular housings .

DISC VS DISK


• A number of incompatible methods exist for how data

can be stored on the tapes, leading to interchange

problems between different products . Even emerging

standards such as LTFS (Linear Tape File System),

whilst laudable in allowing easy interchange of data

tapes between systems from different vendors, can

bring complications when used in long-term archive

scenarios which need careful consideration . Variants of

the industry-standard TAR formats, or the emerging AXF

format are worthy of consideration in archive scenarios

but the exact choice of standard is non-trivial and should

be made in consultation with experts .

• Data tape needs good storage conditions, ideally with

managed temperature, humidity and dust control .

• Tapes have a limited number of read or write cycles

before they can wear out . Systems expecting to make

frequent use of tapes need to take this into account and

migrate content before the manufacturer recommended

limits are reached .

• The entry-level cost, and the level of technical expertise needed to implement data tape systems correctly, can be higher than for some other storage media .

Particularly, when storing archive content on data tape, be

careful to actually create data tapes suitable for archive

use and not simply IT backups of the content . There are a

number of common characteristics of IT backups which

don’t lend themselves to use in archive scenarios such as

the use of block-based incremental copies (where a file

can be split across multiple tapes) and reliance on a central

database to interpret the stored content rather than creating

self-describing and self-contained tapes . Put simply, IT

backup tapes are often of no use without the systems that

created them and the databases that describe them .

COLD STORAGE

A particular property of archive storage is that a very small

proportion of the stored content is likely to be accessed in

any particular time period . It is therefore not an efficient

use of energy or cooling for all your content to exist in

permanently-powered disk arrays .

Solutions based on Optical Disc or Data Tape naturally use

no additional power for content which is stored but not being

accessed, however products are also available which apply

the same principles to hard disk storage and ‘spin down’ or

de-power drives which are not currently being used . This

is a principle employed by large organisations such as

Facebook who need to store huge volumes of data with low

usage profiles . As with any storage medium, you would need

to ensure that adequate consideration has been given to

management of the long-term integrity of content, especially

if the underlying storage media has not been designed with

archive usage in mind .

Solution Design

We have described the various technologies that can be used

when architecting a solution, but the real skill comes in using

these components in combination to design a solution which

meets your requirements in the most cost-effective way . As

re-iterated throughout this document, meeting the various

needs of access, preservation and security is rarely achieved

by the use of a single product . It is often cost-effective to

use a blended approach – for example using less secure drive

arrays for fast access, with the content also being stored on

less accessible but more secure media .

The Land of Lost Content

In considering how to ensure the longevity of your content

it is prudent to have a thorough understanding of the

mechanisms that can act to prevent this .

In general, it is important to remember that all media

has a limited lifespan and an inevitable degree of

ongoing degradation and corruption . It’s simply a case of

understanding the risk that this introduces, and balancing

this against the cost of mitigating it . Storing content digitally means entering into a game of chance, and you need to be clear about the odds and the consequences .


Professionals involved in transmitting content over

satellite links are familiar with the concept of a naturally

imperfect transport medium due to the expected level of

corruption and interference that occurs on such links . They

are comfortable acting accordingly to mitigate these issues

and ensure adequate transmission of content .

Storage professionals who take a similar approach and

accept the imperfect nature of file storage are in a better

position than those who pretend that the issue doesn’t

exist . Planning for the day when content will be lost, either

through catastrophic failure or through gradual degradation,

will focus the mind on mechanisms to ensure the continued

lifespan of content in the presence of inevitable issues with

the underlying media .

For low volumes of content, particularly for content where

each item has a limited intrinsic value, it can be realistic

to ignore the possibility of data corruption or loss . But for

the archiving of complete radio and television programmes

where the accumulated cost and effort expended in the

creation of content now rests purely within the single

surviving media file, then the value of the content will

normally outweigh any desire to overlook the potential for

data corruption .

Risks to data permanence take a number of forms:

Catastrophic loss of storage medium

This is a well-understood failure scenario, where a single

copy of content could be lost through technology failure or

environmental action such as fire or flood .

Technology storage solutions naturally provide a degree of

capability to recover from internal storage medium failures .

RAID storage for example allows for failures of one or more

hard drives without resultant data loss by the creation of

data redundancy across the array of drives and the provision

of automated recovery of the lost media .

Recovery after the failure of an entire collection of content

due to environmental action is possible by holding one or

more additional copies of content in a different geographical

location . This can also form part of your disaster recovery

provision – allowing your business still to operate during

temporary functional loss of one site .

Both these failure modes and mitigation approaches are well

understood and commonly deployed in archive scenarios .

An important consideration in deciding on the creation

of multiple copies of content is to consider how closely

coupled these are, and hence what scenarios could cause

multiple or all copies to be simultaneously lost . For example,

if all copies are managed by the same software system, is it

possible that human or system action could equally affect

all copies? For this reason, externalised, decoupled copies

are often created – for example by ejecting data tapes from

robotic storage libraries and storing these off-site so that

they are out of reach of erroneous or malicious action .

Gradual physical degradation of storage medium

Many storage media suffer from practical physical

degradation at a level greater than that quoted by the

manufacturer . The most prevalent example of this is

‘disc-rot’ affecting writeable optical discs such as CD-R

and DVD-R and caused by a range of chemical and

physical degradations resulting in catastrophic data

loss . This is generally irreversible; the only lesson is to consider the possibility of similar loss occurring in the future and to plan accordingly . On a practical

note, content on simple writeable CD or DVD should

not be considered as having been safely archived and

these formats are prime examples of media worthy of

preservation activities .

Natural gradual variation in the content stored

A less familiar phenomenon is the gradual decay of storage

media where randomly occurring environmental factors

such as cosmic radiation and thermal instability can cause

low-level infrequent random corruption of data which can

be both undetectable and unrecoverable . This is colloquially

known as ‘bit rot’ .


Manufacturers quote the likelihood of uncorrected data corruption occurring in their media with figures such as 1x10^15 bits (125 Terabytes) as the amount of content that would need to pass through a medium before a single bit of data is corrupted. Raw media error rates are improved by error detection and error correction coding; however, these only provide a degree of protection and don't eradicate all potential data loss.

Although these probabilities appear incredibly small, and corruption therefore unlikely, the volumes of content being considered mean that they cannot be ignored. They are also quoted as average, best-case error rates, so you should expect to encounter some corruption earlier than the figures might suggest. A long-term study by CERN, the European particle physics laboratory, observed real-world data corruption at a frequency of the order of 1x10^-7. This would be a worrying statistic if carried through to media storage situations.

These figures also relate to only one journey of content through the medium in question. In practice, content is likely to flow through many hard drives and multiple parts of computer memory on its way to a more secure storage medium, and the cumulative risk that this concatenation of probabilities brings should be given due consideration.
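The scale of this cumulative risk can be illustrated with a few lines of arithmetic. The file size, error rate and number of hops below are assumptions chosen for illustration, not figures quoted by any manufacturer:

```python
import math

def corruption_probability(bits: float, per_bit_error_rate: float, hops: int) -> float:
    """Probability that at least one bit is corrupted when `bits` of data
    pass through `hops` independent stages (drives, memory, network links),
    each with the given per-bit error rate."""
    # P(no corruption) = (1 - p)^(bits * hops); log1p keeps this numerically stable
    return 1.0 - math.exp(bits * hops * math.log1p(-per_bit_error_rate))

# A 100 GB media file (8x10^11 bits) passing through three storage hops,
# assuming a quoted error rate of one bit in 1x10^15:
risk = corruption_probability(bits=8e11, per_bit_error_rate=1e-15, hops=3)
print(f"{risk:.2%}")  # about 0.24% – small per file, significant across a large archive
```

Even at manufacturer-quoted rates, an archive migrating many thousands of files should expect some corruption events, which is why checksum validation at every step matters.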

The integrity management processes discussed within this document can help to alleviate these issues, but don't be complacent simply because you hold multiple copies of content unless you can say, at any point in time, which of them are uncorrupted. This is particularly true when you come to migrate content from one storage solution to another. In this case you are likely to migrate from only one source copy of each media file, rather than from every instance, so it's essential to know the integrity of any content you are using in your migration.

Technology obsolescence reducing the ability to read the storage medium

Many storage media, especially those designed for archive use, are sufficiently stable that they do not degrade to any significant level over time – if correctly handled. You will therefore often see media such as LTO data tape quoted as having a 25-year life.

In essence this is a commitment that the data will, for the most part, still be intact on the storage media after 25 years. It is not, however, a commitment that technologies to read that medium will still exist and be supportable. In the case of LTO, the format continually evolves through generations, from LTO-1, holding 100 GB of data, through to the current LTO-6 format holding 2500 GB. As an example of support lifecycles, LTO-3 was the prevailing format until approximately 2008; however, by 2015 it was becoming harder to purchase devices capable of reading LTO-3 tapes. Anyone still needing to migrate from this format would be well advised to stockpile suitable hardware.

Technology obsolescence reducing the ability to understand the stored content

Even if the storage media hasn't degraded and you have the technology to read the medium, you are not guaranteed to be able to understand the data stored on it. As formats and standards evolve over time, not all software maintains the ability to read every previously existing format. If you expect to need to read a format long into the future without migrating the content between file formats along the way, consider also archiving the knowledge and capability of how to read that format, rather than relying on the capabilities of future software solutions.

Sometimes the stored information cannot be understood without corresponding data held in a separate system. IT backup systems, for example, may store raw information on data tapes but rely on information held in the backup product's database to provide context: the file names and folder structure may exist only in the database and not on the tapes themselves.


Good practice for archive systems is for the storage media to be self-contained and self-describing, so that it doesn't rely on any external metadata or systems to be accessed and understood.

Human Error

Even if your solution takes into account all possible ways in which a storage medium could fail, you may still find it susceptible to human error.

In the digital world, the potential for a single, seemingly insignificant action to destroy large volumes of data is far greater than in the era of physical storage. Furthermore, such a data loss may go unnoticed by the custodians of the content in a way that a corresponding large-scale destruction of physical content never would.

Particular attention should be given to software upgrades, systems administration activities and housekeeping tasks. Limiting the ability of a single person to delete all digital copies of content is greatly advised – for example, not allowing one engineer to simultaneously have administrative access to both your primary and secondary copies.

BBC DOMESDAY

In 1986, to commemorate the 900th anniversary of the Domesday Book (William the Conqueror's original survey of the country in 1086), the BBC ran a groundbreaking project to create an equivalent digital survey of the UK.

The country was divided into 26,000 rectangles, and schools across the country contributed images, sound, videos, and text capturing the essence of life, work and play in the country in 1986.

To this day, it is one of the largest crowd-sourced projects in the country, with over one million people contributing. The user interface was ground-breaking in a number of ways, allowing users to walk around towns in the style of Google Street View and to search and navigate the huge array of data – all pre-dating the invention of the World Wide Web.

The content was stored on Laser Discs, and these have not suffered any degradation over time; they are still perfectly intact.

The original Domesday Book has lasted for 930 years so far; our ability to read the contents of the Domesday discs lasted about 10 years. This wasn't due to format degradation, but to obsolescence of the software and hardware needed to read and understand the data they contained.

Interestingly, this issue was greatly compounded by the restrictive licences under which the original content was obtained, effectively limiting some content to be usable only in its original context and on the now-obsolete storage medium.

Some projects have successfully extracted specific content from the discs, but they remain largely inaccessible.

The lesson: if you are at the forefront of technology, be very careful about your archiving decisions and consider what you could do now, both from a technology and a contractual standpoint, to ensure that your data has the lifespan you would like it to have.


5

How will I know it's safe?

Application of technology to provide assurance of archiving

The OAIS model provides an excellent framework within which to consider how best to apply technology to meet the needs of an archive system. It is therefore useful to address technology aspects in relation to the key principles of OAIS, beginning with Ingest.

INGESTIf you don’t successfully capture the content you had

intended, or aren’t able confidently to assure that you have,

then there is little point in expending effort and cost in

managing its long-term life .

The OAIS principle of Submission Information Packages

(SIPs) can be interpreted to imply an approach to control

the form in which content is presented for ingest to a digital

archive both in terms of media and metadata . It’s important

to note that this doesn’t mandate a single universal input

format but instead invites you to understand your input

formats to ensure that system behaviour in relation to each

of these is defined .

How will I know it’s safe?

In reality, you may find that you are only able to control some of your input formats. Perhaps you have a small list of defined formats used within your organisation, but you may also wish to maintain the ability to archive arbitrary files, such as RAW outputs from whatever camera format your content creators have chosen to use. In this case, you could create a variety of SIP definitions for the recognised controlled formats, but would also need to create a generic SIP with a minimum metadata set for archiving files which may not themselves be of a normalised and controlled specification.

The SIP does not determine how the content is stored in the archive, only the form in which it is presented for ingest. We therefore need to consider the conversion of SIPs into Archival Information Packages (AIPs) as content is stored within an archive.

A crucial consideration is whether to modify content as it is ingested into an archive. With a single defined AIP, any content not conforming to this definition would be converted or 'normalised' as it is ingested.

This gives benefits through 'standardisation' of content to be managed within the archive and for onward distribution, but adds complexity to the ingest process and to the assurance of complete and valid capture of content.

Most notably, it becomes crucial to ensure that no information is lost and that no unintended degradation in content quality occurs during the conversion of a file from a SIP to an AIP. It is not a trivial task to automate this, or to keep up with changes to formats and systems, while remaining unequivocally confident that the process is performing perfectly.

Where content is received which deviates from expected specifications, the best case is that the system notices the discrepancy and reports the issue for attention. The worst case is that it fails silently and unwittingly creates an inferior or degraded copy of the source file. In practice, a combination of the two often occurs: good system design can plan for every known eventuality, but the design may not be exhaustive and may not cope with future technological advancements not conceived of at the time of system design.


Importantly, creating a new rendition of the media file at import means that the checksum of the media file in the AIP will not be the same as the checksum of the media file in the SIP, which is inconvenient. You therefore need to be totally sure that the AIP for which you create a new checksum contains a perfect copy of all the content that has value within the SIP, so that you can treat the new checksum as authoritative from this point on.

Archiving systems from other industries often maintain two instances of a file – a 'bitstream' copy, which is an exact replica of the original, and a 'logical' copy, which preserves the meaning and useful content of the file. For document archiving, storing two copies isn't a huge overhead given the size of the files, but it needs greater consideration when dealing with larger media files.

This is not to say that normalisation on ingest isn't a valid option – it's regularly performed when content is ingested into some asset management platforms – however, it's important to appreciate the complexity that it brings.

In summary:

• Choosing not to normalise content provides simplification up-front, but can create complexities downstream.

• Normalising upon import can lead to simplifications downstream, but you need to invest in ensuring and assuring that your normalisation process works perfectly.

If the ultimate goal of normalisation upon ingest is to ensure the long-term viability of the content, then you may want to consider archiving the ability and knowledge of how to normalise the content rather than actually performing it upfront on all content.

PRESERVATION VS ACCESS: KILL TWO BIRDS WITH TWO STONES

The two primary goals of archive technologies are to keep content safe and to provide access to it. What is often not given due consideration is that it is not always necessary to achieve both of these objectives with the same technologies, or with the same formats and standards. It is quite normal to use a blend of approaches, for example:

• Using cloud or cost-effective disk storage to provide access to content, whilst using data tape for an archive copy with managed integrity but less expedient access.

• Storing a copy of media in a widely accessible format (e.g. AS-11 DPP) but also, where the content warrants it, storing a higher quality copy in a file format that is less easy to access or edit but preserves a better quality version of the content.


ASSURANCE OF ARCHIVING

It is important to assure, wherever possible, that content has been archived successfully, and this principle is equally important whether you deliver the archive technology in-house or outsource it to a managed service provider.

Checksums should be created as soon as a SIP is received into the archive. Where no normalisation of content occurs on ingest, and where checksums were created as early as possible in the content creation process (such as when the content is QC'd), you will be able to programmatically determine that a perfect copy of the content has been secured by comparing a checksum of your archived content (re-read from archive storage if necessary) with the original source checksum.

Where automated archiving solutions are employed to capture content and deliver it to an archive system, it is possible to make use of Automated Quality Check products, but it is also recommended that manual QC processes, in the form of random spot-checks, be employed to catch issues which hadn't been considered when designing the solution. Where you want ultimate confidence in your QC processes, do not have people or systems check their own homework: where a software tool processes some files, use a different tool to validate them, and where a person performs a task, have someone else check it.

INTEGRITY MANAGEMENT

Given the likelihood of decay in content being stored, it becomes essential to ensure that your content is not adversely affected. This can be achieved by actively managing your content.

CHECKSUMS

File checksums, or 'hashes', are key to any system that aims to manage content integrity. They take any file, however large, and create a very short fingerprint which can, for all practical purposes, uniquely identify the file and determine that it is still identical to when the checksum was created.

The most common hash function used for validating media content is the MD5 checksum, which creates a 32-character value in the style of…

252e60baf2658f6ea5237c45f47c6fde

…although other algorithms such as SHA-1, SHA-256 and CRC-32 are also used.

If a checksum is created as early as possible in the life of a piece of media – for example when it is Quality Checked – and if this checksum is then stored as the master content fingerprint for that file, the content can be validated during any future processing or migration task as being identical to when the checksum was created.
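As a minimal sketch of how such fingerprinting works in practice (the function names are our own, and the chunked read simply avoids loading a large media file into memory):

```python
import hashlib

def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Compute a checksum of a file, reading it in chunks so that
    even very large media files never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path: str, master_checksum: str, algorithm: str = "md5") -> bool:
    """Compare a freshly computed checksum against the stored master fingerprint."""
    return file_checksum(path, algorithm) == master_checksum.lower()
```

The same two functions cover both halves of the workflow: `file_checksum` when the master fingerprint is first created (e.g. at QC), and `verify_fixity` whenever the content is later restored, migrated or delivered.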


In archive situations, it is typical to archive content and then not access it for a number of years – but that's a bad time to discover that both your primary and secondary copies have been lost.

It's worth setting a policy defining that all content undergoes a scheduled, albeit infrequent, integrity check to ensure it is still intact. The frequency should be selected to balance the impact of this process against the desire to detect errors before more than one copy of your content is affected. It can make sense to align these infrequent checks with the need to migrate content between storage media.

If your interest in this topic is sparked and you choose to read more about integrity management, you will undoubtedly encounter the word 'fixity'. Archive professionals use this term to refer to the measurement of whether a file is unchanged since it was stored. Validating that files haven't been corrupted is therefore sometimes referred to as 'fixity checking', but this is simply the checksum management process described above.

Keeping multiple copies, ideally in geographically resilient locations, provides the mechanism to recover from failure. However, it is also necessary to provide the mechanism to detect failure. Keeping more copies does not in itself provide full protection against gradual degradation unless you have a way of knowing which instances are uncorrupted at any point in time.

If it moves, checksum it

A simple approach that gives confidence no corruption has occurred is to create a checksum as early in the life of the content as possible, and to re-check it whenever content is restored, migrated, moved or delivered. This capability is frequently offered as part of asset management and/or storage management solutions.

It's worth noting that when performing a partial restore, where only a small portion of the entire file is retrieved, it's not practical to validate full-file checksums, and hence these activities cannot easily contribute to the ongoing assurance of archive integrity.

If it doesn’t move, checksum it

An approach that is less common is additionally to check the

integrity of content which hasn’t been accessed for some

time . Manufacturers of archive technology products often

call this ‘scrubbing’ .


6

How will I find it?

Minimum Metadata Set

Even with thorough selection processes, there is little point in storing content if you are unable to find it again successfully. This puts the emphasis on getting your metadata correct at the time of archiving, to ensure that it adequately describes your content in a way that will let you find it again.

The OAIS model described earlier in this document introduces the concept of a Submission Information Package (SIP), which allows you to define the descriptive metadata you wish to capture. It also encourages you to consider the eventual consumer of the content when defining this. It should be noted that the OAIS approach doesn't mandate a single SIP for all content, so you may choose to define a minimum metadata set which is needed for all content but allow variants for different content types. You might, for example, accept that you will use different sets of information to describe a completed programme from those applied to rushes material.

Even with only a limited number of potential metadata templates, you will still benefit from having a universal minimum metadata set defining a small subset of fields common to all content types and sufficient to allow the assets to be managed in the archive. This might simply include some basic editorial information and a unique identifier, but should ideally also contain detail on how long the content is expected to be held for.
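Such a minimum set might look like the following sketch. The field names and values are illustrative assumptions of our own, not a mandated schema:

```python
# Illustrative minimum metadata set - field names are assumptions, not a standard.
minimum_metadata = {
    "asset_id": "urn:uuid:6f1d0a2e-8c4b-4f7e-9b2d-3a5c7e9f1b24",  # globally unique
    "title": "Example Programme, Episode 1",
    "description": "Completed programme as delivered for transmission",
    "content_type": "programme",          # e.g. programme | rushes | component
    "retention": "permanent",             # or a review/expiry date
    "checksum_md5": "252e60baf2658f6ea5237c45f47c6fde",
}

# Validate that every asset carries the fields the archive needs to manage it.
required = {"asset_id", "title", "content_type", "retention", "checksum_md5"}
assert required <= minimum_metadata.keys()
```

Note that the set deliberately includes management fields (identifier, checksum, retention) alongside the editorial basics, since both are needed for the archive to care for the asset over time.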

The DPP have defined a metadata set for delivering programme files to transmission. This is detailed in the AS-11 file-format specification – specifically in the 'shim' (a name given to a constraint on an existing standard) which customises it for DPP file-delivery scenarios. Given the purpose of this delivery specification, it's unsurprising that it focuses primarily on the metadata needed to support programme playout rather than long-term archiving. Nonetheless, it would be advisable to consider this standard as you design your archive 'data model', as it provides a useful starting point for editorial metadata standardisation, and it's likely that your content will need to be contained within an AS-11 DPP file at some point in its life.

Although it’s common to focus on adding file-level metadata,

people can often overlook the benefits arising from simply

providing a high level description of a collection to capture

its reason for existence and how it came to be . Often this is

the missing piece of the puzzle that brings context to the

low-level granular metadata and informs the user on how

best to interpret it .

When defining and documenting metadata fields, it's worth considering the following characteristics:

• Should the metadata field be constrained to a limited set of values? For example, allowing only numeric values, or requiring selection from a drop-down list. Constraining a field can improve consistency, but could devalue it if the constraints are too severe.

• Should all internal users be able to view the metadata field, or should it be visible only to a limited group?

• Who, if anyone, is able to modify the metadata field?

• Would you like users to be able to search on the contents of the field, either when typing into a single 'Google-style' search box, or when using advanced search?


• Are there any sensitivities associated with the field that would prohibit wide publication? With the desire to expose content catalogues to the internet for commercial exploitation, it's essential to understand what subset of your data would be suitable for such a purpose.
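These characteristics can usefully be captured in a small, structured definition per field. The sketch below is one possible shape, with names of our own invention rather than any published schema:

```python
from dataclasses import dataclass
from typing import Optional

# One possible shape for documenting a metadata field - names are illustrative.
@dataclass
class MetadataFieldDefinition:
    name: str
    allowed_values: Optional[list]  # None = free text; a list acts as a drop-down constraint
    visible_to: str                 # e.g. "all", or a named group
    editable_by: str                # e.g. "nobody", "archivists"
    searchable: bool                # include in free-text and advanced search?
    publishable: bool               # safe to expose in a public catalogue?

# Example: a retention field - constrained, widely visible, but not published.
retention = MetadataFieldDefinition(
    name="retention",
    allowed_values=["permanent", "review-after-7-years", "delete-after-transmission"],
    visible_to="all",
    editable_by="archivists",
    searchable=True,
    publishable=False,
)
```

Writing the answers down per field, in whatever form, is the point: it turns the questions above into decisions that can be reviewed and applied consistently.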

In addition to the metadata you explicitly provide when content is archived, it's useful to consider other data you may already have which could augment the asset-level metadata. For example, you may hold, or be able to obtain, subtitle files for your completed programmes, and many asset management systems now make use of these to provide enhanced searching.

Similarly, there may be information effectively embedded within content which new technologies may allow you to unlock. Speech-to-text, topic extraction, video text recognition, automated tagging and phonetic search are all technologies moving from the research and development laboratories into mainstream products – greatly improving the ability to find your content.

Search technologies are currently a major area of focus for innovation, in contrast to storage and archiving technologies, where advancement is less rapid. Consequently it would be wise, when selecting asset management technologies, to be sure that the chosen supplier and product are able to benefit from, and ideally drive, new innovation in the search experience. This will improve the efficiency of the archive operation and keep pace with the increasing volumes of content being stored.

Unique Identifiers

You may find many identifiers within your organisation which are described as being 'unique', but in practice some are more unique than others.

Often you will find alphanumeric editorial or technical IDs used to describe assets within your production process, such as Programme Numbers, Make Numbers or Version Codes. Although these can be unique within a certain environment, or with reference to a certain level in your commissioning structure (e.g. identifying a programme version), they rarely define a piece of media uniquely enough to be used as your primary ID in a digital archive system.

For example, you might hold multiple copies of a single editorial programme version in different systems but want to distinguish unambiguously between them, even though they all share the same Programme ID.

Similarly, your Programme Number may be unique within your organisation or broadcaster, but there may be no guarantee that it is globally unique. This may not have been a problem in the past, but with growing needs to share content between organisations, these internal IDs are often not sufficiently unique to be of use in all situations. If you use your internal IDs publicly, and where there is a risk of these clashing with IDs from other organisations, ensure you include the issuing body (e.g. your organisation name) in your metadata to allow disambiguation.

A number of standards exist to assist here, such as SMPTE 330M – defining the Unique Material Identifier (UMID) used by a variety of media systems, such as Avid editing systems and tapeless capture products like XDCAM. Other initiatives focusing specifically on global identifiers for exchanging and publishing content include the International Standard Audiovisual Number (ISAN) and the Entertainment Identifier Registry (EIDR).

Some systems, such as Media Asset Management systems, will assign unique private identifiers, such as UMIDs, to all content they hold, but you will need to consider whether these are suitable for use as public identifiers.
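Where no industry registry applies, one pragmatic sketch for minting a globally unique, system-neutral ID is simply to generate a UUID and record the issuing body alongside it. This is an illustration of the principle, not a DPP recommendation; standards such as UMID, ISAN or EIDR may be more appropriate where industry-wide interchange is needed:

```python
import uuid

# Mint a random, globally unique identifier for a new asset.
asset_uuid = uuid.uuid4()

# Express it as a URN, with the issuing body recorded for disambiguation.
public_id = f"urn:uuid:{asset_uuid}"
record = {"id": public_id, "issuer": "Example Productions Ltd"}

print(record["id"])  # e.g. urn:uuid:1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed
```

The ID carries no embedded meaning, which is exactly the property you want: all descriptive metadata lives in a dedicated system, with the ID as the link.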


In the absence of an asset management system, it is common to construct file names from metadata fields and effectively use these as unique IDs for the files. Whilst this is a pragmatic response to the lack of a metadata management system, it introduces a range of complexities if carried through into digital archive systems. You would be advised to keep your IDs purely for that simple purpose – to uniquely identify an entity – and to store your metadata in a dedicated system, with your ID providing the link between the two.

Similarly, when digitising content from videotape to file, the use of filenames built from sequences of editorial and publication information can lead to unforeseen problems stemming from poor metadata quality or clashing IDs, which can delay your project. Decoupling the media conversion activity from metadata issues, through use of the simplest possible identifiers, can streamline this process.

This can be a complex topic, but it is essential that someone in your organisation takes responsibility for defining your approach and for ensuring it is implemented consistently and effectively.

Embedded Metadata

Many file formats allow metadata to be embedded, such that content becomes self-describing. This is good practice and can help with the future life of the content you create, but it's important to decide and document which copy of your metadata is the master.

Typically, the metadata embedded in a file is just a point-in-time snapshot and not necessarily your authoritative copy. It then exists only as a 'reference of last resort' to improve understanding of the content in extreme scenarios, and wouldn't necessarily need to be updated if and when your metadata changes.

The most important metadata to embed is a globally unique identifier, which will allow the file to be unambiguously linked to the authoritative metadata in whichever system holds the master copy.


7

It’s Not Easy Being Secure

Information Security is difficult. Even organisations with multi-million dollar IT budgets fall prey to hackers and suffer content and data breaches.

If you are not fully confident in your organisation's capabilities in data security, it is highly recommended that you take specialist advice. Even if you believe yourselves to be fully competent in this area, there is little harm in getting third-party assurance, and potentially commissioning a 'penetration test', given the potential consequences of a breach.

Like many other topics in this document, there is already an international standard – ISO/IEC 27001:2013 – providing guidance, standards and a means of certification for providers of technology services. Reading the ISO 27001 standards document may allow you to gain confidence in your own provision, or highlight that you might benefit from independent advice.

How can I stop the wrong people from getting in?

Below are three specific topics commonly discussed in relation to modern security provision.

The Threat Within

Previous common practice was to strongly defend the boundaries of your organisation but assume that people within your facility and IT network could be allowed relatively unfettered access to systems. You may, of course, have access rights and permissions applying to your applications, but you're unlikely to be applying the same level of rigour to your internal systems as to those on the public internet.

Recent experience has shown that data breaches and leaks often come from within an organisation, with disgruntled or otherwise-motivated employees making use of 'social engineering' (such as tricking people into exposing their login details) or exploiting security weaknesses to access and carry away high-value content.

Coinciding with this is the desire to make internal systems available remotely, to support flexible working or global collaboration, and also to make use of Cloud hosting to enable cost-effective deployment and scaling.

For these reasons it is becoming advisable to specify and design your internal IT systems as if they were on the public internet, and to pay closer attention to the level of access that your own staff are given.

Authentication

We all have first-hand experience of information security practices in our home and work lives. We are continually reminded of the need for long and complex passwords, and of the need to keep them unique and safe from prying eyes.


Passwords, however, are becoming an incomplete and imperfect way of securing systems, and new approaches and technologies are needed.

First, technologies for cracking passwords using brute-force techniques are increasing in performance and effectiveness faster than we can convince people to use ever longer and stronger passwords. Second, passwords are easy to share and to lose, and don't actually give you confidence that they are being entered only by the person to whom you initially issued them.

A common approach in high-security systems is to require Two-Factor Authentication (2FA), where you must prove you hold two unrelated components to gain access – most commonly 'something you have' and 'something you know'. We are all familiar with bank cards, where you supply the card you have and the PIN you know in order to get access to your money.

Likewise, some internet banking systems and corporate remote access services require you to enter a password whilst also supplying a rolling code from a physical token or device.

Two factor authentication is becoming more common in public systems such as Gmail, Facebook and iCloud, which all support this approach using your mobile phone as the thing you have, although it is often not enabled by default.

In considering the security model for your system, you should consider whether you would benefit from such an approach, to be sure that the person gaining access is who you believe them to be and not just someone who has seen a password written on a post-it note stuck to a monitor.
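The rolling codes supplied by tokens and authenticator apps are typically time-based one-time passwords (TOTP, RFC 6238). As an illustration of the principle, here is a minimal sketch in Python using only the standard library; real deployments should use a maintained library and per-user secrets:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, at=None, step=30, digits=6):
    """Derive a time-based one-time password (RFC 6238, SHA-1 variant).

    The issuing system and the user's device share the secret; presenting
    the current code demonstrates possession of the device ('something
    you have') without transmitting the secret itself.
    """
    key = base64.b32decode(secret_b32)
    counter = int(time.time() if at is None else at) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, per RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Both sides compute the same code for the same 30-second window.
# This secret is the RFC 6238 test key, base32-encoded.
SECRET = base64.b32encode(b"12345678901234567890").decode()
print(totp(SECRET, at=59))  # → 287082, matching the published test vector
```

Because the code changes every 30 seconds, a code seen on a post-it note is useless minutes later, which is precisely the property a static password lacks.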

Encryption at Rest

It is becoming commonplace to use encryption when moving content over the internet. This is good practice and no longer introduces significant delay or overhead.

What is less common is to make use of 'encryption at rest' to ensure that unwarranted physical access to storage media will not result in access to content.

While not everyone may choose to make use of this technology for content stored within their own premises, it is good practice to use it whenever content leaves your site, whether on hard drives, memory sticks, laptops or in the Cloud.

All modern computer Operating Systems include the option for automatic Full Drive Encryption, to ensure that a laptop or hard drive falling into the wrong hands will not release the secrets stored within, and it is good practice to mandate or enforce the application of these technologies. Likewise, when choosing cloud-based services you may want to consider whether they offer such protection.

There are, however, disadvantages to making greater use of encryption-at-rest within your systems, for example:

• Encrypted content cannot be manipulated or processed without decryption. If you need to transcode, create proxies, or partially restore content then it will need to be decrypted while these processes occur, creating a potential weak point in the system.

• Particularly with partial restore, it is normally necessary to restore and decrypt the entire file in order to extract a small portion from it, thereby negating the major benefit that partial restore offers.

• Management of encryption keys becomes a critically important task. With a sufficiently secure system, loss of the encryption key equates to loss of all content.

• Encrypted content is more vulnerable to data loss such as bit-rot. If you are lucky, corruption of a single 'bit' in an unencrypted media file might just result in a minor amount of digital dropout, but the same corruption in an encrypted file is likely to result in the loss of the whole file.
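The last point can be demonstrated directly. The sketch below pairs a toy keystream cipher with an HMAC integrity tag, standing in for a real authenticated-encryption scheme such as AES-GCM; the construction is for illustration only and must not be used in production. The behaviour it shows is the real one: flip a single bit of the stored ciphertext and the whole file is rejected, where the same flip in unencrypted essence would usually be a single dropout.

```python
import hashlib
import hmac
import os

def _keystream(key, nonce, length):
    # Toy keystream for illustration ONLY -- real systems should use a
    # vetted authenticated cipher such as AES-GCM.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key, plaintext):
    nonce = os.urandom(16)
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()  # integrity tag
    return nonce + ciphertext + tag

def decrypt(key, blob):
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        # Any corruption, however small, fails verification for the whole file
        raise ValueError("integrity check failed: file rejected")
    return bytes(c ^ k for c, k in zip(ciphertext, _keystream(key, nonce, len(ciphertext))))

key = os.urandom(32)
stored = encrypt(key, b"a few frames of video essence")
assert decrypt(key, stored) == b"a few frames of video essence"

damaged = bytearray(stored)
damaged[20] ^= 0x01  # flip a single bit, as bit-rot might
try:
    decrypt(key, bytes(damaged))
except ValueError:
    print("single bit flip: entire file unreadable")
```

This is also why resilient extra copies and checksum regimes, discussed later in this guide, matter even more once content is encrypted at rest.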


8 How long should I keep it?

Without endless capacity to store content and unlimited budget to manage it, you will need to define the criteria for what comes in and out of the archive.

Furthermore, the business analysis that should be the foundation for understanding the purpose of having an archive will determine the valuable lifetime of a piece of content, and when it should be deleted or relocated to another place.

The important thing is to have consistency and agreed processes, so that the holdings of the archive do not reflect the mind-set of the manager in charge at any particular time.

Policies and Guidelines

These decisions will provide the overarching criteria for a set of written rules that will act as a constant reference guide to the archive owner. The policies will be agreed with all stakeholders in the archive, be endorsed top-down, and be regularly reviewed to assure their continued relevance to the business. Any fundamental changes need to be agreed, approved and communicated. Without these checks in place an archive can quickly become unmanageable and costs can spiral out of control.

What is a Policy?

A policy is a high level statement that may consist of just a few words: e.g. "we collect a minimum set of good quality data to support the production of science programmes". That statement alone indicates that you select data and that there are rules around its type, source and accuracy. The reader immediately gets a sense of the worth of the material to the organisation. It's more typical for a policy to include a little more detail; but generally the objective is to make such statements accessible and understandable to anyone at any level in the company.

It may be useful to consider readability from the point of view of a Board Member of the organisation and also a new apprentice, both of whom have the same requirement to read and understand the purpose of having an archive.

What are Guidelines?

Guidelines make the initial policy statement more tangible to an individual or department, interpreting what is required of them to fulfil archive responsibilities.

They could detail the intake policy and what technical and metadata standards are expected when a piece of content enters the archive. They could explain a schedule for reviewing and selecting content, and who has overall approval. Metadata guidelines may set down where data is acquired or added, or where producers are expected to submit shot-lists.

One golden rule for guidelines is to always use role names, as opposed to specific people, so that amendments can be kept to a minimum when staff come and go.
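Guidelines of this kind lend themselves to being captured as data that archive systems can act on automatically. A minimal sketch follows; the categories, review periods and role names are entirely illustrative examples, not recommendations:

```python
from datetime import date, timedelta

# Hypothetical retention schedule: each content category maps to a review
# interval and the ROLE responsible for the decision -- never a named person.
RETENTION_RULES = {
    "completed_programme": {"review_after_days": 3650, "approver": "Archive Manager"},
    "rushes":              {"review_after_days": 365,  "approver": "Series Producer"},
    "working_files":       {"review_after_days": 90,   "approver": "Post Supervisor"},
}

def next_review(category, archived_on):
    """Return when a piece of content next falls due for review, and by whom."""
    rule = RETENTION_RULES[category]
    return archived_on + timedelta(days=rule["review_after_days"]), rule["approver"]

due, approver = next_review("rushes", date(2015, 1, 1))
print(due, approver)  # → 2016-01-01 Series Producer
```

Encoding the schedule once, rather than leaving it to individual judgment, is exactly the consistency the policy is meant to guarantee.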

9 How do I know it will always play?

Migration

As discussed above, if you are intending to keep content in perpetuity, it is inevitable that it will outlive a specific technology solution or storage medium.

Although in some rare cases you may commit to preserve forever the capability to access the source media, for example by continuing to maintain specialist hardware and software, it will generally be necessary to consider the need to migrate content between formats, carriers or technologies.

Migration is just as important for metadata as it is for media files and, in both cases, may need to occur on a number of levels:

1 Physical Media Storage

2 File wrapper or metadata container

3 File codec or metadata structure

At its simplest, this is just an exercise in either copying or converting media and metadata, en masse, between storage media and data containers. There are, however, a number of real-world constraints that need to be considered.

In deciding when to migrate content, you will need to consider how your ability to access the content may be degrading relative to the emergence of new formats onto which to migrate it. For example, you may want to migrate LTO3 content to LTO6 tape before LTO3 drives become obsolete, but after LTO6 has become a stable and cost-effective format. It is possible to adjust the practicality of this timing by pre-purchasing equipment which is due to become obsolete, although consideration should be given to the availability of support, spares, and consumables.

A primary factor in migrating content which currently resides in a live environment is the requirement not to affect the live operation through the additional content access that the migration process requires. Where a resilient second copy of content is held, a common approach is to migrate from the secondary copies rather than the main instance.

Access to data to drive the migration process is crucial. Content stored in an archive may vary over time, for example with superseded files still residing on media even though they are no longer referenced by the live repository. You may therefore only want to migrate current valid data rather than the entire content of tapes, but you would need data to drive this process.

Similarly, when migrating metadata it is normal to perform a data-cleansing exercise beforehand, to ensure that you are not needlessly migrating redundant or erroneous metadata.

Integrity Management During Migration

The most effective checksum lives with a piece of content through its entire life; however, the potential need to migrate between different formats or codecs introduces a discontinuity, during which extra attention should be given to ensuring the integrity of content is maintained.


Where multiple copies of content are held to reduce the likelihood that content is corrupted at rest, it's important not to undermine this approach by migrating from one copy without also validating the integrity of the others. The normal migration approach would be to use file checksums to ensure that content is identical to when it was archived, and to make use of the additional copies to replace any corrupted content.

If you don't have access to checksums created when content was originally archived, then you will need to take great care to ensure that the migration has happened successfully, for example by comparing all the resilient copies you hold to give extra confidence.

You can make use of a variety of tools to help validate that no loss of content integrity has occurred during migration, for example:

File-level checksums: When only migrating between storage platforms, this can unequivocally prove a perfect migration.

Content checksums: Where lossless migrations between format carriers are undertaken, some formats support media-level checksums on a per-frame or per-track basis, which allow unambiguous verification that the content within a new wrapper format is identical to that contained in the original.

Content comparison: Where transcoding is being performed, it becomes harder to ensure that the created file is equivalent to the source. Simple comparisons of file properties such as duration and number of audio tracks can provide a level of confidence, but some tools allow the content of two files to be compared and a measure of the extent to which the media content differs to be calculated as a Peak Signal-to-Noise Ratio (PSNR) figure.
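PSNR is derived from the mean squared error between corresponding samples of the two files; identical content gives an infinite figure, and larger differences give lower ones. A minimal sketch in Python (the sample values are illustrative; real tools compare full decoded frames):

```python
import math

def psnr(reference, test, max_value=255):
    """Peak Signal-to-Noise Ratio (dB) between two equal-length sample sequences."""
    if len(reference) != len(test):
        raise ValueError("files must be compared sample-for-sample")
    # Mean squared error between corresponding samples
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # bit-identical content
    return 10 * math.log10(max_value ** 2 / mse)

original = [16, 180, 200, 35, 90]    # a few 8-bit luma samples
transcoded = [16, 181, 199, 35, 91]  # nearly identical after transcoding
print(f"{psnr(original, transcoded):.1f} dB")  # high figure: content barely differs
```

What PSNR figure counts as an acceptable transcode is a local policy decision, agreed as part of your quality assurance plan rather than assumed.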

Whilst seemingly obvious, it's important to remember that if you go to the effort of creating checksums then you need to store them safely somewhere too. A 'verifiable manifest' of all your content will also allow you to confirm that no content is missing, as well as confirming that the content you do hold is intact.
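A verifiable manifest needs little more than a mapping from file names to checksums. A minimal sketch using SHA-256 from the Python standard library (the manifest shape and file layout are illustrative):

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(manifest, root):
    """Check a {filename: checksum} manifest: report missing and corrupt files."""
    report = {"missing": [], "corrupt": []}
    for name, expected in manifest.items():
        target = Path(root) / name
        if not target.exists():
            report["missing"].append(name)
        elif sha256_of(target) != expected:
            report["corrupt"].append(name)
    return report
```

The manifest would be built at archive time and re-checked after every migration and at routine intervals; a clean report means every file listed is both present and intact.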

Data Tape Migration

Where content is stored in robotic data tape libraries, there are two characteristics of these systems that need specific consideration during migration activities:

• There is a delay while the robotic mechanism selects the required tape from the library and loads it into a tape drive for access.

• The content is stored linearly on the tape, so the system must spool to the relevant portion of the tape before content can be read from it.

Both these factors introduce a delay in random access to a specific piece of content. Where large volumes of content need to be migrated, the cumulative effect of these delays can make the operation prohibitive. For example, one million small audio files migrated using a single data tape drive, in a system where there is a three minute delay to access a random file, would take over five years!

These delays can be drastically reduced by using information about how the content is stored within the system to drive an efficient migration process. Content on each tape would be migrated together rather than swapping between tapes, and in the exact order in which it is stored on the linear tape, rather than requiring continual spooling. Some storage management systems include this capability as part of the product; where it is not provided, special consideration should be given to whether it is possible to access the required data and drive a process to migrate the content in a timely manner.
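Both the arithmetic behind the five-year figure and the tape-ordered alternative are easy to sketch (the tape labels and byte offsets below are illustrative):

```python
# The cumulative cost of random access: one million small files,
# three minutes of robot-and-spool delay each, a single drive.
files = 1_000_000
minutes_per_random_access = 3
years = files * minutes_per_random_access / (60 * 24 * 365)
assert 5 < years < 6  # roughly 5.7 years of continuous operation

# Ordering the work by tape, then by position on tape, removes almost all
# of that: each tape is mounted once and read in a single forward pass.
jobs = [
    {"file": "clip_0001.wav", "tape": "LTO123", "offset": 918_000},
    {"file": "clip_0002.wav", "tape": "LTO007", "offset": 12_000},
    {"file": "clip_0003.wav", "tape": "LTO123", "offset": 4_500},
]
migration_order = sorted(jobs, key=lambda job: (job["tape"], job["offset"]))
print([job["file"] for job in migration_order])
# → ['clip_0002.wav', 'clip_0003.wav', 'clip_0001.wav']
```

The hard part in practice is not the sort but obtaining the tape-and-offset data from the storage management system, which is exactly the access the text above recommends confirming.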


All good things must come to an end

Exit planning is a specific subset of migration and is important for all solutions, but especially relevant for managed services.

Put simply, don't put any media or metadata into a system unless you are very clear about how to get it out again later.

A common error is to focus on the media and editorial metadata without ascribing sufficient value to the organisational metadata that accompanies them. For example, the cataloguing and organising of content stored within a media asset management system effectively results in the creation of metadata which can have a high value, and which should be considered in migration and exit planning.

Your exit plan must be agnostic of destination and provide a generic capability to access all media and metadata in a defined way, such that they can be migrated into a replacement system even if that system is not conceived or defined at the time of entry into the initial system.

It's good practice to create an exit plan, but it's even better practice to keep it up to date as technology changes.

EXIT PLANS – A WORKED EXAMPLE

Let us consider a scenario where you choose to store your media and metadata with a service provider. It is likely that you will change or re-evaluate your solution or supplier before your content has reached the end of its useful life, and hence you need to plan for how you will eventually move your content to a new solution or provider. Also, imagine that you are continually augmenting the metadata you hold through your interactions with the system, such as by organising, cataloguing and accessing content.

Your exit planning should take place before you begin to store content in your system. It is also advisable to run a proof-of-concept of any exit procedures once you have a small volume of content stored, but before it's too late to change your approach.

You would want to confirm that the agreed process would allow a defined subset of media to be delivered back to you in a usable form, and that the accompanying metadata contains all the information you require, including both the original and added data. An example of this additional metadata would be organisational information, for example resulting from putting the media into a folder, bin, or collection within the system.

Where this metadata is passed back to you in a physical form, such as a data tape, you would be advised to confirm that you are able fully to access it without use of the system from which you are testing exit. For example, you should confirm that you can read the data tape using a generic system, play the video, and read the metadata.

For metadata interchange, it is likely that XML files will be used, and you would ideally want a competent person in your organisation to validate these and confirm they contain the required information. If this isn't possible, you might enlist the help of your supplier to walk you through the files and highlight the information of interest.

You would be advised to re-run this exercise at defined, albeit infrequent, intervals. You should also trigger it upon significant changes to the underlying technology infrastructure, such as major version upgrades.
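A first pass over exported XML metadata can be largely mechanical. A minimal sketch using the Python standard library; the element names and required fields here are hypothetical, and the real list would come from the interchange schema agreed with your supplier:

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields -- the real list comes from the metadata
# schema agreed with your supplier as part of the exit plan.
REQUIRED_FIELDS = ("Title", "MediaId", "Duration")

def check_export(xml_text):
    """Return a list of problems found in one exported item record."""
    try:
        item = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for field in REQUIRED_FIELDS:
        node = item.find(field)
        if node is None or not (node.text or "").strip():
            problems.append(f"missing or empty field: {field}")
    return problems

export = """<Item>
  <Title>Series 1 Episode 3 (final TX version)</Title>
  <MediaId>ABC0001</MediaId>
  <Duration></Duration>
</Item>"""
print(check_export(export))  # → ['missing or empty field: Duration']
```

A script like this does not replace the competent person's review, but it scales the check across an entire export rather than a sampled handful of files.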


10 THE PRACTICE

“In theory there is no difference between theory and practice, but in practice there is.” (various)

Now that you have an understanding of the key aspects of Digital Archiving and Preservation, you may find yourself looking to build or procure a technology solution or service to deliver your archiving needs.

Putting aside business aspects such as organisational design and training for a moment, and focusing purely on technology, you should first be clear who in your organisation is making your technology decisions and whether they have the experience and expertise needed to do so.

If you don’t have suitable knowledge within your own

organisation you may want to call upon knowledgable

experts from other establishments; however always ensure

that you have clearly defined accountability for technology

decisions and that those being held accountable are aware of

their position and given the support that they need .

How much will it cost me?

A major factor in considering technology solutions is cost, as this will ultimately dictate the limitations and capabilities of your system. When weighing cost against other factors it is useful to reiterate the following points:

The same solution need not necessarily be used to provide both preservation and access. Architecting a single system to have the required levels of content integrity and assurance whilst also providing wide and expedient access may not be cost effective. For example, you may choose to make your content available via cheap, fast hard disk arrays, while keeping a secure copy on less accessible but more secure data tape.

Not all content is created equal

In considering technology solutions, you can weigh the relative value of all the content being stored and architect the solution appropriately. For example, is a single archived rushes clip of equal value to a fully-finished programme file, and hence do they warrant the same degree of content resilience, speed of access, or choice of file format?

Naturally, you will be considering whether to self-provide the solution or look to a managed service. Either way, remember that your information belongs to you, and ensure that you don't relinquish more control of it than you would like.

GENERAL CONSIDERATIONS

In this document we have focused on the goal of an organisation having access to a Digital Archive. Whether you deliver this in-house or look to a managed service provider, you should ensure that you and your 'supplier' agree on the defining characteristics of your system, and document them through the creation of (at least) the following plans:

• Predicted Capacity and Usage Plan

• Service Definition

• Migration Plan

• Exit Plan

• Quality Assurance Plan – to allow you to have confidence that your system is working as intended, without simply putting all your faith in the technology

• Resilience and Replication Plan

• Disaster Recovery and Business Continuity Plan


Considering these points will help you reach a common understanding with your supplier about expectations of the solution being provided. This is equally relevant for internal provision as for managed services, and will ensure that any desired characteristics of the service which are liable to have a cost implication are discussed, and consensus reached on a practical solution.

For example, there is undoubtedly an additional cost arising from processes to assure the safe storage of content, and from considering preservation needs such as integrity management, migration planning, and exit planning. All of these costs should be balanced against the risk and financial impact of content loss. It may be that for some categories of content, such as raw rushes materials, you decide that the balance swings in favour of limited preservation management and reduced storage cost; but it is likely that the reverse will be true for completed programmes.

For each of the plans described above, you should consider whether you want to set aside funding for rehearsals and trial-runs at service commencement, and potentially at regular intervals thereafter, to ensure the viability of the approach as technology changes. This is commonly practised for disaster recovery, to ensure that content is still available even in the presence of defined system failures, but less commonly for exit and migration planning outside of specific archive-focused scenarios.

Where you become heavily reliant on a particular solution you may consider entering into an escrow agreement. Here, the software and information necessary to continue using a solution is lodged with a third party and passed on to you, by means of a legal agreement, if your provider were to cease trading.

Another approach to covering for the cessation of your supplier is for them to enter into contractual agreements with other providers of similar services, such that these other parties can take on the custodianship and management of your content if your initial supplier were to cease trading.

In managed service offerings in some other industries you naturally get immediate feedback on the performance of the contract: if a catering contract fails or degrades, your staff will make you aware of it. For archive managed services, whether or not your content is being appropriately managed is often a matter of trust. It is therefore recommended that additional assurance measures are put into place to ensure that the archive and preservation management practices that you are expecting are actually being delivered.

To ensure that your content is being managed effectively, it is advisable to set aside a proportion of your budget for quality assurance performed within your own organisation.

In selecting a vendor, you may look to a certification such as 'Trustworthy Repository', 'Trusted Digital Repository', or an equivalent accreditation, to give you comfort that your supplier has considered all essential factors relating to long-term storage of content. At the very least, peruse the checklists provided in relation to these certifications to acquaint yourself with the factors that expert bodies consider to be characteristic of such an organisation, and consider whether the additional effort required to meet these requirements is worth the potential cost which may be incurred as a result.

The National Digital Stewardship Alliance (NDSA) has also created a simple approach for evaluating the maturity of a particular service or solution, which you might find useful. This focuses on the practical aspects of archive storage and takes the form of a tiered set of recommendations entitled 'Levels of Digital Preservation'. Links to this and other approaches are provided in the Further Reading section of this document.


CONCLUSION

You may feel overwhelmed by how many things this guide gives you to think about. But if you hold on to the following eight principles, and approach the topics discussed in this document methodically, there is every reason to believe you will create a first-class digital archive.

1 Consider the eventual consumer of your content when making all decisions

2 Remember that not all content has equal value

3 Be clear about owners and responsibilities for all archive functions

4 Don't put archiving at the end of your production process – consider it throughout

5 Document your decisions

6 Consider the lifespan that you'd like your content to have, and plan accordingly

7 Remember that Preservation and Access can be delivered using different solutions

8 Don't put blind trust in technology – always assure your processes and plan for failure

You and your colleagues may then be amazed at the positive impact this has on your business. Never have we been so aware of how precious and rare good content is; to have a place where you can keep yours safely, and retrieve it easily, will feel extraordinarily empowering.


FURTHER READING

DPP: 10 Things You Need to Know About Digital Storage
The document is available to DPP Members at http://www.digitalproductionpartnership.co.uk

Digital Preservation Coalition: Preserving Moving Pictures and Sound
http://dx.doi.org/10.7207/twr12-01

The Open Archive Information System Reference Model: Introductory Guide
http://dx.doi.org/10.7207/twr14-02

OAIS: Full Specification
http://public.ccsds.org/publications/archive/650x0m2.pdf

AMWA AS-11 DPP Format Specification
http://www.amwa.tv/projects/AS-11.shtml

CERN study on data integrity
http://indico.cern.ch/event/13797/session/0/contribution/3/attachments/115080/163419/Data_integrity_v3.pdf

NDSA Levels of Digital Preservation
http://www.digitalpreservation.gov/ndsa/activities/levels.html

Digital Preservation Metrics (including TRAC and TDR checklists)
http://www.crl.edu/archiving-preservation/digital-archives/metrics

Digital Curation Centre: Lifecycle model
http://www.dcc.ac.uk/resources/curation-lifecycle-model

Practical Digital Preservation
http://www.facetpublishing.co.uk/title.php?id=047555#.VS1xLjHF9SQ

Personal Digital Archiving
http://www.digitalpreservation.gov/personalarchiving/documents/NDIIP_PA_poster.pdf


V1.1

This DPP production was brought to you by Steve Daly and Heather Powell, both of whom have many years of practical experience of working with media archives. They were assisted by Mark Harrison, Emma Vandore, Rachel Baldwin and Abdul Hakim. We'd very much like to thank the numerous DPP Members who have also contributed to this publication: it has benefited greatly from their collective expertise.

Design by Vlad Cohen http://www.thunder-and-lightning.co.uk

Copyright Notice:

This publication is copyright © Digital Production Partnership Ltd 2015. All rights are reserved and it is prohibited to reproduce or redistribute all or any part of this content. It is intended for members' use only and must not be distributed outside of an organisation. For clarity, this prohibits distribution to members of a trade association, educational body or not-for-profit organisation as defined by the DPP membership categories. Any exception to this must be with the permission of the DPP.