People Mashing: What we did in the AQuA Project Paul Wheatley (and) Andrew Jackson, Bo Middleton, Jodie Double, Rebecca McGuinness

Upload: christiana-eaton

Post on 13-Dec-2015

Background

Increasing recognition of the need for quite basic preservation tools: What have we got? Is it of sufficient quality? Have I just broken it?

Characterisation and Quality Assurance (QA) in three main areas:

QA of digitised material. E.g. missing pages, out of focus pages, duplicate pages, inconsistent metadata, incorrect cropping, “thumb in picture”…

QA of processing/storage/handling. E.g. unpacking containers, moving content, processing, storage…

Identification of preservation risks. E.g. non-embedded fonts in PDFs, Kakadu produced JPEG2000s missing resolution information…

Particularly the case when working at scale: Move from thousands to millions of objects shows up processes that are not as solid as we thought

Causes: software errors, disks get full, networks drop out, human error…

Human QA does not scale well and is costly: automation is the solution
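
As an illustration of automating the "have I just broken it?" question for moved or unpacked content, here is a minimal Python sketch that records checksums before a transfer and compares them afterwards. It is not taken from the AQuA outputs; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before, after):
    """Report files that vanished, appeared, or changed between two manifests."""
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

Building the manifest at the source, repeating it at the destination and comparing the two would flag the full-disk and dropped-network failures listed above without anyone opening a file by hand.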

Digitised newspaper: Quality Assurance miss
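
Some digitisation slips, such as the missing and duplicate pages mentioned under Background, can be flagged by a short script. The Python sketch below is only an illustration: it assumes page images carry a page number just before the file extension (for example `page_0001.tif`, a hypothetical naming scheme), and it only catches byte-identical duplicates, not a page that was scanned twice.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Digits immediately before the file extension (assumed naming convention).
PAGE_NUMBER = re.compile(r"(\d+)(?=\.[^.]+$)")


def page_number(name):
    """Return the page number encoded in a file name, or None if there is none."""
    match = PAGE_NUMBER.search(name)
    return int(match.group(1)) if match else None


def find_missing_pages(names):
    """List page numbers absent between the lowest and highest page found."""
    numbers = {n for n in map(page_number, names) if n is not None}
    if not numbers:
        return []
    return sorted(set(range(min(numbers), max(numbers) + 1)) - numbers)


def find_duplicate_pages(paths):
    """Group files whose bytes are identical (exact duplicates only)."""
    by_digest = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        by_digest[digest].append(str(path))
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Problems such as poor focus or a thumb in the picture need image analysis and are outside what this sketch attempts.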

Migration from TIFF to JPEG2000, corruption example
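
A batch migration like this can be given a cheap first-pass check. The Python sketch below assumes each `name.tif` is migrated to `name.jp2` in a second directory (a hypothetical layout) and tests only that the output exists, starts with the JP2 signature box, and ends with the JPEG 2000 end-of-codestream marker. The last test is a heuristic, because a valid JP2 file may carry further boxes after the codestream, and none of these tests can detect corruption inside the image data itself.

```python
from pathlib import Path

# First 12 bytes of a JP2 file: the signature box.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# End-of-codestream (EOC) marker that closes a JPEG 2000 codestream.
EOC_MARKER = b"\xff\xd9"


def check_jp2(path):
    """Return a list of problems found by cheap structural checks."""
    data = Path(path).read_bytes()  # fine for a sketch; large files could be read partially
    if not data:
        return ["empty file"]
    problems = []
    if not data.startswith(JP2_SIGNATURE):
        problems.append("missing JP2 signature box")
    if not data.endswith(EOC_MARKER):
        problems.append("no EOC marker at end of file")
    return problems


def check_migration(tiff_dir, jp2_dir):
    """Map each source TIFF name to the problems found in its migrated JP2."""
    report = {}
    for tiff in sorted(Path(tiff_dir).glob("*.tif")):
        target = Path(jp2_dir) / (tiff.stem + ".jp2")
        if not target.exists():
            report[tiff.name] = ["no migrated file"]
            continue
        problems = check_jp2(target)
        if problems:
            report[tiff.name] = problems
    return report
```

An empty report means only that nothing obviously went wrong; a fuller check would decode both files and compare the pixels.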

How do we move things forward…

Characterisation and QA challenges

Anecdotal evidence from colleagues suggested we were not alone

Existing digital preservation tools aren’t meeting the challenge

Open Source possibilities. Which tools, how effective are they, how do we make them work to solve our problems?

AQuA: Automating Quality Assurance Project

Project funding opportunity from JISC

Focus on putting existing tools into use and solving DP problems

Partnership with Universities of Leeds and York, The Open Planets Foundation (OPF) and the British Library (BL): AQuA

Constraints: modest funding, 6 month project length, no lead time

How could we make this work?

Staffing requirements

Skills and understanding required:

A good understanding of the specific QA and preservation challenges faced by UK institutions

Access to samples of problematic digital collections where these challenges were present, to support solution testing

Knowledge of likely open source toolsets that might provide useful solutions

Effort to progress, test, evaluate the new tools

Solution: Event focussed approach that would utilise the knowledge and the expertise of a community rather than of a few individuals. The project effort would focus on facilitating events at which our attendees would deliver the results

Existing event formats

Hackathons: technical focus, not heavily structured, prizes for challenges. E.g. OPF events, DevCSI (UK)

Mashups: technical focus, combining open data sources in innovative ways to create new services

Wikiathons: non-technical, pooling effort of attendees to update and add detail to wiki data

Unconference events: agile, participant driven workshops and discussions, CURATE Camp (USA)

AQuA approach borrows elements from each and adds some structure and a mix of technical and non-technical participants

AQuA events

Project held two events in the UK, in Leeds and London

20-30 people, 3 days long

Strict participant roles: No observers! Participation is mandatory!

Techies/hackers, some with DP experience, some programmers from a library or archive background who hadn't worked on DP

Practitioners/collection owners, who were asked to bring along at least one sample of a digital collection

Spoke to all attendees in advance to make sure they knew what to expect and what we expected of them at the event

Worked in an agile manner. Lots of lightning talks, knowledge exchange, group brainstorming, as well as hacking time

AQuA Event format:

Day 1: Capturing the preservation challenges
Introductions, learn about collections
Brainstorming and recording collection challenges and initial thoughts about solutions
Teamed up the participants in practitioner-techie pairs. These pairs would then work together across the 3 days

Day 2: Hacking and mashing
Techies: developing solutions
Practitioners: capturing and recording their requirements
Workshop/discussion sessions on a variety of topics
Lots of reporting back to facilitate knowledge exchange

Day 3: Wrap up, results, reporting back and evaluation
Completing developments
Presentations and demos
Evaluating the preservation solutions against their requirements
Evaluating the event

Capturing the outcomes

Short events, important to capture the outcomes

Checking in software developments to the GitHub code repository

Emphasis throughout event on writing up results in a wiki

Collection - Issues - Solution structure

Tool list of some 50 mainly open source tools used at the two events

http://wiki.opf-labs.org/display/AQuA
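
The Collection - Issues - Solution structure used for the write-ups can be pictured as a small nested record. The Python sketch below is a hypothetical model for illustration, not the wiki's actual schema; the class and field names are assumptions, and the example entry reuses the British Library audio case reported from the events.

```python
from dataclasses import dataclass, field


@dataclass
class Solution:
    name: str
    tools: list = field(default_factory=list)  # tool names used by this solution


@dataclass
class Issue:
    description: str
    solutions: list = field(default_factory=list)  # Solution records


@dataclass
class Collection:
    title: str
    owner: str
    issues: list = field(default_factory=list)  # Issue records


def summarise(collections):
    """Count collections, issues and solutions, and list the distinct tools used."""
    issues = [issue for c in collections for issue in c.issues]
    solutions = [s for issue in issues for s in issue.solutions]
    tools = sorted({tool for s in solutions for tool in s.tools})
    return {
        "collections": len(collections),
        "issues": len(issues),
        "solutions": len(solutions),
        "tools": tools,
    }


example = Collection(
    title="BL audio field recordings",
    owner="British Library",
    issues=[
        Issue(
            description="Identify and validate sound files in mixed formats and containers",
            solutions=[Solution(name="AQUAudio", tools=["getID3()"])],
        )
    ],
)
```

Calling `summarise([example])` tallies one collection, one issue and one solution, which is the same kind of count the project reported across both events.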

Example results

No time to describe all of them - 20 digital collections examined, 40 different preservation issues described, 25 different solutions implemented. http://bit.ly/ufyk4R

Collection: BL audio field recordings submitted by the British public, created on a variety of recording devices, in a cross section of esoteric file formats

Issue: Identify and validate sound files comprised of multiple file formats and containers; some with embedded metadata, some corrupted during upload, and some with incorrect file extensions

Solution: (produced by Maurice de Rooij, NANETH) AQUAudio is a wrapper script around the Open Source getID3() PHP-library. It extracts information from audio files, such as audio properties (bitrate, #channels, sample-frequency, etc.) and metadata (ID3v1, ID3v2, BWAVE metadata, etc), and writes the results to an XML file, optionally normalise
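
To give a feel for one part of this problem, files with incorrect extensions, here is a minimal Python sketch that identifies a few audio formats from their leading bytes and writes the findings as XML. It is not AQUAudio and does not use getID3(); the function names, the XML element names and the small set of signatures covered are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Extensions considered consistent with each detected format (illustrative, not exhaustive).
EXPECTED_EXTENSIONS = {
    "wav": {".wav"},
    "aiff": {".aif", ".aiff"},
    "flac": {".flac"},
    "ogg": {".ogg", ".oga", ".opus"},
    "mp3": {".mp3"},
}


def sniff_format(header):
    """Guess the format from the first 12 bytes of a file, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] == b"AIFF":
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:3] == b"ID3":
        return "mp3"  # an ID3v2 tag most often, though not always, starts an MP3
    return None


def check_file(path):
    """Compare the sniffed format with the file extension."""
    path = Path(path)
    with open(path, "rb") as handle:
        header = handle.read(12)
    detected = sniff_format(header)
    extension = path.suffix.lower()
    if detected is None:
        status = "unidentified"
    elif extension in EXPECTED_EXTENSIONS[detected]:
        status = "ok"
    else:
        status = "extension-mismatch"
    return {"name": path.name, "extension": extension,
            "detected": detected or "", "status": status}


def build_report(paths):
    """Return an XML string with one <file> element per checked path."""
    root = ET.Element("audioReport")
    for path in paths:
        ET.SubElement(root, "file", check_file(path))
    return ET.tostring(root, encoding="unicode")
```

A mature library covers far more formats and also reads the embedded metadata, which is the argument for reusing one rather than growing a sketch like this.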

Example results (2)

Existing digital preservation tools (e.g. DROID) support only a handful of audio formats

getID3 is a comprehensive and well supported open source project.

Supports 60 different audio file formats

Used extensively in other products, so it's well tested and reliable, as evidenced by the active support forum

Achieving that level of support and quality from a home grown digital preservation tool would take years of development and funding that we simply don't have

Instead we leveraged an existing solution in 3 days that is now in production use at the British Library

Useful outputs

Focused way of solving preservation challenges. Some production solutions, some prototypes, some approaches ruled out.

Detailed record of genuine preservation challenges and requirements, that can inform future work

People Mashing / knowledge sharing:
Better understanding between practitioners and techies
Knowledge sharing on tools, approaches, techniques
Community building: sharing expertise, encouraging those with problems to seek help in a non-judgemental environment
Hands on digital preservation training for DP beginners

Participants wanted to continue to develop the fledgling community that was created

Thank you!

Any questions?

Further information:

AQuA Project: http://wiki.opf-labs.org/display/AQuA

Open Planets Foundation: http://openplanetsfoundation.org/

Me on twitter: @prwheatley