EMu, Collections Online, and the Adkin Diaries: Using existing platforms for transcription.
Carol Stevenson
Collection Information System Manager
Museum of New Zealand Te Papa Tongarewa
11th Australasian EMu User Conference
Sydney, 3-4 September 2013
Spent morning shepherding breeding ewes etc. Mustered the 2nd class Waikanae hoggarts (these are now the best we have) + drafted out 105 of the 109 to go UR (North Block). Also mustered the (former) 1st class Waikanae (193) + drafted out 179 to go up to N. Blk with the others. In evening rode down to see Maud – helped her develop some plates – spent a lovely time with her – she’s a perfect darling.
http://collections.tepapa.govt.nz/theme.aspx?irn=4382
Background
George Leslie Adkin, 1888 - 1964
Farmer, photographer, geologist, explorer, archaeologist, ethnologist
1 man, 41 diaries, 59 years, over 21000 days
Thousands of negatives and prints, some albums
Initial deadline: launch of @life100yearsago Twitter feed, part of WW100 project
Did everything ourselves
Figure out process (imaging, cropping, loading, transcription guidelines) as we went
Process
Assess album condition
Photograph album pages, load as media assets to album
Crop pages to days, load as Media Assets (derivatives)
Create narrative for days
Load “day” images to EMu “day” narrative
Transcribe
Add associated subjects, people, places
Add context to narrative entries for month
Some parts semi-automated, some completely manual; some need no special skills, others do
Framework
Using existing framework: EMu & Collections Online
CIDOC CRM for building and expressing semantic relationships
Days are conceptual entities, not physical
Links to physical entities: diaries, photographs, albums (Catalogue)
Links to people, places, topics (Thesaurus and Parties)
All content managed in EMu and delivered to Collections Online
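One way to picture the framework above is as a small data model: a "day" is a conceptual record that points outward to physical catalogue items and to authority terms. The sketch below is only an illustration of that idea. The class names, field names, IRN values and the date are all hypothetical; this is not EMu's schema or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CatalogueItem:
    """A physical entity (diary, photograph, album). Hypothetical model."""
    irn: int
    kind: str
    title: str


@dataclass(frozen=True)
class AuthorityTerm:
    """A person, place or topic, as held in Parties or Thesaurus. Hypothetical model."""
    irn: int
    kind: str  # "person", "place" or "topic"
    label: str


@dataclass
class DayNarrative:
    """A conceptual entity for one diary day, with links to other records."""
    day: date
    transcription: str = ""
    physical_items: list[CatalogueItem] = field(default_factory=list)
    associations: list[AuthorityTerm] = field(default_factory=list)

    def labels(self, kind: str) -> list[str]:
        """Return the labels of all associated terms of the given kind."""
        return [term.label for term in self.associations if term.kind == kind]


# Illustrative values only: the IRNs and the date are made up.
diary = CatalogueItem(irn=1, kind="diary", title="Diary (illustrative)")
maud = AuthorityTerm(irn=2, kind="person", label="Maud")
entry = DayNarrative(day=date(1913, 1, 1),
                     physical_items=[diary],
                     associations=[maud])
print(entry.labels("person"))
```

Keeping the day separate from the diary page means one day can link to several physical items (a diary page and photographs) without duplicating the transcription.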
Narrative for Day
Narrative Hierarchy
Narrative Associations
Hierarchy links to Catalogue
Catalogue - Media Asset
Media Asset Hierarchy
Existing framework: Cons
No crowdsourcing opportunity
Huge amount of data pushes current visual design of Collections Online
Hierarchies get very long, can be slow in EMu
Existing framework: Pros
Cheap!
Know how to use it
No set up
Proved flexibility of system
Full use of thesauri etc
Links into rest of the collections (this is the most important)
Existing audience
Crowdsourcing
Size of the project is daunting, but the transcription could be manageable through crowdsourcing
The content is interesting: NZ history, early 20th Century courtship, farming, geology, religion, war, politics, weather…
Horowhenua locals interested in local history, and one of their famous sons
History students and educators: Bring students closer to primary material, work with cursive handwriting, highlight the importance of accuracy in relation to data, personal biography
Learning history through a first hand account
Platforms and complex data
There are a number of existing online platforms that look great (Zooniverse, FromThePage), but how to deal with matching to our structure, vocabularies, authorities?
Could use automated text authority mining, but would need to then match back to authorities and structure, and text doesn’t include concepts that require human understanding (e.g. courtship)
Beyond scope of crowdsourcing? But does that diminish the value of the “data”?
External platforms mean lots of data and image handling
“Closed” crowdsourcing: provide volunteers remote access to EMu, with very cut down access
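The limits of automated text matching mentioned above can be shown with a very small sketch. It assumes a literal, whole-word lookup of authority labels in a transcription; the function name and the three-term authority list are hypothetical, and the text is abridged from the diary entry quoted at the start. Real authority mining would be more sophisticated, but the gap it leaves is the same.

```python
import re


def match_authorities(text: str, authorities: dict[str, str]) -> dict[str, str]:
    """Return the authority labels that occur literally in the text.

    Matching is whole-word and case-insensitive. Concepts that are
    implied rather than written down cannot be found this way.
    """
    found = {}
    for label, kind in authorities.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found[label] = kind
    return found


# Abridged from the diary entry quoted earlier.
text = ("Mustered the 2nd class Waikanae hoggarts. "
        "In evening rode down to see Maud.")

# Hypothetical authority list, not real Thesaurus or Parties records.
authorities = {"Maud": "person", "Waikanae": "place", "courtship": "topic"}

print(match_authorities(text, authorities))
```

The person and the place are found because they appear as words. "courtship" is not, even though that is what the entry is about, so a human still has to add it.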
Where to
Can’t do with existing (human) resource
Transcription only one part of the project, richness comes from the linking of concepts, people etc
Need to figure out what parts need to be crowdsourced, what can’t
Transcription could enable the adding of contextual and semantic relationships and links to other sources
Options for automating the above
Or, with a focussed crowd and a finite project, maybe we don’t need a new platform, could provide training and use existing tools
Make data available for analysis, visualisation, research, fun
In evening rode down to see Maud – showed her some books but there seemed to be a lack of sympathy between us + the evening was a failure.
http://collections.tepapa.govt.nz/theme.aspx?irn=4080
Twitter
Narrative teasers are also Tweets
Link to Collections Online
One of a number of 100 years ago accounts
Feeds into a group account @life100yearsago
Also tweet images for days that have them
Dead man tweeting: potential issues of responding to comments etc
What we’ve learnt
So much content, so much data
Our existing data structure works really well
Transcription only one part
Context needed, or at least useful, for the reader
Enlivens the collection, a step beyond just digitisation and transcription
Need to formalise the project
See
Adkin diaries on Collections Online
@adkin_diary on Twitter
@life100yearsago on Twitter