
Metadata in Media Projects

Howard Besser, NYU http://besser.tsoa.nyu.edu/howard/Talks

1 MOW Occupy 26/9/2012

Metadata in Media Projects

•  Ideas and methods for easing the metadata creation burden
–  Pushing Metadata-gathering upstream into the Production Process: Preserving Digital Public TV
–  Crowd-sourcing metadata
•  Access to Moving Image info online

Besser, Metadata in Media Projects, 9/24/12

Metadata/Cataloging doesn’t scale

•  20th century model required the cultural repository to spend an enormous amount of time having a trained professional catalog each resource when it entered the repository

•  This won’t scale when the repository receives hundreds (or thousands) of new works each day; most of our archives have huge bottlenecks of material inaccessible, while it waits to be cataloged

•  Since the early 1970s, precursors to Google have provided access to documents based upon words that already exist in the document, but these techniques don’t yet work well for media works

•  We need to find efficient methods to derive reliable metadata to avoid bottlenecks when a work enters a repository


One smart Method

•  Find a way to capture and maintain info about a work that is known early in its life-cycle

•  For a media work, this often means maintaining metadata from the production cycle that was considered ephemeral, and discarded


NDIIPP's Preserving Digital Public Television project

http://www.thirteen.org/ptvdigitalarchive/


Background & Goals: NYU/Public Television Project

•  2004-2010
•  $6 million project -- 50% from LC/NDIIPP
•  Marry asset management to preservation
•  Preserve a broad set of elements (including ancillary material)
•  Life-cycle mgmt (add metadata as soon as a clip comes in)
•  Establish a community of stakeholders, working together for preservation (stations, university, librarians, journalists, historians, producers, scholars, …)
•  Build an OAIS Repository
•  Explore appropriate file formats, wrappers, METS extensions
•  Develop sustainable business model


Project Partners

•  Thirteen/WNET & WGBH – Content and production expertise
–  The two largest television stations in the PBS system
–  Together produce largest percentage of national programs
–  Both have preservation Archives
•  Public Broadcasting Service – More content and network design
–  Distributes most of the national programming
–  Determines and keeps ‘broadcast’ versions
•  New York University – Facilitation and Resources
–  Leadership in designing digital libraries
–  Experience in process for setting standards
–  Has new Masters Program in Moving Image Archives


Public Television was analog for a long time


PBS Remote Tape Storage Facility


Computer-printed index of 1955-69 WNET Holdings at LC


Project activities include

•  Completing an inventory of at-risk materials to better quantify our holdings and prepare for selection

•  Reviewing best practices and most up-to-date developments in the field of video archiving

•  Conducting facilitated discussions on key topics to guide setting standards and policies

•  Establishing an Advisory Committee to assist with selection criteria

•  Ingesting sample materials and testing the repository

•  Presenting regular reports to public broadcasting and moving image archive community for ongoing feedback


Project Focus Areas

•  Appraisal and Selection – developing criteria and standards for what to preserve and by whom

•  File Formats and Packages – determining the best formats for our various uses, plus testing the suitability of file “packaging” for long term preservation

•  Metadata and Related Topics – specifying technical, descriptive and rights information

•  Repository Design – technical architecture, administrative policies and potential business models

•  Sharing Our Findings – Keeping the public broadcasting community involved and informed all along the way


By 2007, the Collaboration had already shown success

•  Helped produce standards that allow files to flow digitally from producer to PBS to stations

•  Studied various metadata schemes to zero in on what’s needed for preservation

•  Inventoried and located programs in non-regular locations

•  “Wrapper Roundtable” & aftermath (METS, MXF, AAF)

•  PBS & LC beginning to collaborate
•  The “American Archive”

Pushing Metadata Gathering Upstream: The Problem

TRADITIONALLY…

•  Very little metadata required for preservation accompanies an object to a repository.

•  Archives, libraries and other repositories must create (or re-create) most of the necessary metadata.

•  This requires many manual hours, and significant resources - both time and money.

IN THE DIGITAL WORLD…

•  This doesn’t scale up. Repositories will be unable to continue in this manner, as more metadata than ever is required.


But much of the necessary metadata has already been gathered during production

•  For each element/clip, production team usually notes source, date, place, people, and other descriptive info

•  But this is treated as internal information, and often various parts of the info are distributed among the personal notebooks of different production assistants

•  There is seldom a central location for this info, and the info is seldom turned over to the archive (which later tries to recreate much of it)


When the Archive tries to re-create this info, it is seldom successful

Producers know much more about the content of their productions than the archivists do. Archivists wanting accurate info must go back to the production staff (often years later) and start brainstorming to recover it.

“Once the (television) program is finished, it is passed on to the archive or library for safe keeping. Librarians will catalog and classify the content, possibly using a proxy copy, and enter the resulting informative metadata in their database so they can retrieve it in the future. However, rarely if ever is the metadata from the rest of the process passed onto them, except, perhaps, for the title, tape number, and basic technical information about recording formats. It has to be re-created, with all the associated risk of errors and lack of accuracy--not to mention the work and time involved.” - Cox, Tadic, and Mulder, Descriptive Metadata for Television (2006)


Similar issues w/other content types--E-Journals

•  “The necessary or additional metadata cannot be effectively and satisfactorily produced either as an afterthought post-production process on the publisher’s side or as a pre-ingest conversion activity at the archive’s end. Approaching e-archiving in this fashion leads to distribution delays and a more complex production and distribution scenario, with all the accompanying potential to introduce production delays and errors.”

- Yale University, YEA: The Yale Electronic Archive, One Year of Progress, 2002


We need to find ways to push metadata access upstream

•  Digital requires even more metadata than Analog
–  As the workflow becomes file-based, the need for robust and accurate metadata will become critical. File relationships, video codecs, bit rates, and rights information must be explicit, accurate, and immediately accessible. This will require a much deeper level of metadata than is currently captured in tape-based archives.
–  We can’t continue to supply this metadata at ingest; that won’t scale
•  Obtaining the necessary metadata at the end of production and broadcast life cycle is not feasible. Metadata will need to be systematically gathered during the production lifecycle and submitted with the programs to the preservation repository.


Examined Potential Points of Metadata Capture


Examined Potential Points for Metadata Capture

•  Much of the necessary metadata for preservation is already generated by the production unit, but discarded after their internal use. This needs to be captured throughout the workflow.

•  “Those in the production unit are the creators and have first hand knowledge of who, what, where, when, and why the content was created.” -- Mary Ide and Leah Weisse, WGBH Archivists.

Proposed Solutions…?

•  Preservation becoming a shared responsibility between content creators, distributors, curators, and preservationists.

•  Partnerships are needed to come to unified solutions.

•  Preservationists seek reliable metadata back upstream in the production workflow...


Workflow in Production Process-

•  Site Visits to productions
•  Interview Production staff
•  Diagrams of Workflow


Site Visits

This report is based on workflow studies at public television stations, between June and August, 2006 by NDIIPP Research Assistants Caroline Rubens, Paula Felix-Dider, and Kara Van Malssen. Workflow report completed in September 2006.

Additional insight was gained through metadata studies conducted by Mary Ide and Leah Weisse, Archivists at WGBH.


Site Visits

•  WGBH, Boston - June 19-20, 2006
–  Interviewees included Archive and Media Library staff, members of Frontline, NOVA, and American Experience production units, and legal dept staff


•  WNET/Thirteen, New York - July 18 and August 2, 2006
–  Interviewees included the Archivist, members of Broadway: The American Musical production unit, Broadcast Operations, and Broadcast Technology staff.
•  WNET/Thirteen, Washington DC - August 15, 2006
–  Religion and Ethics production unit staff interviewed.

Public Television Workflow

Basically similar to workflows in other fields…


For digital preservation, this shouldn’t be the only place for metadata in the preservation workflow! This is far too late in the cycle!

[Workflow diagram with METADATA callouts]

It also needs to be here!


WorldFocus

•  Nightly news program begun Oct 2008
•  We began working with Workflows six months before program began
•  Had ability to engineer metadata gathering into the creation/production process


WorldFocus

•  Systematized metadata gathered during pre-production and production, and made sure that this metadata stayed with each piece of media through the end of its lifecycle (incl info on source)

•  Create new light-weight metadata gathering tools (like time-date stamps and GPS chips in cameras)
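
The WorldFocus approach above can be sketched roughly as a record that is created at the shoot and only ever added to as the clip moves through production. This is a minimal illustration, not the project's actual tooling: the class name `ClipMetadata`, its fields, and the helper `missing_for_archive` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class ClipMetadata:
    """Hypothetical record that travels with one clip from shoot to archive."""
    clip_id: str
    source: str                                  # who supplied the footage
    recorded_at: datetime                        # camera time-date stamp
    gps: Optional[Tuple[float, float]] = None    # (latitude, longitude), if the camera has a GPS chip
    notes: List[Tuple[str, str]] = field(default_factory=list)  # (stage, text) pairs

    def annotate(self, stage: str, text: str) -> None:
        """Append a note from a production stage; earlier notes are never overwritten."""
        self.notes.append((stage, text))


def missing_for_archive(record: ClipMetadata) -> List[str]:
    """List the fields an archivist would otherwise have to re-create at ingest."""
    gaps = []
    if not record.source:
        gaps.append("source")
    if record.gps is None:
        gaps.append("gps")
    if not record.notes:
        gaps.append("notes")
    return gaps


# Illustrative values only.
clip = ClipMetadata(
    clip_id="clip-0001",
    source="field crew A",
    recorded_at=datetime(2008, 10, 6, 14, 30, tzinfo=timezone.utc),
    gps=(40.7128, -74.0060),
)
clip.annotate("pre-production", "Interview planned for segment 2")
clip.annotate("edit", "Used 00:00:10-00:00:45 in final cut")
print(missing_for_archive(clip))
```

The point of the append-only `notes` list is that nothing gathered upstream is treated as ephemeral; the archive receives whatever the production team already wrote down.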


Other Major Accomplishments-

•  PBCore 2.0
•  Appraisal/Selection/User Needs
•  Intellectual Property
•  Repository Design
•  Sustainability

Helped advance PBCore

•  2+ year collaborative effort
•  48 elements based on Dublin Core

–  Intellectual content of a media asset or resource -- 13 elements

–  IP-creation, creators, and usage limitations of a media asset or resource -- 7 elements

–  Instantiation (in either digital and/or analog) -- 28 elements

•  http://www.utah.edu/cpbmetadata/

Activist Archivists http://activist-archivists.org/

•  MIAP students and grads originally working on archiving media from the Occupy movement

•  Guidelines for recorders to make their works more easily preservable: make notes, turn on GPS, upload to service that doesn’t strip out metadata, keep raw footage, don’t compress

•  For meeting recordings, have them read a script at start of the recording


Crowd-sourcing Metadata-

•  Australian National Library newspaper digitization project

•  Steve
•  Waisda


Australian Newspapers http://www.dlib.org/dlib/march10/holley/03holley.html

•  Users correct OCR text for Natl Libr
•  In 1st year 6000 users corrected 7M lines of text
•  and added 200K tags
•  2008-09

Steve

•  Social tagging of online images of museum objects
•  Began in 2005
•  In first 2 years, users created 36,981 tags, comprising 11,944 terms
•  By the end of 2010 there had been 468,120 contributions
•  86% of the contributed terms were not previously in the museum-created records (vernacular language)


Steve http://tagger.steve.museum/


Waisda (What’s That?)

•  Launched in May 2009 by Netherlands Beeld en Geluid (Instit of Sound & Vision)

•  Players watch a tv program from the collection, then apply tags. They earn points when their tag matches a tag applied by a competing player. (validity)

•  In 1st 6 months, 2000 people played, tagging 600 videos with over 340,000 tags (40% of them matches)
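
The matching rule described above (a tag earns points when another player applies the same tag) might be sketched as follows. This is a simplified illustration under stated assumptions: the function name `score_tags`, the one-point reward, and the case-insensitive comparison are hypothetical, and any timing or other rules of the actual game are not covered by the slides and are ignored here.

```python
from collections import defaultdict


def score_tags(entries, points_per_match=1):
    """Score one video's tags.

    entries: list of (player, tag) pairs.
    Returns (scores, matched_tags), where a tag counts as matched
    when at least two different players entered it.
    """
    players_by_tag = defaultdict(set)
    for player, tag in entries:
        # Normalise so "Bicycle" and "bicycle" count as the same tag (an assumption).
        players_by_tag[tag.strip().lower()].add(player)

    scores = defaultdict(int)
    matched_tags = set()
    for tag, players in players_by_tag.items():
        if len(players) >= 2:
            matched_tags.add(tag)
            for player in players:
                scores[player] += points_per_match
    return dict(scores), matched_tags


# Illustrative data only.
entries = [("ann", "Bicycle"), ("ben", "bicycle"), ("ann", "canal"), ("cas", "tram")]
scores, matched = score_tags(entries)
print(scores)
print(sorted(matched))
```

Agreement between independent players serves as a rough validity check: a tag two people chose separately is less likely to be noise than a tag entered once.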


Waisda tagging screen


Waisda http://woordentikkertje.manbijthond.nl/


Waisda Research


•  The 3 major motivations for taggers were: competition, altruism, and attractiveness of content (popular Dutch reality show)

•  Only 5.8% of tags matched B&G terms
•  A B&G cataloger found that 73% of the tags of an averagely-tagged episode were useful

Access to Moving Image info online-

•  Concepts of Union Catalogs
•  Portals to archival Moving Image material
–  MIC (now defunct)
–  European Film Gateway
–  EU Screen
–  IMDB


Concepts of Union Catalogs/Portals --follow-up from Intro class Talk

•  A copy of every record can be sent to a central catalog (WorldCat, Melvyl)

•  Dublin Core (or similar) records can be generated from a more extensive existing record for each resource
–  …and sent by the local collection to the portal, or
–  …and harvested by the portal
•  OAI-PMH
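
Both steps above can be sketched in a few lines: reducing a richer local record to Dublin Core, and forming the request a portal would send to harvest such records. This is a rough illustration under stated assumptions: the local field names, the `CROSSWALK` table, and the base URL `https://archive.example.org/oai` are hypothetical. The Dublin Core element names and the OAI-PMH request parameters (`verb=ListRecords`, `metadataPrefix=oai_dc`, `set`) come from the respective standards.

```python
from urllib.parse import urlencode

# Hypothetical local field names mapped to Dublin Core element names.
CROSSWALK = {
    "program_title": "title",
    "producer": "creator",
    "broadcast_date": "date",
    "summary": "description",
    "tape_format": "format",
}


def to_dublin_core(local_record):
    """Keep only the fields that have a Dublin Core equivalent and a value."""
    dc_record = {}
    for local_field, dc_element in CROSSWALK.items():
        value = local_record.get(local_field)
        if value:
            dc_record[dc_element] = value
    return dc_record


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the OAI-PMH ListRecords request a harvesting portal would issue."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Illustrative data only.
local = {
    "program_title": "Sample Program",
    "producer": "Example Station",
    "broadcast_date": "2008-10-06",
    "camera_log": "internal only",   # no Dublin Core mapping, so it stays local
}
dc_record = to_dublin_core(local)
url = list_records_url("https://archive.example.org/oai")
print(dc_record)
print(url)
```

The richer local record is untouched; the portal only ever sees the simplified view, whether the collection pushes it or the portal harvests it.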

Portals

•  May display the resource (video, image), or just its metadata

•  May force the user to go to the host institution to view the resource

•  Why?


Moving Image Collections (MIC) http://gondolin.rutgers.edu/MIC/

•  AMIA & LC (technical development by Rutgers)

•  2002-04 NSDL/NSF funding
•  Mapping to/from MARC21, DC, MPEG7


MIC DC Elements


MIC Conceptual Design


MIC Mapping


MIC defunct for several years

•  No one really took serious ownership
•  It wasn’t a high priority for anyone
•  Parts of it were too technically complex for non-Rutgers institutions
•  Mapping utilities expected institutions to have familiarity with DC, MARC21, or MPEG7


European Film Gateway


European Film Gateway http://www.europeanfilmgateway.eu/


•  ACE (Association des Cinémathèques Européennes) & Europeana
•  Pilot 2008-11
•  WWI project 2012-14
–  20 European Film Archives
–  Plan to digitise 654 hours of film and 5,600 film-related documents
–  Access through EFG & Europeana

20 archives, providers of content

•  Deutsches Filminstitut – DIF, Frankfurt (co-ordinator)
•  Arhiva Nationala de Filme, Bucharest
•  Centre National du cinéma et de l’Image animée, Bois d’Arcy
•  Cinecittà Luce, Rome
•  Cinémathèque Royale de Belgique, Brussels
•  Cineteca di Bologna, Bologna
•  Det Danske Filminstitut, Copenhagen
•  Deutsche Kinemathek, Berlin
•  Estonian Film Archive, Tallinn
•  EYE Filminstituut Nederland, Amsterdam
•  Filmarchiv Austria, Vienna
•  Filmoteca Española, Madrid
•  Fondazione Cineteca Italiana, Milan
•  Imperial War Museum, London
•  IVAC, Valencia
•  Jugoslovenska Kinoteka, Belgrade
•  MaNDA, Budapest
•  Národní filmový archiv, Prague
•  Nasjonalbiblioteket, Oslo
•  Österreichisches Filmmuseum, Vienna

European Film Gateway Pilot Metadata Design

ENTITIES

•  3.1 AVCREATION
•  3.2 AVMANIFESTATION
•  3.3 NONAVCREATION
•  3.4 NONAVMANIFESTATION
•  3.5 ITEM
•  3.6 AGENT
•  3.7 EVENT
•  3.8 COLLECTION

COMMON ELEMENTS

•  4.1 IDENTIFIER
•  4.2 TITLE
•  4.3 RECORD SOURCE
•  4.4 FORMAT
•  4.5 KEYWORDS
•  4.6 DESCRIPTION
•  4.7 USER TAG
•  4.8 NOTE
•  4.9 SEGMENTATION
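
One way to picture the split between entities and common elements is a shared base type that each entity extends. This is only a rough sketch: the slide lists the names but not their types, cardinalities, or which entities carry which elements, so the Python types, the optional and required choices, and the `creation_id` link below are all assumptions, not the EFG schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommonElements:
    """The nine common elements (4.1-4.9); types and optionality are assumed."""
    identifier: str
    title: str
    record_source: str
    format: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    description: Optional[str] = None
    user_tags: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    segmentation: Optional[str] = None


@dataclass
class AVCreation(CommonElements):
    """Entity 3.1: an audiovisual work."""


@dataclass
class AVManifestation(CommonElements):
    """Entity 3.2: one version of an audiovisual work."""
    creation_id: Optional[str] = None   # hypothetical link back to an AVCreation


# Illustrative values only.
work = AVCreation(identifier="efg:0001", title="Sample Film", record_source="Example Archive")
version = AVManifestation(
    identifier="efg:0001-m1",
    title="Sample Film (restored)",
    record_source="Example Archive",
    creation_id=work.identifier,
)
version.user_tags.append("restoration")
print(version.creation_id, version.user_tags)
```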


European Film Gateway Current Metadata Plans

http://www.europeanfilmgateway.eu/

European Film Gateway Metadata Plans

•  OAI-PMH exports: partner archive upgrades for those that don’t already support OAI-PMH
•  Extension of pre-existing EFG common schema and definition of mappings
•  Extension of the EFG aggregation system to deliver metadata from the archives to Europeana, EFG portal and Virtual exhibition tools
•  Management of harvesting and transformation activities necessary to grow and curate the information space


EU Screen http://www.euscreen.eu/


EU Screen

•  28 Partners from 28 European countries
•  Focus on clips and programs related to 14 historical topics


Next group of slides

•  Courtesy of Marco Rendina, Cinecittà Luce, Oct 18, 2010


EU Screen


IMDB

•  If IMDB already has detailed credit info, why should we derive all that info again ourselves? Can’t we just link to an IMDB record?
•  Who contributes?
•  How reliable? Can we evaluate reliability for a particular record?


Metadata in Media Projects

•  http://besser.tsoa.nyu.edu/howard/Talks/
•  http://www.thirteen.org/ptvdigitalarchive/
•  http://www.europeanfilmgateway.eu/
•  http://www.euscreen.eu/
