
Proposal to NEH Division of Preservation & Access

A Timeline-Based Approach to the Study and Preservation of Music Performance Material

PD: Dr. Donald Byrd, School of Informatics & Jacobs School of Music

Co-PD: Prof. Beth Plale, School of Informatics & Pervasive Technology Institute

Other participants: Prof. Konrad Strauss, Jacobs School of Music

Prof. Vince Liotta, Jacobs School of Music

Will Cowan, Institute of Digital Arts & Humanities

Indiana University, Bloomington

Submitted 29 July 2009


Abstract

The study of classical music and classical music performances involves intensive examination of previous performances, but other materials can significantly enhance the research experience. There is almost always a "script" for a performance, in the form of music notation. It is standard procedure in classical music for scholars and students to compare recordings of concerts to the published notation. Other kinds of materials exist as well: recordings and videos of rehearsals, and multiple forms of the notated music. Vocal music brings in lyrics and translations. Musical theatre, including opera and ballet, adds a great deal more, including designs and photographs of costumes, sets, and staging. We posit that the study of classical music and its performances can benefit from software that allows musicians to organize all these materials together, aligned along a timeline. Further, users can use the same software to preserve the materials they have gathered and annotated, and share these results more easily.

We propose a two-year project, a collaboration between musicians (Professors Konrad Strauss and Vincent Liotta), music informaticians (Dr. Donald Byrd), and computer scientists (Professor Beth Plale), to construct a tool for organizing, annotating, and preserving the varied information and digital media that make up musical performances, including those of unusual complexity such as operas. The project, budgeted slightly under $350,000, will involve, in addition to the senior staff, two graduate students, one from the Indiana University School of Music and one from the IU School of Informatics and Computing, who will gain interdisciplinary knowledge from the project. Development will proceed in stages: in the first year we focus on issues of metadata and selected data formats and materials; in the second year we expand the types of media we support. In both years, the target of examination will be several productions by the IU Opera Theatre. The two-year project will culminate in a user study that will seek feedback from users and incorporate that feedback into the tool. We envision the proposed effort forming the basis of a more general tool, a Phase 2 effort that would be a follow-on to successful completion of Phase 1 (the work proposed here).

The anticipated users of the tool are scholars studying all aspects of music-based performance art: set design, staging, lighting, and dance as well as the music itself. It will work for any kind of music used in a theatrical setting, including rock and musical shows, and it will also be suitable for presentations, e.g., in the classroom or at conferences.

Specifically, the proposed General Music Workbench ("GMW") is a tool for collecting, assimilating, visualizing, playing, and editing music-related and music-theatre-related performance material. The tool will handle several types of media (video, separate audio, still photos, music notation in scanned-image form, and text) and will support automatic capture as well as manual editing of the metadata associated with the different media. To allow the user to work with the media in their temporal context, it will have a timeline-based user interface. Once media are synchronized and metadata prepared to the user's satisfaction, he/she will be able to preserve the collected objects. To make this possible, we will develop a common metadata representation suitable for long-term preservation in a digital library repository. Building on existing software developed at Indiana University, the GMW will be designed with extensibility in mind. Phase 2 of the project extends the functionality of this proposal with support for any temporal phenomenon, and for accessing the tool and data anywhere in the world by hosting key components of the tool on the cloud.


Vision

The world is full of complex temporal (time-based) phenomena, both natural and cultural: to name a few of each, animal movement, movies, football games, fireworks shows, weather, political crises, illnesses, wars, development of individuals, development of species, development of geographic features, and, of course, music. These things are always difficult to study or—for performances of all kinds, whether artistic or athletic—to create, because they don't "sit still", so to speak. The brilliant jazz woodwind player Eric Dolphy once said, "When you hear music, after it's over, it's gone, in the air. You can never capture it again." (Dolphy 1964) It's hard to disagree, but if you really want to think about what you heard, to study it in detail and, especially, to compare it to something else, you need some way to capture as much as you can. This need extends to all aspects of the art, not just to studying performance or other analytic purposes. It also extends to every complex temporal phenomenon we know of.

The usual technique for making these phenomena easier to handle is visualization. The well-known anthology Readings in Information Visualization (Card et al 1999) is subtitled "Using Vision to Think"; the editors make it clear they intend the idea to be taken seriously, and with good reason. Visualizations on paper have been around for many centuries (music notation and timelines are two examples), and computer-based visualizations, especially interactive visualizations[i], can be extremely helpful. But, to our knowledge, every visualization system to date is either fundamentally limited to a specific domain or range of related domains, or fundamentally limited in what it can do within the domain(s). This is unfortunate because the synergy or leverage possible with a more general approach is enormous: three reasons are generalized interactive timelines, integrated visualization/analysis systems, and frequency-domain viewers. It's clear that most scholars who deal heavily with complex temporal phenomena are unaware of the power of any of these technologies. SIMILE Timeline/Timeplot/Timemap, to name one recent interactive timeline system, has already been applied in many disciplines. Some years after Shneiderman and others first mentioned the potential of integrated visualization/analysis systems, they have started to achieve wide application, even for multimedia (Yu 2008). As for examining their subject material in the frequency (or the time/frequency) domain, surely most humanists have no idea that anything other than the time domain exists. To use a simple analogy, they are at best reinventing the wheel and building simple carts; more often, they are dragging sledges. Nor is it implausible that frequency-domain methods are of value to humanists. Plausible claims of periodic changes in one field or another are not hard to find; but regardless of the scholars' mathematical sophistication, a shortage of data has always ensured that arguments for temporal relationships could only be subjective. A rare exception is the correlation between congressional elections and representatives' use of franked mail, cited by Tufte (1983, p. 37). But Nicholas Cook commented a few years ago on the rapidly-changing situation in musicology as significant musical databases become available, making large-scale quantitative research possible for the first time (Cook 2005).

We propose to build the General Music Workbench ("GMW"), a tool for collecting, assimilating, visualizing, "playing", and editing music-related and music-theatre-related performance material. The GMW will be designed in such a way that it can be extended (given additional funding) to perform the same functions, plus comparison, for any temporal phenomenon.

[i] The term is not yet very well-known; by Wikipedia's definition, it means a visualization that satisfies two criteria:

* Human input: control of some aspect of the visual representation of information, or of the information being represented, must be available to a human; and

* Response time: changes made by the human must be incorporated into the visualization in a timely manner.


This is significant because many fields have needs similar to music's, an observation that led the creators of an early music-representation standard to spin off HyTime (ISO/IEC 10744:1992), a very general time representation; see Byrd (2009). But what needs are we talking about? What fields can really benefit from the synergy of "not reinventing the wheel"? For temporal phenomena, the features that we believe are relevant are:

1. Complex enough that a number of ways exist of "looking" at the phenomenon, and (in general) no one way can capture everything significant

2. People often want to compare two or more instances of the phenomenon

3. (less important) Specialized graphical notation(s) are widely used for symbolic form

We believe it is self-evident that many fields fulfill at least the first two criteria.

We outline here the roadmap for tool evolution. We envision a Phase 2 for this project that develops an extension of the Phase 1 tool capable of handling any temporal phenomena. The breakdown is shown in Table 1.

Phase | Name | Domain knowledge | Analysis/presentation modules
1-Yr 1 | GMW | Built-in; audio & video only | Built-in
1-Yr 2 | GMW | Built-in; audio, video, & symbolic music (for notation) | Plug-ins
2 | GTW | Modular; initially audio, video, & symbolic music | Plug-ins; independent programs via Web services

Table 1. Phasing Plan: Phase 2 extends the General Music Workbench (GMW) to support temporal phenomena beyond music; we refer to the result as the General Temporal Workbench (GTW).

We will build the proposed tool on existing software developed at Indiana University, particularly the EVIA Annotator's Workbench (Dunn & Cowan 2005), the XMC Cat metadata catalog (Jensen and Plale 2008), and to a lesser extent Variations (Dunn et al 2006). The tool will handle several types of media (video, separate audio, still photos, text notes, and music notation in scanned-image form) and will support automatic capture as well as manual editing of the metadata associated with the different media. To allow the user to work with the media synchronously, i.e., in their proper temporal context, it will have a timeline-based user interface. Once media are time-synchronized and metadata prepared to the user's satisfaction, he/she will be able to identify media belonging to a single logical object and choose to preserve the object as a single unit. To make all of this possible, we will develop a common representation suitable for long-term preservation in a digital library repository.

The software architecture of a tool is critical to its ease of use and extensibility. A pluggable architecture, such as we propose here, allows programmers to extend a tool by "plugging in" tools for visualizing, analyzing, and editing temporal phenomena. For most applications and users that deal with time-based phenomena, a user interface based on some kind of timeline is both intuitive and powerful, and the GTW will offer strong support for a very wide variety of timelines. The time is right for such a system because of recent developments in infrastructure both general (e.g., the power of modern personal computers, infrastructures for visualization) and specific (e.g., advances in signal processing and automatic synchronization, emerging metadata standards, multimedia data-mining software). Figure 3[ii] is a mockup of what the GTW might look like to the user, showing a possible configuration for studying three productions of Mozart's opera The Magic Flute. Note the timeline in the middle, labeled "Magic Flute – Overview".

[ii] Found at end of narrative.


With minor variations, a setup like this could be useful to conductors, singers, and stage and lighting designers, for musical shows of all kinds. Most relevant to us, it could be useful to musicologists, in some cases with different kinds of notation for the music. (A "playable" Flash demo of this scenario is available at www.informatics.indiana.edu/donbyrd/Spc/GTWDemos/QueenOfNightDemo.html.)

Related Work. Interactive visualizations based on timelines are in use for a wide range of purposes. LifeLines is a pioneering interactive visualization for personal history information (Plaisant et al 1996); it was originally designed for use in juvenile justice, but has also been applied to medical records, as in Figure 4. SIMILE Timeline is a much more recent development. It is a popular "toolkit" implemented as a set of JavaScript APIs for creating timelines; Figure 5 shows a SIMILE version of the assassination of President Kennedy (available "live" at http://simile.mit.edu/timeline/examples/jfk/jfk.html). A related and compatible toolkit, Timeplot, creates graphs of numeric values as a function of time. Note that both LifeLines and SIMILE can display multiple bands, i.e., parallel coordinated display areas, with user control of what goes in each. In the JFK example, the two bands merely show the same material at different zoom levels, with labels omitted in the "overview" (the lower band). The six bands in the LifeLines example cover different facets, i.e., in library-science terminology, aspects of the situation: they are labeled "Notes", "Hosps.", "Tests", "Meds.", "Others", and "Immun.".

Neither LifeLines nor SIMILE timelines can be "played" (though items on the latter can link to audio or video files), but genuinely playable timelines are not too unusual. The New York Times Business section website for 5 July 2009 has an elaborate playable visualization of features of the U.S. economy in recent decades, with a conventional timeline coordinated with a parametric-curve display of important parameters. (Cf. http://www.nytimes.com/interactive/2009/07/02/business/economy/20090705-cycles-graphic.html.) Figure 6 documents the Salem Witchcraft accusations with a playable timeline integrated with a GIS-like display whose layers can be turned on or off as it plays. (This is at http://www2.iath.virginia.edu/salem/bcr/salem/salem.html.) Playback spends only a few seconds on each day, skipping over days when nothing happened. Playable timelines can also be found in commercial products like Avid Media Composer, arguably the most popular editing system in the film and television industry.

Finally, Figure 7 shows an ethnomusicology project in the EVIA Annotator’s Workbench (henceforth “AWB”), and Figure 8 is the Chopin G-minor Ballade as seen in the Variations system’s timeliner. These last two programs are those the current proposal intends to build on. The Variations timeliner works with individual audio files, for which it allows segmenting and adding simple annotation; it is primarily intended for use in teaching. AWB works with projects involving multiple video files, and it supports segmenting and adding library/archive-quality annotation and other metadata.

But given today's technology, even the standard features of each of these interactive visualizations and, no doubt, hundreds of others had to be implemented from scratch, which is time-consuming and costly: an enormous amount of work "reinventing the wheel". In addition, while SIMILE's toolkit design gives it a great deal of flexibility, the penalty in ease of use is severe: very few scholars are able to use it without technical help. Drawing on work from IU's existing projects, especially the AWB and the Variations system, we plan to start drawing the salient features together into a single, highly-flexible "shell", allowing different domains to reuse a basic framework without having to implement anything that's not specific to their domain, yet ending up with a system that, once configured for an application, can be very user-friendly.

However, a single system can work in a huge variety of disciplines only if the system design is very careful to avoid any unnecessary limitations, and that requires a highly modular architecture. Even if it can work, there is no magic. Creating domain-knowledge modules (described in the next section) and discipline-specific plug-ins will be far beyond the capability of most users, so support from user communities will be essential. And that is likely to happen only if the framework is as easy as possible for a programmer to work with. For example, it needs excellent support for user-interface features like coordinated views at very different magnifications, and a representation of time that is exceptionally flexible, e.g., in terms of range of resolution, range of precision, and support for precision in both humanities ("1853(?)", "fl. 5th century BC", etc.) and scientific forms. We intend to provide these features, and to build the GTW as a rebrandable shell along the lines of CIShell (Herr et al 2007).

Significance and Impact

The current proposal is targeted at delivering a usable and useful tool for music performance researchers, with a design that can be extended for a wide range of other humanities applications. Musical performances, particularly performances of operas, are an excellent starting point for several reasons. Unique productions are revived occasionally (for example, Hair and West Side Story are being revived on Broadway), and production materials are invaluable in such cases.

First, the facilities for studying opera performances available at IUB are exceptional; see "Indiana University's Background" below. More fundamentally, musical performances can be very complex events. Many aspects of studying music can benefit from examination of previous performances; but, as we have argued, music is inherently too complex for recordings alone, even video recordings, to be adequate. Besides, for classical music, there's almost always a pre-existing visualization, an abstract recipe for the performance or "script", in the form of music notation. With classical music, experts want to judge a performance against the composer's description, and it is standard procedure for scholars as well as music students to compare recordings of concerts to the published notation. But several other kinds of materials are relevant and can be valuable. In fact, scholarly study of opera used to concentrate heavily on the music, but these days much more attention is given to looking at it in context, considering the non-musical material for a given production as well (Philip Gossett, personal communication, July 2009). The complete list for opera and many other kinds of music theatre includes a host of materials, each of which we refer to as a "facet" in Table 2.

Facet | Description
1 | Videos of performances and rehearsals
2 | Audio recordings of performances and rehearsals
3 | Notated music, e.g., full scores, vocal scores, and parts, perhaps from multiple publishers, perhaps with markings by the conductor and performers
4 | Drawings and paintings of designs for, and photographs of, costumes, sets, and staging
5 | Correspondence among principals (composer, librettist, conductor, designer, etc.)
6 | Concert programs and program notes
7 | The production design instructions (in Italian, "disposizione scenica")
8 | 3D visualizations of set models
9 | Libretti and translations of them
10 | Miscellaneous ephemera: production photos, production notes, etc.
11 | Physical objects: media containing audio and video recordings, concert programs & program notes, actual three-dimensional costumes and sets, etc.

Table 2. Opera involves numerous types of materials.


The physical objects (facet 11) are largely irrelevant to our software-development project and will not be mentioned again. Many of the other materials are of sufficient interest to be published or kept in library or museum collections. There are numerous books of letters by famous musicians, a large fraction of which are usually to collaborators; published libretti are commonplace; and a number of collections of drawings and paintings of operatic sets exist, e.g., one of work by C. Mario Cristini at IU's Lilly Library. But no technology exists now for systematically preserving more than a small fraction of these materials, let alone for keeping them in their proper temporal relationships and letting users interact with them in those relationships, and they generally are not preserved consistently. As a case in point, IU commissioned the opera Our Town from the important American composer Ned Rorem, and we hope to use material from its 2006 premiere as one of our test cases. However, on hearing that idea, the production designer commented that he didn't know how much of the ephemera and other ancillary material it would be possible to locate—and this after only three years (C. David Higgins, personal communication, June 2009).

Of the material in Table 2, Phase 1 of the project will handle at least facets 1 to 5; if time permits, it will handle some of facets 6 to 10 as well. Phase 2, if funded, will extend the workbench to include the remaining facets plus richer analysis tools, such as are shown in part in Level III of Figure 1. Building on existing software developed at Indiana University, especially the EVIA Annotator's Workbench and to a lesser extent Variations, the first version of the GMW will handle several types of media (video, separate audio, still photos, music notation in scanned (page-image) form, and text notes) in forms suitable for opera-related media[iii], and will have extensive support for metadata. With a timeline-based user interface, it will allow the user to organize, edit, and play back media synchronously. Once media are time-synchronized to the user's satisfaction, he/she can identify a collection of media for preservation as a single unit.

[iii] For example, paintings of set designs may have very odd aspect ratios; 11 by 64 inches is not unknown.

Why music notation only in scanned form? Little can be done with scanned text that has not been recognized by an OCR program; similarly, there are severe limits to what can be done with music notation in scanned form. In view of the importance of notation, why don’t we plan to support it in a symbolic form? Because doing anything at all with “real” notation for complex music, even reading in and displaying it, is extremely difficult (Byrd 1984, Byrd 1994). The best bet might be via plug-ins, but even the API required is far from simple. But Variations already handles display of scanned-in music, with the page images in their own window, and coordinates it with audio, albeit with very coarse resolution: it can accept metadata saying what measures are on each page and, in the audio, when each measure starts.
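To make the page-image coordination concrete, here is a minimal sketch, in Java, of the kind of measure-level metadata just described: which scanned page each measure appears on, and when it starts in the audio. The class and field names are our own invention for illustration, not the actual Variations data model.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative measure-level synchronization metadata: for each measure,
    // the scanned page containing it and its start time in the recording.
    public class MeasureMap {
        public static class MeasureEntry {
            final int measureNumber;   // 1-based measure number in the score
            final int pageIndex;       // index of the page image containing it
            final long audioOffsetMs;  // start of the measure in the audio
            MeasureEntry(int measureNumber, int pageIndex, long audioOffsetMs) {
                this.measureNumber = measureNumber;
                this.pageIndex = pageIndex;
                this.audioOffsetMs = audioOffsetMs;
            }
        }

        private final List<MeasureEntry> entries = new ArrayList<>();

        // Entries must be added in score (and hence audio) order.
        public void add(int measureNumber, int pageIndex, long audioOffsetMs) {
            entries.add(new MeasureEntry(measureNumber, pageIndex, audioOffsetMs));
        }

        // Given a playback position, return the page to display: the page of
        // the last measure whose start time is <= the position.
        public int pageForAudioPosition(long positionMs) {
            int page = 0;
            for (MeasureEntry e : entries) {
                if (e.audioOffsetMs <= positionMs) page = e.pageIndex;
                else break;
            }
            return page;
        }
    }

Even this coarse mapping is enough to turn pages automatically during playback, which is exactly the level of coordination Variations already provides.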

Usability. While preserving the media and the metadata to put it in context is important, it is also important to provide access in a way music scholars can easily use. Design and evaluation of the user interface of our system is a significant part of our plans.

Software Architecture

Client Architecture

We believe tools designed and implemented according to the principles described above could produce interactive visualizations similar to any of the examples we have mentioned. Design work so far strongly suggests that the best software architecture consists of three levels. From broadest applicability to narrowest, they are:


I. A very general framework for presenting (in both visible and audible forms), comparing, and editing temporal phenomena. This is completely independent of domain; it's concerned mostly with the user interface, especially window and pane management, plus primitives for coordination among windows and panes and synchronization with audio. It also handles task-specific configuration files (see below for explanation), generic metadata (including controlled vocabularies), and I/O functions, and it has an API for file I/O, symbolic alignment (e.g., via the Needleman-Wunsch generalization of Levenshtein distance; see the sketch after this list), etc.

II. Generic domain-specific modules for functions like file I/O; domain-specific metadata; converting between representations; support for low-level modules for the domain, most likely via Web Services or plug-in API (henceforth referred to as “plug-ins”, with quotation marks around the word); and automatically (perhaps in a loose sense of “automatic”) aligning representations of temporal phenomena, either of similar or dissimilar types. For music purposes, this means synchronizing some combination of one or more recordings and one or more music-notation “scores”. Note that this automatically includes syncing videos based on their audio tracks. Note also that the alignment might be done ahead of time by a completely separate program that produces alignment metadata the GTW uses.

III. Lower-level modules for handling the content in some form, either analysis (feature recognition, data mining, etc.) or presentation (not necessarily visual), implemented via “plug-ins” (or Web Services). For music, music notation—specifically, conventional Western music notation—is perhaps the most important form as well as by far the most complex.
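As a concrete illustration of the Level I symbolic-alignment primitive, the sketch below implements the classic dynamic program in Java in its Levenshtein (unit-cost) special case, over integer symbol sequences. An actual module would generalize the costs to domain-specific similarity (e.g., pitch or chroma distance for aligning two versions of a score), which is what the Needleman-Wunsch generalization mentioned above allows.

    // Minimal edit-distance dynamic program; Needleman-Wunsch generalizes this
    // by replacing the unit gap/mismatch costs with domain-specific scores.
    public class Align {
        public static int alignmentCost(int[] a, int[] b) {
            int[][] d = new int[a.length + 1][b.length + 1];
            for (int i = 0; i <= a.length; i++) d[i][0] = i;  // gaps in b
            for (int j = 0; j <= b.length; j++) d[0][j] = j;  // gaps in a
            for (int i = 1; i <= a.length; i++) {
                for (int j = 1; j <= b.length; j++) {
                    int subst = d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                    int del = d[i - 1][j] + 1;
                    int ins = d[i][j - 1] + 1;
                    d[i][j] = Math.min(subst, Math.min(del, ins));
                }
            }
            return d[a.length][b.length];
        }
    }

Tracing back through the cost matrix d (not shown) yields the alignment itself, i.e., which symbols of one sequence correspond to which symbols of the other; that correspondence is what synchronization metadata is built from.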

To maximize flexibility, the architecture has as little as possible to do with content. But one type of content it must impose requirements on is time data; to allow synchronizing information belonging to different domain modules, obviously some conventions are necessary. Here is a diagram of the modules in a fairly complex configuration of the system suitable for vocal music, perhaps opera. Blocks with heavy outlines are built-in, and would be present in any configuration.

Figure 1. Client configuration for vocal music (includes Phase 1 and 2 items)

While many level III (presentation) modules will be highly domain-specific, like music notation, many will be more general. This is important because the more general-purpose modules the tool can have, the more effective its factoring and the greater its advantage over domain-specific designs. An extreme case is a timeliner, i.e., a module to create and display a visualization of the temporal (or other interval-level-data) basis of the phenomenon. Almost any application of the tool is likely to benefit from a timeliner, if only to allow segmenting and labeling as in the Variations timeliner (Figure 8). The spectrogram, the standard time/frequency-domain visualization, is another example.

The User’s View. The description above is from the software engineer’s viewpoint; as always, the user’s view is quite different. In addition to sets of modules, as shown in the above diagrams, any specific task might involve a complicated layout of windows and controls on the screen, and it might want to initialize the modules in ways unrelated to the user interface. The “task-specific configuration files” mentioned above should include both views, i.e., they should describe the modules needed and their settings, and the initial screen layout. They should also be able to say something about zoom levels and visibility, metadata vocabularies, and quite a bit more. For example, the Annotator’s Workbench’s configuration feature has been very successful; it has a “conf” directory, with features distributed among several XML files (see below for details).
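A minimal sketch of reading such a task-specific configuration file follows, assuming an invented XML vocabulary (<window> elements with module, position, and zoom attributes). The AWB's actual "conf" schema differs; this illustrates only the mechanism of driving module creation and screen layout from configuration rather than code.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // Hypothetical task-configuration reader: one <window> element per module
    // instance, carrying the module name, initial position, and zoom level.
    public class TaskConfig {
        public static void load(File confFile) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(confFile);
            NodeList windows = doc.getElementsByTagName("window");
            for (int i = 0; i < windows.getLength(); i++) {
                Element w = (Element) windows.item(i);
                String module = w.getAttribute("module"); // e.g., "timeliner"
                int x = Integer.parseInt(w.getAttribute("x"));
                int y = Integer.parseInt(w.getAttribute("y"));
                double zoom = Double.parseDouble(w.getAttribute("zoom"));
                openWindow(module, x, y, zoom);
            }
        }
        // Stub standing in for module instantiation and window placement.
        private static void openWindow(String module, int x, int y, double zoom) {
            System.out.printf("open %s at (%d,%d), zoom %.2f%n", module, x, y, zoom);
        }
    }

The point of the design, as with the AWB's conf directory, is that a new workflow requires only a new configuration file, not new code.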

Client Architecture and the Phasing Plan. As the phasing plan table above indicates, the three-level architecture described above will not be implemented until Phase 2; hence the limitation of earlier versions of the client to music-related areas.

Distributed Architecture: metadata collection, data management, and curation

The user's view of our proposed system is as a desktop tool, with the architecture just described; however, the overall system architecture is distributed, organized in a client-server relationship. The desktop tool, called the General Music Workbench Desktop Tool in Figure 2, is the client; the server pieces are the music workbench metadata service and the preservation repository. These three components communicate with one another by passing messages, i.e., using a message-passing protocol. The metadata service manages the musical content on behalf of the client tool. For example, suppose a musicologist is assembling material for a particular operatic performance in the GMW tool. She sets the work aside, then later revisits it to add more material. Her current working set of materials is retrieved from the metadata service and displayed within the client. As the musicologist uploads new content into the tool, the software listeners (see Figure 2) feed information about the new data object, a photograph for instance, back to the metadata service, where a metadata record is being assembled for the photo. When she is happy with the contents of the assembled material, she pushes a button to preserve the material as a single collection. At that time, the metadata service prepares a preservation object that is handed off to the preservation service for long-term preservation and broad dissemination of the finished product. Until the data products are sent to the preservation service, the musicologist can be assured that the data are private to her (or a small group that she approves in advance). This separation between the internal metadata service and the preservation service allows the GMW tool to work with any digital library repository back-end (e.g., DSpace, Fedora, or a future version that combines the two).
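The sketch below illustrates, in Java, the kind of message a client-side listener might send to the metadata service when new content is uploaded. The REST endpoint path and the XML payload are hypothetical; the proposal commits only to SOAP and/or REST as the message-passing protocols.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Hypothetical listener-to-metadata-service notification: POST a small
    // XML description of a newly uploaded object so the service can start
    // assembling its metadata record.
    public class ListenerClient {
        public static void notifyNewObject(String serviceBase, String objectUri,
                                           String mediaType) throws Exception {
            String xml = "<dataObject uri='" + objectUri + "' mediaType='"
                       + mediaType + "'/>";
            URL url = new URL(serviceBase + "/objects");  // invented resource path
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/xml");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            int code = conn.getResponseCode();
            if (code != 200 && code != 201) {
                throw new RuntimeException("metadata service refused: " + code);
            }
        }
    }

Because the notification carries only a URI and lightweight attributes, the client stays thin; the service itself opens the file later to harvest richer metadata, as described under the add operation below.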

The protocols used in the distributed architecture are standard, and implementations are publicly available as open source. We propose to use one or both of the SOAP and REST protocols for message passing, within an Axis web services container. These are shown in Figure 2 as the "web service communication layer", or just "web service layer". The layer directly beneath the GMW client in the figure is an interoperability layer that allows Eclipse-based tools to communicate with Axis services. In Phase 2 we envision additional graphical and analytical tool support in the music workbench user tool; use of the Eclipse-based framework during Phase 1 will position the tool to support that extension. We will implement software listeners as the mechanism for real-time capture of metadata. Listeners have been successfully deployed in earlier projects including the LEAD project (Plale et al. 2008; Droegemeier et al. 2005) and the Karma provenance collection project (Simmhan 2008), both projects in which one of us (Plale) has had a leadership role.

Figure 2. The client-server architecture of the music workbench gives it the flexibility to serve a single user in an office, or many users simultaneously, even if some have traveled away from the base location.

We propose to use the METS standard for representing metadata for long-term preservation. METS is a popular solution for storing media content in a digital library repository, and IU's Digital Library Program has considerable experience with it. We will also evaluate METS for use as the internal metadata format within the metadata service. By design, digital library formats like METS are very general and require extension of some form through specialized schemas, for instance, in order to represent application-specific attributes needed for search and retrieval. We will use XML schemas, a standard technology, to store metadata and pass it around the system.
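As a concrete illustration of the wrapping step, the sketch below builds a minimal METS document for a single media file using the standard Java DOM API: a fileSec pointing at the file, plus the mandatory structMap. The element names follow the METS schema, but the overall shape is greatly simplified; a real preservation object would also carry descriptive (dmdSec) and administrative (amdSec) sections generated from the internal metadata records.

    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    // Simplified METS wrapper for one media file: fileSec + mandatory structMap.
    public class MetsSketch {
        static final String METS = "http://www.loc.gov/METS/";
        static final String XLINK = "http://www.w3.org/1999/xlink";

        public static Document wrap(String fileId, String href, String mimeType)
                throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element mets = doc.createElementNS(METS, "mets");
            doc.appendChild(mets);

            Element fileSec = doc.createElementNS(METS, "fileSec");
            Element fileGrp = doc.createElementNS(METS, "fileGrp");
            Element file = doc.createElementNS(METS, "file");
            file.setAttribute("ID", fileId);
            file.setAttribute("MIMETYPE", mimeType);
            Element flocat = doc.createElementNS(METS, "FLocat");
            flocat.setAttribute("LOCTYPE", "URL");
            flocat.setAttributeNS(XLINK, "xlink:href", href);
            file.appendChild(flocat);
            fileGrp.appendChild(file);
            fileSec.appendChild(fileGrp);
            mets.appendChild(fileSec);

            // METS requires at least one structMap; here a single div
            // whose file pointer references the wrapped file.
            Element structMap = doc.createElementNS(METS, "structMap");
            Element div = doc.createElementNS(METS, "div");
            Element fptr = doc.createElementNS(METS, "fptr");
            fptr.setAttribute("FILEID", fileId);
            div.appendChild(fptr);
            structMap.appendChild(div);
            mets.appendChild(structMap);
            return doc;
        }
    }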

create (new collection): create a new collection.

add (data object): add a data object to the metadata catalog.

group (data object, collection): insert a data object into an existing collection.

delete (data object | collection | object/collection): remove a data object from the catalog, remove a collection, or remove a data object from a collection.

modify_properties (data object | collection): change metadata attributes in a data object or collection metadata record.

search (data object | collection; temporal attributes; other attributes): search for and retrieve objects or collections on attributes such as temporal attributes.

preserve (collection): commit a collection to long-term preservation.

Table 3. Operations of the metadata catalog support the ability to gather metadata, organize data objects into groups, and send collections off for long-term preservation.

The GMW client is discussed in detail elsewhere in this proposal; we provide additional detail on the back-end services here, with particular focus on the metadata service. Designing the tool as a client-server application gives it maximum flexibility. The whole tool could be installed and run on a single workstation, on the desk of the performance curator at the music school, for instance, where it serves a single user and everything executes and is stored on the local host. Should broader use of the tool be needed, it can be configured to run one instance of the metadata server supporting multiple simultaneously-running instances of the client, storing results off to an institution's digital library repository. Still another configuration could utilize cloud computing for hosting the metadata server and the data objects and collections. This would enable efficient and timely access to metadata and data objects wherever the user happens to be working. As long as the user has the client installed on his or her laptop and an Internet connection, the client will connect with the server and be able to retrieve files quickly. We will implement the single-user, single-desktop configuration, and explore tradeoffs between the other two configurations, selecting the best for adoption.

The metadata service manages what we refer to as data objects and data collections. A data object might be a video of an opera performance in DV format, a Broadcast WAV audio file, or a .png image of a set drawing. A data collection might consist of all the digital materials associated with a single operatic production. The metadata service supports the set of operations shown in Table 3 and explained in more detail below. Collections and objects in a collection are identified by globally unique URIs that allow a file or collection to be located regardless of the computer system on which it resides. All operations supported by the metadata catalog are invoked by the client tool and return results to the tool in XML. Not shown are operations for administrative management of the catalog, such as adding a new user. In more detail, the metadata service operations are:

create (new collection) creates a new collection on behalf of a user. Two important attributes of a collection are a bounded time interval over the collection and the temporal (statistical) distribution of its objects. The temporal distribution is used in retrieval.

add (data object) adds a data object to the catalog. It accepts a base data product (e.g., a WAV file) and any other attributes the software listeners "know" about the product, such as its format type and any contextual information the listeners collected. It creates an internal metadata representation for the object. The add operation will attempt to open the file and read header information to gather richer metadata.

group (data object, collection) associates a data object with an existing collection. In other words, a call to "group" adds the data object to the collection. A data object may belong to zero, one, or more collections.

delete (data object | collection | object/collection) removes a data object from the catalog, removes a collection, or removes a data object from a collection. When a data object is removed from a collection, it remains in the catalog, but it is no longer a member of that particular collection. The remaining rules needed to support deletes are too low-level to be discussed here.

modify_properties (data object | collection) updates metadata attributes in a data object or collection metadata record. The modify call allows modifications to be made to the metadata record of a data object or a collection.

search (data object | collection; temporal attributes; other attributes) searches for and retrieves objects or collections on attributes such as temporal attributes. Retrieving a collection will retrieve all members of the collection. The client will receive the metadata records of retrieved objects immediately. Files, if remote, are moved conservatively; that is, we will develop predictive prefetching rules to selectively prefetch files so they are available at the client when needed but not moved unnecessarily.

preserve (collection) commits a collection to long-term preservation. The metadata server will generate a METS-format description for each object in the collection and for the collection itself. The METS metadata will be built using the internal metadata records and contextual information such as user ID to establish attribution. The collection plus metadata descriptions will be "zipped" into a single data-compressed object and passed to the preservation service (see Figure 2), where it will await additional curation before being committed for long-term preservation. As shown in the figure, curation and final ingest to the long-term repository are planned for Phase 2 of the project.
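For illustration, the operations above could be rendered as the following Java interface. This is only a sketch of the service boundary; the deployed service will exchange SOAP/REST messages returning XML, and XMC Cat's actual interfaces differ.

    import java.net.URI;
    import java.util.List;
    import java.util.Map;

    // Illustrative in-process rendering of the Table 3 operations. Names and
    // signatures are hypothetical; the real boundary is message-based.
    public interface MetadataCatalog {
        URI createCollection(String name);                            // create
        URI add(URI dataFile, Map<String, String> listenerAttrs);     // add
        void group(URI dataObject, URI collection);                   // group
        void delete(URI dataObjectOrCollection);                      // delete
        void removeFromCollection(URI dataObject, URI collection);    // delete variant
        void modifyProperties(URI target, Map<String, String> attrs); // modify_properties
        List<URI> search(Map<String, String> attrs,
                         long earliestMs, long latestMs);             // search, incl. temporal
        void preserve(URI collection);                                // preserve (via METS)
    }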

The proposed metadata catalog will leverage the XMC Cat metadata catalog (Jensen and Plale 2008; Plale et al. 2006, http://www.dataandsearch.org/dsi/research) developed in IU's Data To Insight Center. XMC Cat is a general metadata catalog that supports many of the operations in Table 3; we will extend it to support the full list of commands. XMC Cat can be configured to support any XML metadata schema by means of a step-through wizard, making it easy to configure for a new application. It supports use by multiple simultaneous users, supports secure access through X.509 certificates, and communicates using SOAP messages via Axis 2. It will be extended to support REST as needed for this project. The software listeners and preservation pieces will draw from previous and ongoing research and development in the Karma project (http://dataandsearch.org/provenance), wherein listeners are actively being used to collect the provenance of digital data products and preservation objects are generated from provenance gathered from scientific workflows.

The expected outcome of the architecture effort is a flexible architecture that can simultaneously support multiple users and can work under a number of different configurations, from single-user single-machine (Phase 1), to multiple users (Phase 1), to a cloud-hosted environment (Phase 2).

Background of Humanities, Music and Preservation at IU

Few American universities have as much practical experience as Indiana University Bloomington in areas related to the GTW project, and especially in areas related to the current proposal: technology areas like video annotation for the humanities, digital music libraries, and digital audio preservation, as well as music performance and recording technology.

The Jacobs School of Music (JSOM) at Indiana University's main campus in Bloomington is one of the largest music schools in the country, and for many years it has been almost universally considered one of the best. The JSOM has an outstanding opera performance program. IU's opera house, the Musical Arts Center, is generally considered to be the most advanced facility in any US university. The Jacobs School presents nine operas and three ballets each year, a season that exceeds that of many professional companies. All told, the school presents in excess of 1100 events each year, some 600 of which are recorded by the school's Recording Arts Department. The Department records digital audio and video masters directly to its own servers, and these "born digital" files are sent to IU's MDSS (Massive Data Storage Service, with a capacity of 4.2 petabytes) for long-term storage. Finally, listening copies of the recordings are deposited in the IU Music Library—perhaps the fourth largest music library in the country—where they are made available online via the Variations system. Detailed metadata is embedded in all performance recording files; it is used for purposes such as creating MARC records for the library catalog.

EVIA ("Ethnographic Video for Instruction and Analysis"), a joint project of IU and the Univ. of Michigan, has had support from the Mellon Foundation to develop tools for ethnomusicologists and other ethnographers. The EVIA Annotator's Workbench ("AWB"; Figure 7) is essentially a video segmentation and annotation tool, designed to help researchers create finished presentations of their projects with library- and archive-quality metadata. The AWB has extensive configuration support, plus—in conjunction with other tools from the EVIA project—metadata-creation features like support for user-definable controlled vocabularies and authority control. Its configuration features are organized by a directory containing mode-specific configuration files that describe actions and window-based features. These files, in XML form, determine what windows are available to the user and where on the screen each window will initially be displayed.


They also describe what set of actions the user can take, such as adding a video segment or editing a transcription. A configuration file of another type determines what controlled-vocabulary features are available. Consequently, a wide variety of activities and workflows can be supported just by changing configuration files, without requiring any software development. The AWB is scheduled to be released in open-source form in the near future, and it has already been used successfully by dozens of ethnographers, including folklorists and anthropologists as well as ethnomusicologists, in four summer workshops at Indiana University. The Annotator's Workbench graphical annotation component will serve as the code base for this project. Enhancements derived from this proposed effort will be made available to the AWB team for incorporation into the AWB tool.

The Variations system, developed by the IU Digital Library Program and School of Music, is arguably one of the most advanced academically-oriented digital music library systems in existence. Music repeatedly brings up the need to deal with multiple versions of a work in multiple forms, especially sound recordings and some kind of music notation (sometimes tablature). One of the main thrusts of IU's Variations, Variations2 (Scherle & Byrd 2004; Dunn et al 2006), and Variations3 projects was to solve the access challenges of multiple versions. The same is true of other projects, e.g., SyncPlayer (Kurth et al 2007), but Variations focuses on metadata, while SyncPlayer focuses on content. And the same basic problem occurs in many fields of study. For some years now the digital library community has been moving towards the "FRBR" model for cataloging (Hickey 2002), which addresses the metadata aspect; but the need is especially pressing in music.

Finally, the “Sound Directions” project of IUB’s Archives of Traditional Music is nationally known for its pioneering work in digital audio preservation; the Archives recently received their third grant from NEH for the project. We will leverage this campus expertise in a number of areas.

Management and Work Plan

The project is planned as a two-year project with deliverables each month. The project will be led by Dr. Byrd and Prof. Plale, and will involve regular weekly meetings among team members. Since all primary staff are located on the Bloomington campus, the meetings will normally be held in person, but Indiana University has facilities for telecons should members be traveling. The technical staff may meet more frequently and for extended periods as they work out issues like common formats, protocols, and program interfaces. The project anticipates making a first release of a 0.1 version of the tool at the end of Year 1. This milestone is established to ensure the distributed components of the system can work together on a small subset of the digital media types. In Year 2 we anticipate hardening the code and extending its support to other digital media types. A key component of the project is its assessment. The tool is only good if it satisfies the needs of real users, and we will conduct a user study in the final six months of the project to get feedback on the tool. This feedback will be used to enhance the tool. It will be released open source under an Apache-style license, and made available to the broader community through passive and active dissemination strategies, including posting the code to a project web site and notifying communities of its existence.

A detailed breakdown of the work tasks is provided in Table 4 and in more detail in the paragraphs following.

May – Oct 09:
- Improve synchronization of the AWB, especially for audio; integrate Variations' audio player in the AWB; add support for text notes. (Byrd)
- Study Phase 1 data formats for metadata representation. (CS GRA)
- Develop internal schema for the metadata service; make it work for audio and video only. (CS GRA)

Nov 09 – Apr 10:
- Add support for timelines of unstructured material (any number of events at a time; usually realtime-based organization) as well as structured material (time starts at 0, one hierarchy dominates, exactly one event at a time). (Byrd)
- Extend methods in the metadata service. (CS GRA)
- Develop listener instrumentation. (CS GRA)
- Apr 2010 milestones: working metadata service that ingests new data objects and serves data objects to the client GMW tool; release of V0.1 (minimal data formats supported, single user, no preservation).

May – Oct 10:
- Make the UI more flexible (e.g., displaying simultaneous events like SIMILE or LifeLines); implement the 2-level architecture: framework & support for add-ons. (Byrd)
- Extend support for richer kinds of data formats (e.g., notated music). (CS GRA)
- Develop support for the preservation object and simple support for the preservation service, in collaboration with W. Cowan. (CS GRA)
- Oct 2010 milestone: V1.0 of the system supporting Phase 1 data types.

Nov 10 – Apr 11:
- Conduct user study; gather feedback. (whole team)
- Conduct performance evaluation on systems aspects. (CS GRA)
- Use feedback on user experience to improve the system; use performance study results to optimize the system. (whole team)
- Apr 2011 milestone: delivery of the Phase 1 tool.

Table 4. Milestone and timeline chart for development of the tool.

Table 5 provides a glimpse of how the Annotator's Workbench (AWB) will need to be extended. Since the tool's interface will be based on the AWB, plus chunks of Variations and quite possibly EVIA code other than the AWB, this summary is particularly meaningful. The chart (Table 5) also extends into the Phase 2 effort to show the continuity. The AWB is written in Java, as are the Variations code and most of the EVIA code other than the AWB, the latter being written partly in Java and partly in JavaScript. One of us (Byrd) has substantial experience as a programmer with both Variations, for which he was a member of the development team, and the AWB. In addition, consultant Will Cowan is software development manager for IU's Institute of Digital Arts and Humanities, in which position he supervises the EVIA programmers, so we can count on strong support (though no committed programmer time).

Feature | AWB (foundational code base) | Phase 1 (GMW) | Phase 2 (GTW)

Intended primary application | ethnography | complex music performances | any temporal phenomena
Media fully supported | video | video, audio, still photos, text | any, via domain-knowledge modules; initial set: video, audio, still photos, text

Vertical structure (e.g., for facets):
Multiple bands (parallel coordinated display areas), with user control of what goes in each | no | any number of bands, with same scale but different content | any number of bands; can have different scale, content, etc.

Horizontal structure (across time):
Intervals/segments at any time point | 1 | any number | any number
Hierarchy type for non-point items | subdividing; special items outside hierarchy | ?? | any
Max. levels in hierarchy | 4 | no limit | no limit

Handling of time:
Relative or absolute time, or non-time interval-level variable | relative | relative or absolute | any
Exact or approximate time | exact | both | both
Target timeline duration | hours | hours to years | picoseconds to billions of years

User interface:
Playability: medium & rate | video, via QuickTime; real-time or single-frame | audio & video, via JavaFX; 1/2x to 3x real-time | audio & video, via JavaFX; any available rate

Table 5. Kinds of details that must be considered for the General Music Workbench, cast in comparison to the Annotator's Workbench, from which the code will be derived through extensions, and to the Phase 2 roadmap version.

To clarify a few things in the above table, the vertical structure of a timeline is the way it shows things happening at the same time, either in separate bands or within a band. For anything as complex as the LifeLines medical history (Figure 4) or opera materials, we feel it's best to use bands corresponding to facets.

Many timelines essentially display a segmentation, on one basis or another, of the time they cover. The segmentation may be hierarchic, but it has a relatively constrained structure nonetheless. These timelines are generally concerned with relative time only, and start nominally at time zero. In contrast, a timeline of the type we envisage, for organizing an arbitrary set of materials, ideally should not impose any limits on the temporal structures it handles: it may involve multiple hierarchies or none at all. It also needs to be able to refer to real-world time—for an opera production, covering a period of perhaps two years. Of course, the user may want to focus on an individual item attached to the timeline and interact with it as a highly-structured thing starting at time zero.

These changes make it clear that a significant amount of programming effort will be required. From the Variations codebase, we expect to use the audio player, parts of the timeliner, and parts of the “Opus” (page-image display of music notation coordinated with audio) window.

The challenges that will need to be addressed during this project include the following:


1. Develop an internal metadata representation for disparate media that facilitates eventual preservation. Much work has been done with metadata for each medium type, but little towards an integrated form.

2. Explore lightweight client software that imposes a minimal technological burden on the user for setup, updates, and maintenance.

3. Explore options for user selection of preservation objects that strike a balance between transparency and control.

4. Develop a timeline UI appropriate for complex musical performances, in terms of both its appearance and its behavior. For example, SIMILE Timeline knows which band each item goes in, but it knows nothing about the structure of events within bands: vertical position is not significant, and the events are automatically arranged compactly while avoiding overlap. On the other hand, LifeLines has hierarchic bands, intended to be used for facets, with an "expand/collapse" interface; that seems like the best approach for our purposes. While it won't eliminate the need for automatic arrangement, it will reduce it greatly.

One interesting question is, what does it mean to play a general timeline, as opposed to simply playing media attached to the timeline, where (a) durations might be far too short or too long for real-time playback to make sense; (b) some media might be playable only in a nonobvious way, if at all; and (c) any number of individually-playable media can occur in parallel—perhaps (d) zero for a very long time? For (a), part of the answer is obviously support for playback very much slower or faster than real time. For (d), the Salem Witchcraft page zips through parts of the timeline where there are no events, and that seems to make sense. SIMILE supports piecewise-linear spacing; playback might move at constant distance per unit time. We know of no previous discussion of these issues, though other programs do "play" more-or-less general timelines, e.g., dipity.com and Avid Media Composer. (A sketch of gap-skipping playback follows this list.)

5. An API for communication with some visualization plug-ins.

6. Representation and encoding of time, including approximate times, especially for communication with add-ons. The representation/encoding Variations uses (designed by Byrd) seems to meet library/archive needs for vague and uncertain dates, but its time resolution is no better than 1 minute. For audio/video, we need to increase resolution to, say, 1 millisecond. Frame accuracy is desirable; 1 millisecond can't guarantee that, but is it good enough? (A sketch of such a representation also follows this list.)

7. Configurability/control of configuration. This is one of AWB’s strong points, so it is probably not a serious problem, but it needs attention.
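The following sketch illustrates one answer to the playback question in challenge 4: compress the empty stretches of a sparse timeline by giving each event a fixed dwell time, as the Salem Witchcraft timeline does with empty days. The class, method, and parameter names are invented for illustration.

    import java.util.Arrays;
    import java.util.List;

    // "Playing" a sparse timeline by skipping gaps: each event gets a fixed
    // dwell during playback; stretches with no events are jumped over.
    // Assumes eventTimes is sorted in ascending timeline order.
    public class SparsePlayback {
        // Map elapsed wall-clock playback seconds to a timeline position.
        public static double timelinePosition(List<Double> eventTimes,
                                              double elapsedSeconds,
                                              double dwellSeconds) {
            int i = (int) (elapsedSeconds / dwellSeconds);    // index of current event
            if (i >= eventTimes.size()) {
                return eventTimes.get(eventTimes.size() - 1); // past the end: hold last
            }
            return eventTimes.get(i);
        }

        public static void main(String[] args) {
            // Events on timeline days 3, 4, and 40; three seconds per event.
            List<Double> events = Arrays.asList(3.0, 4.0, 40.0);
            for (double t = 0; t < 9; t += 3) {
                System.out.printf("t=%.0fs -> day %.0f%n",
                                  t, timelinePosition(events, t, 3.0));
            }
            // Prints days 3, 4, 40: the 36-day empty gap is skipped.
        }
    }

A production version would presumably interpolate between events (SIMILE-style piecewise-linear spacing) rather than jump, but the essential move, a monotonic mapping from playback time to timeline time that compresses empty regions, is the same.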
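And for challenge 6, here is one possible shape for a time value that spans both humanities and audio/video needs: a millisecond-level position carrying an explicit resolution and an approximation flag. This is our own illustrative design, not the Variations encoding.

    // A time value flexible enough for "1853(?)" and a 1-millisecond audio
    // offset to live on the same timeline. Fields and names are illustrative.
    public class FlexibleTime {
        public enum Resolution { YEAR, MONTH, DAY, MINUTE, MILLISECOND }

        final boolean absolute;      // real-world date vs. offset from a local zero
        final long valueMs;          // position in ms (epoch-based if absolute)
        final Resolution resolution; // how finely the value is meaningful
        final boolean approximate;   // e.g., "1853(?)", "fl. 5th century BC"

        public FlexibleTime(boolean absolute, long valueMs,
                            Resolution resolution, boolean approximate) {
            this.absolute = absolute;
            this.valueMs = valueMs;
            this.resolution = resolution;
            this.approximate = approximate;
        }

        // Two times can be placed on one timeline only if both are absolute
        // or both share the same local zero; comparisons are meaningful only
        // at the coarser of the two resolutions.
        public boolean comparableWith(FlexibleTime other) {
            return this.absolute == other.absolute;
        }
    }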

Testbed Opera Productions

As a testbed for this work, we will use three productions of operas by the IU Opera Theatre: one past premiere (Our Town), one upcoming non-new production (e.g., Rondine or The Magic Flute), and one upcoming premiere. The production of Ned Rorem's Our Town in 2006 was a premiere, an IU commission, and an all-local production (design, conductor, etc.). It involved many kinds of material, and is recent enough that the varied performance materials may still be available in sufficient quantity and diversity. Note that the availability question is itself relevant to this project, and an interesting aspect of the project's value may be to quantitatively assess the tool's impact on the amount of data and information that can be saved. Two upcoming premieres are Vincent, by Bernard Rands, and an unnamed opera being written by IU composition professor P.Q. Phan; either should be satisfactory for our purposes.

Phase 2 add-ons

An add-on is a module or program providing additional functionality that the GMW communicates with either as a plug-in or via Web Services. An add-on can provide any kind of functionality that regular software can, so it is more general than, say, Firefox add-ons. We propose as part of Phase 2 to integrate support for add-ons such as video players. There are different approaches to architecting the system so as to facilitate integrating add-ons (i.e., making them discoverable to users). One approach is to wrap functionality as Web services; the other is to develop in an Eclipse environment. By means of the OSGi communication protocol, Eclipse/Equinox provides a standard plug-in interface with support for discoverability, etc. Work has been done at Indiana University and elsewhere to minimize the effort needed to integrate new plug-ins. We will examine both approaches during Phase 1 of the project for their ability to support both Flex applications and Java applications as add-ons. Note that Figure 2 shows the client using Eclipse and communicating through a special Web Services layer. This reflects our design decision at this point: the client will be Eclipse-based and will communicate with back-end services through Web Services.

Dissemination and Evaluation

At this early stage, only limited forms of evaluation make sense. One form that does make sense is evaluation of the user interface of the GTW prototype.

As a primary deliverable in addition to the tool itself, we will write a whitepaper for the NEH website; submit papers to conferences; set up a website for the project; and make the software available for download, at least to non-profit institutions, on a free and open-source basis, with an Apache-style license.

Visual Examples

Figure 3. Mock GTW “screenshot”: configuration to study three performances of a Mozart aria


Figure 4. LifeLines screenshot: still image and aligned timelines

Figure 5. SIMILE Timeline of the assassination of President Kennedy

Figure 6. Salem Witchcraft accusations: playable timeline with GIS-like display


Figure 7. EVIA Annotator’s Workbench

Figure 8. Variations Audio Timeliner


Appendix A. Music & Musical Theatre Document Facets

Each facet below lists: priority; description; file formats for the NEH proposal (JSOM opera productions); sources for the NEH proposal (JSOM opera productions); and plan/comments for the NEH R&D proposal.

Facet 1 (priority: Top). Video files of performances & rehearsals. Formats: DV, HDV, DVCPro HD, Apple ProRes 422, XDCAM. Sources: JSOM servers, MDSS. Plan/comments: need to add a simple audio mixer; otherwise, the AWB handles these OK as is.

Facet 2 (Top). Audio files of performances and rehearsals. Formats: Broadcast WAV 96/24, 48/24, 44.1/24, 44.1/16; interleaved stereo, multiple mono. Sources: JSOM servers, MDSS. Plan/comments: probably integrate the audio player from Variations & add a simple mixer.

Facet 3 (High). Notated music: published full scores & vocal scores; full scores, vocal scores, & parts with markings by the conductor, performers, etc.; alternate versions & different editions. Formats: DjVu (the compressed format Variations uses), TIFF, JPEG. Sources: Music Library, Opera Studies? Plan/comments: handle in image form only (code from Variations), or possibly in symbolic form via add-on.

Facet 4 (High). Drawings and paintings of designs for, & photographs of, costumes, sets, and staging. Formats: .png, .jpg. Sources: Opera Studies? Plan/comments: some are likely to have odd aspect ratios (e.g., Cristini's 11 x 64 in. paintings).

Facet 5 (High). Correspondence among principals (composer, librettist, conductor, designer, etc.). Formats: various text formats. Sources: principals' personal material.

Facet 6a (Med). Concert programs & program notes. Formats: PDF. Sources: JSOM servers, MDSS. Plan/comments: formatted text; display as HTML?

Facet 6b (Med). Report files (index of the content of each BWF file & performance notes). Formats: RTF. Sources: JSOM servers, MDSS. Plan/comments: formatted text; display as HTML?

Facet 7 (Low). Production design instructions (in Italian, "disposizione scenica"). Formats: (not important for Phase 1). Sources: Music Library, Opera Studies? Plan/comments: unlikely we'll do this in Phase 1.

Facet 8 (Low). 3D visualizations of set models. Formats: (not important for Phase 1). Sources: Opera Studies? Plan/comments: unlikely we'll do this in Phase 1; very desirable to use an existing viewer (via add-on) when/if done.

Facet 9 (Med). For opera: libretti & translations of them; for other vocal music: lyrics & translations of them. Formats: (not important for Phase 1). Sources: Music Library, Opera Studies? Plan/comments: formatted text; display as HTML? Unlikely we'll do this in Phase 1.

Facet 10 (Low). Miscellaneous ephemera: production photos, production notes, etc. Formats: (not important for Phase 1). Sources: Opera Studies? Plan/comments: won't do in Phase 1.