david huron music information processing using the...

11Huron

David HuronSchool of Music1866 College RoadOhio State UniversityColumbus, OH 43210 [email protected]

Music InformationProcessing Using theHumdrum Toolkit:Concepts, Examples,and Lessons

Computer Music Journal, 26:2, pp. 11–26, Summer 2002� 2002 Massachusetts Institute of Technology.

Humdrum is a general-purpose software system in-tended to assist users in a variety of music-relatedapplications. This article provides an introductorytour of Humdrum emphasizing three goals: to iden-tify the basic elements, operations, and organiza-tion of Humdrum; to illustrate the range ofapplications through a series of tutorial examples;and to identify a number of lessons from Humdrumthat might benefit future music software designers.

Humdrum’s capabilities are quite broad, so it isdifficult to describe concisely what it can do. Sinceits inception, the principal users of Humdrum havebeen systematic musicologists, theorists, ethnomu-sicologists, music librarians, historians, music cog-nition researchers, and composers. Five featureshave accounted for this broad interest: the abilityof users to concoct or tailor unique representationsthat pertain to the user’s specific interests; a flexi-ble set of analytic and processing tools that can beapplied to both established and user-defined repre-sentations; a coherent and extensible system forrepresenting reference-related metadata; ease ofconnectivity to other existing software; and avail-ability of a large volume of high-quality encodedmaterials.

In Humdrum, new musical representations areeasily created. The opportunity to craft representa-tions for specific tasks has led to the developmentof innumerable representations. For example, Hum-drum users have concocted representations for Bu-gandan xylophone music, Cajun button accordiontablatures, square notation, Persian Ney music, Be-nesh dance notation, running acoustic spectra, andtrack index markers for compact discs. For manyprojects, it is common to generate intermediate or‘‘throw-away’’ representations that are used onlyfor a single task. For example, in perceptual re-

search, collected data (such as listener responses)are commonly encoded in the same document thatcontains the stimulus materials.

In addition to specific representation schemes foruser ‘‘data,’’ Humdrum allows users to define theirown types of reference-related information. Con-ventional reference information includes an artist’sname, title, date of performance, copyright status,etc. However, Humdrum allows users both to ex-tend and omit such metadata in ways that main-tain compatibility without compelling users toencode task-irrelevant materials. For example, aballet scholar might define reference informationidentifying the transcriber of a Laban dance nota-tion while remaining compatible with conven-tional cataloguing systems. At the same time, userscan still process materials with incomplete or ab-sent reference information.

Although Humdrum does not provide its ownsound generation or graphic notation capabilities, anumber of translators exist that connect Humdrumto existing applications. These include translationsto MIDI, Csound (Vercoe 1986), the Finale Enigmaformat, the Score notation system (Smith 1972;Kornstadt 1996), and the Mup music text formatter(available from www.arkkra.com). In addition, ma-terials can be translated to the Humdrum formatfrom MIDI, the Finale Enigma format, MuseData(Hewlett 1997), DARMS (Erickson 1976; Hall1997), and the RISM Plaine and Easie representa-tion (Howard 1997).

All Humdrum file formats are ASCII text. Thisfacilitates viewing and editing data, increases cross-platform portability, and eases the writing of task-specific software. The plain text format alsofacilitates reformatting data for special-purpose usessuch as graphics programs or statistical packages.

Musical materials encoded in the Humdrum for-mat span five centuries and six continents. En-coded materials include complete musical works as

12 Computer Music Journal

well as collections of themes and incipits. As of2001, over 40,000 musical works had been encodedin various levels of detail. Some encodings are sim-ple monophonic melodies; others include full or-chestral scores including such details asstem-direction and beaming. In addition to theMIDI format, the Humdrum ‘‘kern’’ representationis one of the principal distribution formats for thehigh-quality electronic editions produced by theCenter for Computer Assisted Research in the Hu-manities (www.ccarh.org). An extensive collectionof non-Western musics has been encoded; an in-complete alphabetic sample includes Aleutian, Bra-zilian, Chinese, Dutch, Ethiopian, Fijian, German,Haitian, Indonesian, Japanese, Khmer, Lakota,Mandan, Namibian, Ojibway, Peruvian, Quebecois,Russian, Scottish, Thai, Usarafus, Venda, Xhosa,Yoruba, and Zulu musics. (Huron 1992). RegardingHumdrum’s analytic capability, a selection of toolswill be discussed next.

Humdrum Syntax

Humdrum data are organized in two-dimensionaltables like a spreadsheet. Columns of data can bedefined to represent different types of information.Successive lines (records) represent successive mo-ments, so time passes as one moves down the page.The example here encodes an ascending majorscale. The leftmost column represents pitch usingthe International Standards Organization (ISO)pitch designations. The rightmost column repre-sents piano fingerings. The columns are separatedby a single tab character. Each column of data be-gins with an interpretation that identifies the typeof data being represented, and ends with a path ter-minator.

**pitch **fingeringC4 R1D4 R2E4 R3F4 R1G4 R2A4 R3B4 R4C5 R5*- *-

The ISO pitch representation is predefined inHumdrum, whereas the fingering representationhas been concocted for this example. Humdrum en-codings can consist of any number and length ofcolumns.

Unlike the columns in a spreadsheet, Humdrumcolumns can exhibit complicated paths through thedocument. Columns can join together, split apart,exchange positions, stop in mid-table, or be intro-duced in mid-table. Because columns of Humdrumdata can roam about the table in a flexible way,they are referred to as spines.

Humdrum spines are formally labeled using in-terpretations (tokens beginning with an asterisk).Exclusive interpretations (double asterisks) identifythe basic type of data being represented. Tandeminterpretations (single asterisks) provide supple-mentary labels or tags that can clarify the state ofthe data. Any number of tandem interpretationscan be added anywhere in a spine. In addition,Humdrum provides ways of adding running com-mentaries. Comments might pertain to the wholeencoding, to a given row or spine, to a given datacell, or to a particular item of information within acell (such as a single letter or digit). Specially for-matted comments are used to encode reference- orcataloguing-related information. All types of com-ments are designated by a leading exclamationmark.

The most common Humdrum files encode the‘‘score’’ of a complete work or movement. Typi-cally, musical notes are encoded in the variouscells of the table; the most common use of a spineis to represent a single musical part or instrument.

Humdrum includes some thirty predefined repre-sentational schemes. Predefined representations in-clude many common forms of music-relatedinformation such as frequency, pitch, cents, scaledegree, melodic and harmonic intervals, duration,dynamics, decibels, harmonic function, pitch-classsets, lyrics, etc. However, spines can be used torepresent anything, and users can concoct theirown representations simply by writing an exclusiveinterpretation with a unique identifier.

Of the predefined Humdrum representations, themost popular is kern, a scheme intended to repre-sent the basic or core musical information of notes,

13Huron

durations, rests, barlines, etc. (Huron 1997). (Thekern representation gets its name from the Germanword for ‘‘core.’’) By way of illustration, the follow-ing example encodes the accompanying musical ex-cerpt of Figure 1 using the **kern representation.

!!!COM: Kabalevsky, Dmitri!!!OTL: 2. Rondo–Dance**kern **kern*staff2 *staff1*clefF4 *clefG2*k[f#c#] *k[f#c#]*M3/8 *M3/88r (16dd. 16cc#� �8D’ 8dd’)8A’ 8f#’ 8a’8r (16b. 16a#� �8BB’ 8b’)8F#’ 8d’ 8f#’8r (16g. 16f#� �8BB-’ 8g’)8G’ 8c#’ 8e’8F#’ 8a’� �8D (8f#8A 8d)*- *-

The representation mirrors the two-dimensionstructure of the score, rotated clockwise by 90 de-grees. The tandem interpretations in this examplecan be readily deciphered as representing staves,clefs, key, and meter signatures. Reciprocal num-bers are used to indicate durations. Pitches are rep-resented using lower-case letters (middle C andabove) and upper-case letters (below middle C). Oc-taves are represented using a system of letter repe-tition. (See Huron 1995 for a complete description.)In each complete measure, note the appearance of a‘‘multiple stop.’’ Using the space character as a de-limiter, more than one data token can appear in agiven spine. A single staff can be represented bymore than one spine, so complex polyphonic andother textures can be represented.

It is important to understand that there is noth-ing special about the **kern representation. Forexample, in using the pitch letter names A–G, thisparticular representation shows a bias towardsEnglish-speaking users, while the use of numbersfor durations shows an American bias. However,other predefined Humdrum representations allowusers to represent music using French ‘‘fixed-Do’’solfege or the German system of pitch naming (e.g.,H�B-natural, Es�S�E-flat, etc.) Encodings areeasily translated from one representation to an-other. Users of Humdrum have tailored other repre-sentations that better reflect special representationrequirements, from Australian aboriginal music toZydeco music.

Figure 1. Musical excerptfrom Kabalevsky.


Humdrum Tools

The Humdrum Toolkit contains roughly 70 inter-related yet stand-alone software tools. Each toolcan be invoked individually or in connection withother tools. Typically, the tools are invoked in acommand-line shell or as statements in a programor script. Humdrum commands conform to thePOSIX 2 portability standard (IEEE standard 1003.1,available online at standards.ieee.org/catalog/olis/posix.html) and thus can be used on all popularoperating systems. Any data that conforms to theHumdrum syntax can be manipulated using theHumdrum tools. Because the tools can be con-nected with each other (and can also be connectedwith non-Humdrum tools), there are many ways tomanipulate Humdrum data.

Some Sample Commands

One group of tools is used to extract or select sec-tions of data. Vertical spines of data can be ex-tracted from a Humdrum file using the extractcommand. For example, if a file encodes fourmusical parts (encoded in four spines), then theextract command might be used to isolate one ormore given parts. The command

extract -f 1,3 filename

will extract the first (leftmost) and third column orspine of data. Often it is useful to extract materialaccording to the encoded content without regard tothe position of the spine. For example, the follow-ing command will extract all spines containing alabel indicating the tenor part(s):

extract -i ‘*Itenor’ filename

Instruments can be labeled by ‘‘instrument class’’and can thus be extracted accordingly. The follow-ing command extracts all of the woodwind parts:

extract -i ‘*ICww’ filename

Any vocal text can be similarly extracted:

extract -i ‘**text’ filename

Or, if the text is available in more than one lan-guage, a specific language may be isolated:

extract -i ‘*LDeutsch’ filename

As noted above, users are free to define their owninterpretations. For example, if a user has codedelectro-encephalographic data in a spine (denotedby, say, **EEG), the pertinent data can be extractedusing the same syntax:

extract -i ‘**EEG’ filename

Segments or passages of music can be extractedusing the yank command. Segments can be definedby sections, phrases, measures, or other any user-specified marker. For example, the following com-mand extracts the section labeled ‘‘Trio’’ from aminuet and trio:

yank -s Trio -r 1 filename

Suppose a user wanted to select the material inmeasures 114 to 183. In the following command,the-n option specifies a regular expression (�)which might be used in this case as a barlinemarker. The-r option identifies the range of, in thiscase, the numbered labels on the barlines:

yank -n �-r 114-183 filename

In a representation (like **kern) that uses curlybraces to represent phrases, selecting the penulti-mate phrase in the work might be achieved asfollows:

yank -o {-e }-r ‘$-1’ filename

Whenever appropriate, users can define general-purpose patterns using the well-known regular ex-pression syntax. This means that the tools canoperate in ways that are sensitive to the encodedmaterial, even without any ‘‘knowledge’’ of therepresentational convention.

Some tools translate one representation into an-other. For example, the mint command generatesmelodic interval information. The mint commandonly operates on pitch-related data; any non–pitch-related data are ignored.

For example, if we apply the mint command tothe previous ascending scale, the resulting outputwould transform the **pitch spine while leavingthe **fingering spine unaffected:

15Huron

**mint **fingering[C4] R1�M2 R2�M2 R3�m2 R1�M2 R2�M2 R3�M2 R4�m2 R5*- *-

Two or more commands can be connected into apipeline. The following command locates all me-lodic tritones—including compound (octave) equiv-alents:

mint -c filename | egrep -n ‘((d5)|(A4))’

Because the Humdrum tools are stand-alonecommands, non-Humdrum tools can be interposedat any point in the processing. For example, theegrep command in the above pipeline is a com-mon utility associated with the UNIX operatingsystem but available on many other systems aswell.

Depending on the type of translation, the result-ing data can be searched for different things. Thefollowing command identifies French sixth chordsby first translating the input into a predefined scaledegree representation. When provided with scale-degree data, the regular expression 6 - .*4� means‘‘find any lowered sixth scale degree that is concur-rent with a raised fourth scale degree. The regularexpression 2 simply ensures that the sonority alsocontains the second scale degree:

sdeg file | extract -i ‘**deg’ | ditto | grep‘6-.*4�’ | grep 2

The following command locates all sonorities inthe music of Machaut where the seventh scale de-gree has been doubled:

deg -t machaut* | grep -n ‘7[^-�].*7’

This command counts the number of phrasesthat end on the subdominant pitch:

deg filename | egrep -c ‘(}.*4)|(4.*})’

The next command identifies all scores in thecurrent directory whose instrumentation includes atuba but not a trumpet:

grep -sl ‘!!!AIN.*tuba’ * | grep -v ‘tromp’

Predefined pitch and duration representationscan be translated to MIDI using the Humdrummidi or smf commands; the perform commandprovides a simple command-line MIDI player. Forexample, the following pipeline generates a MIDIperformance of the second phrase in the oboe part:

extract -i ‘*Ioboe’ filename | yank -o {-e }-r 2 | midi | perform

Similarly, the following command will play thefirst and last measures from a section marked‘‘Coda’’ at half the notated tempo from a filenamed Cui:

yank -s ‘Coda’ Cui | yank -o ^�-r 1,$ | midi| perform -t .5

The ms command generates input for the Mupnotation program written by John Krallmann andWilliam Krauss. The following pipeline generatesgraphical PostScript output of a notational render-ing of the first 20 measures of just the string parts:

yank -o �-r 1-20 filename | extract -i‘*ICstr’ | ms

Robert Gjerdingen and Po-Yan Tsang have writ-ten tools that translate between Humdrum and theFinale Engima format. Other output tools can beused to generate Csound score files.

The context command provides a simple way ofsearching for patterns in particular contexts. Thecommand

context -n 2

changes a sequence of tokens, such as the followingRoman numeral harmonies:

**harmIViiiVI7ii;*-into a sequence of paired tokens, or ‘‘digrams,’’ asfollows:


**harmI VV iiiiii VI7VI7 ii;.*-

In effect, the resulting representation indicates thata I chord is followed by a V chord; a V chord is fol-lowed by a iii chord, and so on. An inventory of themost common two-chord progressions in Bach’schorale harmonizations can be created by simplysorting and counting the number of unique datarecords.

extract -i ‘**harm’ chorales* | context-n 2-o � | sort | uniq -c | sort -n

An inventory of three-chord progressions can begenerated by changing the -n 2 option to -n 3.

Earlier we saw that the extract command canbe used to isolate one or more spines from a file.The reverse process is achieved using the Hum-drum assemble command, which amalgamates in-dividual spines into a single file. An obvious use ofthis would be to assemble a full score from individ-ual parts. However, assemble is more commonlyused to amalgamate different kinds of informationin a single document. Multiple files can be alignedsimply by specifying the inputs and output:

assemble degree.file interval.file� filename

Suppose we would like to determine whether de-scending minor seconds are more likely to occur asFa–Mi or Do–Ti. We can use the mint command tocharacterize melodic intervals and the solfa com-mand to characterize scale degrees:

mint melody � file1solfa melody � file2

We can then use assemble to amalgamate thetwo kinds of representations and use grep tosearch for the appropriate combination of intervaland scale degree:

assemble file1 file2 | grep -c ‘-m2.*mi’assemble file1 file2 | grep -c ‘-m2.*ti’

Innumerable variants of this technique are possi-

ble. Several different representations of the samemusic can be coordinated in a single file. For exam-ple, a user might search for all instances of a de-scending minor third in the soprano voice, inpreparation for a suspension (occurring in eitherthe alto or tenor), leading to a half cadence, withthe soprano terminating on a tonic pitch ap-proached from below. (Such complicated examplesare discussed in detail in the Humdrum UserGuide (Huron, 1999).)

A more general tool, humsed, implements astream editor conforming to the syntax of the pop-ular sed stream editor used on UNIX systems.Both sed and humsed are Turing-complete, whichmeans that any manipulation of strings of charac-ters that can be achieved using a computer can bedone using humsed. In practice, stream editorssuch as humsed are used for more modest functionslike simple substitutions. For example, a usermight create a simple script to rewrite the pitchesin a trumpet piece as valve combinations.

At the University of Waterloo, Jonathan Berechas collected data from trumpet performers esti-mating the degree of difficulty for various finger/valve combinations. On a scale from 1 (leastdifficult) to 10 (most difficult), the valve/finger se-quence 1–3 followed by 2 was rated by performersas 9.7 in difficulty, whereas the sequence 1–2 fol-lowed by 2 was rated 5.8. Having used humsed totranslate pitches to valve combinations, thecontext command can be used to assemble pairsof successive valve-combination changes. A subse-quent humsed script can then be used to replacevalve-combination changes by their difficulty rat-ings. In other words, any monophonic musicalscore in the range of the trumpet can be processedto generate a new spine indicating the degree ofmoment-to-moment fingering difficulty:**kern **valves **difficultyB4 2 0D#5 2 0A4 1-2 3.0G#4 2-3 6.0A4 1-2 5.0F4 1 1.5C4 0 1.0*- *- *-

17Huron

When the output is piped to a sound synthesis pro-gram such as Csound, the ‘‘difficulty’’ informationmight be used to modify performance parametersso that difficult fingerings are associated with hesi-tation, fluctuations in intonation, or other acousticcues that add realism to the performance.

A composer might try different transpositions todetermine the easiest (or most difficult) key for awork. For example, Figure 2 shows a graph of theaverage fingering difficulty for Herbert Clarke’sStars in a Velvety Sky, a virtuoso trumpet workcomposed by a skilled trumpet performer. Thework was methodically transposed covering allkeys in which the work might be theoreticallyplayed. In general, the average fingering difficultytends to decrease as the work is transposed higherin pitch owing to the greater variety of alternativefingerings available for high notes compared withlow notes. Apart from this general trend, signifi-cant changes in average fingering difficulty occurfrom key to key. In works composed by trumpetvirtuosi, Mr. Berec and I have found a marked ten-dency for the lowest fingering difficulty to corre-spond with the key in which the work was actually

written. This pattern was not evident in trumpetworks written by non-trumpet players.

The values for Figure 2 used the Humdrumtrans tool, which allows passages to be transposedin various ways. The trans tool permits indepen-dent diatonic and chromatic offsets, so it is possi-ble to transpose works into different modes as wellas different keys. For example, the following com-mand causes a simple diatonic shift, and so per-forms a Joplin rag in D Dorian rather than C major:

trans -d 1 Joplin | midi | perform

Two-Dimensional Patterns

Tools such as grep allow users to carry out stringsearches where the pattern of interest can be foundon a single line. The two-dimensional structure ofHumdrum representations means that importantmusical patterns can stretch over many lines orrecords. The Humdrum patt command provides atwo-dimensional equivalent to grep that allowsusers to search for patterns spanning multiplerecords.

Figure 2. Graphicalrepresentation of thefingering difficulty forHerbert Clarke’s Stars ina Velvety Sky.


Consider, for example, the task of searching forthe pitch pattern B–A–C–H. As a sequential pat-tern, the Humdrum syntax will cause such a pat-tern to span several lines. The patt commandallows users to specify a multi-record pattern tem-plate. A suitable template might be as follows:

[Bb] �[Aa] �[Cc] �[Hh] �

The square brackets are used to indicate characterclasses (e.g. ‘‘B’’ or ‘‘b’’). The plus sign followingthe tab means ‘‘one or more matching records.’’Consequently, the above template refers to ‘‘one ormore (data) records matching either a lower- or

uppercase character ‘B’ followed by one or more rec-ords matching either a lower- or uppercase charac-ter ‘A’ followed by one or more records matchingeither a lower- or uppercase character ‘C’ followedby one or more records matching either a lower- oruppercase character ‘H’.’’ We can use this templateafter we have translated our input into the Germansystem for pitch representation. Assuming the fileBACH contains the above template, the followingtwo commands will search for all instances of B–A–C–H and play each instance using MIDI:

tonh -x file | ditto -s � � file.germanpatt -e -f BACH file.german | midi | perform

Figure 3 shows an example of a formerly un-known instance of B–A–C–H discovered by Walter

Figure 3. A formerlyunknown instance ofB–A–C–H in Bach’sConcerto No. 2.

19Huron

Hewlett in Bach’s Brandenburg Concerto No. 2—just six measures before the end of the first move-ment. The B–A–C–H pattern occurs in the celloand contrabass parts.

Rather than searching for pitch patterns, a musi-cally more useful search might focus on melodicintervals. Suppose, for example, we were lookingfor motivic statements in the opening movementof Beethoven’s Fifth Symphony. The followingtemplate will locate all passages that begin withany sonority (‘‘.*’’), followed by two perfect uni-sons, and terminated by either a major or minor de-scending third:

.*P1P1-[mM]3

In some applications, it is often useful to gener-ate pattern templates automatically. For example,in serial and twelve-tone compositions, the Hum-drum reihe command can be used to generate allset variants from some prime form. Then the pattcommand can be iteratively invoked to search foreach set form. Normally, a pitch-class set (**PC)representation would be used to conduct thesearch.

Figure 3. Continued.


A unique characteristic of set-related practices isthe constructing of simultaneities using successiveelements from a row. That is, an abstract sequenceof tones (1, 2, 3, 4, 5) might appear as tone 1, fol-lowed by tones 2, 3, and 4 (sounding concurrently),followed by tone 5. The patt command providesan option that instructs patt to match each inputrecord with the maximum number of possible con-tiguous template patterns.

Figure 4 shows a sample passage from the firstmovement of Webern’s Opus 24 Concerto, withthe row statements identified using a script that it-eratively invokes patt for each row variant. Notice

that patt has identified patterns both within andacross instruments. When appropriate, alternativenames for tone-row statements will be identified.In the second system, patt also identified a state-ment of I1 (identified only by numbered pitches).Although this statement seems to be questionable,the pitches of I1 are contiguous across the partici-pating instruments.

The patt command provides an option thatcauses a new spine to be output containing user-defined tags (such as P5, RI6, Theme A, etc.). Thesetagged outputs can be used as input for another pat-tern search, hence users can use patt to search for

Figure 4. A passage fromthe first movement of We-bern’s Opus 24 Concerto,with the row statements

identified using a scriptthat iteratively invokespatt for each rowvariant.

14 15 16

I5

R5

1

23

45

6

7

8

99

10

11

21Huron

patterns of patterns, etc. For example, the followingHumdrum input might match a template whosepurpose is to identify sonata allegro forms:

**sectionsIntroductionExpositionDevelopmentRecapitulationCoda*-

Similarity

A characteristic of pattern searches is that only twostates are possible: either a passage matches the de-fined template or it does not. Because music reliesextensively on the art of variation, however, it ismore important for many musical applications tobe able to characterize degrees of resemblance thanidentify absolute matches.

While the concept of ‘‘similarity’’ seems intui-tively obvious, as an operational concept it provesremarkably complex. A basic distinction can bemade between similarity for numerical (or paramet-ric) data and non-numeric (non-parametric) data.Measurements of similarity for parametric data canbe devised using variants of mathematical correla-tion, such as Pearson’s r. Measurements of similar-ity for non-parametric data have been exploredusing various algorithms in the field of approxi-mate string matching (see Hall and Dowling 1980).

Humdrum provides a parametric similarity tool(correl) for characterizing numerical similarity,such as determining similar pitch contours. A non-parametric Humdrum tool (simil) allows users todefine similarity according to an extended class ofDamereau–Levenshtein edit-distances (see Orpenand Huron 1992 for details).

The simil tool allows users to define penaltiesfor various edit actions such as deletions, inser-tions, and substitutions; users are also free to de-fine the basic representation used by simil.Depending on the application, a user might choosepitch similarity, interval similarity, contour/dura-tion similarity, articulation similarity, harmonic

similarity, or other forms or combinations of simi-larity. Note that simil is not restricted tonotation-based applications; it is equally adept atall non-parametric similarity tasks, such as charac-terizing spectral similarity, fingering similarity,similarity of conducting gestures, etc. Any repre-sentation that a user can concoct can be used as in-put for simil.

A unique property of the Humdrum simil toolis that it allows users to define asymmetrical simi-larity measures, where A is judged more similar toB than B is to A. Psychologically, the capacity forasymmetrical similarity measures is important. Inprototype theory, for example, it is known thatpeople tend to judge the color pink as more similarto the color red (prototype) than vice versa. Simi-larly, listeners tend to judge a musical variation asmore similar to the theme (prototype) than the re-verse comparison. Several contrasting illustrationsusing simil are described in Orpen and Huron(1992).

Other Tools

The above examples have introduced only a hand-ful of the Humdrum tools and have only scratchedthe surface in illustrating the information process-ing capabilities of Humdrum. As we have seen, thestrength of the Humdrum tools lies not in their in-dividual capabilities, but in their endless variety ofconfigurations and interactions. The Humdrumweb site (humdrum.net) provides hundreds of addi-tional tutorial examples that illustrate a variety ofmusic-related information-processing tasks.

Scripts

Humdrum’s command-line interface is frequentlyregarded as the principal impediment to morewidespread use. However, programmers will readilyunderstand that the command-oriented structure isHumdrum’s principal strength, because it allowscomplex scripts to be written and embedded inone’s favorite programming language. Scripts canbe created both to carry out routine operations and


to package a complex process as a stand-aloneapplication.

An example of the latter can be found in Theme-finder, a ‘‘name-that-tune’’ web application locatedat www.themefinder.com. Themefinder provides asimple interface that allows users to search a data-base of approximately 25,000 musical themes andincipits by specifying various search keys, such asthe up/down melodic contour (Kornstadt 1998).The engine underlying Themefinder is a Humdrumscript that was written in a single afternoon. Theactual searching is done by the UNIX grep com-mand. It is important to understand that Theme-finder actually limits users’ searching capabilities.Humdrum itself allows far more ways of accessingand searching the data, but a form-based web inter-face provides greater convenience.

Another example of the value of scripting is evi-dent in the simplicity with which Humdrum canbe connected to other software tools. Figure 5shows a musical map generated by connectingHumdrum to GMT, a free ‘‘generic mapping tool’’that is used by professional geographers to generatehigh-quality PostScript maps. Like the Humdrumcommands, GMT is invoked as a POSIX-formatcommand with a potentially large number of map-drawing options. As part of a project to studyculture-related musical features, Bret Aarden wrotea script that formats Humdrum’s geographical ref-erence records as input to GMT (Aarden and Huron2001). Figure 5 shows a simple contour map indi-cating the density of Germanic folksongs in minorkeys. The map was generated using data from theEssen Folksong Collection (Schaffrath 1995). Theimportant point is that any type of music-relatedsearch can be ‘‘piped’’ to GMT, resulting in atailor-made musical map.

Technical Specifications

Humdrum is written in a combination of lan-guages, including C, C��, awk, kornshell, yacc,lex, and perl. The toolkit is available free of chargevia the web, and source code is included in the dis-tribution. Humdrum can be used with all popularoperating systems; however, a POSIX-conformant

command shell is required. Complete documenta-tion is available, including information describingall pre-defined representations, software documen-tation for all Humdrum tools, introductory tutori-als, examples, and newsletters. A developers kit isavailable that includes a syntax checker, inputparser, and test suites. Documentation is availablein both printed and online versions. The best gen-eral introduction to the Humdrum Toolkit is Mu-sic Research Using Humdrum: A User’s Guide(Huron 1999).

Drawing Lessons from the Humdrum Experience

Whether or not one uses Humdrum, there are anumber of broad lessons arising from the Hum-drum experience that are pertinent to creatinggeneral-purpose music-related software.

In software design, a useful distinction can bemade between closed and open applications. Aclosed application can be defined as one serving awell-defined problem space where all tasks can bespecified in advance. In closed applications, it ispossible to construct efficient stand-alone softwarethat provides an intuitive and effective user inter-face. However, in research or development-relatedenvironments, the types of tasks users pursue areunpredictable and thus open-ended. In such envi-ronments, it is impossible to conceive of a singleintegrated application that will serve all purposes,and so a software tools approach is preferable.

Whenever possible, use flat text (ASCII) represen-tations. This spares programmers the trouble ofhaving to design special-purpose editors and othergeneral tools for each representation. Using text-based representation also avoids erecting barriersfor users who wish to define and use their own rep-resentations.

Do not attempt to represent everything in one gi-gantic encoding scheme. As in the case of struc-tured programming, it is better to segment therepresentational problem so encoding, editing, andprocessing remain manageable.

Do not force users to encode information (suchas pitch, duration, title, etc.) that is not of interestto the user.

23Huron

The effectiveness of text-based representations isgreatly increased by pushing the data structuresinto the data representation. In its use of a tabularformat of spines and records, Humdrum representa-tions are inherently two-dimensional and thusclosely echo the structure of sequential and concur-rent events typical of music. In effect, Humdrumrepresentations form a linked-list lattice structure.Instead of creating an internal data structure acces-

sible only within a program, Humdrum places thelinked-list structure within the data representation.This approach greatly simplifies access to and pro-cessing of the data.

Label or tag the representations so each softwaretool can identify which data the tool can legiti-mately manipulate, which data it should leave un-touched, and which data constitute errors.

In research and development situations, create

Figure 5. A musical mapgenerated by connectingHumdrum to GMT, a freegeneric mapping tool.


software tools that can be executed within any pro-gramming language or script.

When processing information, at each step en-sure that users can intercept the data and invokeeither commercial tools or user-crafted scripts tomassage the information.

Do not assume that users will perform a giventask using the tool you have provided. Users maydisagree with the approach or be more familiarwith the operation of another tool.

Avoid writing software that already exists, suchas general ‘‘sort’’ routines or music notation edi-tors. Having to write such software usually meansthat insufficient attention has been paid to allow-ing users to connect easily with other software.

Provide comprehensive and complete documen-tation for all tools so that conscientious users cancomprehend and anticipate the limits of operation,and so programmers can identify suspected bugswith confidence. In many ways, basic Humdrumtools like extract, assemble, patt, andhumsed are structured equivalents of the popularUNIX utilities cut, paste, grep, and sed.Whereas the UNIX tools operate indiscriminatelyon all data in the standard input, the Humdrumequivalents pass comments and reference informa-tion intact, preserve the structure and syntax of theinput, and operate only on specified data typeswithin a data stream. In addition, the Humdrumequivalents update the interpretation informationwhere appropriate so that subsequent tools recog-nize that the data has been modified in a particularmanner.

The importance of data typing is evident in theburgeoning conventions for filename extensionssuch as ‘‘.jgp,’’ ‘‘.gif,’’ ‘‘.html,’’ ‘‘.mid,’’ ‘‘.mp3,’’ etc.These tags are often used by application software torecognize files that can be processed, and they areappended to output files to specify the data format.The Humdrum ‘‘interpretation’’ records provide amore nuanced and powerful way of data tagging. Byplacing the type tags within the file or data streamitself, Humdrum allows multiple forms of informa-tion to co-exist within a single document.

The benefit of this approach is that it preservesstructural relationships between various data types,thereby facilitating contextually sensitive process-ing of multiple forms of data concurrently. For ex-

ample, with Humdrum it is straightforward toprocess notation-related and performance-relateddata concurrently. By way of illustration, in Hum-drum it would be relatively easy for a user tosearch for all notated mordents where the key ve-locity of a MIDI performance is higher for the sec-ond note than for the third note of the mordent. Bycontrast, if the notation data and MIDI data werestored in separate files, performing such a searchwould be technically challenging.

Conclusion

This article has provided a cursory introduction toHumdrum. Humdrum provides a syntax that al-lows users to represent arbitrary forms of time-dependent information. It also provides a set ofutilities or tools that manipulate Humdrum repre-sentations in various ways. The tools performoperations such as displaying, performing, search-ing, editing, transforming, extracting, linking, clas-sifying, labeling, and comparing. The tools can beused individually or linked together to carry out avariety of tasks. Users can intercept representationsat any point and write their own tools that aug-ment the functioning of Humdrum. In addition, theHumdrum tools can be accessed from within stan-dard programming languages.

After more than a decade, the Humdrum Toolkitremains without peer in music information pro-cessing applications. Its principal attraction hasbeen its broad scope for musical problem-solvingand its flexibility of operation. Its principal detrac-tions have been its command-line interface, and itspresumption that users have facility with UNIX-like shell commands. Especially for users with noprior programming experience, the learning curvefor Humdrum can seem intimidating. While not allcomputer music researchers will benefit from usingHumdrum, a number of design features and criteriaprovide useful lessons for developing future music-related software.

Acknowledgments

Continued software development has benefitedfrom the efforts of many individuals. Two graphic

25Huron

user interfaces have be written: one by MichaelTaylor at the University of Belfast (Taylor 1996),and a second by Andreas Kornstadt at the Univer-sity of Hamburg (Kornstadt 1996). Robert Gjerdin-gen at Northwestern University and Po-Yan Tsangat the University of Waterloo wrote Humdrum/Fi-nale translators. Kyle Dawkins at McGill Univer-sity wrote extensions for coordinating Humdrumwith compact discs. Utilities for making HumdrumLisp-compatible have been written by Jorg Garbersand Thomas Noll at the Technical University ofBerlin. Craig Sapp at Stanford University has writ-ten a number of utilities, including a harmonic an-alyzer. Another harmonic analyzer developed byDavid Temperley (Temperley 2001) at ColumbiaUniversity and the Eastman School of Music wasrendered Humdrum-compatible by Bret Aarden atthe Ohio State University. Michael Good of Recor-dare Inc. has fashioned an XML scheme inspired byHumdrum (Good 2000).

The Center for Computer Assisted Research inthe Humanities has been indispensable in provid-ing direct and indirect support for Humdrum. I amgrateful to Drs. Walter Hewlett and EleanorSelfridge-Field for making the MuseData editionsavailable online in Humdrum format. Other schol-ars have provided databases initially encoded usingother representation schemes, including JohnMiller at North Dakota State University, HelmutSchaffrath at the University of Essen, and HarryLincoln at the State University of New York atBinghamton.

More than 80 music collections have been en-coded directly in the Humdrum format. Amongthose who have encoded music using Humdrumare Suzi Wint, Sandra Serafini, Lonney Young, BenKoen, Eric Berg, Norma Welch, Franz Wiering,Joshua Veltman, Igor Karaca, Natasa Kara, Paul vonHippel (von Hippel 1998), Stefan Morent (Morent2000), Matthew Royal (Huron and Royal 1996), andDenis Collins (Collins and Huron, in press).

Further thanks are due to Keith Orpen, KeithMashinter, Bill Thompson (Thompson and Stainton1995–96), Jasba Simpson (Simpson and Huron1993), Jonathan Wild (Wild 1996), Randall Howard,

Simon Clift, Maki Ishizaki, Bo Alphonce, BrucePennycook, David Wessel, Chris Chafe, Max Ma-thews, John Howard, Gregory Sandell, Perry Ro-land, and Jordi Martin. I am especially indebted tomy former research assistants Tim Racinsky andKyle Dawkins for their professional programmingefforts.

References

Aarden, B., and D. Huron. 2001. ‘‘Mapping EuropeanFolksong: Geographical Localization of Musical Fea-tures.’’ Computing in Musicology 12:169–183.

Collins, D., and D. Huron. In press. ‘‘Voice-leading inCantus Firmus-Based Canonic Composition: A Com-parison Between Theory and Practice in Renaissanceand Baroque Music.’’ Computers in Music Research.

Erickson, R. 1976. ‘‘DARMS: A Reference Manual.’’Binghamton, NY: typescript.

Good, M. 2000. ‘‘MusicXML for Notation and Analysis.’’Computing in Musicology 12:113–124.

Hall, P., and G. Dowling. 1980. ‘‘Approximate StringMatching.’’ ACM Computing Surveys 12:381–402.

Hall, T. 1997. ‘‘DARMS: The A-R Dialect.’’ In E.Selfridge-Field, ed. Beyond MIDI: The Handbook ofMusical Codes. Cambridge, Massachusetts: MIT Press,pp. 573–580.

Hewlett, W. B. 1997. ‘‘MuseData: A Multipurpose Repre-sentation.’’ In E. Selfridge-Field, ed. Beyond MIDI: TheHandbook of Musical Codes. Cambridge, Massachu-setts: MIT Press, pp. 402–447.

Howard, J. 1997. ‘‘Plaine and Easie Code: A Code for Mu-sic Bibliography.’’ In E. Selfridge-Field, ed. BeyondMIDI: The Handbook of Musical Codes. Cambridge,Massachusetts: MIT Press, pp. 343–361.

Huron, D. 1992. ‘‘Design principles in computer-basedmusic representation.’’ In A. Marsden and A. Pople,eds. Computer Representations and Models in Music.London: Academic Press, pp. 5–59.

Huron, D. 1995. The Humdrum Toolkit: Reference Man-ual. Stanford, California: Center for Computer As-sisted Research in the Humanities.

Huron, D. 1997. ‘‘Humdrum and Kern: Selective FeatureEncoding.’’ In E. Selfridge-Field, ed. Beyond MIDI: TheHandbook of Musical Codes. Cambridge, Massachu-setts: MIT Press, pp. 375–401.

Huron, D. 1999. Music Research Using Humdrum: AUser’s Guide.


Huron, D., and M. Royal. 1996. ‘‘What is Melodic Ac-cent? Converging Evidence from Musical Practice.’’Music Perception 13(4):498–516.

Kornstadt, A. 1996. ‘‘SCORE-to-Humdrum: A GraphicalEnvironment for Musicological Analysis.’’ Computingin Musicology 10:105–122.

Kornstadt, A. 1998. ‘‘Themefinder: A Web-Based MelodicSearch Tool.’’ Computing in Musicology 11:231–236.

Morent, S. 2000. ‘‘Representing a Medieval Repertoryand its Sources: The Music of Hildegard von Bingen.’’Computing in Musicology 12:19–33.

Orpen, K., and D. Huron. 1992. ‘‘Measurement of Simi-larity in Music: A quantitative Approach for Non-Parametric Representations.’’ Computers in MusicResearch 4:1–44.

Schaffrath, H. 1995. The Essen Folksong Collection.Stanford, California: Center for Computers AssistedResearch in the Humanities.

Simpson, J., and D. Huron. 1993. ‘‘The Perception ofRhythmic Similarity: A Test of a Modified Versionof Johson-Laird’s Theory.’’ Canadian Acoustics 21(3):89–90.

Smith, L. 1972. ‘‘SCORE : A Musician’s Approach to

Computer Music.’’ Journal of the Audio EngineeringSociety 20:7–14.

Taylor, W. M. 1996. Humdrum Graphical User Inter-face. MA Thesis, Music Technology, Queen’s Univer-sity of Belfast, Ireland.

Temperley, D. (2001). The Cognition of Basic MusicalStructures. Cambridge, Massachusetts: MIT Press.

Thompson, W.F., and M. Stainton. 1995–96. ‘‘UsingHumdrum to Analyze Melodic Structure: An Assess-ment of Narmour’s Implication-Realization Model.’’Computing in Musicology 10:24–33.

Vercoe, B. 1986. Csound: A Manual for the Audio Pro-cessing System and Supporting Programs with Tutori-als. Cambridge, Massachusetts: MIT Media Lab.

von Hippel, P. 1998. 42 Ojibway Folksongs in the Hum-drum **kern Representation: Electronic Transcrip-tions from the Densmore Collections. Stanford,California: Center for Computer Assisted Research inthe Humanities.

Wild, J. 1996. ‘‘A Review of the Humdrum Toolkit:UNIX Tools for Musical Research, Created by DavidHuron.’’ Music Theory Online 2(7). Available online atwww.societymusictheory.org/mto/.

david huron music information processing using the...

Documents