
Bryan Pardo and William P. Birmingham
Artificial Intelligence Laboratory
Electrical Engineering and Computer Science Department
The University of Michigan
110 ATL Building, 1101 Beal Ave.
Ann Arbor, MI 48109-2110, USA
[email protected]
[email protected]

Algorithms for Chordal Analysis

Computer Music Journal, 26:2, pp. 27–49, Summer 2002
© 2002 Massachusetts Institute of Technology.

The harmonic analysis of tonal music by computer is an important area of interest in the computer music research community. While the problem is interesting in its own right, the ability to parse and use chords and harmonies in a tonal composition also adds an important dimension to a computer agent's ability to manipulate musical material. Automated arranging, accompaniment, and phrasing are all aided by an understanding of the chordal structure of a piece.

Although musicians analyze tonal music of all types, it is difficult to create algorithms that have this generality. In this article, we describe a set of algorithms for harmonic analysis that make minimal use of stylistic and contextual cues, such as metric strength, harmonic context, and known stylistic constraints. The system described in this article is purposefully simple in its approach. This allows us to specify its workings to the point where the work may be duplicated and extended by others. The system is designed both to explore basic computational properties of algorithms to solve the task and to provide a baseline against which other systems can be measured. In this way, it is possible to measure the change in performance introduced by use of such things as tonal context, voice-leading rules, and metrical information.

We divide harmonic analysis into two tasks. Segmentation is the task of splitting the music into appropriate chunks (segments) for analysis. As a default, we assume segmentation takes place in the time dimension. Labeling is the task of giving each segment the proper quality and root pitch. Figure 1 shows a measure of music partitioned into labeled segments.

The algorithms we describe in this article perform both segmentation and labeling. Previous research in the area of automated harmonic analysis of music (Winograd 1968; Smoliar 1980; Laden and Keefe 1989; Ebcioglu 1992; Maxwell 1992; Widmer 1992; Smaill et al. 1993; Taube 1999), with the notable exception of recent work by Temperley and Sleator (1999), has either avoided the issue of segmentation by taking already segmented input, or has been unclear about how segmentation is done. Temperley and Sleator describe a clear and efficient approach.

While Temperley and Sleator do address the segmentation problem, they only return the root name (rather than the chord quality) and do not give any statistical analysis of the performance of their system when compared to an outside measure. To our knowledge, no researchers have published statistical performance results on a system designed to analyze tonal music. The lack of such measures in the literature makes comparisons among systems difficult.

We present an empirical evaluation of the algorithms in this article, using a corpus of 45 excerpts of tonal music compiled by David Temperley from the Kostka and Payne music theory textbook (Kostka and Payne 1984), hereafter referred to as the KP corpus. (The appendix lists each piece used in this corpus.) The analyses provided by our system are compared to an answer key based on the analyses in the teacher's edition of the textbook, using a grading metric described later in this article. Both statistical results and analyses of individual excerpts are provided, and classes of error are tabulated and reported.

Segment Labeling

We define two major tasks in harmonic analysis: segmentation (i.e., identifying those places in the music where the harmony changes) and labeling (i.e., giving each segment the proper quality and root).

Page 2: BryanPardoandWilliamP.Birmingham Algorithmsfor ChordalAnalysis

28 Computer Music Journal

Table 1. Pitch class number and name correspondences

 0: C
 1: C-sharp / D-flat
 2: D
 3: D-sharp / E-flat
 4: E
 5: F
 6: F-sharp / G-flat
 7: G
 8: G-sharp / A-flat
 9: A
10: A-sharp / B-flat
11: B

If the right segmentation has already been determined, labels must still be determined. This section describes an approach to labeling segments that compares the collection of notes in a segment to templates describing common chords.

This approach to labeling a segment uses a fixed number of steps irrespective of the number of notes in the set and has the benefit of generating a numerical measure of proximity to the nearest template in the system's vocabulary. These two properties have important ramifications that allow use of the segment-labeling algorithm as a key component in the segmentation process described in a later section. However, for the purposes of describing segment labeling here, we assume segmented input.

The chromatic scale and its twelve pitch classes are basic elements associated with most tonal music. The common labels for the pitch classes, along with their integer equivalents, are given in Table 1. Using the integer representations of the pitch classes and modulo-12 arithmetic, structures such as chords and scales can be represented as n-tuples. These tuples correspond to positive displacements in the space of pitch classes in relation to a root pitch class. The tuples imply templates that we use to find musical structures. These templates are closely related to sets used in atonal set theory (Forte 1973), the chromagram (Wakefield 1999), and the work of Ulrich (1977).

An example of the template representations is the following. Given a root (pitch) class r, the tuple ⟨0, 4, 7⟩ represents the pitch-class relations to r embodied in a major triad. Letting r = 7 results in a chord given by mod12(r + 0, r + 4, r + 7) = [7, 11, 2]. From Table 1, it is easy to verify that this corresponds to [G, B, D], the pitch classes in the G major triad.
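To make this arithmetic concrete, the following short sketch (our own illustration, not code from the original system) transposes a template class to a chosen root with modulo-12 arithmetic and reproduces the G major example above.

```python
# Illustrative sketch (not the authors' code): placing a template class on a root
# using modulo-12 pitch-class arithmetic.
PITCH_NAMES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
               'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']

def transpose_template(template_class, root):
    """Return the set of pitch classes obtained by placing template_class on root."""
    return {(root + interval) % 12 for interval in template_class}

major_triad = (0, 4, 7)
g_major = transpose_template(major_triad, 7)          # root r = 7 (G)
print(sorted(g_major))                                # [2, 7, 11]
print([PITCH_NAMES[pc] for pc in sorted(g_major)])    # ['D', 'G', 'B']
```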

The system described in this article used a set of 72 templates derived from six template classes. Each class has twelve member templates, one for each root pitch class. These are generated in a manner identical to that of the example G major triad described in the previous paragraph. Table 2 lists the template classes used in the experiments reported in this article. Chordal template categories were selected based on their frequency of occurrence in the harmonic analyses in the answer key to the KP corpus of 45 musical excerpts. All chord qualities accounting for at least 2% (when rounded) of the chord labels in the corpus were used. This was done to limit the negative impact of extremely rare chord templates on the performance of the system.

It is simple to add new templates. Note that nothing in this approach limits templates to triadic structures. Pentatonic scales and quartal constructions, for instance, can be identified simply by introducing a template for the structure in question and establishing a rule for resolving ties between the templates when more than one is applicable. It is also important to note that templates are not heuristic: the pitch relationships described in the template must hold for the tonal structure it models, by definition of the structures.

Before describing the scoring mechanism for determining which template best matches a collection of notes, we must introduce some terminology. Harmonic change can only occur when at least one note begins or ends. A partition point occurs where the set of pitches currently sounding in the music changes by the onset or offset of one or more notes.

Figure 1. Music partitioned into labeled segments (C major, G7, C major).



A segment is a contiguous interval between two partition points. A minimal segment is the interval between two sequential partition points. Figure 2 illustrates these concepts.

Given a segment containing a collection of notes, the score for a particular template is calculated from the steps in Figure 3.

As an example, let us calculate the score for an A minor triad template on the segment comprising the first two beats of the measure in Figure 2. The template for the A minor triad is found by adding 9 (the pitch class of A) to the values in the template class for a minor triad, ⟨0, 3, 7⟩, and taking modulo 12 of the resulting values. This results in the template ⟨9, 0, 4⟩, or [A, C, E]. We may now follow the steps in Figure 3.

There are five notes in the first segment, [C4, E4, C5, D5, E5], whose weights are 3, 3, 1, 1, and 1, respectively. Each C and E in the segment matches a template element, so we add their weights:

P = Weight(C4) + Weight(E4) + Weight(C5) + Weight(E5) = 3 + 3 + 1 + 1 = 8

The pitch D5 is the only one that does not match any template element, so we have

N = Weight(D5) = 1

One template element (for the pitch A) was not found in the segment. To summarize, we have

M = 1
S = P – (M + N) = 8 – (1 + 1) = 6

Thus, the final score for the A minor triad template of this segment is 6.
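The worked example can be stated compactly in code. The sketch below is our own illustration of the scoring rule S = P – (M + N); it assumes the segment is summarized as a mapping from pitch class to total note weight, which is not necessarily how the original system stores its data.

```python
# Illustrative sketch of the template score S = P - (M + N):
#   P: summed weight of notes that match some template element,
#   N: summed weight of notes that match no template element,
#   M: number of template elements with no matching note in the segment.
def template_score(segment_weights, template):
    """segment_weights: {pitch_class: total note weight}; template: set of pitch classes."""
    P = sum(w for pc, w in segment_weights.items() if pc in template)
    N = sum(w for pc, w in segment_weights.items() if pc not in template)
    M = sum(1 for pc in template if segment_weights.get(pc, 0) == 0)
    return P - (M + N)

# First two beats of Figure 2: C4 and E4 (weight 3 each), C5, D5, E5 (weight 1 each)
segment = {0: 3 + 1, 4: 3 + 1, 2: 1}                 # pitch classes C, E, D
print(template_score(segment, {9, 0, 4}))            # A minor triad -> 6
print(template_score(segment, {0, 4, 7}))            # C major triad -> 6 (the tie discussed below)
```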

Table 2. The template classes used by our system

Chord Name              Proportion of KP Corpus   Template Representation
Major Triad             43.6%                     ⟨0 4 7⟩
Dom7 (Major-Minor)      21.9%                     ⟨0 4 7 10⟩
Minor Triad             19.4%                     ⟨0 3 7⟩
Fully Diminished 7th     4.4%                     ⟨0 3 6 9⟩
Half Diminished 7th      3.7%                     ⟨0 3 6 10⟩
Diminished Triad         1.8%                     ⟨0 3 6⟩

Figure 2. Segments and Partition Points.

Figure 3. Calculating the score of a template.


In this scoring mechanism, the number of minimal segments a note spans determines its weight. Note that this does not directly correlate to the absolute duration of the note or to its notated rhythmic value. Rather, it correlates to the number of changes in the music the note overlaps. Thus, a quarter note in the first beat of Figure 2 spans a single minimal segment, as do each of the eighth notes in the following beat.

One could imagine any number of note-weighting methods involving actual duration, volume (encoded as MIDI velocity), weighting the highest or lowest note differently, etc. We tried a number of ad hoc approaches to note weight involving these parameters but found that the approach based on minimal segments worked surprisingly well. Furthermore, this corresponds to our design philosophy, which emphasizes the use of the simplest approach wherever possible.

To label a segment, all 72 templates are scored, and the highest-scoring template determines the segment label. When this is done for the first two beats of the measure in Figure 2, the C major triad ties with the A minor triad, and each receives a score of 6. If two templates generate the same score, they form an equivalence class, and the winner may be selected using any tie-resolution rule. Some tie-breaking rules we used with success in our system are shown in Figure 4 and are applied in the order in which they appear there. For our example, the first tie-breaking rule causes the segment to be labeled C major, since the pitch weight for C in the segment is 4, while that for A is 0.
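A full labeling pass then amounts to scoring all 72 templates and keeping the best. The sketch below (our own illustration, reusing template_score from the previous sketch) implements only the first tie-breaking rule, preferring the tied candidate whose root pitch class carries the most weight in the segment.

```python
# Illustrative labeling sketch: score every (quality, root) template from Table 2
# and break ties with rule 1 (root weight) only.
TEMPLATE_CLASSES = {
    'major': (0, 4, 7),
    'dominant 7th': (0, 4, 7, 10),
    'minor': (0, 3, 7),
    'fully diminished 7th': (0, 3, 6, 9),
    'half diminished 7th': (0, 3, 6, 10),
    'diminished': (0, 3, 6),
}

def label_segment(segment_weights):
    candidates = []
    for quality, intervals in TEMPLATE_CLASSES.items():
        for root in range(12):
            template = {(root + i) % 12 for i in intervals}
            candidates.append((template_score(segment_weights, template), root, quality))
    best = max(score for score, _, _ in candidates)
    tied = [(root, quality) for score, root, quality in candidates if score == best]
    # Rule 1: among tied candidates, prefer the root with the greatest segment weight.
    return max(tied, key=lambda rq: segment_weights.get(rq[0], 0))

print(label_segment({0: 4, 4: 4, 2: 1}))   # (0, 'major'): C major, as in the example above
```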

The first two tie-breaking rules in Figure 4 are examples of context-free tie resolution. The third rule performs contextual tie breaking. These rules are not intended to be an exhaustive list; any number of rules may be envisioned involving tonal context, position of the lowest note, etc. A later section of this article explores the effect these rules have on system performance.

Table 3 shows the label for every segment in Figure 2, using the templates, scoring, and tie-breaking rules described in this section. The vertical key determines the index number of the starting partition point for the segment. The horizontal key determines the ending partition point for the segment. The number beneath each chord label is the score that template received on the set of notes in the given segment. For example, the segment ⟨4, 6⟩ consists of the last two beats of the measure. The label best matching these two beats considered as an aggregate is G7, and it receives two points.

The approach described in this section labels and scores each segment in constant time, given that the note weights for each segment have already been determined.

The minimal segments form the main diagonal of Table 3. Calculating the note weight for the minimal segment from MIDI input may be done in linear time with respect to the number of notes in a piece by maintaining a 12-element array, indexed by pitch-class number (0 through 11), representing the state of the music. Each element i gives the count of notes of pitch class i currently sounding. The array is updated as MIDI data are received, adding one to the appropriate element for a "note on" and subtracting one for a "note off." When a non-zero MIDI delta-time is encountered, this indicates the beginning of a new minimal segment, and the state is saved as a minimal segment before updating it with the new note event.
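The running-state idea can be sketched as follows. This is our own illustration, assuming the input has already been flattened into (delta_time, pitch, note_on) triples; the original system's MIDI parsing may differ.

```python
# Illustrative sketch: derive one 12-element pitch-class state per minimal segment
# from a stream of MIDI-like events given as (delta_time, pitch, note_on).
def minimal_segments(events):
    sounding = [0] * 12       # count of currently sounding notes per pitch class
    segments = []             # one snapshot of `sounding` per minimal segment
    for delta_time, pitch, note_on in events:
        if delta_time > 0:    # a non-zero delta closes the current minimal segment
            segments.append(list(sounding))
        sounding[pitch % 12] += 1 if note_on else -1
    return segments

# Hypothetical input: a C major chord followed by a G major chord, 480 ticks each
events = [(0, 60, True), (0, 64, True), (0, 67, True),
          (480, 60, False), (0, 64, False), (0, 67, False),
          (0, 55, True), (0, 59, True), (0, 62, True),
          (480, 55, False), (0, 59, False), (0, 62, False)]
print(minimal_segments(events))   # two minimal segments: C-E-G, then G-B-D
```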

The time required to generate the set of minimal segments is linear with respect to the number of MIDI events in the input. This can be seen by noting that a fixed number of steps is taken for any "note on" or "note off" event, and each of these steps requires a fixed amount of time.

Figure 4. Tie breaking between templates.


Table 3. Segment scores for Figure 2

Start \ End   2            3            4            5            6
1             C Major (2)  C Major (3)  C Major (6)  C Major (6)  C Major (9)
2                          C Major (0)  C Major (3)  C Major (3)  C Major (6)
3                                       C Major (2)  C Major (2)  C Major (5)
4                                                    G7 (2)       G7 (2)
5                                                                 C Major (3)

Thus, the bound on the complexity is O(n) (i.e., linear with respect to n), where n is the number of notes in the music.

Once the note weights for minimal segments have been computed, all other segment weights may be derived from them, taking a fixed number of steps per segment. This may be done recursively by noting that the weight of segment ⟨i, j⟩ is given by the sum of the weight of segment ⟨i, j–1⟩ and the weight of segment ⟨i + 1, j⟩. (Pardo and Birmingham (2001) offer a detailed description of how to construct the full table of segments.) Now we turn to an analysis of the efficacy of the labeling approach described in this section.

Empirical Testing of the Segment-Labeling Algorithm

The segment-labeling algorithm is simple, considering only structures based on pitch class and relative durations of notes. No notion of tonal context, beat, absolute pitch height, or dynamics enters into the calculation. How well, then, does such an algorithm do in labeling segments on typical examples of tonal music?

Before presenting our experiment results, we describe the method we used to grade the labeling algorithm. Figure 5 shows grade calculation for our labeling system on a two-bar passage. The correct answer and the program's analysis are shown for each segment. The system receives one point for each minimal segment where its label exactly matches the answer key. If the program returns multiple answers and the correct answer is among them, the point is divided by the number of guesses taken. Segments containing answers not available to the system (such as "German 6th" or "rest") are not graded. These are indicated by a hyphen on the "Points" line.

This is a non-forgiving grading measure, as can be seen from Figure 5. The only way to achieve a perfect score is to label every minimal segment correctly and unambiguously. In the example, the system did not apply label tie-breaking and was unable to distinguish between C major and C minor in the third beat of the first measure, costing it roughly 6% of the final grade. The grading measure is also unforgiving of issues such as labeling a major triad with a passing seventh as a dominant seventh chord, as happens in the final beat of the first measure. We considered a scoring measure that would be more forgiving of such errors but decided that the clarity of this grading measure outweighs its drawbacks.
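In code, the grading rule might look like the following sketch (our own illustration with hypothetical data structures): one point per graded minimal segment, split across multiple returned labels, with ungradable segments excluded from the total.

```python
# Illustrative sketch of the grading metric described above.
def grade(system_labels, key_labels, ungradable=('German 6th', 'rest')):
    """system_labels: one set of returned labels per minimal segment;
       key_labels: one answer-key label per minimal segment."""
    points, graded = 0.0, 0
    for guesses, answer in zip(system_labels, key_labels):
        if answer in ungradable:
            continue                      # marked with a hyphen on the "Points" line
        graded += 1
        if answer in guesses:
            points += 1.0 / len(guesses)  # split the point across tied labels
    return 100.0 * points / graded if graded else 0.0

print(grade([{'C major'}, {'C major', 'C minor'}, {'G7'}],
            ['C major', 'C major', 'G major']))   # 50.0
```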

Using this grading method, we compared the chord labels assigned by our labeling method with those given in the answer key to the KP corpus. The corpus gives chord labels using standard Roman numeral notation; these were translated by the first author to a format compatible with our labeling system. For example, C: I – viio6 – I6 would become C major – B dim – C major.

For each excerpt, the segmentation was determined by the positions of the chord labels in the answer key, with one segment per label, beginning where the label first appears and continuing until the next label appears.


Table 4. Mean label scores by tie-breaking regime

Tie-Breaking Strategy        Mean % of Piece with Multiple Labels   Mean Score %   Num Cases   Std Dev of Mean Score
No tie breaking              9.45                                   84.54          45          13.00
Rule 1: Root Weight          4.27                                   85.65          45          11.77
Rule 2: Prior Probability    7.94                                   86.12          45          11.44
Rule 3: Dim 7 Resolution     7.53                                   86.11          45          13.03
Rules 1 and 3                2.22                                   86.97          45          10.83
Rules 1 and 2                3.74                                   87.22          45          11.78
Rules 2 and 3                6.03                                   87.80          45          11.28
Rules 1 and 2 and 3          1.68                                   88.66          45          10.72

The note weights for each segment in an excerpt were passed to a function encoding the labeling method outlined in this article, and the label (or labels) returned was recorded. Once labels for all segments were determined, the resulting labeling was scored against the answer key using the method previously described.

To explore the effects of the three tie-breaking rules in Figure 4, each piece in the corpus was labeled using eight methods: no tie breaking, each rule in isolation, every pair of rules, and all three of these. This resulted in a total of eight trials per piece. Table 4 shows mean grades for the entire corpus, broken down by tie-breaking regime.

As can be seen from the table, the algorithm without tie-breaking rules returns multiple labels for an average of over 9% of the minimal segments in a piece. When all three rules are applied, this figure drops below 2%. This improves the mean grade from below 85% correct without tie breaking to nearly 89% correct with the use of all three tie-breaking rules.

Figure 6 compares the labeling algorithm's performance using all three tie-breaking rules against its performance without the use of tie breaking. The horizontal line is the mean of all cases shown in the figure. All trials for a single excerpt are aligned vertically along a spike line from the mean line.

As the figure shows, the number of perfect scores increases from six (with no tie breaking) to 13 (using all three rules). While this is a great improvement, the labeling method continues to return sub-optimal scores on 32 of the 45 excerpts. In addition, tie breaking actually lowered the labeling score on excerpts 2, 16, and 20.

Figure 5. Grading a labeling.



In these cases, a tie-breaking rule eliminated a correct template from consideration, lowering the final grade.

Figure 5 shows how this can occur. The third beat of measure 1 is missing the third of the chord, and two templates receive equal scores. If the tie-breaking rules from Figure 4 are applied, the system will select "C major" based on the relative frequency of major versus minor chords in the corpus, as shown in Table 2. In the case of this example, this gives the correct result; however, had the passage been in a minor key, "C major" would have still been selected, and the result would have been incorrect.

It is instructive to look at individual labeling errors to determine their causes. The system made the most errors on Excerpt 1, the anonymous Minuet in G from the Notebook for Anna Magdalena Bach. Figure 7 shows this excerpt, with the answer key at the top of each system and the answers generated by the algorithm after tie breaking shown at the bottom of each system. Labels remain in effect from the point where they appear until the next label is encountered. Errors are boxed and labeled.

Figure 7 illustrates several types of errors. The worst error can be found in the third and eleventh measures. These measures are identical, and both are mislabeled with a very unlikely chord name, given the tonal context. The penultimate measure shows a place where there are exactly as many notes in a G major triad as in a D major triad. The system is unable to resolve the tie and returns both answers. Adding extra weight to the bass note or ruling out obvious passing tones from consideration could have resolved the errors in all three of these cases.

The remaining errors on the excerpt are better in the sense that a human might choose to label things as the system has. The KP corpus answer key labels notes appearing to be a second inversion V (or V7) chord between two I chords as a viio6 chord. Thus, measures five and thirteen are labeled F-sharp diminished. Our system returns the label of the V7 chord (i.e., D7) for these measures. Both choices serve the same functional role in this context. In the eighth measure, the system includes the C in the chord, resulting in a D7 label, while the answer key calls C a passing tone, resulting in a label of D major. Again, the same functional role is served by either chord.

To gain a better understanding of what causes the remaining errors when full tie breaking is enabled, each labeling error on every excerpt in the corpus was examined and categorized, resulting in the following classes.

Miscellaneous passing tone: An error was caused by the system's inability to discount passing tones from consideration.

Bad tiebreak: A tie-breaking rule eliminated the correct answer.

Disagree with answer key: We believe the answer key has an incorrect chord label.

Figure 6. Labeling Grades by Excerpt.


Harmonic context required: An incomplete chord, which can only be uniquely identified through tonal function in context, is misidentified.

I – viio6 – I6: The KP corpus answer key labels notes that appear to be a second inversion V (or V7) chord between two I chords as a viio6 chord.

Performance variation: A timing variation in performance caused notes in the MIDI file to overlap in different combinations than indicated in the score, affecting the program's interpretation.

Miscellaneous unresolved tie: Miscellaneous tie between chord labels.

Pedal tone: The system incorrectly classifies a chord by including a pedal tone.

Passing tone called seventh by algorithm: The program labels a triad with a passing seventh in the answer key as a seventh chord.

Unresolved fully diminished tie: A fully diminished seventh chord does not resolve by step (e.g., a common-tone diminished chord), so our tie-resolution rule does not resolve the tie between the four possible roots for the chord.

These categories do not represent a serious attempt to develop an error taxonomy; rather, they are natural categories resulting from hand analysis of the data. Thus, some classes represent relatively general categories and others (such as "I – viio6 – I6") represent specific, frequently occurring errors. Error frequencies as a percentage of the total are shown in Figure 8.

Figure 8 suggests certain things about the labeling system. Perfect tie breaking can be expected to eliminate three of the error classes ("bad tiebreak," "miscellaneous unresolved tie," and "unresolved fully diminished tie"), for a total of 26% of the errors. This is the maximum improvement that can be expected from improved tie breaking.

Specific idiomatic errors, such as "I – viio6 – I6," may be resolved by using a two-pass system. The first pass would generate the initial chord names. The second pass could look for sequences of chord names that have the functional form "I – V4/3 – I6" and replace the middle chord with a diminished chord.

Figure 7. Excerpt 1, anonymous Minuet in G, from the Notebook for Anna Magdalena Bach.


System performance might be further improved by adding beat information to the note weights. This could reduce the "miscellaneous passing tone" and "pedal tone" errors, although heavily syncopated music would still present a problem.

The current labeling system only considers harmonic context through the diminished chord tie-breaking resolution rule. This does not affect the raw label scores. Knowledge of adjacent chord labels might be used to adjust chord-label weights. This introduces a possible problem, as chord weights across the piece change and influence each other, making the calculation of chord labels much more difficult.

Passages where the error class "passing tone called seventh by program" arises would probably coincide with disagreement among human analysts. This problem class might perhaps be resolved through a deep understanding of structural voice leading.

This completes our description of the labeling system, its performance, error classes, and directions for future improvements. We now turn to the issue of generating a good segmentation of the music.

Computational Issues in Segmentation

Segmentation can be a difficult problem, as we describe in this section. The nature of the problem is important to understand, as it guided the design of the algorithms we describe in this article and affects all algorithms that perform segmentation.

We begin with the simplest case: block chords. With block chords, where notes start and end simultaneously, the harmonic changes are obvious, and so harmonic analysis is relatively straightforward. Music, however, is not always this simple: chords are often arpeggiated, incompletely stated, and interspersed with non-harmonic tones (often stating the melody). All of these things confound the analysis task.

Harmonic change can only occur where notes begin or end. A point of possible harmonic change is called a partition point, where the set of pitches currently sounding in the music changes by the onset or offset of one or more notes. We refer to the ordered set (start to end) of all partition points for a piece of music as P_all. The size of P_all is denoted as p. A segment is the interval between partition points. A minimal segment is the interval between two sequential partition points. Figure 2 illustrates these concepts.

Figure 8. Error Class Frequencies on the KP Corpus.


A segmentation of a piece is a subset of P_all containing at a minimum the first and last elements of P_all. A binary number uniquely labels any segmentation of a piece of music: the ith bit of the number is 1 if the ith partition point in P_all is in the segmentation; otherwise, the bit is 0. The number of partition points in Figure 2, denoted by p, is six. The figure shows a segmentation of the music into segments labeled "C major," "G7," and "C major," respectively. The six-digit number 100111 represents this segmentation.
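For instance (our own illustration), the binary label can be read off directly from the chosen partition points:

```python
# Encode a segmentation as a binary number: bit i is 1 iff partition point i is used.
def encode(chosen_points, p):
    return ''.join('1' if i in chosen_points else '0' for i in range(p))

# Partition points 1, 4, 5, and 6 of Figure 2 (0-indexed here as 0, 3, 4, 5)
print(encode({0, 3, 4, 5}, 6))   # '100111'
```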

Because a segmentation is uniquely labeled with a binary number of length p, and the initial and final bits of this number are always 1, there are 2^(p–2) ways to segment any piece of music. The number of partition points is limited to a maximum of twice the number of notes (one point for each note beginning and each note end). Thus, a piece with notes that are staggered so that each note onset and offset is different from any other note onset or offset will have 2^(p–2) = 2^(2n–2) segmentations. A single block chord, where all notes begin and end simultaneously, will have 2^(2–2) = 2^0 = 1 segmentation.

Although the number of partition points varies greatly from piece to piece, even a short piece of typical piano music will have hundreds of partition points and thus a huge number of segmentations. Bach's Sinfonia No. 1, a 21-measure solo keyboard piece, has 341 partition points and thus 2^339, or roughly 10^102, segmentations.

Clearly, an exhaustive exploration of possible segmentations is impractical for all but the smallest pieces. Thus, any system that performs harmonic analysis must make assumptions to reduce the size of the search problem for a typical piece of music. In particular, the assumption that the task of labeling a single segment has limited context sensitivity greatly reduces the complexity of the task by allowing the reuse of sub-solutions between segmentations. In other words, if one can assume that the analysis of the first measure of a piece is not affected by the analysis of the 437th measure, comparing two segmentations that differ only in the 437th measure will not require two separate analyses of the first measure (and every measure between the first and 437th). This greatly reduces the overall amount of work. The framework we propose in this article limits context sensitivity in order to constrain the search problem.

Dynamic programming is a search implementation approach designed to reuse sub-solutions. Temperley and Sleator (1999) describe a search based on this approach for finding the best sequence of root labels of a tonal composition. Labels for minimal segments are given numeric scores in a context-independent manner (allowing reuse of sub-solutions). A start-to-finish search for the best path through the labels is performed by adding the score of the best path through the previous segment to the score of the current segment's labels. Label changes between segments receive a numeric penalty. When the end of the piece is reached, the highest path score corresponds to an optimal labeling, given their scoring measure.

The approach of Temperley and Sleator can be described as a way to solve the optimization problem of finding a highest-scoring labeling of the chord roots for a piece of music. This approach served as an inspiration for our own work, and we, too, have framed the problem as one of numeric optimization and search for the highest-scoring labeling for a piece of music. The approach we have taken, however, has led to a different formalization of the problem: that of segmentation as graph search. This formalization has led to two distinct algorithms for solving the problem (one describable as dynamic programming), detailed in the following sections.

Segmentation as Graph Search

Finding the best segmentation requires a metric for assessing the "goodness" of a segmentation that allows direct comparison with other segmentations. We do this by generating scores for the individual segments constituting a segmentation. The score of a segmentation is given by the sum of the scores of its component segments. Segmentations can then be directly compared and the best one selected.

For the purpose of this analysis, we assume that individual segments may be scored without reference to their context. The segment-labeling method described in this article has this property.


Table 3 shows the best-scoring label for each segment in Figure 2, along with the label's score. As the table shows, segments containing two or more chords tend to get low scores. Segments containing notes belonging to a single chord tend to get high scores. The score for an individual segment should correspond to how much sense it makes to select that segment as a unit of analysis for finding a chord label. Thus, the score of the high-scoring label for a segment serves as a reasonable score for that segment. For the remainder of this article, we assume the use of segment scores based on best label scores.

Once a segment is scored in the course of labeling a segmentation, its label and score may be reused in the course of labeling another segmentation. The search algorithm can then build the highest-scoring segmentation out of individually scored segments.

Assuming scored segments, we can pose the segmentation problem as a search for the highest-reward path through a directed acyclic graph (DAG). Two advantages of using this representation are that the search method is intuitive and there are known optimal algorithms we can use. Figure 9 describes the steps taken to create the DAG.

In this DAG, each vertex represents a partition point, and each edge represents one segment. The source vertex represents the initial partition point, and the terminal vertex is the final partition point. Any path from the first vertex to the last one represents a segmentation of the piece. The problem then becomes one of finding the highest-scoring path from the start to the end vertex.

Figure 10 shows a short passage of music represented as a DAG. Partition points (vertices) are labeled with small digits. Two different segmentations are represented on the graph. The path along the gray edges represents the segmentation whose scores are shown in gray. The path along the black edges corresponds to the segmentation scored in black. Large digits of the appropriate color indicate segment scores.

Segmentation Using the Relaxation Algorithm

Posing the segmentation problem as graph search lets us take advantage of the Relaxation algorithm (Cormen et al. 1990), which is guaranteed to find the best path through a topologically sorted DAG in O(E) steps, where E is the number of edges. Because the creation of a DAG from a piece of music inherently sorts the vertices, we can use this algorithm to find the best segmentation. Figure 11 outlines a search algorithm based on the Relaxation algorithm.
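Because the vertices of this DAG are already in temporal order, the relaxation step reduces to a single left-to-right pass over the partition points. The sketch below is our own reconstruction in the spirit of Figure 11 (which is not reproduced in this text); it assumes a function segment_score(i, j) that returns the best-label score of segment ⟨i, j⟩.

```python
# Illustrative relaxation over the segmentation DAG. Vertices are partition points
# 0..p-1 in temporal order; edge (i, j) is segment <i, j> with reward segment_score(i, j).
def best_segmentation(p, segment_score):
    best = [float('-inf')] * p        # best path score from vertex 0 to each vertex
    back = [0] * p                    # predecessor on that best path
    best[0] = 0.0
    for j in range(1, p):             # relax every incoming edge of vertex j
        for i in range(j):
            candidate = best[i] + segment_score(i, j)
            if candidate > best[j]:
                best[j], back[j] = candidate, i
    boundaries, j = [p - 1], p - 1    # walk back-pointers to recover the segmentation
    while j != 0:
        j = back[j]
        boundaries.append(j)
    return best[p - 1], boundaries[::-1]

# Hypothetical sparse scores matching the winning path of Figure 10 (0-indexed points)
scores = {(0, 3): 6, (3, 4): 2, (4, 5): 3}
print(best_segmentation(6, lambda i, j: scores.get((i, j), 0)))   # (11.0, [0, 3, 4, 5])
```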

From Figure 10, it is clear that the ith vertex (partition point) has i–1 edges (segments) connecting it to the vertices before it. Adding the incoming edges, a piece with p partition points has 0 + 1 + 2 + 3 + ... + (p – 1) ≈ p²/2 edges (segments). This can also be seen from the size of Table 3.

The number of partition points is limited to twice the number of notes (one point for each note beginning and each note end). This means a piece with n notes will never have more than roughly 2n² segments. Because the number of segments equals the number of edges in the DAG, this algorithm finds the best segmentation (assuming a fixed-time segment scoring function) in O(n²) steps.

Recall that Bach's Sinfonia No. 1 has 341 partition points, and thus 2^(341–2) = 2^339, or roughly 10^102, segmentations.

Figure 9. Creating a Directed Acyclic Graph (DAG) for the Segmentation Problem.


Using a Relaxation-based search with context-free, fixed-time segment labeling, the highest-scoring segmentation may be found in (341²)/2, or roughly 58,000, steps.
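Both of the magnitudes quoted for the Sinfonia are easy to verify:

```python
# Quick check of the figures quoted for Bach's Sinfonia No. 1 (p = 341 partition points)
from math import log10

p = 341
print(round(log10(2 ** (p - 2))))   # 102: about 10**102 possible segmentations
print(p * (p - 1) // 2)             # 57970: roughly 58,000 segments (edges) to score
```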

Figure 12 illustrates the use of Relaxation-based search to find the best segmentation of a measure of music. Each step in the figure corresponds to one iteration of the main loop in Figure 11, starting with the second partition point (since the first partition point has no predecessors). In this figure, black lines correspond to the segments being scored in the current step. The number accompanying each segment (edge) is the score of the best-matching chordal label for the segment. Gray lines correspond to the link from a vertex to the predecessor that generated the best overall path score to that vertex. Numbers in each vertex (partition point) correspond to the score of the highest-scoring path (segmentation) found from vertex S (the start of the music) up to that point. All segment scores were generated using the segment-labeling method described in another section of this article and correspond to those in Table 3.

Knuth and Plass (1981) formulated the problem of breaking a paragraph of words into optimal-length lines as a graph-search problem. The speed of their solution depends on the fact that the maximum allowed line length is known in advance. This limits the context to a relatively small number of breakpoints (usually no more than twenty), and it lets one force a break when a line has gone on too long (i.e., off the page). Since the problem restarts roughly every twenty breakpoints, they search a sequence of graphs, each with no more than twenty partition points. This translates into a search where the additional work caused by a new breakpoint is bounded by a fixed value (the size of a graph with twenty vertices).

Figure 10. Segmentation as a Directed Acyclic Graph (DAG).

Figure 11. Relaxation-based search algorithm.


Figure 12. Segmentation using full search.

Thus, the computational complexity of the search is O(n), where n is the number of breakpoints.

Unfortunately, one does not know the harmonic rhythm of a piece of music beforehand. It may be that the entire piece consists of a single chord or that it breaks into a thousand chords. The rate at which chords change can also vary drastically throughout the music. It is impossible to say that a chord can have a maximal length of, say, twenty segments. Thus, we are forced to do a full search on the graph of the entire piece if we are to find the optimal segmentation. Still, the idea of a fixed maximum search for each new partition point is a useful one that we employ in our real-time algorithm, the HarmAn algorithm.

The HarmAn Segmentation Algorithm

As stated previously, it is possible, given a good fixed-time segment-scoring method, to find the best segmentation of the music in O(n²) steps. While O(n²) steps is much better than the exponential number of steps required by exhaustive search, the time required to process a piece of music still increases quadratically as the number of notes increases linearly. This is not a desirable feature if one expects to use the analysis system for a real-time application, such as an interactive jazz accompanist. For this, it would be preferable to have a system that performs analysis in linear time, taking no more than a fixed number of steps per note. To this end, we have created the HarmAn algorithm for finding a good (although not necessarily the best) segmentation in linear time.


Figure 13 describes the HarmAn algorithm.

To use this algorithm for chord labeling in a real-time system, the selection of a vertex (step 3 in Figure 13) must be done in temporal order. Also, only the edges (segments) actually considered in the search should be scored and labeled. If real-time performance is not an issue, then vertices may be selected in reverse order or even random order, and the full set of edges may be scored and labeled in advance. Figure 14 shows an example segmentation performed with HarmAn search using the time-ordered selection of partition points.

In Figure 14, each step represents the consideration of a single vertex (partition point). Black edges represent newly scored segments. Gray edges are previously scored segments remaining in the graph. Dashed edges are not scored. Numbers in each vertex (partition point) correspond to the score of the highest-scoring path (segmentation) found from vertex S (the start of the music) up to that point. All scores for segments were generated using the segment-labeling method described in this article.

The HarmAn algorithm considers exactly three edges in the course of evaluating each partition point. The number of partition points is at most twice the number of notes (one for each note beginning and each note end). This results in a maximum of six segment evaluations per note. Thus, a piece with n notes may be analyzed in O(n) steps.
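As a rough illustration of the greedy idea, the sketch below processes partition points in temporal order and, at each new point, compares extending the currently open segment against closing it at the previous point. This is our own reconstruction of one rule consistent with the description above (at most three segment evaluations per partition point); the authors' exact decision rule is given in Figure 13, which is not reproduced in this text.

```python
# Hedged sketch of a greedy, time-ordered segmentation in the spirit of HarmAn.
def greedy_segmentation(p, segment_score):
    boundaries = [0]
    start = 0                                   # start of the currently open segment
    for j in range(2, p):                       # j is the newest partition point
        merged = segment_score(start, j)        # extend the open segment through j
        split = segment_score(start, j - 1) + segment_score(j - 1, j)
        if split > merged:                      # close the segment at j-1 and start a new one
            boundaries.append(j - 1)
            start = j - 1
    boundaries.append(p - 1)
    return boundaries

scores = {(0, 3): 6, (3, 4): 2, (4, 5): 3}      # same hypothetical scores as before
print(greedy_segmentation(6, lambda i, j: scores.get((i, j), 0)))   # [0, 3, 4, 5]
```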

Returning to the example of Bach's Sinfonia No. 1, recall that the piece has 341 partition points, resulting in roughly 10^102 segmentations. A full Relaxation-based search found the highest-scoring segmentation in (341²)/2, or roughly 58,000, steps. HarmAn search, using the same segment-evaluation method, found a path from start to finish (i.e., a segmentation of the piece) in (341 – 2) × 3 = 1,017 steps. The catch is that a HarmAn search, unlike a Relaxation search, does not explore all possible segmentations. How good, then, are the solutions HarmAn finds?

Evaluation of the HarmAn Search Heuristic

A perfect segment-scoring method would assign the highest scores and appropriate labels to the segments of music corresponding to the correct segmentation as determined by some outside measure (such as the harmonic analysis provided for an excerpt in a music theory textbook). We have already shown how a set of partition points and scored segments for a piece of music may be translated into a DAG. To analyze the HarmAn search heuristic, we use our chord-labeling method to score the segments, working under the assumption that this generates perfect segment scores.

Relaxation search is guaranteed to find the highest-scoring path through a DAG given a particular metric. The price for this guarantee is that the search considers an increasing number of edges (segments) for each new vertex (partition point) encountered in the graph. In the worst case, this results in a running time that is the square of the number of notes in the piece. Because we are interested in developing systems that can run in real time, HarmAn was conceived as a kind of "greedy" algorithm. That is, it picks the best local, short-term choice at every turn to work on the music in a start-to-finish way, using a fixed maximum number of steps per vertex.

The rationale behind this approach is that music unfolds in time from start to finish.

Figure 13. The HarmAn segmentation algorithm.


Figure 14. Segmentation with time-ordered HarmAn search. (In this example, HarmAn arrives at the same result as the full search.)

Composers generally create structures that can be understood with limited backtracking to previously heard passages. The expectation for what comes next is determined by what has just been heard. It is reasonable to assume that the set of likely segments under consideration by a human is greatly constrained by what has already transpired in the music. Thus, a "greedy" start-to-finish approach is a reasonable way to approximate the first hearing of a piece.

If such an approach is reasonable, and the segment-scoring method is a good one, then the following suppositions should also be true. First, segmentations generated by HarmAn search should generate scores very close to those generated by Relaxation search. Second, the order in which HarmAn considers the partition points (start-to-finish, finish-to-start, random, etc.) should strongly affect the resulting segmentation. Third, start-to-finish should be the best HarmAn search order through the partition points, and this should be reflected in higher-scoring segmentations.

In order to test these suppositions, we assembled a corpus of 32 MIDI performances of complete pieces of Western tonal piano music. These were Bach's 15 Three-Part Inventions, seven Bagatelles from Beethoven's Opus 33, and Chopin's Nocturnes 10 through 19.

A complete DAG was generated for each piece. The score for each edge in the graph was the score of the best-matching template for that segment, as described in the section on segment labeling.


Each piece was then segmented 43 times. One segmentation was generated by Relaxation search. One segmentation was generated by HarmAn's processing the partition points in start-to-finish order. Another segmentation was generated by HarmAn's processing the partition points in finish-to-start (reverse) order. Finally, 40 segmentations were generated by HarmAn's processing partition points in 40 different random orders. Figure 15 contains a "box and whisker" plot for each type of search. All trials for all 32 pieces in the corpus were included in the data for this figure. Thus, for Relaxation, HarmAn forward, and HarmAn backward, n = 32. For random, n = 32 × 40 = 1,280.

Each box has lines at the lower quartile, median, and upper quartile values. Median values for each search type are also identified by their numeric value. The whiskers are lines extending from each end of the box to show the extent of the rest of the data. "Outliers" are data with values beyond the ends of the whiskers. These "outliers" are indicated by the plus ("+") symbol.

Recall that, because Relaxation performs a full search of the segmentation graph, its score is always 100% of the possible score. Note that this may not correspond to generating perfectly labeled segmentations compared to an outside measure. A perfect score in this experiment indicates only that the high-scoring path through a DAG was found. If the segment-scoring algorithm is faulty, then this high-scoring path will not correspond to the best segmentation of a piece of music.

HarmAn search, when applied to the partition points in a start-to-finish (forward) manner, achieves segmentation scores with a median 98.7% of the scores generated by Relaxation search. Finish-to-start (backward) search achieves a median 98.2% of the maximal possible score. Neither forward nor backward search ever achieved a score below 95% of the maximal one. These findings support Supposition 1 and indicate that time-ordered HarmAn search generates good segmentations when compared to full search using Relaxation.

Using random-order selection of partition points for the HarmAn method generates significantly lower scores. Random search had a median score of 90.5% of the maximal score. The lowest score generated by random-order selection was on Beethoven's Bagatelle No. 3, which received 72.2% of the maximal score. The distinctly lower scores achieved by random-order consideration of partition points support Supposition 2. The order in which the partition points are selected strongly influences the result of the search.

Supposition 3 was not strongly supported by the data. While forward search did achieve the highest median score, it was only marginally better than backward search, and the score distributions for the two orders are almost entirely overlapping. Furthermore, backward search outscored forward search on many pieces.

Evaluation of the Labeling Method for Segment Scoring

This article has explored the performance of the segment-labeling algorithm, assuming a good segmentation has been provided. Two different segmentation algorithms have also been evaluated under the assumption that a good method for scoring segments has been provided. Recall that, in the context of segmentation, a good segment-scoring method gives high scores to segments that correspond to the ones a human would select as units of harmonic analysis.

Figure 15. Normalized segmentation scores.



We observed that our segment-labeling method could be reasonably used to generate segment scores, because segments containing notes belonging to a single chord tend to get high scores, and those encompassing several chords tend to get low scores. However, the question remained as to how good these segment scores were for finding a segmentation of music that would correspond well with one generated by a human expert. In order to test the quality of scores generated by our segment-labeling approach, we performed the steps in Figure 16 on each excerpt in the KP corpus.

Table 5 shows mean scores for the full corpus of 45 excerpts, comparing labeled segmentations generated with Relaxation search and HarmAn search to the program's labeling of the segmentations from the answer key. The table shows both HarmAn and Relaxation search achieve roughly equivalent performance, in the range of 76% correct, on the full corpus. This compares to the 88.65% mean score achieved by the labeling system when paired with the segmentation provided by the answer key.

It is instructive to examine the grades achieved by the different search methods on the 13 excerpts that were labeled identically to the answer key when the correct segmentation was provided. Table 5 shows mean scores on these excerpts, broken down by segmentation method. Figure 17 plots the individual grade for each of the 13 excerpts, broken down by segmentation method.

Table 5 shows that the Relaxation search correctly labeled 87% of minimal segments on the 13 excerpts. Because the system achieves perfect labeling on these excerpts when the correct segmentation is provided, the search methods must have generated segmentations different from those given by the answer key. Relaxation search is guaranteed to find the best-scoring segmentation given a particular scoring metric. Thus, the only way for the search not to find the same segmentation as the answer key is for the segment-scoring mechanism to be sub-optimal for the task of finding the right segmentation.

This conclusion is reinforced by the fact that HarmAn search performs better than Relaxation search in several cases.

Figure 16. Grading the output of the combined labeling and segmentation algorithms.

Table 5. Labeling scores by segmentation approach

Proportion of KP Corpus   Segmentation Approach          Mean Grade %   Std Dev
All 45 excerpts           Use Answer Key Segmentation    88.65          10.72
All 45 excerpts           Relaxation Search              76.50          12.88
All 45 excerpts           HarmAn Search                  75.81          12.78
Best 13 excerpts          Use Answer Key Segmentation    100.00          0.00
Best 13 excerpts          Relaxation Search              86.89          10.59
Best 13 excerpts          HarmAn Search                  86.08          11.13


The difference between the scores indicates that HarmAn search does not generate segmentations identical to those of Relaxation search. Since Relaxation search finds the highest-scoring segmentation given the internal segment-scoring metric, HarmAn search can only beat Relaxation search if the internal scoring metric does not always capture all the information needed to segment the passage correctly.

Of course, raw grades are meaningless without context. An analysis system that achieves a mean grade of 87% is far from perfect, but it may be good enough for many tasks. Consider Figure 18, which shows excerpt 11, and Figure 19, which shows excerpt 41. The segmentation methods both scored very well (98%) on excerpt 11 and rather poorly (72% and 80%) on excerpt 41. Examination of the analyses of these two excerpts helps illustrate what the numeric grades mean.

In both figures, the answer key is shown at the top of each system. The program's output is shown on the bottom. Chord labels apply from the point where they appear until the next label appears. Discrepancies between the answer key and program output are identified by dashed rectangles indicating the duration of the discrepancy.

Excerpt 11 (Figure 18) was drawn from the third movement of Beethoven's String Quartet Op. 135 and received a score of 98% when segmented using either HarmAn or Relaxation search. Here, the two searches generated the same segmentation and labeling. Measure six contains the only error on this excerpt and is the result of a passing F7 chord not acknowledged as part of the harmony in the answer key.

Excerpt 41 (Figure 19) shows measures five through 17 of Schumann's "Aus meinen Thranen spriessen." Here, HarmAn search achieved a score of 80%, while Relaxation search scored 72%. The labeling shown in this figure is that of the lower-scoring Relaxation search. The two searches differ in the treatment of the B at the very end of measure 11. HarmAn, when searching in temporal order, tends to continue the current segment as long as possible and thus assigns the B to the G-sharp half-diminished chord in measure 11. The following measure is then correctly labeled C-sharp major. Relaxation search unifies the B at the end of measure 11 with the C-sharp triad in the following measure, resulting in a label of C-sharp 7, which proves to be incorrect.

The remainder of the analysis is identical between the two segmentation methods. Measure one shows a situation where the initial A major triad was unified with the F-sharp minor following it. This happens because only the root and third of the A major are present. Because both of these notes are also part of the F-sharp minor triad, the program does not assign separate labels to beats one and two.

Measures 7, 8, 15, and 16 all show the same error. Here, the human analyst chooses to begin the A minor chord on the second beat of the measure, while the program (noting the C on the second half of beat one) begins the A minor triad on the second half of the first beat. Presumably, the human favors the second beat owing to the appearance of the root of the A minor chord.

Figure 17. Segmentation grades on the 13 best excerpts. (Grade, 50–110, plotted by excerpt — 5, 8, 10, 11, 18, 31, 35, 38, 40, 41, 42, 43, 44 — for the Answer Key, Relaxation, and HarmAn segmentations.)



The errors in measures 10 and 14 are both the result of spurious vertical simultaneities that happen to spell chords. Because the segment scoring metric is based on how well a particular segment matches a particular chord label, both search methods tend to select segments that exactly spell a chord, even if the chord is clearly (to a human) the result of passing tones.
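The flavor of scoring at issue can be made concrete with a small sketch: notes that belong to a candidate chord template contribute positive evidence, notes that do not contribute negative evidence, and the label with the highest score wins. The template set, weights, and tie-breaking below are illustrative assumptions; the actual scorer defined in Figures 3 and 4 may differ in its details.

TEMPLATES = {              # pitch classes relative to a root of 0
    "maj":  {0, 4, 7},
    "min":  {0, 3, 7},
    "dom7": {0, 4, 7, 10},
    "dim":  {0, 3, 6},
}

def segment_score(notes, root, quality):
    """notes: iterable of (pitch_class, duration); root: 0-11 (0 = C)."""
    template = {(root + pc) % 12 for pc in TEMPLATES[quality]}
    positive = sum(dur for pc, dur in notes if pc in template)      # chord tones
    negative = sum(dur for pc, dur in notes if pc not in template)  # non-chord tones
    return positive - negative

def best_label(notes):
    """Return the (root, quality) pair with the highest score for one segment."""
    return max(((r, q) for r in range(12) for q in TEMPLATES),
               key=lambda rq: segment_score(notes, *rq))

# An A, C, E simultaneity (pitch classes 9, 0, 4) is labeled A minor.
print(best_label([(9, 1.0), (0, 1.0), (4, 1.0)]))   # -> (9, 'min')

Under any scheme of this general kind, a vertical simultaneity that happens to spell a template exactly receives the maximum possible score, which is why the spurious simultaneities in measures 10 and 14 are so attractive to both searches.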

Conclusion

In this article, we described a system for segmenting and labeling tonal music based on template matching and graph-search techniques. We then analyzed both the computational complexity of the algorithms and the system's real-world performance, measured by comparing the analyses it produced to those created by human analysts. The results produced by this system are surprisingly good, given its simplicity, and provide insight into the computational problems in analysis of tonal music.




When provided the correct segmentation, the segment-labeling method described in Figures 3 and 4 labeled an average of 89% of minimal segments correctly on the KP corpus of musical excerpts. Thirteen out of 45 excerpts in this corpus were labeled perfectly using this method.

Our analysis of the errors generated by the labeling system suggests further improvements to the system. Figure 8 shows error class frequencies on the corpus and indicates, for example, that 26% of labeling errors can be eliminated with improved tie breaking between chord templates.

The combined segmentation and labeling system uses either Relaxation or HarmAn search to find a good segmentation based on segment scores generated by the labeling system. This system correctly labels roughly 77% of minimal segments in the full KP corpus. These results are surprisingly good, given the unforgiving nature of the grading scheme and the simple analytical approach used by the system. Recall that metric information, key signature, harmonic function, voice leading, and stylistic information were not used. Furthermore, the assumption that segments can be labeled in isolation, without regard to context, is a strong one that, intuitively, one would expect to have far more of a negative impact than turned out to be the case.




The evidence supports the conclusion that the search methods we are using are good. Given our formulation of the segmentation problem as a search for the best path through a Directed Acyclic Graph, it is unlikely that a computationally more efficient algorithm than Relaxation can be found without giving up either accuracy or completeness of the search result. HarmAn search is an example that gives up completeness to gain a time advantage over Relaxation search, while still returning nearly equivalent results.
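To make the DAG formulation concrete, the following sketch computes a highest-scoring segmentation by relaxing edges in temporal (topological) order: nodes are candidate segment boundaries, an edge (i, j) is the segment spanning them, and its weight is the score of that segment's best label. The node and edge definitions, and the scorer, are illustrative assumptions rather than the exact data structures of our implementation.

def best_segmentation(n_boundaries, segment_score):
    """segment_score(i, j): score of the best chord label for the segment
    spanning boundaries i..j, with 0 <= i < j < n_boundaries."""
    NEG_INF = float("-inf")
    best = [NEG_INF] * n_boundaries      # best total score reaching each boundary
    back = [None] * n_boundaries         # predecessor boundary on that best path
    best[0] = 0.0
    for i in range(n_boundaries - 1):    # temporal order is a topological order
        if best[i] == NEG_INF:
            continue
        for j in range(i + 1, n_boundaries):
            candidate = best[i] + segment_score(i, j)   # relax edge (i, j)
            if candidate > best[j]:
                best[j], back[j] = candidate, i
    # Recover the boundary sequence of the best-scoring segmentation.
    path, j = [n_boundaries - 1], n_boundaries - 1
    while back[j] is not None:
        j = back[j]
        path.append(j)
    return list(reversed(path)), best[-1]

# Toy scorer that rewards splitting five boundaries into two equal segments.
print(best_segmentation(5, lambda i, j: 3.0 if (i, j) in {(0, 2), (2, 4)} else 1.0))
# -> ([0, 2, 4], 6.0)

Because every edge is relaxed exactly once, the running time is linear in the number of candidate segments, which is the sense in which a substantially faster exact method is unlikely.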

While we broke the chordal analysis problem into two quasi-independent subtasks, it is clear that there are subtle (and not so subtle) interactions when segment scores are generated using the segment labeling system. Furthermore, segment scores generated by our current system do not fully capture the detail needed to match the answer key segmentation. This is shown most clearly by the results on the thirteen excerpts the system labels perfectly when given the correct segmentation. Table 5 shows that Relaxation search correctly labeled 87% of minimal segments on these thirteen excerpts. Because the system achieves perfect labeling when the correct segmentation is provided, Relaxation must have generated incorrect segmentations. Relaxation search is guaranteed to find the best-scoring segmentation, given some scoring metric. Thus, the only way for the search not to find the same segmentation as the answer key is for the current segment scoring mechanism to be sub-optimal for the task of finding the right segmentation.

Improving the segment scoring is a difficult task, and one that we plan to investigate further. Clearly, the scheme we use to assign positive and negative evidence is not sufficient, but improving it while maintaining generality is not easy. This leads us to consider segment scoring mechanisms that are either directed to a particular style (e.g., a "Kostka-Payne" mechanism that favors what their textbook favors), or that use additional information, such as metric placement of notes and dynamic contrasts.

The system and experiments in this article make several contributions to the field of musical analysis by computer. We describe our approach to a level of detail that allows others to duplicate and extend the work. This article examines the computational complexity of the algorithms we use to solve the task. Finally, the performance of the system is measured using a clear grading measure against an objective standard on a corpus of known pieces, providing an empirical baseline against which future work may be measured.

Acknowledgments

We would like to thank David Temperley for providing the Kostka-Payne corpus and for many helpful discussions. We also thank Roger Dannenberg for his helpful comments. We gratefully acknowledge the support of the NSF under award IIS-0085945 and a University of Michigan College of Engineering Seed Grant. The views expressed in this article are solely those of the authors and do not necessarily reflect the views of the funding agencies.

References

Cormen, T., et al. 1990. Introduction to Algorithms. Cambridge, Massachusetts: MIT Press.

Ebcioglu, K. 1992. "An Expert System for Harmonizing Chorales in the Style of J. S. Bach." In M. Balaban, K. Ebcioglu, and O. Laske, eds. Understanding Music With AI: Perspectives on Music Cognition. Cambridge, Massachusetts: MIT Press, pp. 294–333.

Forte, A. 1973. The Structure of Atonal Music. New Haven, Connecticut: Yale University Press.

Knuth, D., and M. Plass. 1981. "Breaking Paragraphs into Lines." Software Practice and Experience 11:1119–1184.

Kostka, S., and D. Payne. 1984. Tonal Harmony. New York: McGraw-Hill.


Laden, B., and D. H. Keefe. 1989. "The Representation of Pitch in a Neural Net Model of Chord Classification." Computer Music Journal 13(4):12–26.

Maxwell, J. H. 1992. "An Expert System for Harmonic Analysis of Tonal Music." In M. Balaban, K. Ebcioglu, and O. Laske, eds. Understanding Music With AI: Perspectives on Music Cognition. Cambridge, Massachusetts: MIT Press, pp. 335–353.

Pardo, B., and W. P. Birmingham. 2001. "The Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01. Ann Arbor, Michigan: Department of Electrical Engineering and Computer Science, The University of Michigan.

Smaill, A., et al. 1993. "Hierarchical Music Representation for Composition and Analysis." Computers and the Humanities 27:7–17.

Smoliar, S. 1980. "A Computer Aid for Schenkerian Analysis." Computer Music Journal 4(2):41–59.

Taube, H. 1999. "Automatic Tonal Analysis: Toward the Implementation of a Music Theory Workbench." Computer Music Journal 23(4):18–32.

Temperley, D., and D. Sleator. 1999. "Modeling Meter and Harmony: A Preference-Rule Approach." Computer Music Journal 23(1):10–27.

Ulrich, J. W. 1977. "The Analysis and Synthesis of Jazz by Computer." In Proceedings of the 5th International Joint Conference on Artificial Intelligence. Los Altos, California: William Kaufmann.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." Paper presented at the International Symposium on Optical Science, Engineering, and Instrumentation, SPIE'99, 18–23 July, Denver, Colorado.

Widmer, G. 1992. "Perception Modeling and Intelligent Musical Learning." Computer Music Journal 16(2):51–68.

Winograd, T. 1968. "Linguistics and the Computer Analysis of Tonal Harmony." The Journal of Music Theory 12:2–49.

Appendix: The KP Corpus

The excerpt number, composer, and composition are shown for each member of the KP corpus below.

1. Anonymous (often attributed to Bach), Minuet in G, mm. 1–16
2. Bach, Chorale, "Uns ist ein Kindlein heut' geborn"
3. Bach, Chorale, "Jesu, der du meine Seele"
4. Beethoven, Rondo Op. 51, No. 1, mm. 103–120
5. Beethoven, Sonata Op. 10 No. 1, II, mm. 1–8
6. Beethoven, Sonata Op. 10 No. 3, II, mm. 9–17
7. Beethoven, Sonata Op. 13, II, mm. 1–8
8. Beethoven, Sonata Op. 14 No. 1, III
9. Beethoven, Sonata Op. 2 No. 3, III, mm. 81–88
10. Beethoven, Sonata Op. 27 No. I, mm. 1–9
11. Beethoven, String Quartet Op. 135, III, mm. 1–10
12. Beethoven, String Trio Op. 9 No. 3, II, mm. 1–10
13. Brahms, "Und gehst du ueber den Kirchhof," Op. 44, mm. 29–37
14. Ayer (arr. Campbell), "Oh! You Beautiful Doll!," mm. 1–9
15. Chopin, Mazurka Op. 62, No. 7, mm. 1–16
16. Chopin, Mazurka Op. 63, No. 2, mm. 1–16
17. Chopin, Nocturne Op. 27 No. 1, mm. 41–52
18. Grieg, "The Mountain Maid," Op. 67 No. 2, mm. 1–11
19. Haydn, Sonata No. 22, III, mm. 1–8
20. Haydn, Sonata No. 30, I, mm. 84–96
21. Haydn, String Quartet Op. 20 No. 4, I, mm. 13–24
22. Haydn, String Quartet Op. 50 No. 6, II, mm. 55–63
23. Haydn, String Quartet Op. 74 No. 3, II, mm. 30–7
24. Haydn, String Quartet Op. 76 No. 6, II, mm. 31–9
25. Mozart, Bassoon Concerto K. 191, II, mm. 42–50
26. Mozart, "Eine Kleine Nachtmusik," K. 525, II, mm. 1–8
27. Mozart, Piano Concerto K. 488, II, mm. 1–12
28. Mozart, Sonata K. 330, II, mm. 21–8
29. Mozart, Sonata K. 333, III, mm. 91–8
30. Mozart, Piano Trio K. 542, I, mm. 210–229


31. Mozart, Marriage of Figaro, "Voi che sapete," mm. 41–52
32. Schubert, Sonata in B-flat, D. 960, I, mm. 149–68
33. Schubert, "Erlkonig," mm. 113–23
34. Schubert, "Erlkonig," mm. 134–48
35. Schubert, "Auf dem Flusse," mm. 14–21
36. Schubert, Impromptu Op. 90 No. 1, mm. 42–55
37. Schubert, String Trio D. 471, mm. 187–201
38. Schubert, Originaltanze Op. 9 No. 14, mm. 1–24
39. Schumann, "Die beiden Grenadiere," mm. 23–37
40. Schumann, "Sehnsucht," mm. 2–11
41. Schumann, "Aus meinen Thranen spriessen," mm. 1–17
42. Schumann, "Wenn ich in deine Augen seh'," mm. 1–21
43. Tchaikovsky, "Morning Prayer," mm. 1–17
44. Tchaikovsky, "The Nurse's Tale," mm. 5–15
45. Tchaikovsky, Symphony No. 6, I, mm. 89–97