comparing pitch spelling algorithms on a large corpus … · comparing pitch spelling algorithms on...

Comparing Pitch Spelling Algorithms on a LargeCorpus of Tonal Music

David Meredith

Centre for Computational Creativity, City University, London,Northampton Square, London, EC1V 0HB, United Kingdom

[email protected]

http://www.titanmusic.com

Abstract. This paper focuses on the problem of constructing a reliablepitch spelling algorithm—that is, an algorithm that computes the correctpitch names (e.g., C]4, B[5 etc.) of the notes in a passage of tonal music,when given only the onset-time, MIDI note number and possibly the du-ration of each note. The author’s ps13 algorithm and the pitch spellingalgorithms of Cambouropoulos, Temperley and Longuet-Higgins wererun on a corpus of tonal music containing 1.73 million notes. ps13 speltsignificantly more of the notes in this corpus correctly than the otheralgorithms (99.33% correct). However, Temperley’s algorithm spelt sig-nificantly more intervals correctly than the other algorithms (99.45%correct). If intervals and notes were considered together, ps13 performedbest (99.25% correct). The performance of each algorithm seemed todepend in an interesting way on the style of the music being processed.

1 The Concept of a Pitch Spelling Algorithm

In this paper, I focus on the problem of constructing a reliable pitch spellingalgorithm—that is, an algorithm that reliably computes the correct pitch names(e.g., C]4, B[5 etc.) of the notes in a passage of tonal music, when given onlythe onset-time, MIDI note number and possibly the duration of each note in thepassage.

There are good practical and scientific reasons for attempting to develop areliable pitch spelling algorithm. First, until such an algorithm is devised, it willbe impossible to construct a reliable MIDI-to-notation transcription algorithm—that is, an algorithm that reliably computes a correctly notated score of a passagewhen given only a MIDI file of the passage as input. Second, existing audiotranscription systems generate not notated scores but MIDI-like representationsas output [6, 21, 18]. So, if one needs to produce a notated score from a digitalaudio recording, a MIDI-to-notation transcription algorithm (incorporating apitch spelling algorithm) is required in addition to an audio transcription system.

Third, knowing the letter-names of the pitch events in a passage can beextremely useful in music information retrieval and musical pattern discovery[15]. In particular, the occurrence of a motive on a different degree of a scalemight be perceptually significant even if the corresponding chromatic intervals

Fig. 1. Three perceptually similar patterns with different chromatic pitch intervalstructures (from the first bar of the Prelude in C minor (BWV 871/1) from Book2 of J. S. Bach’s Das Wohltemperirte Klavier)

in the patterns differ. For example, the three patterns A, B and C in Fig. 1are perceived as being three occurrences of the same motive even though thecorresponding chromatic intervals are different in the three patterns. Note that inthis example, one important aspect of the perceived similarity between patternsA, B and C is nicely represented in the notation by the fact that they all havethe same scale-step interval structure (i.e., one step down followed by one stepup followed by another step up). In other words, the pitch names of the notes inthis passage are chosen so that the scale-step interval structures of these threepatterns are the same.

Such matches can be found using fast, exact-matching algorithms if the pitchnames of the notes are encoded, but exact-matching algorithms cannot be usedto find such matches if the pitches are represented using just MIDI note numbers.If a reliable pitch spelling algorithm existed, it could be used to compute thepitch names of the notes in the tens of thousands of MIDI files of works that arefreely available online, allowing these files to be searched more effectively by amusic information retrieval (MIR) system.

2 Pitch Spelling In Common Practice Western TonalMusic

In the vast majority of cases, the correct pitch name for a note in a passage oftonal music can be determined by considering the roles that the note plays inthe harmonic, motivic and voice-leading structures of the passage. For example,when played in isolation in an equal-tempered tuning system, the first sopranonote in Fig. 2(a) would sound the same as the first soprano note in Fig. 2(b).However, in Fig. 2(a), this note is spelt as a G]4 because it functions as a leadingnote in A minor; whereas in Fig. 2(b), the first soprano note is spelt as an A[4because it functions as a submediant in C minor. Similarly, the first alto note inFig. 2(b) would sound the same as the first alto note in Fig. 2(c) in an equal-tempered tuning system. However, in Fig. 2(b) the first alto note is spelt as an

F\4 because it functions in this context as a subdominant in C minor; whereas,in Fig. 2(c), the first alto note functions as a leading note in F] minor so it isspelt as an E]4.

Fig. 2. Examples of enharmonic spelling (from [17, p. 8])

So, in general, the pitch name assigned to a note in a passage of tonal musicis not arbitrary. In most cases, the pitch name of each note is carefully chosen sothat the resulting score represents as well as possible certain important aspectsof the way that the music is intended to be perceived and interpreted. Thisillustrates the important and more general point that a correctly notated scoreof a passage of Western tonal music is a structural description of the passage thatis supposed to represent certain aspects of the way that the passage is intendedto be interpreted by an expert listener. In this respect, a correctly notated scoreserves a similar function to, say, a Schenkerian analysis or an analysis usingLerdahl and Jackendoff’s Generative Theory of Tonal Music [11].

Fig. 3. Should the E[s be spelt as D]s? (From [17, p. 390])

Of course, there do exist cases where the pitch name of a note cannot beuniquely determined by considering the harmonic, motivic and voice-leadingstructures of its context. For example, as Piston observes [17, p. 390], the tenorE[4 in the third and fourth bars of Fig. 3 should be spelt as a D]4 if one perceivesthe harmonic progression here to be +II2− I as proposed by Piston. But spellingthe soprano E[5 in the fourth bar as D]5 does not seem to represent the perceivedstructure of the melody correctly. Indeed, this E[5 seems to be functioning as

the ninth of a supertonic chord whereas the tenor E[4 can be interpreted as theraised root of a supertonic chord. If one interprets the passage in this way, thenthe E[4 should be spelt as a D]4 and the soprano E[5 should be left unchanged,which would lead to an E[ sounding simultaneously with a D] in the second halfof the fourth bar!

3 Computationally Modelling the Cognitive ProcessesUnderlying Pitch Spelling

Fortunately, however, cases such as the one in Fig. 3 where it is difficult todetermine the correct pitch name of a note in a tonal work are relatively rare—particularly in Western tonal music of the so-called ‘common practice’ period(roughly the 18th and 19th centuries). In the vast majority of cases, those whostudy and perform Western tonal music agree about how a note should be speltin a given tonal context. This poses an interesting problem for cognitive science,namely: what are the cognitive processes involved when a musically trainedindividual determines the correct pitch name of a note in a passage of tonalmusic?

A good way of trying to answer this question is to attempt to constructa reliable pitch spelling algorithm that operates in a way that is consistentwith what is known about expert music perception and cognition. Such a pitchspelling algorithm would serve the valuable scientific purpose of furthering ourunderstanding of the cognitive processes underlying an interesting intellectualability exhibited by many individuals who study and perform Western tonalmusic.

The vast majority of notes in authoritative published editions of scores ofcommon practice tonal works are generally agreed to be spelt correctly by thosewho understand Western staff notation. Therefore a pitch spelling algorithm orcomputational model of pitch spelling can be evaluated quantitatively by runningit on tonal works and comparing the pitch names it predicts with those of thecorresponding notes in authoritative published editions of scores of the works.In other words, such authoritative scores can provide us with a ‘ground truth’that we can compare with the output of a pitch spelling algorithm. However,this can only be done accurately and quickly if one has access to encodings ofthese authoritative scores in the form of computer files that can be comparedautomatically with the pitch spelling algorithm’s output.

4 A Comparison of Three Pitch Spelling Algorithms

In order to get a clearer idea of the ‘state of the art’ in the field, three pitchspelling algorithms were run on two test corpora and their performance was com-pared. The algorithms compared were those of Cambouropoulos [1–5], Longuet-Higgins [12–14] and Temperley [19, 20]. In an initial pilot study, the corpus usedwas the first book of J. S. Bach’s Das Wohltemperirte Klavier (BWV 846–869),

which contains 41544 notes. Then a larger-scale comparison was carried out us-ing a corpus containing 1729886 notes and consisting of 1655 movements fromworks by 9 baroque and classical composers (Corelli, Vivaldi, Telemann, Bach,Handel, B. Marcello, Haydn, Mozart and Beethoven). Both corpora were derivedfrom the MuseData collection of encoded scores [8].1

5 Longuet-Higgins’s Algorithm

Pitch spelling is one of the tasks performed by Longuet-Higgins’s music.p pro-gram [12–14]. The input to music.p must be in the form of a list of triples,⟨p, ton, toff

⟩, each triple giving the “keyboard position” p together with the on-

set time ton and the offset time toff in centiseconds of each note. The keyboardposition p is just the MIDI note number minus 48. So, for example, the keyboardposition of middle C is 12. Longuet-Higgins intended the music.p program tobe used only on monophonic melodies and explicitly warns against using it on“accompanied melodies” or what he calls “covertly polyphonic” melodies (i.e.,compound melodies) [13, p. 114].

It is perhaps also worth pointing out that the pitch spelling algorithm imple-mented in music.p does not use Longuet-Higgins’s well-known three-dimensional‘tonal space’ model of tonality [13, p. 110–111]. However, Longuet-Higgins doesactually describe this multi-dimensional model in the paper where the music.pprogram is published. This has probably led some readers to assume (incorrectly)that the program implements Longuet-Higgins’s three-dimensional ‘tonal space’model. In fact, the only pitch spaces used in the pitch spelling algorithm imple-mented in music.p are the circle of fifths and the ‘line of fifths’ (see Fig. 4).

Sharpness

. . .

. . .C[

−7

G[

−6

D[

−5

A[

−4

E[

−3

B[

−2

F

−1

C

0

G

1

D

2

A

3

E

4

B

5

F]

6

C]

7

G]

8

D]

9

A]

10

E]

11

. . .

. . .

Fig. 4. The line of fifths

The algorithm computes a value of “sharpness” q for each note in the input[13, p. 111]. The sharpness of a note is a number indicating the position of thepitch name of the note on the line of fifths (see Fig. 4) [20, p. 117]. It is thereforeessentially the same as Temperley’s concept of “tonal pitch class” [20, p. 118].Longuet-Higgins’s algorithm tries to spell the notes so that the distance on theline of fifths between each note and the tonic at the point at which the noteoccurs is less than or equal to 6 (which corresponds to a tritone) [13, p. 113].The algorithm assumes at the beginning of the music that the first note is eitherthe tonic or the dominant of the opening key and chooses between these twopossibilities on the basis of the interval between the first two notes [13, p. 114].The algorithm also assumes that two consecutive notes in the music are never1 The MuseData collection is available online at www.musedata.org.

separated by more than 12 steps on the line of fifths [13, p. 111]. In fact, if twoconsecutive notes in the music are separated by more than 6 steps on the line offifths, then the algorithm treats this as evidence of a change of key.

6 Cambouropoulos’s Algorithm

Cambouropoulos’s method [1–5] involves first converting the input representa-tion into a sequence of pitch classes in which the pitch classes are in the orderin which they occur in the music (the pitch classes of notes that occur simulta-neously being ordered arbitrarily—see Fig. 5).

?

5 9 7 5 4 7 5 2 4 0

Fig. 5. The first step in Cambouropoulos’s method involves converting the input rep-resentation into a sequence of pitch classes. The music is processed a window at a time.Each window contains 9 notes and the first third of each window overlaps the last thirdof the previous window

Having derived an ordered set of pitch classes from the input, Cambouropou-los’s algorithm then processes the music a window at a time, each window con-taining a fixed number of notes (between 9 and 15). Each window is positionedso that the first third of the window overlaps the last third of the previous win-dow. Cambouropoulos allows ‘white note’ pitch classes (i.e., 0, 2, 4, 5, 7, 9 and11) to be spelt in three different ways (e.g., pitch class 0 can be spelt as B],C\ or D[[) and ‘black note’ pitch classes to be spelt in two different ways (e.g.,pitch class 6 can be spelt as F] or G[). Given these restricted sets of possiblepitch names for each pitch class, the algorithm computes all possible spellingsfor each window. So, on average, if the window size is 9, there will be over 5000possible spellings for each window. A penalty score is then computed for each ofthese possible window spellings. The penalty score for a given window spelling isfound by computing a penalty value for the interval between every pair of notesin the window and summing these penalty values. A given interval in a particu-lar window spelling is penalised more heavily if it is an interval that occurs lessfrequently in the major and minor scales. An interval is also penalised if either ofthe pitch names forming the interval is a double-sharp or a double-flat. For eachwindow, the algorithm chooses the spelling that has the lowest penalty score.

7 Temperley’s Algorithm

Temperley’s pitch spelling algorithm [19, 20] is implemented in his harmony pro-gram which forms one component of his and Sleator’s Melisma system. The inputto the harmony program must be in the form of a “note-list” [20, pp. 9–12] givingthe MIDI note number of each note together with its onset time and durationin milliseconds. Temperley’s pitch spelling algorithm searches for the spellingthat best satisfies three “preference rules” [20, pp. 115–136]. The first of theserules stipulates that the algorithm should “prefer to label nearby events so thatthey are close together on the line of fifths” [20, p. 125]. This rule bears someresemblance to the basic principle underlying Longuet-Higgins’s algorithm (seeabove). The second rule expresses the principle that if two tones are separatedby a semitone and the first tone is distant from the key centre, then the intervalbetween them should preferably be spelt as a diatonic semitone rather than achromatic one [20, p. 129]. The third preference rule steers the algorithm towardsspelling the notes so that what Temperley calls a “good harmonic representa-tion” results [20, p. 131].

Note, however, that Temperley’s algorithm requires more information in itsinput than the other algorithms. In particular, it needs to know the duration ofeach note and the tempo at each point in the passage. It also needs to performa full analysis of the metrical and harmonic structure of the passage in order togenerate a high quality result. Also, it cannot deal with cases where two or morenotes with the same pitch start at the same time.

8 Results of Running Algorithms on the First Book ofJ. S. Bach’s Das Wohltemperirte Klavier

Table 1 shows the results obtained in a pilot study in which the three algorithmsdescribed above were run on the first book of J. S. Bach’s Das WohltemperirteKlavier. As can be seen, Temperley’s algorithm performed best, followed byLonguet-Higgins’s algorithm and then Cambouropoulos’s.

Table 1. Results obtained when the three algorithms were run on the first book ofJ. S. Bach’s Das Wohltemperirte Klavier

Algorithm % notes correct Number of errorsCambouropoulos 93.74 2599Longuet-Higgins 99.36 265

Temperley 99.71 122

Total number of notes in corpus = 41544.

9 The ps13 Algorithm

Having carried out this pilot study and gained a better idea of the ‘state ofthe art’ in the field, an attempt was made to construct a new algorithm thatimproved on Temperley’s. Around 30 different algorithms were developed andtested and the one that performed best on the first book of Bach’s Das Wohltem-perirte Klavier will now be described. This algorithm is called ps13.2

At the highest level of description, ps13 can be broken down into two stages,which I shall call Stage 1 and Stage 2.

Stage 1 involves carrying out the following steps:

Step 1 For each note n in the input and each pitch class 0 ≤ p ≤ 11, computea value CNT(p, n) giving the number of times that p occurs within acontext surrounding n that includes n, some specified number Kpre ofnotes immediately preceding n and some specified number Kpost ofnotes immediately following n.

Step 2 For each note n and each pitch class p, compute the letter name L(p, n) ∈{A,B,C,D,E,F,G} that n would have if p were the tonic at the pointwhere n occurs (assuming that the notes are spelt as they are in theharmonic chromatic scale on p).

Step 3 For each note n and each letter name `, compute the set of tonic pitchclasses, X(n, `), that would lead to n having the letter name `. That is,

X(n, `) = {p | L(p, n) = `} .

Step 4 For each note n and each letter name `, compute the sum, N(`, n), ofthe values of CNT(p, n) for all the tonic pitch classes p ∈ X(n, `). Thatis,

N(`, n) =∑

p∈X(n,`)

CNT(p, n) .

Step 5 For each note n, make the letter name of n equal to that value of ` forwhich N(`, n) is a maximum.

Stage 2 of the algorithm corrects those instances in the output of Stage 1where a neighbour note or passing note is erroneously predicted to have the sameletter name as either the note preceding it or the note following it. That is, thesecond stage of the algorithm corrects errors like those shown in Fig. 6.

In the first step of Stage 1, the algorithm essentially counts how many timeseach pitch class occurs within some specified context surrounding a particularnote. Krumhansl [10, pp. 66–75] showed that there is a high correlation be-tween the frequency with which a pitch class occurs within a passage and itsperceived tonal stability as measured experimentally [9]. This suggests that thevalue CNT(p, n), calculated in the first step of Stage 1, gives an approximatemeasure of the perceived tonal stability of the pitch class p at the point in themusic where n occurs. In ps13 the value CNT(p, n) is used as a measure of the2 Patent pending [16].

Fig. 6. Examples of the types of passing and neighbour note errors corrected in thesecond part of the ps13 algorithm

likelihood of p being perceived to be the tonic at the point where note n occurs.Note that, unlike Temperley’s algorithm, ps13 uses neither duration nor tempoand can deal with situations where two or more notes with the same pitch startat the same time.

10 Results of running ps13 on the first book ofJ. S. Bach’s Das Wohltemperirte Klavier

In the first step of Stage 1, ps13 counts how many times each pitch class occurswithin some specified context surrounding a particular note, defined by the valuesof Kpre and Kpost. In order to explore the effect that varying the values of Kpreand Kpost has on the performance of ps13, the algorithm was run 2500 times on

the test corpus, each time using a different pair of values⟨Kpre,Kpost

⟩chosen

from the set{⟨Kpre,Kpost

⟩| 1 ≤ Kpre ≤ 50 ∧ 1 ≤ Kpost ≤ 50

}.

Fig. 7 summarises the results obtained. It was found that ps13 made fewer than122 mistakes (i.e., performed better than Temperley’s algorithm) on the testcorpus for 2004 of the 2500

⟨Kpre,Kpost

⟩pairs tested (i.e., 80.160% of the⟨

Kpre,Kpost⟩

pairs tested).ps13 performed best on the test corpus when Kpre was set to 33 and Kpost

was set to either 23 or 25. With these parameter values, ps13 made only 81 errorson the test corpus—that is, it correctly predicted the pitch names of 99.81% ofthe notes in the test corpus. The mean number of errors made by ps13 over all2500

⟨Kpre,Kpost

⟩pairs was 109.082 (i.e., 99.74% of the notes were correctly

spelt on average over all 2500⟨Kpre,Kpost

⟩pairs). This average value was

better than the result obtained by Temperley’s algorithm for this test corpus.The standard deviation in the accuracy over all 2500

⟨Kpre,Kpost

⟩pairs was

0.08%. The worst result was obtained when both Kpre and Kpost were set to 1.In this case, ps13 made 1117 errors (97.31% correct). However, provided Kprewas greater than about 14 and Kpost was greater than about 21, ps13 predictedthe correct pitch name for over 99.75% of the notes in the test corpus (see large,dark grey, upper right quadrant of Fig. 7).

Fig. 7. Contour map showing percentage of notes spelt correctly by ps13 for all valuesof Kpre and Kpost between 1 and 50

11 Structure of the larger test corpus

ps13 and the algorithms of Temperley, Cambouropoulos and Longuet-Higginswere then run on a much larger corpus containing 1729886 notes and consistingof 1655 movements from works by 9 baroque and classical composers (Corelli,Vivaldi, Telemann, Bach, Handel, B. Marcello, Haydn, Mozart and Beethoven).The values of Kpre and Kpost for ps13 were set to 33 and 23, respectively, thesebeing the values that produced the best results when the algorithm was run onthe smaller corpus in the pilot study. Figs. 8 and 9 show the percentage andnumber of notes in this larger test corpus in works by each composer.

Fig. 8. The percentage of notes in the test corpus in works by each composer

Fig. 9. The number of notes in the test corpus in works by each composer

Fig. 8 reveals that about 80% of the music in the corpus is baroque andthe remaining 20% is classical. So the corpus is not very stylistically varied—itcontains music written between about 1675 and 1825. Fig. 8 also shows that over60% of the corpus consists of works by Bach and Handel. Finally, note that thecorpus is very unevenly distributed between the nine composers.

Unfortunately, a more varied and balanced corpus of a comparable size wasnot available at the time this experiment was carried out.

12 Comparison of algorithms with respect to number ofnotes spelt correctly

Table 2 shows the number of notes spelt incorrectly by each algorithm for eachcomposer in the corpus. The bottom row in this table gives the total number ofnotes in the corpus spelt incorrectly by each algorithm.

Table 2. The number of notes spelt incorrectly by each algorithm for each composer

Cam LH ps13 TemCorelli 148 120 30 6Vivaldi 1816 5518 1497 4900

Telemann 604 665 481 111Bach 13347 13022 3450 1166

Handel 1929 3342 2339 1309Marcello 1 4 14 5

Haydn 1497 6174 823 6391Mozart 2300 10147 2250 19832

Beethoven 600 1703 727 6525Complete test corpus 22242 40695 11611 40245

Table 3 shows the percentage of notes spelt correctly by each algorithm foreach composer. The bottom row of this table shows the total percentage of notesin the corpus spelt correctly by each algorithm. As shown in this table, overall,the percentage of notes spelt correctly is largest for the ps13 algorithm.

Table 3. Percentage of notes spelt correctly by each algorithm for each composer

Cam LH ps13 TemCorelli 99.53% 99.62% 99.90% 99.98%Vivaldi 99.19% 97.53% 99.33% 97.81%

Telemann 99.33% 99.26% 99.46% 99.88%Bach 97.87% 97.92% 99.45% 99.81%

Handel 99.57% 99.26% 99.48% 99.71%Marcello 99.97% 99.86% 99.53% 99.83%

Haydn 98.23% 92.71% 99.03% 92.45%Mozart 98.66% 94.10% 98.69% 88.48%

Beethoven 98.77% 96.50% 98.51% 86.59%Complete test corpus 98.71% 97.65% 99.33% 97.67%

McNemar’s test [7] was then used to determine whether the differences be-tween the scores achieved by the algorithms were statistically significant.3 Theresults of this analysis are shown in Table 4. In this table, each value in a col-umn headed ‘p-value’ gives the statistical significance (expressed as a probabilitycomputed using McNemar’s test) of the difference in performance between thealgorithms to the left and right of the value in the table. For example, the valuein the seventh column and first row of Table 4 indicates that if the algorithmsof Longuet-Higgins and Cambouropoulos would actually make an equal num-ber of note errors when run over the whole population of works represented bythe test corpus, then the probability of getting a difference in scores at leastas great as that observed in this study would be 0.0765. The bottom row ofthis table shows that, overall, ps13 spelt very significantly more notes correctlythan Cambouropoulos’s algorithm (p < 0.0001), which in turn spelt very signifi-cantly more notes correctly than Temperley’s algorithm (p < 0.0001). However,the percentage of notes spelt correctly by Temperley’s algorithm (97.67%) andLonguet-Higgins’s algorithm (97.65%) did not differ significantly (p = 0.0954).(A difference is considered significant if p ≤ 0.05.)

Table 4. Statistical significance of differences between note error rates of algorithmsfor each composer

Composer Best p-value 2nd best p-value 3rd best p-value Worst

Corelli Tem < 0.0001 ps13 < 0.0001 LH 0.0765 CamVivaldi ps13 < 0.0001 Cam < 0.0001 Tem < 0.0001 LH

Telemann Tem < 0.0001 ps13 < 0.0001 Cam 0.0736 LHBach Tem < 0.0001 ps13 < 0.0001 LH 0.0352 Cam

Handel Tem < 0.0001 Cam < 0.0001 ps13 < 0.0001 LHMarcello Cam 0.1797 LH 0.7388 Tem 0.0389 ps13

Haydn ps13 < 0.0001 Cam < 0.0001 LH 0.0348 TemMozart ps13 0.3665 Cam < 0.0001 LH < 0.0001 Tem

Beethoven Cam < 0.0001 ps13 < 0.0001 LH < 0.0001 TemComplete test corpus ps13 < 0.0001 Cam < 0.0001 Tem 0.0954 LH

3 I would like to thank Marcus Pearce for suggesting the use of McNemar’s test.

The graph in Fig. 10 shows the percentage of notes spelt correctly by each al-gorithm for each composer, the composers being placed along the horizontal axisin order of birth year. This graph suggests that the algorithms of Temperley andLonguet-Higgins perform significantly worse on the classical composers (Haydn,Mozart and Beethoven) than they do on the baroque composers. However, norigorous statistical test has yet been carried out to determine if this is indeedthe case. (Comparing the performance of a particular algorithm on the differentcomposers is complicated by the fact that each composer is not represented byan equal share of notes in the corpus.)

The graph also seems to show that ps13 performs more consistently acrossthe different composers and styles than the other algorithms.

Fig. 10. Graph showing the percentage of notes spelt correctly by each algorithm foreach composer, with the composers arranged along the X-axis in order of birth

13 Comparison of algorithms with respect to number ofintervals spelt correctly

When the lists of errors generated by the algorithms were examined, it wasobserved that, in some cases, many errors were the result of large segmentsof the music simply being transposed up or down by a diminished second. Inother words, a single incorrect interval between two notes resulted in a wholesegment of notes following the incorrect interval being spelt incorrectly. Thealgorithms were therefore also compared with respect to the number of intervals

spelt correctly. Table 5 shows the number of intervals in works by each composerin the corpus.

Table 5. The number of intervals in works by each composer in the test corpus

Corelli Vivaldi Telemann Bach Handel Marcello Haydn Mozart Beethoven Complete31302 223516 89385 626439 449292 2959 84648 172041 48649 1728231

Table 6 shows the percentage of intervals spelt correctly by each algorithmfor each composer and Table 7 shows the number of intervals spelt incorrectlyby each algorithm for each composer.

Table 6. Percentage of intervals spelt correctly by each algorithm for each composer


Telemann 99.19% 99.50% 99.12% 99.77%Bach 97.97% 99.07% 99.28% 99.67%

Handel 99.44% 99.62% 99.55% 99.82%Marcello 99.93% 99.73% 99.12% 99.73%

Haydn 97.93% 97.99% 98.47% 98.13%Mozart 98.49% 98.41% 98.28% 98.33%


Table 7. Number of intervals spelt incorrectly by each algorithm for each composer


Telemann 727 450 788 204Bach 12697 5852 4507 2074

Handel 2518 1703 2030 822Marcello 2 8 26 8

Haydn 1750 1700 1298 1579Mozart 2596 2734 2955 2874


From the graph in Fig. 11 it is clear that the algorithms of Longuet-Higginsand Temperley were noticeably better at spelling intervals correctly than theywere at spelling notes correctly. In particular, this graph shows that most of thenote spelling errors made by Longuet-Higgins and Temperley’s algorithms onthe music of Haydn, Mozart and Beethoven were due to whole segments beingspelt a diminished second away from the correct spelling.

Fig. 11. Graph showing the percentage of intervals spelt correctly by each algorithmfor each composer, with the composers arranged along the X-axis in order of birth

Again, the statistical significance of the differences between the performancesof the algorithms were analysed using McNemar’s test and the results are shownin Table 8. The bottom line of this table reveals that, with respect to the numberof intervals spelt correctly, Temperley’s algorithm performs significantly betteron this test corpus than the other three algorithms.

Table 8. Statistical significance of differences between interval error rates of algorithmsfor each composer

Composer Best p-value 2nd best p-value 3rd best p-value WorstCorelli Tem < 0.0001 ps13 0.0068 LH < 0.0001 CamVivaldi Tem < 0.0001 LH < 0.0001 ps13 0.1077 Cam

Telemann Tem < 0.0001 LH < 0.0001 Cam 0.0029 ps13Bach Tem < 0.0001 ps13 < 0.0001 LH < 0.0001 Cam

Handel Tem < 0.0001 LH < 0.0001 ps13 < 0.0001 CamMarcello Cam 0.0577, 0.0577 LH,Tem 0.001, 0.002 ps13

Haydn ps13 < 0.0001 Tem 0.0028 LH 0.2416 CamMozart Cam 0.0584 LH 0.0143 Tem 0.2459 ps13

Beethoven ps13 0.041 Cam 0.0252 LH < 0.0001 TemComplete test corpus Tem < 0.0001 ps13 0.0637 LH < 0.0001 Cam

Nonetheless, the graph in Fig. 11 suggests that, even in terms of the numberof intervals spelt correctly, all the algorithms perform worse on the classicalmusic than on the baroque music.

14 Comparison of algorithms with respect to totalnumber of intervals and notes spelt correctly

Finally, to get a measure of the overall performance of the algorithms,they wereevaluated in terms of the total number of notes and intervals spelt correctly.Table 9 shows the number of notes and intervals in works by each composer inthe test corpus.

Table 9. Number of intervals and notes in works by each composer in the test corpus

Corelli Vivaldi Telemann Bach Handel Marcello Haydn Mozart Beethoven Complete62692 447194 178927 1253522 899085 5921 169330 344138 97308 3458117

Table 10 shows the number of intervals and notes spelt incorrectly. Table 11shows the percentage of notes and intervals spelt correctly for each algorithmand each composer. Table 12 shows the statistical significance of the differencesbetween the algorithms with respect to the percentage of notes and intervalsspelt correctly (computed using McNemar’s test). The bottom line in this tableshows that, when evaluated in this way, ps13 performs significantly better thanthe other algorithms on this test corpus. Finally, the graph in Fig. 12 shows thepercentage of notes and intervals spelt correctly by each algorithm for each com-poser. This graph suggests that ps13 is less affected than the other algorithmsby the style and period of the music.

Table 10. The number of intervals and notes incorrectly spelt by each algorithm foreach composer


Telemann 1331 1115 1269 315Bach 26044 18874 7957 3240

Handel 4447 5045 4369 2131Marcello 3 12 40 13

Haydn 3247 7874 2121 7970Mozart 4896 12881 5205 22706


15 Conclusions and further work

This paper presents the results of a study in which four pitch spelling algorithmswere compared on a large corpus of tonal music. The algorithms compared werethose of Cambouropoulos, Temperley and Longuet-Higgins, together with the

Table 11. The percentage of intervals and notes spelt correctly by each algorithm foreach composer


Telemann 99.26% 99.38% 99.29% 99.82%Bach 97.92% 98.49% 99.37% 99.74%

Handel 99.51% 99.44% 99.51% 99.76%Marcello 99.95% 99.80% 99.32% 99.78%

Haydn 98.08% 95.35% 98.75% 95.29%Mozart 98.58% 96.26% 98.49% 93.40%


Table 12. Statistical significance of differences between number of notes and intervalsspelt correctly by algorithms for each composer

Composer Best p-value 2nd best p-value 3rd best p-value WorstCorelli Tem < 0.0001 ps13 < 0.0001 LH < 0.0001 CamVivaldi ps13 < 0.0001 Cam < 0.0001 Tem < 0.0001 LH

Telemann Tem < 0.0001 LH < 0.0001 ps13 0.8047 CamBach Tem < 0.0001 ps13 < 0.0001 LH < 0.0001 Cam

Handel Tem < 0.0001 ps13 0.3702 Cam < 0.0001 LHMarcello Cam 0.0201 LH 0.8414 Tem 0.0002 ps13

Haydn ps13 < 0.0001 Cam < 0.0001 LH 0.4477 TemMozart Cam 0.0007 ps13 < 0.0001 LH < 0.0001 Tem

Beethoven Cam 0.1672 ps13 < 0.0001 LH < 0.0001 TemComplete test corpus ps13 < 0.0001 Cam < 0.0001 Tem < 0.0001 LH

Fig. 12. Graph showing the percentage of intervals and notes spelt correctly by eachalgorithm for each composer, with the composers arranged along the X-axis in orderof birth

author’s own ps13 algorithm. When the algorithms were evaluated in termsof the number of notes in this corpus that they spelt correctly, it was foundthat the author’s ps13 algorithm performed significantly better than the otheralgorithms, correctly spelling 99.33% of the notes in the corpus correctly. Whileall four algorithms performed well on the baroque music in the test corpus,it was found that the algorithms that were based on the circle of fifths (i.e.,those of Temperley and Longuet-Higgins) performed less well than the otheralgorithms on the classical music in the corpus. However, when the algorithmswere evaluated in terms of the number of intervals spelt correctly, it was foundthat all the algorithms performed very well across all styles, with Temperley’salgorithm performing best, correctly spelling 99.45% of the intervals in the corpuscorrectly. When the notes and intervals were taken together, it was found thatps13 performed best overall, correctly spelling 99.25% of the notes and intervalscorrectly.

It would be interesting to extend this study by testing these and other al-gorithms on other corpora containing works in a wider variety of tonal stylesincluding, e.g., romantic, impressionist, rock and jazz.

Krumhansl [10, p. 79] claims that “once a key (or key region) has been de-termined, the correct spellings of the tones will be able to be determined inmost cases”. Both Longuet-Higgins’s algorithm and the author’s ps13 algorithm(implicitly) perform key-finding as part of the pitch-spelling process. However,the two algorithms give quite different results when run on the same test cor-pora. This demonstrates that there are various plausible ways of using the key-structure of a passage to determine pitch names and it is not at all obvious whichof these methods will give the best results. Krumhansl’s claim needs to be testedby building complete pitch spelling algorithms based on various key-finding al-gorithms and comparing the performance of these algorithms with others suchas those described in this paper.

Finally, the mistakes made by the algorithms in these experiments need tobe analysed in more depth in order to determine if any further improvementscan be made. Also, it might be interesting to carry out an in-depth study todetermine whether or not the algorithms are consistent with what is knownabout the perception and cognition of pitch structure in tonal music.

Acknowledgements

While carrying out the research reported in this paper, the author was fundedby the EPSRC (grant numbers GR/N08049/01 and GR/S17253/01). The authorwould also like to thank Geraint Wiggins, Tim Crawford and Marcus Pearce fortheir kind help and support.

References

1. Emilios Cambouropoulos. A general pitch interval representation: Theory andapplications. Journal of New Music Research, 25:231–251, 1996.

2. Emilios Cambouropoulos. Towards a General Computational Theory of MusicalStructure. PhD thesis, University of Edinburgh, February 1998.

3. Emilios Cambouropoulos. From MIDI to traditional musical nota-tion. In Proceedings of the AAAI 2000 Workshop on Artificial Intel-ligence and Music, 17th National Conference on Artificial Intelligence(AAAI’2000), 30 July–3 August, Austin, TX., 2000. Available online atftp://ftp.ai.univie.ac.at/papers/oefai-tr-2000-15.pdf.

4. Emilios Cambouropoulos. Automatic pitch spelling: From numbersto sharps and flats. In VIII Brazilian Symposium on Computer Mu-sic (SBC&M 2001), Fortaleza, Brazil, 2001. Available online atftp://ftp.ai.univie.ac.at/papers/oefai-tr-2001-12.pdf.

5. Emilios Cambouropoulos. Pitch spelling: A computational model. Music Percep-tion, 2002. To appear.

6. Manuel Davy and Simon J. Godsill. Bayesian harmonic models formusical signal analysis (with discussion). In J. M. Bernardo, J. O.Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics,volume VII. Oxford University Press, 2003. Draft available online athttp://www-sigproc.eng.cam.ac.uk/~sjg/papers/02/harmonicfinal2.ps.

7. T. G. Dietterich. Approximate statistical tests for comparing supervised classifi-cation learning algorithms. Neural Computation, 10(7):1895–1924, 1998.

8. Walter B. Hewlett. MuseData: Multipurpose representation. In Eleanor Selfridge-Field, editor, Beyond MIDI: The Handbook of Musical Codes, pages 402–447. MITPress, Cambridge, MA., 1997.

9. Carol L. Krumhansl and E. J. Kessler. Tracing the dynamic changes in perceivedtonal organisation in a spatial representation of musical keys. Psychological Review,89:334–368, 1982.

10. Carol L. Krumhansl. Cognitive Foundations of Musical Pitch, volume 17 of OxfordPsychology Series. Oxford University Press, New York and Oxford, 1990.

11. Fred Lerdahl and Ray Jackendoff. A Generative Theory of Tonal Music. MITPress, Cambridge, MA, 1983.

12. H. Christopher Longuet-Higgins. The perception of melodies. Nature,263(5579):646–653, 1976.

13. H. Christopher Longuet-Higgins. The perception of melodies. In H. ChristopherLonguet-Higgins, editor, Mental Processes: Studies in Cognitive Science, pages105–129. British Psychological Society/MIT Press, London, England and Cam-bridge, Mass., 1987.

14. H. Christopher Longuet-Higgins. The perception of melodies. In Stephan M.Schwanauer and David A. Levitt, editors, Machine Models of Music, pages 471–495. M.I.T. Press, Cambridge, Mass., 1993.

15. David Meredith, Kjell Lemstrom, and Geraint A. Wiggins. Algorithms for discov-ering repeated patterns in multidimensional representations of polyphonic music.Journal of New Music Research, 31(4):321–345, 2002. Draft available online athttp://www.titanmusic.com/papers/public/siajnmr submit 2.pdf.

16. David Meredith. Method of computing the pitch names of notes in MIDI-like music representations, 2003. Patent filing submitted to UK Patent Of-fice on 11 April 2003. Application number 0308456.3. Draft available online athttp://www.titanmusic.com/papers.html.

17. Walter Piston. Harmony. Victor Gollancz Ltd., London, 1978. Revised and ex-panded by Mark DeVoto.

18. M.D. Plumbley, S. Abdallah, J. Bello, M. E. Davies, G. Monti, and M. Sandler.Automatic music transcription and audio source separation. Cybernetics and Sys-tems, 33(6):603–627, 2002.

19. David Temperley. An algorithm for harmonic analysis. Music Perception, 15(1):31–68, 1997.

20. David Temperley. The Cognition of Basic Musical Structures. MIT Press, Cam-bridge, MA, 2001.

21. Paul J. Walmsley. Signal Separation of Musical Instruments. PhD thesis, SignalProcessing Group, Department of Engineering, University of Cambridge, 2000.

comparing pitch spelling algorithms on a large corpus … · comparing pitch spelling algorithms on...

Documents