Key, Chord, and Rhythm Tracking of Popular Music Recordings

Arun Shenoy and Ye Wang
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
[email protected] [email protected]

In this article, we propose a framework to analyze a musical audio signal (sampled from a popular music CD) and determine its key, provide usable chord transcriptions, and obtain the hierarchical rhythm-structure representation comprising the quarter-note, half-note, and whole-note (or measure) levels. This framework is just one specific aspect of the broader field of content-based analysis of music. There are many useful applications of content-based analysis of musical audio, most of which are not yet fully realized. One of these is automatic music transcription, which involves the transformation of musical audio into a symbolic representation such as MIDI or a musical score, which, in principle, could then be used to recreate the musical piece (e.g., Plumbley et al. 2002). Another application lies in the field of music information retrieval, that is, simplifying interaction with large databases of musical multimedia by annotating audio data with information that is useful for search and retrieval (e.g., Martin et al. 1998). Two other applications are structured audio and emotion detection in music. In the case of structured audio, we are interested in transmitting sound by describing rather than compressing it (Martin et al. 1998). Here, content analysis could be used to partly automate the creation of this description by the automatic extraction of various musical constructs from the audio. Regarding emotion detection, Hevner (1936) carried out experiments that substantiated the hypothesis that music inherently carries emotional meaning.
Huron (2000) has pointed out that, because the preeminent functions of music are social and psychological, emotion could serve as a very useful measure for the characterization of music in information-retrieval systems. The influence of musical chords on listeners' emotion has been demonstrated by Sollberger et al. (2003). Whereas we would expect human listeners to be reasonably successful at general auditory scene analysis, it is still a challenge for computers to perform such tasks. Even simple human acts of cognition, such as tapping the foot to the beat or swaying in time with the music, are not easily reproduced by a computer program. A brief review of audio analysis as it relates to music, followed by case studies of recently developed systems that analyze specific aspects of music, has been presented by Dixon (2004). The landscape of music-content processing technologies is discussed in Aigrain (1999). The current article does not present new audio signal-processing techniques for content analysis, instead building a framework from existing techniques. However, it does represent a unique attempt at integrating harmonic and metric information within a unified system in a mutually informing manner. Although the detection of individual notes constitutes low-level music analysis, it is often difficult for the average listener to identify individual notes in music. Rather, what is perceived is the overall quality conveyed by the combination of notes to form chords. Chords are the harmonic description of music and, like melody and rhythm, could serve to capture the essence of the musical piece. Non-expert listeners tend to hear groups of simultaneous notes as chords. It can be quite difficult to identify whether or not a particular pitch has been heard in a chord.
Furthermore, although a complete and accurate polyphonic transcription of all notes would undoubtedly yield the best results, it is often possible to classify music by genre, identify musical instruments by timbre, or segment music into sectional divisions without this low-level analysis.

Tonality is an important structural property of music, and it has been described by music theorists and psychologists as a hierarchical ordering of the pitches of the chromatic scale such that these notes are perceived in relation to one central and stable pitch, the tonic (Smith and Schmuckler 2000). This hierarchical structure is manifest in listeners' perceptions of the stability of pitches in tonal contexts. The key of a piece of music is specified by its tonic and one of two modes: major or minor. A system to determine the key of acoustic musical signals has been demonstrated in Shenoy et al. (2004) and will be summarized later in this article.

Computer Music Journal, 29:3, pp. 75-86, Fall 2005. © 2005 Massachusetts Institute of Technology.

Rhythm is another component that is fundamental to the perception of music. Measures of music divide a piece into time-counted segments, and time patterns in music are referred to in terms of meter. The beat forms the basic unit of musical time, and in a meter of 4/4 (also called common or quadruple time), there are four beats to a measure. Rhythm can be perceived as a combination of strong and weak beats. In a 4/4 measure consisting of four successive quarter notes, there is usually a strong beat on the first and third quarter notes and a weak beat on the second and fourth (Goto and Muraoka 1994). If the strong beat constantly alternates with the weak beat, the inter-beat interval (the temporal difference between two successive beats) would usually correspond to the temporal length of a quarter note. For our purposes, the strong and weak beats as defined above correspond to the alternating sequence of equally spaced phenomenal impulses that define the tempo of the music (Scheirer 1998). A hierarchical structure like the measure (bar-line) level can provide information more useful for modeling music at a higher level of understanding (Goto and Muraoka 1999). Key, chords, and rhythm are important expressive dimensions in musical performances. Although expression is necessarily contained in the physical features of the audio signal, such as amplitudes, frequencies, and onset times, it is better understood when viewed from a higher level of abstraction, that is, in terms of musical constructs (Dixon 2003) like the ones discussed here.

Related Work

Existing work in key determination has been restricted either to the symbolic domain (MIDI and score) or, in the audio domain, to single-instrument and simple polyphonic sounds (see, for example, Ng et al. 1996; Chew 2001, 2002; Povel 2002; Pickens 2003; Raphael and Stoddard 2003; Zhu and Kankanhalli 2003; Zhu et al. 2004). A system to extract the musical key from classical piano sonatas sampled from compact discs has been demonstrated by Pauws (2004). Here, the spectrum is first restructured into a chromagram representation in which the frequencies are mapped onto a limited set of twelve chroma values. This chromagram is used in a correlative comparison with the key profiles of all 24 Western musical keys, which represent the perceived stability of each chroma within the context of a particular musical key. The key profile that has the maximum correlation with the computed chromagram is taken as the most likely key. It has been mentioned that the performance of the system on recordings with other instrumentation or from other musical idioms is unknown. Additionally, factors in music perception and cognition, such as rhythm and harmony, are not modeled. These issues have been addressed in Shenoy et al. (2004) in extracting the key of popular music sampled from compact-disc audio.

Most existing work in the detection and recognition of chords has similarly been restricted to the symbolic domain or to single-instrument and simple polyphonic sounds (see, for example, Carreras et al. 1999; Fujishima 1999; Pardo and Birmingham 1999; Bello et al. 2000; Pardo and Birmingham 2001; Ortiz-Berenguer and Casajus-Quiros 2002; Pickens and Crawford 2002; Pickens et al. 2002; Klapuri 2003; Raphael and Stoddard 2003). A statistical approach to chord segmentation and recognition on real-world musical recordings, using Hidden Markov Models (HMMs) trained with the Expectation-Maximization (EM) algorithm, has been demonstrated in Sheh and Ellis (2003). This work draws on the prior idea of Fujishima (1999), who proposed a representation of audio termed "pitch-class profiles" (PCPs), in which the Fourier transform intensities are mapped to the twelve semitone classes (chroma). This system assumes that the chord sequence of an entire piece is known beforehand, which limits the technique to the detection of known chord progressions. Furthermore, because the training and testing data are restricted to the music of the Beatles, it is unclear how this system would perform on other kinds of music. A similar learning method has been used for chord detection in Maddage et al. (2004). Errors in chord detection have been corrected using knowledge from the key-detection system in Shenoy et al. (2004).



Much research in the past has also focused on rhythm analysis and the development of beat-tracking systems. However, most of this research does not consider higher-level beat structures above the quarter-note level, or it was restricted to the symbolic domain rather than working in real-world acoustic environments (see, for example, Allen and Dannenberg 1990; Goto and Muraoka 1994; Vercoe 1997; Scheirer 1998; Cemgil et al. 2001; Dixon 2001; Raphael 2001; Cemgil and Kappen 2003; Dixon 2003). Goto and Muraoka (1999) perform real-time, higher-level rhythm determination up to the measure level in musical audio without drum sounds, using onset times and chord-change detection for musical decisions. The provisional beat times are a hypothesis of the quarter-note level and are inferred by an analysis of onset times. The chord-change analysis is then performed at the quarter-note level and at the interpolated eighth-note level, followed by an analysis of how much the dominant frequency components included in chord tones and their harmonic overtones change in the frequency spectrum. Musical knowledge of chord changes is then applied to detect the higher-level rhythm structure at the half-note and measure (whole-note) levels. Goto has extended this work to apply to music with and without drum sounds, using drum patterns in addition to the onset times and chord changes discussed previously (Goto 2001). The drum-pattern analysis can be performed only if the musical audio signal contains drums, and hence a technique that measures the autocorrelation of the snare drum's onset times is applied. Based on the premise that drum sounds are noisy, the signal is determined to contain drum sounds only if this autocorrelation value is high enough. Based on the presence or absence of drum sounds, the knowledge of chord changes and/or drum patterns is selectively applied. The highest level of rhythm analysis, at the measure level (whole note/bar), is then performed using only musical knowledge of chord-change patterns. In both these works, chords are not recognized by name, and thus rhythm detection has been performed using chord-change probabilities rather than actual chord information.

System Description

A well-known algorithm used to identify the key of music is the Krumhansl-Schmuckler key-finding algorithm (Krumhansl 1990). The basic principle of the algorithm is to compare the input music with a prototypical major (or minor) scale-degree profile. In other words, the distribution of pitch classes in a piece is compared with an ideal distribution for each key. This algorithm and its variations (Huron and Parncutt 1993; Temperley 1999a, 1999b, 2002), however, could not be directly applied in our system, as they require a list of notes with note-on and note-off times, which cannot be directly extracted from polyphonic audio. Thus, the problem has been approached in Shenoy et al. (2004) at a higher level, by clustering individual notes to obtain the harmonic description of the music in the form of the 24 major/minor triads. Then, based on a rule-based analysis of these chords against the chords present in the major and minor keys, the key of the song is extracted. The audio has been framed into beat-length segments to extract metadata in the form of quarter-note detection of the music. The basis for this technique is to assist in the detection of chord structure, based on the musical knowledge that chords are more likely to change at beat times than at other positions (Goto and Muraoka 1999).
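The profile-correlation idea behind the Krumhansl-Schmuckler algorithm can be sketched in a few lines of Python. This is a hedged illustration rather than any published implementation: the profile values are the well-known Krumhansl-Kessler stability ratings, but the input here is an abstract 12-bin pitch-class distribution, and the function name `estimate_key` is our own.

```python
from math import sqrt

# Krumhansl-Kessler key profiles (Krumhansl 1990): perceived-stability
# ratings for the twelve chromatic scale degrees relative to the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']

def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def estimate_key(pc_distribution):
    """Correlate a 12-bin pitch-class distribution with all 24
    rotated major/minor profiles and return the best-matching key."""
    best_key, best_r = None, float('-inf')
    for tonic in range(12):
        for mode, profile in (('major', MAJOR_PROFILE),
                              ('minor', MINOR_PROFILE)):
            # Rotate the profile so its tonic falls on this pitch class.
            rotated = profile[-tonic:] + profile[:-tonic]
            r = _pearson(pc_distribution, rotated)
            if r > best_r:
                best_key, best_r = (PITCH_CLASSES[tonic], mode), r
    return best_key
```

A distribution concentrated on the scale degrees of C major correlates best with the C-major profile; the variations cited above differ mainly in the profile values and weighting, not in this correlation scheme.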

The beat-detection process first detects the onsets present in the music using sub-band processing (Wang et al. 2003). This technique of onset detection uses sub-band intensity to detect the perceptually salient percussive events in the signal. We draw on the prior ideas of beat tracking discussed in Scheirer (1998) and Dixon (2003) to determine the beat structure of the music. As a first step, all possible values of inter-onset intervals (IOIs) are computed. An IOI is defined as the time interval between any pair of onsets, not necessarily successive. Then, clusters of IOIs are identified, and a ranked set of hypothetical inter-beat intervals (IBIs) is created, based on the size of the corresponding clusters and by identifying integer relationships with other clusters. The latter process recognizes harmonic relationships between the beat (at the quarter-note level) and simple integer multiples of the beat (at the half-note and whole-note levels). An error margin of 25 msec has been set in the IBI to account for slight variations in the tempo. The highest-ranked value is returned as the IBI, from which we obtain the tempo, expressed as the inverse of the IBI. Patterns of onsets in clusters at the IBI are tracked, and beat information is interpolated into sections in which onsets corresponding to the beat might not be detected.
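The IOI-clustering and IBI-ranking steps described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' exact algorithm: `rank_ibi_hypotheses` is a hypothetical name, the clustering is a simple greedy scheme using the 25-msec tolerance, and "integer relationships" are approximated by support from clusters at double and quadruple the candidate interval (the half-note and whole-note levels).

```python
def rank_ibi_hypotheses(onsets, width=0.025):
    """Cluster pairwise inter-onset intervals (IOIs) and return the
    best inter-beat-interval (IBI) hypothesis, in seconds.
    `width` is the 25-msec tolerance mentioned in the text;
    the tempo then follows as 60.0 / ibi beats per minute."""
    # 1. All pairwise IOIs (not necessarily successive onsets),
    #    limited to the plausible beat range of 40-185 BPM.
    iois = [b - a for i, a in enumerate(onsets) for b in onsets[i + 1:]
            if 60 / 185 <= b - a <= 60 / 40]
    # 2. Greedy clustering: join an IOI to the first cluster whose
    #    running mean lies within `width`; otherwise start a new one.
    clusters = []  # each entry is [mean, count]
    for x in sorted(iois):
        for c in clusters:
            if abs(x - c[0]) <= width:
                c[0] = (c[0] * c[1] + x) / (c[1] + 1)
                c[1] += 1
                break
        else:
            clusters.append([x, 1])
    # 3. Rank clusters by size plus support from clusters at integer
    #    multiples (half-note and whole-note levels of the beat).
    def score(c):
        s = c[1]
        for other in clusters:
            for k in (2, 4):
                if abs(other[0] - k * c[0]) <= width:
                    s += other[1]
        return s
    return max(clusters, key=score)[0]
```

For onsets on a steady 0.5-sec grid, the 0.5-sec cluster wins because the 1.0-sec cluster reinforces it at the half-note level, giving a tempo of 120 BPM.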

The audio feature for harmonic analysis is a reduced spectral representation of each beat-spaced segment of the audio, based on a chroma transformation of the spectrum. This feature class represents the spectrum in terms of pitch class and forms the basis for the chromagram (Wakefield 1999). The system takes into account the chord distribution across the diatonic major scale and the three types of minor scales (natural, harmonic, and melodic). Furthermore, the system has been biased by assigning relatively higher weights to the primary chords in each key (tonic, dominant, and subdominant). This concept has also been demonstrated in the Spiral Array, a mathematical model for tonality (Chew 2001, 2002).
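Chord recognition from a beat-synchronous chroma vector can be illustrated by simple template matching against the 24 major/minor triads. This is a hedged sketch of the general technique, not the authors' method: the binary templates, the dot-product score, and the function names are ours, and the key-dependent chord weighting described above is omitted for brevity.

```python
PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']

def triad_template(root, minor=False):
    """Binary 12-bin chroma template for a major or minor triad."""
    third = 3 if minor else 4  # minor third vs. major third
    t = [0.0] * 12
    for interval in (0, third, 7):  # root, third, perfect fifth
        t[(root + interval) % 12] = 1.0
    return t

# One template per triad: 12 roots x {major, minor} = 24 chords.
TEMPLATES = {(PITCH_CLASSES[r], m): triad_template(r, m == 'minor')
             for r in range(12) for m in ('major', 'minor')}

def recognize_chord(chroma):
    """Return the triad whose template has the largest dot product
    with the 12-bin chroma vector of one beat-spaced segment."""
    return max(TEMPLATES,
               key=lambda k: sum(c * t
                                 for c, t in zip(chroma, TEMPLATES[k])))
```

A chroma vector with energy on C, E, and G matches the C-major template (score 3) better than A minor or E minor (score 2 each), which share only two tones with it.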

It is observed that the chord-recognition accuracy of the key system, though sufficient to determine the key, is not sufficient to provide usable chord transcriptions or to determine the hierarchical rhythm structure across the entire duration of the music. Thus, in this article, we enhance the four-step key-determination system with three post-processing stages that allow us to perform these two tasks with greater accuracy. The block diagram of the proposed framework is shown in Figure 1.

System Components

We now discuss the three post-key processing components of the system. These consist of a first phase of chord-accuracy enhancement, rhythm-structure determination, and a second phase of chord-accuracy enhancement.

Chord Accuracy Enhancement (Phase 1)

In this step, we aim to increase the accuracy of chord detection. For each audio frame, we perform two checks.

Check 1: Eliminate Chords Not in the Key of the Song

Here, we perform a rule-based analysis of the detected chord to see if it exists in the key of the song. If not, we check for the presence of the major chord of the same pitch class (if the detected chord is minor) and vice versa. If that chord is present in the key, we replace the erroneous chord with it. This is because the major and minor chords of a pitch class differ only in the position of their mediants. The chord-detection approach often suffers from recognition errors that result from overlaps of harmonic components of individual notes in the spectrum; these are quite difficult to avoid. Hence, there is a possibility of error in the distinction between the major and minor chords for a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous, and this usage is a simplification adopted by this system. If this check fails, we eliminate the chord.

Figure 1. System framework.
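Check 1 can be sketched as a small function. This is a hedged illustration under stated assumptions: `check_one` is a hypothetical name, chords are (root, mode) pairs, and the key is represented as a simplified set of diatonic triads (the system's actual key model, which also covers the three minor scales, is richer).

```python
def check_one(chord, key_chords):
    """chord: (root, 'major'|'minor'); key_chords: set of such pairs.
    Returns the (possibly corrected) chord, or None to eliminate it."""
    if chord in key_chords:
        return chord
    root, mode = chord
    # Try the parallel major/minor chord on the same pitch class,
    # since major/minor confusion is the most common error.
    parallel = (root, 'minor' if mode == 'major' else 'major')
    if parallel in key_chords:
        return parallel
    return None  # check failed: eliminate the chord

# Triads diatonic to C major (diminished triad on B omitted, since
# the system considers only the 24 major/minor triads).
C_MAJOR_CHORDS = {('C', 'major'), ('D', 'minor'), ('E', 'minor'),
                  ('F', 'major'), ('G', 'major'), ('A', 'minor')}
```

For example, a detected E major in a C-major song is coerced to E minor, while an F-sharp major, which has no parallel chord in the key either, is eliminated.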

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the current frame, then the chord in the current frame is likely to be incorrect. In these cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
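Check 2 amounts to a simple temporal smoothing over the beat-spaced chord sequence. The sketch below is a minimal interpretation of the rule as stated; `check_two` is a hypothetical name, and `None` marks frames where no chord was detected.

```python
def check_two(chords):
    """chords: list of per-frame chord labels (None = undetected).
    Coerce any frame that disagrees with two agreeing neighbors."""
    out = list(chords)
    for i in range(1, len(out) - 1):
        left, right = out[i - 1], out[i + 1]
        # If both neighbors agree and the middle frame differs
        # (or is missing), the neighbors' chord is more plausible.
        if left is not None and left == right and out[i] != left:
            out[i] = left
    return out
```

This also fills in undetected frames, since a missing chord between two identical ones is treated the same way as a disagreeing one.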

Rhythm Structure Determination

Next, our system checks for the start of measures, based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (which is by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with a dotted line in Figure 3c, and interpolate an additional measure line at the start of the fifth frame, to give us the required result as seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the pattern of actual measure boundaries. Finally, track the boundary locations in the detected pattern and interpolate missing boundary positions across the rest of the song.
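These steps can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' exact procedure: candidate boundaries are frames starting a run of four identical chords, the "patterns in multiples of four" are modeled as the boundary phase (offset modulo 4), and the winning phase is then laid down across the whole song, which both eliminates spurious boundaries and interpolates missing ones.

```python
def find_measure_starts(chords):
    """chords: list of per-frame (quarter-note) chord labels.
    Returns frame indices of estimated measure starts in 4/4."""
    # 1. Candidate boundaries: frames that begin a run of four
    #    identical chords.
    candidates = [i for i in range(len(chords) - 3)
                  if chords[i] == chords[i + 1]
                  == chords[i + 2] == chords[i + 3]]
    if not candidates:
        return []
    # 2. Vote for the boundary phase: candidates at the same offset
    #    mod 4 support the same hypothetical measure grid.
    votes = {}
    for i in candidates:
        votes[i % 4] = votes.get(i % 4, 0) + 1
    phase = max(votes, key=votes.get)
    # 3. Interpolate boundaries across the song at the winning phase.
    return list(range(phase, len(chords), 4))
```

On the Figure 3-style example of a chord sustained over two measures, the extra candidates inside the sustained chord lose the phase vote, and the missing boundary at the fifth frame is interpolated.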

Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.


Figure 2. Chord accuracy enhancement (Phase 1).

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of measures than at other positions, such as half-note times. Hence, if three of the chords in a measure are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we only implement Check 3b, as illustrated in Figure 4 with an example. This check is required because, in the case of a minor key, we can have both the major and minor chords of the same pitch class present in the song. A classic example of this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor rarely occurs as such in popular-music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).
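Check 3b can be sketched on a single four-beat measure. This is a hedged illustration of the rule as stated, with a hypothetical function name; chords are plain labels, and `None` marks undetected frames.

```python
def check_three_b(measure):
    """measure: list of four per-beat chord labels (None = undetected).
    If some chord appears in both halves of the measure (beats 0-1
    and beats 2-3), coerce the whole measure to that chord."""
    first_half, second_half = set(measure[:2]), set(measure[2:])
    common = (first_half & second_half) - {None}
    if common:
        return [common.pop()] * 4
    return list(measure)
```

In the "Hotel California" scenario above, a measure detected as E minor, E major, E minor, E minor has E minor in both halves, so the stray E major is overwritten, even though Check 1 accepts both chords in B minor.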

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Song (Year, Artist - Title) | Chord Detection (% acc.) | Original Key | Detected Key | Chord Accuracy Enhancement I (% acc.) | Successful Measure Detection | Chord Accuracy Enhancement II (% acc.)
1 | (1965) Righteous Brothers - Unchained Melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | (1977) Bee Gees - Stayin' Alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | (1977) Eric Clapton - Wonderful Tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | (1977) Fleetwood Mac - You Make Lovin' Fun | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | (1979) Eagles - I Can't Tell You Why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | (1984) Foreigner - I Want to Know What Love Is | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | (1986) Bruce Hornsby - The Way It Is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | (1989) Chris Rea - Road to Hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | (1991) REM - Losing My Religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | (1991) U2 - One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | (1992) Michael Jackson - Heal the World | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | (1993) MLTR - Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | (1995) Coolio - Gangsta's Paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | (1996) Backstreet Boys - As Long as You Love Me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | (1996) Joan Osborne - One of Us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | (1997) Bryan Adams - Back to You | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | (1997) Green Day - Time of Your Life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | (1997) Hanson - Mmmbop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | (1997) Savage Garden - Truly Madly Deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | (1997) Spice Girls - Viva Forever | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | (1997) Tina Arena - Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | (1998) Jennifer Paige - Crush | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | (1998) Natalie Imbruglia - Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | (1999) Santana - Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | (2000) Corrs - Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | (2000) Craig David - Walking Away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | (2000) Nelly Furtado - Turn Off the Light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | (2000) Westlife - Seasons in the Sun | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | (2001) Shakira - Whenever, Wherever | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | (2001) Train - Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85

Overall accuracy at each stage: chord detection 48.13%; key correct for 28/30 songs; after Enhancement I 63.20%; measures correct for 28/30 songs; after Enhancement II 78.91%.

It can be observed that the average chord-detection accuracy across the length of the entire music, as performed by the chord-detection step (module 3 in Figure 1), is relatively low, at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 1), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves on average by 15.07% on applying Chord Accuracy Enhancement, Phase 1 (module 5 in Figure 1). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 1) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement, Phase 2 (module 7 in Figure 1) yields a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop on two songs (song numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout its length. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present in both the key of C major and its relative minor, A minor.

The use of a weighted cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization that is performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied would cause the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity (as seen above), based on an analysis of sheet music, can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of other chords and key changes.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), and so on.

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
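This confusion can be made concrete with binary triad templates (a hypothetical illustration, using the same dot-product matching sketched earlier): when the root C is weakly weighted and the seventh B is strong, the E-minor template outscores the C-major one.

```python
# Pitch-class order: C C# D D# E F F# G G# A A# B
C_MAJOR_T = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # C, E, G
E_MINOR_T = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]  # E, G, B

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Chroma of a Cmaj7 (C, E, G, B) in which the root C is weak
# because its energy was masked by overlapping harmonics:
cmaj7_chroma = [0.2, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 1.0]
# E-minor score: 3.0; C-major score: 2.2 -> misclassified as E minor.
```

The numeric weights here are invented for illustration; the point is only that three strongly weighted shared tones can dominate a weakly weighted root.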

Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus/refrain) slightly different from the previous parts. This is effected by transposing the song to higher semitones, usually up a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
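The half-step transposition mentioned above amounts to rotating every chord root by one semitone. A minimal sketch, applied to chord labels (the label convention is ours, not the article's):

```python
# Transpose chord labels up by a number of semitones, as in the
# end-of-song key shifts described above. Label format is illustrative.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(chord, semitones=1):
    """Shift a chord label such as 'F#m' up by the given number of semitones."""
    root = chord[:2] if len(chord) > 1 and chord[1] == "#" else chord[:1]
    quality = chord[len(root):]
    return NOTES[(NOTES.index(root) + semitones) % 12] + quality

print([transpose(c) for c in ["C", "Am", "F", "G"]])  # -> ['C#', 'A#m', 'F#', 'G#']
```

A system aware of such shifts could re-run chord matching against the transposed chord set for the final section, rather than rejecting those chords outright.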

Another point to be noted here is the chord substitution/simplification of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.
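The simplification used in the evaluation can be sketched as a label reduction. The label format (e.g., "Cmaj7", "Am7") is our assumption, not the article's notation:

```python
# Reduce an extended chord label to its basic major or minor triad,
# as done when evaluating against sheet-music annotations.
import re

def simplify(chord):
    """Map e.g. 'Cmaj7' -> 'C', 'Am7' -> 'Am', 'G7' -> 'G'."""
    # Try 'maj'/'min' before the bare 'm' so 'Cmaj7' is not read as C minor.
    m = re.match(r"^([A-G]#?)(maj|min|m)?", chord)
    root, quality = m.group(1), m.group(2)
    return root + ("m" if quality in ("m", "min") else "")

print(simplify("Cmaj7"))  # -> C
print(simplify("Am7"))    # -> Am
```

Note the ordering of the alternation in the regular expression: matching the bare "m" first would misread "Cmaj7" as a minor chord.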

Rhythm Tracking Observations

We conjecture the errors in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music and not handled by our system could be incorrectly classified into one of the 24 major/minor triads, owing to complexities in polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, resulting in an incorrect pattern of measure boundaries having the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.

Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. To this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

Figure 5. Key shift.

Key Detection

Our technique assumes that the key of the song is constant throughout the duration of the song. Owing to the properties of the relative major/minor key combinations, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. However, the same cannot be said about other kinds of key changes in the music. This is because such key changes are quite difficult to track, as there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B-minor key in the verse to an E-major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could likely serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in the future.

Chord Detection

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable" sounding one (a "consonance").

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onset of percussive events (Wang et al. 2003). Future effort will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE - The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." In Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE - The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.


been demonstrated in Shenoy et al. (2004) and will be summarized later in this article.

Rhythm is another component that is fundamental to the perception of music. Measures of music divide a piece into time-counted segments, and time patterns in music are referred to in terms of meter. The beat forms the basic unit of musical time, and in a meter of 4/4 (also called common or quadruple time) there are four beats to a measure. Rhythm can be perceived as a combination of strong and weak beats. In a 4/4 measure consisting of four successive quarter notes, there is usually a strong beat on the first and third quarter notes and a weak beat on the second and fourth (Goto and Muraoka 1994). If the strong beat constantly alternates with the weak beat, the inter-beat interval (the temporal difference between two successive beats) would usually correspond to the temporal length of a quarter note. For our purpose, the strong and weak beats as defined above correspond to the alternating sequence of equally spaced phenomenal impulses that define the tempo of the music (Scheirer 1998). A hierarchical structure like the measure (bar-line) level can provide information more useful for modeling music at a higher level of understanding (Goto and Muraoka 1999). Key, chords, and rhythm are important expressive dimensions in musical performances. Although expression is necessarily contained in the physical features of the audio signal, such as amplitudes, frequencies, and onset times, it is better understood when viewed from a higher level of abstraction, that is, in terms of musical constructs (Dixon 2003) like the ones discussed here.

Related Work

Existing work in key determination has been restricted to either the symbolic domain (MIDI and score) or, in the audio domain, single-instrument and simple polyphonic sounds (see, for example, Ng et al. 1996; Chew 2001, 2002; Povel 2002; Pickens 2003; Raphael and Stoddard 2003; Zhu and Kankanhalli 2003; Zhu et al. 2004). A system to extract the musical key from classical piano sonatas sampled from compact discs has been demonstrated by Pauws (2004). Here, the spectrum is first restructured into a chromagram representation, in which the frequencies are mapped onto a limited set of twelve chroma values. This chromagram is used in a correlative comparison with the key profiles of all 24 Western musical keys, which represent the perceived stability of each chroma within the context of a particular musical key. The key profile that has the maximum correlation with the computed chromagram is taken as the most likely key. It has been mentioned that the performance of the system on recordings with other instrumentation or from other musical idioms is unknown. Additionally, factors in music perception and cognition, such as rhythm and harmony, are not modeled. These issues have been addressed in Shenoy et al. (2004) in extracting the key of popular music sampled from compact-disc audio.

Most existing work in the detection and recognition of chords has similarly been restricted to the symbolic domain or to single-instrument and simple polyphonic sounds (see, for example, Carreras et al. 1999; Fujishima 1999; Pardo and Birmingham 1999; Bello et al. 2000; Pardo and Birmingham 2001; Ortiz-Berenguer and Casajus-Quiros 2002; Pickens and Crawford 2002; Pickens et al. 2002; Klapuri 2003; Raphael and Stoddard 2003). A statistical approach to perform chord segmentation and recognition on real-world musical recordings, using Hidden Markov Models (HMMs) trained with the Expectation-Maximization (EM) algorithm, has been demonstrated in Sheh and Ellis (2003). This work draws on the prior idea of Fujishima (1999), who proposed a representation of audio termed "pitch-class profiles" (PCPs), in which the Fourier transform intensities are mapped to the twelve semitone classes (chroma). This system assumes that the chord sequence of an entire piece is known beforehand, which limits the technique to the detection of known chord progressions. Furthermore, because the training and testing data are restricted to the music of the Beatles, it is unclear how this system would perform for other kinds of music. A similar learning method has been used for chord detection in Maddage et al. (2004). Errors in chord detection have been corrected using knowledge from the key-detection system in Shenoy et al. (2004).
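The PCP idea, folding spectral energy into twelve semitone classes, can be sketched in a few lines. The frequency range and rounding rule below are our own choices, not those of Fujishima (1999):

```python
# A sketch of the pitch-class-profile (PCP) mapping: each FFT bin
# frequency is converted to the nearest semitone and its magnitude
# accumulated into one of 12 chroma bins (C = 0, ..., B = 11).
import math

def chroma_from_spectrum(magnitudes, sample_rate, fft_size,
                         fmin=55.0, fmax=2000.0):
    """Fold FFT magnitudes into a 12-bin pitch-class profile."""
    pcp = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / fft_size
        if fmin <= freq <= fmax:
            # Semitone distance from A440, expressed as a MIDI note number.
            midi = round(12 * math.log2(freq / 440.0)) + 69
            pcp[midi % 12] += mag
    return pcp

# A synthetic spectrum with a single peak near 440 Hz (the pitch class A):
sr, n = 44100, 4096
mags = [0.0] * (n // 2)
mags[round(440 * n / sr)] = 1.0
print(chroma_from_spectrum(mags, sr, n).index(1.0))  # -> 9 (pitch class A)
```

Real systems typically add magnitude weighting, tuning compensation, and octave-dependent smoothing on top of this basic fold.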


Much research in the past has also focused on rhythm analysis and the development of beat-tracking systems. However, most of this research does not consider higher-level beat structures above the quarter-note level, or it was restricted to the symbolic domain rather than working in real-world acoustic environments (see, for example, Allen and Dannenberg 1990; Goto and Muraoka 1994; Vercoe 1997; Scheirer 1998; Cemgil et al. 2001; Dixon 2001; Raphael 2001; Cemgil and Kappen 2003; Dixon 2003). Goto and Muraoka (1999) perform real-time, higher-level rhythm determination up to the measure level in musical audio without drum sounds, using onset times and chord-change detection for musical decisions. The provisional beat times are a hypothesis of the quarter-note level and are inferred by an analysis of onset times. The chord-change analysis is then performed at the quarter-note level and at the interpolated eighth-note level, followed by an analysis of how much the dominant frequency components included in chord tones and their harmonic overtones change in the frequency spectrum. Musical knowledge of chord change is then applied to detect the higher-level rhythm structure at the half-note and measure (whole-note) levels. Goto has extended this work to apply to music with and without drum sounds, using drum patterns in addition to the onset times and chord changes discussed previously (Goto 2001). The drum-pattern analysis can be performed only if the musical audio signal contains drums, and hence a technique that measures the autocorrelation of the snare drum's onset times is applied. Based on the premise that drum sounds are noisy, the signal is determined to contain drum sounds only if this autocorrelation value is high enough. Based on the presence or absence of drum sounds, the knowledge of chord changes and/or drum patterns is selectively applied. The highest level of rhythm analysis, at the measure level (whole note/bar), is then performed using only musical knowledge of chord-change patterns. In both these works, chords are not recognized by name, and thus rhythm detection has been performed using chord-change probabilities rather than actual chord information.

System Description

A well-known algorithm used to identify the key of music is the Krumhansl-Schmuckler key-finding algorithm (Krumhansl 1990). The basic principle of the algorithm is to compare the input music with a prototypical major (or minor) scale-degree profile. In other words, the distribution of pitch classes in a piece is compared with an ideal distribution for each key. This algorithm and its variations (Huron and Parncutt 1993; Temperley 1999a, 1999b, 2002), however, could not be directly applied in our system, as these require a list of notes with note-on and note-off times, which cannot be directly extracted from polyphonic audio. Thus, the problem has been approached in Shenoy et al. (2004) at a higher level, by clustering individual notes to obtain the harmonic description of the music in the form of the 24 major/minor triads. Then, based on a rule-based analysis of these chords against the chords present in the major and minor keys, the key of the song is extracted. The audio has been framed into beat-length segments to extract metadata in the form of quarter-note detection of the music. The basis for this technique is to assist in the detection of chord structure, based on the musical knowledge that chords are more likely to change at beat times than at other positions (Goto and Muraoka 1999).
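For reference, the Krumhansl-Schmuckler comparison itself can be sketched as follows. The profile values are the widely published Krumhansl-Kessler ratings; the toy pitch-class distribution at the end is our own, invented to emphasize the C-major scale degrees:

```python
# Sketch of Krumhansl-Schmuckler key finding: correlate a pitch-class
# distribution against the major/minor profiles rotated to all 24 keys.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def find_key(pc_distribution):
    best = None
    for root in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            rotated = [profile[(i - root) % 12] for i in range(12)]
            r = correlation(pc_distribution, rotated)
            if best is None or r > best[0]:
                best = (r, NOTES[root] + " " + mode)
    return best[1]

# Toy distribution weighting the C-major scale degrees:
dist = [5, 0, 3, 0, 4, 4, 0, 5, 0, 3, 0, 2]
print(find_key(dist))  # -> C major
```

As the text notes, the catch for our setting is obtaining `pc_distribution` at all: it presumes note-level durations, which polyphonic audio does not directly yield.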

The beat-detection process first detects the onsets present in the music using sub-band processing (Wang et al. 2003). This technique of onset detection is based on the sub-band intensity, to detect the perceptually salient percussive events in the signal. We draw on the prior ideas of beat tracking discussed in Scheirer (1998) and Dixon (2003) to determine the beat structure of the music. As a first step, all possible values of inter-onset intervals (IOIs) are computed. An IOI is defined as the time interval between any pair of onsets, not necessarily successive. Then, clusters of IOIs are identified, and a ranked set of hypothetical inter-beat intervals (IBIs) is created, based on the size of the corresponding clusters and by identifying integer relationships with other clusters. The latter process is to recognize harmonic relationships between the beat (at the quarter-note level) and simple integer multiples of the beat (at the half-note and whole-note levels). An error margin of 25 msec has been set in the IBI to account for slight variations in the tempo. The highest-ranked value is returned as the IBI, from which we obtain the tempo, expressed as an inverse value of the IBI. Patterns of onsets in clusters at the IBI are tracked, and beat information is interpolated into sections in which onsets corresponding to the beat might not be detected.
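The IOI-clustering step can be sketched as follows. This is a deliberately simplified version of the procedure described above: the 25-msec tolerance matches the stated error margin, but the clustering and ranking rules here are our own reductions, not the system's exact implementation:

```python
# Simplified inter-onset-interval (IOI) clustering: gather intervals
# between all onset pairs, cluster them within a tolerance, and rank
# clusters by support from integer-multiple clusters to hypothesize
# the inter-beat interval (IBI).
def estimate_ibi(onsets, tolerance=0.025):
    """Return the best-supported inter-beat interval, in seconds."""
    iois = [b - a for i, a in enumerate(onsets) for b in onsets[i + 1:]]
    clusters = []  # each entry: [center, member IOIs]
    for ioi in sorted(iois):
        for cluster in clusters:
            if abs(ioi - cluster[0]) <= tolerance:
                cluster[1].append(ioi)
                cluster[0] = sum(cluster[1]) / len(cluster[1])  # recenter
                break
        else:
            clusters.append([ioi, [ioi]])

    def support(cluster):
        # Credit from clusters at (near-)integer multiples of this center,
        # i.e., the half-note/whole-note relationships mentioned above.
        return sum(len(other[1]) for other in clusters
                   if abs(other[0] / cluster[0] - round(other[0] / cluster[0])) < 0.05)

    return max(clusters, key=support)[0]

# Onsets 0.5 sec apart (120 BPM), with one beat's onset missing:
onsets = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
print(estimate_ibi(onsets))  # -> 0.5
```

The missing onset at 1.5 sec does not disturb the estimate, because pairwise IOIs at multiples of the beat still reinforce the 0.5-sec cluster; the interpolation of beats into such gaps is the final step described in the text.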

The audio feature for harmonic analysis is a reduced spectral representation of each beat-spaced segment of the audio, based on a chroma transformation of the spectrum. This feature class represents the spectrum in terms of pitch class and forms the basis for the chromagram (Wakefield 1999). The system takes into account the chord distribution across the diatonic major scale and the three types of minor scales (natural, harmonic, and melodic). Furthermore, the system has been biased by assigning relatively higher weights to the primary chords in each key (the tonic, dominant, and subdominant). This concept has also been demonstrated in the Spiral Array, a mathematical model for tonality (Chew 2001, 2002).

It is observed that the chord-recognition accuracy of the key system, though sufficient to determine the key, is not sufficient to provide usable chord transcriptions or to determine the hierarchical rhythm structure across the entire duration of the music. Thus, in this article, we enhance the four-step key-determination system with three post-processing stages that allow us to perform these two tasks with greater accuracy. The block diagram of the proposed framework is shown in Figure 1.

System Components

We now discuss the three post-key-processing components of the system. These consist of a first phase of chord accuracy enhancement, rhythm structure determination, and a second phase of chord accuracy enhancement.

Chord Accuracy Enhancement (Phase 1)

In this step, we aim to increase the accuracy of chord detection. For each audio frame, we perform two checks.

Check 1: Eliminate Chords Not in the Key of the Song

Here, we perform a rule-based analysis of the detected chord to see if it exists in the key of the song. If not, we check for the presence of the major chord of the same pitch class (if the detected chord is minor) and vice versa. If present in the key, we replace the erroneous chord with this chord. This is because the major and minor chords of a pitch class differ only in the position of their mediants. The chord-detection approach often suffers from recognition errors that result from overlaps of harmonic components of individual notes in the spectrum; these are quite difficult to avoid. Hence, there is a possibility of error in the distinction between the major and minor chords for a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous, and this usage is a simplification made by this system. If this check fails, we eliminate the chord.

Figure 1. System framework.

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the current frame, then the chord in the current frame is likely to be incorrect. In these cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
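The two checks can be sketched on a list of per-frame chord labels. The chord names, the key's chord set, and the use of `None` for an eliminated or undetected chord are illustrative conventions of ours:

```python
# Sketch of Phase 1: Check 1 replaces an out-of-key chord with its
# parallel major/minor if that is in the key (else drops it); Check 2
# coerces a frame that disagrees with identical neighboring frames.
def phase1(frames, key_chords):
    out = []
    for chord in frames:                      # Check 1
        if chord is None or chord in key_chords:
            out.append(chord)
        else:
            parallel = chord[:-1] if chord.endswith("m") else chord + "m"
            out.append(parallel if parallel in key_chords else None)
    for i in range(1, len(out) - 1):          # Check 2
        if out[i - 1] == out[i + 1] and out[i - 1] is not None \
                and out[i] != out[i - 1]:
            out[i] = out[i - 1]
    return out

KEY_C = {"C", "Dm", "Em", "F", "G", "Am"}
print(phase1(["C", "Cm", "F", "A", "G", None, "G"], KEY_C))
# -> ['C', 'C', 'F', 'Am', 'G', 'G', 'G']
```

Here "Cm" is corrected to "C" and "A" to "Am" (mediant confusions), while the undetected frame between two G frames is filled in temporally.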

Rhythm Structure Determination

Next, our system checks for the start of measures, based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (which is by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We will illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with a dotted line in Figure 3c, and to interpolate an additional measure line at the start of the fifth frame, to give us the required result as seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the pattern of actual measure boundaries. Finally, track the boundary locations in the detected pattern and interpolate missing boundary positions across the rest of the song.
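The selection rule amounts to grouping candidate boundaries by their position modulo four and keeping the best-supported phase. A minimal sketch, with an illustrative chord sequence mirroring the sustained-chord example:

```python
# Sketch of measure-boundary selection: candidates are frames starting a
# run of four identical chords; candidates are grouped by phase (index
# mod 4), the best-supported phase wins, and boundaries are interpolated
# on that phase across the whole sequence.
def measure_boundaries(chords):
    candidates = [i for i in range(len(chords) - 3)
                  if len(set(chords[i:i + 4])) == 1]
    phases = [0, 0, 0, 0]
    for i in candidates:
        phases[i % 4] += 1
    best = max(range(4), key=lambda p: phases[p])
    return list(range(best, len(chords), 4))

# A chord sustained over two measures yields spurious mid-run candidates:
chords = ["C", "C", "C", "C", "C", "C", "C", "C", "F", "F", "F", "F"]
print(measure_boundaries(chords))  # -> [0, 4, 8]
```

The spurious candidates at frames 1, 2, and 3 fall on minority phases and are discarded, while the boundary at frame 4 (inside the sustained C run) is recovered by interpolation on the winning phase.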

Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.


Figure 2. Chord accuracy enhancement (Phase 1).

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of measures than at other positions of half-note times. Hence, if three of the chords in a measure are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we only implement Check 3b, as illustrated with an example in Figure 4. This check is required because, in the case of a minor key, we can have both the major and minor chords of the same pitch class present in the song. A classic example of this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor, however, rarely occurs as such in popular-music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).
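Check 3b itself is a small rule over one measure of four beat-synchronous chord labels. A minimal sketch (label names are illustrative; `None` marks an undetected beat):

```python
# Sketch of Check 3b: if some chord occurs in both halves of a four-beat
# measure, label the whole measure with that chord; otherwise leave the
# measure unchanged.
def check_3b(measure):
    first, second = set(measure[:2]), set(measure[2:])
    common = (first & second) - {None}
    if common:
        chord = common.pop()  # assumes at most one shared chord per measure
        return [chord] * 4
    return measure

print(check_3b(["Em", "E", "Em", "Em"]))  # -> ['Em', 'Em', 'Em', 'Em']
print(check_3b(["C", "C", "F", "F"]))     # -> ['C', 'C', 'F', 'F']
```

In the "Hotel California" situation above, a stray E major inside an E-minor measure is overwritten because E minor appears in both halves, which Check 1 alone could not catch.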

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Song Title | Chord Detection (% accuracy) | Original Key | Detected Key | Chord Accuracy Enhancement-I (% accuracy) | Successful Measure Detection | Chord Accuracy Enhancement-II (% accuracy)
1 | (1965) Righteous Brothers - Unchained melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | (1977) Bee Gees - Stayin' alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | (1977) Eric Clapton - Wonderful tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | (1977) Fleetwood Mac - You make lovin' fun | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | (1979) Eagles - I can't tell you why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | (1984) Foreigner - I want to know what love is | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | (1986) Bruce Hornsby - The way it is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | (1989) Chris Rea - Road to hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | (1991) REM - Losing my religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | (1991) U2 - One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | (1992) Michael Jackson - Heal the world | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | (1993) MLTR - Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | (1995) Coolio - Gangsta's paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | (1996) Backstreet Boys - As long as you love me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | (1996) Joan Osborne - One of us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | (1997) Bryan Adams - Back to you | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | (1997) Green Day - Time of your life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | (1997) Hanson - Mmmbop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | (1997) Savage Garden - Truly madly deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | (1997) Spice Girls - Viva forever | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | (1997) Tina Arena - Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | (1998) Jennifer Paige - Crush | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | (1998) Natalie Imbruglia - Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | (1999) Santana - Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | (2000) Corrs - Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | (2000) Craig David - Walking away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | (2000) Nelly Furtado - Turn off the light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | (2000) Westlife - Seasons in the sun | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | (2001) Shakira - Whenever wherever | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | (2001) Train - Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85
| Overall accuracy at each stage | 48.13 | | 28/30 songs | 63.20 | 28/30 songs | 78.91

It can be observed that the average chord-detection accuracy across the length of the entire piece of music achieved by the chord-detection step (module 3 in Figure 2) is relatively low, at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 2), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves on average by 15.07% on applying Chord Accuracy Enhancement (Phase 1; see module 5 in Figure 2). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 2) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement (Phase 2; see module 7 in Figure 2) makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (song numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present both in the key of C major and in its relative minor, A minor.

The use of a weighted cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique causes the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.
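The bias described above can be sketched numerically. The chord patterns, names, and histogram below are illustrative assumptions rather than the authors' actual vectors; the point is only that the minor-key pattern, as the union of triads from three scales, is longer, and its larger norm depresses the cosine score.

```python
import math

# Hypothetical binary key patterns over the 24 major/minor triads.
# The minor-key pattern unions the natural, harmonic, and melodic
# minor scales, so it contains more triads than the single major scale.
ROOTS = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
TRIADS = [r + q for r in ROOTS for q in ("", "m")]  # 24 major/minor triads

C_MAJOR_PATTERN = {"C", "Dm", "Em", "F", "G", "Am"}                  # one scale
A_MINOR_PATTERN = {"Am", "Bm", "C", "D", "Dm", "E", "Em", "F", "G"}  # three scales

def cosine_similarity(hist, pattern):
    h = [float(hist.get(t, 0)) for t in TRIADS]
    p = [1.0 if t in pattern else 0.0 for t in TRIADS]
    dot = sum(a * b for a, b in zip(h, p))
    return dot / (math.sqrt(sum(a * a for a in h)) * math.sqrt(sum(p)))

# A song whose detected chords all belong to BOTH C major and A minor:
hist = {"C": 40, "F": 20, "G": 20, "Am": 15, "Dm": 5}
sim_major = cosine_similarity(hist, C_MAJOR_PATTERN)
sim_minor = cosine_similarity(hist, A_MINOR_PATTERN)

# Identical dot products, but the minor pattern is normalized by
# sqrt(9) instead of sqrt(6), biasing the decision toward the major.
assert sim_major > sim_minor
```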

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity, based on an analysis of sheet music, can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of the use of other chords and of key changes.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), etc.

Polyphony, with its multidimensional sequences of overlapping tones, and the overlapping harmonic components of individual notes in the spectrum might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned relatively higher weights in the chroma vector.
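This failure mode can be reproduced with made-up chroma weights; the numbers below are hypothetical, and the matcher is a plain triad-template scorer, not the system's actual chord-detection module.

```python
# A C major 7 (C, E, G, B) whose root is masked by overlapping
# harmonics gets labeled E minor (E, G, B) by triad-template matching.
NOTE = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5, "F#": 6,
        "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def triad_templates():
    temps = {}
    for root, pc in NOTE.items():
        temps[root + " major"] = {pc, (pc + 4) % 12, (pc + 7) % 12}
        temps[root + " minor"] = {pc, (pc + 3) % 12, (pc + 7) % 12}
    return temps

def best_chord(chroma):
    temps = triad_templates()
    return max(temps, key=lambda name: sum(chroma[pc] for pc in temps[name]))

chroma = [0.0] * 12
chroma[NOTE["C"]] = 0.3   # weak root: masked by overlapping partials
chroma[NOTE["E"]] = 1.0
chroma[NOTE["G"]] = 0.9
chroma[NOTE["B"]] = 0.8

# E minor scores 1.0 + 0.9 + 0.8 = 2.7; C major only 0.3 + 1.0 + 0.9 = 2.2
assert best_chord(chroma) == "E minor"
```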

Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus or refrain) slightly different from the previous parts. This is effected by transposing the song upward, usually by a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
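The effect of such a modulation on chord labels can be mimicked symbolically. The helper below is a hypothetical illustration: after shifting every label up a half step, none of the resulting chords matches the original key's chord set, which is why the system stops recognizing them.

```python
# Illustrative transposition of chord labels up a half step, the
# common end-of-song key-change device mentioned above.
PCS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(chord, semitones=1):
    minor = chord.endswith("m")
    root = chord[:-1] if minor else chord
    pc = (PCS.index(root) + semitones) % 12
    return PCS[pc] + ("m" if minor else "")

assert transpose("C") == "C#"
assert transpose("Am") == "A#m"
# A I-IV-V progression in C major lands entirely outside C major:
assert [transpose(c) for c in ["C", "F", "G"]] == ["C#", "F#", "G#"]
```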

Another point to be noted here is the substitution/simplification of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 chord can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.

Rhythm Tracking Observations

We conjecture that the errors in rhythm tracking are caused by errors in chord detection, as discussed above. Chords present in the music but not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to the complexities of polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, so that an incorrect pattern of measure boundaries obtains the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.

Discussion

Figure 5. Key shift.

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. Toward this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research building on this work is highlighted below.

Key Detection

Our technique assumes that the key of the song is constant throughout the duration of the song. Owing to the properties of the relative major/minor key combination, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. However, the same cannot be said about other kinds of key changes in the music. Such key changes are quite difficult to track: there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B-minor key in the verse to an E-major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in the future.

Chord Detection

In this approach, we have considered only the major and minor triads. However, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable"-sounding one (a "consonance").

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onsets of percussive events (Wang et al. 2003). Future effort will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE – The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE – The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Much research in the past has also focused on rhythm analysis and the development of beat-tracking systems. However, most of this research does not consider higher-level beat structures above the quarter-note level, or it was restricted to the symbolic domain rather than working in real-world acoustic environments (see, for example, Allen and Dannenberg 1990; Goto and Muraoka 1994; Vercoe 1997; Scheirer 1998; Cemgil et al. 2001; Dixon 2001; Raphael 2001; Cemgil and Kappen 2003; Dixon 2003).

Goto and Muraoka (1999) perform real-time, higher-level rhythm determination up to the measure level in musical audio without drum sounds, using onset times and chord-change detection for musical decisions. The provisional beat times are a hypothesis of the quarter-note level and are inferred by an analysis of onset times. The chord-change analysis is then performed at the quarter-note level and at the interpolated eighth-note level, followed by an analysis of how much the dominant frequency components included in chord tones and their harmonic overtones change in the frequency spectrum. Musical knowledge of chord change is then applied to detect the higher-level rhythm structure at the half-note and measure (whole-note) levels. Goto has extended this work to apply to music with and without drum sounds, using drum patterns in addition to the onset times and chord changes discussed previously (Goto 2001). The drum-pattern analysis can be performed only if the musical audio signal contains drums, and hence a technique that measures the autocorrelation of the snare drum's onset times is applied. Based on the premise that drum sounds are noisy, the signal is determined to contain drum sounds only if this autocorrelation value is high enough. Based on the presence or absence of drum sounds, the knowledge of chord changes and/or drum patterns is selectively applied. The highest level of rhythm analysis, at the measure (whole-note, or bar) level, is then performed using only musical knowledge of chord-change patterns. In both these works, chords are not recognized by name, and thus rhythm detection has been performed using chord-change probabilities rather than actual chord information.

System Description

A well-known algorithm used to identify the key of music is the Krumhansl-Schmuckler key-finding algorithm (Krumhansl 1990). The basic principle of the algorithm is to compare the input music with a prototypical major (or minor) scale-degree profile. In other words, the distribution of pitch classes in a piece is compared with an ideal distribution for each key. This algorithm and its variations (Huron and Parncutt 1993; Temperley 1999a, 1999b, 2002), however, could not be directly applied in our system, as they require a list of notes with note-on and note-off times, which cannot be directly extracted from polyphonic audio. Thus, the problem has been approached in Shenoy et al. (2004) at a higher level, by clustering individual notes to obtain the harmonic description of the music in the form of the 24 major/minor triads. Then, based on a rule-based analysis of these chords against the chords present in the major and minor keys, the key of the song is extracted. The audio has been framed into beat-length segments to extract metadata in the form of quarter-note detection of the music. The basis for this technique is to assist in the detection of chord structure, based on the musical knowledge that chords are more likely to change at beat times than at other positions (Goto and Muraoka 1999).
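For reference, the Krumhansl-Schmuckler idea can be sketched as follows. The profile values are Krumhansl's published probe-tone ratings; the input pitch-class distribution is invented for illustration, and real use would require the note list that, as noted above, is hard to extract from polyphonic audio.

```python
# Correlate a piece's pitch-class distribution with the major/minor
# profile rotated to each of the 24 candidate keys; the best
# correlation wins.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ks_key(pc_dist):
    best = None
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # Rotate the profile so that index `tonic` holds the tonic value.
            rotated = [profile[(i - tonic) % 12] for i in range(12)]
            r = correlation(pc_dist, rotated)
            if best is None or r > best[0]:
                best = (r, NAMES[tonic] + " " + mode)
    return best[1]

# A pitch-class histogram dominated by the C major scale:
dist = [10, 0, 5, 0, 7, 6, 0, 9, 0, 5, 0, 4]
assert ks_key(dist) == "C major"
```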

The beat-detection process first detects the onsets present in the music using sub-band processing (Wang et al. 2003). This technique of onset detection is based on sub-band intensity and detects the perceptually salient percussive events in the signal. We draw on the prior ideas of beat tracking discussed in Scheirer (1998) and Dixon (2003) to determine the beat structure of the music. As a first step, all possible values of inter-onset intervals (IOIs) are computed. An IOI is defined as the time interval between any pair of onsets, not necessarily successive. Then, clusters of IOIs are identified, and a ranked set of hypothetical inter-beat intervals (IBIs) is created, based on the sizes of the corresponding clusters and by identifying integer relationships with other clusters. The latter process recognizes harmonic relationships between the beat (at the quarter-note level) and simple integer multiples of the beat (at the half-note and whole-note levels). An error margin of 25 msec has been set in the IBI to account for slight variations in the tempo. The highest-ranked value is returned as the IBI, from which we obtain the tempo, expressed as the inverse of the IBI. Patterns of onsets in clusters at the IBI are tracked, and beat information is interpolated into sections in which onsets corresponding to the beat might not be detected.
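The IOI-clustering steps above can be sketched as follows. This is a deliberate simplification: it ranks clusters by size alone, whereas the actual system also exploits integer relationships between clusters and interpolates missed beats.

```python
def inter_onset_intervals(onsets, max_ioi=2.0):
    """All pairwise intervals (onsets need not be successive)."""
    return [b - a for i, a in enumerate(onsets)
            for b in onsets[i + 1:] if b - a <= max_ioi]

def cluster_iois(iois, tol=0.025):  # 25-msec margin, as in the text
    clusters = []  # each cluster is [running-mean centroid, count]
    for ioi in sorted(iois):
        for c in clusters:
            if abs(ioi - c[0]) <= tol:
                c[0] = (c[0] * c[1] + ioi) / (c[1] + 1)
                c[1] += 1
                break
        else:
            clusters.append([ioi, 1])
    return clusters

def estimate_ibi(onsets):
    clusters = cluster_iois(inter_onset_intervals(onsets))
    centroid, _count = max(clusters, key=lambda c: c[1])
    return centroid

# Onsets at a steady 0.5-sec beat (120 quarter notes per minute),
# with one beat missing and slight timing jitter:
onsets = [0.0, 0.5, 1.01, 1.5, 2.5, 3.0, 3.49, 4.0]
ibi = estimate_ibi(onsets)
assert abs(ibi - 0.5) < 0.025
tempo = 60.0 / ibi  # quarter notes per minute
assert 115 < tempo < 125
```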

The audio feature for harmonic analysis is a reduced spectral representation of each beat-spaced segment of the audio, based on a chroma transformation of the spectrum. This feature class represents the spectrum in terms of pitch class and forms the basis for the chromagram (Wakefield 1999). The system takes into account the chord distribution across the diatonic major scale and the three types of minor scales (natural, harmonic, and melodic). Furthermore, the system has been biased by assigning relatively higher weights to the primary chords in each key (tonic, dominant, and subdominant). This concept has also been demonstrated in the Spiral Array, a mathematical model for tonality (Chew 2001, 2002).
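A minimal chroma computation consistent with this description might look as follows; the frequency range, energy weighting, and tuning reference (A4 = 440 Hz) are assumptions for illustration, not the authors' exact parameters.

```python
import math

# Fold the energy of each FFT bin onto one of 12 pitch classes.
def chroma_from_spectrum(magnitudes, sample_rate, fft_size):
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # skip the DC bin
        freq = k * sample_rate / fft_size
        if freq < 27.5 or freq > 4186.0:  # roughly the piano range
            continue
        # MIDI note number relative to A4 = 440 Hz, then discard octave
        midi = 69 + 12 * math.log2(freq / 440.0)
        chroma[int(round(midi)) % 12] += mag ** 2  # accumulate energy
    return chroma

# Example: a pure 440-Hz peak (bin 44 with 10-Hz bin spacing) maps to
# pitch class 9, i.e., A.
spectrum = [0.0] * 100
spectrum[44] = 1.0
chroma = chroma_from_spectrum(spectrum, sample_rate=44100, fft_size=4410)
assert chroma[9] == 1.0
```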

It is observed that the chord-recognition accuracy of the key system, though sufficient to determine the key, is not sufficient to provide usable chord transcriptions or to determine the hierarchical rhythm structure across the entire duration of the music. Thus, in this article, we enhance the four-step key-determination system with three post-processing stages that allow us to perform these two tasks with greater accuracy. The block diagram of the proposed framework is shown in Figure 1.

System Components

We now discuss the three post-key-processing components of the system. These consist of a first phase of chord accuracy enhancement, rhythm structure determination, and a second phase of chord accuracy enhancement.

Chord Accuracy Enhancement (Phase 1)

In this step, we aim to increase the accuracy of chord detection. For each audio frame, we perform two checks.

Check 1: Eliminate Chords Not in the Key of the Song

Figure 1. System framework.

Here, we perform a rule-based analysis of the detected chord to see if it exists in the key of the song. If not, we check for the presence of the major chord of the same pitch class (if the detected chord is minor), and vice versa; if that chord is present in the key, we replace the erroneous chord with it. This is because the major and minor chords of a pitch class differ only in the position of their mediants, and the chord-detection approach often suffers from recognition errors resulting from overlaps of the harmonic components of individual notes in the spectrum, which are quite difficult to avoid. Hence, there is a possibility of error in the distinction between the major and minor chords of a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous; this usage is a simplification adopted by this system. If this check fails, we eliminate the chord.
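Check 1 can be sketched as follows. The key-to-chords table and chord naming are hypothetical helpers for illustration, not the system's internal representation.

```python
# Diatonic major/minor triads for one hypothetical key entry.
KEY_CHORDS = {"C major": {"C", "Dm", "Em", "F", "G", "Am"}}

def parallel(chord):
    """Swap major <-> minor triad of the same pitch class ('Dm' <-> 'D')."""
    return chord[:-1] if chord.endswith("m") else chord + "m"

def check1(chord, key):
    allowed = KEY_CHORDS[key]
    if chord in allowed:
        return chord
    if parallel(chord) in allowed:
        return parallel(chord)   # major/minor confusion: fix the quality
    return None                  # otherwise, eliminate the chord

assert check1("F", "C major") == "F"     # in key: kept
assert check1("Fm", "C major") == "F"    # wrong quality: corrected
assert check1("Bbm", "C major") is None  # outside the key: eliminated
```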

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the chord in the current frame, then the chord in the current frame is likely to be incorrect. In such cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
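A minimal sketch of Check 2, under the assumption that frames carry either a chord label or None (a frame whose chord was eliminated by Check 1):

```python
# Coerce a frame's chord to match identical neighbors.
def check2(chords):
    fixed = list(chords)
    for i in range(1, len(chords) - 1):
        prev, nxt = fixed[i - 1], chords[i + 1]
        if prev is not None and prev == nxt and fixed[i] != prev:
            fixed[i] = prev  # outlier or missing chord: fill from neighbors
    return fixed

assert check2(["C", "Em", "C", "C"]) == ["C", "C", "C", "C"]
assert check2(["F", None, "F", "G"]) == ["F", "F", "F", "G"]
```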

Rhythm Structure Determination

Next, our system checks for the start of measures, based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with dotted lines in Figure 3c, and to interpolate an additional measure line at the start of the fifth frame, giving the required result seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the pattern of actual measure boundaries. Finally, track the boundary locations in the detected pattern and interpolate the missing boundary positions across the rest of the song.
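These steps can be sketched as follows. Grouping candidate boundaries by their frame index modulo 4 is a simplification of the "integer relationships in multiples of four" criterion, used here only to make the idea concrete.

```python
# Candidate boundaries start runs of four identical chord frames.
def candidate_boundaries(chords):
    return [i for i in range(len(chords) - 3)
            if len(set(chords[i:i + 4])) == 1]

def measure_boundaries(chords):
    cands = candidate_boundaries(chords)
    # Keep the phase (index mod 4) with the most candidates, then lay
    # down measure lines every four frames, interpolating missing ones.
    best_phase = max(range(4),
                     key=lambda p: sum(1 for i in cands if i % 4 == p))
    return list(range(best_phase, len(chords), 4))

# One chord sustained over two measures (frames 0-7), then a chord
# change per measure: the true boundaries are frames 0, 4, and 8.
chords = ["C"] * 8 + ["F"] * 4
assert candidate_boundaries(chords) == [0, 1, 2, 3, 4, 8]  # spurious 1, 2, 3
assert measure_boundaries(chords) == [0, 4, 8]             # cleaned up
```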

Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.


Figure 2. Chord accuracy enhancement (Phase 1).

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of a measure than at other half-note positions. Hence, if three of the chords in a measure are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).
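Check 3b can be sketched for a four-frame measure as follows (an illustrative helper, with None marking frames whose chord was eliminated earlier):

```python
# If some chord appears in both halves of the measure, assign it to
# all four frames; otherwise leave the measure unchanged.
def check3b(measure):
    first, second = set(measure[:2]), set(measure[2:])
    common = (first & second) - {None}
    if len(common) == 1:
        return [common.pop()] * 4
    return measure

# E major misdetected inside a measure of E minor (the "Hotel
# California" situation discussed below):
assert check3b(["Em", "E", "Em", "Em"]) == ["Em", "Em", "Em", "Em"]
# A genuine mid-measure chord change is left alone:
assert check3b(["C", "C", "G", "G"]) == ["C", "C", "G", "G"]
```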

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above; hence, we implement only Check 3b, as illustrated with an example in Figure 4. This check is required because, in the case of a minor key, both the major and the minor chord of the same pitch class may be present in the song. A classic example of this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor rarely occurs as such in popular music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Song (Year, Artist – Title) | Chord Detection (% accuracy) | Original Key | Detected Key | After Chord Accuracy Enhancement I (% accuracy) | Successful Measure Detection | After Chord Accuracy Enhancement II (% accuracy)

1 | (1965) Righteous Brothers – Unchained Melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | (1977) Bee Gees – Stayin' Alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | (1977) Eric Clapton – Wonderful Tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | (1977) Fleetwood Mac – You Make Lovin' Fun | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | (1979) Eagles – I Can't Tell You Why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | (1984) Foreigner – I Want to Know What Love Is | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | (1986) Bruce Hornsby – The Way It Is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | (1989) Chris Rea – Road to Hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | (1991) REM – Losing My Religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | (1991) U2 – One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | (1992) Michael Jackson – Heal the World | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | (1993) MLTR – Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | (1995) Coolio – Gangsta's Paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | (1996) Backstreet Boys – As Long as You Love Me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | (1996) Joan Osborne – One of Us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | (1997) Bryan Adams – Back to You | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | (1997) Green Day – Time of Your Life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | (1997) Hanson – Mmmbop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | (1997) Savage Garden – Truly Madly Deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | (1997) Spice Girls – Viva Forever | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | (1997) Tina Arena – Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | (1998) Jennifer Paige – Crush | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | (1998) Natalie Imbruglia – Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | (1999) Santana – Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | (2000) Corrs – Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | (2000) Craig David – Walking Away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | (2000) Nelly Furtado – Turn Off the Light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | (2000) Westlife – Seasons in the Sun | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | (2001) Shakira – Whenever, Wherever | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | (2001) Train – Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85

Overall accuracy at each stage | | 48.13 | key correct for 28/30 songs | | 63.20 | measures correct for 28/30 songs | 78.91

It can be observed that the average chord-detection accuracy across the length of the entire music performed by the chord-detection step (module 3 in Figure 2) is relatively low, at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 2), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves on average by 15.07% on applying Chord Accuracy Enhancement (Phase 1; see module 5 in Figure 2). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 2) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement (Phase 2; see module 7 in Figure 2) makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (song numbers 6 and 24 in Table 1) owing to error in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present in both the key of C major and its relative minor, A minor.

The use of a weighted-cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization that is performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied would cause the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.
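To make the normalization effect concrete, here is a minimal sketch (not the authors' implementation): key templates are binary vectors over the 24 major/minor triads, and the minor-key template pools triads from more than one minor scale, so its larger norm deflates its cosine score. The chord counts and the exact template contents below are invented for illustration.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_index(root, quality):
    # 24-dimensional chord space: 12 major triads, then 12 minor triads
    return NOTES.index(root) + (12 if quality == "min" else 0)

# Hypothetical key templates as scale-degree offsets. The major key draws on
# one scale; the minor template pools triads from the natural and harmonic
# minor scales, so it spans more chords (illustrative choice).
MAJOR_PATTERN = [(0, "maj"), (2, "min"), (4, "min"), (5, "maj"), (7, "maj"), (9, "min")]
MINOR_PATTERN = [(0, "min"), (2, "min"), (3, "maj"), (5, "min"), (7, "min"),
                 (7, "maj"), (8, "maj"), (10, "maj")]

def key_template(tonic, pattern):
    v = [0.0] * 24
    for offset, quality in pattern:
        root = NOTES[(NOTES.index(tonic) + offset) % 12]
        v[chord_index(root, quality)] = 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Chord histogram as might be detected for "Walking Away" (counts invented):
hist = [0.0] * 24
for root, quality, count in [("A", "min", 20), ("D", "min", 12), ("F", "maj", 10),
                             ("G", "maj", 8), ("C", "maj", 15)]:
    hist[chord_index(root, quality)] = count

c_major = cosine(hist, key_template("C", MAJOR_PATTERN))
a_minor = cosine(hist, key_template("A", MINOR_PATTERN))
# All five chords fit both templates, but the minor template's larger norm
# shrinks its score: the relative major wins, as the text describes.
print(c_major > a_minor)  # True
```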

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity (as seen above) based on an analysis of sheet music can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of other chords and key change.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), etc.

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
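This failure mode can be reproduced with a toy chroma matcher; the binary chord templates and the per-note weights below are illustrative, not the system's actual feature values.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def template(notes):
    # Binary 12-dimensional chroma template for a chord
    v = [0.0] * 12
    for n in notes:
        v[NOTES.index(n)] = 1.0
    return v

def score(chroma, tmpl):
    # Simple inner-product matcher (illustrative)
    return sum(c * t for c, t in zip(chroma, tmpl))

CHORDS = {
    "C major": template(["C", "E", "G"]),
    "E minor": template(["E", "G", "B"]),
}

# A sounded C major 7 (C, E, G, B): if overlapping harmonics boost
# E, G, and B relative to the root C, the frame chroma might look like:
chroma = [0.0] * 12
for note, weight in [("C", 0.4), ("E", 1.0), ("G", 0.9), ("B", 0.8)]:
    chroma[NOTES.index(note)] = weight

best = max(CHORDS, key=lambda name: score(chroma, CHORDS[name]))
print(best)  # "E minor": the frame is mislabeled even though C major 7 was played
```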

Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus/refrain) slightly different from the previous parts. This is effected by transposing the song to higher semitones, usually up a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
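One conceivable way to handle such a modulation, sketched here under the assumption that the shift amount is known, is simply to re-label chords a semitone higher; `transpose_chord` is a hypothetical helper, not part of the described system.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose_chord(chord, semitones=1):
    # Shift a chord label such as "G major" up by the given number of semitones
    root, quality = chord.split()
    return NOTES[(NOTES.index(root) + semitones) % 12] + " " + quality

# If the final chorus is known to sit a half step higher, re-labeling the
# expected chords lets them be matched against the modulated section:
chorus = ["C major", "A minor", "F major", "G major"]
shifted = [transpose_chord(c) for c in chorus]
print(shifted)  # ['C# major', 'A# minor', 'F# major', 'G# major']
```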

Another point to be noted here is that of chord substitution/simplification of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.
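The triad simplification used in the evaluation can be sketched as a name-based mapping; the chord-label syntax handled below is an assumption for illustration, not the paper's notation.

```python
import re

def simplify_chord(name):
    # Name-based simplification: split a label such as "Cmaj7", "Am7", or
    # "F#m" into root and quality, discarding any extension (7th, 9th, ...).
    m = re.match(r"([A-G][#b]?)(maj|min|m)?", name)
    root, quality = m.group(1), m.group(2)
    return root + (" minor" if quality in ("m", "min") else " major")

examples = ["Cmaj7", "Am7", "F#m", "G"]
print([simplify_chord(c) for c in examples])
# ['C major', 'A minor', 'F# minor', 'G major']
```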

Rhythm Tracking Observations

We conjecture the error in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music and not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to complexities in polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, resulting in an incorrect pattern of measure boundaries having the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.

Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. Toward this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

Figure 5. Key shift.

Key Detection

Our technique assumes that the key of the song is constant throughout the duration of the song. However, owing to the properties of the relative major/minor key combinations, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. The same cannot be said about other kinds of key changes in the music, because such key changes are quite difficult to track: there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B minor key in the verse to an E major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in the future.

Chord Detection

In this approach, we have considered only the major and minor triads. However, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable" sounding one (a "consonance").

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onset of percussive events (Wang et al. 2003). Future effort will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE–The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE–The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.


the half-note and whole-note levels). An error margin of 25 msec has been set in the IBI to account for slight variations in the tempo. The highest-ranked value is returned as the IBI, from which we obtain the tempo, expressed as an inverse value of the IBI. Patterns of onsets in clusters at the IBI are tracked, and beat information is interpolated into sections in which onsets corresponding to the beat might not be detected.
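A simplified sketch of this interval-ranking idea, with the ±25-msec margin, might look as follows; the clustering is deliberately reduced to a single voting pass over consecutive inter-onset intervals, and the onset times are invented.

```python
def estimate_tempo(onsets, margin=0.025):
    # Consecutive inter-onset intervals in seconds
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    # Each interval votes for every candidate within the +/- 25-msec margin;
    # the top-voted candidate anchors the inter-beat-interval (IBI) cluster.
    best = max(iois, key=lambda c: sum(1 for x in iois if abs(x - c) <= margin))
    cluster = [x for x in iois if abs(x - best) <= margin]
    ibi = sum(cluster) / len(cluster)
    return ibi, 60.0 / ibi  # tempo in quarter notes per minute

# Onsets at roughly 0.5-sec spacing (about 120 BPM) with slight jitter
# and one missed beat between 2.00 and 3.01 seconds:
onsets = [0.00, 0.51, 1.00, 1.49, 2.00, 3.01, 3.50]
ibi, tempo = estimate_tempo(onsets)
print(round(ibi, 3), round(tempo, 1))
```

The 1.01-sec gap left by the missed beat falls outside the margin and is excluded from the cluster, which is where the interpolation of missing beats described above would take over.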

The audio feature for harmonic analysis is a reduced spectral representation of each beat-spaced segment of the audio, based on a chroma transformation of the spectrum. This feature class represents the spectrum in terms of pitch class and forms the basis for the chromagram (Wakefield 1999). The system takes into account the chord distribution across the diatonic major scale and the three types of minor scales (natural, harmonic, and melodic). Furthermore, the system has been biased by assigning relatively higher weights to the primary chords in each key (tonic, dominant, and subdominant). This concept has also been demonstrated in the Spiral Array, a mathematical model for tonality (Chew 2001, 2002).
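The chroma transformation itself amounts to folding spectral energy into twelve pitch classes; a minimal sketch follows, with the spectral peaks and magnitudes invented for illustration.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz):
    # MIDI note number from frequency (A4 = 440 Hz), folded to a pitch class
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    return NOTES[int(round(midi)) % 12]

def chroma(peaks):
    # peaks: list of (frequency in Hz, magnitude); fold energy into 12 bins
    v = {n: 0.0 for n in NOTES}
    for f, mag in peaks:
        v[pitch_class(f)] += mag
    return v

# Spectral peaks of a beat-spaced segment containing a C major chord
# (C4, E4, G4 plus two octave harmonics):
peaks = [(261.6, 1.0), (329.6, 0.8), (392.0, 0.9), (523.3, 0.5), (659.3, 0.3)]
v = chroma(peaks)
top3 = sorted(v, key=v.get, reverse=True)[:3]
print(sorted(top3))  # ['C', 'E', 'G']
```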

It is observed that the chord-recognition accuracy of the key system, though sufficient to determine the key, is not sufficient to provide usable chord transcriptions or to determine the hierarchical rhythm structure across the entire duration of the music. Thus, in this article, we enhance the four-step key-determination system with three post-processing stages that allow us to perform these two tasks with greater accuracy. The block diagram of the proposed framework is shown in Figure 1.

System Components

We now discuss the three post-key processing components of the system. These consist of a first phase of chord accuracy enhancement, rhythm structure determination, and a second phase of chord accuracy enhancement.

Chord Accuracy Enhancement (Phase 1)

In this step, we aim to increase the accuracy of chord detection. For each audio frame, we perform two checks.

Check 1: Eliminate Chords Not in the Key of the Song

Here, we perform a rule-based analysis of the detected chord to see if it exists in the key of the song. If not, we check for the presence of the major chord of the same pitch class (if the detected chord is minor) and vice versa. If that chord is present in the key, we replace the erroneous chord with it, because the major and minor chords of a pitch class differ only in the position of their mediants. The chord-detection approach often suffers from recognition errors that result from overlaps of harmonic components of individual notes in the spectrum; these are quite difficult to avoid. Hence, there is a possibility of error in the distinction between the major and minor chords for a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous, and this usage is a simplification adopted by this system. If this check fails, we eliminate the chord.

Figure 1. System framework.
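Check 1 can be sketched as a small rule, assuming chords are represented as (root, quality) pairs and the key's chord set is available; the function name and the C-major chord set below are illustrative, not the system's data structures.

```python
def check1(detected, key_chords):
    """Check 1: keep a detected chord only if it fits the key.

    detected: a (root, quality) pair such as ("C", "major");
    key_chords: the set of such pairs allowed by the key (and, per the
    text, its relative major/minor). Returns the corrected chord, or
    None if the chord is eliminated.
    """
    if detected in key_chords:
        return detected
    root, quality = detected
    flipped = (root, "minor" if quality == "major" else "major")
    if flipped in key_chords:
        return flipped   # likely major/minor confusion on the same root
    return None          # chord not in the key: eliminate it

# Illustrative diatonic major/minor triads of C major:
C_MAJOR = {("C", "major"), ("D", "minor"), ("E", "minor"),
           ("F", "major"), ("G", "major"), ("A", "minor")}

print(check1(("E", "major"), C_MAJOR))   # ('E', 'minor'): corrected
print(check1(("F#", "major"), C_MAJOR))  # None: eliminated
```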

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the current frame, then the chord in the current frame is likely to be incorrect. In these cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
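A minimal sketch of Check 2 as a smoothing pass over the frame-wise chord sequence, with `None` standing for a frame where no chord survived (the representation is ours, not the paper's):

```python
def check2(chords):
    """Check 2: temporal correction over quarter-note-spaced frames.

    If both neighbors agree and the middle frame differs (or is missing,
    represented as None), coerce the middle frame to match them.
    """
    out = list(chords)
    for i in range(1, len(out) - 1):
        if out[i - 1] is not None and out[i - 1] == out[i + 1] and out[i] != out[i - 1]:
            out[i] = out[i - 1]
    return out

frames = ["C major", None, "C major", "F major", "G major", "F major"]
print(check2(frames))
# ['C major', 'C major', 'C major', 'F major', 'F major', 'F major']
```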

Rhythm Structure Determination

Next, our system checks for the start of measures, based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (which is by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We will illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with a dotted line in Figure 3c, and to interpolate an additional measure line at the start of the fifth frame, giving us the required result as seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the pattern of actual measure boundaries. Finally, track the boundary locations in the detected pattern and interpolate missing boundary positions across the rest of the song.
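The steps above can be sketched as follows, under the simplifying assumption that the boundary "patterns" are the four possible phases of the quarter-note grid; this is a reduction of the procedure, not the authors' code.

```python
def measure_starts(chords):
    """Infer measure-start frames from quarter-note-spaced chords in 4/4.

    A run of four equal chords marks a candidate boundary at its first
    frame; candidates are grouped by phase (index mod 4), the phase with
    the most support wins, and boundaries on that phase are interpolated
    across the whole song.
    """
    candidates = [i for i in range(len(chords) - 3)
                  if chords[i] == chords[i + 1] == chords[i + 2] == chords[i + 3]]
    phases = [c % 4 for c in candidates]
    best = max(set(phases), key=phases.count)
    return list(range(best, len(chords), 4))

# One chord sustained over two measures yields overlapping candidates
# (frames 0-4 each start a run of four C's); phase voting keeps only
# the boundaries consistent with the 4/4 grid.
chords = ["C"] * 8 + ["F"] * 4
print(measure_starts(chords))  # [0, 4, 8]
```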

Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.


Figure 2. Chord accuracy enhancement (Phase 1).

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of measures than at other positions of half-note times. Hence, if three of the chords are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).
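Check 3b can be sketched as a per-measure rule over four quarter-note-spaced chord frames; the function and representation are illustrative.

```python
def check3b(measure):
    """Check 3b: if some chord appears in both halves of a four-frame
    measure, set every frame in the measure to that chord.

    If several chords were common to both halves, this sketch would pick
    one arbitrarily; otherwise the measure is returned unchanged.
    """
    first, second = set(measure[:2]), set(measure[2:])
    common = (first & second) - {None}
    if common:
        chord = common.pop()
        return [chord] * len(measure)
    return list(measure)

# "Hotel California"-style case: an E major misdetected in a measure that
# also contains E minor; E minor appears in both halves, so it wins.
print(check3b(["E minor", "E major", "E minor", "E minor"]))
# ['E minor', 'E minor', 'E minor', 'E minor']
```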

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we only implement Check 3b, as illustrated with an example in Figure 4. This check is required because, in the case of a minor key, we can have both the major and minor chords of the same pitch class present in the song. A classic example of this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor rarely occurs as such in popular music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Song (Year, Artist) | Chord Detection (% accuracy) | Original Key | Detected Key | Chord Accuracy Enhancement-I (% accuracy) | Successful Measure Detection | Chord Accuracy Enhancement-II (% accuracy)

1 | (1965) Righteous Brothers, "Unchained Melody" | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | (1977) Bee Gees, "Stayin' Alive" | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | (1977) Eric Clapton, "Wonderful Tonight" | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | (1977) Fleetwood Mac, "You Make Lovin' Fun" | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | (1979) Eagles, "I Can't Tell You Why" | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | (1984) Foreigner, "I Want to Know What Love Is" | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | (1986) Bruce Hornsby, "The Way It Is" | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | (1989) Chris Rea, "Road to Hell" | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | (1991) REM, "Losing My Religion" | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | (1991) U2, "One" | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | (1992) Michael Jackson, "Heal the World" | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | (1993) MLTR, "Someday" | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | (1995) Coolio, "Gangsta's Paradise" | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | (1996) Backstreet Boys, "As Long as You Love Me" | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | (1996) Joan Osborne, "One of Us" | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | (1997) Bryan Adams, "Back to You" | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | (1997) Green Day, "Time of Your Life" | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | (1997) Hanson, "Mmmbop" | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | (1997) Savage Garden, "Truly Madly Deeply" | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | (1997) Spice Girls, "Viva Forever" | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | (1997) Tina Arena, "Burn" | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | (1998) Jennifer Paige, "Crush" | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | (1998) Natalie Imbruglia, "Torn" | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | (1999) Santana, "Smooth" | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | (2000) Corrs, "Breathless" | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | (2000) Craig David, "Walking Away" | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | (2000) Nelly Furtado, "Turn Off the Light" | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | (2000) Westlife, "Seasons in the Sun" | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | (2001) Shakira, "Whenever, Wherever" | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | (2001) Train, "Drops of Jupiter" | 32.54 | C major | C major | 53.73 | Yes | 69.85

Overall accuracy at each stage | | 48.13 | 28/30 songs | | 63.20 | 28/30 songs | 78.91


example the C-major 7 can be simplified to the Cmajor triad This substitution has been performedon the extended chords annotated in the sheet mu-sic in the evaluation of our system

Rhythm Tracking Observations

We conjecture the error in rhythm tracking to becaused by errors in the chord-detection as discussedabove Chords present in the music and not handledby our system could be incorrectly classified into oneof the 24 majorminor triads owing to complexitiesin polyphonic audio analysis This can result in in-correct clusters of four chords being captured by therhythm-detection process resulting in an incorrectpattern of measure boundaries having the highestcount Furthermore beat detection is a non-trivialtask the difficulties of tracking the beats in acous-tic signals are discussed in Goto and Muraoka(1994) Any error in beat detection can cause a shiftin the rhythm structure determined by the system

Discussion

We have presented a framework to determine thekey chords and hierarchical rhythm structure fromacoustic musical signals To our knowledge this isthe first attempt to use a rule-based approach thatcombines low-level features with high-level musicknowledge of rhythm and harmonic structure to de-termine all three of these dimensions of music The

83

Figure 5 Key Shift

methods should (and do) work reasonably well forkey and chord labeling of popular music in 44 timeHowever they would not extend very far beyondthis idiom or to more rhythmically and tonallycomplex music This framework has been appliedsuccessfully in various other aspects of contentanalysis such as singing-voice detection (Nwe et al2004) and the automatic alignment of textual lyricsand musical audio (Wang et al 2004) The humanauditory system is capable of extracting rich andmeaningful data from complex audio signals (Shehand Ellis 2003) and existing computational audi-tory analysis systems fall clearly behind humans inperformance Towards this end we believe that themodel proposed here provides a promising platformfor the future development of more sophisticatedauditory models based on a better understanding ofmusic Our current and future research that buildson this work is highlighted below

Key Detection

Our technique assumes that the key of the song isconstant throughout the duration of the song How-ever owing to the properties of the relative majorminor key combinations we have made the chord-and rhythm-detection processes (which use the keyof the song as input) quite robust against changesacross this key combination However the same can-not be said about other kinds of key changes in themusic This is because such key changes are quitedifficult to track as there are no fixed rules andthey tend to depend more on the songwriterrsquos cre-ativity For example the song ldquoLet It Growrdquo by EricClapton switches from a B-minor key in the verse toan E major key in the chorus We believe that ananalysis of the song structure (verse chorus bridgeetc) could likely serve as an input to help track thiskind of key change This problem is currently beinganalyzed and will be tackled in the future

Chord Detection

In this approach we have considered only the majorand minor triads However in addition to these

there are other chord possibilities in popular musicand future work will be targeted toward the detec-tion of dominant-seventh chords and extendedchords as discussed earlier Chord-detection re-search can be further extended to include knowl-edge of chord progressions based on the function ofchords in their diatonic scale which relates to theexpected resolution of each chord within a keyThat is the analysis of chord progressions based onthe ldquoneedrdquo for a sounded chord to move from anldquounstablerdquo sound (a ldquodissonancerdquo) to a more final orldquostablerdquo sounding one (a ldquoconsonancerdquo)

Rhythm Tracking

Unlike that of Goto and Muraoka (1999) therhythm-extraction technique employed in our cur-rent system does not perform well for ldquodrumlessrdquomusic signals because the onset detector has beenoptimized to detect the onset of percussive events(Wang et al 2003) Future effort will be aimed at ex-tending the current work for music signals that donot contain drum sounds

Acknowledgments

We thank the editors and two anonymous reviewersfor helpful comments and suggestions on an earlierversion of this article

References

Aigrain P 1999 ldquoNew Applications of Content Pro-cessing of Musicrdquo Journal of New Music Research28(4)271ndash280

Allen P E and R B Dannenberg 1990 ldquoTracking MusicalBeats in Real Timerdquo Proceedings of the 1990 Interna-tional Computer Music Conference San Francisco In-ternational Computer Music Association pp 140ndash143

Bello J P et al 2000 ldquoTechniques for Automatic MusicTranscriptionrdquo Paper presented at the 2000 Interna-tional Symposium on Music Information Retrieval Uni-versity of Massachusetts at Amherst 23ndash25 October

Carreras F et al 1999 ldquoAutomatic Harmonic Descrip-tion of Musical Signals Using Schema-Based Chord De-

84 Computer Music Journal

Shenoy and Wang

compositionrdquo Journal of New Music Research 28(4)310ndash333

Cemgil A T et al 2001 ldquoOn Tempo Tracking Tempo-gram Representation and Kalman Filteringrdquo Journal ofNew Music Research 29(4)259ndash273

Cemgil A T and B Kappen 2003 ldquoMonte Carlo Meth-ods for Tempo Tracking and Rhythm QuantizationrdquoJournal of Artificial Intelligence Research 18(1)45ndash81

Chew E 2001 ldquoModeling Tonality Applications to Mu-sic Cognitionrdquo Proceedings of the Twenty-Third An-nual Conference of the Cognitive Science SocietyWheat Ridge Colorado Cognitive Science Societypp 206ndash211

Chew E 2002 ldquoThe Spiral Array An Algorithm for De-termining Key Boundariesrdquo Proceedings of the 2002International Conference on Music and Artificial In-telligence Vienna Springer pp 18ndash31


of error in the distinction between the major and minor chords for a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous; treating them as such is a simplification made by our system. If this check fails, we eliminate the chord.

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the current frame, then the chord in the current frame is likely to be incorrect. In these cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
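In code, the two checks can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the diatonic chord set (here assuming a key of C major) and the frame labels are invented for the example.

```python
# Sketch of Checks 1 and 2 over quarter-note-spaced chord frames.
# KEY_CHORDS and the frame labels below are illustrative assumptions.

KEY_CHORDS = {"C", "Dm", "Em", "F", "G", "Am"}  # triads diatonic to an assumed key of C major

def check1_eliminate_non_key(frames):
    """Check 1: eliminate any detected chord that lies outside the key."""
    return [c if c in KEY_CHORDS else None for c in frames]

def check2_temporal_correction(frames):
    """Check 2: if both neighbors agree but the current frame differs
    (or is missing), coerce it to match the neighbors."""
    out = list(frames)
    for i in range(1, len(out) - 1):
        if out[i - 1] is not None and out[i - 1] == out[i + 1] and out[i] != out[i - 1]:
            out[i] = out[i - 1]
    return out

frames = ["C", "F#", "C", "Am", None, "Am"]   # F# lies outside C major
frames = check1_eliminate_non_key(frames)      # -> ["C", None, "C", "Am", None, "Am"]
frames = check2_temporal_correction(frames)    # -> ["C", "C", "C", "Am", "Am", "Am"]
print(frames)
```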

Rhythm Structure Determination

Next, our system checks for the start of measures based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (which is by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We will illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with a dotted line in Figure 3c, and interpolate an additional measure line at the start of the fifth frame to give us the required result, as seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the actual measure boundaries. Finally, track the boundary locations in the detected pattern and interpolate missing boundary positions across the rest of the song.
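One plausible reading of this procedure can be sketched as follows, where candidate boundaries (starts of four-identical-chord runs) vote for an alignment modulo four, and boundaries are then interpolated at the winning alignment. The frame data are illustrative.

```python
# Sketch of measure-boundary selection from quarter-note chord frames,
# assuming 4/4 time. The details of the paper's pattern matching may differ.

def measure_boundaries(frames):
    n = len(frames)
    # Candidate boundaries: starts of runs of four identical chords.
    candidates = [i for i in range(n - 3)
                  if frames[i] is not None and len(set(frames[i:i + 4])) == 1]
    if not candidates:
        return []
    # Vote for the alignment (position modulo 4) with the most candidates.
    votes = {}
    for i in candidates:
        votes[i % 4] = votes.get(i % 4, 0) + 1
    phase = max(votes, key=votes.get)
    # Interpolate a boundary every four frames at the winning alignment.
    return list(range(phase, n, 4))

# A chord sustained over two measures: frames 0-7 are all C, so spurious
# candidates appear at frames 1-3 as well as the true ones at 0, 4, and 8.
frames = ["C"] * 8 + ["F"] * 4
print(measure_boundaries(frames))  # -> [0, 4, 8]
```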

Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.


Figure 2. Chord accuracy enhancement (Phase 1).

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of measures than at other half-note positions. Hence, if three of the chords in a measure are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we implement only Check 3b, as illustrated with an example in Figure 4. This check is required because, in the case of a minor key, we can have both the major and the minor chord of the same pitch class present in the song. A classic example of

80 Computer Music Journal

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).


this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor rarely occurs as such in popular-music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.
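Check 3b itself reduces to a small rule per measure, sketched below; the chord labels are illustrative, and the half-measure comparison is one straightforward reading of the description above.

```python
# Sketch of Check 3b: if one chord is common to both halves of a
# four-frame (4/4) measure, coerce all four frames to that chord.

def check3b(measure):
    """measure: four quarter-note-frame chord labels in one 4/4 measure."""
    common = set(measure[:2]) & set(measure[2:])
    common.discard(None)
    if len(common) == 1:
        return [common.pop()] * 4
    return measure

# An E major chord wrongly detected in a B-minor song alongside E minor:
print(check3b(["Em", "E", "Em", "Em"]))  # -> ['Em', 'Em', 'Em', 'Em']
```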

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Year | Artist – Song Title | Chord Detection (%) | Original Key | Detected Key | After Chord Accuracy Enhancement I (%) | Successful Measure Detection | After Chord Accuracy Enhancement II (%)
1 | 1965 | Righteous Brothers – Unchained Melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | 1977 | Bee Gees – Stayin' Alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | 1977 | Eric Clapton – Wonderful Tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | 1977 | Fleetwood Mac – You Make Lovin' Fun | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | 1979 | Eagles – I Can't Tell You Why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | 1984 | Foreigner – I Want to Know What Love Is | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | 1986 | Bruce Hornsby – The Way It Is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | 1989 | Chris Rea – Road to Hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | 1991 | R.E.M. – Losing My Religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | 1991 | U2 – One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | 1992 | Michael Jackson – Heal the World | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | 1993 | MLTR – Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | 1995 | Coolio – Gangsta's Paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | 1996 | Backstreet Boys – As Long as You Love Me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | 1996 | Joan Osborne – One of Us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | 1997 | Bryan Adams – Back to You | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | 1997 | Green Day – Time of Your Life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | 1997 | Hanson – Mmmbop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | 1997 | Savage Garden – Truly Madly Deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | 1997 | Spice Girls – Viva Forever | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | 1997 | Tina Arena – Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | 1998 | Jennifer Paige – Crush | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | 1998 | Natalie Imbruglia – Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | 1999 | Santana – Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | 2000 | Corrs – Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | 2000 | Craig David – Walking Away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | 2000 | Nelly Furtado – Turn Off the Light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | 2000 | Westlife – Seasons in the Sun | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | 2001 | Shakira – Whenever, Wherever | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | 2001 | Train – Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85

Overall accuracy at each stage: chord detection 48.13%; key detection correct for 28/30 songs; after Chord Accuracy Enhancement I, 63.20%; measure detection correct for 28/30 songs; after Chord Accuracy Enhancement II, 78.91%.

It can be observed that the average chord-detection accuracy across the length of the entire music performed by the chord-detection step (module 3 in Figure 2) is relatively low, at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 2), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves on average by 15.07% upon applying Chord Accuracy Enhancement (Phase 1; see module 5 in Figure 2). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 2) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement (Phase 2; see module 7 in Figure 2) makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (song numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor.

For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present both in the key of C major and in its relative minor, A minor.

The use of a weighted-cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization that is performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied would cause the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.
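The bias can be illustrated with a bare cosine-similarity comparison. The chord vocabulary, counts, and binary key templates below are invented for the illustration (the paper's actual weighting scheme may differ): because the pooled minor-key template spans more chords (drawn from several minor scales), its larger norm depresses its score even when the observed chords fit both keys equally well.

```python
import math

# Illustrative sketch of the normalization bias toward the relative major.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

chords = ["C", "Dm", "Em", "F", "G", "Am", "E", "G#dim"]  # chord vocabulary
counts = [8, 3, 2, 5, 6, 7, 0, 0]                          # observed chord histogram

# C major admits six triads from its single scale; A minor admits more
# because chords are pooled across its minor scales (E major and G#dim
# arise from the harmonic minor).
c_major = [1, 1, 1, 1, 1, 1, 0, 0]
a_minor = [1, 1, 1, 1, 1, 1, 1, 1]

# Same dot product with both templates, but the longer minor template has a
# larger norm, so the relative major scores higher.
print(cosine(counts, c_major) > cosine(counts, a_minor))  # -> True
```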

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity, based on an analysis of sheet music, can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of other chords and key change.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), and others.

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
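This confusion is easy to reproduce with simple binary-template matching over a 12-bin chroma vector. The sketch below is illustrative, not the system's actual chord detector; the chroma weights are contrived to show the failure mode.

```python
# Sketch: a Cmaj7 (C, E, G, B) with a weak root is matched to E minor
# by binary triad templates over a 12-bin chroma vector.

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root, minor=False):
    t = [0.0] * 12
    for interval in (0, 3 if minor else 4, 7):  # root, third, fifth
        t[(PITCHES.index(root) + interval) % 12] = 1.0
    return t

def best_triad(chroma):
    scores = {}
    for root in PITCHES:
        for minor in (False, True):
            name = root + ("m" if minor else "")
            template = triad_template(root, minor)
            scores[name] = sum(c * x for c, x in zip(chroma, template))
    return max(scores, key=scores.get)

chroma = [0.0] * 12
for note, weight in [("C", 0.4), ("E", 1.0), ("G", 0.9), ("B", 0.8)]:  # Cmaj7, weak root
    chroma[PITCHES.index(note)] = weight

print(best_triad(chroma))  # -> Em
```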

Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus/refrain) slightly different from the previous parts. This is effected by transposing the song upward, usually by a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
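The effect of such a modulation on chord labels can be sketched as a semitone shift; the chord progression below is illustrative.

```python
# Sketch: a final-chorus modulation transposes every chord up a half step,
# so chords fitting the original key no longer match the detected key.

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(chord, semitones=1):
    root = chord[:2] if chord[1:2] == "#" else chord[:1]
    quality = chord[len(root):]  # e.g., "" for major, "m" for minor
    return PITCHES[(PITCHES.index(root) + semitones) % 12] + quality

chorus = ["C", "Am", "F", "G"]  # diatonic to C major
print([transpose(c) for c in chorus])  # -> ['C#', 'A#m', 'F#', 'G#']
```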

Another point to be noted here concerns the substitution/simplification of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major or minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music used in the evaluation of our system.

Rhythm Tracking Observations

We conjecture the errors in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music but not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to complexities in polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, so that an incorrect pattern of measure boundaries has the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.

Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level musical knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom, or to more rhythmically and tonally complex music. This framework has been applied successfully in other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory-analysis systems fall clearly behind humans in performance. To this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research building on this work is highlighted below.

Figure 5. Key shift.

Key Detection

Our technique assumes that the key of the song is constant throughout its duration. Owing to the properties of the relative major/minor key combination, however, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. The same cannot be said about other kinds of key changes in the music, because such key changes are quite difficult to track: there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B-minor key in the verse to an E-major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in future work.

Chord Detection

In this approach, we have considered only the major and minor triads. However, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key: that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable"-sounding one (a "consonance").
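As a first step toward such progression-aware detection, each in-key chord can be assigned its diatonic function. The Roman-numeral table below follows standard major-scale harmony; the key and example chords are illustrative.

```python
# Sketch: diatonic function of a chord root within a major key, as a
# building block for progression-based chord checking.

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]            # semitone offsets of scale degrees
FUNCTIONS = ["I", "ii", "iii", "IV", "V", "vi", "vii"]

def function_in_key(chord_root, key_root="C"):
    interval = (PITCHES.index(chord_root) - PITCHES.index(key_root)) % 12
    if interval in MAJOR_STEPS:
        return FUNCTIONS[MAJOR_STEPS.index(interval)]
    return None  # chromatic: root lies outside the diatonic scale

# The dominant (V, unstable) typically resolves to the tonic (I, stable):
print(function_in_key("G"), "->", function_in_key("C"))  # -> V -> I
```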

Rhythm Tracking

Unlike that of Goto and Muraoka (1999) therhythm-extraction technique employed in our cur-rent system does not perform well for ldquodrumlessrdquomusic signals because the onset detector has beenoptimized to detect the onset of percussive events(Wang et al 2003) Future effort will be aimed at ex-tending the current work for music signals that donot contain drum sounds

Acknowledgments

We thank the editors and two anonymous reviewersfor helpful comments and suggestions on an earlierversion of this article

References

Aigrain P 1999 ldquoNew Applications of Content Pro-cessing of Musicrdquo Journal of New Music Research28(4)271ndash280

Allen P E and R B Dannenberg 1990 ldquoTracking MusicalBeats in Real Timerdquo Proceedings of the 1990 Interna-tional Computer Music Conference San Francisco In-ternational Computer Music Association pp 140ndash143

Bello J P et al 2000 ldquoTechniques for Automatic MusicTranscriptionrdquo Paper presented at the 2000 Interna-tional Symposium on Music Information Retrieval Uni-versity of Massachusetts at Amherst 23ndash25 October

Carreras F et al 1999 ldquoAutomatic Harmonic Descrip-tion of Musical Signals Using Schema-Based Chord De-

84 Computer Music Journal

Shenoy and Wang

compositionrdquo Journal of New Music Research 28(4)310ndash333

Cemgil A T et al 2001 ldquoOn Tempo Tracking Tempo-gram Representation and Kalman Filteringrdquo Journal ofNew Music Research 29(4)259ndash273

Cemgil A T and B Kappen 2003 ldquoMonte Carlo Meth-ods for Tempo Tracking and Rhythm QuantizationrdquoJournal of Artificial Intelligence Research 18(1)45ndash81

Chew E 2001 ldquoModeling Tonality Applications to Mu-sic Cognitionrdquo Proceedings of the Twenty-Third An-nual Conference of the Cognitive Science SocietyWheat Ridge Colorado Cognitive Science Societypp 206ndash211

Chew E 2002 ldquoThe Spiral Array An Algorithm for De-termining Key Boundariesrdquo Proceedings of the 2002International Conference on Music and Artificial In-telligence Vienna Springer pp 18ndash31

Dixon S 2001 ldquoAutomatic Extraction of Tempo and Beatfrom Expressive Performancesrdquo Journal of New MusicResearch 30(1)39ndash58

Dixon S 2003 ldquoOn the Analysis of Musical Expressionsin Audio Signalsrdquo SPIEmdashThe International Society forOptical Engineering 5021(2)122ndash132

Dixon S 2004 ldquoAnalysis of Musical Content in DigitalAudiordquo Computer Graphics and Multimedia Applica-tions Problems and Solutions Hershey PennsylvaniaIdea Group pp 214ndash235

Ewer G 2002 Easy Music Theory Lower Sackville NovaScotia Spring Day Music Publishers

Fujishima T 1999 ldquoReal-Time Chord Recognition ofMusical Sound A System Using Common Lisp Mu-sicrdquo Proceedings of the 1999 International ComputerMusic Conference San Francisco International Com-puter Music Association pp 464ndash467

Goto M 2001 ldquoAn Audio-Based Real-Time Beat-TrackingSystem for Music with or without Drum SoundsrdquoJournal of New Music Research 30(2)159ndash171

Goto M 2003 ldquoA Chorus-Section Detecting Method forMusical Audio Signalsrdquo Proceedings of the 2003 Inter-national Conference on Acoustics Speech and SignalProcessing Piscataway New Jersey Institute for Elec-trical and Electronics Engineers pp 437ndash440

Goto M and Y Muraoka 1994 ldquoA Beat-Tracking Sys-tem for Acoustic Signals of Musicrdquo Proceedings of the1994 ACM Multimedia New York Association forComputing Machinery pp 365ndash372

Goto M and Y Muraoka 1999 ldquoReal-Time Beat Track-ing for Drumless Audio Signals Chord Change Detec-tion for Musical Decisionsrdquo Speech Communication27(3ndash4)311ndash335

Hevner K 1936 ldquoExperimental Studies of the Elements

of Expression in Musicrdquo American Journal of Psychol-ogy 48246ndash268

Huron D 2000 ldquoPerceptual and Cognitive Applicationsin Music Information Retrievalrdquo Paper presented atthe 2000 International Symposium on Music Informa-tion Retrieval University of Massachusetts at Amherst23ndash25 October

Huron D and R Parncutt 1993 ldquoAn Improved Modelof Tonality Perception Incorporating Pitch Salienceand Echoic Memoryrdquo Psychomusicology 12(2)154ndash171

Klapuri A 2003 ldquoAutomatic Transcription of MusicrdquoProceedings of SMAC 2003 Stockholm KTH

Krumhansl C 1990 Cognitive Foundations of MusicalPitch Oxford Oxford University Press

Maddage N C et al 2004 ldquoContent-Based Music Struc-ture Analysis with Applications to Music SemanticsUnderstandingrdquo Proceedings of 2004 ACM Multime-dia New York Association for Computing Machinerypp 112ndash119

Martin KD et al 1998 ldquoMusic Content AnalysisThrough Models of Auditionrdquo Paper presented at the1998 ACM Multimedia Workshop on Content Pro-cessing of Music for Multimedia Applications BristolUK 12 April

Ng K et al 1996 ldquoAutomatic Detection of Tonality Us-ing Note Distributionrdquo Journal of New Music Re-search 25(4)369ndash381

Nwe T L et al 2004 ldquoSinging Voice Detection in Popu-lar Musicrdquo Paper presented at the 2004 ACM Multime-dia Conference New York 12 October

Ortiz-Berenguer L I and F J Casajus-Quiros 2002ldquoPolyphonic Transcription Using Piano Modeling forSpectral Pattern Recognitionrdquo Proceedings of the 2002Conference on Digital Audio Effects Hoboken NewJersey Wiley pp 45ndash50

Pardo B and W Birmingham 1999 ldquoAutomated Parti-tioning of Tonal Musicrdquo Technical Report CSE-TR-396ndash99 Electrical Engineering and Computer ScienceDepartment University of Michigan

Pardo B and W Birmingham 2001 ldquoChordal Analysis ofTonal Musicrdquo Technical Report CSE-TR-439-01 Elec-trical Engineering and Computer Science DepartmentUniversity of Michigan

Pauws S 2004 ldquoMusical Key Extraction from AudiordquoPaper presented at the 2004 International Sympo-sium on Music Information Retrieval Barcelona11 October

Pickens J et al 2002 ldquoPolyphonic Score Retrieval UsingPolyphonic Audio Queries A Harmonic Modeling Ap-proachrdquo Paper presented at the 2002 International

85

Symposium on Music Information Retrieval Paris15 October

Pickens J and T Crawford 2002 ldquoHarmonic Models forPolyphonic Music Retrievalrdquo Proceedings of the 2002ACM Conference on Information and Knowledge Man-agement New York Association for Computing Ma-chinery pp 430ndash437

Pickens J 2003 ldquoKey-Specific Shrinkage Techniques forHarmonic Modelsrdquo Poster presented at the 2003 Inter-national Symposium on Music Information RetrievalBaltimore 26ndash30 October

Plumbley M D et al 2002 ldquoAutomatic Music Tran-scription and Audio Source Separationrdquo Cyberneticsand Systems 33(6)603ndash627

Povel D J L 2002 ldquoA Model for the Perception of TonalMelodiesrdquo Proceedings of the 2002 International Con-ference on Music and Artificial Intelligence ViennaSpringer pp 144ndash154

Raphael C 2001 ldquoAutomated Rhythm TranscriptionrdquoPaper presented at the 2001 International Symposiumon Music Information Retrieval Bloomington Indiana15ndash17 October

Raphael C and J Stoddard 2003 ldquoHarmonic Analysiswith Probabilistic Graphical Modelsrdquo Paper presentedat the 2003 International Symposium on Music Infor-mation Retrieval Baltimore 26ndash30 October

Scheirer E D 1998 ldquoTempo and Beat Analysis of Acous-tic Musical Signalsrdquo Journal of the Acoustical Societyof America 103(1)588ndash601

Sheh A and D P W Ellis 2003 ldquoChord Segmentationand Recognition Using Em-Trained Hidden MarkovModelsrdquo Paper presented at the 2003 InternationalSymposium on Music Information Retrieval Balti-more 26ndash30 October

Shenoy A et al 2004 ldquoKey Determination of AcousticMusical Signalsrdquo Paper presented at the 2004 Interna-tional Conference on Multimedia and Expo Taipei 27ndash30 June

Smith N A and M A Schmuckler 2000 ldquoPitch-Distributional Effects on the Perception of Tonalityrdquo

Paper presented at the Sixth International Confer-ence on Music Perception and Cognition Keele5ndash10 August

Sollberger B et al 2003 ldquoMusical Chords as AffectivePriming Context in a Word-Evaluation Taskrdquo MusicPerception 20(3)263ndash282

Temperley D 1999a ldquoImproving the Krumhansl-Schmuckler Key-Finding Algorithmrdquo Paper presentedat the 22nd Annual Meeting of the Society for MusicTheory Atlanta 12 November

Temperley D 1999b ldquoWhatrsquos Key for Key TheKrumhansl-Schmuckler Key-Finding Algorithm Recon-sideredrdquo Music Perception 17(1)65ndash100

Temperley D 2002 ldquoA Bayesian Model of Key-FindingrdquoProceedings of the 2002 International Conference onMusic and Artificial Intelligence Vienna Springerpp 195ndash206

Vercoe B L 1997 ldquoComputational Auditory Pathways toMusic Understandingrdquo In I Deliegravege and J Sloboda edsPerception and Cognition of Music London Psychol-ogy Press pp 307ndash326

Wakefield G H 1999 ldquoMathematical Representation ofJoint Time-Chroma Distributionsrdquo SPIEmdashThe Inter-national Society for Optical Engineering 3807637ndash645

Wang Y et al 2003 ldquoParametric Vector Quantization forCoding Percussive Sounds in Musicrdquo Proceedings ofthe 2003 International Conference on AcousticsSpeech and Signal Processing Piscataway New JerseyInstitute for Electrical and Electronics Engineers

Wang Y et al 2004 ldquoLyricAlly Automatic Synchroniza-tion of Acoustic Musical Signals and Textual LyricsrdquoProceedings of 2004 ACM Multimedia New York As-sociation for Computing Machinery pp 212ndash219

Zhu Y and M Kankanhalli 2003 ldquoMusic Scale Modelingfor Melody Matchingrdquo Proceedings of the Eleventh In-ternational ACM Conference on Multimedia New YorkAssociation for Computing Machinery pp 359ndash362

Zhu Y et al 2004 ldquoA Method for Solmization of MelodyrdquoPaper presented at the 2004 International Conferenceon Multimedia and Expo Taipei 27ndash30 June

86 Computer Music Journal

Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of a measure than at other half-note positions. Hence, if three of the four chords in a measure are the same, the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we implement only Check 3b, as illustrated with an example in Figure 4. This check is required because, in the case of a minor key, both the major and the minor chord of the same pitch class can be present in the song. A classic example of this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which suggests a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor rarely occurs as such in popular-music melodies, but it has been included in this work, along with the natural and harmonic minor, for completeness.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).

80 Computer Music Journal

Shenoy and Wang
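As a minimal sketch (our own formulation, not the authors' code), Check 3b can be written over the four quarter-note chord slots of a 4/4 measure; the function name and the single-common-chord condition are assumptions made for illustration:

```python
def check_3b(measure):
    """Check 3b sketch: if exactly one chord label appears in both halves
    of a 4/4 measure (four chord slots), relabel the whole measure with it;
    otherwise leave the measure unchanged."""
    first_half, second_half = set(measure[:2]), set(measure[2:])
    common = first_half & second_half
    if len(common) == 1:
        return [common.pop()] * 4
    return measure

# "Hotel California" case: one E major misdetected amid E minor chords.
print(check_3b(["Em", "E", "Em", "Em"]))   # -> ['Em', 'Em', 'Em', 'Em']
print(check_3b(["C", "G", "Am", "F"]))     # unchanged: no common chord
```

Ambiguous measures (no common chord, or more than one) are deliberately left untouched, mirroring the conservative spirit of the check.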

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No. | Song (Year, Artist – Title) | Chord Detection (% acc.) | Original Key | Detected Key | Chord Accuracy Enhancement I (% acc.) | Successful Measure Detection | Chord Accuracy Enhancement II (% acc.)
1 | (1965) Righteous Brothers – Unchained Melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | (1977) Bee Gees – Stayin' Alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | (1977) Eric Clapton – Wonderful Tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | (1977) Fleetwood Mac – You Make Lovin' Fun | 44.37 | A major | A major | 60.69 | Yes | 79.31
5 | (1979) Eagles – I Can't Tell You Why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | (1984) Foreigner – I Want to Know What Love Is | 55.03 | D minor | D minor | 73.42 | No | 58.12
7 | (1986) Bruce Hornsby – The Way It Is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | (1989) Chris Rea – Road to Hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | (1991) REM – Losing My Religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | (1991) U2 – One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | (1992) Michael Jackson – Heal the World | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | (1993) MLTR – Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | (1995) Coolio – Gangsta's Paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | (1996) Backstreet Boys – As Long as You Love Me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | (1996) Joan Osborne – One of Us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | (1997) Bryan Adams – Back to You | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | (1997) Green Day – Time of Your Life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | (1997) Hanson – Mmmbop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | (1997) Savage Garden – Truly Madly Deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | (1997) Spice Girls – Viva Forever | 64.50 | D minor | F major | 74.25 | Yes | 91.42
21 | (1997) Tina Arena – Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | (1998) Jennifer Paige – Crush | 40.37 | C minor | C minor | 55.41 | Yes | 76.78
23 | (1998) Natalie Imbruglia – Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | (1999) Santana – Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | (2000) Corrs – Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | (2000) Craig David – Walking Away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | (2000) Nelly Furtado – Turn Off the Light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | (2000) Westlife – Seasons in the Sun | 34.19 | F major | F major | 58.69 | Yes | 76.35
29 | (2001) Shakira – Whenever Wherever | 49.86 | C minor | C minor | 62.82 | Yes | 78.39
30 | (2001) Train – Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85
Overall accuracy at each stage | | 48.13 | 28/30 songs correct | | 63.20 | 28/30 songs | 78.91

It can be observed that the average chord-detection accuracy across the length of the entire music performed by the chord-detection step (module 3 in Figure 2) is relatively low, at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 2), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves on average by 15.07% on applying Chord Accuracy Enhancement (Phase 1; see module 5 in Figure 2). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 2) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement (Phase 2; see module 7 in Figure 2) makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (song numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection in performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present in both the key of C major and its relative minor, A minor.

The use of a weighted-cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied causes the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.
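To make the bias concrete, here is a toy illustration. The binary chord patterns, chord counts, and weights below are hypothetical (the article does not list its actual patterns or weights); the sketch only shows why a chord histogram whose chords belong to both C major and A minor scores higher against the shorter major-key pattern, purely because of the pattern-length normalization:

```python
import math

def cosine_similarity(counts, pattern):
    # Dot product of the chord-count vector with a binary key pattern,
    # normalized by both vector lengths. A longer pattern means a larger
    # normalizing term and thus a smaller score for the same dot product.
    dot = sum(counts.get(chord, 0) for chord in pattern)
    norm_counts = math.sqrt(sum(v * v for v in counts.values()))
    norm_pattern = math.sqrt(len(pattern))
    return dot / (norm_counts * norm_pattern)

# Diatonic triads of C major (a single scale) ...
C_MAJOR = {"C", "Dm", "Em", "F", "G", "Am", "Bdim"}

# ... versus A minor pooled over the natural, harmonic, and melodic scales,
# which yields a longer pattern.
A_MINOR = {"Am", "Bdim", "Bm", "C", "Caug", "D", "Dm", "E", "Em",
           "F", "F#dim", "G", "G#dim"}

# Hypothetical chord counts in the spirit of "Walking Away": every chord
# lies in both keys, so only the normalization differs.
counts = {"Am": 10, "Dm": 8, "F": 6, "G": 6, "C": 9}

print(cosine_similarity(counts, C_MAJOR) > cosine_similarity(counts, A_MINOR))
# -> True: the shorter major-key pattern is preferred.
```

Both dot products are identical here (all five chords appear in both patterns), so the major key wins solely through the smaller normalizing term, which is exactly the bias described above.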

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity, based on an analysis of sheet music, can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of other chords and key changes.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), etc.

Polyphony, with its multidimensional sequences of overlapping tones and the overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
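This misdetection can be reproduced with a minimal chroma-template matcher. The binary triad templates and dot-product scoring below are illustrative assumptions, not the system's actual weighting:

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root, minor):
    # Binary 12-dimensional template with 1s at the triad's pitch classes.
    third = 3 if minor else 4
    notes = {root % 12, (root + third) % 12, (root + 7) % 12}
    return [1.0 if i in notes else 0.0 for i in range(12)]

def best_triad(chroma):
    # Score each of the 24 major/minor triads by dot product with the
    # chroma vector and return the highest-scoring label.
    best_name, best_score = None, float("-inf")
    for root in range(12):
        for minor in (False, True):
            template = triad_template(root, minor)
            score = sum(c * t for c, t in zip(chroma, template))
            if score > best_score:
                best_name = PITCH_CLASSES[root] + ("m" if minor else "")
                best_score = score
    return best_name

# A Cmaj7 (C, E, G, B) whose upper notes are reinforced by overlapping
# harmonics: C is weak, while E, G, and B dominate the chroma vector.
chroma = [0.0] * 12
chroma[0], chroma[4], chroma[7], chroma[11] = 0.5, 1.0, 1.0, 1.0

print(best_triad(chroma))  # -> Em: (E, G, B) outscores C major (C, E, G)
```

With the C bin attenuated, the E minor template captures all three strong bins, while the C major template picks up the weak C bin instead of B, so the matcher prefers E minor, exactly the error described above.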

Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus or refrain) slightly different from the previous parts. This is effected by transposing the song upward, usually by a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
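Such a modulation simply rotates every chord root up one semitone. A small illustrative helper (our own, with sharp-only spelling and no flat handling):

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose_chord(chord, semitones=1):
    # Split the symbol into a root (with optional sharp) and a quality
    # suffix, then rotate the root by the given number of semitones.
    root_len = 2 if len(chord) > 1 and chord[1] == "#" else 1
    root, quality = chord[:root_len], chord[root_len:]
    return NOTES[(NOTES.index(root) + semitones) % 12] + quality

# A verse progression shifted up a half step for the final chorus.
verse = ["Am", "F", "C", "G"]
final_chorus = [transpose_chord(ch) for ch in verse]
print(final_chorus)  # -> ['A#m', 'F#', 'C#', 'G#']
```

A system that has locked onto the original key's chord set would fail to recognize any of these shifted labels, which is the failure mode described above.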

Another point to be noted here is the chord substitution, or simplification, of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.
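This substitution can be sketched as a simple symbol reduction; the regular expression and function name are our own illustration, not taken from the evaluation code:

```python
import re

def simplify_chord(symbol):
    # Keep the root (with optional accidental) plus a minor marker "m",
    # but only when that "m" is not the start of "maj"; extensions such
    # as 7, maj7, 9, sus, etc. are stripped away.
    m = re.match(r"^([A-G][#b]?)(m(?!aj))?", symbol)
    root, minor = m.group(1), m.group(2)
    return root + ("m" if minor else "")

print(simplify_chord("Cmaj7"))  # -> C
print(simplify_chord("Dm7"))    # -> Dm
print(simplify_chord("G7"))     # -> G
```

The negative lookahead keeps "Cmaj7" from being misread as a C minor chord, a classic pitfall when parsing chord symbols naively.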

Rhythm Tracking Observations

We conjecture the errors in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music that are not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to the complexities of polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, so that an incorrect pattern of measure boundaries ends up with the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.
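One plausible (hypothetical) form of such a highest-count selection, sketched here rather than taken from the article: choose the downbeat offset whose four-beat groups most often contain a single unchanged chord. A misclassified chord then directly weakens the vote for the correct offset:

```python
def best_measure_phase(chords):
    # Try each of the four possible downbeat offsets (4/4 meter, one chord
    # label per quarter note) and keep the offset whose four-beat groups
    # most often hold a single unchanged chord.
    best_offset, best_count = 0, -1
    for offset in range(4):
        groups = [chords[i:i + 4] for i in range(offset, len(chords) - 3, 4)]
        count = sum(1 for g in groups if len(set(g)) == 1)
        if count > best_count:
            best_offset, best_count = offset, count
    return best_offset

# Quarter-note chord labels where the true measures start one beat in.
labels = ["G", "C", "C", "C", "C", "F", "F", "F", "F"]
print(best_measure_phase(labels))  # -> 1
```

Replacing one of the C labels with a spurious triad would break up the constant group at offset 1 and could let a wrong offset win the count, which is the failure mode conjectured above.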

Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time; however, they would not extend very far beyond this idiom, or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory-analysis systems fall clearly behind humans in performance. Toward this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

Figure 5. Key shift.

Key Detection

Our technique assumes that the key of the song is constant throughout the duration of the song. However, owing to the properties of the relative major/minor key combinations, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. The same cannot be said about other kinds of key changes in the music, because such key changes are quite difficult to track: there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B minor key in the verse to an E major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in the future.

Chord Detection

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable"-sounding one (a "consonance").

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onsets of percussive events (Wang et al. 2003). Future efforts will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE – The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE – The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Shenoy and Wang

this can be seen in "Hotel California" by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and the minor chord are potentially present in the key of B minor. The melodic minor, however, rarely occurs as such in popular music melodies, but it has been included in this work along with the natural and harmonic minor for completeness.
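The scale-membership test that Check 1 performs can be sketched as follows. This is our own illustration, not the authors' implementation: a chord passes the check if all of its pitch classes fit at least one form of the minor scale built on the detected tonic.

```python
# Illustrative sketch (not the authors' code) of an intra-key chord check:
# a chord is accepted if every chord tone lies in at least one minor-scale
# form (natural, harmonic, or ascending melodic) on the detected tonic.

NOTE_TO_PC = {'C': 0, 'C#': 1, 'D': 2, 'D#': 3, 'E': 4, 'F': 5,
              'F#': 6, 'G': 7, 'G#': 8, 'A': 9, 'A#': 10, 'B': 11}

# Semitone patterns of the three minor-scale forms.
MINOR_SCALES = {
    'natural':  [0, 2, 3, 5, 7, 8, 10],
    'harmonic': [0, 2, 3, 5, 7, 8, 11],
    'melodic':  [0, 2, 3, 5, 7, 9, 11],  # ascending form
}

def triad(root, quality):
    """Pitch-class set of a major or minor triad."""
    r = NOTE_TO_PC[root]
    third = 4 if quality == 'major' else 3
    return {r, (r + third) % 12, (r + 7) % 12}

def chord_fits_minor_key(root, quality, tonic):
    """True if every chord tone lies in some minor-scale form on tonic."""
    chord = triad(root, quality)
    t = NOTE_TO_PC[tonic]
    return any(chord <= {(t + iv) % 12 for iv in pattern}
               for pattern in MINOR_SCALES.values())

# In B minor, E minor fits the natural minor and E major fits the
# ascending melodic minor, so neither chord is flagged as an error:
print(chord_fits_minor_key('E', 'minor', 'B'))  # True
print(chord_fits_minor_key('E', 'major', 'B'))  # True
```

Here an F major chord, by contrast, fits none of the three forms of B minor and would be flagged.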

Experiments

Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.


Table 1. Experimental Results

No.  Year  Artist – Song Title                        Chord Det. (%)  Orig. Key  Det. Key  Enh. I (%)  Measure Det.  Enh. II (%)

 1   1965  Righteous Brothers – Unchained Melody          57.68      C major    C major     70.92        Yes           85.11
 2   1977  Bee Gees – Stayin' Alive                       39.67      F minor    F minor     54.91        Yes           71.40
 3   1977  Eric Clapton – Wonderful Tonight               27.70      G major    G major     40.82        Yes           60.64
 4   1977  Fleetwood Mac – You Make Lovin' Fun            44.37      A major    A major     60.69        Yes           79.31
 5   1979  Eagles – I Can't Tell You Why                  52.41      D major    D major     68.74        Yes           88.97
 6   1984  Foreigner – I Want to Know What Love Is        55.03      D minor    D minor     73.42        No            58.12
 7   1986  Bruce Hornsby – The Way It Is                  59.74      G major    G major     70.32        Yes           88.50
 8   1989  Chris Rea – Road to Hell                       61.51      A minor    A minor     76.64        Yes           89.24
 9   1991  R.E.M. – Losing My Religion                    56.31      A minor    A minor     70.75        Yes           85.74
10   1991  U2 – One                                       56.63      C major    C major     64.82        Yes           76.63
11   1992  Michael Jackson – Heal the World               30.44      A major    A major     51.76        Yes           68.62
12   1993  MLTR – Someday                                 56.68      D major    D major     69.71        Yes           87.30
13   1995  Coolio – Gangsta's Paradise                    31.75      C minor    C minor     47.94        Yes           70.79
14   1996  Backstreet Boys – As Long As You Love Me       48.45      C major    C major     61.97        Yes           82.82
15   1996  Joan Osborne – One of Us                       46.90      A major    A major     59.30        Yes           80.05
16   1997  Bryan Adams – Back to You                      68.92      C major    C major     75.69        Yes           95.80
17   1997  Green Day – Time of Your Life                  54.55      G major    G major     64.58        Yes           87.77
18   1997  Hanson – MMMBop                                39.56      A major    A major     63.39        Yes           81.08
19   1997  Savage Garden – Truly Madly Deeply             49.06      C major    C major     63.88        Yes           80.86
20   1997  Spice Girls – Viva Forever                     64.50      D minor    F major     74.25        Yes           91.42
21   1997  Tina Arena – Burn                              35.42      G major    G major     56.13        Yes           77.38
22   1998  Jennifer Paige – Crush                         40.37      C minor    C minor     55.41        Yes           76.78
23   1998  Natalie Imbruglia – Torn                       53.00      F major    F major     67.89        Yes           87.73
24   1999  Santana – Smooth                               54.53      A minor    A minor     69.63        No            49.91
25   2000  Corrs – Breathless                             36.77      B major    B major     63.47        Yes           77.28
26   2000  Craig David – Walking Away                     68.99      A minor    C major     75.26        Yes           93.03
27   2000  Nelly Furtado – Turn Off the Light             36.36      D major    D major     48.48        Yes           70.52
28   2000  Westlife – Seasons in the Sun                  34.19      F major    F major     58.69        Yes           76.35
29   2001  Shakira – Whenever, Wherever                   49.86      C minor    C minor     62.82        Yes           78.39
30   2001  Train – Drops of Jupiter                       32.54      C major    C major     53.73        Yes           69.85

Overall accuracy at each stage: 48.13% (chord detection); key correct for 28/30 songs; 63.20% (after Enhancement I); measures correct for 28/30 songs; 78.91% (after Enhancement II).

Note: Chord Det. = initial chord-detection accuracy; Enh. I/II = chord-detection accuracy after Chord Accuracy Enhancement Phases 1 and 2; Measure Det. = whether measure detection succeeded.

It can be observed that the average chord-detection accuracy across the length of the entire piece of music, as performed by the chord-detection step (module 3 in Figure 2), is relatively low at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 2), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves, on average, by 15.07% on applying Chord Accuracy Enhancement, Phase 1 (module 5 in Figure 2). Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 2) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement, Phase 2 (module 7 in Figure 2), makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (song numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

Analysis of Key

It can be observed that for two of the songs (song numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song "Viva Forever" by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and that of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song "Walking Away" by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present both in the key of C major and in its relative minor, A minor.
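This ambiguity can be verified mechanically. The sketch below (our own illustration, not the authors' code) builds the diatonic triads of C major and of A natural minor and confirms that the two inventories coincide, so the "Walking Away" chords fit both keys equally well.

```python
# A major key and its relative natural minor share the same seven scale
# tones, so they generate identical sets of diatonic triads.

MAJOR = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]

def diatonic_triads(tonic_pc, scale):
    """Stacked-third triads on each scale degree, as pitch-class sets."""
    pcs = [(tonic_pc + s) % 12 for s in scale]
    return {frozenset((pcs[i], pcs[(i + 2) % 7], pcs[(i + 4) % 7]))
            for i in range(7)}

c_major = diatonic_triads(0, MAJOR)          # C major
a_minor = diatonic_triads(9, NATURAL_MINOR)  # A natural minor
print(c_major == a_minor)  # True: identical triad inventories

# A minor, D minor, F major, G major, C major as pitch-class sets:
song_chords = [frozenset(s) for s in
               [{9, 0, 4}, {2, 5, 9}, {5, 9, 0}, {7, 11, 2}, {0, 4, 7}]]
print(all(ch in c_major for ch in song_chords))  # True
```

Because every chord observed in such a song is diatonic to both keys, no amount of chord evidence alone can break the tie.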

The use of a weighted-cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization that is performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied would cause the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.
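The bias can be reproduced with a toy cosine-similarity computation. The templates and weights below are invented for illustration (they are not the paper's actual profiles): counting only major/minor triads, A minor admits nine triads across its three scale forms versus six for C major, so with the same matching chords the minor template's larger norm depresses its score.

```python
# Toy illustration of the normalization bias: the minor-key template spans
# more chords (all three scale forms), so its vector norm is larger, and
# identical chord evidence yields a lower cosine score than for the
# relative major. Templates and the histogram are made up for the example.

import math

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())))

# Binary key templates over major/minor triad labels.
c_major_tpl = {c: 1 for c in ['C', 'Dm', 'Em', 'F', 'G', 'Am']}
a_minor_tpl = {c: 1 for c in ['Am', 'C', 'Dm', 'Em', 'F', 'G',   # natural
                              'E',                               # harmonic
                              'Bm', 'D']}                        # melodic

# Detected-chord histogram for a song using only chords shared by both keys:
hist = {'Am': 10, 'Dm': 8, 'F': 6, 'G': 6, 'C': 4}

print(cosine(hist, c_major_tpl) > cosine(hist, a_minor_tpl))  # True
```

Both templates match all the observed chords, yet the major key wins purely because its template vector is shorter, which is the bias described above.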

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity, based on an analysis of sheet music, can be found in Ewer (2002).

Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of the use of other chords and of key changes.

Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads and seventh chords (e.g., the dominant seventh, major seventh, and minor seventh).

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
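This confusion can be demonstrated with a hypothetical chroma vector and binary triad templates (the weights below are invented to mimic the mis-weighting described; this is not the system's actual matcher): once E, G, and B dominate the chroma vector, the E minor template outscores C major.

```python
# Hypothetical chroma-template match: if overlapping harmonics make E, G,
# and B stronger than the C root, a Cmaj7 sonority is closer to the
# E-minor triad template than to the C-major one. Weights are invented.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def template(pcs):
    v = [0.0] * 12
    for p in pcs:
        v[p] = 1.0
    return v

chroma = [0.0] * 12
chroma[0] = 0.4                 # C: root weakly represented
for p in (4, 7, 11):            # E, G, B reinforced by harmonics
    chroma[p] = 1.0

c_major = template([0, 4, 7])   # C E G
e_minor = template([4, 7, 11])  # E G B
print(cosine(chroma, e_minor) > cosine(chroma, c_major))  # True
```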

Key Change

In some songs, there is a key change toward the end to make the final repeated part(s) (e.g., the chorus or refrain) slightly different from the previous parts. This is effected by transposing the song upward, usually by a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.
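A small sketch shows why such a modulation defeats a fixed-key detector. The chord progression here is invented for the example, not taken from any particular song:

```python
# After a half-step modulation, every chord label shifts by one semitone,
# so a detector restricted to the original key's diatonic chord set no
# longer matches anything in the final section.

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def transpose(chord, semitones):
    root, quality = chord  # e.g., ('C', 'major')
    return NOTES[(NOTES.index(root) + semitones) % 12], quality

verse = [('C', 'major'), ('A', 'minor'), ('F', 'major'), ('G', 'major')]
final_chorus = [transpose(ch, 1) for ch in verse]  # modulated up a half step

# Major/minor triads diatonic to the original C-major key:
key_chords = {('C', 'major'), ('D', 'minor'), ('E', 'minor'),
              ('F', 'major'), ('G', 'major'), ('A', 'minor')}
print(any(ch in key_chords for ch in final_chorus))  # False
```

None of the transposed chords is diatonic to the original key, so every chord in the modulated section is either discarded or mislabeled by a system that assumes a single global key.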

Another point to be noted here concerns the chord substitution, or simplification, of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7 chord can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.
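This kind of triad simplification can be sketched as follows, assuming chords are represented as pitch-class sets (the representation and search order are our own illustration):

```python
# Minimal sketch: reduce an extended chord to a major/minor triad that is
# contained in it. Roots are tried in ascending pitch-class order, major
# before minor, so Cmaj7 resolves to its C-major triad rather than to the
# E-minor triad it also contains.

def simplify(chord_pcs):
    """Return (root_pc, quality) of a contained major/minor triad, or None."""
    for root in range(12):
        for quality, third in (('major', 4), ('minor', 3)):
            triad = {root, (root + third) % 12, (root + 7) % 12}
            if triad <= chord_pcs:
                return root, quality
    return None

print(simplify({0, 4, 7, 11}))  # Cmaj7 (C E G B) -> (0, 'major')
```

A real evaluation would break the Cmaj7/Em ambiguity using the root annotated in the sheet music rather than a fixed search order.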

Rhythm Tracking Observations

We conjecture the error in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music but not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to the complexities of polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, resulting in an incorrect pattern of measure boundaries having the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.
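The failure mode can be illustrated with a speculative sketch of the kind of measure-phase vote the description implies (the actual module is defined earlier in the article; this simplified version is our own): each candidate offset groups the beat-level chord sequence into fours, and the offset whose groups most often hold a single repeated chord wins, so mislabeled chords can tip the vote.

```python
# Speculative sketch of measure-phase selection in 4/4: score each of the
# four possible beat offsets by how many four-beat groups contain one
# unchanging chord, and pick the best-scoring offset.

def best_measure_offset(beat_chords):
    def score(offset):
        groups = [beat_chords[i:i + 4]
                  for i in range(offset, len(beat_chords) - 3, 4)]
        return sum(1 for g in groups if len(set(g)) == 1)
    return max(range(4), key=score)

beats = ['C', 'C', 'C', 'C', 'F', 'F', 'F', 'F', 'G', 'G', 'G', 'G']
print(best_measure_offset(beats))  # 0: measures start on the first beat
```

A few chord-detection errors scattered across the sequence lower the score of the true offset and can hand the highest count to a shifted boundary pattern, which is the conjectured failure for songs 6 and 24.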

Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. To this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

Figure 5. Key shift.

Key Detection

Our technique assumes that the key of the song is constant throughout its duration. However, owing to the properties of the relative major/minor key combinations, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. The same cannot, however, be said about other kinds of key changes in the music. This is because such key changes are quite difficult to track, as there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B minor key in the verse to an E major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in future work.

Chord Detection

In this approach, we have considered only the major and minor triads. However, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key: that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable"-sounding one (a "consonance").

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onsets of percussive events (Wang et al. 2003). Future efforts will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE: The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE: The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.


Cemgil A T et al 2001 ldquoOn Tempo Tracking Tempo-gram Representation and Kalman Filteringrdquo Journal ofNew Music Research 29(4)259ndash273

Cemgil A T and B Kappen 2003 ldquoMonte Carlo Meth-ods for Tempo Tracking and Rhythm QuantizationrdquoJournal of Artificial Intelligence Research 18(1)45ndash81

Chew E 2001 ldquoModeling Tonality Applications to Mu-sic Cognitionrdquo Proceedings of the Twenty-Third An-nual Conference of the Cognitive Science SocietyWheat Ridge Colorado Cognitive Science Societypp 206ndash211

Chew E 2002 ldquoThe Spiral Array An Algorithm for De-termining Key Boundariesrdquo Proceedings of the 2002International Conference on Music and Artificial In-telligence Vienna Springer pp 18ndash31

Dixon S 2001 ldquoAutomatic Extraction of Tempo and Beatfrom Expressive Performancesrdquo Journal of New MusicResearch 30(1)39ndash58

Dixon S 2003 ldquoOn the Analysis of Musical Expressionsin Audio Signalsrdquo SPIEmdashThe International Society forOptical Engineering 5021(2)122ndash132

Dixon S 2004 ldquoAnalysis of Musical Content in DigitalAudiordquo Computer Graphics and Multimedia Applica-tions Problems and Solutions Hershey PennsylvaniaIdea Group pp 214ndash235

Ewer G 2002 Easy Music Theory Lower Sackville NovaScotia Spring Day Music Publishers

Fujishima T 1999 ldquoReal-Time Chord Recognition ofMusical Sound A System Using Common Lisp Mu-sicrdquo Proceedings of the 1999 International ComputerMusic Conference San Francisco International Com-puter Music Association pp 464ndash467

Goto M 2001 ldquoAn Audio-Based Real-Time Beat-TrackingSystem for Music with or without Drum SoundsrdquoJournal of New Music Research 30(2)159ndash171

Goto M 2003 ldquoA Chorus-Section Detecting Method forMusical Audio Signalsrdquo Proceedings of the 2003 Inter-national Conference on Acoustics Speech and SignalProcessing Piscataway New Jersey Institute for Elec-trical and Electronics Engineers pp 437ndash440

Goto M and Y Muraoka 1994 ldquoA Beat-Tracking Sys-tem for Acoustic Signals of Musicrdquo Proceedings of the1994 ACM Multimedia New York Association forComputing Machinery pp 365ndash372

Goto M and Y Muraoka 1999 ldquoReal-Time Beat Track-ing for Drumless Audio Signals Chord Change Detec-tion for Musical Decisionsrdquo Speech Communication27(3ndash4)311ndash335

Hevner K 1936 ldquoExperimental Studies of the Elements

of Expression in Musicrdquo American Journal of Psychol-ogy 48246ndash268

Huron D 2000 ldquoPerceptual and Cognitive Applicationsin Music Information Retrievalrdquo Paper presented atthe 2000 International Symposium on Music Informa-tion Retrieval University of Massachusetts at Amherst23ndash25 October

Huron D and R Parncutt 1993 ldquoAn Improved Modelof Tonality Perception Incorporating Pitch Salienceand Echoic Memoryrdquo Psychomusicology 12(2)154ndash171

Klapuri A 2003 ldquoAutomatic Transcription of MusicrdquoProceedings of SMAC 2003 Stockholm KTH

Krumhansl C 1990 Cognitive Foundations of MusicalPitch Oxford Oxford University Press

Maddage N C et al 2004 ldquoContent-Based Music Struc-ture Analysis with Applications to Music SemanticsUnderstandingrdquo Proceedings of 2004 ACM Multime-dia New York Association for Computing Machinerypp 112ndash119

Martin KD et al 1998 ldquoMusic Content AnalysisThrough Models of Auditionrdquo Paper presented at the1998 ACM Multimedia Workshop on Content Pro-cessing of Music for Multimedia Applications BristolUK 12 April

Ng K et al 1996 ldquoAutomatic Detection of Tonality Us-ing Note Distributionrdquo Journal of New Music Re-search 25(4)369ndash381

Nwe T L et al 2004 ldquoSinging Voice Detection in Popu-lar Musicrdquo Paper presented at the 2004 ACM Multime-dia Conference New York 12 October

Ortiz-Berenguer L I and F J Casajus-Quiros 2002ldquoPolyphonic Transcription Using Piano Modeling forSpectral Pattern Recognitionrdquo Proceedings of the 2002Conference on Digital Audio Effects Hoboken NewJersey Wiley pp 45ndash50

Pardo B and W Birmingham 1999 ldquoAutomated Parti-tioning of Tonal Musicrdquo Technical Report CSE-TR-396ndash99 Electrical Engineering and Computer ScienceDepartment University of Michigan

Pardo B and W Birmingham 2001 ldquoChordal Analysis ofTonal Musicrdquo Technical Report CSE-TR-439-01 Elec-trical Engineering and Computer Science DepartmentUniversity of Michigan

Pauws S 2004 ldquoMusical Key Extraction from AudiordquoPaper presented at the 2004 International Sympo-sium on Music Information Retrieval Barcelona11 October

Pickens J et al 2002 ldquoPolyphonic Score Retrieval UsingPolyphonic Audio Queries A Harmonic Modeling Ap-proachrdquo Paper presented at the 2002 International

85

Symposium on Music Information Retrieval Paris15 October

Pickens J and T Crawford 2002 ldquoHarmonic Models forPolyphonic Music Retrievalrdquo Proceedings of the 2002ACM Conference on Information and Knowledge Man-agement New York Association for Computing Ma-chinery pp 430ndash437

Pickens J 2003 ldquoKey-Specific Shrinkage Techniques forHarmonic Modelsrdquo Poster presented at the 2003 Inter-national Symposium on Music Information RetrievalBaltimore 26ndash30 October

Plumbley M D et al 2002 ldquoAutomatic Music Tran-scription and Audio Source Separationrdquo Cyberneticsand Systems 33(6)603ndash627

Povel D J L 2002 ldquoA Model for the Perception of TonalMelodiesrdquo Proceedings of the 2002 International Con-ference on Music and Artificial Intelligence ViennaSpringer pp 144ndash154

Raphael C 2001 ldquoAutomated Rhythm TranscriptionrdquoPaper presented at the 2001 International Symposiumon Music Information Retrieval Bloomington Indiana15ndash17 October

Raphael C and J Stoddard 2003 ldquoHarmonic Analysiswith Probabilistic Graphical Modelsrdquo Paper presentedat the 2003 International Symposium on Music Infor-mation Retrieval Baltimore 26ndash30 October

Scheirer E D 1998 ldquoTempo and Beat Analysis of Acous-tic Musical Signalsrdquo Journal of the Acoustical Societyof America 103(1)588ndash601

Sheh A and D P W Ellis 2003 ldquoChord Segmentationand Recognition Using Em-Trained Hidden MarkovModelsrdquo Paper presented at the 2003 InternationalSymposium on Music Information Retrieval Balti-more 26ndash30 October

Shenoy A et al 2004 ldquoKey Determination of AcousticMusical Signalsrdquo Paper presented at the 2004 Interna-tional Conference on Multimedia and Expo Taipei 27ndash30 June

Smith N A and M A Schmuckler 2000 ldquoPitch-Distributional Effects on the Perception of Tonalityrdquo

Paper presented at the Sixth International Confer-ence on Music Perception and Cognition Keele5ndash10 August

Sollberger B et al 2003 ldquoMusical Chords as AffectivePriming Context in a Word-Evaluation Taskrdquo MusicPerception 20(3)263ndash282

Temperley D 1999a ldquoImproving the Krumhansl-Schmuckler Key-Finding Algorithmrdquo Paper presentedat the 22nd Annual Meeting of the Society for MusicTheory Atlanta 12 November

Temperley D 1999b ldquoWhatrsquos Key for Key TheKrumhansl-Schmuckler Key-Finding Algorithm Recon-sideredrdquo Music Perception 17(1)65ndash100

Temperley D 2002 ldquoA Bayesian Model of Key-FindingrdquoProceedings of the 2002 International Conference onMusic and Artificial Intelligence Vienna Springerpp 195ndash206

Vercoe B L 1997 ldquoComputational Auditory Pathways toMusic Understandingrdquo In I Deliegravege and J Sloboda edsPerception and Cognition of Music London Psychol-ogy Press pp 307ndash326

Wakefield G H 1999 ldquoMathematical Representation ofJoint Time-Chroma Distributionsrdquo SPIEmdashThe Inter-national Society for Optical Engineering 3807637ndash645

Wang Y et al 2003 ldquoParametric Vector Quantization forCoding Percussive Sounds in Musicrdquo Proceedings ofthe 2003 International Conference on AcousticsSpeech and Signal Processing Piscataway New JerseyInstitute for Electrical and Electronics Engineers

Wang Y et al 2004 ldquoLyricAlly Automatic Synchroniza-tion of Acoustic Musical Signals and Textual LyricsrdquoProceedings of 2004 ACM Multimedia New York As-sociation for Computing Machinery pp 212ndash219

Zhu Y and M Kankanhalli 2003 ldquoMusic Scale Modelingfor Melody Matchingrdquo Proceedings of the Eleventh In-ternational ACM Conference on Multimedia New YorkAssociation for Computing Machinery pp 359ndash362

Zhu Y et al 2004 ldquoA Method for Solmization of MelodyrdquoPaper presented at the 2004 International Conferenceon Multimedia and Expo Taipei 27ndash30 June

86 Computer Music Journal

Shenoy and Wang

there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), and so on.

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C-major 7 chord (C, E, G, B) in the music might incorrectly be detected as an E-minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
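
This misclassification can be reproduced with a minimal template-matching sketch (an illustration of the general technique, not the authors' implementation): each major/minor triad is scored by correlating a binary template with the chroma vector, so an underweighted root note lets the E-minor template outscore C major.

```python
# Pitch classes: C=0, C#=1, ..., B=11
def triad_template(root, minor=False):
    """Binary 12-bin chroma template for a major or minor triad."""
    third = 3 if minor else 4
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template

def best_triad(chroma):
    """Return (root, quality) of the triad template that best matches chroma."""
    best, best_score = None, float("-inf")
    for root in range(12):
        for minor in (False, True):
            tpl = triad_template(root, minor)
            score = sum(c * t for c, t in zip(chroma, tpl))
            if score > best_score:
                best, best_score = (root, "minor" if minor else "major"), score
    return best

# A C-major 7 chord (C, E, G, B) whose upper notes dominate the chroma:
chroma = [0.0] * 12
chroma[0] = 0.4               # C, weakly weighted
for pc in (4, 7, 11):         # E, G, B, strongly weighted
    chroma[pc] = 1.0
print(best_triad(chroma))     # (4, 'minor') -- misread as E minor
```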

Key Change

In some songs, there is a key change toward the end to make the final repeated part(s) (e.g., the chorus/refrain) slightly different from the previous parts. This is effected by transposing the song upward, usually by a half step, as has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in such a section will not be recognized. This is illustrated with an example in Figure 5.
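
As a rough illustration (the helper names are ours, not part of the described system), such a half-step modulation corresponds to rotating pitch classes, and hence the 12-bin chroma vector, by one semitone:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def transpose_pitch_class(pc, semitones=1):
    """Shift a pitch class up by the given number of semitones."""
    return (pc + semitones) % 12

def transpose_chroma(chroma, semitones=1):
    """Rotate a 12-bin chroma vector up by `semitones`."""
    return [chroma[(i - semitones) % 12] for i in range(12)]

# A final chorus shifted up a half step from C major is effectively
# in C-sharp major for that section:
print(NOTE_NAMES[transpose_pitch_class(0, 1)])  # C#
```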

Another point to be noted here is the substitution/simplification of extended chords in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C-major 7 can be simplified to the C-major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.
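
The substitution rule can be sketched as a subset test on pitch-class sets (a simplified illustration; the function names are ours):

```python
def triad_pcs(root, minor=False):
    """Pitch-class set of a basic major or minor triad."""
    third = 3 if minor else 4
    return {(root + i) % 12 for i in (0, third, 7)}

def simplify(chord_pcs, root, minor=False):
    """Reduce an extended chord to its basic triad if the triad is present."""
    triad = triad_pcs(root, minor)
    return triad if triad <= set(chord_pcs) else set(chord_pcs)

# C-major 7 (C, E, G, B) simplifies to the C-major triad (C, E, G):
print(simplify({0, 4, 7, 11}, root=0))  # {0, 4, 7}
```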

Rhythm Tracking Observations

We conjecture that the errors in rhythm tracking are caused by errors in chord detection, as discussed above. Chords present in the music but not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to the complexities of polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, so that an incorrect pattern of measure boundaries receives the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.
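
One plausible reading of this voting process (our assumption, not the authors' published code) is a phase vote over beat-level chord labels: each of the four candidate measure phases in 4/4 is scored by how many four-beat groups carry a single chord, so a misclassified chord can hand the highest count to the wrong phase.

```python
def best_measure_phase(beat_chords):
    """Return the beat offset whose 4-beat groups are most often one chord."""
    scores = []
    for phase in range(4):
        groups = [beat_chords[i:i + 4]
                  for i in range(phase, len(beat_chords) - 3, 4)]
        # A group votes for this phase if all four beats share one chord.
        scores.append(sum(1 for g in groups if len(set(g)) == 1))
    return max(range(4), key=lambda p: scores[p])

# Chords change exactly on measure boundaries starting at beat 0:
beats = ["C", "C", "C", "C", "F", "F", "F", "F", "G", "G", "G", "G"]
print(best_measure_phase(beats))  # 0
```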

Discussion

Figure 5. Key shift.

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level music knowledge of rhythm and harmonic structure to determine all three of these dimensions of music. The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom, or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. Toward this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

Key Detection

Our technique assumes that the key of the song is constant throughout its duration. However, owing to the properties of the relative major/minor key combination, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. The same cannot be said about other kinds of key changes in the music, because such key changes are quite difficult to track: there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song "Let It Grow" by Eric Clapton switches from a B-minor key in the verse to an E-major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in future work.
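
The relative major/minor robustness can be illustrated with a small sketch (a hypothetical encoding, not the system's internal representation): the relative minor lies nine semitones above the major root, and the key input is treated as matching either member of the pair.

```python
def relative_minor(major_root):
    """Pitch class of the relative minor of a major key (A for C major)."""
    return (major_root + 9) % 12

def same_key_family(root_a, mode_a, root_b, mode_b):
    """True if the two keys are identical or a relative major/minor pair."""
    if (root_a, mode_a) == (root_b, mode_b):
        return True
    if mode_a == "major" and mode_b == "minor":
        return relative_minor(root_a) == root_b
    if mode_a == "minor" and mode_b == "major":
        return relative_minor(root_b) == root_a
    return False

# C major and A minor share the same scale notes:
print(same_key_family(0, "major", 9, "minor"))  # True
```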

Chord Detection

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the "need" for a sounded chord to move from an "unstable" sound (a "dissonance") to a more final or "stable" sounding one (a "consonance").
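
Such diatonic-function knowledge could be encoded, for instance, as the triads built on each degree of the major scale (a hedged sketch of one possible representation, not the system's):

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]           # semitone offsets of degrees
QUALITIES = ["major", "minor", "minor", "major",
             "major", "minor", "diminished"]   # I ii iii IV V vi vii(dim)

def diatonic_triads(key_root):
    """(root pitch class, quality) of the seven triads in a major key."""
    return [((key_root + step) % 12, q)
            for step, q in zip(MAJOR_SCALE, QUALITIES)]

# In C major, the "unstable" dominant (V) is G major and the "stable"
# tonic (I) it resolves to is C major:
triads = diatonic_triads(0)
print(triads[4])  # (7, 'major')  -- G major, the dominant
print(triads[0])  # (0, 'major')  -- C major, the tonic
```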

Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for "drumless" music signals, because the onset detector has been optimized to detect the onsets of percussive events (Wang et al. 2003). Future effort will be aimed at extending the current work to music signals that do not contain drum sounds.

Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE—The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." Computer Graphics and Multimedia: Applications, Problems and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds. Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE—The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute for Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of 2004 ACM Multimedia. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.


Pickens J and T Crawford 2002 ldquoHarmonic Models forPolyphonic Music Retrievalrdquo Proceedings of the 2002ACM Conference on Information and Knowledge Man-agement New York Association for Computing Ma-chinery pp 430ndash437

Pickens J 2003 ldquoKey-Specific Shrinkage Techniques forHarmonic Modelsrdquo Poster presented at the 2003 Inter-national Symposium on Music Information RetrievalBaltimore 26ndash30 October

Plumbley M D et al 2002 ldquoAutomatic Music Tran-scription and Audio Source Separationrdquo Cyberneticsand Systems 33(6)603ndash627

Povel D J L 2002 ldquoA Model for the Perception of TonalMelodiesrdquo Proceedings of the 2002 International Con-ference on Music and Artificial Intelligence ViennaSpringer pp 144ndash154

Raphael C 2001 ldquoAutomated Rhythm TranscriptionrdquoPaper presented at the 2001 International Symposiumon Music Information Retrieval Bloomington Indiana15ndash17 October

Raphael C and J Stoddard 2003 ldquoHarmonic Analysiswith Probabilistic Graphical Modelsrdquo Paper presentedat the 2003 International Symposium on Music Infor-mation Retrieval Baltimore 26ndash30 October

Scheirer E D 1998 ldquoTempo and Beat Analysis of Acous-tic Musical Signalsrdquo Journal of the Acoustical Societyof America 103(1)588ndash601

Sheh A and D P W Ellis 2003 ldquoChord Segmentationand Recognition Using Em-Trained Hidden MarkovModelsrdquo Paper presented at the 2003 InternationalSymposium on Music Information Retrieval Balti-more 26ndash30 October

Shenoy A et al 2004 ldquoKey Determination of AcousticMusical Signalsrdquo Paper presented at the 2004 Interna-tional Conference on Multimedia and Expo Taipei 27ndash30 June

Smith N A and M A Schmuckler 2000 ldquoPitch-Distributional Effects on the Perception of Tonalityrdquo

Paper presented at the Sixth International Confer-ence on Music Perception and Cognition Keele5ndash10 August

Sollberger B et al 2003 ldquoMusical Chords as AffectivePriming Context in a Word-Evaluation Taskrdquo MusicPerception 20(3)263ndash282

Temperley D 1999a ldquoImproving the Krumhansl-Schmuckler Key-Finding Algorithmrdquo Paper presentedat the 22nd Annual Meeting of the Society for MusicTheory Atlanta 12 November

Temperley D 1999b ldquoWhatrsquos Key for Key TheKrumhansl-Schmuckler Key-Finding Algorithm Recon-sideredrdquo Music Perception 17(1)65ndash100

Temperley D 2002 ldquoA Bayesian Model of Key-FindingrdquoProceedings of the 2002 International Conference onMusic and Artificial Intelligence Vienna Springerpp 195ndash206

Vercoe B L 1997 ldquoComputational Auditory Pathways toMusic Understandingrdquo In I Deliegravege and J Sloboda edsPerception and Cognition of Music London Psychol-ogy Press pp 307ndash326

Wakefield G H 1999 ldquoMathematical Representation ofJoint Time-Chroma Distributionsrdquo SPIEmdashThe Inter-national Society for Optical Engineering 3807637ndash645

Wang Y et al 2003 ldquoParametric Vector Quantization forCoding Percussive Sounds in Musicrdquo Proceedings ofthe 2003 International Conference on AcousticsSpeech and Signal Processing Piscataway New JerseyInstitute for Electrical and Electronics Engineers

Wang Y et al 2004 ldquoLyricAlly Automatic Synchroniza-tion of Acoustic Musical Signals and Textual LyricsrdquoProceedings of 2004 ACM Multimedia New York As-sociation for Computing Machinery pp 212ndash219

Zhu Y and M Kankanhalli 2003 ldquoMusic Scale Modelingfor Melody Matchingrdquo Proceedings of the Eleventh In-ternational ACM Conference on Multimedia New YorkAssociation for Computing Machinery pp 359ndash362

Zhu Y et al 2004 ldquoA Method for Solmization of MelodyrdquoPaper presented at the 2004 International Conferenceon Multimedia and Expo Taipei 27ndash30 June

86 Computer Music Journal

