
Journal of Mathematics and Music, 2016, Vol. 10, No. 2, 127–148, http://dx.doi.org/10.1080/17459737.2016.1209588

Analysis of analysis: Using machine learning to evaluate the importance of music parameters for Schenkerian analysis

Phillip B. Kirlin a* and Jason Yust b

a Department of Mathematics and Computer Science, Rhodes College, Memphis, USA; b School of Music, Boston University, Boston, USA

(Received 16 November 2015; accepted 19 June 2016)

While criteria for Schenkerian analysis have been much discussed, such discussions have generally not been informed by data. Kirlin [Kirlin, Phillip B., 2014 "A Probabilistic Model of Hierarchical Music Analysis." Ph.D. thesis, University of Massachusetts Amherst] has begun to fill this vacuum with a corpus of textbook Schenkerian analyses encoded using data structures suggested by Yust [Yust, Jason, 2006 "Formal Models of Prolongation." Ph.D. thesis, University of Washington] and a machine learning algorithm based on this dataset that can produce analyses with a reasonable degree of accuracy. In this work, we examine what musical features (scale degree, harmony, metrical weight) are most significant in the performance of Kirlin's algorithm.

Keywords: Schenkerian analysis; machine learning; harmony; melody; rhythm; feature selection

2010 Mathematics Subject Classification: 68T05; 68T10
2012 Computing Classification Scheme: supervised learning; sound and music computing

1. Introduction

Schenkerian analysis is widely understood as central to the theory of tonal music. Yet, many of the most prominent voices in the field emphasize its status as an expert practice rather than as a theory. Burstein (2011, 116), for instance, argues for preferring a Schenkerian analysis "not because it demonstrates features that are objectively or intersubjectively present in the passage, but rather because I believe it encourages a plausible yet stimulating and exciting way of perceiving and performing [the] passage." Rothstein (1990, 298) explains an approach to Schenker pedagogy as follows.

Analysis should lead to better hearing, better performing, and better thinking about music, not just to "correct" answers. [ ... ] I spend lots of class time—as much as possible—debating the merits of alternate readings: not primarily their conformance with the theory, though that is discussed where appropriate, but their relative plausibility as models of the composition being analyzed.

Schachter (1990) illustrates alternative readings of many works and asserts that a full musical context is essential to evaluating them. Paring the music down to just aspects of harmony and voice leading, like "the endless formulas in white notes that disfigure so many harmony texts," he claims, leaves the difference between competing interpretations undecidable. In publications

*Corresponding author. Email: [email protected]

© 2016 Informa UK Limited, trading as Taylor & Francis Group


such deliberation typically occurs at a high level. It rarely addresses the implicit principles used to deal with many details of the musical surface. As Agawu (2009, 116) says, "the journey from strict counterpoint to free composition makes an illicit or – better – mysterious leap as it approaches its destination."

As with any complex human activity, the techniques of artificial intelligence may greatly advance our understanding of how Schenkerian analysis is performed and what kinds of implicit cognitive abilities and priorities support it. The present work builds upon the research of Kirlin (2014a, 2015), which models Schenkerian analysis using machine learning techniques. By probing Kirlin's algorithm we address a question of deep interest to Schenkerian analysts and pedagogues: what roles do different aspects of the music play in deliberating between possible analyses of the same musical passage? Because the activity of Schenkerian analysis involves such a vast amount of implicit musical knowledge, it is treacherous to litigate this question by intuition, without the aid of computational models and methods.

This article is organized as follows. Section 2 explains the machine learning algorithm we used, which is essentially that of Kirlin (2014b). Section 3 describes a series of experiments to test which musical features the algorithm relies upon most heavily to produce accurate analyses. Section 4 describes the results of those experiments, and Section 5 provides further exploratory analysis of data produced by them.

2. A machine learning algorithm for Schenkerian analysis

Schenkerian theory is grounded in the idea that a tonal composition is organized as a hierarchical collection of prolongations, where a prolongation is defined, for our purposes, as an instance where a motion from one musical event, L, to another non-adjacent event, R, is understood to control the passage between L and R, and the intermediate events it contains. A prolongation is represented in Schenkerian notation as a slur or beam.

Consider Figure 1, a descending melodic passage outlining a G major chord. Assuming this melody takes place over G-major harmony, this passage contains two passing tones (non-harmonic tones in a melodic line that linearly connect two consonant notes via stepwise motion): the second note C and the fourth note A. These tones smoothly guide the melody between the chord tones D, B, and G. In Schenkerian terminology, the C prolongs the motion from the D to the B, and the A similarly prolongs the motion from the B to the G.

The hierarchical aspect of Schenkerian analysis comes into play when we consider a prolongation that occurs over the entire five-note passage. The slurs from D to B and B to G identify the C and A as passing tones. Another slur from D to G, which contains the smaller slurs, shows that the entire motion outlines the tonic triad from D down to G. The placement of slurs may reflect the relatively higher stability of the endpoints (preferring chord tones over non-chord tones, and the more stable members of the triad, root and fifth, over the third), or it may reflect a way in which the local motions (passing-tone figures) group into the most coherent larger-scale motion (arpeggiation of a triad).

Figure 1. A melodic line illustrating prolongations.


Figure 2. The prolongational hierarchy of a G-major chord with passing tones, represented as two equivalent data structures. (a) A binary tree of melodic intervals. (b) A maximal outerplanar graph, or MOP.

This hierarchy can be represented visually by the tree in Figure 2(a): this diagram illustrates the hierarchy of melodic intervals present in the composition and the various prolongations identified above. An equivalent representation, known as a maximal outerplanar graph, or MOP (Yust 2006, 2009, 2015), is shown in Figure 2(b). Binary trees of intervals and MOPs are duals of each other in that they represent identical sets of information, though the MOP representation is more succinct.

From a mathematical perspective, a MOP is a complete triangulation of a polygon. Each triangle in a MOP represents a single melodic prolongation among the three notes of the music represented by the three endpoints of the triangle. Because MOPs are oriented temporally, with the notes shown in a MOP always ordered from left to right as they are in the musical score, we can unambiguously refer to the three endpoints of a triangle in a MOP as the left (L), middle (M), and right (R) endpoints. Each triangle in a MOP, therefore, represents a prolongation of the melodic interval from L to R by the intervals from L to M and M to R.
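To make the data structure concrete, here is a minimal sketch in Python of how a MOP might be encoded as a set of triangles over note indices. The Triangle class and the indexing scheme are illustrative assumptions, not the encoding used in the Schenker41 dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triangle:
    """One prolongation: indices into the melody, ordered in time."""
    left: int    # L endpoint
    middle: int  # M endpoint, the elaborating note
    right: int   # R endpoint

# A MOP over notes 0..n-1 is a set of triangles triangulating the polygon
# whose vertices are the notes in temporal order.  For the five-note
# G-major line D-C-B-A-G of Figure 1 (indices 0-4):
figure_1_mop = [
    Triangle(0, 1, 2),  # C elaborates the third D-B (passing tone)
    Triangle(2, 3, 4),  # A elaborates the third B-G (passing tone)
    Triangle(0, 2, 4),  # B elaborates the fifth D-G (arpeggiation)
]
```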

2.1. A probabilistic interpretation of MOPs

Our goal is to develop an algorithm with the ability to predict, given a musical composition, a "correct" Schenkerian analysis for that composition. We develop this algorithm using the following probabilistic perspective.

Assume that we are given a sequence of notes N that we wish to analyze, and that all possible Schenkerian analyses of N can be enumerated as A_1, ..., A_m, for some integer m. We desire the most probable analysis given the notes, which is arg max_i P(A_i | N). Because a Schenkerian analysis A_i can be represented in MOP form by its collection of triangles, we will denote the set of triangles comprising analysis A_i by T_{i,1}, ..., T_{i,p}, for some integer p. We then define P(A_i | N) as the joint probability P(T_{i,1}, ..., T_{i,p}). In other words, we define the probability of an analysis being correct for a given sequence of notes as the probability of observing a MOP containing the specific set of triangles derived from that analysis.

We will use supervised machine learning to estimate this probability from a corpus of Schenkerian analyses done by humans, which we interpret as ground truth. Specifically, we use the Schenker41 dataset, the largest known collection of machine-readable Schenkerian analyses in existence (Kirlin 2014a). This dataset contains 41 common-practice era musical excerpts and their corresponding Schenkerian analyses. All of the excerpts are either for a solo keyboard instrument (or arranged for such an instrument) or for voice with keyboard accompaniment. All are in major keys and do not modulate, though there are some tonicizations. The musical excerpts are encoded in the symbolic MusicXML format, whereas the corresponding analyses are encoded in a text-based format that captures the prolongational information in a Schenkerian analysis while not assigning any musical interpretation to each prolongation. In other words, the prolongations are not labeled with terms such as "passing tone," "neighbor tone," or other descriptors, but are denoted only by their constituent notes. Each analysis also contains a Roman numeral harmonic labelling, determined either from the same source as the prolongations themselves or from an expert music analyst.


The analyses in the dataset are derived from four sources: Forte and Gilbert's Introduction to Schenkerian Analysis and the accompanying solutions manual (1982a, 1982b) (10 excerpts), Cadwallader and Gagné's Analysis of Tonal Music (1998) (4 excerpts), Pankhurst's SchenkerGUIDE (2008) (24 excerpts), and teaching materials produced by an expert music analyst (3 excerpts).

Each excerpt in the dataset can be translated into a MOP representing the prolongations present in the main melody of the excerpt. However, it is difficult to estimate the full joint probability P(T_{i,1}, ..., T_{i,p}) directly from the resulting collection of MOPs owing to the curse of dimensionality: the number of possible combinations of triangles is simply too large for any reasonably-sized dataset. Instead, we make the simplifying assumption that each triangle in a MOP is independent of all other triangles in the MOP, which implies P(T_{i,1}, ..., T_{i,p}) = P(T_{i,1}) · · · P(T_{i,p}). This assumption reduces the full joint probability to a product of simpler, lower-dimensional probabilities, which are easier to learn. An experiment verifies that this assumption largely preserves relative probability scores between two MOPs, which is a sufficient condition for our purposes to proceed (Kirlin and Jensen 2011).

Finally, we define the probability of an individual triangle appearing in a MOP as the probability of a given melodic interval being elaborated by the specific choice of a certain child note. That is, we define P(T_{i,j}) = P(M_{i,j} | L_{i,j}, R_{i,j}), where L_{i,j}, M_{i,j}, and R_{i,j} are the three endpoints (notes) of triangle T_{i,j}. In summary, we have defined the most likely analysis A_i for a given sequence of notes N as

$$
\begin{aligned}
\arg\max_i P(A_i \mid N) &= \arg\max_i P(T_{i,1}, \ldots, T_{i,p}) \\
&= \arg\max_i \prod_{j=1}^{p} P(T_{i,j}) \\
&= \arg\max_i \prod_{j=1}^{p} P(M_{i,j} \mid L_{i,j}, R_{i,j}). \qquad (1)
\end{aligned}
$$
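In practice, a product of many probabilities like the one in equation (1) would typically be accumulated in log space to avoid floating-point underflow. A minimal sketch, reusing the illustrative Triangle class above; triangle_prob is a hypothetical stand-in for the learned estimate of P(M | L, R):

```python
import math

def analysis_log_probability(triangles, triangle_prob):
    """Score one candidate analysis under the independence assumption:
    log P(A | N) = sum over the MOP's triangles of log P(M | L, R)."""
    return sum(math.log(triangle_prob(t.left, t.middle, t.right))
               for t in triangles)
```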

We now return to the question of using the Schenker41 corpus to compute the probability P(M_{i,j} | L_{i,j}, R_{i,j}), which requires us to define exactly what type of musical information underlies the variables L_{i,j}, M_{i,j}, and R_{i,j}. It is typical to describe these variables by a collection of features: specific measurable properties of the source music whose values are determined from the notes of the music corresponding to those variables. We use a set of 18 features that provide basic melodic, harmonic, metrical, and temporal information about the music; these are described more specifically in Section 3.

With a set of features in hand, a straightforward approach to determining the probability above involves counting the frequency of every type of triangle within the corpus, where each type of triangle is determined from a unique combination of the 18 features. The curse of dimensionality thwarts us again: there are too many possible types of triangle to expect to get reasonable frequency counts from any reasonably-sized corpus of analyses. Instead, we use random forests (Breiman 2001), a machine learning ensemble method particularly suited for high-dimensional feature spaces, to learn the conditional probability P(M_{i,j} | L_{i,j}, R_{i,j}). Random forests, which operate by constructing a large collection of decision trees, are typically used for classification tasks by outputting the most frequent class (the mode) predicted by the collection of trees. However, a probability distribution can be obtained instead by counting the frequencies of the predicted classes and normalizing them (Provost and Domingos 2003).
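The paper does not specify an implementation, but the frequency-normalization idea can be illustrated with scikit-learn, whose RandomForestClassifier exposes averaged tree votes through predict_proba. The feature encoding below is a placeholder assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 7, size=(200, 4))  # placeholder L/R feature vectors
y = rng.integers(1, 8, size=200)       # placeholder middle-note scale degrees

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Rather than taking only the modal class, read off a normalized
# probability distribution over middle-note values from the trees' votes.
p_middle = forest.predict_proba(X[:1])[0]
```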

Our set of 18 features includes a subset of six features that depend on the middle note M_{i,j}. It is unlikely that a single random forest could sufficiently learn to predict all six features simultaneously. Therefore, we factor our conditional probability into the product of six different probabilities, each assigned to predict one of the six features of the middle note. If we denote these six features of the middle note as M_1 through M_6, this factorization becomes (dropping the i,j subscripts for brevity):

$$
\begin{aligned}
P(M \mid L, R) &= P(M_1, M_2, \ldots, M_6 \mid L, R) \\
&= P(M_1 \mid L, R) \cdot P(M_2, M_3, \ldots, M_6 \mid M_1, L, R) \\
&= P(M_1 \mid L, R) \cdot P(M_2 \mid M_1, L, R) \cdot P(M_3, M_4, \ldots, M_6 \mid M_1, M_2, L, R) \\
&\;\;\vdots \\
&= P(M_1 \mid L, R) \cdot P(M_2 \mid M_1, L, R) \cdots P(M_6 \mid M_1, \ldots, M_5, L, R).
\end{aligned}
$$

In other words, we construct six random forests and use each one to construct a conditional probability distribution over a single feature of the middle note. Multiplying these distributions together gives us a distribution over all six features of the middle note, and therefore an estimate of the probability of seeing any particular triangle in a MOP.
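A sketch of how the six factors might be chained together at prediction time; middle_note_probability and the forests list are hypothetical names, and each classifier is assumed to take the L/R features plus the earlier middle-note features as input:

```python
def middle_note_probability(forests, lr_features, middle_features):
    """Chain-rule estimate: P(M | L, R) = product over k of
    P(M_k | M_1, ..., M_{k-1}, L, R), one factor per forest."""
    prob, context = 1.0, list(lr_features)
    for forest, value in zip(forests, middle_features):
        dist = forest.predict_proba([context])[0]
        prob *= dist[list(forest.classes_).index(value)]
        context.append(value)  # condition the next factor on this feature
    return prob
```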

Now that we have an appropriate estimate of the probability P(M_{i,j} | L_{i,j}, R_{i,j}), we can calculate the probability of an entire MOP analysis according to equation (1). Computationally, it is inefficient to enumerate all possible Schenkerian analyses for a given sequence of notes; the size of this set grows exponentially with the length of the sequence. Instead, by combining the probabilistic interpretation of MOPs with the equivalence between MOPs and binary trees, we can view each prolongation within a MOP as a production in a probabilistic context-free grammar. Under this interpretation, it is straightforward to use standard parsing techniques (Jiménez and Marzal 2000; Jurafsky and Martin 2009) to develop an O(n³) algorithm, known as ParseMop-C, that determines the most probable analysis for a given sequence of notes. Additionally, the grammar formalism allows us to restrict the predicted analyses to those that contain a valid Urlinie through specific sets of production rules.

ParseMop-C accepts as input a sequence of notes to analyze, along with information about the harmonic context of those notes and what category of Urlinie should be located (e.g., 5̂-line or 3̂-line). The algorithm begins by computing the probabilities of analyses of all three-note excerpts of the input music. Because a three-note excerpt forms exactly one triangle in a MOP, these probabilities come directly from the output of the random forests. The most probable analyses for all four-note excerpts are obtained by combining the pre-computed results for three-note excerpts. This continues with n-note excerpts being analyzed by combining probabilities of (n − 1)-note excerpts, until the entire piece is analyzed. The most probable analysis is output as a MOP, though there is an additional algorithm available that can display the analysis in more traditional Schenkerian notation.
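Structurally, this span-based procedure is the classic dynamic program for an optimal polygon triangulation. The sketch below conveys the O(n³) recurrence in log space; it omits the Urlinie constraint and all grammar bookkeeping, and most_probable_mop and log_p are illustrative names rather than the authors' code:

```python
from functools import lru_cache

def most_probable_mop(n, log_p):
    """Best triangulation of notes 0..n-1: for each span (i, j), try every
    middle note k and keep the choice maximizing total log-probability.
    log_p(i, k, j) scores the triangle with L=i, M=k, R=j."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i < 2:          # adjacent notes: no triangle to build
            return 0.0, ()
        options = []
        for k in range(i + 1, j):
            s1, t1 = best(i, k)
            s2, t2 = best(k, j)
            options.append((log_p(i, k, j) + s1 + s2, t1 + t2 + ((i, k, j),)))
        return max(options)
    return best(0, n - 1)
```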

We can compare the predicted output analyses of ParseMop-C against the ground-truth analyses from the Schenker41 corpus to determine the performance of the algorithm, which we measure via edge accuracy: the percentage of edges in an algorithmically produced MOP that correspond to edges in the ground-truth MOP. Additional details on the probabilistic interpretation of MOPs, ParseMop-C, and its evaluation are available in Kirlin and Jensen (2015) and Kirlin (2014b).
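Edge accuracy is simple to compute once both analyses are reduced to edge sets. The sketch below assumes triangles are (L, M, R) index triples and counts every triangle edge; minor conventions may differ from the authors' exact measure:

```python
def edge_accuracy(predicted, ground_truth):
    """Percentage of the predicted MOP's edges found in the ground truth.
    Each triangle (l, m, r) contributes the edges (l, m), (m, r), (l, r)."""
    def edges(triangles):
        return {e for (l, m, r) in triangles
                  for e in ((l, m), (m, r), (l, r))}
    pred = edges(predicted)
    return 100.0 * len(pred & edges(ground_truth)) / len(pred)
```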

ParseMop-C is the first completely data-driven Schenkerian analysis algorithm, in that it learns all the "rules" of Schenkerian analysis, including their forms and where to apply them, entirely from a corpus of data. Very early studies in computational approaches to Schenkerian analysis encountered difficulties due to the issues inherent in handling the seemingly conflicting guidelines of the analysis process (Kassler 1975, 1987; Frankel, Rosenschein, and Smoliar 1978). Lerdahl and Jackendoff's A Generative Theory of Tonal Music (1983) presented two different formal systems for music that, like Schenkerian analysis, can be used to view a piece of music as a hierarchy of musical objects. The authors, however, stated that their formalizations were not designed to replicate the ideas of Schenkerian analysis, but were rather a new investigation in musical cognition. Though Lerdahl and Jackendoff's systems lack the details necessary to develop a "computable procedure for determining musical analyses," a number of further endeavors have attempted to fill in the gaps. The most successful of these is the work of Hamanaka et al., which has resulted in a collection of computational systems that attempt to replicate parts of Lerdahl and Jackendoff's theories. Their early systems (Hamanaka, Hirata, and Tojo 2005, 2006, 2007) required user-supplied parameters to identify the optimal music analysis, while later systems (Hamanaka and Tojo 2009; Miura et al. 2009; Kanamori, Hamanaka, and Hoshino 2014) have focused on using machine learning to automate the search for the appropriate parameters. Numerous other approaches (Mavromatis and Brown 2004; Gilbert and Conklin 2007; Marsden 2005; Marsden and Wiggins 2008; Marsden 2010) have used a context-free grammar formalization similar to the one presented above, but with a hand-created rule set rather than rules learned directly from ground-truth analyses.

3. Musical features

To understand how individual musical features might figure into the accuracy of these algorithmic analyses, consider the task of analyzing the "music" in Figure 3. This is the intervallic pattern of the melody of a simple four-measure phrase of real music. There is no information about the rhythm or harmony of the original music. We do not know what the key is or which note is the tonic. Registral information is also removed by octave-reducing melodic intervals and inverting larger leaps. How likely would it be that one could reproduce a textbook analysis of the melody relying on this intervallic pattern only?

Or what if one were given only the rhythm and meter of a melody, as in Figure 4? How accurately could we expect to predict a Schenkerian analysis of the passage?

In the experiment described below, we train the machine learning algorithm on Schenkerian analyses with differing amounts of information about the music, and thereby discover how important different aspects of the music (melodic, harmonic, metric) are to accurately reproducing Schenkerian analyses.

The data for Figures 3 and 4 come from Schubert's Impromptu Op. 142, No. 2, which is analyzed by Cadwallader and Gagné (1998) as shown in Figure 5. Their analysis has enough detail to be translated into the explicit and near-complete MOP shown in Figure 6(a). The MOP is a simplification, reflecting just the basic melodic hierarchy implied by Cadwallader and Gagné's analysis. There are three places where their analysis is ambiguous. In measures 1–3, they show a double neighbor figure A♭-G-B♭-A♭, where a slur could be added from A♭ to B♭ or from G to A♭, but neither is clearly implied by the analysis. In measures 5–8, they show C-B♭-A♭-C in stemmed notes. A slur from C to A♭ might be inferred but is not actually shown, so it is not included in the coded analysis. Finally, this example reflects a common problem in that Cadwallader and Gagné give the status of 3̂ to two different Cs. A fully explicit analysis would choose between these as the true initiation of the fundamental line.

Figure 3. A melody with general interval information only.

Figure 4. The meter and rhythm of a melody.


Figure 5. Cadwallader and Gagné's (1998) analysis of Schubert's Impromptu Op. 142, No. 2.

Figure 6. (a) An encoded version of the analysis in Figure 5; (b) an analysis of just the intervallic pattern of the impromptu, as shown in Figure 3, by a computer trained on just the intervallic patterns of textbook analyses; (c) a computer analysis of the melodic (interval and scale degree) and harmonic information; and (d) a computer analysis of the melodic, harmonic, and metrical data of the impromptu.

The part of the analysis shown with beams is the "background" of the analysis; the ParseMop-C algorithm is constrained to find such a 1̂-2̂-3̂ initial ascent as the background of the phrase (but not to find these notes in any specific locations).

Figures 6(b)–6(d) show three analyses by the algorithm trained on limited amounts of musical information, which can be compared to the "ground truth" in Figure 6(a). The first, Figure 6(b), shows the solution arrived at by the algorithm trained on just the generic melodic intervals of the corpus of textbook analyses. Since this version of the algorithm has no awareness of keys, accidentals, harmony, register, or rhythm, the music is shown with no key signature, clef, or durational values, and with octave-reduced generic intervals (as in Figure 3). Note that adjacent notes are implicitly connected, so a slur between the first two notes would be redundant. The algorithm accurately identifies some local passing motions in the music given just this basic generic intervallic information. Evaluated by the proportion of shared slurs between the computer analysis and the textbook one, however, the computer does not do especially well on this example, mostly because it buries the C of measure 4 in the middle of a B♭-C-D♭ passing motion, and thus most of its slurs cross over a note that is especially prominent in the textbook analysis. This passing motion, which is perfectly reasonable given only intervallic information, is quite implausible when we see the metric and harmonic status of the three notes involved. On other examples, however, the algorithm performs surprisingly well with such minimal musical information, as we shall see below.

Given a slightly richer musical object, including the scale-degree number of each note (i.e., it has the reference point of a key) and some harmonic context (Roman numeral without inversion), the algorithm produces the analysis of Figure 6(c). The new information allows it to avoid certain blatant errors: for instance, now that it knows that the third note (B♭) occurs over a tonic chord, not a V, it avoids assigning it a major structural role. However, the algorithm makes another decision that turns out to be a mistake, shifting the first note of the structure ahead to note 7. It apparently identifies an arpeggiation of V as a likely introductory approach to the first structural note, but that turns out to be implausible because of the rhythm. When it has additional metrical information (specifically, which notes are accented relative to others), the algorithm corrects this mistake, shifting the structural beginning back to the metrically strong note 2, as shown in Figure 6(d). This result is then quite similar to Cadwallader and Gagné's analysis.

This single example shows how different aspects of the music – melodic pattern, harmony, rhythm – play different roles in determining the plausibility of a Schenkerian analysis. The principal goal of the experiment reported here is to answer the broad question of what features of the music are most essential to accurately reproducing human analyses. To clarify, this experiment is not designed to select a "best" subset of variables for use in future studies; using random forests as our learning algorithm already reduces much of the need to find an ideal subset of variables in order to simplify the model or combat overfitting (Hastie, Tibshirani, and Friedman 2009). The motivations of this study come, instead, primarily from music theory. For instance, theorists often emphasize the linear aspect of Schenkerian analysis, to distinguish it from other analytical methods that focus more heavily on chords and root progressions. Without a data-driven approach, however, it is not really clear how much a Schenkerian analysis relies on melodic aspects of the music as opposed to harmonic progressions. Also, Schenker pedagogy typically de-emphasizes the metrical aspect of music, by illustrating, for instance, how brief and metrically weak notes may sometimes play central structural roles. Yet, because analysts may rely upon metrical information unconsciously, in making less prominent analytical decisions, such arguments may be misleading. Finally, Schenkerian analysis is often tacitly understood to be a recursive procedure, meaning that the same criteria apply to very local analytical decisions as to those at higher levels, between non-adjacent notes and measure-to-measure across a phrase. Although the examples in the Schenker41 corpus are primarily short – usually just a single phrase or pair of phrases – we were able to test a limited version of this hypothesis from the note-to-note to the measure-to-measure level by including a set of temporal features, showing that even this limited version of the recursion hypothesis is incorrect.

The main results (reported in Section 4) are the outcomes of two feature selection processes. The first tests the relative importance of broadly categorized harmonic, melodic, metric, and temporal features. The second is an ordering of the more specific set of 18 features according to how much the performance of the machine learning algorithm depends upon them. This second process generated a substantial amount of data that also permitted us to draw some limited conclusions, via exploratory analysis, about how features interact. This is reported in Section 5.

The features used in this work were originally specified by Kirlin (2014b) because they covered a range of types of musical information (melodic, harmonic, rhythmic, and temporal), were easily determinable by computer from the dataset, had a small number of possible values each, and, most importantly, seemed likely to fulfill a critical role in the Schenkerian analytical process. The features also proved well-suited to testing the kinds of common assumptions about Schenkerian analysis described above.

We define 18 features in all that are available to the algorithm in the full model. The following six of these are features of the middle note.

• SD-M The scale degree of the note (represented as an integer from 1 through 7, qualified as raised or lowered for altered scale degrees).

• RN-M The harmony present in the music at the time of onset of the middle note (represented as a Roman numeral from I through VII or "cadential six-four"). For applied chords (tonicizations), labels correspond to the key of the tonicization.

• HC-M The category of harmony present in the music at the time of the middle note, represented as a selection from the set tonic (any I chord), dominant (any V or VII chord), predominant (II, II6, or IV), applied dominant, or VI chord. (Our dataset did not have any III chords.)

• CT-M Whether the note is a chord tone in the harmony present at the time (represented as a selection from the set "basic chord member" (root, third, or fifth), "seventh of the chord," or "not in the chord").

• Met-LMR∗ The metrical strength of the middle note's position as compared to the metrical strength of note L, and to the metrical strength of note R (represented as a selection from the set "weaker," "same," or "stronger").

• Int-LMR∗ The melodic intervals from L to M and from M to R, generic (scale-step values) and octave generalized (ranging from a unison to a seventh).

Two features of the middle note differ from the others in that they are influenced by the left and right notes. These are therefore distinguished as "LMR" features, as opposed to simple "M" features.

We also used the following 12 features for the left and right notes, L and R.

• SD-LR∗ The scale degree (1–7, qualified as in SD-M) of the notes L and R.
• Int-LR The melodic interval from L to R, with intervening octaves removed.
• IntI-LR The melodic interval from L to R, with intervening octaves removed and intervals larger than a fourth inverted.
• IntD-LR The direction of the melodic interval from L to R; i.e., up or down.
• RN-LR∗ The harmony present in the music at the time of L or R, represented as a Roman numeral I through VII, or "cadential six-four."
• HC-LR∗ The category of harmony present in the music at the time of L or R, represented as a selection from the set tonic, dominant, predominant, applied dominant, or VI chord.
• CT-LR∗ Status of L or R as a chord tone in the harmony present at the time (see the description of CT-M above).
• MetN-LR∗ A number indicating the beat strength of the metrical position of L or R. The downbeat of a measure is 0. For duple or quadruple meters, the halfway point of the measure is 1; for triple meters, beats two and three are 1. This pattern continues with strength levels of 2, 3, and so on.
• MetO-LR∗ The same beat-strength number as MetN-LR. The difference between MetN-LR and MetO-LR is that the former is represented in trees of the random forest as a numeric variable, and the latter as an ordinal variable. The algorithm that creates the forests treats these types of variables differently: numeric variables are compared using greater-than or less-than tests, while ordinal variables are treated as categorical, and are compared using equality tests.

• Lev1-LR Whether L, M, and R are consecutive notes in the music, represented as a true/false value (this can be determined strictly from examining the positions of L and R, and so is not included in the list of middle-note features).

• Lev2-LR Whether L and R are in the same measure in the music, represented as a true/false value.

• Lev3-LR Whether L and R are in consecutive measures in the music, represented as a true/false value.

Note: Features marked with an asterisk (∗) in the lists above are, from a machine learning standpoint, two different features: they are separate and distinct inputs available to the random forests algorithm; for instance, there is an SD-L feature and an SD-R feature that the learning algorithm treats independently. However, in the feature selection experiments that follow, these "paired features" are always treated as an indivisible unit in the set of available features: they are always added or removed together.
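The paper gives no formula for the beat-strength number used by MetN-LR and MetO-LR, but the description above suggests something like the following sketch, in which levels beyond 1 are assigned by repeated halving (an assumption for duple and quadruple meters; the triple-meter case below is deliberately crude):

```python
def beat_strength(position, beats_per_measure):
    """Beat strength as described for MetN-LR/MetO-LR: downbeat = 0,
    mid-measure (duple/quadruple) or beats 2 and 3 (triple) = 1, etc.
    position is the onset within the measure, measured in beats."""
    if position == 0:
        return 0
    if beats_per_measure == 3:
        return 1 if position in (1, 2) else 2  # lump weaker positions together
    level, span = 1, beats_per_measure / 2
    while position % span != 0:   # halve the span until position aligns
        level, span = level + 1, span / 2
    return level
```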

There are many well-established feature selection techniques that could potentially help determine the relative importance of the features in producing output analyses that match the ground-truth analyses. Many are inappropriate in our situation, however, because they are designed specifically for classification tasks, and there is no notion of an output class in our system. A further complication is that any change in the input features directly affects the output of the random forests, which is a probability distribution for which we do not have ground-truth data. Certainly, improving the estimate of the probability distribution will likely improve the quality of the most probable music analysis predicted by ParseMop-C, but it is unclear to what degree this is true, as the two algorithms are computationally separate. Therefore, we turned to two standard subset selection methods, along with examining correlations between features.

In the two experiments that follow, we evaluated the quality of a specific subset of features by running the ParseMop-C algorithm with leave-one-out cross-validation to compute new MOP analyses for all the pieces in the Schenker41 corpus longer than four measures, then calculated the overall edge accuracy for all the MOPs combined. For our first experiment, we divided the features into four broad categories (melodic, harmonic, rhythmic, and temporal) and evaluated the quality of all feasible non-empty subsets of these categories of features. In our second experiment, we used backward selection: we cycled through the features, removing each in turn from the training data, in order to find the feature that, when omitted, decreased the overall accuracy the least. This feature was then permanently removed and another cycle was performed, until only one feature remained. For the entire set of 18 features we then had 18 trials, numbered 0–17, where trial 0 included all 18 features in the full model, and trial 17 (trivially) included only the last remaining feature.
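The backward selection loop itself is straightforward. A sketch, where evaluate is a hypothetical wrapper that retrains the forests on a feature subset, reruns ParseMop-C with leave-one-out cross-validation, and returns overall edge accuracy:

```python
def backward_selection(all_features, evaluate):
    """On each trial, permanently drop the feature whose removal
    decreases overall edge accuracy the least."""
    remaining, removal_order = set(all_features), []
    while len(remaining) > 1:
        drop = max(remaining, key=lambda f: evaluate(remaining - {f}))
        remaining.remove(drop)
        removal_order.append((drop, evaluate(remaining)))
    return removal_order  # features in the order they were removed
```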

4. Results

4.1. Experiment 1

We first divided the features into four categories as follows.

• Melodic: SD-M, SD-LR, Int-LMR, Int-LR, IntI-LR, IntD-LR
• Harmonic: RN-M, RN-LR, HC-M, HC-LR, CT-M, CT-LR
• Metrical: Met-LMR, MetN-LR, MetO-LR
• Temporal: Lev1-LR, Lev2-LR, Lev3-LR


Figure 7. Edge accuracy as determined from different subsets of feature categories. There is no trial for the temporal features by themselves, as there are no middle-note features in that category, and at least one such feature is required. The horizontal line at 28% indicates the baseline edge accuracy that could be obtained by an algorithm that operated randomly.

We ran 14 trials to evaluate the quality of non-zero sized subsets of feature categories, from all four categories (including all 18 features) down to single categories. Each trial produced an overall edge accuracy, giving us information on the utility of the categories of features, as detailed in Figure 7. The random forests use the left–right features as "input" and the middle-note features as "output," and therefore trials are impossible to run if no middle-note features are included. Thus, there is no "temporal-only" trial.

Figure 7 shows a baseline edge accuracy of roughly 28%. This number is calculated by averaging the edge accuracies of all possible MOP analyses of all the musical excerpts in question. In other words, a hypothetical version of ParseMop that analyzed music by selecting a MOP analysis uniformly at random from all possibilities would average 28% edge accuracy.

The data from this experiment allow us to test the three broad music-theoretic questions described in Section 3. First, what is the relative importance of melodic and harmonic information? Second, is metrical information necessary for producing accurate analyses? Third, does temporal information influence analyses – that is, is it necessary to use harmonic, melodic, and metrical information differently depending on whether one is making decisions at note-to-note or measure-to-measure levels? All of these questions essentially presume that each feature set has a relatively consistent influence over performance between trials, regardless of the other feature sets present. Figure 8 therefore shows the change of performance between trials that differed only in the presence or absence of one feature set: melodic, harmonic, metrical, or temporal. These differences appear to reflect relatively normal distributions, so we further assessed the contribution of each feature set by performing paired t-tests comparing data with and without the given feature set, and found that the inclusion of harmonic, melodic, and temporal features made a significant difference at p < .001, and metrical features at p < .01.
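The paired test matches each trial that includes a feature set with the corresponding trial that excludes it; with SciPy this is a one-liner (the accuracy values below are placeholders, not the experiment's data):

```python
from scipy import stats

# Matched trials: accuracy with the harmonic features vs. the same
# feature combinations without them (placeholder values only).
with_harmonic = [72.4, 68.1, 65.9, 70.2, 63.5, 66.8, 61.0]
without_harmonic = [60.3, 55.7, 52.8, 58.9, 49.6, 54.1, 47.2]
t_stat, p_value = stats.ttest_rel(with_harmonic, without_harmonic)
```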


Figure 8. Change of performance for adding each feature set. Each data point is the difference in performance between trials that include the feature set listed on the x-axis and trials with those features removed. Averages and standard deviations are shown for the addition of each group of features. Data points falling outside of a single standard deviation of the average are labeled with the other feature sets present on the trials compared. For instance, the largest difference observed is between the trial with Melodic + Metrical features and the one with Melodic + Metrical + Harmonic features.

From Experiment 1 we can draw the following conclusions.

• Harmonic features are the most essential to producing an accurate analysis. In particular, they outweigh melodic features, although the latter are also very important. This is shown in the large average influence over performance exhibited by these feature groups in Figure 8.

• Temporal features have a reliably positive influence on performance, around 5% on average in Figure 8, meaning that the rules of Schenkerian analysis do differ from level to level. In particular, this influence seems to be present regardless of what other feature sets are present, so we can infer that harmonic, melodic, and metrical information all need to be used differently in some way between note-to-note, within-measure, and/or measure-to-measure levels.

• Metrical features also show a reliable influence over performance in Figure 8, although it is relatively small (around 3–4%).

The extreme values in Figure 8 are also suggestive. All feature sets have well above-average influence when just the harmonic features are present, which suggests that harmonic features are most useful in combination with at least one other feature type. At the same time, three of the unusually low values point to the fact that one trial, Harmonic + Temporal + Metrical, is substantially lower than what would be predicted by a simple additive model of how the features interact. In other words, there appears to be an especially sizeable overlap in analytically usable information for this particular combination of features.

4.2. Experiment 2

The second experiment compared the expendability of the 18 individual features using the backward selection procedure. Figure 9 lists the features dropped on each trial. At certain points in the process there are larger changes in edge accuracy, most notably on trials 14, 16, and 17.


Figure 9. Performance of the algorithm on trials 0–17 and the feature removed prior to each trial. (Trial 0 includes all features.) N.B. There is no trial 18, but Int-LMR is listed as number 18 to show that it is the last feature remaining in the model.

This suggests that between these points are more significant differences in feature importance, and the features can be sorted into groups between these critical points.

As a check on the robustness of the ordering produced by the backward selection, we also averaged, across all trials including the given feature, the decrease in performance observed after removing that feature. These data are shown in Figure 10. The orderings are consistent between the groupings indicated in Figure 9, but not within the larger groups A and B.¹ We also performed paired t-tests on the data comparing trials that differed only on the inclusion of the given feature. The results are shown in Figure 10. Note that n decreases as one goes higher on the list in Figure 9, and is very small for many of the features in group A. Nonetheless, the statistical tests show reliable influence over performance for all of the features in groups B–F, and even for some features in group A.

One striking aspect of the result is that, if we sort the features by type – melodic, harmonic, metrical, and temporal – the five features in groups C–F represent all four types, with a duplication for melodic features (Int-LMR and SD-M). In addition, the five features in group B also represent all types, with an additional harmonic feature (CT-M and RN-M). Since the ordering of groups C–F appears robust, while that within group B is not, we can broadly characterize the results as follows.

• The result of Experiment 2 is consistent with the finding from Experiment 1 that melodic, harmonic, metrical, and temporal information all contribute to the analytical process. It further indicates that the algorithm performs best with a mix of all these feature types.

• It is also consistent with the finding from Experiment 1 that melodic and harmonic features are the most important (groups D–F), and metrical and temporal somewhat less so (group C).

1 These two measures are confounded: for the features that remain in the model for longer, there are more trials to average over, and the drop in performance predictably gets higher as the number of features in the full model gets smaller. The similarity of the two rankings can therefore be partly attributed to this confound, but not entirely so. The average rise of feature importance from trial to trial is just 0.12%, and this is mostly due to the last four trials. On trials 0–13 the average change is 0.02%.


Figure 10. Average feature importance and standard deviation for all features over the course of the experiment. Asterisks show the results of paired t-tests between trials including and excluding the given feature. ∗∗∗ = p < .001, ∗∗ = p < .01, ∗ = p < .05.

• For each of the feature types except melodic, the majority of the necessary information is captured by a single feature. Yet, in all cases, some additional valuable information is provided by one or two other features (group B). In the case of melodic features, these can be subdivided into purely intervallic information (Int-LMR) and tonally anchored information (SD-M).

While the results of the two experiments are largely consistent, they give divergent indications of the relative significance of melodic and harmonic features. Experiment 1 indicated that harmonic features make a larger contribution to the analytical process than melodic ones, while in Experiment 2 the highest-ranking features are melodic ones (Int-LMR and SD-M). Together, these two results indicate that the important melodic information is consolidated in these two specific features, which provide a melodic pattern (Int-LMR) and orient it to a tonal center (SD-M), whereas harmonic information is distributed among several different kinds of features: distinct RN/HC versus CT features, the former dividing up between M and LR types.

Other than SD-M and Int-LMR, the only other notable melodic feature is SD-LR (group B). For the most part, the scale degree of the left and right notes is predictable from the scale degree of the middle note and the intervals to and from the middle note. While SD-LR provides an additional distinction between chromatically altered scale degrees and diatonic ones, the significance of this feature is probably due more to the potential use of melodic information as an LR feature – i.e. on the left-hand side of the conditional probabilities. This is further discussed in Section 5.

Three of the harmony features drop out in group A. The HC-M and RN-M features, and HC-LR and RN-LR, are highly redundant (see Section 4.4 below), so it is unsurprising that one of each pair drops out in group A. The early exclusion of the CT-LR feature is also unsurprising, because non-chord-tones should be rare as LR features in the corpus. The other three harmony features all seem to be important: HC-LR, CT-M, and RN-M. HC-LR is the last LR feature to drop out (group D), and from trial 11 onward the only other LR features are temporal ones. This indicates that harmony provides the most useful context for predicting features of the middle note. CT-M is also a feature that would be expected to be important, especially at the note-to-note level, since non-chord-tones are usually afforded a low structural status. Some additional data analysis reported in Section 5 suggests that on trial 13, after the removal of CT-M, a group of other features (SD-M, HC-LR, Int-LMR, and MetN-LR) combine to predict the chord-tone status of the middle note effectively.

As in Experiment 1, the metrical and temporal features played a lesser but nonetheless significant role. Of the temporal features, both Lev1-LR and Lev2-LR were valuable, showing that it is mainly the note-to-note, within-measure, and between-measure levels that behave differently. The difference between consecutive measures and non-consecutive measures (indexed by Lev3-LR) was less useful. However, this may simply reflect the fact that the musical examples were relatively short, and important decisions at these higher levels were constrained by the given fundamental line.

Between the metrical features, it is unsurprising that Met-LMR is especially useful, since it incorporates relative information about the left and right notes as they relate to the middle note. However, MetN-LR, which provides absolute information about the metrical context, is also of some value.

4.3. Experiment 3

To complement the information about the relative expendability of features provided by the backward selection process in Experiment 2, we also performed a more limited forward selection process to assess the independent usefulness of smaller sets of features. Because M features require LR features, and vice versa, we paired all similar M and LR features, and also included the LMR features by themselves. The results are shown in Figure 11.

These results corroborated the usefulness of harmony, observed in Experiment 1 but less evident in Experiment 2. Melodic features alone give lower accuracy than either harmonic (not including CT) or metrical features. This suggests that the high analytical value of melodic information is dependent upon some baseline harmonic and metrical information (in the form of HC-LR and Met-LMR), while harmony and meter are somewhat more independent. The especially low performance of the CT features can be attributed to the uselessness of CT-LR (given that the great majority of triangles in the corpus will have chord tones on the left and right), also observed in Experiment 2.

Figure 11. The edge accuracy of four models containing only a single LMR feature, or a single M and LR feature.


Figure 12. Cramér's V for five musical features (SD = scale degree, HC = harmony class, RN = Roman numeral, CT = chord tone status, and MS = metric strength). The numerical values are given below the diagonal and also visually illustrated by the relative darkness of the squares in the grid. ∗∗∗ = p < .001.

4.4. Correlations among features

To aid in understanding how features may be interacting in the two main experiments, we also calculated correlations between features that exist in the musical excerpts themselves, without regard to the analyses (Figure 12). Having two features highly correlated with one another included in the model can be expected to reduce the overall influence of both features. Thus, knowing which features are correlated in the music can help to separate the influence of these correlations from redundancies that are specific to the analytical process.

The five features included are all those that can apply to a single note; distinctions having to do with relationships between notes are not included. There is one melodic feature (scale degree), three harmonic features (harmonic class, Roman numeral, and chord tone), and one metrical feature (metric strength). A number of correlations are significant at p < .001 (and no others were significant even at p < .05). Metric strength has no significant correlations except for a small one with harmony class. On the other hand, melodic information is strongly correlated with all of the harmonic features. This is unsurprising for RN and HC, but the high value for CT is more so. This high correlation is due primarily to the strong tendency of the "7th of chord" category to be represented by 4̂. Among the harmonic features, the very high correlation between HC and RN is unsurprising. These features differ mainly in the treatment of applied chords, which are rare, and otherwise in the greater specificity of the RN feature. CT is not significantly correlated with the other two harmonic features.
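Cramér's V for a pair of categorical features can be computed from their contingency table; a sketch (the table below is hypothetical, not corpus data):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2 = chi2_contingency(table)[0]
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

# Hypothetical scale-degree x harmony-class counts
table = np.array([[30, 5, 2], [4, 25, 6], [3, 7, 20]])
print(round(cramers_v(table), 3))
```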

5. Interactions between features

The results in the previous section give a general picture of what kind of musical information is most important in producing a Schenkerian analysis. However, we also found that the importance of a given feature may be dependent upon what other features are present in the model. In some cases, the reasons for this are obvious. For instance, we found in Section 4.4 that scale degree and harmony class are strongly correlated in the music. Therefore, the presence of, e.g., SD-M should reduce the importance of HC-M (and SD-LR should reduce the importance of HC-LR, etc.). Where this reflects features of the music, it has nothing to do with Schenkerian analytical practice per se. However, similar kinds of redundancies might derive from the nature of the analytical process as well. And we also find that certain features increase in importance when another feature is included, which reflects an interdependence of features in the analytical process.

The backward selection process, because it required removing every remaining feature on every trial, produced data that make it possible to address this question of how features interact. The nature of this data analysis is necessarily exploratory. Consider, for instance, Figure 13, which tracks the effect of removing the scale degree feature over the course of Experiment 2. Were the relationship of this feature to the others equivocal, we might expect the values to stay roughly the same or to increase steadily from one trial to the next. The chart for SD-M is much more jagged, suggesting that it has non-trivial relationships with other features. For instance, when SD-LR is removed, SD-M suddenly spikes in importance. This suggests that the preceding low value probably reflected not a lack of significance for scale degree information in general, but that there is considerable overlap between the two scale degree features, making one of them – but not both – expendable. This makes sense because, given that the algorithm has intervallic information about the melody (in the form of Int-LMR), either scale degree feature can anchor that intervallic information to a tonal center.

The potentially most meaningful data in these graphs are the places like this where there are large changes in the importance of some feature from one trial to the next. We can infer that these large changes result from some kind of interaction between the feature in question and the one removed from one trial to the next. These interactions may take the form of redundancies or interdependencies, and they might involve more than two features. For example, if one knows the scale degree of the middle note (SD-M), then one can infer the scale degree of the left and right notes (SD-LR) by knowing the intervals between the middle note and the left and right notes (Int-LMR). Therefore, these three features are redundant as a group, but no pair of them is redundant.

Figure 13. The effect of removing the SD-M feature on trials 0–16. The labels on the horizontal axis show the feature removed prior to the given trial. For instance, the peak at the trial for SD-LR means that the effect of removing SD-M changed to about 8% after the removal of SD-LR.

Page 18: Analysis of analysis: Using machine learning to evaluate the

144 P. B. Kirlin and J. Yust

This redundancy is reflected by spikes in the importance of SD-M in Figure 13. The drops in importance reflect interdependencies between features. The largest drop in Figure 13 occurs when MetN-LR is removed. This indicates that SD-M is more useful when MetN-LR is included in the model. MetN-LR provides absolute metric orientation to complement the relative information of Met-LMR, so we can deduce that the likelihood of a given scale degree in the middle position changes depending on the metric context. We also see fairly large drops in the importance of SD-M for the temporal features Lev1-LR and Lev2-LR, which similarly indicates that the likelihood of certain scale-degree progressions depends on temporal context.

To search the data systematically for the most prominent such interactions between parameters, we tallied every change of feature importance from one trial to the next and ordered the changes from largest to smallest in distance from the mean (where “feature importance” is the reduction in triangle accuracy that results from removing the given feature on the given trial). The mean change is 0.12%, indicating a tendency for feature importance to increase modestly, on average, as the model gets smaller. Table 1 includes all cases where the change lies further than one standard deviation (2.35%) from this mean.
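The tallying step itself is simple to express in code. The sketch below assumes per-trial importances in the form produced by the backward-selection sketch above; the mean and threshold quoted in the text would emerge from the actual experimental data.

```python
# Collect every trial-to-trial change in feature importance and keep
# those lying more than one standard deviation from the mean change.
from statistics import mean, stdev

def outlier_changes(history):
    """history: list of {feature: importance} dicts, one per trial."""
    changes = []  # (feature, trial index, change) triples
    for t, (prev, curr) in enumerate(zip(history, history[1:]), start=1):
        for f in curr:
            if f in prev:
                changes.append((f, t, curr[f] - prev[f]))
    deltas = [d for _, _, d in changes]
    m, s = mean(deltas), stdev(deltas)
    flagged = [c for c in changes if abs(c[2] - m) > s]
    # Order from largest to smallest distance from the mean.
    return sorted(flagged, key=lambda c: abs(c[2] - m), reverse=True)
```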

We will not address every entry in this list. The sections below instead discuss some of the more notable interactions shown.

5.1. Removal of CT-M

The largest two changes, and four of the largest 20, occurred between trials 12 and 13, after the removal of the CT-M feature. All of these are increases in sensitivity, which indicate redundancies. These redundancies apparently involve a large group of parameters. Most non-chord-tones occur when the left and right notes are of the same harmonic class and the middle note has a conflicting scale degree. Therefore, the combination of HC-LR and SD-M can substitute for CT-M in many instances, making for a three-way redundancy. This is indicated in the first two entries of Table 1.
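A toy version of this substitution (our own simplification, reducing harmonies to diatonic triads identified by root scale degree) shows why HC-LR and SD-M together can stand in for CT-M:

```python
# Toy heuristic for the HC-LR + SD-M -> CT-M redundancy: when the
# outer notes share a harmonic class, a middle note whose scale degree
# is not a chord tone of that harmony is probably a non-chord tone.
# (Our own simplification: harmonies are diatonic triads on 1-7.)

def chord_tones(root):
    """Scale degrees of the diatonic triad built on the given root."""
    return {(root - 1 + step) % 7 + 1 for step in (0, 2, 4)}

def likely_non_chord_tone(hc_left, hc_right, sd_middle):
    return hc_left == hc_right and sd_middle not in chord_tones(hc_left)

assert likely_non_chord_tone(1, 1, 4)      # SD 4 over a stable I chord
assert not likely_non_chord_tone(1, 1, 3)  # SD 3 is a chord tone of I
```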

Other kinds of non-chord-tones that occur over changes of harmony, such as accented dissonances and passing tones preceding a chord change, can be predicted with the help of metrical information and the interval patterns associated with these types of dissonance figures. This explains entries 11 and 13 in Table 1, which indicate a larger five-way redundancy between CT-M, SD-M, HC-LR, Int-LMR, and Met-LMR.

These observations help us interpret the ordering of features arrived at in the experiment. The importance of chord-tone status is reflected not only in the value of CT-M in the results of Experiment 2, but also in the large increases in sensitivity of the four features in Table 1. Therefore we may conclude that chord-tone status, as might be expected, plays a large role in the analytical process for these kinds of short excerpts.

5.2. Removal of temporal features

Trial 14, where the Lev1-LR feature is removed, is also prominent in Table 1. All of the large changes on this trial reflect interdependencies: entries 4 (Met-LMR), 6 (HC-LR), 16 (Int-LMR), and 21 (SD-M). These are all the features remaining in the model at trial 14. This result indicates that each of these basic metrical, harmonic, and melodic features operates differently at the local, note-to-note level than at higher levels. In other words, this is strong evidence that the rules of note-to-note analysis differ in systematic ways from higher-level analytical reasoning. For the harmonic and metrical features, this is readily understandable: a change of harmony at the note-to-note level is likely to represent a direct harmonic progression, and direct metrical relations, similarly, have a more specific meaning than indirect ones. However, the inclusion of melodic features also shows that we should expect different melodic shapes at slightly higher levels than at the note-to-note level.


Table 1. Changes of feature importance from one trial to the next greater than one standard deviation from the mean of 0.12% (less than −2.23% or greater than 2.47%).

No.  Feature   Trial  Feature removed  Edge accuracy  Difference from full  Change from previous trial
1    SD-M      13     CT-M             52.61          13.38                  12.36
2    HC-LR     13     CT-M             52.74          13.25                   7.77
3    SD-M      10     SD-LR            60.00           8.15                   7.39
4    Met-LMR   14     Lev1-LR          60.00           0.51                  −5.99
5    CT-M      10     SD-LR            67.52           0.64                  −5.86
6    HC-LR     14     Lev1-LR          52.99           7.52                  −5.73
7    Int-LMR   10     SD-LR            59.36           8.79                   5.73
8    Int-LMR   15     Met-LMR          45.73          14.27                   5.22
9    MetN-LR   10     SD-LR            68.28          −0.13                  −4.71
10   SD-M      11     MetN-LR          64.84           3.44                  −4.71
11   Int-LMR   13     CT-M             53.50          12.48                   4.59
12   Int-LMR   12     Lev2-LR          58.85           7.90                  −4.08
13   Met-LMR   13     CT-M             59.49           6.50                   4.20
14   Int-LMR    8     RN-LR            66.62           3.95                  −3.69
15   Met-LMR   11     MetN-LR          64.20           4.08                   3.82
16   Int-LMR   14     Lev1-LR          51.46           9.04                  −3.44
17   Int-LMR    5     IntD-LR          65.35           7.64                   3.44
18   SD-M      16     HC-LR            46.62           5.35                  −3.06
19   Met-LMR   10     SD-LR            67.90           0.25                  −3.06
20   Int-LMR   11     MetN-LR          56.31          11.97                   3.18
21   SD-M      14     Lev1-LR          50.06          10.45                  −2.93
22   RN-LR      4     HC-M             72.36           0.76                  −2.68
23   SD-LR      9     RN-M             68.15           0.51                  −2.68
24   RN-LR      3     CT-LR            69.43           3.44                   2.68
25   HC-LR     10     SD-LR            63.31           4.84                   2.68
26   RN-M       8     RN-LR            68.66           1.91                  −2.42
27   SD-M      12     Lev2-LR          65.73           1.02                  −2.42


The latter point may also be made about Lev2-LR, the distinction between within-measure and between-measure levels. The melodic features are also dependent on this feature, as indicated in entries 12 (Int-LMR) and 27 (SD-M).

5.3. Importance of Int-LMR

The Int-LMR feature occurs frequently in Table 1. These data are perhaps most readily assessed by considering the overall trajectory of Int-LMR’s feature importance, as shown in Figure 14. There is a general upward trajectory, indicating that, as features are removed, Int-LMR compensates for many of them, especially melodic and metrical features. This trend is interrupted by three prominent dips. The first corresponds to the removal of two harmonic features, suggesting that the likelihood of certain interval patterns depends upon harmonic context (e.g. a change of harmony versus a stable harmony). The other two dips are for the temporal features just discussed.

5.4. Interactions involving SD-LR

As previously noted, the SD-LR feature is redundant with the combination of SD-M and Int-LMR, in the sense that the two middle features may be used to predict SD-LR, the only exception being distinctions between chromatic and diatonic values. It is therefore unsurprising that two of the strongest interactions involve redundancies between these three features (entries 3 and 7 in Table 1).


Figure 14. Changes in Int-LMR’s feature importance through the trials. The labels on the horizontal axis show the feature removed prior to the given trial.

However, SD-LR also shows some prominent interdependencies, indicated by negative shifts in entries 5 (CT-M), 9 (MetN-LR), and 19 (Met-LMR). These reflect the role of SD-LR as the only LR melodic feature remaining in the model beyond trial 6, which allows melodic considerations to appear on the left side of the conditional probability, as predictors. The results show that the use of melodic information to reliably predict features of the middle note depends largely upon complete harmonic (CT-M) and metrical (MetN-LR and Met-LMR) information.

6. Conclusions

• Melodic, harmonic, and metrical features are all significant considerations in Schenkerian analysis.

• As a whole, harmonic information is the most essential for making an accurate analysis.

• However, harmonic information breaks down into two separately important components. First, harmonic context is the most useful conditional feature for predicting characteristic patterns of all features. Second, identification of non-chord-tones is a significant aspect of the analytical process.

• Melodic information – scale-degree progression and intervallic pattern – is also crucial, and actually becomes the most important analytical factor when harmonic features are divided up between harmonic context and chord-tone status.

• Metrical information is the least important overall, but nonetheless does play an essential role.

• The rules of Schenkerian analysis vary from level to level. In particular, there is strong evidence that the note-to-note level follows substantially different rules than larger-scale levels. This is especially true of how melodic patterns work in Schenkerian analysis, which varies not only at the note-to-note level, but also between within-measure and across-measure levels.


Acknowledgments

The authors would like to thank the anonymous reviewers, José M. Iñesta (a Guest Editor of this special issue), and Thomas Fiore (the journal’s co-Editor-in-Chief) for their helpful and insightful comments on earlier versions of this paper.

Disclosure statement

The authors have no conflict of interest.
