document resume - ericdocument resume ed 286 476 ir 012 809 author house, william j.; davis, cheryl...

DOCUMENT RESUME

ED 286 476 IR 012 809

AUTHOR House, William J.; Davis, CherylTITLE The Recognition of Melodies by Humans and by

Machine.PUB DATE 24 Aug 86NOTE 25p.; Paper presented at the Annual Convention of the

American Psychological Association (94th, Washington,DC, August 22-26, 1986/.

PUE TYPE Information Analyses (070) -- Reports -Research /Technical (143) -- Speeches/ConferencePapers (150)

EDRS PRICE MF01/PC01 Plus Postage.DESCRIPTORS *Computer Simulation; Higher Education; Models;

Multiple Regression Analysis; *Music; PredictiveValidity; *Recognition (Psychology); UndergraduateStudents

IDENTIFIERS *Melody Recognition

ABSTRACTThe behavior of two real-time computer simulation

models of melody recognition was compared with the performance ofhuman subjects in this study. One of the models, INT1, recognizedmelodies by comparing specific intervals with stored intervals. Theother model, CONT1, performed by comparing the contour of thestimulus melody with an array of melody intervals. The 20intermediate and advanced psychology majors who volunteered toparticipate in the study were tested on the speed and accuracy oftheir recognition of 20 familiar melodies. In regression analyses,these data were regressed against the two computer models, and theresults indicated that the contour analysis simulation (CONT1) was amore adequate predictor of human recognition than was the intervalanalyzer program. Four references are listed, and several graphsdepicting the results are appended. (MES)

************************************************************************ Reproductions supplied by EDRS are the best that can be made *

* from the original document. *

***********************************************************************

...

U.S DEPARTMENT OF EDUCATIONOffice of Educational Reseuch and Improvement

EDUCATIONAL RESOURCES INFORMATIONCENTER (ERIC)

7(This document has been reproduced asreceived from the person or organizationorlyinatingMinor changes have been made to improvereproduction quality

%10 Points of view or opinions stated in this document do not necessarily represent officialOERI position or policy

The Recognition of Melodies by Humans and By Machine

Research Reported at the 94th Annual Convention of the

American Psychological Association

August 24, 1986

William J. House and Cheryl Davis

University of South Carolina Aiken

ABSTRACT

THIS EXPERIMENT WAS A TEST OF TWO COMPUTER MODELS OF MELODY

RECOGNITION. THE BEHAVIOR OF TWO REAL-TIME SIMULATION PROGRAMS

WAS COMPARED WITH THE PERFORMANCE OF HUMAN SUBJECTS IN A SPEED-

ACCURACY FORMAT. ONE OF THE MODELS, INT1, RECOGNIZED MELODIES BY

COMPARING SPECIFIC INTERVALS OF MELODIES WITH STORED INTERVALS;

THE DISTINCTIVE FEATURE WAS SPECIFIC INTERVALS. THE OTHER MODEL

PERFORMED B".: COMPARING THE CONTOUR (SEQUENCE OF TONAL UPS AND

DOWNS) OF THE STIMULUS MELODY WITH AN ARRAY OF MELODY INTERVALS.

IN REGRESSION ANALYSES THE HUMAN DATA WERE REGRESSED AGAINST

THE TWO COMPUTER MODELS. THESE INDICATED THAT THE CONTOUR

ANALYSIS SIMULATION wAS A MORE ti )EQUATE PREDICTOR OF THE HUMAN

RECOGNITION THAN WAS THE INTERVAL ANALYZER PROGRAM.

BEST COPY AVAILABLE2 1

"PERMISSION TO RE'RODUCE THISMATERIAL HAS PFEW GFiANTED BY

William J. house

TO THE EDUCATIONAL RESOURCE&INFORMATION CENTER (ERIC)."

The Recognition of Melodies by Humans and By Machine

William J. House and Cheryl Davis

University of South Carolina Aiken

The fundamental question examined in this study was the

nature of the matching process which must occur when a listener

recognizes a melody which is being played. A relevant popular

general model of pattern recognition is the distinctive-feature

sort (see Estes, 1978). In these models, salient details of the

stimulus are matched with a stored list of such details. If

enough of these features are matched, recognition occurs. The

object of the present study is to examine two possible

distinctive-features of simple melodies and to ascertain which of

these melodic features evokes more human-like pattern recognition

in a computer model.

The critical ...ture of a melody cannot be the individual

tones of the melody because human beings are capable of

recognizing melodies no matter in what key the melody is played;

the specific notes themselves are unimportant rather, the

distance between the notes (e.g. Plomp, Wagenaar, and Mimpen,

1973) or the relative simplicity of the ratio between the notes

(e.g., House and Harm, 1979) is the crucial factor. Thus, the

musical interval (e.g., octaves, perfect fifths, minor thirds,

etc.), created by the relationship between the tones, is the

factor which is invariant in the performance of a melody. If the

musical interval is the critical feature by which recognition of

melodies occurs, then a distinctive-feature system must match

the intervals of the stimulus melody with some cognitive

representation of the musical interval.

2

3

The first model in the present work is an implementation of

a distinctive-feature matching system in which the musical

interval itself is the critical melodic component by which

recognition occurs. This proposition is embodied here in INT1, a

computational interval-matching model of melody recognition.

Another possible salient psychological feature of melodies

is melodic contour or the pattern of tonal ups and downs.

Dowling (e.g., Dowling and Fujitani, 1971) has demonstrated that

melodic contour may be the crucial means by which recognition of

melodies happens. CONT1 is the present computational model of

melody recognition which depends upon the matching of the

abstraction of the musical intervals their contour.

The Stimulus Representation

The "melodies" for both models were represented by a string

of letters and other characters which were input from the

keyboard of the computer as one might play a piano. Processing

began with the introduction of the first "tone" letter.

Subsequent letters were introduced in real time and corresponded

to the note name of a given melody in conventional music; that

is, letter names represented tones, a minus was a flat and a plus

was a sharp. As apparently occurs in humans, these programs

were designed such that the input melody sequence could be in any

key but for purposes of the experiment only the keys of C major,

Eb major, and F major were used. In order to simplify a very

complex behavior, all of the melodies in this study were

constructed with all of the tones having the same time value;

that is, all melodies consisted of seven equal eighth-notes.

Thus, the melodies were rhythmically identical.

3

4

;.,

Permanent Memory

Twenty melodies were selected on the basis of the variety of

their structure, my judgment tnat they could be recognized in a

steady eighth-note rhythm by human subjects, and my opinion

that they were reasonably well known (see table 1). These

melodies were stored as strings of musical intervals subscripted

in a twenty by six (1:.elody by interval) array. For example, the

first six intervals of Yankee-Doodle were stored as

uni,ama2,ama2,dma3,ama3,dma2;

thus, a unison interval is followed by two ascending major

seconds, a descending major thiLd, an ascending major third, and

a descending major second. The titles were stored in another

array in such a manner that the coordinates for this title array

corresponded with those of the melody array.

Immediate Memory and Partial Matching

Each -onal input resulted in the program fetching he tore's

frequency (based on the equal tempered scale with A4=-440 hz) and

holding that value long enough to compute the ratio between it

and the next tonal frequency:

R(j)=(FR[i])/(FR[i-1])

where j is one of the six possible ratios among the seven (i)

tones.

A contingency table in which ranges of ratios were

associated Pith musical intervals was next addressed. In the

case of INT1 the exact musical interval was used as the basis of

partial-matching. CONT1 assessed only the direction of movement

(ascending versus descending intervals) for matching.

4

5

At the beginning of a test, all of the melodies had an equal

probability of being the correct melody. Each melody was, thus,

from the outset assigned the probability subscript of

P(m)=.05

whore m is the number of a particular melody in memory. When a

match occurred P(m) was increased additively

P(m)=[P(m)+.14].

When an interval (in the case of INT1) or the direction of

movement (in CONT1) was )btained, it was compared with the

appropriate interval in all of the melodies in the array of known

melodies. If, for example, the first interval was a unison

(wherein the first two toaes are the same) then the program

assigned all known melodies that also had an initial unison with

an incrementing probability subscript. As the matching

progressed, the melodies in memory began to acquire differing

probability subscripts and groupings of structurally similar

melodies evolved. When one of the melodies' probability

subscript reached an arbitrarily set criterion, the program

"guessed" that melody.

Feedback

Aiter the program concluded that the partial input sequence

was one or the other melody, the operator communicated with the

program whether or not the response was correct. If the response

was correct, the probability criterion for that melody in

subsequent tests was decreased; otherwise the criterion was

increased. Consequently, over trials, successes tended to cause

the programs to "prefer" some melodies over others. Not

surprisingly, I have observed humans employing this "anchoring"

5

or favoring certain melodies in other melody guessing tasks.

Simulation Procedure

Each program was presented with five of the stored melodies

(Yankee Doodle, Twinkle-Twinkle Little Star, My Bonnie Lies Over

the Ocean, Danny Boy, and the principal theme from the Haydn

Surprise Symphony) to a random order and in the several keys

mentioned above over five trials. These particular melodies were

chosen because they represented structurally different types of

melodies. For example, Yankee Doodle and Twinkle are highly

redundant in comparison to Danny Boy which has a relatively more

entropic structure. The number of tones needed for the computer

to make a guess was recorded along with both the number of

correct responses and the melody name of incorrect responses.

The results are presented below along with the results of the

analogous human experiment.

The Experiment With Humans

Subjects

Twenty intermediate and advanced psychology majors

volunteered to participate in this experiment. There were ten

males and ten females and none of these people had any

significant musical training or experience.

Stimuli

A tape was recorded with twenty melodies (table 1) all made

up of seven sine-wave generated tones; each melody was played

twice. This recording would be played for the subject prior to

testing, demonstrating to her the pool of melodies from which she

was to choose the correct ones during testing. Next, the same

6

five melodies used in the simulation experiment above were

randomly presented over five trials randomly in the keys of C,

Eb, or F major. The sine-wave tones were generated on an

Electrocomp 400 synthesizer with a Electrocomp 401 sequencer

controlling the duration of the tone and the intertone interval.

The tempo was set at a comfortable listening speed which was

approximately two tones per second. They were recorded on an

amplified Teac 2300S deck and were played back on Realistic brand

stereo speakers.

Procedure

The subject was brought into a sound-tight laboratory room

where she was placed in a comfortable chair facing the stereo

speakers. The experimenter gave the subject a list of the twenty

melodies typed on a sheet of paper and then proceeded to play th

recorded seven tone melodies two times each pronouncing the na

of the melody after it was played. The subject was allowed

repeat the presentation of :he melodies if she wished.

The subject was told that she would be asked to liste

several of these melodies again and was to name the tune qu

in as few tones as possible. Next, the subject was given

"dry-run" melodies to see.if she understood the inszruc

Then, the experimental trials began. The experimente

further instruction if either the responses were very sl

too many errors were being made. The response as wel

tone on which the response was made were the dependent

which were used for comparison with the computer behavi

7

8

e

me

to

n to

ickly

five

tions.

r gave

ow or if

1 as the

variables

or.

Results

Accuracy

Figure 1 is a graphic comparison of INT1's, CONT1's, and the

human subjects' overall percentage of correct responses over five

trials for each of the test melodies. Regression analyses were

computed in which the human data were separately regressed

against each of the computational models to assess each model's

relative ability to predict the human subjects' behavior. CONT1

accounted for 40% of the human accuracy performance (R 2 =.40).

The accuracy of INTl had practically no correspondence to the

human behavior (R 2 =.008). Although not totally accurate in its

recognition of the melodies, INT1 was much more accurate in

"guessing" melodies than either CONT1 or the human subjects.

INTl was too good a recognizer to be an acceptable model of human

recognition in terms of overall accuracy.

(A multiple regression was also computed using both of the

models together as predictors. The human variance accounted for

in this analysis was somewhat of an improvement over CONT1 alone

(R2 =.43). This opens for speculation the possibility of a

future model in which the two approaches are somehow combined in

preprocessing. This will be discussed more below.)

Speed

Figure 2 contains a comparison among the humans and the two

computational models in their relative speed of recognition. The

dependent variable in this case is the mean tone of the melody on

which a correct recognition occurred over the five test trials on

each of the five melodies. The humans were by far the slowest

8

9

of the recognizers. This is because no particular care was taken

in writing these programs to include an output component which

would model human responding. When the recognition occurred in

each of the programs, a print function simply produced the

"guess" on the CRT. The response system in humans is clearly

more complex and time consuming than that of the computer. Other

experiments that we are doing now require subjects to press a

button at the point of recognition and then make their oral

response. In this way the computer's speed advantage is

lessened. However, none of these matters influence the

particular type of analysis being presented here.

Regression analyses of the mean-note-of-recognition were

computed, as above, regressing the human data against each of the

models. In contrast to the accuracy results described earlier,

INT1 was a somewhat better model for the human recognition

(R2=.12) than CONTI which accounted for virtually none of the

variance (R2 =.03) in mean-note-of-recognition. (As in the

accuracy analyses, a multiple regression revealed the potential

of an interaction between the models in modeling human

recognition (R 2 =.20).)

Learning

Figure 3 compares the models and the subjects on the mean

percentage of correct recognitions on all melodies across the

five trials. The human subjects demonstrated significant

improvement over trials (F[1,4]=44.31, p=.006). Neither of the

models' slopes demonstrated that learning had taken place

(F's<1).

9

10

CIA

When the human accuracy data across trials was regressed

against each of the models, INT1 accounted for 52% of the

variance (R2=.52) but the coefficient of correlation was

-egative! CONTI only explained 2% of the experimental variance.

Insofar as increased speed of recognition over trials is a

measure of learning (see figure 4), the humans produced evidence

of improvement over trials (slope= -.29 F[1 4]=36.8, p<.01).

Some trivial learning was observed in INT1 (F[1,4]=7.84, p=.07)

but little confidence can be put in the negative slope created in

the recognition by CONT) (slope= -.08, F[1,4]=4.0, p=.14).

Regression analyses, however, demonstrated that both

programs produced significantly accurate predictions of the human

data with CONTI (R 2 =.72) doing much better modeling. INT1

accounted for 57% of the experimental variance (R2=.57).

Confusion

Figure 5 displays the relative confusion found in the human

recognizers as compared with the two models. For all of the

times that the melody-name was guessed, the percentage of the

time that the name was correct is presented. Thus 100% meant

that the melody-name was never guessed correctly (total

confusion). Neither of the computational models accurately

predicted the human response; the human subjects were by far the

least confused of the recognizers. CONTI and INT1 accounted for

1 and 3 percent of the human variance respectively and both

together only accounted for 3 percent.

10

11

Summary

Accuracy

CONT1 was the better model of human accuracy in recognition

of these melodic patterns. INT1 was more accurate than either

CONT1 or the human recognizers. This is not surprising when one

considers that INT1 always has accurate although partial

information from which to make decisions. Its only failures were

the result of the effects of feedback and the arbitrary guessing

criterion. CONT1, on the other hand, had to deal with the

abstraction of melodic intervals, the sequence of ups and downs,

which involved far less accurate information content than the

exact musical intervals. Consequently, CONT1 was inaccurate as

were the human subjects. The inaccuracy of this model was not

haphazard though; so much like the human behavior was CONT1's

performance that this model predicted the human recognition very

well (R 2 =.40).

Speed

The modeling of the speed of recognition in humans by CONT1

and INT1 was unimpressive. I feel that much work needs to be

done in simulating the reaction cf humans in such decision making

as was being considered in this study. Clearly, the humans were

going through more processing than the computational models were.

This modeling could be accomplished by separating the recognition

stage from the reaction stage in an empirically justifiable

manner. In the present pilot work there was no distinction

between recognition and reaction in the two models. They were,

thus, given an advantage over the human recognizers in the speed

of recognition.

11

12

Learning

CONT1 modeled the human course of learning better than INTL

(R2=.72). Learning was slight in humans but, indeed, appeared

to be happening. INTL was also a reasonable predictor of the

human behavior (R 2 =.57) but this should not be surprising

because both models had identical feedback mechanisms. Thus, the

superiority of CONT1 is even more important; the difference in

efficacy between the two models must be in the pattern

recognition component rather than in the feedback.

Confusion

In a general sense, both of the present theoretical modals

of melodic pattern recogni ion were more accurate and faster than

the human subjects. However, as difficult as the experimental

task was for the humans, the humans were remarkably less confused

than the models.

Conclusion

The stroLger model of human pattern recognition in the

present pilot work appears to be CONT1. Even with major failures

(such as the total failure of CONT1 to contend with the Surprise

Symphony Theme), when one of the models succeeds it is usuall

CONT1. However, when it fails, it is sometimes augmented in its

ability by INT1; often together they account for more human

performance than they do separately. The implication here is

clear: the rapid recognition of melodic sequences prob.lbly

involves multiple stages of analysis triggered by some stimulus

characteristic perhaps in a preprocessing stage. In the case of

CONT1 and INT1, one can conceive of a model combining the two

12

13

recognition approaches such that a preliminary decision is made

regarding the type of melody being presented (e.g. a folk melody

versus a children's tune). This decision could be accomplished

using a heuristic estimate of the entropy of the melody. The

results of this branch of the program would lead to either a

CONT1 or a INT1 type analysis.

References

Estes, W.K. Perceptual processing in letter recognition andreading. In E.C. Carterette & M.P. Friedman (Eds.),Handbook of Perception, Vol IX. New York: Academic Press,1978.

Dowling, W. J. and Fujitani, D. Contour, interval, and pitchrecognition in memory for melodies, Journal of theAcoustical Society of America, 1971, 49, 524-531.

House, W.J. add Harm, O.J. Sensitivity to the elements of puretone auditory dyads. Journal of Auditory Research, 1979,19, 111-116.

Plomp, R., Wagenaar, W.A., and Mimpen, A.N. Musical intervalrecognition with simultaneous tones. Acustica, 1973, 29,101-1n.

Table 1

Melodies in Permanent Memory

Yankee DoodleTwinkle Twinkle Little StarOld McDonaldHaydn Surprise Symphony ThemeAuld Lang SyneSwanee RiverMy Bonnie Lies Over the OceanI've Been Working on the RailroadCamptown RacesComin' Through the RyeFlow Gently Sweet AftonDrink to Me On3y With Thine EyesDanny BoyDvorak New World Symphony ThemeOh SusannaAloha OeThree Blind MiceRed River ValleyDixieFrere Jacque

1

15

LT

,t-41611

L5222 '550'

952 0 FF

a3 0222 530X 0202 220, 2202 0202 0202 020: 0202 0202 0202 230e 0000 2200 2202 203 $202 .3220 7202 0: ,

2202 .21

2:S3 0002 02221

220224 2223 7002 2022 0.02

.0202 2

1 0200 220

0202

0252 2.'02

.., Ring :10

0-S 030 .

022 ''Sc .20 0303 0222

0302 5551 0002 2.02 0002 022 0020 022= 02011 0201

02 fl IC 0,22

0:21 r7,' in I ea.A

0211J .22,4

2222 ,[02 OrS2 032 2213 000. 22,

0210,

220e 0202

02S2 sl.so 0222 0002 020, 220, 0202 0000 0223 0000 0202 0202 $202 2002 0202 .222 0:03' ..0.

:IT",[ 11111

.0,4

.s:sao

:221

$2,2 2.2 S555

0.041 a3

.22.1 12

$222 t%.2it malkaa

020 2.

0:22 0211.

020 020

0230 020

'COO 220

020. 0 052 02041 0303

ors or..s 0302 000

3202 000

OCSZ 00

0202 720

00,2 001

0:02 PI"

023, 251

0202 021

02'. 051

020111 0,1

0202 Oaf

020,

1,011 020

0300 020

'no o..sa of u 323

0210 :

"On 0200

bini j

krclkind

33)11.11dA

202 00 0,02 002 000, 020t 02,2

/CVO 2002

04 02 227171 0072 0220 011100 71:02 0080 1200 S.X. 0=,,

0202 020 0202

O 205 5255 550. 01,72 0202 0201

O 211.,

0,02 7202 0202 00 0202 002

O 20,

2210 'cam -n I

;'-1=1?; w' 'xaS e

; I I 'vs! 121.7322202 222,-22102 Xs

.72e

-akf

i T r.3 I jriCrs 1 'es " mi ',Jul Ti 'au

09:

RE

g

ri

w

la]

aim I rocag

*ma.

oe=

r:

1=1:4 1::

; t2 l= -1.. S

tfiValitffitMEireEz2r

11 SICtf,*OSII ai

as te5ISISEastalf

SC:0IltaSIs''S tall

SSWISISr.ttealtat"at

II slit'C C'I titCOZO'tit2tataSst"GCS e

HUMAN

E cogi

lta"USat"it"etSS;"Ca"ZSatttt:t"Stat:aat":Cat"Ettt:G Y

la t atata

tta%atag

S c GEast

tat IIatatttaiat",

1-1,0iiV''

g==..!E7CZ: 0S tr I' 1. 1

. "... .... "WO =;e:

Lai tga L2 L -...c

LaMar.

AVM ggaa

4

5

44

I

"StatCt"at:tat

at".Saf"SSCS"tgai

1

Ii s.

19

7-17

3

r--

PER

rEN

T C

OR

RE

CT

t-3

CT

"iiC

O'1

-4.R

t T-t

fk-s

tiN

r't87

t1'

71O

rt(C

7t17

:7",

IRPM

CE

RFE

ITZ

AW

NW

EM

IER

MY

SOU

VA

RN

ION

TFR

RW

RIR

ER

RO

PRW

EA

TM

IPPI

OR

MZ

IYM

IAN

IAM

IVR

TIR

FAT

AR

tAV

AR

RIN

RI7

7,3.

flh

aerr

rril

a

IL"O

hrIV

rfi

nlig

e.,,W

attf

iNIC

TIT

EM

NIZ

ZR

VA

RT

C4.

/121

SIV

,IR

MA

RM

IPC

.B.V

IMA

IRM

AT

RIM

ME

OT

MT

NIM

MIW

AT

OR

MA

RM

IrM

ain4

Riif

feE

rilla

ltI17

,

2 4

cm. csea. I r-ve,,,, Ik 'a.' =0,

^~ !~~° ,3 MC 1u 4 0tad k.

thk,

LIS SIC .

22 20.:2222

23222220

2022

22122120 SC.

32C220 WC20Cs22".002C2022222:22 20522222"222to"

202200"202 02202

202C2220222:022S

CSC2.. 22

20222022022

2222202S2222

222112222

SSC/2022

2022022

12221222

CSC 0112"

222250 5

2022022S

222S2220

202220 2C

2:CS

"22"0200222020

2222L222

22222020

02200 22

22222220

12222220

C 2222022

zszs2022

22222220"222222

2CO212022

200:"22112322220

222211022

00222220

22222020

CO22202C

200222

2020-00"2 022

. I

La' i a

r2220

222:2222

1122.020C

22222222

2. C.22022220

22222222"LS

B S"2222

22222:222222.22s20 22

222S2S22

S2220 22

022022

.022222220"2222222

202Cssar.2230

.2222220

22222C22.2:220 20

C2302022

CO202022

S2222222

22002020

S 22.122"IC22"22.2222222

2022

22222222"22.020

%ss206;2222

"'WEEI 11 IN

HUMAN

,r-rr

"CO

"Li p

rAcow. az,

I

25

document resume - ericdocument resume ed 286 476 ir 012 809 author house, william j.; davis, cheryl...

Documents