music and video iconicity: theory and experimental design

7
Abstract Experimental studies on the relationship between quasi-musical patterns and visual movement have largely focused on either referential, associative aspects or syntactical, accent-oriented alignments. Both of these are very important, however, between the referential and areferential lays a domain where visual pattern perceptually connects to musical pattern; this is iconicity. The temporal syntax of accent structures in iconicity is hypothesized to be important. Beyond that, a multidimensional visual space connects to musical patterning through mapping of visual time/space to musical time/magnitudes. Experimental visual and musical correlates are presented and comparisons to previous research provided. J Physiol Anthropol Appl Human Sci 24(1): 143–149, 2005 http://www.jstage.jst.go.jp/browse/jpa Keywords: multimedia, animation, film, perception, cognition, music Introduction The connection of musical patterning to visual patterning is an essential and important area of study in the cognitive sciences. Multimedia, DVD video, video games, and television all incorporate interactions of musical and video elements. This study is designed to explore this important aspect of musical communication: How do musical and video elements combine in perceptually and culturally acceptable forms? Related Literature The relationship between music and video has been studied in a number of contexts, the most common of which is the assessment of meaning using semantic scales relative to relationships among musical and visual variables. Marshall and Cohen (1988) investigated the relationship between three animated shapes (a large and small triangle and a small circle) and subject ratings on semantic differentials as influenced by two contrasting examples of background music. Independent judgments of the animation and music were made, and then judgments of the combination of music and animation using twelve differentials. Although the activity dimension rating of the two pieces of music was the same, when combined with the animations the activity rating for the geometric ‘characters’ significantly varied. Marshall and Cohen (1988) proposed that there was, therefore, an interaction of visual and musical temporal structures such that visual attention was directed differently depending on the musical example with which it was paired. However it must be noted that technical analysis of the temporal structures of the audio and visual materials, and how specifically these might interact, was not undertaken. Lipscomb and Kendall (1994) conducted an experiment to evaluate the degree to which the composer’s intent in film music composition was received by the viewer/listener. Five excerpts from Star Trek IV (music by Leonard Rosenman) were extracted. The soundtracks from the five excerpts were interchanged, leaving one match and four distracters. The composites were edited to maximize musical/visual accent alignment by graduate student film composers. The twenty-five combinations were rated on ten verbal attribute scales. Evaluative factor ratings were highest for the combinations that were originally intended. This confirmed data from an experiment where subjects were asked to match one of the five music selections with the film excerpts, a task which confirmed the ability of viewers who had not previously seen the film to determine the intended music. Lipscomb and Kendall (1994) hypothesized that viewers examine the accent relationship between visual and musical elements. If the accent structures align, and associative meaning is culturally viable, then perception and judgment result from the audio/visual composite rather than on each modality in isolation or in lieu of one another. Further experiments in accent structure alignment between music and visual elements were carried out by Iwamiya (1994). Iwamiya (1994) offset the audio and visual channels of laserdisc video by 500 msec. He found that subjects rated temporally matched versions higher than the mismatched versions. He also added different background music to the originals. Since the audio/visual accent structures were incoherent, subjects rated the different music composites lower in matching than the original versions. Ratings of the audiovisual composites on 22 scales indicated the differential importance of audio visual elements depending on factor complexity: lower order factors influenced ratings across Music and Video Iconicity: Theory and Experimental Design Roger A. Kendall Music Cognition and Acoustics Laboratory, University of California, Los Angeles, USA Journal of PHYSIOLOGICAL ANTHROPOLOGY and Applied Human Science

Upload: others

Post on 07-Jan-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Music and Video Iconicity: Theory and Experimental Design

Abstract Experimental studies on the relationshipbetween quasi-musical patterns and visual movement havelargely focused on either referential, associative aspects orsyntactical, accent-oriented alignments. Both of these are veryimportant, however, between the referential and areferentiallays a domain where visual pattern perceptually connects tomusical pattern; this is iconicity. The temporal syntax of accentstructures in iconicity is hypothesized to be important. Beyondthat, a multidimensional visual space connects to musicalpatterning through mapping of visual time/space to musicaltime/magnitudes. Experimental visual and musical correlatesare presented and comparisons to previous research provided.J Physiol Anthropol Appl Human Sci 24(1): 143–149, 2005http://www.jstage.jst.go.jp/browse/jpa

Keywords: multimedia, animation, film, perception,cognition, music

Introduction

The connection of musical patterning to visual patterning isan essential and important area of study in the cognitivesciences. Multimedia, DVD video, video games, and televisionall incorporate interactions of musical and video elements.This study is designed to explore this important aspect ofmusical communication: How do musical and video elementscombine in perceptually and culturally acceptable forms?

Related Literature

The relationship between music and video has been studiedin a number of contexts, the most common of which is theassessment of meaning using semantic scales relative torelationships among musical and visual variables. Marshalland Cohen (1988) investigated the relationship between threeanimated shapes (a large and small triangle and a small circle)and subject ratings on semantic differentials as influenced bytwo contrasting examples of background music. Independentjudgments of the animation and music were made, and thenjudgments of the combination of music and animation usingtwelve differentials. Although the activity dimension rating of

the two pieces of music was the same, when combined with theanimations the activity rating for the geometric ‘characters’significantly varied. Marshall and Cohen (1988) proposed thatthere was, therefore, an interaction of visual and musicaltemporal structures such that visual attention was directeddifferently depending on the musical example with which itwas paired. However it must be noted that technical analysis ofthe temporal structures of the audio and visual materials, andhow specifically these might interact, was not undertaken.

Lipscomb and Kendall (1994) conducted an experiment toevaluate the degree to which the composer’s intent in filmmusic composition was received by the viewer/listener. Fiveexcerpts from Star Trek IV (music by Leonard Rosenman)were extracted. The soundtracks from the five excerpts wereinterchanged, leaving one match and four distracters. Thecomposites were edited to maximize musical/visual accentalignment by graduate student film composers. The twenty-fivecombinations were rated on ten verbal attribute scales.Evaluative factor ratings were highest for the combinationsthat were originally intended. This confirmed data from anexperiment where subjects were asked to match one of the fivemusic selections with the film excerpts, a task whichconfirmed the ability of viewers who had not previously seenthe film to determine the intended music. Lipscomb andKendall (1994) hypothesized that viewers examine the accentrelationship between visual and musical elements. If the accentstructures align, and associative meaning is culturally viable,then perception and judgment result from the audio/visualcomposite rather than on each modality in isolation or in lieuof one another.

Further experiments in accent structure alignment betweenmusic and visual elements were carried out by Iwamiya(1994). Iwamiya (1994) offset the audio and visual channels oflaserdisc video by 500 msec. He found that subjects ratedtemporally matched versions higher than the mismatchedversions. He also added different background music to theoriginals. Since the audio/visual accent structures wereincoherent, subjects rated the different music composites lowerin matching than the original versions. Ratings of theaudiovisual composites on 22 scales indicated the differentialimportance of audio visual elements depending on factorcomplexity: lower order factors influenced ratings across

Music and Video Iconicity: Theory and Experimental Design

Roger A. Kendall

Music Cognition and Acoustics Laboratory, University of California, Los Angeles, USA

Journal of

PHYSIOLOGICALANTHROPOLOGYand Applied Human Science

Page 2: Music and Video Iconicity: Theory and Experimental Design

conditions, while higher order factors such as ‘uniqueness’were influential directly when the music and visual materialswere well matched. Lipscomb (in press) conducted anextensive investigation into the perception of visual andmusical accent structures and their alignment.

Psychologists are interested in the magnitude and order ofmusical and visual variables in the ratings scale judgmentsmentioned above. However, associative meanings such as thoseproduced by semantic differential ratings are not the only typesof meanings induced by the film and animation experience.Dowling and Harwood (1986) juxtapose the theories of Meyer(1956) and Peirce (1931–1935) in their discussion of varioustypes of musical meaning. The concept of ‘referentialism’ inMeyer (1956) largely is synonymous with Peirce’s ‘index.’These meanings are the type of associations between musicaland visual elements that are uncovered by semantic differentialratings methodologies such as those mentioned previously. Aninteresting, and central, point in Meyer’s (1956) theory ofmeaning is ‘expressionism.’ This type of meaning is embodied(contained within) the music itself. Kendall (1999, 2000)proposed that, instead of these categories, this implies acontinuum of referentiality. At one end of the continuum arethose meanings that are associative in a completely arbitraryway; the sign and signified are linked only by convention andnot by any aspect of form or structure. At the other end of thecontinuum is the areferential; the meaning embodied withinthe structure and patterning of the music or visual itself. In themiddle between the arbitrary denotation of the referential andthe meaning derived from pure syntactical patterning, is theconnotative meaning of icon. Icons are connotative meaningsin which the patterning or form in one domain or modalitysuggests or implies a connection to another domain. Thecommon example in English is the phrase ‘weeping willow.’This tree is named such because the branches droop,suggesting weeping (the tree obviously does not physicallycry). Iconic meanings are between association/index andsyntax on the continuum of referentiality. The continuum aspresented here is unidimensional. It is quite clear thatdimensional analysis of verbal responses indicates thatassociative meanings alone are themselves multidimensional .See Kendall (in press).

Iconic meanings in visual motion and music have as theirbasis syntax. An example in film and music is the GalleyRowing scene in Ben Hur (1959). The conveyed meaning isnot just in the fact that the music pulse and tempo matches therowing pace of the slaves, but that the melodic pitch ascendswhen the oars are brought up and in, and descends when theoars are brought down and outward. The combination oftemporal accent alignment in the visual and musical structurewith the connotation (suggestiveness) of musical variablemagnitude change through time and visual change in spacemelds into an iconic composite.

Kendall (in press) conducted an experiment to see if subjectscould rate audiovisual examples from commercial movies andbroadcasts on a scale of referentiality. Six music and video

combinations drawn from the commercial media were selectedand hypothesized to span a continuum from areferential(syntactical) through iconic to referential. The six exampleswere Shanghai Lil from Footlight Parade (movie, 1933), Coolfrom West Side Story (movie, 1961), Galley Rowing from BenHur (movie, 1959) Inside the Ship from Close Encounters ofthe Third Kind (1977), Paul Wyle Ice Skating (OlympicBroadcast, 1992), and Fantasia (animated film, 1940/1990).Six graduate students rated the randomly presented excerpts ona scale from areferential (left side) through icon (middle) toindex/referential (right side) Repeated measures ANOVA ofthe resulting data showed a significant main effect (p�.0001)across the six means. Post hoc analysis of paired meandifferences showed that means were statistically different(p�.05) except for the two syntax/icon combinations, ashypothesized. The data thus demonstrates that trainedmusicians can use the continuum in a non-categorical manner.

Relevant to the present discussion are two important recentstudies that approach elements of iconicity in visuals withmusic. Iwamiya and Ozaki (2004) studied the animation ofthree visual stimuli with combinations of seven musical pitchpatterns, three of which involved two-line contrary motion.The visual stimuli periodically expanded and contracted at thecenter of the screen. They note, “In the present experiments,the shape of the image was monotonously enlarging orreducing. This type of visual monotonous transformationmatched unidirectional and continuous descending orascending pitch patterns.” The formal congruency they speakof is very close to the concept of iconicity outlined in Kendall(1999, 2000, in press). However, it is not at all clear whetherthe temporal periodicity of the pitch patterns and of the visualstimuli was always coordinated or whether there were out ofphase conditions.

Lipscomb and Kim (2004) conducted experiments with acarefully nested set of visual and musical variables. Threepitch patterns with an ascending (3 notes) then descending (2notes) pitch contour were created with three interval sizes,from small to large with a C3 tonic. Timbre patterns, attachedto each of the unique pitch chroma, were drawn fromperceptual scalings in Kendall et al. (1999) to produce threemagnitudes of timbre change. Similarly, loudness contours andnote length contours were produced. Visual magnitudecontours of color, size, shape, and location along the y-axis of2-D space were paired with the musical contour patterns.Results indicated that pitch contour was matched best withposition along the y axis, a result that corresponds withIwamiya and Ozaki (2004) and connects to the concept oficonicity outlined below. Loudness was matched with size, andtimbre with shape (although Fig. 1, p. 74 shows location andloudness at relatively high match levels, and shape nearlyequally matched with pitch, loudness, and timbre. No post hocanalysis was provided). These results largely match thosefound in Walker (1987).

Although the data are valuable, several issues may impedeconnection of this research to that outlined below. Firstly,

144 Music and Video Iconicity: Theory and Experimental Design

Page 3: Music and Video Iconicity: Theory and Experimental Design

physical magnitudes of color, for example, were used based onwavelength. Color, like pitch, is multidimensional, and I willsuggest that only perceptual magnitudes, expressed in musicalterms, be used iconic studies. It is likely that the use of spectralwavelength resulted in a brightness contour that decreased aspitch increased, thus negating the iconic connection. Similarly,with timbre, the instruments employed cannot play at thetessitura indicated if indeed the tonic is C3. Thus the Kendallet al. (1999) data may not provide adequate information fortimbre magnitudes, since the pitch chroma employed in thatstudy was Bb4. Finally, the pitch pattern ascends and descendsfrom the tonic, and the corresponding animation on the y-axismoves up and down around the origin. These minor details inan important study can be explored and clarified in futureresearch.

Experimental Design

Musical icons emerge in multimedia from visual magnitudeand music magnitude and direction change through time. In theprojected experiments, the visual field is divided into temporalunits on the x, y, and z space axes. Musical time is expressedby conventional notation that matches the temporal structure ofthe visual graphs.

A few iconic archetypes are illustrated in Figure 1.The top row presents ascending and descending ramp. In

visual space, this could be motion up or down on any two axesof 3-d space. In addition, a ramp can be increasing visualmagnitude such as a zoom with increasing size along the z-axis. Musical ramps are common and exist in ascending anddescending scalar pitch patterns and crescendi anddecrescendi.

The burst, called in movies a stinger, is a sudden, temporallybrief contrast of increasing magnitude in the visual (e.g.

brightness, size, color) and musical (e.g. sforzando, cymbalcrash, consonant-dissonant shift, key change) domains.

The second row presents the arch, which can be diagramedeither in the semicircular form as illustrated above, or as theconnection of two ramps in temporal contiguity.

The third row simply indicates that combinations of iconscan be layered. In music, a perfect example is the start of SuiteNo. 2 in Ravel’s (1912) Daphnis et Chole. The woodwindsperform continuous arch patterns suggesting a brook while thestrings and brass perform a series of loudness and pitch ramps,suggesting a gradual sunrise.

An important syntactical feature in musical/visual iconicityis congruence of contours in time. Changes of pitch directionin time, in the manner described in Dowling and Harwood

Kendall, RA J Physiol Anthropol Appl Human Sci, 24: 143–149, 2005 145

Fig. 2 This is an example of a descending ramp in the visual domain. The hypothesized best-fit musical patterns include the two above.

Fig. 1 Iconic archetypes (Kendall, R. Empirical Approaches to MusicalMeaning. Selected Reports in Ethnomusicology, Vol. 12. Copyright,Regents of the University of California. Used by Permission).

Page 4: Music and Video Iconicity: Theory and Experimental Design

(1986), are hypothesized to connect to changes of objectdirection in time. Below we consider some examples for use ina perceptual experiment.

Stimuli

The present experiment diagrams a number of visualanimations using a circle and, in the future, a texture-mappedexternally-lighted sphere with camera angle perspective (sincecamera perspective can create visual/temporal patterns inpanning and zooming).

Figures 2–9 below are hypothesized to illustrate visualramps and are hypothesized to be best matched with musicalpattern changes that decrease in perceptual value at the same

rate. Below the visual figures are musical patterns which arehypothesized to correlate with the visual animations. In theexperiments, one variable is temporal rate, so the framerepresents 2400 msec (M. M.�100 in common time) or2800 msec (M. M.�70). Not shown here is the non-motionstopping point (at the half note musically) of 1200 or1400 msec.

The first experimental procedure follows the relatedliterature. All combinations of musical patterns andanimations, at two tempos, will be combined. The audio-visualcombinations will be rated by subjects for the degree of fit.

In the second experiment, the animations and music patternsthat are best matches in experiment 1 will be rated forsimilarity in all possible pairs. The similarity matrix will be

146 Music and Video Iconicity: Theory and Experimental Design

Fig. 3 The inverse version of figure 2.

Fig. 4 The visual arch starting at the center and extending only above the center line is hypothesized to match pitch pattern contours that moveaway from and back to the tonic (but not below it). For the contrasting pattern in the experiment, see below.

Page 5: Music and Video Iconicity: Theory and Experimental Design

subjected to multidimensional scaling, thus revealing thecognitive space for iconic combinations. Additional analysisshould reveal the perceptual salience of different variablesresulting in the composites.

References

Cohen A (2001) Music as a source of emotion in film. In JuslinP, Sloboda J ed. Music and Emotion: Theory and Research.Oxford University Press, Oxford, 249–272

Cohen A (in press) How music influences the interpretation of

film and video: Approaches from experimental psychology.In Kendall R, Savage R eds. Selected Reports inEthnomusicology 12

Dowling W, Harwood D (1986) Music Cognition. AcademicPress, Inc., Orlando

Iwamiya S (1994) Interaction between auditory and visualprocessing when listening to music in an audiovisualcontext: 1. Matching 2. Audio quality. Psychomusicology13: 133–154

Iwamyia S, Ozaki H (2004) Formal congruency between imagepatterns and pitch patterns. In Lipscomb S, Ashley R,

Kendall, RA J Physiol Anthropol Appl Human Sci, 24: 143–149, 2005 147

Fig. 6 It is hypothesized that motion on only one axis will be best fit by pitch patterns that are based on a single chroma. Both duple and triplestructures are employed.

Fig. 5 The visual structure is a zoom along the z axis. Hypothesized musical combinations include a crescendo, a crescendo with dissonance toconsonance at the final tone, and a pattern with decreasing inter-onset intervals.

Page 6: Music and Video Iconicity: Theory and Experimental Design

148 Music and Video Iconicity: Theory and Experimental Design

Fig. 7 This arch starts at the center of the visual field and alternates above and below the center point. Similarly the pitch pattern arches aroundthe tonic.

Fig. 8 This is an example of arches and ramp superposition. The animation rotates on the x-y axis and enlarges on the z-axis (not shown).

Fig. 9 These music patterns are included because they do not correspond syntactically and precisely to any of the visual patterns.

Page 7: Music and Video Iconicity: Theory and Experimental Design

Gjerdingen R, Webster P eds. Proc the 8th InternationalConference on Music Perception & Cognition. CausalProductions, Adelaide, Australia, 145–148

Kendall R (1999) A theory of meaning and film music. JointMeeting of the Acoustical Society of America and theEuropean Acoustics Association, Berlin Germany (Invitedpaper)

Kendall R, Carterette E, Hajda J (1999) Perceptual andacoustical features of natural and synthetic orchestralinstrument tones. Music Perception16: 327–364

Kendall R (2000) Film music and systematic musicology. InToronto 2000: Musical Intersections, 222

Kendall R (in press) Empirical approaches to musical meaning.In Kendall R, Savage R eds. Selected Reports inEthnomusicology 12

Langer S (1942/1957) Philosophy in a New Key: A Study inthe Symbolism of Reason, Rite, and Art. 3rd ed., HarvardUniversity Press, Cambridge, Massachusetts

Lipscomb S (in press) The perception of audio-visualcomposites: Accent structure alignment of simple stimuli. In Kendall R, Savage R eds. Selected Reports inEthnomusicology 12

Lipscomb S, Kendall RA (1994). Perceptual judgment of therelationship between musical and visual components in film.Pscyhomusicology 13: 60–98

Lipscomb S, Kim E (2004) Perceived match between visualparameters and auditory correlates: An experimentalmultimedia investigation. In Lipscomb S, Ashley R,

Gjerdingen R, Webster P eds. Proc the 8th InternationalConference on Music perception & Cognition. CausalProductions, Adelaide, Australia, 72–75

Marshall S, Cohen A (1988) Effects of musical soundtracks onattitudes toward animated geometric figures. MusicPerception 6: 95–112

Meyer L (1956) Emotion and Meaning in Music. University ofChicago Press, Chicago

Peirce C (1958) Collected Papers (1931–1935) Vols. 1–6.Hartshorne C, Weiss P eds., Harvard University Press,Cambridge, Massachusetts

Sonesson G (2000) Iconicity in the ecology of semiosis. InJohansson T, Skov M, Brogarrd B eds. Iconicity: AFundamental Problem in Semiotics. NSU Press, Aarhus

Walker R (1987) The effects of culture, environment, age, andmusical training on choices of visual metaphors for sound.Percept Psychophys 42: 491–502

Zwan R, Yaxley R (in press) Spatial iconicity affects semanticrelatedness judgments. Psychonomic Bulletin and Review

Received: October 30, 2004Accepted: November 8, 2004Correspondence to: Roger A. Kendall, Department of Ethno-musicology, Program in Systematic Musicology, University ofCalifornia at Los Angeles, (UCLA), Box 951657, Los Ange-les, CA 90095–1657, USAe-mail: [email protected]

Kendall, RA J Physiol Anthropol Appl Human Sci, 24: 143–149, 2005 149