Introduction to Music Informatics: I548/N560, Spring 2011
Instructor: Eric Nichols
http://tinyurl.com/Info548
Overview
  Tues, Feb 15
  HW – questions?
  HW: contest and output format
  Dynamic Time Warping for Audio-to-MIDI alignment
  Symbolic Representations
Reading: Dannenberg
  Polyphonic Audio Matching and Alignment
  Ning Hu, Roger B. Dannenberg and George Tzanetakis
  Goal: align polyphonic audio to a symbolic score
  Does not perform transcription
  Used to search MIDI databases for a match to a given audio recording
Motivation
  Query by Humming is an important problem, and it uses a symbolic database.
  Why is symbolic better than audio matching for this problem?
  Possible solution: do polyphonic transcription on the query, then find the best match.
  However, transcription is hard.
Idea
  Instead of transcribing the query, convert the symbolic database into audio!
  Instead of using an entire spectrum, convert to a chroma vector.
  Do dynamic time warping (DTW) on the chroma vectors of the audio to look for matches.
Chroma Vector
  For each bin in the FFT:
    Assign the bin to the nearest half-step
    Remove octave information
  For each pitch class (1-12), average the values of its associated bins.
  For this paper: 0.25 seconds of audio per chroma vector, with nonoverlapping windows.
Computing frequency from MIDI note number and vice versa
  freq = 440 * 2^((MIDI-69) / 12.0)
  MIDI = 69 + 12*log(freq/440.0) / log(2)
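The two formulas and the binning steps above can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function names, the `magnitudes` input (one FFT magnitude per bin, bin 0 first), and the choice to skip the DC bin are assumptions. Pitch classes are indexed 0-11 here (0 = C), rather than 1-12 as on the slide.

```python
import math

A4_FREQ = 440.0  # reference frequency used in the slide formulas
A4_MIDI = 69     # MIDI note number of A4


def midi_to_freq(midi):
    """freq = 440 * 2^((MIDI - 69) / 12)"""
    return A4_FREQ * 2.0 ** ((midi - A4_MIDI) / 12.0)


def freq_to_midi(freq):
    """MIDI = 69 + 12 * log(freq / 440) / log(2); may be fractional."""
    return A4_MIDI + 12.0 * math.log(freq / A4_FREQ) / math.log(2.0)


def chroma_from_magnitudes(magnitudes, sample_rate, fft_size):
    """Average FFT bin magnitudes per pitch class (hypothetical helper).

    magnitudes[k] is the magnitude of FFT bin k, whose center frequency
    is k * sample_rate / fft_size.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # assumption: the DC bin has no pitch, so skip it
        freq = k * sample_rate / fft_size
        nearest_half_step = int(round(freq_to_midi(freq)))
        pitch_class = nearest_half_step % 12  # removes octave information
        sums[pitch_class] += mag
        counts[pitch_class] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For example, `midi_to_freq(69)` gives 440.0, and a spectrum whose only energy sits in the 440 Hz bin produces a chroma vector whose only nonzero entry is pitch class 9 (A).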
Chroma Vectors
Why chroma?
  Not super-sensitive to spectral distribution: ignores many details of timbre by collapsing everything into one octave
  Is mostly sensitive to fundamental pitches and chords
Converting MIDI to chroma
  Two possibilities:
    Render the MIDI with a synthesizer, then compute the FFT and then the chroma vector.
    Go directly from MIDI to chroma with a theoretical model (in this paper, it is assumed that no overtones are present in the chroma for each given MIDI pitch).
  One difficulty: dealing with percussive sounds
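The second possibility, going directly from MIDI to chroma with no overtones, might look like the sketch below. The note format `(midi_pitch, onset_sec, offset_sec)`, the function name, and the weighting of each note by the fraction of the window it occupies are all assumptions made for illustration; the slides do not say how the paper weights notes. The sketch also assumes percussion notes have already been filtered out of `notes`.

```python
def chroma_from_midi_notes(notes, window_start, window_length=0.25):
    """Build one chroma vector straight from symbolic notes.

    notes: list of (midi_pitch, onset_sec, offset_sec) tuples
           (hypothetical format; percussion assumed already removed).
    Each sounding note adds energy only to its own pitch class,
    i.e. no overtones are modeled.
    """
    window_end = window_start + window_length
    chroma = [0.0] * 12
    for pitch, onset, offset in notes:
        # Time during which the note sounds inside this window.
        overlap = min(offset, window_end) - max(onset, window_start)
        if overlap > 0:
            # Assumption: weight by the fraction of the window covered.
            chroma[pitch % 12] += overlap / window_length
    return chroma
```

Calling this once per 0.25-second window, stepping `window_start` by `window_length`, yields the nonoverlapping sequence of chroma vectors for a MIDI file.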
Chroma Similarity
  Now we have lists of chroma vectors for an audio query and for a database of MIDI files
  Normalize all vectors to have mean 0 and variance 1
    This helps reduce differences in vectors due to absolute loudness
  Compute the Euclidean distance between vectors (0 distance = perfect match)
  Compute the entire similarity matrix between vector pairs.
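A minimal sketch of the normalization and distance steps, using only the standard library. The function names are illustrative, and returning an all-zero vector for a constant (e.g. silent) frame is an assumption, since the slides do not say how that case is handled.

```python
import math


def normalize(vec):
    """Shift and scale a chroma vector to mean 0 and variance 1."""
    n = len(vec)
    mean = sum(vec) / n
    variance = sum((x - mean) ** 2 for x in vec) / n
    if variance == 0.0:
        return [0.0] * n  # assumption: constant frames map to all zeros
    std = math.sqrt(variance)
    return [(x - mean) / std for x in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(query, reference):
    """matrix[i][j] = distance between query frame i and reference frame j."""
    q = [normalize(v) for v in query]
    r = [normalize(v) for v in reference]
    return [[euclidean(a, b) for b in r] for a in q]
```

Because of the normalization, a frame and a uniformly louder copy of it (every entry multiplied by the same positive factor) end up at distance 0, which is the loudness invariance the slide describes.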
Similarity Matrix
  Dark = highly similar
  Black diagonal = matching path
  Note start, end, and length disparity
DTW computation
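As a rough illustration of the computation, here is textbook DTW over a precomputed distance matrix. It is not necessarily the variant used in the paper: this sketch forces the path to run from the first pair of frames to the last, allows only horizontal, vertical, and diagonal steps of equal weight, and does nothing special about differing start and end points. The function name and return format are assumptions.

```python
def dtw(dist):
    """Return (total_cost, path) for the cheapest monotonic alignment.

    dist[i][j] is the distance between query frame i and reference frame j.
    path is a list of (i, j) index pairs from (0, 0) to (n-1, m-1).
    """
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])      # advance query only
            if j > 0:
                best = min(best, cost[i][j - 1])      # advance reference only
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])  # advance both
            cost[i][j] = dist[i][j] + best

    # Backtrack from the end to recover the matching path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

The full table costs time and memory proportional to the product of the two sequence lengths, which is why restricting or pruning the search matters for large databases.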
Results: 10 Beatles songs
Results 2
Results 3
Conclusion
  More sophisticated DTW could be used to speed up the search
  Gives an example of linking symbolic and audio domains
Discussion
  What elements/features of music should we represent?
  Can we create a “dream” representation?