Automatic Transcription of Piano Music - Presentation at ICME 2011


  • 7/31/2019 Automatic Transcription of Piano Music - Presentation at ICME 2011

    1/28

    Automatic Transcription of Piano Music by Sparse Representation of Magnitude Spectra

    Cheng-Te Lee, Yi-Hsuan Yang, and Homer Chen

    National Taiwan University

    ICME 2011 Oral Presentation

    2011/07/14

    Speaker: Cheng-Te Lee


    Outline

    Introduction

    Proposed System

    Performance Analysis & Demo


    I. Introduction


    Automatic Transcription

    Input: music signal (in WAVE format)

    Output: musical score (in MIDI format)

    Goal: convert a music signal into a musical score

    Main drawbacks of previous work:

        Training data is difficult to generate

        The spectral shapes of notes are assumed to be constant


    Spectral Shape of Piano Sound

    [Figure: spectra of note C4 (MIDI number 60) produced by 6 pianos]


    ADSR Model

    Attack, Decay, Sustain, Release

    The spectral shape of a note varies with time

    [Figure: note C4 in the time domain, with the A, D, S, and R segments marked, and its spectra over time (by frame)]


    Design Consideration

    Exploit an online repository of piano notes as the database, to make the transcription

        work without generating training data

        adapt to a new piano easily

        adopt the ADSR model

    [Figure: database of individual piano notes, keyboard, synthesized mixture, input signal]


    II. Proposed System


    System Overview

    [Block diagram. Blocks: WAVE file, volume normalization, frame decomposition, FFT analysis, note candidate selection, sparse representation computation, noise elimination, HMM post-processing, MIDI file; tuning factor estimation; piano sound database, database tuning, tuned piano sound database]
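The front-end blocks named in the overview (volume normalization, frame decomposition, FFT analysis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: peak normalization, the naive DFT, the 8 kHz sample rate, and the test tone are all assumptions made for the example. The 100 ms frame length and 10 ms hop are the values reported on the frame-level evaluation slide.

```python
import cmath
import math

def normalize_volume(samples):
    """Scale samples so the largest absolute value is 1.0 (peak normalization, an assumption)."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)

def split_into_frames(samples, frame_len, hop):
    """Return overlapping frames of frame_len samples, advancing by hop samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, used here for clarity only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

# Toy input: a 1 kHz sine sampled at 8 kHz for 0.2 s (hypothetical values).
sample_rate = 8000
signal = [0.5 * math.sin(2 * math.pi * 1000 * t / sample_rate)
          for t in range(sample_rate // 5)]

frame_len = sample_rate // 10   # 100 ms
hop = sample_rate // 100        # 10 ms
frames = split_into_frames(normalize_volume(signal), frame_len, hop)
spectrum = magnitude_spectrum(frames[0])

# Locate the strongest bin and convert its index to a frequency in Hz.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * sample_rate / frame_len
```

Each element of `frames` yields one magnitude spectrum, which corresponds to the vector y used later in the sparse representation. A real system would use an FFT routine rather than the quadratic-time DFT shown here.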


    Note Candidate Selection

    [Block diagram: system overview, repeated]


    Note Candidate Selection

    Octave notes are easily mistaken for each other because they have similar spectra

    Avoid octave errors by note candidate selection

    Leverage the harmonic structure of piano sounds

    [Figure: spectra of note C4 (MIDI number 60) on two pianos, one with a strong fundamental and one with a weak fundamental]
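To see why octave notes have similar spectra, the sketch below compares ideal harmonic series. It assumes equal temperament with A4 = 440 Hz and perfectly harmonic partials (real piano strings are slightly inharmonic); the function names and the 1 Hz tolerance are hypothetical choices for the example, and this is not the candidate selection rule used in the presentation.

```python
def midi_to_hz(midi_number, a4_hz=440.0):
    """Equal-tempered fundamental frequency; A4 (MIDI 69) is assumed to be 440 Hz."""
    return a4_hz * 2.0 ** ((midi_number - 69) / 12.0)

def harmonics(midi_number, count):
    """First `count` ideal harmonic frequencies (integer multiples of the fundamental)."""
    f0 = midi_to_hz(midi_number)
    return [f0 * h for h in range(1, count + 1)]

def shared_fraction(upper, lower, count=10, tol_hz=1.0):
    """Fraction of the upper note's harmonics that coincide with a harmonic of the lower note."""
    lower_partials = harmonics(lower, 2 * count)
    hits = sum(any(abs(f - g) <= tol_hz for g in lower_partials)
               for f in harmonics(upper, count))
    return hits / count

c4_hz = midi_to_hz(60)                      # about 261.63 Hz
octave_overlap = shared_fraction(72, 60)    # C5 against C4
fifth_overlap = shared_fraction(67, 60)     # G4 against C4
```

Every harmonic of C5 falls on an even harmonic of C4, so `octave_overlap` is 1.0, while the overlap for a fifth is much smaller. When the fundamental of C4 is weak, as on one of the two pianos, the spectrum of C4 is therefore hard to tell apart from that of C5.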


    Illustration of Candidate Selection

    [Figure: candidate selection for a note with a strong fundamental and a note with a weak fundamental]


    Sparse Representation Computation

    [Block diagram: system overview, repeated]


    Sparsity of Played Notes

    A piano has a total of 88 keys

    But the keys actually played at any moment are a sparse subset of all the keys

    Only 4 voiced notes at a time on average


    Sparse Representation

    Problem formulation

    y: vector of the magnitude spectrum of a frame

    A: matrix of bases; each column of A is the magnitude spectrum of a note candidate

    x*: vector of sparse representation coefficients

    $$x^* = \arg\min_x \|x\|_0 \quad \text{subject to} \quad y = Ax \qquad (1)$$


    Illustration of Sparse Representation

    [Figure: y (frame spectrum) = A (spectra of note candidates) times x* (coefficient vector)]

    Solving (1) is NP-hard

    $$x^* = \arg\min_x \|x\|_0 \quad \text{subject to} \quad y = Ax \qquad (1)$$
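A rough way to see why exact minimization of the number of nonzero coefficients is considered intractable is to count the note combinations an exhaustive search would have to test for a single frame. The sketch below uses the 88 keys and the average of about 4 simultaneous notes quoted on the sparsity slide; capping the search at 4 notes is an assumption made only for this illustration.

```python
from math import comb

NUM_KEYS = 88          # keys on a piano (from the slides)
MAX_POLYPHONY = 4      # about 4 notes at a time on average (from the slides)

# Number of distinct key subsets with 1..4 sounding notes that an
# exhaustive search would have to test for one frame.
supports = sum(comb(NUM_KEYS, k) for k in range(1, MAX_POLYPHONY + 1))
print(supports)
```

The count is already in the millions per frame, and it grows quickly once more than one base element per note or higher polyphony is allowed.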


    Sparse Representation (cont'd)

    If the solution of (1) is sparse enough, it is close to the solution of the l1-regularized problem

    $$x^* = \arg\min_x \|y - Ax\|_2^2 + \lambda \|x\|_1$$

    Can be solved in polynomial time, O(n^1.2)

    $$x^* = \arg\min_x \|x\|_0 \quad \text{subject to} \quad y = Ax \qquad (1)$$
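The slides do not name the solver used for the l1-regularized problem, so the sketch below swaps in ISTA (iterative soft thresholding), a simple textbook method, purely as an illustration. The toy matrix, the regularization weight `lam`, and the iteration count are hypothetical, and the O(n^1.2) figure quoted above does not apply to this code.

```python
import math

def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def soft_threshold(v, t):
    """Shrink each entry toward zero by t; entries smaller than t become zero."""
    return [math.copysign(max(abs(v_i) - t, 0.0), v_i) for v_i in v]

def ista(a, y, lam, iterations=500):
    """Minimize ||y - a x||_2^2 + lam * ||x||_1 by iterative soft thresholding."""
    a_t = transpose(a)
    # Step size 1/L, where L is bounded by twice the squared Frobenius norm of a.
    step = 1.0 / (2.0 * sum(v * v for row in a for v in row))
    x = [0.0] * len(a[0])
    for _ in range(iterations):
        residual = [r - y_i for r, y_i in zip(matvec(a, x), y)]
        grad = [2.0 * g for g in matvec(a_t, residual)]
        x = soft_threshold([x_j - step * g_j for x_j, g_j in zip(x, grad)],
                           step * lam)
    return x

# Toy dictionary: 4 frequency bins (rows) by 3 note candidates (columns).
A = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [0.0, 0.0, 0.5]]
# Frame spectrum built from candidate 0 at weight 1.0 and candidate 2 at weight 0.5.
y = [1.0, 0.5, 0.5, 0.25]
x = ista(A, y, lam=0.01)
```

For this toy frame the recovered `x` is close to [1.0, 0.0, 0.5]: the unused candidate gets a zero coefficient, and the two active ones are shrunk slightly by the l1 penalty.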


    HMM Post-Processing

    [Block diagram: system overview, repeated]


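    HMM Post-Processing

    Model each note with a two-state (on/off) HMM (88 HMMs for the 88 keys on a piano)

    Given a frame sequence X = x_1 x_2 ... x_n, with t ∈ [1, n]

    [Equations not recovered: the slide states a quantity to maximize, rewrites it ("Because ..."), and gives the resulting objective ("so we maximize ..."), with one term labeled "Learnt from MIDI files" and another labeled "Estimated from sparse representation coefficient"]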
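Since the slide's equations did not survive in the transcript, the following is only a generic sketch of smoothing one note's activity with a two-state (off/on) HMM and the Viterbi algorithm. The per-frame `p_on` values stand in for whatever is estimated from the sparse representation coefficients, and the transition and prior probabilities are placeholder numbers rather than values learnt from MIDI files.

```python
import math

def viterbi_on_off(p_on, p_stay_off=0.9, p_stay_on=0.9, prior_on=0.5):
    """Most likely off/on (0/1) path for one note, given per-frame 'on' probabilities."""
    trans = [[p_stay_off, 1 - p_stay_off],     # from off: to off, to on
             [1 - p_stay_on, p_stay_on]]       # from on:  to off, to on

    def emit(state, p):
        # Log of the observation term for the given state (floored to avoid log(0)).
        return math.log(max(p if state else 1 - p, 1e-12))

    score = [math.log(1 - prior_on) + emit(0, p_on[0]),
             math.log(prior_on) + emit(1, p_on[0])]
    back = []
    for p in p_on[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            best_prev = max((0, 1), key=lambda q: score[q] + math.log(trans[q][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + math.log(trans[best_prev][s]) + emit(s, p))
        score = new_score
        back.append(pointers)

    # Backtrack from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical per-frame probabilities with a one-frame dropout in the middle of a note.
noisy = [0.1, 0.2, 0.9, 0.8, 0.3, 0.9, 0.85, 0.1, 0.1]
smoothed = viterbi_on_off(noisy)
```

Thresholding `noisy` at 0.5 would split the note in two at the fifth frame; the Viterbi path keeps the note on across the dropout because leaving and re-entering the on state costs more than one weak observation. Running one such model per key gives the 88 independent HMMs mentioned on the slide.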


    Result of HMM Post-Processing

    [Figure: (a) before HMM post-processing and (b) after HMM post-processing; legend: true positive, false positive, false negative, true negative]


    III. Performance Analysis & Demo


    Frame-Level Evaluation

    70.2% F-measure

    10 one-minute-long classical music recordings

    Each frame is 100 ms long; the hop size is 10 ms

    59,910 frames, 211,082 notes, 3.54 average polyphony

    Significant improvement compared to two state-of-the-art systems

    Under the one-tailed t-test (p-value < 0.05)

                             F-measure   Precision   Recall
    Proposed system          70.2%       74.4%       66.5%
    Klapuri's system [2]     62.2%       72.4%       54.6%
    Marolt's system [1]      66.1%       78.6%       57.1%

    [1] M. Marolt, "A connectionist approach to automatic transcription of polyphonic piano music," IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439-449, 2004.

    [2] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. ISMIR, Victoria, Canada, pp. 216-221, Oct. 2006.
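The F-measure in the table is the harmonic mean of precision and recall. The short check below recomputes it from the rounded precision and recall values on the slide; because those inputs are rounded, the recomputed figures can differ from the table in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs in percent, copied from the frame-level table.
frame_level = {
    "Proposed system": (74.4, 66.5),
    "Klapuri's system": (72.4, 54.6),
    "Marolt's system": (78.6, 57.1),
}
for name, (p, r) in frame_level.items():
    print(f"{name}: {f_measure(p, r):.2f}%")
```

All three recomputed values agree with the reported F-measures to within 0.1 percentage point.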


    Note-Level Evaluation

    73.0% F-measure

    Only the onsets of notes are considered

    An onset is correct if it is within 100 ms of the ground-truth onset

    4937 notes

    Significant improvement compared to the best system of MIREX F0 tracking 2010 [3]

                         F-measure   Precision   Recall
    Proposed system      73.0%       74.6%       71.6%
    Yeh's system [3]     67.1%       57.2%       81.1%

    [3] C. Yeh and A. Roebel. (2010). Multiple-F0 estimation for MIREX 2010. Music Information Retrieval Evaluation eXchange. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf
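A note-level score of this kind can be sketched as follows. The slides give only the 100 ms onset tolerance; the greedy one-to-one matching, the requirement that pitches be equal, and the example notes are assumptions made for the illustration.

```python
def match_onsets(reference, estimated, tolerance=0.1):
    """Greedily match estimated (pitch, onset_seconds) notes to reference notes.

    A match needs the same pitch and onsets within `tolerance` seconds;
    each reference note can be matched at most once.
    """
    unused = sorted(reference, key=lambda n: n[1])
    true_positives = 0
    for pitch, onset in sorted(estimated, key=lambda n: n[1]):
        for candidate in unused:
            if candidate[0] == pitch and abs(candidate[1] - onset) <= tolerance:
                unused.remove(candidate)
                true_positives += 1
                break
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f

# Hypothetical notes as (MIDI pitch, onset time in seconds).
reference = [(60, 0.00), (64, 0.50), (67, 1.00), (72, 1.50)]
estimated = [(60, 0.03), (64, 0.58), (67, 1.25), (48, 1.50)]
precision, recall, f = match_onsets(reference, estimated)
```

In this example two of the four estimated notes match: the third is 250 ms late and the fourth has the wrong pitch, so precision, recall, and F-measure are all 0.5.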

    Analysis of System Components


    Number of Base Elements

    Because we adopt the ADSR model, there is more than one base element for each note

    The F-measure improves from 64.6% (88 base elements) to 70.2% (646 base elements)


    Conclusion

    We have presented an automatic transcription system that

        exploits the sparse nature of played keys

        adapts to a new piano easily

        adopts the ADSR model to improve accuracy

    Significant improvement over state-of-the-art systems


    Live Demo

    Song                                                  Composer    F-measure
    Prelude and Fugue No. 2 in C Minor                    Bach        78.2%
    Sonata No. 8 "Pathetique" in C Minor, 3rd Movement    Beethoven   74.6%
    Moments Musicaux No. 4                                Schubert    67.0%
    Sonata K.333 in Bb Major, 1st Movement                Mozart      78.4%

    [Audio: the original recording and the transcription result for each piece]


    Thanks for your attention

    Q&A
