singing expression transfer from one voice to another for ...singing expression transfer from one...

41
MACLab Music and Audio Computing Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology Sangeon Yong, Juhan Nam

Upload: others

Post on 24-May-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

MACLabMusic and Audio Computing

Singing Expression Transfer from One Voice to Another for a Given Song

Korea Advanced Institute of Science and Technology

Sangeon Yong, Juhan Nam

Page 2: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Introduction

Page 3: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Introduction

source target

Page 4: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Related Works

Antares Autotune 8 graphical mode Steinberg Variaudio

Page 5: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Related Works

Cano et al. (ICMC, 2000)Voice morphing system with source and target voice

Score information is used for temporal alignment

Nakano et al. (SMC, 2009)Similar with above but using a singing synthesizer instead of the source voice (i.e. Vocaloid)

Tune synthesizer parameter with the lyric information of the song

However, they require additional score information!

Page 6: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Research Goal

Transfer musical expressions without any additional information

?Rhythm, Pitch, Dynamics

Voice color

Page 7: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 8: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 9: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 10: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment

Lyrics

Singer A

Singer B

Let it go let it go

Page 11: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Dynamic Time Warping

Page 12: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 13: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Extraction

Spectrogram of Source Spectrogram of Target

Page 14: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Extraction

Similarity matrix with spectrogram

Page 15: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Extraction

Spectrogram of Source Spectrogram of Target

Page 16: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Feature Extraction Strategy

Preserving common elementsNote-level melody

Lyrics

Suppressing different characteristicsVibrato or other pitch-related articulations

Singer timbre

Page 17: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Proposed Features

Max-filtered Constant-Q transform Semi-tone pitch resolution: vibrato with less than one semi-tone

Frequency-wise max-filtering: vibrato with more than one semi-tone

Constant-Q Transform Const-Q Trans with Maximum Filtering

Page 18: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Proposed Features

Phoneme score (phoneme classifier posteriorgram)Frame-level features for accurate temporal alignment

Singer invariant lyrical features

Phonem

es

Page 19: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Comparison

Spectrogram Max-filtered Constant-Q Transform

Page 20: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Comparison

Spectrogram phoneme score

Page 21: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Feature Comparison

Spectrogram Phoneme Score +Const-Q Trans

Page 22: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 23: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Path Smoothing

Page 24: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Path Smoothing

Savitzky, Abraham, and Marcel JE Golay. "Smoothing and differentiation of data by

simplified least squares procedures." Analytical chemistry 36.8 (1964): 1627-1639.

Page 25: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Path Smoothing

Page 26: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Temporal Alignment – Path Smoothing

Page 27: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

WSOLA

Page 28: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 29: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Pitch Alignment

Harmonic-Percussion Source Separation (HPSS)Pre-processing of pitch detection to increase detection accuracy

Median filter (IEEE Signal Processing Letters 2014)

Pitch DetectorYIN

Pitch shifting Pitch-Synchronous Overlap-Add (PSOLA)

Formant preservation

Page 30: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Pitch Alignment

source

target

result

Page 31: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

System Structure

Feature

Extraction

Target

Singing Voice

Source

Singing Voice

Time-Scale

ModificationPitch Shifting Gain

DTW Smoothing HPSSEnvelope

Detector

Pitch

Detector

Modified

Singing Voice

stretching ratio

smoothed

stretching ratiopitch ratio gain ratio

harmonic signal

Temporal Alignment Pitch Alignment Dynamics Alignment

𝑠 𝑠𝑇 𝑠𝑇𝑃 𝑠𝑇𝑃𝐸

Page 32: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Dynamics Alignment

source

target

result

Page 33: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Evaluation

Datasets4 recordings for each of 4 songs (total 16 recordings)

One of 4 recordings is a target singing voice (professional or skilled)

Totally 12 pairs of source-target singing voice

Song 1 Song 2 Song 3 Song 4

Gender female male male male

No. of source 3 3 3 3

Remarks high pitch

English

low pitch

English

swing rhythm

Korean

swing rhythm

Korean

Page 34: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Evaluation

Temporal alignmentBetter alignment has less fluctuation of the DTW slope

Standard deviation of slope angle 𝜃 = arctan(𝑠𝑙𝑜𝑝𝑒)

Song 1 Song 2 Song 3 Song 4

Gender female male male male

No. of source 3 3 3 3

Remarks high pitch

English

low pitch

English

swing rhythm

Korean

swing rhythm

Korean

song 1 song 2 song 3 song 4

Page 35: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Evaluation

Pitch alignment

Song 1 Song 2 Song 3 Song 4

Gender female male male male

No. of source 3 3 3 3

Remarks high pitch

English

low pitch

English

swing rhythm

Korean

swing rhythm

Korean

Page 36: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Evaluation

Dynamics alignment

Song 1 Song 2 Song 3 Song 4

Gender female male male male

No. of source 3 3 3 3

Remarks high pitch

English

low pitch

English

swing rhythm

Korean

swing rhythm

Korean

Page 37: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Audio Examples

More examples are available on

source target result

let it go

cherry blossom ending

Page 38: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Summary

Proposed a method to transfer vocal expressions from one voice to another in terms of tempo, pitch and dynamics without any additional information

Showed the proposed method effectively transformed the source voices so that they mimic singing skills from the target voice

Page 39: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology

Future Plan

The limitation of this work is that the target voice must be available

A possible solution is to model a target singer model (e.g. singing synthesizer with natural expressions) and generate a target example using melody and lyrics information extracted from the source voice

Improve the audio quality using other time-scale/pitch modification algorithms

Page 40: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology
Page 41: Singing Expression Transfer from One Voice to Another for ...Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology