DR.D.Y.PATIL POLYTECHNIC, AMBI COMPUTER DEPARTMENT TOPIC : VOICE MORPHING

Upload: randolf-dorsey

Post on 04-Jan-2016

WHAT IS VOICE MORPHING?
APPROACHES TO THE PROBLEM.
SPEECH PRODUCTION.
CONVERSION OF VOICE.
TYPES OF VOICE MORPHING.
REFERENCES OR METHODS.
APPLICATIONS OF VOICE MORPHING.
AVAILABLE SOFTWARE FOR VOICE MORPHING.
SUMMARY.
CONCLUSION.

Voice morphing, also referred to as voice transformation or voice conversion, is a technique that modifies a source speaker's utterance so that it sounds as if it were spoken by a target speaker.

Many applications may benefit from this sort of technology. For example, a text-to-speech (TTS) system with integrated voice morphing can produce many different voices. Where speaker identity plays a key role, such as in dubbing movies and TV shows, high-quality voice morphing will be very valuable, because it allows the appropriate voice to be generated (possibly in different languages) without the original actors being present.

Voice conversion will be performed in two phases.

In the first phase, training, the speech signals of the source and target speakers are analyzed and their voice characteristics are extracted by means of Linear Prediction Coding (LPC), a mathematical optimization technique that is very popular in speech processing.
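As a rough illustration of the analysis step, the sketch below estimates LPC coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which LPC variant or model order is used, so the function names, the order and the toy frame are assumptions made only for this example.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., ap] are the coefficients of the
    prediction-error filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p, and err is
    the remaining prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent frame: nothing left to predict
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Toy frame (assumption): impulse response of x[n] = 0.9 * x[n-1].
frame = [0.9 ** n for n in range(200)]
a, err = lpc(frame, order=2)
# a[1] comes out close to -0.9 and a[2] close to 0 for this frame.
```

In a morphing system, coefficient vectors like `a` would be computed per frame for both speakers and then mapped from source to target; that mapping is not shown here.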

In the second phase, the transformed features are used to synthesize speech that will, hopefully, resemble that of the target speaker.

Speech synthesis is again performed by means of Linear Prediction Coding.
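A minimal sketch of LPC synthesis follows: an excitation signal is passed through the all-pole filter 1/A(z) defined by the LPC coefficients. The helper names, the coefficient values and the short frame are hypothetical. The example only checks that inverse filtering followed by synthesis reproduces the input; in a morphing system the coefficients used for synthesis would be the transformed ones.

```python
def lpc_residual(a, frame):
    """Inverse filter: pass the frame through A(z) to get the excitation."""
    order = len(a) - 1
    return [sum(a[j] * frame[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(frame))]


def lpc_synthesize(a, excitation, gain=1.0):
    """Run the excitation through the all-pole filter gain / A(z)."""
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        y = gain * e
        for j in range(1, order + 1):
            if n - j >= 0:
                y -= a[j] * out[n - j]   # feedback from past output samples
        out.append(y)
    return out


# Hypothetical coefficients and frame, for illustration only.
a = [1.0, -0.9, 0.2]
frame = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.3, 0.1]
excitation = lpc_residual(a, frame)
rebuilt = lpc_synthesize(a, excitation)   # matches frame up to rounding
```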

Speech production involves three subsystems. The respiratory subsystem is composed of the lungs, the trachea (windpipe), the diaphragm and the chest cavity. The larynx and the pharyngeal cavity (throat) constitute the laryngeal subsystem. The articulatory subsystem includes the oral cavity and the nasal cavity.

The oral cavity comprises the velum, the tongue, the lips, the jaw and the teeth.

In technical discussions of speech processing, the term vocal tract refers to the combination of the larynx, the pharyngeal cavity and the oral cavity.

The respiratory subsystem behaves like an air pump, supplying the aerodynamic energy for the other two subsystems.

In speech processing, the basic aerodynamic parameters are air volume, flow, pressure and resistance.

TECHNIQUES:- Wavelet Decomposition. Proposed model.

Wavelet Decomposition :- Wavelets are a class of functions that possess compact support and form a basis for all finite-energy signals.

They can capture the non-stationary spectral characteristics of a signal by decomposing it over a set of atoms that are localized in both time and frequency. The discrete wavelet transform (DWT) uses the set of dyadic scales and translates of the mother wavelet to form an orthonormal basis for signal analysis.
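The dyadic family described above is conventionally written as follows. The symbols (mother wavelet ψ, scale index j, translation index k, signal s and coefficients d) are standard textbook notation rather than anything defined in these slides, and a real-valued mother wavelet is assumed.

```latex
\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right), \qquad j,k \in \mathbb{Z},
\qquad
d_{j,k} = \int_{-\infty}^{\infty} s(t)\,\psi_{j,k}(t)\,dt
```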

The original signal S is split into an approximation cA1 and a detail cD1. The approximation is then itself split into an approximation and a detail, and so on. Decomposing a signal into k levels therefore results in k+1 sets of coefficients at different frequency resolutions: k levels of detail coefficients and 1 level of approximation coefficients.
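The repeated splitting can be sketched with the Haar wavelet, the simplest orthonormal choice. The slides do not name the mother wavelet, so Haar is an assumption here, and `haar_step` and `wavedec` are hypothetical helper names.

```python
import math


def haar_step(signal):
    """One level: split a signal into approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavedec(signal, levels):
    """Return [cA_k, cD_k, ..., cD_1]: one approximation set, k detail sets."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)   # re-split the approximation
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


coeffs = wavedec([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
# 3 levels give 3 + 1 = 4 coefficient sets, of lengths 1, 1, 2 and 4.
```

Because this transform is orthonormal, the sum of squared coefficients equals the energy of the original signal.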

Proposed model : Voice morphing is performed in two steps: training and transformation. The training data consist of repetitions of the same phonemes uttered by both the source and the target speaker.

The source and target training data are divided into frames of 128 samples, and the frames are randomly divided into training and validation sets.

A 5-level wavelet decomposition is then performed on the source and target training data.
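The data preparation described above might look like the sketch below. The frame length (128) and number of levels (5) come from the text. The non-overlapping framing, the 20% validation share, the Haar wavelet and the sine-wave stand-in for recorded speech are all assumptions made so the example runs on its own.

```python
import math
import random

FRAME_LEN = 128   # from the text
LEVELS = 5        # from the text


def make_frames(samples, frame_len=FRAME_LEN):
    """Cut a signal into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_frames(frames, validation_fraction=0.2, seed=0):
    """Randomly divide frames into (training, validation) sets."""
    shuffled = frames[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def haar_wavedec(frame, levels=LEVELS):
    """Haar decomposition: returns [cA_k, cD_k, ..., cD_1]."""
    s = 1.0 / math.sqrt(2.0)
    approx, details = list(frame), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) * s for a, b in pairs])
        approx = [(a + b) * s for a, b in pairs]
    return [approx] + details


# Toy stand-in for recorded speech: ten frames' worth of a sine wave.
signal = [math.sin(2 * math.pi * 5 * n / 1280) for n in range(1280)]
frames = make_frames(signal)
train, val = split_frames(frames)
coeff_sets = [haar_wavedec(f) for f in train]
# Each 128-sample frame yields 6 coefficient sets of lengths 4, 4, 8, 16, 32, 64.
```

In practice the same steps would be run on both the source and the target recordings. Whether the two speakers' frames must be split identically is not stated in the slides.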

THIS SECTION SHOWS THE FORMS IN WHICH A NORMAL VOICE OR SPEECH CAN BE TRANSFORMED.

CONVERSION   SOURCE SPEECH   TARGET SPEECH   RESULT1   RESULT2
F TO M       SPEECH1         TARGET1         RESULT1   VOICE1
M TO F       SPEECH2         TARGET2         RESULT2   VOICE2
F TO F       SPEECH3         TARGET3         RESULT3   VOICE3
M TO M       SPEECH4         TARGET4         RESULT4   VOICE4

The "Source Speech" column contains the utterances of the source speaker.

The "Target Speech" column contains the target speaker's utterances.

The utterances in these two columns are NOT included in the training data used to estimate the conversion function.

The next two columns hold the results. The difference between them is that "RESULT1" applies the target prosody extracted from the target utterance, whereas "RESULT2" keeps the original prosody of the source utterance.

REFERENCES:-

Abe M., Nakamura S., Shikano K. and Kuwabara H.: Voice Conversion through Vector Quantization, Proceedings of ICASSP, 1988.

Stylianou Y., Cappé O. and Moulines E.: Statistical Methods for Voice Quality Transformation, Proceedings of Eurospeech, 1995.

Arslan L. and Talkin D.: Voice Conversion by Codebook Mapping of Line Spectral Frequencies and Excitation Spectrum, Proceedings of Eurospeech, 1997.

APPLICATIONS OF VOICE MORPHING:-

ENTERTAINMENT.
IN THE FILM INDUSTRY.
SECURITY.
IN COMPUTER GAMING.

AVAILABLE SOFTWARE FOR VOICE MORPHING:-

MORPH VOX PRO VOICE CHANGER 2.0.6.
MORPH VOX PRO VOICE CHANGER 4.2.2.
MORPH VOX PRO VOICE CHANGER 4.3.8.
TERA VOICE SERVER 2004.
FLASH VOICE BUTTONS 3.0.
VOICE TWISTER 1.0.4.
VOICE AGAIN 1.5.2.
QUICK VOICE FOR OSX 2.2.0.
QUICK VOICE FOR WINDOWS 2.2.0.

Voice morphing is the process of changing voice personality, i.e. speech uttered by a source speaker is modified to sound as if the target speaker had uttered it.

In this dissertation, our treatment of voice morphing began by introducing the basic properties of speech signals.

We then introduced the basic techniques of voice morphing and the concept behind it.

As voice morphing is a technology with many interesting, useful and fun applications, further research on the subject, with or without the Generative Topographic Mapping (GTM) model, is bound to follow and should lead to morphed speech of excellent quality.
