14ec3029 speech and audio signal processing

14EC3029 SPEECH AND AUDIO SIGNAL PROCESSING. Credits: 3:0:0. Prerequisites: Basic Digital Signal Processing, good knowledge of MATLAB and Simulink. D. Sugumar, Asst. Prof. (AGP 8000)/ECE, Karunya University, Coimbatore

Upload: sugumar-sar-durai

Post on 11-Sep-2015



  • Course objective

    To study the analysis of various M-band filter banks for audio coding.

    To learn various transform coders for audio coding.

    To study the speech processing methods in time and frequency domain.

    To study the basic concepts of speech and audio.

  • Course outcome

    On successful completion you should be able to:

    1. Express the speech signal in terms of its time domain and frequency domain representations and the different ways in which it can be modelled.

    2. Derive expressions for the simple features used in speech and audio applications.

    3. Understand the operation of algorithms and the effects of varying parameter values within them.

    4. Synthesise block diagrams for speech applications, know the purpose of the various blocks, and detail algorithms that could be used to implement them.

    5. Implement components of speech processing systems, including speech recognition and speaker recognition, in MATLAB.

    6. Understand the behaviour of previously unseen speech processing systems and hypothesise about their merits.

  • Course Contents

    Mechanics of speech and audio - Nature of speech signal - Discrete time modelling of speech production - Classification of speech sounds - Absolute threshold of hearing - Critical bands - Masking - Perceptual entropy - The perceptual audio quality measure (PAQM) - Cognitive effects in judging audio quality.

    Time-frequency analysis - filter banks and transforms. Audio coding and transform coders.

    Time and frequency domain methods for speech processing. Homomorphic speech analysis. Linear predictive analysis of speech. Application of LPC parameters. Formant analysis.

  • Audio Engineering-related Professions

    Musician

    Audio/electronics technician

    Recording Engineer

    Architectural Acoustician

    Psychoacoustician

    Electroacoustic device designer

    Electronics designer

    Computer programmer

    Audio engineer: One who devises creative solutions to difficult problems in the field of audio.

  • Audio-related Organizations, Conferences and Publications

    Acoustical Society of America (ASA). Journal: JASA (since 1929). Meets twice yearly in various locations. Most members work at universities or scientific labs. http://asa.aip.org

    Audio Engineering Society (AES). Journal: JAES (since 1953). Meets several times a year in various locations. Most members work in the audio industry. http://www.aes.org

    IEEE Signal Processing Society. Conferences: Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) and Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). http://www.ieee.org/organizations/society/sp/

  • Audio-related Organizations, Conferences and Publications

    International Computer Music Association (since 1980). Publication: Computer Music Journal (quarterly since 1977). Yearly International Computer Music Conference (ICMC) at American, European, and Asian locations (since 1974). http://computermusic.org

    Society for Music Perception and Cognition (SMPC). Yearly conferences alternate between U.S./Canada and Europe/Asia. http://www.musicperception.org/

    International Conference on Digital Audio Effects (DAFx). Yearly international (since 1997). http://dafx.labri.fr

    International Conference on Music Information Retrieval. Yearly international (since 2000). http://ismir2007.ismir.net/

    International Symposium on Musical Acoustics. Roughly biennial international. http://iwk.mdw.ac.at/ma/

  • Reference Books

  • Books 1. Digital Audio Signal Processing, Second Edition, Udo Zölzer, John Wiley & Sons Ltd, 2008.

    This book ... covers noise shaping, gives you formulas for peaking and shelving filters used in mixing consoles, tells you how to implement a state-of-the-art reverb or dynamic compression algorithm, and explains how audio compression using psychoacoustic effects works. The mathematics is not that complicated, but you should already know what FFT and IIR stand for and how they work to be able to use the book.

  • 2. Applications of Digital Signal Processing to Audio and Acoustics, Mark Kahrs, Karlheinz Brandenburg, Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow, 2002.

    Good one.

  • 3. Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer, Prentice Hall, 1978.

    Perfect Choice for this course

  • 4. Speech and Audio Signal Processing, Ben Gold and Nelson Morgan.

    This is a book much needed in the speech and audio community because of its unique perspective on these topics. By their very nature, speech, music and other audio signals are only fully understood if one takes into account their perception, production, and the context within which they exist (language, symphony). To appreciate what to process about such signals, the scientist must have a broad appreciation of linguistics, hearing, vocal tract models, and the brain in general, in addition to the standard engineering tools and approaches. This is why this book is valuable. It indeed attempts to reach out to all these fields with just enough details to inspire the reader, and to provide links to existing more detailed literature. The book is well written, full of excellent illustrations, and it was the perfect choice for this class.

  • 5. T. F. Quatieri, Principles of Discrete Time Speech Processing, Prentice Hall Inc, 2002

    1. Express the speech signal in terms of its time domain and frequency domain representations and the different ways in which it can be modelled.

    2. Derive expressions for simple features used in speech classification applications.

    3. Explain the operation of example algorithms covered in lectures, and discuss the effects of varying parameter values within these.

    4. Synthesise block diagrams for speech applications, explain the purpose of the various blocks, and describe in detail algorithms that could be used to implement them.

    5. Implement components of speech processing systems, including speech recognition and speaker recognition, in MATLAB.

    6. Deduce the behaviour of previously unseen speech processing systems and hypothesise about their merits.

  • I hear...and I forget

    I see...and I remember

    I do...and I understand

    Chinese Proverb

    study learn

    Practice

  • Hardware & Software for QA

  • E-books My collection @

    Google drive

    E-Books For DSP & solution

    For Matlab

    For Labview

    For Scilab

    Videos NPTEL

    Youtube

  • Websites

    JDSP

    MathWorks center

    Dspguru

    etc

  • A Brief History of Audio (Analog)

    Electromagnetic microphone (Ernst Siemens 1874)

    Telephone (Alexander Graham Bell 1876)

    Phonograph (wax cylinders) (Thomas Edison 1877)

    Gramophone 78 record (Emile Berliner 1888)

    Telegraphone magnetic wire recorder (Valdemar Poulsen 1898)

    Telharmonium, first electrical synthesizer (Thaddeus Cahill 1900)

    AM radio (Reginald Fessenden 1905)

    First radio broadcast (Lee DeForest (Met Opera) 1910)

    Vacuum tube amplifier (Edwin Armstrong 1912)

    Electrostatic microphone (E. Wente, Bell Labs 1916)

    Electromagnetic loudspeaker (Chester Rice & Edward Kellogg 1924)

    FM radio (Edwin Armstrong 1933; common use began in 1950s)

    Wire recorder (consumer heyday 1947-1952)

    Magnetic tape (reel-to-reel Ampex 1948; cassette Philips 1962)

    33 rpm (LP) record (Columbia Records 1948)

    Multitrack recording (Ampex 1954)

    Stereo LP (Westrex 1958) and FM (GE/Zenith 1961, based on Armstrong)

  • A Brief History of Audio (Digital)

    12-bit digital recording on computer tape (Bell Labs, 1957)

    13-bit digital recording on computer tape (Illiac II, UIUC, 1964)

    12-bit digital recording on computer disk (DEC/Stanford U., 1965)

    16-bit 2/4 channel digital recorder (Soundstream, 1976/77)

    Sony PCM (16-bit stereo recorded on video tape 1978)

    compact disc (CD) (16-bit stereo format) (Philips/Sony 1983)

    stereo digital audio tape (DAT) (Philips/Sony 1986)

    recordable CD: CD-R (Philips/Sony 1988)

    sound compression (MP3 standard 1989)

    audio record/playback from home computer (NeXT 1989)

    8-track digital audio tape (ADAT) (Alesis 1991)

    8-track digital audio tape (DA-88) (Tascam 1992)

    MiniDisc MD (Sony 1992)

  • Introduction to Speech Signal Processing

    Speech: Fundamental and effortless mode of communication among humans.

    Speech communication: Talker, listener and channel.

    Speech Production Process: Message formulation, language coding, neuro-muscular commands, movement of speech production organs, acoustic pressure variations.

    Speech Perception Process: Acoustic pressure variations, movement of speech perception organs, neuro-muscular commands, message comprehension.

  • Applications (Project Areas)

    1. Speech Modification

    Time-scale manipulations:

    Fitting the speech waveform in radio and TV commercials into an allocated time slot, and synchronizing audio and video presentation.

    Speeding up speech: message playback, voice mail, reading machines and books for the blind.

    Slowing down speech: learning a foreign language.

    Voice transformations using pitch and spectral changes of the speech signal: voice disguise, entertainment, speech synthesis.

    Spectral change by frequency compression and expansion: may be useful in transforming speech as an aid to the partially deaf.

    Many methods can be applied to music and special effects.
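    As a rough illustration of time-scale manipulation, the sketch below stretches or compresses a signal with plain overlap-add (OLA): frames are read at one hop size and written at another. This is an assumption chosen for brevity, not a method named on the slide. Practical systems usually align the frames (for example WSOLA or PSOLA) to avoid the phase artifacts that plain OLA produces. The function names, frame length and test tone are all hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_time_scale(x, rate, frame_len=256):
    """Time-scale x by plain overlap-add.

    rate > 1 shortens the signal (speeds it up), rate < 1 lengthens it.
    Frames are taken every ana_hop samples and laid down every synth_hop
    samples, so the duration changes by roughly 1 / rate.
    """
    synth_hop = frame_len // 2                       # 50 % overlap at the output
    ana_hop = max(1, int(round(synth_hop * rate)))   # read step in the input
    win = hann(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out = [0.0] * ((n_frames - 1) * synth_hop + frame_len)
    norm = [0.0] * len(out)                          # summed window weights
    for k in range(n_frames):
        a = k * ana_hop
        s = k * synth_hop
        for i in range(frame_len):
            if a + i < len(x):
                out[s + i] += win[i] * x[a + i]
                norm[s + i] += win[i]
    # Divide by the summed window so the overlap does not change the level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

# Hypothetical test signal: one second of a 200 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]
fast = ola_time_scale(tone, rate=2.0)   # about half as long
slow = ola_time_scale(tone, rate=0.5)   # about twice as long
print(len(tone), len(fast), len(slow))
```

    The output keeps the original sampling rate, so only the duration changes. Simply resampling the waveform would instead shift the pitch as well.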

  • Applications (Project Areas)

    2. Speech Coding

    Goal is to reduce the information rate, measured in bits per second, while maintaining the quality of the original waveform.

    Waveform coders:

    Represent the speech waveform directly and do not rely on a speech production model.

    Operate in the high range of 16-64 kbps.

    Vocoders:

    Are largely speech model-based and rely on a small set of model parameters.

    Operate in the low bit-rate range of 1.2-4.8 kbps.

    Lower quality than waveform coders.

    Hybrid coders:

    Partly waveform-based and partly speech model-based.

    Operate in the 4.8-16 kbps range.
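    The bit-rate ranges above can be made concrete with a little arithmetic. The sketch below assumes telephone-band PCM at 8 kHz with 8 bits per sample, a common textbook figure that is not stated on the slide, and compares it with the listed coder families. The function names are hypothetical, and assigning the shared boundary values (4.8 and 16 kbps) to the higher family is an arbitrary choice.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

def coder_family(bit_rate_kbps):
    """Map a bit rate to the coder family using the ranges listed above.

    The ranges share their end points, so 4.8 and 16 kbps are assigned
    to the higher family here (an arbitrary choice for this sketch).
    """
    if 16.0 <= bit_rate_kbps <= 64.0:
        return "waveform coder"
    if 4.8 <= bit_rate_kbps < 16.0:
        return "hybrid coder"
    if 1.2 <= bit_rate_kbps < 4.8:
        return "vocoder"
    return "outside the listed ranges"

def storage_kilobytes(bit_rate_kbps, seconds):
    """Storage needed for a message, in kilobytes (1 kB = 8 kbit)."""
    return bit_rate_kbps * seconds / 8.0

# Assumed telephone-band PCM: 8000 samples/s, 8 bits per sample.
telephone_pcm = pcm_bit_rate_kbps(8000, 8)
for rate in (telephone_pcm, 8.0, 2.4):
    print(rate, "kbps:", coder_family(rate),
          "-", storage_kilobytes(rate, 60), "kB per minute")
```

    Under these assumptions a 2.4 kbps vocoder stores a one-minute voice message in a small fraction of the space needed at 64 kbps, which is the trade-off against quality described above.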

  • Applications (Project Areas)

    Applications of speech coders include:

    Digital telephony over constrained-bandwidth channels:

    Cellular

    Satellite

    Voice over IP (Internet)

    Video phones

    Storage of Voice messages for computer voice mail applications.

  • Applications (Project Areas)

    3. Speech Enhancement

    Goal is to improve the quality of degraded speech.

    Preprocessing speech before it is degraded: increasing the broadcast range of transmitters constrained by a peak power transmission limit (e.g., AM radio and TV transmissions).

    Enhancing the speech waveform after it is degraded:

    Reduction of additive noise in (digital) telephony and in vehicle and aircraft communications.

    Reduction of interfering backgrounds and speakers for the hearing impaired.

    Removal of unwanted convolutional channel distortion and reverberation.

    Restoration of old phonograph recordings degraded by acoustic horns and by impulse-like scratches from age and wear.
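    To make the scratch-removal case concrete, the sketch below applies a short median filter, which suppresses isolated impulse-like samples while following slower changes in the waveform. Median filtering is used here only as a simple stand-in. The slide does not name a restoration method, and real restoration systems typically detect clicks first and interpolate only the damaged samples. The function name and the toy signal are hypothetical.

```python
def median_filter(x, width=5):
    """Replace each sample by the median of a centred window.

    Near the edges the window is shortened rather than padded.
    """
    half = width // 2
    out = []
    for n in range(len(x)):
        window = sorted(x[max(0, n - half): n + half + 1])
        out.append(window[len(window) // 2])
    return out

# Toy example: a slowly rising waveform with one scratch-like impulse.
clean = [k / 10 for k in range(10)]
scratched = list(clean)
scratched[5] = 5.0                      # isolated click
restored = median_filter(scratched, width=5)
print(restored)
```

    The click is removed, but the neighbouring samples are also slightly altered, which is one reason selective click detection is usually preferred to filtering the whole signal.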

  • Applications (Project Areas)

    4. Speaker Recognition

    Speech signal processing exploits the variability of speech model parameters across speakers.

    Verifying a person's identity (biometrics).

    Voice identification in forensic investigation.

    Understanding of the speech model features that cue a person's identity is also important in speech modification, where model parameters can be transformed for the study of specific voice characteristics. Speech modification and speaker recognition can be developed synergistically.

  • Office

    Room No 206 (2nd Floor of ECE)

    E-Mail: [email protected]