Voiceprint System Development: Design, implement, test unique voiceprint biometric system. Research Day Presentation, May 3rd 2013. Rahul Raj (Team Lead), Geeta Bothe, Mahesh Sooryambylu, Ravi Ray, Sreeram Vancheeswaran, IBM India. Customer: Jonathan Leet (DPS 2013). Instructor: Dr. Charles Tappert


Page 1

Voiceprint System Development

Design, implement, test unique voiceprint biometric system

Research Day Presentation, May 3rd 2013

Rahul Raj (Team Lead), Geeta Bothe, Mahesh Sooryambylu, Ravi Ray, Sreeram Vancheeswaran

IBM India

Customer: Jonathan Leet (DPS 2013)

Instructor: Dr. Charles Tappert

Page 2

Common Passphrase

Background: four possible types of passphrases

1. User-specified phrase, like the user's name
2. Specified phrase common to all users
   • “My name is” from phrase “My name is user’s name”
3. Random phrase displayed on the computer screen
4. Random phrase that can vary at the user's discretion

Advantages of a Common Passphrase

• Simplifies the segmentation problem
• Allows for careful selection of the common phrase to optimize the variety of phonetic units for their authentication value
• Facilitates testing for imposters
• Permits the measurement of true voice authentication biometric performance
• Avoids potential experimental flaws


Page 3

Software Used: Audacity & Matlab

• Audacity
  • Open-source audio editing software that supports sound recording and editing
  • Supports resampling and stereo-to-mono conversion
  • Available on Windows, Linux, and Mac

• Matlab
  • Signal Processing Toolbox provides industry-standard algorithms and apps for analog and digital signal processing
  • Supports visualizing signals in the time and frequency domains, FFT computation for spectral analysis, resampling, and other signal processing techniques


Page 4

System Architecture

• Collection and management of speech samples in repository
• Preprocessing and spectrogram generation
• Mel filter banks and MFCC calculation
• Automatic segmentation of the “My name is” portion
• Automatic segmentation of phonemes using DTW
• Feature vector extraction
• Pace’s Biometric Authentication System will obtain performance results from the feature vectors

Page 5

Voice Sample Spectrogram using Matlab

Input speech sample (mono, 44100 samples/sec)

• Voice stream is collected into frames of 1024 samples
• Samples are read by sliding along the stream in steps of 512 bytes, so consecutive frames overlap
• Plot represents the samples of one frame
• One frame ≈ 23 ms, since
  • Frames per second = 44100/1024 ≈ 43
  • Length of one frame = 1000 ms / frames per second ≈ 23 ms
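The framing arithmetic above can be checked with a short sketch. This is an illustration in Python rather than the team's Matlab code; the function names are hypothetical, and the hop of 512 is taken here to mean 512 samples (half a frame), which is an assumption since the slide states the step in bytes.

```python
def frame_signal(samples, frame_len=1024, hop=512):
    """Split a sample sequence into overlapping frames.

    frame_len=1024 follows the slide. hop=512 *samples* is an assumption:
    the slide gives the sliding step as 512 bytes.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def frame_duration_ms(sample_rate=44100, frame_len=1024):
    """Duration of one frame in milliseconds."""
    frames_per_second = sample_rate / frame_len   # about 43 frames per second
    return 1000.0 / frames_per_second             # about 23 ms


# One second of silence as dummy input (hypothetical data).
one_second = [0.0] * 44100
frames = frame_signal(one_second)
print(len(frames), "complete frames of", len(frames[0]), "samples")
print("frame duration (ms):", round(frame_duration_ms(), 2))
```

With a half-frame hop, every sample except those at the very start and end of the stream falls into two frames, which is what maintains the overlap.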

Page 6

Voice Sample Spectrogram using Matlab

• Plot represents the component frequencies of a frame after applying the FFT

• Frequency vs. time data

Voiceprint Systems CS692 2013 Spring Batch 6

Represents the complete spectral data available for processing

Spectrogram constructed from the above values
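As a rough sketch of how a spectrogram can be built from framed audio, the Python code below computes the magnitude spectrum of each frame with a small radix-2 FFT. It is a stand-in for Matlab's FFT routines, not the team's code; the function names and the test tone are assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency bins of one frame."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[:len(frame) // 2 + 1]]


def spectrogram(frames):
    """One magnitude spectrum per frame: a time-by-frequency grid."""
    return [magnitude_spectrum(f) for f in frames]


sample_rate = 44100
frame_len = 1024
# Hypothetical test tone placed exactly on bin 10 so the peak is unambiguous.
tone_hz = 10 * sample_rate / frame_len
frame = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
         for n in range(frame_len)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print("peak bin:", peak_bin, "frequency (Hz):", peak_bin * sample_rate / frame_len)
```

Each bin spans sample_rate / frame_len, about 43 Hz for the settings on the previous slide, so frequency resolution and frame length trade off against each other.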

Page 7

Mel-frequency bands space the filters appropriately

The mel scale corresponds to the frequency transform performed by the cochlea of the human ear.

The mel filters are shown below; the 13 lower bands are used for processing.
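A minimal sketch of mel-spaced triangular filters follows. It assumes the widely used conversion mel = 2595 · log10(1 + f/700); the slides do not say which mel formula, how many filters in total, or what frequency range the team used, so the total of 26 filters over 0 to 22050 Hz is purely hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (assumed; not confirmed by the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_filters, f_min, f_max):
    """Edge frequencies in Hz, equally spaced on the mel scale.

    Returns num_filters + 2 points; filter k spans edges[k] .. edges[k + 2]
    with its peak at edges[k + 1].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]


def triangular_weight(f_hz, left, center, right):
    """Weight of frequency f_hz under one triangular filter."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0


# Hypothetical configuration: 26 filters up to the Nyquist frequency,
# keeping only the 13 lowest bands as the slide describes.
edges = mel_band_edges(26, 0.0, 22050.0)
lower_13 = [(edges[k], edges[k + 1], edges[k + 2]) for k in range(13)]
for left, center, right in lower_13[:3]:
    print(round(left), round(center), round(right))
```

Because the edges are evenly spaced in mel, the filters are narrow at low frequencies and widen toward high frequencies, which is the spacing the slide title refers to.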

Page 8

Segmenting “My Name Is”

• Speech waveform indicating the voiced and unvoiced segments

• Energy vs. zero-crossing rate plotted for the same speech sample

• Unvoiced segments show a high zero-crossing rate (red) and low energy (green)

• Voiced segments show a low zero-crossing rate and high energy


• The higher-frequency components of the ‘z’ sound have higher energy than those of the other phonemes

• Diagram shows the automatically marked spectrum in Matlab

• Vertical lines demarcate the beginning of speech and the end of the ‘z’ sound
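The energy and zero-crossing criteria above can be sketched per frame as follows. The thresholds, function names, and test signals are hypothetical; the slides do not give the values the team used, and in practice thresholds would be tuned on recorded speech.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, energy_threshold, zcr_threshold):
    """Voiced: high energy and low zero-crossing rate (thresholds assumed)."""
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)


ENERGY_THRESHOLD = 0.01   # hypothetical value
ZCR_THRESHOLD = 0.1       # hypothetical value

# A 150 Hz sine stands in for a voiced frame; low-level noise for an unvoiced one.
voiced_like = [0.5 * math.sin(2 * math.pi * 150 * n / 44100) for n in range(1024)]
rng = random.Random(0)
noise_like = [rng.uniform(-0.01, 0.01) for _ in range(1024)]

for name, frame in (("voiced-like", voiced_like), ("noise-like", noise_like)):
    print(name, is_voiced(frame, ENERGY_THRESHOLD, ZCR_THRESHOLD))
```

Runs of consecutive frames with the same label give candidate segment boundaries, such as the start of speech.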

Page 9

Seven sound units of “My name is”


Page 10

Dynamic Time Warping (DTW) Algorithm Segments a Sample into Seven Sounds

• DTW operates on spectrographic data: amplitude × frequency × time

• To segment a speech sample into the seven sound units, the sample’s time sequence is "warped" non-linearly against a manually segmented sample.


The sample warp path figure shows the cost matrix and the warp path for the two time series represented along the axes.

If the warp path passes through D(i, j), then the sample Xi is warped to the point Yj. If there is a vertical section in the warp path, a single point in series X is warped to multiple points of series Y.

The decision to find the next point in the warp W(i, j) is:
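A commonly used form of this decision, assumed here for illustration and not necessarily the exact rule the team used, is the recurrence D(i, j) = d(Xi, Yj) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)), where d is a local distance. The sketch below implements that recurrence in Python, recovers the warp path by backtracking, and transfers segment boundaries from a manually segmented reference to a new sample. The scalar sequences and the function names are hypothetical; in the described system each time step would be a spectral vector and d a vector distance.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return (total cost, warp path) aligning sequence x to sequence y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j]: minimum accumulated cost of aligning x[:i+1] with y[:j+1].
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = dist(x[i], y[j])
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            D[i][j] = cost + best_prev

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((D[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[n - 1][m - 1], path


def transfer_boundaries(path, ref_boundaries):
    """Map each boundary index in the reference (y) to the first aligned
    index in the sample (x)."""
    first_x_for_y = {}
    for i, j in path:
        first_x_for_y.setdefault(j, i)
    return [first_x_for_y[b] for b in ref_boundaries]


# Hypothetical data: three "sounds" (0, 1, 2) spoken at different paces.
reference = [0, 0, 1, 1, 2, 2]        # manually segmented; sounds start at 0, 2, 4
sample = [0, 0, 0, 1, 2, 2, 2]
total_cost, warp_path = dtw(sample, reference)
sample_boundaries = transfer_boundaries(warp_path, [0, 2, 4])
print(total_cost, sample_boundaries)
```

The same idea extends to the seven sound units: the reference carries seven manually marked boundaries, and each new utterance inherits them through its warp path.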

Page 11

Feature Extraction

• Feature measurements reduce the data and characterize the speaker

• The features extracted:
  • Energy mean and variance in each frequency band over the entire utterance (~13*2 = 26 features)
  • Energy mean in each frequency band within each of the 7 phonetic sounds (~13*7 = 91 features)
  • Voice Fundamental Frequency (F0) – not completed
  • Voice Formant Frequencies (F1-F3) – not completed

• Feature extractor output is a fixed-length vector appropriate as input to the Pace University Biometric Authentication System

Note: 13 is the number of frequency bands
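The two completed feature groups can be sketched as below, assuming each frame is already reduced to 13 band energies and the seven sound boundaries are known. The function names, the use of population variance, and the toy data are assumptions for illustration, not details from the slides.

```python
from statistics import fmean, pvariance

NUM_BANDS = 13    # frequency bands, per the slide
NUM_SOUNDS = 7    # sound units in "My name is"


def band_column(frames, band):
    """Energy values of one band across a list of frames."""
    return [frame[band] for frame in frames]


def utterance_features(frames):
    """Mean and variance of each band over the whole utterance (13 * 2 = 26)."""
    feats = []
    for band in range(NUM_BANDS):
        column = band_column(frames, band)
        feats.append(fmean(column))
        feats.append(pvariance(column))   # population variance: an assumption
    return feats


def segment_features(frames, boundaries):
    """Mean of each band within each sound (13 * 7 = 91).

    boundaries holds NUM_SOUNDS + 1 frame indices delimiting the sounds.
    """
    feats = []
    for s in range(NUM_SOUNDS):
        segment = frames[boundaries[s]:boundaries[s + 1]]
        for band in range(NUM_BANDS):
            feats.append(fmean(band_column(segment, band)))
    return feats


# Hypothetical toy utterance: 14 frames, two frames per sound.
frames = [[float(t + b) for b in range(NUM_BANDS)] for t in range(14)]
boundaries = [0, 2, 4, 6, 8, 10, 12, 14]
whole = utterance_features(frames)
per_sound = segment_features(frames, boundaries)
feature_vector = whole + per_sound
print(len(whole), len(per_sound), len(feature_vector))
```

Averaging within segments is what makes the vector fixed-length regardless of how long each utterance lasts.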

Page 12

System Performance

Feature Set                     Performance
Features from entire phrase     98.05%
Features from seven sounds      98.95%

• Performance was measured on 20 sample utterances from each of 30 speakers, manually segmented into the seven sounds.

• Receiver Operating Characteristic (ROC) curves were obtained to find the Equal Error Rate (EER) and the system performance for the two feature sets.
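For readers unfamiliar with the EER, the sketch below shows one simple way to estimate it from match scores by sweeping a decision threshold. It is not the Pace University Biometric Authentication System's procedure; the score lists are invented, and the convention that a higher score means a better match is an assumption.

```python
def far_frr(genuine, impostor, threshold):
    """False accept and false reject rates at one threshold.

    A claim is accepted when score >= threshold (assumed convention).
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed scores as thresholds and return (EER, threshold)
    at the point where FAR and FRR are closest."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Invented scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print("EER:", eer, "at threshold", eer_threshold)
```

Each threshold in the sweep corresponds to one point on the ROC curve; the EER is the point where the two error rates cross.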