An Approach to the Computer Recognition of Single-Syllable English Words



78TH MEETING • ACOUSTICAL SOCIETY OF AMERICA

"structure" being within an infinitesimal paraxial region; H2 approaches cylindrically collimated plane waves with amplitude proportional to z. Although both results are often assumed in practice, neither holds accurately below frequencies so high that absorption must be overwhelming. [Supported in part by U.S. Office of Naval Research.]

4:30

5J11. Computerized Study of Finite-Amplitude Traveling Waves Confined to a Duct. ALAN B. COPPENS, Department of Physics, Naval Postgraduate School, Monterey, California 93940.--Manipulation of an infinite set of coupled nonlinear differential equations representing a one-dimensional model of finite-amplitude acoustic processes occurring in a rigid-walled duct (wherein boundary-layer effects at the duct walls are the dominant loss mechanism) yields a form amenable to a Runge-Kutta iteration method performed on a computer. The results provide practical evaluation of the limitations of existing perturbation solutions and allow extension of prediction of harmonic distortion and other nonlinear effects from the regime $\alpha_1 x < 1$, $Mb/\delta_1 \ll 1$ to the regime $0 \le \alpha_1 x < \infty$, $Mb/\delta_1 \lesssim 1$, where $\alpha_1 = \delta_1 \omega / c_0$ represents the attenuation constant of the fundamental frequency component of the waveform in the linear (infinitesimal-amplitude) limit, $x$ the distance from the source, $M$ the peak Mach number of the source, $\omega$ the angular frequency of the source, $b = 1 + \tfrac{1}{2} B/A$, $B/A$ the parameter of nonlinearity of the fluid, and $c_0$ the phase speed in the limit $b = \alpha_1 = 0$. Waveform predictions have been made and appear to be consistent with experimental observations of other workers.

4:45

5J12. On the Effects of Viscosity on Sound Propagating in a Moving Fluid through Cylindrical Ducts. G. M. RENTZEPIS, School of Engineering Science and Mechanics, Georgia Institute of Technology, Atlanta, Georgia 30328.--In this paper, the effects of the presence of viscosity in a fluid flowing through a cylindrical duct on the sound propagation and attenuation are discussed. The field equations are obtained from the linearized variational equations of the conservation equations. Solutions of the viscous case for various flow profiles are obtained and the results are compared with the inviscid and the stationary cases. The effects of shear flow coupled with viscosity are discussed and compared with the case of flow bounded by two infinite planes.

WEDNESDAY, 5 NOVEMBER 1969 COTILLION ROOM 1, 2:00 P.M.

Session 5K. Speech Processing

LAWRENCE R. RABINER, Chairman

Contributed Papers (12 minutes)

2:00

5K1. Simulation of the Measurement Phase of an Automatic Speaker Recognition System. JARED J. WOLF, Department of Electrical Engineering and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.--This paper describes an investigation of an efficient approach to acoustic measurements (feature extraction) for speaker recognition motivated by known relations between acoustic output and vocal tract shapes and gestures. Rather than make general measurements over the extent of an utterance, only significant features of certain selected segments are used. In this study, these segments were located manually. A speaker identification experiment was performed, using seventeen measurements such as fundamental frequency and features of vowel and nasal consonant spectra. A computationally simple linear classification procedure was used, and the test data was kept independent of the design data. No errors were made in identification of the speaker for 210 test "utterances" by 21 adult male speakers. Procedures for evaluating speaker-separating ability of individual measurements and intermeasurement dependence are described. [This research was supported in part by the National Science Foundation, Air Force Cambridge Research Laboratories, and the National Institutes of Health.]

2:15

5K2. Calculation of Classification Functions Using a Large Number of Spoken Word Samples. SEIBI CHIBA AND HIROAKI SAKOE, Central Research Laboratories, Nippon Electric Company, Kawasaki, Japan.--A two-phase procedure for calculation of multiple linear classification functions has been developed. Phase I approximates the distribution of samples in the measurement space by a series of linear functions and preselects only those samples lying in the critical region. Phase II uses the reduced number of samples to calculate, by a linear programming algorithm, the multiple linear classification functions that strictly separate the given samples. The procedure was applied to the problem of spoken-word recognition. About 3300 samples of Japanese spoken digits, uttered by 55 male speakers and represented as 56-dimensional measurement vectors by a vocoder-type frequency analyzer, were collected. During Phase I, the number was reduced to 16 linear functions on the average. Precise classification functions were then calculated for each digit in Phase II. Most of the digits were separated by a single linear function while others were separated by two functions. An error rate of less than 0.2% was achieved for 550 samples newly uttered by the same 55 speakers.

2:30

5K3. An Approach to the Computer Recognition of Single-Syllable English Words. MARK MEDRESS, Department of Electrical Engineering and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.*--A method has been developed to abstract distinctive features information about vowels, stops, and fricatives from a filter bank representation of single-syllable, single-morpheme English words. The procedure is organized with respect to and exploits the phonological structure and rules for such words, and no reference to a lexicon is made. Because the program relies primarily on gross spectral properties, it can handle a variety of male speakers with no alterations required. Location, classification, and false alarm scores of better than 92%, 91%, and 7%, respectively, were obtained for repetitions of a list of 105 such words by seven speakers. Based on this experience, 24 words were carefully chosen and subjected to the recognition procedure. An average word recognition score of 85% was obtained for eight speakers.

The Journal of the Acoustical Society of America 83


Recognition was significantly improved if lexical information was used in addition to acoustic and phonological information. [Work supported in part by grants from the National Institutes of Health and the Air Force Cambridge Research Laboratories.]

* Present address: Univac, Univac Park, Saint Paul, Minn. 55101.

2:45

5K4. Speaker-Machine Interaction in a Limited Speech Recognition System. JOHN I. MAKHOUL, Department of Electrical Engineering and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.--The performance of a speaker-independent limited speech recognition system for adult male and female speakers is examined. In cases of incorrect recognition, on the basis of results, the speaker is instructed to change his articulation to effect correct recognition. Methods of changes in articulation and the effects they have on recognition performance are discussed. The vocabulary used consists of 55 /əCVd/ syllables, where C is one of six stop and five fricative consonants, and V is one of five tense vowels. The recognition depends on the extraction of several acoustic features. The positions of spectral energy concentrations are determined and their relative movements are tracked, especially along the CV transitions. [This work was supported in part by grants from the National Institutes of Health.]

3:00

5K5. Pitch Determination by Measurement of Harmonics. II. The Hipex System. RALPH L. MILLER, Bell Telephone Laboratories, Inc., Holmdel, New Jersey 07733.--Work on the method of pitch determination by measurement of harmonics described previously [J. Acoust. Soc. Amer. 44, 390 (1968)] has been continued to determine its performance characteristics in relation to system parameter changes. The resulting experimental arrangement has been termed the Hipex (Harmonic Identification Pitch Extraction) system. A statistical study of "period histograms" with a wide range of pitches gives an optimum adjustment for the peak detection process needed to identify the fundamental period. Percentage errors are used for comparative purposes in assessing the performance for different numbers of harmonic measurement channels as well as for different types of speech. Characteristic error distributions lead to simple logic for error elimination. Performance of the Hipex system in the presence of noise is unusually good. Usable pitch information has been obtained from speech in an approximately zero signal-to-noise level. The effectively coherent addition of harmonic signals in the period histogram regardless of their phase in the speech wave implies that the technique is approaching an optimum for determination of the fundamental period from the available information.

3:15

5K6. Reduction of Long-time Reverberation by a Center-Clipping Process. O. M. M. MITCHELL AND D. A. BERKLEY, Bell Telephone Laboratories, Inc., Holmdel, New Jersey 07733.--A center clipping process has been found to be effective in removing the reverberant tails of speech produced in a room with long reverberation time without significantly reducing intelligibility. The instantaneous output of a center clipper is zero unless the absolute value of the input exceeds a threshold value and otherwise varies linearly with the input. The input speech is divided into several channels by a set of contiguous band filters each less than 1 oct wide, and the output of each filter is independently center clipped. Harmonic distortions introduced by the center clippers are then removed by an output filter bank identical to the input set of filters. A six-channel system using 2/3-oct filters (250-3500 Hz) was simulated on a CDC 3300-EAI 8800 hybrid computer to process input speech recorded in an auditorium. Clipping levels were set so that the output of each center clipper was zero approximately 50% of the time. An output tape showing reduction of reverberation will be played.
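The center clipper described in 5K6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' hybrid-computer simulation: the function names are hypothetical, the linear region is assumed to be the input minus the threshold (the abstract says only that the output "varies linearly with the input"), and the clipping level is assumed to be chosen from the sample magnitudes so that about half of the output is zero. The input and output filter banks are not modeled.

```python
def center_clip(samples, threshold):
    """Return the center-clipped signal.

    Output is zero while |x| <= threshold; above the threshold it is
    assumed to follow the input linearly, offset by the threshold.
    """
    clipped = []
    for x in samples:
        if abs(x) <= threshold:
            clipped.append(0.0)
        elif x > 0:
            clipped.append(x - threshold)
        else:
            clipped.append(x + threshold)
    return clipped


def threshold_for_zero_fraction(samples, fraction=0.5):
    """Pick a clipping level so that roughly `fraction` of the output is zero.

    Hypothetical helper: takes the matching quantile of the magnitudes.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    magnitudes = sorted(abs(x) for x in samples)
    index = int(fraction * len(magnitudes)) - 1
    index = max(0, min(index, len(magnitudes) - 1))
    return magnitudes[index]


# One band-filter output (made-up values), clipped at the 50% level.
band = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
level = threshold_for_zero_fraction(band, 0.5)
clipped = center_clip(band, level)
```

In the system the abstract describes, this operation would be applied independently to each band-filter output, with a matching output filter bank removing the harmonic distortion the clipper introduces.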

3:30

5K7. Adaptive Delta Modulation of Speech with a One-Bit Memory. N. S. JAYANT, Bell Telephone Laboratories, Murray Hill, New Jersey 07974.--We define an adaptive delta modulator in which the adaptation of the step size $m_r$ at the $r$th sampling instant depends on the comparison of two channel symbols--the bits $C_r$ and $C_{r-1}$ corresponding to the $r$th and $(r-1)$th sampling instants. Specifically, the ratio of $m_r$ to the previous step size $m_{r-1}$ is $+P$ or $-Q$ depending on whether $C_r$ and $C_{r-1}$ are equal or not. We indicate the step response of the delta modulator and present results on the simulation of the adaptation logic on a speech sample that was band limited to 3.3 kHz. On the basis of signal-to-error ratios, we note that the equations $PQ = 1$ and $P = 1.5$ define optimal adaptation characteristics for sampling frequencies of 20, 40, and 60 kHz, and also that $PQ \le 1 + \epsilon$ ($\epsilon \ll 1$) represents a strong condition for stability. Finally, we comment on the relative performance of the adaptive delta modulator and of logarithmic PCM, and find that at a 60-kHz sampling rate, the delta modulator achieves the quality of 7-bit logarithmic PCM.
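The adaptation rule in 5K7 is compact enough to sketch in Python. The following is a minimal illustration, not the author's simulation: the function name, the initial estimate of zero, the initial step size, and the initial "previous bit" of +1 are all assumptions. Only the rule itself (signed step-size ratio +P when consecutive bits agree, -Q when they differ) and the values P = 1.5, PQ = 1 come from the abstract.

```python
def adaptive_delta_modulate(samples, p=1.5, q=1.0 / 1.5, initial_step=0.01):
    """Encode samples as +1/-1 bits with one-bit-memory step adaptation.

    Returns (bits, estimates), where estimates is the staircase
    approximation a decoder would rebuild from the bits alone.
    """
    estimate = 0.0       # assumed starting value of the staircase
    step = initial_step  # signed step size m_r; sign matches the last bit
    prev_bit = 1         # assumed initial channel symbol
    bits, estimates = [], []
    for x in samples:
        # Channel bit C_r: is the input above or below the estimate?
        bit = 1 if x >= estimate else -1
        # m_r / m_{r-1} is +P if C_r == C_{r-1}, otherwise -Q.
        step *= p if bit == prev_bit else -q
        estimate += step
        bits.append(bit)
        estimates.append(estimate)
        prev_bit = bit
    return bits, estimates


# Step response: a constant input of 1.0 starting from an estimate of 0.
bits, estimates = adaptive_delta_modulate([1.0] * 60)
```

With these assumed starting values, the step size grows geometrically while the estimate lags the input, and after the first overshoot the estimate hunts around the input level in a small repeating pattern.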

3:45

5K8. Optimum Linear Processing of Signals for Speech Transmission. HIROYA FUJISAKI AND KUNIHIKO NIWA, Engineering Research Institute, Faculty of Engineering, University of Tokyo, Bunkyo-ku, Tokyo, Japan.--Efficient utilization of a communication system requires that statistical properties of the signal be matched to characteristics of the over-all system noise. This can be achieved, although approximately, by an appropriate linear processing of the speech signal prior to modulation at the transmitter, and restoring the signal by another processing at the receiver. Based on measurements of signal characteristics and various performance criteria, optimum processings at the transmitter have been theoretically derived for conventional modulation systems including AM, PCM, FM, and PM. These systems have then been simulated and evaluated by articulation tests. The results indicate that the processing which maximizes the average rate of information transmission yields the highest articulation score, proving the validity of the information transmission rate as a theoretical criterion for system performance. The maximum improvement is equivalent to a gain in signal-to-noise level of about 9 dB in the case of FM. It has also been shown that the requirements on fidelity and intelligibility coincide in the determination of the optimum processing characteristic at the receiver.

4:00

5K9. Tracking of Articulatory Movements by Means of a Computer Controlled X-Ray Microbeam. O. FUJIMURA, H. ISHIDA, AND S. KIRITANI, Faculty of Medicine, University of Tokyo, Hongo, Tokyo, Japan.--A special x-ray generator produces a fine beam of x-rays that can be deflected with very fast response by computer control. A high sensitivity scintillation counter essentially counts the number of the outcoming x-ray photons and feeds the detected beam intensity into the computer. Small pellets are placed on selected parts of articulatory organs and their movements are tracked by this method, the result being displayed by an oscilloscope. The radiation dose is minimized by giving x-ray exposures under program control only when and where necessary for deriving immediately useful data. The first prototype system has been assembled and preliminary experiments are being performed for establishing the experimental method. Some computer programs for pellet tracking have been written.

84 Volume 47 Number 1 (Part 1) 1970
