isscs2011

Environmental Sound Recognition with CELP-based Features
EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
Dept. of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564
http://viola.usc.edu/

Upload: sphinx-tsau

Posted on 14-Jun-2015

TRANSCRIPT


2. Outline
• Environmental Sound Recognition (ESR) and its challenges
• Conventional audio features
• Motivation and proposed solution
• Experimental results
• Conclusion and future work

3. Environmental Sound Recognition (ESR)
• Environmental sound: restaurants, streets, parks, airports and train stations, hallways, etc.
• Environmental Sound Recognition (ESR)
  – Uses audio information to assist activities; audio is easy to store and process
  – Robotic navigation and human-computer interaction
  – Avoids camera problems such as poor lighting and viewing angle
  – Other applications: surveillance, search and rescue
• Challenges of ESR
  – Similar sounds
  – Multiple generating sources
  – Noise
  – Unlike speech and music, environmental sounds are unstructured, so they are difficult to model

4. Conventional Audio Features
• Conventional features: MFCCs, MFCC derivatives, sub-band energy, fundamental frequency, LPCCs, energy, zero-crossing, spectral centroid, bandwidth, and matching pursuit (MP)
• Problems with conventional features
  – MFCCs describe the shape of the overall spectrum, work well only for structured sounds such as speech and music, and degrade in the presence of noise
  – MP works relatively well for both structured and unstructured sounds, but requires significant computation

5. Motivation for CELP-based Features
• Comparison of CELP with MFCCs and MP on the following criteria:
  – Preserves data: features can be mapped back to data (reversible)
  – Easy implementation: ITU-T G.723.1
  – Low complexity: real time
  – Compact features
  – Classification rate: ESR
  – Different applications: speech, music
  – Potential side benefits: mobile applications (5.3/6.3 kbps), fixed-point implementation
• Bit streams → information → features

6. Code Excited Linear Prediction (CELP)
M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp.
937–940, 1985.
• Analysis-by-synthesis
• Linear prediction
  – Short-term prediction (STP): linear prediction coefficients
  – Long-term prediction (LTP): pitch T
  – Residual description

7. Proposed CELP Features
• 240 samples/frame; 4 subframes/frame
• CELP features available from the bit stream:
  – LPC: 10th-order linear prediction coefficients, or LSF (line spectral frequencies)
  – Pitch lag: open loop and closed loop, 20 ≤ p ≤ 147
  – GAIN: gains of the two excitations (5-tap pitch filter and fixed-codebook pulses)
  – POS: location and sign of the fixed-codebook pulses

8. Proposed Solution
• Features: CELP (11 dimensions) and MFCC (21 dimensions, full bank)
• Processing chain: data preprocessing (normalization, cleaning) → feature extraction → classification (Bayesian network)

9. Experimental Setup and Results
• 10 classes
  – Transportation (3): airplane, motorcycle, train
  – Weather (4): rain, thunder, wind, stream
  – Rural areas (2): bird, insect
  – Indoor (1): restaurant
• Feature extraction: modified ITU-T G.723.1 standard code
• Classifier: Bayesian network

10. Comparison of Features
Classification rate (%) per class (Motor = motorcycle, Rest. = restaurant):

Feature       Airplane Bird     Insect   Motor    Rain     Rest.    Stream   Thunder  Train    Wind     Overall
PITCH         77.8     28.8     1.1      27.1     1.2      62.6     10.5     0        29.1     21.2     26.8
GAIN          66.3     8.5      44       18.5     32       8.3      8.3      2.4      15.9     11.5     22.2
LPC           85.4     96.3     99.6     89.8     99.1     63.7     98       77       74.1     98.5     88.5
CELP+GAIN     88.7     96.8     99.6     90.4     99       77.8     97.6     79.5     81.6     98.7     91
CELP+GAIN+POS 92.6     99.5     98.7     73.7     96.3     55.9     96       30       61       93       81.3
MFCC          87.8     90       95.8     86.2     76.8     69.4     77       43.2     86.9     100      82.5
CELP          88.4     96.8     99.6     90.4     99       77.9     97.7     78.8     81.3     98.7     91.2
CELP+MFCC     92.3     97.7     99.5     95.5     99       87.5     98.7     85.4     93.4     99.9     95.2

[Figure: Comparison of Features, classification rate (%) per class for MFCC, CELP, and CELP+MFCC]
Slide annotation: short-term and long-term prediction; speech-like.

11. Confusion Matrix of CELP Features
Classification rate (%). Each row lists its nonzero entries from left to right over the columns Airplane, Bird, Insect, Motor, Rain, Rest., Stream, Thunder, Train, Wind; empty cells are omitted and the diagonal (correct-classification) entry is in brackets.
Airplane: [88.4] 1.9 0.2 5.1 4.4
Bird: [96.8] 0.1 1.6 0.3 0.2 1.1
Insect: [99.6] 0.4
Motor: 0.1 [90.4] 5.7 0.3 3.5
Rain: [99] 0.3 0.4 0.1
Rest.: 1 2.2 8.1 0.1 [77.9] 1.4 2.6 6.8 0.1
Stream: 0.2 0.3 1 [97.7] 0.2 0.5
Thunder: 1.9 0.6 0.1 3 0.3 7.5 3.8 [78.8] 3.4 0.7
Train: 5.1 0.7 5 0.1 7.1 0.1 0.7 [81.3]
Wind: 1.3 [98.7]

12. Principal Component Analysis
[Figure: Principal Component Analysis, classification rate (%) versus number of dimensions (1 to 31) for CELP, MFCC, and MFCC+CELP]

13. Speed and Complexity
• Speed
  – Feature extraction: real time
  – Classification training: depends on the classifier/kernel
  – Classification testing: fast; the cost is negligible
• Average run time

Feature      Training (s)  Testing (s)
CELP         659           8
MFCC         672           9
CELP+MFCC    912           10

14. Summary of ESR Topic: Conclusion
• A novel set of CELP-based features is proposed by exploiting the information in the CELP bit stream
• MFCCs, which represent filter-bank energies, are not well suited to ESR
• With a Bayesian network classifier, CELP and CELP+MFCC outperform MFCC in ESR by a margin of roughly 10 percentage points (overall rates of 91.2% and 95.2% versus 82.5%)
• Long- and short-term prediction are more robust to background noise
• CELP offers low complexity, easy implementation, and extensibility
• Recognition based on CELP features is attractive because the additional effort required for feature extraction is almost negligible

15. Conclusion (same content as slide 14)
16. Future Work
• Explore more features
• Speaker recognition and identification
• Longer-term signature capture

17. Q&A
Thanks!
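Slides 6 and 7 rest on short-term prediction: each 240-sample frame is summarized by 10th-order linear prediction coefficients. As a rough illustration only, the sketch below computes LPCs with the autocorrelation method and the Levinson-Durbin recursion. It is not the G.723.1 reference code: windowing, bandwidth expansion, and LSF conversion/quantization are omitted, the helper names `autocorrelation` and `levinson_durbin` are hypothetical, and the sign convention x̂[n] = Σ a_k x[n−k] is an assumption.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n frame[n] * frame[n - k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Assumed convention: x[n] is predicted as sum_{k=1..order} coeffs[k-1] * x[n-k].
    Returns (coeffs, final prediction-error energy).
    """
    if r[0] == 0:
        return [0.0] * order, 0.0          # silent frame: nothing to predict
    a = [0.0] * (order + 1)                 # a[0] unused; a[1..order] are predictor taps
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                           # perfectly predicted; higher taps stay zero
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k                  # prediction error shrinks at every stage
    return a[1:], err


# Example: one 240-sample frame and a 10th-order model, as on slide 7
FRAME_LEN, ORDER = 240, 10
frame = [0.9 ** n for n in range(FRAME_LEN)]   # toy first-order (AR(1)-like) signal
coeffs, residual_energy = levinson_durbin(autocorrelation(frame, ORDER), ORDER)
```

For this toy frame the first coefficient comes out at about 0.9 and the other nine are essentially zero, as expected for a first-order process.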
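Slide 7 lists an open-loop pitch lag p with 20 ≤ p ≤ 147. G.723.1 defines its own open-loop search, which this sketch does not reproduce; the simplified stand-in below just picks the lag with the highest normalized correlation between the current samples and their lagged history. The function name `open_loop_pitch` and the plain normalized-correlation criterion are assumptions.

```python
import math


def open_loop_pitch(x, min_lag=20, max_lag=147):
    """Return the lag in [min_lag, max_lag] maximizing normalized correlation.

    x must contain more than max_lag samples: the first max_lag samples serve
    as history, and the correlation is measured over the remainder.
    """
    if len(x) <= max_lag:
        raise ValueError("signal must be longer than max_lag")
    current = range(max_lag, len(x))
    energy_now = sum(x[n] * x[n] for n in current)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        cross = sum(x[n] * x[n - lag] for n in current)
        energy_past = sum(x[n - lag] * x[n - lag] for n in current)
        if energy_now == 0 or energy_past == 0:
            continue                        # silent segment: lag undefined
        score = cross / math.sqrt(energy_now * energy_past)
        if score > best_score:              # strict '>' keeps the shorter lag on ties
            best_lag, best_score = lag, score
    return best_lag


# Example: a sawtooth with a 90-sample period, plus 147 samples of history
signal = [(n % 90) / 90 - 0.5 for n in range(147 + 240)]
lag = open_loop_pitch(signal)
```

Lag 90 is the only lag in range at which this sawtooth repeats exactly, so it is the one selected. A pure maximum-correlation rule can also lock onto multiples of the true period when they fall inside the search range, which is one reason real coders add extra logic.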
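Slide 8 combines an 11-dimensional CELP vector with a 21-dimensional MFCC vector after a normalization step, but the slides do not say which normalization is used. The sketch below assumes per-dimension z-scoring with statistics estimated on training data only, applied to the concatenated 32-dimensional CELP+MFCC vector. All function names are hypothetical.

```python
import statistics

CELP_DIM, MFCC_DIM = 11, 21   # dimensions given on slide 8


def concat_features(celp, mfcc):
    """Join one frame's CELP and MFCC vectors into a single CELP+MFCC vector."""
    if len(celp) != CELP_DIM or len(mfcc) != MFCC_DIM:
        raise ValueError("unexpected feature dimension")
    return list(celp) + list(mfcc)


def fit_zscore(rows):
    """Per-dimension mean and standard deviation, estimated on training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant dimension has zero spread; divide by 1 instead of 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Normalize rows (training or test) with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Example: two toy training frames
train = [concat_features([1.0] * CELP_DIM, [0.0] * MFCC_DIM),
         concat_features([3.0] * CELP_DIM, [0.0] * MFCC_DIM)]
means, stds = fit_zscore(train)
normalized = apply_zscore(train, means, stds)
```

Because per-dimension z-scoring treats each column independently, normalizing before or after concatenation gives the same result; fitting the statistics on training data only keeps test frames from leaking into the normalization.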
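Slides 8 and 9 use a Bayesian network classifier without specifying its structure. The simplest Bayesian network for classification is naive Bayes, in which the class node is the only parent of every feature. The sketch below swaps in a Gaussian naive Bayes model purely for illustration, so it says nothing about the accuracies on slide 10. The class name `GaussianNaiveBayes` and the variance floor are assumptions.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes: the class is the sole parent of every (Gaussian) feature."""

    def __init__(self, var_floor=1e-6):
        self.var_floor = var_floor      # keeps variances strictly positive
        self.params = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        total = len(y)
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), self.var_floor)
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / total), means, variances)
        return self

    def log_joint(self, row, label):
        """log P(label) + sum_i log N(row[i]; mean_i, var_i)."""
        log_prior, means, variances = self.params[label]
        score = log_prior
        for v, m, var in zip(row, means, variances):
            score -= 0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)
        return score

    def predict(self, X):
        # Pick the label with the largest log joint probability for each row
        return [max(self.params, key=lambda label: self.log_joint(row, label))
                for row in X]


# Example with two toy classes and 2-D features
X_train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
y_train = ["rain", "rain", "wind", "wind"]
model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[0.05, 0.1], [5.1, 5.0]])
```

Working in log probabilities avoids numerical underflow when many feature dimensions (32 for CELP+MFCC) are multiplied together.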
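The per-class rates on slides 10 and 11 are the diagonal of a row-normalized confusion matrix (each row of the slide 11 matrix sums to about 100%). The sketch below assumes rows index the true class and columns the predicted class, and computes per-class rates plus an overall rate defined as total correct over total tested. That overall rate weights classes by their number of test samples: the unweighted mean of the CELP per-class rates on slide 10 is about 90.9%, slightly below the reported 91.2%, which suggests (but does not confirm) a sample-weighted overall figure. Function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """counts[i][j] = number of samples of true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def per_class_rates(counts):
    """Diagonal of the row-normalized matrix, in percent (one rate per true class)."""
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(counts)]


def overall_rate(counts):
    """Total correct over total tested, in percent (weights classes by sample count)."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return 100.0 * correct / total


# Example with two classes
labels = ["bird", "insect"]
counts = confusion_matrix(["bird", "bird", "insect", "insect"],
                          ["bird", "insect", "insect", "insect"], labels)
```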
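Slide 12 plots classification rate against the number of principal components kept. The slides do not describe the PCA implementation; the dependency-free sketch below (a hypothetical `pca` helper) estimates the leading eigenvectors of the sample covariance matrix by power iteration with deflation, then projects the centered features onto them. A production system would more likely call a linear-algebra library's eigendecomposition or SVD.

```python
import math
import random


def pca(rows, k, iters=200, seed=0):
    """Top-k principal components via power iteration with deflation.

    Returns (components, projections); fewer than k components are returned
    if the remaining variance drops to zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        exhausted = False
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                exhausted = True
                break
            v = [x / norm for x in w]
        if exhausted:
            break                            # no variance left to explain
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction before searching for the next one
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    projections = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                   for row in centered]
    return components, projections


# Example: 2-D points lying exactly on the line y = 2x
points = [[t, 2 * t] for t in range(5)]
components, projections = pca(points, 1)
```

Keeping only the first k projections of each frame gives the reduced feature vectors; sweeping k upward and re-running the classifier is one way to produce a curve like the one on slide 12.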