9th Annual Conference of the
International Speech Communication Association 2008
(INTERSPEECH 2008)
Brisbane, Australia
22-26 September 2008
Volume 1 of 5
ISBN: 978-1-61567-378-0
TueSe2.01: Keynote 1: Hiroya Fujisaki - ISCA Medallist
Great Hall, Time 11:00-12:00, Tuesday 23rd September 2008 Chair: Isabel Trancoso
Page 1 Keynote 1
In Search of Models in Speech Communication Research
Hiroya Fujisaki, University of Tokyo, Japan
WedSe1.01: Keynote 2: Abeer Alwan
Great Hall, Time 08:30-09:30, Wednesday 24th September 2008 Chair: Anne Cutler
Keynote 2
Dealing with Limited and Noisy Data in ASR: A Hybrid Knowledge-Based and Statistical Approach
Abeer Alwan, University of California at Los Angeles, USA
ThuSe1.01: Keynote 3: Joaquin Gonzalez-Rodriguez
Great Hall, Time 08:30-09:30, Thursday 25th September 2008 Chair: Michael Wagner
Keynote 3
Forensic Automatic Speaker Recognition: Fiction or Science?
Joaquin Gonzalez-Rodriguez, Universidad Autonoma de Madrid, Spain
FriSe1.01: Keynote 4: Justine Cassell
Great Hall, Time 08:30-09:30, Friday 26th September 2008 Chair: Denis Burnham
Keynote 4
Modeling Rapport in Embodied Conversational Agents
Justine Cassell, Northwestern University, USA
TueSe3.01: Segmentation and Classification
Great Hall, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Helen M. Meng
Page 20 TueSe3.01-1 13:30-13:50
Page 24 TueSe3.01-2 13:50-14:10
Page 28 TueSe3.01-3 14:10-14:30
Page 32 TueSe3.01-4 14:30-14:50
Page 36 TueSe3.01-5 14:50-15:10
Page 40 TueSe3.01-6 15:10-15:30
Agglomerative Hierarchical Speaker Clustering Using Incremental Gaussian Mixture Cluster Modeling
Kyu J. Han, Shrikanth S. Narayanan, University of Southern California, USA
Weighted Segmental K-Means Initialization for SOM-Based Speaker Clustering
Oshry Ben-Harush1, Itshak Lapidot2, Hugo Guterman1
1Ben-Gurion University of the Negev, Israel; 2Sami Shamoon College of Engineering, Israel
Learning Essential Speaker Sub-Space Using Hetero-Associative Neural Networks for Speaker Clustering
Shajith Ikbal, Karthik Visweswariah, IBM India Research Lab, India
Two's a Crowd: Improving Speaker Diarization by Automatically Identifying and Excluding Overlapped Speech
Kofi Boakye, Oriol Vinyals, Gerald Friedland, ICSI, USA
T-Test Distance and Clustering Criterion for Speaker Diarization
Trung Hieu Nguyen1, Eng Siong Chng2, Haizhou Li1
1Institute for Infocomm Research, Singapore; 2Nanyang Technological University, Singapore
Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization
Deepu Vijayasenan, Fabio Valente, Hervé Bourlard, IDIAP Research Institute, Switzerland
TueSe3.02: Speech Coding
Plaza 1, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Julien Epps
Page 44 TueSe3.02-1 13:30-13:50
Page 45 TueSe3.02-2 13:50-14:10
Page 49 TueSe3.02-3 14:10-14:30
Page 53 TueSe3.02-4 14:30-14:50
Page 57 TueSe3.02-5 14:50-15:10
Page 61 TueSe3.02-6 15:10-15:30
Low Complexity Near-Optimal Unit-Selection Algorithm for Ultra Low Bit-Rate Speech Coding Based on N-Best Lattice and Viterbi Search
V. Ramasubramanian, D. Harish, Siemens Corporate Technology India, India
A New Fast Algebraic Fixed Codebook Search Algorithm in CELP Speech Coding
Vaclav Eksler1, Redwan Salami2, Milan Jelinek1
1University of Sherbrooke, Canada; 2VoiceAge Corporation, Canada
A Novel Transcoding Algorithm Between 3GPP AMR-NB (7.95kbit/s) and ITU-T G.729a (8kbit/s)
Hao Xu, Changchun Bao, Beijing University of Technology, China
Mel-Frequency Cepstral Coefficient-Based Bandwidth Extension of Narrowband Speech
Amr H. Nour-Eldin, Peter Kabal, McGill University, Canada
A PCM Coding Noise Reduction for ITU-T G.711.1
Jean-Luc Garcia, Claude Marro, Balazs Kövesi, Orange Labs, France
An Instrumental Measure for End-to-End Speech Transmission Quality Based on Perceptual Dimensions: Framework and Realization
Marcel Waltermann1, Kirstin Scholz2, Sebastian Moller1, Lu Huo2, Alexander Raake1, Ulrich Heute2
1Technische Universität Berlin, Germany; 2Christian-Albrechts-Universität zu Kiel, Germany
TueSe3.03: Human Conversation and Communication
Plaza 2, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Bernd Möbius
Page 65 TueSe3.03-1 13:30-13:50
Page 69 TueSe3.03-2 13:50-14:10
Page 70 TueSe3.03-3 14:10-14:30
Page 74 TueSe3.03-4 14:30-14:50
Page 78 TueSe3.03-5 14:50-15:10
Page 82 TueSe3.03-6 15:10-15:30
Duration and F0 Interval of Utterance-Final Intonation Contours in the Perception of German Sentence Modality
Benno Peters, Hartmut R. Pfitzinger, Christian-Albrechts-Universität zu Kiel, Germany
Contrastive Utterances Make Alternatives Salient - Cross-Modal Priming Evidence
Bettina Braun, Lara Tagliapietra, Anne Cutler, Max Planck Institute for Psycholinguistics, The Netherlands
Exploring a Mechanism of Speech Synchronization Using Auditory Delayed Experiments
Masato Ishizaki1, Yasuharu Den2, Senshi Fukashiro1
1University of Tokyo, Japan; 2Chiba University, Japan
Prosodic Manifestations of Confidence and Uncertainty in Spoken Language
Heather Pon-Barry, Harvard University, USA
Identifying Relevant Phrases to Summarize Decisions in Spoken Meetings
Raquel Fernandez1, Matthew Frampton1, John Dowding2, Anish Adukuzhiyil1, Patrick Ehlen1, Stanley Peters1
1Stanford University, USA; 2University of California at Santa Cruz, USA
Recovering Participant Identities in Meetings from a Probabilistic Description of Vocal Interaction
Kornel Laskowski1, Tanja Schultz2
1Universität Karlsruhe (TH), Germany; 2Carnegie Mellon University, USA
TueSe3.04: Special Session: OzPhon08 - Phonetics and Phonology of Australian Aboriginal Languages
Plaza 3&4, Time 13:30-15:35, Tuesday 23rd September 2008 Chair: Marija Tabain
Page 86 TueSe3.04-1 13:30-13:55
Page 90 TueSe3.04-2 13:55-14:20
Page 94 TueSe3.04-3 14:20-14:45
Page 95 TueSe3.04-4 14:45-15:10
Page 96 TueSe3.04-5 15:10-15:35
Coarticulation in Nasal and Lateral Clusters in Warlpiri
Janet Fletcher1, Deborah Loakes1, Andrew Butcher2
1University of Melbourne, Australia; 2Flinders University, Australia
Phonetically Prestopped Laterals in Australian Languages: A Preliminary Investigation of Warlpiri
Deborah Loakes1, Andrew Butcher2, Janet Fletcher1, Hywel Stoakes1
1University of Melbourne, Australia; 2Flinders University, Australia
Connected Speech Processes in Warlpiri
John Ingram, Mary Laughren, Jeff Chapman, University of Queensland, Australia
Consonant Enhancement in Lamalama, an Initial-Dropping Language of Cape York Peninsula, North Queensland
Christina Pentland, University of Queensland, Australia
Text, Rhythm and Metrical Form in an Aboriginal Song Series
Myfany Turpin, University of Queensland, Australia
TueSe3.P1: Acoustic Activity Detection, Pitch Tracking and Analysis
Mezzanine Level Area A1, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Timothy J. Hazen
Page 99 TueSe3.P1-1
Page 103 TueSe3.P1-2
Page 107 TueSe3.P1-3
Page 111 TueSe3.P1-4
Page 115 TueSe3.P1-5
Page 119 TueSe3.P1-6
Statistical Speech Activity Detection Based on Spatial Power Distribution for Analyses of Poster Presentations
Kentaro Ishizuka1, Shoko Araki1, Tatsuya Kawahara2
1NTT Corporation, Japan; 2Kyoto University, Japan
A Statistical Model-Based Voice Activity Detection Employing Minimum Classification Error Technique
Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang, Inha University, Korea
Comparative Evaluation of Different Methods for Voice Activity Detection
Fhmgfei Ding, Koichi Yamamoto, Masami Akamine, Toshiba Corporate R&D Center, Japan
Speech/Non-Speech Segments Detection Based on Chaotic and Prosodic Features
Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari, Amirkabir University of Technology, Iran
Acoustic Event Classification Using a Distributed Microphone Network with a GMM/SVM Combined Algorithm
Christian Zieger, Maurizio Omologo, FBK-irst, Italy
Intentional Voice Command Detection for Completely Hands-Free Speech Interface in Home Environments
Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi, Hitachi Ltd., Japan
TueSe3.P1 continued
Page 123 TueSe3.P1-7
Page 127 TueSe3.P1-8
Page 131 TueSe3.P1-9
Page 135 TueSe3.P1-10
Page 139 TueSe3.P1-11
Page 143 TueSe3.P1-12
Page 147 TueSe3.P1-13
Fusion of Audio and Video Modalities for Detection of Acoustic Events
Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton, Universitat Politecnica de Catalunya, Spain
DySANA: Dynamic Speech and Noise Adaptation for Voice Activity Detection
Ron J. Weiss1, Trausti Kristjansson2
1Columbia University, USA; 2Google Inc., USA
A Comprehensive Study on the Effects of Room Reverberation on Fundamental Frequency Estimation
Rico Petrick1, Masashi Unoki2, Anish Mittal3, Carlos Segura4, Ruediger Hoffmann1
1Technische Universität Dresden, Germany; 2JAIST, Japan; 3IIT Roorkee, India; 4Universitat Politecnica de Catalunya, Spain
A Hybrid Speech Signal Based Algorithm for Pitch Marking Using Finite State Machines
H. Hussein, M. Wolff, O. Jokisch, F. Duckhorn, G. Strecha, Ruediger Hoffmann, Technische Universität Dresden, Germany
Parameter Estimation Method of F0 Control Model for Singing Voices
Yasunori Ohishi1, Hirokazu Kameoka2, Kunio Kashino2, Kazuya Takeda1
1Nagoya University, Japan; 2NTT Corporation, Japan
An Algorithm for Multi-Pitch Tracking in Co-Channel Speech
Srikanth Vishnubhotla, Carol Y. Espy-Wilson, University of Maryland, USA
Multipitch Tracking Using a Factorial Hidden Markov Model
Michael Wohlmayr, Franz Pernkopf, Graz University of Technology, Austria
TueSe3.P1 continued
Page 151 TueSe3.P1-14
Page 155 TueSe3.P1-15
Cochannel Speech Separation Using Multi-Pitch Estimation and Model Based Voiced Sequential Grouping
Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan, Chinese Academy of Sciences, China
Crosscorrelation of Adjacent Spectra Enhances Fundamental Frequency Tracking
Philippe Martin, UFR Linguistique, France
TueSe3.P2: Single- and Multichannel Speech Enhancement I
Mezzanine Level Area A2, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Martin J. Russell
Page 159 TueSe3.P2-1
Page 163 TueSe3.P2-2
Page 167 TueSe3.P2-3
Page 171 TueSe3.P2-4
Page 175 TueSe3.P2-5
Page 179 TueSe3.P2-6
Enhancement of Noisy Speech Recordings via Blind Source Separation
Jiri Malek, Zbynek Koldovsky, Jindrich Zdansky, Jan Nouza, Technical University of Liberec, Czech Republic
Studies on Estimation of the Number of Sources in Blind Source Separation
Takaaki Ishibashi1, Hidetoshi Nakashima1, Hiromu Gotanda2
1Kumamoto National College of Technology, Japan; 2Kinki University, Japan
Speech Enhancement Based on Hypothesized Wiener Filtering
V. Ramasubramanian1, Deepak Vijaywargi2
1Siemens Corporate Technology India, India; 2University of Washington, USA
Psychoacoustically-Motivated Adaptive β-Order Generalized Spectral Subtraction Based on Data-Driven Optimization
Junfeng Li1, Hui Jiang2, Masato Akagi1
1JAIST, Japan; 2York University, Canada
Two Stage Iterative Wiener Filtering for Speech Enhancement
Krishna Nand K., T.V. Sreenivas, Indian Institute of Science, India
Assessment of Correlation Between Objective Measures and Speech Recognition Performance in the Evaluation of Speech Enhancement
Pei Ding, Jie Hao, Toshiba China R&D Center, China
TueSe3.P3: Spoken Language Systems I
Mezzanine Level Area B3, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Sebastian Moller
Page 183 TueSe3.P3-1
Page 187 TueSe3.P3-2
Page 191 TueSe3.P3-3
Page 195 TueSe3.P3-4
Page 199 TueSe3.P3-5
Page 203 TueSe3.P3-6
Page 207 TueSe3.P3-7
Predicting ASR Errors by Exploiting Barge-In Rate of Individual Users for Spoken Dialogue Systems
Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno, Kyoto University, Japan
Expanding Vocabulary for Recognizing User's Abbreviations of Proper Nouns Without Increasing ASR Error Rates in Spoken Dialogue Systems
Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Kyoto University, Japan
Exploiting the ASR N-Best by Tracking Multiple Dialog State Hypotheses
Jason D. Williams, AT&T Labs Research, USA
A Spoken Language Interpretation Component for a Robot Dialogue System
Enes Makalic, Ingrid Zukerman, Michael Niemann, Monash University, Australia
MUESLI: Multiple Utterance Error Correction for a Spoken Language Interface
Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt, SRI International, USA
Methods to Optimize Transcription of On-Line Media
Sarah Conrod1, Sara Basson2, Dimitri Kanevsky2
1Cape Breton University, Canada; 2IBM T.J. Watson Research Center, USA
Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems
Akinori Ito1, Toyomi Meguro1, Shozo Makino1, Motoyuki Suzuki2
1Tohoku University, Japan; 2University of Tokushima, Japan
TueSe3.P3 continued
Page 211 TueSe3.P3-8
Page 215 TueSe3.P3-9
Page 219 TueSe3.P3-10
Page 220 TueSe3.P3-11
Page 224 TueSe3.P3-12
Page 228 TueSe3.P3-13
Page 232 TueSe3.P3-14
Page 236 TueSe3.P3-15
Dialog Management Using Weighted Finite-State Transducers
Chiori Hori1, Kiyonori Ohtake1, Teruhisa Misu1, Hideki Kashioka1, Satoshi Nakamura2
1NICT, Japan; 2ATR-SLC, Japan
Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System
Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda, Nagoya Institute of Technology, Japan
Let's Go Lab: A Platform for Evaluation of Spoken Dialog Systems with Real World Users
Maxine Eskenazi, Alan W. Black, Antoine Raux, Brian Langner, Carnegie Mellon University, USA
The Impact of Language Dynamics on the Capitalization of Broadcast News
Fernando Batista1, Nuno Mamede1, Isabel Trancoso2
1INESC-ID/ISCTE, Portugal; 2INESC-ID/IST, Portugal
Lightly Supervised Acoustic Model Training on EPPS Recordings
Matthias Paulik, Alex Waibel, Universität Karlsruhe (TH), Germany
Fast Call-Classification System Development Without In-Domain Training Data
Christophe Servan, Frederic Bechet, LIA, France
iCNC and iROVER: The Limits of Improving System Combination with Classification?
Bjorn Hoffmeister, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany
System Combination for Spoken Language Understanding
Stefan Hahn, Patrick Lehnen, Hermann Ney, RWTH Aachen University, Germany
TueSe3.P4: Emotion and Expression I
Mezzanine Level Area B4, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Denis Burnham
Page 240 TueSe3.P4-1
Page 241 TueSe3.P4-2
Page 245 TueSe3.P4-3
Page 249 TueSe3.P4-4
Page 253 TueSe3.P4-5
Page 257 TueSe3.P4-6
Page 261 TueSe3.P4-7
Multidimensional Features of Emotional Speech
Tomoko Suzuki1, Machiko Ikemoto2, Tomoko Sano3, Toshihiko Kinoshita1
1Kansai Medical University, Japan; 2Doshisha University, Japan; 3Seibi Gakuen College, Japan
Leveraging Emotion Detection Using Emotions from Yes-No Answers
Narjes Boufaden, Pierre Dumouchel, CRIM, Canada
Vowel Placement During Operatic Singing: 'Come si Parla' or 'Aggiustamento'?
Thomas J. Millhouse, Dianna T. Kenny, University of Sydney, Australia
Study on Strained Rough Voice as a Conveyer of Rage
Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai, Matsushita Electric Industrial Co. Ltd., Japan
Integrating Rule and Template-Based Approaches for Emotional Malay Speech Synthesis
Mumtaz Begum1, Raja N. Ainon1, Roziati Zainuddin1, Zuraidah M. Don1, Gerry Knowles2
1University of Malaya, Malaysia; 2Miquest Worldwide Sdn. Bhd., Malaysia
The Expression and Perception of Emotions: Comparing Assessments of Self versus Others
Carlos Busso, Shrikanth S. Narayanan, University of Southern California, USA
On the Role of Acting Skills for the Collection of Simulated Emotional Speech
Emiel Krahmer, Marc Swerts, Tilburg University, The Netherlands
TueSe3.P4 continued
Detection of Security Related Affect and Behaviour in Passenger Transport
Björn Schuller1, Matthias Wimmer2, Dejan Arsic1, Tobias Moosmayr3, Gerhard Rigoll1
1Technische Universität München, Germany; 2Waseda University, Japan; 3BMW Group, Germany
TueSe4.01: Automatic Speech Recognition: Acoustic Models I
Great Hall, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Roger K. Moore
Page 269 TueSe4.01-1 16:00-16:20
Page 273 TueSe4.01-2 16:20-16:40
Page 277 TueSe4.01-3 16:40-17:00
Page 281 TueSe4.01-4 17:00-17:20
Page 285 TueSe4.01-5 17:20-17:40
Page 289 TueSe4.01-6 17:40-18:00
Soft Margin Estimation with Various Separation Levels for LVCSR
Jinyu Li1, Zhi-Jie Yan2, Chin-Hui Lee1, Ren-Hua Wang2
1Georgia Institute of Technology, USA; 2University of Science & Technology of China, China
On the Equivalence of Gaussian and Log-Linear HMMs
Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany
Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding
Dimitri Kanevsky1, Tara N. Sainath2, Bhuvana Ramabhadran1, David Nahamoo1
1IBM T.J. Watson Research Center, USA; 2MIT, USA
An Ellipsoid Constrained Quadratic Programming Perspective to Discriminative Training of HMMs
Peng Liu, Frank K. Soong, Microsoft Research Asia, China
Discriminative Training of Variable-Parameter HMMs for Noise Robust Speech Recognition
Dong Yu1, Li Deng1, Yifan Gong2, Alex Acero1
1Microsoft Research, USA; 2Microsoft Corporation, USA
Towards a Non-Parametric Acoustic Model: An Acoustic Decision Tree for Observation Probability Calculation
Jasha Droppo1, Michael L. Seltzer1, Alex Acero1, Yu-Hsiang Bosco Chiu2
1Microsoft Research, USA; 2Carnegie Mellon University, USA
TueSe4.02: Accent and Language Identification
Plaza 1, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Yuko Kinoshita
Page 293 TueSe4.02-1 16:00-16:20
Page 297 TueSe4.02-2 16:20-16:40
Page 301 TueSe4.02-3 16:40-17:00
Page 305 TueSe4.02-4 17:00-17:20
Page 309 TueSe4.02-5 17:20-17:40
Page 313 TueSe4.02-6 17:40-18:00
Experiments with the ABI (Accents of the British Isles) Speech Corpus
Shona D'Arcy1, Martin J. Russell2
1Trinity College Dublin, Ireland; 2University of Birmingham, UK
Politecnico di Torino System for the 2007 NIST Language Recognition Evaluation
Fabio Castaldo1, Emanuele Dalmasso1, Pietro Laface1, Daniele Colibro2, Claudia Vair2
1Politecnico di Torino, Italy; 2Loquendo, Italy
Discriminative Training and Channel Compensation for Acoustic Language Recognition
Valiantsina Hubeika, Lukáš Burget, Pavel Matějka, Petr Schwarz, Brno University of Technology, Czech Republic
Comparison of Variable Selection Methods and Classifiers for Native Accent Identification
Tingyao Wu, Peter Karsmakers, Hugo Van hamme, Dirk Van Compernolle, Katholieke Universiteit Leuven, Belgium
A Comparison of Subspace Feature-Domain Methods for Language Recognition
W.M. Campbell, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, MIT, USA
Context-Dependent Phone Models and Models Adaptation for Phonotactic Language Recognition
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel, LIMSI, France
TueSe4.03: Emotion and Expression II
Plaza 2, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Ailbhe Ní Chasaide
Page 317 TueSe4.03-1 16:00-16:20
Page 318 TueSe4.03-2 16:20-16:40
Page 322 TueSe4.03-3 16:40-17:00
Page 326 TueSe4.03-4 17:00-17:20
Page 330 TueSe4.03-5 17:20-17:40
Page 334 TueSe4.03-6 17:40-18:00
Emotions and Articulatory Precision
Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer, University of Geneva, Switzerland
Assessing Agreement of Observer- and Self-Annotations in Spontaneous Multimodal Emotion Data
Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen, TNO-D&V, The Netherlands
Emotion Recognition in Spontaneous Emotional Speech for Anonymity-Protected Voice Chat Systems
Yoshiko Arimoto1, Hiromi Kawatsu2, Sumio Ohno1, Hitoshi Iida1
1Tokyo University of Technology, Japan; 2NIJLA, Japan
Assigning Suitable Phrasal Tones and Pitch Accents by Sensing Affective Information from Text to Synthesize Human-Like Speech
Mostafa Al Masum Shaikh, Md. Khademul Islam Molla, Keikichi Hirose, University of Tokyo, Japan
Cross-Language Study of Vocal Correlates of Affective States
Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl, Trinity College Dublin, Ireland
Gender-Related Differences in the Production and Perception of Emotion
Marc Swerts, Emiel Krahmer, Tilburg University, The Netherlands
TueSe4.04: Special Session: PANZE 2008 - Phonetics and Phonology of Australian and New Zealand English
Plaza 3&4, Time 16:00-18:05, Tuesday 23rd September 2008 Chair: Felicity Cox
Page 338 TueSe4.04-1 16:00-16:25
Page 342 TueSe4.04-2 16:25-16:50
Page 346 TueSe4.04-3 16:50-17:15
Page 347 TueSe4.04-4 17:15-17:40
The English Pronunciation of Successive Groups of Maori Speakers
Catherine I. Watson1, Margaret Maclagan2, Jeanette King2, Ray Harlow3
1University of Auckland, New Zealand; 2University of Canterbury, New Zealand; 3University of Waikato, New Zealand
Reversal of Short Front Vowel Raising in Australian English
Felicity Cox, Sallyanne Palethorpe, Macquarie University, Australia
GOOSE on the Move: A Study of /u/-Fronting in Australian News Speech
Jennifer Price, Monash University, Australia
The Vowels of Australian Aboriginal English
Andrew Butcher1, Victoria Anderson2
1Flinders University, Australia; 2University of Hawaii at Manoa, USA
Page 351 TueSe4.04-5 17:40-18:05
Perception and Production of /i:/, /ia/ and /e:/ in Australian English
Robert H. Mannell, Macquarie University, Australia
TueSe4.Pl: Speaker Recognition and Diarisation
Mezzanine Level Area A1, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Robbie Vogt
Page 355 TueSe4.P1-1
Page 359 TueSe4.P1-2
Page 363 TueSe4.P1-3
Page 367 TueSe4.P1-4
Page 371 TueSe4.P1-5
Page 375 TueSe4.P1-6
Page 379 TueSe4.P1-7
Page 383 TueSe4.P1-8
An Expert System in Speaker Verification Task
Zbynek Zajic, Lukáš Machlica, Ales Padrta, Jan Vanek, Vlasta Radová, University of West Bohemia, Czech Republic
Cascading Appearance-Based Features for Visual Speaker Verification
David Dean, Sridha Sridharan, Patrick Lucey, Queensland University of Technology, Australia
Improved Novelty Detection for Online GMM Based Speaker Diarization
Konstantin Markov, Satoshi Nakamura, ATR-SLC, Japan
Analysis of Impostor Tests with High Scores in NIST-SRE Context
Salah Eddine Mezaache, Jean-Francois Bonastre, Driss Matrouf, LIA, France
Reinforced Temporal Structure Information for Embedded Utterance-Based Speaker Recognition
Anthony Larcher1, Jean-Francois Bonastre1, John S.D. Mason2
1LIA, France; 2Swansea University, UK
Fast Search for Common Segments in Speech Signals for Speaker Verification
Michael Gerber, Beat Pfister, ETH Zurich, Switzerland
Audio-Visual Multilevel Fusion for Speech and Speaker Recognition
Girija Chetty, Michael Wagner, University of Canberra, Australia
Clustering Initialization Based on Spatial Information for Speaker Diarization of Meetings
J. Luque, Carlos Segura, Javier Hernando, Universitat Politecnica de Catalunya, Spain
TueSe4.P2: Single- and Multichannel Speech Enhancement II
Mezzanine Level Area A2, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: John H.L. Hansen
Page 387 TueSe4.P2-1
Page 391 TueSe4.P2-2
Page 395 TueSe4.P2-3
Page 399 TueSe4.P2-4
Page 403 TueSe4.P2-5
Page 407 TueSe4.P2-6
Page 411 TueSe4.P2-7
Effect of Compressing the Dynamic Range of the Power Spectrum in Modulation Filtering Based Speech Enhancement
James G. Lyons, Kuldip K. Paliwal, Griffith University, Australia
A Long State Vector Kalman Filter for Speech Enhancement
Stephen So, Kuldip K. Paliwal, Griffith University, Australia
Subspace Based Speech Enhancement Using Gaussian Mixture Model
Achintya Kundu, Saikat Chatterjee, T.V. Sreenivas, Indian Institute of Science, India
Generalized Parametric Spectral Subtraction Using Weighted Euclidean Distortion
Amit Das, John H.L. Hansen, University of Texas at Dallas, USA
Sudden Noise Reduction Based on GMM with Noise Power Estimation
Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki, Kobe University, Japan
Speech Enhancement Using a Wiener Denoising Technique and Musical Noise Reduction
Md. Jahangir Alam1, Sid-Ahmed Selouani2, Douglas O'Shaughnessy1, Sofia Ben Jebara3
1Universite du Quebec, Canada; 2Universite du Moncton, Canada; 3Ecole Superieure des Communications de Tunis, Tunisia
Regularized Non-Negative Matrix Factorization with Temporal Dependencies for Speech Denoising
Kevin W. Wilson1, Bhiksha Raj1, Paris Smaragdis2
1Mitsubishi Electric Research Labs, USA; 2Adobe Systems, USA
Page 415 TueSe4.P2-8
Page 419 TueSe4.P2-9
Page 423 TueSe4.P2-10
Page 427 TueSe4.P2-11
Page 431 TueSe4.P2-12
Page 435 TueSe4.P2-13
TueSe4.P2 continued
ICA-Based MAP Speech Enhancement with Multiple Variable Speech Distribution Models
Xin Zou, Peter Jancovic, Munevver Kokuer, Martin J. Russell, University of Birmingham, UK
Source Separation Based on Binaural Cues and Source Model Constraints
Ron J. Weiss, Michael I. Mandel, Daniel P.W. Ellis, Columbia University, USA
Maximum Kurtosis Beamforming with the Generalized Sidelobe Canceller
Kenichi Kumatani1, John McDonough2, Barbara Rauch2, Philip N. Garner1, Weifeng Li1, John Dines1
1IDIAP Research Institute, Switzerland; 2Saarland University, Germany
Noise Robust Speech Dereverberation Using Constrained Inverse Filter
Ken'ichi Furuya1, Akitoshi Kataoka2, Yoichi Haneda1
1NTT Corporation, Japan; 2Ryukoku University, Japan
A Dual Microphone Coherence Based Method for Speech Enhancement in Headsets
Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad, Iran University of Science & Technology, Iran
Sound Capture System and Spatial Filter for Small Devices
Ivan Tashev1, Slavy Mihov2, Tyler Gleghorn3, Alex Acero1
1Microsoft Research, USA; 2Technical University of Sofia, Bulgaria; 3Microsoft Corporation, USA
Page 439 TueSe4.P2-14
An Effective Microphone Array Post-Filter in Arbitrary Environments
Ning Cheng, Wen-ju Liu, Peng Li, Bo Xu, Chinese Academy of Sciences, China
TueSe4.P2 continued
TueSe4.P2-15
Localization of Multiple Sound Sources Based on Inter-Channel Correlation Using a Distributed Microphone System
Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita, Ritsumeikan University, Japan
TueSe4.P2-16
A Frequency Domain Approach for Speech Enhancement with Directionality Using Compact Microphone Array
Heng Zhang, Qiang Fu, Yonghong Yan, Chinese Academy of Sciences, China
TueSe4.P3: Spoken Language Systems II
Mezzanine Level Area B3, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Jerome Bellegarda
Page 451 TueSe4.P3-1
Page 455 TueSe4.P3-2
Page 459 TueSe4.P3-3
Page 463 TueSe4.P3-4
Page 467 TueSe4.P3-5
Page 471 TueSe4.P3-6
Question and Answer Database Optimization Using Speech Recognition Results
Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano, NAIST, Japan
Development and Evaluation of Hands-Free Spoken Dialogue System for Railway Station Guidance
Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano, NAIST, Japan
Statistical Shared Plan-Based Dialog Management
Amanda J. Stent, Srinivas Bangalore, AT&T Labs Research, USA
When Calls Go Wrong: How to Detect Problematic Calls Based on Log-Files and Emotions?
Ota Herm1, Alexander Schmitt2, Jackson Liscombe3
1Czech Technical University in Prague, Czech Republic; 2University of Ulm, Germany; 3SpeechCycle Inc., USA
Unsupervised Learning of Edit Parameters for Matching Name Variants
Dan Gillick1, Dilek Hakkani-Tür2, Michael Levit2
1University of California at Berkeley, USA; 2ICSI, USA
Detection of Repetitions in Spontaneous Speech in Dialogue Sessions
Mert Cevik1, Fuliang Weng2, Chin-Hui Lee1
1Georgia Institute of Technology, USA; 2Robert Bosch Corp., USA
TueSe4.P3 continued
Page 475 TueSe4.P3-7
Page 479 TueSe4.P3-8
Page 483 TueSe4.P3-9
Page 487 TueSe4.P3-10
Page 491 TueSe4.P3-11
Page 495 TueSe4.P3-12
Automatic Customer Feedback Processing: Alarm Detection in Open Question Spoken Messages
Nathalie Camelin1, Geraldine Damnati2, Frederic Bechet1, Renato De Mori1
1LIA, France; 2Orange Labs, France
Minimal Training Based Semantic Categorization in a Voice Activated Question Answering (VAQA) System
Mithun Balakrishna, Marta Tatu, Dan Moldovan, Lymba Corporation, USA
User Study of the Bayesian Update of Dialogue State Approach to Dialogue Management
B. Thomson, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu, Steve Young, University of Cambridge, UK
Extensibility Verification of Robust Domain Selection Against Out-of-Grammar Utterances in Multi-Domain Spoken Dialogue System
Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Kyoto University, Japan
Improving Large Scale Alphanumeric String Recognition Using Redundant Information
Ea-Ee Jan1, Osamuyimen Stewart1, Raymond Co2, David Lubensky1
1IBM T.J. Watson Research Center, USA; 2IBM Canada Global Business Services, Canada
SPRAAK: An Open Source "SPeech Recognition and Automatic Annotation Kit"
Kris Demuynck, Jan Roelens, Dirk Van Compernolle, Patrick Wambacq, Katholieke Universiteit Leuven, Belgium
TueSe4.P3 continued
Page 496 TueSe4.P3-13
Page 500 TueSe4.P3-14
Page 504 TueSe4.P3-15
Page 508 TueSe4.P3-16
Preliminary Evaluation of Speech/Sound Recognition for Telemedicine Application in a Real Environment
Michel Vacher1, Anthony Fleury2, Jean-Francois Serignat1, Norbert Noury2, Hubert Glasson1
1LIG, France; 2TIMC-IMAG, France
MobiDic - A Mobile Dictation and Notetaking Application
Markku Turunen, Aleksi Melto, Anssi Kainulainen, Jaakko Hakulinen, University of Tampere, Finland
Automatic Speech Recognition for Scientific Purposes - webASR
Thomas Hain, Asmaa El Hannani, Stuart N. Wrigley, Vincent Wan, University of Sheffield, UK
Evaluation of a Live Broadcast News Subtitling System for Portuguese
Hugo Meinedo, Marcio Viveiros, Joao Neto, INESC-ID/IST, Portugal
TueSe4.P4: Perception, Production, Discourse and Dialog
Mezzanine Level Area B4, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Deborah Loakes
Page 512 TueSe4.P4-1
Page 516 TueSe4.P4-2
Page 520 TueSe4.P4-3
Page 524 TueSe4.P4-4
Page 528 TueSe4.P4-5
Page 529 TueSe4.P4-6
Page 530 TueSe4.P4-7
Page 534 TueSe4.P4-8
Recognizing and Modelling Regional Varieties of Swedish
Jonas Beskow1, Gösta Bruce2, Laura Enflo1, Bjorn Granstrom1, Susanna Schotz2
1KTH, Sweden; 2Lund University, Sweden
Vowel Duration, Compression and Lengthening in Stressed Syllables in Central and Southern Varieties of Standard Italian
John Hajek, Mary Stevens, University of Melbourne, Australia
Acoustic Cues for the Perception of Intonation in Cantonese
Joan K.-Y. Ma1, Valter Ciocca2, Tara L. Whitehill1
1University of Hong Kong, China; 2University of British Columbia, Canada
Perception of Dialectal Prosody
Adrian Leemann, Beat Siebenhaar, University of Berne, Switzerland
Does the McGurk Effect Rely on Processing Time Constraints?
Christian Kroos, Ashlie Dreves, University of Western Sydney, Australia
Exploring the Uncanny Valley Effect with Talking Heads
Takaaki Kuratate, Kathryn Ayers, Jeesun Kim, Denis Burnham, University of Western Sydney, Australia
How Do the Elderly Talk to a Natural Language Call Routing System?
Knut Kvale, Ragnhild Halvorsrud, Telenor Research & Innovation, Norway
Analysis of Relationship Between Impression of Human-to-Human Conversations and Prosodic Change and Its Modeling
Ryota Nishimura1, Norihide Kitaoka2, Seiichi Nakagawa1
1Toyohashi University of Technology, Japan; 2Nagoya University, Japan
TueSe4.P4 continued
Page 538 TueSe4.P4-9
Page 542 TueSe4.P4-10
Page 543 TueSe4.P4-11
Page 544 TueSe4.P4-12
Page 545 TueSe4.P4-13
Utterance-Level Normalization for Relative Articulation Rate Analysis
Tuomo Saarni, Jussi Hakokari, Jouni Isoaho, Tapio Salakoski, University of Turku, Finland
Syntactic Complexity Induces Explicit Grounding in the MapTask Corpus
Martin Tietze, Vera Demberg, Johanna D. Moore, University of Edinburgh, UK
Do Discourse Cues Facilitate Recall in Information Presentation Messages?
Andi Winterboer, Johanna D. Moore, Fernanda Ferreira, University of Edinburgh, UK
Structured Heterogeneity of English Stress Variants
Noriko Hattori, Mie University, Japan
A Method for Automatically Estimating F0 Model Parameters and a Speech Re-Synthesis Tool Using F0 Model and STRAIGHT
Shota Sato1, Taro Kimura2, Yasuo Horiuchi1, Masafumi Nishida1, Shingo Kuroiwa1, Akira Ichikawa1
1Chiba University, Japan; 2Nintendo Co. Ltd., Japan
WedSe2.01: Single-Channel Speech Enhancement
Great Hall, Time 10:00-12:00, Wednesday 24th September 2008 Chair: Frank K. Soong
Page 549 WedSe2.01-1 10:00-10:20
Page 553 WedSe2.01-2 10:20-10:40
Page 557 WedSe2.01-3 10:40-11:00
Page 561 WedSe2.01-4 11:00-11:20
Page 565 WedSe2.01-5 11:20-11:40
Page 569 WedSe2.01-6 11:40-12:00
Noise Driven Short-Time Phase Spectrum Compensation Procedure for Speech Enhancement
Anthony P. Stark, Kamil K. Wojcicki, James G. Lyons, Kuldip K. Paliwal, Griffith University, Australia
A Phase-Averaged Model for the Relationship Between Noisy Speech, Clean Speech and Noise in the Log-Mel Domain
Friedrich Faubel, John McDonough, Dietrich Klakow, Saarland University, Germany
Time and Frequency Dependent Amplification for Speech Intelligibility Enhancement in Noisy Environments
Henk Brouckxon1, Werner Verhelst1, Bart De Schuymer2
1Vrije Universiteit Brussel, Belgium; 2Televic nv, Belgium
A Wavelet Based Speech Enhancement Method Using Noise Classification and Shaping
Mahdi Mohammadi1, Behzad Zamani1, Babak Nasersharif2, Mohsen Rahmani1, Ahmad Akbari1
1Iran University of Science & Technology, Iran; 2University of Guilan, Iran
Speech Enhancement Based on Novel Two-Step a priori SNR Estimators
Md. Jahangir Alam1, Douglas O'Shaughnessy1, Sid-Ahmed Selouani2
1Universite du Quebec, Canada; 2Universite du Moncton, Canada
A Speech Enhancement Approach Using Piecewise Linear Approximation of an Explicit Model of Environmental Distortions
Jun Du, Qiang Huo, Microsoft Research Asia, China
WedSe2.02: Speech Synthesis Methods I
Plaza 1, Time 10:00-12:00, Wednesday 24th September 2008 Chair: Rolf Carlson
Page 573 WedSe2.02-1 10:00-10:20
Page 577 WedSe2.02-2 10:20-10:40
Page 581 WedSe2.02-3 10:40-11:00
Page 585 WedSe2.02-4 11:00-11:20
Page 589 WedSe2.02-5 11:20-11:40
Page 593 WedSe2.02-6 11:40-12:00
Articulatory Control of HMM-Based Parametric Speech Synthesis Driven by Phonetic Knowledge
Zhen-Hua Ling1, Korin Richmond2, Junichi Yamagishi2, Ren-Hua Wang1
1University of Science & Technology of China, China; 2University of Edinburgh, UK
Minimum Generation Error Training with Direct Log Spectral Distortion on LSPs for HMM-Based Speech Synthesis
Yi-Jian Wu, Keiichi Tokuda, Nagoya Institute of Technology, Japan
Robustness of HMM-Based Speech Synthesis
Junichi Yamagishi1, Zhen-Hua Ling2, Simon King1
1University of Edinburgh, UK; 2University of Science & Technology of China, China
Improving Preselection in Unit Selection Synthesis
Alistair Conkie, Ann Syrdal, Yeon-Jun Kim, Mark Beutnagel, AT&T Labs Research, USA
Efficient Join Cost Computation for Unit Selection Based TTS Systems
Feng Ding1, Jani Nurminen2, Jilei Tian3
1Nokia Research Center, China; 2Nokia Devices R&D, Finland; 3Nokia Research Center, Finland
A Phonetic Assessment of Cross-Language Voice Conversion
Kayoko Yanagisawa, Mark Huckvale, University College London, UK