9th annual conference ; vol. 1 - gbv · 9thannualconferenceofthe internationalspeech...

9th Annual Conference of the

International Speech Communication Association 2008

(INTERSPEECH 2008)

Brisbane, Australia

22-26 September 2008

Volume 1 of 5

ISBN: 978-1-61567-378-0

TueSe2.01: Keynote 1: Hiroya Fujisaki — ISCA Medallist

Great Hall, Time 11:00-12:00, Tuesday 23rd September 2008 Chair: Isabel Trancoso

Page 1

Keynote 1

In Search of Models in Speech Communication Research

Hiroya Fujisaki, University of Tokyo, Japan

WedSe1.01: Keynote 2: Abeer Alwan

Great Hall, Time 08:30-09:30, Wednesday 24th September 2008 Chair: Anne Cutler

Keynote 2

Dealing with Limited and Noisy Data in ASR: A Hybrid Knowledge-Based and Statistical Approach

Abeer Alwan, University of California at Los Angeles, USA

ThuSe1.01: Keynote 3: Joaquin Gonzalez-Rodriguez

Great Hall, Time 08:30-09:30, Thursday 25th September 2008 Chair: Michael Wagner

Keynote 3

Forensic Automatic Speaker Recognition: Fiction or Science?

Joaquin Gonzalez-Rodriguez, Universidad Autonoma de Madrid, Spain

FriSe1.01: Keynote 4: Justine Cassell

Great Hall, Time 08:30-09:30, Friday 26th September 2008 Chair: Denis Burnham

Keynote 4

Modeling Rapport in Embodied Conversational Agents

Justine Cassell, Northwestern University, USA

TueSe3.01: Segmentation and Classification

Great Hall, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Helen M. Meng

TueSe3.01-1 13:30-13:50

Page 24 TueSe3.01-2 13:50-14:10

Page 28 TueSe3.01-3 14:10-14:30

Page 32 TueSe3.01-4 14:30-14:50

Page 36 TueSe3.01-5 14:50-15:10

Page 40 TueSe3.01-6 15:10-15:30

Agglomerative Hierarchical Speaker Clustering Using Incremental Gaussian Mixture Cluster Modeling

Kyu J. Han, Shrikanth S. Narayanan, University of Southern California, USA

Weighted Segmental K-Means Initialization for SOM-Based Speaker Clustering

Oshry Ben-Harush1, Itshak Lapidot2, Hugo Guterman1

1Ben-Gurion University of the Negev, Israel; 2Sami Shamoon College of Engineering, Israel

Learning Essential Speaker Sub-Space Using Hetero-Associative Neural Networks for Speaker Clustering

Shajith Ikbal, Karthik Visweswariah, IBM India Research Lab, India

Two's a Crowd: Improving Speaker Diarization by Automatically Identifying and Excluding Overlapped Speech

Kofi Boakye, Oriol Vinyals, Gerald Friedland, ICSI, USA

T-Test Distance and Clustering Criterion for Speaker Diarization

Trung Hieu Nguyen1, Eng Siong Chng2, Haizhou Li1

1Institute for Infocomm Research, Singapore; 2Nanyang Technological University, Singapore

Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization

Deepu Vijayasenan, Fabio Valente, Hervé Bourlard, IDIAP Research Institute, Switzerland

TueSe3.02: Speech Coding

Plaza 1, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Julien Epps

Page 44 TueSe3.02-1 13:30-13:50

Page 45 TueSe3.02-2 13:50-14:10

Page 49 TueSe3.02-3 14:10-14:30

Page 53 TueSe3.02-4 14:30-14:50

Page 57 TueSe3.02-5 14:50-15:10

Page 61 TueSe3.02-6 15:10-15:30

Low Complexity Near-Optimal Unit-Selection Algorithm for Ultra Low Bit-Rate Speech Coding Based on N-Best Lattice and Viterbi Search

V. Ramasubramanian, D. Harish, Siemens Corporate Technology India, India

A New Fast Algebraic Fixed Codebook Search Algorithm in CELP Speech Coding

Vaclav Eksler1, Redwan Salami2, Milan Jelinek1

1University of Sherbrooke, Canada; 2VoiceAge Corporation, Canada

A Novel Transcoding Algorithm Between 3GPP AMR-NB (7.95 kbit/s) and ITU-T G.729a (8 kbit/s)

Hao Xu, Changchun Bao, Beijing University of Technology, China

Mel-Frequency Cepstral Coefficient-Based Bandwidth Extension of Narrowband Speech

Amr H. Nour-Eldin, Peter Kabal, McGill University, Canada

A PCM Coding Noise Reduction for ITU-T G.711.1

Jean-Luc Garcia, Claude Marro, Balázs Kövesi, Orange Labs, France

An Instrumental Measure for End-to-End Speech Transmission Quality Based on Perceptual Dimensions: Framework and Realization

Marcel Wältermann1, Kirstin Scholz2, Sebastian Möller1, Lu Huo2, Alexander Raake1, Ulrich Heute2

1Technische Universität Berlin, Germany; 2Christian-Albrechts-Universität zu Kiel, Germany

TueSe3.03: Human Conversation and Communication

Plaza 2, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Bernd Möbius

Page 65 TueSe3.03-1 13:30-13:50

Page 69 TueSe3.03-2 13:50-14:10

Page 70 TueSe3.03-3 14:10-14:30

Page 74 TueSe3.03-4 14:30-14:50

Page 78 TueSe3.03-5 14:50-15:10

Page 82 TueSe3.03-6 15:10-15:30

Duration and F0 Interval of Utterance-Final Intonation Contours in the Perception of German Sentence Modality

Benno Peters, Hartmut R. Pfitzinger, Christian-Albrechts-Universität zu Kiel, Germany

Contrastive Utterances Make Alternatives Salient — Cross-Modal Priming Evidence

Bettina Braun, Lara Tagliapietra, Anne Cutler, Max Planck Institute for Psycholinguistics, The Netherlands

Exploring a Mechanism of Speech Synchronization Using Auditory Delayed Experiments

Masato Ishizaki1, Yasuharu Den2, Senshi Fukashiro1

1University of Tokyo, Japan; 2Chiba University, Japan

Prosodic Manifestations of Confidence and Uncertainty in Spoken Language

Heather Pon-Barry, Harvard University, USA

Identifying Relevant Phrases to Summarize Decisions in Spoken Meetings

Raquel Fernandez1, Matthew Frampton1, John Dowding2, Anish Adukuzhiyil1, Patrick Ehlen1, Stanley Peters1

1Stanford University, USA; 2University of California at Santa Cruz, USA

Recovering Participant Identities in Meetings from a Probabilistic Description of Vocal Interaction

Kornel Laskowski1, Tanja Schultz2

1Universität Karlsruhe (TH), Germany; 2Carnegie Mellon University, USA

TueSe3.04: Special Session: OzPhon08 — Phonetics and Phonology of Australian Aboriginal Languages

Plaza 3&4, Time 13:30-15:35, Tuesday 23rd September 2008 Chair: Marija Tabain

Page 86 TueSe3.04-1 13:30-13:55

Page 90 TueSe3.04-2 13:55-14:20

Page 94 TueSe3.04-3 14:20-14:45

Page 95 TueSe3.04-4 14:45-15:10

Page 96 TueSe3.04-5 15:10-15:35

Coarticulation in Nasal and Lateral Clusters in Warlpiri

Janet Fletcher1, Deborah Loakes1, Andrew Butcher2

1University of Melbourne, Australia; 2Flinders University, Australia

Phonetically Prestopped Laterals in Australian Languages: A Preliminary Investigation of Warlpiri

Deborah Loakes1, Andrew Butcher2, Janet Fletcher1, Hywel Stoakes1

1University of Melbourne, Australia; 2Flinders University, Australia

Connected Speech Processes in Warlpiri

John Ingram, Mary Laughren, Jeff Chapman, University of Queensland, Australia

Consonant Enhancement in Lamalama, an Initial-Dropping Language of Cape York Peninsula, North Queensland

Christina Pentland, University of Queensland, Australia

Text, Rhythm and Metrical Form in an Aboriginal Song Series

Myfany Turpin, University of Queensland, Australia

TueSe3.P1: Acoustic Activity Detection, Pitch Tracking and Analysis

Mezzanine Level Area A1, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Timothy J. Hazen

Page 99 TueSe3.P1-1

Page 103 TueSe3.P1-2

Page 107 TueSe3.P1-3

Page 111 TueSe3.P1-4

Page 115 TueSe3.P1-5

Page 119 TueSe3.P1-6

Statistical Speech Activity Detection Based on Spatial Power Distribution for Analyses of Poster Presentations

Kentaro Ishizuka1, Shoko Araki1, Tatsuya Kawahara2

1NTT Corporation, Japan; 2Kyoto University, Japan

A Statistical Model-Based Voice Activity Detection Employing Minimum Classification Error Technique

Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang, Inha University, Korea

Comparative Evaluation of Different Methods for Voice Activity Detection

Hongfei Ding, Koichi Yamamoto, Masami Akamine, Toshiba Corporate R&D Center, Japan

Speech/Non-Speech Segments Detection Based on Chaotic and Prosodic Features

Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari, Amirkabir University of Technology, Iran

Acoustic Event Classification Using a Distributed Microphone Network with a GMM/SVM Combined Algorithm

Christian Zieger, Maurizio Omologo, FBK-irst, Italy

Intentional Voice Command Detection for Completely Hands-Free Speech Interface in Home Environments

Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi, Hitachi Ltd., Japan

TueSe3.P1 continued

Page 123 TueSe3.P1-7

Page 127 TueSe3.P1-8

Page 131 TueSe3.P1-9

Page 135 TueSe3.P1-10

Page 139 TueSe3.P1-11



Fusion of Audio and Video Modalities for Detection of Acoustic Events

Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton, Universitat Politècnica de Catalunya, Spain

DySANA: Dynamic Speech and Noise Adaptation for Voice Activity Detection

Ron J. Weiss1, Trausti Kristjansson2

1Columbia University, USA; 2Google Inc., USA

A Comprehensive Study on the Effects of Room Reverberation on Fundamental Frequency Estimation

Rico Petrick1, Masashi Unoki2, Anish Mittal3, Carlos Segura4, Ruediger Hoffmann1

1Technische Universität Dresden, Germany; 2JAIST, Japan; 3IIT Roorkee, India; 4Universitat Politècnica de Catalunya, Spain

A Hybrid Speech Signal Based Algorithm for Pitch Marking Using Finite State Machines

H. Hussein, M. Wolff, O. Jokisch, F. Duckhorn, G. Strecha, Ruediger Hoffmann, Technische Universität Dresden, Germany

Parameter Estimation Method of F0 Control Model for Singing Voices

Yasunori Ohishi1, Hirokazu Kameoka2, Kunio Kashino2, Kazuya Takeda1

1Nagoya University, Japan; 2NTT Corporation, Japan

An Algorithm for Multi-Pitch Tracking in Co-Channel Speech

Srikanth Vishnubhotla, Carol Y. Espy-Wilson, University of Maryland, USA

Multipitch Tracking Using a Factorial Hidden Markov Model

Michael Wohlmayr, Franz Pernkopf, Graz University of Technology, Austria

TueSe3.P1 continued...

Cochannel Speech Separation Using Multi-Pitch Estimation and Model Based Voiced Sequential Grouping

Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan, Chinese Academy of Sciences, China

Crosscorrelation of Adjacent Spectra Enhances Fundamental Frequency Tracking

Philippe Martin, UFR Linguistique, France

TueSe3.P2: Single- and Multichannel Speech Enhancement I

Mezzanine Level Area A2, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Martin J. Russell

Page 159 TueSe3.P2-1

Page 163 TueSe3.P2-2

Page 167 TueSe3.P2-3

Page 171 TueSe3.P2-4

Page 175 TueSe3.P2-5

Page 179 TueSe3.P2-6

Enhancement of Noisy Speech Recordings via Blind Source Separation

Jiri Malek, Zbynek Koldovsky, Jindrich Zdansky, Jan Nouza, Technical University of Liberec, Czech Republic

Studies on Estimation of the Number of Sources in Blind Source Separation

Takaaki Ishibashi1, Hidetoshi Nakashima1, Hiromu Gotanda2

1Kumamoto National College of Technology, Japan; 2Kinki University, Japan

Speech Enhancement Based on Hypothesized Wiener Filtering

V. Ramasubramanian1, Deepak Vijaywargi2

1Siemens Corporate Technology India, India; 2University of Washington, USA

Psychoacoustically-Motivated Adaptive β-Order Generalized Spectral Subtraction Based on Data-Driven Optimization

Junfeng Li1, Hui Jiang2, Masato Akagi1

1JAIST, Japan; 2York University, Canada

Two Stage Iterative Wiener Filtering for Speech Enhancement

Krishna Nand K., T.V. Sreenivas, Indian Institute of Science, India

Assessment of Correlation Between Objective Measures and Speech Recognition Performance in the Evaluation of Speech Enhancement

Pei Ding, Jie Hao, Toshiba China R&D Center, China

TueSe3.P3: Spoken Language Systems I

Mezzanine Level Area B3, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Sebastian Möller

Page 183 TueSe3.P3-1

Page 187 TueSe3.P3-2

Page 191 TueSe3.P3-3

Page 195 TueSe3.P3-4

Page 199 TueSe3.P3-5

Page 203 TueSe3.P3-6

Page 207 TueSe3.P3-7

Predicting ASR Errors by Exploiting Barge-In Rate of Individual Users for Spoken Dialogue Systems

Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno, Kyoto University, Japan

Expanding Vocabulary for Recognizing User's Abbreviations of Proper Nouns Without Increasing ASR Error Rates in Spoken Dialogue Systems

Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Kyoto University, Japan

Exploiting the ASR N-Best by Tracking Multiple Dialog State Hypotheses

Jason D. Williams, AT&T Labs Research, USA

A Spoken Language Interpretation Component for a Robot Dialogue System

Enes Makalic, Ingrid Zukerman, Michael Niemann, Monash University, Australia

MUESLI: Multiple Utterance Error Correction for a Spoken Language Interface

Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt, SRI International, USA

Methods to Optimize Transcription of On-Line Media

Sarah Conrod1, Sara Basson2, Dimitri Kanevsky2

1Cape Breton University, Canada; 2IBM T.J. Watson Research Center, USA

Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems

Akinori Ito1, Toyomi Meguro1, Shozo Makino1, Motoyuki Suzuki2

1Tohoku University, Japan; 2University of Tokushima, Japan

TueSe3.P3 continued

Page 211 TueSe3.P3-8

Page 215 TueSe3.P3-9

Page 219 TueSe3.P3-10

Page 220 TueSe3.P3-11

Page 224 TueSe3.P3-12

Page 228 TueSe3.P3-13



Dialog Management Using Weighted Finite-State Transducers

Chiori Hori1, Kiyonori Ohtake1, Teruhisa Misu1, Hideki Kashioka1, Satoshi Nakamura2

1NICT, Japan; 2ATR-SLC, Japan

Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System

Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda, Nagoya Institute of Technology, Japan

Let's Go Lab: A Platform for Evaluation of Spoken Dialog Systems with Real World Users

Maxine Eskenazi, Alan W. Black, Antoine Raux, Brian Langner, Carnegie Mellon University, USA

The Impact of Language Dynamics on the Capitalization of Broadcast News

Fernando Batista1, Nuno Mamede1, Isabel Trancoso2

1INESC-ID/ISCTE, Portugal; 2INESC-ID/IST, Portugal

Lightly Supervised Acoustic Model Training on EPPS Recordings

Matthias Paulik, Alex Waibel, Universität Karlsruhe (TH), Germany

Fast Call-Classification System Development Without In-Domain Training Data

Christophe Servan, Frederic Bechet, LIA, France

iCNC and iROVER: The Limits of Improving System Combination with Classification?

Björn Hoffmeister, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany

System Combination for Spoken Language Understanding

Stefan Hahn, Patrick Lehnen, Hermann Ney, RWTH Aachen University, Germany

TueSe3.P4: Emotion and Expression I

Mezzanine Level Area B4, Time 13:30-15:30, Tuesday 23rd September 2008 Chair: Denis Burnham

Page 240 TueSe3.P4-1

Page 241 TueSe3.P4-2

Page 245 TueSe3.P4-3

Page 249 TueSe3.P4-4

Page 253 TueSe3.P4-5

Page 257 TueSe3.P4-6

Page 261 TueSe3.P4-7

Multidimensional Features of Emotional Speech

Tomoko Suzuki1, Machiko Ikemoto2, Tomoko Sano3, Toshihiko Kinoshita1

1Kansai Medical University, Japan; 2Doshisha University, Japan; 3Seibi Gakuen College, Japan

Leveraging Emotion Detection Using Emotions from Yes-No Answers

Narjes Boufaden, Pierre Dumouchel, CRIM, Canada

Vowel Placement During Operatic Singing: 'Come si Parla' or 'Aggiustamento'?

Thomas J. Millhouse, Dianna T. Kenny, University of Sydney, Australia

Study on Strained Rough Voice as a Conveyer of Rage

Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai, Matsushita Electric Industrial Co. Ltd., Japan

Integrating Rule and Template-Based Approaches for Emotional Malay Speech Synthesis

Mumtaz Begum1, Raja N. Ainon1, Roziati Zainuddin1, Zuraidah M. Don1, Gerry Knowles2

1University of Malaya, Malaysia; 2Miquest Worldwide Sdn. Bhd., Malaysia

The Expression and Perception of Emotions: Comparing Assessments of Self versus Others

Carlos Busso, Shrikanth S. Narayanan, University of Southern California, USA

On the Role of Acting Skills for the Collection of Simulated Emotional Speech

Emiel Krahmer, Marc Swerts, Tilburg University, The Netherlands

TueSe3.P4 continued

Detection of Security Related Affect and Behaviour in Passenger Transport

Björn Schuller1, Matthias Wimmer2, Dejan Arsic1, Tobias Moosmayr3, Gerhard Rigoll1

1Technische Universität München, Germany; 2Waseda University, Japan; 3BMW Group, Germany

TueSe4.01: Automatic Speech Recognition: Acoustic Models I

Great Hall, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Roger K. Moore

Page 269 TueSe4.01-1 16:00-16:20

Page 273 TueSe4.01-2 16:20-16:40

Page 277 TueSe4.01-3 16:40-17:00

Page 281 TueSe4.01-4 17:00-17:20

Page 285 TueSe4.01-5 17:20-17:40

Page 289 TueSe4.01-6 17:40-18:00

Soft Margin Estimation with Various Separation Levels for LVCSR

Jinyu Li1, Zhi-Jie Yan2, Chin-Hui Lee1, Ren-Hua Wang2

1Georgia Institute of Technology, USA; 2University of Science & Technology of China, China

On the Equivalence of Gaussian and Log-Linear HMMs

Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany

Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding

Dimitri Kanevsky1, Tara N. Sainath2, Bhuvana Ramabhadran1, David Nahamoo1

1IBM T.J. Watson Research Center, USA; 2MIT, USA

An Ellipsoid Constrained Quadratic Programming Perspective to Discriminative Training of HMMs

Peng Liu, Frank K. Soong, Microsoft Research Asia, China

Discriminative Training of Variable-Parameter HMMs for Noise Robust Speech Recognition

Dong Yu1, Li Deng1, Yifan Gong2, Alex Acero1

1Microsoft Research, USA; 2Microsoft Corporation, USA

Towards a Non-Parametric Acoustic Model: An Acoustic Decision Tree for Observation Probability Calculation

Jasha Droppo1, Michael L. Seltzer1, Alex Acero1, Yu-Hsiang Bosco Chiu2

1Microsoft Research, USA; 2Carnegie Mellon University, USA

TueSe4.02: Accent and Language Identification

Plaza 1, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Yuko Kinoshita

Page 293 TueSe4.02-1 16:00-16:20

Page 297 TueSe4.02-2 16:20-16:40

Page 301 TueSe4.02-3 16:40-17:00

Page 305 TueSe4.02-4 17:00-17:20

Page 309 TueSe4.02-5 17:20-17:40

Page 313 TueSe4.02-6 17:40-18:00

Experiments with the ABI (Accents of the British Isles) Speech Corpus

Shona D'Arcy1, Martin J. Russell2

1Trinity College Dublin, Ireland; 2University of Birmingham, UK

Politecnico di Torino System for the 2007 NIST Language Recognition Evaluation

Fabio Castaldo1, Emanuele Dalmasso1, Pietro Laface1, Daniele Colibro2, Claudia Vair2

1Politecnico di Torino, Italy; 2Loquendo, Italy

Discriminative Training and Channel Compensation for Acoustic Language Recognition

Valiantsina Hubeika, Lukáš Burget, Pavel Matějka, Petr Schwarz, Brno University of Technology, Czech Republic

Comparison of Variable Selection Methods and Classifiers for Native Accent Identification

Tingyao Wu, Peter Karsmakers, Hugo Van hamme, Dirk Van Compernolle, Katholieke Universiteit Leuven, Belgium

A Comparison of Subspace Feature-Domain Methods for Language Recognition

W.M. Campbell, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds, MIT, USA

Context-Dependent Phone Models and Models Adaptation for Phonotactic Language Recognition

Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel, LIMSI, France

TueSe4.03: Emotion and Expression II

Plaza 2, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Ailbhe Ní Chasaide

Page 317 TueSe4.03-1 16:00-16:20

Page 318 TueSe4.03-2 16:20-16:40

Page 322 TueSe4.03-3 16:40-17:00

Page 326 TueSe4.03-4 17:00-17:20

Page 330 TueSe4.03-5 17:20-17:40

Page 334 TueSe4.03-6 17:40-18:00

Emotions and Articulatory Precision

Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer, University of Geneva, Switzerland

Assessing Agreement of Observer- and Self-Annotations in Spontaneous Multimodal Emotion Data

Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen, TNO-D&V, The Netherlands

Emotion Recognition in Spontaneous Emotional Speech for Anonymity-Protected Voice Chat Systems

Yoshiko Arimoto1, Hiromi Kawatsu2, Sumio Ohno1, Hitoshi Iida1

1Tokyo University of Technology, Japan; 2NIJLA, Japan

Assigning Suitable Phrasal Tones and Pitch Accents by Sensing Affective Information from Text to Synthesize Human-Like Speech

Mostafa Al Masum Shaikh, Md. Khademul Islam Molla, Keikichi Hirose, University of Tokyo, Japan

Cross-Language Study of Vocal Correlates of Affective States

Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl, Trinity College Dublin, Ireland

Gender-Related Differences in the Production and Perception of Emotion

Marc Swerts, Emiel Krahmer, Tilburg University, The Netherlands

TueSe4.04: Special Session: PANZE 2008 — Phonetics and Phonology of Australian and New Zealand English

Plaza 3&4, Time 16:00-18:05, Tuesday 23rd September 2008 Chair: Felicity Cox

Page 338 TueSe4.04-1 16:00-16:25

Page 342 TueSe4.04-2 16:25-16:50

Page 346 TueSe4.04-3 16:50-17:15

Page 347 TueSe4.04-4 17:15-17:40

The English Pronunciation of Successive Groups of Maori Speakers

Catherine I. Watson1, Margaret Maclagan2, Jeanette King2, Ray Harlow3

1University of Auckland, New Zealand; 2University of Canterbury, New Zealand; 3University of Waikato, New Zealand

Reversal of Short Front Vowel Raising in Australian English

Felicity Cox, Sallyanne Palethorpe, Macquarie University, Australia

GOOSE on the Move: A Study of /u/-Fronting in Australian News Speech

Jennifer Price, Monash University, Australia

The Vowels of Australian Aboriginal English

Andrew Butcher1, Victoria Anderson2

1Flinders University, Australia; 2University of Hawaii at Manoa, USA

Page 351 TueSe4.04-5 17:40-18:05

Perception and Production of /i:/, /ia/ and /e:/ in Australian English

Robert H. Mannell, Macquarie University, Australia

TueSe4.P1: Speaker Recognition and Diarisation

Mezzanine Level Area A1, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Robbie Vogt

Page 355 TueSe4.P1-1

Page 359 TueSe4.P1-2

Page 363 TueSe4.P1-3

Page 367 TueSe4.P1-4

Page 371 TueSe4.P1-5

Page 375 TueSe4.P1-6

Page 379 TueSe4.P1-7

Page 383 TueSe4.P1-8

An Expert System in Speaker Verification Task

Zbyněk Zajíc, Lukáš Machlica, Aleš Padrta, Jan Vaněk, Vlasta Radová, University of West Bohemia, Czech Republic

Cascading Appearance-Based Features for Visual Speaker Verification

David Dean, Sridha Sridharan, Patrick Lucey, Queensland University of Technology, Australia

Improved Novelty Detection for Online GMM Based Speaker Diarization

Konstantin Markov, Satoshi Nakamura, ATR-SLC, Japan

Analysis of Impostor Tests with High Scores in NIST-SRE Context

Salah Eddine Mezaache, Jean-Francois Bonastre, Driss Matrouf, LIA, France

Reinforced Temporal Structure Information for Embedded Utterance-Based Speaker Recognition

Anthony Larcher1, Jean-Francois Bonastre1, John S.D. Mason2

1LIA, France; 2Swansea University, UK

Fast Search for Common Segments in Speech Signals for Speaker Verification

Michael Gerber, Beat Pfister, ETH Zurich, Switzerland

Audio-Visual Multilevel Fusion for Speech and Speaker Recognition

Girija Chetty, Michael Wagner, University of Canberra, Australia

Clustering Initialization Based on Spatial Information for Speaker Diarization of Meetings

J. Luque, Carlos Segura, Javier Hernando, Universitat Politècnica de Catalunya, Spain

TueSe4.P2: Single- and Multichannel Speech Enhancement II

Mezzanine Level Area A2, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: John H.L. Hansen

Page 387 TueSe4.P2-1

Page 391 TueSe4.P2-2

Page 395 TueSe4.P2-3

Page 399 TueSe4.P2-4

Page 403 TueSe4.P2-5

Page 407 TueSe4.P2-6

Page 411 TueSe4.P2-7

Effect of Compressing the Dynamic Range of the Power Spectrum in Modulation

Filtering Based Speech Enhancement

James G. Lyons, Kuldip K. Paliwal, Griffith University, Australia

A Long State Vector Kalman Filter for Speech Enhancement

Stephen So, Kuldip K. Paliwal, Griffith University, Australia

Subspace Based Speech Enhancement Using Gaussian Mixture Model

Achintya Kundu, Saikat Chatterjee, T.V. Sreenivas, Indian Institute of Science, India

Generalized Parametric Spectral Subtraction Using Weighted Euclidean Distortion

Amit Das, John H.L. Hansen, University of Texas at Dallas, USA

Sudden Noise Reduction Based on GMM with Noise Power Estimation

Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki, Kobe University, Japan

Speech Enhancement Using a Wiener Denoising Technique and Musical Noise Reduction

Md. Jahangir Alam1, Sid-Ahmed Selouani2, Douglas O'Shaughnessy1, Sofia Ben Jebara3

1Université du Québec, Canada; 2Université de Moncton, Canada; 3École Supérieure des Communications de Tunis, Tunisia

Regularized Non-Negative Matrix Factorization with Temporal Dependencies for Speech Denoising

Kevin W. Wilson1, Bhiksha Raj1, Paris Smaragdis2

1Mitsubishi Electric Research Labs, USA; 2Adobe Systems, USA

TueSe4.P2-8

Page 419 TueSe4.P2-9

Page 423 TueSe4.P2-10

Page 427 TueSe4.P2-11

Page 431 TueSe4.P2-12

Page 435 TueSe4.P2-13

TueSe4.P2 continued...

ICA-Based MAP Speech Enhancement with Multiple Variable Speech Distribution Models

Xin Zou, Peter Jancovic, Munewer Kokuer, Martin J. Russell, University of Birmingham, UK

Source Separation Based on Binaural Cues and Source Model Constraints

Ron J. Weiss, Michael I. Mandel, Daniel P.W. Ellis, Columbia University, USA

Maximum Kurtosis Beamforming with the Generalized Sidelobe Canceller

Kenichi Kumatani1, John McDonough2, Barbara Rauch2, Philip N. Garner1, Weifeng Li1, John Dines1

1IDIAP Research Institute, Switzerland; 2Saarland University, Germany

Noise Robust Speech Dereverberation Using Constrained Inverse Filter

Ken'ichi Furuya1, Akitoshi Kataoka2, Yoichi Haneda1

1NTT Corporation, Japan; 2Ryukoku University, Japan

A Dual Microphone Coherence Based Method for Speech Enhancement in Headsets

Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad, Iran University of Science & Technology, Iran

Sound Capture System and Spatial Filter for Small Devices

Ivan Tashev1, Slavy Mihov2, Tyler Gleghorn3, Alex Acero1

1Microsoft Research, USA; 2Technical University of Sofia, Bulgaria; 3Microsoft Corporation, USA

An Effective Microphone Array Post-Filter in Arbitrary Environments

Ning Cheng, Wen-ju Liu, Peng Li, Bo Xu, Chinese Academy of Sciences, China

TueSe4.P2 continued...

TueSe4.P2-15

Localization of Multiple Sound Sources Based on Inter-Channel Correlation Using a Distributed Microphone System

Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita, Ritsumeikan University, Japan

TueSe4.P2-16

A Frequency Domain Approach for Speech Enhancement with Directionality Using Compact Microphone Array

Heng Zhang, Qiang Fu, Yonghong Yan, Chinese Academy of Sciences, China

TueSe4.P3: Spoken Language Systems II

Mezzanine Level Area B3, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Jerome Bellegarda

Page 451 TueSe4.P3-1

Page 455 TueSe4.P3-2

Page 459 TueSe4.P3-3

Page 463 TueSe4.P3-4

Page 467 TueSe4.P3-5

Page 471 TueSe4.P3-6

Question and Answer Database Optimization Using Speech Recognition Results

Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari,

Kiyohiro Shikano, NAIST, Japan

Development and Evaluation of Hands-Free Spoken Dialogue System for Railway Station Guidance

Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano, NAIST, Japan

Statistical Shared Plan-Based Dialog Management

Amanda J. Stent, Srinivas Bangalore, AT&T Labs Research, USA

When Calls Go Wrong: How to Detect Problematic Calls Based on Log-Files and Emotions?

Ota Herm1, Alexander Schmitt2, Jackson Liscombe3

1Czech Technical University in Prague, Czech Republic; 2University of Ulm, Germany; 3SpeechCycle Inc., USA

Unsupervised Learning of Edit Parameters for Matching Name Variants

Dan Gillick1, Dilek Hakkani-Tür2, Michael Levit2

1University of California at Berkeley, USA; 2ICSI, USA

Detection of Repetitions in Spontaneous Speech in Dialogue Sessions

Mert Cevik1, Fuliang Weng2, Chin-Hui Lee1

1Georgia Institute of Technology, USA; 2Robert Bosch Corp., USA

TueSe4.P3 continued

Page 475 TueSe4.P3-7

Page 479 TueSe4.P3-8

Page 483 TueSe4.P3-9

Page 491 TueSe4.P3-11


Automatic Customer Feedback Processing: Alarm Detection in Open Question Spoken Messages

Nathalie Camelin1, Geraldine Damnati2, Frederic Bechet1, Renato De Mori1

1LIA, France; 2Orange Labs, France

Minimal Training Based Semantic Categorization in a Voice Activated Question Answering (VAQA) System

Mithun Balakrishna, Marta Tata, Dan Moldovan, Lymba Corporation, USA

User Study of the Bayesian Update of Dialogue State Approach to Dialogue Management

B. Thomson, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu, Steve Young, University of Cambridge, UK

Extensibility Verification of Robust Domain Selection Against Out-of-Grammar Utterances in Multi-Domain Spoken Dialogue System

Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Kyoto University, Japan

Improving Large Scale Alphanumeric String Recognition Using Redundant Information

Ea-Ee Jan1, Osamuyimen Stewart1, Raymond Co2, David Lubensky1

1IBM T.J. Watson Research Center, USA; 2IBM Canada Global Business Services, Canada

SPRAAK: An Open Source "SPeech Recognition and Automatic Annotation Kit"

Kris Demuynck, Jan Roelens, Dirk Van Compernolle, Patrick Wambacq, Katholieke

Universiteit Leuven, Belgium

TueSe4.P3 continued...

Page 496 TueSe4.P3-13

Page 508 TueSe4.P3-16

Preliminary Evaluation of Speech/Sound Recognition for Telemedicine Application in a Real Environment

Michel Vacher1, Anthony Fleury2, Jean-François Serignat1, Norbert Noury2, Hubert Glasson1

1LIG, France; 2TIMC-IMAG, France

MobiDic — A Mobile Dictation and Notetaking Application

Markku Turunen, Aleksi Melto, Anssi Kainulainen, Jaakko Hakulinen, University of Tampere, Finland

Automatic Speech Recognition for Scientific Purposes — webASR

Thomas Hain, Asmaa El Hannani, Stuart N. Wrigley, Vincent Wan, University of Sheffield, UK

Evaluation of a Live Broadcast News Subtitling System for Portuguese

Hugo Meinedo, Márcio Viveiros, João Neto, INESC-ID/IST, Portugal

TueSe4.P4: Perception, Production, Discourse and Dialog

Mezzanine Level Area B4, Time 16:00-18:00, Tuesday 23rd September 2008 Chair: Deborah Loakes

Page 512 TueSe4.P4-1

Page 516 TueSe4.P4-2

Page 520 TueSe4.P4-3

Page 524 TueSe4.P4-4

Page 528 TueSe4.P4-5

Page 529 TueSe4.P4-6

Page 530 TueSe4.P4-7

Page 534 TueSe4.P4-8

Recognizing and Modelling Regional Varieties of Swedish

Jonas Beskow1, Gösta Bruce2, Laura Enflo1, Björn Granström1, Susanne Schötz2

1KTH, Sweden; 2Lund University, Sweden

Vowel Duration, Compression and Lengthening in Stressed Syllables in Central and Southern Varieties of Standard Italian

John Hajek, Mary Stevens, University of Melbourne, Australia

Acoustic Cues for the Perception of Intonation in Cantonese

Joan K.-Y. Ma1, Valter Ciocca2, Tara L. Whitehill1

1University of Hong Kong, China; 2University of British Columbia, Canada

Perception of Dialectal Prosody

Adrian Leemann, Beat Siebenhaar, University of Berne, Switzerland

Does the McGurk Effect Rely on Processing Time Constraints?

Christian Kroos, Ashlie Dreves, University of Western Sydney, Australia

Exploring the Uncanny Valley Effect with Talking Heads

Takaaki Kuratate, Kathryn Ayers, Jeesun Kim, Denis Burnham, University of Western Sydney, Australia

How Do the Elderly Talk to a Natural Language Call Routing System?

Knut Kvale, Ragnhild Halvorsrud, Telenor Research & Innovation, Norway

Analysis of Relationship Between Impression of Human-to-Human Conversations and Prosodic Change and Its Modeling

Ryota Nishimura1, Norihide Kitaoka2, Seiichi Nakagawa1

1Toyohashi University of Technology, Japan; 2Nagoya University, Japan

TueSe4.P4 continued

Page 538 TueSe4.P4-9

Page 542 TueSe4.P4-10

Page 543 TueSe4.P4-11

Page 544 TueSe4.P4-12

Page 545 TueSe4.P4-13

Utterance-Level Normalization for Relative Articulation Rate Analysis

Tuomo Saarni, Jussi Hakokari, Jouni Isoaho, Tapio Salakoski, University of Turku, Finland

Syntactic Complexity Induces Explicit Grounding in the MapTask Corpus

Martin Tietze, Vera Demberg, Johanna D. Moore, University of Edinburgh, UK

Do Discourse Cues Facilitate Recall in Information Presentation Messages?

Andi Winterboer, Johanna D. Moore, Fernanda Ferreira, University of Edinburgh, UK

Structured Heterogeneity of English Stress Variants

Noriko Hattori, Mie University, Japan

A Method for Automatically Estimating F0 Model Parameters and a Speech Re-Synthesis Tool Using F0 Model and STRAIGHT

Shota Sato1, Taro Kimura2, Yasuo Horiuchi1, Masafumi Nishida1, Shingo Kuroiwa1, Akira Ichikawa1

1Chiba University, Japan; 2Nintendo Co. Ltd., Japan

WedSe2.01: Single-Channel Speech Enhancement

Great Hall, Time 10:00-12:00, Wednesday 24th September 2008 Chair: Frank K. Soong

Page 549 WedSe2.01-1 10:00-10:20

Page 553 WedSe2.01-2 10:20-10:40

Page 557 WedSe2.01-3 10:40-11:00

Page 561 WedSe2.01-4 11:00-11:20

Page 565 WedSe2.01-5 11:20-11:40

Page 569 WedSe2.01-6 11:40-12:00

Noise Driven Short-Time Phase Spectrum Compensation Procedure for Speech Enhancement

Anthony P. Stark, Kamil K. Wojcicki, James G. Lyons, Kuldip K. Paliwal, Griffith University, Australia

A Phase-Averaged Model for the Relationship Between Noisy Speech, Clean Speech and Noise in the Log-Mel Domain

Friedrich Faubel, John McDonough, Dietrich Klakow, Saarland University, Germany

Time and Frequency Dependent Amplification for Speech Intelligibility Enhancement in Noisy Environments

Henk Brouckxon1, Werner Verhelst1, Bart De Schuymer2

1Vrije Universiteit Brussel, Belgium; 2Televic nv, Belgium

A Wavelet Based Speech Enhancement Method Using Noise Classification and Shaping

Mahdi Mohammadi1, Behzad Zamani1, Babak Nasersharif2, Mohsen Rahmani1, Ahmad Akbari1

1Iran University of Science & Technology, Iran; 2University of Guilan, Iran

Speech Enhancement Based on Novel Two-Step a priori SNR Estimators

Md. Jahangir Alam1, Douglas O'Shaughnessy1, Sid-Ahmed Selouani2

1Université du Québec, Canada; 2Université de Moncton, Canada

A Speech Enhancement Approach Using Piecewise Linear Approximation of an Explicit Model of Environmental Distortions

Jun Du, Qiang Huo, Microsoft Research Asia, China

WedSe2.02: Speech Synthesis Methods I

Plaza 1, Time 10:00-12:00, Wednesday 24th September 2008 Chair: Rolf Carlson

Page 573 WedSe2.02-1 10:00-10:20

Page 577 WedSe2.02-2 10:20-10:40

Page 581 WedSe2.02-3 10:40-11:00

Page 585 WedSe2.02-4 11:00-11:20

Page 589 WedSe2.02-5 11:20-11:40

Page 593 WedSe2.02-6 11:40-12:00

Articulatory Control of HMM-Based Parametric Speech Synthesis Driven by Phonetic Knowledge

Zhen-Hua Ling1, Korin Richmond2, Junichi Yamagishi2, Ren-Hua Wang1

1University of Science & Technology of China, China; 2University of Edinburgh, UK

Minimum Generation Error Training with Direct Log Spectral Distortion on LSPs for HMM-Based Speech Synthesis

Yi-Jian Wu, Keiichi Tokuda, Nagoya Institute of Technology, Japan

Robustness of HMM-Based Speech Synthesis

Junichi Yamagishi1, Zhen-Hua Ling2, Simon King1

1University of Edinburgh, UK; 2University of Science & Technology of China, China

Improving Preselection in Unit Selection Synthesis

Alistair Conkie, Ann Syrdal, Yeon-Jun Kim, Mark Beutnagel, AT&T Labs Research, USA

Efficient Join Cost Computation for Unit Selection Based TTS Systems

Feng Ding1, Jani Nurminen2, Jilei Tian3

1Nokia Research Center, China; 2Nokia Devices R&D, Finland; 3Nokia Research Center, Finland

A Phonetic Assessment of Cross-Language Voice Conversion

Kayoko Yanagisawa, Mark Huckvale, University College London, UK
