NAM (Non-Audible Murmur) RECOGNITION

NAM has never been used for communication or as an input interface in human history.

YOSHITAKA NAKAJIMA, PANIKOS HERACREOUS, KIYOHIRO SHIKANO, NICK CAMPBELL

NAIST, JAPAN

The Definition of NAM

Non-audible Murmur (NAM)

Non-audible murmur (NAM) is a kind of monologue taking place in the mouth, which a nearby person would not be able to hear.

NAM is the articulated production of respiratory sound without recourse to vocal-fold vibration, produced by the motions and interactions of speech organs, which is transmitted through the soft tissue of the head.

It's almost like 'lip thinking'.

Problems of Normal Speech Interface Communication (Including Speech Recognition)

The publicity of air-conducted voice cannot be controlled.

This technology is essentially prone to [illegible], especially in public places.

Speech is often considered to be an unwanted [illegible] in the same public situation.

Surrounding people can [illegible].

Can you trust voice commands in an [illegible]? It can be [illegible] to talk to a machine or to the air in the presence of others.

The essence of all these problems is that we have to continuously detect voice sounds, dispersed by use of [illegible]

THE PROBLEM OF NORMAL SPEECH INTERFACE IS PHONATING SPEECH

Various Manners of Speech

[Figure labels: Vibration of Vocal Cords; Intention of Communication; Intention for Communication; Speech of the Articulated Respiratory Sound]


Endoscopy of Larynx in Various Conditions: Normal Speech, Whisper and NAM

[Figure: endoscopic views of the larynx. Panels: A. Resting; B. Breathing; C. Normal Speech of Vowels; D. Whispering]

Normal Speech (Radiated in the Air): The Fine Structure of Sound Source of the Vocal Fold

NAM

1. Slightly Opened Vocal Folds by Triangular Window

2. The Upper Structure is Bulging out and Covering the Vocal Folds

Stethoscopic Type NAM Microphone

[Figure: cross section of the microphone. Labels: Synthetic rubber; Micro echoic space; Small hole; Vibration and fixation plate; Condenser; Microphone amplifier]

Recognition by HMM


Re-estimation of the acoustic model (Monophones)

• We trained new HMM acoustic models using ideal NAM recording samples.

• A male speaker read the ATR 525 phonetically balanced sentences with NAM four times each (2,100 utterances in total) using HSLab from HTK, with a sampling rate of 16 kHz.

• We then used HCopy to convert them into 25th-order (12 MFCC + 12 ΔMFCC + Δpower) frames under the same conditions as for ordinary speech.

• Using these MFCCs as training data with the corresponding label files, we employed HERest to re-estimate male speech models for monophones using 16 Gaussian mixtures 9 times.
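The 25-dimensional frame layout described above (12 MFCC + 12 ΔMFCC + Δpower) can be sketched in NumPy. This is only an illustration of how such frames are assembled, not the HCopy implementation: it uses the common regression formula for delta coefficients, and the window size `theta=2`, the edge padding, and the function names `delta` and `make_25dim_frames` are assumptions of this sketch.

```python
import numpy as np

def delta(feat, theta=2):
    """Regression-based delta coefficients over a +/-theta frame window.

    feat has shape (n_frames, n_dims). Edges are padded by repeating the
    first and last frames (an assumption made for this sketch).
    """
    feat = np.asarray(feat, dtype=float)
    n = len(feat)
    padded = np.pad(feat, ((theta, theta), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, theta + 1))
    out = np.zeros_like(feat)
    for k in range(1, theta + 1):
        ahead = padded[theta + k : theta + k + n]   # frames t + k
        behind = padded[theta - k : theta - k + n]  # frames t - k
        out += k * (ahead - behind)
    return out / denom

def make_25dim_frames(mfcc, log_power, theta=2):
    """Stack 12 MFCC + 12 delta-MFCC + 1 delta-power into 25-dim frames."""
    mfcc = np.asarray(mfcc, dtype=float)
    log_power = np.asarray(log_power, dtype=float)
    d_mfcc = delta(mfcc, theta)
    d_pow = delta(log_power[:, None], theta)
    return np.hstack([mfcc, d_mfcc, d_pow])

# Hypothetical input: 100 frames of 12 MFCCs plus a log-power track.
rng = np.random.default_rng(0)
frames = make_25dim_frames(rng.normal(size=(100, 12)), rng.normal(size=100))
print(frames.shape)
```

Note that the static power term itself is not part of the frame in this layout; only its delta is appended.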

For the purpose of evaluation we used the Japanese Dictation Toolkit.

ENV.             Snt   Corr    Acc    Sub    Del    Ins    Err    S.Err
1. (quiet room)   24   93.61   93.33  4.72   1.67   0.28    6.67  50.00
2. MUSIC          24   91.11   90.00  6.67   2.22   1.11   10.00  62.50
3. TV-NEWS        24   89.72   89.17  9.17   1.11   0.56   10.83  66.67
Sum/Avg           72   91.48   90.83  6.85   1.67   0.65    9.17  59.72

1. In a quiet room

2. With background music at the volume we usually enjoy (a Bach concert)

3. With the sound of TV news
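The score columns in the results table are tied together by simple arithmetic: Corr is what remains after substitutions and deletions, Acc additionally subtracts insertions, and Err is the sum of the three error types. The sketch below checks this against two rows of the table; the function name `scores_from_error_rates` is hypothetical, and the definitions are the usual word-level scoring conventions, which the printed numbers are consistent with.

```python
def scores_from_error_rates(sub, dele, ins):
    """Derive Corr, Acc and Err (all in percent) from Sub, Del and Ins."""
    corr = 100.0 - sub - dele      # correct words
    acc = corr - ins               # accuracy also penalizes insertions
    err = sub + dele + ins         # total error rate, equal to 100 - acc
    return round(corr, 2), round(acc, 2), round(err, 2)

# Sub, Del, Ins values copied from the table above.
rows = {
    "quiet room": (4.72, 1.67, 0.28),
    "MUSIC": (6.67, 2.22, 1.11),
}
for name, (s, d, i) in rows.items():
    print(name, scores_from_error_rates(s, d, i))
```

Applied to the TV-NEWS row, the same arithmetic gives Acc 89.16 and Err 10.84, which differ from the printed 89.17 and 10.83 only by rounding of the tabulated percentages.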

Iterative Supervised Adaptation (1)

Re-estimation of Acoustic Model for NAM

Iterative Supervised Adaptation (2)


Iterative Supervised Adaptation (3)

Let's Utilize Voluntary Body Noise!

Tapping the top of the head, forehead, cheek, jaw, side of the neck and chest

1. Sniff

2. Cough

3. Clicking teeth

4. Tut

5. Grinding teeth

6. Scratching head

Can You Imagine Inspiratory Speech?

[Figure and results residue; recoverable values: Err 59.04; S.Err 100]

Can we also use normal speech together with NAM?

[Figure labels: NAM (2050 snt.); Body Transmitted Ordinary Speech (2050 snt.); Acc 40.96]


The prosody of NAM

[Figure: F0 plots. Panel labels: Calibration; Ordinary Speech]

Time Series Imaging of the Movement of the Echoic Shadow (The Shadow of the Lower Edge of Thyroid Cartilage)

Cropping a 1-pixel-wide strip from every frame (30 frames/sec) and merging the strips in time order
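The cropping-and-merging step can be sketched as a slit-scan over a stack of video frames: one pixel column is taken from each frame and the columns are placed side by side, so the horizontal axis becomes time. This is a minimal illustration on synthetic data; the function name `slit_scan`, the array layout, and the choice of column are assumptions, not details given on the poster.

```python
import numpy as np

def slit_scan(frames, column):
    """Build a time-series image from a stack of grayscale frames.

    frames has shape (n_frames, height, width). The result has shape
    (height, n_frames): each output column is the 1-pixel-wide strip at
    x = column taken from one frame, in time order.
    """
    frames = np.asarray(frames)
    strips = frames[:, :, column]   # shape (n_frames, height)
    return strips.T                 # time runs left to right

# Synthetic one-second clip at 30 frames/sec: a bright spot moving downward.
n_frames, height, width = 30, 8, 6
clip = np.zeros((n_frames, height, width), dtype=np.uint8)
for t in range(n_frames):
    clip[t, t % height, 3] = 255

image = slit_scan(clip, column=3)
print(image.shape)  # (8, 30)
```

At 30 frames per second, each second of video contributes 30 pixel columns to the merged image, so vertical movement of a feature in the chosen column appears as a trace over time.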

Conclusion and What's New

We developed a NAM microphone (stethoscopic type; attached to the skin).

We found the ideal sensing position for NAM recognition.

We retrained an acoustic model with NAM and tested the practical use of this method (NAM acoustic model).

We propose the laryngeal elevation index (LEI), a new index of prosody, which can show the prosody of NAM without F0, using simple processing of images from medical ultrasonography.

We defined NAM, which has never been used for input or communication, and propose that we should make [illegible]