Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo

Matsuo Laboratory, 大野峻典

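Motivation: building a chatbot

• Customer: "I'd like to order a pizza."
  – → 🤖 …{ Intent: order pizza }

• (Pizza order flow starts) {required entities: "type", "location", "time"}

• 🤖 "Please tell me the pizza type, the delivery address, and the time."

• Customer: "A Margherita pizza, delivered to Tokyo OO-XX-OO, please."
  – → 🤖 …{Entities: {type: "Margherita pizza", location: "Tokyo OO-XX-OO", time: ""}}

• 🤖 "A Margherita pizza to Tokyo OO-XX-OO. What time would you like it?"

• Customer: "19:30, please."
  – → 🤖 …{Entities: {type: "Margherita pizza", location: "Tokyo OO-XX-OO", time: "19:30"}}

• 🤖 "Your order is confirmed: a Margherita pizza to Tokyo OO-XX-OO at 19:30."

• What this requires: understanding the intent of an utterance, and understanding the entity label that corresponds to each word.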
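
The dialogue above can be read as a small state machine: each turn merges newly extracted entities into a state, and the bot asks for whatever is still missing. The following is only a minimal sketch of that idea; the slot names, `update_state`, and `missing_slots` are hypothetical and do not come from the paper.

```python
# Hypothetical dialogue state for the pizza example; all names are illustrative.
REQUIRED_SLOTS = ("type", "location", "time")

def update_state(state, entities):
    """Merge newly extracted, non-empty entities into the dialogue state."""
    merged = dict(state)
    for name, value in entities.items():
        if name in REQUIRED_SLOTS and value:
            merged[name] = value
    return merged

def missing_slots(state):
    """Return the required slots that have not been filled yet."""
    return [name for name in REQUIRED_SLOTS if not state.get(name)]

state = {}
state = update_state(state, {"type": "Margherita pizza",
                             "location": "Tokyo OO-XX-OO",
                             "time": ""})
first_missing = missing_slots(state)   # the bot should now ask for the time
state = update_state(state, {"time": "19:30"})
```

Intent detection decides which set of required slots applies; slot filling supplies the `entities` dictionary at each turn.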


Bibliographic information

• Title: "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"
  – https://arxiv.org/pdf/1609.01454.pdf

• Authors: Bing Liu, Ian Lane
  – Carnegie Mellon University

• Published: 6 Sep 2016

• Accepted at Interspeech 2016

• Note: unless otherwise stated, all material is taken from the paper above and its accompanying slides and video.

Abstract

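• The paper proposes attention-based neural network models that perform intent detection and slot filling.

• Unlike machine translation or speech recognition, slot filling has an explicit alignment between input words and output labels.
  – The authors explore several ways of incorporating this alignment information into the encoder-decoder framework.

• The attention information is used both for intent classification and for slot label prediction.

• Results: state of the art on the ATIS task in both intent classification error rate and slot filling F1 score.
  – Joint training gives a 0.56% absolute error reduction on intent classification and a 0.23% absolute gain on slot filling over the independently trained models.

• Keywords: Spoken Language Understanding, Slot Filling, Intent Detection, Recurrent Neural Network, Attention Model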


Introduction

• The paper draws on two types of sequence models.
  – ① Intent detection and slot filling (from spoken language understanding)
  – ② Attention-based encoder-decoder (from machine translation / speech recognition)


Introduction

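• ① Two key tasks in Spoken Language Understanding (SLU)
  – Intent detection/classification: identifying the speaker's intent
    • A semantic utterance classification problem.
    • Has been addressed with SVMs and DNNs.
  – Slot filling: extracting the semantically important constituents
    • A sequence labeling task.
    • Has been addressed with maximum entropy Markov models, conditional random fields, and RNNs.
  – Recently, joint models that perform both intent detection and slot filling in a single model have been proposed.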
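
To make the two tasks concrete, the sketch below shows what one annotated utterance might look like: one intent label for the whole sentence and one BIO slot tag per word. The sentence, tags, and the helper `extract_slots` are chosen for illustration in the ATIS style and are not taken from the slides.

```python
def extract_slots(words, tags):
    """Group BIO-tagged words into a {slot_type: text} dictionary."""
    slots, current, kind = {}, [], None
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == kind:
            current.append(word)          # continue the open slot
            continue
        if kind is not None:
            slots[kind] = " ".join(current)   # close the open slot
        if tag.startswith("B-"):
            current, kind = [word], tag[2:]   # open a new slot
        else:
            current, kind = [], None
    if kind is not None:
        slots[kind] = " ".join(current)
    return slots

# Illustrative ATIS-style example: one tag per word, one intent per utterance.
utterance = {
    "words": "first class fares from boston to denver".split(),
    "slots": ["B-class_type", "I-class_type", "O", "O",
              "B-fromloc", "O", "B-toloc"],
    "intent": "airfare",
}
slots = extract_slots(utterance["words"], utterance["slots"])
```

Slot filling predicts the `slots` list (a sequence labeling problem), while intent detection predicts the single `intent` value (a classification problem).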


Introduction

• ② Encoder-decoder models with attention, from machine translation and speech recognition
  – The input sequence is encoded into a vector representation, which is then decoded to generate the output sequence (sequence learning).
  – "Neural machine translation by jointly learning to align and translate" (D. Bahdanau, K. Cho, and Y. Bengio) [12]
    • Proposes an encoder-decoder model whose attention mechanism lets it learn to align and decode at the same time.


Introduction

• Summarizing the strengths of these sequence models:
  – Attention-based encoder-decoder models can map between sequences of different lengths without being given any alignment information. (②)
  – In slot filling the alignment is given explicitly, so alignment-based RNN models work well. (①)

• This paper considers combining ① and ②:
  – How can the alignment information in slot filling be used in an encoder-decoder model?
  – How can the attention mechanism of encoder-decoder models be used in slot filling?
  – Given both, how should a joint model for slot filling and intent detection be designed?


Background > RNN for Slot Filling

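• Slot filling
  – Learn a function f that maps an input sequence X to a label sequence Y.
  – x and y have the same length, so the alignment is explicit.

• An RNN for slot filling reads one word at each time step and emits the one corresponding slot label.
  – The label is predicted from all the information in the input words and in the labels emitted so far.
  – Formally, the model learns the parameters θ that maximize the likelihood

$$\hat{\theta} = \arg\max_{\theta} \prod_{t=1}^{T} P(y_t \mid y_1^{t-1}, \mathbf{x}; \theta)$$

    • x: the input word sequence; y_1^{t-1}: the output label sequence from position 1 to t-1.
  – At inference time, for an input x the model finds the label sequence ŷ satisfying

$$\hat{y} = \arg\max_{y} P(y \mid \mathbf{x})$$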
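
The per-step behaviour described above (read one word, emit one label, condition on earlier labels) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain tanh RNN rather than an LSTM, random untrained weights, and a greedy per-step argmax as an approximation to the sequence-level argmax. The function name and weight names are hypothetical.

```python
import numpy as np

def greedy_tag(x_emb, W_x, W_h, W_y, W_o):
    """Emit one label id per input word, feeding back the previous label.

    x_emb: (T, d) word embeddings; W_x: (H, d); W_h: (H, H);
    W_y: (H, L) weights for the previous one-hot label; W_o: (L, H).
    """
    n_labels = W_o.shape[0]
    h = np.zeros(W_h.shape[0])
    prev = np.zeros(n_labels)            # no previous label at the first step
    labels = []
    for x_t in x_emb:
        h = np.tanh(W_x @ x_t + W_h @ h + W_y @ prev)
        y_t = int(np.argmax(W_o @ h))    # greedy choice for this step
        labels.append(y_t)
        prev = np.zeros(n_labels)
        prev[y_t] = 1.0
    return labels

rng = np.random.default_rng(0)
T, d, H, L = 5, 3, 4, 6                  # toy sizes, not the paper's settings
labels = greedy_tag(rng.normal(size=(T, d)), rng.normal(size=(H, d)),
                    rng.normal(size=(H, H)), rng.normal(size=(H, L)),
                    rng.normal(size=(L, H)))
```

The output always has exactly one label per input word, which is the explicit alignment the slide refers to.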


Background > RNN Encoder-Decoder

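• Encoder:
  – input sequence (x1, …, xT) → vector c
  – The vector c encodes the meaning of the whole input sequence.

• Decoder:
  – Generates the target sequence from the vector c.
  – The decoder defines the probability of the output sequence as

$$P(\mathbf{y}) = \prod_{t=1}^{T} P(y_t \mid y_1^{t-1}, c)$$

• Unlike the RNN for sequence labeling on the previous slide, an encoder-decoder model can map between sequences of different lengths, and it has no explicit alignment between input and output.

• → "Neural machine translation by jointly learning to align and translate" (D. Bahdanau, K. Cho, and Y. Bengio) [12] proposes an attention mechanism that lets the encoder-decoder model learn a soft alignment and decode at the same time.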


Proposed Methods

• Two approaches are introduced:
  – ① Integrating alignment information into the encoder-decoder architecture to perform slot filling and intent detection.
  – ② Applying the attention mechanism of the encoder-decoder architecture to an alignment-based RNN model.


Proposed Methods > Encoder-Decoder Model with Aligned Inputs ①

• ① Integrating alignment information into the encoder-decoder architecture to perform slot filling and intent detection.



• Slot filling: input words x=(x1,…,xT) → labels y=(y1, …, yT)

• The encoder is a bidirectional RNN.
  – It reads the input sequence in both the forward and backward directions.
  – Forward: produces a hidden state fhi at each time step.
  – Backward: reads from the end and produces hidden states (bhT,…,bh1).
  – The final hidden state hi at each step is the concatenation of fhi and bhi (i.e. hi=[fhi, bhi]).

• LSTM is used as the RNN unit.

• The last state of the backward encoder RNN is used to initialize the decoder hidden state [12].
  – The last states of the forward and backward encoder RNNs carry information about the whole sentence.
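
The bidirectional encoder described above can be sketched in a few lines of NumPy. As a simplifying assumption, a plain tanh RNN cell stands in for the LSTM used in the paper, and the weights are random; `rnn_states` and `birnn_encode` are hypothetical helper names.

```python
import numpy as np

def rnn_states(xs, W_x, W_h):
    """Run a simple tanh RNN over xs (T, d) and return all states (T, H)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

def birnn_encode(xs, fwd_weights, bwd_weights):
    """Return h_i = [fh_i, bh_i] for every position, plus the last backward state."""
    fh = rnn_states(xs, *fwd_weights)                 # fh_1 .. fh_T
    bh = rnn_states(xs[::-1], *bwd_weights)[::-1]     # re-ordered to bh_1 .. bh_T
    h = np.concatenate([fh, bh], axis=1)              # (T, 2H)
    last_backward = bh[0]   # produced last, after the backward RNN has read x_1
    return h, last_backward

rng = np.random.default_rng(0)
T, d, H = 4, 3, 5                                     # toy sizes
xs = rng.normal(size=(T, d))
fwd = (rng.normal(size=(H, d)), rng.normal(size=(H, H)))
bwd = (rng.normal(size=(H, d)), rng.normal(size=(H, H)))
h, last_backward = birnn_encode(xs, fwd, bwd)
```

Note that the last backward state sits at position 1 of the sentence, because the backward RNN finishes reading there.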



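• The decoder is a unidirectional RNN.
  – At each time step, the decoder state si is computed from the previous state si-1, the previous label yi-1, the aligned encoder hidden state hi, and the context vector ci (hi serves as the explicit aligned input at each decoding step):

$$s_i = f(s_{i-1}, y_{i-1}, h_i, c_i)$$

  – The context vector ci is a weighted sum of the encoder states h=(h1, …, hT):

$$c_i = \sum_{j=1}^{T} \alpha_{i,j} h_j$$

    • It indicates which parts of the input sentence the decoder should attend to.
    • The weights α are computed as follows, where g is a feed-forward neural network:

$$\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}, \qquad e_{i,k} = g(s_{i-1}, h_k)$$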
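
The context vector computation above can be sketched as follows. The slides only say that g is a feed-forward network; the one-hidden-layer form `v · tanh(W_s s + W_h h_k)` used here is an assumption in the style of [12], and all weight names are hypothetical.

```python
import numpy as np

def attention_context(s_prev, h, W_s, W_h, v):
    """Compute attention weights over encoder states and the context vector.

    s_prev: (S,) previous decoder state; h: (T, D) encoder states;
    W_s: (A, S), W_h: (A, D), v: (A,) parameters of the scoring network g.
    """
    scores = np.tanh(h @ W_h.T + W_s @ s_prev) @ v    # e_{i,k} for k = 1..T
    scores = scores - scores.max()                    # for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()     # softmax over positions
    context = alpha @ h                               # weighted sum of h_k
    return context, alpha

rng = np.random.default_rng(0)
T, D, S, A = 5, 8, 4, 6                               # toy sizes
h = rng.normal(size=(T, D))
context, alpha = attention_context(rng.normal(size=S), h,
                                   rng.normal(size=(A, S)),
                                   rng.normal(size=(A, D)),
                                   rng.normal(size=A))
```

Because the weights are a softmax, they are positive and sum to one, so the context vector is a convex combination of the encoder states.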



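• To make this a joint model for intent detection and slot filling, a decoder for intent detection is added (the top-right cell of the architecture in Fig. 2).
  – The encoder is shared with slot filling.
  – It produces a single output, so no alignment information is needed.
  – It is a function of the initial decoder state s0 of the slot filling decoder (which encodes the whole sentence) and the context vector cintent (which indicates the parts of the input sentence the intent decoder should attend to).

• During training, the errors from both the intent detection decoder and the slot filling decoder are back-propagated.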
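
One simple way to realize "errors from both decoders are back-propagated" is to add the two losses into a single training objective. The sketch below assumes an unweighted sum of cross-entropy terms; the slides do not state how the two costs are combined, so treat the weighting as an assumption. Function names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def joint_loss(slot_logits, slot_targets, intent_logits, intent_target):
    """Sum of per-word slot losses plus the single intent loss (assumed unweighted)."""
    slot_loss = sum(cross_entropy(l, t) for l, t in zip(slot_logits, slot_targets))
    intent_loss = cross_entropy(intent_logits, intent_target)
    return slot_loss + intent_loss

# Toy check with uniform predictions: 3 words, 4 slot labels, 2 intents.
loss = joint_loss(np.zeros((3, 4)), [0, 1, 2], np.zeros(2), 1)
```

Since the encoder is shared, gradients of this single scalar reach the encoder parameters through both decoders.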


Proposed Methods > Attention-Based RNN Model ②

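• ② Applying the attention mechanism of the encoder-decoder architecture to an alignment-based RNN model.
  – Sequence labeling with a bidirectional RNN (BiRNN).
  – At each step, the model uses not only the aligned hidden state hi but also a context vector ci.
    • Each hidden state carries information about the whole sentence, but information about distant words is gradually lost; the question is whether ci can supply it.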


Proposed Methods > Attention-Based RNN Model ②

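• The BiRNN reads the input sentence in both the forward and backward directions. LSTM cells are again used as the RNN unit.

• Slot label dependencies are modeled in the forward RNN.

• As in the encoder of the encoder-decoder architecture, the hidden state hi is the concatenation of fhi and bhi.
  – Each hi contains information about the whole input sequence, with a focus on the words around position i.

• hi is combined with the context vector ci to classify the label. (As in the encoder-decoder architecture, ci is a weighted sum of h=(h1,…,hT).)

• Intent detection reuses the hidden states h computed above.
  – Without attention: mean-pooling is applied over h, followed by logistic regression for classification.
  – With attention: a weighted average of the hidden states h is used instead.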
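
The two intent detection variants above differ only in how the hidden states are pooled before classification. The sketch below illustrates this under assumptions: the attention scores are computed with a single learned vector `u` (the slides do not specify the scoring function), and the classifier is a softmax layer with hypothetical weights `W` and `b`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def intent_probs(h, W, b, u=None):
    """Classify intent from BiRNN states h (T, D).

    u is None -> mean-pooling over time (no attention).
    u given   -> attention-weighted average with weights softmax(h @ u).
    """
    if u is None:
        pooled = h.mean(axis=0)
    else:
        pooled = softmax(h @ u) @ h
    return softmax(W @ pooled + b)

rng = np.random.default_rng(0)
T, D, n_intents = 6, 8, 3                             # toy sizes
h = rng.normal(size=(T, D))
W, b = rng.normal(size=(n_intents, D)), np.zeros(n_intents)
p_mean = intent_probs(h, W, b)
p_attn = intent_probs(h, W, b, u=rng.normal(size=D))
p_uniform = intent_probs(h, W, b, u=np.zeros(D))      # uniform attention weights
```

With all-zero scores the attention weights are uniform, so the attention variant reduces to mean-pooling; attention generalizes it by letting some positions count more.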


Proposed Methods

• Compared with the attention-based encoder-decoder model with aligned inputs (①), the attention-based RNN model (②) is more computationally efficient.
  – During training, the encoder-decoder slot filling model (①) reads the input sequence twice, whereas the attention-based RNN model (②) reads it only once.


Experiments > Data

• ATIS (Airline Travel Information Systems) dataset, using the same setup as [6,7,9,19].
  – Training set: 4978 utterances from the ATIS-2 and ATIS-3 corpora
  – Test set: 893 utterances from ATIS-3 NOV93 and DEC94
  – Number of slot label types: 127; number of intent types: 18
  – Evaluation
    • Slot filling: F1 score.
    • Intent detection: classification error rate.

• An additional ATIS set used in [9,20] was also obtained.
  – 5138 utterances
  – Number of slot label types: 110; number of intent types: 21
  – As in [9,20], 10-fold cross-validation was performed.
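
The two evaluation metrics listed above can be sketched as follows. The slides do not say which scoring script was used; this sketch assumes slot F1 is computed over whole slot chunks (a predicted slot counts only if its type and span both match), as in common CoNLL-style evaluation. The function names are hypothetical.

```python
def bio_chunks(tags):
    """Return the set of (slot_type, start, end) spans in a BIO tag sequence."""
    chunks, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):      # sentinel closes the last chunk
        inside = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not inside:
            chunks.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return chunks

def slot_f1(gold_seqs, pred_seqs):
    """Chunk-level F1 over a corpus of gold and predicted tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_chunks(gold), bio_chunks(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

def intent_error_rate(gold, pred):
    """Fraction of utterances whose predicted intent is wrong."""
    return sum(g != p for g, p in zip(gold, pred)) / len(gold)

f1 = slot_f1([["B-fromloc", "O", "B-toloc"]], [["B-fromloc", "O", "O"]])
err = intent_error_rate(["flight", "airfare", "flight", "flight"],
                        ["flight", "airfare", "flight", "airfare"])
```

In the toy example, one of two gold slots is found with no false positives, so precision is 1, recall is 0.5, and F1 is 2/3; one of four intents is wrong, so the error rate is 0.25.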


Experiments > Training Procedure

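• The LSTM implementation follows [21].
• The number of units in the LSTM cell is set to 128.
• The forget gate bias is set to 1. [22]
• Only a single LSTM layer is used. (Stacking LSTM layers to build deeper models is left to future work.)

• Word embeddings of size 128 are randomly initialized and fine-tuned during mini-batch training with batch size 16.

• Dropout with rate 0.5 is applied to the non-recurrent connections during training.
• The maximum norm for gradient clipping is set to 5.
• Adam is used for optimization.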
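
The settings above can be collected into one configuration, and the gradient clipping step can be illustrated directly. The dictionary keys are hypothetical names, and clipping by the global norm across all gradients is an assumption; the slides only give the maximum norm of 5.

```python
import numpy as np

# Hyperparameters as listed on the slide; key names are illustrative.
HPARAMS = {
    "lstm_units": 128,
    "lstm_layers": 1,
    "forget_gate_bias": 1.0,
    "embedding_size": 128,
    "batch_size": 16,
    "dropout_rate": 0.5,      # non-recurrent connections only
    "max_grad_norm": 5.0,
    "optimizer": "adam",
}

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]     # joint norm is 13
clipped, norm_before = clip_by_global_norm(grads, HPARAMS["max_grad_norm"])
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
```

Gradients whose joint norm is already below the threshold are left unchanged; larger ones are scaled down while keeping their direction.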


Experiments > Independent Training Model Results: Slot Filling

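• The proposed models trained independently on slot filling only.

• Comparing the top two rows, (a) and (b), alignment information does appear to be necessary for this task.

• Comparing (b) and (c), attention contributes slightly to accuracy.
  – The attention was mostly spread evenly over the whole sentence, but in some cases it did seem to improve accuracy.
  – Example: the attention when predicting the slot for "noon" (darker means stronger attention). By attending to "flight", "cleveland", and "dallas", the model arrives at the slot label "B-depart_time.period_of_day".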



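• The proposed models trained independently on slot filling only.

• The bottom two rows are the models from Section 3.2.

• Adding attention improves accuracy only marginally.
  – → For utterances as short as those in ATIS, the hidden state hi seems able to encode the whole-sentence information needed for slot labeling even without attention.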



• The slot filling models were compared with previous approaches.

• Both proposed models outperform the previous ones.


Experiments > Independent Training Model Results: Intent Detection

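• Comparison with previous models on intent classification error
  – The proposed models beat the existing state of the art by a large margin.

• The attention-based encoder-decoder intent model outperformed the bidirectional RNN model (bottom two rows of the table).
  – This may be due to the sequence-level information passed from the encoder and the extra non-linear layer added in the decoder RNN (where cintent is computed).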


Experiments > Joint Model Results

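• Accuracy of the joint models that perform both tasks
  – With joint training, the encoder-decoder architecture improved over independent training by 0.09% (absolute) on slot filling and 0.45% (absolute) on intent detection.
  – With joint training, the attention-based bidirectional RNN improved over independent training by 0.23% (absolute) on slot filling and 0.56% (absolute) on intent detection.
  – → The attention-based bidirectional RNN benefits more from joint training.

• On the additional data with 10-fold cross-validation, both proposed methods also achieved good accuracy.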


Conclusions

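• The paper explored how to use alignment information in attention-based encoder-decoder neural network models that perform slot filling and intent detection together, and also proposed an attention-based bidirectional RNN model.

• A practical benefit when building a dialogue system: a single joint model suffices instead of two separate models.

• The proposed methods achieved state-of-the-art accuracy on ATIS.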

