development of a toolkit handling sunao harajulius am lm am lm am lm qadb qadb qadb synthesized...

Post on 28-Sep-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

HTTP Request

hts_engine

Cache check

Speech file - PCM - 16bit - 16kHz

Speech recognizer

N-best of ASR

hypothesis

Response generator

Dialog response

text

Speech recognizer

Speech recognizer

N-best of ASR

hypothesis

N-best of ASR

hypothesis

Response generator

Response generator

Dialog response

text

Dialog response

text

Task Classifi- cation

Text-to-speech

Recorded speech log

HTTP Response

Speech file’s URL

Dialog response

text

Dialog response

text

Synthesized speech log

Julius

AM LM

AM LM

AM LM

QADB

QADB

QADB

Synthesized speech file

Device ID

System=“Takemaru-kun” (jp… User=1D5C0439-81E2-44EF-B715-4…

System> Hello. User> Hello.

System> My name is Takema... User> Your name?

System> The nearest station… User> Where is the nearest station?

System=“Kita-chan”

(jp.naist.spalab.Kitachan)

User=1D5C0439-81E2-44EF-B715-

4CC0CFF0171A

Kita-chan>Are you finding

“Gakken-Kita-Ikoma” station? Takemaru-kun>The nearest station is

“Gakken-Kita-Ikoma” station.

Kita-chan

Conclusion / Future works

Sunao Hara Okayama University, JAPAN

Hiromichi Kawanami Hiroshi Saruwatari Kiyohiro Shikano Nara Institute of Science and Technology, JAPAN

Development of a toolkit handling

multiple speech-oriented guidance

agents for mobile applications

Main Agent

HTTP Response

HTTP Request

Speech file Dialog Management

Sub Agent

Dialog Management

Dialog response

Including task

classifica-tion

Including task

classifica-tion

Dialog response

Delegated agent’s URIs

HTTP Request

Delegated agent’s URI

HTTP Response

Dialog response

URI

Device ID

Device ID

Sub-agent URIs

Dialog response

Selection of subagents Location specific service using GPS (or A-GPS) Experiments/Evaluations under real environment

A Toolkit for developing a Spoken Dialog System (SDS)

Server-and-Client architecture Each server can communicate with the other servers

Aim of this study SDS as a “generalist” is constructed by a huge amount of SDSs as a “specialist” Specialist: Task-dependent dialog, location-dependent dialog, etc.

Approach Single main agent (“generalist”) and multiple sub agents (“specialists”) Client system: iTakemaru (Takemaru-kun for iPhone)

Web 1.0 Web 2.0 Web 3.0 Web 4.0? Information publishing

(one-side) URI is mainly used as URL Hyper-linking: definition for

implicitly relationships between resources

Bookmark (Favorite): Tools for remembering the resource’s URI

Information publishing (interactive)

Concept of usability and sharing was being important

Wiki-wiki, Web-API, Mash-up services, Information Tagging, etc.

Information publishing (collaborative)

Cloud: access to user’s data from any places

Social: information sharing with others

SNS, Cloud computing, Crowd sourcing, etc.

Information explosion: rapidly increasing in the amount of published information; e.g. “Total Recall”

Technology of Information Retrieval will be having more importance

Librarian (Concierge) of each database or websites may be promising

Lifelog, MyLifeBits (Microsoft), etc.

Development of a toolkit handling multiple speech-oriented guidance agents for mobile applications

Single main agent and multiple sub agents

System

In this study, we propose a toolkit to handle multiple speech-oriented guidance agents for mobile applications. The basic architecture of the toolkit is server-and-client architecture. We assumed the servers are located on a cloud-computing environment, and the clients are mobile phones, such as the iPhone. It is difficult to develop an omnipotent spoken dialog system, but it is easy to develop a spoken dialog agent that has limited but deep knowledge. If such limited agents could communicate with each other, a spoken dialog system with wide-ranging knowledge could be created.

Now

top related