Real-Time Generation of Topic Maps from Speech Streams. TMRA'05 International Workshop on Topic Maps Research and Applications, 06.10.2005. Karsten Böhm, Lutz Maicher, University of Leipzig (boehm|maicher@informatik.uni-leipzig.de)


TRANSCRIPT

Page 1:


Real-Time Generation of Topic Maps from Speech Streams

TMRA'05 International Workshop on Topic Maps Research and Applications

06.10.2005

Karsten Böhm, Lutz Maicher

University of Leipzig

boehm|maicher@informatik.uni-leipzig.de

Page 2:


Introduction

● Topic Maps are a means for
– representing (powerful) indexes
– of any information collection
– for semantic information integration

● Our goal:
– real-time generation of conceptual indexes of speech streams,
– represented as Topic Maps,
– for integration with other information systems

Page 3:


How to create Topic Maps

● Topic Maps are a semantic technology ...
... only from the perspective of information integration
– holding the Co-location objective always true
– "Subject Proxies indicating identical Subjects have to be viewed as merged ones"
● Subject Equality Decision Approach
● Subject Viewing Approach

● We have to represent the created indexes so that the Co-location objective holds true from the perspective of the creator ...

... and therefore we need a theoretical foundation.

Page 4:


Are there any elks?

Subject Equality Decision Chain

1. World without any sensory system

2. Sensory systems come on stage, catching Subject Stages

From the child's perspective ("Elks are sweet."): I always caught the same Subject, an elk.

From the ranger's perspective ("Bernd needs a cow."): I caught Lisa, Ud (fighting), and Bernd (in summer, in winter, and as a calf).

From the zoologist's perspective ("Elks are loners."): I caught two deer and three elks.

Subject Identity: The decision about Subject Identity is a perspective-dependent process under uncertainty about whether Subject Stages caught on different occasions belong to the same Subject.

Page 5:


Are there any elks?

Subject Equality Decision Chain

1. World without any sensory system

2. Sensory systems come on stage, catching Subject Stages

3. Documenting the impressions (from the ranger's perspective)

(1) Subjectness: I'm only interested in Lisa, Ud, and Bernd, not in snow or trees.

(2) Create Subject Proxies for the current Subject Stages of Lisa, Ud, and Bernd.

(3) Try to document the decision about the Subject Identity of the current Subject Stage by the given means of the governing SMD ontology, TMV ontology and TMV vocabulary. The Subject Identity of Subject Stages is mapped to the Subject Indication of the Subject Proxy.

(4) Document all further information observed about the Subject Stage. (Documenting = modelling = losing information)

4. Subject Equality is decided according to the governing SMD

Page 6:


Subject Equality Decision Chain

Co-Location Objective: Subject Proxies indicating identical Subjects have to be viewed as merged ones.

1. World without any sensory system
– How to make a qualified assertion about the very nature of Subjects?

2. Sensory systems come on stage, catching Subject Stages
– Never Subjects, only Subject Stages (see Quine) are observed
– Subject Identity = Subject Stages caught on different occasions belong to the same Subject (see Vatant's hubjects)
– perspective-dependent (see Biezunski)
– decision process under uncertainty

3. Documenting the impressions from a perspective
– Subjectness in the current perspective
– observations are documented, restricted by the available vocabulary (SMD ontology, TMV ontology, TMV vocabulary)
– the decision about Subject Identity is documented according to the governing Subject Indication Approach

4. Subject Equality is decided according to an SMD

Page 7:


The Observation Principle

... or how to create Topic Maps from digital domains?

(1.) Observe the information collections of interest (texts, video streams, etc.) and detect Subject Stages of Subjects of interest from the current perspective.

(2.) Decide about the Subject Identity of the observed Subject Stages.

(3.) Create a Subject Proxy for each Subject Stage of interest.

(4.) Document the decision about the Subject Identity of the current Subject Stage by the given means of the governing SMD ontology, TMV ontology and TMV vocabulary ( ... and with respect to all expected Subject Equality Decision Approaches applied later to this Subject Proxy).

(5.) Document all further information observed about the Subject Stage by the given means of the governing SMD ontology, TMV ontology and TMV vocabulary.
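The five steps of the Observation Principle can be illustrated with a toy sketch. It is not part of the original slides: the class names `SubjectStage` and `SubjectProxy`, the `term:` identifier scheme, and the rule that identity is the case-folded term are all assumptions made for illustration. A real system would document identity with the vocabulary of its governing SMD.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectStage:
    """One observation of a subject on one occasion (hypothetical type)."""
    surface: str   # what was observed, here a term in a token sequence
    occasion: int  # where it was observed, here the token position


@dataclass
class SubjectProxy:
    """Documents exactly one Subject Stage (hypothetical type)."""
    identifier: str                                  # documented identity decision
    properties: dict = field(default_factory=dict)   # further observed information


def observe(tokens, of_interest):
    # Step 1: detect Subject Stages of Subjects of interest in the collection.
    return [SubjectStage(t, i) for i, t in enumerate(tokens)
            if t.lower() in of_interest]


def decide_identity(stage):
    # Step 2: decide about Subject Identity. Assumption: in this toy
    # perspective, two stages belong to the same Subject if the
    # case-folded terms are equal.
    return "term:" + stage.surface.lower()


def create_proxies(tokens, of_interest):
    proxies = []
    for stage in observe(tokens, of_interest):
        identity = decide_identity(stage)
        # Steps 3 and 4: one proxy per stage, carrying the identity decision.
        proxy = SubjectProxy(identifier=identity)
        # Step 5: document all further information about the stage.
        proxy.properties = {"surface": stage.surface, "occasion": stage.occasion}
        proxies.append(proxy)
    return proxies


tokens = "Bernd met Lisa while bernd was a calf".split()
proxies = create_proxies(tokens, of_interest={"bernd", "lisa"})
for p in proxies:
    print(p.identifier, p.properties)
```

Two of the three proxies end up with the same identifier. A Subject Equality decision applied later would therefore view them as merged, even though each proxy was created for a separate Subject Stage.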

Page 8:


The Semantic Talk System

● Focuses on the support of group-oriented conversation
● Implementation of a "minimally invasive" IT solution
● Application for interviewing scenarios, innovation processes and early stages of product development
● Semantic Talk creates powerful, conceptual indexes of speech streams in real time
● Combines speech recognition (LinguaTec's VoicePro) with text mining algorithms
● Provides dynamic visualization (extended version of TouchGraph)
● Networked application with multiple clients
● Provides a generic RDF export
● Cooperation with University Duisburg-Essen, ISA Informationssysteme GmbH

Page 9:


Semantic Talk: Speech recognition and text mining

[Screenshot of the user interface; labelled areas:]
– Sliders for configuration parameters (zooms)
– Local context window
– Overview window (bird's-eye view)
– Window for additional information (documents, pictures)

Page 10:


The Semantic Talk System

[Architecture diagram; components:]
– Speech recognition 1 (VoicePro) ... Speech recognition n
– Integration and Serialization
– Topic & Association Extraction
– Background Knowledge with Semantic Relations
– Visualization component (sample node labels: abc, foo, cdf, topic3, xyz)
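The data flow in the diagram (several recognizers, one serialized stream, then topic and association extraction) can be sketched roughly as follows. The slides do not describe the actual text mining algorithms, so this sketch swaps in plain co-occurrence counting. The background knowledge is reduced to a flat vocabulary set, and the tuple layout `(timestamp, speaker, text)` and the function names are assumptions for illustration only.

```python
import heapq
from collections import Counter
from itertools import combinations


def serialize(*streams):
    """Merge several time-ordered utterance streams into one sequence.

    Each stream is a list of (timestamp, speaker, text) tuples that is
    already sorted by timestamp (assumed output of one recognizer).
    """
    return list(heapq.merge(*streams, key=lambda utterance: utterance[0]))


def extract(utterances, background):
    """Count candidate topics and associations.

    A topic candidate is any word found in the background vocabulary.
    An association candidate is a pair of topics within one utterance.
    """
    topics = Counter()
    associations = Counter()
    for _, _, text in utterances:
        terms = sorted({word.lower() for word in text.split()} & background)
        topics.update(terms)
        associations.update(combinations(terms, 2))
    return topics, associations


recognizer_1 = [(0.0, "A", "Fisichella wins the race"),
                (2.0, "A", "the car was fast")]
recognizer_n = [(1.0, "B", "race car setup")]
background = {"fisichella", "race", "car"}

merged = serialize(recognizer_1, recognizer_n)
topics, associations = extract(merged, background)
print(topics.most_common())
print(associations.most_common())
```

The resulting counters are the kind of weighted node and edge data a visualization component could draw as a graph.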

Page 11:


Semantic Talk creates indexes of speech streams. We have to represent them as Topic Maps and use them for semantic information integration.

Page 12:


From RDF-output to LTM

<st:node rdf:ID="node_Fisichella">
  <st:ID>160615</st:ID>
  <st:label>Fisichella</st:label>
  <st:nodelevel>1</st:nodelevel>
  <st:ref_wort_nr rdf:resource="http://www.tt.de/dtd/st/pap#node_160615"/>
  <st:variant st:index="3" st:type="4" st:weight="0.3176"/>
</st:node>

Semantic Talk (ST) observed a noticeable usage of the term "Fisichella" in the speech stream ...

Semantic Mapping between RDF-output and Topic Map using the Omnigator ...

[id7406 : id7276 = "Fisichella"
  @"http://www.texttech.de/dtd/st/pap#node_160615"
  @"http://www.texttech.de/dtd/st/pap#node_Fisichella"]
{id7406, id3670, [[1]]}
{id7406, id7650, [[160615]]}
id7549( id7406 : id463, id464 : id2195 )

[id464]
{id464, id1636, [[0.31766722453166335]]}
{id464, id4378, [[3]]}
{id464, id787, [[4]]}

... and this 'noticeable usage of the term Fisichella' becomes the Subject in the Topic Map. (Subject Identity => the same algorithm observes the 'noticeable usage' twice)
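The slides perform this mapping with the Omnigator. As a rough illustration of the same idea, the sketch below reads one `st:node` element with Python's standard `xml.etree.ElementTree` and prints LTM-style topic and occurrence lines. The namespace URI for the `st` prefix, the wrapping `rdf:RDF` element, and the type names `st-node`, `st-nodelevel` and `st-id` are assumptions, since the slide shows only the bare element. The `st:variant` element is left out for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ST_NS = "http://www.texttech.de/dtd/st/pap#"  # assumed namespace for the st prefix

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:st="{ST_NS}">
  <st:node rdf:ID="node_Fisichella">
    <st:ID>160615</st:ID>
    <st:label>Fisichella</st:label>
    <st:nodelevel>1</st:nodelevel>
    <st:ref_wort_nr rdf:resource="{ST_NS}node_160615"/>
    <st:variant st:index="3" st:type="4" st:weight="0.3176"/>
  </st:node>
</rdf:RDF>"""


def node_to_ltm(node):
    """Turn one st:node element into LTM-style lines (illustrative only)."""
    rdf_id = node.get(f"{{{RDF_NS}}}ID")
    label = node.findtext(f"{{{ST_NS}}}label")
    ref = node.find(f"{{{ST_NS}}}ref_wort_nr").get(f"{{{RDF_NS}}}resource")
    # Topic with a name and two subject identifiers, as on the slide.
    lines = [f'[{rdf_id} : st-node = "{label}" @"{ref}" @"{ST_NS}{rdf_id}"]']
    # Remaining literal values become occurrences with inline data.
    for tag in ("nodelevel", "ID"):
        value = node.findtext(f"{{{ST_NS}}}{tag}")
        lines.append(f"{{{rdf_id}, st-{tag.lower()}, [[{value}]]}}")
    return lines


root = ET.fromstring(SAMPLE)
ltm_lines = node_to_ltm(root[0])
print("\n".join(ltm_lines))
```

The important design point carried over from the slide is that the node's own URI becomes a subject identifier, so that a second observation of the same term by the same algorithm yields a proxy for the same Subject.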

Page 13:


Integration with other Topic Maps ...

Starting point: Integration with another Topic Map created by the observation principle (for example, a motor-sport Topic Map)

- a mapping Topic Map is needed (which should be created under the observation principle, too)

[id @"http://www.formula1-fansite.org/Fisichella" @"http://www.texttech.de/dtd/st/pap#node_Fisichella"]

From the mapping perspective, the same Subject is caught
- if Semantic Talk observes a noticeable usage of the term 'Fisichella', and
- if the motor-sport Topic Map caught a person with the same name.

... to allow more accurate mapping decisions, it seems necessary that the creation process of a Topic Map be documented, too.
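The effect of such a mapping topic can be sketched with a small merge routine. It is a simplified illustration, not the merging procedure of any particular Topic Maps engine: topics are plain dictionaries, and two topics are viewed as merged whenever they share a subject identifier. The dictionary layout and the function name `merge_topics` are assumptions.

```python
ST_URI = "http://www.texttech.de/dtd/st/pap#node_Fisichella"
F1_URI = "http://www.formula1-fansite.org/Fisichella"


def merge_topics(topics):
    """Group topics that share at least one subject identifier."""
    merged = []
    for topic in topics:
        identifiers = set(topic["identifiers"])
        names = set(topic["names"])
        remaining = []
        for group in merged:
            if group["identifiers"] & identifiers:
                # Shared identifier: absorb the existing group.
                identifiers |= group["identifiers"]
                names |= group["names"]
            else:
                remaining.append(group)
        remaining.append({"identifiers": identifiers, "names": names})
        merged = remaining
    return merged


speech_topic = {"identifiers": {ST_URI}, "names": {"Fisichella"}}
motorsport_topic = {"identifiers": {F1_URI}, "names": {"Fisichella"}}
mapping_topic = {"identifiers": {ST_URI, F1_URI}, "names": set()}

without_mapping = merge_topics([speech_topic, motorsport_topic])
with_mapping = merge_topics([speech_topic, motorsport_topic, mapping_topic])
print(len(without_mapping), len(with_mapping))
```

Without the mapping topic the two proxies stay separate although they carry the same name. Only the documented mapping decision, expressed as a topic holding both identifiers, causes them to be merged.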

Page 14:


Discussion