Automatic Extraction of Topic Maps based Argumentation Trails

Text Mining Services Conference
Leipzig, 2009/03/25

Marco Büchler, Lutz Maicher, Frederik Baumgardt, Benjamin Bock

Natural Language Processing Group
Department of Computer Science
University of Leipzig


Starting Point: Panionion


Agenda

- Computation of argumentation trails on fragmentary texts
- Added value of Topic Maps and their relation to argumentation trails
- Results
- Further work / conclusion


Technical details


Text source


Underlying graph

Co-occurrence as underlying graph

- de Saussure (1898/1916): Structuralism assumes that meaning is the result of structural relations between word forms
- The fundamental structural relations are syntagmatic and paradigmatic relations [Heyer & Bordag 2007]

Argumentation trails vs. Lexical Chaining

- fragmentary texts
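As a minimal sketch of what a co-occurrence graph looks like as a data structure, the following Python snippet links two word forms whenever they appear in the same sentence. The slides do not state the co-occurrence window or the significance measure that was used, so sentence-level counting and the `min_count` threshold are assumptions made for illustration, and the sample sentences are invented toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences, min_count=1):
    """Build an undirected co-occurrence graph from tokenised sentences.

    Returns a dict mapping each word to the set of words it co-occurs with
    in at least `min_count` sentences (a stand-in for a real significance test).
    """
    pair_counts = Counter()
    for tokens in sentences:
        # Count each unordered word pair once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1

    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


# Invented toy sentences, already tokenised.
sentences = [
    ["league", "meets", "at", "panionion"],
    ["panionion", "sanctuary", "of", "poseidon"],
    ["league", "sanctuary"],
]
graph = cooccurrence_graph(sentences)
```

Storing the graph as a dictionary of neighbour sets keeps neighbourhood lookups and intersections cheap, which is what path-based analyses on the graph mostly need.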


Small World

“Definition/Motivation”: What is the average path length in a graph?

- Average path length is typically not larger than 7.
- Simple proof of concept (using XING): each of my contacts has on average about 73 contacts (1st and 2nd level), so $\log_{73}(6{,}800{,}000{,}000) \approx 5.28$.
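The back-of-the-envelope estimate above can be reproduced in a few lines of Python. The function name is hypothetical; the only inputs are the two figures given on the slide (73 contacts per person and a population of 6.8 billion).

```python
import math


def estimated_path_length(population, avg_contacts):
    """Estimate the hops needed to reach `population` nodes when every node
    has `avg_contacts` neighbours: log(population) / log(avg_contacts)."""
    return math.log(population) / math.log(avg_contacts)


hops = estimated_path_length(6_800_000_000, 73)
print(f"{hops:.2f}")  # prints 5.28
```

The estimate ignores overlap between contact circles, so it is a lower bound on the real path length rather than a measurement.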


Methodology


Topic Maps

Data model of Topic Maps (Topics)


[Figure: example topic with the names and variants "Nikolaikirche", "St. Nicholas Church" and "St. Nikolai", the scope "English", an occurrence "1165" of type "foundation", and an occurrence "www.nikolaikirche-leipzig.de/" of type "website"]

Data model of Topic Maps (Associations)


[Figure: association of type "container-containee" between the topics "St. Nikolai" and "Leipzig"; each association role has a role player and a role type ("container" or "containee")]

Data model of Topic Maps (Summary)

- one topic represents one subject in a data source
  - names represent the names of the subject (names might have variants)
  - occurrences represent properties of the subject
  - associations represent relationships between subjects (flexibility through roles, n-ary associations)
  - all types and scopes are (sets of) topics in a topic map: everything is a topic
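The data model summarised above can be sketched with a few Python dataclasses. This is an illustrative simplification, not the standard's full model: types and scopes are plain strings here, whereas in Topic Maps they would themselves be topics. The class names are hypothetical, and the assignment of names, variants, and role types in the example is an assumption based on the labels in the church example.

```python
from dataclasses import dataclass, field


@dataclass
class Name:
    value: str
    scope: tuple = ()  # scoping topics, simplified to plain strings
    variants: list = field(default_factory=list)


@dataclass
class Occurrence:
    type: str  # occurrence type, simplified to a plain string
    value: str


@dataclass
class Topic:
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)


@dataclass
class Role:
    type: str  # role type, e.g. "container" or "containee"
    player: Topic


@dataclass
class Association:
    type: str
    roles: list = field(default_factory=list)  # any number of roles (n-ary)


# Assumed reading of the example: which label is a name or a variant is a guess.
church = Topic(
    names=[
        Name("Nikolaikirche", variants=["St. Nikolai"]),
        Name("St. Nicholas Church", scope=("English",)),
    ],
    occurrences=[
        Occurrence("foundation", "1165"),
        Occurrence("website", "www.nikolaikirche-leipzig.de/"),
    ],
)
leipzig = Topic(names=[Name("Leipzig")])
located_in = Association(
    "container-containee",
    [Role("container", leipzig), Role("containee", church)],
)
```

Keeping roles as a list rather than a fixed pair is what allows associations with more than two participants.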


What are Topic Maps (ISO 13250)?

Topic Maps are highly-networked data sources
- one topic for each subject
- relationships of subjects are associations between topics

Topic Maps have a human-centric data model
- vocabulary for documenting information fits human cognition
- network resembles human cognition

Topic Maps have an integration model
- whenever two topics represent the same subject, they have to be merged
- always one information access hub for each subject
- high terminological flexibility and schema-free
- use in knowledge federation and sensemaking

Topic Maps is an international industry standard (ISO 13250)
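The merging rule of the integration model can be illustrated with a small Python sketch. It assumes that "representing the same subject" is detected through shared subject identifiers, and it models topics as plain dictionaries of sets; the function name and the `example.org` identifiers are hypothetical.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with the keys "identifiers", "names" and
    "occurrences", all holding sets. Returns the list of merged topics.
    """
    merged = []
    for topic in topics:
        current = {key: set(values) for key, values in topic.items()}
        remaining = []
        for other in merged:
            if current["identifiers"] & other["identifiers"]:
                # Same subject: fold the earlier topic into the current one.
                for key in current:
                    current[key] |= other[key]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged


# Hypothetical identifiers, for illustration only.
a = {
    "identifiers": {"http://example.org/nikolaikirche"},
    "names": {"Nikolaikirche"},
    "occurrences": {("foundation", "1165")},
}
b = {
    "identifiers": {"http://example.org/nikolaikirche"},
    "names": {"St. Nicholas Church"},
    "occurrences": {("website", "www.nikolaikirche-leipzig.de/")},
}
c = {
    "identifiers": {"http://example.org/leipzig"},
    "names": {"Leipzig"},
    "occurrences": set(),
}
result = merge_topics([a, b, c])
```

After merging, all names and occurrences of a subject hang off a single topic, which is the "one information access hub for each subject" property named above.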


Extraction of typed significant terms

Corpus is categorized according to several classification schemas (e.g. age, gender, geography).

Split corpus into several sub-corpora.

[Figure: processing pipeline with the labels "Medusa", "age / gender / geography / ...", "Categorized co-occurrences/terms" and "Tomcat/Prefuse"]

(Source: taken from bachelor thesis slides of Marcus Puchalla.)
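The splitting step can be sketched as follows. This is only an illustration of partitioning a corpus by one classification schema and counting terms per sub-corpus; it does not reproduce Medusa or the significance measure used, neither of which is described on the slides. The document layout (`categories`, `tokens`) and the sample data are assumptions.

```python
from collections import Counter, defaultdict


def split_corpus(documents, schema):
    """Group documents into sub-corpora by their value for one schema."""
    sub_corpora = defaultdict(list)
    for doc in documents:
        sub_corpora[doc["categories"][schema]].append(doc["tokens"])
    return dict(sub_corpora)


def term_frequencies(sub_corpora):
    """Count term frequencies separately for each sub-corpus."""
    return {
        label: Counter(token for tokens in texts for token in tokens)
        for label, texts in sub_corpora.items()
    }


# Invented toy documents with two classification schemas.
documents = [
    {"categories": {"gender": "f", "geography": "north"}, "tokens": ["alpha", "beta"]},
    {"categories": {"gender": "m", "geography": "north"}, "tokens": ["beta", "gamma"]},
    {"categories": {"gender": "f", "geography": "south"}, "tokens": ["alpha", "alpha"]},
]
by_gender = term_frequencies(split_corpus(documents, "gender"))
by_geography = term_frequencies(split_corpus(documents, "geography"))
```

Comparing a term's frequency across the sub-corpora of one schema is one simple way to attach a category to it; a real system would replace the raw counts with a significance test.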


Results


Several graph properties

Columns:
(1) Complete graph
(2) w_id>=100 && freq(word)>1
(3) w_id>=300 && freq(word)>1
(4) w_id>=500 && freq(word)>1
(5) Named Entities
(6) Normalised Named Entities
(7) Normalised Text and Named Entities

Graph properties

| Property | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Number of nodes | 538,572 | 388,929 | 363,359 | 353,618 | 1,149 | 4,487 | 2,178 |
| Number of co-occurrences | 57,762,474 | 34,818,138 | 25,615,956 | 21,004,538 | 15,436 | 126,188 | 152,856 |
| Number of significant co-occurrences | 30,382,422 | 21,739,476 | 17,687,582 | 15,462,940 | 14,876 | 69,858 | 84,124 |
| Percentage | 0.53 | 0.62 | 0.69 | 0.74 | 0.96 | 0.55 | 0.55 |
| Average degree | 56.41 | 55.90 | 48.68 | 43.73 | 12.95 | 15.57 | 38.62 |

Argumentation trail properties

| Property | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Number of trails | > 10^8 | > 10^8 | > 10^8 | > 10^8 | 361,094 | 7,958,240 | 3,087,581 |
| Average degree | 15.34 | 9.93 | 7.70 | 6.79 | 7.03 | 7.77 | 9.93 |
| Average degree of internal node (trail length 2) | 31.34 | 21.08 | 14.33 | 11.45 | 7.02 | 10.15 | 12.31 |
| Average degree of internal node (trail length 3) | 301.38 | 362.56 | 285.86 | 231.39 | 55.66 | 76.06 | 81.86 |
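The "Percentage" and "Average degree" rows of the graph properties appear to be derived from the three count rows: the percentage matches significant co-occurrences divided by all co-occurrences, and the average degree matches significant co-occurrences divided by the number of nodes. This reading is an inference from the numbers, not something the slides state; the short check below recomputes both rows under that assumption.

```python
# Count rows of the graph properties, one entry per graph variant.
nodes = [538_572, 388_929, 363_359, 353_618, 1_149, 4_487, 2_178]
cooccurrences = [57_762_474, 34_818_138, 25_615_956, 21_004_538,
                 15_436, 126_188, 152_856]
significant = [30_382_422, 21_739_476, 17_687_582, 15_462_940,
               14_876, 69_858, 84_124]

# Assumed derivations: share of significant co-occurrences, and
# significant co-occurrences per node.
percentage = [round(s / c, 2) for s, c in zip(significant, cooccurrences)]
average_degree = [round(s / n, 2) for s, n in zip(significant, nodes)]

for p, d in zip(percentage, average_degree):
    print(f"{p:.2f}  {d:.2f}")
```

Under this reading, the named-entity graph keeps almost all of its co-occurrences as significant (0.96), while the other variants keep between roughly half and three quarters.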


Visualisation of two argumentation trails

Marco Büchler

onotoa.topicmapslab.de

Topic Maps ontology for the argumentation trails

Topic Maps and Argumentation Trails


Further work / conclusion

- Reduction of graph complexity
  - e.g. by semantic pre-clustering or
  - author restrictions
- Weighting of argumentation trails
  - e.g. trails containing hubs should be weighted lower
- Improvements in visualisation
  - Clustering of similar trails into bundles of semantically similar trails
- Improvements in typing nodes and especially edges
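The idea of weighting trails that contain hubs lower could be sketched as follows. Both parts are assumptions made for illustration: a trail of length 2 is taken to be a path of two edges with one internal node, and the weight is the product of the inverse degrees of the internal nodes. The slides propose the weighting as future work and do not define a formula; the graph is invented toy data.

```python
def trails_of_length_2(graph, source, target):
    """Return all trails source - x - target with one internal node x."""
    return [
        (source, x, target)
        for x in sorted(graph[source] & graph[target])
        if x not in (source, target)
    ]


def trail_weight(graph, trail):
    """Weight a trail by the inverse degree of its internal nodes, so that
    trails running through hubs receive a lower weight."""
    weight = 1.0
    for node in trail[1:-1]:
        weight /= len(graph[node])
    return weight


# Toy graph as a dict of neighbour sets; "hub" has many neighbours.
graph = {
    "a": {"hub", "x"},
    "b": {"hub", "x"},
    "x": {"a", "b"},
    "hub": {"a", "b", "c", "d", "e"},
    "c": {"hub"},
    "d": {"hub"},
    "e": {"hub"},
}
trails = trails_of_length_2(graph, "a", "b")
ranked = sorted(trails, key=lambda t: trail_weight(graph, t), reverse=True)
```

In this toy graph the trail through `x` outranks the trail through `hub`, because `hub` connects to almost everything and therefore says little about the specific relation between `a` and `b`.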
