Extracting and Delivering Stories from Heterogeneous Information Sources V.S. Subrahmanian, M. Fayzullin University of Maryland M. Albanese, C. Cesarano, A. Picariello Univ. of Napoli, Italy

Post on 20-Dec-2015


Extracting and Delivering Stories from Heterogeneous Information Sources

V.S. Subrahmanian, M. Fayzullin
University of Maryland

M. Albanese, C. Cesarano, A. Picariello
Univ. of Napoli, Italy


7/6/2005 JIKD 2

Talk Outline

- Motivating examples
- STORY Architecture
- Theoretical Model
- Algorithms
  - OptStory
  - DynStory
  - GenStory
- Experimental results


Motivating example: Pakistani Nuclear Scientists

Nuclear proliferation is the issue of the day.

Complex web of:
- Nuclear scientists
- Personnel at weapons locations
- Arms dealers
- Customs officials
- Shipping companies
- Front companies
- Manufacturers
- …

Nuclear monitors may want the “story” on any person or place or event to decide if further investigation is warranted.

Huge amounts of data need to be processed and filtered so that only the relevant data is shown to the analyst.


Motivating example: US Immigration

A customs official sees a traveller and wants the quick story on him:
- Where does he work?
- Who does he work for?
- What is his area of expertise?
- Any warrants?
- Is he on a watch list?
- Who are his associates? Anyone suspicious?

Just the right data should be presented to him.


A motivating example: Pompeii

Pompeii is a spectacular archaeological site.

Visitor experience can be greatly improved by:
- Automatically notifying visitors of interesting phenomena without posting extra signs
- Allowing visitors to explore the stories of various monuments, paintings, sculptures, etc. in Pompeii
- Allowing visitors to explore the stories of the characters, events and places depicted in these monuments, paintings, sculptures, etc.

Visitors' interests vary, so information about exhibits must adapt in real time to their interests to enhance the experience of the visitor.


Pompeii Visitors

Visitor arrives at ticket counter and buys ticket.

ANALOG: Soldier in Baghdad sets out on a mission.


Pompeii Visitors

Ticket agent asks if they would like to use the story facility and if they would like to use their cell phone and/or PDA to get stories of interest to them.

ANALOG: Soldier in Baghdad chooses to receive stories on his radio or PDA.


Pompeii Visitors

As the visitor walks through Pompeii, STORY identifies where he is and predicts (probabilistically) where he might go in the future. E.g. if he is at location L, it might predict that he will go to the House of the Vettii.

ANALOG: As the soldier drives through Baghdad, STORY identifies where he is and correlates where he will go with his route plan.


Pompeii Visitors

Based on this prediction of where he might go in future, it identifies potential stories he might be interested in and downloads parts of these stories to his PDA/cell. E.g. it might download stories about Pentheus.

[Figure labels: “See items”; “You are here (Triclinium in the House of the Vettii)”]

ANALOG: STORY finds stories satisfying the soldier’s conditions of interest and downloads them to his PDA or to the nearest radio broadcast location.


Pompeii Visitors

The visitor chooses which story he is interested in. STORY dynamically generates the story and delivers it to the user’s PDA/cell phone, e.g. user might choose story of Pentheus.

ANALOG: STORY delivers the story to the soldier. He can then further interact with the story if needed using voice and cursor prompts.


Pompeii Visitors

The user can choose to explore the story in greater detail (e.g. if he is seeing the story of Pentheus, he can also explore the story of Agave).


Stories depend upon context

The concept of story is dramatically different for the examples mentioned earlier.
- Pompeii visitor cares about mythological, historical, artistic facts.
- Soldier in Baghdad cares about security and mission-related facts: who are the people around me, not who is depicted on the walls.
- Nuclear analyst cares about the nuclear networks: who is selling what to whom? Who is moving the money? What front companies are involved?

What goes into a story depends not only on basic facts about the entity of interest but also on the application domain and specific items of interest to the user.


STORY Architecture


RDF Triples

Consist of 3 parts:
- An entity
- An attribute
- A value

STORY also allows:
- time-stamped values
- attributes to have set-valued types

Example:
- Attribute: mother, Value: Agave
- Attribute: cartag, Value: AMD 124
- Attribute: employers, Value = {ibm, hp}


Time Varying Attribute (TVA)

Example:
- Attribute: job, Value = {(cardinal,1500,1509), (pope,1510,1545)}
- Attribute: worked-for, Value = {(ibm,1990,1998), (hp,1999,2004)}
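One possible way to hold such triples in code is sketched below. The `Triple` class, the `values_at` helper and the placeholder entity `person-1` are assumptions made for illustration; the slides do not say how STORY stores triples internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple, Union

# One entry of a time-varying attribute: (value, start_year, end_year).
TimedValue = Tuple[str, int, int]
Value = Union[str, FrozenSet[str], FrozenSet[TimedValue]]


@dataclass(frozen=True)
class Triple:
    entity: str
    attribute: str
    value: Value  # plain, set-valued, or time-varying


def values_at(triple: Triple, year: int) -> Set[str]:
    """Values of a time-varying attribute whose interval contains `year`."""
    return {v for (v, start, end) in triple.value if start <= year <= end}


# "person-1" is a placeholder; the slides give no entity for these values.
mother = Triple("Pentheus", "mother", "Agave")
employers = Triple("person-1", "employers", frozenset({"ibm", "hp"}))
worked_for = Triple("person-1", "worked-for",
                    frozenset({("ibm", 1990, 1998), ("hp", 1999, 2004)}))
```

Under these assumptions, `values_at(worked_for, 1995)` selects the employer whose interval covers 1995.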


Story Schema

A story schema is a pair (E,A): a set E of entities and a set A of attributes.

Examples:
- Set of entities in Pompeii:
  - Set of all objects in Pompeii
  - Set of all objects and events depicted
  - Any entities related to the previous categories
- Set of all people/organizations associated with Iraqi cars:
  - Set of all car ids
  - Set of owners of such cars
  - Set of people associated with such owners via one or many links

Unlike DBs, no need to declare schema in advance.


Story Instance

An instance w.r.t. story schema (E,A) is a partial mapping:
- Input: an entity of E and an attribute of A
- Output: a value v in dom(A) if A is an ordinary attribute, or a timevalue if A is a TVA

Not all attribute values are needed for all entities.
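The partial mapping can be pictured as a dictionary keyed by (entity, attribute) pairs. This is only a sketch: the names `E`, `A`, `INSTANCE` and `lookup`, and the placeholder entity `person-1`, are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema (E, A); "person-1" is a placeholder entity.
E = {"Pentheus", "person-1"}
A = {"mother", "worked-for"}

# Partial mapping: only some (entity, attribute) pairs are defined.
INSTANCE: Dict[Tuple[str, str], object] = {
    # Ordinary attribute: a single value from dom(mother).
    ("Pentheus", "mother"): "Agave",
    # Time-varying attribute: a set of (value, start, end) entries.
    ("person-1", "worked-for"): frozenset({("ibm", 1990, 1998),
                                           ("hp", 1999, 2004)}),
}


def lookup(entity: str, attribute: str) -> Optional[object]:
    """Return the value for (entity, attribute), or None where undefined."""
    if entity not in E or attribute not in A:
        raise KeyError(f"({entity}, {attribute}) is outside the schema")
    return INSTANCE.get((entity, attribute))
```

A pair inside the schema with no stored value returns `None`, which mirrors the point that not every entity needs a value for every attribute.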


Extracting RDF from text

Text needs to be parsed in order to understand its structure before extracting RDF triples:
- Context-free grammars to parse the text
- A set of template-based rules to extract triples from parsed text

Rules can be derived from examples.


Generating rules from examples

Rome is the capital of Italy

- Syntactic parsing
- Manually mark nodes corresponding to entities, attributes and values.
- Add alternatives for constant tokens (e.g. of | in)
- Validate and define extraction patterns (see next slide)


Generating rules from examples

Each extraction pattern defines which marked node acts as the entity, which one as the attribute and which one as the value.


Generating rules from examples

The same node may act as the entity w.r.t. an extraction pattern, and as the value w.r.t. another extraction pattern.


Triples extraction

- Each sentence is parsed, generating one or more parse trees.
- Each parse tree is matched against the parse tree that represents an extraction rule using a tree matching algorithm.
- If the match succeeds, the pieces of information corresponding to the marked template nodes are extracted and triples are built according to the extraction patterns.

Probabilistic tree matching algorithms are in progress.
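The matching step can be sketched as a simple exact (non-probabilistic) tree comparison. Everything here is an assumption made for illustration: the `Node` and `Template` classes, the mark names `X1` to `X3`, and the hand-built parse tree for “Rome is the capital of Italy”. The slides do not give the actual matching algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                  # category, or the word for a leaf
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        if not self.children:
            return self.label
        return " ".join(c.text() for c in self.children)


@dataclass
class Template:
    label: str
    children: List["Template"] = field(default_factory=list)
    mark: Optional[str] = None                  # marked node, e.g. "X1"
    tokens: Tuple[str, ...] = ()                # constant tokens, e.g. ("of", "in")


def match(t: Template, n: Node, out: Dict[str, str]) -> bool:
    if t.label != n.label:
        return False
    if t.mark is not None:                      # marked node: capture the subtree text
        out[t.mark] = n.text()
        return True
    if t.tokens:                                # constant token with alternatives
        return n.text() in t.tokens
    if len(t.children) != len(n.children):
        return False
    return all(match(tc, nc, out) for tc, nc in zip(t.children, n.children))


def extract(t: Template, pattern: Dict[str, str], tree: Node):
    """Return (entity, attribute, value) if the tree matches, else None."""
    out: Dict[str, str] = {}
    if not match(t, tree, out):
        return None
    return out[pattern["entity"]], out[pattern["attribute"]], out[pattern["value"]]


def words(label: str, *ws: str) -> Node:
    return Node(label, [Node(w) for w in ws])


tree = Node("S", [
    words("NP", "Rome"),
    Node("VP", [words("V", "is"),
                Node("NP", [words("NP", "the", "capital"),
                            Node("PP", [words("P", "of"), words("NP", "Italy")])])]),
])
rule = Template("S", [
    Template("NP", mark="X1"),
    Template("VP", [Template("V", tokens=("is",)),
                    Template("NP", [Template("NP", mark="X2"),
                                    Template("PP", [Template("P", tokens=("of", "in")),
                                                    Template("NP", mark="X3")])])]),
])
pattern = {"entity": "X3", "attribute": "X2", "value": "X1"}
triple = extract(rule, pattern, tree)
```

With this pattern the marked nodes yield entity “Italy”, attribute “the capital” and value “Rome”; a second pattern over the same marks could assign the roles differently.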


Example: “Iran is one of the most dangerous enemies of the United States”

Allows 4 different interpretations, corresponding to different parse trees.
- All 4 parse trees match the template.
- 2 of them allow us to extract the triple: E=“the most dangerous enemies of the United States”, A=“one”, V=“Iran”
- 2 of them allow us to extract the triple: E=“the United States”, A=“one of the most dangerous enemies”, V=“Iran”


Example: “Hu Jintao is the most popular leader in China”

Allows 2 different interpretations, corresponding to different parse trees.
- The first parse tree doesn’t match the template.
- The second parse tree matches the template and allows us to extract the triple: E=“China”, A=“the most popular leader”, V=“Hu Jintao”


How the system works

The story application developer first specifies a set of data sources that are to be accessed, e.g.
- the WWW
- a relational database
- an object-oriented database
- a database of web documents
- a set of URLs
- some combination of the above

The STORY crawler extracts a full instance.
- A full instance is the set of triples obtained from all sources specified by the user.
- Full instances don’t resolve inconsistencies, generalize data, etc.

Stories are then created on demand using the full instance and using appropriate conflict resolution, generalization, and other modules.


XML sources

Consider an XML node N = <name, value, {c1,…,cn}>, where {c1,…,cn} are the children nodes.

Assume that N is the root node of an XML document, and that nodes may act both as entities and as attributes.
- e is an entity
- A is an attribute


GetXMLAttr(N,e,A)

GetXMLAttr(N,e,A)
begin
    Result := ∅
    if N.value = e or N.name = e then
        for each child c of N such that c.name = A do
            Result := Result ∪ {c.value}
        end for
    else
        for each child c of N do
            Result := Result ∪ GetXMLAttr(c,e,A)
        end for
    end if
    return Result
end
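The same recursion can be written against Python's standard `xml.etree.ElementTree`. Mapping N.name to the element tag and N.value to the element text is an assumption made for this sketch, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set


def get_xml_attr(node: ET.Element, e: str, a: str) -> Set[str]:
    """Collect values of attribute `a` for entity `e` below `node`."""
    result: Set[str] = set()
    value = (node.text or "").strip()
    if value == e or node.tag == e:
        # The node denotes the entity: read its children named `a`.
        for child in node:
            if child.tag == a:
                result.add((child.text or "").strip())
    else:
        # Otherwise keep searching in every subtree.
        for child in node:
            result |= get_xml_attr(child, e, a)
    return result


# Hypothetical sample document.
doc = ET.fromstring(
    "<people><Pentheus><mother>Agave</mother>"
    "<title>King of Thebes</title></Pentheus></people>"
)
mothers = get_xml_attr(doc, "Pentheus", "mother")
```

As in the pseudocode, the search stops descending once a node matching the entity is found, so nested mentions of the same entity below that node are not visited.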


CPR

There are good stories and bad stories.

The STORY architecture supports the goals of succinctness and exploration and creates stories with respect to three important parameters:
- the priority of the story content,
- the continuity of the story,
- the non-repetition of facts covered by the story.

We want to deliver the most important facts to the intended audience.

So far, we have focused primarily on priority and non-repetition, worrying less about continuity.


CPR examples

- In the story of Pentheus, it makes more sense to first say that his parents were Cadmus and Agave, then say he reigned as King of Thebes, and then explain why he was killed.
  - This rendering of the story is in chronological order, ensuring a kind of temporal continuity.
  - Other measures of continuity are also possible within the STORY framework.
- A repetition function evaluates how much repetition there is in a given story.
  - For example, in the case of Pentheus, we may extract the fact that Agave is a parent of Pentheus, and that Agave is the mother of Pentheus. Including both these facts in a story is repetitive as the latter fact subsumes the former.


Story evaluation function

$\mathrm{eval}(S) = a \cdot P(S) + b \cdot C(S) - c \cdot R(S)$

- $P$, $C$, $R$ are arbitrary functions from the set of all possible stories about some entities to [0,1]; $a$, $b$, $c$ are weights.
- $P$ describes whether high priority facts are included in the story.
  - For example, the fact that Pentheus' mother was Agave is more important than the length of Pentheus' big toe.
- $C$ describes how continuous the story is.
  - This means that a story should not jump wildly from one fact to another.
- $R$ describes repetition.
  - Clearly, stories that repeat the same or similar facts over and over again leave much to be desired.
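The evaluation function can be sketched as a weighted combination of three pluggable scoring functions: priority and continuity add to the score, repetition subtracts from it. The weights `w_p`, `w_c`, `w_r`, the priority table and the three toy scoring functions below are assumptions made for illustration, not the functions used in STORY.

```python
from typing import Callable, Sequence, Tuple

Triple = Tuple[str, str, str]
Story = Sequence[Triple]
Score = Callable[[Story], float]          # maps a story to [0, 1]


def make_eval(priority: Score, continuity: Score, repetition: Score,
              w_p: float = 1.0, w_c: float = 1.0, w_r: float = 1.0) -> Score:
    """Reward priority and continuity, penalize repetition."""
    def evaluate(story: Story) -> float:
        return (w_p * priority(story) + w_c * continuity(story)
                - w_r * repetition(story))
    return evaluate


# Hypothetical per-attribute priorities.
PRIORITY = {"mother": 1.0, "occupation": 0.8, "toe-length": 0.1}


def priority(story: Story) -> float:
    if not story:
        return 0.0
    return sum(PRIORITY.get(a, 0.0) for _, a, _ in story) / len(story)


def continuity(story: Story) -> float:
    # Toy measure: share of adjacent facts that stay on the same entity.
    if len(story) < 2:
        return 1.0
    same = sum(1 for x, y in zip(story, story[1:]) if x[0] == y[0])
    return same / (len(story) - 1)


def repetition(story: Story) -> float:
    # Toy measure: share of facts that are exact duplicates.
    if not story:
        return 0.0
    return 1.0 - len(set(story)) / len(story)


evaluate = make_eval(priority, continuity, repetition)
varied = [("Pentheus", "mother", "Agave"), ("Pentheus", "occupation", "king")]
repeated = [("Pentheus", "mother", "Agave"), ("Pentheus", "mother", "Agave")]
```

Under these toy definitions the story that repeats one fact scores lower than the one that states two different facts, even though the repeated fact has the higher priority.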


CPR functions

There are many ways of defining how continuous a story is, how repetitive a story is, etc.

Our story creation algorithms can work with any continuity, priority and repetition functions whatsoever.


Attribute Hierarchy

The attributes of interest are arranged in an attribute hierarchy where attributes can be labeled with priorities.
- The story application developer can browse and edit this hierarchy (for example, if he wishes to add new attributes).
- He can add priorities to selected items in the hierarchy (all sub-elements of a given element in the hierarchy will inherit the priority value of the parent unless otherwise stated).
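Priority inheritance can be sketched as a walk up the hierarchy until an explicitly labeled attribute is found. The hierarchy, the priority values and the default of 0.0 for unlabeled branches are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: attribute -> parent attribute (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "relative": None,
    "parent": "relative",
    "mother": "parent",
    "father": "parent",
    "sibling": "relative",
}

# Priorities the developer set explicitly; everything else inherits.
EXPLICIT: Dict[str, float] = {"relative": 0.5, "parent": 0.9, "father": 0.7}


def effective_priority(attr: str, default: float = 0.0) -> float:
    """Nearest explicitly set priority on the path from `attr` to the root."""
    node: Optional[str] = attr
    while node is not None:
        if node in EXPLICIT:
            return EXPLICIT[node]
        node = PARENT.get(node)
    return default
```

Here `mother` inherits 0.9 from `parent`, `father` keeps its own 0.7, and `sibling` falls back to the 0.5 set on `relative`.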


Conflict Management

As multiple data sources may be used to extract attributes, conflicts might occur.
- For example, one source may say that Pentheus' mother is Agave, while another may say it is Hera.

STORY allows conflict resolution with an application-specific method.

Conflicts do not always need to be resolved. Sometimes, you just report the existence of a conflict, and specify what should be reported.


Conflict Management Policy

- Temporal conflict resolution. Suppose different data sources provide different values v1, …, vn. Suppose value vi was inserted into the data source at time ti. In this case, we pick the value vi such that ti = max{t1, t2, …, tn}. If multiple exist, one is selected randomly.
- Source-based conflict resolution. The developer of a story may assign a credibility ci to each source si that provides a value vi for attribute A of entity e. This strategy picks value vi such that ci = max{c1, …, cn}. If multiple exist, one is selected randomly.
- Voting-based conflict resolution. Each value vi returned by at least one data source has a vote that represents the number of sources that return value vi. In this case, this conflict resolution strategy returns the value with the highest vote. If multiple vi's have the same highest vote, one is picked randomly and returned.
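The three policies can be sketched as small functions over the reports collected for one (entity, attribute) pair. The `Report` record, the source names and the timestamps and credibilities in the example are assumptions made for illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    value: str           # vi
    source: str          # si
    inserted_at: float   # ti, when the value entered the source
    credibility: float   # ci, assigned to the source by the developer


def _pick(candidates: List[str], rng=random) -> str:
    # Ties are broken randomly, as the policies prescribe.
    return rng.choice(sorted(set(candidates)))


def temporal(reports: List[Report], rng=random) -> str:
    latest = max(r.inserted_at for r in reports)
    return _pick([r.value for r in reports if r.inserted_at == latest], rng)


def source_based(reports: List[Report], rng=random) -> str:
    best = max(r.credibility for r in reports)
    return _pick([r.value for r in reports if r.credibility == best], rng)


def voting(reports: List[Report], rng=random) -> str:
    votes: Counter = Counter()
    # Each source votes at most once for each value it returns.
    for value, _source in {(r.value, r.source) for r in reports}:
        votes[value] += 1
    top = max(votes.values())
    return _pick([v for v, n in votes.items() if n == top], rng)


# Hypothetical reports about Pentheus' mother.
reports = [
    Report("Agave", "s1", inserted_at=1.0, credibility=0.9),
    Report("Hera", "s2", inserted_at=2.0, credibility=0.4),
    Report("Agave", "s3", inserted_at=1.5, credibility=0.6),
]
```

On these reports the temporal policy picks the most recently inserted value, while the source-based and voting policies both favour the value backed by the more credible and more numerous sources.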


Generalization Module

Goal: to generalize multiple RDF triples into one.
- For example, if we know that Pentheus's father is Cadmus, and his mother is Agave, we may want to generalize this to say that Pentheus's parents are Cadmus and Agave.
- If Pentheus was king of one town for some period, king of another town for another period of time, and so on, we may merely want to say that Pentheus was king of many places.

The Generalization Module looks at the RDF triples stored in the RDF database and augments it with triples that include generalization attributes … that succinctly summarize a set of less general (i.e. more specific) attributes.
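The first example can be sketched as a rule that folds several specific attributes into one generalization attribute. The rule table `GENERALIZES` and the choice to generalize only when at least two distinct values exist are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Triple = Tuple[str, str, Hashable]

# Hypothetical rule: "parents" summarizes "father" and "mother".
GENERALIZES: Dict[str, Set[str]] = {"parents": {"father", "mother"}}


def add_generalizations(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples augmented with generalized, set-valued triples."""
    result = set(triples)
    entities = {e for e, _, _ in triples}
    for general, specifics in GENERALIZES.items():
        for entity in entities:
            values = frozenset(v for (e, a, v) in triples
                               if e == entity and a in specifics)
            if len(values) > 1:       # only generalize multiple facts
                result.add((entity, general, values))
    return result


facts = {("Pentheus", "father", "Cadmus"), ("Pentheus", "mother", "Agave")}
augmented = add_generalizations(facts)
```

The original triples stay in the result, so later stages can still choose between the specific facts and the generalized one.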


Generalized Story Schema

A generalized story schema consists of a regular story schema, a function that associates an equivalence relation with each attribute domain, and a function that associates a generalization function with each attribute domain.

An equivalence relation on the domain dom(A) of attribute A specifies when certain values in the domain are considered equivalent. For example, we may consider string values “king” and “monarch” to be equivalent in dom(occupation).

For a time varying attribute we may consider (“king”,L,U) and (“monarch”,L',U') to be equivalent independently of whether L=L' and U=U' is true or not.

Our system uses WordNet and some heuristics to infer equivalence relationships between terms.

Generalization is currently being plugged into the system.


STORY creation

Construct a story of length k or less from the RDF database.
- The algorithm examines all triples in the RDF database about the entity of interest, including triples extracted from the data sources by the attribute extractor as well as triples created by the generalization module.
- It then finds the k triples that optimize an objective function.
- The objective function must be monotonic in the priority of the triples, monotonic w.r.t. the continuity function selected by the STORY application developer, and anti-monotonic in the amount of repetition between triples.


Closed Instance

We first compute the full instance associated with our source access table.

We then split this instance into equivalence classes using the equivalence relation.

Suppose the equivalence classes thus generated are X1, …, Xn.

For each equivalence class Xi we compute the generalization vi using the generalization function associated with attribute A. We insert the tuple (e,A, vi) into the full instance.

This process is repeated for all entities e and all attributes A.

After all tuples of the form shown above are inserted into the full instance, it becomes the closed instance.
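The closure steps can be sketched as follows. Representing each equivalence relation by a key function, and the particular generalization used in the example (merging time intervals of equivalent occupations), are assumptions made for illustration; the years in the sample data are placeholders.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, Hashable]


def close_instance(
    full: Iterable[Triple],
    class_key: Dict[str, Callable[[Hashable], Hashable]],
    generalize: Dict[str, Callable[[List[Hashable]], Hashable]],
) -> Set[Triple]:
    """Add one generalized tuple (e, A, vi) per equivalence class Xi."""
    full = set(full)
    classes: Dict[Tuple[str, str, Hashable], List[Hashable]] = defaultdict(list)
    for entity, attr, value in full:
        key_fn = class_key.get(attr, lambda v: v)   # default: equality
        classes[(entity, attr, key_fn(value))].append(value)
    closed = set(full)
    for (entity, attr, _), members in classes.items():
        if attr in generalize:
            closed.add((entity, attr, generalize[attr](members)))
    return closed


# Hypothetical equivalence: synonyms match regardless of the time bounds.
SYNONYMS = {"monarch": "king"}


def occupation_key(v):
    return SYNONYMS.get(v[0], v[0])


def occupation_span(members):
    # Hypothetical generalization: one entry covering all member intervals.
    return (occupation_key(members[0]),
            min(m[1] for m in members), max(m[2] for m in members))


full = {("Pentheus", "occupation", ("king", 10, 20)),      # placeholder years
        ("Pentheus", "occupation", ("monarch", 15, 30))}
closed = close_instance(full, {"occupation": occupation_key},
                        {"occupation": occupation_span})
```

The two original tuples fall into one equivalence class, so the closed instance gains a single generalized tuple alongside them.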


Story Computation Problem

Given a closed instance I, a positive integer k, and an entity e as input, find a story of size k that maximizes the value of a given evaluation function eval.

In this case, the story found is called an Optimal Story.

Theorem: Finding an optimal story is NP-hard (even after the full instance is created).
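The problem statement can be made concrete with a naive exhaustive search that scores every size-k subset of the facts about the entity. This is only an illustration of the problem, not the authors' algorithm; the toy evaluation function and sample facts are assumptions. The number of candidate subsets grows combinatorially with the number of facts, which is why this approach does not scale.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]


def exhaustive_story(instance: Iterable[Triple], entity: str, k: int,
                     evaluate: Callable[[Sequence[Triple]], float]
                     ) -> Tuple[Triple, ...]:
    """Best-scoring story of size k (or all facts, if fewer than k exist)."""
    facts = sorted(t for t in instance if t[0] == entity)
    size = min(k, len(facts))
    return max(combinations(facts, size), key=evaluate)


# Hypothetical evaluation: sum of per-attribute priorities.
PRIORITY = {"mother": 1.0, "occupation": 0.8, "toe-length": 0.1}


def evaluate(story: Sequence[Triple]) -> float:
    return sum(PRIORITY.get(attr, 0.0) for _, attr, _ in story)


instance = {
    ("Pentheus", "mother", "Agave"),
    ("Pentheus", "occupation", "king"),
    ("Pentheus", "toe-length", "unknown"),
    ("Agave", "father", "Cadmus"),
}
best = exhaustive_story(instance, "Pentheus", 2, evaluate)
```

With k = 2 the search keeps the two highest-priority facts about Pentheus and drops the low-priority one.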


Story Algorithms

- OptSTORY algorithm: finds the story that optimizes the objective function.
  - This algorithm has the disadvantage of being very slow.
- Multiple alternative BestSTORY algorithms:
  - DynStory(S) uses a dynamic programming approach.
  - GenStory(S) is based on genetic programming.
  - DynStory and GenStory find suboptimal stories, but do so very fast.


GPS Support Subsystem: Current implementation

Outdoor positioning at Pompeii implemented using DGPS

Mobile devices are equipped with IEEE 802.11b wireless Ethernet to allow internet connection


GIS Support Subsystem: Outdoor and indoor positioning

- Outdoor positioning
  - GPS has been successfully adopted in a lot of applications
- Indoor positioning
  - GPS receivers are blind in indoor spaces
  - Different kinds of positioning systems will be used:
    - Infrared or ultrasound sensors
    - Radio Frequency sensors
    - WLAN-based positioning

We have methods to optimally position a set of sensors to monitor the site, but the system is not yet implemented.


STORY presentation

- Our STORY architecture applies to several different hardware options
  - our current implementation works for both PDAs and laptops.
- Multiple languages
  - we currently support English, Spanish and Italian.
- Multiple output rendering
  - via a graphical user interface or via speech


Methods to merge multiple such sentences into one are being implemented.


STORY Experiments

Parameters to be evaluated:
- Value of the facts included in the stories
- Quality of the prose (does it read nicely)

Experiment plan:
- 61 students enrolled as reviewers
  - 51 non-experts (no a priori knowledge about the subjects of the stories)
  - 10 experts (a priori knowledge)
- Facts and prose evaluated for
  - Different algorithms
  - Different rendering techniques
  - Different CPR parameter settings
  - Different lengths of the stories


Value of the facts vs. length of the story: Trends


Value of the facts vs. length of the story: Considerations

Highest Priorities:
- GenSTORY (version 1: using original sentences from sources if available instead of only using templates) wins
- Runner-up is DynSTORY (version 1)
- Even if we ignore how the stories are rendered, GenSTORY still wins.
- Including the original sentences in the story adds more information content than rendering the same fact through a template.


Quality of the prose vs. length of the story: Trends


Quality of the prose vs. length of the story: Considerations

The quality of the prose is high and seems independent of the algorithm used

Quality of prose decreases as the story length increases (not surprising).

Including sentences from text sources into stories improves story quality.


Value of the facts and quality of the prose: Summary


Value of the facts vs. CPR parameters: Trends


Value of the facts vs. CPR parameters: Considerations

- Best “value of facts” is obtained when the priority is set to a high value
  - Users are more interested in priority than in continuity and repetition
- Repetition is to be avoided when the length of the story is very short
  - For low values of L the best results are obtained when R is set to a high value