logical fusion rules for merging structured news reports

Logical fusion rules for merging structured news reports

Anthony Hunter

Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK

Received 25 April 2001; received in revised form 16 October 2001; accepted 19 December 2001

Abstract

Structured text is a general concept that is implicit in a variety of approaches in handling information.Syntactically, an item of structured text is a number of grammatically simple phrases together with a se-mantic label for each phrase. Items of structured text may be nested within larger items of structured text.Much information is potentially available as structured text including tagged text in XML, text in relationaland object-oriented databases, and the output from information extraction systems in the form of in-stantiated templates. In previous papers, we have presented a logic-based framework for merging items ofpotentially inconsistent structured text [Data Knowledge Eng. 34 (2000) 305–332, Data Knowledge Eng.2002 (in press)]. In this paper, we present fusion rules as a way of implementing logic-based fusion. Fusionrules are a form of scripting language that define how structured news reports should be merged. Theantecedent of a fusion rule is a call to investigate the information in the structured news reports and thebackground knowledge, and the consequent of a fusion rule is a formula specifying an action to be un-dertaken to form a merged report. It is expected that a set of fusion rules is defined for any given appli-cation. We give the syntax and mode of execution for fusion rules, and explain how the resulting actionsgive a merged report. We illustrate the presentation with examples of fusion rules for an application formerging weather reports. � 2002 Elsevier Science B.V. All rights reserved.

Keywords:Merging information; Structured text; Semi-structured text; Logic-based inconsistency management; Logic-

based fusion; Heterogenous information; Inconsistency handling; Conflict resolution

1. Introduction

Syntactically, an item of structured text is a data structure containing a number of gram-matically simple phrases together with a semantic label for each phrase. The set of semantic

www.elsevier.com/locate/datak

Data & Knowledge Engineering 42 (2002) 23–56

E-mail address: [email protected] (A. Hunter).

0169-023X/02/$ - see front matter � 2002 Elsevier Science B.V. All rights reserved.PII: S0169-023X(02 )00026-5

labels in a structured text is meant to parameterize a stereotypical situation, and so a parti-cular item of structured text is an instance of that stereotypical situation. Using appropriatesemantic labels, we can regard an item of structured text as an abstraction of an item of freetext.For example, news reports on corporate acquisitions can be represented as items of structured

text using semantic labels including buyer, seller, acquisition, value, and date. Eachsemantic label provides semantic information, and so an item of structured text is intended tohave some semantic coherence. Each phrase in structured text is very simple – such as a propernoun, a date, or a number with unit of measure, or a word or phrase from a prescribed lexicon.For an application, the prescribed lexicon delineates the types of states, actions, and attributes,that could be conveyed by the items of structured text. An example of structured text is given inFig. 1.Much material is potentially available as structured text. This includes items of text structured

using XML tags, and the output from information extraction systems given in templates (see forexample [2,15,25]). The notion of structured text also overlaps with semi-structured data (forreviews see [1,10]).Whilst structured text is useful as a resource, there is a need to develop techniques to handle,

analyse, and reason with it. In particular, we are interested in merging potentially inconsistent setsof news reports [30,35], and deriving inferences from potentially inconsistent sets of news reports[6,31,32,34].

Fig. 1. An example of a news report in the form of structured text.

24 A. Hunter / Data & Knowledge Engineering 42 (2002) 23–56

1.1. Our approach to fusion

In order to merge items of structured text, we need to take account of the contents of thestructured text. Different kinds of content need to be merged in different ways. To illustrate,consider Examples 1.1–1.3.

Example 1.1. Consider the following two conflicting weather reports which are for the same dayand same city.

We can merge them so that the source is TV1 and TV3, and the weather for today is sun,and the weather for tomorrow is sun or rain.

An alternative way of merging these reports may be possible if we have a preference for onesource over the other. Suppose we have a preference for TV3 in the case of conflict, then themerged report is:

Example 1.2. Consider the following two structured reports which are for the same day butdifferent regions.

hweatherreport : hweatherreport :hsource : TV1i hsource : TV3ihdate : 19:5:1999i hdate : 19:5:1999ihcity : Londoni hcity : Londonihtoday : suni htoday : sunihtomorrow : suni htomorrow : raini

: weatherreporti : weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 19:5:1999ihcity : Londonihtoday : sunihtomorrow : sun or raini

: weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 19:5:1999ihcity : Londonihtoday : sunihtomorrow : raini

: weatherreporti

hweatherreport : hweatherreport :hdate : 5 Nov 99i hdate : 5 Nov 99ihregionalreport : hregionareport :hregion : South Easti hregion : North Westi

A. Hunter / Data & Knowledge Engineering 42 (2002) 23–56 25

Here we may wish to take the union of the two regionalreport features in the merged report,giving the following merged report,

Example 1.3. Consider the following two weather reports for the same day and same city.

Here we may wish to take the conjunction of inclement and showers, and range of 18–20C inthe merged report.

There are many further examples we could consider, each with particular features that indicatehow the merged report should be formed.

hweatherreport :hdate : 5 Nov 99ihregionalreport :hregion : South Eastihmaxtemp : 20Ci

: regionalreportihregionalreport :hregion : North Westihmaxtemp : 18Ci

: regionalreporti: weatherreporti

hweatherreport : hweatherreport :hsource : TV1i hsource : TV3ihdate : 1:8:1999i hdate : 1:8:1999ihcity : Parisi hcity : Parisihmiddayweather : hmiddayweather :hprecipitation : inclementi hprecipitation : showersihtemperature : 20Ci htemperature : 18Ci

middayweatheri middayweatheri: weatherreporti : weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 1:8:1999ihcity : Parisihmiddayweather :hprecipitation : inclement and showersihtemperature : 18C–20Ci

: middayweatheri: weatherreporti

hmaxtemp : 20Ci hmaxtemp : 18Ci: regionalreporti : regionalreporti



In our approach to merging items of structured text, in particular structured news reports, wedraw on domain knowledge to help produce merged reports. The approach is based on fusionrules defined in a logical meta-language. These rules are of the form a ) b, where if a holds, thenb is made to hold. So we consider a as a condition to check the information in the structuredreports and in the background information, and we consider b as an action to undertake toconstruct the merged report.To merge a pair of structured news reports, we start with the background knowledge and the

information in the news reports to be merged, and apply the fusion rules to this information. Fora pair of structured news reports and a set of fusion rules, we repeatedly apply the fusion rulesuntil no more rules apply. The application of the fusion rules is then a monotonic process thatbuilds up a set of actions that define how the structured news report should be merged. The in-formation in this fixpoint is then used to construct a merged structured news report. In order tomerge more than two reports, we can repeat the merging process iteratively.

1.2. Other approaches to fusion

Our logic-based approach differs from other logic-based approaches for handling inconsistentinformation such as belief revision theory (e.g. [19,22,36,38]), knowledgebase merging (e.g. [7,37]),and logical inference with inconsistent information (e.g. [3,5,9,18,42]). These proposals are toosimplistic in certain respects for handling news reports. Each of them has one or more of thefollowing weaknesses: (1) one-dimensional preference ordering over sources of information – fornews reports we require finer-grained preference orderings; (2) primacy of updates in belief re-vision – for news reports, the newest reports are not necessarily the best reports; and (3) weakmerging based on a meet operator – this causes unnecessary loss of information. Furthermore,none of these proposals incorporate actions on inconsistency or context-dependent rules speci-fying the information that is to be incorporated in the merged information, nor do they offer aroute for specifying how merged reports should be composed.Other logic-based approaches to fusion of knowledge include the KRAFT system and the use

of Belnap’s four-valued logic. The KRAFT system uses constraints to check whether informationfrom heterogeneous sources can be merged [26,43]. If knowledge satisfies the constraints, then theknowledge can be used. Failure to satisfy a constraint can be viewed as an inconsistency, but thereare no actions on inconsistency. In contrast, Belnap’s four-valued logic uses the values ‘‘true’’,‘‘false’’, ‘‘unknown’’ and ‘‘inconsistent’’ to label logical combinations of information (see for ex-ample [39]). However, this approach does not provide actions in case of inconsistency.Merging information is also an important topic in database systems. A number of proposals

have been made for approaches based in schema integration (e.g. [44]), the use of global schema(e.g. [24]), and conceptual modelling for information integration based on description logics[4,12,13,21,45]. These differ from our approach in that they do not seek an automated approachthat uses domain knowledge for identifying and acting on inconsistencies. Heterogeneous andfederated database systems also could be relevant in merging multiple news reports, but they donot identify and act on inconsistency in a context-sensitive way [16,41,47], though there is in-creasing interest in bringing domain knowledge into the process (e.g. [14,48]).Our approach also goes beyond other technologies for handling news reports. The approach of

wrappers offers a practical way of defining how heterogeneous information can be merged (see for


example [17,27,46]). However, there is little consideration of problems of conflicts arising betweensources. Our approach therefore goes beyond these in terms of formalizing reasoning with in-consistent information and using this to analyse the nature of the news report and for formalizinghow we can act on inconsistency.

1.3. The rest of the paper

In the rest of the paper, we develop the approach of logic-based fusion of news reports. InSection 2, we review the definitions for formalizing structured text. This includes coverage oftimestamps and sourcestamps. It also includes coverage of some concepts for describing thestructure of structured text. In Section 3, we present the syntax for fusion including fusion rulesand background knowledge. Then, in Section 4, we define how a set of fusion rules together withbackground knowledge can be executed to generate a set of actions that can be used to build amerged structured news report. We explain how to use these actions to build a merged structurednews report in Section 5. Then, in Section 6, we consider some properties of fusion systems basedon fusion rules.

2. Formalizing structured text

In this section, we formalize the composition and structure of structured news reports.

2.1. Structured text

Here we adopt some basic definitions that should be easy to view as an adaptation of ideas in avariety of fields in XML, relational and object-oriented databases, language engineering, andknowledgebased systems.

Definition 2.1. A word is a string of alphanumeric characters, and a phrase is a string of one ormore words. A text entry is either a phrase or a null value. A semantic label is a phrase. In thispaper, we assume the set of semantic labels and the set of text entries are disjoint.

Example 2.1. Examples of words include John, France, drive, happy, 23, and 3i, and ex-amples of phrases include University of London, John, 23 April 1999, and warm and

sunny.

Definition 2.2. If / is a semantic label, and w is a text entry, then h/ : wi is an atomic feature. Thecomplex features are defined as follows: (1) if h/ : wi is an atomic feature, then h/ : wi is acomplex feature; and (2) if / is a semantic label and r1; . . . ; rn are complex features, thenh/ : r1; . . . ;rn : /i is a complex feature. The type of a complex feature is the semantic label /. Anitem of structured text is just a complex feature.

Definition 2.3. Let h/ : r1; . . . ; rn : /i be a complex feature. The sub function is defined asfollows:


Subðh/ : r1; . . . ;rn : /iÞ ¼ fr1; . . . ; rng [ Subðr1Þ [ [ SubðrnÞ

Subðh/ : wiÞ ¼ fh/ : wig

For complex features a;b, a is a complex feature in b iff a 2 SubðbÞ.

We can consider a complex feature as a tree where the semantic labels are non-leaf nodes andthe text entries are leaves.

Definition 2.4. Let h/ : r1; . . . ; rn : /i be a complex feature. The semantic label / is the parent ofthe complex features r1; . . . ;rn. The elements of Subðh/ : r1; . . . ;rn : /iÞ are the offspring of /.The function Root is defined as: Rootðh/ : r1; . . . ; rn : /iÞ ¼ /.

Definition 2.5. The complex features, h/ : r1; . . . ; rn : /i and h/ : w1; . . . ;wn : /i have the samestructure iff r1 and w1 have the same structure, . . ., and rn and wn have the same structure. Theatomic features h/ : ai and h/ : bi have the same structure.

We assume that for an application, some complex features will be classified as structured re-ports. These will be complex features with some minimum structure such as illustrated in Ex-amples 1.1–1.3. We do not, however, assume any general conditions for classifying items ofstructured text as structured reports.

2.2. Timestamps and sourcestamps

There are four types of timestamp that we will consider for structured reports, namely, anatomic pointbased timestamp, an atomic intervalbased timestamp, a complex pointbased time-stamp, and a complex intervalbased timestamp, and we define these below.

Definition 2.6. The set of temporal semantic labels is a subset of the set of semantic labels used forstructured news reports.

Example 2.2. The set of temporal semantic labels includes time, date, publicationdate,and year.

Definition 2.7. The set of temporal text entries is a subset of the set of text entries used forstructured news reports. A temporal text entry may refer to a point or interval in a clock and/orcalendar. A temporal text entry is called a pointbased text entry if it refers to a point in a clock.And a temporal text entry is called an intervalbased text entry if it refers to an interval in a clockand/or calendar.

Example 2.3. Temporal text entries include 14.00hrs, 19 April 2000, and 19/4/00. Tem-poral text entries may or may not include the units of time used. For example, both 14.00 and14.00hrs are temporal text entries.


We will look more closely at the nature of points and intervals in the following definitions.

Definition 2.8. An atomic pointbased timestamp is an atomic feature ha : bi, where a is a temporalsemantic label referring to a particular clock and/or a particular calendar and b is a pointbasedtext entry with a value denoting a point in that clock and/or calendar.

Example 2.4. Examples of atomic pointbased timestamps include:

htime : 14:00hrsi hdate : 19 April 2000ihpublicationdate : 19=4=00i hyear : 2004i

We assume the background knowledge includes axioms that make different formats for temporaltext entries interchangeable, so that for example 19 April 2000 is equivalent to 19/4/2000.

Definition 2.9. An atomic intervalbased timestamp is an atomic feature ha : bi, where a is a tem-poral semantic label referring to a particular clock and/or a particular calendar and b is a in-tervalbased text entry with a value denoting an interval in that clock and/or calendar.

We view time intervals as being either implicitly given as an interval with the start and endpoints being inferred, or explicitly given in terms of a start point and an end point.

Definition 2.10. An explicit intervalbased text entry is of the form X–Y, where X and Y denotepoints, and an implicit intervalbased text entry of the form X, where X describes a period of timewithout using explicit end points.

Example 2.5. Examples of atomic intervalbased timestamps with explicit intervalbased text en-tries include:

hreportingperiod : 1=1=00–31=3=00i hopenhours : 09:00–12:00iExample 2.6. Examples of atomic intervalbased timestamps with implicit intervalbased text en-tries include:

hreportingperiod : 2004i hholidayperiod : EasteriSo for 2004, the inferred start point in days is 1/1/2004 and the inferred endpoint in days is31/12/2004.

It can appear difficult to distinguish some implicit intervalbased text entries from pointbasedtext entries. We address the problem of handling implicit intervals by reducing each intervalbasedtext entry to pointbased text entries. We assume that the background knowledge includes axiomsfor this (as discussed in [35]).

Definition 2.11. A complex pointbased timestamp is either an atomic pointbased timestamp or acomplex feature hw : /1; . . . ;/n : wi, where w is a temporal semantic label refering to a particularclock and/or a particular calendar and /1; . . . ;/n are complex pointbased timestamps that de-scribe the point in that clock and/or calendar.


In this paper, we will assume atomic and complex pointbased timestamps should be inter-changeable by appropriate axioms in the background knowledge.

Example 2.7. An example of a complex pointbased timestamp is:

So we can assume this complex pointbased timestamp is equivalent to hdate : 23 April 2000i.

Definition 2.12. A complex intervalbased timestamp is either an atomic intervalbased timestamp ora complex feature hw : /1; . . . ;/n : wi, where w is a temporal semantic label refering to a particularclock and/or a particular calendar and /1; . . . ;/n are complex pointbased timestamps that de-scribe the interval in that clock and/or calendar.

Also, in this paper, we will assume atomic and complex intervalbased timestamps should beinter-changeable by appropriate axioms in the background knowledge.

Example 2.8. An example of a complex intervalbased timestamp is:

So we assume this complex intervalbased timestamp is equivalent to

huniversity term : 10=1=2000–29=3=2000i

In the rest of this paper, we assume each news report may have a timestamp which has type date,and may be pointbased or intervalbased, and complex or atomic. It may also have a sourcestampwhich is an atomic feature of type source and text entry that describes what the source ofthe news report is. For example, for a weather report, we may have the atomic featurehsource : BBC TVi. Whilst we take a restricted position on timestamps and sourcestamps in thispaper, we believe that the approach presented here can be extended to further types and com-binations of timestamps and sourcestamps in structured text. For more information on usingtemporal knowledge in structured news reports see [35].

2.3. Structural information about structured news reports

In order to compare items of structured text on the basis of their structure, we will use thefollowing notion of a skeleton.

hdate :hday : 23i,hmonth : Aprili,hyear : 2000i,

: datei

huniversity term :hfirst day of term : 10=1=2000i,hlast day of term : 29=3=2000i,

: university termi


Definition 2.13. A skeleton is a tree ðN ;E; SÞ defined as follows: N is the set of nodes where eachnode is a semantic label; E is a set of edges represented by pairs of nodes; and S is the set of siblingneighbours represented by pairs of nodes such that ðx; yÞ 2 S iff (i) x and y are siblings (i.e. x and yhave the same parent) and (ii) x is to the left of y.

According to Definition 2.13, the relative positions of siblings is important in a skeleton. So if xand y are siblings in a skeleton T, such that x is left of y, then we can form a different skeleton T 0

where y is left of x.Since a skeleton is essentially a complex feature without the text entries, a skeleton can be

formed from an item by just removing the text entries. In other words, we use the semantic labelsin a item of structured text as the name of the nodes in the skeleton. We call each such name, thesimple name of the node. However, to obviate any problems arising from multiple occurrences of asemantic label in a complex feature, and hence the same name being used for different nodes, wecan adopt the following definition for pedantic names for nodes. This definition uses the sequenceof semantic labels used on the path from the root to the particular occurrence of a semantic labelin a feature.

Definition 2.14. The pedantic name for any node in a structured news report is defined inductivelyas follows: the pedantic name for the root of a structured news report is the semantic label for thereport. For a nested feature h/ : w1; . . . ;wn : /i, let the pedantic name for / be a=/, then thepedantic name for the root of each wi is a=/=RootðwiÞ. (If there is more than one wi with the samesemantic label at the root, then the occurrences can be differentiated by adding a superscript to thesemantic label in the pedantic name.)

Example 2.9. The pedantic name for source in Fig. 1 is bidreport/reportinfo/source.

Essentially, a pedantic name is like a unix file name that can be given by the path of directoriesfrom the root. But where we do not have ambiguity, we will just use the simple name.

Definition 2.15. The skeleton function, denoted Skeleton, is applied to a complex feature h andreturns the skeleton ðN ;E; SÞ for h, where the set of nodes N is the set of names (simple or pe-dantic) formed from the semantic labels used in h, and E and S are defined as follows, where / ,Rootðw1Þ

; . . . ;RootðwnÞ

are either the simple names or pedantic names (as required) for the

semantic labels /, and the roots of w1; . . . ;wn, respectively:

E ¼ fð/ ; RootðwiÞ Þ j h/ : w1; . . . ;wn : /i 2 SubðhÞ and i 2 f1; . . . ; ngg

S ¼ fðRootðwiÞ ; RootðwjÞ

Þ j h/ : w1; . . . ;wn : /i 2 SubðhÞ and i; j 2 f1; . . . ; ng and i < jg

So the skeleton function is defined to extract the tree structure of each item of structured text.

Definition 2.16. Let Ti ¼ ðNi;Ei; SiÞ and Tj ¼ ðNj;Ej; SjÞ be skeletons and let � be a pre-orderingover skeletons such that: Ti � Tj iff Ni � Nj and Ei � Ej and Si � Sj.


So for skeletons Ti and Tj, if we have Ti � Tj, then the set of edges in Ti is a subset of the edges inTj, and for all sibling nodes x; y in Ti, if x is left of y in Ti, then x is left of y in Tj.An example of a pre-ordering is given in Fig. 2.

Definition 2.17. Let h be a complex feature and let S be a skeleton. An instantiation of S by h isdefined as follows:

If SkeletonðhÞ � S then h is a partial instantiation of S

If SkeletonðhÞ � S and S � SkeletonðhÞ then h; is a full instantiation of S

If h is an instantiation of S, this is denoted by SðhÞ.With reference to Definition 2.5, structured reports h1 and h2 have the same structure iff

Skeletonðh1Þ � Skeletonðh2Þ and Skeletonðh2Þ � Skeletonðh1Þ.

Definition 2.18. Let F and F0 be features such that F 2 SubðF0Þ and let SkeletonðF0Þ ¼ ðN ;E; SÞ.A position of F in F0 is a pedantic name for the root of F in SkeletonðN ;E; SÞ.

Definition 2.19. Let h/ : w1; . . . ;wn : /i be a complex feature. The anchor for each complex featurewi is /. So each of w1; . . . ;wn is anchored at /.

The anchor is the position from which one or more complex features hang. In this way, theanchor gives the connection to the rest of structured text.

Example 2.10. Consider Fig. 1. Here we can see hbid date : 30 May 2000i is at positionbidreport/biddate and it is anchored at bidreport.

In this way, we are using the notions of nodes in skeletons and semantic labels in structuredreports interchangeably.

3. Syntax for fusion

In this section, we give the syntax for the fusion rules and associated background knowl-edge for defining fusion systems. We assume fusion is undertaken on pairs of structured

Fig. 2. Assume T3 and T4 are both skeletons. Here, T3 � T4 holds.


news reports, and that these reports are represented as logical terms in the logical fusionrules.

Definition 3.1. Each complex feature is equivalent to a ground term called a feature term.If h/ : w1; . . . ;wn : /i is a complex feature, and w0

1 is a feature term that represents w1; . . ., andw0

n is a feature term that represents wn, then /ðw01; . . . ;w

0nÞ is a feature term that repre-

sents h/ : w1; . . . ;wn : /i. If h/ : wi is a complex feature, then /ðwÞ is a feature term that repre-sents it.

Definition 3.2. Each text entry is equivalent to a constant symbol called a text entry constant. So ifT is a text entry, then T is a text entry constant.

Example 3.1. Consider the following item of structured text.

This can be represented by the feature term:

auctionreportðbuyerðnameðfirstnameðJohnÞ;surnameðSmithÞÞÞ;propertyðLot37ÞÞIn defining fusion rules, we require three kinds of atom. These are structural atoms, backgroundatoms, and action atoms. The structural atoms capture information about the structure and typeof complex features, and the type and text entries for the atomic features, in individual structurednews reports, and between paris of structured news reports. The background atoms relate in-formation in individual structured news reports to the domain knowledge. Finally, action atomscapture instructions for building merged structured news reports.

Definition 3.3. The structural atoms are atoms that include the following definitions:

1. FeatureType(F,T) where T is the type of feature F.2. Report(R) where R is a structured news report.3. TextEntry(A,E) where E is the text entry for the atomic feature A.4. IncludeFeature(F,F0Þ where F0 is a feature in F, and so F0 2 SubðFÞ.5. AtomicFeature(F,A) where A is an atomic feature in F.6. Position(F,P,R) where P is a position of the feature F in the report R.7. Anchor(F,P,R) where P is a position of an anchor of the feature F in the report R.8. SameSkeleton(F,F0) where the features F and F0 are such that Skeleton(F) ¼ Skele-

ton(F0).9. SameTextEntry(A,A0) where the atomic features A and A0 have the same text entry.10. NextSibling(P,P0,R) where position P is an immediate sibling to the left of P0 in

report R.

We denote the set of structural atoms by S.

hauctionreport :hbuyer : hname : hfirstname : Johni; hsurname : Smithi : namei : buyeri,hproperty : Lot37i

: auctionreporti


Example 3.2. To illustrate some of these atoms, consider the following:

Example 3.3. Consider Fig. 1. Let this report be denoted R, and let hbid date : 30 May 2000ibe denoted F. For this, we have

The structural atoms are evaluated by the underlying implementation for fusion. In otherwords, for any pair of structured news reports, the set of ground structural atoms that hold iscompletely determined. These atoms can be viewed as ‘‘built-in’’ predicates by analogy withProlog.We also require atoms that relate the contents of structured text to the background knowledge.

These are background atoms and a partial list is given below.

Definition 3.4. The background atoms are atoms that include the following definitions:

1. SameDate(F,F0) where F and F0 are timestamps that refer to the same time point.2. SameSource(F,F0) where F and F0 are sourcestamps that refer to the same source.3. SameCity(F,F0) where F and F0 are features that refer to the same city.4. Source(R,F) where F is the sourcestamp in the report R.5. Date(R,F) where F is the datestamp in the report R.6. Coherent(F,F0) where the features F and F0 are coherent.7. Prefer(F,F0) where the feature F is preferred to the feature F0.

We denote the set of background atoms by B.

Example 3.4. To illustrate background atoms consider the following literals that may be includedin the background knowledge:

Background atoms like SameDate(F,F0Þ, SameSource(F,F0Þ, and SameLog(F,F0Þ areuseful to determine when two features are equivalent and thereby indicate whether the featurescome from are on the same topic. For example, if we have two reports, we can use these atoms todetermine that the two reports have SameCity and SameDate holding, as a precondition beforemerging. We will use them as conditions in fusion rules below for this purpose.The coherent relation captures when two features are mutually consistent. The simplest form of

inconsistency is between a pair of atomic features. Consider two structured reports, h1 and h2,where the atomic feature ha : /i is in item h1 and the atomic feature ha : wi is in item h2 and / 6¼ w.For some semantic labels, this inequality would suggest an inconsistency with the domain

TextEntry(city(London),London)

IncludesFeature(address(street(GowerSt),city(London)),city(London))

SameTextEntry(city(London),area(London))

Position(F,bidreport/biddate,R)

Anchor(F,bidreport,R)

SameDate(date(14Nov01), date(14.11.01))

SameCity(city(Mumbai), city(Bombay))

SameLog(log(date(day(14), month(11), year(01))), log(date(14.11.01)))


knowledge, as illustrated by Example 3.5. Obviously different text entries for the same semanticlabel do not always suggest an inconsistency, as illustrated by Example 3.6.

Example 3.5. Let h1 and h2 be two structured reports. Suppose hweather : suni is an atomicfeature in h1 and hweather : raini is an atomic feature in h2, and h1 and h2 are on the topic‘‘weather reports for London on 1 August 1999’’.

Example 3.6. Let h1 and h2 be two structured reports. Consider hcity : Londoni is an atomicfeature in h1 and hcity : Parisi is an atomic feature in h2, and h1 and h2 are on the topic‘‘weather reports for 1 August 1999’’.

An example of a definition for the coherent relation is given below.

Example 3.7. Let us assume we have the following background knowledge for identifying pair-wise inconsistency in atomic features of type weather in weather reports:

This is likely to be only a partial list of literals required for this purpose. In addition, we willneed various further formulae to define Coherent for other types of atomic feature.

The background atoms are evaluated by querying background knowledge. In the simplest case,the background knowledge may be just a set of ground background atoms that hold. However, wewould expect the background knowledge would include classical quantified formulae that can behandled using automated reasoning. In any case, the background knowledge is defined by aknowledge engineer building a fusion system for an application.

Definition 3.5. The set of check atoms, denoted C, is the union of the structural atoms and thebackground atoms.

We now consider the syntax for action atoms, and this requires the definition for actionfunctions.

8X CoherentðX;XÞ:CoherentðweatherðsunÞ;weatherðrainÞÞ:CoherentðweatherðcoldÞ;weatherðhotÞÞ:CoherentðweatherðcoolÞ;weatherðhotÞÞ:CoherentðweatherðcoldÞ;weatherðwarmÞÞ:CoherentðweatherðcoolÞ;weatherðwarmÞÞ:CoherentðweatherðsunÞ;weatherðsnowÞÞCoherent(weather(snow),weather(sleet))

Coherent(weather(sun),weather(sunny))

Coherent(weather(rain),weather(rainy))

Coherent(weather(heavy rain),weather(storms))

Coherent(weather(rain),weather(showers))

Coherent(weather(showers),weather(unsettled))


Definition 3.6. The set of action functions include the following functions that can be used as termsin the action atoms:

1. Interval(X,Y) where X and Y are text entries and the function returns an interval X-Y.2. Conjunction(X,Y) where X and Y are text entries and the function returns a text entry X

and Y.3. Disjunction(X,Y) where X and Y are text entries and the function returns a text entry X

or Y.

We assume action functions are defined in the underlying implementation that uses the actionsas instructions to build a merged report.

Example 3.8. The following are examples of definitions for action functions:

We now consider a basic set of action atoms. In Section 6, we consider extending the set ofaction atoms.

Definition 3.7. The action atoms are atoms that include the following definitions:

1. CreateSkeleton(R) where R is a report. The resulting action is to create the skeleton forthe merged report. The postcondition of this action is Skeleton(R) holding.

2. AddText(E,P) where E is a text entry, and P is a tree position. The resulting action is toadd the text entry to the tree in position P. The precondition of this action is that there isno offspring, or text entry, for the semantic label at P. The postcondition of this action is thatE is the text entry for the semantic label at P.

3. ExtendFeature(F,P) where F is a feature and P is a position. The resulting actionis to extend the tree with F at position P. The preconditions for this are that the semanticlabel for Root(F) should equal the semantic label at P and there is no offspring, ortext entry, for the semantic label at P. The postcondition of this action is that Root(F)is at P.

4. AddFeature(F,P) where F is a feature and P is a position. The resulting action is to add Fto the tree in position P. The precondition of this action is that there is no text entry for thesemantic label at P. The postcondition of this action is that the semantic label at position P isthe anchor for F.

5. Populate(F,P) where F is a feature and P is a position. If there is a Skeleton(F) at-tached to position P in the tree, then the resulting action is to add the text entries inF to the vacant slots in Skeleton(F) in the tree. The preconditions of this action arethat there are no text entries in the offspring of the semantic label at P and the skele-ton rooted at P equals Skeleton(F). The postcondition of this action is that Root(F) isat P.

We denote the set of action atoms by A.

Interval(18C,25C) ¼ 18–25C

Conjunction(TV1,TV3) ¼ TV1 and TV3

Disjunction(sun,rain) ¼ sun or rain


Example 3.9. Let R be the report on the left. Then Skeleton(R) gives the item on the right:

Example 3.10. Consider AddText(BBC TV,weatherreport/source). This instruction ap-plied to the item below on the left gives the item on the right.

Example 3.11. Consider the item on the left, and the feature F on the right:

The instruction ExtendFeature(F,weatherreport/weathertoday) when applied tothe item on the left above gives the following item:

Implicit with the definition for ExtendFeature is the requirement to turn anatomic fea-ture into a complex feature. This is also possible with AddFeature though not necessarily.

hweatherreport : hweatherreport :hsource : TV1 and TV3i hsource : ihdate : 19:5:1999i hdate : ihcity : Londoni hcity : ihtoday : suni htoday : ihtomorrow : sun or raini htomorrow : i


hweatherreport : hweatherreport :hsource : i hsource : BBC TVihdate : i hdate : ihcity : Londoni hcity : Londonihtoday : i htoday : ihtomorrow : i htomorrow : i


hweatherreport : hweathertoday :hsource : i hprecipitation : snowihdate : i htemp : 0Cihcity : Londoni : weathertodayihweathertoday : i

: weatherreporti

hweatherreport :hsource : ihdate : ihcity : Londonihweathertoday : ihprecipitation : snowihtemp : 0Ci

: weathertodayi: weatherreporti


Example 3.12. Consider the item on the left, and the feature F on the right:

Then the instruction AddFeature(F,weatherreport) gives the following item:

Example 3.13. Consider the item on the left, and the feature F on the right.

Then the instruction Populate(F,weatherreport/weathertoday) gives the following:

The action atoms cause a structured news report to be constructed. They define the struc-ture of the report, and the text entries in the report. We will explain how this can be donein Section 5. In the remaining part of this section, we explain how we use these atoms in fusionrules.

hweatherreport : hprecipitation :hsource : i htype : snowihdate : i hamount : 2cmihcity : Londoni : precipitationi

: weatherreporti

hweatherreport :hsource : ihdate : ihcity : Londonihprecipitation : ihtype : snowihamount : 2cmi

: precipitationi: weatherreporti

hweatherreport : hweathertoday :hsource : i hprecipitation : snowihdate : i htemp : 0Cihcity : Londoni : weathertodayihweathertoday : ihprecipitation : ihtemp : i


hweatherreport :hsource : ihdate : ihcity : Londonihweathertoday : ihprecipitation : snowihtemp : 0Ci



Definition 3.8. The set of atoms is C [A. An atom is ground if each term in the atom is ground.If an atom is not ground, then it is unground. The set of ground atoms is denoted G.

Definition 3.9. Let f:;_;^g be the set of classical logical connectives. The set of ground formulae,denoted F, is the set of all classical formulae that can be formed from G and the set of logicalconnectives using the usual inductive definition for classical logic.

We leave consideration of quantification to Definitions 3.11 and 3.12. So in the above defini-tion, if a formula contains an unground atom, then the formula will not be a well-formed formulaof classical logic, because the free variable(s) will be unbound.

Definition 3.10. A check formula is a formula composed from one or more check atoms and zeroor more classical logical connectives using the usual inductive definition for classical logic for-mulae. An action formula is a formula composed from one or more action atoms and zero or moreclassical logical connectives using the usual inductive definition for classical logic formulae.

In the following definition, we introduce a non-classical form of implication that is denoted bythe ) symbol.

Definition 3.11. A fusion rule is a rule of the following form where a is a check formula and b is anaction formula.

a ) b

We assume that each variable in each rule is implicitly universally quantified, with the universalquantifiers outermost (i.e. if X1; . . . ;Xn are the free variables in a ) b, then the explicitly quan-tified version is 8X1; . . . ;Xn ða ) bÞ).

Normally, b will be an atom or a conjunction of atoms. However, if it incorporates disjunction,then it captures non-determinism in the intended actions, and if it incorporates negation, then thenegation captures a form of preclusion in the intended action.

Example 3.14. The following are four examples of fusion rules.

ReportðR1Þ ^ ReportðR2Þ ^ SameSkeletonðR1;R2Þ ^ SameDateðR1;R2Þ) CreateSkeletonðR1Þ

ReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^PositionðA1;P;R1Þ ^ FeatureTypeðA1;rainfallÞ ^ FeatureTypeðA2;rainfallÞ^CoherentðA1;A2Þ ^ TextEntryðA1;X1Þ ^ TextEntryðA2;X2Þ) AddTextðConjunctionðX1;X2Þ;PÞReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^FeatureTypeðA1;rainfallÞ ^ FeatureTypeðA2;rainfallÞ ^ :CoherentðA1;A2Þ^SourceðR1;S1Þ ^ SourceðR2;S2Þ ^ PreferðS1;S2Þ^PositionðA1;P;R1Þ ^ TextEntryðA1;EÞ) AddTextðE;PÞ


The action formulae give a logical specification that we can reason with. So for example, if wehave an action :a given by one fusion rule, and we have an action a _ b given by another fusionrule, then taking these together we are obliged to undertake the action b.

Definition 3.12. The set of background formulae is formed from the check formulae and theclassical universal quantifier, denoted 8, so that any unbound variable is bound by universalquantification. Any subset of the background formulae is called background knowledge.

Definition 3.13. A fusion system is a pair ðD;CÞ where C is a set of fusion rules and D is back-ground knowledge.

We explain how to use a fusion system in the next section.

4. Executing fusion rules

In order to use a set of fusion rules, we need to be able to execute them. We need a fusionsystem and a pair of structured news report to do this.

Definition 4.1. A fusion call is a triple ðD;C; fR1;R2gÞ, where ðD;CÞ is a fusion system, and R1

and R2 are structured news reports.

Suppose we want to merge the reports R1 and R2. To do this, we use the backgroundknowledge and the atoms Report(R1) and Report(R2), and then attempt to apply each ofthe fusion rules by a form of modus ponens, adding the consequent of each applied rule to thecurrent execution state, until no more fusion rules apply.

Definition 4.2. An execution state is a subset of F.

An execution state lists the action formulae that hold at each point in an execution of a fusioncall.

Definition 4.3. The starting execution state for a fusion call ðD;C; fR1;R2gÞ is fg.

So at the start of an execution, all we know is the background knowledge, and the two reports.An execution step for a fusion call takes an execution state and a fusion rule and creates a new

ReportðR1Þ ^ ReportðR2Þ ^ IncludeFeatureðR1;F1Þ ^ IncludeFeatureðR2;F2Þ^FeatureTypeðF1;regionalreportÞ ^ FeatureTypeðF2;regionalreportÞ^AtomicFeatureðF1;A1Þ ^ AtomicFeatureðF2;A2Þ^FeatureTypeðA1;regionÞ ^ FeatureTypeðA2;regionÞ^:SameTextEntryðA1;A2Þ ^ PositionðF1;P;R1Þ) AddFeatureðF1;PÞ ^ AddFeatureðF2;PÞ


execution state. The new execution state is the old execution state plus a grounded version of theconsequent of one of the fusion rules. For this, we need a form of substitution.

Definition 4.4. A substitution k for a fusion rule a ) b is an assignment k of ground terms tovariables in a and b such that kðaÞ and kðbÞ are ground formulae.

Example 4.1. Consider the first fusion rule in Example 3.14, where R1 and R2 are grounded byfeature terms. A substitution k is

Definition 4.5. An execution step for a fusion call ðD;C; fR1;R2gÞ is a triple ðX ; a ) b; Y Þ, whereX is an execution state, a ) b is a fusion rule, Y is an execution state, and the following twoconditions hold where k is a substitution for a ) b:

1. D [ X [ fReportðR1Þ;ReportðR2Þg ‘ kðaÞ2. Y ¼ X [ fkðbÞg

Each execution step can be regarded as an application of a form of modus ponens.

Definition 4.6. An execution sequence for a fusion call ðD;C; fR1;R2gÞ is a sequence of executionstates hX0; . . . ;Xni, where the following conditions hold:

1. X0 ¼ fg2. for all 06 i < n, Xi � Xiþ13. for all 06 i < n, there is an execution step ðXi; a ) b;Xiþ1Þ for the fusion call ðD;C; fR1;R2gÞ4. there is no execution step ðXn; a ) b;Xnþ1Þ for the fusion call ðD;C; fR1;R2gÞ such that condi-

tions 1–3 hold.

An execution sequence for a fusion call therefore ensures that: (1) the execution sequence hasthe starting execution state in the first execution step; (2) each execution step results in an ex-panded execution state; (3) each execution step follows from the previous step and uses a fusionrule from the fusion system; and (4) the execution sequence is maximal in the sense that it cannotbe extended without violating the other conditions.By definition, an execution sequence is a monotonically increasing sequence of sets. Each Xi in

the sequence has one more action formula than the previous set Xi�1 in the sequence.

Definition 4.7. An action sequence for an execution sequence hX0; . . . ;Xni is a sequence of actionformulae hA1; . . . ;Ani where for 0 < i6 n, Xi ¼ Xi�1 [ fAig.

An action sequence in just the sequence of action formulae that are added to the execution stateby each execution step. The action sequence summarizes the actions to be taken to construct themerged news report.

R1 ¼ weatherreport(date(12.12.01),city(London),weather(rain))

R2 ¼ weatherreport(date(12.12.01),city(London),weather(sleet))


Example 4.2. Consider the following pair of reports.

And a set of fusion rules that includes the following rules:

A fusion call with these fusion rules and news reports together with appropriate backgroundknowledge can give the following actions:

The complete action sequence is then given by:

hA1;A2;A3;A4;A5i

hweatherreport : hweatherreport :hsource : TV1i hsource : TV3ihdate : 19:5:1999i hdate : 19:5:1999ihcity : Londoni hcity : Londonihweather : suni hweather : showersi


ReportðR1Þ ^ ReportðR2Þ ^ SameSkeletonðR1;R2Þ^SameDateðR1;R2Þ ^ SameCityðR1;R2Þ) CreateSkeletonðR1Þ

ReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^PositionðA1;P;R1Þ ^ FeatureTypeðA1;sourceÞ ^ FeatureTypeðA2;sourceÞ^:SameSourceðR1;R2Þ ^ TextEntryðA1;E1Þ ^ TextEntryðA2;E2Þ) AddTextðConjunctionðE1;E2Þ;PÞReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^PositionðA1;P;R1Þ ^ FeatureTypeðA1;dateÞ ^ FeatureTypeðA2;dateÞ^TextEntryðA1;E1Þ ^ SameTextEntryðA1;A2Þ) AddTextðE1;PÞReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^PositionðA1;P;R1Þ ^ FeatureTypeðA1;cityÞ ^ FeatureTypeðA2;cityÞ^TextEntryðA1;E1Þ ^ SameTextEntryðA1;A2Þ) AddTextðE1;PÞReportðR1Þ ^ ReportðR2Þ ^ AtomicFeatureðR1;A1Þ ^ AtomicFeatureðR2;A2Þ^FeatureTypeðA1;weatherÞ ^ FeatureTypeðA2;weatherÞ ^ :CoherentðA1;A2Þ^SourceðS1;R1Þ ^ SourceðS2;R2Þ ^ PreferðS1;S2Þ^PositionðA1;P;R1Þ ^ TextEntryðA1;E1Þ) AddTextðE1;PÞ

A1 ¼ CreateSkeletonðR1ÞA2 ¼ AddTextðConjunctionðTV1;TV3Þ;weatherreport=sourceÞA3 ¼ AddTextð19:5:1999;weatherreport=dateÞA4 ¼ AddTextðLondon;weatherreport=cityÞA5 ¼ AddTextðsun;weatherreport=weatherÞ


In the next section, we consider how we can use an action sequence for constructing a mergedstructured news report.

5. Acting on fusion rules

Since we are building a merged structured news report in a number of steps, we need to firstclarify the nature of the intermediate stages in the construction process. To help, we adopt thefollowing definition of a fusion tree.

Definition 5.1. A fusion tree is a tree of the form ðN ;E; S; T ;BÞ, where ðN ;E; SÞ is a skeleton, T is aset of text entries, and B is a subset of N � T . The set of nodes of the tree is N [ T and the set ofedges of the tree is E [ B.

So B contains the edges that attach the text entries in T to the skeleton. If a fusiontree ðN ;E; S; T ;BÞ is a skeleton, then T ¼ fg and B ¼ fg. If a fusion tree is an item ofstructured text, then T is the set of text entries used in the structured text, and B specifieswhich atomic features they instantiate. In any case, a fusion tree is a partial instantiation of askeleton.

Definition 5.2. A construction sequence hT1; . . . ; Tni for an action sequence hA1; . . . ;Ani is a se-quence of fusion trees such that

1. If Ti ¼ ðNi;Ei; Si; Ti;BiÞ, and Tiþ1 ¼ ðNiþ1;Eiþ1; Siþ1; Tiþ1;Biþ1Þ, then Ni � Niþ1 and Ei � Eiþ1 andSi � Siþ1 and Ti � Tiþ1 and Bi � Biþ1.

2. T1 is the result of carrying out A1 on the fusion tree ðfg; fg; fg; fg; fgÞ.3. Tiþ1 is the result of carrying out Aiþ1 on Ti.

So an action sequence is a sequence of instructions to build a merged structured news report byacting incrementally on a fusion tree. To illustrate, consider the following example.

Example 5.1. Continuing Example 4.2, we have the action sequence, where the first instruction isCreateSkeleton(R) which results in the following fusion tree.

hweatherreport :hsource : ihdate : ihcity : ihweather : i

: weatherreporti


The second instruction is AddText(Conjunction(TV1,TV3),weatherreport/

source) which updates the above fusion tree to give the following fusion tree.

The third instruction is AddText(19.5.99,weatherreport/date) which updates theabove fusion tree to give the following fusion tree.

The fourth instruction is AddText(London,weatherreport/city) which updates theabove fusion tree to give the following fusion tree.

The fifth instruction is AddText(sun,weatherreport/weather) which updates theabove fusion tree to give the following fusion tree.

The net result is a merged structured news report.

In Example 5.1, we start by constructing a skeleton, and then adding text entries. So for thefusion tree ðN ;E; S; T ;BÞ, we start by defining ðN ;E; SÞ, and then incrementally add to T and Buntil we have a fusion tree that defines an item of structured text. In the next example, we form amerged structured news report from some complex features.

hweatherreport :hsource : TV1 and TV3ihdate : ihcity : ihweather : i

: weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 19:5:1999ihcity : ihweather : i

: weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 19:5:1999ihcity : Londonihweather : i

: weatherreporti

hweatherreport :hsource : TV1 and TV3ihdate : 19:5:1999ihcity : Londonihweather : suni

: weatherreporti


Example 5.2. Consider R1 being the first report given in Example 1.2. Now consider the actionsequence hA1; . . . ;A4i where

where

For the first instruction, CreateSkeleton(R1), we get the following fusion tree:

For the second instruction, AddTextð5 Nov 99;weatherreport=dateÞ, we update theabove fusion tree to get the following:

For the third instruction, Populate(F1,weatherreport/regionalreport), we updatethe above fusion tree to get the following:

For the fourth instruction, AddFeature(F2,weatherreport), we update the above fusiontree to get the following:

A1 ¼ CreateSkeletonðR1ÞA2 ¼ AddTextð5 Nov 99; weatherreport=dateÞA3 ¼ PopulateðF1;weatherreport=regionalreportÞA4 ¼ AddFeatureðF2;weatherreportÞ

F1 ¼ regionalreport(region(South East),maxtemp(20C))

F2 ¼ regionalreport(region(North West),maxtemp(18C))

hweatherreport :hdate :ihregionalreport :hregion : ihmaxtemp : i


hweatherreport :hdate : 5 Nov 99ihregionalreport :hregion : ihmaxtemp : i





If an action sequence is non-conflicting and complete, then the set of instructions can be used tobuild a fusion tree and they leave no gaps in the text entries in the fusion tree. If the action se-quence is incomplete, then the fusion tree will have text entries missing, and if the action sequenceis conflicting then there will be instructions for putting more than one text entry into the sameposition or instructions for putting both a text entry and a complex feature into the same position.Before defining when an action sequence is complete and non-conflicting, we consider when anaction sequence is consistent.

Definition 5.3. An action sequence hA1; . . . ;Ani is consistent iff fA1; . . . ;Ang0 ?.

This definition takes a direct interpretation of consistent. It just means an action sequence isconsistent if there is not an instruction to do both an action a and an action :a. An action se-quence can be checked for consistency before an attempt to construct a merged report is made.Since, an action sequence is a specification for a merged structured news report, we can de-

termine whether a particular structured news report meets the specification. We define this asfollows:

Definition 5.4. The meets relation is a binary relation between items of structured text and actionsequences, and is defined as follows:

R meets hA1; . . . ;Ani iff R meets A1 and . . . and R meets An

So by recursion, we need to consider the meets relation for action formulae. For action formulaeAi that are atoms, we require the following rules:




R meets CreateSkeletonðR0Þ if SkeletonðR0Þ � SkeletonðRÞ holdsR meets AddTextðE;PÞ if 9A s:t: TextEntryðA;EÞ ^ PositionðA;P;RÞ holdsR meets AddFeature(F,P) if Anchor(F,P,R) holds

R meets ExtendFeature(F,P) if Position(F,P,R) holds

R meets Populate(F,P) if Position(F,P,R) holds


For action formulae Ai that are not atoms, we require the following rules:

Definition 5.5. An action sequence hA1; . . . ;Ani is non-conflicting iff there is a construction se-quence hT1; . . . ; Tni for hA1; . . . ;Ani and Tn meets hA1; . . . ;Ani.

However, the meets relation is a little too relaxed in the sense that a report may meet an actionsequence but may also include extra information that has not been specified.

Definition 5.6. The matches relation is defined as follows, where R is a structured news report andhA1; . . . ;Ani is an action sequence.

The matches relation identifies the minimal structured news report(s) that meet(s) the actionsequence. In other words, it identifies the news reports that do not include any superfluous in-formation.

Definition 5.7. An action sequence hA1; . . . ;Ani is unambiguous iff there is only one constructionsequence hT1; . . . ; Tni for hA1; . . . ;Ani.If an action sequence hA1; . . . ;Ani is unambiguous, there is exactly one structured news report R

such that R matches hA1; . . . ;Ani.

Definition 5.8. An action sequence hA1; . . . ;Ani is complete iff the construction sequencehT1; . . . ; Tni for hA1; . . . ;Ani is such that Tn is a structured news report.

In other words, an action sequence is a complete if it is not the case that the fusion tree thatresults has missing text entries.

6. Properties of fusion rules

We now consider a few properties of fusion rules to clarify the nature of the syntax and exe-cution.

Proposition 6.1. Assuming the actions are only composed from the action atoms defined in Definition3.7. An action sequence hA1; . . . ;Ani is non-conflicting implies the following conditions:

1. :9 P s.t. fA1; . . . ;Ang ‘ AddText(E,P) ^ AddTextðE0;PÞ ^ E 6¼ E0

2. :9 P s.t. fA1; . . . ;Ang ‘ AddText(E,P) ^ ExtendFeature(F,P)

3. :9 P s.t. fA1; . . . ;Ang ‘ AddText(E,P) ^ AddFeature(F,P)

R meets a _ b iff R meets a or R meets bR meets a ^ b iff R meets a and R meets bR meets :a iff it is not the case that R meets a

R matches hA1; . . . ;Aniiff

R meets hA1; . . . ;Ani and 9=R0 s:t: R0 meets hA1; . . . ;Ani and SkeletonðR0Þ � SkeletonðRÞ


Proof. Consider Condition 1. The result of AddText(E,P) is a fusion tree with E being the textentry for the atomic feature at P. By the definition of atomic features, there can only be one textentry at P. So it is not possible to have both E and E0 at P when E 6¼ E0. So there is no constructionsequence hT1; . . . ; Tni for hA1; . . . ;Ani, where Tn meets hA1; . . . ;Ani. The cases for Conditions 2 and3 are essentially the same. �

A fusion call does not necessarily produce a unique action sequence. In other words, normallythere is some non-determinism in which fusion rules are applied.

Proposition 6.2. Let ðD;C; fR1;R2gÞ be a fusion call. It is not necessarily the case that there is aunique action sequence hA1; . . . ;Ani that is generated.

Proof. Consider Example 4.2, in which the action sequence hA1; . . . ;A5i is generated. This fusioncall could equally generate the following action sequence hA0

1; . . . ;A05i, where:

There can also be some non-determinism in an action sequence. �

Proposition 6.3. Let hA1; . . . ;Ani be an action sequence, and let Ai be an action formula in thatsequence. If Ai can be rewritten into disjunctive normal form, so Ai is equivalent to a formulaa1 _ _ ak, then there may be more than one structured news report R such that R matcheshA1; . . . ;Ani.

Proof. Consider a report R1 where R1 matches hA1; . . . ;Ani and a report R2 where R2 matcheshA1; . . . ;Ani. So for each Ai, R1 meets Ai. Now consider an Ai of the form of a _ b. Clearly R1meets a _ b and R2 meets a _ b. But suppose, R1 meets a and R1 does not meet b. Also suppose,R2 does not meet a and R2 meets b. So, R1 6¼ R2. �

The length of an execution sequence, i.e. the number of execution steps undertaken for a fusioncall is constrained by the number of fusion rules, and the nature of the inferences from the domainknowledge, and the size of the structured news reports. In order to get a useful boundary on thelength of an execution sequence, we adopt the following definition.

Definition 6.1. A fusion rule a ) b is capped iff the only possible substitutions k are such that kassigns feature terms to the free variables in b.

All the fusion rules in this paper are capped.

Proposition 6.4. For any fusion call ðD;C; fR1;R2gÞ, if jCj is finite, and each fusion rule is capped,then the execution sequence hX1; . . . ;Xni is finite.

A01 ¼ CreateSkeletonðR1Þ

A02 ¼ AddTextðsun;weatherreport=weatherÞ

A03 ¼ AddTextðLondon;weatherreport=cityÞ

A04 ¼ AddTextð19:5:1999;weatherreport=dateÞ

A05 ¼ AddTextðConjunctionðTV1;TV3Þ;weatherreport=sourceÞ


Proof. The constraints in Definition 4.6 ensure that there are no cycles in an execution sequence.So there is no execution sequence hX1; . . . ;Xni such that there is an Xi and Xj, where Xi ¼ Xj unlessi ¼ j. Hence, there is no sequence of execution steps where the same instantiated form of a fusionrule is used twice. The only way that we can get an infinite sequence hX1; . . . ;Xni is if there areinfinitely many kðbÞ generated for some rule a ) b 2 C. However, if each fusion rule in C iscapped, then there are only finitely many kðbÞ that can be generated for each fusion rule, since theonly substitutions for the variables in b come from the feature terms generated from the struc-tured news reports in the fusion call, and there are only finitely many feature terms that can begenerated from each fusion call. So it is not possible to generate an infinite execution sequencehA1; . . . ;Ani. �

The assumption that the fusion rules are capped seems quite reasonable if the aim of fusion is toonly include information from the original reports being merged. Indeed if we assume the rules arecapped, we can identify a tighter bound based directly on the size of the structured reports beingmerged.However, the computational viability of a fusion system depends on more than the number of

execution steps taken. Indeed there are a number of factors that need to be considered:

• executing fusion rules,• reasoning with background knowledge,• acting on fusion rules.

We can regard each of these activities being a form of classical logic inferencing, and hence thecomputational viability is bounded by the computational viability of classical logic. Whilst ingeneral, reasoning with classical logic is difficult to automate, implementation based on Prolog isfeasible.Another practical question is whether the syntax can express everything that we want or need to

express. This includes:

• Location completeness of structural atoms. This is the ability to describe any structural relation-ship in a report in terms of the nesting and sequence of semantic labels and text entries.

• Comparison completeness of background atoms. This is the ability to compare any com-bination of text entries with respect to the background knowledge. Clearly the backgroundatoms presented are only indicative of the possible atoms that may be defined for an appli-cation.

• Fusion completeness of action atoms. This is the ability to describe how any structured newsreport can be constructed. In one sense, the current set of action atoms is sufficient for this. How-ever, further actions atoms would allow reports to be constructed with fewer instructions.For example, currently the action atoms cannot directly specify the sequence in which siblingsoccur.

The notion of structured news reports that we reviewed in Section 2 provides a rich structuralrepresentation. The check and action atoms that we defined in Section 3, do not draw on the fullexpressivity of structured new report. However, it is straightforward to add further check and


action atoms to extend the basic fusion framework that we have presented here. To illustrate, wecould introduce the following actions.

Definition 6.2. Further action atoms include:

1. LeftAdd(F,P) where F is a feature and P is a position. The resulting action is to add F to thefusion tree to the left of the existing feature at P.

2. RightAdd(F,P) where F is a feature and P is a position. The resulting action is to add F tothe fusion tree to the right of the existing feature at P.

To use these actions, we also introduce the following structural atoms.

Definition 6.3. Further structural atoms include:

1. LeftNeighbourðF0;F;P;RÞ where F0 and F are features with the anchor at position P in R

and F0 is immediately to the left of F.2. RightNeighbourðF;F0;P;RÞ where F0 and F are features with the anchor at position P in R

and F0 is immediately to the right of F.

We illustrate the first of these structural atoms below.

Example 6.1. Let R be the news report given in Fig. 1. Also consider the following:

For this report, LeftNeighbour(F0,F,P,R) holds.

Now we illustrate the action atoms given in Definition 6.2.

Example 6.2. Consider the following fusion tree T.

Now consider the following instructions:

where

F ¼ regionalreportðregionðNorth WestÞ;maxtempð18CÞÞ

F0 ¼ sourceðOrange websiteÞF ¼ URLðwww:orange:co:ukÞP ¼ bidreport=reportinfo



A ¼ LeftAddðF;weatherreport=regionalreportÞA0 ¼ RightAddðF;weatherreport=regionalreportÞ


For the instruction, LeftAdd (F,weatherreport/regionalreport), applied to the fu-sion tree T, we get the following:

But suppose we ignore the previous instruction, and return to the original state T of the fusiontree. For the instruction RightAdd(F,weatherreport/regionalreport), applied to thefusion tree T, we get the following:

It is straightforward to extend the framework that we have presented in this paper to ac-commodate these atoms.Clearly, we can greatly extend the set of background atoms depending on the application. Some

further discussion of formulae for the background knowledge including discussion of inconsis-tency, temporal knowledge, and domain knowledge is given in [30,35], and for further discussionof lexical and world knowledge see also [28,29,33]. Also of relevance are the options of usingontologies for structured text (see for example [20]) and using comprehensive semantic networkssuch as WordNet [40]. More generally, it may be appropriate to harness typed-feature structures[11] and machine readable dictionaries [49] for representing and reasoning with lexical knowledge.

7. Discussion

Structured text is a general concept implicit in many approaches to handling textual informa-tion in computing, including tagged text in XML, text in relational and object-oriented databases,

hweatherreport :hdate : 5 Nov 99ihregionalreport :hregion : North Westihmaxtemp : 18Ci

: regionalreportihregionalreport :hregion : South Eastihmaxtemp : 20Ci






and output from information extraction systems. Structured text can be naturally viewed in logic.Each item of structured text can be represented by a formula of classical logic. This means thatconsistency checking and inferencing can be undertaken with structured text using domainknowledge.We have proposed fusion rules as a scripting language for defining how to merge news reports.

It may be appropriate to develop syntactic sugar and other notational conveniences to enhancethis proposal. This may include using symbols such as AND, OR, and NOT. It may also includepriority ordering over fusion rules to dictate the preferred ordering in which they should apply soas to allow for simpler antecedents. Further assumptions could also be used about the process tosimplify the notation. For example, each rule has ReportðR1Þ ^ ReportðR2Þ in the antecedent,and yet we may assume two reports which are always referred to as R1 and R2, and thereby notneed ReportðR1Þ ^ ReportðR2Þ in the antecedent.The definition for a fusion call suggests an implementation based on existing automated rea-

soning technology and on XML programming technology. The most obvious route for repre-senting each structured news report is to represent it as an XML document. Once information isin the form of XML documents, a number of technologies for managing and manipulating in-formation in XML are available [8]. Possibilities for representing and reasoning with backgroundknowledge include relational databases, Datalog, and Prolog. Possibilities for implementing aninference engine for executing fusion rules include a meta-level program in Prolog, or an imple-mentation in a imperative programming language such as Java. Another possibility is to presentfusion rules in RuleML and harness one of the Java rule engines that are currently being pro-posed. 1 Finally, an action engine for acting on the instructions given by the fusion rules, could beimplemented in an imperative programming language such as Java that can manipulate XML. Anaction engine would need to take each instruction in an action sequence and construct a con-struction sequence.We have not formalized the relationship between the structural atoms and XML technology.

However, there is clearly an overlap in functionality with technologies including the XPath lan-guage and proposals for the XML Query Algebra. 2 However, the main points we want to stressin any comparison is that: (1) fusion rules offer a logical bridge between logical reasoning withbackground knowledge, structural information about news reports to be merged, and logicalspecifications of the instructions for producing the merged report; and (2) fusion rules offer ahigher-level scripting language for handling structured text than available with XPath or XMLQuery Algebra, and so fusion rules can be used on top of XML technology. In this sense, fusionrules and XML technology are complementary.Given that information extraction may be the technology for providing structured news reports

for merging, integration of a fusion system with information extraction technology may be ap-propriate. The GATE System provides an implemented architecture for managing textual datastorage and exchange, visualization of textual data structures, and plug-in modularity of textprocessing components [23]. The text processing components includes LaSIE which performsinformation extraction tasks including named entity recognition, coreference resolution, templateelement filling, and scenario template filling.

1 www.dfki.uni-kl.de/ruleml.2 www.w3.org.


References

[1] S. Abiteboul, Querying semi-structured data, in: International Conference on Database Theory, 1997, pp.

1–18.

[2] ARPA. Message Understanding Conference: Proceedings of the Seventh Conference, Morgan Kaufmann, Los

Altos CA, 1998.

[3] S. Benferhat, C. Cayrol, D. Dubois, J. Lang, H. Prade. Inconsistency management and prioritized syntax-based

entailment, in: Proceedings of the 13th International Joint Conference on AI (IJCAI’93), 1993.

[4] S. Bergamaschi, S. Castano, M. Vincini, D. Beneventano, Semantic integration of heterogeneous information

sources, Data and Knowledge Engineering 36 (2001) 215–249.

[5] S. Benferhat, D. Dubois, H. Prade. How to infer from inconsistent beliefs without revising, in: Proceedings of the

14th International Joint Conference on AI (IJCAI’95), 1995.

[6] Ph. Besnard, A. Hunter, A logic-based theory of deductive arguments, Artificial Intelligence 128 (2001)

203–235.

[7] C. Baral, S. Kraus, J. Minker, V. Subrahmanian, Combining knowledgebases of consisting of first-order theories,

Computational Intelligence 8 (1992) 5–71.

[8] N. Bradley, The XML Companion, Addison-Wesley, Reading MA, 2000.

[9] G. Brewka, Preferred subtheories: an extended logical framework for default reasoning, in: Proceedings of the

Eleventh International Joint Conference on Artificial Intelligence (IJCAI’89), 1989, pp. 1043–1048.

[10] P. Buneman, Semistructured data, in: Proceedings of the ACM Symposium on Principles of Database Systems,

1997.

[11] B. Carpenter, The Logic of Typed Feature Structures, Cambridge University Press, Cambridge, 1992.

[12] D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, R. Rosati, Description logic framework for information

integration, in: Proceedings of the 6th Conference on the Principles of Knowledge Representation and Reasoning

(KR’98), Morgan Kaufmann, Los Altos, CA, 1998, pp. 2–13.

[13] D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, R. Rosati, Source integration in data warehousing, in:

Proceedings of the 9th International Workshop on Database and Expert Systems (DEXA’98), IEEE Computer

Society Press, Silver Spring, MD, 1998, pp. 192–197.

[14] L. Cholvy, Reasoning with data provided by federated databases, Journal of Intelligent Information Systems 10

(1998) 49–80.

[15] J. Cowie, W. Lehnert, Information extraction, Communications of the ACM 39 (1996) 81–91.

[16] L. Cholvy, S. Moral, Merging databases: problems and examples, International Journal of Intelligent Systems 10

(2001) 1193–1221.

[17] W. Cohen, A web-based information system that reasons with structured collections of text, in: Proceedings of

Autonomous Agents’98, 1998.

[18] C. Cayrol, V. Royer, C. Saurel, Management of preferences in assumption based reasoning, in: Information

Processing and the Management of Uncertainty in Knowledge based Systems (IPMU’92), Lecture Notes in

Computer Science, vol. 682, Springer, Berlin, 1993.

[19] D. Dubois, H. Prade (Eds.), Handbook of Defeasible Reasoning and Uncertainty Management Systems, vol. 3,

Kluwer, Dordrecht, 1998.

[20] M. Erdmann, R. Studer, How to structure and access XML documents with ontologies, Data and Knowledge

Engineering 36 (2001) 317–335.

[21] E. Franconi, U. Sattler, A data warehouse conceptual data model for multidimensional aggregation, in: S. Gatziu,

M. Jeusfeld, M. Staudt, Y. Vassiliou (Eds.), Proceedings of the Workshop in Design and Management of Data

Warehouses, 1999.

[22] P. Gardenfors, Knowledge in Flux, MIT Press, Cambridge, MA, 1988.

[23] R. Gaizaukas, H. Cunningham, Y. Wilks, P. Rodgers, K. Humphreys, GATE: an environment to support

research and development in natural language engineering, in: Proceedings of the 8th IEEE Interna-

tional Conference on Tools with Artificial Intelligence, IEEE Computer Society Press, Silver Spring, MD,

1996.


[24] G. Grahne, A. Mendelzon, Tableau techniques for querying information sources through global schemas, in:

Proceedings of the 7th International Conference on Database Theory (ICDT’99), Lecture Notes in Computer

Science, Springer, Berlin, 1999.

[25] R. Grishman, Information extraction techniques and challenges, in: M. Pazienza (Ed.), Information Extraction,

Springer, Berlin, 1997.

[26] K. Hui, P. Gray, Developing finite domain constraints–a data model approach, in: Proceedings of Computation

Logic 2000 Conference, Springer, Berlin, 2000, pp. 448–462.

[27] J. Hammer, H. Garcia-Molina, S. Nestorov, R. Yerneni, Template-based wrappers in the TSIMMIS system, in:

Proceedings of ACM SIGMOD’97, ACM, New York, 1997.

[28] A. Hunter, L. Marten, Context-sensitive reasoning with lexical and world knowledge, in: SOAS Working Papers in

Linguisitcs, vol. 9, 1999, pp. 80–95.

[29] A. Hunter, Intelligent text handling using default logic, in: In Proceedings of the IEEE Conference on Tools with

Artificial Intelligence (TAI’96), IEEE Computer Society Press, Silver Spring MD, 1996, pp. 34–40.

[30] A. Hunter, Merging potentially inconsistent items of structured text, Data and Knowledge Engineering 34 (2000)

305–332.

[31] A. Hunter, Ramification analysis using causal mapping, Data and Knowledge Engineering 32 (2000) 1–27.

[32] A. Hunter, Reasoning with inconsistency in structured text, Knowledge Engineering Review 15 (2000) 317–337.

[33] A. Hunter, A default logic-based framework for context-dependent reasoning with lexical knowledge, Journal of

Intelligent Information Systems 16 (2001) 65–87.

[34] A. Hunter, Hybrid argumentation systems for structured news reports, Knowledge Engineering Review (in press).

[35] A. Hunter, Merging structured text using temporal knowledge, Data and Knowledge Engineering (in press).

[36] H. Katsuno, A. Mendelzon, On the difference between updating a knowledgebase and revising it, in: Principles of

Knowledge Representation and Reasoning: Proceedings of the Second International Conference (KR’91), Morgan

Kaufmann, Los Altos, CA, 1991, pp. 387–394.

[37] S. Konieczny, R. PinoPerez, On the logic of merging, in: Proceedings of the Sixth International Conference on

Principles of Knowledge Representation and Reasoning (KR’98), Morgan Kaufmann, Los Altos, CA, 1998, pp.

488–498.

[38] P. Liberatore, M. Schaerf, Arbitration (or how to merge knowledgebases), IEEE Transactions on Knowledge and

Data Engineering 10 (1998) 76–90.

[39] Y. Loyer, N. Spyratos, D. Stamate, Integration of information in four-valued logics under non-uniform

assumptions, in: Proceedings of 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL2000),

IEEE Press, New York, 2000.

[40] G. Miller, WordNet: A lexical database for English, Communications of the ACM 38 (11) (1995) 39–41.

[41] A. Motro, Cooperative database systems, International Journal of Intelligent Systems 11 (1996) 717–732.

[42] R. Manor, N. Rescher, On inferences from inconsistent information, Theory and Decision 1 (1970) 179–219.

[43] A. Preece, K. Hui, A. Gray, P. Marti, T. BenchCapon, D. Jeans, Z. Cui, The KRAFT architecture for knowledge

fusion and transformation, in: Expert Systems, Springer, Berlin, 1999.

[44] A. Poulovassilis, P. McBrien, A general formal framework for schema transformation, Data and Knowledge

Engineering 28 (1998) 47–71.

[45] N. Paton, R. Stevens, P. Baker, C. Goble, S. Bechhofer, A. Brass, Query processing in the TAMBIS bioinformatics

source integration system, in: Proceedings of the 11th International Conference on Scientific and Statistical

Databases, 1999.

[46] A. Sahuguet, F. Azavant, Building light-weight wrappers for legacy web data-sources using W4F, in: Proceedings

of the International Conference on Very Large Databases (VLDB’99), 1999.

[47] A. Sheth, J. Larson, Federated database systems for managing distributed, heterogeneous, and autonomous

databases, ACM Computing Surveys 22 (1990) 183–236.

[48] K. Smith, L. Obrst, Unpacking the semantics of source and usage to perform semantic reconciliation in large-scale

information systems, ACM Sigmod Record 28 (1999) 26–31.

[49] Y. Wilks, B. Slator, L. Guthrie, Electric Words: Dictionaries Computers and Meanings, MIT Press, Cambridge,

MA, 1996.


Anthony Hunter received a BSc (1984) from the University of Bristol and an MSc (1987) and Ph.D. (1992)from Imperial College, London. He is currently a senior lecturer in the Department of Computer Science atUniversity College London. His main research interests are: knowledge representation and reasoning forhandling incompleteness and inconsistency in information; default reasoning; argumentation; application indecision-support and in technologies for understanding and reasoning with information in natural language.


logical fusion rules for merging structured news reports

Documents