An event-based denotational semantics for natural language queries of data represented in triple stores

Richard Frost, Randy Fortier and Bryan St. Amour

School of Computer Science

University of Windsor

ICSC 2013

Objectives of our research

To create an efficient, modular natural language (NL) speech interface to graphical data which enables answers to questions to be computed directly from the data.

“xxx xx xxxxx xxx xxxxxx x xx x xx?”

⇧⇧ ⇧ ⇧⇧⇧ ⇧ ⇧⇧⇧ ⇧ ⇧ ⇧ ⇧⇧

Data = {(a,r1,c), (a,r2,f), (c,r3,g), …}

• Efficient: polynomial time and space complexity.

• Modular: new language constructs can be added without affecting any existing code.

• Graphical data: binary-relational triple stores, converted relational data, semantic web RDF data.


Why do we need a compositional semantics?

“How many states which are members of the United Nations have capitals in the southern hemisphere?”

• Information retrieval systems can only answer if a similar statement, with the answer, is in the data store.

• Even so, the statement would need to be updated whenever a new member is added to the U.N. or a change in capital is declared which affects the result.


Progress so far

• X-SAIGA: an environment for constructing language processors as modular executable specifications of attribute grammars. Based on a top-down parser with polynomial space/time complexity for arbitrary (ambiguous/left-recursive) CFGs.

• SpeechWeb: an architecture for creating speech interfaces to hyperlinked applications on the Web. (For a demonstration, search YouTube for “SpeechWeb”.)

• NL semantics for conventional relational databases.

NEXT STEP: SEMANTICS FOR GRAPHICAL DATA


A breakthrough: Montague’s (1970s) approach to natural-language semantics (simplified)

“Mars spins”   (English)

[[Mars]] [[spins]]

= λp(p e_mars) spins_pred   (Higher-order Intensional Logic, IL)

=> spins_pred e_mars   (Data Model)

=> True
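The reduction above can be mimicked with ordinary higher-order functions. The following is a minimal sketch, not the authors' implementation: the toy model `SPINNING` and the string encoding of entities are assumptions made only for illustration.

```python
# Hypothetical toy model: the entities that spin (an assumption of this sketch).
SPINNING = {"e_mars", "e_venus"}


def spins_pred(entity):
    """Predicate denoted by 'spins': True if the entity is in the toy model."""
    return entity in SPINNING


def mars(p):
    """[[Mars]] = λp(p e_mars): takes a predicate and applies it to e_mars."""
    return p("e_mars")


# [[Mars]] [[spins]]  =>  spins_pred e_mars
result = mars(spins_pred)
```

The proper noun is the function and the verb is its argument, which is what lets the same composition rule later handle phrases such as "every moon".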


Montague Semantics (MS)

“every moon spins”

([[every]] [[moon]]) [[spins]]

= (λpλq ∀x(p x → q x) moon_pred) spins_pred

=> λq ∀x(moon_pred x → q x) spins_pred

=> ∀x(moon_pred x → spins_pred x)

=> True (if all things that are moons spin)
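As an illustration only, the quantifier can be written as a curried function that checks the implication for each entity of a finite universe. The names `UNIVERSE`, `MOONS` and `SPINNING` and their contents are hypothetical; Montague's logic itself does not assume a finite, enumerable universe.

```python
# Hypothetical finite universe and predicate extensions (assumptions of this sketch).
UNIVERSE = {"e_mars", "e_phobos", "e_deimos"}
MOONS = {"e_phobos", "e_deimos"}
SPINNING = {"e_mars", "e_phobos", "e_deimos"}


def moon_pred(x):
    return x in MOONS


def spins_pred(x):
    return x in SPINNING


def every(p):
    """[[every]] = λpλq ∀x(p x → q x), with ∀ checked over UNIVERSE."""
    return lambda q: all((not p(x)) or q(x) for x in UNIVERSE)


every_moon_spins = every(moon_pred)(spins_pred)
every_spinner_is_a_moon = every(spins_pred)(moon_pred)
```

Note that this direct reading visits every entity in the universe, whether or not it is a moon.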


MS is polymorphic

“Mars and Venus spin”

=> ([[and]] [[Mars]] [[Venus]]) [[spin]]

=> (λsλt(λr(s r & t r)) λp(p e_mars) λp(p e_venus)) spins_pred

=> λr(λp(p e_mars) r & λp(p e_venus) r) spins_pred

=> λp(p e_mars) spins_pred & λp(p e_venus) spins_pred

=> spins_pred e_mars & spins_pred e_venus

=> True & True

=> True


MS is very powerful

• The semantics covers a large subset of classical first-order English.

- does (((every moon) $and (every planet)) spin)

- how_many (moons $that (orbit (a (red planet)))) (were (discovered_by (the (person $who (discovered Nereid)))))

- which planet (is (orbited_by (no moon)))

• It covers intensions and modal expressions (although our work does not).

• The meaning of words can be defined in terms of other words.

[[discoverer]] = [[person $who (discovered (a thing))]]


Montague Semantics is ideally suited as a basis for computerized query processors

• Denotational: every word and phrase has a well-defined mathematical meaning (denotation).

• Compositional: the meaning of a phrase is obtained from the meanings of its parts through simple function application.

• Referentially transparent: the meaning of a phrase, after syntactic disambiguation, is always the same.

• There is a one-to-one correspondence between syntactic and semantic rules.

BUT


Shortcomings of MS for query processing

• Computationally intractable: e.g. ∀x(moon_pred x → spins_pred x) quantifies over every entity x.

• No explicit denotation for transitive verbs: they are left uninterpreted until the end, and then a syntactic rewrite is used to give an IL expression.

• Prepositional phrases are not easy to accommodate in MS entity-based semantics.

• Needs an intermediate language: IL needs to be mapped to the triple store/binary-relational/RDF data model OR to another intermediate language (although Montague said that IL was dispensable).


Our semantics

• Has the 4 Montagovian properties: denotational, modular, etc.

• Computationally tractable: based on sets rather than predicates.

• Event based: able to easily accommodate prepositional phrases.

• Has an explicit denotation for transitive verbs, enabling accommodation of phrases such as “wrote or interpreted”.

• No intermediate language: NL denotations are defined directly in terms of basic triple store operations. This approach differs from many other NL query approaches, which map NL to SQL or SPARQL.


An example datastore – 5 events

{(EV 1000, REL "type", TYPE "born_ev"),
 (EV 1000, REL "subject", ENT "capone"),
 (EV 1000, REL "date", ENTNUM 1899),
 (EV 1001, REL "type", TYPE "join_ev"),
 (EV 1001, REL "subject", ENT "capone"),
 (EV 1001, REL "object", ENT "fpg"),
 (EV 1002, REL "type", TYPE "membership"),
 (EV 1002, REL "subject", ENT "capone"),
 (EV 1002, REL "object", ENT "thief_set"),
 (EV 1002, REL "date", ENTNUM 1918),
 (EV 1004, REL "type", TYPE "steal_ev"),
 (EV 1004, REL "subject", ENT "capone"),
 (EV 1004, REL "object", ENT "car_1"),
 (EV 1005, REL "type", TYPE "smoke_ev"),
 (EV 1005, REL "subject", ENT "capone")}

New facts about an event are easy to add, e.g. (EV 1000, REL "location", ENT "brooklyn").


Basic retrieval operators

getts (ANY, REL "subject", ENT "capone")

=> {(EV 1000, REL "subject", ENT "capone"),
    (EV 1001, REL "subject", ENT "capone"), etc.}

getts can be used to define other basic operators. Their definitions are given in the paper.

Example uses:

get_subjs_for_events {EV 1000, EV 1009} => {ENT "capone", ENT "torrio"}

get_members "thief_set" => {ENT "capone"}

get_subjs_of_event_type "born_ev" => {ENT "capone"}

We can now define semantics using these basic operators
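The operators above can be sketched over an in-memory set of triples. This is only a sketch under stated assumptions: the slides defer the real definitions to the paper, so the bodies of the three derived operators are guesses at their intent, the `EV`/`REL`/`ENT`/`TYPE` tags are dropped in favour of plain Python values, and `ANY` is modelled as `None`.

```python
ANY = None  # wildcard marker (an assumption of this sketch)

# A subset of the example datastore, as plain (event, relation, value) tuples.
STORE = {
    (1000, "type", "born_ev"), (1000, "subject", "capone"), (1000, "date", 1899),
    (1002, "type", "membership"), (1002, "subject", "capone"),
    (1002, "object", "thief_set"), (1002, "date", 1918),
    (1005, "type", "smoke_ev"), (1005, "subject", "capone"),
}


def getts(pattern):
    """Return every triple that matches the pattern; ANY matches any field."""
    return {t for t in STORE
            if all(p is ANY or p == field for p, field in zip(pattern, t))}


def get_subjs_for_events(evs):
    """Subjects of the given events (hypothetical definition)."""
    return {t[2] for ev in evs for t in getts((ev, "subject", ANY))}


def get_subjs_of_event_type(ev_type):
    """Subjects of all events of the given type (hypothetical definition)."""
    evs = {t[0] for t in getts((ANY, "type", ev_type))}
    return get_subjs_for_events(evs)


def get_members(set_name):
    """Subjects of membership events whose object is set_name (hypothetical)."""
    member_evs = {t[0] for t in getts((ANY, "type", "membership"))}
    about_set = {t[0] for t in getts((ANY, "object", set_name))}
    return get_subjs_for_events(member_evs & about_set)
```

For example, `get_members("thief_set")` and `get_subjs_of_event_type("born_ev")` both evaluate to `{"capone"}` on this store.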


Our new semantics

Note: in the paper, and from now on, bold italic denotes a meaning: thief = [[thief]]

thief = get_members "thief_set"

  e.g. thief => {ENT "capone"}

smokes = get_subjs_of_event_type "smoke_ev"

  e.g. smokes => {ENT "capone"}

capone setofents = (ENT "capone") ∈ setofents

  e.g. capone smokes => True

a nph vbph = #(nph ⋂ vbph) ~= 0

term_and tmph1 tmph2 = f where

f setofevs = (tmph1 setofevs) & (tmph2 setofevs)

e.g. ((a thief) $term_and capone) smokes => True
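The definitions above translate almost directly into set operations. In this minimal sketch the sets `thief` and `smokes` are written out literally; they stand in for the results of `get_members "thief_set"` and `get_subjs_of_event_type "smoke_ev"`, and entities are plain strings for brevity.

```python
# Literal stand-ins for the retrieved sets (assumptions of this sketch).
thief = {"capone"}
smokes = {"capone"}


def capone(setofents):
    """Proper noun: a test for membership of the entity in a set."""
    return "capone" in setofents


def a(nph):
    """Determiner 'a': true when the noun set and verb set intersect."""
    return lambda vbph: len(nph & vbph) != 0


def term_and(tmph1, tmph2):
    """Conjunction of two term phrases: both must accept the same set."""
    return lambda setofents: tmph1(setofents) and tmph2(setofents)


# ((a thief) $term_and capone) smokes
answer = term_and(a(thief), capone)(smokes)
```

Each term phrase is a function from a set to a truth value, so `a(thief)` and `capone` have the same type and can be combined by `term_and` without special cases.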

 


Our new semantics – major contribution 1

join = make_trans "join_ev"

e.g. join (a gang) => {ENT "capone", ENT "torrio"}

Definition:

make_trans event_type = f
  where
  f tmph = {subj | (subj, evs) ∈ (make_image event_type)
                 & tmph (⋃ {map thirds (getts (ev, REL "object", ANY)) | ev ∈ evs})}

where, for example: make_image "join_ev"

=> {(ENT "capone", {EV 1001, EV 1003}), (ENT "torrio", {EV 1009})}
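The definition can be read as: group events of the given type by subject, collect the objects of each subject's events, and keep the subjects whose object set satisfies the term phrase. The sketch below follows that reading. It is not the authors' code: the store contents (in particular event 1009 and the set `gang`) are hypothetical, tags are dropped in favour of plain values, and `make_image` is a guess at the function the slide only shows by example.

```python
ANY = None  # wildcard marker (an assumption of this sketch)

# Hypothetical store: two join events and one steal event.
STORE = {
    (1001, "type", "join_ev"), (1001, "subject", "capone"), (1001, "object", "fpg"),
    (1009, "type", "join_ev"), (1009, "subject", "torrio"), (1009, "object", "fpg"),
    (1004, "type", "steal_ev"), (1004, "subject", "capone"), (1004, "object", "car_1"),
}


def getts(pattern):
    """Return every triple that matches the pattern; ANY matches any field."""
    return {t for t in STORE
            if all(p is ANY or p == field for p, field in zip(pattern, t))}


def thirds(triples):
    """Third components of a set of triples."""
    return {t[2] for t in triples}


def make_image(event_type):
    """Map each subject to the set of events of this type it takes part in."""
    image = {}
    for ev, _, _ in getts((ANY, "type", event_type)):
        for subj in thirds(getts((ev, "subject", ANY))):
            image.setdefault(subj, set()).add(ev)
    return image


def make_trans(event_type):
    """Denotation of a transitive verb, built from its event type."""
    def f(tmph):
        result = set()
        for subj, evs in make_image(event_type).items():
            objects = set()
            for ev in evs:
                objects |= thirds(getts((ev, "object", ANY)))
            if tmph(objects):
                result.add(subj)
        return result
    return f


def a(nph):
    return lambda vbph: len(nph & vbph) != 0


gang = {"fpg"}  # hypothetical extension of 'gang'
join = make_trans("join_ev")
joiners = join(a(gang))
```

Because `make_trans` returns an ordinary function, a new transitive verb needs only a new event type name, for example `make_trans("steal_ev")`.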

 


Prepositional phrases – major contribution 2

steal_with_time tmph date
  = {subj | (subj, evs) ∈ image_steal
          & tmph (⋃ {thirds (getts (ev, REL "object", ANY))
                     | ev ∈ evs
                     & date (thirds (getts (ev, REL "date", ANY)))})}

The date argument is used to “filter” the events.

e.g. steal_with_time (a car) (date_1918) => {ENT "capone"}

Note: we need to generalize and create a more powerful version of the make_trans function (this should not be too difficult).
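A sketch of the date filter, under the same assumptions as before (plain values instead of tagged ones, `ANY` as `None`). The store below is hypothetical: the dates on the steal events and the second event 1010 are added purely for illustration, and `image_steal` is taken to be the subject-to-events image of `steal_ev`.

```python
ANY = None  # wildcard marker (an assumption of this sketch)

# Hypothetical store: two dated steal events.
STORE = {
    (1004, "type", "steal_ev"), (1004, "subject", "capone"),
    (1004, "object", "car_1"), (1004, "date", 1918),
    (1010, "type", "steal_ev"), (1010, "subject", "torrio"),
    (1010, "object", "car_2"), (1010, "date", 1920),
}


def getts(pattern):
    return {t for t in STORE
            if all(p is ANY or p == field for p, field in zip(pattern, t))}


def thirds(triples):
    return {t[2] for t in triples}


def make_image(event_type):
    image = {}
    for ev, _, _ in getts((ANY, "type", event_type)):
        for subj in thirds(getts((ev, "subject", ANY))):
            image.setdefault(subj, set()).add(ev)
    return image


def steal_with_time(tmph, date):
    """Subjects with a steal event that passes the date test and whose
    collected objects satisfy the term phrase."""
    result = set()
    for subj, evs in make_image("steal_ev").items():
        objects = set()
        for ev in evs:
            # Only events whose date set satisfies the date term contribute.
            if date(thirds(getts((ev, "date", ANY)))):
                objects |= thirds(getts((ev, "object", ANY)))
        if tmph(objects):
            result.add(subj)
    return result


def a(nph):
    return lambda vbph: len(nph & vbph) != 0


def date_1918(setofdates):
    return 1918 in setofdates


car = {"car_1", "car_2"}  # hypothetical extension of 'car'
thieves_1918 = steal_with_time(a(car), date_1918)
```

The date term has the same shape as a proper noun (a membership test on a set), which is why it can be combined with other date terms in the same way as term phrases.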

 


The result: A wide range of English NL queries

e.g. “Which gangster who stole a car in 1915 or 1918 joined a gang that was joined by Torrio?”

⇩

which (gangster $that (steal_with_time (a car)
                          (date_1915 $term_or date_1918)))
      (join (a (gang $that (joined_by torrio))))

⇩

{ENT "capone"}

The brackets are introduced by the parser, which will produce more than one bracketed expression for ambiguous input.

 


Next steps

1. Generalize the method for accommodating prepositional phrases and create a more powerful version of the make_trans function to cover queries such as “who stole a car in Brooklyn in 1915” (our solution is briefly described in the paper).

2. Extend the parser of the existing NL speech query processor to accommodate prepositional phrases.

3. Replace the entity-based NL semantics of the existing query processor with the new event-based semantics.

4. Interface the new query processor with an RDF semantic web data source (will require converting RDF triples to event-based triples).

5. Develop methods for optimising queries to semantic web data.

⇩

An NL speech query interface to semantic web data

 


References for previous work

PARSING:

Frost, R., Hafiz, R. and Callaghan, P. (2007) Modular and efficient top-down parsing for ambiguous left-recursive grammars. In: 10th ACL, IWPT, 109–120.

Hafiz, R. and Frost, R. (2010) Lazy combinators for executable specifications of general attribute grammars. Proceedings of the 12th International Symposium on Practical Aspects of Declarative Languages (PADL), LNCS 5937, 167–182.

SPEECH RECOGNITION:

Frost, R. A. (2005) A call for a public-domain SpeechWeb. CACM 48 (11) 45–49.

Frost, R. A., Ma, X. and Shi, Y. (2007) A browser for a public-domain SpeechWeb. WWW 2007, 1307–1308.

SEMANTICS:

Frost, R. A. (2006) Realization of natural language interfaces using lazy functional programming. ACM Comp. Surv. 38 (4) Article 11.

Frost, R. A. and Fortier, R. (2007) An efficient denotational semantics for natural language database queries. NLDB 07, LNCS 4592, 12–24.

YouTube: SpeechWeb => http://www.youtube.com/watch?v=Axa-n4etdZE

 


Acknowledgements

Rahmatullah Hafiz

Paul Callaghan

Nabil Abdullah

Ali Karaki

Paul Meyer

Jon Donais

Matthew Clifford

Shane Peelar

Stephen Karamatos

Walid Mnaymneh

Rob Mavrinac

Cai Filiault

NSERC (Natural Sciences and Engineering Research Council of Canada)

Research Services - University of Windsor

 
