geographic reference analysis for geographic document querying

Geographic reference analysis for geographic document querying. F. Bilhaut, T. Charnois, P. Enjalbert & Y. Mathet, {bilhaut, charnois, enjalbert, mathet}@info.unicaen.fr, GREYC, CNRS UMR 6072, University of Caen

Post on 01-Jan-2016

TRANSCRIPT

Page 1: Geographic reference analysis for geographic document querying

Geographic reference analysis for geographic document querying

F. Bilhaut, T. Charnois, P. Enjalbert & Y. Mathet

{bilhaut, charnois, enjalbert, mathet}@info.unicaen.fr

GREYC, CNRS UMR 6072

University of Caen

Page 2: Geographic reference analysis for geographic document querying

The "GéoSem" project

• Passage extraction from geographical documents

• From a query to a ranked set of passages

• Queries are concerned with:
- time
- phenomenon
- space

Page 3: Geographic reference analysis for geographic document querying

Excerpt from "Hérin" corpus

From 1965 to 1985, the number of high-school students increased by 70%, but at different rhythms and intensities depending on academies and departments. Lower in the South-West and Massif Central, moderate in Brittany and Paris, the rise was considerable in the Mid-West and Alsace. […] There was also an increase in schooling duration, which was greater in departments where, in the mid-1960s, continuing to study after primary school was far from being systematic.

Page 6: Geographic reference analysis for geographic document querying

Excerpt from "Hérin" corpus

Time Phenomenon Space

Page 7: Geographic reference analysis for geographic document querying

Queries

• Which passages address educational difficulties in the west of France in the 1950s?

• Which passages address variations in the number of pupils in rural areas?

• Which passages address the Calvados district?

Page 8: Geographic reference analysis for geographic document querying

Queries

• Which passages address educational difficulties in the west of France in the 1950s?

• Which passages address variations in the number of pupils in the Paris area?

• Which passages address the Calvados district?

Page 9: Geographic reference analysis for geographic document querying

Some Significant Spatial Expressions

Paris

in the north of France

south of the Loire

Some seaboard towns

A quarter of / Fifteen / All the districts in the north of France

Some seaboard towns of Normandy

The most rural districts situated south of the Loire

Page 10: Geographic reference analysis for geographic document querying

The type "zone": a georeferenced area anchored in a named place

Paris

in the north of France

Normandy

From Normandy to Alsace

south of the Loire

Page 11: Geographic reference analysis for geographic document querying

The ‘LocGeo’ type

Quant                        | Type: qualification | Type: administrative | Zone: position    | Zone: named geo. entity
A quarter of / Fifteen / All |                     | districts            | in the north of   | France
Some                         | seaboard            | towns                | of                | Normandy
The                          | most rural          | districts            | situated south of | the Loire
Some                         | seaboard            | towns                |                   |

The canonical form:

[quantification] + [type] + [zone]

Page 13: Geographic reference analysis for geographic document querying

Semantic Representation: « Paris »

zone:
    loc: internal
    egn:
        ty_zone: town
        nom: Paris
        coord:
            Long: 5.733333
            Lat: 45.633333

Page 14: Geographic reference analysis for geographic document querying

Semantic Representation: « Some seaboard towns in the north of Normandy »

locgeo:
    quant:
        type: relative
    type:
        ty_zone: town
        geo: seaboard
    zone:
        loc: internal
        position: north
        egn:
            ty_zone: region
            nom: Normandy

Page 15: Geographic reference analysis for geographic document querying

Implementation and (first) Results

Tokenisation and morphological analysis

A DCG that performs syntactic and semantic analysis together
• the grammar contains 160 rules
• an internal lexical base of 200 entries
• a gazetteer of 100,000 named places (France)

900 expressions recognised and analysed from a geographical corpus (200 pages of text)

Good results, but a precise quantitative evaluation remains to be done
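The chain described on this slide (tokenisation, lexicon and gazetteer lookup, construction of a semantic structure) can be illustrated with a minimal Python sketch. It is not the authors' DCG: it is a flat, keyword-driven approximation, and the word lists, the toy gazetteer, and the name `parse_locgeo` are all assumptions made for illustration.

```python
# Toy lexicon and gazetteer; every entry here is an illustrative assumption.
DETERMINERS = {"some": {"type": "relative"}, "all": {"type": "exhaustive"}}
TYPE_NOUNS = {"towns": "town", "districts": "district"}
QUALIFIERS = {"seaboard", "rural"}
POSITIONS = {"north", "south", "east", "west"}
GAZETTEER = {"Normandy": {"ty_zone": "region"}, "France": {"ty_zone": "country"}}
FUNCTION_WORDS = {"in", "of", "the"}


def parse_locgeo(expression):
    """Map '[quantification] [type] [zone]' to a nested dict, or None."""
    tokens = expression.split()
    if not tokens or tokens[0].lower() not in DETERMINERS:
        return None
    quant = dict(DETERMINERS[tokens[0].lower()])
    type_, zone = {}, {}
    for token in tokens[1:]:
        word = token.lower()
        if word in FUNCTION_WORDS:
            continue  # prepositions and articles carry no feature here
        if word in QUALIFIERS:
            type_["geo"] = word
        elif word in TYPE_NOUNS:
            type_["ty_zone"] = TYPE_NOUNS[word]
        elif word in POSITIONS:
            zone["position"] = word
        elif token in GAZETTEER:
            zone["egn"] = {"nom": token, **GAZETTEER[token]}
        else:
            return None  # unknown word: reject the expression
    if "ty_zone" not in type_:
        return None  # a type noun is mandatory
    if zone and "egn" not in zone:
        return None  # a position needs a named place to anchor it
    if zone:
        zone["loc"] = "internal"
    return {"locgeo": {"quant": quant, "type": type_, "zone": zone}}


result = parse_locgeo("Some seaboard towns in north of Normandy")
```

Unlike a grammar, this sketch ignores word order after the determiner; it only shows how lexical lookups can fill the `quant`, `type` and `zone` slots of the representation.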

Page 16: Geographic reference analysis for geographic document querying

Semantic matching: Why?

[Figure: a query is matched against passages from the corpora (Text A, Text B). Rank labels in the figure: 1, 2, 3.]

Query: "Which passages address Paris?"

Passages shown:
• […] the northern half of France […]
• […] the south of a Bordeaux-Genève line […]
• […] In Paris and Toulouse […]
• […] In the Ile de France region […]

Page 17: Geographic reference analysis for geographic document querying

Semantic matching: How?

• Spatial compatibility: is the zone denoted by the passage spatially compatible with that of the query? (Is there at least an intersection?)

• Relevance degree: if this zone is compatible, how relevant is it w.r.t. the query?
- probability
- granularity

Page 18: Geographic reference analysis for geographic document querying

Compatibility computation

• Q1) Which passages address Paris?

• P1) […] the capital city […]: YES (gazetteer)

• P2) […] big cities in France: YES (gazetteer + computation)

• P3) […] the northern half of France […]: YES (GIS + computation)

• P4) […] South of a Bordeaux-Genève line: NO (GIS + computation)
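The geometric part of the compatibility test for P3 and P4 can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the project's GIS computation: the query zone is reduced to a single point, latitude and longitude are treated as planar coordinates, the city coordinates are approximate, and `FRANCE_MID_LATITUDE` is a hypothetical value chosen to split metropolitan France into two halves.

```python
# Approximate (latitude, longitude) pairs in degrees.
PARIS = (48.8566, 2.3522)
BORDEAUX = (44.8378, -0.5792)
GENEVA = (46.2044, 6.1432)

# Hypothetical latitude separating the northern and southern halves of France.
FRANCE_MID_LATITUDE = 46.7


def in_northern_half(point):
    """True if the point lies at or above the assumed dividing latitude."""
    return point[0] >= FRANCE_MID_LATITUDE


def south_of_line(point, a, b):
    """True if the point lies south of the straight line through a and b."""
    (lat_a, lon_a), (lat_b, lon_b) = a, b
    slope = (lat_b - lat_a) / (lon_b - lon_a)
    # Latitude of the line at the longitude of the point.
    line_lat = lat_a + slope * (point[1] - lon_a)
    return point[0] < line_lat


compatible_p3 = in_northern_half(PARIS)                 # expected: compatible
compatible_p4 = south_of_line(PARIS, BORDEAUX, GENEVA)  # expected: not compatible
```

A real system would intersect polygons rather than test a point, but the sketch reproduces the YES for P3 and the NO for P4.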

Page 19: Geographic reference analysis for geographic document querying

"the northern half of France"

Page 20: Geographic reference analysis for geographic document querying

"the south of a Bordeaux-Genève line"

Page 21: Geographic reference analysis for geographic document querying

Relevance degree (1): Quantification

Query = "Calvados" (French district)

P1 = "A quarter of the districts in the north of France": r = 25% (rank 3)

P2 = "All districts in the north of France": r = 100% (rank 1)

P3 = "Some districts in the north of France": r = i/n = 5/52 = 9.6% (GIS; rank 4)

P4 = "Fifteen districts in the north of France": r = i/n = 15/52 = 29% (GIS; rank 2)
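The arithmetic on this slide can be reproduced with a short Python sketch. The ratios are those given on the slide; reading r as the share of the n = 52 districts that each quantifier designates is an interpretation, and the variable names are assumptions.

```python
from fractions import Fraction

N_DISTRICTS = 52  # n on the slide
SOME = 5          # i used on the slide for "some"

# Relevance r of each passage for the query "Calvados".
relevance = {
    "P1": Fraction(1, 4),              # a quarter of the districts
    "P2": Fraction(1, 1),              # all districts
    "P3": Fraction(SOME, N_DISTRICTS), # some districts
    "P4": Fraction(15, N_DISTRICTS),   # fifteen districts
}

# Rank passages from most to least relevant.
ranking = sorted(relevance, key=relevance.get, reverse=True)
percentages = {p: round(float(r) * 100, 1) for p, r in relevance.items()}
```

Sorting by r puts P2 first, then P4, P1 and P3, since 15/52 is slightly above one quarter.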

Page 22: Geographic reference analysis for geographic document querying

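Relevance degree (2): Granularity

[Figure: spatial expressions placed on a granularity scale.]

Scale: country > region > district > city

Examples: "Basse Normandie" (region), "Calvados" (district), "Caen" (city); 'the northern half of France' is of type "zone".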

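The slide gives no formula for the granularity criterion. One hypothetical way to turn the scale into a number is to count how many levels separate the passage from the query, as in the Python sketch below; the function name and the idea of using level distance are assumptions, not the authors' method.

```python
# Granularity scale from the slide, coarsest to finest.
LEVELS = ["country", "region", "district", "city"]


def granularity_gap(query_level, passage_level):
    """Number of levels between the query and the passage (0 = same level)."""
    return abs(LEVELS.index(query_level) - LEVELS.index(passage_level))


# Example with a district-level query such as "Calvados".
gaps = {
    "Basse Normandie": granularity_gap("district", "region"),
    "Calvados": granularity_gap("district", "district"),
    "Caen": granularity_gap("district", "city"),
}
```

Under this assumption a smaller gap would mean a better match, so a passage about "Calvados" itself would outrank passages about the enclosing region or a city inside it.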
Page 23: Geographic reference analysis for geographic document querying

% A geographic localisation: preposition, determiner (quantification), type, zone.
locgeo(locgeo:(det:Det..type:Type..Zone)) --> #prep, det(Det), type(Type), zone(Zone).

% Determiner, looked up in the internal lexicon.
det(Sem) --> [X], {lexique(X,[X|R],det,Sem)}.

% Type: a type noun, with or without a qualification.
type(X) --> typeQualif(X).
type(ty_zone:N) --> nomtype(N).

typeQualif(ty_zone:N..Q) --> option, nomtype(N), #prep, qualif(Q).

nomtype(Sem) --> [X], {lexique(X,[X|R],nom,Sem)}.

% Zone: a named geographic entity (egn), with or without coordinates.
zone(X) --> egn(X).

egn(egn:(ty_zone:T..nom:Y..coord:C)) --> ls_lexiconExtDCG(np, type_sem:egn..type_zone:T..nom:Y..coord:C).

egn(egn:(ty_zone:T..nom:Y)) --> [X], {lexique(X,[X|R],np, type_sem:egn..type_zone:T..nom:Y)}.

Page 24: Geographic reference analysis for geographic document querying

% Lexicon entries; the arguments appear to be word, word forms, category and semantics.
lexique(quelque,[quelque],det,type_sem:relatif..type:relatif_qualifie..nb:'qualitatif:faible').

lexique(tout,[tout,le],det,type_sem:exhaustif).

lexique(région,[région],nom,type_sem:zone(administrative)..nom_zone:région).

lexique(ville,[ville],nom,type_sem:zone(administrative)..nom_zone:ville).

lexique('Bretagne',['Bretagne'],np,type_sem:egn..type_zone:région..nom:'Bretagne').