Imperfect temporal information in data sets


Upload: koen-van-daele

Post on 17-Jul-2015


TRANSCRIPT

Page 1: Imperfect temporal information in data sets

IMPERFECT TEMPORAL INFORMATION IN DATA SETS

Koen Van Daele

[email protected]

Flanders Heritage Agency

28 March 2012

KOEN VAN DAELE (FHA) IMPERFECT TEMPORAL INFORMATION CAA2012 1 / 23

Page 2: Imperfect temporal information in data sets

INTRODUCTION

THE QUESTIONS

- Who?
- What?
- How?
- Why?
- Where?
- When?

NATURAL LANGUAGE

- About ...
- Circa ...
- In the course of ...
- More or less contemporary to ...
- After ... but before ...



Page 4: Imperfect temporal information in data sets

INTRODUCTION

WHY IS A DATE IMPERFECT?

UNCERTAINTY

There exists an exact date, but it is completely or partially unknown to us.

- Joseph Haydn was born on 31 March or 1 April 1732.
- According to some classical sources, Pompeii was destroyed by an eruption of Vesuvius on 24 August 79 AD. Some of the material evidence suggests that the eruption took place later, towards the end of October or in November.


Page 5: Imperfect temporal information in data sets

INTRODUCTION

SUBJECTIVITY

Some things will never have an exact date. They occur gradually or are open to interpretation.

- Linear Pottery Culture in Flanders started around 5550 BC. It disappeared between 5000 and 4900 BC.
- When did World War II end? May 7th 1945? May 8th 1945? September 2nd 1945?


Page 6: Imperfect temporal information in data sets

INTRODUCTION

GRANULARITY

We describe time on different scales (centuries, years, days, hours).

- The First World War took place in the 20th century.
- The First World War took place from 1914 to 1918.
- The First World War started on 28 July 1914 and ended on 11 November 1918.

Any temporal specification made in natural language will become vague if we refine the granularity of the temporal axis sufficiently.


Page 7: Imperfect temporal information in data sets

INTRODUCTION

OUR GOALS

HOW can we record temporal information in a structured way that lets us store both accurate and vague temporal information in one system, and then process that information?

- Handle exact dates, e.g. 27/09/1838
- Handle uncertain dates, e.g. sometime in 1377
- Handle subjective dates, e.g. the vague end of Linear Pottery Culture
- Analyse our data sets with all these different types of dates, e.g. which people lived in the 19th century?


Page 8: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

SHARP TIME INTERVAL (STI)

[Figure: time axis T with an interval I bounded by I− and I+]

- A set as we generally know it
- We can use it to represent uncertainty in a date
- We can’t use it to represent uncertainty in a period
- We can’t use it to express subjectivity
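A sharp time interval can be sketched as a pair of endpoints with all-or-nothing membership. The Python below is only an illustration: the class name `SharpInterval` and the use of plain numeric years are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharpInterval:
    """A crisp interval [start, end] on the time axis T (hypothetical helper)."""

    start: float  # I-: earliest moment that belongs to the interval
    end: float    # I+: latest moment that belongs to the interval

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be later than end")

    def contains(self, t: float) -> bool:
        """Membership is all-or-nothing: t is either inside or outside."""
        return self.start <= t <= self.end


# "Sometime in 1377": an exact date exists, but we only know the year.
sometime_in_1377 = SharpInterval(1377.0, 1378.0)
```

Because membership is either true or false, such an interval can bound an uncertain date, but it cannot say that some moments belong to a period less strongly than others.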


Page 9: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

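ROUGH TIME INTERVAL (RTI)

[Figure: time axis T showing the lower approximation of I, the upper approximation of I, and the boundary regions B− and B+ between them]

- Lower approximation of I
- Upper approximation of I
- Boundary regions B− and B+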


Page 10: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

EXAMPLE

Adam Gheerijs was born around 1320-1325. He died on 10/12/1394.

[Figure: time axis T with the boundary region B− between 1320 and 1325 and the interval I ending on 10/12/1394]
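The Adam Gheerijs example can be sketched as a rough interval with an outer and an inner pair of endpoints. This is a minimal illustration under stated assumptions: the class name `RoughInterval`, the field names and the year-level precision (the exact day of death is dropped) are all hypothetical choices, not the author's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoughInterval:
    """Upper and lower approximation of an interval (hypothetical helper)."""

    upper_start: float  # earliest possible start
    lower_start: float  # latest possible start
    lower_end: float    # earliest possible end
    upper_end: float    # latest possible end

    def __post_init__(self):
        points = (self.upper_start, self.lower_start,
                  self.lower_end, self.upper_end)
        if list(points) != sorted(points):
            raise ValueError("endpoints must be in chronological order")

    def classify(self, t: float) -> str:
        """Return 'certain', 'possible' (boundary region) or 'outside'."""
        if self.lower_start <= t <= self.lower_end:
            return "certain"
        if self.upper_start <= t <= self.upper_end:
            return "possible"
        return "outside"


# Lifetime of Adam Gheerijs at year precision: born 1320-1325, died 1394.
gheerijs = RoughInterval(1320, 1325, 1394, 1394)
```

Years in the boundary region B− (1320 to 1325) are reported as merely possible, while every year from 1325 to 1394 is certain.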


Page 11: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

ROUGH TIME INTERVAL (RTI)

- We can use it to represent uncertainty in a date
- We can use it to represent uncertainty in a period
- We can’t use it to represent subjectivity


Page 12: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

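FUZZY TIME INTERVAL (FTI)

[Figure: membership function I(t) plotted against time t, with values from 0 to 1, marked with the fuzzy beginning FB_I, the core C_I, the fuzzy end FE_I and the support S_I]

- Core C_I
- Support S_I
- Fuzzy beginning FB_I
- Fuzzy end FE_I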


Page 13: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

EXAMPLES

Linear Pottery Culture in Flanders started around 5550 BC. It disappeared between 5000 and 4900 BC.

[Figure: membership function I(t) over time t, rising from 0 at 5575 BC to 1 at 5525 BC and falling from 1 at 5000 BC to 0 at 4900 BC]
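The Linear Pottery Culture example can be written as a trapezoidal membership function. The sketch below is illustrative only: the function name `trapezoid` is hypothetical, and BC years are encoded as negative numbers, which ignores the missing year zero.

```python
def trapezoid(a, b, c, d):
    """Return a membership function: 0 before a, rising on [a, b],
    1 on the core [b, c], falling on [c, d], 0 after d."""
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")

    def membership(t):
        if t < a or t > d:
            return 0.0
        if b <= t <= c:
            return 1.0
        if t < b:
            return (t - a) / (b - a)  # fuzzy beginning
        return (d - t) / (d - c)      # fuzzy end

    return membership


# Linear Pottery Culture in Flanders, BC years as negative numbers (assumption).
linear_pottery = trapezoid(-5575, -5525, -5000, -4900)

print(linear_pottery(-5550))  # halfway through the fuzzy beginning
print(linear_pottery(-5200))  # inside the core
print(linear_pottery(-4950))  # halfway through the fuzzy end
```

The core is the span where membership equals 1, and the support is the whole span where membership is greater than 0.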


Page 14: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

EXAMPLES

The Russian Revolution was a series of smaller revolutions: one in 1905, the February Revolution and October Revolution in 1917, and a civil war from 1918 to 1922-1923.

[Figure: membership function I(t) over time t, with values from 0 to 1, marked at 1905, Feb 1917, Nov 1917 and the Civil War]


Page 15: Imperfect temporal information in data sets

RESEARCH REPRESENTING IMPERFECT TIME

FUZZY TIME INTERVAL (FTI)

- We can use it to represent uncertainty in a date
- We can use it to represent uncertainty in a period
- We can use it to represent subjectivity


Page 16: Imperfect temporal information in data sets

RESEARCH QUERYING IMPERFECT TIME

ALLEN RELATIONS

Relation     Notation    Definition
before       b(A,B)      a+ < b−
overlaps     o(A,B)      a− < b− ∧ b− < a+ ∧ a+ < b+
during       d(A,B)      a− > b− ∧ a+ < b+
meets        m(A,B)      a+ = b−
starts       s(A,B)      a− = b− ∧ a+ < b+
finishes     f(A,B)      a+ = b+ ∧ b− < a−
equals       e(A,B)      a− = b− ∧ a+ = b+
bef          bef(A,B)    a+ ≤ b−
dur          dur(A,B)    a− ≥ b− ∧ a+ ≤ b+
intersects   i(A,B)      a+ > b− ∧ a− < b+

- The table shows only half of the relations; each can be reversed.
- The relations are defined between the begin and end points of 2 STIs.
- 13 original Allen relations and 5 composite relations.
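The definitions in the table translate directly into comparisons on interval endpoints. The following Python sketch assumes that a sharp interval is a `(start, end)` tuple of numbers; the function names are illustrative and do not come from the author's implementation.

```python
# Each interval is a (start, end) tuple: A = (a-, a+), B = (b-, b+).

def before(a, b):
    return a[1] < b[0]

def overlaps(a, b):
    return a[0] < b[0] and b[0] < a[1] and a[1] < b[1]

def during(a, b):
    return a[0] > b[0] and a[1] < b[1]

def meets(a, b):
    return a[1] == b[0]

def starts(a, b):
    return a[0] == b[0] and a[1] < b[1]

def finishes(a, b):
    return a[1] == b[1] and b[0] < a[0]

def equals(a, b):
    return a[0] == b[0] and a[1] == b[1]

# Composite relations from the table.
def bef(a, b):
    return a[1] <= b[0]

def dur(a, b):
    return a[0] >= b[0] and a[1] <= b[1]

def intersects(a, b):
    return a[1] > b[0] and a[0] < b[1]


first_world_war = (1914, 1918)
twentieth_century = (1901, 2000)
second_world_war = (1939, 1945)

print(during(first_world_war, twentieth_century))
print(before(first_world_war, second_world_war))
```

A reversed relation is obtained by swapping the arguments, for example `before(b, a)` for "after".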


Page 17: Imperfect temporal information in data sets

RESEARCH QUERYING IMPERFECT TIME

FUZZY ALLEN RELATIONS

HOW CAN WE DEFINE THE ALLEN RELATIONS FOR 2 FTIs?

The relation between 2 FTIs is a fuzzy relation (the result is a value between 0 and 1).

- Nagypál and Motik (NM)
  - Intuitive results
  - Fully compatible with Allen relations for STIs
  - equals(A,A) = 0.5
- Schockaert (S1)
  - Intuitive results
  - Very complex calculation
  - Fully compatible with Allen relations for STIs
  - equals(A,A) = 1
- Simplified Schockaert (S2)
  - Specialised version of S1
  - Simpler to calculate
  - Only works for trapezoidal FTIs
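To make "a value between 0 and 1" concrete, the sketch below computes a fuzzy degree of "A before B" on a sampled timeline. It is an illustrative definition chosen for this example, based on the Łukasiewicz t-norm, and is not claimed to reproduce NM, S1 or S2. The helper names, the example intervals and the one-year sampling step are all assumptions.

```python
def trapezoid(a, b, c, d):
    """Membership function: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    def membership(t):
        if t < a or t > d:
            return 0.0
        if b <= t <= c:
            return 1.0
        if t < b:
            return (t - a) / (b - a)
        return (d - t) / (d - c)
    return membership


def fuzzy_before(mu_a, mu_b, timeline):
    """Degree to which A lies before B (illustrative definition, an assumption):

        1 - max over q <= p of max(0, A(p) + B(q) - 1)

    The result drops below 1 as soon as some moment of B does not come
    strictly after some moment of A, weighted by both memberships.
    """
    worst = 0.0          # strongest violation found so far
    best_b_so_far = 0.0  # max of B(q) over all sampled q <= p
    for p in sorted(timeline):
        best_b_so_far = max(best_b_so_far, mu_b(p))
        worst = max(worst, mu_a(p) + best_b_so_far - 1.0)
    return 1.0 - worst


years = range(1290, 1421)  # hypothetical timeline, sampled per year
a = trapezoid(1300, 1310, 1340, 1350)
b = trapezoid(1335, 1345, 1400, 1410)  # its fuzzy beginning overlaps a's fuzzy end
crisp_a = trapezoid(1300, 1300, 1350, 1350)
crisp_b = trapezoid(1360, 1360, 1400, 1400)

print(fuzzy_before(a, b, years))              # partially before
print(fuzzy_before(crisp_a, crisp_b, years))  # crisp case
```

For sharp intervals this definition reduces to the crisp relation a+ < b−, which is the kind of compatibility with the Allen relations that the list above refers to.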


Page 18: Imperfect temporal information in data sets

IMPLEMENTATION AND TEST

IMPLEMENTATION

- Implementation in an RDBMS, PostgreSQL
- Using PostGIS (spatial extension)
- Written in SQL and PL/pgSQL
- Code released as Open Source Software
- Download at https://github.com/koenedaele/pgFTI
- Complete implementation of NM and S2
- Using a finite timeline from 1,000,000 BC to 100,000 AD


Page 19: Imperfect temporal information in data sets

IMPLEMENTATION AND TEST

TEST DATA

For our tests we used a data set of 716 architects who worked in Flanders.

LOUIS DELACENSERIE: Date of birth: 27/09/1838. Date of death: 02/09/1909.

ADAM GHEERIJS: Was born around 1320-1325. Died 10/12/1394.

JEAN D’OISY: Was born ca. 1310 near Valenciennes and died 1377 in Brussels.

JAN GUETHEGEM: 15th century builder.

HENDRIK VAN TIENEN: No date of birth or death known. All we know is that he was working in 1396 on the St.-Janskerk in Diest.


Page 20: Imperfect temporal information in data sets

IMPLEMENTATION AND TEST

CURRENT WAY OF ENTERING DATA

- Available since May 2009
- A date field
- Some conventions to handle uncertainty
  - If we only know the year: 1st of January
  - If we only know the month: 1st day of that month
  - No year known: no data entry
- No way to differentiate between somewhere in 1377 and 01/01/1377!
- 42% of the persons in our database have a potentially uncertain date of birth or death
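The ambiguity goes away once a partially known date is stored as an earliest and a latest possible day instead of a single value. The Python sketch below illustrates that idea only; the function name `date_range` is hypothetical, and Python's `date` type uses the proleptic Gregorian calendar, which is a simplification for medieval dates.

```python
import calendar
from datetime import date


def date_range(year, month=None, day=None):
    """Return (earliest, latest) possible date for a partially known date."""
    if month is None:
        # Only the year is known: any day of that year is possible.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Only year and month are known: any day of that month is possible.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, day)
    return exact, exact


somewhere_in_1377 = date_range(1377)
first_of_january_1377 = date_range(1377, 1, 1)

# With a single date field both would be stored as 01/01/1377;
# as ranges they stay distinct.
print(somewhere_in_1377 != first_of_january_1377)
```

Each such range is a sharp time interval, so it covers uncertain dates but not the vague boundaries of a period.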


Page 21: Imperfect temporal information in data sets

IMPLEMENTATION AND TEST

TESTING THE ALLEN BEFORE RELATION

Date         NM              S1              S2
             n    time (s)   n    time (s)   n    time (s)
01/01/1001   0    0.12       0    0.06       0    0.06
01/01/1701   7    0.09       7    9.6        7    0.07
01/01/1901   132  0.58       132  419.01     132  0.1
01/01/2101   716  0.06       716  0.07       716  0.06

- Each method produces the same outcome.
- S1 is the slowest by far.
- Short-circuiting has a visible effect: for 01/01/1001 and 01/01/2101, S1 is about as fast as the other methods.
- S2 is faster than NM.


Page 22: Imperfect temporal information in data sets

CONCLUSION

CONCLUSION

- Fuzzy Time Intervals can be used to store dates and periods that are vague or that we are uncertain about.
- We can discover the Allen relations between Fuzzy Time Intervals using different algorithms.
- The choice of comparison algorithm depends on the needs of the researcher.
  - If trapezoidal FTIs are sufficient, S2 is fastest.
  - If complex analysis and reasoning are needed, go with S1.
  - If trapezoidal FTIs are not sufficient and speed is more important than reasoning, NM offers a nice balance.


Page 23: Imperfect temporal information in data sets

CONCLUSION

QUESTIONS AND ANSWERS?


Page 24: Imperfect temporal information in data sets

CONCLUSION

FURTHER INFORMATION

FURTHER READING AND FULL BIBLIOGRAPHY

Koen Van Daele, 2010: Imperfecte tijdsmodellering in historische databanken. Unpublished master's thesis, Universiteit Gent. http://lib.ugent.be/fulltxt/RUG01/001/418/820/RUG01-001418820_2010_0001_AC.pdf

CONTACT ME

- Email: [email protected]
- Twitter: @koenedaele
- LinkedIn: http://be.linkedin.com/in/koenvandaele
