IMPERFECT TEMPORAL INFORMATION IN DATA SETS
Koen Van Daele
[email protected]
Flanders Heritage Agency
28 March 2012
KOEN VAN DAELE (FHA) IMPERFECT TEMPORAL INFORMATION CAA2012 1 / 23
INTRODUCTION
THE QUESTIONS
- Who?
- What?
- How?
- Why?
- Where?
- When?
NATURAL LANGUAGE
- About ...
- Circa ...
- In the course of ...
- More or less contemporary with ...
- After ... but before ...
WHY IS A DATE IMPERFECT?
UNCERTAINTY
There exists an exact date, but it is completely or partially unknown to us.
- Joseph Haydn was born on 31 March or 1 April 1732.
- According to some classical sources, Pompeii was destroyed by an eruption of Vesuvius on 24 August 79 AD. Some of the material evidence suggests that the eruption took place later, towards the end of October or in November.
SUBJECTIVITY
Some things will never have an exact date. They occur gradually or are open to interpretation.
- Linear Pottery Culture in Flanders started around 5550 BC. It disappeared between 5000 and 4900 BC.
- When did World War II end? 7 May 1945? 8 May 1945? 2 September 1945?
GRANULARITY
We describe time on different scales (centuries, years, days, hours).
- The First World War took place in the 20th century.
- The First World War took place from 1914 to 1918.
- The First World War started on 28 July 1914 and ended on 11 November 1918.
Any temporal specification made in natural language will become vague if we refine the granularity of the temporal axis sufficiently.
OUR GOALS
HOW can we record temporal information in a structured way, so that a single system can record and process both accurate and vague temporal information?
- Handle exact dates, e.g. 27/09/1838
- Handle uncertain dates, e.g. sometime in 1377
- Handle subjective dates, e.g. the vague end of Linear Pottery Culture
- Analyse our data sets with all these different types of dates, e.g. which people lived in the 19th century?
RESEARCH REPRESENTING IMPERFECT TIME
SHARP TIME INTERVAL (STI)
[Figure: timeline T with a sharp interval I running from begin point I− to end point I+]
- A set as we generally know it
- We can use it to represent uncertainty in a date
- We can’t use it to represent uncertainty in a period
- We can’t use it to express subjectivity
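ROUGH TIME INTERVAL (RTI)
[Figure: timeline T showing the lower approximation of I nested inside the upper approximation of I, with boundary regions B− and B+ on either side]
- Lower approximation of I (certainly part of the interval)
- Upper approximation of I (possibly part of the interval)
- Boundary regions B− and B+ (the difference between the two approximations)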
EXAMPLE
Adam Gheerijs was born around 1320-1325. He died on 10/12/1394.
[Figure: timeline T with a boundary region B− from 1320 to 1325 and a sharp end point on 10/12/1394]
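The rough time interval for Adam Gheerijs can be sketched in a few lines of Python. This is only an illustration and not the implementation discussed later in the talk: the class name, the field names and the choice to read "around 1320-1325" as 1 January 1320 to 31 December 1325 are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoughTimeInterval:
    """Hypothetical RTI: a lower approximation nested in an upper approximation."""
    upper_begin: date  # earliest possible begin (start of B-)
    lower_begin: date  # latest possible begin (end of B-)
    lower_end: date    # earliest possible end (start of B+)
    upper_end: date    # latest possible end (end of B+)

    def __post_init__(self):
        ordered = (self.upper_begin <= self.lower_begin
                   <= self.lower_end <= self.upper_end)
        if not ordered:
            raise ValueError("bounds must be in chronological order")

    def classify(self, day: date) -> str:
        """Say whether a day is certainly, possibly or not in the interval."""
        if self.lower_begin <= day <= self.lower_end:
            return "certain"
        if self.upper_begin <= day <= self.upper_end:
            return "possible"
        return "outside"


# Assumed reading: birth somewhere in 1320-1325, death exactly on 10/12/1394,
# so the boundary region B+ is empty.
adam = RoughTimeInterval(
    upper_begin=date(1320, 1, 1),
    lower_begin=date(1325, 12, 31),
    lower_end=date(1394, 12, 10),
    upper_end=date(1394, 12, 10),
)

print(adam.classify(date(1322, 6, 1)))   # inside B-: "possible"
print(adam.classify(date(1350, 1, 1)))   # inside the lower approximation: "certain"
print(adam.classify(date(1400, 1, 1)))   # after his death: "outside"
```

Every day gets one of only three answers, which is why this model can hold uncertainty but not a gradual, subjective transition.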
ROUGH TIME INTERVAL (RTI)
- An RTI can represent uncertainty in a date
- An RTI can represent uncertainty in a period
- An RTI can’t represent subjectivity
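FUZZY TIME INTERVAL (FTI)
[Figure: membership function I(t), from 0 to 1, plotted against time t; the fuzzy beginning FB_I, the core C_I and the fuzzy end FE_I together span the support S_I]
- Core C_I
- Support S_I
- Fuzzy beginning FB_I
- Fuzzy end FE_I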
EXAMPLES
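Linear Pottery Culture in Flanders started around 5550 BC. It disappeared between 5000 and 4900 BC.
[Figure: membership function I(t), from 0 to 1, against time t: it rises from 5575 BC to 5525 BC, stays at 1 until 5000 BC and falls back to 0 by 4900 BC]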
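A trapezoidal fuzzy time interval such as the Linear Pottery Culture example can be sketched as follows. The class and field names are hypothetical, the ramps are assumed to be linear, and BC years are written as negative numbers without any correction for the missing year zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrapezoidalFTI:
    """Hypothetical trapezoidal FTI defined by four points on the timeline."""
    support_begin: float  # membership starts rising from 0 here
    core_begin: float     # membership reaches 1 here
    core_end: float       # membership starts falling from 1 here
    support_end: float    # membership is back at 0 here

    def membership(self, t: float) -> float:
        """Degree (0..1) to which time point t belongs to the interval."""
        if self.core_begin <= t <= self.core_end:
            return 1.0
        if t <= self.support_begin or t >= self.support_end:
            return 0.0
        if t < self.core_begin:
            # fuzzy beginning: linear rise
            return (t - self.support_begin) / (self.core_begin - self.support_begin)
        # fuzzy end: linear fall
        return (self.support_end - t) / (self.support_end - self.core_end)


# Years BC as negative numbers (an assumption of this sketch).
lpc = TrapezoidalFTI(-5575, -5525, -5000, -4900)

print(lpc.membership(-5550))  # halfway up the fuzzy beginning: 0.5
print(lpc.membership(-5200))  # inside the core: 1.0
print(lpc.membership(-4950))  # halfway down the fuzzy end: 0.5
print(lpc.membership(-4800))  # outside the support: 0.0
```

A sharp interval is the special case in which the support and the core coincide, so exact dates fit the same structure.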
The Russian Revolution was a series of smaller revolutions: one in 1905, the February Revolution and October Revolution in 1917, and a civil war from 1918 to 1922-1923.
[Figure: membership function I(t), from 0 to 1, against time t, with labels at 1905, Feb 1917 and Nov 1917, and the civil war marked]
FUZZY TIME INTERVAL (FTI)
- An FTI can represent uncertainty in a date
- An FTI can represent uncertainty in a period
- An FTI can represent subjectivity
RESEARCH QUERYING IMPERFECT TIME
ALLEN RELATIONS
Relation     Notation    Definition
before       b(A,B)      a+ < b−
overlaps     o(A,B)      a− < b− ∧ b− < a+ ∧ a+ < b+
during       d(A,B)      a− > b− ∧ a+ < b+
meets        m(A,B)      a+ = b−
starts       s(A,B)      a− = b− ∧ a+ < b+
finishes     f(A,B)      a+ = b+ ∧ b− < a−
equals       e(A,B)      a− = b− ∧ a+ = b+
bef          bef(A,B)    a+ ≤ b−
dur          dur(A,B)    a− ≥ b− ∧ a+ ≤ b+
intersects   i(A,B)      a+ > b− ∧ a− < b+
- The table shows only half of the relations; the others are obtained by reversing the arguments.
- The relations are defined between the begin and end points of 2 STIs.
- 13 original Allen relations and 5 composite relations
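The definitions in the table translate directly into comparisons on begin and end points. The sketch below is an illustration in Python, not the SQL implementation presented later; the `STI` type and the use of plain years as time points are assumptions.

```python
from typing import NamedTuple


class STI(NamedTuple):
    """Sharp time interval; begin and end are numbers on one timeline (here: years)."""
    begin: float
    end: float


def before(a: STI, b: STI) -> bool:
    return a.end < b.begin


def overlaps(a: STI, b: STI) -> bool:
    return a.begin < b.begin < a.end < b.end


def during(a: STI, b: STI) -> bool:
    return a.begin > b.begin and a.end < b.end


def meets(a: STI, b: STI) -> bool:
    return a.end == b.begin


def starts(a: STI, b: STI) -> bool:
    return a.begin == b.begin and a.end < b.end


def finishes(a: STI, b: STI) -> bool:
    return a.end == b.end and b.begin < a.begin


def equals(a: STI, b: STI) -> bool:
    return a.begin == b.begin and a.end == b.end


# Composite relations
def bef(a: STI, b: STI) -> bool:
    return a.end <= b.begin


def dur(a: STI, b: STI) -> bool:
    return a.begin >= b.begin and a.end <= b.end


def intersects(a: STI, b: STI) -> bool:
    return a.end > b.begin and a.begin < b.end


ww1 = STI(1914, 1918)
ww2 = STI(1939, 1945)
century_20 = STI(1901, 2000)

print(during(ww1, century_20))  # True
print(before(ww1, ww2))         # True
print(before(ww2, ww1))         # False: the reversed relation ("after") does not hold
print(intersects(ww1, ww2))     # False
```

The reversed relations need no extra code: "after" is `before` with the arguments swapped.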
FUZZY ALLEN RELATIONS
HOW CAN WE DEFINE THE ALLEN RELATIONS FOR 2 FTIS?
The relation between 2 FTIs is a fuzzy relation (the result is a value between 0 and 1).
- Nagypál and Motik (NM)
  - Intuitive results
  - Fully compatible with Allen relations for STIs
  - equals(A,A) = 0.5
- Schockaert (S1)
  - Intuitive results
  - Very complex calculation
  - Fully compatible with Allen relations for STIs
  - equals(A,A) = 1
- Simplified Schockaert (S2)
  - Specialised version of S1
  - Simpler to calculate
  - Only works for trapezoidal FTIs
IMPLEMENTATION AND TEST
IMPLEMENTATION
- Implementation in an RDBMS, PostgreSQL
- Using PostGIS (spatial extension)
- Written in SQL and PL/pgSQL
- Code released as Open Source Software
- Download at https://github.com/koenedaele/pgFTI
- Complete implementation of NM and S2
- Using a finite timeline from 1,000,000 BC to 100,000 AD
TEST DATA
For our tests we used a data set of 716 architects who worked in Flanders.
LOUIS DELACENSERIE: Date of birth: 27/09/1838. Date of death: 02/09/1909.
ADAM GHEERIJS: Was born around 1320-1325. Died 10/12/1394.
JEAN D’OISY: Was born ca. 1310 near Valenciennes and died 1377 in Brussels.
JAN GUETHEGEM: 15th century builder.
HENDRIK VAN TIENEN: No date of birth or death known. All we know is that he was working in 1396 on the St.-Janskerk in Diest.
CURRENT WAY OF ENTERING DATA
- Available since May 2009
- A date field
- Some conventions to handle uncertainty
  - If we only know the year: 1st of January
  - If we only know the month: 1st day of that month
  - No year known: no data entry
- No way to differentiate between somewhere in 1377 and 01/01/1377!
- 42% of the persons in our database have a potentially uncertain birth or death date
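The problem with these conventions can be shown in a short Python sketch. Both function names are hypothetical. The first mimics the date-field conventions listed above, the second is one assumed alternative that keeps the known precision by storing the earliest and latest possible day.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def legacy_date(year: Optional[int], month: Optional[int] = None,
                day: Optional[int] = None) -> Optional[date]:
    """Date-field convention: an unknown month or day defaults to 1."""
    if year is None:
        return None  # no year known: no data entry
    return date(year, month or 1, day or 1)


def possible_range(year: Optional[int], month: Optional[int] = None,
                   day: Optional[int] = None) -> Optional[Tuple[date, date]]:
    """Assumed alternative: store the earliest and latest possible day."""
    if year is None:
        return None
    if month is None:
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, day)
    return exact, exact


# "Somewhere in 1377" and 01/01/1377 collapse into the same stored value ...
print(legacy_date(1377) == legacy_date(1377, 1, 1))        # True

# ... whereas a range keeps them apart.
print(possible_range(1377) == possible_range(1377, 1, 1))  # False
```

The range is still a sharp description; a rough or fuzzy interval would be built on top of such bounds.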
TESTING THE ALLEN BEFORE RELATION
Date          NM               S1               S2
              n    time (s)    n    time (s)    n    time (s)
01/01/1001      0    0.12        0     0.06       0    0.06
01/01/1701      7    0.09        7     9.6        7    0.07
01/01/1901    132    0.58      132   419.01     132    0.1
01/01/2101    716    0.06      716     0.07     716    0.06
- Each method produces the same outcome.
- S1 is by far the slowest.
- The effect of short-circuiting is visible.
- S2 is faster than NM.
CONCLUSION
- Fuzzy Time Intervals can be used to store dates and periods that are vague or uncertain.
- We can discover the Allen relations between Fuzzy Time Intervals using different algorithms.
- The choice of comparison algorithm depends on the needs of the researcher.
  - If trapezoidal FTIs are sufficient, S2 is fastest.
  - If complex analysis and reasoning are needed, go with S1.
  - If trapezoidal FTIs are not sufficient and speed is more important than reasoning, NM offers a nice balance.
QUESTIONS AND ANSWERS?
FURTHER INFORMATION
FURTHER READING AND FULL BIBLIOGRAPHY
Koen Van Daele, 2010: Imperfecte tijdsmodellering in historische databanken. Unpublished master's thesis, Universiteit Gent. http://lib.ugent.be/fulltxt/RUG01/001/418/820/RUG01-001418820_2010_0001_AC.pdf
CONTACT ME
- Email: [email protected]
- Twitter: @koenedaele
- LinkedIn: http://be.linkedin.com/in/koenvandaele