Family History Research on the Semantic Web: Building a Semantic Prototype for Danish Genealogical Research

By Charla Woodbury, Computer Science. Spring Research Conference, March 19, 2005. Supported in part by NSF.

Page 1: Family History Research on the Semantic Web : Building a Semantic Prototype for Danish Genealogical Research By Charla Woodbury Computer Science Spring

Family History Research on the Semantic Web:

Building a Semantic Prototype for Danish Genealogical Research

By Charla Woodbury
Computer Science

Spring Research Conference
March 19, 2005

Supported in part by NSF

Page 2:

Semantic Web
Machine “Understandable” Web

DATA

INFORMATION

KNOWLEDGE

MEANING

Page 3:

Need for Semantic Web

“The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …”

(Tim Berners-Lee, …, Weaving the Web)

Page 4:

Semantic Web: ‘DATE’

Calendar date

To date an artefact

A fruit

A romantic experience

To go on a romantic experience with someone

Page 5:

Also a SURNAME – Mr. C. J. Date**

The semantic web will make it possible for machines to know the difference!

** Edgar F. Codd and C. J. Date are famous in the area of databases for defining levels of normal forms

Page 6:

REAL PROBLEM

A person decides to do family history research for the first time on their Danish family lines.

• Where do they go?
• What records do they look for?
• How do they handle records in Danish?
• How can they tell when the records they have match their search family?

Page 7:

SEMANTIC WEB PROTOTYPE

Ontology – semantic model (BYU Ontos)

Annotated web pages (Web Ontology Language, OWL; W3C Recommendation, Feb 2004)

Solutions for special genealogical problems

Page 8:

ONTOLOGY MODEL

Page 9:

ONTOLOGY ENTITIES
FIND and MARK UP relevant web pages by:

• NAME <NAME>
• DATE <DATE>
• PLACE <PLACE>
• RELATIONSHIP <RELATION>
• OCCUPATION <OCCUPATION>
• RECORD_TYPE <RTYPE>
• SOURCE <SOURCE>

Page 10:

Partial Danish GIVEN NAME LEXICON

MALE
• And.
• Anders
• Andreas
• Christen
• Christian
• Eric
• Erik
• Gregers
• Hans
• Ib
• Jacob
• Jens
• Jep

FEMALE
• Ane
• Anna
• Anne
• Birthe
• Birte
• Bodil
• Caroline
• Dorte
• Dorthe
• Elene
• Ellen
• Elisabeth
• Elsbeth

Page 11:

Partial DATE Lexicon (actual lexicon is a single list in alphabetic order)

MONTHS
January – Jan – Januar – 11br
February – Feb – Februar – 12br
March – Mar – Marts
April – Apr – Apl
May – Mai
June – Jun – Juni
July – Jul – Juli – 5br
August – Aug – Augst – 6br
September – Sep – Sept – 7br – Septembre
October – Oct – 8br – Octobre
November – Nov – 9br – Novembre
December – Dec – 10br – Decembre

TIME
Year – yr – aar – år
Month – mo – maaned – måned – m.
Week – uge – ug.
Day – dag – dg.
Hour – h. – hr.

FEAST DATES (partial)
Easter – Paaske – Påske – Paasche – Påsche
Pentecost – Pent – Pinse – Pin
Trinity – Tr – Trin – Trinitatis

DAYS OF WEEK
Sunday – Dominico – Dom.
Monday – Mondag – Mond.
Tuesday – Tirsdag – Tirsd.
Wednesday – Onsdag – Onsd.
Thursday – Tørsdag – Tørsd.
Friday – Fredag – Fred.
Saturday – Lørsdag – Lørs.
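The month entries of the lexicon can be read as a lookup from each spelling found in a record to one canonical month. The following is a minimal Python sketch of that idea, not the prototype's implementation: the names `MONTH_VARIANTS`, `LOOKUP`, and `normalize_month` are hypothetical, and only a few lexicon rows are reproduced.

```python
# Hypothetical excerpt of the DATE lexicon: month number -> variants from the slide.
MONTH_VARIANTS = {
    1: ["January", "Jan", "Januar", "11br"],
    7: ["July", "Jul", "Juli", "5br"],
    9: ["September", "Sep", "Sept", "7br", "Septembre"],
    11: ["November", "Nov", "9br", "Novembre"],
    12: ["December", "Dec", "10br", "Decembre"],
}

# Invert the table into a flat lookup keyed by the lower-cased variant.
LOOKUP = {
    variant.lower(): month
    for month, variants in MONTH_VARIANTS.items()
    for variant in variants
}

def normalize_month(token):
    """Return the month number for a lexicon variant, or None if it is unknown."""
    # Trailing periods are dropped so that "Sept." and "Sept" are treated alike.
    return LOOKUP.get(token.strip().rstrip(".").lower())
```

With this excerpt, `normalize_month("9br")` resolves to November and `normalize_month("Sept.")` to September; a token outside the excerpt returns `None`.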

Page 12:

Original Record
FHL Film #052,236, Tvilum Parish

Page 13:

Web Page

• SOURCE URL - Tvilum Sogne Kirkebog

• [PAGE HEADER] Fødde 1751 3

• [BODY] Truust Dom. 23 p: Trinit: laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust

Page 14:

ONTOLOGY ENTITIES
FIND and MARK UP relevant web pages by:

• NAME <NAME>
• DATE <DATE>
• PLACE <PLACE>
• RELATIONSHIP <RELATION>
• OCCUPATION <OCCUPATION>
• RECORD_TYPE <RTYPE>
• SOURCE <SOURCE>

Colors only represent OWL annotation mark-ups automatically placed in the web page using the ontology

Page 15:

Annotated Web Page

• SOURCE - Tvilum Parish Register

• [PAGE HEADER] Fødde 1751 3

• [BODY] Truust Dom. 23 p: Trinit: laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust
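As a rough illustration of lexicon-driven mark-up, the sketch below wraps tokens found in a small lexicon with the entity tags listed on the ONTOLOGY ENTITIES slide. It is an assumption-laden simplification: the prototype places OWL annotations using the ontology, whereas this example uses a hypothetical `LEXICON` dictionary, a hypothetical `annotate` function, and plain inline tags.

```python
# Hypothetical lexicon excerpt: token -> entity tag from the ONTOLOGY ENTITIES slide.
LEXICON = {
    "Truust": "PLACE",
    "Søebyevad": "PLACE",
    "fadd.": "RELATION",
    "søn": "RELATION",
    "hustruer": "RELATION",
    "Niels": "NAME",
    "Peder": "NAME",
    "Jens": "NAME",
}

def annotate(text):
    """Wrap every whitespace-separated token found in LEXICON with its tag."""
    out = []
    for token in text.split():
        core = token.rstrip(",")      # keep a trailing comma outside the tag
        trailing = token[len(core):]
        tag = LEXICON.get(core)
        if tag:
            out.append(f"<{tag}>{core}</{tag}>{trailing}")
        else:
            out.append(token)         # unknown tokens pass through unchanged
    return " ".join(out)

print(annotate("Jens Bachis søn Peder af Truust"))
```

Token-by-token lookup cannot recognise multi-word names or surnames missing from the lexicon ("Bachis" stays untagged here), which is one reason a fuller ontology with rules is needed.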

Page 16:

RESULTS LISTING
TARGET – Jens Pedersen Bach
Truust, Tvilum Parish, Gjern District, Skanderborg
Date Range - born 1693 to died 1778

Name | Date | Place | Relation | Occupation | RecordType | Source(URL)
Jens Bachis | Dom. 23 p: Trinit: 1751 (14 Nov 1751) | Truust | fadd: | (blank) | Fødde | Tvilum Parish Register

SOURCE - Tvilum Parish Register
[PAGE HEADER] Fødde 1751 3
[BODY] Truust Dom. 23 p: Trinit: laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust

Page 17:

CONVERSION FUNCTIONS
inside the ontology

• Compute birthdate from age at death

Death – 22 Mar 1743

Age - 23 yr 2 m

-> BIRTH Jan 1720

• Compute dates from feast dates

Sunday 23rd after Trinity 1751

-> 14 Nov 1751
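Both conversions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype's code: the function names are hypothetical, dates are taken to be Gregorian (Denmark adopted that calendar in 1700), Easter is computed with the standard anonymous Gregorian computus, and Trinity Sunday is taken as 56 days after Easter.

```python
from datetime import date, timedelta

def birth_month_from_age(death, years, months=0):
    """Subtract an age in years and months from a death date; return (year, month)."""
    total = death.year * 12 + (death.month - 1) - (years * 12 + months)
    return total // 12, total % 12 + 1

def gregorian_easter(year):
    """Easter Sunday by the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def sunday_after_trinity(year, n):
    """Date of the n-th Sunday after Trinity (assumes Trinity = Easter + 56 days)."""
    trinity = gregorian_easter(year) + timedelta(days=56)
    return trinity + timedelta(weeks=n)

print(birth_month_from_age(date(1743, 3, 22), 23, 2))  # (1720, 1), i.e. Jan 1720
print(sunday_after_trinity(1751, 23))                  # 1751-11-14
```

Both printed values agree with the slide's examples. Records dated before the 1700 calendar change would need a Julian computus instead.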

Page 18:

Solutions for Special Problems

RULES FOR

• Matching different name forms

• Matching place names to appropriate records

Page 19:

RULE - Match different name forms as ONE PERSON

• JENS PEDERSEN

• JENS PEDERSEN BACH

• JENS BACH

• JENS BACHIS
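One way such a rule could work is sketched below. It is a hypothetical simplification, not the rule used in the prototype: a record name matches the target when the given names agree and every remaining name part (patronymic or surname, after folding spelling variants) also appears in the target. The variant table and function names are assumptions.

```python
# Hypothetical spelling variants folded onto one surname form.
SURNAME_VARIANTS = {"bachis": "bach", "baches": "bach"}

def normalize(name):
    """Lower-case a name, split it into parts, and fold known variants."""
    return [SURNAME_VARIANTS.get(part, part) for part in name.lower().split()]

def matches_target(record_name, target_name):
    """True if the record name is a compatible shorter or variant form of the target."""
    rec, tgt = normalize(record_name), normalize(target_name)
    if not rec or not tgt or rec[0] != tgt[0]:
        return False                      # given names must agree
    return set(rec[1:]) <= set(tgt[1:])   # remaining parts must occur in the target

target = "Jens Pedersen Bach"
for form in ["JENS PEDERSEN", "JENS PEDERSEN BACH", "JENS BACH", "JENS BACHIS", "NIELS BACHES"]:
    print(form, matches_target(form, target))
```

The four forms on the slide match the target and "NIELS BACHES" does not. Name compatibility alone is a weak test (a bare "Jens" would also pass), so a real rule would presumably combine it with the date range and place of the target.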

Page 20:

PLACES - County Map of DENMARK

Page 21:

Parish and District Map of SKANDERBORG

Page 22:

Matching Places to Records

Farm name | Parish | District | County | Record Links
Molger | Tamdrup | Nim | Skanderborg | PARISH Tamdrup 1684-1912; PROBATE Nim Herred Provisti Rask Skanderborg Rytterdistrikt
- | Tamdrup | Nim | Skanderborg | List of URLs; includes Molger URLs; adds parish-specific records
- | - | Nim | Skanderborg | List of URLs; includes Tamdrup URLs; adds district-specific records
- | - | - | Skanderborg | List of URLs; includes all district URLs; adds county-specific records
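The table describes record links rolling up a place hierarchy: each level's list contains the links of the levels beneath it plus its own. A minimal sketch of that roll-up follows; the `Place` class and the link labels are hypothetical placeholders, not URLs or structures from the prototype.

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    name: str
    level: str                                      # "farm", "parish", "district", "county"
    own_links: list = field(default_factory=list)   # records specific to this level
    children: list = field(default_factory=list)    # places contained in this one

    def all_links(self):
        """Links of all contained places, followed by this level's own links."""
        links = []
        for child in self.children:
            links.extend(child.all_links())
        links.extend(self.own_links)
        return links

# Placeholder labels stand in for real record URLs.
molger = Place("Molger", "farm", ["molger-records"])
tamdrup = Place("Tamdrup", "parish", ["tamdrup-parish-records"], [molger])
nim = Place("Nim", "district", ["nim-district-records"], [tamdrup])
skanderborg = Place("Skanderborg", "county", ["skanderborg-county-records"], [nim])

print(tamdrup.all_links())
print(skanderborg.all_links())
```

A search that knows only the county therefore sees every link in the county, while a search that knows the farm starts from the most specific list.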

Page 23:

Evaluation

User relevance feedback on records

Expert manual results of same query and data sets

COMPARE
• Speed of query results
• Recall and precision

TO
• GOOGLE search
• Present research techniques
  Records in book and microfilm
  Internet helps
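Recall and precision have standard definitions: precision is the share of returned records that are relevant, and recall is the share of relevant records that were returned. The sketch below assumes, as one possible reading of the slide, that the expert's manual results serve as the set of relevant records; the function name and the record identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against a set of relevant records."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical record identifiers: four records returned, three judged relevant.
system_results = ["r1", "r2", "r3", "r4"]
expert_results = ["r1", "r2", "r5"]
print(precision_recall(system_results, expert_results))
```

Here two of the four returned records are relevant (precision 0.5) and two of the three relevant records were found (recall of two thirds).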

Page 24:

MAJOR CONTRIBUTIONS

First genealogical prototype of the semantic web

Practical demonstration of the superiority of the semantic web for research

Portal for family history research that could be easily expanded

Page 25:

QUESTIONS?