Data archiving for the Irish Record Linkage project
Rebecca Grant, Digital Archivist, Digital Repository of Ireland
Dolores Grant, IRL-DRI Digital Archivist, Digital Repository of Ireland
Irish Record Linkage project 1864-1913
Irish Record Linkage is an Irish Research Council-funded project running from 2014 to June 2016
Collaboration between the University of Limerick (historians), the Digital Repository of Ireland at the Royal Irish Academy (archivists), and Insight@NUI Galway (knowledge engineers, Linked Data experts)
Constructing a Knowledge Platform – Linked Data based on Vital Registration Data (digitised registers of Births, Marriages and Deaths) in order to answer research questions around infant and maternal mortality
Irish Record Linkage project 1864-1913
The Linked Data concept and the project’s dataset
Extracting data from the vital records
Approaches to archival authenticity
Preservation of the records
The Digital Repository of Ireland
DRI is a trusted digital repository for Humanities and Social Sciences data, launched in June 2015 and based at the Royal Irish Academy
Linking and preserving the rich collections held by Irish institutions (archives, museums, libraries, galleries, universities, research projects etc.)
Focal point for the development of national guidelines and policy for digital preservation and access
repository.dri.ie
INSIGHT@NUI Galway
Insight is a joint initiative between University College Dublin, the National University of Ireland at Galway, University College Cork, and Dublin City University. Insight was established in 2013 by Science Foundation Ireland with funding of €75m.
The Semantic Web, Sensors and the Sensor Web, Social network analysis, Decision Support and Optimization, and Connected Health.
Irish Record Linkage and Linked Data Queries
• How many women died within 42 days following childbirth due to complications related to labour and how does that figure correspond with the official reports?
• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?
• How did various socio-economic conditions affect maternal and infant mortality rates?
General Register Office records
The General Register Office (GRO) is the civil registry responsible for recording information on births, deaths and marriages.
Records of 5,847,323 births (from 1864 to 1912), 4,236,922 deaths (from 1864 to 1912) and 1,160,546 marriages (from 1845 to 1912) were transferred to the project team with strict terms and conditions.
Events were captured on register pages (up to 10 for births and deaths, and up to 4 for marriages) divided by district and sent to the GRO, where volumes were then created and an index compiled.
Database dump of the GRO's database, with digitised versions of the register pages and indexes (TIFFs)
The Linked Data Concept
The example describes a subject (James Joyce) and its relationship (predicate) to an object (Dublin). Because the elements of the information (that James Joyce was born in Dublin) are semantically separated, datasets stored in this way can be easily queried.
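The subject, predicate and object idea can be illustrated with a minimal Python sketch. It uses plain tuples rather than a real triplestore, and the predicate names (`born_in`, `occupation`) and the `match` helper are assumptions made for illustration, not the project's vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
# Predicate names here are hypothetical, chosen only for this sketch.
triples = [
    ("James Joyce", "born_in", "Dublin"),
    ("James Joyce", "occupation", "Writer"),
    ("Nora Barnacle", "born_in", "Galway"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given parts; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Who was born in Dublin?" becomes a pattern with the subject left open.
born_in_dublin = [s for (s, _, _) in match(triples, predicate="born_in", obj="Dublin")]
print(born_in_dublin)
```

Because each part of the statement is stored separately, the same data answers other questions (for example, everything known about one subject) by changing which part of the pattern is left open.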
Birth Records
Table comparing the fields held in: Register TIFF; Index TIFF; System Pre 1900; System Post 1900
Superintendent Registrar’s District; Registrar’s District; Registration district; District; District; Union; County; County; County; Province; Province
Number in register; Entry number; Date & place of birth; Year of event; Date of birth, year of event
Name (if any); Name; Forename, Surname; Forename, Surname; Sex; Sex
Name, surname & dwelling place of father; Name & surname & maiden surname of mother; Mother’s maiden name; Rank or profession of father; Signature, qualification, and residence of informant
When Registered; Returns year; Returns year; Returns quarter; Returns quarter
Signature of Registrar; Name & surname & maiden surname of mother; Rank or profession of father; Signature, qualification, and residence of informant; Signature of Registrar; Signature of Superintendent Registrar and date
Baptismal name if added after registration of birth and date
Stamp Number; Stamp number; Stamp number; Volume number; Returns volume number; Returns volume number; Page number; Page number; Returns page number; Returns page number; Stamped number; Page ID; 2nd Stamped number
Index entry number; Index page number
Archival principles
The principle of provenance: Provenance means the history of ownership related to a group of records or an individual item in a collection. Preserving information on these relationships is essential, as it provides evidence of how and by whom the records were created and used before they became part of the archives. Provenance provides essential contextual information for understanding the content and history of an archival collection.
The principle of original order: Archives are kept in the order in which they were originally created or used. This original order allows custodians to protect the authenticity of the records and provides essential information as to how they were created, kept and used.
Terms of transfer
Data (e.g. database records and TIFFs) are only stored for the duration of the project, and must be destroyed following its completion
Data can only be accessed by the IRL project team after an access agreement has been signed
Records cannot be duplicated, downloaded or brought off-site
Personal, identifying information cannot be published
Copyright and related rights remain vested in the General Register Office
DRI Presentation
Archival authenticity
The quality of being genuine, not a counterfeit, and free from tampering, and is typically inferred from internal and external evidence, including its physical characteristics, structure, content, and context.
The presence of a signature serves as a fundamental test for authenticity; the signature identifies the creator and establishes the relationship between the creator and the record.
The style and language of the document must be consistent with other, related documents that are accepted as authentic.
Society of American Archivists http://www2.archivists.org/glossary/terms/a/authenticity
Archival authenticity
Only records that are complete can ensure accountability and protect personal rights […] Individual records must be complete; they must contain all the information they had when they were created. They must also maintain their original structure and context. (Hirtle)
An authentic record is one that is what it purports to be and has not been tampered with or otherwise corrupted. (InterPARES 2)
For a record to be considered trustworthy […] it must accurately reflect the event it records and be uncontaminated by the distorting influence of time, bias, interpretation, or unwarranted opinion on the part of the record-maker (MacNeil)
Initial data preparation
Final dataset comprises birth, marriage and death records from 2 districts in Dublin (South City no. 1 and South City no. 3)
Separate database constructed to enable the encoding of the IRL records
Tables represent both the register pages and the records (“record” = historical event)
Each event links back to the register page
Fields created reflect the original record information, and the structure enables transformation to RDF
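A rough sketch of the arrangement described in this slide, using Python's built-in `sqlite3` module: one table for register pages, one for event records, each event linking back to its page, and a small function turning a record into triples. The table names, column names, sample values and predicate names are all assumptions for illustration; the project's actual schema is not shown in the slides.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register_page (
    page_id   INTEGER PRIMARY KEY,
    tiff_file TEXT NOT NULL,   -- file name of the digitised page (invented)
    district  TEXT NOT NULL
);
CREATE TABLE event_record (
    record_id       INTEGER PRIMARY KEY,
    page_id         INTEGER NOT NULL REFERENCES register_page(page_id),
    entry_number    INTEGER,
    event_type      TEXT,      -- 'birth', 'marriage' or 'death'
    name_as_written TEXT
);
""")
conn.execute("INSERT INTO register_page VALUES (1, 'page_0001.tif', 'South City no. 1')")
conn.execute("INSERT INTO event_record VALUES (10, 1, 3, 'birth', '[illegible] Murphy')")


def record_to_triples(conn, record_id):
    """Turn one event record, plus its register page, into simple triples."""
    rec_id, event_type, name, page_id, tiff = conn.execute(
        "SELECT r.record_id, r.event_type, r.name_as_written, p.page_id, p.tiff_file "
        "FROM event_record r JOIN register_page p ON p.page_id = r.page_id "
        "WHERE r.record_id = ?",
        (record_id,),
    ).fetchone()
    subject = f"record/{rec_id}"
    page = f"page/{page_id}"
    return [
        (subject, "eventType", event_type),
        (subject, "nameAsWritten", name),
        (subject, "recordedOnPage", page),  # the link back to the register page
        (page, "digitisedAs", tiff),
    ]


for triple in record_to_triples(conn, 10):
    print(triple)
```

Keeping the page as its own row means every event can be traced to the image it was transcribed from, which is the property the later authenticity slides rely on.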
Data challenges
• Whole, authentic record maintained to represent the original record and preserve context of creation
• Every database record linked to the TIFF image – TIFFs stored in semi-meaningful arrangement
• Consistent cataloguing practices (dates, square brackets, [sic], notes field to capture anomalies)
• Paleography
• Controlled vocabulary of death terms and professions
• Archiving databases: preserving content, structure and processes (RODA toolkit (Repository of Authentic Digital Objects), SIARD (Software Independent Archiving of Relational Databases))
Separation of concerns – transcription vs interpretation
Variance in how subject names and places were recorded (initials, shorthand, name of a building versus street name) might imply something of which we are currently unaware.
Transcription of the register pages captures exactly what was written down.
Some interpretation is necessary in order to use the data, however, e.g. street names changing over time, new insights into medical conditions, adoption of new social theory (e.g. class distinctions)
Data captured in two separate ontologies: one for transcription, one for interpretation. For example, a death recorded in days in the first database can be interpreted/queried as hours in the second.
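The days-to-hours example can be sketched in Python as two layers: an immutable transcription that holds exactly what was written, and a derived interpretation computed from it. The class, field names and unit table below are assumptions for illustration and do not reproduce the project's ontologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    """Exactly what the register says; frozen so it cannot be altered later."""
    record_id: str
    age_at_death_value: int
    age_at_death_unit: str  # as written on the page, e.g. "days"


# Interpretation layer: conversion rules live outside the transcription.
HOURS_PER_UNIT = {"hours": 1, "days": 24, "weeks": 24 * 7}


def interpret_age_in_hours(t: Transcription) -> int:
    """Derive a comparable value without touching the transcribed record."""
    return t.age_at_death_value * HOURS_PER_UNIT[t.age_at_death_unit]


transcribed = Transcription("death/1", 3, "days")
interpreted = {
    "record_id": transcribed.record_id,
    "age_at_death_hours": interpret_age_in_hours(transcribed),
}
print(transcribed)
print(interpreted)
```

If a conversion rule or a historical reading changes, only the interpretation layer is recomputed; the transcription stays as evidence of what the register page recorded.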
[Diagram: separation of concerns between the GRO Triplestore and Triplestore 2 (Data Analysis)]
Register page as EAD (database crosswalk)
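As an illustration of what a database-to-EAD crosswalk might involve, the sketch below maps one register page row to an EAD-style component using Python's standard `xml.etree.ElementTree`. The element names (`c`, `did`, `unitid`, `unittitle`, `unitdate`, `dao`) follow EAD 2002 conventions, but the choice of which database field feeds which element, the field names and the sample values are assumptions, not the mapping used by the project.

```python
import xml.etree.ElementTree as ET


def page_to_ead_component(page: dict) -> ET.Element:
    """Map one register-page row (hypothetical fields) to an EAD-style <c>."""
    c = ET.Element("c", level="item")
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unitid").text = str(page["page_id"])
    ET.SubElement(did, "unittitle").text = f"Register page, {page['district']}"
    ET.SubElement(did, "unitdate").text = str(page["year"])
    # Pointer to the digitised image of the page.
    ET.SubElement(did, "dao", href=page["tiff_file"])
    return c


# Invented sample row for illustration only.
page = {
    "page_id": 1,
    "district": "South City no. 1",
    "year": 1864,
    "tiff_file": "page_0001.tif",
}
xml_text = ET.tostring(page_to_ead_component(page), encoding="unicode")
print(xml_text)
```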
Authenticity of the dataset
Archival authenticity and preservation
Archivist encoded entire register pages rather than lines of data regarding an individual (e.g. a single life event such as a death)
Database records refer back to digitised TIFFs created by the General Register Office
Interpretation of the dataset occurs separately: all records are transcribed exactly, including typos, blank fields, details crossed out, Xs etc.
TIFFs can be preserved with EAD or QDC metadata, and associated databases preserved separately and linked
Querying of the data occurs only on an obfuscated dataset with personal names excluded; linked data can contain outbound links but is protected by a firewall
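The point in this slide about querying only an obfuscated dataset with personal names excluded could be sketched as follows. The field names and the sample record are invented for illustration; the slides do not say how the project implemented obfuscation.

```python
# Hypothetical names of fields that identify a person.
PERSONAL_FIELDS = {"forename", "surname", "mothers_maiden_name"}


def obfuscate(record: dict) -> dict:
    """Return a copy of the record with personal-name fields left out."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}


# Invented record; the full version stays in the protected store.
record = {
    "record_id": "death/1",
    "forename": "Mary",
    "surname": "Example",
    "cause_of_death": "Puerperal fever",
    "district": "South City no. 3",
}
public_view = obfuscate(record)
print(public_view)
```

The original record is not modified, so the complete transcription remains available for authenticity checks while queries run against the reduced copy.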
Bibliography
Hirtle, Peter. “Archival Authenticity in a Digital Age”. Authenticity in a digital environment, 2000.
Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and Archival Literature, 2005.
MacNeil, Heather. “Trusting Records in a Postmodern World”. Archivaria 51, 2001.
Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.
SIARD Suite: http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en
@beck_grant @IRL_project
http://repository.dri.ie
The content of this presentation is licensed as CC-BY. Please attribute to Rebecca Grant, Digital Archivist, Digital Repository of Ireland, 2015.
https://irishrecordlinkage.wordpress.com/