Claudia Medina: Linking Health Records for Population Health Research in Brazil
DESCRIPTION
Talk by Claudia Medina Coeli at the 1st Symposium on Big Data and Public Health, 2013.
TRANSCRIPT
UFRJ LABMECS
1st Symposium on Big Data and Public Health - 2013
Linking Health Records for Population Health Research in Brazil
Cláudia Medina Coeli
Record Linkage:
The process of identifying and merging records across different databases that correspond to the same entity (for example, the same individual).
This process creates a new database that has more variables than any of the individual databases that were linked.
It can also be used to identify records that refer to the same entity within a single database. This use is called deduplication (removing duplicate records or merging them into a combined record).
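A minimal Python sketch of both uses, assuming the easy case in which every record carries a shared identifier (the field name `person_id` and all records are hypothetical). Linking appends the variables of the matching record from the other database; deduplication collapses repeated records of the same entity into one, keeping the first non-missing value of each field.

```python
def link_on_id(db_a, db_b, key="person_id"):
    """Link two databases: each matched pair becomes one record with the variables of both."""
    index_b = {rec[key]: rec for rec in db_b}
    return [{**a, **index_b[a[key]]} for a in db_a if a[key] in index_b]


def deduplicate(db, key="person_id"):
    """Merge records of the same entity within one database into a combined record."""
    merged = {}
    for rec in db:
        combined = merged.setdefault(rec[key], {})
        for field, value in rec.items():
            if combined.get(field) is None:  # keep the first non-missing value
                combined[field] = value
    return list(merged.values())


admissions = [{"person_id": 1, "admission": "2005-02-10"},
              {"person_id": 2, "admission": "2005-03-04"}]
death_records = [{"person_id": 1, "death_date": "2005-06-30"}]
print(link_on_id(admissions, death_records))
# [{'person_id': 1, 'admission': '2005-02-10', 'death_date': '2005-06-30'}]

registry = [{"person_id": 7, "sex": "F", "dob": None},
            {"person_id": 7, "sex": "F", "dob": "1970-03-12"}]
print(deduplicate(registry))
# [{'person_id': 7, 'sex': 'F', 'dob': '1970-03-12'}]
```

Only records found in both databases appear in the linked output here (an inner join); a real study might also keep unmatched records with missing values.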
Record Linkage:
Record linkage is relatively easy when a unique identifier, such as a health insurance number, is available in the databases to be linked.
In the absence of a unique identifier, the process relies on personal identifiers (e.g., name, sex, date of birth, address).
This requires techniques that deal with problems such as typographical errors and spelling variations, time-sensitive data (e.g., address), and large databases.
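Exact comparison of a personal identifier fails on a single typing error (SOUZA vs. SOUSA). One common family of approximate comparison functions is based on edit distance; below is a minimal Python sketch of Levenshtein distance with a length-normalized similarity. The normalization is one of several possible choices, and linkage software often uses other functions (e.g., Jaro-Winkler).

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - levenshtein(s, t) / longest


print("SOUZA" == "SOUSA")             # False: exact comparison rejects the pair
print(levenshtein("SOUZA", "SOUSA"))  # 1
print(similarity("SOUZA", "SOUSA"))   # 0.8
```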
Record Linkage:
Christen P, 2012
Data pre-processing: data cleaning; standardization of codes and formats; parsing (name, address).
Indexing (blocking): comparisons are restricted to records that agree on a blocking key (e.g., Soundex code of the first name + sex).
Comparison: approximate comparison functions (allowing partial agreement) produce a vector of numerical similarities for each record pair.
Classification: rule-based, probabilistic, or machine learning approaches.
Clerical review: manual inspection (tedious and labour-intensive)
Evaluation: accuracy studies
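The pre-processing, indexing, comparison and classification steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the method of any tool mentioned in the talk: the records, the Portuguese name particles dropped during standardization, the summed-similarity score and the 2.5 threshold are all hypothetical. Soundex is the standard American variant, and `difflib.SequenceMatcher` stands in for the approximate comparison functions used by real linkage software. Clerical review and evaluation are not shown.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher

# Assumption: Portuguese name particles are dropped during standardization.
PARTICLES = {"DA", "DE", "DO", "DAS", "DOS", "E"}

# American Soundex digit for each consonant; vowels, H, W and Y have no code.
SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                           "4": "L", "5": "MN", "6": "R"}.items()
    for ch in letters
}


def standardize_name(raw):
    """Pre-processing: remove accents, uppercase, drop punctuation and particles."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalpha() else " " for ch in no_accents.upper())
    return [tok for tok in cleaned.split() if tok not in PARTICLES]


def soundex(word):
    """American Soundex: first letter plus three digits."""
    word = word.upper()
    if not word:
        return ""
    prev = SOUNDEX_CODES.get(word[0], "")
    digits = []
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (word[0] + "".join(digits) + "000")[:4]


def prepare(record):
    """Parsing: split the standardized name into first and last name."""
    tokens = standardize_name(record["name"])
    return {**record, "first": tokens[0], "last": tokens[-1]}


def blocking_key(rec):
    """Indexing: Soundex code of the first name + sex."""
    return soundex(rec["first"]) + rec["sex"]


def compare(a, b):
    """Comparison: vector of similarities in [0, 1] (names approximate, dob exact)."""
    return (
        SequenceMatcher(None, a["first"], b["first"]).ratio(),
        SequenceMatcher(None, a["last"], b["last"]).ratio(),
        1.0 if a["dob"] == b["dob"] else 0.0,
    )


def link(file_a, file_b, threshold=2.5):
    """Classification: hypothetical threshold on the summed similarity vector."""
    blocks = defaultdict(list)
    for rec in map(prepare, file_b):
        blocks[blocking_key(rec)].append(rec)
    matches = []
    for a in map(prepare, file_a):
        for b in blocks[blocking_key(a)]:  # only pairs that share a block are compared
            if sum(compare(a, b)) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Hypothetical records from two databases.
hospital = [
    {"id": "H1", "name": "Maria da Silva", "sex": "F", "dob": "1970-03-12"},
    {"id": "H2", "name": "José Pereira", "sex": "M", "dob": "1985-07-01"},
]
deaths = [
    {"id": "D1", "name": "Marya Silva", "sex": "F", "dob": "1970-03-12"},
    {"id": "D2", "name": "Jose Pereyra", "sex": "M", "dob": "1985-07-01"},
    {"id": "D3", "name": "Maria Costa", "sex": "F", "dob": "1990-01-01"},
]

matches = link(hospital, deaths)
print(matches)  # [('H1', 'D1'), ('H2', 'D2')]
```

Blocking trades completeness for speed: a true pair whose first names receive different Soundex codes (for example, JOSE and JOAO, coded J200 and J000) would never be compared.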
The Record Linkage Process:
...“For more than a decade, most of the methodological research has been in the computer science literature”...
…“Many applications are still in the epidemiological or health informatics literature with most individuals using government health agency shareware based on the Fellegi-Sunter model”...
William E Winkler, 2012.
The record linkage approaches most frequently used in the Brazilian health sector:
Probabilistic (Fellegi-Sunter model): uses approximate comparison functions. Different weights are assigned to each field based on its discriminating power and vulnerability to error. A number of commercial and open-source software packages are available.
Deterministic: uses exact comparison functions and a rule-based classification approach. Rules are developed based on expert knowledge, and specific computer routines need to be developed for each problem.
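To make the contrast concrete, here is a minimal Python sketch of Fellegi-Sunter field weights next to a deterministic rule. For each field, m is the probability that the field agrees among true matches (a lower m means the field is more vulnerable to error) and u is the probability that it agrees by chance among non-matches (a lower u means more discriminating power). The agreement weight is log2(m/u), the disagreement weight is log2((1-m)/(1-u)), and the summed weight is compared with an upper and a lower threshold, with pairs in between sent to clerical review. The m/u values, the thresholds and the deterministic rule are hypothetical; in practice m and u are estimated from the data (e.g., with the EM algorithm).

```python
import math

# Hypothetical (m, u) probabilities per field.
M_U = {"name": (0.95, 0.01), "dob": (0.97, 0.003), "sex": (0.99, 0.5)}


def field_weight(field: str, agrees: bool) -> float:
    """Fellegi-Sunter log2 weight for agreement or disagreement on one field."""
    m, u = M_U[field]
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def match_weight(agreement: dict) -> float:
    """Total weight of a record pair: sum of its field weights."""
    return sum(field_weight(field, agrees) for field, agrees in agreement.items())


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: link, non-link, or clerical review in between."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"


def deterministic_rule(a: dict, b: dict) -> bool:
    """Hypothetical expert rule: exact agreement on name, date of birth and sex."""
    return all(a[f] == b[f] for f in ("name", "dob", "sex"))


print(classify(match_weight({"name": True, "dob": True, "sex": True})))   # link
print(classify(match_weight({"name": False, "dob": True, "sex": True})))  # possible link (clerical review)
print(deterministic_rule({"name": "MARIA SILVA", "dob": "1970-03-12", "sex": "F"},
                         {"name": "MARYA SILVA", "dob": "1970-03-12", "sex": "F"}))  # False
```

Agreement is binary here for brevity; with approximate comparison functions, a field could count as agreeing when its similarity exceeds a cut-off.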
Classification model: probabilistic vs. rule-based
Open Source Record Matching Software:
Febrl
LinkPlus
Reclink/OpenRecLink
OpenRecLink:
Open source (http://reclink.sourceforge.net/)
Multi-platform
Multiple language support
New database back-end
PostgreSQL integration
New deduplication routine
Better performance (Linux Ubuntu 64-bit)
Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database*
Study population: all AIDS cases reported in SINAN with a date of diagnosis between 2002 and 2005.
Imperfect gold standard:
Known death: case with a date of death recorded in the surveillance database (N = 19,750).
Known alive: no date of death recorded in the surveillance database and found registered in the laboratory database in 2006 (N = 38,675).
Results of the Internal Validation Study*
Gold standard (rows) by linkage result (columns):
                        Linkage: Dead   Linkage: Alive    Total
Gold standard: Dead            17,301            2,449   19,750
Gold standard: Alive              155           38,520   38,675
Global sensitivity (Se) = 87.6%
Specificity (Sp) = 99.6%
*Fonseca et al, CSP 26(7), 2010.
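As a check, the global sensitivity and specificity follow directly from the 2x2 table, treating the imperfect gold standard as the truth. A short Python sketch (the variable names are ours):

```python
# 2x2 table from the slide: rows = imperfect gold standard, columns = linkage result.
true_pos = 17301   # gold standard dead, linkage dead
false_neg = 2449   # gold standard dead, linkage alive
false_pos = 155    # gold standard alive, linkage dead
true_neg = 38520   # gold standard alive, linkage alive

sensitivity = true_pos / (true_pos + false_neg)  # share of known deaths found by linkage
specificity = true_neg / (true_neg + false_pos)  # share of known survivors not falsely linked

print(f"Se = {sensitivity:.1%}")  # Se = 87.6%
print(f"Sp = {specificity:.1%}")  # Sp = 99.6%
```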
In longitudinal mortality studies, linkage errors introduce outcome misclassification, making risk ratio estimates prone to bias. Risk ratios will not be biased if all three of the following conditions hold:
(1) exposure and outcome misclassification errors are independent;
(2) outcome misclassification is non-differential with respect to exposure level;
(3) specificity is 100%.
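The role of condition (3) can be seen with a little arithmetic. Under non-differential misclassification with independent errors (conditions 1 and 2), the observed risk in each exposure group is Se * p + (1 - Sp) * (1 - p), where p is the true risk. When Sp = 1 the Se factor cancels in the ratio; any false links (Sp < 1) add spurious deaths to both groups and pull the ratio toward 1. A minimal Python sketch, with hypothetical true risks and the Se/Sp values from the validation study used only as illustrative inputs:

```python
def observed_risk(true_risk, se, sp):
    """Proportion classified as dead by linkage: detected true deaths plus false links among survivors."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def observed_risk_ratio(risk_exposed, risk_unexposed, se, sp):
    """Risk ratio after non-differential outcome misclassification (same Se and Sp in both groups)."""
    return observed_risk(risk_exposed, se, sp) / observed_risk(risk_unexposed, se, sp)


# Hypothetical true risks: 10% vs 5%, so the true risk ratio is 2.0.
rr_perfect_sp = observed_risk_ratio(0.10, 0.05, se=0.876, sp=1.0)
rr_imperfect_sp = observed_risk_ratio(0.10, 0.05, se=0.876, sp=0.996)
print(round(rr_perfect_sp, 3), round(rr_imperfect_sp, 3))  # 2.0 1.916
```

Imperfect sensitivity alone leaves the ratio at 2.0, while a specificity of 99.6% is enough to bias it toward the null.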
Impact of linkage errors on risk ratios:
http://www.ihdln.org
Laboratório de Métodos Epidemiológicos, Estatísticos e Computacionais em Saúde (LABMECS/IESC/UFRJ)
Thank you.
http://www.iesc.ufrj.br/posgrad/posgraduacao/