Cláudia Medina: Linking Health Records for Population Health Research in Brazil

UFRJ Labmecs. 1st Symposium on Big Data and Public Health - 2013. Linking Health Records for Population Health Research in Brazil. Cláudia Medina Coeli

Uploaded by: flavio-codeco-coelho

Posted on 13-May-2015


DESCRIPTION

Talk by Cláudia Medina Coeli at the 1st Symposium on Big Data and Public Health, 2013.

TRANSCRIPT

Page 1: Cláudia Medina: Linking Health Records for Population Health Research in Brazil

UFRJ Labmecs

1st Symposium on Big Data and Public Health - 2013

Linking Health Records for Population Health Research in Brazil.

Cláudia Medina Coeli

Page 2

Record Linkage: the process of identifying and merging records across different databases that correspond to the same entity (for example, the same individual).

This process creates a new database that has more variables than any of the individual databases being linked.

It can also be used to identify records that refer to the same entity within a single database. This is used for deduplication (removing duplicate records or merging them into a combined record).
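A minimal Python sketch of deduplication within a single database, assuming duplicates can already be recognized by an exact key (the hypothetical field `person_id`) and using an assumed merge rule that keeps the first non-empty value of each field. It illustrates "merging duplicates into a combined record"; it is not the method of any particular linkage package.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

def deduplicate(records: List[Record], key: str) -> List[Record]:
    """Merge records that share the same value of `key` into one combined record.

    Assumed merge rule: for each field, keep the first non-empty value seen.
    Real systems also need rules for conflicting non-empty values.
    """
    merged: Dict[Optional[str], Record] = {}
    for rec in records:
        k = rec[key]
        if k not in merged:
            merged[k] = dict(rec)  # first occurrence becomes the combined record
            continue
        combined = merged[k]
        for field, value in rec.items():
            # Fill gaps in the combined record with values from the duplicate
            if combined.get(field) in (None, "") and value not in (None, ""):
                combined[field] = value
    return list(merged.values())

# Hypothetical toy data: two rows for the same person, one missing a phone number
records = [
    {"person_id": "A1", "name": "MARIA SILVA", "phone": None},
    {"person_id": "A1", "name": "MARIA SILVA", "phone": "2199999"},
    {"person_id": "B2", "name": "JOAO SOUZA", "phone": "2188888"},
]
deduped = deduplicate(records, "person_id")
print(len(deduped))         # 2
print(deduped[0]["phone"])  # 2199999
```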

Page 3

Record Linkage: record linkage is relatively easy when a unique identifier, such as a health insurance number, is available in the databases to be linked.

In the absence of a unique identifier, the process relies on non-unique personal identifiers (e.g., name, sex, date of birth, address).

This requires techniques that deal with typographical errors and spelling variations, time-sensitive data (e.g., addresses that change over time), and very large databases.
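The contrast between the two situations can be sketched in Python. The field names (`insurance_no`, `admitted`, `died`) and the records are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for the approximate string comparators used by linkage software (commonly Jaro-Winkler or edit-distance measures).

```python
from difflib import SequenceMatcher

# Hypothetical toy databases; "insurance_no" plays the role of a unique identifier.
hospital = [{"insurance_no": "123", "name": "MARIA DA SILVA", "admitted": "2004-05-10"}]
deaths = [{"insurance_no": "123", "name": "MARIA DA SYLVA", "died": "2004-06-02"}]

# 1) With a unique identifier, linkage reduces to an exact join on that key,
#    and each linked record gains the variables of the other database.
deaths_by_id = {d["insurance_no"]: d for d in deaths}
linked = [
    {**h, "died": deaths_by_id[h["insurance_no"]]["died"]}
    for h in hospital
    if h["insurance_no"] in deaths_by_id
]

# 2) Without it, names must be compared: an exact comparison misses the typo,
#    while an approximate comparison yields a high partial-agreement score.
exact_agree = hospital[0]["name"] == deaths[0]["name"]
similarity = SequenceMatcher(None, hospital[0]["name"], deaths[0]["name"]).ratio()

print(linked[0]["died"])     # 2004-06-02
print(exact_agree)           # False
print(round(similarity, 3))  # 0.929
```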

Page 4

Record Linkage steps:

Christen P, 2012

Data pre-processing: data cleaning; standardization of codes and formats; parsing (name, address).

Indexing (blocking): comparisons are restricted to records that agree on a blocking key (e.g., Soundex code of the first name + sex).

Comparison: approximate comparison functions (partial agreement) produce a vector of numerical similarities.

Classification: rule-based, probabilistic, or machine learning approaches.

Clerical review: manual inspection of uncertain pairs (tedious and labour-intensive).

Evaluation: accuracy studies.
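The steps above can be strung together in a small Python sketch. Every specific here is an assumption for illustration: the record fields (`id`, `name`, `sex`, `dob`), classic English Soundex (tools used in Brazil may rely on phonetic codes adapted to Portuguese), `difflib.SequenceMatcher` as the name comparator, and the two score thresholds. Evaluation is left out.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher

SOUNDEX_CODES = {c: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                          "4": "L", "5": "MN", "6": "R"}.items()
                 for c in letters}

def clean(name: str) -> str:
    """Pre-processing: strip accents, upper-case, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())

def soundex(word: str) -> str:
    """Classic (English) Soundex code, e.g. ROBERT -> R163."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]

def blocking_key(rec: dict) -> str:
    """Indexing: Soundex of the first name + sex."""
    return soundex(rec["name"].split()[0]) + rec["sex"]

def compare(a: dict, b: dict) -> tuple:
    """Comparison: similarity vector (name: partial agreement, dob: exact)."""
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    dob_agree = 1.0 if a["dob"] == b["dob"] else 0.0
    return (name_sim, dob_agree)

def classify(vector: tuple, upper: float = 1.8, lower: float = 1.2) -> str:
    """Classification with hypothetical thresholds; the middle band goes to clerical review."""
    score = sum(vector)
    if score >= upper:
        return "match"
    return "review" if score >= lower else "non-match"

def link(db_a: list, db_b: list) -> list:
    a_clean = [{**r, "name": clean(r["name"])} for r in db_a]
    b_clean = [{**r, "name": clean(r["name"])} for r in db_b]
    blocks = defaultdict(list)
    for rec in b_clean:
        blocks[blocking_key(rec)].append(rec)
    results = []
    for a in a_clean:
        for b in blocks.get(blocking_key(a), []):  # only pairs within the same block
            results.append((a["id"], b["id"], classify(compare(a, b))))
    return results

db_a = [{"id": "A1", "name": "João  da Silva", "sex": "M", "dob": "1970-01-15"},
        {"id": "A2", "name": "Maria Souza", "sex": "F", "dob": "1985-07-30"}]
db_b = [{"id": "B1", "name": "JOAO DA SILVA", "sex": "M", "dob": "1970-01-15"},
        {"id": "B2", "name": "MARIA SOUSA", "sex": "F", "dob": "1985-07-30"},
        {"id": "B3", "name": "MARIO SOUZA", "sex": "M", "dob": "1985-07-30"}]
print(link(db_a, db_b))  # [('A1', 'B1', 'match'), ('A2', 'B2', 'match')]
```

Blocking cuts the work from all 6 possible pairs to 2 comparisons here; the cost is that true matches whose blocking keys disagree (e.g., a typo in the first letter of the name) are never compared.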

Page 5

The Record Linkage Process:

...“For more than a decade, most of the methodological research has been in the computer science literature”...

…“Many applications are still in the epidemiological or health informatics literature with most individuals using government health agency shareware based on the Fellegi-Sunter model”...

William E Winkler, 2012.

Page 6

The record linkage approaches most frequently used in the Brazilian health sector:

Probabilistic (Fellegi-Sunter model): uses approximate comparison functions. Different weights are assigned to each field based on its discriminating power and vulnerability to error. A number of commercial and open source software packages are available.

Deterministic: uses exact comparison functions and a rule-based classification approach. Rules are developed based on expert knowledge. Specific computer routines need to be developed for each problem.
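The two approaches can be contrasted in a short Python sketch. The m- and u-probabilities below are hypothetical values chosen for illustration, not estimates from any Brazilian database, and fields are compared exactly for simplicity. The per-field weight is the standard Fellegi-Sunter log-likelihood ratio: log2(m/u) on agreement and log2((1 - m)/(1 - u)) on disagreement.

```python
import math

# Hypothetical m- and u-probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
M_U = {"name": (0.95, 0.01), "sex": (0.98, 0.50), "dob": (0.90, 0.001)}

def field_weight(field: str, agrees: bool) -> float:
    """Fellegi-Sunter log2 weight: positive on agreement, negative on disagreement."""
    m, u = M_U[field]
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))

def probabilistic_score(a: dict, b: dict) -> float:
    """Total match weight = sum of per-field weights."""
    return sum(field_weight(f, a[f] == b[f]) for f in M_U)

def deterministic_match(a: dict, b: dict) -> bool:
    """Example expert rule: exact agreement on name, sex and date of birth."""
    return all(a[f] == b[f] for f in ("name", "sex", "dob"))

rec_1 = {"name": "MARIA DA SILVA", "sex": "F", "dob": "1985-07-30"}
rec_2 = {"name": "MARIA DA SYLVA", "sex": "F", "dob": "1985-07-30"}  # typo in name

print(round(field_weight("sex", True), 2), round(field_weight("dob", True), 2))  # 0.97 9.81
print(round(probabilistic_score(rec_1, rec_2), 2))  # 6.48
print(deterministic_match(rec_1, rec_2))            # False
```

Agreement on date of birth carries far more weight than agreement on sex, reflecting discriminating power, and the name typo lowers the total score without vetoing the pair, whereas the exact rule rejects it outright. In practice names would be compared approximately and m and u estimated from the data (for example with the EM algorithm).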

Page 7

Classification model: Probabilistic vs. Rule-based

Page 8

Open Source Record Matching Software:

Febrl

LinkPlus

Reclink/OpenRecLink

Page 9

OpenRecLink:

Open source (http://reclink.sourceforge.net/); multi-platform; multiple language support; new database back-end; PostgreSQL integration; new deduplication routine; better performance (Linux Ubuntu 64-bit).

Page 10

Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database*.

Study population: all AIDS cases reported in SINAN with date of diagnosis between 2002 and 2005.

Imperfect gold standard:

Known death: case with a date of death recorded in the surveillance database (N = 19,750).

Known alive: no date of death recorded in the surveillance database and found registered in the laboratory database in 2006 (N = 38,675).

                        Linkage
Gold standard      Dead      Alive      Total
Dead             17,301      2,449     19,750
Alive               155     38,520     38,675

Global sensitivity (Se) = 87.6%

Specificity (Sp) = 99.6%.

*Fonseca et al, CSP 26(7), 2010.
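Both accuracy figures follow directly from the 2x2 table. A short Python check, treating "dead" as the positive class (gold-standard deaths that the linkage classifies as dead are true positives):

```python
def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Se = TP / (TP + FN); Sp = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Counts from the table (rows: gold standard; columns: linkage result).
se, sp = sensitivity_specificity(tp=17301, fn=2449, fp=155, tn=38520)
print(f"Se = {se:.1%}, Sp = {sp:.1%}")  # Se = 87.6%, Sp = 99.6%
```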

Page 11

Results of the Internal Validation Study*

Global sensitivity (Se) = 87.6%; specificity (Sp) = 99.6%.

*Fonseca et al, CSP 26(7), 2010.

Page 12

Impact of linkage errors on risk ratios:

In longitudinal mortality studies, linkage errors introduce outcome misclassification, making risk ratio estimates prone to bias. Risk ratios will not be biased if all three of the following conditions hold:

(1) exposure and outcome misclassification errors are independent; (2) outcome misclassification is non-differential with respect to exposure level; (3) specificity is 100%.
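A small numeric illustration of condition (3), assuming independent, non-differential errors so that the observed risk in each exposure group is Se * r + (1 - Sp) * (1 - r). The true risks (10% vs. 5%, so a true risk ratio of 2) are hypothetical; Se and Sp are borrowed from the validation study only as plausible values.

```python
def observed_risk(true_risk: float, se: float, sp: float) -> float:
    """Risk observed after outcome misclassification with the given Se and Sp."""
    return se * true_risk + (1 - sp) * (1 - true_risk)

def observed_rr(r_exposed: float, r_unexposed: float, se: float, sp: float) -> float:
    # Same Se and Sp in both exposure groups: non-differential misclassification.
    return observed_risk(r_exposed, se, sp) / observed_risk(r_unexposed, se, sp)

true_rr = 0.10 / 0.05
rr_perfect_sp = observed_rr(0.10, 0.05, se=0.876, sp=1.0)
rr_imperfect_sp = observed_rr(0.10, 0.05, se=0.876, sp=0.996)

print(round(true_rr, 3), round(rr_perfect_sp, 3), round(rr_imperfect_sp, 3))  # 2.0 2.0 1.916
```

With perfect specificity, sensitivity scales both risks by the same factor and cancels in the ratio; even 0.4% false-positive deaths pulls the estimate toward the null.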

Page 13

http://www.ihdln.org

Page 14

Laboratório de Métodos Epidemiológicos, Estatísticos e Computacionais em Saúde (LABMECS/IESC/UFRJ) (Laboratory of Epidemiological, Statistical and Computational Methods in Health)

Thank you.

http://www.iesc.ufrj.br/posgrad/posgraduacao/
