![Page 1: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/1.jpg)
Issues in Deterministic and Probabilistic Record Linkage
Scott DuVall
Salt Lake City VHA MC
![Page 2: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/2.jpg)
The Age of Information
![Page 3: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/3.jpg)
information
informatician
information =
![Page 4: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/4.jpg)
Linkage Adds Information
![Page 5: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/5.jpg)
Linkage Corrects Errors
![Page 6: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/6.jpg)
• Missing information affects patient care.1
• Transitions in care cause breakdowns in communication.2
1 Stiell et al. Prevalence of information gaps in the emergency department and the effect on patient outcomes. CMAJ 2003;169(10):1023-8.
2 Coleman et al. Lost in transition: challenges and opportunities for improving the quality of transitional care. Ann Intern Med 2004;141(7):533-6.
![Page 7: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/7.jpg)
• Resolving duplicates can cost $60 per case.1
1 Thornton SN, Hood SK. Reducing Duplicate Patient Creation Using a Probabilistic Matching Algorithm in an Open-access Community Data Sharing Environment. Proc AMIA Symp 2005:1135.
![Page 8: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/8.jpg)
• “between $0.30 and $0.40 of every dollar spent on health care is wasted on overuse, under use, misuse, duplication, system failures, unnecessary repetition, poor communications and inefficiency.”1
1 Reid PP, Compton WD, Grossman JH, Fanjiang G. Building a Better Delivery System: A New Engineering/Health Care Partnership. National Academies Press, 2005:99.
![Page 9: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/9.jpg)
• Record linkage is a key element of health care information exchange and interoperability, which is estimated to be able to reduce costs by $77.8 billion annually.1
1 Walker J, Pan E, Johnston D, Adler-Milstein J, Bates DW, Middleton B. The value of health care information exchange and interoperability. Health Aff (Millwood). 2005 Jan-Jun;Suppl Web Exclusives: W5-10-W5-18.
![Page 10: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/10.jpg)
Record Matching
• Many systems have record matching software.
• Errors still exist
– 50% missed in CDC Survey1
– 5% missed in 1.5 million records = 75,000 records2
1 User Manual for the CDC Deduplication Evaluation Toolkit
2 Snow LA, DuVall SL. Clinical Data Exchange Through A Looking Glass: A Gray-Box Approach To Record Linkage. NLM 2005.
![Page 11: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/11.jpg)
Old Technology
![Page 12: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/12.jpg)
Misunderstood Technology
![Page 13: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/13.jpg)
Misunderstood Technology
![Page 14: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/14.jpg)
Score Is Not Probability
score
probability
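One way to see why a linkage score is not a probability, as a minimal sketch: if the score is read as a sum of base-2 log likelihood ratios (an assumption here, since the slide does not define the score), turning it into a match probability also requires a prior, meaning the share of candidate pairs that are true matches. The function name `match_probability` and the numbers are hypothetical.

```python
def match_probability(score: float, prior: float) -> float:
    """Convert a total log2 likelihood-ratio score into a posterior probability.

    Assumes the score is a sum of log2(M/U)-style weights and that `prior`
    is the fraction of compared pairs that are true matches (0 < prior < 1).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * 2.0 ** score
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # The same score implies very different probabilities under different priors.
    print(match_probability(10.0, 0.5))    # about 0.999
    print(match_probability(10.0, 0.001))  # about 0.51
```

The same score of 10 is close to certain when half of all compared pairs are matches, and close to a coin flip when only one pair in a thousand is.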
![Page 15: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/15.jpg)
Information is not Used
![Page 16: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/16.jpg)
MPI  MPI  MPI  MPI
Name + Date of Birth + Social Security Number
![Page 17: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/17.jpg)
MPI  MPI  MPI  MPI
![Page 18: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/18.jpg)
Deterministic Linkage
1) IF r1.social_security_number = r2.social_security_number
THEN match.
2) IF SoundexCompare(r1.last_name, r2.last_name) AND
SoundexCompare(r1.first_name, r2.first_name) AND
EditDistance(r1.birth_place, r2.birth_place) < 2 AND
r1.birth_date = r2.birth_date AND
r1.multiplicity = r2.multiplicity AND
r1.birth_order = r2.birth_order
THEN match.
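The two rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the presenter's implementation: the `Record` class, the American Soundex variant, the Levenshtein edit distance, and the requirement that the SSN be non-empty in rule 1 are all choices made here.

```python
from dataclasses import dataclass

# Standard American Soundex digit groups.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code ('' for an empty name)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def soundex_compare(a: str, b: str) -> bool:
    return soundex(a) != "" and soundex(a) == soundex(b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Record:  # hypothetical record layout mirroring the slide's field names
    social_security_number: str
    last_name: str
    first_name: str
    birth_place: str
    birth_date: str
    multiplicity: int
    birth_order: int


def deterministic_match(r1: Record, r2: Record) -> bool:
    # Rule 1: identical SSN (non-empty check is an assumption added here).
    if r1.social_security_number and r1.social_security_number == r2.social_security_number:
        return True
    # Rule 2: every demographic comparison must hold.
    return (
        soundex_compare(r1.last_name, r2.last_name)
        and soundex_compare(r1.first_name, r2.first_name)
        and edit_distance(r1.birth_place, r2.birth_place) < 2
        and r1.birth_date == r2.birth_date
        and r1.multiplicity == r2.multiplicity
        and r1.birth_order == r2.birth_order
    )
```

Because rule 2 is a strict conjunction, a single disagreeing field (for example a mistyped birth date) is enough to reject an otherwise plausible pair.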
![Page 19: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/19.jpg)
IF contains(0..9)
THEN NUMBER
IF contains(North, South, East, West)
THEN DIRECTION
IF contains(Street, Road, Lane, Drive, ...)
THEN STREET_TYPE
ELSE STREET_NAME
IF (NUMBER = NUMBER) AND (DIRECTION = DIRECTION) AND (STREET_NAME = STREET_NAME) AND (STREET_TYPE = STREET_TYPE)
THEN MATCH
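A minimal Python sketch of the address rules above, assuming whitespace-separated tokens, case-insensitive comparison, and that any token not recognized as a number, direction, or street type becomes part of the street name. The names `parse_address` and `addresses_match` and the short keyword lists are hypothetical.

```python
DIRECTIONS = {"north", "south", "east", "west"}
STREET_TYPES = {"street", "road", "lane", "drive"}  # the slide's list is longer


def parse_address(address: str) -> dict:
    """Label each token as NUMBER, DIRECTION, STREET_TYPE, or STREET_NAME."""
    parsed = {"NUMBER": None, "DIRECTION": None, "STREET_TYPE": None, "STREET_NAME": None}
    name_tokens = []
    for token in address.replace(",", " ").split():
        word = token.lower().strip(".")
        if parsed["NUMBER"] is None and any(ch.isdigit() for ch in word):
            parsed["NUMBER"] = word
        elif parsed["DIRECTION"] is None and word in DIRECTIONS:
            parsed["DIRECTION"] = word
        elif parsed["STREET_TYPE"] is None and word in STREET_TYPES:
            parsed["STREET_TYPE"] = word
        else:
            name_tokens.append(word)
    if name_tokens:
        parsed["STREET_NAME"] = " ".join(name_tokens)
    return parsed


def addresses_match(a: str, b: str) -> bool:
    """Deterministic rule: all four parsed components must be equal."""
    return parse_address(a) == parse_address(b)


if __name__ == "__main__":
    print(addresses_match("123 North Main Street", "123 north MAIN street"))
    print(addresses_match("123 North Main Street", "123 N Main St"))
```

The second comparison fails because the abbreviations "N" and "St" are not in the keyword lists, which shows how brittle exact rule-based matching is without standardization.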
![Page 20: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/20.jpg)
Probabilistic Linkage
Each field given AGREEMENT and DISAGREEMENT weight
Weight proportional to the field’s DISCRIMINATION and RELIABILITY
Many more parameters, possibility of better matching
![Page 21: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/21.jpg)
Record Matching
Understand your Data +
Understand Mistakes in your Data
Good Strategy for Linkage
MANUAL REVIEW
![Page 22: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/22.jpg)
Understanding the Data
• Compare characteristics of records in the duplicate subset with records in the full enterprise data warehouse
• Describe instances where records in the duplicate subset are not typical of the database at large
• Provide considerations for others looking at duplicate records in master patient indexes
![Page 23: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/23.jpg)
| | UUHSC | Friedman |
|---|---|---|
| Extra names and titles | 34.3% | 36.9% |
| Nicknames, spelling variations | 21.8% | 13.9% |
| One letter substitutions | 13.6% | 13.7% |
| One letter added or deleted | 7.6% | 12.9% |
| Punctuation or spaces | 1.9% | 11.8% |
| Different last names for females | 12.9% | 7.8% |
| Permuted parts of names | 3.2% | 1.4% |
| Different first names | 2.8% | 1.4% |
| One letter transposed | 1.9% | 0.8% |
![Page 24: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/24.jpg)
| | UUHSC | Grannis |
|---|---|---|
| Missing SSN | 52.4% | 35% |
| Typographical errors | 62.7% | 35.5% |
| Spouse (family) collisions | 14.8% | 47.5% |
| Unexplained collisions | 9.9% | 17% |
| All collisions | 24.7% | 64.5% |
| Invalid SSN | 12.6% | 0% |
![Page 25: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/25.jpg)
Extension of the Probabilistic Model for Approximate Field Comparators
![Page 26: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/26.jpg)
Probabilistic Model
Field in Record A = Field in Record B: Agreement Weight
Field in Record A ≠ Field in Record B: Disagreement Weight
![Page 27: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/27.jpg)
M – probability that field matches in dup pair
U – probability that field matches in non-dup pair
![Page 28: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/28.jpg)
Agreement Weight: LOG(M/U)
Disagreement Weight: LOG((1-M)/(1-U))
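As a worked illustration of the two weights, LOG(M/U) for agreement and LOG((1-M)/(1-U)) for disagreement, the sketch below computes them and sums them into a pair score. The base-2 logarithm, the function names, and the M and U values are assumptions chosen for the example, not figures from the talk.

```python
import math


def field_weights(m: float, u: float) -> tuple:
    """Return (agreement_weight, disagreement_weight) for one field.

    m: probability the field agrees in a duplicate pair
    u: probability the field agrees in a non-duplicate pair
    Base-2 logs are an assumption; the slide only says LOG.
    """
    agreement = math.log2(m / u)
    disagreement = math.log2((1.0 - m) / (1.0 - u))
    return agreement, disagreement


def pair_score(agreements: dict, parameters: dict) -> float:
    """Sum agreement or disagreement weights over all compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*parameters[field])
        total += agree_w if agrees else disagree_w
    return total


if __name__ == "__main__":
    # Hypothetical (M, U) values for illustration only.
    parameters = {"last_name": (0.95, 0.01), "birth_date": (0.97, 0.003), "sex": (0.99, 0.5)}
    print(field_weights(0.9, 0.1))  # roughly (3.17, -3.17)
    print(pair_score({"last_name": True, "birth_date": True, "sex": False}, parameters))
```

A field such as sex agrees often by chance (high U), so its agreement adds little, while its disagreement in a reliable field (high M) subtracts a lot.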
![Page 29: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/29.jpg)
Field in Record A ≈ Field in Record B?
![Page 30: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/30.jpg)
Approximate Comparator
Edit Distance
ED( Johnathan, Jonathan ) = 1
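The edit distance in this example can be computed with the standard Levenshtein dynamic program, sketched below. Treating insertions, deletions, and substitutions as cost 1 each is an assumption; the slide does not say which edit-distance variant was used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or keep if equal
                )
            )
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("Johnathan", "Jonathan"))  # 1: delete the first "h"
```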
![Page 31: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/31.jpg)
![Page 32: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/32.jpg)
![Page 33: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/33.jpg)
Approximate Comparator Weight
LOG(Mδ/Uδ)
![Page 34: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/34.jpg)
Mδ – probability that field approximately matches by δ in dup pair
Uδ – probability that field approximately matches by δ in non-dup pair
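A minimal sketch of how Mδ and Uδ might be estimated from labeled pairs and turned into one weight per distance δ. The function name, the cap `max_delta` that pools all larger distances, the additive smoothing that avoids zero probabilities, and the base-2 log are all assumptions made for this illustration.

```python
import math
from collections import Counter


def approximate_weights(dup_distances, nondup_distances, max_delta=3, smoothing=0.5):
    """Return a list w where w[d] = log2(M_d / U_d) for d = 0..max_delta.

    dup_distances:    edit distances observed for one field in duplicate pairs
    nondup_distances: edit distances observed for the same field in non-duplicate pairs
    Distances above max_delta are pooled into the last bucket.
    """

    def distribution(distances):
        counts = Counter(min(d, max_delta) for d in distances)
        total = len(distances) + smoothing * (max_delta + 1)
        return [(counts[d] + smoothing) / total for d in range(max_delta + 1)]

    m = distribution(dup_distances)
    u = distribution(nondup_distances)
    return [math.log2(m_d / u_d) for m_d, u_d in zip(m, u)]


if __name__ == "__main__":
    # Hypothetical distances for a name field.
    dups = [0, 0, 0, 1, 1, 2]
    nondups = [3, 3, 3, 2, 3, 3, 1, 3]
    for delta, weight in enumerate(approximate_weights(dups, nondups)):
        print(delta, round(weight, 2))
```

With data like this, exact agreement (δ = 0) earns the largest weight, near misses earn smaller positive weights, and large distances earn a negative weight, replacing the single agree/disagree split.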
![Page 35: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/35.jpg)
Dups Non-Dups
Load and randomize training set
Classify with estimated parameters
Estimate Dups and Non-Dups
Update Parameters
Initial Parameters
![Page 36: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/36.jpg)
Dups Non-Dups
Load and randomize training set
Classify with updated parameters
Re-estimate Dups and Non-Dups
Update Parameters
Updated Parameters
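The classify, estimate, and update cycle on these two slides can be sketched as a simple iterative loop. This is an assumed simplification, not the presenter's algorithm: it uses exact agree/disagree vectors, hard classification at a score threshold of 0, add-one smoothing, and a fixed iteration cap. All function names are hypothetical.

```python
import math
import random


def score(vector, m, u):
    """Sum of log2 agreement/disagreement weights for one comparison vector."""
    total = 0.0
    for agrees, m_i, u_i in zip(vector, m, u):
        total += math.log2(m_i / u_i) if agrees else math.log2((1 - m_i) / (1 - u_i))
    return total


def update_parameters(vectors, labels, n_fields, smoothing=1.0):
    """Re-estimate M and U per field from the current dup / non-dup split."""
    dups = [v for v, is_dup in zip(vectors, labels) if is_dup]
    nondups = [v for v, is_dup in zip(vectors, labels) if not is_dup]
    m = [(sum(v[i] for v in dups) + smoothing) / (len(dups) + 2 * smoothing)
         for i in range(n_fields)]
    u = [(sum(v[i] for v in nondups) + smoothing) / (len(nondups) + 2 * smoothing)
         for i in range(n_fields)]
    return m, u


def train(vectors, m, u, threshold=0.0, max_iterations=20, seed=0):
    """Iterate classify -> estimate -> update until the labels stop changing."""
    vectors = list(vectors)
    random.Random(seed).shuffle(vectors)  # "load and randomize training set"
    labels = None
    for _ in range(max_iterations):
        new_labels = [score(v, m, u) > threshold for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        m, u = update_parameters(vectors, labels, len(m))
    return m, u


if __name__ == "__main__":
    # Hypothetical comparison vectors: 1 = field agrees, 0 = field disagrees.
    training = [(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(0, 0, 0)] * 60 + [(0, 1, 0)] * 10
    m, u = train(training, m=[0.9, 0.9, 0.9], u=[0.1, 0.1, 0.1])
    validation = [(1, 1, 1), (0, 1, 0)]
    print([score(v, m, u) > 0.0 for v in validation])
```

The last two lines mirror the validation step: a held-out set is classified with the parameters learned on the training set, without further updates.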
![Page 37: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/37.jpg)
Dups Non-Dups
Load and randomize validation set
Classify with training set parameters
Classified Dups and Non-Dups
Training Set Parameters
![Page 38: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/38.jpg)
![Page 39: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/39.jpg)
![Page 40: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/40.jpg)
![Page 41: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/41.jpg)
![Page 42: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/42.jpg)
![Page 43: Issues in Deterministic and Probabilistic Record Linkage Scott DuVall Salt Lake City VHA MC](https://reader035.vdocuments.mx/reader035/viewer/2022062722/56649f325503460f94c4ef94/html5/thumbnails/43.jpg)
Questions?