Enhancing the EAD: Encoded Archival Description of Sensitive Medical Data

Catherine Arnott Smith, PhD
Assistant Professor, School of Library and Information Studies
University of Wisconsin-Madison
Room 4217 Helen C. White Hall, 600 N. Park Street, Madison, WI 53706

Nancy McCall
Chesney Medical Archives, Johns Hopkins Medical Institutions
5801 Smith Avenue, Suite 235, Baltimore, MD 21209

Abstract

Electronic health records (EHRs) will present a significant challenge to archivists and researchers. The EHR is a “container for a set of transactions”: both persistent (historical data pertaining to one patient, with long-term value) and event (EKG tracings of that one patient on one morning in the clinic, with short-term value) (Bird, Goodchild, & Beale, 2000). The central role of EHRs in the electronic healthcare environment is a major theme of the proposed National Health Information Infrastructure (http://aspe.hhs.gov/sp/nhii/). However, the development and maintenance of EHRs, and, logically, the preservation of EHRs for access by future generations, now take place in a climate of extreme sensitivity toward patient privacy. HIPAA (the Health Insurance Portability and Accountability Act of 1996; Public Law 104-191) imposes severe financial penalties for unauthorized release of protected health information (PHI). An XML standard called the Clinical Document Architecture (CDA) has been developed and is now being refined to enable secure maintenance, storage, and transmission of semistructured clinical documents. This poster reports on the progress of a study that marries the CDA to the Encoded Archival Description (EAD) standard. The aim of this research is to ensure not only that the requirements of health information policy are met, but also that those requirements can be used to guide and inform the researchers of the future.
The CDA and the EAD

HIPAA (the Health Insurance Portability and Accountability Act) defines uses and disclosures of protected health information that must be authorized by the patient. This requires considerable changes in health information management and in research practice alike. For example, for health information to be considered “deidentified” and thus accessible to researchers, 18 specific data elements must be redacted from the record, including names; elements of dates, except years, directly related to an individual; telephone and fax numbers; geographic subdivisions; electronic mail addresses; Social Security numbers; and “any other unique identifying number, characteristic or code”. The tension between preserving health records and protecting patient privacy has ramifications for the archiving and preservation of medical records, now and in the future. For example, the difficulty of locating and identifying these data elements in highly unstructured narrative text, such as letters from physicians, poses logistical problems for archivists who are custodians of medical records, as does the requirement to redact geographic subdivisions. The Clinical Document Architecture (CDA) is an emerging national standard for clinical data modeling that is receiving international recognition. CDA Release 2.0 received ANSI approval in May 2005 (Health Level Seven, 2005). The intent of the CDA is to support reuse, exchange, and longevity of documents in a system-independent manner, raising the possibility of a true lifetime health record (Health Level Seven, 2000).

The CDA can be viewed as a superset of XML Document Type Definitions, hierarchically organized to prescribe the semantics and constraints of different types of clinical documents. These constraints make clinical documents shareable. The intent of the CDA in health informatics is similar to that of the Encoded Archival Description (EAD) in archives: both architectures attempt to impose order on semistructured text documents by standardizing frequently occurring segments within those documents. The EAD was developed in 1993 as an encoding standard for machine-readable, sharable finding aids created by libraries, museums, archives, and manuscript repositories. The CDA, similarly, standardizes templates for radiology reports, laboratory test results, history and physical notes, discharge summaries, operating room notes, and hundreds of other common healthcare documents. Both the CDA and the EAD are communication standards that specify structure but do not attempt to define the semantics of the content being structured. Examples of CDA and EAD markup appear in Table 1.
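The difficulty of locating identifiers in narrative text can be illustrated with a minimal Python sketch. It assumes a few hypothetical regular expressions for four of the 18 HIPAA identifier categories (telephone numbers, Social Security numbers, email addresses, and full numeric dates) and only flags candidate spans for human review. It is not a deidentification method: it cannot detect names or the catch-all “any other unique identifying number, characteristic or code”, which is exactly why unstructured text is hard for archivists.

```python
import re

# Hypothetical, deliberately incomplete patterns for four of the 18 HIPAA
# identifier categories. They flag candidates for review; they do not deidentify.
PATTERNS = {
    "telephone/fax number": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "social security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date (more specific than year)": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def find_candidate_identifiers(text):
    """Return (category, matched_text, start_offset) tuples sorted by offset."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((category, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Call (608) 555-0100 or email jdoe@example.org; seen 03/14/2005; SSN 123-45-6789."
for category, found, offset in find_candidate_identifiers(sample):
    print(offset, category, found)
```

Pattern matching of this kind only narrows the search. Names, places, and free-form dates slip through, so a person still has to read the document.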

Table 1. Examples of CDA and EAD Markup

CDA-structured Consultation Note [extract] (Health Level Seven, 2004):

<ClinicalDocument>
  ... CDA Header ...
  <StructuredBody>
    <section>
      <code code="10157-2" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" />
      <title>Family history</title>
      <text>
        <list>
          <item>Father had fatal MI in his early 50's.</item>
          <item>No cancer or diabetes.</item>
        </list>
      </text>
    </section>
  </StructuredBody>
</ClinicalDocument>

EAD 2002-structured Finding Aid [extract] (Society of American Archivists, 2002):

<descgrp>
  <head>Important Information for Users of the Collection</head>
  <accessrestrict>
    <head>Access</head>
    <p>Collection is open for research.</p>
  </accessrestrict>
  <userestrict>
    <head>Publication Rights</head>
    <p>Property rights reside with the University of California. Literary rights are retained by the creators of the records and their heirs. For permissions to reproduce or to publish, please contact the Head of Special Collections and Archives.</p>
  </userestrict>
  <prefercite>
    <head>Preferred Citation</head>
    <p>Mildred Davenport Dance Programs and Dance School Materials. MS-P29. Special Collections and Archives, The UC Irvine Libraries, Irvine, California.</p>
  </prefercite>
</descgrp>
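Because the CDA labels each section with a coded header, software can address a segment of a clinical document without reading its prose. The sketch below, assuming Python's standard xml.etree.ElementTree module, parses a well-formed version of the Table 1 consultation-note extract and reports each section's LOINC code, title, and list items. The extract's header is omitted, and a real CDA Release 2.0 document places its elements in the urn:hl7-org:v3 namespace, so lookups against real documents would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# Well-formed rendering of the Table 1 extract. Assumption: header omitted and
# the HL7 v3 namespace left out for brevity.
CDA_EXTRACT = """
<ClinicalDocument>
  <StructuredBody>
    <section>
      <code code="10157-2" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"/>
      <title>Family history</title>
      <text>
        <list>
          <item>Father had fatal MI in his early 50's.</item>
          <item>No cancer or diabetes.</item>
        </list>
      </text>
    </section>
  </StructuredBody>
</ClinicalDocument>
"""


def list_sections(xml_text):
    """Return (LOINC code, title, [item texts]) for every <section> element."""
    root = ET.fromstring(xml_text.strip())
    sections = []
    for section in root.iter("section"):
        code = section.find("code").get("code")
        title = section.findtext("title")
        items = [item.text for item in section.iter("item")]
        sections.append((code, title, items))
    return sections


for code, title, items in list_sections(CDA_EXTRACT):
    print(code, title, items)
```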

Research methodology

Overview

This study uses the CDA standard as a translating mechanism to facilitate identification of PHI in electronic health records. The researchers have developed a HIPAA-aware EAD finding aid for use with electronic health record collections. This finding aid will tell future users of such collections which HIPAA-regulated data elements are present and where in each document they reside.
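One way to picture what such a finding aid records is a note generated from a segment-by-segment tally of restricted identifiers. The following sketch is hypothetical: the input mapping (document segment to the HIPAA identifier categories found there) and the wording of the note are illustrative assumptions, not the project's actual template. It builds an EAD 2002 accessrestrict element, the same element used in the Table 1 finding aid extract, with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def hipaa_accessrestrict(segment_flags):
    """Build an EAD <accessrestrict> note from a hypothetical mapping of
    document segment name -> list of HIPAA identifier categories found there."""
    note = ET.Element("accessrestrict")
    ET.SubElement(note, "head").text = "HIPAA-restricted content"
    # Keep only segments in which at least one restricted category was found.
    restricted = {seg: cats for seg, cats in segment_flags.items() if cats}
    if not restricted:
        ET.SubElement(note, "p").text = "No HIPAA-restricted data elements identified."
        return note
    for segment, categories in restricted.items():
        ET.SubElement(note, "p").text = f"{segment}: {', '.join(sorted(categories))}"
    return note


flags = {"Description of operation": [], "Signature block": ["names", "dates"]}
print(ET.tostring(hipaa_accessrestrict(flags), encoding="unicode"))
```

Segments with no restricted content are left out of the note, so a researcher sees only where HIPAA restrictions apply.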

Phase I: Development of a HIPAA-aware Finding Aid

The researchers analyzed the content and structure of 200 deidentified contemporary clinical documents from a major medical center in order to determine the distribution and frequency of the 18 data elements that HIPAA specifies must be removed for deidentification to take place. The 200 documents represent 8 document types commonly found in a pilot study of the electronic health records of cancer patients: Radiology Reports, Progress Notes, Letters, Operative Notes, History/Physical Reports, Surgical Pathology Reports, Discharge Summaries, and Emergency Room Visits. A finding aid for this simulated EHR collection was developed using the EAD 2002 structure and informed by the CDA Release 2.0 specification for individual data elements. This finding aid attempts to enhance and augment the EAD by incorporating a description of the specific sensitive document elements and segments that are known to be present in these generic clinical document types.

The HIPAA-aware EAD finding aid developed in Phase I was then tested and refined for goodness of fit with representative documents of different types found in the historical collections at the Alan Mason Chesney Medical Archives, Johns Hopkins Medical Institutions. This work can further archivists' understanding of (1) the location and frequency of potentially HIPAA-sensitive data in historical documents, and (2) the types of historical medical records that are potentially sensitive. One outcome of this project is to make it easier to prepare and maintain Web-based finding aids for collections that now exist in print but face a digital preservation future.

Impact

The ultimate product of this research project is a twin set of finding aid templates, EAD-compliant and HIPAA-aware, for use with clinical documents that are electronic and contemporary, and with those that are paper-based and historical. The project can make many potential contributions to the management and preservation of electronic records.
The first is in the application of an archival perspective to the electronic health record. Work on EHRs in the United States has been strongly allied to the relational database model, unlike work in Europe and Australia, which has taken a document-centric perspective (for an overview, see the description of the GEHR projects; Deep Thought Informatics, n.d.). Since as much as 90% of medical records in the US exist as paper documents and not as databases, an archival approach may be what is needed to further our nation's progress toward a health information infrastructure. Second, and in turn, the researchers hope that introducing the perspective and philosophy of the Clinical Document Architecture to the archives community will help archivists struggling with the problem of medical records in various formats. Third, if the information content and structure of the EHR are better understood, that content will be more available to the researchers of the future. Health information is a more highly regulated commodity than ever, and health information policy must be integrated into archival practice. An understanding of how much EHR content is and is not accessible to future researchers in a HIPAA climate will enable researchers and archivists alike to explore the riches of the clinical data in contemporary and historical records while continuing to safeguard the privacy of the public. Comparing the clinical content of electronic health records with that of their print precursors will inform our understanding of both media. Fourth, knowing what future users of health records, both historical and contemporary, can and cannot legally see will allow archivists to educate these users and to design finding aids that meet the needs of researchers more effectively.
Fifth, an understanding of which segments of which clinical documents contain potentially sensitive information should inform the development of finding aids for collections not yet created.

Sixth, and finally, this research is a test of the EAD. Medical records are an idiosyncratic record type, and an experiment that tests the EAD's ability to address their idiosyncrasies should tell us more about the strengths and weaknesses of this archival standard.

The HIPAA-Aware EAD Finding Aid: Electronic and Print Examples

Description Standards Used in Creating these Examples

All quoted text is from Procter, M., & Cook, M. Manual of Archival Description (3rd ed.). Burlington, VT: Gower.

Granularity of description (Electronic Record and Print Record)

Most granular: Level 5 (analogous to piece)
  Problem features; Predominant language; Script; Special features; Secondary characteristics
  (see Content and Structure: Level B)

Level 4 (analogous to item)
  Date; Site, locality, place; Personal/Corporate names; Events/Activities; Subject keywords
  (see Content and Structure: Level A)

EXAMPLE: Operating Room Note

Finding Aid for Electronic Record (Level 5 description)

Most granular: Level 5 (analogous to piece)
  Record display in finding aid: See below.

Level 4 (analogous to item)
  Record display in finding aid: Number of restricted HIPAA elements only.
  Example of description: “Operating Room Note files, range YYMMDD through YYMMDD. 5 of 10 fields have been identified as containing data elements restricted for release under HIPAA.”

Least granular: Level 3 (analogous to series)
  Record display in finding aid: Indication that restricted HIPAA elements are present in record.
  Example of description: “Electronic Medical Records of the University of Pittsburgh Cancer Institute, YYMMDD through YYMMDD. Contains data restricted for release under HIPAA.”
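The Level 4 and Level 3 descriptions above follow a fixed pattern, so they could be generated from field-level review results. The sketch below assumes a hypothetical list of per-field flags (True where a field holds a HIPAA-restricted element) produced by an archivist's review; the function names are illustrative, not part of any standard.

```python
def level4_description(doc_type, start, end, field_flags):
    """Item-level note: report only how many fields hold restricted elements.
    field_flags is a hypothetical list of booleans, one per field."""
    restricted = sum(field_flags)  # True counts as 1, False as 0
    return (f"{doc_type} files, range {start} through {end}. "
            f"{restricted} of {len(field_flags)} fields have been identified as "
            "containing data elements restricted for release under HIPAA.")


def level3_description(collection, start, end, any_restricted):
    """Series-level note: only indicate whether restricted elements are present."""
    text = f"{collection}, {start} through {end}."
    if any_restricted:
        text += " Contains data restricted for release under HIPAA."
    return text


flags = [True, False] * 5  # 5 of 10 fields restricted, as in the example above
print(level4_description("Operating Room Note", "YYMMDD", "YYMMDD", flags))
```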

Here is how an Operating Room Note would appear in a finding aid for an electronic medical record collection. The entry below is copied from the finding aid for the J.A. Benjamin Papers (Online Archive of California; Louise M. Darling Biomedical Library, UCLA), which describes the same essential type of document: a description of operations dictated by the surgeon, Dr. Benjamin. The text that follows the entry was added by the authors and derives from an actual example of an Operating Room Note. Text boxed out in gray (shown here as blanks) indicates text restricted under HIPAA.

Finding Aid for Operating Room Note: Electronic Record

[ Box 1 ] [ Folder 17 ] Description of operations 1941 [unittitle added: see error report] notes on two procedures on a named patient, dictated by JAB, surgeon, at White Memorial Hospital, Los Angeles [RESTRICTED]

The contents of the Operating Room Notes are as follows:

<Procedures>PROCEDURES:
<OperationTitle>TITLE OF OPERATION: REMOVAL OF SUBCUTANEOUS PORT AND CENTRAL VENOUS CATHETER.
<Anesthesia>ANESTHESIA: LIDOCAINE 1% WITH INTRAVENOUS SEDATION.
<PreopDiagnosis>PREOPERATIVE DIAGNOSIS(ES): BREAST CANCER.
<PostopDiagnosis>POSTOPERATIVE DIAGNOSIS(ES): BREAST CANCER.
<Indications>INDICATIONS: The patient is a ____________-year-old woman with metastatic breast cancer. She is currently being treated with Tamoxifen. She has no more use for long-term venous access device. Its removal was advised.
<Description>DESCRIPTION OF OPERATION: The patient was placed supine on the operating table. The right chest was prepped and then draped in a sterile fashion. A transverse incision was made over the most prominent portion of the port. The port was easily delivered through the incision. The stay sutures were cut. The catheter and port were easily removed intact. The subcutaneous tunnel was obliterated with a figure-of-eight Vicryl stitch. At this point, the patient brought her left arm up into the operating field to scratch her nose. It was quickly removed. New towels were placed, and the wound was irrigated with antibiotic-containing saline solution. The wound was closed with a running subcuticular Dexon stitch. Steri-Strips were applied over Benzoin. A dry sterile dressing was applied. The patient tolerated the procedure well. She was transferred to the post anesthesia care unit awake and in stable condition. I was the only one scrubbed for this procedure.
<SignatureBlock>____________ Dictator: ____________ ________________________ ____________ Job #: 01109
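Because each segment of the transcription opens with a tag, the note can be split into labeled segments mechanically, a first step toward recording which segments hold restricted data. A minimal Python sketch, assuming the tags are simple opening markers of the form <Name> with no closing tags, as in the transcription; the function name split_segments is hypothetical:

```python
import re

# Assumption: segments are introduced by bare opening tags such as <Anesthesia>.
TAG = re.compile(r"<(\w+)>")


def split_segments(tagged_text):
    """Split '<Tag>content <Tag2>content ...' into (tag, content) pairs.
    Text before the first tag is ignored, and empty segments are dropped."""
    # With one capture group, re.split yields [prefix, tag1, text1, tag2, text2, ...].
    parts = TAG.split(tagged_text)
    pairs = []
    for tag, content in zip(parts[1::2], parts[2::2]):
        content = content.strip()
        if content:
            pairs.append((tag, content))
    return pairs


note = "<Anesthesia>ANESTHESIA: LIDOCAINE 1%. <PreopDiagnosis>PREOPERATIVE DIAGNOSIS(ES): BREAST CANCER."
for tag, content in split_segments(note):
    print(tag, "->", content)
```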

Finding Aid for Print Record (Level 5 description)

Most granular: Level 5 (analogous to piece)
  Display in finding aid: See below.

Level 4 (analogous to item)
  Display in finding aid: Date; Site, locality, place; Personal/Corporate names; Events/Activities; Subject keywords.
  Example of description: “Operating Room Note files, range YYMMDD through YYMMDD. 5 of 10 fields have been identified as containing data elements restricted for release under HIPAA.” Keywords: Breast neoplasm; Removal of intravenous port.

Least granular: Level 3 (analogous to series)
  Display in finding aid: Status quo.
  Example of description: “Electronic Medical Records of the University of Pittsburgh Cancer Institute, YYMMDD through YYMMDD. Contains data restricted for release under HIPAA.”

Here is how the same Operating Room Note would appear in a finding aid for a print medical record collection. The entry below is copied from the same finding aid described above and represents the same document. Text deriving from an actual contemporary example of an Operating Room Note has been framed in a box. Whereas the electronic record, with its more granular content description, necessarily contains text that must be deidentified before release under HIPAA, the print record equivalent can be described more generally in terms of its restricted content. In this example, the number of data elements potentially restricted by HIPAA is indicated. In the end, the ability of the HIPAA-Aware EAD Finding Aid to identify and structure non-restricted content (in this example, the surgical procedure, type of anesthesia, diagnoses, and description of the operation) should help archivists and researchers alike understand the collections with which they work.

Finding Aid for Operating Room Note: Print Record

[ Box 1 ] [ Folder 17 ] Description of operations 1941 [unittitle added: see error report] notes on two procedures on a named patient, dictated by JAB, surgeon, at White Memorial Hospital, Los Angeles [RESTRICTED]

Contents include a description of the following:

PROCEDURES:
TITLE OF OPERATION: REMOVAL OF SUBCUTANEOUS PORT AND CENTRAL VENOUS CATHETER.
ANESTHESIA: LIDOCAINE 1% WITH INTRAVENOUS SEDATION.
PREOPERATIVE DIAGNOSIS(ES): BREAST CANCER.
POSTOPERATIVE DIAGNOSIS(ES): BREAST CANCER.
INDICATIONS: [One data element restricted for release under HIPAA]
DESCRIPTION OF OPERATION: The patient was placed supine on the operating table. The right chest was prepped and then draped in a sterile fashion. A transverse incision was made over the most prominent portion of the port. The port was easily delivered through the incision. The stay sutures were cut. The catheter and port were easily removed intact. The subcutaneous tunnel was obliterated with a figure-of-eight Vicryl stitch. At this point, the patient brought her left arm up into the operating field to scratch her nose. It was quickly removed. New towels were placed, and the wound was irrigated with antibiotic-containing saline solution. The wound was closed with a running subcuticular Dexon stitch. Steri-Strips were applied over Benzoin. A dry sterile dressing was applied. The patient tolerated the procedure well. She was transferred to the post anesthesia care unit awake and in stable condition. I was the only one scrubbed for this procedure.
SIGNATURE: [One data element restricted for release under HIPAA]
DICTATOR: [One data element restricted for release under HIPAA]
TRANSCRIBER: [One data element restricted for release under HIPAA]
DATE/TIME STAMP: [One data element restricted for release under HIPAA]
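The print-record display replaces each restricted field with a fixed placeholder while leaving the clinical content readable. A minimal sketch of that substitution, assuming a hypothetical list of (label, value) pairs in document order and a set of labels judged restricted during review; the placeholder text is the one used in the example:

```python
PLACEHOLDER = "[One data element restricted for release under HIPAA]"


def print_display(fields, restricted):
    """Render fields for a print-record finding aid.
    fields: list of (label, value) pairs in document order (hypothetical input).
    restricted: set of labels whose whole value must be withheld."""
    lines = []
    for label, value in fields:
        shown = PLACEHOLDER if label in restricted else value
        lines.append(f"{label}: {shown}")
    return "\n".join(lines)


fields = [
    ("ANESTHESIA", "LIDOCAINE 1% WITH INTRAVENOUS SEDATION."),
    ("DICTATOR", "(name)"),  # hypothetical value
]
print(print_display(fields, {"DICTATOR"}))
```

Masking whole fields, as in the INDICATIONS entry, is coarser than blanking the single restricted word in the electronic record, but it is simpler to apply consistently across a collection.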

References

Bird, L. J., Goodchild, A., & Beale, T. (2000). Integrating health care information using XML-based metadata [Unpublished manuscript]. Available online: http://citeseer.nj.nec.com/bird00integrating.html (Date accessed: Jan. 31, 2005).

Deep Thought Informatics. (n.d.). Case study: GEHR Australia. Available online: http://www.deepthought.com.au/it/archetypes/output/gehr_example.html (Date accessed: May 30, 2005).

Health Level Seven. (2005, May 5). HL7 receives ANSI approval of three Version 3 specifications including CDA, Release 2 [Press release]. Available online: http://www.hl7.org/Press/20050505.pdf (Date accessed: May 30, 2005).

Health Level Seven. (2004, August). HL7 Clinical Document Architecture, Release 2.0. Available online: http://hl7.org/library/Committees/structure/CDA.ReleaseTwo.CommitteeBallot03.Aug.2004.zip

Health Level Seven. (2000). HL7 to release first XML-based standard for healthcare [Press release]. Ann Arbor, MI: Health Level Seven.

Society of American Archivists. (2002). Appendix C. Example 1: Guide to the Mildred Davenport Dance Programs and Dance School Materials. Available online: http://www.loc.gov/ead/tglib/appendix_c.html (Date accessed: May 30, 2005).