2013-03-03 etl specifications v4.0 saftinet
8/19/2019 2013-03-03 ETL Specifications v4.0 SAFTINet
SAFTINet ETL Specifications Document Page 1
Scalable Architecture for Federated Therapeutic Inquiries Network (SAFTINet)
ETL Specifications Document
Version 4.0
March 3rd, 2013
LICENSE
© 2011 Foundation for the National Institutes of Health (FNIH).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this document except in
compliance with the License. You may obtain a copy of the License at http://omop.fnih.org/publiclicense .
Unless required by applicable law or agreed to in writing, documentation and software distributed under
the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. Any redistributions of this work or any derivative work or modification based on
this work should be accompanied by the following source attribution: "This work is based on work by the
Observational Medical Outcomes Partnership (OMOP) and used under license from the FNIH at
http://omop.fnih.org/publiclicense."
Any scientific publication that is based on this work should include a reference to
http://omop.fnih.org .
This document was created specifically for the Scalable Architecture for Federated Therapeutic Inquiries
Network (SAFTINet) project, in collaboration with OMOP. It reflects changes made to the OMOP CDMv2 to
create OMOP CDMv3, which were done in collaboration with FNIH OMOP and the SCANNER (Scalable
National Network for Effectiveness Research) project (http://scanner.ucsd.edu/).
SAFTINet is supported by grant number R01HS019908 from the Agency for Healthcare Research and Quality.
TABLE OF CONTENTS
1.0 Introduction
2.0 Definition of terms
3.0 Assumptions
4.0 Source Data Mapping Approach
4.1 Change to Existing Tables
4.2 Table Name: ORGANIZATION
4.3 Table Name: CARE_SITE
4.4 Table Name: PROVIDER
4.5 Table Name: X_DEMOGRAPHIC
4.6 Table Name: VISIT_OCCURRENCE
4.7 Table Name: DRUG_OCCURRENCE
4.8 Table Name: CONDITION_OCCURRENCE
4.9 Table Name: PROCEDURE_OCCURRENCE
4.10 Table Name: OBSERVATION
5.0 Appendix A: Table Specific Rules
6.0 Appendix B: Row Filters
7.0 Appendix C: Sending data using flatfiles
Change Record
Date | Author | Version | Change Reference
02-Nov-2009 | Vicki Fan, Mark Khayter | 1.0 | Original OMOP ETL Template Document
04-Oct-2011 | Patrick Hosokawa | 2.0 | Document adapted to SAFTINet ETL data model, flowcharts added to detail data flow from ETL model to grid model
20-Dec-2011 | Patrick Hosokawa | 2.1 | Document updated to 12/20/11 ETL data model
17-Mar-2012 | Patrick Hosokawa | 2.2 | Document updated to 3/17/12 ETL data model
06-Aug-2012 | Patrick Hosokawa | 4.0 | Change section removed, Appendix B updated, final move to CDMv4. Added data on labs provided to Appendix B.
03-Mar-2013 | Patrick Hosokawa | 4.1 | Additions to Appendix B, added Appendix C for flatfile instructions
1.0 Introduction
This document reflects the requirements, assumptions, business rules and transformations for the
implementation of OMOP CDM V3, as recommended for SAFTINet.
The purpose of this document is two-fold:
1. Describe ETL mapping of data from SAFTINet partners into the Common Data Model.
2. Serve as a blueprint for equivalent ETL mapping processes for other data sources into the CDM.
In each section, the tables and their mapping are individually reviewed along with any source specific
rules and exceptions.
The intended audiences for this document are the SAFTINet team and partner ETL technical personnel.
Sections of the document are targeted specifically towards each audience with appropriate focus and
level of detail.
as a string data type, and allowed to have one of two known code
values: "M" for male, "F" for female -- and NULL for records
where gender is unknown or not applicable (or arguably "U" for
unknown as a sentinel value). The data domain for the gender
column is: "M", "F".
In database technology, domain refers to the description of an
attribute's allowed values. The physical description is a set of
values the attribute can have, and the semantic, or logical,
description is the meaning of the attribute.
Drug: In pharmacology, a drug is "a chemical substance used in the
treatment, cure, prevention, or diagnosis of disease or used to
otherwise enhance physical or mental well-being." Drugs may be
prescribed for a limited duration, or on a regular basis for chronic
disorders.
Drug Exposure (entity): The Drug Exposure entity contains individual records that suggest
drug utilization by the person. Drug Exposure indicators store key
information about each person's medication and the timing
thereof, including the drug (captured as a standard Concept code in
the CDM), quantity, beginning date of medication, number of
days supply, period of exposure, and prescription refill data. Drug
Exposures are stored in the DRUG_EXPOSURE table.
Encrypted Unique Identifiers: Output of a de-identification process used to hash the identity of
subjects, providing them with a unique but de-identified
identifier.
Electronic Health Record (EHR): Electronic health record refers to an individual person's medical
record in digital format. It may be made up of electronic medical
records from many locations and/or sources. The EHR is a
longitudinal electronic record of person health information
generated by one or more encounters in any care delivery
setting. Included in this information are person demographics,
progress notes, problems, medications, vital signs, past medical
history, immunizations, laboratory data and radiology reports.
The EHR has the ability to generate a complete record of a clinical
person encounter, as well as supporting other care-related
activities directly or indirectly via interface, including
evidence-based decision support, quality management, and outcomes
reporting.
Electronic Medical Record (EMR): An electronic medical record is a computerized legal medical
record created in an organization that delivers care, such as a
hospital or outpatient setting. Electronic medical records tend to
be a part of a local stand-alone health information system that
allows storage, retrieval and manipulation of records. This
document will reference EHR moving forward even if certain data
sources internally use the EMR definition.
Extract, Transform, Load (ETL): Process of getting data out of one data store (Extract), modifying
it (Transform), and inserting it into a different data store (Load).
Generic Product Identifier (GPI): A proprietary unique identifier for a drug used by the commercial
Medi-Span® formulary database.
Grid-enabled network: A collection of grid nodes (virtual organizations) capable of
responding to/with grid query/response services.
Grid Node: A grid-enabled database containing data “owned” by a specific
health care entity or virtual organization.
Grid Portal: Contains the set of services that allow queries to be sent,
give access to authorized users, and administer query and
response activities.
Healthcare Common Procedure Coding System (HCPCS): HCPCS Level I codes are managed by the AMA (licensing fees
apply). The HCPCS Level II codes are managed by CMS (Centers
for Medicare & Medicaid Services). The Level II codes include
alphanumeric HCPCS procedure and modifier codes, their long
and short descriptions, and applicable Medicare administrative,
coverage, and pricing data. These codes are used for Medicare
outpatient services.
International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM): The official system of assigning codes to diagnoses and
procedures associated with hospital utilization in the United
States.
Investigator: Any authorized clinician or researcher, or person designated to
act on their behalf (e.g., research assistant, statistician), who has
been authenticated for access to query and response
functionality on the grid-enabled network.
Logical Observation Identifiers Names and Codes (LOINC): Universal code names and identifiers for medical terminology
related to the Electronic Health Record, which assist in the
electronic exchange and gathering of clinical results (such as
laboratory tests, clinical observations, outcomes management
and research).
Limited Data Set: As defined by HIPAA, limited data sets are data sets stripped of
certain direct identifiers that are specified in the Privacy Rule.
They are not de-identified information under the Privacy Rule. A
limited data set is PHI that excludes the following direct
identifiers of the individual or of relatives, employers, or
household members of the individual: (1) names; (2) postal
address information, other than town or city, state, and ZIP code;
(3) telephone numbers; (4) fax numbers; (5) e-mail addresses; (6)
social security numbers; (7) medical record numbers; (8) health
plan beneficiary numbers; (9) account numbers; (10)
certificate/license numbers; (11) vehicle identifiers and serial
numbers, including license plate numbers; (12) device identifiers
and serial numbers; (13) web URLs; (14) Internet Protocol (IP)
address numbers; (15) biometric identifiers, including fingerprints
and voiceprints; and (16) full-face photographic images and any
comparable images. Importantly, unlike de-identified data, PHI in
limited data sets may include the following: city, state and ZIP
codes; all elements of dates (such as admission and discharge
dates); and unique codes or identifiers not listed as direct
identifiers. Recognizing that institutions, IRBs and investigators
are frequently faced with applying both the Common Rule and
the HIPAA Privacy Rule, OHRP does not consider a Limited Data
Set (as defined under the HIPAA Privacy Rule) to constitute
individually identifiable information under 45 CFR 46.102(f)(2).
Local Reference Value: The specific value used in the partner’s data to refer to any
given concept. This value will be mapped to a standardized
concept value for translation by ROSITA.
National Drug Codes (NDC): Unique identifiers assigned to individual drugs. NDCs are used
primarily as an inventory code and for prescriptions.
Observation (entity): The Observation table contains all general observations that are
tracked as attributes, including source Observation code,
matching standard Concept Code, date of the Observation, type
of Observation, type of result, number/text/Concept code, and
reference range for numeric results. Observation entities are
recorded in the Observation table.
Observational Medical Outcomes Partnership (OMOP): A public-private partnership designed to protect human health by
improving the monitoring of drugs for safety and effectiveness.
Organization (entity): The Organization table is the highest level of the partner care
infrastructure hierarchy. Each organization may have multiple
care sites. Providers will work at one or more care sites.
Person (entity): A Person entity is one of the basic dimensions of analysis. It
presents the framework for active drug surveillance. The Person
entity is Concept driven: its attribute values are stored as
standard Concept codes rather than original (i.e., “raw”) source
values, and it is stored in the logical X_Demographic table.
Primary Care Physician: A physician designated as responsible to provide specific care to a
patient, including evaluation and treatment as well as referral to
specialists.
Procedure Occurrence (entity): A Procedure Occurrence records individual instances of medical
procedures extracted from source data. Procedures are recorded
in various data sources in different forms with varying levels of
standardization such as CPT-4, ICD-9-CM, and HCPCS procedure
codes. These are stored in the PROCEDURE_OCCURRENCE table.
Protected Health Information (PHI): Protected health information (PHI) under HIPAA includes any
individually identifiable health information. Identifiable refers not
only to data that is explicitly linked to a particular individual
(that's identified information). It also includes health information
with data items which reasonably could be expected to allow
individual identification. De-identified information is that from
which all potentially identifying information has been removed.
Provider (entity): The Provider table contains information on local care
providers including type and specialty. Providers are
assigned to an individual care site.
Query: A request for data based on the query specifications “sent” via a
grid services portal to a specified grid network.
ROSITA: A software package designed to transition SAFTINet data from
the partner XML download to a grid database compatible form.
This package will translate local source codes into OMOP
concepts and will remove PHI other than dates of birth, dates of
service, and zip codes.
RxNorm: A standardized nomenclature for clinical drugs and drug delivery
devices, produced by the National Library of Medicine. In
RxNorm, the name of a clinical drug combines its ingredients,
strengths, and/or form.
RxNorm provides normalized names for clinical drugs and links its
names to many of the drug vocabularies commonly used in
pharmacy management and drug interaction software, including
those of First DataBank, Micromedex, MediSpan, Gold Standard
Alchemy, and Multum. By providing links between these
vocabularies, RxNorm can mediate messages between systems
not using the same software and vocabulary.
Subject: A patient, client or person of interest in the use cases
described whose clinical and demographic data are
contained within the virtual organization(s).
Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT): SNOMED-CT is one of a suite of designated standards for use in
U.S. Federal Government systems for the electronic exchange of
clinical health information, and is also a required standard in
interoperability specifications of the U.S. Healthcare Information
Technology Standards Panel. SNOMED-CT is also being
implemented internationally as a standard within other IHTSDO
Member countries.
Terminology: Technical or special terms used in a business or special subject
area.
Virtual organization (aka Partner): Any entity or group of entities (e.g., clinic, network of clinics,
agency or agencies) whose data is represented by a single grid
node and available through grid services for query/response
activities.
Visit Occurrence (entity): The Visit Occurrence entity contains the information available in
the source data about person visits to healthcare providers,
including inpatient, outpatient, and ER visits. Visits are recorded
in various data sources in different forms with varying levels of
standardization. The detail level of the classification and
description of the visit differs by data source. Visit Occurrence
entities are recorded in the VISIT_OCCURRENCE table.
Vocabulary: A computerized list (as of items of data or words) used for
reference (as for information retrieval or word processing).
3.0 Assumptions
The design follows the agreed upon general project assumptions:
- Electronic Medical Data: EMR is a subset of EHR. This document will reference EHR moving forward
even if a specific data source internally uses the Electronic Medical Record (EMR) definition.
- Financial Information: The CDM model makes use of financial information such as Fees, Payments,
Deductibles, Copayment, etc. from payer source data, such as Medicaid.
- Plan Detail Information: The model potentially makes use of fields related to Plan or Coverage details
such as Benefit Plan, Plan Indicator, etc. of the administrative information in the claims data. The model
makes use of medical coverage period and eligibility for prescription drugs.
- Cleansing and Validation: The selected data fields will be handled (whether loaded directly or as part of a
transformation) with a validation plan which is to be determined later.
- Data Privacy: ETL from EHR/CDW will contain clear text direct patient identifiers and dates. ROSITA will
encrypt all clear text direct patient identifiers. A random identifier (called a GUID) that is unrelated to
any patient identifier will be associated with each patient record. Birth dates and dates of service will
remain unchanged. Zip codes will also be forwarded to the grid node unchanged, along with a second
variable that contains only the 3-digit zip (the leftmost 3 digits). The resulting data exported to the grid node will
therefore be a limited data set containing encrypted direct identifiers with unchanged dates and both
5-digit and 3-digit zip codes. The grid node will have no access to any clear text direct patient identifiers
from the EHR/CDW.
Under the assumption that payer data will be provided with clear text direct identifiers, ROSITA will
perform record linkage to link the clinical record with the financial record using clear text identifiers. If a
match is made, the same GUID assigned to the clinical data will be assigned to the financial data.
Otherwise, a new GUID will be generated that is unrelated to any patient identifier. Dates will remain
unchanged. The resulting data exported to the grid node will be consistent with a Limited Data Set
containing encrypted direct identifiers, unchanged dates, a 5- and 3-digit zip code, and a GUID random
identifier. The grid node will have no access to any clear text direct patient identifiers from payer (e.g.
Medicaid) data.
- Concept Identifiers: Data are represented through standard concept identifiers using a standardized
terminology. During ETL, source data representations (raw data codes) will be translated to standard
concept identifiers through a mapping process. If no standard concept identifier is available, the concept
identifier field will contain ‘0’ as a value.
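The Data Privacy assumption can be illustrated with a minimal Python sketch. It is not ROSITA's implementation: the function names, the `link_key` argument (a stand-in for whatever the record linkage step produces), and the in-memory dictionary are all assumptions made for illustration.

```python
import uuid


def deidentify_zip(zip_code):
    """Return the leftmost 3 digits of a zip code, or None if fewer than 3 digits exist."""
    if not zip_code:
        return None
    digits = "".join(ch for ch in str(zip_code) if ch.isdigit())
    return digits[:3] if len(digits) >= 3 else None


def assign_guid(link_key, guid_by_key):
    """Reuse the GUID of an already linked patient, otherwise generate a new one.

    link_key is a hypothetical result of record linkage (None when no match
    could be attempted); guid_by_key is a hypothetical lookup of GUIDs
    assigned so far.
    """
    if link_key is not None and link_key in guid_by_key:
        return guid_by_key[link_key]
    guid = str(uuid.uuid4())  # random, not derived from any patient identifier
    if link_key is not None:
        guid_by_key[link_key] = guid
    return guid
```

In this sketch a financial record that links to an existing clinical record receives the same GUID, while an unlinked record receives a fresh random one, mirroring the rule stated above.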
4.0 Source Data Mapping Approach
This section covers the high-level assumptions and approach to extraction, transformation and loading
(ETL) of raw source data into the Common Data Model (CDM). The assumptions and approach are
defined with a special focus on claims and EHR data. The section covers each of the major tables in the
CDM separately, elaborating the distinct handling required for each.
Unless otherwise specified with ‘Required’ in the field listing, missing attributes will not disqualify data from
being loaded into the Common Data Model. Missing attributes for Concept Identifiers will be populated
with the value zero (0) in the CDM, while the rest of the missing attributes will be populated with NULL.
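The default rule for missing attributes might be sketched as follows. This is an illustration only: `fill_missing` is a hypothetical helper, and recognizing concept identifier fields by a `_concept_id` suffix is an assumption based on the field names used in this document.

```python
def fill_missing(record, fields):
    """Return a copy of record containing every expected field.

    Missing or empty concept identifier fields become 0; any other
    missing or empty field becomes None (NULL).
    """
    filled = {}
    for name in fields:
        value = record.get(name)
        if value is None or value == "":
            # Assumption: concept identifier fields end in "_concept_id".
            value = 0 if name.endswith("_concept_id") else None
        filled[name] = value
    return filled
```

For example, `fill_missing({"gender_source_value": "F"}, ["gender_source_value", "gender_concept_id", "county"])` yields a row whose `gender_concept_id` is 0 and whose `county` is None.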
The Source Field and Applied Rule fields are left blank for the partners to fill in. The Source Field should
be filled in with the equivalent field in the partner’s source data. The Applied Rule field should contain
any specialized rules (e.g., filtering, translation, combination of categories) that the partner
implements when filling in the field.
In the flowcharts, the colors red, yellow, and green are used in the following manner.
Left Side (ETL View): represents desired source data
Red – Field is not brought forward into the grid
Green – Field is brought forward into the grid
Right Side (Grid View): represents desired grid-facing data
Red – Field is generated by the ROSITA application. It is not derived from any ETL data field.
Yellow – Field is generated from ETL data, but does not exist as a field in the ETL data.
Green – Field is brought forward from ETL data unchanged.
The arrows indicate that the field on the right (in yellow) is generated from the field on the left (green for
those fields brought forward, red otherwise).
The grid facing data model (right side of the flowcharts) closely matches the OMOP v3 data model.
However, the SAFTINet grid model has a few extra fields needed specifically for SAFTINet. All fields
present in SAFTINet but not in the OMOP model use the prefix X_ (e.g., X_Organization_Source).
4.1 Changes to existing tables
Table | Changed Field | Change
Visit_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier, new x_ prefix is so the field can pass through to the grid
Drug_Exposure | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier, new x_ prefix is so the field can pass through to the grid
Condition_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value, new x_ prefix is so the field can pass through to the grid
Procedure_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value, new x_ prefix is so the field can pass through to the grid
Observation | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value, new x_ prefix is so the field can pass through to the grid
4.2 Table Name: ORGANIZATION
The Organization table is the highest level of the partner care infrastructure hierarchy. Each organization may have multiple care sites.
Providers can work at one or more care sites. Address information submitted with the organization will be used to create a new location
record which will be linked to the organization record via the Location_ID field.
The field mapping is performed as follows:
Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
organization_source_value | String (50) / Required | | | Local reference value for organization, used to create the organization_id field on the grid facing record. This value will also be used in other records to refer to the organization.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
place_of_service_source_value | String (50) / Required | | | The type of organization. If the organization type is not defined in the source data, refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
organization_address_1 | String (50) | | | First line of the address
organization_address_2 | String (50) | | | Second line of the address
organization_city | String (50) | | | City portion of the address
organization_state | String (2) | | | State portion of the address
organization_zip | String (9) | | | Zip code of the address
organization_county | String (20) | | | County portion of the address
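A partner preparing ORGANIZATION records could check them against the field listing before submission. The sketch below is one possible approach, not part of the specification (the validation plan is still to be determined): `validate_record` and the tuple layout are hypothetical, while the lengths and Required flags are copied from the table above.

```python
# (field name, maximum length, required) as given in the ORGANIZATION field listing
ORGANIZATION_FIELDS = [
    ("organization_source_value", 50, True),
    ("x_data_source_type", 20, True),
    ("place_of_service_source_value", 50, True),
    ("organization_address_1", 50, False),
    ("organization_address_2", 50, False),
    ("organization_city", 50, False),
    ("organization_state", 2, False),
    ("organization_zip", 9, False),
    ("organization_county", 20, False),
]


def validate_record(record, fields):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    for name, max_len, required in fields:
        value = record.get(name)
        if value is None or value == "":
            if required:
                problems.append(f"{name}: required value is missing")
            continue
        if len(str(value)) > max_len:
            problems.append(f"{name}: longer than {max_len} characters")
    return problems
```

The same function could be reused for the other tables by supplying their field lists.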
4.2.1 Example of ORGANIZATION source / destination data
1. The organization_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location) a row will
be added to the table and a new ID (location_id) will be generated. Either a newly generated value or a pre-existing value (if the record is found) of the
location table Primary Key will be placed into location_id.
2. x_zip_deidentified will be generated from organization_zip. This field was created specifically for person locations to support the creation of ‘Safe Harbor’
Limited Data Sets.
3. x_location_type will be derived from the XML record type (Organization in this case)
ETL View
Organization Table - XML
organization_source_value (1) | UC Internal Medicine
x_data_source_type | EHR
place_of_service_source_value | Academic Practice
organization_address_1 | 13199 E Montview Blvd
organization_address_2 | Suite 300, Mail Stop F443
organization_city | Aurora
organization_state | CO
organization_zip | 80045
organization_county | Arapahoe
Legend (ETL View): Green – Brought forward into grid model / Red – Removed in processing
Grid View
Organization Table - Grid
organization_id | 22770494
organization_source_value | UC Internal Medicine
x_data_source_type | EHR
place_of_service_concept_id | 3389
place_of_service_source_value | Academic Practice
location_id | 39458
x_grid_node_id | 1
Location Table - Grid
location_id | 39458
location_source_value | UC Internal Medicine
x_data_source_type | EHR
address_1 | 13199 E Montview Blvd
address_2 | Suite 300, Mail Stop F443
city | Aurora
state | CO
zip | 80045
x_zip_deidentified (2) | 800
county | Arapahoe
x_location_type (3) | Organization
x_grid_node_id | 1
Legend (Grid View): Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
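The location handling described in notes 1 to 3 can be sketched as a look-up-or-create step. This is a simplified illustration rather than ROSITA's code: the `LocationTable` class is a hypothetical in-memory stand-in for the grid location table, and sequential ID generation is an assumption.

```python
import itertools


class LocationTable:
    """Hypothetical in-memory stand-in for the grid location table."""

    def __init__(self, first_id=1):
        self._ids = itertools.count(first_id)  # assumed sequential surrogate keys
        self.rows = {}  # location_source_value -> row dict

    def get_or_create(self, source_value, address, location_type):
        """Return the location_id for source_value, adding a row if it is new."""
        row = self.rows.get(source_value)
        if row is None:
            zip_code = address.get("zip")
            row = {
                "location_id": next(self._ids),
                "location_source_value": source_value,
                **address,
                # Derived fields: 3-digit zip and the XML record type.
                "x_zip_deidentified": zip_code[:3] if zip_code else None,
                "x_location_type": location_type,
            }
            self.rows[source_value] = row
        return row["location_id"]
```

Calling `get_or_create` twice with the same source value returns the same `location_id`, which corresponds to the pre-existing value case in note 1.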
4.3 Table Name: CARE_SITE
The Care Site table refers to the lower level of the provider care hierarchy. Individual provider care locations will be stored in this table.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
care_site_source_value | String (50) / Required | | | Local reference value for care site, used to create the care_site_id field on the grid facing record. This value will also be used in other records to refer to the care site.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
organization_source_value | String (50) / Required | | | Local reference value for organization. This value will be matched against the organization table to obtain the corresponding organization_id.
place_of_service_source_value | String (50) | | | The type of care site. If the care site type is not defined in the source data, refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
x_care_site_name | String (50) | | | Name of the clinic (care site)
care_site_address_1 | String (50) | | | First line of the address
care_site_address_2 | String (50) | | | Second line of the address
care_site_city | String (50) | | | City portion of the address
care_site_state | String (2) | | | State portion of the address
care_site_zip | String (9) | | | Zip code of the address
care_site_county | String (20) | | | County portion of the address
4.3.1 Example of CARE SITE source / destination data
1. The care_site_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location) a row will
be added to the table and a new ID (location_id) will be generated. Either a newly generated value or a pre-existing value (if the record is found) of the
location table Primary Key will be placed into location_id.
2. x_zip_deidentified will be generated from care_site_zip. This field was created specifically for person locations to support the creation of ‘Safe Harbor’
Limited Data Sets.
3. x_location_type will be derived from the XML record type (Care Site in this case)
ETL View
Care Site Table - XML
care_site_source_value (1) | UC Internal Medicine
x_data_source_type | EHR
organization_source_value | University of Colorado
place_of_service_source_value | Internal Medicine
x_care_site_name | Eastside Clinic
care_site_address_1 | 13199 E Montview Blvd
care_site_address_2 | Suite 300, Mail Stop F443
care_site_city | Aurora
care_site_state | CO
care_site_zip | 80045
care_site_county | Arapahoe
Legend (ETL View): Green – Brought forward into grid model / Red – Removed in processing
Grid View
Care Site Table - Grid
care_site_id | 22770494
care_site_source_value | UC Internal Medicine
x_data_source_type | EHR
location_id | 49382
organization_id | 382392
place_of_service_concept_id | 39458
place_of_service_source_value | Internal Medicine
x_care_site_name | Eastside Clinic
x_grid_node_id | 1
Location Table - Grid
location_id | 49382
location_source_value | UPI Building
x_data_source_type | EHR
address_1 | 13199 E Montview Blvd
address_2 | Suite 300, Mail Stop F443
city | Aurora
state | CO
zip | 80045
x_zip_deidentified (2) | 800
county | Arapahoe
x_location_type (3) | Care Site
x_grid_node_id | 1
Legend (Grid View): Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
4.4 Table Name: PROVIDER
The Provider table contains information on local care providers including type and specialty. Providers are assigned to an individual care site.
The field mapping is performed as follows:
Destination Field | Data Type | Source Field | Applied Rule | Comment
provider_source_value | String (50) / Required | | | Local reference value for provider, used to create the provider_id field on the grid facing record. This value will also be used in other records to refer to the provider.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
npi | String (50) | | | Provider NPI
dea | String (50) | | | Provider DEA Number
specialty_source_value | String (50) | | | Provider type as recorded at the source (e.g., Physician, NP, MA, etc.). If the provider type is not defined in the source data, refer to the Health Care Provider Specialty section of the Concept ID Table. Used to create specialty_concept_id.
x_provider_first | String (75) | | | Provider First Name
x_provider_middle | String (75) | | | Provider Middle Name (or initial)
x_provider_last | String (75) | | | Provider Last Name
care_site_source_value | String (50) | | | Local reference value for Care Site. This value will be matched against the Care Site table to obtain the corresponding care_site_id.
x_organization_source_value | String (50) / Required | | | Local reference value for Organization. This value will be matched against the Organization table to obtain the corresponding organization_id.
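Matching a provider's local reference values against the Care Site and Organization tables amounts to a key lookup. The following sketch assumes the lookups are plain dictionaries and that an unmatched value resolves to None; both are illustrative assumptions, and the helper names are hypothetical.

```python
def resolve_id(source_value, id_by_source_value):
    """Return the grid ID for a local reference value, or None if absent or unmatched."""
    if source_value is None:
        return None
    return id_by_source_value.get(source_value)


def link_provider(provider, care_site_ids, organization_ids):
    """Return a copy of the provider record with care_site_id and x_organization_id added."""
    linked = dict(provider)
    linked["care_site_id"] = resolve_id(
        provider.get("care_site_source_value"), care_site_ids
    )
    linked["x_organization_id"] = resolve_id(
        provider.get("x_organization_source_value"), organization_ids
    )
    return linked
```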
4.4.1 Example of PROVIDER source / destination data
ETL View
Provider Table - XML
provider_source_value | 349302
x_data_source_type | EHR
npi | 34930302
dea | 49492
specialty_source_value | General Practitioner
x_provider_first | Marcus
x_provider_middle | W
x_provider_last | Welby
care_site_source_value | UC Internal Medicine
x_organization_source_value | University of Colorado
Legend (ETL View): Green – Brought forward into grid model / Red – Removed in processing / Blue – Item under discussion
Grid View
Provider Table - Grid
provider_id | 2399450
provider_source_value | 349302
x_data_source_type | EHR
npi | 34930302
dea | 49492
specialty_source_value | General Practitioner
specialty_concept_id | 20302
x_provider_first | Marcus
x_provider_middle | W
x_provider_last | Welby
care_site_id | 22770494
x_organization_id | 3939
x_grid_node_id | 1
Legend (Grid View): Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields / Blue – Item under discussion
4.5 Table Name: X_Demographic
The X_Demographic table stores information about individual patients. The PHI elements of this record will be stripped out in the
transformation to the grid model. Address information will be limited and used to create a new location record.
The field mapping is performed as follows:
Destination Field Data Type Source Field Applied Rule Comment
person_source_value String (50) /
Required
Person unique identifier at the source (MRN). Used
to create the person_id field on the grid facing
record. This value will also be used in other records
to refer to the person.
x_data_source_type String (20) /Required
Data Source Identifier (EHR / CDW / Medicaid)
medicaid_id_number String (50) Medicaid ID Number
ssn String (50) Social Security Number
last String (75) Last Name
middle String (75) Middle Name or Initial
first String (75) First Name
address_1 String (50) The first line of the person's actual address.
address_2 String (50) The second line of the person's actual address.
city String (50) The city portion of the person's actual address.
state String (2) The state portion of the person's actual address.
zip String (9) Zip code of the person's actual address.
county String (20) The county portion of the person’s address as
recorded at source.
year_of_birth Number(4) /
Required
Year of birth
month_of_birth Number (2) Month of birth
day_of_birth Number (2) Day of birth
gender_source_value String (50) Local reference value for gender of the person.
Used to create gender_concept_id
race_source_value String (50) Local reference value for race of the person. Used
to create race_concept_id.
ethnicity_source_value String (50) Local reference value for ethnicity of the person.
Used to create ethnicity_concept_id.
provider_source_value String (50) Local reference value for patient’s primary provider
(if any). This value will be matched against the
Provider table to obtain the corresponding
provider_id.
care_site_source_value String (50) Local reference value for the patient’s primary Care
Site (if any). This value will be matched against the
Care Site table to obtain the corresponding
care_site_id.
x_organization_source_value String (50) / Required
Local reference value for patient’s organization. This
value will be matched against the Organization table
to obtain the corresponding organization_id.
4.5.1 Example of X_Demographic source / destination data
X_Demographic Table - XML
person_source_value 29201082
x_data_source_type EHR
medicaid_id_number 3903432
ssn 999-99-9999
last Doe
middle D
first John
address_1 123 Fake St
address_2 Apt 566
city Aurora
state CO
zip 80045
county Arapahoe
year_of_birth 1965
month_of_birth 2
day_of_birth 9
gender_source_value Male
race_source_value White
ethnicity_source_value Non-Hispanic
provider_source_value 35346346
care_site_source_value UC Internal Medicine
x_organization_source_value University of Colorado
ETL View
Person Table - Grid
person_id 22770494
person_source_value [2]
location_id [1] 49382
year_of_birth 1965
month_of_birth 2
day_of_birth 9
gender_concept_id 675
gender_source_value Male
race_concept_id 344
race_source_value White
ethnicity_concept_id 202
ethnicity_source_value Non-Hispanic
provider_id [3] 34235556
care_site_id 22770494
x_organization_id 382392
x_grid_node_id 1
GRID View
Location Table - Grid
location_id [1] 39458
location_source_value
x_data_source_type EHR
address_1 [4]
address_2 [4]
city Aurora
state CO
zip 80045
x_zip_deidentified [5] 800
county Arapahoe
x_location_type [6] 34344
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
1. The location ID value is not linked to a location_source_value in this case. When the address information is transferred to the location table, the resulting
ID value will be placed in the person record for reference.
2. The grid version of the person table contains a blank field for person_source_value to comply with the OMOP standard. The value for
person_source_value on the ETL side will not be carried forward due to privacy concerns.
3. The grid facing provider_id will be derived from the ETL field provider_source_value.
4. When creating the location table, the local values for person address will not be passed through to the grid, although they are labeled green because in
other instances, such as Organization and Care Site, they do move forward to the grid facing database.
5. x_zip_deidentified will be generated from zip. This field was created specifically for person locations to support the creation of ‘Safe Harbor’
Limited Data Sets.
6. x_location_type will be derived from the XML record type (Person in this case).
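The derivation in note 5 can be sketched as follows. This is a minimal illustration, not the ROSITA implementation: the helper name deidentify_zip is hypothetical, and truncating to the first three digits is an assumption inferred from the example above (zip 80045, x_zip_deidentified 800). Any further Safe Harbor handling, such as special treatment of sparsely populated ZIP areas, is not covered here.

```python
def deidentify_zip(zip_code: str) -> str:
    """Return the first three digits of a ZIP code (hypothetical helper).

    Assumption: x_zip_deidentified is the 3-digit ZIP prefix, as suggested
    by the example record (80045 -> 800).
    """
    # Keep digits only so that ZIP+4 values such as "80045-1234" also work.
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 3:
        # Too short to truncate safely; return an empty value instead.
        return ""
    return digits[:3]


print(deidentify_zip("80045"))
```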
4.6 Table Name: VISIT_OCCURRENCE
The Visit Occurrence table contains a record for each patient-provider encounter. The provider, patient and location are all stored as well as
the type of visit.
The field mapping is performed as follows:
Destination Field Data Type Source Field Applied Rule Comment
x_visit_occurrence_source_identifier String (50) /
Required
Local reference value for visit, used to create the
visit_occurrence_id field on the grid facing record.
x_data_source_type String (20) /
Required
Data Source Identifier (EHR / CDW / Medicaid)
person_source_value String (50) /
Required
Person unique identifier at the source (MRN). This
value will be matched against the Person table to
obtain the corresponding person_id.
visit_start_date DATE/
Required
The date on which the Visit started
visit_end_date DATE /
Required
The date on which the Visit ended
place_of_service_source_value String (50) Visit type (office visit, med refill, face-to-face,
telephone, etc.). If the visit site type is
not defined in the source data refer to the
Visit_Type section of the Concept ID Table. Used to
create place_of_service_concept_id.
x_provider_source_value String (50) Local reference value for the provider conducting
the visit. This value will be matched against the Provider table to obtain the corresponding
provider_id.
care_site_source_value String (50) Local reference value for the Care Site of the visit.
This value will be matched against the Care Site
table to obtain the corresponding care_site_id.
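The matching of source values against the Person, Provider and Care Site tables can be pictured as a set of dictionary lookups. The sketch below is illustrative only: resolve_id and the three lookup dictionaries are hypothetical names, and the values are borrowed from the example in Section 4.6.1. Ids are kept as strings so that leading zeros survive.

```python
from typing import Dict, Optional


def resolve_id(lookup: Dict[str, str], source_value: Optional[str]) -> Optional[str]:
    """Return the grid id for a local source value, or None if blank or unknown."""
    if not source_value:
        return None
    return lookup.get(source_value)


# Hypothetical lookups, assumed to be built while loading the referenced tables.
person_ids = {"2302202": "30205202"}
provider_ids = {"20302340": "04594020"}
care_site_ids = {"UC Internal Medicine": "202033"}

etl_visit = {
    "person_source_value": "2302202",
    "x_provider_source_value": "20302340",
    "care_site_source_value": "UC Internal Medicine",
}

grid_visit = {
    "person_id": resolve_id(person_ids, etl_visit["person_source_value"]),
    "x_provider_id": resolve_id(provider_ids, etl_visit["x_provider_source_value"]),
    "care_site_id": resolve_id(care_site_ids, etl_visit["care_site_source_value"]),
}
print(grid_visit)
```

Returning None for an unmatched value is a design choice for the sketch; a real load would need to decide whether an unmatched required reference rejects the record.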
4.6.1 Example of VISIT OCCURRENCE source / destination data
Visit Occurrence Table - XML
x_visit_occurrence_source_identifier 349302
x_data_source_type EHR
person_source_value 2302202
visit_start_date 5/23/2011
visit_end_date 5/25/2011
place_of_service_source_value Physical
x_provider_source_value 20302340
care_site_source_value UC Internal Medicine
ETL View Grid View
Visit Occurrence Table - Grid
visit_occurrence_id 3203402
x_data_source_type EHR
person_id 30205202
visit_start_date 5/23/2011
visit_end_date 5/25/2011
place_of_service_concept_id 302023003
place_of_service_source_value Physical
x_provider_id 04594020
care_site_id 202033
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
quantity Number (8,2) The quantity of drug recorded in the corresponding
Drug Exposure Instance
days_supply Number (4) The number of days' supply of the medication
recorded in the corresponding Drug Exposure
Instance.
x_drug_name String (255) /
Required
Drug name taken verbatim from source field
x_drug_strength String (50) Strength (taken verbatim) (e.g. 20, 1000, 2-4, 1)
sig String (500) Sig (if available)
provider_source_value String (50) /
Required
Local reference value for prescribing/administering
provider (if any). This value will be matched against
the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier String (50) Local reference value for the visit where the drug
was prescribed/administered. This value will be
matched against the Visit Occurrence table to obtain
the corresponding visit_occurrence_id.
relevant_condition_source_value String (50) Associated Diagnosis Source Code. This is the code
for the condition for which the drug was given. This
value is independent and will not be matched
against the Condition Occurrence table.
4.7.1 Example of DRUG EXPOSURE source / destination data
Drug Exposure Table - ETL
drug_exposure_source_identifier 30003400
x_data_source_type EHR
person_source_value 2302202
drug_source_value 4594930302
drug_source_value_vocabulary NDC
drug_exposure_start_date 4/19/2011
drug_exposure_end_date 5/19/2011
drug_type_source_value Prescription
stop_reason Regimen Completed
refills 1
quantity 60
days_supply 30
x_drug_name Amoxicillin
x_drug_strength 500
sig
provider_source_value 239292
x_visit_occurrence_source_identifier 3499202
relevant_condition_source_value 393821
ETL View Grid View
Drug Exposure Table - Grid
drug_exposure_id 9947839
x_data_source_type EHR
person_id 30205202
drug_concept_id 499506
drug_source_value 4594930302
drug_exposure_start_date 4/19/2011
drug_exposure_end_date 5/19/2011
drug_type_concept_id 983921
stop_reason Regimen Completed
refills 1
quantity 60
days_supply 30
x_drug_name Amoxicillin
x_drug_strength 500
sig
prescribing_provider_id 3935050
visit_occurrence_id 040200
relevant_condition_concept_id 059439333
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
4.8 Table Name: CONDITION_OCCURRENCE
The Condition Occurrence table contains a record for each patient condition. The codes associated with the conditions as well as the
associated person, provider, and visits/encounters are also recorded.
The field mapping is performed as follows:
Destination Field Data Type Source Field Applied Rule Comment
condition_occurrence_source_identifier String (50) / Required
Source Condition Primary Key; could be a unique
record identifier. Used to create the
condition_occurrence_id field on the grid facing
record.
x_data_source_type String (20) /
Required
Data Source Identifier (EHR / CDW / Medicaid)
person_source_value String (50) /Required
Person unique identifier at the source (MRN). This
value will be matched against the Person table to
obtain the corresponding person_id.
condition_source_value String (50) /
Required
Local diagnosis code (e.g. ICD-9, SNOMED etc…).
Used to create condition_concept_id
condition_source_value_vocabulary String (50) /
Required
Type of code (e.g. ICD-9) used for condition.
x_condition_source_desc String (50) Source Diagnosis Text Description
condition_start_date Date / Required Onset Date
x_condition_update_date Date Date condition was updated/reviewed
condition_end_date Date Resolved Date – Leave blank for unresolved
conditions.
condition_type_source_value String (50) /
Required
Type of condition as recorded in source data (e.g.
chief complaint, problem list, etc). If the condition
type is not defined in the source data refer to the
Condition_Occurrence section of the Concept ID
Table. Used to create condition_type_concept_id
stop_reason String (20) The reason, if available, that the condition was no
longer recorded, as indicated in the source data.
Valid values include discharged, resolved etc…
associated_provider_source_value String (50) Provider ID from the source - Provider of record.
This value will be matched against the Provider table
to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier String (50) Local reference value for visit. This value will be
matched against the Visit Occurrence table to obtain
the corresponding visit_occurrence_id.
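The data types and Required flags in the mapping above lend themselves to a simple pre-submission check. The following is a sketch under stated assumptions: the field list is transcribed from the CONDITION_OCCURRENCE mapping in this section, all values are assumed to arrive as strings, and validate_condition is a hypothetical helper rather than part of ROSITA. Date formats are not checked here.

```python
# (field name, maximum length or None for dates, required?)
CONDITION_FIELDS = [
    ("condition_occurrence_source_identifier", 50, True),
    ("x_data_source_type", 20, True),
    ("person_source_value", 50, True),
    ("condition_source_value", 50, True),
    ("condition_source_value_vocabulary", 50, True),
    ("x_condition_source_desc", 50, False),
    ("condition_start_date", None, True),
    ("x_condition_update_date", None, False),
    ("condition_end_date", None, False),
    ("condition_type_source_value", 50, True),
    ("stop_reason", 20, False),
    ("associated_provider_source_value", 50, False),
    ("x_visit_occurrence_source_identifier", 50, False),
]


def validate_condition(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = []
    for name, max_len, required in CONDITION_FIELDS:
        value = record.get(name) or ""
        if required and not value:
            errors.append(f"{name}: required value is missing")
        if max_len is not None and len(value) > max_len:
            errors.append(f"{name}: longer than {max_len} characters")
    return errors


sample = {
    "condition_occurrence_source_identifier": "30003400",
    "x_data_source_type": "EHR",
    "person_source_value": "393030",
    "condition_source_value": "162.9",
    "condition_source_value_vocabulary": "ICD9",
    "condition_start_date": "4/19/2011",
    "condition_type_source_value": "Chief Complaint",
}
print(validate_condition(sample))
```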
4.8.1 Example of CONDITION OCCURRENCE source / destination data
Condition Occurrence Table - ETL
condition_occurrence_source_identifier 30003400
x_data_source_type EHR
person_source_value 393030
condition_source_value 162.9
condition_source_value_vocabulary ICD9
x_condition_source_desc Malignant Neop
condition_start_date 4/19/2011
x_condition_update_date 10/19/2011
condition_end_date
condition_type_source_value Chief Complaint
stop_reason
associated_provider_source_value 392904
x_visit_occurrence_source_identifier 403030
ETL View Grid View
Condition Occurrence Table - Grid
condition_occurrence_id 8349393
x_data_source_type EHR
person_id 94849303
condition_concept_id 884934
condition_source_value 162.9
x_condition_source_desc Malignant Neop
condition_start_date 4/19/2011
x_condition_update_date 10/19/2011
condition_end_date
condition_type_concept_id 499404
stop_reason
associated_provider_id 39304
visit_occurrence_id 90493023
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
4.9 Table Name: PROCEDURE_OCCURRENCE
The Procedure Occurrence table contains a record for each procedure. The type of procedure as well as the associated person and visit are
recorded.
The field mapping is performed as follows:
Destination Field Data Type Source Field Applied Rule Comment
procedure_occurrence_source_identifier String (50) / Required
Source Procedure Primary Key. Used to create the
procedure_occurrence_id field on the grid facing
record.
x_data_source_type String (20) /
Required
Data Source Identifier (EHR / CDW / Medicaid)
person_source_value String (50) /
Required
Person unique identifier at the source (MRN). This
value will be matched against the Person table to
obtain the corresponding person_id.
procedure_source_value String (50) /
Required
The Procedure Code as captured from the source
data. Values include CPT-4, ICD-9-CM (Procedure),
HCPCS, and other procedure codes. Used to create
procedure_concept_id.
procedure_source_value_vocabulary String (50) /
Required
Type of code (e.g. CPT) used for procedure.
procedure_date DATE / Required The date on which the procedure began (or was
performed)
procedure_type_source_value String (50) The procedure type as stored in source. If the procedure type is not defined in the source data
refer to the Procedure Occurrence section of the
Concept ID Table. Used to create
procedure_type_concept_id.
provider_record_source_value String (50) Local Reference value for Provider. This value will be
matched against the Provider table to obtain the
corresponding provider_id.
x_visit_occurrence_source_identifier String (50) Local Reference value for visit. This value will be
matched against the Visit Occurrence table to obtain
the corresponding visit_occurrence_id.
relevant_condition_source_value String (50) First Associated Diagnosis Code. Used to create
relevant_condition_concept_id.
4.9.1 Example of PROCEDURE OCCURRENCE source / destination data
Procedure Occurrence Table - ETL
procedure_occurrence_source_identifier 9848493
x_data_source_type EHR
person_source_value 594928
procedure_source_value 49750
procedure_source_value_vocabulary CPT
procedure_date 4/19/2011
procedure_type_source_value Inpatient header
provider_record_source_value 23902023
x_visit_occurrence_source_identifier 2302320
relevant_condition_source_value 20230
ETL View Grid View
Procedure Occurrence Table - Grid
procedure_occurrence_id 393948230
x_data_source_type EHR
person_id 3493030
procedure_concept_id 39949023
procedure_source_value 49750
procedure_date 4/19/2011
procedure_type_concept_id 884934
associated_provider_id 34040222
visit_occurrence_id 20923042
relevant_condition_concept_id 23032009
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
4.10 Table Name: OBSERVATION
The Observation table contains records for labs and for measurements such as height and weight. It is also where information from Past
Medical History, Past Surgical History, Allergy, and Social/Personal History is stored.
The field mapping is performed as follows:
Destination Field Data Type Source Field Applied Rule Comment
observation_source_identifier String (50) /
Required
Source Primary Key for Observation Record. Used to
create the obs_occurrence_id field on the grid facing
record.
x_data_source_type String (20) /
Required
Data Source Identifier (EHR / CDW / Medicaid)
person_source_value String (50) /
Required
Person unique identifier at the source (MRN). This
value will be matched against the Person table to
obtain the corresponding person_id.
observation_source
_value
String (50) /
Required
The Observation Code as it appears in the source
data. Used to create obs_concept_id
observation_source
_value_vocabulary
String(50) /
Required
Vocabulary used for the observation
observation_date Date / Required The date of the Observation
observation_time Time The time of the observation
value_as_number NUMBER(14,3) The observation result stored as a numeric value.
This is applicable to observations where the result is
expressed as a numeric value.
value_as_string String (60) The observation result stored as character string. It
is applicable to the observations where the result is
expressed as a character string. Used to create
obs_value_as_concept_id.
unit_source_value String (50) Unit of measure for Observation result when
measured as a numeric value. Used to create
unit_concept_id
range_low NUMBER(14,3) The lower limit of the numeric range of the
Observation value. It is not applicable if the
observation results are non-numeric or categorical,
and must be in the same units of measure as the
observation value
range_high NUMBER(14,3) The upper limit of the numeric range of the
Observation value. It is not applicable if the
observation results are non-numeric or categorical,
and must be in the same units of measure as the
observation value
observation_type_source_value String (50) /
Required
Type of observation (e.g. PRO, Lab, History of, Social
History, Allergies). If the observation type is not defined
in the source data refer to the Observation section
of the Concept ID Table. Used to create
observation_type_concept_id
associated_provider_source_value String (50) Provider ID from the source. This value will be
matched against the Provider table to obtain the corresponding provider_id.
x_visit_occurrence_source_identifier String (50) Local reference value for visit. This value will be
matched against the Visit Occurrence table to obtain
the corresponding visit_occurrence_id.
relevant_condition
_source_value
String (50) First Associated Diagnosis Code. Used to create
relevant_condition_concept_id.
x_obs_comment String (500) Contains Result Comments – do not use this field
for now
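The rules for value_as_number, range_low and range_high can be checked before a record is sent. The sketch below is one possible reading, not a documented ROSITA check: check_observation and parse_number are hypothetical helpers, all values are assumed to arrive as strings, and the rule that range_low should not exceed range_high is an assumption that the specification does not state explicitly.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_number(text: str) -> Optional[Decimal]:
    """Parse a numeric field; blank values become None."""
    text = text.strip()
    if not text:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}")


def check_observation(record: dict) -> list:
    """Return a list of problems found in the numeric fields of one record."""
    problems = []
    value = parse_number(record.get("value_as_number", ""))
    low = parse_number(record.get("range_low", ""))
    high = parse_number(record.get("range_high", ""))
    # Ranges are described as not applicable to non-numeric results.
    if value is None and (low is not None or high is not None):
        problems.append("range_low/range_high supplied without value_as_number")
    # Assumption: a reversed range indicates a data entry problem.
    if low is not None and high is not None and low > high:
        problems.append("range_low is greater than range_high")
    return problems


sample = {"value_as_number": "148", "range_low": "50", "range_high": "200"}
print(check_observation(sample))
```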
4.10.1 Example of OBSERVATION source / destination data
Observation Table - ETL
observation_source_identifier 40230320
x_data_source_type EHR
person_source_value 20202302
observation_source_value BP_Systolic
observation_source_value_vocabulary University Lab
observation_date 7/12/2011
observation_time 4:53:00 PM
value_as_number 148
value_as_string
unit_source_value mmHg
range_low 50
range_high 200
observation_type_source_value Lab Value
associated_provider_source_value 930392
x_visit_occurrence_source_identifier 2020200
relevant_condition_source_value 401.2
x_obs_comment
ETL View Grid View
Observation Table - Grid
observation_id 23902323
x_data_source_type EHR
person_id 3903030
observation_concept_id 102190
observation_source_value 8393929
observation_date 7/12/2011
observation_time 4:53:00 PM
value_as_number 148
value_as_string
value_as_concept_id
unit_concept_id 020333
unit_source_value mmHg
range_low 50
range_high 200
observation_type_concept_id 2032002
associated_provider_id 939393
visit_occurrence_id 2002303
relevant_condition_concept_id 302023
x_obs_comment
x_grid_node_id 1
Green – Brought forward into grid model / Red – Removed in processing
Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields
Appendix A: Table Specific Rules
Person Table
o Recordset should consist of all information (including inpatient and outpatient visits) about any patients
with activity (outpatient visits) at a participating primary care site within the past 5 years (back to 1/1/2007 for initial SAFTINet load)
o For any patient seen within the past 5 years we request data retrospectively as described below.
Appendix B: Row filters
This section details the types of data that will go into each table. For each table, the rightmost columns list
the general data domains (e.g. Lab values) along with the specific concepts (e.g. Blood Pressure) within
each domain that should be gathered for the table. When a date is listed with a concept, please gather all
records after that date. For most concepts, this will mean gathering the last 5 years of data (2007-2012),
though some concepts, such as colonoscopy and pneumovax, go back further.
Organization One record per grouping of care sites operating under a single health care hierarchy
Care Site Include a record for any location where care is provided (examples include clinics, mobile units and "home-health care"). Multiple separate care-sites in a single building could be
grouped together, or not, depending on partner's preference
Provider Include a record for every provider who appears in the "provider" table OR the subset of the table that can be linked to a claim, a visit, or a prescription, whatever is easiest. If
filtering, include all providers who have been active since 1/1/2007 even if not currently
active.
Person Include a record for each person who has had some sort of contact with the participating
clinics since 1/1/2007 (regardless of current activity status). This set of persons can be
used to filter the rest of the clinical data - only pull data related to this set of patients.
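The Person filter described above can be sketched in two steps: build the set of persons with contact since 1/1/2007, then keep only the clinical records that belong to that set. This is an illustration under assumptions: "contact" is approximated here by a visit whose visit_start_date falls on or after the cutoff, dates are assumed to be parsed into datetime.date values already, and both function names are hypothetical.

```python
from datetime import date

CUTOFF = date(2007, 1, 1)  # initial SAFTINet load cutoff from this appendix


def active_person_ids(visits: list) -> set:
    """Collect person_source_value for every visit on or after the cutoff."""
    return {
        v["person_source_value"]
        for v in visits
        if v["visit_start_date"] >= CUTOFF
    }


def filter_to_persons(records: list, person_ids: set) -> list:
    """Keep only the records that refer to a person in the selected set."""
    return [r for r in records if r["person_source_value"] in person_ids]


visits = [
    {"person_source_value": "A", "visit_start_date": date(2011, 5, 23)},
    {"person_source_value": "B", "visit_start_date": date(2005, 3, 1)},
]
conditions = [
    {"person_source_value": "A", "condition_source_value": "162.9"},
    {"person_source_value": "B", "condition_source_value": "401.2"},
]

persons = active_person_ids(visits)
print(filter_to_persons(conditions, persons))
```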
For the following four tables, we wish to collect the specified record types. Please check the ‘Collected?’ column for
any record types that will be included in the source data file. Also, please list the local source value for that type.
Example: If the local tag for Systolic BP that will go into the observation_source_value field is ‘SBP’, put that in the
local name column where systolic BP is listed.
Record Type Minimum Date Result Type Collected? Local Name
Drug Exposure Include a record for each prescription / fill / drug administration.
Prescription
Medication List
Administered Drugs
Fulfillment
Condition Occurrence Include a record for each entry on the problem list as well as a record for each encounter level diagnosis code.
Generally, these will be ICD-9 codes.
Problem list
Visit-level diagnosis codes
ICD-9 codes from claims record
Observation Data that do not fit in another table belong here. Observation table contains data from the following categories: lab observations (i.e. test results), general clinical findings, signs, and symptoms, along with other domains listed below.
Vital Signs
Height 1/1/2007
Height Percentile (for children) 1/1/2007
Weight 1/1/2007
Weight Percentile (for children) 1/1/2007
Pulse oximetry 1/1/2007
Pulse 1/1/2007
Blood Pressure - Systolic 1/1/2007
Blood Pressure - Diastolic 1/1/2007
Social History
Smoking Status (Current/Past/Former/Second Hand Exposure) All Records / No Date Limit
Drinking Status All Records / No Date Limit
Past Medical History (To be defined)
Past Surgical History (To be defined)
Lab Results
Cholesterol 1/1/2007
LDL 1/1/2007
Alanine transaminase 1/1/2007
Albumin 1/1/2007
Alkaline Phosphatase 1/1/2007
Aspartate aminotransferase 1/1/2007
Bilirubin (Total, Indirect and Direct) 1/1/2007
Blood Urea Nitrogen_Serum 1/1/2007
Calcium-Serum 1/1/2007
CBC_% lymphocytes 1/1/2007
CBC_% Neutrophils 1/1/2007
CBC_White Blood Cell Count 1/1/2007
Chlamydia trachomatis DNA assay (procedure) 1/1/2007
Chol HDL 1/1/2007
Chol_LDL, calculated 1/1/2007
Chol_LDL, measured directly 1/1/2007
Chol_Total 1/1/2007
Creatinine_Serum 1/1/2007
Free T4 1/1/2007
Glucose, Fasting_Serum 1/1/2007
Glucose, Random_Serum 1/1/2007
Glucose_Serum 1/1/2007
Hemoglobin A1c 1/1/2007
Hemoglobin_Serum 1/1/2007
Hepatitis B core antibody 1/1/2007
Hepatitis B e antibody 1/1/2007
Hepatitis B e antigen 1/1/2007
Hepatitis B surface antibody 1/1/2007
Hepatitis B surface antigen 1/1/2007
Hepatitis C antibody 1/1/2007
Hepatitis C antigen 1/1/2007
INR 1/1/2007
Platelet Count 1/1/2007
Potassium 1/1/2007
Prostate specific antigen measurement (procedure) 1/1/2007
Pulmonary Function Test 1/1/2007
Sodium 1/1/2007
Triglycerides 1/1/2007
TSH 1/1/2007
Urinary Protein 1/1/2007
Urine microalbumin/creatinine ratio measurement (procedure) 1/1/2007
Urine protein/creatinine ratio measurement (procedure) 1/1/2007
Urine_Microalbuminuria measurement (procedure) 1/1/2007
Urine_Protein measurement (procedure) 1/1/2007
Creatinine_phosphokinase 1/1/2007
GFR, estimated 1/1/2007
influenza assay 1/1/2007
influenza rapid assay (poct) 1/1/2007
pertussis test 1/1/2007
respiratory syncytial test 1/1/2007
FEV1, pre, number 1/1/2007
FEV1, pre, percent 1/1/2007
FEV1, post, number 1/1/2007
FEV1, post, percent 1/1/2007
FVC, pre, number 1/1/2007
FVC, pre, percent 1/1/2007
FVC, post, number 1/1/2007
FVC, post, percent 1/1/2007
PFT: Peak expiratory flow 1/1/2007
Allergies
Family History
Family History of CVD 1/1/2007
Patient Reported Outcomes
Medication Adherence Survey
MAS 1a 1/1/2007 Yes/No
MAS 1b 1/1/2007 Yes/No
MAS 1c 1/1/2007 Yes/No
MAS 1d 1/1/2007 Yes/No
PHQ-9 Q7 score 1/1/2007
PHQ-9 Q8 score 1/1/2007
PHQ-9 Q9 score 1/1/2007
PHQ-9 Total score 1/1/2007
Demographic Information
Highest Education Level Achieved All Records / No Date Limit
Language Preference All Records / No Date Limit
Imputed Race / Ethnicity All Records / No Date Limit
Person % Fed Poverty level 1/1/2007
Person family size 1/1/2007
Family income 1/1/2007
Person relationship status 1/1/2007
Person Practice Status (active or moved or gone elsewhere) Most Recent / No Date Limit
Procedure Occurrence Include a record for each procedure performed on a patient (CPT-4, ICD-9-CM (Procedures), and HCPCS codes). If you
want to filter the procedure table, at least include the following procedures
Procedures
Bone mineral density (DEXA scan) 1/1/2007
Colonoscopy
Diabetic Eye Exam 1/1/2007
Diabetic Foot Exam 1/1/2007
Double contrast barium enema 1/1/2007
Mammogram 1/1/2007
Pap Smear 1/1/2007
Pulmonary Function Test 1/1/2007
Spirometry 1/1/2007
Mechanical Ventilation 1/1/2007
Continuous nebulized therapy 1/1/2007
Endotracheal intubation 1/1/2007
Critical Care 1/1/2007
Fecal occult blood test 1/1/2007
Immunizations
Pneumovax
Other Immunizations 1/1/2007
Education
Education Nutrition 1/1/2007
Education Weight loss management 1/1/2007
1. ACT and C-ACT categories should be one of the following:
1 = ACT in control (Total score > 19)
2 = ACT poorly controlled (Total score 16-19)
3 = ACT very poorly controlled (Total score < 15)
Appendix C: Sending data using flatfiles
Some users may wish to send their data in a standard flatfile as opposed to the current XML. ROSITA is
being modified to handle such files. The basic file should be a .txt style text file with columns arranged in
the order listed in this document. Individual column values should be separated by a pipe ‘|’ character. A
total of 9 files should be loaded in the initial round, one for each table in Sections 4.2-4.10. The files will be
processed in the same fashion as the current XML files (see ROSITA Admin Guide for further details).
Example: 1 row from a sample Organization file
This record (from Section 4.2):
Organization Table - XML
organization_source_value UC Internal Medicine
x_data_source_type EHR
place_of_service_source_value Academic Practice
organization_address_1 13199 E Montview Blvd
organization_address_2 Suite 300, Mail Stop F443
organization_city Aurora
organization_state CO
organization_zip 80045
organization_county Arapahoe
Should be represented as follows in the file (the actual text should be all on one line):
UC Internal Medicine|EHR|Academic Practice|13199 E Montview Blvd|Suite 300, Mail Stop F443|Aurora|CO|80045|Arapahoe
Users should apply the following rules when generating flatfiles:
- Send a separate file for each data table
- Files should be named using the following convention [table name].txt
- Column values should be separated with the | character used as a delimiter
- Files should contain one record per row. No header row is needed; the first row should be actual
data
- Quotation marks occurring within column values should be ‘escaped’ so the processor can locate
them. This should be done with the \ character – the end result should look like \”
- Backslash marks occurring within column values should also be ‘escaped’ with a second backslash;
the end result should look like \\
- Datetime values should be in the following format 2012-01-09T12:00:00Z (example: 2012-01-09
4:15:00 PM) and dates should use the following format YYYY-MM-DD
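The flatfile rules above can be sketched as a small row writer. This is a minimal illustration, not the ROSITA loader: format_value and to_flatfile_row are hypothetical names, datetime values are assumed to be in UTC already (the sketch only appends the Z suffix), and a pipe character occurring inside a value is not handled because the rules above do not say how it should be treated.

```python
from datetime import date, datetime


def format_value(value) -> str:
    """Format one column value according to the flatfile rules."""
    if value is None:
        return ""
    # datetime is tested first because datetime is a subclass of date.
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d")
    text = str(value)
    # Escape backslashes before quotes so the added backslashes are not doubled.
    text = text.replace("\\", "\\\\")
    text = text.replace('"', '\\"')
    return text


def to_flatfile_row(values) -> str:
    """Join the formatted column values with the pipe delimiter."""
    return "|".join(format_value(v) for v in values)


row = to_flatfile_row([
    "UC Internal Medicine", "EHR", "Academic Practice",
    "13199 E Montview Blvd", "Suite 300, Mail Stop F443",
    "Aurora", "CO", "80045", "Arapahoe",
])
print(row)
```

Each table would be written to its own [table name].txt file, one such row per record and no header row.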