SNOMED CT in Release Format 2 ‘RF2’
Implications for different constituencies
Presented 22nd July 2013, Tom Seabury
Objectives
• To share a sense of how SNOMED CT in RF2 can be fed into the terminology subsystems of an EHR system
• To give an RF2 comparison to the RF1 approaches – for the same tasks
Sources
A comprehensive exposition on RF2 is included in the SNOMED CT Technical Implementation Guide, as the section
• Release Format 2 Update Guide
• Information relating specifically to the UKTC RF2 distribution of its content is from the UKTC team
The impossible 40 minute Webinar?
Webinar Topics
• Basics: assumed knowledge from prior Webinars
• RF2 described & illustrated
• Policy and plans: UKTC plans
• RF2 compared to the pre-existing RF1
• How do I …: Tools
Technical subjects
• New things, redundant things
• Policy and tools
• Major differences
• ‘Concept Enumeration’
• Change log, history mechanism
• RF2 release types
• Content : Files : Folder structures
• Implementation Schemas
• Preferred terms, preferred FSNs
Assumed prior knowledge or experience
RF1
– Awareness, or
– Experience of use, or
– Technical expertise
SNOMED CT basics
– The concept : descriptions : relationships scheme
– The existence of an inactivation (‘history’) mechanism
– Subsets (Refsets in RF2)
– UK Extension content, its packing and distribution structures
Unfamiliar words or concepts?
During the webinar, please do ask for clarification if you encounter unfamiliar words or uses, or seemingly contradictory statements.
e.g. ‘Component’ has a specific meaning: any of Concept, Description, Relationship, Refset
e.g. ‘stated relationship’: a relationship as stated by one of SNOMED CT’s authors, not one subsequently inferred from other relationships by some classifier tool.
For Clinical users
For: Clinical users of systems exploiting SNOMED CT
RF2 has: No direct impacts – nothing will be immediately apparent
For informaticians (clinical or otherwise)
For: Clinical / Informatician users
– Those who configure systems exploiting SNOMED CT
RF2 is:
– Relevant to which tools are used
 – e.g. Refset editor vs. Subset editor
 – e.g. detection of inactive content in templates, queries or results
For Developers and system developers
For: Software & System Developers
RF2 will need to be understood:
– Its format
– The potential inherent in the available content
– Which are the appropriate parts of SNOMED CT content to be correctly ingested and transformed
– Any new Refset types (not needing revision to the RF2 standard) will need to be tracked and impact assessed.
For Content Developers
For: UKTC and other developers of SNOMED CT content, i.e. owners of their own SNOMED CT namespace
RF2: content developers
– Will be conformant to RF2 in the development and maintenance of content and of metadata
– Will be conformant to RF2 for the distribution of content
– Can consider whether to embed RF2 techniques natively into authoring tools, or whether to rely on data transformations
Basics revisited
SNOMED CT Content, Release Formats
SNOMED CT is a large set of reference data
The UK Edition is distributed by the UKTC (via TRUD)
Distribution formats are standardised
In addition: local choices for ways to pack content are in use
SNOMED CT UK Edition
The whole of the SNOMED CT UK Edition is distributed via TRUD
A definition of ‘UK Edition’:
• the International Release,
• the UK Clinical Extension and
• the UK Drug Extension
Data
Content of SNOMED CT is distributed as
• a collection of SNOMED CT data files of different types
Files contain
– the core SNOMED CT tables
– sets of data components (in Refsets)
– sets of metadata components (in Refsets)
Refsets (RF2) or Subsets (RF1): sets of things or, more exactly, collections of references to things, e.g. a set of concepts cherry-picked from the whole of SNOMED CT
‘Tables’ and ‘Files’ are used partly interchangeably
SNOMED CT Content, Release Formats
RF2 (and its predecessor RF1) are Standardised Release Formats for the content of SNOMED CT
These are:
• 99% product and platform neutral (exception: DOS end-of-line characters)
• Exclusively for use with SNOMED CT
• Formalised in IHTSDO documentation as SNOMED CT Standards
• Independent of the content which they distribute*
SNOMED CT Content, Release Formats
An RF2 (Full) release contains: all of the past states of all the things which were ever in SNOMED CT UK Edition
By contrast, an RF1 release contains: all the things which were ever in SNOMED CT UK Edition, in their current state
(An RF2 Snapshot is similar to RF1 in that it has only the current status of any component. An RF2 Snapshot is, however, different from RF1 in many ways described later)
The UKTC use of RF1 and RF2
UK
• UKTC relied exclusively on RF1 until October 2012
• UK RF1 and RF2 will co-exist for no less than three years from October 2012; the UK RF1 distribution is currently the definitive version.
International
• Deprecation of RF1 by IHTSDO is being considered; IHTSDO wish to distribute the international core content exclusively in RF2
RF2 described and illustrated
SNOMED CT content in RF2
RF2 is being used by UKTC to distribute:
– Core content
 • Concepts
 • Descriptions
 • Relationships
– Sets (formerly ‘Subsets’, now ‘Refsets’)
 • Realm description Refset
 • ‘Non-Human’ concepts set
– Cross-maps
 • e.g. to ICD-10
What structures and standards?
RF2 standardises:
• Data types
• Attributes used, and their meaning
• File types and naming (carrying numerous fragments of information)
It is used to represent:
• Core components
• Reference sets (aka ‘Refsets’)
• Essential functionality (such as language specificity, historical status changes and associations)
Concurrent to RF2 …
• Introduction of ‘Module’
– and the Module Dependency Refset
• ‘Active’ field
– Each component in RF2 has an associated active field
– Values of true ('1') or false ('0')
– Use to filter out inactive content where appropriate
NB It is not always most appropriate to filter out inactive descriptions or concepts
Language of release formats
• ‘State Valid’: date-stamped records
• ‘Refset’: cf. Subset
• ‘Concept Enumeration’: self-referencing
• ‘Delta, Snapshot, Full’: release types in RF2
• ‘Module’: other new & existing metadata
• ‘Extensibility’: distribute anything
‘State Valid’ illustrated
[Figure: ‘State Valid’ log, with callouts]
• Red text signifies what has changed between entries (for illustration only; the data is NOT colour coded)
• The log is ordered here in reverse chronological order (SNOMED CT files have no defined ordering of rows)
• This is the first ever entry
• Ownership changes, so a new ‘Module’ association is recorded
• Modelling is improved and it becomes ‘fully defined’
• And it is subsequently inactivated
Refset patterns (RF2): UK map pattern
UK Cross-maps ICD-10
RF1 Subset Patterns
Reference Set names
• The labels for Refsets can be more verbose
• Addition of text to indicate the Refset type, e.g. ‘Family history simple reference set’
– reference set (foundation metadata concept)
– simple reference set (foundation metadata concept)
– Family history simple reference set
Family history
Implied purpose
Explicit on formatting
Purpose clarified
• in UK release documentation
• on Subset register
RF2 Concept Enumerations Vs. RF1 arbitrary integers
• RF2: concept enumerations are used across all release files.
• RF2 uses concepts in a metadata hierarchy to represent an enumerated value set, rather than using arbitrary integer values (as in RF1)
• Enumerated values take the SCTID data type
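To make the contrast concrete, here is a minimal sketch that maps an RF1-style ISPRIMITIVE integer onto an RF2-style concept enumeration value. The function name is hypothetical, and the two metadata SCTIDs are assumed to be the International definition-status concepts; they should be checked against the metadata hierarchy of the release actually in use.

```python
# Metadata concept identifiers (an assumption of this sketch; verify them
# against the metadata hierarchy of the release you are loading).
PRIMITIVE = "900000000000074008"      # definition status: primitive
FULLY_DEFINED = "900000000000073002"  # definition status: fully defined


def rf1_isprimitive_to_rf2(is_primitive: int) -> str:
    """Map the RF1 arbitrary integer (1 = primitive, 0 = fully defined)
    onto an RF2 concept enumeration value held as an SCTID string."""
    if is_primitive not in (0, 1):
        raise ValueError(f"unexpected ISPRIMITIVE value: {is_primitive!r}")
    return PRIMITIVE if is_primitive == 1 else FULLY_DEFINED


if __name__ == "__main__":
    print(rf1_isprimitive_to_rf2(1))
```

Because the RF2 value is itself a concept, its meaning can be looked up in the same Descriptions data as any other concept, rather than in separate documentation of integer codes.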
RF2 Concept Enumerations (and other Metadata)
The metadata hierarchy
Zips, Files and folders
Zips, Files and folders
This International release is not the baseline release, so it is legitimate to include a Delta
Choices of:
Which of these am I likely to need?
• Choice is between a current Snapshot and a current Full
But what is mandated of UKTC?
• The full view is required to support some SNOMED CT use cases, but many requirements can be adequately met by providing access to a current Snapshot view.
However:
• ‘A SNOMED CT-enabled terminology server must be able to import data from a full release because this is the only Release Type that is required to be produced by all Extension developers’
Metadata values illustrated – Core ones
Metadata values illustrated - Module
Metadata values – carried over from RF1
UKTC distribution structures
UKTC has always added further structure beyond that mandated by the standard
• e.g.
– TRUD Packs and Subpacks
– File:content strategy
 • e.g. which extension in what file
 • e.g. which sets in what file
• No changes for RF2 introduction: RF1 file and folder structures are replicated (UK RF1 <> UK RF2)
What does RF2 look like
197 files but Don’t Panic!
RF2
• Zipped structure
• Unzipped structure
– folder structures
– file names
• What is found where?
– Data
– Metadata
– Continuity of access, as per RF1: 2 full releases and each of the ‘incremental releases’ between them.
Familiar UK Folder names?
UK Drug Extension
Illustration of Refset content & metadata
Policy and plans
• UKTC policy, IHTSDO policy
• Transformability
• Beyond the UK
UKTC policy – released data
• UKTC enjoys some latitude in its cut-over from RF1 to RF2
• The planned period of concurrent running of RF1 plus RF2 (RF2 currently having technical preview status) will terminate in October 2015
• UK Edition in RF2: Status ‘Technical Preview’
– UK RF2 baseline release July 2013
– Scope: including UK cross-maps
UKTC policy – tooling
• UKTC currently performs a conversion between RF1 and RF2 using tools and configuration data which are not themselves distributed.
• UKTC has no current plans to distribute these tools and configuration data
(UKTC continues to author terminology content in tools which are not tied to any particular release format)
IHTSDO stated policy
IHTSDO
• RF2 ‘Developed in response to extensive feedback on’ RF1
• RF1 format was replaced by RF2 in January 2012; RF1 ‘is being maintained for a transitional period’ (SNOMED CT® Technical Implementation Guide, January 2013)
UKTC policy reflects this, but is different
Transformability
• Content is being transformed from RF1 to RF2 by UKTC
• Content is being transformed from RF2 to RF1 by IHTSDO
These transformations:
– Require a fraction of prepared, different metadata for each format
– Require tables of equivalence for some metadata, such as versioning of membership of Refsets and Subsets
– For the UK Edition: are subject to a set of documented deviations provided by UKTC within the RF2 release note.
Forward Compatibility
Today’s tools, tomorrow’s data (RF2 distributed)
– UKTC distribution in both RF1 and RF2
– UKTC distribution in RF2 exclusively
 +/- metadata which is specific to and essential for RF1
 +/- tools to allow you to generate RF1 (should you need to)
Back Compatibility
Tomorrow’s tools, today’s data (RF1 distributed)
• UKTC, UK Edition
• RF2 metadata present (from RF2 files)
• RF1 metadata present
• Providing Back compatibility
SNOMED CT Release Formats – stability
(Jan 2013) Stability
‘The RF2 format is likely to be stable for at least a five year period, without addition or deletion of fields’
Stable Extensibility
The Refset mechanism permits new Refset types to be used without change to the core standard (extensibility)
Tools
Over time, as RF2 becomes the primary distribution format in the UK, tools will be developed to enhance the ability to process data in this format more easily.
This will include
• Refset development
• mapping tools
• a concept editing environment
Similarities & differences RF2 and RF1
Contrast
RF2 supports things which are unavailable from RF1:
• Refset extensibility – a constrained set of novel types can be added
RF2 supports things which are differently available from RF1:
• Component history
Extensive documentation of the value and benefits is made by IHTSDO : http://www.snomed.org/guide/rf2value.pdf
Contrast
Identification of the origin of a component
RF1 – NamespaceID (embedded into the component ID)
RF2 – ModuleID (a newly added metadata item)
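To show what "embedded into the component ID" means in practice, the sketch below splits an SCTID into its parts. It assumes the published SCTID layout (final check digit, two-digit partition identifier before it, and a seven-digit namespace before that when the first partition digit is 1); the function and type names are hypothetical, and the check digit is not verified here. In RF2 the origin is instead read directly from the moduleId column of each row.

```python
from typing import NamedTuple, Optional


class SctidParts(NamedTuple):
    item: str                 # item identifier digits
    namespace: Optional[str]  # 7 digits, or None for short-format IDs
    partition: str            # 2 digits, e.g. "00" or "10" for a concept
    check_digit: str          # final digit (not verified in this sketch)


def split_sctid(sctid: str) -> SctidParts:
    """Split an SCTID string into item, namespace, partition and check digit."""
    if not (sctid.isascii() and sctid.isdigit() and 6 <= len(sctid) <= 18):
        raise ValueError(f"not a plausible SCTID: {sctid!r}")
    body, partition, check_digit = sctid[:-3], sctid[-3:-1], sctid[-1]
    if partition[0] == "1":
        # Long format: a 7-digit namespace sits just before the partition.
        if len(body) <= 7:
            raise ValueError(f"too short for a long-format SCTID: {sctid!r}")
        return SctidParts(body[:-7], body[-7:], partition, check_digit)
    # Short format: no namespace is embedded.
    return SctidParts(body, None, partition, check_digit)


if __name__ == "__main__":
    # The NHS Realm Description Refset identifier quoted later in this deck.
    print(split_sctid("999001261000000100"))
```

Applied to the Refset identifier 999001261000000100, the split yields namespace 1000000, which matches the GB1000000 element visible in the UK file and folder names.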
In RF1 but not in RF2
PartitionIDs not in RF2
• 03 A Subset
• 04 A Cross Map Set
• 05 A Cross Map Target
In RF1 but not in RF2
• CTV3ID and SNOMEDID (to Refsets)
• Single FULLYSPECIFIEDNAME
• ISPRIMITIVE (to Refset)
• REFINABILITY (field in RELATIONSHIP file, to Refset)
Content available only in RF2
• Non-human Refset
• Metadata: Module
Implementation Schemas
Implementation Schemas (1)
Implementation Schema (2)
Does HSCIC recommend that RF2 (or RF1) is used as the implementation schema?
– No
Could RF2 be used as the implementation schema?
– Perhaps, but it’s principally for distribution
Populating an implementation schema
Combine files of like types
– Concept (x3)
– Description (x3)
– Relationship (x3)
Apply parts of the data as distributed in Refsets
– Historical relationships in addition to Relationship table
– UK Language preferences given precedence over international
– UK Preferred terms applied
Content Refsets
All other relevant data
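The first step above, combining files of like types, can be sketched as follows. This is a minimal illustration that assumes the "x3" refers to one file per part of the UK Edition (International, UK Clinical, UK Drug); the function name, identifiers and module labels are placeholders rather than real content, and in-memory streams stand in for the distributed files.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def combine_like_files(streams: Iterable[TextIO]) -> List[Dict[str, str]]:
    """Concatenate the rows of several RF2 files of one type into one list."""
    header = None
    combined: List[Dict[str, str]] = []
    for stream in streams:
        # RF2 files are tab delimited and do not use quoting.
        reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("files of like type should share one header row")
        combined.extend(reader)
    return combined


# Hypothetical miniature Concept files (placeholder identifiers).
HEADER = "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\r\n"
international = io.StringIO(HEADER + "C1\t20130131\t1\tM_INT\tDS1\r\n")
uk_clinical = io.StringIO(HEADER + "C2\t20130401\t1\tM_UKC\tDS1\r\n")
uk_drug = io.StringIO(HEADER + "C3\t20130401\t0\tM_UKD\tDS1\r\n")

concepts = combine_like_files([international, uk_clinical, uk_drug])
print([row["id"] for row in concepts])
```

The same function would serve for Description and Relationship files, since the header check only requires that the files being combined share one layout.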
Release format Vs. Implementation Format
Distribution
• Normalised, no data duplication
• Extensive
• Distributed in a normalised format
Implementation
• Partly denormalised
• Denormalisation (performance)
• Re-indexed
• Filtered
• Partitioned
Release format Vs. Implementation Format
Distribution
• Inclusion of all data
Implementation
• Removal of unnecessary data (for the given application)
Most solutions are likely to be record-entry centric, hence mostly it will be the active components which are actually relevant
Populating your implementation schema
[Figure: Release Format 1, Release Format 2 Snapshot and Release Format 2 Full, each ‘operating on’ SNOMED CT reference data of the SNOMED CT UK Edition]
De-normalisation
Combination
Substitute: Own file and folder skeleton
Data-reconciliation
Core Tables …
Update descriptions table with UK description preferences (unpack these from Refsets)
? Substitute:
– Own scheme for metadata, e.g. Refsets > local value sets
– Own scheme for component status
– Own scheme for component history
? Add back: Own interface terms
Detect and accommodate any new Refset types found
Neutral distribution format
SNOMED CT Implementation schema: reference data in a local schema for local needs
Staging database (distribution schema)
RF2 operations on reference data
RF1 operations on reference data
De-normalisation
Combination
Substitute: Own file and folder skeleton
Data-reconciliation
Core Tables: Rip and replace
Update descriptions table with UK description preferences (unpack these from Subsets)
? Substitute:
– Own scheme for metadata, e.g. Subsets > local value sets
– Own scheme for component status
– Own scheme for component history
? Add back: Own interface terms
Current status data table style (< 99% of implementations)
Rip out & replace existing reference data
Log style reference data database?
Almost no-one: merely append new reference data
No import tooling?
You may wish to just ‘get at’ a Refset out of the raw data
• How to? Tools you will rely on
– Initially: File & Refset manifest, or
– Lookup tables between Refset names and their identifiers: cut & paste (or search the Descriptions Table)
– Search in the Reference Set Descriptor Reference Set to identify the file pattern (or alternatively by seeking the Refset supertype in the Metadata hierarchy)
– Search within the files of the given pattern
– (if data for one Refset has been partitioned across multiple files: recombine it)
– Filter the results for only the active content
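The last three steps above (search the files of the given pattern, recombine, filter for active content) can be sketched as below. This is an illustration only: it assumes Snapshot files, so that each member appears once, and it uses placeholder Refset and component identifiers and a hypothetical function name.

```python
import csv
import io
from typing import Iterable, List, TextIO


def active_refset_members(streams: Iterable[TextIO], refset_id: str) -> List[str]:
    """Return referencedComponentId values of active members of one Refset.

    Several streams may be passed, which recombines a Refset whose data
    has been partitioned across multiple files.
    """
    members: List[str] = []
    for stream in streams:
        reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["refsetId"] == refset_id and row["active"] == "1":
                members.append(row["referencedComponentId"])
    return members


# Hypothetical Snapshot content for two simple Refsets, R1 and R2.
HEADER = "id\teffectiveTime\tactive\tmoduleId\trefsetId\treferencedComponentId\r\n"
file_a = io.StringIO(
    HEADER
    + "m1\t20130401\t1\tM\tR1\tC1\r\n"
    + "m2\t20130401\t0\tM\tR1\tC2\r\n"
    + "m3\t20130401\t1\tM\tR2\tC3\r\n"
)
file_b = io.StringIO(HEADER + "m4\t20130401\t1\tM\tR1\tC4\r\n")

print(active_refset_members([file_a, file_b], "R1"))
```

Against a Full file the same filter would be misleading, because superseded rows would also match; the most recent row per member id would need to be selected first.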
RF2 :: RF1
RF2 files
Reference set files
– Primary grouping of Refsets is driven by their data format (i.e. not their common field of use)
– Second axis of grouping can be by utility / area of application
– Field of use clustering of files and data
 • can lead to the same Refset distributed more than once in a given release
Difference in files included
RF2 possible file types are:
• Concept
• Description
• Relationship
• Identifier
• Refset (all subtypes)
RF1 possible file types are:
• Concepts
• Descriptions
• Relationships
• ComponentHistory
• References
• Subsets
• SubsetMembers
• CrossMapSets
• CrossMaps
• CrossMapTargets
• TextDefinitions
• Canonical
• DualKeyIndex
• WordKeyIndex
• StatedRelationships
RF2 Refset file types
Fixed patterns
• reference set descriptor
• module dependency
• description format
Extendable patterns (addition of fields)
• attribute value type
• simple map
• language type
• query specification type
• annotation type
• association type
An RF1 release contains no less than 11 files
An RF2 release contains no less than 14 files; no upper limit
Extension example: CTV3 map: | Simple map | (S)
Any number and combination of (C) (I) (S) additional fields, e.g. | Complex map type | (IISSSC)
Differences for a recipient of RF2 (Vs. RF1) (2)
• Choosing a storage structure for Refsets is different to the challenge for RF1 Subsets
• Extensibility of Refsets in RF2 dictates that each of the finite number of Refset patterns must be accommodated into part of the storage schema.
• These different Refset patterns may each be held in a different data table structured for the purpose of that particular Refset pattern.
• The extensibility of RF2 however allows the addition of new Refset patterns, these conform to the standard and are not tied to a revision of the standard. Consequently
What sets can be together in one distribution file?
RF1: Same Subset Type
RF2: Same Refset pattern
Distribution folder structure for sets
Distribution of sets within Refset distribution files
RF1 (Sub)sets, UKTC convention: one file per Subset
RF2 (Ref)sets, UKTC convention: one file per collection of Refsets (perhaps by Refset pattern)
Files
• UTF-8 encoded
• tab delimited
• text files
• contain a column header row, providing field names for each column within the file
• Lower camel case is used for the field names (e.g. moduleId, effectiveTime)
• use DOS style line termination: each line is terminated with a carriage return character followed by a line feed character
• Should have a last line that ends with a line terminator (CR/LF) before the end of file
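A reader that respects the file properties listed above could look like the following minimal sketch. The function name and the sample row are hypothetical; only the stated properties (UTF-8, tab delimited, header row, CR/LF termination including the last line) are taken from the slide.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_rf2(raw: bytes) -> Tuple[List[str], List[Dict[str, str]]]:
    """Decode and parse the bytes of one RF2 file into (header, rows)."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    if not text.endswith("\r\n"):
        raise ValueError("the last line should end with CR/LF")
    # newline="" hands the CR/LF pairs to the csv module untouched.
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    header = next(reader)  # column header row with lower camel case names
    rows = [dict(zip(header, fields)) for fields in reader]
    return header, rows


# Hypothetical two-line file: header row plus one data row.
sample = (
    "id\teffectiveTime\tactive\tmoduleId\r\n"
    "C1\t20130401\t1\tM1\r\n"
).encode("utf-8")

header, rows = read_rf2(sample)
print(header)
print(rows[0]["effectiveTime"])
```

Quoting is switched off because a term may legitimately contain a double-quote character, which a default CSV reader would otherwise try to interpret.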
RF2 - History of each component
In RF2 all changes in components are represented by adding a row (same component ID) with:
– a new effective time
– any necessary change in the component values.
Rows are added:
– For changes which get into the ‘release’ data
– Not one row for each and every change made by SNOMED CT authors in between releases
Refsets and Active field values
• Refsets as distributed in RF2 contain components which are both active and inactive, according to the value in the ‘Active’ field.
• For a full release it is possible, using the applicable date range for each row, to identify the members of a Refset at any past time.
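The point about past membership can be sketched as below: for each member id, take the most recent row on or before the date of interest, then keep those that are active. The function name and all identifiers are placeholders; effectiveTime is assumed to be a YYYYMMDD string, which compares correctly as text.

```python
from typing import Dict, Iterable, Set


def members_at(rows: Iterable[Dict[str, str]], refset_id: str, as_of: str) -> Set[str]:
    """Return the components that were active members of a Refset on a date."""
    latest: Dict[str, Dict[str, str]] = {}
    for row in rows:
        if row["refsetId"] != refset_id or row["effectiveTime"] > as_of:
            continue  # other Refset, or a state that begins after the date
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return {r["referencedComponentId"] for r in latest.values() if r["active"] == "1"}


# Hypothetical Full-release rows for one Refset "R".
full_rows = [
    {"id": "m1", "effectiveTime": "20120401", "active": "1",
     "refsetId": "R", "referencedComponentId": "c1"},
    {"id": "m1", "effectiveTime": "20130401", "active": "0",
     "refsetId": "R", "referencedComponentId": "c1"},
    {"id": "m2", "effectiveTime": "20121001", "active": "1",
     "refsetId": "R", "referencedComponentId": "c2"},
]

print(sorted(members_at(full_rows, "R", "20121231")))
```

This only works against Full data; a Snapshot has already discarded the earlier rows that the date filter relies on.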
Release Types
"Full" release
• each file containing every version of every component ever released.
"Snapshot" release
• containing only the most recent version of every component ever released (both active and inactive components). A single snapshot provides access to a single release version and this ‘closely matches’ the view provided by the original SNOMED CT release format (RF1)
"Delta" release
• containing only component versions created since the last release. Each component version represents a new component or a change in an existing component.
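The three definitions can be read as two simple derivations from Full data, sketched here under the assumption that rows are dictionaries with `id` and a YYYYMMDD `effectiveTime`. The function names and sample rows are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def snapshot(full_rows: Iterable[Row]) -> List[Row]:
    """Keep only the most recent version of every component."""
    latest: Dict[str, Row] = {}
    for row in full_rows:
        kept = latest.get(row["id"])
        if kept is None or row["effectiveTime"] > kept["effectiveTime"]:
            latest[row["id"]] = row
    return list(latest.values())


def delta(full_rows: Iterable[Row], previous_release: str) -> List[Row]:
    """Keep only versions created since the previous release date."""
    return [row for row in full_rows if row["effectiveTime"] > previous_release]


# Hypothetical Full content: C1 is added then inactivated; C2 is added.
full = [
    {"id": "C1", "effectiveTime": "20120401", "active": "1"},
    {"id": "C1", "effectiveTime": "20130401", "active": "0"},
    {"id": "C2", "effectiveTime": "20121001", "active": "1"},
]

print(snapshot(full))
print(delta(full, "20121001"))
```

Note that the Snapshot still contains C1, now marked inactive, which is the "both active and inactive components" point in the definition above.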
Combinations of release types
• First ever Full release (‘Baseline’) + every subsequent Delta = current Full release
• Snapshot + Deltas = incomplete Full
• Delta alone is valueless
If your system has transaction tracking for the reference terminology itself, you may prefer to append Deltas rather than to Rip & Replace the Full release at each release
If you rely on Snapshot releases, then you may need to Rip & Replace the entire snapshot at each release
(being aware that you may lose past versions of Refsets which may still be current)
Application of an incomplete set of Deltas can be misleading
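The arithmetic of combining release types can be sketched as an append keyed on the pair (id, effectiveTime), which is assumed here to identify one component version. The function name and sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def apply_deltas(baseline_full: Iterable[Row], deltas: Iterable[Iterable[Row]]) -> List[Row]:
    """Append Delta rows to a baseline Full release, one row per version."""
    combined: Dict[Tuple[str, str], Row] = {
        (row["id"], row["effectiveTime"]): row for row in baseline_full
    }
    for one_delta in deltas:
        for row in one_delta:
            combined[(row["id"], row["effectiveTime"])] = row
    # Sorted only to give a stable order; RF2 files define no row ordering.
    return sorted(combined.values(), key=lambda r: (r["id"], r["effectiveTime"]))


# Hypothetical baseline and two subsequent Deltas.
baseline = [{"id": "C1", "effectiveTime": "20120401", "active": "1"}]
delta_oct = [{"id": "C2", "effectiveTime": "20121001", "active": "1"}]
delta_apr = [{"id": "C1", "effectiveTime": "20130401", "active": "0"}]

current_full = apply_deltas(baseline, [delta_oct, delta_apr])
incomplete = apply_deltas(baseline, [delta_apr])  # October Delta skipped
print(len(current_full), len(incomplete))
```

In the second call the result looks well formed but silently lacks C2, which is the sense in which an incomplete set of Deltas can mislead.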
Full release data for Refsets
This UID is unique to each unique pairing of …
‘active’ means that the row is active in this Refset
It’s not a surrogate or repeat of the concept’s own active status
But … it’s not permissible to distribute a component as an active Refset member if the component itself is not active at that time
[Figure: Snapshot and History rows]
Refset distribution files
• Any RF2 file containing Refsets can only contain one type of Refset, e.g. a file which holds exclusively ‘ssRefset’ content, having two additional columns, both holding String values
• The name indicates the attributes held in the file, from any number of
 – Component
 – String
 – Integer
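As an illustration of how the file name carries the pattern, the sketch below reads the letters that precede "Refset" in a file name and expands them to field types. The regular expression is an assumption based on the file name quoted in this deck (optional leading "x", then "der2_", then lowercase pattern letters); the second and third file names are hypothetical.

```python
import re
from typing import List

FIELD_TYPES = {"c": "Component", "i": "Integer", "s": "String"}

# Assumed naming shape: [x]der2_<pattern letters>Refset_<rest of name>
_PATTERN = re.compile(r"^x?der2_([cis]*)Refset_")


def additional_fields(file_name: str) -> List[str]:
    """List the types of the additional columns implied by a Refset file name."""
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"not recognised as a Refset file name: {file_name!r}")
    return [FIELD_TYPES[letter] for letter in match.group(1)]


if __name__ == "__main__":
    print(additional_fields(
        "xder2_cRefset_NHSRealmDescriptionLanguageFull_GB1000000_yyyymmdd.txt"
    ))
    # Hypothetical name for a pattern with two String columns.
    print(additional_fields("der2_ssRefset_ExampleFull_GB1000000_20130701.txt"))
```

An import routine could use the returned list to pick, or create, the storage table that suits that Refset pattern.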
Preferred Terms
• RF2 does not have a Description type value “Preferred Term”, only types of “Fully specified name” and “Synonym”, where the latter may be refined either to a “Preferred term” or to a “Synonym” within a language reference set. As a result of this change, in RF2 the preference for particular Descriptions in a language or dialect will be represented in the language reference set, and not in the descriptions table.
Preferred Terms in RF2
(The RF1 release files contain within the core tables identification of just one Preferred Term and one Fully Specified Name per concept)
• The international Edition in RF2 does not identify one Preferred Term per concept
• To identify a Preferred Term from RF2 data it is essential to combine information from a Language Refset along with data in the core tables.
Preferred Terms in RF2
UK Edition in RF2 identifies the UK Preferred Terms via:
• Descriptions.Description.type=Synonym
• RefSet.Acceptability=Preferred
 (RefsetID 999001261000000100)
 (Refset file name = xder2_cRefset_NHSRealmDescriptionLanguageFull_GB1000000_yyyymmdd.txt)
 (Path….\SNOMEDRF2\1.0.0\NHS_SNOMEDRF2\SnomedCT_GB1000000_20121001\RF2Release\Full\Refset\Content\NHSRealmDescription)
There are no restrictions against the identification of alternative preferred terms in Refset(s) and using these as an alternative to the UKTC provided one.
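The combination of a Language Refset with the Descriptions data can be sketched as follows. This assumes Snapshot rows held as dictionaries, and it assumes the International metadata SCTIDs for Synonym and Preferred, which should be verified against the release in use; the function name, descriptions and terms are invented for illustration. Because the Refset identifier is a parameter, an alternative Refset can be substituted for the UKTC one.

```python
from typing import Dict, Iterable

Row = Dict[str, str]

# Metadata concept identifiers (assumed; verify against the release).
SYNONYM = "900000000000013009"
PREFERRED = "900000000000548007"
ACCEPTABLE = "900000000000549004"
FSN = "900000000000003001"

NHS_REALM_DESCRIPTION_REFSET = "999001261000000100"  # quoted in this deck


def preferred_terms(descriptions: Iterable[Row], language_members: Iterable[Row],
                    refset_id: str) -> Dict[str, str]:
    """Map conceptId to its preferred term according to one Language Refset."""
    preferred_ids = {
        m["referencedComponentId"]
        for m in language_members
        if m["refsetId"] == refset_id
        and m["active"] == "1"
        and m["acceptabilityId"] == PREFERRED
    }
    return {
        d["conceptId"]: d["term"]
        for d in descriptions
        if d["active"] == "1" and d["typeId"] == SYNONYM and d["id"] in preferred_ids
    }


# Hypothetical Snapshot rows for one concept.
descriptions = [
    {"id": "D1", "active": "1", "conceptId": "C1", "typeId": FSN,
     "term": "Example finding (finding)"},
    {"id": "D2", "active": "1", "conceptId": "C1", "typeId": SYNONYM,
     "term": "Example finding"},
    {"id": "D3", "active": "1", "conceptId": "C1", "typeId": SYNONYM,
     "term": "Sample finding"},
]
language_members = [
    {"refsetId": NHS_REALM_DESCRIPTION_REFSET, "active": "1",
     "referencedComponentId": "D2", "acceptabilityId": PREFERRED},
    {"refsetId": NHS_REALM_DESCRIPTION_REFSET, "active": "1",
     "referencedComponentId": "D3", "acceptabilityId": ACCEPTABLE},
]

print(preferred_terms(descriptions, language_members, NHS_REALM_DESCRIPTION_REFSET))
```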
NB existing UK documentation states:
Although supporting a number of description re-prioritisations (Realm-specific promotions of descriptions to ‘preferred term’ description-type) the present NHS Realm Description Subset is best thought of as a mechanism to satisfy the ‘one and only one fully-specified name & preferred term’ schema constraints for the UK data
What needs further exploration?
Technical subjects
• New things, redundant things
• Policy and tools
• Major differences
• ‘Concept Enumeration’
• Change log, history mechanism
• RF2 release types
• Content : Files : Folder structures
• Implementation Schemas
• Preferred terms, preferred FSNs
How did we do? Speak to us
Routes by which you might wish to engage:
• Person to person; orientation (via: [email protected], [email protected] )
• NHS Networks / SNOMED CT: http://www.networks.nhs.uk/nhs-networks/snomed-ct (useful even if download speeds are slow)
• UKTC Implementation Forum(open to all, join via: [email protected] )
• Helpdesk [email protected]
Q&A
Q) Has RF2 any impact on dm+d?
A) No, dm+d is unaffected
- no further questions were received during the Webinar