OASIS Electronic Trial Master File Standard Technical Committee
Content Classification Layer
January 20, 2014, 9:00 – 10:00 AM PST
Agenda
Topic | Presenter
9:00-9:05 Call to Order & Roll Call | Zack Schmidt
9:05-9:10 Approval of Minutes https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf | All
TC Process and Administration (deferred) | Chet Ensign
9:10-9:20 Outreach Subcommittee - All | Jennifer Alpert
9:20-9:50 Tech presentation – Content Classification Layer | Z. Schmidt/Aliaa
9:50-9:55 New Business | All
9:55-10:00 Next meeting agenda / Date | Z. Schmidt
Roll Call
Name | Company | Voting Status | Present?
Jennifer Alpert Palchak | CareLex | Voter | y
Aliaa Badr | CareLex | Voter | y
Oleksiy (Alex) Palinkash | CareLex | Voter | y
Troy Jacobson | Forte Research | Voter | y
Lou Chappuie | Individual | Voter | y
Lisa Mulcahy | Individual | Non-Voter | y
Robert Gehrke | Mayo Clinic | Voter | n
Rich Lustig | Oracle | Non-Voter | y
Michael Agard | Paragon Solutions | Non-Voter | y
Christopher McSpiritt | Paragon Solutions | Non-Voter | y
Jamie O’Keefe | Paragon Solutions | Non-Voter | n
Fran Ross | Paragon Solutions | Non-Voter | y
Peter Alterman | SAFE-BioPharma | Voter | y
Catherine Schmidt | SterlingBio | Voter | y
Zack Schmidt | SureClinical | Voter | y
Trish Whetzel, PhD | SureClinical | Non-Voter | y
Peter Junge | Beijing Sursen | Observer | n
Laura Hilty | Forte Research | Observer | n
Tony O’Hare | Forte Research | Observer | n
Eldin Rammell | Rammell Consulting | Observer | n
Robin Cover | OASIS staff | Non-Voter | n
Chet Ensign | OASIS staff | Non-Voter | n
Meeting Etiquette
• Announce your name prior to making comments or suggestions
• Keep your phone on mute when not speaking (#6)
• Do not put your phone on hold
– Hang up and dial in again when finished with your other call
– Hold = Elevator Music = very frustrated speakers and participants
• Meetings will be recorded and posted
– Another reason to keep your phone on mute when not speaking!
• Use the join.me “Chat” feature for questions / comments / Votes
• We will follow Robert’s Rules of Order
NOTE: This meeting is being recorded and minutes will be posted on TC page after the meeting
From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute
Outreach Subcommittee
• Status
– New Members:
– Oracle – Joined
– In Progress: EMC, Kaiser Permanente, Shire, Medtronic
• Activities / Milestones
Tech Discussion
• Status
• Timeline
• In parallel with other Tech work from charter
Content Classification System Discussion
Classification System Components:
• Classification Categories
– Taxonomy, hierarchy
• Metadata (‘Tags’)
– Characterizes content
• Content Model
– Published set of classifications, metadata for a domain (e.g., eTMF)
Classification Categories Component
– Hierarchy of categories
• Categories, subcategories, content types
– Defined relationships with rules: Parent-Child
– All categories, content types required to have unique names and machine codes
– Each content type is associated with Metadata Properties (includes core and domain-specific)
– Content items are linked to content types.
– Unique classification and term codes based on Universal Decimal Classification System (UDC) numbering, which is widely used in libraries worldwide. Codes are human and machine readable and infinitely expandable
– Can be described, edited and validated using an OWL editor (such as the open source editor Protégé)
– Supports any simple text vocabulary, including TMF Ref Model and other vocabularies
– W3C OWL2 and RDF/XML supported
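The parent-child rules and the uniqueness requirement listed above can be sketched as a small in-memory registry. This is only an illustration, not the committee's data model: the class names `Node` and `Taxonomy`, the dotted code strings, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """One hierarchy entry: a category, subcategory, or content type."""
    code: str                                  # machine code (illustrative format)
    name: str                                  # unique human-readable name
    kind: str                                  # "category", "subcategory", "content_type"
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


class Taxonomy:
    """Registry that enforces unique names and unique machine codes."""

    def __init__(self) -> None:
        self.by_code: Dict[str, Node] = {}
        self.by_name: Dict[str, Node] = {}

    def add(self, code: str, name: str, kind: str,
            parent_code: Optional[str] = None) -> Node:
        if code in self.by_code:
            raise ValueError(f"duplicate code: {code}")
        if name in self.by_name:
            raise ValueError(f"duplicate name: {name}")
        # A missing parent raises KeyError, so children cannot be orphaned.
        parent = self.by_code[parent_code] if parent_code else None
        node = Node(code, name, kind, parent)
        if parent is not None:
            parent.children.append(node)
        self.by_code[code] = node
        self.by_name[name] = node
        return node

    def path(self, code: str) -> List[str]:
        """Names from the top-level category down to the given entry."""
        names: List[str] = []
        node: Optional[Node] = self.by_code[code]
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


tax = Taxonomy()
tax.add("100", "Example Category", "category")
tax.add("100.10", "Example Subcategory", "subcategory", parent_code="100")
tax.add("100.10.10", "Example Content Type", "content_type", parent_code="100.10")
print(tax.path("100.10.10"))
```

Keeping two indexes (by code and by name) makes both uniqueness checks a single dictionary lookup.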
Classification Categories Component
[Figure: Classification Categories Hierarchy; labels: Study, Digital Content]
Metadata Component
– Used to tag or index digital content items
Metadata Classes:
• Core – Comprised of four areas: File Properties, Classification, Audit Trail, Business Process
• Domain-specific – Metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI
• Org Specific – Metadata that meets an organization's needs; not standards based
• General – Obtained from public standards-based vocabulary terminology resources like Dublin Core
Annotation Properties
– Metadata about classification categories and metadata: Core, Org-Specific metadata
Metadata Component
Core Metadata Example – File Properties:
Content Model Component
– Contains classification hierarchy, metadata in machine readable format:
Classification System – Term Sources
Term Sourcing Concepts:
• Terms adopted by standards bodies should be used first in eTMF model
Primary Term Sources for eTMF Classification System:
– Internet Standards Dev Orgs: W3C, IETF, ISO, etc.
» Required for interoperability of machine code
– NIH NCIthesaurus: Term database for FDA, CDISC, HL7, other orgs
» Required for interoperability of clinical / health sciences data
Secondary Term Sources for eTMF Classification System:
• Industry sources – widely used terms in enterprise content mgmt software, TMF RM
*Spec, Table 6, p21
Classification Categories Component
Classification Categories Hierarchy and Numbering [1]:
– Classification hierarchy and numbering is based on UDC library numbering standard and XML naming
– Digital dot notation – Designed for human and machine readability
– Each number is also a unique code for naming and ordering in the hierarchy
– Primary Categories (PC): Three digit. eTMF: 100-200
– Subcategories (SC): Two digit: 10-99
– Content Types (CT): Two digit: 10-99
– Maximum number of Sub-Category divisions is 5, excluding the 3-digits for the Primary Category
[1] Per spec section 2.1.1; 6.0
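The numbering rules on this slide (a three-digit primary category, two-digit divisions in the range 10-99, at most five divisions, dot notation) can be sketched as a pair of helpers. The function names and the exact dotted string format are assumptions made for illustration; the sketch also assumes every two-digit segment after the primary category, including the content type, counts toward the limit of five.

```python
from typing import List, Tuple

MAX_DIVISIONS = 5  # from the slide: at most 5 divisions after the primary category


def build_code(primary: int, divisions: List[int]) -> str:
    """Join a three-digit primary category and two-digit divisions with dots."""
    if not 100 <= primary <= 999:
        raise ValueError("primary category must be three digits")
    if len(divisions) > MAX_DIVISIONS:
        raise ValueError(f"at most {MAX_DIVISIONS} divisions are allowed")
    for d in divisions:
        if not 10 <= d <= 99:
            raise ValueError(f"division {d} is outside 10-99")
    return ".".join(str(n) for n in [primary, *divisions])


def parse_code(code: str) -> Tuple[int, List[int]]:
    """Split a dotted code back into (primary, divisions), validating it."""
    parts = [int(p) for p in code.split(".")]
    build_code(parts[0], parts[1:])  # re-use the same range checks
    return parts[0], parts[1:]


print(build_code(100, [10, 25]))
print(parse_code("100.10.25"))
```

Because every segment has a fixed width and never starts with zero, codes at the same depth sort the same way as strings and as numbers.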
Hierarchy Numbering/Naming Considerations:
• Flexible, standards-based approach (W3C XML compliant naming*)
• Ability to add multiple hierarchy divisions / levels
• Proposed: 5 divisions = 100 × 90^5 ≈ 5.9 × 10^11 Content Types
• Uniqueness of numbers – usable as machine code identifiers
• Machine readable, human readable
• No sorting issues, no need for leading zeros*, no special chars
*Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
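The capacity figure quoted for five divisions is consistent with a simple count: two-digit codes 10-99 give 90 values per division, multiplied across five divisions and by the number of primary categories. The sketch below assumes 100 primary category codes, which is an inference from the figure rather than something the slide states outright.

```python
PRIMARY_CATEGORY_VALUES = 100       # assumption: 100 three-digit primary codes
VALUES_PER_DIVISION = 99 - 10 + 1   # two-digit codes 10..99 -> 90 values
DIVISIONS = 5                       # proposed number of divisions

capacity = PRIMARY_CATEGORY_VALUES * VALUES_PER_DIVISION ** DIVISIONS
print(capacity)            # 590490000000
print(f"{capacity:.1e}")   # 5.9e+11
```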
Numbering and Naming Scheme
Numbering
• Primary Categories and Sub-Categories :
– Category Code number
• Content Type:
– Content Type ID
Naming
• Primary Categories and Sub-Categories
– Simple text-based names
– Unique name, 64 char limit
– Abbreviation – 16 char limit suggested
– Compatible with W3C XML naming standards:
No special characters: ( ) < > ? / % # @ !
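The naming rules above lend themselves to a simple check. The following sketch tests only what the slide lists (the 64-character name limit, the suggested 16-character abbreviation limit, and the listed special characters); it does not implement the full W3C XML name rules, and the function name `name_problems` is hypothetical.

```python
from typing import List

MAX_NAME_LEN = 64      # from the slide
MAX_ABBREV_LEN = 16    # suggested limit from the slide
FORBIDDEN = set("()<>?/%#@!")  # special characters listed on the slide


def name_problems(name: str, abbreviation: str = "") -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems: List[str] = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    if len(abbreviation) > MAX_ABBREV_LEN:
        problems.append(f"abbreviation exceeds {MAX_ABBREV_LEN} characters")
    return problems


print(name_problems("Site Management", "SITE MGMT"))
print(name_problems("Site/Management #1"))
```

Uniqueness of the name is a property of the whole taxonomy, so it has to be checked against the other entries rather than in a per-name function like this one.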
Classification Categories Component
Example: Classification Categories Hierarchy, Naming, Numbering
Modifying Classification Category Entities – General Editing Rules
Domain Specific
– Classifications cannot be deleted –> Reserve/Unreserve
– Modifications allowed to some annotation properties (see spec)
– Codes (Category Codes, CT Type ID) cannot be generated
Organization Specific
– Classifications can be deleted
– Modifications allowed for classification metadata, annotations
– Codes (Category Codes, CT Type ID) can be generated
Classification Categories Component
Classification Category, Content Type Editing Rules*
Type | Import Terms | Generate Code | Add/Modify | Delete/Reserve
Domain Specific | Yes | No | No/Yes** | Reserve/Unreserve
Organization Specific | Yes | Yes | Yes/Yes | Delete
*Spec, Table 6, p21
**Annotation metadata
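One way to make the editing rules table machine-checkable is to encode each row as a small record. This is a sketch under one reading of the table: the "No/Yes**" entry is taken to mean that the classification itself cannot be added or modified while its annotation metadata can. The field names and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EditingRule:
    import_terms: bool
    generate_code: bool
    modify_classification: bool   # first half of the Add/Modify column
    modify_annotations: bool      # second half (annotation metadata)
    removal_action: str           # "reserve" or "delete"


RULES: Dict[str, EditingRule] = {
    "domain": EditingRule(
        import_terms=True, generate_code=False,
        modify_classification=False, modify_annotations=True,
        removal_action="reserve",
    ),
    "organization": EditingRule(
        import_terms=True, generate_code=True,
        modify_classification=True, modify_annotations=True,
        removal_action="delete",
    ),
}


def is_allowed(kind: str, action: str) -> bool:
    """Look up a yes/no permission for a classification type."""
    value = getattr(RULES[kind], action)
    if not isinstance(value, bool):
        raise ValueError(f"{action} is not a yes/no permission")
    return value


print(is_allowed("domain", "generate_code"))
print(is_allowed("organization", "generate_code"))
print(RULES["domain"].removal_action)
```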
Classification Editing Tool – Free, Open Source Protégé (From Stanford University: http://protege.stanford.edu/ )
Protégé Editor:
- Edit Classification Taxonomy and Metadata Terms
- Validate Taxonomy and Term name compliance
- Create valid RDF/XML Ontology
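For readers who have not seen RDF/XML, the sketch below writes a two-class OWL fragment with the Python standard library, of the general kind an editor such as Protégé can open. It is an illustration only: the `http://example.org/etmf#` namespace and the class names are hypothetical, and modelling the category hierarchy with `rdfs:subClassOf` is an assumption, not something taken from the spec.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"   # hypothetical namespace for this sketch

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def owl_class(root, local_name, label, parent=None):
    """Append an owl:Class element, optionally as a subclass of `parent`."""
    cls = ET.SubElement(root, f"{{{OWL}}}Class",
                        {f"{{{RDF}}}about": BASE + local_name})
    ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    if parent is not None:
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": BASE + parent})
    return cls


def build_ontology() -> str:
    """Return a small RDF/XML document as a string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology",
                  {f"{{{RDF}}}about": BASE.rstrip("#")})
    owl_class(root, "ExampleCategory", "Example Category")
    owl_class(root, "ExampleSubcategory", "Example Subcategory",
              parent="ExampleCategory")
    return ET.tostring(root, encoding="unicode")


print(build_ontology())
```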
Classification Categories - Summary
The proposed Classification System has the following properties:
• Based on Naming and Numbering that is W3C XML compliant
– No special characters: ( ) & # @ / … etc.
– No leading zeros in classification numbers
• Based on Universal Decimal Classification (UDC) system for content classification:
– 100-199: eTMF Domain
– UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable (http://en.wikipedia.org/wiki/Universal_Decimal_Classification)
• Flexible and customizable for organizations, yet interoperable
– Domain classifications – Standardized; Organization-specific classifications – Editable
• Defined set of rules for editing and modifying the taxonomy
• Any organization can modify or edit the taxonomy using open source editors like Protégé
Appendix
Content Classification System – Core Terms needed for Architecture – Objectives:
• Classification, Subclassification concept -
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone - conveys concept
– Conveys hierarchy
– No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority – Source terms from standards bodies
Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:
Term Options | Source | Definition
Category, SubCategory | NIH NCIthesaurus | Category: ‘This term is used informally to mean a class of things’ (NCI code: C25372); Subcategory: ‘A subdivision that has common differentiating characteristics within a larger category.’ (NCI Code C25692)
Class, SubClass | W3C OWL | Class: ‘Resources may be divided into groups called classes’; SubClass: ‘Subclasses are classes; If a class C is a subclass of a class C', then all instances of C will also be instances of C'.’ (W3C RDF Class def)
TMF Zone, Section | TMF Ref Model | TMF Zone = Primary Classification (no published def found online); Section = SubClassification (no published def found online)
Proposed Term
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:
Term Options | Source | +/-
Category, SubCategory | NIH NCIthesaurus | +Everyone knows it; +Describes hierarchy; +In use by standards body (NIH NCI Thesaurus); +Generic
Class, SubClass | W3C OWL | +Describes hierarchy; +In use by standards body; +Generic; -Could be a reserved word for some development tools
TMF Zone, Section | TMF Ref Model | +In use by TMF RM users; -Doesn’t convey hierarchy; -Not in use by standards body; -Not Generic
Proposed Term
Content Classification System – Core Terms needed for Architecture – Objectives:
• Content Type concept
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone – conveys concept
– No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority – Source terms from standards bodies
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:
Term | Source | Definition
Content Type | W3C, CareLex, Oracle | W3C: ‘Specifies the nature of a linked resource’ (W3C and [RFC2045] and [RFC2046]); CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material; Oracle: Content types are used to define the metadata that you can associate with content.
Artifact | TMF Ref Model | ‘A collection of documents’ Wikipedia (Not published)
Proposed Term
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:
Term | Source | +/-
Content Type | W3C | +Widely used in internet SW; +ECM SW use - Microsoft, Oracle, Alfresco, etc.; +In use by standards body (W3C); +Generic
Artifact | TMF Ref Model | +In use by TMF RM users; -Not in use by standards body; -Not Generic; -Doesn’t convey concept of metadata
Proposed Term
Draft Agenda: Next Meeting
• Roll call
• Reports
– Outreach
– Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2)
• New business