OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer January 20, 2014 9:00 – 10:00 AM PST


Page 1

OASIS Electronic Trial Master File Standard Technical Committee

Content Classification Layer

January 20, 2014, 9:00 – 10:00 AM PST

Page 2

Agenda

Time | Topic | Presenter
9:00-9:05 | Call to Order & Roll Call | Zack Schmidt
9:05-9:10 | Approval of Minutes (https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf) | All
(deferred) | TC Process and Administration | Chet Ensign
9:10-9:20 | Outreach Subcommittee – All | Jennifer Alpert
9:20-9:50 | Tech presentation – Content Classification Layer | Z. Schmidt / Aliaa
9:50-9:55 | New Business | All
9:55-10:00 | Next meeting agenda / date | Z. Schmidt

Page 3

Roll Call

Name | Company | Voting Status | Present?
Jennifer Alpert Palchak | CareLex | Voter | y
Aliaa Badr | CareLex | Voter | y
Oleksiy (Alex) Palinkash | CareLex | Voter | y
Troy Jacobson | Forte Research | Voter | y
Lou Chappuie | Individual | Voter | y
Lisa Mulcahy | Individual | Non-Voter | y
Robert Gehrke | Mayo Clinic | Voter | n
Rich Lustig | Oracle | Non-Voter | y
Michael Agard | Paragon Solutions | Non-Voter | y
Christopher McSpiritt | Paragon Solutions | Non-Voter | y
Jamie O’Keefe | Paragon Solutions | Non-Voter | n
Fran Ross | Paragon Solutions | Non-Voter | y
Peter Alterman | SAFE-BioPharma | Voter | y
Catherine Schmidt | SterlingBio | Voter | y
Zack Schmidt | SureClinical | Voter | y
Trish Whetzel, PhD | SureClinical | Non-Voter | y
Peter Junge | Beijing Sursen | Observer | n
Laura Hilty | Forte Research | Observer | n
Tony O’Hare | Forte Research | Observer | n
Eldin Rammell | Rammell Consulting | Observer | n
Robin Cover | OASIS staff | Non-Voter | n
Chet Ensign | OASIS staff | Non-Voter | n

Page 4

Meeting Etiquette

• Announce your name prior to making comments or suggestions
• Keep your phone on mute when not speaking (#6)
• Do not put your phone on hold
  – Hang up and dial in again when finished with your other call
  – Hold = Elevator Music = very frustrated speakers and participants
• Meetings will be recorded and posted
  – Another reason to keep your phone on mute when not speaking!
• Use the join.me “Chat” feature for questions / comments / votes
• We will follow Robert’s Rules of Order

NOTE: This meeting is being recorded and minutes will be posted on the TC page after the meeting.

From eTMF Std TC to Participants: Hi everyone, remember to keep your phone on mute

Page 5

Outreach Subcommittee

• Status – New Members:
  – Oracle – Joined
  – In Progress: EMC, Kaiser Permanente, Shire, Medtronic
• Activities / Milestones

Page 6

Tech Discussion

• Status
• Timeline
• In parallel with other Tech work from charter

Page 7

Content Classification System Discussion

Classification System Components:
• Classification Categories
  – Taxonomy, hierarchy
• Metadata (‘Tags’)
  – Characterizes content
• Content Model
  – Published set of classifications and metadata for a domain (e.g., eTMF)

Page 8

Classification Categories Component

– Hierarchy of categories
  • Categories, subcategories, content types
– Defined parent-child relationships with rules
– All categories and content types are required to have unique names and machine codes
– Each content type is associated with metadata properties (core and domain-specific)
– Content items are linked to content types
– Unique classification and term codes are based on Universal Decimal Classification (UDC) numbering, which is widely used in libraries worldwide; human and machine readable, infinitely expandable
– Can be described, edited and validated using an OWL editor (such as the open source editor Protégé)
– Supports any simple text vocabulary, including the TMF Ref Model and other vocabularies
– W3C OWL 2 and RDF/XML supported

[Figure: Classification Categories Hierarchy (diagram nodes include Study and Digital Content)]
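To make the hierarchy rules above concrete, here is a minimal Python sketch of an in-memory classification tree. It enforces unique names and machine codes and parent-child links, and it only allows content items to be linked to content types. The class names (`Node`, `Hierarchy`), the dotted code format, and the sample category names are all hypothetical. The rule that content types are always leaves is also an assumption and is not stated in the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A category, subcategory, or content type (hypothetical model)."""
    code: str                      # unique machine code, e.g. "101.10" (format assumed)
    name: str                      # unique human-readable name
    kind: str                      # "category", "subcategory", or "content_type"
    parent: Optional[str] = None   # parent code; None only for primary categories
    metadata_properties: List[str] = field(default_factory=list)  # used by content types


class Hierarchy:
    def __init__(self) -> None:
        self.by_code: Dict[str, Node] = {}
        self.names: Set[str] = set()
        self.items: Dict[str, str] = {}   # content item id -> content type code

    def add(self, node: Node) -> None:
        # Names and machine codes must both be unique across the hierarchy.
        if node.code in self.by_code:
            raise ValueError(f"duplicate code {node.code}")
        if node.name in self.names:
            raise ValueError(f"duplicate name {node.name}")
        # Parent-child rules: only primary categories are roots, and
        # (assumption) content types are leaves that cannot have children.
        if node.parent is None:
            if node.kind != "category":
                raise ValueError("only primary categories may omit a parent")
        else:
            parent = self.by_code.get(node.parent)
            if parent is None:
                raise ValueError(f"unknown parent {node.parent}")
            if parent.kind == "content_type":
                raise ValueError("content types cannot have children")
        self.by_code[node.code] = node
        self.names.add(node.name)

    def link_item(self, item_id: str, content_type_code: str) -> None:
        """Content items attach to content types, never to categories."""
        node = self.by_code.get(content_type_code)
        if node is None or node.kind != "content_type":
            raise ValueError(f"{content_type_code} is not a content type")
        self.items[item_id] = content_type_code

    def path(self, code: str) -> List[str]:
        """Names from the primary category down to the given node."""
        names: List[str] = []
        node = self.by_code.get(code)
        while node is not None:
            names.append(node.name)
            node = self.by_code.get(node.parent) if node.parent else None
        return names[::-1]


h = Hierarchy()
h.add(Node("101", "Trial Management", "category"))
h.add(Node("101.10", "Trial Oversight", "subcategory", parent="101"))
h.add(Node("101.10.10", "Monitoring Report", "content_type", parent="101.10",
           metadata_properties=["FileName", "StudyID"]))
h.link_item("doc-0001", "101.10.10")
print(h.path("101.10.10"))  # ['Trial Management', 'Trial Oversight', 'Monitoring Report']
```

Rejecting duplicates when a node is added keeps every code usable as a machine identifier. Later lookups never have to resolve ambiguity.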

Page 9

Metadata Component

– Used to tag or index digital content items

Metadata Classes:
• Core – comprised of four areas: File Properties, Classification, Audit Trail, Business Process
• Domain-specific – metadata for a domain in life sciences such as eTMF, finance, legal administration, or others; uses standards-based terms from groups like NCI
• Org-specific – metadata that meets an organization’s needs; not standards-based
• General – obtained from public, standards-based vocabulary and terminology resources such as Dublin Core

Annotation Properties:
• Metadata about classification categories and metadata: Core, Org-specific metadata

[Figure: Core Metadata Example – File Properties]
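One way to picture the metadata classes above is as separate buckets attached to each content item. The Python sketch below groups tags by class and reports which of the four core areas are still empty. Only the class names and core area names come from the slide. The field names, the dictionary layout, and the sample properties (`FileName`, `MimeType`, `dc:title`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four core metadata areas listed on the slide.
CORE_AREAS = ("file_properties", "classification", "audit_trail", "business_process")


@dataclass
class ItemMetadata:
    """Tags for one content item, grouped by metadata class (hypothetical layout)."""
    core: Dict[str, Dict[str, str]] = field(default_factory=dict)  # area -> properties
    domain_specific: Dict[str, str] = field(default_factory=dict)  # e.g. eTMF or NCI terms
    org_specific: Dict[str, str] = field(default_factory=dict)     # not standards-based
    general: Dict[str, str] = field(default_factory=dict)          # e.g. Dublin Core terms

    def missing_core_areas(self) -> List[str]:
        """Core areas with no properties recorded yet, in slide order."""
        return [area for area in CORE_AREAS if not self.core.get(area)]


meta = ItemMetadata(
    core={"file_properties": {"FileName": "protocol_v2.pdf", "MimeType": "application/pdf"}},
    general={"dc:title": "Clinical Study Protocol v2"},
)
print(meta.missing_core_areas())  # ['classification', 'audit_trail', 'business_process']
```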

Page 10

Content Model Component

– Contains the classification hierarchy and metadata in machine-readable format
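A machine-readable content model can be pictured as OWL classes serialized in RDF/XML, the format the deck lists as supported and editable in Protégé. The Python sketch below uses only the standard `xml.etree.ElementTree` module. It emits one `owl:Class` per category, with an `rdfs:label` and an `rdfs:subClassOf` link to its parent. The rdf, rdfs, and owl namespace IRIs are the standard W3C ones. The base IRI `http://example.org/etmf#`, the class names, and the overall shape are assumptions for illustration; this is not the eTMF spec's published model.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"  # hypothetical namespace for the content model

# Serialize with the conventional prefixes instead of ns0/ns1.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def content_model_to_rdfxml(classes: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Build RDF/XML from (local_name, label, parent_local_name or None) tuples."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for local_name, label, parent in classes:
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + local_name})
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
        if parent is not None:
            # The parent-child relationship is expressed as a subclass axiom.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


print(content_model_to_rdfxml([
    ("TrialManagement", "Trial Management", None),
    ("TrialOversight", "Trial Oversight", "TrialManagement"),
]))
```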

Page 11

Classification System – Term Sources

Term Sourcing Concepts:
• Terms adopted by standards bodies should be used first in the eTMF model

Primary Term Sources for eTMF Classification System:
– Internet Standards Development Organizations: W3C, IETF, ISO, etc.
  » Required for interoperability of machine code
– NIH NCI Thesaurus: term database for FDA, CDISC, HL7, and other organizations
  » Required for interoperability of clinical / health sciences data

Secondary Term Sources for eTMF Classification System:
• Industry sources – widely used terms in enterprise content management software, TMF RM

*Spec, Table 6, p21
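The sourcing rule (standards-body terms first, industry terms second) amounts to a ranked lookup. The Python sketch below picks the candidate term whose source has the best tier. The tier table is an illustrative assumption built from the sources named on this slide. `pick_term` is a hypothetical helper and is not part of any published tooling.

```python
from typing import Dict, List, Optional, Tuple

# Lower tier = higher priority. Tier 1: standards bodies (primary sources);
# tier 2: industry sources (secondary). Membership is an illustrative assumption.
SOURCE_TIER: Dict[str, int] = {
    "W3C": 1, "IETF": 1, "ISO": 1, "NCI Thesaurus": 1,
    "ECM software": 2, "TMF RM": 2,
}
UNLISTED_TIER = 99  # sources not named on the slide rank last


def pick_term(candidates: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the (term, source) pair from the best-ranked source.

    min() keeps the first of several equally ranked candidates, so ties
    are resolved by input order.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: SOURCE_TIER.get(c[1], UNLISTED_TIER))


print(pick_term([("term-a", "TMF RM"), ("term-b", "W3C")]))  # ('term-b', 'W3C')
```

This sketch ranks sources only. The committee's comparison tables in the appendix also weigh criteria such as genericity and conflicts with reserved words.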

Page 12

Classification Categories Component

Classification Categories Hierarchy and Numbering [1]:
– Classification hierarchy and numbering are based on the UDC library numbering standard and XML naming
– Digital dot notation – designed for human and machine readability
– Each number is also a unique code for naming and ordering in the hierarchy
– Primary Categories (PC): three digits. eTMF: 100-200
– Subcategories (SC): two digits: 10-99
– Content Types (CT): two digits: 10-99
– Maximum number of subcategory divisions is 5, excluding the 3 digits for the Primary Category

[1] Per spec section 2.1.1; 6.0

Hierarchy Numbering/Naming Considerations:
• Flexible, standards-based approach (W3C XML compliant naming*)
• Ability to add multiple hierarchy divisions / levels
• Proposed: 5 divisions give $100 \times 90^{5} \approx 5.9 \times 10^{11}$ content types (100 primary category codes × 90 possible two-digit codes, 10-99, per division)
• Uniqueness of numbers – usable as machine code identifiers
• Machine readable, human readable
• No sorting issues, no need for leading zeros*, no special characters

*Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
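The numbering rules can be checked mechanically. The Python sketch below validates a dotted code: a three-digit primary category followed by at most five two-digit divisions in the range 10-99, with no leading zeros. It also reproduces the 100 × 90^5 capacity estimate and shows why fixed-width segments avoid sorting issues. It makes two assumptions: the eTMF primary range is 100-199 (the 100 codes used in the estimate), and segments are joined with dots. `parse_code` and `capacity` are hypothetical helper names.

```python
import re
from typing import List

PRIMARY = range(100, 200)   # assumed eTMF primary range 100-199 (100 codes)
DIVISION = range(10, 100)   # two-digit subcategory / content type codes 10-99
MAX_DIVISIONS = 5           # maximum divisions after the primary category

_SEGMENT = re.compile(r"[1-9][0-9]*")  # digits only, no leading zero


def parse_code(code: str) -> List[int]:
    """Split a dotted code such as '101.10.12' into integers, enforcing the rules."""
    parts = code.split(".")
    if not all(_SEGMENT.fullmatch(p) for p in parts):
        raise ValueError(f"non-numeric segment or leading zero in {code!r}")
    nums = [int(p) for p in parts]
    if nums[0] not in PRIMARY:
        raise ValueError(f"primary category {nums[0]} is outside 100-199")
    if len(nums) - 1 > MAX_DIVISIONS:
        raise ValueError(f"more than {MAX_DIVISIONS} divisions in {code!r}")
    if any(n not in DIVISION for n in nums[1:]):
        raise ValueError(f"division codes must be 10-99 in {code!r}")
    return nums


def capacity(divisions: int = MAX_DIVISIONS) -> int:
    """Distinct codes at a given depth: primary codes x 90 two-digit codes per division."""
    return len(PRIMARY) * len(DIVISION) ** divisions


print(capacity())  # 590490000000, i.e. about 5.9 x 10^11

# Fixed-width segments make plain string sorting agree with numeric order.
codes = ["101.20", "101.10.11", "102.10", "101.10"]
print(sorted(codes))  # ['101.10', '101.10.11', '101.20', '102.10']
```

Because every division uses exactly two digits, no code ever needs a leading zero. A mixed-width scheme would break string sorting: "101.10" sorts before "101.9" even though 9 < 10.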

Page 13

Classification Categories Component

Numbering and Naming Scheme

Numbering
• Primary Categories and Sub-Categories:
  – Category Code number
• Content Type:
  – Content Type ID

Naming
• Primary Categories and Sub-Categories
  – Simple text-based names
  – Unique name, 64 char limit
  – Abbreviation – 16 char limit suggested
  – Compatible with W3C XML naming standards; no special characters: ( ) < > ? / % # @ !

[Figure: Example: Classification Categories Hierarchy, Naming, Numbering]
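The naming rules translate into a small validator. The Python sketch below checks the 64-character limit, uniqueness, and the special characters listed on the slide. It treats the 16-character abbreviation limit as a warning, because that limit is only suggested. It does not implement the full W3C XML `Name` production, and uniqueness is checked case-sensitively, which is an assumption. The helper names are hypothetical.

```python
from typing import List, Set

FORBIDDEN = set("()<>?/%#@!")  # special characters listed on the slide
NAME_LIMIT = 64                # hard limit for category names
ABBREV_LIMIT = 16              # suggested limit for abbreviations


def check_name(name: str, existing: Set[str]) -> List[str]:
    """Return rule violations for a category name; an empty list means it passes."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if len(name) > NAME_LIMIT:
        problems.append(f"name is longer than {NAME_LIMIT} characters")
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if name in existing:
        problems.append("name is not unique")
    return problems


def check_abbreviation(abbrev: str) -> List[str]:
    """The 16-character limit is only suggested, so this returns warnings."""
    if len(abbrev) > ABBREV_LIMIT:
        return [f"abbreviation is longer than the suggested {ABBREV_LIMIT} characters"]
    return []


taken = {"Trial Management"}
print(check_name("Site Management (US)", taken))  # ['forbidden characters: ( )']
print(check_name("Trial Management", taken))      # ['name is not unique']
```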

Page 14

Classification Categories Component

Modifying Classification Category Entities – General Editing Rules

Domain Specific
– Classifications cannot be deleted; they are reserved / unreserved instead
– Modifications allowed to some annotation properties (see spec)
– Codes (Category Codes, Content Type IDs) cannot be generated

Organization Specific
– Classifications can be deleted
– Modifications allowed for classification metadata and annotations
– Codes (Category Codes, Content Type IDs) can be generated

Classification Category, Content Type Editing Rules*

Type | Import Terms | Generate Code | Add/Modify | Delete/Reserve
Domain Specific | Yes | No | No/Yes** | Reserve/Unreserve
Organization Specific | Yes | Yes | Yes/Yes | Delete

*Spec, Table 6, p21
**Annotation metadata
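An editing tool could consult the editing-rules table above before it changes a classification. The Python sketch below encodes the two table rows as a lookup and applies the removal rule: domain-specific entries are reserved, and organization-specific entries are deleted. The sketch reads "No/Yes**" as "no additions, modifications to annotation metadata only", which is an interpretation of the table. The names `Scope`, `RULES`, `can_modify`, and `remove`, and the sample codes, are hypothetical.

```python
from enum import Enum
from typing import Dict


class Scope(Enum):
    DOMAIN = "domain-specific"
    ORGANIZATION = "organization-specific"


# One entry per table row. "annotation_only" encodes the ** footnote
# (domain-specific entries may change only annotation metadata).
RULES: Dict[Scope, Dict[str, object]] = {
    Scope.DOMAIN: {"import_terms": True, "generate_code": False,
                   "add": False, "modify": "annotation_only", "remove": "reserve"},
    Scope.ORGANIZATION: {"import_terms": True, "generate_code": True,
                         "add": True, "modify": True, "remove": "delete"},
}


def can_modify(scope: Scope, is_annotation: bool) -> bool:
    """Whether a field of an entry in this scope may be edited."""
    rule = RULES[scope]["modify"]
    if rule == "annotation_only":
        return is_annotation
    return bool(rule)


def remove(scope: Scope, entries: Dict[str, dict], code: str) -> None:
    """Remove an entry according to its scope's rule."""
    if RULES[scope]["remove"] == "reserve":
        entries[code]["reserved"] = True  # domain entry and its code stay in the model
    else:
        del entries[code]                 # organization entry is deleted outright


entries = {"101": {"name": "Trial Management"}, "101.90": {"name": "Local SOPs"}}
remove(Scope.DOMAIN, entries, "101")
remove(Scope.ORGANIZATION, entries, "101.90")
print(entries)  # {'101': {'name': 'Trial Management', 'reserved': True}}
```

Reserving rather than deleting keeps domain codes stable. Content classified against a standard code therefore never points at a vanished entry.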

Page 15

Classification Editing Tool – Free, Open Source Protégé (from Stanford University: http://protege.stanford.edu/)

*Spec, Table 6, p21

Protégé Editor:
- Edit Classification Taxonomy and Metadata Terms
- Validate Taxonomy and Term name compliance
- Create valid RDF/XML Ontology

Page 16

Classification Categories - Summary

The proposed classification system has the following properties:
• Based on naming and numbering that is W3C XML compliant
  – No special characters: ( ) & # @ / … etc.
  – No leading zeros in classification numbers
• Based on the Universal Decimal Classification (UDC) system for content classification:
  – 100-199: eTMF Domain
  – UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable http://en.wikipedia.org/wiki/Universal_Decimal_Classification
• Flexible and customizable for organizations, yet interoperable
  – Domain classifications are standardized; organization-specific classifications are editable
• Defined set of rules for editing and modifying the taxonomy
• Any organization can modify/edit the taxonomy using open source editors like Protégé

*Spec, Table 6, p21

Page 17

Appendix

Page 18

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture – Objectives:
• Classification, Subclassification concept
  – Supports RDF/XML, OWL languages
  – Non-domain-specific, generic terms
  – Easily understandable by anyone; conveys the concept
  – Conveys hierarchy
  – No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
  – First priority: source terms from standards bodies

*Spec, Table 6, p21

Page 19

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:

*Spec, Table 6, p21

Term Options | Source | Definition
Category, SubCategory | NIH NCI Thesaurus | Category: ‘This term is used informally to mean a class of things’ (NCI code: C25372); Subcategory: ‘A subdivision that has common differentiating characteristics within a larger category.’ (NCI code: C25692)
Class, SubClass | W3C OWL | Class: ‘Resources may be divided into groups called classes’; SubClass: ‘Subclasses are classes; if a class C is a subclass of a class C', then all instances of C will also be instances of C'.’ (W3C RDF Class definition)
TMF Zone, Section | TMF Ref Model | TMF Zone = Primary Classification (no published definition found online); Section = SubClassification (no published definition found online)

Proposed Term

Page 20

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:

*Spec, Table 6, p21

Term Options | Source | +/-
Category, SubCategory | NIH NCI Thesaurus | +Everyone knows it; +Describes hierarchy; +In use by standards body (NIH NCI Thesaurus); +Generic
Class, SubClass | W3C OWL | +Describes hierarchy; +In use by standards body; +Generic; -Could be a reserved word for some development tools
TMF Zone, Section | TMF Ref Model | +In use by TMF RM users; -Doesn’t convey hierarchy; -Not in use by standards body; -Not generic

Proposed Term

Page 21

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture – Objectives:
• Content Type concept
  – Supports RDF/XML, OWL languages
  – Non-domain-specific, generic terms
  – Easily understandable by anyone; conveys the concept
  – No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
  – First priority: source terms from standards bodies

*Spec, Table 6, p21

Page 22

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture
• Content Type term concept:

*Spec, Table 6, p21

Term | Source | Definition
Content Type | W3C & CareLex, Oracle | W3C: ‘Specifies the nature of a linked resource’ ([RFC2045] and [RFC2046]). CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material. Oracle: Content types are used to define the metadata that you can associate with content.
Artifact | TMF Ref Model | ‘A collection of documents’ (Wikipedia; not published)

Proposed Term

Page 23

Classification System – Core Terms

Content Classification System – Core Terms needed for Architecture
• Content Type term concept:

*Spec, Table 6, p21

Term | Source | +/-
Content Type | W3C | +Widely used in internet SW; +ECM SW use: Microsoft, Oracle, Alfresco, etc.; +In use by standards body (W3C); +Generic
Artifact | TMF Ref Model | +In use by TMF RM users; -Not in use by standards body; -Not generic; -Doesn’t convey concept of metadata

Proposed Term

Page 24

Draft Agenda: Next Meeting

• Roll call
• Reports
  – Outreach
  – Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2)
• New business