Most Common Issues in Define.xml files
NJ CDISC User Group
Sergiy Sirichenko
October 21, 2015
Abbreviations
› CT – CDISC Controlled Terminology
› VLM – Value Level Metadata
Major problems in Define.xml
› Usage of outdated Define.xml v1.0
› Inconsistency in metadata
› Missing study-specific metadata
› Lack of expertise
Outdated Define.xml v1.0 is still used
› Define.xml v1.0 has many limitations built into the standard itself
› “The first” versions are never perfect
› Define.xml v1.0 is 11 years old
  › Is anybody still using SDTM IG 3.1.1?
› Define.xml v2.0 is robust enough to handle current submission needs
  › A separate presentation or webinar will be dedicated to this topic
Lack of structural consistency in v1.0
› Structural consistency of metadata in Define.xml v2.0 helps prevent errors
› Example: the variable Source value determines which other attributes are expected
  › “CRF” -> Pages are expected
  › “Derived” -> Computational Algorithm is expected
› Define.xml v1.0 allows entering CRF pages for derived variables, leaving expected attributes empty, etc.
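The Source rule above can be sketched as a small check. This is only an illustration: the flat dictionary and its field names (`origin`, `pages`, `method`) are assumptions made for the example, not Define.xml element names.

```python
# Hypothetical flat record for one variable; field names are illustrative,
# not Define.xml element or attribute names.
def origin_issues(var):
    """Return a list of consistency problems for one variable's metadata."""
    issues = []
    origin = var.get("origin")
    pages = var.get("pages")
    method = var.get("method")
    if origin == "CRF" and not pages:
        issues.append("CRF origin without page references")
    if origin == "Derived":
        if not method:
            issues.append("Derived origin without a computational algorithm")
        if pages:
            issues.append("Derived origin with CRF pages")
    return issues
```

For instance, `origin_issues({"origin": "Derived", "pages": [12]})` reports both a missing algorithm and unexpected CRF pages, the kind of combination v1.0 lets through.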
Limited and confusing VLM in v1.0
› In v1.0, Value Level Metadata does not provide a reference to the variable it applies to
› Cannot handle multiple conditions
  › A confusing and complex hierarchical VLM structure is used instead
  › Example:
    › LB domain has VLM assigned to LBCAT
    › LBCAT has VLM for LBSPEC, LBSPEC -> LBMETHOD, etc.
    › Properties of the LBORRES (or other?) variable are described at some point in this tree structure
› v2.0 has an explicit single expression with multiple conditions assigned to a particular variable
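To make the v2.0 idea concrete, here is a minimal sketch of one expression with several conditions attached to a named target variable. The class names, the two comparators, and the values "CHEMISTRY", "SERUM", and "PLASMA" are assumptions for illustration; this is not the Define.xml schema itself.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    variable: str    # variable being tested, e.g. "LBCAT"
    comparator: str  # only "EQ" and "IN" are supported in this sketch
    values: tuple

    def matches(self, record):
        value = record.get(self.variable)
        if self.comparator == "EQ":
            return value == self.values[0]
        if self.comparator == "IN":
            return value in self.values
        raise ValueError(f"unsupported comparator: {self.comparator}")


@dataclass
class ValueLevelRule:
    target: str        # the variable the metadata applies to
    conditions: tuple  # all conditions must hold (logical AND)

    def applies_to(self, record):
        return all(c.matches(record) for c in self.conditions)


# One flat expression instead of an LBCAT -> LBSPEC -> ... tree.
rule = ValueLevelRule(
    target="LBORRES",
    conditions=(
        Condition("LBCAT", "EQ", ("CHEMISTRY",)),
        Condition("LBSPEC", "IN", ("SERUM", "PLASMA")),
    ),
)
```

The target variable is stated once and explicitly, so a reader does not have to walk a hierarchy to find out which variable the metadata describes.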
Some sponsors try to mimic v2.0
› To use functionality of v2.0
› Example:
  › v1.0 does not have attributes for NCI Codes
  › Sponsor added NCI Codes as a part of the Decode value
  › v2.0:
      Permitted Value (Code)
      mmol/L [C64387]
      ng/mL [*]
  › v1.0:
      Code Value    Code Text
      mmol/L        mmol/L [C64387]
      ng/mL         ng/mL [*]
› It’s an invalid usage of the v1.0 standard!
› Why not switch to v2.0 instead?
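A quick way to spot this pattern in an existing v1.0 file is to look for a bracketed code at the end of the decode text. The sketch below assumes the codes look like the two examples on the slide ("[C64387]" or "[*]"); other sponsor conventions would need a different pattern.

```python
import re

# Matches a trailing bracketed code such as "[C64387]" or "[*]".
EMBEDDED_CODE = re.compile(r"\s*\[(C\d+|\*)\]\s*$")


def split_embedded_code(decode):
    """Return (clean_decode, code); code is None if nothing is embedded."""
    match = EMBEDDED_CODE.search(decode)
    if match is None:
        return decode, None
    return decode[:match.start()], match.group(1)
```

Any decode for which the second element is not `None` is a candidate for moving the code into a proper v2.0 attribute.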
Some sponsors use custom stylesheet
› Often done to mimic the functionality of v2.0
› Regulatory reviewers like consistency, so please use the standard stylesheet provided by CDISC
Non-relevant metadata
› Variable Role is used for standard development, but does not add any value for study metadata
› Example:
  › STUDYID and USUBJID can only be “Identifier”
  › Does anyone actually use this info?
› Define.xml 2.0 stylesheet doesn’t display it
Order of datasets and variables
› Alphabetical
  › Example: AE, CM, DM, …
  › Correct: logical order as defined by the standard (by Class, then by domain name)
› Random
  › Example:
      Order #   Variable   Label
      1         AECAT      Category for Adverse Event
      2         AEDECOD    Dictionary-Derived Term
      3         AEGRPID    Group ID
      4         AESEQ      Sequence Number
      5         AETERM     Reported Term for the Adverse Event
      6         DOMAIN     Domain Abbreviation
      7         STUDYID    Study Identifier
      8         USUBJID    Unique Subject Identifier
      9         AEBODSYS   Body System or Organ Class
      10        AEOUT      Outcome of Adverse Event
      …
  › Correct: as variables are present in the dataset
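Checking the "as present in dataset" rule is a plain list comparison. A minimal sketch follows; how the two lists are obtained (from define.xml and from the dataset header) is left out, and the function name is hypothetical.

```python
def first_order_mismatch(define_order, dataset_order):
    """Return the 1-based position where the two orders first differ, or None."""
    for position, (in_define, in_data) in enumerate(
        zip(define_order, dataset_order), start=1
    ):
        if in_define != in_data:
            return position
    if len(define_order) != len(dataset_order):
        # One list is a prefix of the other; the mismatch is the first extra item.
        return min(len(define_order), len(dataset_order)) + 1
    return None
```

A result of `None` means define.xml lists the variables in the same order as the dataset does.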
Missing or invalid Origin
› No references to CRF pages
  › Example: Origin=“CRF” instead of “CRF Page 12, 41, 57”
› Inconsistencies in Origin/Comments
  › Example:
    › RFSTDTC has Origin = “CRF”
    › No annotations on CRF (as expected)
    › Comments: “First dose of study medication” (it looks like a Derived variable)
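For Origin entered as free text, a pattern check catches the first problem. The sketch assumes origins are written in the style of the slide's example ("CRF Page 12, 41, 57"); the exact wording accepted here is an assumption, not a rule from the standard.

```python
import re

# Accepts "CRF Page 12" or "CRF Pages 12, 41, 57" (an assumed text convention).
CRF_WITH_PAGES = re.compile(r"^CRF Pages? \d+(, ?\d+)*$")


def crf_origin_missing_pages(origin):
    """True when the origin says CRF but lists no page numbers."""
    text = origin.strip()
    if not text.upper().startswith("CRF"):
        return False
    return CRF_WITH_PAGES.match(text) is None
```

The second problem on the slide (a "CRF" origin with a derivation-like comment) still needs a human reviewer.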
Missing or invalid Derivations
› Example 1:
  › AGE: “Calculation: = Min DOV - BRTHDTC in AGEU”
  › What is DOV? How can I use a Character value (BRTHDTC) in an arithmetic formula? How were missing or partially missing dates handled?
  › Derivations should be provided in terms of available data
› Example 2:
  › “ZX021_AE_DURATION”
  › ???
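As a contrast to Example 1, here is a sketch of an AGE derivation that answers those questions explicitly. It assumes a reference date is available as an ISO 8601 string (which variable that is depends on the study; "DOV" on the slide is undefined), and it assumes, purely for illustration, that age is left missing when either date is missing or partial.

```python
from datetime import date


def parse_complete_date(dtc):
    """Return a date for a complete ISO 8601 date or datetime string, else None."""
    if not dtc or len(dtc) < 10:
        return None  # missing or partial date such as "1980" or "1980-06"
    try:
        return date.fromisoformat(dtc[:10])
    except ValueError:
        return None


def derive_age_years(brthdtc, reference_dtc):
    """Age in completed years at the reference date, or None if not derivable."""
    birth = parse_complete_date(brthdtc)
    reference = parse_complete_date(reference_dtc)
    if birth is None or reference is None:
        return None
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)
```

A derivation text in define.xml can then name the two source variables, the conversion from character to date, and the rule for partial dates, which is what a reviewer needs to reproduce the value.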
Invalid Value Level Metadata
› VLM should be described at the same level of detail as regular variables:
  › Codelist, DataType, Length, Origin, Derivation, etc.
  › A common issue is missing or invalid metadata at Value Level
› Consider VLM as new variables with properties independent of the “host” variable
  › Example: Treatment Emergent Flag in SUPPAE has length=1, not 200 like the QVAL variable
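One way to get Value Level lengths right is to measure them per QNAM instead of copying the QVAL length. A minimal sketch, with records as plain dictionaries; the QNAM values and the comment text are made up for the example.

```python
from collections import defaultdict


def max_length_by_qnam(supp_records):
    """Return the longest QVAL length observed for each QNAM."""
    lengths = defaultdict(int)
    for record in supp_records:
        qnam = record["QNAM"]
        lengths[qnam] = max(lengths[qnam], len(record["QVAL"]))
    return dict(lengths)


# Hypothetical SUPPAE content.
records = [
    {"QNAM": "AETRTEM", "QVAL": "Y"},
    {"QNAM": "AETRTEM", "QVAL": "N"},
    {"QNAM": "AECOMM", "QVAL": "Resolved after dose reduction"},
]
```

Here the flag gets length 1 while the free-text qualifier gets its own, larger length, so each Value Level entry carries its own properties.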
Duplicate records
› Code List
  › Term
› Variables
  › Order Number
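Both kinds of duplicates can be found with the same counting step, applied either to the terms of one codelist or to the order numbers of one dataset's variables. A minimal sketch:

```python
from collections import Counter


def duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value in counts if counts[value] > 1]
```

For example, `duplicates(["mg", "mL", "mg"])` flags the repeated term, and `duplicates([1, 2, 3, 3])` flags a repeated order number.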
External dictionaries
› Info on external dictionaries (MedDRA, WHODrug) is not provided correctly
  › As comments to the variable (not machine-readable)
› ISO8601 is defined as an External Dictionary
  › It’s a data format associated with all date, datetime, etc. variables. No specific reference to ISO8601 is needed if the Data Type is defined correctly
Missing study-specific metadata
› Study-specific information is crucial for reviewers
› However, in most submission packages it’s missing
› The value of define.xml, SDRG, and aCRFs is in explaining what is unique to this particular study
Missing Codelists
› Codelists are limited to variables which are assigned to standard CT
› Commonly missing study-specific Codelists for variables
  › Category (--CAT), Subcategory (--SCAT)
  › EXTRT, ARMCD, --TESTCD/--TEST, QNAM, TPT
  › RDOMAIN in CO and RELREC domains
  › XXTOX, …
Merged Codelists
› Due to confusion between the Standard CT Codelist and the study Variable Codelist
› Example:
  › Define.xml has one codelist (UNIT) assigned to all --DOSU, --VAMTU, --ORRESU, --STRESU variables
  › This codelist includes all unique terms across all study “units” variables and has 450 items, while, for example, the EXDOSU variable is populated with the single term “mg”
  › A reference to a 450-term codelist is not relevant
What is define.xml Codelist?
› A Define.xml Codelist describes the data collection process and should be limited to the terms used for data collection of a specific data element (a particular Variable or Value Level)
  › For example, LBSTRESU, EGORRESU, EXDOSU usually have separate Codelists based on the same (UNIT) standard CT
› If data is collected as free text, then a Codelist may not be applicable
  › Common examples are CMDOSU, CMDOSFRQ, CELOC, etc.
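Splitting a merged UNIT codelist can start from the terms each variable actually holds. The sketch below uses in-memory dictionaries as stand-ins for datasets; it gives a starting point only, since options offered on the CRF but never selected still have to be added by hand.

```python
def codelists_by_variable(datasets, unit_variables):
    """Map each variable to the sorted list of non-empty terms it holds."""
    codelists = {}
    for dataset_name, variable in unit_variables:
        terms = {
            record[variable]
            for record in datasets[dataset_name]
            if record.get(variable)
        }
        codelists[variable] = sorted(terms)
    return codelists


# Hypothetical study data.
datasets = {
    "EX": [{"EXDOSU": "mg"}, {"EXDOSU": "mg"}],
    "LB": [{"LBSTRESU": "mmol/L"}, {"LBSTRESU": "ng/mL"}, {"LBSTRESU": ""}],
}
per_variable = codelists_by_variable(
    datasets, [("EX", "EXDOSU"), ("LB", "LBSTRESU")]
)
```

Each variable ends up with its own short list (one term for EXDOSU here) instead of a reference to every unit used anywhere in the study.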
Missing terms in Codelist
› Term is present in data
  › SD0037 check
  › Programming error
    › Due to misspelling, leading space characters, etc.
    › Due to missing Decoded value for some items
      › CodeList vs. EnumeratedItem
› Codelist was populated based on collected data, but some options from the CRF page are not included
  › Example: Only race “WHITE” is collected, while 6 options are present on the CRF
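The first group of causes (misspelling, stray spaces) can be separated from genuinely missing terms with a simple comparison. This is a sketch in the spirit of such a check, not the logic of any particular validation rule; the labels it returns are made up for the example.

```python
def classify_value(value, codelist_terms):
    """Classify a data value against the terms of its codelist."""
    if value in codelist_terms:
        return "ok"
    # Compare again after trimming spaces and ignoring case.
    normalized = {term.strip().upper() for term in codelist_terms}
    if value.strip().upper() in normalized:
        return "near match (check spacing or case)"
    return "missing from codelist"
```

It only covers terms that are in the data but not in the codelist; CRF options that were never collected cannot be found from the data and have to come from the CRF itself.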
Missing or invalid Value Level Metadata
› Content of SUPPQUAL domains must be described
Missing description of --SPID
› --SPID is often a Key Variable in a domain
› A clear and detailed description is required to understand study data
  › Why was --SPID introduced? How was it derived? …
› Sponsors often copy the Notes text from the CDISC IG. This is a completely invalid approach! Study-specific information is expected.
  › SDTM IG text: “Sponsor-defined reference number. Perhaps pre-printed on the CRF as an explicit line identifier or defined in the sponsor’s operational database. Example: Line number on a CRF Page.”
Missing description of variables
› Study-specific variables are the most important
  › RFPENDTC, RFSTDTC, RFXSTDTC, --GRPID, --LNKID, --SPID, …
› SDTM text is not a variable description!
  › See the --SPID slide as an example
Invalid Key Variables
› Too long a list of variables
  › Example: “STUDYID, USUBJID, EXSPID, EXTRT, EXCAT, EXDOSTXT, EXDOSU, EXDOSFRM, EXDOSFRQ, EXDOSTOT, EXROUTE, EXSTDTC, EXENDTC, EXSTDY, EXENDY, EXTPT, EXTPTNUM, EXTPTREF, VISIT”
› Inconsistency between Key Variables and domain Structure
  › Example: Structure: “One record per event”
    › Key Variables: “USUBJID, AETERM, AEDECOD, AESTDTC, AESEV, AESER, AEACN, VISIT”
› Usage of --SEQ as a Key Variable
  › Example: “USUBJID, AESEQ”
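A candidate key list can be tested against the data by counting how often each key combination occurs. A minimal sketch with records as dictionaries; the sample values are hypothetical.

```python
from collections import Counter


def duplicate_keys(records, key_variables):
    """Return key combinations that identify more than one record."""
    counts = Counter(
        tuple(record.get(name) for name in key_variables) for record in records
    )
    return [key for key, count in counts.items() if count > 1]


# Hypothetical AE records: the same event term reported twice for one subject.
ae = [
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESTDTC": "2015-01-10"},
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESTDTC": "2015-02-03"},
]
```

With `["USUBJID", "AETERM"]` the two records collide, and adding `"AESTDTC"` resolves it. A shortest list that passes this check and still matches the stated domain Structure is a reasonable starting point for the Key Variables.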
Non-compliance with eCTD
› Define.xml file is located in a different folder than the datasets
  › Example:
    › define.xml in …\tabulation
    › Data in …\tabulation\sdtm
› File name is not “define.xml”
  › Example:
    › “define_study_001_sdtm.xml”
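Both issues are easy to test before packaging. The sketch below works on path strings only, so it does not touch the file system; the paths are illustrative and use forward slashes.

```python
from pathlib import PurePosixPath


def define_location_issues(define_path, dataset_paths):
    """Return naming and location problems for a define file and its datasets."""
    issues = []
    define = PurePosixPath(define_path)
    if define.name != "define.xml":
        issues.append(f"file name is {define.name!r}, expected 'define.xml'")
    folders = {PurePosixPath(path).parent for path in dataset_paths}
    if folders != {define.parent}:
        issues.append("define.xml is not in the same folder as the datasets")
    return issues
```

For example, `define_location_issues("tabulation/define_study_001_sdtm.xml", ["tabulation/sdtm/ae.xpt"])` reports both problems from the slide.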
Contact info:
Sergiy Sirichenko
[email protected]