A.Peregoudov, WHO, Geneva

Building an Integrated Clinical Trial Data Management System With SAS Using OLE Automation and ODBC Technology

Alexandre Peregoudov
Reproductive Health and Research, World Health Organization

Geneva, Switzerland

Introduction

Handling clinical data is in many ways different from dealing with any other traditional data. Guidelines for Good Clinical Practice (GCP) impose various strict requirements on computerized systems and procedures to be used in clinical trials.

The SAS System features many of the characteristics that are essential for Clinical Trial Data Management: a powerful database engine, advanced statistical analysis tools, and elaborate reporting and data presentation procedures. It is also important to mention that the SAS self-documented database format is accepted by many national drug regulation agencies as a standard for Computer Assisted New Drug Applications (CANDA). New features and further enhancements of the SAS System with the Nashville Release improve its integration into the Microsoft Windows® platform and its interoperability with other Windows compatible applications.

OLE automation and ODBC support provided by the SAS System pave the way for software integration with other OLE automation compliant Windows applications, to take advantage of the best available in these products. This paper describes how different program components are being integrated by an OLE automation controller through Visual Basic® programming into a Clinical Trial Data Management System (CTDMS 2000) with the SAS System as a core.

System Overview

CTDMS 2000 is being developed by the Reproductive Health and Research Department (RHR), WHO, in response to the need for centralized data management of multi-centre clinical trials. The main objectives are: GCP compliance, lower development and maintenance costs, portability, and provision for new technologies, such as automatic data capture with optical character recognition and WEB-based data entry.

RHR has a world-wide network of trial sites. More than 100 institutions on five continents participate in the collaborative research being co-ordinated by the RHR secretariat from Geneva. These institutions are located in both developed and developing countries. Some have at their disposal advanced telecommunication and informatics infrastructure with adequately trained staff. Others are far away from the most recent technological developments, still relying on traditional post, telephone and fax as the only ways of communicating with outside institutions.

These specific conditions impose a number of severe limitations on the system design and implementation. CTDMS 2000 should be able to receive the clinical trial data, in the form of case report forms (CRFs), from the sites both as paper copies posted to the centralized data management unit and remotely through WEB-based data entry. The data should be processed in a uniform way regardless of the CRF input mode, and results should be communicated back to the sites in exactly the same manner.

SAS/AF Frame objects, like Data Table and Data Form, facilitate the speedy development of data entry applications with a graphical user interface to the data. But the SAS/AF solution, in the given setting, seems to be neither cost effective nor technically feasible. First, it is too expensive to provide a SAS license from the limited RHR budget to all the centres that are technologically ready to implement this option. Second, it does not entirely solve data acquisition problems, especially for the centres from developing countries, and does not permit flexible multi-facet data entry.

Apart from that, paper CRFs are still in use in 100% of the clinical trials conducted by RHR (and by many other sponsors). It is unlikely that within the next few years there will be a migration to completely electronic data processing systems. Technological levels, due to wide geographic representation and differences in economic status, vary too much from one trial site to another. Therefore, CRFs should be designed, translated into various local languages, printed in multiple copies and distributed amongst the trial sites. As the CRF is part and parcel of the clinical trial protocol, it appears logical to integrate the form design tool into the clinical trial data management system.

Most form design software products not only create a graphical environment for CRF development but also provide means for direct data entry through electronic forms. This is GCP compliant (if the data is protected by electronic signature) and certainly the least expensive solution, but it needs a reliable Internet connection and qualified technical support staff at the trial site.

The paper CRFs, which comprise the total current data flow, will be processed by OCR (optical character recognition) software. Preliminary testing showed that the OCR engines are capable of reading form images and capturing data from both machine-printed and hand-written forms, including text, check marks, bar codes, labels and other fields, with a very high degree of precision. The advantages of OCR automatic data capture are numerous. OCR not only replaces manual data entry, the bottleneck of any data collection application, with fast, reliable and cost-effective procedures; it also provides all the means to build electronic archives of the form images and link them with the extracted data, thus ensuring an effective mechanism for the audit trail required by GCP.

Therefore, the solution that has been adopted for CTDMS 2000 reinforces the SAS System with two OLE automation compatible software tools (see Fig. 1). CRF design and related electronic data entry are being implemented with JetForm® (JF). Paper CRF processing and automatic data capture from the forms are handled by Eyes and Hands for Forms® (EHF), the OCR software from ReadSoft®. Both packages are Windows® compatible applications and allow for full integration with the SAS System under the Windows NT platform.

The core of the data management procedures (maintenance and update of clinical trial databases, validation of transaction records, data quality report generation) is being implemented with the SAS Base software using its macro and formatting facilities.

Fig. 1. CTDMS 2000 Overview

[Diagram not reproduced. Components shown: Trial Sites; CTDMS 2000 Trial WEB Site; CRF Design with JF; CRF Definition with EHF; CRF Processing with JF; CRF Processing with EHF; Transaction databases; OLE Automation Controller; Data Dictionary; Validation Dictionary; Main Database; Database Update; Data Validation; Data View; Data Query; Quality Reporting.]

Provision is being made to fully utilize new features of the SAS System that are now available in Version 8.

Form Design and Definition

CTDMS 2000 exposes the full functionality of the JetForm® (Design® & Filler®) graphical development tools for form design and data entry. The JF Design module has been used for RHR clinical trials for many years, initially as a mainframe application and then under Windows. A fragment of a CRF designed with this tool is shown in Fig. 2. JF proved to be an efficient, flexible and powerful graphical design tool, with many more attractive features in the Windows environment, such as WEB-based data entry, forms dictionary and ODBC support.

The outcome of the form design system is the CRF in an electronic format. Being an integral part of the clinical trial protocol, the CRFs are distributed amongst the trial sites in hard copies and, if necessary, electronically.

In order to be processed by the OCR engine for automatic data capture, the form first has to be defined with EHF. The definition procedure works with the form image (TIF format) produced by a scanner from the paper copy. This is an advantage of the EHF software: it does not depend on any particular design tool. EHF deals with forms of any type and origin, even those drawn by hand, provided that the form design complies with a few minor requirements imposed by the software.

Data Dictionary

The form design and definition process results in a primary data dictionary being created during these phases. Whether the dictionary is JF-specific or EHF-specific, it includes such key attributes as variable name, type and format.

These dictionaries are exported and further translated into the SAS System format through Visual Basic programming. At this stage the primary dictionary is extended with a few more control variables to identify the transaction batch, user, and date/time of entry and validation, and with a few more data attributes to supply SAS compatible variable labels and print formats. The data dictionary is stored within CTDMS 2000 as a SAS data set. This allows for automatic generation of the related SAS transaction databases, thus reducing overall application set-up time. Most of the data dictionary components may be modified if and when necessary.
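The dictionary extension step can be illustrated with a minimal Python sketch. The paper performs this translation in Visual Basic and stores the result as a SAS data set; everything below is an assumption made for illustration, including the `DictEntry` structure, the control variable names (`BATCH_ID`, `USER_ID`, `ENTRY_DT`, `VALID_DT`) and the choice of a SAS `attrib` statement as the generated output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """One data dictionary row (hypothetical layout)."""
    name: str        # variable name
    vtype: str       # "num" or "char"
    length: int      # storage length
    fmt: str = ""    # SAS print format, e.g. "DATETIME16."
    label: str = ""  # SAS variable label


# Hypothetical control variables; the paper names their purpose, not their names.
CONTROL_VARIABLES = [
    DictEntry("BATCH_ID", "num", 8, "", "Transaction batch"),
    DictEntry("USER_ID", "char", 8, "", "User who entered the record"),
    DictEntry("ENTRY_DT", "num", 8, "DATETIME16.", "Date/time of entry"),
    DictEntry("VALID_DT", "num", 8, "DATETIME16.", "Date/time of validation"),
]


def extend_dictionary(primary):
    """Append the control variables that the primary dictionary lacks."""
    names = {e.name.upper() for e in primary}
    extras = [c for c in CONTROL_VARIABLES if c.name.upper() not in names]
    return list(primary) + extras


def attrib_statement(entries):
    """Render the dictionary as the text of a SAS ATTRIB statement."""
    lines = ["attrib"]
    for e in entries:
        length = f"${e.length}" if e.vtype == "char" else str(e.length)
        parts = [f"  {e.name} length={length}"]
        if e.fmt:
            parts.append(f"format={e.fmt}")
        if e.label:
            escaped = e.label.replace("'", "''")  # double embedded quotes
            parts.append(f"label='{escaped}'")
        lines.append(" ".join(parts))
    lines.append(";")
    return "\n".join(lines)
```

Generating the statement from the dictionary, rather than writing it by hand for each form, is what would make the automatic creation of transaction databases possible in such a design.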

When both JF and EHF primary dictionaries are created, the data dictionary module checks whether they are comparable and makes sure that both are translated into the same data dictionary.
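A comparability check of this kind might look like the following Python sketch. It assumes, purely for illustration, that each primary dictionary has been exported as a mapping from variable name to a (type, format) pair; the function name and the message wording are hypothetical.

```python
def compare_dictionaries(jf, ehf):
    """Return a list of discrepancies between two primary dictionaries.

    Each argument maps a variable name to a (type, format) tuple.
    An empty result means the dictionaries are comparable.
    """
    problems = []
    for name in sorted(set(jf) | set(ehf)):
        if name not in jf:
            problems.append(f"{name}: missing from JF dictionary")
        elif name not in ehf:
            problems.append(f"{name}: missing from EHF dictionary")
        elif jf[name] != ehf[name]:
            problems.append(f"{name}: JF {jf[name]} differs from EHF {ehf[name]}")
    return problems
```

Only when the list of discrepancies is empty would both dictionaries be translated into a single data dictionary.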

Transaction data dictionaries serve as a source for the design of all other databases, such as the main database and the analysis database. This is done merely by selecting the required variables from different dictionaries and writing the SAS code to compile the data.

Multi-Facet Data Entry

CTDMS 2000 provides two data entry modes: paper CRFs centrally processed by the OCR engine with automatic data capture (EHF tool) and remote WEB-based entry through an electronic CRF format (JF tool).

In the case of OCR reading, data manager intervention is only required at the data verification stage. By comparing the CRF image on the computer screen with the extracted information (see Fig. 2), the data manager decides immediately whether to accept or correct the data or to generate a data query to the investigator at the trial site.

Fig. 2. On-screen data verification after OCR reading by EHF

Whatever the input mode of the given CRF, CTDMS 2000 generates a unique form-specific SAS transaction database. The transaction database is populated directly by EHF or JF through the SAS ODBC driver. As a back-up option, a flat ASCII transaction file can also be exported from either data entry mode.

Database Update

The transaction database has a very short life span. It is used as a bridge between the data entry module and the main database. Once the transaction database is completed (and, optionally, validated and corrected) the main database is automatically updated.

Database update in the SAS System is straightforward Data step processing that requires just a few basic SAS statements. It becomes slightly more complicated in order to allow CTDMS 2000 to control the procedure: preventing an update with transaction records that already exist in the main database, determining the number of input and output records, and reporting a summary of the update processing to the user for logging.
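The three control requirements listed above can be sketched in Python as follows. CTDMS 2000 implements the update as a SAS Data step; this sketch only mirrors the logic, and the record layout, the `rec_id` key name and the summary fields are assumptions made for illustration.

```python
def update_main(main, transactions, key="rec_id"):
    """Append new transaction records to the main database.

    Records whose key already exists in the main database (or earlier in
    the same batch) are rejected. Returns the updated database and a
    summary suitable for logging.
    """
    existing = {rec[key] for rec in main}
    added, rejected = [], []
    for rec in transactions:
        if rec[key] in existing:
            rejected.append(rec)
        else:
            existing.add(rec[key])
            added.append(rec)
    summary = {
        "input": len(transactions),
        "added": len(added),
        "rejected": len(rejected),
        "output": len(main) + len(added),
    }
    return main + added, summary
```

The summary dictionary plays the role of the update report that is passed back to the user for logging.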

Data Quality Control

A great deal of effort needs to be put into quality control when managing clinical trial data. Ideally, computer records should mirror the original data from the CRFs 100%. Quite often, standard operating procedures applied by the pharmaceutical industry require complete checks of a certain percentage of electronic records against the CRFs.

Collected data should be checked for completeness, consistency and correctness. Any detected error should be annotated and, if necessary, referred to the investigator in charge as a data query. When resolved, the data query should be translated into a formal data correction and applied to the trial database.

Depending on the complexity of the trial and the amount of information to be collected, data quality checks may be very numerous. The more thorough the data quality control, the more reliable the data will be.

Two types of quality checks are established in CTDMS 2000. Both require only SAS Base features.

Range check (RChk) addresses a single item of information (i.e. a single SAS variable) and verifies whether the value falls within the specified range (continuous variable) or belongs to the specified list (discrete variable).

Cross check (XChk) involves a number of information items (i.e. a number of SAS variables) that may belong to different databases. Normally, XChk is defined through the SAS language as a certain logical condition to be satisfied.

Validation Dictionary

Every variable in the SAS transaction database may be linked by CTDMS 2000 with a range check. The validation dictionary module takes the check definition as a valid range or as a valid list of values that the variable may be assigned. This definition is then transformed by the module into a SAS user-written format. The RChk format is applied at the database validation step only. When executed by the function put(variable,RChk-format), any value of the given variable is translated into one of the RChk-format keywords. The main one is the standard keyword that stands for the valid case. The others signal a range error. When defining RChk, the user may assign specific keywords to the different range errors depending on their severity.

The collection of validation formats defined for the whole database constitutes the RChk part of the validation dictionary. Physically, range checks in source form are maintained as a SAS data set that may be updated as necessary. After compilation they are stored in a SAS catalogue.
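The way a range check maps each value to a keyword can be sketched in Python. In CTDMS 2000 this mapping is a SAS user-written format applied with the put function; the closure below is only an analogue, and the keywords `OK`, `INVALID`, `MISSING` and `UNKNOWN` are hypothetical, since the paper does not list the actual RChk keywords.

```python
def make_range_check(valid_range=None, valid_values=None, special=None):
    """Build a function that translates a value into a range check keyword.

    valid_range  -- (low, high) bounds for a continuous variable
    valid_values -- collection of allowed codes for a discrete variable
    special      -- optional mapping of specific values to their own
                    keywords, e.g. to grade errors by severity
    """
    if valid_range is None and valid_values is None:
        raise ValueError("either valid_range or valid_values is required")
    special = dict(special or {})

    def rchk(value):
        if value in special:
            return special[value]
        if value is None:
            return "MISSING"
        if valid_values is not None:
            return "OK" if value in valid_values else "INVALID"
        low, high = valid_range
        return "OK" if low <= value <= high else "INVALID"

    return rchk


# Usage: a day-of-month variable where the code 99 has its own keyword.
day_check = make_range_check(valid_range=(1, 31), special={99: "UNKNOWN"})
```

A record passes the range check for a variable only when the returned keyword is the one reserved for the valid case.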

A cross check (XChk) definition consists of five components: the list of related SAS variables, the SAS code which sets the logical condition that the variables should meet, the descriptive text and the short message associated with the XChk, and the clause controlling when the check should be activated (always, if and only if all the relevant variables are valid, not active, etc.).

The collection of cross checks defined for the whole database constitutes the XChk part of the validation dictionary. The validation dictionary module verifies all the XChk components provided by the user and stores them as a SAS data set that may be updated as necessary. The XChk descriptive text may include a macro variable that will be assigned a value after execution of the XChk code by the SAS System.

By their definition, cross checks may be either limited to the transaction database or applicable to the main database. The XChk clause provides a flexible way to temporarily “switch off” some of the global cross checks that are not applicable to a particular transaction database.
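The five XChk components and the activation clause can be sketched as a small Python structure. In CTDMS 2000 the condition is SAS code stored in a SAS data set; here it is a Python function, and the field names, the activation values (`always`, `if_valid`, `off`) and the variable names `first_dose` and `second_dose` are assumptions. The example rule is the dose interval rule from the validation report in Fig. 3, simplified to use combined date/time values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional, Set


@dataclass
class CrossCheck:
    check_id: str
    variables: List[str]               # related variables
    condition: Callable[[dict], bool]  # True when the record is consistent
    text: str                          # descriptive text
    message: str                       # short message
    activation: str = "always"         # "always", "if_valid" or "off"


def run_cross_check(check: CrossCheck, record: dict,
                    range_ok: Set[str]) -> Optional[str]:
    """Return the short message if the check fails, otherwise None.

    range_ok holds the names of variables that passed their range checks.
    """
    if check.activation == "off":
        return None
    if check.activation == "if_valid" and not all(
            v in range_ok for v in check.variables):
        return None
    return None if check.condition(record) else check.message


def dose_interval_ok(rec: dict) -> bool:
    """Second dose must follow the first by 10 to 18 hours."""
    delta = rec["second_dose"] - rec["first_dose"]
    return 10 <= delta.total_seconds() / 3600 <= 18


DOSE_CHECK = CrossCheck(
    check_id="xCheck 205",
    variables=["first_dose", "second_dose"],
    condition=dose_interval_ok,
    text="Second dose should be taken 10-18 hours after the first one.",
    message="Protocol violation",
    activation="if_valid",
)
```

Setting `activation` to `off` corresponds to temporarily switching off a global cross check for a particular transaction database.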

Validation Procedures

The burden of the final data validation and quality control remains with SAS Base. Though simple range checks can be set up in both data entry modes, thorough final validation of the transaction and main databases is essential to justify corrections to the source data as well as for documentation purposes.

The SAS formatting features and macro facilities turn out to be extremely powerful and flexible for the data quality checks and reporting. The validation processing in CTDMS 2000 is a 100% SAS Data step application. It employs Data step and macro functions with a few procedures available in SAS Base.

The executable code for range and cross checks is defined as SAS macros. When initialised, they generate a series of global macro variables and statements that make the SAS Base system apply the validation dictionary. In fact, the validation dictionary components are copied to the macro variables. Most of the processing during the validation step takes place in the macro environment. It results in the validation report, a SAS data set, which brings together all the violations of RChk and XChk rules that occurred for every database record.

Validation Reporting

The goal of clinical trial data validation is to identify incomplete, inconsistent, and incorrect data and to report it to the data manager and investigators for prompt corrective action.

The validation report should be as detailed as possible. Any relevant information, from the data and validation dictionaries as well as from the database, may help in taking quick and adequate decisions with respect to corrections.

ODS formatting and graphical options in the Nashville Release allow one to produce very attractive looking validation reports (see Fig. 3). What is more important, the report can be directed through ODS to the HTML destination. When the CTDMS 2000 WEB site is set up (future plans), the validation report will be immediately available world-wide for referral by all the investigators involved in the trial.

RHR Project 97902 (PH)
Mifepristone and Two Regimens of Levonorgestrel in Emergency Contraception

Validation Report for: Centre 1996 - Ulaanbaatar. Date: 18 Apr 2000 16:31. Page 1

Report columns: Case No, Check Id, Reference/Label, Value, Message

Entry 1 (Case No and Check Id not legible in the extracted text)
Reference/Label: ESQ08D Date of ultrasound exam (DD); ESQ08M Date of ultrasound exam (MM); ESQ08Y Date of ultrasound exam (YYYY); ESQ13 Fetal heart beats
Value (as extracted, unseparated): 9999 99999
Message: Invalid value (repeated for each item)

Entry 2: Case No 0028-P, Check Id xCheck 129
Text: Suspected pregnancy. Please verify the data and check pregnancy report for completeness
Reference/Label: ESQ04 Pregnancy test; ESQ08 Date of ultrasound exam; ESQ09 Duration of pregnancy; ESQ10 Date of conception; ESQ11A Amniotic sac (A); ESQ11B Amniotic sac (B); ESQ12 Crown-rump length; ESQ13 Fetal heart beats; ESQ14 Carry pregnancy to term
Value (as extracted, unseparated): 2. 3504AUG1998 99999991
Message: Pregnancy

Entry 3: Case No 0110-Z, Check Id xCheck 205
Text: Second dose should be taken 10-18 hours after the first one. Actual delay is 9 hours
Reference/Label and Value: AMQ23A Date 1-st dose taken 13JAN1999; AMQ23B Time 1-st dose taken 12:40; F1Q02A Date 2-nd dose taken 13JAN1999; F1Q02B Time 2-nd dose taken 21:40
Message: Protocol violation

Fig. 3. Validation report in HTML format

Data Query Resolution

The validation data set produced by CTDMS 2000 contains all the information items necessary to automatically generate (within Visual Basic) electronic query resolution forms. After a data query is answered by the investigator or resolved by the data manager, the correct data values are entered through these forms. From the data entries, the query resolution module creates SAS data correction statements of the sort: if Rec_id then variable = correct_value. These apply modifications to the database.
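How a query resolution entry could be turned into a correction statement is sketched below in Python. The paper generates these statements from Visual Basic forms; the function names, the equality comparison on the record identifier and the quoting rules here are assumptions that expand the schematic form given above.

```python
def sas_literal(value):
    """Render a Python value as a SAS literal (hypothetical helper)."""
    if value is None:
        return "."  # SAS numeric missing value
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def correction_statement(rec_id, variable, correct_value, key="Rec_id"):
    """Build one SAS data correction statement for a resolved query."""
    return (f"if {key} = {sas_literal(rec_id)} "
            f"then {variable} = {sas_literal(correct_value)};")


# Usage: one statement per resolved query, collected into a Data step body.
resolved = [("0110-Z", "ESQ13", 1), ("0028-P", "ESQ04", None)]
statements = [correction_statement(*entry) for entry in resolved]
```

Keeping each correction as a separate generated statement leaves a readable record of every change applied to the database.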

It would be natural to integrate data query entry boxes directly into the HTML-based validation report, next to the Value column (see Fig. 3). If this is feasible, data query resolution could be done directly by the investigators via the Internet.

Further Developments

Some components of CTDMS 2000 have already been programmed and tested. The validation processing is being used for the data management of the on-going RHR clinical trials.

The Nashville Release (distributed in Switzerland as from April 2000) brought more power that extensively improves the SAS System and its integration with other OLE automation compliant Windows applications. Most of the effort now being made is intended to incorporate new enhancements of the SAS System into the existing CTDMS 2000 version and to explore new features available in the Output Delivery System.