The New York State University at Brockport

Department of Computational Science

Standardization of “Drug Safety” Reporting Applications

Halley M. Zand

Winter, 2005

Thesis advisor: Dr. Robert Tuzun

Abstract:

The purpose of this thesis is to develop an application process for preparing reports on drug safety. The FDA is responsible for protecting the public health by assuring the safety and security of human and veterinary drugs. Each year, companies that provide medications are required to generate reports that assure the FDA of their drugs' safety.

This thesis proposes an Information Technology infrastructure model that gives the IT organizations of drug providers a strategic perspective on how to computerize their Drug Safety reporting activity. It introduces software development concepts, methods, techniques, and tools for collecting data from multiple platforms and generating reports from them by scripting queries.

Introduction:

According to Guidance Documents for Drug Evaluations and Research from the U.S. Food and Drug Administration, all prescription drugs, both new and generic, need to be approved by the FDA. To obtain these approvals, drug providers are required to generate annual reports on product safety and attach them to their application letter. In addition, any person can report a reaction to or problem with a drug to the FDA. The FDA reviews applications and all reported clinical outcomes to determine whether the reported events were caused by the suspected drug or by something else.

Manual reporting is not practical because of the large volume of data and the differing platforms and formats in which the data are stored. Unfortunately, tools and standards are often poorly used because of a lack of database application modeling, programming, and software engineering skills. User applications are often cobbled together with little more efficiency than manual processing, and tools for automation and large-scale data processing are not utilized.

Hiring qualified staff and selecting software carefully increases quality and reduces costs. A two-hour job may take a week because of poor technical skills, and the cost of software licensing may rise by as much as 5,000 USD from a base of 10,000 USD when too little attention is paid to the productivity of the software tool. A standardized IT infrastructure provides higher computational quality at lower cost. In addition, professional developers with computational science backgrounds are the only group that has sufficient computational knowledge and bookkeeping skills for software application design, together with the ability to apply technical concepts.

The merging of Computational Science and Drug Development Science for Drug Safety Evaluation can evolve within a modern computing environment, and because computational technology grows quickly, designers need an advanced vision of the future. Strong knowledge of computational science and bookkeeping helps developers use what is available and progress from it.

This thesis explains a modern computational architecture for implementing Drug Safety Reporting Applications. This architecture uses advanced IT concepts to increase the quality of work on a large volume of data that may be dynamic rather than static and that comes from distributed computer networks. The thesis is intended to help those who study Drug Safety obtain the greatest possible advantage from their software solution.

Objectives:

SAS is the software application that developers use to provide high-quality reporting applications for Drug Safety. A collection of concepts that work together is required to achieve a computer-based method for Drug Safety evaluation. This paper proposes an infrastructure that uses the optimal solutions for this process. The intent is to use the information gathered to develop the system as a whole. The system can accept data from both paper records and electronic databases. Databases such as Oracle and Microsoft Access can be considered backbones of the system. All computational terminology recommended for the proposed infrastructure must be explained. For example, in some cases data mining might be used to find a pattern and help estimate descriptions of a data field; this data mining ability of the proposed architecture should be illustrated.

In this thesis, entity-relationship database modeling, as well as data accessing, formatting, classification, and scripting, is best illustrated by giving examples and by creating descriptions of longitudinal data. The discussion also covers code consistency across all essential attributes and its efficiency in the proposed infrastructure. Proposed software should support maintainability, but techniques that focus on the data error concept are not within the scope of this paper. To achieve the best result, we need to use all available pieces of accurate data and perform the correct programming processing. These data can come from health care providers, consumers, literature, and other relevant databases. It is important to find ordinary errors during scripting. A missing part or step in the code for data processing (extracting and retrieving data, manipulating data, or making narrative data from queries and assessing them) may cause a large difference between the expected result and the actual one, and so reduce the accuracy of the reports.

Technical Specifications:

Data accessing:

SAS data might come from other application platforms. These data might be

formatted or non-formatted and therefore filed differently in varied environments.

Accessing these data from several servers is done in the following steps.

I. Use the SAS ODBC driver to access the data by communicating with either local or remote SAS servers over TCP/IP. Data can come from a local, remote, or any other type of database server, and can be in any format, including raw data or any vendor's software data set. The ability to read raw data in any format, from any kind of file (including variable-length records, binary files, and free-formatted data, even files with messy or missing data), is required.

II. Combine and manipulate these data on the client side, analyze the resulting data, and distribute it from the server to multiple clients as an executable file.

The following are examples of possible cases in data accessing:

a. Data may exist on a mainframe computer or PC network. These data might be joined to an existing data set, used to create new variables (columns), and used to produce tables and interactive graphs.

b. Raw data may exist on a UNIX server. Other data values and statistics are computed from them, and an HTML report is created for use in web application systems and then stored on a web server on an intranet or Internet platform.

c. Access may be needed to BMDP, SPSS, and OSIRIS files directly, as well as to files such as Microsoft Excel spreadsheets, Microsoft Access tables, dBase files, ORACLE forms, and any other DBMS. In addition, both relational and non-relational databases, including any PC data source, can be considered data files.

d. Relational databases in DB2 format may exist in an OS/390, VM, UNIX, or PC environment.

e. ODBC, Informix, ORACLE, and OLE DB data may come from any platform. They may also come from a Sybase, Teradata, or Microsoft SQL Server machine, or from any other machine.

f. Files may come from ERP systems such as Baan, PeopleSoft, SAP R/3, and SAP BW. Thus global data may be received and processed to create an enterprise report.
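Case b above can be sketched roughly as follows. The sketch uses Python rather than SAS purely for illustration; the sample rows, the column names, and the in-memory SQLite table standing in for a remote DBMS are all assumptions, not part of the proposed system.

```python
import csv
import io
import sqlite3
from html import escape

# Hypothetical raw export, e.g. a delimited file pulled from a UNIX server.
RAW_CSV = """patient_id,drug,dose_mg
P001,DrugA,50
P002,DrugA,100
P003,DrugB,20
"""

def load_raw(text):
    """Read delimited raw data into a list of dicts (one per record)."""
    return list(csv.DictReader(io.StringIO(text)))

def load_from_database():
    """Stand-in for a remote DBMS: an in-memory SQLite table of countries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (patient_id TEXT PRIMARY KEY, country TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [("P001", "US"), ("P002", "DE"), ("P003", "US")],
    )
    rows = dict(conn.execute("SELECT patient_id, country FROM patients"))
    conn.close()
    return rows

def combine(raw_rows, country_by_patient):
    """Join both sources on patient_id and convert the dose to a number."""
    return [
        {
            "patient_id": row["patient_id"],
            "drug": row["drug"],
            "dose_mg": int(row["dose_mg"]),
            "country": country_by_patient.get(row["patient_id"], "unknown"),
        }
        for row in raw_rows
    ]

def to_html(rows):
    """Render the combined rows as a minimal HTML table."""
    headers = list(rows[0])
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r[h]))}</td>" for h in headers) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

report_rows = combine(load_raw(RAW_CSV), load_from_database())
html_report = to_html(report_rows)
```

In a real deployment the two loaders would be replaced by the appropriate access engines, while the join and the report step would stay structurally the same.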

Data Management: After accessing data, it is necessary to manage them by creating, retrieving, and updating database information. This may require advanced programming skills because the information comes from a wide range of data sources that must be merged and then evaluated. Data with the same attributes need a generic format, which requires a manipulation process. Evaluating data values requires computational operations that may be defined as functions. Data sets saved in the data forms may have been extracted from subsets of data. Complex conditional processing may be needed during data manipulation when a wide range of data sources is merged.

After gathering and shaping information, we need statistical analysis to produce reports. These reports are customized and may be complex. Tables, frequency counts, and cross-tabulation tables may be produced to create a variety of charts and plots. The computation of descriptive statistics such as standard deviation, correlations, and other measures of association, as well as linear regression analysis, multi-way cross-tabulations, and inferential statistics, may also be necessary.
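To make the idea of frequency counts and cross-tabulation concrete, here is a minimal sketch in Python (used only for illustration; the thesis proposes SAS for this work). The event records and category names are invented sample data.

```python
from collections import Counter

# Hypothetical adverse event records: (body system, gender).
events = [
    ("Cardiac", "F"), ("Cardiac", "M"), ("Hepatic", "F"),
    ("Cardiac", "F"), ("Skin", "M"), ("Hepatic", "F"),
]

# One-way frequency count of body systems.
system_counts = Counter(system for system, _ in events)

# Two-way cross-tabulation: body system by gender.
crosstab = Counter(events)

def print_crosstab(table):
    """Print the cross-tabulation with one row per system, one column per gender."""
    systems = sorted({s for s, _ in table})
    genders = sorted({g for _, g in table})
    print("system".ljust(10) + "".join(g.rjust(4) for g in genders))
    for s in systems:
        print(s.ljust(10) + "".join(str(table[(s, g)]).rjust(4) for g in genders))

print_crosstab(crosstab)
```

A `Counter` returns zero for combinations that never occur, so empty cells in the cross-tabulation need no special handling.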

These results should be deliverable to a wide variety of locations and platforms in order to suit client needs. Results may need to be presented in many formats, such as markup languages including HTML4 and XML, files formatted for a high-resolution printer such as PostScript, PDF, or PCL, RTF, or even color graphs that can be made interactive using ActiveX controls or Java applets.

System architecture modeling

Information must be gathered by drug providers. These data come from clinical studies by the FDA and other professional investigators. Other information comes from the medical records of patients who were treated with the specific drug. Usually, drug providers study their product before moving on to the evaluation step. The first step is collecting data to generate reports such as the country of origin of patients receiving the drug, worldwide patient exposure, demographic characteristics, the most commonly reported body-system reactions (ordered by gender and/or age of patients), and the summary of deaths or other critical body reactions. Another resource is the company's surveys on products, completed by patients or clients who are volunteers in the U.S. or other countries. These surveys include matching data from MedWatch forms, which the U.S. Department of Health and Human Services accepts as voluntary reports of adverse events and product problems. These manufacturing companies may also be able to receive FDA reports generated on the basis of MedWatch reports about the product. Furthermore, many of the surveys are answered by physicians and other doctors who have an EMR system and are able to answer detailed questions regarding medical conditions and other related medical issues.

[Figure: system architecture diagram. Labels recoverable from the original: Post-marketing Data Warehouse; User; Data Dictionary; Archive; (MedDRA/PubMed); (Oracle); Modification; Clinical Studies Data; Individual clinical trials; Reporting data by investigators; Data Analyses (SAS); Adverse Event Reporting; Clinical Trials; Hospital Labs; Verifications.]

Any tool that is recommended here should be consistent with FDA Standards

and the objectives that follow.

In any Adverse Event Reporting System, the basic calculations and data analysis rest on statistical operations over data sets that are frequently ordered by one or more variables and that come from a variety of data sources. Thus an Adverse Event Reporting System needs to work on any possible platform. For example, if it uses E2B data element structures, then it should be capable of any interactive query or data flow transaction on shared data. SAS is compatible with all computer platforms and works on any type of operating system. It supports data sharing concepts. It supports submission through the Web or any other network that includes Oracle, UNIX, NT servers, or mainframe machines. This means that, regardless of the backbone, SAS can support it.

Data sources may need to be summarized or checked before being reported. Scripting and programming are among the major necessities in development. SAS has a powerful scripting language that can perform any required summarizing, verification, and validation.

In the pharmaceutical field and bioinformatics, SAS software is generally thought of as a tool for statistical analysis programming, but its many other features are a largely untapped resource. Its screen building and object-oriented development abilities are needed to keep up with the latest information technology advances.

SAS is a stand-alone system produced by SAS Institute Inc. and sold on the open market. It exceeds all technical objectives specified here.

The FDA has proposed MedDRA as a standardized dictionary of medical terminology. MedDRA is used internationally in discussions of the regulation of medical products. It provides terms for symptoms, signs, diseases, and diagnoses. It also includes other information such as:

o Names of investigations (e.g. liver function analyses, metabolism tests)
o Sites (e.g. application site reactions, implant site reactions, and injection site reactions)
o Therapeutic indications
o Surgical and medical procedures
o Social and family history terms

SAS and MedDRA are FDA standards. Both are designed to a high standard, and their vendors continue to look for weaknesses and improve their products. Their documentation and user interfaces are user friendly. SAS and MedDRA are generic software, and any specific needs such as security of data or reliability of operations can be negotiated in a service level agreement.

EMR Database: These data come from hospital laboratories and clinical data entry systems. They are documented before and after verification. All documentation is electronic and all reports are submitted electronically. MedDRA performs the encoding that is part of clinical data entry. All data entries are based on standards approved by the FDA.

Terminologies:

Computerized Drug Safety Evaluation requires the following informatics terminologies:

Data classifications

Control Code

Formatting

Quality Control

Data Mining

Gathering information

Accessing and manipulating data

Scripting

Each of these terms names a process or methodology that is discussed in the following sections.

Data Classification:

Any structured analysis of information needs classification. Data classification is the first and best-known task in data flow modeling. The data model of a Drug Adverse Event Reporting System is derived from conceptual information such as entities and their interrelationships. The classification serves as a store of all drug information, linking the analysis, design, implementation, and evolution stages found in most medical applications. It should be consistent and free of clashes, and it is integrated into every part of the system that requires maintainability.

The outcome attributed to adverse events is the most important information that needs to be classified. The data classification for this attribute should be a standard classification that matches the FDA reporting program.

The FDA uses MedDRA as a part of the proposed rule for post-marketing reporting. MedDRA stands for Medical Dictionary for Regulatory Activities; it is an international terminology designed to support the classification, retrieval, presentation, and communication of medical information throughout the medical product regulatory cycle. MedDRA was originally written in English and distributed in ASCII file format, but it is now available in several other languages such as Dutch, French, German, Italian, Portuguese, Spanish, and Japanese. This on-line dictionary is intended to become the global medical terminology standard for every bio-pharmaceutical company in the world. It has the best-known classification, with an integrated update platform that can be used by all standard systems. Most homegrown medical applications and patient medical record systems use this classification, and it is valid for all phases of drug development and for all subscribing pharmaceutical companies.

MedDRA works as a catalog of medical disorders. It has a hierarchical data structure with five levels of terms. Developing queries or retrieving information about medical diagnoses requires hierarchical searching on these terms, and other queries can select records by grouping them according to the hierarchy.
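The hierarchical search and grouping described above can be illustrated with a small sketch. It is written in Python for illustration only, and it assumes the five levels are SOC, HLGT, HLT, PT, and LLT; the sample terms and their arrangement are placeholders and are not quoted from any MedDRA release.

```python
# The five hierarchy levels, from broadest to most specific.
LEVELS = ("SOC", "HLGT", "HLT", "PT", "LLT")

# Each tuple is one path through the five levels (illustrative placeholders).
PATHS = [
    ("Cardiac disorders", "Cardiac arrhythmias", "Rate and rhythm disorders",
     "Tachycardia", "Rapid heart beat"),
    ("Cardiac disorders", "Cardiac arrhythmias", "Rate and rhythm disorders",
     "Bradycardia", "Slow heart rate"),
    ("Skin disorders", "Epidermal conditions", "Rashes",
     "Rash", "Skin rash"),
]

def find_under(level, term):
    """Return every lowest-level term that sits below `term` at `level`."""
    idx = LEVELS.index(level)
    return [path[-1] for path in PATHS if path[idx] == term]

def group_by(level):
    """Group preferred terms (PT) under each term of the chosen level."""
    idx = LEVELS.index(level)
    pt_idx = LEVELS.index("PT")
    groups = {}
    for path in PATHS:
        groups.setdefault(path[idx], []).append(path[pt_idx])
    return groups

cardiac_llts = find_under("SOC", "Cardiac disorders")
pts_by_soc = group_by("SOC")
```

Searching at a higher level returns more terms, which is what lets a report be ordered by body system while the underlying records stay coded at the most specific level.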

The next page picture shows the SOC view of Cardiac and Vascular

investigations (excl enzyme test):

MedDRA classifications have an Object Oriented data structure as shown in the

following screens.

Each MedDRA term has a unique code that can be used as a search key.

A query links collected data to terms in MedDRA. A query can select records based on a description of medical data. To make the selection, the user enters the term to be sought into the 'Search for Value' field and runs the search. The query then selects one of the records returned and identifies information about patients. After that, the codes in the database are ready for any statistical evaluation.
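A query of this kind might look like the following sketch, written in Python with an in-memory SQLite database for illustration. The table names, column names, codes, and sample rows are all hypothetical; the point is only the join between a dictionary table and the collected reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dictionary (code INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE reports (patient_id TEXT, reported_text TEXT, code INTEGER);
INSERT INTO dictionary VALUES (1001, 'Tachycardia'), (1002, 'Rash');
INSERT INTO reports VALUES
  ('P001', 'heart racing', 1001),
  ('P002', 'red skin patches', 1002),
  ('P003', 'fast pulse', 1001);
""")

def patients_for_term(search_value):
    """Search the dictionary for a term, then select the linked patient reports."""
    rows = conn.execute(
        """SELECT r.patient_id, d.code
           FROM dictionary AS d
           JOIN reports AS r ON r.code = d.code
           WHERE d.term LIKE ?
           ORDER BY r.patient_id""",
        (f"%{search_value}%",),
    )
    return rows.fetchall()

matches = patients_for_term("Tachy")
```

Because the reports carry the dictionary code rather than free text, the result set can go straight into counts or other statistics.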

The other advantages of using MedDRA are:

MedDRA is on-line, so it requires no installation or periodic updates on the client system. The application has a standardized interface, is well supported, and requires little effort to interface with any client computing environment. A good designer can take full advantage of this classified information by using it as a shared data set. Updating this shared information keeps all the related outcomes that reference the data set up to date.

Informatics functions such as encoding are already included in MedDRA for its own data sets.

MedDRA maintains high standards and can be updated with queries or by importing data; however, updates require quality control because an error can disrupt everything that depends on the dictionary.

The current MedDRA version has MediMiner for managing and analyzing the coded data, including all data mining. This tool allows analysis of the impact of recoding the data sets from one MedDRA version to another. MedDRA is a standalone product that has been used as an integral component of our range of coding tools. The MedDRA classification can be browsed as a tree that can be collapsed and viewed at every level of detail, for all occurrences in every possible search category such as legend, terms, and coding.

Control Code:

SAS and MedDRA both have code control utilities that provide the following:

Debugging and maintenance support in any branch of code, including a cross-reference listing that shows all the program names that have been declared and used.

An analyzer that discovers uninitialized variables, unreachable code, and uncalled functions and procedures, and that reports the number of times each statement is executed.

MedDRA has MediMiner as its version control utility. During any update in MedDRA 3.1, MediMiner controls all changes by analyzing the coding sets. In MedDRA 4.1, it also assesses the impact on the recoding of data by identifying all codes that remain unchanged and those that may require recoding. It is also possible to identify the codes that no longer exist, those that have been changed in some way, and those that have a related change or where a multiaxial change (one inherited from several of the original codes) has had an impact. Primary and secondary changes are identified, as well as changes in the current status of the code.
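The kind of version comparison described here can be sketched as follows. This is not how MediMiner is implemented; it is a minimal Python illustration that assumes each dictionary version can be reduced to a mapping from code to term, with invented codes and terms.

```python
def compare_versions(old, new):
    """Classify codes between two dictionary versions (each maps code -> term)."""
    return {
        "unchanged": sorted(c for c in old if c in new and old[c] == new[c]),
        "changed": sorted(c for c in old if c in new and old[c] != new[c]),
        "removed": sorted(c for c in old if c not in new),
        "added": sorted(c for c in new if c not in old),
    }

def codes_needing_recoding(coded_data, diff):
    """Return the distinct codes in the data that were changed or removed."""
    flagged = set(diff["changed"]) | set(diff["removed"])
    return sorted({code for code in coded_data if code in flagged})

# Hypothetical old and new versions of a small dictionary.
old_version = {1001: "Tachycardia", 1002: "Rash", 1003: "Nausea"}
new_version = {1001: "Tachycardia", 1002: "Rash generalised", 1004: "Vomiting"}

diff = compare_versions(old_version, new_version)
recoding = codes_needing_recoding([1001, 1002, 1003, 1001], diff)
```

Only the flagged codes need manual review, which keeps the effort of moving coded data to a new version proportional to what actually changed.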

SAS software includes the Source Control Manager (SCM) utility as one of the options in the Desktop selection of the Solutions menu:

SAS -> Solutions -> Desktop -> Development and Programming -> Source Control Manager

SCM includes a friendly GUI with check-in and check-out capabilities for SAS files. This GUI lists all libraries, data sets, catalogs, and catalog entries in hierarchical order. SCM offers flexible testing, revision control, and version labeling, with an easy application distribution utility. With a version label, it is easy to create a copy of an application and place it in other locations on the network. In addition, the SAS/CONNECT utility can place the application on remote machines.

Formatting:

Usability of information is one of the most important components of any application implementation. Usability requires readability, and the readability of any data set is facilitated by standardized formatting. Each line of a data set holds many separate pieces of information, the data values, and formats determine how these values are displayed or used in calculations. Formats set the width of displayed values, the number of decimal places displayed, the handling of blanks, zeroes, and commas, and other details.

SAS supports its own standard formats as well as user-defined formats. Standard formats may be used for numeric, character, or picture data. Users can also define custom formats for use in DATA and PROC steps. User-defined formats are reusable and can be saved in format catalogs. If saved in a permanent SAS catalog, they remain there permanently. If saved in the catalog WORK.FORMATS, they are temporary and retrievable only in the SAS session or job in which they were created. Because catalogs are a type of SAS file that resides in a SAS data library, they work as a lookup facility at execution time, and a reference to an undefined format is intercepted as a run-time error. In this way type checking is supported, which improves the readability of information. If the SAS system option NOFMTERR is in effect, SAS applies its own default format when an undefined format is referenced, so that in some cases these errors can be ignored and execution can continue.
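The behavior described above (a catalog of named formats, an error on an undefined format, and an optional fallback to a default) can be mimicked in a few lines. The sketch below is Python, not SAS; the class, its `strict` flag, and the format names are hypothetical and only loosely analogous to a format catalog and the NOFMTERR option.

```python
class FormatError(KeyError):
    """Raised when an undefined format is requested in strict mode."""

class FormatCatalog:
    """A tiny registry of named formatting functions."""

    def __init__(self, strict=True):
        self.strict = strict
        self._formats = {}

    def define(self, name, func):
        """Register a reusable user-defined format under a name."""
        self._formats[name] = func

    def apply(self, name, value):
        """Format a value; fall back to plain text if lenient and undefined."""
        if name in self._formats:
            return self._formats[name](value)
        if self.strict:
            raise FormatError(name)
        return str(value)  # default formatting

catalog = FormatCatalog(strict=False)
catalog.define("sex", lambda v: {"M": "Male", "F": "Female"}.get(v, "Unknown"))
catalog.define("dollar", lambda v: f"${v:,.2f}")

# In strict mode the same undefined reference is reported as an error.
try:
    FormatCatalog(strict=True).apply("undefined", 3.5)
    strict_failed = False
except FormatError:
    strict_failed = True
```

The lenient mode keeps a long job running, at the cost of unformatted values in the output, so it suits exploratory runs better than final reports.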

Quality Control:

Delivering the correct result requires quality control. SAS recognizes common errors such as syntax, execution-time, data, and semantic errors; in addition, users can check for common mistakes such as the following:

Check for syntax errors:

o every statement ends with a semicolon
o quotation marks are opened and closed
o keywords are spelled correctly
o every DO and SELECT statement is closed by an END statement

Check for execution errors:

o illegal mathematical operations
o observations out of order for BY-group processing
o an incorrect reference in an INFILE statement, such as a misspelled or otherwise incorrectly stated external file name

Check for semantic errors:

o A program may run yet give an incorrect result. These errors are often detectable by checking self-consistency and should always be reported, certainly in the debugging stage and often during production runs.

SAS usually executes the statements in a DATA step one by one, in the order in which they appear. After executing the DATA step, SAS moves to the next step and continues in the same fashion. The programmer must make certain that all SAS statements appear in the proper order so that SAS can execute them correctly.

Check input statements and data. SAS can detect data errors during execution, but these do not terminate processing. After execution, SAS prints a note describing the error. In that note, SAS lists the related values stored in the input buffer and the program data vector.

o The values read by INPUT statements must be checked against the actual variable values.
o The corresponding arrangement of input statements, such as formats, lists, and columns, must be checked too.
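The idea of detecting data errors without terminating the run can be illustrated with a short sketch. It is written in Python for illustration; the record layout (patient_id, age, weight_kg) and the wording of the notes are assumptions, not SAS output.

```python
def read_records(lines):
    """Parse 'patient_id,age,weight_kg' lines; note bad values instead of stopping."""
    records, notes = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.strip().split(",")
        if len(fields) != 3:
            notes.append(f"line {lineno}: expected 3 fields, got {len(fields)}: {line!r}")
            continue
        patient_id, age_text, weight_text = fields
        record = {"patient_id": patient_id, "age": None, "weight_kg": None}
        for name, text, convert in (("age", age_text, int),
                                    ("weight_kg", weight_text, float)):
            try:
                record[name] = convert(text)
            except ValueError:
                # Keep the record, mark the value missing, and log a note.
                notes.append(f"line {lineno}: invalid {name} {text!r}; set to missing")
        records.append(record)
    return records, notes

records, notes = read_records(["P001,34,70.5", "P002,abc,82", "P003,51"])
```

Reviewing the notes after the run plays the same role as reading the log: the job finishes, but every questionable value is traceable to its input line.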

Data mining:

Data mining is a class of database applications that look for hidden patterns in a group of data. Statistical analysis is the method of data analysis that matches the nature of data mining. Statistical analysis might uncover hidden patterns in a large volume of information coming from Adverse Event Reports or survey systems. A data mining process might identify combinations of variables that occur together more often than expected. By applying statistical options, an optimal guess can be made about the best-matching behavior that occurs frequently.
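One simple way to express "more often than expected" is to compare the observed count of a drug and event pair with the count expected if drug and event were independent. The sketch below does this in Python with invented reports; the independence baseline and the threshold of 1.2 are assumptions chosen for illustration, not a method prescribed by this thesis.

```python
from collections import Counter

# Hypothetical spontaneous reports: (drug, event).
reports = [
    ("DrugA", "Rash"), ("DrugA", "Rash"), ("DrugA", "Rash"), ("DrugA", "Nausea"),
    ("DrugB", "Nausea"), ("DrugB", "Nausea"), ("DrugB", "Rash"), ("DrugB", "Headache"),
]

def observed_to_expected(reports):
    """Ratio of observed pair counts to counts expected under independence."""
    total = len(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    event_counts = Counter(event for _, event in reports)
    pair_counts = Counter(reports)
    ratios = {}
    for (drug, event), observed in pair_counts.items():
        expected = drug_counts[drug] * event_counts[event] / total
        ratios[(drug, event)] = observed / expected
    return ratios

ratios = observed_to_expected(reports)

# Pairs reported noticeably more often than independence would predict.
flagged = sorted(pair for pair, ratio in ratios.items() if ratio > 1.2)
```

With so few reports the ratios are unstable, so a flagged pair is a prompt for further review rather than evidence of causation.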

Data mining is a critical aspect of these reporting systems. Occasionally, prediction may be even more important than detection in drug safety evaluation. In the United States, patients can file lawsuits against drug providers for severe adverse reactions. These legal actions often make American drug companies fearful of introducing drugs into the U.S. market. However, data mining on data from other parts of the world offers a way to move drug safety evaluation from a reactive process of detection to a proactive one of prediction. In effect, it helps drug providers adopt a safer marketing strategy rather than take risks.

If MedDRA System Organ Class terms are adopted as a class of events, then one can select related data from patient records for that event and discover statistical rules or patterns automatically from the data, later creating a hypothesis and running tests on the patient record database to verify or refute it. Data mining can thus help protect drug providers against lawsuits. This process uses data from other countries and from clinical studies.

SAS guides data analysis step by step, so that even people with no statistical knowledge are able to run the required processes on selected data sources (the basic options include counts of missing and non-missing values, minimum, maximum, range, sum, mean, variance, standard deviation, standard error of the mean, coefficient of variation, skewness, and kurtosis). In addition, access to data sources can be secured to prevent unauthorized access. SAS also allows different reports and presentations of results to be created (including tables and frequency reports with graphical presentations to visualize the results).
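Most of the basic statistics listed above can be computed in a few lines. The following Python sketch is for illustration only; it omits skewness and kurtosis, treats `None` as a missing value, and uses invented ages as sample data.

```python
import math
import statistics

def describe(values):
    """Summarise a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.mean(present)
    stdev = statistics.stdev(present)  # sample standard deviation
    return {
        "n": n,
        "n_missing": len(values) - n,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "sum": sum(present),
        "mean": mean,
        "variance": statistics.variance(present),  # sample variance
        "std_dev": stdev,
        "std_error": stdev / math.sqrt(n),
        "cv_percent": 100 * stdev / mean,  # coefficient of variation
    }

ages = [34, 51, None, 47, 62, None, 56]
summary = describe(ages)
```

Reporting the count of missing values next to the statistics matters in safety data, because a mean over five of seven patients should not be read as a mean over all seven.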

SAS supports data mining with a large number of statistical procedures (regression, association discovery, time series, and time series cross-sectional (panel) data analysis). Data are usually analyzed by regression, with one observation for each patient. Sometimes it is necessary to correlate the results with cross-sectional data such as geographic region, gender, smoking, alcohol use, and so on.

Gathering information and documenting system specifications:

The available information (such as the toxicological and pharmacokinetic profiles of the individual drug, the treatment indication or indications, the intended populations, etc.) might have been defined in relational databases. The backbone of this system might be an SQL database, Access, or even Excel, but the database's query facility may not be suited to detailed statistical analysis of the data at this stage. This is where SAS helps. SAS has been interfaced with databases to allow large volumes of data to be retrieved efficiently for analysis. Any engine can be assigned to a SAS library, which is the place that holds all access to the stored files. These files might come through a variety of engines such as ODBC, SPSS, SYBASE, REMOTE, META, MYSQL, ACCESS, ORACLE, DB2, etc. To process the data, it is necessary to define all the connections that might be created between the different sets of data records. The first link can express the correspondence of the MedDRA classifications to the patient records. With attention to the relevance of the available data, the patient's medical information works in tandem with the MedDRA classifications to build queries and analyze information.

As a part of the application development process, the following information must be specified:

1. Source data: Miscellaneous data sources may exist. To get correct results, the prescription drug information provided by drug firms should be truthful, balanced, and accurately communicated. The same applies to data coming from clinical and post-marketing trials, or from spontaneous reports (submitted individually by doctors or patients). Dynamic data are operational data from internal systems such as the homegrown applications of clinics or hospitals, manual data coming from paper-chart patient histories, EMRs (Electronic Medical Records), and Adverse Event Reporting (MedWatch).

2. Data Staging: This area includes the storage and processing of data extracted from internal and external systems prior to loading into a SAS data bank. The following is a list of cases.

Information may be located in multiple SQL tables on a local computer or on external servers. If required, one may connect to the database server and use the data dynamically. For example, the Adverse Events Database includes side effects that are serious (such as death or risk of dying, hospitalization, disability, congenital anomaly, or required intervention to prevent permanent impairment or damage). These data are required for generating some particular reports.

Part of the information is held in Aventis Reports or ClinTrace. Data from these two areas might be combined to complete an assignment; in that case, create an executable program that connects to the backbone databases of these two licensed vendor applications and uses the data.

Note: Having a basic knowledge of these databases helps programmers to create standard code. For example, an Aventis or ClinTrace Case ID (Manufacturer Control #) is assigned on an “Episode” basis for each patient. Adverse Events (reported side effects) that are temporally linked to the same episode are entered under the same Case ID. For drugs that are given intermittently, additional episodes (Case IDs) are created for events that occur after different treatment cycles.

Side effects are stored in the company's Core Safety Data Sheets. These sheets are for global labeling of reports and are based on diagnoses, which are in turn assessed by seriousness. All diagnoses reported from intensified monitoring (such as a clinical trial or post-marketing surveillance study) are assessed as associated or not associated with the study medications. These data may be joined to MedDRA information to build a larger directory that is used in SQL scripts.

Drug providers use certain information, such as whether a side effect results from an internal or natural body process, in a causality algorithm for internal clinical interpretation or signal evaluation purposes. In some particular cases, this algorithm must be applied as part of the script logic in the SAS code. If a company has a computerized analysis application, then, depending on the software, it may be possible to connect to that application from inside the SAS script.

Data mining by diagnosis requires MedDRA information. It is recommended to use SAS scripting to create a remote connection that reads the MedDRA ASCII file and imports the data into temporary tables. These tables are deleted at the end of the scripting process.

Note: All transactions such as queries, statistical analyses, or visualizations coming from the sources should be consistent. Sometimes the data do not match well enough to be consistent. To solve this problem, all “no match” data need an appropriate transformation or conversion from their original form to the MedDRA representation.
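The staging pattern in the last case (import the dictionary file into a temporary table, match the reported terms, collect the "no match" rows, then delete the table) can be sketched as follows. The sketch is Python with SQLite rather than SAS; the '$' field delimiter, the two-column layout, the table names, and the sample rows are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a dictionary ASCII file; delimiter and layout are assumed.
DICTIONARY_ASCII = "1001$Tachycardia$\n1002$Rash$\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (patient_id TEXT, reported_term TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("P001", "Tachycardia"), ("P002", "rash"), ("P003", "Dizzy spells")],
)

def stage_and_match(ascii_text):
    """Load terms into a temporary table, match reports, then drop the table."""
    conn.execute("CREATE TEMP TABLE staged_terms (code INTEGER, term TEXT)")
    try:
        reader = csv.reader(io.StringIO(ascii_text), delimiter="$")
        conn.executemany(
            "INSERT INTO staged_terms VALUES (?, ?)",
            [(int(row[0]), row[1]) for row in reader if row],
        )
        rows = conn.execute(
            """SELECT r.patient_id, s.code
               FROM reports AS r
               LEFT JOIN staged_terms AS s
                 ON lower(s.term) = lower(r.reported_term)
               ORDER BY r.patient_id"""
        ).fetchall()
    finally:
        # The temporary table is removed at the end of the process.
        conn.execute("DROP TABLE staged_terms")
    matched = [(pid, code) for pid, code in rows if code is not None]
    no_match = [pid for pid, code in rows if code is None]
    return matched, no_match

matched, no_match = stage_and_match(DICTIONARY_ASCII)
```

The `no_match` list is the input to the transformation step mentioned in the note: those reports need their text converted to a dictionary term before they can be analyzed with the rest.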

3. Metadata: A term used to describe or specify the data. Metadata define all of the characteristics of data required to build databases and applications and to support knowledge workers and information producers. This includes data element name, meaning, format, domain values, business integrity rules, relationships, owner, etc.

For example, the following classification shows the analogy of data concepts in MedDRA:

1. SOC
     MedDRA Code       Numeric
     MedDRA Term       String

2. HLGT
     MedDRA Code       Numeric
     MedDRA Term       String

3. PT
     MedDRA Code       Numeric
     MedDRA Term       String
     COSTART Symbol    AlphaNumeric
     WHO_ART Code      Numeric
     ICDS Code         Numeric
     PT ICD-10 Code    Numeric
     HARTS Code        Numeric
     ICDS_CM Code      Numeric
     JART Code         Numeric
     * SOC Code        Numeric
     * SOC Name        Numeric

4. LLT - Lowest Level Term
     MedDRA Code       Numeric
     MedDRA Term       String
     WHO_ART Code      Numeric
     COSTART Symbol    AlphaNumeric
     ICDS_CM Code      Numeric
     CURRENCY          Character/Boolean
     HARTS Code        Numeric
     ICDS Code         Numeric
     JART Code         Numeric

* Multi valued attribute
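Metadata of this kind can be held in a small table and used to check incoming records. The following Python fragment is a minimal sketch under stated assumptions: the field names are transcribed from the LLT level above, while the type labels (`numeric`, `string`, `alphanumeric`, `boolean`), the `validate` helper, and the sample values (including the code 1002) are made up for the illustration. CURRENCY is treated here as a boolean, which is only one reading of “Character/Boolean”.

```python
# Metadata for the LLT level, transcribed from the classification above.
LLT_METADATA = {
    "MedDRA Code": "numeric",
    "MedDRA Term": "string",
    "WHO_ART Code": "numeric",
    "COSTART Symbol": "alphanumeric",
    "ICDS_CM Code": "numeric",
    "CURRENCY": "boolean",
    "HARTS Code": "numeric",
    "ICDS Code": "numeric",
    "JART Code": "numeric",
}

# One check per type label (bool is excluded from "numeric" on purpose).
CHECKS = {
    "numeric": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "alphanumeric": lambda v: isinstance(v, str) and v.isalnum(),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(record, metadata):
    """Return a list of problems; a value of None counts as missing and is allowed."""
    problems = []
    for field, value in record.items():
        kind = metadata.get(field)
        if kind is None:
            problems.append(f"unknown field: {field}")
        elif value is not None and not CHECKS[kind](value):
            problems.append(f"{field}: expected {kind}")
    return problems


good = {"MedDRA Code": 1002, "MedDRA Term": "Lethargy", "CURRENCY": True}
bad = {"MedDRA Code": "abc", "Severity": 3}
```

Calling `validate(good, LLT_METADATA)` returns an empty list, while `validate(bad, LLT_METADATA)` reports the wrongly typed code and the undefined field.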

Defining Metadata for the adverse event reporting data is also required. These data are:

o Patient Identifier and patient information: age at time of event or

date of birth, sex, weight, etc.

o Outcomes attributed to adverse events, such as death, life-threatening occurrences, hospitalization (initial or prolonged), disability, congenital anomaly, required intervention to prevent impairment/damage, or other.

o Date of event and report in mo/day/yr format.

o Description of problem.

o Relevant tests/laboratory data including dates.

o Other relevant history including preexisting medical condition (e.g.

allergies, race, pregnancy, smoking or alcohol use, hepatic/renal

dysfunction, etc.)
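The items listed above can be gathered into one record type. The Python sketch below is only an illustration of such a record; the class name `AdverseEventReport`, the short outcome labels, and the rule that a report cannot be dated before its event are assumptions made for the example. The sample patient values are taken from the listing used elsewhere in this thesis, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Outcome categories, abbreviated from the list above.
OUTCOMES = {
    "death", "life-threatening", "hospitalization", "disability",
    "congenital anomaly", "required intervention", "other",
}


def parse_mdy(text):
    """Parse a date written in mo/day/yr form, e.g. 12/15/2005."""
    return datetime.strptime(text, "%m/%d/%Y").date()


@dataclass
class AdverseEventReport:
    patient_id: str
    sex: Optional[int]       # None when not reported
    weight: Optional[float]  # None when not reported
    outcome: str
    date_of_event: date
    date_of_report: date
    description: str = ""

    def __post_init__(self):
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome}")
        # Assumed rule for this sketch: a report cannot precede its event.
        if self.date_of_report < self.date_of_event:
            raise ValueError("report date precedes event date")


report = AdverseEventReport(
    patient_id="Hzan0616341", sex=1, weight=200, outcome="hospitalization",
    date_of_event=parse_mdy("12/01/2005"),
    date_of_report=parse_mdy("12/15/2005"),
)
```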

Most medical clinics still use Paper Medical Records (PMRs), but many others have begun to use Electronic Medical Records (EMRs). No standard form has yet been defined for EMRs, but all provide the same information, which requires Metadata definitions. These data are:

o Patient primary reason for medical visit

o History of onset of clinical signs and symptoms,

o Current list of medications the patient is using

o Relevant past medical history, including hospital admission,

surgeries, and diagnosis

o History of family disease, such as diabetes, cancer, heart disease,

and medical illness

o Social history: use of drugs, smoking, job stability, housing and living conditions, incarceration.

o Review of systems: the patient's recollection of symptoms and current medical problems, such as trouble sleeping at night, panic episodes, and results of tests.

o Physical examination: the clinician’s hands-on examination of

patient, including head, eyes, ears, nose, throat, chest, and

extremities

o Labs, including blood glucose, cholesterol, and drug levels

o Studies such as X-ray, MRI, CT, and EKG.

o Progress notes such as record of temporal progression of signs

and symptoms, labs and studies for the length of the study or

admission

4. The entity-relationship model: The specification of the required information for an adverse event serves as a starting point for constructing a conceptual schema (the overall design of the database) for the suggested database. The entity sets targeted here are the drug and patient entity sets, together with their attributes. These entity sets have a relationship that has attributes of its own. This relationship is a “many to many” relationship. Other relationships might be designed between an entity set and its subsets. The relationships between the drug entity set and the ingredient or side effect entity sets are examples. Here, these relationships are “many to one” relationships. This method of design helps in saving memory. In some other cases, such as the patient-drug relationship, the maximum number of participants is limited to two relations, which leaves the design in one general set.

In the following diagram, small rectangles show the entity sets; large rectangles specify attributes; diamonds represent relationship sets; lines link attributes to entity sets and entity sets to relationship sets; arrows indicate that an entity falls exclusively into another entity; double lines indicate “many” relationship sets; bold diamonds show “many to one” relationship sets; and rectangles with non-indexed information indicate information about a relationship set.

The above E_R model is a sample of what can be considered, although the attributes can be designed with more detail in mind. For example, ‘route and dosage’ could be designed as a separate entity because it includes many optional attributes that may be concatenated together as descriptive text. They may also be saved separately in a data source. This E_R model gives substantial flexibility in designing the basic database schema.

Accessing and Manipulating Data:

The first step in

accessing and manipulating data is the DATA Step. The DATA Step is used for accessing, reading, and programming the data processing. As explained before, one of the strengths of SAS is fast and easy access to data from many different sources. In addition to the programming components, SAS has many other features in the DATA Step process that help in developing a standard application. The SAS language has all the statements required for typical data processing: reading raw data files and SAS data sets, adding to them, and writing the results. Sub-setting data, combining multiple SAS files, creating SAS variables, recoding data values, and creating listing and summary reports that include advanced analysis features such as web analytical solutions are also possible. Special focus should be placed on the management of SAS data set input and output, on working with different data types, and on the manipulation of data. It may also be necessary to control SAS data set input and output, combine and summarize data sets, and then process them iteratively with programming

to perform data manipulations and transformations.

Accessing the data comes first. Sometimes the required data file is saved on another server at another location. With an ftp server running, SAS can make an ftp connection and use the external data source remotely, without leaving any copy of the downloaded data on the machine unless SAS writes it out. As an example, one can assume the data belongs to cps-users and is located at ~/halley/thesis/main.data:

filename fromrcr ftp 'main.data' cd='halley/thesis' user='cps-user' host='cps.brockport.edu' recfm=v prompt;

Much of the data might come as raw data. This raw data must be entered into a SAS data set. As an example, one of the clients might send a letter or a txt file that includes parts of the patient’s information. The following listing shows these data once they have been read into a SAS data set.

The SAS System          05:25 Thursday, December 15, 2005   5

PatientId      age   sex   weight   country
Hzan0616341    30    1     200      11
Amir5666892    40    2     180      12
J675bhgfdql    56    2     .        45
Nmjhg567908    12    1     100      23
Iu6-567-567    99    1     170      01

*** A missing value for a numeric variable is represented by a period (.)

Processing Examples:

To use

external files, SAS must be told where to find them. To do this, there are the following choices:

1- Identify the file directly in the INFILE, FILE, or other SAS statement that

uses the file.

2- Set up a fileref for the file by using the FILENAME statement, and then

use the fileref in the INFILE, FILE, or other SAS statement.

3- Use operating environment commands to set up a fileref, and then use the

fileref in the INFILE, FILE, or other SAS statement.

Note: To use several files or members from the same directory, partitioned data set (PDS), or MACLIB, use the FILENAME statement to create a fileref that identifies the directory, PDS, or MACLIB. Then use the fileref in the INFILE statement, enclosing the name of the file, PDS member, or MACLIB member in parentheses immediately after the fileref, as shown in the example below:

/* filename data 'directory-or-PDS-or-MACLIB'; */
/* data1.txt and data2.txt are located in directory c:\thesis */

filename data 'c:\thesis\';

data paitientdata1;
  infile data('data1.txt');
  input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
        +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;

data paitientdata2;
  infile data('data2.txt');
  input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
        +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
run;

Also, from the File menu, ADX can import data from a SAS data set or from an Access database, an Excel spreadsheet, a dBase database, a delimited text file, and files with other common formats. This is helpful when one has saved information in a variety of formats.
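To make the column layout of the INPUT statement above concrete, the same fixed-width record can be read outside SAS. The Python sketch below is an illustration only: the `LAYOUT` table mirrors the informat widths and `+n` pointer moves of the INPUT statement, the helper names are hypothetical, and dates are assumed to be written with slashes (the SAS `mmddyy10.` informat is more permissive). The sample row reuses the patient with a missing weight from the earlier listing; its two dates are invented.

```python
from datetime import datetime


def parse_date(text):
    """Read a date written as mm/dd/yyyy."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# (name, columns skipped before the field, field width, converter)
LAYOUT = [
    ("PatientId", 0, 11, str),
    ("age",       2, 2,  int),
    ("weight",    1, 3,  int),
    ("sex",       2, 1,  int),
    ("date1",     2, 10, parse_date),
    ("date2",     2, 10, parse_date),
    ("country",   2, 12, int),
]


def parse_line(line):
    """Cut one fixed-width record into typed fields."""
    record, pos = {}, 0
    for name, skip, width, convert in LAYOUT:
        pos += skip
        raw = line[pos:pos + width].strip()
        pos += width
        # As in SAS, a lone period (or a blank field) stands for a missing value.
        record[name] = None if raw in ("", ".") else convert(raw)
    return record


# Sample record assembled field by field so the column positions are explicit.
sample = ("J675bhgfdql" + "  " + "56" + " " + "  ." + "  " + "2" + "  "
          + "12/01/2005" + "  " + "12/15/2005" + "  " + "45".rjust(12))
record = parse_line(sample)
```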

In SAS one can gain access to data sources by defining a ‘libref’ and assigning access to them without copying them inside the SAS environment. A ‘libref’ can serve as a shortcut to the metadata on the SAS Metadata Server. Any metadata in the SAS Metadata Server can be read by the META engine, which has options for controlling the output. With one option, the engine creates only the metadata in the repository and does not affect the data sources: if the table does not exist in the data source, the META engine creates the metadata based on the information specified in the application for the output table, and deleting a table deletes the metadata from the repository but does not delete the table from the data source. With another option, deleting a table deletes the table from the data source but does not delete the metadata from the repository.

A SAS Library includes metadata objects that are defined by a ‘libref.’ These objects define the engines that are used to process the data. The library is identified using URI (Uniform Resource Identifier) architecture. To get access to a SAS Metadata Server, define the host address. If working in a TCP network, define the port number. If the protocol is BRIDGE rather than COM, define a user ID and password; otherwise it will not be possible to log into the SAS Metadata Server. In addition, a metadata repository may be referenced by its repository ID or name.

To access these tables, one can use SAS/Warehouse Administrator as a tool. To determine the metadata, it needs to identify and search for the objects by their name, URI, and other identifiers such as their ID. The following script displays this process.

libname upcase meta liburi="SASLibrary?@name='oralib'" ipaddr=d6292.us.GCS.com;

Scripting:

The goal of SQL scripting is to derive the available data from any possible data source. Most vendor applications have an SQL backbone, so with SQL scripting it is possible to perform queries on original or manipulated data (retrieving data from multiple tables; creating views, indexes, and tables; updating or deleting values in existing tables and views; and summarizing them). SQL scripting can be done in either the SAS or the SQL environment.

In the following example, the reduction of the earlier E_R schema to tables is created from inside the SQL environment:

/*------------------------------------------------------------*/
/* create a higher-level entity set for drug information */
CREATE TABLE drug (
  id CHAR(12) NOT NULL,
  generic_name CHAR(25),
  trade_name CHAR(25),
  dosage INT,
  unit INT,
  category INT,
  FOREIGN KEY (category) REFERENCES drug_category(category_id) ON DELETE CASCADE,
  FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE,
  PRIMARY KEY (id)
) ENGINE=INNODB;

/* create the lower level entity sets for drug information */
CREATE TABLE ingredient (
  id INT,
  drug_id CHAR(12),
  ingredient_name CHAR(25),
  ingredient_value INT,
  unit INT,
  INDEX drug_ind (drug_id),
  FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE,
  FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE
) ENGINE=INNODB;

/* the side effects of each drug have a description that should be compatible with the MedDRA Classification */
CREATE TABLE sideeffects (
  MedDRACode INT,
  drug_id CHAR(12),
  INDEX drug_ind (drug_id),
  FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE
) ENGINE=INNODB;

/* create a general entity set for patient information; this entity set can be expanded by other entity subsets, such as patient laboratory information or more information about the history of that patient */
CREATE TABLE paitient (
  id CHAR(12) NOT NULL,
  first_name CHAR(25),
  middle_name CHAR(25),
  last_name CHAR(25),
  DateOfBirth DATE,
  Sex INT,
  weight INT,
  race INT,
  country INT,
  /* race and country are lookup codes; they reference the races and countries tables used elsewhere in these scripts */
  FOREIGN KEY (race) REFERENCES races(races_id) ON DELETE CASCADE,
  FOREIGN KEY (country) REFERENCES countries(ipcode) ON DELETE CASCADE,
  PRIMARY KEY (id)
) ENGINE=INNODB;

/* some relevant patient information might come from the following suggested sub entity set */
CREATE TABLE Relevant_Patients_Info (
  Info_id INT NOT NULL AUTO_INCREMENT,
  paitient_id CHAR(25) NOT NULL,
  allergies_id INT,
  races_id INT,
  Num_pregnancies INT,
  smoking INT,
  alcohol_use INT,
  hepatic_id INT,
  dysfunctions_id INT,
  INDEX (allergies_id),
  FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id) ON UPDATE CASCADE ON DELETE RESTRICT,
  INDEX (races_id),
  FOREIGN KEY (races_id) REFERENCES races(races_id) ON UPDATE CASCADE ON DELETE RESTRICT,
  INDEX (hepatic_id),
  FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id) ON UPDATE CASCADE ON DELETE RESTRICT,
  INDEX (dysfunctions_id),
  FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id) ON UPDATE CASCADE ON DELETE RESTRICT,
  INDEX (paitient_id),
  FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
  PRIMARY KEY (Info_id)
) ENGINE=INNODB;

/* Transforming this E_R model, which includes aggregation, to tabular form is straightforward. The Patient-Drug relationship includes a column for each attribute in the primary key of the entity sets in this relationship (any concomitant medical products that the patient uses and the therapy dates might come from related tables through the drug id and patient id). Also, any available adverse event information that shows the problem of using that drug should be included. */
CREATE TABLE Patients_drugs (
  Info_id INT NOT NULL AUTO_INCREMENT,
  paitient_id CHAR(25) NOT NULL,
  drug_id CHAR(12) NOT NULL,
  therapy_start_date DATE,
  therapy_end_date DATE,
  MedDRACode_DiagnoseForUse INT,
  /* 1 == yes, 2 == no, 3 == doesn't apply */
  /* event abated after use stopped or dose reduced */
  Quest1 INT,
  /* event reappeared after reintroduction */
  Quest2 INT,
  Lot_number INT,
  Exp_Date DATE,
  NDCno INT,
  reason INT NOT NULL,
  date_of_event DATE,
  date_of_report DATE,
  adverse_desc TEXT,
  /* ----- */
  INDEX (paitient_id),
  FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
  INDEX (drug_id),
  FOREIGN KEY (drug_id) REFERENCES drug(id) ON UPDATE CASCADE ON DELETE RESTRICT,
  PRIMARY KEY (Info_id)
) ENGINE=INNODB;
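The “many to many” patient-drug relationship and the effect of its foreign keys can be tried out on a reduced copy of the schema. The sketch below uses Python's built-in SQLite module instead of MySQL/InnoDB, so the column types and the `PRAGMA` line are SQLite specifics rather than part of the thesis schema; only a few columns are kept, and the sample ids (`P1`, `D1`, ...) and the helper `insert_is_rejected` are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE drug (
    id TEXT PRIMARY KEY,
    generic_name TEXT
);
CREATE TABLE paitient (
    id TEXT PRIMARY KEY,
    sex INTEGER
);
CREATE TABLE Patients_drugs (
    Info_id INTEGER PRIMARY KEY AUTOINCREMENT,
    paitient_id TEXT NOT NULL REFERENCES paitient(id) ON DELETE RESTRICT,
    drug_id TEXT NOT NULL REFERENCES drug(id) ON DELETE RESTRICT,
    therapy_start_date TEXT,
    therapy_end_date TEXT
);
INSERT INTO drug VALUES ('D1', 'drug one'), ('D2', 'drug two');
INSERT INTO paitient VALUES ('P1', 1), ('P2', 2);
""")

# Many to many: one patient takes several drugs, one drug is taken by several patients.
conn.executemany(
    "INSERT INTO Patients_drugs (paitient_id, drug_id) VALUES (?, ?)",
    [("P1", "D1"), ("P1", "D2"), ("P2", "D1")],
)


def insert_is_rejected(patient_id, drug_id):
    """Return True when the foreign keys refuse the new patient-drug row."""
    try:
        conn.execute(
            "INSERT INTO Patients_drugs (paitient_id, drug_id) VALUES (?, ?)",
            (patient_id, drug_id),
        )
    except sqlite3.IntegrityError:
        return True
    return False


drugs_of_p1 = [row[0] for row in conn.execute(
    "SELECT drug_id FROM Patients_drugs WHERE paitient_id = 'P1' ORDER BY drug_id")]
```

A row that points at a drug or patient that does not exist is refused, which is the behaviour the FOREIGN KEY clauses in the tables above are meant to provide.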


SQL scripting is required to generate reports on summary statistics. The SQL procedure (PROC SQL) allows SQL to be written inside the SAS environment. SQL scripting therefore extends SAS coding to the retrieval and combination of data from tables or views. New tables and views can be created along with indexes, and data values in PROC SQL tables can be updated. It is also possible to update and retrieve data from Database Management System tables, or to modify a PROC SQL table by adding, modifying, or dropping columns.

Example: Assume the Adverse Events Information from clinical studies, post-

marketing trials, spontaneous reports, and miscellaneous sources (including

independent drug identification numbers and retrospective data collection) are

saved in the above SQL tables. The following script generates a report that

shows Country of Origin for Patients receiving a drug in a post-marketing setting.


proc sql;
  /* It extracts and manipulates grouped and ordered data from patient records to create a new temporary view that includes only patient populations in each country. The country field is defined as an id number; to represent it by country name, it is joined to the columns of the countries table. After the process is done, the temporary view is dropped. */
  create view temp as
    select country, count(country) as count,
           calculated Count/Subtotal as Percent format=percent8.2
    from paitient, (select count(*) as Subtotal from paitient) as survey2
    group by country
    order by count;
quit;

proc sql;
  /* extracts the required data from the temporary view; the view is dropped afterwards */
  title1 'Country or Origin for Patients Receiving the suspected drug in a Postmarketing Setting';
  select c.countryname, t.count as cc, "(", t.Percent, ")"
  from countries c, temp t
  where c.ipcode = t.country;
quit;

proc sql;
  drop view temp;
quit;

Country or Origin for Patients Receiving the suspected drug in a Postmarketing Setting
22:04 Monday, January 16, 2006

CountryName                  Percent
--------------------------------------------------
Greece                 1     (0.1%)
Uruguay                2     (0.2%)
Taiwan                 2     (0.2%)
French Polynesia       2     (0.2%)
Peru                   2     (0.2%)
Korea                  2     (0.2%)
South Africa           3     (0.2%)
Portugal               3     (0.2%)
Turkey                 4     (0.3%)
Hungary                4     (0.3%)
Austria                4     (0.3%)
New Zealand            7     (0.5%)
Brazil                 7     (0.5%)
Norway                10     (0.8%)
Israel                11     (0.8%)
Chile                 15     (1.1%)
Netherlands           26     (2.0%)
Italy                 39     (3.0%)
Spain                 38     (2.9%)
Belgium               38     (2.9%)
United States         42     (3.2%)
Finland               44     (3.4%)
Germany               50     (3.8%)
Sweden                69     (5.3%)
Denmark               91     (7.0%)
Canada                97     (7.4%)
Australia            107     (8.2%)
Great Britain        271     (20.8%)
France               313     (24.0%)
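The count-and-percentage logic behind this report can be checked on a handful of rows. The following sketch runs the equivalent query through Python's built-in SQLite module rather than PROC SQL; the table and column names follow the scripts above, while the eight patient rows and three countries are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (ipcode INTEGER PRIMARY KEY, countryname TEXT);
CREATE TABLE paitient (id TEXT PRIMARY KEY, country INTEGER);
INSERT INTO countries VALUES (1, 'France'), (2, 'Greece'), (3, 'Spain');
INSERT INTO paitient VALUES
    ('P1', 1), ('P2', 1), ('P3', 1), ('P4', 2),
    ('P5', 3), ('P6', 3), ('P7', 1), ('P8', 3);
""")

# Patients per country, plus each country's share of all patients.
rows = conn.execute("""
SELECT c.countryname,
       COUNT(*) AS n,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM paitient) AS pct
FROM paitient p
JOIN countries c ON c.ipcode = p.country
GROUP BY c.countryname
ORDER BY n, c.countryname
""").fetchall()

# Lay the rows out in the same shape as the report: name, count, (percent).
report = [f"{name:<12}{n:>4} ({pct:.1f}%)" for name, n, pct in rows]
```

The scalar subquery plays the role of the `Subtotal` column in the view above: it supplies the overall patient count against which each country's count is divided.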

Patient exposure to the drug can be calculated and presented in different ways. Although the available exposure data cover a period of time, the primary focus of a submitted report may be the number of exposures and cases that occurred in a specific period. In the following report, global patient exposures from 1989 to 1994 are provided:

proc sql;
  create view temp1 as
    select region, count(region) as SachetSales
    from paitient
    group by region
    order by SachetSales;
quit;

proc sql;
  create view temp2 as
    select region, count(region) as Exposures
    from paitient
    where paitient_Id in
      (select paitient_Id from Patients_drugs
       where substr(therapy_start_date,7,4) > '1983'
         and substr(therapy_end_date,7,4) < '2001')
    group by region
    order by Exposures;
quit;

proc sql;
  title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
  select c.region, t1.SachetSales, t2.Exposures
  from countries c, temp1 t1, temp2 t2
  where c.ipcode = t1.region and c.ipcode = t2.region;
quit;

proc sql;
  /* join the two views by region so that each region is summed only once */
  select sum(t1.SachetSales) as SumSachetSales, sum(t2.Exposures) as SumExposures
  from temp1 t1, temp2 t2
  where t1.region = t2.region;
quit;

proc sql;
  drop view temp1, temp2;
quit;

Worldwide Patient Exposure to the suspected drug 1989 to 1994          23
20:55 Saturday, January 21, 2006

Region            SachetSales      Exposures
------------------------------------------------
Europe            230,649,500      1,895,749
Australia           5,292,542         43,500
Korea               3,067,300         25,211
Canada              1,497,100         12,305
Rest of World       2,405,064         19,768

SumSachet         SumExposures
------------------------------
242,911,506       1,996,533


Inside the SQL scripting, one may occasionally work with data that are imported from the MedDRA application. These data may already exist on a machine, in which case it is not necessary to access the MedDRA environment a second time. One can use the SAS utilities to convert data from one form to another or to copy them between machines. A free trial of MedDRA is available on the MSSO website. It contains a sample copy of MedDRA data saved in an Access database, which could also be exported to an Excel file if needed. If the data set is standard and complete, it is better to use it as a shared data source. This shared data source may be stored in a Relational Database Management System (RDBMS), in an Excel spreadsheet, or even in a flat file. If it is stored on an external machine, it becomes an external data source and a SAS connection is required for access.

The following SAS script retrieves the MedDRA Classification from a data source. It imports data from an external file (a spreadsheet) into a SAS table. This code was generated and saved during the import wizard process. Saving this type of script avoids redoing the work when the information is needed again.

PROC IMPORT OUT=WORK.MEDDRAInfo
     DATAFILE="C:\thesis\CTCAEv3.xls"
     DBMS=EXCEL REPLACE;
  SHEET="'CTCAE v3#0 MedDRA Codes$'";
  GETNAMES=YES;
  MIXED=NO;
  SCANTEXT=YES;
  USEDATE=YES;
  SCANTIME=YES;
RUN;

The following script works as well:

filename xclfil 'C:\thesis\CTCAEv3.xls';
proc import datafile=xclfil out=WORK.MEDDRAInfo dbms=excel97 replace;
  getnames=yes;
run;

The above script retrieves the MedDRA Classification from a data source. Often these data do not represent all MedDRA data. Usually, only a subset of these data is required and is stored in an external file.

Assume MedDRAClassifications.xls includes only the MedDRA Classifications data. To generate reports related to side effects, importing this file is enough to retrieve the appropriate symptoms or signs listed by outcome.

PROC IMPORT DBMS=EXCEL OUT=work.MedDRA
     DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
RUN;

/* for a comma-delimited copy of the file, a DATA step would instead read it with:
   infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd; */

proc print data=MedDRA;
run;

The SAS System          05:25 Thursday, December 15, 2005   1

Obs   MedDRATermLevel1              MedDRATermLevel2
  1   Nervous system disorders
  2                                 Balance disorder
  3                                 Convulsion
  4                                 Lethargy
  5                                 Optic neuritis
  6                                 Paraesthesia
  7                                 Speech disorder
  8                                 Tunnel vision
  9                                 Visual field defect
 10
 11   Eye disorders
 12                                 Astigmatism
 13                                 Blindness
………………….

* Sometimes the information that comes from an Adverse Event Report, clinical trials, or any other post-marketing or Pharmacovigilance application has a provisional order number assigned to outcome data that cannot be correctly mapped to MedDRA. These order numbers alone can be used when electronic reports or data are submitted and automatically converted to MedDRA codes.

From the parameter list created, values can be individually highlighted and chosen for processing. These required parameter values may be retrieved from tables that have been created by scripts such as the following:

proc sql;
  create table reasonlist1 (Description char(60));

  insert into reasonlist1
    values('Patient Died')
    values('Life threatening illness')
    values('Required emergency room/doctor visit')
    values('Required hospitalization')
    values('Resulted in permanent disability')
    values('Resulted in prolongation of hospitalization')
    values('others');
quit;

The ordering of the above parameter values is important for selecting the rows by

their Order Number and the description of these values must be the same as

those found on the FDA forms. The following script creates a parameter table for

the abbreviations used by Drug Safety Reporting. The ordering and description of

these abbreviations is also consistent with FDA standards.

proc sql;
  /* column widths enlarged so that the longest abbreviation and description fit */
  create table abbreviations (abb char(8), Description char(80));

  insert into abbreviations
    values('ADR', 'adverse drug reaction')
    values('AE', 'adverse event')
    values('AERS', 'Adverse Event Reporting System')
    values('bid', 'twice daily')
    values('CI', 'confidence interval')
    values('CIOMS', 'Council for International Organizations of Medical Sciences')
    values('COSTART', 'Coding Symbols for Thesaurus of Adverse Reaction Terms')
    values('CSDS', 'Core Safety Data Sheet')
    values('CV', 'coefficient of variation')
    values('FDA', 'Food and Drug Administration')
    values('GABA', 'Gamma amino butyric acid')
    values('HARTS', '')
    values('IBD', 'International Birth Date')
    values('ICD9-10', 'International Classification of Diseases, 9th and 10th Editions/Revisions')
    values('ICD9CM', 'International Classification of Diseases, Ninth Revision, Clinical Modification')
    values('ICH', 'International Conference on Harmonisation')
    values('MedDRA', 'Medical Dictionary for Regulatory Activities')
    values('NDA', 'New Drug Application')
    values('PSUR', 'Periodic Safety Update Report')
    values('qd', 'once daily')
    values('qid', 'four times daily')
    values('SAE', 'serious adverse event')
    values('SD', 'standard deviation')
    values('SE', 'standard error')
    values('US', '')
    values('WHO-ART', '');
quit;

Formatting may be used for other parameter values. The ATTRIB Statement

permanently associates a format with a variable. SAS uses the format to write

the values of the variables specified.

attrib sales1-sales3 format=comma10.2;

Due to the permanent association of the ATTRIB Statement in the above

command, any subsequent DATA Step or PROC Step will use COMMA10.2

format to write the values of sales1, sales2, and sales3.

In addition to the default formats that are supplied by Base SAS Software, one can create custom-made formats with the FORMAT procedure. The following format procedure is used to define the Static Parameter Values that may be required. It expresses weights and measures using USP (United States Pharmacopeia) standard abbreviations for dosage units.

proc format;
  value $dosage_units
    '1' = 'm'
    '2' = 'kg'
    '3' = 'g'
    '4' = 'mg'
    '5' = 'mcg'
    '6' = 'L'
    '7' = 'mL'
    '8' = 'mEq'
    '9' = 'mmol'
    '10' = '%';
run;


*see legend below for definitions

(1) m (lower case) = meter

(2) kg = kilogram

(3) g = gram

(4) mg = milligram

(5) mcg = microgram

(do not use the Greek letter mu which has been misread as mg)

(6) L (upper case) = liter

(7) mL (lower/upper case) = milliliter (do not use cc which has been misread as U or the

number 4)

(8) mEq = milliequivalent

(9) mmol = millimole

It can also be used to define a format for the dosage form of the drug in question (see procedure below):

proc format;
  value $dosage_form
    '1' = 'capsule'
    '2' = 'cream'
    '3' = 'ear drop'
    '4' = 'eye drop'
    '5' = 'inhaler'
    '6' = 'injection'
    '7' = 'oral solution'
    '8' = 'solution'
    '9' = 'suspension pediatric drop'
    '10' = 'syrup'
    '11' = 'tablet'
    '12' = 'chewable tablet'
    '13' = 'other';
run;

Formats for time durations, age ranges, meal instructions, and times of day are also available:

proc format;
  value $time_duration_form
    '1' = 'hour' '2' = 'day' '3' = 'week' '4' = 'month' '5' = 'year';
run;

proc format;
  value $age_range_form
    '1' = 'children' '2' = 'adult';
run;

proc format;
  value $eating_format
    '1' = 'with meal'
    '2' = 'without meal'
    '3' = 'before meal'
    '4' = 'after meal'
    '5' = 'with a glass of water'
    '6' = 'other';
run;

proc format;
  value $time_format
    '1' = 'morning' '2' = 'noon' '3' = 'afternoon' '4' = 'evening' '5' = 'midnight';
run;

Other values are a combination of the formats defined above. For example, drug labels may read: “for adults, every morning, 2 tablets, 2 hours before meals, with a glass of water” or “for children, under 8 years of age, ½ a tablet before meals, with a glass of water….”
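How such a combined value can be assembled from coded fields is sketched below. The fragment is written in Python only to illustrate the idea of decoding several coded values and joining them; the dictionaries copy a few code/label pairs from the format definitions in this section, and the function `label_text` and its argument order are hypothetical.

```python
# A few code -> label pairs copied from the formats defined in this section.
DOSAGE_FORM = {"1": "capsule", "11": "tablet", "12": "chewable tablet"}
AGE_RANGE = {"1": "children", "2": "adult"}
TIME_OF_DAY = {"1": "morning", "2": "noon", "4": "evening"}
EATING = {"3": "before meal", "4": "after meal", "5": "with a glass of water"}


def label_text(age_code, time_code, quantity, form_code, eating_codes):
    """Decode each coded value and join the pieces into one instruction."""
    parts = [
        "for " + AGE_RANGE[age_code],
        "every " + TIME_OF_DAY[time_code],
        f"{quantity} {DOSAGE_FORM[form_code]}",
    ]
    parts += [EATING[code] for code in eating_codes]
    return ", ".join(parts)


text = label_text("2", "1", 2, "11", ["3", "5"])
```

The decoded labels are joined exactly as stored, so wording details such as plurals would have to be handled in the label tables themselves.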

In a database, grouping processes may be based on the “Sex/Gender” field where

the values of “Male” “Female” and “unknown” can define minor groupings. These

values can be stored as Numeric variables (1, 2, and 3). The ordering of numeric

levels in relation to classification variables must be done with care. If in a statistical


report, the data for female patients is required to appear after the data for males,

the “Sex/Gender” field would use “2” for females and “1” for males. The following

SAS script describes this formatting.

proc format library=proclib;
  value $sex '1'='male' '2'='female' '3'='unknown';
  picture pop low-high='000,000,000';
run;

Formatting has other usages in scripting. Many of the data values must be defined

by format. In SAS one can use this format with any of the following:

1. PUT, PUTC, or PUTN functions

2. %SYSFUNC macro function

3. FORMAT/ATTRIB statement in a DATA step or a PROC step

Also one can use a macro function to define a user defined function. This function

applies the defined format to the result of the function outside a DATA step.

num=15;
char=put(num,hex2.);

population=1145.32;
put population comma10.2;
/* result: 1,145.32 */

%macro tst(amount);
  %put %sysfunc(putn(&amount,dollar10.2));
%mend tst;
%tst(1154.23);

Patient records are usually the type of data that can come through Open Database Connectivity (ODBC). It is quite possible that these data already exist as the backbone of a medical client-server application. In this case, access to the data via ODBC is required. The module "SAS/Access for ODBC" must be installed on the computer. Configuring the database by referring to its DSN (Data Source Name) and specifying how it is accessed can also be required. Even parameter values can come through ODBC. These data may have dynamic values that are updated by end-users through the web. Normally, these applications have administration parts that allow the end-user to update parameters.

Example:

The following script shows how one can use part of the data stored in another vendor's Database Management System (DBMS) files and bring it into a SAS data set. In the script, a ‘libref’ is declared that points to a library containing Oracle data. SAS then reads the data from an Oracle table into a SAS data set:

libname dblib oracle user=halley password=halley path='hrdept_002';

data paitient.big;
  set dblib.paitient;
run;

Memory allocation is the most important concept in creating or extending a data library. SAS allows space to be requested as needed. To optimize system performance and allocate space appropriately, one can pre-allocate the largest amount of space that may be needed. These methods are used more often when multivolume access to SAS data libraries is required.

The above DATA statement may then change to:

/* Know this is a big data set. */
data paitient.big (alq=100000 deq=5000);

As explained earlier, data can come from an external data file. Additionally, one can connect to a remote data source and work on it. The following script connects to a z/OS server and a UNIX server to use DB2 and Oracle data:

/*************************************/
/* connect to z/OS                   */
/*************************************/
options comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcptso.scr';
signon os390host;

/*************************************/
/* download DB2 data views using     */
/* SAS/ACCESS engine                 */
/*************************************/
rsubmit os390host;
  libname db db2;
  proc download data=db.paitient out=db2dat;
  run;
endrsubmit;

/*************************************/
/* connect to UNIX                   */
/*************************************/
options remote=hrunix comamid=tcp;
filename rlink '!sasroot\connect\saslink\tcpunix.scr';
signon;

/*************************************/
/* download Oracle data using        */
/* SAS/ACCESS engine                 */
/*************************************/
rsubmit hrunix;
  libname oracle user=hzan password=halley;
  proc download data=oracle.paitient out=oracdat;
  run;
endrsubmit;

/*************************************/
/* sign off both links               */
/*************************************/
signoff hrunix;
signoff os390host cscript='!sasroot\connect\saslink\tcptso.scr';


/*************************************/
/* union data into SAS view          */
/*************************************/
proc sql;
  create view temp_joindata as
    select gender, country, count(*) as population
      from db2dat group by gender, country
    union
    select gender, country, count(*) as population
      from oracdat group by gender, country
    union
    select gender, country, count(*) as population
      from paitient1 group by gender, country;
quit;

proc sql;
  create view jointdata as
    select temp_joindata.gender, temp_joindata.population, countries.name as country
    from temp_joindata, countries
    where countries.codeId = temp_joindata.country
    order by gender, countries.name;
quit;

options fmtsearch=(proclib);

/* The NOWD option runs the REPORT procedure without the REPORT window and sends its output to the open output destination(s). */
proc report data=jointdata nowd;
  column gender country population;
  format gender $sex. country $50. population pop.;
  title 'Country or Origin for Patients Receiving the drug in Post marketing';
run;

Country or Origin for Patients Receiving this drug in Post marketing
for 04JAN06

Gender     country            Population
Female     Algeria            743,453
Male                          235,984
Unknown                       167

Female     Denmark            423,457,698
Male                          546,876,345
Unknown                       897

Female     Spain              456,9812,564
Male                          400,987,564
Unknown                       234

Female     United Kingdom     876,234,123
Male                          564,234,876
Unknown
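The combine-then-summarize step behind this report can be tried on a few rows. The sketch below uses Python's built-in SQLite module instead of SAS, and it deliberately differs from the script above in one respect: the raw rows are stacked with UNION ALL and counted afterwards, so that identical rows coming from different sources are not merged by UNION's duplicate elimination. The table names follow the script; the sample rows and the `SEX` lookup (which mirrors the `$sex` format) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db2dat  (gender TEXT, country INTEGER);
CREATE TABLE oracdat (gender TEXT, country INTEGER);
CREATE TABLE countries (codeId INTEGER PRIMARY KEY, name TEXT);
INSERT INTO countries VALUES (1, 'Algeria'), (2, 'Denmark');
INSERT INTO db2dat  VALUES ('1', 1), ('2', 1), ('2', 2);
INSERT INTO oracdat VALUES ('2', 1), ('1', 2), ('3', 2);
""")

# Same code -> label mapping as the $sex format.
SEX = {"1": "male", "2": "female", "3": "unknown"}

# Stack the rows of both sources first, then count per gender and country.
rows = conn.execute("""
SELECT u.gender, c.name, COUNT(*) AS population
FROM (SELECT gender, country FROM db2dat
      UNION ALL
      SELECT gender, country FROM oracdat) AS u
JOIN countries c ON c.codeId = u.country
GROUP BY u.gender, c.name
ORDER BY c.name, u.gender
""").fetchall()

report = [(country, SEX[gender], n) for gender, country, n in rows]
```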


Conclusions:

This thesis proposes ways to improve programming practices for standardizing Drug Safety Reporting Systems. The quality of a Drug Safety Reporting Application depends on the system architecture, methodologies, and modeling used by the programmer. The degree to which an implementation is standardized is in direct proportion to the correctness of the methods used for accessing, gathering, and manipulating the data, and for its classification, control code, quality control, formatting, statistical analysis, and mining. Classification terms should follow a hierarchical structure that is consistent with FDA standards and MedDRA. Using the control code with MedMinder and the SCM is also important. Neither this nor quality control should be overlooked by programmers. Formatting of data must be done properly and, again, consistently with FDA standards. Statistical analysis and data mining in these types of applications must also be done correctly, as they have a direct effect on the results. Ultimately, data gathering and access should be handled dynamically, and manual access should not be considered. Above all, details such as the size of the data in the data accessing stage should be treated with care.

As for the professional working on the system, an advanced background in

computational, mathematical, and programming methods is obligatory for

accurately applying these terminologies. SAS programming, knowledge of

object-oriented programming, data structures, database modeling, and SQL are

all necessary skills for implementing a standard Drug Safety Reporting System.

Knowledge of statistical modeling is particularly desirable in data mining

applications. Finally, a computational science graduate or a professional

software designer with good scripting skills can make the application work

more dynamically and accurately. The workbench of Drug Safety Reporting

Systems is made up of SAS and MedDRA applications. SAS supports an

advanced data access technology, and the MedDRA classification matches the

metadata required for designing this application. These existing components

improve the reliability of design, and SQL scripting expands it.

References

SAS Publishing. 2002. The Analyst Application, Second Edition (July 2002).

Adriaans, P., and D. Zantinge. 1996. Data Mining. Edinburgh Gate, England: Addison Wesley Longman.

Hand, D.J. 1997. Construction and Assessment of Classification Rules. New York: John Wiley & Sons, Inc.

Berry, M.J.A., and G. Linoff. 1996. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: John Wiley & Sons, Inc.

Bergeron, Bryan P. 2003. Bioinformatics Computing. New Jersey: Prentice Hall Professional Technical Reference, Pearson Education, Inc.

Pharmacoepidemiology and Drug Safety, Vol. 1 (1992), Vol. 2 (1993), Vol. 6 (1997), and Vol. 7 (1998).

Agresti, A. 1996. An Introduction to Categorical Data Analysis. New York: Wiley.

Collett, D. 1994. Modelling Survival Data in Medical Research. London: CRC/Chapman & Hall.

Benichou, C. (ed.). 1994. Adverse Drug Reactions: A Practical Guide to Diagnosis and Management. Wiley & Sons.

Fuchi, K. 1981. "Aiming for knowledge information processing system." Proceedings of the International Conference on Fifth Generation Computing Systems. Tokyo: Japan Information Processing Development Center. Republished 1982 by North-Holland Publishing, Amsterdam.

SAS online documents http://www.sas.com/service/library/onlinedoc

CDER http://www.fda.gov/cder/handbook/index.htm

MedWatch http://www.fda.gov/medwatch/getforms.htm

MedDRA http://www.meddrahelp.com/
