The New York State University at Brockport
Department of Computational Science
Standardization of “Drug Safety” Reporting Applications
Halley M. Zand
Winter, 2005
Thesis advisor: Dr. Robert Tuzun
Abstract:
The purpose of this thesis is the development of an application process for
preparing reports on drug safety. The FDA is responsible for protecting the public
health by assuring the safety and security of human and veterinary drugs.
Each year, companies that provide medications are required to generate reports
that assure the FDA of their drugs’ safety.
This thesis proposes an Information Technology infrastructure model that
gives the IT organizations of drug providers a strategic perspective on how to
computerize their Drug Safety reporting activity. It introduces software
development concepts, methods, techniques, and tools for collecting data from
multiple platforms and generating reports from them by scripting queries.
Introduction:
According to Guidance Documents for Drug Evaluations and Research from the
U.S. Food and Drug Administration, all prescription drugs, both new and generic,
need to be approved by the FDA. To obtain these approvals, drug providers are
required to generate annual reports on product safety and attach them to their
application letter. In addition, any person can report a reaction to or problem
with a drug to the FDA. The FDA reviews applications and all reported clinical
outcomes to determine whether the reported events were caused by the suspected
drug or by other factors.
Manual reporting is not practical because of the large volume of data and the
differing platforms and formats in which the data are stored. Unfortunately,
tools and standards are often poorly used because of a lack of database
application modeling, programming, and software engineering skills. User
applications are often cobbled together with little more efficiency than manual
processing, and tools for automation and large-scale data processing are not
utilized.
Hiring qualified staff and carefully selecting software increases quality
and reduces costs. A two-hour job may take a week because of poor technical
skills, and the cost of software licensing may rise by as much as 5,000 USD over
a 10,000 USD base when too little attention is paid to the productivity of the
software tool. A standardized IT infrastructure provides higher computational
quality at lower cost. In addition, professional developers with computational
science backgrounds are the only group that has sufficient computational
knowledge and bookkeeping skills for software application design and the ability
to apply technical concepts.
Merging Computational Science and Drug Development Science for Drug Safety
Evaluation can evolve within a modern computing environment, and because
computational technology grows quickly, designers need an advanced vision for
the future. Strong knowledge of computational science and bookkeeping helps
developers use what is available and progress from it.
This thesis explains a modern computational architecture for implementing Drug
Safety Reporting Applications. This architecture uses advanced IT concepts to
increase the quality of work on a large volume of data that may be dynamic
rather than static and that comes from distributed computer networks. The
thesis is intended to help Drug Safety studies obtain the greatest possible
advantage from software solutions.
Objectives:
SAS is the software application that developers use to provide high-quality
reporting applications for Drug Safety. A collection of concepts that work
together is required to achieve a computer-based method for Drug Safety
evaluation. This paper proposes an infrastructure that uses the optimal
solutions for this process. The abstract is intended to use the information
gathered to develop the system as a whole. The system can accept data from both
paper records and electronic databases. Databases such as Oracle and Microsoft
Access can be considered backbones of the system. All computational terms that
are recommended for this proposed infrastructure must be explained. For
example, in some cases data mining might be used to find a pattern and help
estimate descriptions of a data field. This data mining ability of the proposed
architecture should be illustrated.
In this thesis, entity-relationship database modeling as well as data access,
formatting, classification, and scripting are illustrated by giving examples and
by creating descriptions of longitudinal data. The thesis also covers code
consistency with all essential attributes and their efficiency in the proposed
infrastructure. The proposed software should support maintainability, but
techniques that focus on the data error concept are not within the scope of this
paper. To achieve the best result, we need to use all available pieces of
accurate data and perform the correct programming processing. These data can
come from health care providers, consumers, literature, and other relevant
databases. It is important to find ordinary errors during scripting. A missing
part or step in the code for data processing (extracting and retrieving data,
manipulating data, or making narrative data from queries and assessing them)
can cause a large deviation from the expected result and reduce the accuracy
of the reports.
Technical Specifications:
Data accessing:
SAS data might come from other application platforms. These data might be
formatted or non-formatted and therefore filed differently in varied environments.
Accessing these data from several servers is done in the following steps.
I. Use the SAS ODBC driver to access the data by communicating with
either local or remote SAS servers over the TCP/IP protocol. Data
can come from a local, remote, or any other type of database server. It
can be in any format, including raw data or any vendor’s software
data set. The ability to read raw data in any format, from any kind
of file (including variable-length records, binary files, free-formatted
data, and even files with messy or missing data) is required.
II. Combine and manipulate these data on the client side, analyze
the resulting data, and distribute them by making an executable file
available from the server to multiple clients.
The following are examples of possible cases in data access:
a. Data may exist on a mainframe computer or PC network.
These data might be joined to an existing data set to create new
variables (columns) and to produce tables and interactive
graphs.
b. Raw data may exist on a UNIX server. Other data values are
computed from them, statistics are formed, and an HTML
report is created for use in web application systems and then
stored on a web server on an intranet or Internet platform.
c. Access may be needed to BMDP, SPSS, and OSIRIS
files directly, as well as to files such as Microsoft Excel
spreadsheets, Microsoft Access tables, dBase files, ORACLE
forms, and any other DBMS. In addition, both relational
and non-relational databases, including any PC data
source, can be considered data files.
d. Relational databases in DB2 format may exist in OS/390,
VM, UNIX, or PC environments.
e. ODBC, Informix, ORACLE, and OLE DB data may come
from any platform. They may also come from a Sybase
machine, Teradata, MS SQL Server, or any other
machine.
f. Files may come from ERP systems such as Baan,
PeopleSoft, SAP R/3, and SAP BW. Thus global data may be
received and processed to create an enterprise report.
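The access cases above share one pattern: read each source through its own driver, then join the results on a shared key. The following is a minimal sketch of that pattern, using Python's standard library as a stand-in for a SAS script. The CSV text, the `exposure` table, and every field name are hypothetical and serve only as illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a flat file on a UNIX server.
RAW_CSV = """patient_id,event,onset_day
P1,rash,3
P2,nausea,
P3,headache,10
"""

def read_raw_events(text):
    """Parse raw rows, tolerating missing values (empty field -> None)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        day = rec["onset_day"].strip()
        rows.append({
            "patient_id": rec["patient_id"],
            "event": rec["event"],
            "onset_day": int(day) if day else None,
        })
    return rows

def load_exposure(conn):
    """Create a small relational table, standing in for a DBMS source."""
    conn.execute(
        "CREATE TABLE exposure (patient_id TEXT PRIMARY KEY, dose_mg REAL)")
    conn.executemany("INSERT INTO exposure VALUES (?, ?)",
                     [("P1", 50.0), ("P2", 100.0)])

def join_sources(events, conn):
    """Left-join raw events to the relational table and derive a new column."""
    doses = dict(conn.execute("SELECT patient_id, dose_mg FROM exposure"))
    merged = []
    for ev in events:
        dose = doses.get(ev["patient_id"])  # None when no exposure row exists
        merged.append({**ev, "dose_mg": dose, "has_exposure": dose is not None})
    return merged

conn = sqlite3.connect(":memory:")
load_exposure(conn)
merged = join_sources(read_raw_events(RAW_CSV), conn)
for row in merged:
    print(row)
```

The left join keeps every raw event even when the relational source has no matching row, so gaps between sources stay visible instead of being dropped silently.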
Data Management: After accessing data, it is necessary to manage them by
creating, retrieving, and updating database information. This may require
advanced programming skills because the information comes from a wide range
of data sources that must be merged and then evaluated.
Data with the same attributes need generic formatting, which requires a
manipulation process. Evaluating data values requires computational
operations that may be defined as functions. Data sets saved in the data
forms may have been extracted from subsets of data. Complex conditional
processing may be needed during data manipulation when a wide range of data
sources is merged.
After gathering and shaping the information, we need statistical analysis to
produce reports. These reports are customized and may be complex.
Tables, frequency counts, and cross-tabulation tables may be produced to create
a variety of charts and plots. The computation of descriptive statistics such
as standard deviation, of correlations and other measures of association, and
of linear regression analysis, multi-way cross-tabulations, and inferential
statistics may also be necessary.
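As an illustration of the report inputs named above, the sketch below computes a frequency count, a two-way cross-tabulation, and a Pearson correlation. It is written in Python rather than SAS, and the records and column layout are invented for the example.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical adverse-event records: (sex, body_system, age, days_to_onset)
records = [
    ("F", "skin", 34, 2),
    ("M", "cardiac", 61, 9),
    ("F", "cardiac", 58, 8),
    ("M", "skin", 29, 1),
    ("F", "skin", 45, 4),
]

def frequency(rows, col):
    """One-way frequency count for the column at index `col`."""
    return Counter(r[col] for r in rows)

def crosstab(rows, row_col, col_col):
    """Two-way table stored as {(row_value, col_value): count}."""
    return Counter((r[row_col], r[col_col]) for r in rows)

def pearson(xs, ys):
    """Pearson correlation coefficient, written out term by term."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

ages = [r[2] for r in records]
onsets = [r[3] for r in records]
by_system = frequency(records, 1)
sex_by_system = crosstab(records, 0, 1)
r_age_onset = pearson(ages, onsets)

print("by body system:", dict(by_system))
print("sex x body system:", dict(sex_by_system))
print("age std dev:", round(stdev(ages), 2))
print("corr(age, onset):", round(r_age_onset, 3))
```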
These results should be deliverable to a wide variety of
locations and platforms to suit client needs. Results may need to
be presented in many formats, such as markup languages including
HTML4 and XML, files formatted for a high-resolution printer such as
PostScript, PDF, or PCL, RTF files, or even color graphs that can be made
interactive using ActiveX controls or Java applets.
System architecture modeling
Information must be gathered by drug providers. These data come from clinical
studies by the FDA and other professional investigators. Other information
comes from the medical records of patients who were treated with the specific
drug. Usually, drug providers study their product before moving on to the
evaluation step. The first step is to collect data to generate reports such as
the country of origin of patients receiving the drug, worldwide patient exposure,
demographic characteristics, the most commonly reported body-system reactions
(ordered by gender and/or age of patients), and a summary of deaths or other
critical body reactions. Another resource is the company’s surveys on products,
completed by patients or clients who are volunteers in the U.S. or other
countries. These surveys include data that match the MedWatch forms that the
U.S. Department of Health and Human Services accepts for voluntary reporting
of adverse events and product problems. These manufacturing companies
may also be able to receive FDA reports generated on the basis of MedWatch
reports about the product. Furthermore, many of the surveys are answered by
physicians and other clinicians who have an EMR system and are able to answer
detailed questions regarding medical conditions and other related medical
issues.
[Figure: system architecture model. Component labels, in source order:
Post-marketing Data Warehouse; User; Data Dictionary; Archive; (MedDRA/PubMed);
(Oracle); Modification; Clinical Studies Data (individual clinical trials,
reporting data by investigators); Data Analyses (SAS); Adverse Event Reporting;
Clinical Trials; Hospital Labs; Verifications]
Any tool that is recommended here should be consistent with FDA Standards
and the objectives that follow.
In any Adverse Event Reporting System, the basic calculations and data
analysis have a statistical basis in data sets that may frequently be ordered
according to one or more variables coming from a variety of data sources. Thus
an Adverse Event Reporting System should be able to work on any platform. For
example, if it uses E2B data element structures, then it should be capable of
performing any interactive query or data flow transaction on shared data. SAS is
compatible with the major computer platforms and operating systems.
It supports data sharing concepts. It supports submission through the Web or any
other network that includes Oracle, UNIX, or NT servers or mainframe machines.
This means that SAS can support the system regardless of its backbone.
Data sources may need to be summarized or checked before being
reported. Scripting and programming are among the major necessities in
development. SAS has a powerful scripting language that can perform the required
summarizing, verification, and validation.
In the pharmaceutical field and bioinformatics, SAS software is generally
thought of as a tool for statistical analysis programming, but its many other
features are a largely untapped resource. Its screen-building and object-oriented
development abilities are needed to keep up with the latest advances in
information technology.
SAS is a stand-alone system produced by SAS Institute Inc. and sold on the open
market. It meets or exceeds all technical objectives specified here.
The FDA has proposed MedDRA as a standardized dictionary of medical
terminology. MedDRA has been used internationally to discuss the regulation of
medical products. MedDRA provides symptoms, signs, diseases, and diagnoses
information. It also includes other information such as:
o Names of investigations (e.g. liver function analyses, metabolism tests)
o Sites (e.g. application site reactions, implant site reactions, and injection
site reactions)
o Therapeutic indications
o Surgical and medical procedures
o Social and family history terms
SAS and MedDRA are FDA standards. Both are designed to a high standard, and
their vendors continue to look for weaknesses and improve
their products. Their documentation and user interfaces are user friendly. SAS
and MedDRA are generic software products, and any specific needs such as
security of data or reliability of operations can be negotiated in a service
level agreement.
EMR Database: These data come from hospital laboratories and clinical data
entry systems. They are documented before and after verification. All
documentation is electronic, and all reports are submitted electronically.
MedDRA performs the encoding that is part of clinical data entry. All data
entries follow standards approved by the FDA.
Terminologies:
A computerized Drug Safety Evaluation requires the following informatics
terms:
Data classifications
Control Code
Formatting
Quality Control
Data Mining
Gathering information
Accessing and manipulating data
Scripting
Each of these terms involves a process or methodology that is
discussed in the following sections.
Data Classification:
Any structured analysis of information needs classification. Data classification
is the first and best-known task in data flow modeling. The data model of a Drug
Adverse Event Reporting System is derived from conceptual information such as
entities and their interrelationships. The classification mechanism serves as a
store of all drug information that can link the analysis, design,
implementation, and evolution applied in most medical applications. This
classification should be consistent and free of clashes. It is integrated into
all parts that require maintainability.
The outcome attributed to adverse events is the most important information that
needs to be classified. The data classification for this attribute should be a
standard classification that matches the FDA reporting program.
The FDA uses MedDRA as a part of the proposed rule for post-marketing
reporting. MedDRA is the abbreviation for Medical Dictionary for Regulatory
Activities. It is an international terminology designed to support the
classification, retrieval, presentation, and communication of medical information
throughout the medical product regulatory cycle. MedDRA was originally written
in English and distributed in ASCII file format, but it is now available in
several other languages such as Dutch, French, German, Italian, Portuguese,
Spanish, and Japanese. This online dictionary is intended to become the global
medical terminology standard for use by every biopharmaceutical company in the
world. It has the best-known classification and an integrated update platform
that can be used by all standard systems. In the majority of homegrown medical
applications, the patient medical record systems use this classification, and it
is valid for all phases of drug development and for all subscribing
pharmaceutical companies.
MedDRA works as a catalog of medical disorders. It has a hierarchical data
structure with five levels of terms (System Organ Class, High Level Group Term,
High Level Term, Preferred Term, and Lowest Level Term). Developing queries or
retrieving information about medical diagnoses requires hierarchical searching
on these terms, and other queries might be selected by grouping terms in the
same way.
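The hierarchical search described above can be sketched as a small tree walk. This is an illustrative model in Python, not the MedDRA browser or its file layout. The `Term` class, the numeric codes, and the sample branch are hypothetical; only the five level names follow the MedDRA hierarchy.

```python
from dataclasses import dataclass, field

# The five levels of the hierarchy, from broadest to most specific.
LEVELS = ("SOC", "HLGT", "HLT", "PT", "LLT")

@dataclass
class Term:
    code: int          # made-up code for illustration, not a real MedDRA code
    name: str
    level: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child that sits exactly one level below this term."""
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level} cannot sit under {self.level}")
        self.children.append(child)
        return child

def find_path(root, name):
    """Depth-first search; returns the terms from root down to the match."""
    if root.name.lower() == name.lower():
        return [root]
    for child in root.children:
        path = find_path(child, name)
        if path:
            return [root] + path
    return []

def descendants(term, level):
    """All terms at `level` beneath `term`, for queries grouped by a parent."""
    found = [term] if term.level == level else []
    for child in term.children:
        found.extend(descendants(child, level))
    return found

# Illustrative branch only.
soc = Term(1, "Cardiac disorders", "SOC")
hlgt = soc.add(Term(2, "Cardiac arrhythmias", "HLGT"))
hlt = hlgt.add(Term(3, "Rate and rhythm disorders", "HLT"))
pt = hlt.add(Term(4, "Tachycardia", "PT"))
pt.add(Term(5, "Heart racing", "LLT"))
pt.add(Term(6, "Rapid heartbeat", "LLT"))

path = find_path(soc, "heart racing")
print(" > ".join(f"{t.level}:{t.name}" for t in path))
print([t.name for t in descendants(soc, "LLT")])
```

Searching from the top returns the full path of parent terms, which is what a query needs in order to group low-level reported terms under a common System Organ Class.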
The following picture shows the SOC view of Cardiac and Vascular
investigations (excl enzyme test):
MedDRA classifications have an Object Oriented data structure as shown in the
following screens.
Each MedDRA term has a unique code that can be used as a search key.
A query makes a link between collected data and terms in MedDRA. A
query can create a selection on a description of medical data. To make
this selection, the user enters the term to be sought into the
'Search for Value' field. The query then selects one of the records
returned and identifies information about patients. After that, the codes in
the database are ready for any statistical evaluation.
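A minimal sketch of such a query, assuming the dictionary terms and the coded patient events sit in two relational tables linked by the term code. It uses Python's built-in `sqlite3` module. The table names, column names, codes, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meddra_term (code INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE patient_event (patient_id TEXT, term_code INTEGER);
INSERT INTO meddra_term VALUES (1001, 'Tachycardia'), (1002, 'Rash');
INSERT INTO patient_event VALUES
    ('P1', 1001), ('P2', 1001), ('P3', 1002), ('P2', 1002);
""")

def search_terms(conn, value):
    """Return (code, term) rows whose text contains the search value."""
    sql = "SELECT code, term FROM meddra_term WHERE term LIKE ? ORDER BY code"
    return conn.execute(sql, (f"%{value}%",)).fetchall()

def patients_for_code(conn, code):
    """Use the unique term code as the key into the patient data."""
    sql = ("SELECT DISTINCT patient_id FROM patient_event "
           "WHERE term_code = ? ORDER BY patient_id")
    return [row[0] for row in conn.execute(sql, (code,))]

matches = search_terms(conn, "tachy")   # step 1: search for a value
selected_code = matches[0][0]           # step 2: select one returned record
patients = patients_for_code(conn, selected_code)  # step 3: identify patients
print(matches, patients)
```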
The other advantages of using MedDRA are:
MedDRA is online (not requiring installation or periodic updates on the
client system). The application has a standardized interface, is well supported,
and requires little effort to interface with any client computing environment. A
good designer can take full advantage of this classified information by using
it as a shared data set. Updating this shared information maintains all the
related outcomes that reference this data set.
Informatics functions such as encoding are already included in
MedDRA for its own data sets.
MedDRA includes high standards that can be updated with queries or by
importing data; however, updates require quality control because a faulty
update can disrupt everything that depends on the dictionary.
The current MedDRA version has MediMiner for managing and analyzing
the coded data, including data mining. This tool allows analysis of the
impact of recoding data sets from one MedDRA version to another.
MedDRA is a standalone product that can also be used as an integral component
of a range of coding tools. The MedDRA classification can be browsed as a tree
that can be collapsed and viewed at every level of detail for all occurrences in
every possible search category such as legend, terms, and coding.
Control Code:
Both SAS and MedDRA have code control utilities that provide the following:
o Debugging and maintenance support in any branch of code, producing
a cross-reference listing of all the program names that have been declared
and used.
o An analyzer that discovers uninitialized variables, unreachable code, and
uncalled functions and procedures, and reports the number of times each
statement was executed.
MedDRA has MediMiner as its version control utility. During any update in
MedDRA 3.1, MediMiner controls all changes by analyzing the coding
sets. In MedDRA 4.1, it also assesses the impact on the recoding of data by
identifying all codes that remain unchanged and those codes that may require
recoding. It is also possible to identify the codes that no longer exist, those
that have been changed in some way, and those that have a related change or
where a multiaxial (inherited from multiple phases of the original codes) change
has had an impact. Primary and secondary changes are identified, as well as
changes in the current status of the code.
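The kind of version comparison described here can be approximated with plain set operations. The sketch below is not MediMiner's algorithm, which the text does not specify. It only classifies codes as unchanged, changed, removed, or added between two hypothetical dictionary versions, then flags the coded records that would need review.

```python
def recoding_impact(old_version, new_version):
    """Compare two {code: term} dictionaries and classify every code."""
    old_codes, new_codes = set(old_version), set(new_version)
    shared = old_codes & new_codes
    return {
        "unchanged": sorted(c for c in shared
                            if old_version[c] == new_version[c]),
        "changed": sorted(c for c in shared
                          if old_version[c] != new_version[c]),
        "removed": sorted(old_codes - new_codes),
        "added": sorted(new_codes - old_codes),
    }

def records_to_recode(coded_records, impact):
    """Records whose code was changed or no longer exists need review."""
    flagged = set(impact["changed"]) | set(impact["removed"])
    return [rec for rec in coded_records if rec["code"] in flagged]

# Hypothetical dictionary versions and coded case data.
v_old = {101: "Rash", 102: "Heart racing", 103: "Nausea"}
v_new = {101: "Rash", 102: "Palpitations", 104: "Vomiting"}
coded = [
    {"case": "C1", "code": 101},
    {"case": "C2", "code": 102},
    {"case": "C3", "code": 103},
]

impact = recoding_impact(v_old, v_new)
todo = records_to_recode(coded, impact)
print(impact)
print([rec["case"] for rec in todo])
```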
SAS software includes the Source Control Manager (SCM) utility as one of the
options under the Desktop selection of the Solutions menu:
SAS->Solutions->Desktop->Development and Programming->Source Code Manager
SCM includes a friendly GUI that has SAS file check-in/check-out capabilities.
This GUI lists all libraries, data sets, catalogs, and catalog entries in a
hierarchical order. SCM has flexible testing, revision control, and version labeling
with an easy application distribution utility. By having a version label, it is easy to
create a copy of an application and place it in other locations on the network.
Also, the SAS/CONNECT utility can place the application on other remote machines.
Formatting:
Usability of information is one of the most important components of any
application implementation. Usability requires readability, and the readability
of any data set is facilitated by standardized formatting. Each line represents
many separate pieces of information, which are data values, and the formats
determine how these values are displayed or used in calculations. These formats
set the width of displayed values, the number of decimal places displayed, the
handling of blanks, zeroes, and commas, as well as other details.
SAS supports both its own standard formats and user-defined formats. Standard
formats might be used for numeric, character, or picture data. Users can also
write or define custom formats in DATA and procedure steps. User-defined formats
are reusable and can be saved in format catalogs. If saved in a permanent SAS
catalog, they remain there permanently. If saved in the catalog WORK.FORMATS,
they are temporary and retrievable only in the same SAS session or job in
which they were created. Because catalogs are a type of SAS file that resides in
a SAS data library, they work as an execution-time handling facility and
intercept run-time errors caused by undefined formats. In this way, type
checking is supported, which improves the readability of information. If the SAS
system option NOFMTERR is in effect, SAS uses its own default formatting when an
undefined format is called, so that in some cases we might ignore these errors
and continue execution.
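The behavior of a format catalog with and without error tolerance can be mimicked in a few lines. The class below is a hypothetical Python analogy and not a SAS interface: `FormatCatalog`, its methods, and the `SEXFMT` name are invented, and the `nofmterr` flag only imitates the effect described above for the NOFMTERR option.

```python
class FormatCatalog:
    """A toy store of user-defined formats (value -> display label)."""

    def __init__(self, nofmterr=False):
        self._formats = {}
        self.nofmterr = nofmterr  # True: fall back to default formatting

    def define(self, name, mapping):
        """Register a reusable format under a name."""
        self._formats[name] = dict(mapping)

    def apply(self, name, value):
        """Format a value; handle an undefined format per the nofmterr flag."""
        fmt = self._formats.get(name)
        if fmt is None:
            if self.nofmterr:
                return str(value)  # default formatting, processing continues
            raise KeyError(f"format {name!r} is not defined")
        # Values without a label are shown unformatted.
        return fmt.get(value, str(value))

strict = FormatCatalog()
strict.define("SEXFMT", {"F": "Female", "M": "Male"})
print(strict.apply("SEXFMT", "F"))

lenient = FormatCatalog(nofmterr=True)
print(lenient.apply("UNDEFINED", 3))
```

The strict catalog stops at the first undefined format, which surfaces a missing definition early; the lenient one keeps running at the cost of unlabeled output.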
Quality Control:
Delivering the correct result requires quality control. SAS recognizes common
errors such as syntax, execution-time, data, and semantic errors; in addition,
users can check for common mistakes such as the following:
Check for syntax errors:
o Every statement must end with a semicolon
o Quotation marks must be opened and closed
o Keywords must be spelled correctly
o Every DO and SELECT statement must be closed by an END statement
Check for execution errors:
o Illegal mathematical operations
o Observations out of order for BY-group processing
o An incorrect reference in an INFILE statement, such as misspelling or
otherwise incorrectly stating the name of the external file
o A program may run yet give an incorrect result. These errors are often
detectable by checking self-consistency and should always be reported,
certainly in the debugging stage, and often during production runs.
SAS usually executes the statements in a DATA step one by one, in the
order they appear. After executing the DATA step, SAS moves to the next step
and continues in the same fashion. The programmer must make certain that all
the SAS statements appear in the correct order so that SAS can execute them
properly.
Check input statements and data. SAS can detect data errors during
execution, but this does not terminate processing. After execution, it prints a
note describing the error. In that note SAS lists the related values that are
stored in the input buffer and the program data vector.
o The correspondence between the data values and the variables named in INPUT
statements must be checked.
o Any corresponding arrangement, such as formats, lists, and columns for
input statements, must be checked too.
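Three of the checks listed above can be sketched outside SAS: out-of-order observations for grouped processing, illegal mathematical operations, and data errors that are noted without stopping the run. The functions, field names, and rows below are hypothetical Python stand-ins, not SAS behavior.

```python
def parse_rows(raw_rows):
    """Convert text fields to numbers; on a bad value, add a note and go on."""
    parsed, notes = [], []
    for n, raw in enumerate(raw_rows, start=1):
        try:
            dose = float(raw["dose"])
        except ValueError:
            notes.append(f"row {n}: invalid dose {raw['dose']!r}, set to missing")
            dose = None
        parsed.append({"patient_id": raw["patient_id"], "dose": dose})
    return parsed, notes

def check_by_group_order(rows, key):
    """Indexes of rows whose key is smaller than the previous row's key."""
    return [i for i in range(1, len(rows)) if rows[i][key] < rows[i - 1][key]]

def safe_ratio(num, den, notes, row_no):
    """Guard an illegal operation (division by zero or by a missing value)."""
    if den is None or den == 0:
        notes.append(f"row {row_no}: illegal division, result set to missing")
        return None
    return num / den

raw = [
    {"patient_id": "P1", "dose": "50"},
    {"patient_id": "P3", "dose": "abc"},  # data error: not a number
    {"patient_id": "P2", "dose": "0"},    # out of order, and a zero divisor
]
parsed, notes = parse_rows(raw)
out_of_order = check_by_group_order(parsed, "patient_id")
ratio = safe_ratio(100.0, parsed[2]["dose"], notes, 3)

print(out_of_order)
for note in notes:
    print("NOTE:", note)
```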
Data mining:
Data mining is a class of database applications that look for hidden patterns in
a group of data. Statistical analysis is the data analysis method that matches
the nature of data mining. Statistical analysis might uncover the hidden
patterns in a large volume of information coming from Adverse Event
Reports or survey systems. A data mining process might flag combinations of
variables that occur more often than expected. By applying statistical options,
an optimal guess can be made about the best-matching behavior that may have
occurred frequently.
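One simple way to make "more often than expected" concrete is to compare each observed drug-event count with the count expected if drug and event were independent. The sketch below uses that observed-to-expected ratio as an illustrative stand-in. The text does not name a specific statistic, and the reports are invented.

```python
from collections import Counter

def observed_expected(reports):
    """reports: list of (drug, event) pairs.

    Returns {(drug, event): (observed, expected, ratio)}, where expected is
    the count implied by independence: n_drug * n_event / n_total.
    """
    n = len(reports)
    pair_counts = Counter(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    event_counts = Counter(event for _, event in reports)
    result = {}
    for (drug, event), observed in pair_counts.items():
        expected = drug_counts[drug] * event_counts[event] / n
        result[(drug, event)] = (observed, expected, observed / expected)
    return result

# Hypothetical spontaneous reports for two drugs.
reports = ([("A", "rash")] * 4 + [("A", "nausea")] * 1 +
           [("B", "rash")] * 1 + [("B", "nausea")] * 4)

table = observed_expected(reports)
for pair, (obs, exp, ratio) in sorted(table.items()):
    print(pair, obs, exp, round(ratio, 2))
```

A ratio well above 1 marks a combination worth a closer look. It is a screening signal only and does not by itself establish that the drug caused the event.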
Data mining is a critical aspect of these reporting systems. Occasionally,
predictions may be even more important than detections in drug safety
evaluation. In the United States, patients can file lawsuits against drug
providers for severe adverse reactions. These legal actions often make American
drug companies reluctant to introduce drugs into the U.S. market. However, data
mining on data from other parts of the world offers a way to move the drug
safety process from a reactive posture to a proactive one, that is, from
detection to prediction. In effect, it would help drug providers adopt a safer
marketing strategy rather than take risks.
If MedDRA System Organ Class terms are adopted as a class of events, then one
can select the related data from patient records for that event, discover
statistical rules or patterns automatically from the data, and later create a
hypothesis and run tests on the patient record database to verify or refute it.
Data mining can help protect drug providers against lawsuits. This process uses
data from other countries and clinical studies.
SAS assists data analysis in an instructional way, so that even people with no
statistical knowledge are able to run the required processes on selected data
sources (the basic options include counts of missing and non-missing values,
minimum, maximum, range, sum, mean, variance, standard deviation, standard
error of the mean, coefficient of variation, skewness, and kurtosis). In
addition, access to data sources can be secured to prevent unauthorized access.
SAS also allows the creation of different reports and presentations of results
(including tables and frequency reports with graphical presentations to
visualize the results).
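Most of the basic statistics listed above reduce to a few lines of arithmetic. The sketch below computes them in Python for a hypothetical column that contains one missing value. Skewness and kurtosis are left out because their small-sample adjustments differ between packages, and the sample variance uses the n - 1 divisor as an assumption.

```python
import math

def describe(values):
    """Basic descriptive statistics; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    total = sum(present)
    mean = total / n
    # Sample variance with the n - 1 divisor.
    variance = sum((v - mean) ** 2 for v in present) / (n - 1)
    std = math.sqrt(variance)
    return {
        "n": n,
        "nmiss": len(values) - n,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "stderr": std / math.sqrt(n),   # standard error of the mean
        "cv": 100 * std / mean,         # coefficient of variation, percent
    }

# Hypothetical lab values with one missing observation.
stats = describe([2, 4, 4, None, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name:>8}: {value}")
```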
SAS supports data mining with a large number of statistical procedures
(regression, association discovery, time series, and time series cross-sectional
(panel) data analysis), whereas data are usually analyzed by regression (one
observation for each patient). Sometimes it is necessary to correlate with
cross-sectional data such as geographic region, gender, smoking, alcohol use,
and so on.
Gathering information and documenting system specifications:
The available information (such as the toxicological and pharmacokinetic profiles
of the individual drug, the treatment indication or indications, the intended
populations, etc.) might have been defined in relational databases. The
backbone of this system might be SQL, Access, or even Excel, but the data query
facilities may not be suited to detailed statistical analysis of the data at
this stage. This is where SAS helps with statistical analysis. SAS has been
interfaced with databases to allow large volumes of data to be retrieved
efficiently for analysis. Any engine can be assigned to a SAS library. This
library is the place through which all access to the stored files is made. These
files might come from a variety of engines such as ODBC, SPSS, SYBASE, REMOTE,
META, MYSQL, ACCESS, ORACLE, DB2, etc. To process the data, it is necessary
to define all connections that might be created between the different sets of
data records. The first link can illustrate the correspondence of the MedDRA
classifications to the patient records. In concentrating on the relevance of
available data, the medical information of patients works in tandem with MedDRA
classifications to build queries and analyze information.
As part of the application development process, the following information must
be specified:
1. Source data: Miscellaneous data sources may exist. To get
correct results, the prescription drug information provided by drug firms should
be truthful, balanced, and accurately communicated. The same applies to data
coming from clinical and post-marketing trials or from spontaneous reports
(submitted individually by doctors or patients). Dynamic data are operational
data from internal systems such as the homegrown applications of clinics or
hospitals, manual data coming from paper-chart patient histories, EMR
(Electronic Medical Records) systems, and Adverse Event Reporting (MedWatch).
2. Data Staging: This area covers the storage and processing of data extracted
from internal and external systems prior to loading into a SAS data bank.
The following is a list of cases.
Information may be located in multiple SQL tables on a local
computer or on external servers. If required, one may make a connection to
the database server and use the data dynamically. For example, the
Adverse Events Database includes side effects that are serious (such
as death or risk of dying, hospitalization, disability, congenital anomaly, or
required intervention to prevent permanent impairment or damage). These
data are required for generating certain reports.
Part of the information is held in Aventis Reports or ClinTrace. Data
from these two areas might be used together to complete an assignment: the
developer creates an executable program that connects to the backbone
databases of these two licensed vendor applications and uses the data.
Note: Having basic knowledge of these databases helps programmers
create standard code. For example, an Aventis or ClinTrace Case ID
(Manufacturer Control #) is assigned on an “Episode” basis for each
patient. Adverse events (reported side effects) that are temporally linked to
the same episode are entered under the same Case ID. For drugs that are
given intermittently, additional episodes (Case IDs) are created for events
that occur after different treatment cycles.
Side effects are stored in the company’s Core Safety Data Sheets.
These sheets are used for global labeling of reports and are based on the
diagnoses, which are in turn assessed by seriousness. All diagnoses
reported from intensified monitoring (such as a clinical trial or post-marketing
surveillance study) are assessed as associated or not associated with the
study medications. These data may be joined to MedDRA information to
build a larger directory that is used in SQL scripts.
Drug providers use certain information, such as whether side
effects result from internal or natural body processes, in a causality
algorithm for internal clinical interpretation or signal evaluation purposes. In
some cases, this algorithm must be applied as part of the
script logic in the SAS code. If a company has a computerized analysis
application, then depending on its software it may be possible to open a
connection and use this application inside the SAS script code.
For data mining by diagnosis, MedDRA information is
required. It is recommended to use SAS scripting to create a remote
connection, read the MedDRA ASCII files, and import the data into temporary
tables. These tables are deleted at the end of the scripting
process.
Note: All transactions, such as queries, statistical analyses, or visualizations
of data coming from the sources, should be consistent. Sometimes the data are
not complete enough to be consistent. To solve this problem, all “no match” data
need an appropriate transformation or conversion from their original form to
the MedDRA representation.
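The staging step for dictionary data (import into temporary tables, use them, then delete them) can be sketched as follows. The example is in Python with an in-memory SQLite database rather than SAS. The `$`-delimited layout and the three-column order are assumptions made for the example, the sample lines are invented, and the remote connection is replaced by a local string.

```python
import csv
import io
import sqlite3

# Hypothetical extract standing in for a delimited dictionary file.
# Assumed layout: llt_code$llt_name$pt_code$
SAMPLE_ASC = "1001$Heart racing$2001$\n1002$Rapid heartbeat$2001$\n"

def load_llt(conn, text):
    """Import the delimited text into a temporary table; return row count."""
    conn.execute(
        "CREATE TEMP TABLE llt (llt_code INTEGER, llt_name TEXT, pt_code INTEGER)")
    reader = csv.reader(io.StringIO(text), delimiter="$")
    rows = [(int(r[0]), r[1], int(r[2])) for r in reader if r]
    conn.executemany("INSERT INTO llt VALUES (?, ?, ?)", rows)
    return len(rows)

def drop_llt(conn):
    """Delete the temporary table at the end of the scripting process."""
    conn.execute("DROP TABLE IF EXISTS llt")

conn = sqlite3.connect(":memory:")
loaded = load_llt(conn, SAMPLE_ASC)
names = [row[0] for row in conn.execute(
    "SELECT llt_name FROM llt WHERE pt_code = ? ORDER BY llt_code", (2001,))]
drop_llt(conn)
remaining = conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE name = 'llt'").fetchall()

print(loaded, names, remaining)
```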
3. Metadata : A term used to describe or specify the data. It is used to define
all of the characteristics of data required to build databases and applications,
and to support knowledge workers and information producers. This includes
data element name, meaning, format, domain values, business integrity rules,
relationships, owner, etc.
For example the following classification shows the analogy of data concepts in
MedDRA:
1. SOC
   MedDRA Code       Numeric
   MedDRA Term       String
2. HLGT
   MedDRA Code       Numeric
   MedDRA Term       String
3. PT
   MedDRA Code       Numeric
   MedDRA Term       String
   COSTART Symbol    AlphaNumeric
   WHO_ART Code      Numeric
   ICDS Code         Numeric
   PT ICD-10 Code    Numeric
   HARTS Code        Numeric
   ICDS_CM Code      Numeric
   JART Code         Numeric
   * SOC Code        Numeric
   * SOC Name        Numeric
4. LLT – Lowest Level Term
   MedDRA Code       Numeric
   MedDRA Term       String
   WHO_ART Code      Numeric
   COSTART Symbol    AlphaNumeric
   ICDS_CM Code      Numeric
   CURRENCY          Character/Boolean
   HARTS Code        Numeric
   ICDS Code         Numeric
   JART Code         Numeric
* Multi valued attribute
Defining metadata for the adverse event reporting data is also required. These data are:
o Patient identifier and patient information: age at time of event or
date of birth, sex, weight, etc.
o Outcomes attributed to the adverse event, such as death, life-threatening
occurrence, hospitalization (initial or prolonged), disability,
congenital anomaly, required intervention to prevent
impairment/damage, or other.
o Date of event and date of report in mo/day/yr format.
o Description of the problem.
o Relevant tests/laboratory data, including dates.
o Other relevant history, including preexisting medical conditions (e.g.
allergies, race, pregnancy, smoking or alcohol use, hepatic/renal
dysfunction, etc.)
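The fields listed above can be sketched as a typed record. This is a minimal Python illustration, not the SAS implementation the thesis works toward; the class name, field names, and the sample patient identifier are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


def parse_mdy(text: str) -> date:
    """Parse a date written in mo/day/yr form, e.g. 03/01/2005."""
    return datetime.strptime(text, "%m/%d/%Y").date()


@dataclass
class AdverseEventReport:
    # Hypothetical field names; each maps to one item of the list above.
    patient_id: str
    sex: Optional[int] = None
    weight: Optional[float] = None
    date_of_birth: Optional[date] = None
    outcomes: List[str] = field(default_factory=list)
    date_of_event: Optional[date] = None
    date_of_report: Optional[date] = None
    description: str = ""

    def age_at_event(self) -> Optional[int]:
        """Age in whole years at the event, or None if a date is missing."""
        if self.date_of_birth is None or self.date_of_event is None:
            return None
        born, event = self.date_of_birth, self.date_of_event
        had_birthday = (event.month, event.day) >= (born.month, born.day)
        return event.year - born.year - (0 if had_birthday else 1)


# Illustrative record with made-up values.
report = AdverseEventReport(
    patient_id="P0001",
    sex=1,
    date_of_birth=parse_mdy("06/15/1975"),
    date_of_event=parse_mdy("03/01/2005"),
    outcomes=["hospitalization"],
)
print(report.age_at_event())
```

Keeping the date of birth and deriving the age at the time of the event avoids storing two values that can drift apart.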
Most medical clinics still use Paper Medical Records (PMRs), but many
others have begun to use Electronic Medical Records (EMRs). No standard form
has yet been defined for EMRs, but all provide the same information, which requires
metadata definitions. These data are:
o Patient’s primary reason for the medical visit
o History of onset of clinical signs and symptoms
o Current list of medications the patient is using
o Relevant past medical history, including hospital admissions,
surgeries, and diagnoses
o Family history of disease, such as diabetes, cancer, heart disease,
and other medical illness
o Social history: use of drugs, smoking, job stability, housing,
living conditions, incarceration
o Review of systems: the patient’s recollection of symptoms and current
medical problems, such as trouble sleeping at night, panic
episodes, and results of tests
o Physical examination: the clinician’s hands-on examination of the
patient, including head, eyes, ears, nose, throat, chest, and
extremities
o Labs, including blood glucose, cholesterol, and drug levels
o Studies such as X-ray, MRI, CT, and EKG
o Progress notes, such as a record of the temporal progression of signs
and symptoms, labs, and studies for the length of the study or
admission
4. The entity-relationship model: The specification of required information for
an adverse event serves as a starting point for constructing a conceptual schema
(the overall design of the database) for the suggested database. The entity sets
targeted here are the drug and patient entity sets. These entity sets have a
relationship that has attributes of its own. This relationship is a “many to many”
relationship. Other relationships might be designed between subsets of an entity
set. The relationships between the drug entity set and the ingredient or side effect
entity sets are examples. Here, these relationships are “many to
one” relationships. This method of designation helps save memory. In some
other cases, such as the patient-drug relationship, the participants are
limited to two relations, which leaves the designation in one general set.

In the following diagram, small rectangles show the entity sets; large rectangles specify
attributes; diamonds represent relationship sets; lines link attributes to entity sets
and entity sets to relationship sets; arrows indicate that an entity falls exclusively
into another entity; double lines indicate many relationship sets; bold diamonds
show “many to one” relationship sets; and rectangles with non-indexed
information indicate information about a relationship set.

The above E_R model is a sample of what can be considered, although the attributes
can be designed in more detail. For example, ‘route and dosage’ could be designed as
a separate entity because it includes many optional attributes that may be
concatenated together as descriptive text. They may also be saved separately in a
data source. This E_R model gives substantial flexibility in designing the basic
database schema.

Accessing and Manipulating Data:
The first step in
accessing and manipulating data is the DATA step. The DATA step is used for
accessing, reading, and programming the data processing. As explained before,
one of the strengths of SAS is fast and easy access to many different
sources. In addition to the programming components, SAS has many other
features in the DATA step process that help to develop a standard application.
The SAS language has all the statements required for typical data
processing. Among these are reading raw data files and SAS
data sets and writing the results; subsetting data; combining multiple SAS files;
creating SAS variables; recoding data values; and creating listing and summary
reports, including advanced analysis features such as web analytical
solutions. Special focus should be placed on managing
SAS data set input and output, working with different data types, and
manipulating data. It may also be necessary to control the SAS data set input
and output, combine and summarize data, and then process them iteratively
to perform data manipulations and transformations.

Accessing the data comes first. Sometimes the required data file is saved on
another server. With an FTP server running there, SAS can make an FTP connection
and use the external data source remotely, without any copy of the downloaded
data remaining on the local machine unless SAS writes it out. As an example,
assume the data belong to the user cps-user and are located at
~/halley/thesis/main.data:

    filename fromrcr ftp 'main.data' cd='halley/thesis' user='cps-user'
             host='cps.brockport.edu' recfm=v prompt;

Many data arrive as raw data, which must be read into a SAS data set. As an
example, one of the clients might send a letter or a txt file that includes parts of
the patient’s information. The following listing shows such data after they have
been read into a SAS data set:

    The SAS System        05:25 Thursday, December 15, 2005   5

    PatientId      age   sex   weight   country
    Hzan0616341     30     1      200        11
    Amir5666892     40     2      180        12
    J675bhgfdql     56     2        .        45
    Nmjhg567908     12     1      100        23
    Iu6-567-567     99     1      170        01

*** A missing value for a numeric variable is represented by a period (.).

Processing Examples:
To use
external files, SAS must be told where to find them. There are
the following choices:
1- Identify the file directly in the INFILE, FILE, or other SAS statement that
uses the file.
2- Set up a fileref for the file by using the FILENAME statement, and then
use the fileref in the INFILE, FILE, or other SAS statement.
3- Use operating environment commands to set up a fileref, and then use the
fileref in the INFILE, FILE, or other SAS statement.
Note: To use several files or members from the same directory, partitioned data
set (PDS), or MACLIB, use the FILENAME statement to create a fileref that
identifies the directory, PDS, or MACLIB. Then use the fileref in the INFILE
statement and enclose the name of the file, PDS member, or MACLIB member in
parentheses immediately after the fileref, as shown in the example below:

    /* filename data 'directory-or-PDS-or-MACLIB'; */
    /* data1.txt and data2.txt are located in directory c:\thesis */
    filename data 'c:\thesis\';

    data paitientdata1;
      infile data('data1.txt');
      input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
            +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
    run;

    data paitientdata2;
      infile data('data2.txt');
      input PatientId $11. +2 age 2. +1 weight 3. +2 sex 1.
            +2 date1 mmddyy10. +2 date2 mmddyy10. +2 country 12.;
    run;

Also, from the File menu, ADX can import data from a SAS data set or from an
Access database, an Excel spreadsheet, a dBase database, a delimited
text file, and files in other common formats. This is helpful when
information has been saved in a variety of formats.
In SAS, one can gain access to data sources by defining a ‘libref’ and
assigning access to them without copying them into the SAS
environment. The ‘libref’ acts as a shortcut to the metadata on the SAS
Metadata Server. Any metadata on the SAS Metadata Server can be read
by the META engine, which has options for controlling the output.
With one option, the engine creates only the metadata in the repository
and does not affect the data source. If the table does not exist in the
data source, the engine creates the metadata based on the information
specified in the application for the output table, and deleting a table
deletes the metadata from the repository but not the table from the
data source. With another option, deleting a table deletes the table from
the data source but does not delete the metadata from the repository.
A SAS library includes metadata objects that are defined by the ‘libref.’ These
objects define the engines that are used to process the data. The library
is identified by a URI (Uniform Resource Identifier). To get access to a
SAS Metadata Server, define the host address. If working in a TCP
network, define the port number. If the protocol is BRIDGE rather than COM,
define a user ID and password; otherwise it will not be possible to log into the
SAS Metadata Server. In addition, any metadata repository may be selected
by its repository ID or name.
To access these tables, one can use SAS/Warehouse Administrator as a
tool. To determine the metadata, it needs to identify and search
the objects by their name, URI, and other identifiers such as their ID. The
following script displays this process.
    libname upcase meta liburi="SASLibrary?@name='oralib'"
            ipaddr=d6292.us.GCS.com
Scripting:
The goal of SQL scripting is to derive data from any available data source.
Most vendor applications have a SQL backbone, so SQL scripting makes it
possible to perform queries on original or manipulated data: retrieving data from
multiple tables; creating views, indexes, and tables; updating or deleting
values in existing tables and views; and summarizing them. SQL scripting
can take place in the SAS environment or in the SQL environment.
In the following example, the earlier E_R schema is reduced to tables
from inside the SQL environment:
/*------------------------------------------------------------------------------------------*/
/* create a higher-level entity set for drug information */
CREATE TABLE drug (
    id CHAR(12) NOT NULL,
    generic_name CHAR(25),
    trade_name CHAR(25),
    dosage INT,
    unit INT,
    category INT,
    FOREIGN KEY (category) REFERENCES drug_category(category_id) ON DELETE CASCADE,
    FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE,
    PRIMARY KEY (id)
) ENGINE=INNODB;

/* create the lower-level entity sets for drug information */
CREATE TABLE ingredient (
    id INT,
    drug_id CHAR(12),
    ingredient_name CHAR(25),
    ingredient_value INT,
    unit INT,
    INDEX drug_ind (drug_id),
    FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE,
    FOREIGN KEY (unit) REFERENCES unit(unit_id) ON DELETE CASCADE
) ENGINE=INNODB;

/* the side effects of each drug have a description that should be compatible
   with the MedDRA classification */
CREATE TABLE sideeffects (
    MedDRACode INT,
    drug_id CHAR(12),
    INDEX drug_ind (drug_id),
    FOREIGN KEY (drug_id) REFERENCES drug(id) ON DELETE CASCADE
) ENGINE=INNODB;

/* create a general entity set for patient information; this entity set can be
   expanded by other entity subsets, such as patient laboratory information or
   more information about the history of that patient */
CREATE TABLE paitient (
    id CHAR(12) NOT NULL,
    first_name CHAR(25),
    middle_name CHAR(25),
    last_name CHAR(25),
    DateOfBirth DATE,
    Sex INT,
    weight INT,
    race INT,
    country INT,
    FOREIGN KEY (race) REFERENCES races(races_id) ON DELETE CASCADE,
    FOREIGN KEY (country) REFERENCES countries(ipcode) ON DELETE CASCADE,
    PRIMARY KEY (id)
) ENGINE=INNODB;

/* some relevant patient information might come from the following suggested
   sub entity set */
CREATE TABLE Relevant_Patients_Info (
    Info_id INT NOT NULL AUTO_INCREMENT,
    paitient_id CHAR(25) NOT NULL,
    allergies_id INT,
    races_id INT,
    Num_pregnancies INT,
    smoking INT,
    alcohol_use INT,
    hepatic_id INT,
    dysfunctions_id INT,
    INDEX (allergies_id),
    FOREIGN KEY (allergies_id) REFERENCES allergies(allergies_id) ON UPDATE CASCADE ON DELETE RESTRICT,
    INDEX (races_id),
    FOREIGN KEY (races_id) REFERENCES races(races_id) ON UPDATE CASCADE ON DELETE RESTRICT,
    INDEX (hepatic_id),
    FOREIGN KEY (hepatic_id) REFERENCES hepatic(hepatic_id) ON UPDATE CASCADE ON DELETE RESTRICT,
    INDEX (dysfunctions_id),
    FOREIGN KEY (dysfunctions_id) REFERENCES dysfunctions(dysfunctions_id) ON UPDATE CASCADE ON DELETE RESTRICT,
    INDEX (paitient_id),
    FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    PRIMARY KEY (Info_id)
) ENGINE=INNODB;

/* transforming this E_R model, including aggregation, to a tabular form is
   straightforward. The patient-drug relationship includes a column for each
   attribute in the primary key of the entity sets in this relationship (any
   concomitant medical products that the patient uses and the therapy dates might
   come from related tables through the drug id and patient id. Also, any
   available adverse event information that shows the problem of using that drug
   should be included.) */
CREATE TABLE Patients_drugs (
    Info_id INT NOT NULL AUTO_INCREMENT,
    paitient_id CHAR(25) NOT NULL,
    drug_id CHAR(12) NOT NULL,
    therapy_start_date DATE,
    therapy_end_date DATE,
    MedDRACode_DiagnoseForUse INT,
    /* 1 == yes, 2 == no, 3 == doesn't apply */
    /* event abated after use stopped or dose reduced */
    Quest1 INT,
    /* event reappeared after reintroduction */
    Quest2 INT,
    Lot_number INT,
    Exp_Date DATE,
    NDCno INT,
    reason INT NOT NULL,
    date_of_event DATE,
    date_of_report DATE,
    adverse_desc TEXT,
    /* ----- */
    INDEX (paitient_id),
    FOREIGN KEY (paitient_id) REFERENCES paitient(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    INDEX (drug_id),
    FOREIGN KEY (drug_id) REFERENCES drug(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    PRIMARY KEY (Info_id)
) ENGINE=INNODB;
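The “many to many” patient-drug relationship above can be tried out on a small scale. The following is a minimal sketch in Python with the built-in sqlite3 module rather than the InnoDB engine used in the thesis; the reduced column lists, the correctly spelled table names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE drug (id TEXT PRIMARY KEY, generic_name TEXT);
CREATE TABLE patient (id TEXT PRIMARY KEY, country INTEGER);
-- One row per (patient, drug) pairing: the relationship set with its own attributes.
CREATE TABLE patients_drugs (
    info_id INTEGER PRIMARY KEY AUTOINCREMENT,
    patient_id TEXT NOT NULL REFERENCES patient(id) ON DELETE RESTRICT,
    drug_id TEXT NOT NULL REFERENCES drug(id) ON DELETE RESTRICT,
    adverse_desc TEXT
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [("D1", "drug one"), ("D2", "drug two")])
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("P1", 11), ("P2", 12)])
conn.executemany(
    "INSERT INTO patients_drugs (patient_id, drug_id, adverse_desc) VALUES (?, ?, ?)",
    [("P1", "D1", "rash"), ("P1", "D2", None), ("P2", "D1", None)],
)


def drugs_for_patient(patient_id):
    """Return the generic names of all drugs linked to one patient."""
    rows = conn.execute(
        "SELECT d.generic_name FROM patients_drugs pd "
        "JOIN drug d ON d.id = pd.drug_id "
        "WHERE pd.patient_id = ? ORDER BY d.generic_name",
        (patient_id,),
    ).fetchall()
    return [name for (name,) in rows]


def delete_patient(patient_id):
    """Try to delete a patient; ON DELETE RESTRICT blocks it while links exist."""
    try:
        conn.execute("DELETE FROM patient WHERE id = ?", (patient_id,))
        return True
    except sqlite3.IntegrityError:
        return False


print(drugs_for_patient("P1"))
print(delete_patient("P1"))
```

The RESTRICT rule is what protects adverse event rows from being orphaned when a patient record is removed.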
SQL scripting is required to generate reports on summary statistics. SAS
provides the SQL procedure (PROC SQL), which allows SQL to be written inside
the SAS environment. SQL scripting therefore extends SAS coding to the
retrieval and combination of data from tables or views. New tables and views
can be created along with indexes, and data values in PROC SQL tables can be
updated. It is also possible to update and retrieve data from Database
Management System tables or to modify a PROC SQL table by adding, modifying,
or dropping columns.
Example: Assume the adverse event information from clinical studies, post-
marketing trials, spontaneous reports, and miscellaneous sources (including
independent drug identification numbers and retrospective data collection) is
saved in the above SQL tables. The following script generates a report that
shows the country of origin for patients receiving a drug in a post-marketing setting.
    proc sql;
    /* Extract and manipulate grouped and ordered data from the patient records
       to create a temporary view that holds only the patient population of each
       country. The country field is defined as an id number; to show the country
       name, the view is joined to the countries table below. After the process is
       done, the temporary view is dropped. */
    create view temp as
      select country, count(country) as count,
             calculated count / Subtotal as Percent format=percent8.2
      from paitient, (select count(*) as Subtotal from paitient) as survey2
      group by country
      order by count;
    quit;

    proc sql;
    /* Extract the required data from the temporary view. */
    title1 'Country of Origin for Patients Receiving the suspected drug in a Postmarketing Setting';
    select c.countryname, t.count as cc, "(", t.Percent, ")"
      from countries c, temp t
      where c.ipcode = t.country;
    quit;

    proc sql;
    drop view temp;
    quit;

    Country of Origin for Patients Receiving the suspected drug in a Postmarketing Setting
                                              22:04 Monday, January 16, 2006

    CountryName                 Percent
    --------------------------------------------------
    Greece                1     (0.1%)
    Uruguay               2     (0.2%)
    Taiwan                2     (0.2%)
    French Polynesia      2     (0.2%)
    Peru                  2     (0.2%)
    Korea                 2     (0.2%)
    South Africa          3     (0.2%)
    Portugal              3     (0.2%)
    Turkey                4     (0.3%)
    Hungary               4     (0.3%)
    Austria               4     (0.3%)
    New Zealand           7     (0.5%)
    Brazil                7     (0.5%)
    Norway               10     (0.8%)
    Israel               11     (0.8%)
    Chile                15     (1.1%)
    Netherlands          26     (2.0%)
    Italy                39     (3.0%)
    Spain                38     (2.9%)
    Belgium              38     (2.9%)
    United States        42     (3.2%)
    Finland              44     (3.4%)
    Germany              50     (3.8%)
    Sweden               69     (5.3%)
    Denmark              91     (7.0%)
    Canada               97     (7.4%)
    Australia           107     (8.2%)
    Great Britain       271     (20.8%)
    France              313     (24.0%)
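The count-and-percent logic of this report can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module instead of PROC SQL, with a scalar subquery in place of the remerged Subtotal column; the table contents are invented for illustration and the table name is spelled `patient` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (ipcode INTEGER PRIMARY KEY, countryname TEXT)")
conn.execute("CREATE TABLE patient (id TEXT PRIMARY KEY, country INTEGER)")
conn.executemany("INSERT INTO countries VALUES (?, ?)",
                 [(1, "France"), (2, "Greece"), (3, "Canada")])
# Ten made-up patients: five in France, two in Greece, three in Canada.
country_codes = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3]
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("P%02d" % i, code) for i, code in enumerate(country_codes)])

rows = conn.execute("""
    SELECT c.countryname,
           COUNT(*) AS n,
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM patient) AS pct
    FROM patient p
    JOIN countries c ON c.ipcode = p.country
    GROUP BY c.countryname
    ORDER BY n, c.countryname
""").fetchall()

# Same "count (percent)" layout as the report above.
report_lines = ["%s %d (%.1f%%)" % (name, n, pct) for name, n, pct in rows]
for line in report_lines:
    print(line)
```

Joining to the countries table inside the same query removes the need for a temporary view in this small case.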
Patient exposure to the drug can be calculated and presented in different
ways. Although available exposure data are provided for a period of time, the
primary focus of a submitted report may be the number of exposures and cases
that occurred in a specific period of time. In the following report, global patient
exposures from 1989 to 1994 are provided:

    proc sql;
    create view temp1 as
      select region, count(region) as SachetSales
      from paitient
      group by region
      order by SachetSales;
    quit;

    proc sql;
    create view temp2 as
      select region, count(region) as Exposures
      from paitient
      where id in (select paitient_id from Patients_drugs
                   where substr(therapy_start_date,7,4) > '1983'
                     and substr(therapy_end_date,7,4) < '2001')
      group by region
      order by Exposures;
    quit;

    proc sql;
    title1 'Worldwide Patient Exposure to the suspected drug 1989 to 1994';
    select c.region, t1.SachetSales, t2.Exposures
      from countries c, temp1 t1, temp2 t2
      where c.ipcode = t1.region and c.ipcode = t2.region;
    quit;

    proc sql;
    select sum(t1.SachetSales) as SumSachetSales,
           sum(t2.Exposures) as SumExposures
      from temp1 t1, temp2 t2
      where t1.region = t2.region;
    quit;

    proc sql;
    drop view temp1, temp2;
    quit;

    Worldwide Patient Exposure to the suspected drug 1989 to 1994        23
                                              20:55 Saturday, January 21, 2006

    Region            SachetSales      Exposures
    ----------------------------------------------------------------
    Europe            230,649,500      1,895,749
    Australia           5,292,542         43,500
    Korea               3,067,300         25,211
    Canada              1,497,100         12,305
    Rest of World       2,405,064         19,768

    SumSachet      SumExposures
    --------------------------------------
    242,911,506       1,996,533
Inside the SQL scripting, one may occasionally work with data that are imported
from the MedDRA application. These data may already exist on a
machine, in which case there is no need to access the MedDRA environment a
second time. One can use the SAS utilities to convert data from one form to
another or to copy them between machines. A free trial of MedDRA is available on the
MSSO website. It contains a sample copy of MedDRA data saved
in an Access database, which can also be exported to an Excel file if needed. If
the data set is standard and complete, it is better to use it as a
shared data source. This shared data source may be stored in a Relational
Database Management System (RDBMS), in an Excel spreadsheet, or even in a
flat file. If it is stored on an external machine, it becomes an external data
source and a SAS connection is required for access.
The following SAS script retrieves the MedDRA classification from a data source. It
imports data from an external file (a spreadsheet) into a SAS table. This code was
generated and saved during the import wizard process. Saving this type of
script avoids redoing the work when the information is needed again.

    PROC IMPORT OUT= WORK.MEDDRAInfo
         DATAFILE= "C:\thesis\CTCAEv3.xls"
         DBMS=EXCEL REPLACE;
      SHEET="'CTCAE v3#0 MedDRA Codes$'";
      GETNAMES=YES;
      MIXED=NO;
      SCANTEXT=YES;
      USEDATE=YES;
      SCANTIME=YES;
    RUN;

The following script works as well:

    filename xclfil 'C:\thesis\CTCAEv3.xls';
    proc import datafile=xclfil out=WORK.MEDDRAInfo dbms=excel97 replace;
      getnames=yes;
    run;

The above scripts retrieve the MedDRA classification from a data source. Often these data do not represent all MedDRA data. Usually, only a subset of the data is required and is stored in an external file.
Assume MedDRAClassifications.xls includes only the MedDRA classification data. To generate reports related to side effects, importing this file is enough to retrieve the appropriate symptom information or the signs listed by outcome.
    PROC IMPORT DBMS=EXCEL OUT=work.MedDRA
         DATAFILE="c:\thesis\MedDRAClassifications.xls" REPLACE;
    RUN;
    /* For a CSV copy of the same data, a DATA step would instead use:
       infile 'c:\thesis\MedDRAClassifications.csv' delimiter=',' dsd; */
    proc print data=MedDRA;
    run;

    The SAS System        05:25 Thursday, December 15, 2005   1

    Obs   MedDRATermLevel1            MedDRATermLevel2
      1   Nervous system disorders
      2                               Balance disorder
      3                               Convulsion
      4                               Lethargy
      5                               Optic neuritis
      6                               Paraesthesia
      7                               Speech disorder
      8                               Tunnel vision
      9                               Visual field defect
     10
     11   Eye disorders
     12                               Astigmatism
     13                               Blindness
     ………………….

* Sometimes the information that comes from an adverse event report, clinical trials, or any other post-marketing or pharmacovigilance application has a provisional order number assigned to outcome data that cannot be correctly mapped to MedDRA. These order numbers alone can be used when electronic reports or data are submitted and automatically converted to MedDRA codes.
From the parameter list created, values can be individually highlighted and chosen for processing. The required parameter values may be retrieved from tables created by scripts such as the following:

    proc sql;
    create table reasonlist1 (Description char(60));
    insert into reasonlist1
      values('Patient Died')
      values('Life threatening illness')
      values('Required emergency room/doctor visit')
      values('Required hospitalization')
      values('Resulted in permanent disability')
      values('Resulted in prolongation of hospitalization')
      values('others');
    quit;
The ordering of the above parameter values is important for selecting the rows by
their order number, and the descriptions of these values must be the same as
those found on the FDA forms. The following script creates a parameter table for
the abbreviations used in drug safety reporting. The ordering and description of
these abbreviations are also consistent with FDA standards.

    proc sql;
    create table abbreviations (abb char(8), Description char(100));
    insert into abbreviations
      values('ADR', 'adverse drug reaction')
      values('AE', 'adverse event')
      values('AERS', 'Adverse Event Reporting System')
      values('bid', 'twice daily')
      values('CI', 'confidence interval')
      values('CIOMS', 'Council for International Organizations of Medical Sciences')
      values('COSTART', 'Coding Symbols for Thesaurus of Adverse Reaction Terms')
      values('CSDS', 'Core Safety Data Sheet')
      values('CV', 'coefficient of variation')
      values('FDA', 'Food and Drug Administration')
      values('GABA', 'Gamma amino butyric acid')
      values('HARTS', '')
      values('IBD', 'International Birth Date')
      values('ICD9-10', 'International Classification of Diseases, 9th and 10th Editions/Revisions')
      values('ICD9CM', 'International Classification of Diseases, Ninth Revision, Clinical Modification')
      values('ICH', 'International Conference on Harmonisation')
      values('MedDRA', 'Medical Dictionary for Regulatory Activities')
      values('NDA', 'New Drug Application')
      values('PSUR', 'Periodic Safety Update Report')
      values('qd', 'once daily')
      values('qid', 'four times daily')
      values('SAE', 'serious adverse event')
      values('SD', 'standard deviation')
      values('SE', 'standard error')
      values('US', 'United States')
      values('WHO-ART', 'World Health Organization Adverse Reaction Terminology');
    quit;
Formatting may be used for other parameter values. The ATTRIB Statement
permanently associates a format with a variable. SAS uses the format to write
the values of the variables specified.
attrib sales1-sales3 format=comma10.2;
Due to the permanent association of the ATTRIB Statement in the above
command, any subsequent DATA Step or PROC Step will use COMMA10.2
format to write the values of sales1, sales2, and sales3.
In addition to the default formats supplied by Base SAS software, one
can create custom formats with the FORMAT procedure. The following format
procedure defines static parameter values that may be required. It
expresses weights and measures using USP (United States Pharmacopeia)
standard abbreviations for dosage units.

    proc format;
      value $dosage_units
        '1' = 'm'
        '2' = 'kg'
        '3' = 'g'
        '4' = 'mg'
        '5' = 'mcg'
        '6' = 'L'
        '7' = 'mL'
        '8' = 'mEq'
        '9' = 'mmol'
        '10' = '%';
    run;

* See the legend below for definitions:
(1) m (lower case) = meter
(2) kg = kilogram
(3) g = gram
(4) mg = milligram
(5) mcg = microgram
(do not use the Greek letter mu, which has been misread as mg)
(6) L (upper case) = liter
(7) mL (lower/upper case) = milliliter (do not use cc, which has been misread as U or the
number 4)
(8) mEq = milliequivalent
(9) mmol = millimole
The FORMAT procedure can also be used to define a format for the dosage form of the drug in question (see the procedure below):

    proc format;
      value $dosage_form
        '1' = 'capsule'
        '2' = 'cream'
        '3' = 'ear drop'
        '4' = 'eye drop'
        '5' = 'inhaler'
        '6' = 'injection'
        '7' = 'oral solution'
        '8' = 'solution'
        '9' = 'suspension pediatric drop'
        '10' = 'syrup'
        '11' = 'tablet'
        '12' = 'chewable tablet'
        '13' = 'other';
    run;

Formats for time durations, age ranges, and administration instructions can be defined in the same way:

    proc format;
      value $time_duration_form
        '1' = 'hour' '2' = 'day' '3' = 'week' '4' = 'month' '5' = 'year';
    run;

    proc format;
      value $age_range_form
        '1' = 'children' '2' = 'adult';
    run;

    proc format;
      value $eating_format
        '1' = 'with meal' '2' = 'without meal' '3' = 'before meal'
        '4' = 'after meal' '5' = 'with a glass of water' '6' = 'other';
    run;

    proc format;
      value $time_format
        '1' = 'morning' '2' = 'noon' '3' = 'afternoon'
        '4' = 'evening' '5' = 'midnight';
    run;
Other values are a combination of the formats defined above. For example, drug
labels may read: “for adults, every morning, 2 tablets, 2 hours before meals, with a
glass of water” or “for children under 8 years of age, ½ a tablet before meals, with a
glass of water….”
In a database, grouping may be based on the “Sex/Gender” field, where
the values “Male,” “Female,” and “Unknown” define minor groupings. These
values can be stored as coded values (1, 2, and 3). The ordering of the coded
levels of classification variables must be chosen with care. If, in a statistical
report, the data for female patients are required to appear after the data for males,
the “Sex/Gender” field would use “2” for females and “1” for males. The following
SAS script describes this formatting.

    proc format library=proclib;
      value $sex '1'='male' '2'='female' '3'='unknown';
      picture pop low-high='000,000,000';
    run;
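The effect of such a value format on grouping order can be sketched outside SAS. The Python fragment below is only an analogy: the dictionary stands in for the $sex format, the records are invented, and the helper name `apply_format` is hypothetical. It mirrors the SAS behavior of printing a value unchanged when the format has no matching range.

```python
# Stand-in for the $sex value format: stored code -> display label.
SEX_FORMAT = {"1": "male", "2": "female", "3": "unknown"}


def apply_format(fmt, code):
    """Return the label for a code, or the code itself when no label is defined."""
    return fmt.get(code, code)


# Made-up (sex code, country) records in arbitrary input order.
records = [("2", "Spain"), ("3", "Spain"), ("1", "Spain"), ("9", "Spain")]

# Sorting on the stored code, not on the label, is what puts males first.
ordered = sorted(records, key=lambda rec: rec[0])
labels = [apply_format(SEX_FORMAT, code) for code, _ in ordered]
print(labels)

# Sorting on the label instead would reorder the groups alphabetically.
by_label = sorted(apply_format(SEX_FORMAT, code) for code, _ in records)
print(by_label)
```

This is why the numeric level assigned to each group matters: the report order follows the stored code, while the format controls only what is displayed.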
Formatting has other uses in scripting. Many data values must be defined
by a format. In SAS, a format can be applied with any of the following:
1. PUT, PUTC, or PUTN functions
2. %SYSFUNC macro function
3. FORMAT/ATTRIB statement in a DATA step or a PROC step
A macro can also be written to act as a user-defined function that
applies a format to a value outside a DATA step.

    num=15;
    char=put(num,hex2.);

    population=1145.32;
    put population comma10.2;
    result: 1,145.32

    %macro tst(amount);
      %put %sysfunc(putn(&amount,dollar10.2));
    %mend tst;
    %tst(1154.23);

Patient records are usually the type of data that come through Open
Database Connectivity (ODBC). It is quite possible that these data exist
as the backbone of a medical client-server application. In this case, access to the data
via ODBC is required. The module "SAS/ACCESS for ODBC" must be installed on
the computer. Configuring the database by referring to the DSN (Data Source
Name) and specifying how it is accessed can also be required. Even parameter values
can come from an ODBC source. These data may have dynamic values that are
updated by end-users through the web. Normally, these applications have
administration parts that allow the end-user to update parameters.
Example:
The following script shows how to use part of the data stored in
another vendor's Database Management System (DBMS) files and bring it into
a SAS data set. A ‘libref’ is declared that points
to a library containing Oracle data, and SAS reads data from an Oracle table into a SAS
data set:

    libname dblib oracle user=halley password=halley path='hrdept_002';
    data paitient.big;
      set dblib.paitient;
    run;

Memory allocation is an important concept in creating or extending a data
library. SAS allows space to be requested as needed. To optimize system
performance and allocate space appropriately, one can pre-allocate the most
space that may be needed. These methods are used more often when
multivolume access to SAS data libraries is required.
The above DATA statement may then change to:

    /* Known to be a big data set. */
    data paitient.big (alq=100000 deq=5000);
As explained earlier, data can come from an external data file. Additionally,
one can connect to a remote data source and work on it. In the following script, we
connect to z/OS and UNIX servers to use DB2 and Oracle data:

    /*************************************/
    /* connect to z/OS                   */
    /*************************************/
    options comamid=tcp;
    filename rlink '!sasroot\connect\saslink\tcptso.scr';
    signon os390host;

    /*************************************/
    /* download DB2 data views using     */
    /* SAS/ACCESS engine                 */
    /*************************************/
    rsubmit os390host;
      libname db db2;
      proc download data=db.paitient out=db2dat;
      run;
    endrsubmit;

    /*************************************/
    /* connect to UNIX                   */
    /*************************************/
    options remote=hrunix comamid=tcp;
    filename rlink '!sasroot\connect\saslink\tcpunix.scr';
    signon;

    /*************************************/
    /* download Oracle data using        */
    /* SAS/ACCESS engine                 */
    /*************************************/
    rsubmit hrunix;
      libname oracle oracle user=hzan password=halley;
      proc download data=oracle.paitient out=oracdat;
      run;
    endrsubmit;

    /*************************************/
    /* sign off both links               */
    /*************************************/
    signoff hrunix;
    signoff os390host cscript='!sasroot\connect\saslink\tcptso.scr';
    /*************************************/
    /* union data into SAS view          */
    /*************************************/
    proc sql;
    create view temp_joindata as
      select gender, country, count(*) as population
        from db2dat group by gender, country
      union all
      select gender, country, count(*) as population
        from oracdat group by gender, country
      union all
      select gender, country, count(*) as population
        from paitient1 group by gender, country;
    quit;

    proc sql;
    create view jointdata as
      select t.gender, c.name as country, sum(t.population) as population
        from temp_joindata t, countries c
        where c.codeId = t.country
        group by t.gender, c.name
        order by t.gender, c.name;
    quit;

    options fmtsearch=(proclib);

    /* The NOWD option runs the REPORT procedure without the REPORT window and
       sends its output to the open output destination(s). */
    proc report data=jointdata nowd;
      column gender country population;
      format gender $sex. country $50. population pop.;
      title 'Country of Origin for Patients Receiving the drug in Post marketing';
    run;

    Country of Origin for Patients Receiving this drug in Post marketing for 04JAN06

    Gender     country            Population
    Female     Algeria               743,453
    Male                             235,984
    Unknown                              167
    Female     Denmark           423,457,698
    Male                         546,876,345
    Unknown                              897
    Female     Spain            456,9812,564
    Male                         400,987,564
    Unknown                              234
    Female     United Kingdom    876,234,123
    Male                         564,234,876
    Unknown
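The step of combining per-source counts into one population table can be sketched in a few lines. This Python fragment is an illustration under assumed data, not a replacement for the SAS/CONNECT script: the three lists stand in for the db2dat, oracdat, and local patient data sets, and the function name is hypothetical. Adding the counts across sources corresponds to a UNION ALL followed by GROUP BY; a plain UNION would silently drop identical rows coming from two sources.

```python
from collections import Counter

# Made-up (gender, country) rows standing in for the three downloaded sources.
db2_rows = [("Female", "Spain"), ("Male", "Spain"), ("Female", "Algeria")]
oracle_rows = [("Female", "Spain"), ("Male", "Denmark")]
local_rows = [("Female", "Spain"), ("Unknown", "Denmark")]


def population_by_gender_country(*sources):
    """Count rows per (gender, country) across all sources, keeping duplicates."""
    counts = Counter()
    for rows in sources:
        counts.update(rows)
    # Sort by gender, then country, to match the report's ordering clause.
    return sorted((gender, country, n) for (gender, country), n in counts.items())


population = population_by_gender_country(db2_rows, oracle_rows, local_rows)
for gender, country, n in population:
    print(gender, country, n)
```
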
Conclusions:
This thesis proposes ways to improve programming practices for
standardizing Drug Safety Reporting Systems. The quality of a Drug Safety
Reporting Application depends on the system architecture, methodologies, and
modeling used by the programmer. The degree to which an implementation is
standardized is in direct proportion to the correctness of the methods used for accessing,
gathering, and manipulating the data, and for its classification, code control, quality
control, formatting, statistical analysis, and mining. Classification terms
should follow a hierarchical structure that is consistent with FDA standards and
MedDRA. Using code control with MedMinder and the SCM is also
important. Neither this nor quality control should be overlooked by
programmers. Formatting of data must be done properly and, again, consistently
with FDA standards. Statistical analysis and data mining in these types of
applications must also be done correctly, as they have a direct effect on the results.
Ultimately, gathering and accessing data should be handled dynamically, and
manual access should not be considered. Above all, details such as the size of
the data in the data accessing stage should be handled carefully.
As to the professional working on the system, an advanced background in
computational, mathematical, and programming methods is required for
applying these techniques accurately. SAS programming, knowledge of
object-oriented programming, data structures, database modeling, and SQL are
all necessary skills for implementing a Standard Drug Safety Reporting System.
Knowledge of statistical modeling is particularly desirable in data mining
applications. Finally, a computational science graduate or a professional
software designer with good scripting skills can make the application work more
dynamically and accurately. The workbench of Drug Safety Reporting
Systems is made up of the SAS and MedDRA applications. SAS supports
advanced data access technology, and the MedDRA classification matches the
metadata required for designing this application. These existing components
improve the reliability of the design, and SQL scripting extends it.