PROVIDING DATA SECURITY IN PATIENT CLINICAL DATA USING SUPPORT VECTOR
MACHINE
ANNAI COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF INFORMATION TECHNOLOGY
GUIDED BY
Mrs. D. SHYAMALADEVI, M.Tech., Asst. Prof. / IT
P.GEETHA ([email protected]), R.KALPANA ([email protected]), P.VINOTHINI ([email protected])
ABSTRACT: The aim is to speed up diagnosis and improve diagnostic
accuracy in today's healthcare system. The advantages of a
clinical decision support system include not only improving
diagnosis accuracy but also reducing diagnosis time. We use the
SVM classifier as the data mining technique, together with an
encryption technique, to make the Clinical Decision Support
System more helpful in providing information about important
diseases accurately and efficiently. Such models were proposed
nearly three decades ago, but have had little practical impact,
despite their significant advantages compared to either ranked
keyword retrieval or a pure SVM. We also propose term-independent
bounds that can further reduce the number of score calculations
for short, simple queries under the extended SVM. Together, these
methods yield an overall saving of 50 to 80 percent of the
evaluation cost on test queries drawn from biomedical search.
INTRODUCTION :Nowadays the healthcare industry, which is
extensively distributed across the globe to provide health
services for patients, has never faced such massive amounts of
electronic data or experienced such a sharp growth rate of data.
However, if no appropriate technique is developed to extract the
great potential economic value from big healthcare data, these
data not only become meaningless but also require a large amount
of space to store and manage. Over the past two decades, the
remarkable evolution of data mining techniques, which can convert
stored data into meaningful information, has had a major impact
on people's lifestyles by predicting behaviors and future trends.
These techniques are well suited to providing decision support in
the healthcare setting. To speed up diagnosis and improve its
accuracy, a new system in the healthcare industry should provide
a much cheaper and faster route to diagnosis. The Clinical
Decision Support System (CDSS), in which various data mining
techniques are applied to assist physicians in diagnosing
patients with similar symptoms, has received great attention
recently. A CDSS has been defined as an "active knowledge
system" that uses two or more items of patient data to generate
case-specific advice. This implies that a CDSS is simply a
decision support system focused on using knowledge management to
produce clinical advice for patient care based on multiple items
of patient data. The main purpose of a modern CDSS is to assist
clinicians at the point of care: clinicians interact with the
CDSS to analyze patient data and reach a diagnosis. The naive
Bayesian classifier, a popular machine learning tool in data
mining, has been widely used to predict various diseases in
CDSSs. Despite its simplicity, it is often more appropriate for
medical diagnosis in healthcare than more sophisticated
techniques. A CDSS with a naive Bayesian classifier offers many
advantages over traditional healthcare systems and opens a new
way for clinicians to predict patients' diseases.
However, the success of such systems still hinges on
understanding and managing the information security and privacy
challenges, especially during the disease decision phase. One of
the main challenges is keeping patients' medical data away from
unauthorized disclosure. Medical data can be of interest to a
large variety of healthcare stakeholders; for example, an online
direct-to-consumer service provider may offer individual disease
risk prediction. Without good protection of medical data,
patients may fear that their data will be leaked and abused, and
refuse to provide it to the CDSS for diagnosis. It is therefore
crucial to protect patients' medical data. However, preserving
the privacy of medical data alone is not sufficient for the whole
CDSS to flourish. The service provider's classifier, which is
used to predict patients' diseases, cannot be exposed to third
parties, since the classifier is the service provider's own
asset. Otherwise, third parties could abuse the classifier to
predict patients' diseases, damaging the service provider's
profit. Therefore, besides preserving the privacy of patients'
medical data, protecting the service provider's privacy is also
crucial for the CDSS. A privacy-preserving patient-centric
clinical decision support system, called PPCD, helps physicians
predict patients' disease risks in a privacy-preserving way: it
allows the service provider to diagnose a patient's disease
without leaking any of the patient's medical data.
A CDSS provides knowledge and disease-specific information to
clinicians, enhancing diagnostic efficiency and healthcare
quality, and can greatly improve patient safety. With increasing
amounts of data being generated by healthcare industries and
researchers, there is a need for fast, accurate and robust
algorithms for data analysis. Improvements in database
technology, computing performance and artificial intelligence
have contributed to the development of intelligent data analysis.
The primary aim of data mining is to discover patterns in the
data that lead to a better understanding of the data-generating
process and to useful predictions. One technique developed to
address these issues is the support vector machine (SVM), a
robust tool for classification and regression in noisy, complex
domains. In this paper, we use the SVM, one of the most powerful
classification techniques, which has been successfully applied to
many real-world problems; it offers a clear advantage in
improving diagnosis accuracy in a clinical decision support
system. Along with this, we use homomorphic encryption to secure
sensitive data related to patient health information. The
remainder of the paper is organized as follows. Section II gives
a literature survey of work done in the field of Clinical
Decision Support Systems (CDSS). Section III briefly discusses
the data mining and privacy-preserving techniques used. Section
IV presents the workflow of the system we have developed.
Finally, Section V concludes the paper.
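As background for the approach above, the decision rule of a trained linear SVM can be sketched as follows. The weights, bias, and class labels here are illustrative placeholders, not values learned from clinical data:

```java
// Sketch of a linear SVM's decision rule: sign(w . x + b).
// The weight vector and bias would come from training; the ones
// used in main() are made up for illustration only.
public class LinearSvm {
    /** Returns +1 or -1 depending on which side of the hyperplane x falls. */
    public static int predict(double[] w, double b, double[] x) {
        double s = b;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s >= 0 ? 1 : -1;   // +1 = at risk, -1 = not at risk (labels assumed)
    }

    public static void main(String[] args) {
        double[] w = {0.6, -0.4};   // illustrative "trained" weights
        // Feature vector x = (1.0, 0.5): score = 0.6 - 0.2 - 0.1 = 0.3 >= 0
        System.out.println(predict(w, -0.1, new double[]{1.0, 0.5})); // prints 1
    }
}
```

At prediction time only this dot product is needed, which is what makes the classification step amenable to evaluation under homomorphic encryption.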
LITERATURE SURVEY
Application of clinical decision support systems in the
management of chronic diseases :With the continuous deepening of
hospital informatization and improving awareness of quantitative
management, the clinical decision support system (CDSS), which
provides decision support for medical staff, has emerged to
reduce the incidence of medical malpractice. Faced with China's
aging population and the high incidence of chronic diseases,
establishing a clinical decision support system with chronic
disease management functions is particularly important. This
paper first introduces the concept, development process and
architecture of the CDSS. Then the application progress and
problems of clinical decision support systems in chronic disease
management are reviewed, in order to provide new ideas for the
development of such systems.
Architectural Approaches for Implementing Clinical
Decision Support Systems in Cloud: A Systematic Review
:Clinical Decision Support Systems (CDSS) were explicitly
introduced in the 1990s with the aim of providing knowledge to
clinicians in order to influence their decisions and, therefore,
improve patients' health care. There are different architectural
approaches for implementing a CDSS. Some of these approaches
are based on cloud computing, which provides on-demand
computing resources over the Internet. The goal of this paper is
to determine and discuss key issues and approaches involving
architectural designs when implementing a CDSS using cloud
computing. To this end, we performed a standard Systematic
Literature Review (SLR) of primary studies showing the
intervention of cloud computing in CDSS implementations.
Twenty-one primary studies were reviewed. We found that CDSS
architectural components are similar in most of the studies.
Cloud-based CDSSs are most used in home healthcare and emergency
medical systems. Alerts/reminders and knowledge services are the
most common implementations. Major challenges concern security,
performance and compatibility. We conclude on the benefits of
implementing a cloud-based CDSS, since it allows cost-efficient,
ubiquitous and elastic computing resources. We highlight that
some studies show weaknesses regarding the conceptualization of a
cloud-based computing approach and lack a formal methodology in
the architectural design process.
A Machine Learning Approach to Improve Contactless
Heart Rate Monitoring Using a Webcam :Unobtrusive,
contactless recordings of physiological signals are very
important for many health and human–computer interaction
applications. Most current systems require sensors which
intrusively touch the user's skin. Recent advances in contact-free
physiological signals open the door to many new types of
applications. This technology promises to measure heart rate
(HR) and respiration using video only. The effectiveness of this
technology, its limitations, and ways of overcoming them
deserve particular attention. In this paper, we evaluate this
technique for measuring HR in a controlled situation, in a
naturalistic computer interaction session, and in an exercise
situation. For comparison, HR was measured simultaneously
using an electrocardiography device during all sessions. The
results replicated the published results in controlled situations,
but show that they cannot yet be considered as a valid measure
of HR in naturalistic human–computer interaction. We propose a
machine learning approach to improve the accuracy of HR
detection in naturalistic measurements. The results demonstrate
that the root mean squared error is reduced from 43.76 to 3.64
beats/min using the proposed method.
Prediction of extubation failure for neonates with
respiratory distress syndrome using the MIMIC-II clinical
database :Extubation failure (EF) is an ongoing problem in the
neonatal intensive care unit (NICU). Nearly 25% of neonates
fail their first extubation attempt, requiring re-intubations that
are associated with risk factors and financial costs. We identified
179 mechanically ventilated neonatal patients that were
intubated within 24 hours of birth in the MIMIC-II intensive
care database. We analyzed data from the patients 2 hours prior
to their first extubation attempt, and developed a prediction
algorithm to distinguish patients whose extubation attempt was
successful from those that had EF. From an initial list of 57
candidate features, our machine learning approach narrowed
down to six features useful for building an EF prediction model:
monocyte cell count, rapid shallow breathing index, fraction of
inspired oxygen (FiO2), heart rate, PaO2/FiO2 ratio where
PaO2 is the partial pressure of oxygen in arterial blood, and work
of breathing index. Algorithm performance had an area under
the receiver operating characteristic curve (AUC) of 0.871 and
sensitivity of 70.1% at 90% specificity.
Public Key Cryptosystems from the Multiplicative Learning
with Errors :We construct two public key cryptosystems based
on the multiplicative learning with errors (MLWE) problem.
Cryptosystem 1 (resp. 2) is semantically secure assuming
the worst-case hardness of the decisional composite residuosity
problem and the search (resp. decisional) LWE problem, which
implies approximating the shortest vector problem in a lattice to
within small polynomial factors. Namely, our cryptosystems are
secure as long as one of these problems is hard. Moreover,
cryptosystem 1 can be adapted into a chosen-ciphertext-secure
cryptosystem. In addition, cryptosystem 2 has the additively
homomorphic property.
k-Nearest Neighbor Classification over Semantically Secure
Encrypted Relational Data :Data Mining has wide applications
in many areas such as banking, medicine, scientific research and
government agencies. Classification is one of the
commonly used tasks in data mining applications. For the past
decade, due to the rise of various privacy issues, many
theoretical and practical solutions to the classification problem
have been proposed under different security models. However,
with the recent popularity of cloud computing, users now have
the opportunity to outsource their data, in encrypted form, as
well as the data mining tasks to the cloud. Since the data on the
cloud is in encrypted form, existing privacy-preserving
classification techniques are not applicable. In this paper, we
focus on solving the classification problem over encrypted data.
In particular, we propose a secure k-NN classifier over
encrypted data in the cloud. The proposed protocol protects the
confidentiality of data, privacy of user's input query, and hides
the data access patterns. To the best of our knowledge, our work
is the first to develop a secure k-NN classifier over encrypted
data under the semi-honest model. Also, we empirically analyze
the efficiency of our proposed protocol using a real-world
dataset under different parameter settings.
Secure two-party distance computation protocol based on
privacy homomorphism and scalar product in wireless
sensor networks :Numerous privacy-preserving issues have
emerged along with the fast development of the Internet of
Things. In addressing privacy protection problems in Wireless
Sensor Networks (WSN), secure multi-party computation is
considered vital, where obtaining the Euclidean distance between
two nodes with no disclosure of either side's secrets has become
the focus of location-privacy-related applications. This paper
proposes a novel Privacy-Preserving Scalar Product Protocol
(PPSPP) for wireless sensor networks. Based on PPSPP, we then
propose a Homomorphic-Encryption-based Euclidean Distance
Protocol (HEEDP) without third parties. This protocol can
achieve secure distance computation between two sensor nodes.
Correctness proofs of PPSPP and HEEDP are provided,
followed by security validation and analysis. Performance
evaluations via comparisons among similar protocols
demonstrate that HEEDP is superior; it is most efficient in terms
of both communication and computation on a wide range of data
types, especially in wireless sensor networks.
Toward efficient and privacy-preserving computing in big
data era :Big data, because it can mine new knowledge for
economic growth and technical innovation, has recently received
considerable attention, and many research efforts have been
directed to big data processing due to its high volume, velocity,
and variety (referred to as "3V") challenges. However, in
addition to the 3V challenges, the flourishing of big data also
hinges on fully understanding and managing newly arising
security and privacy challenges. If data are not authentic, new
mined knowledge will be unconvincing; while if privacy is not
well addressed, people may be reluctant to share their data.
Because security has been investigated as a new dimension,
"veracity," in big data, in this article, we aim to exploit new
challenges of big data in terms of privacy, and devote our
attention toward efficient and privacy-preserving computing in
the big data era. Specifically, we first formalize the general
architecture of big data analytics, identify the corresponding
privacy requirements, and introduce an efficient and privacy-
preserving cosine similarity computing protocol as an example
in response to data mining's efficiency and privacy requirements
in the big data era.
Efficient privacy preservation technique for healthcare
records using big data :This work follows the same big-data
privacy framework as the previous article: in addition to the
"3V" (volume, velocity, variety) challenges, the flourishing of
big data hinges on fully understanding and managing newly arising
security and privacy challenges. The authors formalize the
general architecture of big data analytics and identify the
corresponding privacy requirements, but introduce an efficient
privacy-preserving Triple DES scheme as the example responding to
data mining's efficiency and privacy requirements in the big data
era.
A Review on the State-of-the-Art Privacy-Preserving
Approaches in the e-Health Clouds :Cloud computing is
emerging as a new computing paradigm in the healthcare sector
besides other business domains. Large numbers of health
organizations have started shifting the electronic health
information to the cloud environment. Introducing the cloud
services in the health sector not only facilitates the exchange of
electronic medical records among the hospitals and clinics, but
also enables the cloud to act as a medical record storage center.
Moreover, shifting to the cloud environment relieves the
healthcare organizations of the tedious tasks of infrastructure
management and also minimizes development and maintenance
costs. Nonetheless, storing the patient health data in the third-
party servers also entails serious threats to data privacy. Because
of probable disclosure of medical records stored and exchanged
in the cloud, the patients' privacy concerns should essentially be
considered when designing the security and privacy
mechanisms. Various approaches have been used to preserve the
privacy of the health information in the cloud environment. This
survey aims to encompass the state-of-the-art privacy-preserving
approaches employed in the e-Health clouds. Moreover, the
privacy-preserving approaches are classified into cryptographic
and noncryptographic approaches, and a taxonomy of the
approaches is also presented. Furthermore, the strengths and
weaknesses of the presented approaches are reported and some
open issues are highlighted.
OBJECTIVE :We propose a privacy-preserving patient-centric
clinical decision support system, called PPCD, which is based on
Support Vector Machine classification and helps physicians
predict patients' disease risks in a privacy-preserving way.
EXISTING SYSTEM
- A privacy-preserving patient-centric clinical decision support
system using a naive Bayesian classifier.
- Different data mining techniques, such as neural networks and
association rule mining, have been used for anomaly detection
and classification.
- Several computer models (such as Bayesian networks) have been
described that may be used in clinical practice in the near
future.
PROPOSED SYSTEM
- Improve on the existing system with a Clinical Decision Support
System based on the Support Vector Machine (SVM).
- Use a data mining classification technique for the Clinical
Decision Support System.
- The system will work faster and more efficiently using SVM,
which is widely used in real-life applications.
- We also use the RSA and Paillier algorithms for homomorphic
encryption, together with a proxy re-encryption algorithm that
protects ciphertexts from chosen-ciphertext attack (CCA), so
this system is more secure than the existing system.
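The additively homomorphic property that the Paillier algorithm contributes can be sketched as follows. This is a toy illustration of textbook Paillier only (small keys, no proxy re-encryption, not the system's actual protocol):

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Minimal Paillier sketch showing the additive-homomorphic property:
// E(m1) * E(m2) mod n^2 decrypts to m1 + m2.
// Toy 128-bit primes for illustration -- NOT a secure parameter choice.
public class PaillierDemo {
    final BigInteger n, n2, g, lambda, mu;
    final SecureRandom rnd = new SecureRandom();

    PaillierDemo(int bits) {
        BigInteger p = BigInteger.probablePrime(bits, rnd);
        BigInteger q = BigInteger.probablePrime(bits, rnd);
        n = p.multiply(q);
        n2 = n.multiply(n);
        g = n.add(BigInteger.ONE);                    // standard choice g = n + 1
        BigInteger p1 = p.subtract(BigInteger.ONE);
        BigInteger q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));  // lcm(p-1, q-1)
        mu = L(g.modPow(lambda, n2)).modInverse(n);
    }

    BigInteger L(BigInteger u) { return u.subtract(BigInteger.ONE).divide(n); }

    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), rnd); }        // random r in Z*_n
        while (r.signum() == 0 || r.compareTo(n) >= 0
               || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(m, n2).multiply(r.modPow(n, n2)).mod(n2);
    }

    BigInteger decrypt(BigInteger c) {
        return L(c.modPow(lambda, n2)).multiply(mu).mod(n);
    }

    /** Encrypts a and b, multiplies the ciphertexts, and decrypts the sum. */
    public static BigInteger homomorphicSum(long a, long b) {
        PaillierDemo ph = new PaillierDemo(128);
        BigInteger ca = ph.encrypt(BigInteger.valueOf(a));
        BigInteger cb = ph.encrypt(BigInteger.valueOf(b));
        return ph.decrypt(ca.multiply(cb).mod(ph.n2));
    }

    public static void main(String[] args) {
        System.out.println(homomorphicSum(17, 25)); // prints 42
    }
}
```

This property is what lets a server combine encrypted per-feature contributions without ever seeing the patient's plaintext values.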
Modules
1. Similar Diseases Document Extraction
2. Score Computation
3. Term Independent Score Bounds
4. Get SVM Top k
Module Description
Similar Diseases Document Extraction :In this module we extract
similar historical medical data containing confirmed case records
of patients' symptoms and confirmed diseases. These data contain
sensitive information highly related to the dataset underlying
the SVM-based clinical decision support system. Similar documents
are extracted using TF-IDF values: we compute the similarity
score between the given query and each document in the dataset,
and select the document with the highest similarity score.
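The TF-IDF extraction step above can be sketched as follows; the tiny symptom corpus and the log-based IDF weighting are illustrative assumptions, not the system's actual data:

```java
import java.util.*;

// Sketch of TF-IDF weighting plus cosine similarity, used to pick
// the historical record most similar to a query. Corpus is made up.
public class TfIdfSimilarity {
    // Term frequency counts for one document.
    static Map<String, Integer> tf(String doc) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : doc.toLowerCase().split("\\s+"))
            m.merge(t, 1, Integer::sum);
        return m;
    }

    // idf(t) = ln(N / df(t)), where df(t) = number of docs containing t.
    static Map<String, Double> idf(List<Map<String, Integer>> tfs) {
        Map<String, Double> idf = new HashMap<>();
        for (Map<String, Integer> d : tfs)
            for (String t : d.keySet()) idf.merge(t, 1.0, Double::sum);
        idf.replaceAll((t, df) -> Math.log(tfs.size() / df));
        return idf;
    }

    static Map<String, Double> weight(Map<String, Integer> tf, Map<String, Double> idf) {
        Map<String, Double> w = new HashMap<>();
        tf.forEach((t, f) -> w.put(t, f * idf.getOrDefault(t, 0.0)));
        return w;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet())
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        for (double v : a.values()) na += v * v;
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Returns the index of the record most similar to the query. */
    public static int mostSimilar(List<String> records, String query) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (String r : records) tfs.add(tf(r));
        Map<String, Double> idf = idf(tfs);
        Map<String, Double> qv = weight(tf(query), idf);
        int best = -1;
        double bestScore = -1;
        for (int i = 0; i < records.size(); i++) {
            double s = cosine(qv, weight(tfs.get(i), idf));
            if (s > bestScore) { bestScore = s; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
            "fever cough fatigue",
            "chest pain shortness of breath",
            "headache nausea fever");
        System.out.println(mostSimilar(records, "cough fever")); // prints 0
    }
}
```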
Score Computation :In this module we compute the score for each
document in the dataset. The recursive nature of SVM queries
makes it necessary to calculate the scores at lower levels of the
query first. One obvious possibility would be to add processing
logic to each query node as it acts on its clauses, but
optimizations such as max-score could then only be employed at
the query root node, since a threshold is only available for the
overall query score. Instead, we follow a holistic approach and
prefer to be able to calculate the document score given the set
S ⊆ T of query terms present in a document, no matter where they
appear in the query tree.
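The clause-by-clause evaluation can be sketched as below. The AND/OR combination rules used here (sum of required clauses, maximum over alternatives) are assumptions for illustration, not the system's actual scoring functions:

```java
import java.util.*;

// Sketch of bottom-up scoring over a small query tree: each leaf is a
// term's score in the document, each inner node combines its clauses.
public class QueryTreeScore {
    // A node maps one document's per-term scores to a clause score.
    interface Node { double eval(Map<String, Double> termScores); }

    static Node term(String t) {
        return s -> s.getOrDefault(t, 0.0);
    }

    // OR: take the best-scoring clause.
    static Node or(Node... kids) {
        return s -> Arrays.stream(kids).mapToDouble(k -> k.eval(s)).max().orElse(0);
    }

    // AND: every clause must contribute; sum them, else score 0.
    static Node and(Node... kids) {
        return s -> {
            double sum = 0;
            for (Node k : kids) {
                double v = k.eval(s);
                if (v == 0) return 0;   // a required clause is absent
                sum += v;
            }
            return sum;
        };
    }

    public static double demo() {
        // Query: (fever AND cough) OR chestpain
        Node q = or(and(term("fever"), term("cough")), term("chestpain"));
        // Document contains fever (score 1.2) and cough (score 0.8).
        Map<String, Double> doc = Map.of("fever", 1.2, "cough", 0.8);
        return q.eval(doc);   // 1.2 + 0.8 = 2.0
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each sub-tree exposes only its clause score, the same evaluation works no matter where the matched terms sit in the tree.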
Term Independent Score Bounds :To provide early termination of
clinical decision support system scoring, we also propose the use
of term-independent SVM score bounds that represent the maximum
attainable score for a given number of terms. A lookup table of
score bounds M_r is created, indexed by r, that is consulted to
check whether it is possible for a document containing r of the
query terms to achieve a score greater than the current entry
threshold. That is, for each r = 1, ..., n, we seek to determine

    M_r = max { score(S) : S ⊆ T, |S| = r }.

The number of possible term combinations that could occur in
documents is C(n, r) for each r, which is 2^n - 1 in total, a
demanding computation. However, the scoring functions only depend
on the clause scores, that is, the overall score of each
sub-tree, meaning that the problem can be broken down into
subparts, solved for each sub-tree separately, and then
aggregated. The simplest (sub)tree consists of one term, for
which the solution is trivial.
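A simple instance of such a lookup table can be sketched as follows, under the assumption that the overall score is a plain sum of per-term contributions, so that M_r reduces to the sum of the r largest per-term maxima:

```java
import java.util.*;

// Sketch of term-independent score bounds. Assuming an additive
// score, the bound M_r (the best score any document matching r of the
// n query terms can reach) is the sum of the r largest per-term
// maximum scores. A candidate matching only r terms can be skipped
// whenever M_r does not exceed the current entry threshold.
public class ScoreBounds {
    /** termMax[i] = maximum contribution of query term i. Returns M[0..n], M[0] = 0. */
    public static double[] bounds(double[] termMax) {
        double[] sorted = termMax.clone();
        Arrays.sort(sorted);                              // ascending
        double[] m = new double[termMax.length + 1];
        for (int r = 1; r <= termMax.length; r++)
            m[r] = m[r - 1] + sorted[sorted.length - r];  // add the r-th largest
        return m;
    }

    /** Can a document matching r terms possibly beat the threshold? */
    public static boolean canBeat(double[] m, int r, double threshold) {
        return m[r] > threshold;
    }

    public static void main(String[] args) {
        double[] m = bounds(new double[]{0.9, 2.5, 1.2});
        // M_1 = 2.5, M_2 = 2.5 + 1.2 = 3.7, M_3 = 4.6
        System.out.println(canBeat(m, 1, 3.0)); // prints false: skip 1-term docs
    }
}
```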
Get SVM Top-k :In this module we get the top-k clinical decision
support results for the given SVM query. Our objective is to
construct a sequence q1, q2, ..., qv of SVM queries that can be
submitted to the database, retrieving as few documents as
possible while still containing all the documents that would be
in the top-k results of the clinical decision support system on
the dataset.
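The top-k retrieval step can be sketched with a size-k min-heap, whose smallest retained score doubles as the entry threshold used for early termination; the scores here are illustrative:

```java
import java.util.*;

// Sketch of top-k selection: keep the k highest-scoring document ids
// in a min-heap of size k, evicting the weakest as better docs arrive.
public class TopK {
    /** Returns the ids of the k highest-scoring documents, best first. */
    public static List<Integer> topK(double[] scores, int k) {
        // Min-heap ordered by score: the head is the weakest kept doc,
        // i.e. the current entry threshold for new candidates.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble(i -> scores[i]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(doc);
            if (heap.size() > k) heap.poll();   // evict current weakest
        }
        List<Integer> ids = new ArrayList<>(heap);
        ids.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return ids;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.5, 0.7};
        System.out.println(topK(scores, 2)); // prints [1, 3]
    }
}
```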
ARCHITECTURE DIAGRAM
DATA FLOW DIAGRAM
SOFTWARE REQUIREMENTS
Introduction to JAVA :Java was conceived by James Gosling,
Patrick Naughton, Chris Warth, Ed Frank, and Mike Sheridan at
Sun Microsystems, Inc. in 1991. It took 18 months to develop
the first working version. This language was initially called
“Oak” but was renamed “Java” in 1995. Between the initial
implementation of Oak in the fall of 1992 and the public
announcement of Java in the spring of 1995, many more people
contributed to the design and evolution of the language. Bill Joy,
Arthur van Hoff, Jonathan Payne, Frank Yellin, and Tim
Lindholm were key contributors to the maturing of the original
prototype. Somewhat surprisingly, the original impetus for Java
was not the Internet! Instead, the primary motivation was the
need for a platform-independent language that could be used to
create software to be embedded in various consumer electronic
devices.
Why JAVA? :Java is not just a language. Java is an all-purpose
language that can be successfully used for all kinds of software,
including data processing software, scientific software, system
software and real-time software, as well as portable web
software. The Internet helped catapult Java to the forefront of
programming, and Java, in turn, has had a profound effect on the
Internet. The reason for this is quite simple: Java expands the
universe of objects that can move about freely in cyberspace. In
a network, two very broad categories of objects are transmitted
between the server and your personal computer:
1) Passive information. For example, when you read your
e-mail, you are viewing passive data. Even when you download a
program, the program's code is still only passive data until you
execute it. However, a second type of object can be transmitted
to your computer.
2) Dynamic, active programs. An application is a program
that runs on your computer, under the operating system of that
computer. That is, an application created by Java is more or less
like one created using C or C++. When used to create
applications, Java is not much different from any other computer
language. Rather, it is Java's ability to create applets that
makes it important. An applet is an application designed to be
transmitted over the Internet and executed by a Java-compatible
Web browser. An applet is actually a tiny Java program,
dynamically downloaded across the network, just like an image,
sound file, or video clip. The important difference is that an
applet is an intelligent program, not just an animation or media
file. In other words, an applet is a program that can react to
user input and dynamically change, not just run the same
animation or sound over and over.
Security :As you are likely aware, every time that you download a
"normal" program, you are risking a viral infection. Prior to
Java, most users did not download executable programs
frequently, and those who did scanned them for viruses prior to
execution. Even so, most users still worried about the possibility
of infecting their systems with a virus. In addition to viruses,
another type of malicious program exists that must be guarded
against. This type of program can gather private information,
such as credit card numbers, bank account balances, and
passwords, by searching the contents of your computer's local
file system. Java answers both of these concerns by providing a
"firewall" between a networked application and your computer.
When you use a Java-compatible Web browser, you can safely
download Java applets without fear of viral infection or
malicious intent. Java achieves this protection by confining a
Java program to the Java execution environment and not
allowing it access to other parts of the computer. (You will see
how this is accomplished shortly.) The ability to download
applets with confidence that no harm will be done and that no
security will be breached is considered by many to be the single
most important aspect of Java.
Portability :As discussed earlier, many types of computers and
operating systems are in use throughout the world—and many
are connected to the Internet. For programs to be dynamically
downloaded to all the various types of platforms connected to
the Internet, some means of generating portable executable code
is needed. As you will soon see, the same mechanism that helps
ensure security also helps create portability. Indeed, Java’s
solution to these two problems is both elegant and efficient.
JDBC (Java Database Connectivity) :Java provides various
interfaces for connecting to a database and executing SQL
statements. Using the JDBC interfaces you can execute normal
statements, dynamic SQL statements and stored procedures that
take both IN and OUT parameters. JDBC enables an application to
connect to any database using the appropriate driver and a set of
Java API (Application Programming Interface) objects and
methods. A database is a collection of related information, and a
Database Management System (DBMS) is software that provides
a mechanism to retrieve, modify and insert data in the
database.
For the application to communicate with the database it needs
the following information:
1. RDBMS/DBMS product using which the database is
created.
2. The location of database.
3. The name of database.
JDBC APPLICATION ARCHITECTURE:
Figure 9 JDBC Application Architecture
JDBC interfaces:

Interface          Description
CallableStatement  Enables you to execute stored procedures that
                   have OUT parameters.
Connection         The interface that connects the Java
                   application to the database.
DatabaseMetaData   Provides Java with information concerning the
                   database.
Driver             The interface implemented by every JDBC
                   driver; used to establish connections to the
                   database.
PreparedStatement  An interface used to execute dynamic SQL
                   statements and stored procedures.
ResultSet          An interface that provides information on a
                   result set returned from the database.
Statement          An interface for executing normal SQL
                   statements and stored procedures.

Table 1 JDBC Interfaces
Java objects:
Java also provides a handful of objects that you can
use in your Java application. Most of these objects provide Java
with the database-specific data types available in most
databases.
JDBC API Objects:
Some of the important JDBC API objects are as
follows:

Object              Description
Date                Provides an object that can accept database
                    date values.
DriverManager       Provides a way to make a connection to the
                    database.
DriverPropertyInfo  Used to manage driver connection properties.
Time                Provides an object that can accept database
                    time values.

Table 2 JDBC API Objects
Advantages of JDBC:
One of the main reasons to use JDBC is that it is
considerably faster than the CGI (Common Gateway Interface)
approach. Using CGI usually requires calling another program that
must be executed by the computer. This separate program then
accesses the database, processes the data, and returns it to the
calling program in a stream. This requires multiple levels of
processing, which in turn increases wait time and allows more
bugs to appear. Executing a JDBC statement against the database
requires only a service that processes the SQL commands through
the database. This dramatically speeds up the time it takes to
execute an SQL statement.
MYSQL:
DBMS AND RDBMS:
Data :A collection of facts in raw form that becomes information
after proper organization or processing is known as data.
Database :A collection of data files arranged into an integrated
file system, which helps in minimizing duplication of data and
provides convenient access to information within the system.
Benefits of a database system:
Minimal redundancy :Redundant data are expensive in storage and
require more than one updating operation. Further, as different
copies of data may be in different stages of updating, the system
may not give consistent information. A database system can be
used to eliminate redundancy wherever possible.
Data consistency :When a change occurs in a data item, every file
that contains that field should be updated to reflect the change.
A database system performs this function efficiently.
Data sharing :The central database can be located on a server,
which can then be shared by several users. This sharing enables
central storage of the database.
Data integrity :When users update the database, it is very
important that data items and the associations between data items
are not destroyed. Integrity checks can be performed at the data
level itself, by checking that data values conform to certain
specified rules.
Privacy and security :The information stored should be kept in
such a way that it does not get lost or affected. Privacy means
that individuals and corporations should have the right to
determine for themselves when, how and to what extent information
about them is transmitted to others. Security of data means
protecting the data against accidental or intentional disclosure
to unauthorized parties, and against unauthorized modification or
destruction.
Database management system :A collection of programs for storing
and retrieving data from a database is called a database
management system. It is software used for the management,
maintenance and retrieval of data stored in a database. It is an
integrated set of computer programs that collectively provide an
organization with centralized control of its data; the details of
the physical database structure are handled by the DBMS.
Advantages of DBMS:
Create a database and define its contents.
Back up the data.
Enforce data security.
Organize and maintain the physical storage.
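The capabilities listed above can be sketched with Python's built-in sqlite3 module (the table and column names here are illustrative, not from the paper): create a database, define its contents, and back up the data to a second connection.

```python
import sqlite3

# Create a database and define its contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO patient (id, name) VALUES (1, 'Asha')")
conn.commit()

# Back up the data into a second database (could be a file on disk).
backup = sqlite3.connect(":memory:")
conn.backup(backup)

rows = backup.execute("SELECT name FROM patient").fetchall()
print(rows)  # [('Asha',)]
```

The DBMS hides the physical storage: the program only issues logical commands, and sqlite3 manages the pages on disk or in memory.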
Disadvantages of DBMS:
It allows only a single user to access the database.
Information cannot be retrieved from more than one
database at the same time.
Though it uses powerful hardware, if the database
fails, all the operations based on the database will be
suspended.
Relational database management system: A relational
database is based on the concept that related data is
organized and stored in a series of two-dimensional tables
known as relations. In a relation, a row is called a tuple and a
column is called an attribute. An RDBMS is a suite of software
programs that can be used for creating, maintaining, modifying and
manipulating a relational database. The relational model is one
of the three basic data models used by DBMSs.
Difference: The difference of two relations is a third relation
containing the tuples that occur in the first relation but not in
the second.
Selection: The select operation on a relation results in another
relation with the same attributes, whose body contains the tuples
for which the condition holds true.
Projection: Projection is an operation that selects specified
attributes from a relation. The result of a projection is a new
relation containing only the selected attributes: it picks columns
out of relations.
Division: For a division operation, the attributes of the second
relation must be a subset of those of the first relation. The
first relation divided by the second is a relation whose
attributes consist of all attributes of the first relation not
included in the second.
Join: A join combines the data spread across tables by combining
specified rows of the tables; a common column provides the join
condition. To avoid ambiguity, a reference to the table must be
made: if the tables in a join have the same column name, the
columns are distinguished by prefixing the table name.
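For illustration, the relational operations described above can be sketched in Python by modelling relations as sets of tuples (the relation names and data below are invented):

```python
# Relations as sets of tuples; attributes tracked by position:
# employees(id, name) and addresses(id, city).
employees = {(1, "Asha"), (2, "Ravi"), (3, "Mina")}
managers  = {(1, "Asha")}
addresses = {(1, "Pune"), (2, "Chennai")}

# Difference: tuples in the first relation but not in the second.
non_managers = employees - managers

# Selection: tuples where a condition holds true (here id > 1).
selected = {t for t in employees if t[0] > 1}

# Projection: pick a column; duplicates collapse, as in a set.
names = {name for _, name in employees}

# Join: combine rows of two relations on the common column id.
joined = {(eid, name, city)
          for eid, name in employees
          for aid, city in addresses
          if eid == aid}
print(sorted(joined))  # [(1, 'Asha', 'Pune'), (2, 'Ravi', 'Chennai')]
```

Treating relations as sets makes the set-theoretic character of the operators explicit: difference is literal set difference, and projection naturally removes duplicate rows.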
Integrity constraint:
An integrity constraint is a mechanism to prevent invalid
data from being entered into a table. Some types of integrity constraints are:
1. Domain integrity constraints.
2. Entity integrity constraints.
3. Referential integrity constraint.
Domain integrity constraints: These verify whether the data
entered is in the proper form, and they also set a range for the
input data.
Entity integrity constraints: These constraints are used to
enforce consistency in the database. Any data recorded in the
database is called an entity; each entity represents a table and
each record represents an instance of that entity. An entity
constraint is used to identify each row in a table uniquely.
Types of entity integrity constraints:
a. Unique constraint.
b. Primary constraint.
Referential integrity constraints: The referential integrity
constraint enforces relationships between tables. A foreign key
constraint establishes a column, or a combination of columns, as a
foreign key referring to a specified primary or unique key in
another table, called the referenced key. The table containing the
referenced key is called the parent table, and the table
containing the foreign key is called the child table.
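A minimal sqlite3 sketch of the three kinds of constraints (table and column names are illustrative; note that SQLite enforces foreign keys only when the corresponding pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Parent table: PRIMARY KEY (entity integrity), UNIQUE (entity integrity),
# and a CHECK acting as a domain constraint on the age range.
conn.execute("""CREATE TABLE patient (
    id   INTEGER PRIMARY KEY,
    code TEXT UNIQUE,
    age  INTEGER CHECK (age BETWEEN 0 AND 150))""")

# Child table: visit.patient_id must reference a row in patient
# (referential integrity).
conn.execute("""CREATE TABLE visit (
    visit_id   INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(id))""")

conn.execute("INSERT INTO patient VALUES (1, 'P-001', 42)")

try:
    conn.execute("INSERT INTO visit VALUES (10, 99)")  # no patient with id 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("foreign key violation rejected:", rejected)
```

Here the DBMS itself rejects the orphan visit row, so the application never has to re-check the parent/child relationship.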
Advantages of RDBMS over DBMS:
An RDBMS provides a flexible mechanism for designing a
database in such a way as to meet the changing needs of an
application.
It allows multiple users to simultaneously access the
database.
It allows the retrieval of information from more than
one table by linking the tables.
It provides an efficient security mechanism to the
database.
Disadvantages:
The performance of an RDBMS on a smaller system,
such as a PC, is considerably reduced.
It requires enormous storage space.
INPUT DESIGN AND OUTPUT DESIGN
INPUT DESIGN :The input design is the link between the
information system and the user. It comprises developing the
specifications and procedures for data preparation: the steps
necessary to put transaction data into a usable form for
processing, achieved either by having the computer read data
from a written or printed document or by having people key the
data directly into the system. The design of input focuses on
controlling the amount of input required, controlling errors,
avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed in such a way that it provides
security and ease of use while retaining privacy. Input design
considers the following things:
1. What data should be given as input?
2. How should the data be arranged or coded?
3. The dialog to guide the operating personnel
in providing input.
4. Methods for preparing input validations, and the
steps to follow when errors occur.
OBJECTIVES :1. Input design is the process of converting a
user-oriented description of the input into a computer-based
system. This design is important to avoid errors in the data input
process and to show the management the correct direction for
getting correct information from the computerized system.
2. It is achieved by creating user-friendly screens for data
entry that can handle large volumes of data. The goal of designing
input is to make data entry easier and free from errors. The
data entry screen is designed in such a way that all data
manipulations can be performed. It also provides record-viewing
facilities.
3. When data is entered, it is checked for validity. Data can
be entered with the help of screens, and appropriate messages are
provided as needed, so that the user
is never left in a maze. Thus the objective of input
design is to create an input layout that is easy to follow.
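A minimal sketch of the kind of validity checking and user-friendly messages described above, assuming hypothetical field rules for a patient data-entry screen:

```python
# Hypothetical field rules: each rule checks one input value and
# returns an error message, or None when the value is valid.
RULES = {
    "age":  lambda v: None if v.isdigit() and 0 <= int(v) <= 150
                      else "age must be a whole number between 0 and 150",
    "name": lambda v: None if v.strip() else "name must not be blank",
}

def validate(record: dict) -> list:
    """Return user-friendly messages for every invalid field."""
    errors = []
    for field, rule in RULES.items():
        msg = rule(record.get(field, ""))
        if msg:
            errors.append(f"{field}: {msg}")
    return errors

print(validate({"name": "Asha", "age": "42"}))  # []
print(validate({"name": "", "age": "abc"}))     # two error messages
```

Collecting all messages at once, rather than stopping at the first error, keeps the operator from re-submitting the screen repeatedly.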
OUTPUT DESIGN :A quality output is one which meets the
requirements of the end user and presents the information
clearly. In any system, the results of processing are communicated
to the users and to other systems through outputs. In output
design it is determined how the information is to be displayed for
immediate need, as well as the hard-copy output. Output is the
most important and direct source of information for the user.
Efficient and intelligent output design improves the system's
relationship with the user and helps decision-making.
1. Designing computer output should proceed in an
organized, well-thought-out manner; the right output must be
developed while ensuring that each output element is designed
so that people will find the system easy and effective to use.
When analysts design computer output, they should identify the
specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that
contain information produced by the system.
The output form of an information system should
accomplish one or more of the following objectives.
Convey information about past activities, current
status or projections of the future.
Signal important events, opportunities, problems, or
warnings.
Trigger an action.
Confirm an action.
METHODOLOGY :Software methodologies are concerned
with the process of creating software: not so much the technical
side as the organizational aspects. In this, the first of two
articles, I will introduce the different types of methodologies
and describe them from a historical perspective, since one way to
understand where a project is and where it is going is
to know where it has been. There will also be a
subsequent companion article looking at the current state of the
art and what I believe the future holds. This second article is
given the rather controversial title of "Why Creating Software is
not Engineering", which I will, of course, explain. In the second
article I will discuss several popular agile methodologies,
particularly their most important aspects (such as unit testing),
which are generally neglected or glossed over.
Waterfall :Conventional wisdom is to ask an expert for help,
and there are many willing helpers ready to provide their
services, for a fee, of course. You find that you need an
"analyst" to work out where you really want to go and a
"designer" to provide detailed, unambiguous instructions on how
to get there. The analyst works out, by deduction and/or educated
guesswork, exactly which "Station Hotel" you want. Perhaps
they even manage to contact Jim to confirm the location. They
also find out the exact address and the opening hours. Now you
know exactly where you want to go, but how do you get there? The
designer provides a "specification", or instructions for getting to
the hotel - e.g. proceed 2 miles to the roundabout, take the 3rd
exit, etc. To ensure that the driver understands the instructions,
the essential parts are even translated into his native language. A
good designer might also try to anticipate problems and have
contingency plans - e.g. if the freeway is closed, take an
alternative route.
The essential point of the specification is to have the
trip completely and thoroughly planned before starting out.
Everybody involved reads the specification and agrees that it
should get the customer to the pub on time. Can you see a
problem with this approach? While the analyst/designer is busy
at work, you (the customer) are getting a bit nervous. It has been
some time and you still haven't gone anywhere. You also want
reassurance that once you start the trip everything will stay on
track, since the experience of taxi journeys is that they can be
very unpredictable and the driver never gives any indication of
whether he is lost or on course. You need a "plan" so that you
can check that everyone is doing their job, and so that if something
is amiss it will be immediately apparent. The plan will also
require the driver to regularly report his position, so you know if
he is going to be late or not get there at all. For a large project
you will need a "project manager" to formulate the plan.
Prototype :The prototype methodology addresses the major
problem of the waterfall methodology's "specification", which is
that you are never sure it will work until you arrive at the
destination. It is difficult or impossible to prove that the
specification is correct, so instead you create a simple working
example of the product, much as an architect would create a scale
model of a proposed new building. To continue the taxi analogy:
the designer, or a delegate, grabs a motorbike to first check that
you can actually get to the Station Hotel, and even explores a few
alternative routes. When the motorbike rider has found a good
way, the actual taxi trip can begin.
Iterative :The precursors to today's agile methodologies can be
loosely grouped as "iterative". They are also often described as
"cascade" methods, as they are really just a series of small
waterfalls. I am not going to discuss these in detail, as they were
neither very widely used nor very successful. They were more an
attempt to fix the problems of the waterfall methodology than
to bypass them altogether. As such they had many of the
same problems, and even exacerbated some.
Agile :The main problem with the waterfall approach is that it
takes a lot of up-front effort in planning, analysis and
design. When it comes to the actual implementation, there are so
many variables and unknowns that it is very unlikely to go to
plan. The prototype approach means some of this uncertainty can
be eliminated by demonstrating that there is a reasonable chance
of success. However, there is still a lot of up-front effort to
design and build the prototype before the real development even
starts. What if the project could be divided into a sequence of
steps, so that at each stage it can be demonstrated that the
product is closer to the final one? After each step a working (but
not final) product is produced, so that the customer can see that
the project is on the right track.
Research Project Development Processes
The key research project management processes,
which run though all of these phases, are:
Phase management
Planning
Control
Procurement
Integration
CONCLUSION :The clinical decision support system (CDSS),
which uses advanced data mining techniques to help clinicians
make proper decisions, has received considerable attention
recently. The advantages of a clinical decision support system
include not only improving diagnosis accuracy but also reducing
diagnosis time. We have proposed a clinical decision support
system using a Support Vector Machine. Using the SVM, the
computational time and the diagnosis rate are improved. The
patient can get proper treatment more efficiently by being
provided the diagnosis result according to their own preferences.
The data to be transferred is kept in encrypted form, so that
there is no loss of privacy of the patients' data while training
the SVM classifier and providing data over the network.
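As an illustration of the classifier itself (not of the paper's privacy-preserving protocol), the sketch below trains a linear SVM with a Pegasos-style sub-gradient method on an invented, linearly separable toy data set; feature values, labels and hyper-parameters are all assumptions for the example.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: list of feature vectors, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)             # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                    # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy "clinical" data: (feature1, feature2, constant bias term).
X = [(3.0, 3.0, 1.0), (4.0, 4.0, 1.0), (0.2, 0.1, 1.0), (0.5, 0.3, 1.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
assert all(predict(w, x) == label for x, label in zip(X, y))
```

In the paper's setting the training inputs would additionally be protected by encryption before leaving the patient's side; the learning step shown here is the plaintext core of that pipeline.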