sb mirza
-
7/31/2019 SB Mirza
Knowledge-based expert systems
and a proof-of-concept case study
for multiple sequence alignment
construction and analysis
Mohamed Radhouene Aniba, Sophie Siguenza, Anne Friedrich, Frederic Plewniak, Olivier Poch, Aron Marchler-Bauer
and Julie Dawn Thompson
SB MIRZA
FA11-RBI-004
MS 2nd semester Bioinformatics
-
ABSTRACT
The bioinformatic information being produced, and the various applications used to study it, present a major data management and analysis challenge to researchers.
It is impossible to analyse all the information manually.
New approaches are needed that are capable of processing the large-scale heterogeneous data to extract the pertinent information.
-
Cont
A general methodology for building knowledge-based expert systems is described, focusing on the Unstructured Information Management Architecture (UIMA), which provides facilities for both data and process management.
-
New Challenges
Such system level studies necessitate a combination of experimental, theoretical and computational approaches.
A major challenge for bioinformaticians in the post-genomic era is clearly the management, validation and analysis of this mass of experimental and predicted data.
-
Approaches
One approach has been data warehousing, where all the relevant databases are stored in a unified format and mined through a uniform interface.
Distributed systems implement software to access heterogeneous databases that are dispersed over the internet and provide a query facility to access the data.
-
SRS and Entrez are probably the most widely used database query and navigation systems for the life science community.
Semantic web based methods have been introduced, which add meaning to the raw data by using formal descriptions of the concepts.
-
KNOWLEDGE-BASED EXPERT SYSTEMS
There are several forms of expert systems, which have been classified according to the methodology used:
Rule-based systems use a set of rules to analyse information about a specific class of problems and recommend one or more possible solutions.
Case-based reasoning systems adapt solutions that were used to solve previous problems and use them to solve new problems.
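The case-based idea above can be illustrated with a minimal sketch. It is not taken from the paper: the feature names, the solution labels and the similarity measure are all hypothetical, chosen only to show the retrieve, reuse and retain steps.

```python
from dataclasses import dataclass

@dataclass
class Case:
    features: dict   # numeric description of a previously solved problem
    solution: str    # the solution that worked for it

def similarity(a, b):
    """Inverse-distance similarity over the feature names both cases share."""
    shared = a.keys() & b.keys()
    dist = sum((a[k] - b[k]) ** 2 for k in shared) ** 0.5
    return 1.0 / (1.0 + dist)

def retrieve(case_base, problem):
    """Retrieve: find the stored case most similar to the new problem."""
    return max(case_base, key=lambda case: similarity(case.features, problem))

def solve(case_base, problem):
    """Reuse the best case's solution, then retain the new case."""
    best = retrieve(case_base, problem)
    case_base.append(Case(dict(problem), best.solution))
    return best.solution

# Hypothetical case base: both features are scaled to the range 0..1.
case_base = [
    Case({"mean_identity": 0.8, "length_variation": 0.1}, "fast_progressive"),
    Case({"mean_identity": 0.2, "length_variation": 0.6}, "iterative_refinement"),
]
choice = solve(case_base, {"mean_identity": 0.25, "length_variation": 0.5})
```

A real system would also adapt the retrieved solution before reusing it; that step is left out here to keep the sketch short.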
-
Cont
Neural networks implement software simulations of massively parallel processes involving the processing of elements that are interconnected in a network architecture.
Fuzzy expert systems use the method of fuzzy logic, which deals with uncertainty and is used in areas where the results are not always binary.
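As a small illustration of non-binary results, the sketch below uses a triangular membership function and the common min/max/complement operators. The function names and the example thresholds are assumptions made for this illustration, not values from the paper.

```python
def triangular(x, left, peak, right):
    """Degree of membership in a fuzzy set: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_and(a, b):
    return min(a, b)      # a common choice for fuzzy conjunction

def fuzzy_or(a, b):
    return max(a, b)      # a common choice for fuzzy disjunction

def fuzzy_not(a):
    return 1.0 - a        # standard fuzzy complement

# Hypothetical example: how strongly is a fold change of 1.5 "moderately up"?
moderately_up = triangular(1.5, left=1.0, peak=2.0, right=3.0)
```

Instead of answering yes or no, the membership value expresses how well the observation fits the category, and rules can then combine such degrees with the operators above.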
-
The knowledge base contains domain expertise in the form of facts that the expert system will use to make determinations.
The working storage is a database containing data specific to a problem being solved.
The inference engine is the code at the core of the system, which derives recommendations from the knowledge base and the problem-specific data in the working storage.
-
Cont
The knowledge acquisition module is used to update or expand dynamic knowledge bases.
The user interface controls the dialogue between the user and the system.
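How these components interact can be sketched with a tiny forward-chaining engine. This is only an illustration under stated assumptions: the rules and fact names are hypothetical, and a real shell offers far richer pattern matching than the set inclusion used here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: frozenset   # facts that must all be present
    conclusion: str         # fact added when the rule fires

def infer(knowledge_base, working_storage):
    """Inference engine: fire rules until no new fact can be derived."""
    facts = set(working_storage)
    changed = True
    while changed:
        changed = False
        for rule in knowledge_base:
            if rule.conditions <= facts and rule.conclusion not in facts:
                facts.add(rule.conclusion)
                changed = True
    return facts

# Knowledge base: hypothetical domain rules, independent of any one problem.
knowledge_base = [
    Rule(frozenset({"sequences_divergent"}), "use_iterative_refinement"),
    Rule(frozenset({"use_iterative_refinement", "many_sequences"}),
         "expect_long_runtime"),
]

# Working storage: facts about the specific problem being solved.
working_storage = {"sequences_divergent", "many_sequences"}

derived = infer(knowledge_base, working_storage)
```

Expanding the system means appending rules to `knowledge_base`, which is the role the knowledge acquisition module plays; the `infer` function itself does not change.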
-
Expert systems and their architecture
The sequence of steps used to analyse a particular problem is not explicitly programmed, but is defined dynamically for each new case.
Expert systems allow more than one line of reasoning to be pursued and presented.
-
Cont
Problem solving is accomplished by applying specific knowledge rather than a specific technique.
When the expert system does not produce the desired results, the solution is to expand the knowledge base rather than to re-programme the procedures.
-
Tools for implementation of expert systems
An expert system shell is a piece of software which provides a development framework containing the user interface.
(CLIPS)
(JESS)
The use of a shell can reduce the amount of maintenance required and increase reusability and flexibility of the application.
An alternative is to build a customised expert system using conventional languages.
It has been recommended to use a
-
Expert systems in bioinformatics
In the biocomputing domain, various expert systems have been built for specific tasks.
Artificial neural networks have been used successfully for pattern discovery in many areas, including DNA and protein sequence analysis and microarray data analysis.
Fuzzy logic approaches have also been applied to the analysis of gene expression data.
-
Cont
An active area of research has been the reconstruction of functional networks from data such as expression data or interaction data, using intelligent systems such as neural networks, genetic algorithms, etc.
These approaches are also finding applications in, e.g., drug discovery and design or medical diagnosis.
An important task in bioinformatics is the extraction of knowledge from the biomedical literature.
-
Cont
FIGENIX addresses the problems of automatic structural and functional annotation under the supervision of a rule-based expert system.
Case-based approaches and rule-based systems have also been applied widely.
However, there is no standard architecture that would allow the exchange of information and code between the many different applications.
-
A PROPOSED SOLUTION FOR BIOINFORMATICS: UIMA
The most difficult aspect of building an expert system in this domain is information integration, because of the heterogeneity of traditional biological data resources.
The results of certain programmes, database annotations in natural language and the scientific literature are stored in different formats.
The principal challenge with unstructured information is that it needs to be analysed to identify, locate and relate the entities and relationships of interest.
-
Cont
Available frameworks include OpenNLP, GATE and UIMA.
OpenNLP is an umbrella structure for open source projects for the treatment of natural language.
OpenNLP and GATE contain many powerful and robust algorithms for the processing of natural language texts.
UIMA allows the analysis of structured and unstructured data.
It provides powerful capabilities for distributed computing through services and is a scalable and extensible software architecture.
-
Overview of UIMA
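A Collection Processing Engine (CPE) combines:
A Collection Reader (CR), which allows the CPE to treat all the data in a given collection.
A corpus of Analysis Engines (AEs) and Aggregate Analysis Engines (AAEs), possibly managed by Flow Controllers (FCs).
A CAS Consumer (CC), which takes the results from the Common Analysis Structure (CAS) and stores them in an exploitable format.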
The first layer of a UIMA application represents the Working Storage: it contains the system's memory and is used to store the metadata associated with each AE.
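The flow from Collection Reader through Analysis Engines to CAS Consumer can be mimicked in a few lines of plain Python. This is a conceptual sketch only: it does not use the real UIMA API (which is a Java and C++ framework driven by XML descriptors), and every class and function name below is a hypothetical stand-in.

```python
from dataclasses import dataclass, field

@dataclass
class CAS:
    """Hypothetical stand-in for the UIMA CAS: the data plus its metadata."""
    subject: str                                   # the item being analysed
    metadata: dict = field(default_factory=dict)   # filled in by the engines

def collection_reader(collection):
    """CR: wrap every item of the collection in its own CAS."""
    for item in collection:
        yield CAS(subject=item)

def length_engine(cas):
    """AE: record the sequence length."""
    cas.metadata["length"] = len(cas.subject)

def gc_engine(cas):
    """AE: record the fraction of G and C characters."""
    hits = sum(cas.subject.count(base) for base in "GC")
    cas.metadata["gc_fraction"] = hits / len(cas.subject) if cas.subject else 0.0

def aggregate(*engines):
    """AAE: run several engines in a fixed order on the same CAS."""
    def run(cas):
        for engine in engines:
            engine(cas)
    return run

def cas_consumer(cas, store):
    """CC: copy the results out of the CAS into an exploitable format."""
    store.append({"subject": cas.subject, **cas.metadata})

def run_cpe(collection, engine):
    """CPE: reader, then engine(s), then consumer, for every item."""
    store = []
    for cas in collection_reader(collection):
        engine(cas)
        cas_consumer(cas, store)
    return store

results = run_cpe(["GGCC", "ATGC"], aggregate(length_engine, gc_engine))
```

Each engine only reads and writes the shared CAS, so engines can be added, removed or reordered without touching the reader or the consumer.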
-
Cont
The second layer is the application layer, containing the AEs and AAEs. Each module in the application layer can then update the metadata in the CAS at each step of the data analysis.
In an expert system, the analysis pipeline is not pre-defined by the developer or user, but depends on the previous experience gained by the system.
The third layer corresponds to the User Interface between the final user and the system.
UIMA provides powerful capabilities for distributed computing through services and is a scalable and extensible software architecture.
-
Cont
UIMA is now exploited by numerous applications related to the biological domain.
-
A CASE STUDY: THE ALEXSYS MULTIPLE ALIGNMENT EXPERT SYSTEM
The first formal algorithm for multiple sequence alignment (MSA) was computationally expensive, so most programmes employ some kind of heuristic approach.
The progressive alignment procedure exploits the fact that homologous sequences are evolutionarily related.
-
Cont
This method involves three main steps:
1: Pairwise sequence alignment and distance matrix calculation
2: Guide tree construction
3: Multiple alignment
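The first two steps listed above can be sketched as follows. The choices made here are assumptions for illustration, not the method of any particular programme: a Needleman-Wunsch alignment with simple match, mismatch and gap scores, a distance defined as one minus the fraction of identical columns, and average-linkage clustering for the guide tree. Sequences are assumed to be non-empty.

```python
from itertools import combinations

def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment (Needleman-Wunsch); returns two gapped strings."""
    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def distance_matrix(seqs):
    """Step 1: distance = 1 - fraction of identical aligned columns."""
    dist = {}
    for i, j in combinations(range(len(seqs)), 2):
        x, y = nw_align(seqs[i], seqs[j])
        same = sum(1 for p, q in zip(x, y) if p == q)
        dist[(i, j)] = 1.0 - same / len(x)
    return dist

def guide_tree(n, dist):
    """Step 2: average-linkage clustering; returns nested tuples of indices."""
    clusters = [(i, [i]) for i in range(n)]      # (subtree, leaf indices)
    def average(c1, c2):
        pairs = [(min(p, q), max(p, q)) for p in c1[1] for q in c2[1]]
        return sum(dist[pair] for pair in pairs) / len(pairs)
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(((x[0], y[0]), x[1] + y[1]))
    return clusters[0][0]

seqs = ["ACGT", "ACGA", "TTTT"]
dist = distance_matrix(seqs)
tree = guide_tree(len(seqs), dist)
```

Step 3 would then walk the tree from the leaves upwards, aligning the closest sequences first and merging the resulting groups; that profile alignment step is omitted to keep the sketch short.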
-
Different MSA tools
DbClustal
Tcoffee
MAFFT
Muscle
3DCoffee
PRALINE
Refiner
PipeAlign
-
AlexSys prototype: objectives
Multiple sequence comparisons or alignments provide the basis for most of the computational methods used in genomic analyses or proteomics projects.
The MSA programmes are capable of aligning difficult, divergent sets of sequences with consistently high quality.
AlexSys (Alignments Expert SYStem) aims to evaluate and optimise all the steps involved in the construction and analysis of a multiple alignment.
-
Design and implementation
The Type Systems (TSs) represent the prototype's memory (the UIMA CAS).
The prototype currently uses five TSs: Sequence, Matrix, Tree, Parameter and Alignment.
With the TSs defined, we can build the second layer, the application layer, containing the AEs, grouped together into AAEs.
This layer is designed to perform three main tasks: input data handling, annotation and information extraction, and MSA construction.
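To make the five type names more concrete, the sketch below models them as plain Python dataclasses. Only the type names come from the slide; every field is a hypothetical guess added for illustration, and in UIMA itself type systems are declared in XML descriptors rather than written like this.

```python
from dataclasses import dataclass, field

@dataclass
class Sequence:
    identifier: str
    residues: str

@dataclass
class Matrix:
    # Hypothetical layout: pairwise distances keyed by (id_a, id_b).
    distances: dict = field(default_factory=dict)

@dataclass
class Tree:
    newick: str = ""          # guide tree, assumed stored as a Newick string

@dataclass
class Parameter:
    name: str
    value: float

@dataclass
class Alignment:
    # Hypothetical layout: sequence identifier -> gapped string.
    rows: dict = field(default_factory=dict)

@dataclass
class PrototypeMemory:
    """Stand-in for the CAS: one slot per type, updated step by step."""
    sequences: list = field(default_factory=list)
    matrix: Matrix = field(default_factory=Matrix)
    tree: Tree = field(default_factory=Tree)
    parameters: list = field(default_factory=list)
    alignment: Alignment = field(default_factory=Alignment)

memory = PrototypeMemory()
memory.sequences.append(Sequence("seq1", "MKVLA"))
memory.parameters.append(Parameter("gap_open", 10.0))
```

Each analysis step would read the slots it needs and fill in the next one, for example reading `sequences` and writing `matrix`.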
-
The modular structure of the AlexSys prototype.
-
ClustalW alignment optimisation using UIMA
The final enhancement to the ClustalW algorithm involved the addition of a post-processing refinement step.
Many approaches have been developed to improve MSA, based on an optimisation of the Sum of Pairs objective function.
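The Sum of Pairs objective adds up a score for every pair of characters in every column of the alignment. The sketch below assumes simple match, mismatch and gap scores and ignores gap-gap pairs; real programmes typically use a substitution matrix and more elaborate gap penalties, so the numbers here are illustrative only.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment given as equal-length gapped strings."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue              # gap-gap pairs are ignored here
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total

score = sum_of_pairs(["AC-T", "ACGT", "A-GT"])
```

A refinement step can use such a function as its target: propose a small change to the alignment, rescore it, and keep the change only if the score improves.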
-