sb mirza

Upload: s-b-mirza

Post on 05-Apr-2018


  • 7/31/2019 SB Mirza

    1/31

    Knowledge-based expert systems

    and a proof-of-concept case study

    for multiple sequence alignment

    construction and analysis

    Mohamed Radhouene Aniba, Sophie Siguenza, Anne Friedrich, Frederic Plewniak, Olivier Poch, Aron Marchler-Bauer and Julie Dawn Thompson

    SB MIRZA

    FA11-RBI-004

    MS 2nd semester Bioinformatics


    ABSTRACT

    The amount of bioinformatics information being produced means that handling the various applications used to study it presents a major data management and analysis challenge to researchers.

    It is impossible to analyse all of this information manually.

    New approaches are needed that are capable of processing the large-scale heterogeneous data to extract the pertinent information.


    Cont

    A general methodology for building knowledge-based expert systems is described, focusing on the Unstructured Information Management Architecture (UIMA), which provides facilities for both data and process management.


    New Challenges

    Such system-level studies necessitate a combination of experimental, theoretical and computational approaches.

    A major challenge for bioinformaticians in the post-genomic era is clearly the management, validation and analysis of this mass of experimental and predicted data.


    Approaches

    One approach has been data warehousing, where all the relevant databases are stored in a unified format and mined through a uniform interface.

    Distributed systems implement software to access heterogeneous databases that are dispersed over the internet and provide a query facility to access the data.


    SRS and Entrez are probably the most widely used database query and navigation systems for the life science community.

    Semantic web-based methods have been introduced, which add meaning to the raw data by using formal descriptions of the concepts.


    KNOWLEDGE-BASED EXPERT SYSTEMS

    Several forms of expert system have been classified according to the methodology they use:

    Rule-based systems use a set of rules to analyse information about a specific class of problems and recommend one or more possible solutions.

    Case-based reasoning systems adapt solutions that were used to solve previous problems and use them to solve new problems.
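
The case-based idea can be sketched in a few lines of Python. This is only an illustration: the case library, the feature names, the solution labels and the similarity measure are all assumptions made for the example, and the sketch reuses the retrieved solution directly rather than adapting it.

```python
from dataclasses import dataclass


@dataclass
class Case:
    features: dict   # hypothetical description of a previously solved problem
    solution: str    # what was done for that problem


def similarity(problem: dict, stored: dict) -> float:
    """Fraction of the problem's features that have the same value in the stored case."""
    if not problem:
        return 0.0
    shared = sum(1 for key, value in problem.items() if stored.get(key) == value)
    return shared / len(problem)


def retrieve(case_library, problem):
    """Return the stored case that is most similar to the new problem."""
    return max(case_library, key=lambda case: similarity(problem, case.features))


# Illustrative case library (assumed, not from the source).
CASE_LIBRARY = [
    Case({"n_sequences": "few", "divergence": "low"}, "fast progressive alignment"),
    Case({"n_sequences": "many", "divergence": "high"}, "iterative refinement"),
]


def solve(problem):
    """Reuse the solution of the closest previous case."""
    return retrieve(CASE_LIBRARY, problem).solution
```

A fuller system would also adapt the retrieved solution and store the new case once it has been solved.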


    Cont

    Neural networks implement software simulations of massively parallel processes involving processing elements that are interconnected in a network architecture.

    Fuzzy expert systems use the method of fuzzy logic, which deals with uncertainty and is used in areas where the results are not always binary.
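
To make the "not always binary" point concrete, here is a minimal sketch of fuzzy membership. The triangular membership function is a common textbook choice; the category names and the numeric boundaries on a log-ratio scale are assumptions made purely for illustration.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def expression_memberships(log_ratio):
    """Degree (0..1) to which a value belongs to each assumed category."""
    return {
        "low": triangular(log_ratio, -4.0, -2.0, 0.0),
        "medium": triangular(log_ratio, -2.0, 0.0, 2.0),
        "high": triangular(log_ratio, 0.0, 2.0, 4.0),
    }


def fuzzy_and(a, b):
    """Minimum is one standard choice for fuzzy conjunction."""
    return min(a, b)
```

A value can belong partly to two categories at once, for example half "medium" and half "high", which a binary rule could not express.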


    The knowledge base contains domain expertise in the form of facts that the expert system will use to make determinations.

    The working storage is a database containing data specific to a problem being solved.

    The inference engine is the code at the core of the system which derives recommendations from the knowledge base and problem-specific data in the working storage.
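
How these three components interact can be sketched with a tiny forward-chaining loop. This is an assumption-laden illustration rather than the design of any particular shell: the rule format (a set of premises and one conclusion) and the fact names are invented for the example.

```python
def forward_chain(knowledge_base, working_storage):
    """Inference engine: fire rules until no new fact can be derived.

    knowledge_base: list of (premises, conclusion), premises being a set of facts
    working_storage: iterable of facts known about the current problem
    """
    facts = set(working_storage)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in knowledge_base:
            # A rule fires when all of its premises are already known facts.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical domain rules (knowledge base).
KNOWLEDGE_BASE = [
    ({"sequences_are_protein", "low_identity"}, "use_profile_alignment"),
    ({"use_profile_alignment"}, "recommend_refinement_step"),
]

# Hypothetical problem-specific data (working storage).
WORKING_STORAGE = {"sequences_are_protein", "low_identity"}
```

Calling `forward_chain(KNOWLEDGE_BASE, WORKING_STORAGE)` derives both conclusions; changing the behaviour means editing the rule list, not the loop.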


    Cont

    The knowledge acquisition module is used to update or expand dynamic knowledge bases.

    The user interface controls the dialog between the user and the system.


    Expert systems and their architecture

    The sequence of steps used to analyse a particular problem is not explicitly programmed, but is defined dynamically for each new case.

    Expert systems allow more than one line of reasoning to be pursued and presented.


    Cont

    Problem solving is accomplished by applying specific knowledge rather than a specific technique.

    When the expert system does not produce the desired results, the solution is to expand the knowledge base rather than to re-programme the procedures.


    Tools for implementation of expert systems

    An expert system shell is a piece of software which provides a development framework containing the user interface.

    Examples: CLIPS, JESS

    The use of a shell can reduce the amount of maintenance required and increase reusability and flexibility of the application.

    An alternative is to build a customised expert system using conventional languages.

    It has been recommended to use a


    Expert systems in bioinformatics

    In the biocomputing domain, various expert systems have been built for specific tasks.

    Artificial neural networks have been used successfully for pattern discovery in many areas, including DNA and protein sequence analysis and microarray data analysis.

    Fuzzy logic approaches have also been applied to the analysis of gene expression data.


    Cont

    An active area of research has been the reconstruction of functional networks from data such as expression data or interaction data, using intelligent systems such as neural networks and genetic algorithms.

    These approaches are also finding applications in areas such as drug discovery and design or medical diagnosis.

    An important task in bioinformatics is the extraction of knowledge from the biomedical literature.


    Cont

    FIGENIX addresses the problems of automatic structural and functional annotation under the supervision of a rule-based expert system.

    Case-based approaches and rule-based systems have also been applied widely.

    However, there is no standard architecture that would allow exchange of information and code between the many different applications.


    A PROPOSED SOLUTION FOR BIOINFORMATICS: UIMA

    The most difficult aspect of building an expert system in this domain is information integration, because of the heterogeneity of traditional biological data resources.

    The results of certain programmes, database annotations in natural language and the scientific literature are stored in different formats.

    The principal challenge with unstructured information is that it needs to be analysed to identify, locate and relate the entities and relationships of interest.


    Cont

    Tools for analysing unstructured information include OpenNLP, GATE and UIMA.

    OpenNLP is an umbrella structure for open source projects for the treatment of natural language.

    OpenNLP and GATE contain many powerful and robust algorithms for the processing of natural language texts.

    UIMA allows the analysis of structured and unstructured data.

    It provides powerful capabilities for distributed computing through services and is a scalable and extensible software architecture.


    Overview of UIMA

    A Collection Processing Engine (CPE) is made up of:

    A Collection Reader (CR), which allows the CPE to treat all the data in a given collection.

    A corpus: Analysis Engines (AEs) and Aggregate Analysis Engines (AAEs), possibly managed by Flow Controllers (FCs).

    A CAS Consumer (CC), which takes the results from the Common Analysis Structure (CAS) and stores them in an exploitable format.

    The first layer of a UIMA application represents the Working Storage. It contains the system's memory and is used to store the metadata associated with each AE.
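
The flow from Collection Reader through the engines to the CAS Consumer can be mimicked in plain Python. This sketch does not use the UIMA API: every class and function name below is a stand-in invented for illustration, and the two toy engines (sequence length and GC fraction) are assumptions.

```python
class CAS(dict):
    """Stand-in for the shared analysis structure: raw data plus metadata."""


def collection_reader(collection):
    """Yield one CAS per item so that every item in the collection is treated."""
    for item in collection:
        yield CAS(data=item, metadata={})


def length_engine(cas):
    """Toy analysis engine: annotate the sequence length."""
    cas["metadata"]["length"] = len(cas["data"])


def gc_engine(cas):
    """Toy analysis engine: annotate the fraction of G and C characters."""
    seq = cas["data"].upper()
    gc = seq.count("G") + seq.count("C")
    cas["metadata"]["gc_fraction"] = gc / len(seq) if seq else 0.0


def aggregate_engine(engines):
    """Group several engines so that they run as one unit on the same CAS."""
    def run(cas):
        for engine in engines:
            engine(cas)
    return run


def cas_consumer(cas, store):
    """Copy the results out of the CAS into a plain, reusable record."""
    store.append({"sequence": cas["data"], **cas["metadata"]})


def run_pipeline(collection, engine):
    """Reader -> engine(s) -> consumer, one CAS at a time."""
    store = []
    for cas in collection_reader(collection):
        engine(cas)
        cas_consumer(cas, store)
    return store
```

For example, `run_pipeline(["GGCC", "ATAT"], aggregate_engine([length_engine, gc_engine]))` returns one record per sequence, each carrying the metadata that the engines added.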


    Cont

    AAEs and each module in the application layer can then update the metadata in the CAS at each step of the data analysis.

    In an expert system, the analysis pipeline is not pre-defined by the developer or user, but depends on the previous experience gained by the system.

    The third layer corresponds to the User Interface between the final user and the system.

    UIMA provides powerful capabilities for distributed computing through services and is a scalable and extensible software architecture.


    Cont

    UIMA is now exploited by numerous applications related to the biological domain.


    A CASE STUDY: THE ALEXSYS MULTIPLE ALIGNMENT EXPERT SYSTEM

    The first formal algorithm for multiple sequence alignment (MSA) was computationally expensive, and most programmes employ some kind of heuristic approach.

    The progressive alignment procedure exploits the fact that homologous sequences are evolutionarily related.


    Cont

    This method involves three main steps:

    1: pairwise sequence alignment and distance matrix calculation

    2: guide tree construction

    3: multiple alignment
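
The first two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any specific programme: the pairwise distance is a unit-cost edit distance normalised by the longer sequence, the guide tree is built by average-linkage clustering, sequences are assumed to be non-empty, and step 3 (aligning along the tree) is left out.

```python
from itertools import combinations


def edit_distance(a, b):
    """Cost of an optimal global alignment with unit mismatch and gap costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # gap in b
                            curr[j - 1] + 1,            # gap in a
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Step 1: pairwise distances for a dict of name -> sequence."""
    dist = {}
    for x, y in combinations(list(seqs), 2):
        longest = max(len(seqs[x]), len(seqs[y]))
        dist[frozenset((x, y))] = edit_distance(seqs[x], seqs[y]) / longest
    return dist


def leaves(tree):
    """Names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def guide_tree(seqs):
    """Step 2: repeatedly join the two closest clusters (average linkage)."""
    dist = distance_matrix(seqs)

    def average(c1, c2):
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    clusters = list(seqs)
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append((c1, c2))
    return clusters[0]
```

The resulting nested tuples give the order in which a progressive method would merge sequences, closest pair first.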


    Different MSA tools

    DbClustal

    Tcoffee

    MAFFT

    Muscle

    3DCoffee

    PRALINE

    Refiner

    PipeAlign


    AlexSys prototype: objectives

    Multiple sequence comparisons or alignments provide the basis for most of the computational methods used in genomic analyses or proteomics projects.

    The MSA programmes are capable of aligning difficult, divergent sets of sequences with consistently high quality.

    The Alignments Expert SYStem (AlexSys) aims to evaluate and optimise all the steps involved in the construction and analysis of a multiple alignment.


    Design and implementation

    The Type Systems (TSs) represent the prototype's memory (the UIMA CAS).

    The prototype currently uses five TSs: Sequence, Matrix, Tree, Parameter and Alignment.

    With the TSs in place, we can build the second application layer containing the AEs, grouped together into AAEs.

    This layer is designed to perform three main tasks: input data handling, annotation and information extraction, and MSA construction.
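
One way to picture the five types is as plain data classes that the analysis engines read and update. Only the five type names come from the slide; every field, the `PrototypeMemory` container and the `width` helper are assumptions made for this sketch, and none of it reflects the actual AlexSys type definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sequence:
    name: str
    residues: str


@dataclass
class Matrix:
    labels: list   # sequence names, in row and column order
    values: list   # square list of lists of pairwise distances


@dataclass
class Tree:
    newick: str    # guide tree stored as a Newick string


@dataclass
class Parameter:
    name: str
    value: float


@dataclass
class Alignment:
    rows: dict = field(default_factory=dict)  # name -> gapped sequence

    def width(self):
        """Number of alignment columns (0 if the alignment is empty)."""
        return max((len(r) for r in self.rows.values()), default=0)


@dataclass
class PrototypeMemory:
    """Hypothetical container playing the role of the shared memory."""
    sequences: list = field(default_factory=list)
    parameters: list = field(default_factory=list)
    matrix: Optional[Matrix] = None
    tree: Optional[Tree] = None
    alignment: Optional[Alignment] = None
```

Each processing step would fill in one more slot: sequences first, then the matrix, the tree and finally the alignment.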


    The modular structure of the AlexSys prototype.


    ClustalW alignment optimisation using UIMA

    The final enhancement to the ClustalW algorithm involved the addition of a post-processing refinement step.

    Many approaches have been developed to improve MSA, based on an optimisation of the Sum of Pairs objective function.
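
The Sum of Pairs score adds up, over every alignment column, the score of each pair of characters in that column. A minimal sketch follows; the match, mismatch and gap values are arbitrary assumptions chosen for readability, whereas real programmes typically use substitution matrices and more elaborate gap penalties.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0            # two gaps facing each other are not penalised here
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum pair scores over all columns and all pairs of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total
```

A refinement step can then be judged by whether it raises this score, for example by comparing `sum_of_pairs` before and after a candidate change to the alignment.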
