Scientific Data Mining: Emerging Developments and Challenges
F. Seillier-Moiseiwitsch, Bioinformatics Research Center, Department of Mathematics and Statistics, University of Maryland - Baltimore County

Post on 21-Dec-2015



Scientific Data Mining: Emerging Developments and Challenges

F. Seillier-Moiseiwitsch

Bioinformatics Research Center

Department of Mathematics and Statistics

University of Maryland - Baltimore County


Bioinformatics:

A View from the Trenches


Some Needed Developments: Simultaneous data mining of databases

• Different types of information in separate databases

GenBank, PDB, HIV-Web, PubMed, …

Data selection

Generic solution


Some Needed Developments: Simultaneous data mining of databases

• Same information in different databases

Meta-analysis

e.g. Gene expression data

Pre-processing

different technologies

sources of variability
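One common way to combine the same measurement across studies is fixed-effect, inverse-variance weighting. The sketch below is an illustration only, not a method named on the slide: it assumes that each database yields, for a given gene, an effect estimate (e.g. a log-ratio) and its standard error after pre-processing. The function name and inputs are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates by inverse-variance weighting.

    effects    -- one effect estimate per study (assumed comparable
                  after pre-processing, e.g. log-ratios for one gene)
    std_errors -- the matching standard errors, all strictly positive
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical estimates for one gene from three platforms.
pooled, pooled_se = fixed_effect_meta([0.8, 1.1, 0.9], [0.2, 0.4, 0.3])
```

A fixed-effect model ignores between-study heterogeneity; when technologies differ, as the slide stresses, a random-effects model may be more appropriate.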


Some Needed Developments: Data mining of heterogeneous databases

• Many different types of information in the same database

e.g. Patient records - diagnostics

lab results, DNA, microarray

2D gel images

data compression

features


Some Needed Developments: New Algorithms

• Molecular evolution

Phylogenetic reconstruction

Large number of sequences

Statistical evolutionary models

MCMC, E-M algorithm

Parallel processors

Emerging models
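As a concrete instance of a statistical evolutionary model, the sketch below computes the Jukes-Cantor (JC69) distance between two aligned DNA sequences, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. JC69 is chosen here only as the simplest example; the slide does not name a specific model. Pairwise distances of this kind are one possible input to distance-based phylogenetic reconstruction.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences.

    Positions where either sequence has a gap ('-') are skipped.
    Returns math.inf when the observed difference is at or beyond
    the model's saturation point (p >= 3/4).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")

    p = differing / compared  # observed proportion of differing sites
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    # Correct p for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


d = jukes_cantor_distance("ACGTACGT", "ACGTACGA")
```

The cost of all pairwise distances grows quadratically with the number of sequences, which is one reason the slide points to parallel processors.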


Some Needed Developments: New Algorithms

• Proteomics

images of 2D gels

clean up, alignment

group composite image

biological vs. experimental variability

easily updated
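One way to keep a group composite image "easily updated" is to store a running pixelwise mean, so that a new gel can be folded in without revisiting earlier ones. The sketch below is a minimal illustration under strong assumptions: the gels are already cleaned and aligned, and each is a rectangular grid of intensities. The class name `CompositeGel` is hypothetical.

```python
class CompositeGel:
    """Running pixelwise mean over cleaned, aligned 2D gel images."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.count = 0  # number of gels folded in so far
        self.mean = [[0.0] * cols for _ in range(rows)]

    def add(self, gel):
        """Fold one aligned gel (a list of rows of intensities) into the mean."""
        if len(gel) != self.rows or any(len(row) != self.cols for row in gel):
            raise ValueError("gel does not match the composite's shape")
        self.count += 1
        for r in range(self.rows):
            for c in range(self.cols):
                # Incremental mean update: no need to keep earlier gels.
                self.mean[r][c] += (gel[r][c] - self.mean[r][c]) / self.count


composite = CompositeGel(2, 2)
composite.add([[0, 2], [4, 6]])
composite.add([[2, 4], [6, 8]])
```

A mean is sensitive to outlying spots; a robust summary such as a pixelwise median is an alternative, at the cost of storing every gel.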


Some Needed Developments: New Algorithms

• Functional genomics

microarray data

background estimation (subjectivity)

automation of analytical protocols
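To make the background-estimation step concrete, the sketch below applies a simple local background correction to one microarray spot: mean foreground intensity minus the median of the surrounding background pixels. This is only one of several possible rules, and the choice among them is part of the subjectivity the slide refers to. The function name, the inputs, and the `floor` parameter are assumptions for illustration.

```python
from statistics import mean, median


def background_corrected_intensity(foreground, background, floor=1.0):
    """Local background correction for a single microarray spot.

    foreground -- pixel intensities inside the spot
    background -- pixel intensities in the region around the spot
    floor      -- smallest value returned (hypothetical choice), so
                  that a later log transform stays defined
    """
    if not foreground or not background:
        raise ValueError("need at least one foreground and one background pixel")
    # The median resists bright artifacts (dust, scratches) in the background.
    corrected = mean(foreground) - median(background)
    return max(corrected, floor)


# One bright artifact pixel (500) barely moves the median background.
signal = background_corrected_intensity([100, 110, 90], [20, 22, 18, 500])
```

Fixing a rule like this in code, rather than adjusting it by eye per array, is one small step toward the automated protocols the slide calls for.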


Some Challenges

• Public domain software

• Easy implementation on any computing platform

• Incorporation of state-of-the-art statistical techniques

clustering, classification

longitudinal models

spatio-temporal models