
Upload: gyles-nichols

Post on 11-Jan-2016


1

Advanced Bioinformatics Core (ABC) 進階生物資訊核心設施

Chang, Chuan-Hsiung (張傳雄)
Chen, Chen-Hsin (陳珍信)
Hsu, Chun-nan (許鈞南)
Li, Kuo-Bin (李國彬)
Yang, Ueng-Cheng (楊永正)

2

Our niches

Genome-related information

Preventive medicine, personalized medicine

• Information Technology (IT)
• Comparative Bioinformatics (CB)
• Functional Bioinformatics (FB)
• Genomic Statistics (GS)

R&D is the basis for service and collaboration

• Data collection
• Analysis tools
• Databases
• Workflow
• Interpretation

Interdisciplinary collaboration

Vision-based R&D

3

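Service requests are handled through a single contact window

Flowchart steps (labels in the order they appear on the slide):

• Single contact window
• Registration
• Understand the problem
• Preliminary planning
• Explain the analysis method to the user
• User
• Integrate the analysis results
• Progress report
• Task-force team meeting (FB, GS, CB, IT)
• Quality control (pass / fail)
• Completed
• User
• Analysis results go online
• Registration
• Follow-up consultation (yes / no)

Service types: platform analysis, customized service, collaborative service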

4

Online service: http://abc.binfo.org.tw/

• Comparative bioinformatics
– Bacterial Genome Annotation System (bGAS)
– Genome Comparison Tools (including CAGO, CAMP, CICP)
• Gene variation related
– A functional analysis and selection tool for SNP in large scale association study (FastSNP)
• Alternative splicing related
– Putative Alternative Splicing database (PALSdb)
– Integrated splicing variants database (ISVdb)
• Gene expression related
– Bacterial gene expression database (BGEdb)
– Microarray Annotation and Profile (MAP)
– Cross-Hybridization Analysis Network of Gene Expression (CHANGE)
• Pathway related
– Pathway Knowledge Management System (PKMS)
• Phenotype related
– Bacteria: Bacterial phenotype database (BPdb)
– Cellular level: Integrated RNAi database
– Organismal level: Genotype to Phenotype (G2P)
• Disease candidate gene databases
– Spinocerebellar ataxia candidate gene database (SCAdb)
– STR-related disease database (STRRDdb)
– Disease associated gene database (DAGdb)
– Encyclopedia of Hepatocellular Carcinoma genes Online (EHCO)
• Utilities
– Gene Name Service (GNS)
• Consultation service
– http://consult.binfo.org.tw/

5

Cancers

Infectious disease

Highly heritable disease

The same strategy may be applied to all types of cancers

Breast cancer Liver cancer Lung cancers

6

Value-added information and tools

New method: top-down

Genome

Gene variation

Risk factor

Disease

Pathway analysis

Literature mining

Genotype

Gene variation
• Functional Analysis and Selection Tool for SNP (FastSNP) in large scale association study

Alternative splicing
• Putative Alternative Splicing (PALS) db
• Integrated splicing variant (ISV) db

Pathway analysis
• Pathway Knowledge Management System (PKMS)

Phenotypes
• Disease Associated Gene (DAG) db
• Genotype to Phenotype (G2P) db
• Integrated RNAi db

7

Two ways to collect information: web wrapper agent and text mining

Chromosome

Function Report

Ensembl

dbSNP

ESEfinder

RESCUE-ESE

TFSEARCH

SNP Search

Candidate Gene Approach

Single SNP (batch)

Novel SNP

Gene Symbol

SNP rsID

Agent Starter

NCBI GenBank

PolyPhen

Swiss-Prot

Prioritization

FastSNP

Gene name service

Text mining
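
The prioritization step in this diagram can be pictured as sorting candidate SNPs by a risk score derived from their predicted functional effect. The following is only a minimal sketch under stated assumptions: the effect categories, the score values, the placeholder rsIDs, and the names `RISK_BY_EFFECT`, `Snp`, and `prioritize` are all hypothetical and do not reproduce FastSNP's actual ranking scheme.

```python
from typing import NamedTuple

# Hypothetical effect categories and risk scores, for illustration only.
# These are NOT the categories or weights used by FastSNP.
RISK_BY_EFFECT = {
    "nonsense": 5,
    "missense": 4,
    "splicing_regulation": 3,
    "promoter": 2,
    "intronic": 1,
}


class Snp(NamedTuple):
    rs_id: str   # placeholder identifier, not a real dbSNP record
    effect: str  # predicted functional effect category


def prioritize(snps):
    """Return SNPs sorted from highest to lowest assumed risk score."""
    return sorted(
        snps,
        key=lambda s: RISK_BY_EFFECT.get(s.effect, 0),  # unknown effects rank last
        reverse=True,
    )


candidates = [
    Snp("rs0000001", "intronic"),
    Snp("rs0000002", "missense"),
    Snp("rs0000003", "promoter"),
]
ranked = prioritize(candidates)
print([s.rs_id for s in ranked])
```

In a real pipeline the effect category for each SNP would come from the external resources shown on the slide rather than from a hand-written list.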

8

World’s most accurate automatic gene name identification from biomedical literature

BioCreAtIvE - Critical Assessment for Information Extraction in Biology http://biocreative.sourceforge.net/

9

Common strategy to discover the disease mechanism

Control Experiment differences

Distinguish cause & effect

Look for major factor

Form hypothesis

Design therapeutic intervention

Raw data

Patterns

Mechanisms

Genotyping or gene expression

cancer

10

More than 400 gene expression microarrays for cervical, lung, breast, and other cancers were analyzed with ABC’s tools

• Design
– MIAME checklist
– GESDAS (Gene Expression Study Design and Analysis Suite)
• Analysis
– SMD (Stanford Microarray Database)
– GESDAS
– MAP (Microarray Annotation and Profile)
– IPIR (Integrated Protein Interaction Resource)
– CHANGE (Cross Hybridization Analysis Network of Gene Expression)
– SpliceGear and ChangeGear
• Interpretation
– PKMS (Pathway Knowledge Management System)
– Integrated RNAi database
– DAG db (Disease Associated Gene database)
– G2P (Genotype to Phenotype)
• Six cancer-related publications in 2006

11

Microarray study design

http://gears.stat.sinica.edu.tw/MIAME/MIAME.php

12

Genomic Statistics Unit for Complex Diseases in the NRPGM Advanced Bioinformatics Core

Enhancing the web platform:

New

New

cDNA

Image plots

Affymetrix

MM larger than PM
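
On Affymetrix arrays each probe pair consists of a perfect-match (PM) probe and a mismatch (MM) probe, and the "MM larger than PM" label refers to pairs where the mismatch signal exceeds the perfect-match signal. A minimal sketch of how such pairs could be counted, assuming the intensities are already available as two equal-length lists; the function name `fraction_mm_exceeds_pm` and the intensity values are made up for illustration and are not taken from GESDAS.

```python
def fraction_mm_exceeds_pm(pm, mm):
    """Return the fraction of probe pairs whose MM intensity exceeds the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    if not pm:
        return 0.0
    flagged = sum(1 for p, m in zip(pm, mm) if m > p)
    return flagged / len(pm)


# Made-up intensities for four probe pairs.
pm_intensities = [1200.0, 850.0, 300.0, 95.0]
mm_intensities = [400.0, 900.0, 120.0, 110.0]

print(fraction_mm_exceeds_pm(pm_intensities, mm_intensities))  # 0.5
```

A high fraction for a given array is one possible signal that the chip deserves a closer look before analysis.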

13

Expanding GESDAS to a more comprehensive platform, “Gene-Environment Analysis Refining System” (GEARS), for general biomarkers (not open yet)

14

Integrated Protein Interaction Resource (IPIR) => Microarray Annotation and Profile (MAP) => Pathway Knowledge Management System (PKMS)

No PPI expansion

With PPI expansion

Red: ER+

Green: ER-

Yellow: ER+ and ER-
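
The "with PPI expansion" view can be read as adding the direct protein-protein interaction partners of the genes of interest before mapping them onto pathways. The sketch below shows one-step neighbor expansion on a toy interaction list; the function name `expand_with_ppi`, the gene labels, and the interaction pairs are hypothetical, and this is not the actual IPIR/MAP/PKMS implementation.

```python
def expand_with_ppi(seed_genes, interactions):
    """Return the seed genes plus every direct interaction partner of a seed gene."""
    seeds = set(seed_genes)
    expanded = set(seeds)
    for a, b in interactions:
        # Interactions are treated as undirected pairs.
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


# Toy data: labels are placeholders, not real genes or interactions.
seed = ["geneA", "geneB"]
ppi = [("geneA", "geneC"), ("geneD", "geneB"), ("geneE", "geneF")]

print(sorted(expand_with_ppi(seed, ppi)))  # ['geneA', 'geneB', 'geneC', 'geneD']
```

Only first neighbors are added here; whether deeper expansion is useful depends on how noisy the interaction data are.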

15

World’s Most Accurate Protein Subcellular Localization Image Classifier (July 2006 – Present)

Publications

• Y.-S. Lin et al. Boosting Multi-Class Learning with Repeating Codes. In TAAI 2006 Conference on Artificial Intelligence and Applications. December, 2006.

• C.-C. Lin et al. Boosting Multiclass Learning with Repeating Codes for Protein Subcellular Localization. Submitted, 2007

Previous best result: 83%

Our preliminary result: 93%

16

Cancers

Infectious diseases

Highly heritable diseases

Taiwan Pathogenic Microorganism Gene Database (TPMGD) for CDC, Taiwan

Infectious diseases

17

Integrate sequence with epidemiology information

Dec. 2005 – Dec. 2006

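18

User identity switched successfully

The same system is open to the academic community under the name EpiNet

• Publishing announcements and managing news
• Managing search and display fields
• Adding and managing database content
• Data administrator privileges
• Data query and browsing
• Personal workspace operations and sequence analysis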

Advanced Bioinformatics Core

19

The next generation bioinformatics tool for biomedical scientists: Web service & workflow tool

20

Vibrio vulnificus strain-specific plasmid genomes.

Comparative bioinformatics tools

Integrated Comparative Analysis Platform (iCAP) for Genomic Data

bGAS (bacterial Genome Annotation System)

21

bGAS (bacterial Genome Annotation System)

22

23

24

Cancers

Infectious disease

Highly heritable diseases

Schizophrenia

Disease

Linkage analysis

Genotyping

Disease gene

Genes

Candidate genes

Candidate region

Chromosome

Genome

Research method

25

Example of providing integrated service: Searching for Disease-Associated Gene Variations

Collect information (IT, FB)

Integrate information and design primers (FB)

Integrate information and perform quality control (FB)

Look for gene variation (CB)

Priority setting (FB)

Sequencing Core

Candidate gene variation and disease phenotype (GS)

26

Gene variation detection and gene-gene interaction

• Design
– FastSNP
– ISV db
– PALS db
– PipMaker pipeline
– Primer3
• Analysis
– PolyPhred pipeline
– GAP (Generalized Association Plots) analysis

Results:
• 60 primer pairs were designed
• 18,000 sequences were compared
• 103 variation sites were found
– 68 were not reported before
– 20 variation sites may be related to phenotype (more samples are needed)
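
The idea behind comparing sequences to find variation sites can be illustrated with a deliberately simplified sketch: given a reference and a sample sequence that are already aligned and of equal length, report the positions where they differ. This is an assumption-laden toy; the function name `variation_sites` and the sequences are made up, and the PolyPhred pipeline listed above works from sequencing trace data rather than plain strings.

```python
def variation_sites(reference, sample):
    """Return 1-based positions where two pre-aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        position
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Made-up sequences for illustration.
reference_seq = "ACGTACGT"
sample_seq = "ACGAACGC"

print(variation_sites(reference_seq, sample_seq))  # [4, 8]
```

Sites found this way would still need to be checked against known variants to separate previously reported variations from new ones.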

27

Synergy is emerging from collaboration

• Help a single project to integrate different types of information

• Make new observations by integrating data from different users

Sequencing

Genotyping

Proteomics

Gene expression

ABC

RNAi

Mouse mutagenesis

Gene-related information

Phenotype-related information

PET gene probe