1
Advanced Bioinformatics Core (ABC) (進階生物資訊核心設施)
Chang, Chuan-Hsiung (張傳雄); Chen, Chen-Hsin (陳珍信); Hsu, Chun-nan (許鈞南);
Li, Kuo-Bin (李國彬); Yang, Ueng-Cheng (楊永正)
2
Our niches
Genome-related information
Preventive medicine; personalized medicine
IT: Information Technology
CB: Comparative Bioinformatics
FB: Functional Bioinformatics
GS: Genomic Statistics
R&D is the basis for service and collaboration
Data collection; analysis tools; databases; workflow; interpretation
Interdisciplinary collaboration; vision-based R&D
3
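Single service window for handling service requests
Flowchart labels:
Single service window
Register
Understand the problem
Preliminary planning
Explain the analysis method to the user
User
Integrate analysis results
Progress report
Task-force team meeting (FB, GS, CB, IT)
Quality control (pass / fail)
Completed
User
Analysis results go online
Register
Follow-up consultation (yes / no)
Service types: platform analysis, customized service, collaborative service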
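The service-request flow on this slide can be read as a small state machine with a quality-control loop. The sketch below is only an illustration: the stage names paraphrase the flowchart labels, and the exact ordering, the `ServiceRequest` class, the `process` function, and the retry limit are all assumptions rather than a description of ABC's actual system.

```python
from dataclasses import dataclass, field

# Stage names paraphrase the flowchart labels; their order and the
# repeat-on-QC-failure loop are assumptions made for illustration.
INTAKE_STAGES = [
    "register request",
    "understand the problem",
    "preliminary planning",
    "explain analysis method to user",
]
ANALYSIS_STAGES = [
    "task-force analysis (FB, GS, CB, IT)",
    "integrate analysis results",
]


@dataclass
class ServiceRequest:
    user: str
    log: list = field(default_factory=list)


def process(request, qc_check, max_rounds=3):
    """Run a request through intake, then repeat analysis until QC passes.

    qc_check is a callable taking the round number and returning True on pass.
    Returns True if the results went online, False if every round failed QC.
    """
    request.log.extend(INTAKE_STAGES)
    for round_no in range(1, max_rounds + 1):
        request.log.extend(ANALYSIS_STAGES)
        if qc_check(round_no):
            request.log.append("quality control: pass")
            request.log.append("analysis results online")
            return True
        request.log.append("quality control: fail")
    return False


# Hypothetical request whose analysis passes QC on the second round.
req = ServiceRequest(user="example-lab")
finished = process(req, qc_check=lambda round_no: round_no >= 2)
print(finished, len(req.log))
```

Modelling quality control as a callable keeps the loop independent of how a pass or fail is actually decided.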
4
Online service: http://abc.binfo.org.tw/
• Comparative bioinformatics
– Bacterial Genome Annotation System (bGAS)
– Genome Comparison Tools (including CAGO, CAMP, CICP)
• Gene variation related
– A functional analysis and selection tool for SNP in large scale association study (FastSNP)
• Alternative splicing related
– Putative Alternative Splicing database (PALSdb)
– Integrated splicing variants database (ISVdb)
• Gene expression related
– Bacterial gene expression database (BGEdb)
– Microarray Annotation and Profile (MAP)
– Cross-Hybridization Analysis Network of Gene Expression (CHANGE)
• Pathway related
– Pathway Knowledge Management System (PKMS)
• Phenotype related
– Bacteria: Bacterial phenotype database (BPdb)
– Cellular level: Integrated RNAi database
– Organismal level: Genotype to Phenotype (G2P)
• Disease candidate gene databases
– Spinocerebellar ataxia candidate gene database (SCAdb)
– STR-related disease database (STRRDdb)
– Disease associated gene database (DAGdb)
– Encyclopedia of Hepatocellular Carcinoma genes Online (EHCO)
• Utilities
– Gene Name Service (GNS)
• Consultation service
– http://consult.binfo.org.tw/
5
Cancers
Infectious diseases
Highly heritable diseases
The same strategy may be applied to all types of cancers
Breast cancer, liver cancer, lung cancer
6
Value-added information and tools
New method: top-down
Genome
Gene variation
Risk factor
Disease
Pathway analysis
Literature mining
Genotype
Gene variation
• Functional Analysis and Selection Tool for SNP (FastSNP) in large scale association study
Alternative splicing
• Putative Alternative Splicing (PALS) db
• Integrated splicing variant (ISV) db
Pathway analysis
• Pathway Knowledge Management System (PKMS)
Phenotypes
• Disease Associated Gene (DAG) db
• Gene to Phenotype (G2P) db
• Integrated RNAi db
7
Two ways to collect information: web wrapper agent and text mining
Chromosome
Function Report
Ensembl
dbSNP
ESEfinder
RESCUE-ESE
TFSEARCH
SNP Search
Candidate Gene Approach
Single SNP (batch)
Novel SNP
Gene Symbol
SNP rsID
Agent Starter
NCBI GenBank
PolyPhen
Swiss-Prot
Prioritization
FastSNP
Gene name service
Text mining
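The "Prioritization" step in this diagram can be pictured as ranking SNPs by the functional category gathered from the external sources. The sketch below is a minimal illustration only: the category names, the numeric scores in `RISK_SCORE`, the `prioritize` function, and the rsIDs are all made up for this example and are not FastSNP's actual ranking scheme or output.

```python
# Hypothetical functional categories and scores, chosen only for illustration;
# they do not reproduce FastSNP's real ranking rules.
RISK_SCORE = {
    "nonsense": 5,
    "missense": 4,
    "splice_site": 3,
    "promoter": 2,
    "intronic": 1,
    "unknown": 0,
}


def prioritize(snps):
    """Sort (rs_id, category) pairs from highest to lowest assumed risk.

    Unrecognized categories are treated as score 0. sorted() is stable,
    so SNPs with equal scores keep their input order.
    """
    return sorted(snps, key=lambda snp: RISK_SCORE.get(snp[1], 0), reverse=True)


# Placeholder rsIDs, not real dbSNP records.
candidates = [
    ("rs0000001", "intronic"),
    ("rs0000002", "missense"),
    ("rs0000003", "promoter"),
]

for rs_id, category in prioritize(candidates):
    print(rs_id, category, RISK_SCORE.get(category, 0))
```

A lookup table keeps the ranking rule separate from the sorting logic, so a different scoring scheme could be swapped in without touching `prioritize`.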
8
World’s most accurate automatic gene name identification from biomedical literature
BioCreAtIvE - Critical Assessment for Information Extraction in Biology http://biocreative.sourceforge.net/
9
Common strategy to discover the disease mechanism
Control Experiment differences
Distinguish cause & effect
Look for major factor
Form hypothesis
Design therapeutic intervention
Raw data
Patterns
Mechanisms
Genotyping or gene expression
cancer
10
More than 400 gene expression microarrays for cervical, lung, breast, and other cancers were analyzed with ABC’s tools
• Design
– MIAME checklist
– GESDAS (Gene Expression Study Design and Analysis Suite)
• Analysis
– SMD (Stanford Microarray Database)
– GESDAS
– MAP (Microarray Annotation and Profile)
– IPIR (Integrated Protein Interaction Resource)
– CHANGE (Cross Hybridization Analysis Network of Gene Expression)
– SpliceGear and ChangeGear
• Interpretation
– PKMS (Pathway Knowledge Management System)
– Integrated RNAi database
– DAG db (Disease Associated Gene database)
– G2P (Genotype to Phenotype)
• Six cancer-related publications in 2006
11
Microarray study design
http://gears.stat.sinica.edu.tw/MIAME/MIAME.php
12
Genomic Statistics Unit for Complex Diseases in the NRPGM Advanced Bioinformatics Core
Enhancing the web platform:
New: cDNA image plots
New: Affymetrix, MM larger than PM
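Affymetrix arrays pair each perfect-match (PM) probe with a mismatch (MM) probe, so the "MM larger than PM" item presumably flags probe pairs whose mismatch signal exceeds the perfect-match signal. The sketch below shows one simple way to compute such a summary. The function name `fraction_mm_exceeds_pm` and the intensity values are assumptions for illustration, and the code does not describe how GESDAS implements the feature.

```python
def fraction_mm_exceeds_pm(probe_pairs):
    """Return the share of (PM, MM) intensity pairs in which MM > PM."""
    if not probe_pairs:
        raise ValueError("no probe pairs given")
    flagged = sum(1 for pm, mm in probe_pairs if mm > pm)
    return flagged / len(probe_pairs)


# Made-up (PM, MM) intensities for one hypothetical probe set.
pairs = [(1200.0, 300.0), (450.0, 600.0), (980.0, 150.0), (200.0, 260.0)]
print(fraction_mm_exceeds_pm(pairs))  # 2 of 4 pairs have MM > PM, so 0.5
```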
13
Expanding GESDAS into a more comprehensive platform, the “Gene-Environment Analysis Refining System” (GEARS), for general biomarkers (not yet open)
14
Integrated Protein Interaction Resource (IPIR) => Microarray Annotation and Profile (MAP) => Pathway Knowledge Management System (PKMS)
No PPI expansion
With PPI expansion
Red: ER+
Green: ER-
Yellow: ER+ and ER-
15
World’s Most Accurate Protein Subcellular Localization Image Classifier (July 2006 – Present)
Publications
• Y.-S. Lin et al. Boosting Multi-Class Learning with Repeating Codes. In TAAI 2006 Conference on Artificial Intelligence and Applications. December, 2006.
• C.-C. Lin et al. Boosting Multiclass Learning with Repeating Codes for Protein Subcellular Localization. Submitted, 2007
Previous best result: 83%
Our preliminary result: 93%
16
Cancers
Infectious diseases
Highly heritable diseases
Taiwan Pathogenic Microorganism Gene Database (TPMGD) for CDC, Taiwan
Infectious diseases
18
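User identity switched successfully
The same system is open to the academic community under the name EpiNet
Publishing announcements and managing news
Managing search and display fields
Adding and managing database content
Data administrator privileges
Data query and browsing
Personal workspace operations and sequence analysis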
20
Vibrio vulnificus strain-specific plasmid genomes.
Comparative bioinformatics tools
Integrated Comparative Analysis Platform (iCAP) for Genomic Data
bGAS (bacterial Genome Annotation System)
24
Cancers
Infectious diseases
Highly heritable diseases
Schizophrenia
Disease
Linkage analysis
Genotyping
Disease gene
Genes
Candidate genes
Candidate region
Chromosome
Genome
Research method
25
Example of providing integrated service: Searching for Disease-Associated Gene Variations
Collect information (IT, FB)
Integrate information & primer design (FB)
Integrate information & perform quality control (FB)
Look for gene variation (CB)
Priority setting (FB)
Sequencing Core
Candidate gene variation & disease phenotype (GS)
26
Gene variation detection and gene-gene interaction
• Design
– FastSNP
– ISV db
– PALS db
– PipMaker pipeline
– Primer3
• Analysis
– PolyPhred pipeline
– GAP (Generalized Association Plots) analysis
60 primer pairs were designed
18,000 sequences were compared
103 variation sites were found
* 68 were not reported before
* 20 variation sites may be related to phenotype (more samples are needed)
27
Synergy is emerging from collaboration
• Help a single project to integrate different types of information
• Make new observations by integrating data from different users
Sequencing
Genotyping
Proteomics
Gene expression
ABC
RNAi
Mouse mutagenesis
Gene-related information
Phenotype-related information
PET gene probe