Data Mining for BioInformatics at Ewha CSE Dec. 14, 2001 Hwan-Seung Yong (Gene: ACTGAAAGGGCTCTCAAA) Dept. of Computer Science & Engineering Ewha Womans Univ.

Page 1

Data Mining for BioInformatics at Ewha CSE

Dec. 14, 2001

Hwan-Seung Yong

(Gene: ACTGAAAGGGCTCTCAAA)

Dept. of Computer Science & Engineering

Ewha Womans Univ.

Page 2

BioInformatics and Computer Science

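• Computer: binary (base-2) system (0/1), designed by humans
• Living things: quaternary (base-4) system (A/G/C/T), designed by nature
• Advances in computer technology
  – Data analysis + databases = data mining (at present)
  – High-performance parallel computing technology
  – Distributed processing and Web/XML technology
  – Emergence of knowledge management technology
• Why humans built computers
  – To search for the secrets of life encoded in base 4
  – To challenge the domain of God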

For BioInformatics
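The base-2 versus base-4 comparison on this slide can be made concrete with a small sketch. The digit assignment A=0, C=1, G=2, T=3 is an assumption chosen for illustration; the slides do not specify a mapping, and the function names are hypothetical.

```python
# Assumed mapping (not given in the slides): A=0, C=1, G=2, T=3.
BASE4 = {"A": 0, "C": 1, "G": 2, "T": 3}


def dna_to_base4_digits(seq):
    """Return the base-4 digit for each nucleotide in seq."""
    return [BASE4[base] for base in seq.upper()]


def dna_to_int(seq):
    """Read the whole sequence as one base-4 number."""
    value = 0
    for digit in dna_to_base4_digits(seq):
        value = value * 4 + digit
    return value


def dna_to_bits(seq):
    """Each base-4 digit carries exactly two bits of information."""
    return "".join(format(BASE4[base], "02b") for base in seq.upper())


gene = "ACTGAAAGGGCTCTCAAA"  # the sequence shown on the title slide
print(dna_to_base4_digits(gene))
print(dna_to_bits(gene))
```

Under this mapping one nucleotide corresponds to two binary digits, so an 18-base sequence occupies 36 bits.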

Page 3

BioInformatics and Computer Science

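• BioInformatics
  – Development of DNA code readers (biotechnology) and alignment techniques
    • The complete genome sequence has only just been assembled
  – The task is to find meaning (genes, etc.) in it.
  – Comparable to recovering source code from a binary object
    • Requires experts in disassemblers and reverse engineering
  – Data mining is an important applicable technology.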

[Figure: analogy between the two systems]
Computer System: Binary Code, Assembly Code, Source Code
Living Things (Nature): DNA Sequence, Gene, Protein
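One way to picture the "disassembly" step from raw sequence to candidate genes is a naive open reading frame (ORF) scan. This is a minimal sketch, not the method used in the talk: it assumes the standard start codon ATG and stop codons TAA/TAG/TGA, looks only at the forward strand, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, orf) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts coding codons (including ATG, excluding the stop).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until the first in-frame stop codon.
            end = None
            for j in range(i + 3, len(seq) - 2, 3):
                if seq[j:j + 3] in STOP_CODONS:
                    end = j + 3
                    break
            if end is None:
                break  # no stop codon remains in this frame
            if (end - i) // 3 - 1 >= min_codons:
                orfs.append((i, end, seq[i:end]))
            i = end  # resume scanning after this ORF
    return orfs


print(find_orfs("CCATGAAATGGTAACC"))
```

Real gene prediction is far more involved (both strands, introns, statistical models), which is where the data mining techniques mentioned on the slide come in.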

Page 4

Why Ewha CSE is appropriate for BioInformatics

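• Recent focus of CSE's research areas
  – As a BK Project Plan: Knowledge Engineering Framework
  – Data Warehousing and OLAP
  – Data Mining
  – XML Technology
  – Knowledge Engineering Enabling Technology
  – Knowledge Engineering Application
    • Electronic Commerce
    • BioInformatics
• Related research institutes at Ewha
  – Graduate School of Molecular Life Sciences (BK)
  – Korea Science and Engineering Foundation SRC (Cell Signal Transduction Center)
  – Ministry of Information and Communication Computer Graphics / Virtual Reality Research Center
• Prior related work (direct)
  – Development of a gene search and automated analysis program for the Prosecutors' Office
  – Development of a genetic information management system for the National Institute of Scientific Investigation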

Page 5

Genetic band recognition and code registration program

Automated gene analysis program

Page 6

DNA Locus Registration Interface

Page 7

Data Warehousing, OLAP and Data Mining

• Data Warehousing and OLAP
  – ETL Methodology (Extraction, Transformation and Loading)
  – Data Warehouse Architecture
  – OLAP Server Development
  – Multidimensional Data Processing
  – Metadata Handling
  – Data Quality Control
• Data Mining
  – Classification and Analysis of Data Mining Techniques
  – Clustering Algorithm
  – Association Algorithm
  – Classification Algorithm
  – CRM Application based on Web Log Mining
  – Text Mining for XML Data

Page 8

XML and Supporting Technology

• XML Related Area
  – XML Server Development
    • Query Processing and Storage System
  – XML Document Mining
• Knowledge Enabling Technology
  – Multimedia High-Speed Network
  – Component-based Software Engineering
  – Security
  – Multimedia DBMS
  – Natural Language Processing
  – Computer Graphics and Virtual Reality

Page 9

Research Requirement for BioInformatics

• Large Volume of Data, including multimedia data
• High-Performance Computing System
  – Massively Parallel Processing Hardware and Software
• XML-related work is important
  – For exchange of bio data
  – Gene Annotation
• Web-based collaborative system
  – Requires web-based interoperable applications and standards
  – Distributed processing techniques
    • CORBA, SOAP, Microsoft .NET Framework
• Data Mining
  – For Gene Prediction, Functional Genomics

Page 10

Bio Data Mining Research

• XML Standard for Bio Data
• Graphical User Interface for XML Data
• Data Converter to XML
  – Convert existing bio data to an XML standard
  – Convert between XML standards
• Integration Methodology with Existing DB
  – SOAP (Simple Object Access Protocol)
  – WSDL (Web Services Description Language)
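The "Data Converter to XML" item can be sketched as a small FASTA-to-XML converter. This is only an illustration under stated assumptions: the element and attribute names (`sequences`, `sequence`, `description`, `residues`) are invented here and do not follow AGAVE, BSML, GAME, or any other published schema.

```python
import xml.etree.ElementTree as ET


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fasta_to_xml(text):
    """Serialize FASTA records using a hypothetical, illustrative XML layout."""
    root = ET.Element("sequences")
    for header, seq in parse_fasta(text):
        seq_id, _, description = header.partition(" ")
        rec = ET.SubElement(root, "sequence", id=seq_id, length=str(len(seq)))
        ET.SubElement(rec, "description").text = description
        ET.SubElement(rec, "residues").text = seq
    return ET.tostring(root, encoding="unicode")


print(fasta_to_xml(">gene1 demo gene\nACTGAAAGG\nGCTCTCAAA\n"))
```

A converter targeting a real standard would map the same parsed records onto that standard's own elements instead of the placeholder names used here.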

Page 11

XML Standard for Bio Data

• Before
  – FASTA format, GenBank format, GFF (General Feature Format)
• XML Format
  – AGAVE (Architecture for Genomic Annotation, Visualization and Exchange)
    • Developed by DoubleTwist, Inc.
    • Released in June 2000
    • Open source licence in August 2001
    • AGAVE 3.2 version with Prophecy 3.0 in Sept. 2001
    • See http://www.agavexml.org
    • Genome XML Viewer by Labbook
  – BSML

Page 12

XML standard for Bio Data

• BioXML Standard and GAME
  – An open-source/free software organization dedicated to providing a set of standard XML formats for the exchange of biological data
• GAME (Genome Annotation Markup Elements)
  – Created at BDGP (Berkeley Drosophila Genome Project)
  – Current version 1.1 released in March 2000
  – http://www.bioxml.org
  – Follows the WikiWeb scheme
    • Collaborative web site that can be edited by anyone
    • Community documentation system
    • Everyone can edit shared web pages

Page 13

annotation

Computer Theory and Security Laboratory

Phylogenetic Tree Visualization
• Tree drawing algorithms
• Graph drawing algorithms

New algorithm design
• Simulated annealing
• Other optimization techniques

Known gene
• Sequence similarity

Unknown gene
• Neural networks
• Hidden Markov models

Unknown gene prediction

Microarray data analysis

Data mining tools

Two samples comparison

Clustering classification tools

Multiple samples comparison

Phylogenetic prediction

Phylogeny inference

Phylogenetic analysis

Comparative genomics

Whole genome sequence
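The "Known gene: sequence similarity" box in this diagram can be illustrated with a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). This is a sketch only: the scoring values (match +1, mismatch -1, gap -2) are assumptions, the slides do not say which similarity measure the lab uses, and the function name is hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only the score is returned; the traceback is omitted to keep the
    sketch short. Memory use is O(len(b)) by keeping one previous row.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACTGAAAGGG", "ACTGAAGGG"))
```

A higher score against a known gene suggests closer similarity; production tools use substitution matrices and heuristics rather than this plain scoring.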

Page 14

Open Source Project

• Open BioInformatics Foundation
  – http://www.open-bio.org
  – Umbrella group for the various bio*.org groups
    • bioxml.org, bioperl.org, biopython.org, biojava.org, biocorba.org
    • biopathways.org
    • bio-ensembl.org
      – Annotation for the human genome
  – The First Bioinformatics Open Source Conference (BOSC'2001) was held in August 2001 in San Diego.
  – Many Open System Activities

Page 15

Vision and Future Prediction

• Ewha will

– Contribute something in Bio Data Mining Area

– Have Bio Informatics Institute or Research Center

– Have strong bio-industry relationship

• Closing Comment

ATGCCGTCGGGCCCCGGGGC => "Thank You" expressed in base 4