Aliya Sadeque: BIOC 599 Supervisory Committee Meeting


Aliya Sadeque
BIOC 599 Supervisory Committee Meeting
Wednesday, December 19, 2007

Outline

  - About me
  - Thesis project blueprint
  - Course selection

Curriculum Vitae

Queen’s University
  - Bachelor of Science (Honours) in Biochemistry, Minor in Computing
  - Graduated May 2007

Previous Coursework: Undergraduate Level

Biochemistry:
  - Proteins and Enzymes
  - Physical Biochemistry
  - Metabolism
  - Molecular Biology
  - Introductory Biochemistry Laboratory
  - Protein Structure and Function
  - Current Topics in Biochemistry
  - Biochemistry of the Cell
  - Advanced Molecular Biology

Computing:
  - Database Management Systems
  - Neural and Genetic Computing
  - Introduction to Data Mining
  - System Level Programming
  - Operating Systems

Mathematics:
  - Introduction to Statistics
  - Discrete Math for Computer Scientists
  - Modeling Techniques in Biology

Thesis Project Blueprint

Context
  - What do we know so far?
  - Why is this work important?
LCS hits curves
  - Where does the number of hits explode?
Visualization
  - Where are these regions?
  - Further investigation of regions of interest
Promoter prediction tools
  - Existing tools: what’s out there?
  - Developing a new tool
Visualization
  - Visualize all predicted promoters against LCS-identified regions
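The last blueprint step, setting predicted promoters against LCS-identified regions, reduces to an interval-overlap check before anything is drawn. The sketch below assumes both sets are lists of (start, end) coordinates on the same sequence, 0-based and half-open; the function names and example coordinates are hypothetical, not taken from the project.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 0-based, half-open [start, end).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(promoters: List[Interval],
                      regions: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (promoter, region) pair that overlaps.

    A plain nested loop (O(n * m)) keeps the sketch short; a sorted sweep or
    an interval tree would scale better for genome-wide sets.
    """
    return [(p, r) for p in promoters for r in regions if overlaps(p, r)]


if __name__ == "__main__":
    predicted = [(100, 150), (400, 460), (900, 950)]  # made-up coordinates
    lcs_regions = [(120, 200), (880, 905)]            # made-up coordinates
    for promoter, region in overlapping_pairs(predicted, lcs_regions):
        print(promoter, "overlaps", region)
```

The resulting pairs can then be handed to whatever plotting tool is used for the visualization.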

Context

“Promoter sequences might be identified as conserved islands in a divergent sea”

Longest Common Subsequence
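The slides do not spell out how the LCS is computed. For reference, the textbook dynamic-programming algorithm for the longest common subsequence of two sequences is sketched below. It allows gaps but no mismatches, so on its own it does not model the error counts used in the hits curves; whether the project uses this exact formulation is an assumption, and the function name is hypothetical.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming, followed by a traceback.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[n][m] to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(len(lcs("ABCBDAB", "BDCABA")))  # prints 4
```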


Subsequence length (nt)   Number of solutions
51                        0
52                        0
60                        0
50                        1
45                        6
40                        13
36                        25
35                        28
30                        48
25                        101
20                        191
17                        344
15                        1350
14                        5966
13                        23723
12                        63845
10                        118643

Subsequence length (nt)   Number of solutions
58                        0
60                        0
57                        2
55                        4
54                        5
53                        6
50                        14
45                        24
40                        46
30                        114
25                        216
20                        667
19                        1004
18                        2105
17                        6554
15                        58492

Subsequence length (nt)   Number of solutions
62                        0
63                        0
64                        0
65                        0
61                        2
60                        5
57                        7
59                        7
56                        10
55                        12
54                        16
53                        20
52                        24
51                        27
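The tables above give a number of solutions for each subsequence length, and the hits-curve figures label their series by 1, 2 and 3 errors. The slides do not define exactly what is counted, so the brute-force sketch below is one hypothetical reading: for each length L, count the start positions in sequence a whose length-L window matches some window of sequence b with at most k mismatches (Hamming distance, no gaps). The function names and this counting convention are assumptions.

```python
from typing import Dict, Iterable


def within_k_mismatches(x: str, y: str, k: int) -> bool:
    """True if equal-length strings x and y differ at no more than k positions."""
    mismatches = 0
    for c1, c2 in zip(x, y):
        if c1 != c2:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the error budget is exceeded
    return True


def count_hits(a: str, b: str, length: int, k: int) -> int:
    """Count start positions i in a such that a[i:i+length] matches at least
    one window of b of the same length with at most k mismatches."""
    if length <= 0 or length > len(a) or length > len(b):
        return 0
    b_windows = [b[j:j + length] for j in range(len(b) - length + 1)]
    return sum(
        1
        for i in range(len(a) - length + 1)
        if any(within_k_mismatches(a[i:i + length], w, k) for w in b_windows)
    )


def hits_curve(a: str, b: str, lengths: Iterable[int], k: int) -> Dict[int, int]:
    """Map each subsequence length to its hit count for a fixed error budget k."""
    return {L: count_hits(a, b, L, k) for L in lengths}


if __name__ == "__main__":
    seq_a = "ACGTTGCAACGTAGGCTA"  # toy sequences, not project data
    seq_b = "TTACGTAGCAACGTTGGC"
    for k in (1, 2, 3):
        print(k, hits_curve(seq_a, seq_b, range(4, 11, 2), k))
```

Every window of a is compared with every window of b, so this is only practical for short toy inputs; full-length promoter regions would likely need an indexed or seed-and-extend approach.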

Longest Common Subsequence Hits Curves

Figure 1. Hits curve (full range view). [Plot: number of matches vs. subsequence length (nt); one series each for 1, 2 and 3 errors]


Figure 2. Hits curve (under 50 matches). [Plot: number of matches vs. subsequence length (nt); one series each for 1, 2 and 3 errors]
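The blueprint asks where the number of hits explodes. One simple, assumed way to make that concrete is to compare hit counts at consecutive measured lengths and normalise by the gap between them, giving a per-nucleotide growth factor (c_shorter / c_longer) ** (1 / (L_longer - L_shorter)); the step with the largest factor marks the explosion. The counts below are copied from the first table above; the heuristic and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple


def explosion_point(curve: Dict[int, int]) -> Optional[Tuple[int, int, float]]:
    """Return (longer_length, shorter_length, growth) for the step between
    consecutive measured lengths with the largest per-nucleotide growth.

    curve maps subsequence length -> number of solutions. Lengths with zero
    solutions are skipped because growth from zero is undefined. Returns None
    if fewer than two nonzero points remain.
    """
    points = sorted(((L, c) for L, c in curve.items() if c > 0), reverse=True)
    best = None
    for (l_long, c_long), (l_short, c_short) in zip(points, points[1:]):
        growth = (c_short / c_long) ** (1.0 / (l_long - l_short))
        if best is None or growth > best[2]:
            best = (l_long, l_short, growth)
    return best


# Counts copied from the first table (subsequence length -> number of solutions).
table_1 = {51: 0, 52: 0, 60: 0, 50: 1, 45: 6, 40: 13, 36: 25, 35: 28,
           30: 48, 25: 101, 20: 191, 17: 344, 15: 1350, 14: 5966,
           13: 23723, 12: 63845, 10: 118643}

if __name__ == "__main__":
    longer, shorter, growth = explosion_point(table_1)
    print(longer, shorter, round(growth, 2))  # prints: 15 14 4.42
```

The lengths in the tables are unevenly spaced (steps of 1 to 5 nt), which is why the growth is normalised per nucleotide rather than compared row by row.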

Promoter Prediction

Existing tools
  - Interpolated Context Modeling (ICM)
  - Feedforward neural network
New ideas for promoter prediction
  - Neural networks
      - Drosophila tool
      - GPCR tool
  - Data mining techniques
      - WEKA
  - Other forms of machine learning
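WEKA, listed above as a data-mining option, reads training data in its ARFF text format. A minimal sketch of turning labelled sequences into k-mer count features for WEKA follows; using k-mer counts as features, the relation name and the class labels promoter/background are assumptions for illustration, not choices made in the slides.

```python
from itertools import product
from typing import List, Tuple


def kmer_counts(seq: str, k: int, kmers: List[str]) -> List[int]:
    """Count overlapping occurrences of each k-mer in seq, returned in kmers order."""
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with N or other non-ACGT symbols are skipped
            counts[word] += 1
    return [counts[w] for w in kmers]


def to_arff(examples: List[Tuple[str, str]], k: int = 2,
            relation: str = "promoters") -> str:
    """Build an ARFF document from (sequence, label) pairs.

    Labels are assumed to be 'promoter' or 'background' (hypothetical classes).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {w} NUMERIC" for w in kmers]
    lines.append("@ATTRIBUTE class {promoter,background}")  # nominal class, declared last
    lines += ["", "@DATA"]
    for seq, label in examples:
        counts = kmer_counts(seq.upper(), k, kmers)
        lines.append(",".join(str(c) for c in counts) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy = [("TATAATGCGC", "promoter"), ("GCGCGCATGC", "background")]  # toy data
    print(to_arff(toy, k=2))
```

Saved to a file with an .arff extension, the output can be opened in the WEKA Explorer; the class attribute is declared last, which matches the Explorer's default choice of class.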

Course Selection

  - BIOC 570: completed
  - MICR 502: Virology
Courses to sit in on:
  - Biochemistry courses?
  - Computing courses?
      - Data mining
      - Bioinformatics

Questions for myself

  - Hits curves: what does it mean if they identify the same region?
  - From exact to approximate matches
  - Why is this study important?
  - Which WEKA tools would be good?
