![Page 1: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/1.jpg)
![Page 2: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/2.jpg)
Welcome to Introduction to Bioinformatics
Wednesday, 13 April 2005
Rehash of Exam 1 (selected)
Rehash of Exam 2 (selected)
Discussion of DGPB, Chapter 6
![Page 3: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/3.jpg)
Exam 1, Problem 6
6c. Find first nucleotides of genes that don’t encode protein.
(LOAD-SHARED-FILE noncoding-genes-of)
(DEFINE "nc-genes" (NONCODING GENES OF S7942)
(FOR EACH (gene IN nc-genes)) WITH beginning = SEQUENCE OF gene 1 – 3 DO COLLECT beginning
(DEFINE variable AS value)
![Page 4: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/4.jpg)
Exam 1, Problem 6
6c. Find first nucleotides of genes that don’t encode protein.
(LOAD-SHARED-FILE "noncoding-genes-of")
(DEFINE nc-genes AS (NONCODING-GENES-OF S7942))
(FOR-EACH gene IN nc-genes AS beginning = (SEQUENCE-OF gene FROM 1 TO 3) COLLECT beginning)
(DEFINE variable AS value)
![Page 5: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/5.jpg)
Exam 1, Problem 6
6c. Find first nucleotides of genes that don’t encode protein.
(LOAD-SHARED-FILE "noncoding-genes-of")
(DEFINE nc-genes AS (NONCODING-GENES-OF S7942))
(FOR-EACH gene IN nc-genes AS beginning = (SEQUENCE-OF gene FROM 1 TO 3) COLLECT beginning)
:: ("GCG" "GCG" "GGA" "GCC" "GCC" "GGA" "GCG" "GGG" "GCC" "GGG" "GCG" "GCC" "GCG" "GGA" "GGG" "GCG" "GGG" "TCC" "GGT" "GGG" "GGG" "AAA" "GGA" "CCA" "TCC" "GGC" "GGC" "CGC" "CGG" "GGG" "GGG" "GCG" "AAA" "GGG" "GGG" "GGT" "TCC" "GGC" "TGG" "GGG" "GCG" "GGG" "GCC" "GGG" "GCC" "GGG" "CGG" "CGG" "GGG" "GCG" "GGG" "GGG")
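For readers outside the course's own language, the same loop can be sketched in Python. This is only an illustration: `first_nucleotides` and the example sequences are hypothetical stand-ins, not real S7942 data and not part of the course software.

```python
def first_nucleotides(sequences, n=3):
    """Return the first n nucleotides of each sequence, in input order."""
    return [seq[:n] for seq in sequences]


# Hypothetical stand-ins for noncoding gene sequences (not real S7942 data).
example_genes = ["GCGGAAGTAG", "GGATTCCTAG", "TCCGGTGTAG"]
print(first_nucleotides(example_genes))
```

As in the slide's version, the work is a slice from position 1 to 3 of each gene, collected into a list.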
![Page 6: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/6.jpg)
![Page 7: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/7.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8a. Frequency of cutting of MseI (TTAA)
G + C = 0.8, A + T = 0.2, A = 0.1, T = 0.1
Expected frequency of TTAA = 0.1 * 0.1 * 0.1 * 0.1 = 10⁻⁴
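The arithmetic above can be written as a small Python sketch. It assumes, as the slide does, that A and T share the A+T fraction equally, that G and C share the G+C fraction equally, and that positions are independent. The function names are illustrative only.

```python
import math


def base_frequencies(gc_content):
    """Split G+C and A+T content evenly between the two bases of each pair."""
    at_content = 1.0 - gc_content
    return {
        "A": at_content / 2,
        "T": at_content / 2,
        "G": gc_content / 2,
        "C": gc_content / 2,
    }


def expected_site_frequency(site, gc_content):
    """Probability that a given position starts the site, assuming independent bases."""
    freqs = base_frequencies(gc_content)
    return math.prod(freqs[base] for base in site)


print(expected_site_frequency("TTAA", 0.8))
```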
![Page 8: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/8.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8d. Test answer with 1000 random DNA sequences 1000-nucleotides in length (G+C% = 80%)
(FOR-EACH iteration FROM 1 TO 1000
AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 1000)
AS counts = (COUNT-OF "TTAA" IN seq)
SUM counts)
:: 103
??? hits per trial?
![Page 9: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/9.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8d. Test answer with 1000 random DNA sequences 1000-nucleotides in length (G+C% = 80%)
(FOR-EACH iteration FROM 1 TO 1000
AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 1000)
AS counts = (COUNT-OF "TTAA" IN seq)
SUM counts)
:: 103
0.103 hits per trial?
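A rough Python equivalent of the simulation, for comparison. This is a sketch, not the course's code: `count_site_hits` is a hypothetical helper, the 1:1:4:4 weights mirror the `A 1 T 1 C 4 G 4` arguments, and the seed is arbitrary. `str.count` counts non-overlapping matches, which is harmless here because TTAA cannot overlap itself.

```python
import random


def count_site_hits(site, length, trials, weights, seed=None):
    """Total occurrences of site across `trials` random sequences of `length` bases.

    weights gives the relative abundance of A, T, C, G in that order.
    """
    rng = random.Random(seed)
    bases = "ATCG"
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choices(bases, weights=weights, k=length))
        total += seq.count(site)
    return total


hits = count_site_hits("TTAA", length=1000, trials=1000, weights=[1, 1, 4, 4], seed=0)
print(hits, hits / 1000)  # total hits, then hits per trial
```

Changing `length` to 3000 gives the corresponding run for part 8e.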
![Page 10: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/10.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8e. Test answer with 1000 random DNA sequences 3000-nucleotides in length (G+C% = 80%)
(FOR-EACH iteration FROM 1 TO 1000
AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 3000)
AS counts = (COUNT-OF "TTAA" IN seq)
SUM counts)
:: 314
??? hits per trial?
![Page 11: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/11.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8e. Test answer with 1000 random DNA sequences 3000-nucleotides in length (G+C% = 80%)
(FOR-EACH iteration FROM 1 TO 1000
AS seq = (RANDOM-DNA A 1 T 1 C 4 G 4 LENGTH 3000)
AS counts = (COUNT-OF "TTAA" IN seq)
SUM counts)
:: 314
0.314 hits per trial?
![Page 12: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/12.jpg)
Exam 1, Problem 8
8. Thermophilus extremus G+C% content = 80%.
8f. Interpret your results in light of the definition of E-value (or Expect value).
Your results:
1000 DNA sequences of 1000 nucleotides: 0.103 hits per trial (E-value = 10⁻⁴ · 1000 = 0.1)
1000 DNA sequences of 3000 nucleotides: 0.314 hits per trial (E-value = 10⁻⁴ · 3000 = 0.3)
Expected frequency = 10⁻⁴
E-value = (expected frequency) · (search space)
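The comparison can be checked with a few lines of Python. The sketch below applies the E-value definition given above, taking the sequence length as the search space, and prints it next to the observed hits per trial reported on the slide. The function name is illustrative.

```python
def e_value(expected_frequency, search_space):
    """E-value as defined on the slide: expected frequency times search space."""
    return expected_frequency * search_space


# (sequence length, observed hits per trial) taken from the results above
results = [(1000, 0.103), (3000, 0.314)]
for length, observed in results:
    print(length, e_value(1e-4, length), observed)
```

The observed values sit close to the predicted E-values, and tripling the search space roughly triples the number of hits expected by chance.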
![Page 13: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/13.jpg)
Exam 1, Problem 10
Examine Fig. 4.11 in the text, focusing on the spot labeled TDH1.
![Page 14: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/14.jpg)
Exam 1, Problem 10
Examine Fig. 4.11 in the text, focusing on the spot labeled TDH1.
10b. From what you can learn of the function of the gene, why might this result make sense?
![Page 15: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/15.jpg)
[Figure: glucose ⇄ pyruvate (glycolysis / gluconeogenesis), with glyceraldehyde-3-phosphate dehydrogenase labeled]
![Page 16: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/16.jpg)
Exam 2, Problem 4
Consider Chi-Squared.
4a. Define a function that calculates chi-squared scores, given two input arguments…
(DEFINE-FUNCTION chi-square (observed expected-freqs)
  (LET* ((total (SUM-OF observed))
         (expected (FOR-EACH freq IN expected-freqs COLLECT (* freq total))))
    (FOR-EACH O IN observed
              FOR E IN expected
              AS diff = (- O E)
              AS numerator = (* diff diff)
              SUM (/ numerator E))))
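The same calculation in Python, as a sketch for comparison. The test input uses the A:C:G:T counts 28:22:28:22 quoted in one of the answers to 4b; the equal expected frequencies of 0.25 are an assumption, chosen because they reproduce the 1.44 score.

```python
def chi_square(observed, expected_freqs):
    """Chi-squared score: sum of (O - E)^2 / E over all categories.

    observed is a list of counts; expected_freqs is a list of expected
    fractions, scaled here to the observed total.
    """
    total = sum(observed)
    expected = [freq * total for freq in expected_freqs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


score = chi_square([28, 22, 28, 22], [0.25, 0.25, 0.25, 0.25])
print(round(score, 2))
```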
![Page 17: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/17.jpg)
Exam 2, Problem 4
Consider Chi-Squared.
4b. How do you interpret the 1.44 result from my example?
The probability of getting a value of 1.44 is likely to occur in the gene 100-nt population
![Page 18: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/18.jpg)
Exam 2, Problem 4
Consider Chi-Squared.
4b. How do you interpret the 1.44 result from my example?
This means that there is a > 5% chance that the population given fits the expected ratios.
![Page 19: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/19.jpg)
Exam 2, Problem 4
Consider Chi-Squared.
4b. How do you interpret the 1.44 result from my example?
there is a 5% chance that the A:C:G:T ratio of 28:22:28:22 is due to random chance.
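One way to weigh these answers is to convert the score into a probability. The sketch below assumes four categories (A, C, G, T), hence 3 degrees of freedom, and uses the closed-form upper-tail probability of the chi-squared distribution for that case. The function name is illustrative.

```python
import math


def chi_square_p_value_df3(x):
    """Upper-tail probability P(X >= x) for chi-squared with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(round(chi_square_p_value_df3(1.44), 2))
```

Under these assumptions the probability is roughly 0.70: if the expected ratios hold, about 70% of random samples would deviate at least this much, so a score of 1.44 gives no grounds to reject them at the 5% level.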
![Page 20: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/20.jpg)
Exam 2, Problem 5
5h. Rerun the program you wrote in 5d but using a single population: random DNA sequences.
![Page 21: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/21.jpg)
DGPB 6.1 Associating proteins with functions
[Figure: UPTAG (AGTCGT…TGTAACG…CGTGC…) and DOWNTAG (AGTCGT…CATGGGA…CGTGC…) sequences shown relative to a gene]
![Page 22: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/22.jpg)
DGPB 6.1 Associating proteins with functions
![Page 23: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/23.jpg)
DGPB 6.1 Associating proteins with functions
![Page 24: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/24.jpg)
DGPB 6.1 Associating proteins with functions
![Page 25: Welcome to Introduction to Bioinformatics Wednesday, 13 April 2005](https://reader036.vdocuments.mx/reader036/viewer/2022070412/568149e8550346895db7121d/html5/thumbnails/25.jpg)
DGPB 6.1 Associating proteins with functions
Sampling problem