Finding detailed relationships between proteins specific to phenotypes among microbial organisms
Daniel Park
Molecular Biology Institute, UCLA
Yeates lab
SoCalBSI
August 24, 2006
OUTLINE
• Phylogenetic profiles
• Ternary logic analysis
• Building COG & phenotype profiles
• Results of logic analysis
PHYLOGENETIC PROFILES
• Turning an earlier question on its side:
• From, “What proteins are found in a genome?”
• To, “What genomes contain a given protein?”
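The change of viewpoint above can be sketched as inverting a genome-to-protein mapping and reading off a presence/absence vector per protein. This is only an illustration: the genome names, protein names, and function names below are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical toy data: genome name -> set of protein families found in it.
genome_to_proteins = {
    "genome_1": {"P1", "P2"},
    "genome_2": {"P2", "P3"},
    "genome_3": {"P1", "P2", "P3"},
}


def invert(genome_to_proteins):
    """Answer 'what genomes contain a given protein?' by inverting the mapping."""
    protein_to_genomes = defaultdict(set)
    for genome, proteins in genome_to_proteins.items():
        for protein in proteins:
            protein_to_genomes[protein].add(genome)
    return dict(protein_to_genomes)


def phylogenetic_profile(protein, genome_to_proteins):
    """Binary presence/absence vector for one protein, in sorted genome order."""
    return [int(protein in genome_to_proteins[g]) for g in sorted(genome_to_proteins)]


protein_to_genomes = invert(genome_to_proteins)
profile_p1 = phylogenetic_profile("P1", genome_to_proteins)
profile_p2 = phylogenetic_profile("P2", genome_to_proteins)
```

Each profile is a fixed-length binary vector over the same ordered list of genomes, which is what makes profiles of different proteins directly comparable.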
VARIATIONS OF PHYLOGENETIC PROFILES
• Relationships between protein families
• Relationships between protein family profile and given target ‘phenotype’ profile
COMPLEXITY OF CELLULAR PROCESSES
HIGHER ORDER RELATIONSHIPS: TERNARY LOGIC ANALYSIS
8 LOGIC TYPES FOR PHYLOGENETIC PROFILE TRIPLETS
MEASURING MUTUAL INFORMATION BETWEEN TWO PROFILES
$$U(x \mid y) = \frac{H(x) + H(y) - H(x, y)}{H(x)}$$
Where U is the uncertainty coefficient relating profiles x and y, and H is the Shannon entropy of the corresponding probability distributions
Range of U: [0, 1]. Ex. U = 0.88 corresponds to an 88% decrease in uncertainty
High value of U indicates high mutual information between x and y
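As a minimal sketch, the uncertainty coefficient U(x|y) = [H(x) + H(y) - H(x,y)] / H(x) can be computed from two equal-length presence/absence profiles using empirical frequencies. The function names are illustrative, and returning 0 when H(x) is zero is an assumption made here to avoid division by zero, not something stated in the talk.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x) for two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    h_x = entropy(x)
    if h_x == 0:
        # Assumption: a constant profile x has no uncertainty left to explain.
        return 0.0
    h_xy = entropy(list(zip(x, y)))  # joint entropy over paired observations
    return (h_x + entropy(y) - h_xy) / h_x


x = [1, 1, 0, 0]
u_identical = uncertainty_coefficient(x, x)               # y fully determines x
u_independent = uncertainty_coefficient(x, [1, 0, 1, 0])  # y says nothing about x
```

Identical profiles give U = 1 and statistically independent profiles give U = 0, matching the stated range [0, 1].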
MEASURING MUTUAL INFORMATION AMONG THREE PROFILES
U(c | f(a,b)) where f(a,b) is the logical combination of a and b
Constraints (x and y here are score cutoffs):
U(c|a) < x
U(c|b) < x
U(c|f(a,b)) > y
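The three constraints can be sketched as a filter over candidate logic functions: keep f only when neither a nor b alone predicts c well, but f(a,b) does. The names of the eight functions, the helper names, and the cutoff values 0.3 and 0.6 are assumptions made for illustration; the talk does not give the cutoffs or the numbering of the logic types in this text.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x); 0.0 if H(x) is zero (assumption)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0
    return (h_x + entropy(y) - entropy(list(zip(x, y)))) / h_x


# Eight two-input logic functions on 0/1 values (labels are illustrative).
LOGIC_FUNCTIONS = {
    "and": lambda a, b: a & b,
    "nand": lambda a, b: 1 - (a & b),
    "or": lambda a, b: a | b,
    "nor": lambda a, b: 1 - (a | b),
    "and_not": lambda a, b: a & (1 - b),
    "not_and_not": lambda a, b: 1 - (a & (1 - b)),
    "xor": lambda a, b: a ^ b,
    "xnor": lambda a, b: 1 - (a ^ b),
}


def find_logic_relationships(a, b, c, low=0.3, high=0.6):
    """Return {name: U(c|f(a,b))} for functions meeting all three constraints."""
    if uncertainty_coefficient(c, a) >= low or uncertainty_coefficient(c, b) >= low:
        return {}  # a or b alone already explains c too well
    hits = {}
    for name, f in LOGIC_FUNCTIONS.items():
        combined = [f(ai, bi) for ai, bi in zip(a, b)]
        score = uncertainty_coefficient(c, combined)
        if score > high:
            hits[name] = score
    return hits


# Toy profiles where c is exactly xor(a, b).
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
hits = find_logic_relationships(a, b, c)
```

Because U depends only on shared information, a function and its negation score the same, so this toy case reports both `xor` and `xnor`.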
COGs: CLUSTERS OF ORTHOLOGOUS GROUPS
Set of orthologous proteins from at least three different lineages
Cluster Functional group
COMBINATIONS OF COG PROFILES MATCHING A PHENOTYPE
ASSOCIATING MORE GENOMES WITH COGS
No. of fully sequenced bacterial genomes over the last 9 years
[Bar chart. x-axis: Years (1997, 2003, 2006); y-axis: No. of bacterial genomes (0 to 400); bar labels: 7, 66, 354]
BUILDING COG PROFILES
• 81,480 proteins
• 354 bacterial genomes
• 4,613 COGs
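A COG profile matrix of this kind can be sketched as one binary row per COG and one column per genome, filled in from protein-to-COG assignments. The record layout, identifiers, and function name below are hypothetical; the talk does not describe the actual file formats or pipeline used.

```python
# Hypothetical assignment records: (protein_id, genome, cog_id).
assignments = [
    ("prot_1", "genome_1", "COG_A"),
    ("prot_2", "genome_1", "COG_B"),
    ("prot_3", "genome_2", "COG_A"),
    ("prot_4", "genome_3", "COG_B"),
    ("prot_5", "genome_3", "COG_B"),  # a second member of COG_B in the same genome
]


def build_cog_profiles(assignments):
    """Return (genomes, profiles): profiles[cog] is a 0/1 list over sorted genomes."""
    genomes = sorted({genome for _, genome, _ in assignments})
    cogs = sorted({cog for _, _, cog in assignments})
    present = {(cog, genome) for _, genome, cog in assignments}
    profiles = {
        cog: [int((cog, genome) in present) for genome in genomes] for cog in cogs
    }
    return genomes, profiles


genomes, profiles = build_cog_profiles(assignments)
```

Presence is recorded as 0 or 1 regardless of how many proteins in a genome fall into the same COG, so paralogs do not change the profile.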
BUILDING PHENOTYPE PROFILES
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Cumulative no. of protein triplets recovered at an uncertainty coefficient score greater than a given threshold
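A cumulative curve of this kind can be sketched by counting, for each threshold, how many triplet scores exceed it. The scores and thresholds below are made-up placeholders, not results from the talk.

```python
def cumulative_counts(scores, thresholds):
    """For each threshold t, count the scores strictly greater than t."""
    return [sum(1 for s in scores if s > t) for t in thresholds]


# Hypothetical U(c|f(a,b)) scores for four triplets.
example_scores = [0.65, 0.70, 0.82, 0.91]
example_thresholds = [0.6, 0.7, 0.8, 0.9]
counts = cumulative_counts(example_scores, example_thresholds)
```

The counts can only stay level or fall as the threshold rises, which is why such a plot decreases from left to right.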
Frequency for each of the eight logic function types observed
CORRELATIONS WITH PHENOTYPES: TEMPERATURE RANGE
• For U > 0.8, one relationship between proteins was found:
Hyperthermophilicity = and( COG0432, !COG0225 )
U ( Hyp. | COG0432 ) = 0.26
U ( Hyp. | COG0225 ) = 0.29
U ( Hyp. | and( COG0432, !COG0225 ) ) = 0.71
[S] COG0432: Uncharacterized conserved protein
[O] COG0225: Peptide methionine sulfoxide reductase
LOGICAL COMBINATION OF COG PROFILES MATCHING A PHENOTYPE PROFILE
c = hyperthermophilicity
f = and( COG0432, !COG0225 )
a = COG0432 (Uncharacterized conserved protein)
b = !COG0225 (Peptide methionine sulfoxide reductase)
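To illustrate why a combination such as and(a, !b) can match a phenotype better than either profile alone, here is a toy case over eight made-up genomes. The profiles are invented for illustration and are not the real COG0432, COG0225, or hyperthermophilicity profiles, so the scores differ from those reported in the talk.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x); 0.0 if H(x) is zero (assumption)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0
    return (h_x + entropy(y) - entropy(list(zip(x, y)))) / h_x


# Invented presence/absence profiles over eight hypothetical genomes.
cog_present = [1, 1, 1, 1, 0, 0, 0, 0]  # plays the role of COG0432
cog_absent = [1, 1, 0, 0, 1, 1, 0, 0]   # plays the role of COG0225
combined = [p & (1 - q) for p, q in zip(cog_present, cog_absent)]  # and(a, !b)
phenotype = [0, 0, 1, 1, 0, 0, 0, 0]    # plays the role of the phenotype c

u_a = uncertainty_coefficient(phenotype, cog_present)
u_b = uncertainty_coefficient(phenotype, cog_absent)
u_f = uncertainty_coefficient(phenotype, combined)
```

In this toy case each single profile removes well under half of the uncertainty about the phenotype, while the combined profile removes all of it.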
CONCLUSIONS
• There may be a correlation between the absence of methionine sulfoxide reductase and the presence of an uncharacterized conserved protein in hyperthermophiles.
CONCLUSIONS
– Classified ~80,000 proteins from 354 bacterial genomes into ~4,600 COGs
– Built COG and phenotype profile matrices for 354 fully sequenced bacterial genomes
– Results support the biological significance of ternary relationships among COGs
– Results suggest that some logic types occur in biology more often than others: 1 (and)
57 (xor)
FUTURE DIRECTIONS
• Build a richer database of phenotype profiles
• Investigate relationships at lower cutoffs
• Experimentally characterize the unknown COG0432 by crystallography
ACKNOWLEDGEMENTS
Todd Yeates
Matteo Pellegrini
Yeates lab
Morgan Beeby
Brian O’Connor
Rest of the lab
SoCalBSI 2006
Jamil Momand
Wendie Johnston
Sandra Sharp
Nancy Warter-Perez
Ronnie Cheng
Fellow participants