It & Health 2009 Summary
Thomas Nordahl Petersen
Teachers
Thomas Nordahl Petersen
Rasmus Wernersson
Lisbeth Nielsen Fink
Anders Gorm Pedersen
Bent Petersen
Ramneek Gupta
Thomas Blicher
Outline of the course
• Topics will cover a general introduction to bioinformatics
  – Evolution
  – DNA / Protein
  – Alignment and scoring matrices
    • How does it work & what are the numbers
  – Visualization of multiple alignments
    • Phylogenetic trees and logo plots
  – Commonly used databases
    • UniProt/GenBank & Genome browsers
  – Protein 3D-structure
  – Artificial neural networks & case stories
  – Practical use of bioinformatics tools
• Preparation for exam
Topics covered - (some of them)
Information flow in biological systems
Amino Acids
Amine and carboxyl groups. The side chain ‘R’ is attached to the alpha carbon (C-alpha).
The amino acids found in living organisms are L-amino acids.
Amino Acids - peptide bond
N-terminal C-terminal
1- and 3-letter codes
1. There are 20 naturally occurring amino acids
2. Normally the one- or three-letter codes are used
Ala - A   Cys - C   Asp - D   Glu - E   Phe - F
Gly - G   His - H   Ile - I   Lys - K   Leu - L
Met - M   Asn - N   Pro - P   Gln - Q   Arg - R
Ser - S   Thr - T   Val - V   Trp - W   Tyr - Y
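The code table above can be written as a lookup. This is a minimal Python sketch; the function names `to_one_letter` and `to_three_letter` are made up for illustration and are not part of the course material.

```python
# Three-letter to one-letter amino acid codes, exactly as listed in the table.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
# Reverse lookup: one-letter code back to three-letter code.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter codes (any case) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert a one-letter sequence string to a list of three-letter codes."""
    return [ONE_TO_THREE[c] for c in sequence.upper()]


print(to_one_letter(["Met", "Ala", "Lys"]))
print(to_three_letter("NVIF"))
```

Codes outside the 20 standard amino acids raise a `KeyError` in this sketch, which is one simple way to catch typos in a sequence.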
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Theory of evolution
Charles Darwin, 1809-1882
Phylogenetic tree
Global versus local alignments
Global alignment: align full length of both sequences. (The “Needleman-Wunsch” algorithm).
Local alignment: find best partial alignment of two sequences (the “Smith-Waterman” algorithm).
Global alignment
Seq 1
Seq 2
Local alignment
Pairwise alignment: the solution
“Dynamic programming” (the Needleman-Wunsch algorithm)
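To illustrate the dynamic-programming idea, here is a minimal Python sketch that fills the score matrix for a global (Needleman-Wunsch style) or local (Smith-Waterman style) alignment and returns only the best score, without traceback. The scoring values (match +1, mismatch -1, linear gap -2) and the function name `alignment_score` are assumptions chosen for illustration; in practice a substitution matrix such as BLOSUM62 would supply the match/mismatch scores.

```python
def alignment_score(seq1, seq2, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming (no traceback)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + s,  # align the two residues
                score[i - 1][j] + gap,    # gap in seq2
                score[i][j - 1] + gap,    # gap in seq1
            )
            if local:
                # Local alignment: never go below zero, track the best cell.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences (bottom-right cell).
    # Local: best-scoring partial alignment anywhere in the matrix.
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                        # global
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))   # local
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from.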
Sequence alignment - BLAST
BLOSUM & PAM matrices
• BLOSUM matrices are the most commonly used substitution matrices.
• BLOSUM50, BLOSUM62, BLOSUM80
• PAM - Point Accepted Mutation (often expanded as Percent Accepted Mutations)
• PAM-0 is the identity matrix.
• PAM-1: diagonal entries deviate slightly from 1, off-diagonal entries deviate slightly from 0.
• PAM-250 is PAM-1 raised to the 250th power (repeated matrix multiplication).
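The relation between PAM-0, PAM-1 and PAM-250 can be sketched with plain matrix powers. The 3x3 matrix below is a hypothetical toy example over a three-letter alphabet, not real PAM data (real PAM matrices are 20x20), and the helper names `mat_mul` and `mat_pow` are made up for this illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    # Power 0 is the identity matrix (the "PAM-0" case).
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical "PAM-1"-like matrix: diagonal close to 1, off-diagonal close to 0.
# Each row sums to 1 (probability of each letter staying or mutating).
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

pam0 = mat_pow(TOY_PAM1, 0)
pam250 = mat_pow(TOY_PAM1, 250)
print(pam0)
print(pam250)
```

After 250 steps the diagonal is much smaller than in the one-step matrix, which reflects how the chance of a position remaining unchanged decays with evolutionary distance.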
Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK
Log-odds scores
• BLOSUM is a log-odds (log-likelihood ratio) matrix:
• The likelihood of observing j given you have i is
  – $P(j|i) = P_{ij}/Q_i$, where $Q_i$ is the frequency of i
• The prior probability of observing j is
  – $Q_j$, which is simply the frequency of j
• The log-odds score is
  – $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$
  – where $\log_2(x) = \ln(x)/\ln(2)$
  – S has been normalized to half bits, hence the factor 2
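As a worked example of the log-odds formula, the sketch below computes a score in half bits from a pair frequency and two background frequencies. The numbers are invented for illustration and are not taken from any real BLOSUM matrix; the function name `log_odds_half_bits` is likewise an assumption.

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """Log-odds score S_ij = 2 * log2(P_ij / (Q_i * Q_j)), in half bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Hypothetical frequencies: background frequency 0.25 for both residues.
q_i = q_j = 0.25

# Pair seen twice as often as expected by chance -> positive score.
print(log_odds_half_bits(0.125, q_i, q_j))
# Pair seen exactly as often as expected by chance -> score 0.
print(log_odds_half_bits(0.0625, q_i, q_j))
# Pair seen half as often as expected by chance -> negative score.
print(log_odds_half_bits(0.03125, q_i, q_j))
```

A positive score means the pair is observed more often in alignments than chance predicts, and a negative score means it is observed less often.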
BLAST Exercise
Genome browsers - UCSC
Intron - Exon structure
Single Nucleotide Polymorphism - SNP
SNPs
Protein 3D-structure
Protein structure
Primary structure: Amino acid sequence
Secondary structure: Helix/Beta sheet
Tertiary structure: Fold, 3D coordinates
Protein structure - helix
3-10 helix: 3 residues/turn - few, but not uncommon
α-helix: 3.6 residues/turn - by far the most common helix
Pi-helix: 4.1 residues/turn - very rare
Protein structure - β-strand/sheet
Protein folds
Class: the 4th class is ‘few secondary structures’
Architecture: Overall shape of a domain
Topology: Shared secondary structure connectivity
Protein 3D-structure
Neural Networks: From knowledge to information
Protein sequence -> Biological feature
• A data-driven method to predict a feature, given a set of training data
• In biology, input features could be amino acid or nucleotide sequences
• Secondary structure prediction
• Signal peptide prediction
• Surface accessibility
• Propeptide prediction
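One common way to turn an amino acid sequence into numeric network input is a sliding window with one-hot (sparse) encoding. The slides do not specify an encoding, so the sketch below is an assumption for illustration only; the window size, the padding symbol `X` and the function names are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes


def one_hot(residue):
    """20-element vector with a single 1; unknown or padding residues give all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence, window=3):
    """Encode each position as the one-hot vectors of its surrounding window.

    Assumes an odd window size. Sequence ends are padded with 'X',
    which encodes to all zeros.
    """
    half = window // 2
    padded = "X" * half + sequence + "X" * half
    encoded = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            vector.extend(one_hot(residue))
        encoded.append(vector)
    return encoded


windows = encode_windows("NVIF", window=3)
print(len(windows), len(windows[0]))
```

Each position yields `window * 20` input values, and a network would be trained to map such a vector to the feature of the central residue (for example, helix or not helix).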
Use of artificial neural networks
N-terminal | Signal peptide | Propeptide | Mature/active protein | C-terminal
Prediction of biological features: Surface accessibility
Predict surface accessibility from the amino acid sequence only.
Logo plots
Information content: how is it calculated and what does it mean?
Logo plots - Information Content
Sequence-logo
Calculate Information Content
$I = \sum_a p_a \log_2 p_a + \log_2(4)$; the maximal value is 2 bits
• The total height at a position is the ‘Information Content’ measured in bits.
• The height of each letter is proportional to the frequency of that letter.
• A Logo plot is a visualization of a multiple alignment.
~0.5 each
Completely conserved
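The information content of one alignment column can be computed directly from the formula for I. This is a minimal Python sketch for DNA (alphabet size 4, so the maximum is 2 bits); the function names and the example columns are made up for illustration.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    counts = Counter(column)
    total = len(column)
    # Letters that do not occur contribute nothing to the sum.
    entropy_term = sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Height of each letter in the logo: its frequency times the column's I."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {letter: (n / total) * info for letter, n in Counter(column).items()}


print(information_content("AAAA"))  # completely conserved column
print(information_content("AATT"))  # two letters at frequency 0.5 each
print(information_content("ACGT"))  # all four letters equally frequent
print(letter_heights("AATT"))
```

A completely conserved column reaches the maximum of 2 bits, a column split evenly between two letters carries 1 bit (each letter drawn 0.5 bits high), and a column with all four letters at equal frequency carries no information. For protein alignments the same formula applies with `alphabet_size=20`.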