Computational searches of biological sequences

Upload: quana

Post on 11-Jan-2016

DESCRIPTION

Computational searches of biological sequences. Basic concepts. Homology and other evolutionary relationships (paralogs, orthologs, xenologs). Codon usage bias, CAI and expressivity. Microarrays and statistical approaches for their analysis. Description of existing programs.

TRANSCRIPT

Page 1: Computational searches of  biological sequences

Computational searches of biological sequences

Page 2: Computational searches of  biological sequences

Basic concepts

Homology and other evolutionary relationships (paralogs, orthologs, xenologs)

Codon usage bias, CAI and expressivity

Microarrays and statistical approaches for their analysis

Page 3: Computational searches of  biological sequences

Description of existing programs

BLAST (pairwise sequence comparison)

MEME/MAST (identification of over-represented motifs)

Page 4: Computational searches of  biological sequences

Problems to solve

1. Minimal set of genes for life
2. Prediction of bacterial operons
3. Expressivity in transcriptional units
4. Conservation of expressivity between organisms
5. Identification of horizontally transferred genes in H. pylori
6. Regulation by glucose in E. coli

Page 5: Computational searches of  biological sequences

Substitution matrix for amino acids

PAM 250

    C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
C  12
S   0   2
T  -1   2   3
P  -3   0   0   8
A   1   1   1   0   2
G  -2   0  -1  -2   1   7
N  -2   1   1  -1   0   0   4
D  -3   1   0  -1   0   0   2   5
E  -3   0   0  -1   0  -1   1   3   4
Q  -2   0   0   0   0  -1   1   1   2   3
H  -1   0   0  -1  -1  -1   1   0   0   1   6
R  -2   0   0  -1  -1  -1   0   0   0   2   1   5
K  -3   0   0  -1   0  -1   1   1   1   2   1   3   3
M  -1  -1  -1  -2  -1  -4  -2  -3  -2  -1  -1  -2  -1   4
I  -1  -2  -1  -3  -1  -5  -3  -4  -3  -2  -2  -2  -2   3   4
L  -2  -2  -1  -2  -1  -4  -3  -4  -3  -2  -2  -2  -2   3   3   4
V   0  -1   0  -2   0  -3  -2  -3  -2  -2  -2  -2  -2   2   3   2   3
F  -1  -3  -2  -4  -2  -5  -3  -5  -4  -3   0  -3  -3   2   1   2   0   7
Y  -1  -2  -2  -3  -2  -4  -1  -3  -3  -2   2  -2  -2   0  -1   0  -1   5   8
W  -1  -3  -4  -5  -4  -4  -4  -5  -4  -3  -1  -2  -4  -1  -2  -1  -3   4   4  14

Example: scoring an aligned pair of fragments with this matrix

  A   G   G   I   D   G
  G   H   G   F   M   G
  1  -1   7   1  -3   7
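A minimal sketch of how a substitution matrix is applied: the fragments AGGIDG and GHGFMG from this slide are scored position by position and the scores are summed. Only the five matrix entries needed for this pair are copied from the table; the names `SCORES`, `pair_score` and `alignment_score` are hypothetical, not part of any program described in the talk.

```python
# Entries copied from the PAM 250 table on this slide (only the pairs used here).
# Keys are stored in alphabetical order because the matrix is symmetric.
SCORES = {
    ("A", "G"): 1,
    ("G", "H"): -1,
    ("G", "G"): 7,
    ("F", "I"): 1,
    ("D", "M"): -3,
}


def pair_score(a, b):
    """Look up the substitution score for residues a and b."""
    return SCORES[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2):
    """Sum the pair scores of an ungapped alignment of equal-length fragments."""
    if len(seq1) != len(seq2):
        raise ValueError("fragments must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


per_position = [pair_score(a, b) for a, b in zip("AGGIDG", "GHGFMG")]
total = alignment_score("AGGIDG", "GHGFMG")
print(per_position, total)
```

The same value is used for a given pair of residues wherever it occurs in the alignment, which is the limitation that position-specific matrices address.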

Page 6: Computational searches of  biological sequences

Unitary matrix for DNA sequences

    A   C   G   T
A   1   0   0   0
C   0   1   0   0
G   0   0   1   0
T   0   0   0   1

Page 7: Computational searches of  biological sequences

In either case, the values obtained in the comparison are the same along the entire alignment.

It is well known that some residues in a protein or nucleotide sequence play important roles and are therefore constrained in how much they can vary.

Page 8: Computational searches of  biological sequences

These conserved regions constitute motifs, which can sometimes be recognized in a set of aligned sequences.

SCK1_CENEL/13-36   CLKPCKDLYGPHAGAKCMNGKCKC
SCKM_CENMA/13-36   CLPPCKAQFGQSAGAKCMNGKCKC
SCT2_ANDAU/35-57   CASVCRRVIGVAAG-KCINGRCVC
SCK3_ANDMA/13-35   CASVCRKVIGVAAG-KCINGRCVC
SCK4_MESMA/35-57   CASVCRREIGVAAG-KCINGKCVC
SCKK_TITSE/35-57   CYSACKKLVGKATG-KCTNGRCDC
SCK2_TITDI/14-36   CVKICIDRYNTRGA-KCINGRCTC
SCKP3_TITSE/7-28   CNRKCCPG-GCRSG-KCINGKCQC
SCBX_MESMA/8-29    CRVKCVAM-GFSSG-KCINSKCKC
SCKL_LEIQH/8-28    CQLSCRSL-GL-LG-KCIGDKCEC
SCK5_ANDMA/8-28    CQLSCRSL-GL-LG-KCIGVKCEC
SCK1_CENNO/36-57   CDKDCKRR-GYRSG-KCINNACKC
SCK2_CENNO/8-29    CDKDCTSR-KYRSG-KCINNACKC
 * * . ** . *

Page 9: Computational searches of  biological sequences

What is a motif in a biological sequence?

A motif represents a conserved region of a sequence.

This conservation might be due to a functional constraint.

There are conserved structural domains in a family of proteins. Amino acid sequences can almost always represent such motifs.

Motif identification is useful to classify and understand protein or nucleotide function.

Page 10: Computational searches of  biological sequences

Example of a protein motif.

Motifs can be represented by Weight Matrices:

Page 11: Computational searches of  biological sequences

Example of an RNA motif.

Motifs can be represented by Weight Matrices:

Page 12: Computational searches of  biological sequences

Example of a DNA motif.

Motifs can be represented by Weight Matrices:

Page 13: Computational searches of  biological sequences

How can we obtain a Weight Matrix for a specific motif?

... by evaluating the relative frequency of its elements at each position in a set of aligned sequences.
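The counting step can be sketched as follows. This is an illustration under stated assumptions: the four aligned DNA sites are invented for the example, and `frequency_matrix` is a hypothetical helper, not a function of any program mentioned in the talk.

```python
from collections import Counter


def frequency_matrix(aligned, alphabet):
    """Return one dict per alignment column: symbol -> relative frequency."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)  # symbols seen in column i
        matrix.append({r: counts[r] / len(aligned) for r in alphabet})
    return matrix


# Hypothetical aligned sites, used only to illustrate the calculation.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
freqs = frequency_matrix(sites, "ACGT")
for i, column in enumerate(freqs, start=1):
    print(i, column)
```

Columns where one symbol has a frequency close to 1 are the strongly conserved positions of the motif.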

Page 14: Computational searches of  biological sequences

This frequency matrix contains relevant biological information about your protein and can be used to obtain a:

Position-Specific Score Matrix (PSSM)

Page 15: Computational searches of  biological sequences

Position-Specific Score Matrix (PSSM)

While PAM and BLOSUM matrices are used to compare two amino acids of a pair of sequences regardless of their position in the aligned sequences, a PSSM analysis uses a different matrix in which the score varies depending on the conservation of each position of the aligned sequences.

Page 16: Computational searches of  biological sequences
Page 17: Computational searches of  biological sequences

Serine protease

Page 18: Computational searches of  biological sequences

Position-Specific Score Matrix (PSSM)

In practice, the frequencies are not used directly to score putative sites. The score assigned to a piece of sequence, S, is calculated as the log-ratio of two probabilities:

P(S|M), the probability of observing sequence S given the motif model M (the matrix).

P(S|B), the probability of observing sequence S given the background model B (the genomic context).

The score of a sequence segment is W_S = log[ P(S|M) / P(S|B) ]
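A minimal sketch of the log-ratio score, assuming that positions are independent, so that log[P(S|M)/P(S|B)] is the sum of one log-ratio per position. The aligned sites, the uniform background, the base-2 logarithm and the pseudocount are all assumptions made for the example; `build_pssm` and `score` are hypothetical names.

```python
import math


def build_pssm(aligned, alphabet, background, pseudocount=1.0):
    """Per-position log2 ratios of motif frequency to background frequency."""
    n = len(aligned)
    pssm = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned]
        row = {}
        for r in alphabet:
            # The pseudocount (an assumption, not from the slides) avoids log(0)
            # for symbols never observed at this position.
            p = (column.count(r) + pseudocount * background[r]) / (n + pseudocount)
            row[r] = math.log2(p / background[r])
        pssm.append(row)
    return pssm


def score(segment, pssm):
    """W_S = log[P(S|M) / P(S|B)], computed as a sum over positions."""
    return sum(pssm[i][r] for i, r in enumerate(segment))


sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]        # hypothetical sites
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
pssm = build_pssm(sites, "ACGT", background)
print(score("TATAAT", pssm), score("GGGGGG", pssm))
```

A positive score means the segment is more likely under the motif model than under the background; a negative score means the opposite.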

Page 19: Computational searches of  biological sequences

Different programs have been developed to find motifs

 1 AKSJDFHLASUHERLAKSNBKAJNCLKJASHDKFJAHSEJ
 2 DLKTJNKHBHEASHRGHBDFASJGHBCLKUSHKLCSDHGK
 3 GNLKXDHKIASGCSDKJCSKHDGKJELHBHEAJFNLOIJS
 4 JHSLRCKJGHXBDKSLCFALSIZDNGJDFGNLCKJSDNSD
 5 LKSAJDHBFCKGLSHBHEAUABSXDJKFASODFHBHKAHS
 6 JSHGHAEKHKSDFJHKSJDFHKAJSEHRKAJHBHEAPERI
 7 QWHBHEACVLXMNCVKUIEHRMBDKFJAHLIDHRTRKKQP
 8 LICVUWJENOMNVIDFGKJERJSGFAHGSIUOPIAKHVIU
 9 OIEURTKSHOIUCVBSDFGUYWERKJHDFLIUHBHEAERT
10 OIUWERMXCVKJHBHEAWIERUOIUVMBNAWIUEYRHASS

Page 20: Computational searches of  biological sequences

…..if the alignment is not an option?

Page 21: Computational searches of  biological sequences

How do they work?

A) By counting all the "words" of a certain length and identifying those that are most frequent and statistically significant.

B) Stochastically, by taking randomly chosen fragments and evaluating whether these fragments generate a conserved, representative motif (Gibbs sampler algorithm).
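Approach A can be sketched with the ten unaligned example sequences shown earlier in the talk. The sketch counts, for every word of length k, the number of sequences that contain it; the word length k = 5 is an assumption chosen for the example, the statistical significance step is not implemented, and `most_shared_words` is a hypothetical name.

```python
from collections import Counter

# The ten example sequences from the slide "Different programs have been
# developed to find motifs".
SEQUENCES = [
    "AKSJDFHLASUHERLAKSNBKAJNCLKJASHDKFJAHSEJ",
    "DLKTJNKHBHEASHRGHBDFASJGHBCLKUSHKLCSDHGK",
    "GNLKXDHKIASGCSDKJCSKHDGKJELHBHEAJFNLOIJS",
    "JHSLRCKJGHXBDKSLCFALSIZDNGJDFGNLCKJSDNSD",
    "LKSAJDHBFCKGLSHBHEAUABSXDJKFASODFHBHKAHS",
    "JSHGHAEKHKSDFJHKSJDFHKAJSEHRKAJHBHEAPERI",
    "QWHBHEACVLXMNCVKUIEHRMBDKFJAHLIDHRTRKKQP",
    "LICVUWJENOMNVIDFGKJERJSGFAHGSIUOPIAKHVIU",
    "OIEURTKSHOIUCVBSDFGUYWERKJHDFLIUHBHEAERT",
    "OIUWERMXCVKJHBHEAWIERUOIUVMBNAWIUEYRHASS",
]


def most_shared_words(sequences, k):
    """For each word of length k, count how many sequences contain it."""
    counts = Counter()
    for seq in sequences:
        # A set, so that a word repeated within one sequence is counted once.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    return counts.most_common()


top_word, top_count = most_shared_words(SEQUENCES, 5)[0]
print(top_word, top_count)
```

On these sequences the most widely shared 5-letter word is HBHEA, present in 7 of the 10 sequences, even though its position differs from one sequence to the next.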

Page 22: Computational searches of  biological sequences

Gibbs sampler algorithm
Multiple Local Alignment (MLA)

Page 23: Computational searches of  biological sequences

Positional-Probabilistic Model (PPM) and background

We divide (mark) a sequence into the motif site (occurrence), which is described by a positional probability matrix q(i,r), and the background, which is described by background symbol probabilities f(i).

r is a nucleotide (a residue), r ∈ {A,T,G,C}

i is a position in the site, i = 1..s, where s is the motif length

Page 24: Computational searches of  biological sequences

What is a motif

Two probabilistic models, foreground (the motif) and background, are formulated.

We classify (mark) all the input sequences into the parts generated by each of these two models.

Page 25: Computational searches of  biological sequences

A Gibbs sampling step

1. Motif and background base counters are computed from all the sequence fragments except the current one.

2. Statistical models for the background and for the motif are formed using the counters.

3. The probability distribution of the new site position, or of its absence in the current sequence, is derived from the statistical models and the content of the current sequence.

4. A new site location is sampled from the distribution.

The current sequence
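The sampling step can be sketched in a few lines. This is a simplified illustration and not the implementation of any program in the talk: it assumes exactly one site per sequence (the "absence" case is left out), a fixed motif width, and a pseudocount of 1 on every counter. The names `gibbs_step` and `gibbs_sampler` are hypothetical.

```python
import random


def gibbs_step(seqs, positions, current, width, alphabet, rng, pseudo=1.0):
    """Resample the site start in seqs[current], holding the other sites fixed."""
    # Counters built from every sequence except the current one.
    motif = [{r: pseudo for r in alphabet} for _ in range(width)]
    background = {r: pseudo for r in alphabet}
    for k, seq in enumerate(seqs):
        if k == current:
            continue
        start = positions[k]
        for j, r in enumerate(seq):
            if start <= j < start + width:
                motif[j - start][r] += 1
            else:
                background[r] += 1
    # Statistical models formed from the counters.
    q = [{r: col[r] / sum(col.values()) for r in alphabet} for col in motif]
    total = sum(background.values())
    f = {r: background[r] / total for r in alphabet}
    # Weight of each candidate start: motif likelihood over background likelihood.
    seq = seqs[current]
    weights = []
    for start in range(len(seq) - width + 1):
        w = 1.0
        for i in range(width):
            r = seq[start + i]
            w *= q[i][r] / f[r]
        weights.append(w)
    # A new site location is sampled in proportion to the weights.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def gibbs_sampler(seqs, width, alphabet, iterations=200, seed=0):
    """Start from random sites and repeatedly resample one sequence at a time."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for current in range(len(seqs)):
            positions[current] = gibbs_step(
                seqs, positions, current, width, alphabet, rng
            )
    return positions


example = ["ACGTTGACCA", "TTGACGGTAC", "GGTTGACATC"]  # hypothetical input
print(gibbs_sampler(example, 5, "ACGT", iterations=50))
```

Because the new position is sampled rather than chosen as the maximum, the procedure can escape poor starting alignments; the result depends on the random seed, so several runs are normally compared.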

Page 26: Computational searches of  biological sequences

…..if the alignment is not an option?

Gibbs sampler algorithm

Page 27: Computational searches of  biological sequences

Gibbs sampler algorithm

The motif site (occurrence) is described by a positional probability matrix q(i,r).

The background is described by background symbol probabilities f(i).

Two probabilistic models are formulated: the foreground model (the motif) and the background model.

Page 28: Computational searches of  biological sequences

Gibbs sampler algorithm

A probability distribution (where the foreground and background models are different) can be evaluated

Page 29: Computational searches of  biological sequences

A complete statistical description of the method is not in the scope of this talk

Page 30: Computational searches of  biological sequences

The main idea of the method ...

One of the sequences, chosen randomly, is removed from the alignment.

A probability distribution profile is evaluated from the remaining sequences.

Page 31: Computational searches of  biological sequences

The main idea of the method ...

... and it is replaced by a new segment, found by searching the removed sequence with the previous motif profile.

A new probability distribution profile is then evaluated.

Page 32: Computational searches of  biological sequences

The main idea of the method ...

After several cycles, the method tends to identify a significant motif.

Page 33: Computational searches of  biological sequences

GLAM2 is a software package for finding motifs in sequences, typically amino-acid or nucleotide sequences.

The main innovation of GLAM2 is that it allows insertions and deletions in motifs.

The package includes these programs:

* glam2 - for discovering motifs shared by a set of sequences.

* glam2scan - for finding matches, in a sequence database, to a motif discovered by glam2.

* glam2format - for converting glam2 motifs to standard alignment formats.

* glam2mask - for masking glam2 motifs out of sequences, so that weaker motifs can be found.

* purge - for removing highly similar members of a set of sequences.

http://bioinformatics.org.au/glam2/doc/

Page 34: Computational searches of  biological sequences

http://meme.sdsc.edu/meme4/cgi-bin/glam2.cgi

Page 35: Computational searches of  biological sequences

Basic usage

Running glam2 without any arguments gives a usage message:

    Usage: glam2 [options] alphabet my_seqs.fa
    Main alphabets: p = proteins, n = nucleotides
    Main options (default settings):
    -h: show all options and their default settings
    -o: output file (stdout)
    -r: number of alignment runs (10)
    -n: end each run after this many iterations without improvement (10000)
    -2: examine both strands - forward and reverse complement
    -z: minimum number of sequences in the alignment (2)
    -a: minimum number of aligned columns (2)
    -b: maximum number of aligned columns (50)
    -w: initial number of aligned columns (20)

The main input to glam2 is a file of sequences in FASTA format:

    >MyFirstSequence
    GHYWVVCTGGGACH
    >My2ndSequence
    LLIGGPWVWWADDDF
    (etc.)

You need to tell glam2 which alphabet to use:

    glam2 p my_prots.fa
    glam2 n my_nucs.fa

Use -o to write the output to a file rather than to the screen:

    glam2 -o my_prots.glam2 p my_prots.fa

Page 36: Computational searches of  biological sequences

How it works

glam2 starts from a random alignment and makes many small, random changes to it, which are designed to find high-scoring alignments in the long run. The longer you let it run, the more likely it is to find a maximal-scoring alignment.

To check that a reproducible, high-scoring motif has been found, the whole procedure is run several (e.g. 10) times from different starting alignments. If all runs produce identical alignments, we have maximum confidence that this is the optimal motif.

If a few of the runs produce different, lower-scoring motifs, we still have high confidence.

If all the runs produce completely different alignments, we have low confidence, and the run-length needs to be increased.

Page 37: Computational searches of  biological sequences

MEME: Multiple Expectation maximization for Motif Elicitation

MAIN DIFFERENCES

Try many different initial segments in order to get one that converges to an optimum.

Try different window analysis sizes.

In order to generate a motif with gaps, more than one motif can be generated.

Page 38: Computational searches of  biological sequences

MEME: Multiple Expectation maximization for Motif Elicitation

A very useful program to discover patterns

Motif discovery from unaligned sequences
* Optimal for genomic or protein sequences
* Especially if you do not know the size of the motif

Identifies profile motifs
* Simultaneously analyzes multiple motifs for any input

Flexible model of motif presence
* Motif can be absent in some sequences
* Motif can appear several times in one sequence

Page 39: Computational searches of  biological sequences

The input to MEME contains the following fields.

Sequences

Protein or DNA sequences (do not mix the two) in FASTA format.

Note that sequence names must not be repeated.

Valid examples are:

>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>GdhA Glutamate dehydrogenase from Escherichia coli
RDMFCPGGCPDVKPVGDHDLSAFKGAWHELAL
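Since repeated sequence names are not accepted, it can help to check a FASTA file before submitting it. The sketch below is an assumption-laden helper, not part of MEME: it treats the first word after ">" as the sequence name, and `read_fasta` and `duplicated_names` are hypothetical names.

```python
EXAMPLE = """\
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>GdhA Glutamate dehydrogenase from Escherichia coli
RDMFCPGGCPDVKPVGDHDLSAFKGAWHELAL
"""


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""  # first word is taken as the name
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def duplicated_names(records):
    """Return the names that occur more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for name, _ in records:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


records = read_fasta(EXAMPLE)
print([name for name, _ in records], duplicated_names(records))
```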

Page 40: Computational searches of  biological sequences

Motif distribution

One per sequence (oops)

Zero or one per sequence (zoops)

Any number of repetitions (tcm)

Number of motifs [optional]

The program will stop the analysis after this number of motifs is found.

Number of sites (Minimum or Maximum sites) (<= 300)

Minimum sites = 5 Maximum sites = 8

Motif width .

MEME will find the optimum width of each motif within the limits you specify (minimum and maximum).

MEME input, continued

Page 41: Computational searches of  biological sequences

Text output format

By default, MEME output is in hypertext (HTML) format.

Shuffle letters in input sequences

Useful for further statistical analysis

Look for palindromes only

Average the letter frequencies in corresponding motif columns together

MEME input, continued

Page 42: Computational searches of  biological sequences

http://meme.sdsc.edu/meme/meme.html

Page 43: Computational searches of  biological sequences

MEME Output

Motif length

# of motifs found

Expectation value

Position-Specific Probability Matrix

Information content

Consensus

Page 44: Computational searches of  biological sequences

Sequence names

Strand (reverse or complement)

Position in sequence

Statistical significance

Motif within sequence

MEME Output

Page 45: Computational searches of  biological sequences

Overall Statistical significance

Sequence length

Motif in complement strand

MEME Output

Page 46: Computational searches of  biological sequences

MAST

Searches for motifs (one or more) in sequence databases:
* Like BLAST, but motifs are used as input
* Similar to the matrices obtained by iteration in PSI-BLAST

Profile defines statistical significance of a match:
* Multiple motif matches per sequence
* Combined E value for all motifs

MEME uses MAST to summarize results:
* Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.

Page 47: Computational searches of  biological sequences

Email address

Database (like BLAST)

Motif file (e.g. MEME output)

Consider matched sequence length

E value threshold

MAST input

Page 48: Computational searches of  biological sequences

Matched accession

Match E value

Length of sequence

Link to GenBank

MAST output

Page 49: Computational searches of  biological sequences

Motif diagram

MAST output

Page 50: Computational searches of  biological sequences

Position of each instance

P value of instance

Matched parts of sequence

Motif ‘consensus’

Motif and orientation

MAST output

Page 51: Computational searches of  biological sequences

Affymetrix GeneChip arrays

High-density oligonucleotide array technology as developed by Affymetrix (www.affymetrix.com)

Overview images courtesy of Affymetrix unless otherwise specified

Page 52: Computational searches of  biological sequences

Probes and Probesets

Page 53: Computational searches of  biological sequences

Sample Preparation

Page 54: Computational searches of  biological sequences

Hybridization to the Chip

Page 55: Computational searches of  biological sequences

The Chip is Scanned

Page 56: Computational searches of  biological sequences

Differential expression: motivation

The task of analysing a gene expression experiment for differential genes falls into the following steps:

(1) Ranking: genes are ranked according to their evidence of differential expression.

(2) Assigning significance: a statistical significance is assigned to each gene.

(3) Cut-off value: to arrive at a limited number of differentially expressed genes, a cut-off value for the statistical significance needs to be determined.

Page 57: Computational searches of  biological sequences

Problems to solve

1. Minimal set of genes for life
2. Prediction of bacterial operons
3. Expressivity in transcriptional units
4. Conservation of expressivity between organisms
5. Identification of horizontally transferred genes in H. pylori
6. Regulation by glucose in E. coli

Page 58: Computational searches of  biological sequences

Microarray data on regulation by glucose in E. coli

ID  B_number       crp1        crp2        wt1         wt2         wtg1        wtg2
                   signal  det signal  det signal  det signal  det signal  det signal  det
1   aas_b2836_at   279.2   P   293.8   P   360.8   P   332.2   P   475.6   P   374.4   P
2   aat_b0885_at   344.9   A   332.5   A   110.7   A   256.8   A   237.1   A   35.2    A
3   abc_b0199_at   697     M   682.4   A   530.5   A   532     A   747.9   M   882.8   A
4   abrB_b0715_at  537.3   P   518.3   P   446.3   A   329.6   M   357.9   P   406.7   A
5   accA_b0185_at  1735.3  P   1083.1  P   1821.7  P   1798.6  P   2416.7  P   1794.5  P
6   accB_b3255_at  2314.2  P   1811.4  P   1598    P   1835.2  P   3182.8  P   2893.6  P
7   accC_b3256_at  2486.2  P   1845.6  P   1165.7  P   1146    P   2861.6  P   2883.3  P
8   accD_b2316_at  1770.3  P   1468.1  P   1592.1  P   1808.5  P   2257.6  P   2873.7  P
9   aceA_b4015_at  632.6   P   790.2   P   3005.5  P   2489    P   474.1   P   521.8   P
10  aceB_b4014_at  282.7   P   394.4   P   1610.5  P   1254.1  P   274.8   A   316.6   A
11  aceE_b0114_at  5060.1  P   3979.2  P   2093.5  P   1665.9  P   7943.2  P   8411.5  P
12  aceF_b0115_at  3981.4  P   2287.3  P   1998.1  P   1425.8  P   5515.2  P   4993.9  P
13  aceK_b4016_at  386.8   P   331     P   534.9   P   442.9   P   395.4   P   395.3   P
14  ackA_b2296_at  2624.8  P   2084.2  P   1225.1  P   1273    P   3624.6  P   2990.2  P
15  acnA_b1276_at  488     P   326.2   P   701.9   P   773.2   P   368.7   A   460.8   A
16  acnB_b0118_at  2417.2  P   2185.7  P   3693    P   2864.5  P   1401.4  P   1786    P

det = detection call: P = reliable, A = doubtful, M = not reliable

Page 59: Computational searches of  biological sequences

Study approaches: microarray technology

Can be complemented with other methods (RT-PCR, for example)

Page 60: Computational searches of  biological sequences

Outliers iteration method

1. The replicates of each experiment are averaged.

2. The results of the study experiments are compared with those of the control experiment. In our example:

a) WT glucose / WT
b) CRP / WT

3. The mean and standard deviation of the data are computed.

Page 61: Computational searches of  biological sequences

Outliers iteration method

4. To identify the most significant genes, we assume an arbitrary cut-off of 2 standard deviations from the mean.

Induced genes

Repressed genes

Page 62: Computational searches of  biological sequences

Outliers iteration method

5. The mean and standard deviation are recalculated from the data that were not considered statistically significant.

Page 63: Computational searches of  biological sequences

Outliers iteration method

6. Again, the genes lying more than 2 standard deviations above or below the mean are considered statistically relevant.

Induced genes

Repressed genes

Page 64: Computational searches of  biological sequences

Outliers iteration method

The procedure is iterated until no values fall outside 2 SD.

Induced genes

Repressed genes

Induced genes

Repressed genes
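The iteration described in steps 3 to 6 can be sketched as below. The input values are invented for the example and stand for one ratio (or log-ratio) per gene, such as WT glucose / WT; the use of the sample standard deviation and the name `iterative_outliers` are assumptions.

```python
import statistics


def iterative_outliers(values, n_sd=2.0):
    """values: dict gene -> ratio. Returns (induced, repressed) gene lists."""
    remaining = dict(values)
    induced, repressed = [], []
    while len(remaining) >= 2:
        mean = statistics.mean(remaining.values())
        sd = statistics.stdev(remaining.values())  # sample SD (an assumption)
        high = [g for g, v in remaining.items() if v > mean + n_sd * sd]
        low = [g for g, v in remaining.items() if v < mean - n_sd * sd]
        if not high and not low:
            break  # no value lies outside the cut-off: stop iterating
        induced.extend(high)
        repressed.extend(low)
        for gene in high + low:
            del remaining[gene]  # the next round uses only the unflagged genes
    return induced, repressed


# Hypothetical data: eight unchanged genes, one strong and one moderate change.
data = {
    "g1": -0.1, "g2": 0.1, "g3": -0.2, "g4": 0.2,
    "g5": 0.0, "g6": 0.1, "g7": -0.1, "g8": 0.0,
    "up": 5.0, "mid": 1.0,
}
induced, repressed = iterative_outliers(data)
print(induced, repressed)
```

In this example the gene "mid" is not flagged in the first round, because the strong outlier "up" inflates the standard deviation; it is only detected once "up" has been removed, which is the reason for iterating.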

Page 65: Computational searches of  biological sequences

Rank (ordering) method

Highlights the most highly expressed genes. Therefore, what matters is not the value in absolute terms but its value relative to the other values on the array.

In principle, genes should keep their position, given the nature of their expression.

Page 66: Computational searches of  biological sequences

Rank (ordering) method

Gene  Value
A     0.3
B     0.6
C     1.4
D     3.2
E     1.1
F     4.3
G     0.9
H     0.8
I     0.4
J     0.7

Pos  Gene  Value
1    F     4.3
2    D     3.2
3    C     1.4
4    E     1.1
5    G     0.9
6    H     0.8
7    J     0.7
8    B     0.6
9    I     0.4
10   A     0.3

Page 67: Computational searches of  biological sequences

Rank (ordering) method

Pos  Gene  Value
1    F     4.3
2    D     3.2
3    C     1.4
4    E     1.1
5    G     0.9
6    H     0.8
7    J     0.7
8    B     0.6
9    I     0.4
10   A     0.3

What is the probability of ending up in first place?

P-first = 1/n = 1/10 = 0.1

What is the probability of ending up in second place?

P-second = 1/n = 1/10 = 0.1

What is the probability of ending up in first or second place?

p = P-first + P-second = 0.2

Page 68: Computational searches of  biological sequences

Rank (ordering) method

Pos  Gene  Value
1    F     4.3
2    D     3.2
3    C     1.4
4    E     1.1
5    G     0.9
6    H     0.8
7    J     0.7
8    B     0.6
9    I     0.4
10   A     0.3

Pos  Gene  Value  Prob
1    F     4.3    0.1
2    D     3.2    0.2
3    C     1.4    0.3
4    E     1.1    0.4
5    G     0.9    0.5
6    H     0.8    0.6
7    J     0.7    0.7
8    B     0.6    0.8
9    I     0.4    0.9
10   A     0.3    1.0
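The Prob column is the probability of reaching a given position or better by chance, that is, position / n. A minimal sketch using the ten values from the slide; ties are not handled, and `rank_probabilities` is a hypothetical name.

```python
def rank_probabilities(values):
    """Sort genes by decreasing value and attach prob = position / n."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    n = len(ordered)
    return [
        (pos, gene, value, pos / n)
        for pos, (gene, value) in enumerate(ordered, start=1)
    ]


# Values taken from the table on the slide.
values = {
    "A": 0.3, "B": 0.6, "C": 1.4, "D": 3.2, "E": 1.1,
    "F": 4.3, "G": 0.9, "H": 0.8, "I": 0.4, "J": 0.7,
}
table = rank_probabilities(values)
for pos, gene, value, prob in table:
    print(pos, gene, value, prob)
```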

Page 69: Computational searches of  biological sequences

Rank (ordering) method

How do we compute the probability when the experiment has replicates?

Geometric mean

1st exp -> 3rd place -> 0.3
2nd exp -> 1st place -> 0.1
3rd exp -> 2nd place -> 0.2

P = (0.3 x 0.1 x 0.2)^(1/3)

P = (0.006)^(1/3)

P ≈ 0.18
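The arithmetic can be checked directly: the product 0.3 x 0.1 x 0.2 is 0.006, and its cube root is about 0.18. The function name `geometric_mean` below is hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean(probabilities):
    """n-th root of the product of n rank probabilities."""
    product = math.prod(probabilities)
    return product ** (1.0 / len(probabilities))


# Rank probabilities of one gene in three replicate experiments.
p = geometric_mean([0.3, 0.1, 0.2])
print(round(p, 3))
```

A gene that ranks near the top in every replicate keeps a small combined probability, while a gene that ranks high in only one replicate does not.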

Page 70: Computational searches of  biological sequences

Problems to solve on microarray analysis

1. Which genes are repressed and induced when E. coli is grown on glucose?

2. Which genes are repressed and induced in the E. coli CRP mutant?

3. What is the overlap between the genes from points 1 and 2?

4. How well do the results of the Outliers and Rank analyses agree?

5. Identify the possible regulatory element common to these genes using GLAM2 and MEME.

Page 71: Computational searches of  biological sequences