Post on 07-Apr-2016
Phylogenetic Inference
Construction of Phylogenetic Trees II
Ana Margarida Sousa
Instituto Gulbenkian de Ciência, Grupo de Biologia Evolutiva
amsousa@igc.gulbenkian.pt
[Figure: true tree with tips A, B, C, D, E, F, G and wt]
[Figure: physical restriction maps of A, B, C, D, E, F and G, marking BclI, Sau3AI, BglII and BamHI sites]
Data I - RESTRICTION DATA
    7   53    1
A         11011100100000111010000001110110101000001110000001110
B         11111010100100111010000001110100101000101110000001010
C         00011000110011111010000001101100101000001100000011010
D         00011000100000011010000011110111101110001100000011010
E         10000001101001111111110100100100111000111101010110100
F         10000110101000011110111111100110101001111101101001011
G         10010000100000111110010100100100101000011101001001010
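As a rough illustration of how a 0/1 restriction-site matrix like the one above can be turned into pairwise distances, the Python sketch below counts the fraction of sites at which two taxa differ. This plain mismatch proportion is an assumption made for illustration only; it is not the model-based distance that restdist computes. The name `mismatch_fraction` is hypothetical.

```python
from itertools import combinations

# Restriction-site matrix from the slide: 7 taxa, 53 sites (1 = site present).
SITES = {
    "A": "11011100100000111010000001110110101000001110000001110",
    "B": "11111010100100111010000001110100101000101110000001010",
    "C": "00011000110011111010000001101100101000001100000011010",
    "D": "00011000100000011010000011110111101110001100000011010",
    "E": "10000001101001111111110100100100111000111101010110100",
    "F": "10000110101000011110111111100110101001111101101001011",
    "G": "10010000100000111110010100100100101000011101001001010",
}


def mismatch_fraction(x: str, y: str) -> float:
    """Fraction of sites whose presence/absence state differs between x and y."""
    if len(x) != len(y):
        raise ValueError("rows must have the same number of sites")
    return sum(a != b for a, b in zip(x, y)) / len(x)


# Illustrative pairwise distances (NOT the restdist model).
distances = {
    (p, q): mismatch_fraction(SITES[p], SITES[q])
    for p, q in combinations(sorted(SITES), 2)
}

for (p, q), d in distances.items():
    print(f"{p}-{q}: {d:.3f}")
```

Seven taxa give 21 pairs. A distance-based method such as UPGMA or NJ would take a table of this shape as input.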
7 553A GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGATTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGATAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCCB GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGAGAACTCATGATCAAGAGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGTTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGTTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCCC GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGTTGGAAGCGTTTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGATCTACGATATGGTAGAATTATAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCCD 
GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTTTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGTCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGACTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCCE GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGACGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATACTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAACAAAAGCGAGCAGAAGGCCATGAGGGTTTCATTGTGAAAGACCCF GGAACATCTGCGTAGACAATACTGCTAACAGTTATTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATATTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAAAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCCG 
GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATACTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAAAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
Data II - NUCLEOTIDE SEQUENCES
Physical maps (restriction data)
  -> Data matrix (0/1)
       -> Distance matrix -> UPGMA, NJ, ME
       -> MP, ML
Nucleotide sequences (alignment)
  -> Distance matrix -> UPGMA, NJ, ME
  -> MP, ML
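To make the "distance matrix -> UPGMA" branch of this workflow concrete, here is a minimal Python sketch of UPGMA clustering. It is a teaching sketch, not the implementation used by PHYLIP's neighbor program. The function name `upgma` and the toy distances are invented for illustration and do not come from the slides.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster taxa by UPGMA and return a Newick string.

    labels: list of taxon names.
    dist:   dict mapping frozenset({a, b}) -> distance for every pair.
    """
    # Active clusters: key -> (newick subtree, number of tips, node height).
    clusters = {name: (name, 1, 0.0) for name in labels}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (tree_a, n_a, h_a), (tree_b, n_b, h_b) = clusters[a], clusters[b]
        height = d[frozenset((a, b))] / 2.0
        merged = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        new = f"({a},{b})"
        del clusters[a], clusters[b]
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[new] = (merged, n_a + n_b, height)
    (tree, _, _), = clusters.values()
    return tree + ";"


# Toy distances, invented for illustration.
toy = {
    frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("AD"): 10.0,
    frozenset("BC"): 6.0, frozenset("BD"): 10.0, frozenset("CD"): 10.0,
}
print(upgma(["A", "B", "C", "D"], toy))
```

UPGMA assumes a constant rate of change along all lineages. NJ and ME drop that assumption, which is one reason the workflow lists all three.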
Boolean data (0/1)
Method                 Program
Distance calculation   restdist
UPGMA                  neighbor
NJ                     neighbor
ME                     fitch
MP                     pars
ML                     restml
Sequence data
Method                 Program
Distance calculation   dnadist
UPGMA                  neighbor
NJ                     neighbor
ME                     fitch
MP                     dnapars
ML                     dnaml
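For sequence data, the distance calculation step corrects the observed proportion of differing sites for multiple substitutions. The Python sketch below uses the Jukes-Cantor (JC69) correction as an example. dnadist offers Jukes-Cantor as one of its options, but its default is a different model, so read this as an illustration of the idea rather than a reproduction of dnadist output. The function names and the short test sequences are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions that differ, ignoring gaps and ambiguity codes."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Invented example: one difference in eight aligned positions.
print(p_distance("ACGTACGT", "ACGTACGA"))
print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least as large as the raw proportion, and the two diverge more as the sequences become more different.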
BOOTSTRAP
Data matrix (0/1)
  -> Generate 100 pseudo-replicates
  -> 100 distance matrices
  -> 100 NJ trees
  -> Majority-rule consensus tree
Nucleotide sequences (alignment)
  -> Generate 100 pseudo-replicates
  -> 100 NJ trees
  -> Majority-rule consensus tree
Method               Program
Pseudo-replicates    seqboot
Consensus tree       consens
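The pseudo-replicates in the bootstrap workflow are built by resampling the columns (characters) of the data matrix with replacement, keeping the number of columns fixed. The Python sketch below shows that resampling step only. It is not the seqboot program, and the names `bootstrap_replicate`, `bootstrap` and the toy matrix are assumptions for illustration.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Resample the columns of a character matrix with replacement.

    matrix: dict mapping taxon name -> string of characters (all the same length).
    """
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {name: "".join(row[i] for i in columns)
            for name, row in matrix.items()}


def bootstrap(matrix, n_replicates=100, seed=1):
    """Return a list of pseudo-replicate matrices."""
    rng = random.Random(seed)
    return [bootstrap_replicate(matrix, rng) for _ in range(n_replicates)]


# Toy 0/1 matrix, invented for illustration.
TOY = {"A": "110100", "B": "111000", "C": "000111"}
replicates = bootstrap(TOY, n_replicates=100)
print(len(replicates), replicates[0])
```

Each replicate would then go through the same tree-building step, and the consensus tree summarizes how often each grouping appears across the 100 trees.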
Sequence of steps for using any of the programs in the Phylip package:
1. Copy the input file (.txt format) to the folder containing the executable you are going to use (e.g. restdist.exe).
2. Double-click the executable to open the program.
3. Type the name of the input file (do not forget the ".txt" extension).
4. Change the desired options as indicated in the menu.
5. Type 'y'. An 'outfile' and/or a 'treefile' is generated automatically.
6. Move these files to another folder and rename them.
7. Open the 'treefile' with the TreeView program to examine the resulting tree.
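The file handling in steps 1 and 6 can be scripted. The Python sketch below copies an input file next to the executable and later moves 'outfile' and 'treefile' to a results folder under new names, so that the next program run does not collide with them. The helper names, folder names and file prefix are hypothetical, and the sketch does not launch any Phylip program itself.

```python
import shutil
import tempfile
from pathlib import Path


def stage_input(input_file, program_dir):
    """Step 1: copy the input file into the folder that holds the executable."""
    target = Path(program_dir) / Path(input_file).name
    shutil.copy(input_file, target)
    return target


def collect_outputs(program_dir, results_dir, prefix):
    """Step 6: move 'outfile' and 'treefile' elsewhere under new names."""
    results = Path(results_dir)
    results.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in ("outfile", "treefile"):
        source = Path(program_dir) / name
        if source.exists():  # a program may write only one of the two files
            destination = results / f"{prefix}_{name}.txt"
            shutil.move(str(source), str(destination))
            moved.append(destination)
    return moved


def demo():
    """Run both helpers in a temporary folder, with a placeholder output file."""
    with tempfile.TemporaryDirectory() as tmp:
        program_dir = Path(tmp) / "phylip"
        program_dir.mkdir()
        source = Path(tmp) / "restriction.txt"
        source.write_text("7 53 1\n")
        stage_input(source, program_dir)
        # Placeholder standing in for what the program would write.
        (program_dir / "outfile").write_text("placeholder")
        moved = collect_outputs(program_dir, Path(tmp) / "results", "restdist")
        return [path.name for path in moved]


print(demo())
```

Renaming matters because every program in the package writes to the same fixed names, 'outfile' and 'treefile'.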
Bayesian inference using the program MrBayes (Mrbayes.exe)
Mixed data: restriction data + sequence data
#NEXUS
begin data;
dimensions ntax=14 nchar=5128;
format datatype=mixed (Restriction:1-304,DNA:305-5128) interleave=yes gap=- missing=?;
matrix
A0000100010100?01001000000000001000100001000001011000000000000001010010000000000010000011100000000110000000000000001000010100000100000100000000001000000000000000000000000001010001000000010001010000000001000010100000000100010000000001000001000101000010000010001000001000100000000010100101010000010100000100
B0100010000000?0?001101001000001110000100100001001000000000000001011110000010110000000000101000010001100000010000010000001000000010000101000100000100001100000010001000011001000000100100000001100000000000000001100010000101100000001001010101000000000001000100000100000010100001000000100001000000010011101000
C 0010010000000?0?001101101000001110000100000100001000000000000001111010000010111000011000101000010001000000000000010000001000000010000101000100000100001100000010000000000111000001100100000001100000000000010001100010000101100000000001000101000000000001000100000100000010100001000100100001000000010000101000
B11P10 TAAAAATCTGAGTGACTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
B13P10 TAAAAATCTGAGTGATTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
B14P10 TAAAAATCTGAGTGATTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
[ 4810 4820 ]
[ * * ]
a5P10 TAGGGGGTACCTAAAGCCCAGCCA
a7P10 TAGGGGGTACCTAAAGCCCAGCCA
a8P10 TAGGGGGTACCTAAAGCCCAGCCA
a9P10 TAGGGGGTACCTAAAACCCAGCCA
a11P10 TAGGGGGTACCTAAAGCCCAGCCA
a13P10 TAGGGGGTACCTAAAGCCCAGCCA
a14P10 TAGGGGGTACCTAAAGCCCAGCCA
B3P10 TAGGGGGTACCTAAAGCCCAGCCA
B7P10 TAGGGGGTACCTAAAGCCCAGCCA
B9P10 TAGGGGGTACCTAAAGCCCAGCCA
B10P10 TAGGGGGTACCTAAAGCCCAGCCA
B11P10 TAGGGGGTACCTAAAACCCAGTCA
B13P10 TAGGGGGTACCTAAAACCCAGTCA
B14P10 TAGGGGGTACCTAAAACCCAGTCA
;
end;
begin mrbayes;
delete 1 4 6 7 12 13 14;
charset Restriction=1-304;
charset DNA=305-5128;
partition Names=2: Restriction, DNA;
set partition=Names;
lset applyto=(2) nst=6 rates=gamma;
unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);
prset ratepr=variable;
mcmcp ngen=1000 printfreq=100 samplefreq=100 nchains=4 savebrlens=yes filename=Allenz+Allseqs0;
mcmc;
end;
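Before running a mixed-data file like this one, it helps to confirm that the two character sets cover positions 1 to nchar exactly once, and to count how many taxa remain after the delete command. The Python sketch below does that arithmetic with the numbers from the file above. The function name `check_partition` is hypothetical, and the sketch does not parse NEXUS files.

```python
def check_partition(nchar, charsets):
    """Return True if the charsets tile positions 1..nchar with no gaps or overlaps.

    charsets: dict mapping name -> (first position, last position), 1-based inclusive.
    """
    expected_start = 1
    for _name, (start, end) in sorted(charsets.items(), key=lambda item: item[1][0]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == nchar + 1


# Values taken from the data and mrbayes blocks above.
NCHAR = 5128
CHARSETS = {"Restriction": (1, 304), "DNA": (305, 5128)}
DELETED_TAXA = {1, 4, 6, 7, 12, 13, 14}

sizes = {name: end - start + 1 for name, (start, end) in CHARSETS.items()}
remaining_taxa = 14 - len(DELETED_TAXA)

print(check_partition(NCHAR, CHARSETS))  # the two sets cover 1..5128 exactly
print(sizes)                             # characters per partition
print(remaining_taxa)                    # taxa left after 'delete'
```

With these numbers the restriction partition has 304 characters, the DNA partition has 4824, and 7 of the 14 taxa stay in the analysis.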
Sequence of steps for using the program MrBayes (Mrbayes.exe):
1. Save the input file in the same location as the executable.
2. Start the program.
3. Type the command 'execute' followed by the name of the input file (do not forget the '.txt' extension).
4. Increase the number of generations to 1 000 000.
5. Check whether, after this number of generations, the standard deviation between the chains is ≤ 0.01.
6. If so, the program can be stopped.
7. Type the command 'sump burnin = 2500' (summarizes the parameter values).
8. Type the command 'sumt burnin = 2500'.
9. Check the result by opening the file with the '.con' extension in the TreeView program.
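The burn-in value in steps 7 and 8 makes more sense next to the sampling settings. The Python sketch below works out how many samples a run records and what share of them a burn-in of 2500 discards, assuming the burn-in is counted in samples rather than generations and ignoring any extra sample for the starting state. The function names are hypothetical.

```python
def samples_recorded(ngen, samplefreq):
    """Number of samples written when one is saved every `samplefreq` generations."""
    return ngen // samplefreq


def burnin_fraction(ngen, samplefreq, burnin):
    """Share of the recorded samples discarded by a burn-in given in samples."""
    total = samples_recorded(ngen, samplefreq)
    if burnin >= total:
        raise ValueError("burn-in would discard every recorded sample")
    return burnin / total


# Step 4 of the list: 1 000 000 generations, sampled every 100 generations.
print(samples_recorded(1_000_000, 100))
print(burnin_fraction(1_000_000, 100, 2500))

# The input file as written sets ngen=1000, which records far fewer samples
# than the burn-in of 2500, hence the instruction to raise ngen first.
print(samples_recorded(1000, 100))
```

Under these assumptions a burn-in of 2500 removes the first quarter of a run of 1 000 000 generations.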