Diversity of Crinkler Effector Genes in Phytophthora infestans isolates
My talk given at Microbes in Norwich 2013.
Graham Etherington
8 Feb 2013
Outline
• Phytophthora infestans
– Biology and evolution
– Effector genes
• Crinklers
– Gene architecture
– VLVVVP recombination motif
• Aims
• Methods
– Looking for the VLVVVP motif
– Making use of all the data
• Applying methods to data
• Confirming results
• Comments/To-do
Phytophthora infestans
• Oomycete
• Cause of Late Blight in Solanum species (e.g. Potato)
• Responsible for the Irish Potato Famine (1845)
• Global potato yield losses of 16%
• Economic losses of £4.3 billion annually
• Capable of sexual and asexual reproduction
• High evolutionary potential
– striking capacity for genetic change
– rapidly adapts to overcome resistant plants
• Phytophthora - from Greek (phytón) “plant” and (phthorá) “destruction” - the plant-destroyer!
Phytophthora infestans
P. infestans effector genes
• Secreted from pathogen – alters host to enable infection
• High polymorphism and positive selection
• The pathogen forms a ‘haustorium’, which delivers effector molecules to plant cells.
• Two main cytoplasmic effectors:
– RXLRs
– Crinklers
Beynon 2006
Crinkler effector genes
Haas 2009
Conserved N-terminal
Diverse C-terminal
Torto 2003
Crinklers
N-terminal C-terminal
Recombination hotspot. Look at sequence diversity around this site.
Roman numerals: V+L+V+V+V = 70, so VLVVVP = 70P
Aims
• Do P. infestans isolates contain unique Crinkler genes?
• What is the diversity of Crinklers within P. infestans?
• How many Crinklers are shared between different isolates?
The data
• 76nt PE Illumina genomic sequences of P. infestans isolates
– EC3527 (51x)
– EC3626 (64x)
– UK 3928 (50x)
– US 22 (50x)
– NL 07434 (50x)
Test data: the two EC isolates (99% seq similarity)
Looking for VLVVVP in genome
• Take isolate reads, make 6-frame translations and look for ‘VLVVVP’ recombination site?
• Control experiment – available resources:
– P. infestans reference sequence (T30-4)
– All annotated transcripts (including Crinklers)
• Make 6-frame translation of genome and look for VLVVVP motif
– Where are the motif hits?
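As a rough illustration of the control experiment above, a 6-frame translation and motif scan might look like the following sketch. It assumes the standard genetic code and plain ACGT input. The function names are made up for this example and are not the code used for the talk.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering (assumption:
# the standard table is appropriate for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_motif_hits(seq, motif="VLVVVP"):
    """Return (strand, frame, protein_position) for every motif hit."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            pos = protein.find(motif)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(motif, pos + 1)
    return hits


# Toy sequence: one extra base, then codons for V L V V V P, then two bases.
dna = "A" + "GTGCTGGTGGTGGTTCCG" + "CC"
print(six_frame_motif_hits(dna))  # [('+', 1, 0)]
```

A scan like this reports every peptide-level match regardless of the underlying DNA, which is consistent with the large number of hits outside annotated Crinklers reported on the next slide.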
Looking for VLVVVP in genome
• 231 VLVVVP hits:
– 81 in annotated Crinklers
– 18 in other transcripts
– 132 outside any annotated feature
• Conclusion: Looking for the VLVVVP motif in 6-frame translations results in a very high number of false positives.
DNA sequence
• Extract Crinkler gene sequences using P. infestans annotation.
• Align sequences
• Locate VLVVVP motif coding sequence
• Create ‘variable consensus’ DNA sequence from alignment.
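The ‘variable consensus’ step can be pictured as collapsing each alignment column into either a single base or a bracketed set of the bases seen there. The sketch below is one hypothetical way to do that. The helper name and the three input sequences are illustrative only (they are motif-coding variants, not the full Crinkler alignment), and gaps are assumed to be absent.

```python
def variable_consensus(aligned):
    """Collapse equal-length aligned sequences into a bracketed pattern.

    Columns with one observed base are emitted as that base; columns with
    several are emitted as a regex-style character class, in first-seen order.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    parts = []
    for column in zip(*aligned):
        bases = list(dict.fromkeys(column))  # unique bases, order preserved
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "".join(parts)


# Illustrative input: three variants of the motif coding sequence.
variants = [
    "GTGCTGGTGGTGTTTCC",
    "GTGCTGGTGGCGTTGCC",
    "GTGCTGGTGGTGGTTCC",
]
print(variable_consensus(variants))  # GTGCTGGTGG[TC]G[TG]T[TG]CC
```

The bracketed output can be used directly as a regular expression when scanning a genome or reads.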
Consensus sequence
Consensus sequence logo
GTGCTGGTGG[TC]G[TG]T[TG]CC
V L V V V P
Testing the consensus sequence
• No. of consensus motifs found in P. infestans genome: 194
– in annotated Crinklers: 185
– in any other annotated genes: 0
– not found in any annotated region: 9
• BLAST 300nt upstream region of the 9 unannotated hits
– 8 Crinkler gene hits, 1 false positive
• 193/194 hits were Crinklers
• Conclusion: Looking for the ‘variable consensus’ sequence in reads should result in a very low number of false positives.
Methods
• Search reads for variable consensus.
• Truncate reads at consensus.
– ATCGGTTCATGTGCTGGTGGTGTTTCCGCCGTG becomes GTGCTGGTGGTGTTTCCGCCGTG
• Count the occurrence of each sequence:
GTGCTGGTGGTGGTTCCGCCGTG 100
GTGCTGGTGGCGTTGCCATGTG 70
GTGCTGGTGGCGTTGCCA 5
GTGCTGGTGGCGTTGCCCGCGTG 1
GTGCTGGTGGTGGTTCCGCC 30
• Remove singletons
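The four steps above (search, truncate, count, drop singletons) can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the talk: the function names are hypothetical, and it only scans each read as given, so handling of reverse-complemented reads is left out as a simplifying assumption.

```python
import re
from collections import Counter

# Variable consensus from the talk, used as a regular expression.
CONSENSUS = re.compile(r"GTGCTGGTGG[TC]G[TG]T[TG]CC")


def truncate_at_consensus(read):
    """Return the read from the start of the consensus onwards, or None."""
    read = read.upper()
    match = CONSENSUS.search(read)
    return read[match.start():] if match else None


def count_truncated(reads, min_count=2):
    """Count truncated reads and drop those seen fewer than min_count times.

    min_count=2 removes singletons, as on the slide.
    """
    counts = Counter()
    for read in reads:
        truncated = truncate_at_consensus(read)
        if truncated is not None:
            counts[truncated] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


# The truncation example from the slide.
print(truncate_at_consensus("ATCGGTTCATGTGCTGGTGGTGTTTCCGCCGTG"))
# GTGCTGGTGGTGTTTCCGCCGTG
```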
Making use of all the data
• In the datasets there are read-counts for short ‘sub-sequences’ which are part of one or more longer unique ‘super-sequences’.
GTGCTGGTGGTGGTTCCGCCGTG 2
GTGCTGGTGGTGGTTCC 8
GTGCTGGTGGT 55
GTGCTGGTGGTTAACGGT 40
GTGCTGGTGGTTAACGGTACGTAA 25
• Interested in the longest unique super-sequences, but don’t want to throw away information in the shorter sub-sequences.
Proportional Representation
• Distribute read counts of smaller sub-sequences to longer super-sequences proportionately:
– assign more read counts to common super-sequences
– assign fewer read counts to rarer super-sequences
Proportional Representation
Before redistribution:
sub-sequence A ABCDE 100
super-sequence Bi ABCDEFGH 48
super-sequence Bii ABCDEFGA 2
Sub-sequence A is shared between super-sequences Bi and Bii as follows (50 = 48 + 2, the combined count of Bi and Bii):
Assign to Bi: (100 / 50) × 48 = 96
Assign to Bii: (100 / 50) × 2 = 4
After redistribution:
Bi ABCDEFGH 96 + 48 = 144
Bii ABCDEFGA 4 + 2 = 6
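The worked example above can be reproduced with a short sketch. Two things here are assumptions rather than details from the talk: a sub-sequence is treated as a prefix of its super-sequences (reasonable because every sequence is truncated to start at the consensus), and nested sub-sequences are handled longest first using the running totals as weights. The function name is hypothetical.

```python
def proportional_representation(counts):
    """Share sub-sequence read counts among the longest super-sequences.

    counts maps sequence -> positive read count. A sequence that is a
    prefix of another is a sub-sequence; its reads are split between the
    super-sequences extending it, in proportion to their current totals.
    """
    seqs = list(counts)
    # Super-sequences: not a proper prefix of any other sequence.
    supers = [s for s in seqs if not any(t != s and t.startswith(s) for t in seqs)]
    totals = {s: float(counts[s]) for s in supers}
    # Assumption: resolve longer sub-sequences before shorter ones.
    subs = sorted((s for s in seqs if s not in totals), key=len, reverse=True)
    for sub in subs:
        targets = [s for s in supers if s.startswith(sub)]
        weight = sum(totals[s] for s in targets)
        for s in targets:
            totals[s] += counts[sub] * totals[s] / weight
    return totals


counts = {"ABCDE": 100, "ABCDEFGH": 48, "ABCDEFGA": 2}
print(proportional_representation(counts))
# {'ABCDEFGH': 144.0, 'ABCDEFGA': 6.0}
```

Because each sub-sequence's reads are fully handed on, the total read count is conserved. Only its distribution across the longest sequences changes.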
Recap
• Identify the best way to find the Crinkler recombination motif (VLVVVP) in P. infestans: the variable consensus sequence.
• Search for the variable consensus in the isolate reads.
• Create ‘proportionally representative’ longest super-sequences.
• Produce Venn Diagram of shared and unique sequences (both as DNA seqs and protein seqs).
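For the final step, the Venn regions are just the sets of motifs grouped by which isolates they occur in. The sketch below shows one way to compute them from per-isolate count tables. The function name is hypothetical, and the five example motifs and counts are a small subset of the UK/US/NL amino acid table shown later in the talk, chosen to hit different regions.

```python
from collections import defaultdict


def venn_regions(counts_by_isolate):
    """Group motifs by the exact set of isolates in which they occur.

    counts_by_isolate maps isolate name -> {motif: read count}.
    Returns {frozenset of isolate names: set of motifs}.
    """
    regions = defaultdict(set)
    all_motifs = set().union(*counts_by_isolate.values())
    for motif in all_motifs:
        present = frozenset(
            isolate
            for isolate, counts in counts_by_isolate.items()
            if counts.get(motif, 0) > 0
        )
        if present:
            regions[present].add(motif)
    return dict(regions)


# Subset of the UK/US/NL table, for illustration only.
example = {
    "uk": {"VLVVVPPPSVGSKR": 9, "VLVVVPDQTEDANLSQRFSDL": 3, "VLVVVPDVDGYAV": 8},
    "us": {"VLVVVPPPSVGSKR": 22, "VLVVVPDQTEDANLSQRFSDL": 42,
           "VLVVVPDVAGYAVASVEP": 20},
    "nl": {"VLVVVPPPSVGSKR": 15, "VLVVVPDVAGYAVASVEP": 18, "VLVVVPSPSVGS": 10},
}
regions = venn_regions(example)
for isolates, motifs in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print("/".join(sorted(isolates)), len(motifs))
```

Running the same grouping on DNA super-sequences and on their translations gives the two versions of the diagram (DNA and protein).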
EC Crinklers
As amino acid sequences
Shared by both 59
ec3626 only 4
ec3527 only 1
Total 64
motif ec3527 ec3626
VLVALPPGTSSAPISDGSDFWLS 56 22
VLVALPPGTSSAPISDGTCRDFN 15 16
VLVALPPGTSSAPISDGTDFWLSRF 216 106
VLVALPPGTSSAPISDGTDLWLSRF 1103 899
VLVALPPGTSSAQ*AREQVCSSTRH 11 13
VLVALPPWTSSAPISDGTNLWLSRF 100 81
VLVALPSGTSSAPISDGTDLWLSRF 58 40
VLVVVPDEAGGAVASVEPS 11 18
VLVVVPDEAGGAVASVV 11 7
VLVVVPDGAGGSASDTSRIDR 2 8
VLVVVPDGAGGSASDTSRMDRLFDK 322 413
VLVVVPDVAGYAVASVELSAAPTT 22 21
VLVVVPDVAGYAVASVEPSAAPTTI 13 25
VLVVVPE*EQPVSPPQKN*SAVST 36 41
VLVVVPEGAGGDLELASLLSTTIQ 19 32
VLVVVPEGAVGSALSQPANSATIPN 20 15
VLVVVPEGGKRHPSNGWFAEFFH 17 24
VLVVVPEHDGAISNDMSAVTTP 49 66
VLVVVPEHDGTISNDMSAVTTPLIV 99 49
VLVVVPEHDGTISNDMSAVTTPLT 25 25
VLVVVPEITTTV*VRERKDEVLMA 50 14
VLVVVPEQAQGQPGLWLVTGSVD 19 25
VLVVVPEQDGKIKRYVCSDNATDSR 147 95
VLVVVPEQDGTISKEMFAATTPLT 91 55
VLVVVPEQDGTISKEMSAATSPLTV 28 11
VLVVVPEQDGTISKEMSAATTPLTL 57 55
VLVVVPEQDGTISKEMSAATTPLTV 654 437
VLVVVPEQDGTISNDLSAVTTPLTV 4 28
VLVVVPEQDGTISNDMSAVTTPLTV 605 463
VLVVVPEQGFSVPTVSQDGVFDHCI 36 41
VLVVVPEQGSSVL*TVFSTTAVIHS 24 34
VLVVVPEQGSSVPTVSLDGVFDHCS 58 96
VLVVVPEQGSSVPTVSQDGVFDHCI 136 249
VLVVVPEQGSSVPTVSQDGVFDHCS 61 82
VLVVVPESFGVDSQLLQLQEALLQ 29 17
VLVVVPGIITTVEVRERKDDKLIMA 25 52
VLVVVPGITTTVEVRERKDEMLMA 45 10
VLVVVPGITTTVEVRERKDEVLMAE 173 251
VLVVVPGLVASTVTIVIEEAAGSKP 110 126
VLVVVPGQGSSVPTVSQDGVFDH 14 5
VLVVVPGQRLPIAATAIHEPHPA 14 34
VLVVVPKGKNDRSAAMAIGVAPSLP 227 379
VLVVVPKGRNDRSAAMAIGVAP 5 10
VLVVVPKQDGTISNDTSAATTPLT 39 28
VLVVVPKQGTSVPTVSQDGVFDHCN 12 48
VLVVVPMPSVGSKRSADEIADVQKR 21 29
VLVVVPPPSVGSKRSADEIADVQK 13 17
VLVVVPPQDDLRSPAMTLLEAILPY 22 25
VLVVVPRSRETMTHGAQWILR 20 8
VLVVVPRSRETTTHGALWILRFN 16 9
VLVVVPRSTGDDDAWSPKDLCTVD 27 40
VLVVVPRSTGDDDAWSPMDPKVQLN 695 699
VLVVVPRSTGDDDAWSPMDSK 3 44
VLVVVPRSTGDDYALSPNDLCTLDP 28 50
VLVVVPRSTGDGDAWSPMDPKVQLN 29 25
VLVVVPRVAPAPENKRKRKRMEDED 37 24
VLVVVPSDDVVVPVSVPVAVPTGPE 25 39
VLVVVPVGAGVGVGQDVSMDVPAA 83 141
VLVVVPVGAGVGVGQDVSMHVPAAV 7 8
VLVVVPGLVASTVTIVIEEAAES 0 43
VLVVVPRSAGDDDAWS 0 5
VLVVVPYGAG 0 3
VLVVVPYPEQ 0 9
VLVALPPGKS 2 0
Proportional representation of all DNA super-sequences.
Shared and unique motifs
Number in brackets refers to percent of dataset.
…as amino acid sequences
UK/US/NL isolate Crinklers
motif uk us nl
VLVALPPGTSSAPISDGSDFWLSR 12 26 37
VLVALPPGTSSAPISDGTCRDFNI 28 72 47
VLVALPPGTSSAPISDGTDFWLSRF 119 259 198
VLVALPPGTSSAPISDGTDLWLSRF 860 1352 1336
VLVALPPWTSSAPISDGTNLWLSRF 101 143 232
VLVALPSGTSSAPISDGTDLWLSRF 10 105 17
VLVVVPDEAGGAVASVEPSAAPT 5 26 145
VLVVVPDGAGGSASDTSRMDRLFDK 147 347 399
VLVVVPE*EQPVSPPQKN*S 3 16 17
VLVVVPEGAGGDLELASLLSTTIQE 12 18 50
VLVVVPEGAVGSALSQPANSATI 15 49 37
VLVVVPEGGKRHPSNGWFAEFFHP 12 55 79
VLVVVPEHDGAISNDMSAVTTPLTV 18 55 53
VLVVVPEHDGTISNDMSAVTTPLIV 42 116 157
VLVVVPEITTTV*VRERKDEVLM 3 94 29
VLVVVPEQAQGQPGLWLVTG 6 18 34
VLVVVPEQDGKIKRYVCSDNATDSR 78 150 154
VLVVVPEQDGTISKEMFAATTPLTL 16 165 49
VLVVVPEQDGTISKEMSAATSPLT 63 34 18
VLVVVPEQDGTISKEMSAATTPLTV 252 895 793
VLVVVPEQDGTISNDLSAVTTPLTV 16 6 57
VLVVVPEQDGTISNDMSAVTTPLTV 307 902 619
VLVVVPEQGFSVPTVSQDGVFDHCI 15 28 17
VLVVVPEQGSSVL*TVFSTTAVIHS 18 8 38
VLVVVPEQGSSVPTVSQDGVFDHCI 71 255 283
VLVVVPEQGSSVPTVSQDGVFDHCS 71 106 14
VLVVVPESFGVDSQLLQLQEALLQ 10 52 26
VLVVVPGITTTVEVRERKDEVLMAE 60 453 353
VLVVVPGLVASTVTIVIEEAAG 34 85 103
VLVVVPKGKNDRSAAMAIGVAPSLP 147 333 173
VLVVVPKQDGTISNDTSAATTPLT 7 24 77
VLVVVPKQGTSVPTVSQDGVFDHC 2 44 25
VLVVVPMPSVGSKRSADEIADVQ 3 10 21
VLVVVPPPSVGSKR 9 22 15
VLVVVPPQDDLRSPAMTLLEAILPY 21 27 20
VLVVVPRSAGDDDAWSPMD 3 8 17
VLVVVPRSRETMTHGAQWILRFN 9 17 62
VLVVVPRSRETTTHGAL 4 9 4
VLVVVPRSTGDDDAWSPKDLCTVDP 12 104 133
VLVVVPRSTGDDDAWSPMDPD 12 52 11
VLVVVPRSTGDDDAWSPMDPKVQLN 101 364 262
VLVVVPRSTGDDYALSPNDLCTLD 33 18 25
VLVVVPRSTGDGDAWSPMDPK 14 11 6
VLVVVPRVAPAPENKRKRKRME 4 19 21
VLVVVPSDDVVVPVSVPVAVPT 9 11 20
VLVVVPVGAGVGVGQDVSMDVPAAV 82 211 130
VLVVVPDQTEDANLSQRFSDL 3 42 0
VLVVVPDVAGYAVASVEP 0 20 18
VLVVVPEQGSSVPTVSLDGVFDHCS 0 81 72
VLVVVPGIITTVEVRERKDDKLIMA 0 30 36
VLVVVPGITTTVEVRERKDEMLMA 0 49 20
VLVVVPGQRLPIAATAIHEPH 0 27 22
VLVVVPDVDGYAV 8 0 0
VLVVVPGQGSSVP 5 0 0
VLVVVPEQDGEIKRYVCSDSAT 0 3 0
VLVVVPGIKTTVEVRERKDEVLMA 0 9 0
VLVALPPGTSKIGR 0 0 3
VLVVVPSPSVGS 0 0 10
VLVVVPYPEQAQVDMVHE 0 0 16
Shared by all 46
UK/US 1
US/NL 5
UK only 2
US only 2
NL only 3
Total 59
• UK 3928, US 22, NL 07434
Proportional representation of all DNA super-sequences
…as amino acid sequences
Shared and unique motifs
Are Crinklers being correctly identified?
Left paired-read = Left primer
Right paired-read = Right primer
Crinkler gene
• Create primers from reads for sequencing
sequence VLVVVP
Shared by NL/US
NL US
CRN PITG_12090
CRN PITG_12094
Shared by UK/US
UK US
CRN PITG_19373
CRN11
Crinklers correctly identified?
Results
• EC data
– reflects the similarity of the two isolates (99% identical at amino acid level).
– from the 64 longest unique sequences, 59 are found in both isolates.
– at least one isolate has unique Crinklers.
• UK/US/NL data
– Most Crinklers are shared
– US genotype shares Crinklers with UK and NL, but very few shared between UK and NL
– each isolate has 2-3 unique Crinklers.
Results
• EC v UK/US/NL
– 64 Crinklers in EC, 59 in UK/US/NL
• 52 shared by both
• 12 exclusive to EC isolates
• 7 exclusive to UK/US/NL
• ‘VLVVVP’ not the only recombination motif.
– ‘VLVALP’ is also found in 8/64 EC super-sequences and 7/59 UK/US/NL super-sequences.
Conclusions
• Variable consensus sequence
• ‘Proportional representation’
– identifies Crinkler diversity
– estimates abundance.
• P. infestans isolates do contain ‘unique’ Crinklers.
– Present in a few or totally unique?
• Application of methods on UK/US/NL data reflects the greater sequence diversity between these isolates.
• Crinkler predictions verified through sequencing.
Comments
• Method could be biased towards the isolate with the most reads – needs normalisation.
• Method shows diversity at the start of the recombination hotspot
– can’t say much about the rest of the gene.
VLVVVPEQDGTISNDMSAVTTPLTV 1000
VLVVVPEQDGTISNDMSAVTTPLTVABCDEFG 250
VLVVVPEQDGTISNDMSAVTTPLTVHIJKLMN 250
VLVVVPEQDGTISNDMSAVTTPLTVOPQRST 200
VLVVVPEQDGTISNDMSAVTTPLTVUVWXYZ 300
• To-do:
– Larger-scale PCR/sequencing to confirm shared/unique Crinklers in isolates.
– What would we find with longer (targeted?) sequences?
Acknowledgments
• Kamoun Group
– Sophien Kamoun
– Kentaro Yoshida
– Marina País
– Liliana Cano
• Dan MacLean