Gene structure, regulatory control, and evolution of black widow venom latrotoxins




FEBS Letters 588 (2014) 3891–3897

journal homepage: www.FEBSLetters.org


http://dx.doi.org/10.1016/j.febslet.2014.08.034
0014-5793/© 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Abbreviations: FPKM, fragments per kilobase per million library reads; TE, transposable element; LINE, long interspersed nuclear element; MaSp, major ampullate silk protein; TSS, transcription start site; TF, transcription factor; MBE, musashi binding element; LTR, long terminal repeat; NMD, non-sense mediated decay; FDR, false discovery rate

⇑ Corresponding author at: Department of Biological Sciences, 198 Riverside Street, Olsen Hall 520, University of Massachusetts Lowell, Lowell, MA 01854, USA. Fax: +1 978 934 3044.

E-mail address: [email protected] (J.E. Garb).

1 These authors contributed equally to this paper.

Kanaka Varun Bhere a,1, Robert A. Haney a,1, Nadia A. Ayoub b, Jessica E. Garb a,⇑

a Department of Biological Sciences, University of Massachusetts Lowell, MA, USA
b Department of Biology, Washington and Lee University, Lexington, VA, USA

Article info

Article history: Received 22 July 2014; Revised 28 August 2014; Accepted 29 August 2014; Available online 12 September 2014

Edited by Takashi Gojobori

Keywords: α-Latrotoxin; Venom; Neurosecretion; Genomics; Molecular evolution; Latrodectus

Abstract

Black widow venom contains α-latrotoxin, infamous for causing intense pain. Combining 33 kb of Latrodectus hesperus genomic DNA with RNA-Seq, we characterized the α-latrotoxin gene and discovered a paralog, 4.5 kb downstream. Both paralogs exhibit venom gland specific transcription, and may be regulated post-transcriptionally via musashi-like proteins. A 4 kb intron interrupts the α-latrotoxin coding sequence, while a 10 kb intron in the 3′ UTR of the paralog may cause non-sense-mediated decay. Phylogenetic analysis confirms these divergent latrotoxins diversified through recent tandem gene duplications. Thus, latrotoxin genes have more complex structures, regulatory controls, and sequence diversity than previously proposed.

© 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

1. Introduction

Venom proteins can evolve rapidly through gene duplication and adaptation in response to selection imposed by diverse and co-evolving prey [1–4]. Venom toxins also have important biomedical applications, including receptor characterization and drug discovery [5,6]. However, few studies have examined the structure and regulation of genes encoding venom proteins [7]. For spiders, one of the largest venomous clades, nearly all sequences come from venom gland cDNAs [8,9]. Thus, the roles of gene duplication, alternative splicing, and regulatory controls in generating venom molecular complexity are poorly understood.

The spider genus Latrodectus includes black widows (multiple species) and the Australian red-back spider (Latrodectus hasselti), the venoms of which have potent neurotoxic effects on vertebrates. Latrodectus venom contains latrotoxins, a family of neurotoxic proteins that share a unique N-terminal domain flanked by 11–20 ankyrin motif repeats [6,8,10]. Four latrotoxins have been functionally characterized from Latrodectus tredecimguttatus, and while three (α-latroinsectotoxin, α-latrocrustotoxin and δ-latroinsectotoxin) elicit neurotransmitter release in arthropods [11], α-latrotoxin forms calcium channels in vertebrate pre-synaptic neuronal membranes, thereby triggering massive neurotransmitter exocytosis [12–14]. α-Latrotoxin is responsible for the extreme pain resulting from black widow bites [6], and is important for studying neurosecretion, and hence has received considerable scientific attention.

While Latrodectus venom contains multiple latrotoxins, they have only been identified in three genera of the family Theridiidae, suggesting latrotoxins represent a recently expanded protein family [8,15,16]. Evidence from RNA-Seq data indicates ≥20 divergent latrotoxins are expressed in venom glands of the Western black widow (Latrodectus hesperus) and most are phylogenetically distinct from the functionally characterized latrotoxins [16]. We hypothesize these transcripts are encoded by distinct loci that are spatially clustered in the genome and that their transcription and translation are tightly controlled to ensure strong venom gland-specific expression.

We sequenced 33 kb from the L. hesperus genome encompassing the α-latrotoxin gene, which we integrated with venom gland expression data (RNA-Seq and Expressed Sequence Tags (ESTs)) to reveal α-latrotoxin’s gene structure, quantify its expression, and investigate its regulation. This revealed a divergent, highly expressed latrotoxin paralog 4.5 kb downstream of α-latrotoxin, and long introns and putative regulatory elements in both paralogs, providing novel insights into latrotoxin evolution and production.

2. Materials and methods

2.1. Genomic sequencing

A genomic library was constructed from eight L. hesperus females to cover the estimated 1261 Mb genome [17]. Library construction details are in Supplementary Material and Ayoub et al. [18]. α-Latrotoxin primers were used to PCR-screen the genomic library, revealing three positive clones. The clone with the smallest insert (estimated from a BamHI digest) was used to make a shotgun library from three complete, separate digests (EcoRI, PstI and EcoRV). Resulting fragments were ligated into pZErO™-2 plasmids (Invitrogen) and electroporated into TOP10 Escherichia coli. For each digest, a library of 192 clones was screened for insert size. 1–2 kb inserts were sequenced using Sp6 and T7 primers. Sequences were edited and assembled in SEQUENCHER 4.2 (Gene Codes Corp.). Primer walking was employed to complete insert sequencing. The complete sequence was deposited at NCBI (Accession KM382064).

2.2. Genomic annotation

Open Reading Frames (ORFs) were predicted from the assembled genomic insert sequence using getORF, retaining ORFs encoding ≥30 amino acids. Predicted proteins were subjected to BLASTp searches against NCBI’s non-redundant (nr) protein database. Gene structure and expression were explored using female L. hesperus RNA-Seq data [19] from three tissues: (1) venom gland, (2) total silk gland tissues, and (3) cephalothorax minus venom glands [16,19]. This included 133 million high quality 75–100 bp paired-end sequence reads collectively. Trinity [20] was used to assemble tissue-specific reads, and overlapping sequences across libraries were merged using CAP3 [19]. Tophat 2.0.8b [21] was used to align RNA-Seq reads from tissues to the genomic sequence to map introns, which were visualized with Integrative Genomics Viewer (IGV) 2.3 [22]. Cufflinks 2.0.2 [23] was used to assemble transcripts from Tophat mappings, filtering out isoforms representing <10% of major isoform abundance. Cuffmerge was used to merge assemblies from tissues, and Cuffdiff was used to produce expression estimates for transcripts in each tissue. OrfPredictor [24] was used to translate proteins from transcripts in the frame of the top nr BLASTx hit. We also mapped L. hesperus venom gland ESTs [25] (NCBI dbEST Accessions JZ577614–JZ578096) to the genomic sequence.
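To illustrate the ORF prediction step, the following is a minimal Python sketch of a six-frame ORF scan that keeps translations of at least 30 amino acids. It is not getORF itself: the function names (`reverse_complement`, `translate`, `find_orfs`) are hypothetical, and the sketch assumes the standard genetic code and defines an ORF as a stretch between stop codons, which is only one of several possible definitions.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption;
# the sketch does not model alternative translation tables).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(dna, min_aa=30):
    """Return (strand, frame, protein) for stop-free stretches >= min_aa."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Splitting on '*' yields the regions between stop codons.
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_aa:
                    hits.append((strand, frame, segment))
    return hits
```

Because both strands and all three frames are scanned, a single genomic region can yield several overlapping candidates, which is why predicted proteins are then screened by BLASTp.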

The insert sequence was annotated with MAKER [26]. Gene prediction was performed using Augustus under Drosophila melanogaster training parameters and mapping: (1) de novo Trinity assembled transcripts [19], (2) Cufflinks transcripts predicted from the insert, (3) venom gland ESTs [25] and (4) latrotoxins translated from Trinity transcripts. RepeatMasker [27] was used to predict repetitive elements and low complexity regions with the D. melanogaster database. The programs NNPP 2.2, Match 1.0, UTRscan, and PipMaker [28–31] were used to predict regulatory elements (Supplementary Materials).

2.3. Evolutionary analyses

We obtained latrotoxin proteins from NCBI’s nr and extracted L. tredecimguttatus latrotoxins [8] from the transcriptome shotgun assembly archive using tBLASTn (e-value cutoff 1e-20), with the first 320 amino acids from L. hesperus latrotoxins [16] as queries. L. tredecimguttatus translations were obtained with OrfPredictor [24]. We aligned these sequences with L. hesperus latrotoxins predicted by Cufflinks from the genomic insert and de novo assembled from RNA-Seq data [16,19] with COBALT [32]. Phylogenetic analyses were performed with MrBayes 3.2.2 [33], with a “mixed” amino acid model and gamma distribution, running 5 × 10^6 generations. Clade posterior probabilities were from a 50% majority-rule tree, excluding the first 25% of trees. We produced a nucleotide alignment with PAL2NAL [34] from the protein alignment (Supplementary Materials). PAML 4 [35] was used to test for positive selection among sites by comparing model M7 (several dN/dS categories between 0 and 1) with model M8 (adds a category with unconstrained dN/dS), and among sites along specific branches by comparing model A with model A in which ω2 is fixed at 1 (Aω2=1). MEME [36] was also employed to find sites under diversifying selection across the latrotoxin phylogeny with a false discovery rate (FDR) < 0.05.

3. Results

3.1. Venom expressed tandem latrotoxins

We sequenced 33342 bp of L. hesperus DNA from a genomic library clone containing the α-latrotoxin gene. Sanger sequencing and shotgun assembly of 236 subclones resulted in six contigs 968–8347 bp long, and the assembly was finished through primer walking. getORF predicted 129 proteins in both directions, 15 with significant BLASTp hits (Table 1). Two proteins (1401 and 1393 amino acids) had top BLASTp hits to α-latrotoxin. The upstream predicted protein had a top hit to L. hesperus α-latrotoxin with 99% identity and represents the α-latrotoxin locus, while the downstream latrotoxin had a best match to Steatoda grossa α-latrotoxin at 44% identity, and represents an adjacent paralog (Table 1). The translated tandem L. hesperus latrotoxins share 43% protein sequence identity.

Mapping of L. hesperus RNA-Seq data [19] with Tophat to the insert indicated that both latrotoxin genes are highly transcribed in venom glands (Fig. 1), with Cufflinks venom gland fragments per kilobase per million library reads (FPKM) of 80157 for α-latrotoxin and 81563 for the downstream paralog. FPKM in cephalothorax and silk glands was far lower for both latrotoxins (α-latrotoxin: cephalothorax = 1318, silk glands = 1442; downstream paralog: cephalothorax = 76, silk glands = 0).
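As a rough illustration of what these numbers mean, the sketch below applies the textbook FPKM definition and computes venom gland enrichment from the values reported above. It is a simplification: Cufflinks estimates abundance with a statistical model and effective transcript lengths, so the function `fpkm` and its arguments are illustrative assumptions rather than the program's actual procedure.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million
    mapped fragments (a simplification of what Cufflinks reports)."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def fold_enrichment(target, other):
    """Ratio of two expression values; infinite when the other tissue is 0."""
    return float("inf") if other == 0 else target / other


# FPKM values as reported in the text for the two tandem latrotoxins.
REPORTED = {
    "alpha-latrotoxin": {"venom": 80157, "cephalothorax": 1318, "silk": 1442},
    "downstream paralog": {"venom": 81563, "cephalothorax": 76, "silk": 0},
}

for gene, values in REPORTED.items():
    for tissue in ("cephalothorax", "silk"):
        ratio = fold_enrichment(values["venom"], values[tissue])
        print(f"{gene}: venom vs {tissue} = {ratio:.1f}x")
```

On these values α-latrotoxin is roughly 55 to 60 times more abundant in venom glands than in the other two tissues, and the downstream paralog is more strongly venom gland biased still.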

Venom gland RNA-Seq and EST evidence revealed one 4241 bp phase 1 intron in α-latrotoxin’s 25th codon from the stop codon and a 10005 bp intron in the 3′ UTR of the adjacent latrotoxin (Figs. 1 and 2, Table 2). The Cufflinks α-latrotoxin transcript had a 4257 bp coding region, a 316 bp 5′ UTR and a 2805 bp 3′ UTR (Table 2). The Trinity α-latrotoxin transcripts consisted of two overlapping (by 25 bp) fragments, one with a 150 bp 5′ UTR and the other with a 4828 bp 3′ UTR. The downstream paralog’s Cufflinks transcript contained a 4182 bp coding region, which was collinear with the genomic sequence, indicating no intron interrupting the coding region. The Cufflinks transcript for this locus had a 675 bp 5′ UTR and a 341 bp 3′ UTR flanking the 10005 bp intron located 57 bp into the 3′ UTR. The Trinity transcripts comprised two overlapping (by 25 bp) fragments, which lacked the first 10 bp of coding sequence, and with the downstream transcript terminating 6 bp before the Cufflinks transcript, but agreeing in intron position. A single bp insertion in the Trinity transcript produced a frameshift and premature truncation, but mapped reads suggested its incorrect assembly. The end of α-latrotoxin’s 3′ UTR and the putative transcription start site (TSS) of the adjacent latrotoxin were separated by 4545 or 2522 bp, depending on whether the Cufflinks or Trinity α-latrotoxin was considered. The MAKER annotation also supported two tandem latrotoxin paralogs, but Augustus gene predictions suggested multiple introns in both paralogs, disagreeing with the transcript evidence (Fig. 2).

Table 1
BLASTp hits for the proteins predicted by the getORF program from the complete L. hesperus 33 kb 119_P5 genomic insert sequence. Start and end columns indicate position of the ORF in the sequence. The final column indicates whether the best-hit NCBI accession is a transposable element.

Accession       Description                                                                        e-Value   Start  End    Length  TE?
AGD80166.1      Alpha-latrotoxin (Latrodectus hesperus)                                            0.0       2271   6473   4203    N
EEZ99596.1      Hypothetical protein TcasGA2_TC002110 (Tribolium castaneum)                        7.00e-30  7596   7159   436     Y
EEZ99596.1      Hypothetical protein TcasGA2_TC002110 (Tribolium castaneum)                        3.00e-17  7907   7542   364     Y
XP_001602535.2  Hypothetical protein LOC100118603 (Nasonia vitripennis)                            1.00e-33  15195  14575  619     Y
AGD80173.1      Alpha-latrotoxin (Steatoda grossa)                                                 0.0       18794  22972  4179    N
EJY57592.1      AAEL017053-PA (Aedes aegypti)                                                      8.00e-11  23965  23798  166     N
AAN87269.1      ORF (Drosophila melanogaster)                                                      4.00e-08  24578  24465  112     Y
EFA02763.1      Hypothetical protein TcasGA2_TC008496 (Tribolium castaneum)                        5.00e-26  24917  24633  283     Y
WP_006706426.1  Pao retrotransposon peptidase protein, partial (Candidatus Regiella insecticola)  4.00e-39  25890  25456  433     Y
XP_003245995.1  Hypothetical protein LOC100570266 (Acyrthosiphon pisum)                            2.00e-31  26481  26002  478     Y
EFA13346.1      Hypothetical protein TcasGA2_TC002325 (Tribolium castaneum)                        1.00e-32  26956  26504  451     Y
XP_003699175.1  Uncharacterized protein LOC100874762 (Megachile rotundata)                         1.00e-10  27862  27305  556     Y
XP_002012451.1  GI14275 (Drosophila mojavensis)                                                    8.00e-10  28257  27931  325     N
CAA35587.1      Unnamed protein product (Drosophila melanogaster)                                  8.00e-59  30831  29839  991     Y
AAX07244.1      Reverse transcriptase (Liobuthus kessleri)                                         3.00e-23  31174  30812  361     Y

Fig. 1. Mapping of RNA-Seq data to the 33 kb L. hesperus genomic reference sequence showing strong venom gland specific expression of tandem latrotoxin paralogs. Top ruler shows location in the reference sequence. Representative reads from the L. hesperus venom gland (top panel), cephalothorax (middle) and silk gland (bottom) RNA-seq libraries are shown mapped to the reference using Tophat with visualization in IGV. Reads mapping across intron boundaries are connected by thin lines and show location of splice junctions. At the bottom the gene structure of the two latrotoxins encoded by the fosmid sequence as determined by Cufflinks is diagrammed. In exonic regions, coding sequences are in gray, UTRs in black. The dashed line indicates the extended 3′ UTR region predicted by the Trinity assembled transcript. The position of the mariner element is indicated in the α-latrotoxin intron, and the RepeatMasker predicted retrotransposon positions are indicated in red in the intron of the downstream paralog.

Table 2
Locations of latrotoxin gene features in the L. hesperus genomic insert sequence. In cases where the predicted length of a feature varies when measured using the Trinity predicted transcript or the Cufflinks predicted transcript, both values are shown. The first block gives values for the upstream α-latrotoxin locus, the second for the downstream latrotoxin paralog.

Feature                   Start  End    Length

Upstream α-latrotoxin locus
5′ UTR exon (Cufflinks)   1955   2270   316
5′ UTR exon (Trinity)     2121   2270   150
Coding exon               2271   6450   4180
Intron                    6451   10691  4241
Coding exon               10692  10768  77
3′ UTR exon (Cufflinks)   10769  13573  2805
3′ UTR exon (Trinity)     10769  15596  4828

Downstream latrotoxin paralog
5′ UTR exon               18119  18793  675
Coding exon               18794  22975  4182
3′ UTR exon               22976  23032  57
Intron                    23033  33037  10005
3′ UTR exon (Cufflinks)   33038  33321  284
3′ UTR exon (Trinity)     33038  33315  278

Fig. 2. MAKER-annotated L. hesperus genomic sequence illustrating latrotoxin introns containing transposable elements. The top annotation track shows Augustus predicted gene structure. The middle track shows alignments of RNA-Seq assembled transcripts and ESTs to the fosmid (Cufflinks assembled transcripts labeled with prefix “CUFF”, Trinity assembled transcripts labeled with prefix “venom_comp”, all other sequences are L. hesperus venom gland ESTs labeled with their GenBank Accession numbers), while the bottom section shows repeat regions identified by RepeatMasker (all features annotated in this track from 0 to 19 kb are simple repeat/low complexity regions; features between 22 and 32 kb are retrotransposon fragments). The position of the putative Tc1/mariner family transposon not predicted by RepeatMasker is shown in green.

Table 3
Repeats found in the L. hesperus genomic insert sequence. For simple repeats the repeat motif is shown. All repeats were identified by RepeatMasker with the exception of the putative Tc1/mariner superfamily DNA transposon (last row) identified as discussed in the text.

Repeat class/family            Begin  End    Matching repeat
Simple repeat/Low complexity   765    806    (ATTTA)n
                               2102   2143   (A)n
                               8662   8691   (AG)n
                               9476   9532   (ATTTT)n
                               11119  11213  (TATTAAT)n
                               13027  13076  A-rich
                               15841  15890  (TTA)n
                               16588  16656  A-rich
                               17928  17978  (TAATTTT)n
LTR/Pao/BEL                    23817  25031
                               25415  26100
                               26331  26839
                               27242  27324
                               28119  28193
LINE/Jockey                    30008  30075
                               30515  31256
Tc1/mariner                    7011   8435
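The lengths and distances quoted in this section follow directly from the 1-based, inclusive coordinates in Table 2. The short Python check below recomputes them as a worked example; the helper names `span` and `gap` are hypothetical and exist only for this illustration.

```python
def span(start, end):
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def gap(end_upstream, start_downstream):
    """Number of bases strictly between two features."""
    return start_downstream - end_upstream - 1


# Upstream alpha-latrotoxin locus: two coding exons split by one intron.
alpha_cds = span(2271, 6450) + span(10692, 10768)
alpha_intron = span(6451, 10691)

# Downstream paralog: 3' UTR (Cufflinks) is split by the long intron.
downstream_cds = span(18794, 22975)
downstream_utr3 = span(22976, 23032) + span(33038, 33321)
downstream_intron = span(23033, 33037)

# Distance from the end of alpha-latrotoxin's 3' UTR to the paralog's TSS,
# using either the Cufflinks or the longer Trinity 3' UTR.
intergenic_cufflinks = gap(13573, 18119)
intergenic_trinity = gap(15596, 18119)

print(alpha_cds, alpha_intron, downstream_cds, downstream_utr3)
print(downstream_intron, intergenic_cufflinks, intergenic_trinity)
```

The two intergenic distances differ only because the Trinity transcript extends α-latrotoxin's 3′ UTR by a further 2023 bp.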

3.2. Latrotoxin transposable elements

RepeatMasker identified seven transposable elements (TEs) from 68 to 1215 bp in the 3′ UTR intron of the downstream latrotoxin. Five belong to the Pao retrotransposon family and two were long interspersed nuclear elements (LINEs) in the Jockey retrotransposon family (Fig. 2, Table 3). Most getORF-predicted proteins with significant BLASTp hits matched TEs (11 of 15; Tables 1 and 3). These elements, or closely related copies, showed high expression in silk and cephalothorax tissues, but not in venom glands (Supplementary Materials, Table 2).

A putative TE not detected by RepeatMasker was located within the α-latrotoxin intron. BLASTn of this intron found significant hits to L. hesperus genomic sequences surrounding the dragline silk major ampullate silk protein 1 (MaSp1) and major ampullate silk protein 2 (MaSp2) encoding genes. The top BLASTn hit was to two different regions (1304 or 1429 bp at 92% identity) of a MaSp2 containing clone (EF595245) from the library used in this study [18]. A 652 bp BLASTn match was found to a different L. hesperus genomic region from a MaSp1 containing clone (EF595246). A Cufflinks transcript for this TE in α-latrotoxin’s intron (positions 7014–8433), or closely related copies, had much higher expression in silk glands and cephalothorax than in venom glands (FPKM: venom gland = 1447, cephalothorax = 304868, silk gland = 838794). BLASTx of this transcript to NCBI revealed homology to transposases of the Tc1/mariner DNA transposon superfamily [37], in agreement with getORF predictions (Table 1). EMBOSS einverted found 41 bp inverted repeats flanking coordinates 7011–8435, indicating the mariner-like transposon in the intron in α-latrotoxin is 1425 bp.
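The placement of each TE can be checked by testing whether its coordinates (Table 3) fall inside an intron (Table 2). The following Python sketch does this with a simple interval containment test; the names `INTRONS`, `REPEATS`, `host_intron` and `length` are hypothetical, and only the coordinates come from the tables.

```python
# Intron coordinates from Table 2 (1-based, inclusive).
INTRONS = {
    "alpha-latrotoxin": (6451, 10691),
    "downstream paralog": (23033, 33037),
}

# Transposable element coordinates from Table 3.
REPEATS = [
    ("Tc1/mariner", 7011, 8435),
    ("LTR/Pao/BEL", 23817, 25031),
    ("LTR/Pao/BEL", 25415, 26100),
    ("LTR/Pao/BEL", 26331, 26839),
    ("LTR/Pao/BEL", 27242, 27324),
    ("LTR/Pao/BEL", 28119, 28193),
    ("LINE/Jockey", 30008, 30075),
    ("LINE/Jockey", 30515, 31256),
]


def length(start, end):
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def host_intron(start, end):
    """Name of the intron that fully contains the interval, else None."""
    for name, (i_start, i_end) in INTRONS.items():
        if i_start <= start and end <= i_end:
            return name
    return None


retro_lengths = [length(s, e) for fam, s, e in REPEATS if fam != "Tc1/mariner"]
mariner_fraction = length(7011, 8435) / length(*INTRONS["alpha-latrotoxin"])

for family, start, end in REPEATS:
    print(family, length(start, end), host_intron(start, end))
print(f"mariner-like TE share of alpha-latrotoxin intron: {mariner_fraction:.1%}")
```

All seven retrotransposon fragments fall inside the downstream paralog's 3′ UTR intron, while the mariner-like element lies entirely within the α-latrotoxin intron and accounts for about a third of its length.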

3.3. Latrotoxin regulatory elements

Twenty-nine promoters were predicted in the 33 kb insert (Supplementary Materials, Table 3), with a score ≥0.95. None were within 600 bp of a TSS (defined by transcript termini). When run on 5′ UTRs, 3′ UTRs, or putative promoters upstream of TSS, PipMaker found no conserved regions between latrotoxin paralogs. Several transcription factor (TF) binding sites were predicted by Match 1.0 in the introns of both paralogs, but their orientation was opposite of latrotoxin genes, and functions of the TFs binding these sites had no obvious connection to venom production (Table 4).

UTRscan predicted upstream open reading frames (uORFs) in the 5′ UTRs of both latrotoxin transcripts, an internal ribosome entry site (IRES) in the 5′ UTR of α-latrotoxin, and musashi binding elements (MBEs) in the 5′ UTR of the downstream latrotoxin (Supplementary Materials, Table 4). The 3′ UTR from the Cufflinks α-latrotoxin contained multiple MBEs, with additional MBEs and a Brd-Box motif in the longer 3′ UTR from the Trinity transcript. Both α-latrotoxin transcript 3′ UTRs terminated in a predicted polyadenylation signal, suggesting possible alternative transcripts with different 3′ UTRs. The Cufflinks 3′ UTR of the downstream latrotoxin gene had one predicted terminal polyadenylation site.


Table 4
Transcriptional regulatory protein binding sites identified in latrotoxin introns using Match 1.0. Shown are position within intron sequence, orientation, name and functional information.

Position  Strand  Name     Function
563       –       CF2-II   Late activator in follicle cells during chorion formation
2463      –       BR-C Z1  Ecdysone response during metamorphosis
2842      –       BR-C Z4  Ecdysone response during metamorphosis
3930      –       BR-C Z4  Ecdysone response during metamorphosis

895       –       Elf-1    Epidermal differentiation
4447      –       Croc     Head development
4664      –       BR-C Z1  Ecdysone response during metamorphosis


3.4. Latrotoxin family evolution

Most latrotoxins from L. hesperus group in highly supported clades with orthologs from L. tredecimguttatus, with five exceptions, including the paralog adjacent to α-latrotoxin (Fig. 3). The adjacent latrotoxin genes are closely related, but not sister paralogs, with α-latrotoxin’s downstream paralog being more closely related to another paralog. The Cufflinks-translated L. hesperus α-latrotoxin falls within a clade of orthologs. A test for sites evolving under positive selection using all latrotoxins (M7 vs. M8: 2Δln L = 9.113; df = 2; P = 0.0104) was significant, with an additional M8 category of sites with an average dN/dS of 1.618. However, no sites had a posterior probability >0.592 in the site class under positive selection. A branch-site test was applied specifically to two branches subsequent to the duplication event separating α-latrotoxin from the downstream paralog (Fig. 3), and a model allowing for positive selection on these branches was a better fit to the data (MA vs. MA with ω2 = 1: 2Δln L = 14.219; df = 1; P = 0.0001), with two codons (389 and 886) having a posterior probability >0.95 of dN/dS > 1. MEME indicated 9 codons under positive selection, with most clustered towards the C-terminus of the latrotoxin protein (Supplementary Material).

Fig. 3. Evolutionary relationships of latrotoxin paralogs indicating the relatively recent duplication of tandem latrotoxins. Shown is a midpoint rooted Bayesian phylogenetic tree of latrotoxin proteins. Numbers at nodes indicate clade posterior probabilities (PP). The node marked by an asterisk has a PP of 1.00. Sequences from L. tredecimguttatus are in red and L. hesperus in blue; sequences from L. geometricus and S. grossa are in orange or green, respectively. Arrows indicate the tandem sequences assembled by Cufflinks from this study. All other L. hesperus and L. tredecimguttatus sequences were from de novo assembly of RNA-Seq reads. Shaded beige boxes label clades that contain a functionally characterized paralog (α-LTX = α-latrotoxin clade; δ-LIT = δ-latroinsectotoxin clade; α-LIT = α-latroinsectotoxin clade; α-LCT = α-latrocrustotoxin clade). The larger gray shaded clade contains the two genomic insert latrotoxins and more closely related sequences, including all α-latrotoxins. Branches tested for sites under positive selection are indicated by the number 1 in purple text.
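The likelihood ratio tests above compare twice the log-likelihood difference between nested models with a chi-square distribution. The Python sketch below converts the two reported statistics to P values using closed forms that hold only for 1 and 2 degrees of freedom. The function name `chi2_sf` is hypothetical, and the use of a plain chi-square null is an assumption: branch-site tests are sometimes evaluated against a mixture distribution, so small differences from the reported P values are to be expected.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic.

    Uses exact closed forms, available only for df = 1 and df = 2.
    """
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


# Statistics (2 * delta lnL) and degrees of freedom as reported in the text.
tests = {
    "M7 vs. M8 (sites)": (9.113, 2),
    "MA vs. MA with omega2 = 1 (branch-site)": (14.219, 1),
}

for name, (stat, df) in tests.items():
    print(f"{name}: 2*dlnL = {stat}, df = {df}, P = {chi2_sf(stat, df):.5f}")
```

Both tests fall below the conventional 0.05 threshold under this calculation, in line with the reported results.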

4. Discussion

Characterization of 33 kb of the Western black widow (L. hesperus) genome revealed the complete α-latrotoxin gene and a divergent but closely related paralogous locus, both of which are highly expressed in venom glands at similar levels. These loci represent two of ≥20 potential L. hesperus latrotoxin genes with venom-gland specific expression [16]. These adjacent paralogs occur 4.5 kb apart, providing evidence of a tandem gene duplication. Both latrotoxin loci encode large proteins, but contrary to previous findings in L. tredecimguttatus [38], introns are present in their coding sequence or 3′ UTR. The latrotoxin coding and 3′ UTR introns are approximately 5× and 7.5× the average length of introns from corresponding regions in D. melanogaster [39]. However, these introns are similar to average lengths recently reported from spider genomes [40]. While energetically costly to transcribe [41], introns can increase gene expression when they contain regulatory enhancers, or by direct interaction of the transcriptional and spliceosomal machinery [42].

Latrotoxin UTRs have several predicted regulatory sequences that may function in post-transcriptional regulation. The musashi mRNA binding protein exerts translational control and modulates cell division and cell cycle progression [43], and the multiple UTR musashi binding elements predicted in both latrotoxins may regulate their expression. Haney et al. [16] reported a protein with a significant BLAST hit to a putative RNA-binding musashi protein among 695 L. hesperus venom gland specifically expressed transcripts, with venom gland biased expected read counts per million reads among tissues (cephalothorax = 54.64; silk = 5.9; venom = 2647.02), supporting the role of MBEs in latrotoxin regulation. Upstream ORFs were also predicted in the 5′ UTRs of both paralogs, which may modulate expression through effects on translation efficiency [44]. Furthermore, the presence of a 3′ UTR intron may target the downstream latrotoxin’s transcript to the non-sense mediated decay (NMD) pathway, greatly reducing translation [45]. In mammals, this typically occurs when the 3′-most exon-exon junction is >50–55 bp downstream of the stop codon [46], and the splice junction in the 3′ UTR of the downstream latrotoxin is 57 bp beyond the stop codon. Despite both genes being expressed at similar levels in venom glands, the downstream latrotoxin paralog was not identified in venom by mass spectrometry [16], consistent with its transcripts being targeted to the NMD pathway. While NMD has not been studied in arachnids, BLASTp of three components of this pathway (UPF1–3) from Homo sapiens to the L. hesperus transcriptome identified expressed homologs in spiders (e-values of best hits: UPF1 = 0.0; UPF2 = 0.0; UPF3 = 7e-58).
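The mammalian rule of thumb mentioned above can be written as a simple distance check. The Python sketch below applies it to the coordinates in Table 2; the function name `nmd_candidate` and its threshold argument are illustrative assumptions, and whether the same rule applies in spiders is not known.

```python
def nmd_candidate(coding_end, last_exon_before_junction_end, threshold=50):
    """Apply the mammalian 50-55 nt rule of thumb.

    coding_end: genomic position of the last base of the stop codon.
    last_exon_before_junction_end: last base of the exon immediately
        upstream of the 3'-most exon-exon junction.
    Returns (is_candidate, distance), where distance is the number of
    transcript bases between the stop codon and the junction.
    """
    distance = last_exon_before_junction_end - coding_end
    return distance > threshold, distance


# Downstream paralog: coding exon ends at 22975, 3' UTR intron starts at 23033.
downstream_50 = nmd_candidate(22975, 23032, threshold=50)
downstream_55 = nmd_candidate(22975, 23032, threshold=55)

# Alpha-latrotoxin: its only junction (after 6450) lies upstream of the
# stop codon (coding sequence ends at 10768), so the rule is not met.
alpha = nmd_candidate(10768, 6450)

print(downstream_50, downstream_55, alpha)
```

The downstream paralog exceeds both ends of the 50–55 nt range by only a few bases, so the prediction is sensitive to the exact threshold used.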

The latrotoxin paralogs have been subject to several transposable element (TE) insertions, mostly long terminal repeat (LTR) retrotransposons belonging to the BEL/Pao family, and non-LTR retrotransposons of the Jockey family, both widespread in eukaryotic genomes and occurring within the downstream paralog’s 3′ UTR intron [47,48]. The identified TEs also exhibit tissue-specific expression, with much lower expression in the venom gland, relative to silk and cephalothorax tissues. A large TE was found in α-latrotoxin’s intron, with homology to the Tc1/mariner DNA transposon superfamily, which is amenable to interspecific horizontal transfer [37]. This transposon may have recently proliferated in the L. hesperus genome, as closely related sequences (up to 92% identity over 1400 bp) were identified in two other sequenced clones from this same genomic library, representing different loci [18]. Moreover, the mariner-like TE composes 33% of the length of α-latrotoxin’s intron, and its recent insertion may be linked to this intron’s origin [49,50]. Although a complete transposase ORF was not present in this TE, BLASTp of this sequence to the L. hesperus transcriptome reveals expressed transcripts encoding complete transposase, suggesting these mariner-like elements are actively transposing. While the intronless nature of other latrotoxins [38,51] is consistent with a recent origin of introns in L. hesperus latrotoxins, the methods used to identify introns in L. tredecimguttatus latrotoxins (PCR-RFLP [38]) may have missed small introns, or those outside coding regions.

Nearly all latrotoxin transcripts assembled from RNA-Seq data in L. hesperus had a closely related ortholog in L. tredecimguttatus (also assembled from RNA-Seq data), implying the latrotoxin gene family had diversified through numerous gene duplications in the most recent common ancestor of these two species or in an earlier ancestor. Yet the paralog downstream of α-latrotoxin did not appear to have an L. tredecimguttatus ortholog, suggesting that it is not evolutionarily conserved, that it is a recent duplicate for which the closely related paralog in L. hesperus was not sampled, or that it is not expressed or is variably expressed in L. tredecimguttatus. The adjacent latrotoxins are not sister paralogs, but occur in the same sub-clade, supporting recent duplication events via non-homologous crossing-over, and we expect that many other latrotoxins are clustered in the Latrodectus genome. While latrotoxins largely appear to be evolving under purifying selection, our analyses found limited evidence of positive selection on a few codons, suggesting the possible adaptive evolution of latrotoxins. Clearly, functional assays are needed to test the toxin activity of these recently described paralogs, which is currently unknown. The high level of sequence divergence between α-latrotoxin and its most closely related paralogs suggests functional variability. While gene duplication and selection are implicated as important processes in venom evolution, as has been shown in spiders and other venomous taxa [52–55], functional data will add to our understanding of how these processes impact venom phenotypes.

Acknowledgements

Rujuta Gadgil, Cheryl Hayashi, Caryn McCowan, and Alex Lancaster helped in data collection. This study was funded by the National Institutes of Health from Grants 1F32GM83661-01 and 1R15GM097714-01 to JEG, and F32 GM78875-1A to NAA, and by the National Science Foundation (IOS-0951886 to NAA).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.febslet.2014.08.034.

References

[1] Duda, T.F. and Palumbi, S.R. (1999) Molecular genetics of ecological diversification: duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc. Natl. Acad. Sci. U.S.A. 96, 6820–6823, http://dx.doi.org/10.1073/pnas.96.12.6820.

[2] Fry, B.G. (2005) From genome to “venome”: molecular origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences and related body proteins. Genome Res. 15, 403–420, http://dx.doi.org/10.1101/gr.3228405.

[3] Moura-da-Silva, A., Paine, M.I., Diniz, M.V., Theakston, R.D. and Crampton, J. (1995) The molecular cloning of a phospholipase A2 from Bothrops jararacussu snake venom: evolution of venom group II phospholipase A2’s may imply gene duplications. J. Mol. Evol. 41, 174–179, http://link.springer.com/10.1007/BF00170670.

[4] Whittington, C.M., Papenfuss, A.T., Locke, D.P., Mardis, E.R., Wilson, R.K., et al. (2010) Novel venom gene discovery in the platypus. Genome Biol. 11, R95, http://dx.doi.org/10.1186/gb-2010-11-9-r95.

[5] Escoubas, P., Quinton, L. and Nicholson, G.M. (2008) Venomics: unravelling the complexity of animal venoms with mass spectrometry. J. Mass Spectrom. 43, 279–295, http://dx.doi.org/10.1002/jms.1389.

K.V. Bhere et al. / FEBS Letters 588 (2014) 3891–3897 3897

[6] Ushkaryov, Y.A., Volynski, K.E. and Ashton, A.C. (2004) The multiple actions of black widow spider toxins and their selective use in neurosecretion studies. Toxicon 43, 527–542, http://dx.doi.org/10.1016/j.toxicon.2004.02.008.

[7] Pineda, S.S., Wilson, D., Mattick, J.S. and King, G.F. (2012) The lethal toxin from Australian funnel-web spiders is encoded by an intronless gene. PLoS ONE 7, e43699, http://dx.doi.org/10.1371/journal.pone.0043699.

[8] He, Q., Duan, Z., Yu, Y., Liu, Z., Liu, Z., et al. (2013) The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis. PLoS ONE 8, e81357, http://dx.doi.org/10.1371/journal.pone.0081357.

[9] Zhang, Y., Chen, J., Tang, X., Wang, F., Jiang, L., et al. (2010) Transcriptome analysis of the venom glands of the Chinese wolf spider Lycosa singoriensis. Zoology 113, 10–18, http://dx.doi.org/10.1016/j.zool.2009.04.001.

[10] Kiyatkin, N.I., Dulubova, I.E., Chekhovskaya, I.A. and Grishin, E.V. (1990) Cloning and structure of cDNA encoding α-latrotoxin from black widow spider venom. FEBS Lett. 270, 127–131, http://dx.doi.org/10.1016/0014-5793(90)81250-R.

[11] Krasnoperov, V.G., Shamotienko, O.G. and Grishin, E.V. (1990) Isolation and properties of insect-specific neurotoxins from venoms of the spider Latrodectus mactans tredecimguttatus. Bioorg. Khim. 16, 1138–1140.

[12] Ashton, A.C. (2001) Alpha-Latrotoxin, acting via two Ca2+-dependent pathways, triggers exocytosis of two pools of synaptic vesicles. J. Biol. Chem. 276, 44695–44703, http://dx.doi.org/10.1074/jbc.M108088200.

[13] Orlova, E.V., Rahman, M.A., Gowen, B., Volynski, K.E., Ashton, A.C., et al. (2000) Structure of alpha-latrotoxin oligomers reveals that divalent cation-dependent tetramers form membrane pores. Nat. Struct. Biol. 7, 48–53, http://dx.doi.org/10.1038/71247.

[14] Ushkaryov, Y.A., Petrenko, A.G., Geppert, M. and Südhof, T.C. (1992) Neurexins: synaptic cell surface proteins related to the alpha-latrotoxin receptor and laminin. Science 257, 50–56.

[15] Garb, J.E. and Hayashi, C.Y. (2013) Molecular evolution of alpha-Latrotoxin, the exceptionally potent vertebrate neurotoxin in black widow spider venom. Mol. Biol. Evol. 30, 999–1014, http://dx.doi.org/10.1093/molbev/mst011.

[16] Haney, R.A., Ayoub, N.A., Clarke, T.H., Hayashi, C.Y. and Garb, J.E. (2014) Dramatic expansion of the black widow toxin arsenal uncovered by multi-tissue transcriptomics and venom proteomics. BMC Genomics 15, 366, http://www.biomedcentral.com/1471-2164/15/366.

[17] Gregory, T.R. and Shorthouse, D.P. (2003) Genome sizes of spiders. J. Hered. 94, 285–290, http://dx.doi.org/10.1093/jhered/esg070.

[18] Ayoub, N.A., Garb, J.E., Tinghitella, R.M., Collin, M.A. and Hayashi, C.Y. (2007) Blueprint for a high-performance biomaterial: full-length spider dragline silk genes. PLoS ONE 2, e514, http://dx.doi.org/10.1371/journal.pone.0000514.

[19] Clarke, T.H., Garb, J.E., Hayashi, C.Y., Haney, R.A., Lancaster, A.K., et al. (2014) Multi-tissue transcriptomics of the black widow spider reveals expansions, co-options, and functional processes of the silk gland gene toolkit. BMC Genomics 15, 365, http://www.biomedcentral.com/1471-2164/15/365.

[20] Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652, http://dx.doi.org/10.1038/nbt.1883.

[21] Trapnell, C., Pachter, L. and Salzberg, S.L. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111, http://dx.doi.org/10.1093/bioinformatics/btp120.

[22] Thorvaldsdottir, H., Robinson, J.T. and Mesirov, J.P. (2013) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192, http://dx.doi.org/10.1093/bib/bbs017.

[23] Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515, http://dx.doi.org/10.1038/nbt.1621.

[24] Min, X.J., Butler, G., Storms, R. and Tsang, A. (2005) OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res. 33, W677–W680, http://dx.doi.org/10.1093/nar/gki394.

[25] McCowan, C. and Garb, J.E. (2014) Recruitment and diversification of an ecdysozoan family of neuropeptide hormones for black widow spider venom expression. Gene 536, 366–375, http://dx.doi.org/10.1016/j.gene.2013.11.054.

[26] Cantarel, B.L., Korf, I., Robb, S.M.C., Parra, G., Ross, E., et al. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196, http://dx.doi.org/10.1101/gr.6743907.

[27] Smit, A.F.A., Hubley, R. and Green, P. RepeatMasker Open-3.0. Available: http://www.repeatmasker.org.

[28] Reese, M.G. (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput. Chem. 26, 51–56.

[29] Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., et al. (2006) TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110, http://dx.doi.org/10.1093/nar/gkj143.

[30] Grillo, G., Turi, A., Licciulli, F., Mignone, F., Liuni, S., et al. (2010) UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res. 38, D75–D80, http://dx.doi.org/10.1093/nar/gkp902.

[31] Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., et al. (2000) PipMaker-a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.

[32] Papadopoulos, J.S. and Agarwala, R. (2007) COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073–1079, http://dx.doi.org/10.1093/bioinformatics/btm076.

[33] Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., et al. (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542, http://dx.doi.org/10.1093/sysbio/sys029.

[34] Suyama, M., Torrents, D. and Bork, P. (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612, http://dx.doi.org/10.1093/nar/gkl315.

[35] Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591, http://dx.doi.org/10.1093/molbev/msm088.

[36] Murrell, B., Wertheim, J.O., Moola, S., Weighill, T., Scheffler, K., et al. (2012) Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8, e1002764, http://dx.doi.org/10.1371/journal.pgen.1002764.

[37] Plasterk, R.H., Izsvák, Z. and Ivics, Z. (1999) Resident aliens: the Tc1/mariner superfamily of transposable elements. Trends Genet. 15, 326–332.

[38] Danilevich, V.N. and Grishin, E.V. (2000) The chromosomal genes for black widow spider neurotoxins do not contain introns. Bioorg. Khim. 26, 933–939.

[39] Hong, X., Scofield, D.G. and Lynch, M. (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23, 2392–2404, http://dx.doi.org/10.1093/molbev/msl111.

[40] Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., et al. (2014) Spider genomes provide insight into composition and evolution of venom and silk. Nat. Commun. 5, 3765, http://dx.doi.org/10.1038/ncomms4765.

[41] Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V. and Kondrashov, F.A. (2002) Selection for short introns in highly expressed genes. Nat. Genet. 31, 415–418, http://dx.doi.org/10.1038/ng940.

[42] Le Hir, H., Nott, A. and Moore, M.J. (2003) How introns influence and enhance eukaryotic gene expression. Trends Biochem. Sci. 28, 215–220, http://dx.doi.org/10.1016/S0968-0004(03)00052-5.

[43] MacNicol, M.C., Cragle, C.E. and MacNicol, A. (2011) Context-dependent regulation of Musashi-mediated mRNA translation and cell cycle regulation. Cell Cycle 10, 39–44, http://dx.doi.org/10.4161/cc.10.1.14388.

[44] Mignone, F., Gissi, C., Liuni, S. and Pesole, G. (2002) Untranslated regions of mRNAs. Genome Biol. 3, REVIEWS0004.

[45] Baker, K.E. and Parker, R. (2004) Nonsense-mediated mRNA decay: terminating erroneous gene expression. Curr. Opin. Cell Biol. 16, 293–299, http://dx.doi.org/10.1016/j.ceb.2004.03.003.

[46] Bicknell, A.A., Cenik, C., Chua, H.N., Roth, F.P. and Moore, M.J. (2012) Introns in UTRs: why we should stop ignoring them. BioEssays 34, 1025–1034, http://dx.doi.org/10.1002/bies.201200073.

[47] De la Chaux, N. and Wagner, A. (2011) BEL/Pao retrotransposons in metazoan genomes. BMC Evol. Biol. 11, 154, http://dx.doi.org/10.1186/1471-2148-11-154.

[48] Glushkov, S., Novikova, O., Blinov, A. and Fet, V. (2006) Divergent non-LTR retrotransposon lineages from the genomes of scorpions (Arachnida: Scorpiones). Mol. Genet. Genomics 275, 288–296, http://dx.doi.org/10.1007/s00438-005-0079-3.

[49] Purugganan, M.D. (1993) Transposable elements as introns: evolutionary connections. Trends Ecol. Evol. 8, 239–243, http://dx.doi.org/10.1016/0169-5347(93)90198-X.

[50] Roy, S.W. (2004) The origin of recent introns: transposons? Genome Biol. 5, 251, http://dx.doi.org/10.1186/gb-2004-5-12-251.

[51] Danilevich, V.N., Luk’ianov, S.A. and Grishin, E.V. (1999) Cloning and structure of gene encoded alpha-latrocrustoxin from the black widow spider venom. Bioorg. Khim. 25, 537–547.

[52] Casewell, N.R., Wagstaff, S.C., Harrison, R.A., Renjifo, C. and Wuster, W. (2011) Domain loss facilitates accelerated evolution and neofunctionalization of duplicate snake venom metalloproteinase toxin genes. Mol. Biol. Evol. 28, 2637–2649, http://dx.doi.org/10.1093/molbev/msr091.

[53] Chang, D. and Duda, T.F. (2012) Extensive and continuous duplication facilitates rapid evolution and diversification of gene families. Mol. Biol. Evol. 29, 2019–2029, http://dx.doi.org/10.1093/molbev/mss068.

[54] Vonk, F.J., Casewell, N.R., Henkel, C.V., Heimberg, A.M., Jansen, H.J., et al. (2013) The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc. Natl. Acad. Sci. U.S.A. 110, 20651–20656, http://dx.doi.org/10.1073/pnas.1314702110.

[55] Binford, G.J., Bodner, M.R., Cordes, M.H.J., Baldwin, K.L., Rynerson, M.R., et al. (2009) Molecular evolution, functional variation, and proposed nomenclature of the gene family that includes sphingomyelinase D in sicariid spider venoms. Mol. Biol. Evol. 26, 547–566, http://dx.doi.org/10.1093/molbev/msn274.