Expanded protein families in Echinococcus - Springer, doi:10.1186/s12864-017-3574...

Expanded protein families in Echinococcus

A total of 26 expanded families, consisting of 10 to 66 protein members, were found in Echinococcus (Additional file 1.16). Among them is the heat-shock protein 70 (Hsp70) family, which has been described by Tsai et al. in all of the tapeworm genomes obtained so far [1]. We also found three interesting expanded families present only in the cestode orthology group: GPS motif-containing protein, ubiquitin-conjugating enzyme, and glycosyl transferase. The E. canadensis (G7) GPS motif-containing protein is related to polycystin-1, a protein involved in central signal-transduction pathways; its GPS motif (PF01825) is responsible for protein-protein interactions. Polycystins form an expanding family of proteins with multiple members in fish, invertebrates and mammals, including humans. Ubiquitin-conjugating enzyme is involved in the ubiquitination pathway, modulating protein degradation and protein-protein interactions. The ubiquitin-conjugating (UBC) complex consists of up to 19 genes in Echinococcus. Protein sequence alignments showed high conservation of the UBC superfamily domain (PF00179) only among cestode parasites (Figure 1).

[Figure 1: protein sequence alignment, positions 1 to 300, of 25 sequences: 3 EcG7, 6 Egra, 7 Emul, 5 TsM1 and 4 Hmic. The top row, EcG7|EcG7_05780-RA, is written out in full; in the other rows a dot marks a residue identical to the top row and a dash marks a gap. The UBCc domain (PF00179) is annotated.]
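The alignment in Figure 1 appears to use dot notation: the first sequence (EcG7_05780-RA) is written out in full, and each later row shows a dot wherever its residue matches the first row and a dash for a gap. The following is a minimal sketch, assuming that reading of the figure, of how such a row can be expanded back to residues and scored for identity against the reference. The function names `expand_row` and `percent_identity` are hypothetical and are not part of the authors' pipeline; the example strings are the first ten columns of the first two rows.

```python
def expand_row(reference: str, row: str) -> str:
    """Replace each '.' in a dot-notation row with the reference residue
    in the same column. Gaps ('-') and explicit residues are kept."""
    if len(reference) != len(row):
        raise ValueError("reference and row must have the same length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def percent_identity(reference: str, row: str) -> float:
    """Percentage of columns in which the row matches the reference.
    Columns where either sequence has a gap are skipped (an assumption;
    other conventions for gap handling exist)."""
    compared = 0
    matches = 0
    for ref, ch in zip(reference, expand_row(reference, row)):
        if ref == "-" or ch == "-":
            continue
        compared += 1
        if ref == ch:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First ten alignment columns, spaces removed.
ref_05780 = "MTCSAIRRTF"   # EcG7|EcG7_05780-RA (reference row)
row_09374 = "...PTTQ..."   # EcG7|EcG7_09374-RA (dot notation)

print(expand_row(ref_05780, row_09374))        # MTCPTTQRTF
print(percent_identity(ref_05780, row_09374))  # 60.0
```

Skipping gapped columns is only one possible convention; counting gaps as mismatches would lower the identity for rows with long deletions.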



UBCc domain (PF00179)

Figure 1: Multiple alignment of E. canadensis (G7) expanded proteins. (A) Ubiquitin-conjugating enzyme
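The alignment rows in Figure 1 appear to follow the common convention in which a dot marks a residue identical to the first (reference) row and a dash marks a gap. This convention is an assumption, since it is not stated in the caption. Under that assumption, the minimal Python sketch below shows how such a row could be expanded and how a percent identity to the reference could be computed. The function names and the toy sequences are hypothetical and are not taken from the figure.

```python
def expand_dot_row(reference: str, row: str) -> str:
    """Return `row` with every '.' replaced by the reference residue in that column."""
    if len(reference) != len(row):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(ref if res == "." else res for ref, res in zip(reference, row))


def percent_identity(reference: str, row: str) -> float:
    """Percent of identical columns, skipping columns where either row has a gap ('-')."""
    expanded = expand_dot_row(reference, row)
    compared = [(a, b) for a, b in zip(reference, expanded) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy rows (hypothetical, not copied from the figure).
reference = "MTCSAI--RR"
row = ".NY.ED--.-"
print(expand_dot_row(reference, row))              # MNYSED--R-
print(round(percent_identity(reference, row), 1))  # 42.9
```

Skipping gapped columns is only one of several ways to define identity; counting gaps as mismatches would give lower values.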

The third expanded protein family is the glycosyl transferases, which are involved in glycan biosynthesis and modification. This pathway could play an important role in the biogenesis of the acellular, carbohydrate-rich laminated layer, which is a unique Echinococcus-specific trait and one of the morphological traits that differ among Echinococcus species. This protein family is composed of 10 members that are conserved among cestodes but highly divergent from those of other organisms (Figure 2).

[Alignment continued, positions 310-700: rows for the EcG7, Egra, Emul, TsM1 and Hmic sequences, with EcG7|EcG7_05780-RA as the first row; residue columns omitted.]

Figure 2: Multiple alignment of E. canadensis (G7) expanded proteins. (B) Glycosyl transferase enzyme. Protein domains are indicated with their Pfam identifiers.

Drug targets

Antigens

Antigen B (AgB) is one of the main antigens of hydatid cyst fluid. We have previously demonstrated that it is highly polymorphic and variable in its transcription profile in E. canadensis (G7) compared to E. granulosus (G1) [2,3]. Important functions have been proposed for this antigen, such as lipid binding and transport [4] and modulation of the cell response to inflammation [5]. As expected, we found the AgB gene in a cestode-specific orthology group, along with E. canadensis (G7) orthologs of the cysteine-type endopeptidase inhibitor (immunogenic protein Ts11), which was previously found in the in vitro excretion/secretion products of the T. solium metacestode [6]. This indicates that there are similarities among the helminth excretion/secretion proteomes, which could be further studied for drug target development.

Antimicrobial peptides

We identified the E. canadensis (G7) “Antimicrobial peptide tachystatin A” (ECANG7_00862), which is present in all of the cestodes but absent in human hosts. This gene encodes a 92-amino-acid peptide with a predicted N-terminal signal peptide spanning amino acids 1 to 22, suggesting that it could be excreted/secreted. The primary sequence of the E. canadensis (G7) protein showed low similarity to the antimicrobial peptide Tachystatin-A2 of Tachypleus tridentatus (Arthropoda) (accession number: Q9U8X3) [7] (Figure 3). This protein belongs to the defensin family and has a secondary structure consisting of a cysteine-stabilized triple-stranded beta-sheet [8] and a signal peptide. Defensins are abundant and widely distributed antimicrobial peptides that play an important role in innate immunity and are found in multicellular animals from molluscs to humans. They are characterized by a cationic, β-sheet-rich amphipathic structure stabilized by a conserved three-disulfide ligation motif [9]. Indeed, the predicted secondary structure of the E. canadensis (G7) peptide revealed β-sheet structures with six cysteine residues that could be involved in the stability of the β-sheet structure. In addition, we found that ECANG7_00862 exhibits physicochemical and structural properties commonly described for antimicrobial peptides: a net charge of +2, a mass of 10 kDa, ~50% hydrophobic amino acids, the presence of a signal peptide, and a short length. Analyses of recently published expression data indicated a high expression level of this peptide in Echinococcus [1,10]. Amino acid sequence analysis of orthologs of ECANG7_00862 showed that all of the proteins of the

Galactosyl-T (PF01762)

[Alignment block, positions 10-290: rows for the EcG7, Egra, Emul, TsM1 and Hmic sequences, with EcG7|EcG7_09779-RA as the first row; residue columns omitted.]
. . . . . . . . . . . . . . . . - - . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y . . . . . H . . . . . . L . . . . . . . . . .Emul|EmuJ_000370800.1 . . . T . . . . . . . F A . . T D . . V . . . . N K . L E Q L . S . . Y . . K M N . . S . V . . P S E - - Y . Q . . L . . . . . . . . V . A A . . . . . . . L W . . R . . . . . . . . . . . . . . . . . . . . . . . L K . H . . . . . . T K . M . . . P Q Y R E I S R K V R . . . . . . . . I T D L R L .Emul|EmuJ_000398300.1 . . . . . . . . . . . . . . . . S . . . K . I . . . . . . . . . . . . . . . . . . . . . . . . . . . . - - . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y . . . . . H . . . . . . L . . . . . . . . . .Emul|EmuJ_000406400.1 . . . . . . . . . . . . A . . I . . . A . . . . G . . . Q . . D . . S Y . . S M M E . . . H . F N S T S K Y P Q . . F . . . . . . . . S . . . E H L . I . . V W . . R . . . . M . . . . . . . . . M . I . . . . . A . . . . . . . . . . T K . . . . . F L D F . Q S D R L . . . . . . . . P I R M . Q . .TsM1|TsM_000875900 . . . . . . . . . . . . A . . T . . . V . . . . N S . . . S R . . . Y . . . V V D . . . . L . P . . T - - S . R . . F . . . . . . . . . . M . E . M . I . . . W . . C . . N . . . . . . . . . . . T . V . . I . . . . . . . . . . . . . Q S . D . . . S . V . . I E D H V N . . . . . . . . I . . . . . .TsM1|TsM_000087900 . . . . . . . . . . . . A . . T G . . V S . I L S . . S E S R . . . S Y . . D M M . . . . . . . A G T - - F P Q . . F . . . . . . . . . . . . E L L . I . . . W . . R . . . . M . . . . . . . . . M . I . . . . . A . . . . . . . . . . T . . . . . H F L A P . Q S . R S D . . . . . . . P I N . . K . .Hmic|HmN_000137800.1 . T . T . I . M . . . . A . . . K . . V A Y L K S R A . E E L . K . . . . F D R P E . V . . . L . H . - I F P Q . T F L . K . . . . . . N M . E . L . I . 
N V F G . S V . E . . . I G . Y . . . . I . F . G . Y I A I I M D . . . . P L L . . D R . T F G D M L P K R R H T . A K . L . V S . D D I . A .

medically relevant related tapeworms share many similarities: (i) a predicted β-sheet secondary structure is present, (ii) the four cysteine residues are conserved, and (iii) a C-terminal portion of approximately 23 residues (positions 75-97) is highly conserved.

Figure 3: Multiple alignment of drug target sequences of E. canadensis (G7) and their orthologs: (A) Antimicrobial peptides. Arrowheads indicate the cysteine residues involved in the stability of the β-sheet structure.
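As an illustration of how conserved cysteines can be located in an alignment such as the one in Figure 3, the following minimal sketch reports the columns in which every row carries a cysteine. The helper name `conserved_columns` is hypothetical, and the rows are only the first 24 aligned columns (alignment positions 71-94) of four of the antimicrobial-peptide sequences, so the result describes this excerpt and is not a reproduction of the authors' analysis.

```python
def conserved_columns(rows, residue="C"):
    """Return the 0-based alignment columns where every row carries `residue`."""
    width = min(len(row) for row in rows)
    return [i for i in range(width) if all(row[i] == residue for row in rows)]


# Excerpt (alignment positions 71-94) copied from the Figure 3 alignment rows.
rows = [
    "NHEPCCPGLTCRKVRDGNTYGRCS",  # EcG7|ECANG7_00862
    "NHKPCCPGLTCRKVRDGNTYGRCS",  # TsM1|TsM_000873400
    "VFNPCCPGMVCRKSREGARYGRCS",  # Hmic|HmN_000380100.1
    "PTIPCCRGLTCRSYFPGSTYGRCQ",  # TACA2|Q9U8X3
]

cys_cols = conserved_columns(rows)
# Convert the 0-based offsets back to alignment positions (the excerpt starts at 71).
cys_positions = [71 + i for i in cys_cols]
print(cys_positions)
```

In this excerpt the fully conserved cysteine columns fall at alignment positions 75, 76, 81 and 93. Whether these are the same four residues marked by the arrowheads cannot be confirmed from the text alone.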

Peptide hormones

Peptide hormones and neuropeptides are among the most structurally and functionally diverse classes of metazoan signalling molecules [11]. The PP-fold peptide family is one of these peptide hormone families [12] and is capable of binding to several G-protein-coupled receptors (GPCRs) [12]. GPCRs are involved in many diseases and are the target of approximately 40% of all modern medicinal drugs [13]. These peptides have been identified in S. mediterranea and Schistosoma spp., and there is evidence that the genomic organisation of flatworm peptide genes is conserved [11]. In this work, the former PP-fold family members were grouped into the “pancreatic hormone peptide” category. In E. canadensis (G7) we found four gene models associated with the term "pancreatic hormone peptide-like proteins" that were present exclusively either in cestodes or in flatworms. Two of them, the gene models ECANG7_09023 and ECANG7_05886, encode a cestode-specific “pancreatic hormone peptide”-like peptide containing a poorly conserved receptor-binding domain and a dimerization interface. The ECANG7_09023 receptor-binding domain sequence is 100% conserved in Echinococcus [1], but has only 55% identity with the corresponding ortholog of H. microstoma [1] and 60% identity with that of S. mediterranea (Figure 4). The ECANG7_05886 protein domain sequence showed 100% identity with the corresponding orthologs of E. multilocularis and E. granulosus (G1), 81% identity with that of H. microstoma [1] and 75% identity with that of Aplysia californica (Lophotrochozoa/Mollusca/Gastropoda, the closest invertebrate for which information is available) [14]. There is experimental evidence for the transcription of both gene models in E. multilocularis metacestodes and in pre-gravid and gravid adult specimens, and for the product of gene model ECANG7_09023 in protoscoleces [1].
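Identity percentages such as those quoted above are typically computed column by column over a pairwise alignment. The sketch below shows one common convention (identical residues divided by the number of columns in which neither sequence has a gap). The function name `percent_identity` is hypothetical, the choice of denominator is an assumption, and the excerpt is only the last 17 aligned columns of the ECANG7_05886 and H. microstoma rows from the Figure 4 alignment; since the domain boundaries used by the authors are not stated here, the value is not expected to reproduce the reported figures.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# C-terminal excerpt (last 17 aligned columns) from the Figure 4 alignment.
ecg7 = "KYVAALNSYYMIFGRPR"  # EcG7|ECANG7_05886
hmic = "RYMAALNAYYMIFGRPR"  # Hmic|HmN_000206600

identity = percent_identity(ecg7, hmic)
print(f"{identity:.1f}% identity over the excerpt")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so a reported percentage is only comparable when the denominator is stated.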

Antimicrobial peptides: ECANG7_00862 and orthologs

                                10        20        30        40        50        60        70
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|
EcG7|ECANG7_00862       -----------------------------MSCTRVHAVGLLLLTCILHAIALPSRSNEDCRGLDKTCGGW
Egra|EgrG_000724600.1   -----------------------------MSCTRVHAVGLLLLTCILHAIALPSRSNEDCRGLDKTCGGW
Emul|EmuJ_000724600.1   -----------------------------MSCTRVHAIGLLLLTCILHAIALPSRSNEDCRGLDKTCGGW
TsM1|TsM_000873400      --------------------------------------------MSETRFSFAYFKAEDCLNLDETCGGW
Hmic|HmN_000380100.1    MRQERRIAFTSPCPRKSMIYSSGFKRNYFKIQLESTLVGTTLHKCYFH-PHLPARGTEGCLKLDMTCVDH
TACA2|Q9U8X3            ----------------------------MKLQNTLILIGCLFLMGAMIGDAYSRCQLQG-FNCVVRSYGL

                                80        90       100       110       120       130
                        ....|....|....|....|....|....|....|....|....|....|....|....|...
EcG7|ECANG7_00862       NHEPCCPGLTCRKVRDGNTYGRCSFSLLELHYPIRTVTTTTS-TTVSTEPFT-----------
Egra|EgrG_000724600.1   NHEPCCPGLTCRKVRDGNTYGRCSFSLLELHYPIRTVTTTTLPPSLRSHPRRLVEATSIPKRP
Emul|EmuJ_000724600.1   NHEPCCPGLTCRKVRDGNTYGRCSFSLLELHYPIRTVTTTTS-ATVSTEPFT-----------
TsM1|TsM_000873400      NHKPCCPGLTCRKVRDGNTYGRCSFSLLELHYPTTTATTTTTSTIGSTKPFVGTPL-------
Hmic|HmN_000380100.1    VFNPCCPGMVCRKSREGARYGRCSVSWAEFIK-LDSSSTEPLTTTTTSGPLTST---------
TACA2|Q9U8X3            PTIPCCRGLTCRSYFPGSTYGRCQRY------------------------------------

Figure 4: Multiple alignment of drug target sequences of E. canadensis (G7) and their orthologs: (B) Pancreatic hormone-like proteins. The sequence in bold corresponds to the signal peptide predicted by SignalP 4.1, and the black line corresponds to the conserved domain found in the Pfam database.

Transport

Vacuolar ATPase (V-ATPase) is a ubiquitous proton pump of eukaryotic cells that performs essential activities using the localized concentration of protons energized by ATP [15]. V-ATPases perform different functions, such as receptor-mediated endocytosis, intracellular membrane traffic, protein degradation and coupled transport of small molecules and ions [16,17]. In nematodes, they have been described as involved in several functions, such as nutrition, osmoregulation, synthesis of the cuticle, neurobiology and reproduction [18]. The presence and potential role of this protein in platyhelminths have not yet been explored. In this category, we found 3 gene models that encode V-ATPases in the E. canadensis (G7) genome, along with their corresponding orthologs in cestodes. Among them, the gene model ECANG7_02132 showed high conservation of sequence and of alpha-helix structure in the N-terminal region with the most similar V-ATPase that has been characterised in invertebrates, the V-ATPase of the tobacco hornworm Manduca sexta (Figure 5) [19].

Figure 5: Multiple alignment of drug target sequences of E. canadensis (G7) and their orthologs: (C) V-ATPase protein sequences are added to the alignment. Arrowheads below the alignment indicate sV-ATPase conserved residues.

Metabolism

In the metabolism category we found an enzyme involved in glycosylation processes. In eukaryotes, glycosylation is critical for correct protein folding and sorting, as well as for enzyme activity and for quality control involving the endoplasmic reticulum (ER)-associated degradation (ERAD) pathway. Part of glycosylation takes place on the luminal side of the ER, where mannoses and glucoses are transferred to acceptor molecules. Mannosylation in the ER lumen is common to four glycosylation pathways: N-linked glycosylation, glycosylphosphatidylinositol (GPI)-anchor biosynthesis, protein O-mannosylation and protein C-mannosylation, and it is vital for many eukaryotes [20]. Dolichol-phosphate mannose is a mannosyl donor that is important for the above-mentioned pathways and is synthesised from GDP-mannose and dolichol-phosphate by the

Pancreatic hormone-like proteins: ECANG7_09023 and orthologs (region labels: signal peptide, PAH)

                                10        20        30        40        50        60        70        80        90
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
EcG7|ECANG7_09023       MLPLVLIALALISTCCARPLSIVESPVERDSPIDPQIVKWLQSQNPSLAPILNTHGNINHD----SSNSHSWKDLFKYMAQLNDYYVLFGRAR--
Egra|EgrG_000043400     MLPLVLIALSLISTCCARPLSIVESPVERDSPIDPQIVKWLQSQNPSLAPILNTHGNINHD----SSNSHSWKDLFKYMAQLNDYYVLFGRARFG
Emul|EmuJ_000043400     MVPSVLIALALISTCCARPLSIVESPVERDSPIDPQIVKWLQSQNPSLAPILNTHGDINHD----SSNSHSWKDLFKYMAQLNDYYVLFGRARFG
TsM1|TsM_000527100      MIPLVLIALTFTSLSHALPLSVLEARANQESQIDPQIVRWLQSQDPSLTSILNTNDNVNHGNIDHGINSHSWKDLFKYMAQLNDYYMLFGRARFG
Hmic|HmN_000295600      MISALLTSILLFLLTDGRPAGFKRSRSLEEILSDPEFMEWLNSKFPNIAFFLSGRN---------QPSTETVREIFRFMAELNNYYAIYGRARFG

Pancreatic hormone-like proteins: ECANG7_05886 and orthologs (region label: signal peptide)

                                10        20        30        40        50        60        70        80        90       100       110
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|...
Egra|EgrG_000926000     MSLMTISSLLTATLCLGLLQVSSAYRHSTSQFRDRFIPIEEENLWPYWMDDPDIDGEEQ---MVSDRMNKKSHPTVPSLPQYLERLKKFQSLKDKEKYVAALNSYYMIFGRPR
EcG7|ECANG7_05886       MSLMTISSLLTTTLCLGLLQVSSAYRHSTSQFRDRFIPIEEENLWPYWMDDPDIDGEEQ---VVSDRVNKKSHPTVPSLPQYLERLKKFQSLKDKEKYVAALNSYYMIFGRPR
Emul|EmuJ_000926000     MSLMTISSLLTTTLCLGLFQVSSAYRHSTSQFQDRFIPREEENFWPYWMDDPDTDGEE----VVSDKMNKKSHPTVPLLPQYLERLKKFQSLKDKEKYVAALNSYYMIFGRPR
TsM1|TsM_000401700      MSLTTISSLLIATLFLGSLQVSSTYQLPTERFQDRFVPIEKDSFSPYWMDDVDMDGEEDGMEVTGDKVDKRSRPTVPSLPQYLERLKKFHSLRDKEKYVAALNSYYMIFGRPR
Hmic|HmN_000206600      MSPLSLASFLTTILLLGISYSTEAYLIPFRVPENDLKPMDSSQF-----DDFD--------EIAS--RSKKNDIT--YLP------KRFKTAKDMERYMAALNAYYMIFGRPR

V-ATPase

                                10        20        30        40        50        60        70        80        90       100
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....
EcG7|ECANG7_02132       -----MASAAQLDGVAQLQVAKAAATAKIEEARTRRVKKLKQAKSEAAVEANLYKAECEAAIRELEQR-SRSIYRTG-HSEAPTTLRRKQGDRTRRDHRSGSHR
Egra|EgrG_000892000.1   -----MASAAQLDGVAQLQVAKAAATAKIEEARTRRIKKLKQAKSEAAVEANLYKAECEAAIKELEQR-SRSIYRTG-HSEAPTTLRRKQGDRTRRDHRSGSHR
Emul|EmuJ_000892000.1   -----MASAAQLDGVAQLQVAKAAATAKIEEARTRRIKKLKQAKSEAAVEANLYKAECEAAIKELEQRHSKHNIKTGGRVEAFTEQAIQRLQQRYEENKETALG
TsM1|TsM_000267900      -----MASVTQLDGVAQLQVAKAAATAKIEEARTRRIKKLKQAKSEATVEANSYKTECEAAIKELEQKYSKHDSETGGRVEAFTEEAIRRLQQRYEENRETALS
Hmic|HmN_000386900.1    -----MSTTNSLDGPGQLQIAKAAALSKVEDARIRRLNKIREAQTEAAILTRHFKEEHEKHYNEVVKEFALEDARSTERFSTLAEKAIEKLHQMYEENKEKALA
MSext|Q25532.1          -------MASQTHGIQQLLAAEKRAAEKVSEARKRKAKRLKQAKEEAQDEVEKYRQERERQFKEFEAKHMGTREGVAAKIDAETRIKIDEMNKMVQTQKEAVIK
Scere|5BW9              MDYKDDDDKSQKNGIATLLQAEKEAHEIVSKARKYRQDKLKQAKTDAAKEIDSYKIQKDKELKEFEQKNAGGVGELEKKAEAGVQGELAEIKKIAEKKKDDVVK

enzyme dolichyl-phosphate beta-D-mannosyltransferase (DPM) (EC 2.4.1.83). In mammals, the enzyme is a complex of three proteins: DPM1, which is the catalytic subunit, and two other subunits, DPM2 and DPM3. DPM3 is necessary for the stabilization and ER localization of DPM1, and DPM2 stabilizes DPM3. DPM3 is an ER-localized 92-amino-acid protein with two membrane-spanning regions [21]. In E. canadensis (G7) we found two subunits of the DPM complex: the subunit DPM1, which is encoded by the gene model ECANG7_04674 and is present in all of the metazoans analysed, and the subunit DPM3, which is encoded by the gene model ECANG7_01023, is present only in cestodes and consists of 2 transmembrane regions and the DPM3 superfamily domain (Figure 6 A and B). The high sequence conservation across the cestode species (90% among Echinococcus species and 65% with other cestode species) and the low sequence conservation relative to the human counterpart suggest that it codes for a new class of DPM3. Since DPM3 is the only divergent member of the complete metabolic pathway in Echinococcus (Figure 6 C) and plays a regulatory role in N-glycan precursor biosynthesis, it could represent a novel drug target candidate.

(A) DPM3

                                10        20        30        40        50        60        70        80        90       100
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|...
EcG7|ECANG7_01023       MVIKLHHGGLKLGLAGRFSVATPPMVRAVRWITGVVLSTAVWLGVLYAPFSRSNALDLIILYSPVAIIISFGLFSFFFIIYGVVTFNDCPVQNSVLSPVLCIL
Egra|EgrG_001024100.1   MVIKLHHGGLKLGLADRFSVATPPMVRAVRWITGVVLSIAVWLGVLYAPFSRSNALDLIILYSPVAIIVSFGLFSLFFIIYGVVTFNDCPSKTWSFHLFFVFC
Emul|EmuJ_001024100.1   MVIKLHHGGLKLGFADRFNVGTPPMVRAVRWITGVVLSIAVWLGVLYAPFSRSNALDLIILYSPVAIIISFGLFSFFFIIYGVVTFNDCPSKTRSFHLFFVFC
TsM1|TsM_000186900      ----MPLTGTRIS----YDVETPAMVRAVRWISALVLSVAVWLGALYAPFSRSHVLDLIIFCLPVAVIIFFGLFSFLFIVYGVVTFNDCPGEAEKLKQQIDEA
Hmic|HmN_000081000.1    ------------------------MIRAAPWLSTAILLTALWLGALYSPIASVPSIKMVVLLSPVIIVI-LGIISLSYIMFGVFNFNDCPGEDEKLKQQIEEA

(B) DPM1

                                10        20        30        40        50        60        70        80        90       100       110       120
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
EcG7|ECANG7_04674       MSRYSILLPTYNEKENLPLTVYLIDKYMRSNSYDYEIVIVDDNSPDGTQLAAEKLQELYGPGKIVLKPRQKKEGLGSAYVHGLRYATGDFVIIMDADLSHNPKFLPVFIDFRLQKCMDYD
Egra|EgrG_000843600.1   MSRYSILLPTYNEKENLPLTVYLIDKYMRSNSYDYEIVIVDDNSPDGTQLAAEKLQELYGPGKIVLKPRQKKEGLGSAYVHGLRYATGDFVIIMDADLSHNPKFLPVFIDFRLQKCMDYD
Emul|EmuJ_000843600.1   MSRYSILLPTYNEKENLPLTVYLIDKYMRSNSYDYEIVIVDDNSPDGTQLAAEKLQELYGPGKIVLKPRQKKEGLGSAYVHGLRYATGDFVIIMDADLSHNPKFLPVFIDFRLQKCMDYD
TsM1|TsM_000672100      MSRYSILLPTYNEKENLPLTVCLIDKYMSSNSYNYEIIIIDDNSPDGTQLAAEKLQKLYGSDKIVLKPRQKKEGLGSAYVHGLKYATGDFVFIMDADLSHN-------------KCMDYD
Gsal|6716               --MLSIIVPTYNEVDNIPIIV----------NIKYEIIIVDDNSADNTQLAIHELIKIYGKEKVVLKPRAAKLGLGTAYMHGMKFARGEFIILMDADLSHHPKFIPEFIK--KQKEKDYD
Sman|Smp_042790.1       MTLYSIILPTYNEKENLPIITWIIDKVMNKSGFPYELIIVDDNSPDGTGEVAKKLQSIFGENKIILKPRSGKLGLGSAYLHGLKFAKGDFIIIMDADLSHHPKFIPEFIK--LQKQHDYD
Smed|mk4.004497.01      NSKYSIILPTYNEKENLPIIIYLIDKYLTKNNINYEVIIVDDNSPDGTQDAAKQLQKIFGKNKIILQPRSGKLGLGTAYLHGLQYAAGEFIILMDADLSHHPKFIPEFIS--KQTKGNYD
Cele|Y66H1A.2a          TPKYSIILPTYNEKENLPICIWLIENYLK--EVSHEVIIVDDASPDGTQGIAKLLQKEYGDDKILIKPRVGKLGLGTAYSHGLSFARGEFIILMDADLSHHPKFIPEMIA--LQHKYKLD
Bflo|PACid:473726       -DKYSVLLPTYNERDNLPLIVWLLVRAFQESGHDFEIIVIDDGSPDGTLEVAQQLEKIYRKDKIVLRPRAKKLGLGTAYIHGMKHATGNYVIIMDADLSHHPKFIPEFIS--KQQEKNYD
Bflo|PACid:480698       -DKYSVLLPTYNERDNLPLIVWLLVRAFQESGHDFEIIVIDDGSPDGTLEVAQQLEKIYGKDKIVLRPRAKKLGLGTAYIHGMKHATGNYIIIMDADLSHHPKFIPEFIS--KQQEKNYD
Dmel|FBpp0080817        GHKYSILMPTYNEKDNLPIIIWLIVKYMKASGLEYEVIVIDDGSPDGTLDVAKDLQKIYGEDKIVLRPRGSKLGLGTAYIHGIKHATGDFIVIIDADLSHHPKFIPEFIK--LQQEGNYD
Drer|ENSDARP00000074043 PDKYSVLLPTYNERENLPLIVWLLVKYFGESGYNYEIIVIDDGSPDGTLQIAEQLQKIYGADKILLRPRAEKLGLGTAYIHGIKHATGNFVIIMDADLSHHPKFIPQFIE--KQKEGGYD
Mmus|CCDS17109.1        QDKYSVLLPTYNERENLPLIVWLLVKSFSESAINYEIIIIDDGSPDGTREVAEQLAEIYGPDRILLRPREKKLGLGTAYIHGIKHATGNYVIIMDADLSHHPKFIPEFIR--KQKEGNFD
Hsap|Hsap_12669         QNKYSVLLPTYNERENLPLIVWLLVKSFSESGINYEIIIIDDGSPDGTRDVAEQLEKIYGSDRILLRPREKKLGLGTAYIHGMKHATGNYIIIMDADLSHHPKFIPEFIR--KQKEGNFD

                               130       140       150       160       170       180       190       200       210       220       230       240
                        ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
EcG7|ECANG7_04674       IVTGTRYGCGGGVCGWNLKRKIISRCANFLAHLLLLPKASDLTGSFRLYKRNVLSELIKCSFSRGYVFQMEMMARASSMGYKIGEVGISFVDRLYGASKLSGSEIKQYLACLIRLFFTI-
Egra|EgrG_000843600.1   IVTGTRYGCGGGVCGWNLKRKIISRCANFLAHLLLLPKASDLTGSFRLYKRNVLSELIKCSFSRGYVFQVEMMARASSMGYKIGEVGISFVDRLYGASKLSGSEIKQYLACLIRLFFTI-
Emul|EmuJ_000843600.1   IVTGTRYGCGGGVCGWNLKRKIISRCANFLAHLLLLPKASDLTGSFRLYKRNVLSELIKCSFSRGYVFQMEMMARASAMGYKIGEVGISFVDRLYGASKLSGSEIRQYLACLIRLFFTI-
TsM1|TsM_000672100      IVTGTRYGCGGGVYGWSLKRKIISRGANFLAHLLLLPKASDLTGSFRLYKRDVLSELIKCSVSRGYVFQMEMMARASVMGYKIGEVX---------------------------------
Gsal|6716               IVTGTRYSHDGGVFGWNIQRKLISRTANYLAQVLLWPKASDLTGSFRLYKKTVLEKLIKTCKSKGYVFQMEMITRASAMDYSIGEVGITFIDRVFGVSKLGGSEIIQYASGLIRLFLTL-
Sman|Smp_042790.1       VVTGTRYASNGGVSGWDLKRKLISRTANYIAQVLLRPKASDLTGSFRLYKKAVLQDLVSRCTSRGYVFQMEMIVLASSLGYKIGEVGITFVDRFYGESKLGGTEIIQYIMGLLHLFCTC-
Smed|mk4.004497.01      IVTGTRYDLGGGVYGWNLKRKLISRTANYLAHIMLQPDASDLTGSFRLYKKEVLINLVSVCKSRGFVFQMEMIIRATCLNYKIGEVPISFVDRMYGQSKLGGSEIIQYVKGLLTLFFTT-
Cele|Y66H1A.2a          IVTGTRYKDGGGVSGWDLKRKTISKGANFLAQFLLNPGVSDLTGSFRLYKRDILSKLIAESVSKGYVFQMEMMFRAKKSGYRIGEVPISFVDRFFGESKLGSQEIVDYAKGLLYLFAFVW
Bflo|PACid:473726       VVSGTRYTGSGGVYGWDLKRKLISRGANYLTQVLLRPGASDLTGSFRLYKKAVLEKLVESCVSKGYVFQMEMIVRARQLGFTIGEVPITFVDRVYGESKLGGNEVISFAKGLLYLFATT-
Bflo|PACid:480698       VVSGTRYRGSGGVYGWDLKRKLISRGANYLTQVLLRPGASDLTGSFRLYKKAVLEKLVESCVSKGYVFQMEMIVRARQLGFTIGEVPITFVDRVYGESKLGGNEVISFAKGLLYLFATT-
Dmel|FBpp0080817        IVSGTRYAGNGGVFGWDFKRKLISRGANFLSQVLLRPNASDLTGSFRLYKKDVLEKCIASCVSKGYVFQMEMLVRARQHGYTIAEVPITFVDRIYGTSKLGGTEIIQFAKNLLYLFATT-
Drer|ENSDARP00000074043 LVSGTRYRGDGGVYGWDLRRKLISRGANFVTQVLLRPGASDLTGSFRLYKKEVLEKLVEQCVSKGYVFQMEMIVRARQLGYTIGEVPISFVDRVYGESKLGGNEIVSFLKGLLTLFATT-
Mmus|CCDS17109.1        IVSGTRYKGNGGVYGWDLKRKIISRGANFITQILLRPGASDLTGSFRLYRKEVLQKLIEKCVSKGYVFQMEMIVRARQMNYTIGEVPISFVDRVYGESKLGGNEIVSFLKGLLTLFATT-
Hsap|Hsap_12669         IVSGTRYKGNGGVYGWDLKRKIISRGANFLTQILLRPGASDLTGSFRLYRKEVLEKLIEKCVSKGYVFQMEMIVRARQLNYTIGEVPISFVDRVYGESKLGGNEIVSFLKGLLTLFATT-

Figure 6: Multiple alignment of drug target sequences of E. canadensis (G7) and their orthologs: (A) DPM3 peptide. The black line over the alignment indicates the dolichol-phosphate mannosyltransferase subunit 3 (DPM3) domain. (B) DPM1 and conserved orthologs in all of the organisms analysed. (C) The complete metabolic pathway of N-glycan biosynthesis in Echinococcus. Hsap: Homo sapiens; Mmus: Mus musculus; Drer: Danio rerio; Bflo: Branchiostoma floridae; Dmel: Drosophila melanogaster; Cele: Caenorhabditis elegans; ECANG7: Echinococcus canadensis; Egra: Echinococcus granulosus; Emul: Echinococcus multilocularis; Gsal: Gyrodactylus salaris; Hmic: Hymenolepis microstoma; Sman: Schistosoma mansoni; Smed: Schmidtea mediterranea; TsM1: Taenia solium; TACA2: Tachypleus tridentatus (Tachystatin-A2, GenBank: Q9U8X3); M. sexta: Manduca sexta; Sc: Saccharomyces cerevisiae.

Transcription processes

Zinc finger proteins are a class of regulatory proteins that participate in a variety of cellular activities, such as development, differentiation and tumour suppression. Among them, C2H2 zinc-finger genes belong to one of the largest and most complex gene superfamilies in metazoan genomes [22]. Subgroups of lineage-specific C2H2-containing proteins can be found in yeast, nematodes, insects and plants [23]. This class of zinc fingers can have a variety of functions, such as RNA binding and mediating protein-protein interactions, but they are best known for their role in sequence-specific DNA-binding proteins, in particular in humans, where they have been described to bind specific methylated DNA sequences. Such proteins exhibit zinc finger domains that are typically organised in tandem repeats of two, three or more fingers comprising the DNA-binding domain of the protein. In E. canadensis (G7) we found 124 gene models that encode C2H2-domain-containing proteins. Among them, the gene model ECANG7_07928 encodes a 125-amino-acid protein that contains 4 C2H2-type zinc finger domains and is present exclusively in cestodes. Sequence-specific methylated-DNA-binding proteins, along with the genomic DNA methylation pattern, may play a role in the regulation of gene transcription.
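To illustrate how tandem C2H2-type fingers can be counted in a protein sequence, the sketch below scans for the classical consensus spacing C-X(2-4)-C-X(12)-H-X(3-5)-H with a regular expression. This is only a rough approximation and an assumption on our part: domain databases such as Pfam rely on profile models, and the text does not state how the domains in ECANG7_07928 were detected. The function name `count_c2h2` and the demo sequence (a synthetic, textbook-style finger repeated twice with a short linker) are hypothetical and are not taken from the E. canadensis genome.

```python
import re

# Classical C2H2 spacing: two cysteines 2-4 residues apart, 12 residues,
# then two histidines 3-5 residues apart (a simplification of real profiles).
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def count_c2h2(sequence):
    """Count non-overlapping matches of the simplified C2H2 consensus."""
    return len(C2H2_PATTERN.findall(sequence.upper()))


# Synthetic demo: one finger-like segment, a linker, and the same segment again.
finger = "YKCPECGKSFSQKSNLQKHQRTH"
demo_protein = finger + "TGEKP" + finger

n_fingers = count_c2h2(demo_protein)
print(n_fingers)
```

A regular expression of this kind is useful for a first-pass screen, but it can miss degenerate fingers and cannot score partial matches, so counts obtained this way should be checked against a domain database search.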


Bibliography

1. Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, et al. The genomes of four tapeworm species reveal adaptations to parasitism. Nature [Internet]. Nature Publishing Group; 2013 [cited 2014 May 29];496:57–63. Available from: http://dx.doi.org/10.1038/nature12031
2. Muzulin PM, Kamenetzky L, Gutierrez AM, Guarnera EA, Rosenzvit MC. Echinococcus granulosus antigen B gene family: Further studies of strain polymorphism at the genomic and transcriptional levels. Exp. Parasitol. 2008;118:156–64.
3. Kamenetzky L, Muzulin PM, Gutierrez AM, Angel SO, Zaha A, Guarnera EA, et al. High polymorphism in genes encoding antigen B from human infecting strains of Echinococcus granulosus. Parasitology. 2005;131:805–15.
4. Silva-Álvarez V, Franchini GR, Pórfido JL, Kennedy MW, Ferreira AM, Córsico B. Lipid-free antigen B subunits from Echinococcus granulosus: oligomerization, ligand binding, and membrane interaction properties. PLoS Negl. Trop. Dis. [Internet]. 2015 [cited 2016 Apr 11];9:e0003552. Available from: http://www.ncbi.nlm.nih.gov/pubmed/25768648
5. Silva-Álvarez V, Folle AM, Ramos AL, Kitano ES, Iwai LK, Corraliza I, et al. Echinococcus granulosus Antigen B binds to monocytes and macrophages modulating cell response to inflammation. Parasit. Vectors [Internet]. 2016 [cited 2016 Apr 11];9:69. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26846700
6. Victor B, Dorny P, Kanobana K, Polman K, Lindh J, Deelder AM, et al. Use of expressed sequence tags as an alternative approach for the identification of Taenia solium metacestode excretion/secretion proteins. BMC Res. Notes [Internet]. 2013 [cited 2016 Mar 30];6:224. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23742691
7. Osaki T, Omotezako M, Nagayama R, Hirata M, Iwanaga S, Kasahara J, et al. Horseshoe crab hemocyte-derived antimicrobial polypeptides, tachystatins, with sequence similarity to spider neurotoxins. J. Biol. Chem. [Internet]. 1999 [cited 2016 Mar 30];274:26172–8. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10473569
8. Fujitani N, Kawabata S, Osaki T, Kumaki Y, Demura M, Nitta K, et al. Structure of the antimicrobial peptide tachystatin A. J. Biol. Chem. [Internet]. 2002 [cited 2016 Mar 30];277:23651–7. Available from: http://www.ncbi.nlm.nih.gov/pubmed/11959852
9. Ganz T. Defensins: antimicrobial peptides of innate immunity. Nat. Rev. Immunol. [Internet]. 2003 [cited 2016 Mar 30];3:710–20. Available from: http://www.ncbi.nlm.nih.gov/pubmed/12949495
10. Zheng H, Zhang W, Zhang L, Zhang Z, Li J, Lu G, et al. The genome of the hydatid tapeworm Echinococcus granulosus. Nat. Genet. [Internet]. Nature Publishing Group; 2013 [cited 2014 Jun 17];45:1168–75. Available from: http://dx.doi.org/10.1038/ng.2757
11. Collins JJ, Hou X, Romanova E V, Lambrus BG, Miller CM, Saberi A, et al. Genome-wide analyses reveal a role for peptide hormones in planarian germline development. PLoS Biol. [Internet]. 2010 [cited 2016 Mar 30];8:e1000509. Available from: http://www.ncbi.nlm.nih.gov/pubmed/20967238
12. Berglund MM, Schober DA, Statnick MA, McDonald PH, Gehlert DR. The use of bioluminescence resonance energy transfer 2 to study neuropeptide Y receptor agonist-induced beta-arrestin 2 interaction. J. Pharmacol. Exp. Ther. [Internet]. 2003 [cited 2016 Mar 30];306:147–56. Available from: http://www.ncbi.nlm.nih.gov/pubmed/12665544
13. Filmore D. Cell-based screening assays and structural studies are fueling G-protein coupled receptors as one of the most popular classes of investigational drug targets. Mod. Drug Discov. [Internet]. 2004 [cited 2016 Mar 30];7:4. Available from: http://pubs.acs.org/subscribe/journals/mdd/v07/i11/pdf/1104feature_filmore.pdf
14. Rajpara SM, Garcia PD, Roberts R, Eliassen JC, Owens DF, Maltby D, et al. Identification and molecular cloning of a neuropeptide Y homolog that produces prolonged inhibition in Aplysia neurons. Neuron [Internet]. 1992 [cited 2016 Mar 30];9:505–13. Available from: http://www.ncbi.nlm.nih.gov/pubmed/1524828
15. Hinton A, Bond S, Forgac M. V-ATPase functions in normal and disease processes. Pflügers Arch. Eur. J. Physiol. [Internet]. 2009 [cited 2016 Mar 30];457:589–98. Available from: http://www.ncbi.nlm.nih.gov/pubmed/18026982
16. Forgac M. Structure, function and regulation of the vacuolar (H+)-ATPases. FEBS Lett. [Internet]. 1998 [cited 2016 Mar 30];440:258–63. Available from: http://www.ncbi.nlm.nih.gov/pubmed/9872382
17. Forgac M. The vacuolar H+-ATPase of clathrin-coated vesicles is reversibly inhibited by S-nitrosoglutathione. J. Biol. Chem. [Internet]. 1999 [cited 2016 Mar 30];274:1301–5. Available from: http://www.ncbi.nlm.nih.gov/pubmed/9880499
18. Knight AJ, Behm CA. Minireview: the role of the vacuolar ATPase in nematodes. Exp. Parasitol. [Internet]. 2012 [cited 2016 Mar 30];132:47–55. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21959022
19. Lepier A, Gräf R, Azuma M, Merzendorfer H, Harvey WR, Wieczorek H. The peripheral complex of the tobacco hornworm V-ATPase contains a novel 13-kDa subunit G. J. Biol. Chem. [Internet]. 1996 [cited 2016 Mar 30];271:8502–8. Available from: http://www.ncbi.nlm.nih.gov/pubmed/8626552
20. Kinoshita T, Fujita M, Maeda Y. Biosynthesis, remodelling and functions of mammalian GPI-anchored proteins: recent progress. J. Biochem. [Internet]. 2008 [cited 2016 Mar 30];144:287–94. Available from: http://www.ncbi.nlm.nih.gov/pubmed/18635593
21. Maeda Y, Tanaka S, Hino J, Kangawa K, Kinoshita T. Human dolichol-phosphate-mannose synthase consists of three subunits, DPM1, DPM2 and DPM3. EMBO J. [Internet]. 2000 [cited 2016 Mar 30];19:2475–82. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10835346
22. Knight RD, Shimeld SM. Identification of conserved C2H2 zinc-finger gene families in the Bilateria. Genome Biol. [Internet]. BioMed Central; 2001 [cited 2016 Oct 7];2:RESEARCH0016. Available from: http://www.ncbi.nlm.nih.gov/pubmed/11387037
23. Seetharam A, Bai Y, Stuart GW, Knight R, Shimeld S, Stillman J, et al. A survey of well conserved families of C2H2 zinc-finger genes in Daphnia. BMC Genomics [Internet]. BioMed Central; 2010 [cited 2016 Oct 7];11:276. Available from: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-11-276