Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family
Christopher Southan
Department of Molecular Pharmacology
AstraZeneca R&D, Mölndal, Sweden
Outline
• Introduction
• Proteomic identification of novel secreted rat Ly6 proteins in EST data
• Discovery of unknown homologues
• Bioinformatic analysis of chimeric mRNAs
• Database errors propagated by the chimeras
• Delineating a large secreted Ly6 family on the rat genome
• Discovery of mouse homologues but no clear orthologues
• Equivocal biochemical results for homologues
• Summary of bioinformatic pitfalls
Introduction: Quirks that Lurk in Databases
• The sequence deluge into the primary databases necessitates automated pipelines to produce 'value added' secondary databases
• But, however sophisticated the data parsing or curation, anomalies will get through
• Most things that could have gone wrong, have
• Although the overall quirk frequency is low, quirks present pitfalls for the unwary
• Responsibility for primary annotation and sequence quality lies solely with submitting authors
• Few originating authors correct, update or withdraw their primary sequence entries
• It is difficult to discriminate between in vitro artefacts and rare in vivo events
Rat urine → HPLC → intact MALDI → N-terminal sequence
High-speed microbore column
Rat urine → 2D gel → trypsin → MS/MS → PepSea search → EST hits
Spot 1 gave two different peptide matches
• CTSFDSTGFCHVGR contained within rat EST AA893514
• CESLDSTGLCR contained within rat EST AA800439
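Finding a tryptic peptide inside an EST amounts to translating the nucleotide sequence in all six reading frames and looking for the peptide. The sketch below assumes the standard genetic code and exact matching; the toy "EST" is synthetic (a back-translation of the spot 1 peptide placed on the reverse strand), not the real AA893514 sequence, and the function names are illustrative only.

```python
# Minimal six-frame peptide search (standard genetic code, exact matching).
# The toy "EST" below is synthetic, not the real AA893514 sequence.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(dna: str) -> str:
    """Reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_peptide(peptide: str, dna: str) -> list[tuple[str, int, int]]:
    """Return (strand, frame offset, 0-based residue position) for each frame containing the peptide."""
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for offset in range(3):
            pos = translate(seq[offset:]).find(peptide)
            if pos >= 0:
                hits.append((strand, offset, pos))
    return hits


# Back-translation of CTSFDSTGFCHVGR, embedded on the minus strand of a toy read.
coding = "TGTACCTCCTTCGACAGCACAGGCTTTTGCCACGTGGGACGG"
toy_est = revcomp("AA" + coding + "TTAG")
print(find_peptide("CTSFDSTGFCHVGR", toy_est))
```

Real searches also have to tolerate mismatches and sequencing errors in single-pass EST reads, which exact substring matching does not.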
EST AA893514 vs. dbEST: 30 Rat Hits at 95% to 100% Identity
Assembly of Rat Urinary Proteins 1 and 2
• Nine EST sequences, the MS/MS sequences and the N-terminal Edman data were consistent with two paralogous proteins
• 90% identical at the amino acid level and 96% identical at the DNA level
• Highly represented in rat liver ESTs
• One N-glycosylation site with a 1.6 to 2.0 kDa glycan
• Secreted forms abundant in male rat urine by HPLC
• RUP1 independently verified as a liver regeneration-related protein by full-length mRNA
verified signal peptide
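A simple way to cross-check the glycosylation claim against the sequences is a scan for N-X-S/T sequons (X not proline). This is a minimal sketch using the RUP1 and RUP2 sequences as they appear in the alignments in this deck; a motif scan only finds candidate sites and cannot tell which ones actually carry glycan.

```python
import re

# N-X-S/T sequon with X != P; the lookahead also reports overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")

RUP1 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")
RUP2 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")


def sequons(protein: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


for name, seq in (("RUP1", RUP1), ("RUP2", RUP2)):
    print(name, sequons(seq))
```

Both proteins share the NQS motif at Asn67; RUP2 has a second candidate (NAT at Asn74) where RUP1 has IGT, so the motif scan alone does not settle which site carries the observed glycan.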
RUP1   1 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD  60
         ||| ||||||||||||||||||||||| |:||||:|:|||: ||||||||||||||||||
RUP2   1 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD  60

RUP1  61 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP 101
         ||||||||||||| :|||||||||:|||||||||||||||
RUP2  61 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
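The amino acid identity quoted above can be recomputed directly from this ungapped alignment. The sketch below assumes the two sequences are already aligned position by position, as they are here (same length, no gaps).

```python
RUP1 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")
RUP2 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")


def percent_identity(a: str, b: str) -> float:
    """Identity over an ungapped alignment of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# 1-based positions where the paralogues differ.
diffs = [i + 1 for i, (x, y) in enumerate(zip(RUP1, RUP2)) if x != y]
print(diffs)
print(round(percent_identity(RUP1, RUP2), 1))
```

The paralogues differ at 11 of 101 positions, i.e. about 89% identity, in line with the ~90% figure quoted above.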
RUP3: Independent MS-based Identification by Wait et al., “Proteins of rat serum, urine and CSF: VI”, Electrophoresis 22, 3043-3052 (2001)
RUP1 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD
RUP2 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD
RUP3 MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD
     *** *********************** * **** * ***. ******************

RUP1 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP  EST AA800439
RUP2 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  EST AA893514
RUP3 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  EST AA893518
     *************  *********.***************
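The "*" row under a multiple alignment marks columns where every sequence has the same residue. A minimal sketch of that rule for the three RUP paralogues follows; it deliberately omits Clustal's ":" and "." similarity classes, so it only reproduces the identity marks.

```python
RUPS = {
    "RUP1": "MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
            "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP",
    "RUP2": "MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
            "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR",
    "RUP3": "MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD"
            "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR",
}


def identity_line(seqs) -> str:
    """'*' for columns where all sequences share one residue, ' ' otherwise."""
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*seqs))


line = identity_line(RUPS.values())
print(line[:60])
print(line[60:])
print(line.count("*"), "of", len(line), "columns identical")
```

All variable columns between RUP2 and RUP3 coincide with columns that already differ between RUP1 and RUP2, so the three-way identity (90 of 101 columns) equals the RUP1/RUP2 pairwise value.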
RUP Paralogues Define a New Family of Secreted Ly-6 Proteins
UP1_RAT : MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEIC :  51
UP2_RAT : MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEIC :  51
UP3_RAT : MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
SP1_RAT : MGKNILLLLLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQEC :  51

UP1_RAT : AWVVVTTRDGKFVYGNQSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP : 101
UP2_RAT : AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR : 101
UP3_RAT : AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR : 101
SP1_RAT : GILVVSQG-VDILFGMQDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF-- :  99
A Quirky Result: Solid Matches Between RUP2 and Four Unrelated mRNAs
• Rat mitochondrial IF1 protein mRNA, L07806, 883 bp
• Rat casein kinase II alpha subunit (CK2), L15618, 2180 bp
• Rat mitochondrial succinyl-CoA synthetase alpha subunit, J03621, 1684 bp
• Rat 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, AF368860, 1197 bp
• Matches of 92% to 100% identity over 300-500 bases
• Two in reverse frame, two in forward frame
L07806 F1-ATPase inhibitor
AF368860 UTR F1-ATPase inhib
L15618 casein kinase II alpha
J03621 mito succinyl-CoA synthase alpha
Three RUP-like Chimeras and a Pre-mRNA
Translation Matches for the Chimeras Reveal a Cryptic Protein
RUP-2   28 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
           TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
L07806 417 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 196
L07806 Rattus rattus mitochondrial IF1 protein mRNA
RUP-2   59 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
           RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
L15618 708 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 580
L15618 Rat casein kinase II alpha subunit (CK2) mRNA
RUP-2   24 CFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMV 99
           CF C + +S G C+     C  +P E+CA  V+T +DGKFVYGNQSCAEC+  TVEHGSLIVSTNCCSAT FCN+V
J03621  50 CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV 274
J03621 Rat mitochondrial succinyl-CoA synthetase alpha subunit
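For translated matches like the three above, the subject coordinates carry two checks: a start larger than the end means the match is on the reverse strand, and the nucleotide span should equal three times the number of subject residues (gap characters excluded). A minimal sketch, with the function name chosen for illustration and the coordinates copied from the alignments:

```python
def translated_hit(sbjct_aln: str, s_start: int, s_end: int) -> tuple[str, bool]:
    """Strand of a translated hit and whether its nucleotide span equals
    3 x the number of subject residues (gap characters excluded)."""
    strand = "-" if s_start > s_end else "+"
    span = abs(s_end - s_start) + 1
    residues = len(sbjct_aln.replace("-", ""))
    return strand, span == 3 * residues


HITS = {
    "L07806": ("TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR", 417, 196),
    "L15618": ("RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR", 708, 580),
    "J03621": ("CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV", 50, 274),
}

for name, (aln, start, end) in HITS.items():
    print(name, *translated_hit(aln, start, end))
```

The L07806 and L15618 insertions read on the reverse strand and J03621 on the forward strand; the J03621 check only balances once the gap opposite RUP-2 residue 53 is excluded (75 subject residues, 225 nt).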
RUP1 Gene Structure
Matching the Chimeras Against the Rat Genome
SCORE  Q START  Q END  Q SIZE  IDENTITY  CHROM  STRAND  CHROM START  CHROM END
------------------------------------------------------------------------------
L15618 Rat casein kinase II alpha subunit
 1451      709   2177    2180     99.9%      3       +    142470350  142514932
  799     1091   2161    2180     90.2%     10       -     39567792   39568918
  313      392    711    2180     99.1%      8       -     36902949   36905031
L07806 Rattus rattus mitochondrial IF1 protein
  405      420    826     833    100.0%      5       +    152628418  152632060
  398        8    415     833     99.1%      8       -     36902399   36905032
J03621 Rat mitochondrial succinyl-CoA synthetase subunit
 1203      472   1684    1684    100.0%      4       +    106816653  106845979
  469        1    472    1684    100.0%      8       -     36133698   36137263
AF368860 Rattus norvegicus 3' non-translated beta-F1-ATPase
 1118        1   1120    1120    100.0%      8       +     37247995   37251530
 1016        1   1120    1120     96.9%      8       +     36688890   36905034
 1006        1   1120    1120     95.6%      8       +     36901482   37055697
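The logic for spotting a chimera in genome hits like these can be written down as a small rule: keep high-identity hits, pick query segments that barely overlap, and flag the query if those segments land at different genomic loci. This is a sketch under hypothetical thresholds (95% identity, 50 bases of query overlap, 1 Mb between loci); it is not a BLAT feature, and the function name is illustrative. The hit values are taken from the table above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    q_start: int      # query start (nt)
    q_end: int        # query end (nt)
    identity: float   # percent identity
    chrom: str
    strand: str
    t_start: int      # genomic start


def candidate_chimera(hits, min_identity=95.0, max_overlap=50, max_gap=1_000_000):
    """Return (is_candidate, kept_hits) for one query's genome hits.

    High-identity hits are taken longest first; a hit is skipped if it overlaps an
    already kept hit by more than max_overlap query bases. The query is flagged
    when kept segments differ in chromosome or strand, or lie > max_gap apart.
    """
    kept = []
    for h in sorted(hits, key=lambda h: h.q_end - h.q_start, reverse=True):
        if h.identity < min_identity:
            continue
        if all(min(h.q_end, k.q_end) - max(h.q_start, k.q_start) <= max_overlap for k in kept):
            kept.append(h)
    distinct = any(
        a.chrom != b.chrom or a.strand != b.strand or abs(a.t_start - b.t_start) > max_gap
        for a, b in combinations(kept, 2)
    )
    return distinct, sorted(kept, key=lambda h: h.q_start)


l07806 = [Hit(420, 826, 100.0, "5", "+", 152628418), Hit(8, 415, 99.1, "8", "-", 36902399)]
l15618 = [Hit(709, 2177, 99.9, "3", "+", 142470350),
          Hit(1091, 2161, 90.2, "10", "-", 39567792),
          Hit(392, 711, 99.1, "8", "-", 36902949)]

for name, hits in (("L07806", l07806), ("L15618", l15618)):
    flag, kept = candidate_chimera(hits)
    print(name, flag, [(h.q_start, h.q_end, h.chrom) for h in kept])
```

In both cases the 5' segment maps to chromosome 8 and the remainder to the host gene's own chromosome, which is the pattern the slide describes; the 90.2% chromosome 10 hit is dropped by the identity filter.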
Multiple Loci on Rat Chromosome 8: Erroneous Mapping of the Chimeras
L15618 casein kinase II alpha
L07806 F1-ATPase inhibitor
AF198441 Rat RUP2
AF198442 Rat spleen protein 1
What Caused the Chimeras?
• Each of the chimeric cDNAs was submitted by a different research group, 1988-1993
• All were prepared from rat cDNA libraries
• Two of the host genes encode nuclear-encoded mitochondrial proteins
• L07806-IF1 has two non-chimeric counterparts
• Hits to rat genome data confirm that the three 'host' transcripts are at different loci
• The 5' insertions differ in sequence, length and orientation
• The L15618 insert is a single exon and maps to an unexpressed locus
• Are these insertions of RUP2-like genes in vitro artefacts or rare translocation events in vivo?
Protein Database Entries from the Chimera and Pre-mRNA
The L07806-derived chimeric protein was chosen as the reference sequence by NCBI
NP_037047 ATPase inhibitor, mitochondrial precursor, length = 107
NP_037047 MTKSCRIEASTLGVWGMRVLQTRGFGSDS
          M  S     + LGVWGMRVLQTRGFGSDS
Q03344    MAGSALAVRARLGVWGMRVLQTRGFGSDS
whereas Swiss-Prot Q03344 flags the discrepancy and correctly uses the normal sequence rather than the chimera
CONFLICT MAGSALAVRAR -> MTKSCRIEAST (IN REF. 1).
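Locating a sequence conflict like this one reduces to trimming the longest common prefix and suffix of the two versions; what remains is the disputed region. A minimal sketch on the 29-residue N-terminal fragments shown above (not the full entries); the function name is illustrative.

```python
def conflict_span(a: str, b: str) -> tuple[int, int, int]:
    """Return (start, end_in_a, end_in_b): the 0-based half-open region left
    after trimming the longest common prefix and common suffix of a and b."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, len(a) - suffix, len(b) - suffix


refseq = "MTKSCRIEASTLGVWGMRVLQTRGFGSDS"     # NP_037047 N-terminus (chimera-derived)
swissprot = "MAGSALAVRARLGVWGMRVLQTRGFGSDS"  # Q03344 N-terminus
start, end_a, end_b = conflict_span(refseq, swissprot)
print(start, end_a, refseq[start:end_a], swissprot[start:end_b])
```

The trimmed region is residues 2-11; Swiss-Prot's CONFLICT line reports residues 1-11 because it includes the shared initiator Met.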
The L07806-derived chimeric protein, without the targeting sequence, was expressed in E. coli as a maltose-binding protein fusion and was fully active!
tr Q91XP0 3' non-translated beta-F1-ATPase mRNA-binding protein: Length = 28
The artefactual sequence (Q91XP0 and AAK61874) includes an exon:
Q91XP0/AAK61874  MGKHILLLPLVLSLLMSSLQDSCGHEPS
RUP2             MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQK...
The L07806 Chimera Caused Errors in UniGene
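The failure mode is easy to reproduce with single-linkage clustering, where any sufficiently good overlap joins two sequences into one cluster. UniGene's real pipeline is more elaborate; this is a hypothetical sketch with made-up EST identifiers, showing only how one chimeric mRNA that overlaps reads from two genes merges their clusters.

```python
class UnionFind:
    """Minimal disjoint-set structure for single-linkage clustering."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def single_linkage(pairs):
    """Cluster sequence IDs, given pairs whose overlap is good enough to link them."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for x in list(uf.parent):
        groups.setdefault(uf.find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)


# Hypothetical IDs: two RUP2-like reads and two IF1 reads.
clean = [("rup2_est_a", "rup2_est_b"), ("if1_est_a", "if1_est_b")]
# The chimeric mRNA overlaps one read from each gene and bridges them.
chimeric = [("L07806", "rup2_est_a"), ("L07806", "if1_est_a")]

print(len(single_linkage(clean)))             # separate clusters
print(len(single_linkage(clean + chimeric)))  # merged into one
```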
RUP Gene Family on Rat 8q21
Rat and Mouse RUP Homologues are Highly Diverged
ENSM34617  : MGKLLLLHFLLMQASFALVFIQVQATVCMVCKSFK-SGHCLVGKNNCTTRYKPGCRTRNY
ENSM34619  : ----MKNFLRLCLFLLCFETG--FPLQCVQCQSYK-NGECATKKETCTTKPGETCMIRRT
ENSM34610  : --MNSVTKISTLLIVILSFLCFVEGLICNSCEKSR-DSRCTMSQSRCVAKPGESCS---T
ENSM39445  : ---------------LAFSIS---ALKCFQCTLFNSKGKCLFQEPPCETQNNEVCV---L
ENSM23855  : --------ILLHLLGLSFLVGFLKALTCITCDRINSQGICESGEGCCQAKPGEKCA---S
ENSM48154  : ----MGKHILQLLLVLSLLVMSSQALTCITCDRINSQGICESGEGCCQAKPGEKCA---S
ENSR7555/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAW--V
ENSR7614/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAW--V
ENSR7667/1 : ----MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAW--V
ENSR7837/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCESFDSTGLCQFGRYKCQTYPGEVCAF--V
ENSR7903/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAW--V
ENSR11381/ : ----MGKNILLLLLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQECG---I

ENSM34617  : FLFSHTGKWVHNHTELDCDKACMAENMYLGALKISTFCCKGEDFCNKYHGQVVNKNIY
ENSM34619  : WYANEIHNLQDAE--TKCTNSCKFEEKTSGYLTTHTYCCSHGDFCNDINLPIVMT---
ENSM34610  : VSHFVGTKHVYSK--QMCSPQCKEKQLNTGKKLIYIMCCEKN-LCNSF----------
ENSM39445  : WAKFEGGRFMYGF--QECSHTCVNQTLNLRNKRIEMKCCNDKSFCN------------
ENSM23855  : LITLKDGKIQFGN--QRCANICFTGTVQTGDQTVKMKCCKKRSFCNEL----------
ENSM48154  : LITLKDGKIQFGN--QRCANICFTGTVQTGDQTVKMKCCKKRSFCN------------
ENSR7555/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR7614/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR7667/1 : VVTTRDGKFVYGN--QSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP--------
ENSR7837/1 : IITTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCFSATPFCNMVHR--------
ENSR7903/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR11381/ : LVVSQGVDILFGM--QDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF----------
Sequences Conserved in Rat but Divergent in Mouse
Homologues in Five Mammals but True Orthology Unclear
UP1_RAT  : MGKPILLLP--LGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGKFVYGNQSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP : 101
SP1_RAT  : MGKNILLL--LLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQECGILVVSQ-GVDILFGMQDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF-- :  99
PIP1_PIG : MGKCLLLPLLLVVLSSLLGFPQALECFQCQRVSASGVCESGKSFCQTQGSQQCFLRKVYE-GDTVSYGHQGCSSLCVPMKFFRPNVTVDFRCCHDSPFCNKF-- : 101
BOP1_COW : MAKCLLL-LLLVVLSSLLGLPQALECFQCNRVNASGVCETGGSTCQTQGSQQCFLRRIFE-NGTLSYGHQGCSQLCIPMKLFNPSVIVEYKCCHDSPLCNKF-- : 100
EQP1_HOR : MGKHLLLP--LVILSSLLGFLQALQCFHCDRVNASGVCVSGERFCETTGSQQCFVKKVYE-DGIISYGYQGCSSLCVDMMFLNFNVNLDWKCCHHASLCNKF-- :  99
EQP2_HOR : MGKHLLLP--LIILSSLLGFLQALTCLKCDRVNTSGVCQSGASFCQTKGSQQCYVRKVYE-DDTISYGSQGCSSICTDILLFSPNVAVDLKCCDDSPLCNKF-- :  99
DOP1_DOG : MGRCLLLLHLLLILCSQLDLLQALQCFQCKQVNANGVCEDGKSTCQAEGNQQCFLRKVYK-DNILSYGYQGCSSVCSPMTIFSTDVNLEEKCCNDSSFCNKF-- : 101
XP42_MOU : MEKYLLLL--LLGIFLRVGFLQALTCVSCGRLNSSGICETAETSCEATNNRKCALRLLYK-DGKFQYGFQGCLGTCFNYTKTNNNMVKEHKCCDHQNLCNKP-- :  99
Remote Human Homologues but no Strict Orthologues
>tr|AF462605|Q8WXA2|9AD752F00D901FFE PATE.[Homo sapiens] (expressed in prostate and testis) Length = 126
Score = 31.2 bits (69), Expect = 3.3 Identities = 21/79 (26%), Positives = 32/79 (39%), Gaps = 6/79 (7%)
RUP1   23 QCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGK----FVYGNQSCAECIGTTV 78
          QC  C        C  GR IC    +E C    +  RDG     F+   ++CA+  G  +
PATE   47 QCRMCHLQFPGEKCSRGRGICTATTEEACMVGRMFKRDGNPWLTFMGCLKNCADVKG--I 104

RUP1   79 EHGSLIISTNCCSATPFCN 97
               +++  CC +   CN
PATE  105 RWSVYLVNFRCCRSHDLCN 123
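Why a 31.2-bit hit is still not significant follows from the usual BLAST relation E = m·n·2^(-S'), where S' is the bit score and m·n the effective search space. The sketch below back-calculates m·n from the reported bit score and E-value (an approximation, since BLAST adjusts effective lengths internally) and then asks how many bits the same search would need for E = 0.001.

```python
import math


def implied_search_space(bit_score: float, evalue: float) -> float:
    """Back-calculate m*n from E = m * n * 2**(-S'), with S' in bits."""
    return evalue * 2.0 ** bit_score


def evalue_for(bit_score: float, search_space: float) -> float:
    """E-value of a hit with the given bit score in a search space of m*n."""
    return search_space * 2.0 ** (-bit_score)


space = implied_search_space(31.2, 3.3)   # RUP1 vs PATE hit reported above
needed = math.log2(space / 1e-3)          # bits required for E = 0.001
print(f"search space ~ {space:.2e}; bits needed for E=0.001 ~ {needed:.1f}")
```

The implied search space is about 8×10^9, so roughly 43 bits would be needed for E = 0.001; at 31.2 bits about three hits this good are expected by chance, which is why the PATE relationship rests on the cysteine pattern rather than on the score.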
Threading Reveals Homology between RUP1, Lynx1 and Snake Toxin Structures
P81827|UP1 : MGKPILLLPLGLSLLMSSLL--ALQCFRC--ES--LDSTGLCRVGRRICQTYPDEICAWVVV
Q9WVC2|LYN : -MTH--LLTVFLVALMGLPVAQALECHVCAYNGDNCFKPMRCPAMATYCMTTRTYF------
P81782|BUC : -----------------------MECYRCGVSG--CHLKITCSAEETFCYKWLNKI------

P81827|UP1 : TTRDGKFVYGNQSCAECIGTTVEHGSLII---------STNCCSATPFCNM----------V
Q9WVC2|LYN : ---------TPYR-MKVRKSCVPSCFETVYDGYSKHASATSCCQ-YYLCNGAGFATPVTLAL
P81782|BUC : ---------SNERWLGCAKTCTEIDTWNVY---------NKCCT-TNLCNT-----------
Lynx1, an Endogenous Toxin-like modulator of AChRs in the CNS
Why so Few Apparent Orthologues?
P55000: Antineoplastic Urinary Protein/Secreted Mammalian Ly-6/uPAR Related Protein – Equivocal Annotation
Linking Sequence to Function: the Lost Keyword Problem (PubMed query terms marked (+) if they retrieve the paper, (-) if not)
• Adermann et al. "Structural and phylogenetic characterisation of human SLURP-1, the first secreted mammalian member of the Ly-6/uPAR protein superfamily" Protein Sci. 1999 … from blood and urine peptide libraries. SLURP-1 is encoded by the ARS (component B)-81/s locus, and appears to be the first mammalian member of the Ly-6/uPAR family lacking a GPI-anchoring signal sequence ... SLURP-1 (+) Ly-6 (+) ANUP (-)
• Katz et al. "A partial catalogue of proteins secreted by epidermal keratinocytes in culture." J Invest Dermatol. 1999 … proteins secreted by adult human epidermal keratinocytes included anti-neoplastic urinary protein (+) ANUP (-) SLURP-1 (-) Ly-6 (-)
• Fischer et al. "Mutations in the gene encoding SLURP-1 in Mal de Meleda". Hum Mol Genet. 2001 … Three different homozygous mutations (a deletion, a nonsense and a splice site mutation) were detected in 19 families of Algerian and Croatian origin … first instance of a secreted protein being involved in a palmoplantar keratoderma. SLURP-1 (+) Ly-6 (+) ANUP (-)
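The keyword problem can be made concrete by treating each abstract as the set of query terms it matched, transcribed from the (+)/(-) marks above, and asking which papers an OR query returns. This toy model ignores real PubMed behaviour such as automatic term mapping; the function name is illustrative.

```python
# Query terms that each abstract matched, from the (+)/(-) marks above.
HITS = {
    "Adermann et al. 1999": {"SLURP-1", "Ly-6"},
    "Katz et al. 1999": {"anti-neoplastic urinary protein"},
    "Fischer et al. 2001": {"SLURP-1", "Ly-6"},
}


def retrieved(*terms: str) -> set[str]:
    """Papers returned by an OR query over the given terms."""
    wanted = set(terms)
    return {paper for paper, matched in HITS.items() if matched & wanted}


for query in (("ANUP",), ("SLURP-1",), ("SLURP-1", "anti-neoplastic urinary protein")):
    print(query, sorted(retrieved(*query)))
```

No single name retrieves all three papers, and the abbreviation ANUP retrieves none; only an OR across the synonyms recovers the full sequence-to-function link.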
Mouse Ly-6-like Caltrin: Sequence Errors, Unverified Reported Function, New Name and New Function?
Confusion Over Caltrin: 5 Different Sequences in Swiss-Prot; 22 PubMed Citations
Caltrin = inhibition of Ca2+ uptake into spermatozoa
• CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR). - Mus musculus (a Ly-6 protein)
• CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR) (SEMINALPLASMIN) (SPLN). - Bos taurus (PYY-like)
• CALTRIN-LIKE PROTEIN I. - Cavia porcellus (weak protease inhibitor match)
• CALTRIN-LIKE PROTEIN II. - Cavia porcellus (elastase inhibitor like)
• PANCREATIC SECRETORY TRYPSIN INHIBITOR II PRECURSOR (PSTI-II) (CALTRIN) (CALCIUM TRANSPORT INHIBITOR). - Rattus norvegicus (trypsin inhibitor identity)
Limited Knowledge for the Short Ly-6 Proteins
• Single-domain proteins of ~85-100 residues, mostly with a signal peptide
• Probably ligands, by inference from toxin structures?
• Recently duplicated rodent paralogous family of 6-10 gene loci, but with very different evolutionary trajectories in mouse and rat
• Liver and spleen expression in rat
• Significant amounts of multiple gene products, probably glycosylated, secreted in male rat urine
• Foetal expression for pig, bovine and horse orthologues
• Rapid evolution in mammals
• Mix of secreted and GPI-anchored homologues in human
• Human Lynx1 modulates AChRs
• SLURP linked to skin physiology
• Caltrin/SVS VII: phospholipid binding
• Homologues involved in myelopoiesis in Xenopus and in the liver acute-phase response in rainbow trout
Summary of the Bioinformatic Pitfalls
• The chimeric mRNAs and the pre-mRNA lead to:
– Artefactual clustering of ESTs and non-homologous gene products in UniGene
– Protein database conflicts and artefacts
– Propagation of errors in RefSeq and the rat genome
• Loose ends and sequence errors in old data
• Equivocal functional annotation, transitively perpetuated
• Sequence-literature links broken by gene name ambiguities
• Incorrect signal peptide annotation
• Similarity scores for Ly-6 homologues fall below domain database thresholds
• Rapid evolution makes orthologue assignment difficult
Conclusions
• Bioinformatics can help a little proteomics data go a long way
• Finding quirks in database entries is definitely part of the fun, but…
• Sequence anomalies can seriously confound automated annotation
• They can only be exposed or unravelled by:
– transitive and broad sequence/keyword searching
– detailed examination of sequence and literature links
– understanding database building procedures
– EST and genome matches, which can reveal chimeras
• Conflicting data links should ideally be resolved by new data, but judgment may be needed
• Difficult to discriminate between in vitro artefacts and rare in vivo events
• Inferring biological meaning from database searches requires an understanding of the experiments and the in-silico analyses
• Value of Swiss-Prot is significantly enhanced by community annotation
Acknowledgments, Reference and Database Entries
Southan C, Cutler P, Birrell H, Connell J, Fantom KG, Sims M, Shaikh N, Schneider K. “The characterisation of novel secreted Ly-6 proteins from rat urine by the combined use of two-dimensional gel electrophoresis, microbore high performance liquid chromatography and expressed sequence tag data” Proteomics 2002 Feb;2(2):187-96.
AF198441 Rat RUP2 mRNA
UP1_RAT (P81827) Urinary protein 1 (RUP1)
UP2_RAT (P81828) Urinary protein 2 (RUP2)
UP3_RAT (P83125) Urinary protein 3 (RUP3)
RSP1_RAT (Q9QXN2) Spleen protein 1
AF198442 Rat spleen protein 1 precursor, mRNA, complete cds
P83106 PIP1 protein (PIP1) - Sus scrofa
P83107 BOP1 protein (BOP1) - Bos taurus
Q9BZG9 Ly-6 neurotoxin-like protein Lynx1 - Homo sapiens
AF321824 Human Ly-6 neurotoxin-like protein Lynx1 mRNA, partial cds
Human Short Ly6 Proteins
Name    Size (aa)  Chrom    Ensembl  ESTs  Sigpep  GPI  InterPro  Patents
LYNX1   115        8q24.3   +        +     19      91   Ly6       Curagen, Hyseq, HGS (sec), Incyte (sec), Genset (partial)
SLURP2  97         8q24.3   +        +     22      -    Ly6       Genentech (sec/tm), ZymoGenetics
RGTR43  125        8q24.3   -        +     22      103  Ly6       Genentech, HGS, Incyte
SLURP1  103        8q24.3   +        +     22      -    Ly6       HGS, ARS, Biovision (partial)
PATE    126        11q24.2  +        +     21      -    CyCPA2    Genset (sec), USDOH
LVLF31  113        11q24.2  -        +     18      -    -         None
Vertebrate Short Ly6 Proteins
ENSR7555   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
ENSR7614   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
ENSR7903   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEIC :  51
ENSR7837   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCESFDSTGLCQFGRYKCQTYPGEVC :  51
ENSR7667   : MGKPILL----LPLGLS--------------------LLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEIC :  51
ENSM23855  : ----ILL----HLLGLS--------------------FLVGFLKALTCITCDRINSQGICESGEGCCQAKPGEKC :  47
ENSM48154  : MGKHILQ----LLLVLS--------------------LLVMSSQALTCITCDRINSQGICESGEGCCQAKPGEKC :  51
PIP1_pig   : MGKCLLLP--LLLVVLS--------------------SLLGFPQALECFQCQRVSASGVCESGKSFCQTQGSQQC :  53
BOP1_cow   : MAKCLLL---LLLVVLS--------------------SLLGLPQALECFQCNRVNASGVCETGGSTCQTQGSQQC :  52
ENSR11381  : MGKNILL----LLLGLS--------------------FVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQEC :  51
ENSM39445  : --------------------------------------LAFSISALKCFQCTLFNSKGKCLFQEPPCETQNNEVC :  37
ENSM34619  : -MKNFLR-----LCLFL--------------------LCFETGFPLQCVQCQSYK-NGECATKKETCTTKPGETC :  48
ENSM34617  : MGKLLLLHFLLMQASFA--------------------LVFIQVQATVCMVCKSFK-SGHCLVGKNNCTTRYKPGC :  54
ENSM34610  : MNSVTKIS--TLLIVIL--------------------SFLCFVEGLICNSCEKSR-DSRCTMSQSRCVAKPGESC :  52
LVLF3112_h : MLVLFLLGTVFLLCPYWGEL-----------------HDPIKATEIMCYECKKYH-LGLCYGVMTSCSLKHKQSC :  57
Hep21_Chic : MKLLFVG------LALV--------------------LCVGVVEALQCKVCKYKIPYVGCFHGANETTCERRERC :  49
SLURP2_hum : MQLGTGL---LLAAVLS--------------------LQLAAAEAIWCHQCTGFG---GCSHG-SRCLR-DSTHC :  47
RGTR430_hu : MRGTRLA---LLALVLA--------------------ACGELAPALRCYVCPEPTGVSDCVTIAT-CTT-NETMC :  50
SLURP1_hum : MASRWAVQ---LLLVAA--------------------WSMGCGEALKCYTCKEPMTSASCRTITR-CKP-EDTAC :  50
LYNX1_hum  : MTPCSPD----LVVLMG----------------------LPLAQALDCHVCAYNG--DNCFNPMR-CPA-MVAYC :  45
PATE_hum   : MDKSLLLELPILLCCFRALSGSLSMRNDAVNEIVAVKNNFPVIEIVQCRMCHLQFPGEKCSRGRGICTATTEEAC :  75
             m                                           a  C  C        C      c       C

ENSR7555   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7614   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7903   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7837   : AF--VIITTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCFSATPFCNMVHR---------------- : 101
ENSR7667   : AW--VVVTTRDGKFVYGN--QSCA---ECIGTTVEHGS--LIISTNCCSATPFCNMVHP---------------- : 101
ENSM23855  : A---SLITLKDGKIQFGN--QRCAN--ICFTGTVQTGD--QTVKMKCCKKRSFCNEL------------------ :  95
ENSM48154  : A---SLITLKDGKIQFGN--QRCAN--ICFTGTVQTGD--QTVKMKCCKKRSFCN-------------------- :  97
PIP1_pig   : F---LRKVYEGDTVSYGH--QGCSS--LCVPMKFFRPN--VTVDFRCCHDSPFCNKF------------------ : 101
BOP1_cow   : F---LRRIFENGTLSYGH--QGCSQ--LCIPMKLFNPS--VIVEYKCCHDSPLCNKF------------------ : 100
ENSR11381  : G---ILVVSQGVDILFGM--QDCSS--HCLNKTFHHYN--LTLDFTCCHDQSLCNEF------------------ :  99
ENSM39445  : V---LWAKFEGGRFMYGF--QECSH--TCVNQTLNLRN--KRIEMKCCNDKSFCN-------------------- :  83
ENSM34619  : MIRRTWYANEIHNLQDAE--TKCTN--SCKFEEKTSGY--LTTHTYCCSHGDFCNDINLPIVMT----------- : 106
ENSM34617  : RTRNYFLFSHTGKWVHNHTELDCDK--ACMAENMYLGA--LKISTFCCKGEDFCNKYHGQVVNKNIY-------- : 117
ENSM34610  : S---TVSHFVGTKHVYSK--QMCSP--QCKEKQLNTGK--KLIYIMCCEKN-LCNSF------------------ :  99
LVLF3112_h : AVENFYILTRKGQSMYHYSKLSCMT--SCEDINFLGFT--KRVELICCDHSNYCNLPEGV--------------- : 113
Hep21_Chic : A---IIKTSLGKVTLYYQ--QGCTSALNCGRERASDAESRLTSRYSCCETD-LCNEKWDDDPTD----------- : 107
SLURP2_hum : VTTATRVLSNTEDLPLVT--KMCHI--GCPDIPSLGLG--PYVSIACCQTS-LCNHD------------------ :  97
RGTR430_hu : KTTLYSREIVYPFQGDSTVTKSCAS--KCKPSDVDGIG--QTLPVSCCN-TELCNVDGAPALNSLHCGALTLLPL : 120
SLURP1_hum : MTTLVTVEAEYPFNQSPVVTRSCSS--SCVATDPDSIG--AAHLIFCCFRD-LCNSEL----------------- : 103
LYNX1_hum  : M---TTRTYYTPTRMKVS--KSCVP--RCFETVYDGYS-KHASTTSCCQYD-LCNGTGLATPATLALAPILLATL : 111
PATE_hum   : MVG--RMFKRDGNPWLTF--MGCLK--NCADVKGIRWS-VYLVNFRCCRSHDLCNEDL----------------- : 126
                                   C     C                 Cc     CN
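The ten columns marked C or c in the consensus rows of this vertebrate alignment form the conserved cysteine framework of the short Ly-6 proteins. A quick check of which sequences keep a cysteine at every framework column is sketched below for four of the rows, copied from the alignment (block 2 columns are numbered after the 75 columns of block 1); the column list and function name are assumptions made for illustration.

```python
# Four rows from the vertebrate alignment (block 1 + block 2 = 150 columns).
ROWS = {
    "ENSR7555": "MGKHILL----LPLGLS" + "-" * 20 + "LLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC"
                + "AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR" + "-" * 16,
    "ENSR7837": "MGKHILL----LPLGLS" + "-" * 20 + "LLMSSLLALQCFRCESFDSTGLCQFGRYKCQTYPGEVC"
                + "AF--VIITTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCFSATPFCNMVHR" + "-" * 16,
    "LYNX1_hum": "MTPCSPD----LVVLMG" + "-" * 22 + "LPLAQALDCHVCAYNG--DNCFNPMR-CPA-MVAYC"
                 + "M---TTRTYYTPTRMKVS--KSCVP--RCFETVYDGYS-KHASTTSCCQYD-LCNGTGLATPATLALAPILLATL",
    "PATE_hum": "MDKSLLLELPILLCCFRALSGSLSMRNDAVNEIVAVKNNFPVIEIVQCRMCHLQFPGEKCSRGRGICTATTEEAC"
                + "MVG--RMFKRDGNPWLTF--MGCLK--NCADVKGIRWS-VYLVNFRCCRSHDLCNEDL" + "-" * 17,
}

# 1-based columns marked C/c in the two consensus rows; block-2 columns are offset by 75.
CYS_COLUMNS = [48, 51, 60, 67, 75] + [75 + c for c in (23, 29, 47, 48, 54)]


def missing_cysteines(row: str) -> list[int]:
    """Framework columns at which this aligned row does not have a cysteine."""
    return [col for col in CYS_COLUMNS if row[col - 1] != "C"]


for name, row in ROWS.items():
    print(name, len(row.replace("-", "")), "residues; missing Cys at", missing_cysteines(row))
```

ENSR7837 is the only one of these rows with a gap in the framework (Phe where the other rows have the second cysteine of the CC pair). The check cannot say whether that is a real substitution or a sequence or gene-prediction error; EST and genome comparisons are what would decide it.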
Searches Against Rat ESTs Confirmed the Three mRNAs as Chimeras
J03621
L07806
L15618
mRNA Anomaly No. 4: Unspliced?
LOCUS AF368860 1197 bp mRNA 13-JUN-2001 (CDS 10..96 "MGKHILLLPLVLSLLMSSLQDSCGHEPS")
Rattus norvegicus 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, complete cds. "Identification of a liver specific cDNA clone chaperoning the differential assembly of ribonucleoprotein complexes at the 3' UTR of the mRNAs of oxidative phosphorylation"
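The LOCUS details support the "unspliced" reading with simple arithmetic. CDS 10..96 is 87 nt, or 29 codons, which gives 28 residues if the last codon is the stop (as GenBank CDS features normally include it). The translated product then follows RUP2 closely for 19 residues and shares nothing with it afterwards, which is what translation running past an exon boundary into retained intron would look like, although sequence alone cannot prove it. A minimal sketch:

```python
def cds_protein_length(start: int, end: int) -> int:
    """Residues encoded by a CDS feature start..end (1-based, inclusive),
    assuming the final codon is a stop."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3 - 1


def matches(a: str, b: str) -> int:
    """Identical positions in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b))


AF368860 = "MGKHILLLPLVLSLLMSSLQDSCGHEPS"   # translation quoted in the LOCUS line
RUP2_N = "MGKHILLLPLGLSLLMSSLLALQCFRCT"     # first 28 residues of RUP2

print(cds_protein_length(10, 96), len(AF368860))
print(matches(AF368860[:19], RUP2_N[:19]), "of 19 identical")
print(matches(AF368860[19:], RUP2_N[19:]), "of 9 identical")
```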
BLAST vs Rat ESTs
RUP-4?   1 MGKHILLLPLVLSLLMSSLLALQCIQCARIDSRGICRHDIYICHADSDEVCSWVVATTRD  60
           MGKHILLLPL LSLLMSSLLALQC +C   DS G C      C    DE+C+WVV TTRD
RUP-2    1 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD  60

RUP-4?  61 GKFVYGNQSCAECNATTVEQGSLIVSTNCCSASHFCNMVYR 101  (ESTs AA945232, AA945121)
           GKFVYGNQSCAECNATTVE GSLIVSTNCCSA+ FCNMV+R
RUP-2   61 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
RUP Homologues Expand a New Sub-family of Secreted Ly-6 Proteins
3D PSSM Fold Recognition Server