DOI: 10.1073/pnas.84.9.2808 · Source: PubMed


Proc. Natl. Acad. Sci. USA
Vol. 84, pp. 2808-2812, May 1987
Developmental Biology

Sequence analysis of a cDNA clone encoding the liver cell adhesion molecule, L-CAM

(intrinsic membrane protein/epithelial glycoprotein/calcium-dependent adhesion/gene duplication)

WARREN J. GALLIN, BARBARA C. SORKIN, GERALD M. EDELMAN, AND BRUCE A. CUNNINGHAM

Department of Developmental and Molecular Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021

Contributed by Gerald M. Edelman, January 5, 1987

ABSTRACT The liver cell adhesion molecule (L-CAM) appears on non-neural epithelial tissues and mediates calcium-dependent adhesion in these tissues both in the embryo and in the adult. It appears on cell surfaces as a glycoprotein of Mr 124,000 but is synthesized as a precursor of Mr 135,000. We have isolated and determined the nucleic acid sequence of a cDNA clone (λL320) encoding chicken L-CAM. The 5' end of this clone has an open reading frame extending for 2520 base pairs, followed by an 850-base-pair untranslated region terminating with a polyadenylylation site at its 3' end. Protein sequence analysis of intact L-CAM and of cyanogen bromide fragments of the protein confirmed the reading frame and indicated that λL320 encodes the complete sequence of L-CAM as it is expressed on the cell surface as well as the bulk of the precursor. The sequence includes a hydrophobic segment of 31 amino acids, supporting our earlier conclusion that L-CAM is an intrinsic membrane protein. There are five potential asparagine glycosylation sites on the extracellular part of the molecule and an intracellular domain that is phosphorylated in vivo. The mature L-CAM polypeptide consists of 727 amino acids, with a calculated Mr of 79,900 for the carbohydrate-free protein. The L-CAM sequence is not homologous to other known protein sequences, including those of the neural cell adhesion molecule (N-CAM) and other members of the immunoglobulin superfamily, but the L-CAM molecule does contain three contiguous segments (113 amino acids each) that are homologous to each other. The similarities among these segments suggest that at least part of the L-CAM molecule arose by gene duplication.

The liver cell adhesion molecule (L-CAM) is a primary CAM that appears in a distinct pattern at a variety of inductive embryonic sites as well as in adult tissues (1, 2). It was initially isolated on the basis of its ability to mediate calcium-dependent adhesion between cells of the chicken liver epithelium (3, 4). Several other calcium-dependent adhesion molecules, including uvomorulin (5), E-cadherin (6), cell-CAM 120/80 (7), and Arc-1 (8), have been isolated from different epithelial tissues or cell lines, and these molecules have biochemical properties (9) and tissue distributions (2, 10) so similar to those of L-CAM that it is likely that all of them are its mammalian homologues.

The importance of L-CAM in embryonic development and in the formation of epithelium is suggested by studies in which L-CAM-mediated adhesion was blocked with antibodies. The formation of epithelial colonies in primary cultures of embryonic chicken hepatocytes is reversibly blocked by antibodies to L-CAM (3, 4). In addition, antibodies to uvomorulin block the compaction of mouse embryos and hence the polarization of blastomeres that is essential to normal development (11). Similarly, antibodies that prevent the formation of tight junctions in MDCK cells react with L-CAM rather than components of the tight junctions themselves, indicating that the formation of some junctional complexes is dependent on cell-cell adhesion via L-CAM (12). Antibodies to L-CAM have recently been shown to perturb the inductive interactions between epidermis and dermis and to alter pattern formation during the development of feathers (13). Inasmuch as L-CAM is expressed in epidermis but not in dermis, this effect is apparently due to disruption of cooperative cellular interactions within the epidermal cell collectives leading to consequent alterations of signals between these cells and mesodermal cells. In addition, L-CAM and N-CAM (neural cell adhesion molecule) are often expressed in adjacent populations of early embryonic cells; their coordinate expression throughout development, particularly at sites of induction, suggests that their mutual activity is crucial for the formation of tissue boundaries (1, 2, 10, 14).

Because of the widespread distribution of L-CAM and its apparent role in promoting cooperative interactions between cells in collectives, particularly during embryogenesis, we have undertaken a detailed analysis of this molecule and its genes. These studies provide a structural basis for analyzing L-CAM function and the control of its expression during development and should provide probes for analyzing the role of L-CAM in cellular mechanisms of development. We have previously described the overall structure of the L-CAM molecule (9) and used a small cDNA probe, pEC301 (15), and specific antibodies to show that the mRNA [4 kilobases (kb)] and the protein are the same size in all tissues expressing L-CAM. We report here the sequence of a cDNA clone that encompasses almost the entire L-CAM mRNA, including the complete amino acid sequence of the protein as it is detected on the cell surface.

MATERIALS AND METHODS

Isolation of an L-CAM cDNA Clone. cDNA libraries in λgt11 were prepared as described (15). Bacteriophage were plated at a density of 10⁵ plaque-forming units per plate, and replicas of each plate on nitrocellulose were probed with pEC301 insert. A positive bacteriophage, designated λL320, was cloned to purity. DNA from this bacteriophage was digested with EcoRI and ligated with EcoRI-digested, calf intestinal alkaline phosphatase-treated pBR328 (15). A plasmid containing an insert that hybridized to pEC301 insert was isolated and designated pEC320.

DNA Sequencing. For M13 sequencing, insert DNA from pEC320 was digested with restriction enzymes and subcloned into M13mp18 or M13mp19 (16). For deletion subcloning, the insert from pEC320 was cloned into the EcoRI site of Bluescript-KS vectors (17), and deletion subclones were prepared by digestion with exonuclease III and mung bean nuclease and blunt-end ligation to recircularize the plasmids (17). Deleted DNA was transformed into competent JM109 cells and plasmid was prepared from single colonies for sequencing. DNA sequence analysis was performed by the dideoxynucleotide chain-termination method. Fragments subcloned into M13 vectors were sequenced using the Klenow fragment of DNA polymerase I (18, 19) and fragments subcloned into Bluescript vectors were sequenced using either avian myeloblastosis virus reverse transcriptase or Klenow fragment (20). Sequence data were compiled using the Staden ANALYSEQ programs (21).

Peptide Purification and Sequencing. The Mr 81,000 fragment (Ft1) of L-CAM was isolated (9) and treated with CNBr [10% (wt/vol) in 70% formic acid] for 4 hr at room temperature. Fragments were separated by gel filtration on Sephacryl S-300 (Pharmacia) in 0.1 M NH4HCO3, followed by preparative NaDodSO4/PAGE on 15% acrylamide gels. Peptides were detected by Coomassie blue staining, electroeluted (22), and subjected to Edman degradation (23) in the Rockefeller University sequencing facility.

RNA Transfer Blotting. Liver or brain total RNA (17 µg) was resolved by electrophoresis on formaldehyde gels (24), transferred to nitrocellulose, and probed with ³²PO4-labeled DNA probes (10⁷ cpm in 10 ml of buffer) (15). After hybridization overnight at 42°C, blots were washed (15) and exposed to XAR-5 film at -70°C, with enhancing screens for autoradiography.

Abbreviations: L-CAM, liver cell adhesion molecule; N-CAM, neural cell adhesion molecule.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

RESULTS

A λgt11 cDNA library was screened by nucleic acid hybridization using the L-CAM-specific cDNA clone λL301 (15). A recombinant clone (designated λL320) was isolated, but restriction mapping indicated that the EcoRI site at one end had been destroyed during the construction of the library. λL320 bacteriophage DNA was therefore digested with EcoRI for 16 hr and the resulting fragments were subcloned into the EcoRI site of pBR328. A transformant that hybridized to λL301 probe was isolated, and the plasmid (designated pEC320) was found to contain a single insert of 8 kb bounded by two EcoRI sites. Analysis of the sequence proved that the pEC320 insert consisted of a cDNA that contained an open reading frame that matched L-CAM protein sequence and terminated in a poly(A) sequence fused to β-galactosidase and λ genomic sequence from the λgt11 vector. Sequence analysis of the ends of the pEC320 insert demonstrated that the EcoRI* site that had been used to excise the insert from the bacteriophage corresponded to position 23,673 in the wild-type λ genomic sequence (25).

The complete sequence of the L-CAM portion of pEC320 (Fig. 1) extends 3.4 kb from the 5' EcoRI site to a poly(A) tail preceded by a polyadenylylation signal (AATAAA) at nucleotide 3336. It begins at the 5' end with an open reading frame that continues for 2520 base pairs. The amino-terminal sequence of the mature L-CAM protein starts 340 base pairs from the 5' end of the cDNA. This result is consistent with the observation that L-CAM is synthesized as a precursor polypeptide approximately 10 kDa larger than the mature protein (ref. 26; B.C.S. and B.A.C., unpublished data) and indicates that the clone contains nearly all of the precursor. No obvious signal sequence is present, however. The methionine codon at position -48 is probably not a start codon because the predicted polypeptide is not large enough to include a signal peptide plus the expected precursor. Moreover, the pEC320 cDNA is smaller than the mRNA for L-CAM (4 kb).

The translation of the long open reading frame of the cDNA is presented in Fig. 1, and agrees with amino acid sequence analysis of intact L-CAM and CNBr peptides of the molecule.
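The coordinates quoted above can be checked against the reported length of the mature protein with a few lines of arithmetic. The sketch below assumes that the open reading frame begins at nucleotide 1, that "340 base pairs from the 5' end" means the first codon of the mature protein begins at nucleotide 340, and that the 2520 base pairs do not include a stop codon. These assumptions, and all names in the code, are illustrative and are not stated in this form in the text. The calculated Mr of 79,900 is the value given in the abstract.

```python
# Consistency check on the reading-frame coordinates (assumptions noted inline).
ORF_END = 2520        # last nucleotide of the open reading frame (from the text)
MATURE_START = 340    # ASSUMED: first nucleotide of the mature protein's first codon
CALC_MR = 79_900      # calculated Mr of the carbohydrate-free mature protein


def codon_count(first_nt: int, last_nt: int) -> int:
    """Number of whole codons between two 1-based, inclusive nucleotide positions."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


mature_codons = codon_count(MATURE_START, ORF_END)      # codons of mature L-CAM
precursor_codons = codon_count(1, MATURE_START - 1)     # codons upstream of residue 1
mean_residue_mass = CALC_MR / mature_codons             # average mass per residue

print(mature_codons, precursor_codons, round(mean_residue_mass, 1))
```

Under these assumptions the mature region holds 727 codons, matching the 727 amino acids reported, and the mean residue mass comes out near 110, a typical value for proteins.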

Analysis of the hydrophobicity of the amino acid sequence (27) (Fig. 2) indicated a single membrane-spanning region consisting of 31 consecutive hydrophobic amino acid residues flanked on the amino-terminal side by a Gly-Arg-Arg-Ser-Tyr sequence, and on the carboxyl-terminal side by an Arg-Arg-Arg-Lys sequence. The plot also shows a smaller peak at residues 303-335, but this region has hydrophilic residues dispersed throughout.
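A hydrophobicity profile of this kind can be sketched as a sliding-window average over the Kyte and Doolittle scale (27), using the 15-residue window stated in the legend to Fig. 2. The code below is a minimal illustration, not the program used in this work. The function names and the toy sequence are hypothetical, and the toy sequence is not the L-CAM sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 15) -> list[float]:
    """Mean hydropathy of every window-length segment of seq, in order."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophobic_window(seq: str, window: int = 15) -> tuple[int, float]:
    """Return (1-based start, mean hydropathy) of the highest-scoring window."""
    profile = hydropathy_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start + 1, profile[start]


# HYPOTHETICAL toy sequence: polar flanks around a 21-residue hydrophobic stretch.
toy = "DEKRSTNQ" * 3 + "LIVLLAILLGILALLILLLLL" + "RRRKDEDS" * 3
start, score = most_hydrophobic_window(toy)
print(start, round(score, 2))
```

A membrane-spanning candidate shows up as a run of consecutive windows with high mean hydropathy, whereas a peak interrupted by hydrophilic residues, such as the smaller one described above, scores lower once the window slides across those residues.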

Digestion of liver membranes with trypsin in the presence of Ca2+ ions releases an extracellular Mr 81,000 glycoprotein fragment (Ft1) of L-CAM (4, 9). This fragment contains four asparagine-linked oligosaccharides (9). Consistent with these results, the extracellular portion of the sequence shown in Fig. 1 contains five potential asparagine glycosylation sequences (Asn-X-Thr).

The relative molecular weights of Ft1 and intact L-CAM as measured by NaDodSO4/PAGE in the Laemmli system (28) are significantly larger than the molecular weights predicted from the sequence of L-CAM (Mr 70,000 vs. 59,000 and Mr 116,000 vs. 79,000, respectively). These discrepancies appear to be due in part to anomalous migration of the glycoprotein and Ft1 on NaDodSO4/PAGE. For example, although intact glycosylated L-CAM migrates with a Mr of 124,000 on a NaDodSO4/PAGE system in Tris buffers, it has a Mr of 105,000 in a phosphate buffer (29), which is much closer to the value predicted from the cDNA sequence. Thrombospondin has also been reported to migrate with a lower apparent molecular weight in the phosphate system than in the Laemmli system and the lower molecular weight is closer to the value predicted from the cDNA sequence (30). In addition, a number of other proteins have been reported to have predicted molecular weights significantly smaller than the relative molecular weights measured by NaDodSO4/PAGE (see ref. 31).
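Potential asparagine glycosylation sequences of the Asn-X-Thr form mentioned above can be located in a one-letter protein sequence with a simple pattern search. The sketch below is illustrative only. It assumes that X may be any residue except proline, and it offers an optional switch for the broader Asn-X-Ser/Thr consensus; both choices are assumptions added here rather than criteria stated in the text. The function name and the toy sequence are hypothetical.

```python
import re


def find_sequons(seq: str, allow_ser: bool = False) -> list[int]:
    """Return 1-based positions of Asn residues that start an Asn-X-Thr motif.

    X is ASSUMED to be any residue except Pro. With allow_ser=True the third
    position may also be Ser (the broader Asn-X-Ser/Thr consensus).
    """
    third = "ST" if allow_ser else "T"
    # The lookahead keeps matches zero-width past the Asn, so overlapping
    # motifs such as N-N-T-T are all reported.
    pattern = re.compile(rf"N(?=[^P][{third}])")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


# HYPOTHETICAL toy sequence, not taken from L-CAM.
toy = "MKNATLLNPTAANGSQQNNTT"
print(find_sequons(toy))
print(find_sequons(toy, allow_ser=True))
```

A search like this only flags candidate sites. Whether a site is actually occupied has to come from carbohydrate analysis, as in the comparison of five potential sites with four oligosaccharides above.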

Analysis of the L-CAM sequence showed no apparent internal homology in the nucleic acid sequence, but the predicted amino acid sequence contains three contiguous homologous repeats of ~113 amino acids beginning 27 amino acids from the amino terminus of the mature protein. When these regions were optimally aligned (Fig. 3), there was a 39% amino acid sequence identity between region I (amino acids 33-136) and region II (amino acids 146-248) and ≈20% identity between either I or II and region III (amino acids 258-361). Searches of sequence databases† using the Wilbur-Lipman fast search algorithm (32) revealed no obvious homologies between L-CAM and any other known proteins.

To test for possible L-CAM variants and homologous molecules, subcloned fragments of the L-CAM cDNA were used to perform RNA blot hybridization analyses of RNA from liver and brain. Three fragments of the L-CAM cDNA were used to probe total cellular RNA from liver: the 5' 1.3-kb EcoRI/Kpn I fragment, the central 1.3-kb Kpn I/BamHI fragment, and a 3' fragment extending from the BamHI site to the poly(A) sequence. All three probes detected a single species ~4 kb long (Fig. 4), as reported for λL301 probes (15), indicating that there is no size heterogeneity in mRNAs encoding L-CAM. Hybridization of these probes to total RNA from 10-day embryonic chicken brain, which does not express L-CAM, showed no specific hybridization to any material in the brain, either under high-stringency conditions (Fig. 4) or low-stringency conditions (not shown).
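The identity percentages quoted above for the three internal homology regions come from counting identical residues in a gapped alignment. The sketch below shows one way to do that count. It assumes that identity is the number of matching residues divided by the number of aligned columns in which neither sequence has a gap; the denominator actually used in this work is not stated. The function names and the three short aligned rows are hypothetical and are not the L-CAM segments.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def fully_conserved_columns(rows: list[str], gap: str = "-") -> list[int]:
    """1-based alignment columns where every row carries the same residue."""
    return [
        i + 1
        for i, col in enumerate(zip(*rows))
        if gap not in col and len(set(col)) == 1
    ]


# HYPOTHETICAL aligned rows used only to exercise the functions.
r1 = "DKESKVYYSITG"
r2 = "DRE-KVFYTITG"
r3 = "DPESRV-YSLTG"
print(round(percent_identity(r1, r2), 1))
print(fully_conserved_columns([r1, r2, r3]))
```

The second function corresponds to the kind of tally reported in the legend to Fig. 3, where residues identical in all three segments are marked.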

†National Institutes of Health (1986) Genetic Sequence Databank: GenBank (Research Systems Div., Bolt, Beranek, and Newman, Cambridge, MA), Tape Release 46; and Protein Identification Resource (1986) Protein Sequence Database (Natl. Biomed. Res. Found., Washington, DC), Release 10.


[Figure 1: nucleotide sequence of the pEC320 cDNA insert with the deduced amino acid sequence.]

FIG. 1. Sequence of cDNA clone pEC320 encoding L-CAM. Residues determined by protein sequence analysis of L-CAM and cyanogen bromide fragments are underlined. The amino-terminal residue of the mature protein is number 1. The five potential asparagine glycosylation sites on the extracellular portion of the molecule are indicated with dots; the cysteine residues are marked by boxes. The putative membrane-spanning region is underlined with a dark bar.

[Figure 2: hydrophobicity plotted against residue number.]

FIG. 2. Hydrophobicity plot of the amino acid sequence of L-CAM. Hydrophobicity values of 15-amino acid segments of L-CAM were calculated by the method of Kyte and Doolittle (27). The largest peak, marked with an arrow (positions 545-575), resembles the membrane-spanning region of other proteins.

DISCUSSION

The structures of the L-CAM cDNA and L-CAM protein are summarized in Fig. 5. The protein sequence includes a hydrophobic segment that appears to be a membrane-spanning region and a presumed cytoplasmic domain of 152 residues. This cytoplasmic domain contains a serine-rich region, which apparently includes sites that undergo phosphorylation (9). The bulk of the molecule (544 amino acids) is external to the cell surface and includes at least three homologous repeats, each of ~113 amino acids. Five potential sites for asparagine glycosylation are also present; previous studies (9) indicate that four of these sites are occupied, three with complex oligosaccharides and one with a high-mannose oligosaccharide.

While it is well established that L-CAM and its mammalian homologues mediate calcium-dependent adhesion between epithelial cells, there has been some disagreement as to its mode of attachment to the membrane (4, 9, 26). The sequence data reported here strongly support our earlier conclusion that L-CAM is an integral membrane protein (4, 9). The protein sequence deduced from the nucleic acid sequence contains a highly hydrophobic region flanked by multiple basic amino acids. This region separates the extracellular part of L-CAM, represented by the Ft1 fragment, from the presumed cytoplasmic domain that contains a serine-rich region. These results are consistent with the hydrophobic behavior of L-CAM in charge-shift electrophoresis and its resistance to extraction with 0.1 M NaOH (B.C.S. and B.A.C., unpublished data).

Recently, comparison of the amino-terminal nine residues of N-cadherin, a calcium-dependent adhesion molecule from chicken neural tissue, and E-cadherin (mouse L-CAM) suggested a striking similarity between them (33). Both molecules have the same amino acid sequence for the first seven residues: Asp-Trp-Val-Ile-Pro-Pro-Ile. L-CAM from chicken has the same amino-terminal sequence. After position seven, however, all three sequences diverge from each other. The meaning of the identity is unclear. Moreover, as shown in Fig. 4, we found no hybridization of L-CAM cDNA probes to brain mRNA, indicating that any major structural similarities between L-CAM and N-cadherin are not reflected in extensive nucleic acid homologies. Hybridization at lower stringencies also failed to detect cross-hybridization between L-CAM and any RNA species in the brain.

[Figure 4: RNA transfer blot, lanes a-f, with the positions of 28S and 18S rRNA marked.]

FIG. 4. RNA transfer blot analyses of liver (lanes a, c, and e) and brain (lanes b, d, and f) RNA probed with fragments of L-CAM cDNA. Lanes a and b, the 5' 1.3-kb EcoRI/Kpn I fragment; lanes c and d, the central 1.3-kb Kpn I/BamHI fragment; lanes e and f, the 3' 800 base pairs of the L-CAM cDNA.

The mRNA encoding uvomorulin has been reported to contain sequences homologous to the rat B1 repetitive sequence (34). We find no evidence for any repetitive elements in the cDNA clone for L-CAM; all regions of the cDNA clone hybridize to a single size species on RNA transfer blots. In addition, there is no homology between the published sequence of the uvomorulin cDNA clone and the L-CAM cDNA sequence reported here. Thus, it is unlikely that repetitive elements in the 3' untranslated region of the mRNA are significant factors in the regulation of L-CAM expression during development.

When the L-CAM protein sequence was compared to itself by a diagonal matrix method (35), three regions of repeated homology were found in the amino acid sequence, although a similar analysis of the nucleic acid sequence of L-CAM revealed no comparable homologies. When these sequences were aligned to maximize homology, the similarity between the first two (from the amino terminus) repeats was much greater than the homology between either of them and the third, more carboxyl-terminal, region. The dot matrix plot also indicated a fourth possible homology region adjacent to the third, but there were too few amino acid identities to align this region with the other three. These results suggest that L-CAM evolved by two or more successive duplications of a DNA precursor. In addition, because the amino acid sequence, but not the nucleic acid sequence, reflects the homology, there may have been a strong selection for specific structural elements repeated in the L-CAM polypeptide. These elements probably reflect requirements of the cell-binding and calcium-binding functions of the molecule.

[Figure 3: aligned sequences of the three internal homology regions.]

FIG. 3. Alignment of the three regions of internal homology. Three contiguous segments of the protein sequence (Fig. 1) were aligned and gaps were inserted to maximize the alignment of identical amino acids. Residues identical in at least two segments are enclosed in boxes. The 16 amino acids that are identical in all three segments are marked with triangles. Amino acids are designated by the single-letter code.

Each of the homologous repeats consists of ≈113 amino acids. Despite the fact that these are roughly the size of an immunoglobulin domain, the sequences of these regions bear no resemblance to immunoglobulin domains. Moreover, whereas immunoglobulin-like domains are characterized by internal disulfide bonds, the L-CAM homology regions contain no cysteine residues. The only cysteine residues in mature L-CAM are outside these regions at positions 9, 449, 530, 532, and 539.

[Figure 5: protein model (H2N to COOH) above the cDNA restriction map; scale bar, 1 kb.]

FIG. 5. Schematic drawing of the structure of L-CAM. Model of the L-CAM protein molecule (Upper) aligned to scale with the cDNA restriction map (Lower). The three internal homology regions are stippled, the five potential asparagine glycosylation sites are denoted by vertical lines, the cell membrane and the membrane-spanning region of L-CAM are hatched, and the small serine-rich region of the intracellular domain is shaded. R, EcoRI; P, Pst I; C, Cla I; K, Kpn I; A, Acc I; S2, Sst II; S1, Sst I; B, BamHI.

The complete amino acid sequences of the three major N-CAM polypeptide chains have recently been determined (31). The extracellular portion of N-CAM contains five internal repeats that are homologous with the domains of immunoglobulins and other members of the immunoglobulin superfamily. Dot matrix comparison of the nucleic acid sequences and the protein sequences of L-CAM and the largest (ld) chain of N-CAM indicated that there was no structural homology between the two molecules. This result is consistent with previous structural and functional analyses of the two molecules (1, 4, 9, 31).

L-CAM and N-CAM are structurally different molecules that appear to have evolved in distinctly different pathways. These two primary CAMs are expressed in different cell groups during embryonic development in patterns that suggest that they act coordinately to affect intercellular interactions and border formation in cell collectives that are undergoing induction. This observation implies that there has been a strong selection for their differential expression and distinctly different binding functions during the evolution of regulative development.

The availability of complete cDNA clones for the CAMs opens up the possibility of using molecular biological techniques and DNA transfection to test the effect of expressing the CAMs at different times in different cell types on the behavior of individual cells and the formation of cell groups. Such experiments provide a complement to those already performed with antibodies against L-CAM, which show a marked effect on the patterns of L-CAM- and N-CAM-mediated tissue collectives exchanging signals during development (13).
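The self-comparison by a diagonal matrix method that the Discussion uses to locate the internal repeats can be illustrated with a small dot matrix routine. This is a simplified sketch and not the program of ref. 35: it scores windows by plain identity counts, and the window length, threshold, function names, and toy sequence are all hypothetical choices made for the example. Repeats appear as many hits sharing the same offset, that is, as a diagonal parallel to the main one.

```python
from collections import Counter


def dot_matrix_hits(seq: str, window: int = 5, min_matches: int = 4) -> list[tuple[int, int]]:
    """1-based window starts (i, j), i < j, sharing at least min_matches identities."""
    n = len(seq) - window + 1
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= min_matches:
                hits.append((i + 1, j + 1))
    return hits


def diagonal_offsets(hits: list[tuple[int, int]]) -> Counter:
    """Count hits per diagonal; the key is the offset j - i between window starts."""
    return Counter(j - i for i, j in hits)


# HYPOTHETICAL toy sequence: two near-identical 10-residue copies, 14 residues apart.
toy = "MTQ" + "DKPVFIKEVY" + "GHRW" + "DKPIFIKEVY" + "CNS"
offsets = diagonal_offsets(dot_matrix_hits(toy))
print(offsets.most_common(1))
```

Applied to a protein and to its coding sequence separately, a comparison of this kind can show repeats at the amino acid level that leave no visible diagonal at the nucleotide level, which is the situation described for L-CAM.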

We thank Eric Schneider for excellent technical assistance and Donna Atherton of the Rockefeller University Protein Sequencing Facility for help with protein sequence determination. W.J.G. is an R. J. Reynolds Fellow. This work was supported by U.S. Public Health Service Grants HD-16550, HD-09635, and AM-04256. Protein sequence analysis was performed by the Rockefeller University Protein Sequencing Facility, supported in part by funds provided by the U.S. Army Research Office for the purchase of equipment.

1. Edelman, G. M. (1986) Annu. Rev. Cell Biol. 2, 81-116.
2. Thiery, J.-P., Delouvée, A., Gallin, W. J., Cunningham, B. A. & Edelman, G. M. (1984) Dev. Biol. 102, 61-78.
3. Bertolotti, R., Rutishauser, U. & Edelman, G. M. (1980) Proc. Natl. Acad. Sci. USA 77, 4831-4835.
4. Gallin, W. J., Edelman, G. M. & Cunningham, B. A. (1983) Proc. Natl. Acad. Sci. USA 80, 1038-1042.
5. Hyafil, F., Morello, D., Babinet, G. & Jacob, F. (1980) Cell 21, 927-934.
6. Yoshida-Noro, C., Suzuki, N. & Takeichi, M. (1984) Dev. Biol. 101, 19-27.
7. Damsky, C. H., Richa, J., Solter, D., Knudsen, K. & Buck, C. A. (1983) Cell 34, 455-466.
8. Imhof, B. A., Vollmers, H. P., Goodman, S. L. & Birchmeier, W. (1983) Cell 35, 667-675.
9. Cunningham, B. A., Leutzinger, Y., Gallin, W. J., Sorkin, B. C. & Edelman, G. M. (1984) Proc. Natl. Acad. Sci. USA 81, 5787-5791.
10. Edelman, G. M., Gallin, W. J., Delouvée, A., Cunningham, B. A. & Thiery, J.-P. (1983) Proc. Natl. Acad. Sci. USA 80, 4384-4388.
11. Johnson, M. H., Maro, B. & Takeichi, M. (1986) J. Embryol. Exp. Morphol. 93, 239-255.
12. Gumbiner, B. & Simons, K. (1986) J. Cell Biol. 102, 457-468.
13. Gallin, W. J., Chuong, C.-M., Finkel, L. H. & Edelman, G. M. (1986) Proc. Natl. Acad. Sci. USA 83, 8235-8239.
14. Crossin, K. L., Chuong, C.-M. & Edelman, G. M. (1985) Proc. Natl. Acad. Sci. USA 82, 6942-6946.
15. Gallin, W. J., Prediger, E. A., Edelman, G. M. & Cunningham, B. A. (1985) Proc. Natl. Acad. Sci. USA 82, 2809-2813.
16. Yanisch-Perron, C., Vieira, J. & Messing, J. (1985) Gene 33, 103-119.
17. Stratagene Cloning Systems (1986) Bluescript Exo/Mung Instruction Manual (Stratagene Cloning Systems, San Diego, CA).
18. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
19. Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.
20. Zagursky, R. L., Baumeister, K., Lemax, N. & Berman, M. L. (1985) Gene Anal. Tech. 2, 89-94.
21. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.
22. Hunkapillar, M. W., Lujan, E., Ostrander, F. & Hood, L. E. (1983) Methods Enzymol. 91, 227-236.
23. Hemperly, J. J., Hopp, T. P., Becker, J. W. & Cunningham, B. A. (1979) J. Biol. Chem. 254, 6803-6810.
24. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
25. Daniels, D. L., Schroeder, J. L., Szybalski, W., Sanger, F. & Blattner, F. R. (1983) in Lambda II, eds. Hendrix, R. W., Roberts, J. W., Stahl, F. W. & Weisberg, R. A. (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY), pp. 469-517.
26. Peyrerias, N., Hyafil, F., Louvard, D., Ploegh, H. & Jacob, F. (1983) Proc. Natl. Acad. Sci. USA 80, 6274-6277.
27. Kyte, J. & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
28. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
29. Weber, K. & Osborn, M. (1969) J. Biol. Chem. 244, 4406-4412.
30. Lawler, J. & Hynes, R. O. (1986) J. Cell Biol. 103, 1635-1648.
31. Cunningham, B. A., Hemperly, J. J., Murray, B. A., Prediger, E. A., Brackenbury, R. & Edelman, G. M., Science, in press.
32. Wilbur, W. J. & Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80, 726-730.
33. Shirayoshi, Y., Hatta, K., Hosoda, M., Tsunasawa, S., Sakiyama, F. & Takeichi, M. (1986) EMBO J. 5, 2485-2488.
34. Schuh, R., Vestweber, D., Riede, I., Ringwald, M., Rosenberg, U., Jackle, H. & Kemler, R. (1986) Proc. Natl. Acad. Sci. USA 83, 1364-1368.
35. Staden, R. (1982) Nucleic Acids Res. 10, 2951-2961.
