
Proc. Natl. Acad. Sci. USA
Vol. 84, pp. 9189-9193, December 1987
Immunology

Structure and expression of the human thymocyte antigens CD1a, CD1b, and CD1c

(leukocyte differentiation antigens/thymus leukemia antigen/nucleotide sequence/DNA transfer/major histocompatibility complex)

L. H. MARTIN, F. CALABI, F.-A. LEFEBVRE, C. A. G. BILSLAND, AND C. MILSTEIN*

Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England

Contributed by C. Milstein, August 31, 1987

ABSTRACT The CD1 human antigens are a family of at least three components, CD1a, CD1b, and CD1c, that are characteristic of the cortical stage of thymocyte maturation. CD1a was originally named HTA1 or T6 and thought to be the human equivalent of mouse TL. The genes coding for all three have now been identified by transfection into mouse cells. The transfectants express the surface antigens that can then be recognized by the corresponding cluster of monoclonal antibodies used to define the three members of CD1. The full sequence of the genomic DNA is described for all three. The intron-exon structure of CD1a is deduced by comparison with a near-full-length cDNA clone. Similar structures are proposed for the other two, largely based on sequence homology. An unusually long 5'-untranslated exon (280 bases long) is highly conserved between the three genes, suggesting an important but unknown function. CD1c has a duplicated form of this exon that is thought to be spliced out. The major homology between the three antigens is in the β2-microglobulin-binding domain. The general relatedness to major histocompatibility complex class I and class II molecules is significant but low, with no section of higher homology to mouse TL.

The CD1 monoclonal antibodies recognize human thymocyte antigens (1-7), which have often been considered the human homologues of the mouse thymus leukemia (TL) antigens (8). However, CD1 antigens, although evolutionarily related to major histocompatibility complex (MHC) class I and class II molecules, do not share any specific sequence homology to Tla (9, 10), do not map in the MHC (11), and appear to have a wider tissue distribution than TL (12). CD1 has now been divided into three subgroups, CD1a, -b, and -c (12), with glycosylated heavy chains of 49, 45, and 43 kDa, respectively, and a peptidic backbone of 35 kDa (13). On the cell surface they occur in a noncovalent, sometimes loose association with β2-microglobulin (14-16). Our identification of five CD1 genes (10) has presented a dilemma, since only three CD1 proteins have been identified. We now report the genomic coding sequences for three of these genes† and their identification as CD1a, -b, and -c through the use of DNA transfection assays.

MATERIALS AND METHODS

DNA-Mediated Gene Transfer. DNA from the λ clones λR4B3, λR1L5, and λR7L4 (10) was prepared according to standard protocols (17) and subcloned into either the Xho I or the EcoRI site of a derivative of pSV2.gpt (18, 19), which was a kind gift of M. S. Neuberger (this laboratory). The insert from λR4B3 was subcloned in its entirety using the unique Xho I site in the polylinker. A 10.4-kilobase (kb) Xba I fragment from λR1L5 was blunted and ligated into the blunted EcoRI site of the vector. From λR7L4 only the 7.2-kb EcoRI fragment that contained the CD1 gene was subcloned. Generally, 25 μg of DNA was transferred into the cultured cell lines NS0 (murine myeloma) and EL4 (murine T cell) by electroporation (20). Transfectants were selected in Dulbecco's modified Eagle's medium supplemented with 10% (vol/vol) fetal calf serum and mycophenolic acid (10 μg/ml). Cells were analyzed in the cell sorter (FACS II) (Becton Dickinson) by indirect immunofluorescence. Growing cells were spun down and resuspended (5 × 10⁵ cells per 100 μl) in Dulbecco's phosphate-buffered saline (DPBS) containing an appropriate dilution (1:400-1:1600) of monoclonal antibody. The antibodies were specific for CD1a, -b, and -c and available through the Third International Workshop on Human Leukocyte Differentiation Antigens (21) (listed in Fig. 1). Cells were incubated for 45-60 min at 4°C, then washed twice with 10 ml of DPBS, resuspended (5 × 10⁵ cells per 100 μl) in DPBS containing a 1:30 dilution of fluorescein isothiocyanate-conjugated goat anti-mouse IgG (Sigma), incubated 30 min at 4°C, washed as before, and resuspended at a concentration of 3 × 10⁶ cells per ml in DPBS for sorting.

Lactoperoxidase iodination, immunoprecipitation of cell-surface protein, and polyacrylamide gels (7.5-15% acrylamide linear gradient) were essentially as described (9).

cDNA Cloning and DNA Sequencing. cDNA cloning was as described (9). DNA was sequenced by the dideoxy chain-termination procedure in conjunction with "shotgun cloning" (22). The sequence data were processed utilizing DBUTIL (23) and its accessory programs. The sequence of each gene was completed on both strands, with at least 2-fold redundancy.
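The cell densities and antibody dilutions quoted in the staining protocol above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the published methods; the function names are hypothetical, and the only inputs are the figures stated in the text (5 × 10⁵ cells per 100 μl, a final density of 3 × 10⁶ cells per ml, and dilutions of 1:400 to 1:1600).

```python
# Staining-protocol arithmetic. All figures come from the methods text;
# the helper names are illustrative and not from the paper.

CELLS_PER_STAIN = 5e5        # cells per staining reaction
STAIN_VOLUME_UL = 100.0      # microlitres per staining reaction
SORT_DENSITY_PER_ML = 3e6    # final density for sorting, cells per ml


def density_per_ml(cells: float, volume_ul: float) -> float:
    """Convert a cell count in a volume given in microlitres to cells per ml."""
    return cells * 1000.0 / volume_ul


def resuspension_volume_ml(cells: float, target_density_per_ml: float) -> float:
    """Volume (ml) needed to bring `cells` to the target density."""
    return cells / target_density_per_ml


def stock_volume_ul(final_volume_ul: float, dilution_factor: float) -> float:
    """Volume (microlitres) of antibody stock for a 1:dilution_factor dilution."""
    return final_volume_ul / dilution_factor


if __name__ == "__main__":
    print(density_per_ml(CELLS_PER_STAIN, STAIN_VOLUME_UL))
    print(resuspension_volume_ml(CELLS_PER_STAIN, SORT_DENSITY_PER_ML))
    print(stock_volume_ul(STAIN_VOLUME_UL, 400), stock_volume_ul(STAIN_VOLUME_UL, 1600))
```

Under these figures the staining density corresponds to 5 × 10⁶ cells per ml, so the final resuspension for sorting is slightly more dilute than the staining step.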

RESULTS

DNA from three CD1 genomic clones, λR4B3, λR7L4, and λR1L5 (10), was isolated and subcloned into the pSV2.gpt vector. The constructs were used to transfect NS0, a murine myeloma, and EL4, a murine thymoma. Stable transfectants were selected with mycophenolic acid. Single-cell clones were prepared from selected pools and screened with a fluorescence-activated cell sorter. Initially, positive cells were detected by a mixture of three monoclonal antibodies, NA1/34 (1), NU-T2 (5), and M241 (3), to represent the CD1a, -b, and -c subgroups, respectively. A number of such transfectants were prepared, but very few expressed the antigen on their surface. This was true when transfectants were prepared using EL4 as well as NS0 cells. A positive clone from each transfection was selected and subjected to a detailed analysis using all the currently available CD1 monoclonal antibodies (Fig. 1). The NS0 clone transfected with

Abbreviation: MHC, major histocompatibility complex.
*To whom reprint requests should be addressed.
†This sequence is being deposited in the EMBL/GenBank data base (Bolt, Beranek, and Newman Laboratories, Cambridge, MA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03584).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


[Figure 1 plot: net mean fluorescence (axis ticks 10 to 40) for ER4, ER1, and NR7 cells; antibodies grouped as CD1a, CD1b, and CD1c.]

FIG. 1. Cytofluorimetric analysis of surface expression of transfected genes, using a panel of monoclonal antibodies. The numbers represent the net mean fluorescence obtained in the fluorescence-activated cell sorter. NR7 cells are R7L4-transfected NS0 cells (clone NR7L4R7.6). ER4 cells are R4B3-transfected EL4 cells (clone ER4B3.28). ER1 cells are R1L5-transfected EL4 cells (clone ER1L5.3). Monoclonal antibodies specific for CD1a, -b, or -c, respectively, are as described in ref. 21.

the λR7L4 gene (NR7) was labeled by the antibodies of the CD1c group, but not by CD1a- or CD1b-specific antibodies. The EL4 clone transfected with λR4B3 (ER4) was recognized by CD1a antibodies, whereas the λR1L5 transfectant (ER1) was recognized by CD1b antibodies, though some background labeling was present with other antibodies in both cases. Whether this background was artifactual or represented a true cross-reaction remains uncertain. The level of expression in the transfectants is comparable to or higher than that in MOLT-4 (results not shown).

The transfected gene products were in some cases characterized further by immunoprecipitation of lysates prepared from ¹²⁵I-surface-labeled transfected cells (Fig. 2). The lysates were incubated with each of the three "prototype" antibodies, NA1/34, NU-T2, and M241. The slightly different mobility of the transfected gene products in comparison to the MOLT-4 proteins is probably due to differences in glycosylation. Though this is difficult to detect in Fig. 2, the transfectants, like MOLT-4, express the CD1 proteins in association with β2-microglobulin.

[Figure 2 gel image: lanes for parental and transfected cell lines, each marked with the CD1 subgroup (a, b, or c) of the antibody used.]

FIG. 2. NaDodSO4/PAGE analysis of parental and transfected cell lines. Cells (named as in Fig. 1) were ¹²⁵I-surface-labeled then immunoabsorbed using the following monoclonal antibodies that are specific for an individual CD1 subgroup. Lanes: a, NA1/34; b, NU-T2; c, M241. We are grateful to R. Knowles (Memorial Sloan-Kettering Cancer Center) and K. Sagawa (Kurume University School of Medicine, Japan) for providing M241 and NU-T2 samples, respectively.

The results support our proposal (9, 24) that λR4B3 encodes CD1a and that λR7L4 encodes CD1c. In addition we show that λR1L5 encodes CD1b.

Genomic Organization of the CD1a Gene. We have described (9) cDNA clones derived from the R4 gene, from which the partial amino acid sequence of the CD1a α chain was predicted. Fig. 3 describes a longer cDNA clone (λAA3) corresponding to a nearly full-length copy of the 2.1-kb transcript (9). This clone contains 1840 bases excluding the poly(A)·poly(T) tract. It includes an open reading frame of 981 bases starting with the initiator ATG suggested in Fig. 4. This is followed by a predicted amino-terminal sequence that fulfills the criteria for a signal sequence (25). The size of the predicted polypeptide product (37 kDa) agrees well with that of the CD1a α-chain precursor, as determined by in vitro translation studies (9). The first amino acid of the mature chain has been chosen in part by sequence considerations (26) and in part by homology to a rabbit CD1 sequence (W. J. Mandy, personal communication). Upstream of the coding sequence, there is an unusually long (280-base) putatively untranslated 5' stretch that contains several ATG triplets as well as short open reading frames.

We have also completed the sequence of the relevant fragments of the genomic clone λR4B3 (10). For reasons of space we present below the sequences of the exons and only small segments of introns. The complete sequences are available and will be published elsewhere. A comparison with the sequence of the cDNA clone described above shows the CD1a gene to be organized as follows (Figs. 3 and 4). A putative exon 1, of which only 20 base pairs are known from the cDNA sequence, cannot be mapped with respect to the other exons, since it is not found in the available genomic clones. Exon 2 encodes a 5'-untranslated sequence as well as the signal peptide (L); exons 3-5 encode the bulk of the α chain; exon 6 encodes the connecting peptide, transmembrane segment, and the adjacent basic intracytoplasmic residues; and exon 7 encodes the remainder of the short intracytoplasmic tail and the 3'-untranslated sequence. Thus, the CD1a gene is organized in a similar way to MHC class I genes. In MHC class I genes, exons correlate well with protein domains. The similar organization of the CD1a coding sequence also suggests that the extracellular portion of the CD1a α chain is organized in three domains (α1-α3). The most striking difference is the length of the 5' region, which is >280 bases in CD1a against an average of 18 bases in MHC class I genes (27) and is encoded in more than one exon. In addition, the cytoplasmic segment of CD1a is much shorter than that of MHC class I and is split between two rather than three exons.
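The agreement between the 981-base open reading frame and the 37-kDa predicted polypeptide can be checked roughly as follows. This is a back-of-the-envelope sketch only: the mean residue mass of 110 Da is a rule-of-thumb assumption, not a value from the paper, and the text does not state whether the 981 bases include the termination codon, so the estimate is good to about one residue.

```python
# Rough size check for the open reading frame described above.
# ORF_BASES is from the text; AVG_RESIDUE_MASS_DA is an assumed
# rule-of-thumb value, not a figure reported in the paper.

ORF_BASES = 981
AVG_RESIDUE_MASS_DA = 110.0


def codon_count(orf_bases: int) -> int:
    """Number of codons in an ORF; the length must be a multiple of three."""
    if orf_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bases // 3


def approx_mass_kda(residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kDa from a residue count."""
    return residues * avg_mass_da / 1000.0


if __name__ == "__main__":
    codons = codon_count(ORF_BASES)
    print(codons, approx_mass_kda(codons))
```

With these assumptions the frame holds 327 codons and the estimate comes out near 36 kDa, in line with the 37 kDa quoted for the precursor.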

Predicted Protein Structure of CD1a, CD1b, and CD1c. The complete nucleotide sequence was also determined of the 7.2-kb EcoRI fragment containing the CD1c gene and of the 10.4-kb Xba I fragment containing the CD1b gene. Although the actual splicing pattern remains speculative, the α1-α3 coding and transmembrane exons can be identified by comparison with CD1a (Fig. 4) and by analysis of potential splice signals (28). The carboxyl terminus of CD1c, however, is unclear. It may terminate as shown with the tryptophan residue. More likely, based on homology to CD1a, the RNA will splice on the 5' side of the GTGA end of CD1c shown in Fig. 4, contribute two nucleotides to the first codon of a downstream exon, and avoid the early termination signal. Such a putative exon has not been identified yet.


[Figure 3 maps: restriction maps of CD1a, CD1b, and CD1c with the λAA3 cDNA aligned above CD1a; exon boxes labeled 5'UT, L, α1, α2, α3, T, C, and 3'UT; scale bar 1 kb.]

FIG. 3. Partial restriction maps of the CD1a, -b, and -c genes. B, BamHI; H, HindIII; R, EcoRI; S, Sac I; Xb, Xba I; Xh, Xho I. The Xb-S-Xh sites in parentheses are not genomic but are part of the λ2001 polylinker. Stop codons and polyadenylylation signals that have been verified from cloned cDNAs are marked with filled symbols; potential sites are indicated by open symbols. The suggested exons are shown in boxes that are filled in to indicate putative protein domains. 5'UT, 5' untranslated; L, leader; T, transmembrane; C, cytoplasmic; 3'UT, 3' untranslated. The bold arrows indicate the tandem repeats within CD1c. Alu repeats are in small boxes, with an arrow to indicate the polarity. The full sequence of the λAA3 cDNA can be deduced from Fig. 4, except for the first 20 bases, which are 5'-GCCTCTTCTGCGGGATGGAT-3', and for the 3'-untranslated region, which is identical to the one described in ref. 9. Another Sac I site is present in CD1b just upstream of exon C 3'UT (see ref. 10).

CD1a, -b, and -c are predicted to be of similar molecular weight, consistent with the observed size of deglycosylated CD1a and CD1c chains (9, 13), which is considerably smaller than that of most MHC class I molecules. This size difference can be attributed to the cytoplasmic tails. An unusual feature in the three CD1 molecules is a third cysteine in nonhomologous positions in the α2 domain. There are three homologous potential N-glycosylation sites, two in α1 and one in α2. One additional site is also found at various positions in α1 in CD1a and CD1c, and in α3 in CD1b. This is consistent with biochemical studies suggesting four N-linked oligosaccharide chains in CD1a. However, since the glycosylated CD1b and CD1c α chains are ≈4 and ≈6 kDa smaller, respectively, than CD1a, one or more of the CD1b and CD1c glycosylation sites may not be used. A comparison of nucleotide and amino acid sequences (Table 1) shows that the α3-coding exons are much more conserved than either α1 or α2. More silent replacements are found in both α1 and α2 than in α3.

A Long Noncoding 5' Sequence Is Highly Conserved in CD1 Genes and Duplicated in CD1c. Similarly to MHC genes, CD1 3'-untranslated regions are not conserved. Thus, probes derived from the 3' end of the CD1a cDNA behave as if they were gene specific in hybridization studies (9), and direct sequence comparison shows no clear homology within this region. The contrary is true at the 5' end, where CD1 sequences upstream of the main coding region display a remarkable degree of primary sequence conservation. Homology spanning the whole length of the CD1a leader exon, of which 270 nucleotides out of 300 are presumably untranslated, is readily detected using the DIAGON program (29) in CD1b and in two separate regions of the CD1c sequence upstream of the α1-α3 coding exons (referred to as 5'UT-L and 5'UT-ψL in Figs. 3 and 4). As summarized in Table 1, the nucleotide sequence homology of this exon is comparable to that of protein-encoding exons. Such homology is also as high in the upstream region (270 bases) as it is in the protein-encoding region (30 bases) and further extends to the flanking regions. As stated above, the CD1c gene contains the sequence twice, within tandemly repeated units ≈700 bases long. Each repeat unit extends from 130 bases on the 5' side to 250 bases on the 3' side of the exon. Sequence homology between the two units is ≈90%. Both repeats contain a sequence encoding a putative signal peptide; however, in the 3' unit (ψL), there is a termination triplet five codons downstream of the presumed translational start. Thus, it is unlikely that this sequence is represented in functional mRNA.
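The potential N-glycosylation sites discussed in the preceding section are conventionally located by searching a protein sequence for the Asn-Xaa-Ser/Thr sequon, where Xaa is any residue except proline. The sketch below assumes that standard rule; the paper cites its own criteria without spelling them out, and the function name and the toy peptide in the usage line are hypothetical.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro. The lookahead makes the match zero-width,
# so overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy peptide, not a CD1 sequence: N-G-S matches, N-P-T does not,
    # and N-N-T-S contains two overlapping sequons.
    print(n_glycosylation_sites("MNGSANPTQNNTS"))
```

A scan of this kind only flags candidate sites; as the text notes for CD1b and CD1c, a predicted sequon is not necessarily glycosylated.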

DISCUSSION

Utilizing the technique of DNA transfer, it has been possible to identify the genes encoding the human CD1a, -b, and -c antigens. Assays with the panel of monoclonal antibodies that define the three CD1 subgroups resulted in a clear delineation of reactivities: λR4B3 coded for CD1a, λR1L5 coded for CD1b, and λR7L4 coded for CD1c. The transfected cell lines will prove useful in defining the specificity of newly isolated monoclonal antibodies. Biochemical and immunohistochemical studies (12) have shown that the CD1 subgroups differ in tissue distribution and have a wider range than TL antigens. Moreover, the levels of expression for CD1a, -b, and -c are not tightly linked (24). Thus, the various CD1 genes may be under the influence of different regulatory elements.

Nucleotide and amino acid sequence comparisons show a limited degree of conservation, especially low in the membrane-distal domains (α1 and α2). Indeed, in this region sequence homology among CD1 molecules is significantly less than among the equivalent domains of MHC molecules (27) and even somewhat lower than among immunoglobulin and T-cell-receptor variable-region subfamilies. Even so, the overall folding pattern of CD1 antigens seems to be conserved, because profiles of secondary-structure propensity, hydrophobicity, and charge are largely superimposable (data not shown). Secondary-structure predictions (30, 31) suggest that the middle section of class I α1 and α2 and class II β1 domains folds in an α-helical configuration. Application of secondary-structure predictive algorithms to CD1 heavy chains, as well as the conserved Pro-Xaa-Pro sequence in the


[Figure 4 alignment: nucleotide and deduced amino acid sequences of CD1a (reference), CD1b, CD1c, and ψL, arranged by exon: 5'UT, leader, α1, α2, α3, TMC, and C. Base-pair counts annotated for each intron in the figure:
leader to α1: CD1a 331; CD1b 296; CD1c 640 (to ψL); ψL 393 (to CD1c α1)
α1 to α2: CD1a 613; CD1b 543; CD1c 663
α2 to α3: CD1a 483; CD1b 184; CD1c 203
α3 to TMC: CD1a 333; CD1b 239; CD1c 306
TMC to C: CD1a 169; CD1b 553]

FIG. 4. DNA and deduced protein sequences of CD1a, -b, and -c. CD1a has been used as the reference sequence, thus nucleotides and amino acids (in standard one-letter code) are shown for CD1b and -c only where differences from CD1a occur. Exon, capital letters; intron, lowercase letters. Distinct symbols mark the presumed disulfide loops in the α2 and α3 domains, respectively. Potential N-linked glycosylation sites are boxed, and stop codons are indicated with an asterisk. ψL represents the segment of CD1c shown in Fig. 3 and labeled 5'UT-ψL; this is the presumed nonfunctional copy of the leader exon.

middle of the α2 domains, does not agree with an extensive α-helical region and suggests a different folding than that of MHC class I and II. Thus, the conclusion is strengthened that CD1 are distantly related to the MHC.

In terms of gene organization, the most substantial difference between CD1 and MHC genes is found in the presence of a long sequence upstream of the main coding reading frame. Such sequence is well conserved among CD1 genes,

Dow

nloa

ded

by g

uest

on

July

17,

202

0

Page 5: Structure CD1a, CD1b, CD1c - PNAS · least three components, CDla, CDlb, and CDlc, that are characteristic of the cortical stage of thymocyte maturation. CDlawasoriginally namedHTA1orT6andthoughttobethe


Table 1. Percent sequence identity between CD1a, CD1b, and CD1c clones

Sequence identity, %

5'UT Leader α1 α2 α3 TMC Overall
CD1a CD1b CD1c ψL CD1a CD1b CD1c CD1a CD1b CD1c CD1a CD1b CD1c CD1a CD1b CD1c CD1a CD1b CD1c CD1a CD1b CD1c

CD1a 56 75 73 - 67 77 - 61 64 64 69 - 92 77 - 59 63 67 70
CD1b - 63 60 45 - 72 40 - 69 51 - 63 90 - 78 43 65 59 - 68
CD1c 92 65 55 50 50 - 52 49 - 65 70 - 47 28 - 57 58

Nucleotide identities are given in bold type; amino acid identities are in standard type. Percent identity was calculated using the genetic events option of the program TWOB (R. Staden, personal communication). ψL refers to the sequence shown in Fig. 3 up to the beginning of the leader sequences. 5'UT, 5' untranslated; TMC, transmembrane and cytoplasmic segment.
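For readers who want to compute comparable figures, percent identity between two aligned sequences can be sketched as below. This is a plain column-by-column count and is an assumption made for illustration: it does not reproduce the "genetic events" option of TWOB, whose treatment of gaps is not described here. The function name `percent_identity` and the choice to skip gapped columns are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical positions among aligned columns without a gap.

    Assumption: columns where either sequence has a gap are skipped.
    Other conventions (e.g. counting each gap as one event) give
    different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy strings for illustration only (not data from the paper).
print(percent_identity("ACGT", "ACGA"))  # 75.0
```

The same function works for nucleotide and amino acid alignments, since it only compares characters.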

suggesting the existence of selective constraints. Long sequences upstream of the main coding region have been described in other transcripts (32) and postulated to affect mRNA stability and/or translational efficiency. In this regard, it is of interest to note that the proposed AUG initiator does not lie in an optimal context, whereas the upstream sequence contains several AUG triplets, one of which (conserved in CD1a and in CD1c, but not in CD1b) is found immediately upstream and in a different phase, with no terminator interposed. According to the model described in ref. 32, such an arrangement can greatly decrease the efficiency of translation of the main coding sequence, which might explain why CD1a shows a slow biosynthetic rate, despite the relative abundance of mRNA (unpublished results). In the CD1c gene, this upstream, mainly noncoding exon is present in two copies as part of a larger duplicated unit. The duplication has accumulated a sequence divergence of ~10%. No allele lacking the duplication (as assessed by the size of the 7.2-kb EcoRI fragment) has been observed in DNA from ~20 individuals (unpublished observations). The sequence encoding the signal peptide of the downstream repeat is interrupted by a stop codon, suggesting that this copy is spliced out of functional CD1c mRNA, and for this reason it is labeled as a ψ leader. Also, the putative donor splice site at the 3' border of the exon diverges from the consensus sequence (28). The remarkable length and sequence conservation of this 5'-untranslated exon remain an intriguing puzzle.
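The arrangement discussed above (an upstream AUG in a different phase from the main initiator, with no terminator interposed) can be checked on any leader sequence with a short scan. The sketch below is illustrative only: the function name `upstream_atgs`, the tuple layout, and the toy sequences are assumptions, and "terminator interposed" is taken to mean a stop codon in the upstream AUG's own reading frame that ends at or before the main AUG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_atgs(mrna: str, main_start: int):
    """List (position, phase, stop_interposed) for each ATG before main_start.

    phase is (main_start - position) % 3; 0 means the same frame as the
    main initiator. stop_interposed is True if a stop codon in the upstream
    ATG's frame ends at or before main_start.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[main_start:main_start + 3] != "ATG":
        raise ValueError("main_start must point at an ATG/AUG triplet")
    results = []
    pos = seq.find("ATG")
    while pos != -1 and pos < main_start:
        phase = (main_start - pos) % 3
        stop_interposed = any(
            seq[i:i + 3] in STOP_CODONS
            for i in range(pos + 3, main_start - 2, 3)
        )
        results.append((pos, phase, stop_interposed))
        pos = seq.find("ATG", pos + 1)  # allow overlapping candidates
    return results


# Toy sequences for illustration only (not data from the paper).
print(upstream_atgs("CCAUGGCAUGAAA", 7))  # [(2, 2, False)]
print(upstream_atgs("ATGTAACCATGG", 8))   # [(0, 2, True)]
```

An entry with a nonzero phase and `stop_interposed` equal to `False` corresponds to the configuration that, in the model of ref. 32, is expected to reduce translation of the main coding sequence.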

N. Migone performed early studies on a second CD1a gene, which, although not described in this paper, were very valuable in the work. We thank J. M. Jarvis, D. Gilmore, and R. Pannell for invaluable help. L.H.M. was supported by a Fellowship from the Wellcome Trust, and F.C. by a Special Fellowship from the Leukemia Society of America.

1. McMichael, A. J., Pilch, J. R., Galfre, G., Mason, D. Y., Fabre, J. W. & Milstein, C. (1979) Eur. J. Immunol. 9, 205-210.

2. Reinherz, E. L., Kung, P. C., Goldstein, G., Levey, R. H. & Schlossman, S. F. (1980) Proc. Natl. Acad. Sci. USA 77, 1588-1592.

3. Knowles, R. W. & Bodmer, W. F. (1982) Eur. J. Immunol. 12, 676-681.

4. Olive, D., Dubreuil, P. & Mawas, C. (1984) Immunogenetics 20, 253-264.

5. Hagiwara, S., Yasuda, K., Shiraishi, M., Itok, K., Okubo, K., Matsuo, Y. & Sagawa, K. (1985) Saishin Igaku 40, 636-651.

6. Kahn-Perles, B., Wietzerbin, J., Caillol, D. H. & Lemonnier, F. (1985) J. Immunol. 134, 1759-1765.

7. Amiot, M., Bernard, A., Ragnal, B., Knapp, W., Deschildre, G. & Boumsell, L. (1986) J. Immunol. 136, 1752-1758.

8. Flaherty, L. (1981) in The Major Histocompatibility Complex in Immunobiology, ed. Dorf, M. E. (Wiley, New York), pp. 33-57.

9. Calabi, F. & Milstein, C. (1986) Nature (London) 323, 540-543.

10. Martin, L. H., Calabi, F. & Milstein, C. (1986) Proc. Natl. Acad. Sci. USA 83, 9154-9158.

11. Calabi, F., Schroeder, J., Martin, L. H. & Milstein, C. (1987) in Leucocyte Typing III, ed. McMichael, A. J. (Oxford Univ. Press, Oxford), pp. 72-74.

12. Boumsell, L. & Knowles, R. (1987) in Leucocyte Typing III, ed. McMichael, A. J. (Oxford Univ. Press, Oxford), pp. 71-72.

13. Van de Rijn, M., Lerch, P., Knowles, R. W. & Terhorst, C. (1983) J. Immunol. 131, 851-855.

14. Ziegler, A. & Milstein, C. (1979) Nature (London) 279, 243-244.

15. Kefford, R. F., Calabi, F., Fearnley, I. M., Burrone, O. R. & Milstein, C. (1984) Nature (London) 308, 641-642.

16. Bernabeu, C., Van de Rijn, M., Lerch, P. G. & Terhorst, C. (1984) Nature (London) 308, 642-645.

17. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, Cold Spring Harbor, NY).

18. Mulligan, R. C. & Berg, P. (1981) Proc. Natl. Acad. Sci. USA 78, 2072-2076.

19. Neuberger, M. S. (1983) EMBO J. 2, 1373-1378.

20. Potter, H., Weir, L. & Leder, P. (1984) Proc. Natl. Acad. Sci. USA 81, 7161-7165.

21. McMichael, A. J. & Gotch, F. M. (1987) in Leucocyte Typing III, ed. McMichael, A. J. (Oxford Univ. Press, Oxford), pp. 31-62.

22. Bankier, A. T. & Barrell, B. G. (1983) in Techniques in Nucleic Acid Biochemistry, ed. Flavell, R. A. (Elsevier Ireland, Limerick), Vol. 58, pp. 1-34.

23. Staden, R. (1986) Nucleic Acids Res. 14, 217-231.

24. Milstein, C., Calabi, F., Jarvis, J. M., Kefford, R., Martin, L. H. & Migone, N. (1987) in Leucocyte Typing III, ed. McMichael, A. J. (Oxford Univ. Press, Oxford), pp. 882-889.

25. Inouye, M. & Halegoua, S. (1980) CRC Crit. Rev. Biochem. 7, 339-371.

26. Von Heijne, G. (1985) J. Mol. Biol. 184, 99-105.

27. Srivastava, R., Duceman, B. W., Biro, P. A., Sood, A. K. & Weissman, S. M. (1985) Immunol. Rev. 84, 93-121.

28. Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472.

29. Staden, R. (1982) Nucleic Acids Res. 10, 2951-2961.

30. Vega, M. A., Bragado, R., Ezquerra, A. & Lopez de Castro, J. A. (1984) Biochemistry 23, 823-831.

31. Auffray, C. & Novotny, J. (1984) Nucleic Acids Res. 12, 243-255.

32. Kozak, M. (1986) Cell 47, 481-483.

