YEAST VOL. 10: S81-S91 (1994)
XI. Yeast Sequencing Reports
Sequencing of a 13.2 kb Segment Next to the Left Telomere of Yeast Chromosome XI Revealed Five Open Reading Frames and Recent Recombination Events with the Right Arms of Chromosomes III and V
DESPINA ALEXANDRAKI†‡* AND MARIA TZERMIA†
†Foundation for Research and Technology-HELLAS, Institute of Molecular Biology and Biotechnology and ‡University of Crete, Department of Biology, P.O. Box 1527, Heraklion 711 10 Crete, Greece
Received 15 August 1993; accepted 1 December 1993
We report the entire sequence of a 13.2 kb segment next to the left telomere of chromosome XI of Saccharomyces cerevisiae. A 1.2 kb fragment near one end is 91% homologous to the right arm of chromosome III, and 0.7 kb of it is 77% homologous to the right arm of chromosome V. Five open reading frames are included in the sequenced segment. Two of them are almost identical to the known YCR104W and YCR103C hypothetical proteins of chromosome III. A third one contains a region homologous to the Zn(2)-Cys(6) binuclear cluster pattern of fungal transcriptional activators. The fourth one, part of which is similar to the mammalian putative transporter of mevalonate, has the structure of membrane transporters. The fifth one is similar to yeast ferric reductase. The sequence has been deposited in the EMBL data library under Accession Number X75950.
KEY WORDS - Genome sequencing; Saccharomyces cerevisiae; chromosome XI; chromosomal recombination; telomeres; gene redundancy.
INTRODUCTION
In the course of the European Community (BRIDGE) project to sequence Saccharomyces cerevisiae chromosome XI, we have determined the complete sequence of 13 213 base pairs on a DNA fragment mapped next to the left telomere (about 200 nucleotides away from a 1 kb telomeric sequence, unpublished results). This fragment contains five open reading frames (ORFs), the potential functions of which are discussed below.
MATERIALS AND METHODS
Strains and vectors
Cosmids pUKG040 and pEKG100 were provided in Escherichia coli strain TG1 (Δ(lac pro), thi1, supE44, hsdD5, F′ (traD36, proA+B+ lacIq lacZΔM15)) from Agnès Thierry and Bernard Dujon (Thierry and Dujon, in preparation). They are cosmids from libraries of chromosome XI, derivatives of cosmids pOU61 cos and pWE15 respectively, containing overlapping partial Sau3AI yeast DNA fragments. Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1), and pUC18 and pUC19 vectors were used for all subsequent subcloning and sequencing steps.
*Author to whom correspondence should be addressed.
CCC 0749-503X/94/SQ0081-11 © 1994 by John Wiley & Sons Ltd
Sequencing strategy
We have used directed sequencing of ordered restriction fragments. Cosmid DNAs were digested with EcoRI and electrophoresed in low melting point agarose. Four EcoRI fragments were purified and subcloned into pUC18 or pUC19 vectors.
[Figure 1. Panel A: EcoRI restriction map with sites at positions 1524, 1585, 3343, 10248 and 13208 and fragment sizes of 1523, 61, 1758, 6905 and 2960 bp. Panel B: 6-phase ORF map (phases 1 to 3 and -1 to -3, scale 2000 to 12000 bp) showing ORFs D123, A110, F705, B473 and F711.]
Figure 1. (a) EcoRI restriction map of the 13 213 base pair segment. The arrows indicate the beginning (Sau3AI sites) of the sequences included in the two overlapping cosmids. The numbers below the bar indicate the size of each EcoRI fragment. (b) 6-phase ORF map of the 13 213 base pairs. Small bars indicate initiation codons and full bars indicate stop codons. The location and the direction of five ORFs are indicated by arrows. The number in the name of each ORF indicates its size in amino acids. The ORF names were assigned by MIPS.
The order of the EcoRI fragments is shown on the map of Figure 1A. The sequences of the two 5′ EcoRI fragments and 2 kb of the third one have been determined from cosmid pUKG040. The rest of the reported sequence derives from cosmid pEKG100. The additional sequence of 24.6 kb included in the pEKG100 insert has been reported separately (Tzermia et al., 1994).
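As a consistency check on the restriction map, the sizes of the internal EcoRI fragments follow from the differences between consecutive site coordinates. The sketch below is illustrative only: the site positions are those read off Figure 1a (1524, 1585, 3343, 10248 and 13208), and the function name is an assumption introduced here.

```python
def internal_fragment_sizes(site_positions):
    """Return the sizes of the fragments lying between consecutive restriction sites."""
    sites = sorted(site_positions)
    return [right - left for left, right in zip(sites, sites[1:])]


# EcoRI site coordinates as read off Figure 1a (assumed to be transcribed correctly).
ecori_sites = [1524, 1585, 3343, 10248, 13208]
sizes = internal_fragment_sizes(ecori_sites)
print(sizes)
```

The differences reproduce the 61, 1758, 6905 and 2960 bp fragments of the map; the 1523 bp fragment lies upstream of the first site and is not an internal fragment.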
Double-stranded template DNAs were prepared by alkaline lysis followed by Qiagen-tip selection (Qiagen Inc.) or PEG precipitation (Ausubel et al., 1987). They were subsequently sequenced using [35S]dATP and the Sequenase kit (United States Biochemical Corp.) following the supplier's protocols. Sequencing of both strands of the EcoRI fragments, subcloned in both orientations, was performed with the 'universal' or the 'reverse' M13 primers on nested ExoIII-S1 nuclease deletions (Ausubel et al., 1987). Synthetic oligonucleotides corresponding to internal sequences (prepared on an Applied Biosystems synthesizer by the Department of Microchemistry at I.M.B.B.-Crete) were used as primers to fill in the gaps. The junctions between the sequenced EcoRI fragments have been established by PCR sequencing of PCR products, which were synthesized from oligonucleotide primers near the ends of the fragments, using cosmid DNA as template, [32P]ATP-labelled primers and the fmol DNA sequencing kit (Promega Corp.). Samples of sequenced DNAs were analysed on 40 cm long 6% or 4% polyacrylamide gels with single or double loadings.
Sequence analysis software
Endonuclease restriction analysis, 6-phase ORF mapping and hydrophobicity profiles of the sequences were produced with the DNA Strider software (Marck, 1988). Nucleotide and amino acid sequences were compared with the GenBank, EMBL, SWISS-PROT and NBRF libraries using the GCG package software, by us on the I.M.B.B. MicroVAX and by the staff at MIPS.
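The idea behind a 6-phase ORF map can be illustrated with a short scan of the three reading frames on each strand. This is a minimal sketch, not the algorithm used by DNA Strider: it assumes ORFs run from the first ATG in a frame to the next in-frame stop codon, and the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) half-open coordinates of ATG-to-stop ORFs in one frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first initiation codon since the previous stop
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:  # codons counted without the stop
                yield start, i + 3
            start = None


def six_frame_orfs(seq, min_codons=100):
    """Scan both strands; '-' strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    hits = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(template, frame, min_codons):
                hits.append((strand, frame, start, end))
    return hits


# Hypothetical toy sequence: ATG, five GCT codons, then a TAA stop.
toy = "ATG" + "GCT" * 5 + "TAA"
print(six_frame_orfs(toy, min_codons=3))
```

With the default threshold of 100 codons, matching the cut-off used for the ORFs reported here, the toy sequence yields no hits.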
RESULTS AND DISCUSSION
Sequence determination
The reported sequence was determined from overlapping ExoIII-produced deletions and internal oligo-priming to fill in the gaps. An average length of 315 nucleotides was read from each sequencing reaction. Readings up to 400 bases were achieved on 4% polyacrylamide gels. Compressions seen at several specific positions were resolved by repeating the sequencing reactions using dITP (5 different instances). Sequence assembly was performed manually according to restriction maps and the sequences obtained from PCR connecting fragments. Sequence alignments of both strands were done using the GCG program. The final sequence contained an additional 61-base EcoRI fragment following the first (5′) EcoRI site, which had not been detected in the original gel electrophoretic analysis of the cosmid DNA (Figure 1a).
[Figure 2. Nucleotide sequence of the 13 213 base pair segment, with the EcoRI sites and the Sau3AI start of the pEKG100 insert marked.]
[Figure 3: bar diagram relating chromosome XI (left end) to chromosome III (right end, two regions) and chromosome V (right end).]
Figure 3. Diagram of the homologies between different chromosomes. Shaded bars indicate the chromosomes. Numbers at the bottom of each bar indicate the base coordinates as given in the BlastA analysis. Numbers on top of each bar indicate the percentage of base pair identities.
Sequence analysis
Six-phase ORF map analysis of the 13.2 kb fragment revealed five ORFs > 100 codons (Figure 1b). Their sizes range from 110 to 711 codons and they constitute 48.2% of the entire sequence (6366 bases). This percentage is low compared to the average coding content of the chromosome, probably because this fragment lies near the end of the chromosome (Oliver et al., 1992).
The complete sequence of the 13 213 bases is given in Figure 2. FastA (Pearson and Lipman, 1988) and BlastA (Altschul et al., 1990) analyses of the sequenced segment revealed extensive homologies to known sequences on different yeast chromosomes (Figure 3). More specifically: (a) a region of 1182 bases showed an overall identity of 90.7% (ranging from 75% to 100%) to the right arm of chromosome III; (b) 159 bases of that region showed 77% identity to a second region of chromosome III; (c) 638 bases were found to be 77% identical to the right arm of chromosome V; and (d) homology has been detected to the right arm of chromosome II (Becker, personal communication). These homologies evidently reflect recombination events between fragments near the ends of the chromosomes concerned, as has been previously reported (Oliver et al., 1992).
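Identity percentages of the kind quoted above are computed over aligned positions. The following sketch shows one common convention, in which columns containing a gap are skipped; FastA and BLAST may count gaps differently, so this is an assumption rather than a description of either program. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical 10-base alignment with a single mismatch.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```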
Analysis of the putative ORF products
The putative translation products of the identified ORFs have been compared to protein databases using FastA (Table 1). For better evaluation of the significance of each obtained score, we have also included the highest FastA score, obtained by the comparison of each ORF to itself. Optimum scores higher than 200 have been considered as significant. Lower scores due to homologies in restricted areas of the protein sequences indicated conservation of specific domains. Protein patterns (motifs) have been identified by the ProSite program (Bairoch, 1991) of the GCG package. Our findings on each individual ORF are discussed below.

Table 1. Best optimized FastA scores obtained by the comparison of the putative translation product of each ORF with the protein databases
ORF | Homologous or identical protein | Optimized score | Highest score | Reference
D123 | S. cerevisiae hypothetical protein YCR104W (124 aa); 98.3% identity in 120 aa | 544 | 554 | van der Linden et al., 1992; EMBL: X59720
D123 | S. cerevisiae SYGP-ORF12 (120 aa); 86.2% identity in 123 aa | 485 | 554 | Mulligan et al., 1993, unpublished; EMBL: L10830
A110 | S. cerevisiae hypothetical protein YCR103C (111 aa); 79.3% identity in 111 aa | 496 | 655 | van der Linden et al., 1992; EMBL: X59720
F705 | S. cerevisiae CYP1 (HAP1) regulatory protein (1483 aa); 28% identity in 130 aa | 155 | 3624 | Verdiere, 1988; EMBL: X13793
B473 | Chinese hamster (MEV) mevalonate transporter (494 aa); 28.1% identity in 153 aa | 182 | 2515 | Kim et al., 1992; EMBL: S48888
F711 | S. cerevisiae (FRE1) ferric reductase (686 aa); 24.5% identity in 693 aa | 542 | 3723 | Dancis et al., 1992; EMBL: M86908
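One way to read the two score columns of Table 1 together is to express each optimized score as a fraction of the corresponding self-comparison score and to flag it against the threshold of 200. The sketch below is illustrative: the scores are those of Table 1, while the ratio is a normalization introduced here for illustration and not a statistic used in the analysis itself.

```python
# (ORF, matched protein, optimized score, self-comparison score) from Table 1.
TABLE_1 = [
    ("D123", "YCR104W", 544, 554),
    ("D123", "SYGP-ORF12", 485, 554),
    ("A110", "YCR103C", 496, 655),
    ("F705", "CYP1 (HAP1)", 155, 3624),
    ("B473", "MEV", 182, 2515),
    ("F711", "FRE1", 542, 3723),
]
SIGNIFICANCE_THRESHOLD = 200  # optimized scores above this are treated as significant


def summarize(rows, threshold=SIGNIFICANCE_THRESHOLD):
    """Return (ORF, match, score / self-score, significant?) for each row."""
    return [
        (orf, match, round(optimized / self_score, 3), optimized > threshold)
        for orf, match, optimized, self_score in rows
    ]


for row in summarize(TABLE_1):
    print(row)
```

The flags separate the near-identical or clearly related pairs (D123, A110, F711) from the matches to F705 and B473, which rest on similarity confined to a restricted region.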
ORF D123 is included in the region which is closely related to the right arms of yeast chromosomes III and V. It is almost identical to YCR104W, a hypothetical protein in the HMR 3′ region on chromosome III, and very similar to the SYGP-ORF12 encoded by a gene contained in a region of 36772 base pairs of chromosome V between the known genes MAK10, AFG18 on its 5′ side and CYC7 on its 3′ side (Mulligan et al., unpublished; Mortimer et al., 1989) (Figure 4a). The function of both of these hypothetical proteins remains unknown. As has already been reported for YCR104W (Bork et al., 1992), D123 also showed similarities to the yeast temperature-shock inducible protein TIP1 (Kondo and Inouye, 1991) (27.3% identity in 99 overlapping amino acids, FastA score: 144) and to the yeast serine-rich, glucose-induced protein SRP1 (Marguet et al., 1988) (26.3% identity in 99 overlapping amino acids, FastA score: 121), the function of which is similarly unknown. All of these proteins, including D123, start with a putative hydrophobic signal sequence, which is followed by a conserved domain of about 90 residues including the stress-induced protein motif (P-W-Y-[ST](2)-R-L). This domain is followed, in the SRP1 (total length of 254 amino acids) and TIP1 (total length of 210 amino acids) proteins only, by a repetitive serine- and alanine-rich region (Figure 4b). According to Kondo and Inouye (1991) and Marguet et al. (1988), there is a family of several genes on different chromosomes which cross-hybridize to both TIP1 and SRP1 sequences, but some of these may not be highly expressed genes, since only three distinct transcripts have been detected. We have not examined the expression of the gene encoding the D123 protein. However, its proximity to the telomere could indicate that it is expressed at low levels or under specific conditions (Sandell and Zakian, 1992). In fact, we have noticed, by FastA analyses, that the TIP1 and SRP1 proteins are more similar to SYGP-ORF12 (30.6% identity in 108 overlapping amino acids and 29.3% identity in 99 overlapping amino acids, respectively) than they are to the D123 and YCR104W ORFs.
[Figure 4. Panel A: alignment of D123, YCR104W and SYGP-ORF12. Panel B: alignment of D123, YCR104W, SYGP-ORF12, TIP1 and SRP1.]
Figure 4. (a) Multiple alignment of the sequences of the D123 ORF, the YCR104W hypothetical protein and SYGP-ORF12 using the CLUSTAL program. (b) Multiple alignment of the sequences of the D123 ORF, YCR104W, SYGP-ORF12, TIP1 and SRP1 proteins. The stress-induced protein motif is shadowed. Asterisks indicate residue identities and dots indicate conservative substitutions.

A110     MEMLLFLNESYIFHRLRMWSTVLWHSCVFVCVECENANYRVPR-CLIKPF-SVPVTFPFS
YCR103C  MEMLLFLNESYIFHRFRMWSIVLWHSCVFVCAECGNANYRGAG-VPCKTLLRAPVKFPLS
DYC101   MEMLLFLNESYIFHRFRMWSIVLWHSCVFVCAECGNAYYRGAGGCRGAGGCLEKPF-CAPVKFPFS

A110     VKKNIRILDLDPRTEAYCLSPYSVCSKRLPCKKYFYLLNSYNIKRVLGVVYC
YCR103C  VKKNIRILDLDPRSEAYCLSLNSVCFKRLPFNKYFHLLNSYNIKRVLGVVYC
DYC101   VKKNIRILDLDPRSEAYCLSHHLVCPKRFPCKATSLLL----------IPEG
Figure 5. Multiple alignment of the sequences of the A110 ORF, YCR103C and DYC101 hypothetical proteins using the CLUSTAL program.
ORF A110 is also contained in the region of homology between chromosomes XI and III. It is similar to the hypothetical protein YCR103C. It contains several non-conservative amino acid substitutions, which may imply that the two products serve different functions. A third ORF of 101 codons (DYC101, sequence communicated by H. Domdey), identified on chromosome II, is also homologous to A110, exhibiting 76.3% identity in 97 overlapping amino acids (FastA score: 442). That ORF resembles YCR103C as much as A110 does (79.4% identity in 97 overlapping amino acids). A multiple alignment of the three sequences revealed more residue substitutions between A110 and the other two proteins, except for the last 14 amino acids at the carboxy terminus, which are identical in the A110 and YCR103C ORFs and absent from the chromosome II ORF (Figure 5).
The FastA alignment of the F705 ORF product showed homology to the yeast protein CYP1 (HAP1) only in 130 overlapping residues of its amino terminus (residues 7 to 136 in F705 and 49 to 170 in CYP1). In fact, the alignment varied depending on the program used, except for a stretch of 36 residues, 30 of which make up the fungal Zn(2)-Cys(6) binuclear cluster domain. F705 was also regionally similar to several proteins that contain the same motif (FastA scores: 100-155). All of these proteins are transcriptional activators (GAL4, MAL63, LEU3, etc.) that bind DNA with their cysteine-rich amino terminus in a zinc-dependent fashion (Coleman, 1992). The identified motif in F705 starts at residue 23 (SCHFCRVRKLKCDRVRPFCGSCSSRNRKQC) and follows the consensus sequence: [GAS]-C-x(2)-C-[RKH]-x(2)-[RK]-x-[RK]-C-x(5,9)-C-x(2)-C-x(6,8)-C. F705 also contains, near its carboxy terminus, a second, rare motif characterizing membrane proteins involved in sugar transport, starting at residue 629 (MEKIGRRAFNKG) and following the consensus sequence: [LIVMST]-[DE]-x-[LIVMFA]-G-R-[RK]-x(4,6)-G. However, its significance is questionable, since the hydrophobicity profile of the F705 protein does not reveal the typical transmembrane domains (data not shown), unless it is a novel type of protein that can mediate both interaction with a sugar and transcriptional regulation.
The FastA search for the ORF B473 product revealed low similarities to a number of membrane proteins. The most significant similarity was between its amino terminus and a mammalian membrane protein, the putative transporter of mevalonate (Figure 6a). This similarity does not necessarily indicate a directly homologous molecule but some sort of membrane transporter. In fact, the hydrophobicity profile of ORF B473 showed 12 membrane-spanning stretches typical of such proteins (Figure 6b). The profile also included one central and one carboxy-terminal hydrophilic region, which followed transmembrane segments 6 and 12 respectively, similarly to the profile of the MEV protein and other membrane transporters (Culham et al., 1993). The main difference was at the amino terminus (~30 residues) of ORF B473, which appeared hydrophilic. A transcript corresponding to ORF B473 was not detected in extracts of cells grown in standard YPD growth medium (Guthrie and Fink, 1991) (data not shown).
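The two consensus patterns quoted for F705 are written in PROSITE-style notation, which maps directly onto regular expressions. The sketch below translates such a pattern and checks it against the two F705 peptides given in the text. It is a hedged illustration: the translator handles only the subset of the notation used here, the function name is hypothetical, and the lower bound of 5 in the x(5,9) element should be treated as an assumption.

```python
import re

# One pattern element: a residue class, a single residue or 'x',
# optionally followed by a repeat count such as (2) or (6,8).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset) into a regular expression string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue in ("x", "X"):
            residue = "."  # 'x' stands for any residue
        if low is None:
            parts.append(residue)
        elif high is None:
            parts.append(f"{residue}{{{low}}}")
        else:
            parts.append(f"{residue}{{{low},{high}}}")
    return "".join(parts)


ZN_CLUSTER = "[GAS]-C-x(2)-C-[RKH]-x(2)-[RK]-x-[RK]-C-x(5,9)-C-x(2)-C-x(6,8)-C"
SUGAR_TRANSPORT = "[LIVMST]-[DE]-x-[LIVMFA]-G-R-[RK]-x(4,6)-G"

zn_peptide = "SCHFCRVRKLKCDRVRPFCGSCSSRNRKQC"  # F705, starting at residue 23
sugar_peptide = "MEKIGRRAFNKG"                # F705, starting at residue 629

print(bool(re.fullmatch(prosite_to_regex(ZN_CLUSTER), zn_peptide)))
print(bool(re.fullmatch(prosite_to_regex(SUGAR_TRANSPORT), sugar_peptide)))
```

Both peptides satisfy their patterns over their full length, which agrees with the 30-residue extent stated for the binuclear cluster domain.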
[Figure 6. Panel A: alignment of the amino-terminal regions of B473 and MEV. Panel B: hydrophobicity profiles, vertical axis from -4 to 4, horizontal axis in residues (100 to 400).]
Figure 6. (a) Alignment of the 180 amino-terminal residues of ORF B473 with the 156 amino-terminal residues of the mammalian membrane transporter MEV. (b) Hydrophobicity profiles (Kyte and Doolittle, 1982) of ORF B473 and MEV protein.
This may indicate gene expression under specific conditions or gene repression related to its position near the telomere.
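Hydrophobicity profiles of the Kyte and Doolittle (1982) type, such as those of Figure 6b, are sliding-window averages of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window length of 11 and the toy peptide are assumptions made for illustration, and this is not the DNA Strider implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=11):
    """Return the mean hydropathy of every full window along the protein."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    values = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
toy_peptide = "DEKRDEKRDE" + "LIVLIVLIVLIVLIV" + "KRDEKRDEKR"
profile = hydropathy_profile(toy_peptide)
peak = profile.index(max(profile))
print(peak, round(max(profile), 2), round(min(profile), 2))
```

Stretches whose window averages stay well above zero over roughly 20 residues are the usual candidates for membrane-spanning segments, while a persistently negative region, like the amino terminus described for B473, reads as hydrophilic.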
The product of ORF F711 showed a significant similarity to the known yeast FRE1 ferric reductase. We have proven by biochemical, genetic and structural analyses that it is also a membrane protein that can reduce environmental ferric iron to its intracellular ferrous form. The expression of this non-essential gene is down-regulated by the presence of iron in the growth medium and its RNA is not detectable under standard YPD medium growth conditions (Georgatsou and Alexandraki, submitted for publication).
ACKNOWLEDGEMENTS
We thank Bernard Dujon for the excellent coordination of the Chromosome XI sequencing project. We thank Irmi Becker and all MIPS staff for help with the sequence analysis. We thank Horst Domdey for allowing us to use the unpublished DYC101 hypothetical protein from chromosome II. We also thank Yannis Papanikolaou for help with the computer analyses at IMBB, Georgia Houlaki for help with the preparation of the figures and the co-contractor George Thireos for helpful discussions. This work was supported by the Commission of the European Communities under the BRIDGE program of the Division of Biotechnology and by the Greek Ministry of Industry, Energy and Technology.
REFERENCES
Altschul, S. F., Gish, W., Miller, W., Myers, E. and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. and Struhl, K. (Eds) (1987). Current Protocols in Molecular Biology. Green Publishing Associates and Wiley-Interscience.
Bairoch, A. (1991). A dictionary of sites and patterns in proteins. Nucl. Acids Res. 19, 2241-2245.
Bork, P., Ouzounis, C., Sander, C., Scharf, M., Schneider, R. and Sonnhammer, E. (1992). Comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III. Protein Sci. 1, 1677-1690.
Coleman, J. E. (1992). Zinc proteins: enzymes, storage proteins, transcription factors and replication proteins. Annu. Rev. Biochem. 61, 897-946.
Culham, D. E., Lasby, B., Marangoni, A. G., Milner, J. L., Steer, B. A., van Nues, R. W. and Wood, J. M. (1993). Isolation and sequencing of Escherichia coli gene proP reveals unusual structural features of the osmoregulatory proline/betaine transporter, ProP. J. Mol. Biol. 229, 268-276.
Dancis, A., Roman, D. J., Anderson, G. J., Hinnebusch, A. G. and Klausner, R. D. (1992). Ferric reductase of Saccharomyces cerevisiae: Molecular characterization, role in iron uptake and transcriptional control by iron. Proc. Natl. Acad. Sci. U.S.A. 89, 3869-3873.
Georgatsou, E. and Alexandraki, D. Two distinctly regulated genes are required for ferric reduction, the first step of iron uptake in Saccharomyces cerevisiae. (Submitted for publication).
Guthrie, C. and Fink, G. R. (Eds) (1991). Guide to yeast genetics and molecular biology. Methods in Enzymology, vol. 194. Academic Press, New York.
Higgins, D. G. and Sharp, P. M. (1988). Clustal: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237-244.
Kim, C. M., Goldstein, J. L. and Brown, M. S. (1992). cDNA cloning of MEV, a mutant protein that facilitates cellular uptake of mevalonate and identification of the point mutation responsible for its gain of function. J. Biol. Chem. 267, 23113-23121.
Kondo, K. and Inouye, M. (1991). TIP1, a cold shock inducible gene of Saccharomyces cerevisiae. J. Biol. Chem. 266, 17537-17544.
Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydrophobic character of a protein. J. Mol. Biol. 157, 105-132.
Marck, C. (1988). 'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. Acids Res. 16, 1829-1836.
Marguet, D., Guo, X. J. and Lauquin, G. J.-M. (1988). Yeast gene SRP1 (serine rich protein): Intragenic repeat structure and identification of a family of SRP1 related DNA sequences. J. Mol. Biol. 202, 455-470.
Mortimer, R. K., Schild, D., Contopoulou, C. R. and Kans, J. A. (1989). Genetic map of Saccharomyces cerevisiae, Edition 10. Yeast 5, 321-403.
Oliver, S. G. et al. (1992). The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.
Pearson, W. R. and Lipman, D. J. (1988). Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 85, 2444-2448.
Sandell, L. L. and Zakian, V. A. (1992). Telomeric position effect in yeast. Trends Cell Biol. 2, 10-14.
Tzermia, M., Horaitis, O. and Alexandraki, D. (1994). The complete sequencing of a 24.6 kb segment of yeast chromosome XI identified the known loci URA1, SAC1 and TRP3, and revealed 6 new open reading frames including homologues to the threonine dehydratases, membrane transporters, hydantoinases and the phospholipase A2-activating protein. Yeast 10.