ID   X03006; SV 1; linear; mRNA; STD; MAM; 620 BP.
XX
AC   X03006;
XX
SV   X03006.1
XX
DT   28-JAN-1986 (Rel. 08, Created)
DT   12-SEP-1993 (Rel. 36, Last updated, Version 2)
XX
DE   Bovine mRNA for lens beta-s-crystallin
XX
KW   beta-crystallin; beta-gamma-crystallin; crystallin.
XX
OS   Bos taurus (cow)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae;
OC   Bovinae; Bos.
XX
RN   [1]
RP   1-620
RX   PUBMED; 4054100.
RA   Quax-Jeuken Y.E.F.M., Driessen H., Leunissen J., Quax W.J., de Jong W.,
RA   Bloemendal H.;
RT   "Beta-s-crystallin: structure and evolution of a distinct member of the
RT   beta-gamma-superfamily";
RL   EMBO J. 4(10):2597-2602(1985).
XX
CC   Data kindly reviewed (06-MAR-1986) by Y. Quax-Jeuken
XX
...
EMBL
Index
[Diagram labels: flatfile, parser, index]
Retrieve
[Diagram labels: index, parser, display, entries]
SRS
Sequence Retrieval System: an indexing and retrieval system for flat file databases
http://srs.bioinformatics.nl
http://srs.ebi.ac.uk
Q: Which sequences in EMBL [do not] encode a protein for which the 3D structure is known?
Command line SRS
Using getz
Retrieve the UniProt entry for the protein with accession number P19558:
getz "[uniprot-acc:P19558]" -e
Count the human proteins in the UniProt database:
getz "[uniprot-org:human]" -c
Print sequence of the rice proteins in the UniProt database that have a length between 10 and 50 aa:
getz "[uniprot-org:rice]&[uniprot-sl#10:50]" -f sl
Give the id and description for all Arabidopsis thaliana proteins that have at least 8 transmembrane domains:
getz '[swissprot-org:arabidopsis thaliana]< ([swissprot-CountedItem:transmem]&[swissprot-CountedN#8:])' -f "id des"
Count the human protein sequences in the NCBI RefSeq database:
getz "[refseqp-org:human]" -c
Count the human mRNA sequences in the NCBI RefSeq database:
getz "[refseq-org:human]&[refseq-mol:mrna]" -c
Retrieve the mRNA sequences for all human proteins in the NCBI RefSeq database in fasta format:
getz "[refseqp-org:human]>[refseq-mol:mrna]" -d -sf fasta
MRS: A fast and compact retrieval system for biological data. Hekkelman M.L., Vriend G.
http://mrs.cmbi.ru.nl/
European Molecular Biology Open Software Suite
EMBOSS
"European Molecular Biology Open Software Suite"
http://emboss.sourceforge.net/
Toolbox with bioinformatics applications
http://emboss.bioinformatics.nl/
http://main.g2.bx.psu.edu/
command line / shell
Useful EMBOSS commands
command      description
showdb       Displays information on the currently available databases
wossname     Finds programs by keywords in their one-line documentation
tfm          Reads the manual entries for each program in EMBOSS
seealso      Finds programs related to a given program
seqret       Reads and writes (returns) sequences
entret       Reads and writes (returns) flatfile entries
extractfeat  Extracts features from a sequence
extractseq   Extracts regions from a sequence
transeq      Translates nucleic acid sequences
Get help from EMBOSS itself
# showdb
Shows the currently available databases
# tfm wossname
How to use an EMBOSS command? Just (r)tfm it
# wossname alignment
Which commands can handle alignments?
# seealso seqret
Are there any other commands that can do something similar?
Command line options
• All EMBOSS programs react to a number of command line options. The most important ones are:
-help            Get help
-help -verbose   Get elaborate help
-auto            “No questions asked”
-stdout          Write to standard output
-filter          Read stdin, write stdout
SEQRET parameters
zonnebloem> seqret -help
   Standard (Mandatory) qualifiers:
  [-sequence]   seqall      (Gapped) sequence(s) filename and optional format, or reference (input USA)
  [-outseq]     seqoutall   [<sequence>.<format>] Sequence set(s) filename and optional format (output USA)
   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -feature     boolean     Use feature information
   -firstonly   boolean     Read one sequence and stop
   General qualifiers:
   -help        boolean     Report command line options. More information on associated and general qualifiers can be found with -help -verbose
SEQRET parameters
zonnebloem> seqret -help -verbose
   Standard (Mandatory) qualifiers:
  [-sequence]     seqall      (Gapped) sequence(s) filename and optional format, or reference (input USA)
  [-outseq]       seqoutall   [<sequence>.<format>] Sequence set(s) filename and optional format (output USA)
   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -feature       boolean     Use feature information
   -firstonly     boolean     Read one sequence and stop
   Associated qualifiers:
   "-sequence" associated qualifiers
   -sbegin1       integer     Start of each sequence to be used
   -send1         integer     End of each sequence to be used
   -sreverse1     boolean     Reverse (if DNA)
   -sask1         boolean     Ask for begin/end/reverse
   -snucleotide1  boolean     Sequence is nucleotide
   -sprotein1     boolean     Sequence is protein
   -slower1       boolean     Make lower case
   -supper1       boolean     Make upper case
   -sformat1      string      Input sequence format
   -sdbname1      string      Database name
   -sid1          string      Entryname
   -ufo1          string      UFO features
   -fformat1      string      Features format
   -fopenfile1    string      Features file name
   "-outseq" associated qualifiers
   -osformat2     string      Output seq format
   -osextension2  string      File name extension
   -osname2       string      Base file name
   -osdirectory2  string      Output directory
   -osdbname2     string      Database name to add
   -ossingle2     boolean     Separate file for each entry
   -oufo2         string      UFO features
   -offormat2     string      Features format
   -ofname2       string      Features file name
   -ofdirectory2  string      Output directory
   General qualifiers:
   -auto          boolean     Turn off prompts
   -stdout        boolean     Write standard output
   -filter        boolean     Read standard input, write standard output
   -options       boolean     Prompt for standard and additional values
   -debug         boolean     Write debug output to program.dbg
   -verbose       boolean     Report some/full command line options
   -help          boolean     Report command line options. More information on associated and general qualifiers can be found with -help -verbose
   -warning       boolean     Report warnings
   -error         boolean     Report errors
   -fatal         boolean     Report fatal errors
   -die           boolean     Report dying program messages
Universal Sequence Address
Type  /  Example  /  Description
filename  /  xxx.seq  /  A sequence file "xxx.seq" in any format
format::filename  /  fasta::xxx.seq  /  A sequence file "xxx.seq" in fasta format
db:IDname  /  embl:paamir  /  EMBL entry PAAMIR, using whatever access method is defined locally for the EMBL database
db:AccessionNumber  /  embl:X13776  /  EMBL entry X13776, using whatever access method is defined locally for the EMBL database and searching by accession number and entry name (X13776 is the accession number in this case)
db-acc:AccessionNumber  /  embl-acc:X13776  /  EMBL entry X13776, using whatever access method is defined locally for the EMBL database and searching by accession number only
db-id:IDname  /  embl-id:paamir  /  EMBL entry PAAMIR, using whatever access method is defined locally for the EMBL database, and searching by ID only
db-searchfield:word  /  embl-des:lectin  /  EMBL entries containing the word 'lectin' in the Description line
db-searchfield:wildcard-word  /  embl-org:*human*  /  EMBL entries containing the wildcarded word 'human' in the Organism fields
db:wildcard-ID  /  embl:paami*  /  EMBL entries PAAMIB, PAAMIE and so on, usually in alphabetical order, using whatever access method is defined locally for the EMBL database
db or db:*  /  embl or EMBL:*  /  All sequences in the EMBL database
@listfile  /  @mylist  /  Reads file mylist and uses each line as a separate USA. List files can contain references to other list files or any other standard USA.
list:listfile  /  list:mylist  /  Same as "@mylist" above
'program parameters |'  /  'getz -e [embl-id:paamir] |'  /  The pipe character "|" causes EMBOSS to fire up getz (the SRS sequence retrieval program) to extract entry PAAMIR from EMBL in EMBL format. Any application or script which writes one or more sequences to stdout can be used in this way.
asis::sequence  /  asis::atacgcagttatctgaccat  /  So far the shortest USA we could invent. In 'asis' format the name is the sequence so no file needs to be opened. This is a special case. It was intended as a joke, but could be quite useful for generating command lines.
Each of the above can have '[start : end]' or '[start : end : r]' appended to them. The 'file' and 'dbname' forms of USA can have 'format::' in front of them (although a database knows which format it is and so this is redundant and error-prone)
Walk through exercise
For a protein with UniProt Accession number:
Q5ZKN6
find the nucleotide sequence that encodes this (repeated) amino acid fragment:
VAEEVAEE
Getting the sequence
seqret -auto uniprot:Q5ZKN6 -stdout
>Q5ZKN6_CHICK Q5ZKN6 SubName: Full=Putative uncharacterized protein;
MADNLPSEFDVVVIGTGLPESIIAAACARSGQRVLHVDSRNYYGGNWASFSFSGLLSWIKENQQNTDIKDECEDWRKLILENEEVISLNKKDKTIQHVEAFCFDDQDAAEDVEEAGALARLPAYGASVAEEVAEEPEKECSPLESAVPGAENLESEKATSVDPASAAEGNVTEINAESESSHDSASGESTLESGKTEAALSEISAQEPKKITYSQIVREGRRFNIDLVSKLLYSRGLLIELLIKSNVSRYAEFKNATRILAFREGKVEQVPCSRADVFNSRQLAMVEKRMLMKFLTFCLEYEQHPDEYQDYKNSTFAQFLKTRKLTPSLQHFILHSIAMVSEKDCNTLEGLQATRKFLQCLGRYGNTPFLFPLYGQGEIPQCFCRMCAVFGGIYCLRHSVQCLVVDKESGRCKAVVDHFGQRISANYFIVEDSYLSESVCENVCYRQLSRAVLITDQSVLKTDSEQQVSILMVPPVDLGQPAVCVIELCSSTMTCMKDTYLVHLTCPSTKTAREDLEPVVQKLFSLNAETEKETEDEVLEKPRVLWALYFNMRDSSGIDRNSYSGLPSNVYVCSGPDSALGNDCAVKQAETIFQEMFPTEEFCPAPPNPEDIIYDEDEIASEETGFNNSPETKPESSLQESSSRGSSTAVKEHIEE
Run a program within Perl: 3 ways
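# 1. Backticks: capture the program's output in a variable
$seq = `seqret -auto uniprot:Q5ZKN6 stdout`;
# 2. system(): run the program; its output is not captured
system("seqret -auto uniprot:Q5ZKN6 stdout");
# 3. open() with a trailing pipe: read the output line by line
open SEQRET, "seqret -auto uniprot:Q5ZKN6 stdout|";
while (my $line = <SEQRET>) {
    if ($line !~ /^>/) {    # skip the FASTA header line
        chomp($line);
        $seq .= $line;
    }
}
close SEQRET;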
my $lsOutput = `ls -l`;
Put shell commands or programs in backticks to run them from Perl. The output can be stored in a variable.
open LS,"ls -l|";
The open function can run a program and read its output. The pipe symbol "|" links the output to a filehandle.
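The three approaches differ mainly in what comes back to the script. The sketch below illustrates this with a harmless stand-in command (a `printf` call that prints a made-up FASTA record); the command string and the variable names are assumptions for illustration, not part of the original slides, and a Unix-like shell is assumed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EMBOSS call: prints a small made-up FASTA record.
my $cmd = 'printf ">example header\nMADN\nLPSE\n"';

# 1. Backticks return everything the command printed.
my $captured = `$cmd`;

# 2. system() does not capture output; it returns the exit status (0 on success).
my $status = system("$cmd > /dev/null");

# 3. A piped open lets the script process the output line by line.
my $seq = "";
open(my $fh, "-|", $cmd) or die "Cannot run command: $!";
while (my $line = <$fh>) {
    next if $line =~ /^>/;    # skip the header line
    chomp $line;
    $seq .= $line;
}
close($fh);

print "Captured ", length($captured), " characters; exit status $status; sequence $seq\n";
```

The lexical filehandle and three-argument `open` used here are a common modern Perl style; the bareword form on the slides behaves the same way for this purpose.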
Find the fragment’s position
my $seq = "";
open SEQRET, "seqret -auto uniprot:Q5ZKN6 stdout|";
while (my $line = <SEQRET>) {
    if ($line !~ /^>/) {
        chomp($line);
        $seq .= $line;
    }
}
close SEQRET;
# look for location of the repeat
my $position = index($seq, "VAEEVAEE") + 1;
# print the offset
print "Position = ", $position, "\n";
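Perl's `index` returns -1 when the substring is absent, so adding 1 unconditionally would report position 0 for a missing fragment. A minimal sketch of a guarded lookup follows; the helper name `find_fragment` is hypothetical, and the test string is just the first 140 residues of the Q5ZKN6 sequence shown earlier, so no EMBOSS call is needed to try it.

```perl
use strict;
use warnings;

# Return the 1-based position of $fragment in $seq, or undef if it is absent.
sub find_fragment {
    my ($seq, $fragment) = @_;
    my $offset = index($seq, $fragment);    # 0-based; -1 means "not found"
    return undef if $offset < 0;
    return $offset + 1;
}

# First 140 residues of Q5ZKN6, copied from the seqret output.
my $excerpt = "MADNLPSEFDVVVIGTGLPESIIAAACARSGQRVLHVDSRNYYGGNWASFSFSGLLSWIKENQQNTDIKD"
            . "ECEDWRKLILENEEVISLNKKDKTIQHVEAFCFDDQDAAEDVEEAGALARLPAYGASVAEEVAEEPEKEC";

my $position = find_fragment($excerpt, "VAEEVAEE");
print defined $position ? "Position = $position\n" : "Fragment not found\n";
```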
!~
Opposite of "=~": gives true if the search found no hits.
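As a small illustration, the sketch below applies both operators to a few made-up FASTA lines (the header text and the array name are assumptions): `=~` picks out the header, while `!~` keeps only the sequence lines.

```perl
use strict;
use warnings;

# Made-up FASTA record: one header line followed by two sequence lines.
my @fasta = (">example header", "MADNLPSEFD", "VVVIGTGLPE");

my $headers = 0;
my $seq     = "";
for my $line (@fasta) {
    $headers++    if $line =~ /^>/;    # true when the line starts with ">"
    $seq .= $line if $line !~ /^>/;    # true when it does not
}

print "Headers: $headers, sequence: $seq\n";
```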
Get a cross-reference to EMBL
entret uniprot:Q5ZKN6 -auto stdout |grep "DR "
Get the feature table of this protein entry
Understand the cross-reference
DR EMBL; AJ720048; CAG31707.1; -; mRNA.
Read the detailed documentation of UniProt cross references:
http://www.expasy.org/sprot/userman.html#DR_line
DR: Database cross reference
EMBL: Link to EMBL
AJ720048: EMBL accession number
CAG31707.1: Protein ID
-: Status identifier
mRNA: Molecule Type
The corresponding cross reference in EMBL
Get a cross-reference to EMBL
entret uniprot:Q5ZKN6 -auto stdout | grep "DR " |grep "EMBL;"
In Perl, use a regular expression to locate the EMBL reference line, and extract the EMBL accession number and the protein-ID
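One possible regular expression for this step is sketched below. The helper name `parse_embl_xref` is hypothetical, and the pattern assumes the DR line layout shown above (database name, accession number and protein ID separated by semicolons); other layouts may need a different pattern. The second sample line is made up to show a non-matching case.

```perl
use strict;
use warnings;

# Return (accession, protein ID) from a UniProt "DR   EMBL; ..." line,
# or an empty list if the line is not an EMBL cross-reference.
sub parse_embl_xref {
    my ($line) = @_;
    if ($line =~ /^DR\s+EMBL;\s*([^;\s]+);\s*([^;\s]+);/) {
        return ($1, $2);
    }
    return;
}

my @lines = (
    "DR   EMBL; AJ720048; CAG31707.1; -; mRNA.",    # line from the walkthrough
    "DE   Made-up description line",                # illustrative non-matching line
);

for my $line (@lines) {
    my ($accession, $protein_id) = parse_embl_xref($line);
    next unless defined $accession;
    print "EMBL accession: $accession, protein ID: $protein_id\n";
}
```

In a pipeline, the lines would come from the `entret` output instead of a hard-coded list.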
Link protein to coding DNA
extractfeat embl:AJ720048 -value CAG31707.1 stdout
Returns the DNA coding for protein CAG31707.1 (=Q5ZKN6)
Figure out the offset in DNA
Offset in amino acid sequence: 128
Offset in corresponding nucleotide sequence: ((128-1) x 3) + 1 OR (128 x 3) - 2 = 382
Position is from 382 to (382 + 8x3) = 406
Figure out the position of its corresponding coding DNA sequence (is there anything wrong here?)
Extract the DNA sequence
extractfeat embl:AJ720048 -value CAG31707.1 stdout | extractseq -filter -reg "382-406"
Now we have the DNA sequence corresponding to the protein fragment
It should be: “gttgctgaggaggttgctgaagaac”
But is that correct? Let's translate it for verification…
Verify the result
extractfeat embl:AJ720048 -value CAG31707.1 stdout | extractseq -filter -reg "382-406" | transeq -filter
The result is “VAEEVAEEX”, not “VAEEVAEE”
What’s wrong here?
Always try to verify your results: computers make very few errors, but that is not true for people...
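One way to investigate is to redo the arithmetic and the translation in a few lines of Perl. The sketch below is not part of EMBOSS: `codon_range` and `translate` are hypothetical helpers, the codon table is the standard genetic code, and an incomplete trailing codon is reported as X, which is assumed here to match the way transeq produced the extra X above. An inclusive range of 8 codons spans 24 nucleotides, so it ends at start + 24 - 1.

```perl
use strict;
use warnings;

# Standard genetic code; codons ordered with T, C, A, G at each position.
my $AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my %BASE  = (t => 0, c => 1, a => 2, g => 3);

# 1-based, inclusive nucleotide range coding for $len residues from residue $pos.
sub codon_range {
    my ($pos, $len) = @_;
    my $start = ($pos - 1) * 3 + 1;
    my $end   = $start + $len * 3 - 1;
    return ($start, $end);
}

# Translate in frame 1; incomplete or ambiguous codons become X.
sub translate {
    my ($dna) = @_;
    my $protein = "";
    for (my $i = 0; $i < length($dna); $i += 3) {
        my $codon = lc substr($dna, $i, 3);
        if (length($codon) < 3 || $codon =~ /[^tcag]/) {
            $protein .= "X";
            next;
        }
        my ($b1, $b2, $b3) = map { $BASE{$_} } split //, $codon;
        $protein .= substr($AMINO, 16 * $b1 + 4 * $b2 + $b3, 1);
    }
    return $protein;
}

my ($start, $end) = codon_range(128, 8);
print "Range: $start-$end\n";                          # Range: 382-405

my $dna = "gttgctgaggaggttgctgaagaac";                 # the 25 nt extracted with 382-406
print translate($dna), "\n";                           # VAEEVAEEX
print translate(substr($dna, 0, 24)), "\n";            # VAEEVAEE
```

The 382-406 range contains 25 nucleotides, one more than 8 full codons, which accounts for the trailing X.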
Exercise
Build a pipeline in Perl that performs the steps of the walkthrough above, starting from "Getting the sequence"
Test it with the UniProt protein A0L7N9
Find the fragment at offset 305 that is 8 aa long
Find out the coding DNA of this amino acid fragment and verify it