BioinfRes SoSe 18
Bioinformatics Resources - GenBank
Lecture & Exercises: Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12
Preliminary Schedule

| Date | Topic |
|---|---|
| April 13th | Intro, General Overview (exercise sheet 1) |
| April 20th | Sequence Databases (exercise sheet 2) |
| April 27th | Sequence Databases (exercise sheet 3) |
| May 4th | Structure Databases (exercise sheet 4) |
| May 11th | Lecture cancelled |
| May 18th | SQL (exercise sheet 5) |
| May 25th | SQL, NoSQL (exercise sheet 6) |
| June 1st | Lecture cancelled |
| June 8th | NoSQL 2 (exercise sheet 7) |
| June 15th | MongoDB, JavaScript (exercise sheet 8) |
| June 22nd | Node.js Applications (exercise sheet 9) |
| June 29th | PredictProtein |
| July 6th | Wrap Up, Q&A |
| July 20th | Exam |

* These exercises can earn you a bonus.
National Center for Biotechnology Information, NCBI
http://nihrecord.nih.gov/newsletters/2013/07_19_2013/images/milestonesPic6.jpg
● first ideas in the mid-1980s
● division of the National Library of Medicine (NLM) within the National Institutes of Health (NIH)
● political mission
● founded in 1988
● David Lipman
NCBI's political mission as defined by the bill:
1. design, develop, implement, and manage automated systems for the collection, storage, retrieval, analysis, and dissemination of knowledge concerning human molecular biology, biochemistry, and genetics;
2. perform research into advanced methods of computer-based information processing capable of representing and analyzing the vast number of biologically important molecules and compounds;
3. enable persons engaged in biotechnology research and medical care to use systems developed under paragraph (1) and methods described in paragraph (2); and
4. coordinate, as much as is practicable, efforts to gather biotechnology information on an international basis.
Selected NCBI Accomplishments
● 1990-1997: BLAST, GenBank at NCBI, NCBI website, Genomes, OMIM, PubMed (timeline marks: 1990, 1992, 1994, 1995, 1996, 1997)
● 1999-2008: Human Genome, PubMed Central, Entrez Gene / DTDs, NIH Public Access, Genome Reference Consortium, 1000 Genomes Project (timeline marks: 1999, 2000, 2003, 2005, 2007, 2008)
NCBI Resources
● NCBI currently hosts a vast number of resources: http://www.ncbi.nlm.nih.gov/guide/all/
● grouped according to various criteria:
  - metadata, project-centric
  - method-oriented
  - topic-oriented
● sorted into the sections: databases, downloads, submissions, tools, how-tos
GenBank's Origin
● Walter Goad, Los Alamos National Laboratory
● Los Alamos Sequence Database, 1979
● creation and release of GenBank in 1982
● end of 1982: 2000 sequences
● move to NCBI in 1992
http://www.lanl.gov/science-innovation/features/innovations/images/light/thumbnails/21.jpg
Minutes from the 20th anniversary of GenBank in 2002
"... Among them is a memo on Los Alamos National Laboratory stationery dated May 9, 1980, that reads: Monday, May 12 at 10:30 Steve Simon invites you for cake and coffee to celebrate 100,000 bases now in the DNA sequence library."
taken from https://www.genomeweb.com/genbank-turns-20
Growth of GenBank and WGS
- doubling approx. every 18 months (diagram for release 225, Apr. 2018)
- current version, release 225: 260,189,141,631 bases in GenBank, 2,784,740,996,536 bases in WGS
- taken from http://www.ncbi.nlm.nih.gov/genbank/statistics
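A doubling time of T months corresponds to an annual growth factor of 2^(12/T). The sketch below converts the approximate 18-month doubling time into yearly growth; it assumes perfectly exponential growth, which the real release statistics only approximate, and the function names are purely illustrative.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a database grows per year, given its doubling time in months."""
    return 2 ** (12 / doubling_months)


def years_to_grow(factor: float, doubling_months: float) -> float:
    """Years needed to grow by `factor`, assuming constant exponential growth."""
    return math.log2(factor) * doubling_months / 12


if __name__ == "__main__":
    print(f"{annual_growth_factor(18):.3f}")  # 1.587, i.e. roughly +59% per year
    print(f"{years_to_grow(10, 18):.2f}")     # 4.98 years for a tenfold increase
```

Under this assumption, an 18-month doubling time means the database grows by roughly 59% per year and by a factor of ten in about five years.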
Growth of GenBank and WGS
- current release 225: 208,452,303 sequences in GenBank, 621,379,029 sequences in WGS
- taken from http://www.ncbi.nlm.nih.gov/genbank/statistics, release 225, Apr. 2018
References for GenBank
● one current citation source: "GenBank". Nucleic Acids Res. 2014 Jan;42(Database issue):D32-7. doi:10.1093/nar/gkt1030. Epub 2013 Nov 11. PMID: 24217914
● the most recent: "GenBank". Nucleic Acids Res. 2018 Jan 4;46(D1):D41-D47. Published online 2017 Nov 13. doi:10.1093/nar/gkx1094. PMCID: PMC5753231
References for GenBank
● more general, for NCBI services: "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 2016 Jan 4;44(Database issue):D7-D19. Published online 2015 Nov 28. doi:10.1093/nar/gkv1290
● GenBank is part of the International Nucleotide Sequence Database Collaboration (INSDC), together with the EMBL Nucleotide Sequence Database (EMBL-Bank, part of the European Nucleotide Archive, ENA) and the DNA Data Bank of Japan (DDBJ)
Most Growing Divisions

| Division | Description | Bases, Release 197 (8/2013) | Annual increase (%) |
|---|---|---|---|
| WGS* | Whole-genome shotgun data | 500,420,412,665 | 62.4 |
| TSA* | Transcriptome shotgun data | 8.6333123.935 | 49.9 |
| PHG | Phages | 119,812,712 | 42.5 |
| VRL | Viruses | 1,757,202,472 | 22.9 |
| BCT | Bacteria | 10,281,048,518 | 21.8 |
| ENV | Environmental samples | 3,743,277,434 | 10.9 |
| INV | Invertebrates | 2,737,140,464 | 9.8 |
| PAT | Patented sequences | 13,290,161,247 | 9.7 |
| PLN | Plants | 5,963,882,822 | 8.8 |
| GSS | Genome survey sequences | 23,726,384,753 | 8.1 |
| VRT | Other vertebrates | 3,068,956,026 | 6.3 |
| MAM | Other mammals | 911,342,025 | 5.6 |
| ... | ... | ... | ... |
| TOTAL | All GenBank sequences | 654,613,333,676 | 45.1 |

Figures from Release 219: WGS 2,035,032,639,807 bases; TSA 149,038,907,599 bases.
* not distributed with the release; available from separate project-specific sections of the server
Top Organisms (Rel. 207)

| Organism | Entries | Non-WGS base pairs |
|---|---|---|
| Homo sapiens | 20,921,637 | 17,714,786,437 |
| Mus musculus | 9,727,522 | 9,995,696,539 |
| Rattus norvegicus | 2,193,812 | 6,526,236,496 |
| Bos taurus | 2,227,298 | 5,410,360,312 |
| Zea mays | 4,177,175 | 5,201,714,457 |
| Sus scrofa | 3,297,029 | 4,895,127,638 |
| Danio rerio | 1,727,668 | 3,133,901,682 |
| Triticum aestivum | 1,796,780 | 1,927,718,314 |
| ... | ... | ... |
| Oryza sativa Japonica Group | 1,376,410 | 1,265,556,227 |
| ... | ... | ... |
| Arabidopsis thaliana | 2,578,785 | 1,202,100,008 |
| ... | ... | ... |
Top Organisms (Rel. 219)

| Organism | Entries | Non-WGS base pairs |
|---|---|---|
| Homo sapiens | 24,231,652 | 18,893,466,733 |
| Mus musculus | 9,883,173 | 10,229,286,664 |
| Rattus norvegicus | 2,197,781 | 6,528,984,315 |
| Bos taurus | 2,229,235 | 5,429,379,063 |
| Zea mays | 4,197,803 | 5,227,077,026 |
| Sus scrofa | 3,298,802 | 5,071,347,463 |
| Hordeum vulgare ssp. vulgare | 1,346,798 | 3,235,834,212 |
| Danio rerio | 1,729,033 | 3,190,913,255 |
| Ovis canadensis canadensis | 72 | 2,590,574,434 |
| Triticum aestivum | 1,812,814 | 1,942,831,630 |
| ... | ... | ... |
| Oryza sativa Japonica Group | 1,378,262 | 1,642,328,218 |
| ... | ... | ... |
| Escherichia coli | 118,884 | 1,571,576,668 |
| ... | ... | ... |
Distribution of Sequence Files (Rel. 207)

| Division | Number of files |
|---|---|
| BCT | 178 |
| CON | 317 |
| ENV | 81 |
| EST | 478 |
| HTG | 142 |
| INV | 126 |
| PAT | 219 |
| PLN | 107 |
| TSA | 175 |
| VRL | 34 |

Release 207 consists of 2333 text files in total. Release 225 consists of 3120 text files in total.
Distribution of Sequence Files (Rel. 219)

| Division | Number of files |
|---|---|
| BCT | 350 |
| CON | 359 |
| ENV | 97 |
| EST | 483 |
| HTG | |
| INV | 153 |
| PAT | 290 |
| PHG | 4 |
| PLN | 145 |
| PRI | 56 |
| SYN | 10 |
| TSA | 230 |
| VRL | 48 |

Release 219 consists of 2225 text files in total.
Database Files (Rel. 225)
● GenBank comes as a set of compressed text files available via FTP
● see ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
● 3120 ASCII files (the sequence files listed by division, plus additional list files), ranging from 0.7 to 520 MB
● uncompressed ~885 GB
● each file consists of two portions
Database Files
● Part 1: highly conserved database file headers

    1       10        20        30        40        50        60        70       79
    ---------+---------+---------+---------+---------+---------+---------+---------
    GBBCT1.SEQ          Genetic Sequence Data Bank
                              April 15 2015

                    NCBI-GenBank Flat File Release 207.0

                         Bacterial Sequences (Part 1)

       51396 loci,    92682287 bases, from    51396 reported sequences
    ---------+---------+---------+---------+---------+---------+---------+---------
    1       10        20        30        40        50        60        70       79

● Part 2: the sequence entries for the division described in the header
    1       10        20        30        40        50        60        70       79
    ---------+---------+---------+---------+---------+---------+---------+---------
    GBSMP.SEQ          Genetic Sequence Data Bank
                             December 15 1992

                     GenBank Flat File Release 74.0

                        Structural RNA Sequences

          2 loci,       236 bases, from     2 reported sequences

    LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
    DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
    ACCESSION   K03160
    VERSION     K03160.1
    KEYWORDS    5S ribosomal RNA; ribosomal RNA.
    SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
      ORGANISM  Auricularia auricula-judae
                Eukaryota; Fungi; Eumycota; Basidiomycotina; Phragmobasidiomycetes;
                Heterobasidiomycetidae; Auriculariales; Auriculariaceae.
    REFERENCE   1  (bases 1 to 118)
      AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
      TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
                their use in studying the phylogenetic position of basidiomycetes
                among the eukaryotes
      JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
    FEATURES             Location/Qualifiers
         rRNA            1..118
                         /note="5S ribosomal RNA"
    BASE COUNT       27 a     34 c     34 g     23 t
    ORIGIN      5' end of mature rRNA.
            1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
           61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
    //
    LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990
    DEFINITION  Acetobacter sp. (strain MB 58) 5S ribosomal RNA, complete sequence.
    ACCESSION   M34766
    VERSION     M34766.1
    KEYWORDS    5S ribosomal RNA.
    SOURCE      Acetobacter sp. (strain MB 58) rRNA.
      ORGANISM  Acetobacter sp.
                Prokaryotae; Gracilicutes; Scotobacteria; Aerobic rods and cocci;
                Azotobacteraceae.
    REFERENCE   1  (bases 1 to 118)
      AUTHORS   Bulygina,E.S., Galchenko,V.F., Govorukhina,N.I., Netrusov,A.I.,
                Nikitin,D.I., Trotsenko,Y.A. and Chumakov,K.M.
      TITLE     Taxonomic studies of methylotrophic bacteria by 5S ribosomal RNA
                sequencing
      JOURNAL   J. Gen. Microbiol. 136, 441-446 (1990)
    FEATURES             Location/Qualifiers
         rRNA            1..118
                         /note="5S ribosomal RNA"
    BASE COUNT       27 a     40 c     32 g     17 t      2 others
    ORIGIN
            1 gatctggtgg ccatggcggg agcaaatcag ccgatcccat cccgaactcg gccgtcaaat
           61 gccccagcgc ccatgatact ctgcctcaag gcacggaaaa gtcggtcgcc gccagayy
    //
    ---------+---------+---------+---------+---------+---------+---------+---------
    1       10        20        30        40        50        60        70       79
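The two portions of a division file can be separated by scanning for the first LOCUS record. A minimal sketch, assuming that the header ends at the first line starting with "LOCUS", that every entry ends with a "//" line, and that the summary line has the "... loci, ... bases, from ... reported sequences" shape shown above; the function names are hypothetical and the file path passed to the reader is up to you.

```python
import gzip
import io
import re
from typing import Iterable, List, Tuple

# Matches e.g. "51396 loci,    92682287 bases, from    51396 reported sequences"
SUMMARY_RE = re.compile(r"(\d+)\s+loci,\s+(\d+)\s+bases,\s+from\s+(\d+)\s+reported sequences")


def split_division_file(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Split a division file into its header lines (part 1) and its entries (part 2)."""
    header: List[str] = []
    entries: List[List[str]] = []
    current: List[str] = []
    in_entries = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_entries:
            if not line.startswith("LOCUS"):
                header.append(line)       # part 1: the file header
                continue
            in_entries = True
        if not current and not line.strip():
            continue                      # skip blank lines between entries
        current.append(line)              # part 2: the sequence entries
        if line.startswith("//"):         # '//' terminates an entry
            entries.append(current)
            current = []
    return header, entries


def header_summary(header: List[str]) -> Tuple[int, int, int]:
    """Return (loci, bases, reported sequences) from the header's summary line."""
    for line in header:
        match = SUMMARY_RE.search(line)
        if match:
            loci, bases, sequences = (int(group) for group in match.groups())
            return loci, bases, sequences
    raise ValueError("no summary line found in header")


def read_division_file(path: str) -> Tuple[List[str], List[List[str]]]:
    """Read a gzip-compressed division file, e.g. a local copy downloaded via FTP."""
    with gzip.open(path, "rt") as handle:
        return split_division_file(handle)


if __name__ == "__main__":
    sample = io.StringIO(
        "GBSMP.SEQ          Genetic Sequence Data Bank\n"
        "      2 loci,       236 bases, from     2 reported sequences\n"
        "\n"
        "LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986\n"
        "//\n"
        "LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990\n"
        "//\n"
    )
    header, entries = split_division_file(sample)
    print(header_summary(header), len(entries))  # (2, 236, 2) 2
```

Because the function consumes any iterable of lines, the same code works on an in-memory sample and on a streamed, decompressed file, so a multi-hundred-megabyte division file never has to be loaded at once.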
The GenBank Flat File Format
● a sequence entry consists of many records (lines)
● each record consists of two parts
● Part 1: columns 1-10, the entry field name
● Part 2: the remainder of the line, with the content
Part 1 (1/3)
● a keyword, beginning in column 1 of the record (e.g., REFERENCE is a keyword)
● a subkeyword beginning in column 3, with columns 1 and 2 blank (e.g., AUTHORS is a subkeyword of REFERENCE)
● or a subkeyword beginning in column 4, with columns 1, 2, and 3 blank (e.g., PUBMED is a subkeyword of REFERENCE)
Part 1 (2/3)
● blank characters, indicating that this record is a continuation of the information under the keyword or subkeyword above it
● a code, beginning in column 6, indicating the nature of an entry (feature key) in the FEATURES table
Part 1 (3/3)
● a number, ending in column 9 of the record:
  - this number occurs in the portion of the entry describing the actual nucleotide sequence and designates the numbering of sequence positions
● two slashes (//) in positions 1 and 2, marking the end of an entry
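The column rules for part 1 can be checked mechanically. A minimal sketch, assuming one record per string without its trailing newline; the returned labels are hypothetical names chosen for illustration. The sequence-number test runs before the subkeyword tests because a position number of seven or more digits that ends in column 9 starts in column 3 or earlier.

```python
def classify_record(line: str) -> str:
    """Classify a flat-file record by what occupies its first columns."""
    padded = line.ljust(10)
    if padded.startswith("//"):
        return "end_of_entry"            # '//' in positions 1 and 2
    head = padded[:9]
    if head.strip().isdigit() and head[8].isdigit():
        return "sequence_number"         # a number ending in column 9
    if padded[0] != " ":
        return "keyword"                 # column 1, e.g. REFERENCE
    if padded[:2] == "  " and padded[2] != " ":
        return "subkeyword"              # column 3, e.g. AUTHORS
    if padded[:3] == "   " and padded[3] != " ":
        return "subkeyword"              # column 4, e.g. PUBMED
    if padded[:5] == "     " and padded[5] != " ":
        return "feature_key"             # column 6, inside the FEATURES table
    if padded[:10].strip() == "":
        return "continuation"            # columns 1-10 blank
    return "unknown"


if __name__ == "__main__":
    for record in ["REFERENCE   1  (bases 1 to 118)",
                   "  AUTHORS   Huysmans,E., Dams,E.",
                   "     rRNA            1..118",
                   "       61 gtaccgccca",
                   "//"]:
        print(classify_record(record), "|", record)
```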
Part 2
● The second part of each sequence entry record contains the information appropriate to its keyword
● in positions 13 to 80 for keywords
● in positions 11 to 80 for the sequence
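Since part 2 starts at a fixed position, the content can be taken by slicing. A minimal sketch, assuming records without trailing newlines and the position ranges given above; the helper names are hypothetical. For sequence records it also removes the blanks that separate the 10-base blocks.

```python
from typing import Iterable


def record_content(line: str, is_sequence: bool = False) -> str:
    """Return part 2 of a record: positions 13-80 for keywords, 11-80 for sequence lines."""
    start = 10 if is_sequence else 12     # 0-based index of position 11 or 13
    return line[start:80].rstrip()


def origin_sequence(sequence_lines: Iterable[str]) -> str:
    """Join the sequence records after ORIGIN, dropping the blanks between 10-base blocks."""
    return "".join(record_content(line, is_sequence=True).replace(" ", "")
                   for line in sequence_lines)


if __name__ == "__main__":
    lines = [
        "        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga",
        "       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt",
    ]
    print(record_content("DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA."))
    print(len(origin_sequence(lines)))  # 118, matching "118 bp" on the LOCUS line
```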
Entry Field Types (incomplete)
● Locus: a short mnemonic name for the entry, chosen to suggest the sequence's definition; mandatory keyword / exactly one record
● Definition: a concise description of the sequence; mandatory keyword / one or more records
● Accession:
  - the primary accession number is a unique, unchanging identifier assigned to each GenBank sequence record
  - to be used when citing GenBank records
  - mandatory keyword / one or more records
Entry Field Types (incomplete)
● Version:
  - compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record
  - optionally followed by an integer identifier (a "GI") assigned to the sequence by NCBI
  - mandatory keyword / exactly one record
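Because the version is appended to the accession with a dot, the VERSION content can be split with a regular expression. A minimal sketch, assuming the optional GI appears as "GI:<number>" after whitespace (the GI value used in the test is made up); `Version` and `parse_version` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    accession: str
    version: int
    gi: Optional[int]


# "ACCESSION.VERSION", optionally followed by "GI:<number>" (assumed layout)
VERSION_RE = re.compile(r"(\S+)\.(\d+)(?:\s+GI:(\d+))?")


def parse_version(content: str) -> Version:
    """Parse the content of a VERSION record, e.g. 'K03160.1'."""
    match = VERSION_RE.fullmatch(content.strip())
    if match is None:
        raise ValueError(f"unexpected VERSION content: {content!r}")
    accession, version, gi = match.groups()
    return Version(accession, int(version), int(gi) if gi else None)


if __name__ == "__main__":
    print(parse_version("K03160.1"))  # Version(accession='K03160', version=1, gi=None)
```

Keeping the accession and the version number apart makes it easy to tell whether two records describe the same sequence record (same accession) and whether the sequence data changed (different version).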
Entry Field Types (incomplete)
● DBLINK: provides cross-references to resources that support the existence of a sequence record; optional keyword / one or more records
● Keywords: short phrases describing gene products and other information about an entry; mandatory keyword in all annotated entries / one or more records
Entry Field Types (incomplete)
● Source: common name of the organism or the name most frequently used in the literature; mandatory keyword in all annotated entries / one or more records / includes one subkeyword
● Organism: formal scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines); mandatory subkeyword in all annotated entries / two or more records
Entry Field Types (incomplete)
● Reference:
  - citations for all articles containing data reported in this entry
  - includes seven subkeywords and may repeat
  - mandatory keyword / one or more records
● Journal: lists the journal name, volume, year, and page numbers of the citation; mandatory subkeyword / one or more records
● optional subkeywords: Authors, Consortium, Title, Medline, PubMed, Remark
Entry Field Types (incomplete)
● Features: table containing information on portions of the sequence that code for proteins and RNA molecules, and on sites of biological significance; optional keyword / one or more records
● Origin:
  - specification of how the first base of the reported sequence is operationally located within the genome
  - mandatory keyword / exactly one record
  - followed by the sequence data (multiple records)
● //: entry termination symbol; mandatory at the end of an entry / exactly one record
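Putting the field types together, an entry can be grouped by its top-level keywords. A minimal sketch, assuming records without trailing newlines and keywords in columns 1-12; `parse_entry` is a hypothetical helper that simply appends subkeyword and continuation records to the preceding keyword, keeps repeated keywords such as REFERENCE as separate items, and skips the numbered sequence records.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parse_entry(lines: List[str]) -> Dict[str, List[str]]:
    """Group one entry's records by top-level keyword (column 1)."""
    fields: Dict[str, List[str]] = defaultdict(list)
    keyword: Optional[str] = None
    for line in lines:
        if line.startswith("//"):
            break                                    # entry terminator
        if not line.strip() or line[:9].strip().isdigit():
            continue                                 # blank or sequence record
        if line[0] != " ":
            keyword = line[:12].strip()              # new keyword, e.g. REFERENCE
            fields[keyword].append(line[12:].strip())
        elif keyword is not None:
            # subkeyword or continuation: append to the current keyword's text
            fields[keyword][-1] += " " + line.strip()
    return dict(fields)


if __name__ == "__main__":
    entry = [
        "ACCESSION   K03160",
        "REFERENCE   1  (bases 1 to 118)",
        "  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.",
        "ORIGIN      5' end of mature rRNA.",
        "        1 atccacggcc ataggactct",
        "//",
    ]
    for key, values in parse_entry(entry).items():
        print(key, values)
```

A production parser would keep subkeywords such as ORGANISM or AUTHORS as separate fields; merging them here keeps the sketch short.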
Detailed Locus Format

| Columns | Contents |
|---|---|
| 01-05 | 'LOCUS' |
| 06-12 | spaces |
| 13-28 | Locus name |
| 29 | space |
| 30-40 | Length of sequence, right-justified |
| 41 | space |
| 42-43 | bp |
| 44 | space |
| 45-47 | spaces, ss- (single-stranded), ds- (double-stranded), or ms- (mixed-stranded) |
| 48-53 | NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA), mRNA (messenger RNA), uRNA (small nuclear RNA); left-justified |
| 54-55 | spaces |
| 56-63 | 'linear' followed by two spaces, or 'circular' |
| 64 | space |
| 65-67 | The division code |
| 68 | space |
| 69-79 | Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991) |
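The fixed-width layout in the table maps directly onto string slices (column c is index c-1 in Python). A minimal sketch, assuming LOCUS lines follow exactly this layout; lines that deviate from it, for example from older releases, would need a token-based parser instead. The record built in the usage example is hypothetical.

```python
from typing import Dict, Union

# (first, last) column, 1-based and inclusive, as in the table above
LOCUS_COLUMNS = {
    "name": (13, 28),
    "length": (30, 40),
    "unit": (42, 43),
    "strandedness": (45, 47),
    "molecule": (48, 53),
    "topology": (56, 63),
    "division": (65, 67),
    "date": (69, 79),
}


def parse_locus(line: str) -> Dict[str, Union[str, int]]:
    """Slice a LOCUS line into its fixed-width fields."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    padded = line.ljust(79)
    fields: Dict[str, Union[str, int]] = {
        key: padded[first - 1:last].strip() for key, (first, last) in LOCUS_COLUMNS.items()
    }
    fields["length"] = int(fields["length"])
    return fields


if __name__ == "__main__":
    # Hypothetical record, assembled column by column to match the table
    example = ("LOCUS".ljust(12) + "EXAMPLE01".ljust(16) + " " + "5028".rjust(11)
               + " bp " + "ds-" + "DNA".ljust(6) + "  " + "linear".ljust(8)
               + " PLN " + "21-JUN-1999")
    print(parse_locus(example))
```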
Accession Format
● six or eight characters
● six-character format:
  - a single uppercase letter
  - 5 digits
● eight-character format:
  - two uppercase letters
  - 6 digits
● the primary accession number is always the first one
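The two accession formats translate into a short regular expression. A minimal sketch that covers only the two formats listed above; other accession formats (for example those of WGS records, or RefSeq accessions containing an underscore) are deliberately not matched, and the function names are illustrative.

```python
import re

# 1 uppercase letter + 5 digits, or 2 uppercase letters + 6 digits
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")


def is_classic_accession(text: str) -> bool:
    """True if `text` matches one of the two accession formats on the slide."""
    return ACCESSION_RE.fullmatch(text) is not None


def primary_accession(accession_content: str) -> str:
    """The primary accession number is the first one in the ACCESSION record."""
    return accession_content.split()[0]


if __name__ == "__main__":
    for candidate in ["K03160", "AF123456", "K0316", "NM_000546"]:
        print(candidate, is_classic_accession(candidate))
```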
Features (Incomplete)
● authoritative source: http://www.insdc.org/documents/feature-table
● the feature table:
  - contains information about genes and gene products, and about regions of biological significance
  - can enumerate differences between various reports
  - provides cross-references to other data collections
  - allows hierarchical relations between the features
Layout
● the first line of the feature table is a header
● it includes the keyword 'FEATURES' and the column header 'Location/Qualifiers'
● each feature consists of:
  - a descriptor line containing a feature key and a location
  - a continuation line for the location may follow
  - feature qualifiers may follow the descriptor line
  - key: columns 6-20; the location starts in column 22
  - qualifiers on subsequent lines at column 22, starting with a '/'
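The layout rules above are enough to turn the lines after the FEATURES header into feature objects. A minimal sketch under those assumptions; `Feature` and `parse_features` are hypothetical names, and re-joining wrapped free-text values with a single space is an approximation (sequence-valued qualifiers such as /translation would have to be joined without one).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    key: str
    location: str
    qualifiers: List[str] = field(default_factory=list)


def parse_features(lines: List[str]) -> List[Feature]:
    """Parse the feature lines that follow the 'FEATURES ... Location/Qualifiers' header."""
    features: List[Feature] = []
    for line in lines:
        key = line[5:20].strip()          # columns 6-20
        text = line[21:].rstrip()         # column 22 onwards
        if key:
            features.append(Feature(key, text))
        elif not features:
            raise ValueError("continuation line before the first feature")
        elif text.startswith("/"):
            features[-1].qualifiers.append(text)
        elif features[-1].qualifiers:
            features[-1].qualifiers[-1] += " " + text   # wrapped free-text value
        else:
            features[-1].location += text               # wrapped location, no spaces
    return features


if __name__ == "__main__":
    rows = [
        " " * 5 + "rRNA".ljust(16) + "1..118",
        " " * 21 + '/note="5S ribosomal RNA"',
    ]
    print(parse_features(rows))
```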
A Few Frequent Features
● CDS: sequence coding for the amino acids of a protein (includes the stop codon)
● exon: region that codes for part of a spliced mRNA
● gene: region that defines a functional gene, possibly including upstream (promoter, enhancer, etc.) and downstream control elements, and for which a name has been assigned
● mRNA: messenger RNA
● ... currently more than 60 features
Location and Qualifiers
● Location:
  - a location can be a single base, a span of bases, a site between two bases, a join of sequences, ...
  - examples: 23, 23..56, 23^24, join(23..56,87..110)
● Qualifiers:
  - format: starting at column 22, /qualifier_name[=value]
  - types: free text, enumeration or controlled vocabulary, citations, sequences, feature labels
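The four location examples and the qualifier format can be parsed with a few regular expressions. A minimal sketch that handles only the forms listed above (single base, span, site between two bases, and join of these); complement(), order() and partial-location markers (<, >) from the full feature table definition are deliberately left out, and all names are illustrative.

```python
import re
from typing import List, Optional, Tuple

# (start, end, kind); for a single base start == end
Interval = Tuple[int, int, str]


def parse_simple_location(text: str) -> List[Interval]:
    """Parse '23', '23..56', '23^24' or 'join(...)' of these into (start, end, kind) tuples."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        return [interval for part in text[5:-1].split(",")
                for interval in parse_simple_location(part)]
    if re.fullmatch(r"\d+", text):
        return [(int(text), int(text), "base")]
    match = re.fullmatch(r"(\d+)\.\.(\d+)", text)
    if match:
        return [(int(match.group(1)), int(match.group(2)), "span")]
    match = re.fullmatch(r"(\d+)\^(\d+)", text)
    if match:
        return [(int(match.group(1)), int(match.group(2)), "site")]
    raise ValueError(f"unsupported location: {text!r}")


def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/name=value' into (name, value); surrounding quotes are removed."""
    if not text.startswith("/"):
        raise ValueError("a qualifier starts with '/'")
    name, has_value, value = text[1:].partition("=")
    if not has_value:
        return name, None                # qualifiers without a value, e.g. /pseudo
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return name, value


if __name__ == "__main__":
    spans = parse_simple_location("join(23..56,87..110)")
    print(spans)                                             # [(23, 56, 'span'), (87, 110, 'span')]
    print(sum(end - start + 1 for start, end, _ in spans))  # 58 bases in the joined feature
    print(parse_qualifier('/note="5S ribosomal RNA"'))      # ('note', '5S ribosomal RNA')
```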
Database Cross-References: /db_xref
● http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/
● Qualifier: /db_xref="database:identifier"
● Definition: database cross-reference, a pointer to related information in another database
● Scope: all feature keys
● Example: /db_xref="Swiss-Prot:P12345"
● currently > 120 databases available
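Following the qualifier format above, a cross-reference splits into database name and identifier at the first colon. A minimal sketch; `parse_db_xref` is a hypothetical helper, and splitting at the first colon is an assumption that keeps any further colons inside the identifier.

```python
from typing import Tuple


def parse_db_xref(qualifier: str) -> Tuple[str, str]:
    """Split '/db_xref="database:identifier"' into (database, identifier)."""
    prefix = "/db_xref="
    if not qualifier.startswith(prefix):
        raise ValueError("not a /db_xref qualifier")
    value = qualifier[len(prefix):].strip('"')
    database, colon, identifier = value.partition(":")   # split at the first ':'
    if not colon or not database or not identifier:
        raise ValueError(f"expected 'database:identifier', got {value!r}")
    return database, identifier


if __name__ == "__main__":
    print(parse_db_xref('/db_xref="Swiss-Prot:P12345"'))  # ('Swiss-Prot', 'P12345')
```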
Anatomy of a GenBank Flat File
[Figure: example GenBank flat file]
Anatomy of a GenBank Flat File
[Figure: example GenBank flat file, highlighting the locus line]
Anatomy of a GenBank Flat File
[Figure: example GenBank flat file, highlighting the accession number, version and GI number]
Anatomy of a GenBank Flat File
[Figure: example GenBank flat file, highlighting the feature table with annotations]
Useful Resources from NCBI
● Materials:
● Electronic Bookshelf
● http://www.ncbi.nlm.nih.gov/education/factsheets/
● ftp://ftp.ncbi.nih.gov/pub/factsheets/Factsheet_Books.pdf
● NCBI manuals
● textbooks
Useful Resources from NCBI
● processes, e.g. the Prokaryotic Genome Annotation Pipeline
● designed for bacterial and archaeal genomes
● multi-level process, including prediction of protein-coding genes and of functional genome units such as rRNAs, tRNAs, small RNAs, pseudogenes, control regions, repeats, insertion elements, etc.
● combination of ab initio prediction and homology-based methods
Useful Resources from NCBI
● reference databases: RefSeq
● http://www.ncbi.nlm.nih.gov/refseq/
● comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins
● stable reference for genome annotation, esp. the RefSeqGene subset
● reference sequences
● reference coordinates
● accessible via BLAST, Entrez and FTP
RefSeq
● created by:
  - the Eukaryotic Genome Annotation Pipeline
  - the Prokaryotic Genome Annotation Pipeline
  - manual curation
  - submissions to INSDC members
● reflects current knowledge of sequence data and biology
● format consistency
● accession numbers contain an "_"
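The underscore is a cheap way to tell RefSeq accessions apart from INSDC accessions. A minimal sketch; the prefix table lists only a few well-known RefSeq prefixes and is not exhaustive, and the accession used in the test is made up for illustration.

```python
from typing import Optional

# A few well-known RefSeq accession prefixes (not an exhaustive list)
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule (e.g. a chromosome)",
    "NG": "genomic region (used by RefSeqGene)",
    "NM": "mRNA",
    "NP": "protein",
}


def is_refseq_accession(accession: str) -> bool:
    """RefSeq accessions contain an underscore (e.g. NM_000546); INSDC accessions do not."""
    return "_" in accession


def refseq_prefix(accession: str) -> Optional[str]:
    """Return the part before the underscore, or None for non-RefSeq accessions."""
    return accession.split("_", 1)[0] if is_refseq_accession(accession) else None


if __name__ == "__main__":
    for acc in ["NM_000546", "K03160"]:
        prefix = refseq_prefix(acc)
        print(acc, prefix, REFSEQ_PREFIXES.get(prefix or "", "-"))
```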
RefSeq Growth
Databases Accessible via Entrez
http://www.ncbi.nlm.nih.gov/gquery/
Computation: BLAST at NCBI
Searching the NCBI/Entrez
● the Entrez Programming Utilities (E-utilities) provide an integrated search interface to the different NCBI databases
● base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
● > 40 databases
● stable interface of nine server-side programs
● http://www.ncbi.nlm.nih.gov/books/NBK25501/
Entrez Guidelines
● if you use the E-utilities against the guidelines, you might be banned!
● more than 100 requests: only on weekends or outside US peak times (9 pm - 5 am, EST)
● no more than 3 requests per second
● provide an email address and a tool name: &tool=<...>&email=<...>
● registering your email address and tool name with NCBI may relax these restrictions
● supported by Biopython
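A hand-written client has to respect the "no more than 3 requests per second" rule itself (Biopython's `Bio.Entrez` already spaces out its requests). The sketch below shows one way to do it: a minimum-interval throttle called before every request. The class name `Throttle` is hypothetical; the injectable clock and sleep functions are only there so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum spacing between requests (default: at most 3 per second)."""

    def __init__(self, max_per_second: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until another request is allowed, then record its start time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle()
    start = time.monotonic()
    for i in range(4):
        throttle.wait()  # the actual E-utility request would be sent here
        print(f"request {i} at {time.monotonic() - start:.2f}s")
```

Spacing requests evenly is the simplest design; a token bucket would allow short bursts but is not needed at this rate.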
Constructing URLs
● parameters: &name, written in lower case (e.g. &db, &term)
● exception: &WebEnv
● parameters can appear in any order
● null values and inappropriate parameters are generally ignored
● no spaces; use + instead
● use URL encoding for special characters, e.g. %22 for ", %23 for #, %40 for @
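The URL rules above map directly onto Python's `urllib.parse.urlencode`, which by default uses `quote_plus`: spaces become `+` and characters such as `"`, `#` and `@` become `%22`, `%23` and `%40`. The sketch below assumes a hypothetical helper `eutils_url` that takes the full script name (ECitMatch uses `.cgi`, the others `.fcgi`), drops parameters whose value is `None`, and uses the HTTPS base URL that NCBI serves today (the slides show the older `http://` form). The tool name and email address are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

# HTTPS base URL (the slides show the older http:// form).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(script: str, **params: Optional[str]) -> str:
    """Build an E-utility URL such as esearch.fcgi?db=...&term=...

    Parameters whose value is None are left out of the URL.
    urlencode() applies quote_plus: ' ' -> '+', '"' -> %22, '#' -> %23, '@' -> %40.
    """
    cleaned = {name: value for name, value in params.items() if value is not None}
    return EUTILS_BASE + script + "?" + urlencode(cleaned)


print(eutils_url(
    "esearch.fcgi",
    db="pubmed",
    term='science[journal] AND "breast cancer"',
    tool="my_tool",           # hypothetical tool name
    email="me@example.org",   # placeholder address
    retmax=None,              # dropped from the URL
))
```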
E-utilities
● EInfo
● ESearch
● EPost
● ESummary
● EFetch
● ELink
● EGQuery
● ESpell
● ECitMatch
External Interfaces to Entrez/API
● there are a number of APIs to access the various NCBI services, described at: http://www.ncbi.nlm.nih.gov/books/NBK25501/
● base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
● basic searching:
- esearch.fcgi?db=<database>&term=<query>
- Input: Entrez database (&db); any Entrez text query (&term)
- Output: list of UIDs matching the Entrez query
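ESearch answers in XML by default, so the UID list has to be pulled out of the reply. The sketch below uses the standard-library `xml.etree.ElementTree` on a reply of the same shape (`eSearchResult`, `Count`, `IdList/Id`). The count and UIDs in the sample are made-up placeholders, not real search results, and the helper name `parse_esearch` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative ESearch reply. Count and UIDs are made-up placeholders;
# real replies carry more elements (e.g. TranslationSet).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>57</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
    <Id>333</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text: str) -> tuple[int, list[str]]:
    """Return (total number of hits, UIDs contained in this reply)."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    uids = [elem.text for elem in root.findall("IdList/Id") if elem.text]
    return count, uids


count, uids = parse_esearch(SAMPLE_ESEARCH_XML)
print(count, uids)
```

Note that `Count` is the total number of matches, while `IdList` holds at most `&retmax` UIDs (20 by default), so the two numbers usually differ.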
ESearch
● text search
● eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
● responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query
ESummary
● document summary downloads
● eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi
● responds to a list of UIDs from a given database with the corresponding document summaries
EGQuery
● global query
● eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi
● responds to a text query with the number of records matching the query in each Entrez database
EInfo
● database statistics
● eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
● provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases
● without &db: lists all available databases
EFetch
● data record downloads
● eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
● responds to a list of UIDs in a given database with the corresponding data records in a specified format
ELink
● Entrez links
● eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi
● responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database
ELink
● checks for the existence of a specified link from a list of one or more UIDs
● creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs
EPost
● UID uploads
● eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
● accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset
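For long UID lists NCBI recommends sending EPost as an HTTP POST rather than a GET URL. The sketch below builds such a form-encoded body and reads the query key and web environment out of an `ePostResult` reply. The helper names are hypothetical, the UIDs are borrowed from the ELink example in these slides, and the reply string (including `EXAMPLE_WEBENV`) is a placeholder, not real server output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def epost_body(db: str, uids: list[str]) -> bytes:
    """Form-encoded body for an HTTP POST to epost.fcgi (UIDs comma-separated)."""
    return urlencode({"db": db, "id": ",".join(uids)}).encode("ascii")


def parse_history_location(xml_text: str) -> tuple[str, str]:
    """Return (query_key, WebEnv) from an ePostResult document."""
    root = ET.fromstring(xml_text)
    query_key = root.findtext("QueryKey")
    webenv = root.findtext("WebEnv")
    if query_key is None or webenv is None:
        raise ValueError("reply does not contain a History server location")
    return query_key, webenv


print(epost_body("protein", ["15718680", "157427902"]))
print(parse_history_location(
    "<ePostResult><QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv></ePostResult>"
))
```

The pair returned by `parse_history_location` is exactly what later calls pass as `&query_key` and `&WebEnv`.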
ESpell
● spelling suggestions
● eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi
● retrieves spelling suggestions for a text query in a given database
ECitMatch
● batch citation searching in PubMed
● eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi
● retrieves PubMed IDs (PMIDs) corresponding to a set of input citation strings
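ECitMatch takes its citations in a `&bdata` parameter. The sketch below follows our reading of NCBI's ECitMatch documentation: each citation is a string `journal|year|volume|first_page|author|key|`, several citations are joined by a carriage return (`%0D`), and the call uses `db=pubmed&retmode=xml`. The helper names are hypothetical and all citation values are placeholders, not real references.

```python
from urllib.parse import quote_plus

ECITMATCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi"


def citation_string(journal: str, year: int, volume: str, first_page: str,
                    author: str, key: str) -> str:
    """One citation in the journal|year|volume|first_page|author|key| layout."""
    return "|".join([journal, str(year), volume, first_page, author, key]) + "|"


def ecitmatch_url(citations: list[str]) -> str:
    """Join citations with a carriage return and URL-encode them as &bdata."""
    bdata = "\r".join(citations)
    # Keep '|' readable; spaces become '+', '\r' becomes %0D.
    return f"{ECITMATCH}?db=pubmed&retmode=xml&bdata={quote_plus(bdata, safe='|')}"


cites = [
    citation_string("example journal", 2001, "12", "345", "doe j", "ref1"),
    citation_string("another journal", 2005, "7", "89", "roe r", "ref2"),
]
print(ecitmatch_url(cites))
```

The user-chosen key (`ref1`, `ref2`) lets you match each returned PMID back to the citation you sent.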
Identifiers
● records are identified by an integer ID called a UID
● UIDs are database specific, e.g. GI numbers, PMIDs, MMDB-IDs
● UIDs serve as both input and output
● especially useful in combination with the History server
● a full description of parameters and syntax can be found at: http://www.ncbi.nlm.nih.gov/books/NBK25499/
Selected UIDs

| Entrez Database | UID common name | E-utility database name |
|---|---|---|
| Books | Book ID | books |
| Conserved Domains | PSSM-ID | cdd |
| dbVar | dbVar ID | dbvar |
| EST | GI number | nucest |
| Gene | Gene ID | gene |
| Genome | Genome ID | genome |
| MeSH | MeSH ID | mesh |
| NCBI Web Site | Web Site ID | ncbisearch |
| Nucleotide | GI number | nuccore |
| PubMed | PMID | pubmed |
| ... | ... | ... |
Entrez Core Engine
● EGQuery, ESearch, and ESummary
● two tasks:
- assemble a list of UIDs that match a text query (ESearch)
- retrieve a brief summary record called a Document Summary (DocSum) for each UID (ESummary)
● EGQuery: global version of ESearch
● esearch.fcgi?db=database&term=query
  esummary.fcgi?db=database&id=uid1,uid2,uid3,...
● can be expanded into more complicated Entrez queries
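The two core tasks chain naturally: the UIDs in an ESearch reply become the comma-separated `&id` list of an ESummary call. The sketch below builds the first URL and derives the second from an ESearch reply. The helper names are hypothetical and the sample reply with UIDs 111 and 222 is a placeholder. `safe=","` keeps the commas literal, matching the `uid1,uid2,uid3` form on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """Step 1: text query -> list of UIDs."""
    return BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def esummary_url_from_esearch(db: str, esearch_xml: str) -> str:
    """Step 2: feed the UIDs found by ESearch into ESummary."""
    root = ET.fromstring(esearch_xml)
    uids = [e.text for e in root.findall("IdList/Id") if e.text]
    return BASE + "esummary.fcgi?" + urlencode({"db": db, "id": ",".join(uids)}, safe=",")


SAMPLE_REPLY = "<eSearchResult><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
print(esearch_url("pubmed", "breast cancer"))
print(esummary_url_from_esearch("pubmed", SAMPLE_REPLY))
```

Passing UIDs explicitly works for short lists; for large result sets the History server (&usehistory=y) avoids sending the IDs back and forth.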
Entrez Databases (EInfo, EFetch, and ELink)
● EInfo:
- provides detailed information about each database
- lists of the indexing fields in the database
- available links to other Entrez databases
Entrez Databases (EInfo, EFetch, and ELink)
● added value to the raw data:
- supports a variety of display formats: EFetch returns UID lists in XML and plain text (&retmode) for all databases; other formats (&rettype) are database specific
- http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly
- efetch.fcgi?db=database&id=uid1,uid2,uid3&rettype=report_type&retmode=data_mode
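The EFetch template above can be filled in programmatically. The sketch below builds an EFetch URL for explicit UIDs; `fasta` with `text` is one of the nuccore combinations in NCBI's rettype/retmode table, but the function does not validate the pair, so consult that table for other databases. The helper name `efetch_url` is hypothetical and the UIDs are placeholders.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, uids: list[str], rettype: str, retmode: str) -> str:
    """EFetch URL for explicit UIDs; valid rettype/retmode pairs depend on db."""
    params = {"db": db, "id": ",".join(uids), "rettype": rettype, "retmode": retmode}
    return EFETCH + "?" + urlencode(params, safe=",")


print(efetch_url("nuccore", ["111", "222"], rettype="fasta", retmode="text"))
```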
Entrez Databases (EInfo, EFetch, and ELink)
● added value to the raw data:
- links to records in other Entrez databases, manifested as lists of associated UIDs
- UIDs must be valid in the source database (&dbfrom)
- elink.fcgi?dbfrom=protein&db=gene&id=15718680,157427902
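ELink accepts the input UIDs in two shapes. Per NCBI's documentation, a single comma-separated `&id` (as in the example above) yields one combined link set, while repeating `&id` once per UID makes ELink link each input separately, so the connection between input and output UIDs is preserved. The sketch below builds both forms with the UIDs from the slide; the helper name `elink_url` and its `one_to_one` flag are hypothetical.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom: str, db: str, uids: list[str], one_to_one: bool = False) -> str:
    """ELink URL; the UIDs must be valid in the source database (&dbfrom).

    one_to_one=False: one comma-separated &id, one combined link set.
    one_to_one=True: one &id per UID, so each input keeps its own link set.
    """
    if one_to_one:
        # doseq=True expands the list into repeated id=... pairs.
        return ELINK + "?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uids}, doseq=True)
    return ELINK + "?" + urlencode({"dbfrom": dbfrom, "db": db, "id": ",".join(uids)}, safe=",")


uids = ["15718680", "157427902"]
print(elink_url("protein", "gene", uids))
print(elink_url("protein", "gene", uids, one_to_one=True))
```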
Entrez History Server
● simple: in the GUI, accessible via the respective tabs
● lets you temporarily store sets of UIDs as input for later queries through other tools
● each list of UIDs is specified by:
- &query_key (integer label)
- &WebEnv (cookie string)
Creation of a stored UID list
● EPost:
- EPost can be used to upload a UID list
- returns &query_key and &WebEnv
● ESearch:
- stores the results if given &usehistory=y
● ELink:
- stores the results if given &cmd=neighbor_history
Usage of stored UID lists
● use of stored lists: esummary.fcgi?db=database&WebEnv=webenv&query_key=key
● one web environment can hold multiple result lists
● lists in the same web environment can be combined with AND, OR, NOT
● by default, every call creates a new environment
● -> pass &WebEnv in subsequent calls to store the lists in the same web environment
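Combining stored lists uses the History server syntax in which a stored list is referenced in a query as `#<query_key>`; the `#` must be URL-encoded as `%23`. The sketch below, based on our reading of NCBI's E-utilities documentation, builds an ESearch URL that combines two lists of the same web environment and stores the result as a new list. The helper name is hypothetical and `EXAMPLE_WEBENV` is a placeholder for a real WebEnv string.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_stored_lists(db: str, webenv: str, key_a: str, key_b: str,
                         op: str = "AND") -> str:
    """ESearch URL combining two stored lists (#key) of one web environment."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError("op must be AND, OR or NOT")
    params = {
        "db": db,
        "term": f"#{key_a} {op} #{key_b}",  # '#' is encoded as %23
        "WebEnv": webenv,                    # the one mixed-case parameter name
        "usehistory": "y",                   # store the combined result as a new list
    }
    return ESEARCH + "?" + urlencode(params)


print(combine_stored_lists("pubmed", "EXAMPLE_WEBENV", "1", "2"))
```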
Sketching Pipelines
● get DocSummaries or entries for keywords or IDs:
- ESearch -> ESummary/EFetch
- EPost -> ESummary/EFetch
● filter/limit a record set:
- EPost/ELink -> ESearch
● more advanced queries:
- ESearch -> ELink -> ESummary/EFetch
- EPost -> ELink -> ESearch -> EFetch
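The ESearch -> EFetch pipeline scales best when ESearch stores its hits on the History server (&usehistory=y): its reply gives `Count`, `WebEnv` and `query_key`, and EFetch can then page through the stored list with `&retstart` and `&retmax`. The sketch below only plans the EFetch URLs for that second step. The helper name, the batch size of 500 and the placeholder WebEnv are assumptions, and each request should still respect the three-per-second limit.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched_efetch_urls(db: str, webenv: str, query_key: str, count: int,
                        batch_size: int = 500, rettype: str = "fasta",
                        retmode: str = "text") -> list[str]:
    """One EFetch URL per batch of a stored result list of `count` records."""
    urls = []
    for retstart in range(0, count, batch_size):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": retstart,   # index of the first record in this batch
            "retmax": batch_size,   # maximum number of records in this batch
            "rettype": rettype,
            "retmode": retmode,
        }
        urls.append(EFETCH + "?" + urlencode(params))
    return urls


for url in batched_efetch_urls("nuccore", "EXAMPLE_WEBENV", "1", count=1200):
    print(url)
```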
● storing results:
- esearch.fcgi?db=<database>&term=<query>&usehistory=y
- input: any Entrez text query (&term); Entrez database (&db); &usehistory=y
- output: web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez History server of the list of UIDs matching the Entrez query
- example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science%5bjournal%5d+AND+breast+cancer+AND+2008%5bpdat%5d&usehistory=y
● Associating search results with existing search results:
- esearch.fcgi?db=<database>&term=<query1>&usehistory=y
- esearch.fcgi?db=<database>&term=<query2>&usehistory=y&WebEnv=$web1
- Input: any Entrez text query (&term); Entrez database (&db); &usehistory=y; existing web environment (&WebEnv, here $web1) from a prior E-utility call
- Output: web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez History server of the list of UIDs matching the Entrez query
E-utility Webinar
● https://www.youtube.com/watch?v=iCFVVexp30o