Bioinformatics Resources - Swissprot, 2016/05/13
BioinfRes SoSe 16
Bioinformatics Resources - Swissprot -
Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12
Putative Schedule
Apr. 22nd | Intro, General Overview (1. sh.)
Apr. 29th | Sequence Databases (2. sh.)
May 6th | No lecture
May 13th | Sequence Databases (3. sh.)
May 20th | Structure Databases (4. sh.)*
May 27th | SQL (5. sh.)
Jun 3rd | SQL (6. sh.)
Jun 10th | No-SQL (7. sh.)
Jun 17th | No-SQL (8. sh.)*
Jun 24th | JavaScript / UI (9. sh.)
Jul 1st | Web Services (10. sh.)
Jul 8th | Bioinformatics Suites / Forums
Jul 15th | Wrap Up, Q&A
Jul 28th | Exam, 10:30-12:00 MW1050
* These exercises can earn you a bonus
XML Infusion (in 10 sec)
● compilation from http://www.w3schools.com/xml/default.asp
● XML is a software- and hardware-independent tool to store and to transport data
● XML stands for eXtensible Markup Language
● designed to store and transport data
● designed to be self-descriptive
● W3C recommendation
● it does NOT DO anything
About Tags
● XML tags are not predefined like HTML tags
● everybody can / has to invent their own tags
● new tags can be added any time
● the author has to define content and structure of the document
● everything is plain text
Document Structure
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  ....
</bookstore>
taken from http://www.w3schools.com/xml/xml_usedfor.asp
Syntax Rules
● elements are defined using tags: <tagName> ... </tagName> or <tagName/>
● elements can be nested (contain other elements - parent and child nodes, sibling nodes)
● elements can have text content
● each document must contain ONE root element that is the parent of all other elements
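As a minimal sketch of how the root, child elements, attributes and text content fit together, the bookstore document from the Document Structure slide can be read with Python's standard `xml.etree.ElementTree` module. The helper name `summarize` is an illustrative assumption and not part of the lecture material.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide (without the elided "...." part).
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(BOOKSTORE_XML.encode("utf-8"))


def summarize(bookstore):
    """Return (category, title, price) for every <book> child of the root."""
    rows = []
    for book in bookstore.findall("book"):       # direct children of the root
        category = book.get("category")          # attribute value
        title = book.findtext("title")           # text content of a child element
        price = float(book.findtext("price"))
        rows.append((category, title, price))
    return rows


if __name__ == "__main__":
    print(root.tag)            # the single root element
    for row in summarize(root):
        print(row)
```

The single root element (`bookstore`) is the parent of both `book` elements, which are siblings of each other.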
Syntax Refined
● prolog line <?xml ...> is optional
● tags must be (self-)closed
● tags are case sensitive
● tags must be properly nested:
<a><b>....</a></b> Wrong!
<a><b>....</b></a> Right!
Syntax Refined
● tags may have attributes
● attribute values must always be quoted
● some special characters cannot be used directly
● -> coded by entity references:
&lt; | < | less than
&gt; | > | greater than
&amp; | & | ampersand
&apos; | ' | apostrophe
&quot; | " | quotation mark
● comments: <!-- .... -->
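The five entity references above can be applied programmatically. The following is a small sketch using `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. These functions handle `&`, `<` and `>` by default, so the two quote characters are passed in explicitly. The names `EXTRA`, `to_xml_text` and `from_xml_text` are illustrative assumptions.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quote characters are added here.
EXTRA = {"'": "&apos;", '"': "&quot;"}


def to_xml_text(raw):
    """Replace the five special characters by their entity references."""
    return escape(raw, EXTRA)


def from_xml_text(encoded):
    """Reverse to_xml_text(): turn entity references back into characters."""
    return unescape(encoded, {ref: char for char, ref in EXTRA.items()})


if __name__ == "__main__":
    encoded = to_xml_text("5 < 7 & 'x'")
    print(encoded)
    print(from_xml_text(encoded))
```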
Tag Names
● case sensitive
● must start with a letter or underscore
● must not start with the letters xml (in any upper/lower case combination)
● can contain: letters, digits, hyphens, underscores and periods
● cannot contain spaces
● apply common sense and a consistent style
● avoid: minus (-), period (.), colon (:), non-English characters for compatibility reasons
XML Element
● everything between the start and the end tag
● tags are included
● can contain:
- text
- attributes
- other elements
- a mix of all
● are extensible
XML Attributes
● values must be quoted: single or double quotes
● the unused quote character can be used inside the value
● there is no fixed rule for choosing between an attribute and an element, but:
- attributes cannot contain multiple values
- attributes cannot contain tree structures
- attributes are not easily expandable
● useful to store metadata, like element id, etc.
A Glimpse of Namespaces
● help to prevent tag name collisions between different authors / applications / domains
● implemented by the introduction of prefixes
● defined as an attribute: xmlns:prefix="URI"
● usage: <prefix:tagName>
● the URI only needs to be unique
● used to integrate other specifications, e.g. XSLT
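A short sketch of how prefixes keep two `table` elements apart, again with Python's standard `xml.etree.ElementTree`. The document, the prefixes `h` and `f`, and the `example.org` URIs are made up for illustration. As noted above, the URIs only serve as unique names and are never fetched.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different meanings of <table>, told apart by prefix.
DOC = """<root xmlns:h="http://example.org/html"
               xmlns:f="http://example.org/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

# Mapping from prefix to namespace URI, used when searching the tree.
NS = {"h": "http://example.org/html", "f": "http://example.org/furniture"}

root = ET.fromstring(DOC)
html_tables = root.findall("h:table", NS)
furniture_tables = root.findall("f:table", NS)

if __name__ == "__main__":
    # ElementTree stores the expanded name: {URI}localname
    print(html_tables[0].tag)
    print(furniture_tables[0].findtext("f:name", namespaces=NS))
```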
Levels of Correctness
● well formed: a document obeys the syntax rules:
- root element
- closing tag
- case sensitive
- properly nested
- attribute values quoted
● valid documents: in addition to being well formed they also conform to a document type definition (format specification)
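Well-formedness can be checked by simply trying to parse a document. The sketch below uses the standard `xml.etree.ElementTree` parser, which reports syntax violations as `ParseError`. It checks well-formedness only and does not test validity against a DTD or schema. The function name `is_well_formed` is an illustrative assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False on any syntax violation."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<a><b>text</b></a>",   # properly nested
        "<a><b>text</a></b>",   # wrong nesting
        "<a><B>text</b></a>",   # tag names differ in case
        "<a x=1/>",             # attribute value not quoted
        "<a/><b/>",             # more than one root element
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```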
Document Type Definitions
● two ways to specify a document structure:
● DTD: Document Type Definition
● XML Schema: XML based alternative to DTD
Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! &copyright;</body>
</note>
Example
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copyright "Copyright by ..">
]>
XML DTD
● referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">
● !DOCTYPE defines the root element
● !ELEMENT defines the structure of the elements
● #PCDATA means parsed character data (parseable text)
● !ENTITY defines special characters or strings
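To see `!ENTITY` in action, the note example can be parsed with the DTD placed inline in the document. This is only a sketch: Python's standard `xml.etree.ElementTree` expands entities declared in the internal DTD subset but does not validate against the `!ELEMENT` rules, so the child order is compared by hand here. The constant `EXPECTED_CHILDREN` is an illustrative assumption that mirrors the content model `(to,from,heading,body)`.

```python
import xml.etree.ElementTree as ET

# Note document with the DTD from the slides as an internal subset.
NOTE_XML = """<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copyright "Copyright by ..">
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! &copyright;</body>
</note>"""

# Hand-written stand-in for the content model (to,from,heading,body);
# this is not real DTD validation.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

note = ET.fromstring(NOTE_XML)
body_text = note.findtext("body")                 # &copyright; is expanded here
child_order = [child.tag for child in note]

if __name__ == "__main__":
    print(body_text)
    print(child_order == EXPECTED_CHILDREN)
```

For real validation a validating parser would be needed, which the standard library module used here does not provide.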
XML Schema
● alternative to DTD
<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
● support of data types and namespaces
● written in XML and extensible
Names and Other Complications
Amos Bairoch
taken from http://web.expasy.org/images/people/Amos_Bairoch.jpg
Ioannis Xenarios
taken from http://www.isb-sib.ch/people/Ioannis.Xenarios
History
1986 A. Bairoch created Swiss-Prot at the University of Geneva, since 1988 in collaboration with EMBL/EBI
1993 launch of ExPASy together with Ron Appel
1998 Foundation of SIB (Swiss Institute of Bioinformatics)
2002 Foundation of the UniProt consortium by EBI, SIB and PIR
UniProt Components:
● UniProtKB:
- UniProtKB/Swiss-Prot
- UniProtKB/TrEMBL
● UniParc: pure sequence archive, no annotations
● UniRef: consists of three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50) using the CD-HIT algorithm
● UniMES: data from metagenomic and environmental samples, not in UniProtKB
ExPASy
● http://www.expasy.org
● Expert Protein Analysis System (1993)
● now: SIB ExPASy Bioinformatics Resources Portal
● Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinformatics resource portal, Nucleic Acids Res, 40(W1):W597-W603, 2012.
Expasy Categories
● Proteomics
● Genomics
● Structural Bioinformatics
● Systems biology
● Phylogeny / evolution
Expasy Categories
● Population genetics
● Transcriptomics
● Biophysics
● Imaging
● Drug Design
Resource Description
1. Resource name and description
2. Maintaining SIB group
3. Scientific category
4. Keywords: a controlled vocabulary is used to tag the resource
Resource Description
5. URL for the web interface and for the download if available
6. Software type: website, command line interface, GUI, etc.
7. Status: green checkbox if currently available
UniProt/SwissProt Statistics
● Release 2016_05, May 11th
● taken from http://web.expasy.org/docs/relnotes/relstat.html
● 551.193 sequence entries (548.454 in 2015_05) / 196.822.649 amino acids (195.409.447 in 2015_05)
UniProt/SwissProt Statistics
● Growth over one year: 2016_5 vs 2015_5 (2015_5 values in parentheses)
Protein existence (PE) | Entries | %
1. Evidence at protein level | 92.536 (85.419) | 16.8 (15.6)
2. Evidence at transcript level | 57.757 (61.814) | 10.5 (11.3)
3. Inferred from homology | 387.589 (387.733) | 70.3 (70.7)
4. Predicted | 11.358 (11.526) | 2.1 (2.1)
5. Uncertain | 1.953 (1.962) | 0.4 (0.4)
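The percentage column follows directly from the entry counts. As a small worked check, the 2016_05 counts from the table can be summed and converted to shares of the total. The variable and function names are illustrative assumptions.

```python
# Entry counts per protein existence (PE) level, release 2016_05 (from the table).
ENTRIES_2016_05 = {
    "Evidence at protein level": 92536,
    "Evidence at transcript level": 57757,
    "Inferred from homology": 387589,
    "Predicted": 11358,
    "Uncertain": 1953,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    print(sum(ENTRIES_2016_05.values()))   # total number of entries
    for label, share in percentages(ENTRIES_2016_05).items():
        print(f"{label}: {share}%")
```

The counts add up to the 551.193 sequence entries reported for release 2016_05.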
Development
taken from http://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5
More Numbers (rel. 2015_5)
● Represented species: 13.209
● Top 20 species: 116.206 sequences, i.e. 21.3% of the total number of sequences
Entries | No of Species
1 | 5.495
2 | 1.899
3 | 1.023
4 | 657
5 | 487
6 | 399
7 | 289
8 | 228
9 | 214
10 | 122
11-20 | 711
21-50 | 426
51-100 | 213
>100 | 1.046
Species Representation (rel. 2015_5)
Top | Frequency | Species
1 | 20.198 | Homo sapiens (Human)
2 | 16.711 | Mus musculus (Mouse)
3 | 13.888 | Arabidopsis thaliana (Mouse-ear cress)
4 | 7.921 | Rattus norvegicus (Rat)
5 | 6.718 | Saccharomyces cerevisiae (Baker's yeast)
6 | 5.993 | Bos taurus (Bovine)
7 | 5.103 | Schizosaccharomyces pombe (Fission yeast)
8 | 4.433 | Escherichia coli K12
9 | 4.185 | Bacillus subtilis
10 | 4.131 | Dictyostelium discoideum (Slime mold)
... | ... | ...
Representation of the Divisions (rel. 2015_5)
Archaea (4%), 19340
Bacteria (61%), 332110
Eukaryota (33%), 180411
Viruses (3%), 16593
Distribution of Eukaryota (rel. 2015_5)
Human (11%), 20199
Other Mammalia (26%), 46146
Other Vertebrata (10%), 17823
Viridiplantae (20%), 36480
Fungi (17%), 31527
Insecta (5%), 8781
Nematoda (2%), 4417
Other (8%), 15038
Length Distribution (rel. 2015_5)
[Figure: sequence length distribution; axis ticks from 0 to 70000]
Amino Acid Composition (rel. 2015_5)
figure taken from http://web.expasy.org/docs/relnotes/relstat.html gray=aliphatic, red=acidic, green=small hydroxy, blue=basic, black=aromatic, white=amide, yellow=sulfur
SwissProt Annotation Process
● defined in http://www.uniprot.org/docs/sop_manual_curation.pdf
● explained in http://www.uniprot.org/help/manual_curation
Annotation Phases
1. Sequence curation
2. Sequence analysis
3. Literature curation
4. Family-based curation
5. Evidence attribution
6. Quality assurance, integration and update
Sequence Curation
● more than 95% are translated CDS from INSDC
● other sources: PDB, direct protein sequencing, projects not submitting to INSDC
● sequences are selected according to curation priorities (http://www.uniprot.org/program/)
● results in the “canonical sequence” for a gene/species pair
Steps toward the canonical sequence
● Entry selection
● Run BLAST similarity searches to identify additional sequences for the same gene
● Identify homologs by reciprocal BLAST and phylogeny based resources
● Lock selected entries for other curators to prevent duplication
Steps toward the canonical sequence
● Prepare sequence alignments with T-Coffee, Muscle, ClustalW
● Merge into the canonical sequence:
- most prevalent
- most similar to orthologous sequences found in other species
- based on length and amino acid composition, it allows the clearest description
- default: longest
● record conflicts and variations
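The last criterion, "default: longest", is simple enough to sketch in code. The example below covers only that fallback rule and ignores the other criteria (prevalence, similarity to orthologs, clarity of description), which require curator judgement. The function name and the candidate sequences are made up for illustration and are not part of the UniProt procedure.

```python
def pick_default_canonical(candidates):
    """Return the name of the longest candidate sequence.

    candidates maps a sequence name to its amino acid sequence.
    On a tie, the candidate listed first wins.
    """
    if not candidates:
        raise ValueError("no candidate sequences given")
    return max(candidates, key=lambda name: len(candidates[name]))


if __name__ == "__main__":
    # Hypothetical candidate sequences for one gene/species pair.
    isoforms = {
        "candidate_1": "MKTAYIAKQR",
        "candidate_2": "MKTAYIAKQRQISFVKSHFSRQ",
        "candidate_3": "MKTAYIAK",
    }
    print(pick_default_canonical(isoforms))
```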
Sequence Analysis
● Several analysis programs are applied to the sequences for:
- topological features
- post-translational modifications
- domains
● all results are manually checked and included in or excluded from the annotation
Topological Analysis
Tools | Prediction
SignalP | Presence and location of signal peptides
TargetP | Presence and location of transit peptides
Predotar | Mitochondrial, plastid or ER targeting sequences
ESKW | Transmembrane domains
MEMSAT | Transmembrane domains
TMHMM | Transmembrane domains
Phobius | Discriminates transmembrane and signal regions
Post-translational Modification Analysis
Tools | Prediction
GPI-predictor | GPI lipid anchor sites
NetNGlyc | N-glycosylation sites
NetOGlyc | O-glycosylation sites
NMT Predictor | N-terminal myristoylation sites
Sulfinator | Tyrosine sulfatation sites
Domain Analysis
Tools | Prediction
ps_scan | internal PROSITE profile, pattern and rule scanning
InterPro | retrieves non-PROSITE motif matches using InterPro database or InterProScan
Coils | Coiled-coil regions
polyAA | internal program which identifies homopolymeric stretches of amino acids
REPEAT | identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40
Automatically selected results are returned in a graphical interface which allows visualisation of the predictions (Figure 1). Selected features are shown in green and unselected features are shown in red. The selected/unselected state of a feature can be toggled by clicking on it.
Figure 1. UniProtKB sequence analysis results displayed in graphical interface
All predictions are manually reviewed and relevant results are selected for inclusion in the entry. The sequence analysis platform then transforms the selected features into UniProtKB annotation by applying a set of automatic annotation rules (Figure 2).
taken from http://www.uniprot.org/docs/sop_manual_curation.pdf
Literature Curation
● Identification of relevant scientific literature from
- literature and text mining resources (PubMed, EuropePMC, iHOP, TextPresso)
- additions from other sources made by the curator
● Information is extracted from the full text:
- general annotations (not position specific)
- position specific annotations
General Annotations
● http://www.uniprot.org/help/general_annotation
● position-independent
● contains mostly general biological information like: functions, catalytic activity, cofactor, enzyme regulation, subunit structure, pathway, ...
Sequence Annotations
● position dependent
● http://www.uniprot.org/help/sequence_annotation
● regions or sites of interest like post-translational modifications, binding sites, active sites, etc.
● contains several subsections: molecule processing, regions, sites, amino acid modifications, natural variants, experimental info, secondary structure
Family-based Curation
● Evaluation and curation of homologs as described above
● Standardization of annotation of homologs
● Propagation of annotation across the homologs to ensure consistency
Evidence Attribution
● Every annotation is attributed to its original source
● Every annotation can be traced back and evaluated
● To distinguish kinds of evidence, 7 codes from the Evidence Code Ontology (ECO) are used for manually curated entries
● http://www.uniprot.org/help/evidences
● Additional GO term annotation
done through the use of a subset of evidence codes from the Evidence Code Ontology (ECO) (24). There are seven ECO evidence codes used in manually curated entries as shown in Table 2.
Table 2. Evidence Code Ontology (ECO) codes used during the UniProt manual curation process
ECO code | Term name | Usage
ECO:0000269 | experimental evidence used in manual assertion | Information for which there is published experimental evidence
ECO:0000303 | non-traceable author statement used in manual assertion | Information based on author statements in scientific articles for which there is no experimental support
ECO:0000250 | sequence similarity evidence used in manual assertion | Information which has been propagated from a related experimentally characterised protein
ECO:0000312 | imported information used in manual assertion | Information which has been imported from another database and manually verified
ECO:0000305 | curator inference used in manual assertion | Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article
ECO:0000255 | match to sequence model evidence used in manual assertion | Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified
ECO:0000244 | combinatorial evidence used in manual assertion | Information which is manually curated based on a combination of experimental and computational evidence
Full details of the evidences used in UniProtKB are available at http://www.uniprot.org/manual/evidences.
4.11 GO annotation
Gene Ontology (GO) terms are assigned based on experimental data from the literature. Relevant terms are identified using the QuickGO (25) browser and are assigned to entries using the Protein2GO curation tool. This tool has been developed within the UniProt group and is used both by UniProt and by other members of the GO Consortium. GO terms are also propagated to homologous proteins where appropriate. The procedure is described in more detail at http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.
4.12 Quality control and integration
All finished entries are run through a series of automated checks which verify a large number of biological rules such as the positions and relevance of amino acids cited in the entry. Any reported errors are corrected. Once an entry has passed the automated checks, it undergoes manual review by a senior curator to ensure that all relevant sequences have been merged, that all relevant literature has been added, that the annotation has been added correctly, and that all relevant sequence analysis results have been included. Once an entry has passed the automated and manual quality control checks, it is integrated into the database.
4.13 Unlock finished entries
Integrated entries are unlocked so that they are available for further curation.
taken from http://www.uniprot.org/docs/sop_manual_curation.pdf
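The seven ECO codes of Table 2 lend themselves to a simple lookup table, for example when interpreting evidence tags while processing entries. The following is a minimal sketch. The codes and term names are copied from the table, while the names `ECO_TERMS` and `describe_evidence` and the fallback text are illustrative assumptions.

```python
# ECO codes used in manual curation, with term names as listed in Table 2.
ECO_TERMS = {
    "ECO:0000269": "experimental evidence used in manual assertion",
    "ECO:0000303": "non-traceable author statement used in manual assertion",
    "ECO:0000250": "sequence similarity evidence used in manual assertion",
    "ECO:0000312": "imported information used in manual assertion",
    "ECO:0000305": "curator inference used in manual assertion",
    "ECO:0000255": "match to sequence model evidence used in manual assertion",
    "ECO:0000244": "combinatorial evidence used in manual assertion",
}


def describe_evidence(code):
    """Return the term name for an ECO code, or a note if it is not in Table 2."""
    return ECO_TERMS.get(code, "not one of the seven manual curation codes")


if __name__ == "__main__":
    print(describe_evidence("ECO:0000269"))
    print(describe_evidence("ECO:0000000"))
```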
Quality Control and Integration
● Finished entries are run through a series of rule-based checks, concerning especially positions and regions
● All errors are corrected
● Each entry is manually reviewed by a senior curator
● Finally it is integrated into the database
● The finished entries are unlocked for further curation
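To make the idea of a rule-based position check concrete, here is a minimal sketch. It is not UniProt's actual rule set: the function name `check_residue` and the rule itself (a cited residue such as PHE-175 must match the sequence at that position) are assumptions chosen for illustration.

```python
# Three-letter to one-letter amino acid codes (the 20 standard residues).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def check_residue(sequence, residue, position):
    """Hypothetical rule: is `residue` (three-letter code) found at the
    1-based `position` of `sequence`?"""
    if not 1 <= position <= len(sequence):
        return False  # the cited position lies outside the sequence
    return sequence[position - 1] == THREE_TO_ONE[residue.upper()]
```

A curation pipeline would run many such rules and report every failure for correction before the manual review step.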
![Page 53: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/53.jpg)
Demonstration
● http://www.uniprot.org/uniprot/P62756#section_features
![Page 54: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/54.jpg)
The Swiss-Prot Flat File
● http://web.expasy.org/docs/userman.html
● An entry is composed of different line types
● Line types have their own format
● Follows the EMBL Nucleotide Sequence Database format as closely as possible
● 2 sections:
- core data (sequence data, citation info, taxonomy)
- annotations (function, modification, domains, secondary and quaternary structure, disease associations, conflicts, and so forth)
![Page 55: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/55.jpg)
The following table lists the available two-letter line codes. Each code is followed by three blanks.

| Line code | Content | Occurrence in an entry |
| --- | --- | --- |
| ID | Identification | Once; starts the entry |
| AC | Accession number(s) | Once or more |
| DT | Date | Three times |
| DE | Description | Once or more |
| GN | Gene name(s) | Optional |
| OS | Organism species | Once or more |
| OG | Organelle | Optional |
| OC | Organism classification | Once or more |
| OX | Taxonomy cross-reference | Once |
| OH | Organism host | Optional |

--continued--
![Page 56: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/56.jpg)
| Line code | Content | Occurrence in an entry |
| --- | --- | --- |
| RN | Reference number | Once or more |
| RP | Reference position | Once or more |
| RC | Reference comment(s) | Optional |
| RX | Reference cross-reference(s) | Optional |
| RG | Reference group | Once or more (optional if RA line) |
| RA | Reference authors | Once or more (optional if RG line) |
| RT | Reference title | Optional |
| RL | Reference location | Once or more |
| CC | Comments or notes | Optional |
| DR | Database cross-references | Optional |
| PE | Protein existence | Once |
| KW | Keywords | Optional |
| FT | Feature table data | Once or more in Swiss-Prot, optional in TrEMBL |
| SQ | Sequence header | Once |
| (blanks) | Sequence data | Once or more |
| // | Termination line | Once; ends the entry |
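Because every line starts with a two-letter code followed by three blanks, an entry can be split into its line types with plain string slicing. The sketch below assumes this fixed layout (content starts at column 6). The function name `group_by_line_code` and the toy entry are invented for illustration and are not a real record.

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        if line.startswith("//"):
            break  # termination line: ends the entry
        code, content = line[:2], line[5:].rstrip()
        if not code.strip():
            code = "sequence"  # sequence data lines start with blanks
        groups[code].append(content)
    return dict(groups)


# Toy entry (made up, only to exercise the function).
toy_entry = (
    "ID   TEST_HUMAN              Reviewed;          10 AA.\n"
    "AC   P12345;\n"
    "OS   Homo sapiens (Human).\n"
    "SQ   SEQUENCE   10 AA;\n"
    "     ACDEFGHIKL\n"
    "//\n"
)
print(group_by_line_code(toy_entry))
```

Keeping the raw content per line code makes it easy to hand each group to a dedicated parser for that line type.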
![Page 57: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/57.jpg)
Fields in More Detail
● ID line: ID   EntryName Status; SequenceLength.
● EntryName: up to 11 uppercase alphanumeric characters X_Y
- X is a mnemonic code of at most 5 alphanumeric characters
- Y is a mnemonic species identification code of at most 5 alphanumeric characters
● ID   CYC_BOVIN   Reviewed;   104 AA.
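The ID line layout above maps directly onto a regular expression. The following is a sketch under the naming rule stated on the slide (X and Y of at most 5 characters each); entry names that follow a different rule would be rejected. The names `ID_LINE` and `parse_id_line` are illustrative.

```python
import re

# ID   EntryName Status; SequenceLength.
ID_LINE = re.compile(
    r"^ID\s+(?P<name>[A-Z0-9]{1,5}_[A-Z0-9]{1,5})\s+"
    r"(?P<status>\w+);\s+(?P<length>\d+) AA\.$"
)


def parse_id_line(line):
    """Split an ID line into entry name, status and sequence length."""
    match = ID_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid ID line: {line!r}")
    return {
        "entry_name": match["name"],
        "status": match["status"],
        "length": int(match["length"]),
    }


print(parse_id_line("ID   CYC_BOVIN   Reviewed;   104 AA."))
```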
![Page 58: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/58.jpg)
● AC line: AC   AC_number_1;[ AC_number_2;]...[ AC_number_N;]
● Accession number: 6 or 10 characters

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [A-N,R-Z] | [0-9] | [A-Z] | [A-Z,0-9] | [A-Z,0-9] | [0-9] | | | | |
| [O,P,Q] | [0-9] | [A-Z,0-9] | [A-Z,0-9] | [A-Z,0-9] | [0-9] | | | | |
| [A-N,R-Z] | [0-9] | [A-Z] | [A-Z,0-9] | [A-Z,0-9] | [0-9] | [A-Z] | [A-Z,0-9] | [A-Z,0-9] | [0-9] |

● RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
● Examples: P12345, Q1AAA9, A0A022YWF9
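The regular expression from the slide can be used as-is to validate accession numbers. This sketch wraps it in two small helpers; the function names are illustrative. `fullmatch` is used so that extra characters before or after the accession are rejected.

```python
import re

# Pattern copied from the slide: 6-character or 10-character accessions.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text):
    """True if `text` is, as a whole, a well-formed accession number."""
    return ACCESSION.fullmatch(text) is not None


def parse_ac_line(line):
    """Return the accession numbers listed on one AC line."""
    body = line[2:].strip()
    return [token.strip() for token in body.split(";") if token.strip()]


for candidate in ["P12345", "Q1AAA9", "A0A022YWF9", "P1234"]:
    print(candidate, is_accession(candidate))
```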
![Page 59: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/59.jpg)
● DT line: date, DD-MMM-YYYY
● always one of the biweekly release dates
● always three lines:
- date of integration
- date of sequence version, sequence version X
- date of entry version, entry version X
● Example:
DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.
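A DT line can be turned into a proper date object with a small parser. The sketch below uses an explicit month table instead of locale-dependent parsing; the names `DT_LINE` and `parse_dt_line` are illustrative.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}

# DT   DD-MMM-YYYY, free-text note.
DT_LINE = re.compile(r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4}), (.+)\.$")


def parse_dt_line(line):
    """Return (date, note) for one DT line."""
    match = DT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid DT line: {line!r}")
    day, month, year, note = match.groups()
    return date(int(year), MONTHS[month], int(day)), note


print(parse_dt_line("DT   15-OCT-2000, sequence version 2."))
```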
![Page 60: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/60.jpg)
● DE lines:
- three categories and additional subcategories
- contains a recommended name
- besides: full name, short name, EC number
- alternative names: e.g. as an allergen or in biotechnology, ...
![Page 61: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/61.jpg)
DE   RecName: Full=Annexin A5;
DE            Short=Annexin-5;
DE   AltName: Full=Annexin V;
DE   AltName: Full=Lipocortin V;
DE   AltName: Full=Endonexin II;
DE   AltName: Full=Calphobindin I;
DE   AltName: Full=CBP-I;
DE   AltName: Full=Placental anticoagulant protein I;
DE            Short=PAP-I;
DE   AltName: Full=PP4;
DE   AltName: Full=Thromboplastin inhibitor;
DE   AltName: Full=Vascular anticoagulant-alpha;
DE            Short=VAC-alpha;
DE   AltName: Full=Anchorin CII;

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: Full=Filgrastim;
DE   AltName: Full=Lenogratim;
DE   Flags: Precursor;
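The DE examples above have a regular shape: a category (RecName or AltName) opens a name block, and Full=/Short= lines fill it. The sketch below parses only the constructs shown on this slide (RecName, AltName, Flags); other DE constructs are not handled. The function name `parse_de_lines` is illustrative.

```python
def parse_de_lines(lines):
    """Parse DE lines into a list of name blocks plus a list of flags."""
    names = []  # one dict per RecName/AltName block
    flags = []
    for line in lines:
        body = line[2:].strip().rstrip(";")
        if body.startswith("Flags:"):
            flags += [flag.strip() for flag in body[len("Flags:"):].split(";")]
            continue
        # A colon before the first '=' means a new category starts here.
        if ":" in body.split("=", 1)[0]:
            category, body = body.split(":", 1)
            names.append({"category": category.strip()})
        key, value = body.strip().split("=", 1)
        names[-1].setdefault(key, []).append(value)
    return names, flags


example = [
    "DE   RecName: Full=Granulocyte colony-stimulating factor;",
    "DE            Short=G-CSF;",
    "DE   AltName: Full=Pluripoietin;",
    "DE   Flags: Precursor;",
]
print(parse_de_lines(example))
```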
![Page 62: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/62.jpg)
● OS line: originating organism
● OS   Homo sapiens (Human).
● OS   Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis
OS   virus-RSA).
● OC lines: contain the taxonomic classification of the source organism according to the NCBI Taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/)
● OC   Node[; Node...].
● OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
OC   Homo.
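Both OS and OC values can wrap over several lines, so a parser first has to join the continuation lines. This minimal sketch does that and then splits the OC value into its taxonomy nodes; the function names are illustrative.

```python
def join_lines(lines):
    """Join continuation lines of one line type and drop the final period."""
    return " ".join(line[2:].strip() for line in lines).rstrip(".")


def parse_oc_lines(lines):
    """Return the taxonomic lineage as a list of nodes, root first."""
    return [node.strip() for node in join_lines(lines).split(";") if node.strip()]


oc = [
    "OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
    "OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;",
    "OC   Homo.",
]
print(parse_oc_lines(oc))
```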
![Page 63: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/63.jpg)
RN, RP, RC, RX, RG, RA, RT, RL
● can occur multiple times
● the order within a block is fixed
● e.g.:
RN   [1]
RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION
RP   WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL
RP   STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221.
RC   STRAIN=Bristol N2;
RX   PubMed=11134024; DOI=10.1074/jbc.M008990200;
RA   Zhang L., Wu S.-L., Rubin C.S.;
RT   "A novel adapter protein employs a phosphotyrosine binding domain and
RT   exceptionally basic N-terminal domains to capture and localize an
RT   atypical protein kinase C: characterization of Caenorhabditis elegans
RT   C kinase adapter 1, a protein that avidly binds protein kinase C3.";
RL   J. Biol. Chem. 276:10463-10475(2001).
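Since every reference block starts with an RN line, the reference lines of an entry can be split into blocks by watching for that code. The sketch below joins continuation lines per code and extracts the RX cross-references; the function names are illustrative.

```python
def split_references(lines):
    """Group R* lines into one dict per reference; each RN line starts a block."""
    references = []
    for line in lines:
        code, content = line[:2], line[5:].strip()
        if code == "RN":
            references.append({})
        if not references:
            raise ValueError("reference lines must start with an RN line")
        current = references[-1]
        # Continuation lines of the same code are joined with a space.
        current[code] = (current.get(code, "") + " " + content).strip()
    return references


def parse_rx(value):
    """Turn 'PubMed=...; DOI=...;' into a dictionary."""
    items = (item.strip() for item in value.split(";"))
    return dict(item.split("=", 1) for item in items if item)
```

For the example above, `parse_rx` applied to the RX value yields the PubMed identifier and the DOI as separate keys, which is what a literature lookup needs.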
![Page 64: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/64.jpg)
CC lines
● free text
● contains most of the annotated information
● CC   -!- TOPIC: First line of a comment block;
CC       second and subsequent lines of a comment block.
● structured by predefined topics like: Allergen, Alternative Products, ..., Cofactor, ..., Disease, ..., Domain, ..., Function, Interaction, ...
![Page 65: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/65.jpg)
CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative initiation; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=P51636-1; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=P51636-2; Sequence=VSP_018696;
CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO:0000250}; Peripheral
CC       membrane protein {ECO:0000250}. Secreted {ECO:0000250}. Note=The
CC       last 22 C-terminal amino acids may participate in cell membrane
CC       attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.
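Each comment block starts with `-!-` followed by the topic, and continuation lines belong to the block opened last. A minimal sketch of a topic splitter follows. The function name is illustrative, and stopping at a line of dashes (a trailing notice block) is an assumption about real files, not something shown on the slides.

```python
def parse_cc_blocks(lines):
    """Return a list of (topic, text) tuples, one per '-!-' comment block."""
    blocks = []
    for line in lines:
        content = line[2:].strip()
        if content.startswith("---"):
            break  # assumption: a dashed line starts a trailing notice block
        if content.startswith("-!-"):
            topic, _, text = content[3:].partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks:
            # Continuation line: append to the block opened last.
            blocks[-1][1] = (blocks[-1][1] + " " + content).strip()
    return [(topic, text) for topic, text in blocks]


cc = [
    "CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of",
    "CC       bovine dander.",
    "CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.",
]
print(parse_cc_blocks(cc))
```

Only the first colon separates topic from text, so a block such as `Isoform 2: Cytoplasm` keeps its inner colon.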
![Page 66: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/66.jpg)
Cross References
● too many to enumerate
● extensive references with nucleotide databases, e.g.:
in EMBL
FT   CDS             302..2674
FT                   /protein_id="CAA03857.1"
FT                   /db_xref="SWISS-PROT:P26345"
FT                   /gene="recA"
FT                   /product="RecA protein"
in Swiss-Prot
DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA.
DR   EMBL; X56491; CAA39846.1; ALT_FRAME; mRNA.
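A DR line is a semicolon-separated record whose first field names the target database. The sketch below splits one line accordingly. The function name is illustrative, and the meaning of the remaining fields depends on the database, so they are returned unparsed.

```python
def parse_dr_line(line):
    """Return (database, [remaining fields]) for one DR line."""
    fields = [field.strip() for field in line[2:].strip().rstrip(".").split(";")]
    return fields[0], fields[1:]


print(parse_dr_line("DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA."))
```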
![Page 67: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/67.jpg)
Key Words / Feature Table
● KW   Keyword[; Keyword...].
● helps to search and index the database
● no limit on the number of keywords:
KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;
KW   Apoptosis; Cell adhesion; Coated pits; Copper;
KW   Direct protein sequencing; Disease mutation; Endocytosis;
KW   Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding;
KW   Notch signaling pathway; Phosphorylation; Polymorphism;
KW   Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal;
KW   Transmembrane; Zinc.
● Feature table like GenBank/EMBL/DDBJ
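To illustrate how KW lines support searching and indexing, the sketch below parses keywords and builds a small inverted index from keyword to accession. The function names are illustrative, and the keyword assignments in the example are made up.

```python
from collections import defaultdict


def parse_kw_lines(lines):
    """Return the keywords listed on one or more KW lines."""
    joined = " ".join(line[2:].strip() for line in lines).rstrip(".")
    return [keyword.strip() for keyword in joined.split(";") if keyword.strip()]


def build_keyword_index(entries):
    """Map each keyword to the set of accessions that carry it.

    `entries` maps an accession to the KW lines of that entry.
    """
    index = defaultdict(set)
    for accession, kw_lines in entries.items():
        for keyword in parse_kw_lines(kw_lines):
            index[keyword].add(accession)
    return dict(index)


# Made-up keyword assignments, only to exercise the index.
toy_entries = {
    "P12345": ["KW   Copper; Iron;", "KW   Zinc."],
    "Q1AAA9": ["KW   Iron; Membrane."],
}
print(build_keyword_index(toy_entries))
```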
![Page 68: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/68.jpg)
Programmatic Access
● http://www.uniprot.org/help/programmatic_access (remember this link!)
● several use cases documented, but not as an API
● best way: use the web interface to construct/refine your query first before you try to automate the process
![Page 69: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/69.jpg)
Retrieving an Individual Entry
● uses a simple URL which can be bookmarked
● for individual entries: http://www.uniprot.org/uniprot/P12345
● default result is a web page
● alternative formats: txt, xml, rdf, fasta, gff
● specified via the accession suffix
● structured formats like xml or rdf can include referenced entries
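The URL scheme described above is simple enough to build in a few lines. The sketch below only constructs the URL and sends no request. It uses the URL pattern and format list from the slide; whether the service still answers at this address is not checked here. The function name `entry_url` is illustrative.

```python
BASE_URL = "http://www.uniprot.org/uniprot/"  # pattern taken from the slide
FORMATS = {"txt", "xml", "rdf", "fasta", "gff"}  # suffixes listed on the slide


def entry_url(accession, fmt=None):
    """Build the URL of one entry; `fmt` selects an alternative format."""
    if fmt is None:
        return BASE_URL + accession  # default result: a web page
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{BASE_URL}{accession}.{fmt}"


print(entry_url("P12345", "txt"))
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen` from the standard library.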
![Page 70: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/70.jpg)
Using the ID mapping service
● http://www.uniprot.org/help/programmatic_access#batch_retrieval_perl_example
● uses the HTTP POST method
● converts between different database IDs
● you have to know the specific abbreviation for the respective databases
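A POST request for the mapping service might be prepared as sketched below. Everything specific here is an assumption not given on the slide: the endpoint path `/mapping/`, the parameter names (`from`, `to`, `format`, `query`) and the database abbreviations used in the example. Check them against the documentation page linked above before use. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint path is not stated on the slide.
MAPPING_URL = "http://www.uniprot.org/mapping/"


def build_mapping_request(ids, source_db, target_db):
    """Prepare (but do not send) a POST request for the ID mapping service."""
    params = {
        "from": source_db,       # assumed parameter name
        "to": target_db,         # assumed parameter name
        "format": "tab",         # assumed parameter name and value
        "query": " ".join(ids),  # assumed: identifiers separated by blanks
    }
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(MAPPING_URL, data=urlencode(params).encode("utf-8"))


# "ACC" and "P_REFSEQ_AC" are assumed database abbreviations.
request = build_mapping_request(["P12345"], "ACC", "P_REFSEQ_AC")
print(request.get_method(), request.full_url)
```

Sending the form data in the request body rather than in the URL is what allows long identifier lists to be submitted.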
![Page 71: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/71.jpg)
Retrieving Entries via Queries
● uses the HTTP GET method, i.e. the query string is part of the URL
● structure might be quite complex
● use the browser to configure the query string
● more settings are available via the query builder http://www.uniprot.org/help/advanced_search
● the URL length might be limited to 1000 characters
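Because the query travels inside the URL, it must be URL-encoded and should stay below the length limit mentioned above. The sketch below does both. The parameter names `query`, `format` and `limit`, and the example query syntax, are assumptions that are not given on the slide. The 1000-character limit is the figure from the slide.

```python
from urllib.parse import urlencode

QUERY_URL = "http://www.uniprot.org/uniprot/"  # base pattern from the slides
MAX_URL_LENGTH = 1000  # limit mentioned on the slide


def build_query_url(query, fmt="tab", limit=None):
    """Build a GET URL for a query and refuse URLs that are too long."""
    params = {"query": query, "format": fmt}  # assumed parameter names
    if limit is not None:
        params["limit"] = limit  # assumed parameter name
    url = QUERY_URL + "?" + urlencode(params)
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"URL has {len(url)} characters, limit is {MAX_URL_LENGTH}")
    return url


# The query syntax below is an assumed example; build the real one in the browser.
print(build_query_url("organism:9606 AND reviewed:yes"))
```

Constructing the query in the web interface first and copying its query string into `query` follows the workflow the slides recommend.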
![Page 72: Bioinformacs Resources - Swissprot2016/05/13 · 2 16.711 Mus musculus (Mouse) 3 13.888 Arabidopsis thaliana (Mouse-ear cress) 4 7.921 Rattus norvegicus (Rat) 5 6.718 Saccharomyces](https://reader034.vdocuments.mx/reader034/viewer/2022042912/5f4674854ea01044921b7dd6/html5/thumbnails/72.jpg)
Examples
● http://www.uniprot.org/uniprot/P12345.txt
● http://www.uniprot.org/uniprot/P12345.xml
● http://www.uniprot.org/uniref/UniRef90_P04259.xml
● http://www.uniprot.org/uniref/UniRef90_P04259.rdf
● http://www.uniprot.org/uniref/UniRef90_P04259.fasta
● http://www.uniprot.org/uniref/UniRef90_P04259.tab