Bioinformatics for PřfUK 2001




Jan Pačes, Ústav molekulární genetiky (Institute of Molecular Genetics)

Jiří Vondrášek, Ústav organické chemie a biochemie (Institute of Organic Chemistry and Biochemistry)

http://bio.img.cas.cz/PrfUK2002


Databases: contents

• principles
• SQL
• biological sequence formats
• IUB codes
• DNA databases
• protein and genome databases
• structural databases


Organization of databases

Relational database

Articles:
c_id     identifier, number
title    text
journal  short text
year     date
…        …

Authors:
a_id     identifier
c_id     identifier
name     short text

Keywords:
k_id     identifier
c_id     identifier
keyword  short text
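The three tables are tied together only by the shared `c_id` column: one article row can have any number of author rows and keyword rows pointing at it. A minimal sketch of that one-to-many layout using Python's built-in `sqlite3` module. The table names `article` and `author` come from the following slides; the name `keyword`, the `REFERENCES` clauses, the integer `year` column and the sample keywords are assumptions added only for illustration:

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create the three tables in memory and store one article with two authors."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE article (c_id INTEGER PRIMARY KEY, title TEXT,
                              journal VARCHAR(30), year INTEGER);
        CREATE TABLE author  (a_id INTEGER PRIMARY KEY,
                              c_id INTEGER REFERENCES article(c_id),
                              name VARCHAR(30));
        -- 'keyword' is a hypothetical name for the third table on the slide
        CREATE TABLE keyword (k_id INTEGER PRIMARY KEY,
                              c_id INTEGER REFERENCES article(c_id),
                              keyword VARCHAR(30));
    """)
    con.execute("INSERT INTO article VALUES (1, 'Something absolutely fantastic', "
                "'Bioinformatics', 2002)")
    con.executemany("INSERT INTO author VALUES (?, 1, ?)",
                    [(1, "Paces, Jan"), (2, "Vondrasek, Jiri")])
    # Hypothetical keywords, only so that there is something to search for.
    con.executemany("INSERT INTO keyword VALUES (?, 1, ?)",
                    [(1, "SQL"), (2, "databases")])
    return con

def titles_with_keyword(con: sqlite3.Connection, word: str) -> list[str]:
    """Follow c_id from the keyword table back to the matching article titles."""
    rows = con.execute(
        "SELECT article.title FROM article "
        "JOIN keyword ON article.c_id = keyword.c_id "
        "WHERE keyword.keyword = ?", (word,))
    return [title for (title,) in rows]

con = build_demo_db()
print(titles_with_keyword(con, "SQL"))  # ['Something absolutely fantastic']
```

Adding another author or keyword to the same article only adds a row to `author` or `keyword`; the `article` row is never duplicated, which is the reason for splitting the data into separate tables.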


SQL: Structured Query Language

c_id     identifier, number
title    text
journal  short text
year     date
…        …

CREATE TABLE article (
    c_id    INTEGER,
    title   TEXT,
    journal VARCHAR(30),
    year    YEAR    -- MySQL YEAR type, since only the year (e.g. '2002') is stored
);


SQL: Structured Query Language

CREATE TABLE author (
    a_id INTEGER,
    c_id INTEGER,
    name VARCHAR(30)
);

a_id     identifier
c_id     identifier
name     short text


SQL: Structured Query Language

INSERT INTO article SET
    c_id = 1,
    title = 'Something absolutely fantastic',
    journal = 'Bioinformatics',
    year = '2002';

INSERT INTO author SET a_id = 1, c_id = 1, name = 'Paces, Jan';

INSERT INTO author SET a_id = 2, c_id = 1, name = 'Vondrasek, Jiri';


SQL: Structured Query Language

SELECT article.title, article.journal, author.name
FROM article, author
WHERE article.c_id = author.c_id
  AND article.year > '2000'
  AND author.name LIKE 'Paces%';
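The `INSERT ... SET` form used on the slides is MySQL syntax; SQLite and most other engines only accept `INSERT ... VALUES`. A sketch that replays the slide rows and the join query with Python's `sqlite3`, passing the year and the name pattern as `?` parameters instead of literals. The second article (c_id 2) is invented here purely to show the year filter at work, and `year` is stored as an integer for simplicity:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE article (c_id INTEGER, title TEXT, journal VARCHAR(30), year INTEGER);
    CREATE TABLE author  (a_id INTEGER, c_id INTEGER, name VARCHAR(30));
""")
# Rows from the slides, plus one hypothetical older article (c_id 2) for contrast.
con.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", [
    (1, "Something absolutely fantastic", "Bioinformatics", 2002),
    (2, "Hypothetical older paper", "Bioinformatics", 1998),
])
con.executemany("INSERT INTO author VALUES (?, ?, ?)", [
    (1, 1, "Paces, Jan"),
    (2, 1, "Vondrasek, Jiri"),
    (3, 2, "Paces, Jan"),
])

# Join on c_id, keep articles newer than the given year whose author matches the pattern.
QUERY = """
    SELECT article.title, article.journal, author.name
    FROM article, author
    WHERE article.c_id = author.c_id
      AND article.year > ?
      AND author.name LIKE ?
"""
rows = con.execute(QUERY, (2000, "Paces%")).fetchall()
print(rows)  # [('Something absolutely fantastic', 'Bioinformatics', 'Paces, Jan')]
```

The 1998 article is dropped by the year condition and the Vondrasek row by `LIKE`, so exactly one row remains.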


IUB codes

Nucleotides:
Code  Nucleotides  Complement
A     A            T
C     C            G
G     G            C
T     T            A
(U    U            A)
M     A, C         K
R     A, G         Y
W     A, T         W
S     C, G         S
Y     C, T         R
K     G, T         M
V     A, C, G      B
H     A, C, T      D
D     A, G, T      H
B     C, G, T      V
N     A, C, G, T   N
-     gap          -

Amino acids:
Code  3-letter  Amino acid
A     Ala       alanine
C     Cys       cysteine
D     Asp       aspartic acid
E     Glu       glutamic acid
F     Phe       phenylalanine
G     Gly       glycine
H     His       histidine
I     Ile       isoleucine
K     Lys       lysine
L     Leu       leucine
M     Met       methionine
N     Asn       asparagine
P     Pro       proline
Q     Gln       glutamine
R     Arg       arginine
S     Ser       serine
T     Thr       threonine
V     Val       valine
W     Trp       tryptophan
Y     Tyr       tyrosine
B     Asx       aspartic acid or asparagine
Z     Glx       glutamic acid or glutamine
X     Xxx       any amino acid
*     ---       stop
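Because every ambiguity code has a defined complement, a reverse complement reduces to a table lookup, and the amino acid table converts three-letter residue names (as used, for example, in PDB files) to one-letter codes. A minimal sketch that copies both tables into Python dictionaries; the function names are illustrative, not from any particular library:

```python
# Complement of each IUB nucleotide code, copied from the table above.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V", "N": "N", "-": "-",
}

# Three-letter to one-letter amino acid codes, also from the table above.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F", "Gly": "G",
    "His": "H", "Ile": "I", "Lys": "K", "Leu": "L", "Met": "M", "Asn": "N",
    "Pro": "P", "Gln": "Q", "Arg": "R", "Ser": "S", "Thr": "T", "Val": "V",
    "Trp": "W", "Tyr": "Y", "Asx": "B", "Glx": "Z", "Xxx": "X",
}

def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string that may contain IUB ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def to_one_letter(residues: list[str]) -> str:
    """Convert three-letter names in any case ('SER', 'Ser') to one-letter codes."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)

print(reverse_complement("AACGTRN"))             # NYACGTT
print(to_one_letter("SER HIS SER GLY".split()))  # SHSG
```

Note that `R` (A or G) becomes `Y` (C or T), exactly as in the complement column: an ambiguous position stays ambiguous after complementing.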


Sequence formats

• binary, with chromatograms: SCF, ALF, ABI
• text, minimal (for programs): plain text, FastA
• text, annotated (internal database formats): EMBL, GenBank, ASN, XML
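The split into binary chromatogram files and minimal or annotated text formats is usually visible in the first bytes of a file. A heuristic sketch, not a validator: the `.scf` and `ABIF` magic numbers are taken from the SCF and ABI file format descriptions, the text checks use the first-line conventions of the EMBL, GenBank, FastA and ASN examples that follow, and ALF files are not recognised:

```python
def guess_format(data: bytes) -> str:
    """Guess a sequence file format from its first bytes (heuristic only)."""
    # Binary chromatogram formats start with a fixed magic number.
    if data.startswith(b".scf"):
        return "SCF"
    if data.startswith(b"ABIF"):
        return "ABI"
    text = data.lstrip().decode("ascii", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith(">"):
        return "FastA"
    if first_line.startswith("ID   "):      # EMBL: two-letter line code + 3 spaces
        return "EMBL"
    if first_line.startswith("LOCUS"):
        return "GenBank"
    if first_line.startswith("Seq-entry"):  # NCBI ASN.1 text form
        return "ASN.1"
    if first_line.startswith("<?xml"):
        return "XML"
    return "plain text"

print(guess_format(b"LOCUS       AF145233   1360 bp    mRNA\n"))  # GenBank
```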


Sequence formats - SCF

SCF (Standard Chromatogram Format)


Sequence formats - EMBL

EMBL (EMBL database format)

ID   AF031150 standard; RNA; ROD; 1379 BP.
XX
AC   AF031150;
XX
SV   AF031150.1
XX
DT   27-FEB-1998 (Rel. 54, Created)
DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
XX
DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
XX
KW   .
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
RT   Isolation of full-length cDNA of mouse PAX4 gene and identification of its
RT   human homologue;
RL   Biochem. Biophys. Res. Commun. 243:628-633(1998).
XX
RN   [2]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
RT   ;
RL   Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
RL   Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
RL   Yamaguchi 755, Japan
XX
FH   Key             Location/Qualifiers
…
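Every EMBL line starts with a two-letter line code (ID identifier, AC accession, DE description, OS/OC organism and classification, RN/RP/RA/RT/RL reference number, positions, authors, title and location, FH/FT feature table, SQ sequence header), while `XX` lines are only spacers. A minimal sketch that groups the lines of an entry by code, assuming the usual layout with the data starting in column 6:

```python
from collections import defaultdict

def parse_embl_lines(text: str) -> dict[str, list[str]]:
    """Group the lines of an EMBL entry by their two-letter line code.

    The code sits in columns 1-2 and the data starts in column 6, so
    line[5:] is the value. 'XX' spacer lines, the '//' terminator and the
    indented sequence lines (blank code) are skipped.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        code = line[:2]
        if code in ("XX", "//") or not code.strip():
            continue
        fields[code].append(line[5:].rstrip())
    return dict(fields)
```

Multi-line fields come back as lists, so for example `" ".join(fields["OC"])` rebuilds the full taxonomic classification from its two `OC` lines.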


…
FH   Key             Location/Qualifiers
FH
FT   source          1..1379
FT                   /db_xref=taxon:10090
FT                   /organism=Mus musculus
FT                   /cell_line=MIN6
FT   CDS             297..1346
FT                   /codon_start=1
FT                   /gene=Pax4
FT                   /product=paired-box transcription factor
FT                   /protein_id=AAC40046.1
FT                   /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
FT                   SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
FT                   HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
FT                   GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
FT                   AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
FT                   CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
FT                   PSTHCSNWP
XX
SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
     aaaaaaaaaa aaaaagcggc cgctgaattc tagcagaagg ctgccctctg ctcctgagtg        60
     aaggctctgt gaagctctgg accccctggc aggactgaag cagctggagg ctgttacaag       120
     accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc       180
     ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc       240
     gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc       300
     agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggccggcccc       360
…
     gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac      1200
     cctactggga ctgccaatcc ctccttcctg tggcttcctc ctcatatgtg gaatttgcct      1260
     ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat      1320
     caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt       1379
//
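Feature locations such as `CDS 297..1346` are 1-based and inclusive at both ends, so this coding sequence is 1346 - 297 + 1 = 1050 nt long, i.e. 350 codons of which the last is the stop codon; the `CDS 211..1260` of the GenBank entry for the same gene gives the same length. A small sketch of that arithmetic, assuming a simple forward-strand `start..end` range. Other location forms allowed by EMBL/GenBank (`join(...)`, `complement(...)`, partial `<` and `>` ends) are not handled and raise `ValueError`:

```python
def cds_summary(location: str) -> tuple[int, int]:
    """Return (nucleotide length, protein length) for a plain 'start..end' CDS.

    Assumes one forward-strand range that ends with a stop codon.
    """
    start, end = (int(x) for x in location.split(".."))
    length = end - start + 1          # both ends are inclusive and 1-based
    if length % 3:
        raise ValueError(f"{location} is not a whole number of codons")
    return length, length // 3 - 1    # one codon less for the stop

print(cds_summary("297..1346"))  # (1050, 349)
```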


Sequence formats - GenBank

LOCUS       AF145233                1360 bp    mRNA            ROD       23-OCT-1999
DEFINITION  Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
ACCESSION   AF145233
VERSION     AF145233.1  GI:6102607
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
  TITLE     DNA binding and transactivating properties of the paired and
            homeobox protein Pax4
  JOURNAL   Biochem. Biophys. Res. Commun. 259 (3), 510-518 (1999)
  MEDLINE   99294619
  PUBMED    10364449
REFERENCE   2  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Paces,J. and Kozmik,Z.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-APR-1999) Dept. of Transcription Regulation,
            Institute of Molecular Genetics, Videnska 1083, Prague 142 20,
            Czech Republic
FEATURES             Location/Qualifiers
     source          1..1360
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
     gene            1..1360
                     /gene="Pax4"
     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
…


     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
                     /db_xref="GI:6102608"
                     /translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
                     RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
                     IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
                     APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
                     SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
                     SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
                     GGPGQVPSTHCSNWP"
BASE COUNT      359 a    381 c    328 g    292 t
ORIGIN
        1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
       61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
      121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
      181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
…
     1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
     1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
     1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
     1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
     1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
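In the GenBank flat file the sequence follows the `ORIGIN` line as numbered blocks of ten bases, and `BASE COUNT` summarises its composition. A minimal sketch that strips the position numbers and recomputes the counts, assuming a complete, well-formed record ending in `//`; on an excerpt with elided lines (marked `…`) the counts will not match the `BASE COUNT` line:

```python
from collections import Counter

def origin_to_sequence(record: str) -> str:
    """Return the bases listed between ORIGIN and the closing '//' of a GenBank entry."""
    blocks: list[str] = []
    inside = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            inside = True
        elif line.startswith("//"):
            break
        elif inside:
            # Each line is a 1-based position followed by blocks of ten bases.
            blocks.extend(tok for tok in line.split() if not tok.isdigit())
    return "".join(blocks)

def base_count(seq: str) -> dict[str, int]:
    """Count a, c, g and t, the four values reported on the BASE COUNT line."""
    counts = Counter(seq.lower())
    return {base: counts[base] for base in "acgt"}
```

For the full AF145233 entry, `base_count(origin_to_sequence(record))` should reproduce the 359/381/328/292 on the `BASE COUNT` line, whose values add up to the 1360 bp given on the `LOCUS` line.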


Sequence formats - FastA

>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGACCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCCAGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGCATGCAGCAGGACGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCGGCCCCTTCCTCTGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCTTAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAGTGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATGAGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAAGGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACTCAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAGGCCCCCACCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAAAGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGGCTGCTGCCACCTCTCTGCCTGAAGACACGGTGAGGGTTTGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGGAAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACAGCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTTGCCTGTGCTGGAACCATTGAGTCCTTCCTTCTGTCAGCTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACTGGGACTGCCAATCCCTCCTTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCACCCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAAGAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
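FastA keeps only a one-line header starting with `>` and the raw sequence, which is why almost every program accepts it. A minimal reader and writer sketch; the default line width of 60 is an arbitrary choice, and splitting the header on `|` (as in the usage line) assumes the NCBI `gi|...|gb|...` layout shown above:

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FastA text into (header, sequence) pairs; the header excludes '>'."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before any header are ignored
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def write_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record, wrapping the sequence every `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines) + "\n"

records = read_fasta(">gi|6102607|gb|AF145233.1|AF145233 Mus musculus PAX4\n"
                     "TGGCAGGACT\nGAAGCAG\n")
header, seq = records[0]
print(header.split()[0].split("|")[3], len(seq))  # AF145233.1 17
```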


Sequence formats - ASN

Seq-entry ::= set {
  class nuc-prot ,
  descr {
    title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds." ,
    source {
      org {
        taxname "Mus musculus" ,
        common "house mouse" ,
        db {
          {
            db "taxon" ,
            tag
              id 10090 } } ,
        orgname {
          name
            binomial {
              genus "Mus" ,
              species "musculus" } ,
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus" ,
          gcode 1 ,
          mgcode 2 ,
          div "ROD" } } } ,
    pub {
      pub {
        sub {
          authors {
            names
              std
…


Bioinformatic Links


GenBank


Swiss-Prot


Entrez

• Literature (PubMed)
• Nucleotide (GenBank)
• Protein (PIR)
• Genome
• Structure (PDB)
• PopSet
• Taxonomy
• OMIM
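The Entrez databases can also be queried from a program through NCBI's E-utilities web interface, which the slides do not cover. A sketch that only builds an `efetch` request URL for the mouse Pax4 mRNA (accession AF145233) used in the format examples; the base URL and the `db`, `id`, `rettype` and `retmode` parameters are those of the current E-utilities service and may differ from what was available in 2001:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(db: str, uid: str, rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for one record (no request is sent)."""
    params = {"db": db, "id": uid, "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)

print(efetch_url("nucleotide", "AF145233"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=AF145233&rettype=fasta&retmode=text
```

Opening the URL (for example with `urllib.request.urlopen`) would return the FastA record; `rettype="gb"` asks for the GenBank flat file instead. NCBI limits how many requests a client may send per second, so batch jobs should pause between calls.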



SRS



SRS - list



PDB



HEADER    GENE REGULATION/DNA                     22-APR-99   6PAX
TITLE     CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA
TITLE    2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA
TITLE    3 INTERACTIONS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HOMEOBOX PROTEIN PAX-6;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 BIOLOGICAL_UNIT: MONOMER;
COMPND   6 MOL_ID: 2;
COMPND   7 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND   8 CHAIN: B;
COMPND   9 ENGINEERED: YES;
COMPND  10 BIOLOGICAL_UNIT: MONOMER;
COMPND  11 MOL_ID: 3;
COMPND  12 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND  13 CHAIN: C;
COMPND  14 ENGINEERED: YES;
COMPND  15 BIOLOGICAL_UNIT: MONOMER
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 ORGANISM_COMMON: HUMAN;
SOURCE   4 GENE: PAX6;
SOURCE   5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   6 EXPRESSION_SYSTEM_STRAIN: BL21(DE3);
SOURCE   7 MOL_ID: 2;
SOURCE   8 SYNTHETIC: YES;
SOURCE   9 MOL_ID: 3;
SOURCE  10 SYNTHETIC: YES
KEYWDS    PAX, PAIRED DOMAIN, TRANSCRIPTION, PROTEIN-DNA INTERACTIONS,
KEYWDS   2 GENE REGULATION/DNA
EXPDTA    X-RAY DIFFRACTION
AUTHOR    H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
REVDAT   1   13-JUL-99 6PAX    0
JRNL        AUTH   H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
JRNL        TITL   CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED
JRNL        TITL 2 DOMAIN-DNA COMPLEX REVEALS SPECIFIC ROLES FOR THE
JRNL        TITL 3 LINKER REGION AND THE CARBOXY-TERMINAL SUBDOMAIN
JRNL        TITL 4 IN DNA BINDING
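Text records such as TITLE, COMPND or KEYWDS run over several lines, and from the second line on a continuation number precedes the text. A small sketch that joins such a record back together and reads the four-character ID code from HEADER, assuming the standard fixed columns (record name in columns 1-6, continuation number before column 11, text from column 11, ID code in columns 63-66); JRNL sub-records use a different layout and are not handled:

```python
def pdb_text_field(lines: list[str], record: str) -> str:
    """Join a continued PDB text record such as TITLE, COMPND or KEYWDS."""
    # Columns 1-6 hold the record name; the text starts at column 11 (index 10).
    parts = [line[10:].strip() for line in lines if line[:6].rstrip() == record]
    return " ".join(parts)

def pdb_id(header_line: str) -> str:
    """Return the four-character ID code stored in columns 63-66 of HEADER."""
    return header_line[62:66]

title_lines = [
    "TITLE     CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA",
    "TITLE    2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA",
    "TITLE    3 INTERACTIONS",
]
print(pdb_text_field(title_lines, "TITLE"))
```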


SEQRES   1 A  133  SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL
SEQRES   2 A  133  ASN GLY ARG PRO LEU PRO ASP SER THR ARG GLN ARG ILE
SEQRES   3 A  133  VAL GLU LEU ALA HIS SER GLY ALA ARG PRO CYS ASP ILE
SEQRES   4 A  133  SER ARG ILE LEU GLN VAL SER ASN GLY CYS VAL SER LYS
SEQRES   5 A  133  ILE LEU GLY ARG TYR TYR ALA THR GLY SER ILE ARG PRO
SEQRES   6 A  133  ARG ALA ILE GLY GLY SER LYS PRO ARG VAL ALA THR PRO
SEQRES   7 A  133  GLU VAL VAL SER LYS ILE ALA GLN TYR LYS GLN GLU CYS
SEQRES   8 A  133  PRO SER ILE PHE ALA TRP GLU ILE ARG ASP ARG LEU LEU
SEQRES   9 A  133  SER GLU GLY VAL CYS THR ASN ASP ASN ILE PRO SER VAL
SEQRES  10 A  133  SER SER ILE ASN ARG VAL LEU ARG ASN LEU ALA SER GLU
SEQRES  11 A  133  LYS GLN GLN
SEQRES   1 B   26    A   A   G   C   A   T   T   T   T   C   A   C   G
SEQRES   2 B   26    C   A   T   G   A   G   T   G   C   A   C   A   G
SEQRES   1 C   26    T   T   C   T   G   T   G   C   A   C   T   C   A
SEQRES   2 C   26    T   G   C   G   T   G   A   A   A   A   T   G   C
FORMUL   4  HOH   *84(H2 O1)
HELIX    1   1 ASP A   20  HIS A   31  1  12
HELIX    2   2 PRO A   36  LEU A   43  1   8
HELIX    3   3 ASN A   47  THR A   60  1  14
HELIX    4   4 PRO A   78  GLU A   90  1  13
HELIX    5   5 ALA A   96  SER A  105  1  10
HELIX    6   6 VAL A  117  GLU A  130  1  14
SHEET    1   A 2 SER A   3  VAL A   5  0
SHEET    2   A 2 VAL A  11  VAL A  13 -1  N PHE A  12   O GLY A   4
CRYST1   33.840   61.686  171.111  90.00  90.00  90.00 P 21 21 21    4
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.029551  0.000000  0.000000        0.00000
SCALE2      0.000000  0.016211  0.000000        0.00000
SCALE3      0.000000  0.000000  0.005844        0.00000
ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N
ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C
ATOM      3  C   SER A   1      -2.774 -13.282  83.373  1.00 59.35           C
ATOM      4  O   SER A   1      -3.734 -13.763  82.751  1.00 58.16           O
ATOM      5  CB  SER A   1      -1.638 -11.029  83.229  1.00 64.08           C
ATOM      6  OG  SER A   1      -2.862 -10.345  83.045  1.00 69.46           O
ATOM      7  H   SER A   1      -2.431 -11.538  80.917  1.00 40.00           H
ATOM      8  HG  SER A   1      -2.887  -9.549  83.596  1.00 40.00           H
ATOM      9  N   HIS A   2      -2.634 -13.393  84.701  1.00 59.45           N
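ATOM records use fixed column positions rather than separators, so the fields must be sliced by column; splitting on whitespace breaks as soon as a negative coordinate fills its whole field and touches its neighbour. A sketch that slices the columns of the PDB format description (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, B-factor 61-66) and measures the N-CA distance of Ser 1 from the ATOM lines of entry 6PAX; the `Atom` type and function names are illustrative:

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float

def parse_atom(line: str) -> Atom:
    """Parse one ATOM/HETATM record by its fixed PDB column positions."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20],
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        b_factor=float(line[60:66]),
    )

def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))

n = parse_atom("ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N")
ca = parse_atom("ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C")
print(f"{distance(n, ca):.2f}")  # 1.46
```

The result, about 1.46 Å, is the usual length of the N-CA bond in a protein backbone, which is a quick sanity check that the columns were sliced correctly.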


SCOP


PDBsum



CATH



FSSP - Fold classification


Structural genomics


Bioinformatics web portals

• EBI: http://www.ebi.ac.uk/Tools
• Expasy: http://www.expasy.ch
• Pasteur: http://bioweb.pasteur.fr
• Lyon: http://pbil.univ-lyon1.fr
• NCBI: http://ncbi.nlm.nih.gov


EBI


ExPASy


PBIL


Pasteur


Bioinformatic Links