Doug Raiford Lesson 3

Upload: diana-clarke

Post on 16-Jan-2016


Page 1: Doug Raiford Lesson 3.  More and more sequence data is being generated every day  Useless if not made available to other researchers

Doug Raiford
Lesson 3

Page 2:

More and more sequence data is being generated every day

Useless if not made available to other researchers

Page 3:

Not just sequence data
Many other biological experiments
  ▪ Expression
  ▪ NMR
  ▪ Mass spec
  ▪ Protein X-ray crystallography

Page 4:

With the data come scientific journal articles

Page 5:

Search tools
  ▪ Find similar genes in other organisms
  ▪ Find articles
  ▪ Find
Implemented algorithms
  ▪ Alignment
  ▪ Sequence assembly
  ▪ Protein structure prediction

Page 6:

National Center for Biotechnology Information (NCBI)
GenBank (accessed through NCBI)
  ▪ Sponsored by the National Institutes of Health (NIH)
RefSeq
  ▪ Derived from GenBank, curated, non-redundant

Page 7:

European Molecular Biology Laboratory (EMBL)

DNA Data Bank of Japan (DDBJ)

Page 8:

Protein Data Bank (PDB)
  ▪ PDB files: standardized format for viewers
Protein Information Resource (PIR)

Page 9:

Will revisit later
Can actually perform scientific analysis
  ▪ Color by charge
  ▪ Hydrophobicity
  ▪ Render surface

Page 10:

Entrez Global Query Cross-Database Search System
  ▪ Single source for searching publications, sequences, proteins, diseases, etc.
Whole Genome DB
Gene Expression Omnibus (GEO)
Online Mendelian Inheritance in Man (OMIM)
PubMed
Map of site

Page 11:

Practical Extraction and Report Language (Perl)
  ▪ Expansion of the name came later
Really good at string manipulation
  ▪ DNA and proteins are represented as strings
Scripting language
Almost all Unix and Linux systems come with it installed
Free download and install for Windows

Page 12:

Make a computer do what we want it to do
Program in a language
Machine language
  ▪ Low level: 1s and 0s
High-level programming languages
  ▪ C/C++
  ▪ Java
  ▪ Compiled into machine language
Very high-level languages
  ▪ Scripting
  ▪ Interpreted
  ▪ Perl lives here

Page 13:

Display something to the screen
Syntax and punctuation
Store something in a variable
Commenting the code
Some easy string manipulation

print "Hello World\n";
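The topics on this slide can be tried together in one short script. What follows is only a minimal sketch, not code from the lesson: the sequence and the variable names ($dna, $rna, $revcomp) are made up for illustration, and the string operations shown (length, tr///, reverse) are just one possible choice of "easy string manipulation".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Display something to the screen. Every statement ends with a semicolon.
print "Hello World\n";

# Store something in a variable. Scalar variable names start with $.
# The sequence below is a hypothetical example, not real data.
my $dna = "ATGCGTTAA";

# Some easy string manipulation.
my $length = length($dna);          # number of bases in the string

my $rna = $dna;                     # copy first so $dna is left unchanged
$rna =~ tr/T/U/;                    # replace every T with U

my $revcomp = scalar reverse $dna;  # reverse the string...
$revcomp =~ tr/ACGT/TGCA/;          # ...then swap each base for its complement

# Variables are interpolated inside double-quoted strings.
print "Sequence: $dna\n";
print "Length: $length\n";
print "RNA: $rna\n";
print "Reverse complement: $revcomp\n";
```

Lines starting with # are comments and are ignored by Perl. Saving the script under any name, for example hello.pl, and running it with perl hello.pl should print the greeting followed by the four labeled lines.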
