Login: BITseminar Pass: BITseminar2011

Upload: georgiana-mckinney

Post on 24-Dec-2015

BIOINFORMATICS

Bioinformatics
• Combination of:
  – Theory and methods (algorithms, statistical methods, machine learning, …)
  – Applications (sequence analysis, genome assemblies, databases, …)
  – Different kinds of datasets (sequence data, microarray, next-gen data, …)

Biology Core Concepts
• Molecular biology
• Systems biology
• Evolutionary theory
• Common lab techniques
• Sequence comparison
• Phylogenetic analysis

Computer science
• Programming
• Database querying
• Data mining
• Visualization
• Machine learning
• Modeling
• …

Data exceeds analysis

[Figure with labels "data" and "Bioinformatician"]

How to survive?
• Knowledge of Linux/Unix
• Scripting: Perl/Python
• Network-based data storage
• Knowledge of biology and genomics
• Database structures
• Try to keep up with all the new tools!

Benefit of using (Bio)Perl: an example
You have 1000 sequences to BLAST and analyse… You can do this manually, or… use a Perl script to do it for you and present the final results!
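As a minimal sketch of the "let a script do it" idea, written here in Python (the other scripting language named above) rather than Perl: assuming the sequences have already been searched with BLAST+ using tabular output (`-outfmt 6`, the 12 default columns), the hypothetical helper `best_hits` keeps the highest-scoring hit per query so the final results fit on one screen. The example rows are made up for illustration.

```python
# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: (subject_id, percent_identity, evalue)}, keeping the
    hit with the highest bitscore for each query (hypothetical helper)."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        row = line.split("\t")
        if len(row) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got: {line!r}")
        hit = dict(zip(FIELDS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][0]:
            best[query] = (score, hit["sseqid"], float(hit["pident"]),
                           float(hit["evalue"]))
    # Drop the bitscore that was only used for ranking
    return {q: v[1:] for q, v in best.items()}


if __name__ == "__main__":
    example = (
        "seq1\tgeneA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370\n"
        "seq1\tgeneB\t80.0\t150\t30\t1\t1\t150\t5\t154\t1e-30\t120\n"
        "seq2\tgeneC\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t185\n"
    )
    for query, (subject, pident, evalue) in sorted(best_hits(example).items()):
        print(query, subject, pident, evalue)
```

Ranking by bitscore rather than e-value is a design choice of this sketch; for hits against the same database the two usually agree.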

Good journals to keep up the pace
• Bioinformatics (http://bioinformatics.oxfordjournals.org/)
• BMC Bioinformatics (http://www.biomedcentral.com/bmcbioinformatics/)
• PLoS Computational Biology (http://www.ploscompbiol.org/)
• ...

DATABASES

Types of databases
• DNA databases
• Protein databases
• Genome databases
• Microarray databases
• Next-gen sequencing databases

What to find in databases?
• Sequences
• Motifs
• Mutations, SNPs
• Gene interaction profiles
• Interactions (protein-protein interactions)
• Transcription factor binding sites
• Etc.

Databases? A good reference
• http://nar.oxfordjournals.org (annual Database issue)

Amino acid databases
• UniProt
  – SWISS-PROT
  – TrEMBL
  – PIR

UniProt
• http://www.uniprot.org
• Good quality, curated
• Minimal redundancy
• Extensive cross-linking to useful databases

Structural databases
• Structure leads to function!
  – Protein Data Bank (PDB): http://www.pdb.org
  – SCOP & CATH databases (structural classification): http://scop.mrc-lmb.cam.ac.uk/scop/ ; http://www.cathdb.info/

Structure prediction (modeling)
• SWISS-MODEL & Repository (http://swissmodel.expasy.org/)
• MODELLER & MODBASE (http://salilab.org)
• Study of interactions (docking) & drug design

SNPs and pharma

• PharmGKB (http://www.pharmgkb.org/): aims to collect, encode, and disseminate knowledge about the impact of human genetic variation on drug response.

DNA Microarray Databases
• Standard: MIAME = Minimum Information About a Microarray Experiment
• Databases:
  – ArrayExpress (EBI): http://www.ebi.ac.uk/arrayexpress/
  – GEO (NCBI): http://www.ncbi.nlm.nih.gov/geo/

Check the database before planning an experiment!

GENOME BROWSERS

Human reference sequences
• Celera
• HuRef
• GRCh37

Three reference genomes: keep this in mind when browsing databases!

Useful Genome Browsers
• Ensembl: http://www.ensembl.org/
• NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?
• UCSC: http://genome.ucsc.edu/

Genome browser: Ensembl

EMBL Problems
• Lots of redundancy
• Wrong or old annotations
• Vector contamination
• Errors in sequences

RefSeq
• Better option, NCBI reference
• Curated
• Annotations are controlled
• No redundancy

NCBI: GenBank vs RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/)

• Sequence records are created by scientists who submit sequence data to GenBank. As an archival database, GenBank may contain hundreds of records for the same gene. In addition, because there is no independent review system, the types of information may vary from record to record, and GenBank sequence data may contain errors and contaminant vector DNA.

• To address some of the problems associated with GenBank sequence records, NCBI developed its RefSeq database.

RefSeq accession numbers
• NM_ mRNA (provisional, predicted, reviewed)
• NP_ protein (provisional, predicted, reviewed)
• NR_ non-coding RNA (provisional, reviewed)
• NG_ human genes (provisional, reviewed)
• NC_ chromosomes, complete genomes (provisional, reviewed)

RefSeq accession numbers (2)
• XM_ predicted mRNA (model)
• XP_ predicted protein (model)
• XR_ predicted non-coding RNA (model)
• NT_ human and mouse genomic contigs (model)
• NW_ mouse supercontigs (model)
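Because the prefix alone tells you what kind of record an accession points to, a script can sort a mixed list of RefSeq IDs before doing anything else. A minimal Python sketch, assuming accessions of the form PREFIX_NUMBER with an optional .VERSION suffix; the mapping simply copies the slide descriptions above, and `classify_refseq` is a hypothetical helper name. RefSeq uses further prefixes that are not on these slides, and this sketch rejects them.

```python
# Prefix meanings copied from the two accession-number slides
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein",
    "NR": "non-coding RNA",
    "NG": "human genes",
    "NC": "chromosomes, complete genomes",
    "XM": "predicted mRNA (model)",
    "XP": "predicted protein (model)",
    "XR": "predicted non-coding RNA (model)",
    "NT": "human and mouse genomic contigs (model)",
    "NW": "mouse supercontigs (model)",
}


def classify_refseq(accession):
    """Return the record type for an accession such as 'NM_000518.5'.

    Raises ValueError if the text is not PREFIX_DIGITS[.VERSION] with one of
    the prefixes listed above (hypothetical helper)."""
    prefix, sep, rest = accession.strip().partition("_")
    number = rest.split(".")[0]  # drop an optional version suffix
    if not sep or prefix not in REFSEQ_PREFIXES or not number.isdigit():
        raise ValueError(f"not a recognised RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


if __name__ == "__main__":
    for acc in ["NM_000518.5", "XP_011520000", "NC_000001.11"]:
        print(acc, "->", classify_refseq(acc))
```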

Genome browser: NCBI

SNPS AND DISEASE RESEARCH

SNPs and disease research
• Association analysis, disease related (?), mapping genome variation…
• Reference = dbSNP database

Example NCBI SNP database, SNP rs33957964

Other useful SNP databases
• Genome Variation Server: http://gvs.gs.washington.edu/GVS/
• HapMap (Ensembl): http://hapmap.org/
• List of all: http://www.hgvs.org/dblist/ccent.html

Clinical Bioinformatics

• Microarrays, omics data (genomics, proteomics, interactomics, metabolomics, …)

• Combination of bioinformatics and medical informatics

ALGORITHMS AND TOOLS

Algorithms
• Foundations of bioinformatics tools
  – Implemented in ‘front end tools’ (websites, Java applications)
    • Can be slow
    • Good for smaller analyses, quick mining
  – Scripts and programs run from the command line (e.g. local BLAST)
    • Usually installed locally on a server
    • Faster
    • Suited to large queries and long analysis times
    • Knowledge of Linux/Unix essential
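The command-line route can be scripted as well. A minimal Python sketch, assuming NCBI BLAST+ is installed and a BLAST database has already been built (for example with `makeblastdb`); the file and database names, the e-value cut-off and the thread count are illustrative assumptions, and `blastn_command` / `run_blastn` are hypothetical helper names. Only standard BLAST+ options (`-query`, `-db`, `-evalue`, `-outfmt`, `-num_threads`, `-out`) are used.

```python
import shutil
import subprocess


def blastn_command(query_fasta, database, out_file, evalue=1e-5, threads=4):
    """Build the argument list for a local blastn search with tabular
    output (-outfmt 6). All names passed in are placeholders."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", out_file,
    ]


def run_blastn(query_fasta, database, out_file):
    """Run blastn if it is on the PATH; return True on success."""
    if shutil.which("blastn") is None:
        print("blastn not found on PATH; install BLAST+ first")
        return False
    # check=True raises CalledProcessError if blastn exits with an error
    subprocess.run(blastn_command(query_fasta, database, out_file), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical input, database and output names
    print(" ".join(blastn_command("my_sequences.fasta", "my_db", "hits.tsv")))
```

Building the argument list as a Python list (rather than one shell string) avoids quoting problems with file names and makes the command easy to log or test before it runs for hours on a server.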

Hall of Fame
• Linux operating system, MySQL database
• (Bio)Perl: programming language making your life easier!
• BLAST/BLAT: comparing sequences
• PHYLIP: phylogenetic analysis, tree building
• ClustalW: multiple alignment
• MEGA5: multiple alignment and sequence editing
• HMMER: sequence homology searches with profile hidden Markov models
• EMBOSS: combines many tools for sequence analysis

Open source: free to use and develop

Tools? A good reference
• http://nar.oxfordjournals.org/ (annual Web Server issue)

Analysing next-gen sequencing data
• Different tools for different formats
  – Roche
  – Applied Biosystems
  – Illumina

Next-gen tools
• FastQC: quality assessment of FASTQ files
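FastQC itself is a stand-alone program. The following is only a toy Python sketch of one of its checks (mean quality at each read position). It assumes uncompressed 4-line FASTQ records with Phred+33 quality encoding (Sanger / Illumina 1.8+; older Illumina pipelines used an offset of 64), and the function names are hypothetical.

```python
def read_fastq(text):
    """Yield (read_id, sequence, quality) tuples from 4-line FASTQ text."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected complete 4-line FASTQ records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ for {header}")
        yield header[1:], seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred quality at each read position across all records."""
    totals, counts = [], []
    for _, _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset  # decode ASCII to Phred score
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    example = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\n##II\n"
    print(per_base_mean_quality(read_fastq(example)))
    # prints [21.0, 21.0, 40.0, 40.0]
```

The records are parsed strictly in groups of four lines because a quality string may itself start with "@", so searching for "@" to find the next record is unreliable.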

Assembly tools next gen
• A number of specialized tools exist: ABySS, gap4, Geneious, Mira, Newbler, SSAKE, SOAPdenovo, Velvet, …
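Several of the assemblers listed (for example Velvet, ABySS and SOAPdenovo) are built around de Bruijn graphs. The toy Python sketch below, with hypothetical function names, shows the core idea: each k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and following unbranched paths spells out a contig. Real assemblers also handle reverse complements, sequencing errors, coverage and repeats, none of which is attempted here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it,
    using every k-mer found in the reads as an edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Follow edges from `start` while the current node has exactly one
    successor, appending one base per step. Stops at branches, dead ends
    and already-visited nodes (to avoid looping on cycles)."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ACGTC", "GTCAG"]  # two overlapping toy reads
    graph = de_bruijn_graph(reads, k=4)
    print(walk_unambiguous(graph, "ACG"))  # prints ACGTCAG
```

This walk only checks that each node has a single successor; a proper unitig, as used by real assemblers, would also require each node to have a single predecessor.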

Galaxy! http://galaxy.psu.edu/

• Galaxy provides a web-based application for the analysis of sequence data

• Includes many tools, including tools for NGS data
• Makes your life easier: less Linux knowledge needed

On the cloud

Structure Galaxy

So this is why you need a bioinformatician in the lab!!