blast bioinformatics

BLAST By Harpreet Singh Kalsi Hans Raj College

Uploaded by: sardar-harpreet-kalsi

Posted on 26-Jun-2015


TRANSCRIPT

Page 1: blast bioinformatics

BLAST By

Harpreet Singh Kalsi, Hans Raj College

Page 2: blast bioinformatics

BIOINFORMATICS: Bioinformatics is an emerging field of science that uses computer technology to store, retrieve, manipulate and distribute information related to biological data, specifically DNA, RNA and proteins.

DATABASE: Databases are repositories in which biological data is stored in computer-readable form. Databases are classified on various bases, such as data type, data source and organism.

TOOLS: Tools are software developed to perform various tasks on the stored data, such as searching, analysis, submission and annotation.

RESIDUE: The term stands for the building blocks of the macromolecules in the databases, for example nucleotides for DNA and RNA and amino acids for proteins.

IMPORTANT TERMS

Page 3: blast bioinformatics

BRIEF CLASSIFICATION OF DATABASES

On basis of Data Source

Primary Databases

Secondary Databases

Special Categories

Composite Database

Integrated Database

On basis of Data Type

Genome Databases

Sequence Databases

Structure Databases

Microarray Databases

Chemical Databases

Metabolic Databases

Enzyme Databases

Disease Databases

Literature Databases

Taxonomy Database

Page 4: blast bioinformatics

IMPORTANT DATABASES

NCBI (integrated database)

EMBL (nucleotide database)

DDBJ (nucleotide database)

GenBank (nucleotide database)

SWISS-PROT (protein database)

OMIM (disease database)

PDB (structure database)

KEGG (metabolic database)

PubMed (literature database)

Enzymes (enzyme database)

PANDIT (taxonomy database)

ArrayExpress (microarray database)

IMPORTANT TOOLS

BLAST (search and homology tool)

FASTA (search and homology tool)

BankIt (submission tool)

Sequin (submission tool)

ORF Finder (analysis tool)

TXSearch (retrieval tool for taxonomy database)

SAKURA (submission tool in DDBJ)

ClustalW (multiple sequence alignment)

MSDFold (protein secondary structure comparison tool)

Page 5: blast bioinformatics

BLAST stands for Basic Local Alignment Search Tool

BLAST is a program that uses substitution scoring matrices (such as PAM or BLOSUM) to perform sequence-similarity searches against a variety of sequence databases and report high-scoring segments between related sequences. The original algorithm reported only ungapped segments; current versions also produce gapped alignments.
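To make the role of a scoring matrix concrete, the Python sketch below scores a gap-free pair of segments by summing substitution scores. It is a toy: it holds only a handful of BLOSUM62 entries (any other residue pair raises KeyError), and the names `BLOSUM62_SUBSET`, `pair_score` and `ungapped_score` are hypothetical, not part of any BLAST interface.

```python
# A handful of BLOSUM62 entries. The matrix is symmetric, so each pair is stored once.
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6, ("E", "E"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2, ("K", "E"): 1, ("K", "D"): -1,
    ("R", "D"): -2, ("R", "E"): 0, ("I", "K"): -3, ("L", "D"): -4,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair in either order; pairs not in the subset raise KeyError."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def ungapped_score(seg1: str, seg2: str) -> int:
    """Sum of substitution scores over two aligned, gap-free segments."""
    if len(seg1) != len(seg2):
        raise ValueError("ungapped segments must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seg1, seg2))
```

For example, `ungapped_score("IKDE", "LRED")` gives 8 although no position is identical, because every pair (I/L, K/R, D/E, E/D) is a conservative substitution that the matrix rewards; `ungapped_score("IKDE", "IKDE")` gives 20.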

Complex: requires multiple steps and many parameters

The BLAST algorithm is fast, accurate, and web-accessible

Faster than exhaustive sequence-similarity search methods such as the Smith-Waterman algorithm

Provides different programs for different types of analysis

BLAST

Page 6: blast bioinformatics

Program    Query      Database   Frame combinations
blastn     DNA        DNA        1
blastp     protein    protein    1
blastx     DNA        protein    6
tblastn    protein    DNA        6
tblastx    DNA        DNA        36

TYPES OF BLAST ALGORITHM

Continued
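The program table can be encoded directly as data. In this minimal Python sketch (the class and function names are hypothetical), the frame count is derived rather than stored: each side that is translated contributes six reading frames, which is why tblastx compares 6 × 6 = 36 frame combinations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastProgram:
    name: str
    query_type: str            # molecule type of the submitted query
    database_type: str         # molecule type stored in the database
    translate_query: bool      # is the query translated into six frames?
    translate_database: bool   # is the database translated into six frames?

    @property
    def frames(self) -> int:
        """Frame combinations compared for each query/database sequence pair."""
        return (6 if self.translate_query else 1) * (6 if self.translate_database else 1)


PROGRAMS = [
    BlastProgram("blastn", "DNA", "DNA", False, False),
    BlastProgram("blastp", "protein", "protein", False, False),
    BlastProgram("blastx", "DNA", "protein", True, False),
    BlastProgram("tblastn", "protein", "DNA", False, True),
    BlastProgram("tblastx", "DNA", "DNA", True, True),
]


def programs_for(query_type: str, database_type: str) -> list[str]:
    """Names of the programs that accept this query/database combination."""
    return [p.name for p in PROGRAMS
            if p.query_type == query_type and p.database_type == database_type]
```

For instance, `programs_for("DNA", "DNA")` lists both blastn and tblastx, which is the choice between comparing nucleotides directly and comparing translated products.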

Page 7: blast bioinformatics

blastn compares a DNA query sequence against a DNA database, allowing for gaps

blastp compares a protein query sequence against a protein database, allowing for gaps

blastx compares a DNA query sequence translated into six reading frames against a protein database, allowing for gaps

tblastn compares a protein query sequence against a DNA database translated into six reading frames, allowing for gaps

tblastx compares a DNA query sequence translated into six reading frames against a DNA database translated into six reading frames. tblastx doesn’t allow for gaps.

TYPES OF BLAST ALGORITHM (CONT.)
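Translation into six reading frames, used by blastx, tblastn and tblastx, can be sketched as follows. This assumes the standard genetic code (NCBI translation table 1), labels the frames +1 to +3 on the given strand and -1 to -3 on its reverse complement (a naming convention chosen here for illustration), and translates only complete codons; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop codon, 'X' an ambiguous one."""
    return "".join(CODON_TABLE.get(seq[k:k + 3], "X") for k in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_translations(dna: str) -> dict[str, str]:
    """Protein translations of the three forward and three reverse reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For `"ATGAAA"` this yields MK in frame +1 and FH in frame -1 (the reverse complement is TTTCAT). A translated search then aligns each of these protein strings, so a DNA query becomes six protein queries.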

Page 8: blast bioinformatics

MEGABLAST - for comparison of large sets of long DNA sequences

RPS-BLAST - Conserved Domain Detection

BLAST 2 Sequences - for performing pair-wise alignments for 2 chosen sequences

Genomic BLAST - for alignments against select human, microbial or malarial genomes

PSI-BLAST - builds a position-specific scoring matrix from a multiple alignment of the matches and uses it to search again, iteratively

PHI-BLAST - specify a pattern that hits must match

SPECIAL ALGORITHM OF BLAST
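PHI-BLAST patterns are written in PROSITE syntax (for example `C-x(2,4)-C`). As a rough illustration of what such a pattern means, the sketch below converts the common PROSITE elements (single residues, `x`, `[...]`, `{...}`, repeat counts and the `<`/`>` terminal anchors) into a Python regular expression. `prosite_to_regex` is a hypothetical helper, not how PHI-BLAST itself is implemented, and it rejects elements it does not understand.

```python
import re

# One PROSITE element: optional '<', a residue / 'x' / [allowed] / {forbidden},
# an optional repeat count such as (3) or (2,4), and an optional '>'.
_ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            atom = "."                       # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            atom = core                      # single residue or [..] class
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + atom + ("$" if c_term else ""))
    return "".join(parts)
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")` returns `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, which `re.search` can then test against a protein sequence.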

Page 9: blast bioinformatics

Make specific primers with Primer-BLAST

Search trace archives

Find conserved domains in your sequence (cds)

Find sequences with similar conserved domain architecture (cdart)

Search sequences that have gene expression profiles (GEO)

Search immunoglobulins (IgBLAST)

Search using SNP flanks

Screen sequence for vector contamination (vecscreen)

Align two (or more) sequences using BLAST (bl2seq)

Search protein or nucleotide targets in PubChem BioAssay

Search SRA transcript and genomic libraries

Constraint Based Protein Multiple Alignment Tool

Needleman-Wunsch Global Sequence Alignment Tool

Search RefSeqGene

SPECIAL TYPES OF BLAST

http://blast.ncbi.nlm.nih.gov/Blast.cgi
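The Needleman-Wunsch tool listed above performs global alignment, in contrast to the local alignments BLAST reports. A compact Python sketch of the dynamic-programming recurrence with a traceback, assuming a simple linear scoring scheme (match +1, mismatch -1, gap -1) rather than any particular NCBI default; the function name is hypothetical:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment score plus one optimal alignment (linear gap penalty)."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

`needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap. Because every residue of both sequences must be aligned, the score covers the whole length, unlike a BLAST HSP.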

Page 10: blast bioinformatics

How BLAST works is fairly complicated; in brief, it works in the following two steps:

1. BLAST first breaks the query sequence into short words of a given length (W). Using a substitution matrix, it then looks for words in the database sequences (“target sequences”) that score at least “T” when aligned with a query word.

2. For every pair of sequences (query and target) that have a word or words in common, BLAST extends the alignment in both directions to find alignments that score greater (are more similar) than a certain score threshold (S). These alignments are called high-scoring segment pairs (HSPs); the maximal scoring HSPs are called maximal segment pairs (MSPs).

HOW BLAST WORKS?
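The two steps above can be sketched in Python for a DNA query. This is a teaching toy under stated assumptions: a match/mismatch scheme of +1/-2 stands in for the substitution matrix, every word scoring at least T against a query word is generated by brute force, and extension is ungapped with an X-drop rule (stop once the running score falls more than `x_drop` below the best seen). W, T and S map to `w`, `t` and `s`; all names and default values are hypothetical, not BLAST's own.

```python
from itertools import product

ALPHABET = "ACGT"
MATCH, MISMATCH = 1, -2  # assumed toy scores standing in for a substitution matrix


def pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def word_score(u: str, v: str) -> int:
    """Ungapped score of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(u, v))


def neighborhood(word: str, t: int) -> list[str]:
    """Step 1: every word of the same length that scores at least t against `word`."""
    candidates = ("".join(c) for c in product(ALPHABET, repeat=len(word)))
    return [w for w in candidates if word_score(word, w) >= t]


def index_words(target: str, w: int) -> dict[str, list[int]]:
    """Map each length-w word of the target to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for j in range(len(target) - w + 1):
        index.setdefault(target[j:j + w], []).append(j)
    return index


def extend(query: str, target: str, i: int, j: int, w: int,
           x_drop: int) -> tuple[int, int, int, int]:
    """Step 2: ungapped extension of the seed query[i:i+w] / target[j:j+w].

    Returns (score, query_start, query_end, target_start); query_end is exclusive.
    """
    seed = word_score(query[i:i + w], target[j:j + w])
    best_right = run = right = k = 0
    while i + w + k < len(query) and j + w + k < len(target):
        run += pair_score(query[i + w + k], target[j + w + k])
        k += 1
        if run > best_right:
            best_right, right = run, k
        elif best_right - run > x_drop:
            break
    best_left = run = left = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        run += pair_score(query[i - k - 1], target[j - k - 1])
        k += 1
        if run > best_left:
            best_left, left = run, k
        elif best_left - run > x_drop:
            break
    return seed + best_left + best_right, i - left, i + w + right, j - left


def find_hsps(query: str, target: str, w: int = 4, t: int = 4, s: int = 8,
              x_drop: int = 4) -> list[tuple[int, int, int, int]]:
    """Distinct HSPs scoring at least s, best first."""
    index = index_words(target, w)
    hsps = set()
    for i in range(len(query) - w + 1):
        for word in neighborhood(query[i:i + w], t):
            for j in index.get(word, []):
                hsp = extend(query, target, i, j, w, x_drop)
                if hsp[0] >= s:
                    hsps.add(hsp)
    return sorted(hsps, reverse=True)
```

With the defaults (`t` equal to `w`, so only exact word matches qualify), `find_hsps("ACGTACGGTTCA", "TTTACGTACGGTTCAGGG")` returns `[(12, 0, 12, 3)]`: one HSP covering the whole query, starting at target position 3. Lowering `t` to 1 also admits words with one mismatch, giving more seeds (more sensitivity) at the cost of more extension work.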

Page 11: blast bioinformatics

HOW BLAST WORKS? PICTORIAL REPRESENTATION

Query Sequence

“words” (subsequences of the query sequence)

Query words are compared to the database (target sequences) and exact matches identified

For each word match, alignment is extended in both directions to find alignments that score greater than some threshold (maximal segment pairs, or MSPs)

(Schneider and La Rota 2000)

Page 12: blast bioinformatics

BLAST can answer many questions that commonly arise in the research laboratory. Some of the most common are:

Which bacterial species have a protein that is related to a protein whose amino-acid sequence I know?

Where does the DNA I’ve sequenced come from?

What other genes encode proteins that exhibit structures similar to the one I’ve just determined?

What does the protein structure look like?

What is the function of the gene or the protein that I've sequenced? (If it’s not known, then you have some work to do.)

What are the probable functions of the sequence I have?

APPLICATION IN MEDICAL BIOTECHNOLOGY

CONTINUED

Page 13: blast bioinformatics

To answer such questions, we use BLAST to search a database and then analyse the results it produces. The following example illustrates this.

Suppose we have the following protein sequence from our experiments with Mycobacterium tuberculosis. Sequence:

To see whether this protein is similar to proteins from other organisms, and so to understand its importance, we perform a BLAST search. To do so, we go to the following URL:

http://blast.ncbi.nlm.nih.gov/

EXAMPLE TO UNDERSTAND THE WORKING OF BLAST

CONTINUED
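Besides the web form, NCBI exposes BLAST through a URL interface on the Blast.cgi endpoint. The sketch below assumes that interface: a `CMD=Put` request with `PROGRAM`, `DATABASE` and `QUERY` submits a search and returns a request ID (RID), and a later `CMD=Get` request with that RID fetches the results. The function names are hypothetical and `swissprot` is assumed as the database; check NCBI's usage guidelines (for example, on how often to poll for results) before running it against the live service.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def submit_params(query: str, program: str = "blastp",
                  database: str = "swissprot") -> dict[str, str]:
    """Form fields for a CMD=Put (submit) request."""
    return {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}


def parse_rid(put_response: str) -> str:
    """Extract the request ID from the text returned by a Put request."""
    match = re.search(r"RID = (\S+)", put_response)
    if match is None:
        raise ValueError("no RID found in response")
    return match.group(1)


def submit(query: str, **kwargs: str) -> str:
    """POST the search and return its RID (performs a network call)."""
    data = urlencode(submit_params(query, **kwargs)).encode()
    with urlopen(BLAST_URL, data=data, timeout=60) as response:
        return parse_rid(response.read().decode())


def results_url(rid: str, format_type: str = "Text") -> str:
    """URL for a CMD=Get request once the search has finished."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "FORMAT_TYPE": format_type, "RID": rid})
```

A typical flow would be `rid = submit(my_sequence)`, waiting until the search finishes, then opening `results_url(rid)`; here `my_sequence` is a placeholder for your own protein sequence.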

Page 14: blast bioinformatics

After running BLAST against a chosen database (or against all databases), we analyse the results.

A chosen entry is shown below

This entry shows that our query sequence, searched against a database (here Swiss-Prot), has 88% identity with Single-stranded DNA-binding protein, accession number P46390.2.

Continued
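Percent identity, such as the 88% reported in this entry, is the number of identical positions divided by the length of the aligned region, with gap columns counted in that length. A minimal sketch over two already-aligned strings (the function name and the toy sequences are hypothetical):

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical columns divided by alignment length (gap columns included)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(1 for a, b in zip(aligned_query, aligned_subject)
                    if a == b and a != "-")
    return 100.0 * identical / len(aligned_query)
```

`percent_identity("MKT-AYIA", "MKTLAYVA")` gives 75.0: 6 identical columns out of 8, where the gap column and the I/V column count against identity.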

Page 15: blast bioinformatics

Each entry also shows a score that describes how well the database sequence matches the query we sequenced in our experiment.
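Alongside the raw score S, BLAST reports a bit score and an E-value based on Karlin-Altschul statistics: S' = (λS - ln K) / ln 2 and E = K m n e^(-λS) = m n 2^(-S'), where m and n are the query and database lengths and λ and K depend on the scoring matrix and gap costs. The Python sketch below evaluates these formulas; it ignores the effective-length (edge) corrections BLAST applies to m and n, and the function names are hypothetical.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalised (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue(raw: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same quantity computed from the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)
```

For example, `evalue(60, 300, 10**8, lam=0.267, k=0.041)` and `evalue_from_bits(bit_score(60, 0.267, 0.041), 300, 10**8)` agree, and both shrink as the raw score grows. Values of λ ≈ 0.267 and K ≈ 0.041 are commonly quoted for BLOSUM62 with gap costs 11/1; treat them as illustrative inputs here.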

Using the accession number obtained from the BLAST search, we can easily retrieve information about many aspects of the matched sequence. Some of them are listed below:

• The organism from which it came
• Function of the protein
• Region of DNA encoding the gene
• Length of the sequence
• Taxonomy of the organism
• FASTA sequence of the protein
• Links to the 3D structure, if one has been determined
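These details can also be retrieved programmatically. This sketch assumes NCBI's E-utilities `efetch` service with `db=protein`: `rettype=fasta` returns the sequence, while `rettype=gp` returns the full GenPept record (organism, taxonomy, features). The function names are hypothetical, and whether a given accession such as P46390.2 resolves in `db=protein` should be checked before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "protein", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for one accession ('fasta' or 'gp' record)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_record(accession: str, **kwargs: str) -> str:
    """Download the record as text (performs a network call)."""
    with urlopen(efetch_url(accession, **kwargs), timeout=30) as response:
        return response.read().decode()
```

`efetch_url("P46390.2")` only builds the URL; `fetch_record("P46390.2", rettype="gp")` performs the request and returns the flat-file record.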

Similarly, we can check whether our sequence is homologous to any sequence in the database we are searching; homology (shared ancestry) is inferred from significant similarity. As mentioned, we can search any database of interest to infer the function of our sequence, or of sequences with similar structures.

Page 16: blast bioinformatics

BLAST is the most important program in bioinformatics (maybe all of biology)

BLAST is based on sound statistical principles (key to its speed and sensitivity)

A basic understanding of its principles is key for using/interpreting BLAST output

BLAST can play an essential role in helping us propose the following:

Structure of a protein

Function of a sequence

Relation to an organism

Use blastn or MEGA-BLAST for DNA searches

Use PSI-BLAST for protein searches

SUMMARY

Page 17: blast bioinformatics

BOOKS

BIOINFORMATICS by Pevsner

BIOINFORMATICS by Jin Xiong

BIOINFORMATICS by Ghosh and Malik

INTERNET

SlideShare: www.slideshare.com

NCBI: www.blast.ncbi.nlm.nih.gov/Blast.cgi

UniProt/Swiss-Prot: www.uniprot.org

BIBLIOGRAPHY

Page 18: blast bioinformatics

THANK YOU