virus host co-evolution in sight of their proteomes and codon preferences


Page 1: Virus Host co-evolution in sight of their proteomes and codon preferences

Virus Host co-evolution in sight of their proteomes and codon preferences

Bioinformatics project 2007

Yaar Reuveni
Instructor - Michal Linial

Page 2: Virus Host co-evolution in sight of their proteomes and codon preferences

Outline:

My project is composed of two phases:

1. Phase I: The virus-host web tool, VirOsNet. You are welcome to visit it at: www.virosnet.cs.huji.ac.il

2. Phase II: Virus-host co-evolution research using codon usage analysis.

Page 3: Virus Host co-evolution in sight of their proteomes and codon preferences

Viruses:

Basically a capsid envelope that contains genetic information.

Viruses cannot replicate by themselves and depend on the host for reproduction.

A virus's main purpose in life is to enter a host and use its facilities to reproduce.

Page 4: Virus Host co-evolution in sight of their proteomes and codon preferences

Viruses fight back:

Page 5: Virus Host co-evolution in sight of their proteomes and codon preferences

Phase I: VirOsNet

VirOsNet provides a database and tools for exploring virus evolution and virus-host co-evolution.

Page 6: Virus Host co-evolution in sight of their proteomes and codon preferences

Background and Motivation:

Ample examples suggest that viruses often steal information from their hosts.

Viruses must optimize their amount of genetic material and physical size.

Viruses evolve very fast:
o Hard to trace.
o Might change by switching hosts.
o Shuffle their genetic material.

Page 7: Virus Host co-evolution in sight of their proteomes and codon preferences

Phase (I) main objective:

Compare all viral proteins to all known proteins and detect resemblance.

Meaning: in what way do viral proteins "resemble" any of all other known proteins in our world?

Page 8: Virus Host co-evolution in sight of their proteomes and codon preferences

Objectives and possible outcomes (i)

Clever search: Provide crossbreeding factors when searching

Offer comparisons of viruses relative to the proteome of their known hosts

Stolen elements: where were they stolen from? Was it from the host?

Mimicking phenomenon: detect host-protein mimicry

When did it happen: Evolutionary tracking

Page 9: Virus Host co-evolution in sight of their proteomes and codon preferences

Objectives and possible outcomes (ii)

Recent events: indicated by similarity search results that are exceptional.

Insights on viruses and their proteomes.

Long term: pharmaceutical applications. Proposal of drug targets.

Page 10: Virus Host co-evolution in sight of their proteomes and codon preferences

Methods:

Data is from the ProtoNet DB (currently ~1.8 million proteins). All proteins are from UniProt.

New tables were added to the DB, specialized for host-virus relations.

Precomputed BLAST (BLOSUM62) and dynamic BLAST options.

The entry is a viral protein; BLAST search results are sorted by descending E-value.

Several display schemes.

Each result is associated with domain information (InterPro).

Download options for next-phase analysis.
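A minimal sketch of how BLAST hits could be ordered by E-value, assuming the results are available in the standard 12-column tabular layout (BLAST+ `-outfmt 6`). The names `Hit`, `parse_tabular` and `sort_hits`, and the sample rows, are hypothetical illustrations and are not taken from VirOsNet.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    evalue: float
    bit_score: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tabular BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Column 2 = subject id, column 11 = E-value, column 12 = bit score.
        hits.append(Hit(cols[1], float(cols[10]), float(cols[11])))
    return hits


def sort_hits(hits: List[Hit], descending: bool = False) -> List[Hit]:
    """Sort hits by E-value; ascending puts the most significant hit first."""
    return sorted(hits, key=lambda h: h.evalue, reverse=descending)


# Made-up rows for one viral query protein ("vp1") against three subjects.
SAMPLE = "\n".join([
    "vp1\thostA\t35.0\t210\t120\t5\t10\t215\t30\t240\t2e-20\t95.1",
    "vp1\thostB\t98.5\t400\t6\t0\t1\t400\t1\t400\t0.0\t812",
    "vp1\thostC\t28.0\t150\t100\t4\t20\t165\t5\t150\t1e-05\t48.3",
])

if __name__ == "__main__":
    for hit in sort_hits(parse_tabular(SAMPLE)):
        print(hit.subject_id, hit.evalue, hit.bit_score)
```

The `descending` flag is left to the caller because the slide states a descending sort, while an ascending sort is what lists the most significant hits first.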

Page 11: Virus Host co-evolution in sight of their proteomes and codon preferences

Tool overview:

The tool works in a 4-step scheme:

1. Step 1: search for a virus to query, using one of the search methods

2. Step 2: choose a specific virus

3. Step 3: choose one of its proteins, and the BLAST properties

4. Step 4: choose one of the BLAST results to get its pairwise alignment

Page 12: Virus Host co-evolution in sight of their proteomes and codon preferences

7,763 viruses and 199,563 proteins

Page 13: Virus Host co-evolution in sight of their proteomes and codon preferences

Some Statistics

Entry point to viruses according to their genetic material complexity

Page 14: Virus Host co-evolution in sight of their proteomes and codon preferences

Example: check all dsRNA viruses affecting Eukaryotes

Page 19: Virus Host co-evolution in sight of their proteomes and codon preferences

Case study: Abelson murine leukemia virus

The viral protein is a VERY close homolog of a human and a mouse protein tyrosine kinase that:
(i) Regulates the cytoskeleton during cell differentiation, cell division and cell adhesion
(ii) Regulates DNA repair, potentially in severe damage.

The viral protein causes cancer (active site mutation).

Let's look at it...

Page 20: Virus Host co-evolution in sight of their proteomes and codon preferences

Active site

Page 21: Virus Host co-evolution in sight of their proteomes and codon preferences

Summary Phase I:

Pros:
o Platform for studying viruses relative to hosts
o A discovery tool
o Rich BLAST options for an evolutionarily wider view
o Crossbreeding with host data (e.g. InterPro domains).
o Dynamic view on BLAST results as a group (ProtoMesh)

Cons:
o Usability for the average biologist still needs to improve
o VirOsNet can get very slow under overload or in some of the filtering options.

Page 22: Virus Host co-evolution in sight of their proteomes and codon preferences

Phase II: Codon usage

Virus-host classification using codon usage analysis with SVM

Figure adapted from L. Merkel, N. Budisa, BIOspektrum 2006, 12, 41. Veränderung des genetischen Codes (Alteration of the genetic code).

Page 23: Virus Host co-evolution in sight of their proteomes and codon preferences

RNA codons:

1st base | 2nd base U | 2nd base C | 2nd base A | 2nd base G
U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine
U | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine | UAC (Tyr/Y) Tyrosine | UGC (Cys/C) Cysteine
U | UUA (Leu/L) Leucine | UCA (Ser/S) Serine | UAA Ochre (Stop) | UGA Opal (Stop)
U | UUG (Leu/L) Leucine | UCG (Ser/S) Serine | UAG Amber (Stop) | UGG (Trp/W) Tryptophan
C | CUU (Leu/L) Leucine | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine
C | CUC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine
C | CUA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine
C | CUG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine
A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine
A | AUC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine
A | AUA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine
A | AUG (Met/M) Methionine, Start | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine
G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine
G | GUC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine
G | GUA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine
G | GUG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine
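The codon table above can be held in a small lookup structure. The sketch below builds the standard genetic code as a Python dictionary and counts how many synonymous codons each amino acid has; the names `CODON_TABLE`, `degeneracy` and `translate` are illustrative choices, not part of the project.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acids of the standard code, with codons enumerated in
# UCAG order for the 1st, 2nd and 3rd base; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Maps each of the 64 RNA codons to its amino acid, e.g. "AUG" -> "M".
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy():
    """Number of synonymous codons per amino acid (and per stop, '*')."""
    return Counter(CODON_TABLE.values())


def translate(rna):
    """Translate an RNA string codon by codon, stopping at the first stop."""
    protein = []
    for start in range(0, len(rna) - len(rna) % 3, 3):
        aa = CODON_TABLE[rna[start:start + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(degeneracy())
    print(translate("AUGUUUUAA"))
```

The degeneracy counts (6 for R, L and S; 3 for I and for the stop codons; 1 for M and W) are what make per-amino-acid normalization of codon usage meaningful.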

Page 24: Virus Host co-evolution in sight of their proteomes and codon preferences

Main question:

Given a viral protein, determine which organisms might be potential hosts of the virus.

The basis for the hypothesis: viruses optimize their codon usage toward that of their hosts.

Page 25: Virus Host co-evolution in sight of their proteomes and codon preferences

Objectives:
o Create a classification tool that receives a viral protein and gives a prediction of its potential hosts.
o Classify all the proteins into different classes, using a maximum-margin hyperplane.
o Provide different levels of classification.
o Create a "host rank" for a given viral protein for each of its potential hosts.

Results: May suggest a "virus cross-species potential index"

Page 26: Virus Host co-evolution in sight of their proteomes and codon preferences

Methods:
o Collect and arrange all the codon usage data (or other relevant data for this classification).
o Analyze the data: normalization and processing.
o Unsupervised learning and clustering for better understanding of the data.
o Given all codon usage for all species, use the SVM algorithm to create a predictor for new specimens.
o Provide various levels of classes for classifying the codon data.
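To illustrate the maximum-margin idea behind the planned SVM predictor, here is a minimal sketch of a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the hinge loss. This is a stand-in written for illustration, not the solver used in the project; the function names, the parameters `lam`, `epochs` and `seed`, and the toy 2-D points are all assumptions. In the setting of the slides, each sample would be a (centered) 64-position codon usage vector and each label a host class.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM without a bias term.

    samples: list of equal-length feature lists.
    labels:  +1 or -1 for each sample.
    Returns the weight vector w of the separating hyperplane w.x = 0.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization step: shrink the weights.
            shrink = 1.0 - eta * lam
            w = [shrink * wj for wj in w]
            # Hinge-loss step: only samples inside the margin pull on w.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # Toy, made-up 2-D points standing in for two host classes.
    points = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
    classes = [1, 1, 1, -1, -1, -1]
    weights = train_linear_svm(points, classes)
    print([predict(weights, p) for p in points])
```

For more than two host classes, one common option is to train one such classifier per class against all the others and rank classes by their scores; whether that matches the intended "host rank" is an open design choice.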

Page 27: Virus Host co-evolution in sight of their proteomes and codon preferences

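About the data:
o Codon usage is calculated for each species.
o Each species is represented by a 64-position vector.
o The question of normalization:
  o standard: normalize to 1.
  o functional: per amino acid, or by entropy.
  o percentage: per column

[Figure: codon usage matrix, one row per species and one column per codon position (1 ... 64). Columns are grouped by amino acid, labeled with the number of synonymous codons: R 6, L 6, S 6, T 4, P 4, A 4, G 4, V 4, K 2, N 2, Q 2, H 2, E 2, D 2, Y 2, C 2, F 2, I 3, M 1, W 1, STOP 3.]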
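The normalization options can be sketched as follows. The interpretations are assumptions: "standard" is read as dividing a species' counts by their total, "per amino acid" as the fraction of each codon among its synonymous codons, and "percentage per column" as each codon's share across all species in the matrix. The entropy variant is not shown. The codon order used here (UCAG enumeration) is also an assumption and differs from the amino-acid grouping on the slide.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def normalize_total(counts):
    """Standard: divide by the total so the 64 values sum to 1."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def normalize_per_amino_acid(counts):
    """Functional: fraction of each codon among its synonymous codons."""
    aa_totals = {}
    for codon, c in zip(CODONS, counts):
        aa = CODON_TO_AA[codon]
        aa_totals[aa] = aa_totals.get(aa, 0) + c
    result = []
    for codon, c in zip(CODONS, counts):
        total = aa_totals[CODON_TO_AA[codon]]
        result.append(c / total if total else 0.0)
    return result


def normalize_per_column(matrix):
    """Percentage: each column (codon) as a percentage of its column total.

    matrix holds one 64-count row per species.
    """
    col_totals = [sum(col) for col in zip(*matrix)]
    return [
        [100.0 * c / t if t else 0.0 for c, t in zip(row, col_totals)]
        for row in matrix
    ]
```

Per-amino-acid normalization removes the effect of amino acid composition, so two species that use the same proteins but prefer different synonymous codons still look different.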

Page 28: Virus Host co-evolution in sight of their proteomes and codon preferences

[Figure: codon usage of Bacteria, with columns grouped by amino acid (R, L, S, T, P, A, G, V, K, N, Q, H, E, D, Y, C, F, I, M, W, STOP).]

Page 29: Virus Host co-evolution in sight of their proteomes and codon preferences

Primates

Page 30: Virus Host co-evolution in sight of their proteomes and codon preferences

Data from Nakamura:

Codon usage tabulated from the international DNA sequence databases. Nakamura, Y., Gojobori, T. and Ikemura, T. (2000) Nucl. Acids Res. 28, 292.

o Downloading the codon usage table
o The data covers all species (including viruses).

Page 31: Virus Host co-evolution in sight of their proteomes and codon preferences

Usage distribution:

[Figure panels: Bacteria, Invertebrates, Primates, Viral, Plants, Rodents]

Page 32: Virus Host co-evolution in sight of their proteomes and codon preferences

Usage distribution:

Positions 1-13

Page 33: Virus Host co-evolution in sight of their proteomes and codon preferences

Our data:
o We expected to find diverse codon usage between different taxonomic groups.
o There are 703 distinct known hosts in our DB and 2,152 distinct known hosted viruses.
o I created an interface for extracting the CDS data from the coding data we have in ProtoNet.
o I used the same convention for the vector.
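Turning a coding sequence (CDS) into a 64-position codon usage vector can be sketched as below. The function name `codon_usage_vector` and the codon order (UCAG enumeration) are assumptions made for illustration; the actual ProtoNet interface and vector convention are not described in the slides.

```python
from itertools import product

# The 64 codons in UCAG order; the index of a codon is its vector position.
CODONS = ["".join(c) for c in product("UCAG", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_usage_vector(cds):
    """Count codons in a coding sequence given as DNA or RNA text.

    Reads the sequence in frame from its first base and returns a list
    of 64 counts. Trailing bases that do not fill a codon are ignored,
    as are codons containing ambiguous bases such as N.
    """
    seq = cds.upper().replace("T", "U")
    vector = [0] * 64
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if codon in INDEX:
            vector[INDEX[codon]] += 1
    return vector


if __name__ == "__main__":
    v = codon_usage_vector("ATGTTTTTTTAA")
    print(v[INDEX["AUG"]], v[INDEX["UUU"]], v[INDEX["UAA"]])
```

Summing such vectors over all coding sequences of a virus or host gives one row of the species-by-codon matrix.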

Page 34: Virus Host co-evolution in sight of their proteomes and codon preferences

In ProtoNet (version 5.1): 16,567 viruses and 409,726 proteins

Page 35: Virus Host co-evolution in sight of their proteomes and codon preferences

Dividing our data into groups:

Group name | Taxid | Distinct hosts | Number of viruses (not distinct) | Distinct viruses | Distinct viruses with CDS
Fungi | 4751 | 4 | 9 | 9 | 9
Bacteria | 2 | 46 | 169 | 161 | 151
Viridiplantae (green plants) | 33090 | 339 | 1414 | 329 | 304
Rodents | 9989 | 38 | 251 | 162 | 150
Primates | 9443 | 31 | 1015 | 868 | 816
Fish | 32443 | 31 | 47 | 16 | 16
Aves (birds) | 8782 | 34 | 426 | 374 | 363
Tetrapoda | 32523 | 187 | 2761 | 1549 | 1462
Arthropoda | 6656 | 88 | 263 | 175 | 169

Page 36: Virus Host co-evolution in sight of their proteomes and codon preferences

Who infects what?

[Figure: diagram of the number of viruses infecting each host group: Primates, Rodents, Aves, Tetrapoda, Fish, Bacteria, Fungi, Plants, Arthropoda and Others.]

Page 37: Virus Host co-evolution in sight of their proteomes and codon preferences

These are all different virus groups:

Page 38: Virus Host co-evolution in sight of their proteomes and codon preferences

Comparison: Positions 1-12

Looks Promising!

Page 39: Virus Host co-evolution in sight of their proteomes and codon preferences

Clustering: preliminary results

Using the COMPACT tool set (COMPACT: A Comparative Package for Clustering Assessment), Varshavsky et al., 2005, ISPA: 159-167.

o Visualization of results
o Scoring

Page 40: Virus Host co-evolution in sight of their proteomes and codon preferences

Hierarchical - Percentage Normalization

Page 41: Virus Host co-evolution in sight of their proteomes and codon preferences

Hierarchical - Standard Normalization

Page 42: Virus Host co-evolution in sight of their proteomes and codon preferences

Summary Phase II:
o All data is organized, accessible and will update along with the ProtoNet DB.
o Comprehensive analysis created a good understanding of the data.

Future plans:
o Decide on a good division into classes.
o Use the SVM algorithm to create a classifier that, given a virus's codon preferences, guesses potential hosts.
o Create an interface that offers this service.

Page 43: Virus Host co-evolution in sight of their proteomes and codon preferences

Acknowledgements:

Thank you to all the people who helped:
Michal Linial
Iris Bahir
Menachem Fromer
Alexander Savenok
Michael Dvorkin
Roy Varshavsky

Page 44: Virus Host co-evolution in sight of their proteomes and codon preferences

Thank You!