AutoGRAPH: a genome comparison server
Application to the identification of new genes in the dog

Thomas DERRIEN & Christophe Hitte
Laboratory: CNRS - Institute of Genetics and Development of Rennes (France)
Team: Dog Genetics
Rennes - 23 Oct 2007



Page 2

Context

Dog Radiation Hybrid (RH) map

- 2003 : >3200 markers (Guyon et al.)

- 2004 : >4200 markers FISH/RH (Breen et al.)

- 2005 : 10,000 genes (Hitte et al.)

Canis familiaris : 38 autosomes + XY

[Figure: dog chromosomes chr1, chr2, chr3, chr4, ...]

Outline: Introduction - Multiresources map - Multispecies map - Gene prediction - Comparison and perspectives

Rennes 23 oct 2007

Page 3

Context

Dog sequence

- 2005: Optimization of the low-coverage sequence of the dog genome. (Hitte C. et al.)

- 2005: Framework for the high-coverage of the dog sequence assembly. (Lindblad-Toh K. et al.)


Page 4

Comparative genomics x2

Multi-resource and multi-species comparative genomics analyses.

=> Sequence vs RH map vs cytogenetic map...

=> Dog vs mammal sequences.

Page 5

Multi-resources comparative maps:

Aims:

- Compare gene order between the RH map and the sequence assembly.
- Estimate the colinearity between the 2 resources.

Data set:

RH marker localizations and sequence alignments from the dog sequence assembly (CanFam 1.0).

[Figure: CFA 9, dog RH map vs. dog sequence, showing RH markers/genes, sequence alignments, and the relation between sequence and RH markers]

Page 6

AutoGRAPH and Multi-resources datasets:

(Derrien T. et al. 2006)

http://autograph.genouest.org/

Page 7

AutoGRAPH and Multi-resources datasets:

[Diagram: user dataset; insertion into a MySQL temporary table; ordering; graphic map construction and visualization (PHP, Perl, GD Graphics Library)]

User dataset:

Identifier  Chromosome  Localization  Chromosome  Localization  Chromosome  Localization
GENE        CFA7        6.3           CFA_0       30393         HSA_        20043
GENE        CFA7        14.9          CFA_0       31907         HSA_        20028
GENE        CFA7        28.3          CFA_0       32588         HSA_        20021
GENE        CFA7        51.6          CFA_0       36582         HSA_        19971
GENE        CFA7        64.9          CFA_0       38730         HSA_        19942
GENE        CFA7        72.3          CFA_0       39878         HSA_        19922
GENE        CFA7        79.6          CFA_0       40725         HSA_        19915
GENE        CFA7        84.4          CFA_0       41214         HSA_        19911
GENE        CFA7        101.1         CFA_0       41925         HSA_        19906
GENE        CFA7        112.0         CFA_0       42469         HSA_        19898
GENE        CFA7        124.7         CFA_0       44408         HSA_        19874
GENE        CFA7        135.2         CFA_0       45004         HSA_        19864
GENE        CFA7        146.0         CFA_0       46163         HSA_        19855
GENE        CFA7        168.2         CFA_0       47265         HSA_        19840
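One way the colinearity between resources could be estimated from a dataset like the one above is a rank correlation between the positions of the same markers on two maps. The sketch below is an illustration under stated assumptions, not AutoGRAPH's documented method: the function names are hypothetical, and reading the three localization columns as RH map, dog sequence and human sequence positions is inferred from the chromosome prefixes.

```python
# Illustrative sketch only: Spearman rank correlation as a colinearity measure.
# Not taken from AutoGRAPH; function names are hypothetical.

def ranks(values):
    """Rank values from 1 to n (assumes no ties, as in the example data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation between two equally long position lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Localization columns copied from the user dataset shown on the slide.
rh = [6.3, 14.9, 28.3, 51.6, 64.9, 72.3, 79.6, 84.4,
      101.1, 112.0, 124.7, 135.2, 146.0, 168.2]
dog_seq = [30393, 31907, 32588, 36582, 38730, 39878, 40725, 41214,
           41925, 42469, 44408, 45004, 46163, 47265]
human_seq = [20043, 20028, 20021, 19971, 19942, 19922, 19915, 19911,
             19906, 19898, 19874, 19864, 19855, 19840]

print("RH vs dog sequence:", spearman(rh, dog_seq))
print("RH vs human sequence:", spearman(rh, human_seq))
```

A value near +1 suggests the same marker order on both maps, a value near -1 the same order in the opposite orientation, and intermediate values point to rearrangements or assembly problems worth inspecting on the graphical map.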

Page 8

AutoGRAPH and Multi-resources datasets:

Comparative analysis of the cytogenetic map, the RH map and the sequence (CanFam1.0) for dog chromosome 11 (CFA 11):

- Strong colinearity between the RH map and the cytogenetic map.
- The inversion might be due to a problem in the sequence assembly.

[Figure: CFA 11, cytogenetic map vs. RH map vs. sequence]

Results:

- 8 discrepancies between the sequence assembly and the RH map
- Cytogenetic experiments.
- 4 have been resolved in favor of the RH map.

Led to CanFam2.0 (Dec. 2005) (Lindblad-Toh K.)

Page 9

Multispecies Comparative Maps

Page 10

Multispecies map: why?

- Identification of conserved sequences between species => functional sequences
- Comparison of chromosomal organization between species => chromosome rearrangements and evolution

Page 11

Multispecies map: how?

- “Comparative anchors”: conserved sequences between species.
- Orthologous genes (1:1 orthology relationships)

SPECIATION

Ancestral genome (Carnivore): gene A, gene B
Genome 1 (Canis familiaris): gene A.1, gene B.1
Genome 2 (Felis catus): gene A.2, gene B.2

Orthologs: gene A.1 and A.2 = homologous genes separated by a speciation event

DUPLICATION

Genome 1 (Canis familiaris): gene A.1, gene B.1’, gene B.1’’
Genome 2 (Felis catus): gene A.2, gene B.2

Paralogs: gene B.1’ and B.1’’ = homologous genes separated by a duplication event

Page 12

Multi-species comparative maps:

Data sets:

- Collect ortholog data sets from Ensembl v.42 (BioMart/MartView)
  => Orthologue features for 5 species of interest: Dog - Human - Chimp - Rat - Mouse.

Aim:

- Compare genomes and construct multispecies comparative maps (synteny maps)
- Identify Conserved Segments (CS, synteny blocks), Conserved Segments Ordered (CSO, synteny segments) and breakpoint regions.
- Facilitate gene identification

[Figure: dog genes on chromosome 1 (CFA 1; dog: reference) linked to human genes (human: tested genome) by 1:1 orthology relationships; labels: HSA 2, HSA 3, CSOs, breakpoint between 2 CS, breakpoint between 2 CSO, "Human synt..."]

Page 13

AutoGRAPH: CFA 34 vs human genome

Results:

- Automatic identification of CS/CSO and breakpoint regions between 2 species

Results (map output):

- CFA 34 (reference) has 2 Conserved Segments: => HSA 5 => HSA 3
- Within CFA 34 - HSA 3: 2 Conserved Segments Ordered (CSO).
- 2 breakpoint regions.
- > High colinearity within CSO <-

[Figure: CFA 34 vs. HSA 3 and HSA 5]
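The identification of CS, CSO and breakpoint regions can be pictured with a small sketch. The definitions used here are simplifying assumptions made for illustration, not AutoGRAPH's published algorithm: a CS is taken as a maximal run of reference-ordered genes whose orthologs sit on the same tested chromosome, and a CSO as a maximal run inside a CS whose tested positions keep increasing or keep decreasing. Function names and all positions are hypothetical; only the chromosome names come from the slide.

```python
# Illustrative sketch only: simplified CS / CSO detection from 1:1 anchors.
from itertools import groupby

def conserved_segments(anchors):
    """Group anchors (ref_pos, tested_chrom, tested_pos) into CS:
    maximal runs, in reference order, on the same tested chromosome."""
    ordered = sorted(anchors)
    return [list(run) for _, run in groupby(ordered, key=lambda a: a[1])]

def ordered_segments(cs):
    """Split one CS into CSO: maximal runs whose tested positions
    keep increasing or keep decreasing."""
    csos, current, direction = [], [cs[0]], 0
    for prev, cur in zip(cs, cs[1:]):
        step = 1 if cur[2] > prev[2] else -1
        if direction in (0, step):
            current.append(cur)
            direction = step
        else:
            csos.append(current)
            current, direction = [cur], 0
    csos.append(current)
    return csos

# Made-up positions, shaped like the CFA 34 result (HSA 5, then HSA 3).
anchors = [
    (1, "HSA5", 100), (2, "HSA5", 110), (3, "HSA5", 120),
    (4, "HSA3", 50), (5, "HSA3", 60), (6, "HSA3", 70),
    (7, "HSA3", 40), (8, "HSA3", 30),
]

segments = conserved_segments(anchors)
for cs in segments:
    print(cs[0][1], "genes:", len(cs), "CSO:", len(ordered_segments(cs)))

# Every boundary between consecutive CSOs (within or between CS) is a breakpoint.
breakpoints = sum(len(ordered_segments(cs)) for cs in segments) - 1
print("breakpoint regions:", breakpoints)
```

With these made-up anchors the sketch finds 2 CS, 2 CSO within the HSA 3 segment, and 2 breakpoint regions. A production tool would also need to tolerate isolated out-of-order genes, which this greedy version does not.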

Page 14

AutoGRAPH: CFA 34 vs human and mouse genomes

Results:

- Identification of CS/CSO and breakpoint regions between 3 species

Results (map output):

- 5 CS with the mouse genome: => MMU 15 => MMU 13 => MMU 16 => 2x MMU 3 (=> 4 CSO)
- Identification of reused / species-specific breakpoints.
- > High colinearity within CSO <-

[Figure: CFA 34 vs. human and mouse]

Page 15

Multispecies Comparative Maps

Results (array output):

2 types of result:

- displayed on the comparative map.
- listed in an array (also available as a downloadable flat file).

[Figure: array output, with labels: Localizations on reference chromosome; CS - CSO: Sizes (bp); CS - CSO Id; (No of genes); Localizations on tested chromosomes; Density; Density around breakpoint regions]

Page 16

Results (orthology relationships):

1:1 orthology relationship

1:0 orthology relationship

Tested syntenic interval

Page 17

Applications: Dog genome annotation


The Dog Orphan genes

Page 18

The Ensembl Automatic Gene Annotation System (Curwen V., 2004)

All Ensembl gene predictions are based on experimental evidence (UniProt - SwissProt - RefSeq)

Data set:

                              Human    Chimp    Mouse    Rat      Dog
Total protein coding genes    23,068   20,982   23,616   21,367   17,507

Extract: analysis of a subset of genes 1:1:1:1:0 (Human:Chimp:Mouse:Rat:Dog)

=> 412 unannotated genes in the dog genome (orphan genes) (protein coding genes)
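The 1:1:1:1:0 selection can be sketched as a filter over an ortholog table. This is a minimal illustration under assumptions: the table layout and the function name are hypothetical, the check is one-directional (a strict 1:1 test would also verify that each ortholog maps back to a single human gene), and only the H2BFS identifiers come from the talk, where this gene serves as the example of a tested gene.

```python
# Illustrative sketch only: select human genes with exactly one ortholog in
# each reference species and none in the tested species.

def orphan_candidates(orthologs, references=("chimp", "mouse", "rat"), tested="dog"):
    """Return the human gene ids matching the 1:1:1:1:0 pattern."""
    selected = []
    for gene, by_species in orthologs.items():
        one_to_one = all(len(by_species.get(sp, [])) == 1 for sp in references)
        missing_in_tested = len(by_species.get(tested, [])) == 0
        if one_to_one and missing_in_tested:
            selected.append(gene)
    return selected

orthologs = {
    # H2BFS identifiers as listed in the talk.
    "ENSG00000197597": {
        "chimp": ["ENSPTRG00000017808"],
        "mouse": ["ENSMUSG00000050936"],
        "rat": ["ENSRNOG00000029696"],
        "dog": [],
    },
    # Hypothetical entries: one with a dog ortholog, one with two mouse orthologs.
    "HYPOTHETICAL_GENE_1": {
        "chimp": ["hyp_ptr_1"], "mouse": ["hyp_mmu_1"],
        "rat": ["hyp_rno_1"], "dog": ["hyp_cfa_1"],
    },
    "HYPOTHETICAL_GENE_2": {
        "chimp": ["hyp_ptr_2"], "mouse": ["hyp_mmu_2a", "hyp_mmu_2b"],
        "rat": ["hyp_rno_2"], "dog": [],
    },
}

print(orphan_candidates(orthologs))
```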

Page 19

Hypotheses:

Annotation problems in the dog genome?

Are these genes specific to the primate and rodent lineages?

Aim: Analyse these genes in the dog genome

Page 20

Characterization of the 412 [1:1:1:1:0] orthologs

Structural characterization:

[Figure: histogram of the number of genes by exon number (1 to 30), Dog_OrphanGenes vs. Random_Genes]

                             Mean exon No   Mean protein size (aa)   Mean cDNA GC content
Human tested set (n=412)     6.3            398                      52.55
Human random set (n=1000)    10.7           557                      51.34

- Higher rate of monoexonic genes
- Smaller protein sizes
- Higher transcript GC content
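A quantity such as the mean cDNA GC content compared above can be computed along the following lines. This is a sketch under assumptions: the talk does not say whether GC content was averaged per gene or pooled over all transcripts (per-gene averaging is assumed here), the function names are hypothetical, and the sequences are made up.

```python
# Illustrative sketch only: per-sequence GC percentage and its mean over a set.

def gc_content(sequence):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted

def mean_gc(sequences):
    """Mean of the per-sequence GC percentages (assumed averaging scheme)."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values)

# Made-up cDNA fragments standing in for a tested set and a random set.
tested_set = ["ATGGCCGCGTGA", "ATGCCGGGCTAA"]
random_set = ["ATGAAATTTTGA", "ATGGCATTATAA"]

print("tested set mean GC (%):", round(mean_gc(tested_set), 2))
print("random set mean GC (%):", round(mean_gc(random_set), 2))
```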

Page 21

Characterization of the human orthologs

Functional characterization:

- Analyzed with the program GO Tree Machine (Zhang B. et al., 2004)
- Significant enrichment in specific GO terms:

p-value                GO category (BP)
0.008970025556679      potassium ion transport
0.008902320120899      response to wounding
0.0088375773202095     detection of chemical stimulus
0.0070783033944581     physiological response to wounding
0.0069479680074864     plasma membrane fusion
0.0069479680074864     microtubule nucleation
0.0052673840107958     fusion of sperm to egg plasma membrane
0.004307225374419      response to external stimulus
0.00060351043892835    nucleobase, nucleoside, nucleotide and nucleic acid metabolism
0.00052973081439737    regulation of biological process
0.00049971977537453    regulation of cellular process
0.00045451746485333    fertilization
0.00040299007694943    fertilization (sensu Metazoa)
6.2941602638017E-05    regulation of cellular physiological process
1.4569323904298E-05    regulation of physiological process
1.7202210352273E-06    regulation of metabolism
1.2026321510082E-06    regulation of cellular metabolism
6.8760533901414E-07    regulation of transcription
5.7435783302335E-07    regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
5.0349916873545E-07    transcription
4.3662427217568E-07    regulation of transcription, DNA-dependent
2.0446677466923E-07    transcription, DNA-dependent

Main categories:

- chemosensation, olfaction
- immunity and host defense
- reproduction

Page 22

Methods: 4 steps

1. Multispecies synteny map construction, Reference (hsa/ptr/mmu/rno) : Tested (cfa), & dog syntenic interval identification
2. Testing the dog syntenic interval by sequence alignments
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes

Page 23

1st step: Multispecies map construction

Ensembl annotation: Comparative anchors

                                             Human    Chimp    Mouse    Rat
No of 1:1 orthologues with the dog genome    14,997   14,798   14,667   14,065

AutoGRAPH: For 1 tested dog gene <=> 4 dog syntenic intervals defined

Example: Gene H2BFS = Histone H2B type F-S

- Human: ENSG00000197597 (HSA 21: ~ 43.8 Mb)
- Mouse: ENSMUSG00000050936 (MMU 3: ~ 96.3 Mb)
- Chimp: ENSPTRG00000017808 (PTR 6: ~ 26.6 Mb)
- Rat: ENSRNOG00000029696 (RNO 2: ~ 19.1 Mb)

Page 24

1st step: Multispecies map construction

Reference genome: Human (HSA 21: ~ 43.8 Mb)
(ENSG00000197597 = H2BFS = Histone H2B type F-S)

Tested: Dog

- Conserved segments: x1
- Localization: CFA 31: 14.13 - 40.90 (Mb)
- High colinearity in the CS
- One predicted dog interval with human

Interval defined: CFA_31: 39.87 - 39.98 (Mb)

Page 25

1st step: Multispecies map construction

Reference: Mouse (MMU 3: ~ 96.3 Mb)
(ENSMUSG00000050936 = H2BFS = Histone H2B type F-S)

Tested: Dog

- No. of Conserved Segments: x16
- Coordinates: XXXs

Interval defined: CFA_31: 39.87 - 40.16 (Mb)

... same analyses with chimp and rat orthologues

Page 26

Principle

Hu/Dog:    Interval defined: CFA_31: 39.87 - 39.98 (Mb)
Mmu/Dog:   Interval defined: CFA_31: 39.87 - 40.16 (Mb)
Chim/Dog:  Interval defined: CFA_31: 39.87 - 39.98 (Mb)
Rat/Dog:   Interval defined: CFA_31: 39.80 - 40.16 (Mb)

[Figure: the four intervals aligned on CFA 31, with boundaries at 39.80, 39.87, 39.98 and 40.16 Mb]

Consensus Orthologous IntervaL: COIL

Page 27

Results

If at least 2 of the 4 intervals overlap: Consensus Orthologous IntervaL (COIL)

- 389 (94%) COILs (dog predicted intervals)
  - Mean interval size: 347 kb (vs 2.4 Gb for the dog genome)
  - Distributed evenly on all the dog chromosomes (nb: 1 => chr 14 --- nb: 47 => chr 1)
- 17: in breakpoint regions reused between primates and rodents
- 6: no consensus among the predicted intervals (orthology prediction?)

Reduces the search space to only 347 kb!
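The "at least 2 of 4 overlapping intervals" rule can be sketched as a coverage sweep. This is an illustration under assumptions: the talk does not state whether a COIL is the union or the intersection of the overlapping intervals, so the sketch returns the region covered by at least `min_support` intervals; it also assumes all intervals lie on the same dog chromosome. The function name is hypothetical; the four intervals are the H2BFS ones from the talk.

```python
# Illustrative sketch only: region(s) covered by at least `min_support` intervals.

def consensus_interval(intervals, min_support=2):
    """Return the (start, end) regions covered by >= min_support intervals.
    Assumes every interval is on the same chromosome."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal coordinates, open intervals before closing them.
    events.sort(key=lambda e: (e[0], -e[1]))
    depth, begin, regions = 0, None, []
    for pos, delta in events:
        depth += delta
        if delta == 1 and depth == min_support:
            begin = pos
        elif delta == -1 and depth == min_support - 1:
            regions.append((begin, pos))
            begin = None
    return regions

# Dog intervals (Mb) on CFA 31 predicted from human, mouse, chimp and rat.
h2bfs = [(39.87, 39.98), (39.87, 40.16), (39.87, 39.98), (39.80, 40.16)]
print(consensus_interval(h2bfs))

# Intervals that never overlap yield no consensus.
print(consensus_interval([(1.0, 2.0), (5.0, 6.0)]))
```

Under this reading, the H2BFS example gives a single region from 39.87 to 40.16 Mb; raising `min_support` to 4 would instead give the common core of all four predictions.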

Page 28

Methods: 4 steps

1. Multispecies synteny map construction, Reference (hsa/ptr/mmu/rno) : Tested (cfa), & dog syntenic interval identification
2. Protein alignment: cDNA reference sequence vs canine genome sequence
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes

Genes retained: 412 -> 389

Page 29

Protein alignment: cDNA reference sequence vs canine genome sequence:

Exonerate (Slater G. et al., 2005): model cdna2genome.

[Figure: human, chimp, mouse and rat H2BFS cDNAs aligned against the Canis familiaris genome (38 autosomes + XY; chr1, chr2, chr3, chr4, ...)]

Page 30

2/4: Alignment of reference sequences on the tested genome

Protein alignment: cDNA reference sequence vs canine genome sequence:

                                              4 reference species   2-3 reference species   0-1 reference species
Canine COILs matched by protein alignments    271                   77                      41
Proportion                                    69.7%                 19.8%                   10.5%

=> 348 COILs correlated with sequence alignments of cDNA reference-species sequences

Page 31

Methods: 4 steps

1. Multispecies synteny map construction, Reference (hsa/ptr/mmu/rno) : Tested (cfa), & dog syntenic interval identification
2. Protein alignment: cDNA reference sequence vs canine genome sequence
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes

Genes retained: 412 -> 389 -> 348

Page 32

Gene prediction

Software: GeneWise (Birney E. et al., 2004):

- Alignment of reference protein sequences with the dog nucleotide interval.
  => 1 gene: 4 analyses corresponding to the 4 reference proteins
  => Best score
  => Threshold: > 40% protein identity with the reference-species ortholog

[Figure: mouse protein (H2BFS) aligned to the nucleotide sequence of the dog syntenic interval, giving the dog gene structure prediction]

=> 285 dog genes
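The selection among the 4 analyses of one gene can be sketched as follows. This is a minimal illustration under assumptions: it follows the order given on the slide (best score first, then the identity threshold), the record layout and function name are hypothetical, and all scores and identities are made up rather than GeneWise output.

```python
# Illustrative sketch only: pick the best-scoring analysis, then apply the
# > 40% protein identity threshold.

def select_prediction(analyses, min_identity=40.0):
    """Return the best-scoring analysis if it passes the threshold, else None."""
    if not analyses:
        return None
    best = max(analyses, key=lambda a: a["score"])
    return best if best["identity"] > min_identity else None

# Made-up results for one gene, one analysis per reference protein.
analyses = [
    {"reference": "human", "score": 310.5, "identity": 78.0},
    {"reference": "chimp", "score": 305.2, "identity": 77.1},
    {"reference": "mouse", "score": 342.8, "identity": 81.4},
    {"reference": "rat", "score": 120.0, "identity": 35.0},
]

chosen = select_prediction(analyses)
print(chosen["reference"] if chosen else "no prediction retained")
```

Filtering on identity before taking the best score is an equally plausible reading; the two orders differ only when the top-scoring analysis falls below the threshold while another one passes.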

Page 33

Methods: 4 steps

1. Multispecies synteny map construction, Reference (hsa/ptr/mmu/rno) : Tested (cfa), & dog syntenic interval identification
2. Protein alignment: cDNA reference sequence vs canine genome sequence
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes

Genes retained: 412 -> 389 -> 348 -> 285

Page 34

Is there any dog gene already annotated by Ensembl in the interval?

TFEC: Transcription factor EC isoform b

- Dog syntenic interval: CFA_14: 57,078,322 - 58,156,905
- Predicted gene: 347 AA - 7 exons
- Number of Ensembl annotated genes in the interval: 1

=> New orthology relationship (1:1:1:1:1)

[Figure: dog syntenic interval, reference cDNA alignments, and GeneWise gene prediction]

Page 35

Is there any dog gene already annotated by Ensembl in the interval?

GPX3: Glutathione peroxidase 3 precursor

- Dog syntenic interval: CFA_4: 61,375,065 - 61,498,458
- Predicted gene: 226 AA - 5 exons
- Number of Ensembl annotated genes in the interval: 0

=> New gene AND new orthology relationship (1:1:1:1:1)

[Figure: genome browser view of chr4 (61,380,000 - 61,440,000) with tracks for gap locations, the BLAT search sequence, RefSeq genes, non-dog RefSeq genes, human proteins mapped by chained tBLASTn, CpG islands, Dog/Human/Mouse/Rat Multiz alignments and phastCons scores, the human (Mar. 2006/hg18) alignment net, RepeatMasker repeats, simple tandem repeats (TRF), microsatellites and chained self-alignments; shown with the dog syntenic interval, reference cDNA alignments, and GeneWise gene prediction]

Page 36

Testing for the presence of a gene in the defined syntenic environment

MYO1D: Myosin Id

- Dog syntenic interval: CFA_9: 43,454,106 - 43,829,963
- Predicted gene: 1006 AA - 22 exons
- Number of Ensembl annotated genes in the interval: 3 (?)

=> New gene AND new orthology relationship (1:1:1:1:1)

[Figure: dog syntenic interval, reference cDNA alignments, and GeneWise gene prediction]
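Counting the Ensembl genes already annotated in a dog interval comes down to an overlap test. The sketch below is an assumption-laden illustration: the record layout and function name are hypothetical, and the annotated gene coordinates are made up; only the TFEC interval comes from the talk.

```python
# Illustrative sketch only: list annotated genes overlapping a dog interval.

def genes_in_interval(interval, annotated_genes):
    """Return the annotated genes (name, chrom, start, end) overlapping
    the interval (chrom, start, end)."""
    chrom, start, end = interval
    return [
        gene for gene in annotated_genes
        if gene[1] == chrom and gene[2] <= end and gene[3] >= start
    ]

# TFEC interval from the talk; annotated genes below are made up.
tfec_interval = ("CFA14", 57078322, 58156905)
annotated = [
    ("hypothetical_gene_a", "CFA14", 57500000, 57560000),  # inside the interval
    ("hypothetical_gene_b", "CFA14", 59000000, 59050000),  # same chromosome, outside
    ("hypothetical_gene_c", "CFA4", 57500000, 57560000),   # other chromosome
]

hits = genes_in_interval(tfec_interval, annotated)
print(len(hits), [gene[0] for gene in hits])
```

A count of 0 points to a candidate new gene. A non-zero count does not settle the question on its own: the MYO1D slide reports 3 (?) annotated genes in the interval and still a new gene, so the annotated genes have to be compared with the prediction itself.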

Page 37

Methods: 4 steps

1. Multispecies synteny map construction, Reference (hsa/ptr/mmu/rno) : Tested (cfa), & dog syntenic interval identification
2. Testing the dog syntenic interval by sequence alignments
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes

Genes retained: 412 -> 389 -> 348 -> 285 = 185 new dog genes + 100 new orthology relationships
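The counts reported across the four steps can be tallied as a quick consistency check. The figures are those given in the talk; the labels and variable names are illustrative, and attaching each count to a step follows the order in which the counts appear on the methods slides.

```python
# Illustrative tally of the counts reported in the talk.
counts = [
    ("orphan genes (1:1:1:1:0)", 412),
    ("step 1: COILs defined", 389),
    ("step 2: COILs supported by alignments", 348),
    ("step 3: dog gene predictions", 285),
]
new_genes, new_orthology = 185, 100

initial = counts[0][1]
for label, n in counts:
    print(f"{label}: {n} ({100 * n / initial:.1f}% of the initial set)")

# Genes dropped at each step, and COILs left without a gene prediction.
dropped = [a[1] - b[1] for a, b in zip(counts, counts[1:])]
coils_without_prediction = counts[1][1] - counts[-1][1]
print("dropped per step:", dropped)
print("COILs without prediction:", coils_without_prediction)
print("step 4 split adds up:", new_genes + new_orthology == counts[-1][1])
```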

Page 38

Results validation:

- 80.5% have a peptide motif with InterProScan (InterPro Database)
- 48.6% (90/185) match canine ESTs (DB_EST)
- > 40% protein identity with a reference ortholog.

Page 39

104 with no gene prediction: reasons?

Step 1: Dog consensus interval definition
Step 2: Overlap of reference transcripts vs. dog consensus interval
Step 3: Gene prediction in dog consensus interval

Set          No of genes   Mean size (bp)   Fraction of GAP content (%)   Fraction of repeat content (%)   Fraction of GC content (%)   Fraction of genes in telomeric region (%)
-            412           No dog interval predicted
-            23            No dog interval predicted
Step 1       389           347,401          2.23                          37.33                            46.23                        44.4
-            41            231,900          5.96                          35.24                            48.86                        53.6
Step 2       348           361,008          1.79                          37.57                            45.92                        43.4
-            63            287,821          3.35                          35.36                            48.09                        46.0
Step 3       285           377,187          1.44                          38.06                            45.43                        42.8
Random set   1000          375,929          1.3233                        35.8865                          45.63                        31.0

=> 104 without dog prediction (41 + 63)

Page 40

104 with no gene prediction: reasons?

Set          No of genes   Mean size (bp)   Fraction of GAP content (%)   Fraction of repeat content (%)   Fraction of GC content (%)   Fraction of genes in telomeric region (%)
-            412           No dog interval predicted
-            23            No dog interval predicted
Step 1       389           347,401          2.23                          37.33                            46.23                        44.4
-            41            231,900          5.96                          35.24                            48.86                        53.6
Step 2       348           361,008          1.79                          37.57                            45.92                        43.4
-            63            287,821          3.35                          35.36                            48.09                        46.0
Step 3       285           377,187          1.44                          38.06                            45.43                        42.8
Random set   1000          375,929          1.3233                        35.8865                          45.63                        31.0

Page 41

104 with no gene prediction?

Structural problems (sequence quality):

- Higher GAP content (> 10% for 12 genes)

Evolutionary scenario: loss of dog genes (92 genes):

- Protein identity < 20% [3/4]
- Smaller sizes of the dog intervals
- Higher GC content + telomeric localization
- No EST validation
- Biological function prone to “gain and loss” (immunity, olfaction = adaptation to environment, GOTM analysis)

Page 42

92 with no gene prediction? The example of the PNMA family: RAP (Dufayard et al.)

[Figure: gene tree and reconciled tree]

Page 43

Conclusions - Directions:

COILs approach:

- Multispecies, multiple sets of 1:1 relationships: complementary contributions of different genomes
- Short interval = short search space (350 kb):
  - reduces the cost of detecting false positives
  - facilitates matches with divergent sequences
  - significantly reduces background noise

Directions:

- Analysis of the evolution rate of these dog sequences compared to the reference sequences
- Other orphan-gene sets & other species sets (using cat, elephant...)
- Using gene adjacency + in-depth gene prediction to refine gene family orthology: 1:0 orthology + n:m orthology

Page 44

Acknowledgements:

Rennes Dog Genetics-Genomics Team

Francis Galibert, Catherine André, Christophe Hitte
(Sophie Roucan, Hugues Leroy, Anthony Assi, Olivier Filangi...)
