
Page 1

VectorBase (http://www.vectorbase.org)

Kolymbari Meeting, July 2011

New genomes, new features and future plans

Daniel Lawson (on behalf of VectorBase)

Page 2

VectorBase

EMBL-EBI

IMBB

Imperial College, London

University of Notre Dame

Harvard University

University of New Mexico

Page 3

VectorBase

• Integrated genomic resource for arthropod vectors of human pathogens
• Funded by NIH-NIAID as one of four Bioinformatics Resource Centers (BRCs)
• Collaboration of three European and three US institutes

• VectorBase is:
  • Both a service provider and a content generator
  • A collator of genomic information
  • A genome annotation group (gene structure prediction)
  • A provider of tools for browsing and data mining vector genomes
  • A helpdesk for community queries
  • Responsible for data submissions to the public archival databanks
  • Committed to a regular release cycle (5-6 releases per year)

Page 4

Summary of current contents

Species                 Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti           ✓       ✓         ✓                ✓                ✕
Anopheles gambiae       ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus  ✓       ✓         ✕                ✓                ✕
Glossina morsitans      ✓       ✕         ✓                ✕                ✕
Ixodes scapularis       ✓       ✓         ✕                ✕                ✕
Pediculus humanus       ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus       ✓       ✕         ✓                ✕                ✕

Page 5

VectorBase website

• The release cycle has allowed more frequent updates to the Ensembl browser
• Includes support for presenting local data (GFF3, BAM, BED, (big)WIG and VCF files)
• Updates/development for specific data types (e.g. PopGen, ontologies and search)

• The VectorBase site needs a style/technology makeover
• Aim to remove clutter from the site and improve the user experience
• Merging our Help wiki (FAQ, tutorials, newsletter, forum) into the main site
• Advantages for site maintenance and flexibility in the coming years
• Now is the time to get in touch with comments and wish-list items

Please contact VectorBase if you have comments about the current site or wish lists for the new site, or if you want to be involved in user testing of the new site.
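GFF3 is one of the local-data formats listed above. The sketch below shows what a single GFF3 feature line carries by parsing it into its nine tab-separated columns. It is a minimal Python sketch that assumes the standard GFF3 column layout and percent-encoding of reserved characters. `GFF3Feature` and `parse_gff3_line` are hypothetical names, and this is not how the VectorBase/Ensembl browser itself reads uploaded files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import unquote


@dataclass
class GFF3Feature:
    """One feature line from a GFF3 file (hypothetical container, columns 1-9)."""
    seqid: str
    source: str
    type: str
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    score: Optional[float]
    strand: str            # '+', '-', '.' or '?'
    phase: Optional[int]   # 0, 1 or 2 for CDS features, otherwise None
    attributes: Dict[str, List[str]] = field(default_factory=dict)


def parse_gff3_line(line: str) -> Optional[GFF3Feature]:
    """Parse one GFF3 line; return None for blank, comment and directive lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_field = cols

    attributes: Dict[str, List[str]] = {}
    if attr_field != ".":
        for pair in attr_field.split(";"):
            if not pair:
                continue  # tolerate a trailing ';'
            key, _, value = pair.partition("=")
            # Multiple values are comma-separated; reserved characters are %-escaped.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    return GFF3Feature(
        seqid=unquote(seqid),
        source=unquote(source),
        type=unquote(ftype),
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )


if __name__ == "__main__":
    # Hypothetical feature line; the IDs and coordinates are made up for illustration.
    example = "contig_1\tmy_lab\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Note=putative%2C partial"
    feature = parse_gff3_line(example)
    print(feature.type, feature.start, feature.end, feature.attributes)
```

Escaped commas (`%2C`) are decoded only after splitting on literal commas. This keeps a single attribute value containing a comma intact.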

Page 6

Pre-sites for upcoming genomes

Page 7

Pre-sites for upcoming genomes

[Figure: Browse, Search]

Page 8

Supporting species without genomic resources

Genome De-linked Annotation Viewer

[Figure: Browse, Search]

Page 9

Supporting species without genomic resources

Page 10

Updating annotation sets

Community
• Submissions from the community (CAP)
• Previously as .xls files
• Soon to also accept FASTA and GFF3
• DAS server for data presentation overhauled
• Integration into the reference gene set codified

VectorBase
• Manual curation at Harvard/New Mexico
• Priority is Anopheles gambiae
• Provides QC for new gene builds
• Final arbiter for issues arising from CAP
• Move to Aedes aegypti in late 2011

The quality of the gene sets will improve faster if you, the community, play an active role in correcting gene predictions. Please contact VectorBase if you find an incorrect prediction or have data sets which can improve the gene set.
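Community submissions are set to accept FASTA in addition to spreadsheets, so a submitter may want to sanity-check sequences before sending them. The following is a minimal sketch under that assumption. `read_fasta` and `check_nucleotides` are hypothetical helpers, not part of any VectorBase tool, and the ACGTN alphabet is an assumed default.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (e.g. an open file)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def check_nucleotides(seq: str, allowed: str = "ACGTN") -> bool:
    """Hypothetical check: True if every base (case-insensitive) is in `allowed`."""
    return set(seq.upper()) <= set(allowed)


if __name__ == "__main__":
    # Made-up records for illustration only.
    for name, seq in read_fasta([">seq1 example", "ACGT", "ACGN", ">seq2", "GGCC"]):
        print(name, len(seq), check_nucleotides(seq))
```

Because `read_fasta` accepts any iterable of lines, an open file handle can be passed to it directly.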

Page 11

Updating annotation sets: RNA-Seq

Aim: gene prediction using high-throughput transcriptome data, a.k.a. ‘RNA-Seq’

Overview
• Alternative method for generating transcript-based gene predictions
• Uses Illumina or 454 reads as well as traditional Sanger-sequenced ESTs
• Relatively short read lengths make intron-exon junction prediction hard, but this is countered by the very high volume of data generated (millions of reads)
• The pipeline uses existing short-read tools for gene prediction: TopHat, Cufflinks, Scripture
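In the SAM/BAM output of spliced aligners such as TopHat, a read that spans an intron carries an `N` (skipped region) operation in its CIGAR string. The sketch below extracts those introns and keeps only junctions supported by several reads. This is one simple way the high read volume can offset short read length. The function names are hypothetical, and the `min_reads=2` threshold is an assumption, not a VectorBase pipeline setting.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# SAM CIGAR operations; M, D, N, = and X advance along the reference.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return introns implied by one alignment as (first, last) 1-based,
    inclusive reference coordinates. `pos` is the SAM POS field (1-based)."""
    if cigar == "*":
        return []  # CIGAR unavailable: no junction information
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    ref = pos
    introns = []
    for length_str, op in ops:
        length = int(length_str)
        if op == "N":  # skipped reference region = candidate intron
            introns.append((ref, ref + length - 1))
        if op in CONSUMES_REF:
            ref += length
    return introns


def supported_junctions(
    alignments: Iterable[Tuple[int, str]], min_reads: int = 2
) -> Dict[Tuple[int, int], int]:
    """Count reads supporting each junction; keep those seen in >= min_reads reads."""
    counts: Counter = Counter()
    for pos, cigar in alignments:
        counts.update(junctions_from_cigar(pos, cigar))
    return {junction: n for junction, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Made-up (POS, CIGAR) pairs: two spliced reads share one junction.
    alignments = [(100, "50M200N50M"), (120, "30M200N70M"), (500, "100M")]
    print(supported_junctions(alignments))
```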

Potential problems
• Data sets require significant filtering and pre-analysis QC
• Mis-calling of homopolymer runs in 454 data leads to noise and mis-prediction of splice sites
• Large data sets include many inappropriate splicing events (intron read-through, NMD targets, etc.)
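The homopolymer problem can be made visible by scanning a sequence for single-base runs and flagging candidate splice sites that sit next to one. This is an illustrative sketch only. The minimum run length of 4 and the 2 bp window are assumed values, and the functions are hypothetical, not part of the VectorBase pipeline.

```python
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, run length) for each single-base run of at least min_len."""
    seq = seq.upper()
    runs = []
    i, n = 0, len(seq)
    while i < n:
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1  # extend the run while the base repeats
        if j - i >= min_len:
            runs.append((i, seq[i], j - i))
        i = j
    return runs


def near_homopolymer(pos: int, runs: List[Tuple[int, str, int]], window: int = 2) -> bool:
    """True if 0-based position `pos` lies within `window` bases of any run."""
    return any(start - window <= pos < start + length + window for start, _, length in runs)


if __name__ == "__main__":
    runs = homopolymer_runs("ACGTTTTTGAAAAC")
    print(runs, near_homopolymer(8, runs))
```

A predicted splice site flagged this way in 454-derived data would be a candidate for extra scrutiny, not automatic rejection.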

Summary
• Effective at finding UTR regions and validating/improving existing predictions
• Vital for making sense of sequence-based measures of gene expression

Page 12

Updating annotation sets: projection from reference

Aim:
• Gene prediction using a ‘high’-quality reference set from a related species

Overview
• When annotating a species for which we have a closely related reference species, we can align the two genomes and project the ‘high’-quality set onto the new assembly.
• This is more effective than a similarity build because it allows genes to be built across contigs, regardless of the assembly.
• Whole-genome alignment (WGA) between reference and target using BLASTz.
• A custom filter ensures that each bp in the target genome is aligned to no more than one position in the reference genome.
• Predictions are projected by transforming coordinates between the reference and target assemblies.

Summary
• Effective for low-coverage and poor-quality assemblies
• Limited to orthologous loci shared by reference and target, i.e. no novel gene prediction
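The projection approach can be sketched in two steps: a one-to-one filter over alignment blocks, followed by coordinate transformation. This Python sketch simplifies heavily and is not the VectorBase pipeline:

- It assumes ungapped, forward-strand blocks. Real BLASTz alignments contain gaps and reverse-strand matches.
- It uses a hypothetical greedy "keep the highest score" rule for the filter.
- It projects each exon independently, so the exons of one gene may land on different target contigs.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped, forward-strand alignment block.
    Coordinates are 0-based; the block covers `length` bases in each genome."""
    ref_seq: str
    ref_start: int
    tgt_seq: str
    tgt_start: int
    length: int
    score: float


def one_to_one_filter(blocks: Sequence[Block]) -> List[Block]:
    """Assumed greedy rule: visit blocks by descending score and drop any block whose
    target interval overlaps a kept one, so each target base is aligned at most once."""
    kept: List[Block] = []
    for block in sorted(blocks, key=lambda b: b.score, reverse=True):
        overlaps = any(
            k.tgt_seq == block.tgt_seq
            and k.tgt_start < block.tgt_start + block.length
            and block.tgt_start < k.tgt_start + k.length
            for k in kept
        )
        if not overlaps:
            kept.append(block)
    return kept


def project_point(blocks: Sequence[Block], ref_seq: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map one 0-based reference position to (target sequence, target position)."""
    for b in blocks:
        if b.ref_seq == ref_seq and b.ref_start <= pos < b.ref_start + b.length:
            return b.tgt_seq, b.tgt_start + (pos - b.ref_start)
    return None  # position is not covered by any kept block


def project_exons(
    blocks: Sequence[Block], ref_seq: str, exons: Sequence[Tuple[int, int]]
) -> List[Optional[Tuple[str, int, int]]]:
    """Project each exon (0-based, half-open) independently. An exon whose ends fall
    on different target sequences, or outside the alignment, maps to None."""
    projected: List[Optional[Tuple[str, int, int]]] = []
    for start, end in exons:
        first = project_point(blocks, ref_seq, start)
        last = project_point(blocks, ref_seq, end - 1)
        if first is None or last is None or first[0] != last[0]:
            projected.append(None)
        else:
            projected.append((first[0], first[1], last[1] + 1))
    return projected


if __name__ == "__main__":
    # Made-up blocks: the third overlaps the first on ctgA and is filtered out.
    blocks = [
        Block("ref", 0, "ctgA", 100, 500, 90.0),
        Block("ref", 500, "ctgB", 0, 500, 80.0),
        Block("ref", 200, "ctgA", 300, 100, 10.0),
    ]
    kept = one_to_one_filter(blocks)
    print(project_exons(kept, "ref", [(50, 150), (600, 700)]))
```

In the usage example, the two exons of one reference gene project onto two different target contigs. This is the kind of cross-contig gene model that a contig-by-contig similarity build would miss. A real implementation would also need to handle strand, alignment gaps and exons split across blocks.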

Page 13

Anopheles gambiae reference sequence

• The PEST assembly has many issues as a reference
• The S molecular form is proposed as the next reference

Hybrid assembly strategy: Sanger*, Illumina† and 454

Metrics of success
• Project existing gene predictions
• De novo prediction in novel regions
• Re-map important datasets

Page 14

Anopheles gambiae reference sequence

Validation of the assembly by standard metrics, with emphasis on concordance with the large-scale restriction map (optical map)
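The slide does not name the individual assembly metrics, but contig or scaffold N50 is the most widely quoted one. Assuming it is among them, a minimal computation looks like this. The function name is hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L together contain
    at least half of all assembled bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    if total == 0:
        raise ValueError("no sequence to summarise")
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # reached half of the total assembly size
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


if __name__ == "__main__":
    # Made-up contig lengths for illustration.
    print(n50([100, 200, 300, 400, 500]))
```

N50 summarises contiguity only. It says nothing about whether contigs are correctly ordered and joined, which is why an independent check such as the optical map matters.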

Page 15

Anopheles gambiae reference sequence

Page 16

Upcoming genomes: Kolymbari 2013?

NHGRI white papers

Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi

Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus

Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni, Stomoxys calcitrans, Musca domestica

Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense

Ticks & mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata

Anopheles: Anopheles darlingi*, Anopheles stephensi

Others

Aedes: Aedes albopictus

i5K initiative

?

...

Page 17

Notices

• 2nd round of Driving Biological Projects solicitation
• 2 years of funding, at most $300K per year
• 2-page letters of interest by August 1st
• Invited full proposals by November 1st
• http://www.vectorbase.org/Other/News/?id=140

• Hiring for an outreach position at Notre Dame
• Details on the University of Notre Dame website
• http://www.vectorbase.org/Other/News/?id=145

Page 18

Contact VectorBase at [email protected]

Page 19

Acknowledgements

EMBL-EBI: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey

Imperial College: Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond

Notre Dame: Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo

Harvard: Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell

IMBB: Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas

New Mexico: Maggie Werner-Washburne, Phil Baker

Sequencers: TIGR/JCVI, WashU, Broad Institute

Ensembl