BioMart and CHADO
Arek Kasprzyk, GMOD meeting, 16 May 2005
BioMart
• User interfaces (‘advanced search’)
  – Web wizard
  – GUI
  – Text
• Query optimization
• Federation
• Structured database views (dataset)
BioMart schema
[Diagram: databases and datasets]
Dataset
• Organised into 1 to n tables with 0- or 1-level referencing (a database view)
• Filters, attributes
• Exportables, importables, links
• Properties captured by a dataset configuration file
• Can be derived from the source schema by a fixed schema transformation
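The properties listed above can be read as a small data structure. The following is a minimal Python sketch under our own assumptions: the class, its field names and the `table.column` reference format are hypothetical and are not BioMart's API or configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a dataset: 1..n tables plus the names exposed to users."""

    name: str
    main_tables: list[str]
    dimension_tables: list[str] = field(default_factory=list)
    # user-facing name -> "table.column" (assumed reference format)
    attributes: dict[str, str] = field(default_factory=dict)
    filters: dict[str, str] = field(default_factory=dict)
    # link name -> attribute names (exportable) or filter names (importable)
    exportables: dict[str, list[str]] = field(default_factory=dict)
    importables: dict[str, list[str]] = field(default_factory=dict)

    def tables(self) -> list[str]:
        # All tables of the dataset: main tables first, then dimension tables
        return self.main_tables + self.dimension_tables

    def validate(self) -> None:
        if not self.main_tables:
            raise ValueError("a dataset needs at least one main table")
        known = set(self.tables())
        # Every attribute and filter must point into one of the dataset's tables
        for name, ref in [*self.attributes.items(), *self.filters.items()]:
            table = ref.split(".", 1)[0]
            if table not in known:
                raise ValueError(f"{name!r} refers to unknown table {table!r}")
        # Exportables are built from attributes, importables from filters
        for link, names in self.exportables.items():
            if any(n not in self.attributes for n in names):
                raise ValueError(f"exportable {link!r} uses an undefined attribute")
        for link, names in self.importables.items():
            if any(n not in self.filters for n in names):
                raise ValueError(f"importable {link!r} uses an undefined filter")
```

Keeping exportables and importables as references to already declared attributes and filters mirrors the slide's point that all of these properties live together in one dataset configuration.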
Datasets and schema
• Relational DB analogies
  – Each dataset -> table
  – Relational attributes translated to unique filters and attributes
  – Exportable/importable -> PK/FK
  – A collection of datasets with unique names creates a virtual schema
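The schema analogy can be sketched as a registry of uniquely named datasets in which a link is found wherever one dataset's exportable and another dataset's importable share a name, much as a PK/FK pair joins two tables. Matching on the shared name follows the uniprot_id and genomic_region examples later in these slides; the class and dataset names are hypothetical.

```python
class VirtualSchema:
    """Hypothetical registry: uniquely named datasets play the role of tables in a schema."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._datasets: dict[str, tuple[frozenset[str], frozenset[str]]] = {}

    def add(self, dataset: str, exportables=(), importables=()) -> None:
        # Dataset names must be unique, like table names within one relational schema
        if dataset in self._datasets:
            raise ValueError(f"dataset {dataset!r} already exists in {self.name!r}")
        self._datasets[dataset] = (frozenset(exportables), frozenset(importables))

    def links(self) -> list[tuple[str, str, str]]:
        # An exportable and an importable with the same name pair up like a PK and an FK
        found = []
        for src, (exports, _) in self._datasets.items():
            for dst, (_, imports) in self._datasets.items():
                if src != dst:
                    found.extend((src, dst, link) for link in exports & imports)
        return sorted(found)
```

For example, registering `uniprot` with the exportable `uniprot_id` and `hsapiens_gene` with the importable `uniprot_id` yields the link `("uniprot", "hsapiens_gene", "uniprot_id")`.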
Structured and ‘ad hoc’ database views
[Diagrams over three slides: source tables connected by PK/FK relationships, grouped into a dataset]
Dataset - ‘reversed star’
[Diagram: main tables (main1 with key PK1, main2 with keys PK2 and PK1) and dimension (dm) tables referencing them through FK1 and FK2]
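A reversed star can be tried out in SQLite. This minimal sketch is reduced to a single main table; the table and column names are hypothetical and borrow the `__main`/`__dm` naming convention described later in these slides. Filters are applied to the main table, and the dimension table points back to it through a foreign key.

```python
import sqlite3


def build_reversed_star(conn):
    # One main table (one row per gene) and one dimension table (n rows per gene)
    conn.executescript("""
        CREATE TABLE gene__main (
            gene_key INTEGER PRIMARY KEY,
            gene_stable_id TEXT,
            chromosome TEXT);
        CREATE TABLE gene__xref__dm (
            gene_key INTEGER REFERENCES gene__main(gene_key),
            xref_id TEXT);
    """)
    conn.executemany("INSERT INTO gene__main VALUES (?, ?, ?)",
                     [(1, "G1", "1"), (2, "G2", "2")])
    conn.executemany("INSERT INTO gene__xref__dm VALUES (?, ?)",
                     [(1, "P1"), (1, "P2"), (2, "P3")])
    conn.commit()


def genes_with_xref(conn, chromosome):
    # Filter on the main table, then follow the dimension's FK back to it
    return conn.execute("""
        SELECT m.gene_stable_id, d.xref_id
        FROM gene__main AS m
        JOIN gene__xref__dm AS d ON d.gene_key = m.gene_key
        WHERE m.chromosome = ?
        ORDER BY d.xref_id""", (chromosome,)).fetchall()


conn = sqlite3.connect(":memory:")
build_reversed_star(conn)
print(genes_with_xref(conn, "1"))  # [('G1', 'P1'), ('G1', 'P2')]
```

Because every dimension table refers directly to the main table, any query needs at most one join level per dimension, which is the 0- or 1-level referencing described for datasets.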
Dataset: fixed schema transformation
[Diagram: source tables A, B and C transformed into dataset tables TA and TB]
Transformation principles
• Main
  – 1:1, n:1
• Dimension
  – 1:n
  – 1:1, n:1
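One reading of these principles, which is our assumption rather than something the slide spells out: relations that are 1:1 or n:1 as seen from the main table are merged into it, each 1:n relation from the main table becomes a dimension table, and a dimension table in turn absorbs its own 1:1 and n:1 relations. The sketch below plans that grouping; the function and table names are hypothetical, and it assumes an acyclic relation graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    cardinality: str  # "1:1", "n:1" or "1:n", read from source to target


def plan_transformation(dataset, main, relations):
    """Map each output table to the source tables merged into it (assumed rules)."""
    by_source = {}
    for r in relations:
        by_source.setdefault(r.source, []).append(r)

    def fold(table, into, plan):
        # 1:1 and n:1 targets contribute at most one row per source row, so merge them in
        for r in by_source.get(table, []):
            if r.cardinality in ("1:1", "n:1"):
                plan[into].append(r.target)
                fold(r.target, into, plan)

    main_name = f"{dataset}__{main}__main"
    plan = {main_name: [main]}
    fold(main, main_name, plan)
    # A 1:n relation would multiply main rows, so each one becomes a dimension table
    for r in by_source.get(main, []):
        if r.cardinality == "1:n":
            dm_name = f"{dataset}__{r.target}__dm"
            plan[dm_name] = [r.target]
            fold(r.target, dm_name, plan)
    return plan
```

For instance, with `gene` as main, an n:1 relation to `seq_region` lands in the main table, while a 1:n relation to `xref` (itself n:1 to `external_db`) yields a separate `xref` dimension table containing both.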
Application
• Read database metadata
• User input:
  – main, dms, cardinalities
• Write a configuration file
• Translate the configuration into DDL statements
• MartBuilder
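The first steps, reading metadata and recording choices in a configuration file, might look like this for a SQLite source. It is a hypothetical sketch, not MartBuilder: it infers 1:n relations from foreign-key direction instead of asking the user for cardinalities, and the JSON layout of the configuration is invented for illustration.

```python
import json
import sqlite3


def read_foreign_keys(conn):
    """Return (child, child_column, parent, parent_column) for each FK in a SQLite DB."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    fks = []
    for table in tables:
        # foreign_key_list rows: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            fks.append((table, row[3], row[2], row[4]))
    return fks


def write_config(fks, main):
    """Draft a config: tables holding an FK to main are 1:n from main's side, so dm."""
    dms = sorted({child for child, _, parent, _ in fks if parent == main})
    keys = {child: [col, pcol] for child, col, parent, pcol in fks if parent == main}
    return json.dumps({"main": main, "dm": dms, "keys": keys}, indent=2, sort_keys=True)
```

In a real tool the inferred cardinalities would only be a starting point that the user confirms or corrects, as the slide's "user input" step suggests.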
Transformation configuration file
• Focus tables
  – Main, dm
• Central, reference tables
• Type: exported, imported
• Keys
• Optional
  – Column subset
  – User table names
  – Projections
  – Central filters
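A configuration along these lines can be translated into DDL. The sketch below assumes a hypothetical dict layout covering the focus tables (main and dm), keys, a column subset and user table names; projections and central filters are left out, and the layout is not MartBuilder's actual file format. Output tables follow the `dataset__content__type` and `_key` conventions from the naming slide.

```python
import sqlite3


def config_to_ddl(cfg):
    """Translate a (hypothetical) transformation config into CREATE TABLE ... AS SELECT."""
    ds, main = cfg["dataset"], cfg["main"]
    key = main["key"]
    main_cols = ", ".join(f"m.{c}" for c in main["columns"])  # optional column subset
    ddl = [
        f"CREATE TABLE {ds}__{main['name']}__main AS "
        f"SELECT m.{key} AS {key}_key, {main_cols} FROM {main['table']} AS m"
    ]
    for dm in cfg["dm"]:
        dm_cols = ", ".join(f"d.{c}" for c in dm["columns"])
        # The dimension's FK column is renamed to the main table's key column
        ddl.append(
            f"CREATE TABLE {ds}__{dm['name']}__dm AS "
            f"SELECT d.{dm['fk']} AS {key}_key, {dm_cols} FROM {dm['table']} AS d"
        )
    return ddl
```

The statements are plain strings, so they can be reviewed before being executed against a target database.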
Datasets, Attributes and Filters
[Diagram: the GENE table mapped onto the Mart, Dataset, Attribute and Filter concepts]
GENE columns: gene_id (PK), gene_stable_id, gene_start, gene_chrom_end, chromosome, gene_display_id, description
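The mapping from the GENE table to attributes and filters can be sketched as query generation: chosen attributes become SELECT columns and filter values become parameterised WHERE terms. The user-facing names and filter operators below are hypothetical; only the column names come from the slide.

```python
import sqlite3

# Hypothetical user-facing names mapped onto columns of the GENE table
ATTRIBUTES = {
    "Gene stable ID": "gene_stable_id",
    "Chromosome": "chromosome",
    "Description": "description",
}
# Hypothetical filters: name -> (column, comparison operator)
FILTERS = {
    "chromosome": ("chromosome", "="),
    "gene_start_min": ("gene_start", ">="),
    "gene_end_max": ("gene_chrom_end", "<="),
}


def build_query(attributes, filters):
    """Attributes become SELECT columns; filters become parameterised WHERE terms."""
    columns = ", ".join(ATTRIBUTES[a] for a in attributes)
    terms, params = [], []
    for name, value in filters.items():
        column, op = FILTERS[name]
        terms.append(f"{column} {op} ?")
        params.append(value)
    sql = f"SELECT {columns} FROM gene"
    if terms:
        sql += " WHERE " + " AND ".join(terms)
    return sql, params
```

Because users only ever pick names from the two dictionaries, the generated SQL never contains free-form identifiers, and filter values are passed as bound parameters.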
Exportables, Importables and Links
[Diagram: Dataset 1 and Dataset 2 joined by links]
Exportables, Importables and Links
UniProt (exportable) -> Human Ensembl Genes (importable)
• Exportable: name = uniprot_id, attributes = uniprot_ac
• Importable: name = uniprot_id, filters = uniprot_ac_list
• Links:
  – SELECT uniprot_ac FROM ...
  – SELECT … FROM … WHERE uniprot_ac IN (…)
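The UniProt link amounts to running the exportable query on one dataset and feeding the returned values into the importable's list filter on the other. Here is a minimal sketch with two in-memory SQLite databases; the helper names, table layouts and accession values are made up for illustration.

```python
import sqlite3


def run_exportable(src, sql):
    """Exportable side: collect the exported attribute values (e.g. uniprot_ac)."""
    return [row[0] for row in src.execute(sql)]


def run_importable(dst, base_sql, column, values):
    """Importable side: apply the exported values as an IN (...) list filter."""
    if not values:
        return []  # standard SQL does not allow an empty IN () list
    placeholders = ", ".join("?" for _ in values)
    return dst.execute(f"{base_sql} WHERE {column} IN ({placeholders})", values).fetchall()
```

The two databases never need to be joined directly: only the list of exported values crosses from one to the other, which is what allows datasets held on different servers to be linked.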
Exportables, Importables and Links
Encode (exportable) -> Human Ensembl Genes (importable)
• Exportable: name = genomic_region, attributes = chr_name, chr_start, chr_end
• Importable: name = genomic_region, filters = chr_name (=), chr_start (>=), chr_end (<=)
• Links:
  – SELECT chr_name, chr_start, chr_end FROM ...
  – SELECT … FROM … WHERE (chr_name = 1 AND chr_start >= 100 AND chr_end <= 10000) OR (chr_name = 2 AND chr_start >= 50 AND chr_end <= 56780) ...
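For the genomic_region link, each exported row becomes one parenthesised range term and the terms are combined with OR, as in the second query above. A minimal sketch using bound parameters; the `region_filter` helper, the `gene` table and its rows are hypothetical, while the two regions are the ones from the slide.

```python
import sqlite3


def region_filter(regions):
    """OR together one (chr_name =, chr_start >=, chr_end <=) term per exported region."""
    if not regions:
        raise ValueError("at least one region is required")
    clauses, params = [], []
    for chr_name, start, end in regions:
        clauses.append("(chr_name = ? AND chr_start >= ? AND chr_end <= ?)")
        params.extend([chr_name, start, end])
    return " OR ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_stable_id TEXT, chr_name TEXT, chr_start INTEGER, chr_end INTEGER)")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)",
                 [("G1", "1", 150, 900), ("G2", "1", 9000, 20000), ("G3", "2", 60, 500)])
where, params = region_filter([("1", 100, 10000), ("2", 50, 56780)])
rows = conn.execute(f"SELECT gene_stable_id FROM gene WHERE {where} ORDER BY gene_stable_id",
                    params).fetchall()
print(rows)  # [('G1',), ('G3',)]
```

G2 is excluded because it ends beyond the first region, showing that the filter keeps only features fully contained in a region.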
Dataset configuration
• Hierarchical representation of filters and attributes
  – Trees
  – Groups
  – Collections
• Exportables and importables
• Basic relational mapping
• Metadata that defines the user interface
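A hierarchical configuration of this kind can be walked with Python's standard `xml.etree.ElementTree`. The element and attribute names below (`DatasetConfig`, `Tree`, `Group`, `Collection`, `Attribute`, `Filter`, `table`, `column`) are hypothetical stand-ins chosen to mirror the slide, not BioMart's real configuration schema.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """
<DatasetConfig dataset="gene">
  <Tree name="features">
    <Group name="gene">
      <Collection name="ids">
        <Attribute name="gene_stable_id" table="gene__main" column="gene_stable_id"/>
        <Attribute name="gene_display_id" table="gene__main" column="gene_display_id"/>
      </Collection>
    </Group>
  </Tree>
  <Tree name="filters">
    <Group name="region">
      <Collection name="chromosome">
        <Filter name="chromosome" table="gene__main" column="chromosome"/>
      </Collection>
    </Group>
  </Tree>
</DatasetConfig>
"""


def flatten_config(xml_text):
    """Map 'tree/group/collection/leaf' paths to 'Kind:table.column' relational mappings."""
    root = ET.fromstring(xml_text.strip())
    entries = {}
    for tree in root.findall("Tree"):
        for group in tree.findall("Group"):
            for coll in group.findall("Collection"):
                for leaf in coll:  # Attribute or Filter elements
                    path = "/".join([tree.get("name"), group.get("name"),
                                     coll.get("name"), leaf.get("name")])
                    entries[path] = f"{leaf.tag}:{leaf.get('table')}.{leaf.get('column')}"
    return entries
```

The tree/group/collection nesting is what a user interface would render as pages, sections and checkbox groups, while the leaf `table`/`column` pair carries the basic relational mapping.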
Dataset Configuration
[Diagram: several dataset configurations stored as XML documents]
MartEditor
Table naming convention: naïve configuration
• Tables
  – Meta tables: meta_content
  – Data tables: dataset__content__type
• Data table types
  – Main: __main
  – Dimension: __dm
• Columns
  – Key: _key
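The naming convention can be captured in a few helper functions. These are hypothetical helpers; the dataset and content names in the usage are illustrative, and the sketch assumes the double underscore only ever appears as the separator.

```python
DATA_TYPES = ("main", "dm")


def data_table_name(dataset: str, content: str, kind: str) -> str:
    """Build a data table name of the form dataset__content__type."""
    if kind not in DATA_TYPES:
        raise ValueError(f"type must be one of {DATA_TYPES}, got {kind!r}")
    if "__" in dataset or "__" in content:
        raise ValueError("double underscores are reserved as separators")
    return f"{dataset}__{content}__{kind}"


def meta_table_name(content: str) -> str:
    """Build a meta table name of the form meta_content."""
    return f"meta_{content}"


def key_column(column: str) -> str:
    """Key columns carry the _key suffix."""
    return f"{column}_key"


def parse_table_name(name: str) -> dict[str, str]:
    """Split a table name back into its parts according to the convention."""
    if name.startswith("meta_"):
        return {"meta": name[len("meta_"):]}
    dataset, content, kind = name.split("__")  # ValueError unless exactly three parts
    if kind not in DATA_TYPES:
        raise ValueError(f"unknown table type {kind!r}")
    return {"dataset": dataset, "content": content, "type": kind}
```

For example, `data_table_name("hsapiens_gene", "xref", "dm")` gives `hsapiens_gene__xref__dm`, and parsing that name recovers the three parts, which is what lets a naïve configuration be generated from table names alone.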
BioMart architecture
[Diagram: retrieval through MartExplorer, MartShell and MartView on the BioMart API (Java, Perl); configuration as XML, edited with MartEditor; schema transformation with MartBuilder; databases: myDatabase, myMart and public data, local or remote (SNP, Vega, Ensembl, UniProt, MSD)]
BioMart Registry
[Diagram: registries (R) accessed through WWW and GUI clients]
Class diagram - configuration
Class diagram - querying
MartView
MartShell
MartExplorer
Third party software
• Bioconductor (biomaRt) – BioMart schema
• Taverna – BioMart java library
• DAS ProServer – BioMart perl library
biomaRt
Taverna
ProServer
• No programming
• DAS requests and responses are defined by exportables and importables and configured with MartEditor
• DAS1
Where are we?
• 0.2 released in February
• 0.3 to be released in June
  – Platforms
    • MySQL
    • Oracle
    • Postgres
  – Robust error handling
Where are we?
• BioMart v0.2
  – Large-scale data federation (Hinxton)
    • UniProt Proteomes, MSD, Ensembl, Vega
  – Optimizing access to a large database
    • Ensembl, WormBase, ArrayExpress
  – Federating small datasets with public data
    • Pasteur, INRA, Bayer, Unilever, Serono, Sanofi-Aventis, DevGen, etc.
Immediate Future
• MartBuilder
  – GUI
  – XML configuration
• MartView
  – Scalable
  – Configurable
Acknowledgments
• BioMart
  – Damian Smedley (EBI)
  – Darin London (EBI)
  – Will Spooner (CSHL)
• Contributors
  – Arne Stabenau (Ensembl)
  – Andreas Kahari (Ensembl)
  – Craig Melsopp (Ensembl)
  – Katerina Tzouvara (UniProt)
  – Paul Donlon (Unilever)