Data Mining in Ensembl with BioMart
Giulietta Spudich
Simple Text-based Search Engine
‘Mouse Gene’ Gives Us Results
A More Complex Query Is Not as Useful
BioMart: Data Mining
• BioMart is a search engine that can retrieve several types of data in one query and combine them into a single table.
• For example: mouse gene IDs, chromosome, and base-pair position.
• No programming required!
General or Specific Data Tables
• All the genes for one species
• Or… only genes on one specific region of a chromosome
• Or… genes on one region of a chromosome associated with a disease
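Whether a table is general or specific comes down to which filters are sent with the query. As a minimal sketch, assuming the Ensembl gene mart's `chromosome_name`, `start` and `end` filter names and the dataset name `mmusculus_gene_ensembl` (check both against the filter list for your dataset), the same dataset is narrowed simply by adding filters:

```python
import xml.etree.ElementTree as ET


def dataset_element(dataset_name: str, filters: dict[str, str]) -> ET.Element:
    """Build a BioMart-style <Dataset> element with one <Filter> per (name, value) pair."""
    dataset = ET.Element("Dataset", name=dataset_name, interface="default")
    for name, value in filters.items():
        ET.SubElement(dataset, "Filter", name=name, value=value)
    return dataset


# General: no filters, i.e. all genes for one species.
all_genes = dataset_element("mmusculus_gene_ensembl", {})

# Specific: only genes in one region of one chromosome.
# Assumed filter names; the coordinates are illustrative, not from the slides.
region_genes = dataset_element(
    "mmusculus_gene_ensembl",
    {"chromosome_name": "10", "start": "1000000", "end": "2000000"},
)

for element in (all_genes, region_genes):
    print(ET.tostring(element, encoding="unicode"))
```

A disease-associated subset would add one more filter to the same element; its name depends on the dataset and is not given here.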
The First Step: Choose the Dataset
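In the web interface the dataset is picked from a menu. When scripting against BioMart, the dataset is named by an internal identifier instead. Ensembl's gene datasets commonly follow a pattern such as `mmusculus_gene_ensembl` for mouse, but treat the helper below as an assumption and confirm the name against the mart's own dataset list, since some species and strains do not fit the pattern.

```python
def ensembl_gene_dataset(species: str) -> str:
    """Guess an Ensembl gene-mart dataset name from a binomial species name.

    Assumes the pattern <genus initial><species epithet>_gene_ensembl,
    e.g. 'Mus musculus' -> 'mmusculus_gene_ensembl'.
    """
    parts = species.strip().lower().split()
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus species', got {species!r}")
    genus, epithet = parts[0], parts[1]
    return f"{genus[0]}{epithet}_gene_ensembl"


print(ensembl_gene_dataset("Mus musculus"))  # mmusculus_gene_ensembl
print(ensembl_gene_dataset("Homo sapiens"))  # hsapiens_gene_ensembl
```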
The Second Step: Filters
Filters define which genes we are looking at.
Attributes Attach Information
Determine output columns with Attributes.
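Each attribute becomes one output column. A minimal sketch, assuming the Ensembl attribute names `ensembl_gene_id`, `chromosome_name` and `start_position` (matching the "mouse gene IDs, chromosome and base-pair position" example) and assuming the columns come back in the order the attributes are listed:

```python
import xml.etree.ElementTree as ET


def add_attributes(dataset: ET.Element, attributes: list[str]) -> None:
    """Append one <Attribute name=...> element per requested output column."""
    for name in attributes:
        ET.SubElement(dataset, "Attribute", name=name)


dataset = ET.Element("Dataset", name="mmusculus_gene_ensembl", interface="default")
# Assumed attribute names for gene ID, chromosome and gene start position.
add_attributes(dataset, ["ensembl_gene_id", "chromosome_name", "start_position"])

# The ordered attribute names describe the intended column layout of the result table.
columns = [attr.get("name") for attr in dataset.findall("Attribute")]
print(columns)  # ['ensembl_gene_id', 'chromosome_name', 'start_position']
```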
Results
Tables or sequences
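A table result is plain tab-separated text, which is easy to load outside the browser. When an attribute has several values for one gene (GO terms, for instance), that gene typically appears on several rows. The sketch below parses TSV with a header row and collects the values per gene; the column names and rows are placeholders, not real BioMart output.

```python
import csv
import io
from collections import defaultdict


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-separated output with a header row into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def group_values(rows: list[dict[str, str]], key: str, value: str) -> dict[str, list[str]]:
    """Collect the non-empty values of one column for each distinct key."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row[value]:  # skip empty cells, e.g. genes without a GO term
            grouped[row[key]].append(row[value])
    return dict(grouped)


# Placeholder text in the shape of a two-column table; not real BioMart output.
sample = "gene_id\tgo_id\nGENE_A\tGO_TERM_1\nGENE_A\tGO_TERM_2\nGENE_B\t\n"
rows = parse_tsv(sample)
print(group_values(rows, "gene_id", "go_id"))  # {'GENE_A': ['GO_TERM_1', 'GO_TERM_2']}
```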
Query:
• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes?
• In the query:
  • Filters: what we know
  • Attributes: what we want to know
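The same split between filters and attributes is what a scripted BioMart query sends, as XML. The sketch below builds that XML for this query. The dataset, filter and attribute names (`mmusculus_gene_ensembl`, `chromosome_name`, `biotype`, `mgi_symbol`, `go_id`) are assumptions based on the Ensembl mouse mart, and `illumina_probe_v1` is a hypothetical placeholder for the Illumina v1 probe attribute; the web interface's ‘XML’ button shows the exact names for the current release.

```python
import xml.etree.ElementTree as ET

XML_PROLOGUE = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query(dataset_name, filters, attributes, formatter="TSV", header=True):
    """Return BioMart-style query XML: filters select the genes, attributes choose the columns."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter=formatter,
        header="1" if header else "0",
        uniqueRows="1",
        datasetConfigVersion="0.6",
    )
    dataset = ET.SubElement(query, "Dataset", name=dataset_name, interface="default")
    for name, value in filters.items():  # what we know
        ET.SubElement(dataset, "Filter", name=name, value=value)
    for name in attributes:  # what we want to know
        ET.SubElement(dataset, "Attribute", name=name)
    return XML_PROLOGUE + ET.tostring(query, encoding="unicode")


query_xml = build_query(
    "mmusculus_gene_ensembl",
    filters={"chromosome_name": "10", "biotype": "protein_coding"},  # assumed filter names
    attributes=[
        "ensembl_gene_id",
        "ensembl_transcript_id",
        "mgi_symbol",
        "go_id",
        "illumina_probe_v1",  # hypothetical name for the 'Illumina v1' probe attribute
    ],
)
print(query_xml)
```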
A Brief Example
Change the dataset to mouse (Mus musculus).
Select the Genes with Filters
We are looking for mouse genes on chromosome 10 that are protein coding.
Click ‘Filters’.
Expand the ‘REGION’ panel.
Filters (Selecting the Genes)
Set the chromosome to 10.
Filters (Selecting the Genes)
Select ‘protein coding’ in the ‘GENE’ section.
Click on ‘Attributes’
Attributes (Output Options)
We would like GO terms and IDs from MGI (the Mouse Genome Informatics site).
Expand the ‘EXTERNAL’ panel for non-Ensembl IDs.
Attributes (Output)
Scroll down to add ‘Illumina v1’ probes that map to these genes.
Click ‘Results’
The Results Table: Preview
‘Results’ shows gene IDs, GO terms, and Illumina probes for all protein-coding mouse genes on chromosome 10.
For the full result table, click ‘Go’ or view ‘ALL’ rows.
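Outside the browser, the full table can be requested from BioMart's martservice endpoint by passing the query XML as the `query` parameter. This is a sketch assuming the Ensembl endpoint `http://www.ensembl.org/biomart/martservice`; very large queries may be better sent by HTTP POST, and `fetch_results` needs network access.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; other BioMart servers expose martservice under their own host.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def martservice_url(query_xml: str, base: str = MARTSERVICE) -> str:
    """URL-encode the query XML into a GET request URL."""
    return base + "?" + urllib.parse.urlencode({"query": query_xml})


def fetch_results(query_xml: str, base: str = MARTSERVICE, timeout: float = 300) -> str:
    """Download the full result table as text (requires network access)."""
    with urllib.request.urlopen(martservice_url(query_xml, base), timeout=timeout) as response:
        return response.read().decode("utf-8")


example_query = "<Query/>"  # placeholder; pass a complete query XML in practice
print(martservice_url(example_query))
```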
Full Result Table
Columns: Ensembl gene and transcript IDs, GO terms, MGI symbol, Illumina probes
Original Query:
• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes?
• In the query:
  • Filters: what we know
  • Attributes: columns in the Result Table
Other Export Options (Attributes)
• Sequences: UTRs, flanking sequences, cDNA and peptides, etc.
• Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)
• Microarray data
• Protein functions/descriptions (InterPro, GO)
• Orthologous gene sets
• SNP/variation data
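Sequence exports use the same query structure but return FASTA rather than a table. A minimal sketch, assuming Ensembl sequence attribute names such as `peptide` or `cdna`, the `FASTA` formatter, and several filter values passed as one comma-separated string; the gene IDs and FASTA text below are placeholders.

```python
import xml.etree.ElementTree as ET


def sequence_query(dataset_name: str, gene_ids: list[str], seq_attribute: str = "peptide") -> str:
    """Build query XML asking for one sequence type for a list of genes, as FASTA."""
    query = ET.Element(
        "Query", virtualSchemaName="default", formatter="FASTA",
        header="0", uniqueRows="0", datasetConfigVersion="0.6",
    )
    dataset = ET.SubElement(query, "Dataset", name=dataset_name, interface="default")
    # Several filter values passed as one comma-separated string (assumption).
    ET.SubElement(dataset, "Filter", name="ensembl_gene_id", value=",".join(gene_ids))
    ET.SubElement(dataset, "Attribute", name="ensembl_gene_id")
    ET.SubElement(dataset, "Attribute", name=seq_attribute)  # assumed, e.g. "peptide" or "cdna"
    return ET.tostring(query, encoding="unicode")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


query_xml = sequence_query("mmusculus_gene_ensembl", ["GENE_A", "GENE_B"])
placeholder_fasta = ">GENE_A\nMKVL\nAAG\n>GENE_B\nMSS\n"
print(parse_fasta(placeholder_fasta))  # {'GENE_A': 'MKVLAAG', 'GENE_B': 'MSS'}
```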
BioMart Data Sets
• Ensembl genes
• Vega genes
• SNPs
• Compara (homologues and alignments)
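To see which datasets, filters and attributes a mart offers without the web interface, martservice also answers metadata requests selected by a `type` parameter. Sketch only: the endpoint and the mart name `ENSEMBL_MART_ENSEMBL` are assumptions about the current Ensembl server; the registry request lists the marts that are actually available.

```python
import urllib.parse

# Assumed endpoint, as used by Ensembl's BioMart.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"

# Metadata request types and the extra parameter each one needs (None: no extra parameter).
REQUIRED = {"registry": None, "datasets": "mart", "filters": "dataset", "attributes": "dataset"}


def metadata_url(kind: str, base: str = MARTSERVICE, **params: str) -> str:
    """Build a martservice metadata URL such as ...?type=datasets&mart=..."""
    if kind not in REQUIRED:
        raise ValueError(f"unknown metadata type: {kind!r}")
    needed = REQUIRED[kind]
    if needed is not None and needed not in params:
        raise ValueError(f"type={kind} needs a {needed}= parameter")
    return base + "?" + urllib.parse.urlencode({"type": kind, **params})


print(metadata_url("registry"))  # http://www.ensembl.org/biomart/martservice?type=registry
print(metadata_url("datasets", mart="ENSEMBL_MART_ENSEMBL"))
print(metadata_url("attributes", dataset="mmusculus_gene_ensembl"))
```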
BioMart Around the World…
BioMart started at Ensembl…
To where has it travelled?
Central Server
www.biomart.org
WormBase
HapMap
Population frequencies
Inter-population comparisons
Gene annotation
DictyBase
GRAMENE
Rice, Maize, Arabidopsis genomes…
How to Get There
• http://www.biomart.org/biomart/martview
• http://www.ensembl.org/biomart/martview
• Or click on ‘BioMart’ from Ensembl
The Flow
• Choose Dataset (all genes for a species)
• Choose Filters (narrows the gene set)
• Choose Attributes (output options)
BioMart Team
• Arek Kasprzyk
• Syed Haider
• Richard Holland
• Damian Smedley