

VisiGene - A Virtual Microscope and Database for In Situ Images at genome.ucsc.edu

Galt Barber, Donna Karolchik, David Haussler, Jim Kent

VisiGene displays images from in situ RNA hybridization, reporter genes, and other techniques that show where a gene, enhancer, or promoter is active in an organism. VisiGene currently contains ~100,000 images from several high-throughput gene projects, along with images from the literature as curated by the model organism databases.

The controls for VisiGene are simple: a text box for search terms, a scrolling list of thumbnails of the images that match the search terms, and a large region that serves as a virtual microscope for the selected image. Clicking on a region goes to the next level of magnification, centered on that region. VisiGene transmits only the data for the part of the image being viewed, at the scale at which it is being viewed, so the response time is fast. One can scroll through the image by dragging it with the mouse. Underneath the image is a caption that contains a link to the paper associated with the image; hyperlinks to the UCSC Genome Browser page for the genes; the age, sex, and genotype of the organism; and, when available, human-curated information on the anatomical structures in which the gene is active.

The search terms include gene names and symbols, authors, date of publication, organisms, developmental stages, and anatomical structures. The Genome Browser and Gene Sorter contain tracks and columns that link into VisiGene.

Current image sets include mouse transcription factors from the Mahoney Lab, adult mouse brain images from the Allen Brain Atlas, mouse head and brain images from the GENSAT project, whole mount Xenopus laevis images from the Japanese National Institute for Basic Biology, and images from the mouse literature curated by the GXD group of MGI. We are grateful to all who have contributed images to VisiGene so far, and are actively searching for additional image sets.
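The click-to-zoom behaviour described above can be sketched in a few lines. This is only an illustration, not the VisiGene code: it assumes that each magnification level doubles the scale of the previous one, that window positions are clamped to the image edges, and that image sizes round up when shrunk. The function and parameter names (`zoom_in_on_click`, `shrink`, `win_w`, and so on) are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounding up (assumed rounding for shrunk image sizes)."""
    return -(-a // b)


def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(value, high))


def zoom_in_on_click(click_x, click_y, shrink, win_w, win_h, full_w, full_h):
    """Return (new_shrink, left, top) for the window shown after a click.

    click_x, click_y: click position in pixels of the image at the current
                      shrink factor (1 = full resolution, 2 = half size, ...).
    win_w, win_h:     size of the viewing window in pixels.
    full_w, full_h:   size of the full resolution image in pixels.
    """
    # Go one level finer, unless we are already at full resolution.
    new_shrink = max(shrink // 2, 1)
    factor = shrink // new_shrink  # 2 when zooming, 1 when only recentering

    # Size of the image at the new level.
    level_w = ceil_div(full_w, new_shrink)
    level_h = ceil_div(full_h, new_shrink)

    # The clicked point, expressed in the new level's pixel coordinates.
    centre_x = click_x * factor
    centre_y = click_y * factor

    # Center the window on that point, keeping it inside the image.
    left = clamp(centre_x - win_w // 2, 0, max(level_w - win_w, 0))
    top = clamp(centre_y - win_h // 2, 0, max(level_h - win_h, 0))
    return new_shrink, left, top
```

For a 4096x4096 image viewed at 1/4 scale in a 512x512 window, a click at (300, 700) moves to 1/2 scale with the window's top-left corner at (344, 1144).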

Database structure

Indentation shows parent/child relationship between tables. Key fields used to join tables are underlined. In general a key field named xyz links into the id field of the xyz table.

table  fields
submissionSource  id, name, acknowledgement, setUrl, itemUrl, abUrl
submissionSet  id, name, contributors, year, publication, pubUrl, journal, copyright, submissionSource
  journal  id, name, url
  copyright  id, notice
imageFile  id, fileName, priority, imageWidth, imageHeight, submissionSet, submitId, caption
  caption  id, caption
image  id, submissionSet, imageFile, imagePos, paneLabel, sectionSet, sectionIx, specimen, preparation
  specimen  id, name, taxon, genotype, bodyPart, sex, age, minAge, maxAge, notes
    bodyPart  id, name
    sex  id, name
    genotype  id, taxon, strain, alleles
      strain  id, taxon, name
      genotypeAllele  genotype, allele
        allele  id, gene, name
          gene  id, name, locusLink, refSeq, genbank, uniProt, taxon
  preparation  id, fixation, embedding, permeablization, sliceType, notes
    fixation  id, description
    embedding  id, description
    permeablization  id, description
    sliceType  id, name
imageProbe  image, probe, probeColor
  probe  id, gene, antibody, probeType, fPrimer, rPrimer, seq, bac
    gene  id, name, locusLink, refSeq, genbank, uniProt, taxon
    antibody  id, name, description, taxon
    probeType  id, name
    bac  id, name
  probeColor  id, name
expressionLevel  imageProbe, bodyPart, level, cellType, cellSubtype, expressionPattern
  bodyPart  id, name
  cellType  id, name
  cellSubtype  id, name
  expressionPattern  id, name
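To make the join convention concrete, the sketch below follows key fields from a gene name to the image files that show it: gene, probe, imageProbe, image, imageFile. It is a minimal illustration under stated assumptions. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database, creates only the handful of columns needed for the join, and fills the tables with made-up rows (`geneA`, `a.jpg`, and so on are hypothetical). The helper names `build_demo_db` and `files_for_gene` are not part of VisiGene.

```python
import sqlite3

# Only the columns needed for the join; names follow the table listing above.
SCHEMA = """
CREATE TABLE gene       (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE probe      (id INTEGER PRIMARY KEY, gene INTEGER);
CREATE TABLE imageFile  (id INTEGER PRIMARY KEY, fileName TEXT);
CREATE TABLE image      (id INTEGER PRIMARY KEY, imageFile INTEGER);
CREATE TABLE imageProbe (image INTEGER, probe INTEGER);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(1, "geneA"), (2, "geneB")])
    conn.executemany("INSERT INTO probe VALUES (?, ?)",
                     [(1, 1), (2, 2), (3, 1)])
    conn.executemany("INSERT INTO imageFile VALUES (?, ?)",
                     [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")])
    conn.executemany("INSERT INTO image VALUES (?, ?)",
                     [(1, 1), (2, 2), (3, 3)])
    conn.executemany("INSERT INTO imageProbe VALUES (?, ?)",
                     [(1, 1), (2, 2), (3, 3)])
    return conn


def files_for_gene(conn, gene_name):
    """File names of images whose probe targets the named gene.

    Each join follows the rule that a key field named xyz links into
    the id field of the xyz table.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT imageFile.fileName
        FROM gene
        JOIN probe      ON probe.gene = gene.id
        JOIN imageProbe ON imageProbe.probe = probe.id
        JOIN image      ON image.id = imageProbe.image
        JOIN imageFile  ON imageFile.id = image.imageFile
        WHERE gene.name = ?
        ORDER BY imageFile.fileName
        """,
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the demo rows, `files_for_gene(build_demo_db(), "geneA")` returns the two files whose probes point at `geneA`. The same chain of joins should carry over to MySQL with little change, since it relies only on standard inner joins.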

[Figure: image pyramid showing the full resolution image alongside the 1/2x and 1/4x images]

Full sized images are shrunk by factors of 1/2, 1/4, 1/8, 1/16, 1/32, and 1/64. The images at each scale are cut into 512x512 tiles. This processing happens off-line on our computer cluster. JavaScript code in “bigImage.html” requests just those tiles needed to show the current window. bigImage.html is independent of the database and could easily be used to deliver other high resolution imagery over the web.
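The tiling scheme can be illustrated with a short sketch that works out which tiles overlap a viewing window and how many tiles one image produces across all scales. It is written in Python for brevity rather than in the JavaScript of bigImage.html, and it is not the actual code. The (column, row) tile addressing, the rounding up of partial tiles and shrunk sizes, and all function names are assumptions made for the example.

```python
TILE = 512                                # tile edge in pixels
SHRINK_FACTORS = [1, 2, 4, 8, 16, 32, 64]  # full size plus the six shrunk scales


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def level_size(full_w, full_h, shrink):
    """Image size at a given shrink factor (assumed to round up)."""
    return ceil_div(full_w, shrink), ceil_div(full_h, shrink)


def tiles_for_window(full_w, full_h, shrink, left, top, win_w, win_h):
    """List the (col, row) tiles that overlap the viewing window.

    The window is given in pixels of the image at the chosen shrink factor.
    """
    w, h = level_size(full_w, full_h, shrink)
    # Clip the window to the image so tiles outside it are never requested.
    x0, y0 = max(left, 0), max(top, 0)
    x1, y1 = min(left + win_w, w), min(top + win_h, h)
    if x1 <= x0 or y1 <= y0:
        return []
    first_col, last_col = x0 // TILE, (x1 - 1) // TILE
    first_row, last_row = y0 // TILE, (y1 - 1) // TILE
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def total_tiles(full_w, full_h):
    """Number of tiles produced for one image across all scales."""
    total = 0
    for shrink in SHRINK_FACTORS:
        w, h = level_size(full_w, full_h, shrink)
        total += ceil_div(w, TILE) * ceil_div(h, TILE)
    return total
```

For a 4096x3072 image, a 600x400 window positioned at (500, 0) at full resolution touches only three tiles out of the 48 at that scale, which is why transmitting tiles on demand keeps the response time short.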

[Figure: data flow from the image sources to the web browser. The components of the diagram are listed below by stage.]

Image sources, the forms in which their data arrived, and their loaders:
  JAX/MGI: SQL Database, http JPEGs, Gene names. vgLoadJax, 978 lines of C
  Mahoney Lab: Excel Spreadsheet, Laptop JPEGs, PCR Primers. vgLoadMahoney, 724 lines of C
  NCBI Gensat: XML Dump, http JPEGs, BAC Seq. vgLoadGensat, 301 lines of C
  Japanese NIBB: File naming scheme, 3 CDs JPEGs, EST Seq. vgLoadJax, 204 lines of C
  Allen Brain: Excel Spreadsheet, Ext HD JPEG 2000, Clone Seq. vgLoadJax, 253 lines of C

Loader output:
  Directory containing 3 files per submission: submission.ra, imageInfo.tab, caption.txt
  Directories of full sized images

Database and tile preparation:
  visiGeneLoad, 1332 lines of C: ~1,000,000 row MySQL Database
  vgPrepImage, 832 lines of C: ~4,000,000 512x512 JPEG image tiles
  vgGetText, 290 lines of C: Free text gene-aware index

Web delivery:
  hgVisiGene: Web CGI script, 3988 lines of C
  bigImage.html: JavaScript + HTML, 1098 lines
  Your web browser

Acknowledgements

Imagery and Caption Data:

Paul Gray and the Mahoney Lab

Martin Ringwald, Susan McKlatchy, Janan Eppig, and the Gene Expression folks at MGI/Jackson Labs

Michael Dicuccio at NCBI and the GENSAT project

Naeto Ueno and the Japanese National Institute for Basic Biology

Susan Sunkin and the Allen Brain Institute

Software Tools:

MySQL

ImageMagick

ER Mapper (for JPEG 2000 libraries)

GNU Compiler Collection & Linux

Funding:

VisiGene was developed as a skunk works project under NHGRI grant 1P41HG02371.

Special thanks to the Quality Assurance Group at genome.ucsc.edu for all their help in making VisiGene a robust web application.