Data Workflow Overview
Genomics High-Throughput Facility
Genome Analyzer IIx
Institute for Genomics and Bioinformatics
Computation Resources
● ~800 processors
● Sun Grid Engine
Storage Capacity
● ~100 TB (secured)
● Fast drives
● 30 TB for HTS
Public Web Servers
● HTTP, FTP
● Dedicated hosts
● User accounts
HTS: 700 GB/day
Bandwidth: 10 Gb/s
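The throughput and bandwidth figures above support a quick capacity check. The sketch below assumes decimal units (1 TB = 1000 GB), a 10 Gb/s link running at its full nominal rate, and no protocol overhead or compression; the slide states none of these, and the constant and function names are illustrative only.

```python
# Back-of-the-envelope capacity check using the figures on this slide.
# Assumptions (not stated on the slide): decimal units, full line rate,
# no protocol overhead, no compression.

HTS_OUTPUT_GB_PER_DAY = 700  # "HTS: 700 GB/day"
HTS_STORAGE_TB = 30          # "30 TB for HTS"
LINK_GBIT_PER_S = 10         # "Bandwidth: 10 Gb/s"


def days_until_full(storage_tb: float, gb_per_day: float) -> float:
    """Days of HTS output that fit in the HTS storage if nothing is deleted."""
    return storage_tb * 1000 / gb_per_day


def transfer_seconds(size_gb: float, link_gbit_per_s: float) -> float:
    """Ideal time in seconds to move size_gb gigabytes over the link."""
    return size_gb * 8 / link_gbit_per_s


if __name__ == "__main__":
    days = days_until_full(HTS_STORAGE_TB, HTS_OUTPUT_GB_PER_DAY)
    secs = transfer_seconds(HTS_OUTPUT_GB_PER_DAY, LINK_GBIT_PER_S)
    print(f"30 TB holds about {days:.1f} days of output")
    print(f"One day's output moves in about {secs / 60:.1f} minutes")
```

Under those assumptions the 30 TB reserved for HTS holds roughly six weeks of output (about 43 days), and one day's 700 GB moves in under ten minutes at line rate; real transfers will be slower.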
USER
● Sample analysis requests (via web interface)
● Analysis results (via FTP server)
Data Analysis Workflow
IMAGES (2-4 TB)
→ Image Analysis (Firecrest) → INTENSITIES (100-200 GB)
→ Base Calling (Bustard) → BASE CALLS (50-100 GB)
→ Synthesis (Gerald) → SEQUENCES + SCORES (20/30 GB)
→ Alignment (ELAND + reference genome) → GENOME ALIGNMENT (>100 GB)
→ Read Counting (Casava VDC) → READ COUNTS
Sample-Specific Analysis, Visualization…
e.g., genome alignment, RNA-seq, ChIP-seq analysis
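The stage outputs shrink sharply as data moves down the pipeline. The sketch below tabulates the quoted size ranges and bounds the reduction from raw images to sequences plus scores. It assumes 1 TB = 1000 GB and reads "20/30 GB" as a 20-30 GB range; the pairing of each tool with its output follows our reading of the diagram, and the open-ended ">100 GB" alignment output is left out.

```python
# Output-size ranges per stage as quoted on the workflow slide, in GB.
# Assumptions: 1 TB = 1000 GB; "20/30 GB" read as a 20-30 GB range.
STAGES = [
    # (output, producing step, low GB, high GB)
    ("IMAGES", "GA IIx", 2000, 4000),
    ("INTENSITIES", "Firecrest", 100, 200),
    ("BASE CALLS", "Bustard", 50, 100),
    ("SEQUENCES + SCORES", "Gerald", 20, 30),
]


def reduction_range(first, last):
    """Return (smallest, largest) size ratio between two stages' ranges."""
    _, _, lo_a, hi_a = first
    _, _, lo_b, hi_b = last
    # Smallest ratio pairs the smallest input with the largest output.
    return lo_a / hi_b, hi_a / lo_b


if __name__ == "__main__":
    for name, step, lo, hi in STAGES:
        print(f"{name:<20} {step:<10} {lo:>5}-{hi} GB")
    low, high = reduction_range(STAGES[0], STAGES[-1])
    print(f"Images to sequences shrink by roughly {low:.0f}x to {high:.0f}x")
```

Pairing the extreme ends of both ranges gives roughly a 67x to 200x reduction between the image files and the sequence/score files.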
Downloadable files for HTS users
FASTQ files
Sequences, Scores (FASTQ)
@HWUSI-EAS1562_0001:8:1:1119:18138#0/1
ATATTCTTATATAAAAATATAATTATTTTAATATTTGGTCCTTTCGTACTAAAATAT
+HWUSI-EAS1562_0001:8:1:1119:18138#0/1
aaY`_aaY^a``[[`a\\\\aaa_^[aaZZWaaaXXY[VYaW^aaaa[aaa]a[a`
@HWUSI-EAS1562_0001:8:1:1119:13476#0/1
AGAAAGCTTTGAAAATTATGTATACGCCTCGTAAGCCCAGTCCAAAGTCAAGACCA
+HWUSI-EAS1562_0001:8:1:1119:13476#0/1
a_^`a`_a[[NOONN__V__`Y^`^X]R[]]]]]Q```Y````__`^W`YVUPR]]
Sequence identifier
Raw sequence
Phred base-calling quality scores (0 to 62, encoded using ASCII 64 to 126)
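Each FASTQ record spans four lines: the "@" identifier, the raw sequence, a "+" separator line, and one quality character per base. A minimal reader that decodes the quality characters with the ASCII 64 offset quoted above could look like this. It assumes unwrapped four-line records; the names `FastqRecord`, `decode_qualities` and `read_fastq` are illustrative, not part of Casava. Files using the Sanger offset (33) would need `offset=33`.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

PHRED_OFFSET = 64  # ASCII offset quoted on the slide


class FastqRecord(NamedTuple):
    read_id: str          # identifier without the leading "@"
    sequence: str         # raw base calls
    qualities: List[int]  # decoded Phred scores, one per base


def decode_qualities(encoded: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert an ASCII quality string into integer Phred scores."""
    scores = [ord(ch) - offset for ch in encoded]
    # Printable ASCII ends at 126, so offset 64 allows scores 0..62.
    if any(q < 0 or q > 126 - offset for q in scores):
        raise ValueError("quality character outside the range for this offset")
    return scores


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ block (no wrapped sequences)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, decode_qualities(quality))


if __name__ == "__main__":
    # Second record from the slide.
    sample = (
        "@HWUSI-EAS1562_0001:8:1:1119:13476#0/1\n"
        "AGAAAGCTTTGAAAATTATGTATACGCCTCGTAAGCCCAGTCCAAAGTCAAGACCA\n"
        "+HWUSI-EAS1562_0001:8:1:1119:13476#0/1\n"
        "a_^`a`_a[[NOONN__V__`Y^`^X]R[]]]]]Q```Y````__`^W`YVUPR]]\n"
    )
    for rec in read_fastq(io.StringIO(sample)):
        print(rec.read_id, len(rec.sequence), min(rec.qualities), max(rec.qualities))
```

For that record the decoded scores run from 14 (the "N" characters) to 33 (the "a" characters).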
Genome Alignment (ELAND)
HWUSI-EAS1562_0001:8:1:1119:18138#0/1 ATATTCTTATATAAAAATATAATTATTTTAATATTTGGTCCTTTCGTACTAAAATAT U1 0 147 255 chr1.fa 26532086 F 23G
HWUSI-EAS1562_0001:8:1:1119:13476#0/1 AGAAAGCTTTGAAAATTATGTATACGCCTCGTAAGCCCAGTCCAAAGTCAAGACCA U0 1 0 0 chr12.fa 90535786 F
Sequence identifier
Raw Sequence
Type of match
Number of exact/1-error/2-error matches
Chromosome/Position/Direction
Substitution
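The column legend above maps directly onto a whitespace split. The parser below follows exactly the layout shown on this slide: identifier, sequence, match type, three match counts, then chromosome file, position, direction and any substitution tokens. That column order is an assumption taken from the slide; ELAND output variants can carry extra columns this sketch does not model. The comments on match codes (U0/U1/U2 as unique hits with 0/1/2 errors) and on the substitution letter (read here as the reference base) reflect our understanding of ELAND conventions, not something the slide states.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# "<position><base>", e.g. "23G"
SUBSTITUTION = re.compile(r"^(\d+)([ACGTN])$")


class ElandHit(NamedTuple):
    read_id: str
    sequence: str
    match_type: str                       # e.g. U0/U1/U2: unique hit, 0/1/2 errors
    counts: Tuple[int, ...]               # exact, 1-error, 2-error match counts
    chromosome: Optional[str]             # reference file, e.g. "chr12.fa"
    position: Optional[int]
    strand: Optional[str]                 # "F" forward, "R" reverse
    substitutions: List[Tuple[int, str]]  # (read position, base letter)


def parse_eland_line(line: str) -> ElandHit:
    """Split one ELAND line into the fields labelled on the slide."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few columns: {line!r}")
    read_id, sequence, match_type = fields[:3]
    counts = tuple(int(x) for x in fields[3:6])
    chromosome: Optional[str] = None
    position: Optional[int] = None
    strand: Optional[str] = None
    substitutions: List[Tuple[int, str]] = []
    # Lines without a reported hit stop after the match counts.
    if len(fields) >= 9:
        chromosome, position, strand = fields[6], int(fields[7]), fields[8]
        for token in fields[9:]:
            m = SUBSTITUTION.match(token)
            if m is None:
                raise ValueError(f"unexpected substitution token {token!r}")
            substitutions.append((int(m.group(1)), m.group(2)))
    return ElandHit(read_id, sequence, match_type, counts,
                    chromosome, position, strand, substitutions)


if __name__ == "__main__":
    hit = parse_eland_line(
        "HWUSI-EAS1562_0001:8:1:1119:18138#0/1 "
        "ATATTCTTATATAAAAATATAATTATTTTAATATTTGGTCCTTTCGTACTAAAATAT "
        "U1 0 147 255 chr1.fa 26532086 F 23G"
    )
    print(hit.match_type, hit.chromosome, hit.position, hit.strand, hit.substitutions)
```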
Read Counts (Casava VDC)
Matches with Genes, Exons, Splice Junctions
Chromosome / Gene / Matches
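The read-count table pairs each chromosome and gene with a number of matching reads. The sketch below is a deliberately simplified stand-in, not Casava VDC's actual counting rules (which the slide says also consider exons and splice junctions): a read counts toward a gene when its alignment position falls inside the gene's interval on the same chromosome. Gene names and coordinates in the example are hypothetical, and genes on one chromosome are assumed not to overlap.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), inclusive coordinates
Index = Dict[str, Tuple[List[int], List[Gene]]]


def build_index(genes_by_chrom: Dict[str, List[Gene]]) -> Index:
    """Sort genes by start so each lookup is a binary search.

    Assumes genes on the same chromosome do not overlap.
    """
    index: Index = {}
    for chrom, genes in genes_by_chrom.items():
        ordered = sorted(genes, key=lambda g: g[1])
        index[chrom] = ([g[1] for g in ordered], ordered)
    return index


def count_reads(hits: Iterable[Tuple[str, int]], index: Index) -> Counter:
    """Count (chromosome, gene) matches for (chromosome, position) hits."""
    counts: Counter = Counter()
    for chrom, pos in hits:
        if chrom not in index:
            continue
        starts, genes = index[chrom]
        i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
        if i >= 0 and pos <= genes[i][2]:
            counts[(chrom, genes[i][0])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical annotation; positions are the two ELAND hits above.
    idx = build_index({
        "chr12.fa": [("GENE_A", 90_530_000, 90_540_000)],
        "chr1.fa": [("GENE_B", 26_500_000, 26_520_000)],
    })
    print(count_reads([("chr12.fa", 90535786), ("chr1.fa", 26532086)], idx))
```

With this made-up annotation only the chr12 read lands inside a gene; the chr1 read falls past the end of GENE_B and is not counted.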
Files for visualization (GenomeStudio)
Genome alignment, gene expression, RNA-seq and ChIP-seq analysis