eScience Case Studies Using Taverna
Dr. Georgina Moulton, The University of Manchester
(on behalf of the myGRID team)
Traditional Bioinformatics
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
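The listing above follows the layout of the ORIGIN section of a GenBank flat file: a start coordinate followed by blocks of ten bases. The slide presumably illustrates the manual, copy-and-paste style of traditional bioinformatics. As a minimal sketch (the function name is hypothetical and not part of Taverna or any library), such a listing can be turned back into a single sequence string while checking that the coordinates follow on without gaps:

```python
import re
from typing import Iterable

# Allowed characters in a block of bases (hypothetical choice: A, C, G, T, N).
BLOCK = re.compile(r"[acgtn]+", re.IGNORECASE)


def parse_numbered_sequence(lines: Iterable[str]) -> str:
    """Join GenBank ORIGIN-style lines ("12181 acatttctac ...") into one sequence.

    Raises ValueError if a line's start coordinate does not follow on from the
    previous line, or if a block contains unexpected characters.
    """
    parts = []
    expected = None  # coordinate the next line should start at
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        start = int(fields[0])
        if expected is not None and start != expected:
            raise ValueError(f"line starts at {start}, expected {expected}")
        for block in fields[1:]:
            if not BLOCK.fullmatch(block):
                raise ValueError(f"unexpected characters in block {block!r}")
        bases = "".join(fields[1:])
        parts.append(bases)
        expected = start + len(bases)
    return "".join(parts)


listing = [
    "12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt",
    "12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt",
]
sequence = parse_numbered_sequence(listing)
print(len(sequence))  # 120
```

The coordinate check is what catches a line dropped or duplicated during manual copying, which is exactly the kind of silent error that repeated hand-editing invites.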
Requirements
• Automation
• Reliability
• Repeatability
• Little programming skill required
• Works on distributed resources
Multi-disciplinary
• ~37,000 downloads
• Ranked 210 on SourceForge
• Users in the US, Singapore, UK, Europe and Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
• Heart simulations
• High-throughput screening
• Phenotypic studies
• Plants, mouse, human
• Astronomy
• Aerospace
• Dilbert cartoons
Williams-Beuren Syndrome (WBS)
• Contiguous sporadic gene deletion disorder
• Affects ~1 in 20,000 live births; caused by unequal crossover (homologous recombination) during meiosis
• Haploinsufficiency of the region results in the phenotype
• Multisystem phenotype – muscular, nervous, circulatory systems
• Characteristic facial features
• Unique cognitive profile
• Intellectual disability (IQ 40-100, mean ~60; general-population mean ~100)
• Outgoing personality, friendly, 'charming' nature
Williams-Beuren Syndrome Microdeletion
[Figure: chromosome 7 (~155 Mb) with the ~1.5 Mb WBS deletion region at 7q11.23. Genes shown: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2 and WBSCR14. Blocks A, B and C are each shown as centromeric (cen), middle (mid) and telomeric (tel) copies. Also shown: WBS and SVAS patient deletions, and the physical map with a 'gap' between clones CTA-315H11 and CTB-51J22.]
Eichler E, Clark R & She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:345-354
Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164
Filling a genomic gap in silico
• Two steps to filling the genomic gap:
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at the nucleotide and amino acid level
• Doing this the traditional way raises a number of issues:
1. It must be repeated frequently, because information is rapidly added to public databases
2. It is time-consuming and mundane
3. It does not always produce results
4. A huge amount of interrelated data is produced
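Step 1 above rests on detecting sequence that overlaps the ends of what is already known. In practice this is usually done with alignment searches that tolerate mismatches, such as BLAST against public databases. The sketch below only illustrates the underlying idea, assuming exact suffix/prefix matches: a contig is extended greedily, one candidate at a time, until nothing else overlaps. The function names, the 5-base minimum overlap and the toy sequences are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def overlap_length(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    provided it is at least `min_len` bases; otherwise 0."""
    for n in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def extend_contig(contig: str, candidates: Iterable[str],
                  min_len: int = 5) -> Tuple[str, List[str]]:
    """Greedily extend `contig` to the right, one candidate per cycle,
    always taking the candidate with the longest overlap."""
    contig = contig.upper()
    pool = [c.upper() for c in candidates]
    used: List[str] = []
    while True:
        best: Optional[str] = None
        best_n = 0
        for cand in pool:
            n = overlap_length(contig, cand, min_len)
            # The candidate must overlap AND contribute at least one new base.
            if n > best_n and len(cand) > n:
                best, best_n = cand, n
        if best is None:
            return contig, used
        contig += best[best_n:]
        used.append(best)
        pool.remove(best)


known_end = "AAAACCCCGGGGTTTT"
pool = ["ACGTACGTCATCATCA", "GATTACAGATTACA", "GGGGTTTTACGTACGT"]
extended, used = extend_contig(known_end, pool)
print(extended)  # AAAACCCCGGGGTTTTACGTACGTCATCATCA
print(used)      # ['GGGGTTTTACGTACGT', 'ACGTACGTCATCATCA']
```

The second candidate only overlaps once the first has been added, which is why the search has to be rerun after every extension; the unrelated third sequence is never used.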
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
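Taverna workflows are assembled in a graphical workbench and call remote services, so the following is not Taverna code. It is a minimal Python sketch of the dataflow idea behind workflows B and C: each stage reads named data items and adds new ones, and the runner records which stage produced which outputs (a crude form of provenance). The stage bodies are toy stand-ins (GC content, the first ATG-initiated reading frame, translation with the standard genetic code), and every name is hypothetical; workflow A would be one more stage placed in front.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the shared data dictionary and returns new named outputs.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def characterise_nucleotide(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy stage B: GC content plus the first ATG-initiated reading frame,
    up to and including the first in-frame stop codon (or the sequence end)."""
    seq = data["sequence"].upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    orf = ""
    start = seq.find("ATG")
    if start != -1:
        for pos in range(start, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            orf += codon
            if CODONS.get(codon) == "*":
                break
    return {"gc_content": gc, "orf": orf}


def characterise_protein(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy stage C: translate the reading frame ('X' for unknown codons)."""
    orf = data["orf"]
    protein = "".join(CODONS.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))
    protein = protein.rstrip("*")  # drop the terminal stop symbol
    return {"protein": protein, "protein_length": len(protein)}


def run_workflow(stages: List[Tuple[str, Stage]], inputs: Dict[str, Any]):
    """Run stages in order, recording which stage produced each output."""
    data = dict(inputs)
    provenance: List[Tuple[str, List[str]]] = []
    for name, stage in stages:
        outputs = stage(data)
        data.update(outputs)
        provenance.append((name, sorted(outputs)))
    return data, provenance


result, trace = run_workflow(
    [("B: nucleotide", characterise_nucleotide), ("C: protein", characterise_protein)],
    {"sequence": "ccATGGCTTGGTAAgg"},
)
print(result["protein"])  # MAW
print(trace)
```

Keeping every intermediate result under a name, together with the stage that produced it, is what makes a run repeatable and inspectable, which matches the "Repeatability" requirement listed earlier.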
The Biological Results
[Figure: the gap between clones CTA-315H11 and CTB-51J22 (genes ELN and WBSCR14 marked), with clones RP11-622P13, RP11-148M21 and RP11-731K22]
• 314,004 bp extension
• All nine known genes identified (40/45 exons identified): CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28
• Four workflow cycles totalling ~10 hours
• The gap was correctly closed and all known features identified
Case Study – Graves' Disease
• Autoimmune disease that causes hyperthyroidism
• Antibodies to the thyrotropin receptor result in constitutive activation of the receptor and increased levels of thyroid hormone
• Original myGrid case study
Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004
Graves' Disease
The experiment:
• Analysing microarray data to identify genes differentially expressed between Graves' disease patients and healthy controls
• Characterising these genes (and any proteins encoded by them) in an annotation pipeline
• From each Affymetrix probe set identifier, extract information about the genes encoded in the corresponding region
• For each gene, extract evidence from other data sources that may support it as a candidate for involvement in the disease
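The slides do not say which statistics were used to call genes differentially expressed. As a generic sketch of one common approach (not necessarily the study's), each probe set can be scored by its log2 fold change and Welch's t-statistic between patients and controls. The probe set names and expression values below are invented, and the values are assumed to be log2-transformed already. A real analysis would also normalise the arrays and correct for multiple testing.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t-statistic for two samples with possibly unequal variances.
    Needs at least two values per group; fails if both variances are zero."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_probe_sets(patients: Dict[str, List[float]],
                    controls: Dict[str, List[float]],
                    min_abs_log2_fc: float = 1.0) -> List[Tuple[str, float, float]]:
    """Return (probe set, log2 fold change, t) for probe sets whose mean
    difference reaches `min_abs_log2_fc`, sorted by |t|, largest first."""
    hits = []
    for probe_set, p in patients.items():
        c = controls[probe_set]
        log2_fc = mean(p) - mean(c)  # log2 data: difference of means = log2 fold change
        if abs(log2_fc) >= min_abs_log2_fc:
            hits.append((probe_set, log2_fc, welch_t(p, c)))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Invented toy values for two hypothetical probe sets.
patients = {"probe_set_A": [8.0, 8.2, 7.8], "probe_set_B": [5.0, 5.1, 4.9]}
controls = {"probe_set_A": [6.0, 6.2, 5.8], "probe_set_B": [5.0, 5.2, 4.8]}
for probe_set, fc, t in rank_probe_sets(patients, controls):
    print(f"{probe_set}: log2FC={fc:.2f}, t={t:.2f}")  # probe_set_A: log2FC=2.00, t=12.25
```

Only probe_set_A passes the fold-change filter; its genes would then go on to the annotation pipeline.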
Annotation Pipeline
Evidence includes:
• SNPs in coding and non-coding regions
• Protein products
• Protein structure and functional features
• Metabolic pathways
• Gene Ontology terms
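The slide lists the kinds of evidence gathered but not how they were combined. A minimal sketch, assuming each evidence type is available as a lookup from gene symbol to a list of hits, collects the evidence for each gene and ranks genes by how many evidence types support them. Here the lookups are toy in-memory tables; in the pipeline they would presumably be calls to remote databases and services. The ranking rule, function names, gene names and data are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# An evidence source maps a gene symbol to a list of evidence items.
EvidenceSource = Callable[[str], List[str]]


def annotate(genes: List[str],
             sources: Dict[str, EvidenceSource]) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Collect evidence for each gene from every source, then rank genes by the
    number of evidence types that returned something (ties keep input order)."""
    report = []
    for gene in genes:
        evidence: Dict[str, List[str]] = {}
        for kind, lookup in sources.items():
            items = lookup(gene)
            if items:  # keep only evidence types with at least one hit
                evidence[kind] = items
        report.append((gene, evidence))
    report.sort(key=lambda entry: len(entry[1]), reverse=True)  # stable sort
    return report


# Toy lookup tables standing in for remote services (all values invented).
TOY_TABLES = {
    "SNPs": {"GENE_X": ["snp_1", "snp_2"]},
    "GO terms": {"GENE_X": ["go_term_1"], "GENE_Y": ["go_term_2"]},
    "Metabolic pathways": {"GENE_X": ["pathway_1"]},
}
sources = {kind: (lambda gene, table=table: table.get(gene, []))
           for kind, table in TOY_TABLES.items()}

for gene, evidence in annotate(["GENE_Y", "GENE_X"], sources):
    print(gene, sorted(evidence))
```

Counting evidence types is only one possible scoring rule; keeping the full evidence dictionary per gene leaves room for a biologist to weigh the items by hand.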