eScience Case Studies Using Taverna
Dr. Georgina Moulton, The University of Manchester
(on behalf of the myGRID team)


Traditional Bioinformatics

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
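The listing above uses the numbered layout of a GenBank flat file's ORIGIN section. Each line starts with the 1-based position of its first base, followed by blocks of ten bases. Turning that layout back into one usable sequence is the kind of mundane step that benefits from automation. Below is a minimal sketch under that assumption. The function name `parse_numbered_sequence` is hypothetical and is not part of Taverna or any myGRID service.

```python
def parse_numbered_sequence(listing: str) -> str:
    """Join a GenBank ORIGIN-style listing into one contiguous sequence.

    Hypothetical helper: each non-empty line is assumed to hold a 1-based
    start position followed by space-separated blocks of bases.
    """
    parts = []
    expected_start = None
    for line in listing.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start = int(fields[0])
        bases = "".join(fields[1:]).lower()
        # Each line should begin exactly where the previous one ended.
        if expected_start is not None and start != expected_start:
            raise ValueError(f"numbering break at position {start}")
        parts.append(bases)
        expected_start = start + len(bases)
    return "".join(parts)


# First two lines of the listing above (positions 12181-12300).
listing = """
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
"""
seq = parse_numbered_sequence(listing)
print(len(seq))  # 120
```

The position check makes a dropped or duplicated line fail loudly instead of silently producing a shifted sequence.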

Requirements

• Automation
• Reliability
• Repeatability
• Few programming skills required
• Works on distributed resources

Multi-disciplinary

• ~37,000 downloads
• Ranked 210 on SourceForge
• Users in the US, Singapore, UK, Europe, Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
• Heart simulations
• High throughput screening
• Phenotypical studies
• Plants, Mouse, Human
• Astronomy
• Aerospace
• Dilbert Cartoons

Williams-Beuren Syndrome (WBS)

• Sporadic contiguous gene deletion disorder

• 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis

• Haploinsufficiency of the region results in the phenotype

• Multisystem phenotype – muscular, nervous, circulatory systems

• Characteristic facial features
• Unique cognitive profile
• Mental retardation (IQ 40-100, mean ~60; 'normal' mean ~100)
• Outgoing personality, friendly nature, 'charming'

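Williams-Beuren Syndrome Microdeletion

[Figure: chromosome 7 (~155 Mb) with the ~1.5 Mb WBS microdeletion at 7q11.23. Genes shown: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14. Blocks A, B and C each appear in centromeric, medial and telomeric copies (A-cen, A-mid, A-tel; B-cen, B-mid, B-tel; C-cen, C-mid, C-tel). Block A: STAG3, PMS2L. Block B: GTF2IP, NCF1P, GTF2IRD2P. Block C: FKBP6T, POM121, NOLR1. Patient deletions are marked for WBS and SVAS (supravalvular aortic stenosis). Physical map: the sequence 'Gap' lies between clones CTA-315H11 and CTB-51J22.]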

Eichler EE, Clark RA & She X. An assessment of the sequence gaps: unfinished business in a finished human genome. Nature Reviews Genetics (2004) 5:345-354.
Hillier LW et al. The DNA sequence of human chromosome 7. Nature (2003) 424:157-164.

Filling a genomic gap in silico

• Two steps to filling the genomic gap:
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at the nucleotide and amino acid levels

• A number of issues arise if we do it the traditional way:
1. It must be repeated frequently, as information is rapidly added to public databases
2. It is time-consuming and mundane
3. It does not always yield results
4. A huge amount of interrelated data is produced
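Step 1 depends on recognising that a newly found sequence overlaps the known sequence at the edge of the gap. The slides do not say how the Williams workflows detect overlap. Purely as an illustration of the idea, here is a minimal sketch of exact suffix/prefix overlap detection and merging. The helpers `longest_overlap` and `extend_sequence` and the toy sequences are hypothetical. Real tools use alignment that tolerates mismatches and repeats.

```python
def longest_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Exact matching only; returns 0 when no overlap of at least `min_len`
    bases exists.
    """
    left, right = left.upper(), right.upper()
    # Try the longest possible overlap first and shrink towards min_len.
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def extend_sequence(left: str, right: str, min_len: int = 20):
    """Append the non-overlapping part of `right` to `left`, or return None."""
    k = longest_overlap(left, right, min_len)
    return left.upper() + right.upper()[k:] if k else None


# Toy sequences (hypothetical): the last 10 bases of `known` begin `new`.
known = "AAAACCCCGGGGTTTTACGTACGTAC"
new = "ACGTACGTACGATCGATCG"
print(longest_overlap(known, new, min_len=5))  # 10
print(extend_sequence(known, new, min_len=5))  # AAAACCCCGGGGTTTTACGTACGTACGATCGATCG
```

The `min_len` threshold guards against accepting a spurious overlap of a few bases. With the default of 20, the toy pair above is rejected and `longest_overlap` returns 0.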


The Williams Workflows

A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
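Moving from workflow B (nucleotide level) to workflow C (protein level) requires translating coding sequence into amino acids. The slides do not name the services that workflows B and C call. The sketch below shows only the translation step itself, using the standard genetic code (NCBI translation table 1). The `translate` function and the example open reading frame are illustrative assumptions, not part of the original workflows.

```python
from itertools import product

# NCBI translation table 1 (standard code): codons enumerated over T, C, A, G
# with the first position varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0, to_stop: bool = True) -> str:
    """Translate one reading frame (0, 1 or 2) of a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        # 'X' for codons not in the table, e.g. ones containing N.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(orf))                 # MAIVMGR
print(translate(orf, to_stop=False))  # MAIVMGR*KGAR*
```

Translating all three frames on both strands (six-frame translation) follows by calling `translate` with each `frame` on the sequence and on its reverse complement.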

The Biological Results

[Figure: clones CTA-315H11 and CTB-51J22 with genes ELN and WBSCR14; new clones RP11-622P13, RP11-148M21 and RP11-731K22 (314,004 bp extension).]

• All nine known genes identified (40 of 45 exons): CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28
• Four workflow cycles totalling ~10 hours
• The gap was correctly closed and all known features identified

Case Study – Graves Disease

• Autoimmune disease that causes hyperthyroidism

• Antibodies to the thyrotropin receptor result in constitutive activation of the receptor and increased levels of thyroid hormone

• Original myGrid case study
Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004

Graves Disease

The experiment:
• Analysing microarray data to determine genes differentially expressed between Graves Disease patients and healthy controls

• Characterising these genes (and any proteins encoded by them) in an annotation pipeline

• From each Affymetrix probeset identifier, extract information about the genes encoded in that region.

• For each gene, evidence is extracted from other data sources to potentially support it as a candidate for disease involvement
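The first step, finding differentially expressed genes, can be illustrated with a common pair of per-probeset measures: log2 fold change and Welch's t statistic. The slides do not state which statistics the Graves study used. This is a minimal sketch under stated assumptions. The probeset IDs and expression values are invented, the values are assumed to be normalised and on a linear (positive) scale, and `rank_probesets` is a hypothetical helper. A real analysis would also correct for multiple testing.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_probesets(patients, controls, min_abs_log2fc=1.0):
    """Keep probesets whose |log2 fold change| passes the cut-off, ranked by |t|.

    `patients` and `controls` map probeset ID -> list of expression values.
    Returns (probeset, log2 fold change, t) tuples.
    """
    results = []
    for probe in patients.keys() & controls.keys():
        p, c = patients[probe], controls[probe]
        lfc = log2(mean(p) / mean(c))
        if abs(lfc) >= min_abs_log2fc:
            results.append((probe, lfc, welch_t(p, c)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Invented example data: three probesets, three samples per group.
patients = {"probe_1": [200, 220, 210], "probe_2": [50, 55, 52], "probe_3": [100, 110, 90]}
controls = {"probe_1": [50, 60, 55], "probe_2": [48, 50, 51], "probe_3": [400, 380, 420]}
ranked = rank_probesets(patients, controls)
for probe, lfc, t in ranked:
    print(probe, round(lfc, 2), round(t, 2))
```

Here `probe_2` is filtered out by the fold-change cut-off. The surviving probesets are the ones whose genes would then be passed to the annotation pipeline.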

Annotation Pipeline

Evidence includes:
• SNPs in coding and non-coding regions
• Protein products
• Protein structure and functional features
• Metabolic Pathways
• Gene Ontology terms
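One simple way to use the evidence types above is to collect them per gene and rank candidates by how many distinct kinds of support each has. The slides do not describe how the pipeline weighs evidence. The sketch below is a hypothetical illustration only. The `Candidate` class, the evidence-type keys, the gene names and the evidence identifiers are all placeholders, not real data or myGRID APIs.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the evidence types listed above.
EVIDENCE_TYPES = ("snp", "protein_product", "protein_feature", "pathway", "go_term")


@dataclass
class Candidate:
    gene: str
    evidence: dict = field(default_factory=lambda: {t: [] for t in EVIDENCE_TYPES})

    def add(self, kind: str, item: str) -> None:
        """Record one piece of evidence of a known type."""
        if kind not in self.evidence:
            raise ValueError(f"unknown evidence type: {kind}")
        self.evidence[kind].append(item)

    def support(self) -> int:
        """Number of distinct evidence types with at least one item."""
        return sum(1 for items in self.evidence.values() if items)


def rank_candidates(candidates):
    """Most-supported genes first; ties broken alphabetically by gene name."""
    return sorted(candidates, key=lambda c: (-c.support(), c.gene))


# Placeholder genes and evidence identifiers, for illustration only.
a = Candidate("GENE_A")
a.add("snp", "snp_1")
a.add("pathway", "pathway_1")
a.add("go_term", "go_1")
b = Candidate("GENE_B")
b.add("snp", "snp_2")
ranked = rank_candidates([b, a])
print([(c.gene, c.support()) for c in ranked])  # [('GENE_A', 3), ('GENE_B', 1)]
```

Counting distinct evidence types, rather than raw items, keeps one heavily annotated source (for example, many SNPs) from dominating the ranking.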