Biostatistics & Bioinformatics Unit (research summary)

Upload: lucien

Post on 15-Jan-2016


TRANSCRIPT

Page 1: Biostatistics & Bioinformatics Unit (research summary)

Biostatistics & Bioinformatics Unit (research summary)

Page 2: Biostatistics & Bioinformatics Unit (research summary)

BBU members
• Peter Holmans
• Valentina Moskvina
• Andrew Pocklington
• Marian Hamshere
• Dobril Ivanov
• Giancarlo Russo
• Alex Richards
• Alexey Vedernikov

Page 3: Biostatistics & Bioinformatics Unit (research summary)

Research Areas
• Genome-wide association analyses
• Polygenic analyses to investigate genetic architecture and relationship between traits
• Sub-phenotype analysis: can refining the phenotype refine the association signal?
• Gene-wide analysis to summarise association evidence per gene
• Genome-wide interaction analysis: statistically desirable?
• Integrating gene expression data and association data: are eQTLs useful for predicting disease?
• Pathway analysis: are sets of biologically-related genes enriched for association (or CNV) signal?
• Next generation sequencing

Page 4: Biostatistics & Bioinformatics Unit (research summary)

Data
• Samples
– Bipolar, Schizophrenia, Alzheimer’s Disease, Parkinson’s, ADHD, VCFS
– 3k cases, 5k controls (on average)
– 9k cases, 12k controls through collaboration
• SNP data
– 500k-1.2M genotyped (genome-wide on chip)
– 8.5M imputed
   • currently 1kG + HapMap3 CEU+TSI samples as reference panel
   • 256 node cluster (PBS script)
– 200k custom chip
• QC
– Remove poor samples, poor SNPs, minimise systematic bias, cluster plots
• Merging data
– Strand alignment, overlapping samples
• Analyse & store results
– Plink, snptest, mach2dat, custom scripts etc

Marian Hamshere
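
The strand alignment step in the merging bullet can be illustrated with a small allele check: before two genotype datasets are merged, each SNP's allele pair is compared with a reference and flipped to the complementary strand where needed. This is a minimal sketch, assuming alleles arrive as single-letter pairs; the function name `align_alleles` and its return labels are hypothetical and are not taken from the unit's own scripts.

```python
# Complementary bases on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_alleles(ref_pair, study_pair):
    """Classify a study SNP's alleles against a reference allele pair.

    Returns one of:
      'ambiguous' - A/T or C/G SNP; strand cannot be told from alleles alone
      'match'     - same alleles on the same strand
      'flip'      - same SNP reported on the opposite strand
      'mismatch'  - alleles disagree even after flipping
    """
    ref = set(ref_pair)
    study = set(study_pair)
    flipped = {COMPLEMENT[a] for a in study}
    if study == flipped:
        # The pair is its own complement (A/T or C/G).
        return "ambiguous"
    if study == ref:
        return "match"
    if flipped == ref:
        return "flip"
    return "mismatch"


if __name__ == "__main__":
    print(align_alleles(("A", "G"), ("T", "C")))  # opposite strand
    print(align_alleles(("A", "G"), ("A", "T")))  # strand-ambiguous SNP
```

Ambiguous SNPs are reported separately here because they usually need allele frequencies, or removal, to resolve; which policy to apply is left to the analyst.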

Page 5: Biostatistics & Bioinformatics Unit (research summary)

Data

[Diagram labels: Genome; SNPs; Genes (exons)]

Page 6: Biostatistics & Bioinformatics Unit (research summary)

Phenotype data
• Plenty!
– Psychological measures, e.g. grandiose delusions, depression
– fMRI brain volume
– Neurocognitive measures, e.g. IQ, speed tasks
• Define phenotypes across
– Collections
– Diseases
• Targeted/guided hypotheses
• Data reduction techniques, e.g. PCA
• Cross disorder polygenic analysis (with care!)
• Phenotype analysis -> guide GWAS

Marian Hamshere
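
As an illustration of the data reduction bullet, several correlated phenotype measures can be collapsed into a single score along their first principal component. The sketch below uses plain Python and power iteration so that it has no dependencies; the function name and the toy measurements are assumptions made for illustration, and a real analysis would normally use an established statistics package.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) for the first principal component.

    rows: one list of phenotype measurements per individual.
    Uses power iteration on the sample covariance matrix; this assumes the
    all-ones starting vector is not orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all; keep the current vector
        v = [x / norm for x in w]
    # Project each centred individual onto the component.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


if __name__ == "__main__":
    toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # toy values only
    loadings, scores = first_principal_component(toy)
    print(loadings, scores)
```

The resulting scores could then be used as a single quantitative phenotype in place of the original measures.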

Page 7: Biostatistics & Bioinformatics Unit (research summary)

Databases
• 37 interrelated MySQL databases on 3 servers, approximately:
- 730 tables
- 9,600,000,000 records
- 345 GB of disk space
• 2 Web Applications using the databases

WGA Results:
- Genome-wide association analysis results (AD, BD, SZ, PD)
- SNP-to-Gene and Gene-to-SNP annotations
- Upload SNP lists for analysis and annotation
- Download query results, top hits, entire resultsets

WTCCC mirror:
- analysis, SNP, genotype, phenotype and sample data
- selection and graphical representation of data according to user filters

Alexey Vedernikov
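
The SNP-to-Gene and Gene-to-SNP annotations listed above can be thought of as a positional join between a SNP table and a gene table. The sketch below shows the idea with an in-memory SQLite database standing in for MySQL; the table names, column names and all rows are hypothetical toy values, not the unit's actual schema or data.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with made-up genes and SNPs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE gene (name TEXT, chrom TEXT, start_bp INTEGER, end_bp INTEGER);
        CREATE TABLE snp (snp_id TEXT, chrom TEXT, pos_bp INTEGER);
    """)
    con.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", [
        ("GENE_A", "1", 100, 200),
        ("GENE_B", "1", 150, 300),   # overlaps GENE_A
        ("GENE_C", "2", 100, 200),
    ])
    con.executemany("INSERT INTO snp VALUES (?, ?, ?)", [
        ("snp1", "1", 120),
        ("snp2", "1", 180),
        ("snp3", "2", 150),
        ("snp4", "1", 500),          # intergenic
    ])
    return con


def snps_in_gene(con, gene_name):
    """Gene-to-SNP: SNPs located within the gene's boundaries."""
    rows = con.execute("""
        SELECT s.snp_id FROM snp AS s
        JOIN gene AS g
          ON s.chrom = g.chrom AND s.pos_bp BETWEEN g.start_bp AND g.end_bp
        WHERE g.name = ? ORDER BY s.pos_bp
    """, (gene_name,)).fetchall()
    return [r[0] for r in rows]


def genes_for_snp(con, snp_id):
    """SNP-to-Gene: genes whose boundaries contain the SNP."""
    rows = con.execute("""
        SELECT g.name FROM gene AS g
        JOIN snp AS s
          ON s.chrom = g.chrom AND s.pos_bp BETWEEN g.start_bp AND g.end_bp
        WHERE s.snp_id = ? ORDER BY g.name
    """, (snp_id,)).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = build_toy_db()
    print(snps_in_gene(con, "GENE_A"))
    print(genes_for_snp(con, "snp2"))
```

A SNP can map to more than one gene when genes overlap, which is why both functions return lists.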

Page 8: Biostatistics & Bioinformatics Unit (research summary)

eQTLs and polygenic score analysis
• Analysis
- Polygenic score analysis is a method of aggregating genotype data across many SNPs to predict affected status
- eQTL analysis is a way of linking genotype data with gene expression levels
- eQTL analysis is used to define groups of SNPs with a greater or lesser effect on global brain gene expression
• Data
- eQTLs are defined in the datasets of Myers et al (8361 transcripts and 380157 SNPs, for 163 adult control brains) and Gibbs et al (2532685 SNPs and around 14000 transcripts in 125 adult control brain samples, across 4 brain areas)
- ISC and MGS data: SNPs with a greater effect on global gene expression generally predict schizophrenia affected status significantly better than those with a lesser effect

Alex Richards
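
The aggregation described in the polygenic score bullet can be sketched as a weighted sum of risk-allele counts. The version below assumes, as is common practice, that SNPs are selected by a p-value threshold in a training sample and weighted by their log odds ratios; the function names, the threshold and the toy numbers are illustrative assumptions rather than details from the slides.

```python
def select_weights(training_results, p_threshold):
    """Keep SNPs whose training p-value is below the threshold.

    training_results: {snp_id: (log_odds_ratio, p_value)}
    Returns {snp_id: log_odds_ratio} for the selected SNPs.
    """
    return {snp: beta
            for snp, (beta, p) in training_results.items()
            if p < p_threshold}


def polygenic_score(allele_counts, weights):
    """Weighted sum of risk-allele counts over SNPs present in both inputs.

    allele_counts: {snp_id: 0, 1 or 2 copies of the risk allele}
    """
    shared = allele_counts.keys() & weights.keys()
    return sum(weights[snp] * allele_counts[snp] for snp in shared)


if __name__ == "__main__":
    # Toy training results: (log odds ratio, p-value) per SNP.
    training = {"snp1": (0.2, 0.001), "snp2": (-0.1, 0.3), "snp3": (0.5, 0.04)}
    weights = select_weights(training, p_threshold=0.05)
    person = {"snp1": 2, "snp2": 1, "snp3": 1}
    print(polygenic_score(person, weights))
```

To compare SNP groups with greater or lesser effects on expression, the same score could be computed separately from each group's SNPs and each tested for how well it predicts affected status in an independent sample.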

Page 9: Biostatistics & Bioinformatics Unit (research summary)

Gene-wide analysis
• Analysis
- GWA studies are focused on SNPs as the unit of analysis
- Complex patterns of association might not be reflected by association to the same SNPs in different samples
- Power to detect association might be enhanced by exploiting information from multiple (quasi) independent signals within genes
- Risk likely reflects the co-action of several loci but the approximate numbers of loci involved at the individual or the population levels are unknown
• Methods
- SNP – Gene annotations
- Permutations
- Use of summary statistics only

Valentina Moskvina
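
The permutation approach listed under Methods can be illustrated with a simple gene-wide test: compute one statistic across all SNPs annotated to a gene, then shuffle case-control labels to obtain its null distribution. Because whole individuals are permuted, the correlation between SNPs in the gene is preserved. This is only a sketch that assumes raw genotypes are available; the statistic used here (the largest case-control difference in mean allele count) is an illustrative choice, not the unit's method, and it does not cover the summary-statistics-only variant.

```python
import random


def gene_statistic(genotypes, is_case):
    """Largest absolute case-control difference in mean allele count.

    genotypes: one list of allele counts (0/1/2) per SNP in the gene,
               each ordered by individual.
    is_case:   list of booleans, one per individual.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    best = 0.0
    for snp in genotypes:
        case_mean = sum(g for g, c in zip(snp, is_case) if c) / n_case
        ctrl_mean = sum(g for g, c in zip(snp, is_case) if not c) / n_ctrl
        best = max(best, abs(case_mean - ctrl_mean))
    return best


def gene_permutation_p(genotypes, is_case, n_perm=1000, seed=0):
    """Empirical gene-wide p-value from label permutations."""
    rng = random.Random(seed)
    observed = gene_statistic(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gene_statistic(genotypes, labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    status = [True] * 10 + [False] * 10
    toy_gene = [[2] * 10 + [0] * 10, [1] * 20]  # toy genotypes only
    print(gene_permutation_p(toy_gene, status, n_perm=200))
```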

Page 10: Biostatistics & Bioinformatics Unit (research summary)

Biological models of disease

Andrew Pocklington

• Molecular systems biology: models of neuronal signalling and diversity

Text-mining/data curation (interactions, functional annotations)

Network analysis

Data integration

Page 11: Biostatistics & Bioinformatics Unit (research summary)

Biological models of disease

Andrew Pocklington

• Molecular systems biology: models of neuronal signalling and diversity

• Neurobiology of disease: use these models to understand disease genetics (e.g. by identifying biologically-relevant sets of genes for pathway analysis).

Page 12: Biostatistics & Bioinformatics Unit (research summary)

Biological models of disease

179 neuro-anatomical structures

21,000 genes

http://www.brain-map.org/

Page 13: Biostatistics & Bioinformatics Unit (research summary)


Page 14: Biostatistics & Bioinformatics Unit (research summary)

Pathway analysis

Peter Holmans

• GWAS data: do pathways contain a larger number of significantly-associated genes than expected? (allowing for varying gene sizes, numbers of SNPs, genetic linkage,...)

Testing whether biologically-related genes are enriched for association signal

Page 15: Biostatistics & Bioinformatics Unit (research summary)
Page 16: Biostatistics & Bioinformatics Unit (research summary)
Page 17: Biostatistics & Bioinformatics Unit (research summary)

Pathway analysis

Peter Holmans

• GWAS data: do pathways contain a larger number of significantly-associated genes than expected? (allowing for varying gene sizes, numbers of SNPs, genetic linkage,...)

• CNV data: do CNVs in cases hit more genes in a pathway than CNVs in controls? (allowing for varying gene and CNV sizes)

Testing whether biologically-related genes are enriched for association signal

Page 18: Biostatistics & Bioinformatics Unit (research summary)
Page 19: Biostatistics & Bioinformatics Unit (research summary)

Pathway analysis

Peter Holmans

• GWAS data: do pathways contain a larger number of significantly-associated genes than expected? (allowing for varying gene sizes, numbers of SNPs)

• CNV data: do CNVs in cases hit more genes in a pathway than CNVs in controls? (allowing for varying gene and CNV sizes)

• Future:
– Continue to develop better-annotated pathways, using genomic data from multiple sources (e.g. expression, proteomics)
– Extend methods to use next-gen sequencing data

Testing whether biologically-related genes are enriched for association signal
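
The GWAS question above, whether a pathway contains more significantly-associated genes than expected, has a simple baseline in the hypergeometric test. The sketch below is that baseline only: it treats every gene as equally likely to be significant, so it makes none of the allowances for gene size, number of SNPs or linkage that the slides call for. The function name and the example counts are assumptions for illustration.

```python
from math import comb


def enrichment_p(n_genes, n_pathway, n_significant, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes:       total genes tested
    n_pathway:     genes belonging to the pathway
    n_significant: genes called significantly associated
    n_overlap:     significant genes that are in the pathway
    """
    upper = min(n_pathway, n_significant)
    total = comb(n_genes, n_significant)
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy counts: 20 genes, 5 in the pathway, 4 significant, 2 of them in the pathway.
    print(enrichment_p(20, 5, 4, 2))
```

Replacing the analytic null with one built from permuted or size-matched gene sets is one way to introduce the corrections mentioned on the slide.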

Page 20: Biostatistics & Bioinformatics Unit (research summary)

Next Generation Sequencing
• Targeted exome capture:
- Digest DNA to ~300bp
- Clean and anneal adapters
- Perform Pre-capture PCR with indexed primers
- Hybridise exome capture probes
- Clean and extract captured regions
- Perform Post-capture PCR
- Quantify and pool DNA samples
• Data processing
- Indexing, aligning and sorting the output reads
- Removing PCR duplicate reads
- Analysis of target capturing and coverage
- Local realignment around indels
- Recalibration of phred scores
- QC using depth information and phred scores
- Variant calling
• Analysis...

Giancarlo Russo
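
Three of the data processing steps above (removing PCR duplicates, measuring target coverage, and interpreting phred scores) can be sketched on a toy read list. This is a simplified illustration under stated assumptions: reads are plain dictionaries with hypothetical keys, duplicates are defined as reads sharing chromosome, start and strand, and real pipelines run dedicated tools on aligned read files instead.

```python
def remove_duplicates(reads):
    """Keep the highest-quality read per (chrom, start, strand) position."""
    best = {}
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"])
        if key not in best or read["mean_qual"] > best[key]["mean_qual"]:
            best[key] = read
    return list(best.values())


def mean_target_depth(reads, chrom, target_start, target_end):
    """Mean per-base depth over the half-open target [target_start, target_end)."""
    covered_bases = 0
    for read in reads:
        if read["chrom"] != chrom:
            continue
        read_end = read["start"] + read["length"]
        overlap = min(read_end, target_end) - max(read["start"], target_start)
        covered_bases += max(0, overlap)
    return covered_bases / (target_end - target_start)


def phred_to_error_prob(q):
    """Phred score Q corresponds to error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    toy_reads = [  # toy values only
        {"chrom": "1", "start": 100, "length": 50, "strand": "+", "mean_qual": 30},
        {"chrom": "1", "start": 100, "length": 50, "strand": "+", "mean_qual": 35},
        {"chrom": "1", "start": 120, "length": 50, "strand": "+", "mean_qual": 30},
        {"chrom": "2", "start": 100, "length": 50, "strand": "-", "mean_qual": 30},
    ]
    unique = remove_duplicates(toy_reads)
    print(len(unique), mean_target_depth(unique, "1", 100, 200))
    print(phred_to_error_prob(30))
```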