Understanding mechanisms underlying human gene expression variation with RNA sequencing

Post on 28-Aug-2014

Goals:

• Long-term: understand the precise mechanisms by which genetic variation in humans influences gene expression

• Inform genome-wide association studies (which have identified hundreds of non-coding regions associated with disease)

• This study: identify the transcribed and polyadenylated RNAs present in a model cell type, and identify genetic variants that influence the expression of these RNAs

Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation

• This talk: 69 cell lines derived from white blood cells from Nigerian individuals

Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project

Genomic Data: mRNA expression, DNA methylation, histone marks, etc.

[Figure: schematic with labels DNA and RNA; the RNA carries a poly-A tail (AAAAAAAAAA)]

RNA-Seq 1: Isolate poly-A RNA, convert to cDNA

RNA-Seq 2: Generate short sequencing reads from cDNA library (using Illumina GA2)

RNA-Seq 3: Map short reads back to genome; count of reads/gene measures expression
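The read-counting step can be sketched in a few lines of Python. This is only an illustration of "count of reads/gene", not the pipeline used in the study: the gene coordinates, read positions, and the function name `count_reads_per_gene` are hypothetical, genes are assumed to be non-overlapping intervals on a single chromosome, and each read is represented by one mapped coordinate.

```python
from bisect import bisect_right


def count_reads_per_gene(genes, read_positions):
    """Count mapped reads falling inside each gene.

    genes: list of (name, start, end) tuples, half-open [start, end),
           assumed non-overlapping and on one chromosome (an assumption
           made for this sketch).
    read_positions: iterable of mapped read coordinates.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in ordered]
    counts = {name: 0 for name, _, _ in ordered}
    for pos in read_positions:
        # Find the last gene whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0:
            name, start, end = ordered[i]
            if start <= pos < end:
                counts[name] += 1
    return counts


# Hypothetical toy data, for illustration only.
genes = [("geneA", 100, 200), ("geneB", 300, 450)]
reads = [105, 150, 199, 200, 310, 449, 500]
counts = count_reads_per_gene(genes, reads)
print(counts)  # {'geneA': 3, 'geneB': 2}
```

Reads at positions 200 and 500 fall outside both intervals and are not counted. In practice raw counts are usually normalized for gene length and sequencing depth before comparing genes or individuals.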

Sequence RNA from each line (1.2 billion sequencing reads in total), then average expression levels across lines

RNA-Seq identifies new exons

RNA-Seq identifies new polyadenylation sites

Revisiting gene annotations in these cells

• ~4,000 unannotated, conserved exons

• 115 of which appear protein-coding

• Unannotated exons have lower expression and are more tissue specific than annotated exons

• ~400 polyadenylation sites more than 50 bases from a known site.

• Conclusion: extensive use of unannotated UTRs.
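The "over 50 bases from a known site" criterion above can be illustrated with a small sketch. The coordinates and the function name `is_unannotated` are hypothetical, and the sketch ignores strand and chromosome; it only shows the distance rule, not how the study called polyadenylation sites from reads.

```python
from bisect import bisect_left


def is_unannotated(site, known_sorted, min_distance=50):
    """Return True if `site` is more than `min_distance` bases from
    every known polyadenylation site.

    known_sorted must be sorted in ascending order, so only the two
    neighbours around the insertion point need to be checked.
    """
    i = bisect_left(known_sorted, site)
    neighbours = known_sorted[max(i - 1, 0):i + 1]
    return all(abs(site - k) > min_distance for k in neighbours)


# Hypothetical coordinates, for illustration only.
known_sites = [1000, 5000]
observed_sites = [1020, 1051, 4949, 5050, 9000]
novel = [s for s in observed_sites if is_unannotated(s, known_sites)]
print(novel)  # [1051, 4949, 9000]
```

The site at 5050 is exactly 50 bases from a known site, so it is not counted as more than 50 bases away.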

Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project

Expression: from RNA-Seq

(Natural) Genetic variation potentially affects many levels of gene regulation

1. Transcription initiation: chromatin accessibility, TF binding

2. mRNA processing: splicing, polyadenylation, capping, export

3. mRNA degradation: microRNA regulation, NMD

4. Translation, etc.: tRNA abundances, protein localization, protein degradation

[Figure: RNA-Seq read-mapping schematic (labels DNA, RNA), shown for the two alleles C and T]

• use genotypes to identify associations between genetic variation and expression (eQTLs)

• ~1000 eQTLs at an FDR of 10%
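The slides do not say which association test or multiple-testing procedure was used. As a minimal sketch of the general idea, the code below swaps in a permutation test on the Pearson correlation between genotype (coded 0/1/2) and expression, followed by the Benjamini-Hochberg procedure at an FDR of 10%. All data values and function names here are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a genotype/expression association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, fdr=0.10):
    """Return a list of booleans marking tests significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# Hypothetical toy data: 12 individuals, genotype coded as allele count.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expr_linked = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
p_linked = permutation_pvalue(genotypes, expr_linked)

# BH applied to a hypothetical list of p-values from many gene/SNP tests.
example_pvalues = [0.001, 0.04, 0.03, 0.5]
calls = benjamini_hochberg(example_pvalues, fdr=0.10)
print(p_linked, calls)
```

A permutation test avoids distributional assumptions on expression levels, at the cost of more computation; a regression-based test would be a common alternative.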

Polymorphisms near the transcription start site of a gene are the most likely to affect its transcription

• Combining information across all genes, we can ask where SNPs that affect expression lie

• SNPs near the TSS and throughout the genic region are the most likely to influence expression

[Figure legend: black, bins within the genic region; blue, bins outside the genic region]

See also Veyrieras et al. (2008), Stranger et al. (2007), Cheung et al. (2005)

• use genotypes to identify associations between genetic variation and splicing (sQTLs)

• ~200 sQTLs at an FDR of 10%

Where are SNPs that affect splicing?

• Figure: odds that a SNP in a given functional annotation affects splicing (relative to SNPs in non-splice-site intronic positions)

• SNPs in splice sites (this is defined liberally to include sites beyond the canonical two bases) and within the exon itself are enriched for sQTLs
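The "odds relative to a baseline annotation" in the figure can be read as an odds ratio. The sketch below shows that arithmetic only; the counts are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
def odds(n_sqtl, n_total):
    """Odds that a SNP in a category is an sQTL: hits / non-hits."""
    return n_sqtl / (n_total - n_sqtl)


def odds_ratio(category, baseline):
    """Odds for `category` relative to `baseline`.

    Each argument is a (n_sqtl, n_total) pair of counts.
    """
    return odds(*category) / odds(*baseline)


# Invented counts, for illustration only: (sQTL SNPs, all tested SNPs).
intronic_baseline = (10, 10010)   # non-splice-site intronic SNPs
splice_site = (5, 105)
exonic = (8, 808)

or_splice = odds_ratio(splice_site, intronic_baseline)
or_exonic = odds_ratio(exonic, intronic_baseline)
print(round(or_splice, 2), round(or_exonic, 2))
```

A ratio above 1 means SNPs in that annotation are enriched for sQTLs relative to the intronic baseline.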

Conclusions

• Goal: understand the mechanisms of natural variation in gene regulation in a model system

• RNA sequencing is useful for annotating genomes and comparing mRNA levels across individuals

• Observe extensive usage of unannotated UTRs.

• eQTLs enriched near transcription start sites, sQTLs enriched in and around canonical splice sites

• Next steps: can we identify which transcription factors/splice factors have altered binding, leading to variation in expression?

http://dx.doi.org/10.1038/nature08872

http://eqtl.uchicago.edu
