
Learning classifiers from discretized expression quantitative trait loci

A. Masegosa, M. M. Abad-Grau, S. Moral and F. Matesanz
CITIC, Universidad de Granada & I. P. López Neyra, CSIC, Granada, Spain

Outline

  Introduction

  Methods
    Classification Algorithms
    SNPs Processing

  Data Sets

  Experimental Results

  Conclusions & Future Works

Introduction

Introduction Genetic variants and gene expression

  Genotypes and mRNA transcript levels.
    Basis for understanding complex diseases [18].

  Gene variants modify the expression of some genes (eQTLs).
    Hard to identify.
    Linkage disequilibrium with the real cause [23].

  Associations between single SNPs and gene expression [7,21].
    No multiple SNPs.
    Statistical inference and computational problems.

Introduction Our approach

  SNP-gene expression data association.
    Pre-discretization of expression data.
    Low expression and high expression.

  Alternative statistical inference approach.
    From regression to classification.
    Supervised classification machinery.

  Different assumptions.
    Hidden binary variable (non-observable mechanism).
    SNPs → Hidden Variable → Gene Expression
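The pre-discretization step can be illustrated with a minimal Python sketch. The slides do not state how the cut-off between low and high expression was chosen, so the median default, the function name `binarize_expression`, and the sample values below are assumptions made only for illustration.

```python
from statistics import median

def binarize_expression(levels, threshold=None):
    """Label each expression level as "low" or "high".

    The cut-off is an assumption: when no threshold is given,
    the median of the observed levels is used.
    """
    cut = median(levels) if threshold is None else threshold
    return ["high" if x > cut else "low" for x in levels]

# Hypothetical expression values, for illustration only.
levels = [6.1, 6.3, 9.8, 10.2, 6.0, 9.9]
labels = binarize_expression(levels)
```

Once the labels exist, the expression variable can be treated as the class of a supervised classification problem instead of the target of a regression.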

Introduction Our approach

  Gene HLA-DRB5 (DRB5)
    Encodes β chains for the DR HLA class II receptor.
    Associated with susceptibility to immune-related diseases [5].

[Figure: DRB5 expression, with groups labeled Low Expressed and High Expressed]

Methods

Methods Classification algorithms

  Classification function: f : X → Y

  X is a subset of SNPs.

  Y is the output variable: low expression vs. high expression.

  Learning machines: supervised classifiers
    Learn a function “f” from a set of labeled data samples.
    Different models: Naïve Bayes, SVM, C4.5, ...

  Evaluate the prediction capacity of a subset of SNPs:
    If there is prediction capacity, then there is association.
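As one possible illustration of learning such a function from labeled samples, here is a small categorical Naïve Bayes sketch in plain Python. The genotype coding (0, 1, 2), the Laplace smoothing, the function names, and the toy data are all assumptions for illustration; the slides do not say which implementation or settings were used.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(X, y, n_values=3, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing.

    X: list of genotype vectors coded 0/1/2 (an assumed encoding).
    y: list of class labels, e.g. "low" / "high".
    """
    class_counts = Counter(y)
    # counts[label][snp_index][genotype] -> number of training samples
    counts = {c: defaultdict(Counter) for c in class_counts}
    for row, label in zip(X, y):
        for j, g in enumerate(row):
            counts[label][j][g] += 1
    return class_counts, counts, n_values, alpha

def predict(model, row):
    """Return the class with the highest log-posterior for one genotype vector."""
    class_counts, counts, n_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = log(n_c / total)  # log prior
        for j, g in enumerate(row):
            # smoothed log-likelihood of genotype g at SNP j given class c
            score += log((counts[c][j][g] + alpha) / (n_c + alpha * n_values))
        scores[c] = score
    return max(scores, key=scores.get)

# Toy data: two SNPs, four individuals (hypothetical values).
X = [[0, 0], [0, 1], [2, 2], [2, 1]]
y = ["low", "low", "high", "high"]
model = train_naive_bayes(X, y)
```

If a model like this predicts held-out labels better than chance, that is taken as evidence of association between the SNP subset and the expression class.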

Methods SNPs Processing

  Genotypes from Chromosome 6.
    DRB5 is on this chromosome.
    Cis association.

  SNPs grouped in blocks of low recombination.
    SNPs with high LD among them.
    Pairwise computation of confidence intervals of LD [6].

  Analyze association between DRB5 expression and:
    Single SNPs
    Blocks of SNPs
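To give a feel for what a pairwise LD computation looks like, the sketch below computes r² between two SNPs as the squared Pearson correlation of their genotype dosages. This is a simpler, swapped-in measure and not the confidence-interval method of [6]; the 0/1/2 dosage coding and the function name `genotype_r2` are assumptions.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two SNPs' genotype dosages.

    a, b: equal-length lists of dosages coded 0/1/2 (assumed encoding).
    Returns 0.0 when either SNP is monomorphic in the sample.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)

# Hypothetical genotypes for three SNPs across four individuals.
snp1 = [0, 0, 2, 2]
snp2 = [0, 0, 2, 2]   # identical to snp1
snp3 = [0, 2, 0, 2]   # uncorrelated with snp1
```

SNPs whose pairwise values stay high over a contiguous region are natural candidates for being grouped into one block.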

Results

Data Set

  107 unrelated individuals (parents).
    Yoruba (Nigeria) population.

  6593 SNPs from Chromosome 6.
    345 non-overlapping blocks of low recombination.

[Figure: number of SNPs per block, plotted by block ID]

Results All Blocks

  Classification models (predict the binarized expression of DRB5):
    Naïve Bayes, C4.5 and SVM.

  Regression models (predict the continuous expression of DRB5):
    SVM-Reg [20] and Gaussian processes [25].

  Evaluation:
    Train models on 90% of the data, test on the remaining 10%, and repeat (10-fold cross-validation).
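The 10-fold evaluation scheme can be sketched as an index generator in plain Python. Whether the original folds were stratified or how they were shuffled is not stated, so the unstratified shuffle, the fixed seed, and the name `k_fold_indices` are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are shuffled once and dealt into k folds of near-equal size;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 107 individuals, as in the data set described above.
splits = list(k_fold_indices(107, k=10))
```

With 107 samples, each test fold holds 10 or 11 individuals and the remaining ones form the training set.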

Results Block vs SNPs prediction capacity

[Figure: area under the ROC curve by block ID]

Results SNPs with maximum prediction capacity

  Table
    SNPs with perfect prediction (AUC = 1.0).

  Histogram
    Homozygous mutant allele (left bar), heterozygous (central bar) and homozygous wild type (right bar).
    High expression (red) and low expression (blue).
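For readers unfamiliar with the metric, the sketch below computes the area under the ROC curve from scores and binary labels using the pairwise (Mann-Whitney) formulation. Using the raw genotype dosage as the score is a simplifying assumption for illustration; the reported results come from cross-validated classifiers, and the data here are hypothetical.

```python
def auc(scores, labels, positive="high"):
    """Area under the ROC curve via pairwise comparisons.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical SNP whose genotype separates the two classes completely.
dosages = [0, 0, 1, 2, 2, 1]
classes = ["low", "low", "high", "high", "high", "high"]
perfect = auc(dosages, classes)
```

An AUC of 1.0 means every high-expression individual scores above every low-expression individual; 0.5 corresponds to chance.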

Conclusions & Future Works

Conclusions & Future Works

  By discretizing gene expression:
    Gene expression-SNP associations can be studied with classification learning.
    The hypothesis is simplified: low vs. high expression.
    Many variables (relevant, noisy, redundant) can be considered.

  Gene DRB5 has been studied in the YRI population.
    Perfect correlation between some SNPs and the discretized DRB5 expression.

  Future work:
    Automated discretization approach (Gaussian mixture model).
    Extend this analysis to other genes.
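One way the proposed automated discretization might look is a two-component, one-dimensional Gaussian mixture fitted by EM, with each sample assigned to the component with the higher posterior. This is only a sketch of the general idea: the initialization, iteration count, variance floor, function names, and sample values are assumptions, not the authors' design.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def fit_two_gaussians(xs, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns the component means and the per-sample responsibilities.
    """
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    mean = [min(xs), max(xs)]          # start the components at the extremes
    var = [overall_var, overall_var]
    weight = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in xs]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        for i, x in enumerate(xs):
            p = [weight[k] * normal_pdf(x, mean[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp[i] = [p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5]
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weight[k] = nk / n
            mean[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(
                sum(r[k] * (x - mean[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                min_var,
            )
    return mean, resp

def discretize(xs):
    """Label samples "low"/"high" by the more probable mixture component."""
    mean, resp = fit_two_gaussians(xs)
    high = 0 if mean[0] > mean[1] else 1
    return ["high" if r[high] > 0.5 else "low" for r in resp]

# Hypothetical, clearly bimodal expression values.
expression = [6.0, 6.1, 6.3, 5.9, 9.8, 10.2, 9.9, 10.0]
gmm_labels = discretize(expression)
```

Compared with a fixed cut-off, a mixture model lets the data decide where the two expression modes separate.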

Thanks for your attention!

Any questions?

Results Block prediction capacity (NB classifier)

[Figure: area under the ROC curve by block ID (NB classifier)]

Results Single SNP prediction capacity (NB classifier)

[Figure: area under the ROC curve by SNP ID (NB classifier)]