
LEARNING AND APPLICATIONS
REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING

Francesca Odone and Lorenzo Rosasco

Regularization Methods for High Dimensional Learning Learning and applications


PLAN

Learning and engineering applications: why?

Examples of in house applications
  Face and object detection
  Medical image analysis
  Microarray data analysis


LET’S GO BACK TO THE BEGINNING

The goal is not to memorize but to generalize (or to predict)

Given a set of data

$(x_1, y_1), \dots, (x_n, y_n)$

find a function $f$ which is a good predictor of $y$ for a future input $x$

$f(x) = y$


WHAT IS IT USEFUL FOR?

The learning paradigm is useful whenever the underlying process is
  partially unknown,
  too complex, or
  too noisy
to be modeled as a sequence of instructions.


THE APPLICATIONS WE DEAL WITH

Computer vision
  Face detection and recognition
  Object detection
  Image annotation
  Dynamic events and actions analysis

Medical Image Analysis
  Automatic MR annotation
  Dictionary learning

Computational biology
  Gene selection


LEARNING FROM IMAGES

Object detection, image categorization and, more generally, image understanding are difficult problems
Learning from examples has been accepted as a viable way to deal with such problems, addressing noise and intra-class variability by collecting appropriate data and finding suitable descriptions

Images are relatively easy to gather


IMAGE DESCRIPTIONS

WITH OVERCOMPLETE FEATURE SETS

Overcomplete general purpose sets of features are effective for modeling visual information

Many object classes have peculiar intrinsic structures that can be better appreciated if one looks for symmetries or local geometries

Examples of features: wavelets, ranklets, chirplets, rectangle features, ...
Examples of problems: face detection [Heisele et al., Viola & Jones, Destrero et al.], pedestrian detection [Oren et al.], car detection [Papageorgiou & Poggio]

The approach is inspired by biological systems
See, for instance, B. A. Olshausen and D. J. Field, “Sparse coding with an over-complete basis set: a strategy employed by V1?”, 1997


FACE DETECTION
DESTRERO ET AL., 2009

THE CLASSIFICATION PROBLEM

It is a (binary) classification problem:
→ each image region can either be a face or not
We start from a training set of face and non-face images:

$\{(x_1, y_1), \dots, (x_n, y_n)\}$

$x_i$ is a raw vector encoding the gray levels of image $I_i$,
$y_i \in \{-1, 1\}$ according to whether the image is a face or not

IMAGE REPRESENTATION

We represent images as rectangle feature vectors:

$x_i \to (\phi_1(x_i), \dots, \phi_p(x_i))$


FACE DETECTION

ASSUMPTION

We assume
$\Phi\beta = Y$

where $\Phi = \{\Phi_{ij}\}$ is the data matrix; $\beta = (\beta_1, \dots, \beta_p)^T$ is the vector of unknown weights to be estimated; $Y = (y_1, \dots, y_n)^T$ are the output labels

Usually p is big; existence of the solution is ensured, uniqueness is not
The overcomplete set contains many correlated features
Thus, the problem is ill-posed. We resort to regularization.

SELECT FACE FEATURES

L1 regularization allows us to select a sparse subset of meaningful features for the problem, with the aim of discarding correlated ones

$$\min_{\beta \in \mathbb{R}^p} \|Y - \Phi\beta\|_2^2 + \lambda \|\beta\|_1 .$$
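
A functional of this form is commonly minimized by iterative soft thresholding (the sampled-algorithm diagram names a thresholded Landweber scheme). The following is a sketch of one standard form of that iteration; the step constant $L$ and the operator $S_a$ are notation introduced here for illustration, not taken from the slides:

```latex
\beta^{(t+1)} = S_{\lambda/L}\!\left(\beta^{(t)} + \frac{2}{L}\,\Phi^{T}\bigl(Y - \Phi\beta^{(t)}\bigr)\right),
\qquad L \ge 2\,\|\Phi\|_2^2,
\qquad \bigl(S_a(v)\bigr)_j = \operatorname{sign}(v_j)\,\max\bigl(|v_j| - a,\; 0\bigr)
```

Components whose magnitude stays below the threshold are set exactly to zero, which is what makes the estimated $\beta$ sparse.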


A SAMPLED VERSION OF THE ALGORITHM

Applying the algorithm starting from the entire set of features is not computationally feasible (Φ: 4000 × 64000 ≃ 1 GB)

We create many subsets of features randomly sampled with repetition
We run the algorithm separately on each subset
We keep only features selected in every run in which they were present

[Figure: from the initial set S0, 200 subsets are built by random extraction of 10% of the features with repetition; thresholded Landweber is run on each subset, giving selected features 1, ..., 200; S1 keeps the features selected in every run in which they were present]
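
The sampling scheme above can be sketched as follows. This is only an illustration under stated assumptions: `sampled_selection`, `select` and `toy_select` are hypothetical names, each subset is drawn without duplicates while different subsets may overlap (one reading of "with repetition"), and the toy selector merely stands in for the sparse regularization step.

```python
import random

def sampled_selection(n_features, select, n_subsets=200, fraction=0.10, seed=0):
    """Keep the features selected in every run in which they were present.

    select(subset) is any routine returning the selected feature indices
    of a subset (in the slides this role is played by L1 regularization).
    """
    rng = random.Random(seed)
    subset_size = max(1, int(fraction * n_features))
    present = [0] * n_features   # number of subsets containing feature j
    chosen = [0] * n_features    # number of those runs that selected feature j
    for _ in range(n_subsets):
        subset = rng.sample(range(n_features), subset_size)
        kept = set(select(subset))
        for j in subset:
            present[j] += 1
            if j in kept:
                chosen[j] += 1
    return [j for j in range(n_features)
            if present[j] > 0 and chosen[j] == present[j]]

# Toy selector (assumption): pretends that features with index divisible by 7
# are the meaningful ones.
def toy_select(subset):
    return [j for j in subset if j % 7 == 0]

s1 = sampled_selection(1000, toy_select)
```

A feature that never appears in any subset is dropped here; with 200 subsets of 10% each this is very unlikely, but it is a design choice of this sketch.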


THE FINAL SET OF FACE FEATURES

Positive and negative samples from the training set

Notice how vertical symmetries are not captured by selected features


THE SOLUTION DEPENDS ON THE TRAINING DATA

In the MIT+CMU training set all images are registered and well cropped

Vertical symmetries are captured by selected features


FACE DETECTION

FACE CLASSIFICATION

Elastic net regularization embeds both feature selection and prediction functionalities
As suggested in (Candes & Tao, 2007), in order to improve the classification performance one could use L2 regularization on the reduced data representation.
Since a main requirement of our application is real-time performance we adopt a linear SVM for classification:

L1 + SVM gives us sparsity both on the representation and on the dataset and thus fewer computations


FACE CLASSIFICATION RESULTS

[Figure: performance curves (vertical axis from 0.7 to 1, horizontal axis from 0 to 0.02) comparing: 2 stages feature selection; 2 stages feature selection + correlation; Viola+Jones feature selection using our same data; Viola+Jones cascade performance]

Our strategy for feature selection outperforms the one by Viola and Jones using the same dataset

Adaboost seems to need a large number of examples to be trained effectively (we used just 4000 examples)


FROM FACE CLASSIFICATION TO FACE DETECTION
WHY IS IT DIFFICULT?

It is very unlikely to find a face in a real image
→ high number of false positives

Image dimensions: 384x222 px

∼ 6.5 · 10^5 tests in a multi-scale search with a base window of 19x19 px

Only 11 faces!


FACE DETECTION: A CASCADE OF CLASSIFIERS

For each image we have many tests to do
→ few positive examples and many negative examples
We build a coarse-to-fine classification architecture:
→ Simpler classifiers are used to reject the majority of sub-windows
→ More complex classifiers allow us to achieve low false positive rates


1 Start from set S of selected features
2 Choose at least 3 mutually distant features
3 Train a linear SVM classifier using those features and test it on a validation set
4 Do we reach target performance (h = 99.5%; f = 50%)?
  YES Finalize the classifier, remove used features from S and go to (2).
  NO Add a feature from S and go to (3).

$$F = \prod_{i=1}^{K} f_i \quad \text{and} \quad H = \prod_{i=1}^{K} h_i$$

where $h_i$ and $f_i$ are the hit rate and the false positive rate of stage $i$, and $H$ and $F$ are those of the whole cascade of $K$ stages
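
As a worked example of the two products, the sketch below assumes every stage exactly meets the per-stage targets h = 99.5% and f = 50%. The number of stages (10) and the overall false positive target (10^-5) are illustrative assumptions, not values from the slides.

```python
import math

def cascade_rates(h, f, n_stages):
    """Overall hit rate H and false positive rate F of a cascade whose
    stages all have hit rate h and false positive rate f."""
    return h ** n_stages, f ** n_stages

def stages_needed(target_F, f):
    """Smallest number of stages K such that f**K <= target_F."""
    return math.ceil(math.log(target_F) / math.log(f))

H, F = cascade_rates(0.995, 0.5, 10)   # H is about 0.951, F is about 0.001
K = stages_needed(1e-5, 0.5)           # stages needed for F <= 1e-5
```

The per-stage false positive target can be loose because F shrinks geometrically with the number of stages, while the per-stage hit rate must stay close to 1 for H to remain high.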


A PIPELINE FOR FACE AUTHENTICATION
DESTRERO ET AL., 2009


RESULTS


AUTOMATIC ANNOTATION OF MR IMAGES: SYNOVITIS ASSESSMENT
BASSO ET AL., 2010

Setting: children under 16 affected by Juvenile Idiopathic Arthritis
Goal: to measure the volume of the inflamed synovia in 3D MR images
Our problem: to classify each voxel of the MR
The approach is supervised: for training we use the manual annotations performed by experts


VOXEL-BASED IMAGE DESCRIPTION

Each voxel is represented with a set of cues chosen among the ones commonly used for voxel classification
They include the intensity of the voxel and its neighbors, the position of the voxel, the multiscale 2-jets, the vesselness measures

$x \to \phi(x) = \{\varphi^1, \dots, \varphi^k\}$


MULTI-CUE VOXEL CLASSIFIER

THE DISCRIMINANT FUNCTION

We look for a more flexible discriminant function

$$f(\phi) = \sum_{(i,j) \in I} \alpha_i^j K_i^j(\phi) + b$$

ASSUMPTION

The k × n basis functions

$$K_i^j(\phi) = \exp\left\{-\frac{\|\varphi^j - \varphi_i^j\|^2}{2\sigma^2}\right\}$$

measure the similarity of φ with an example voxel i with respect to a specific cue j


MODEL SELECTION

The optimal subset I of basis functions, on which f depends, may be inferred from the data by means of feature selection.
Starting from a manually annotated training set of n voxels we compute the n × kn matrix K

$K = (K^1, \dots, K^k)$

where each $K^j$ is the n × n block of similarities for cue j, and look for a sparse vector α so that

$y = K\alpha$
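
A minimal sketch of how such an n × kn matrix could be assembled, one Gaussian block per cue. The function names, the synthetic cue values and the use of a single σ for all cues are assumptions made for illustration.

```python
import numpy as np

def cue_kernel(A, B, sigma):
    """Gaussian similarities between rows of A (m, d) and rows of B (n, d)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def multi_cue_matrix(cues, sigma=1.0):
    """Stack the per-cue n x n blocks side by side into an n x (k*n) matrix.

    cues is a list of k arrays; cues[j] has shape (n, d_j) and holds
    cue j for each of the n training voxels.
    """
    return np.hstack([cue_kernel(c, c, sigma) for c in cues])

rng = np.random.default_rng(0)
n = 6
# Two synthetic cues: a scalar one (e.g. intensity) and a 3D one (e.g. position).
cues = [rng.standard_normal((n, 1)), rng.standard_normal((n, 3))]
K = multi_cue_matrix(cues)   # shape (6, 12)
```

A sparse α over the kn columns then selects pairs (example voxel, cue), which is the index set I of the discriminant function.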


LEARNING ALGORITHM

The goal of learning is to find the optimal affine combination defined by the coefficients $\alpha_i^j$ and $b$. This is achieved with L2 regularization on the restricted matrix $\hat{K}$


RESULTS

The multi-cue classifier is 15 times sparser than SVM and approximately 40 times faster.


MACHINE LEARNING AND THE ANALYSIS OF MICROARRAYS

GOALS

Design methods able to identify a gene signature, i.e., a panel of genes potentially interesting for further screening
Learn the gene signatures, i.e., select the most discriminant subset of genes on the available data


A TYPICAL "-OMICS" SCENARIO

High dimensional data - Few samples per class
  tens of samples - tens of thousands of genes
  → Variable selection
High risk of selection bias
  data distortion arising from the way the data are collected, due to the small amount of data available
  → Model assessment needed

Possibly find ways to incorporate prior knowledge
Deal with data visualization


GENE SELECTION

THE PROBLEM

Select a small subset of input variables (genes) which are used for building classifiers

ADVANTAGES:
  it is cheaper to measure fewer variables
  the resulting classifier is simpler and potentially faster
  prediction accuracy may improve by discarding irrelevant variables
  identifying relevant variables gives useful information about the nature of the corresponding classification problem (biomarker detection)


VARIABLE SELECTION IN BIOINFORMATICS

MOTIVATIONS

Ease Computational Burden:
  Discard the (apparently) less significant features and train in a simplified space: alleviate the curse of dimensionality
Enhance Information:
  Highlight (and rank) the most important features and improve the knowledge of the underlying process.

COMMONLY ADOPTED METHODS

Statistical Filters (t-test, S/N ratio, ...)
Learning Techniques (embedded methods, wrapper methods, stepwise feature elimination, ...)
Mapping Methods (“Metagenes”: simplified model for pathways, even though biological suggestions require caution)


STATISTICAL FILTERS

These approaches are well established in the gene selection literature. One considers the various measurements associated with each gene (column of the data matrix X)

T TEST

For each column of X we compute

$$t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

where subscripts 1 and 2 stand for positive and negative examples, and $\mu$, $\sigma^2$ and $n$ are the mean, the variance and the number of examples of each class
Genes are ranked with respect to the t value
A threshold is set to perform gene selection
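
The filter can be sketched in a few lines of standard-library Python. The tiny data matrix and the threshold of 2.0 are made-up values for illustration, and σ² is taken to be the sample variance of each class; a column with zero variance in both classes would need special handling that this sketch omits.

```python
from math import sqrt
from statistics import mean, variance

def t_scores(X, y):
    """One t value per column of X (rows are samples, labels are +1 / -1)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        scores.append((mean(a) - mean(b))
                      / sqrt(variance(a) / len(a) + variance(b) / len(b)))
    return scores

# Synthetic example: column 0 separates the classes, column 1 does not.
X = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5],
     [1.0, 1.2], [2.0, 1.8], [3.0, 1.6]]
y = [1, 1, 1, -1, -1, -1]

t = t_scores(X, y)
ranking = sorted(range(len(t)), key=lambda j: abs(t[j]), reverse=True)
selected = [j for j in ranking if abs(t[j]) > 2.0]   # hypothetical threshold
```

Each gene is scored on its own, so correlations among genes are ignored; this is the univariate limitation that the multivariate regularization approach addresses.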


GENE SELECTION WITH L1-L2 REGULARIZATION
MOSCI ET AL., 2008

$$\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \tau\left(\|\beta\|_1 + \varepsilon \|\beta\|_2^2\right).$$

Consistency guaranteed - the more samples available the better the estimator
Multivariate - it takes into account many genes at once

OUTPUT

one-parameter (ε) family of nested lists with equivalent prediction ability and increasing correlation among genes

ε → 0: minimal list of prototype genes
ε1 < ε2 < ε3 < ...: longer lists including correlated genes
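
A minimal sketch of one way to minimize the functional above, using a proximal gradient (iterative soft thresholding) scheme. It is not claimed to be the authors' implementation; the function names, the synthetic data, the step-size choice and the parameter values are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise soft thresholding: shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, Y, beta, tau, eps):
    """||Y - X beta||^2 + tau * (||beta||_1 + eps * ||beta||_2^2)."""
    return (np.sum((Y - X @ beta) ** 2)
            + tau * (np.abs(beta).sum() + eps * np.sum(beta ** 2)))

def l1l2_select(X, Y, tau, eps, n_iter=1000):
    # Smooth part: ||Y - X beta||^2 + tau*eps*||beta||^2.
    # L is the Lipschitz constant of its gradient, so 1/L is a safe step size.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 + 2.0 * tau * eps
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (Y - X @ beta) + 2.0 * tau * eps * beta
        beta = soft_threshold(beta - grad / L, tau / L)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))        # 30 samples, 100 synthetic "genes"
beta_true = np.zeros(100)
beta_true[[3, 17, 42]] = [2.0, -3.0, 1.5]
Y = X @ beta_true

beta_hat = l1l2_select(X, Y, tau=5.0, eps=0.1)
selected = np.flatnonzero(beta_hat)       # indices of the selected variables
```

Running the same routine for increasing values of `eps` is how a family of lists indexed by the correlation parameter would be produced in this sketch.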


DOUBLE OPTIMIZATION APPROACH
MOSCI ET AL., 2008

VARIABLE SELECTION + CLASSIFICATION:

Variable selection step (L1-L2)

$$\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \tau\left(\|\beta\|_1 + \varepsilon \|\beta\|_2^2\right).$$

Classification step (RLS)

$$\|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$

for each ε we have to choose λ and τ
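
The classification step can be sketched as regularized least squares restricted to the selected columns, using the closed-form solution of the RLS functional. The list of selected indices is assumed to come from the selection step; here it is simply given, and the data are synthetic.

```python
import numpy as np

def rls_on_selected(X, Y, selected, lam):
    """Minimize ||Y - X_s b||^2 + lam * ||b||^2 over the selected columns
    and return the coefficients embedded in a full-length vector."""
    Xs = X[:, selected]
    A = Xs.T @ Xs + lam * np.eye(len(selected))
    beta = np.zeros(X.shape[1])
    beta[selected] = np.linalg.solve(A, Xs.T @ Y)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
beta_true = np.zeros(100)
beta_true[[3, 17, 42]] = [2.0, -3.0, 1.5]
Y = X @ beta_true

selected = [3, 17, 42]                     # assumed output of the selection step
beta_small = rls_on_selected(X, Y, selected, lam=1e-8)
beta_large = rls_on_selected(X, Y, selected, lam=100.0)
labels = np.sign(X @ beta_small)           # predicted classes for +1 / -1 labels
```

Refitting on the reduced representation removes the shrinkage that the L1 term imposed on the selected coefficients, which is the motivation for the second step.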


A SELECTION BIAS AWARE FRAMEWORK
BARLA ET AL., 2008

λ → (λ1, ..., λA)
τ → (τ1, ..., τB)
the optimal pair (λ*, τ*) is one of the possible A · B pairs (λ, τ)
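
A sketch of an exhaustive search over the A · B pairs with leave-one-out error as the criterion. The `fit` and `predict` callables are placeholders for the two-step procedure, and the toy threshold model below exists only to make the example runnable; the full framework in the cited work is more elaborate than this loop.

```python
from itertools import product

def loo_grid_search(samples, labels, lambdas, taus, fit, predict):
    """Return (errors, lam, tau) for the pair with the fewest LOO errors."""
    best = None
    for lam, tau in product(lambdas, taus):
        errors = 0
        for i in range(len(samples)):
            # Train on all samples except the i-th, test on the i-th.
            train_x = samples[:i] + samples[i + 1:]
            train_y = labels[:i] + labels[i + 1:]
            model = fit(train_x, train_y, lam, tau)
            errors += int(predict(model, samples[i]) != labels[i])
        if best is None or errors < best[0]:
            best = (errors, lam, tau)
    return best

calls = []   # records every call to fit, to show the A * B * N cost

def toy_fit(train_x, train_y, lam, tau):
    calls.append((lam, tau))
    return lam                      # toy model: a decision threshold

def toy_predict(model, x):
    return 1 if x > model else -1

best = loo_grid_search([0.0, 1.0, 2.0, 3.0], [-1, -1, 1, 1],
                       [0.5, 1.5, 2.5], [0.1, 1.0], toy_fit, toy_predict)
```

The number of recorded calls equals A · B · N, the cost of one full leave-one-out search.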


ALGORITHMIC AND COMPUTATIONAL ISSUES

FROM MANY LISTS TO ONE FINAL LIST

Criterion based on frequency, i.e., occurrences of a gene across all the lists
Since we have a correlation parameter we can tune and vary the list length
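
The frequency criterion can be sketched as follows; the gene labels, the example lists and the thresholds are invented for illustration.

```python
from collections import Counter

def frequent_genes(lists, min_frequency):
    """Genes appearing in more than min_frequency (a fraction) of the lists."""
    counts = Counter(g for genes in lists for g in set(genes))
    n = len(lists)
    return sorted(g for g, c in counts.items() if c / n > min_frequency)

# Four hypothetical lists, e.g. one per resampling of the data.
lists = [["g1", "g2", "g3"], ["g2", "g3"], ["g2", "g4"], ["g2"]]

final_strict = frequent_genes(lists, 0.5)   # ['g2']
final_loose = frequent_genes(lists, 0.3)    # ['g2', 'g3']
```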

FROM 1 WEEK COMPUTATION TO...?

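Computational time for LOO (for one task)
$\text{time}_{1\text{-optim}}$ = 2.5 s to 25 s, depending on the correlation parameter

$$\text{total time} = A \cdot B \cdot N_{\text{samples}} \cdot \text{time}_{1\text{-optim}} \sim 20 \cdot 20 \cdot 30 \cdot \text{time}_{1\text{-optim}} \sim 2 \cdot 10^4\,\text{s to } 2 \cdot 10^5\,\text{s}$$

6 tasks → 1 week!!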
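
The estimate can be checked with a few lines of arithmetic. Multiplying out the stated factors gives 12,000 optimizations per task, i.e. about 3 · 10^4 s to 3 · 10^5 s, the same order of magnitude as the rounded figures above. The sketch assumes the 6 tasks run one after the other on a single machine.

```python
A, B, n_samples = 20, 20, 30           # grid sizes and number of LOO splits
runs = A * B * n_samples               # optimizations per task

low_s = runs * 2.5                     # fastest case, seconds per task
high_s = runs * 25.0                   # slowest case, seconds per task

def days(seconds):
    return seconds / 86400.0

six_tasks_days = (days(6 * low_s), days(6 * high_s))   # roughly 2 to 21 days
```

One week falls inside this range, consistent with the figure quoted for the 6 tasks.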


COMPUTATION OVER A GRID

Grid middleware: OurGrid, a multiplatform grid that can deal with hosts not directly connected to the Internet.
Used by the ShareGrid project, which involves several universities in Northern Italy.
Cheap solution: 60 PCs (students' lab)


GENE SELECTION WITH L1-L2 REGULARIZATION
DE MOL, MOSCI, TRASKINE, VERRI, 2008


FINDING STRUCTURED GENE SIGNATURES

How do we estimate groups of correlated genes?
We may rely on the nested structure obtained by varying the correlation parameter
We consider the minimal list, list_0, as a starting point of an agglomerative clustering technique, based on the Pearson distance:

$$d(X_i, X_j) = \frac{\mathrm{corr}(X_i, X_j)}{\sqrt{\mathrm{var}(X_i)\,\mathrm{var}(X_j)}}$$

evaluating the normalized correlation between two columns $X_i$ and $X_j$ of the data matrix X
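
A small sketch of grouping genes by normalized correlation. It makes several assumptions that are not in the slides: the dissimilarity is taken as 1 − |d|, the linkage is single linkage, and the tree is cut at a fixed threshold (which is equivalent to taking connected components of the graph linking genes closer than the threshold). The data are synthetic.

```python
import numpy as np

def pearson(X):
    """Normalized correlation between all pairs of columns of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def single_linkage_groups(X, max_distance):
    """Single-linkage clusters of columns, cut at max_distance."""
    dist = 1.0 - np.abs(pearson(X))
    p = X.shape[1]
    parent = list(range(p))

    def find(a):
        # Follow parent links to the cluster representative.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(p):
        for j in range(i + 1, p):
            if dist[i, j] <= max_distance:
                parent[find(i)] = find(j)   # merge the two clusters

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alt = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
# Columns 0 and 1 are perfectly correlated, columns 2 and 3 anti-correlated.
X = np.column_stack([base, 2.0 * base + 1.0, alt, -alt])

groups = single_linkage_groups(X, max_distance=0.1)
```

Using the absolute value puts anti-correlated genes in the same group; dropping it would be the choice if only positively correlated genes should be merged.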


AN EXAMPLE APPLICATION

IDENTIFYING THE HYPOXIA SIGNATURE OF NEUROBLASTOMA VIA REGULARIZATION

joint research with IGG Molecular Biology lab

Dataset: 9 neuroblastoma (NB) cell lines cultured under normoxic and hypoxic conditions. Technology: Affymetrix GeneChip U133 plus 2.0.

t-test: no genes selected!

l1l2 protocol: 11 genes for the minimal list (frequency > 30%)


REFERENCES

A. Destrero, C. De Mol, F. Odone, A. Verri. "A Regularized Framework for Feature Selection in Face Detection and Authentication". IJCV (2009).

A. Destrero, C. De Mol, F. Odone, A. Verri. "A sparsity-enforcing method for learning face features". IEEE Transactions on Image Processing 18 (2009): 188-201.

C. Basso, M. Santoro, A. Verri and M. Esposito. "Segmentation of Inflamed Synovia in Multi-Modal MRI". In Proc. of IEEE ISBI 2009, June 28 - July 1 2009.

P. Fardin, A. Cornero, A. Barla, S. Mosci, M. Acquaviva, L. Rosasco, C. Gambini, A. Verri, L. Varesio. "Identification of multiple hypoxia signatures in neuroblastoma cell lines by l1-l2 regularization and data reduction". Journal of Biomedicine and Biotechnology, 2010.

A. Barla, S. Mosci, L. Rosasco and A. Verri. "A method for robust variable selection with significance assessment". Proc. of ESANN, European Symposium on Artificial Neural Networks 2008.

C. De Mol, S. Mosci, M. Traskine and A. Verri. "A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data". Journal of Computational Biology 2008.
