random projections - healthiness and stochastic simulation · stochastic simulation of multiple...
TRANSCRIPT
Random Projections
Joao Brazuna
Defining Random Projections
How Do Random Projections Work?
How to Apply Random Projections?
Multiple Linear Regression Model
Stochastic Simulation of Multiple Linear Regression Models
Multiple Logistic Regression Model
Diagnosing Leukaemia
Other Applications
Random Projections - Healthiness and Stochastic Simulation
Joao Brazuna
Statistical Methods in Data Mining, Instituto Superior Tecnico
November 29, 2016
Contents
Defining Random Projections
How Do Random Projections Work?
How to Apply Random Projections?
Multiple Linear Regression Model
Stochastic Simulation of Multiple Linear Regression Models
Multiple Logistic Regression Model
Diagnosing Leukaemia
Other Applications
Information Era
Characterized by:
- High dimensional data;
- Difficult to process.
Random Projections’ Goal:
Efficiently reduce the data dimension.
Some Applications of Random Projections
- Classification;
- Clustering;
- Regression.
“Dimensionality Curse”
It affects data analysis in two different ways:
- A lot of features and samples;
- A lot of features and few samples.
An Efficient Solution: Random Projections
Notation
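- Sample size n;
- For the i-th sample, with $i \in \{1, \dots, n\}$, we observe p features;
- Vector with the p features for sample i:
\[ x_i = (x_{i1}, \dots, x_{ip}) \in \mathbb{R}^p, \quad \forall i \in \{1, \dots, n\}; \]
- We join the n vectors (by rows) in an n × p dimensional matrix:
\[ X = \begin{bmatrix} x_1^t \\ \vdots \\ x_n^t \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \in \mathbb{R}^{n \times p}. \]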
Random Projections’ Goal
What do we have?
n vectors in $\mathbb{R}^p$.
What do we want?
n vectors in $\mathbb{R}^k$, with k < p and all the squared distances preserved by a (1 ± ε) factor.
So, we keep the sample size but we significantly reduce the number of features.
What do we need?
A function $f : \mathbb{R}^p \to \mathbb{R}^k$ such that, for any $u, v \in \{x_1, \dots, x_n\}$,
\[ (1-\varepsilon)\,\|f(u) - f(v)\|^2 \le \|u - v\|^2 \le (1+\varepsilon)\,\|f(u) - f(v)\|^2. \]
Random Projections’ Goal
Figure: Example of our goal. Source: [1]
Principal Component Analysis vs. Random Projections
PCA’s Goal
To preserve data variability:
- 1st Principal Component ⇒ Direction of maximum variability;
- 2nd Principal Component ⇒ Direction of maximum variability that is orthogonal to the 1st;
- ...
RP’s Goal
To preserve distances between vectors: for all rows u, v of X, the squared distance between the projected values f(u) and f(v) is similar to the original squared distance between u and v: it is between (1 − ε)‖u − v‖² and (1 + ε)‖u − v‖².
Some Questions on Random Projections
- How can we define that function?
- Can we use it regardless of the values of n or p?
- Is there any restriction on k? How can we determine the smallest possible k (dimension of the space where we want to project the n vectors of $\mathbb{R}^p$)?
How Do Random Projections Work?
Lemma (Johnson-Lindenstrauss)
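Let:
- X be a matrix in $\mathbb{R}^{n \times p}$, whose rows are denoted by $x_i \in \mathbb{R}^p$, $\forall i \in \{1, \dots, n\}$:
\[ X = \begin{bmatrix} x_1^t \\ \vdots \\ x_n^t \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \in \mathbb{R}^{n \times p}; \]
- ε ∈ ]0, 1[ arbitrary;
- $k \in \mathbb{N}$ such that $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil \le k < p$.
Then, there exists $f : \mathbb{R}^p \to \mathbb{R}^k$ such that for any $u, v \in \{x_1, \dots, x_n\}$ we have that
\[ (1-\varepsilon)\,\|f(u) - f(v)\|^2 \le \|u - v\|^2 \le (1+\varepsilon)\,\|f(u) - f(v)\|^2. \]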
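The lower bound on k in the lemma can be evaluated directly. The following is a minimal sketch, assuming that log denotes the natural logarithm (this assumption reproduces the values of k quoted in these slides); the function name `min_projection_dim` is hypothetical.

```python
import math

def min_projection_dim(n: int, eps: float) -> int:
    """Smallest integer k allowed by the bound: ceil(24 * ln(n) / (3*eps^2 - 2*eps^3))."""
    if n < 2:
        raise ValueError("need at least two vectors")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in the open interval ]0, 1[")
    return math.ceil(24.0 * math.log(n) / (3.0 * eps**2 - 2.0 * eps**3))

# Two of the cases discussed in the slides.
k_ten = min_projection_dim(10, 0.5)          # 10 vectors, eps = 0.5
k_million = min_projection_dim(10**6, 0.5)   # 1 million vectors, eps = 0.5
print(k_ten, k_million)  # prints "111 664"
```

Note that the bound depends on n and ε only; the original dimension p enters solely through the requirement k < p.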
The Answers
It is not possible to randomly project an arbitrary set of vectors. Because k < p, the value of p must be at least $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil + 1$.
Figure: Smallest value of k vs. ε and n
Conclusions
- Fixing ε, k slowly increases (logarithmically) as n increases;
- Fixing n, k quickly decreases as ε increases.
Figure: Smallest value of k vs. ε and n
Conclusions
We can produce a table considering n and k as integer values.

n \ ε    0.0001        0.001        0.1     0.3    0.5    0.7   0.9   0.999  0.9999
10       1.85 × 10^9   1.85 × 10^7  1974    256    111    71    57    56     56
10^2     3.69 × 10^9   3.69 × 10^7  3948    512    222    141   114   111    111
10^3     5.53 × 10^9   5.53 × 10^7  5921    768    332    212   171   166    166
10^6     1.11 × 10^10  1.11 × 10^8  11842   1536   664    423   342   332    332
10^9     1.66 × 10^10  1.66 × 10^8  17763   2303   995    635   512   498    498
10^12    2.22 × 10^10  2.22 × 10^8  23684   3071   1327   846   683   664    664

Table: Smallest value of k for fixed ε and n
For instance, 1 million vectors with dimension 10 million can be projected, considering ε = 0.5, in $\mathbb{R}^{664}$.
We obtain 1 million vectors with dimension 664 each, with all the squared distances preserved by a 1 ± ε factor.
How to Apply Random Projections?
- The tool is obtained from the proof of the Johnson-Lindenstrauss Lemma;
- Given a data matrix $X \in \mathbb{R}^{n \times p}$ with n samples containing p features each, our goal is to find a projection matrix R such that E = XR is the projection of matrix X;
- Let $f : \mathbb{R}^p \to \mathbb{R}^k$ be given by $f(u) = \frac{1}{\sqrt{k}} A u$, where $A \in \mathbb{R}^{k \times p}$ is a matrix with entries $A_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$ for all i, j;
- The map preserves all the squared distances by a 1 ± ε factor when repeatedly applied O(n) times (this occurs almost surely);
- Take R such that $R^t = \frac{1}{\sqrt{k}} A$ and the projection is given by E = XR.
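The map f(u) = (1/√k) A u can be tried on a small example. The sketch below uses toy sizes and uniformly generated points (both hypothetical, not taken from the slides) and a single draw of A; it reports the ratio between projected and original squared distances for every pair of points. The value of k is chosen well above the lemma's bound, so ratios outside [1 − ε, 1 + ε] would be very unlikely here.

```python
import math
import random

def draw_gaussian_matrix(k, p, rng):
    """Return a k x p matrix (list of rows) with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(k)]

def project(A, u):
    """Apply f(u) = (1 / sqrt(k)) * A u to a single vector u of length p."""
    scale = 1.0 / math.sqrt(len(A))
    return [scale * sum(a * x for a, x in zip(row, u)) for row in A]

def squared_distance(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def distortion_ratios(points, A):
    """Ratios ||f(u) - f(v)||^2 / ||u - v||^2 over all pairs of distinct points."""
    images = [project(A, u) for u in points]
    return [
        squared_distance(images[i], images[j]) / squared_distance(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]

# Illustrative sizes only (hypothetical data, not from the slides).
rng = random.Random(0)
n, p, k, eps = 5, 600, 500, 0.5
points = [[rng.uniform(0.0, 10.0) for _ in range(p)] for _ in range(n)]
A = draw_gaussian_matrix(k, p, rng)
ratios = distortion_ratios(points, A)
print(min(ratios), max(ratios))  # both are expected to lie between 1 - eps and 1 + eps
```

A ratio close to 1 means the pair's squared distance survived the projection almost unchanged.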
How to Apply Random Projections?
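1. Fix ε ∈ ]0, 1[;
2. Choose $k \in \mathbb{N}$ such that $\left\lceil \frac{24}{3\varepsilon^2 - 2\varepsilon^3} \log n \right\rceil \le k < p$;
3. Build a matrix $R = \frac{1}{n\sqrt{k}} A^t$, where $A = \sum_{l=1}^{n} A_l$ and the $A_l$ are n real k × p dimensional matrices with standard normal entries, which means that $(A_l)_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$, $\forall i \in \{1, \dots, k\}$, $\forall j \in \{1, \dots, p\}$;
4. Get the projection of matrix X by computing E = XR.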
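The four steps above can be sketched as follows. This is a literal, unoptimised transcription under stated assumptions: log is taken as the natural logarithm, the data matrix is a toy example, and the helper names (`smallest_k`, `build_projection_matrix`, `matmul`) are hypothetical.

```python
import math
import random

def smallest_k(n, eps):
    """Step 2: smallest k allowed by the bound, using the natural logarithm (assumed)."""
    return math.ceil(24.0 * math.log(n) / (3.0 * eps**2 - 2.0 * eps**3))

def build_projection_matrix(n, p, k, rng):
    """Step 3: R = (1 / (n * sqrt(k))) * A^t, where A is the sum of n Gaussian k x p matrices."""
    A = [[0.0] * p for _ in range(k)]
    for _ in range(n):  # draw A_1, ..., A_n one at a time and accumulate their sum
        for i in range(k):
            for j in range(p):
                A[i][j] += rng.gauss(0.0, 1.0)
    scale = 1.0 / (n * math.sqrt(k))
    return [[scale * A[i][j] for i in range(k)] for j in range(p)]  # p x k (transposed)

def matmul(X, R):
    """Step 4: E = X R for matrices stored as lists of rows."""
    columns = list(zip(*R))
    return [[sum(x * r for x, r in zip(row, col)) for col in columns] for row in X]

rng = random.Random(0)
n, p, eps = 4, 200, 0.9                      # step 1, with toy sizes (hypothetical)
k = smallest_k(n, eps)                       # step 2
assert k < p, "the lemma requires k < p"
X = [[rng.uniform(0.0, 1.0) for _ in range(p)] for _ in range(n)]
R = build_projection_matrix(n, p, k, rng)    # step 3
E = matmul(X, R)                             # step 4: n x k matrix of projected features
```

For realistic sizes a numerical library would be preferable; plain lists are used here only to keep the sketch self-contained.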
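Interpreting Projected Features
The data matrix X is an n dimensional sample of p features:
\[ X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \in \mathbb{R}^{n \times p}. \]
After the projection, the new matrix is the product of X by a matrix $R \in \mathbb{R}^{p \times k}$:
\[ E = XR = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} r_{11} & \cdots & r_{1k} \\ \vdots & \ddots & \vdots \\ r_{p1} & \cdots & r_{pk} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{p} x_{1j} r_{j1} & \cdots & \sum_{j=1}^{p} x_{1j} r_{jk} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{p} x_{nj} r_{j1} & \cdots & \sum_{j=1}^{p} x_{nj} r_{jk} \end{bmatrix} \in \mathbb{R}^{n \times k} \]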
Interpreting Projected Features
Each new feature is a linear combination of all the original features where the coefficients are the elements of matrix R, which are
\[ r_{ij} = \frac{1}{n\sqrt{k}} \sum_{l=1}^{n} (A_l)_{ji} \]
with $(A_l)_{ji} \sim N(0, 1)$.
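As a worked step, the distribution of each coefficient follows from the sum of independent normal variables. This assumes that the n matrices are drawn independently of one another (the slides state independence of the entries; independence across the n matrices is an assumption made here).

```latex
\[
\sum_{l=1}^{n} (A_l)_{ji} \sim N(0,\, n)
\quad\Longrightarrow\quad
r_{ij} = \frac{1}{n\sqrt{k}} \sum_{l=1}^{n} (A_l)_{ji}
\sim N\!\left(0,\ \frac{n}{n^{2} k}\right) = N\!\left(0,\ \frac{1}{nk}\right).
\]
```

So every original feature enters each projected feature with a random, zero-mean weight, which is one reason the projected features have no direct interpretation.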
Multiple Linear Regression Model
Let us consider the general linear model with Gauss-Markov structure:
\[ y = X_D \beta + \varepsilon \iff Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i \]
- $y = (Y_1, \dots, Y_n)$ is the n dimensional vector containing the values of the response variable Y;
- $X_D = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}$ is the n × (p + 1) dimensional design matrix;
Multiple Linear Regression Model
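\[ y = X_D \beta + \varepsilon \]
- $\beta = (\beta_0, \dots, \beta_p)$ is the vector of the p + 1 regression parameters;
- $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is the vector of random errors such that:
  - $E(\varepsilon) = 0 \iff E(\varepsilon_i) = 0,\ \forall i \in \{1, \dots, n\}$;
  - $\mathrm{Var}(\varepsilon) = \sigma^2 I \iff \mathrm{Var}(\varepsilon_i) = \sigma^2,\ \forall i \in \{1, \dots, n\}$;
  - $\mathrm{Corr}(\varepsilon_i, \varepsilon_j) = 0,\ \forall i \ne j$.
\[ E(Y \mid x) = X_D \beta \]
To make inferences, we additionally suppose that
\[ \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2). \]
So the fitted values are
\[ \hat{y} = X_D \hat{\beta} \iff \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}. \]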
Simulating a Multiple Linear Regression Model - Ideal Case
We generated in R:
- n = 5000 samples;
- p = 1000 features:
  - 400 values from the distribution N(20, 25);
  - 500 values from the distribution Unif(5, 95);
  - 100 values from the distribution Bin(100, 0.5);
- $X \in \mathbb{R}^{5000 \times 1000}$ is the concatenation of all 5000 samples of the 1000 generated features;
- $\beta_k \sim N(0, 100)$, $\forall k \in \{0, 1, \dots, p\}$ are the true regression parameters;
- σ = 2.055656 is the constant standard deviation of the random errors;
- $\varepsilon_i \sim N(0, \sigma^2)$, $\forall i \in \{1, \dots, n\}$ are the random errors, with the only restriction imposed by the model!
Simulating a Multiple Linear Regression Model - Ideal Case
With all this data, we generate the response variable vector:
\[ y = X_D \beta + \varepsilon. \]
Just for control, we can estimate the parameters $\beta_0, \beta_1, \dots, \beta_p$ and verify that they are similar to the original ones.
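The simulation design can be sketched at a much smaller scale. The slides generated the data in R; the Python sketch below is only an illustration, and it assumes that the second parameter of N(·, ·) is a variance (so N(20, 25) has standard deviation 5 and N(0, 100) has standard deviation 10). The function name and the reduced sizes are hypothetical.

```python
import random

def simulate_linear_model(n, counts, sigma, rng):
    """Return (X, beta, y) with y = X_D beta + eps.

    counts = (number of normal, uniform and binomial features).
    """
    n_norm, n_unif, n_bin = counts
    X = []
    for _ in range(n):
        row = [rng.gauss(20.0, 5.0) for _ in range(n_norm)]       # N(20, 25): sd 5 (assumed)
        row += [rng.uniform(5.0, 95.0) for _ in range(n_unif)]    # Unif(5, 95)
        row += [float(sum(rng.random() < 0.5 for _ in range(100)))
                for _ in range(n_bin)]                            # Bin(100, 0.5)
        X.append(row)
    p = n_norm + n_unif + n_bin
    beta = [rng.gauss(0.0, 10.0) for _ in range(p + 1)]           # N(0, 100): sd 10 (assumed)
    # beta[0] is the intercept, i.e. the column of ones in the design matrix X_D.
    y = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) + rng.gauss(0.0, sigma)
         for row in X]
    return X, beta, y

# Same 4 : 5 : 1 proportion of feature types as in the slides, at toy scale.
rng = random.Random(2016)
X, beta, y = simulate_linear_model(n=50, counts=(4, 5, 1), sigma=2.055656, rng=rng)
```

Keeping the true `beta` alongside `y` is what makes the control step possible: estimates obtained from (X, y) can be compared with the values that generated the data.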
Applying Random Projections to the Generated Model - Ideal Case
Using ε = 0.5 (factor from random projections, not the linear model), we reduce the number of features from p = 1000 to k = 409.
Taking ε = 0.999, we can get k = 205, which is the smallest integer k that can be used!

ε      p     k    R²_a    VIF  AIC    PRESS
0.5    1000  409  71.23%  < 5  95433  5.7 × 10^10
0.999  1000  205  37.58%  < 5  98976  1.2 × 10^10

Model assumptions seem to still be verified in both cases.
We obtain good results using ε = 0.5 but they are not so good when we take ε = 0.999.
Simulating a Multiple Linear Regression Model - Non-Ideal Case
- n = 5000 samples as before;
- p = 10000 features, 10 times more, keeping the proportion and the distributions:
  - 4000 from the distribution N(20, 25);
  - 5000 from the distribution Unif(5, 95);
  - 1000 from the distribution Bin(100, 0.5);
- $X \in \mathbb{R}^{5000 \times 10000}$ is the concatenation of all 5000 samples of the 10000 generated features;
- $\beta_k \sim N(0, 100)$, $\forall k \in \{0, 1, \dots, p\}$ are the regression parameters;
- σ = 2.055656 is the constant standard deviation of the random errors;
- $\varepsilon_i \sim N(0, \sigma^2)$, $\forall i \in \{1, \dots, n\}$ are the random errors, with the only restriction imposed by the model;
- $y = X_D \beta + \varepsilon$.
There are more features than samples so we cannot estimate regression parameters!
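A short rank argument makes this explicit. It assumes estimation by ordinary least squares, which the slides do not name but which is the usual estimator for this model.

```latex
\[
\operatorname{rank}(X_D) \le \min(n,\ p + 1) = \min(5000,\ 10001) = 5000 < p + 1,
\]
\[
\text{so } X_D^{t} X_D \in \mathbb{R}^{(p+1) \times (p+1)} \text{ is singular and the normal equations }
X_D^{t} X_D\, \beta = X_D^{t}\, y \text{ have no unique solution.}
\]
```

After projecting to k < n features, the design matrix has k + 1 columns and can have full column rank again.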
Applying Random Projections to the Generated Model - Non-Ideal Case
Using ε = 0.5 (factor from random projections, not the linear model), we reduce the number of features from p = 10000 to k = 409 as before.
Taking ε = 0.999, we can also get k = 205, which is the smallest integer k that can be used!

ε      p      k    R²_a  VIF  AIC     PRESS
0.5    10000  409  8%    < 5  112323  1.7 × 10^12
0.999  10000  205  4%    < 5  112379  1.7 × 10^12

Model assumptions seem to still be verified in both cases.
We do not obtain good results using ε = 0.5 and they are even worse when we choose ε = 0.999.
Conclusions
- Model assumptions are verified;
- The portion of data variability which is explained by the linear model seems to decrease, with AIC increasing and R²_a decreasing;
- The total number of influential observations seems to increase as we reduce the dimension k of the space where we want to project our data, with PRESS getting larger or, at least, staying of the same order;
- Interpreting the regression parameters is more difficult.
Interpreting the Regression Parameters
In General:
The coefficient β_k tells us how much a fitted value increases or decreases when the k-th explanatory variable (after projection) increases by one unit, keeping all the other terms fixed.
\[ Y = \beta_0 + \beta_1 x_1^* + \cdots + \beta_p x_p^* \]
When Applying Random Projections:
The coefficient β_k tells us how much a fitted value increases or decreases when a linear combination x*_k of all the original explanatory variables (before projection) increases by one unit, keeping all the other terms fixed. But those other terms also depend on the model structure considering the original variables...
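The dependence on the original variables can be written out. In this worked identity the number of projected features is denoted by k and indexed by c (a labelling choice made here; the formula in the slides writes p for the number of terms), and r_{jc} are the entries of the projection matrix R.

```latex
\[
x_c^{*} = \sum_{j=1}^{p} x_j\, r_{jc}, \qquad c \in \{1, \dots, k\},
\]
\[
Y = \beta_0 + \sum_{c=1}^{k} \beta_c\, x_c^{*}
  = \beta_0 + \sum_{j=1}^{p} \Bigl( \sum_{c=1}^{k} r_{jc}\, \beta_c \Bigr) x_j .
\]
```

The implied weight of an original variable x_j is therefore a random mixture of all the fitted coefficients, so changing one x_j moves every projected feature at once.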
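Multiple Logistic Regression Model
For the next dataset, our response variable will be binary.
When the response variable is categorical, we should use logistic regression.
Using logistic functions, we can obtain the model
\[ E(Y_i \mid x_{i1}, \dots, x_{ip}) = \pi_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}} \]
that can be linearised by the logit function
\[ \pi_i^* = \log\left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \]
which is continuous, linear in the parameters and takes values in $\mathbb{R}$.
\[ \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \qquad X_i = \begin{bmatrix} 1 \\ X_{i,1} \\ \vdots \\ X_{i,p} \end{bmatrix} \]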
Multiple Logistic Regression Model
Figure: Example of a Logistic Function
Multiple Logistic Regression Model
The Model
\[ \pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}} = \frac{e^{x^t \beta}}{1 + e^{x^t \beta}} = \left[ 1 + e^{-x^t \beta} \right]^{-1} \]
where the logit function is given by
\[ \pi^*(x) = \log\left[ \frac{\pi(x)}{1 - \pi(x)} \right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \]
Estimating the Parameters
It is possible to estimate β using maximum likelihood, but the likelihood equations are non-linear in every coefficient β_k, so we need to apply numerical methods. R uses the Fisher scoring algorithm.
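To illustrate what such a numerical method does, here is a minimal sketch of Fisher scoring for the logistic model, which for the logit link coincides with Newton's method. It is not R's implementation; the toy data, the starting point β = 0, the tolerance and the helper names are all assumptions made for this example.

```python
import math

def logistic(z):
    """pi = 1 / (1 + exp(-z)), evaluated in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for cc in range(c, m + 1):
                aug[r][cc] -= f * aug[c][cc]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (aug[r][m] - sum(aug[r][cc] * x[cc] for cc in range(r + 1, m))) / aug[r][r]
    return x

def probabilities(X, beta):
    """pi_i for every row of the design matrix (rows start with the constant 1)."""
    return [logistic(sum(b * v for b, v in zip(beta, row))) for row in X]

def score(X, y, beta):
    """Gradient of the log-likelihood: X_D^t (y - pi)."""
    pis = probabilities(X, beta)
    return [sum((yi - pi) * row[j] for yi, pi, row in zip(y, pis, X))
            for j in range(len(beta))]

def fisher_scoring(X, y, tol=1e-10, max_iter=100):
    """Iterate beta <- beta + (X_D^t W X_D)^{-1} X_D^t (y - pi), W = diag(pi (1 - pi))."""
    q = len(X[0])
    beta = [0.0] * q
    for _ in range(max_iter):
        pis = probabilities(X, beta)
        info = [[sum(pi * (1.0 - pi) * row[a] * row[c] for pi, row in zip(pis, X))
                 for c in range(q)] for a in range(q)]
        step = solve(info, score(X, y, beta))
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data with one explanatory variable (hypothetical, not from the study).
X_demo = [[1.0, float(x)] for x in range(6)]
y_demo = [0, 0, 1, 0, 1, 1]
beta_hat = fisher_scoring(X_demo, y_demo)
```

At the solution the score vector is (numerically) zero, which is exactly the set of non-linear likelihood equations mentioned above. With perfectly separated classes the iteration does not settle, which is one way convergence problems can arise.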
Another Application - Clinical Data From Leukaemia Diagnosis
Data from clinical experiments at St. Jude Children’s Research Hospital, Memphis, Tennessee, USA, was published in 2002.
It is microarray data for diagnosing acute lymphoblastic leukaemia.
- n = 327 samples - 327 patients from that hospital;
- p = 12625 explanatory variables - 12625 genes;
- Y binary response variable:
\[ Y = \begin{cases} 1, & \text{if the patient is ill} \\ 0, & \text{otherwise.} \end{cases} \]
Another Application - Clinical Data From Leukaemia Diagnosis
Figure: Construction of a Microarray
Building a Regression Model
Ideal Number of Samples:
Between 5 and 10 samples per explanatory variable.
What do we have?
More explanatory variables than samples (0.026 samples per explanatory variable). It is not possible to estimate regression parameters!
What can we do?
Apply random projections to the explanatory variables. There are n = 327 vectors in $\mathbb{R}^p$, with p = 12625, that can be projected in $\mathbb{R}^k$, with k < p. By the Johnson-Lindenstrauss Lemma, the smallest k in those conditions, picking ε = 0.999, is 139.
After Projection:
327 samples and 139 explanatory variables. It is not ideal, but at least we can now build a model.
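The figures on this slide can be checked by hand, assuming that log in the lemma is the natural logarithm (an assumption that reproduces the value 139):

```latex
\[
\frac{327}{12625} \approx 0.026, \qquad
k = \left\lceil \frac{24 \,\ln 327}{3(0.999)^2 - 2(0.999)^3} \right\rceil
  = \left\lceil \frac{138.959}{0.999997} \right\rceil
  = \lceil 138.96 \rceil = 139, \qquad
\frac{327}{139} \approx 2.35 .
\]
```

After projection there are about 2.35 samples per explanatory variable, still below the recommended 5 to 10 but enough for the parameters to be estimable.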
Building a Regression Model
The response variable is binary, so we should apply logisticregression. We will apply both methods: linear and logisticregression.
Applying Multiple Linear Regression
We estimate the p + 1 = 140 regression parameters (with p = 139 after projection) in R, without considering the response variable as categorical.
Classification Rule
We want to classify the patients as being ill or not.
If a fitted value ŷ_i is larger than 0.5, we classify the i-th patient as being ill.
Otherwise, we do not consider that patient as ill.
Multiple Linear Regression Model
- R²_a ≈ 16%;
- Several variables with a very high p-value in their t-tests;
- F-Test with a small p-value: 0.8%;
- VIF > 10 on 4 variables, VIF < 5 on 92 variables;
- AIC ≈ 20;
- PRESS ≈ 27.
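The classification rule stated on this slide is a simple threshold on the fitted values. A minimal sketch, using made-up fitted values (not results from the study) and hypothetical helper names:

```python
def classify(fitted_values, threshold=0.5):
    """Return 1 (ill) when the fitted value is larger than the threshold, 0 otherwise."""
    return [1 if value > threshold else 0 for value in fitted_values]

def count_labels(labels):
    """Number of patients classified as ill and as not ill."""
    ill = sum(labels)
    return {"ill": ill, "not ill": len(labels) - ill}

# Hypothetical fitted values; a linear model can produce values outside [0, 1].
fitted = [0.93, 0.48, 1.07, 0.50, -0.12]
labels = classify(fitted)
counts = count_labels(labels)
print(labels, counts)  # prints "[1, 0, 1, 0, 0] {'ill': 2, 'not ill': 3}"
```

A fitted value of exactly 0.5 is not "larger than 0.5", so under this rule that patient is not classified as ill.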
Applying Multiple Linear Regression
Model Assumptions
When we check the model assumptions, we easily find that they do not hold, so the multiple linear regression model does not fit the data.
Prediction
There are 327 patients, 19 of them not ill and 308 ill.
Applying the previous classification rule, we get
- 312 ill patients;
- 15 healthy patients.
Also, we did not split the data into a training set and a test set...
Applying Multiple Logistic Regression
An Additional Problem
We have a sample of 19 healthy patients and 308 ill ones in a total of 327 observations, so the data is not balanced. This leads to convergence problems while estimating parameters...
One Possible Solution
Allowing a maximum error of 0.1 in parameter estimation and 100000 iterations.
Applying Multiple Logistic Regression
- Very high standard deviation for each coefficient (order of 10^3);
- AIC ≈ 280;
- PRESS ≈ 4.62.
Applying Multiple Logistic Regression
Prediction
It is in prediction that we have the most important improvement.
Applying the classification rule defined above, we predict
- 308 ill patients;
- 19 healthy patients,
which corresponds to the real data.
Once again, we did not split the data into a training set and a test set...
Comparing Random Projections and PCA
Advantages of PCA
- Better adjusted coefficient of determination (consistent with PCA’s goal);
- Faster if we extract few principal components.
Advantages of RP
- Better predictions;
- Faster if we need to extract a lot of principal components when using PCA.
Other Applications - Linear Regression with Compressive Ordinary Least Squares
Figure: Application to Detection of Musical Patterns with n = 2000, p = 10^6 and Very Sparse Data. Source: [5]
Other Applications - Noise Detection on Images
Figure: Application to Noise Detection on Images. Source: [7]
Other Applications - Noise Detection on Images
Figure: Impact on the required floating point operations after noise detection on images, using MATLAB. Source: [7]
Appendix
Bibliography
[1] Aditya Krishna Menon. Random Projections and Applications to Dimensionality Reduction. School of Information Technologies, University of Sydney, Australia, 2007.
[2] Lopez-Paz and Duvenaud. Random Projections. School of Engineering and Applied Sciences, University of Harvard, United States of America, 2013.
[3] Michael Mahoney. The Johnson-Lindenstrauss Lemma. School of Engineering, University of Stanford, United States of America, 2009.
[4] Sanjoy Dasgupta and Anupam Gupta. An Elementary Proof of a Theorem of Johnson and Lindenstrauss. New Jersey, United States of America, 2001.
[5] Robert J. Durrant and Ata Kaban. Random Projections for Machine Learning and Data Mining: Theory and Applications. University of Birmingham, United Kingdom, 2012.
[6] Conceicao Amado. Regressao Logistica - Uma Introducao. Instituto Superior Tecnico, Universidade de Lisboa, Portugal, 2010.
[7] Angelo Cardoso. Random Projections in Dimensionality Reduction. Instituto Superior Tecnico, Universidade de Lisboa, Portugal, 2009.