Corso di Sistemi di Elaborazione dell'Informazione (Information Processing Systems course)
Prof. Giuseppe Boccignone
Department of Computer Science
University of Milan
boccignone.di.unimi.it/FM_2016.html
POSTGRADUATE SCHOOL IN MEDICAL PHYSICS (Scuola di Specializzazione in Fisica Medica)
Regression: Gaussian Processes and Machine Learning
Supervised learning // Regression
Gaussian processes: any finite number of random variables, taken from the ensemble that forms the random process itself, has a joint (multivariate) Gaussian probability distribution.
More precisely: the process is Gaussian if, for any choice of $t_1, \dots, t_n$, the random vector $(X_{t_1}, \dots, X_{t_n})$ has a multivariate normal distribution.
Stochastic processes // Classes of processes
Chapter 3
Gaussian Processes and Stochastic Differential Equations
3.1 Gaussian Processes
Gaussian processes are a reasonably large class of processes that have many applications, including in finance, time-series analysis and machine learning. Certain classes of Gaussian processes can also be thought of as spatial processes. We will use these as a vehicle to start considering more general spatial objects.
Definition 3.1.1 (Gaussian Process). A stochastic process $\{X_t\}_{t \ge 0}$ is Gaussian if, for any choice of times $t_1, \dots, t_n$, the random vector $(X_{t_1}, \dots, X_{t_n})$ has a multivariate normal distribution.
3.1.1 The Multivariate Normal Distribution
Because Gaussian processes are defined in terms of the multivariate normal distribution, we will need to have a pretty good understanding of this distribution and its properties.
Definition 3.1.2 (Multivariate Normal Distribution). A vector $X = (X_1, \dots, X_n)$ is said to be multivariate normal (multivariate Gaussian) if all linear combinations of $X$, i.e. all random variables of the form
$$\sum_{k=1}^{n} \alpha_k X_k,$$
have univariate normal distributions.
Simulating Multivariate Normals
One possible way to simulate a multivariate normal random vector is using the Cholesky decomposition.
Lemma 3.1.11. If $Z \sim N(0, I)$, $\Sigma = AA^\top$ is a covariance matrix, and $X = \mu + AZ$, then $X \sim N(\mu, \Sigma)$.
Proof. We know from Lemma 3.1.4 that $AZ$ is multivariate normal, and so is $\mu + AZ$. Because a multivariate normal random vector is completely described by its mean vector and covariance matrix, we simply need to find $\mathbb{E}X$ and $\operatorname{Var}(X)$. Now,
$$\mathbb{E}X = \mathbb{E}\mu + \mathbb{E}AZ = \mu + A\,\mathbf{0} = \mu$$
and
$$\operatorname{Var}(X) = \operatorname{Var}(\mu + AZ) = \operatorname{Var}(AZ) = A \operatorname{Var}(Z) A^\top = A I A^\top = AA^\top = \Sigma.$$
Thus, we can simulate a random vector $X \sim N(\mu, \Sigma)$ by
(i) finding $A$ such that $\Sigma = AA^\top$;
(ii) generating $Z \sim N(0, I)$;
(iii) returning $X = \mu + AZ$.
Matlab makes things a bit confusing because its function chol produces a decomposition of the form $\Sigma = B^\top B$. That is, $A = B^\top$. It is important to be careful and think about whether you want to generate a row or column vector when generating multivariate normals.
The following code produces column vectors.
Listing 3.1: Matlab code
mu = [1; 2; 3];                 %mean as a column vector
Sigma = [3 1 2; 1 2 1; 2 1 5];
A = chol(Sigma);                %Sigma = A'*A
X = mu + A' * randn(3,1);       %column sample
The following code produces row vectors.
Listing 3.2: Matlab code
mu = [1 2 3];                   %mean as a row vector
Sigma = [3 1 2; 1 2 1; 2 1 5];
A = chol(Sigma);                %Sigma = A'*A
X = mu + randn(1,3) * A;        %row sample
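As a quick sanity check (a sketch of ours, not part of the original notes), one can draw many row samples and verify that the empirical covariance approaches Sigma:

%Sanity check: empirical covariance of the row-vector sampler
N = 1e5;
Xs = ones(N,1) * mu + randn(N,3) * A;   %N samples, one per row
disp(cov(Xs));                          %should be close to Sigma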
If $A$ is also symmetric, then $A$ is called symmetric positive semi-definite (SPSD). Note that if $A$ is SPSD then there is a real-valued decomposition
$$A = LL^\top,$$
though it is not necessarily unique and $L$ may have zeroes on the diagonals.
Lemma 3.1.9. Covariance matrices are SPSD.
Proof. Given an $n \times 1$ real-valued vector $\mathbf{x}$ and a random vector $\mathbf{Y}$ with covariance matrix $\Sigma$, we have
$$\operatorname{Var}(\mathbf{x}^\top \mathbf{Y}) = \mathbf{x}^\top \operatorname{Var}(\mathbf{Y})\, \mathbf{x}.$$
Now this must be non-negative (as it is a variance). That is, it must be the case that
$$\mathbf{x}^\top \operatorname{Var}(\mathbf{Y})\, \mathbf{x} \ge 0,$$
so $\Sigma$ is positive semi-definite. Symmetry comes from the fact that $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$.
Lemma 3.1.10. SPSD matrices are covariance matrices.
Proof. Let $A$ be an SPSD matrix and $\mathbf{Z}$ be a vector of random variables with $\operatorname{Var}(\mathbf{Z}) = I$. Now, as $A$ is SPSD, $A = LL^\top$. So,
$$\operatorname{Var}(L\mathbf{Z}) = L \operatorname{Var}(\mathbf{Z}) L^\top = L I L^\top = A,$$
so $A$ is the covariance matrix of $L\mathbf{Z}$.
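In practice (a minimal sketch of ours, not in the notes), such an L can be obtained from an eigendecomposition, which works even when the matrix is only semi-definite and chol fails:

%Factor an SPSD matrix A as A = L*L' via eigendecomposition
A = [1 1; 1 1];           %rank-deficient SPSD example
[V, D] = eig(A);          %A = V*D*V', D diagonal with D >= 0
L = V * sqrt(max(D, 0));  %clip tiny negative eigenvalues from round-off
disp(norm(L*L' - A));     %should be numerically zero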
Densities of Multivariate Normals
If $X = (X_1, \dots, X_n)$ is $N(\mu, \Sigma)$ and $\Sigma$ is positive definite, then $X$ has the density
$$f(\mathbf{x}) = f(x_1, \dots, x_n) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left\{-\frac{1}{2}(\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right\}.$$
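As an aside (our sketch, not in the notes), this density is straightforward to evaluate directly; where the Statistics Toolbox is available it should agree with mvnpdf:

%Evaluate the multivariate normal density at a point x
mu = [1; 2; 3];
Sigma = [3 1 2; 1 2 1; 2 1 5];
x = [0; 0; 0];
n = length(mu);
d = x - mu;
f = exp(-0.5 * d' * (Sigma \ d)) / sqrt((2*pi)^n * det(Sigma));
%f should match mvnpdf(x', mu', Sigma) if the toolbox is installed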
Gaussian Processes (GPs) and Machine Learning
Gaussian processes: any finite number of random variables, taken from the collection that forms the random process itself, has a joint (multivariate) Gaussian probability distribution.
A Gaussian process is completely specified by its mean and covariance matrix.
Stochastic processes // Simulating Gaussian processes
The complexity of the Cholesky decomposition of an arbitrary real-valued $n \times n$ matrix is $O(n^3)$ floating point operations.
3.1.2 Simulating a Gaussian Process: Version 1
Because Gaussian processes are defined to be processes where, for any choice of times $t_1, \dots, t_n$, the random vector $(X_{t_1}, \dots, X_{t_n})$ has a multivariate normal density, and a multivariate normal density is completely described by a mean vector $\mu$ and a covariance matrix $\Sigma$, the probability distribution of a Gaussian process is completely described if we have a way to construct the mean vector and covariance matrix for an arbitrary choice of $t_1, \dots, t_n$.
We can do this using an expectation function
$$\mu(t) = \mathbb{E}X_t$$
and a covariance function
$$r(s, t) = \operatorname{Cov}(X_s, X_t).$$
Using these, we can simulate the values of a Gaussian process at fixed times $t_1, \dots, t_n$ by calculating $\mu$, where $\mu_i = \mu(t_i)$, and $\Sigma$, where $\Sigma_{i,j} = r(t_i, t_j)$, and then simulating a multivariate normal vector.
In the following examples, we simulate the Gaussian processes at evenly spaced times $t_1 = h, t_2 = 2h, \dots, t_n = 1$, where $h = 1/m$ is the mesh size.
Example 3.1.12 (Brownian Motion). Brownian motion is a very important stochastic process which is, in some sense, the continuous time, continuous state space analogue of a simple random walk. It has an expectation function $\mu(t) = 0$ and a covariance function $r(s, t) = \min(s, t)$.
Listing 3.3: Matlab code
%use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);

%Make the mean vector (zero mean)
mu = zeros(1,n);

%Make the covariance matrix
Sigma = zeros(n,n);
for i = 1:n
    for j = 1:n
        Sigma(i,j) = min(t(i),t(j));
    end
end

%Generate the multivariate normal vector
A = chol(Sigma);
X = mu + randn(1,n) * A;

%Plot
plot(t,X);
[Figure 3.1.1: Plot of Brownian motion on [0, 1]; horizontal axis t, vertical axis X_t.]
Example 3.1.13 (Ornstein-Uhlenbeck Process). Another very important Gaussian process is the Ornstein-Uhlenbeck process. This has expectation function $\mu(t) = 0$ and covariance function $r(s, t) = e^{-\alpha|s-t|/2}$.
Listing 3.4: Matlab code
%use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);
%parameter of the OU process
alpha = 10;
%Make the mean vector (zero mean)
mu = zeros(1,n);

%Make the covariance matrix
Sigma = zeros(n,n);
for i = 1:n
    for j = 1:n
        Sigma(i,j) = exp(-alpha * abs(t(i) - t(j)) / 2);
    end
end

%Generate the multivariate normal vector
A = chol(Sigma);
X = mu + randn(1,n) * A;

%Plot
plot(t,X);
[Figure 3.1.2: Plot of an Ornstein-Uhlenbeck process on [0, 1] with α = 10; horizontal axis t, vertical axis X_t.]
Example 3.1.14 (Fractional Brownian Motion). Fractional Brownian motion (fBm) is a generalisation of Brownian motion. It has expectation function $\mu(t) = 0$ and covariance function $\operatorname{Cov}(s, t) = \frac{1}{2}\left(t^{2H} + s^{2H} - |t - s|^{2H}\right)$,
Stochastic processes // Simulating Gaussian processes
Simulating an Ornstein-Uhlenbeck (O-U) process:
[Figure 3.1.3: Plot of an Ornstein-Uhlenbeck process on [0, 1] with α = 100; horizontal axis t, vertical axis X_t.]
where $H \in (0, 1)$ is called the Hurst parameter. When $H = 1/2$, fBm reduces to standard Brownian motion. Brownian motion has independent increments. In contrast, for $H > 1/2$ fBm has positively correlated increments and for $H < 1/2$ fBm has negatively correlated increments.
Listing 3.5: Matlab code
%Hurst parameter
H = .9;

%use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);

%Make the mean vector (zero mean)
mu = zeros(1,n);

%Make the covariance matrix
Sigma = zeros(n,n);
for i = 1:n
    for j = 1:n
        Sigma(i,j) = 1/2 * (t(i)^(2*H) + t(j)^(2*H)...
            - (abs(t(i) - t(j)))^(2 * H));
    end
end

%Generate the multivariate normal vector (as in Listings 3.3/3.4;
%the transcript's listing was cut off at this point)
A = chol(Sigma);
X = mu + randn(1,n) * A;

%Plot
plot(t,X);
Gaussian Processes (GPs) and Machine Learning
The problem for machine learning: learning the parameters of the covariance function.
Further reading
Many more topics and code: http://www.gaussianprocess.org/gpml/
MCMC inference for GPs is implemented in FBM: http://www.cs.toronto.edu/~radford/fbm.software.html
Gaussian processes for ordinal regression, Chu and Ghahramani, JMLR, 6:1019-1041, 2005.
Flexible and efficient Gaussian process models for machine learning, Edward L. Snelson, PhD thesis, UCL, 2007.
http://www.gaussianprocess.org/gpml/chapters/
http://www.gaussianprocess.org/gpml/code/matlab/doc/index.html
Gaussian processes: an intuitive presentation // basic ideas
Suppose we have 6 noisy points: we want to predict at x* = 0.2.
We do not assume a parametric model (a fixed functional form).
We let the data speak for themselves.
We assume the data were generated by an n-variate Gaussian distribution.
Gaussian processes // basic ideas
We assume the data were generated by an n-variate Gaussian distribution with:
mean 0;
covariance function k(x, x').
Characteristics of k(x, x'):
it is maximal when x is close to x';
it becomes small when x is far from x'.
For example: if x is close to x', then y is correlated with y' (a sketch of a typical kernel follows).
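A concrete example (our illustration; the slide's own formula was an image) is the squared-exponential kernel used later in these slides, with bandwidth $\tau$:

$$k(x, x') = \exp\left(-\frac{(x - x')^2}{2\tau^2}\right),$$

which equals 1 at $x = x'$ and decays towards 0 as $|x - x'|$ grows.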
Gaussian processes // basic ideas
We assume the data were generated by an n-variate Gaussian distribution with additive Gaussian noise.
We incorporate the noise into the kernel.
We want to predict y*, not the true (latent) function f*.
(In the figure, the plotted point is the mean of y* and f*: they have the same mean but different covariance.)
Gaussian processes // basic ideas
We compute the covariances between all possible combinations of points, obtaining three matrices (made explicit below).
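In the standard notation (our reconstruction; the slide's matrices were shown as images), with training inputs $x_1, \dots, x_n$ and test input $x_*$, the three matrices are

$$K = [k(x_i, x_j)]_{i,j=1}^{n}, \qquad K_* = [k(x_*, x_1), \dots, k(x_*, x_n)], \qquad K_{**} = k(x_*, x_*).$$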
Gaussian processes // basic ideas
GP regression
We assume the data were generated by an n-variate Gaussian distribution with additive Gaussian noise.
We want to predict y*.
Using the properties of Gaussians (conditioning the joint), we obtain a prediction (the posterior mean; the plotted point is the mean of y* and f*) and its uncertainty (the posterior variance), as reconstructed below.
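A reconstruction of the standard result (the slide's formulas were images; this matches the prediction code later in the deck):

$$\bar{y}_* = K_* (K + \sigma^2 I)^{-1} \mathbf{y}, \qquad \operatorname{Var}(y_*) = K_{**} + \sigma^2 - K_* (K + \sigma^2 I)^{-1} K_*^\top.$$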
Gaussian processes // basic ideas
GP regression for the point y*, at x* = 0.2.
In the example, the noise level is estimated from the error bars, and the remaining kernel parameters are chosen appropriately; the figure shows the prediction and its uncertainty.
Gaussian processes // basic ideas
GP regression for 1000 points.
In the example, the noise level is estimated from the error bars, with an appropriate choice of the kernel parameters.
We can incorporate the physics of the process into the kernel function.
Gaussian processes // basic ideas
Example
Definition of the kernel function k(x, z).
Creation of a data set (the snippet below assumes a setup, sketched first).
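The slide's code assumes a grid X of candidate inputs, a latent function h evaluated on it, and a noise level sigma; none of these appear on the slide, so the following setup is a hypothetical one of ours:

%Hypothetical setup (the names X, h, sigma come from the slide's code;
%their definitions here are our guess)
X = linspace(-5, 5, 200)';  %candidate inputs, one per row
h = sin(X);                 %true (latent) function values on X
sigma = 0.1;                %noise standard deviation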
% kernel definition and bandwidth
tau = 1.0;
k = @(x,z) exp(-(x-z)'*(x-z) / (2*tau^2));

% create a training set corrupted by noise
m_train = 10;
i_train = floor(rand(m_train,1) * size(X,1) + 1);
X_train = X(i_train);
y_train = h(i_train) + sigma * randn(m_train,1);
true (latent) function
Example
Prediction:
X_test = X;
K_test_test = compute_kernel_matrix(k,X_test,X_test);
K_test_train = compute_kernel_matrix(k,X_test,X_train);
K_train_train = compute_kernel_matrix(k,X_train,X_train);
G = K_train_train + sigma^2 * eye(size(X_train,1));  % K + sigma^2*I
mu_test = K_test_train * (G \ y_train);              % predictive mean
Sigma_test = K_test_test + sigma^2 * eye(size(X_test,1)) ...
    - K_test_train * (G \ (K_test_train'));          % predictive covariance
function [K] = compute_kernel_matrix(k, X, Z)
    m = size(X,1);
    n = size(Z,1);
    K = zeros(m,n);
    for i = 1:m
        for j = 1:n
            K(i,j) = k(X(i,:)', Z(j,:)');
        end
    end
end
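A small usage sketch of ours (not on the slide): plot the predictive mean with a two-standard-deviation band, using the variables defined above:

%Visualize the GP fit
s = sqrt(diag(Sigma_test));         %predictive std. dev. per test point
plot(X, h, 'k--'); hold on;         %latent function
plot(X_train, y_train, 'ro');       %noisy training points
plot(X_test, mu_test, 'b-');        %predictive mean
plot(X_test, mu_test + 2*s, 'b:');  %+2 std. dev.
plot(X_test, mu_test - 2*s, 'b:');  %-2 std. dev.
hold off;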
Fantastic! But...
what have we actually done, and how does it connect to the Bayesian approach?
how do we make an appropriate choice of the parameters?
Let us go back to linear regression.
Gaussian processes // basic ideas
A Gaussian prior probability on the weights is equivalent to a Gaussian prior on the functions y.
Design matrix: the function values are jointly normal.
It can be shown that the covariance of y is the Gram matrix (reconstructed below).
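A reconstruction of the stripped formula, under standard assumptions (design matrix $\Phi$ with rows $\phi(x_n)^\top$ and weight prior $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \alpha^{-1} I)$; the prior's precise form is our assumption):

$$\mathbf{y} = \Phi \mathbf{w} \;\Longrightarrow\; \mathbf{y} \sim \mathcal{N}\left(\mathbf{0}, \tfrac{1}{\alpha} \Phi \Phi^\top\right),$$

where $K = \tfrac{1}{\alpha} \Phi \Phi^\top$ is the Gram matrix, with entries $K_{nm} = \tfrac{1}{\alpha} \phi(x_n)^\top \phi(x_m)$.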
A Gaussian prior on the functions y:
mean and variance suffice to specify it;
with zero mean, the variance alone suffices;
the data suffice to specify the variance.
Gaussian processes // conceptual foundations
A (joint) Gaussian probability over the points of the function
is a Gaussian prior probability over the function.
The parameters w have disappeared!
f is a random variable.
Gaussian processes // conceptual foundations
function [f] = sample_gp_prior(k, X)
    K = compute_kernel_matrix(k, X, X);
    % sample from a zero-mean Gaussian distribution
    % with covariance matrix K
    [U,S,V] = svd(K);
    A = U*sqrt(S)*V';
    f = A * randn(size(X,1),1);
end
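For instance (a usage sketch of ours, reusing the squared-exponential kernel defined earlier), we can draw and plot a few functions from the prior:

%Draw three sample functions from the GP prior and plot them
X = linspace(-5, 5, 200)';
tau = 1.0;
k = @(x,z) exp(-(x-z)'*(x-z) / (2*tau^2));
hold on;
for s = 1:3
    plot(X, sample_gp_prior(k, X));
end
hold off;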
Gaussian processes // conceptual foundations
When we sample from the Gaussian process, we sample a function.
Gaussian process: the parameters w have disappeared!
Gaussian processes // conceptual foundations
That is: a probability distribution over functions?
So a function is a random variable?
When we sample from the Gaussian process, we sample a function.
Gaussian processes // conceptual foundations
Let's look at it this way: we sample from a 2D Gaussian,
which we can plot in a horizontal manner, if we wish,
and where we can display a 2-D point using both plots.
[Figure: samples (x1, x2) shown both as a scatter plot and as values over two columns x1, x2, vertical axis from -4 to 4.]
Gaussian processes // conceptual foundations
... and we join the two coordinates with a straight line.
[Figure: the sampled values over columns x1 and x2, connected by a line segment.]
Gaussian processes // conceptual foundations
Now from a 3D Gaussian:
but we're not restricted to bivariate Gaussians.
Let's introduce some extra dimensions to our multivariate Gaussian distribution (without plotting it!).
Perhaps unsurprisingly, as we generate points from our new trivariate distribution, we can equivalently show these on our three-column horizontal plot.
[Figure: sampled values over three columns x1, x2, x3, vertical axis from -4 to 4, joined by line segments.]
Gaussian processes // conceptual foundations
Now from a 5D Gaussian:
why stop at a trivariate Gaussian distribution? We can add as many dimensions to our multivariate Gaussian as we wish.
[Figure: sampled values over five columns x1 ... x5, vertical axis from -4 to 4, joined by line segments.]
Gaussian processes // conceptual foundations
Now, three samples from a 25D Gaussian: building large Gaussians.
[Figure: three draws from a 25D Gaussian, plotted as curves f over 25 tick positions x, vertical axis roughly from -2 to 2.]
To produce this, we needed a mean: I used zeros(25,1).
The covariances were set using a kernel function: $\Sigma_{ij} = k(x_i, x_j)$.
The x's are the positions where I planted the tics on the axis.
Later we'll find k's that ensure $\Sigma$ is always positive semi-definite.
A minimal sketch of this construction follows.
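A sketch of ours reproducing the construction (the specific kernel below is our assumption; the slide only states that $\Sigma_{ij} = k(x_i, x_j)$):

%Three draws from a 25-dimensional Gaussian with a kernel covariance
x = linspace(0, 1, 25)';
K = zeros(25, 25);
for i = 1:25
    for j = 1:25
        K(i,j) = exp(-(x(i) - x(j))^2 / (2 * 0.1^2));  %assumed SE kernel
    end
end
L = chol(K + 1e-10*eye(25), 'lower');  %jitter for numerical stability
samples = L * randn(25, 3);            %three draws; the mean is zeros(25,1)
plot(x, samples, '.-');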
Gaussian process: the parameters w have disappeared! f is a random variable.
Gaussian processes // conceptual foundations
Gaussian distribution vs. Gaussian process
A Gaussian process defines a distribution over functions, p(f), where f can be regarded as an infinite-dimensional vector (a stochastic process).
Definition:
p(f) is a Gaussian process if, for every finite subset of inputs, the distribution over that finite subset (an n-dimensional vector) is a multivariate Gaussian (in symbols, see below).
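In symbols (our reconstruction of the stripped formula):

$$f \sim \mathcal{GP}(m, k) \iff (f(x_1), \dots, f(x_n)) \sim \mathcal{N}(\mathbf{m}, K) \text{ for every finite } \{x_1, \dots, x_n\},$$

with $\mathbf{m}_i = m(x_i)$ and $K_{ij} = k(x_i, x_j)$.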
Gaussian processes // formal definition
What does the kernel tell us?
The kernel function, hence the covariance, tells us how strongly two points are correlated.
Gaussian processes // What the kernel tells us
Gaussian processes // conceptual foundations
Gaussian process samples using different kernels... (figure slides)
Gaussian processes // Graphical model
[Plate diagram over n = 1, ..., N, with the kernel parameters as hyperparameters.]
Gaussian processes // Regression
The model, assuming Gaussian noise.
The joint distribution over the observed outputs and the latent function values.
Prediction for a new data point:
we form the joint over the N observed outputs and the new one,
and partition the covariance matrix accordingly.
Gaussian processes // Regression
Exploiting the properties of jointly Gaussian variables in matrix form, the prediction can be specified as reconstructed below.
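A reconstruction of the stripped formulas, following the standard treatment (notation ours): for $N$ observed outputs $\mathbf{y}$ and a new input $x_{N+1}$,

$$C_{N+1} = \begin{pmatrix} C_N & \mathbf{k} \\ \mathbf{k}^\top & c \end{pmatrix}, \qquad m(x_{N+1}) = \mathbf{k}^\top C_N^{-1} \mathbf{y}, \qquad \sigma^2(x_{N+1}) = c - \mathbf{k}^\top C_N^{-1} \mathbf{k},$$

where $C_N = K + \sigma^2 I$, $\mathbf{k}_i = k(x_i, x_{N+1})$, and $c = k(x_{N+1}, x_{N+1}) + \sigma^2$.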
Gaussian processes // Regression: example (a sequence of figure slides).
Gaussian processes // Kernel parameters
The covariance matrix must be positive semidefinite.
In general, the covariance function has parameters to be learned: for example, the signal variance and the scale (or correlation) length.
How to learn them:
ML approach: maximize the marginal likelihood p(y | X, θ);
MAP approach: maximize p(y | X, θ) p(θ);
fully Bayesian approach: intractable, requires approximations.
A reconstruction of the ML objective follows.
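A reconstruction of the ML objective (the slide's formula was an image): writing $K_y = K + \sigma^2 I$, the log marginal likelihood is

$$\log p(\mathbf{y} \mid X, \theta) = -\tfrac{1}{2} \mathbf{y}^\top K_y^{-1} \mathbf{y} - \tfrac{1}{2} \log |K_y| - \tfrac{n}{2} \log 2\pi,$$

maximized over the kernel parameters $\theta$, typically by gradient ascent.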
Gaussian processes // Kernel parameters (figure slides illustrating the effect of the parameters).