
Corso di Sistemi di Elaborazione dell'Informazione (Information Processing Systems course)

Prof. Giuseppe Boccignone

Dipartimento di Informatica, Università di Milano

[email protected]

boccignone.di.unimi.it/FM_2016.html

Scuola di Specializzazione in Fisica Medica (Postgraduate School in Medical Physics)

Regression: Gaussian Processes and Machine Learning

Supervised learning // Regression

Gaussian processes: taking any finite number of random variables from the ensemble that forms the random process itself, they have a joint (multivariate) Gaussian probability distribution

More precisely: a process is Gaussian if, for any choice of times $t_1, \ldots, t_n$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ has a multivariate normal distribution

Stochastic processes // Classes of processes

Chapter 3

Gaussian Processes and Stochastic Differential Equations

3.1 Gaussian Processes

Gaussian processes are a reasonably large class of processes that have many applications, including in finance, time-series analysis and machine learning. Certain classes of Gaussian processes can also be thought of as spatial processes. We will use these as a vehicle to start considering more general spatial objects.

Definition 3.1.1 (Gaussian Process). A stochastic process $\{X_t\}_{t \geq 0}$ is Gaussian if, for any choice of times $t_1, \ldots, t_n$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ has a multivariate normal distribution.

3.1.1 The Multivariate Normal Distribution

Because Gaussian processes are defined in terms of the multivariate normal distribution, we will need to have a pretty good understanding of this distribution and its properties.

Definition 3.1.2 (Multivariate Normal Distribution). A vector $X = (X_1, \ldots, X_n)$ is said to be multivariate normal (multivariate Gaussian) if all linear combinations of $X$, i.e. all random variables of the form

$\sum_{k=1}^{n} \alpha_k X_k,$

have univariate normal distributions.



Simulating Multivariate Normals

One possible way to simulate a multivariate normal random vector is using the Cholesky decomposition.

Lemma 3.1.11. If $Z \sim N(0, I)$, $\Sigma = AA^\top$ is a covariance matrix, and $X = \mu + AZ$, then $X \sim N(\mu, \Sigma)$.

Proof. We know from Lemma 3.1.4 that $AZ$ is multivariate normal and so is $\mu + AZ$. Because a multivariate normal random vector is completely described by its mean vector and covariance matrix, we simply need to find $\mathbb{E}X$ and $\mathrm{Var}(X)$. Now,

$\mathbb{E}X = \mathbb{E}\mu + \mathbb{E}AZ = \mu + A\mathbf{0} = \mu$

and

$\mathrm{Var}(X) = \mathrm{Var}(\mu + AZ) = \mathrm{Var}(AZ) = A\,\mathrm{Var}(Z)\,A^\top = AIA^\top = AA^\top = \Sigma.$

Thus, we can simulate a random vector $X \sim N(\mu, \Sigma)$ by

(i) finding $A$ such that $\Sigma = AA^\top$,

(ii) generating $Z \sim N(0, I)$,

(iii) returning $X = \mu + AZ$.

Matlab makes things a bit confusing because its function chol produces a decomposition of the form $\Sigma = B^\top B$. That is, $A = B^\top$. It is important to be careful and think about whether you want to generate a row or column vector when generating multivariate normals.

The following code produces column vectors.

Listing 3.1: Matlab code

mu = [1; 2; 3];                 % mean as a column vector
Sigma = [3 1 2; 1 2 1; 2 1 5];
A = chol(Sigma)';               % lower-triangular factor: A*A' = Sigma
X = mu + A * randn(3,1);        % column vector X ~ N(mu, Sigma)

The following code produces row vectors.

Listing 3.2: Matlab code

mu = [1 2 3];                   % mean as a row vector
Sigma = [3 1 2; 1 2 1; 2 1 5];
A = chol(Sigma);                % upper-triangular factor: A'*A = Sigma
X = mu + randn(1,3) * A;        % row vector X ~ N(mu, Sigma)
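As a quick sanity check (added here, not part of the original notes), the sample moments of many row-vector draws should approach mu and Sigma; a minimal sketch reusing the variables of Listing 3.2:

N = 1e5;
Xs = repmat(mu, N, 1) + randn(N,3) * A;  % N draws, one per row
disp(mean(Xs))   % should be close to mu
disp(cov(Xs))    % should be close to Sigma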


If $A$ is also symmetric, then $A$ is called symmetric positive semi-definite (SPSD). Note that if $A$ is SPSD then there is a real-valued decomposition

$A = LL^\top,$

though it is not necessarily unique and $L$ may have zeros on the diagonal.

Lemma 3.1.9. Covariance matrices are SPSD.

Proof. Given an $n \times 1$ real-valued vector $x$ and a random vector $Y$ with covariance matrix $\Sigma$, we have

$\mathrm{Var}(x^\top Y) = x^\top \mathrm{Var}(Y)\,x.$

Now this must be non-negative (as it is a variance). That is, it must be the case that

$x^\top \mathrm{Var}(Y)\,x \geq 0,$

so $\Sigma$ is positive semi-definite. Symmetry comes from the fact that $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$.

Lemma 3.1.10. SPSD matrices are covariance matrices.

Proof. Let $A$ be an SPSD matrix and $Z$ be a vector of random variables with $\mathrm{Var}(Z) = I$. Now, as $A$ is SPSD, $A = LL^\top$. So,

$\mathrm{Var}(LZ) = L\,\mathrm{Var}(Z)\,L^\top = LIL^\top = A,$

so $A$ is the covariance matrix of $LZ$.


Densities of Multivariate Normals

If $X = (X_1, \ldots, X_n)$ is $N(\mu, \Sigma)$ and $\Sigma$ is positive definite, then $X$ has the density

$f(x) = f(x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left\{ -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}.$
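As an illustration (added here), the density can be evaluated by transcribing the formula directly; with the Statistics Toolbox, mvnpdf should give the same value:

mu = [1; 2; 3];
Sigma = [3 1 2; 1 2 1; 2 1 5];
x = [0; 0; 0];
d = x - mu;
n = length(mu);
f = exp(-0.5 * d' * (Sigma \ d)) / sqrt((2*pi)^n * det(Sigma));
% f agrees with mvnpdf(x', mu', Sigma) (Statistics Toolbox)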


Gaussian Processes (GP) and Machine Learning

Gaussian processes: taking any finite number of random variables from the collection that forms the random process itself, they have a joint (multivariate) Gaussian probability distribution

A Gaussian process is completely specified by its mean and covariance matrix


Stochastic processes // Simulating Gaussian processes


The complexity of the Cholesky decomposition of an arbitrary real-valued matrix is $O(n^3)$ floating point operations.

3.1.2 Simulating a Gaussian Process: Version 1

Because Gaussian processes are defined to be processes where, for any choice of times $t_1, \ldots, t_n$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ has a multivariate normal density, and a multivariate normal density is completely described by a mean vector $\mu$ and a covariance matrix $\Sigma$, the probability distribution of a Gaussian process is completely described if we have a way to construct the mean vector and covariance matrix for an arbitrary choice of $t_1, \ldots, t_n$.

We can do this using an expectation function

$\mu(t) = \mathbb{E}X_t$

and a covariance function

$r(s, t) = \mathrm{Cov}(X_s, X_t).$

Using these, we can simulate the values of a Gaussian process at fixed times $t_1, \ldots, t_n$ by calculating $\mu$, where $\mu_i = \mu(t_i)$, and $\Sigma$, where $\Sigma_{i,j} = r(t_i, t_j)$, and then simulating a multivariate normal vector.

In the following examples, we simulate the Gaussian processes at evenly spaced times $t_1 = h, t_2 = 2h, \ldots, t_n = 1$.

Example 3.1.12 (Brownian Motion). Brownian motion is a very important stochastic process which is, in some sense, the continuous time, continuous state space analogue of a simple random walk. It has an expectation function $\mu(t) = 0$ and a covariance function $r(s, t) = \min(s, t)$.
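A one-line consequence of this covariance function (added for clarity, anticipating the comparison with fractional Brownian motion below): for $s \le t$,

$\mathrm{Cov}(X_s, X_t - X_s) = \mathrm{Cov}(X_s, X_t) - \mathrm{Var}(X_s) = \min(s, t) - s = 0,$

so Brownian increments are uncorrelated and, being jointly Gaussian, independent.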

Listing 3.3: Matlab code

% use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);

% Make the mean vector
mu = zeros(1,n);
for i = 1:n
    mu(i) = 0;
end

% Make the covariance matrix
Sigma = zeros(n,n);



for i = 1:n
    for j = 1:n
        Sigma(i,j) = min(t(i),t(j));
    end
end

% Generate the multivariate normal vector
A = chol(Sigma);
X = mu + randn(1,n) * A;

% Plot
plot(t,X);

[Figure 3.1.1: Plot of Brownian motion on [0, 1]; t on the horizontal axis, X_t on the vertical axis.]

Example 3.1.13 (Ornstein-Uhlenbeck Process). Another very important Gaussian process is the Ornstein-Uhlenbeck process. This has expectation function $\mu(t) = 0$ and covariance function $r(s, t) = e^{-\alpha |s - t| / 2}$.

Listing 3.4: Matlab code

% use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);
% parameter of OU process
alpha = 10;



% Make the mean vector
mu = zeros(1,n);
for i = 1:n
    mu(i) = 0;
end

% Make the covariance matrix
Sigma = zeros(n,n);
for i = 1:n
    for j = 1:n
        Sigma(i,j) = exp(-alpha * abs(t(i) - t(j)) / 2);
    end
end

% Generate the multivariate normal vector
A = chol(Sigma);
X = mu + randn(1,n) * A;

% Plot
plot(t,X);

[Figure 3.1.2: Plot of an Ornstein-Uhlenbeck process on [0, 1] with α = 10; t on the horizontal axis, X_t on the vertical axis.]

Example 3.1.14 (Fractional Brownian Motion). Fractional Brownian motion (fBm) is a generalisation of Brownian motion. It has expectation function $\mu(t) = 0$ and covariance function $r(s, t) = \frac{1}{2}\left(t^{2H} + s^{2H} - |t - s|^{2H}\right)$.

Simulating an Ornstein-Uhlenbeck (O-U) process:

Stochastic processes // Simulating Gaussian processes



[Figure 3.1.3: Plot of an Ornstein-Uhlenbeck process on [0, 1] with α = 100; t on the horizontal axis, X_t on the vertical axis.]

Here $H \in (0, 1)$ is called the Hurst parameter. When $H = 1/2$, fBm reduces to standard Brownian motion. Brownian motion has independent increments. In contrast, for $H > 1/2$ fBm has positively correlated increments and for $H < 1/2$ fBm has negatively correlated increments.
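A short check (added for clarity) follows from the covariance function: for $s \le t$,

$\mathbb{E}\left[(X_t - X_s)^2\right] = r(t, t) + r(s, s) - 2\,r(s, t) = |t - s|^{2H},$

which for $H = 1/2$ recovers the Brownian increment variance $|t - s|$.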

Listing 3.5: Matlab code

% Hurst parameter
H = .9;

% use mesh size h
m = 1000; h = 1/m; t = h : h : 1;
n = length(t);

% Make the mean vector
mu = zeros(1,n);
for i = 1:n
    mu(i) = 0;
end

% Make the covariance matrix
Sigma = zeros(n,n);
for i = 1:n
    for j = 1:n
        Sigma(i,j) = 1/2 * (t(i)^(2*H) + t(j)^(2*H)...
            - (abs(t(i) - t(j)))^(2 * H));
    end
end

% Generate the multivariate normal vector and plot
% (restored as in Listings 3.3 and 3.4; the original
% listing is truncated at this point)
A = chol(Sigma);
X = mu + randn(1,n) * A;
plot(t,X);


Gaussian Processes (GP) and Machine Learning

Problem for machine learning: learning the parameters of the covariance function

Further reading

Many more topics and code: http://www.gaussianprocess.org/gpml/

MCMC inference for GPs is implemented in FBM: http://www.cs.toronto.edu/~radford/fbm.software.html

Gaussian processes for ordinal regression, Chu and Ghahramani, JMLR, 6:1019-1041, 2005.

Flexible and efficient Gaussian process models for machine learning, Edward L. Snelson, PhD thesis, UCL, 2007.

http://www.gaussianprocess.org/gpml/chapters/

http://www.gaussianprocess.org/gpml/code/matlab/doc/index.html

Gaussian processes: an intuitive presentation // basic ideas

Suppose we have 6 noisy points: we want to predict at x* = 0.2

Gaussian processes // basic ideas

We do not assume a parametric model, for example:

We let the data speak for themselves

We assume the data were generated by an n-variate Gaussian distribution

Gaussian processes // basic ideas

We assume the data were generated by an n-variate Gaussian distribution with:

mean 0

covariance function k(x, x')

Properties of k(x, x'):

maximal when x is close to x'

small when x is far from x'

for example:

x close to x'

implies y correlated with y'
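To make this concrete, a minimal sketch (added here; it assumes the squared-exponential kernel with bandwidth tau used in the worked example later in these slides):

tau = 1.0;
k = @(x,z) exp(-(x-z)'*(x-z) / (2*tau^2));
k(0.10, 0.15)   % x close to x': covariance near 1
k(0.10, 3.00)   % x far from x': covariance near 0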

Gaussian processes // basic ideas

We assume the data were generated by an n-variate Gaussian distribution with additive Gaussian noise

We incorporate the noise into the kernel

We want to predict y*, not the true (latent) function value f*

the plotted point is the mean of y* and f*

same mean, different covariance

Gaussian processes // basic ideas

We compute the covariances between all possible combinations of points:

we obtain three matrices (cf. K_train_train, K_test_train and K_test_test in the code later in these slides)

Gaussian processes // basic ideas

Regression with GPs

We assume the data were generated by an n-variate Gaussian distribution with additive Gaussian noise

We want to predict:

Using the properties of Gaussians:

Prediction

Uncertainty
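For reference (added here; this is the standard Gaussian-conditioning result, written to match the Matlab example later in these slides, where $K$ is the train-train kernel matrix, $K_*$ the test-train block, $K_{**}$ the test-test block and $\sigma^2$ the noise variance):

$\mu_* = K_* \left(K + \sigma^2 I\right)^{-1} \mathbf{y}, \qquad \Sigma_* = K_{**} + \sigma^2 I - K_* \left(K + \sigma^2 I\right)^{-1} K_*^\top.$

The first expression is the prediction, the second its uncertainty.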

Gaussian processes // basic ideas

Regression with GPs for the point y*, with:

In the example, estimating the noise level from the error bars, and with an appropriate choice of the kernel parameters

Gaussian processes // basic ideas

Regression with GPs for 1000 points

Gaussian processes // basic ideas

We can incorporate the physics of the process into the kernel function

Gaussian processes // basic ideas

Example

Definition of the kernel function k(x, z)

Creation of a data set

% kernel definition and bandwidth
tau = 1.0;
k = @(x,z) exp(-(x-z)'*(x-z) / (2*tau^2));

% create a training set corrupted by noise
m_train = 10;
i_train = floor(rand(m_train,1) * size(X,1) + 1);
X_train = X(i_train);
y_train = h(i_train) + sigma * randn(m_train,1);
% h is the true (latent) function

Example

Prediction:

X_test = X;

K_test_test = compute_kernel_matrix(k,X_test,X_test);
K_test_train = compute_kernel_matrix(k,X_test,X_train);
K_train_train = compute_kernel_matrix(k,X_train,X_train);

% regularised train-train kernel matrix
G = K_train_train + sigma^2 * eye(size(X_train,1));

% predictive mean and covariance
mu_test = K_test_train * (G \ y_train);
Sigma_test = K_test_test + sigma^2 * eye(size(X_test,1)) ...
    - K_test_train * (G \ (K_test_train'));

function [K] = compute_kernel_matrix(k, X, Z)
    m = size(X,1);
    n = size(Z,1);
    K = zeros(m,n);
    for i = 1:m
        for j = 1:n
            K(i,j) = k(X(i,:)', Z(j,:)');
        end
    end
end
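A minimal end-to-end driver for this example (added here; the input grid X, the latent function h and the noise level sigma are assumptions, since the slides define them only in a figure):

% hypothetical setup for the slides' example
X = linspace(-5, 5, 200)';   % input grid (one point per row)
h = sin(X);                  % assumed true (latent) function
sigma = 0.1;                 % assumed noise standard deviation

% run the kernel definition, training-set creation and
% prediction code above, then visualise the fit:
plot(X, h, '-', X_train, y_train, 'o', X_test, mu_test, '--');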

Great! But...

what have we actually done, and how does it connect to the Bayesian approach?

how do we make an appropriate choice of the parameters?

Let us go back to linear regression

Gaussian processes // basic ideas

The Gaussian prior probability on the weights is equivalent to a Gaussian prior on the functions y

Design matrix / jointly Normal distribution

It can be shown that:

Gram matrix

A Gaussian prior on the functions y

Mean and variance are enough to specify it

Zero mean: the variance alone is enough to specify it

The data are enough to specify the variance

Gaussian processes // conceptual foundations

(Joint) Gaussian probability on the points of the function

Gaussian prior probability on the function

The parameters w have disappeared!

f is a random variable

Gaussian processes // conceptual foundations

function [f] = sample_gp_prior(k, X)
    K = compute_kernel_matrix(k, X, X);
    % sample from a zero-mean Gaussian distribution
    % with covariance matrix K
    [U,S,V] = svd(K);
    A = U*sqrt(S)*V';
    f = A * randn(size(X,1),1);
end
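A short usage sketch (added here; the grid and kernel are the same assumptions as in the driver above, and compute_kernel_matrix from the earlier example is assumed to be on the path):

X = linspace(-5, 5, 200)';
tau = 1.0;
k = @(x,z) exp(-(x-z)'*(x-z) / (2*tau^2));
hold on
for s = 1:3
    f = sample_gp_prior(k, X);   % each call draws one random function
    plot(X, f);
end
hold off

Note that the function uses an SVD rather than chol: the SVD tolerates the numerically singular kernel matrices that arise on dense grids.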

Gaussian processes // conceptual foundations

When we sample from the Gaussian process, we sample a function

Gaussian process

The parameters w have disappeared!

Gaussian processes // conceptual foundations

That is: a probability distribution over functions?

So a function is a random variable?

Gaussian processes // conceptual foundations

Let's look at it this way: we sample from a 2D Gaussian

which we can plot in a horizontal manner, if we wish

[Plot: samples from the 2D Gaussian shown both as a scatter of (x1, x2) and as two side-by-side columns x1, x2 against a vertical axis from -4 to 4.]

and where we can display a 2-D point using both plots

Gaussian processes // conceptual foundations

... and we join the two coordinates with a straight line

Gaussian processes // conceptual foundations

Now from a 3D Gaussian

but we're not restricted to bivariate Gaussians

Let's introduce some extra dimensions to our multivariate Gaussian distribution (without plotting it!)

Perhaps unsurprisingly, as we generate points from our new trivariate distribution, we can equivalently show these on our three-column horizontal plot

[Plot: samples shown as three columns x1, x2, x3 against a vertical axis from -4 to 4.]


Gaussian processes // conceptual foundations

Now from a 5D Gaussian

Why stop at a trivariate Gaussian distribution? We can add as many dimensions to our multivariate Gaussian as we wish

[Plot: samples shown as five columns x1, ..., x5 against a vertical axis from -4 to 4.]


Gaussian processes // conceptual foundations

Now three samples from a 25D Gaussian // Building large Gaussians

Three draws from a 25D Gaussian:

[Plot: three draws from a 25D Gaussian, f plotted against x.]

To produce this, we needed a mean: I used zeros(25,1)

The covariances were set using a kernel function: $\Sigma_{ij} = k(x_i, x_j)$.

The x's are the positions where I planted the tick marks on the axis.

Later we'll find k's that ensure $\Sigma$ is always positive semi-definite.
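A sketch reproducing this construction (added here; the zero mean and $\Sigma_{ij} = k(x_i, x_j)$ are stated above, while the squared-exponential kernel and the jitter term are assumptions):

x = linspace(0, 1, 25)';       % the 25 tick positions
mu = zeros(25,1);              % the stated mean
tau = 0.2;                     % assumed bandwidth
Sigma = zeros(25,25);
for i = 1:25
    for j = 1:25
        Sigma(i,j) = exp(-(x(i) - x(j))^2 / (2*tau^2));
    end
end
Sigma = Sigma + 1e-10 * eye(25);   % jitter so that chol succeeds
A = chol(Sigma);
F = repmat(mu, 1, 3) + A' * randn(25, 3);   % three draws (columns)
plot(x, F, '.-');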

Gaussian process

The parameters w have disappeared!

f is a random variable

Gaussian processes // conceptual foundations

Gaussian distribution vs. Gaussian process

A Gaussian process defines a distribution over functions p(f), where f can be regarded as a vector of infinite dimensionality (a stochastic process)

Definition:

p(f) is a Gaussian process if, for every finite subset of its values, the distribution over that finite subset (an n-dimensional vector) is a multivariate Gaussian

Gaussian processes // formal definition

What does the kernel tell us?

The kernel function, that is, the covariance, tells us how strongly two points are correlated

Gaussian processes // What the kernel tells us

Gaussian process

Gaussian processes // conceptual foundations

Gaussian process, using different kernels...

Gaussian processes // conceptual foundations

Gaussian processes // Graphical model

n = 1, ..., N

kernel parameters

Gaussian processes // Regression

The model, assuming Gaussian noise:

The joint distribution over the outputs and the latent function values:

Prediction for a new datum:

The joint distribution over the observed outputs and the new output:

We partition the covariance matrix as:

Gaussian processes // Regression

Exploiting the properties of jointly Gaussian variables expressed in matrix form, we obtain that the prediction can be specified as:
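The underlying identity (added here for reference; it is the standard conditional of a partitioned multivariate Gaussian, with blocks labelled to match the regression setting): if

$\begin{pmatrix} \mathbf{y} \\ y_* \end{pmatrix} \sim N\left( \mathbf{0}, \begin{pmatrix} K + \sigma^2 I & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} + \sigma^2 \end{pmatrix} \right),$

then

$y_* \mid \mathbf{y} \sim N\left( \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{y},\; k_{**} + \sigma^2 - \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{k}_* \right).$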

Processi Gaussiani //Regressione

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

Processi Gaussiani //Regressione: esempio

The covariance matrix must be positive semi-definite

In general, the covariance function has parameters to be learned

Gaussian processes // Kernel parameters

signal variance


length scale (correlation length)

ML approach: one maximizes the marginal likelihood of the data given the kernel parameters (a sketch of the objective follows below)

MAP approach: one maximizes the posterior over the kernel parameters

fully Bayesian approach: not tractable; approximations are required
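For reference (added here; this is the standard log marginal likelihood of GP regression, with $K_\theta$ the kernel matrix built from parameters $\theta$ and $\sigma^2$ the noise variance):

$\log p(\mathbf{y} \mid X, \theta) = -\frac{1}{2}\, \mathbf{y}^\top (K_\theta + \sigma^2 I)^{-1} \mathbf{y} - \frac{1}{2} \log\left| K_\theta + \sigma^2 I \right| - \frac{n}{2} \log 2\pi,$

which the ML approach maximizes over $\theta$ (and typically $\sigma^2$ as well).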

Gaussian processes // Kernel parameters
