a. proof of claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. ·...

9
A Framework for Bayesian Optimization in Embedded Subspaces A. Proof of Claim 5 Proof of Claim 5. First note that z [0, 1] directly implies z 1+δ z z 1-δ which proves both lower bounds. It remains to prove the upper bounds of δ. We study the mapping f : [0, 1] R,z 7z - z 1+δ . Clearly f (0) = f (1) = 0 δ. Setting the derivative to zero and solving for z we have f 0 (z)=1 - (1 + δ)z δ =0 z = 1 1+ δ 1 δ . Note that 1 1+δ 1 δ 1, thus f (z) 1 1+ δ 1 δ - 1 1+ δ 1+δ δ = 1 1+ δ 1 δ 1 - 1 1+ δ 1+ δ - 1 1+ δ = δ 1+ δ δ. Very similar arguments prove the claim for the mapping f : [0, 1] R,z 7z 1-δ - z. Again, f (0) = f (1) = 0 δ. Setting the derivative to zero and solving for z yields f 0 (z) = (1 - δ)z -δ - 1=0 z = 1 1 - δ - 1 δ . Note that 1 1-δ - 1 δ = (1 - δ) 1 δ e -1 , thus f (z) 1 1 - δ - 1-δ δ - 1 1 - δ - 1 δ = 1 1 - δ - 1 δ 1 1 - δ - 1 1 - 1+ δ 1 - δ e -1 2δe -1 δ. B. The Dilation Issue in REMBO In their seminal work proposing the REMBO variants, Wang et al. (2016b) noted that their random Gaussian embedding A has a dilation that necessitates to scale REMBO’s search space to [- d, + d] d to ensure the optimal remains in the search domain. However, REMBO might select a point y such that the random embedding Ay is outside the actual box X . In such a case it projects Ay onto the boundary of X before evaluating f . This may lead to over-exploration of the boundary regions, which leads to particularly bad performance when optimal point lies in the interior. (Bi- nois et al., 2015) proposed a new warping kernel k ψ to address this issue by an improved preservation of distances among such points. This method was further improved in (Binois et al., 2019). The analysis in Sect. C below shows that REMBO applies these complex corrections frequently. In contrast, the count-sketch avoids such complex correc- tions. By construction of the inverse count-sketch S the columns of the matrix are orthogonal, since each row has only a single non-zero entry. Moreover, for any fixed co- ordinate (dimension) d i of the low-dimensional space we have that the number of coordinates in the high dimen- sional space that map to d i is concentrated at D/d, i.e., this number has mean D/d and a variance that drops (quickly) as D increases. The reason is that h picks the target of each dimension uniformly at random. The dilation of S is thus roughly p D/d, and the search domain Y =[-1, 1] d is mapped to SY = {Sy | y ∈ Y} ⊂ X =[-1, 1] D with low distortion (1 ± ε) as given in Definition 1. More- over, note that the back-projection is realized by solely copying the coordinates and multiplying them by {-1, 1}, which assert that no point is projected outside the domain X . Thus, HeSBO avoids complex corrections in contrast to the REMBO variants described above. C. Analysis of Correction Efforts in REMBO We examine how often the REMBO variants choose points to sample whose projection to the high-dimensional space is outside of the search domain X . Note that these points are corrected by projecting them back onto the boundary X . Table 1 shows the results on Branin with input D = 100 and three different choices for the target dimension d. We see that for larger values of d almost all points chosen by REMBO are projected outside of X . Note that all three algorithms use Table 1. The fraction of steps of the three REMBO variants that project the sample point outside X . The data was collected by running the algorithms on Branin with input dimension D = 100 over 30 replications. d =2 d =3 d =4 ky 0.95 ± 0.006 0.993 ± 0.0008 0.9999 ± 0.0001 kx 0.93 ± 0.009 0.992 ± 0.001 0.9996 ± 0.0002 k ψ 0.88 ± 0.02 0.992 ± 0.003 0.9997 ± 0.0002 the same random projection in each iteration, their decisions what points to sample are affected by the respective kernel. In particular, we see that k X performs better than k y as intended by Wang et al. (2016b), whereas k ψ of Binois et al. (2015) performs best among these. Note that the approach proposed in this paper completely avoids the need for corrections.

Upload: others

Post on 13-Aug-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

A. Proof of Claim 5Proof of Claim 5. First note that z ∈ [0, 1] directly impliesz1+δ ≤ z ≤ z1−δ which proves both lower bounds. Itremains to prove the upper bounds of δ. We study themapping f : [0, 1] → R, z 7→ z − z1+δ. Clearly f(0) =f(1) = 0 ≤ δ. Setting the derivative to zero and solving forz we have

f ′(z) = 1− (1 + δ)zδ = 0⇔ z =

(1

1 + δ

) 1δ

.

Note that(

11+δ

) 1δ ≤ 1, thus

f(z) ≤(

1

1 + δ

) 1δ

−(

1

1 + δ

) 1+δδ

=

(1

1 + δ

) 1δ(

1− 1

1 + δ

)≤ 1 + δ − 1

1 + δ=

δ

1 + δ≤ δ.

Very similar arguments prove the claim for the mappingf : [0, 1]→ R, z 7→ z1−δ − z. Again, f(0) = f(1) = 0 ≤δ. Setting the derivative to zero and solving for z yields

f ′(z) = (1− δ)z−δ − 1 = 0⇔ z =

(1

1− δ

)− 1δ

.

Note that(

11−δ

)− 1δ

= (1− δ) 1δ ≤ e−1, thus

f(z) ≤(

1

1− δ

)− 1−δδ

−(

1

1− δ

)− 1δ

=

(1

1− δ

)− 1δ(

1

1− δ − 1

)≤ 1− 1 + δ

1− δ e−1 ≤ 2δe−1 ≤ δ.

B. The Dilation Issue in REMBOIn their seminal work proposing the REMBO variants, Wanget al. (2016b) noted that their random Gaussian embeddingA has a dilation that necessitates to scale REMBO’s searchspace to [−

√d,+√d]d to ensure the optimal remains in the

search domain. However, REMBO might select a point ysuch that the random embedding Ay is outside the actualbox X . In such a case it projects Ay onto the boundary ofX before evaluating f . This may lead to over-explorationof the boundary regions, which leads to particularly badperformance when optimal point lies in the interior. (Bi-nois et al., 2015) proposed a new warping kernel kψ toaddress this issue by an improved preservation of distances

among such points. This method was further improved in(Binois et al., 2019). The analysis in Sect. C below showsthat REMBO applies these complex corrections frequently.In contrast, the count-sketch avoids such complex correc-tions. By construction of the inverse count-sketch S thecolumns of the matrix are orthogonal, since each row hasonly a single non-zero entry. Moreover, for any fixed co-ordinate (dimension) di of the low-dimensional space wehave that the number of coordinates in the high dimen-sional space that map to di is concentrated at D/d, i.e., thisnumber has mean D/d and a variance that drops (quickly)as D increases. The reason is that h picks the target ofeach dimension uniformly at random. The dilation of S isthus roughly

√D/d, and the search domain Y = [−1, 1]d

is mapped to SY = {Sy | y ∈ Y} ⊂ X = [−1, 1]D

with low distortion (1± ε) as given in Definition 1. More-over, note that the back-projection is realized by solelycopying the coordinates and multiplying them by {−1, 1},which assert that no point is projected outside the domain X .Thus, HeSBO avoids complex corrections in contrast to theREMBO variants described above.

C. Analysis of Correction Efforts in REMBOWe examine how often the REMBO variants choose points tosample whose projection to the high-dimensional space isoutside of the search domain X . Note that these points arecorrected by projecting them back onto the boundary ∂X .Table 1 shows the results on Branin with inputD = 100 andthree different choices for the target dimension d. We seethat for larger values of d almost all points chosen by REMBOare projected outside ofX . Note that all three algorithms use

Table 1. The fraction of steps of the three REMBO variants thatproject the sample point outside X . The data was collected byrunning the algorithms on Branin with input dimension D = 100over 30 replications.

d = 2 d = 3 d = 4

ky 0.95± 0.006 0.993± 0.0008 0.9999± 0.0001kx 0.93± 0.009 0.992± 0.001 0.9996± 0.0002kψ 0.88± 0.02 0.992± 0.003 0.9997± 0.0002

the same random projection in each iteration, their decisionswhat points to sample are affected by the respective kernel.In particular, we see that kX performs better than ky asintended by Wang et al. (2016b), whereas kψ of Binoiset al. (2015) performs best among these. Note that theapproach proposed in this paper completely avoids the needfor corrections.

Page 2: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

D. Details on ExperimentsWe provide details needed to replicate the experiments inTable 2. All REMBO variants and HeSBO-EI use identicalinitial datasets in every replication to initialize the surrogatemodels, obtained via random Latin hypercube designs. Forthe other algorithms we used their preferred strategies to ob-tain the same number of initial data points. We implementedHeSBO-EI and all REMBO variants in Python 2.7 us-ing GPy. They use a GP model with the 5

2 -Matern kerneland constant mean function. The acquisition function wasoptimized by L-BFGS-B leveraging its gradient (Frean &Boyle, 2008).

For the other algorithms we used the reference implemen-tations provided by the authors, see the hyperlinks in thereferences. ADD was provided by Gardner in personal com-munication. For the neural network parameter search bench-mark discussed in Sect. 3, we had to omit ADD because ofsoftware incompatibilities with this benchmark and SKLdue to its high running time.

Table 2. The fraction of steps of the three REMBO variants thatproject the sample point outside X . The data was collected byrunning the algorithms on Branin with input dimension D = 100over 30 replications.

Feature Choice

number of initial data points 10sampling approach Latin hypercube design

provided by pyDOEkernel Matern 5/2- ARDSearch space [−1, 1]dMethod of GP fitting GPy package (1.9.2)Number of replications 100Reported statistic MedianError bars Standard error of medianProcessor Xeon Haswell E5-2695 Dual

2.3 GHz 14-coreNumber of CPUs 1

E. Additional Plots for all ConductedExperiments

This section provides the plots for all experiments that weconducted as part of the evaluation. This includes plotsalready shown in Sect. 3 and additional plots that wereomitted due to space constraints. The plots are organizedaccording to the same subsections.

E.1. Performance Evaluation

The performances on Branin as a function of the evaluationsare given in Fig. 5.

The performances on Hartmann6 as a function of the evalu-

ations are given in Fig. 6.

The performances on Rosenbrock as a function of the evalu-ations are given in Fig. 7.

The performances on Styblinski-Tang as a function of theevaluations are given in Fig. 8.

E.2. Robustness

The robustness plots on Branin and Hartmann6 are given inFig. 9

E.3. Scalability

The performances on Branin, Rosenbrock and Styblinski-Tang as a function of the running time are given in Fig. 10.

E.4. High Dimensional Network Architecture Search

The performances on neural network optimization as a func-tion of the evaluations are given in Fig. 11.

Page 3: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

BRANIND = 25 D = 100 D = 1000

0 20 40 60 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

25 dim Branin

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 10 20 30 40 50 60 70 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

100 dim Branin

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

1000 dim Branin

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

0

2

4

6

8

10

12

14

Func

tion

valu

es

25 dim Branin

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 10 20 30 40 50 60 70 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

100 dim Branin

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

1000 dim Branin

Rembo-Kx

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

25 dim Branin

Rembo-Kx

BOCKEBOADDHeSBO-EIHeSBO-KGHeSBO-BM

0 10 20 30 40 50 60 70 80Number of evaluations

0

1

2

3

4

5

Func

tion

valu

es

100 dim Branin

Rembo-Kx

BOCKEBOADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Func

tion

valu

es

1000 dim Branin

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

Figure 5. Performance plots for Branin as a function of the evaluations for the projection based algorithms (top), the state-of-the-artalgorithms (middle row) and a close-up to the best algorithms (bottom) for different input dimensions D = 25 (left), D = 100 (middlecolumn), and D = 1000 (right).

Page 4: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

HARTMANN6

0 20 40 60 80Number of evaluations

−2.00

−1.75

−1.50

−1.25

−1.00

−0.75

−0.50

−0.25

0.00

Func

tion

valu

es

25 dim Hartmann6

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

−3.5

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es

25 dim Hartmann6

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

−2.00

−1.75

−1.50

−1.25

−1.00

−0.75

−0.50

−0.25

0.00

Func

tion

valu

es

100 dim Hartmann6

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

−3.5

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es100 dim Hartmann6

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es

1000 dim Hartmann6

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es

1000 dim Hartmann6

Rembo-Kx

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

Figure 6. Performance plots for Hartmann-6 as a function of the evaluations for the projection based algorithms (left) and the state-of-the-artalgorithms (right) for different input dimensions D = 25 (top), D = 100 (middle), and D = 1000 (bottom).

Page 5: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

ROSENBROCKD = 25 D = 100 D = 1000

0 20 40 60 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

25 dim Rosenbrock

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0Fu

nctio

nva

lues

100 dim Rosenbrock

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

1000 dim Rosenbrock

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

25 dim Rosenbrock

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 10 20 30 40 50 60 70Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

100 dim Rosenbrock

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Func

tion

valu

es

1000 dim Rosenbrock

Rembo-Kx

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

Func

tion

valu

es

25 dim Rosenbrock

BOCKADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

Func

tion

valu

es

100 dim Rosenbrock

BOCKADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

Func

tion

valu

es

1000 dim Rosenbrock

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

Figure 7. Performance plots for Rosenbrock as a function of the evaluations for the projection based algorithms (top), the state-of-the-artalgorithms (middle row) and a close-up to the best algorithms (bottom) for different input dimensions D = 25 (left), D = 100 (middlecolumn), and D = 1000 (right).

Page 6: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

STYBLINSKI-TANGD = 25 D = 100 D = 1000

0 20 40 60 80Number of evaluations

−500

0

500

1000

1500

2000

2500

Func

tion

valu

es

25 dim Styblinski-Tang

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

2000

4000

6000

8000

10000

12000

14000Fu

nctio

nva

lues

100 dim Styblinski-Tang

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 10 20 30 40 50 60 70 80Number of evaluations

20000

40000

60000

80000

100000

120000

140000

Func

tion

valu

es

1000 dim Styblinski-Tang

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80Number of evaluations

−500

0

500

1000

1500

2000

2500

Func

tion

valu

es

25 dim Styblinski-Tang

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

2000

4000

6000

8000

10000

12000

14000

Func

tion

valu

es

100 dim Styblinski-Tang

Rembo-Kx

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 10 20 30 40 50 60 70 80Number of evaluations

20000

40000

60000

80000

100000

120000

140000

Func

tion

valu

es

1000 dim Styblinski-Tang

Rembo-Kx

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

−700

−600

−500

−400

−300

−200

−100

0

Func

tion

valu

es

25 dim Styblinski-Tang

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 20 40 60 80Number of evaluations

1000

1500

2000

2500

3000

3500

4000

Func

tion

valu

es

100 dim Styblinski-Tang

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 10 20 30 40 50 60 70 80Number of evaluations

15000

20000

25000

30000

35000

40000

Func

tion

valu

es

1000 dim Styblinski-Tang

BOCKHeSBO-EIHeSBO-KGHeSBO-BM

Figure 8. Performance plots for Styblinski-Tang as a function of the evaluations for the projection based algorithms (top), the state-of-the-art algorithms (middle row) and a close-up to the best algorithms (bottom) for different input dimensions D = 25 (left), D = 100(middle column), and D = 1000 (right).

Page 7: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

0 20 40 60 80 100Number of evaluations

0

1

2

3

4

5

6

Func

tion

valu

es

25 dim Branin

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

0 20 40 60 80 100Number of evaluations

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es

25 dim Hartmann6

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

0 20 40 60 80 100Number of evaluations

0

1

2

3

4

5

6

Func

tion

valu

es

100 dim Branin

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

0 20 40 60 80 100Number of evaluations

−2.5

−2.0

−1.5

−1.0

−0.5

0.0Fu

nctio

nva

lues

100 dim Hartmann6

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

0 20 40 60 80 100Number of evaluations

0

1

2

3

4

5

6

Func

tion

valu

es

1000 dim Branin

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

0 20 40 60 80 100Number of evaluations

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

Func

tion

valu

es

1000 dim Hartmann6

Rembo-Kx (d=2)Rembo-Kx (d=4)Rembo-Kx (d=8)

HeSBO-EI (d=2)HeSBO-EI (d=4)HeSBO-EI (d=8)

Figure 9. The robustness analysis for Branin (left) and Hartmann6 (right) for D = 25 (top), D = 100 (middle), D = 1000 (bottom) anddifferent values of d.

Page 8: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

ALL BEST

0 5 10 15 20 25 30 35Time (seconds)

0

5

10

15

20

25

Func

tion

valu

es

100 dim Branin

Rembo-Ky

Rembo-Kx

Rembo-KψBOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 5 10 15 20 25 30 35Time (seconds)

0

1

2

3

4

5

6

Func

tion

valu

es

100 dim Branin

Rembo-Ky

Rembo-Kx

Rembo-KψEBOADDHeSBO-EIHeSBO-KGHeSBO-BM

0 5 10 15 20 25 30 35Time (seconds)

0

20

40

60

80

100

120

Func

tion

valu

es

100 dim Rosenbrock

Rembo-Ky

Rembo-Kx

Rembo-KψBOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 5 10 15 20 25 30 35Time (seconds)

0

2

4

6

8

10Fu

nctio

nva

lues

100 dim Rosenbrock

BOCKEBOADDHeSBO-EIHeSBO-KGHeSBO-BM

0 5 10 15 20 25 30 35Time (seconds)

−2000

0

2000

4000

6000

8000

Func

tion

valu

es

100 dim StybTang

Rembo-Ky

Rembo-Kx

Rembo-KψBOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

0 5 10 15 20 25 30 35Time (seconds)

−2500

−2000

−1500

−1000

−500

0

Func

tion

valu

es

100 dim StybTang

BOCKEBOSKLADDHeSBO-EIHeSBO-KGHeSBO-BM

Figure 10. The scalability analysis for all algorithms (left) and a close-up on the best (right) with input dimension D = 100 each anddifferent target dimensions for d: Branin d = 4 (top), Rosenbrock d = 4 (mid) and Styblinski-Tang d = 12 (bottom). Note that in theplot on the bottom right, ADD’s performance equals EBO on the Styblinski-Tang d = 12, that is why it is not visible.

Page 9: A. Proof of Claim5proceedings.mlr.press/v97/nayebi19a/nayebi19a-supp.pdf · 2021. 1. 14. · HeSBO-EI 0 20 40 60 80 Number of evaluations 0 2 4 6 8 10 12 14 Function values 25 dim

A Framework for Bayesian Optimization in Embedded Subspaces

0 20 40 60 80 100Number of evaluations

0.32

0.34

0.36

0.38

0.40

0.42

0.44

Func

tion

valu

es100 dim MNIST

Rembo-Ky

Rembo-Kx

Rembo-KψHeSBO-EI

0 20 40 60 80 100Number of evaluations

0.32

0.34

0.36

0.38

0.40

0.42

0.44

Func

tion

valu

es

100 dim MNIST

Rembo-Kx

BOCKEBOHeSBO-EIHeSBO-KGHeSBO-BM

Figure 11. The performance comparison of projection based algorithms (top) and state-of-the-art algorithms (bottom) on the MNISTneural network benchmark with input dimension D = 100 and target dimension d = 12.