Euclidean Distance Geometry

Leo Liberti

CNRS LIX Ecole Polytechnique, France

IEOR, Columbia University, November 2015

[L., Lavor: Introduction to Euclidean Distance Geometry, in preparation]


Table of contents

1. Applications

2. Definition

3. Complexity primer

4. Complexity of the DGP

5. Number of solutions

6. Mathematical optimization formulations

7. Realizing complete graphs

8. The Branch-and-Prune algorithm

9. Symmetry in the KDMDGP

10. Tractability of protein instances

11. Finding vertex orders

12. Approximate realizations


Applications


Clock Synchronization

[Figure: Alice (A), Bob (B), Chuck (C) and an atomic clock (S) showing 16:27, linked by edges labelled 5, 7, 3, 4; below, a timeline from 16:20 to 16:31 with the clocks marked in the order A, C, S, B.]

[Singer, 2011]


Sensor network localization

[Figure: a sensor network drawn as a graph, with numbered sensors and edges labelled "index / distance" (e.g. 0 / 2.50781), followed by an arrow (−→) to the localized network.]

[Yemini, 1978]


Protein conformation from NMR data

[Figure: a graph on vertices 1, . . . , 11, followed by an arrow (−→) to a protein conformation.]

[Crippen & Havel 1988]


Clock synchronization: solutions

|xA − xB| = 7

|xA − xC| = 3

|xA − xS| = 5

|xB − xC| = 4

xS = 16:27

[Figure: graph on xA, xB, xC, xS with edge labels 5, 7, 3, 4.]

Candidate solutions (tree rooted at xS):

xS = 16:27
  xA = 16:22
    xB = 16:15, xC = 16:19
    xB = 16:15, xC = 16:25
    xB = 16:29, xC = 16:19
    xB = 16:29, xC = 16:25
  xA = 16:32
    xB = 16:25, xC = 16:29
    xB = 16:25, xC = 16:35
    xB = 16:39, xC = 16:29
    xB = 16:39, xC = 16:35
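The candidate tree above can be explored by brute force. The following is a minimal sketch, assuming times are stored as minutes since midnight and that the helper names (`fmt`, `clock_solutions`) are illustrative only; it tries every sign choice and keeps the branches that satisfy |xB − xC| = 4.

```python
from itertools import product

# Offsets in minutes, as on the slide: |xA-xS|=5, |xA-xB|=7, |xA-xC|=3, |xB-xC|=4.
X_S = 16 * 60 + 27  # the atomic clock reads 16:27 (minutes since midnight)

def fmt(minutes):
    """Format minutes since midnight as hh:mm."""
    return f"{minutes // 60}:{minutes % 60:02d}"

def clock_solutions():
    """Enumerate all (xA, xB, xC) consistent with the four measured offsets."""
    solutions = []
    for sa, sb, sc in product((-1, 1), repeat=3):
        x_a = X_S + sa * 5        # |xA - xS| = 5
        x_b = x_a + sb * 7        # |xA - xB| = 7
        x_c = x_a + sc * 3        # |xA - xC| = 3
        if abs(x_b - x_c) == 4:   # prune branches violating |xB - xC| = 4
            solutions.append((x_a, x_b, x_c))
    return sorted(solutions)

for x_a, x_b, x_c in clock_solutions():
    print(f"xA={fmt(x_a)}  xB={fmt(x_b)}  xC={fmt(x_c)}")
```

Of the eight leaves of the tree, four survive the last constraint, two for each value of xA.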


Definition


Distance Geometry Problem (DGP)

Given:

– a simple graph G = (V,E)

– an edge function d : E → R≥0

– an integer K ∈ N

Determine whether ∃ a realization x : V → R^K s.t.

∀{u, v} ∈ E  ‖x_u − x_v‖_2 = d_uv

[Figure: example graph on vertices 1, 2, 3, 4 with edge weights 1 and √3.]

Let n = |V|
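Checking a candidate realization against the definition is straightforward. Below is a minimal sketch, assuming edges are stored in a dictionary and that a small floating-point tolerance `tol` is acceptable; the function name and the triangle instance are hypothetical and not taken from the slides.

```python
import math

def is_realization(edges, x, tol=1e-9):
    """Return True if ||x_u - x_v||_2 = d_uv holds (up to tol) for every edge.

    edges: dict mapping a pair (u, v) to the required distance d_uv
    x:     dict mapping each vertex to its coordinates in R^K (a tuple)
    """
    return all(
        abs(math.dist(x[u], x[v]) - d) <= tol
        for (u, v), d in edges.items()
    )

# Hypothetical instance with K = 2: a unit equilateral triangle.
triangle = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
good = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.5, math.sqrt(3) / 2)}
bad = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.5, 0.5)}

print(is_realization(triangle, good), is_realization(triangle, bad))
```

Verifying a given x is easy; the DGP asks whether any such x exists.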


More applications

• Autonomous underwater vehicles [Bahr et al. 2009]

• Statics of rigid structures [Maxwell 1864]

• Matrix completion [Laurent 2009]

• Statistics [Boer 2013]

• Psychology [Kruskal 1964]

[Liberti et al., SIREV 2014]


Complexity primer


Definitions

• Decision problem: mathematical YES/NO-type question depending on a parameter vector π

• Instance: same as above with π replaced by given values v

• Certificate: proof that a given answer is true

• P: all decision problems solvable in at most p(|π|) steps, where p is a polynomial

• NP: all decision problems with |YES certificate| ≤ p(|π|), where p is a polynomial, and the certificate can be checked in polynomial time


Reductions

• P, Q: decision problems

• If ∃ algorithm A which:

1. maps each instance p of P to an instance A(p) of Q

2. has answer(p) = YES iff answer(A(p)) = YES

3. runs in time polynomial in the instance size |p|

then A is a reduction of P to Q


NP-hardness

• Q is NP-hard if every problem in NP reduces to Q

• Q is NP-complete if it is NP-hard and is in NP

Why does it work?

$\text{any } P \in \mathbf{NP} \;\xrightarrow{\text{polytime reduction}}\; Q$: how hard?

• Suppose Q easier than P

• Solve P by reducing to Q in polytime and then solve Q

• Then P as easy as Q, against assumption

• ⇒ Q at least as hard as P

So if Q is NP-hard it is at least as hard as any problem in NP

⇒ Q is at least as hard as the hardest problem in NP


NP-hardness proofs

Given a new problem Q, take any known NP-hard problem P and reduce it to Q

Why does it work?

$P\text{: NP-hard} \;\xrightarrow{\text{polytime reduction}}\; Q$: how hard?

• As before: Suppose . . . (etc.) ⇒ Q at least as hard as P

• Since P is NP-hard, it is at least as hard as every problem in NP, and hence so is Q

⇒ Q is NP-hard


Complexity of the DGP


DGP ∈ NP?

• NP: YES/NO problems with polytime-checkable proofs for YES

• DGP is a YES/NO problem

• DGP_1 ∈ NP, since d_uv = |x_u − x_v| ⇒ (d ∈ ℚ → x ∈ ℚ)

• Solutions might involve irrational numbers when K > 1

• Some empirical evidence that DGP ∉ NP [Beeker et al. 2013]


The DGP is NP-hard

Partition: given a = (a_1, . . . , a_n) ∈ N^n, ∃ I ⊆ {1, . . . , n} s.t. $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$ ?

• Reduce (NP-hard) Partition to DGP_1

• a −→ cycle C with V(C) = {1, . . . , n}, E(C) = {{1,2}, . . . , {n,1}}

• For i < n let d_{i,i+1} = a_i, and d_{n,n+1} = d_{n1} = a_n

• E.g. for a = (1,4,1,3,3), get cycle graph:

[Figure: cycle on vertices 1, 2, 3, 4, 5 with edge weights 1, 4, 1, 3, 3.]

[Saxe, 1979]


Partition is YES ⇒ DGP_1 is YES

• Given: I ⊂ {1, . . . , n} s.t. $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$

• Construct: realization x of C in R

1. x_1 = 0 // start

2. induction step: suppose x_i known
   if i ∈ I
       let x_{i+1} = x_i + d_{i,i+1} // go right
   else
       let x_{i+1} = x_i − d_{i,i+1} // go left

• Correctness proof: by the same induction, but careful when i = n: have to show x_{n+1} = x_1
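The construction can be sketched in a few lines. This is an illustrative sketch only: the function name `realize_cycle` and the 1-based index convention for I are assumptions; the instance a = (1,4,1,3,3) with I = {1,2,3} is the one used in these slides.

```python
def realize_cycle(a, I):
    """Build x_1, ..., x_{n+1} on the line from a Partition certificate I.

    a: list of positive integers (a[0] is a_1); I: set of 1-based indices.
    Returns [x_1, ..., x_{n+1}]; the cycle closes iff x_{n+1} == x_1.
    """
    x = [0]  # x_1 = 0
    for i in range(1, len(a) + 1):
        step = a[i - 1]  # d_{i,i+1} = a_i
        # go right if i is in I, go left otherwise
        x.append(x[-1] + step if i in I else x[-1] - step)
    return x

x = realize_cycle([1, 4, 1, 3, 3], {1, 2, 3})
print(x)  # [0, 1, 5, 6, 3, 0]
```

The last entry equals the first, so the edge {n, 1} is realized with the right length.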


I = {1,2,3}

[Figure, built up over six slides: on a number line from 0 to 6, x_1, . . . , x_5 are placed in turn; steps 1 ∈ I, 2 ∈ I, 3 ∈ I go right and steps 4 ∉ I, 5 ∉ I go left, ending with the question x_6 = x_1?]

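Partition is YES ⇒ DGP_1 is YES

$$(1) = \sum_{i \in I} (x_{i+1} - x_i) = \sum_{i \in I} d_{i,i+1} = \sum_{i \in I} a_i = \sum_{i \notin I} a_i = \sum_{i \notin I} d_{i,i+1} = \sum_{i \notin I} (x_i - x_{i+1}) = (2)$$

$$(1) = (2) \;\Rightarrow\; \sum_{i \in I} (x_{i+1} - x_i) = \sum_{i \notin I} (x_i - x_{i+1}) \;\Rightarrow\; \sum_{i \le n} (x_{i+1} - x_i) = 0$$

$$\Rightarrow\; (x_{n+1} - x_n) + (x_n - x_{n-1}) + \cdots + (x_3 - x_2) + (x_2 - x_1) = 0$$

$$\Rightarrow\; x_{n+1} = x_1$$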


Partition is NO ⇒ DGP_1 is NO

• By contradiction: suppose DGP_1 is YES, x realization of C

• F = {{u, v} ∈ E(C) | x_u ≤ x_v}, E(C) ∖ F = {{u, v} ∈ E(C) | x_u > x_v}

• Trace x_1, . . . , x_n: follow edges in F (→) and in E(C) ∖ F (←)

[Figure: x_1, . . . , x_5 on a number line from −3 to 3.]

$$\sum_{\{u,v\} \in F} (x_v - x_u) = \sum_{\{u,v\} \notin F} (x_u - x_v)$$

$$\sum_{\{u,v\} \in F} |x_u - x_v| = \sum_{\{u,v\} \notin F} |x_u - x_v|$$

$$\sum_{\{u,v\} \in F} d_{uv} = \sum_{\{u,v\} \notin F} d_{uv}$$

• Let J = {i < n | {i, i+1} ∈ F} ∪ {n | {n,1} ∈ F}

$$\Rightarrow\; \sum_{i \in J} a_i = \sum_{i \notin J} a_i$$

• So J solves Partition instance, contradiction

• ⇒ DGP is NP-hard, DGP_1 is NP-complete
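The converse direction can be sketched the same way. The sketch below assumes that each edge {i, i+1} (and {n, 1}) is read in the order in which the cycle is traversed, so that membership in F means "does not move left"; the function name is hypothetical.

```python
def partition_from_realization(x):
    """Recover J from a realization x = [x_1, ..., x_n] of the cycle C on the line.

    Edge i is {i, i+1} (and {n, 1} for i = n), read in traversal order;
    i is put into J when the traversal does not move left (x_i <= x_{i+1}).
    """
    n = len(x)
    return {i for i in range(1, n + 1) if x[i - 1] <= x[i % n]}

a = [1, 4, 1, 3, 3]
x = [0, 1, 5, 6, 3]                  # a realization of the cycle built from a
J = partition_from_realization(x)
left = sum(a[i - 1] for i in J)      # total weight of the rightward edges
right = sum(a) - left                # total weight of the leftward edges
print(J, left, right)
```

Because the walk returns to x_1, rightward and leftward lengths balance, which is exactly the Partition condition.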


Number of solutions


With congruences

• (G,K): DGP instance

• X̃ ⊆ R^{Kn}: set of solutions

• Congruence: composition of translations, rotations, reflections

• C = set of congruences in R^K

• x ∼ y means ∃ρ ∈ C (y = ρx): distances in x are preserved in y through ρ

• ⇒ if |X̃| > 0, |X̃| = 2^{ℵ0}


Modulo congruences

• Congruence is an equivalence relation ∼ on the set X̃ of all solutions (reflexive, symmetric, transitive)

• Partitions X̃ into equivalence classes

• X = X̃/∼: set of representatives of equivalence classes

• Focus on |X| rather than |X̃|


Cardinality of X

• infeasible ⇔ |X| = 0

• rigid graph ⇔ |X| < ℵ0

• globally rigid graph ⇔ |X| = 1

• flexible graph ⇔ |X| = 2^{ℵ0}

• |X| = ℵ0: impossible by Milnor’s theorem


Milnor’s theorem implies |X| ≠ ℵ0

• System S of polynomial equations of degree 2

∀i ≤ m  p_i(x_1, . . . , x_{nK}) = 0

• Let X̃ be the set of x ∈ R^{nK} satisfying S

• Number of connected components of X̃ is O(3^{nK}) [Milnor 1964]

• If |X| is countable then G cannot be flexible

⇒ incongruent elements of X̃ are separate connected components

⇒ by Milnor’s theorem, there are finitely many of them


Examples

Example 1: V^1 = {1,2,3}, E^1 = {{u, v} | u < v}, d^1 = 1

[Figure: triangle x_1, x_2, x_3.]

ρ congruence in R^2 ⇒ ρx valid realization

|X| = 1

Example 2: V^2 = V^1 ∪ {4}, E^2 = E^1 ∪ {{1,4}, {2,4}}, d^2 = 1 ∧ d_14 = √2

[Figure: x_1, x_2, x_3, x_4.]

ρ reflects x_4 wrt x_1, x_2 ⇒ ρx valid realization

|X| = 2

Example 3: V^3 = V^2, E^3 = {{u, u+1} | u ≤ 3} ∪ {{1,4}}, d^3 = 1

[Figure: x_1, x_2, x_3, x_4.]

ρ rotates x_2x_3, x_1x_4 by θ ⇒ ρx valid realization

|X| is uncountable


Mathematical optimization formulations


System of quadratic constraints

∀{u, v} ∈ E  ‖x_u − x_v‖² = d_uv²

• Solvable in practice only up to around 10 vertices

• Computationally useless


Quadratic objective

$$\min_{x \in \mathbb{R}^{nK}} \sum_{\{u,v\} \in E} \left( \|x_u - x_v\|^2 - d_{uv}^2 \right)^2$$

• Globally optimal value zero iff x is a realization of G

• sBB: 10-100 vertices, exact solutions

• heuristics: 100-1000 vertices, poor quality

[Lavor et al., 2006]
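To make the objective concrete, here is a minimal sketch that only evaluates the penalty function (it does not minimize it). The function name and the unit-square instance are assumptions made for illustration.

```python
def penalty(edges, x):
    """Sum over edges of (||x_u - x_v||^2 - d_uv^2)^2."""
    total = 0.0
    for (u, v), d in edges.items():
        sq_dist = sum((p - q) ** 2 for p, q in zip(x[u], x[v]))
        total += (sq_dist - d ** 2) ** 2
    return total

# Hypothetical instance with K = 2: unit square plus one diagonal of length sqrt(2).
edges = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0, (1, 3): 2 ** 0.5}
square = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
squashed = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 0.5), 4: (0.0, 0.5)}

print(penalty(edges, square))    # zero up to rounding: a realization
print(penalty(edges, squashed))  # strictly positive: not a realization
```

Any global or local solver can then be pointed at `penalty`; the slide's remarks on sBB and heuristics concern how well that works at scale.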


Convexity and concavity

$$\max_{x \in \mathbb{R}^{nK}} \sum_{\{u,v\} \in E} \|x_u - x_v\|^2$$

$$\forall \{u,v\} \in E \quad \|x_u - x_v\|^2 \le d_{uv}^2$$

• Convex constraints, concave objective

• Computationally no better than “quadratic objective”


Pointwise reformulation

$$\max_{x \in \mathbb{R}^{nK}} \sum_{\{u,v\} \in E,\; k \le K} \theta_{uvk} (x_{uk} - x_{vk})$$

$$\forall \{u,v\} \in E \quad \|x_u - x_v\|^2 \le d_{uv}^2$$

• Convex subproblem in stochastic iterative heuristics: “guess θ and solve”

• 100-1000 vertices, good quality

[L. IOS14/MAGO14(slides)]


SDP formulation

$$\min_{X \succeq 0} \sum_{\{u,v\} \in E} (X_{uu} + X_{vv} - 2X_{uv})$$

$$\forall \{u,v\} \in E \quad X_{uu} + X_{vv} - 2X_{uv} \ge d_{uv}^2$$

• Similar to those of Ye, Wolkowicz — works better for proteins

• 100 vertices, good quality

[D’Ambrosio et al., in progress]
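The term X_uu + X_vv − 2X_uv is the squared distance ‖x_u − x_v‖² written through the Gram matrix X = (x_u · x_v). The sketch below only checks that identity on hypothetical points; it does not solve the SDP, and the function names are illustrative.

```python
def gram(x):
    """Gram matrix X with X[u][v] = x_u . x_v for a list of points x."""
    return [[sum(p * q for p, q in zip(xu, xv)) for xv in x] for xu in x]

def sq_dist_from_gram(X, u, v):
    """Squared distance ||x_u - x_v||^2 written as X_uu + X_vv - 2 X_uv."""
    return X[u][u] + X[v][v] - 2 * X[u][v]

# Hypothetical points in R^2 (0-based indices); integers keep the arithmetic exact.
points = [(0, 0), (3, 0), (3, 4)]
X = gram(points)
print(sq_dist_from_gram(X, 0, 2))  # 25, i.e. distance 5
```

In the SDP, X is the decision variable and the constraint X ⪰ 0 replaces the requirement that X be a Gram matrix of points in R^K.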


Realizing complete graphs


Cliques

[Figure: a 2-clique on vertices 1, 2; a 3-clique on vertices 1, 2, 3; a 4-clique on vertices 1, 2, 3, 4.]

(K+1)-clique = K-clique ⊕ a vertex

Given a realization of the K-clique, find the position of the vertex


Trilateration

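[Figure: triangle on vertices 1, 2, 3 with edge lengths 1, 1, 2 −→ vertices 1, 2, 3 placed on a line at 0, 1, 2.]

Example: realize triangle on a line

• From ‖x_3 − x_1‖ = 2 and ‖x_3 − x_2‖ = 1 get

x_3² − 2x_1x_3 + x_1² = 4   (1)

x_3² − 2x_2x_3 + x_2² = 1   (2)

• (2) − (1) yields

2x_3(x_1 − x_2) = x_1² − x_2² − 3

⇒ with x_1 = 0 and x_2 = 1: 2x_3 = 4

• Hence x_3 = 2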


Realizing a (K+1)-clique in R^{K−1}

• Apply trilateration inductively on K: assume x_1, . . . , x_K ∈ R^{K−1} known, compute y = x_{K+1}

• K quadratic eqns (∀j ≤ K  ‖y − x_j‖² = d_{j,K+1}²) in K − 1 vars

$$\begin{aligned}
\|y\|^2 - 2x_1 \cdot y + \|x_1\|^2 &= d_{1,K+1}^2 && [1] \\
&\;\;\vdots \\
\|y\|^2 - 2x_K \cdot y + \|x_K\|^2 &= d_{K,K+1}^2 && [K]
\end{aligned}$$

• Form system ∀j ≤ K − 1 ([j] − [K])

$$\begin{aligned}
2(x_1 - x_K) \cdot y &= \|x_1\|^2 - \|x_K\|^2 - d_{1,K+1}^2 + d_{K,K+1}^2 && [1]-[K] \\
&\;\;\vdots \\
2(x_{K-1} - x_K) \cdot y &= \|x_{K-1}\|^2 - \|x_K\|^2 - d_{K-1,K+1}^2 + d_{K,K+1}^2 && [K-1]-[K]
\end{aligned}$$

• This is a (K − 1) × (K − 1) linear system Ay = b

Solve to find y

[Dong, Wu 2002]
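As a small worked instance of the linear system above, take K = 3 with made-up data (these coordinates and distances are illustrative and not from the slides): x_1 = (0,0), x_2 = (1,0), x_3 = (0,1) and squared distances d_{14}² = 2, d_{24}² = 1, d_{34}² = 1.

```latex
\begin{aligned}
[1]-[3]:\quad & 2(x_1 - x_3)\cdot y = \|x_1\|^2 - \|x_3\|^2 - d_{14}^2 + d_{34}^2
  \;\Longrightarrow\; (0,-2)\cdot y = 0 - 1 - 2 + 1 = -2 \\
[2]-[3]:\quad & 2(x_2 - x_3)\cdot y = \|x_2\|^2 - \|x_3\|^2 - d_{24}^2 + d_{34}^2
  \;\Longrightarrow\; (2,-2)\cdot y = 1 - 1 - 1 + 1 = 0 \\[4pt]
& A = \begin{pmatrix} 0 & -2 \\ 2 & -2 \end{pmatrix},\quad
  b = \begin{pmatrix} -2 \\ 0 \end{pmatrix}
  \;\Longrightarrow\; y = x_4 = (1,1) \\[4pt]
\text{check:}\quad & \|y - x_1\|^2 = 2,\quad \|y - x_2\|^2 = 1,\quad \|y - x_3\|^2 = 1
\end{aligned}
```

The final line substitutes y back into the original quadratic equations, since the linear system is only a necessary condition.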


“Solve”?

1. What if A is singular?

2. Or: A nonsingular but instance is NO


Singularity: rkA = K − 2

One row x_j − x_K of A depends on the others

K = 2: triangle in R^1; x_1 − x_2 = 0, i.e. x_1 = x_2; two candidate positions for x_3

K = 3: 4-clique in R^2; x_1, x_2, x_3 on a line; two candidate positions for x_4

K = 4: 5-clique in R^3; x_1, . . . , x_4 in a plane; two candidate positions for x_5

Trend continues: rk A = K − 2 ⇒ |X| = 2 (see later)


Singularity: rkA = K − 3

Two rows x_j − x_K depend on the others

K = 3: 4-clique in R^2; x_1 = x_2 = x_3; x_4 = ?

K = 4: 5-clique in R^3; x_1, . . . , x_4 on a line; x_5 = ?

Trend continues: [Hendrickson, 1992]

Thm. 5.8. If a graph G is connected, flexible and has more than K vertices, X almost always contains a submanifold diffeomorphic to a circle


Hendrickson’s theorem also applies to non-cliques


Nonsingular matrix A with NO instance

• Infeasible quadratic system ∀j ≤ K  ‖x_{K+1} − x_j‖² = d_{j,K+1}²

• Take differences, get nonsingular A and value for x_{K+1}

• . . . but it’s wrong!

Shit happens!

Every time you solve the linear system Ay = b, check feasibility with the quadratic system


Algorithm for realizing complete graphs in R^K

• Assume:

(i) G = (V,E) complete

(ii) |V| = n ≥ K+2

(iii) we know x_1, . . . , x_{K+1}

• Increase K by one in the previous construction: we know how to realize x_{K+2} in R^K

• Use this inductively for each i ∈ {K+2, . . . , n}


Algorithm for realizing complete graphs in R^K

// realize next vertex iteratively
for i ∈ {K+2, . . . , n} do
    // use (K+1) immediate adjacent predecessors to compute x_i
    if rk A = K then
        x_i = A^{−1}b    // A, b defined as above
    else
        x_i = ∞    // A singular, mark ∞ and exit
        break
    end if
    // check that x_i is feasible w.r.t. other distances
    for {j ∈ N(i) | j < i} do
        if ‖x_i − x_j‖ ≠ d_ij then
            // if not, mark infeasible and exit loop
            x_i = ∅
            break
        end if
    end for
    if x_i = ∅ then
        break
    end if
end for
return x
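The pseudocode can be turned into a short runnable sketch. The version below is one possible reading, not the authors' implementation: it works with squared distances and exact `Fraction` arithmetic so that the equality test is meaningful, uses 0-based vertex indices, and returns `None` both for the singular case (∞) and the infeasible case (∅). All function names and the test instance are assumptions.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A y = b exactly (Gauss-Jordan); None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, b)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

def realize_complete(x, sq_d, K):
    """Extend a realization of the first K+1 vertices of a complete graph to all n.

    x:    list of the K+1 known points in R^K (vertex i is x[i], 0-based)
    sq_d: n x n symmetric matrix of squared distances d_ij^2
    Returns the list of n points, or None if A is singular or a distance fails.
    """
    n = len(sq_d)
    x = [tuple(Fraction(c) for c in p) for p in x]
    norm = lambda p: sum(c * c for c in p)
    for i in range(K + 1, n):
        prev = list(range(i - K - 1, i))   # the K+1 immediate predecessors
        last = prev[-1]
        A = [[2 * (a - c) for a, c in zip(x[j], x[last])] for j in prev[:-1]]
        b = [norm(x[j]) - norm(x[last]) - sq_d[j][i] + sq_d[last][i] for j in prev[:-1]]
        y = solve(A, b)
        if y is None:                      # A singular
            return None
        # check x_i against the distances to all earlier vertices
        if any(sum((yc - c) ** 2 for yc, c in zip(y, x[j])) != sq_d[j][i] for j in range(i)):
            return None
        x.append(tuple(y))
    return x

# Hypothetical instance with K = 2: distances generated from known points.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in pts] for p in pts]
print(realize_complete(pts[:3], sq, 2) == pts)  # True
```

With floating-point input the exact equality test would have to be replaced by a tolerance, which is a design choice left open here.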


Complexity of Alg. 1

• Outer loop: O(n)

• Rank and inverse of A: O(K^3)

• Inner loop: O(n)

• Get O(n^2 K^3)

• But in most applications K is fixed

• Get O(n^2)

But how do we find the realization of the first K+1 vertices?


Realizing (K+1)-cliques in R^K

• Realizing (K+1)-cliques in R^{K−1} yields “flat simplices” (e.g. triangles on lines)

• Use “natural” embedding dimension R^K

• Same reasoning as above: get system Ay = b where y = x_{K+1} and A_j = 2(x_j − x_K)

• But now A is (K − 1) × K

• Same as previous case with A singular


Almost square

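How can you solve the following system Ay = b:

$$\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1K} \\
\vdots & \vdots & \ddots & \vdots \\
a_{K-1,1} & a_{K-1,2} & \dots & a_{K-1,K}
\end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_K \end{pmatrix}
=
\begin{pmatrix} b_1 \\ \vdots \\ b_{K-1} \end{pmatrix}$$

where A has one more column than rows and rank K − 1?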


Basics and nonbasics

• Since rk A = K − 1, ∃ K − 1 linearly independent columns

• B: set of their indices

• N: index of remaining column

• The same letters denote the submatrices: B is the (K − 1) × (K − 1) square matrix of the columns indexed by B, and N is the remaining column

• ⇒ B is nonsingular

• Can partition columns as A = (B|N)

Column j corresponds to variable y_j

• Variables y_B are called basic variables

• Variable y_N is called nonbasic variable


The dictionary

(B|N)y = b

⇒ B y_B + N y_N = b

⇒ y_B = B^{−1}b − B^{−1}N y_N

Basics expressed as a function of the nonbasic


One quadratic equation

• From value of y_N, can use dictionary to get y_B

• Use one quadratic equation

1. Pick any h ∈ {1, . . . , K − 1}, equation is ‖x_h − y‖_2² = d_{h,K+1}²

2. y = (y_B | y_N)^⊤

3. Replace y_B with B^{−1}b − B^{−1}N y_N in equation

4. Solve resulting quadratic equation in one variable y_N

5. Get 0, 1 or 2 values for y_N

6. ⇒ Get 0, 1 or 2 positions for x_{K+1}
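For the planar case K = 2 (a triangle in R^2) the six steps fit in a few lines, because B is then a 1 × 1 matrix. The sketch below is restricted to that case and uses floating point; the function name, the choice h = 1, and the test points are assumptions. The quadratic is written as λt² − 2µt + ν = 0 in the nonbasic variable t = y_N.

```python
import math

def third_vertex(x1, x2, d13, d23):
    """Positions of y = x_3 in R^2 given x_1, x_2 and the distances d_13, d_23."""
    a = [2 * (x1[0] - x2[0]), 2 * (x1[1] - x2[1])]   # the single row of A
    b = (x1[0] ** 2 + x1[1] ** 2) - (x2[0] ** 2 + x2[1] ** 2) - d13 ** 2 + d23 ** 2
    if a == [0, 0]:
        raise ValueError("x1 == x2: A has rank 0, no dictionary exists")
    B = 0 if a[0] != 0 else 1                        # basic column
    N = 1 - B                                        # nonbasic column
    u, w = b / a[B], a[N] / a[B]                     # dictionary: y_B = u - w * y_N
    # substitute into ||x_1 - y||^2 = d13^2 (h = 1): lam*t^2 - 2*mu*t + nu = 0
    lam = 1 + w * w
    mu = x1[N] + w * (u - x1[B])
    nu = (x1[B] - u) ** 2 + x1[N] ** 2 - d13 ** 2
    disc = mu * mu - lam * nu
    if disc < 0:
        return []                                    # no real root: NO instance
    roots = {(mu + s * math.sqrt(disc)) / lam for s in (1, -1)}
    out = []
    for t in sorted(roots):
        y = [0.0, 0.0]
        y[N], y[B] = t, u - w * t                    # nonbasic value, then dictionary
        out.append(tuple(y))
    return out

print(third_vertex((0, 0), (1, 1), 1, 1))  # two mirror-image positions
```

The second distance d_23 enters only through b: once the linear equation and one quadratic equation hold, the other quadratic equation holds automatically.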


What if B^{−1}N is zero?

• y_B = B^{−1}b − B^{−1}N y_N reduces to y_B = B^{−1}b

• Use one quadratic equation

1. Pick any h ∈ {1, . . . , K − 1}, equation is ‖x_h − y‖_2² = d_{h,K+1}²

2. y = (y_B | y_N)^⊤

3. Replace y_B with B^{−1}b in equation

4. Solve resulting quadratic equation in one variable y_N

5. Get 0, 1 or 2 values for y_N

6. ⇒ Get 0, 1 or 2 positions for x_{K+1}


The difference

• $B^{-1}N \neq 0$: $y_N \xrightarrow{\text{dictionary}} y_B$

• Different values $y_N^+ \neq y_N^-$ $\longrightarrow$ $y^+, y^-$ with different components

• $B^{-1}N = 0$: $y_B \xrightarrow{\text{quadratic eqn.}} y_N$

• Even if $y_N^+ \neq y_N^-$, $K - 1$ components of $y^+, y^-$ are equal

$\mathrm{aff}(x_1, \dots, x_{K-1}) = \{y \in \mathbb{R}^K \mid y_N = 0\}$


The case of no solutions

• No realizations exist for this $(K+1)$-clique in $\mathbb{R}^K$

• DGP instance is NO


The case of one solution

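• Assume for simplicity: $N = K$, $h = 1$, $B^{-1}N \neq 0$. Then $\|x_h - y\|^2 = d_{h,K+1}^2$ becomes:

\[
\lambda y_K^2 - 2\mu y_K + \nu = 0, \quad \text{where}
\]

\[
\begin{aligned}
\lambda &= 1 + \sum_{\ell, j < K} \beta_{\ell j}^2 a_{jK}^2 \\
\mu &= x_{1K} + \sum_{\ell, j < K} \beta_{\ell j} a_{jK} (\beta_{\ell j} b_\ell - x_{1\ell}) \\
\nu &= \sum_{\ell, j < K} \beta_{\ell j} b_\ell (\beta_{\ell j} b_\ell - 2 x_{1\ell}) + \|x_1\|^2 - d_{1,K+1}^2
\end{aligned}
\]

• (Exactly one solution for $y_K$) ⇔ $\mu^2 = \lambda\nu$, not a tautology

• The set of all $(K+1)$-clique DGP instances in $\mathbb{R}^K$ s.t. $\mu^2 = \lambda\nu$ has Lebesgue measure 0

• Ignore them, they happen with probability∗ 0!

∗ Assuming continuous distributions over the reals. For floating point numbers, who knows? . . . but we’ll ignore these instances anyhow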


Discriminant > 0, = 0

[Figure: three drawings of a graph on vertices 1, 2, 3, 4]


The case of two solutions

• $K$ spheres $S_1^{K-1}, \dots, S_K^{K-1}$ in $\mathbb{R}^K$, centered at $x_1, \dots, x_K$ with radii $d_{1,K+1}, \dots, d_{K,K+1}$

• $x_{K+1}$ must be at the intersection of $S_1^{K-1}, \dots, S_K^{K-1}$

• If $\bigcap_j S_j^{K-1} \neq \emptyset$, then $|\bigcap_j S_j^{K-1}| = 2$ in general

• will not mention “probability 0” or “in general” anymore

[Coope 2000]


Mirror images

• Let $x^+ = \{x_1, \dots, x_K, x^+_{K+1}\}$, $x^- = \{x_1, \dots, x_K, x^-_{K+1}\}$; assume $\dim \mathrm{aff}(x_1, \dots, x_K) = K - 1$ (†)

• Theorem

$x^+, x^-$ are reflections w.r.t. the hyperplane defined by $x_1, \dots, x_K$

• Proof

1. $x^+, x^-$ congruent by construction

2. $\forall i \le K$: $x_i \in x^+ \cap x^-$ → $x^+, x^-$ not translations

3. $|x^+ \cap x^-| = K < |x^+| = |x^-|$ → $x^+, x^-$ not rotations by (†)

4. ⇒ must be reflections


Algorithm for realizing (K + 1)-cliques in $\mathbb{R}^K$

// realize 1 at the origin
x_1 = (0, . . . , 0)
// realize next vertex iteratively
for ℓ ∈ {2, . . . , K + 1} do
    // at most two positions in R^{ℓ−1} for vertex ℓ
    S = ⋂_{i<ℓ} S_i^{ℓ−2}
    if S = ∅ then
        // warn: infeasible
        return ∅
    end if
    // arbitrarily choose one of the two points
    choose any x_ℓ ∈ S
end for
// return feasible realization
return x
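A minimal runnable sketch of this loop follows. It is not the slides' implementation: instead of intersecting spheres through a general linear solve, it uses the fact that vertex ℓ can be placed in the coordinate subspace spanned by the first ℓ − 1 axes, so each new vertex needs only forward substitution plus one square root. The name `realize_clique` and the 0-based indexing are assumptions made for the example, and the "arbitrary choice" between the two points is fixed to the non-negative root.

```python
import math

def realize_clique(d, tol=1e-9):
    """Realize a (K+1)-clique in R^K from its (K+1) x (K+1) distance matrix d.

    Returns a list of K+1 points (lists of K coordinates), or None when no
    realization exists or the already placed points are affinely dependent.
    """
    n = len(d)
    K = n - 1
    x = [[0.0] * K for _ in range(n)]  # vertex 0 sits at the origin
    for l in range(1, n):
        # Coordinates 0..l-2 come from linear equations: subtracting the sphere
        # centred at x[0] from the one centred at x[i] gives
        #   x[i] . x[l] = (||x[i]||^2 - d[l][i]^2 + d[l][0]^2) / 2
        for i in range(1, l):
            rhs = (sum(c * c for c in x[i]) - d[l][i] ** 2 + d[l][0] ** 2) / 2.0
            known = sum(x[i][j] * x[l][j] for j in range(i - 1))
            if abs(x[i][i - 1]) < tol:
                return None  # degenerate configuration, not handled here
            x[l][i - 1] = (rhs - known) / x[i][i - 1]
        # The last coordinate comes from the sphere centred at the origin.
        rem = d[l][0] ** 2 - sum(c * c for c in x[l][:l - 1])
        if rem < -tol:
            return None  # empty intersection: infeasible
        x[l][l - 1] = math.sqrt(max(rem, 0.0))  # pick one of the two reflections
    return x
```

On the 3-4-5 right triangle, `realize_clique([[0, 3, 4], [3, 0, 5], [4, 5, 0]])` places the vertices at (0, 0), (3, 0) and (0, 4); distances violating the triangle inequality give `None`.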


Complexity of Alg. 2

• Outer loop: O(K)

• Gaussian elimination on $A$: $O(K^3)$

• Some messing about to obtain $x^+_{K+1}, x^-_{K+1}$: $+\,O(K^2)$

• Get $O(K^4)$

• But in most applications K is fixed

• Get O(1)


Back to complete graphs

• Alg. 2: realize $1, \dots, K+1$ in $\mathbb{R}^K$: $O(1)$

• Alg. 1: realize $K+2, \dots, n$: $O(n^2)$

• ⇒ $O(n^2)$

• What about $|X|$?

– Alg. 1 is deterministic: one solution from $x_1, \dots, x_{K+1}$

– Alg. 2 is stochastic: pick one of two values $K$ times

⇒ $|X| = 2^K$


Let’s look at sparser graphs


K-laterative graphs

• In Alg. 1 we only need each $v > K + 1$ to have $K + 1$ adjacent predecessors in order to find a unique solution for $x_v$

• Determination of $x_v$ from $K + 1$ adjacent predecessors: K-lateration

• K-laterative graph:

(i) has a vertex order ensuring this property

(ii) the initial $K + 1$ vertices induce a $(K+1)$-clique

the order is called K-lateration order

• Alg. 1 realizes all K-laterative graphs

The DGP restricted to K-laterative graphs in $\mathbb{R}^K$ is tractable

[Eren et al. 2004]
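To make K-lateration concrete, here is a small sketch for $K = 2$ (trilateration in the plane): with $K + 1 = 3$ adjacent predecessors, subtracting one circle equation from the other two leaves a square linear system with a unique solution. The function name `trilaterate_2d` is hypothetical, and the sketch does not verify that the given distances are mutually consistent.

```python
import math

def trilaterate_2d(anchors, radii, tol=1e-12):
    """Unique point at distances radii[i] from three non-collinear anchors[i]."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = radii
    # Circle i minus circle 1: 2 (p_i - p_1) . y = r1^2 - ri^2 + |p_i|^2 - |p_1|^2
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 + y2 ** 2 - x1 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 + y3 ** 2 - x1 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < tol:
        raise ValueError("anchors are collinear: no unique solution")
    # Cramer's rule on the 2 x 2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)
```

With anchors (0, 0), (4, 0), (0, 3) and the exact distances to the point (1, 1), the sketch recovers (1, 1) up to rounding.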


The story so far

• Lots of nice applications

• DGP is NP-hard

• May have 0, 1, finitely many or $2^{\aleph_0}$ solutions modulo congruences

• Continuous optimization techniques don’t scale well

• Using $K + 1$ adjacent predecessors, realize K-laterative graphs in $\mathbb{R}^K$ in polytime

• Do we need $K + 1$ adjacent predecessors, or can we do with fewer?


The Branch-and-Prune algorithm


Fewer adjacent predecessors

• Alg. 2 only needs $K$ adjacent predecessors

• Extend to $n$ vertices: $(K-1)$-laterative graphs

• Can we realize $(K-1)$-laterative graphs in $\mathbb{R}^K$?

• A small case: graph consisting of two $(K+1)$-cliques

[Figure: three drawings of the graph on vertices 1, 2, 3, 4, 5]


Take a closer look. . .

[Figure: vertices 2, 3, 4 and the two positions of vertex 5]

• Realization of a $(K+1)$-clique in $\mathbb{R}^K$ knowing $x_1, \dots, x_K$

• We know how to do that!

• Consistent with 2 solutions for $x_5$, reflected across the plane through $x_2, x_3, x_4$


Discretization and pruning edges

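• $(K-1)$-laterative graph $G = (V, E)$:

\[
\forall v > K \;\; \exists U_v \subset V \;\big(\, |U_v| = K \;\wedge\; \forall u \in U_v \,(u < v \wedge \{u, v\} \in E) \,\big)
\]

• Discretization edges:

\[
E_D = \underbrace{\{\{u, v\} \in E \mid u, v \le K\}}_{\text{initial clique}} \;\cup\; \underbrace{\{\{u, v\} \in E \mid v > K \wedge u \in U_v\}}_{\text{vertex order}}
\]

• Pruning edges $E_P = E \setminus E_D$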

[Figure: example graph on vertices 1, . . . , 19]


Role of discretization edges

Missing discretization edge

⇒ non-rigid structure

⇒ X uncountable

Else: X finite


Role of pruning edges

No pruning edges: 8 incongruent realizations in $\mathbb{R}^2$

[Figure: eight realizations of a graph on vertices 1, 2, 3, 4, 5]


Role of pruning edges

Pruning edge {1, 4}: only 4 realizations remain valid

[Figure: the eight realizations of the graph on vertices 1, 2, 3, 4, 5]


Motivation

Protein backbones

• Total order < on V

• Covalent bond distances: {u− 1, u} ∈ E

• Covalent bond angles: {u− 2, u} ∈ E

• NMR experiments: {u− 3, u} ∈ E

(and other edges {u, v} with v − u > 3)

Generalize “3” to K

[Lavor et al., COAP 2012]


KDMDGP graphs

[Figure: two example graphs on vertices 1, . . . , 5, panels K = 2 and K = 3]

Generalization of protein backbone order:

v > K is adjacent to K immediate predecessors v − 1, . . . , v −K

KDMDGP: Discretizable Molecular Distance Geometry Problem
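As a small illustration of this definition, the sketch below checks that the natural order 1, . . . , n is a KDMDGP order (initial K-clique, then each vertex adjacent to its K immediate predecessors) and splits the edge set into discretization and pruning edges. The function name `kdmdgp_split` and the edge representation as integer pairs are assumptions made for the example.

```python
def kdmdgp_split(n, edges, K):
    """Split edges of a graph on vertices 1..n into (E_D, E_P).

    Raises ValueError if the order 1..n is not a KDMDGP order, i.e. if some
    vertex v is not adjacent to all of v-1, ..., max(1, v-K).
    """
    E = {(min(u, v), max(u, v)) for u, v in edges}
    # Edges required by the order: for v <= K this is the initial clique,
    # for v > K these are the K immediate predecessors of v.
    needed = {(u, v) for v in range(2, n + 1) for u in range(max(1, v - K), v)}
    missing = needed - E
    if missing:
        raise ValueError(f"missing discretization edges: {sorted(missing)}")
    return needed, E - needed
```

For example, with n = 5, K = 2 and the edges {1,2}, {1,3}, {2,3}, {2,4}, {3,4}, {3,5}, {4,5}, {1,4}, the only pruning edge is {1,4}.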


The Branch-and-Prune (BP) algorithm

BP(v, x, X):

1. Given v > K, realization x = (x1, . . . , xv−1)

2. Compute $S = \bigcap_{u \in U_v} S_u^{K-1}$

3. For each xv ∈ S s.t. ∀{u, v} ∈ EP (u < v → ‖xu − xv‖ = duv)

(a) let x = (x, xv)

(b) if v = n add x to X, else call BP(v +1, x, X)

• Recursive: starts with BP(K + 1, (x1, . . . , xK), ∅)

• All realizations in X are incongruent∗

• Can be easily modified to find only p solutions for given p

• Applies to all $(K-1)$-laterative graphs in $\mathbb{R}^K$

• Specialize to KDMDGP graph by setting Uv = {v − 1, . . . , v −K}

∗ with probability 1, and aside from one reflection at v = K +1

[L. et al. ITOR 2008]
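The recursion can be sketched compactly for the special case $K = 2$ with a KDMDGP order, where $S$ is the intersection of two circles. This is an illustrative reduction, not the authors' code: the names `circle_intersections` and `branch_and_prune`, the distance dictionary keyed by pairs `(u, v)` with `u < v`, and the tolerances are all assumptions, and the first two vertices are fixed on the x-axis to remove translations and rotations.

```python
import math

def circle_intersections(c1, r1, c2, r2, tol=1e-9):
    """Return the 0, 1 or 2 intersection points of two circles in the plane."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    dist = math.hypot(dx, dy)
    if dist < tol or dist > r1 + r2 + tol or dist < abs(r1 - r2) - tol:
        return []
    a = (r1 * r1 - r2 * r2 + dist * dist) / (2 * dist)
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))
    mx, my = c1[0] + a * dx / dist, c1[1] + a * dy / dist
    if h < tol:
        return [(mx, my)]
    ox, oy = -dy * h / dist, dx * h / dist
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

def branch_and_prune(n, dist, tol=1e-6):
    """All realizations in R^2 of a graph on vertices 1..n (K = 2, U_v = {v-1, v-2})."""
    def d(u, v):
        return dist[(min(u, v), max(u, v))]

    x = {1: (0.0, 0.0), 2: (d(1, 2), 0.0)}  # initial K-clique
    X = []

    def bp(v):
        # Step 2: S is the intersection of the circles around x[v-1] and x[v-2].
        for p in circle_intersections(x[v - 1], d(v - 1, v), x[v - 2], d(v - 2, v)):
            # Step 3: keep p only if it satisfies every pruning edge {u, v}.
            pruned = any(abs(math.dist(x[u], p) - duv) > tol
                         for (u, w), duv in dist.items() if w == v and u < v - 2)
            if pruned:
                continue
            x[v] = p
            if v == n:
                X.append([x[i] for i in range(1, n + 1)])
            else:
                bp(v + 1)

    bp(3)
    return X
```

On the unit square with vertices ordered around its boundary, the discretization edges alone give 4 realizations, and adding the pruning edge {1, 4} leaves 2 (the two remaining ones being mirror images, as in the footnote above).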


Complexity of BP

• Most operations are $O(K^h)$ for some fixed $h$ ⇒ $O(1)$

• Distance check at Step 3: $O(n)$

• Recursion on at most 2 branches at each call: binary tree

• Only recurse when $v > K$, $v < n$: $2^{n-K}$ nodes

• Overall $O(n\,2^{n-K}) = O(2^n)$

Worst-case exponential behaviour


Hardness of KDMDGP

• The KDMDGP is NP-hard for each K

– every DGP instance is also DMDGP if K = 1

– reduction from Partition can be extended to any K

• (K − 1)-lateration graphs are NP-hard by inclusion

• No polytime algorithm unless P=NP

Trilaterative graphs in $\mathbb{R}^K$ are complexitywise borderline at K


Correctness

Thm.

When BP terminates, X contains every incongruent realization of G

Proof.

• Let y be any realization of G

• Since $G$ has an initial K-clique, can rotate/translate/reflect $y$ so that $y[K] = x[K]$ for all $x \in X$

• BP exhaustively constructs every extension of $x[K]$ which is feasible with all distances, so $y \in X$

Here, for a realization $y$, $y[h] = (y_1, \dots, y_h)$ is the initial segment of $y$


Two examples


Empirical observations

• Fast: up to 10k vertices in a few seconds on 2010 hardware

• Precise: errors in the range $O(10^{-9})$ to $O(10^{-12})$

• Number of solutions always a power of 2:

obvious if $E_P = \emptyset$, but otherwise mysterious

• Linear-time behaviour on proteins:

this really shouldn’t happen


Symmetry in the KDMDGP


[L. et al. DAM 2014]


Partial reflections

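• For each $v > K$, let

\[
g_v(x) = (x_1, \dots, x_{v-1}, R_x^v(x_v), \dots, R_x^v(x_n))
\]

be the partial reflection of $x$ w.r.t. $v$

[Figure: partial reflection with respect to the points $x_{v-3}, x_{v-2}, x_{v-1}$]

• Note: the $g_v$’s are idempotent operators (in the sense that applying $g_v$ twice gives the identity)

• $G_D = (V, E_D)$: subgraph of $G$ given by discretization edges

• $\forall v > K$ reflection $R_x^v$ gives a binary choice in general∗

• $X_D \subset \mathbb{R}^{nK}$ contains $2^{n-K}$ incongruent realizations of $G_D$

∗ subsequent results hold “with probability 1”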
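A partial reflection is easy to write down for $K = 2$, where the reflecting hyperplane is the line through $x_{v-2}$ and $x_{v-1}$. The sketch below is an illustration only; the names `reflect_across_line` and `partial_reflection` are hypothetical, and realizations are stored as 0-based Python lists, so vertex $v$ lives at index `v - 1`.

```python
def reflect_across_line(p, a, b):
    """Reflect the point p across the line through the distinct points a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    # Foot of the perpendicular from p onto the line.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    fx, fy = a[0] + t * dx, a[1] + t * dy
    return (2 * fx - p[0], 2 * fy - p[1])

def partial_reflection(x, v):
    """g_v for K = 2: keep x_1..x_{v-1}, reflect x_v..x_n across line(x_{v-2}, x_{v-1})."""
    a, b = x[v - 3], x[v - 2]
    return x[:v - 1] + [reflect_across_line(p, a, b) for p in x[v - 1:]]
```

Applying `partial_reflection(x, 3)` twice returns the original realization, and on small examples `partial_reflection(partial_reflection(x, 4), 3)` coincides with the same two operations applied in the other order.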


Discretization group

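• $\mathcal{G}_D = \langle g_v \mid v > K \rangle$: the discretization group of $G$ w.r.t. $K$

subgroup of a Cartesian product of reflection groups

• An element $g \in \mathcal{G}_D$ has the form $\bigotimes_{v > K} g_v^{a_v}$, where $a_v \in \{0, 1\}$

• Action of $\mathcal{G}_D$ on $X_D$: $g(x) = \left( g_{K+1}^{a_{K+1}} \circ \cdots \circ g_n^{a_n} \right)(x)$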


Commutativity of partial reflections

Lemma A: $\mathcal{G}_D$ is Abelian

Proof: Assume $K < u < v$. Then

\[
\begin{aligned}
g_u g_v(x) &= g_u(x_1, \dots, x_{v-1}, R_x^v(x_v), \dots, R_x^v(x_n)) \\
&= (x_1, \dots, x_{u-1}, R_{g_v(x)}^u(x_u), \dots, R_{g_v(x)}^u R_x^v(x_v), \dots, R_{g_v(x)}^u R_x^v(x_n)) \\
&= (x_1, \dots, x_{u-1}, R_x^u(x_u), \dots, R_{g_u(x)}^v R_x^u(x_v), \dots, R_{g_u(x)}^v R_x^u(x_n)) \\
&= g_v(x_1, \dots, x_{u-1}, R_x^u(x_u), \dots, R_x^u(x_n)) \\
&= g_v g_u(x)
\end{aligned}
\]

where equality of the corresponding terms in the second and third lines holds by a Technical Lemma (next slide)

[L. et al. 2013]


Commutativity of partial reflections

Technical Lemma

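(Proof sketch for K = 2) Let $y \perp \mathrm{Aff}(x_{v-1}, \dots, x_{v-K})$ and $\rho^y = R_x^v$

[Figure: origin O, vectors y, z and a point t, with the reflected points $\rho^y t$, $\rho^z t$, $\rho^z y$ and the reflections $\rho^y$, $\rho^z$, $\rho^{\rho^z y}$]

\[
\rho^z \rho^y t = \rho^{\rho^z y} \rho^z t
\]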


One realization generates all others

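Lemma B: The action of $\mathcal{G}_D$ on $X_D$ is transitive

[Figure (K = 2): two realizations x and y, related by the partial reflections $g_3$, $g_4$, $g_5$]

$\exists g \in \mathcal{G}_D\ (y = g(x))$: namely, $y = g_5(g_4(g_3(x)))$

Proof: By induction on $v$: assume the result holds up to $v - 1$ with $g'$; then either it holds for $v$ with $g = g'$, or else flip and let $g = g_v g'$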

[L. et al. 2013]


Structure and invariance

• $\mathcal{G}_D$ is Abelian and generated by $n - K$ idempotent elements

⇒ $\mathcal{G}_D \cong C_2^{n-K}$

• $\mathcal{G}_D \le \mathrm{Aut}(X_D)$ by construction


Solution sets

• X: set of incongruent realizations of G

• GD defined on same vertices but fewer edges

⇒ fewer distance constraints on realizations

⇒ more realizations

• All realizations of G are also realizations of GD

⇒ X ⊆ XD


Losing invariance on pruning edges

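Lemma C: Let $W^{uv} = \{u + K + 1, \dots, v\}$ be the range of $\{u, v\}$. Then

\[
\forall x \in X,\; u, w, v \in V \quad \big( w \in W^{uv} \leftrightarrow \|x_u - x_v\| \neq \|g_w(x)_u - g_w(x)_v\| \big)
\]

Proof sketch for K = 2

[Figure: vertices u, w, v]

Corollary: If $\{u, v\} \in E_P$ and $w \in W^{uv}$, then $g_w(x) \notin X$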

[L. et al. 2013]


Pruning group

Define:

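\[
\Gamma = \{ g_w \in \mathcal{G}_D \mid w > K \wedge \forall \{u, v\} \in E_P \,(w \notin W^{uv}) \}, \qquad \mathcal{G}_P = \langle \Gamma \rangle
\]

Lemma D: $X$ is invariant w.r.t. $\mathcal{G}_P$

Proof

Follows by the corollary, invariance of $X_D$ w.r.t. $\mathcal{G}_D$ and $X \subseteq X_D$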


Transitivity of the pruning group

Lemma E: The action of $\mathcal{G}_P$ on $X$ is transitive

• Given $x, y \in X$, aim to show $\exists g \in \mathcal{G}_P\ (y = g(x))$

• Lemma B ⇒ $\exists g \in \mathcal{G}_D$ with $y = g(x) \in X_D$

• Suppose $g \notin \mathcal{G}_P$ and aim for a contradiction

• ⇒ $\exists \{u, v\} \in E_P$ and $w \in W^{uv}$ s.t. $g_w$ is a component of $g$

• Lemma C ⇒ $\|g_w(x)_u - g_w(x)_v\| \neq d_{uv}$

• If $w$ is the only such vertex, ⇒ $y = g(x) \notin X$, contradiction

• Suppose $\exists$ another $z \in W^{uv}$ s.t. $g_z$ is a component of $g$

• Set of cases s.t. $\|x_u - x_v\| = \|g_z g_w(x)_u - g_z g_w(x)_v\|$ given $\|g_w(x)_u - g_w(x)_v\| \neq \|x_u - x_v\| \neq \|g_z(x)_u - g_z(x)_v\|$ has Lebesgue measure 0 in all DGP inputs

• By induction, holds for any number of components $g_z$ of $g$ with $z \in W^{uv}$

• ⇒ $y = g(x) \neq x$ against hypothesis, done

[L. et al. 2013]


The main result

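Theorem: $|X| = 2^{|\Gamma|}$

• Lemma A ⇒ $\mathcal{G}_D \cong C_2^{n-K}$ ⇒ $|\mathcal{G}_D| = 2^{n-K}$

• $\mathcal{G}_P \le \mathcal{G}_D$ ⇒ $\exists \ell \in \mathbb{N}\ (\mathcal{G}_P \cong C_2^{\ell})$, with $\ell = |\Gamma|$

• Lemma E ⇒ $\forall x \in X$: $\mathcal{G}_P x = X$

• Idempotency ⇒ $\forall g \in \mathcal{G}_P$: $g^{-1} = g$

⇒ $\forall g, h \in \mathcal{G}_P,\ x \in X$ $(gx = hx \to h^{-1}gx = x \to hgx = x \to hg = e \to h = g^{-1} = g)$

⇒ the mapping $\mathcal{G}_P x \to \mathcal{G}_P$ given by $gx \to g$ is injective

• $\forall g, h \in \mathcal{G}_P,\ x \in X$ $(g \neq h \to gx \neq hx)$

⇒ the mapping $gx \to g$ is surjective

• ⇒ the mapping $gx \to g$ is a bijection

• ⇒ $|\mathcal{G}_P x| = |\mathcal{G}_P|$

• ⇒ $\forall x \in X$: $|X| = |\mathcal{G}_P x| = |\mathcal{G}_P| = 2^{|\Gamma|}$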

[L. et al. 2013]


Symmetry-aware BP

• Don’t need to explore all branches of BP tree

• Build Γ as a pre-processing step

• Run BP, terminating as soon as |X| = 1

• For each g ∈ GP , compute gx

[Mucherino et al. JBCB 2012]


Complexity

• Computing $\Gamma$: $O(mn)$

1. initialize indicator vector $\iota = (\iota_{K+1}, \dots, \iota_n)$ for $g_v \in \Gamma$

2. initialize $\iota = 1$

3. for each $\{u, v\} \in E_P$ and $w \in W^{uv}$ let $\iota_w = 0$

• BP: $O(2^n)$

• Compute $gx$ for each $g \in \mathcal{G}_P$: $O(2^{|\Gamma|})$

• Overall: $O(2^n)$

• Gains depend on the instance
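The pre-processing step for $\Gamma$ translates almost literally into code. The sketch below follows the three numbered steps, using the range $W^{uv} = \{u + K + 1, \dots, v\}$ of each pruning edge; the function names are hypothetical, and the count $2^{|\Gamma|}$ is only meaningful for feasible instances (and "with probability 1").

```python
def pruning_generators(n, K, pruning_edges):
    """Indices w > K such that g_w belongs to Gamma."""
    iota = {w: True for w in range(K + 1, n + 1)}   # steps 1-2: iota = 1
    for u, v in pruning_edges:                       # step 3
        u, v = min(u, v), max(u, v)
        for w in range(u + K + 1, v + 1):            # w ranges over W^{uv}
            iota[w] = False
    return [w for w in range(K + 1, n + 1) if iota[w]]

def count_realizations(n, K, pruning_edges):
    """|X| = 2^{|Gamma|}, assuming the instance has at least one realization."""
    return 2 ** len(pruning_generators(n, K, pruning_edges))
```

For n = 5 and K = 2 this gives 8 with no pruning edges and 4 with the single pruning edge {1, 4}, since only $g_4$ is removed from $\Gamma$.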


Tractability of protein instances


[L. et al. 2013]


Let’s handle the BP tree

[Figure: BP tree with nodes numbered 1 to 53]

Max depth: n, looks good! Aim to prove width is bounded


Number of solutions at each BP tree level

Depends on range of longer pruning edge incident to level v

[Figure: number of nodes at each BP tree level (powers of two from 2 up to 512) under different pruning edges; annotated paths: shortest pruning edge {6, K + 7}, longer {5, K + 7}, longest {1, K + 7}, no pruning edges]

Green path

• 8 nodes at level $K + 4$

• $\{3, K + 5\} \in E_P$ ⇒ $g_{K+4}, g_{K+5} \notin \Gamma$

• ⇒ no symmetry at levels $K + 4$ and $K + 5$

• ⇒ only 4 nodes at level $K + 5$


Periodic pruning edges

[Figure: graph on vertices 1, . . . , 17 with periodic pruning edges, and the corresponding number of nodes at each BP tree level]

• $2^{\ell}$ growth up to level $\ell$, then constant: $O(2^{\ell} n)$ nodes in BP tree

• BP is Fixed-Parameter Tractable (FPT) in a bunch of cases

• For all tested protein backbones, $\ell \le 5$ ⇒ BP linear on proteins!
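To get a feeling for the gap between the two bounds, here is a worked instance under illustrative assumptions (the values $n = 10^4$ and $K = 3$ are chosen for the example, in line with the 10k-vertex backbones mentioned earlier, and $\ell = 5$ is the bound quoted above):

\[
2^{\ell} n = 2^{5} \cdot 10^{4} = 3.2 \times 10^{5}
\qquad \text{versus} \qquad
2^{n-K} = 2^{9997} \approx 10^{3009}
\]

So with bounded width the BP tree has a few hundred thousand nodes at most, whereas the worst-case tree is far beyond anything that could be enumerated.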


The story so far

• Nice applications, problem is hard, could have many solutions

• Continuous methods don’t scale

• If certain vertex orders are present, use mixed-combinatorial methods

• Realize K-laterative in polytime but (K − 1)-laterative are hard

• If adjacent predecessors are immediate, theory of symmetries

• Number of solutions is a power of two

• For proteins, BP is linear time

• How do we find these vertex orders?


Finding vertex orders


[Cassioli et al., DAM]


. . . wasn’t the backbone providing them?

• NMR data not as clean as I pretended

• Have to mess around with side chains

• What about other applications, anyhow?

Methods for finding trilaterative orders automatically


Mostly bad news

• Finding K-laterative orders is NP-complete :-(

• But also FPT :-)

• Finding KDMDGP orders is NP-complete for all K :-(

• It’s also really hard in practice, and methods don’t scale well


Definitions

• Trilateration Ordering Problem (TOP)

Given a connected graph $G = (V, E)$ and a positive integer $K$, does $G$ have a K-lateration order?

• Contiguous Trilateration Ordering Problem (CTOP)

Given a connected graph $G = (V, E)$ and a positive integer $K$, does $G$ have a $(K-1)$-lateration order such that $U_v = \{v - 1, \dots, v - K\}$ for each $v > K$?

Both problems are in NP


Hardness of TOP

• Essentially due to finding the initial clique

– brute force: test all $\binom{n}{K}$ subsets of $V$

– $\binom{n}{K}$ is $O(n^K)$, polytime if $K$ fixed

• Reduction from K-Clique problem:

Given a graph, does it have a K-clique?

[Mucherino et al., OPTL 2012]


Reduction from K-Clique

[Figure: reduction from $G$ to $G' = G \oplus U$, with $K = 3$ and $K - 1$ dummy nodes]

• If K-Clique instance is YES

– start with $\alpha$ = (initial clique of $G$, $U$)

– induction: if $\alpha_{v-1}$ is defined, pick $\alpha_v$ at shortest path distance 1 from $\bigcup \alpha$

• If K-Clique instance is NO

– By contradiction: suppose $\exists$ trilateration order $\alpha$ in $G'$

– Initial clique $\alpha[K] = (\alpha_1, \dots, \alpha_K)$ must have $K - 1$ vertices in $G$, 1 in $U$

– $\alpha_{K+1}$ must be in $G$, hence $\exists$ K-clique in $G$

[Cassioli et al., DAM]


[Figure: the clique K, the dummy nodes U (n + 1, n + 2), and vertices grouped by shortest path distance ℓ = 1, ℓ = 2, . . . , ℓ − 1, ℓ]


Once the initial clique is known

Greedily grow a trilateration order α

• Initialize $\alpha$ with initial K-clique $\mathcal{K}$

• Let $W = V \setminus \mathcal{K}$

• $\forall v > K$: $a_v$ = |vertices in $\mathcal{K}$ adjacent to $v$|

// at termination, $a_v$ will be the number of adjacent predecessors of $v$

• While $W \neq \emptyset$:

1. choose $v \in W$ with largest $a_v$

2. if $a_v < K$, no trilateration order from this $\mathcal{K}$, terminate

3. $\alpha \leftarrow (\alpha, v)$

4. for all $u \in W$ adjacent to $v$, increase $a_u$

5. $W \leftarrow W \setminus \{v\}$

• Instance is YES

[Mucherino et al., OPTL 2012]
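The greedy loop above can be sketched directly. The representation is an assumption made for the example: `adj` maps each vertex to the set of its neighbours, `clique` lists the vertices of an initial K-clique that has already been found, and the hypothetical function returns `None` where the slide says "terminate".

```python
def greedy_trilateration_order(adj, clique, K):
    """Grow an order from the given initial clique, or return None if stuck."""
    order = list(clique)
    placed = set(clique)
    W = set(adj) - placed
    # a[v]: number of already ordered vertices adjacent to v.
    a = {v: len(adj[v] & placed) for v in W}
    while W:
        v = max(W, key=lambda u: a[u])   # 1. vertex with most adjacent predecessors
        if a[v] < K:                     # 2. no order from this clique
            return None
        order.append(v)                  # 3. append v to the order
        W.remove(v)                      # 5. (done before step 4 to skip v itself)
        for u in adj[v]:                 # 4. update counts of remaining neighbours
            if u in W:
                a[u] += 1
    return order
```

With $K = 2$, initial clique {1, 2}, vertex 3 adjacent to 1 and 2, and vertex 4 adjacent to 2 and 3, the sketch returns the order 1, 2, 3, 4; attaching a fifth vertex to vertex 4 only makes it return `None`.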


Greedy algorithm is correct

• Assume TOP instance is YES, proceed by induction

– start: by maximality, $a_{K+1} > K$

– assume $\alpha$ is a valid TOP up to $v - 1$, suppose $a_v < K$

– but instance is YES so there is another $z \in W$ with $a_z \ge K$

– contradicts maximality of $a_v$

• Assume TOP instance is NO

– “YES” termination when $W = \emptyset$ contradicts the NO

– hence it must terminate with $W \neq \emptyset$ and “NO” answer


Complexity

• Outer while loop: O(n)

• Choice of largest av: O(n)

• Inner loop on W : O(n)

• Overall: $O(n^2)$

• If we add brute force initial clique: $O(2^K n^2)$

• Polytime if K fixed, FPT otherwise


CTOP is hard

• Reduction from Hamiltonian Path (HP)

Given a graph $G$, does it have a path passing through each vertex exactly once?

• $\alpha$ a H. path in $G$ ⇒ $\forall v \neq 1, n$: $\alpha_v$ is adjacent to $\alpha_{v-1}, \alpha_{v+1}$

• Apart from initial 1-clique $\alpha_1$, every $\alpha_v$ is adjacent to its immediate predecessor

• ⇒ $\alpha$ is a KDMDGP order in $G$ with $K = 1$

• HP is the same as CTOP with K = 1

• ⇒ By inclusion, CTOP is NP-hard

[Cassioli et al., DAM]


CTOP is hard for all K

• Reduction from HP

[Figure: reduction from a graph on vertices 1, 2, 3 to a graph built from the gadgets C1, C2, C3, C12, C23]

• Technical proof

[Cassioli et al., DAM]


How do we find KDMDGP orders?

Mathematical optimization & CPLEX

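• $x_{vi} = 1$ iff vertex $v$ has rank $i$ in the order

• Each vertex has a unique order rank:

\[
\forall v \in V \quad \sum_{i \in \{1, \dots, n\}} x_{vi} = 1;
\]

• Each rank value is assigned a unique vertex:

\[
\forall i \in \{1, \dots, n\} \quad \sum_{v \in V} x_{vi} = 1;
\]

• There must be an initial K-clique:

\[
\forall v \in V,\; i \in \{2, \dots, K\} \quad \sum_{\substack{u \in N(v) \\ j < i}} x_{uj} \ge (i - 1)\, x_{vi};
\]

• Each vertex with rank $> K$ must have at least $K$ contiguous adjacent predecessors:

\[
\forall v \in V,\; i > K \quad \sum_{\substack{u \in N(v) \\ i - K \le j < i}} x_{uj} \ge K\, x_{vi}.
\]

• Do not expect too much; scales up to 100 vertices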
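The constraints above say that the first K vertices form a clique and that every later vertex is adjacent to its K immediate predecessors. As a tiny stand-in for the MILP (explicitly not the CPLEX model, and only usable on very small graphs), the sketch below checks these conditions for a given order and searches all permutations by brute force; the function names and the adjacency-set representation are assumptions made for the example.

```python
from itertools import permutations

def is_kdmdgp_order(order, adj, K):
    """Check the initial K-clique and the K contiguous adjacent predecessors."""
    for i, v in enumerate(order):
        # For i < K these are all earlier vertices (clique condition);
        # for i >= K these are the K immediate predecessors.
        preds = order[max(0, i - K):i]
        if any(u not in adj[v] for u in preds):
            return False
    return True

def find_kdmdgp_order(adj, K):
    """Brute-force search over all vertex orders; exponential, illustration only."""
    for order in permutations(sorted(adj)):
        if is_kdmdgp_order(order, adj, K):
            return list(order)
    return None
```

With $K = 1$ this is exactly a Hamiltonian path search: the path graph 1, 2, 3 has the order [1, 2, 3], while a star with three leaves has none.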


How about those 10k-atom backbones?

We have Carlile for those

• Note the repetitions — they serve a purpose!

• Repetition orders are also hard to find for any K

• . . . but Carlile knows how to handcraft them!

[Lavor et al. JOGO 2013]


And what about the side-chains?

The Carlile+Antonio tool!

[Costa et al. JOGO, 2014]


Approximate realizations


Data errors

Distance values are never precise

• Covalent bonds are fairly precise

• NMR data is a mess [Berger, J. ACM 1999]

– experimental errors yield intervals $[d^L_{uv}, d^U_{uv}]$

– NMR outputs frequencies of (atom type pair, distance value); weighted graph reconstruction yields systematic error

– some atom type pairs yield more error (“only trust H–H”)

• Properties of specific molecules give rise to other constraints

• The protein graph may not be (K − 1)-laterative based on

the backbone


The Lavorder comes to the rescue!

• Properties of Carlile’s handcrafted repetition orders:

– repetitions allow a “virtual backbone” of H atoms only

– discretization edges: $\{v, v-i\}$ are covalent bonds for $i \in \{1, 2\}$; $\{v, v-3\}$ is sometimes covalent, sometimes from NMR

– most NMR data restricted to pruning edges

• When $d_{v,v-3}$ is an interval $[d^L, d^U]$: intersect two spheres with a spherical shell

• Discretize circular segments and run BP with modified S

Algorithm no longer exhaustive
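One simple way to picture the discretization step in $\mathbb{R}^3$ is sketched below: sample the circle where the two spheres meet, then keep only the samples that fall inside the spherical shell. This is an illustrative assumption, not necessarily the scheme used in the cited work. The function name, its parameters and the uniform angular sampling are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def sample_shell_arcs(c1, r1, c2, r2, c3, d_lo, d_hi, q=360):
    """Sample the circle S(c1, r1) ∩ S(c2, r2) at q angles and keep the
    points whose distance to c3 lies in [d_lo, d_hi]."""
    c1, c2, c3 = (np.asarray(c, dtype=float) for c in (c1, c2, c3))
    axis = c2 - c1
    d = np.linalg.norm(axis)
    n = axis / d                                  # unit normal of the circle's plane
    a = (r1**2 - r2**2 + d**2) / (2 * d)          # signed distance from c1 to the plane
    h2 = r1**2 - a**2                             # squared radius of the circle
    if h2 < 0:
        return np.empty((0, 3))                   # the two spheres do not intersect
    centre, h = c1 + a * n, np.sqrt(h2)
    # Build an orthonormal basis (e1, e2) of the plane orthogonal to n.
    helper = np.eye(3)[np.argmin(np.abs(n))]
    e1 = np.cross(n, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    theta = np.linspace(0.0, 2 * np.pi, q, endpoint=False)
    circle = centre + h * (np.outer(np.cos(theta), e1) + np.outer(np.sin(theta), e2))
    dist = np.linalg.norm(circle - c3, axis=1)
    return circle[(dist >= d_lo) & (dist <= d_hi)]
```

Because only finitely many points per arc are kept, a search built on such samples can miss realizations, which is consistent with the remark that the algorithm is no longer exhaustive.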


Die Symmetriktheoriedammerung

• Intervals and discretization break the theory of symmetries

[Figure not recoverable; surviving label: $d_{uv}$]

• Only some bounds for the number $b$ of BP solutions:

$\exists \ell, k \quad 2^{\ell} q^{k} \le b \le 2^{n-3} q^{M}$

$q$ = |discretization points|, $M$ = |NMR discretization edges|


But at least it’s producing results

Joint work with Institut Pasteur

[Cassioli et al., BMC Bioinf., 2015]


General approximate methods

• All these methods are specialized to

protein distance data from NMR

• What about general approximate methods?

• Assume large-sized input data with errors

• No assumptions on graph structure


The Isomap method: Ingredients

• PDM = Partial Distance Matrix (a representation of G)

• EDM = Euclidean Distance Matrix

1. Complete the given PDM d to a symmetric matrix D

2. Find a realization $x$ (in some dimension $K' \ge K$) s.t. the EDM $(\|x_u - x_v\|)$ is “close” to $D$

3. Project $x$ from dimension $K'$ to dimension $K$, keeping pairwise distances approximately equal


Completing the distance matrix

• $\forall \{u, v\} \notin E$ let $D_{uv}$ = length of the shortest path $u \to v$

• Use the Floyd-Warshall algorithm, $O(n^3)$

// n × n array D_ij to store distances
D_ii = 0 for all i ∈ V; D_ij = ∞ for all i ≠ j
for {i, j} ∈ E do
  D_ij = D_ji = d_ij
end for
for k ∈ V do
  for j ∈ V do
    for i ∈ V do
      if D_ik + D_kj < D_ij then
        // D_ij fails to satisfy the triangle inequality, update
        D_ij = D_ik + D_kj
      end if
    end for
  end for
end for
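The pseudocode above translates almost line by line into Python. The sketch below assumes vertices are numbered 0 to n-1 and that the partial distance matrix is given as a dictionary of edge weights. Both the function name and the input format are illustrative choices. Missing entries start at infinity so that the first path found through an intermediate vertex replaces them.

```python
import math

def complete_distance_matrix(n, edges):
    """Complete a partial distance matrix with shortest-path lengths.

    n:     number of vertices, labelled 0..n-1
    edges: dict {(i, j): d_ij} for the known (undirected) distances
    """
    # Zero on the diagonal, infinity for pairs with no known distance yet.
    D = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), d in edges.items():
        D[i][j] = D[j][i] = d
    # Floyd-Warshall: allow paths through intermediate vertex k, for each k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


# Hypothetical 4-cycle with one shorter edge.
D = complete_distance_matrix(4, {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.5})
print(D[0][2], D[1][3])
```

If the graph is disconnected, some entries remain infinite and the later spectral steps cannot be applied as they stand.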


Finding a realization

• Let’s give ourselves many dimensions, say K = n

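• Attempt to find $x : V \to \mathbb{R}^n$ with $(\|x_u - x_v\|_2) \approx (D_{uv})$

• If we had the Gram matrix $B$ of $x$, then:

1. find eigen(value/vector) matrices $\Lambda$, $Y$ of $B$

2. since $B$ is PSD, $\Lambda \ge 0 \Rightarrow \sqrt{\Lambda}$ exists

3. $\Rightarrow B = Y \Lambda Y^\top = (Y\sqrt{\Lambda})(Y\sqrt{\Lambda})^\top$

4. $x^\top = Y\sqrt{\Lambda}$ is such that $x^\top x = B$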

• Can we compute B from D?


Schoenberg’s theorem

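• Standard method for computing $B$ from $D^2$ (the matrix of squared distances)

• Also known as classic MultiDimensional Scaling (MDS)

• Apply many algebraic manipulations to

$d_{uv}^2 = \|x_u - x_v\|^2 = x_u^\top x_u + x_v^\top x_v - 2 x_u^\top x_v$

with the centroid at the origin: $\sum_{k \le n} x_{uk} = 0$ for all $u \le n$

• Get $B = -\frac{1}{2}\left(I_n - \frac{1}{n}\mathbf{1}_n\right) D^2 \left(I_n - \frac{1}{n}\mathbf{1}_n\right)$, where $\mathbf{1}_n$ is the $n \times n$ all-ones matrix, i.e.

$x_u \cdot x_v = \frac{1}{2n} \sum_{k \le n} (d_{uk}^2 + d_{kv}^2) - \frac{1}{2} d_{uv}^2 - \frac{1}{2n^2} \sum_{h \le n} \sum_{k \le n} d_{hk}^2$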

• D “approximately” EDM ⇒ B “approximately” Gram

[Schoenberg, Annals of Mathematics, 1935]
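The double-centering step is short enough to write out. The sketch below assumes NumPy is available; the function name is an illustrative choice. The input is the matrix of distances, not squared distances, so the squaring happens inside.

```python
import numpy as np

def gram_from_distances(D):
    """Classical MDS step: B = -1/2 * J * D^2 * J with J = I - (1/n) * ones."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return -0.5 * J @ (D ** 2) @ J           # D ** 2 squares entrywise
```

When $D$ is the exact EDM of a centred point set $x$, the result equals the Gram matrix of $x$. With noisy or shortest-path-completed distances, $B$ may have negative eigenvalues, so it is only “approximately” Gram.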


Project to $\mathbb{R}^K$ for a given $K$

• Only use the $K$ largest eigenvalues of $\Lambda$

• $Y[K]$ = $K$ columns of $Y$ corresp. to the $K$ largest eigenvalues

• $\Lambda[K]$ = $K$ largest eigenvalues of $\Lambda$ on the diagonal

• $x^\top = Y[K]\sqrt{\Lambda[K]}$ is an $n \times K$ matrix, so $x$ is $K \times n$

• The columns of $Y[K]$ span the subspace where $x$ “fills more space”, i.e. neglecting the other dimensions causes smaller errors w.r.t. the realization in $\mathbb{R}^n$

This method is called Principal Component Analysis (PCA)
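A minimal sketch of this projection, assuming NumPy is available, keeps the $K$ largest eigenpairs of $B$ and scales the eigenvectors. The function name is illustrative. Clipping negative eigenvalues to zero is an added safeguard for the case where $B$ is only approximately PSD; it is not part of the slide.

```python
import numpy as np

def realize_from_gram(B, K):
    """Return an n x K array (x transposed) built from the K largest eigenpairs of B."""
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:K]          # indices of the K largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)       # assumption: drop negative parts
    return evecs[:, idx] * np.sqrt(lam)        # Y[K] * sqrt(Lambda[K])
```

Row $u$ of the returned array is the position of vertex $u$ in $\mathbb{R}^K$.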


Isomap

Given K and PDM d:

1. D = FloydWarshall(d)

2. B = MDS(D)

3. x = PCA(B,K)


[Tenenbaum et al. Science 2000]


Bonus material


Other topics

• DGP on a sphere

• DGP with different norms

ℓ1? ℓ∞? geodesics?

• Unassigned DGP


Realizing rigid graphs on a unit sphere

trivial extension: we get one more constraint $\|x\|_2 = 1$ for free!


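Everything’s simpler on $S^K$

main building block: $K$-lateration

Realize $y$ knowing $x_1, \ldots, x_{K+1}$ and distances $d_i$ from $x_i$ to $y$ ($i \le K+1$)

$\|x_i - y\|^2 = d_i^2$

$\|x_i\|^2 + \|y\|^2 - 2 x_i y = d_i^2$

$1 + 1 - 2 x_i y = d_i^2$

$x_i y = 1 - d_i^2/2$

$A y = b$

where $A = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1,K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{K+1,1} & x_{K+1,2} & \cdots & x_{K+1,K+1} \end{pmatrix}$, $b = (1 - d_i^2/2 \mid i \le K+1)^\top$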
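On the unit sphere, $K$-lateration therefore reduces to one linear solve. The sketch below assumes NumPy is available, that the rows of `X` are the $K+1$ known unit vectors in $\mathbb{R}^{K+1}$, and that they are linearly independent. The function name is illustrative.

```python
import numpy as np

def laterate_on_sphere(X, d):
    """Solve A y = b with A = X (rows x_i) and b_i = 1 - d_i^2 / 2."""
    X = np.asarray(X, dtype=float)
    b = 1.0 - np.asarray(d, dtype=float) ** 2 / 2.0
    return np.linalg.solve(X, b)
```

With exact Euclidean (chord) distances to a unit vector $y$, the solution has unit norm automatically. With noisy distances it generally does not, and the deviation gives a quick consistency check.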


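$(K-1)$-lateration on $S^K$

As in the Euclidean case, substitute the dictionary $y_B = B^{-1}b - y_N B^{-1}N$ into $\|y\| = 1$ to get

$\|y\|^2 = 1$

$\|y_B\|^2 + y_N^2 = 1$

$\|B^{-1}b - y_N B^{-1}N\|^2 + y_N^2 = 1$

$y_N = \dfrac{(B^{-1}b)\cdot(B^{-1}N) \pm \sqrt{\left((B^{-1}b)\cdot(B^{-1}N)\right)^2 - \left(\|B^{-1}N\|^2 + 1\right)\left(\|B^{-1}b\|^2 - 1\right)}}{\|B^{-1}N\|^2 + 1}$

$y_B = B^{-1}b - y_N B^{-1}N$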

Again, ≤ 2 possible positions for y
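The two candidate positions can be computed by solving the quadratic in $y_N$ that results from expanding $\|B^{-1}b - y_N B^{-1}N\|^2 + y_N^2 = 1$. The sketch below assumes NumPy is available and that the first $K$ columns of the $K \times (K+1)$ coefficient matrix form a nonsingular basis $B$. The function name and this choice of basis are illustrative.

```python
import numpy as np

def laterate_k_minus_1_on_sphere(X, d):
    """X: K x (K+1) array whose rows are K known unit vectors; d: K distances to y.
    Returns a list with 0, 1 or 2 candidate unit vectors y."""
    X = np.asarray(X, dtype=float)
    K = X.shape[0]
    b = 1.0 - np.asarray(d, dtype=float) ** 2 / 2.0
    B, N = X[:, :K], X[:, K]            # assumption: first K columns are the basis
    p = np.linalg.solve(B, b)           # B^{-1} b
    q = np.linalg.solve(B, N)           # B^{-1} N
    # Quadratic (|q|^2 + 1) t^2 - 2 (p.q) t + (|p|^2 - 1) = 0 in t = y_N.
    a2 = q @ q + 1.0
    disc = (p @ q) ** 2 - a2 * (p @ p - 1.0)
    if disc < 0:
        return []                       # no point of the sphere fits the distances
    roots = {(p @ q + s * np.sqrt(disc)) / a2 for s in (1.0, -1.0)}
    return [np.append(p - t * q, t) for t in roots]
```

Each returned vector stacks $y_B$ first and $y_N$ last, matching the column partition of the coefficient matrix.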


When distances are geodesic


Gödel’s Distance Geometry theorem

Thm.

Any weighted 4-clique which can be realized in $\mathbb{R}^3$ but not $\mathbb{R}^2$ can also be realized on $rS^2$ (for some $r > 0$) with geodesic distances.


Proof 1/3: geodesic and chord lengths

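• $a_1, \ldots, a_6$: given distance lengths on 4-clique edges; aim to realize them on a sphere as geodesics

• $c_x(\alpha)$: length of the chord subtending a geodesic of length $\alpha$ on a sphere of radius $r = 1/x$

[Figure: arc $\alpha$ and its chord $c_x(\alpha)$ on a circle of radius $1/x$]

• $\tau(x)$: tetrahedron in $\mathbb{R}^3$ having side lengths $c_x(a_i)$ for $i \le 6$

• $\phi(x)$: inverse of the radius of the sphere circumscribed about $\tau(x)$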
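The slides do not spell out a formula for the chord length. The sketch below uses the standard relation between a chord and the arc it subtends on a circle of radius $1/x$, namely $c_x(\alpha) = \frac{2}{x}\sin\frac{\alpha x}{2}$. The function name is illustrative, and the value returned at $x = 0$ is the limiting straight-line case.

```python
import math

def chord_length(alpha, x):
    """Chord subtending a geodesic arc of length alpha on a sphere of radius 1/x."""
    if x == 0.0:
        return alpha                      # infinite radius: the arc is a segment
    return (2.0 / x) * math.sin(alpha * x / 2.0)


print(chord_length(math.pi, 1.0))         # half a great circle on the unit sphere
print(chord_length(1.0, 1e-6))            # nearly flat sphere: chord close to the arc
```

The formula is meaningful for $\alpha x \le \pi$, i.e. $x \le \pi/\alpha$, since a geodesic on a sphere of radius $1/x$ has length at most $\pi/x$.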


Proof 2/3: straighten the geodesics

• $T$: realization of $a_1, \ldots, a_6$ in $\mathbb{R}^3$

• $r$: radius of the sphere circumscribed around $T$

• $\lim_{x \to 0} c_x(\alpha) = \alpha$, as $1/x \to \infty$ implies chord lengths = geodesic lengths

• $\Rightarrow \lim_{x \to 0} \tau(x) = T$ and $\lim_{x \to 0} \phi(x) = 1/r$


Proof 3/3: the fixed point

• By existence of $T$, setting $\phi(0) = 1/r$ makes $\phi$ continuous at 0

• Technical claim: $\phi$ has a fixed point in the interval $I = (0, \pi/a')$, where $a' = \max(a_1, \ldots, a_6)$

• Let $y$ be the fixed point of $\phi$, i.e. $\phi(y) = y$

• $\tau(y)$ is circumscribed by a sphere $\sigma$ having radius $1/y$

• By definition, $\tau(y)$ is the tetrahedron whose side lengths are the chords subtending geodesics $a_1, \ldots, a_6$ on the circumscribed sphere

• ⇒ the realization $\tau(y)$ on $\sigma$ has geodesic lengths equal to $a_1, \ldots, a_6$


Other norms?

Open research question


Unassigned DGP

Given two integers $K > 0$, $n > 0$ and a sequence $d = (d_1, \ldots, d_h)$ for some $h$, find a configuration $x = (x_1, \ldots, x_n) \subseteq \mathbb{R}^K$ such that:

$\forall i \le h \ \exists u < v \le n \quad \|x_u - x_v\|_2 = d_i. \qquad (3)$

Another open research question

But look for the works of Phil Duxbury
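Finding such a configuration is the open problem, but verifying a candidate against condition (3) is straightforward. The sketch below only checks a given configuration. The function name and the numerical tolerance are assumptions for illustration. As in (3) as stated, the same pair of points may account for more than one entry of $d$.

```python
import math
from itertools import combinations

def realizes_unassigned(x, d, tol=1e-9):
    """True if every value in d equals some pairwise distance of the points in x."""
    pair_dists = [math.dist(p, q) for p, q in combinations(x, 2)]
    return all(any(abs(pd - di) <= tol for pd in pair_dists) for di in d)


# Hypothetical example: the corners of the unit square in the plane.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(realizes_unassigned(square, [1.0, 1.0, math.sqrt(2)]))
print(realizes_unassigned(square, [3.0]))
```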


Some references

• L. Liberti, C. Lavor, N. Maculan, A. Mucherino, Euclidean distance geometry and applications, SIAM Review, 56(1):3-69, 2014

• L. Liberti, B. Masson, J. Lee, C. Lavor, A. Mucherino, On the number of realizations of certain Henneberg graphs arising in protein conformation, Discrete Applied Mathematics, 165:213-232, 2014

• L. Liberti, C. Lavor, A. Mucherino, The discretizable molecular distance geometry problem seems easier on proteins, in [see below], 47-60

• A. Mucherino, C. Lavor, L. Liberti, N. Maculan (eds.), Distance Geometry: Theory, Methods and Applications, Springer, New York, 2013

THE END