
A Network of Mathematical Theorems

A. Chotai, C. Mavroyiakoumou, I. Papadopoulos, X. Wen

Imperial College London

Department of Mathematics

June 17, 2015


A Network of Theorems

Fig: A Weighted, Undirected Network


A Network of Theorems

Fig: A Part of Our Network of Theorems


The Substring Bug

Removal of 78 nodes and approximately 5000 edges

We keep any node for which the substring bug contributes less than 5% of its frequency, e.g. the central limit theorem, which is a substring of Lindeberg central limit theorem.
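The 5% rule can be sketched as below. This is only an illustration under stated assumptions: the frequencies are hypothetical, the name "value theorem" is invented for the example, and we assume a node's raw frequency already includes the matches that came from longer theorem names containing it.

```python
def substring_share(name, frequencies):
    """Fraction of `name`'s raw frequency that comes from longer names containing it."""
    raw = frequencies[name]
    from_longer = sum(count for other, count in frequencies.items()
                      if other != name and name in other)
    return from_longer / raw if raw else 0.0


def keep_node(name, frequencies, threshold=0.05):
    """Keep the node if the substring bug contributes less than `threshold`."""
    return substring_share(name, frequencies) < threshold


# Hypothetical frequencies, for illustration only.
frequencies = {
    "central limit theorem": 1000,
    "lindeberg central limit theorem": 20,
    "value theorem": 500,
    "mean value theorem": 480,
}

print(keep_node("central limit theorem", frequencies))  # share is 20/1000, so kept
print(keep_node("value theorem", frequencies))          # share is 480/500, so removed
```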


Removal of duplicates

Duplicate nodes correspond to theorems that have the same name and appear in our data set with different ID numbers but the same frequency and degree.

Theorem IDs   Theorem name                  Frequency   Degree
118, 167      distortion theorem            295         80
143, 868      Helmholtz theorem             190         40
175, 767      Torelli theorem               82          36
191, 362      Godel incompleteness theorem  26          16
219, 698      Bertini theorem               78          42
253, 547      art gallery theorem           179         12
272, 531      Fredholm theorem              134         89
426, 1460     Chasles theorem               60          15
553, 1619     Dirac theorem                 27          17
571, 749      Euclid theorem                7           5
766, 1580     Foster theorem                7           4
836, 1644     Oka theorem                   8           5

Action taken: Delete the 12 duplicates from our data set!
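One way to find such duplicates is to group nodes by name, frequency and degree. The sketch below is not the code used for the talk; the tuple layout and function names are assumptions, and the last row of the sample data is hypothetical.

```python
from collections import defaultdict


def find_duplicates(nodes):
    """Map (name, frequency, degree) to the IDs sharing it, when more than one ID does."""
    groups = defaultdict(list)
    for node_id, name, frequency, degree in nodes:
        groups[(name, frequency, degree)].append(node_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def drop_duplicates(nodes):
    """Keep only the first node seen for each (name, frequency, degree)."""
    seen = set()
    kept = []
    for node in nodes:
        key = node[1:]
        if key not in seen:
            seen.add(key)
            kept.append(node)
    return kept


nodes = [
    (118, "distortion theorem", 295, 80),
    (143, "Helmholtz theorem", 190, 40),
    (167, "distortion theorem", 295, 80),
    (868, "Helmholtz theorem", 190, 40),
    (999, "example theorem", 10, 3),  # hypothetical non-duplicate row
]

duplicates = find_duplicates(nodes)
cleaned = drop_duplicates(nodes)
```

Keeping the first ID of each group is an arbitrary choice here; the slides do not say which ID was retained.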


Compare initial n consecutive characters

Potential bugs that arise from the n = 7 case include:

Theorems that have various spellings,
e.g. Pythagoras theorem, Pythagoreas theorem and Pythagorean theorem.

Theorems that are the same but, when mentioned in a publication, often have one of the contributors' names omitted,
e.g. Ascoli theorem and Ascoli Arzela theorem.

Nodes that describe a general class of theorems rather than an actual theorem,
e.g. theorems on continuation, theorems on sums of squares, and theorems of Euclid.

This method motivates the bug hunting.
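A minimal sketch of the prefix comparison, assuming names are compared case-insensitively on their first n characters (the function name and the case folding are our assumptions). Groups with more than one name are only candidates and still need checking by hand.

```python
from collections import defaultdict


def group_by_prefix(names, n=7):
    """Group names whose first n characters agree; return groups of size > 1."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:n].lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Pythagoras theorem",
    "Pythagoreas theorem",
    "Pythagorean theorem",
    "Ascoli theorem",
    "Ascoli Arzela theorem",
    "Fubini theorem",
    "theorems on continuation",
    "theorems on sums of squares",
    "theorems of Euclid",
]

candidates = group_by_prefix(names)
for group in candidates:
    print(group)
```

With n = 7 the three spellings share the prefix "pythago", the two Ascoli names share "ascoli ", and the three general classes share "theorem", which matches the three kinds of bug listed above.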


Merging Nodes


Sum the weights of the two edges to a shared node

Pythagoras theorem - Hahn Banach theorem (2)

Pythagorean theorem - Hahn Banach theorem (3)
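The merge can be sketched on a weighted, undirected graph stored as a dictionary of dictionaries. The representation and the choice of which name to keep are assumptions made for illustration; any edge between the two merged nodes is simply dropped.

```python
def merge_nodes(graph, keep, drop):
    """Merge node `drop` into node `keep`, summing weights to shared neighbours.

    `graph[u][v]` is the weight of the undirected edge u - v (no self-loops).
    The graph is modified in place and returned.
    """
    for neighbour, weight in graph.pop(drop).items():
        del graph[neighbour][drop]
        if neighbour == keep:
            continue  # the edge between the merged nodes disappears
        graph[keep][neighbour] = graph[keep].get(neighbour, 0) + weight
        graph[neighbour][keep] = graph[keep][neighbour]
    return graph


graph = {
    "Pythagoras theorem": {"Hahn Banach theorem": 2},
    "Pythagorean theorem": {"Hahn Banach theorem": 3},
    "Hahn Banach theorem": {"Pythagoras theorem": 2, "Pythagorean theorem": 3},
}

merge_nodes(graph, keep="Pythagorean theorem", drop="Pythagoras theorem")
print(graph["Pythagorean theorem"]["Hahn Banach theorem"])  # 2 + 3 = 5
```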


$$\left(1 - \frac{11}{40}\right) \times 7 = 5.075$$

We round this up to 6.


Order and Strength

The order of a node is the number of different theorems to which the node is connected.

The strength of a node is the sum of the weights of each edge incident to that node.
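Both quantities are one-liners on a dictionary-of-dictionaries graph. The node names and weights below are hypothetical.

```python
def order(graph, node):
    """Number of distinct neighbours of `node`."""
    return len(graph[node])


def strength(graph, node):
    """Sum of the weights of the edges incident to `node`."""
    return sum(graph[node].values())


# Hypothetical weighted, undirected network: graph[u][v] is the weight of u - v.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4},
    "C": {"A": 1},
}

print(order(graph, "A"), strength(graph, "A"))  # 2 neighbours, total weight 5
```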


We are interested in nodes with low order (degree) and high strength.


Normalising Weights

It is helpful to consider the nodes at a local level.

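We create a non-symmetric matrix $P$ of normalised weights, where

$$P_{ij} = \frac{w_{ij}}{s_i}$$

with $w_{ij}$ the weight of the edge between nodes $i$ and $j$ and $s_i$ the strength of node $i$.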
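A small sketch of the normalisation on a hypothetical symmetric weight matrix. Each row is divided by its own sum (the strength of that node), which is why the result is no longer symmetric. Returning a row of zeros for an isolated node is our assumption.

```python
def normalise_weights(W):
    """Return P with P[i][j] = W[i][j] / s_i, where s_i is the sum of row i."""
    P = []
    for row in W:
        s_i = sum(row)
        P.append([w / s_i if s_i else 0.0 for w in row])
    return P


# Hypothetical symmetric weight matrix for three nodes.
W = [
    [0, 4, 1],
    [4, 0, 0],
    [1, 0, 0],
]

P = normalise_weights(W)
print(P[0][1], P[1][0])  # 4/5 versus 4/4, so P is not symmetric
```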


Mean Squared Errors

$$P = \begin{pmatrix} 0 & P_{12} & P_{13} \\ P_{21} & 0 & P_{23} \\ P_{31} & P_{32} & 0 \end{pmatrix}$$


Mean Squared Errors with Green’s theorem


Shortest Paths

Fig: An example graph with nodes 1 to 6

In this graph, the shortest path from 1 to 3 is through 4.


Diameter

The diameter is the length of the longest shortest path between any pair of nodes.

Fig 1: This network has diameter 3.

Fig 2: This network has infinite diameter.
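A minimal sketch of the diameter, assuming path length is counted in edges (hops) and ignoring edge weights. Both example graphs are hypothetical and are not the ones in the figures; the second is disconnected, which is the case that gives an infinite diameter.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest-path length; infinite if some pair is not connected."""
    longest = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj):
            return math.inf
        longest = max(longest, max(dist.values()))
    return longest


path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}   # a path on four nodes
split = {1: [2], 2: [1], 3: [4], 4: [3]}        # two separate pairs

print(diameter(path))   # 3
print(diameter(split))  # inf
```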


The four couples

Fig: The four distinct pairs of theorems that we stripped off


Node Betweenness Centrality

Which theorem is ‘important’?

Look at its node betweenness centrality

Definition

$$\mathrm{BC}(n) = \sum_{s \neq n \neq t} \frac{\sigma_{st}(n)}{\sigma_{st}}$$

where $\sigma_{st}(n)$ is the number of shortest paths from node $s$ to node $t$ that pass through node $n$, and $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$.

Fig: An example graph with nodes 1 to 8

In this graph, node 2 has the highest node betweenness centrality.
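The definition can be evaluated directly on a small unweighted graph. The sketch below assumes hop-count shortest paths and sums over unordered pairs {s, t}; it favours clarity over speed and is not the method used to produce the results in the talk. The star-shaped graph is hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def node_betweenness(adj):
    """BC(n) = sum over unordered pairs {s, t}, s != n != t, of sigma_st(n) / sigma_st."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {n: 0.0 for n in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        dist_t, sigma_t = info[t]
        for n in adj:
            if n in (s, t) or n not in dist_s:
                continue
            # n lies on a shortest s-t path exactly when the distances add up.
            if dist_s[n] + dist_t[n] == dist_s[t]:
                bc[n] += sigma_s[n] * sigma_t[n] / sigma_s[t]
    return bc


# Hypothetical star: node 2 sits between every other pair of nodes.
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
bc = node_betweenness(adj)
print(bc)
```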


MVT most ‘important’ intermediary

Fig: Theorems with the highest node betweenness centrality (diagram produced in Gephi)


Normalised Node Betweenness Centrality

Normalised BC(n)

$$\mathrm{BC}(n) = \left( \sum_{s \neq n \neq t} \frac{\sigma_{st}(n)}{\sigma_{st}} \right) \div \binom{N-1}{2}$$

where $N$ is the total number of nodes.

278 nodes have zero node betweenness centrality, which means that they lie on the boundary of the network of mathematical theorems.
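A short sketch of the normalisation step, dividing each raw value by the number of pairs that can be formed from the other N - 1 nodes. The raw values are hypothetical, and the helper names are ours.

```python
from math import comb


def normalise_betweenness(raw_bc):
    """Divide every raw betweenness value by C(N - 1, 2), N being the number of nodes."""
    n_nodes = len(raw_bc)
    if n_nodes < 3:
        raise ValueError("normalisation needs at least three nodes")
    pairs = comb(n_nodes - 1, 2)
    return {node: value / pairs for node, value in raw_bc.items()}


def boundary_nodes(bc):
    """Nodes with zero betweenness, i.e. nodes that lie on no shortest path between others."""
    return [node for node, value in bc.items() if value == 0]


# Hypothetical raw values for a four-node star centred on node 2.
raw_bc = {1: 0.0, 2: 3.0, 3: 0.0, 4: 0.0}
normalised = normalise_betweenness(raw_bc)

print(normalised[2])               # 3 / C(3, 2) = 1.0
print(boundary_nodes(normalised))  # [1, 3, 4]
```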


Edge Betweenness Centrality

Edge betweenness centrality indicates the ‘importance’ of an edge.

Definition

$$\mathrm{BC}(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$

where $\sigma_{st}(e)$ is the number of shortest paths from node $s$ to node $t$ that go through edge $e$, and $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$.

Fig: An example network with nodes 1 to 7

The edge connecting 1 to 4 has the highest edge betweenness centrality.
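The edge version can be computed in the same direct way. The sketch assumes an unweighted graph without self-loops and sums over unordered pairs {s, t}. The example graph is hypothetical (two triangles joined by the edge 1 - 4) and is not the seven-node network in the figure.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """BC(e) = sum over unordered pairs {s, t} of sigma_st(e) / sigma_st."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    bc = {e: 0.0 for e in edges}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        dist_t, sigma_t = info[t]
        for e in edges:
            u, v = tuple(e)
            # Try both directions in which a shortest s-t path could cross the edge.
            for a, b in ((u, v), (v, u)):
                if a in dist_s and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    bc[e] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return bc


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge 1 - 4.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5, 6], 5: [4, 6], 6: [4, 5]}
edge_bc = edge_betweenness(adj)
bridge = max(edge_bc, key=edge_bc.get)
print(sorted(bridge), edge_bc[bridge])
```

Every shortest path between the two triangles has to use the joining edge, so it collects one unit from each of the 3 x 3 cross pairs.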


Most ‘important’ pair of theorems

This pair of theorems has the highest edge betweenness centrality

Mean value theorem - Holditch theorem

This edge lies on most shortest paths between other theorems.


Girvan-Newman Algorithm

Reveals community structure of the network

Algorithm 1 Girvan – Newman Algorithm

1: Calculate the edge betweenness centrality of all the edges.
2: Remove the edge with the highest edge betweenness centrality.
3: Recalculate the edge betweenness centrality for the remaining edges.
4: Repeat step 2 and step 3 until all the edges have been deleted.

Fig: Two drawings of an example network with nodes 1 to 7
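The four steps of the algorithm can be sketched as follows, assuming an unweighted graph without self-loops and breaking ties between equally central edges arbitrarily. The function records each removed edge together with the number of connected components left afterwards, which is where the community structure shows up. The example graph is hypothetical, not the one in the figure.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist, sigma, queue = {source: 0}, {source: 1}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """Edge betweenness summed over unordered pairs {s, t}."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s, t in combinations(adj, 2):
        (dist_s, sigma_s), (dist_t, sigma_t) = info[s], info[t]
        if t not in dist_s:
            continue
        for e in bc:
            u, v = tuple(e)
            for a, b in ((u, v), (v, u)):
                if a in dist_s and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    bc[e] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return bc


def count_components(adj):
    """Number of connected components."""
    seen, count = set(), 0
    for start in adj:
        if start not in seen:
            seen |= set(bfs_counts(adj, start)[0])
            count += 1
    return count


def girvan_newman(adj):
    """Remove edges by highest betweenness until none remain (steps 1 to 4)."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    history = []
    while any(adj.values()):
        bc = edge_betweenness(adj)            # steps 1 and 3
        u, v = tuple(max(bc, key=bc.get))     # step 2
        adj[u].discard(v)
        adj[v].discard(u)
        history.append((frozenset((u, v)), count_components(adj)))
    return history


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge 1 - 4.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5, 6], 5: [4, 6], 6: [4, 5]}
history = girvan_newman(adj)
print(sorted(history[0][0]), history[0][1])  # first edge removed, components left
```

Here the joining edge goes first and the network splits into the two triangles, which is the kind of community split the algorithm is meant to reveal.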


Community Detection


Stability


Community Results

Community (3.#)   Count   Percentage
0                 492     37.70%
1                 735     56.32%
2                 55      4.21%
3                 2       0.15%
4                 13      1.00%
5                 2       0.15%
6                 2       0.15%
7                 2       0.15%
8                 2       0.15%

Normalised Laplacian after deleting central limit theorem, h theorem, divergence theorem, mean value theorem, bayes theorem, implicit function theorem, fluctuation dissipation theorem, gauss theorem and characterisation theorem

Theorem ID   Theorem name                     Frequency
84           final value theorem              483
208          initial value theorem            165
240          gershgorin theorem               111
369          gershgorin circle theorem        90
424          gerschgorin circle theorem       111
506          gauss lucas theorem              40
544          bauer fike theorem               65
736          poincare separation theorem      29
944          gershgorin disc theorem          30
1108         bisectors theorem                6
1132         gerschgorin disc theorem         12
11169        perpendicular bisector theorem   2
1266         sylvester determinant theorem    2


Community Results

Community (4.#)   Count   Percentage
0                 66      5.06%
1                 18      1.38%
2                 49      3.75%
3                 9       0.69%
4                 10      0.77%
5                 17      1.30%
6                 18      1.38%
7                 48      3.68%
8                 7       0.54%
9                 4       0.31%
10                973     74.56%
11                15      1.15%
12                19      0.54%
13                4       0.31%
14                23      1.76%
15                19      1.46%
16                10      0.77%
17                2       0.15%
18                2       0.15%
19                2       0.15%
20                2       0.15%

Theorem ID   Theorem name                     Frequency
9            fubini theorem                   1008
319          sklar theorem                    31
633          savitch theorem                  6
820          perpendicular bisector theorem   14

Inverse Laplacian Waiting Time Dynamics


Minimum spanning forest

A spanning forest of our network H includes all the nodes of H and contains no cycles.

We use Kruskal’s algorithm to create a spanning forest.
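A compact sketch of Kruskal's algorithm with a union-find structure. It is only an illustration: the node names and weights are hypothetical, and the `maximum` flag is our addition to show how the same procedure yields a maximum spanning forest by visiting edges in decreasing weight order.

```python
def spanning_forest(nodes, edges, maximum=False):
    """Kruskal's algorithm on (weight, u, v) edges; works on disconnected graphs too."""
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for weight, u, v in sorted(edges, key=lambda e: e[0], reverse=maximum):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this edge creates no cycle
            parent[root_u] = root_v
            forest.append((weight, u, v))
    return forest


# Hypothetical network with two components: {A, B, C} and {D, E}.
nodes = ["A", "B", "C", "D", "E"]
edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"), (5, "D", "E")]

minimum_forest = spanning_forest(nodes, edges)
maximum_forest = spanning_forest(nodes, edges, maximum=True)

print(sum(w for w, _, _ in minimum_forest))  # 1 + 2 + 5 = 8
print(sum(w for w, _, _ in maximum_forest))  # 5 + 3 + 2 = 10
```

Because the network need not be connected, the result is a forest: one tree per connected component, with (number of nodes) minus (number of components) edges in total.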


Maximum spanning forest
