16 machine learning universal approximation multilayer perceptron
TRANSCRIPT
Neural Networks: Universal Approximation of the Multilayer Perceptron
Andres Mendez-Vazquez
June 16, 2015
Outline
1 Introduction
2 Basic Definitions
  Topology
  Compactness
  About Density in a Topology
  Hausdorff Space
  Measure
  The Borel Measure
  Discriminatory Functions
  The Important Theorem
  Universal Representation Theorem
Introduction
Representation of functions
The main result for the multilayer perceptron concerns its power of representation.
Furthermore
After all, it is quite striking that we can represent continuous functions of the form $f : \mathbb{R}^n \to \mathbb{R}$ as a finite sum of simple functions.
Therefore
Our main goal
We want to know under which conditions a sum of the form

$$G(x) = \sum_{j=1}^{N} \alpha_j f\left(w_j^T x + \theta_j\right) \tag{1}$$

can represent continuous functions on a specific domain.
Setup of the problem
Definition of $I_n$
It is the n-dimensional unit cube $[0, 1]^n$.
In addition, we have the following set of functions

$$C(I_n) = \{ f : I_n \to \mathbb{R} \mid f \text{ is a continuous function} \} \tag{2}$$
Now, some important definitions
Definition (Topological Space)
A topological space is a set X together with a collection of subsets of X, called open sets, satisfying the following axioms:
1 The empty set and X itself are open.
2 Any union of open sets is open.
3 The intersection of any finite number of open sets is open.
Examples
Given the set {1, 2, 3, 4}, the following collection is a topology: {∅, {1}, {1, 2}, {1, 2, 3, 4}}.
Definition (Compactness)
A topological space X is called compact if each open cover has a finite subcover.
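The three axioms can be checked mechanically on a finite set. The following is a minimal sketch that is not part of the original slides; the function name `is_topology` is hypothetical. It assumes a finite collection of candidate open sets, for which closure under pairwise unions and intersections is enough to give closure under all unions and all finite intersections.

```python
from itertools import combinations

def is_topology(X, opens):
    """Check the topology axioms for a finite collection of subsets of X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    # Axiom 1: the empty set and X itself must be open.
    if frozenset() not in opens or X not in opens:
        return False
    # Axioms 2 and 3: for a finite collection, pairwise checks suffice.
    for U, V in combinations(opens, 2):
        if U | V not in opens or U & V not in opens:
            return False
    return True

# The example from the slide.
print(is_topology({1, 2, 3, 4}, [set(), {1}, {1, 2}, {1, 2, 3, 4}]))  # True
# A collection that fails: {1} and {2} are open but their union {1, 2} is not.
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))  # False
```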
Now
Theorem (Heine-Borel)
A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.
Thus
$I_n$ is a compact set in $\mathbb{R}^n$.
Definition (Continuous functions)
A function $f : X \to Y$ is continuous if the pre-image of every open set in Y is open in X.
Thus
We have the following statement
Let K be a nonempty subset of $\mathbb{R}^n$, where $n \geq 1$. If K is compact, then every continuous real-valued function defined on K is bounded.
Definition (Supremum Norm)
Let X be a topological space and let F be the space of all bounded complex-valued continuous functions defined on X. The supremum norm is the norm defined on F by

$$\|f\| = \sup_{x \in X} |f(x)| \tag{3}$$
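To make the supremum norm concrete, here is a small sketch (not from the original slides) that estimates it on the interval [0, 1] by sampling a uniform grid. The helper name `sup_norm_on_grid` and the grid size are assumptions chosen for illustration. A grid maximum is only a lower bound on the true supremum, although for a continuous function it converges to the supremum as the grid is refined.

```python
def sup_norm_on_grid(f, a=0.0, b=1.0, num_points=1001):
    """Estimate sup |f(x)| over [a, b] from evenly spaced sample points."""
    step = (b - a) / (num_points - 1)
    return max(abs(f(a + i * step)) for i in range(num_points))

# f(x) = x (1 - x) attains its maximum 1/4 at x = 1/2, which lies on the grid.
estimate = sup_norm_on_grid(lambda x: x * (1.0 - x))
print(estimate)
```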
Limit Points
Definition
If X is a topological space and p is a point in X, a neighborhood of p is a subset V of X that includes an open set U containing p, that is, $p \in U \subseteq V$.
This is equivalent to p being in the interior of V.
Example in a metric space
In a metric space (X, d), a set V is a neighborhood of a point p if there exists an open ball with center p and radius r > 0,

$$B_r(p) = B(p; r) = \{ x \in X \mid d(x, p) < r \} \tag{4}$$

that is contained in V.
Limit Points
Definition of a Limit Point
Let S be a subset of a topological space X. A point $x \in X$ is a limit point of S if every neighborhood of x contains at least one point of S different from x itself.
Example in $\mathbb{R}$
Which are the limit points of the set $\left\{ \frac{1}{n} \right\}_{n=1}^{\infty}$?
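A short worked answer to this question, offered as a sketch rather than as part of the original slides (the symbols $S$, $m$ and $r$ are introduced here for illustration):

```latex
Let $S = \{ 1/n : n \geq 1 \}$.

\textbf{The point $0$ is a limit point.} Given $r > 0$, choose an integer $n > 1/r$.
Then $0 < 1/n < r$, so the ball $B_r(0)$ contains a point of $S$ different from $0$.

\textbf{No point of $S$ is a limit point.} For $p = 1/m$, the nearest other element
of $S$ is $1/(m+1)$, at distance
\[
  \frac{1}{m} - \frac{1}{m+1} = \frac{1}{m(m+1)} .
\]
The open ball of radius $r = \frac{1}{m(m+1)}$ around $p$ therefore contains
no point of $S$ other than $p$ itself.

\textbf{No other real number is a limit point.} Any $p \notin S \cup \{0\}$ has positive
distance to $S \cup \{0\}$, so a small enough ball around $p$ misses $S$ entirely.

Hence the only limit point of $S$ is $0$, which does not belong to $S$.
```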
This allows us to define the idea of density
Something Notable
A subset A of a topological space X is dense in X if, for any point $x \in X$, every neighborhood of x contains at least one point from A.
Classic Example
The real numbers with the usual topology have the rational numbers as a countable dense subset.
Why do you believe the floating-point numbers are rational?
In addition
The irrational numbers are also dense in the real numbers.
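Both remarks can be illustrated with Python's standard `fractions` module. This is a sketch that is not part of the original slides, and the helper name `rational_within` is hypothetical. It searches denominators q = 1, 2, 3, ... for a fraction closer than `eps` to the target, which must succeed once q exceeds 1/(2 eps). Because the target is stored as a float, it is itself already a rational number, which `float.as_integer_ratio` exposes exactly.

```python
import math
from fractions import Fraction

def rational_within(x, eps):
    """Return a fraction p/q with |p/q - x| < eps by trying q = 1, 2, 3, ..."""
    target = Fraction(x)  # exact conversion: a finite float is already rational
    q = 1
    while True:
        # The nearest fraction with denominator q is within 1/(2q) of the target.
        candidate = Fraction(round(target * q), q)
        if abs(candidate - target) < eps:
            return candidate
        q += 1

approx = rational_within(math.pi, Fraction(1, 1000))
print(approx, float(approx))
# The exact numerator and denominator of the float that stands in for pi.
print(math.pi.as_integer_ratio())
```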
From this, you have the idea of closure
Definition
The closure of a set S is the set of all points of closure of S, that is, the set S together with all of its limit points.
Example
The closure of the set $(0, 1) \cup \{2\}$ is $[0, 1] \cup \{2\}$.
Meaning
Not all points in the closure are limit points: here 2 belongs to the closure but is an isolated point of the set.
Hausdorff Space
Definition of Separation
Points x and y in a topological space X can be separated by neighborhoods if there exist a neighborhood U of x and a neighborhood V of y such that U and V are disjoint.
Definition
X is a Hausdorff space if any two distinct points of X can be separated by neighborhoods.
We have then
Look at what we have
1 $I_n$ is compact.
2 The continuous functions on it are bounded, so the supremum norm is well defined on $C(I_n)$.
Now, the Measure Concept
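Definition of an algebra
Let $\mathcal{A} \subset \mathcal{P}(X)$. We say that $\mathcal{A}$ is an algebra if
1 $\emptyset, X \in \mathcal{A}$.
2 If $A, B \in \mathcal{A}$, then $A \cup B \in \mathcal{A}$.
3 If $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$.
Definition of a σ-algebra
An algebra $\mathcal{A}$ in $\mathcal{P}(X)$ is said to be a σ-algebra if, for any sequence $\{A_n\}$ of elements of $\mathcal{A}$, we have $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
Example
In $X = [0, 1)$, the class $\mathcal{A}_0$ consisting of $\emptyset$ and all finite unions $A = \bigcup_{i=1}^{n} [a_i, b_i)$ with $0 \leq a_i < b_i \leq a_{i+1} \leq 1$ is an algebra.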
Now, the Measure Concept
Definition of additivity
Let $\mu : \mathcal{A} \to [0, +\infty]$ be such that $\mu(\emptyset) = 0$. We say that $\mu$ is σ-additive if, for any family $\{A_i\}_{i \in I} \subset \mathcal{A}$ (where I can be finite or countably infinite) of mutually disjoint sets such that $\bigcup_{i \in I} A_i \in \mathcal{A}$, we have

$$\mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \tag{5}$$

Definition of Measurability
Let $\mathcal{A}$ be a σ-algebra of subsets of X. We say that the pair $(X, \mathcal{A})$ is a measurable space, and a σ-additive function $\mu : \mathcal{A} \to [0, +\infty]$ is called a measure on $(X, \mathcal{A})$.
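As a toy instance of these definitions, the sketch below (not from the original slides) builds a discrete measure on a finite set X, with the power set as the σ-algebra. The factory name `make_discrete_measure` and the point weights are assumptions made for illustration. Exact fractions are used so that additivity can be checked without rounding error.

```python
from fractions import Fraction

def make_discrete_measure(weights):
    """Return mu with mu(A) = sum of the non-negative point weights of A."""
    def mu(A):
        return sum((weights[x] for x in A), Fraction(0))
    return mu

# Hypothetical weights on X = {1, 2, 3}; they sum to 1, so mu is a probability measure.
weights = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
mu = make_discrete_measure(weights)

print(mu(set()))                                # measure of the empty set
print(mu({1}) + mu({2, 3}) == mu({1, 2, 3}))    # additivity on disjoint sets
```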
A Borel Measure
Definition
The Borel σ-algebra is defined to be the σ-algebra generated by the open sets (or equivalently, by the closed sets).
Definition of a Borel Measure
If $\mathcal{F}$ is the Borel σ-algebra on some topological space, then a measure $\mu : \mathcal{F} \to [0, +\infty]$ is said to be a Borel measure (a Borel probability measure when the total mass is 1). With respect to the Borel σ-algebra, all continuous functions are measurable.
Definition of a signed Borel Measure
A signed Borel measure $\mu : \mathcal{B}(X) \to \mathbb{R}$ is a set function such that
1 $\mu(\emptyset) = 0$.
2 $\mu$ is σ-additive.
3 $\sup_{A \in \mathcal{B}(X)} |\mu(A)| < \infty$.
A Borel Measure
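Regularity
A measure $\mu$ is a Borel regular measure if:
1 For every Borel set $B \subseteq \mathbb{R}^n$ and every $A \subseteq \mathbb{R}^n$, $\mu(A) = \mu(A \cap B) + \mu(A - B)$.
2 For every $A \subseteq \mathbb{R}^n$, there exists a Borel set $B \subseteq \mathbb{R}^n$ such that $A \subseteq B$ and $\mu(A) = \mu(B)$.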
Discriminatory Functions
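Definition
Given the set $M(I_n)$ of signed regular Borel measures on $I_n$, a function f is discriminatory if, for a measure $\mu \in M(I_n)$,

$$\int_{I_n} f\left(w^T x + \theta\right) d\mu(x) = 0 \tag{6}$$

for all $w \in \mathbb{R}^n$ and $\theta \in \mathbb{R}$ implies that $\mu = 0$.
Definition
We say that f is sigmoidal if

$$f(t) \to \begin{cases} 1 & \text{as } t \to +\infty \\ 0 & \text{as } t \to -\infty \end{cases}$$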
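The logistic function is the most familiar example of a sigmoidal function. The sketch below is not part of the original slides, and the name `logistic` is chosen here for illustration. It evaluates the function in a way that avoids overflow for large |t| and prints its values far to the left and right, which approach the two limits in the definition.

```python
import math

def logistic(t):
    """Logistic sigmoid 1 / (1 + exp(-t)), evaluated without overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # t < 0, so exp(t) lies in (0, 1) and cannot overflow
    return e / (1.0 + e)

for t in (-50.0, -5.0, 0.0, 5.0, 50.0):
    print(t, logistic(t))
```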
The Important Theorem
Theorem 1
Let f be any continuous discriminatory function. Then finite sums of the form

$$G(x) = \sum_{j=1}^{N} \alpha_j f\left(w_j^T x + \theta_j\right), \tag{7}$$

where $w_j \in \mathbb{R}^n$ and $\alpha_j, \theta_j \in \mathbb{R}$ are fixed, are dense in $C(I_n)$.
Meaning
Basically, given any function $g \in C(I_n)$ and any neighborhood V of g, there is a sum G of this form with $G \in V$.
Furthermore
In other words
Given any $g \in C(I_n)$ and $\epsilon > 0$, there is a sum G(x) of the above form for which

$$|G(x) - g(x)| < \epsilon \quad \forall x \in I_n \tag{8}$$
29 / 39
![Page 63: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/63.jpg)
Proof
Let $S \subset C(I_n)$ be the set of functions of the form $G(x)$
First, $S$ is a linear subspace of $C(I_n)$.

Definition
A subset $V$ of $\mathbb{R}^n$ is called a linear subspace of $\mathbb{R}^n$ if $V$ contains the zero vector and is closed under vector addition and scaling. That is, for $X, Y \in V$ and $c \in \mathbb{R}$, we have $X + Y \in V$ and $cX \in V$. The same definition applies with the vector space $C(I_n)$ in place of $\mathbb{R}^n$.

We claim that the closure of $S$ is all of $C(I_n)$
Assume, for contradiction, that the closure of $S$ is not all of $C(I_n)$.
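The claim that $S$ is a linear subspace can be checked concretely: adding two sums of the form $G$ amounts to concatenating their units, and scaling a sum amounts to rescaling its output weights. The sketch below assumes the logistic function for $f$ and uses hypothetical helper names (`evaluate`, `add`, `scale`); the argument itself does not depend on the choice of $f$.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function, used here as an example f.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def evaluate(params, x):
    """Evaluate G(x) = sum_j alpha_j * f(w_j^T x + theta_j) at each row of x."""
    alpha, W, theta = params          # shapes: (N,), (N, n), (N,)
    return sigmoid(x @ W.T + theta) @ alpha

def add(p, q):
    # G_p + G_q is again of the form G: concatenate the units of both sums.
    return (np.concatenate([p[0], q[0]]),
            np.vstack([p[1], q[1]]),
            np.concatenate([p[2], q[2]]))

def scale(c, p):
    # c * G_p is again of the form G: rescale the output weights alpha_j.
    return (c * p[0], p[1], p[2])

rng = np.random.default_rng(0)
p = (rng.normal(size=3), rng.normal(size=(3, 2)), rng.normal(size=3))
q = (rng.normal(size=5), rng.normal(size=(5, 2)), rng.normal(size=5))
x = rng.random((10, 2))               # ten points in the unit square I_2
print(np.allclose(evaluate(add(p, q), x), evaluate(p, x) + evaluate(q, x)))
```

The zero function is obtained as `scale(0.0, p)`, so $S$ contains the zero vector as well.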
30 / 39
![Page 66: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/66.jpg)
Proof
Then
The closure of $S$, say $R$, is a closed proper subspace of $C(I_n)$.

We use the Hahn-Banach Theorem
Let $p : V \to \mathbb{R}$ be a sublinear function, i.e. $p(x + y) \le p(x) + p(y)$ and $p(\alpha x) = \alpha\, p(x)$ for all scalars $\alpha \ge 0$, and let $\varphi : U \to \mathbb{R}$ be a linear functional on a linear subspace $U \subseteq V$ which is dominated by $p$ on $U$, i.e. $\varphi(x) \le p(x)$ $\forall x \in U$.

Then
There exists a linear extension $\psi : V \to \mathbb{R}$ of $\varphi$ to the whole space $V$, i.e., there exists a linear functional $\psi$ such that

1. $\psi(x) = \varphi(x)$ $\forall x \in U$.
2. $\psi(x) \le p(x)$ $\forall x \in V$.
31 / 39
![Page 71: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/71.jpg)
Proof
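It is possible to construct the required functional as follows
Since $R$ is a closed proper subspace, pick $g_0 \in C(I_n) \setminus R$, so that $d = \inf_{r \in R} \|g_0 - r\|_\infty > 0$. On the subspace $U = R \oplus \operatorname{span}\{g_0\}$ we define the following linear functional

$$\varphi(r + t\, g_0) = t\, d, \qquad r \in R,\ t \in \mathbb{R} \tag{9}$$

Then
Taking $V = C(I_n)$ and the sublinear function $p(h) = \|h\|_\infty$, we have $\varphi \le p$ on $U$, so the Hahn-Banach Theorem gives a bounded linear functional $L \neq 0$ on $C(I_n)$ with $L(R) = L(S) = 0$.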
32 / 39
![Page 73: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/73.jpg)
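Proof

Now, we use the Riesz Representation Theorem
Let $X$ be a locally compact Hausdorff space. For any positive linear functional $\psi$ on $C_c(X)$ (which equals $C(X)$ when $X$ is compact, as $I_n$ is), there is a unique regular Borel measure $\mu$ on $X$ such that

$$\psi(f) = \int_X f(x)\, d\mu(x) \tag{10}$$

for all $f$ in $C_c(X)$.

We can then do the following
Applying the version of the theorem for bounded linear functionals, which yields a signed measure, we can write

$$L(h) = \int_{I_n} h(x)\, d\mu(x) \tag{11}$$

Where?
For some $\mu \in M(I_n)$, for all $h \in C(I_n)$.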
33 / 39
![Page 76: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/76.jpg)
Proof
In particular
Given that $f\left(w^T x + \theta\right)$ is in $R$ for all $w$ and $\theta$, we must have that

$$\int_{I_n} f\left(w^T x + \theta\right) d\mu(x) = 0 \tag{12}$$

for all $w$ and $\theta$.

But we assumed that f is discriminatory!!!
Then... $\mu = 0$, contradicting the fact that $L \neq 0$!!! We have a contradiction!!!
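The chain of implications behind this contradiction can be written out in one display. It uses only the symbols already introduced ($S$, $R$, $L$, $\mu$) and is meant as a reading aid for the argument, not as an additional result.

```latex
\begin{align*}
f\left(w^T x + \theta\right) \in S \subseteq R
  \;&\Longrightarrow\; L\big(f(w^T \cdot + \theta)\big)
      = \int_{I_n} f\left(w^T x + \theta\right) d\mu(x) = 0
      \quad \text{for all } w, \theta
      && \text{since } L(R) = 0 \\
  \;&\Longrightarrow\; \mu = 0
      && \text{since } f \text{ is discriminatory} \\
  \;&\Longrightarrow\; L(h) = \int_{I_n} h(x)\, d\mu(x) = 0
      \quad \text{for all } h \in C(I_n)
      && \text{i.e. } L = 0 .
\end{align*}
```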
34 / 39
![Page 79: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/79.jpg)
Proof
Finally
The subspace $S$ of sums of the form $G$ is dense in $C(I_n)$!!!
35 / 39
![Page 80: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/80.jpg)
Now, we deal with the sigmoidal function
Lemma 1
Any bounded, measurable sigmoidal function $f$ is discriminatory. In particular, any continuous sigmoidal function is discriminatory.

Proof
I will leave this to you... it is possible that a question from this proof will appear on the first midterm.
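One common starting point for this lemma is that, for a sigmoidal $f$, the rescaled function $f(\lambda t + \varphi)$ converges pointwise as $\lambda \to +\infty$ to $1$ for $t > 0$, to $0$ for $t < 0$, and to $f(\varphi)$ at $t = 0$. The sketch below only illustrates that limit numerically for the logistic function; the choice of $f$, the value of `phi`, and the function names are assumptions made for the illustration, and it is not a proof.

```python
import numpy as np

def logistic(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def rescaled(t, lam, phi=0.3):
    # f(lam * t + phi): the transition around t = 0 sharpens as lam grows.
    return logistic(lam * np.asarray(t, dtype=float) + phi)

t = np.array([-0.1, 0.0, 0.1])
for lam in (1.0, 10.0, 1000.0):
    print(lam, rescaled(t, lam))
```

For large `lam` the output approaches the indicator of the half-line $t > 0$, except at $t = 0$ where it stays at `logistic(phi)`.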
36 / 39
![Page 83: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/83.jpg)
We have the theorem finally!!!
Universal Representation Theorem for the multi-layer perceptron
Let $f$ be any continuous sigmoidal function. Then finite sums of the form

$$G(x) = \sum_{j=1}^{N} \alpha_j f\left(w_j^T x + \theta_j\right) \tag{13}$$

are dense in $C(I_n)$.

In other words
Given any $g \in C(I_n)$ and $\varepsilon > 0$, there is a sum $G(x)$ of the above form for which

$$|G(x) - g(x)| < \varepsilon \quad \forall x \in I_n \tag{14}$$
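For $n = 1$ the statement can be made concrete with a staircase construction: sample $g$ once per cell of a uniform partition of $[0, 1]$ and place one steep sigmoid at each interior cell boundary, weighted by the jump between neighbouring samples. This is a standard constructive illustration in one dimension, not the non-constructive proof given in these slides. The target function, the number of cells, the steepness factor, and all function names are assumptions chosen for the sketch.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def staircase_params(g, n_cells):
    """Parameters (alpha, w, theta) of a sum G approximating g on [0, 1]."""
    steepness = 40.0 * n_cells                      # assumed: steps sharp relative to cell width
    centers = (np.arange(n_cells) + 0.5) / n_cells  # one sample of g per cell
    knots = np.arange(1, n_cells) / n_cells         # interior cell boundaries
    levels = g(centers)
    # Unit 0 has w = 0 and a large bias, so it acts as the constant levels[0];
    # unit j >= 1 adds the jump levels[j] - levels[j-1] at knot j.
    alpha = np.concatenate(([levels[0]], np.diff(levels)))
    w = np.concatenate(([0.0], np.full(n_cells - 1, steepness)))
    theta = np.concatenate(([40.0], -steepness * knots))
    return alpha, w, theta

def G(x, alpha, w, theta):
    # G(x) = sum_j alpha_j * sigmoid(w_j * x + theta_j), evaluated on an array x.
    x = np.asarray(x, dtype=float)
    return sigmoid(np.outer(x, w) + theta) @ alpha

def sup_error(g, n_cells, n_grid=5001):
    # Maximum of |G - g| over a fine grid, a proxy for the sup norm on [0, 1].
    xs = np.linspace(0.0, 1.0, n_grid)
    return float(np.max(np.abs(G(xs, *staircase_params(g, n_cells)) - g(xs))))

def target(x):
    # Example continuous function on [0, 1].
    return np.sin(2.0 * np.pi * x)

for n in (20, 200):
    print(n, sup_error(target, n))
```

Increasing the number of cells shrinks the measured error, which mirrors the role of $N$ in the theorem: a smaller $\varepsilon$ generally needs more terms in the sum.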
38 / 39
![Page 85: 16 Machine Learning Universal Approximation Multilayer Perceptron](https://reader033.vdocuments.mx/reader033/viewer/2022042707/586e84fb1a28aba0038b620d/html5/thumbnails/85.jpg)
Proof

Simple
Combine Theorem 1 and Lemma 1: because continuous sigmoidal functions satisfy the conditions of the lemma, they are discriminatory, so the theorem applies and we have our representation!!!
39 / 39