Linear Models & Clustering
DESCRIPTION
Linear Models & Clustering. Presented by Kwak, Nam-ju. Coverage: classification (some tools for classification, linear regression, multiresponse linear regression, logistic regression, perceptron), instance-based learning (basic understanding, kD-tree, ball tree), and clustering.
TRANSCRIPT
Slide 1
Linear Models & Clustering
Presented by Kwak, Nam-ju
Slide 2
Coverage
Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron
Instance-based learning
• Basic understanding
• kD-tree
• Ball tree
Clustering
• Clustering and types of clustering
• Iterative distance-based clustering
• Faster distance calculation
Slide 3
Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron
Slide 4
Some tools for classification
An input is categorized into one of several collections of data based on its features or attributes.
Classification is important in that we can distinguish a set of data having common characteristics from others.
Diagram: some classification operations mapping an input to a class.
Slide 5
Some tools for classification
Decision tree & classification rule: computing x XOR y.

Decision tree: the root tests x=1?; each yes/no branch then tests y=1?; the leaves are b, a, a, b.

Classification rules:
If x=1 and y=0 then class=a
If x=0 and y=1 then class=a
If x=0 and y=0 then class=b
If x=1 and y=1 then class=b
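The four rules above translate directly into a tiny classifier; a minimal sketch (the function name is illustrative, not from the slides):

```python
def classify(x, y):
    """Classification rules from the decision tree: class 'a' iff x XOR y."""
    if x == 1 and y == 0:
        return "a"
    if x == 0 and y == 1:
        return "a"
    return "b"  # x == y: covers (0, 0) and (1, 1)
```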
Slide 6
Linear regression
If attributes and classes are numeric values, we can express the resulting class of a given input as a linear transformation of the attributes of the input with a certain set of weights:
x = w0a0 + w1a1 + … + wkak
x: class, wi: weight, ai: attribute (with a0 = 1 for the constant term)
It is important to set the wi well, so that the transformation results in a desirable class for the given attributes of an input.
Slide 7
Linear regression
Here we introduce a simple way to make a machine “learn”.
A machine takes several training instances, each of which associates a set of attributes with a class.
It extracts a rule from the training instances, then builds and tunes a mechanism to infer a class for an unknown test example.
It then gives us an inferred class using the “learnt” knowledge.
Slide 8
Linear regression
n training instances will be given; that is, n sets of attributes and n corresponding classes are provided.
x(i): the corresponding ACTUAL class of the i-th training instance
aj(i): the j-th attribute of the i-th training instance
It is clear that we should find the set of wj's minimizing the sum of squared errors:
∑i=1..n ( x(i) − ∑j=0..k wj aj(i) )²
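This minimization is ordinary least squares. A minimal sketch for the one-attribute case, using the closed-form solution of the normal equations (the data here is made up for illustration; it follows x = 1 + 2a exactly):

```python
# Simple least squares for one attribute plus a constant term a0 = 1:
# find (w0, w1) minimizing sum_i (x_i - (w0 + w1 * a_i))^2.
def fit_least_squares(a, x):
    n = len(a)
    mean_a = sum(a) / n
    mean_x = sum(x) / n
    # Closed-form solution of the normal equations for one attribute.
    w1 = (sum((ai - mean_a) * (xi - mean_x) for ai, xi in zip(a, x))
          / sum((ai - mean_a) ** 2 for ai in a))
    w0 = mean_x - w1 * mean_a
    return w0, w1

w0, w1 = fit_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```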
Slide 9
Multiresponse linear regression
Linear regression is performed for each appearing class individually, with the training instances, such that the value of the linear transformation becomes 1 for training instances of that class and 0 for the others.
Let us assume that we are doing linear regression for a certain class: the regression targets 1 for members of that class and 0 for non-members.
It looks like a membership function.
Slide 10
Multiresponse linear regression
Now, with a given test example, we evaluate the linear transformation for each class using the wi of that class.
Select the class which gives the largest value as the class of the test example.

Diagram: the test example is fed to the linear transformation of each class (Class 1, Class 2, … , Class n); each yields a value, and the class with the largest value wins.
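The selection step above can be sketched as follows (the per-class weights here are illustrative stand-ins, assumed already fitted by the per-class regressions):

```python
def linear_value(w, a):
    # w0*a0 + w1*a1 + ... + wk*ak, with a0 = 1 for the constant term.
    return sum(wj * aj for wj, aj in zip(w, [1.0] + list(a)))

def predict_class(weights_per_class, a):
    # Evaluate the linear transformation for each class and
    # select the class which gives the largest value.
    return max(weights_per_class, key=lambda c: linear_value(weights_per_class[c], a))

# Illustrative weights for three classes (not fitted from real data).
weights = {
    "c1": [0.1, 1.0, 0.0],
    "c2": [0.1, 0.0, 1.0],
    "c3": [0.5, 0.0, 0.0],
}
```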
Slide 11
Logistic regression
Logit function: logit(p) = ln( p / (1 − p) )
Inverse logit function: logit⁻¹(x) = 1 / (1 + e^(−x))
(From Wikipedia)
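Both functions are one-liners; a quick sketch (function names are illustrative):

```python
import math

def logit(p):
    # logit(p) = ln(p / (1 - p)), defined for 0 < p < 1.
    return math.log(p / (1.0 - p))

def inv_logit(x):
    # Inverse logit (a.k.a. sigmoid): 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))
```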
Slide 12
Logistic regression
P(1|a1, … , ak): for a certain class, the probability that a test example consisting of a1, … , ak is of that class:
P(1|a1, … , ak) = 1 / (1 + e^(−(w0a0 + w1a1 + … + wkak)))
We set the wi's to maximize the log-likelihood for each class.
Slide 13
Logistic regression
Plain multiresponse regression doesn't guarantee that each linear transformation value is between 0 and 1.
With logistic regression, the value is between 0 and 1, satisfying one of the important conditions for being regarded as a probability.
However, the sum of the values over all the classes may not be 1.
Slide 14
Logistic regression
Pairwise classification: for every pair of classes, namely the first one and the second one, the meaning of P(1|a1, … , ak) changes somewhat.
P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the first class
P(0|a1, … , ak) = 1 − P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the second class
The regression is done only on training instances of either the first or the second class of the pair.
Slide 15
Logistic regression
For each pair of classes, namely the first one and the second one: if P(1|a1, … , ak) is above 0.5, the resulting class is the first one; otherwise, the second one.
We can count how many times each class wins a pairwise classification. The class which wins the most times is the final resulting class for the given test example.
Slide 16
Logistic regression
Diagram: for each pair of classes (h vs. i, i vs. j, i vs. k, …), test whether P(1|a1, … , ak) ≥ 0.5 to decide the pairwise winner; the final winner is the class that wins the most pairwise contests.
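The voting scheme can be sketched as follows (the pairwise decision function is assumed given, e.g. from fitted pairwise logistic models; the demo function below is a made-up stand-in):

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(classes, prob_first, a):
    """prob_first(c1, c2, a): P(1|a1..ak), the probability that example a
    is of c1 rather than c2. The class winning the most pairwise
    contests is the final result."""
    wins = Counter()
    for c1, c2 in combinations(classes, 2):
        wins[c1 if prob_first(c1, c2, a) >= 0.5 else c2] += 1
    return wins.most_common(1)[0][0]

# Illustrative decision function: the lexicographically larger class always wins.
demo = lambda c1, c2, a: 1.0 if c1 > c2 else 0.0
```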
Slide 17
Perceptron
Sometimes we only need to know which class a test example belongs to, without any information about probabilities.
Assumptions for simplification:
• Only two classes are of interest.
• Linearly separable: the data space can be separated with a single hyperplane.
Diagram: a linearly separable data set versus one that is not linearly separable.
Slide 18
Perceptron
Recall that this concerns a pair of two classes, namely the first class and the second one.
If a test example makes the value above 0, the example is of the first class; if it makes the value below 0, the example is of the second class.
We will find the wj's as described above. In other words, we are looking for a hyperplane:
w0a0 + w1a1 + … + wkak = 0
Slide 19
Perceptron
Algorithm PERCEPTRON LEARNING RULE:
Initialize all the wj's to 0
Until all the training instances are properly classified
  For each training instance A
    If A is wrongly classified by the current perceptron
      If A is actually of the first class, add A to the wj's
      If A is actually of the second class, subtract A from the wj's

When a misclassified instance is found, the parameters of the perceptron hyperplane are modified so that the instance may be classified correctly in the future.
If A is added to the wj's:
• (w0, w1, … , wk) ☞ (w0+a0, w1+a1, … , wk+ak)
• w0a0 + w1a1 + … + wkak ☞ w0a0 + w1a1 + … + wkak + ∑aj²
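The learning rule above can be sketched as a training loop (labels +1/−1 for the first/second class and the toy data are illustrative choices):

```python
def train_perceptron(data, max_epochs=100):
    """data: list of (attributes, cls) with cls +1 for the first class and
    -1 for the second. Returns weights (w0, ..., wk) of the hyperplane."""
    k = len(data[0][0])
    w = [0.0] * (k + 1)            # initialize all the wj's to 0
    for _ in range(max_epochs):
        mistakes = 0
        for a, cls in data:
            aa = [1.0] + list(a)   # a0 = 1 for the constant term
            value = sum(wj * aj for wj, aj in zip(w, aa))
            if cls * value <= 0:   # wrongly classified (or on the hyperplane)
                # add A for the first class, subtract A for the second
                w = [wj + cls * aj for wj, aj in zip(w, aa)]
                mistakes += 1
        if mistakes == 0:          # all training instances properly classified
            break
    return w

train = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w = train_perceptron(train)
```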
Slide 20
Perceptron
Perceptron: the hyperplane found in such a way.
The perceptron is the grandfather/grandmother of the neural network.

Diagram: input nodes a0, a1, … , ak, weighted by w0, w1, … , wk, feed a single output node.
An instance is input into the perceptron. The attributes of the instance activate the input layer. The attributes are linearly transformed with the weights and sent to the output node. The output node signals 1 if the received value is above 0, and −1 otherwise.
Slide 21
Instance-based learning
• kD-tree
• Ball tree
Slide 22
Basic understanding
Find the training instance which is the closest to the test example and predict the class from it.
Distance measures:
• Euclidean distance
• Alternatives
Normalizing the attributes keeps any one attribute from dominating the distance.
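The nearest-neighbor prediction can be sketched as (training data and class labels are illustrative, attributes assumed already normalized):

```python
def euclidean(p, q):
    # Euclidean distance between two attribute vectors.
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

def nearest_class(training, example):
    """training: list of (attributes, cls). Predict the class of the
    training instance closest to the test example."""
    _, cls = min(training, key=lambda t: euclidean(t[0], example))
    return cls

train = [((7.0, 4.0), "a"), ((2.0, 2.0), "b"), ((6.0, 7.0), "a"), ((3.0, 8.0), "b")]
```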
Slide 23
kD-tree
k: the number of attributes. Assume that k = 2.
Example points: (7, 4), (2, 2), (6, 7), (3, 8).
Diagram: the points plotted in the plane, and the corresponding kD-tree with internal nodes splitting on one attribute at a time.
Slide 24
kD-tree
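A minimal sketch of building a kD-tree over the slide's four points, cycling through the k = 2 attributes by depth and splitting at the median (the exact node layout in the slide's figure may differ from this construction):

```python
def build_kdtree(points, depth=0, k=2):
    """Split on attribute (depth mod k); the median point becomes the node."""
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1, k),
        "right": build_kdtree(points[mid + 1:], depth + 1, k),
    }

tree = build_kdtree([(7, 4), (2, 2), (6, 7), (3, 8)])
```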
Slide 25
Ball tree
Diagram: a ball tree in which each node is labeled with the number of instances in its ball (root: 8; its children: 5 and 5; leaves: 2, 2, 3 and 2).
Slide 26
Clustering
• Clustering and types of clustering
• Iterative distance-based clustering
• Faster distance calculation
Slide 27
Clustering and types of clustering
There is no class to be predicted; the instances are to be divided into groups.
Types of clustering:
• Exclusive
• Overlapping
• Probabilistic
• Hierarchical
Slide 28
Iterative distance-based clustering
Also called k-means.
Step 1: Select k points randomly as the centers of k clusters.
Step 2: Associate each instance with the center closest to it.
Step 3: After all the instances are associated, compute for each cluster the centroid of the instances of that cluster. This centroid becomes the new center of the cluster.
Step 4: With the new centers for the clusters, the same steps are repeated.
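The four steps can be sketched as follows (the fixed iteration count, seed, and toy data are illustrative simplifications; a real implementation would stop when the centers stabilize):

```python
import random

def kmeans(instances, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(instances, k)          # Step 1: random initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in instances:                     # Step 2: assign to closest center
            d = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        for i, cluster in enumerate(clusters):  # Step 3: centroids become centers
            if cluster:
                centers[i] = tuple(sum(col) / len(cluster) for col in zip(*cluster))
    return centers                              # Step 4: repeat with new centers

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers = kmeans(data, 2)
```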
Slide 29
Iterative distance-based clustering
Diagram: the best solution for k = 2, contrasted with the clustering obtained when the randomly selected initial centers are placed badly; k-means can converge to a suboptimal result depending on the initial centers.
Slide 30
Faster distance calculation
For each node of a ball tree, keep the sum of all the instances and the number of instances belonging to the ball the node represents.
Traversing the tree from top to bottom, find the closest cluster center for each instance.
If an entire ball of a node belongs to a certain cluster center, we need not traverse its child nodes: the sum and count stored in the node are enough to update that center.
Slide 31
Faster distance calculation
Slide 32
Conclusion
Any questions?