01. Introduction

Jeonghun Yoon

Posted on 12-Apr-2017

Machine Learning?

Learning

○ The adaptability by which, after performing a task, a system revises and supplements its knowledge based on the experience gained during that reasoning process, so that the next time it faces the same or a similar problem it can solve it more efficiently and effectively than before.

○ Learning encompasses a variety of processes, such as acquiring new declarative knowledge, developing cognitive skills through instruction and practice, organizing new knowledge into general and effective representations, and discovering new facts or theories through observation or experiment.

※ Source: Wikipedia

Machine Learning

○ Machine learning is a branch of artificial intelligence concerned with developing algorithms and techniques that allow computers to learn. For example, through machine learning a system can be trained to decide whether an incoming e-mail is spam or not.

○ "The field of study concerned with a computer's ability to learn, that is, the ability to carry out behavior that was not explicitly defined in code."

○ It refers to the ability to make predictions based on properties already learned from training data.

※ Source: Wikipedia

Data Mining

○ The systematic, automated discovery of statistical rules or patterns in large stores of data.

○ It is also referred to as KDD (knowledge discovery in databases).

※ Source: Wikipedia

1. Introduction of Machine Learning
2. Naive Bayesian Classifier
3. Linear Regression
4. Logistic Regression
5. K-means Clustering
6. Graph Mining
7. Dimensionality Reduction (PCA)
8. Spectral Clustering
9. Association Rule Mining
10. Bayesian Network 1 & 2
11. Decision Tree
12. Support Vector Machine (SVM) 1 & 2
13. Hidden Markov Model (HMM)
14. Markov Chain Monte Carlo (MCMC)
15. Gibbs Sampling
16. Latent Dirichlet Allocation (LDA)
17. Neural Networks

Definition (Machine Learning)

A set of methods that can automatically detect patterns in data, and then use
the uncovered patterns to predict future data, or to perform other kinds of
decision making under uncertainty.

※ "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy

Supervised Learning
  ○ Classification: Bayesian Classifier, Logistic Regression, KNN Classifier, Support Vector Machine (SVM)
  ○ Regression: Linear Regression

Unsupervised Learning
  ○ Clustering: K-means Clustering, Spectral Clustering
  ○ Dimensionality Reduction: PCA

Reinforcement Learning

A feature vector is an 𝑛-dimensional vector of numerical features that represent some object.

For example, for a document, let 𝑥𝑖 be the 𝑖-th word in the document: "I love you." ⇒ 𝕩 = (𝐼, 𝑙𝑜𝑣𝑒, 𝑦𝑜𝑢)

The mapping 𝑓 takes a feature vector 𝕩 to an output 𝑦, which is either a discrete or a continuous value:

○ E-mail (words) → Spam or Not (discrete)
○ Web site (words) → Sports, Science, or News (discrete)
○ Flower (appearance) → Line-flower or Mass-flower (discrete)
○ Child's height → Father's height (continuous)
○ Number of rooms, size of rooms → House price (continuous)

For example, a spam mail can be represented by the feature vector (visit, money, buy, girl, Viagra) and mapped by 𝑓 to the label "spam".
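The feature-vector idea above can be sketched in code. This is a minimal bag-of-words illustration, using the slide's (visit, money, buy, girl, Viagra) word list; the counting scheme and the toy rule inside 𝑓 are assumptions for illustration, not a real spam filter.

```python
# Minimal sketch: turn a document into a feature vector over a fixed
# vocabulary, then apply a toy mapping f. The decision rule is made up.
vocabulary = ["visit", "money", "buy", "girl", "viagra"]

def to_feature_vector(document):
    """Count how often each vocabulary word appears in the document."""
    words = document.lower().split()
    return [words.count(w) for w in vocabulary]

def f(x):
    """Toy classifier: flag as spam if any vocabulary word occurs."""
    return "spam" if sum(x) > 0 else "not spam"

x = to_feature_vector("Buy Viagra now and win money")
print(x)     # [0, 1, 1, 0, 1]
print(f(x))  # spam
```

A real system would learn the rule inside 𝑓 from labeled examples instead of hard-coding it.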

Goal of Supervised Learning (predictive learning)

To learn a mapping (function) from input 𝕩 to output 𝑦, given a labeled set of
input-output pairs 𝐷 = {(𝕩𝑖, 𝑦𝑖)}, 𝑖 = 1..𝑁, where 𝐷 is called the training set and 𝑁
is the number of training examples.

- 𝕩𝑖 : 𝐷-dimensional vector of numbers ⇒ feature vector
- 𝑦𝑖 : response variable ⇒ categorical or real-valued

If 𝑦𝑖 is categorical, the problem is classification; if 𝑦𝑖 is real-valued, the problem is regression.
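The training-set notation above can be made concrete. The data values below are made up, and the type check is only a toy way of expressing "categorical vs. real-valued".

```python
# Sketch: a labeled training set D = {(x_i, y_i)}, i = 1..N.
# Categorical labels -> classification; real-valued labels -> regression.
D_classification = [([1, 0, 2], "spam"), ([0, 0, 0], "not spam")]
D_regression = [([3, 80.0], 250.0), ([2, 55.0], 180.0)]  # rooms/size -> price

def task_type(D):
    """Decide the task from the type of the response variable y."""
    _, y = D[0]
    return "regression" if isinstance(y, float) else "classification"

print(task_type(D_classification))  # classification
print(task_type(D_regression))      # regression
```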

What is the natural grouping among these objects? (Illustration: members of the Simpsons family and school employees, which could also be grouped as females vs. males.)

Goal of Unsupervised Learning (descriptive learning)

To find "interesting patterns" in the data, given 𝐷 = {𝕩𝑖}, 𝑖 = 1..𝑁, where 𝐷 is called
the training set and 𝑁 is the number of training examples.

- 𝕩𝑖 : 𝐷-dimensional vector of numbers ⇒ feature vector

It is also known as knowledge discovery.

Goal of Reinforcement Learning

To learn how to act or behave when given occasional reward or punishment
signals.


Goal of Classification

To learn a mapping (function) from input 𝕩 to output 𝑦, where 𝑦 ∈ {1, … , 𝐶},
with 𝐶 being the number of classes.

Function approximation

Assume that 𝑦 = 𝑓(𝕩) for some unknown function 𝑓. The goal of learning is to
estimate the function 𝑓 given a labeled training set, and then to make predictions
using 𝑦∗ = 𝑓∗(𝕩). We can then make predictions on novel inputs.

One way to formalize the problem is to compute our "best guess" using

𝑦∗ = 𝑓∗(𝕩) = argmax 𝑐=1..𝐶 𝑝(𝑦 = 𝑐|𝕩, 𝐷)

This corresponds to the most probable class label, and is known as the MAP estimate (maximum a posteriori). E.g., for mail spam filtering, compute 𝑝(𝑦 = 1|𝕩, 𝐷), 𝑝(𝑦 = 2|𝕩, 𝐷), …, 𝑝(𝑦 = 𝐶|𝕩, 𝐷) and pick the largest.
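The MAP decision rule is just an argmax over the class posteriors. In this sketch the posterior values are made-up numbers standing in for an already-computed 𝑝(𝑦 = 𝑐|𝕩, 𝐷).

```python
# Sketch of the MAP decision rule: pick the class c that maximizes the
# posterior p(y = c | x, D). The probabilities here are hypothetical.
posteriors = {"spam": 0.85, "not spam": 0.15}

def map_estimate(posteriors):
    """Return the most probable class label, argmax_c p(y = c | x, D)."""
    return max(posteriors, key=posteriors.get)

print(map_estimate(posteriors))  # spam
```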

Goal of Regression

To learn a mapping (function) from input 𝕩 to output 𝑦, where 𝑦 is continuous, e.g.

𝑦 = 𝑤0 + 𝑤1𝑥 (linear) or 𝑦 = 𝑤0 + 𝑤1𝑥 + 𝑤2𝑥² (quadratic)
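Both model families above can be fit by least squares. This is a minimal sketch using NumPy's `polyfit`; the data points are synthetic values chosen to lie roughly on 𝑦 = 1 + 2𝑥, purely for illustration.

```python
import numpy as np

# Least-squares fits of y = w0 + w1*x (degree 1) and
# y = w0 + w1*x + w2*x^2 (degree 2) on made-up data near y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w1 = np.polyfit(x, y, deg=1)  # coefficients, highest degree first
w2 = np.polyfit(x, y, deg=2)

print(w1)  # approximately [2.0, 1.0]: slope ~2, intercept ~1
```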


Goal of Clustering

To estimate the distribution over the number of clusters, 𝑝(𝐾|𝐷); this tells us
whether there are subpopulations within the data. We often approximate this
distribution by

𝐾∗ = argmax 𝐾 𝑝(𝐾|𝐷)

Also, to estimate which cluster each point belongs to. We can infer the cluster of
each data point by computing

𝑧𝑖∗ = argmax 𝑘 𝑝(𝑧𝑖 = 𝑘|𝕩𝑖, 𝐷)

i.e., compute 𝑝(𝑧𝑖 = 1|𝕩𝑖, 𝐷), 𝑝(𝑧𝑖 = 2|𝕩𝑖, 𝐷), …, 𝑝(𝑧𝑖 = 𝐾|𝕩𝑖, 𝐷) and pick the largest.
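The hard assignment 𝑧𝑖∗ can be sketched as K-means does it: assign each point to its nearest centroid, which plays the role of the argmax over clusters. The points and centroids below are made up, and 𝐾 = 2 is assumed.

```python
import numpy as np

# Sketch: hard cluster assignment z_i = argmax_k p(z_i = k | x_i, D),
# approximated (as in K-means) by the nearest of K given centroids.
points = np.array([[0.1, 0.2], [0.0, 0.0], [5.0, 5.1], [4.9, 4.8]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # K = 2, made-up centroids

# Squared distance from every point to every centroid, shape (N, K)
d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
z = d2.argmin(axis=1)  # index of the closest centroid for each point

print(z)  # [0 0 1 1]
```

A full K-means run would alternate this assignment step with recomputing each centroid as the mean of its assigned points.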

Goal of Dimensionality Reduction

To reduce the dimensionality by projecting the data to a lower-dimensional
subspace which captures the "essence" of the data.

Motivation: although the data may appear high-dimensional, there may be only a small number of degrees of variability, corresponding to latent factors (the factors that describe most of the variability).
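The projection idea can be sketched with PCA via the SVD: center the data, take the top singular direction as the latent factor, and project onto it. The data matrix below is made up; its rows lie close to a single line, so one component captures almost all the variability.

```python
import numpy as np

# Sketch of PCA: project centered 2-D data onto the top principal
# direction (a 1-D subspace). Data values are made up for illustration.
X = np.array([[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [0.0, -0.2]])

Xc = X - X.mean(axis=0)                       # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:1].T                                  # top principal direction
Z = Xc @ W                                    # 1-D representation, shape (4, 1)

# Fraction of total variance captured by the first component
explained = S[0] ** 2 / (S ** 2).sum()
print(Z.shape, explained)  # close to 1.0 for this near-collinear data
```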

※ "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy

http://ko.wikipedia.org/wiki/기계_학습