deep learning with python. 파이썬 (python) 이란 ? 1991 년 guido van rossum 이 발표한...

Post on 29-Dec-2015

252 Views

Category:

Documents

4 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Deep Learning with Python

파이썬 (python) 이란 ?

• 1991 년 Guido van Rossum 이 발표한 인터프리터 언어

• Google 의 3 대 개발언어 (C/C++, Java, Python)

파이썬의 특징• 가독성

• 풍부한 라이브러리

• 접착성

• 무료

• 유니코드

파이썬의 종류• Cpython

• C 로 작성된 인터프리터

• Jython• 자바 가상 머신용 인터프리터

• IronPhthon• .Net 과 Mono 용 인터프리터 . C# 으로 구현됨

• PyPy• 파이썬으로 작성된 파이썬 인터프리터

설치 및 개발환경• http://www.python.org/downloads

파이썬 커맨드라인

파이썬 IDLE

자료형 - 수치

자료형 - 문자열

자료형 – List, Set, Tuple, Dictionary

Shallow/Deep Copy

Function

Modules and Packages

• Python modules “package program code and data for re-use.” (Lutz)

– Similar to library in C, package in Java.

• Python packages are hierarchical modules (i.e., modules that contain other modules).

• Three commands for accessing modules:1. import2. from…import3. reload

Modules and Packages: import

• The import command loads a module:# Load the regular expression module>>> import re

• To access the contents of a module, use dotted names:# Use the search method from the re module>>> re.search(‘\w+’, str)

• To list the contents of a module, use dir:>>> dir(re) [‘DOTALL’, ‘I’, ‘IGNORECASE’,…]

Modules and Packagesfrom…import• The from…import command loads individual functions

and objects from a module:# Load the search function from the re module>>> from re import search

• Once an individual function or object is loaded with from…import, it can be used directly:

# Use the search method from the re module>>> search (‘\w+’, str)

Import vs. from…import

Import• Keeps module

functions separate from user func-tions.

• Requires the use of dotted names.

• Works with reload.

from…import• Puts module func-

tions and user functions together.

• More convenient names.

• Does not work with reload.

Modules and Packages: reload

• If you edit a module, you must use the reload com-mand before the changes become visible in Python:

>>> import mymodule...>>> reload (mymodule)

• The reload command only affects modules that have been loaded with import; it does not update individual functions and objects loaded with from...import.

NumPy

• Fundamental package for scientific computing with Python

• It contains among other things:• a powerful N-dimensional array object• sophisticated (broadcasting) functions• tools for integrating C/C++ and Fortran code• useful linear algebra, Fourier transform, and random number

capabilities

NumPy: Functions & Attributes

NumPy: N-dimensional array object

파이썬과 자연언어처리• Python is a great language for NLP:

• Simple• Easy to debug:

• Exceptions• Interpreted language

• Easy to structure• Modules• Object oriented programming

• Powerful string manipulation

Introduction to NLTK

• The Natural Language Toolkit (NLTK) provides:• Basic classes for representing data relevant to natural lan-

guage processing.

• Standard interfaces for performing tasks, such as tokenization, tagging, and parsing.

• Standard implementations of each task, which can be com-bined to solve complex problems.

NLTK: Example Modules

• nltk.token: processing individual elements of text, such as words or sentences.

• nltk.probability: modeling frequency distributions and probabilistic systems.

• nltk.tagger: tagging tokens with supplemental information, such as parts of speech or wordnet sense tags.

• nltk.parser: high-level interface for parsing texts.• nltk.chartparser: a chart-based implementation of the

parser interface.• nltk.chunkparser: a regular-expression based surface

parser.

NLTK: Top-Level Organization

• NLTK is organized as a flat hierarchy of packages and modules.

• Each module provides the tools necessary to address a specific task

• Modules contain two types of classes:• Data-oriented classes are used to represent information rele-

vant to natural language processing.• Task-oriented classes encapsulate the resources and methods

needed to perform a specific task.

Installing NLTK

• 32-bit binary installation

• Install Python: • http://www.python.org/download/releases/3.4.1/ (avoid the 64-bit versions)

• Install Numpy (optional):• http://sourceforge.net/projects/numpy/files/NumPy/1.8.1/numpy-1.8.1-win32-superpack-

python3.4.exe

• Install NLTK: • http://pypi.python.org/pypi/nltk

• Test installation: • Start>Python34, then type import nltk

Installing NLTK Data

NLTK Corpora

NLTK Book

Simple Statistics

NLP Pipeline

Using a Tagger

Supervised Classification

Gender Identification

Gender Identification(cont.)

WordNet

Semantic Similarity

Theano

• Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-di-mensional arrays efficiently

• Easy parallelization: CPU or GPU• Speed optimization• Deep learning tutorial codes• Good maintenance• A great user group

Requirements

• Linux, Mac OS X or Windows operating system

• Python >= 2.6

• NumPy >= 1.6.2

• SciPy >= 0.11

• A BLAS installation (with Level 3 functionality)

Easy Installation of an Optimized Theano on Current Ubuntu• For Ubuntu 11.10 through 14.04:

• sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git

• sudo pip install Theano

• For Ubuntu 11.04:• sudo apt-get install python-numpy python-scipy python-dev

python-pip python-nose g++ git libatlas3gf-base libatlas-dev• sudo pip install Theano

Test the newly installed packages

• NumPy (~30s): python -c "import numpy; numpy.test()“

• SciPy (~1m): python -c "import scipy; scipy.test()“

• Theano (~30m): python -c "import theano; theano.test()"

Adding two Scalars

>>> import theano.tensor as T>>> x = T.dscalar('x')>>> y = T.dscalar('y')>>> z = x + y>>> z.eval({x : 16.3, y : 12.1})array(28.4)

Adding two Matrices

>>> x = T.dmatrix('x')>>> y = T.dmatrix('y')>>> z = x + y>>> f = function([x, y], z)>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])array([[ 11., 22.], [ 33., 44.]])

Logistic Function

>>> x = T.dmatrix('x')>>> s = 1 / (1 + T.exp(-x))>>> logistic = function([x], s)>>> logistic([[0, 1], [-1, -2]])array([[ 0.5 , 0.73105858], [ 0.26894142, 0.11920292]])

Restricted Boltzmann Machine

RBM in Theamo

class RBM(object): """Restricted Boltzmann Machine (RBM) """ def __init__( self, input=None, n_visible=784, n_hidden=500, W=None, hbias=None, vbias=None, numpy_rng=None, theano_rng=None ):

Generative Training

Contrastive Divergence

Bayesian network(=Belief network)

• probabilistic graphical model• represents a set of random variables and

their conditional dependencies via a directed acyclic graph

Layer Stacking

Deep Learning for WSD

• WSD(Word Sense Disambiguation)• 둘 이상의 의미로 사용되는 어휘가 문맥에서 어떤 의미로 사용되었는지를

구분하는 작업

• 기계번역 / 정보검색 등 자연언어처리 응용시스템의 성능을 좌우함

차에서 내리자 내리는 눈 때문에 모두 건물 안으로 뛰어갔다 .

세종 의미분석 말뭉치

Naïve Bayes for WSD

• Bayes’ Rule

sk: 중의성 어휘 s 의 의미c : 중의성 어휘 s 의 주변 문맥 어휘

SVM for WSD

Word2Vec

Word2Vec Install

• Download the code: svn checkout http://word2vec.googlecode.com/svn/trunk/

• Run 'make' to compile word2vec tool• Run the demo scripts: ./demo-word.sh and ./demo-

phrases.sh• For questions about the toolkit, see 

http://groups.google.com/group/word2vec-toolkit

•Download the code: svn checkout http://word2vec.googlecode.com/svn/trunk/•Run 'make' to compile word2vec tool•Run the demo scripts: ./demo-word.sh and ./demo-phras-es.sh•For questions about the toolkit, see http://groups.google.com/group/word2vec-toolkit

Vector Representation in Word2Vec

>>> model = Word2Vec(sentences, size=200) # default value is 100

>>> model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)

[('queen', 0.50882536)]

>>> model.doesnt_match("breakfast cereal dinner lunch".split())

'cereal'

>>> model.similarity('woman', 'man')

0.73723527

>>> model['computer'] # raw NumPy vector of a word

array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

top related