Deep Learning with Python
What is Python?
• An interpreted language released by Guido van Rossum in 1991
• One of Google's three main development languages (C/C++, Java, Python)
Features of Python
• Readability
• Rich libraries
• Works well as a glue language
• Free
• Unicode support
Python implementations
• CPython
  • Interpreter written in C
• Jython
  • Interpreter for the Java Virtual Machine
• IronPython
  • Interpreter for .NET and Mono, implemented in C#
• PyPy
  • Python interpreter written in Python
Installation and development environment
• http://www.python.org/downloads
Python command line
Python IDLE
Data types - Numeric
Data types - String
Data types - List, Set, Tuple, Dictionary
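As a quick illustration of the four container types named on this slide, here is a minimal sketch. The variable names and values are invented for the example and do not come from the slides.

```python
# Illustrative values only; none of these names come from the slides.
tokens = ["deep", "learning", "with", "python", "deep"]  # list: ordered, mutable, allows duplicates
vocabulary = set(tokens)                                  # set: unordered, unique elements only
bigram = ("deep", "learning")                             # tuple: ordered, immutable
counts = {}                                               # dict: maps keys to values
for token in tokens:
    counts[token] = counts.get(token, 0) + 1

tokens.append("nlp")       # lists can grow in place

print(len(tokens))         # 6
print(len(vocabulary))     # 4
print(bigram[1])           # learning
print(counts["deep"])      # 2
```

Trying to assign to bigram[0] would raise a TypeError, which is the practical difference between a tuple and a list.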
Shallow/Deep Copy
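The difference between assignment, a shallow copy, and a deep copy can be sketched with the standard copy module. The nested list below is an invented example, not taken from the slides.

```python
import copy

original = [[1, 2], [3, 4]]       # a list that contains other (mutable) lists

alias = original                  # plain assignment: same object, no copy at all
shallow = copy.copy(original)     # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)    # new outer list and new inner lists

original[0].append(99)            # mutate an inner list
original.append([5, 6])           # mutate the outer list

print(alias)     # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)   # [[1, 2, 99], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]
```

The shallow copy sees the change to the shared inner list but not the new outer element; the deep copy sees neither.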
Function
Modules and Packages
• Python modules “package program code and data for re-use.” (Lutz)
– Similar to a library in C or a package in Java.
• Python packages are hierarchical modules (i.e., modules that contain other modules).
• Three commands for accessing modules:
1. import
2. from…import
3. reload
Modules and Packages: import
• The import command loads a module:
# Load the regular expression module
>>> import re
• To access the contents of a module, use dotted names:
# Use the search function from the re module
>>> re.search(r'\w+', text)  # text: the string to search
• To list the contents of a module, use dir:
>>> dir(re)
['DOTALL', 'I', 'IGNORECASE', …]
Modules and Packages: from…import
• The from…import command loads individual functions and objects from a module:
# Load the search function from the re module
>>> from re import search
• Once an individual function or object is loaded with from…import, it can be used directly:
# Use the search function from the re module
>>> search(r'\w+', text)  # text: the string to search
Import vs. from…import
import
• Keeps module functions separate from user functions.
• Requires the use of dotted names.
• Works with reload.
from…import
• Puts module functions and user functions together.
• More convenient names.
• Does not work with reload.
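The namespace difference in this comparison can be sketched as follows. The sample string and the user-defined search function are invented for illustration.

```python
import re                   # binds only the name "re" in this namespace
from re import search       # binds the function itself under the name "search"

text = "deep learning"      # sample string, chosen for illustration

print(re.search(r"\w+", text).group())   # deep  (dotted name)
print(search(r"\w+", text).group())      # deep  (direct name)
print(search is re.search)               # True: both names refer to the same function

# A user-defined function with the same name silently replaces the imported one.
# This is the cost of putting module functions and user functions together.
def search(pattern, string):
    return "user-defined search"

print(search(r"\w+", text))              # user-defined search
print(re.search(r"\w+", text).group())   # deep  (still reachable through the module)
```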
Modules and Packages: reload
• If you edit a module, you must use the reload command before the changes become visible in Python (reload is a built-in in Python 2; in Python 3, import it first with from importlib import reload):
>>> import mymodule
...
>>> reload(mymodule)
• The reload command only affects modules that have been loaded with import; it does not update individual functions and objects loaded with from...import.
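The behaviour described above can be reproduced with a throwaway module, assuming Python 3, where reload lives in importlib. The module name mymodule_demo and the greet function are hypothetical and exist only for this sketch.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module on disk; "mymodule_demo" is a hypothetical name.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule_demo.py")
with open(module_path, "w") as f:
    f.write("def greet():\n    return 'old'\n")

sys.path.insert(0, module_dir)
sys.dont_write_bytecode = True   # avoid a stale .pyc when the source changes quickly

import mymodule_demo
from mymodule_demo import greet

# Edit the module's source, as if in a text editor.
with open(module_path, "w") as f:
    f.write("def greet():\n    return 'new version'\n")

importlib.invalidate_caches()
importlib.reload(mymodule_demo)

print(mymodule_demo.greet())   # new version  (the dotted name sees the reload)
print(greet())                 # old          (the name bound by from...import is unchanged)
```

To refresh greet as well, the from...import statement would have to be run again after the reload.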
NumPy
• Fundamental package for scientific computing with Python
• It contains, among other things:
  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities
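A short sketch of the capabilities listed above, using only well-known NumPy calls. The array values are invented for illustration, and np.random.default_rng assumes a reasonably recent NumPy release (1.17 or later).

```python
import numpy as np

# N-dimensional array object: a 2 x 3 array of floats.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(a.shape, a.ndim, a.dtype)   # (2, 3) 2 float64

# Broadcasting: the length-3 row is stretched across both rows of a,
# giving rows [11, 22, 33] and [14, 25, 36].
row = np.array([10.0, 20.0, 30.0])
b = a + row

# Linear algebra: solve the system M x = y.
M = np.array([[2.0, 0.0],
              [0.0, 4.0]])
y = np.array([2.0, 8.0])
x = np.linalg.solve(M, y)
print(x)                          # [1. 2.]

# Fourier transform of a constant signal: everything lands in the zero-frequency bin.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator (reproducible across runs).
rng = np.random.default_rng(0)
samples = rng.random(3)           # three floats in [0, 1)
```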
NumPy: Functions & Attributes
NumPy: N-dimensional array object
Python and Natural Language Processing
• Python is a great language for NLP:
  • Simple
  • Easy to debug:
    • Exceptions
    • Interpreted language
  • Easy to structure:
    • Modules
    • Object-oriented programming
  • Powerful string manipulation
Introduction to NLTK
• The Natural Language Toolkit (NLTK) provides:
  • Basic classes for representing data relevant to natural language processing.
  • Standard interfaces for performing tasks, such as tokenization, tagging, and parsing.
  • Standard implementations of each task, which can be combined to solve complex problems.
NLTK: Example Modules
• nltk.token: processing individual elements of text, such as words or sentences.
• nltk.probability: modeling frequency distributions and probabilistic systems.
• nltk.tagger: tagging tokens with supplemental information, such as parts of speech or WordNet sense tags.
• nltk.parser: high-level interface for parsing texts.
• nltk.chartparser: a chart-based implementation of the parser interface.
• nltk.chunkparser: a regular-expression based surface parser.
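To make the first two tasks in this list concrete (tokenization and frequency distributions), here is a minimal sketch that uses only the Python standard library. It deliberately does not call NLTK, since the module names on this slide may differ between NLTK releases; the sample text is invented for illustration.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The dog sat too."   # sample text, invented for illustration

# Tokenization: split the text into lowercase word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Frequency distribution: count how often each token occurs.
freq = Counter(tokens)
total = sum(freq.values())

print(freq.most_common(2))     # [('the', 3), ('sat', 2)]
print(freq["the"] / total)     # 0.3  (relative frequency of "the")
```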
NLTK: Top-Level Organization
• NLTK is organized as a flat hierarchy of packages and modules.
• Each module provides the tools necessary to address a specific task.
• Modules contain two types of classes:
  • Data-oriented classes are used to represent information relevant to natural language processing.
  • Task-oriented classes encapsulate the resources and methods needed to perform a specific task.
Installing NLTK
• 32-bit binary installation
• Install Python:
  • http://www.python.org/download/releases/3.4.1/ (avoid the 64-bit versions)
• Install NumPy (optional):
  • http://sourceforge.net/projects/numpy/files/NumPy/1.8.1/numpy-1.8.1-win32-superpack-python3.4.exe
• Install NLTK:
  • http://pypi.python.org/pypi/nltk
• Test installation:
  • Start>Python34, then type import nltk
Installing NLTK Data
NLTK Corpora
NLTK Book
Simple Statistics
NLP Pipeline
Using a Tagger
Supervised Classification
Gender Identification
Gender Identification (cont.)
WordNet
Semantic Similarity
Theano
• Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
• Easy parallelization: CPU or GPU
• Speed optimization
• Deep learning tutorial codes
• Good maintenance
• A great user group
Requirements
• Linux, Mac OS X or Windows operating system
• Python >= 2.6
• NumPy >= 1.6.2
• SciPy >= 0.11
• A BLAS installation (with Level 3 functionality)
Easy Installation of an Optimized Theano on Current Ubuntu
• For Ubuntu 11.10 through 14.04:
  • sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
  • sudo pip install Theano
• For Ubuntu 11.04:
  • sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ git libatlas3gf-base libatlas-dev
  • sudo pip install Theano
Test the newly installed packages
• NumPy (~30s): python -c "import numpy; numpy.test()"
• SciPy (~1m): python -c "import scipy; scipy.test()"
• Theano (~30m): python -c "import theano; theano.test()"
Adding two Scalars
>>> import theano.tensor as T
>>> x = T.dscalar('x')
>>> y = T.dscalar('y')
>>> z = x + y
>>> z.eval({x: 16.3, y: 12.1})
array(28.4)
Adding two Matrices
>>> from theano import function
>>> x = T.dmatrix('x')
>>> y = T.dmatrix('y')
>>> z = x + y
>>> f = function([x, y], z)
>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11.,  22.],
       [ 33.,  44.]])
Logistic Function
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
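For comparison, the same logistic function can be written eagerly in plain NumPy. This is only a cross-check of the numbers on the slide, not a replacement for the Theano version, which builds a symbolic expression and compiles it into a callable.

```python
import numpy as np

def logistic(x):
    """Elementwise logistic function s(x) = 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

result = logistic([[0, 1], [-1, -2]])
expected = np.array([[0.5, 0.73105858],
                     [0.26894142, 0.11920292]])   # values printed on the slide
print(np.allclose(result, expected))              # True
```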
Restricted Boltzmann Machine
RBM in Theano
class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""
    def __init__(
        self,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        hbias=None,
        vbias=None,
        numpy_rng=None,
        theano_rng=None
    ):
        ...  # constructor body not shown on the slide
Generative Training
Contrastive Divergence
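A minimal NumPy sketch of one-step contrastive divergence (CD-1) for a binary RBM follows. It is not the Theano implementation referred to on the previous slide: the function name cd1_update, the learning rate, the tiny layer sizes, and the two training patterns are all assumptions made for illustration. Only the parameter names W, hbias, and vbias are taken from the RBM constructor above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, hbias, vbias, rng, learning_rate=0.1):
    """One CD-1 parameter update for a binary RBM on a mini-batch v0."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + hbias)
    h0_sample = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: one Gibbs step down to the visible layer and up again.
    v1_prob = sigmoid(h0_sample @ W.T + vbias)
    v1_sample = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(v1_sample @ W + hbias)

    # Gradient estimate: data statistics minus reconstruction statistics.
    batch_size = v0.shape[0]
    W = W + learning_rate * (v0.T @ h0_prob - v1_sample.T @ h1_prob) / batch_size
    hbias = hbias + learning_rate * (h0_prob - h1_prob).mean(axis=0)
    vbias = vbias + learning_rate * (v0 - v1_sample).mean(axis=0)
    return W, hbias, vbias

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3          # far smaller than the 784 x 500 default above
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
hbias = np.zeros(n_hidden)
vbias = np.zeros(n_visible)

# Two invented binary training patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
for _ in range(100):
    W, hbias, vbias = cd1_update(data, W, hbias, vbias, rng)

print(W.shape, hbias.shape, vbias.shape)   # (6, 3) (3,) (6,)
```

CD-1 replaces the intractable model expectation in the log-likelihood gradient with statistics from a single Gibbs step started at the data, which is why the update needs only one pass down and up through the network.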
Bayesian network (= belief network)
• Probabilistic graphical model
• Represents a set of random variables and their conditional dependencies via a directed acyclic graph
Layer Stacking
Deep Learning for WSD
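• WSD (Word Sense Disambiguation)
  • The task of determining which sense a word with two or more senses carries in a given context
• Largely determines the performance of NLP applications such as machine translation and information retrieval
Example: 차에서 내리자 내리는 눈 때문에 모두 건물 안으로 뛰어갔다. ("When we got out of the car, everyone ran into the building because of the falling snow." The verb 내리다 means "get out" in its first occurrence and "fall" in its second.)
Sejong sense-tagged corpus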
Naïve Bayes for WSD
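• Bayes' Rule
$P(s_k \mid c) = \frac{P(c \mid s_k)\,P(s_k)}{P(c)}$
$s_k$: a sense of the ambiguous word $s$
$c$: the context words surrounding the ambiguous word $s$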
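A minimal sketch of a Naive Bayes sense classifier built on Bayes' rule is shown below. Since P(c) is the same for every sense, it picks the sense s_k that maximizes P(s_k) times the product of P(w | s_k) over the context words w, under the naive assumption that context words are independent given the sense. The English "bank" examples, the function name disambiguate, and the add-one smoothing are assumptions made for illustration; the slides work with the Sejong corpus, which is not used here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-tagged examples for the ambiguous word "bank":
# (sense label, context words).
training = [
    ("finance", ["money", "loan", "deposit"]),
    ("finance", ["account", "money", "interest"]),
    ("river",   ["water", "fishing", "shore"]),
    ("river",   ["water", "boat", "mud"]),
]

sense_counts = Counter()            # how often each sense s_k occurs
word_counts = defaultdict(Counter)  # context word counts per sense
vocabulary = set()
for sense, context in training:
    sense_counts[sense] += 1
    word_counts[sense].update(context)
    vocabulary.update(context)

def disambiguate(context):
    """Return the sense maximizing log P(s_k) + sum of log P(w | s_k) over w in c."""
    best_sense, best_score = None, -math.inf
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / len(training))
        total = sum(word_counts[sense].values())
        for word in context:
            # Add-one (Laplace) smoothing so unseen words do not zero the product.
            score += math.log((word_counts[sense][word] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate(["money", "interest"]))   # finance
print(disambiguate(["boat", "water"]))       # river
```

Working in log space avoids numeric underflow when the context contains many words.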
SVM for WSD
Word2Vec
Word2Vec Install
• Download the code: svn checkout http://word2vec.googlecode.com/svn/trunk/
• Run 'make' to compile the word2vec tool
• Run the demo scripts: ./demo-word.sh and ./demo-phrases.sh
• For questions about the toolkit, see http://groups.google.com/group/word2vec-toolkit
Vector Representation in Word2Vec
>>> from gensim.models import Word2Vec  # gensim < 4.0 API; sentences is an iterable of token lists
>>> model = Word2Vec(sentences, size=200) # default value is 100
>>> model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
[('queen', 0.50882536)]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
>>> model['computer'] # raw NumPy vector of a word
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
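The similarity scores above are cosine similarities between word vectors. The sketch below shows that computation and a simplified version of the king/man/woman analogy query on toy 3-dimensional vectors. The vectors are invented for illustration, and the code does not reproduce gensim's exact procedure; it only shows the underlying idea.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional "word vectors", invented for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

# Analogy query: king - man + woman, then rank the remaining words by cosine similarity.
query = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine_similarity(query, vectors[w]))
print(best)   # queen
```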