Data Analysis Language (Python) - openwith.net
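| Session | Date | Topics | Textbook |
|---|---|---|---|
| 1 | 9/8 | Course introduction; Python overview; setting up the practice environment (Windows/Linux installation) | |
| 2 | 9/15 | Basic programming (1): Python language structure, data types; variables, expressions and operators | |
| 3 | 9/29 | Basic programming (2): sequences (list, tuple, …); dictionaries and sets | |
| 4 | 10/6 | Basic programming (3): control statements; function basics | |
| 5 | 10/13 | Basic programming (4): applied functions; exception statements | |
| 6 | 10/20 | Python OOP (1) | |
| 7 | 10/27 | Python OOP (2) | |
| 8 | 11/3 | <Midterm exam> | |
| 9 | 11/10 | Big data and Python; modules and packages | |
| 10 | 11/17 | Strings and regular expressions; files and text | |
| 11 | 11/24 | Standard Library | |
| 12 | 12/1 | Network and web programming | |
| 13 | 12/8 | NumPy | |
| 14 | 12/15 | Matplotlib and pandas (1) | |
| 15 | 12/22 | pandas (2) | |
| 16 | 12/29 | <Final exam> | |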
Outline
• How the world is changing – Disruptive Technologies
– Prime Mover – OSS and Python
• Python overview – features and history
– Python as a Programming Language
– Python Interpreter와 CPython
– Python Use Cases
• Python and data analysis
• Practice environment
• Python in a Sheet
• Hands-on practice
Disruptive Technologies
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/disruptive-technologies
Software and Open Source (OSS)
2014-12-13 5
Python background – programming languages and open source
History of programming languages
• Before C
– 1957 FORTRAN / 1959 COBOL / 1964 BASIC
• C
– 1969 C
– 1973 PASCAL
• C++
– 1983 C++
• http://www.youtube.com/watch?v=JoVQTPbD6UY
• After C/C++
– 1991 Python
– 1995 Java, JavaScript
– 1995 R
– 2009 Go
History of OSS
• 1960's ARPANET, ...
• 1969 Unix
• 1980 Usenet
• 1983 GNU 프로젝트
• 1985 FSF
• 1989 386BSD, FreeBSD, …
• 1991 Linux kernel
• 1994 MySQL
• 1996 Apache 웹 서버
• 2001 Open Source declaration
• 2004 Ubuntu
• http://www.youtube.com/watch?v=POexV1k62_Y
Forerunners
Bjarne Stroustrup
Yukihiro Matsumoto
James Gosling
Larry Wall
Rasmus Lerdorf
Ken Thompson
Dennis Ritchie
Linus Torvalds
Brendan Eich
Richard Stallman
Larry Page
Bill Joy
Tim Berners-Lee
Guido van Rossum
Python History
Python History
• Conceived in the late 1980s
• Implementation began in December 1989 as a successor to the ABC language
– On the origins of Python, Van Rossum wrote in 1996:
• ...In December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus). — Guido van Rossum
• Process – PSF (Python Software Foundation)
• Python’s intellectual property is vested in the PSF
• Python’s reference source repositories (formerly Mercurial, now Git)
– Python Enhancement Proposals (PEPs): public design documents
• Guido van Rossum – Python’s inventor and architect
– Benevolent Dictator For Life (BDFL)
• Zen of Python (PEP 20) – software principles, written by Tim Peters, that influence the design of Python
• Beautiful is better than ugly.
• Explicit is better than implicit.
• Simple is better than complex.
• Complex is better than complicated.
• Flat is better than nested.
• …
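The full list of aphorisms ships with the interpreter itself. A minimal sketch: importing the standard-library module `this` prints the Zen of Python. Reading the text back as data relies on `this.s`, an undocumented CPython detail (the ROT13-encoded text), so treat that part as an assumption rather than a stable API.

```python
import codecs
import this  # importing `this` prints the Zen of Python (first import only)

# Assumption: `this.s` is an undocumented CPython detail holding the
# ROT13-encoded text of the Zen; decode it to get the plain text back.
zen = codecs.decode(this.s, "rot13")

# Line 0 is the title, line 1 is blank; the aphorisms follow.
aphorisms = [line for line in zen.splitlines()[2:] if line]

for line in aphorisms[:5]:
    print("-", line)
```

The first five decoded lines are the five principles quoted on the slide.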
Python features
• Features – Simple, but not simplistic.
– A general-purpose programming language
– A very high-level language (VHLL).
– OOP language
– * A functional programming language
– Batteries Included - Standard Library and Extension Modules
• Python implementations – 4 production-quality implementations
– CPython
• "Classic Python", the reference implementation of Python
• = a compiler, an interpreter, and a set of built-in and optional extension modules
– Jython
– IronPython
– PyPy – generates native machine code "just in time"
• Syntax and semantics
– Indentation
– Expressions, statements and control flow
– Typing
• Strong typing
• Dynamic typing
• Duck typing – the duck test: "If it walks like a duck and it quacks like a duck, then it must be a duck."
• Libraries – https://pypi.org/
• Development Environments – REPL (read–eval–print loop) – IDLE – IDE – IPython
• Python versions – (…)
• alpha releases, tagged 3.x a0, 3.x a1, and so on
• beta releases, starting with 3.x b1; after the betas, at least one release candidate, 3.x rc1
• final release of 3.x (3.x.0)
– Python 2.7
• first released in July 2010
• end-of-life postponed to 2020
– Python 3.0
• first released in 2008; initially called Python 3000 (or Py3k)
• each 3.x minor release adds features
• In 2017, Google announced work on a Python 2.7-to-Go transcompiler to improve performance
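The typing bullets in the feature list mention the duck test. A minimal sketch of what it means in practice: a function calls a method without checking the argument's type, so any object that provides that method is accepted. The classes `Duck` and `Robot` and the function `make_it_quack` are hypothetical names chosen for illustration.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Unrelated to Duck by inheritance; it simply has a quack() method.
    def quack(self) -> str:
        return "Beep-quack!"


def make_it_quack(thing) -> str:
    # No isinstance() check: anything with a callable quack() works.
    return thing.quack()


sounds = [make_it_quack(x) for x in (Duck(), Robot())]
print(sounds)  # ['Quack!', 'Beep-quack!']

# An object without quack() fails only when the call is attempted.
try:
    make_it_quack(42)
    int_quacked = True
except AttributeError as exc:
    int_quacked = False
    print("AttributeError:", exc)
```

The check happens at run time, at the moment the attribute is looked up; this is what "dynamic" means in dynamic typing.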
Python Interpreter
• What? – Process
– Lexing: break each line of source text into tokens
– Parsing: build an AST (abstract syntax tree) from the tokens
– Compiling: turn the (structured) AST into a code object
– Interpreting: actually execute the code object
• ("interpreter" has several meanings)
– = the PVM (Python Virtual Machine) = a stack machine (distinct from the call stack)
– a bytecode interpreter
» (bytecode = intermediate code = the interpreter's internal representation of a Python program)
• Example: Byterun, a Python interpreter written in Python
• Intermediate approach: JIT compiler / bytecode

| | Compiled | Interpreted |
|---|---|---|
| Advantages | Ready to run; often faster; source code is private | Cross-platform; simpler to test; easier to debug |
| Disadvantages | Not cross-platform; inflexible; extra step | Interpreter required; often slower; source code is public |
| Examples | C, C++, Objective-C | PHP, JavaScript, Ruby, Perl |

Hybrid (JIT compiler/bytecode): Java, C#, VB.NET, Python
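The four interpreter stages (lexing, parsing, compiling, interpreting) can be observed from Python itself. A minimal sketch using only the standard library: `tokenize` for lexing, `ast.parse` for parsing, the built-in `compile()` for compiling and `exec()` for interpreting. Note that `tokenize` is a tokenizer exposed for tools; `ast.parse` tokenizes and parses in a single call internally. The source string and the variable names are illustrative.

```python
import ast
import io
import tokenize

source = "z = x + y\nprint(z)\n"

# 1. Lexing: break the source text into (token type, text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(tokens[:5])

# 2. Parsing: build an abstract syntax tree from the source.
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. Compiling: turn the AST into a code object (bytecode plus metadata).
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__, len(code.co_code), "bytes of bytecode")

# 4. Interpreting: the virtual machine executes the code object.
namespace = {"x": 2, "y": 3}
exec(code, namespace)  # runs print(z), which prints 5
```

Passing an explicit `namespace` dict keeps the executed code from touching the caller's globals; after `exec` it also holds the new variable `z`.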
• CPython
– (1) a compiler that converts source code to bytecode
– (2) a VM that runs the bytecode
• stack-based (rather than register-based); the dis module exposes most of the details
– (3) a C interface for interacting with the VM
– Python/ceval.c • PyEval_EvalFrame(PyFrameObject *f)
– Modules/main.c • Py_Main(int argc, wchar_t **argv)
• CPython vs. Cython
– `Cython` is a language in its own right: a superset of `Python` (i.e., (almost) all `Python` syntax is accepted). `CPython` is one implementation of `Python`, written in `C`, and the most widely used and trusted one.
– Cython adds a few extensions to the Python language and lets you compile your code to C extensions, i.e., code that plugs into the CPython interpreter.
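CPython's VM is described as stack-based: instructions push operands onto a value stack and pop them off, instead of naming registers. The toy interpreter below, in the spirit of Byterun but far smaller, sketches that idea. Its instruction set is hypothetical: the opcode names imitate what `dis` prints, but this is not CPython's real bytecode format or evaluation loop.

```python
def run(instructions, names):
    """Evaluate a list of (opcode, argument) pairs on a value stack."""
    stack = []  # the value stack (not the same thing as the call stack)
    for op, arg in instructions:
        if op == "LOAD_NAME":
            stack.append(names[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "BINARY_MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "STORE_NAME":
            names[arg] = stack.pop()
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return None


# z = (x + y) * 2; return z
program = [
    ("LOAD_NAME", "x"),
    ("LOAD_NAME", "y"),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 2),
    ("BINARY_MUL", None),
    ("STORE_NAME", "z"),
    ("LOAD_NAME", "z"),
    ("RETURN_VALUE", None),
]
env = {"x": 3, "y": 4}
result = run(program, env)
print(result)  # 14
```

Operands never appear in the arithmetic instructions themselves; they are implicit on the stack, which is what distinguishes a stack machine from a register machine.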
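>>> import dis
>>> def add(x, y):
...     z = x + y
...     return z
...
>>> dis.dis(add)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 STORE_FAST               2 (z)

  3           8 LOAD_FAST                2 (z)
             10 RETURN_VALUE
(This listing comes from Python 3.6 to 3.10; Python 3.11 and later print different opcodes, e.g. RESUME and BINARY_OP.)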
>>> help(list)
Help on class list in module builtins:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __contains__(self, key, /)
>>> help(list.sort)
Help on method_descriptor:

sort(...)
    L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
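The `help(list.sort)` text makes two claims: the sort is stable and it works in place (returning `None`). A small sketch with made-up records shows both, and contrasts the method with the built-in `sorted()`, which leaves its input alone and returns a new list.

```python
records = [("kim", 3), ("lee", 1), ("park", 3), ("choi", 2)]

# list.sort() reorders the list itself and returns None.
returned = records.sort(key=lambda r: r[1])
print(returned)  # None
print(records)

# Stability: records with equal keys keep their original relative order,
# so ("kim", 3) still comes before ("park", 3).

# sorted() is the non-mutating alternative: it returns a new list.
by_name_desc = sorted(records, key=lambda r: r[0], reverse=True)
print(by_name_desc)
```

A common bug follows from the `-> None` signature: writing `records = records.sort()` replaces the list with `None`.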
Primitive building blocks
• Overview – some describe data, others describe the processes applied to that data
• Syntax – the grammar of a language
• Semantics – the meaning of language constructs
• Type system – typed vs. untyped
• Untyped: any operation may be performed on any data – e.g. Tcl
– Strongly typed vs. weakly typed
– Static vs. dynamic typing
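Python sits in the "strongly typed, dynamically typed" corner of the two axes above. A minimal sketch: a name can be rebound to objects of different types (dynamic), but unrelated types are not silently converted into each other (strong). The variable names are illustrative.

```python
# Dynamic typing: names are references; types belong to objects and are
# checked at run time, so the same name can refer to an int, then a str.
value = 42
first_type = type(value).__name__   # 'int'
value = "forty-two"
second_type = type(value).__name__  # 'str'

# Strong typing: mixing str and int raises TypeError instead of guessing
# a conversion (a weakly typed language might produce "11" or 2 here).
try:
    _ = "1" + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The conversion has to be explicit.
explicit = int("1") + 1  # 2

print(first_type, second_type, mixed_ok, explicit)
```

Numeric types are a deliberate exception: `1 + 2.5` works because the int is widened to a float.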
Python Use Cases – Some Projects
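Python and data analysis
• Data analysis
• An expanding concept – spreadsheet-centric analysis
– + BI/OLAP/DB queries
– + statistical analysis
– + text analytics (SNA/sentiment analysis, mining, search)
– + machine learning
– + deep learning
• Data science
• Python and data analysis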
Practice environment
• Choices
• Scope of this course
• Lecture notes (?)
Tools: Py/IDLE, pipenv, virtualenv, vi, Spyder/IPython, Anaconda, Eclipse, PyCharm, Atom (Windows, Linux)
• Practice environments
– Linux
– MS Windows
– Others
• Raspberry Pi
• Cloud
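The tool list names virtualenv and pipenv, which are third-party packages. The sketch below swaps in the standard-library `venv` module, which creates the same kind of isolated environment; its command-line equivalent is `python -m venv <dir>`. The directory name is hypothetical, and `with_pip=False` is an assumption made to keep the example fast and offline (in practice you would usually want pip installed).

```python
import pathlib
import tempfile
import venv

# Build an isolated environment inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = pathlib.Path(tmp) / "lecture-env"  # hypothetical name
    venv.create(env_dir, with_pip=False)

    # Every environment records its base interpreter in pyvenv.cfg.
    cfg_path = env_dir / "pyvenv.cfg"
    created = cfg_path.exists()
    cfg = cfg_path.read_text()
    print(cfg.splitlines()[0])
```

After activating such an environment (`source <dir>/bin/activate` on Linux, `<dir>\Scripts\activate` on Windows), `pip install` puts packages in that directory instead of the system Python.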
The Python programming language
• Lexical Structure – Lines and Indentation
– Character Sets
– Tokens
– Statements
• Data Types – Numbers
– Sequences
– Sets
– Dictionaries
– Callables
– Boolean Values
• Strings
• Variables and Other References – Variables
– Assignment Statements
• Functions
• Expressions and Operators – Numeric Operations
– Sequence Operations
– Set Operations
– Dictionary Operations
• Control Flow Statements – if else while
– for break continue
– try raise with
• Classes & OOP
• Exceptions
• Core Built-ins and Standard Libraries
• Modules & Packages
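The outline above names the main building blocks of the language. A compact tour, using only the standard library, touches most of them in one runnable file: numbers, sequences, sets, dictionaries, booleans, callables, `for`/`if`/`continue`/`while`, `try`/`raise`, `with`, a class, and an imported module. All names and values are illustrative.

```python
import io
import math  # a standard-library module


def describe(items):
    """Return (index, item, length) for each non-empty item."""
    result = []
    for i, item in enumerate(items):  # for over a sequence
        if not item:
            continue                  # skip empty strings
        result.append((i, item, len(item)))
    return result


class Circle:
    """A minimal class: constructor, attribute, method."""

    def __init__(self, radius):
        if radius < 0:
            raise ValueError("radius must be non-negative")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# Data types: numbers, sequences, sets, dictionaries, booleans
count = 3 + 2 * 4                # int arithmetic, precedence -> 11
langs = ["python", "", "go"]     # list (mutable sequence)
point = (1, 2)                   # tuple (immutable sequence)
unique = {"a", "b", "a"}         # set: duplicates collapse
ages = {"alice": 30}             # dict: key -> value
ages["bob"] = 25
has_alice = "alice" in ages      # membership test -> True
is_callable = callable(describe) and callable(Circle)  # callables

rows = describe(langs)           # [(0, 'python', 6), (2, 'go', 2)]

# try / except around code that raises
try:
    Circle(-1)
except ValueError as exc:
    error_message = str(exc)

# while loop
n = 0
while n < 3:
    n += 1

# with: the context manager closes the stream automatically
with io.StringIO("line 1\nline 2\n") as stream:
    lines = stream.read().splitlines()

print(count, point, rows, sorted(unique), has_alice, error_message, n, lines)
print(round(Circle(1).area(), 2))  # 3.14
```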
Hands-on practice