Post on 13-Jun-2020

Data Wrangling Lab

Sept 26-29, 2016 (c) 2016 iCDO@UALR 1

David /WEI DAI

CDO-1 Certificate Program: Foundations for Chief Data Officers

Agenda

• Basic Python Program

• MongoDB Lab

• Clean Data Lab

A Tutorial on the Python Programming Language

Why do we choose Python?

• C or C++

• Java

• Perl

• Scheme

• Fortran

• Python

• Matlab

• Modern, interpreted, object-oriented, full-featured high-level programming language

• Portable (Unix/Linux, Mac OS X, Windows)

• Open source; intellectual property rights held by the Python Software Foundation

• Python versions: 2.x and 3.x

• 3.x is not backwards compatible with 2.x

• This course uses the 3.x version

• Fast program development

• Simple syntax

• Easy to write readable code

• Large standard library

• Lots of third-party libraries: NumPy, SciPy, Biopython, Matplotlib

Python Program Platform

• Open a browser and access the website:

• https://teslae.host.ualr.edu:8888

• Password: python

Hello World

• At the prompt, type 'hello world!' (including the quotes)

The print Function and Strings

>>> print('hello')
hello
>>> print('hello', 'David')
hello David

• Elements separated by commas print with a space between them

• Strings are immutable

• “+” is overloaded to do concatenation

>>> x = 'hello'
>>> x = x + ' America'
>>> print(x)
hello America
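The two bullets on immutability and concatenation fit together: a string cannot be changed in place, so "+" always builds a new string. The following is a small sketch of that behaviour; the variable name `greeting` is only illustrative.

```python
greeting = "hello"

try:
    greeting[0] = "H"              # item assignment is not allowed on a str
except TypeError as err:
    print("cannot modify in place:", err)

# Build a new string instead and rebind the name to it.
greeting = "H" + greeting[1:]
print(greeting)                    # Hello
```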

Substrings and Methods

>>> s = '012345'
>>> print(s[3])
3
>>> print(s[1:4])
123
>>> print(s[2:])
2345
>>> print(s[:4])
0123
>>> print(s[-2])
4

• len(String) – returns the number of characters in the String

• str(Object) – returns a String representation of the Object

>>> print(len(s))
6
>>> print(str(10.3))
10.3

• Relational operators

== equal

!= not equal (the <> form exists only in Python 2)

> greater than

>= greater than or equal

< less than

<= less than or equal

• Logical operators

and

or

not

Variables

• Are not declared, just assigned

• The variable is created the first time you assign it a value

• Assignment is = and comparison is ==
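A short sketch of these three points, using made-up variable names: a name comes into existence when it is first assigned, it can later be rebound to a value of another type, and "==" produces a boolean rather than storing anything.

```python
# A name is created the first time it is assigned; no declaration is needed.
count = 3                 # count refers to an int
count = "three"           # the same name can later refer to a str

# "=" assigns, "==" compares and yields a bool.
answer = 42
is_match = (answer == 42)

print(count, answer, is_match)    # three 42 True
```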

Lists

• Ordered collection of data

• Data can be of different types

• Lists are mutable

• Issues with shared references and mutability

• Same subset operations as Strings

>>> x = [1,'hello', (3 + 2j)]
>>> print(x)
[1, 'hello', (3+2j)]
>>> print(x[2])
(3+2j)
>>> print(x[0:2])
[1, 'hello']

Lists: Modifying Content

• x[i] = a reassigns the ith element to the value a

• Since x and y point to the same list object, both are changed

• The method append also modifies the list

>>> x = [1,2,3]
>>> y = x
>>> x[1] = 15
>>> print(x)
[1, 15, 3]
>>> print(y)
[1, 15, 3]
>>> x.append(12)
>>> print(y)
[1, 15, 3, 12]

Lists: Modifying Contents

• The method append modifies the list and returns None

• List addition (+) returns a new list

>>> x = [1,2,3]
>>> y = x
>>> z = x.append(12)
>>> print(z == None)
True
>>> print(y)
[1, 2, 3, 12]
>>> x = x + [9,10]
>>> print(x)
[1, 2, 3, 12, 9, 10]
>>> print(y)
[1, 2, 3, 12]
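When a shared reference is not wanted, one common way around it is to copy the list before modifying it. The sketch below is illustrative only; the names `alias` and `snapshot` are not from the lab files.

```python
x = [1, 2, 3]
alias = x                  # alias and x name the same list object
snapshot = list(x)         # a new list with the same elements (shallow copy)

x.append(12)               # in-place change, visible through alias

print(alias)               # [1, 2, 3, 12]
print(snapshot)            # [1, 2, 3]
print(alias is x, snapshot is x)   # True False
```

A shallow copy is enough here because the elements are plain numbers; nested lists inside the list would still be shared.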

If Else Statements

if expression:
    statement(s)
else:
    statement(s)
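A concrete instance of the template may help. The function name `describe_hours` and its threshold are hypothetical, chosen only to give the two branches something to do.

```python
def describe_hours(hours):
    """Return a label for a course's credit hours (hypothetical helper)."""
    if hours >= 3:
        label = "full course"
    else:
        label = "short course"
    return label


print(describe_hours(3))   # full course
print(describe_hours(1))   # short course
```

Indentation, not braces, marks which statements belong to each branch.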

For Loops

• Similar to Perl for loops, iterating through a list of values

for x in [1,6,12,3]:
    print(x)

forloop1.py prints:

1
6
12
3

for x in range(4):
    print(x)

forloop2.py prints:

0
1
2
3

range(N) generates the numbers 0, 1, …, N-1
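In Python 3, which this course uses, range(N) returns a lazy range object rather than a list; wrapping it in list() shows the values. A brief sketch:

```python
numbers = range(4)            # lazy sequence, no list is built yet
print(numbers)                # range(0, 4)
print(list(numbers))          # [0, 1, 2, 3]

total = 0
for n in numbers:             # a range can be iterated more than once
    total += n
print(total)                  # 6
```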

Functions are first class objects

• Can be assigned to a variable

• Can be passed as a parameter

• Can be returned from a function

• Functions are treated like any other value in Python; the def statement simply assigns a function object to a variable
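The three capabilities listed above can each be shown in a few lines. This is a sketch with illustrative names (`square`, `apply_twice`, `make_adder`), not code from the lab files.

```python
def square(n):
    return n * n


def apply_twice(func, value):
    """Call func on value, then on the result (func is passed as a parameter)."""
    return func(func(value))


def make_adder(amount):
    """Return a new function that adds amount to its argument."""
    def add(n):
        return n + amount
    return add


sq = square                        # assigned to a variable
print(sq(4))                       # 16
print(apply_twice(square, 3))      # 81
add_five = make_adder(5)           # returned from a function
print(add_five(10))                # 15
```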

Function Basics

def max(x, y):
    if x > y:
        return x
    else:
        return y

>>> max(2, 5)
5

functionbasics.py

Python for Graphs

• Matplotlib is a Python 2D plotting library which produces high-quality figures

• Ready-made demos are in the plot_demo.ipy file.

MongoDB Lab

http://teslae.host.ualr.edu:8081

username: mongotest

Password: mongotest

MongoDB Express User Interface

MongoDB Express

• MongoDB Express is a web-based MongoDB admin interface

• You can create, review, export, and delete data through the platform

MongoDB Express Lab

• Export cities.json

• Add a new city name of your choice to MongoDB

• Query or find the new city name

• Delete the new city name

Clean Data Lab

Courses Data in MongoDB

Connect to MongoDB

CRUD Operations for MongoDB

Basic Python-MongoDB Lab

• Write code to add a new course

{"courseid": "71XX",                  <-- Change XX
 "subject": "information science",
 "title": "data quality algorithm",   <-- Change course name
 "hours": 3                           <-- Change hours
}

• Write code to search for your courses

query = {"title": "data quality algorithm"}   <-- Change title name
projection = {"hours": 3}                     <-- Change hours
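One possible shape for this exercise with the PyMongo driver is sketched below. It is not the lab's reference solution: the connection string, the database name `school`, and the use of `coursesinfo` as the collection holding courses are all assumptions to be replaced with the lab's actual settings. In a MongoDB projection, any non-zero value simply includes the field, so the sketch uses the conventional 1 rather than the hours value.

```python
def build_course(courseid, title, hours, subject="information science"):
    """Build a course document shaped like the one on the slide."""
    return {"courseid": courseid, "subject": subject,
            "title": title, "hours": hours}


def add_course(collection, course):
    """Insert one course document and return the id assigned to it."""
    result = collection.insert_one(course)
    return result.inserted_id


def find_courses(collection, title):
    """Return matching courses, keeping only the title and hours fields."""
    query = {"title": title}
    # In a projection, 1 includes a field and 0 excludes it.
    projection = {"_id": 0, "title": 1, "hours": 1}
    return list(collection.find(query, projection))


def main():
    # Imported here so the helpers above can be read and tried
    # without the driver installed.
    from pymongo import MongoClient

    # Assumed connection string, database name, and collection name.
    client = MongoClient("mongodb://localhost:27017")
    courses = client["school"]["coursesinfo"]

    add_course(courses, build_course("71XX", "data quality algorithm", 3))
    for course in find_courses(courses, "data quality algorithm"):
        print(course)
```

Calling main() runs the insert and the search once a MongoDB server is reachable at the assumed address.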

Basic Python-MongoDB lab (cont.)

• A challenge project

• Write code to add your name to the teachers’ list

Clean Data lab (cont.)

• Teachers, Courses, and Students are MDM (master data management) data, so this data is accurate and trusted.

• student_course_report and teacher_course_report contain incorrect data, but teacherid, studentid, and courseid are correct.

Teachersinfo teacher_course_report

Clean Data lab (cont.)

teacher_course_report

Clean Data lab (cont.)

• Write code to clean student_course_report

• Tips:

coursesinfo

studentsinfo

student_course_report
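Since the IDs in the report are reliable and the master collections are trusted, one way to approach the cleaning is to look up each row's IDs in the master data and overwrite the descriptive fields. The sketch below works on plain dictionaries rather than a live database, and the field names `studentname` and `title` are assumptions made for illustration; the real collections may use different fields.

```python
def clean_report(report_rows, students_by_id, courses_by_id):
    """Return corrected copies of report rows, trusting IDs and master data.

    The field names 'studentname' and 'title' are assumed for illustration.
    """
    cleaned = []
    for row in report_rows:
        student = students_by_id[row["studentid"]]
        course = courses_by_id[row["courseid"]]
        fixed = dict(row)                  # copy, leave the input untouched
        fixed["studentname"] = student["studentname"]
        fixed["title"] = course["title"]
        cleaned.append(fixed)
    return cleaned


# Tiny made-up sample, not taken from the lab database.
students = {"s1": {"studentid": "s1", "studentname": "Ann Lee"}}
courses = {"7101": {"courseid": "7101", "title": "data quality algorithm"}}
report = [{"studentid": "s1", "courseid": "7101",
           "studentname": "Ann Le", "title": "data qualty algorithm"}]

for row in clean_report(report, students, courses):
    print(row)
```

Keying the master data by ID keeps each lookup constant-time, which matters once the report has many rows.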

Clean Data lab (cont.)

• A challenge project

• Write code to clean t_s_c_report.

coursesinfo

studentsinfo

Teachersinfo

THANK YOU
