
COSC 1306—COMPUTER SCIENCE AND PROGRAMMING

PYTHON FUNCTIONS

Jehan-François Pâris [email protected]

Module Overview

• We will learn how to read, create and modify files
– Pay special attention to pickled files

• They are very easy to use!

The file system

• Provides long-term storage of information
• Stores data in stable storage (disk)
• Cannot be RAM because:
– Dynamic RAM loses its contents when powered off
– Static RAM is too expensive
– System crashes can corrupt the contents of main memory

Overall organization

• Data managed by the file system are grouped in user-defined data sets called files

• The file system must provide a mechanism for naming these data
– Each file system has its own set of conventions
– All modern operating systems use a hierarchical directory structure

Windows solution

• Each device and each disk partition is identified by a letter
– A: and B: were used by the floppy drives
– C: is the first disk partition of the hard drive
– If the hard drive has no other disk partition, D: denotes the DVD drive
• Each device and each disk partition has its own hierarchy of folders

Windows solution

[Figure: separate folder trees for C:, a second disk D:, and a flash drive F:; folders shown include Windows, Users, and Program Files]

UNIX/LINUX organization

• Each device and disk partition has its own directory tree
– Disk partitions are glued together through the mount operation to form a single tree
• A typical user does not know where her files are stored

UNIX/LINUX organization

[Figure: a root partition (/, containing bin and usr) and another partition attached to it by "the magic mount"; the second partition can then be accessed as /usr]

Mac OS organization

• Similar to Windows
– Disk partitions are not merged
– Represented by separate icons on the desktop

Accessing a file (I)

• Your Python programs are stored in a folder, AKA directory
– On my home PC it is
C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python
• All files in the current working directory (usually the folder holding the program you run) can be accessed directly through their names
– "myfile.txt"

Accessing a file (II)

• Files in subdirectories can be accessed by specifying the subdirectory first
– Windows style:
• "test\\sample.txt"
– Note the double backslash
– Linux/Unix/Mac OS X style:
• "test/sample.txt"
– Generally works for Windows too
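A minimal sketch, assuming you would rather not hard-code either separator: the standard os.path and pathlib modules build the same relative path with whatever separator the running operating system uses, and the "pure" path classes show both styles on any machine.

```python
import os
from pathlib import Path, PurePosixPath, PureWindowsPath

# os.path.join inserts the separator of the current OS ('\\' on Windows, '/' elsewhere)
p1 = os.path.join("test", "sample.txt")

# pathlib builds paths with the / operator; str() also uses the current OS separator
p2 = Path("test") / "sample.txt"
print(p1)
print(p2)

# The pure path classes never touch the disk and let us see both conventions
win_style = PureWindowsPath("test") / "sample.txt"
posix_style = PurePosixPath("test") / "sample.txt"
print(win_style)     # test\sample.txt
print(posix_style)   # test/sample.txt
```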

Why the double backslash?

• The backslash is an escape character in Python
– Combines with its successor to represent non-printable characters
• '\n' represents a newline
• '\t' represents a tab
– Must use '\\' to represent a plain backslash
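A short check of the rule above: each escape sequence stands for a single character. Raw strings (an `r` prefix), which the slides do not use, are one alternative that keeps backslashes literally; note that a raw string cannot end with a single backslash.

```python
# Each escape sequence is one character long
print(len('\n'), len('\t'), len('\\'))   # 1 1 1

windows_path = "test\\sample.txt"   # doubled backslash -> one real backslash
raw_path = r"test\sample.txt"        # raw string: the backslash is kept as typed

print(windows_path)                  # test\sample.txt
print(windows_path == raw_path)      # True
```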

Accessing a file (III)

• For other files, must use the full pathname
– Windows style:

• "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"

Accessing file contents

• Two-step process:
– First we open the file
– Then we access its contents
• Read
• Write

• When we are done, we close the file.
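A minimal sketch of open, access, close. To run anywhere it writes a hypothetical file `demo.txt` in a fresh temporary folder. The second half uses the `with` statement (not covered in these slides), which closes the file automatically when the block ends, even if an exception occurs inside it.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary folder
folder = tempfile.mkdtemp()
name = os.path.join(folder, "demo.txt")

# Open, access (write), close
fh = open(name, "w")
fh.write("To be or not to be\n")
fh.close()

# Open, access (read); "with" does the close for us
with open(name, "r") as fh:
    contents = fh.read()

print(contents, end="")
print(fh.closed)   # True: closed when the with block ended
```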

What happens at open() time?

• The system verifies
– That you are an authorized user
– That you have the right permission
• Read permission
• Write permission
• Execute permission exists but doesn't apply
and then returns a file handle / file descriptor

The file handle

• Gives the user
– Direct access to the file
• No directory lookups
– Authority to execute the file operations whose permissions have been requested

Python open()

• open(name, mode='r', buffering=-1)
(simplified: open() also accepts other optional parameters, such as encoding)
where
– name is the name of the file
– mode is the permission requested
• Default is 'r' for read-only
– buffering specifies the buffer size
• Use the system default value (code -1)

The modes

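• Can request
– 'r' for read-only
– 'w' for write-only
• Always overwrites the file: previous contents are erased (the file is created if it does not exist)
– 'a' for append
• Writes at the end (the file is created if it does not exist)
– 'r+' or 'a+' for updating (read + write/append)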
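The difference between 'w' and 'a' is easiest to see by writing the same hypothetical file three times. This sketch uses a temporary folder so it runs anywhere.

```python
import os
import tempfile

name = os.path.join(tempfile.mkdtemp(), "modes.txt")   # hypothetical file name

with open(name, "w") as fh:    # creates the file
    fh.write("first\n")
with open(name, "w") as fh:    # 'w' again: the old contents are erased
    fh.write("second\n")
with open(name, "a") as fh:    # 'a': new data goes at the end
    fh.write("third\n")

with open(name, "r") as fh:
    result = fh.read()
print(result, end="")          # second, then third: "first" is gone
```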

Examples

• f1 = open("myfile.txt") same as
f1 = open("myfile.txt", "r")

• f2 = open("test\\sample.txt", "r")

• f3 = open("test/sample.txt", "r")

• f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

Reading a file

• Three ways:
– Global reads
– Line by line
– Pickled files

Global reads

• fh.read()
– Returns the whole contents of the file specified by file handle fh
– File contents are stored in a single string that might be very large

Example

• f2 = open("test\\sample.txt", "r")
bigstring = f2.read()
print(bigstring)
f2.close() # not required

Output of example

• To be or not to be that is the question
Now is the winter of our discontent
– Exact contents of file 'test\sample.txt'

Line-by-line reads

• for line in fh : # do not forget the colon
    # anything you want
fh.close() # not required

Example

• f3 = open("test/sample.txt", "r")
for line in f3 : # do not forget the colon
    print(line)
f3.close() # not required

Output

• To be or not to be that is the question

Now is the winter of our discontent

– With one or more extra blank lines

Why?

• Each line ends with an end-of-line marker
• print(…) adds an extra end-of-line
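To see the end-of-line markers, this sketch uses `io.StringIO`, which behaves like a file opened for reading, as a stand-in for test/sample.txt (an assumption so the code runs without the file). Passing `end=""` to print() is one way to stop it from adding its own end-of-line.

```python
import io

# Stand-in for test/sample.txt; the last line has no end-of-line marker
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

lines = list(fh)
print(lines)
# ['To be or not to be that is the question\n', 'Now is the winter of our discontent']

# print() adds no newline of its own here, so no blank lines appear
for line in lines:
    print(line, end="")
print()
```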

Trying to remove blank lines

• print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 : # do not forget the colon
    print(line[:-1]) # remove last char
f5.close() # not required
print('-----------------------------------------------------')

The output

• ----------------------------------------------------
To be or not to be that is the question
Now is the winter of our disconten
-----------------------------------------------------

• The last line did not end with an EOL!

A smarter solution (I)

• Only remove the last character if it is an EOL:
if line[-1] == '\n' :
    print(line[:-1])
else :
    print(line)

A smarter solution (II)

• print('----------------------------------------------------')
fh = open("test/sample.txt", "r")
for line in fh : # do not forget the colon
    if line[-1] == '\n' :
        print(line[:-1]) # remove last char
    else :
        print(line)
print('-----------------------------------------------------')
fh.close() # not required

It works!

• ----------------------------------------------------
To be or not to be that is the question
Now is the winter of our discontent
-----------------------------------------------------
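A shorter alternative, offered as a sketch rather than the course's method: the string method rstrip('\n') removes a trailing newline when there is one and leaves the line unchanged otherwise, so no if is needed. (Without an argument, rstrip() would also remove trailing spaces and tabs.) `io.StringIO` again stands in for the file.

```python
import io

fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")   # stand-in for the file

cleaned = []
for line in fh:
    # removes the newline if present; does nothing to the last line
    cleaned.append(line.rstrip('\n'))

for line in cleaned:
    print(line)
```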

Making sense of file contents

• Most files contain more than one data item per line
– COSC 713-743-3350
UHPD 713-743-3333
• Must split lines
– mystring.split(sepchar) where sepchar is a separation character (with no argument, split() separates on any whitespace)
• returns a list of items
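A quick comparison of split() with and without a separator. The tab-separated line is a made-up example; it shows that an explicit separator keeps empty fields and leaves the newline attached to the last item.

```python
line = "COSC 713-743-3350\n"
by_whitespace = line.split()     # any run of whitespace; newline dropped
by_space = line.split(' ')       # exactly one space; newline stays attached

tabbed = "Alice\t90\t\t85\n"     # hypothetical line with an empty field
by_tab = tabbed.split('\t')

print(by_whitespace)   # ['COSC', '713-743-3350']
print(by_space)        # ['COSC', '713-743-3350\n']
print(by_tab)          # ['Alice', '90', '', '85\n']
```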

Splitting strings

• >>> text = "Four score and seven years ago"
>>> text.split()
['Four', 'score', 'and', 'seven', 'years', 'ago']

• >>> record = "1,'Baker, Andy', 83, 89, 85"
>>> record.split(',')
['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']

Not what we wanted!
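One way to get the intended fields, sketched with Python's standard csv module (not used elsewhere in these slides): tell the reader that this record quotes with single quotes and that blanks after commas should be skipped.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# csv.reader accepts any iterable of lines; here a one-element list.
# quotechar="'" matches this record; skipinitialspace drops the blank after each comma.
fields = next(csv.reader([record], quotechar="'", skipinitialspace=True))
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```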

Example

# how2split.py
print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 :
    words = line.split()
    for xxx in words :
        print(xxx)
f5.close() # not required
print('-----------------------------------------------------')

Output

• ----------------------------------------------------
To
be
…
of
our
discontent
-----------------------------------------------------

Other separators (I)

• Commas
– CSV Excel format
• Values are separated by commas
• Strings are stored without quotes
– Unless they contain a comma
• "Doe, Jane", freshman, 90, 90
– Quotes within strings are doubled
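The standard csv module applies both rules above by default: it quotes only fields that need it and doubles quotes inside them. In this sketch `io.StringIO` stands in for an output file; with a real file, the csv documentation recommends opening it with `open(name, 'w', newline='')`.

```python
import csv
import io

out = io.StringIO()          # stand-in for an output file
writer = csv.writer(out)     # default: quote only fields that need it
writer.writerow(['Doe, Jane', 'freshman', 90, 90])
writer.writerow(['Fulano, Eduardo "Lalo"', 90, 90])

print(out.getvalue())
# "Doe, Jane",freshman,90,90
# "Fulano, Eduardo ""Lalo""",90,90
```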

Other separators (II)

• Tabs ('\t')
– Advantages:
• Your fields will appear nicely aligned
• Spaces, commas, … are not an issue
– Disadvantage:
• You do not see them
– They look like spaces
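When tabs and spaces look alike, printing repr() of a line (a made-up one here) spells out every tab and newline.

```python
line = "Alice\t90 90\n"

print(line)         # the tab looks like ordinary blank space
print(repr(line))   # 'Alice\t90 90\n'
```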

Why it is important

• When you must pick your file format, you should decide how the data inside the file will be used:
– People will read them
– Other programs will use them
– Will be used by people and machines

An exercise

• Converting our output to CSV format
– Replacing tabs by commas
• Easy
– Will use string replace function

First attempt

• fh_in = open('grades.txt', 'r') # the 'r' is optional
buffer = fh_in.read()
newbuffer = buffer.replace('\t', ',')
fh_out = open('grades0.csv', 'w')
fh_out.write(newbuffer)
fh_in.close()
fh_out.close()
print('Done!')

The output

• Alice 90 90 90 90 90
Bob 85 85 85 85 85
Carol 75 75 75 75 75

becomes
• Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol,75,75,75,75,75
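The first attempt breaks as soon as a field itself contains a comma. This sketch uses the "Doe, Jane" line from the input file shown later (tabs written as \t).

```python
line = "Doe, Jane\t90\t90\t90\t80\t70\n"   # tab-separated; the name holds a comma

converted = line.replace('\t', ',')
print(converted, end="")                  # Doe, Jane,90,90,90,80,70

# Anyone reading this as CSV now sees seven fields instead of six
fields = converted.rstrip('\n').split(',')
print(fields)   # ['Doe', ' Jane', '90', '90', '90', '80', '70']
```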

Dealing with commas (I)

• Work line by line
• For each line
– split input into fields using TAB as separator
– store fields into a list
• Alice 90 90 90 90 90 becomes ['Alice', '90', '90', '90', '90', '90']

Dealing with commas (II)

– Put within double quotes any entry containing one or more commas
– Output list entries separated by commas
• ['"Baker, Alice"', '90', '90', '90', '90', '90']
becomes "Baker, Alice",90,90,90,90,90

Dealing with commas (III)

• Our troubles are not over:
– Must store all lines somewhere until we are done
– Store them in a list

Dealing with double quotes

• Before wrapping items that contain commas in double quotes, replace
– All double quotes by pairs of double quotes
– 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'

General organization (I)

• linelist = [ ]
• for line in file
– itemlist = line.split(…)
– linestring = '' # empty string
– for each item in itemlist
• remove any trailing newline
• double all double quotes
• if item contains comma, wrap
• add to linestring

General organization (II)

• for line in file
– ……
– for each item in itemlist
• double all double quotes
• if item contains comma, wrap
• add to linestring
– append linestring to linelist

General organization (III)

• for line in file
– ……
– remove last comma of linestring
– add newline at end of linestring
– append linestring to linelist
• for linestring in linelist
– write linestring into output file

The program (I)

• # betterconvert2csv.py
""" Convert tab-separated file to csv"""
fh = open('grades.txt','r') # input file
linelist = [ ] # global data structure
for line in fh : # outer loop
    itemlist = line.split('\t')
    # print(str(itemlist)) # just for debugging
    linestring = '' # start afresh

The program (II)

• Inner loop, inside the outer loop:
    for item in itemlist : # inner loop
        item = item.replace('"','""') # for quotes
        if item.endswith('\n') : # remove it (also safe for empty fields)
            item = item[:-1]
        if ',' in item : # wrap item
            linestring += '"' + item + '"' + ','
        else : # just append
            linestring += item + ','
    # end of inside for loop

The program (III)

• Last lines of the outer loop, then the output:
    # must replace last comma by newline
    linestring = linestring[:-1] + '\n'
    linelist.append(linestring)
# end of outside for loop
fh.close()
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()

Notes

• Most print statements used for debugging were removed
– Space considerations
• Observe that the inner loop adds a comma after each item
– Wanted to remove the last one
• Must also add a newline at the end of each line

The input file

• Alice 90 90 90 90 90
Bob 85 85 85 85 85
Carol 75 75 75 75 75
Doe, Jane 90 90 90 80 70
Fulano, Eduardo "Lalo" 90 90 90 90

The output file

• Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol ,75,75,75,75,75
"Doe, Jane",90,90 ,90 ,80 ,75
"Fulano, Eduardo ""Lalo""",90,90,90,90

Mistakes being made (I)

• Mixing lists and strings:
– An earlier draft of the program declared
• linestring = [ ]
and did
• linestring.append(item)
– Outcome was
• ['Alice,', '90,', … ]
instead of
• 'Alice,90, …'

Mistakes being made (II)

• Forgetting to add a newline
– Output was a single line
• Doing the append inside the inner loop:
– Output was
• Alice,90
Alice,90,90
Alice,90,90,90
…

Mistakes being made (III)

• Forgetting that strings are immutable:
– Trying to do
• linestring[-1] = '\n'
instead of
• linestring = linestring[:-1] + '\n'
– Bigger issue:
• Do we have to remove the last comma?
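A sketch of both points. Assigning to one character of a string raises TypeError, so a new string must be built. As for the last comma, the string method join() puts the separator only between items, so there is no extra comma to remove (an alternative the slides do not use).

```python
linestring = "Alice,90,90,"

try:
    linestring[-1] = '\n'             # strings are immutable
except TypeError as ex:
    print("TypeError:", ex)

linestring = linestring[:-1] + '\n'   # build a new string instead
print(repr(linestring))               # 'Alice,90,90\n'

# join() inserts commas only between items
items = ['Alice', '90', '90']
joined = ','.join(items) + '\n'
print(repr(joined))                   # 'Alice,90,90\n'
```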

Could we have done better? (I)

• Make the program more readable by decomposing it into functions
– A function to process each line of input
• do_line(line)
– Input is a string ending with newline
– Output is a string in CSV format
– Should call a function processing individual items

Could we have done better? (II)

– A function to process individual items
• do_item(item)
– Input is a string
– Returns a string
• With double quotes "doubled"
• Without a newline
• Within quotes if it contains a comma

The new program (I)

• def do_item(item) :
    item = item.replace('"','""')
    if item.endswith('\n') :
        item = item[:-1]
    if ',' in item :
        item = '"' + item + '"'
    return item

The new program (II)

• def do_line(line) :
    itemlist = line.split('\t')
    linestring = '' # start afresh
    for item in itemlist :
        linestring += do_item(item) + ','
    # replace last comma by newline
    linestring = linestring[:-1] + '\n'
    return linestring

The new program (III)

• fh = open('grades.txt','r')
linelist = [ ]
for line in fh :
    linelist.append(do_line(line))
fh.close()

The new program (IV)

• fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()

Why it is better

• Program is decomposed into small modules that are much easier to understand
– Each fits on a PowerPoint slide

The break statement

• Makes the program exit the loop it is in
• In the next example, we are looking for the first instance of a string in a file
– Can exit as soon as it is found

Example (I)

• searchstring = input('Enter search string:')
found = False
fh = open('grades.txt')
for line in fh :
    if searchstring in line :
        print(line)
        found = True
        break

Example (II)

• if found == True :
    print("String %s was found" % searchstring)
else :
    print("String %s NOT found " % searchstring)
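Python also offers a for ... else clause whose else part runs only when the loop ends without a break, which can replace the found flag. This sketch wraps the search in a hypothetical function `find_first` and uses an in-memory list of lines instead of grades.txt and input(), so it runs as is.

```python
def find_first(lines, searchstring):
    """Hypothetical helper: first line containing searchstring, or None."""
    for line in lines:
        if searchstring in line:
            break          # stop at the first match
    else:
        return None        # loop ended without break: no match
    return line

lines = ["Alice\t90\t90\n", "Bob\t85\t85\n", "Carol\t75\t75\n"]  # stand-in for grades.txt
print(repr(find_first(lines, "Bob")))   # 'Bob\t85\t85\n'
print(find_first(lines, "Zoe"))         # None
```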

Flags

• A variable like found
– That can either be True or False
– That is used in a condition for an if or a while
is often referred to as a flag

A dumb mistake

• Unlike C and its family of languages, Python does not let you write
– if found = True
for
– if found == True
• There are still cases where we can make mistakes!

Example

• >>> b = 5
>>> c = 8
>>> a = b = c
>>> a
8

• >>> a = b == c
>>> a
True

HANDLING EXCEPTIONS

When a wrong value is entered

• When the user is prompted for
– number = int(input("Enter a number: "))
and enters
– a non-numerical string
a ValueError exception is raised and the program terminates

• Python programs can catch such errors

The try… except pair (I)

• try:
    <statements being tried>
except Exception as ex:
    <statements catching the exception>

• Observe
– the colons
– the indentation

The try… except pair (II)

• try:
    <statements being tried>
except Exception as ex:
    <statements catching the exception>

• If an exception occurs while the program executes the statements between the try and the except, control is immediately transferred to the statements under the except; the remaining statements of the try block are skipped
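A small trace of that transfer of control: the statement after the failing int() call never runs.

```python
steps = []
try:
    steps.append("before")
    number = int("abc")          # raises ValueError
    steps.append("after")        # skipped
except ValueError as ex:
    steps.append("handler")
    print("Caught:", ex)

print(steps)   # ['before', 'handler']
```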

A better example

• done = False
while not done :
    filename = input("Enter a file name: ")
    try :
        fh = open(filename)
        done = True
    except Exception as ex:
        print('File %s does not exist' % filename)
print(fh.read())
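Catching every Exception means the message "does not exist" is also printed for other problems, such as a missing permission. A sketch of a narrower handler, assuming a list of names replaces input() so it runs non-interactively; `open_existing` is a hypothetical helper name, and the demo files live in a temporary folder.

```python
import os
import tempfile

def open_existing(names):
    """Hypothetical helper: open the first name that exists, report the others."""
    for filename in names:
        try:
            return open(filename)
        except FileNotFoundError:      # only "no such file"; other errors still stop the program
            print('File %s does not exist' % filename)
    return None

# Demo: one missing file, then one that exists
folder = tempfile.mkdtemp()
good = os.path.join(folder, "grades.txt")
with open(good, "w") as fh:
    fh.write("Alice\t90\n")

fh = open_existing([os.path.join(folder, "missing.txt"), good])
if fh is not None:
    with fh:                           # closes the file when done
        text = fh.read()
    print(text, end="")
```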

An Example (I)

• done = False
while not done :
    try :
        number = int(input('Enter a number:'))
        done = True
    except Exception as ex:
        print('You did not enter a number')
print("You entered %.2f." % number)
input("Hit enter when done with program.")

A simpler solution

• done = False
while not done :
    myinput = input('Enter a number:')
    if myinput.isdigit() :
        number = int(myinput)
        done = True
    else :
        print('You did not enter a number')
print("You entered %.2f." % number)
input("Hit enter when done with program.")
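isdigit() and int() do not accept exactly the same strings: isdigit() rejects a minus sign or surrounding blanks that int() accepts, and accepts characters such as a superscript two that int() rejects. This sketch compares them; `int_or_none` is a hypothetical helper built on try/except.

```python
def int_or_none(text):
    """Hypothetical helper: int value of text, or None if int() rejects it."""
    try:
        return int(text)
    except ValueError:
        return None

for text in ["42", "-5", " 7", "\u00b2"]:       # "\u00b2" is a superscript two
    print(repr(text), text.isdigit(), int_or_none(text))
# '42' True 42
# '-5' False -5
# ' 7' False 7
# '²' True None
```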

PICKLED FILES

Pickled files

• import pickle
– Provides a way to save complex data structures in a file
– Sometimes said to provide a serialized representation of Python objects

Basic primitives (I)

• pickle.dump(object, fh)
– appends a sequential representation of object to the file with file handle fh
– object is virtually any Python object
– fh is the handle of a file that must have been opened in 'wb' mode

The b is a special option that allows us to write or read binary data

Basic primitives (II)

• target = pickle.load(filehandle)
– assigns to target the next pickled object stored in file filehandle
– target is virtually any Python object
– filehandle is the handle of a file that was opened in 'rb' mode

Example (I)

• >>> mylist = [ 2, 'Apples', 5, 'Oranges']
• >>> mylist
[2, 'Apples', 5, 'Oranges']
• >>> fh = open('testfile', 'wb') # b is for BINARY
• >>> import pickle
• >>> pickle.dump(mylist, fh)
• >>> fh.close()

Example (II)

• >>> fhh = open('testfile', 'rb') # b is for BINARY
• >>> theirlist = pickle.load(fhh)
• >>> theirlist
[2, 'Apples', 5, 'Oranges']
• >>> theirlist == mylist
True

What was stored in testfile?

• Some binary data containing the strings 'Apples' and 'Oranges'

Using ASCII format

• Can request a pickled representation of objects that only contains printable characters
– Must specify protocol = 0

• Advantage:
– Easier to debug

• Disadvantage:
– Takes more space
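pickle.dumps() returns the same bytes that pickle.dump() would write to a file, which makes it easy to look at protocol 0 output without creating a file. A sketch comparing it with the default binary protocol; the ASCII check holds for ASCII-only data like this dictionary, and strings with non-ASCII characters may not stay pure ASCII.

```python
import pickle

mydict = {'Alice': 22, 'Bob': 27}

ascii_bytes = pickle.dumps(mydict, protocol=0)   # text-like representation
binary_bytes = pickle.dumps(mydict)              # default binary protocol

print(ascii_bytes)
print(binary_bytes)
print(ascii_bytes.decode('ascii'))               # readable as plain text
print(pickle.loads(ascii_bytes) == mydict)       # True
```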

Example

• import pickle
mydict = {'Alice': 22, 'Bob' : 27}
fh = open('asciifile.txt', 'wb') # MUST be 'wb'
pickle.dump(mydict, fh, protocol = 0)
fh.close()
fhh = open('asciifile.txt', 'rb')
theirdict = pickle.load(fhh)
print(mydict)
print(theirdict)

The output

• {'Bob': 27, 'Alice': 22}
{'Bob': 27, 'Alice': 22}

What is inside asciifile.txt?

• (dp0VBobp1L27LsVAlicep2L22Ls.

Dumping multiple objects (I)

• import pickle
fh = open('asciifile.txt', 'wb')
for k in range(3, 6) :
    mylist = [i for i in range(1,k)]
    print(mylist)
    pickle.dump(mylist, fh, protocol = 0)
fh.close()

Dumping multiple objects (II)

• fhh = open('asciifile.txt', 'rb')
lists = [ ] # initializing list of lists
while 1 : # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break
fhh.close()
print(lists)

Dumping multiple objects (III)

• Note the way we test for end-of-file (EOF)

– while 1 : # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break
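The same end-of-file test can be packaged as a reusable function. In this sketch `load_all` is a hypothetical name, `while True` plays the role of `while 1`, and `io.BytesIO` stands in for asciifile.txt opened in 'wb' and then 'rb' mode.

```python
import io
import pickle

def load_all(fh):
    """Hypothetical helper: every object pickled in fh, in order."""
    objects = []
    while True:
        try:
            objects.append(pickle.load(fh))
        except EOFError:        # raised once no pickled data is left
            break
    return objects

buffer = io.BytesIO()           # stand-in for the file
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer, protocol=0)
buffer.seek(0)                  # rewind before reading, like reopening the file

lists = load_all(buffer)
print(lists)   # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```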

The output

• [1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[[1, 2], [1, 2, 3], [1, 2, 3, 4]]

What is inside asciifile.txt?

• (lp0L1LaL2La.
(lp0L1LaL2LaL3La.
(lp0L1LaL2LaL3LaL4La.

Practical considerations

• You rarely pick the format of your input files
– May have to do format conversion

• You often have to use specific formats for your output files
– Often dictated by the program that will use them

• Otherwise stick with pickled files!