COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS


Posted on 17-Jan-2016


DESCRIPTION

COSC 1306: COMPUTER SCIENCE AND PROGRAMMING. PYTHON FUNCTIONS. Jehan-François Pâris, jfparis@uh.edu. Module Overview: We will learn how to read, create and modify files. Pay special attention to pickled files: they are very easy to use! (PowerPoint presentation)

TRANSCRIPT

<ul><li><p>COSC 1306: COMPUTER SCIENCE AND PROGRAMMING. PYTHON FUNCTIONS. Jehan-François Pâris, jfparis@uh.edu</p></li><li><p>Module Overview: We will learn how to read, create and modify files. Pay special attention to pickled files: they are very easy to use!</p></li><li><p>The file system: Provides long-term storage of information. It stores data in stable storage (disk). This storage cannot be RAM because dynamic RAM loses its contents when powered off, static RAM is too expensive, and system crashes can corrupt the contents of main memory.</p></li><li><p>Overall organization: Data managed by the file system are grouped in user-defined data sets called files. The file system must provide a mechanism for naming these data. Each file system has its own set of conventions. All modern operating systems use a hierarchical directory structure.</p></li><li><p>Windows solution: Each device and each disk partition is identified by a letter. A: and B: were used by the floppy drives. C: is the first disk partition of the hard drive. If the hard drive has no other disk partition, D: denotes the DVD drive. Each device and each disk partition has its own hierarchy of folders.</p></li><li><p>Windows solution (diagram): drive C: with the folders Windows, Users and Program Files; a flash drive F:.</p></li><li><p>UNIX/LINUX organization: Each device and disk partition has its own directory tree. Disk partitions are glued together through the mount operation to form a single tree. The typical user does not know where her files are stored.</p></li><li><p>UNIX/LINUX organization (diagram): the root partition (/) contains bin and usr; through "the magic mount", the second partition can be accessed as /usr.</p></li><li><p>Mac OS organization: Similar to Windows. Disk partitions are not merged; they are represented by separate icons on the desktop.</p></li><li><p>Accessing a file (I): Your Python programs are stored in a folder, AKA directory. On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python. All files in that directory can be directly accessed through their names: "myfile.txt"</p></li><li><p>Accessing a file (II): Files in subdirectories
can be accessed by first specifying the subdirectory. Windows style: "test\\sample.txt" (note the double backslash). Linux/Unix/Mac OS X style: "test/sample.txt" (generally works for Windows too).</p></li><li><p>Why the double backslash? The backslash is an escape character in Python: it combines with its successor to represent non-printable characters. \n represents a newline and \t represents a tab. You must use \\ to represent a plain backslash.</p></li><li><p>Accessing a file (III): For other files, you must use the full pathname. Windows style: "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"</p></li><li><p>Accessing file contents: It is a two-step process. First we open the file, then we access its contents (read or write). When we are done, we close the file.</p></li><li><p>What happens at open() time? The system verifies that you are an authorized user and that you have the right permission (read permission or write permission; execute permission exists but does not apply), and returns a file handle (file descriptor).</p></li><li><p>The file handle: Gives the user direct access to the file, with no directory lookups, and the authority to execute the file operations whose permissions have been requested.</p></li><li><p>Python open(): open(name, mode='r', buffering=-1), where name is the name of the file; mode is the permission requested (the default is 'r' for read-only); buffering specifies the buffer size (-1 uses the system default value).</p></li><li><p>The modes: You can request 'r' for read-only; 'w' for write-only (always overwrites the file); 'a' for append (writes at the end); 'r+' or 'a+' for updating (read + write/append).</p></li><li><p>Examples:
f1 = open("myfile.txt") # same as f1 = open("myfile.txt", "r")
f2 = open("test\\sample.txt", "r")
f3 = open("test/sample.txt", "r")
f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")</p></li><li><p>Reading a file: There are three ways: global reads, line by line, and pickled files.</p></li><li><p>Global reads: fh.read() returns the whole contents of the file specified by file handle fh. The file contents are stored in a single string that might
be very large.</p></li><li><p>Example:
f2 = open("test\\sample.txt", "r")
bigstring = f2.read()
print(bigstring)
f2.close() # not strictly required</p></li><li><p>Output of example:
To be or not to be that is the question
Now is the winter of our discontent</p><p>Exact contents of file test\sample.txt</p></li><li><p>Line-by-line reads:
for line in fh : # do not forget the colon
    ... # anything you want
fh.close() # not strictly required</p></li><li><p>Example:
f3 = open("test/sample.txt", "r")
for line in f3 : # do not forget the colon
    print(line)
f3.close() # not strictly required</p></li><li><p>Output:
To be or not to be that is the question

Now is the winter of our discontent</p><p>With one or more extra blank lines</p></li><li><p>Why? Each line ends with an end-of-line marker, and print() adds an extra end-of-line.</p></li><li><p>Trying to remove blank lines:
print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 : # do not forget the colon
    print(line[:-1]) # remove last char
f5.close() # not strictly required
print('-----------------------------------------------------')</p></li><li><p>The output:
----------------------------------------------------
To be or not to be that is the question
Now is the winter of our disconten
-----------------------------------------------------</p><p>The last line did not end with an EOL!</p></li><li><p>A smarter solution (I): Only remove the last character if it is an EOL:
if line[-1] == '\n' :
    print(line[:-1])
else :
    print(line)</p></li><li><p>A smarter solution (II):
print('----------------------------------------------------')
fh = open("test/sample.txt", "r")
for line in fh : # do not forget the colon
    if line[-1] == '\n' :
        print(line[:-1]) # remove last char
    else :
        print(line)
print('-----------------------------------------------------')
fh.close() # not strictly required</p></li><li><p>It works!
----------------------------------------------------
To be or not to be that is the question
Now is the winter of our discontent
-----------------------------------------------------</p></li><li><p>Making sense of file contents: Most files contain more than one data item per line, such as "COSC 713-743-3350" and "UHPD 713-743-3333". We must split lines: mystring.split(sepchar), where sepchar is a separation character, returns a list of items. With no argument, split() splits on runs of whitespace.</p></li><li><p>Splitting strings:
&gt;&gt;&gt; text = "Four score and seven years ago"
&gt;&gt;&gt; text.split()
['Four', 'score', 'and', 'seven', 'years', 'ago']</p><p>&gt;&gt;&gt; record = "1,'Baker, Andy', 83, 89, 85"
&gt;&gt;&gt; record.split(',')
['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted!</p></li><li><p>Example:
# how2split.py
print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 :
    words = line.split()
    for xxx in words :
        print(xxx)
f5.close() # not strictly required
print('-----------------------------------------------------')</p></li><li><p>Output:
----------------------------------------------------
To
be
...
of
our
discontent
-----------------------------------------------------</p></li><li><p>Other separators (I): Commas: CSV (Excel format). Values are separated by commas. Strings are stored without quotes unless they contain a comma: "Doe, Jane",freshman,90,90. Quotes within strings are doubled.</p></li><li><p>Other separators (II): Tabs (\t). Advantages: your fields will appear nicely aligned, and spaces and commas are not an issue. Disadvantage: you do not see them; they look like spaces.</p></li><li><p>Why it is important: When you must pick your file format, you should decide how the data inside the file will be used: will people read them, will other programs use them, or will they be used by both people and machines?</p></li><li><p>An exercise: Converting our output to CSV format by replacing tabs with commas. Easy: we will use the string replace() method.</p></li><li><p>First attempt:
fh_in = open('grades.txt', 'r') # the 'r' is optional
buffer = fh_in.read()
newbuffer = buffer.replace('\t', ',')
fh_out = open('grades0.csv', 'w')
fh_out.write(newbuffer)
fh_in.close()
fh_out.close()
print('Done!')</p></li><li><p>The output (tabs shown as \t):
Alice\t90\t90\t90\t90\t90
Bob\t85\t85\t85\t85\t85
Carol\t75\t75\t75\t75\t75
becomes
Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol,75,75,75,75,75</p></li><li><p>Dealing with commas (I): Work line by line. For each line, split the input into fields using TAB as separator and store the fields into a list: Alice\t90\t90\t90\t90\t90 becomes ['Alice', '90', '90', '90', '90', '90\n']</p></li><li><p>Dealing with commas (II): Put within double quotes any entry containing one or more commas, then output the list entries separated by commas: ['"Baker, Alice"', '90', '90', '90', '90', '90'] becomes "Baker, Alice",90,90,90,90,90</p></li><li><p>Dealing with commas (III): Our troubles are not over: we must store all lines somewhere until we are done. We store them in a list.</p></li><li><p>Dealing with double quotes: Before wrapping items that contain commas in double quotes, replace all double quotes by pairs of double quotes: 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo', then '"Aguirre, ""Lalo"" Eduardo"'</p></li><li><p>General organization (I):
linelist = [ ]
for line in file
    itemlist = line.split('\t')
    linestring = '' # empty string
    for each item in itemlist
        remove any trailing newline
        double all double quotes
        if item contains comma, wrap
        add to linestring</p></li><li><p>General organization (II):
for line in file
    for each item in itemlist
        double all double quotes
        if item contains comma, wrap
        add to linestring
    append linestring to linelist</p></li><li><p>General organization (III):
for line in file
    remove last comma of linestring
    add newline at end of linestring
    append linestring to linelist
for linestring in linelist
    write linestring into output file</p></li><li><p>The program (I):
# betterconvert2csv.py
""" Convert tab-separated file to csv """
fh = open('grades.txt','r') # input file
linelist = [ ] # global data structure
for line in fh : # outer loop
    itemlist = line.split('\t')
    # print(str(itemlist)) # just for debugging
    linestring = '' # start afresh</p></li><li><p>The program (II):
    for item in itemlist : # inner loop
        item = item.replace('"','""') # for quotes
        if item[-1] == '\n' : # remove it
            item = item[:-1]
        if ',' in item : # wrap item
            linestring += '"' + item + '"' + ','
        else : # just append
            linestring += item + ','
    # end of inside for loop</p></li><li><p>The program (III):
    # must replace last comma by newline
    linestring = linestring[:-1] + '\n'
    linelist.append(linestring)
# end of outside for loop
fh.close()
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()</p></li><li><p>Notes: Most print statements used for debugging were removed for space considerations. Observe that the inner loop adds a comma after each item; we wanted to remove the last one. We must also add a newline at the end of each line.</p></li><li><p>The input file (tabs shown as \t):
Alice\t90\t90\t90\t90\t90
Bob\t85\t85\t85\t85\t85
Carol \t75\t75\t75\t75\t75
Doe, Jane\t90\t90 \t90 \t80 \t70
Fulano, Eduardo "Lalo"\t90\t90\t90\t90</p></li><li><p>The output file:
Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol ,75,75,75,75,75
"Doe, Jane",90,90 ,90 ,80 ,70
"Fulano, Eduardo ""Lalo""",90,90,90,90</p></li><li><p>Mistakes being made (I): Mixing lists and strings. An earlier draft of the program declared linestring = [ ] and did linestring.append(item). The outcome was ['Alice,', '90,', ...] instead of 'Alice,90,...'</p></li><li><p>Mistakes being made (II): Forgetting to add a newline: the output was a single line. Doing the append inside the inner loop: the output was Alice,90 Alice,90,90 Alice,90,90,90 ...</p></li><li><p>Mistakes being made (III): Forgetting that strings are immutable: trying to do linestring[-1] = '\n' instead of linestring = linestring[:-1] + '\n'. Bigger issue: do we have to remove the last comma?</p></li><li><p>Could we have done better? (I): Make the program more readable by decomposing it into functions. A function to process each line of input, do_line(line): its input is a string ending with a newline; its output is a string in CSV format; it should call a function processing individual items.</p></li><li><p>Could we have done better?
(II): A function to process individual items, do_item(item): its input is a string; it returns a string with double quotes "doubled", without a newline, and within quotes if it contains a comma.</p></li><li><p>The new program (I):
def do_item(item) :
    item = item.replace('"','""')
    if item.endswith('\n') : # also safe for an empty item
        item = item[:-1]
    if ',' in item :
        item = '"' + item + '"'
    return item</p></li><li><p>The new program (II):
def do_line(line) :
    itemlist = line.split('\t')
    linestring = '' # start afresh
    for item in itemlist :
        linestring += do_item(item) + ','
    linestring = linestring[:-1] + '\n' # replace last comma by newline
    return linestring</p></li><li><p>The new program (III):
fh = open('grades.txt','r')
linelist = [ ]
for line in fh :
    linelist.append(do_line(line))
fh.close()</p></li><li><p>The new program (IV):
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()</p></li><li><p>Why it is better: The program is decomposed into small modules that are much easier to understand. Each fits on a PowerPoint slide.</p></li><li><p>The break statement: Makes the program exit the loop it is in. In the next example, we are looking for the first instance of a string in a file; we can exit as soon as it is found.</p></li><li><p>Example (I):
searchstring = input('Enter search string: ')
found = False
fh = open('grades.txt')
for line in fh :
    if searchstring in line :
        print(line)
        found = True
        break</p></li><li><p>Example (II):
if found == True :
    print("String %s was found" % searchstring)
else :
    print("String %s NOT found" % searchstring)</p></li><li><p>Flags: A variable like found that can be either True or False and that is used in the condition of an if or a while is often referred to as a flag.</p></li><li><p>A dumb mistake: Unlike C and its family of languages, Python does not let you write if found = True when you mean if found == True: it is a syntax error. There are still cases where we can make mistakes!</p></li><li><p>Example:
&gt;&gt;&gt; b = 5
&gt;&gt;&gt; c = 8
&gt;&gt;&gt; a = b = c
&gt;&gt;&gt; a
8
&gt;&gt;&gt; a = b == c
&gt;&gt;&gt; a
True
(a = b = c assigns the value of c to both b and a; a = b == c assigns to a the result of the comparison b == c.)</p></li><li><p>HANDLING EXCEPTIONS</p></li><li><p>When a wrong value is entered: When
the user is prompted for
number = int(input("Enter a number: "))
and enters a non-numerical string, a ValueError exception is raised and the program terminates. Python programs can catch such errors.</p></li><li><p>The try except pair (I):
try :
    ...
except Exception as ex :
    ...
Observe the colons and the indentation.</p></li><li><p>The try except pair (II):
try :
    ...
except Exception as ex :
    ...
If an exception occurs while the program executes the statements between the try and the except, control is immediately transferred to the statements after the except.</p></li><li><p>A better example:
done = False
while not done :
    filename = input("Enter a file name: ")
    try :
        fh = open(filename)
        done = True
    except Exception as ex :
        print('File %s does not exist' % filename)
print(fh.read())</p></li><li><p>An Example (I):
done = False
while not done :
    try :
        number = int(input('Enter a number: '))
        done = True
    except Exception as ex :
        print('You did not enter a number')
print("You entered %.2f." % number)
input("Hit enter when done with program.")</p></li><li><p>A simpler solution:
done = False
while not done :
    myinput = input('Enter a number: ')
    if myinput.isdigit() :
        number = int(myinput)
        done = True
    else :
        print('You did not enter a number')
print("You entered %.2f."
% number)
input("Hit enter when done with program.")</p></li><li><p>PICKLED FILES</p></li><li><p>Pickled files:
import pickle
Provides a way to save complex data structures in a file. It is sometimes said to provide a serialized representation of Python objects.</p></li><li><p>Basic primitives (I): pickle.dump(object, fh) appends a sequential representation of object to the file with file handle fh. object is virtually any Python object; fh is the handle of a file that must have been opened in 'wb' mode, where b is a special option allowing you to write or read binary data.</p></li><li><p>Basic primitives (II): target = pickle.load(filehandle) assigns to target the next pickled object stored in the file filehandle. target is virtually any Python object; filehandle is the file handle of a file that was opened in 'rb' mode.</p></li><li><p>Example (I):
&gt;&gt;&gt; mylist = [2, 'Apples', 5, 'Oranges']
&gt;&gt;&gt; mylist
[2, 'Apples', 5, 'Oranges']
&gt;&gt;&gt; fh = open('testfile',...</p></li></ul>
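The pickle example above is cut off after fh = open('testfile',... and the sketch below does not try to reconstruct it. It is a separate, self-contained round trip. It dumps two objects into one file opened in 'wb' mode, then loads them back from 'rb' mode in the same order. The temporary folder and the dictionary mydict are assumptions made for illustration.

```python
import os
import pickle
import tempfile

mylist = [2, 'Apples', 5, 'Oranges']
mydict = {'Alice': [90, 90], 'Bob': [85, 85]}   # hypothetical second object

# Assumption: the file lives in a fresh temporary folder, not the course folder.
path = os.path.join(tempfile.mkdtemp(), 'testfile')

with open(path, 'wb') as fh:          # 'wb': write binary data
    pickle.dump(mylist, fh)           # each dump appends one object to the file
    pickle.dump(mydict, fh)

with open(path, 'rb') as fh:          # 'rb': read binary data
    first = pickle.load(fh)           # objects come back in the order they were dumped
    second = pickle.load(fh)

print(first)    # [2, 'Apples', 5, 'Oranges']
print(second)   # {'Alice': [90, 90], 'Bob': [85, 85]}
```

The pickle documentation warns that unpickling data from an untrusted source can execute arbitrary code. Only load pickled files that you created yourself.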
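The "Why the double backslash?" slide explains escaping with \\. Python's raw strings, written with an r prefix, are another common way to write Windows paths, because they keep backslashes exactly as typed. A small sketch using the slides' file name test\sample.txt:

```python
# The same relative path written two ways.
escaped = "test\\sample.txt"   # each \\ stands for one backslash
raw = r"test\sample.txt"       # r prefix: backslashes are kept as typed
print(escaped == raw)          # True

# The trap the double backslash avoids: \t is a tab, not backslash + t.
print(len("C:\temp"))          # 6 characters: C, :, TAB, e, m, p
print(len(r"C:\temp"))         # 7 characters
```

One caveat: a raw string cannot end with a single backslash. For example, r"C:\temp\" is a syntax error.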
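The "smarter solution" slides test line[-1] == '\n' by hand. A common alternative is str.rstrip('\n') inside a with block, which closes the file automatically. The sketch below assumes a throwaway copy of test/sample.txt written to a temporary folder. As on the slides, its last line has no end-of-line marker.

```python
import os
import tempfile

# Assumption: a throwaway copy of test/sample.txt in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fh:
    fh.write('To be or not to be that is the question\n'
             'Now is the winter of our discontent')   # no EOL at the end

lines = []
with open(path, 'r') as fh:               # the file is closed when the block ends
    for line in fh:
        lines.append(line.rstrip('\n'))   # strips the EOL only if the line has one

for line in lines:
    print(line)
```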
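The "Splitting strings" slide ends with "Not what we wanted!" because split(',') also cuts inside 'Baker, Andy'. A sketch of one way to get the intended fields uses the standard csv module. It assumes the name is quoted with single quotes, as in that record, and that a blank follows each comma.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"
# quotechar="'" because this record quotes the name with single quotes;
# skipinitialspace=True ignores the blank right after each comma.
fields = next(csv.reader([record], quotechar="'", skipinitialspace=True))
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```

The fields are still strings. Converting the grades to numbers, for example with int(), is a separate step.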
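The tab-to-CSV exercise can also be handed to Python's standard csv module. It applies the same rules the slides implement by hand: it wraps fields that contain commas and doubles inner double quotes. This is an alternative to the course's program, not the course's method. The function name tsv_to_csv is hypothetical, and the sketch works on strings through io.StringIO, so it runs without grades.txt.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV text with the standard csv module."""
    # QUOTE_NONE: double quotes in the input are treated as ordinary characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter='\t', quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default quoting (QUOTE_MINIMAL) wraps a field in double quotes only when it
    # contains a comma, a double quote or a line break, and doubles inner quotes.
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

sample = 'Alice\t90\t90\nDoe, Jane\t90\t80\nFulano, Eduardo "Lalo"\t90\t90\n'
print(tsv_to_csv(sample), end='')
# Alice,90,90
# "Doe, Jane",90,80
# "Fulano, Eduardo ""Lalo""",90,90
```

To convert the real files, open grades.txt and great.csv with newline='' as the csv documentation recommends, and pass the file handles to csv.reader and csv.writer. lineterminator='\n' is set here only to get plain newlines, since the writer's default is '\r\n'. There is one difference from the slides: the csv module also quotes a field that contains a double quote but no comma.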
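The last "Mistakes being made" slide asks whether we have to remove the last comma at all. One possible answer is str.join(), which places separators only between items, so no trailing comma is ever produced. The sketch below keeps the slides' do_item() rules and rewrites only do_line():

```python
def do_item(item):
    """Prepare one tab-separated field for CSV output (same rules as the slides)."""
    item = item.replace('"', '""')     # double any double quotes
    if item.endswith('\n'):            # drop a trailing newline, if any
        item = item[:-1]
    if ',' in item:                    # wrap fields that contain a comma
        item = '"' + item + '"'
    return item

def do_line(line):
    """Convert one tab-separated line into one CSV line ending with a newline."""
    # join() puts commas only between items: no trailing comma to remove.
    return ','.join(do_item(item) for item in line.split('\t')) + '\n'

print(do_line('Doe, Jane\t90\t80\n'), end='')   # "Doe, Jane",90,80
```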
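The search example uses a found flag to remember whether break was executed. Python also allows an else clause on a for loop, which runs only when the loop finishes without a break. The sketch below uses that clause in place of the flag. The helper name search is hypothetical, and io.StringIO stands in for open('grades.txt') so the code runs anywhere.

```python
import io

def search(fh, searchstring):
    """Return the first line of fh containing searchstring, or None."""
    for line in fh:
        if searchstring in line:
            result = line
            break                 # stop at the first match
    else:                         # runs only when the loop ends without a break
        result = None
    return result

grades = io.StringIO('Alice\t90\nBob\t85\nCarol\t75\n')
print(repr(search(grades, 'Bob')))   # 'Bob\t85\n'
```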
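The two number-reading examples in the exceptions slides do not accept the same inputs: "-5".isdigit() is False, while int("-5") succeeds. The sketch below catches only ValueError, which is the exception int() raises for non-numerical strings, rather than any Exception. It separates the check from input() so it can be tried directly. parse_number is a hypothetical helper name.

```python
def parse_number(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:        # int() raises ValueError for strings like "abc"
        return None

for text in ['42', '-5', ' 7 ', 'abc']:
    print(repr(text), '->', parse_number(text))
# '42' -> 42
# '-5' -> -5
# ' 7 ' -> 7
# 'abc' -> None
```

Catching a specific exception keeps unrelated errors visible. A bare except Exception would also hide mistakes such as a misspelled variable name inside the try block.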