cosc 1306—computer science and programming python functions


Post on 17-Jan-2016


DESCRIPTION

COSC 1306: COMPUTER SCIENCE AND PROGRAMMING, PYTHON FUNCTIONS. Jehan-François Pâris, jfparis@uh.edu. PowerPoint PPT presentation.

TRANSCRIPT

  • COSC 1306: COMPUTER SCIENCE AND PROGRAMMING
    PYTHON FUNCTIONS
    Jehan-François Pâris
    jfparis@uh.edu

  • Module Overview
    We will learn how to read, create and modify files
    Pay special attention to pickled files
        They are very easy to use!

  • The file system
    Provides long-term storage of information
    Will store data in stable storage (disk)
    Cannot be RAM because:
        Dynamic RAM loses its contents when powered off
        Static RAM is too expensive
        System crashes can corrupt contents of the main memory

  • Overall organization
    Data managed by the file system are grouped in user-defined data sets called files
    The file system must provide a mechanism for naming these data
        Each file system has its own set of conventions
    All modern operating systems use a hierarchical directory structure

  • Windows solution
    Each device and each disk partition is identified by a letter
        A: and B: were used by the floppy drives
        C: is the first disk partition of the hard drive
        If the hard drive has no other disk partition, D: denotes the DVD drive
    Each device and each disk partition has its own hierarchy of folders

  • Windows solution (diagram)
    Drive C: with the folders Windows, Users and Program Files
    Flash drive F:

  • UNIX/LINUX organization
    Each device and disk partition has its own directory tree
    Disk partitions are glued together through the mount operation to form a single tree
    Typical user does not know where her files are stored

  • UNIX/LINUX organization (diagram)
    Root partition: bin, usr/
    Other partition
    The magic mount: second partition can be accessed as /usr

  • Mac OS organization
    Similar to Windows
    Disk partitions are not merged
    Represented by separate icons on the desktop

  • Accessing a file (I)
    Your Python programs are stored in a folder AKA directory
    On my home PC it is
        C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python
    All files in that directory can be directly accessed through their names
        "myfile.txt"

  • Accessing a file (II)
    Files in subdirectories can be accessed by specifying first the subdirectory
    Windows style:
        "test\\sample.txt"
        Note the double backslash
    Linux/Unix/Mac OS X style:
        "test/sample.txt"
        Generally works for Windows

  • Why the double backslash?
    The backslash is an escape character in Python
    Combines with its successor to represent non-printable characters
        \n represents a newline
        \t represents a tab
    Must use \\ to represent a plain backslash
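
    A quick way to check the escape rule is to look at string lengths. The sketch below is only an illustration; the variable names are made up for the example.

```python
# Each escape sequence is a single character, even though it takes
# two keystrokes to type.
newline = "\n"
tab = "\t"
backslash = "\\"

print(len(newline), len(tab), len(backslash))

# A relative path written Windows style (same path as on the slide).
path = "test\\sample.txt"
print(path)        # the string holds one backslash, not two
print(len(path))   # 4 letters + 1 backslash + 10 characters for sample.txt
```

    Printing the path shows a single backslash, which is what the operating system will see.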

  • Accessing a file (III)
    For other files, must use full pathname
    Windows style:
        "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"

  • Accessing file contents
    Two step process:
        First we open the file
        Then we access its contents
            Read
            Write
    When we are done, we close the file.

  • What happens at open() time?
    The system verifies
        That you are an authorized user
        That you have the right permission
            Read permission
            Write permission
            Execute permission exists but doesn't apply
    and returns a file handle / file descriptor

  • The file handle
    Gives the user
        Direct access to the file
            No directory lookups
        Authority to execute the file operations whose permissions have been requested

  • Python open()
    open(name, mode = 'r', buffering = -1) where
        name is name of file
        mode is permission requested
            Default is 'r' for read only
        buffering specifies the buffer size
            Use system default value (code -1)

  • The modes
    Can request
        'r' for read-only
        'w' for write-only
            Always overwrites the file
        'a' for append
            Writes at the end
        'r+' or 'a+' for updating (read + write/append)
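
    The difference between 'w' and 'a' is easiest to see by writing to the same file several times. This is a minimal sketch: the file name modes_demo.txt and the temporary directory are assumptions made for the example, not part of the course material.

```python
import os
import tempfile

# Work in a throwaway directory; "modes_demo.txt" is a made-up name.
workdir = tempfile.mkdtemp()
name = os.path.join(workdir, "modes_demo.txt")

fh = open(name, "w")        # creates the file, or empties it if it exists
fh.write("first\n")
fh.close()

fh = open(name, "a")        # append: new data goes at the end
fh.write("second\n")
fh.close()

fh = open(name, "r")        # read-only
after_append = fh.read()
fh.close()

fh = open(name, "w")        # reopening with "w" discards the old contents
fh.write("third\n")
fh.close()

fh = open(name)             # mode defaults to "r"
after_rewrite = fh.read()
fh.close()

print(repr(after_append))
print(repr(after_rewrite))
```

    After the append the file holds both lines; after the second 'w' only the last line is left.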

  • Examples
    f1 = open("myfile.txt")
        same as f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

  • Reading a file
    Three ways:
        Global reads
        Line by line
        Pickled files

  • Global reads
    fh.read()
        Returns whole contents of file specified by file handle fh
        File contents are stored in a single string that might be very large

  • Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # not required

  • Output of example
    To be or not to be that is the question
    Now is the winter of our discontent

    Exact contents of file test\sample.txt
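
    To try a global read without the course files, the sketch below first creates its own sample file. The two-line contents, with no newline after the last line, are an assumption about what test/sample.txt looks like; the real file may differ.

```python
import os
import tempfile

# Assumed contents: two lines, the last one without a trailing newline.
sample_text = ("To be or not to be that is the question\n"
               "Now is the winter of our discontent")

# Create the sample file in a throwaway directory.
workdir = tempfile.mkdtemp()
name = os.path.join(workdir, "sample.txt")
fh = open(name, "w")
fh.write(sample_text)
fh.close()

# Global read: the whole file comes back as one string.
fh = open(name, "r")
bigstring = fh.read()
fh.close()

print(bigstring)
print(bigstring.count("\n"), "end-of-line marker(s) inside the string")
```

    The end-of-line markers are part of the string, which is why print(bigstring) reproduces the file layout exactly.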

  • Line-by-line reads
    for line in fh:  # do not forget the colon
        # anything you want
    fh.close()  # not required

  • Example
    f3 = open("test/sample.txt", "r")
    for line in f3:  # do not forget the colon
        print(line)
    f3.close()  # not required

  • Output
    To be or not to be that is the question

    Now is the winter of our discontent

    With one or more extra blank lines

  • Why?
    Each line ends with an end-of-line marker
    print() adds an extra end-of-line
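
    Both effects can be made visible with repr(), and the second one can be switched off with the end argument of print(). In this sketch io.StringIO stands in for an open text file, and the two-line sample text is an assumption.

```python
import io

# io.StringIO behaves like a text file opened for reading.
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")
lines = [line for line in fh]

for line in lines:
    print(repr(line))      # repr() shows the trailing '\n' when there is one

for line in lines:
    print(line, end="")    # end="" stops print() from adding its own newline
print()                    # finish the last line of output
```

    With end="" the only newlines in the output are the ones that were already in the file.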

  • Trying to remove blank lines
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:  # do not forget the colon
        print(line[:-1])  # remove last char
    f5.close()  # not required
    print('-----------------------------------------------------')

  • The output
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our disconten
    -----------------------------------------------------

    The last line did not end with an EOL!

  • A smarter solution (I)
    Only remove the last character if it is an EOL
        if line[-1] == '\n':
            print(line[:-1])
        else:
            print(line)

  • A smarter solution (II)
    print('----------------------------------------------------')
    fh = open("test/sample.txt", "r")
    for line in fh:  # do not forget the colon
        if line[-1] == '\n':
            print(line[:-1])  # remove last char
        else:
            print(line)
    print('-----------------------------------------------------')
    fh.close()  # not required

  • It works!
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our discontent
    -----------------------------------------------------
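
    The same test can be wrapped in a small helper so it is written only once. The function name strip_eol is hypothetical, and the sample text is again an assumption. The last lines show that the built-in string method rstrip('\n') gives the same result here, as an alternative to the slide's approach.

```python
import io

def strip_eol(line):
    """Return line without its trailing newline, if it has one."""
    if line.endswith("\n"):   # also safe for an empty string
        return line[:-1]
    return line

sample = ("To be or not to be that is the question\n"
          "Now is the winter of our discontent")

# io.StringIO stands in for an open text file.
cleaned = [strip_eol(line) for line in io.StringIO(sample)]
for line in cleaned:
    print(line)

# Alternative: rstrip("\n") removes trailing newline characters only.
stripped = [line.rstrip("\n") for line in io.StringIO(sample)]
print(cleaned == stripped)
```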

  • Making sense of file contents
    Most files contain more than one data item per line
        COSC    713-743-3350
        UHPD    713-743-3333
    Must split lines
        mystring.split(sepchar) where sepchar is a separation character
        returns a list of items

  • Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']

    >>> record = "1,'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
    Not what we wanted!
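
    One way to keep 'Baker, Andy' in one piece is the csv module of the standard library, which understands quoted fields. This is a sketch of an alternative, not the approach the slides develop. The quotechar and skipinitialspace settings are chosen to match this particular record.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# quotechar="'" because this record puts single quotes around the name;
# skipinitialspace=True ignores the blank that follows each comma.
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)

print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```

    csv.reader accepts any iterable of lines, so an open file handle can be passed in place of the one-element list.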

  • Example
    # how2split.py
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:
        words = line.split()
        for xxx in words:
            print(xxx)
    f5.close()  # not required
    print('-----------------------------------------------------')

  • Output
    ----------------------------------------------------
    To
    be
    ...
    of
    our
    discontent
    -----------------------------------------------------

  • Other separators (I)
    Commas
        CSV Excel format
        Values are separated by commas
        Strings are stored without quotes
            Unless they contain a comma
                "Doe, Jane", freshman, 90, 90
            Quotes within strings are doubled

  • Other separators (II)
    Tabs (\t)
    Advantages:
        Your fields will appear nicely aligned
        Spaces, commas, are not an issue
    Disadvantage:
        You do not see them
            They look like spaces

  • Why it is important
    When you must pick your file format, you should decide how the data inside the file will be used:
        People will read them
        Other programs will use them
        Both people and machines will use them

  • An exercise
    Converting our output to CSV format
        Replacing tabs by commas
            Easy
            Will use string replace function

  • First attempt
    fh_in = open('grades.txt', 'r')  # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')

  • The output
    (fields in the input are separated by tabs)
        Alice   90   90   90   90   90
        Bob     85   85   85   85   85
        Carol   75   75   75   75   75
    becomes
        Alice,90,90,90,90,90
        Bob,85,85,85,85,85
        Carol,75,75,75,75,75
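
    The plain replace works for this file, but it breaks as soon as a field itself contains a comma. The sketch below shows the problem on made-up data: the name "Baker, Bob" is invented for the example.

```python
# Two tab-separated records with three fields each; the second name
# contains a comma (made-up data).
buffer = "Alice\t90\t90\nBaker, Bob\t85\t85\n"

newbuffer = buffer.replace("\t", ",")
print(newbuffer, end="")

# Count the fields a naive comma split would now see on each line.
counts = [len(line.split(",")) for line in newbuffer.splitlines()]
print(counts)   # [3, 4]
```

    The second record now appears to have four fields, so a program reading the CSV file would misalign its columns.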

  • Dealing with commas (I)
    Work line by line
    For each line
        split input into fields using TAB as separator
        store fields into a list
            Alice   90   90   90   90   90
            becomes
            ['Alice', '90', '90', '90', '90', '90']

  • Dealing with commas (II)
    Put within double quotes any entry containing one or more commas
    Output list entries separated by commas
        ['"Baker, Alice"', 90, 90, 90, 90, 90]
        becomes
        "Baker, Alice",90,90,90,90,90

  • Dealing with commas (III)
    Our troubles are not over:
        Must store somewhere all lines until we are done
        Store them in a list

  • Dealing with double quotes
    Before wrapping items that contain commas in double quotes, replace all double quotes by pairs of double quotes
        'Aguirre, "Lalo" Eduardo'
        becomes
        'Aguirre, ""Lalo"" Eduardo'
        then
        '"Aguirre, ""Lalo"" Eduardo"'
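
    The two rules, doubling first and wrapping second, fit in a short helper. The function name csv_field is hypothetical; the sketch follows the order given on the slide.

```python
def csv_field(item):
    """Prepare one field for a CSV line.

    Step 1: double every double quote.
    Step 2: wrap the field in double quotes if it contains a comma.
    """
    item = item.replace('"', '""')
    if "," in item:
        item = '"' + item + '"'
    return item

print(csv_field('Aguirre, "Lalo" Eduardo'))   # "Aguirre, ""Lalo"" Eduardo"
print(csv_field('Baker, Alice'))              # "Baker, Alice"
print(csv_field('90'))                        # 90
```

    The order matters: wrapping first and doubling afterwards would also double the two quotes that were just added.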

  • General organization (I)
    linelist = [ ]
    for line in file
        itemlist = line.split()
        linestring = '' # empty string
        for each item in itemlist
            remove any trailing newline
            double all double quotes
            if item contains comma, wrap
            add to linestring

  • General organization (II)
    for line in file
        for each item in itemlist
            double all double quotes
            if item contains comma, wrap
            add to linestring
        append linestring to linelist

  • General organization (III)
    for line in file
        remove last comma of linestring
        add newline at end of linestring
        append linestring to linelist
    for linestring in linelist
        write linestring into output file
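
    The outline above can be sketched as a single function. This is one possible reading of the pseudocode, not the course's own program: the function name tabs_to_csv is hypothetical, io.StringIO objects stand in for the input and output files, the data is made up, and the trailing newline is removed from the whole line before splitting rather than from each item.

```python
import io

def tabs_to_csv(fh_in, fh_out):
    """Copy tab-separated lines from fh_in to fh_out in CSV format."""
    linelist = []                          # all converted lines
    for line in fh_in:
        if line.endswith("\n"):            # remove any trailing newline
            line = line[:-1]
        itemlist = line.split("\t")
        linestring = ""                    # start afresh
        for item in itemlist:
            item = item.replace('"', '""') # double all double quotes
            if "," in item:                # wrap items containing a comma
                item = '"' + item + '"'
            linestring += item + ","
        linestring = linestring[:-1]       # remove last comma
        linelist.append(linestring + "\n") # add newline, then store
    for linestring in linelist:
        fh_out.write(linestring)

source = io.StringIO('Baker, Alice\t90\t90\n'
                     'Aguirre, "Lalo" Eduardo\t85\t85\n')
target = io.StringIO()
tabs_to_csv(source, target)
print(target.getvalue(), end="")
```

    Passing real file handles, for example open('grades.txt') and open('grades.csv', 'w'), would work the same way, since the function only iterates over the input and calls write() on the output.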

  • The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')  # input file
    linelist = [ ]  # global data structure
    for line in fh:  # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist))  # just for debugging
        linestring = ''  # start afresh

  • The program (II) for item in itemli