COS 131: Computing for Engineers, Chapter 8: File Input and Output

July 4, 2022 1 Douglas R. Sizemore, Ph.D. Professor of Computer Science COS 131: Computing for Engineers Chapter 8: File Input and Output This lecture was given in Fall, 2008 by Professor Sizemore and refers to an older Version of MATLAB than R2011A.

Upload: britain

Post on 19-Jan-2016



TRANSCRIPT

Page 1: COS 131: Computing for Engineers Chapter 8: File Input and Output

April 21, 2023 1

Douglas R. Sizemore, Ph.D.

Professor of Computer Science

COS 131: Computing for Engineers

Chapter 8: File Input and Output

This lecture was given in Fall 2008 by Professor Sizemore and refers to a version of MATLAB older than R2011a.


Introduction

• Addresses three levels of capability for reading and writing files in MATLAB
  – Saving and restoring the workspace
  – High-level functions for accessing files in specific formats
  – Low-level file access programs for general-purpose file processing
  – Consider conditions under which each is appropriate


Introduction

• Consider three types of activities that read and write data files.
  – MATLAB has a basic ability to save your workspace (or parts of it) to a file and restore it later for further processing
  – High-level MATLAB functions take the name of a file in any one of a number of popular formats and produce an internal representation of the data from that file in a form ready for processing
  – Lower-level capabilities are needed for manipulating text files that do not have recognizable structures


Introduction

• Will consider files containing:
  – Workspace variables
  – Spreadsheet data
  – Text files with delimited numbers
  – Text files with plain text
• MATLAB also has the ability to access binary files (files whose data are not in text form); we will not consider binary files here.


Concept: Serial Input and Output (I/O)

• The process of reading and writing data files is referred to as Input/Output, or I/O
• All computer file systems save and retrieve data as a sequential stream of characters; remember that each character is a small set of ones and zeros (binary digits), which the hardware represents as distinct voltage levels
• Input and output streams are depicted on the next slide


Concept: Serial Input and Output (I/O)

• Input and Output Streams:

[Figure: an input stream and an output stream]


Concept: Serial Input and Output (I/O)

• Input and Output
  – Control characters are mixed in with the regular data characters so that we can make sense of what is happening; they specify the organization of the data
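To make the idea of a character stream concrete, here is a minimal sketch that builds a short stream in memory and looks at the numeric codes behind it. The sample text and the variable names are illustrative assumptions, not part of the lecture.

```matlab
% A two-row, two-column data stream as it might appear in a text file.
stream = sprintf('1,2\n3,4');

% Every character in the stream is stored as a numeric code.
codes = double(stream);

% The first character, '1', written out as eight binary digits.
bits = dec2bin(codes(1), 8);

% The characters that organize the data are ordinary members of the stream.
commaCode   = double(',');             % separates columns
newlineCode = double(sprintf('\n'));   % separates rows
```

The comma and the new-line character sit in the same stream as the digits; a reader recovers rows and columns by watching for them.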


Concept: Serial Input and Output (I/O)

• File Processing Scenario for INPUT
  – Program opens the file for reading
  – Continually requests values from the file data stream until the end of file (EOF) is reached
  – As the data are received, the program uses the delimiting characters included in the data stream to reconstruct the organization of the data as represented in the file
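The input scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the comma-delimited sample file is created on the spot so the sketch is self-contained, and the names fname, data, and row are hypothetical.

```matlab
% Create a small comma-delimited sample file.
fname = [tempname '.txt'];            % hypothetical temporary file name
fh = fopen(fname, 'w');
fprintf(fh, '1,2,3\n4,5,6\n');
fclose(fh);

% Step 1: open the file for reading.
fh = fopen(fname, 'r');

% Step 2: request lines until the end of file (EOF) is reached.
data = [];
ln = fgetl(fh);                       % fgetl returns -1 (not a string) at EOF
while ischar(ln)
    % Step 3: use the delimiter to rebuild one row of numbers.
    row = sscanf(ln, '%f,').';
    data = [data; row];               %#ok<AGROW> one row per line of the file
    ln = fgetl(fh);
end

fclose(fh);
delete(fname);
```

The new-line character decides where each row ends, and the comma decides where each column ends, so the array has the same shape as the text in the file.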


MATLAB Workspace (I/O)

• MATLAB allows you to save your workspace to a file with the save command and to reload it from a file with the load command

• The file takes the name you give it, with a .mat extension

• The default filename is matlab.mat


MATLAB Workspace (I/O)

• Can also identify specific variables that you want to save by
  – Listing them explicitly
  – Providing wildcard patterns that match the variable names
• Example: >> save mydata.mat a b c*
  – The above example would save the variables a and b and any variable whose name begins with the letter c.
  – Saving the workspace is rarely practical as a long-term record, because it saves only the results and not the code that produced them
  – It is almost always better to save the scripts and raw data that created the workspace
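As a hedged illustration of selective saving, the sketch below uses the function form of save with the same a b c* selection as the slide example. The variable values and the temporary file name are assumptions made for the demonstration.

```matlab
% Some workspace variables; d should NOT be saved by the pattern below.
a = 1;
b = [2 3];
cost = 4;
count = 5;
d = 6;

fname = [tempname '.mat'];        % hypothetical temporary file name

% Function form of:  save <file> a b c*
save(fname, 'a', 'b', 'c*');

% Loading into a structure avoids overwriting the current workspace.
s = load(fname);
savedNames = sort(fieldnames(s));

delete(fname);
```

Loading into the structure s makes it easy to check which variables were actually written to the file.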


High-Level I/O Functions

• Now examine the general case of file I/O
  – Will need to load data from external sources
  – Will need to process those data
  – Will need to save those data back to the file system


High-Level I/O Functions

• Reading data from or writing data to an external file is extremely difficult without knowing something about
  – The types of data contained in the file
  – The organization of the data in the file
• Good habit:
  – Explore the data in a file with whatever tools you have at your disposal
  – Commit to processing the data according to your observations
• The table on the following slide shows the file readers and writers available in MATLAB


High-Level I/O Functions

• File I/O Functions

File Content                            Extension   Reader              Writer                Data Format
Plain text                              Any         textscan            fprintf               Specified in the function calls
Comma-separated numbers                 CSV         csvread             csvwrite              Double array
Tab-separated text                      TAB         dlmread             dlmwrite              Double array
General delimited text                  DLM         dlmread             dlmwrite              Double array
Excel worksheet                         XLS         xlsread             xlswrite              Double or cell array
Lotus 1-2-3 worksheet                   WK1         wk1read             wk1write              Double or cell array
Scientific data in Common Data Format   CDF         cdfread             cdfwrite              Cell array of CDF records
Flexible Image Transport System data    FITS        fitsread            -                     Primary or extension table data
Data in Hierarchical Data Format        HDF         hdfread             -                     HDF or HDF-EOS data set
Extensible Markup Language (XML)        XML         xmlread             xmlwrite              Document Object Model node
Image data                              Various     imread              imwrite               True color, grayscale, or indexed image
Audio file                              AU or WAV   auread or wavread   auwrite or wavwrite   Sound data and sample rate
Movie                                   AVI         aviread             -                     MATLAB movie


High-Level I/O Functions

• Exploration
  – The most common files encountered are text files and spreadsheets
  – Delimited text files are presumed to contain numerical values
  – Spreadsheet data may be either numerical data stored as doubles (64 bits, or 8 bytes, per number) or string data stored in cell arrays
  – Text files are usually delimited by a special character: a comma, a tab, a space, or another designated character
    • The delimiter designates the column divider
    • The new-line character designates the rows


High-Level I/O Functions

• Exploration
  – The exception is the plain text reader, which requires a format to define columns and rows
  – The file extension, such as .txt, gives you a significant clue to the nature of the data
  – For plain text files you can use a simple editor like Notepad in Windows to examine the organization of the data and obtain clues as to how to proceed


High-Level I/O Functions

• Excel spreadsheets
  – Rectangular arrays containing labeled rows and columns of cells


High-Level I/O Functions

• Excel spreadsheets
  – The MATLAB xlsread(…) function separates the text and numerical portions of a spreadsheet
  – The input parameter of xlsread(…) is the name of the file
  – Can have up to three return variables
    • First return variable will hold all numerical values in an array of doubles
    • Second return variable will hold all the text data in cell arrays
    • Third return variable (optional) will hold both string and numerical data in cell arrays
  – Exercise 8.1: Reading Excel Data
  – Smith text, page 189-190, bottom-top


High-Level I/O Functions

• Excel spreadsheets
  – Observations from Exercise 8.1
    • The Excel reader function determines the smallest rectangle on the spreadsheet containing all of the numerical data; this is referred to as the number rectangle
    • The first result is essentially this number rectangle; any non-numeric values within the rectangle are replaced by NaN, the built-in MATLAB name for something that is not a number
    • The second result is all character data as strings in a cell array; numbers encountered are given as empty strings
    • The third result consists of cell arrays of both numbers and character strings; missing values are assumed to be numeric and are assigned the value NaN


High-Level I/O Functions

• Excel spreadsheets
  – Will likely want to write back to the file or to another new or existing file
  – Excel spreadsheets can be written using:
    • xlswrite(<filename>, <array>, <sheet>, <range>)
    • Where <filename> is the name of the file
    • <array> is the data source, a cell array
    • <sheet> is the sheet name
    • <range> is the range of cells in Excel cell notation
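A round trip through xlswrite and xlsread might look like the sketch below. It assumes a MATLAB installation where both functions are available and, for xlswrite, a Windows machine with Excel installed; the file name, sheet name, and sample cell contents are hypothetical.

```matlab
% Assumption: Excel is installed and the default workbook has a 'Sheet1'.
fname = [tempname '.xls'];                     % hypothetical temporary file
cells = {'Name', 'Score'; 'Ann', 90; 'Bob', 85};

% <filename>, <array>, <sheet>, <range>
xlswrite(fname, cells, 'Sheet1', 'A1:B3');

% Up to three return variables: numbers, text, and raw mixed data.
[nums, txt, raw] = xlsread(fname);

delete(fname);
```

Here nums holds only the number rectangle (the two scores), txt holds the labels with empty strings where the numbers were, and raw holds every cell as written.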


High-Level I/O Functions

• Delimited Text Files – Numerical Data Only
  – Data are frequently presented in text file form
  – If the data in a text file are all numerical values, MATLAB can read the file directly into an array
  – The data must be separated, or delimited, by commas, spaces, or tab characters
  – Numerical data of this type can be read using
    • dlmread(file, delimiter)
    • delimiter is a single character that can be used to specify an unusual delimiting character
    • The function produces a numerical array containing the data values
    • Array elements where data are not supplied are filled with zeros


High-Level I/O Functions

• Delimited Text Files – Numerical Data Only
  – Exercise 8.2: Reading delimited files
  – Smith text, page 191, bottom
  – Listing 8.1: Sample delimited text file:
  – Delimited data files can be written using:
    • dlmwrite( <filename>, <array>, <dlm>)
    • <filename> is the name of the file
    • <array> is the data source, a numerical array
    • <dlm> is the delimiting character; if it is not specified, a comma is used (CSV)
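The following sketch writes an array with dlmwrite, reads it back with dlmread, and then reads a file with a missing field to show the zero fill. The semicolon delimiter, the file names, and the sample values are assumptions chosen for illustration.

```matlab
% Round trip with an explicit, less common delimiter.
A = [1 2 3; 4 5 6];
fname = [tempname '.txt'];            % hypothetical temporary file name
dlmwrite(fname, A, ';');
B = dlmread(fname, ';');
delete(fname);

% A comma-delimited file with one empty field in the second row.
gapFile = [tempname '.txt'];
fh = fopen(gapFile, 'w');
fprintf(fh, '1,2,3\n4,,6\n');
fclose(fh);
C = dlmread(gapFile, ',');            % the empty field is read as zero
delete(gapFile);
```

Because an empty field comes back as zero, a genuine zero in the data cannot be told apart from a missing value after reading.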


Lower-Level File I/O

• Introduction
  – You may encounter text files that cannot be read or written by the higher-level functions defined above
  – MATLAB includes functions for general-purpose reading and writing of data files
  – Opening one of these files returns a file handle
  – The file handle is used by every function that reads from or writes to the file
  – Once the read and write activities have been completed, the file must be closed


Lower-Level File I/O

• Opening and Closing Files
  – To open a file for reading or writing:
    • fh = fopen( <filename>, <purpose> )
    • fh is a file handle used in subsequent function calls to identify the particular I/O stream
    • <filename> is the name of the file
    • <purpose> is a string specifying the purpose for opening the file
      – r – file must already exist
      – w – file will be overwritten if it exists
      – a – data will be appended to the file if it exists
  – To close the file,
    • fclose( fh )
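A short sketch of the three purposes, using a temporary file whose name is an assumption for the demonstration. It also checks the handle returned by fopen, which is -1 when the file cannot be opened.

```matlab
fname = [tempname '.txt'];                 % hypothetical temporary file name

% 'w': create the file, or overwrite it if it already exists.
fh = fopen(fname, 'w');
if fh == -1
    error('Could not open %s for writing.', fname);
end
fprintf(fh, 'first line\n');
fclose(fh);

% 'a': append to the existing contents.
fh = fopen(fname, 'a');
fprintf(fh, 'second line\n');
fclose(fh);

% 'r': the file must already exist.
fh = fopen(fname, 'r');
line1 = fgetl(fh);
line2 = fgetl(fh);
fclose(fh);
delete(fname);

% Opening a file that does not exist for reading returns -1.
missing = fopen([tempname '.none'], 'r');
```

Checking for -1 right after fopen gives a clear error at the point of failure rather than a confusing one from a later read or write.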


Lower-Level File I/O

• Reading Text Files
  – Three levels of support are provided when reading text files:
    • Reading whole lines with or without the new-line character
    • Parsing into tokens with delimiters
    • Parsing into cell arrays using a format string
  – To read a whole line including the new-line character, use:
    • str = fgets( fh );
    • Will return each line as a string until the end of the file (EOF)
    • Use fgetl(…) to leave out each new-line character
  – To parse each line into tokens (elementary text strings) separated by white space delimiters, use a combination of fgetl(…) and the tokenizer function:
    • [tk, rest] = strtok( ln ); where tk is a string token, rest is the remainder of the line, and ln is a string to be parsed into tokens
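The difference between fgets and fgetl, and the repeated use of strtok, can be sketched as follows. The sample sentence, the file name, and the tokens variable are illustrative assumptions.

```matlab
% Create a one-line sample file.
fname = [tempname '.txt'];            % hypothetical temporary file name
fh = fopen(fname, 'w');
fprintf(fh, 'the quick brown fox\n');
fclose(fh);

fh = fopen(fname, 'r');
withNewline = fgets(fh);              % keeps the trailing new-line character
frewind(fh);                          % go back to the start of the file
ln = fgetl(fh);                       % same line without the new-line character
fclose(fh);
delete(fname);

% Peel off one whitespace-delimited token at a time.
tokens = {};
rest = ln;
while true
    [tk, rest] = strtok(rest);
    if isempty(tk)
        break;                        % nothing left to parse
    end
    tokens{end + 1} = tk;             %#ok<AGROW>
end
```

Each call to strtok returns the next token and whatever remains of the line, so feeding rest back in walks through the whole line.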


Lower-Level File I/O

• Reading Text Files
  – To parse a file according to a specific format string into a cell array, use:
    • ca = textscan( fh, <format> ); where ca is the resulting cell array, fh is the file handle, and <format> is a format control string like the one used with sscanf(…) in Chapter 6
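As a hedged example of textscan, the sketch below reads a small file whose lines each hold a word, an integer, and a real number. The file contents and the names parts, counts, and weights are assumptions for illustration.

```matlab
% Create a sample file with three columns per line.
fname = [tempname '.txt'];            % hypothetical temporary file name
fh = fopen(fname, 'w');
fprintf(fh, 'bolt 12 0.5\nnut 30 0.25\n');
fclose(fh);

% One format specifier per column: string, integer, floating point.
fh = fopen(fname, 'r');
ca = textscan(fh, '%s %d %f');
fclose(fh);
delete(fname);

parts   = ca{1};                      % cell array of strings
counts  = ca{2};                      % int32 column vector
weights = ca{3};                      % double column vector
```

The result has one cell per format specifier, and each cell holds an entire column of the file.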


Lower-Level File I/O

• Examples of Reading Text Files
  – Listing 8.2 shows a script that will list any text file in the Command window

Refer to notations on Listing 8.2 on page 193-194 of the Smith text


Lower-Level File I/O

• Examples of Reading Text Files
  – Listing 8.3 shows the difference in output results between the conventional listing script and the tokenizing lister

Refer to notations on Listing 8.2 on page 194 of the Smith text


Lower-Level File I/O

• Examples of Reading Text Files
  – Exercise 8.3: Using file listers – illustrates both traditional and tokenizer approaches to file listing
  – Smith text, pages 194-195, bottom-top


Lower-Level File I/O

• Writing Text Files
  – Must have the file open
  – The fprintf(…) function writes to the file when the file handle is passed as its first parameter
  – Listing 8.4 alters Listing 8.2 so that it copies a text file instead of listing it

Refer to notations on Listing 8.2 on page 195 of the Smith text
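The sketch below copies a text file line by line. It is not the textbook's Listing 8.4, only an illustration of the same idea under stated assumptions: the source file is created on the spot, and the names src, dst, inFh, and outFh are hypothetical.

```matlab
% Create a small source file.
src = [tempname '.txt'];              % hypothetical source file name
dst = [tempname '.txt'];              % hypothetical destination file name
fh = fopen(src, 'w');
fprintf(fh, 'alpha\nbeta 50%%\n');
fclose(fh);

inFh  = fopen(src, 'r');
outFh = fopen(dst, 'w');

ln = fgets(inFh);                     % fgets keeps each new-line character
while ischar(ln)
    % Pass the text as data, not as the format, so that percent signs
    % and backslashes in the file are copied unchanged.
    fprintf(outFh, '%s', ln);
    ln = fgets(inFh);
end

fclose(inFh);
fclose(outFh);

original = fileread(src);
copied   = fileread(dst);
delete(src);
delete(dst);
```

Using fgets rather than fgetl means the new-line characters travel with each line, so the copy needs no extra formatting.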


Engineering Example: Spreadsheet Data

• Adaptation of the structure assembly problem from Chapter 7
• In this example the data are presented in a spreadsheet as given here:


Engineering Example: Spreadsheet Data

• Start by considering the layout of the data
• Also consider the process necessary to extract what we need
• Which of the three forms of data returned from xlsread(…) is best for our use?
  – Numerical data are not really important in this application
  – Not exclusively a text processing problem either
  – Will process the raw data provided by xlsread(…), which gives both the string and numerical data
• Create a function that will read this file and produce the same model/structure as in Chapter 7


Engineering Example: Spreadsheet Data

• Listing 8.5: Reading structure data


Engineering Example: Spreadsheet Data

• Observations on Listing 8.5
  – Note that at line 2 the function reads the spreadsheet and keeps only the raw data
  – In traversing the array, note that we begin with an offset that ignores column 1 and row 1
  – As the function cycles through the rows, it is important to empty the array CONN before each pass to avoid "inheriting" data from a previous row
  – You can test this function by replacing the structure array construction in lines 1-11 of Listing 7.7 in Chapter 7 with the following line:
  – data = readStruct('Structure_array.xls');
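The traversal pattern described in these observations can be sketched without the spreadsheet itself. This is not Listing 8.5: the cell array below is a hypothetical stand-in for the raw output of xlsread, and its layout, along with the names raw, conn, and conns, is assumed purely for illustration.

```matlab
% Hypothetical raw data: row 1 and column 1 hold labels, and NaN marks
% an empty spreadsheet cell.
raw = {'item', 'c1', 'c2', 'c3';
       'A',     2,    3,   NaN;
       'B',     1,   NaN,  NaN;
       'C',     1,    2,    3};

[rows, cols] = size(raw);
conns = cell(rows - 1, 1);

for r = 2:rows                        % offset skips the header row
    conn = [];                        % empty the array before each pass
    for c = 2:cols                    % offset skips the label column
        v = raw{r, c};
        if isnumeric(v) && ~isnan(v)
            conn(end + 1) = v;        %#ok<AGROW> keep only real entries
        end
    end
    conns{r - 1} = conn;
end
```

If conn were not reset at the top of the outer loop, row B would inherit the entries collected for row A.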