Files and Parsing CS106AP Lecture 12

Post on 06-Mar-2021

Page 1: Files and Parsing - Stanford Universityweb.stanford.edu/class/archive/cs/cs106ap/cs106ap.1198/lectures/1… · Each line is ended by the ‘\n’ newline character! Except for the

Files and Parsing

CS106AP Lecture 12

Roadmap

Day 1!

Programming Basics

The Console

Images

Data structures: Lists, Files, Parsing: Strings, Dictionaries 1.0, Dictionaries 2.0

Midterm

Graphics

Object-Oriented Programming

Everyday Python

Life after CS106AP!


Today’s questions

How can I separate valuable data from junk?


Today’s topics

1. Review

Command Line

File Reading

2. What is Parsing?

Useful String Functions

How to Parse

3. What’s next?

Review

Command Line & Arguments

Command Line

Definition: Command Line/Terminal

Text interface for giving instructions to the computer. These instructions are relayed to the computer’s operating system.

Definition: Python Console/Interpreter

An interactive program that allows us to write Python code and run it line-by-line.

PyCharm Terminal == Terminal/Command Prompt


Command Line Usage

python3 script_name.py

using Python, run this script’s main() function


What’s up with $?

Our convention is to let "$" represent the terminal prompt.

e.g.

$ python3 ghost.py hoover

this is the part you’d type into your terminal!


If we use “>>>”, we’re referring to the Python interpreter.

>>> 3 * 6

18


Arguments

Think/Pair/Share: Line by line, what’s happening in the following code?

import sys

def main():
    args = sys.argv[1:]
    if len(args) == 1:
        print_processed_text(args[0], 'aei')
    if len(args) == 3 and args[0] == '-chars':
        print_processed_text(args[2], args[1])

$ python3 DeleteCharacters.py -chars aei poem.txt

sys.argv: get the command line arguments as a list!

[1:]: slice off the first item in the list. Now our list doesn’t include the script name.

args = ['-chars', 'aei', 'poem.txt']

indices: 0, 1, 2

$ python3 DeleteCharacters.py poem.txt

Think/Pair/Share: What would args be? What lines of code would execute?

args = ['poem.txt']

indices: 0

Since len(args) == 1, only the first if branch runs: print_processed_text('poem.txt', 'aei').

$ python3 DeleteCharacters.py i rly like unic0rns ^-^

Think/Pair/Share: What would args be?

args = ['i', 'rly', 'like', 'unic0rns', '^-^']

indices: 0, 1, 2, 3, 4

Takeaways on arguments

python3 DeleteCharacters.py -chars aei poem.txt

Using Python, run this script with all of these arguments!

● We can use sys.argv to get a list of strings that correspond to the command line arguments!

Slide adapted from Chris Piech
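As a rough sketch of the pattern from the Arguments slides, the argument handling can be pulled into a helper that is easy to try out in the interpreter. The name parse_args and the tuple it returns are assumptions made for this illustration and are not part of the lecture code; the default characters 'aei' come from the slide.

```python
def parse_args(argv):
    """Return (filename, chars) following the slide's two cases, or None.

    parse_args is a hypothetical helper; argv is a list shaped like sys.argv.
    """
    args = argv[1:]  # slice off the script name
    if len(args) == 1:
        return args[0], 'aei'
    if len(args) == 3 and args[0] == '-chars':
        return args[2], args[1]
    return None  # any other shape is not handled by the slide's code


print(parse_args(['DeleteCharacters.py', '-chars', 'aei', 'poem.txt']))
print(parse_args(['DeleteCharacters.py', 'poem.txt']))
print(parse_args(['DeleteCharacters.py', 'i', 'rly', 'like', 'unic0rns', '^-^']))
```

In a real script, the list passed in would be sys.argv (after import sys).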

Files

Storing Information

When we’re not running a program and we want to save information, we store it on our hard drive (also called disk)

When we’re running a program, variables and information are stored on RAM (Random Access Memory)


What’s in a text file?

0 The suns are able to fall and rise:\n

1 When that brief light has fallen for us,\n

2 we must sleep a never ending night.

● No bold/italics!

● Each line is ended by the ‘\n’ newline character!

○ Except for the last line, which doesn’t have a ‘\n’.


File Reading – catullus.txt

0 The suns are able to fall and rise:\n

1 When that brief light has fallen for us,\n

2 we must sleep a never ending night.

with open('catullus.txt', 'r') as f:
    for line in f:
        print(line)

print() automatically adds a ‘\n’!


Output:

The suns are able to fall and rise:\n\n

When that brief light has fallen for us,\n\n

we must sleep a never ending night.


How can we avoid the extra output line?


Output:

The suns are able to fall and rise:\n

When that brief light has fallen for us,\n

we must sleep a never ending night.

with open('catullus.txt', 'r') as f:
    for line in f:
        print(line, end='')

end’s default value is ‘\n’


“once you’ve printed this line, don’t add on a ‘\n’”
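A self-contained sketch of the same idea follows. It writes the three poem lines to a temporary file first (an assumption so the example runs anywhere; the lecture uses an existing catullus.txt), then reads them back and prints them without the extra blank lines. The helper name read_lines is hypothetical.

```python
import os
import tempfile

POEM = ('The suns are able to fall and rise:\n'
        'When that brief light has fallen for us,\n'
        'we must sleep a never ending night.')


def read_lines(path):
    """Return the lines of the file at path, each with its '\\n' if it has one."""
    lines = []
    with open(path, 'r') as f:
        for line in f:
            lines.append(line)
    return lines


# Write the poem to a throwaway file so the sketch does not depend on catullus.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'catullus.txt')
    with open(path, 'w') as f:
        f.write(POEM)
    lines = read_lines(path)

for line in lines:
    print(line, end='')  # the '\n' already inside each line ends the row
print()  # the last line has no '\n', so finish the output with one
```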

How can I separate valuable data from junk?

Parsing!

Data from Social Explorer: ACS 2017

What is data?

$GPGGA,005328.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*70

$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38

$GPRMC,005328.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*78

$GPGGA,005329.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*71

$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38

$GPRMC,005329.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*79

$GPGGA,005330.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,3.0,0000*78

$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38

Read more about NMEA


What is data?

● Usually just text!

○ Text is a common data exchange format.

Parsing

Definition: Parsing

The act of reading “raw” text and converting it into a more useful format stored in memory.

Adapted from Jon Skeet
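To make the definition concrete, here is a minimal sketch that turns one raw GPS line like the ones shown earlier into a small tuple in memory. The function name parse_gpgga is hypothetical, and the field positions (time, latitude, N/S, longitude, E/W right after the sentence name) are an assumption based on the common $GPGGA layout; the sketch also assumes the line has at least six comma-separated fields.

```python
def parse_gpgga(line):
    """Return (time, latitude, longitude) as strings, or None for other sentences."""
    line = line.strip()
    if not line.startswith('$GPGGA'):
        return None  # junk for our purposes: a different kind of sentence
    fields = line.split(',')
    time = fields[1]
    latitude = fields[2] + ' ' + fields[3]
    longitude = fields[4] + ' ' + fields[5]
    return time, latitude, longitude


raw = '$GPGGA,005328.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*70\n'
print(parse_gpgga(raw))
```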


Components of Parsing

● File Reading

● String Manipulation

● Advanced Control Flow

● Container Data Types

String Manipulation - Useful Functions

s.isalpha()

s.isdigit()

s.isspace()

s.isspace() applies to spaces, tabs, and newlines. Tabs are written ‘\t’. Newlines are ‘\n’.
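A small sketch of how these three tests might be used together: the helper below labels a single character. The name classify and the label strings are made up for this illustration.

```python
def classify(ch):
    """Label one character using the three string tests from the slide."""
    if ch.isalpha():
        return 'alpha'
    if ch.isdigit():
        return 'digit'
    if ch.isspace():
        return 'space'
    return 'other'


for ch in 'a7\t!':
    print(repr(ch), classify(ch))
```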

String Manipulation - Useful Functions

s.startswith(substr)

s.endswith(substr)

>>> 'Sonja'.startswith('Son')
True

These functions return booleans!
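Because these functions return booleans, they slot directly into an if statement. The sketch below keeps only names that end in '.txt' and do not start with a dot; the function name text_files and the sample names are assumptions for illustration.

```python
def text_files(names):
    """Return the names that end with '.txt' and do not start with '.'."""
    result = []
    for name in names:
        if name.endswith('.txt') and not name.startswith('.'):
            result.append(name)
    return result


print(text_files(['poem.txt', 'ghost.py', '.hidden.txt', 'catullus.txt']))
```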

String Manipulation - Useful Functions

>>> s = 'computer'
>>> 'put' in s
True

You can use in with strings, like lists!

String Manipulation - Useful Functions

>>> s = 'hello!'
>>> s.find('!')
5
>>> s.find('l')
2

find() returns the index of the first occurrence of the substring you pass in.

>>> s.find('w')
-1

If the string doesn’t contain the substring, find() returns -1.

>>> s.find('l', 3)
3

You can optionally pass in a start index (and an end index). The format is: s.find(substr, start_index, end_index)

Think/Pair/Share: Find the first ‘@’ in s. Return the substring made of 0 or more alpha characters following the ‘@’.
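One possible approach to this exercise, offered as a sketch and not as the official solution: use find() to locate the ‘@’, then walk forward while the characters are alphabetic. The function name at_word is hypothetical, and returning an empty string when there is no ‘@’ is an assumption, since the prompt does not say what to do in that case.

```python
def at_word(s):
    """Return the run of alpha characters right after the first '@' in s."""
    at = s.find('@')
    if at == -1:
        return ''  # assumption: no '@' means nothing to return
    end = at + 1
    # The length check comes first so s[end] is never read past the end of s.
    while end < len(s) and s[end].isalpha():
        end += 1
    return s[at + 1:end]


print(at_word('meet @sonja at 5'))
```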

String Manipulation - Useful Functions

>>> s = ' hello world! '
>>> s.strip()
'hello world!'
>>> s = ' hello world!\n '
>>> s.strip()
'hello world!'

strip() removes whitespace on the left and right sides of the string. It can be used on newlines and tabs as well as spaces.

Recall: (output)

The suns are able to fall and rise:\n\n

When that brief light has fallen for us,\n\n

we must sleep a never ending night.

with open('catullus.txt', 'r') as f:
    for line in f:
        print(line)

print() automatically adds a ‘\n’!

How can we avoid the extra output line?

The suns are able to fall and rise:\n

When that brief light has fallen for us,\n

we must sleep a never ending night.

with open('catullus.txt', 'r') as f:
    for line in f:
        line = line.strip()
        print(line)
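One caveat worth sketching: strip() removes whitespace from both ends, so any leading indentation in a line is lost too. If only the trailing newline should go, rstrip('\n') is one alternative; it is a standard string method but is not covered in this lecture, and the sample line below is made up.

```python
line = '    indented text\n'

stripped = line.strip()         # removes the leading spaces and the newline
right_only = line.rstrip('\n')  # removes only the trailing newline

print(repr(stripped))
print(repr(right_only))
```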

How do we represent strings?

● Google “omega uppercase unicode”

○ ‘03A9’

○ hexadecimal notation (base-16) = 0-9 plus letters A-F

>>> s = '\u03A9'
>>> s
'Ω'
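As a small aside, the built-in functions ord(), chr(), and hex() can be used to move between a character and its hexadecimal code. They are not mentioned on the slide; this is just a sketch of how the ‘03A9’ value relates to the character.

```python
omega = '\u03A9'

code_point = ord(omega)   # the character's number in base 10
as_hex = hex(code_point)  # the same number written in base 16
back = chr(0x03A9)        # from the number back to the character

print(code_point, as_hex, back)
```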


Components of Parsing

● File Reading

● String Manipulation

● Advanced Control Flow

● Container Data Types

Compound Boolean Expressions

s = 'yay'

if len(s) == 2 and s[1] == 'a':
    # do something

● len(s) == 2 is False (len(s) is 3); s[1] == 'a' on its own would be True.
● Stop! Because the left side of the and is already False, s[1] == 'a' will never get executed.
● This is also known as “short-circuiting”.
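
One way to watch short-circuiting happen is to wrap each half of the condition in a small function that records when it is called. This is only a sketch: is_length_two, second_char_is_a, and calls are hypothetical names made up for this illustration.

```python
calls = []  # records which checks were actually evaluated


def is_length_two(s):
    # Left side of the `and` (hypothetical helper).
    calls.append('length')
    return len(s) == 2


def second_char_is_a(s):
    # Right side of the `and` (hypothetical helper).
    calls.append('second_char')
    return s[1] == 'a'


s = 'yay'
result = is_length_two(s) and second_char_is_a(s)

print(result)  # False
print(calls)   # ['length'] -- the right side was never called
```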

Why is this useful?

s = ''

if len(s) != 0 and s[0] == 'a':
    # do something

● len(s) != 0 is False, so Python never evaluates s[0].
● Without short-circuiting, s[0] would result in an error!
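
The sketch below contrasts the guarded check with the same two tests written in the opposite order. Both function names are hypothetical, chosen for this example only.

```python
def starts_with_a(s):
    # The length check comes first, so s[0] is only evaluated for non-empty strings.
    return len(s) != 0 and s[0] == 'a'


def starts_with_a_unsafe(s):
    # Same tests in the wrong order: s[0] is evaluated before the length check.
    return s[0] == 'a' and len(s) != 0


print(starts_with_a(''))       # False
print(starts_with_a('apple'))  # True

try:
    starts_with_a_unsafe('')
    raised = False
except IndexError:
    # Indexing an empty string raises IndexError.
    raised = True

print(raised)  # True
```

The order of the operands matters: put the cheap safety check on the left so it can protect the expression on the right.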

Advanced Control Flow

break

continue

Advanced Control Flow

# print words in all_words until we hit a censored word!

def censored(all_words, censored_words):
    for word in all_words:
        if word in censored_words:
            break
        print(word)
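
Here is a usage sketch of the break pattern. To make the result easy to check, this variant collects the words in a list instead of printing them; the function name words_before_censored and the sample word lists are made up for illustration.

```python
def words_before_censored(all_words, censored_words):
    # Collect words until the first censored word, then stop the loop entirely.
    kept = []
    for word in all_words:
        if word in censored_words:
            break
        kept.append(word)
    return kept


# Sample data (hypothetical).
all_words = ['the', 'quick', 'darn', 'fox', 'heck', 'jumps']
censored_words = ['darn', 'heck']

print(words_before_censored(all_words, censored_words))  # ['the', 'quick']
```

Note that 'fox' and 'jumps' are not censored, yet they are dropped too: break leaves the loop at the first censored word.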

Advanced Control Flow

# print words in all_words that aren't censored!

def censored(all_words, censored_words):
    for word in all_words:
        if word in censored_words:
            continue
        print(word)
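
Here is a usage sketch of the continue pattern. As a testable variant, it returns the kept words in a list rather than printing them; the function name uncensored_words and the sample word lists are made up for illustration.

```python
def uncensored_words(all_words, censored_words):
    # Skip each censored word, but keep looping over the rest of the list.
    kept = []
    for word in all_words:
        if word in censored_words:
            continue
        kept.append(word)
    return kept


# Sample data (hypothetical).
all_words = ['the', 'quick', 'darn', 'fox', 'heck', 'jumps']
censored_words = ['darn', 'heck']

print(uncensored_words(all_words, censored_words))  # ['the', 'quick', 'fox', 'jumps']
```

Unlike break, continue only skips the current iteration, so words after a censored word still appear in the result.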

Think/Pair/Share: Print the list of zoo animals (not including the bears) and the corresponding list of the number of times each animal has been fed.

What’s next?

Roadmap

● Day 1!
● Programming Basics
● The Console
● Images
● Data structures
    ○ Lists
    ○ Files
    ○ Parsing: Strings
    ○ Dictionaries 1.0
    ○ Dictionaries 2.0
● Midterm
● Graphics
● Object-Oriented Programming
● Everyday Python
● Life after CS106AP!

What’s next?

● Dictionaries

○ Is there a better way to store complex data?