Programming for Linguists
An Introduction to Python
Contact: Claudia Peersman
Lange Winkelstraat 40, room L202 (2nd floor)
Literature:
“Think Python: How to Think Like a Computer Scientist” by Allen B. Downey, freely available at: http://greenteapress.com/thinkpython/thinkpython.html
“Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit” by Steven Bird, Ewan Klein, and Edward Loper, freely available at: http://www.nltk.org/book
The Python programming language
Part 1
Formal vs. natural languages
The way of the program
Programming for linguists?
What is a program?
Debugging
Your first program
Formal vs. natural languages
Natural languages: spoken languages, e.g. English, Dutch, French…
  not designed by people
  evolved naturally
Formal languages: designed by people for specific applications, e.g.:
  in mathematics: notation which denotes relationships among numbers and symbols
  in chemistry: notation which represents the chemical structure of molecules
Many features in common: tokens, structure, syntax and semantics
A lot of differences:
  Natural languages        Formal languages
  Ambiguity                Nearly unambiguous
  Redundancy               Compact
  Idioms and metaphors     Literal: they mean exactly what they say
Some examples
5 + 5 = 10
H2O
5 + 5 = 1$0 ???
Zz ???
  Illegal tokens: $ and Zz
5 +: 5 = 10 ???
  Legal tokens, but illegal structure: +:
The way of the program
Programming = the art of problem solving:
  formulate problems
  think creatively about possible solutions
  express a solution clearly and accurately
  trial and error
Low-level vs. high-level languages
Low-level languages = “machine languages”: the only languages a computer can execute directly
High-level languages like Python, Perl, Java and C++ have to be translated into a low-level language before they can be executed. This is done by:
  compilers
  interpreters
An interpreter:
  processes the program a little at a time
  alternates between reading lines and performing computations
A compiler:
  translates the high-level program completely first
  once a program is compiled, it can be executed repeatedly without further translation
Programming for linguists?
Aim: handle large linguistic corpora
  automatic frequency counts
  distribution of linguistic features across different categories, corpora
  look up context
Existing tools are limited and cost money
About Python…
high-level language
open source
executed by an interpreter in two ways:
  interactive mode
  script mode
Interactive mode:
  open the interpreter
  >>> prompt = ready to begin
  type a command
  the interpreter prints the result
>>> 1 + 1
2
Script mode:
  open a new window in the interpreter
  type a number of commands
  save the program as a Python script, e.g. test.py
  the program is executed whenever you tell the interpreter to run it
  the results are printed when the script is run
Which mode to use?
Interactive mode:
  good for testing small parts of the program before you go on
  does not save the program!
Script mode:
  put together all small parts of code in a sequence of instructions for the computer to execute
  save your program
  use it again in the future
What is a program?
A sequence of instructions that specifies how to perform a computation
For linguists: the computation can also be e.g. looking up the context of words in a text, calculating average word lengths, sentence lengths, …
Some basic instructions
input: data you type, a text you load
output: display data on the screen, send data to a file
math: perform basic mathematical operations like addition, subtraction, multiplication and division (written +, -, * and / in Python)
conditional execution: check for certain conditions and execute the appropriate instructions
repetition: perform some action repeatedly, usually with some variation
Programming = breaking a large, complex task into smaller and smaller subtasks until the subtasks are simple enough to be performed with one of these basic instructions
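A minimal sketch of how these basic instructions combine into a small program that calculates an average word length. The sample sentence is made up, and the for loop, split() and len() are constructs these slides have not introduced yet. print is written with parentheses so that the snippet runs under both Python 2.6 and Python 3.

```python
# Input: a made-up sample sentence (in practice: a text you load).
text = "the quick brown fox jumps over the lazy dog"

words = text.split()                  # break the text into words
total_length = 0
for word in words:                    # repetition: once per word
    total_length = total_length + len(word)   # math: add up the lengths

if len(words) > 0:                    # conditional execution: avoid dividing by zero
    average = total_length / float(len(words))
    print(average)                    # output: display the result
```

Each line does one simple thing; the larger task (an average over a whole text) only emerges from their sequence.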
Debugging
Bugs = programming errors
Debugging = process of tracking down programming errors
Three kinds of bugs:
  syntax errors
  runtime errors
  semantic errors
Syntax errors
refer to the structure of the program and the rules about that structure
if there is even a single syntax error in your code:
  Python will display an error message
  Python will stop immediately and will not run your program
An example: parentheses
(1 + 2) : correct syntax
2) : syntax error
Syntax errors are very common in the beginning. The more you practice and gain experience, the fewer mistakes you will make and the faster you will find them.
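The parentheses example can be checked without crashing the interpreter. This is only a sketch: the helper name has_valid_syntax is invented for illustration, and it relies on the built-in compile(), which checks the structure of a piece of code without running it.

```python
def has_valid_syntax(source):
    """Return True if Python accepts the structure of this expression."""
    try:
        compile(source, "<example>", "eval")   # parse only, do not execute
        return True
    except SyntaxError:
        return False

print(has_valid_syntax("(1 + 2)"))   # True
print(has_valid_syntax("2)"))        # False
```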
Runtime errors
also called exceptions
do not appear until after the program has started to run
Python will display an error message
For example: you give the instruction to open a file, but you have typed in the wrong file name or wrong directory
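A sketch of the wrong-file-name case. The path is hypothetical and assumed not to exist. The try/except construct used to catch the error goes beyond what these slides cover; without it, Python would print the error message and stop the program.

```python
# Hypothetical path, assumed not to exist on your computer.
filename = "no_such_directory/no_such_file.txt"

try:
    corpus = open(filename)      # the runtime error happens here
    corpus.close()
    opened = True
except IOError as error:         # raised when the file cannot be opened
    opened = False
    print("Could not open the file:")
    print(error)
```

The code is syntactically correct, so the problem only shows up once the program is running.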
Semantic errors
The program will run without any error message, but it will not produce the results you wanted: the meaning of the program (semantics) is wrong
Tricky errors, because:
  Python will not display an error message!
  you need to work backward, looking at the output of the program and trying to figure out what it is doing exactly
An example: the Python file methods read() vs. readline() vs. readlines()
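A sketch of why mixing up these three methods causes a semantic error: all three run without complaint, but they return different things. To keep the example self-contained, io.StringIO stands in for a real file, and the three-line sample text is made up.

```python
import io

sample = u"first line\nsecond line\nthird line\n"   # made-up text

whole_text = io.StringIO(sample).read()        # one string: the whole text
first_line = io.StringIO(sample).readline()    # one string: only the first line
all_lines = io.StringIO(sample).readlines()    # a list with one string per line

print(len(whole_text))   # 34: characters in the whole text
print(len(first_line))   # 11: characters in the first line
print(len(all_lines))    # 3: number of lines
```

Calling len() on the wrong one silently counts characters instead of lines, or the other way round.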
Debugging is as important as programming itself:
  do not only learn how to write a program
  learn to write a program that works
  learn to write a program that does what you want it to do
Always try out small pieces of code before you go on with writing your program
Try out your code on short pieces of text, so that you can verify your results manually
Your first program
Open IDLE (desktop)
The first program is usually called “Hello, world!”
In Python:
>>> print "Hello, world!"
or
>>> print 'Hello, world!'
Mind the quotation marks!
This is the print statement
The quotation marks mark the beginning and the end of the text to be displayed
The quotation marks do not appear in the result
Why we teach Python:
e.g. in Java:
public class Hello
{
    public static void main( String[] args )
    {
        System.out.println( "Hello, World!" );
    }
}
Make some mistakes
What happens if you:
  leave out one of the quotation marks
  replace " by ' or vice versa in one case
  spell “print” wrong
  double the quotation marks
  double the quotation marks, but change the order
By making mistakes on purpose you will:
  learn which details are important in writing program code
  learn to debug more efficiently, because you get to know what the error messages mean
Try it yourself
We will make time to try out new things as we proceed
Programming is a new way of thinking for linguists
If there is a problem or you have a question, do not hesitate to mention it immediately
Values and Types
Values = basic elements of a program, e.g. "Hello, world!" in print "Hello, world!"
Each value has a type:
  integer
  string
  float
Integer: whole numbers, e.g. 105
String: a sequence of characters, e.g. "Hello, World!"
Float: numbers with a decimal point, e.g. 10.5
The interpreter can tell you the type of a value:
>>> type(105)
<type 'int'>
Try to find out what the type is of the following values:
"Hello!"
3.1415
Dag Jan
"123"
"123.456"
Try this:
>>> print 123,456
Float types always have a dot, never a comma
To which kind of error could this lead?
  runtime error
  syntax error
  semantic error
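A sketch of what the comma does. Python reads 123,456 as two separate integers rather than one decimal number, so no error message appears. The variable names are chosen for illustration, and print is written with parentheses so that the snippet runs under both Python 2.6 and Python 3.

```python
value = 123,456          # two integers separated by a comma, not 123.456
print(type(value))
print(type(123.456))     # with a dot it is a float

# The same slip inside a calculation runs without an error message,
# but the result is probably not what was intended.
doubled = 123,456 * 2
print(doubled)           # (123, 912)
```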
Variables
A variable is a name that refers to a value
An assignment statement creates new variables and assigns values to them
You can choose the name yourself, e.g.
>>> text = "Everything except 'Hello, world!'"
>>> age = 26
>>> pi = 3.1415
The variables now carry the values we assigned to them:
>>> print text
>>> print age
>>> print pi
The interpreter can again tell you the type:
>>> type(text)
Variable names:
  can be arbitrarily long
  can contain both letters and numbers
  have to begin with a letter (or an underscore)
  can contain uppercase letters
  are case sensitive!
If you use an illegal character in a name, you will get a syntax error message, e.g. my name, live@
You cannot choose a name that is a keyword in Python:
and       del       from      not       while
as        elif      global    or        with
assert    else      if        pass      yield
break     except    import    print
class     exec      in        raise
continue  finally   is        return
def       for       lambda    try
Tip: try to choose names which describe what the variable is used for
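If you are not sure whether a name is a keyword, the standard library module keyword can tell you. A small sketch; the candidate names in the list are just examples. Note that the keyword list depends on the Python version: print and exec are keywords in Python 2 but not in Python 3.

```python
import keyword

# Candidate variable names (examples only).
for name in ["class", "if", "text", "age"]:
    print(name + " is a keyword: " + str(keyword.iskeyword(name)))
```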
Statements
Units of code that the Python interpreter can execute
So far we have seen the print statement and the assignment statement
A program usually contains a series of statements that are executed in an order predetermined by the programmer
e.g.
>>> age1 = 20
>>> age2 = 40
>>> print age2
40
>>> average_age = (age1 + age2)/2
>>> print average_age
30
You always have to assign a value to a variable before you can work with it
Variables have to be spelled in the same way throughout the program
If you assign a new value to an existing variable, the old value is replaced
e.g.
>>> age = 20
>>> age = age + 20
>>> print age
Operators and Operands
Operators = special symbols that represent computations, e.g. +, -, *, /, **
Operands = the values the operator is applied to, e.g. the two 2s in 2 + 2
Try 2/3
In Python 2, when both operands are integers, the result of a division is again an integer (rounded down)
If you want a floating-point result, you have to make one of the operands a floating-point number:
>>> 2/3.0
0.66666666666666663
You can also put this line at the beginning of your script:
from __future__ import division
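A sketch of the ways to control the result of a division. The variable names x and y are chosen for illustration. In Python 3 the / operator always gives a floating-point result, so the lines below are written to behave the same in Python 2.6 and Python 3.

```python
x = 2
y = 3

print(x / float(y))   # convert one operand to a float first
print(x / 3.0)        # or write one operand with a decimal point
print(x // y)         # floor division always rounds down: 0
```

The // operator is useful when you want the rounded-down integer result on purpose, whichever Python version runs the script.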
Expressions
A combination of values, variables, and operators
Try:
>>> x = 5
>>> x + 1
Now make a script of it (File > New Window) and run it (Run > Run Module)
In a script, an expression all by itself does not print a result!
How can you modify the script so that it does produce a result?
Order of Operations
The order of evaluation depends on the rules of precedence
For mathematical operators, Python follows mathematical conventions:
  Parentheses
  Exponentiation
  Multiplication and division
  Addition and subtraction
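A few expressions that illustrate these rules; the numbers are arbitrary. print is written with parentheses so that the snippet runs under both Python 2.6 and Python 3.

```python
print(2 + 3 * 4)      # multiplication before addition: 14
print((2 + 3) * 4)    # parentheses first: 20
print(2 * 3 ** 2)     # exponentiation before multiplication: 18
print((2 * 3) ** 2)   # parentheses first: 36
```

When in doubt, add parentheses: they never hurt and make the intended order explicit.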
String Operations
In general: no mathematical operations on strings, e.g.
>>> "hello" / "hi"
TypeError: unsupported operand type(s) for /: 'str' and 'str'
Except: the + and * operators
Try:
>>> "hello" + "hi"
>>> "hello" * 2
string + string = concatenation
string * int = repetition
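The string operations above as a small runnable sketch. The variable names are chosen for illustration, and the try/except around the division is only there so the script keeps running after the TypeError.

```python
greeting = "hello" + "hi"   # concatenation
echo = "hello" * 2          # repetition
print(greeting)             # hellohi
print(echo)                 # hellohello

try:
    "hello" / "hi"          # division is not defined for strings
except TypeError as error:
    print(error)
```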
Boolean Expressions
An expression that is either True or False, e.g. with the operator ==
>>> 5 == 5
True
>>> 5 == 6
False
True and False are of <type 'bool'>, not strings
Relational Operators
x == y    x is equal to y
x != y    x is not equal to y
x > y     x is greater than y
x < y     x is smaller than y
x >= y    x is greater than or equal to y
x <= y    x is smaller than or equal to y
Remember that:
  = is an assignment operator, used to assign a value to a variable
  == is a relational operator, used to express equality
  the order of the symbols is again important: =<, => and =! do not work!
Logical Operators
and
or
not
Return a boolean value: True or False
Which would return True?
x > 0 and x < 5
x == 3 or x == 4
not(x > 5)
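The answer depends on the value of x, which the slide does not give. The sketch below assumes x = 5 purely for illustration; change the value and run it again to see how the results change.

```python
x = 5   # assumed value; the slide does not say what x is

print(x > 0 and x < 5)     # False: 5 is not smaller than 5
print(x == 3 or x == 4)    # False: neither comparison holds
print(not (x > 5))         # True: 5 is not greater than 5
print(type(x == 3))        # the result of a comparison is a bool
```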
Conditional Execution
Conditional statements check conditions and change the behaviour of the program accordingly
if statement, e.g.
>>> if x > 0:
...     print "x is positive"    # body
The print statement is executed only if the condition is True
There is no limit on the number of statements that can appear in the body
There has to be at least one statement in the body
You can use pass as a temporary substitute for code you have not written yet:
if x > 0:
    pass
Chained Conditionals
When there are more than 2 possibilities: if, elif (else if) & else
e.g.
if x > y:
    print "x is greater than y"
elif x < y:
    print "x is smaller than y"
else:
    print "x is equal to y"
There is no limit on the number of elif branches
Every elif branch has to contain at least one statement
The else branch has to come at the end, but it is optional
Each condition is checked in order
What would the result be if x = 8?
if x == 0:
    print "x is 0"
elif x > 0:
    print "x is greater than 0"
elif x > 0 and x < 10:
    print "x is between 0 and 10"
else:
    print "x is smaller than 0"
If one of the conditions is True, the corresponding branch executes and the statement ends
Even if more than one condition is True, only the first True branch executes !
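The x = 8 case as a runnable sketch. The results are stored in variables (message and reordered, names chosen for illustration) instead of being printed directly in each branch. The second chain is one possible fix, not the only one: it puts the more specific condition first.

```python
x = 8

# Original order: the second condition already matches,
# so the third one is never checked.
if x == 0:
    message = "x is 0"
elif x > 0:
    message = "x is greater than 0"
elif x > 0 and x < 10:
    message = "x is between 0 and 10"
else:
    message = "x is smaller than 0"
print(message)      # x is greater than 0

# Reordered: the more specific condition comes first.
if x == 0:
    reordered = "x is 0"
elif x > 0 and x < 10:
    reordered = "x is between 0 and 10"
elif x > 0:
    reordered = "x is greater than 0"
else:
    reordered = "x is smaller than 0"
print(reordered)    # x is between 0 and 10
```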
Some Comments
As programs grow and become more complicated, they get more difficult to read
You can add notes which explain (for yourself and for others who read your code) what the program is doing:
  start the note with “#” and add your comment
  everything from the # to the end of the line is ignored by the program
Exercises
Put 2 numbers in different variables
Print the results for the operators +, -, *, /, ** when they are applied to these 2 variables (with a floating-point number as the result of the division)
x = 2
y = 3
print "x =", x
print "y =", y
print "x + y =", x + y
print "x - y =", x - y
print "x * y =", x * y
print "x / y =", x / float(y)
print "x**y =", x**y
For Next Week…
Write a script called yourname_ex1.py that calculates the average of 5 weights, each stored in its own variable:
  36.5 kg
  47.8 kg
  33 kg
  68.3 kg
  72 kg
Write a script called yourname_ex2.py that assigns an integer value to a variable “age” and prints “you are a minor” if the value is under 18, “you are over 18” if the value is 18 or more, and “you are kidding” if the value is less than 0.
Python install (please use these links):
for Windows: http://www.python.org/ftp/python/2.6.5/python-2.6.5.msi
for Mac: http://www.python.org/ftp/python/2.6.5/python-2.6.5-macosx10.3-2010-03-24.dmg
Please mail by Tuesday next week:
  the scripts from the exercises
  the subject of your dissertation
Thank you