Introduction to Perl
Pawel Sirotkin
28.11-01.12.2008, Riga
Overview
Introduction to Perl, NLL Riga 2008, by Pawel Sirotkin
About programming
Why Perl?
How to write, how to run
Variables
Operations
Basic input and output
Conditionals and loops
Regular expressions
About programming
Working with algorithms
A program needs to contain exact commands
  (Mostly) not: Go buy some bread
  But: Put on your coat and shoes, open the door, go through it, close the door, go down the stairs…
A program:
  Has a certain input
  Processes it
  Produces a certain output
Why Perl?
Easy to learn
Simple syntax
Good at manipulating text
Good at dealing with regular expressions
How to write a Perl program
Perl programs can be written in any text editor
  Notepad, vim, even Word…
  Recommended: a simple text editor with syntax highlighting
Write the program code
Save the file as xxx.pl
  The .pl extension is not necessary, but useful
What is a Perl program like?
# This *very* simple program prints "Hello World!"
print "Hello World!";
What is a Perl program like?
The content of a line after the # is a comment
  It is ignored when the program runs
What are comments for, then?
  They are for you, and for others who will have to read the code
  Imagine looking at a complex program in a few months and trying to figure out what it does
Write as many comments as you can
# This *very* simple program prints "Hello World!"
print "Hello World!";
What is a Perl program like?
print is a Perl command
  In this case, for printing text on the screen
Every command should start on a new line
  Not a Perl requirement, but crucial for readability
Every command should end with a semicolon (;)
Many commands take arguments
  Here: "Hello World!"
# This *very* simple program prints "Hello World!"
print "Hello World!";
What to do with the program?
Perl works from the command line
  Windows: "Start", "Run…"
Go to the directory where you saved the program
  E.g.: cd C:\Perl\MyPrograms
Run the program: perl myprogram.pl
See the results of your labours!
Exercise (1)
Create a folder for your Perl programs
Open the editor of your choice and write the "Hello World" program
  The command is print "Hello World!";
  Don’t forget the comment!
Save the program
Run it!
What happens if you mistype the print command?
Variables
The "Hello World" program always has the same output
  Not a very useful program, as such
We need to be able to change the output
Variables are objects that can hold different values
Defining variables
To define a variable, write a dollar sign followed by the variable’s name
Names should consist of letters, numbers and the underscore
  They should start with a letter
Variable names are case-sensitive!
  $a and $A are different variables!
Generally, a variable’s name should tell you what the variable does
# We define a variable "a" and assign it a value of 42
$a = 42;
Defining variables
Variables can be assigned values
  Strings: text (a character sequence) in single or double quotes
  Numbers
$a = 42;
$a = "some text";
# We define a variable "a" and assign it a value of 42
$a = 42;
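A minimal sketch of the assignments above. The names $answer and $Answer are illustrative and not from the slides, and "use strict" together with "my" is an added habit that helps catch mistyped variable names.

```perl
use strict;
use warnings;

# Illustrative names; the slides use $a, which is best avoided in larger
# programs because sort() treats $a and $b specially.
my $answer = 42;               # a number
my $Answer = "some text";      # a different variable: names are case-sensitive

print "$answer\n";
print "$Answer\n";

# A variable can be given a new value of another kind at any time.
$answer = "forty-two";
print "$answer\n";
```

Changing $answer leaves $Answer untouched, which is the point of the case-sensitivity note.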
Changing variables
Arithmetic operations
  $a = 42 / 2;     # division
  $a = 42 + 5;     # addition
  $a = $b * 2;     # multiplication
  $a = $a - $b;    # subtraction
Also useful:
  $a += 42;        # the same as $a = $a + 42;
  The same works for -, * and /
String operations
  $a = "some" . " text";    # concatenation
  $a = $a . " more text";
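The operations above can be tried out with a short script. This is only a sketch; the names $count and $label are made up for the illustration.

```perl
use strict;
use warnings;

my $count = 42 / 2;        # division: 21
$count = $count + 5;       # addition: 26
$count *= 2;               # the same as $count = $count * 2: 52
$count -= 10;              # the same as $count = $count - 10: 42

my $label = "some" . " text";     # concatenation
$label .= " and more text";       # the same as $label = $label . " and more text"

print "$count\n";
print "$label\n";
```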
Basic output
We have already seen an output command
  print "text";
  print $a;
  print "text $a";
  print "text " . ($a + $b) . " more text.";
Special characters:
  \n – new line
  \t – tabulator
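A small sketch of the output forms above, with made-up variable names and values. Since . and + have the same precedence and are evaluated from left to right, the sum is put in parentheses so that it is computed before the concatenation.

```perl
use strict;
use warnings;

my $apples = 3;    # illustrative values
my $pears  = 4;

# Variables are interpolated inside double quotes.
my $line1 = "I have $apples apples.\n";

# Parentheses make sure the sum is computed before the concatenation.
my $line2 = "Fruit in total: " . ($apples + $pears) . "\n";

# \t inserts a tabulator between the two columns.
my $line3 = "apples\t$apples\n";

print $line1, $line2, $line3;
```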
Exercise (2)
Define a variable
Assign it a value of 15
Print it
Double the value
Print it again
Define another variable with the string "apples"
Print both variables
Change the first variable to its square and the second to "pears"
Print both variables
Basic input
The <> operator returns input from the standard source (usually, the keyboard)
Syntax: $a = <>;
Don’t forget to tell the user what he’s supposed to enter!
Try the following program:
# This program asks the user for his name and greets him
print "What is your name? ";
$name = <>;
print "Hello $name!";
Input, output and new lines
As the user input is followed by the [Enter] key, the string in $name ends in a new line
The chomp function deletes the new line at the end of a string
Try the following, modified program:
# This program asks the user for his name and greets him
print "What is your name? ";
$name = <>;
chomp($name);
print "Hello $name!";
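To see what chomp changes without typing anything, the sketch below uses a fixed string in place of the user's input. The name "Ada" is made up.

```perl
use strict;
use warnings;

# Stand-in for a line typed by the user and ended with [Enter].
my $name = "Ada\n";

my $before = "Hello $name!";    # the "!" ends up on the next line
chomp($name);                   # removes the trailing new line, if there is one
my $after  = "Hello $name!";

print "$before\n";
print "$after\n";
```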
Exercise (3)
Let the user enter the radius of a circle
Tell him the diameter (2r), circumference (2πr) and area (πr²) of the circle
  Try doing this using one variable for each measure
  Try doing this using only one variable
If, else
Until now, the course of the program has been fixed
The if clause allows us to take different actions in different circumstances
# Let’s try out a conditional clause
print "Please enter password: ";
$password = <>;
if ($password == 42) {
    print "Correct password! Welcome.";
} else {
    print "Wrong password! Access denied.";
}
If, else
Note: = is the assignment operator, == is the numeric comparison operator
else is an optional clause that is executed if the if condition fails
# Let’s try out a conditional clause
print "Please enter password: ";
$password = <>;
if ($password == 42) {
    print "Correct password! Welcome.";
} else {
    print "Wrong password! Access denied.";
}
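As a side note, == compares numbers. For comparing text character by character Perl has the eq operator, which the slides do not cover. The sketch below uses eq inside a hypothetical helper, check_password, to show how an if/else reacts to the new line that <> leaves at the end of the input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the message for a given input.
sub check_password {
    my ($input) = @_;
    if ($input eq "42") {      # eq compares strings character by character
        return "Correct password! Welcome.";
    } else {
        return "Wrong password! Access denied.";
    }
}

my $typed = "42\n";            # what <> returns after the user types 42 and [Enter]
my $with_newline = check_password($typed);
chomp($typed);
my $without_newline = check_password($typed);

print "$with_newline\n$without_newline\n";
```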
Exercise (4)
Try out the password program. Why doesn’t it work correctly? Fix it.
Tell the user if the number he entered is too large or too small
  Hint: The comparison operators you’ll need are < and >
Ask the user for a geometrical form (circle or square), and then for a radius or side length. Return the area and perimeter.
While
What if we want to do checks until something happens?
The while loop repeats commands as long as its condition is true
  Note: in the example below, $password has no value at first, so it specifically doesn’t have the value 42
# Now on to a "while" loop
while ($password != 42) {
    print "Access denied.\n";
    print "Please enter password: ";
    $password = <>;
    chomp($password);
}
print "Correct password! Welcome.";
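The same loop can be followed step by step if the user's answers are replaced by a fixed list. This is a sketch only: the list of attempts is made up, and shift (which takes the first element off an array) is not covered in the slides.

```perl
use strict;
use warnings;

# Made-up attempts standing in for what the user would type.
my @attempts = (7, 13, 42, 99);
my $tries    = 0;
my $password = 0;

while ($password != 42) {            # repeat as long as the password is wrong
    $password = shift(@attempts);    # take the next attempt from the front of the list
    $tries++;
}

print "Correct password after $tries tries.\n";
```

The loop stops as soon as the condition is false, so the fourth attempt is never looked at.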
Exercise (5)
Write a small game: take a number, and make the user guess it.
Tell him if it’s too high or too low.
If the user gets it right, the program terminates.
If you like, you can take a random number:
$random = int(rand(10));    # a whole number from 0 to 9
Perl regular expressions
Regular expressions are very useful for text processing
Perl matching operator: =~
Perl non-matching operator: !~
The regular expression must be enclosed in slashes: /regex/
The program below accepts any password that contains the characters "42" anywhere
# A "while" loop with regular expressions
while ($password !~ /42/) {    # While the entered line doesn’t contain "42"
    print "Access denied.\n";
    print "Please enter password: ";
    $password = <>;
    chomp($password);
}
print "Correct password! Welcome.";
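A short sketch of =~ and !~ on a few made-up inputs, sorting them into those that contain "42" and those that do not.

```perl
use strict;
use warnings;

my @candidates = ("secret", "abc42", "4 2", "1423");   # made-up inputs
my @accepted;
my @rejected;

for my $candidate (@candidates) {
    if ($candidate =~ /42/) {        # contains "42" somewhere
        push(@accepted, $candidate);
    }
    if ($candidate !~ /42/) {        # does not contain "42"
        push(@rejected, $candidate);
    }
}

print "Accepted: @accepted\n";
print "Rejected: @rejected\n";
```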
Perl regular expressions
Simple string: some text
One of a number of symbols: [aA]
  Matches a or A
  Also possible: [tT]he, matching the or The
One of a continuous range of symbols: [a-h][1-8]
  Matches any two-character string from a1 to h8
Special characters
  ^ matches the beginning of a line
  $ matches the end of a line
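The patterns above can be combined, for example to test for a single chess square. The helper names is_square and starts_with_the are made up for this sketch.

```perl
use strict;
use warnings;

# Returns 1 if the text is one chess square such as "e4", else 0.
# The anchors ^ and $ tie the pattern to the start and the end of the line.
sub is_square {
    my ($text) = @_;
    return $text =~ /^[a-h][1-8]$/ ? 1 : 0;
}

# Returns 1 if the text starts with "the" or "The", else 0.
sub starts_with_the {
    my ($text) = @_;
    return $text =~ /^[tT]he/ ? 1 : 0;
}

print is_square("e4"), is_square("e9"), is_square("e44"), "\n";
print starts_with_the("The end"), starts_with_the("At the end"), "\n";
```

Without the anchors, [a-h][1-8] would also accept "e44", because the pattern only has to match somewhere in the text.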
Perl regular expressions
More special characters
Wildcard: the dot . matches any single character
  b.d matches bad, bed, bid, bud…
  Don’t forget: it also matches forbid, badly…
+ matches one or more occurrences of the previous character
  re+d matches red and reed (and also reeed and so on!)
* matches zero or more occurrences of the previous character
  bel* matches be, bel and bell (and belll…)
? matches zero or one occurrence of the previous character
  soo?n matches son or soon
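A sketch that runs these patterns over a made-up word list. It uses grep, which is not covered in the slides: grep keeps the list elements for which the pattern matches. The quantifier patterns are anchored with ^ and $ here so that they have to cover the whole word.

```perl
use strict;
use warnings;

my @words = ("rd", "red", "reeed", "be", "bell", "son", "soon", "sooon");

my @plus     = grep { /^re+d$/ } @words;    # one or more "e"
my @star     = grep { /^bel*$/ } @words;    # zero or more "l"
my @question = grep { /^soo?n$/ } @words;   # one or two "o"

# Without anchors, the dot pattern also matches inside longer words.
my @dot = grep { /b.d/ } ("bad", "forbid", "bd");

print "@plus\n@star\n@question\n@dot\n";
```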
Perl regular expressions
Character classes
\d: digits
  Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
\w: "word characters" – letters, digits, _
  \w+ \w+ – any two "words" separated by a blank
\s: any whitespace (blanks, tabs)
  ^\s+\d – any line that starts with whitespace followed by a digit
Capitalize the symbols to get the opposite
  \S is anything but whitespace, \D are non-digits…
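The character classes can be checked against a sample line. The line and the variable names below are made up for this sketch.

```perl
use strict;
use warnings;

my $line = "  Rule 334 applies";    # made-up sample line

my $has_rule_number   = ($line =~ /Rule \d+/) ? 1 : 0;   # "Rule", a blank, then digits
my $has_two_words     = ($line =~ /\w+ \w+/)  ? 1 : 0;   # two words separated by a blank
my $starts_with_space = ($line =~ /^\s/)      ? 1 : 0;   # first character is whitespace
my $only_non_digits   = ($line =~ /^\D+$/)    ? 1 : 0;   # no digit anywhere in the line

print "$has_rule_number $has_two_words $starts_with_space $only_non_digits\n";
```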
Exercise (6)
Write a program which asks the user for his e-mail address.
Check if the address is syntactically correct. Possible rules:
  Must contain an @ character
  At least one symbol before it
  Must contain a dot
  At least two symbols between @ and .
  At least two symbols after .
  No fancy symbols like {§*
Do you accept addresses with more than one dot?
Perl regular expressions
Switches
  Tell Perl how to deal with the regular expression
/regex/i: ignore lower/upper case
  /wiebke/i matches Wiebke and wiebke
s/regex/replacement/: substitute the text matched by regex with replacement
  $text =~ s/Mark/Euro/
/regex/g: repeat the match until the end of the line
# What the //g switch does
$text = "The meat costs 10 Mark, the fish costs 15 Mark.";
$text2 = $text;
$text =~ s/Mark/Euro/;      # "The meat costs 10 Euro, the fish costs 15 Mark."
$text2 =~ s/Mark/Euro/g;    # "The meat costs 10 Euro, the fish costs 15 Euro."
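A further sketch of the two switches, with made-up sample strings. It also uses the fact that s///g returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $name = "WIEBKE";
my $matches_ignoring_case = ($name =~ /wiebke/i) ? 1 : 0;   # 1: case is ignored
my $matches_exact_case    = ($name =~ /wiebke/)  ? 1 : 0;   # 0: case matters

# s///g returns the number of substitutions it made.
my $prices   = "10 Mark, 15 Mark, 20 Mark";
my $replaced = ($prices =~ s/Mark/Euro/g);

print "$matches_ignoring_case $matches_exact_case\n";
print "$replaced replacements: $prices\n";
```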
Perl regular expressions
Grouping
  Allows us to use the matched string
/(text)/ matches text and stores it in a variable
  The first group is stored in $1, the second in $2...
# Substitution and grouping
$sum = 0;    # initializing the variable with zero
$text = "The meat costs 10 Mark, the fish costs 15 Mark.";
while ($text =~ s/(\d+) Mark/$1 Euro/) {    # digits, a space, "Mark"
    $sum = $sum + $1;    # adding amount to $sum value
}
print "Substituted $sum Mark for Euro!";
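With several pairs of parentheses, each group gets its own numbered variable. A minimal sketch, using a made-up entry in the style of the example above:

```perl
use strict;
use warnings;

my $entry = "fish: 15 Mark";    # made-up sample
my ($item, $amount, $currency);

if ($entry =~ /^(\w+): (\d+) (\w+)$/) {
    $item     = $1;    # first pair of parentheses
    $amount   = $2;    # second pair
    $currency = $3;    # third pair
}

print "$item costs $amount $currency\n";
```

Copying $1, $2 and $3 into named variables right after the match is a cautious habit, because the next successful match overwrites them.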
Reading files
What if we want to have input from a file, not from the user?
Open file for reading:
  open(INPUT, "<file.ext");
Read a line:
  $line = <INPUT>;
  $line = <>;    # is just a special case
Writing files
What if we want to print to a file, not to the screen?
Open file for writing:
  open(OUTPUT, ">file.ext");
Write:
  print OUTPUT "Some text...";
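Writing and reading can be tried together in one script. The sketch below departs from the slides in three ways, all of them assumptions for the sake of the example: it uses the three-argument form of open with a variable as file handle instead of a bare name such as INPUT, it stops with an error message if a file cannot be opened, and it writes into a temporary directory so that no existing file is overwritten. The file name example.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A throwaway directory that is deleted when the program ends.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.txt";      # hypothetical file name

# Open for writing (">"), print two lines, close.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Open for reading ("<") and collect the lines.
my @lines;
open(my $in, "<", $file) or die "Cannot read $file: $!";
while (my $line = <$in>) {
    chomp($line);
    push(@lines, $line);
}
close($in);

print scalar(@lines), " lines read\n";
```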
Reading files
A program for testing e-mail addresses
Note: If we want to use a special character literally, we need to escape it with a backslash
  In strings: "
  In regular expressions: . + * ^ $ and the backslash \ itself
open(INPUT, "<test.txt");
while ($line = <INPUT>) {
    chomp($line);
    if ($line =~ /^.+@..+\...+$/) {    # testing for e-mail: [email protected]
        print "\"$line\" is a valid e-mail address.\n";
    } else {
        print "E-mail address \"$line\" not valid.\n";
    }
}
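To see which strings the pattern accepts, it can be wrapped in a small function and tried on a few made-up addresses. The function name looks_like_email is hypothetical, and the @ is escaped here as a precaution, which is an addition to the pattern on the slide.

```perl
use strict;
use warnings;

# Same pattern as in the program above, with the @ escaped:
# one or more characters, @, two or more characters, a literal dot,
# two or more characters.
sub looks_like_email {
    my ($address) = @_;
    return $address =~ /^.+\@..+\...+$/ ? 1 : 0;
}

# Made-up test addresses.
for my $address ('a@bc.de', 'a@b.de', 'abc.de', '@bc.de') {
    print "$address: ", looks_like_email($address), "\n";
}
```

Only the first address passes: the second has just one character between @ and the dot, the third has no @, and the fourth has nothing before the @.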
Exercise (7)
Make a text file and fill it with a Wikipedia article
Count the number of definite and indefinite articles
Count the number of numbers and digits
Insert a <number!> tag before every number
Arrays
Arrays contain lists of values
Syntax:
  @days = ("Monday", "Tuesday", "Friday");
  $days[0] = "Saturday";
  $day = $days[2];
Useful for storing linear sequences of values
Note: @ for whole lists, $ for single values
Arrays
Useful array commands
push(@array, "element");
  Adds a new element to the end of the array
  Creates the array if necessary
$element = pop(@array);
  Moves the last value of @array to $element
# Trying out arrays
@tags = ("N", "V", "Adj");
$tag1 = pop(@tags);           # $tag1 is now "Adj", @tags is ("N", "V")
$tag2 = pop(@tags);           # $tag2 is now "V", @tags is ("N")
push(@tags, $tag2, $tag1);    # @tags is now again ("N", "V", "Adj")
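A compact sketch that puts indexing, push and pop together. The variable names are illustrative; scalar(@days), which gives the number of elements, is not covered in the slides.

```perl
use strict;
use warnings;

my @days = ("Monday", "Tuesday", "Friday");
$days[0] = "Saturday";          # replace the first element (indices start at 0)
my $third = $days[2];           # "Friday"

push(@days, "Sunday");          # add to the end
my $last  = pop(@days);         # take "Sunday" off the end again
my $count = scalar(@days);      # number of elements: 3

print "@days\n";
```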
Hashes
Hashes are associative arrays
They are lists where the elements are not ordered, but identified by a "name"
Syntax:
  %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
  $probability{"noun"} = 0.52;
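A sketch of the same hash in use. The => notation, sort, keys and exists are not covered in the slides: => works like the comma above but makes the pairs easier to read, keys returns all the "names", and exists tests whether a name is present.

```perl
use strict;
use warnings;

my %probability = (
    verb      => 0.32,
    adjective => 0.02,
    adverb    => 0,
);
$probability{"noun"} = 0.52;

# Hashes have no fixed order, so the keys are sorted before printing.
my @report;
for my $tag (sort keys %probability) {
    push(@report, "$tag=$probability{$tag}");
}
print join(", ", @report), "\n";

my $knows_noun    = exists $probability{"noun"}    ? 1 : 0;
my $knows_pronoun = exists $probability{"pronoun"} ? 1 : 0;
print "$knows_noun $knows_pronoun\n";
```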
Exercise (8)
What happens if you try to print an array? What about a hash?
What happens if you convert an array into a hash, or the other way round?
Practical: Tokenizer
Take a Wikipedia article and put it into a text file
  Clean it up if necessary
Tokenize it!
  We only want one word per line
  Insert a "sentence boundary" symbol where appropriate
  The output should be another file
Think about what choices you make and why!
Practical: Tagger
Take the POS-annotated corpus from treebank.txt
Clean and tokenize it
Count the tag-token probabilities
Count the transition probabilities
  For a first attempt, I strongly recommend bigrams
Apply the Viterbi algorithm and tag an input file of your choice!
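One possible way to approach the counting step is sketched below. It rests on several assumptions: the toy corpus uses a made-up word/TAG format that may differ from treebank.txt, "<s>" is an invented marker for the start of the text, the tag-token probability is taken to mean how often a word occurs among all tokens with a given tag, and the function names are hypothetical. Smoothing and the Viterbi step are left out.

```perl
use strict;
use warnings;

# Toy corpus in a made-up word/TAG format; the real treebank.txt may look different.
my $corpus = "the/DET dog/N barks/V the/DET cat/N sleeps/V";

my (%tag_count, %word_tag_count, %bigram_count, %previous_count);
my $previous = "<s>";    # made-up marker for the start of the text

for my $token (split(/\s+/, $corpus)) {
    next unless $token =~ m{^(.+)/([^/]+)$};    # skip anything not in word/TAG form
    my ($word, $tag) = ($1, $2);
    $tag_count{$tag}++;
    $word_tag_count{"$tag $word"}++;
    $bigram_count{"$previous $tag"}++;
    $previous_count{$previous}++;
    $previous = $tag;
}

# Relative frequency of a word among all tokens with the given tag.
sub word_given_tag {
    my ($word, $tag) = @_;
    return 0 unless $tag_count{$tag};
    return ($word_tag_count{"$tag $word"} || 0) / $tag_count{$tag};
}

# Relative frequency of a tag directly after the given previous tag.
sub tag_given_previous {
    my ($tag, $previous_tag) = @_;
    return 0 unless $previous_count{$previous_tag};
    return ($bigram_count{"$previous_tag $tag"} || 0) / $previous_count{$previous_tag};
}

print word_given_tag("dog", "N"), "\n";
print tag_given_previous("N", "DET"), "\n";
```

Pairs that never occur get a probability of 0 here, which is exactly the gap that smoothing is meant to fill.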
Practical: Tagger++
If it’s still too easy, or if you want a long-term aim:
Implement smoothing: words can have tags you haven’t seen them with, or appear in contexts you never saw them in before
Try to figure out a way to guess the tags for unknown words better
Write a program to train on 9/10 of the corpus, and test it on the rest.
  Compare your results to the actual annotations
  Do this 10 times, once for every 9/10
Still too easy? Implement trigrams and compare the results.