Introduction to Perl Pawel Sirotkin 28.11-01.12.2008, Riga



Overview

Introduction to Perl, NLL Riga 2008, by Pawel Sirotkin

- About programming
- Why Perl?
- How to write, how to run
- Variables
- Operations
- Basic input and output
- Conditionals and loops
- Regular expressions

About programming

- Working with algorithms
- A program needs to contain exact commands
  - (Mostly) not: "Go buy some bread"
  - But: "Put on your coat and shoes, open the door, go through it, close the door, go down the stairs…"
- A program:
  - Has a certain input
  - Processes it
  - Produces a certain output

Why Perl?

- Easy to learn
- Simple syntax
- Good at manipulating text
- Good at dealing with regular expressions

How to write a Perl program

- Perl programs can be written in any text editor
  - Notepad, vim, even Word…
  - Recommended: a simple text editor with syntax highlighting
- Write the program code
- Save the file as xxx.pl
  - The .pl extension is not necessary, but useful

What is a Perl program like?

    # This *very* simple program prints "Hello World!"
    print "Hello World!";

- The content of a line after the # is a comment
  - It is ignored by the program
- What are comments for, then?
  - They are for you, and for others who will have to read the code
  - Imagine looking at a complex program in a few months and trying to figure out what it does
- Write as many comments as you can

- print is a Perl command
  - In this case, for printing text on the screen
- Every command should start on a new line
  - Not a Perl requirement, but crucial for readability
- Every command should end with a semicolon (;)
- Many commands take arguments
  - Here: "Hello World!"

What to do with the program?

- Perl works from the command line
  - Windows: "Start", then "Run…"
- Go to the directory where you saved the program
  - E.g.: cd C:\Perl\MyPrograms
- Run the program: perl myprogram.pl
- See the results of your labours!

Exercise (1)

- Create a folder for your Perl programs
- Open the editor of your choice and write the "Hello World" program
  - The command is print "Hello World!";
  - Don't forget the comment!
- Save the program
- Run it!
- What happens if you mistype the print command?

Variables

- The "Hello World" program always has the same output
  - Not a very useful program, as such
- We need to be able to change the output
- Variables are objects that can hold different values

Defining variables

- To define a variable, write a dollar sign followed by the variable's name
- Names should consist of letters, numbers and the underscore
  - They should start with a letter
- Variable names are case-sensitive!
  - $a and $A are different variables!
- Generally, a variable's name should tell you what the variable is for

    # We define a variable "a" and assign it a value of 42
    $a = 42;

- Variables can be assigned values
  - Strings: text (a character sequence) in single or double quotes
  - Numbers

    $a = 42;
    $a = "some text";

Changing variables

- Arithmetic operations

    $a = 42 / 2;     # division
    $a = 42 + 5;     # addition
    $a = $b * 2;     # multiplication
    $a = $a - $b;    # subtraction

- Also useful:

    $a += 42;        # the same as $a = $a + 42;

  - The same works for -, * and /
- String operations

    $a = "some" . " text";     # concatenation
    $a = $a . " more text";
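As a minimal sketch, the shorthand assignments from this slide can be tried out one after another. The variable names `$count` and `$text` are made up for this example, and `use strict`, `use warnings`, `my` and the `.=` operator are not part of the slides; they are included here only as commonly recommended Perl practice.

```perl
use strict;
use warnings;

# $count and $text are hypothetical names chosen for this sketch.
my $count = 42;
$count += 8;     # same as $count = $count + 8;  now 50
$count -= 10;    # same as $count = $count - 10; now 40
$count *= 2;     # same as $count = $count * 2;  now 80
$count /= 4;     # same as $count = $count / 4;  now 20

my $text = "some" . " text";    # concatenation
$text .= " more text";          # same as $text = $text . " more text";

print "$count\n";    # prints 20
print "$text\n";     # prints some text more text
```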

Basic output

- We have already seen an output command

    print "text";
    print $a;
    print "text $a";
    print "text " . ($a + $b) . " more text.";

- Special characters:
  - \n: new line
  - \t: tabulator
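The output forms above can be sketched in one short program. In Perl, `.` and `+` have the same precedence and are evaluated left to right, so an addition inside a concatenation is safest in parentheses. The names `$first`, `$second` and `sum_line` are assumptions made for this illustration, as are `use strict`, `use warnings` and `my`, which the slides do not use.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my $first  = 3;
my $second = 4;

# Build the line in a helper so it can be inspected as well as printed.
sub sum_line {
    my ($x, $y) = @_;
    # The parentheses make the addition happen before the concatenation.
    return "text " . ($x + $y) . " more text.";
}

print "text\n";                          # a plain string, ending in a new line
print $first, "\n";                      # a variable
print "text $first\n";                   # a variable interpolated into a string
print sum_line($first, $second), "\n";   # prints: text 7 more text.
print "left\tright\n";                   # \t inserts a tabulator
```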

Exercise (2)

- Define a variable
- Assign it a value of 15
- Print it
- Double the value
- Print it again
- Define another variable with the string "apples"
- Print both variables
- Change the first variable to its square and the second to "pears"
- Print both variables

Basic input

- The <> operator returns a line of input from the standard source (usually, the keyboard)
- Syntax: $a = <>;
- Don't forget to tell the user what he's supposed to enter!
- Try the following program:

    # This program asks the user for his name and greets him
    print "What is your name? ";
    $name = <>;
    print "Hello $name!";

Input, output and new lines

- As the user input is followed by the [Enter] key, the string in $name ends in a new line
- The chomp function deletes the new line at the end of a string
- Try the following, modified program:

    # This program asks the user for his name and greets him
    print "What is your name? ";
    $name = <>;
    chomp($name);
    print "Hello $name!";

Exercise (3)

- Let the user enter the radius of a circle
- Tell him the diameter (2r), circumference (2πr) and area (πr²) of the circle
  - Try doing this using one variable for each measure
  - Try doing this using only one variable

If, else

- Until now, the program has always followed the same fixed course
- The if clause allows us to take different actions in different circumstances

    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    } else {
        print "Wrong password! Access denied.";
    }

- Note: = is the assignment operator, == is the (numeric) comparison operator
- else is an optional branch that runs when the if condition is false

Exercise (4)

- Try out the password program. Why doesn't it work correctly? Fix it.
- Tell the user if the number he entered is too large or too small
  - Hint: The comparison operators you'll need are < and >
- Ask the user for a geometrical form (circle or square), and then for a radius or side length. Return the area and perimeter.

While

- What if we want to do checks until something happens?
- The while loop repeats its commands as long as its condition is true
  - Note: in the example below, $password has no value at first, so it specifically doesn't have the value 42

    # Now on to a "while" loop
    while ($password != 42) {
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";
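The same loop shape can be tried without keyboard input. The sketch below keeps doubling a value while it is still within a limit and counts the repetitions; the loop stops as soon as the condition becomes false. The name `doublings_to_exceed` and the limit of 100 are invented for this example, and `sub`, `my` and `use strict` go beyond what the slides introduce.

```perl
use strict;
use warnings;

# Count how many times 1 has to be doubled before it exceeds $limit.
# (Hypothetical helper, written only to illustrate a while loop.)
sub doublings_to_exceed {
    my ($limit) = @_;
    my $value = 1;
    my $steps = 0;
    while ($value <= $limit) {    # repeated as long as this is true
        $value = $value * 2;
        $steps = $steps + 1;
    }
    return $steps;
}

# 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64 -> 128: seven doublings
print doublings_to_exceed(100), "\n";    # prints 7
```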

Exercise (5)

- Write a small game: take a number, and make the user guess it. Tell him if it's too high or too low. If the user gets it right, the program terminates.
- If you like, you can take a random number:

    $random = int(rand(10));

Perl regular expressions

- Regular expressions are very useful for text processing
- Perl matching operator: =~
  - Perl non-matching operator: !~
- The regular expression must be enclosed in slashes: /regex/
- The program below accepts any password that contains the characters "42" anywhere

    # A "while" loop with regular expressions
    while ($password !~ /42/) {    # While the entered line doesn't contain "42"
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";

- Simple string: some text
- One of a number of symbols: [aA]
  - Matches a or A
  - Also possible: [tT]he, matching the or The
- One of a continuous range of symbols: [a-h][1-8]
  - Matches any two-character string from a1 to h8
- Special characters
  - ^ matches the beginning of a line
  - $ matches the end of a line

- More special characters
- Wildcard: the dot . matches any single character
  - b.d matches bad, bed, bid, bud…
  - Don't forget: it also matches forbid, badly…
- + matches one or more occurrences of the previous character
  - re+d matches red and reed (and also reeed and so on!)
- * matches zero or more occurrences of the previous character
  - bel* matches be, bel and bell (and belll…)
- ? matches zero or one occurrence of the previous character
  - soo?n matches son or soon

- Character classes
  - \d: digits
    - Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
  - \w: "word characters" (letters, digits, _)
    - \w+ \w+ matches any two "words" separated by a blank
  - \s: any whitespace (blanks, tabs)
    - ^\s*\d matches any line whose first non-whitespace character is a digit
- Capitalize the symbols to get the opposite
  - \S is anything but whitespace, \D is any non-digit…
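A quick way to get a feeling for these patterns is to run a handful of them against sample strings. The sketch below does that. The helper `matches`, the list of samples, and the use of `qr//`, array references and `foreach` are assumptions of this example and are not taken from the slides.

```perl
use strict;
use warnings;

# Return 1 if $string matches $regex, 0 otherwise (hypothetical helper).
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? 1 : 0;
}

# Each entry pairs a sample string with a pattern from the slides.
my @checks = (
    ["forbid",   qr/b.d/],            # the dot matches any single character
    ["reeed",    qr/re+d/],           # one or more "e"
    ["be",       qr/bel*/],           # zero or more "l"
    ["son",      qr/soo?n/],          # zero or one extra "o"
    ["Rule 334", qr/Rule \d+/],       # one or more digits
    ["  7 days", qr/^\s*\d/],         # optional leading whitespace, then a digit
    ["h9",       qr/^[a-h][1-8]$/],   # 9 lies outside the range 1-8
);

foreach my $check (@checks) {
    my ($string, $regex) = @$check;
    my $verdict = matches($string, $regex) ? "matches" : "does not match";
    print "'$string' $verdict $regex\n";
}
```

Only the last sample fails to match, because its second character is not in the range 1-8.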

Exercise (6)

- Write a program which asks the user for his e-mail address.
- Check if the address is syntactically correct. Possible rules:
  - Must contain an @ character
  - At least one symbol before it
  - Must contain a dot
  - At least two symbols between @ and .
  - At least two symbols after .
  - No fancy symbols like {§*
- Do you accept addresses with more than one dot?

Perl regular expressions

- Switches
  - Tell Perl how to deal with the regular expression
- /regex/i: ignore lower/upper case
  - /wiebke/i matches Wiebke and wiebke
- s/regex/replacement/: substitute the text matched by regex with replacement
  - $text =~ s/Mark/Euro/
- /regex/g: repeat the match for every occurrence, not just the first

    # What the //g switch does
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    $text2 = $text;
    $text =~ s/Mark/Euro/;      # "The meat costs 10 Euro, the fish costs 15 Mark."
    $text2 =~ s/Mark/Euro/g;    # "The meat costs 10 Euro, the fish costs 15 Euro."
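Switches can be combined. As a small sketch, the substitution below uses both /g and /i, and also uses the fact that a successful s/// returns the number of replacements it made. The helper name `mark_to_euro` and the sample sentence with an upper-case "MARK" are assumptions of this example.

```perl
use strict;
use warnings;

# Replace every "Mark", in any capitalisation, with "Euro".
# Returns the new text and the number of replacements made.
sub mark_to_euro {
    my ($text) = @_;
    # s///g returns the number of substitutions, or a false value if none.
    my $count = ($text =~ s/mark/Euro/gi) || 0;
    return ($text, $count);
}

my ($converted, $replacements) =
    mark_to_euro("The meat costs 10 Mark, the fish costs 15 MARK.");

print "$converted\n";                        # The meat costs 10 Euro, the fish costs 15 Euro.
print "Replacements made: $replacements\n";  # Replacements made: 2
```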

- Grouping
  - Allows us to use the matched string
  - /(text)/ matches text and stores it in a variable
  - The first group is stored in $1, the second in $2...

    # Substitution and grouping
    $sum = 0;    # initializing the variable with zero
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    while ($text =~ s/(\d+) Mark/$1 Euro/) {    # digits, a space, "Mark"
        $sum = $sum + $1;    # adding amount to $sum value
    }
    print "Substituted $sum Mark for Euro!";
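Groups also work in a plain match, without substitution. The following sketch pulls an amount and a currency out of a sentence using two groups. The helper `parse_price` and the pattern `(\d+) (\w+)` are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Find the first "number word" pair in $text.
# Returns (amount, currency), or an empty list if nothing matches.
sub parse_price {
    my ($text) = @_;
    if ($text =~ /(\d+) (\w+)/) {
        return ($1, $2);    # first group: digits, second group: the following word
    }
    return;
}

my ($amount, $currency) = parse_price("The meat costs 10 Mark");
print "Amount: $amount, currency: $currency\n";    # Amount: 10, currency: Mark
```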

Reading files

- What if we want to have input from a file, not from the user?
- Open file for reading: open(INPUT, "<file.ext");
- Read a line: $line = <INPUT>;
  - $line = <>; is just a special case

Writing files

- What if we want to print to a file, not to the screen?
- Open file for writing: open(OUTPUT, ">file.ext");
- Write: print OUTPUT "Some text...";
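Reading and writing can be combined into one round trip: write a few lines to a file, then read them back. This is only a sketch under some assumptions. It uses the three-argument form of open with a variable as file handle and checks for errors with `or die`, which is a common alternative to the two-argument form shown on the slides. The file name perl_intro_demo.txt and the helpers `write_lines` and `read_lines` are made up, and the array used to collect the lines is a feature the slides introduce further on.

```perl
use strict;
use warnings;

my $file = "perl_intro_demo.txt";    # hypothetical file name

# Write each given line to the file, one per line.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $out, ">", $path) or die "Cannot write $path: $!";
    foreach my $line (@lines) {
        print $out "$line\n";
    }
    close($out);
}

# Read the file line by line and return the lines without their new lines.
sub read_lines {
    my ($path) = @_;
    open(my $in, "<", $path) or die "Cannot read $path: $!";
    my @lines;
    while (my $line = <$in>) {
        chomp($line);
        push(@lines, $line);
    }
    close($in);
    return @lines;
}

write_lines($file, "first line", "second line");
foreach my $line (read_lines($file)) {
    print "Read back: $line\n";
}
unlink($file);    # remove the demo file again
```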

Reading files

- A program for testing e-mail addresses
- Note: If we want to use a special character literally, we need to escape it with a backslash
  - In strings: "
  - In regular expressions: . + * ^ $ and the backslash \ itself

    open(INPUT, "<test.txt");
    while ($line = <INPUT>) {
        chomp($line);
        if ($line =~ /^.+\@..+\...+$/) {    # testing for e-mail
            print "\"$line\" is a valid e-mail address.\n";
        } else {
            print "E-mail address \"$line\" not valid.\n";
        }
    }

Exercise (7)

- Make a text file and fill it with a Wikipedia article
- Count the number of definite and indefinite articles
- Count the number of numbers and digits
- Insert a <number!> tag before every number

Arrays

- Arrays contain lists of values
- Syntax:

    @days = ("Monday", "Tuesday", "Friday");
    $days[0] = "Saturday";
    $day = $days[2];

- Useful for storing linear sequences of values
- Note: @ for whole lists, $ for single elements

- Useful array commands
  - push(@array, "element");
    - Adds a new element to the end of the array
    - Creates the array if necessary
  - $element = pop(@array);
    - Moves the last value of @array to $element

    # Trying out arrays
    @tags = ("N", "V", "Adj");
    $tag1 = pop(@tags);           # $tag1 is now "Adj", @tags is ("N", "V")
    $tag2 = pop(@tags);           # $tag2 is now "V", @tags is ("N")
    push(@tags, $tag2, $tag1);    # @tags is now again ("N", "V", "Adj")
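To see what an array holds at any point, it helps to print its length and its elements. The sketch below does this before and after a pop and a push, and then walks through the elements one by one. The helper `describe` is hypothetical; `scalar`, `join` and `foreach` are standard Perl but are not covered on the slides.

```perl
use strict;
use warnings;

# Describe a list as "<count> elements: <items separated by blanks>".
sub describe {
    my (@list) = @_;
    return scalar(@list) . " elements: " . join(" ", @list);
}

my @tags = ("N", "V", "Adj");
print describe(@tags), "\n";    # 3 elements: N V Adj

my $last = pop(@tags);          # removes "Adj" from the end
push(@tags, "Adv");             # appends "Adv" to the end
print describe(@tags), "\n";    # 3 elements: N V Adv

# Visit every element in order.
foreach my $tag (@tags) {
    print "Tag: $tag\n";
}
```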

Hashes

- Hashes are associative arrays
- They are lists where the elements are not ordered, but identified by a "name" (a key)
- Syntax:

    %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
    $probability{"noun"} = 0.52;
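Since a hash has no fixed order, its entries are usually visited through its list of keys. The sketch below prints the table from the slide in alphabetical key order and looks up the key with the largest value. The helper `most_probable` is an assumption of this example, and `keys`, `sort` and `foreach` are standard Perl that the slides do not cover.

```perl
use strict;
use warnings;

my %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
$probability{"noun"} = 0.52;

# Return the key with the largest value (hypothetical helper).
# On a tie, the alphabetically first key wins.
sub most_probable {
    my (%table) = @_;
    my $best;
    foreach my $key (sort keys %table) {
        if (!defined($best) || $table{$key} > $table{$best}) {
            $best = $key;
        }
    }
    return $best;
}

# Sorting the keys gives a predictable order for printing.
foreach my $tag (sort keys %probability) {
    print "$tag: $probability{$tag}\n";
}
print "Most probable: ", most_probable(%probability), "\n";    # Most probable: noun
```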

Exercise (8)

- What happens if you try to print an array? What about a hash?
- What happens if you convert an array into a hash, or the other way round?

Practical: Tokenizer

- Take a Wikipedia article and put it into a text file
  - Clean it up if necessary
- Tokenize it!
  - We only want one word per line
  - Insert a "sentence boundary" symbol where appropriate
- The output should be another file
- Think about what choices you make and why!

Practical: Tagger

- Take the POS-annotated corpus from treebank.txt
- Clean and tokenize it
- Count the tag-token probabilities
- Count the transition probabilities
  - To begin with, I strongly recommend bigrams
- Apply the Viterbi algorithm and tag an input file of your choice!
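One possible starting point for the counting step is sketched below. It makes a strong assumption about the data: every token is written as word/TAG and tokens are separated by whitespace. The actual layout of treebank.txt is not described in the slides and may differ. The sketch estimates the tag-token probability as the number of times a word occurs with a tag, divided by the number of times the tag occurs at all. The helper name `emission_probabilities` and the sample sentence are invented for this example.

```perl
use strict;
use warnings;

# Estimate P(word | tag) from text in an ASSUMED "word/TAG word/TAG ..." format.
# Returns a hash whose keys look like "word TAG".
sub emission_probabilities {
    my ($text) = @_;
    my (%pair_count, %tag_count);

    foreach my $item (split(' ', $text)) {
        # Skip anything that is not of the form word/TAG.
        next unless $item =~ m{^(.+)/([^/]+)$};
        my ($word, $tag) = ($1, $2);
        $pair_count{"$word $tag"}++;
        $tag_count{$tag}++;
    }

    my %probability;
    foreach my $pair (keys %pair_count) {
        my ($word, $tag) = split(' ', $pair);
        $probability{$pair} = $pair_count{$pair} / $tag_count{$tag};
    }
    return %probability;
}

# Made-up sample data, not taken from any real corpus.
my %p = emission_probabilities("the/DT dog/NN barks/VBZ the/DT cat/NN");
foreach my $pair (sort keys %p) {
    print "$pair: $p{$pair}\n";
}
```

Transition probabilities between neighbouring tags could be counted in the same style, with keys built from two consecutive tags.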

Practical: Tagger++

- If it's still too easy, or if you want a long-term aim:
  - Implement smoothing: words can have tags you haven't seen them with, or appear in contexts you have never seen them in
  - Try to figure out a way to guess the tags for unknown words better
  - Write a program to train on 9/10 of the corpus, and test it on the rest. Compare your results to the actual annotations
    - Do this 10 times, once for each choice of 9/10
- Still too easy? Implement trigrams and compare the results.