more cgi programming


Upload: bunny

Post on 25-Jan-2016

Page 1: More CGI programming

More CGI programming ...

Page 2: More CGI programming

Back to CGI programming ...

• Now that we know how to use conditional expressions, we can write a CGI program that determines whether the environment variables set by the HTTP server daemon include one of interest

Page 3: More CGI programming

CGI program which checks for a particular env var

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>
<HEAD><TITLE> Environment checking program </TITLE></HEAD>
<BODY>
<H1> Environment Variables </H1>
<p>
EOF

# next line checks whether a certain key in %ENV has a true (non-empty, non-zero) value

if ( $ENV{"HTTP_GROCERY_ORDER"} )
{ print "Request includes Grocery-Order header" }
else { print "Request does not include Grocery-Order header" };

print <<EOF;
</p>
</BODY>
</HTML>
EOF

Page 4: More CGI programming

• CS4320 got here on 4 feb 2005

Page 5: More CGI programming

More Perl ...

Page 6: More CGI programming

Defining subroutines in Perl

• A subroutine definition in Perl is of the form

sub <subroutine-name>
{ <sequence-of-statements> }

• Example

sub greetTheWorld
{ print "Hello, world!\n";
  print "Have a nice day"
}

• In a main program, this would be called as follows:

greetTheWorld();

Page 7: More CGI programming

Defining subroutines in Perl (contd.)

• Another example

sub printEnvironmentVariables

{ foreach my $key ( sort( keys(%ENV) ) )

{ print "<LI> $key = $ENV{$key}</LI>" }

}

• This is used on the next two slides in a new version of the CGI program which prints out its environment variables

Page 8: More CGI programming

Another CGI example ...

Page 9: More CGI programming

A revised CGI program to report env vars (Part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>
<HEAD>
<TITLE> Environment reporting program </TITLE>
</HEAD>
<BODY>
<H1> Environment Variables </H1>
<UL>
EOF

printEnvironmentVariables();

print <<EOF;
</UL>
</BODY>
</HTML>
EOF

Page 10: More CGI programming

A revised CGI program to report env vars (Part 2)

sub printEnvironmentVariables

{ foreach my $key ( sort( keys(%ENV) ) )

{ print "<LI> $key = $ENV{$key}</LI>" }

}

Page 11: More CGI programming

Some more Perl ...

Page 12: More CGI programming

Passing Arguments to subroutines

• The subroutines which we have defined so far have not taken any arguments

• Pre-defined Perl subroutines can take arguments, as in this program fragment:

%mothers = (Tom=>"May", Bob=>"Ann", Tim=>"Una");

delete( $mothers{Bob} )

• Can programmer-defined subroutines take arguments?

• Yes, although the way in which they handle arguments is a little different from what you are used to

Page 13: More CGI programming

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetPerson which

– takes one argument, a string, and

– prints a message greeting the person whose name is the string

• An example call might be

greetPerson("Eamonn de Valera")

which should produce the output

Hello, Eamonn de Valera

• The following program fragment should produce the same output:

my $person = "Eamonn de Valera";

greetPerson($person)

• How would we define such a subroutine?

Page 14: More CGI programming

Passing Arguments to subroutines (contd.)

• Your first instinct might be to write something like this:

sub greetPerson($formalArgument)

{ print "Hello, $formalArgument" }

but that would be WRONG in traditional Perl, where a subroutine definition does not name its parameters

• A subroutine in Perl must access its actual argument(s) through a special array variable called

@_

• Since our subroutine takes only one argument, this would be in the first element of @_, so our definition would be:

sub greetPerson

{ print "Hello, $_[0]" }

Page 15: More CGI programming

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetTwoPeople which

– takes two string arguments and

– prints a message greeting the people whose names are the strings

• An example call might be

greetTwoPeople("Eamonn", "Michael")

which should produce the output

Hello, Eamonn and Michael

• Since our subroutine takes two arguments, these would be in the first two elements of @_, so our definition would be:

sub greetTwoPeople

{print "Hello, $_[0] and $_[1]"}

Page 16: More CGI programming

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetMember which

– takes two arguments

• an integer giving the position (counting from 1) of one member of an array

• an array of strings

– and prints a message greeting the person whose name is the indicated string

• An example use is:

@club = ("Eamonn", "Michael", "Harry");

greetMember(2, @club)

which should produce the output

Hello, Michael

• This introduces a further complication ...

Page 17: More CGI programming

Passing Arguments to subroutines (contd.)

• All actual arguments to a subroutine are collapsed into one flat array, the special array @_

• Thus, the program fragment

@club = ("Eamonn", "Michael", "Harry");

greetMember(2, @club)

causes the subroutine greetMember to receive an @_ whose value is

(2, "Eamonn", "Michael", "Harry")

• So our definition would be:

sub greetMember

{ print "Hello, $_[$_[0]]" }
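
The flattening of arguments into @_ can be checked directly. The following is only a sketch: countArguments and greetingForMember are hypothetical names, and greetingForMember returns its greeting instead of printing it so that the result can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of elements that arrived in @_.
sub countArguments
{ return scalar(@_) }

# Hypothetical variant of greetMember that returns the greeting
# instead of printing it.
sub greetingForMember
{ return "Hello, $_[$_[0]]" }

my @club = ("Eamonn", "Michael", "Harry");

# One scalar plus a three-element array arrive as four separate elements.
print countArguments(2, @club), "\n";
print greetingForMember(2, @club), "\n";
```

Because the integer occupies element 0 of @_, the names start at element 1, which is why position 2 selects the second name without any subtraction.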

Page 18: More CGI programming

Using local variables in subroutines

• Local variables can be defined in subroutines using the my construct

• Indeed, doing so enables us to write subroutines which are easier to understand

• Subroutine greetMember on the last slide is clearer if it is written using local variables, as follows:

sub greetMember
{my ($position, @strings);
 $position = $_[0]-1;
 @strings = @_[1..scalar(@_)-1];
 print "Hello, $strings[$position]"
}

Page 19: More CGI programming

• CS 4400 got to here on 1 February 2002

Page 20: More CGI programming

Using local variables in subroutines

• We don’t have to declare the local variables in a separate line

• We can just use the my construct in the statements where the vars first appear

• The subroutine greetMember on the last slide could also be written as follows:

sub greetMember

{my $position = $_[0]-1;

my @strings = @_[1..scalar(@_)-1];

print "Hello, $strings[$position]"

}

Page 21: More CGI programming

Using local variables in subroutines

• We can also use a subroutine called shift() to remove the first element from @_

• Since shift() also returns, as its value, the value of the removed element, we can use it in an assignment statement

• Since we have removed the first element, we can then assign the new value of @_ to @strings

• The subroutine greetMember on the last slide could also be written as follows:

sub greetMember

{my $position = shift(@_) - 1;

my @strings = @_;

print "Hello, $strings[$position]"

}

Page 22: More CGI programming

Using local variables in subroutines

• What I regard as an unfortunate feature of Perl is that it allows a lot of abbreviations

• I present one here, simply because you will often see it in script archives

– if no explicit argument is given to shift() in a subroutine, it is assumed to be @_

• Thus, in a script archive, you might find subroutine greetMember on the last slide written as follows:

sub greetMember

{my $position = shift;

my @strings = @_;

print "Hello, $strings[$position - 1]"

}
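
The same shift idiom works for any subroutine whose first argument is special and whose remaining arguments form a list. A minimal sketch, assuming a hypothetical subroutine joinNames that takes a separator followed by any number of names (join is a built-in Perl function):

```perl
use strict;
use warnings;

# Hypothetical subroutine: first argument is a separator,
# the remaining arguments are the names to be joined.
sub joinNames
{ my $separator = shift;     # removes and returns the first element of @_
  my @names = @_;            # whatever is left after the shift
  return join($separator, @names)
}

print joinNames(" and ", "Eamonn", "Michael"), "\n";
```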

Page 23: More CGI programming

subroutines which return values

• We often need to define subroutines which return values, as in the following program fragment:

my @numbers = (1, 2, 3, 4, 5);

my $average = sum( @numbers ) / scalar( @numbers );

print $average

• The subroutine sum can be defined as follows:

sub sum

{ my @numbers = @_;

my $sum = 0;

foreach my $value ( @numbers)

{ $sum = $sum + $value }

return $sum

}

• The value returned is specified with a return statement
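
A subroutine that returns a value can be used inside another one. The sketch below assumes a hypothetical subroutine average built on sum; returning undef for an empty list is a choice made here to avoid dividing by zero, not something the slides specify.

```perl
use strict;
use warnings;

sub sum
{ my $sum = 0;
  foreach my $value (@_)
  { $sum = $sum + $value }
  return $sum
}

# Hypothetical subroutine: the mean of its arguments,
# or undef when it is called with no arguments.
sub average
{ my @numbers = @_;
  if ( scalar(@numbers) == 0 ) { return undef }
  return sum(@numbers) / scalar(@numbers)
}

print average(1, 2, 3, 4, 5), "\n";
```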

Page 24: More CGI programming

subroutines which return values (contd.)

• A subroutine can contain more than one return statement

• The following program fragment defines and uses a boolean subroutine which checks for the existence of the argument passed to it

if ( present ( $ENV{"EDITOR"} ) )
{ print "\n The envVar EDITOR exists" }
else { print "\n The envVar EDITOR does not exist" };

sub present
{ my $varInQuestion = $_[0];
  if ( $varInQuestion ) { return 1 } else { return 0 }
}

• It enables us to write a cleaner version of a CGI program we wrote earlier
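
One caveat about present(): it tests whether its argument is true, so a variable that is set to an empty string or to 0 is reported as absent. The sketch below contrasts it with a hypothetical variant, presentInEnv, which takes the name of the variable and uses the built-in exists() instead. The variable name DEMO_EMPTY_VAR is invented for the demonstration, and a Unix-like system is assumed.

```perl
use strict;
use warnings;

# Hypothetical variant of present(): takes the NAME of an environment
# variable and reports whether that key exists in %ENV at all.
sub presentInEnv
{ my $name = $_[0];
  if ( exists($ENV{$name}) ) { return 1 } else { return 0 }
}

# The truth test used by present(), for comparison.
sub present
{ if ( $_[0] ) { return 1 } else { return 0 } }

$ENV{"DEMO_EMPTY_VAR"} = "";      # hypothetical variable: set, but empty

print presentInEnv("DEMO_EMPTY_VAR"), "\n";
print present($ENV{"DEMO_EMPTY_VAR"}), "\n";
```

Which test is appropriate depends on whether an empty header or variable should count as present.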

Page 25: More CGI programming

Revised CGI program which checks for an env var (part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>
<HEAD><TITLE> Environment checking program </TITLE></HEAD>
<BODY>
<H1> Environment Variables </H1>
<p>
EOF

if ( present($ENV{"HTTP_GROCERY_ORDER"}) )
{ print "Request includes Grocery-Order header" }
else { print "Request does not include Grocery-Order header" };

print <<EOF;
</p>
</BODY>
</HTML>
EOF

Page 26: More CGI programming

• Cs 4320 got here on 8 february 2005

Page 27: More CGI programming

Another CGI example ...

Page 28: More CGI programming

Program reporting GET method data

• We will use much of what we have learned to write a CGI program which

– is called by an HTML FORM

– and sends back to the browser an HTML page which lists the data it received from the form

Page 29: More CGI programming

Program reporting GET method data (part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>
<HEAD>
<TITLE> Program reporting GET method data </TITLE>
</HEAD>
<BODY>
<H1> Form Data sent by the GET method </H1>
<UL>
EOF

printFormData();

print <<EOF;
</UL>
</BODY>
</HTML>
EOF

Page 30: More CGI programming

Program reporting GET method data (part 2)

sub printFormData

{

my $queryString = $ENV{'QUERY_STRING'};

separateAndPrintDataIn($queryString)

}

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

print "<LI>$name = $value </LI>";

}

}

Page 31: More CGI programming

• Cs 4320 got here on 24 feb 2004

Page 32: More CGI programming

Decoding query strings

• The previous program was pretty good but it would not work in all cases

• Suppose the program is called by a HTML FORM which contains two text input elements:

– one asks for the user’s name

– one asks for the company for which he works

Page 33: More CGI programming

Decoding query strings

• Suppose the user’s name is Sean Croke and his company’s name is Black&Decker

• The QUERY_STRING received by the program will be

name=Sean+Croke&company=Black%26Decker

because the space in the user’s name and the ampersand in the company’s name must be encoded for safe transmission

• separateAndPrintDataIn must be improved to cater for this

• We must learn more about string processing in Perl to do this
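
The shortcoming can be seen by running the splitting logic on this query string. The sketch below uses a hypothetical subroutine, rawEquationsIn, which does the same two split() calls as separateAndPrintDataIn but returns the "name = value" strings instead of printing list items.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a query string into "name = value"
# strings without doing any decoding.
sub rawEquationsIn
{ my @lines;
  foreach my $equation ( split("&", $_[0]) )
  { my ($name, $value) = split("=", $equation);
    push(@lines, "$name = $value")
  }
  return @lines
}

foreach my $line ( rawEquationsIn("name=Sean+Croke&company=Black%26Decker") )
{ print "$line\n" }
```

The values come back still encoded, as Sean+Croke and Black%26Decker, which is what the improved subroutine has to repair.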

Page 34: More CGI programming

• CS 4320 got here on 18 Feb 2003

Page 35: More CGI programming

Some more Perl ...

Page 36: More CGI programming

String Processing in Perl

• Perl contains a facility for working with regular expressions, expressions that describe classes of strings

• Since dynamic web page generation is all about text processing, Perl’s regular expression tools are probably the most important reason why the language has become so widely used in server-side web programming

• We will not have time in this course to cover all of Perl’s regular expression facilities

• We will consider only a subset, including those facilities that are required by the form-data processing task we have set out to achieve

Page 37: More CGI programming

Retrieving encoded SP characters

• To retrieve the SP characters that are encoded in the QUERY_STRING, we need to learn about only two operators

– the translation operator tr///

– the binding operator =~

• Consider the following Perl statement

$stringVar =~ tr/+/ /;

• The binding operator =~ says that the translation expression tr/+/ / should be applied to the contents of $stringVar

• The translation expression tr/+/ / specifies that every instance of the + character should be replaced by a SP

Page 38: More CGI programming

The tr/// operator

• In general, an application of the tr/// operator is of the form

tr/<list1>/<list2>/

where <list1> and <list2> are ordered character lists of equal length (they look like simple regular expressions, but only literal characters and ranges such as a-z are understood)

• It specifies that each instance of the Nth character in <list1> should be replaced by the Nth character in <list2>

• Example:

tr/abc/cab/

replaces any instance of a by c, any instance of b by a and any instance of c by b

• Example

tr/A-Z/ZYXWVUTSRQPONMLKJIHGFEDCBA/

replaces uppercase letters with the corresponding letters in a reverse-order alphabet (the reversed list has to be spelled out, because a descending range such as Z-A is rejected by Perl)
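
Since tr/// changes the variable it is bound to, it is convenient to wrap it in a subroutine that works on a copy. A minimal sketch, with hypothetical subroutine names plusToSpace and rotateAbc:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of its argument with every +
# turned into a space; the caller's variable is left untouched.
sub plusToSpace
{ my $text = $_[0];
  $text =~ tr/+/ /;
  return $text
}

# Hypothetical helper: applies the example tr/abc/cab/ to a copy.
sub rotateAbc
{ my $text = $_[0];
  $text =~ tr/abc/cab/;
  return $text
}

print plusToSpace("Sean+Croke"), "\n";
print rotateAbc("aabbcc"), "\n";
```

Each character is translated once, in a single pass, so an a that has become c is not translated again into b.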

Page 39: More CGI programming

Back to CGI programming ...

Page 40: More CGI programming

Retrieving encoded SP characters (finally!)

• This is the revised definition of separateAndPrintDataIn

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

print "<LI>$name = $value </LI>";

}

}

Page 41: More CGI programming

Decoding URL-encodings

• The revised definition of separateAndPrintDataIn on the previous slide will handle the + char in a QUERY_STRING like

name=Sean+Croke&company=Black%26Decker

but it will not decode the URL-encoding in %26

• We need to modify the subroutine still further so that, whenever it finds a % followed by two hexadecimal digits, it will replace these three characters by the single character whose URL-encoding this three-character sequence represents

• We need to learn some more Perl

Page 42: More CGI programming

Yet more Perl ...

Page 43: More CGI programming

The s/// operator

• A basic application of the s/// operator is of the form

s/<pattern>/<replacement>/

where <pattern> is a regular expression and <replacement> is treated as if it were a double-quoted string

• which means that <replacement> can contain variables, some of which may be assigned values while <pattern> is matched with the target string

• The operator specifies that the first instance of <pattern> should be replaced by the corresponding interpretation of <replacement>

Page 44: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/ab*c/ac/

this replaces the first substring of the target string that comprises “an a followed by zero or more instances of b followed by a c” with the substring “ac”

• Example application of the above s/// expression:

$myString = "adabbbbcabbcabceee";
print "myString is $myString\n";
$myString =~ s/ab*c/ac/;
print "myString is $myString"

• This produces the following output

myString is adabbbbcabbcabceee

myString is adacabbcabceee

Page 45: More CGI programming

The s/// operator (contd.)

• We have seen that certain characters have a special meaning in regular expressions:

– the example on the last slide used the * character which means “0 or more instances of the preceding character or pattern”

• These are called meta-characters

• Other meta-characters are listed on the next slide

Page 46: More CGI programming

The s/// operator (contd.)

• The meta-characters include:

• the * character, which means “0 or more instances of preceding”

• the + character, which means “1 or more instances of preceding”

• the ? character, which means “0 or 1 instances of preceding”

• the { and } characters, which delimit an expression specifying a range of acceptable occurrences of the preceding character or pattern

• Examples:

{m} means exactly m occurrences of preceding character/pattern

{m,} means at least m occurrences of preceding char/pattern

{m,n} means at least m, but not more than n, occurrences of preceding char/pattern

• Thus,

{0,} is equivalent to *

{1,} is equivalent to +

{0,1} is equivalent to ?
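
These equivalences can be tried out with a small helper. The sketch below assumes a hypothetical subroutine replaceFirst, which interpolates a pattern held in a string into an s/// expression and applies it to a copy of the text.

```perl
use strict;
use warnings;

# Hypothetical helper: applies one substitution to a copy of the text.
# $pattern is interpolated into the regular expression.
sub replaceFirst
{ my ($text, $pattern, $replacement) = @_;
  $text =~ s/$pattern/$replacement/;
  return $text
}

print replaceFirst("adabbbbcabbc", "ab*c",    "X"), "\n";
print replaceFirst("adabbbbcabbc", "ab{0,}c", "X"), "\n";
print replaceFirst("adabbbbcabbc", "ab{2}c",  "X"), "\n";
```

The first two calls give the same result because {0,} and * mean the same thing; the third skips the run of four b characters and replaces the later abbc instead.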

Page 47: More CGI programming

The s/// operator (contd.)

• Further meta-characters are:

• the ^ character, which matches the start of a string

• the $ character, which matches the end of a string

• the . character, which matches anything except a newline character

• the [ and ] characters, which delimit an equivalence class of characters, any of which can match one character in the target string

• the ( and ) characters, which delimit a group of sub-patterns

• the | character, which separates alternative patterns

Page 48: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/^a.*d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by zero or more non-newline characters, and ends with a d

• An example application is on the next slide

Page 49: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabcede";
print "myString1 is $myString1\n";
$myString1 =~ s/^a.*d$/x/;
print "myString1 is $myString1\n";
print "\n";
$myString2 = "adabbbbcabbcabceed";
print "myString2 is $myString2\n";
$myString2 =~ s/^a.*d$/x/;
print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabcede
myString1 is adabbbbcabbcabcede

myString2 is adabbbbcabbcabceed
myString2 is x

Page 50: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/^a.{2,5}d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by between two and five non-newline characters, and ends with a d

• An example application is on the next slide

Page 51: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabced";
print "myString1 is $myString1\n";
$myString1 =~ s/^a.{2,5}d$/x/;
print "myString1 is $myString1\n";
print "\n";
$myString2 = "afghd";
print "myString2 is $myString2\n";
$myString2 =~ s/^a.{2,5}d$/x/;
print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabced
myString1 is adabbbbcabbcabced

myString2 is afghd
myString2 is x

Page 52: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/(abc){2,5}d/x/

this replaces the first sub-string in the target that comprises “between 2 and 5 repeats of the pattern abc, followed by the letter d” with “x”

• An example application is on the next slide

Page 53: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString = "abcdefabcabcabcabcdefgh";
print "myString is $myString\n";
$myString =~ s/(abc){2,5}d/x/;
print "myString is $myString\n"

• This produces the following output

myString is abcdefabcabcabcabcdefgh
myString is abcdefxefgh

Page 54: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/(foo|bar)/x/

this replaces the first sub-string that matches either foo or bar with x

• An example application is on the next slide

Page 55: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "abcfoodefbar";
print "myString1 is $myString1\n";
$myString1 =~ s/(foo|bar)/x/;
print "myString1 is $myString1\n";
print "\n";
$myString2 = "abcbar";
print "myString2 is $myString2\n";
$myString2 =~ s/(foo|bar)/x/;
print "myString2 is $myString2"

• This produces the following output

myString1 is abcfoodefbar
myString1 is abcxdefbar

myString2 is abcbar
myString2 is abcx

Page 56: More CGI programming

The s/// operator (contd.)

• Although some characters have special meanings in regular expressions, we may, sometimes, just want to use them to match themselves in the target string

• We do this by escaping them in the regular expression, by preceding them with a backslash \

• Example s/// expression:

s/^a\^+.*d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by one or more caret characters, followed by zero or more non-newline characters, and ends with a d

• An example application is on the next slide

Page 57: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabced";
print "myString1 is $myString1\n";
$myString1 =~ s/^a\^+.*d$/x/;
print "myString1 is $myString1\n";
print "\n";
$myString2 = "a^^^abbbbcabbcabceed";
print "myString2 is $myString2\n";
$myString2 =~ s/^a\^+.*d$/x/;
print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabced
myString1 is adabbbbcabbcabced

myString2 is a^^^abbbbcabbcabceed
myString2 is x

Page 58: More CGI programming

The s/// operator (contd.)

• As mentioned earlier, the [ and ] characters have a special meaning in regular expressions

– they delimit an equivalence class of characters, any one of which may be used to match one character in the target string

• Example s/// expression:

s/a[KLM]b/x/

replaces the first substring comprising “the letter a followed by one of the three letters KLM, followed by the letter b” with the substring “x”

Page 59: More CGI programming

The s/// operator (contd.)

• The ^ character has a special meaning when used as the first character between [ and ] characters; this meaning is different from its special meaning when used outside the [ and ] characters

– when used as the first character between the [ and ] characters, the ^ character specifies the complement of the equivalence class that would have been specified if it were absent

• Example s/// expression:

s/a[^KLM]b/x/

replaces the first substring comprising “the letter a followed by any single character that is not one of KLM, followed by the letter b” with the substring “x”

Page 60: More CGI programming

The s/// operator (contd.)

• The - character also has a special meaning when used between [ and ] characters:

– it is used to join the start and end of a sequence of characters, any one of which may be used to match one character in the target string

• Example s/// expression:

s/a[0-9]b/x/

replaces the first substring comprising “the letter a followed by one digit, followed by the letter b” with the substring “x”

Page 61: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/%[a-fA-F0-9]/x/

replaces the first substring comprising “an % followed by a hexadecimal digit” with the substring “x”

• An example application is on the next slide

Page 62: More CGI programming

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString = "a%klm%Abbb%Cyyy";
print "myString is $myString\n";
$myString =~ s/%[a-fA-F0-9]/x/;
print "myString is $myString"

• This produces the following output

myString is a%klm%Abbb%Cyyy

myString is a%klmxbbb%Cyyy

Page 63: More CGI programming

The s/// operator (contd.)

• Certain escape sequences also have a special meaning in regular expressions. They define certain commonly used equivalence classes of characters:

\w is equivalent to [a-zA-Z0-9_]

\W is equivalent to [^a-zA-Z0-9_]

\d is equivalent to [0-9]

\D is equivalent to [^0-9]

\s is equivalent to [ \n\t\f\r]

\S is equivalent to [^ \n\t\f\r]

\b denotes a word boundary

\B denotes a non-word boundary

• Note the SP characters in the meaning of \s and \S, that is the white-space equivalence class includes SP

• By the way, \f is formFeed and \r is carriageReturn

Page 64: More CGI programming

The s/// operator (contd.)

• Example s/// expression:

s/%\d\d\d\D/x/

replaces the first substring comprising “an % followed by three decimal digits, followed by a non-digit” with the substring “x”

• Example s/// expression:

s/\s\w\w\s/x/

replaces the first substring comprising “a white-space character, followed by two word characters, followed by another white-space character” with the substring “x”

Page 65: More CGI programming

The s/// operator (contd.)

• The standard quantifiers are all “greedy”

– they match as many occurrences as possible without causing the pattern to fail.

• It is possible to make them “frugal”

– that is, make them match the minimum number of times necessary

• We do this by following the quantifier with a "?"

• *? Match 0 or more times, preferably only 0

• +? Match 1 or more times, preferably only 1 time

• ?? Match 0 or 1 time, preferably only 0

• {n}? Match exactly n times

• {n,}? Match at least n times, preferably only n times

• {n,m}? Match at least n but not more than m times, preferably only n times

Page 66: More CGI programming

• Consider the effect of this quantifier modification below:

$myString1 = "abcabcabcabc";
print "myString1 is $myString1\n";
$myString1 =~ s/(abc){2,5}/x/;
print "myString1 is $myString1\n";
$myString2 = "abcabcabcabc";
print "myString2 is $myString2\n";
$myString2 =~ s/(abc){2,5}?/x/;
print "myString2 is $myString2\n"

• This produces the following output

myString1 is abcabcabcabc
myString1 is x
myString2 is abcabcabcabc
myString2 is xabcabc

Page 67: More CGI programming

• CS4400 got to here on 8 February 2002

Page 68: More CGI programming

The s/// operator (contd.) -- remembering subpattern matches

• When a <pattern> is being matched with a target string, substrings that match sub-patterns can be remembered and re-used later in the same pattern

• Sub-patterns whose matching substrings are to be remembered are enclosed in parentheses

• The sub-patterns are implicitly numbered, starting from 1 and their matching substrings can then be re-used later in the pattern by preceding the appropriate integer with a backslash \

Page 69: More CGI programming

The s/// operator -- remembering subpattern matches (contd.)

• Example s/// expression:

s/%([a-fA-F0-9])\1/x/

replaces the first substring comprising “an % followed by two identical hexadecimal digits” with the substring “x”

• Example application of the above s/// expression:

$myString = "a%klm%Abb%CCbb%DDbbb%Cyyy";
print "myString is $myString\n";
$myString =~ s/%([a-fA-F0-9])\1/x/;
print "myString is $myString"

• This produces the following output

myString is a%klm%Abb%CCbb%DDbbb%Cyyy
myString is a%klm%Abbxbb%DDbbb%Cyyy

Page 70: More CGI programming

• CS 4320 got here on 28 Feb 2003

Page 71: More CGI programming

The s/// operator (contd.) -- using subpattern matches in replacements

• We saw that, within a <pattern>, substrings that matched sub-patterns can be re-used later in the pattern by preceding the appropriate integer with a backslash \

• Within a <replacement>, substrings that matched sub-patterns in the <pattern> can be used by preceding the appropriate integer with a dollar $

Page 72: More CGI programming

Using subpattern matches in replacements (contd.)

• Example:

s/%([a-fA-F0-9])\1/x$1$1/

replaces the first instance of “an % followed by two identical hexadecimal digits” with “x followed by two instances of the hexadecimal digit”

• Example application of the above s/// expression:

$myString = "a%klm%Abb%CCbb%DDbbb%Cyyy";
print "myString is $myString\n";
$myString =~ s/%([a-fA-F0-9])\1/x$1$1/;
print "myString is $myString"

• This produces the following output

myString is a%klm%Abb%CCbb%DDbbb%Cyyy
myString is a%klm%AbbxCCbb%DDbbb%Cyyy
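
Remembered sub-patterns are useful for rearranging text as well as for matching repeats. A minimal sketch, with hypothetical subroutine names: swapEquation uses $1 and $2 in the replacement, and collapseFirstDouble uses \1 in the pattern and $1 in the replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the first name=value into value=name.
sub swapEquation
{ my $text = $_[0];
  $text =~ s/(\w+)=(\w+)/$2=$1/;
  return $text
}

# Hypothetical helper: replaces the first doubled word character
# with a single copy of that character.
sub collapseFirstDouble
{ my $text = $_[0];
  $text =~ s/(\w)\1/$1/;
  return $text
}

print swapEquation("name=Sean"), "\n";
print collapseFirstDouble("bookkeeper"), "\n";
```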

Page 73: More CGI programming

The s/// operator (contd.) -- the g modifier

• Normally, an application of the s/// operator replaces only the first instance of the <pattern> regular expression

• When the g (short for global) modifier is used, all instances are replaced

• Example:

s/%[a-fA-F0-9]/x/

replaces the first substring comprising “an % followed by a hexadecimal digit” with the substring “x”

• Example:

s/%[a-fA-F0-9]/x/g

replaces all substrings comprising “an % followed by a hexadecimal digit” with the substring “x”

Page 74: More CGI programming

The s/// operator (contd.) -- the e modifier

• In a normal application of the s/// operator, the <replacement> is treated as if it were a double-quoted string

• When the e (short for evaluate) modifier is used, the <replacement> is evaluated as if it were standard Perl code

– which means that it can involve subroutine calls using any variables as arguments

• Example:

s/%([a-fA-F0-9])/foo($1)/e

replaces the first substring comprising “an % followed by a hexadecimal digit” with “the result of applying the subroutine foo to a scalar variable containing this hexadecimal digit”
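
To see the e modifier at work with a subroutine that actually exists, here is a minimal sketch. The subroutines doubleIt and doubleNumbersIn are hypothetical; the g modifier is added so that every number in the text is processed.

```perl
use strict;
use warnings;

# Hypothetical subroutine called from inside the replacement.
sub doubleIt
{ return 2 * $_[0] }

# Hypothetical helper: replaces each run of digits with twice its value.
sub doubleNumbersIn
{ my $text = $_[0];
  $text =~ s/(\d+)/doubleIt($1)/eg;   # replacement is evaluated as Perl code
  return $text
}

print doubleNumbersIn("3 apples and 12 pears"), "\n";
```

Without the e modifier the replacement would be the literal text doubleIt(3), since only the variable $1 would be interpolated.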

Page 75: More CGI programming

The s/// operator (contd.) -- another example

• Example:

s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg

This replaces all substrings comprising “an % followed by a pair of hexadecimal

digits”

with

“the result of evaluating the expression

pack("C",hex($1))

where $1 is a scalar variable containing this pair of hexadecimal digits”

Page 76: More CGI programming

Some more Perl subroutines

• hex() takes one string argument, interprets it as a hexadecimal number and returns the corresponding value.

• Example application

$string = "aB";
$number = hex($string);
print $number

• The above program fragment would produce this output

171

Page 77: More CGI programming

Some more Perl subroutines (contd.)

• pack() and unpack()

These subroutines are used to encode and decode data to/from various formats

• pack() takes a list of data values and packs them into a binary structure

• unpack() takes a string representation of a structure and expands it into a list of data values

Page 78: More CGI programming

pack()

• A call to this subroutine has the form

pack( <format>,<list> )

• The subroutine encodes data provided in <list> into the form specified by the characters in <format>

• <format> is a string whose constituent characters specify both the type of data to be packed into the structure and the order in which it is to be packed.

• <format> can contain a wide variety of characters, only one of which we will consider here:

C

which specifies an unsigned char value

Page 79: More CGI programming

pack() (contd.)

• Example application:

$str = pack( "CCCCC",100,101,102,103,104);

print $str

would produce this output

defgh

• Example application:

$str = pack( "CC",hex("64"),hex("65"));

print $str

would produce this output

de

Page 80: More CGI programming

unpack()

• unpack() does the reverse of pack():

• A call to this subroutine is of the form

unpack( <format>, <string-expression> );

• The subroutine unpacks the <string-expression>, which is a representation of some structure, into a list of items

• The form of the unpacking is driven by the characters in <format>

Page 81: More CGI programming

unpack() (contd.)

• Example application:

@list = unpack("CCCCC", "defgh");
foreach my $member (@list)
{ print "$member " }

• This will produce the following output:

100 101 102 103 104
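
Combining hex() with pack() gives exactly the conversion needed for URL decoding, and unpack() reverses it. A minimal sketch, with hypothetical subroutine names charFromHex and codeOfChar:

```perl
use strict;
use warnings;

# Hypothetical helper: turns a pair of hexadecimal digits into
# the single character with that code.
sub charFromHex
{ return pack("C", hex($_[0])) }

# Hypothetical helper: returns the numeric code of the first
# character of its argument.
sub codeOfChar
{ my @codes = unpack("C", $_[0]);
  return $codes[0]
}

print charFromHex("26"), "\n";
print codeOfChar("&"), "\n";
```

Hexadecimal 26 is decimal 38, the code of the ampersand, which is why %26 stands for & in a query string.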

Page 82: More CGI programming

The s/// operator -- another example (contd.)

• Example application:

$myString = "Black%26Decker%3AIreland";
print "myString is $myString\n";
$myString =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
print "myString is $myString"

• This produces the following output

myString is Black%26Decker%3AIreland
myString is Black&Decker:Ireland
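
Putting tr/// and s///eg together gives a complete decoder for the form data discussed earlier. This is only a sketch: urlDecode and decodedEquationsIn are hypothetical names, the data is returned as strings rather than printed as list items, and every field is assumed to have the form name=value.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes one URL-encoded name or value.
sub urlDecode
{ my $text = $_[0];
  $text =~ tr/+/ /;                                            # + back to SP
  $text =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;   # %XX back to one character
  return $text
}

# Hypothetical helper: returns the decoded form data
# as a list of "name = value" strings.
sub decodedEquationsIn
{ my @lines;
  foreach my $equation ( split("&", $_[0]) )
  { my ($name, $value) = split("=", $equation);
    push( @lines, urlDecode($name) . " = " . urlDecode($value) )
  }
  return @lines
}

foreach my $line ( decodedEquationsIn("name=Sean+Croke&company=Black%26Decker") )
{ print "$line\n" }
```

The string is split on & and = before any decoding is done, so a decoded ampersand cannot be mistaken for a field separator, and the + characters are translated before the %XX sequences, so a literal plus sign sent as %2B survives.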

Page 83: More CGI programming

More on s/// expressions

• All the s/// expressions we have written so far have consumed all the characters that matched the <pattern> specified by the regular expression and substituted whatever was specified by the <replacement>

• There was no notion of examining the context surrounding the consumed characters

– any characters that were matched were consumed

• We need some way of matching characters without removing them from the target string

• Perl provides two kinds of meta-expression for doing this: look-ahead checks and look-behind checks

Page 84: More CGI programming

Look-ahead checks

(?=regexp)

This is a non-consuming positive lookahead check

It matches characters in the target string against the pattern specified by the embedded regular expression regexp without consuming them from the target string

• Example

s/\w+(?=\t)/X/g

This replaces words that are followed by tabs with the character X, without removing the tabs from the target string

• Example applications are on the next slide

Page 85: More CGI programming

Look-ahead checks (contd.)

• Program fragment:

$myString1 = "fred\t is a brave\t man";

print "myString1 is $myString1\n";

$myString1 =~ s/\w+\t/X/g;

print "myString1 is $myString1\n";

$myString2 = "fred\t is a brave\t man";

print "myString2 is $myString2\n";

$myString2 =~ s/\w+(?=\t)/X/g;

print "myString2 is $myString2\n"

• Output produced (each tab is shown here as a wide gap):

myString1 is fred     is a brave     man

myString1 is X is a X man

myString2 is fred     is a brave     man

myString2 is X     is a X     man

Page 86: More CGI programming

Look-ahead checks (contd.)

(?!regexp)

This is a non-consuming negative lookahead check

It ensures that characters in the target string do not match the pattern specified by the embedded regular expression regexp

• Example

s/cow(?!boy)/X/g

This replaces all sub-strings “cow” with “X”, provided these sub-strings are not followed by the sub-string “boy”
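The two look-ahead checks can be compared side by side. The following is a small sketch with made-up test strings; only the two substitution patterns come from the slides.

```perl
use strict;
use warnings;

# Positive look-ahead: replace words that are followed by a tab,
# leaving the tab itself in the string.
my $positive = "fred\t is a brave\t man";
$positive =~ s/\w+(?=\t)/X/g;

# Negative look-ahead: replace "cow" unless "boy" follows it.
my $negative = "cowboy cow cowgirl";
$negative =~ s/cow(?!boy)/X/g;

print "$positive\n";    # X, tab, " is a X", tab, " man"
print "$negative\n";    # prints: cowboy X Xgirl
```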

Page 87: More CGI programming

• CS4400 got to here on 12 February 2002

Page 88: More CGI programming

Look-behind checks

(?<=regexp)

This is a non-consuming positive look-behind check

It ensures that preceding characters in the target string match the pattern specified by the embedded regular expression regexp

• Example

s/(?<=cow)boy/girl/g

This replaces all sub-strings “boy” with “girl”, provided these sub-strings are preceded by the sub-string “cow”

Page 89: More CGI programming

Look-behind checks (contd.)

(?<!regexp)

This is a non-consuming negative look-behind check

It ensures that preceding characters in the target string do not match the pattern specified by the embedded regular expression regexp

• Example

s/(?<!cow)boy/girl/g

This replaces all sub-strings “boy” with “girl”, provided these sub-strings are not preceded by the sub-string “cow”
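The positive and negative look-behind checks can be tried on the same input. This is a minimal sketch; the test string and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $original = "cowboy and schoolboy";

# Positive look-behind: replace "boy" only where "cow" comes before it.
my $afterCow = $original;
$afterCow =~ s/(?<=cow)boy/girl/g;

# Negative look-behind: replace "boy" only where "cow" does not come before it.
my $notAfterCow = $original;
$notAfterCow =~ s/(?<!cow)boy/girl/g;

print "$afterCow\n";       # prints: cowgirl and schoolboy
print "$notAfterCow\n";    # prints: cowboy and schoolgirl
```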

Page 90: More CGI programming

Back to CGI programming ...

Page 91: More CGI programming

(At last!) back to decoding URL-encodings

This is the revised definition of separateAndPrintDataIn

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

print "<LI>$name = $value </LI>";

}

}
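When experimenting with the decoding steps, it can be convenient to have a variant which returns the decoded pairs instead of printing them. The following is only a sketch: the name separateDataIn and the field name company are assumptions made for this example, and the handling of a missing value is an addition which is not in the slide version.

```perl
use strict;
use warnings;

# Hypothetical variant of separateAndPrintDataIn: it returns a list of
# "name = value" strings instead of printing them.
sub separateDataIn
{ my @results;
  foreach my $equation ( split("&", $_[0]) )
  { next if $equation eq "";                  # skip empty pieces such as "&&"
    my ($name, $value) = split("=", $equation, 2);
    $value = "" unless defined($value);       # a field may arrive with no value
    $value =~ tr/+/ /;                        # + stands for a space
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    push(@results, "$name = $value");
  }
  return @results;
}

foreach my $result ( separateDataIn("name=Sean+Croke&company=Black%26Decker") )
{ print "<LI>$result</LI>\n" }
```

The third argument to split() limits the result to two pieces, so a value which itself contains an = character is not cut short.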

Page 92: More CGI programming

• The following material on Perl will not be subject to examination

Page 93: More CGI programming

Watching out for hackers

• The revised definition of separateAndPrintDataIn on the previous slide will handle the + char and the %26 URL-encoding in a QUERY_STRING like

name=Sean+Croke&Black%26Decker

but it could expose our server to hackers

• One trick of hackers is to send Server Side Include commands in form data

• We need to modify the subroutine still further so that, whenever it finds anything which looks even remotely like an SSI in form data, it eliminates the offending piece of data

• We need to learn some more Perl

Page 94: More CGI programming

Yet more Perl ...

Page 95: More CGI programming

The m// operator

• A basic application of the m// operator is of the form

m/<pattern>/

where <pattern> is a regular expression

• This expression checks whether any instance of <pattern> can be found in the target expression

• The m// operator is frequently used in conditional expressions, as part of if and while statements

Page 96: More CGI programming

The m// operator (contd.)

• Generic application

$targetString =~ m/<pattern>/

evaluates to true if $targetString contains at least one instance of <pattern>

• Generic application

$targetString !~ m/<pattern>/

evaluates to true if $targetString contains no instance of <pattern>

• Specific application

$targetString =~ m/<!--(.|\n)*-->/

evaluates to true if $targetString contains at least one instance of a sub-string which looks like a HTML comment (and, therefore, might contain a Server Side Include)

Page 97: More CGI programming

The m// operator (contd.)

• The m// operator can be used in the condition part of an if statement

• Example

if ( $value =~ m/<!--(.|\n)*-->/ )

{ $value =~ s/<!--(.|\n)*-->//g };

removes from $value all sub-strings which look like HTML comments (and which, therefore, might contain Server Side Includes)
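One point worth noting about the pattern above is that * is greedy: if a value contains two comments, everything from the first <!-- to the last --> is removed, including any ordinary text between them. The following sketch shows a non-greedy variant. It is an alternative offered for comparison, not the version used in the lecture programs, and the name removeHtmlComments is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove each HTML-style comment separately.
sub removeHtmlComments
{ my $value = $_[0];
  # .*? is the non-greedy form of .* and the s modifier lets . match a
  # newline, so each comment is matched on its own.
  $value =~ s/<!--.*?-->//gs;
  return $value;
}

print removeHtmlComments("one<!-- a -->two<!-- b -->three"), "\n";
# prints: onetwothree
```

With the greedy pattern the same input would be reduced to onethree, because the text between the two comments would also be removed.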

Page 98: More CGI programming

Back to CGI programming ...

Page 99: More CGI programming

Watching out for hackers (contd.)

This is the final definition of separateAndPrintDataIn()

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

if ( $value =~ m/<!--(.|\n)*-->/ )

{print "<EM>SSI removed from following:</EM> ";

$value =~ s/<!--(.|\n)*-->//g };

print "<LI>$name = $value </LI>";

}

}

Page 100: More CGI programming

Improved program reporting GET method data

• Using this new definition of separateAndPrintDataIn() we have an improved version of the CGI program which

– is called by a HTML FORM

– and sends back to the browser a HTML page which lists the data it received from the form

Page 101: More CGI programming

Improved GET data program (part 1)

#!/usr/local/bin/perl

print <<EOF;

Content-type: text/html

<HTML>

<HEAD>

<TITLE> Form Data reporting program </TITLE>

</HEAD>

<BODY>

<H1> Form Data </H1>

<UL>

EOF

printFormData();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

Page 102: More CGI programming

Improved GET data program (part 2)

sub printFormData

{ print "<P>Your form sent these data:</P>\n";

my $queryString = $ENV{'QUERY_STRING'};

separateAndPrintDataIn($queryString)

}

Page 103: More CGI programming

Improved GET data program (part 3)

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

if ( $value =~ m/<!--(.|\n)*-->/ )

{print "<EM>SSI removed from following:</EM> ";

$value =~ s/<!--(.|\n)*-->//g };

print "<LI>$name = $value </LI>";

}

}

Page 104: More CGI programming

Needs further refinement

• This program needs further refinement.

• It will not properly handle forms in which there are fields which allow multiple selections

• A better version of this program is available on my ftp site

• The program is

lectureExample2.cgi
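A field which allows multiple selections sends the same name several times in the query string, for example fruit=apple&fruit=pear. One possible way of handling this is to collect all the values for each name in a hash. The sketch below is not lectureExample2.cgi, whose contents are not shown here; the name groupValuesByName and the sample field names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash which maps each field name to a
# list of all the values that were sent for it.
sub groupValuesByName
{ my %valuesFor;
  foreach my $equation ( split("&", $_[0]) )
  { next if $equation eq "";
    my ($name, $value) = split("=", $equation, 2);
    $value = "" unless defined($value);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    push( @{ $valuesFor{$name} }, $value );   # add to the list for this name
  }
  return %valuesFor;
}

my %data = groupValuesByName("fruit=apple&fruit=pear&name=Sean+Croke");
foreach my $name ( sort( keys(%data) ) )
{ print "<LI>$name = ", join(", ", @{ $data{$name} }), "</LI>\n" }
# prints: <LI>fruit = apple, pear</LI>
#         <LI>name = Sean Croke</LI>
```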

Page 105: More CGI programming

File Processing in Perl

Page 106: More CGI programming

File Processing

• File handling in Perl is based on the notion of a file handle, a token which is associated with a disk file or an input/output device

• Before a file is used, a handle must be created for it;

– subsequently, all operations on the file refer to the handle, not the file name

• However, Perl provides three pre-defined file handles

– STDIN, which is the handle for standard input;

– STDOUT, the handle for standard output;

– STDERR, the handle for the output channel where error messages should be sent

Page 107: More CGI programming

File Processing (contd.)

• In normal execution mode, the pre-defined file handles have the following associations:

– STDIN is attached to the keyboard;

– STDOUT is attached to the console;

– STDERR is attached to the console

• In CGI mode, however,

– STDIN receives data from the HTTP server demon;

– STDOUT sends data to the HTTP server demon, for onward transmission to the client;

– STDERR sends data to a destination chosen by the HTTP server demon, typically the server's error log

Page 108: More CGI programming

File Processing (contd.)

• We have already been using STDOUT implicitly

– the print() subroutine, by default, sends its output there

• The statement

print( "Hello world")

is implicitly the same as

print(STDOUT "Hello world")

• We can re-direct the output of the print() subroutine to any file handle

• If, for example, we have already defined myHandle as a file handle, we could direct output there, as follows:

print(myHandle "Hello world")

Page 109: More CGI programming

File Processing (contd.)

• We define a file handle when we open a file and, at the same time, associate the file handle with the file

• The open() subroutine is used to open a file

• Its syntax is as follows:

open( <handle>, <access-and-name> )

where

<handle> is the token we wish to use as the handle for the file

and

<access-and-name> is a string which specifies the operating system’s name for the file and, also, the type of access we want to the file

• read-only, (the default form of access)

• write-only or

• append-only

Page 110: More CGI programming

File Processing (contd.)

• Example usage:

open(Customers, "customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; because we have said nothing about the form of access, it is read-only by default

• Example usage:

open(Customers, "<customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of < explicitly states that we want read-only access

Page 111: More CGI programming

File Processing (contd.)

• Example usage:

open(Customers,">customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of > explicitly states that we want write-only access (any existing content of the file is discarded)

• Example usage:

open(Customers,">>customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of >> explicitly states that we want append-only access

Page 112: More CGI programming

File Processing (contd.)

• When a program is finished reading from or writing to the device associated with a file handle, the channel to the device should be closed

• This is done by using the close() subroutine; this takes only one argument, a file handle

• Example usage:

close( Customers )

Page 113: More CGI programming

Example Program

• Consider this program fragment:

open(handle1, ">output.txt");

print(handle1 "Hello, world!\n");

print(handle1 "How are you?");

close(handle1)

• It places the following content in file output.txt:

Hello, world!

How are you?

Page 114: More CGI programming

Reading from a file

• We already know how to write to a file

– we use the print() subroutine, quoting the file handle

• To read from a file,

– we apply the <> input operator to the file handle

• Example usage:

$line = <myHandle99>

This reads the next available line from the file which is associated with the handle myHandle99 and copies it into the scalar variable $line

• The input operator returns the special value undef at the end of a file

Page 115: More CGI programming

Example Program

• Consider this program fragment:

open(myHandle,"<output.txt");

$line1 = <myHandle>;

$line2 = <myHandle>;

close(myHandle);

print $line1;

print $line2

• It assumes that the file contains at least two lines

• If these are

Hello, world!

How are you?

the program fragment prints the following output

Hello, world!

How are you?

Page 116: More CGI programming

Reading from a file of unknown length

• If we do not know how many lines are in a file, we should use the while construct and a boolean subroutine called defined which checks whether its single argument has the special value undef

• Consider this program fragment:

open(myHandle,"<output.txt");

$line = <myHandle>;

while ( defined($line) )

{ print $line;

$line = <myHandle> };

close(myHandle)

• This program fragment makes no assumptions about how many lines are in the file being read
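The read loop above can be tried without an existing output.txt by writing a scratch file first. The sketch below makes two departures from the slides, both of which are assumptions of this example: it uses the File::Temp module to create the scratch file, and it uses the three-argument form of open() with handles held in scalar variables rather than bare handle names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so that the sketch does not depend on output.txt.
my ($tempHandle, $fileName) = tempfile();
print $tempHandle "Hello, world!\n";
print $tempHandle "How are you?\n";
close($tempHandle);

# Read the file back one line at a time until <> returns undef.
my @lines;
open(my $input, "<", $fileName) or die "Cannot open $fileName: $!";
my $line = <$input>;
while ( defined($line) )
{ push(@lines, $line);
  $line = <$input> };
close($input);
unlink($fileName);          # remove the scratch file

print @lines;
```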

Page 117: More CGI programming

• CS4400 got to here at 13:00 on 15 February 2002

Page 118: More CGI programming

Reading a datum of known length

• If we know exactly the length of the piece of data we wish to input, we can use the read() subroutine

• Syntax:

read( <HANDLE>,<SCALAR>,<LENGTH> )

• This subroutine attempts to read <LENGTH> bytes of data into variable <SCALAR> from the device attached to <HANDLE>.

• If <LENGTH> bytes are not actually available, <SCALAR> will be assigned the bytes actually read

Page 119: More CGI programming

Reading a datum of known length (contd.)

• Consider this program fragment:

open(myHandle,"<output.txt");

read(myHandle,$line,8);

print $line;

close(myHandle)

• If the file actually contains

Hello, world!

How are you?

the program fragment prints the following eight characters to STDOUT

Hello, w
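The read() subroutine also returns the number of bytes it actually read, which is useful when fewer bytes are available than were requested. The following sketch illustrates this with a scratch file; the use of File::Temp and of handles held in scalar variables are assumptions of this example and do not appear in the slides.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file holding the two lines used in the example.
my ($tempHandle, $fileName) = tempfile();
print $tempHandle "Hello, world!\nHow are you?\n";
close($tempHandle);

open(my $input, "<", $fileName) or die "Cannot open $fileName: $!";
my ($chunk, $rest);
my $firstCount  = read($input, $chunk, 8);      # ask for exactly 8 bytes
my $secondCount = read($input, $rest, 1000);    # ask for more than remain
close($input);
unlink($fileName);

print "$chunk\n";    # prints: Hello, w
print "first read returned $firstCount, second returned $secondCount\n";
```

The second call asks for 1000 bytes but the file has fewer left, so $rest receives only what was available and the return value reports how many bytes that was.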

Page 120: More CGI programming

Checking that a file is available

• Consider this program fragment:

open(myHandle,"<output.txt");

$line = <myHandle>;

if ( defined($line) )

{ ... }

else { ... }

• Although it checks whether the file contains any data, it assumes that the file can be opened for reading

• This is not a safe assumption in all circumstances

• A user on the web will not be happy if a CGI resource that he has requested crashes in mid-execution

• We should always check whether a file can be opened and, if not, send a useful piece of HTML to the user’s browser

Page 121: More CGI programming

Checking that a file is available (contd.)

• Consider this program fragment:

if ( not (open(myHandle,"<output.txt") ) )

{ print "<P>File unavailable</P>" }

else { $line = <myHandle>;

...

...

}

• If the file cannot be opened, it sends a warning to the user’s browser; otherwise, it proceeds to process the file

Page 122: More CGI programming

File Locking in Perl

• A CGI program should lock a data file while it is using it

• Why?

– Several copies of the program may be running at the same time

• this will happen, for example, if different users simultaneously send HTTP requests for the same CGI program resource

– Different CGI programs may require access to the same data file at the same time

• for example, one program may wish to read from the file while another may wish to add data to the file

• Perl provides an flock operator which we use

– to lock a file immediately before we use it;

– to unlock the file immediately after we have finished using it

Page 123: More CGI programming

File Locking in Perl (contd.)

• Syntax of usage:

flock(<file-handle>,<lock-option>)

• The lock-options include:

• 1 which requests a shared lock, which is usually adequate for reading from a file

• 2 which requests an exclusive lock, which is required when writing to a file

• 8 which releases a previously requested lock

Page 124: More CGI programming

File Locking in Perl (contd.)

• Requesting and receiving a shared lock means

– we are happy to allow other programs to read from the file at the same time as our program is doing so;

– but a program which wants to write to the file will be delayed until we release the lock, since a write program should request an exclusive lock, something it cannot receive if our shared lock request has been granted

• Requesting and receiving an exclusive lock means no other program can read from, or write to, the file until we release the lock

Page 125: More CGI programming

File Locking in Perl (contd.)

• Example fragment of file-reading program:

if ( not (open(myHandle,"<output.txt") ) )

{ print "<P>File unavailable</P>" }

else { flock(myHandle,1);

$line = <myHandle>;

while ( defined($line) )

{ ...

$line = <myHandle>

};

flock(myHandle,8)

}

Page 126: More CGI programming

File Locking in Perl (contd.)

• Example fragment of file-writing program:

if ( not (open(myHandle,">output.txt") ) )

{ print "<P>File unavailable</P>" }

else { flock(myHandle,2);

... Write stuff to the file ...

flock(myHandle,8)

}
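Instead of the numbers 1, 2 and 8, a program can use the symbolic names LOCK_SH, LOCK_EX and LOCK_UN, which are exported by the standard Fcntl module. The sketch below appends lines to a scratch file under an exclusive lock and reads them back under a shared lock. The subroutine name appendLineTo, the scratch file and the sample text are assumptions of this example. The file is opened with >> because opening with > would empty the file before the lock had been obtained.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # exports LOCK_SH, LOCK_EX and LOCK_UN
use File::Temp qw(tempfile);

# Hypothetical helper: append one line to a file under an exclusive lock.
# Returns 1 on success and 0 if the file cannot be opened or locked.
sub appendLineTo
{ my ($fileName, $text) = @_;
  open(my $handle, ">>", $fileName) or return 0;
  flock($handle, LOCK_EX)           or return 0;   # exclusive lock for writing
  print $handle "$text\n";
  close($handle);                # closing the handle also releases the lock
  return 1;
}

my ($tempHandle, $fileName) = tempfile();
close($tempHandle);
appendLineTo($fileName, "first order");
appendLineTo($fileName, "second order");

open(my $input, "<", $fileName) or die "Cannot open $fileName: $!";
flock($input, LOCK_SH);          # shared lock for reading
my @lines = <$input>;            # in list context, <> returns all remaining lines
flock($input, LOCK_UN);          # release the lock
close($input);
unlink($fileName);

print @lines;
```

Note that flock() gives advisory locks: they only coordinate programs which also call flock() on the same file.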

Page 127: More CGI programming

File Access Permissions for CGI programs

• Remember that, in a multi-user operating system, different users have differing permissions to access a data file

– some users may be able to write to the file

– other users may be able only to read from the file

– other users may not be permitted to access the file in any way

• In Unix, for example, these permissions appear in the output provided by the ls command

• Example ls output

-rw-r--r-- 1 fred admin 1234 Feb 19 08:24 customers.txt
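The nine permission characters shown by ls can also be obtained from inside a Perl program by using the built-in stat() subroutine, whose third result holds the permission bits. The following is a rough sketch for Unix systems; the subroutine name permissionStringFor and the scratch file are assumptions of this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a string such as "rw-r--r--" for a file,
# or undef if the file cannot be examined.
sub permissionStringFor
{ my @info = stat($_[0]);
  return undef unless @info;
  my $mode = $info[2] & 0777;            # keep only the nine permission bits
  my $result = "";
  foreach my $shift (6, 3, 0)            # owner, then group, then others
  { my $bits = ($mode >> $shift) & 7;
    $result .= ($bits & 4) ? "r" : "-";
    $result .= ($bits & 2) ? "w" : "-";
    $result .= ($bits & 1) ? "x" : "-";
  }
  return $result;
}

my ($tempHandle, $fileName) = tempfile();
close($tempHandle);
chmod(0644, $fileName);                  # owner may write; everyone may read
my $permissions = permissionStringFor($fileName);
unlink($fileName);

print "$permissions\n";                  # on a Unix system: rw-r--r--
```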

Page 128: More CGI programming

File Access Permissions for CGI programs (contd.)

• Programs typically inherit the file access permissions of the users who execute the programs

• Thus, only a program executed by the user “fred” could write to the following data file:

-rw-r--r-- 1 fred admin 1234 Feb 19 08:24 customers.txt

• What permissions do CGI programs have to access data files?

• A CGI program is executed by the HTTP server demon

• In Unix systems, the HTTP server demon is treated as if it were an ordinary system user, typically with a username like “nobody” or “httpd”

• Therefore, CGI programs on Unix systems usually have whatever access permissions are possessed by the HTTP server demon

Page 129: More CGI programming

File Access Permissions for CGI programs (contd.)

• Therefore, CGI programs on Unix systems usually have whatever access permissions are possessed by the HTTP server demon

• This is a security hole:

– any user of a multi-user Unix system could write a CGI program which could modify the contents of a data file owned by another user, provided the HTTP server demon has write access to that file

• There is a way around this problem:

– it involves a notion called setuid

– a treatment of this is beyond the scope of this lecture

– just remember, when you are doing CGI programming in the real world, that you need to consider file access permissions

Page 130: More CGI programming

Back to CGI form handling

Page 131: More CGI programming

Back to FORM data handling

• Now that we know how to read data from files, including STDIN, we can write a CGI program which reads data from a FORM that uses the POST request method

• In fact, we can easily write a CGI program that can accept data sent by either the GET or POST method

Page 132: More CGI programming

General Form Data reporting program (part 1)

#!/usr/local/bin/perl

print <<EOF;

Content-type: text/html

<HTML>

<HEAD>

<TITLE> Form Data reporting program </TITLE>

</HEAD>

<BODY>

<H1> Form Data </H1>

<UL>

EOF

printFormData();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

Page 133: More CGI programming

General Form Data reporting program (part 2)

sub printFormData

{ my ($requestMethod, $buffer);

$requestMethod = $ENV{'REQUEST_METHOD'};

print "<P> Your form used $requestMethod and ";

print "it sent the following data: </P>\n";

if ($requestMethod eq 'POST')

{read(STDIN,$buffer,$ENV{'CONTENT_LENGTH'})}

else

{$buffer = $ENV{'QUERY_STRING'}};

separateAndPrintDataIn($buffer)

}

Page 134: More CGI programming

General Form Data reporting program (part 3)

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

if ( $value =~ m/<!--(.|\n)*-->/ )

{print "<EM>SSI removed from following:</EM> ";

$value =~ s/<!--(.|\n)*-->//g };

print "<LI>$name = $value </LI>";

}

}
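The choice between the POST and GET branches can be tested without a web server if the environment and the input handle are passed to a subroutine as arguments. The sketch below simulates a POST request by opening a handle on a string held in memory. The name rawFormData and the simulated environment are assumptions of this example; in a real CGI program the call would be rawFormData(\%ENV, \*STDIN).

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw form data for either request method.
# $env is a reference to a hash of environment variables and
# $input is the handle from which POST data should be read.
sub rawFormData
{ my ($env, $input) = @_;
  my $buffer = "";
  if ( $env->{'REQUEST_METHOD'} eq 'POST' )
  { read($input, $buffer, $env->{'CONTENT_LENGTH'}) }
  else
  { $buffer = $env->{'QUERY_STRING'} }
  return $buffer;
}

# Simulate a POST request whose body is held in a string.
my $body = "name=Sean+Croke";
open(my $fakeStdin, "<", \$body) or die "Cannot open in-memory handle: $!";
my %fakeEnv = ( 'REQUEST_METHOD' => 'POST',
                'CONTENT_LENGTH' => length($body) );
my $received = rawFormData(\%fakeEnv, $fakeStdin);
close($fakeStdin);

print "$received\n";    # prints: name=Sean+Croke
```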

Page 135: More CGI programming

Note

• This program does not handle forms which include fields that allow multiple selections

• You will have to modify this program using the techniques used in lectureExample2.cgi (available at my ftp site, as said earlier) in order to make it handle multiple selections

Page 136: More CGI programming

Some more example programs

Page 137: More CGI programming

Printing files referenced in a GET request

• Remember that the following is a well-formed HTTP request:

GET /cs4400/jabowen/cgi-bin/print.cgi/extra/path/info.txt HTTP/1.1

Host: student.cs.ucc.ie

The server recognizes that the application program is print.cgi so it passes the string /extra/path/info.txt in the environment variable PATH_INFO

PATH_INFO /extra/path/info.txt

Page 138: More CGI programming

Printing files referenced in a GET request (contd.)

• Remember that this information is also passed in

PATH_TRANSLATED

• For example, if

DOCUMENT_ROOT /usr/local/www/docs

and

PATH_INFO /extra/path/info.txt

then

PATH_TRANSLATED /usr/local/www/docs/extra/path/info.txt

• We will now write a CGI program which prints a file whose path and name are passed in PATH_TRANSLATED

Page 139: More CGI programming

A CGI script that displays text files

#!/usr/local/bin/perl

print "Content-Type: text/plain \n\n";

$fileName = $ENV{'PATH_TRANSLATED'};

if ( open(FILE,"<$fileName") )

{ $line = <FILE>;

while ( defined($line) )

{ print $line;

$line=<FILE>

}; close(FILE)

}

else { print "File cannot be opened\n"}

Page 140: More CGI programming

A CGI script that displays text files (contd.)

• The script on the last slide is insecure because,

– although we may think the client is restricted to files in the DOCUMENT_ROOT hierarchy,

– the user sending the request could use “..” to go up the directory structure

• For example, he could send this request

GET /cs4400/jabowen/cgi-bin/print.cgi/../../passwords.txt HTTP/1.1

Host: student.cs.ucc.ie

resulting in PATH_TRANSLATED having this value:

PATH_TRANSLATED /usr/local/www/docs/../../passwords.txt

which is equivalent to:

PATH_TRANSLATED /usr/local/passwords.txt

Page 141: More CGI programming

A CGI script that displays only text files located in the document root hierarchy

#!/usr/local/bin/perl

print "Content-Type: text/plain \n\n";

$fileName = $ENV{'PATH_TRANSLATED'};

if ($fileName =~ m/\.\./)

{ print "Bad chars in file name.\n" }

else { if ( open(FILE,"<$fileName") )

{ $line = <FILE>;

while ( defined($line) )

{ print $line;

$line = <FILE>

}; close(FILE)

}

else {print "File cannot be opened\n"} }
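The check m/\.\./ rejects any file name which contains two consecutive dots, including a harmless one such as notes..txt. A slightly more selective check looks at each component of the path and also insists that the path starts with the document root. The sketch below is one possible approach and not a complete defence (for example, it does not consider symbolic links); the name isInsideDocumentRoot is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $path lies below $root and contains
# no ".." component; return 0 otherwise.
sub isInsideDocumentRoot
{ my ($path, $root) = @_;
  # Reject any path with a ".." component, e.g. /docs/../secret.txt
  foreach my $component ( split("/", $path) )
  { return 0 if $component eq ".." }
  # Require the path to start with the document root followed by a slash.
  return ( index($path, "$root/") == 0 ) ? 1 : 0;
}

my $root = "/usr/local/www/docs";
print isInsideDocumentRoot("$root/extra/path/info.txt", $root), "\n";   # prints: 1
print isInsideDocumentRoot("$root/../../passwords.txt", $root), "\n";   # prints: 0
```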

Page 142: More CGI programming

Modular programming in Perl

Packages

Page 143: More CGI programming

Modular Programming in Perl

• One advantage of programming in Perl is that we can easily access and use code written by others

• This code is available on the web in the form of modules

• One source of these modules is the Comprehensive Perl Archive Network (CPAN)

• This can be found at www.cpan.org but it is also mirrored at more than a hundred sites around the world

Page 144: More CGI programming

Using Modules in Perl

• Before a module can be used, it must be installed

– either in the directory where the program using the module resides

– or in a special system directory

• The installation of new modules in the special system directory is easy but is beyond the scope of this course

• Most installations of Perl already include a large number of modules in the special system directory and it is unlikely that you will need to install anything there for a long time

• If you wish to install your own module in your own directory, this is as simple as placing the text file containing the module in the directory

Page 145: More CGI programming

Using Modules in Perl (contd.)

• By convention, the name of a Perl module starts with an upper-case letter and is stored in a file which has the same name as the name of the module, but which has a

.pm

suffix instead of the usual .cgi or .pl suffix

• Thus, a module called MyOwnUtilities is stored in a file called MyOwnUtilities.pm

Page 146: More CGI programming

Using Modules in Perl (contd.)

• To use a module, we must specify it in a use statement near the top of our program

• There are several forms of use statement

• We will just use the simplest form, which has the following format:

use <module-name>;

• Thus, if we wish to use subroutine(s) implemented in MyOwnUtilities we must use the statement

use MyOwnUtilities;

Page 147: More CGI programming

Using Modules in Perl (contd.)

• If we wish to use a subroutine implemented by a module, we should preface the name of the desired subroutine, whenever we invoke it in our program, with

<module-name>::

• Thus, if we wish to use a subroutine called myAverageOf() which is implemented in MyOwnUtilities we must invoke it as is done in the following statement:

$average = MyOwnUtilities::myAverageOf($n1,$n2);

Page 148: More CGI programming

Using Modules in Perl (contd.)

• Example Program:

#!/usr/local/bin/perl

use MyOwnUtilities;

my ($n1, $n2) = (12, 24);

my $average = MyOwnUtilities::myAverageOf($n1,$n2);

print "The average of $n1 and $n2 is $average"

• Output Produced by Example program:

The average of 12 and 24 is 18

Page 149: More CGI programming

Writing Modules in Perl

• We will not consider all possible details of writing modules in Perl

• However, let us consider the source code of module MyOwnUtilities

• Source code:

package MyOwnUtilities;

sub myAverageOf

{ return ( $_[0] + $_[1] )/2

}

1;

Page 150: More CGI programming

Writing Modules in Perl (contd.)

• Structure of a module:

package <module-name> ;

<definitions-of-resources-implemented-by-the-module>

1;

• The first statement in a module file consists of the keyword package followed by the name of the module

• The last statement of a module must evaluate to a value which Perl regards as true so, by convention, most programmers use the statement 1;

• In between these statements, we place the definitions of the resources implemented by the module
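For quick experiments, a package can also be declared in the same file as the program which uses it, so that no separate .pm file or use statement is needed. The sketch below does this with the MyOwnUtilities example; placing the package in the same file is an assumption made here for convenience and is not the arrangement described on the slides.

```perl
use strict;
use warnings;

# In a separate module file, these lines would be stored in
# MyOwnUtilities.pm and followed by the closing statement 1;
package MyOwnUtilities;

sub myAverageOf
{ return ( $_[0] + $_[1] ) / 2 }

# Everything after this line belongs to the main program again.
package main;

my ($n1, $n2) = (12, 24);
my $average = MyOwnUtilities::myAverageOf($n1, $n2);
print "The average of $n1 and $n2 is $average\n";
# prints: The average of 12 and 24 is 18
```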

Page 151: More CGI programming

Object-oriented programming in Perl

Page 152: More CGI programming

• Perl is a very large language

• As well as the functional/procedural style of programming that we have seen so far, it also supports object-oriented programming

• We will not, in this course, have time to go into the details of OOP in Perl

• In fact, I mention it only because many of the packages which make Perl a powerful web programming tool are written in an OOP style

Page 153: More CGI programming

• One hint that you have found an OO script is the appearance of the arrow operator, ->, which is usually used to access a method or attribute of an object;

– if you see it, you can be fairly certain you have an OO script

• The web contains many powerful OO-based Perl libraries, modules and packages which provide utilities for useful tasks, including the easy construction of CGI programs

• Indeed, there exists a module called CGI.pm which provides a host of features that make it easy to write CGI programs

– if you ever have to write serious CGI programs in your future jobs, then you should use CGI.pm

– indeed, the only reason I have shown you how to use basic Perl for CGI programming is so that you will understand what is done by CGI.pm resources

Page 154: More CGI programming

• CPAN contains a huge number of modules apart from CGI.pm

• Among the most important are:

LWP::UserAgent which provides facilities that can be used to write special-purpose web clients;

XML::Parser which provides facilities that can be used to parse XML documents;

XML::DOM which provides facilities that can be used to manipulate XML document object models

• We may refer to some of the XML modules later in this course when we discuss XML

Page 155: More CGI programming

Some last thoughts on Perl ...

... for now, at least ..

Page 156: More CGI programming

Warning: variant syntax

• I have mixed feelings about Perl

• It is quite a useful language

• However,

– there are too many pieces of syntactic sugar

– these make the language bigger, without adding any functionality

Page 157: More CGI programming

Warning: variant syntax (contd.)

• For example, consider the until construct

• Example usage

$x=0;

until ( $x > 10 )

{ print "$x\n";

$x = $x+1 }

• But this is equivalent to the following!!!!!

$x=0;

while ( $x <= 10 )

{ print "$x\n";

$x = $x+1 }

• A whole new construct has been added for nothing. It simply means more to remember, more to forget!

Page 158: More CGI programming

Warning: variant syntax (contd.)

• Another example

open(FILE, "<customers.txt");

while ( <FILE> )

{ print $_ };

close(FILE)

• This is short-hand for the following

open(FILE, "<customers.txt");

$line = <FILE>;

while ( defined($line) )

{ print $line;

$line = <FILE> };

close(FILE)

• Yes, it’s shorter, but it’s another piece of syntax which adds no functionality to the language.

Page 159: More CGI programming

Warning: variant syntax (contd.)

• There are many sites on the web which offer repositories of CGI scripts written in Perl

• Some of these scripts are useful; some are not

• Some of these scripts are badly written; some are not

• You will find many variants of the core Perl syntax floating around these script repositories

• Try not to get what has been called “cancer of the semi-colon” from all this unnecessary syntactic sugar

• You may often find scripts which use syntactic variants that I have not covered in these lectures

• If you do and you want to use them, search the on-line Perl documentation (see reference later) until you get an adequate explanation

Page 160: More CGI programming

More documentation on Perl

• O’Reilly & Associates, Inc. sponsor the following web-site, which contains a lot of information about Perl

http://www.perl.com/pub

• Included in this documentation is the following page which lists the pre-defined subroutines and gives brief explanations of them

http://www.perl.com/pub/doc/manual/html/pod/perlfunc/

• For your convenience, a list of subroutines, adapted from the above site, is provided on the next few slides

Page 161: More CGI programming

Glossary of pre-defined Perl subroutines

• abs

– absolute value subroutine

• accept

– accept an incoming socket connect

• alarm

– schedule a SIGALRM

• atan2

– arctangent of Y/X

• bind

– binds an address to a socket

• binmode

– prepare binary files on old systems

• bless

– create an object

Page 162: More CGI programming

• caller

– get context of the current subroutine call

• chdir

– change your current working directory

• chmod

– changes the permissions on a list of files

• chomp

– remove a trailing record separator from a string

• chop

– remove the last character from a string

• chown

– change the ownership on a list of files

• chr

– get character this number represents

Page 163: More CGI programming

• chroot

– make directory new root for path lookups

• close

– close file (or pipe or socket) handle

• closedir

– close directory handle

• connect

– connect to a remote socket

• continue

– optional trailing block in a while or foreach

• cos

– cosine function

• crypt

– one-way passwd-style encryption

Page 164: More CGI programming

• dbmclose

– breaks binding on a tied dbm file

• dbmopen

– create binding on a tied dbm file

• defined

– test whether a value, variable, or subroutine is defined

• delete

– deletes a value from a hash

• die

– raise an exception or bail out

• do

– turn a BLOCK into a TERM

• dump

– create an immediate core dump

Page 165: More CGI programming

• each

– retrieve the next key/value pair from a hash

• endgrent

– be done using group file

• endhostent

– be done using hosts file

• endnetent

– be done using networks file

• endprotoent

– be done using protocols file

• endpwent

– be done using passwd file

• endservent

– be done using services file

Page 166: More CGI programming

• eof

– test a filehandle for its end

• eval

– catch exceptions or compile code

• exec

– abandon this program to run another

• exists

– test whether a hash key is present

• exit

– terminate this program

• exp

– raise e to a power

• fcntl

– file control system call

Page 167: More CGI programming

• fileno

– return file descriptor from filehandle

• flock

– lock an entire file with an advisory lock

• fork

– create a new process just like this one

• format

– declare a picture format for use by the write() subroutine

• formline

– internal subroutine used for formats

• getc

– get the next character from the filehandle

• getgrent

– get next group record

Page 168: More CGI programming

• getgrgid

– get group record given group ID

• getgrnam

– get group record given group name

• gethostbyaddr

– get host record given its address

• gethostbyname

– get host record given name

• gethostent

– get next hosts record

• getlogin

– return who logged in at this tty

• getnetbyaddr

– get network record given its address

Page 169: More CGI programming

• getnetbyname

– get networks record given name

• getnetent

– get next networks record

• getpeername

– find the other end of a socket connection

• getpgrp

– get process group

• getppid

– get parent process ID

• getpriority

– get current nice value

• getprotobyname

– get protocol record given name

Page 170: More CGI programming

• getprotobynumber

– get protocol record given numeric protocol

• getprotoent

– get next protocols record

• getpwent

– get next passwd record

• getpwnam

– get passwd record given user login name

• getpwuid

– get passwd record given user ID

• getservbyname

– get services record given its name

• getservbyport

– get services record given numeric port

Page 171: More CGI programming

• getservent

– get next services record

• getsockname

– retrieve the sockaddr for a given socket

• getsockopt

– get socket options on a given socket

• glob

– expand filenames using wildcards

• gmtime

– convert UNIX time into record or string using Greenwich time

• goto

– create spaghetti code

• grep

– locate elements in a list that test true against a given criterion
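A small sketch of grep in both list and scalar context; the list of numbers and the variable names are invented for the example.

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

# In list context, grep returns the elements for which the block is true.
my @even = grep { $_ % 2 == 0 } @numbers;

# In scalar context, grep returns how many elements matched.
my $big_count = grep { $_ > 7 } @numbers;

print "even: @even\n";
print "numbers above 7: $big_count\n";
```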

Page 172: More CGI programming

• hex

– convert a hexadecimal string to a number

• import

– patch a module's namespace into your own

• int

– get the integer portion of a number

• ioctl

– system-dependent device control system call

• join

– join a list into a string using a separator

• keys

– retrieve list of indices from a hash

• kill

– send a signal to a process or process group
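To make a few of these concrete, the sketch below uses hex, int, keys and join together. The %settings hash is a stand-in for something like %ENV and is purely illustrative.

```perl
use strict;
use warnings;

# hex interprets its argument as a hexadecimal string.
my $value = hex("ff");

# int truncates toward zero; it does not round.
my $whole    = int(7.9);
my $negative = int(-7.9);

# Hypothetical hash standing in for something like %ENV.
my %settings = ( HOME => "/home/bunny", SHELL => "/bin/sh" );

# keys returns the hash keys in no particular order, so sort them before joining.
my $names = join(", ", sort keys %settings);

print "$value $whole $negative\n";
print "$names\n";
```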

Page 173: More CGI programming

• last

– exit a block prematurely

• lc

– return lower-case version of a string

• lcfirst

– return a string with just the first letter in lower case

• length

– return the number of characters in a string

• link

– create a hard link in the filesystem

• listen

– register your socket as a server

• local

– create a temporary value for a global variable (dynamic scoping)
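Dynamic scoping with local can be surprising, so here is a minimal sketch. The package variable $level and the two subroutines are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# A package (global) variable; local only works on these, not on my variables.
our $level = "global";

sub show_level { return $level }

sub with_local {
    # local gives $level a temporary value for the duration of this call,
    # and subroutines called from here see that temporary value.
    local $level = "temporary";
    return show_level();
}

my $inside = with_local();   # value seen while local is in effect
my $after  = show_level();   # original value is restored afterwards

print "$inside $after\n";
```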

Page 174: More CGI programming

• localtime

– convert UNIX time into record or string using local time

• log

– retrieve the natural logarithm for a number

• lstat

– stat a symbolic link

• m//

– match a string with a regular expression pattern

• map

– apply a change to a list to get back a new list with the changes

• mkdir

– create a directory

• msgctl

– SysV IPC message control operations
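As an illustration of m// and map, the sketch below tests a variable name against a pattern and transforms a list. The names are hypothetical; HTTP_GROCERY_ORDER simply echoes the header used in the earlier CGI example.

```perl
use strict;
use warnings;

my $name = "HTTP_GROCERY_ORDER";

# m// in scalar context reports whether the pattern matched.
my $is_http = ($name =~ m/^HTTP_/) ? 1 : 0;

# In list context, m// returns the captured groups.
my ($suffix) = $name =~ m/^HTTP_(.*)$/;

# map builds a new list by applying the block to each element.
my @lower = map { lc($_) } ("HOST", "ACCEPT");

print "$is_http $suffix @lower\n";
```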

Page 175: More CGI programming

• msgget

– get SysV IPC message queue

• msgrcv

– receive a SysV IPC message from a message queue

• msgsnd

– send a SysV IPC message to a message queue

• my

– declare and assign a local variable (lexical scoping)

• next

– iterate a block prematurely

• no

– unimport some module symbols or semantics at compile time

• oct

– convert an octal string to a number

Page 176: More CGI programming

• open

– open a file, pipe, or descriptor

• opendir

– open a directory

• ord

– find a character's numeric representation

• pack

– convert a list into a binary representation

• package

– declare a separate global namespace

• pipe

– open a pair of connected filehandles

• pop

– remove the last element from an array and return it

Page 177: More CGI programming

• pos

– find or set the offset for the last/next m//g search

• print

– output a list to a filehandle

• printf

– output a formatted list to a filehandle

• prototype

– get the prototype (if any) of a subroutine

• push

– append one or more elements to an array

• q/STRING/

– singly quote a string

• qq/STRING/

– doubly quote a string
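The quoting operators behave like ordinary single and double quotes, which the following sketch tries to show alongside push. All variable names and strings here are illustrative.

```perl
use strict;
use warnings;

my $name = "world";

# q// behaves like single quotes: no variable interpolation.
my $single = q/Hello, $name!/;

# qq// behaves like double quotes: variables are interpolated.
my $double = qq/Hello, $name!/;

# push appends to the end of an array and returns the new length.
my @items = ("milk");
my $count = push(@items, "eggs", "bread");

print "$single\n$double\n";
print "$count items: @items\n";
```

These operators are handy when the text itself contains quote characters, since a different delimiter can be chosen, for example q{...} or qq(...).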

Page 178: More CGI programming

• quotemeta

– quote regular expression magic characters

• qw/STRING/

– quote a list of words

• qx/STRING/

– backquote quote a string (run it as a shell command and capture the output)

• rand

– retrieve the next pseudorandom number

• read

– fixed-length buffered input from a filehandle

• readdir

– get a directory entry from a directory handle

• readlink

– determine where a symbolic link is pointing

Page 179: More CGI programming

• recv

– receive a message over a socket

• redo

– start this loop iteration over again

• ref

– find out the type of thing being referenced

• rename

– change a filename

• require

– load in external subroutines from a library at runtime

• reset

– clear all variables of a given name

• return

– get out of a subroutine early

Page 180: More CGI programming

• reverse

– flip a string or a list

• rewinddir

– reset directory handle

• rindex

– right-to-left substring search

• rmdir

– remove a directory

• s///

– replace a pattern with a string

• scalar

– force a scalar context

• seek

– reposition file pointer for random-access I/O

Page 181: More CGI programming

• seekdir

– reposition directory pointer

• select

– reset default output or do I/O multiplexing

• semctl

– SysV semaphore control operations

• semget

– get set of SysV semaphores

• semop

– SysV semaphore operations

• send

– send a message over a socket

• setgrent

– prepare group file for use

Page 182: More CGI programming

• sethostent

– prepare hosts file for use

• setnetent

– prepare networks file for use

• setpgrp

– set the process group of a process

• setpriority

– set a process's nice value

• setprotoent

– prepare protocols file for use

• setpwent

– prepare passwd file for use

• setservent

– prepare services file for use

Page 183: More CGI programming

• setsockopt

– set some socket options

• shift

– remove the first element of an array, and return it

• shmctl

– SysV shared memory operations

• shmget

– get SysV shared memory segment identifier

• shmread

– read SysV shared memory

• shmwrite

– write SysV shared memory

• shutdown

– close down just half of a socket connection

Page 184: More CGI programming

• sin

– return the sine of a number

• sleep

– block for some number of seconds

• socket

– create a socket

• socketpair

– create a pair of sockets

• sort

– sort a list of values

• splice

– add or remove elements anywhere in an array

• split

– split up a string using a regexp delimiter
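A short sketch combining split, sort and splice; the comma-separated string and the numbers are invented sample data.

```perl
use strict;
use warnings;

# split breaks a string into a list wherever the pattern matches.
my @fields = split /,/, "pear,apple,fig";

# sort compares as strings by default ...
my @sorted = sort @fields;

# ... so numbers need an explicit numeric comparison block.
my @numeric = sort { $a <=> $b } (10, 9, 100);

# splice removes (and returns) one element starting at index 1.
my @removed = splice(@fields, 1, 1);

print "@sorted | @numeric | @removed | @fields\n";
```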

Page 185: More CGI programming

• sprintf

– formatted print into a string

• sqrt

– square root function

• srand

– seed the random number generator

• stat

– get a file's status information

• study

– optimize input data for repeated searches

• sub

– declare a subroutine, possibly anonymously

• substr

– get or alter a portion of a string
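For illustration, the sketch below exercises sprintf, substr (both reading and altering) and an anonymous sub. The strings and variable names are made up for the example.

```perl
use strict;
use warnings;

# sprintf formats into a string instead of printing.
my $rounded = sprintf("%.2f", 3.14159);

my $text = "Hello, world";

# substr with an offset and length extracts part of a string ...
my $part = substr($text, 7, 5);

# ... and can also be assigned to, which alters the original string.
substr($text, 0, 5) = "Howdy";

# sub without a name yields an anonymous subroutine, called through a reference.
my $square = sub { return $_[0] ** 2 };
my $sixteen = $square->(4);

print "$rounded $part $text $sixteen\n";
```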

Page 186: More CGI programming

• symlink

– create a symbolic link to a file

• syscall

– execute an arbitrary system call

• sysread

– fixed-length unbuffered input from a filehandle

• system

– run a separate program

• syswrite

– fixed-length unbuffered output to a filehandle

• tell

– get current seekpointer on a filehandle

• telldir

– get current seekpointer on a directory handle

Page 187: More CGI programming

• tie

– bind a variable to an object class

• time

– return number of seconds since 1970

• times

– return elapsed time for self and child processes

• tr///

– transliterate a string

• truncate

– shorten a file

• uc

– return upper-case version of a string

• ucfirst

– return a string with just the first letter in upper case
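A minimal sketch of tr///, uc and ucfirst; the sample strings are illustrative only.

```perl
use strict;
use warnings;

my $text = "hello world";

# tr/// replaces characters one for one; copy first to keep the original intact.
(my $shouted = $text) =~ tr/a-z/A-Z/;

# With an empty replacement list, tr/// just counts the matching characters.
my $o_count = ($text =~ tr/o//);

# uc upper-cases the whole string, ucfirst only its first character.
my $upper = uc("perl");
my $title = ucfirst("perl");

print "$shouted $o_count $upper $title\n";
```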

Page 188: More CGI programming

• umask

– set file creation mode mask

• undef

– remove a variable or subroutine definition

• unlink

– remove one link to a file

• unpack

– convert binary structure into normal perl variables

• unshift

– prepend more elements to the beginning of a list

• untie

– break a tie binding to a variable

• use

– load in a module at compile time

Page 189: More CGI programming

• utime

– set a file's last access and modify times

• values

– return a list of the values in a hash

• vec

– test or set particular bits in a string

• wait

– wait for any child process to die

• waitpid

– wait for a particular child process to die

• wantarray

– get void vs scalar vs list context of current subroutine call

• warn

– print debugging info
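To show what wantarray reports, here is a small sketch. The subroutine name context is hypothetical; wantarray returns true in list context, false but defined in scalar context, and undef in void context.

```perl
use strict;
use warnings;

# Report the context in which this subroutine was called.
sub context {
    return wantarray          ? "list"
         : defined(wantarray) ? "scalar"
         :                      "void";
}

my @in_list   = context();   # called in list context
my $in_scalar = context();   # called in scalar context

# warn writes its message to STDERR rather than STDOUT.
warn "debug: list call gave '$in_list[0]'\n";

print "$in_list[0] $in_scalar\n";
```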

Page 190: More CGI programming

• write

– print a picture record

• y///

– transliterate a string