Introduction to Perl
Bioinformatics
What is Perl?
Practical Extraction and Report Language
A scripting language
Components
  an interpreter
  scripts: text files created by the user, describing a sequence of steps to be performed by the interpreter
Installation
Create a Perl directory under C:\
Either
  Download AP.msi from the course website (http://curry.ateneo.net/~jpv/BioInf07/) and execute it (installs into the C:\Perl directory)
  Or download and unzip AP.zip into C:\Perl
Reset the path variable first (or edit C:\autoexec.bat) so that you can execute scripts from MSDOS
    C> path=%path%;c:\Perl\bin
Writing and Running Perl Scripts
Create/edit script (extension: .pl)
C> edit first.pl
Execute script C> perl first.pl
* Tip: place your scripts in a separate work directory
    # my first script
    print "Hello World";
    print "this is my first script";
Perl Features
  Statements
  Strings
  Numbers and Computation
  Variables and Interpolation
  Input and Output
  Files
  Conditions and Loops
  Pattern Matching
  Arrays and Lists
Statements
A Perl script is a sequence of statements
Examples of statements
    print "Type in a value";
    $value = <>;
    $square = $value * $value;
    print "The square is ", $square, "\n";
Comments
Lines that start with # are ignored by the Perl interpreter
    # this is a comment line
In a line, characters that follow # are also ignored
    $count = $count + 1; # increment $count
Strings
String
  Sequence of characters
  Text
In Perl, characters should be surrounded by quotes
    'I am a string'
    "I am a string"
Special characters are specified through escape sequences (preceded by a \ )
    "a newline\n and a tab\t"
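
The effect of escape sequences can be checked by measuring string lengths. The following is a minimal sketch; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# The same text in single and in double quotes.
my $single = 'I am a string';
my $double = "I am a string";

# Escape sequences are translated only inside double quotes.
my $with_escapes    = "a newline\n and a tab\t";
my $without_escapes = 'a newline\n and a tab\t';

print $single, "\n", $double, "\n";

# \n and \t each become one character in the double-quoted version,
# but stay as a backslash plus a letter in the single-quoted version.
print length($with_escapes), "\n";
print length($without_escapes), "\n";
```

The single-quoted version is two characters longer, because each backslash is kept as an ordinary character.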
Numbers
Integers are specified as a sequence of digits
    6
    453
Decimal numbers:
    33.2
    6.04E24 (scientific notation)
Variables
Variable: named storage for values (such as strings and numbers)
Names are preceded by a $
Sample use:
    $count = 5;         # assignment statement
    $message = "Hello"; # another assignment
    print $count;       # print the value of a variable
Computation
Fundamental arithmetic operations:
    + - * /
Others
    **  exponentiation
    ()  grouping
Example (try this out as a Perl script)
    $x = 4;
    $y = 2;
    $z = (3 + $x) ** $y;
    print $z, "\n";
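
To see why grouping matters, the example can be run with and without the parentheses. This is a small sketch; the names $grouped and $ungrouped are illustrative only.

```perl
use strict;
use warnings;

my $x = 4;
my $y = 2;

# With parentheses the addition happens first: (3 + 4) ** 2.
my $grouped = (3 + $x) ** $y;

# Without them, ** binds more tightly than +: 3 + (4 ** 2).
my $ungrouped = 3 + $x ** $y;

print $grouped, "\n";    # prints 49
print $ungrouped, "\n";  # prints 19
```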
Interpolation
Given the following script:
    $x = "Smith";
    print "Good morning, Mr. $x";
    print 'Good morning, Mr. $x';
Strings quoted with "" expand variables: the first print shows Good morning, Mr. Smith, while the second shows the text $x literally
Escape sequences like \n are also interpreted when strings are quoted with "" but not when they are quoted with ''
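
The difference can be confirmed by storing both versions in variables and comparing them. A minimal sketch; $expanded and $literal are names introduced here for illustration.

```perl
use strict;
use warnings;

my $x = "Smith";

# Double quotes: $x is replaced by its value.
my $expanded = "Good morning, Mr. $x";

# Single quotes: the characters $ and x are kept as typed.
my $literal = 'Good morning, Mr. $x';

print $expanded, "\n";
print $literal, "\n";

if ($expanded ne $literal) {
    print "The two strings differ\n";
}
```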
Input and Output
Output
  print function
  Escape characters
  Interpolation
Input
  Bracket operator (e.g., $line = <>; )
  Not typed (takes in strings or numbers)
Input Files
Opening a file
    open INFILE, 'data.txt';
Input
    $line = <INFILE>;
Closing a file
    close INFILE;
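
Reading a whole file usually combines these three steps with a loop. The sketch below first writes a small stand-in for data.txt to a temporary file so that it runs on its own; the file contents and the three-argument form of open with an error check are choices made for this illustration, not something the course prescribes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for data.txt (contents are made up).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print $tmp_fh "first line\nsecond line\n";
close $tmp_fh;

# Open for reading; stop with the system error message on failure.
open(INFILE, '<', $filename) or die "Cannot open $filename: $!";

my $count = 0;
my $last  = '';
# <INFILE> returns an undefined value once the file is exhausted.
while (defined(my $line = <INFILE>)) {
    chomp $line;             # drop the trailing newline
    $count = $count + 1;
    $last  = $line;
}
close INFILE;

print "Read $count lines; the last one was: $last\n";
```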
Output Files
Opening
    open OUTFILE, '>result.txt';
Or, to add to the end of an existing file:
    open OUTFILE, '>>result.txt';   # append
Output
    print OUTFILE "Hello";
Closing files
    close OUTFILE;
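
The difference between > and >> shows up when a file is opened twice. The following sketch writes to a result.txt inside a temporary directory (an assumption made so the example leaves no files behind) and then reads the file back.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the script ends.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/result.txt";

# '>' creates the file, or empties it if it already exists.
open(OUTFILE, '>', $path) or die "Cannot write $path: $!";
print OUTFILE "Hello\n";
close OUTFILE;

# '>>' keeps the existing contents and adds to the end.
open(OUTFILE, '>>', $path) or die "Cannot append to $path: $!";
print OUTFILE "Hello again\n";
close OUTFILE;

# Read the file back to check what was written.
my $contents = '';
open(INFILE, '<', $path) or die "Cannot read $path: $!";
while (defined(my $line = <INFILE>)) {
    $contents = $contents . $line;
}
close INFILE;

print $contents;
```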
Conditions
Can execute statements conditionally
Syntax:
    if ( condition )
    {
        statement
        statement
        …
    }
Example:
    if ( $num > 1000 )
    {
        print "Large";
    }
If - Else
    $num = <>;
    if ( $num > 1000 )
    {
        print "Large number\n";
    }
    else
    {
        print "Small number\n";
    }
    print "Thanks\n";
Loops
Repetitive execution
Syntax:
    while ( condition )
    {
        statement
        statement
        …
    }
Example:
    $count = 0;
    while ( $count < 10 )
    {
        print "counting-", $count;
        $count = $count + 1;
    }
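
A loop and a condition are often used together. As a rough sketch (the doubling task and the variable names are invented for illustration), the script below doubles a value until it passes 1000 and then reports on the result.

```perl
use strict;
use warnings;

my $value = 1;
my $steps = 0;

# Keep doubling while the value has not yet passed 1000.
while ($value <= 1000) {
    $value = $value * 2;
    $steps = $steps + 1;
}

# The loop ends as soon as the condition becomes false.
if ($value > 1000) {
    print "Large number $value reached after $steps doublings\n";
}
else {
    print "Small number $value\n";
}
```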
Conditions ( expr symbol expr )
Numbers
    ==  equal          <=  less than or equal
    !=  not equal      >=  greater than or equal
    <   less than      >   greater than
Strings
    eq  ne  lt  gt  le  ge
    =~  pattern match
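
Choosing between the numeric and the string operators matters, because the same two values can compare differently. A minimal sketch, with variable names picked for illustration:

```perl
use strict;
use warnings;

my $left  = "10";
my $right = "9";

my $numeric_greater = 0;
my $string_greater  = 0;

# > compares the values as numbers: 10 is greater than 9.
if ($left > $right) {
    $numeric_greater = 1;
}

# gt compares character by character: "1" sorts before "9".
if ($left gt $right) {
    $string_greater = 1;
}

print "numeric: $numeric_greater, string: $string_greater\n";
```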
Functions
  length $str   returns the number of characters in $str
  defined $str  tests whether $str holds a defined value (useful for testing whether $line = <>; succeeded, since <> returns an undefined value at end of input)
  chomp $str    removes a trailing newline from $str (useful because $line = <>; includes the newline character)
  print $var    displays $var on the output device
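
These functions can be tried on a string that ends in a newline, as a line read with <> would. The sketch below uses a made-up value in place of real input.

```perl
use strict;
use warnings;

# Stand-in for a line of input; the text itself is made up.
my $line = "ACGT\n";

my $length_before = length $line;   # counts the newline too
chomp $line;
my $length_after = length $line;    # newline is gone

# A second chomp changes nothing: only a trailing newline is removed.
chomp $line;
my $length_again = length $line;

# A variable that was never assigned holds no defined value.
my $missing;
my $missing_is_defined = 0;
if (defined $missing) {
    $missing_is_defined = 1;
}

print "$length_before $length_after $length_again $missing_is_defined\n";
```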
Pattern Matching
<string> =~ <pattern> is a condition that checks whether a string matches a pattern
Simplest case: <pattern> specifies a search substring
Example: if ( $s =~ /bio/ ) …
  holds TRUE if $s is "molecular biology", "bioinformatics", "the bionic man"
  FALSE if $s is "chemistry", "bicycle", "a BiOpsy" (matching is case-sensitive)
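
The six sample strings can be checked in one script. This sketch uses a foreach loop over a list, which goes slightly beyond what has been introduced so far, and it adds the /i modifier (case-insensitive matching) as an aside.

```perl
use strict;
use warnings;

my $matches = 0;

foreach my $s ("molecular biology", "bioinformatics", "the bionic man",
               "chemistry", "bicycle", "a BiOpsy") {
    if ($s =~ /bio/) {
        print "match:    $s\n";
        $matches = $matches + 1;
    }
    else {
        print "no match: $s\n";
    }
}

# With the i modifier, upper and lower case are treated alike.
my $biopsy_matches = 0;
if ("a BiOpsy" =~ /bio/i) {
    $biopsy_matches = 1;
}

print "$matches strings contain bio\n";
```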
Special pattern matching characters
    \w  word character (letter, digit, or underscore)
    \d  digit
    \s  whitespace character (space, tab, \n)
if ( $s =~ /\w\w\s\d\d\d/ ) …
  holds TRUE for "CS 123 course", "Take Ma 101 today"
  FALSE for "Only 1 number here"
Special pattern matching characters
    .  any character
    ^  beginning of string/line
    $  end of string or line
if ( $s =~ /^\d\d\d\ss..r/ ) …
  holds TRUE for "300 spartans"
  FALSE for "all 100 stars"
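
Both example patterns can be tested against their sample strings. The sketch below also tries the second pattern without the ^ anchor, a variation added here to show what the anchor contributes.

```perl
use strict;
use warnings;

# Two word characters, one whitespace character, three digits.
my $class_matches = 0;
foreach my $s ("CS 123 course", "Take Ma 101 today", "Only 1 number here") {
    if ($s =~ /\w\w\s\d\d\d/) {
        $class_matches = $class_matches + 1;
    }
}

# Same strings, same pattern, with and without the ^ anchor.
my $anchored_matches   = 0;
my $unanchored_matches = 0;
foreach my $s ("300 spartans", "all 100 stars") {
    if ($s =~ /^\d\d\d\ss..r/) {
        $anchored_matches = $anchored_matches + 1;
    }
    if ($s =~ /\d\d\d\ss..r/) {
        $unanchored_matches = $unanchored_matches + 1;
    }
}

print "$class_matches $anchored_matches $unanchored_matches\n";
```

Without the anchor, "all 100 stars" matches as well, because "100 star" fits the pattern in the middle of the string.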
Groups and Quantifiers
    [xyz]  character set
    |      alternatives
    *      zero or more
    +      1 or more
    ?      0 or 1
    {M}    exactly M
    {M,N}  between M and N characters
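
Several of these symbols can be combined in one pattern. As an illustrative sketch (the sequences and both patterns are made up for this example, not taken from the course), the script below checks whether a string is a DNA sequence and whether it looks like a simple open reading frame.

```perl
use strict;
use warnings;

# [ACGT]+ : one or more characters from the set, and nothing else.
my $dna_ok  = ("ACGT"  =~ /^[ACGT]+$/) ? 1 : 0;
my $dna_bad = ("ACGTN" =~ /^[ACGT]+$/) ? 1 : 0;

# ATG, then zero or more groups of exactly three bases,
# then one of three alternative stop codons at the end.
my $orf_ok  = ("ATGAAATAG" =~ /^ATG([ACGT]{3})*(TAA|TAG|TGA)$/) ? 1 : 0;
my $orf_bad = ("ATGAATAG"  =~ /^ATG([ACGT]{3})*(TAA|TAG|TGA)$/) ? 1 : 0;

print "$dna_ok $dna_bad $orf_ok $orf_bad\n";
```

The expression (condition) ? 1 : 0 is a compact if-else that yields 1 when the match succeeds and 0 otherwise.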
NCBI file Example
    /VERSION\s+(\S+)\s+GI:(\S+)/
Matches a version line
Parentheses group characters for later retrieval
$1 holds the version number that follows VERSION
$2 holds the number after "GI:"
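
Applied to a line of text, the pattern can be used as follows. The sample line is hypothetical: its identifier and GI number are invented to show the layout, not copied from a real NCBI record.

```perl
use strict;
use warnings;

# Hypothetical version line; both values are made up.
my $line = "VERSION     AB000001.1  GI:1234567";

my $version = '';
my $gi      = '';

if ($line =~ /VERSION\s+(\S+)\s+GI:(\S+)/) {
    $version = $1;   # text captured by the first pair of parentheses
    $gi      = $2;   # text captured by the second pair
}

print "version: $version\n";
print "GI:      $gi\n";
```

Copying $1 and $2 into named variables right away is a cautious habit, since a later successful match replaces their values.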