Post on 19-Dec-2015

Introduction to Perl

Bioinformatics

What is Perl?
- Practical Extraction and Report Language
- A scripting language
- Components
  - an interpreter
  - scripts: text files created by the user, describing a sequence of steps to be performed by the interpreter

Installation
- Create a Perl directory under C:\
- Either
  - download AP.msi from the course website (http://curry.ateneo.net/~jpv/BioInf07/) and execute it (installs into the C:\Perl directory)
  - or download and unzip AP.zip into C:\Perl
- Reset the path variable first (or edit C:\autoexec.bat) so that you can execute scripts from MSDOS

    C> path=%path%;c:\Perl\bin

Writing and Running Perl Scripts
- Create/edit script (extension: .pl)

    C> edit first.pl

- Execute script

    C> perl first.pl

- Tip: place your scripts in a separate work directory

    # my first script
    print "Hello World";
    print "this is my first script";

Perl Features
- Statements
- Strings
- Numbers and Computation
- Variables and Interpolation
- Input and Output
- Files
- Conditions and Loops
- Pattern Matching
- Arrays and Lists

Statements
- A Perl script is a sequence of statements
- Examples of statements

    print "Type in a value";
    $value = <>;
    $square = $value * $value;
    print "The square is ", $square, "\n";

Comments
- Lines that start with # are ignored by the Perl interpreter

    # this is a comment line

- In a line, characters that follow # are also ignored

    $count = $count + 1; # increment $count

Strings
- String
  - Sequence of characters
  - Text
- In Perl, characters should be surrounded by quotes

    'I am a string'
    "I am a string"

- Special characters are specified through escape sequences (preceded by a \ )

    "a newline\n and a tab\t"

Numbers
- Integers are specified as a sequence of digits: 6, 453
- Decimal numbers: 33.2, 6.04E24 (scientific notation)

Variables
- Variable: named storage for values (such as strings and numbers)
- Names are preceded by a $
- Sample use:

    $count = 5;          # assignment statement
    $message = "Hello";  # another assignment
    print $count;        # print the value of a variable

Computation
- Fundamental arithmetic operations: + - * /
- Others
  - ** exponentiation
  - () grouping
- Example (try this out as a Perl script)

    $x = 4;
    $y = 2;
    $z = (3 + $x) ** $y;
    print $z, "\n";

Interpolation
- Given the following script:

    $x = "Smith";
    print "Good morning, Mr. $x";
    print 'Good morning, Mr. $x';

- Strings quoted with "" perform expansions on variables
- Escape characters like \n are also interpreted when strings are quoted with "" but not when they are quoted with ''
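The difference between the two quoting styles can be sketched as follows. This is a minimal illustration only; the variable names and the value "BRCA1" are made up for the example and do not come from the slides.

```perl
use strict;
use warnings;

my $gene = "BRCA1";              # hypothetical value, for illustration only

my $double = "Gene: $gene\n";    # $gene and \n are both expanded
my $single = 'Gene: $gene\n';    # kept literally, character for character

print $double;
print $single, "\n";
```

The first print shows the variable's value followed by a line break; the second shows the text $gene\n exactly as typed.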

Input and Output
- Output
  - print function
  - Escape characters
  - Interpolation
- Input
  - Bracket operator (e.g., $line = <>; )
  - Not typed (takes in strings or numbers)

Input Files
- Opening a file

    open INFILE, 'data.txt';

- Input

    $line = <INFILE>;

- Closing a file

    close INFILE;

Output Files
- Opening

    open OUTFILE, '>result.txt';

  Or, to append to an existing file:

    open OUTFILE, '>>result.txt';  # append

- Output

    print OUTFILE "Hello";

- Closing files

    close OUTFILE;
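Putting the file operations together, a small round trip might look like the sketch below. It uses the three-argument form of open with a lexical filehandle and an "or die" check, which is a more defensive variant of the bareword form shown on the slides; the file name result_demo.txt is an assumption made for the example.

```perl
use strict;
use warnings;

my $file = "result_demo.txt";    # hypothetical file name

# '>' creates the file, or empties it if it already exists
open( my $out, '>', $file ) or die "Cannot write $file: $!";
print $out "Hello\n";
close($out);

# '>>' adds to the end of the existing file
open( $out, '>>', $file ) or die "Cannot append to $file: $!";
print $out "World\n";
close($out);

# '<' opens the file for reading; read it back line by line
open( my $in, '<', $file ) or die "Cannot read $file: $!";
my @lines;
while ( my $line = <$in> ) {
    chomp $line;                 # drop the trailing newline
    push @lines, $line;
}
close($in);

print join( ", ", @lines ), "\n";
unlink $file;                    # remove the demo file again
```

Checking the result of open is worth the extra words: without it, a missing file only shows up later as empty input.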

Conditions
- Can execute statements conditionally
- Syntax:

    if ( condition )
    {
        statement
        statement
        …
    }

- Example:

    if ( $num > 1000 )
    {
        print "Large";
    }

If - Else

    $num = <>;
    if ( $num > 1000 )
    {
        print "Large number\n";
    }
    else
    {
        print "Small number\n";
    }
    print "Thanks\n";

Loops
- Repetitive execution
- Syntax:

    while ( condition )
    {
        statement
        statement
        …
    }

- Example:

    $count = 0;
    while ( $count < 10 )
    {
        print "counting-", $count;
        $count = $count + 1;
    }

Conditions ( expr symbol expr )
- Numbers

    ==  equal
    !=  not equal
    <   less than
    >   greater than
    <=  less than or equal
    >=  greater than or equal

- Strings

    eq ne lt gt le ge
    =~  pattern match
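Choosing between the numeric and the string operators matters, because the same values can compare differently. The sketch below is only an illustration with made-up values:

```perl
use strict;
use warnings;

my ( $numeric_equal, $string_equal ) = ( "no", "no" );
my ( $numeric_less,  $string_less )  = ( "no", "no" );

# Numeric operators compare values: 10 and 10.0 are the same number.
if ( 10 == 10.0 )     { $numeric_equal = "yes"; }

# String operators compare text: "10" and "10.0" are different strings.
if ( "10" eq "10.0" ) { $string_equal = "yes"; }

# 9 is numerically less than 10 ...
if ( 9 < 10 )         { $numeric_less = "yes"; }

# ... but as text "9" sorts after "10", because "9" comes after "1".
if ( "9" lt "10" )    { $string_less = "yes"; }

print '10 == 10.0     : ', $numeric_equal, "\n";
print '"10" eq "10.0" : ', $string_equal,  "\n";
print '9 < 10         : ', $numeric_less,  "\n";
print '"9" lt "10"    : ', $string_less,   "\n";
```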

Functions
- length $str: returns the number of characters in $str
- defined $str: tests whether $str holds a defined value (useful for testing if $line=<>; succeeded)
- chomp $str: removes the trailing newline from $str, if there is one (useful because $line=<>; includes the newline character)
- print $var: displays $var on the output device
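A short sketch of length, chomp and defined working together. The string "ATGCGT\n" stands in for a line that would normally come from $line = <>; and is made up for the example.

```perl
use strict;
use warnings;

my $line = "ATGCGT\n";                 # stands in for a line read with <>
my $with_newline = length $line;       # the newline is counted too
chomp $line;                           # remove the trailing newline
my $without_newline = length $line;

my $missing;                           # never assigned, so it is undefined
my $status = "undefined";
if ( defined $missing ) { $status = "defined"; }

print "before chomp: $with_newline, after chomp: $without_newline\n";
print "\$missing is $status\n";
```

When reading input, <> returns an undefined value once there is nothing left to read, which is why defined is the usual test for the end of input.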

Pattern Matching
- <string> =~ <pattern> is a condition that checks if a string matches a pattern
- Simplest case: <pattern> specifies a search substring
- Example: if ( $s =~ /bio/ ) …
  - holds TRUE if $s is "molecular biology", "bioinformatics", "the bionic man"
  - FALSE if $s is "chemistry", "bicycle", "a BiOpsy" (matching is case-sensitive)

Special pattern matching characters
- \w word character (letter, digit, or underscore)
- \d digit
- \s space character (space, tab, \n)
- Example: if ( $s =~ /\w\w\s\d\d\d/ ) …
  - holds TRUE for "CS 123 course", "Take Ma 101 today"
  - FALSE for "Only 1 number here"

Special pattern matching characters
- . any character
- ^ beginning of string/line
- $ end of string or line
- Example: if ( $s =~ /^\d\d\d\ss..r/ ) …
  - holds TRUE for "300 spartans"
  - FALSE for "all 100 stars"

Groups and Quantifiers
- [xyz] character set
- | alternatives
- * zero or more
- + 1 or more
- ? 0 or 1
- {M} exactly M
- {M,N} between M and N repetitions
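As a rough illustration of character sets, quantifiers and alternatives on sequence data, consider the sketch below. The helper names is_dna and looks_like_orf and both patterns are assumptions made for this example; the second pattern is a toy check, not a real gene finder.

```perl
use strict;
use warnings;

# Returns 1 if the string consists only of the letters A, C, G, T
# (one or more of them), otherwise 0.
sub is_dna {
    my ($seq) = @_;
    if ( $seq =~ /^[ACGT]+$/ ) { return 1; }
    return 0;
}

# Toy pattern: a start codon, zero or more three-letter codons,
# then one of three stop codons.
sub looks_like_orf {
    my ($seq) = @_;
    if ( $seq =~ /^ATG([ACGT]{3})*(TAA|TAG|TGA)$/ ) { return 1; }
    return 0;
}

foreach my $seq ( "ATGAAATAA", "ATGAATAA", "ATGXAATAA" ) {
    print "$seq dna=", is_dna($seq), " orf=", looks_like_orf($seq), "\n";
}
```

The second sequence is valid DNA but fails the toy pattern because the letters between the start and stop codons do not form whole groups of three.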

NCBI file Example

    /VERSION\s+(\S+)\s+GI:(\S+)/

- Matches a version line
- Parentheses group characters for future retrieval
- $1 stands for the first version number, $2 gets the number after "GI:"
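A minimal sketch of how the captured groups might be used. The sample line below is made up to resemble a version line; the accession and GI values are not real records.

```perl
use strict;
use warnings;

# Made-up sample line for illustration; not a real NCBI record.
my $line = "VERSION     AB000001.1  GI:1234567";

my ( $version, $gi );
if ( $line =~ /VERSION\s+(\S+)\s+GI:(\S+)/ ) {
    $version = $1;    # text matched by the first pair of parentheses
    $gi      = $2;    # text matched by the second pair
    print "version: $version\n";
    print "GI: $gi\n";
}
else {
    print "not a version line\n";
}
```

Copying $1 and $2 into named variables right after the match is a common precaution, since the next successful match overwrites them.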
