Page 1: Perl Programming

Perl Programming

Paul Tymann
Computer Science Department
Rochester Institute of Technology
[email protected]

Page 2: Perl Programming

Strings

• A collection of characters
  – This slide consists of a sequence of strings

• CS folk have been working with strings for years

• Many tools and algorithms have been developed to work with strings

Page 3: Perl Programming

Sequences

• Ask a biologist what a sequence is:
  – ATGCCTATGCCCCTTGAGAGA
• Show that to a CS type and ask “what is this?”
  – It is a string!!
• In a way, bioinformatics is all about manipulating strings
• CS types are really good at manipulating strings!!
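
Once the sequence is treated as an ordinary string, Perl's built-in string tools apply to it directly. The following is a minimal sketch and not part of the original slides: the variable names are illustrative, and it uses my declarations with use strict, which the slides have not introduced.

```perl
use strict;
use warnings;

# The sequence from the slide, held in an ordinary scalar (string) variable.
my $dna = "ATGCCTATGCCCCTTGAGAGA";

my $length      = length $dna;             # number of characters
my $first_codon = substr( $dna, 0, 3 );    # first three characters
my $gc_count    = ( $dna =~ tr/GC// );     # tr with an empty replacement only counts
my $run_start   = index( $dna, "CCCC" );   # 0-based position of the first "CCCC"

print "Length:      $length\n";
print "First codon: $first_codon\n";
print "G+C count:   $gc_count\n";
print "CCCC starts at position $run_start\n";
```

None of these calls know anything about biology; they are the same functions used on any other text.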

Page 4: Perl Programming

What the heck is Perl?

• Perl is a computer language designed to scan arbitrary text files, extract information from those text files, and print reports based on that information
  – “Perl” == “Practical Extraction and Report Language”
• What makes Perl powerful?
  – It has sophisticated pattern matching capabilities
  – Straightforward I/O
• It was created, written, developed, and maintained by Larry Wall ([email protected])

Page 5: Perl Programming

Where does Perl stand?

• Perl is an interpreted language
  – Which means it typically runs slower than a compiled language
  – BUT it is much easier, and quicker, to develop programs
  – Some people would call Perl a scripting language

• The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal)

• It is a useful tool that can get the job done

Page 6: Perl Programming

Lots of People Are Using Perl

• There are lots of people using Perl and as a result there are lots of libraries that you can get for free
• If you can think of an application, chances are you can find the Perl code to do it
• This means writing Perl programs to do sophisticated things is easy and does not take long to do.

Page 7: Perl Programming

BioPerl

• BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications
• BioPerl provides a means by which large quantities of sequence data can be analyzed in ways that are typically difficult or impossible with web-based systems
• BioPerl is open source software that is still under active development

Page 8: Perl Programming

BioPerl Modules

• Sequence Object
• Sequence flat-file format I/O
• Sequence alignment objects
• BLAST similarity search
• Sequence database access
• Sequence file indexing
• Common Base Object

Page 9: Perl Programming

Is Perl THE tool?

• Probably not
• Perl is great for munging text data to a different form
  – Get a BLAST search off the web and extract info from it and place it in your database
• Perl is great if you want it done fast
• What about more complicated programming?
  – You might want to get a bigger hammer!!
  – There are many BIO.* packages out there.

Page 10: Perl Programming

Your First Perl Program

    # Say Hello
    print "Hello World\n";

Comment: ignored by the interpreter

A String – a collection of characters

Escape character (\n) - newline

Print statement

Execution order

Page 11: Perl Programming

Perl - Unix Style

    #!/usr/local/bin/perl -w

    # Say Hello
    print "Hello World\n";

Comment used by Unix to run Perl

Page 12: Perl Programming

How To Make It Run

Create a text file that contains a Perl program (script)

Page 13: Perl Programming

How To Make It Run

Invoke the interpreter to run the program

Page 14: Perl Programming

Sometimes we make misteaks

Create the Perl script

Should be “print”

Page 15: Perl Programming

Sometimes we make misteaks

Run the interpreter

Page 16: Perl Programming

Sometimes we make misteaks

Fix the mistake

Try again

Page 17: Perl Programming

Your Turn!!

• Write a Perl program that prints out your name and the name of your workshop partner on separate lines
• Sample Output:
    Paul Tymann
    Rhys Price Jones

Page 18: Perl Programming

Your Second Perl Program

    # Convert DNA string to RNA string
    $DNA = "AGGGGAGGCCTTACT";
    $RNA = $DNA;
    $RNA =~ s/T/U/g;
    print "$RNA\n";

A scalar variable holds the characters in the string

Assignment – evaluate right side and place in left

Apply operation on right to the contents of the variable on the left

Substitute all occurrences of T with U

Page 19: Perl Programming

Reading from the Keyboard

• You can read information from the keyboard by using
  – <STDIN>
• For example, to read a line from the keyboard and place it in the scalar variable $STR
  – $STR = <STDIN>;
• The line termination character is read too and is kept at the end of the string
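
Because the newline stays on the end of the string, programs commonly strip it with the built-in chomp function before using the value. A minimal sketch, not from the original slides: it simulates the keyboard with an in-memory filehandle so that it runs without waiting for input, and the variable names are illustrative. With a real keyboard you would read from <STDIN> instead of $in.

```perl
use strict;
use warnings;

# Stand-in for what a user would type, including the Enter key.
my $typed = "AGGGGAGGCCTTACT\n";
open( my $in, '<', \$typed ) or die "Cannot open in-memory input: $!";

my $dna        = <$in>;          # the line as read, newline included
my $raw_length = length $dna;

chomp $dna;                      # remove the trailing newline, if there is one
my $clean_length = length $dna;

( my $rna = $dna ) =~ s/T/U/g;   # copy, then substitute on the copy

print "Read $raw_length characters, $clean_length after chomp\n";
print "$rna\n";
```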

Page 20: Perl Programming

Modified Program

    # Convert DNA string to RNA string
    print "Enter DNA string: ";
    $DNA = <STDIN>;
    chomp($DNA);    # drop the newline that <STDIN> leaves at the end
    $RNA = $DNA;
    $RNA =~ s/T/U/g;
    print "$RNA\n";

Page 21: Perl Programming

Arithmetic and Logic Operators

Symbol       Meaning
**           Exponentiation
!            Logical negation
*  /  %      Multiplication, division, remainder
+  -         Addition, subtraction
<  >  <=  >= Less than, greater than, less or equal, greater or equal
==  !=       Equal, not equal
&&           Boolean and
||           Boolean or
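
The operators combine in the usual way: exponentiation binds tighter than multiplication, which binds tighter than addition, and parentheses override that order. A short sketch with illustrative variable names (not from the slides):

```perl
use strict;
use warnings;

my $without_parens = 2 + 3 * 4 ** 2;      # 4 ** 2 first, then 3 * 16, then 2 + 48
my $with_parens    = ( 2 + 3 ) * 4 ** 2;  # parentheses force the addition first
my $remainder      = 17 % 5;              # what is left after dividing 17 by 5

# Comparisons and boolean operators produce true/false values.
my $in_range = ( $remainder >= 0 && $remainder < 5 );
my $is_edge  = ( $remainder == 0 || $remainder == 4 );

print "2 + 3 * 4 ** 2   = $without_parens\n";
print "(2 + 3) * 4 ** 2 = $with_parens\n";
print "17 % 5           = $remainder\n";
print "remainder is in range\n" if $in_range;
print "remainder is not 0 or 4\n" if !$is_edge;
```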

Page 22: Perl Programming

Flow of Control

• Conditional
  – if ( expression ) { statements }
  – if ( expression ) { statements } else { statements }
  – if ( expression ) { statements } elsif …
• Loops
  – while ( expression ) { statements }
  – for ( init; test; increment ) { statements }
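
To show these forms working together, here is a small sketch (not from the original slides) that walks a DNA string with a for loop and sorts each base into a category with if / elsif / else. The variable names are illustrative, and eq is Perl's string equality operator.

```perl
use strict;
use warnings;

my $dna = "ATGCCTATGCCCCTTGAGAGA";
my ( $purines, $pyrimidines, $unknown ) = ( 0, 0, 0 );

for ( my $pos = 0; $pos < length($dna); $pos = $pos + 1 ) {
    my $base = substr( $dna, $pos, 1 );    # one character at position $pos

    if ( $base eq 'A' || $base eq 'G' ) {
        $purines = $purines + 1;
    }
    elsif ( $base eq 'C' || $base eq 'T' ) {
        $pyrimidines = $pyrimidines + 1;
    }
    else {
        $unknown = $unknown + 1;           # anything that is not A, C, G, or T
    }
}

print "Purines (A, G):     $purines\n";
print "Pyrimidines (C, T): $pyrimidines\n";
print "Unrecognized:       $unknown\n";
```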

Page 23: Perl Programming

Examples

    # Print 1 through 100 twice

    $i = 1;
    while ( $i <= 100 ) {
        print $i, "\n";
        $i = $i + 1;
    }

    # The for statement increments $i itself, so the body must not
    for ( $i = 1; $i <= 100; $i = $i + 1 ) {
        print $i, "\n";
    }

Page 24: Perl Programming

Reverse Complement

    # Calculate the reverse complement
    $dna = <STDIN>;
    $revcomm = "";
    for ( $pos = 0; $pos < length($dna) - 1; $pos = $pos + 1 ) {
        $base = substr( $dna, $pos, 1 );
        if    ( $base eq 'A' ) { $base = 'T'; }
        elsif ( $base eq 'T' ) { $base = 'A'; }
        elsif ( $base eq 'C' ) { $base = 'G'; }
        else                   { $base = 'C'; }
        # Put each new base in front so the result comes out reversed
        $revcomm = $base . $revcomm;
    }
    print $revcomm, "\n";

Don’t include the newline (the loop stops at length($dna) - 1)

String concatenation (the . operator)
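
For comparison, the same result can be had without an explicit loop by combining two built-ins: reverse, which reverses a string in scalar context, and tr, which translates characters one for one. This is a sketch of a common Perl idiom rather than the slide author's method; reverse_complement is a name chosen here for illustration, and the sketch assumes an upper-case sequence with the newline already removed.

```perl
use strict;
use warnings;

# Return the reverse complement of an upper-case DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $revcomp = reverse $dna;    # scalar assignment: reverses the characters
    $revcomp =~ tr/ACGT/TGCA/;     # A<->T and C<->G, one character at a time
    return $revcomp;
}

print reverse_complement("ATGC"), "\n";
```

Characters outside A, C, G, and T are left unchanged by tr, unlike the loop version, which turns every unrecognized character into C.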

Page 25: Perl Programming

Perl IS Different

    while ( <> ) {
        print if /blue/;
    }

Treat each argument on the command line as a file name. Open the files one at a time and step through them a line at a time

Print the current line if it contains the string “blue”
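
The loop is so short because it leans on Perl's default variable $_: the while ( <> ) test stores each line in $_, the pattern /blue/ is matched against $_, and print with no arguments prints $_. A spelled-out sketch of the same logic, not from the original slides: it uses a named variable, and it reads from an in-memory filehandle (an assumption made so that the example runs without any files). lines_with_blue is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return every line read from the filehandle that contains "blue".
sub lines_with_blue {
    my ($fh) = @_;
    my @matches;
    while ( defined( my $line = <$fh> ) ) {    # explicit version of while ( <> )
        if ( $line =~ /blue/ ) {               # explicit version of /blue/
            push @matches, $line;
        }
    }
    return @matches;
}

my $text = "red sky\nblue moon\nbluebird\ngreen field\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory input: $!";

print lines_with_blue($fh);                    # explicit version of print
```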

Page 26: Perl Programming

Your Turn!!

• Change the reverse complement program so that
  – It reads the DNA strings from a file whose name is supplied on the command line. You may assume that each DNA string is on a separate line
  – Instead of calculating the reverse complement starting at the beginning of the string, your program must start at the end of the DNA and work towards the front

Page 27: Perl Programming

Lists

• A list is an object consisting of a sequence of values
  – 1, 2, 3, 5, 7, 11, 13, 17, 19, 23
  – 1..10
  – 'a'..'z'
• A list that has been given a name is called an array
  – @small_primes = (2, 3, 5, 7, 11, 13, 17);

• The individual elements of a list must be scalars
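
A short sketch of how arrays behave in practice (illustrative names, not from the original slides): the whole array is written with @, a single element is a scalar and so is written with $, indexes start at 0, and $#name gives the last index rather than the number of items.

```perl
use strict;
use warnings;

my @digits       = ( 1 .. 10 );       # range operator builds the list 1, 2, ..., 10
my @letters      = ( 'a' .. 'z' );    # ranges work on letters too
my @small_primes = ( 2, 3, 5, 7, 11, 13, 17 );

my $count      = scalar @small_primes;    # number of items
my $last_index = $#small_primes;          # index of the last item (count - 1)
my $first      = $small_primes[0];        # indexes start at 0
my $last       = $small_primes[-1];       # negative indexes count from the end

print "There are $count small primes, from $first to $last\n";
print "The last index is $last_index\n";
print "Digits: @digits\n";                # interpolation joins items with spaces
print "Letter 26 is $letters[25]\n";
```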

Page 28: Perl Programming

Fibonacci

    @fibs = ( 1, 1 );

    for ( $i = 2; $i <= 10; $i = $i + 1 ) {
        $fibs[ $i ] = $fibs[ $i - 1 ] + $fibs[ $i - 2 ];
    }

    # scalar(@fibs) is the item count; $#fibs would be the last index (one less)
    print "I calculated ", scalar(@fibs), " fibs\n";
    print "@fibs\n";    # interpolating the array separates the items with spaces

A list with the first two Fibonacci numbers

Add the previous two numbers to get the next one

Extends the list and puts the next number there

Number of items in the list

Page 29: Perl Programming

Regular Expressions

• Provide a way of writing a compact description of a set of strings
  – Sort of like wildcards
• Single character patterns
  – A single character matches itself
  – A “.” matches any single character except newline
  – [characters] – matches any one of the characters
  – [^characters] – a ^ right after the opening bracket negates the class: matches any one character that is not listed

Page 30: Perl Programming

Examples

• G
• [0123456789]
• [0-9]
• [a-zA-Z]
• [^0-9]

Page 31: Perl Programming

Character Class Abbreviations

Construct     Class           Negated construct   Negated class
\d (digits)   [0-9]           \D                  [^0-9]
\w (words)    [a-zA-Z0-9_]    \W                  [^a-zA-Z0-9_]
\s (space)    [ \r\t\n\f]     \S                  [^ \r\t\n\f]
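
A few of these classes in use, as a minimal sketch that is not part of the original slides. The helper names count_digits and has_non_dna_char are hypothetical, as are the sample strings.

```perl
use strict;
use warnings;

# Count the digit characters in a string.
sub count_digits {
    my ($text) = @_;
    my @digits = $text =~ /\d/g;    # list context with /g: every single-digit match
    return scalar @digits;
}

# Return 1 if the string contains any character other than A, C, G, or T.
sub has_non_dna_char {
    my ($text) = @_;
    return $text =~ /[^ACGT]/ ? 1 : 0;
}

print "Digits in chr12:345: ", count_digits("chr12:345"), "\n";
print "ATGC has a non-DNA character: ", has_non_dna_char("ATGC"), "\n";
print "ATGN has a non-DNA character: ", has_non_dna_char("ATGN"), "\n";
print "'seq 1' contains whitespace\n"           if "seq 1" =~ /\s/;
print "'seq-1' contains a non-word character\n" if "seq-1" =~ /\W/;
print "'seq_1' has no non-word characters\n"    if "seq_1" !~ /\W/;
```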

Page 32: Perl Programming

Grouping Patterns

• Sequence
  – abc
• Multipliers
  – * - zero or more of the previous character
    • a*b matches b, ab, aab, aaab, aaaab, …
  – + - one or more of the previous character
    • a+b matches ab, aab, aaab, …
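
The difference between the two multipliers shows up on the string "b", which a*b accepts and a+b rejects. The sketch below is not from the original slides and relies on two anchors the slides have not covered: ^ outside brackets ties the pattern to the start of the string and $ ties it to the end, so the whole string has to fit the pattern. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Whole string is zero or more a's followed by one b.
sub matches_star {
    my ($text) = @_;
    return $text =~ /^a*b$/ ? 1 : 0;
}

# Whole string is one or more a's followed by one b.
sub matches_plus {
    my ($text) = @_;
    return $text =~ /^a+b$/ ? 1 : 0;
}

for my $text ( "b", "ab", "aaab", "a", "cab" ) {
    print "$text: a*b ", matches_star($text), ", a+b ", matches_plus($text), "\n";
}
```

Without the anchors, a pattern only has to match somewhere inside the string, so a*b would also accept "cab".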

Page 33: Perl Programming

My Problem

XXXX, ROBERT 4653 N VCSG-4 rma9999
XXXXXX, ADAM 3976 N VCSG-4 716-555-4281 alb9999
XXXXXXX, EDWARD 4637 N VCSG-2 716-555-4780 esb9999
XXXXXXX, JOHN 1906 N VCSG-4 716-555-4780
XXXX, DERRICK 6432 N VCSG-2 716-555-3161 dxc9999
XXXXXXXXX, JOHN 5034 N VCSG-2 716-555-3894 jak9999
XXX, JASON 9020 N VCSG-2 716-555-3145 jsl9999
XXXXXXX, SARAH 7610 N VCSG-2 716-555-3147 sem9999
XXXXXXXX, CHRISTOPHER 6309 N VCSG-2 716-555-3427 cco9999
XXXXXXX, MICHAEL 8195 N VCSG-2 716-555-3166 mpp9999
XXXXXX, SHAUN 9925 N VCSG-2 716-555-3145 sls9999
XXXXXX, WILLIAM 2568 N VCSG-2 716-555-3144 wjw9999
XXXXXX, PATRICK 2335 N EECC-2 716-555-3144 psw9999

Page 34: Perl Programming

Roster to CSV

    while (<>) {
        ($last, $first, $id, $ntid, $gradeType, $program, $phone, $email) =
            /([^,]+), (\S+) (\d{4}) (\S*) (\S*) (\S+) (\S*) (\S*).*/;

        print "\"$last,$first\",$id,$program,$email\@cs.rit.edu\n";
    }

• [^,]+ : match 1 or more non-comma characters
• \S+ : match 1 or more non-whitespace characters
• \d{4} : match 4 digits
• \S* : match 0 or more non-whitespace characters (the fields may not be in the input)
• .* : match anything!!

Sample input line:

XXXXXXX, EDWARD 4637 N VCSG-2 716-555-4780 esb9999
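
The assignment in the loop works because a pattern match in list context returns the text captured by each pair of parentheses, in order, and an empty list when the line does not match. A cut-down sketch of that mechanism, not the roster program itself: the four-field record layout, the sample line, and the helper name parse_record are all made up for illustration.

```perl
use strict;
use warnings;

# Split a "LAST, FIRST ID PROGRAM" line into its four fields.
# Returns an empty list when the line does not have that shape.
sub parse_record {
    my ($line) = @_;
    my @fields = $line =~ /([^,]+), (\S+) (\d{4}) (\S+)/;
    return @fields;
}

my ( $last, $first, $id, $program ) = parse_record("DOE, JANE 1234 CSCI-1");
print "\"$last,$first\",$id,$program\n";

my @nothing = parse_record("not a roster line");
print "Unparseable line gave ", scalar(@nothing), " fields\n";
```

Checking for the empty list before printing is a simple way to skip headers or blank lines in real input.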

Page 35: Perl Programming

The Result

"XXXX,ROBERT",4653,VCSG-4,[email protected]
"XXXXXX,ADAM",3976,VCSG-4,[email protected]
"XXXXXXX,EDWARD",4637,VCSG-2,[email protected]
"XXXXXXX,JOHN",1906,VCSG-4,@cs.rit.edu
"XXXX,DERRICK",6432,VCSG-2,[email protected]
"XXXXXXXXX,JOHN",5034,VCSG-2,[email protected]
"XXX,JASON",9020,VCSG-2,[email protected]
"XXXXXXX,SARAH",7610,VCSG-2,[email protected]
"XXXXXXXX,CHRISTOPHER",6309,VCSG-2,[email protected]
"XXXXXXX,MICHAEL",8195,VCSG-2,[email protected]
"XXXXXX,SHAUN",9925,VCSG-2,[email protected]
"XXXXXX,WILLIAM",2568,VCSG-2,[email protected]
"XXXXXX,PATRICK",2335,EECC-2,[email protected]