
Computer Programming for Biologists

Class 9

Dec 4th, 2014

Karsten Hokamp

http://bioinf.gen.tcd.ie/GE3M25

Overview

• mock exam

• revision

• variable scope

• extensions

• file handles

Mock Exam

http://bioinf.gen.tcd.ie/GE3M25/programming/exam

Revision - Subroutines

my $prot = &translate($seq);    # call

sub translate {                 # definition
    my $seq = shift @_;         # parameters
    return $prot;               # return value(s)
}
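The call / definition / parameters / return pattern can be tried out with a small self-contained sketch. The codon table below is a deliberately tiny, partial table chosen for illustration (an assumption, not the table used in the course project); codons missing from it are reported as 'X'.

```perl
use strict;
use warnings;

# partial codon table, for illustration only (assumption)
my %codon_table = (
    ATG => 'M', GGC => 'G', TCT => 'S', GCA => 'A', TAA => '*',
);

sub translate {                          # definition
    my $seq  = shift @_;                 # parameter
    my $prot = '';                       # local result variable

    # walk along the sequence in steps of three bases
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = uc substr($seq, $i, 3);
        if (exists($codon_table{$codon})) {
            $prot .= $codon_table{$codon};
        }
        else {
            $prot .= 'X';                # codon not in the partial table
        }
    }
    return $prot;                        # return value
}

my $seq  = 'ATGGGCTCTGCATAA';
my $prot = translate($seq);              # call
print "$prot\n";
```

The $prot inside the subroutine and the $prot in the main part are two separate variables; the value only travels between them through return.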

Scope

The area of the script in which a variable is visible.

Different blocks (each with its own namespace) are defined by:

• main program

• subroutines

• loops

• branches

Scope: main part and blocks

my $global_1;                       # main part
…
while (my $input = <>) {            # block (loop)
    statement1;
}
if (condition) {                    # block (branch)
    my $local_1 = 'xxx';
}

sub subroutine {                    # block (subroutine)
    my $local_1;

    foreach my $nuc (@bases) {      # nested block (loop)
        statement2;
    }
}
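A runnable version of the block layout above may help to see the namespaces in action. This is a minimal sketch; the variable contents and the return value of the subroutine are made up for the demonstration.

```perl
use strict;
use warnings;

my $global_1 = 'main';                 # visible in the whole script
my @bases    = ('A', 'C', 'G', 'T');

if (1) {
    my $local_1 = 'xxx';               # visible only inside this branch
    print "in branch: $local_1 (global: $global_1)\n";
}
# $local_1 no longer exists here

sub subroutine {
    my $local_1 = 'yyy';               # a different variable with the same name
    my $seen    = '';

    foreach my $nuc (@bases) {         # $nuc exists only inside the loop
        $seen .= $nuc;
    }
    return "$local_1:$seen";
}

my $result = subroutine();
print "from subroutine: $result\n";
```

Uncommenting a line such as `print $local_1;` in the main part would stop the script under `use strict`, because no variable of that name is visible there.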

Scope

Tip: Keep local variables within subroutines

Explicitly pass content between the main part and subs, e.g.:

$protein = &translate($seq);

• $seq: value passed into the subroutine

• $protein: value returned from the subroutine

This avoids accidentally changing global variables.

Scope

Wrong:

# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        my $header = $1;
    }
}

print "sequence ID: $header\n";

Error message: Global symbol "$header" requires explicit package name

$header is declared inside the if-block, so it no longer exists when the print statement is reached.

Scope

Correct:

# initialize global variable
my $header = '';

# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        $header = $1;
    }
}

print "sequence ID: $header\n";
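To test the corrected version without preparing an input file, the loop can read from a string instead. The sketch below assumes an in-memory filehandle (opening a reference to a scalar) and a made-up FASTA record; both are stand-ins for the real input stream.

```perl
use strict;
use warnings;

# made-up FASTA record used instead of real input (assumption)
my $fasta = ">seq1 test sequence\nACGTACGT\nGGCC\n";

# in-memory filehandle: read from the string as if it were a file
open(my $in, '<', \$fasta) or die "Can't read from string: $!";

# initialize the variable in the main part, so it outlives the loop
my $header = '';

while (my $input = <$in>) {
    if ($input =~ /^>(.+)/) {
        $header = $1;                  # assign, do not redeclare with "my"
    }
}
close $in;

print "sequence ID: $header\n";
```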

Course project: common errors (scope)

Wrong (different variables):

my $dna = '';

# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;

    # append to sequence string
    my $dna .= $input;
}

The "my" inside the loop declares a new, empty $dna on every pass, so the $dna of the main part stays empty.

Correct (same variable):

my $dna = '';

# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;

    # append to sequence string
    $dna .= $input;
}

Course project: common errors (arrangement)

# print output in chunks of 60 bp width
while ($dna) {
    $out = substr $dna, 0, 60, '';
    print "$i $out\n";
    $i += length($out);
}

# change string to array:
my @chars = split //, $dna;

Problem: the four-argument substr removes each chunk from $dna, so the loop empties $dna and @chars ends up empty. Split the string before the loop, or print the chunks from a copy.
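One possible rearrangement is sketched below: the array is built while $dna is still intact, and the chunks are cut from a copy. The test sequence and the names $rest and @chunks are illustrative assumptions.

```perl
use strict;
use warnings;

my $dna = 'ACGT' x 40;                 # 160 bp test sequence (made up)

# change string to array first, while $dna still holds the sequence
my @chars = split //, $dna;

# print output in chunks of 60 bp width, working on a copy
my $rest = $dna;
my $i    = 1;
my @chunks;
while (length $rest) {                 # length test also copes with a remainder of '0'
    my $out = substr $rest, 0, 60, ''; # cuts the chunk out of $rest
    print "$i $out\n";
    push @chunks, $out;
    $i += length($out);
}

print scalar(@chars), " characters in array\n";
```

Testing `length $rest` instead of `$rest` is a small extra safeguard and not part of the original slide.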

Course project: considerations

Order is important. This sequence of steps:

# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;

# translate
my $protein = &translate($dna);

gives a different result from this one:

# translate
my $protein = &translate($dna);

# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;
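The effect of the order can be made visible with a short sketch. Wrapping the two reverse-complement statements in a subroutine named revcomp is an assumption made here for illustration; the test sequence is made up.

```perl
use strict;
use warnings;

sub revcomp {
    my $dna = shift @_;
    my $rc  = reverse($dna);           # scalar context: reverses the string
    $rc =~ tr/ACTG/TGAC/;              # complement each base
    return $rc;
}

my $dna = 'ATGGCC';
my $rc  = revcomp($dna);

# the first codon differs depending on which strand is passed on
my $first_forward = substr($dna, 0, 3);
my $first_reverse = substr($rc,  0, 3);

print "forward strand: $dna (first codon $first_forward)\n";
print "reverse strand: $rc (first codon $first_reverse)\n";
```

Because the subroutine works on its own copy, the $dna of the main part is left unchanged and both strands remain available for later steps.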

Course project: considerations

Make actions optional:

# define variables:
my $do_revcomp = '';
my $do_composition = '';
my $do_translate = '';

# read sequence
my $dna = '';
…

# calculate GC content
if ($do_composition) {
    &composition($dna);
}

# form the reverse complement
if ($do_revcomp) {
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
}

# translate into protein
if ($do_translate) {
    &translate($dna);
}

Setting a variable to a true value switches the corresponding action on, e.g.:

my $do_revcomp = '1';

Course project

Work on your course project (sequanto.pl):

1. fix bugs

2. add "choice" variables at the top

3. move code blocks into subroutines (GC-content, composition)

Control through options

The Perl module Getopt::Long allows processing of command line options.

Control through options

$ man Getopt::Long

NAME
    Getopt::Long - Extended processing of command line options

SYNOPSIS
    use Getopt::Long;
    my $data = "file.dat";
    my $length = 24;
    my $verbose = '';
    $result = GetOptions ("length=i" => \$length,    # numeric
                          "file=s"   => \$data,      # string
                          "verbose"  => \$verbose);  # flag

In each option specification:

• "length", "file", "verbose": name of the parameter

• "=i", "=s": type of the argument (integer, string); a name without a type is a flag

• \$length, \$data, \$verbose: reference to the variable that receives the value

Control through options

Command line parameters (with arguments):

perl test.pl -verbose -length 20 -file input.txt

$verbose set to '1', $length set to '20', $data set to 'input.txt'

Reorder:

perl test.pl -file input.txt -length 20 -verbose

Long version:

perl test.pl --verbose --length=20 --file=input.txt

Short version (unique abbreviations of the option names):

perl test.pl -v -l 20 -f input.txt

Control through options

Try this in your script:

# 1. Import module
use Getopt::Long;

# 2. Initialise variables
my $do_translation = '';
my $do_revcomp = '';
my $do_composition = '';

# 3. Call function: define flags and associate them with referenced variables
&GetOptions ("translate" => \$do_translation,
             "revcomp"   => \$do_revcomp,
             "gc"        => \$do_composition,
);

To allow the following execution:

perl sequanto.pl -gc -revcomp test.fa
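The three steps (import, initialise, call) can be tested without typing a command line by filling @ARGV by hand before calling GetOptions. This is a minimal sketch: the option name "gc", the variable $do_composition and the usage message are assumptions chosen to match the example execution.

```perl
use strict;
use warnings;
use Getopt::Long;                      # 1. import module

# pretend the script was started as: perl sequanto.pl -gc -revcomp test.fa
@ARGV = ('-gc', '-revcomp', 'test.fa');

# 2. initialise variables
my $do_translation = '';
my $do_revcomp     = '';
my $do_composition = '';

# 3. call function; it returns false if an unknown option was given
GetOptions(
    'translate' => \$do_translation,
    'revcomp'   => \$do_revcomp,
    'gc'        => \$do_composition,
) or die "Usage: $0 [-gc] [-revcomp] [-translate] file\n";

print "GC content requested\n"         if $do_composition;
print "reverse complement requested\n" if $do_revcomp;
print "translation requested\n"        if $do_translation;

# options are removed from @ARGV, the file name is left behind
print "remaining arguments: @ARGV\n";
```

Since GetOptions removes the options it recognises, a following `while (<>)` loop or an explicit open on $ARGV[0] only sees the file name.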

Data Input/Output: Redirect output

Program prints output to screen:

$ translate.pl seq.fa
MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…

Redirect into file:

$ translate.pl seq.fa > seq.aa

Append to file:

$ translate.pl seq.fa >> seq.aa

Data Input/Output: Filehandles

Reading from the default input stream (the files named on the command line, or STDIN if there are none):

my $in = <>;

Use a filehandle to read input from a specific file:

open (IN, 'input.txt');        # open file for reading; IN is the filehandle

while (my $in = <IN>) { … }    # read content line by line

close IN;                      # close filehandle when finished

Data Input/Output: Filehandles

Syntax:

open (FH, filename);        # open file for reading
open (FH, "< filename");    # open file for reading
open (FH, "> filename");    # open file for writing
open (FH, ">> filename");   # append to file

close FH;                   # flushes the buffer and closes the filehandle

Write and append mode will create the file if necessary.

Write mode will empty the file first.

Data Input/Output: Writing to files

$file_name = 'results.txt';

if ($write_modus eq 'append') {
    # append to file (creates file if necessary)
    open (OUT, ">>$file_name");
} else {
    # normal write (erases content if file exists)
    open (OUT, ">$file_name");
}

print OUT 'some text';

close OUT; # output might not appear until FH is closed!
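The difference between write and append mode can be checked with a short sketch that writes to a temporary file and reads it back. Note the swapped-in technique: it uses the three-argument form of open with lexical filehandles ($out, $app, $in), a widely recommended alternative to the two-argument form shown on the slides. The temporary file from File::Temp is an assumption so that no real file is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# temporary file, removed automatically when the script ends
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# write mode: erases content if the file exists
open(my $out, '>', $file_name) or die "Can't write to $file_name: $!";
print $out "first line\n";
close $out;

# append mode: keeps the existing content
open(my $app, '>>', $file_name) or die "Can't append to $file_name: $!";
print $app "second line\n";
close $app;

# read the file back to see what ended up in it
open(my $in, '<', $file_name) or die "Can't read from $file_name: $!";
my @lines = <$in>;
close $in;

print @lines;
```

Opening the file in write mode a second time instead of append mode would leave only the second line in the file.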

Data Input/Output: Error check!

Always test if an important operation worked out:

open (IN, $file_name)
    or die "Can't read from $file_name: $!";

open (OUT, ">>$file_out")
    or die "Can't append to $file_out: $!";

# Note: special variable $! contains error message
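To see what such a check reports, the sketch below tries to open a file that is assumed not to exist and catches the resulting die with eval, so that the message can be inspected instead of ending the script. The path is hypothetical, and the eval wrapper is only there for the demonstration; in a normal script the die would simply stop the program.

```perl
use strict;
use warnings;

# hypothetical path, assumed not to exist
my $file_name = 'no_such_directory_for_this_sketch/input.txt';

# eval catches the die; the trailing "\n" suppresses the line number
my $ok = eval {
    open(my $in, '<', $file_name)
        or die "Can't read from $file_name: $!\n";
    close $in;
    1;
};

# $@ holds the message passed to die, including the text from $!
my $message = $ok ? '' : $@;
print "open failed: $message" unless $ok;
```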

Data Input/Output: Reading from Filehandle

One or more file names are specified after the program; loop over each argument:

foreach my $file (@ARGV) {                # special variable @ARGV
    open (IN, $file)                      # open filehandle
        or die "Can't read from $file: $!";
    while (my $in = <IN>) {               # read file line by line
        # do something
    }
    close IN;                             # close filehandle
}
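The loop over @ARGV can be tried without real input files. In the sketch below two temporary files stand in for the files named on the command line, and "do something" is replaced by counting lines; both choices are assumptions for illustration, as is the use of lexical filehandles with three-argument open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# create two small temporary files as stand-ins for real input
my @files;
foreach my $content ("line a\nline b\n", "line c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close $fh;
    push @files, $name;
}

# pretend the file names were given on the command line
@ARGV = @files;

my $total = 0;
foreach my $file (@ARGV) {
    open(my $in, '<', $file) or die "Can't read from $file: $!";
    my $count = 0;
    while (my $line = <$in>) {         # read file line by line
        $count++;
    }
    close $in;
    print "$file: $count lines\n";
    $total += $count;
}

print "total: $total lines\n";
```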

Reading sequence from a file

$ perl split.pl gcctg test.fa

Note: two command line arguments!

With an explicit filehandle:

# read pattern and file name
my ($pattern, $file) = @ARGV;
# or: my $pattern = shift @ARGV; my $file = shift @ARGV;

open (IN, $file) or die "$!";
my $sequence = '';
while (<IN>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
close IN;

With the default input stream (the file name left in @ARGV is opened automatically):

# get pattern
my $pattern = shift @ARGV;

my $sequence = '';
while (<>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
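The reading step can be packaged as a subroutine and tested on a small made-up FASTA file. What split.pl does with the pattern is not shown on the slides; the sketch assumes that the sequence is cut at every case-insensitive occurrence of the pattern. The subroutine name read_sequence, the temporary file and the test sequence are likewise assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# read a FASTA file and return the sequence without header lines
sub read_sequence {
    my $file = shift @_;
    open(my $in, '<', $file) or die "Can't read from $file: $!";
    my $sequence = '';
    while (my $line = <$in>) {
        next if ($line =~ /^>/);       # skip header line
        chomp $line;
        $sequence .= $line;
    }
    close $in;
    return $sequence;
}

# made-up test file standing in for test.fa
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh ">test sequence\nAAAGCCTGTT\nTTGCCTGCC\n";
close $fh;

my $pattern  = 'gcctg';
my $sequence = read_sequence($file);

# assumed behaviour: cut the sequence at each occurrence of the pattern
my @fragments = split /$pattern/i, $sequence;

print "sequence: $sequence\n";
print "fragment: $_\n" foreach @fragments;
```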

Course project

Work on your course project (sequanto.pl):

1. Add explicit opening of file-handle

2. Store translated sequence into a new file

Exam

Reminder

Exam: Thu, Dec 11th, 11 am - 1 pm