Computer Programming for Biologists
Class 9
Dec 4th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25
Overview
• mock exam
• revision
• variable scope
• extensions
• file handles
Mock Exam
http://bioinf.gen.tcd.ie/GE3M25/programming/exam
Revision - Subroutines
my $prot = &translate($seq);    # call
sub translate {                 # definition
    my $seq = shift @_;         # parameters
    …
    return $prot;               # return value(s)
}
scope
The area of the script in which a variable is visible.
Different blocks are defined by:
• main program
• subroutines
• loops
• branches
Each block forms its own namespace for variables declared with my.
scope
my $global_1;                        # main part
…
while (my $input = <>) {             # block
    statement1;
}
if (condition) {                     # block
    my $local_1 = 'xxx';
}
sub subroutine {                     # block
    my $local_1;
    foreach my $nuc (@bases) {       # block
        statement2;
    }
}
scope
Tip: Keep local variables within subroutines and
explicitly pass content between main part and subs,
e.g.: $protein = &translate($seq);
• $seq: value passed into subroutine
• $protein: value returned from subroutine
This avoids accidentally changing global variables.
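The tip above can be sketched as follows. The sub revcomp is a hypothetical helper chosen for illustration (the slides use translate, whose body is not shown); the point is only that the sub works on its own local copy and hands the result back explicitly.

```perl
use strict;
use warnings;

# revcomp is a hypothetical helper for illustration only.
sub revcomp {
    my $seq = shift @_;          # local copy of the value passed in
    $seq = reverse $seq;         # changes only the local copy
    $seq =~ tr/ACGT/TGCA/;
    return $seq;                 # value returned to the caller
}

my $dna    = 'ATGC';
my $result = revcomp($dna);      # explicit hand-over in both directions

print "input:  $dna\n";          # the variable in the main part is unchanged
print "output: $result\n";
```

Because $seq is declared with my inside the sub, the sequence in the main part cannot be modified by accident.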
scope
Wrong:
# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        my $header = $1;    # only visible inside the if block
    }
}
print "sequence ID: $header\n";
Error message: Global symbol "$header" requires explicit package name
scope
Correct:
# initialize global variable
my $header = '';
# extract header
while (my $input = <>) {
    if ($input =~ /^>(.+)/) {
        $header = $1;
    }
}
print "sequence ID: $header\n";
course project
common errors: (scope)
my $dna = '';
# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;
    # append to sequence string
    my $dna .= $input;
}
different variables: the $dna declared with my inside the loop is not the $dna declared above it
course project
common errors: (scope)
my $dna = '';
# read input
while (my $input = <>) {
    # remove line ending
    chomp $input;
    # append to sequence string
    $dna .= $input;
}
same variable: without my, the loop appends to the $dna declared above it
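A small sketch of the difference, assuming a fixed list of lines in place of input read with <> so that it runs without a file. The variable names $shadowed and @lines are made up for this example.

```perl
use strict;
use warnings;

my @lines = ("ATG\n", "GCC\n");    # stands in for lines read with <>

# Wrong: "my" inside the loop creates a new variable on every pass
my $shadowed = '';
foreach my $input (@lines) {
    chomp $input;
    my $shadowed .= $input;        # not the $shadowed declared above
}

# Correct: no "my", so the outer variable is extended
my $dna = '';
foreach my $input (@lines) {
    chomp $input;
    $dna .= $input;
}

print "with my:    '$shadowed'\n";
print "without my: '$dna'\n";
```

The first loop leaves the outer variable empty, which is why the course project script would report an empty sequence.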
course project
common errors: (arrangement)
# print output in chunks of 60 bp width
while ($dna) {
    $out = substr $dna, 0, 60, '';    # empties $dna
    print "$i $out\n";
    $i += length($out);
}
# change string to array:
my @chars = split //, $dna;           # $dna is already empty here
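One possible way around this, sketched under the assumption that the sequence is still needed after printing: consume a copy instead of the original. The names $rest and @chunks are introduced for this example only.

```perl
use strict;
use warnings;

my $dna = 'ACGT' x 40;           # 160 bp example sequence

my @chunks;
my $rest = $dna;                 # copy that may be consumed
while (length $rest) {
    # 4-argument substr returns the first 60 characters
    # and removes them from $rest
    push @chunks, substr($rest, 0, 60, '');
}

my @chars = split //, $dna;      # $dna is still intact

print scalar(@chunks), " chunks, ", scalar(@chars), " characters\n";
```

Alternatively, the split into characters can simply be moved in front of the printing loop.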
course project
considerations: order is important
Reverse complement first, then translate:
# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;
# translate
my $protein = &translate($dna);
This is not the same as translating first:
# translate
my $protein = &translate($dna);
# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;
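To see the effect of the order on a concrete sequence, here is a minimal sketch. The codon table is deliberately tiny and covers only the four codons in this example; it is an assumption for illustration, not the translate sub from the course.

```perl
use strict;
use warnings;

# Tiny codon table for this example only (a real table has 64 entries)
my %codon = (ATG => 'M', GCC => 'A', GGC => 'G', CAT => 'H');

sub translate {
    my $seq  = shift @_;
    my $prot = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $triplet = substr($seq, $i, 3);
        $prot .= exists $codon{$triplet} ? $codon{$triplet} : 'X';
    }
    return $prot;
}

sub revcomp {
    my $seq = reverse shift @_;
    $seq =~ tr/ACTG/TGAC/;
    return $seq;
}

my $dna = 'ATGGCC';
my $forward  = translate($dna);             # translated before reversing
my $backward = translate(revcomp($dna));    # reversed, then translated

print "forward strand: $forward\n";
print "reverse strand: $backward\n";
```

The two proteins differ, so the script has to perform the steps in the order the user expects.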
course project
considerations: make actions optional
# define variables:
my $do_revcomp = '';
my $do_composition = '';
my $do_translate = '';
# read sequence
my $dna = '';
…
# calculate GC content
if ($do_composition) {
    &composition($dna);
}
# form the reverse complement
if ($do_revcomp) {
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
}
# translate the protein
if ($do_translate) {
    &translate($dna);
}
To switch an action on, set its variable to a true value, e.g.: my $do_revcomp = '1';
course project
Work on your course project (sequanto.pl):
1. fix bugs
2. add "choice" variables at the top
3. move code blocks into subroutines (GC-content, composition)
Control through options
The Perl module Getopt::Long allows processing of command line options.
Control through options
$ man Getopt::Long
NAME
    Getopt::Long - Extended processing of command line options
SYNOPSIS
    use Getopt::Long;
    my $data = "file.dat";
    my $length = 24;
    my $verbose = '';
    $result = GetOptions ("length=i" => \$length,     # numeric
                          "file=s"   => \$data,       # string
                          "verbose"  => \$verbose);   # flag
In each option specification:
• length, file, verbose: name of parameter
• =i, =s: type of argument (integer, string; a flag takes none)
• \$length, \$data, \$verbose: reference to the variable that receives the value
Control through options
Command line parameters (with arguments):
perl test.pl -verbose -length 20 -file input.txt
$verbose set to '1', $length set to '20', $data set to 'input.txt'
Reorder:
perl test.pl -file input.txt -length 20 -verbose
Long version:
perl test.pl --verbose --length=20 --file=input.txt
Short version:
perl test.pl -v -l 20 -f input.txt
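The four call styles can be compared in one script. This is a sketch only: parse_args is a hypothetical wrapper, and it uses GetOptionsFromArray (also provided by Getopt::Long) so that argument lists can be passed in directly instead of being typed on the command line. It relies on the module's default configuration, in which unique abbreviations such as -v are accepted.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_args is a hypothetical wrapper for trying several argument lists
sub parse_args {
    my @args = @_;
    my ($data, $length, $verbose) = ('file.dat', 24, '');
    GetOptionsFromArray(\@args,
        'length=i' => \$length,     # numeric
        'file=s'   => \$data,       # string
        'verbose'  => \$verbose,    # flag
    ) or die "Error in command line arguments\n";
    return "$verbose $length $data";
}

my @results = (
    parse_args(qw(-verbose -length 20 -file input.txt)),
    parse_args(qw(-file input.txt -length 20 -verbose)),
    parse_args(qw(--verbose --length=20 --file=input.txt)),
    parse_args(qw(-v -l 20 -f input.txt)),
);

print "$_\n" foreach @results;
```

In a normal script, GetOptions reads @ARGV itself, so the wrapper is not needed.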
Control through options
Try this in your script:
use Getopt::Long;                                # 1. Import module
my $do_translation = '';                         # 2. Initialise variables
my $do_revcomp = '';
&GetOptions ("translate" => \$do_translation,    # 3. Call function: define flags and
             "revcomp"   => \$do_revcomp,        #    associate with referenced variables
            );
To allow the following execution:
perl sequanto.pl -gc -revcomp test.fa
Note: the -gc flag needs its own entry in the GetOptions call, defined like the two flags above.
Data Input/Output - Redirect output
Program prints output to screen:
$ translate.pl seq.fa
MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…
Redirect into file:
$ translate.pl seq.fa > seq.aa
Append to file:
$ translate.pl seq.fa >> seq.aa
Data Input/Output - Filehandles
Reading from STDIN, the default input stream (<> reads from files named on the command line instead, if there are any):
my $in = <>;
Use a filehandle to read input from a specific file:
open (IN, 'input.txt');          # open file for reading; IN is the filehandle
while (my $in = <IN>) { … }      # read content line by line
close IN;                        # close filehandle when finished
Data Input/Output - Filehandles
Syntax:
open (FH, filename);         # open file for reading
open (FH, "< filename");     # open file for reading
open (FH, "> filename");     # open file for writing
open (FH, ">> filename");    # append to file
close FH;                    # empties buffer
Write and append mode will create files if necessary
Write mode will empty file first
Data Input/Output - Writing to files
$file_name = 'results.txt';
if ($write_modus eq 'append') {
    # append to file (creates file if necessary)
    open (OUT, ">>$file_name");
} else {
    # normal write (erases content if file exists)
    open (OUT, ">$file_name");
}
print OUT 'some text';
close OUT;    # output might not appear until FH is closed!
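The difference between write and append mode can be checked with a short script. Note that this sketch departs from the slides in two ways: it uses the three-argument form of open with lexical filehandles ($out, $in) rather than bareword handles, and it writes into a temporary directory created with File::Temp so that no existing file is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file_name = "$dir/results.txt";

# write mode: creates the file, or empties it if it already exists
open(my $out, '>', $file_name) or die "Can't write to $file_name: $!";
print $out "first line\n";
close $out;

# append mode: adds to the end of the existing content
open($out, '>>', $file_name) or die "Can't append to $file_name: $!";
print $out "second line\n";
close $out;

# read the file back to check the result
open(my $in, '<', $file_name) or die "Can't read from $file_name: $!";
my @lines = <$in>;
close $in;

print @lines;
```

Opening the file a second time with '>' instead of '>>' would leave only the second line in it.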
Data Input/Output - Error check!
Always test if an important operation worked out:
open (IN, $file_name)
    or die "Can't read from $file_name: $!";
open (OUT, ">>$file_out")
    or die "Can't append to $file_out: $!";
# Note: special variable $! contains error message
Data Input/Output - Reading from Filehandle
One or more file names are specified after the program,
loop over each argument:
foreach my $file (@ARGV) {        # special variable @ARGV
    open (IN, $file) or die;      # open filehandle
    while (my $in = <IN>) {       # read file line by line
        # do something
    }
    close IN;                     # close filehandle
}
Reading sequence from a file
$ perl split.pl gcctg test.fa
Note: two command line arguments!
With an explicit filehandle:
# read pattern and file name
my ($pattern, $file) = @ARGV;
# or: my $pattern = shift @ARGV; my $file = shift @ARGV;
open (IN, $file) or die "$!";
my $sequence = '';
while (<IN>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
close IN;
With <>, which reads from the file names left in @ARGV:
# get pattern
my $pattern = shift @ARGV;
my $sequence = '';
while (<>) {
    next if (/^>/);
    chomp;
    $sequence .= $_;
}
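A self-contained sketch of the reading step, under several assumptions: the FASTA content is made up and written to a temporary file that stands in for test.fa, the pattern is set in the script instead of being taken from @ARGV, and counting the matches is only an illustrative use of the pattern (the slides do not show what split.pl does with it).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# create a small FASTA file to read (stands in for test.fa)
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp ">seq1 example\nATGGCC\nTGAGCC\n";
close $tmp;

my $pattern = 'gcc';              # would come from @ARGV in the script

open(my $in, '<', $file) or die "Can't read from $file: $!";
my $sequence = '';
while (my $line = <$in>) {
    next if $line =~ /^>/;        # skip the header line
    chomp $line;                  # remove line ending
    $sequence .= $line;           # append to sequence string
}
close $in;

# count case-insensitive matches of the pattern (illustrative use only)
my $count = () = $sequence =~ /\Q$pattern\E/gi;
print "$sequence contains $pattern $count time(s)\n";
```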
course project
Work on your course project (sequanto.pl):
1. Add explicit opening of file-handle
2. Store translated sequence into a new file
Exam
Reminder
Exam: Thu, Dec 11th, 11 - 1 pm