Page 1: Computer Programming for Biologists

Computer Programming for Biologists

Class 9

Dec 4th, 2014

Karsten Hokamp

http://bioinf.gen.tcd.ie/GE3M25

Page 2: Computer Programming for Biologists

Overview

• mock exam
• revision
• variable scope
• extensions
• file handles

Page 3: Computer Programming for Biologists

Mock Exam

http://bioinf.gen.tcd.ie/GE3M25/programming/exam

Page 4: Computer Programming for Biologists

Revision - Subroutines

    my $prot = &translate($seq);    # call

    sub translate {                 # definition
        my $seq  = shift @_;        # parameters
        my $prot = '';              # result, built up in the body
        …
        return $prot;               # return value(s)
    }
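
The four parts named on this slide (call, definition, parameters, return value) can be seen working together in a minimal sketch. The three-entry codon table is a hypothetical stand-in for the full genetic code, and marking unknown codons as 'X' is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny codon table (assumption for illustration)
my %codon_table = (ATG => 'M', GGC => 'G', TAA => '*');

sub translate {                          # definition
    my $seq  = shift @_;                 # parameter: the DNA string
    my $prot = '';                       # local result variable
    # step through the sequence one complete codon (3 bases) at a time
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr $seq, $i, 3;
        $prot .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $prot;                        # return value
}

my $prot = &translate('ATGGGCTAA');      # call
print "$prot\n";
```

Because $seq and $prot inside the subroutine are declared with my, they are separate from any variables of the same name in the main part.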

Page 5: Computer Programming for Biologists

scope

The area of the script in which a variable is visible.

Different blocks are defined by:
• main program
• subroutines
• loops
• branches

Each block has its own namespace: a variable declared with my inside a block is visible only within that block.

Page 6: Computer Programming for Biologists

scope

    # main part
    my $global_1;
    …
    while (my $input = <>) {
        statement1;
    }
    if (condition) {
        my $local_1 = 'xxx';
    }

    # subroutine
    sub subroutine {
        my $local_1;
        foreach my $nuc (@bases) {
            statement2;
        }
    }

Page 7: Computer Programming for Biologists

scope

Blocks:

    my $global_1;
    …
    while (my $input = <>) {          # block: while loop
        statement1;
    }
    if (condition) {                  # block: if branch
        my $local_1 = 'xxx';
    }

    sub subroutine {                  # block: subroutine
        my $local_1;
        foreach my $nuc (@bases) {    # block: foreach loop (nested)
            statement2;
        }
    }

Page 8: Computer Programming for Biologists

scope

Tip: Keep local variables within subroutines.

Explicitly pass content between the main part and subroutines, e.g.:

    $protein = &translate($seq);

    $seq:     value passed into subroutine
    $protein: value returned from subroutine

This avoids accidentally changing global variables.
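
A short sketch of the tip above: the subroutine works on its own copy of the value passed in and hands the result back, so the variable in the main part is left unchanged. The helper name revcomp and the test sequence are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement without touching
# any variable outside the subroutine.
sub revcomp {
    my $seq = shift @_;          # local copy of the value passed in
    $seq = reverse $seq;         # scalar assignment, so the string is reversed
    $seq =~ tr/ACTG/TGAC/;       # complement each base
    return $seq;                 # value handed back to the caller
}

my $dna = 'AAGCT';
my $rc  = &revcomp($dna);

print "original:           $dna\n";
print "reverse complement: $rc\n";
```

Had the subroutine modified a global $dna directly, every later step in the main part would silently work on the reverse strand.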

Page 9: Computer Programming for Biologists

scope

Wrong:

    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            my $header = $1;    # visible only inside this if block
        }
    }

    print "sequence ID: $header\n";

Error message:

    Global symbol "$header" requires explicit package name

Page 10: Computer Programming for Biologists

scope

Correct:

    # initialize global variable
    my $header = '';

    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            $header = $1;
        }
    }

    print "sequence ID: $header\n";
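
The corrected pattern can be tried without an input file. In this sketch a FASTA record held in a string is opened as an in-memory filehandle; the record itself and the use of a lexical filehandle ($fh) with three-argument open are assumptions added for the example, not part of the slide.

```perl
use strict;
use warnings;

# Assumption: a two-line FASTA record in a string stands in for real input
my $fasta = ">seq1 test sequence\nACGTACGT\n";
open(my $fh, '<', \$fasta) or die "Can't open in-memory file: $!";

my $header = '';                  # declared outside the loop
while (my $input = <$fh>) {
    if ($input =~ /^>(.+)/) {
        $header = $1;             # assigns to the outer $header
    }
}
close $fh;

print "sequence ID: $header\n";
```

Putting my in front of $header inside the if block would bring back the error from the previous slide under use strict.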

Page 12: Computer Programming for Biologists

course project

common errors: (scope)

    my $dna = '';

    # read input
    while (my $input = <>) {

        # remove line ending
        chomp $input;

        # append to sequence string
        my $dna .= $input;    # wrong: "my" declares a new $dna inside the loop
    }

different variables: the $dna inside the loop is not the $dna declared at the top, which stays empty.

Page 13: Computer Programming for Biologists

course project

common errors: (scope)

    my $dna = '';

    # read input
    while (my $input = <>) {

        # remove line ending
        chomp $input;

        # append to sequence string
        $dna .= $input;       # correct: no "my", so the outer $dna is used
    }

same variable: the loop now appends to the $dna declared at the top.
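
The difference between the two versions can be checked directly. This sketch runs both loops over the same hypothetical input lines; the variable names $shadowed and $shared are made up for the comparison.

```perl
use strict;
use warnings;

my @lines = ('ACGT', 'TTGA');     # hypothetical input lines, already chomped

# Version 1: "my" inside the loop creates a new variable on every pass
my $shadowed = '';
foreach my $input (@lines) {
    my $shadowed .= $input;       # loop-local copy, discarded at the closing brace
}

# Version 2: no "my" inside the loop, so the outer variable is extended
my $shared = '';
foreach my $input (@lines) {
    $shared .= $input;
}

print "with inner my:    '$shadowed'\n";
print "without inner my: '$shared'\n";
```

Perl accepts version 1 without complaint, which is why this error is easy to miss.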

Page 14: Computer Programming for Biologists

course project

common errors: (arrangement)

    # print output in chunks of 60 bp width
    while ($dna) {
        $out = substr $dna, 0, 60, '';    # removes the chunk from $dna
        print "$i $out\n";
        $i += length($out);
    }

    # change string to array:
    my @chars = split //, $dna;           # too late: $dna is empty by now

The loop empties $dna, so the later split returns an empty list.
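
One possible way around this, sketched under the assumption that the sequence is still needed afterwards, is to let the printing loop consume a copy. The 160 bp test sequence and the variable name $rest are invented for the example; moving the split in front of the loop would work just as well.

```perl
use strict;
use warnings;

my $dna = 'ACGT' x 40;                    # hypothetical 160 bp sequence

# Work on a copy so that $dna survives the chunked printing
my $rest = $dna;
my $i    = 1;                             # position of the first base on each line
while (length $rest) {
    my $out = substr $rest, 0, 60, '';    # cut the first 60 characters off $rest
    print "$i $out\n";
    $i += length($out);
}

# $dna is untouched, so it can still be split into single characters
my @chars = split //, $dna;
print scalar(@chars), " characters\n";
```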

Page 15: Computer Programming for Biologists

course project

considerations: order is important

Reverse complement first, then translate (the protein comes from the reverse strand):

    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;

    # translate
    my $protein = &translate($dna);

Translate first, then reverse complement (the protein comes from the original strand):

    # translate
    my $protein = &translate($dna);

    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
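
To make the effect of the ordering visible, this sketch translates the same variable before and after the reverse complement step. The four-entry codon table, the 'X' placeholder for unknown codons, and the six-base test sequence are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical four-entry codon table; unknown codons become 'X'
my %codon_table = (ATG => 'M', AAA => 'K', TTT => 'F', CAT => 'H');

sub translate {
    my $seq  = shift @_;
    my $prot = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr $seq, $i, 3;
        $prot .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $prot;
}

my $dna = 'ATGAAA';

# translate before the reverse complement step: original strand
my $forward_protein = &translate($dna);

# form the reverse complement
$dna = reverse($dna);
$dna =~ tr/ACTG/TGAC/;

# the same call after the step sees the reverse strand
my $reverse_protein = &translate($dna);

print "original strand: $forward_protein\n";
print "reverse strand:  $reverse_protein\n";
```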

Page 16: Computer Programming for Biologists

course project

considerations: make actions optional

    # define variables:
    my $do_revcomp     = '';
    my $do_composition = '';
    my $do_translate   = '';

    # read sequence
    my $dna = '';
    …

    # calculate GC content
    if ($do_composition) {
        &composition($dna);
    }

    # form the reverse complement
    if ($do_revcomp) {
        $dna = reverse($dna);
        $dna =~ tr/ACTG/TGAC/;
    }

    # translate into protein
    if ($do_translate) {
        &translate($dna);
    }

Page 17: Computer Programming for Biologists

course project

considerations: make actions optional

    # define variables:
    my $do_revcomp     = '1';    # switched on
    my $do_composition = '';
    my $do_translate   = '';

    # read sequence
    my $dna = '';
    …

    # calculate GC content
    if ($do_composition) {
        &composition($dna);
    }

    # form the reverse complement
    if ($do_revcomp) {
        $dna = reverse($dna);
        $dna =~ tr/ACTG/TGAC/;
    }

    # translate into protein
    if ($do_translate) {
        &translate($dna);
    }

With $do_revcomp set to '1', only the reverse complement step is carried out; the empty string counts as false, so the other two steps are skipped.

Page 18: Computer Programming for Biologists

course project

Work on your course project (sequanto.pl):

1. fix bugs
2. add "choice" variables at the top
3. move code blocks into subroutines (GC-content, composition)

Page 19: Computer Programming for Biologists

Control through options

The Perl module Getopt::Long allows processing of command line options.

Page 20: Computer Programming for Biologists

Control through options

$ man Getopt::Long

NAME
    Getopt::Long - Extended processing of command line options

SYNOPSIS
    use Getopt::Long;
    my $data   = "file.dat";
    my $length = 24;
    my $verbose;
    $result = GetOptions ("length=i" => \$length,     # numeric
                          "file=s"   => \$data,       # string
                          "verbose"  => \$verbose);   # flag

Page 21: Computer Programming for Biologists

Control through options

$ man Getopt::Long

NAME
    Getopt::Long - Extended processing of command line options

SYNOPSIS
    use Getopt::Long;
    my $data    = "file.dat";
    my $length  = 24;
    my $verbose = '';
    $result = GetOptions ("length=i" => \$length,     # numeric
                          "file=s"   => \$data,       # string
                          "verbose"  => \$verbose);   # flag

Parts of an option specification such as "length=i" => \$length:
• length: name of parameter
• =i: type of argument
• \$length: reference to the variable that receives the value

Page 25: Computer Programming for Biologists

Control through options

Command line parameters (with arguments):

    perl test.pl -verbose -length 20 -file input.txt

$verbose set to '1', $length set to '20', $data set to 'input.txt'

Reorder:

    perl test.pl -file input.txt -length 20 -verbose

Long version:

    perl test.pl --verbose --length=20 --file=input.txt

Short version:

    perl test.pl -v -l 20 -f input.txt
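
The four spellings can be compared without typing four command lines. This sketch passes each one as an argument list to GetOptionsFromArray, a Getopt::Long function that parses an array instead of @ARGV; the wrapper parse_args and its return format are assumptions made for the example, and the module's default configuration is assumed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parse one argument list, return the three values
sub parse_args {
    my @args    = @_;
    my $data    = "file.dat";
    my $length  = 24;
    my $verbose = '';
    GetOptionsFromArray(\@args,
        "length=i" => \$length,     # numeric
        "file=s"   => \$data,       # string
        "verbose"  => \$verbose,    # flag
    ) or die "Error in command line arguments\n";
    return "$verbose $length $data";
}

# The four spellings from the slide, written as argument lists
my @variants = (
    [qw(-verbose -length 20 -file input.txt)],
    [qw(-file input.txt -length 20 -verbose)],
    [qw(--verbose --length=20 --file=input.txt)],
    [qw(-v -l 20 -f input.txt)],
);

foreach my $variant (@variants) {
    print &parse_args(@$variant), "\n";
}
```

The short version works because Getopt::Long accepts any unambiguous abbreviation of an option name by default.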

Page 27: Computer Programming for Biologists

Control through options

Try this in your script:

    use Getopt::Long;                               # 1. Import module

    my $do_translation = '';                        # 2. Initialise variables
    my $do_revcomp     = '';

    &GetOptions ("translate" => \$do_translation,   # 3. Call function:
                 "revcomp"   => \$do_revcomp,       #    define flags and associate
                 );                                 #    them with referenced variables

To allow the following execution:

    perl sequanto.pl -gc -revcomp test.fa

Note: the -gc flag used here needs its own entry in the GetOptions call; with only the two flags defined above it is reported as an unknown option.
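
A sketch of how the call might be extended so that the example execution is accepted. The variable name $do_gc is an assumption, and the argument list is parsed from an array with GetOptionsFromArray so that the example runs without a real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for: perl sequanto.pl -gc -revcomp test.fa
my @args = qw(-gc -revcomp test.fa);

my $do_translation = '';
my $do_revcomp     = '';
my $do_gc          = '';     # assumed name for the variable behind -gc

GetOptionsFromArray(\@args,
    "translate" => \$do_translation,
    "revcomp"   => \$do_revcomp,
    "gc"        => \$do_gc,
) or die "Error in command line arguments\n";

# Recognised options are removed from the list; the file name remains
print "gc: '$do_gc', revcomp: '$do_revcomp', translate: '$do_translation'\n";
print "remaining arguments: @args\n";
```

In a real script GetOptions does the same to @ARGV, so a following while (<>) loop reads test.fa as before.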

Page 28: Computer Programming for Biologists

Data Input/Output
Redirect output

Program prints output to screen:

    $ translate.pl seq.fa
    MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…

Redirect into file:

    $ translate.pl seq.fa > seq.aa

Append to file:

    $ translate.pl seq.fa >> seq.aa

Page 29: Computer Programming for Biologists

Data Input/Output
Filehandles

Reading from the default input stream (STDIN, or files named on the command line):

    my $in = <>;

Use a filehandle to read input from a specific file:

    open (IN, 'input.txt');        # open file for reading; IN is the filehandle
    while (my $in = <IN>) { … }    # read content line by line
    close IN;                      # close filehandle when finished

Page 30: Computer Programming for Biologists

Data Input/Output
Filehandles

Syntax:

    open (FH, filename);         # open file for reading
    open (FH, "< filename");     # open file for reading
    open (FH, "> filename");     # open file for writing
    open (FH, ">> filename");    # append to file
    close FH;                    # empties buffer and closes the filehandle

Write and append mode will create the file if necessary.
Write mode will empty the file first.
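
The three modes can be tried in sequence on a throwaway file. Note that this sketch swaps in the three-argument form of open with lexical filehandles ($out, $in) instead of the two-argument form with bareword handles shown on the slide; the mode characters are the same. The temporary directory and file name are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temporary directory stands in for the working directory
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/results.txt";

# write mode: creates the file, or empties it if it already exists
open(my $out, '>', $file) or die "Can't write to $file: $!";
print $out "first line\n";
close $out;

# append mode: keeps the existing content and adds to the end
open($out, '>>', $file) or die "Can't append to $file: $!";
print $out "second line\n";
close $out;

# read mode: fetch all lines to see what ended up in the file
open(my $in, '<', $file) or die "Can't read from $file: $!";
my @lines = <$in>;
close $in;

print @lines;
```

Opening the file a second time with '>' instead of '>>' would leave only the second line in it.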

Page 31: Computer Programming for Biologists

Data Input/Output
Writing to files

    my $file_name = 'results.txt';

    if ($write_modus eq 'append') {
        # append to file (creates file if necessary)
        open (OUT, ">>$file_name");
    } else {
        # normal write (erases content if file exists)
        open (OUT, ">$file_name");
    }

    print OUT 'some text';

    close OUT;    # output might not appear until FH is closed!

Page 32: Computer Programming for Biologists

Data Input/Output
Error check!

Always test if an important operation worked out:

    open (IN, $file_name)
        or die "Can't read from $file_name: $!";

    open (OUT, ">>$file_out")
        or die "Can't append to $file_out: $!";

    # Note: special variable $! contains error message
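
What the check produces can be seen by deliberately opening a file that does not exist. In this sketch the helper count_lines and the path are hypothetical, the path is assumed to be absent on the machine running the code, and eval is used only so that the example can show the message instead of stopping.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of lines in a file,
# or dies with the system error message held in $!
sub count_lines {
    my $file_name = shift @_;
    open(my $in, '<', $file_name)
        or die "Can't read from $file_name: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        $count++;
    }
    close $in;
    return $count;
}

# eval catches the die so that the message can be inspected
my $count = eval { &count_lines('/no/such/dir/input.txt') };
if (defined $count) {
    print "lines: $count\n";
} else {
    print "open failed: $@";
}
```

Without the "or die" part the script would carry on silently and simply read nothing.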

Page 33: Computer Programming for Biologists

Data Input/Output
Reading from Filehandle

One or more file names are specified after the program; loop over each argument:

    foreach my $file (@ARGV) {                            # special variable @ARGV
        open (IN, $file) or die "Can't read $file: $!";   # open filehandle
        while (my $in = <IN>) {                           # read file line by line
            # do something
        }
        close IN;                                         # close filehandle
    }

Page 34: Computer Programming for Biologists

Reading sequence from a file

    $ perl split.pl gcctg test.fa

Note: two command line arguments!

With an explicit filehandle:

    # read pattern and sequence
    my ($pattern, $file) = @ARGV;

    open (IN, $file) or die "$!";
    my $sequence = '';
    while (<IN>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
    close IN;

With the default input stream:

    # get pattern
    my $pattern = shift @ARGV;

    my $sequence = '';
    while (<>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }

Page 35: Computer Programming for Biologists

Reading sequence from a file

    $ perl split.pl gcctg test.fa

Note: two command line arguments!

With an explicit filehandle:

    # read pattern and sequence
    my $pattern = shift @ARGV;
    my $file    = shift @ARGV;

    open (IN, $file) or die "$!";
    my $sequence = '';
    while (<IN>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
    close IN;

With the default input stream:

    # get pattern
    my $pattern = shift @ARGV;

    my $sequence = '';
    while (<>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
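
A self-contained sketch of the explicit-filehandle version. A temporary FASTA file stands in for test.fa and an array stands in for @ARGV; both are assumptions for the example. The slides do not show what split.pl does with the pattern, so splitting the sequence at each match is only a guess based on the script name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a small temporary FASTA file stands in for test.fa
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp ">test sequence\nAAGCCTGTT\nGGGCCTGA\n";
close $tmp;

# Stand-in for the command line: perl split.pl gcctg test.fa
my @args    = ('gcctg', $file);
my $pattern = shift @args;
my $name    = shift @args;

open(my $in, '<', $name) or die "Can't read from $name: $!";
my $sequence = '';
while (my $line = <$in>) {
    next if $line =~ /^>/;      # skip the header line
    chomp $line;
    $sequence .= $line;         # join the sequence lines
}
close $in;

# Guessed use of the pattern: cut the sequence at every match,
# ignoring upper/lower case
my @fragments = split /$pattern/i, $sequence;
print "$_\n" foreach @fragments;
```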

Page 36: Computer Programming for Biologists

course project

Work on your course project (sequanto.pl):

1. Add explicit opening of a filehandle
2. Store the translated sequence in a new file

Page 37: Computer Programming for Biologists

Exam

Reminder

Exam: Thu, Dec 11th, 11 am - 1 pm