CSB472H1: Computational Genomics and Bioinformatics Tutorial #8 Alex Nguyen, 2014 [email protected] ESC-4075
Copyright © 2014, Alex N. Nguyen Ba
What we have seen so far…
• Variables: a way to store values in memory
• Functions: print, string functions
• If/else, for, while
• Conditionals
• Comparisons
• Arrays and hashes
• Subroutines
• Regex
• Input/Output
Subroutines
• Code sharing is an important aspect of the community
Algorithm modifications (BLAST derivatives, etc.)
Problem: only a small number of programmers code in Perl. How can an algorithm be communicated effectively?
Logic
• Algorithm overview can be given by logic charts
[Figure: flowchart with the nodes Input, Initialization, Procedure, Condition (True/False branches) and Output]
Formulas
• Algorithm description is often given as formulas
For example…
Formulas are often the fastest way to get the point across.
Formulas are preferred by the bioinformatics community; however, they can lead to ugly notation:
if ($x == 1) { return 5; } elsif ($x == 0) { return 10; }
can be written as
5 * x + 10 * (1 - x)   or   5^x * 10^(1-x)
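As a quick check, the conditional and the two formula forms can be compared for the only two values the conditional handles (x = 0 and x = 1). This is a minimal sketch; the subroutine names are made up for illustration.

```perl
use strict;
use warnings;

# Conditional form from the slide: 5 when x is 1, 10 when x is 0.
sub by_condition {
    my ($x) = @_;
    if    ($x == 1) { return 5; }
    elsif ($x == 0) { return 10; }
    return;    # undefined for any other x
}

# Formula forms (hypothetical names): a sum and a product of powers.
sub by_sum     { my ($x) = @_; return 5 * $x + 10 * (1 - $x); }
sub by_product { my ($x) = @_; return 5**$x * 10**(1 - $x); }

for my $x (0, 1) {
    printf "x=%d condition=%d sum=%d product=%d\n",
        $x, by_condition($x), by_sum($x), by_product($x);
}
```

All three agree for x = 0 and x = 1, but only the formulas return a value for other x, which is one reason the notation can be harder to read than the conditional.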
Pseudocodes
• Pseudocode is an abstraction of code
Pseudocode is structured like code, but is not meant to be interpreted by a machine.
Language-specific syntax can, and should, be left out
When reading pseudocode, I should not be able to say: “It looks like Perl.”
Do not write: open(FILE,$filename);
Spot all the problems
Pseudocodes
Input: string a of length m, string b of length n
F[0,0] := 0
d := penalty
for all i: F[i,0] := -i * d
for all j: F[0,j] := -j * d
for i = 1 to m:
    for j = 1 to n:
        F[i,j] = max(F[i-1,j-1] + S[a[i],b[j]], F[i-1,j] - d, F[i,j-1] - d)
There is no true standard for pseudocode syntax. KEEP IT CONSISTENT.
structure
Mathematics does not have to be explained
Blocks should be clear
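To show that the pseudocode carries enough information to be implemented, here is one possible Perl translation of the matrix fill. It is a sketch under assumptions: the score S is reduced to a simple match/mismatch value rather than a real substitution matrix, and the subroutine name and argument order are invented for this example.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Fill the score matrix F as in the pseudocode and return F[m][n].
# Assumption: S is a plain match/mismatch score, not a substitution matrix.
sub nw_score {
    my ($a_str, $b_str, $d, $match, $mismatch) = @_;
    my @x = split //, $a_str;
    my @y = split //, $b_str;
    my ($m, $n) = (scalar @x, scalar @y);

    my @F;
    $F[$_][0] = -$_ * $d for 0 .. $m;    # first column: i gaps
    $F[0][$_] = -$_ * $d for 0 .. $n;    # first row: j gaps

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            # Perl arrays are 0-indexed, so a[i] in the pseudocode is $x[$i-1].
            my $s = $x[$i-1] eq $y[$j-1] ? $match : $mismatch;
            $F[$i][$j] = max($F[$i-1][$j-1] + $s,
                             $F[$i-1][$j] - $d,
                             $F[$i][$j-1] - $d);
        }
    }
    return $F[$m][$n];
}

print nw_score("ACGT", "AGT", 1, 1, -1), "\n";    # prints 2
```

With a match score of 1, a mismatch score of -1 and a penalty of 1, the best alignment of ACGT and AGT has three matches and one gap, which gives the score of 2.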
Pseudocodes
The pseudocode has to provide enough information for:
- Solving the problem in any programming language
- The complete understanding of every ‘variable’ used
Pseudocodes
Class exercise (previously an assignment)
diff := Pa - Pb
na := len(a)
nb := len(b)
if diff < 0
    swap(a,b)
    swap(na,nb)
    diff := abs(diff)
k := diff
output_read := a[0 .. diff-1]
while k < na
    r1 := a[k .. k]
    r2 := b[k-diff .. k-diff]
    if r1 != "."
        output_read := output_read + r1
    elsif r2 != "." and r2 != ""
        output_read := output_read + r2
    else
        output_read := output_read + "."
    k := k + 1
output_read := output_read + b[na-diff .. nb-1]

a[0 .. 9] corresponds to the first ten letters of read a
a[0 .. 0] corresponds to the first letter of read a only
a[na .. na] is empty, where na is the length of a (remember that indexing starts at 0)
Pa and Pb are 0-indexed, but this is not required
abs() is the absolute value
len() is the length of the read
swap() exchanges the variables
read_1: CA.TG..C.GT.G   (anchor: 3rd position, 0-indexed)
read_2: .T..AC.G..GA    (anchor: 1st position, 0-indexed)
merged: CA.TG.AC.GT.GA
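The exercise pseudocode can be translated into Perl roughly as follows. This is one possible reading, not the official solution: the subroutine name is invented, the lengths are computed after the swap instead of being swapped themselves, and the anchor positions are assumed to lie inside their reads.

```perl
use strict;
use warnings;

# Merge two gapped reads ("." marks an unknown base) that share an anchor:
# position $pa in $ra lines up with position $pb in $rb (both 0-indexed).
sub merge_reads {
    my ($ra, $rb, $pa, $pb) = @_;
    my $diff = $pa - $pb;
    if ($diff < 0) {                       # make $ra the read that starts first
        ($ra, $rb) = ($rb, $ra);
        $diff = abs $diff;
    }
    my ($na, $nb) = (length $ra, length $rb);

    my $out = substr($ra, 0, $diff);       # a[0 .. diff-1]
    for my $k ($diff .. $na - 1) {
        my $r1 = substr($ra, $k, 1);
        my $j  = $k - $diff;
        my $r2 = $j < $nb ? substr($rb, $j, 1) : "";    # empty past the end of b
        if    ($r1 ne ".")              { $out .= $r1 }
        elsif ($r2 ne "." && $r2 ne "") { $out .= $r2 }
        else                            { $out .= "." }
    }
    my $tail = $na - $diff;                # b[na-diff .. nb-1]
    $out .= substr($rb, $tail) if $tail < $nb;
    return $out;
}

print merge_reads("CA.TG..C.GT.G", ".T..AC.G..GA", 3, 1), "\n";    # prints CA.TG.AC.GT.GA
```

Calling it with the reads in the other order, merge_reads(".T..AC.G..GA", "CA.TG..C.GT.G", 1, 3), exercises the swap branch and gives the same merged read.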
What we have seen so far…
• Variables: a way to store values in memory
• Functions: print, string functions
• If/else, for, while
• Conditionals
• Comparisons
• Arrays and hashes
• Subroutines
• Regex
• Input/Output
• Pseudocode
Modules
• While code is often shared as pseudocode, the vast majority of programming languages have libraries of code written by the community
Modules are pieces of code that other people have written in your language
In many cases, they are more efficient and cover a broader range of cases than code you would write yourself.
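As a small illustration, List::Util is a module that ships with Perl and provides functions such as max and sum, so they do not have to be rewritten. The read lengths below are made-up values for the example.

```perl
use strict;
use warnings;
use List::Util qw(max sum);    # core module, no installation needed

# Made-up read lengths, for illustration only.
my @read_lengths = (101, 98, 75, 100);

my $longest = max(@read_lengths);
my $mean    = sum(@read_lengths) / @read_lengths;

printf "longest=%d mean=%.1f\n", $longest, $mean;    # prints longest=101 mean=93.5
```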
Modules
• What can modules do?
The community has written a wide range of code that you can take advantage of
Controlling the mouse on Windows…
Writing Excel files…
Drawing images…
Complex mathematical functions…
Modules
• Why shouldn’t you use a module?
Any module you use requires that anyone who uses your code have that module
Some modules may not be trivial to install
Some modules are incompatible with the user’s Perl version
If you code for specific cases, your code might be faster
No easy way of modifying the function
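Because users of your code need every module it depends on, a script can check for a module at run time and report clearly when it is missing. The sketch below shows one way to do this with eval and require; the subroutine name is invented for the example.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this machine, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Excel::Writer::XLSX -> Excel/Writer/XLSX.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('List::Util', 'Excel::Writer::XLSX') {
    printf "%-22s %s\n", $module,
        module_available($module) ? "installed" : "missing";
}
```

The result for Excel::Writer::XLSX depends on the machine, which is exactly the portability concern raised above.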
Modules
• How to install modules
Modules
It can take a while to find the module you want… It is probably best to browse some forums and see what other people have used
Modules
Perl modules require installation
On Windows, this installation is done through PPM
An interface called the “Perl Package Manager” should open
Modules
Perl modules require installation
On Linux-based operating systems
You will have to type commands instead… both do practically the same thing
Type: install packagename
Modules
Let’s install the Excel-Writer module
Modules
Excel-Writer module… done!
How to use it?
Modules
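use strict;
use warnings;
use Excel::Writer::XLSX;

# Create a new workbook and add one worksheet to it
my $workbook  = Excel::Writer::XLSX->new('test.xlsx');
my $worksheet = $workbook->add_worksheet();

# Write sqrt(0) .. sqrt(99) down the first column: write(row, column, value)
for (my $i = 0; $i < 100; ++$i) {
    $worksheet->write($i, 0, sqrt($i));
}
$workbook->close();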
BioPerl Modules
BioPerl is a collection of Perl modules written by biologists
Many of the tools you have learned about in class can be used via BioPerl
You even have access to some algorithmic functions like the Needleman-Wunsch algorithm
Installing BioPerl should be fairly straightforward
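BioPerl Modules
use strict;
use warnings;
use Bio::SeqIO;

my $file = "CSB472-2012-assignment_1.fasta";
# No -format is given: Bio::SeqIO tries to recognise the format automatically
my $seqio_object = Bio::SeqIO->new(-file => $file);
# Read the first sequence in the file
my $seq_object = $seqio_object->next_seq;
print $seq_object->display_id, "\n";
print $seq_object->seq, "\n";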
Automatic format recognition, automatic handling of files…
http://www.bioperl.org/wiki/Module:Bio::SeqIO
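To give an idea of what the module saves you from writing, here is a hand-written sketch that reads only the first record of FASTA text using core Perl. It is an illustration, not how Bio::SeqIO is implemented: the subroutine name and the sample data are invented, and it handles none of the other formats or edge cases the module covers.

```perl
use strict;
use warnings;

# Read the first record from FASTA text and return (id, sequence).
sub first_fasta_record {
    my ($fasta_text) = @_;
    open(my $fh, '<', \$fasta_text) or die "Cannot read FASTA text: $!";
    my ($id, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        if ($line =~ /^>(\S+)/) {
            last if defined $id;          # a second header ends the first record
            ($id, $seq) = ($1, "");
        }
        elsif (defined $id) {
            $seq .= $line;                # sequence may span several lines
        }
    }
    close $fh;
    return ($id, $seq);
}

# Made-up FASTA text, for illustration only.
my ($id, $seq) = first_fasta_record(">seq1 example\nATGC\nGGTA\n>seq2\nTTTT\n");
print "$id\n$seq\n";
```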
Object Oriented programming
OO programming vs procedural
my $sub = substr($string,1,2);
Here $string is the “subject” of the function.
Object-oriented programming places the subject as the owner of the function
my $sub = $string->substr(1,2);
Note that this is not right for a plain Perl string.
It is, however, how almost all modules work
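The difference can be made concrete with a tiny class. MyString below is a hypothetical class written only for this illustration (it is not part of Perl or of any module); it wraps a string so that the arrow syntax becomes valid.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only: an object that owns a string.
package MyString;

sub new {
    my ($class, $value) = @_;
    return bless { value => $value }, $class;
}

# Method form: the object (the "subject") comes before the arrow.
sub substr {
    my ($self, $offset, $length) = @_;
    return CORE::substr($self->{value}, $offset, $length);
}

package main;

my $string = "ATGCGG";

my $procedural      = substr($string, 1, 2);                  # function(subject, ...)
my $object_oriented = MyString->new($string)->substr(1, 2);   # subject->method(...)

print "$procedural $object_oriented\n";    # prints TG TG
```

Module objects such as $workbook or $seqio_object are used in the same way: the object is created with new, and functions are then called on it with the arrow.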
Object Oriented programming
Let’s use BioPerl to create sequence objects.
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

my $seq_obj = Bio::Seq->new(-seq => 'atgcggctg', -display_id => 'new_dna', -desc => 'random_protein');
# The leading '>' opens test.fasta for writing
my $seqio_obj = Bio::SeqIO->new(-file => '>test.fasta', -format => 'fasta');
$seqio_obj->write_seq($seq_obj);
Web programming
Bioinformatics resources can usually be run online
User interface (inputs, parameters) -> Scripts and programs -> Output
Web programming
A series of input/output programs
User interface (inputs, parameters) -> Scripts and programs -> Output
The output is usually interpreted by your browser (HTML code)
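The input, script and output chain can be sketched as a small CGI-style Perl script. Everything here is an assumption made for illustration: the parameter name seq, the GC-content calculation and the subroutine names are invented, and the query string is parsed by hand so that the example depends only on core Perl.

```perl
use strict;
use warnings;

# Input step: decode a query string such as "seq=ATGC&name=x" into a hash.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = "" unless defined $value;
        $value =~ tr/+/ /;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        $param{$key} = $value;
    }
    return %param;
}

# Escape characters that have a special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Script step: compute GC content and return a page for the browser.
sub gc_page {
    my ($query) = @_;
    my %param = parse_query($query);
    my $seq   = uc(defined $param{seq} ? $param{seq} : "");
    my $gc    = ($seq =~ tr/GC//);    # counts G and C without changing $seq
    my $pct   = length($seq) ? 100 * $gc / length($seq) : 0;
    return "Content-Type: text/html\n\n"
         . "<html><body><p>Sequence: " . escape_html($seq) . "</p>"
         . sprintf("<p>GC content: %.1f%%</p>", $pct)
         . "</body></html>\n";
}

# Output step: a web server would normally supply QUERY_STRING;
# fall back to a sample value when run from the command line.
print gc_page(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "seq=ATGCGG");
```

The script only prints text; it is the browser that interprets the HTML in the output and displays it as a page.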
What we have seen this semester
• Variables: $scalars, @arrays, %hashes
• Functions
• If/else, for, while
• Conditionals
• Comparisons
• Subroutines
• Regex
• Input/Output
• Pseudocode
• BioPerl
• Command line arguments
• Web programming