Bioperl Modules
Posted on 21-Dec-2015
Object Oriented Programming in Perl (1)
• Defining a class
  – A class is simply a package with subroutines that function as methods.
#!/usr/local/bin/perl
package Cat;
sub new {…}
sub meow {…}
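The slide leaves the method bodies out. A minimal runnable sketch of such a class follows. The blessed hash, the hypothetical name attribute with its default, and the returned text are illustrative assumptions, not the original author's code.

```perl
use strict;
use warnings;

package Cat;

# Constructor: build a hash reference, bless it into the class that
# new() was called on, and return it as the new object.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || 'Kitty' };   # hypothetical attribute and default
    return bless $self, $class;
}

# Object method: the first argument is the object the method was called on.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package main;

my $cat = Cat->new(name => 'Tom');
print $cat->meow(), "\n";    # prints "Tom says meow"
```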
Object Oriented Programming in Perl (2)
$new_object = ClassName->new();   # the slide's form, new ClassName, also works
$cat->meow();
Perl Object: To instantiate an object from a class, call the class’s “new” method.
Using Methods: To use the methods of an object, use the “->” operator.
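The arrow operator is shorthand for an ordinary subroutine call in which Perl passes the class name (for new) or the object (for meow) as the first argument. The sketch below uses a stripped-down, hypothetical Cat class to put each arrow call next to its explicit equivalent.

```perl
use strict;
use warnings;

package Cat;

sub new  { my ($class) = @_; return bless { sound => 'meow' }, $class }
sub meow { my ($self)  = @_; return $self->{sound} }

package main;

# Arrow syntax: Perl passes the class name (or the object) as the first
# argument to the subroutine it finds in that class's package.
my $cat1 = Cat->new();          # calls Cat::new('Cat')
my $cat2 = Cat::new('Cat');     # the same call, written out explicitly

print $cat1->meow(), "\n";      # calls Cat::meow($cat1)
print Cat::meow($cat2), "\n";   # the same call, written out explicitly

# The method name can also come from a variable and be resolved at run time.
my $method = 'meow';
print $cat1->$method(), "\n";
```

The explicit forms such as Cat::meow($cat2) skip method lookup, so they would not find a method inherited through @ISA; the arrow form is the one to use in practice. The indirect form new ClassName still parses, but ClassName->new() avoids its parsing ambiguities.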
Object Oriented Programming in Perl (3)
• Inheritance
  – Declare a class array called @ISA.
• This array stores the name(s) of the parent class(es) of the new class.
package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new {…}
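A sketch of the inheritance idea, assuming a minimal Cat parent class: NorthAmericanCat lists Cat in @ISA, inherits meow() and name() unchanged, and overrides new() while still calling the parent constructor through SUPER::. The region attribute and its default value are hypothetical.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };          # store all constructor arguments
    return bless $self, $class;    # bless into the class actually requested
}
sub meow { return 'meow' }
sub name { my ($self) = @_; return $self->{name} }

package NorthAmericanCat;

@NorthAmericanCat::ISA = ('Cat');  # NorthAmericanCat inherits from Cat

# Override new(): reuse Cat's constructor through SUPER::, then set a
# hypothetical extra attribute with a made-up default.
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->{region} = $args{region} || 'Canada';
    return $self;
}

package main;

my $cat = NorthAmericanCat->new(name => 'Maple');
# meow() and name() are not defined in NorthAmericanCat; Perl finds them in Cat via @ISA.
print $cat->name(), ' says ', $cat->meow(), "\n";
print 'Is a Cat? ', ($cat->isa('Cat') ? 'yes' : 'no'), "\n";
```

On Perl 5.10.1 and later, use parent -norequire, 'Cat'; is a common way to set @ISA when the parent class is defined in the same file.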
Perl Modules
A Perl module is a reusable package defined in a library file whose name is the same as the name of the package.
Names of Perl modules
• Each Perl module has a unique name.
• To minimize namespace collisions, Perl provides a hierarchical namespace for modules.
  – Components of a module name are separated by double colons (::).
  – For example:
    • Math::Complex
    • Math::Approx
    • String::BitCount
    • String::Approx
Module files
• Each module is contained in a single file.
• Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy.
• All module files have an extension of .pm.
Module            Is stored in
Config            Config.pm
Math::Complex     Math/Complex.pm
String::Approx    String/Approx.pm
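The mapping in the table is mechanical: every :: in the module name becomes a directory separator and .pm is appended. A small sketch of that rule follows; the helper name module_to_path is hypothetical. It is applied to the three modules in the table.

```perl
use strict;
use warnings;

# Turn a module name into the relative path of its file:
# each '::' becomes a '/' and '.pm' is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

for my $module (qw(Config Math::Complex String::Approx)) {
    printf "%-16s %s\n", $module, module_to_path($module);
}
```

Perl's require performs the same conversion for use Module; before it searches for the file.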
Module libraries
• The Perl interpreter has a list of directories in which it searches for modules.
• This list is stored in the global array @INC; perl -V prints it at the end of its output:
>perl -V
@INC:
/usr/local/lib/perl5/5.00503/sun4-solaris
/usr/local/lib/perl5/5.00503
/usr/local/lib/perl5/site-perl/5.005/sun4-solaris
/usr/local/lib/perl5/site-perl/5.005
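A simplified sketch of what the interpreter does when it looks for a module: convert the name to a relative path and return the first @INC directory that contains that file. The helper find_module is hypothetical, and the real search in require also handles code references stored in @INC, which this sketch simply skips.

```perl
use strict;
use warnings;

# Simplified imitation of the module search: turn the module name into a
# relative path, then return the first @INC directory containing that file.
sub find_module {
    my ($module) = @_;
    (my $relpath = "$module.pm") =~ s{::}{/}g;   # Math::Complex -> Math/Complex.pm
    for my $dir (grep { !ref } @INC) {           # skip code-reference hooks
        my $candidate = "$dir/$relpath";
        return $candidate if -f $candidate;
    }
    return;                                      # not found in any directory
}

print "Directories in \@INC:\n";
print "  $_\n" for grep { !ref } @INC;

my $path = find_module('Math::Complex');
print 'Math::Complex: ', (defined $path ? $path : 'not found'), "\n";
```

To make Perl search an extra directory of your own, the standard lib pragma prepends it to @INC, for example use lib '/path/to/my/modules'; (the path here is only a placeholder).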
Using Modules
• A module can be loaded with the use statement.
use Foo;
bar("a");    # call subroutine bar, imported from Foo
blat("b");   # call subroutine blat, imported from Foo
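For use Foo; to make bar and blat callable without a Foo:: prefix, Foo has to export them, which is commonly done with the core Exporter module. The single-file sketch below is for demonstration only: normally Foo would be its own Foo.pm file in an @INC directory, and the subroutine bodies here are made up.

```perl
use strict;
use warnings;

# Normally this package would live in Foo.pm. For a self-contained demo it
# is defined inside a BEGIN block (so it exists at compile time) and marked
# as already loaded in %INC so that "use Foo" does not look for a file.
BEGIN {
    package Foo;
    use Exporter 'import';          # gives Foo an import() method
    our @EXPORT = qw(bar blat);     # subroutines exported by default

    sub bar  { my ($arg) = @_; return "bar called with $arg" }
    sub blat { my ($arg) = @_; return "blat called with $arg" }

    $INC{'Foo.pm'} = __FILE__;      # tell require() that Foo.pm is loaded
}

use Foo;                            # same as: BEGIN { require Foo; Foo->import() }

print bar("a"), "\n";
print blat("b"), "\n";
```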
Bioperl toolkit
• Core package (bioperl-live)
  – THE basic package; it is required by all the other packages
• Run package (bioperl-run)
  – Provides wrappers for executing some 60 common bioinformatics applications
• DB package (bioperl-db)
  – Subproject to store sequence and annotation data in a BioSQL relational database
• Network package (bioperl-network)
  – Parses and analyzes protein-protein interaction data
• Dev package (bioperl-dev)
  – New and exploratory Bioperl development
Bioperl Object-Oriented
• Bioperl takes advantage of OO design to create a consistent, well-documented object model for interacting with biological data in the life sciences.
• Bioperl namespace: The Bioperl package installs everything in the Bio:: namespace.
(Where are the packages stored? As with any module, in a Bio/ subdirectory of one of the @INC directories; for example, Bio::SeqIO is stored in Bio/SeqIO.pm.)
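To find where the Bio:: packages actually live on a given machine, one option is to load a module and look it up in the global hash %INC, which maps a loaded module's relative file name (for Bio::Seq, Bio/Seq.pm) to the full path it was loaded from. This sketch does not assume Bioperl is installed and reports either case.

```perl
use strict;
use warnings;

# Once a module is loaded, %INC maps its relative file name
# (Bio::Seq -> 'Bio/Seq.pm') to the full path it was loaded from.
if (eval { require Bio::Seq; 1 }) {
    print "Bio::Seq was loaded from $INC{'Bio/Seq.pm'}\n";
} else {
    print "Bio::Seq could not be loaded from any \@INC directory\n";
}
```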
Bioperl Objects
• Sequence handling objects
  – Sequence objects
  – Alignment objects
  – Location objects
• Other objects: 3D structure objects, tree objects and phylogenetic trees, map objects, bibliographic objects and graphics objects
Sequence handling
• Typical sequence handling tasks:
  – Access the sequence
  – Format the sequence
  – Sequence alignment and comparison
    • Search for similar sequences
    • Pairwise comparisons
    • Multiple alignment
Sequence Annotation
• Bio::SeqFeature: A sequence object can have multiple sequence feature (SeqFeature) objects (e.g. Gene, Exon, or Promoter objects) associated with it.
• Bio::Annotation: A Seq object can also have an Annotation object (used to store database links, literature references and comments) associated with it.
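A sketch of both kinds of attachment, assuming Bioperl is installed: a Bio::SeqFeature::Generic exon feature is added with add_SeqFeature, and a Bio::Annotation::Comment is stored in the sequence's annotation collection. The sequence, coordinates and comment text are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

# A made-up DNA sequence.
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCTTGGAAATAGCCCGGGTTT',
    -display_id => 'toy_seq',
);

# Attach a feature covering positions 1-15 on the forward strand.
my $exon = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 15,
    -strand      => 1,
    -primary_tag => 'exon',
);
$seq->add_SeqFeature($exon);

# Attach a free-text comment through the sequence's annotation collection.
my $comment = Bio::Annotation::Comment->new(-text => 'toy example, not real data');
$seq->annotation->add_Annotation('comment', $comment);

for my $feat ($seq->get_SeqFeatures) {
    printf "%s %d-%d: %s\n", $feat->primary_tag, $feat->start, $feat->end,
        $seq->subseq($feat->start, $feat->end);
}
print 'Comment: ', $_->text, "\n" for $seq->annotation->get_Annotations('comment');
```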
Sequence Input/Output
The Bio::SeqIO system was designed to make reading and writing sequences in the myriad of available formats as easy as possible.
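A minimal sketch of reading a stream of sequences with Bio::SeqIO, assuming Bioperl is installed. The two FASTA records are made up and are read from an in-memory filehandle (passed with -fh) so that no input file is needed; with a real file you would pass -file => 'name.fasta' instead.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up FASTA records.
my $fasta = <<'END';
>seq1 first toy sequence
ATGGCTTGGAAATAG
>seq2 second toy sequence
ATGCCCGGGTAA
END

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @ids;
while (my $seq = $in->next_seq) {        # one Bio::Seq object per record, undef at the end
    push @ids, $seq->display_id;
    printf "%s (%d bp): %s\n", $seq->display_id, $seq->length, $seq->desc;
}
```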
Accessing sequence data
– Bioperl supports accessing remote databases as well as local databases.
– Bioperl currently supports sequence data retrieval from the GenBank, GenPept, RefSeq, SwissProt, and EMBL databases.
Format the sequences
• A SeqIO object can read a stream of sequences in one format (Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw, i.e. plain sequence) and then write them to another file in another format.
Manipulating sequence data
$seqobj->display_id();   # the human-readable id of the sequence
$seqobj->subseq(5,10);   # part of the sequence as a string
$seqobj->desc();         # a description of the sequence
$seqobj->trunc(5,10);    # truncation from 5 to 10 as a new object
$seqobj->revcom();       # reverse complement of the sequence, as a new object
$seqobj->translate();    # translation of the sequence…
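These calls can be tried on a small, made-up sequence, assuming Bioperl is installed. Coordinates are 1-based and inclusive, so subseq(5,10) returns six bases. trunc, revcom and translate return new sequence objects rather than modifying $seqobj.

```perl
use strict;
use warnings;
use Bio::Seq;

# Made-up sequence: ATG GCT TGG AAA TAG.
my $seqobj = Bio::Seq->new(
    -seq        => 'ATGGCTTGGAAATAG',
    -display_id => 'toy_seq',
    -desc       => 'made-up example sequence',
    -alphabet   => 'dna',
);

my $part    = $seqobj->subseq(5, 10);   # plain string, 1-based inclusive coordinates
my $trunc   = $seqobj->trunc(5, 10);    # new sequence object for the same region
my $revcom  = $seqobj->revcom;          # new object holding the reverse complement
my $protein = $seqobj->translate;       # new object holding the protein translation

print join("\n",
    'id:      ' . $seqobj->display_id,
    'desc:    ' . $seqobj->desc,
    'subseq:  ' . $part,
    'trunc:   ' . $trunc->seq,
    'revcom:  ' . $revcom->seq,
    'protein: ' . $protein->seq,
), "\n";
```

With the default settings the stop codon TAG is translated as *, so the protein string ends in *.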
Search result parsing
The Bio::SearchIO system was designed for parsing the reports of sequence database searches (BLAST, sim4, waba, FASTA, HMMER, exonerate, etc.).
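A sketch of the usual Bio::SearchIO loop (result, then hit, then HSP), assuming Bioperl is installed. To keep it self-contained it parses a single made-up line of tabular BLAST output with the blasttable format; for a standard text BLAST report the format would be blast instead.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# One made-up line of tabular BLAST output (12 tab-separated columns):
# query, subject, %identity, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score.
my $report = join("\t", qw(query1 subjectA 98.50 200 3 0 1 200 50 249 1e-100 370)) . "\n";

open my $fh, '<', \$report or die "Cannot open in-memory file: $!";
my $searchio = Bio::SearchIO->new(-fh => $fh, -format => 'blasttable');

my @rows;
while (my $result = $searchio->next_result) {      # one result per query
    while (my $hit = $result->next_hit) {           # hits for that query
        while (my $hsp = $hit->next_hsp) {          # aligned segments of a hit
            push @rows, [ $result->query_name, $hit->name, $hsp->evalue ];
            printf "%s -> %s  e-value %s\n", @{ $rows[-1] };
        }
    }
}
```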
Manipulating alignment
The Bio::AlignIO system was designed for reading and writing alignment objects in different formats, including aln (ClustalW), phylip, fasta, etc.
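A sketch of reading an alignment with Bio::AlignIO and writing it in another format, assuming Bioperl is installed. The two aligned sequences are made up. next_aln returns a Bio::SimpleAlign object, which is then written to standard output in PHYLIP format.

```perl
use strict;
use warnings;
use Bio::AlignIO;

# A made-up two-sequence alignment in FASTA format.
my $fasta_aln = <<'END';
>seqA
ATGCTAGCTA
>seqB
ATGCTTGCTA
END

open my $in_fh, '<', \$fasta_aln or die "Cannot open in-memory file: $!";
my $in  = Bio::AlignIO->new(-fh => $in_fh, -format => 'fasta');
my $aln = $in->next_aln;                 # a Bio::SimpleAlign object

printf "%d sequences, %d columns\n", $aln->num_sequences, $aln->length;

# Write the same alignment out again in PHYLIP format.
my $out = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'phylip');
$out->write_aln($aln);
```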
Example: Format the sequences
Using “seq_formating.pl” to convert “sequences.gb” to another format
Copy the files to the current directory
Check whether the files are executable
Now, let’s look at the GenBank file.
The home directory on a Windows system.
If you have Notepad++ installed, right-click the file and choose “Edit with Notepad++”.
If not, try to open “sequences.gb” with the Notepad program.
Type: dir
This displays the files in the current folder (on Windows use dir, NOT ls).
You should have the following files in the folder (you may have other files, but that’s fine):
(1) seq_formating.pl
(2) sequences.gb.txt
Type: perl seq_formating.pl sequences.gb.txt genbank sequences.fasta fasta <enter>
The arguments are:
  – seq_formating.pl: program name
  – sequences.gb.txt: input file
  – genbank: format of the input sequences
  – sequences.fasta: output file
  – fasta: format of the output sequences
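The slides do not show the contents of seq_formating.pl. The sketch below is a hypothetical script that accepts the same four arguments (input file, input format, output file, output format) and performs the conversion with Bio::SeqIO, assuming Bioperl is installed. It illustrates how such a script could work and is not the original. The helper name convert_sequences is made up.

```perl
# Hypothetical converter in the style of seq_formating.pl:
#   perl seq_formating.pl <input file> <input format> <output file> <output format>
use strict;
use warnings;
use Bio::SeqIO;

# Copy every sequence from $infile (in $informat) to $outfile (in $outformat)
# and return how many sequences were converted.
sub convert_sequences {
    my ($infile, $informat, $outfile, $outformat) = @_;
    my $in  = Bio::SeqIO->new(-file => $infile,     -format => $informat);
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => $outformat);
    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $count++;
    }
    $out->close;    # flush and close the output file
    return $count;
}

if (@ARGV == 4) {
    my $n = convert_sequences(@ARGV);
    print "Converted $n sequence(s) from $ARGV[1] to $ARGV[3]: $ARGV[2]\n";
} else {
    print "Usage: perl $0 <input file> <input format> <output file> <output format>\n";
}
```

Run with the arguments above, a script like this would write the GenBank records from sequences.gb.txt to sequences.fasta in FASTA format.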