8.1 Common Errors – Exercise #3
• Assuming something about the variable part of the input file.
When parsing a formatted file (GenBank, FASTA or any other format), you should rely only on the format itself and not on the variable part of the input. Thus parsing by features such as these is wrong:
– Assuming each line in the title will start with a lowercase letter
– Assuming the title will be composed of only 2 lines
It is legitimate to rely on the presence of the words ‘TITLE’ and ‘JOURNAL’ for the parsing, as these are part of the format.
• Reading the whole file at once (@all_lines = <$fh>;).
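This is risky when the file is very large, because the whole file is then held in memory. When we do not need all the lines at once, we read one line at a time with $line = <$fh> in a ‘while’ loop.
• Performing an action on a variable without checking if it is defined.
Using an undefined value (for example, the result of reading past the end of the file) triggers ‘Use of uninitialized value’ warnings when warnings are enabled, and can lead to wrong results.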
• Use of functions/features not taught in class.
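The two reading-related points above (read line by line, and check that a value is defined before using it) can be sketched together. This is only an illustration under stated assumptions: the sample text in $data is made up, and opening a filehandle on a string is used here just so the example runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would open a file instead.
my $data = "LOCUS       AAA00001\n  TITLE     first title\nLOCUS       AAA00002\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $locusCount = 0;
my $line = <$fh>;                  # read a single line, not the whole file
while (defined $line) {            # <$fh> returns undef at the end of the file
    chomp $line;
    # rely on the format keyword only, not on the variable part of the line
    if (substr($line, 0, 5) eq "LOCUS") {
        $locusCount++;
    }
    $line = <$fh>;                 # only the current line is kept in memory
}
close($fh);

print "Number of LOCUS lines: $locusCount\n";
```

With the sample text above, the loop finds two LOCUS lines and stops cleanly when the read returns an undefined value.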
8.2
Solution to HW3 Q#6
• For each protein record print the first line (the LOCUS line) followed by a sorted list of its reference TITLEs.
1. Read the file
2. If reached a LOCUS line, print it
3. If reached a TITLE line, start an inner loop until reaching the JOURNAL line (to take the full title)
4. Push the entire TITLE to the titles array
5. If reached a FEATURES line, print the titles array and initialize it
...
...
8.3
my $line = <$in>;    # read input lines
while (defined $line) {
    chomp($line);
    # if reached LOCUS line print it
    if (substr($line,0,5) eq "LOCUS") {
        print "\n$line\n";
    }
    # if reached TITLE start an inner loop until reaching the JOURNAL line
    if ((length($line) > 7) && (substr($line,2,5) eq "TITLE")) {
        while ((defined $line) && (substr($line,2,7) ne "JOURNAL")) {
            chomp $line;
            $title = $title.substr($line,12);    # concatenate the title line
            $line = <$in>;
        }
        push(@titleArray,$title);    # push entire title to title array
        $title = "";
    }
    # if reached FEATURES line - sort and print titles array
    # ($line may be undefined here if the file ended inside a title)
    if ((defined $line) && (substr($line,0,8) eq "FEATURES")) {
        @titleArray = sort(@titleArray);
        foreach $title (@titleArray) {
            print "$title\n";
        }
        @titleArray = ();    # empty title array
    }
    $line = <$in>;
}
8.4
Hashes (associative arrays)
8.5
Let's say we want to create a phone book . . .
Enter a name that will be added to the phone book:
Dudi
Enter a phone number:
6409245
Enter a name that will be added to the phone book:
Dudu
Enter a phone number:
6407693
Hash Motivation
8.6
An associative array (or simply – a hash) is an unordered set of
pairs of keys and values. Each key is associated with a value.
A hash variable name always starts with a “%”:
my %hash;
Initialization:
%hash = ("a"=>5, "bob"=>"zzz", 50=>"John");
Accessing:
you can access a value by its key:
print $hash{50};    # prints: John
Tip: you can reset the hash (to an empty one) by %hash = ();
Note: a key in a hash will be interpreted as a string. These are equivalent:
50 => "John" and "50" => "John"
$hash{50} and $hash{"50"}
Hash – an associative array
%hash
"a" => 5
"bob" => "zzz"
50 => "John"
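The note about keys being treated as strings can be checked with a short sketch. It reuses the %hash from the slide; the variable names $byNumber, $byString and $numKeys are introduced here only for illustration.

```perl
use strict;
use warnings;

my %hash = ("a" => 5, "bob" => "zzz", 50 => "John");

my $byNumber = $hash{50};      # key written as a number
my $byString = $hash{"50"};    # key written as a string

print "$byNumber\n";
print "$byString\n";

# keys() in scalar context gives the number of keys in the hash
my $numKeys = keys(%hash);
print "Number of keys: $numKeys\n";
```

Both lookups reach the same entry, so the hash holds three keys and not four.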
8.7
Modifying:
$hash{bob} = "aaa"; (modifying an existing value)
Adding:
$hash{555} = "z"; (adding a new key-value pair)
You can ask whether a certain key exists in a hash:
if (exists $hash{50} )...
You can delete a certain key-value pair in a hash:
delete($hash{50});
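The four operations on this slide can be run in sequence as a rough sketch. It starts from the %hash used on the previous slide; $hadKey, $stillThere and @remaining are helper names assumed here for the example.

```perl
use strict;
use warnings;

my %hash = ("a" => 5, "bob" => "zzz", 50 => "John");

$hash{bob} = "aaa";        # modify an existing value
$hash{555} = "z";          # add a new key-value pair

my $hadKey = 0;
if (exists $hash{50}) {
    $hadKey = 1;           # key 50 is present before the delete
}

delete($hash{50});         # remove the pair 50 => "John"

my $stillThere = 0;
if (exists $hash{50}) {
    $stillThere = 1;       # not reached: the key is gone
}

# sort the keys so that the printed order is predictable
my @remaining = sort(keys(%hash));
print "Remaining keys: @remaining\n";
```

After the delete, only the keys 555, a and bob remain, and bob now holds "aaa".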
Hash – an associative array
%hash (initial)
"a" => 5
"bob" => "zzz"
50 => "John"
%hash (after modifying "bob")
"a" => 5
"bob" => "aaa"
50 => "John"
%hash (after adding 555)
"a" => 5
"bob" => "aaa"
50 => "John"
555 => "z"
%hash (after deleting 50)
"a" => 5
"bob" => "aaa"
555 => "z"
8.8 Variable types in Perl
Scalar: $number (-3.54), $string ("hi\n")
Array: @array, with a single element accessed as $array[0]
Hash: %hash, with a single value accessed as $hash{key}
8.9
An associative array of the phone book suggested in the first slide
(we will see a more elaborate version later on):
Declare
my %phoneBook;
Updating
$phoneBook{"Dudi"} = 9245;
$phoneBook{"Dudu"} = 7693;
Fetching
print $phoneBook{"Dudi"};
Hash – an associative array
%phoneBook
"Dudi" => 9245
"Dudu" => 7693
8.10
It is possible to get a list of all the keys in %hash
my @hashKeys = keys(%hash);
Similarly you can get an array of the values in %hash
my @hashVals = values(%hash);
Iterating over hash elements
%hash
"a" => 5
"bob" => "zzz"
50 => "John"
@hashKeys
"bob", 50, "a"
@hashVals
5, "John", "zzz"
8.11
To iterate over all the values in %hash
my @hashVals = values(%hash);
foreach my $value (@hashVals)...
To iterate over the keys in %hash
my @hashKeys = keys(%hash);
foreach my $key (@hashKeys)...
Iterating over hash elements
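As a small illustration of iterating over the values, the sketch below sums the lengths stored in a hash. The hash name %proteinLengths and its three entries are assumptions made for this example.

```perl
use strict;
use warnings;

# Assumed sample data: protein names as keys, lengths as values.
my %proteinLengths = ("AP_000081" => 181, "AP_000174" => 104, "AP_000138" => 145);

my $total = 0;
my @hashVals = values(%proteinLengths);
foreach my $value (@hashVals) {
    $total = $total + $value;    # the order of the values does not matter for a sum
}
print "Total length: $total\n";
```

Iterating over the values is enough here because the keys are not needed for the calculation.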
8.12
For example, iterating over the keys in %hash :
my @hashKeys = keys(%hash);
foreach my $key (@hashKeys) {
print "The key is $key\n";
print "The value is $hash{$key}\n";
}
Iterating over hash elements
Output:
The key is bob
The value is zzz
The key is a
The value is 5
The key is 50
The value is John
8.13
Notably: The elements are given in an arbitrary order,
so if you want a certain order use sort:
my @hashKeys = keys(%hash);
my @sortedHashKeys = sort(@hashKeys);
foreach my $key (@sortedHashKeys) {
print "The key is $key\n";
print "The value is $hash{$key}\n";
}
Iterating over hash elements
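A runnable sketch of the sorted iteration, using the same %hash as in the earlier slides. Collecting the text in $output before printing is a choice made only for this example.

```perl
use strict;
use warnings;

my %hash = ("a" => 5, "bob" => "zzz", 50 => "John");

my $output = "";
my @sortedHashKeys = sort(keys(%hash));
foreach my $key (@sortedHashKeys) {
    $output = $output . "The key is $key\n";
    $output = $output . "The value is $hash{$key}\n";
}
print $output;
```

With the default sort the keys are compared as strings, so 50 comes before a and bob, and the order is the same on every run.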
8.14
######################################
# Purpose: Store names and phone numbers in a hash,
#          and allow the user to ask for the number of a certain name.
# Input: Enter name-number pairs, enter "END" as a name to stop,
#        then enter a name to get his/her number
#
use strict;

my %phoneNumbers = ();
my $number;
Example – phoneBook.pl #1
8.15
# Ask user for names and numbers and store in a hash
my $name = "";
while (1==1) {
print "Enter a name that will be added to the phone book:\n";
$name = <STDIN>;
chomp $name;
if ($name eq "END") {
last;
}
print "Enter a phone number: \n";
$number = <STDIN>;
chomp $number;
$phoneNumbers{$name} = $number;
}
Example – phoneBook.pl #2
8.16
# Ask for a name and print the corresponding number
$name = "";
while (1==1) {
print "Enter a name to search for in the phone book:\n";
$name = <STDIN>;
chomp $name;
if (exists($phoneNumbers{$name})) {
print "The phone number of $name is: $phoneNumbers{$name}\n";
}
elsif ($name eq "END") {
last;
}
else {
print "Name not found in the book\n";
}
}
Example – phoneBook.pl #3
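The lookup logic of phoneBook.pl can also be tried without typing any input. The sketch below is a non-interactive variant under stated assumptions: the entries are hard-coded from the earlier slide, and the list @queries (including the name "Moshe", which is not in the book) is made up for the example.

```perl
use strict;
use warnings;

my %phoneNumbers = ("Dudi" => 9245, "Dudu" => 7693);
my @queries = ("Dudi", "Moshe");    # "Moshe" is a hypothetical name not in the book

my $result = "";
foreach my $name (@queries) {
    if (exists($phoneNumbers{$name})) {
        $result = $result . "The phone number of $name is: $phoneNumbers{$name}\n";
    }
    else {
        $result = $result . "Name not found in the book\n";
    }
}
print $result;
```

Checking with exists before fetching the value keeps the script from printing an undefined number for an unknown name.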
8.17 Class exercise 8
1. Write a script that reads a file with a list of protein names and lengths (proteinLengths):
AP_000081 181
AP_000174 104
AP_000138 145
and stores the names of the sequences as hash keys, with the length of the sequence as the value. Print the keys of the hash.
2. Add to Q1: Read another file, and print the names that appeared in both files with the same length. Print a warning if the name is the same but the length is different.
3. Write a script that reads a GenPept file (you may use the preproinsulin record), finds all JOURNAL lines, and stores the journal name (as key) and year of publication (as value) in a hash:
a. Store only the first year (order of appearance in the file) value for each journal name
b*. Store all years for each journal name
Then print the names and years, sorted by the journal name (no need to sort the years for the same journal in b*, unless you really want to do so…)