5.1 Previously on... Perl course (let's practice some more loops)


Posted on 20-Dec-2015


5.2 FASTA: Analyzing complex input

Overall design:

Read the FASTA file (several sequences).

For each sequence:

1. Read the FASTA sequence

1.1. Read FASTA header

1.2. Read each line until next FASTA header

2. For each sequence: Do something

2.1. Compute G+C content

2.2. Print header and G+C content

Let’s see how it’s done…

[Flowchart: Start → Read line → Save header → Read line → Header or end of input? No: Concatenate to sequence → Read line → check again. Yes: Do something → End of input? No: back to Save header. Yes: End]

5.3

# 1. Read FASTA sequence
$fastaLine = <STDIN>;
while (defined $fastaLine) {
    # 1.1. Read FASTA header (without the leading ">" and the newline)
    chomp $fastaLine;
    $header = substr($fastaLine, 1);
    $seq = "";    # start a new, empty sequence for every header
    $fastaLine = <STDIN>;
    # 1.2. Read sequence until next FASTA header
    while ((defined $fastaLine) and
           (substr($fastaLine, 0, 1) ne ">")) {
        chomp $fastaLine;
        $seq .= $fastaLine;
        $fastaLine = <STDIN>;
    }
    # 2. Do something
    ... # 2.1 compute $gcContent
    # 2.2 print header and G+C content
    print "$header: $gcContent\n";
}
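A minimal runnable sketch of the whole design above, under two assumptions of ours: G+C content is reported as the fraction of G and C letters in the sequence (upper or lower case), and the reading loop is wrapped in a subroutine that takes a filehandle, so it works on STDIN as well as on the files introduced later in this lesson. The subroutine names `readFasta` and `gcContent` are hypothetical.

```perl
use strict;
use warnings;

# Read all FASTA records from a filehandle.
# Returns a list of [header, sequence] pairs (array references).
sub readFasta {
    my ($fh) = @_;
    my @records;
    my $fastaLine = <$fh>;
    while (defined $fastaLine) {
        # 1.1. Save the header without the leading '>' and the newline
        chomp $fastaLine;
        my $header = substr($fastaLine, 1);
        my $seq = "";
        $fastaLine = <$fh>;
        # 1.2. Concatenate lines until the next header or end of input
        while ((defined $fastaLine) and (substr($fastaLine, 0, 1) ne ">")) {
            chomp $fastaLine;
            $seq .= $fastaLine;
            $fastaLine = <$fh>;
        }
        push @records, [$header, $seq];
    }
    return @records;
}

# 2.1. G+C content as the fraction of G and C letters (0 for an empty sequence)
sub gcContent {
    my ($seq) = @_;
    return 0 if length($seq) == 0;
    my $gcCount = ($seq =~ tr/GCgc//);    # tr/// with no replacement just counts
    return $gcCount / length($seq);
}

# 2.2. Print header and G+C content for each record.
# Demo input: a small made-up FASTA text read through an in-memory filehandle;
# for real data, pass \*STDIN or a filehandle returned by open().
my $demoFasta = ">seq1\nGGCC\nAATT\n>seq2\nGCGA\n";
open(my $demoFh, '<', \$demoFasta) or die "can't open in-memory string: $!";
foreach my $record (readFasta($demoFh)) {
    my ($header, $seq) = @$record;
    print "$header: ", gcContent($seq), "\n";
}
close($demoFh);
```

Run as-is, the demo prints `seq1: 0.5` and `seq2: 0.75`. To analyze real input, call `readFasta(\*STDIN)` instead of using the demo string.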


5.4 Class exercise 4a

1. Write a script that reads lines of names and expenses:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END
For each line, print the name and the sum. Stop when you reach "END".

Output:
Yossi 27.6
Dana 27
Refael 45.1

2. Change your script to read names and expenses on separate lines. Lines with numbers are identified by a "+" sign as their first character:
Yossi
+6.10
+16.50
+5.00
Dana
+21.00
+6.00
Refael
+6.10
+24.00
+7.00
+8.00
END

Sum the numbers while there is a '+' sign before them.

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.5 Class exercise 4a

3. (Home Ex. 2 Q. 5) Write a script that reads several protein sequences in FASTA format, and prints the name and length of each sequence. Start with the example code from the last lesson.

4*. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header starts with 'Chr07'.

5**. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header contains 'Chr07'.
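Exercises 4 and 5 differ only in how the header is tested. A possible sketch of the two tests, applied to a header whose leading '>' has already been removed; the helper names and the example headers are made up, and `index` is Perl's built-in substring search (it returns -1 when the substring is absent).

```perl
use strict;
use warnings;

# Exercise 4: does the header begin with 'Chr07'?
sub startsWithChr07 {
    my ($header) = @_;
    return substr($header, 0, 5) eq 'Chr07';
}

# Exercise 5: does 'Chr07' appear anywhere in the header?
# index() returns the position of the substring, or -1 if it is absent.
sub containsChr07 {
    my ($header) = @_;
    return index($header, 'Chr07') != -1;
}

# Made-up example headers
foreach my $header ('Chr07_gene1', 'scaffold_Chr07_region', 'Chr01_gene2') {
    my $starts   = startsWithChr07($header) ? 'yes' : 'no';
    my $contains = containsChr07($header)   ? 'yes' : 'no';
    print "$header: starts with Chr07: $starts, contains Chr07: $contains\n";
}
```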

5.6

Reading and writing files

5.7

Open a file for reading, and link it to a filehandle:

open(IN, "<EHD.fasta");

And then read lines from the filehandle, exactly like you would from <STDIN>:

my $line = <IN>;          # reads the next line
my @inputLines = <IN>;    # reads all the remaining lines
foreach $line (@inputLines) ...

Every filehandle opened should be closed:

close(IN);

Always check that the open didn't fail (e.g. because no file by that name exists):

open(IN, "<$file") or die "can't open file $file: $!";

Reading files
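The pieces of this slide combined into one sketch. It uses the three-argument form of `open` with a lexical filehandle (`my $in`), which is equivalent here to the bareword `IN` style above, and adds `$!` (the operating-system error message) to the `die` text. The sub name `readFirstAndRest` is hypothetical, and `-e` (does the file exist?) is used only so the demo doesn't die when `EHD.fasta` is missing.

```perl
use strict;
use warnings;

# Read a file: the first line on its own, then all remaining lines at once.
# Returns the first line and a reference to an array of the remaining lines.
sub readFirstAndRest {
    my ($file) = @_;
    open(my $in, '<', $file) or die "can't open file $file: $!";
    my $first = <$in>;    # scalar context: reads one line (undef if the file is empty)
    my @rest  = <$in>;    # list context: reads every remaining line
    close($in);
    return ($first, \@rest);
}

my $file = 'EHD.fasta';   # file name from the slide; it may not exist where you run this
if (-e $file) {           # -e: check that the file exists before reading it
    my ($first, $rest) = readFirstAndRest($file);
    print "first line: ", (defined $first ? $first : "(file is empty)\n");
    print "remaining lines: ", scalar(@$rest), "\n";
} else {
    print "$file not found\n";
}
```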

5.8

Open a file for writing, and link it to a filehandle:

open(OUT, ">EHD.analysis") or die...

NOTE: If a file by that name already exists, it will be overwritten!

You can instead append lines to the end of an existing file:

open(OUT, ">>EHD.analysis") or die...

Print to a file (in both cases):

print OUT "The mutation is in exon $exonNumber\n";

Writing to files

Note: there is no comma between the filehandle OUT and the text being printed.
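A small sketch contrasting `>` and `>>`: the file is first created (or overwritten), then reopened in append mode, then read back. The file name `demo.analysis` is hypothetical. It uses a lexical filehandle (`my $out`) instead of the bareword `OUT`; the no-comma rule for `print` is the same.

```perl
use strict;
use warnings;

my $outFile = 'demo.analysis';    # hypothetical output file name

# '>' creates the file, or overwrites it if it already exists
open(my $out, '>', $outFile) or die "can't open $outFile: $!";
print $out "first run\n";          # no comma between filehandle and text
close($out);

# '>>' keeps the existing content and adds new lines at its end
open($out, '>>', $outFile) or die "can't open $outFile: $!";
print $out "appended line\n";
close($out);

# Read the file back: it now holds both lines
open(my $in, '<', $outFile) or die "can't open $outFile: $!";
my @lines = <$in>;
close($in);
print @lines;
```

Running it leaves `demo.analysis` in the current directory, containing the two lines; running it again still gives two lines, because `>` empties the file first.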

5.9

You can ask questions about a file or a directory, usually given by its name (these operators also accept an open filehandle):

if (-e $name) { print "The file $name exists!\n"; }

-e $name  exists
-r $name  is readable
-w $name  is writable by you
-z $name  has zero size
-s $name  has non-zero size (returns the size)
-f $name  is a file
-d $name  is a directory
-l $name  is a symbolic link
-T $name  is a text file
-B $name  is a binary file (opposite of -T)

File Test Operators
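A sketch that applies several of these operators to one name and describes what it finds. The sub name `describe` and the names tested are illustrative; `$0` is Perl's variable holding the name of the running script, and `.` is the current directory.

```perl
use strict;
use warnings;

# Describe a path using some of the file test operators listed above.
sub describe {
    my ($name) = @_;
    return "$name does not exist" unless -e $name;
    my @facts;
    push @facts, 'a directory'     if -d $name;
    push @facts, 'a file'          if -f $name;
    push @facts, 'readable'        if -r $name;
    push @facts, 'writable by you' if -w $name;
    push @facts, 'empty'           if -z $name;
    # -s returns the size in bytes (false when the size is zero)
    my $size = -s $name;
    push @facts, "$size bytes" if $size;
    return "$name: " . join(', ', @facts);
}

foreach my $name ('.', $0, 'no_such_file.txt') {
    print describe($name), "\n";
}
```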

5.10

open( IN, '<D:\workspace\Perl\p53.fasta' );

• Always use a full path name: it is safer and clearer to read

• Remember to use \\ in double quotes

open( IN, "<D:\\workspace\\Perl\\$name.fasta" );

• You can (usually) also use /

open( IN, "<D:/workspace/Perl/$name.fasta" );

Working with paths
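A quick check of the three spellings on this slide: the single-quoted and the double-quoted paths produce the same string. Had the double-quoted version used single backslashes, Perl would have read `\w` and `\P` as (unrecognized) escape sequences and dropped the backslashes, warning about them under `-w`. The variable names are illustrative.

```perl
use strict;
use warnings;

my $name = 'p53';

# Single quotes: backslashes are kept as they are (only \\ and \' are special)
my $single = 'D:\workspace\Perl\p53.fasta';

# Double quotes: each backslash has to be written as \\
my $double = "D:\\workspace\\Perl\\$name.fasta";

# Forward slashes need no escaping at all
my $slashes = "D:/workspace/Perl/$name.fasta";

print "$single\n$double\n$slashes\n";
print "The single- and double-quoted paths are the same string\n" if $single eq $double;
```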

5.11

Reading files: example

$line = <STDIN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print "$name $sum\n";
    # Read next line
    $line = <STDIN>;
    chomp $line;
}

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.12

Reading files: example

open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
$line = <IN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print "$name $sum\n";
    # Read next line
    $line = <IN>;
    chomp $line;
}
close(IN);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.13

Reading files: example

open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
open(OUT, '>D:\perl_ex\out.txt') or die "can't open output file";
$line = <IN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print OUT "$name $sum\n";
    # Read next line
    $line = <IN>;
    chomp $line;
}
close(IN);
close(OUT);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.14 Class exercise 5a

1. Change the script for class exercise 4a.2 to read the lines from an input file (instead of reading them from the keyboard).

2. Now, in addition, write the output of the previous question to a file named 'D:\perl_ex\class.ex.4a2.out' (instead of printing it to the screen).

3*. Now, before opening 'D:\perl_ex\class.ex.4a2.out', check whether it exists; if it does, print a message that the output file already exists and exit the script.

4*. Change the script for class exercise 4a.3 to receive two strings from the user: (1) the name of a FASTA file and (2) the name of an output file. Then read from the FASTA file and write to the output file given by the user.

5.15

Passing information using command-line arguments

5.16

It is common to pass arguments (separated by spaces) to a program or script on the command line:

They will be stored in the array @ARGV:

foreach my $arg (@ARGV){ print "$arg\n";}

Command line arguments

> perl -w findProtein.pl D:\perl_ex\in.fasta 2 430

Output:
D:\perl_ex\in.fasta
2
430

@ARGV: 'D:\perl_ex\in.fasta', '2', '430'

5.17


> perl -w findProtein.pl D:\my perl\in.fasta 2 430

Command line arguments

Output:
D:\my
perl\in.fasta
2
430

@ARGV: 'D:\my', 'perl\in.fasta', '2', '430'

The space inside the path splits it into two separate arguments.

5.18


> perl -w findProtein.pl "D:\my perl\in.fasta" 2 430

Command line arguments

Output:
D:\my perl\in.fasta
2
430

@ARGV: 'D:\my perl\in.fasta', '2', '430'

Quoting the path keeps it as a single argument.

5.19

You can also read individual arguments from @ARGV by index:

my $inFile = $ARGV[0];
my $outFile = $ARGV[1];

Or more simply:

my ($inFile,$outFile) = @ARGV;

Command line arguments

> perl -w findProtein.pl D:\perl_ex\in.fasta D:\perl_ex\out.txt
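Reading `$ARGV[0]` and `$ARGV[1]` gives `undef` if the user supplied fewer arguments, so scripts often check the argument count first. A sketch of that check; the sub name `fileNamesFrom` is hypothetical, and the usage text is illustrative (it reuses the script name from the slide).

```perl
use strict;
use warnings;

# Return the input and output file names if exactly two arguments were given,
# otherwise an empty list.
sub fileNamesFrom {
    my @args = @_;
    return () unless scalar(@args) == 2;
    return @args;
}

my ($inFile, $outFile) = fileNamesFrom(@ARGV);
if (defined $outFile) {
    print "input: $inFile, output: $outFile\n";
} else {
    # A real script would usually stop here, for example with die
    print STDERR "usage: perl -w findProtein.pl INPUT_FILE OUTPUT_FILE\n";
}
```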

5.20 Command line arguments in Eclipse

5.21 Command line arguments in Eclipse

5.22

Reminder: the class exercise of 3 days ago.

Reading files - example

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.23

Reading files: example

$line = <STDIN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print "$name $sum\n";
    # Read next line
    $line = <STDIN>;
    chomp $line;
}

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.24

Reading files: example

my ($inFileName) = @ARGV;
open(IN, "<$inFileName") or die "can't open $inFileName";
$line = <IN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print "$name $sum\n";
    # Read next line
    $line = <IN>;
    chomp $line;
}
close(IN);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.25

Reading files: example

my ($inFileName, $outFileName) = @ARGV;
open(IN, "<$inFileName") or die "can't open $inFileName";
open(OUT, ">$outFileName") or die "can't open $outFileName";
$line = <IN>;
chomp $line;
# loop processes one input line and prints the output for that line
while ($line ne "END") {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print OUT "$name $sum\n";
    # Read next line
    $line = <IN>;
    chomp $line;
}
close(IN);
close(OUT);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.26

Reading files: example

my ($inFileName, $outFileName) = @ARGV;
open(IN, "<$inFileName") or die "can't open $inFileName";
open(OUT, ">$outFileName") or die "can't open $outFileName";
$line = <IN>;
chomp $line;
# loop processes one input line and prints the output for that line
while (defined $line) {
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print OUT "$name $sum\n";
    # Read next line
    $line = <IN>;
    chomp $line;
}
close(IN);
close(OUT);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00

Output:
Yossi 27.6
Dana 27
Refael 45.1

5.27

Reading files: example

my ($inFileName, $outFileName) = @ARGV;
open(IN, "<$inFileName") or die "can't open $inFileName";
open(OUT, ">$outFileName") or die "can't open $outFileName";
$line = <IN>;
# loop processes one input line and prints the output for that line
while (defined $line) {
    chomp $line;
    # Separate name and numbers
    @nameAndNums = split(/ /, $line);
    $name = $nameAndNums[0];
    @nums = split(/,/, $nameAndNums[1]);
    $sum = 0;
    # Sum numbers
    foreach $num (@nums) {
        $sum = $sum + $num;
    }
    print OUT "$name $sum\n";
    # Read next line
    $line = <IN>;
}
close(IN);
close(OUT);

Input:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00

Output:
Yossi 27.6
Dana 27
Refael 45.1
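Perl also has a shorter form of the read-until-end-of-file loop on this slide: `while (my $line = <$in>)` reads one line per iteration and stops at end of file, because Perl implicitly adds the `defined` test for this form. A sketch of the same computation written that way; the sub name `sumLines` is hypothetical, and the input comes from an in-memory filehandle (the example data from the slides) so the code runs without an input file.

```perl
use strict;
use warnings;

# Sum the expenses on each "name n1,n2,..." line read from $in
# and print "name sum" to $out.
sub sumLines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {    # stops at end of file
        chomp $line;
        my ($name, $numbers) = split(/ /, $line);
        my $sum = 0;
        foreach my $num (split(/,/, $numbers)) {
            $sum = $sum + $num;
        }
        print $out "$name $sum\n";
    }
}

my $input = "Yossi 6.10,16.50,5.00\nDana 21.00,6.00\nRefael 6.10,24.00,7.00,8.00\n";
open(my $in, '<', \$input) or die "can't open in-memory input: $!";
sumLines($in, \*STDOUT);
close($in);
```

With the example data it prints `Yossi 27.6`, `Dana 27` and `Refael 45.1`, matching the slide. To process a real file, pass a filehandle opened with `open(my $in, '<', $inFileName)` and an output filehandle instead of `\*STDOUT`.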

5.28Class exercise 5b

1. Change the script of class exercise 5a.2 so that the script receives the input and output file names as command-line arguments.

2*. Write a script that receives a number of numeric arguments and prints their sum. For example, for the arguments:

10 20 30 40

Output:

100