96-Summer Bioinformatics Programming Practicum (II)

Page 1

96-Summer Bioinformatics Programming Practicum (II)

Bioinformatics with Perl

8/13~8/22 蘇中才
8/24~8/29 張天豪
8/31 曾宇鳳

Page 2

Schedule

Date        Time          Subject                                             Speaker
8/13 (Mon)  13:30~17:30   Perl Basics                                         蘇中才
8/15 (Wed)  13:30~17:30   Programming Basics                                  蘇中才
8/17 (Fri)  13:30~17:30   Regular expression                                  蘇中才
8/20 (Mon)  13:30~17:30   Retrieving Data from Protein Sequence Database      蘇中才
8/22 (Wed)  13:30~17:30   Perl combines with Genbank, BLAST                   蘇中才
8/24 (Fri)  13:30~17:30   PDB database and structure files                    張天豪
8/27 (Mon)  8:30~12:30    Extracting ATOM information                         張天豪
8/27 (Mon)  13:30~17:30   Mapping of Protein Sequence IDs and Structure IDs   張天豪
8/31 (Fri)  13:30~17:30   Final and Examination                               曾宇鳳

Page 3

Regular expression

File handle

Page 4

File handle

Reserved file handle
File manipulation
File test operator
File status
Localtime

Page 5

Reserved file handle

STDIN
STDOUT
STDERR
DATA
ARGV
ARGVOUT

Page 6

File handle - open

Input
open SEQ, "seq.txt";
open SEQ, "< seq.txt";

Output
open SEQ, "> seq.txt";

Appended output
open LOG, ">> log.txt";
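The three open modes above can also be written with the three-argument form of open and a lexical file handle, which is a common alternative to the bareword style on the slide. The following is a minimal sketch, assuming a scratch file in a temporary directory (the path is hypothetical and stands in for seq.txt and log.txt).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file; the slide uses seq.txt and log.txt instead.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/seq.txt";

# ">" creates the file, or truncates it if it already exists.
open(my $out, '>', $path) or die "cannot write $path: $!\n";
print $out "MSEYQPSLFA\n";
close $out;

# ">>" appends to the end of the existing file.
open(my $log, '>>', $path) or die "cannot append to $path: $!\n";
print $log "LNPMGFSPLD\n";
close $log;

# "<" opens the file for reading.
open(my $in, '<', $path) or die "cannot read $path: $!\n";
my @lines = <$in>;
close $in;

print scalar(@lines), " lines read\n";
```

Keeping the mode in its own argument means a file name that happens to start with ">" or "<" cannot change how the file is opened.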

Page 7

File handle - close

Input/Output
close SEQ;
close LOG;

Page 8

File handle - die

Error handling
die "<your error message>";
$! : system error message

Example
#!/usr/bin/perl -w
#log.pl : write the read-only file
open LOG, ">> disorder.fa" or die "LOG ERROR:$!\n";
# write log
close LOG;

Page 9

File handle - warn

Warning handling
warn "<your error message>";
$! : system error message

Example
open LOG, ">> disorder.txt" or warn "LOG ERROR:$!";

Page 10

File copy

#!/usr/bin/perl -w
#copy1.pl : copy data from the input file into the output file
open INPUT, "<disorder.fa" or die "disorder.fa can't be opened\n";
open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
my $line;
while ( $line = <INPUT> )
{
    chomp $line;
    print OUTPUT "$line\n";
}
close INPUT;
close OUTPUT;

Page 11

File test operators (1/3)

Operator  Description
-A        Returns the access age of OPERAND, in days, when the program started.
-b        Tests if OPERAND is a block device.
-B        Tests if OPERAND is a binary file. If OPERAND is a file handle, then the current buffer is examined instead of the file itself.
-c        Tests if OPERAND is a character device.
-C        Returns the inode change age of OPERAND, in days, when the program started.
-d        Tests if OPERAND is a directory.
-e        Tests if OPERAND exists.
-f        Tests if OPERAND is a regular file, as opposed to a directory, symbolic link or other type of file.

Page 12

File test operators (2/3)

Operator  Description
-g        Tests if OPERAND has the setgid bit set.
-k        Tests if OPERAND has the sticky bit set.
-l        Tests if OPERAND is a symbolic link. Under DOS, this operator always returns false.
-M        Returns the modification age of OPERAND, in days, when the program started.
-o        Tests if OPERAND is owned by the effective uid. Under DOS, it always returns true.
-O        Tests if OPERAND is owned by the real uid. Under DOS, it always returns true.
-p        Tests if OPERAND is a named pipe.
-r        Tests if OPERAND can be read from.

Page 13

File test operators (3/3)

Operator  Description
-R        Tests if OPERAND can be read from by the real uid/gid. Under DOS, it is identical to -r.
-s        Returns the size of OPERAND in bytes, so the result is true whenever the size is non-zero.
-S        Tests if OPERAND is a socket.
-t        Tests if OPERAND is opened to a tty.
-T        Tests if OPERAND is a text file. If OPERAND is a file handle, then the current buffer is examined instead of the file itself.
-u        Tests if OPERAND has the setuid bit set.
-w        Tests if OPERAND can be written to.
-W        Tests if OPERAND can be written to by the real uid/gid. Under DOS, it is identical to -w.
-x        Tests if OPERAND can be executed.
-X        Tests if OPERAND can be executed by the real uid/gid. Under DOS, it is identical to -x.
-z        Tests if OPERAND size is zero.
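A few of the operators in the tables above can be tried on files the script creates itself. This is a small sketch, assuming two hypothetical scratch files (empty.fa and full.fa) in a temporary directory. The hash keys are illustrative names only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files used only for this demonstration.
my $dir   = tempdir(CLEANUP => 1);
my $empty = "$dir/empty.fa";
my $full  = "$dir/full.fa";

open(my $fh, '>', $empty) or die "cannot create $empty: $!\n";
close $fh;
open($fh, '>', $full) or die "cannot create $full: $!\n";
print $fh "ACGT\n";
close $fh;

# Collect the result of each test; 1/0 is stored for the boolean tests.
my %result = (
    dir_is_directory => (-d $dir   ? 1 : 0),
    full_is_file     => (-f $full  ? 1 : 0),
    empty_is_zero    => (-z $empty ? 1 : 0),
    full_size        => (-s $full),
    missing_exists   => (-e "$dir/missing.fa" ? 1 : 0),
);
print "$_ = $result{$_}\n" for sort keys %result;
```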

Page 14

File copy +

#!/usr/bin/perl -w
#copy2.pl : copy data from the input file into the output file
if (not -e "disorder1.fa")
{
    die "disorder1.fa does not exist\n";
    print "continue to open disorder1.fa\n";   # never reached: die ends the program
}
open INPUT, "<disorder1.fa" or die "disorder1.fa can't be opened\n";
if (-e "temp.fa")
{
    warn "temp.fa already exists\n";
    print "continue to write temp.fa\n";       # reached: warn only prints a message
}
open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
my $line;
while ( $line = <INPUT> )
{
    chomp $line;
    print OUTPUT "$line\n";
}
close OUTPUT;
close INPUT;

Page 15

Exercise

File handle

Page 16

File size

Get the size of a file
my $size = -s "disorder.fa";

Check file size
if ( -s "disorder.fa" > 5*1024) { … }

if ($size=-s "disorder.fa" > 5*1024)
{
    print "disorder.fa has $size bytes\n";
}

What's the value of $size ? Why ?
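One way to explore the question above is to run both forms side by side. The sketch below assumes a hypothetical 6000-byte scratch file standing in for disorder.fa. It relies on the fact that assignment has lower precedence than the > comparison.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the slide's disorder.fa.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/disorder.fa";
open(my $fh, '>', $path) or die "cannot create $path: $!\n";
print $fh 'A' x 6000;               # 6000 bytes, above the 5*1024 threshold
close $fh;

# Assignment binds more loosely than ">", so this stores the comparison result.
my $flag = -s $path > 5 * 1024;

# Parenthesising the assignment stores the size first, then compares it.
my $size;
if (($size = -s $path) > 5 * 1024) {
    print "flag = $flag, size = $size bytes\n";
}
```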

Page 17

Exercise – linenumber.pl

Input (disorder.fa)
>GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
...
EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
DALGIDEYGG

Output
1 >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
2 MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
3 TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
...
128 EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
129 GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
130 DALGIDEYGG

Page 18

Regular expression

File status, localtime

Page 19

File information - stat

Index  Field    Description
0      dev      device number of filesystem
1      ino      inode number
2      mode     file mode (type and permissions)
3      nlink    number of (hard) links to the file
4      uid      numeric user ID of file's owner
5      gid      numeric group ID of file's owner
6      rdev     the device identifier (special files only)
7      size     total size of file, in bytes
8      atime    last access time in seconds since the epoch
9      mtime    last modify time in seconds since the epoch
10     ctime    inode change time in seconds since the epoch (*)
11     blksize  preferred block size for file system I/O
12     blocks   actual number of blocks allocated
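When only a couple of these fields are needed, a list slice on the value returned by stat picks them out by index. A minimal sketch, assuming a hypothetical 100-byte scratch file:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file of exactly 100 bytes.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/disorder.fa";
open(my $fh, '>', $path) or die "cannot create $path: $!\n";
print $fh "ACGT" x 25;
close $fh;

# Indices 7 and 9 of the stat list are size and mtime.
my ($size, $mtime) = (stat($path))[7, 9];
print "size = $size, mtime = $mtime\n";
```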

Page 20

File status

#!/usr/bin/perl -w
#stat.pl : show the information of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if(not defined($fn));
die "$fn does not exist\n" if(not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);

print "device = $dev\n";
print "inode = $ino\n";
print "mode = $mode\n";
print "node link = $nlink\n";
print "user id = $uid\n";
print "group id = $gid\n";
print "rdev = $rdev\n";
print "size = $size\n";
print "atime = $atime\n";
print "mtime = $mtime\n";
print "ctime = $ctime\n";
print "block size = $blksize\n";
print "blocks = $blocks\n";

Page 21

Local time

#!/usr/bin/perl -w
#localtime1.pl : show the readable time of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if (not defined($fn));
die "$fn does not exist\n" if (not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);

my $alocal = localtime $atime;
my $mlocal = localtime $mtime;
my $clocal = localtime $ctime;
print "atime = $alocal\n";
print "mtime = $mlocal\n";
print "ctime = $clocal\n";

Page 22

Local time +

#!/usr/bin/perl -w
#localtime2.pl : show the user-defined time of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if (not defined($fn));
die "$fn does not exist\n" if (not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";

Page 23

Local time

$sec  : 0~59
$min  : 0~59
$hour : 0~23
$day  : 1~31
$mon  : 0~11 (0 = January)
$year : years since 1900 (add 1900 to get the calendar year)
$wday : 0 (Sunday) ~ 6 (Saturday)
$yday : 0 (Jan 1) ~ 364 or 365
$isdst: daylight saving time (positive or zero)
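Because $mon starts at 0 and $year counts from 1900, both need adjusting before they are printed. The sketch below uses a hypothetical helper, format_fields, that builds a zero-padded "YYYY-MM-DD hh:mm:ss" string. The fixed field values are the ones shown in the quiz output later in the deck.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the list returned by localtime and
# returns "YYYY-MM-DD hh:mm:ss"; the trailing fields are ignored.
sub format_fields {
    my ($sec, $min, $hour, $day, $mon, $year) = @_;
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $day, $hour, $min, $sec);
}

# Field values for Thu Aug 2 10:10:16 2007, as in the quiz output.
my $fixed = format_fields(16, 10, 10, 2, 7, 107, 4, 213, 0);
print "$fixed\n";

# The same helper applied to the current local time.
my $now = format_fields(localtime);
print "$now\n";
```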

Page 24

Exercise

localtime

Page 25

Quiz – localtime

my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";

mtime = (107/7/2 10:10:16 (4;213;0)

my $mlocal = localtime $mtime;
print "mtime = $mlocal\n";

mtime = Thu Aug 2 10:10:16 2007

my ($mlocal) = localtime $mtime;

?

Page 26

Exercise

How to show the time information of disorder.fa like "2007/8/2 10:10:16 (Thu)" ?
Hint: year, month and weekday
@weekDays = qw(Sun Mon Tue Wed Thu Fri Sat Sun);

How to show the time information of disorder.fa like "Aug 2 2007 10:10:16 (Thu)" ?
@months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

Page 27

Regular expression

Basic

Page 28

How to search a word in a text file ?

Unix command
grep

Perl
Regular expression

Page 29

An example of Regular expression

#!/usr/bin/perl -w
#google1.pl : check string with/without a certain pattern
while (1)
{
    print "Please enter your query:";
    $line = <>;
    last if not defined $line;   # stop at end of input instead of looping forever
    if ($line =~ /google/)
    {
        print "Found!!!\n";
    }
    else
    {
        print "No match\n";
    }
}

Page 30

If we want to find the following words

google, g01gle, g12gle, gabgle, …, gxxgle

ggle, gogle, google, gooogle, …, go…ogle

gogle, google, gooogle, …, go…ogle

google, goooogle, goooooogle, …, goo…oogle

ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …

Page 31

Meta-character

Wildcard (.)
Matches any single character except "\n"

Quantifier
? : zero or one occurrence of the preceding item
* : zero or more occurrences of the preceding item
+ : one or more occurrences of the preceding item

Page 32

If we want to find the following words

google, g01gle, g12gle, gabgle, …, gxxgle
/g..gle/

ggle, gogle, google, gooogle, …, go…ogle
/go*gle/

gogle, google, gooogle, …, go…ogle
/go+gle/

google, goooogle, goooooogle, …, goo…oogle
/g(oo)+gle/

ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …
/g.*gle/
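The five patterns above can be checked against a small word list. This is only a sketch: the word list is made up for illustration, and the ^ and $ anchors (introduced on a later slide) are added so that each pattern has to cover the whole word.

```perl
use strict;
use warnings;

# Illustrative word list; not taken from any data file.
my @words = qw(ggle gogle google gooogle goooogle g12gle gaagle);

# For each pattern, keep the words it matches from start to end.
my %matches = (
    'g..gle'    => [ grep { /^g..gle$/    } @words ],
    'go*gle'    => [ grep { /^go*gle$/    } @words ],
    'go+gle'    => [ grep { /^go+gle$/    } @words ],
    'g(oo)+gle' => [ grep { /^g(oo)+gle$/ } @words ],
    'g.*gle'    => [ grep { /^g.*gle$/    } @words ],
);

for my $pattern (sort keys %matches) {
    print "/$pattern/ matches: @{ $matches{$pattern} }\n";
}
```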

Page 33

Character class

[ ]
-
^

Examples
[abcdefghijklmnopqrstuvwxyz] or [a-z]
[0123456789] or [0-9]
[abcxyz]
[02468] or [^13579]
[A-Za-z0-9]

Page 34

Character class shorthand
[\d] : [0-9]
[\w] : [A-Za-z0-9_]
[\s] : [\f\t\n\r ]

Something you don't want
[\D] : [^\d]
[\W] : [^\w]
[\S] : [^\s]

How about [\s\S] ?

What's the difference between . and [\s\S] ?
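A short experiment helps with the last question. The sketch below assumes a test string with a newline in the middle and compares the wildcard, the [\s\S] class, and the wildcard with the /s flag (covered on a later slide).

```perl
use strict;
use warnings;

my $text = "go\ngle";   # a newline sits between "go" and "gle"

my $dot_matches   = ($text =~ /go.gle/)      ? 1 : 0;   # . does not match "\n" by default
my $class_matches = ($text =~ /go[\s\S]gle/) ? 1 : 0;   # [\s\S] accepts any character
my $dot_s_matches = ($text =~ /go.gle/s)     ? 1 : 0;   # /s lets . match "\n" too

print "dot: $dot_matches, class: $class_matches, dot with /s: $dot_s_matches\n";
```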

Page 35

Please think …

/google/
/g[\d][\d]gle/
/g..gle/
/g[\w]*gle/
/g.*gle/
/g[\d\D]*gle/
/g..........gle/

Page 36

Additional quantifiers

| : alternation (either pattern)
{ n, m } : at least n and at most m repetitions

Examples
/(google|Google)/ or /(G|g)oogle/
/g..........gle/ or /g.{10}gle/
/go{0,100}gle/
/g(oo)+gle/ or /g(oo){1,}gle/

Page 37

Additional quantifiers

^ : beginning of the string
$ : end of the string
\b : boundary of a word
\B : any position that is not a word boundary

Examples
/^google$/
/\bgoogle\b/
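The difference between the two example patterns shows up when "google" is only part of the input. A minimal sketch, with input strings invented for illustration:

```perl
use strict;
use warnings;

# Illustrative inputs only.
my @inputs = ('google', 'google maps', 'googleplex', 'my google');

my @whole_string = grep { /^google$/   } @inputs;   # the entire string must be "google"
my @whole_word   = grep { /\bgoogle\b/ } @inputs;   # "google" must stand alone as a word

print "whole string: ", join(', ', @whole_string), "\n";
print "whole word:   ", join(', ', @whole_word), "\n";
```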

Page 38

Additional quantifiers

( )
\1, \2, … : backreference

Examples
/g(o)\1gle/
/g([\S])\1gle/

Output (matched variable)
$1, $2, …
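Backreferences and the $1 variable can be tried together. The sketch below uses a made-up word list: \1 inside the pattern repeats what the group captured, and $1 reads the captured text after the match.

```perl
use strict;
use warnings;

# Illustrative word list only.
my @words = qw(google gaagle g12gle g11gle);

# \1 must repeat exactly what the first group captured.
my @doubled = grep { /^g(\S)\1gle$/ } @words;
print "doubled middle character: @doubled\n";

# After a successful match, $1 holds the text captured by the first group.
my $captured = '';
if ('g77gle' =~ /^g(\d)\1gle$/) {
    $captured = $1;
}
print "captured digit: $captured\n";
```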

Page 39

Exercise

Basic regular expression

Page 40

Exercise

How to extract these words ?

gogle, gooogle, gooooogle, gooooooogle (No ggogles)

g11gle, g33gle, g55gle, g77gle, g99gle (excluding gg99gles)

What do those mean ?

/g[\d]+gle/

/go?gle/

/g([\w])([\w])\2\1gle/

Page 41

Magic variable - $_

Original
while ($line = <>) {
    chomp($line);
    if ($line =~ /google/) {
        print "$line\n";
    }
}

Magic
while (<>) {
    chomp;
    if (/google/) {
        print "$_\n";
    }
}

Page 42

Magic variable - $_

#!/usr/bin/perl -w
#google2.pl : check string with/without a certain pattern
print "Please enter your query:";
while (<>)
{
    chomp;
    if (/google/)
    {
        print "Found!!!\n";
    }
    else
    {
        print "No match\n";
    }
    print "Please enter your query:";
}

Page 43

Regular expression

Flags

Page 44

Regular Expression

String matching
m// or //

String substitution
s///

String transliteration
tr/// or y///

Page 45

Matching

Complete syntax
m//

Examples
m/google/
m/g(oo){0,}gle/

Others
m<google>, m[google], m!google!, …

Page 46

Flag options
/i : case insensitivity
/s : let . become [\d\D]
/m : multiple lines (^ and $ also match at the start and end of each line)

Examples
google, Google, GOOGLE, gOOGLE, GooGle, …
m/google/i
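The /i and /m flags can be compared on small inputs. This is a sketch with invented data: the name list and the two-line string are illustrative only.

```perl
use strict;
use warnings;

# /i ignores case, so every spelling of "google" is found.
my @names = qw(google Google GOOGLE gOOGLE yahoo);
my @found = grep { m/google/i } @names;
print scalar(@found), " of ", scalar(@names), " names match\n";

# Without /m, ^ only matches at the very start of the string.
my $text = "yahoo\ngoogle\n";
my $without_m = ($text =~ /^google$/)  ? 1 : 0;
my $with_m    = ($text =~ /^google$/m) ? 1 : 0;   # /m lets ^ and $ match at each line
print "without /m: $without_m, with /m: $with_m\n";
```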

Page 47

Matched patterns

$& : the last matched pattern
$` : prefix-string of $&
$' : suffix-string of $&

Examples
$string = "Microsoft google Yahoo";
$string =~ m/google/i;
print "[$`][$&][$']\n";

[Microsoft ][google][ Yahoo]

Page 48

Matched pattern - $&, $`, $'

#!/usr/bin/perl -w
#google3.pl : check string with/without a certain pattern
print "Please enter your query:";
while (<>)
{
    chomp;
    if (m/google/i)
    {
        print "Match:[$&]\n";
        print "prefix : [$`]\n";
        print "suffix : [$']\n";
    }
    else
    {
        print "No match\n";
    }
    print "Please enter your query:";
}

Page 49

Substitution

Complete syntax
s/// or s###

Examples
$string =~ s/google/GOOGLE/
s/(google|GOOGLE)/Microsoft/

Others
s#^https://#http://#;

Page 50

Flag options
/i : case insensitivity
/s : let . become [\d\D]
/g : multiple replacement

Examples
s/google/yahoo/sg
s/\s+/ /g
s/^\s+//
s/\s+$//
s#^.*/##s
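Several of the example substitutions combine naturally into a whitespace clean-up, and the last one strips the directory part of a path. A minimal sketch, assuming an invented header line and a hypothetical file path:

```perl
use strict;
use warnings;

my $line = "   GCN4_YEAST    General   control protein  ";
$line =~ s/^\s+//;     # drop leading whitespace
$line =~ s/\s+$//;     # drop trailing whitespace
$line =~ s/\s+/ /g;    # collapse every run of whitespace into one space
print "[$line]\n";

my $path = "/home/user/data/disorder.fa";   # hypothetical path
(my $name = $path) =~ s#^.*/##s;            # greedy .* removes everything up to the last "/"
print "$name\n";
```

Copying into a new variable inside the parentheses, as in the last substitution, leaves the original string unchanged.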

Page 51

Flag options
\U : upper case
\L : lower case
\E : end-point for case setting
\u : upper case for the next character
\l : lower case for the next character

Examples
s/(GOOGLE)/\L$1/ig
s/(\w+) kiss (\w+)/\U$2\E was kissed by $1/i;
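The case escapes are easiest to follow on a concrete string. The sketch below runs the second example on an invented sentence and adds one illustrative use of \u.

```perl
use strict;
use warnings;

my $sentence = "Tom kiss mary";
# \U upper-cases $2 until \E; $1 is inserted unchanged.
$sentence =~ s/(\w+) kiss (\w+)/\U$2\E was kissed by $1/i;
print "$sentence\n";

my $word = "google";
(my $capitalised = $word) =~ s/(\w+)/\u$1/;   # \u upper-cases only the next character
print "$capitalised\n";
```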

Page 52

Transliteration

Complete syntax
tr/// or y///

Examples
$string =~ tr/a-z/A-Z/
$string =~ tr/a-z/b-za/
$string =~ tr/ATCG/TAGC/
$string =~ tr/ATCG/UAGC/
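The last two examples map each base to its complement. A short sketch, assuming an invented six-base sequence, shows the complement, the reverse complement, and the complementary RNA strand:

```perl
use strict;
use warnings;

my $dna = "ATGCGT";   # illustrative sequence

(my $complement = $dna) =~ tr/ATCG/TAGC/;       # base-by-base complement
my $reverse_complement = reverse $complement;   # read the complement backwards

(my $rna = $dna) =~ tr/ATCG/UAGC/;              # complementary RNA strand

print "$dna -> $complement -> $reverse_complement, RNA $rna\n";
```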

Page 53

Transliteration - flags

/d : delete
$text =~ tr/ //d;

/c : replace every character that is not in the search list with a certain char
$text =~ tr/0-9/*/c;

/s : squeeze runs of the same character into one
$text =~ tr/a-zA-Z//s;
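The three flags can be compared on one string. This sketch assumes an invented sequence with spaces and ambiguous bases. It also uses tr with an empty replacement list, which changes nothing and returns the number of matching characters.

```perl
use strict;
use warnings;

my $seq = "ATG CGT  NNNN GGA";   # illustrative sequence

(my $no_spaces = $seq) =~ tr/ //d;             # /d deletes the listed characters
(my $masked = $no_spaces) =~ tr/ACGT/*/c;      # /c replaces everything except A, C, G, T
(my $squeezed = $seq) =~ tr/a-zA-Z//s;         # /s squeezes runs of the same letter

my $gc_count = ($no_spaces =~ tr/GC//);        # empty replacement: tr only counts
print "$no_spaces $masked [$squeezed] GC=$gc_count\n";
```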

Page 54

Exercise

Replacement

Page 55

Exercise

An Ingenious Love letter
http://love.english.tw/post/43/630
Stored in love.txt

How to decode this letter ?
Please replace 'I' with your name
Please replace "you" with your boy/girl friend's name
Please replace "we" with your names.