
Upload: doris-crawford

Post on 18-Jan-2016


Introduction to PERL
Genetics 875 2012

PERL is a language that is easy to use and was designed to do certain tasks (like reading, writing, moving text & sequences) very well

Advantages of PERL:
- it’s intuitive
- it’s easy to get started … you don’t need to know everything initially
- it’s very good at reading and manipulating files, sequences, text
- there is usually >1 way to accomplish a task

Disadvantages of PERL:
Perl programs are different from other programs, in that the program you write is “run” by another program which interprets your code (this interpreter is actually called perl … your programs will be run by the perl interpreter).

Because of this, your code is one level removed from the actual computer … Therefore, perl programs are generally slower than programs written in compiled languages (like C, C++, and Java). Thus, perl is not used so much for functions that require heavy computation.


A great way to learn PERL: “Learning Perl”
http://www.oreilly.com/catalog/lperl3/

Also, some great online resources:
http://www.perl.com/

A short PERL tutorial:
http://archive.ncsa.uiuc.edu/General/Training/PerlIntro/

And lots of other help on the web ….


Like any language, programming languages have structure

A book has words, sentences, paragraphs, chapters, and punctuation linking them all together

ENGLISH      PERL
Noun         Scalar, variable
Verb         Function, command
Phrase       Statement, expression
Paragraph    Loop
Chapter      Subroutines, packages, modules


A variable is a container that can hold information that has the potential to vary

Variables can be singular, in which case they are identified by a “$” in front of the variable name

eg) $x $File1 $StudentName

They can be a number, a letter (called a “character” in perl), a string of numbers, or a string of letters … just remember that whatever it is, it is considered a single item.

eg) $x = 5;   $motif = "GATTAC";   $StudentName = "Rutabega";

Variables can be plural and those come in different forms: Arrays and Hashes


Arrays are a list of single variables

An array is a container that holds a list of separate, single variables in a specific order.
An array is denoted by a @ in front of its name.

Eg) @StudentNames = ("Caligula", "Randolph", "Imelda");

“Caligula” is stored in the first ‘cell’ of the array, which is the “0” cell.
“Randolph” is stored in the second ‘cell’ of the array, which is the “1” cell.
“Imelda” is stored in the third ‘cell’ of the array, which is the “2” cell.

** Note that in perl, as in most programming languages, you start counting at “0” instead of at “1”

Position in array:               0          1          2
Value stored at that position:   Caligula   Randolph   Imelda


You can ‘call’ a specific cell (which, remember, is a singular variable identified by $):

$StudentNames[0] is "Caligula"
$StudentNames[1] is "Randolph"

The $ tells perl that you want a singular variable.

The square brackets [ ] tell perl that you are looking at a single cell in an array.

Between these two parts of the name, perl knows this is a cell of an array.
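Here is a small sketch of the array ideas above: build the array, then pull out single cells with $ and [ ]. The variable names $first, $third and $count are made up for illustration, and “use warnings” is an extra safety net that the slides do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; an optional extra that reports common mistakes

# An array holds an ordered list; the cells are numbered starting at 0.
my @StudentNames = ("Caligula", "Randolph", "Imelda");

# $ plus [index] asks for one cell, which is a single (scalar) value.
my $first = $StudentNames[0];
my $third = $StudentNames[2];

# scalar(@array) gives the number of cells in the array.
my $count = scalar(@StudentNames);

print "First: $first, third: $third, total: $count\n";
```

Notice that the last cell is number 2 even though the array holds 3 names, because counting starts at 0.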


Exercise 0: Write your first perl program!

We will start by creating a simple perl program where you will print a string to the screen.

A few things about writing perl programs:

-- The first few lines of the program (which you’ll write in a .pl text file) will contain information for the computer about how to run the program

-- In order for the perl interpreter to understand your code, you must use the right syntax.

Like in English, each phrase must have an obvious start and stop point.

The most common punctuation in perl is “;” which acts like a period does in English. A statement begins after the “;” from the previous statement, and ends at the next “;”

There will be other kinds of punctuation which define statements/items, like (….) {……} “…..” and we’ll get to these in a bit.

One useful punctuation is # which means “Don’t read the rest of this line of the file.” It is useful because you can type in notes to yourself (“comments”) that aren’t part of the code.


1. Open the terminal on your computer and go to the desktop.
Use the unix command “cd” to change directories (type everything written in brown):

cd Desktop

2. We will use the text editor “emacs” to create and write your file:

emacs FirstProgram.pl

This should open a blank file, since you just created it.

3. Type this in the first line of your .pl file:

#!/usr/bin/perl

This is a special magic command that tells the computer to use the perl interpreter to read and execute your program.

4. We will use a special mode of perl called “strict.” To do that, type this on the second line of your .pl file:

use strict;

5. Save your file using the emacs command, “Ctrl x Ctrl s” (ie, hold down the Ctrl key and hit x then s)

You are now ready to start writing your own code!


6. Print a sentence to the screen using the built-in perl “print” function:

print "Hello. Welcome to your first perl program \n";

The default for the print function is to print to the screen from where you ran the program. We will learn later how you can print to a file.

Note the “\n” at the end of this print statement. “\n” stands for “new-line character.” This “\n” adds a ‘return’ to the end of your statement to end the line.

You’ve now written your first perl program.

7. Save your file using the emacs command, “Ctrl x Ctrl s” (ie, hold down the Ctrl key and hit x then s)

To run your program, open another terminal window. You will call the perl interpreter and then feed it your program file name:

perl FirstProgram.pl

You will either see your sentence on the screen, or you will get some kind of error …


#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";


Exercise 1: Modify your first perl program
We will create and define a string variable and an array.

6. You will add code to your existing program. Make a variable called Name:

my $Name;

* Since we are using “strict” mode, you must declare a variable before you use it. You do that by typing “my” in front of the variable, but only when you create the variable (ie, the first time you ever type it).

7. Define the variable $Name to be your own name:

$Name = "Audrey";

8. Create an array called FavoriteHolidays:

my @FavoriteHolidays;

9. Define the array as your top 3 favorite holidays, exactly as below:

@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");


10. Print the variables you just defined to the screen using the built-in perl “print” function:

print "The top favorite holiday for $Name is $FavoriteHolidays[0]\n";

11. Save your program by typing Ctrl x Ctrl s

12. Exit emacs by typing Ctrl x Ctrl c

To run your program, open another terminal window. You will call the perl interpreter and then feed it your program file:

perl FirstProgram.pl

You will either see your name and holiday, or you will get some kind of error …


#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;

@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

print "The top favorite holiday for $Name is $FavoriteHolidays[0]\n";


Hashes are fancy containers for single variables

Whereas an array indexes variables by their position in the list:

Position in array:               0          1          2
Value stored at that position:   Caligula   Randolph   Imelda

a hash indexes one variable by another (known as a ‘key’): for example, name and hometown.

Key in hash:                   Caligula   Randolph   Imelda
Value stored with that key:    Rome       Berlin     Manila

A hash is denoted by %. To call the individual values contained in the hash, you need the key name:

my %HomeTowns;
$HomeTowns{"Caligula"} = "Rome";

The $ is for calling a single variable.

The curly brackets { } tell you it’s a hash.
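The hash described above can be tried out with a sketch like this one. The variables $town and $known are made up for illustration, and the “exists” check is an extra that the slides do not cover: it asks whether a key is in the hash at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Fill the hash one key at a time: key = name, value = hometown.
my %HomeTowns;
$HomeTowns{"Caligula"} = "Rome";
$HomeTowns{"Randolph"} = "Berlin";
$HomeTowns{"Imelda"}   = "Manila";

# $ plus {key} asks for the single value stored with that key.
my $town = $HomeTowns{"Imelda"};
print "Imelda is from $town\n";

# exists tells you whether a key has been stored in the hash.
my $known = "no";
if (exists $HomeTowns{"Nero"}) {
    $known = "yes";
}
print "Is Nero in the hash? $known\n";
```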


Exercise 2: Create and use a Hash

1. You will add code to your existing program. Make a hash called HolidayMonth:

my %HolidayMonth;

2. Define the hash, with the key = holiday and the stored value = the month:

$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

3. Print the month of the top holiday:

print "The top favorite holiday for $Name is $FavoriteHolidays[0] in $HolidayMonth{Halloween} \n";


#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;

@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;

$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

print "The top favorite holiday for $Name is $FavoriteHolidays[0] in $HolidayMonth{Halloween} \n";


Perl has a lot of built-in functions and ‘operators’

These operators work on numbers. With $x = 2; $y = 3;

+    means add               $x + 5;      is 7
-    means subtract          $y - 3;      is 0
*    means multiply          $x * 3;      is 6
/    means divide            ($x*3)/2;    is 3
++   means increase by 1     $y++;        $y is now 4
=    assignment operator (sets a variable equal to something)
==   evaluates numeric equality

There are different operators for strings. With $x = 123; $y = 456; $z = 3;

.    means concatenate two strings    $x . $y;    is 123456
x    means replicate a string         $z x 4;     is 3333
eq   evaluates string equality
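To check the operator examples for yourself, a short sketch like the following should do. The result variables ($sum, $joined, and so on) and the string variables $s1 and $s2 are names invented here so that the number and string examples can live in one program.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Number operators
my $x = 2;
my $y = 3;
my $sum        = $x + 5;
my $difference = $y - 3;
my $product    = $x * 3;
my $quotient   = ($x * 3) / 2;
$y++;    # $y goes from 3 to 4

# String operators
my $s1 = "123";
my $s2 = "456";
my $z  = 3;
my $joined   = $s1 . $s2;    # concatenate
my $repeated = $z x 4;       # replicate

print "$sum $difference $product $quotient $y\n";
print "$joined $repeated\n";
```

Running it should print the same values listed in the tables above.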


Conditional statement

Often you only want to do something if a certain condition is true. This is a case for if/unless/else statements

If $x is equal to 5, then do something translates to

if ($x == 5) {
    something ….
}


Parentheses define the start and stop of the condition.

== means “if $x is exactly equal to 5.” If you type if ($x = 5), it will reset $x to be 5 and the statement is automatically true.

Curly brackets define what to do if the conditional statement is true.


Can also use if-then-else statements:

if ($x == 5) {
    something ….
}
else {
    do something different …
}

OR

if ($x == 5) {
    something ….
}
elsif ($x<10) {
    do something different …
}

The program will evaluate the statement in ( … ): if true, it will do what’s in { … }; if false, it will SKIP what’s in { … } and resume on the line after that section.
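A runnable version of the if / elsif / else pattern might look like the sketch below. The starting value of $x and the $message variable are made up for illustration; try changing $x to 5 or 12 to see the other branches run.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

my $x = 7;
my $message;

if ($x == 5) {
    # runs only when $x is exactly 5
    $message = "x is exactly 5";
}
elsif ($x < 10) {
    # checked only if the first condition was false
    $message = "x is less than 10 (but not 5)";
}
else {
    # runs when none of the conditions above were true
    $message = "x is 10 or more";
}

print "$message\n";
```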


The ‘while’ statement is useful: do something while (some condition is true).

my $count = 0;

while ($count < 100) {
    do some function …
    $count++;
}

Remember that ++ is the “increment by one” operator. So each time you go through the loop, $count increases by one. If you forget to increase $count and it stays at 0, you will be in an infinite loop.

Note that a while statement is a kind of loop …

The ‘while’ statement turns out to be very useful for reading in files …
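Here is one way the while loop above could be filled in so that it actually runs. Adding up the counter values is just a stand-in for “do some function”, and $total is a made-up variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

my $count = 0;
my $total = 0;

# Keep looping as long as the condition in ( ) is true.
while ($count < 100) {
    $total = $total + $count;    # stand-in for "do some function"
    $count++;                    # without this line the loop would never end
}

print "count = $count, total = $total\n";
```

When the loop finishes, $count is 100 (the first value that makes the condition false) and $total holds the sum of 0 through 99.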


Repeating actions: Loops

Very often, you want to repeat the same function many times (often on different variables). For example:

-- open a file of microarray data
-- read in each line of the file
-- divide the 3rd cell of data by some constant
-- save the file

for (my $i = 0; $i<10; $i++) {
    do something …
}

There are 3 components of a “for loop”. Here $i acts as a counter.


-- my $i = 0 : create a new variable to use as a counter; usually start that counter off at 0

-- $i < 10 : do whatever as long as $i < 10

-- $i++ : after each loop, increment $i by one (using the ++ operator)


An important concept: scope. If you create a variable inside a loop, it is a “local” variable: it only exists while you’re in the loop (in this case, $i is a local variable). If you want a variable that is “global,” ie, one that exists for the duration of the program, be sure to declare it outside of any loops.
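The scope idea can be seen in a sketch like this one. $total and $square are made-up names: $total is declared outside the loop so it survives after the loop ends, while $i and $square are created inside the loop and disappear when it finishes.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

my $total = 0;    # declared outside the loop, so it exists for the whole program

for (my $i = 0; $i < 10; $i++) {
    my $square = $i * $i;    # declared inside the loop: a "local" variable
    $total = $total + $square;
}

# $i and $square no longer exist here. Under "use strict", trying to
# print $square at this point would stop the program with an error.
print "Sum of the squares of 0 through 9 is $total\n";
```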


Exercise 3: using loops

#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;

@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;

$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

for (my $i=0; $i<3; $i++) {
    print "Number $i favorite holiday for $Name is $FavoriteHolidays[$i]\n";
}


File Handling: talking to the outside world

You can open existing files to read in data and can create new files to write to using “open”:

open (HANDLE, "FileName.txt");

HANDLE is a shorthand file handle; "FileName.txt" is the actual file name … default is read-only file


open (HANDLE, ">FileName.txt");

HANDLE is a shorthand file handle; "FileName.txt" is the actual file name. The “>” means it’s a writable file.

Create a new file and print to it to save your data:

open (SF, ">SaveFile.txt");

print SF "$x";
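A sketch of writing to a file and then reading it back might look like this. The file name SaveFile_demo.txt and the handle name IN are made up for illustration. The “or die” part is an addition not shown in the slides: it stops the program with a message if the file cannot be opened. The last line deletes the demo file again so nothing is left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

my $file = "SaveFile_demo.txt";    # hypothetical file name for this demo
my $x = 5;

# ">" in front of the name opens the file for writing (creating it if needed).
open(SF, ">$file") or die "Cannot open $file for writing: $!";
print SF "$x\n";
close(SF);

# No ">" means the file is opened read-only.
open(IN, $file) or die "Cannot open $file for reading: $!";
my $line = <IN>;    # read the first line of the file
close(IN);
chomp($line);       # remove the trailing \n

print "Read back: $line\n";

unlink($file);      # delete the demo file
```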


Exercise 4: print results to a file

#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;

@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;

$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

open (SF, ">SaveFile.txt");

for (my $i=0; $i<3; $i++) {
    print SF "Number $i favorite holiday for $Name is $FavoriteHolidays[$i]\n";
}

Notice how I had to open SF outside the loop, so that the file is opened once and stays accessible on every pass through the loop.


Reading in a file: combining file handling and the while statement

open (FILE, "FileName.txt");

while (my $line = <FILE>) {
    print "$line\n";
}


create a variable to hold each line of the file

<..> is the line input operator … reads each line in a file

while there are more lines in FILE

Another useful thing: <STDIN>
STDIN is the standard way of getting information from the user, and it tells the program to wait until the user enters some information. Here’s an example:

print "Hello user. What is your favorite color:";
my $answer = <STDIN>;
chomp($answer);

When the user enters the data, a \n (return) character will be stuck onto the end of what perl takes as input. Usually, you don’t want that, so you can use the ‘chomp’ function, which removes a trailing \n from the end of a string. You would probably want to do this on $line in the example above as well.
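To see chomp at work without needing a real file or a user at the keyboard, the sketch below reads lines from a string held in memory. That trick (opening a reference to a string, with a $fh variable as the file handle) is not in the slides and is used here only to keep the example self-contained; $text and @lines are made-up names.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Pretend this string is the contents of a two-line file.
my $text = "first line\nsecond line\n";

# Opening a reference to a string lets us read it like a file.
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    chomp($line);            # strip the trailing \n from each line
    push(@lines, $line);     # add the cleaned line to the end of the array
}
close($fh);

print "Read ", scalar(@lines), " lines; the first is '$lines[0]'\n";
```

Without the chomp, each stored line would still end in \n, and printing it with another "\n" would leave a blank line after it.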


Regular expressions: comparing sequences

These are some of the most useful functions in PERL. They allow you to easily scan your sequence, search for substrings, transpose, etc.

=~ is the operator for doing regular expressions.

=~ m is the match operator … used to search for a match to some sequence

$sequence = "CCATATAGAGATGAGCCTATA";

if ($sequence =~ m/GATGAG/) {
    print "sequence contains GATGAG\n";
}


=~ s is the substitution (swap) operator … used to swap one word for another

$sequence = "CCATATAGAGATGAGCCTATA";

$sequence =~ s/GATGAG/nnnnnn/;

This will convert the sequence CCATATAGAGATGAGCCTATA
to CCATATAGAnnnnnnCCTATA
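The match and swap operators can be tried together in a sketch like this one. $found and $masked are made-up names; the sequence is copied into $masked first so that the original $sequence is left untouched by the swap.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

my $sequence = "CCATATAGAGATGAGCCTATA";

# m/ / asks a yes/no question: does the sequence contain this motif?
my $found = "no";
if ($sequence =~ m/GATGAG/) {
    $found = "yes";
}

# s/ / / changes the string it is applied to, so work on a copy.
my $masked = $sequence;
$masked =~ s/GATGAG/nnnnnn/;

print "Contains GATGAG? $found\n";
print "Original: $sequence\n";
print "Masked:   $masked\n";
```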


=~ tr is the transliteration (‘transpose’) operator … used to convert each character in one list into the corresponding character in another

$sequence =~ tr/GATC/CTAG/;

This function is useful in conjunction with the built-in “reverse” function.

my $sequence = "GGATCCAA";
my $newsequence = reverse($sequence);    # $newsequence is now AACCTAGG
$newsequence =~ tr/GATC/CTAG/;           # $newsequence is now TTGGATCC

$newsequence is now the reverse complement of $sequence
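If you need the reverse complement in more than one place, the two steps can be wrapped in a subroutine, as in this sketch. The subroutine name reverse_complement and the variables $dna, $rc and $result are made up for illustration, and handling lowercase letters in the tr list is an addition that goes beyond the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Hypothetical helper: returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse($dna);          # assigning to a single variable reverses the string
    $rc =~ tr/GATCgatc/CTAGctag/;    # swap each base for its complement
    return $rc;
}

my $result = reverse_complement("GGATCCAA");
print "Reverse complement of GGATCCAA is $result\n";
```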


Exercise 4: open and read a Fasta file

1. Create a new file called ReadFasta.pl:

emacs ReadFasta.pl

2. Type the usual stuff at the top of the file:

#!/usr/bin/perl
use strict;

3. Open the file upstreams.fasta and read in the data using the ‘while’ statement:

open (FILE, "upstreams.fasta");

while (my $line = <FILE>) {
    print "line = $line\n";
}

4. Save the file: Ctrl x Ctrl s

5. Run the file:

perl ReadFasta.pl


#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

while (my $line = <FILE>) {
    print "line is $line\n";
}


6. You will store the fasta sequence data in a hash. Go back into your program and create a hash to hold the FASTA sequence. Then create a scalar $gene to hold the gene name:

my %Fasta;
my $gene;

7. In the while statement, evaluate each line to see if it is Name or Sequence. A fasta file has >NAME\n followed by sequence.

if ($line =~ m/>/) {
    $gene = $line;
}

8. Now you know that the subsequent lines must be sequence. Store that in the hash:

else {
    $Fasta{$gene} = $Fasta{$gene} . $line;
}

Note what we are doing: we expect >NAME to come before sequence … but the sequence could extend for multiple lines in the file. Therefore, we need to concatenate sequence from multiple lines, hence the “.” operator to concatenate strings.


>YNL313C
TATGTATATGCTTAAACTAGCCTGTTCTAGATAGTCGCTATCGATTTTGCCACATTACCACC
TTAAGTTGATATAATATTGCTTATTATAAAGGAAAGAACGCGTTTCCTAACTTCGTATATGG
CGATAATTATCTAAGAAACTTCGCATCGTGAAAAAAAAGATGAAAAAAATGGAAGCTCATCG
AGGCCAAAGGAATTGCTAAAAAGAAGCTATCAGACCAGGAAGTAAACTAGTGGTTGCAAAATT

For Line 1: $line will contain >, so $gene gets set to >YNL313C (and remains this until $gene is reset)

For Line 2: $line will NOT contain > and is therefore assumed to be sequence
so, $Fasta{$gene} = $line at Line 2

For Line 3: $line will NOT contain > …
BUT if $Fasta{$gene} = $line at Line 3 then we will LOSE the previous sequence
so … concatenate with the previous sequence:

$Fasta{$gene} = $Fasta{$gene} . $line;

(remember the right side gets evaluated FIRST, then the left side gets set equal to it)
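The line-by-line logic above can be tested on a tiny made-up FASTA record before pointing it at a real file. In this sketch the gene names (geneA, geneB) and sequences are invented, and the data is read from a string in memory rather than from upstreams.fasta. It also differs from the exercise in two labeled ways: each line is chomped, and the leading > is removed from the gene name.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Made-up FASTA data: two genes, the first with sequence on two lines.
my $fasta_text = ">geneA\nGATTACA\nCCGG\n>geneB\nTTTT\n";
open(my $fh, '<', \$fasta_text) or die "Cannot open in-memory file: $!";

my %Fasta;
my $gene;

while (my $line = <$fh>) {
    chomp($line);                  # extra step: drop the trailing \n
    if ($line =~ m/^>/) {          # ^ means the > must be the first character
        $gene = $line;
        $gene =~ s/^>//;           # extra step: drop the leading >
        $Fasta{$gene} = "";        # start with an empty sequence
    } else {
        # right side is evaluated first, then stored back in the hash
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}
close($fh);

foreach my $g (sort keys %Fasta) {
    print "$g\t$Fasta{$g}\n";
}
```

Because of the concatenation, geneA ends up with both of its sequence lines joined into one string.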


Next, you’ll search through each upstream sequence for each gene for a consensus sequence.

9. We need a way to search through all of the sequences, indexed by genes. We will use the “foreach” method of looping. Because the elements of a hash are not stored in any special order, we will use a way to step through each ‘key’ in the hash.

foreach my $g (keys %Fasta) {
    print "gene is $g and sequence is $Fasta{$g}\n";
}


#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    } else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}


#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    } else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

foreach my $g (keys %Fasta) {
    print "gene is $g and sequence is $Fasta{$g}\n";
}


Next, you’ll search through each upstream sequence for each gene for a consensus sequence. You will make a new hash to store the sequence matches.

10. First create the new hash, %Matches:

my %Matches;

11. Next, within your loop, search each upstream sequence for the motif, GATGC.

12. If there is a match, set the value to GATGC; else set the value to “no match”:

foreach my $g (keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {
        $Matches{$g} = "GATGC";
    } else {
        $Matches{$g} = "no match";
    }
}

The little i after the closing / means do a case-insensitive search.
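To see the case-insensitive search in action without a FASTA file, here is a sketch that fills %Fasta by hand. The gene names and sequences are made up; geneA is deliberately in lowercase to show what the i does. “sort” is added in front of “keys” only so that the output comes out in the same order every time.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # not in the slides; optional extra checking

# Made-up sequences standing in for the contents of upstreams.fasta.
my %Fasta;
$Fasta{"geneA"} = "ccgatgcaa";    # contains the motif, but in lowercase
$Fasta{"geneB"} = "TTTTAAAA";     # does not contain the motif

my %Matches;
foreach my $g (sort keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {    # i = ignore upper/lower case
        $Matches{$g} = "GATGC";
    } else {
        $Matches{$g} = "no match";
    }
}

foreach my $g (sort keys %Matches) {
    print "$g\t$Matches{$g}\n";
}
```

Without the i, geneA would be reported as “no match” because its sequence is in lowercase.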


#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    } else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

my %Matches;
foreach my $g (keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {
        $Matches{$g} = "GATGC";
        print "$g contains GATGC\n";
    } else {
        $Matches{$g} = "no matches";
    }
}


#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    } else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

open (SAVEFILE, ">YGR136W_output.txt");

my %Matches;
foreach my $g (keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {
        $Matches{$g} = "GATGC";
        print "$g contains GATGC\n";
        print SAVEFILE "$g\tGATGC\n";    # \t means tab, \n means return
    } else {
        $Matches{$g} = "no matches";
    }
}


Finally, save the results in the Matches hash to a new file.

13. Create and open a savefile, Matches.txt:

open (SF, ">Matches.txt");

14. Step through the hash and print the gene and match information to the file:

foreach my $g (keys %Matches) {
    print SF "$g … $Matches{$g}\n";
}

15. Save the file: Ctrl x Ctrl s

16. Run the program from the command line:

perl ReadFasta.pl
