PERL

Variables and data structures

Andrew Emerson, High Performance Systems, CINECA

The “Hello World” program

Consider the following:

#
# Hello World
#
$message = "Ciao, Mondo";
print "$message \n";
exit;

Perl Variables

$message is called a variable, something with a name used to hold one or more pieces of information.

All computer languages have the ability to create variables to store and manipulate data.

Perl differs from many other languages in that you do not specify the “type” of a variable (e.g. integer, real, character), only the “complexity” of the data it holds.

Perl Variables

Perl has 3 ways of storing data:

1. Scalars: for single data items, like numbers or strings.

2. Arrays: for ordered lists of scalars, indexed by numbers.

3. Associative arrays or “hashes”: like arrays, but they use “keys” to identify the scalars.
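
A minimal sketch of the three kinds of variable side by side may help here. The names $gene, @exon_lengths and %organism and their values are invented for illustration, and the use of "use strict" with "my" declarations is an addition that the slides themselves do not use:

```perl
use strict;
use warnings;

# Scalar: a single value (hypothetical gene name)
my $gene = 'BRCA1';

# Array: an ordered list of scalars, indexed by number starting at 0
my @exon_lengths = (120, 85, 240);

# Hash: scalars identified by string keys instead of numbers
my %organism = (human => 'Homo sapiens', mouse => 'Mus musculus');

print "$gene\n";               # BRCA1
print "$exon_lengths[1]\n";    # 85
print "$organism{mouse}\n";    # Mus musculus
```

The prefix character ($, @ or %) is what tells Perl, and the reader, which of the three is meant.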

Scalar Variables

Examples

#
$no_of_chrs = 24;                   # integer
$per_cent_identity = 0;             # also integer
$per_cent_identity = 99.50;         # redefined as real
$pi = 3.1415926535;                 # floating point (real)
$e_value = 1e-40;                   # using scientific notation
$dna = "GCCTACCGTTCCACCAAAAAAAA";   # string - double quotes
$dna = 'GCCTACCGTTCCACCAAAAAAAA';   # string - single quotes

Scalar Variables

CASE is important: $DNA ≠ $dna (true for all variables).

Scalars must be prefixed with a $ whenever they are used (is there a $? Yes → it is a scalar). The character after the $ must be a letter or an underscore, not a digit (true for all variables that you define yourself).

Scalars can be happily redefined at any time (e.g. integer → real → string):

# unlikely example
$dna = 0;                        # integer
$dna = "GGCCTCGAACGTCCAGAAA";    # now it's a string
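
As a small sketch of what this flexibility means in practice, the same scalar can be used as a number or as a string depending on the operator applied to it. The variable names below are made up for the example:

```perl
use strict;
use warnings;

my $value = 24;                        # holds a number
$value = "24";                         # now holds a string ...
my $next = $value + 1;                 # ... converted to a number when used as one
print "$next\n";                       # 25

my $label = $value . " chromosomes";   # and treated as a string when concatenated
print "$label\n";                      # 24 chromosomes
```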

Doing things with scalars..

#
$a = 1.5;
$b = 2.0; $c = 3;
$sum = $a + $b*$c;    # multiply $b by $c, then add $a
#
$j = 0;               # start the counter at 0
while ($j < 100) {
    $j++;             # means $j=$j+1, i.e. add 1 to $j
    print "$j\n";
}
#
$dna1 = "GCCTAAACGTC";
$polyA = "AAAAAAAAAAAAAAAA";
$dna1 .= $polyA;      # add one string to another
                      # (equiv. $dna1 = $dna1.$polyA)
$no_of_bases = length($dna1);    # length of a scalar
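
The following sketch works through the arithmetic and string operations with the results written out. It assumes the same starting values as the slide but uses different variable names, and the x repetition operator is an addition that does not appear on the slide:

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1.5, 2.0, 3);
my $sum     = $x + $y * $z;      # multiplication first: 1.5 + 6 = 7.5
my $grouped = ($x + $y) * $z;    # brackets change the order: 3.5 * 3 = 10.5
print "$sum $grouped\n";         # 7.5 10.5

my $dna = "GCCTAAACGTC";         # 11 bases
$dna .= "A" x 16;                # x repeats a string, .= appends the result
my $no_of_bases = length($dna);
print "$no_of_bases\n";          # 11 + 16 = 27
```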

More about strings..

There is a difference between strings with ‘ and “

#
$nchr = 24;
$message = "chromosomes in human cell =$nchr";    # double quotes
print "$message\n";
$message = 'chromosomes in human cell =$nchr';    # single quotes
print "$message\n";
exit;

OUTPUT

chromosomes in human cell =24
chromosomes in human cell =$nchr

More about strings

Double quotes “ interpret variables, single quotes ‘ do not:

$dna = 'GTTTCGGA';
print "sequence=$dna\n";
print 'sequence=$dna', "\n";

OUTPUT

sequence=GTTTCGGA
sequence=$dna

Normally you would want double quotes when using print.
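
A few further cases of interpolation in double quotes are sketched below. The variables @bases, %count and $unit are invented for the example; the behaviour shown (element interpolation, backslash escaping and curly brackets around a name) is standard Perl:

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');
my %count = (A => 8);

# Array and hash elements are interpolated in double quotes too
my $line = "first base=$bases[0], count of A=$count{A}";
print "$line\n";                  # first base=A, count of A=8

# A backslash stops interpolation inside double quotes
my $literal = "the variable is called \$line";
print "$literal\n";               # the variable is called $line

# Curly brackets mark where the variable name ends
my $unit = 'kilo';
my $word = "${unit}bases";
print "$word\n";                  # kilobases
```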

Arrays

Collections of numbers, strings etc. can be stored in arrays. In Perl, arrays are defined as ordered lists of scalars and are represented with the @ character.

Initializing arrays with lists

@days_in_month = (31,28,31,30,31,30,31,31,30,31,30,31);
@days_of_the_week = ('mon','tue','wed','thu','fri','sat','sun');
@bases = ('adenine','guanine','thymine','cytosine','uracil');
@GenBank_fields = ( 'LOCUS',
                    'DEFINITION',
                    'ACCESSION',
                    # ...
                  );

Arrays - elements

To access the individual array elements you use [ and ] :

@poly_peptide = ('gly','ser','gly','pro','pro','lys','ser','phe');
# now mutate the peptide
$poly_peptide[0] = 'val';
$i = 0;
# print out what we have
while ($i < 8) {
    print "$poly_peptide[$i] ";    # $i is the array index
    $i++;
}

The numbers used to identify the elements are called indices.

Arrays - elements

When accessing array elements you use $. Why? Because array elements are scalars, and scalars must have $:

@poly_peptide=(..);
$poly_peptide[0] = 'val';

This means that you can have a separate variable called $poly_peptide, because $poly_peptide[0] is part of @poly_peptide, NOT $poly_peptide.

“This may seem a bit weird, but that's okay, because it is weird.” (Unix Perl Manual)

Array elements

Array indices start from 0, not 1:

$poly_peptide[0] = 'val';
$poly_peptide[1] = 'ser';
$poly_peptide[7] = 'phe';

The last index of the array can be found from $#name_of_array, e.g. $#poly_peptide. You can also use negative indices, which count back from the end of the array. Therefore the following all refer to the same element:

$poly_peptide[-1]    $poly_peptide[$#poly_peptide]    $poly_peptide[7]
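
A short sketch of the last-index and negative-index rules, using the eight-residue peptide from the earlier slide (the "my" declarations and the helper variables $last and $before are additions for the example):

```perl
use strict;
use warnings;

my @poly_peptide = ('gly','ser','gly','pro','pro','lys','ser','phe');

my $last_index = $#poly_peptide;
print "$last_index\n";                      # 7, the last index

my $last   = $poly_peptide[-1];             # counts back from the end
my $before = $poly_peptide[-2];
print "$last $before\n";                    # phe ser

print "$poly_peptide[$last_index]\n";       # phe again
```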

Array properties

Length of an array:

$len = $#poly_peptide + 1;

The size of the array does not need to be defined in advance; it can grow dynamically:

# begin program
$i = 0;
while ($i < 100) {
    $polyA[$i] = 'A';
    $i++;
}
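
The sketch below shows the array growing as elements are assigned, including what happens when an index beyond the current end is used. It relies on scalar() to report the number of elements and on a "for" statement modifier in place of the while loop; both are additions not shown on the slide:

```perl
use strict;
use warnings;

my @polyA;                          # starts empty
my $before = scalar(@polyA);
print "$before\n";                  # 0

$polyA[$_] = 'A' for 0 .. 99;       # same effect as the while loop above
my $after = scalar(@polyA);
print "$after\n";                   # 100

# Assigning beyond the end extends the array; the gap holds undefined values
$polyA[104] = 'A';
my $extended = scalar(@polyA);
print "$extended\n";                                 # 105
print defined $polyA[102] ? "set\n" : "undef\n";     # undef
```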

Useful Array functions

PUSH and POP

Functions commonly used for manipulating a stack: PUSH adds an item to the top of the stack and POP removes the top item again.

F.I.L.O. = First In, Last Out

Very common in computer programs.

Array functions – PUSH and POP

# part of a program that reads a database into an array
# open database etc first..
@dblines = ();                # resets @dblines
while ($line = <DB>) {
    push @dblines, $line;     # push $line onto array
}
# ...
while (@dblines) {
    $record = pop @dblines;   # pop line off and use it
    # .... do something
}
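
To make the first-in, last-out order visible without needing a database file, here is a self-contained sketch with three made-up items in place of the lines read from DB:

```perl
use strict;
use warnings;

my @stack = ();
push @stack, 'first';
push @stack, 'second';
push @stack, 'third';

# pop returns the most recently pushed item, so the order is reversed
my @order;
while (@stack) {
    my $item = pop @stack;
    push @order, $item;
}
print "@order\n";        # third second first
```

One consequence for the database example is that the records are processed in the reverse of the order in which they were read.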

Scalar Contexts

If you provide an expression (e.g. an array) where Perl expects a scalar, Perl evaluates the expression in a scalar context. For an array, the result is the length of the array:

$length = @poly_peptide;

This is equivalent to

$length = $#poly_peptide + 1;

Hence:

while (@dblines) {    # array in scalar context = length of array
    # ..
}
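
The difference between scalar and list context can be sketched as follows. The four-element peptide and the counter $popped are invented for the example:

```perl
use strict;
use warnings;

my @poly_peptide = ('gly','ser','gly','pro');

my $length  = @poly_peptide;     # scalar context: number of elements
my ($first) = @poly_peptide;     # list context: first element
print "$length $first\n";        # 4 gly

# scalar() forces scalar context where a list would otherwise be expected
print "residues: ", scalar(@poly_peptide), "\n";   # residues: 4

# An empty array gives 0, which is false, so the loop ends once it is drained
my $popped = 0;
while (@poly_peptide) {
    pop @poly_peptide;
    $popped++;
}
print "$popped\n";               # 4
```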

Special variables

Perl defines some variables for special purposes, including:

$_      Set in many situations, such as reading from a file or in a foreach loop.

$0      Name of the file currently being executed.

$]      Version of Perl being used.

@_      Contains the parameters passed to a subroutine.

@ARGV   Contains the command line arguments passed to the program.

Some are read-only and cannot be changed: see man perlvar for more details.
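
A brief sketch of some of these variables in use is given below. The subroutine count_args is a hypothetical helper written only for this example, and the output of the last two print statements depends on how the script is run:

```perl
use strict;
use warnings;

# $_ is the default loop variable in foreach
my $joined = '';
foreach ('A', 'C', 'G', 'T') {
    $joined .= $_;
}
print "$joined\n";                       # ACGT

# @_ holds the arguments passed to a subroutine
sub count_args {
    return scalar(@_);
}
my $n = count_args('x', 'y', 'z');
print "$n\n";                            # 3

# $0 is the script name, @ARGV the command line arguments
print "script: $0\n";
print "arguments given: ", scalar(@ARGV), "\n";
```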

Associative Arrays (Hashes)

Similar to normal arrays, but the elements are identified by keys and not indices. The keys can be more complicated, such as strings of characters.

Hashes are indicated by % and can be initialized with lists like arrays:

%hash = (key1,val1,key2,val2,key3,val3..);

Associative Arrays (Hashes)

Examples

%months = ('jan',31,'feb',28,'mar',31,'apr',30);

Alternatively,

%months = ('jan' => 31,    # key => value
           'feb' => 28,
           'mar' => 31,
           'apr' => 30);

=> is a synonym for ,
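
As a quick check that the two ways of writing the list build the same hash, the sketch below creates both and compares an entry. The hash names %months_list and %months_arrow are made up, and the unquoted keys rely on => quoting a simple word on its left, which the slide does not mention:

```perl
use strict;
use warnings;

# Plain list: keys and values simply alternate
my %months_list  = ('jan', 31, 'feb', 28, 'mar', 31, 'apr', 30);

# Same data with =>, which also lets simple keys go unquoted
my %months_arrow = (jan => 31, feb => 28, mar => 31, apr => 30);

print "$months_list{feb} $months_arrow{feb}\n";    # 28 28

# In scalar context, keys gives the number of key/value pairs
my $n = keys %months_arrow;
print "$n\n";                                      # 4
```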

Associative Arrays (Hashes)

Further examples

#
%classification = ('dog' => 'mammal', 'robin' => 'bird', 'snake' => 'reptile');
%genetic_code = (
    'TCA' => 'ser',
    'TTC' => 'phe',
    'TTA' => 'leu',
    'TGA' => 'STOP',
    'CCC' => 'pro',
    # ...
);

Associative Arrays (Hashes) - elements

The elements of a hash are accessed using curly brackets, { and } :

$genetic_code{TCA} = 'ser';
$genetic_code{CCC} = 'pro';
$genetic_code{TGA} = 'STOP';

Note the $ sign: the elements are scalars and so must be preceded by $, even though they belong to a % (just as for arrays).

Associative Arrays (Hashes) – useful functions

exists: indicates whether a key exists in the hash

if (exists $genetic_code{$codon}) {
    # ...
} else {
    print "Bad codon $codon\n";
    exit;
}
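
A self-contained sketch of exists follows. The three-codon table is a cut-down stand-in for the full genetic code, and the subroutine lookup and the hash %seen are hypothetical names introduced for the example:

```perl
use strict;
use warnings;

my %genetic_code = (TCA => 'ser', CCC => 'pro', TGA => 'STOP');

# Hypothetical helper: returns the amino acid, or 'unknown' for a bad codon
sub lookup {
    my ($codon) = @_;
    return exists $genetic_code{$codon} ? $genetic_code{$codon} : 'unknown';
}

my $good = lookup('CCC');
my $bad  = lookup('XYZ');
print "$good $bad\n";         # pro unknown

# exists tests for the key itself, so a key with a false value still counts
my %seen = (TTT => 0);
print exists $seen{TTT} ? "present\n" : "absent\n";   # present
```

Testing with exists rather than testing the value matters when a stored value may be 0 or an empty string.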

Associative Arrays (Hashes) – useful functions

keys and values: make arrays from the keys and values of a hash.

@codons = keys %genetic_code;
@amino_acids = values %genetic_code;

Often you will see code like the following:

foreach $codon (keys %genetic_code) {
    if ($genetic_code{$codon} eq 'STOP') {
        last;    # i.e. stop translating
    } else {
        $protein .= $genetic_code{$codon};
    }
}
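
Note that keys returns the keys in no particular order. The sketch below sorts them for a repeatable listing and, as one possible way to translate a real sequence, walks along the DNA three bases at a time instead of looping over the hash. The five-codon table, the sequence in $dna and the use of substr are assumptions made for the example:

```perl
use strict;
use warnings;

my %genetic_code = (
    TCA => 'ser', TTC => 'phe', TTA => 'leu',
    CCC => 'pro', TGA => 'STOP',
);

# Sort the keys so the listing is the same on every run
my @codons = sort keys %genetic_code;
print "@codons\n";               # CCC TCA TGA TTA TTC

# Translate a made-up sequence codon by codon
my $dna = 'TTCTCACCCTGATTA';
my $protein = '';
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    my $codon = substr($dna, $i, 3);
    last if $genetic_code{$codon} eq 'STOP';   # i.e. stop translating
    $protein .= $genetic_code{$codon};
}
print "$protein\n";              # pheserpro
```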