python course in bioinformatics - university of california...

35
Outline General Introduction Basic Types in Python Programming Exercises Python course in Bioinformatics Xiaohui Xie March 31, 2009 Xiaohui Xie Python course in Bioinformatics

Upload: phamhuong

Post on 04-May-2018

226 views

Category:

Documents


2 download

TRANSCRIPT

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Python course in Bioinformatics

Xiaohui Xie

March 31, 2009

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

General Introduction

Basic Types in Python

Programming

Exercises

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Why Python?

I Scripting language, raplid applications

I Minimalistic syntax

I Powerful

I Flexiablel data structure

I Widely used in Bioinformatics, and many other domains

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Where to get Python and learn more?

I Main source of information: http://docs.python.org/

I Tutorial: http://docs.python.org/tutorial/index.html

I Biopython: http://biopython.org/wiki/Main Page

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Invoking Python

I To start: type python in command lineI It will look like

Python 2.5.2 (r252:60911, Mar 25 2009, 00:12:33)

[GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>

I You can now type commands in the line denoted by >>>

I To leave: type end-of-file character ctrl-D on Unix, ctrl-zon Windows

I This is called interactive mode

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Appetizer Example

I Task: Print all numbers in a given fileI File: numbers.txt

2.1

3.2

4.3

I Code: print.py

# Note: the code lines begin in the first column of the file. In

# Python code indentation *is* syntactically relevant. Thus, the

# hash # (which is a comment symbol, everything past a hash is

# ignored on current line) marks the first column of the code

data = open("numbers.txt", "r")

for d in data:

print d

data.close()

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Appetizer Example cont’d

I Task: Print the sum of all the data in the fileI Code: sum.py

data = open("numbers.txt", "r")

s = 0

for d in data:

s = s + float(d)

print s

data.close()

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Interative Mode

I prompt >>> allows to enter command

I command is ended by newline

I variables need not be initialized or declared

I a colon “:” opens a block

I ... prompt denotes that block is expected

I no prompt means python output

I a block is indented

I by ending indentation, block is ended

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Differences to Java or C

I can be used interatively. This makes it much easier to testprograms and to debug

I no declaration of variables

I no brackets denote block, just indentation (Emacs supportsthe style)

I a comment begins with a “#”. Everything after that isignored.

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Numbers

I Example

>>> 2+2

4

>>> # This is a comment

... 2+2

4

>>> 2+2 # and a comment on the same line as code

4

>>> (50-5*6)/4

5

>>> # Integer division returns the floor:

... 7/3

2

>>> 7/-3

-3

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Numbers cont’d

I Example

>>> width = 20

>>> height = 5*9

>>> width * height

900

>>> # Variables must be defined (assigned a value) before they can be

>>> # used, or an error will occur:

>>> # try to access an undefined variable

... n

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name ’n’ is not defined

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings

I Strings can be enclosed in single quotes or double quotesI Example

>>> ’spam eggs’

’spam eggs’

>>> ’doesn\’t’

"doesn’t"

>>> "doesn’t"

"doesn’t"

>>> ’"Yes," he said.’

’"Yes," he said.’

>>> "\"Yes,\" he said."

’"Yes," he said.’

>>> ’"Isn\’t," she said.’

’"Isn\’t," she said.’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Strings can be surrounded in a pair of matching triple-quotes:""" or ’’’. End of lines do not need to be escaped whenusing triple-quotes, but they will be included in the string.

I Example

print """

Usage: thingy [OPTIONS]

-h Display this usage message

-H hostname Hostname to connect to

"""

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Strings can be concatenated (glued together) with the +operator, and repeated with *:

I Example

>>> word = ’Help’ + ’A’

>>> word

’HelpA’

>>> ’<’ + word*5 + ’>’

’<HelpAHelpAHelpAHelpAHelpA>’

>>> ’str’ ’ing’ # <- This is ok

’string’

>>> ’str’.strip() + ’ing’ # <- This is ok

’string’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Strings can be subscripted (indexed); like in C, the firstcharacter of a string has subscript (index) 0.

I There is no separate character type; a character is simply astring of size one.

I Substrings can be specified with the slice notation: twoindices separated by a colon.

I Example

>>> word = ’Help’ + ’A’

>>> word[4]

’A’

>>> word[0:2]

’He’

>>> word[:2] # The first two characters

’He’

>>> word[2:] # Everything except the first two characters

’lpA’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Unlike a C string, Python strings cannot be changed.Assigning to an indexed position in the string results in anerror:

I Example

>>> word[0] = ’x’

Traceback (most recent call last):

File "<stdin>", line 1, in ?

TypeError: object doesn’t support item assignment

>>> ’x’ + word[1:]

’xelpA’

>>> ’Splat’ + word[4]

’SplatA’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Example

>>> from string import *

>>> dna = ’gcatgacgttattacgactctg’

>>> len(dna)

22

>>> ’n’ in dna

False

>>> count(dna,’a’)

5

>>> replace(dna, ’a’, ’A’)

’gcAtgAcgttAttAcgActctg’

I Exercise: Calculate GC percent of dna

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Solution: Calculate GC percent

>>> gc = (count(dna, ’c’) + count(dna, ’g’)) / float(len(dna)) * 100

>>> "%.2f" % gc

’64.08’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Strings cont’d

I Exercise: Calculate the complement of DNA

A - T

C - G

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Lists

I A list of comma-separated values (items) between squarebrackets.

I List items need not all have the same type (compund datatypes)

>>> a = [’spam’, ’eggs’, 100, 1234]

>> a[0]

’spam’

>>> a[3]

1234

>>> a[-2]

100

>>> a[1:-1]

[’eggs’, 100]

>>> a[:2] + [’bacon’, 2*2]

[’spam’, ’eggs’, ’bacon’, 4]

>>> 3*a[:3] + [’Boo!’]

[’spam’, ’eggs’, 100, ’spam’, ’eggs’, 100, ’spam’, ’eggs’, 100, ’Boo!’]

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Lists cont’d

I Unlike strings, which are immutable, it is possible to changeindividual elements of a list

I Assignment to slices is also possible, and this can even changethe size of the list or clear it entirely

I Example

>>> a

[’spam’, ’eggs’, 100, 1234]

>>> a[2] = a[2] + 23

>>> a[0:2] = [1, 12] # Replace some items:

>>> a[0:2] = [] # Remove some:

>>> a

[123, 1234]

>>> a[1:1] = [’bletch’, ’xyzzy’] # Insert some:

>>> a

[123, ’bletch’, ’xyzzy’, 1234]

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Lists cont’d

I Functions returning a list

>>> range(3)

[0, 1, 2]

>>> range(10,20,2)

[10, 12, 14, 16, 18]

>>> range(5,2,-1)

[5, 4, 3]

>>> aas = "ALA TYR TRP SER GLY".split()

>>> aas

[’ALA’, ’TYR’, ’TRP’, ’SER’, ’GLY’]

>>> " ".join(aas)

’ALA TYR TRP SER GLY’

>>> l = list(’atgatgcgcccacgtacga’)

[’a’, ’t’, ’g’, ’a’, ’t’, ’g’, ’c’, ’g’, ’c’, ’c’, ’c’, ’a’,

’c’, ’g’, ’t’, ’a’, ’c’, ’g’, ’a’]

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Dictionaries

I A dictionary is an unordered set of key: value pairs, with therequirement that the keys are unique

I A pair of braces creates an empty dictionary: .I Placing a comma-separated list of key:value pairs within the

braces adds initial key:value pairs to the dictionaryI The main operations on a dictionary are storing a value with

some key and extracting the value given the keyI Example

>>> tel = {’jack’: 4098, ’sape’: 4139}

>>> tel[’guido’] = 4127

>>> tel

{’sape’: 4139, ’guido’: 4127, ’jack’: 4098}

>>> tel[’jack’]

4098

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Dictionaries cont’d

I Example

>>> tel = {’jack’: 4098, ’sape’: 4139, ’guido’ = 4127}

>>> del tel[’sape’]

>>> tel[’irv’] = 4127

>>> tel

{’guido’: 4127, ’irv’: 4127, ’jack’: 4098}

>>> tel.keys()

[’guido’, ’irv’, ’jack’]

>>> ’guido’ in tel

True

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Programming

I Example

a, b = 3, 4

if a > b:

print a + b

else:

print a - b

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Programming cont’d

I Example

>>> # Fibonacci series:

... # the sum of two elements defines the next

... a, b = 0, 1

>>> while b < 10:

... print b

... a, b = b, a+b

...

1

1

2

3

5

8

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Programming features

I multiple assignment: rhs evaluated before anything on theleft, and (in rhs) from left to right

I while loop executes as long as condition is True (non-zero,not the empty string, not None)

I block indentation must be the same for each line of block

I need empty line in interactive mode to indicate end of block(not required in edited code)

I use of print

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Printing

I Example

>>> i = 256*256

>>> print ’The value of i is’, i

The value of i is 65536

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Flow control

I Example

x = 35

if x < 0:

x = 0

print ’Negative changed to zero’

elif x == 0:

print ’Zero’

elif x == 1:

print ’Single’

else:

print ’More’

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Iteration

I Python for iterates over sequence (string, list, generatedsequence)

I Example

a = [’cat’, ’window’, ’defenestrate’]

for x in a:

print x, len(x)

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Iteration

I Python for iterates over sequence (string, list, generatedsequence)

I Example

a = [’cat’, ’window’, ’defenestrate’]

for x in a:

print x, len(x)

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Definiting functions

I Example

def fib(n): # write Fibonacci series up to n

"""Print a Fibonacci series up to n."""

a, b = 0, 1

while b < n:

print b,

a, b = b, a+b

# Now call the function we just defined:

fib(2000)

# will return:

# 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Reverse Complement of DNA

I Excercise: Find the reverse complement of a DNA sequenceI Example

5’ - ACCGGTTAATT 3’ : forward strand

3’ - TGGCCAATTAA 5’ : reverse strand

So the reverse complement of ACCGGTTAATT is AATTAACCGGA

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Reverse Complement of DNA

I Solution: Find the reverse complement of a DNA sequence

from string import *

def revcomp(dna):

""" reverse complement of a DNA sequence """

comp = dna.translate(maketrans("AGCTagct", "TCGAtcga"))

lcomp = list(comp)

lcomp.reverse()

return join(lcomp, "")

Xiaohui Xie Python course in Bioinformatics

OutlineGeneral Introduction

Basic Types in PythonProgramming

Exercises

Translate a DNA sequence

I Excercise: Translate a DNA sequence to an amino acidsequence

I Genetic code

standard = { ’ttt’: ’F’, ’tct’: ’S’, ’tat’: ’Y’, ’tgt’: ’C’,

’ttc’: ’F’, ’tcc’: ’S’, ’tac’: ’Y’, ’tgc’: ’C’,

’tta’: ’L’, ’tca’: ’S’, ’taa’: ’*’ , ’tca’: ’*’,

’ttg’: ’L’, ’tcg’: ’S’, ’tag’: ’*’, ’tcg’: ’W’,

’ctt’: ’L’, ’cct’: ’P’, ’cat’: ’H’, ’cgt’: ’R’,

’ctc’: ’L’, ’ccc’: ’P’, ’cac’: ’H’, ’cgc’: ’R’,

’cta’: ’L’, ’cca’: ’P’, ’caa’: ’Q’, ’cga’: ’R’,

’ctg’: ’L’, ’ccg’: ’P’, ’cag’: ’Q’, ’cgg’: ’R’,

’att’: ’I’, ’act’: ’T’, ’aat’: ’N’, ’agt’: ’S’,

’atc’: ’I’, ’acc’: ’T’, ’aac’: ’N’, ’agc’: ’S’,

’ata’: ’I’, ’aca’: ’T’, ’aaa’: ’K’, ’aga’: ’R’,

’atg’: ’M’, ’acg’: ’T’, ’aag’: ’K’, ’agg’: ’R’,

’gtt’: ’V’, ’gct’: ’A’, ’gat’: ’D’, ’ggt’: ’G’,

’gtc’: ’V’, ’gcc’: ’A’, ’gac’: ’D’, ’ggc’: ’G’,

’gta’: ’V’, ’gca’: ’A’, ’gaa’: ’E’, ’gga’: ’G’,

’gtg’: ’V’, ’gcg’: ’A’, ’gag’: ’E’, ’ggg’: ’G’ }

Xiaohui Xie Python course in Bioinformatics