python for biomedicine - uefwhamalai/pybio/pybio_ravantti_lectures_v1.0.pdf · python-packages for...

Python for Biomedicine [email protected]


Post on 23-Jun-2020


Page 2: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Basic information

Time: spring term 2019, 5 contact sessions and self-studying, lasting about 6 weeks.

Contact teaching: 15.4. 4h (12-16), 16.4. 4h (10-14), 29.4. 2h (12-14), 9.5. 4h (12-16), 10.5. 4h (10-14).

The deadline for project work will be at the end of May or the beginning of June.

Teachers:

Lecturer: Janne Ravantti, adjunct professor, Faculty of Medicine, UH

Assistants: Wilhelmiina Hämäläinen & Vittorio Fortino, Institute of Biomedicine, UEF


This is a beginners class...


Today & tomorrow
● the course goals
● how to pass the course
● why this course might be good for you

● what is programming in general

● working Python-programming environment installed

● basics of Python-programming
○ “theory”
○ the first program
○ exercises
○ more “theory”


Ask questions!


Bioinformatics & Python?

[Figure: Python as the common layer across computer science, statistics and mathematics; biology; bioinformatics methods development; and processing biological data]


Programming 1/2

Programming requires a peculiar way of thinking (but it can be learned!)


Programming 2/2

A good* way to learn programming is to program!

*The Best?


Programming & bioinformatics

The goodness of your program is (mostly) defined by the biological question*

*Wilhelmiina might disagree...


Opinionated tips for programming

● Start small (e.g. not aligning 1000-genomes humans!) and one step at a time

● Don’t worry (about errors) (too much - testing is important, but...)

● Think! What...:

○ is the biological question?

○ is the data?

○ the program is supposed to do (methods, algorithms, ...)?

○ input (DNA-sequence? Set of RNA-seq data, names of plants, …)

○ can go wrong => then what (disk full, memory full, bad methods, too little data, ...)?

● Learn to save your code (naming, locations, even something like git)


Caveats
● Everything changes...
○ Data (WXS => WGS => WGBS; RNA-seq, …; HG37 vs. HG38...)
○ Methods (bowtie => bowtie2 => bwa mem => minimap2 => …)
○ Links go stale (404 Not Found)
○ Python 2.7 => 3.6+
○ Python-libraries (Standard library, Biopython, ...)
○ Operating systems / platforms
○ System libraries

=> Do not get stuck with the old unless absolutely necessary, but don’t worry too much about newest trends!


Learning to program depends on you!

(The best way to learn programming is to program...)

Please, ask questions! We can pace the course according to your preferences


Learning outcomes
● to make the computer work for you!

● understand what kind of data processing tasks can be automated and how

● working programming environment that can be used with your own research


Advantages
● Study credits :)

● Automate your own data cleaning, analysis and reporting

● After understanding the general ideas behind programming languages, new languages (R, java, …) are easier to learn and use

● Jobs - also from other fields - biomedicine and biosciences are nowadays naturally “Big Data” and “Data Science”-oriented


Questions?


Programming
● A program is only(...) a detailed “recipe” for what the programmable object (computer, phone, refrigerator, car, …) must (should…) do
● E.g. instructions for a robot to warm up dinner:

1. Put pizza on the plate
2. Put plate into micro
3. Warm up a minute
4. If pizza not warm, goto step 3
5. Feed to Master

● Is this detailed enough? What could go wrong?


Basic programming concepts 1/3
● A program is a collection of commands / rules (“statements”) to the computer (here: with Python-interpreter)

● Commands in the different programming languages can look quite different, but in the end they do the same things:

○ Data input / output (“read, write”)
○ Internal data handling: variables and data structures (“list, dictionary, ...”) and their manipulation (“operators”)
○ Control flow statements
■ Conditional statements (“if then else”)
■ Loops (“for, while, goto, ...”)
○ Functions / subroutines / methods (“sqrt(x), sin(x),...”)


Basic programming concepts 2/3
● Programming language’s syntax (i.e. rules and regulations) causes problems in the beginning (and later in life…)
○ Capital and small letters matter (“print” vs. “Print”)
○ Commas (“,”) matter more than in natural languages

● Errors can come from
○ Syntax => program doesn’t even start
○ Program(mer)’s logic => program might stop (“crash”) during the run or do the wrong things

=> which one is worse?

https://en.wikibooks.org/wiki/Computer_Programming/Hello_world


Basic programming concepts 3/3
● Programming requires its own(?) way of thinking

● Think before writing even one line
○ What information / data does the program need?
○ What should the program do with the input and what should it produce as an output?
○ Is the input/output coming from / going to a file(s)?
○ What separate (recurring) parts does the program have?

● What could go wrong? What to do then?

● Especially larger programming projects should be split into smaller pieces (in your mind, on paper, …) => “divide and conquer”


Exercise

In your own words, how would you make a program that reads all text files from a folder on the computer and reports those in which the string “TTAGGG” occurs*?

What information does the program need as input?

How would you report the files?

How should the program proceed?

Any repeatable parts?

What can go wrong? What then?

(*)https://en.wikipedia.org/wiki/Microsatellite


Documentation
● Should have instructions on how to run the program, without requiring the reader to understand the inner workings of the program

● Should describe how the program works, regardless of the language used

● Should help other programmers to understand step-by-step how the program works

● Documentation is not only the program listing!

● Has to be clearly and correctly written


Questions?


Python programming language
● Named after the TV-show Monty Python’s Flying Circus

● Two main versions
○ 2.7+, older still in use, no new development
○ 3.x, newer, actively developed, but stable language

● There is a lot of learning material based on version 2.x. Be careful out there

● Python 3.7 is used in this course

https://docs.python.org/3/tutorial/index.html


Python’s characteristics
● Easy to read (for a programming language...)

● Open Source, available to all kinds of machines / platforms

● Large standard library (“Batteries included” philosophy)

● Lots of material/libraries/modules in the Internet (web-programming, graphics, bioinformatics, …)

● Fast enough for scientific computing
○ large numerical and scientific libraries
○ can be used with other (faster) languages (“C”, “C++”, …)


Python in the wild
● In teaching...

● Scientific computing
○ Bioinformatics (e.g. Biopython-package - later!)
○ Machine learning
○ Visualization
○ Robotics
○ “Big Data”, “Data Science”

● System administration
○ ILM, Weta, Rackspace,...

● Mobile programming (e.g. http://kivy.org/docs/gettingstarted/intro.html)


Anaconda / Python

● Anaconda is Continuum Analytics’ Python-distribution package

● Works in Mac- / Win- / Linux-environments

● Contains over 330 720 something(...) Python-packages for all kinds of programming tasks

● Free version also for commercial use

● Smaller installation package “miniconda”

● conda-program has become a general package manager e.g. for bioinformatics’ programs (samtools, bwa, emboss, ...)


Let’s install Anaconda!

https://www.anaconda.com/


Python programming environments
● Many, many, many options
○ IDLE = Integrated Development and Learning Environment
○ Spyder
○ Jupyter (lab)

● Interactive command interpreter
○ Takes commands (1+1, print("ATGC"), ...)
○ Shows results of the commands / programs
○ Prints contents of the data structures and variables

● There is usually also an editor for writing and editing programs
○ Use one with syntax highlighting / coloring

● Let’s try...


Python vs. IDE
● Many, many, many options
○ IDLE = Integrated Development and Learning Environment
○ Spyder
○ Jupyter (lab)

● Python itself understands only textfiles, reads them and executes them line-by-line. Everything else is made to make programming easier

● In this course, we are learning Python!


The first program

Let’s program the machine to greet us!

In the command line

Write the following and press Enter:

print("What is thy bidding my master?")

In the IDE (idle3, Spyder, jupyter, …):

Write the same print-command and select e.g. Run -> Run Module

Save the program e.g. as “greeting.py”


The first program - cont’d

Saved programs can naturally be opened, edited and rerun

The “#”-character starts a comment i.e. free text that the Python-interpreter does not care about, for example:

# this is my first program
print("What is thy bidding my master?")

Reasonable and informative comments help to understand the code/program later!


Saving programs
● As with any writing - save often!

● Use informative filenames (“test.py” vs. “calculate_GC_ratio.py”)

● Understand the difference between files saved by default in IDEs (e.g. jupyter notebook) vs. text file containing the actual Python-code (usually “something.py” - file)

● The Python-code is portable and can be run anywhere with the same(ish) Python-environment


Exercise

Make a program that prints the following:

Nucleobases in DNA are:

* cytosine [C]
* guanine [G]
* adenine [A]
* thymine [T]

Remember to save your program!


Error messages
● The interpreter will complain about possible errors

● Start time errors
○ Typos (missing parenthesis, commas, …)

● Runtime errors
○ Division by zero
○ Type errors (will come back to this later)
○ Uninitialized variables (in Python a NameError when the line runs; will come back to this later)

● Understanding error messages can be difficult - patience!
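A small preview sketch of the runtime errors listed above. It uses try/except, which has not been introduced on these slides, only to record the name of each error instead of stopping the program; the variable names (errors_seen, gene_count) are made up for illustration.

```python
errors_seen = []

try:
    ratio = 10 / 0                 # runtime error: division by zero
except ZeroDivisionError as error:
    errors_seen.append(type(error).__name__)

try:
    total = "a" + 3.14             # runtime error: string plus float
except TypeError as error:
    errors_seen.append(type(error).__name__)

try:
    print(gene_count)              # this name was never given a value
except NameError as error:
    errors_seen.append(type(error).__name__)

print(errors_seen)
```

The last line of a real error message starts with the same error name, which is usually the best place to begin reading it.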


This far...
● Introduction to programming & Python

● Some ideas how programming problems can be approached e.g. by “divide & conquer”

● You have a working Python-programming environment installed

● You have written your first program!


Questions?


Exercise

Describe at a general level a program that reads two Genbank-formatted(*) files and counts how many genes they have in common

What input does the program need & what output does it produce?

How does the program know if two genes are the same?

What repeatable parts does the program have?

What can go wrong? Then what?

(*)http://scikit-bio.org/docs/0.5.2/generated/skbio.io.format.genbank.html

(*)http://www.insdc.org/files/feature_table.html


Exercise

Make a program that prints the following:

*-------*
|       |
|       |
|       |
*-------*


Variables
● Programming can be thought of as manipulation of the input to output using statements and variables as temporary holders of internal state

● Variables are “boxes” that hold different kinds of information. In Python, a variable always has some type
○ Integer (1, 5, -100)
○ Float (3.14, -1.06e-10)
○ Character / String (“ATGGGA”, “1234567890”)
○ Boolean (True, False)

● Naming is case sensitive but quite free. Sensible variable names make more sense especially later (“xxx_ver_001” vs. “total_sum”)
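A minimal sketch of the four types listed above. The variable names and values are made up for illustration; type() is Python's built-in way to ask which type a value has.

```python
exon_count = 23            # integer
gc_ratio = 0.41            # float
sequence = "ATGGGA"        # string
is_coding = True           # boolean

# type(x).__name__ gives the short name of the type of x
type_names = [type(exon_count).__name__, type(gc_ratio).__name__,
              type(sequence).__name__, type(is_coding).__name__]
print(type_names)
```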


Assignment (statement) 1/2
● Equal sign (“=”) assigns value (right hand side) to a variable (left hand side)

● Variables can be combined
○ a = 1
○ b = 2
○ c = a + b

● Variables’ values can be printed using print-statement separated by commas
○ print(a, b, c)

● Arithmetic operations (“+”, “-”, “/”, “*”) work for numerical values, and “+” and “*” also work for characters and strings
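The bullets above can be tried out as follows. This is only a sketch; the names half, repeat and joined are made up for illustration.

```python
a = 1
b = 2
c = a + b
print(a, b, c)

half = c / 2              # "/" gives a float in Python 3
repeat = 3 * "TA"         # "*" repeats a string
joined = "ATG" + "TAA"    # "+" joins two strings
print(half, repeat, joined)
```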


Assignment (statement) 2/2
● Equal sign (“=”) assigns value (right hand side) to a variable (left hand side)

● Read assignment “a = b + 2” as: “a gets a value b plus two”, not: “a equals b plus two”

● The reason is that “a = a + 2” makes little sense mathematically, but makes sense when read as: “assign a new value to a which is the sum of the previous value of a and two”

● Equality is tested with operator “==” (more later)
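A short sketch of the difference between “=” and “==”, with a made-up starting value:

```python
a = 5
a = a + 2              # a gets a new value: the old value of a plus two
is_seven = (a == 7)    # "==" compares and gives True or False
print(a, is_seven)
```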


Operators
● Operators do “something” to variables

● Most common operators are
○ arithmetic operators (“+”, “-”, “*”, “/”)
○ comparison operators (“>”, “<”, “==”)
○ logical operators (“and”, “or”, “not”)

● We’ll return to these later...


Exercise
● Make a program that calculates how many exons transcript variants X1, X2 and X3 have, when
○ X1 has 23 exons
○ X2 has four exons fewer than X1
○ X3 has half the exons of X2

● Use assignments and variables in calculations


Type conversions
● Combining different types of variables can surprise / give (semi-)cryptic error messages (“a”+3.14, 10.0+1, 10*“TA”)

● Type of the variable can be changed (if it makes sense)
○ int(a) => change variable’s type to integer
○ float(a) => -:- float
○ str(a) => -:- string

● E.g. (let’s try!)

character_1 = "1"
number_1 = int(character_1)
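A few more conversions to try, as a sketch. The try/except part is a preview (not yet covered on these slides) used only to show that a conversion that makes no sense raises a ValueError; the names as_float, as_text, repeated and conversion_ok are made up.

```python
character_1 = "1"
number_1 = int(character_1)       # "1" becomes 1
as_float = float("3.14")          # "3.14" becomes 3.14
as_text = str(10.0 + 1)           # 11.0 becomes "11.0"
repeated = 10 * "TA"              # integer times string repeats the string

try:
    int("ATG")                    # a DNA string is not a number
    conversion_ok = True
except ValueError:
    conversion_ok = False

print(number_1, as_float, as_text, repeated, conversion_ok)
```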


Reading input
● Python interpreter reads user input with input-command(*) as a string

● input takes an argument that will be shown to user (e.g. questions)

● input returns a string and it can be assigned directly to a variable, but type might need a conversion

● Examples
○ gene_name = input("Enter gene name: ")
○ genome_length = int(input("Enter genome length: "))

(*) “input” is actually a function, but more later...
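A sketch of why the conversion matters. The helper names to_kilobases and ask_genome_length are hypothetical, and functions are covered later in the course; the conversion is kept separate from input() here so that it can be tried without typing anything.

```python
def to_kilobases(length_text):
    """Convert a length given as a string (as input() returns it) to kilobases."""
    return int(length_text) / 1000

def ask_genome_length():
    """Interactive part: call this yourself to try it with the keyboard."""
    answer = input("Enter genome length: ")   # always a string
    print(to_kilobases(answer), "kb")

print(to_kilobases("9719"))
```

Without int(), dividing the string "9719" by 1000 would stop with a TypeError.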


Exercises
● Make a program that asks for two gene lengths and prints out their mean

○ Ask lengths one by one

● Make a program that calculates area, given height and length

Test your programs with different inputs - what can go wrong?


Intermission
● If you have extra time in the lectures (& tonight)
○ https://www.practicepython.org/

● Libraries (e.g.)
○ https://docs.python.org/3/library/index.html
○ https://matplotlib.org/
○ https://pydata.org/
■ https://pandas.pydata.org/
■ https://seaborn.pydata.org/index.html
■ https://biopython.org/

■ ...


This far you can...
● Input data to your program

● Manipulate data using variables and operators

● Print out variables and/or results


Questions?


Programming...
● A program is a collection of commands / rules (“statements”) to the computer (here: with Python-interpreter)

● Commands in the different programming languages can look quite different, but in the end they mostly do the same things:

○ Data input / output (“read, write”)
○ Internal data handling: variables and data structures (“list, dictionary, ...”) and their manipulation (“operators”)
○ Control flow statements
■ Conditional statements (“if then else”)
■ Loops (“for, while, goto, ...”)
○ Functions / subroutines / methods (“sqrt(x), sin(x),...”)


Conditional statements 1/4
● Control structures direct the order of execution of the statements in a program

● In Python, if statement handles the decisions

● if understands only boolean types True and False

● The colon (“:”) ends the if-statement - do not forget it!

if statement_is_True:
    do_something


Conditional statements 2/4
● The statement is whatever “thing” that produces True/False result e.g.

a = 1
b = "Hello"
c = False

a > 2 # False
b == "Hello" # True
c == True # False


Conditional statements 3/4
● Statements can be constructed using constants, variables and operators
○ Arithmetic operators (+, -, *, /, %)
○ Comparison operators (==, <, >, !=, >=, <=)
○ Logical operators (and, or, not)

E.g.
if a + b > 5 and c == 5:
    do_something

if not c == 5:
    do_something


Conditional statements 4/4
● If the statement is True, then the lines in the same block are executed

if statement_is_true:
    do_something
    do_something_more
continue_the_program

Indentation marks the code block


Code blocks
● Blocks group together (one or more) statements (if-statements, but also loops, functions etc.) that are executed at the same part of the program.

● The block is marked with indentation. Use it consistently: the Python style guide (PEP 8) recommends four spaces per level rather than tab-characters, and mixing both causes errors. Editors help with indentation

if a > 100:
    a = a / 2
    b = a + 10
print(a, b)
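To see the effect of the block, the same lines can be run with two made-up starting values. This is only a sketch; first_result and second_result are illustrative names that keep both outcomes for comparison.

```python
a = 240
b = 0
if a > 100:
    a = a / 2        # inside the block: runs only when a > 100
    b = a + 10       # still inside the block
first_result = (a, b)
print(first_result)  # outside the block: always runs

a = 40
b = 0
if a > 100:
    a = a / 2        # skipped, because 40 > 100 is False
    b = a + 10       # skipped as well
second_result = (a, b)
print(second_result)
```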


Exercise

Make a program that asks for three genome lengths and prints the longest one. If the length is zero or negative, print a warning.

What could go wrong? How?


Conditional statements - more
● If-statement understands only boolean values True and False

● If statement is True, then the following code block is executed

● else-statement with colon (“:”) can follow the if-statement’s code block. If the original statement was False, then else code block is executed

if statement_is_True:
    do_something
else: # <= the statement was False
    do_something_else

continue_the_program


Conditional statements - even more...
● If-statement can also have extra elif-statements

● elif == “else if”

if statement_1_is_True:
    do_something
elif statement_1_is_False_and_statement_2_is_True:
    do_something_else
else: # <= all previous statements False
    do_something_else_2

continue_normal_program
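A concrete sketch of the if / elif / else pattern above. The genome length and the 100000 threshold are made up for illustration, not biological definitions.

```python
genome_length = 9719

if genome_length <= 0:
    label = "invalid length"
elif genome_length < 100000:     # checked only if the first test was False
    label = "small genome"
else:                            # all previous tests were False
    label = "large genome"

print(label)
```

Only one of the three blocks runs, and the tests are tried from top to bottom.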


Exercise
● Make a program that asks the user for a number and prints out if the number is even or odd

● Use if- and else-statements

● The % (modulo) operator yields the remainder from the division of the first argument by the second
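A sketch of the modulo operator on a different problem than the exercise: finding where a nucleotide position falls within a codon. The position value and the variable names are made up; “//” (integer division) is shown alongside for comparison.

```python
position = 7
frame_offset = position % 3      # remainder when 7 is divided by 3
whole_codons = position // 3     # how many full codons fit before it
length_ok = len("ATGAAATAG") % 3 == 0   # True when the length is a multiple of 3
print(frame_offset, whole_codons, length_ok)
```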


Tips for programming
● Divide & conquer - split program into smaller, understandable pieces - even before writing a single line!

● Name programs and variables with reason

● Always initialize (i.e. give starting value) all variables

● Smart commenting pays off!

● Print intermediate results when in doubt - those can be commented out later


Exercise

Make a program that asks the user for three numbers in any order, subtracts 5 from the largest, then adds 8 to the smallest number and finally prints the old and new numbers from smallest to largest

What is the output with input:

2, 4, 8

10, 11, 11


This far...
● I/O (Input/Output)

● Comparisons


Questions?


Strings 1/2
● Python has many, many, many tools (operators, functions, methods) to manipulate strings

E.g.
gene = "ATG" + "AATGAATCTGGA" + "TAG"

print(len(gene)) # length

print(gene.find("TGA")) # index of the first "TGA", or -1 if not found

if "TAG" in gene:
    print("Stop codon found!")

https://docs.python.org/3/library/stdtypes.html#string-methods
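A few more of the string methods from the page linked above, as a sketch on a made-up sequence. All of them return a new value and leave the original string unchanged.

```python
sequence = "ATGAAATTTGGGTAG"

t_count = sequence.count("T")            # how many times "T" occurs
as_rna = sequence.replace("T", "U")      # new string with every T changed to U
starts_ok = sequence.startswith("ATG")   # True or False
stop_index = sequence.find("TAG")        # index of the first match
missing = sequence.find("CCC")           # -1 means "not found"

print(t_count, as_rna, starts_ok, stop_index, missing)
```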


Strings 2/2
● Remember case sensitivity!

codon = "ACT"

if codon == "act": # False: "ACT" and "act" are different strings
    print("Threonine found")

The “in” operator tests if the exact substring is found in the other string:

if "ct" in codon: # False as well: "ct" is not "CT"
    print("CT found")
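One common way around case sensitivity, sketched here as an assumption rather than as part of the slides, is to convert both sides to the same case with upper() or lower() before comparing. The name user_text is made up.

```python
codon = "ACT"
user_text = "act"

exact_match = (codon == user_text)                  # the cases differ
same_ignoring_case = (codon == user_text.upper())   # compare in upper case
found = "ct" in codon.lower()                       # search in lower case

print(exact_match, same_ignoring_case, found)
```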


Exercise

Make a program that asks for two nucleotide sequences, prints the longer one, and prints the shorter one only if it is not found in the longer one.


From strings to lists 1/2
● Strings are immutable sequences of characters (like lists that cannot be changed)

● Lists are (“linear”) collections of any kind of data not unlike lockers

● Length of the list tells how many individual lockers there are

● E.g. characters of word “protein” can be stored in seven different lockers

“locker”: |p|r|o|t|e|i|n|
 index:    0 1 2 3 4 5 6


From strings to lists 2/2
● Each character in the string can be accessed using indices and brackets “[”, “]”

● Every element in a list has its own “serial number” (index)

● The only(?) tricky part is that indexing starts at 0 (zero)

E.g.
gene = "titin"
len(gene) # 5, the length of the word, not the protein!
gene[0] # the first letter "t"
gene[4] # the last letter "n"


Strings, lists and indices
● The only(?) tricky part in indexing is to remember to start from 0 (zero) and that the last position is length of the list - 1

● Negative index means counting from the end
○ -1 is the index of the last element

E.g.
gene = "titin"
gene[-1] # the last letter "n"
last_index = len(gene)-1
gene[last_index] # the last letter again
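A sketch of what happens at the edges of the index range. The try/except is again only a preview, used to show that an index past the end raises an IndexError; the variable names are made up.

```python
gene = "titin"

second = gene[1]                 # counting from the start: 0, 1, ...
second_to_last = gene[-2]        # counting from the end: -1, -2, ...
same_letter = gene[-1] == gene[len(gene) - 1]

try:
    gene[5]                      # valid indices are 0..4
    in_range = True
except IndexError:
    in_range = False

print(second, second_to_last, same_letter, in_range)
```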


Exercise
Make a program that asks for the name of a gene and prints out its first, middle and last letter

Page 71: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Lists & slices
● Lists (and strings) can be sliced, i.e. a slice returns a part of the list
● General form (“syntax”) is my_list[start:end:step]; start can be omitted if the slice begins at index 0, and end can be omitted if the slice runs to the end of the list
● Note that the slice stops at index end-1, so the element at index end is not included!

E.g.
sequence = "ATTTGTAAAGTCCCCCG"
sequence[1:2]   # returns "T"
sequence[1:3]   # "TT"
sequence[7:]    # "AAGTCCCCCG"
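A few more slices of the same sequence, as a small sketch of how omitted indices, negative indices and the step behave. The variable names are only for this illustration.

```python
sequence = "ATTTGTAAAGTCCCCCG"

first_three = sequence[:3]     # start omitted: begins at index 0
last_three = sequence[-3:]     # end omitted: runs to the end of the string
all_but_last = sequence[:-1]   # end = -1 stops BEFORE the last character
every_third = sequence[::3]    # step 3: indices 0, 3, 6, 9, 12, 15
reversed_seq = sequence[::-1]  # a negative step walks backwards

print(first_three, last_three, every_third)
```

Note that sequence[:-1] is not the whole string: an end of -1 leaves the last character out.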

Page 72: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Lists 1/3
● List index range is from 0 to len(list)-1
● Lists can contain any data types freely
  ○ collection = ["A", 400, "3.14", 4.6, "Hello World", [1, 2, 3]]
● The append-method adds an element to the end of the list
  ○ hiv_genes = ["gag", "pol", "env", "tat", "rev", "nef", "vpr", "vif"]
  ○ hiv_genes.append("vpu")
    => ["gag", "pol", "env", "tat", "rev", "nef", "vpr", "vif", "vpu"]
● append does not work with strings!

Page 73: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Lists 2/3
● Strings are immutable: single characters cannot be changed individually, i.e. think of them as constants like the integer 123
● List elements can be manipulated freely, e.g. using indices

hiv_genes = ["gag", "pol", "env", "tat", "rev", "nef", "vpr", "vif", "vpu"]
hiv_structural_proteins = hiv_genes[:3]   # ["gag", "pol", "env"]
hiv_accessory = hiv_genes[5:]             # ["nef", "vpr", "vif", "vpu"]

# change the name of "env" to "ENV"
hiv_genes[2] = "ENV"
print(hiv_genes)
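The difference between mutable lists and immutable strings can be seen in a minimal sketch like the one below. The shortened gene list and the names string_changed and gene_upper are assumptions made for this example.

```python
hiv_genes = ["gag", "pol", "env"]
hiv_genes[2] = "ENV"        # fine: list elements can be replaced in place

gene = "env"
try:
    gene[0] = "E"           # strings do not support item assignment
    string_changed = True
except TypeError:
    string_changed = False

# Instead of changing a string, build a new one
gene_upper = gene.upper()

print(hiv_genes, gene, gene_upper, string_changed)
```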

Page 74: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Lists 3/3
● Python has many ways to manipulate lists
  ○ https://docs.python.org/3/tutorial/datastructures.html

E.g.
a = [1, 2, 3]
b = [5, 4, -11]
c = a + b       # => c = [1, 2, 3, 5, 4, -11]
d = sorted(c)   # => d = [-11, 1, 2, 3, 4, 5]
…
max(a)
min(a)
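As a small addition to the example above, this sketch contrasts the sorted-function, which returns a new list, with the sort-method, which changes the list itself. Both are standard Python; the variable names follow the slide.

```python
a = [1, 2, 3]
b = [5, 4, -11]

c = a + b               # concatenation creates a new list
d = sorted(c)           # sorted() returns a NEW sorted list, c is unchanged
c_before_sort = list(c) # keep a copy to show that c was not touched

c.sort(reverse=True)    # sort() changes the list in place and returns None

print(c_before_sort, d, c, max(a), min(b), len(c), sum(c))
```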

Page 75: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks for three gene names and prints them in alphabetical order, each with the length of the gene name

Page 76: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

This far...
● I/O (Input/Output)
● Comparisons
● Strings & their manipulation
● Compound data type list

Page 77: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Questions?

Page 78: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Programming...
● A program is a collection of commands / rules (“statements”) given to the computer (here: through the Python interpreter)
● Commands in different programming languages can look quite different, but in the end they mostly do the same things:
  ○ Data input / output (“read, write”)
  ○ Internal data handling: variables and data structures (“list, dictionary, ...”) and their manipulation (“operators”)
  ○ Control flow statements
    ■ Conditional statements (“if then else”)
    ■ Loops (“for”, “while”, “goto”, …)
    ■ Functions / subroutines / methods (“sqrt(x), sin(x),...”)

Page 79: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

List traversal 1/2
● The in-operator tests whether a value (a variable or a constant) is found in the list

hiv_genes = ["gag", "pol", "env", "tat", "rev", "nef", "vpr", "vif", "vpu"]
if "GAG" in hiv_genes:      # N.B. the comparison is case-sensitive: "GAG" is not "gag"
    print("GAG found!")

● Lists can be accessed element by element with the for-statement (loop!)

for gene in hiv_genes:      # remember the ":"
    print("found gene:", gene)

● The block syntax (colon and indentation) is the same as in if-statements
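A minimal sketch of the in-operator and the for-loop together. The names query, exact_match, normalized_match and upper_names are made up for this illustration; lower() and upper() are standard string methods.

```python
hiv_genes = ["gag", "pol", "env", "tat", "rev", "nef", "vpr", "vif", "vpu"]

query = "GAG"
exact_match = query in hiv_genes               # case-sensitive test
normalized_match = query.lower() in hiv_genes  # compare in lower case instead

# Visit every element once and collect an upper-case copy of each name
upper_names = []
for gene in hiv_genes:
    upper_names.append(gene.upper())

print(exact_match, normalized_match, upper_names)
```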

Page 80: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

List traversal 2/2
● It is often useful to access a list element by element with an index (variable)
● Indices can be generated with the range-function
● Syntax: range(start, end, step). N.B. the arguments work the same way as in slicing lists and strings, so end itself is not included
● start is by default 0 (zero) and step is 1, so range(10) == range(0, 10, 1)

E.g.
my_list = [1, -1, 5, 6, 10, 33, 1, 2.0, 3, 19, -2, 0, 0, "END"]
for index in range(0, 9, 2):
    print(index, my_list[index])   # what are we printing?
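To see that range and slicing follow the same start, end, step logic, the loop can be compared with the corresponding slice. This is only a sketch; the names picked and sliced are assumptions for the example.

```python
my_list = [1, -1, 5, 6, 10, 33, 1, 2.0, 3, 19, -2, 0, 0, "END"]

# Collect (index, element) pairs for every second index below 9
picked = []
for index in range(0, 9, 2):
    picked.append((index, my_list[index]))

# The slice with the same start, end and step gives the same elements
sliced = my_list[0:9:2]

# The defaults: start is 0 and step is 1
same_ranges = list(range(10)) == list(range(0, 10, 1))

print(picked, sliced, same_ranges)
```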

Page 81: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that traverses the previous hiv_genes list element by element and prints the list out like:

1. first_gene_name
2. second_gene_name
...

Page 82: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Nested loops
● for-statements can be used like any other commands inside loops - note the indentation!

for i in range(10):
    for j in range(10):
        print(i*j)   # what is the output?

oncoviruses = ["HBV", "HCV", "HTLV", "HPV", "HHV-8", "MCPyV", "EBV"]
for virus_1 in oncoviruses:
    for virus_2 in oncoviruses:
        print("Testing:", virus_1, "vs.", virus_2)   # output?
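The nested loop above also compares every virus with itself and tests each pair twice. One possible variation, sketched here with a shortened list, starts the inner loop from the position after the outer index so that each pair appears only once. The name pairs is made up for this example.

```python
oncoviruses = ["HBV", "HCV", "HTLV"]   # shortened list for the illustration

pairs = []
for i in range(len(oncoviruses)):
    # start the inner loop at i + 1: no self-comparisons, no repeated pairs
    for j in range(i + 1, len(oncoviruses)):
        pairs.append((oncoviruses[i], oncoviruses[j]))

for virus_1, virus_2 in pairs:
    print("Testing:", virus_1, "vs.", virus_2)
```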

Page 83: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

while-statement
● The while-statement repeats the associated code block as long as the given condition is True
● General syntax is

while condition_True:
    do_something
    do_something_else
continue_the_program

E.g.
a = 1
while a < 10:
    print(a)
print("Loop done!")

Page 84: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

while cont’d
● The while-statement repeats the associated code block as long as the given condition is True
● If the condition never becomes False, the loop never ends (“infinite loop”)
● So...

a = 1
while a < 10:
    print(a)
    a = a + 1   # if this is what we want...
print("Loop done!")
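A while-loop is handy when the number of repetitions is not known beforehand. The sketch below uses a made-up example (a population that doubles every generation) to count how many rounds are needed to reach a limit; the names and numbers are assumptions for the illustration only.

```python
population = 1
generations = 0

# Repeat until the population reaches 1000; the loop body changes
# the variable used in the condition, so the loop will end
while population < 1000:
    population = population * 2
    generations = generations + 1

print("Generations needed:", generations, "final population:", population)
```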

Page 85: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks for gene names one at a time until the user inputs “STOP”. After that, the program prints out all the names

Page 86: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that prints out the multiplication table for two integers that the user inputs

E.g.
number_1 = 3
number_2 = 2
=>
1*1 = 1
2*1 = 2
3*1 = 3
1*2 = 2
...

Page 87: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

This far...
● I/O (Input/Output)
● Comparisons
● Strings & their manipulation
● Compound data type list
● Loops for and while

Page 88: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Questions?

Page 89: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Extra exercise
Make a program that calculates and prints the row and column sums of a 3x3 matrix from user input

E.g.
 1  2  3   =>   1   2   3 ;  6
 3  5 -1   =>   3   5  -1 ;  7
 7  5  2   =>   7   5   2 ; 14
               --- --- ---
                11  12   4

Input the data and format the output any way you wish

If you have problems, at least use comments to sketch out the program flow

Page 90: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Dictionary data type / structure 1/2
● A list is a sequential data structure where elements are indexed by position
● A dictionary keeps data accessible using keys
● Syntax: my_dict = {key1: data1, key2: data2, ...}
● Assigning data: my_dict[key] = data; accessing data: my_dict[key]

E.g.
hiv_gene_sequence = {}   # empty dictionary
hiv_gene_sequence["gag"] = "MGARASVLSGGELDRWEKIRLRPGGK…"
hiv_gene_sequence["pol"] = "FFREDLAFLQGKAREFSSEQTRANSPTRRE"
...

Page 91: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Dictionary data type / structure 2/2
● A key can be any immutable value, e.g. a string, a number or a tuple (lists will not do)
● A dictionary can be accessed key-by-key with a for-loop

for gene in hiv_gene_sequence:
    print(gene, hiv_gene_sequence[gene])

● The in-operator tests if a key exists

if "vif" in hiv_gene_sequence:
    print("vif gene sequence:", hiv_gene_sequence["vif"])
else:
    print("vif not found!")
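A small sketch that puts the dictionary operations together. The sequences are truncated placeholders taken from the beginning of the strings on the slide, and the names sequence_lengths and vif_sequence are made up for this example. The get-method is a standard alternative to the in-test: it returns a default value when the key is missing.

```python
# Truncated placeholder sequences, for illustration only
hiv_gene_sequence = {}
hiv_gene_sequence["gag"] = "MGARASVLSGG"
hiv_gene_sequence["pol"] = "FFREDLAFLQG"

# Build a second dictionary: gene name -> sequence length
sequence_lengths = {}
for gene in sorted(hiv_gene_sequence):      # keys in alphabetical order
    sequence_lengths[gene] = len(hiv_gene_sequence[gene])

# get() avoids a KeyError when the key does not exist
vif_sequence = hiv_gene_sequence.get("vif", "not found")

print(sequence_lengths, vif_sequence)
```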

Page 92: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks for protein sequences until “STOP” is encountered. After “STOP”, the program prints how many times each sequence was seen in the input.

What can go wrong? Why? What to do?

Page 93: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

More on strings 1/3
● Python has many, many useful functions / methods for string manipulation (see: https://docs.python.org/3.6/library/stdtypes.html#string-methods)
● E.g. the split(delimiter)-method splits a string into a list of words, based on the delimiter (by default any whitespace)

input_line = input("Give gene name and length: ")
input_words = input_line.split()    # e.g. ['nef', '71']

file_path = "/home/ravantti/Downloads"
path_parts = file_path.split("/")
print("root-directory:", path_parts[1])   # WHY?
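The split-method can be explored without typing any input by using a fixed string, as in this sketch. The input line "nef 71" mirrors the comment on the slide; the other variable names are assumptions. Note the empty string that appears when the text starts with the delimiter.

```python
input_line = "nef 71"                 # stands in for what the user would type
input_words = input_line.split()      # no delimiter given: split on whitespace
gene_name = input_words[0]
gene_length = int(input_words[1])     # split always returns strings

file_path = "/home/ravantti/Downloads"
path_parts = file_path.split("/")     # the leading "/" gives an empty first part

# join is the opposite of split
rejoined = "/".join(path_parts)

print(input_words, gene_length, path_parts, rejoined)
```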

Page 94: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

More on strings 2/3
● E.g. the replace(old_string, new_string)-method replaces all matching instances of old_string with new_string - handy for data cleaning

pi_string = "3,14"
pi = pi_string.replace(",", ".")
pi = float(pi)    # vs. float(pi_string)

Page 95: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

More on strings 3/3
● E.g. strip() removes “extra” whitespace (spaces, tabulator and line feed characters) from the beginning and end of a string

gene_name = " ABCB1 "
gene_name = gene_name.strip()   # cleaned up version
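strip and replace are often chained when cleaning data. The sketch below converts a few made-up text values with decimal commas and stray whitespace into numbers; the list raw_values and the name cleaned are assumptions for this illustration.

```python
# Hypothetical raw values, e.g. copied from a spreadsheet
raw_values = [" 3,14 ", "2,5\n", "\t10"]

cleaned = []
for raw in raw_values:
    text = raw.strip()              # remove spaces, tabs and line feeds at both ends
    text = text.replace(",", ".")   # decimal comma -> decimal point
    cleaned.append(float(text))     # now the conversion works

print(cleaned)
```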

Page 96: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks for genome and gene names, one pair per line, until “STOP” and then prints the genomes and their genes, both sorted alphabetically

Hint: data structures can contain other data structures

Page 97: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Odds and sods
● Write a small program for every data handling task => easier to reproduce
● Use a code editor with syntax coloring & bracket matching
● KISS (at least in the beginning)
● Think about the naming of program files & variables
● Good comments make a program better, bad comments make a program dangerous - comments are not executable code!

Page 98: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Files 1/2
● Our programs have asked the user to input the data by hand => laborious
● Most(?) programs read & write at least some data from/to files
● Files are operated through file objects
● Files are opened with the open-function and closed with the close-method
● open returns a file object (handle) that is used for reading and writing

Page 99: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Files 2/2
Open for reading from a file:
file_handle = open("file_name", "rt")

Open for writing to a file:
file_handle = open("file_name", "wt")

N.B. opening an existing file for writing empties it immediately, so the original content is destroyed!

Closing the file with the close-method - note the parentheses

file_handle.close()

Page 100: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Reading files
file_handle = open("file_name", "rt")   # open as a text file
line = file_handle.readline()           # reads one line into a string
lines = file_handle.readlines()         # reads all the (remaining) lines into a list of strings

● The for-statement can traverse a file line-by-line

for line in file_handle:
    do_something_with_the_line

● Each line includes the line feed (“\n”) character at the end!
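A minimal, self-contained sketch of reading a file line by line. So that it runs anywhere, it first writes a small example file into a temporary directory; the file name genes_example.txt and its contents are made up. It also uses the with-statement, which is not covered on the slides: it is a common alternative that closes the file automatically at the end of the block.

```python
import os
import tempfile

# Create a small example file to read (hypothetical content)
file_name = os.path.join(tempfile.mkdtemp(), "genes_example.txt")
with open(file_name, "wt") as out_file:
    out_file.write("gag\npol\nenv\n")

# Read the file line by line and remove the line feed from each line
genes = []
with open(file_name, "rt") as file_handle:
    for line in file_handle:
        genes.append(line.rstrip("\n"))

print(genes)
```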

Page 101: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks the user for a file name and then prints out each line of the file

You can use the file “genes.txt” from the course’s web-page

Page 102: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Writing files 1/2
file_handle = open("file_name", "wt")           # open text file for writing
file_handle.write("PRD1 is a bacteriophage")    # writing a string

The line feed character must be added, if needed

Remember to close the file with

file_handle.close()

Page 103: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Writing files 2/2
E.g.
fusion_gene_part_1 = "BCR"
fusion_gene_part_2 = "ABL1"

fusion_file = open("my_fusion_genes.txt", "wt")
fusion_file.write(fusion_gene_part_1)
fusion_file.write(fusion_gene_part_2)
fusion_file.close()

In the file: BCRABL1 <= why?
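One way to get each name on its own line is to add the line feed character "\n" to every string that is written. The sketch below writes into a temporary directory so that it does not overwrite anything; the path and the name file_content are assumptions for this example.

```python
import os
import tempfile

fusion_genes = ["BCR", "ABL1"]

# Write each name followed by a line feed
file_name = os.path.join(tempfile.mkdtemp(), "my_fusion_genes.txt")
fusion_file = open(file_name, "wt")
for gene in fusion_genes:
    fusion_file.write(gene + "\n")   # write() does not add "\n" by itself
fusion_file.close()

# Read the whole file back to check what was written
check_file = open(file_name, "rt")
file_content = check_file.read()
check_file.close()

print(repr(file_content))
```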

Page 104: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that asks the user for gene names and writes them to a file, one name per line

Page 105: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Example 1/3
48,904,912 rows in a file: merged_bams.sam

903878_1_165_3.1_0.503 16 1 79062 69 165M * 0 0 CTGCCCCCCACCTGACGACTTCAATAAGAAGTAGCAGCATTTCTCCAAGGAGGAAATACCAGAGTCAATTCACAACCACTGCAATTGCAGTGGTACCACCATAACAGCCCTTGGGCTGCAGAAGGAACTAAGAGTCTAGTCACTACAGTGGCACCTTCAGCAC * NM:i:0 AS:i:990 RG:Z:fixed_XXX_sorted
179915_1_164_5.4_0.366 16 1 85929 100 99M1X36M1X17M * 0 0 GGATTGGCAATGCGTTCTTAGATAATACACCAAATACAAGCATGAAACAAACAAATGCAGCCAAAATGTACCAGAATCTGAAAACATCTATTATCTACGAAGAATTAGAGGGGAATTTGGTGAAAGAAATATGGCAGAATGGGACATTGCTCTGTGAATGCT * NM:i:2 AS:i:936 RG:Z:YYY_sorted
305674_1_344_4.9_0.459 0 1 87516 100 342M * 0 0 CAGAGATGAGTTTGTTTATTTTTTTATTTTTTAAAAAATTGCTAATTTACAGAACATGGAGATGAGTATGTTTTGAAGGCTTGGAAGCATGCAAGTGGGAGAAGAAAGGAGTCAGCTACATTCTGGCTGTGTGCAGAGGCAGGTCACTGTGGTGGGAGTGTTCCTGTTTCATGGACTCTGCAAATCGCAATGCTTGGCATGGCCTCCCGACCCTGATGGCAGAGAAGCAAACCAGTCGGAGAGCTGGGGTCCTCCCAGCCCTCTTGGCCCTGTGGCCAATTTTTTCTTCAATAGCCTCATAAAATCACATTATTTGAGTGCCCATGGCTCCAAAACAAGCAG * NM:i:0 AS:i:2064 RG:Z:fixed_ZZZ_sorted
...

The 1st column is wrong: it must include the name of the sample, which is found in the last column

Page 106: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Example 2/3
XXX_903878_1_165_3.1_0.503 16 1 7962 69 165M * 0 0 CTGCCCCCCACCTGACGACTTCAATAAGAAGTAGCCCAGCATTTCTCCAAGGAGGAAATACCAGAGTCAATTCACAACCACTGCAATTGCAGTGGTACCACCATAACAGCCCTTGGGCTGCAGAAGGAACTAAAGTCTAGTCACTACAGTGGCACCTTCAGCAC * NM:i:0 AS:i:990 RG:Z:fixed_XXX_sorted
YYY_179915_1_164_5.4_0.366 16 1 85929 100 99M1X36M1X27M * 0 0 GGATTGGCAATGCGTTCTTAGATAATACACCAAAAATACAAGCATGAAACAAACAAATGCAGCCAAAATGTACCAGAATCTGAAAACATCTATTATCTACGAAGAATTAGAGGGGAATTTGAAAGAAATATGGCAGAATGGGACATTGCTCTGTGAATGCT * NM:i:2 AS:i:936 RG:Z:fixed_YYY_sorted
ZZZ_305674_1_344_4.9_0.459 0 1 87516 100 34M * 0 0 CAGAGATGAGTTTGTTTATTTTTTTATTTTTTAAAAAATTGCTAATTTACAGAACATGGAGATGAGTATGTTTTGAAGGCTTGGAAGCATGCAAGTGGGAGAAGAAAGGAGTCAGCTACATTCTGGCTGGCAGAGGCAGGTCACTGTGGTGGGAGTGTTCCTGTTTCATGGACTCTGCAAATCGCAATGCTTGGCATGGCCTCCCGACCCTGATGGCAGAGAAGCAAACACCAGTCGGAGAGCTGGGGTCCTCCCAGCCCTCTTGGCCCTGTGGCCAATTTTTTCTTCAATAGCCTCATAAAATCACATTATTTGAGTGCCCATGGCTCCAAAACAAGCAG * NM:i:0 AS:i:2064 RG:Z:fixed_ZZZ_sorted

...

Page 107: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Example 3/3
fix_sam.py:

#903878_1_165_3.1_0.503 16 1 78062 69 195M * 0 0 CTGCCCCCCACCTGACGACAATAAGAAGTAGCCCAGCATTTCTCCAAGGAGGAAATACCAGAGTCAATTCACAACCACTGCAATTGCAGTGGTACCACCATAACAGCCCTTGGGCTGCAGAAGGAACAAGAGTCTAGTCACTACAGTGGCACCTTCAGCAC * NM:i:0 AS:i:990 RG:Z:fixed_XXX_sorted

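file_handle = open("merged_bams.sam", "rt")
for line in file_handle:
    w = line.split()                        # split the row into columns
    sample_name = w[-1].split("_")[1]       # e.g. "RG:Z:fixed_XXX_sorted" => "XXX"
    read_name = sample_name + "_" + w[0]    # put the sample name in front of the read name
    w[0] = read_name
    print("\t".join(w))                     # columns are joined back with tabulators
file_handle.close()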
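The same fix can be tried on a single row without the large file. The sketch below wraps the steps of fix_sam.py into a function; the function name add_sample_name is made up, and the example row uses a shortened placeholder sequence ("ACGT") instead of the real read.

```python
def add_sample_name(line):
    # Returns the row with the sample name added in front of the 1st column
    w = line.split()
    sample_name = w[-1].split("_")[1]   # "RG:Z:fixed_XXX_sorted" => "XXX"
    w[0] = sample_name + "_" + w[0]
    return "\t".join(w)

# One example row; the sequence column is a shortened placeholder
example = "\t".join(["903878_1_165_3.1_0.503", "16", "1", "79062", "69",
                     "165M", "*", "0", "0", "ACGT", "*",
                     "NM:i:0", "AS:i:990", "RG:Z:fixed_XXX_sorted"])

fixed = add_sample_name(example)
print(fixed)
```

The function only works when the last column has the form something_SAMPLE_something; a row whose last column has a different layout would pick up the wrong piece of text.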

Page 108: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
● Make a program that writes the sequences that are found in both of two separate files into a third file
● You can assume that there is only one sequence per line
● You can use the files seq_1.txt and seq_2.txt from the course’s webpage
● Think first what your program is supposed to do
  ○ Logic - program flow?
  ○ Inputs?
  ○ Output?
  ○ What could go wrong?

Page 109: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

This far...
● I/O (Input/Output)
● Comparisons
● Strings & their manipulation
● Compound data types list & dictionary
● Loops
● File I/O

Page 110: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Questions?

Page 111: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Programming...
● A program is a collection of commands / rules (“statements”) given to the computer (here: through the Python interpreter)
● Commands in different programming languages can look quite different, but in the end they mostly do the same things:
  ○ Data input / output (“read, write”)
  ○ Internal data handling: variables and data structures (“list, dictionary, ...”) and their manipulation (“operators”)
  ○ Control flow statements
    ■ Conditional statements (“if then else”)
    ■ Loops (“for”, “while”, “goto”, …)
    ■ Functions / subroutines / methods (“sqrt(x), sin(x),...”)

Page 112: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Functions 1/3
● Splitting a program into (smallish) logical pieces usually pays off
  ○ testing different pieces is easier
  ○ parts of a program can be modified without changing the whole program
  ○ parts can be reused in other programs
● In Python the separate parts are called functions (or classes, which we are not using in this course)
● Define functions to do one thing and one thing only

Page 113: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Functions 2/3
● Self-defined functions are called like any other function in Python
● Functions are defined with the def-statement

def double_number(my_number):
    print("Double of", my_number, "is", 2*my_number)

● Functions can be used after they are defined

input_number = input("Give number to be doubled: ")
double_number(input_number)

What will go wrong and why?

Page 114: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Functions 3/3
● The def-statement starts the function definition and the function ends when the indentation returns to the previous level
● Syntax: def function_name(parameter1, parameter2, …):
  ○ After the function’s name, list the parameters it uses, separated by commas
  ○ Parameters carry information (variables, data structures) into the function
● The return-statement carries information back from a function
● Functions must be defined before they are used
● Functions can contain other function definitions, call other functions and call themselves...
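A short sketch of a function that uses a parameter and the return-statement. It is a variation of double_number that returns the result instead of printing it; the fixed string "21" stands in for user input, and the names text_input, doubled_text and doubled_number are made up for this example.

```python
def double_number(my_number):
    # Returns the parameter multiplied by two
    return 2 * my_number

text_input = "21"                                 # input() would give a string like this

doubled_text = double_number(text_input)          # string * 2 repeats the string
doubled_number = double_number(int(text_input))   # convert first to get arithmetic

print(doubled_text, doubled_number)
```

Because the same function works on both strings and numbers, Python gives no error message here; the conversion with int (or float) has to be remembered by the programmer.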

Page 115: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Write a function that returns the cube (x*x*x) of a number

Make a program that reads one number per line from a file and prints out the number and its cube using the previous function

Page 116: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

More on functions 1/2
● Always write comments about
  ○ What the function does
  ○ What parameters the function needs
  ○ What the function returns, if anything
● Functions can return multiple values => e.g. as a tuple, which is an immutable list
● Returned values can be unpacked with a single assignment

Page 117: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

More on functions 2/2
E.g.
def min_max(num1, num2):
    # this function returns two numbers in increasing order: smaller, larger
    if num1 > num2:
        return num2, num1
    else:
        return num1, num2

### The Main Program ###
small, big = min_max(100, 50)   # <= Here we unpack the return values to variables
print("Smaller was:", small, "and larger was:", big)

Page 118: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Write a function that returns the median value of the numbers given to it

Make a program that calculates the quartiles(*) of a set of numbers

(*) https://en.wikipedia.org/wiki/Quartile

Page 119: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Libraries / modules 1/2
● Libraries bring new commands/functions to the programming environment
● Python has an “everything and the kitchen sink” type of standard library https://docs.python.org/3/library/
● There are also widely used and accepted third-party libraries (e.g. matplotlib, numpy and biopython - more on these in the latter part of the course)
● Users can write their own libraries (we’ll skip that in this course)

Page 120: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Libraries / modules 2/2
● A library is activated with the import statement
● Functions within a library are called as library.function

E.g.
import random   # get access to various random number functions
print("A random integer between [0,10] just for you:", random.randint(0, 10))
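A small sketch with two more functions from the random library. random.seed sets the starting point of the random number generator, so the same seed gives the same numbers again; random.choice picks one element from a sequence. The seed value 42 and the variable names are arbitrary choices for this illustration.

```python
import random

random.seed(42)                       # fix the starting point of the generator
first_draw = random.randint(0, 10)    # both end points are included

random.seed(42)                       # same seed again...
second_draw = random.randint(0, 10)   # ...gives the same number again

nucleotide = random.choice("ACGT")    # pick one character at random

print(first_draw, second_draw, nucleotide)
```

Without the seed call the numbers differ from run to run, which is usually what is wanted; a fixed seed is useful when results need to be reproduced.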

Page 121: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Exercise
Make a program that draws seven different numbers from the range [1, 40]

Use the random library

Run the program several times - do the results change?

Page 122: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Few words about objects in Python
● Object-Oriented Programming (OOP) is one way to program with Python
● However, in this course, we stick with procedural programming (sequences of commands)
● Objects are entities that encapsulate both attributes (≅ data) and methods (≅ functions) that define how the data is manipulated
● The types of objects (classes) are defined with the class-statement
● Attributes and methods are referred to with a dot (“.”) after the object’s name
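Although classes are not used in this course, a minimal sketch may help to see what the bullets above mean. The class Gene, its attributes and its method are made up for this illustration, and the sequence is a truncated placeholder.

```python
class Gene:
    # A tiny example class: two attributes and one method

    def __init__(self, name, sequence):
        # __init__ runs when a new object is created
        self.name = name            # attribute
        self.sequence = sequence    # attribute

    def length(self):
        # method: a function that works on the object's own data
        return len(self.sequence)


gag = Gene("gag", "MGARASVLSGG")    # create an object (placeholder sequence)
print(gag.name, gag.length())       # attribute and method, both with the dot
```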

Page 123: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Objects - in action
● We have already seen objects, methods and attributes e.g. with file I/O and string manipulation

E.g.
my_file = open("test.txt", "rt")
my_file.closed      # attribute / state of the file => False
my_file.close()     # method to close the file
my_file.closed      # testing again => True

my_name_is = "janne"
my_name_is.capitalize()   # string method => "Janne"

Page 124: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Libraries 1/3
import numpy

a = numpy.random.randint(15, size=(2,5))
b = numpy.random.randint(15, size=(2,5))

print("a:", a, "\n")
print("b:", b, "\n")

print("a+b:", a+b, "\n")
print("a*b:", a*b, "\n")               # per each element in the table
print("a[0]*b[0]:", a[0]*b[0], "\n")   # for the first rows only

https://docs.scipy.org/doc/numpy/user/index.html

Page 125: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Libraries 2/3
import datetime

today = datetime.date.today()
next_time = datetime.date(2019, 5, 9)

print("Time between now and the next time:", next_time - today)

https://docs.python.org/3/library/datetime.html
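Because today() gives a different answer every day, the subtraction is easier to check with two fixed dates. The sketch below uses the first and the fourth contact session dates of this course; the variable names are made up. The result of the subtraction is a timedelta object whose days attribute holds the number of days.

```python
import datetime

first_session = datetime.date(2019, 4, 15)
next_time = datetime.date(2019, 5, 9)

difference = next_time - first_session   # subtracting dates gives a timedelta
days_between = difference.days           # attribute of the timedelta object

print("Days between the sessions:", days_between)
```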

Page 126: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Libraries 3/3
Try in the notebook (assuming that matplotlib is installed!)

# magic command to inline graphics, note the "%" sign
%matplotlib inline

import matplotlib.pyplot

x = [1, 2, 3, 4]
y = [-3, 11, 4, 9]

matplotlib.pyplot.plot(x, y)

https://matplotlib.org/users/pyplot_tutorial.html

Page 127: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

This far
● I/O, including file I/O
● Conditionals / comparisons
● Data structures list & dictionary
● Loops for & while
● Functions, methods & libraries

=> you can program! Now it is only practise, practise, practise (& read more about Python, libraries, …)

Page 128: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Tips for programming 1/3
● Split the problem / program into smaller logical pieces even before writing a single line of code (simulate the program flow in your mind / with pen & paper)
● The whole program can be completed piece-by-piece. Start e.g. by reading the data into some reasonable data structure
● Learn to comment your code well! It will pay off later.
● Outline the program using comments

Page 129: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Tips for programming 2/3
● What does the program read in? What is the wanted output?
● What choices does the program need to make?
● What data structures are needed?
● Are there any repeated parts? Loops and/or functions?
● What can go wrong? What then?
● Naming and initialization: variables, data structures, files, ...

Page 130: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

Tips for programming 3/3
● The program can print (& save to a file!) intermediate results
● Extra output can be commented out later
● Test your programs with simple inputs
● Python has many, many, many functions and libraries for common tasks - read the documentation!
● Beware of incompatible libraries, code snippets written for Python 2.x, ...

Page 131: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

The next time...
1. Python-programming warm up :)
2. Anaconda Distribution revisited
3. Python as an integration language
4. Libraries...
   4.1. Numpy & Scipy
   4.2. Matplotlib
   4.3. Pandas
   4.4. Biopython
   4.5. ...

Recap

Page 132: Python for Biomedicine - UEFwhamalai/PyBio/PyBio_ravantti_lectures_v1.0.pdf · Python-packages for all kinds of Anaconda is Continuum Analytics’ Python-distribution package Works

print("The End")

print("THANK YOU!")