Recitation 12: Programming for Engineers in Python

Upload: warren-andrews

Post on 15-Jan-2016

Plan
- Dynamic Programming
- Coin Change problem
- Longest Common Subsequence
- Application to Bioinformatics

Teaching Survey
Please answer the teaching survey: https://www.ims.tau.ac.il/Tal/
This will help us to improve the course.
Deadline: 4.2.12

Coin Change Problem
What is the smallest number of coins I can use to make exact change?
Greedy solution: repeatedly pick the largest coin that does not exceed the remaining amount, until you reach the change needed.
In US currency this works well:
Give change for 30 cents if you've got 1, 5, 10, and 25 cent coins: 25 + 5 → 2 coins

http://jeremykun.files.wordpress.com/2012/01/coins.jpg

The Sin of Greediness
What if you don't have 5 cent coins?
You've got 1, 10, and 25.
Greedy solution: 25+1+1+1+1+1 → 6 coins
But a better solution is: 10+10+10 → 3 coins!
So the greedy approach isn't optimal.
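A minimal sketch of the greedy rule, written for Python 3. The function name greedy_change is hypothetical, not from the course files. It reproduces both examples: greedy finds 2 coins with 1, 5, 10, 25, but needs 6 coins for 30 cents when only 1, 10, 25 are available.

```python
def greedy_change(cents_needed, coin_values):
    """Greedy coin change: repeatedly take the largest coin that still fits.

    Assumes a 1-cent coin is available, so the loop always ends
    with exact change.
    """
    coins = []
    for coin in sorted(coin_values, reverse=True):
        # keep using this coin while it does not overshoot the remaining amount
        while coin <= cents_needed:
            coins.append(coin)
            cents_needed -= coin
    return coins


print(greedy_change(30, (1, 5, 10, 25)))  # [25, 5] -> 2 coins
print(greedy_change(30, (1, 10, 25)))     # [25, 1, 1, 1, 1, 1] -> 6 coins
```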

The Seven Deadly Sins and the Four Last Things by Hieronymus Bosch
http://en.wikipedia.org/wiki/File:Boschsevendeadlysins.jpg

Recursive Solution
Reminder: find the minimal number of coins needed to give exact change with coins of specified values.
Assume that we can use 1 cent coins, so there is always some solution.
Denote our coin values by c1, c2, ..., ck (c1 = 1); k is the number of coin values we can use.
Denote the change required by n.
In the previous example: n = 30, k = 3, c1 = 1, c2 = 10, c3 = 25.

Recursive Solution
Recursion Base:
If n = 0, we need 0 coins.
If k = 1, then c1 = 1, so we need n coins.

Recursion Step: If n …
>>> print 'result', coins_change_rec(30, (1,5,10,25))
result 2
>>> print 'max calls', max(calls.values())
max calls 4
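A minimal sketch of the complete recursion, written for Python 3. The step here is an assumption. It follows the same two choices as the min_coins formula given for the matrix solution: skip the largest coin value, or use it once and solve for the remaining amount. The calls counter is a hypothetical stand-in for the one printed in the session above, keyed by the (n, k) arguments.

```python
from collections import Counter

# Hypothetical call counter: how many times each (n, k) pair is computed.
calls = Counter()


def coins_change_rec(cents_needed, coin_values):
    """Minimal number of coins summing to cents_needed.

    Assumes coin_values is sorted ascending and coin_values[0] == 1,
    so a solution always exists.
    """
    calls[(cents_needed, len(coin_values))] += 1
    if cents_needed == 0:        # base: no change needed
        return 0
    if len(coin_values) == 1:    # base: only 1-cent coins are left
        return cents_needed
    largest = coin_values[-1]
    # Choice 1: never use the largest coin value.
    without_largest = coins_change_rec(cents_needed, coin_values[:-1])
    if largest > cents_needed:   # the largest coin does not fit
        return without_largest
    # Choice 2: use the largest coin once; the rest may use it again.
    with_largest = coins_change_rec(cents_needed - largest, coin_values) + 1
    return min(with_largest, without_largest)


print(coins_change_rec(30, (1, 5, 10, 25)))  # 2
print(coins_change_rec(30, (1, 10, 25)))     # 3
```

With coins 1, 10, 25 the recursion finds 10+10+10, the 3-coin answer that the greedy choice missed.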

Dynamic Programming - Memoization
We want to store the results of calculations so we don't repeat them.
We create a table called mem:
- # of columns: # of cents needed + 1
- # of rows: # of coin values + 1
The table is initialized with some illegal value, for example -1:

    mem = [ [-1 for y in range(cents_needed+1)] for x in range(len(coin_values)+1) ]

Dynamic Programming - Memoization
For each call of the recursive function, we check whether mem already has the answer:
    if mem[len(coin_values)][cents_needed] == -1:
In case it doesn't (the above is True), we calculate it as before and store the result, for example:
    if cents_needed … ci)
We can decide not to use ci, and therefore use only c0, ..., ci-1, which gives min_coins[i-1,j].
So which way do we choose? The one with the fewest coins!
    min_coins[i,j] = min(min_coins[i,j-ci] + 1, min_coins[i-1,j])
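A sketch of both dynamic-programming variants, written for Python 3, assuming coin values sorted ascending with the first value equal to 1. coins_change_mem caches results in a mem table with len(coin_values)+1 rows and cents_needed+1 columns, initialized to -1. coins_change_matrix fills min_coins bottom-up with the formula above. Both names are hypothetical. This is not the course's coins_matrix.py.

```python
def coins_change_mem(cents_needed, coin_values, mem=None):
    """Top-down (memoized) coin change; mem[k][n] caches the answer
    for the first k coin values and n cents (-1 means not computed yet)."""
    if mem is None:
        mem = [[-1] * (cents_needed + 1) for _ in range(len(coin_values) + 1)]
    k = len(coin_values)
    if mem[k][cents_needed] == -1:
        if cents_needed == 0:
            result = 0
        elif k == 1:                     # only 1-cent coins
            result = cents_needed
        else:
            # skip the largest coin value ...
            result = coins_change_mem(cents_needed, coin_values[:-1], mem)
            largest = coin_values[-1]
            if largest <= cents_needed:  # ... or use it once more
                result = min(result,
                             coins_change_mem(cents_needed - largest, coin_values, mem) + 1)
        mem[k][cents_needed] = result
    return mem[k][cents_needed]


def coins_change_matrix(cents_needed, coin_values):
    """Bottom-up coin change: min_coins[i][j] is the fewest coins
    for j cents using coin_values[0..i]."""
    rows = len(coin_values)
    min_coins = [[0] * (cents_needed + 1) for _ in range(rows)]
    for j in range(cents_needed + 1):
        min_coins[0][j] = j              # coin_values[0] == 1
    for i in range(1, rows):
        ci = coin_values[i]
        for j in range(cents_needed + 1):
            if j >= ci:                  # may use ci, or not
                min_coins[i][j] = min(min_coins[i][j - ci] + 1, min_coins[i - 1][j])
            else:                        # ci is too large
                min_coins[i][j] = min_coins[i - 1][j]
    return min_coins[rows - 1][cents_needed]


print(coins_change_mem(30, (1, 10, 25)))     # 3
print(coins_change_matrix(30, (1, 10, 25)))  # 3
```

The mem version keeps the recursive structure but computes each (k, n) cell only once. The matrix version needs no recursion at all and runs in O(k · n) time.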

Example: matrix recursion step
coins_matrix.py
The code for the matrix solution and the idea are from http://jeremykun.wordpress.com/2012/01/12/a-spoonful-of-python/

Longest Common Subsequence
Given two sequences (strings/lists), we want to find the longest common subsequence.
Definition (subsequence): B is a subsequence of A if B can be derived from A by removing elements from A, keeping the remaining elements in their original order.
Examples:
- [2,4,6] is a subsequence of [1,2,3,4,5,6]
- [6,4,2] is NOT a subsequence of [1,2,3,4,5,6]
- "is" is a subsequence of "distance"
- "nice" is NOT a subsequence of "distance"

Longest Common Subsequence
Given two sequences (strings or lists), we want to find the longest common subsequence.
Example of an LCS:
Sequence 1: HUMAN
Sequence 2: CHIMPANZEE

Applications include:
- Bioinformatics (next up)
- Version control

http://wordaligned.org/articles/longest-common-subsequence
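The subsequence definition can be checked with a single left-to-right scan. This is a minimal sketch in Python 3. is_subsequence is a hypothetical helper, not from the course files. Matching each element of B at its earliest possible position in A is enough, because an earlier match never blocks a later one.

```python
def is_subsequence(b, a):
    """Return True if b can be obtained from a by deleting elements
    (the remaining elements keep their order)."""
    matched = 0  # how many elements of b were found so far, in order
    for x in a:
        if matched < len(b) and b[matched] == x:
            matched += 1
    return matched == len(b)


print(is_subsequence([2, 4, 6], [1, 2, 3, 4, 5, 6]))  # True
print(is_subsequence([6, 4, 2], [1, 2, 3, 4, 5, 6]))  # False
print(is_subsequence("is", "distance"))               # True
print(is_subsequence("nice", "distance"))             # False
```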

The DNA
Our biological blueprint: a sequence made of four bases, A, G, C, T.
Double strand:
- A connects to T
- G connects to C
Every triplet encodes an amino acid. Example: GAG → Glutamate
A chain of amino acids is a protein, the biological machine!
http://sips.inesc-id.pt/~nfvr/msc_theses/msc09b/

Longest common subsequence
The DNA changes:
- Mutation: A → G, C → T, etc.
- Insertion: AGC → ATGC
- Deletion: AGC → AC

Given two non-identical sequences, we want to find the parts that are common, so we can say how different they are.
Which DNA is more similar to ours? The cat's or the dog's?
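To make the three kinds of change concrete, here is a minimal Python 3 sketch that applies each one to a DNA string. The helper names mutate, insert_base and delete_base are hypothetical. It uses the AGC examples from the slide.

```python
def mutate(seq, i, base):
    """Point mutation: replace the base at position i."""
    return seq[:i] + base + seq[i + 1:]


def insert_base(seq, i, base):
    """Insertion: put a new base before position i."""
    return seq[:i] + base + seq[i:]


def delete_base(seq, i):
    """Deletion: remove the base at position i."""
    return seq[:i] + seq[i + 1:]


print(mutate("AGC", 0, "G"))       # GGC
print(insert_base("AGC", 1, "T"))  # ATGC
print(delete_base("AGC", 1))       # AC
```

In each case the untouched bases are still common to the old and new strings, in the same order. For example, AGC is a subsequence of ATGC. This is why the longest common subsequence is a natural measure of similarity.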

http://palscience.com/wp-content/uploads/2010/09/DNA_with_mutation.jpg

Recursion
An LCS of two sequences can be built from the LCSes of prefixes of these sequences.
Denote the sequences seq1 and seq2.
Base: check if either sequence is empty:
    if len(seq1) == 0 or len(seq2) == 0: return [ ]
Step: build the solution from shorter sequences:
    if seq1[-1] == seq2[-1]: return lcs(seq1[:-1], seq2[:-1]) + [ seq1[-1] ]
    else: return max(lcs(seq1[:-1], seq2), lcs(seq1, seq2[:-1]), key=len)
lcs_rec.py

Wasteful Recursion
For the inputs MAN and PIG, the calls are (number of calls, arguments):
(1, ('', 'PIG'))
(1, ('M', 'PIG'))
(1, ('MA', 'PIG'))
(1, ('MAN', ''))
(1, ('MAN', 'P'))
(1, ('MAN', 'PI'))
(1, ('MAN', 'PIG'))
(2, ('MA', 'PI'))
(3, ('', 'PI'))
(3, ('M', 'PI'))
(3, ('MA', ''))
(3, ('MA', 'P'))
(6, ('', 'P'))
(6, ('M', ''))
(6, ('M', 'P'))
24 redundant calls! (39 calls in total, but only 15 distinct argument pairs)
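A runnable Python 3 version of the recursion, instrumented with a collections.Counter to reproduce the call listing. The counter named calls is a hypothetical addition for illustration. The recursion itself follows the base and step given on the slide.

```python
from collections import Counter

calls = Counter()  # hypothetical: counts how often each argument pair is seen


def lcs(seq1, seq2):
    """Plain recursive LCS; returns one longest common subsequence as a list."""
    calls[(seq1, seq2)] += 1
    if len(seq1) == 0 or len(seq2) == 0:   # base: an empty sequence
        return []
    if seq1[-1] == seq2[-1]:               # last elements match: keep it
        return lcs(seq1[:-1], seq2[:-1]) + [seq1[-1]]
    # otherwise drop the last element of one sequence and keep the longer result
    return max(lcs(seq1[:-1], seq2), lcs(seq1, seq2[:-1]), key=len)


print(lcs('HUMAN', 'CHIMPANZEE'))  # ['H', 'M', 'A', 'N']

calls.clear()
lcs('MAN', 'PIG')
for args, n in sorted(calls.items(), key=lambda kv: (kv[1], kv[0])):
    print((n, args))
print(sum(calls.values()) - len(calls), 'redundant calls')  # 24 redundant calls
```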

http://wordaligned.org/articles/longest-common-subsequence

Wasteful Recursion
When comparing longer sequences over a small alphabet, the problem is worse.
For example, DNA sequences are composed of A, G, T and C, and are long.
For lcs('ACCGGTCGAGTGCGCGGAAGCCGGCCGAA', 'GTCGTTCGGAATGCCGTTGCTCTGTAAA') we get absurd call counts:
(('', 'GT'), 13,182,769)
(('A', 'GT'), 13,182,769)
(('A', 'G'), 24,853,152)
(('', 'G'), 24,853,152)
(('A', ''), 24,853,152)

http://blog.oncofertility.northwestern.edu/wp-content/uploads/2010/07/DNA-sequence.jpg

DP Saves the Day
We saw overlapping subproblems emerge: the same sequences are compared over and over again.
We saw how we can build the solution from solutions of subproblems, a property we called optimal substructure.
Therefore we will apply a dynamic programming approach.
Start with the top-down approach: memoization.

Memoization
We save the results of function calls so we don't calculate them again:

    def lcs_mem(seq1, seq2, mem=None):
        if mem is None:        # create the table once, at the top-level call
            mem = {}
        # seq1 and seq2 are always prefixes of the original inputs, so their
        # lengths identify the subproblem (tuples are immutable, so usable as keys)
        key = (len(seq1), len(seq2))
        if key not in mem:     # result not saved yet
            if len(seq1) == 0 or len(seq2) == 0:
                mem[key] = [ ]
            elif seq1[-1] == seq2[-1]:
                mem[key] = lcs_mem(seq1[:-1], seq2[:-1], mem) + [ seq1[-1] ]
            else:
                mem[key] = max(lcs_mem(seq1[:-1], seq2, mem),
                               lcs_mem(seq1, seq2[:-1], mem), key=len)
        return mem[key]

maximum recursion depth exceeded
We want to use our memoized LCS algorithm on two long DNA sequences:
>>> from random import choice
>>> def base(): return choice('AGCT')
>>> seq1 = ''.join([base() for x in range(10000)])
>>> seq2 = ''.join([base() for x in range(10000)])
>>> print lcs_mem(seq1, seq2)
RuntimeError: maximum recursion depth exceeded in cmp
We need a different algorithm.
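One way to avoid the recursion limit is to fill the table bottom-up with loops instead of recursive calls. Below is a sketch of the standard bottom-up LCS in Python 3. The name lcs_table is hypothetical, and this is not necessarily the course's own version. table[i][j] holds the LCS length of seq1[:i] and seq2[:j]. A walk back from the bottom-right corner then recovers one LCS.

```python
def lcs_table(seq1, seq2):
    """Bottom-up LCS with no recursion; returns one LCS as a list."""
    n, m = len(seq1), len(seq2)
    # table[i][j] = length of an LCS of seq1[:i] and seq2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq1[i - 1] == seq2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


print(lcs_table('HUMAN', 'CHIMPANZEE'))  # ['H', 'M', 'A', 'N']
```

There is no recursion, so the depth error goes away. The table still has (len(seq1)+1) × (len(seq2)+1) entries, though. For two 10000-base sequences that is about 10^8 cells, which is heavy in pure Python.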

DNA Sequence Alignment
Needleman-Wunsch DP Algorithm:
- Python package: http://pypi.python.org/pypi/nwalign
- On-line example: http://alggen.lsi.upc.es/docencia/ember/frame-ember.html
- Code: needleman_wunsch_algorithm.py
- Lecture videos from TAU: http://video.tau.ac.il/index.php?option=com_videos&view=video&id=4168&Itemid=53
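For orientation, here is a minimal Python 3 sketch of the Needleman-Wunsch recurrence. It computes only the score, with no traceback. The name nw_score and the scores (match +1, mismatch -1, gap -1) are illustrative assumptions. This is neither the nwalign package's API nor the course's needleman_wunsch_algorithm.py. It fills a table just like the LCS table, but every cell also considers aligning a base against a gap.

```python
def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment score by Needleman-Wunsch (scoring values are assumptions)."""
    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # seq1[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap            # seq2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align the two bases
                              score[i - 1][j] + gap,       # gap in seq2
                              score[i][j - 1] + gap)       # gap in seq1
    return score[n][m]


print(nw_score('AGC', 'AGC'))   # 3
print(nw_score('AGC', 'AC'))    # 1  (A-A, G-gap, C-C)
print(nw_score('AGC', 'ATGC'))  # 2  (three matches, one gap)
```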