class 31: deanonymizing
DESCRIPTION
Python DictionariesPlagarism Detection(Pre-)History of Object-Oriented ProgrammingTRANSCRIPT
![Page 1: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/1.jpg)
Class 31:Deanonymizing
cs1120 Fall 2011David Evans4 November 2011
![Page 2: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/2.jpg)
2
Plan
Dictionaries in PythonHistory of Object-Oriented Programming
![Page 3: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/3.jpg)
Python Dictionary
Dictionary abstraction provides a lookup table. Each entry in a dictionary is a
<key, value>pair. The key must be an immutable object. The value can be anything.
dictionary[key] evaluates to the value associated with key. Running time is approximately constant! Recall: Class 9 on consistent hashing. Python
dictionaries use (inconsistent) hashing to make lookups nearly constant time.
![Page 4: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/4.jpg)
Dictionary Example>>> d = {}>>> d['UVa'] = 1818>>> d['UVa'] = 1819>>> d['Cambridge'] = 1209>>> d['UVa']1819>>> d['Oxford']
Traceback (most recent call last): File "<pyshell#93>", line 1, in <module> d['Oxford']KeyError: 'Oxford'
Create a new, empty dictionary
Add an entry: key ‘UVa’, value 1818
Update the value: key ‘UVa’, value 1819
![Page 5: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/5.jpg)
![Page 6: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/6.jpg)
6
![Page 7: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/7.jpg)
HistogrammingDefine a procedure histogram that takes a text string as its input, and returns a dictionary that maps each word in the input text to the number of occurrences in the text.
Useful string method: split([separator])outputs a list of the words in the string
>>> 'here we go'.split() ['here', 'we', 'go']>>> "Simula, Nygaard and Dahl, Norway, 1962".split(",")['Simula', ' Nygaard and Dahl', ' Norway', ' 1962']
![Page 8: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/8.jpg)
8
def histogram(text): d = {} words = text.split()
>>> histogram(""""Mathematicians stand on each others' shoulders and computer scientists stand on each others' toes."Richard Hamming"""){'and': 1, 'on': 2, 'shoulders': 1, 'computer': 1, 'Richard': 1, 'scientists': 1, "others'": 2, 'stand': 2, 'Hamming': 1, 'each': 2, '"Mathematicians': 1, 'toes."': 1}
![Page 9: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/9.jpg)
def histogram(text): d = {} words = text.split() for w in words: if w in d: d[w] = d[w] + 1 else: d[w] = 1 return d
>>> declaration = urllib.urlopen('http://www.cs.virginia.edu/cs11
20/readings/declaration.html').read() >>> histogram(declaration){'government,': 1, 'all': 11, 'forbidden': 1, '</title>': 1, '1776</b>': 1, 'hath': 1, 'Caesar': 1, 'invariably': 1, 'settlement': 1, 'Lee,': 2, 'causes': 1, 'whose': 2, 'hold': 3, 'duty,': 1, 'ages,': 2, 'Object': 1, 'suspending': 1, 'to': 66, 'present': 1, 'Providence,': 1, 'under': 1, '<dd>For': 9, 'should.': 1, 'sent': 1, 'Stone,': 1, 'paralleled': 1, …
![Page 10: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/10.jpg)
10
Sorting the Histogramsorted(collection, cmp)Returns a new sorted list of the elements in collection ordered by cmp.
cmp specifies a comparison function of two arguments which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument.
>>> sorted([1,5,3,2,4],<)SyntaxError: invalid syntax>>> <SyntaxError: invalid syntax>>> sorted([1,5,3,2,4], lambda a, b: a > b)[1, 5, 3, 2, 4]>>> sorted([1,5,3,2,4], lambda a, b: a - b)[1, 2, 3, 4, 5]
Expression ::= lambda Parameters : Expression
Makes a procedure, just like Scheme’s lambda (instead of listing parameters in (), separate with :)
![Page 11: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/11.jpg)
Showing the Histogram
def show_histogram(d): keys = d.keys() okeys = sorted(keys,
for k in okeys: print str(k) + ": " + str(d[k])
![Page 12: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/12.jpg)
Showing the Histogram
def show_histogram(d): keys = d.keys() okeys = sorted(keys,
lambda k1, k2: d[k2] - d[k1]) for k in okeys: print str(k) + ": " + str(d[k])
![Page 13: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/13.jpg)
Author Fingerprinting(aka Plagarism Detection)
“The program identifies phrases of three words or more in an author’s known work and searches for them in unattributed plays. In tests where authors are known to be different, there are up to 20 matches because some phrases are in common usage. When Edward III was tested against Shakespeare’s works published before 1596 there were 200 matches.”
The Times, 12 October 2009
![Page 14: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/14.jpg)
def phrase_collector(text, plen): d = {} words = text.split() words = map(lambda s: s.lower(), words) for windex in range(0, len(words) - plen): phrase = tuple(words[windex:windex+plen]) if phrase in d: d[phrase] = d[phrase] + 1 else: d[phrase]= 1 return d
Dictionary keys must beimmutable: convert the (mutable) list to an immutable tuple.
def histogram(text): d = {} words = text.split() for w in words: if w in d: d[w] = d[w] + 1 else: d[w] = 1 return d
![Page 15: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/15.jpg)
def common_phrases(d1, d2): keys = d1.keys() common = {} for k in keys: if k in d2: common[k] = (d1[k], d2[k]) return common
>>> ptj = phrase_collector(declaration, 3)>>> ptj{('samuel', 'adams,', 'john'): 1, ('to', 'pass', 'others'): 1, ('absolute', 'despotism,', 'it'): 1, ('a', 'firm', 'reliance'): 1, ('with', 'his', 'measures.'): 1, ('are', 'his.', '<p>'): 1, ('the', 'ruler', 'of'): 1, …>>> pde = phrase_collector(myhomepage, 3)>>> common_phrases(ptj, pde){('from', 'the', '<a'): (1, 1)}
myhomepage = urllib.urlopen('http://www.cs.virginia.edu/evans/index.html').read()declaration = urllib.urlopen('http://www.cs.virginia.edu/cs1120/readings/declaration.html').read()
![Page 16: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/16.jpg)
>>> pde = phrase_collector(myhomepage, 2)>>> ptj = phrase_collector(declaration, 2)>>> show_phrases(common_phrases(ptj, pde))('of', 'the'): (12, 5)('in', 'the'): (7, 7)('the', '<a'): (1, 10)('for', 'the'): (6, 5)('to', 'the'): (7, 3)('from', 'the'): (3, 5)('to', 'be'): (6, 1)('<p>', 'i'): (1, 6)('on', 'the'): (5, 2)('we', 'have'): (5, 1)('of', 'their'): (4, 1)('and', 'the'): (3, 1)('to', 'provide'): (1, 3)('of', 'a'): (2, 2)('the', 'state'): (2, 2)('by', 'their'): (3, 1)('the', 'same'): (2, 1)…
![Page 17: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/17.jpg)
History ofObject-Oriented
Programming
![Page 18: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/18.jpg)
Computing in World War II
Cryptanalysis (Lorenz: Collossus at Bletchley Park, Enigma: Bombes at Bletchley, NCR in US)
Ballistics Tables, calculations for Hydrogen bomb (ENIAC at U. Pennsylvania)
Batch processing: submit a program and its data, wait your turn, get a result
Building a flight simulator required a different type of computing: interactive computing
![Page 19: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/19.jpg)
Pre-History:MIT’s Project Whirlwind (1947-1960s)
Jay Forrester
![Page 20: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/20.jpg)
Whirlwind Innovations
Magnetic Core Memory(first version used vacuum tubes) IBM 704 (used by John McCarthy to
create LISP) commercialized this
![Page 21: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/21.jpg)
August 29, 1949:
First Soviet Atomic Test
![Page 22: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/22.jpg)
Short or Endless Golden Age of Nuclear Weapons?
0
10000
20000
30000
40000
50000
60000
1940 1950 1960 1970 1980 1990 2000 2010 2020
Hiroshima (12kt), Nagasaki (20kt)
First H-Bomb (10Mt)
Tsar Bomba (50 Mt, largest ever = 10x all of WWII)
B83 (1.2Mt), largestin currently active arsenal
kilo
tons
![Page 23: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/23.jpg)
Semi-Automatic Ground Environment (SAGE)MIT/IBM, 1950-1982Coordinate radar
stations in real-time to track incoming bombers
Total cost: $55B (more than Manhattan Project)
![Page 24: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/24.jpg)
R-7 Semyorka
First intercontinental ballistic missileFirst successful test: August 21, 1957
Sputnik: launched by R-7, October 4, 1957
![Page 25: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/25.jpg)
SketchpadIvan Sutherland’s 1963 PhD thesis
(supervised by Claude Shannon)
Interactive drawing programLight pen
![Page 26: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/26.jpg)
Components in Sketchpad
![Page 27: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/27.jpg)
Objects in SketchpadIn the process of making the Sketchpad system operate, a few very general functions were developed which make no reference at all to the specific types of entities on which they operate. These general functions give the Sketchpad system the ability to operate on a wide range of problems. The motivation for making the functions as general as possible came from the desire to get as much result as possible from the programming effort involved. For example, the general function for expanding instances makes it possible for Sketchpad to handle any fixed geometry subpicture. The rewards that come from implementing general functions are so great that the author has become reluctant to write any programs for specific jobs. Each of the general functions implemented in the Sketchpad system abstracts, in some sense, some common property of pictures independent of the specific subject matter of the pictures themselves.
Ivan Sutherland, Sketchpad: a Man-Machine Graphical Communication System, 1963
![Page 28: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/28.jpg)
Simula• Considered the first
“object-oriented” programming language
• Language designed for simulation by Kristen Nygaard and Ole-Johan Dahl (Norway, 1962)
• Had special syntax for defining classes that packages state and procedures together
![Page 29: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/29.jpg)
Counter in Simula
class counter; integer count; begin procedure reset(); count := 0; end; procedure next(); count := count + 1; end; integer procedure current(); current := count; end; end
Does this have everything we need for “object-oriented programming”?
![Page 30: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/30.jpg)
Object-Oriented Programming
Object-Oriented Programming is a state of mind where you program by thinking about objects
It is difficult to reach that state of mind if your language doesn’t have mechanisms for packaging state and procedures (Python has class, Scheme has lambda expressions)
Other things can help: dynamic dispatch, inheritance, automatic memory management, mixins, good donuts, etc.
![Page 31: Class 31: Deanonymizing](https://reader033.vdocuments.mx/reader033/viewer/2022051817/547c6b01b379594e2b8b4f9b/html5/thumbnails/31.jpg)
31
Charge
Monday: continue OOP historyPS6 Due MondayNext week:
PS7 Due next Monday: only one week!building a (mini) Scheme interpreter
(in Python and Java)
Reminder: Peter has office hours now! (going over to Rice)