Bahman Bahmani (bahman@stanford.edu)
2
Outline
Password Security [Schechter et al. ’10]
Semantic Analytics [Goyal et al. ’11]
Reputation Systems [Bahmani et al. ’11]
Conclusion
4
Password selection policies
Length of 8 to 20
Both letters and numbers
Both lower and upper case letters
Non-alphanumeric characters
A number between first and last character
Not your dog’s name
…
Oh, by the way, change it once a month!
5
Unintended consequences
| Rule | Consequence |
| --- | --- |
| Require minimum length | Use dictionary words, write down passwords |
| Include special characters | E→3, a→@, … |
| No simple character replacements | #{lb, hash}, ^{hat, top}, ... |
6
Strong password = security?
7
Why all these rules then?
Statistical guessing attacks
8
Why not just measure popularity?!
Popularity oracle: Map passwords to counts
If a password is popular, prompt the user to change it
Can limit an attack to 0.0001% rather than 0.22% (MySpace) or 0.9% (RockYou)
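
A minimal sketch of such a popularity oracle, assuming an exact in-memory table. The class name `PopularityOracle`, its methods, and the `max_share` threshold are illustrative choices, not taken from the talk.

```python
from collections import Counter


class PopularityOracle:
    """Exact password -> count table (hypothetical illustration)."""

    def __init__(self, max_share):
        self.counts = Counter()     # note: keeps the passwords themselves
        self.total = 0              # number of passwords registered so far
        self.max_share = max_share  # largest share of users allowed per password

    def register(self, password):
        self.counts[password] += 1
        self.total += 1

    def is_too_popular(self, password):
        # Share of users that would hold `password` if this user chose it too.
        share = (self.counts[password] + 1) / (self.total + 1)
        return share > self.max_share


oracle = PopularityOracle(max_share=0.25)
for pw in ["123456", "123456", "letmein", "x9!Qd#", "tr0ub4dor"]:
    oracle.register(pw)
```

A site would call `is_too_popular` when a user picks a password and ask for a different one when it returns `True`.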
9
What is wrong with this oracle?
Allows no salting
If compromised, attack is optimized!
10
Requirements for a good oracle
Keep counts without keeping passwords
Quick updates
Quick queries
11
Candidate Magic oracle
[Figure: a d × w array of counters (d rows, w columns), all initialized to 0]
12
CM oracle
[Figure: the d × w array of counters, all still 0]
13
CM oracle
[Figure: a password is inserted; one counter in each row goes from 0 to 1]
16
CM oracle
[Figure: a second password is inserted; one more counter in each row goes from 0 to 1]
18
CM oracle: how about collisions?
[Figure: the d × w counter array after two insertions]
19
CM oracle: don’t care!
20
CM oracle
[Figure: a third password is inserted; where it hashes to a counter already in use, that counter goes from 1 to 2 (=0+1+1)]
23
CM oracle
[Figure: a fourth insertion; shared counters keep accumulating, one of them reaching 3 (=0+1+1+1)]
24
CM oracle
2 0 . . . 0 2 0
0 3 . . . 1 0 0
. . .
2 1 . . . 1 0 0
(d rows, w columns)
25
CM oracle query: Minimum counter
[Figure: the same d × w counter array; a query reads the password's counter in each row and takes the minimum]
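
The update and query steps walked through on the preceding slides can be sketched as follows. This is an illustrative version, not the talk's implementation: the class and method names are made up, and salting BLAKE2b with the row index is only a stand-in for the family of hash functions that the formal analysis assumes.

```python
import hashlib


class CountMinSketch:
    """d rows of w counters; stores counts but never the items themselves."""

    def __init__(self, w, d):
        self.w = w
        self.d = d
        self.table = [[0] * w for _ in range(d)]

    def _column(self, row, item):
        # One hash function per row, obtained here by salting with the row index.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.w

    def add(self, item, count=1):
        # Update: increment one counter in every row.
        for row in range(self.d):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Query: collisions can only inflate a counter, so take the minimum.
        return min(
            self.table[row][self._column(row, item)] for row in range(self.d)
        )


cm = CountMinSketch(w=1000, d=5)
for pw in ["123456", "123456", "123456", "letmein"]:
    cm.add(pw)
```

Because counters are only ever incremented, `estimate` never undercounts: it returns the true count plus whatever collisions happen to hit all d rows at once.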
26
CM oracle: Theorem
Choosing d,w “properly” leads to “tiny” errors in frequencies with “very large” probability
Formally, at most ε error with probability 1-δ:
$$w = \lceil e/\varepsilon \rceil, \qquad d = \lceil \ln(1/\delta) \rceil$$
27
CM oracle: Example
With w=270,000 and d=14, the error in frequencies is less than $10^{-5}$ = 0.00001 with probability $1-10^{-6}$ = 0.999999!
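
As a quick check of these numbers, the sizing formulas can be evaluated directly. The helper name `cm_dimensions` is hypothetical; the formulas are the ones stated in the theorem, w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉.

```python
import math


def cm_dimensions(epsilon, delta):
    """Return (w, d) with w = ceil(e / epsilon) and d = ceil(ln(1 / delta))."""
    w = math.ceil(math.e / epsilon)
    d = math.ceil(math.log(1.0 / delta))
    return w, d


w, d = cm_dimensions(epsilon=1e-5, delta=1e-6)
total_counters = w * d
```

With ε = 10⁻⁵ and δ = 10⁻⁶ this gives d = 14 and a width of roughly 272,000, close to the rounded w = 270,000 quoted above, for a total of about 3.8 million counters.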
28
CM oracle: Magic
Guarantee independent of number of passwords
Example: Fit (approximate) counts of 100M passwords in fewer than 4M counters!
29
What if CM oracle is stolen?
Choose d and w small enough to ensure a minimum false positive rate!
Trouble users just a little bit, but confound attackers
30
CM oracle: a sketch
Small memory: remember only what matters
Quick updates
Quick queries
That’s the definition of a sketch
31
Simple examples
Stream of numbers $a_1, a_2, \ldots, a_t, \ldots$
SUM sketch: running sum
AVG sketch: (running sum, count)
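
Both of these simple sketches fit in a few lines. The class name `AvgSketch` and its methods are illustrative; the state is exactly the (running sum, count) pair named above.

```python
class AvgSketch:
    """Keeps only (running sum, count); the stream itself is never stored."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, a_t):
        # Quick update: constant work per arriving number.
        self.total += a_t
        self.count += 1

    def sum(self):
        # The SUM sketch is just the first component.
        return self.total

    def avg(self):
        # Quick query: one division, regardless of stream length.
        return self.total / self.count if self.count else 0.0


sketch = AvgSketch()
for a_t in [2, 4, 9]:
    sketch.update(a_t)
```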
32
Cognitive Analogy
Stream of sensory observations
Remember only parts of observations
Still function properly
Everyone is doing it! [Muthukrishnan, 2005]
33
Outline
Password Security [Schechter et al. ’10]
Semantic Analytics [Goyal et al. ’11]
Reputation Systems [Bahmani et al. ’11]
Conclusion
34
Example: Sentiment Analysis
Is a word used more in a positive or a negative sense?
35
Problem: Positive or negative?
***nice****myPhone***
myPhone**great*
**myPhone***
**excellent**myPhone***
** bad **** **myPhone **
*myPhone*****terrible
myPhone**good*
36
Solution: Co-occurrence counts
myPhone and words good, great, nice, ...
myPhone and words bad, awful, terrible, …
37
Co-occurrence counts applications
Statistical machine translation
Spelling correction
Part-of-speech tagging
Paraphrasing
Word sense disambiguation
Language modeling
Speech and character recognition
…
38
Co-occurrence counts task
Large corpus of documents: tweet stream, Web corpus
Vocabulary $\{w_1, w_2, \ldots, w_N\}$
English language: $N \approx 10^5$
Web: $N \approx 10^9$
Goal: For any two words in the vocabulary, compute the number of documents containing both
39
Problem: Too many unique pairs
Example [Goyal et al., 2010]:
78M word corpus of size 577MB
63K unique words
118M unique word pairs, 2GB just to store them
40
It gets worse with larger corpus size
41
Solution 1: Just Hadoop it!
Compute all co-occurrence counts exactly
Ref. [“Data-Intensive Text Processing with MapReduce”, Lin et al.]
Problem: Too inefficient
42
Solution 2: CM sketch
Use a CM sketch to track the counts of word pairs
43
Example
[Figure: a d × w array of counters, all initialized to 0]
44
Example
How do you shoot a yellow elephant?
[Figure: the all-zero d × w counter array; first word pair to insert: (shoot, yellow)]
45
Example
How do you shoot a yellow elephant?
[Figure: after inserting (shoot, yellow), one counter in each row reads 1; next pair: (shoot, elephant)]
46
Example
How do you shoot a yellow elephant?
[Figure: after inserting (shoot, elephant); in the last row shown the two pairs share a counter, which now reads 2; next pair: (yellow, elephant)]
47
Example
How do you shoot a yellow elephant?
[Figure: the counter array after all three pairs, (shoot, yellow), (shoot, elephant), and (yellow, elephant), have been inserted]
48
Back to sentiment analysis
Query the CM sketch with the pairs
(myPhone, good)
(myPhone, nice)
(myPhone, bad)
(myPhone, terrible)
…
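
A rough sketch of this pipeline, under several assumptions not in the talk: documents are split on whitespace, each unordered pair of distinct words counts once per document, a CRC32 of the row-prefixed key stands in for the per-row hash functions, and the function names and sketch dimensions are arbitrary.

```python
import zlib
from itertools import combinations

W, D = 2048, 4                       # sketch width and depth (arbitrary here)
table = [[0] * W for _ in range(D)]


def _col(row, key):
    # Stand-in hash family: CRC32 of the key prefixed with the row number.
    return zlib.crc32(f"{row}:{key}".encode("utf-8")) % W


def add_document(text):
    """Insert every unordered pair of distinct words in the document."""
    words = sorted(set(text.lower().split()))
    for a, b in combinations(words, 2):
        for row in range(D):
            table[row][_col(row, f"{a}|{b}")] += 1


def pair_count(a, b):
    """Estimated number of documents containing both words (never too low)."""
    a, b = sorted((a.lower(), b.lower()))
    return min(table[row][_col(row, f"{a}|{b}")] for row in range(D))


def sentiment(target, positive, negative):
    # Positive minus negative co-occurrence mass for the target word.
    pos = sum(pair_count(target, word) for word in positive)
    neg = sum(pair_count(target, word) for word in negative)
    return pos - neg


for doc in ["nice screen on myPhone", "myPhone is great",
            "myPhone battery is terrible", "good myPhone"]:
    add_document(doc)

score = sentiment("myPhone", ["good", "great", "nice"], ["bad", "terrible"])
```

The sketch stores only counters, never the word pairs, so memory stays at W × D integers however many distinct pairs the corpus contains.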
49
CM sketch: Gain
Does not store the word pairs themselves
30X less space (37GB corpus, almost no error) [Goyal et al., 2010]
50
Outline
Password Security [Schechter et al. ’10]
Semantic Analytics [Goyal et al. ’11]
Reputation Systems [Bahmani et al. ’11]
Conclusion
51
Motivation
52
PageRank
Well-known reputation system [Page et al., 1998]
Treats each link as an endorsement
A node is highly reputed if endorsed by many other such nodes
53
Goal: Computing PageRank on the fly
Network edges arrive over time: friendships, social events
Maintain an accurate estimate of PageRank of every node after each edge arrival
54
Random surfer interpretation
A random surfer traverses the network
Teleports to a completely random node with some probability ε (e.g., ε=0.2) at each step
Follows a random link otherwise
PageRank: stationary distribution of this walk
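
The random surfer described above can be simulated directly. This is a small illustration: the function name, the four-node example graph, and the choice to teleport at nodes without out-links are assumptions made for the example.

```python
import random


def simulate_surfer(out_links, steps, eps=0.2, seed=0):
    """Return the fraction of steps the surfer spends at each node."""
    rng = random.Random(seed)
    nodes = list(out_links)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        links = out_links[current]
        if rng.random() < eps or not links:
            current = rng.choice(nodes)   # teleport (also used at dead ends)
        else:
            current = rng.choice(links)   # follow a random out-link
    return {node: count / steps for node, count in visits.items()}


# Hypothetical graph: node -> list of nodes it links to.
graph = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
shares = simulate_surfer(graph, steps=20000)
```

Over a long run the visit fractions approach the stationary distribution, so a node that many others link to (node 3 here) ends up with a much larger share than a node nobody links to (node 4).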
55
Example: Random surfer
[Figure: an example network with nodes numbered 1 to 11]
61
PageRank computation methods
Power Iteration: Iterative linear algebraic method.
Monte Carlo: Simulate the PageRank walk. Use the empirical distribution to approximate PageRank.
Neither can be done efficiently on the fly
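
For reference, a minimal power-iteration sketch on a static graph. The function name, the example graph, and the treatment of nodes without out-links (they spread their rank uniformly) are assumptions for illustration. Each iteration touches every edge, which is why rerunning it after every arrival is costly.

```python
def pagerank_power_iteration(out_links, eps=0.2, iterations=100):
    """Repeatedly apply the random-surfer transition to a rank vector."""
    nodes = list(out_links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(nodes, eps / n)   # teleportation mass
        for u in nodes:
            targets = out_links[u] or nodes        # dead ends jump uniformly
            share = (1.0 - eps) * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical graph: node -> list of nodes it links to.
graph = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
rank = pagerank_power_iteration(graph)
```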
62
PageRank sketch
Store R random walks starting at each node
Whenever a new edge arrives, modify only the random walks needing an update
New edge (u, v)
Only walks passing through u
Each with probability 1/degree(u)
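
One way to picture this update rule is the simplified sketch below. It is not the authors' implementation: the names, the value of `R`, and the example graph are hypothetical, both endpoints of the new edge are assumed to exist already, and walks that merely ended at u are left untouched.

```python
import random

EPS = 0.2   # reset probability, matching the random-surfer example value
R = 3       # walks stored per node (hypothetical value)
rng = random.Random(0)


def extend(walk, out_links):
    """Continue `walk` until a reset (prob. EPS) or a node without out-links."""
    while out_links.get(walk[-1]) and rng.random() >= EPS:
        walk.append(rng.choice(out_links[walk[-1]]))
    return walk


def build_walks(out_links):
    """Store R random walks starting at each node."""
    return [extend([node], out_links) for node in out_links for _ in range(R)]


def add_edge(u, v, out_links, walks):
    """Insert edge (u, v) and repair only the walks that pass through u."""
    out_links[u].append(v)
    degree = len(out_links[u])
    touched = 0
    for walk in walks:
        for i in range(len(walk) - 1):          # steps taken out of walk[i]
            if walk[i] == u and rng.random() < 1.0 / degree:
                del walk[i + 1:]                # discard the old continuation
                walk.append(v)                  # take the new edge instead
                extend(walk, out_links)
                touched += 1
                break
    return touched


out_links = {1: [2], 2: [3], 3: [1]}
walks = build_walks(out_links)
touched = add_edge(1, 3, out_links, walks)
```

Walks that never visit u are not examined beyond a scan here; a real system would presumably index walks by the nodes they visit so that untouched walks cost nothing.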
63
Example
| Walk | Node 1 | Node 2 | Node 3 |
| --- | --- | --- | --- |
| 1 | 12123212 | 2 | 323232 |
| 2 | 123211123232 | 2112321112323 | 32 |
| 3 | 11 | 23 | 3232321 |
| 4 | 1111 | 2323211112321 | 32323 |
| 5 | 1121111 | 2 | 3212321232321 |
| 6 | 12323 | 2323212 | 3 |
| 7 | 1 | 2111 | 3232121112321 |
| 8 | 12123 | 232121112 | 3212 |
| 9 | 11 | 2 | 3 |
| 10 | 111212111232 | 211121121 | 321121 |

[Figure: example network on nodes 1, 2, 3]
64
Example
| Walk | Node 1 | Node 2 | Node 3 |
| --- | --- | --- | --- |
| 1 | 13212 | 2 | 323232 |
| 2 | 1321321 | 21232321 | 32 |
| 3 | 11111 | 23 | 3232321 |
| 4 | 13 | 23 | 32323 |
| 5 | 113213211321 | 2 | 321232323 |
| 6 | 12323 | 2323212 | 3 |
| 7 | 1 | 232 | 3232121112321 |
| 8 | 1 | 232121112 | 32 |
| 9 | 1323 | 2 | 3 |
| 10 | 1321 | 2 | 321121 |

[Figure: example network on nodes 1, 2, 3]
65
Key Insight
Most edges miss most random walks!
Even more pronounced as network grows larger.
70
PageRank sketch: Theorem
As the network grows, the marginal number of operations per update decreases!
Theorem: Given random arrivals, if $M_t$ is the update work at time $t$, then
$$E[M_t] \le \frac{RN}{\varepsilon^2 t}$$
71
Outline
Password Security [Schechter et al. ’10]
Semantic Analytics [Goyal et al. ’11]
Reputation Systems [Bahmani et al. ’11]
Conclusion
72
Sketching: Why Care?
Different view of big data analysis
Nimble and on the fly, compared to bulky and inefficient
Direct reduction in data infrastructure costs, both CAPEX and OPEX
73
Sketching: How about errors?
Mathematical guarantees behind rates and sizes of errors
If you cannot make a decision based on an analytics result that has less than 0.0001% error with probability 0.99999, then you most likely should not make that decision!
74
Sketching: What’s next?
Lots of applications: Security, Social media analytics, Recommendation systems, Sensor networks, Intelligent mobile applications
The math and algorithms are there
Needed:
Technologists: build systems with sketching techniques
Entrepreneurs: build products with these techniques
Big business leaders: learn about, adopt, and benefit from these techniques
76
Appendix: Photo Credits
Slide 4: http://www.the-games-blog.com/and-the-cat-and-mouse-game-continues/
Slide 6: http://www.security-faqs.com/what-exactly-is-a-dictionary-attack.html
Slide 7: http://krepon.armscontrolwonk.com/archive/3182/forecasting-proliferation/crystalball-2
Slide 8: http://www.hdwallpaperspics.com/crystal-ball-wallpapers.html
Slides 9, 27, 41, 48: http://lissarankin.com/do-you-expect-people-to-read-your-mind
Slide 18: http://ouroregon.org/category/content-authors/alina-harway?page=2
Slide 31: http://sciencesoup.tumblr.com/post/39608896216/learning-foreign-languages-triggers-brain
Slide 33: http://livingqlikview.blogspot.com/2012/03/my-sentiments-on-sentiment-analysis.html
Slide 34: http://www.presentermedia.com/index.php?target=closeup&maincat=clipart&id=2221
Slide 40: http://www.clker.com/clipart-yellow-elephant.html
Slide 51: http://en.wikipedia.org/wiki/PageRank