thank you!christos/talks/10-kdd-award/faloutsos10ia.pdf · • ricardo baeza-yates (yahoo!...
TRANSCRIPT
![Page 1: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/1.jpg)
CMU SCS
Thank you!
C. Faloutsos CMU
![Page 2: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/2.jpg)
CMU SCS
Large Graph Mining
C. Faloutsos CMU
![Page 3: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/3.jpg)
CMU SCS
Large Graph Mining Data Mining for fun (and profit)
C. Faloutsos CMU
![Page 4: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/4.jpg)
CMU SCS
KDD'10 C. Faloutsos 4
Outline
• Credit where credit is due • Technical part – Data mining
– Can it be automated? – Research challenges
• Non-technical part: `Listen’ – To the data – To non-experts
![Page 5: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/5.jpg)
CMU SCS
Nominator
• Jian Pei
KDD'10 C. Faloutsos 5
![Page 6: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/6.jpg)
CMU SCS
Endorsers
• Charu C. Aggarwal (IBM Research) • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University of Alberta) • Yixin Chen (Washington University at St. Louis)
KDD'10 C. Faloutsos 6
![Page 7: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/7.jpg)
CMU SCS
Endorsers, cont’d • William Cohen (Carnegie Mellon University) • Diane J. Cook (Washington State University) • Gautam Das (University of Texas at Arlington) • Inderjit S. Dhillon (University of Texas at Austin) • Chris H. Q. Ding (University of Texas at
Arlington)
KDD'10 C. Faloutsos 7
![Page 8: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/8.jpg)
CMU SCS
Endorsers, cont’d • Petros Drineas (Rensselaer Polytechnic Institute) • Tina Eliassi-Rad (Lawrence Livermore National
Laboratory) • Greg Ganger (Carnegie Mellon University) • Minos Garofalakis (Technical University of Crete) • James Garrett (Carnegie Mellon University)
KDD'10 C. Faloutsos 8
![Page 9: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/9.jpg)
CMU SCS
Endorsers, cont’d • Dimitrios Gunopulos (University of Athens) • Xiaofei He (Zhejiang University) • Panagiotis G. Ipeirotis (New York University) • Eamonn Keogh (UCR) • Hiroyuki Kitagawa (University of Tsukuba) • Tamara Kolda (Sandia Nat. Labs)
KDD'10 C. Faloutsos 9
![Page 10: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/10.jpg)
CMU SCS
Endorsers, cont’d • Flip Korn (AT&T Research) • Nick Koudas (University of Toronto) • Hans-Peter Kriegel • Ravi Kumar (Yahoo! Research) • Laks Lakshmanan (UBC) • Jure Leskovec (Stanford University)
KDD'10 C. Faloutsos 10
![Page 11: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/11.jpg)
CMU SCS
Endorsers, cont’d • Nikos Mamoulis (Hong Kong University) • Heikki Manilla (Aalto University, • Dharmendra S. Modha (IBM Research) • Mario Nascimento (University of Alberta) • Jennifer Neville (Purdue University) • Beng Chin Ooi (National University of Singapore)
KDD'10 C. Faloutsos 11
![Page 12: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/12.jpg)
CMU SCS
Endorsers, cont’d • Dimitris Papadias (Hong Kong University of
Science and Technology) • Spiros Papadimitriou (IBM Research) • Jian Pei (Simon Fraser University) • Foster Provost (New York University) • Oliver Schulte (Simon Fraser University) • Dennis Shasha (New York University) • Srinivasan Parthasarathy (OSU)
KDD'10 C. Faloutsos 12
![Page 13: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/13.jpg)
CMU SCS
Endorsers, cont’d • Jimeng Sun (IBM Research) • Dacheng Tao (Nanyang University of Technology) • Yufei Tao (The Chinese University of Hong Kong) • Evimaria Terzi (Boston University) • Alex Thomo (University of Victoria) • Andrew Tomkins (Google Research)
KDD'10 C. Faloutsos 13
![Page 14: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/14.jpg)
CMU SCS
Endorsers, cont’d • Caetano Traina (University of Sao Paulo) • Vassilis Tsotras (University of California, Riverside) • Alex Tuzhilin (New York University) • Haixun Wang (Microsoft Research)
KDD'10 C. Faloutsos 14
![Page 15: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/15.jpg)
CMU SCS
Endorsers, cont’d
• Wei Wang (University of North Carolina at Chapel Hill)
• Philip S. Yu (University of Illinois, Chicago) • Zhongfei Zhang (Binghamton University, State
University of New York)
KDD'10 C. Faloutsos 15
![Page 16: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/16.jpg)
CMU SCS
KDD committee
• Ramasamy Uthurusamy, Chair • Robert Grossman (University of Illinois at
Chicago) • Jiawei Han (University of Illinois at
Urbana-Champaign) • Tom Mitchell (Carnegie Mellon University) • Gregory Piatetsky-Shapiro (KDnuggets)
KDD'10 C. Faloutsos 16
![Page 17: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/17.jpg)
CMU SCS
KDD committee cnt’d
• Raghu Ramakrishnan (Yahoo! Research) • Sunita Sarawagi (Indian Institute of
Technology, Bombay) • Padhraic Smyth (University of California at
Irvine) • Ramakrishnan Srikant (Google Research)
KDD'10 C. Faloutsos 17
![Page 18: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/18.jpg)
CMU SCS
KDD committee cnt’d
• Xindong Wu (University of Vermont) • Mohammed J. Zaki (Rensselaer Polytechnic
Institute)
KDD'10 C. Faloutsos 18
![Page 19: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/19.jpg)
CMU SCS
Family • Parents Nikos & Sophia • Siblings Michalis*, Petros*, Maria
• Wife Christina#
(*) : and co-authors (#) : and research impact evaluator (‘grandpa’ test - see later…)
KDD'10 C. Faloutsos 19
![Page 20: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/20.jpg)
CMU SCS
KDD'10 C. Faloutsos 20
Academic ‘parents’
• Christodoulakis, Stavros (T.U.C.)
• Sevcik, Ken (U of T)
• Roussopoulos, Nick (UMD)
![Page 21: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/21.jpg)
CMU SCS
KDD'10 C. Faloutsos 21
Academic ‘children’ • King-Ip (David) Lin • Ibrahim Kamel • Flip Korn • Byoung-Kee Yi • Leejay Wu • Deepayan Chakrabarti
![Page 22: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/22.jpg)
CMU SCS
KDD'10 C. Faloutsos 22
Academic ‘children’
• Jia-Yu (Tim) Pan • Spiros Papadimitriou • Jimeng Sun • Jure Leskovec • Hanghang Tong
![Page 23: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/23.jpg)
CMU SCS
KDD'10 C. Faloutsos 23
Academic ‘children’
• Mary McGlohon • Fan Guo • Lei Li • Leman Akoglu • Dueng Horng (Polo) Chau • Aditya Prakash • U Kang
![Page 24: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/24.jpg)
CMU SCS
KDD'10 C. Faloutsos 24
CMU colleagues
• Tom Mitchell • Garth Gibson • Greg Ganger • M. (Satya) Satyanarayanan • Howard Wactlar • Jeannette Wing • + +
![Page 25: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/25.jpg)
CMU SCS
KDD'10 C. Faloutsos 25
Co-authors
• [dblp 7/2010:] All 300 of you
• Agma J. M. Traina (22) • Caetano Traina Jr. (20) • …
![Page 26: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/26.jpg)
CMU SCS
Funding agencies
• NSF (Maria Zemankova, Frank Olken, ++) • DARPA, LLNL, PITA • IBM, MS, HP, INTEL, Y!, Google,
Symantec, Sony, Fujitsu, …
KDD'10 C. Faloutsos 26
![Page 27: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/27.jpg)
CMU SCS
KDD'10 C. Faloutsos 27
Outline
• Credit where credit is due • Technical part – Data mining
– Can it be automated? – Research challenges
• Non-technical part: `Listen’ – To the data – To non-experts
![Page 28: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/28.jpg)
CMU SCS
Data mining = compression & …
KDD'10 C. Faloutsos 28
Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)
![Page 29: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/29.jpg)
CMU SCS
Data mining = compression & …
KDD'10 C. Faloutsos 29
Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)
![Page 30: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/30.jpg)
CMU SCS
Data mining = compression & …
KDD'10 C. Faloutsos 30
But: how can compression • do forecasting? • spot outliers?
• do classification?
![Page 31: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/31.jpg)
CMU SCS
Data mining = compression & …
KDD'10 C. Faloutsos 31
OK – then, isn’t compression a solved problem (gzip, LZ)?
![Page 32: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/32.jpg)
CMU SCS
… compression is undecidable!
Theorem*: for an arbitrary string x, computing its Kolmogorov complexity K(x) is undecidable
KDD'10 C. Faloutsos 32 (*) E.g., [T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons,1991, section 7.7]
A.N. Kolmogorov
EVEN WORSE than NP-hard!
![Page 33: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/33.jpg)
CMU SCS
… compression is undecidable!
…which means there will always be better data mining tools/models/patterns to be discovered
-> job security -> job satisfaction
KDD'10 C. Faloutsos 33
![Page 34: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/34.jpg)
CMU SCS
Let’s see some examples of models
KDD'10 C. Faloutsos 34 Body weight
Response to new drug
![Page 35: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/35.jpg)
CMU SCS
Let’s see some examples of models
KDD'10 C. Faloutsos 35 income
$ spent
![Page 36: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/36.jpg)
CMU SCS
Let’s see some examples of models
KDD'10 C. Faloutsos 36 income
$ spent
![Page 37: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/37.jpg)
CMU SCS
Let’s see some examples of models
KDD'10 C. Faloutsos 37 income
$ spent
![Page 38: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/38.jpg)
CMU SCS
Let’s see some examples of models
KDD'10 C. Faloutsos 38 income
$ spent
3/4
![Page 39: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/39.jpg)
CMU SCS
KDD'10 C. Faloutsos 39
Metabolic rate
3/4
mass
Let’s see some examples of models
http://universe-review.ca /R10-35-metabolic.htm
![Page 40: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/40.jpg)
CMU SCS
KDD'10 C. Faloutsos 40
Metabolic rate
3/4
mass
Kleiberg’s law
http://universe-review.ca /R10-35-metabolic.htm
![Page 41: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/41.jpg)
CMU SCS
KDD'10 C. Faloutsos 41
Outline
• Credit where credit is due • Technical part – Data mining
– Can it be automated? NO! • Always room for better models
– Research challenges
• Non-technical part: `Listen’ – To the data – To non-experts
![Page 42: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/42.jpg)
CMU SCS
Always room for better models
• Eg.: clustering – k-means (or our favorite clustering algo)
• How many clusters are in the Sierpinski triangle?
KDD'10 C. Faloutsos 42
…
![Page 43: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/43.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 43
![Page 44: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/44.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 44
K=3 clusters?
![Page 45: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/45.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 45
K=3 clusters? K=9 clusters?
![Page 46: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/46.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 46
Piece-wise flat
Mixture of (Gaussian)
clusters
![Page 47: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/47.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 47
Piece-wise flat
Mixture of (Gaussian)
clusters
¾ Power law
??
![Page 48: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/48.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 48
Piece-wise flat
Mixture of (Gaussian)
clusters
¾ Power law
ONE, but Self-similar
‘cluster’
![Page 49: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/49.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 49
ONE, but Self-similar
‘cluster’
• Barnsley’s method of IFS (iterated function systems) can easily generate it [Barnsley+Sloan, BYTE, 1988]
~100 lines of C code: www.cs.cmu.edu~/christos/www/SRC/ifs.tar
![Page 50: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/50.jpg)
CMU SCS
Always room for better models
KDD'10 C. Faloutsos 50
• But, does self-similarity appear in real life?
![Page 51: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/51.jpg)
CMU SCS
Real, self similar dataset
KDD'10 C. Faloutsos 51
![Page 52: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/52.jpg)
CMU SCS
Real, self similar dataset
KDD'10 C. Faloutsos 52
![Page 53: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/53.jpg)
CMU SCS
Real, self similar dataset
KDD'10 C. Faloutsos 53
![Page 54: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/54.jpg)
CMU SCS
Real, self similar dataset
KDD'10 C. Faloutsos 54
![Page 55: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/55.jpg)
CMU SCS
KDD'10 C. Faloutsos 55
• the red is true • origin: Norway • but most other coastlines are ‘self-similar’, too!
![Page 56: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/56.jpg)
CMU SCS
How can we find better models?
• Obviously, an art (‘undecidable’!) • Helps if we
– Listen to domain experts and – Listen to the data (next)
KDD'10 C. Faloutsos 56
![Page 57: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/57.jpg)
CMU SCS
KDD'10 C. Faloutsos 57
Outline
• Credit where credit is due • Technical part – Data mining
– Can it be automated? NO! – Research challenges
• Listen to the data (the more, the better!)
• Non-technical part: `Listen’ – To the data – To non-experts
![Page 58: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/58.jpg)
CMU SCS
KDD'10 C. Faloutsos 58
Scalability
• Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]
• Yahoo: ~5Pb of data [Fayyad’07] • ‘M45’: 4K proc’s, 3Tb RAM, 1.5 Pb disk
![Page 59: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/59.jpg)
CMU SCS
Promising research direction: scalability
• challenges – Vast amounts of data; storing; cooling (!); …
• … and opportunities: – DATA: Easier to collect (clickstreams, sensors
etc) – S/W: Hadoop, hbase, pig, … : open source – H/W: 1Tb disk: ~ US$ 100
KDD'10 C. Faloutsos 59
![Page 60: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/60.jpg)
CMU SCS
Promising research direction
• The more data, the more subtle patterns we may discover
• Examples of subtle patterns:
KDD'10 C. Faloutsos 60
![Page 61: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/61.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 61
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
Duration (log scale)
PDF: fraction of customers
(log scale)
![Page 62: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/62.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 62
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
Duration (log scale)
PDF: fraction of customers
(log scale)
(mixture of) Gaussians
![Page 63: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/63.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 63
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
![Page 64: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/64.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 64
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
Zipf (Pareto,
Power-law)
![Page 65: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/65.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 65
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
![Page 66: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/66.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 66
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
lognormal
![Page 67: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/67.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 67
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
![Page 68: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/68.jpg)
CMU SCS
More data, more subtle patterns
KDD'10 C. Faloutsos 68
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604
dPln (=doubly
Pareto Lognormal)
![Page 69: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/69.jpg)
CMU SCS
So, dPln is the answer?
KDD'10 C. Faloutsos 69
![Page 70: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/70.jpg)
CMU SCS
So, dPln is the answer?
KDD'10 C. Faloutsos 70
Yes, for the moment…
![Page 71: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/71.jpg)
CMU SCS
So, dPln is the answer?
KDD'10 C. Faloutsos 71
With more data, who knows?!
![Page 72: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/72.jpg)
CMU SCS
KDD'10 C. Faloutsos 72
Outline
• Credit where credit is due • Technical part – Data mining
– Can it be automated? NO! – Research challenges
• Listen to the data (the more, the better!)
• Non-technical part: ‘Listen’ – To the data – To non-experts
![Page 73: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/73.jpg)
CMU SCS
Listen to non-experts
• Explain ‘why’, to a non-expert (‘grandpa’) • (and, even harder, explain ‘how’ – e.g.:
– Frobenious Perron for irreducible MC
KDD'10 C. Faloutsos 73
![Page 74: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/74.jpg)
CMU SCS
Listen to non-experts
• Explain ‘why’, to a non-expert (‘grandpa’) • (and, even harder, explain ‘how’ – e.g.:
– Frobenious Perron for irreducible MC -> pageRank -> random surfer
KDD'10 C. Faloutsos 74
![Page 75: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/75.jpg)
CMU SCS
Summary • Data mining = compression = undecidable =
job security • Hence: always room for better models/
patterns
– Listen to the data (Gb, Tb and Pb of them!) – Listen to domain experts (e.g., ¾ Kleiberg’s
law) • Listen to non-experts (‘explain to grandpa’) KDD'10 C. Faloutsos 75
![Page 76: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/76.jpg)
CMU SCS
Compression, fun, recursion
• The shortest, recursive joke: • There are 3 types of data miners
KDD'10 C. Faloutsos 76
![Page 77: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/77.jpg)
CMU SCS
Compression, fun, recursion
• The shortest, recursive joke: • There are 3 types of data miners
– Those who can count
KDD'10 C. Faloutsos 77
![Page 78: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/78.jpg)
CMU SCS
Compression, fun, recursion
• The shortest, recursive joke: • There are 3 types of data miners
– Those who can count – And those who can not
KDD'10 C. Faloutsos 78
![Page 79: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University](https://reader033.vdocuments.mx/reader033/viewer/2022050523/5fa6ddde045af646ba0ceb75/html5/thumbnails/79.jpg)
CMU SCS
Thank you! For the honor, and for making this wonderful research community
KDD'10 C. Faloutsos 79