![Page 1: CX4242: Data & Visual Analytics - Visualization… · Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007. Find bad sellers (fraudsters) on eBay who](https://reader034.vdocuments.mx/reader034/viewer/2022042923/5f714706980a4771a809de89/html5/thumbnails/1.jpg)
CX4242: Data & Visual Analytics
Mahdi Roozbahani
Lecturer, Computational Science and Engineering, Georgia Tech
Assignments Overview (tentative and subject to change)
CX4242
Assignment 1
Platforms, Languages & Technologies: Python, Gephi, SQLite, D3, OpenRefine
Questions:
Q1: Collecting and visualizing data (Python & Gephi)
Q2: Analyzing data using SQLite
Q3: D3 Warmup
Q4: Analyzing data with OpenRefine
Assignment 2
Platforms, Languages & Technologies: D3, Tableau
Questions:
Q1: Designing a good table and visualizing data with Tableau
Q2: Force-directed graph using D3
Q3: Scatter plots using D3
Q4: Heatmap using D3
Q5: Interactive visualization using D3
Q6: Choropleth map using D3
Q7: Pros and cons of various visualization tools
Assignment 3
Platforms, Languages & Technologies: Java, Hadoop, Spark, Pig, Azure
Questions:
Q1: Analyzing a graph with Hadoop/Java
Q2: Analyzing a graph with Spark/Scala on Databricks
Q3: Analyzing data with Pig on AWS
Q4: Analyzing a graph using Hadoop on Microsoft Azure
Q5: Regression using Azure ML Studio
Assignment 4
Platforms, Languages & Technologies: PyPy, PageRank, Random Forest, scikit-learn
Questions:
Q1: Scalable single-machine PageRank
Q2: Implementing a random forest classifier
Q3: Using scikit-learn to run various classifiers
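As a rough illustration of the idea behind Q1 (not the assignment's required design), PageRank can be computed by power iteration while re-reading the edge list on every pass, so only per-node arrays stay in memory. The damping factor 0.85, the iteration count, the uniform redistribution of rank from dangling nodes, and the function name `pagerank` are all assumptions of this sketch.

```python
def pagerank(edge_stream, num_nodes, damping=0.85, iterations=20):
    """Power-iteration PageRank (a sketch; parameters are assumptions).

    edge_stream: zero-argument callable returning a fresh iterator of
    (src, dst) pairs on each call, e.g. a function that re-reads an edge
    file line by line, so the edges themselves are never held in memory.
    Nodes are integers 0..num_nodes-1.
    """
    # One pass over the edges to count out-degrees.
    out_degree = [0] * num_nodes
    for src, _ in edge_stream():
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-edges) is spread uniformly.
        dangling = sum(r for r, d in zip(rank, out_degree) if d == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # One streaming pass: each node splits its rank over its out-edges.
        for src, dst in edge_stream():
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Toy 3-node graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1
edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
ranks = pagerank(lambda: iter(edges), num_nodes=3)
print([round(r, 3) for r in ranks])
```

Passing a callable instead of a list is the design choice that keeps memory proportional to the number of nodes rather than the number of edges; in practice the callable would reopen and stream the edge file on each pass.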
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Building blocks, not rigid “steps”.
Can skip some
Can go back (two-way street)
• Data types inform visualization design
• Data size informs choice of algorithms
• Visualization motivates more data cleaning
• Visualization challenges algorithm assumptions, e.g., the user finds that results don’t make sense
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
How does “big data” affect the process? (Hint: almost everything is harder!)
The Vs of big data (3 Vs originally, then 7, now 42)
Volume: “billions”, “petabytes” are common
Velocity: think Twitter, fraud detection, etc.
Variety: text (webpages), video (YouTube)…
Veracity: uncertainty of data
Variability
Visualization
Value
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
http://dataconomy.com/seven-vs-big-data/
https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx
Three Example Projects from Polo and Mahdi's Research Group
Apolo Graph Exploration: Machine Learning + Visualization
Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning.
Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.
Figure panels: “Beautiful Hairball”, “Death Star”, “Spaghetti”
Finding More Relevant Nodes
Apolo uses guilt-by-association (Belief Propagation)
Figure: citation network (legend: HCI paper, data mining paper)
Demo: Mapping the Sensemaking Literature
Nodes: 80k papers from Google Scholar (node size: number of citations)
Edges: 150k citations
Key Ideas (Recap)
Specify exemplars
Find other relevant nodes (BP)
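The two key ideas, specifying exemplars and then spreading relevance by guilt-by-association, can be illustrated with a small loopy Belief Propagation (sum-product) sketch. It assumes two states (relevant, not relevant), a homophily compatibility matrix with values 0.6/0.4, and a strong prior on the user's exemplar; these numbers, the toy graph, and all names are illustrative assumptions, not Apolo's actual parameters.

```python
def belief_propagation(adj, priors, psi, iterations=10):
    """Loopy belief propagation (sum-product) on an undirected graph.

    adj:    dict mapping each node to a list of its neighbors (symmetric)
    priors: dict mapping each node to its prior distribution over states
    psi:    psi[a][b] = compatibility of neighboring nodes in states a and b
    Returns a dict mapping each node to its normalized belief.
    """
    num_states = len(psi)
    # messages[(i, j)][s]: node i's message about neighbor j being in state s
    messages = {(i, j): [1.0] * num_states for i in adj for j in adj[i]}
    for _ in range(iterations):
        new_messages = {}
        for i, j in messages:
            # Prior of i times every incoming message except the one from j
            h = list(priors[i])
            for k in adj[i]:
                if k != j:
                    h = [h[s] * messages[(k, i)][s] for s in range(num_states)]
            msg = [sum(h[a] * psi[a][b] for a in range(num_states))
                   for b in range(num_states)]
            total = sum(msg)
            new_messages[(i, j)] = [m / total for m in msg]
        messages = new_messages  # synchronous update of all messages

    beliefs = {}
    for i in adj:
        b = list(priors[i])
        for k in adj[i]:
            b = [b[s] * messages[(k, i)][s] for s in range(num_states)]
        total = sum(b)
        beliefs[i] = [x / total for x in b]
    return beliefs


# States: 0 = relevant, 1 = not relevant. Neighbors tend to share a state.
HOMOPHILY = [[0.6, 0.4],
             [0.4, 0.6]]

# Tiny citation chain: exemplar - p1 - p2 - p3 (edges listed both ways)
citations = {
    "exemplar": ["p1"],
    "p1": ["exemplar", "p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2"],
}
priors = {node: [0.5, 0.5] for node in citations}
priors["exemplar"] = [0.9, 0.1]  # the user marked this paper as relevant

beliefs = belief_propagation(citations, priors, HOMOPHILY)
for node in ("p1", "p2", "p3"):
    print(node, round(beliefs[node][0], 3))  # p1 0.58, p2 0.516, p3 0.503
```

Relevance decays with distance from the exemplar, which is the guilt-by-association intuition. On a tree like this chain BP is exact; on graphs with cycles, loopy BP is an approximation and is not guaranteed to converge.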
What did Apolo go through?
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Scrape Google Scholar. No API. 😩
Design inference algorithm (Which nodes to show next?)
Paper, talks, lectures
Interactive visualization you just saw
You will use a new Apolo prototype (called Argo)
Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and
Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos.
ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.
NetProbe: Fraud Detection in Online Auctions
NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007.
NetProbe: The Problem
Find bad sellers (fraudsters) on eBay who don’t deliver their items
Figure: the buyer sends money to the seller
Non-delivery fraud is a common auction fraud
Source: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud
NetProbe: Key Ideas
Fraudsters fabricate their reputation by “trading” with their accomplices
Fake transactions form near-bipartite cores
How to detect them?
NetProbe: Key Ideas
Use Belief Propagation
Figure legend: F = Fraudster, A = Accomplice, H = Honest; darker means more likely
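The F/A/H intuition can be made concrete with a three-state propagation (compatibility) matrix and a single BP message update. The matrix below is an illustrative assumption that encodes the story on these slides (fraudsters trade mostly with accomplices, accomplices trade with both fraudsters and honest users, honest users rarely trade with fraudsters); the constant `EPS = 0.05` and the helper names `send_message` and `combine` are hypothetical. A full run would repeat such updates over the entire transaction graph.

```python
EPS = 0.05  # assumed small probability for "unlikely" pairings
STATES = ("fraudster", "accomplice", "honest")

# PSI[a][b]: how compatible a node in state a is with a neighbor in state b.
# Each row sums to 1.
PSI = [
    [EPS, 1 - 2 * EPS, EPS],              # fraudsters trade mostly with accomplices
    [0.5, 2 * EPS, 0.5 - 2 * EPS],        # accomplices trade with fraudsters and honest users
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],  # honest users rarely trade with fraudsters
]


def send_message(sender_belief):
    """Message from a node to a neighbor: a normalized distribution over the
    neighbor's state, given the sender's belief (excluding that neighbor's input)."""
    msg = [sum(sender_belief[a] * PSI[a][b] for a in range(3)) for b in range(3)]
    total = sum(msg)
    return [m / total for m in msg]


def combine(prior, incoming):
    """A node's belief: its prior times all incoming messages, normalized."""
    belief = list(prior)
    for msg in incoming:
        belief = [belief[s] * msg[s] for s in range(3)]
    total = sum(belief)
    return [b / total for b in belief]


# A user with no prior information who traded with two known fraudsters
from_fraudster = send_message([1.0, 0.0, 0.0])
user = combine([1 / 3] * 3, [from_fraudster, from_fraudster])
print(dict(zip(STATES, (round(p, 3) for p in user))))
# {'fraudster': 0.003, 'accomplice': 0.994, 'honest': 0.003}
```

By contrast, a message from a known accomplice, `send_message([0.0, 1.0, 0.0])`, splits most of its weight between “fraudster” and “honest”, so evidence from several neighbors, propagated over many iterations, is needed to tell those two apart.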
NetProbe: Main Results
“Belgian Police”
What did NetProbe go through?
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Scraping (built a “scraper”/“crawler”)
Design detection algorithm
Not released
Paper, talks, lectures
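The scraping step can be sketched as a breadth-first crawl over users' trading partners. The function `fetch_trading_partners` and the in-memory `FAKE_SITE` dictionary are hypothetical stand-ins for downloading and parsing a user's feedback page, so the sketch runs offline; `max_users` is an assumed cap on how many pages are fetched.

```python
from collections import deque

# Hypothetical stand-in for the website: user -> users they traded with.
FAKE_SITE = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def fetch_trading_partners(user):
    """Hypothetical: a real crawler would download and parse the user's page."""
    return FAKE_SITE.get(user, [])


def crawl(seed, max_users=1000):
    """Breadth-first crawl from a seed user.

    Returns the users whose pages were fetched (in BFS order) and the
    undirected transaction edges discovered along the way.
    """
    fetched = []
    edges = set()
    seen = {seed}
    queue = deque([seed])
    while queue and len(fetched) < max_users:
        user = queue.popleft()
        fetched.append(user)
        for partner in fetch_trading_partners(user):
            edges.add(tuple(sorted((user, partner))))  # store each edge once
            if partner not in seen:
                seen.add(partner)
                queue.append(partner)
    return fetched, edges


users, edges = crawl("alice", max_users=3)
print(users)          # ['alice', 'bob', 'carol']
print(sorted(edges))  # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'dave')]
```

Capping `max_users` keeps the crawl bounded; edges to users that were discovered but not yet fetched (such as `bob`–`dave` above) are still recorded, so the collected graph extends one hop past the fetched frontier.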
NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank
Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide
Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.
Homework 1 (out next week; tasks subject to change)
• Simple “End-to-end” analysis
• Collect data using API
• Store in SQLite database
• Create graph from data
• Analyze using SQL queries (e.g., compute the graph’s degree distribution)
• Visualize graph using Gephi
• Describe your discoveries
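The “store in SQLite” and “analyze using SQL” steps can be sketched with Python's built-in `sqlite3` module. The table name `edges`, its columns, and the toy data are hypothetical (standing in for data collected through an API), and edges are treated as undirected.

```python
import sqlite3

# Hypothetical toy edges standing in for data collected through an API;
# each row is one undirected relationship between two node IDs.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]

conn = sqlite3.connect(":memory:")  # pass a filename instead to persist the database
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
conn.commit()

# Degree = number of edges touching a node (as either endpoint);
# the distribution counts how many nodes have each degree.
DEGREE_DISTRIBUTION = """
WITH endpoints AS (
    SELECT source AS node FROM edges
    UNION ALL
    SELECT target AS node FROM edges
),
degrees AS (
    SELECT node, COUNT(*) AS degree FROM endpoints GROUP BY node
)
SELECT degree, COUNT(*) AS num_nodes
FROM degrees
GROUP BY degree
ORDER BY degree
"""

distribution = conn.execute(DEGREE_DISTRIBUTION).fetchall()
print(distribution)  # [(1, 1), (2, 2), (3, 1)]
conn.close()
```

Each result row is (degree, number of nodes with that degree); the same `edges` table could then be exported, for example as CSV, for visualization in Gephi.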
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination