identifying multi-id users in open forums

41
Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail

Upload: mort

Post on 08-Jan-2016

21 views

Category:

Documents


0 download

DESCRIPTION

Identifying Multi-ID Users in Open Forums. Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail. Who is Using Multiple IDs?. Consider two chatrooms, “letters” and “numbers”. letters. numbers. dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Identifying Multi-ID Users in Open Forums

Identifying Multi-ID Users in Open Forums

Hung-Ching ChenMark Goldberg

Malik Magdon-Ismail

Page 2: Identifying Multi-ID Users in Open Forums

Who is Using Multiple IDs?

Consider two chatrooms, “letters” and “numbers”

letters numbers

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not muchdogg: Z is the best

mack: i love 5catt: i don’tjoop: 27 rules!catt: i agree joop :)

Page 3: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!

Page 4: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks! mack: i love 5

Page 5: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats new

mack: i love 5

Page 6: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?

mack: i love 5

Page 7: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?

mack: i love 5

catt: i don’t

Page 8: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not much

mack: i love 5

catt: i don’t

Page 9: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not much

mack: i love 5

catt: i don’tjoop: 27 rules!

Page 10: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not muchdogg: Z is the best

mack: i love 5

catt: i don’tjoop: 27 rules!

dogg/catt joopmackanna

Page 11: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not muchdogg: Z is the best

mack: i love 5

catt: i don’tjoop: 27 rules!catt: i agree joop :)

dogg/catt joopmackanna

Page 12: Identifying Multi-ID Users in Open Forums

Overview

• A model of a public forum• Two efficient statistics-based

algorithms for identification of multi-ID users

• Simulation results

Page 13: Identifying Multi-ID Users in Open Forums

Overview

• A model of a public forum• Two efficient statistics-based

algorithms for identification of multi-ID users

• Simulation results

Page 14: Identifying Multi-ID Users in Open Forums

A Model of a Public Forum

• Every actor has one response queue

• Common average and variance of response delay

• The server has a queue to process messages with very short delay

Page 15: Identifying Multi-ID Users in Open Forums

Model Friendship Graph

anna

catt

mack

dogg joop

Page 16: Identifying Multi-ID Users in Open Forums

Model Alias Graph

anna

catt

mack

dogg joop

Page 17: Identifying Multi-ID Users in Open Forums

Multi-ID Users

• One response queue for each actor

dogg/catt joopmackanna

annacatt catt

dogg

Page 18: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!

catt catt

dogg

anna

dogg

Page 19: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks! mack: i love 5

cattdogg

mack

catt

dogg

Page 20: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats new

mack: i love 5

cattmack

anna

doggdogg

Page 21: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?

mack: i love 5

cattmack

anna

mack

dogg

Page 22: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?

mack: i love 5

catt: i don’t

catt

anna

mack

catt

mack

Page 23: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not much

mack: i love 5

catt: i don’t

catt

mack

catt

dogg

anna

Page 24: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg/catt joopmackanna

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not much

mack: i love 5

catt: i don’tjoop: 27 rules!

mackcattdogg

joop

catt

Page 25: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not muchdogg: Z is the best

mack: i love 5

catt: i don’tjoop: 27 rules!

dogg/catt joopmackanna

cattdogg

joop

dogg

mack

Page 26: Identifying Multi-ID Users in Open Forums

Multi-ID Usersletters numbers

dogg: hey anna, Z rocks!anna: hey dogg whats newmack: what’s that dogg?dogg: not muchdogg: Z is the best

mack: i love 5

catt: i don’tjoop: 27 rules!catt: i agree joop :)

dogg/catt joopmackanna

cattdogg

dogg

catt

joop

Page 27: Identifying Multi-ID Users in Open Forums

Overview

• A model of a public forum• Two efficient statistics-based

algorithms for identification of multi-ID users

• Simulation results

Page 28: Identifying Multi-ID Users in Open Forums

First Algorithm

1. Collect information {‹timei, IDi›}2. Compute minimum time delay minD

for every pair of IDs3. Cluster the delays using k-means into

two groups.• Call pairs with larger center suspected to

be the same actor (red)• Call pairs with smaller center suspected to

different (blue)4. Connected components using red

edges are ID groups representing one actor

Page 29: Identifying Multi-ID Users in Open Forums

1. Collect information

anna

catt

mack

dogg

joop

mack

catt

dogg dogg

time

Page 30: Identifying Multi-ID Users in Open Forums

2. minD(dogg, mack)

mack

dogg

mack

dogg dogg

Page 31: Identifying Multi-ID Users in Open Forums

2. minD(dogg, catt)

catt

dogg

catt

dogg dogg

Page 32: Identifying Multi-ID Users in Open Forums

3. Cluster minD values

minD(dogg, mack)minD(dogg, catt)

Different actorsSame actor

Page 33: Identifying Multi-ID Users in Open Forums

3. Resulting Alias Graph

anna

catt

mack

dogg joop

Contradictions

Page 34: Identifying Multi-ID Users in Open Forums

4. One Red Component

anna

catt

mack

dogg joop

9 false positives10% accuracy

Page 35: Identifying Multi-ID Users in Open Forums

Refinement Algorithm

1. 2. 3. 4. Find groups connected with red

edges5. Color IDs into smaller ID groups

using blue edges

Follow original algorithm

Page 36: Identifying Multi-ID Users in Open Forums

4. One Red Component

anna

catt

mack

dogg joop

Page 37: Identifying Multi-ID Users in Open Forums

5. Color Blue Graph

anna

catt

mack

dogg joop

Page 38: Identifying Multi-ID Users in Open Forums

5. Color Blue Graph

anna

catt

mack

dogg joop

1 false positive90% accuracy

Page 39: Identifying Multi-ID Users in Open Forums

Overview

• A model of a public forum• Two efficient statistics-based

algorithms for identification of multi-ID users

• Simulation results

Page 40: Identifying Multi-ID Users in Open Forums

Simulation results

The average number of

messages per pair

Accuracy without coloring

(%)

Accuracy with coloring (%)

0.513 70.2797 70.9765

8.274 97.3140 97.3473

87.68 99.7762 99.7826

880.4 99.9995 99.9998

The dependence of Accuracy on the number of messages

Page 41: Identifying Multi-ID Users in Open Forums

Future Work

• Real-life testing• More sophisticated model

– Different types of users– Short and overlapping online appearances– Length of posts

• Algorithm improvements– Statistical evaluation of the new models– Lexical and semantic analysis