Linking Organizational Social Networking Profiles Research Wrap-Up – 28 August 2015

Post on 17-Jan-2016

1

Linking Organizational Social Networking Profiles
Research Wrap-Up – 28 August 2015

2

Objective

Develop a system to find an organization’s profiles across different social networks.

3

Affiliate Profiles

Affiliates: Brands, Regional

4

Overview

Organization Name → System → Official / Affiliate / Unrelated

5

Official
Profiles representing the company as a whole, e.g. @Microsoft, @Dell.

Affiliate
Profiles representing a brand or regional affiliate, e.g. @Surface, @Windows, @MicrosoftAsia.

Unrelated
Profiles that aren’t run by the company itself. Includes employees and other companies.

6

Introduction

Implementation

Evaluation

Results/Discussion

Future?

7

Pipeline

1. Query: GET /company/Microsoft Corporation
2. Input Processing (DuckDuckGo Instant Answers API) → Processed Query, e.g. “Microsoft”
3. Profile Acquisition (Twitter/Facebook Search API) → Twitter/Facebook Profiles
4. Profile Conversion → Feature Vectors
5. Profile Classification → Labelled Profiles (JSON)


10

Input Processing

Query the DuckDuckGo Instant Answers API, which gives a “topic summary”. Take the name from that summary.
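
A minimal sketch of this step, kept offline so it can run without network access. It assumes the public Instant Answer endpoint with its `q`, `format` and `no_html` parameters, and it assumes the name is read from the `Heading` field of the JSON response; the slides do not say which field was actually used. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public Instant Answer endpoint; not stated in the slides.
DDG_ENDPOINT = "https://api.duckduckgo.com/"


def build_instant_answer_url(raw_query: str) -> str:
    """Build the request URL for a raw organization query."""
    params = {"q": raw_query, "format": "json", "no_html": 1}
    return DDG_ENDPOINT + "?" + urlencode(params)


def processed_query(raw_query: str, topic_summary: dict) -> str:
    """Take the name from the topic summary, falling back to the raw query.

    Assumption: the name lives in the "Heading" field of the response.
    """
    heading = (topic_summary.get("Heading") or "").strip()
    return heading or raw_query.strip()
```

With a summary such as `{"Heading": "Microsoft"}`, the raw query "Microsoft Corporation" becomes the processed query "Microsoft". Falling back to the raw query when no heading is returned is a design choice of this sketch, not something the slides describe.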

11

Profile Acquisition

Query Twitter/Facebook’s search API and retrieve 20 candidate profiles.

12

Profile Conversion - Features

Name-based (5)
• N1: Normalized Edit Distance: Query to Username
• N2: Normalized Edit Distance: Query to Display Name
• N3: Length of Query
• N4: Length of Username
• N5: Length of Display Name

Description-based (3)
• D1: Occurrences of Query in Description
• D2: Cosine Similarity: Query and Description
• D3: Cosine Similarity: Profile Description and DuckDuckGo Description

Language Model-based (6)
• LM1: “Official” Description LM Probability
• LM2: “Affiliate” Description LM Probability
• LM3: “Unrelated” Description LM Probability
• LM4: “Official” Post LM Probability
• LM5: “Affiliate” Post LM Probability
• LM6: “Unrelated” Post LM Probability

13

Name-based Features

N1 - Normalized Edit Distance: Query to Username
N2 - Normalized Edit Distance: Query to Display Name
N3 - Length of Query
N4 - Length of Username
N5 - Length of Display Name

Normalized edit distance:
\[ 1 - \frac{\mathrm{edit\_distance}(s_1, s_2)}{\max(\mathrm{len}(s_1), \mathrm{len}(s_2))} \]
0 when completely different, 1 when identical

Username: GM
Display Name: General Motors

Quirks
Abbreviations: GM versus General Motors
Stopwords: “Corporation”, “Company”, etc.
Imposters!
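
The name-based features can be sketched as follows. The edit distance here is the standard Levenshtein distance, and strings are lower-cased before comparison; both are assumptions, since the slides give only the normalization formula. The function names are hypothetical.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(s1: str, s2: str) -> float:
    """1 - edit_distance / max(len): 0 when completely different, 1 when identical."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(s1, s2) / longest


def name_features(query: str, username: str, display_name: str) -> list:
    """Return [N1, N2, N3, N4, N5]; lower-casing is an assumption of this sketch."""
    q = query.lower()
    return [
        normalized_similarity(q, username.lower()),
        normalized_similarity(q, display_name.lower()),
        len(query),
        len(username),
        len(display_name),
    ]
```

For the query "General Motors", the username "GM" gives N1 = 1 - 12/14 (about 0.14), while the display name "General Motors" gives N2 = 1. This is the abbreviation quirk: the username alone looks almost unrelated to the query.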

14

Description-based Features

D1 - Occurrences of Query
D2 - Cosine Similarity: Query and Description
D3 - Cosine Similarity: DuckDuckGo Description and Profile Description
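
A minimal sketch of D1 to D3. It assumes a bag-of-words representation with raw term counts and a simple word tokenizer; the slides do not say how text was tokenized or weighted (TF-IDF would be an equally plausible choice). The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case word tokens; an assumed tokenizer, not the one from the slides."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def description_features(query: str, description: str, ddg_description: str) -> list:
    """Return [D1, D2, D3] for one candidate profile."""
    d1 = description.lower().count(query.lower())  # substring occurrences
    d2 = cosine_similarity(query, description)
    d3 = cosine_similarity(description, ddg_description)
    return [d1, d2, d3]
```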

15

Language Model-based Features

Probability that description/posts appear in each language model:

Description
• LM1 - Official Profiles
• LM2 - Affiliate Profiles
• LM3 - Unrelated Profiles

Recent Posts
• LM4 - Official Profiles
• LM5 - Affiliate Profiles
• LM6 - Unrelated Profiles
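
The slides do not specify the kind of language model. As an illustration only, the sketch below assumes a unigram model with add-one smoothing, one model per class, and reports log-probabilities. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


class UnigramLM:
    """Unigram language model with add-one smoothing (an assumed model type)."""

    def __init__(self, texts):
        self.counts = Counter(tok for text in texts for tok in tokenize(text))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot reserves probability mass for unseen tokens.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, text: str) -> float:
        """Sum of smoothed log-probabilities of the tokens in `text`."""
        denominator = self.total + self.vocab_size
        return sum(math.log((self.counts[tok] + 1) / denominator)
                   for tok in tokenize(text))


def lm_features(description: str, posts: str, description_lms: dict, post_lms: dict) -> list:
    """Return [LM1..LM6]: description then posts, each scored per class."""
    classes = ("official", "affiliate", "unrelated")
    return ([description_lms[c].log_prob(description) for c in classes]
            + [post_lms[c].log_prob(posts) for c in classes])
```

Each class gets its own model, trained on the descriptions (or recent posts) of profiles carrying that label, so a description that reads like an official account scores higher under the "official" model than under the "unrelated" one.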

16

Ground Truth Breakdown

Twitter Labels: 3381 labels from 228 organizations
Official: 232 (7%)
Affiliate: 675 (20%)
Unrelated: 2474 (73%)

Facebook Labels: 3403 labels from 216 organizations
Official: 145 (4%)
Affiliate: 491 (14%)
Unrelated: 2767 (81%)

17

Per-Fold Evaluation Process

1. Training set (official, affiliate and unrelated profiles) is used to train the classifier.
2. Test set is filtered for official and affiliate profiles.
3. Obtain list of organizations that own these profiles.
4. Names are used to query the system; the results (profiles classified as official, affiliate or unrelated) are used to calculate performance.
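
The four steps can be sketched for a single fold as below. The `Profile` record, the `train_classifier` and `run_system` callables, and the way precision and recall are computed from the returned labels are all assumptions made for illustration; the slides describe the process only at the level of the four steps.

```python
from collections import namedtuple

# Hypothetical record: one labelled profile in the ground truth.
Profile = namedtuple("Profile", ["handle", "organization", "label"])


def evaluate_fold(train_set, test_set, train_classifier, run_system, target="official"):
    """Return (precision, recall) for one class on one fold.

    train_classifier(profiles) -> classifier
    run_system(organization_name, classifier) -> iterable of (handle, label)
    """
    # 1. Train the classifier on the training set.
    classifier = train_classifier(train_set)

    # 2. Keep only official and affiliate profiles from the test set.
    relevant = [p for p in test_set if p.label in ("official", "affiliate")]

    # 3. Collect the organizations that own these profiles.
    organization_names = sorted({p.organization for p in relevant})

    # 4. Query the system with each name and compare against the ground truth.
    truth = {p.handle for p in relevant if p.label == target}
    predicted = set()
    for name in organization_names:
        for handle, label in run_system(name, classifier):
            if label == target:
                predicted.add(handle)

    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Querying the whole system with organization names, rather than scoring the classifier on held-out feature vectors, means the measurement also covers input processing and profile acquisition.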

18

Baseline

Simulates manually judging profiles by name alone.

N1 - Normalized Edit Distance: Query to Username
N2 - Normalized Edit Distance: Query to Display Name

19

Results - Twitter

Official
            F1       Precision   Recall
Baseline    60.4%    80.1%       48.5%
Final       93.5%    97.5%       89.9%

Affiliate
            F1       Precision   Recall
Baseline    67.0%    74.0%       61.1%
Final       92.3%    94.9%       89.8%

20

Results - Facebook

Official
            F1       Precision   Recall
Baseline    81.0%    79.4%       82.6%
Final       93.3%    96.2%       90.6%

Affiliate
            F1       Precision   Recall
Baseline    48.9%    59.0%       41.8%
Final       67.0%    75.3%       60.4%
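
The reported F1 values for both networks are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes F1 from each reported precision/recall pair; the function and table names are hypothetical, and the figures are copied from the Twitter and Facebook results.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (any common scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (network, class, system, precision %, recall %, reported F1 %)
REPORTED = [
    ("Twitter", "Official", "Baseline", 80.1, 48.5, 60.4),
    ("Twitter", "Official", "Final", 97.5, 89.9, 93.5),
    ("Twitter", "Affiliate", "Baseline", 74.0, 61.1, 67.0),
    ("Twitter", "Affiliate", "Final", 94.9, 89.8, 92.3),
    ("Facebook", "Official", "Baseline", 79.4, 82.6, 81.0),
    ("Facebook", "Official", "Final", 96.2, 90.6, 93.3),
    ("Facebook", "Affiliate", "Baseline", 59.0, 41.8, 48.9),
    ("Facebook", "Affiliate", "Final", 75.3, 60.4, 67.0),
]


def max_f1_gap(rows=REPORTED) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for _, _, _, p, r, f1 in rows)
```

Every recomputed value lands within 0.1 percentage points of the reported one, which is what rounding of the inputs would produce.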

21

Profile Types
Facebook has multiple profile types: people, pages, places, groups, etc.
Twitter has just one: people.

Affiliates?
Why don’t FB affiliates score as well? Page usernames are optional.
/pages/Netflix-Latinoamérica/553454298124413
(path segments: Display Name, then ID)

Auto-generated pages also follow the same pattern!
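
A page without a username exposes only a display name and a numeric ID in its path, so the username-based features have nothing meaningful to compare against. A hedged sketch of recognizing this pattern follows; the function name is hypothetical, and turning hyphens back into spaces is an assumption about how the display name is encoded in the path.

```python
from urllib.parse import unquote


def parse_pages_path(path: str):
    """Split '/pages/<display-name>/<id>' into (display_name, page_id).

    Returns None for any other path shape, e.g. a page with a real username.
    """
    parts = [part for part in path.split("/") if part]
    if len(parts) == 3 and parts[0] == "pages" and parts[2].isdigit():
        # Assumption: hyphens in the path stand in for spaces in the display name.
        display_name = unquote(parts[1]).replace("-", " ")
        return display_name, parts[2]
    return None
```

For the example path, this yields the display name "Netflix Latinoamérica" and the ID "553454298124413". A caller could use a `None` result to tell that the page has a proper username.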

24

Future?

Focus on affiliates – unique to the domain.

Drill down into the various different types (e.g. outreach, regional, brand, business unit).

Improve ground truth: crowd-source labels.

27

Done
Objective: develop a system to find an organization’s profiles across different social networks
Used network-specific classifiers to do so
Evaluated performance using modified cross-validation

Future
Dive deeper into affiliates, which are unique to organizations
