Semantic Recommendation Systems for Research 2.0


SEMANTIC RECOMMENDATION SYSTEMS

FOR RESEARCH 2.0

A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0

by Patrick Thonhauser


OUTLINE

• Motivation

• Basics (Semantic Web, Recommender Systems, Natural Language Processing)

• Conceptual Prototype

• Test results and Discussion

• Questions


MOTIVATION

• Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)?

• How much information can we extract from 140-character strings?

• Is it possible to separate useful information from noise?

• Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?


SEMANTIC WEB

• Additional Layer of Information

• Linked Data (use URIs as names, use HTTP URIs, use standards to provide Information, include links to other URIs)

• RDF (based on triples -> subject, predicate, object) is to the Semantic Web what HTML is to the classic web (see the sketch after this list)

• Nearly all semantic web standards are based on RDF (like FOAF - Friend of a Friend Project)
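
A minimal sketch of how such a triple looks in code, assuming the Python rdflib package and hypothetical example URIs:

```python
# Minimal sketch (assumes rdflib and made-up example URIs): RDF triples
# and the FOAF "knows" relation mentioned above.
from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
alice = URIRef("http://example.org/alice")   # hypothetical researcher URIs
bob = URIRef("http://example.org/bob")

g.add((alice, FOAF.name, Literal("Alice")))  # triple: subject, predicate, object
g.add((bob, FOAF.name, Literal("Bob")))
g.add((alice, FOAF.knows, bob))              # "Alice knows Bob"

print(g.serialize(format="turtle"))          # serialize the graph as Turtle
```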


RECOMMENDER SYSTEMS

• Collaborative Filtering (user-based / item-based; a minimal sketch follows this list)

• Content Based Recommendation

• Knowledge Based Recommendation

• Hybrid Recommendations
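
To make the first of these concrete, here is a minimal sketch of user-based collaborative filtering with toy data (illustrative only, not the prototype's code):

```python
# Minimal sketch of user-based collaborative filtering: find the user most
# similar to user 0 via cosine similarity over a toy interaction matrix.
import numpy as np

ratings = np.array([       # rows = users, columns = items (toy data)
    [5.0, 0.0, 3.0, 0.0],
    [4.0, 0.0, 3.0, 1.0],
    [0.0, 2.0, 0.0, 5.0],
])

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = ratings[0]
similarities = [cosine_sim(target, other) for other in ratings[1:]]
print(similarities)        # user 1 is far more similar to user 0 than user 2
```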


NATURAL LANGUAGE PROCESSING (NLP)

• Classification of Microtext Artefacts (e.g. “This presentation is killer!”)

• Applying NLP pipelines (see the sketch after this list)

• End of Sentence Detection

• Tokenization

• POS Tagging

• Chunking

• Extraction
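
A minimal sketch of these pipeline steps, assuming NLTK (the tag set shown later in the deck is NLTK-style); the noun-phrase grammar is illustrative only:

```python
# Minimal pipeline sketch with NLTK (requires one-time downloads of
# "punkt" and "averaged_perceptron_tagger"); the NP grammar is illustrative.
import nltk

text = "This presentation is killer! It covers NLP pipelines for tweets."

sentences = nltk.sent_tokenize(text)                 # end-of-sentence detection
tokens = [nltk.word_tokenize(s) for s in sentences]  # tokenization
tagged = [nltk.pos_tag(t) for t in tokens]           # POS tagging

chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")  # chunking
for sentence in tagged:
    tree = chunker.parse(sentence)
    for np_chunk in tree.subtrees(lambda st: st.label() == "NP"):
        print(" ".join(word for word, _ in np_chunk.leaves()))  # extraction
```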


THE CONCEPT OF THOUGHT BUBBLES

Let’s imagine every Twitter user belongs to several different topic-related Bubbles


• A user is part of topic-related bubbles

• Twitter users within topic-related bubbles don’t necessarily know each other

• Connections of the service user’s already existing connections lead to new information

• Non-bidirectional connections are preferred

LET’S SUMMARIZE

So how can we find such potentially interesting users?


PROOF OF CONCEPT SYSTEM

(1) Pre-selection of the user set which will be analyzed in depth

(2) Apply NLP-Pipeline for measuring user similarity

(3) Categorize the top-n best scoring users according to the idea of Thought Bubbles

(4) Recommend top-n best scoring users of a category to the user

(5) Analyze acceptance of recommendations

[Architecture diagram: the service user, the Twitter REST API, and a Thought Bubbles API server (NLP pre-filtering, categorisation, clustering, analysis of recommendations) backed by a DB; a user's Thought Bubble groups accounts into topics such as iOS Dev, Social Media and Sports.]


(1) PRE-SELECTION/FILTERING

Friends of friends' Twitter accounts
→ Filter accounts that are already connected to you
→ Filter accounts where: follower_count < 300, status_count < 1000
→ Filter non-English-speaking accounts
→ Identify people by using a simple NLP pipeline
→ Set of Twitter accounts for further processing (see the filtering sketch below)

• The set of friends of friend‘s Twitter accounts changes from iteration to iteration

• Filters are added after analyzing the acceptance of recommendations
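
A minimal sketch of these filters over already-fetched candidate accounts; the dict field names are assumptions for illustration (not Twitter's exact API schema), and whether the two count thresholds combine with AND or OR is not stated on the slide:

```python
# Minimal sketch of the pre-selection filters; field names are illustrative.
def preselect(candidates, already_connected):
    kept = []
    for acc in candidates:
        if acc["screen_name"] in already_connected:   # already connected -> drop
            continue
        if acc["follower_count"] < 300 or acc["status_count"] < 1000:
            continue                                   # too few followers / tweets
        if acc["lang"] != "en":                        # non-English account
            continue
        kept.append(acc)                               # people-detection step omitted
    return kept

candidates = [
    {"screen_name": "alice", "follower_count": 1200, "status_count": 4000, "lang": "en"},
    {"screen_name": "bob", "follower_count": 50, "status_count": 200, "lang": "en"},
]
print(preselect(candidates, already_connected={"carol"}))  # only "alice" survives
```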


(2) NLP PIPELINE

Raw Tweets, e.g. "@testuser The grand jury commented on a number of…"
→ Tokenization and stripping of @mentions and URLs
→ POS tagging: [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]
→ Chunking, neglecting the 200 most used English words
→ Mined nouns and phrases: [('jury', 'NN'), ('number', 'NN'), ('social dayly', 'NP'), ...]
→ Frequency distribution: [('jury', 34), ('social', 23), ('test case', 16), ...]
→ Filter top-n words and store the resulting set of frequency-distributed mined nouns and phrases in the DB (see the sketch below)

400 most recent Tweets of a potential recommendation are used for calculating the similarity measure
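
A minimal sketch of this mining step for a single account's tweets, assuming NLTK; the cleaning regex, the plain noun filter and NLTK's stop-word list stand in for the exact pipeline and the 200-word cutoff above:

```python
# Minimal sketch of mining term frequencies from one account's tweets.
import re
import nltk
from nltk.corpus import stopwords   # requires nltk.download("stopwords")

tweets = [
    "@testuser The grand jury commented on a number of cases.",
    "The jury met again today http://t.co/abc",
]

def mine_terms(tweets, top_n=50):
    stop = set(stopwords.words("english"))
    freq = nltk.FreqDist()
    for tweet in tweets:
        text = re.sub(r"(@\w+|https?://\S+)", " ", tweet)   # strip @mentions, URLs
        tagged = nltk.pos_tag(nltk.word_tokenize(text))      # tokenize + POS tag
        nouns = [w.lower() for w, tag in tagged
                 if tag.startswith("NN") and w.lower() not in stop]
        freq.update(nouns)                                    # count mined nouns
    return freq.most_common(top_n)                            # frequency distribution

print(mine_terms(tweets))   # e.g. [('jury', 2), ('number', 1), ...]
```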


• Calculate the top-n users by applying Single-Linkage Clustering (see the sketch after this list)

• Categorize whether a user belongs to user-specific bubbles

• Present recommendation lists to users

• Analyze acceptance of recommendations (connect user accounts with FOAF) and add a new filter predicate if necessary.
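
A minimal sketch of the clustering step, assuming SciPy; the accounts and term-frequency vectors below are toy data, not the prototype's actual features:

```python
# Minimal sketch of single-linkage clustering over term-frequency vectors.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

users = ["@alice", "@bob", "@carol", "@dave"]   # hypothetical candidate accounts
term_freqs = np.array([                          # toy mined-term frequencies
    [34, 23, 16, 0],
    [30, 20, 14, 1],
    [1, 2, 0, 40],
    [0, 1, 1, 38],
])

distances = pdist(term_freqs, metric="cosine")      # pairwise distances between users
tree = linkage(distances, method="single")          # single-linkage clustering
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the dendrogram into 2 groups

for user, label in zip(users, labels):
    print(user, label)   # @alice/@bob land in one cluster, @carol/@dave in the other
```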


SUPERVISED TEST RUN

[Bar chart: similarity scores (scale 0–0.300) for the candidate accounts of the supervised test run, e.g. @gargamit100, @selvers, @UpsideLearning, @cliveshepherd, @SebastianThrun, @BarackObama, @ladygaga; recommended accounts are framed.]


UNSUPERVISED TEST RESULTS

The probability that a recommended item is relevant is 64.4%. Standard deviation: 31.5%


DISCUSSION

Twitter IS useful for discovering new information in the sense of Research 2.0, but:

• Recommendations reflect the Twitter behavior of the user

• Automated tweets harm recommendation results (one sentence gets an enormous weight because it occurs very often)

• Twitter‘s request limit is a show-stopper

• Comparison to similar systems (content-based and collaborative filtering)


THANK YOU! ANY QUESTIONS?

