

Deduced Social Networks for Educational Portal

Abstract

Educational portals such as Algoviz.org contain rich information resources. A key concern is directing users to specific resources that are of interest to them. While AlgoViz has significant traffic, we cannot count on active user participation in the form of explicit ratings of individual resources. Lacking active user data (e.g., user ratings on resources), we instead use log data to deduce user trends. We describe our techniques for clustering users based on the log data. We show how cluster analysis can be used to improve searching and browsing within AlgoViz. Our approach has the potential to be useful for a wide range of educational resource portals.

Deduced Social Network (DSN)

A Deduced Social Network (DSN) is a graph with tuple (Entity, Object, k), where
• Entity is the node of the graph,
• Object is an attribute linked to Entity, where one Entity can have multiple Object(s), and
• k is a constant or a function that returns the minimum number of Object(s) that must be common between two Entities to create a connection (i.e., edge) between them.

Example: In a user-based DSN, a connection threshold k=5 for an edge indicates that two users (i.e., Entity) have viewed at least 5 common pages (i.e., Object). In an object-based DSN, k=3 for an edge would indicate that there are at least 3 users who viewed the two objects (e.g., pages).
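The DSN definition can be illustrated with a minimal Python sketch. It assumes the log has already been reduced to (entity, object) pairs and that k is a constant rather than a function; the name `build_dsn` and the sample data are hypothetical and are not taken from the AlgoViz implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_dsn(views, k):
    """Return DSN edges as {(entity_a, entity_b): shared_object_count}.

    views: iterable of (entity, obj) pairs, e.g. (user, page).
    k:     constant threshold; two entities are connected when they
           share at least k objects.
    """
    objects_by_entity = defaultdict(set)
    for entity, obj in views:
        objects_by_entity[entity].add(obj)

    edges = {}
    # Compare every unordered pair of entities once.
    for a, b in combinations(sorted(objects_by_entity), 2):
        shared = len(objects_by_entity[a] & objects_by_entity[b])
        if shared >= k:
            edges[(a, b)] = shared
    return edges


# Hypothetical page-view log: (user, page) pairs.
views = [
    ("u1", "p1"), ("u1", "p2"), ("u1", "p3"),
    ("u2", "p1"), ("u2", "p2"), ("u2", "p4"),
    ("u3", "p3"),
]

# User-based DSN: users are nodes, pages are the shared objects.
user_dsn = build_dsn(views, k=2)

# Object-based DSN: swap the roles so pages are nodes and users are shared.
page_dsn = build_dsn([(page, user) for user, page in views], k=2)
```

The pairwise comparison is quadratic in the number of entities, which is fine for a sketch but would likely need an inverted index (object to entities) on a large log.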

Generating and Analyzing DSN in AlgoViz

Getting the Data

• Store user activities.
• Various logging options: server log, system logs.
• Sites such as Google Analytics provide more advanced metrics like visits, pageviews, bounce rate, time on site, etc.
• Data Cleaning: Remove spammers, bots, crawlers, etc.
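The data-cleaning step can be sketched as a simple filter over log records. This is only an illustration under stated assumptions: the record keys (`ip`, `agent`, `page`), the marker list, and the request cutoff are all hypothetical, and the poster does not say how AlgoViz actually identifies spammers, bots, or crawlers.

```python
from collections import Counter

# Assumed substrings that mark automated clients; a real list would be longer.
BOT_MARKERS = ("bot", "crawler", "spider")


def clean_log(records, max_requests=1000):
    """Drop records from likely bots and from unusually active clients.

    records:      list of dicts with hypothetical keys "ip", "agent", "page".
    max_requests: assumed cutoff; clients with more requests than this
                  are treated as automated.
    """
    requests_per_ip = Counter(r["ip"] for r in records)
    cleaned = []
    for r in records:
        agent = r["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # self-identified crawler
        if requests_per_ip[r["ip"]] > max_requests:
            continue  # request volume too high for a human visitor
        cleaned.append(r)
    return cleaned


# Hypothetical log records.
records = [
    {"ip": "10.0.0.1", "agent": "Mozilla/5.0", "page": "/catalog/1"},
    {"ip": "10.0.0.2", "agent": "Googlebot/2.1", "page": "/catalog/1"},
    {"ip": "10.0.0.3", "agent": "Mozilla/5.0", "page": "/forum/7"},
    {"ip": "10.0.0.3", "agent": "Mozilla/5.0", "page": "/forum/8"},
    {"ip": "10.0.0.3", "agent": "Mozilla/5.0", "page": "/forum/9"},
]

kept = clean_log(records, max_requests=2)
```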

Applications

• DSNs can identify connections between anonymous users who are otherwise disconnected.
• DSNs can identify the existence of groups with specific interest.
• DSNs can be used to improve search on sites with high volumes of anonymous traffic.
• Refine Ranking and Recommendations
    • We use a custom ranking function that places different weights on AlgoViz-specific fields of an Algorithm Visualization catalog entry.
    • Clusters representing a specific content type are used to add weight to content of that type. Top contents c1, c2, c3, …, cn of cluster x that is dominated by a content type of y (e.g., forum, page, catalog entry, etc.), receive certain points.
    • Search results and browsing list are ordered by ranking score.
    • Top contents of clusters dominated by a specific content type are used to create recommendations.
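The ranking refinement can be pictured with a small Python sketch. The poster gives neither the field weights nor the number of points a cluster's top contents receive, so `FIELD_WEIGHTS`, `CLUSTER_BONUS`, the field names, and the substring matching are all assumptions made for illustration.

```python
# Hypothetical field weights and cluster bonus; the poster gives no values.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}
CLUSTER_BONUS = 2.5


def ranking_score(entry, query, boosted_ids):
    """Weighted field match, plus a bonus for top contents of a cluster."""
    q = query.lower()
    score = sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if q in entry.get(field, "").lower()
    )
    # Only entries that match the query at all receive the cluster bonus.
    if score > 0 and entry["id"] in boosted_ids:
        score += CLUSTER_BONUS
    return score


def rank(entries, query, boosted_ids):
    """Order entries by descending ranking score."""
    return sorted(
        entries,
        key=lambda e: ranking_score(e, query, boosted_ids),
        reverse=True,
    )


# Hypothetical catalog entries.
entries = [
    {"id": "ce1", "title": "Heapsort visualization",
     "description": "Step-by-step animation"},
    {"id": "ce2", "title": "Sorting overview",
     "description": "Covers heapsort and quicksort"},
    {"id": "ce3", "title": "Graph traversal",
     "description": "BFS and DFS"},
]

# Assume "ce2" is among the top contents of a catalog-entry-dominated cluster.
ordered = rank(entries, "heapsort", boosted_ids={"ce2"})
```

With these assumed values, the cluster bonus lifts `ce2` (description match only) above `ce1` (title match only), which is the kind of reordering the cluster weighting is meant to produce.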

Is there any connection between the users?
• Generate DSN between users based on various criteria such as pageviews, ratings, reviews, etc.

Within the network, are there groups of users?
• Graph Partitioning (Modularity clustering)
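Modularity clustering searches for the partition of the graph that maximizes the modularity score Q. The sketch below only computes Q for a given partition of an unweighted, undirected graph without self-loops; it does not perform the search, and the poster does not state which modularity algorithm was used. The function name and sample graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges:        list of (node_a, node_b) pairs, each edge listed once.
    community_of: dict mapping every node to its community label.

    Uses Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] += 1
        degree[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

# Splitting at the bridge versus putting every node in one group.
q_split = modularity(edges, {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1})
q_single = modularity(edges, {n: 0 for n in "abcdef"})
```

The split partition scores higher than the single-group partition, so a modularity-based method would prefer it as the grouping of users.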

What are the interests of the groups?
• Topic modeling
    • Latent Dirichlet allocation (LDA)

Monika Akbar and Clifford A. Shaffer Department of Computer Science, Virginia Tech

{amonika, shaffer}@vt.edu

Findings

Clusters and cluster sizes of Sept-10 DSN (left) and Oct-10 DSN (right) (k = 10)

Topics and cluster-wise topic distribution for Sept-10 (top) and Oct-10 (bottom) DSN

AlgoViz DSN for Sept-10 (top) and Oct-10 (bottom) (k = 10)

This work has been supported by NSF DUE-0836940, DUE-0937863, DUE-0840719.

Catalog Entry (CE) Ranking

[Figure: bar charts of cluster sizes, one with clusters 1-7 and one with clusters 1-5. x-axis: Cluster #; y-axis: # of Nodes.]

[Figure: User groups in AlgoViz (October 2010). Clusters 1-7. x-axis: Projected Dimension 1; y-axis: Projected Dimension 2.]

[Figure: User groups in AlgoViz (September 2010). Clusters 1-5. x-axis: Projected Dimension 1; y-axis: Projected Dimension 2.]