Information Access on the Social Web
DESCRIPTION
A presentation given in May 2013. The talk covers related work developed at the IRIS lab at the School of Information Sciences, University of Pittsburgh.
TRANSCRIPT
Information Access on the Social Web
2013/5/20
Agenda
• Social Information Access
• Collaborative Exploratory Search
• Integrated People Search
• Dual Perspective Image Finding
• Virtual Reference and Community-based QA
• Closing Remarks
Information Access
• Information Access: an interactive process that starts with a user noticing his/her information needs and ends with the user obtaining the necessary information
• Iterative, with multiple stages and many loops back
[Figure: the social web — User Generated Content and Social Networks]
Social Information Access
• Social Information Access: information access using "community wisdom"
  - Distilled from the actions of a real or virtual community
  - Collaboration in an explicit or implicit manner
• Social information access technologies capitalize on the natural tendency of people to follow direct and indirect cues of others' activities
  - Going to a restaurant that attracts many customers
  - Asking others what movies to watch
Space of Social Information Access
• [Brusilovsky2012]'s taxonomy for social information access
• However, this taxonomy can be extended along several dimensions
More Social Information Access
• Collaboration can be explicit, not just implicit
• Explicit Collaboration: users work as a team to complete the same task
[Figure: Implicit Collaboration vs. Explicit Collaboration]
More Social Information Access
• Target can be people, not just documents
  - Documents can be used to represent people
• People should be modeled in a network, not just by themselves
  - Relationships are as important as the documents generated by the people
More Social Information Access
• Content can be user generated, not just expert generated
• User generated content is noisy and flat, but easy to scale up
[Figure: Expert Generated Content vs. User Generated Content]
More Social Information Access
• Can social information access learn from library services, or vice versa?
Explicit Collaboration: Collaborative Exploratory Search
In collaboration with Zhen Yue and Shuguang Han
Collaborative Exploratory Search
• Complex information needs such as exploratory search may lead to collaboration
  - Students working on a class project
  - Friends looking for information to plan a vacation
• Goal 1: Understand the group activities involved in the collaborative exploratory search process
• Goal 2: Accommodate and support user activities in collaborative exploratory search
[Overview: analyzing the collaborative search process (data analysis methods, user behavior) feeds into designing the collaborative search system]
CollabSearch System
• Search functions: web search
• Save/edit/rate/tag web pages and snippets
• Space for the search task description
CollabSearch System
• http://crystal.exp.sis.pitt.edu:8080/CollaborativeSearch/
Categorizing User Actions

Action          Description
Query (Q)       A user issues a query or clicks on a query from the search history
View (V)        A user clicks on a result in the returned result list
Save (S)        A user saves a snippet or bookmarks a webpage
Workspace (W)   A user clicks on or edits an item saved in the workspace
Topic (T)       A user clicks on the topic statement
Chat (C)        A user sends a message or views the chat history
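To make the coding scheme concrete, here is a minimal sketch of turning raw interaction logs into per-user action sequences for the analyses that follow. The raw event names and the log format are illustrative assumptions, not the actual CollabSearch log schema.

```python
# Minimal sketch (assumed log schema): map raw CollabSearch-style log
# events onto the six action codes above, producing per-user sequences.
EVENT_TO_ACTION = {
    "issue_query": "Q", "click_history_query": "Q",
    "click_result": "V",
    "save_snippet": "S", "bookmark_page": "S",
    "click_workspace_item": "W", "edit_workspace_item": "W",
    "view_topic": "T",
    "send_chat_message": "C", "view_chat_history": "C",
}

def to_action_sequences(log_events):
    """log_events: iterable of (user_id, event_name) tuples in time order."""
    sequences = {}
    for user_id, event in log_events:
        action = EVENT_TO_ACTION.get(event)
        if action is not None:  # skip events outside the coding scheme
            sequences.setdefault(user_id, []).append(action)
    return sequences
```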
Pre-Query Actions
[Figure: transition diagrams of the actions that immediately precede a query, for collaborative search (Collect, View, Workspace, Query, Chat, Topic) and individual search (Collect, View, Workspace, Topic, Query)]
• Possible benefit of explicit communication in collaborative search: helping users to generate queries
Pre-Chat and Post-Chat Analysis
[Figure: transition diagrams of the actions occurring immediately before and after chat events, over the same action set (Chat, View, Query, Collect/Save, Workspace, Topic)]
• Reasons that trigger chatting: the need to discuss task requirements and the items collected
• Post-chat actions: checking the workspace, issuing a query, checking the topic statement
Dimensions of User Interactions
Interactions in Collaborative Search

Interaction                               Description
Search-query-self (Q)                     A user issues a query
Select-item-self (V)                      A user clicks on a result in the returned result list
Capture-item-self (S)                     A user saves a snippet or bookmarks a webpage
Scan-list of saved items-mixed (Wm)       A user checks the workspace without clicking on any particular item
Select-single saved item-self (Ws)        A user clicks on an item in the workspace saved by him/herself
Select-single saved item-partner (Wp)     A user clicks on an item in the workspace saved by the partner
Scan-topic-shared (T)                     A user clicks the topic statement for viewing
Communicate-messages-self (Cs)            A user sends a message to the other user
Communicate-message-partner (Cp)          A user receives a message from the other user
Transition Analysis Using HMM
• Disadvantages of previous methods
  - Missing a global view of search behaviors
  - Hard to determine the segments of sequential behaviors as different search states
• Model search states as hidden variables
• A Hidden Markov Model for action transitions (a fitting sketch follows)
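As a concrete illustration of the modeling step, below is a minimal sketch of fitting such a model over coded action sequences with the hmmlearn library. The library choice, hyperparameters, and helper names are assumptions rather than the authors' actual setup; only the action alphabet and the six-state structure come from these slides.

```python
# Minimal sketch (not the authors' code): fit an HMM over coded action
# sequences so that hidden states play the role of latent search states.
# Assumes the hmmlearn package; CategoricalHMM handles discrete symbols.
import numpy as np
from hmmlearn import hmm

ACTIONS = ["Q", "V", "S", "Wm", "Ws", "Wp", "T", "Cs", "Cp"]
CODE = {a: i for i, a in enumerate(ACTIONS)}

def fit_action_hmm(sequences, n_states=6, seed=0):
    """sequences: list of action-code lists, e.g. [["Q", "V", "S"], ...]."""
    X = np.concatenate(
        [[CODE[a] for a in seq] for seq in sequences]
    ).reshape(-1, 1)                           # hmmlearn expects a column vector
    lengths = [len(seq) for seq in sequences]  # marks sequence boundaries
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=100,
                               random_state=seed)
    model.fit(X, lengths)
    return model

# model.transmat_ holds state-to-state transition probabilities and
# model.emissionprob_ holds per-state action probabilities, which together
# yield tables like the one on the following slide.
```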
Hidden States and Transitions
[Table: probabilities linking six hidden states (HQ, HV, HS, HD, HW, HC) to the nine observed actions (Q, V, S, Wm, Ws, Wp, T, Cs, Cp); the column alignment was lost in transcription. Dominant values per state: HQ 0.82 and 0.13; HV 0.87 and 0.1; HS 0.88; HD 0.36, 0.36, and 0.21; HW 0.37, 0.44, and 0.12; HC 0.44 and 0.47]
[Figures: hidden-state transition diagrams for collaborative search and individual search]
What We Learned
• Collaborative search processes have patterns
• More collaboration-oriented actions occur as the collaboration level increases
• Transitions within search-oriented actions and within collaboration-oriented actions are more frequent than transitions between them, in all three conditions
• Explicit and implicit communication have potential benefits in helping users generate query ideas
People Search in Their Networks: PeopleExplorer
In collaboration with Shuguang Han and Zhen Yue
Search for People
• People use search engines on a daily basis
• Many searches are people search
  - Find appropriate collaborators
  - Find conference program committee members
  - Find qualified job candidates
  - Find appropriate experts to answer questions in online QA (Question Answering) systems
Limitations of Existing People Search
Example query: "experts in information retrieval"
• Unable to support diverse tasks in one system
  - Each system focuses on one type of people search task, but task contexts are diverse
  - Finding keynote speakers: authoritativeness
  - Finding collaborators: social closeness
• Unable to support personalized user preferences
  - Even in the same task, users have different preferences, e.g., finding thesis committee members
  - Some users prefer to find a domain expert
  - Some prefer to find someone who is easy to connect with
• Unable to support the exploratory search process
  - Exploration is an iterative and interactive process; users may need to learn the importance of each criterion
The PeopleExplorer System
• The proposed method
  - Represent task diversity through multiple facets
  - Allow users to personalize the importance of each facet
  - Support exploring the importance of each facet (the system explains why each candidate is returned in the candidate surrogate)
• The dataset
  - 151,165 ACM-hosted conference papers
  - In the computer science and information science fields, from 2000 to 2011
  - 209,592 unique authors
  - Title, abstract, and authors of each paper
[Screenshot of PeopleExplorer for query = "recommender system", showing users' exploration on three facets, the candidate surrogate, and the workspace]
Method
• Content relevance
  - I: Retrieve a set of relevant documents for each query
  - II: Pass the score from each document to each of its authors
  - III: Rank authors by their integrated scores
  - Title and abstract were indexed for document search
• Authoritativeness
  - PageRank (illustration of authoritativeness from Wikipedia)
  - Each coauthorship link is decomposed into two directional links
(A sketch of both facets follows.)
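For concreteness, here is a minimal sketch of these first two facets under stated assumptions: document retrieval scores are taken as given, score propagation is a plain per-author sum (the slide does not specify the integration function), and networkx's pagerank stands in for the PageRank computation over the decomposed directed coauthor graph.

```python
# Minimal sketch (assumptions noted above): author relevance from document
# scores, plus PageRank authoritativeness over a directed coauthor graph.
from collections import defaultdict
import networkx as nx

def author_relevance(doc_scores, doc_authors):
    """doc_scores: {doc_id: retrieval score}; doc_authors: {doc_id: [authors]}.
    Passes each document's score to its authors and sums per author."""
    scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        for author in doc_authors.get(doc_id, []):
            scores[author] += score
    return dict(scores)

def author_authoritativeness(coauthor_pairs):
    """coauthor_pairs: iterable of (a, b) coauthorships. Each undirected
    link is decomposed into two directional links, as on the slide."""
    g = nx.DiGraph()
    for a, b in coauthor_pairs:
        g.add_edge(a, b)
        g.add_edge(b, a)
    return nx.pagerank(g, alpha=0.85)
```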
Method
• Social similarity
  - Measured by the number of common coauthors two people share
  - Users can also build their social profiles; similarity is then the aggregated similarity over all connections in the profile
• Integration
  - Log-linear combination, with weights indicating the importance of each facet (a sketch follows)
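And a sketch of the integration step, assuming the three facet scores above are available per candidate. The epsilon smoothing and the exact weight semantics are illustrative guesses, not the published formulation.

```python
# Minimal sketch: log-linear combination of non-negative facet scores.
# The eps smoothing guards log(0); weights might come from the UI sliders.
import math

def log_linear_score(relevance, authority, similarity, weights, eps=1e-9):
    """weights: (w_rel, w_auth, w_sim), e.g. set from the facet slider bars."""
    facets = (relevance, authority, similarity)
    return sum(w * math.log(f + eps) for w, f in zip(weights, facets))

def rank_candidates(candidates, weights):
    """candidates: {author: (relevance, authority, similarity)}."""
    return sorted(candidates,
                  key=lambda a: log_linear_score(*candidates[a], weights),
                  reverse=True)
```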
Experiment Design
• Exploratory people search tasks
  - Conference mentor finding; expectation: authoritativeness is important
  - New coauthor finding; expectation: more social similarity
  - External thesis committee member finding; expectation: both social similarity and authoritativeness are important
  - Reviewer suggestion; expectation: less social similarity
• Two systems: the experimental system and a baseline system
[Screenshots: the experimental system, the baseline system, and example tasks]
Participants
• 24 participants: 10 female, 14 male
• All are PhD students majoring in computer science or information science, from 8 universities
  - Research interests are diverse: information retrieval, computer graphics, GIS, information security, health informatics, graphical models
• 92% of them searched at least 2-3 times a month
• 67% of them searched for people at least once a week in academic search engines such as Google Scholar and Microsoft Academic Search
Result Analysis
• System usage: how did people use the two systems?
• System performance: is the experimental system better in terms of both efficiency and effectiveness?
• User perceptions: how did users perceive the performance of the systems?
• Task contexts: the importance of each facet in different tasks and among different users
System Usage
• Number of unique queries (NUQ): overall, no significant difference, but significant for the conference mentor finding task (p=0.037)
• Number of result pages users clicked (NP): the experimental system is significantly better
• Number of times users tuned the slider bars (NSB)
System Effectiveness
• Average rank position of the marked candidates (ARP)
• Average relevance score over the five selected candidates (ARel)
• Number of returned candidates (NC) and number of unique candidates (NUC) generated by the system for each task (a computation sketch follows)
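As a small illustration, the four measures could be computed from interaction logs roughly as follows; the input shapes are assumptions, not the actual log format.

```python
# Minimal sketch (assumed inputs): the four effectiveness measures.
def effectiveness_metrics(ranked, marked, relevance):
    """ranked: candidate ids in rank order as shown to the user;
    marked: ids the user marked; relevance: {id: judged score} for the
    five selected candidates."""
    marked_set = set(marked)
    positions = [i + 1 for i, c in enumerate(ranked) if c in marked_set]
    arp = sum(positions) / len(positions) if positions else None
    arel = sum(relevance.values()) / len(relevance) if relevance else None
    return {"ARP": arp, "ARel": arel,
            "NC": len(ranked),        # returned candidates
            "NUC": len(set(ranked))}  # unique candidates
```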
System Efficiency
• Overall, no significant difference has been found
  - But significant (p = 0.1) for Task 1
  - The time spent finding the first candidate differs significantly for Task 2
User Perceptions
• Usability questions
  - Interaction between task and satisfaction in Q4
Task Contexts Analysis
• The importance of each facet in different tasks
  - Record the weights of each facet when a candidate is selected
  - If the weight of a facet ≠ 0, we consider this facet important
  - Count the number of candidates for which each facet was viewed as important
Insights
• People-finding tasks do need iterative and interactive system support
  - Users only need to check fewer unique candidates in the top rank positions
  - The candidates are more relevant; overall, users were more satisfied
• The importance of each facet varies across tasks
Combine Expert Content with User Generated Content
In collaboration with Yiling Lin and Peter Brusilovsky
Finding Images
• A great number of images are created daily
• Most images come without textual content
• Teenie Harris Archive: 80,000 images; 5 full-time catalogers worked for 5 years
• The Flamenco search interface
• Images can be found more efficiently and effectively when more than one information indicator is provided to users in a combined manner
• Driven by the notion of information scent in information foraging theory
Dual-Perspective Image Finding
• Provide sufficiently strong information scent
• Allow users to incrementally reach their goal
• Offer efficient and informative feedback
(A filtering sketch follows.)
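One plausible reading of the dual-perspective idea is to let expert subject headings and user tags filter the collection simultaneously, so that a choice in one facet narrows the options in the other. The sketch below follows that reading under assumed data structures; it is not the actual DPIF implementation.

```python
# Minimal sketch (not the actual DPIF code): filter images by an expert
# subject heading and a user tag at the same time, then report the facet
# values that remain, which act as information scent for the next step.
def dual_filter(images, subject=None, tag=None):
    """images: list of dicts with 'id', 'subjects' (expert-assigned
    headings) and 'tags' (user-contributed tags)."""
    hits = [img for img in images
            if (subject is None or subject in img["subjects"])
            and (tag is None or tag in img["tags"])]
    remaining_subjects = {s for img in hits for s in img["subjects"]}
    remaining_tags = {t for img in hits for t in img["tags"]}
    return hits, remaining_subjects, remaining_tags
```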
Information Flow
[Figure: information flow in the system]
Research Design
• The "Teenie" Harris collection at the Carnegie Museum of Art
  - 1,986 of these images
  - 4,206 unique tags and 16,659 tag assignments collected via Mechanical Turk
• The Library of Congress image collection on Flickr
  - 12,541 images
  - 39,737 unique tags and 1,216,318 tag assignments, provided by the Library of Congress and Flickr users
[Screenshots: DPIF on the Flickr Library of Congress collection; Baseline 1: subject headings only; Baseline 2: tags only]
Research Design
• Controlled experiment with 52 participants from the greater Pittsburgh area
• Data recorded with multiple methods:
  - system logs
  - a pre-test (working memory capacity test and background survey)
  - post-questionnaires after each task, after each interface, and at the end
  - a structured interview
• Search tasks: lookup tasks and exploratory search tasks
Search Tasks
• Lookup search tasks: 3 per participant per system, 9 lookups in total
• Exploratory search tasks: 1 per participant per system, 3 exploratory tasks in total
Initial Results
[Figure: initial results]
Learn from the Current and the Traditional: Virtual Reference and Community-based QA
In collaboration with Dan Wu at Wuhan University
Two Social Services
• Community-based Q&A (cQA)
  - Provides knowledge sharing among community users
  - Has become a rapidly developing social collaboration platform
  - Builds a participatory platform for Q&A among community users
• Collaborative Digital Reference (cDR)
  - Extends reference service with patrons to online
  - Libraries with different expertise and working schedules collaborate
  - Libraries learn from and help each other
  - Allocates resources better according to users' needs
  - Builds a collaborative platform for Q&A among libraries
Research Motivations
• cQA and cDR are two instances of social Q&A
  - Both enable people to collaborate in answering questions
  - Important question: the differences and connections between cQA and cDR, and between different languages
• Research questions
  - Q1: Through the set of questions asked at the selected cQA and cDR sites, what are the service differences in terms of answer quality, responsiveness, and response time?
  - Q2: Do Chinese sites and English sites reveal differences in the answers to Q1?
  - Q3: What can be learned from cQA to improve cDR?
Study Design
• Sampling method
  - Aim to obtain a first-hand, focused evaluation
  - 2 languages: English and Chinese
  - 3 cDR sites and 3 cQA sites in each language
• 3×4 questions and domains
  - 3 domains: economics, literature, library science
  - 4 types of questions: factual, enumerative, definition, and explorative questions
• Reference answers: obtained from encyclopedias, Wikipedia, and online fact books; domain experts were also asked
Three Chinese cQA Sites
• Baidu Zhidao
• Sina iAsk
• SOSO Ask

Three English cQA Sites
• Yahoo! Answers
• Answers.com
• MadSci Net

Three Chinese cDR Sites
• Reference Service of China's National Science Digital Library
• Online Joint Knowledge Navigation
• The Collaborative Reference Network

Three English cDR Sites
• QuestionPoint
• IPL2
• Ask a Librarian
3×4 Questions and Domains

Factual questions
- Economics: 芒德尔•托宾效应最早是在哪篇文章中被提出? (In which paper was the idea later called the Mundell-Tobin effect first published?)
- Literature: 迄今为止,诺贝尔文学奖已有多少位获奖者? (How many people have won the Nobel Prize in Literature so far?)
- Library science: 世界图书首都评选是从哪一年开始的? (In which year did the selection of the "World Book Capital" begin?)

Enumerative questions
- Economics: 根据最新统计数据,中国有哪些企业进入世界五百强前十名之列? (According to the latest statistics, which Chinese corporations are among the top ten of the world's top five hundred enterprises?)
- Literature: 在所有诺贝尔文学奖得主中,有哪些人是从南美洲来的? (Among all the Nobel Literature Prize laureates, who are or were from South America?)
- Library science: 世界性的图书馆组织有哪些? (What international library organizations are there?)

Definition questions
- Economics: 什么是流动性补偿? (What does compensation for liquidity mean?)
- Literature: 什么是泛文学? (What does pan-literature mean?)
- Library science: 什么是iSchool? (What is an iSchool?)

Explorative questions
- Economics: 全球经济复苏还需要多长时间?为什么? (How much time is still needed for the global economy to recover? Why?)
- Literature: 博客对大众文学有哪些影响? (What impacts have blogs had on popular literature?)
- Library science: 数字图书馆的快速发展会给实体图书馆带来哪些方面的重大变化?为什么会有这些变化? (What important changes will the rapid development of digital libraries bring to physical libraries, and why?)
Results: Chinese Sites (correct answers / total answers per question)

cQA sites: Baidu Zhidao, Sina iAsk, SOSO Ask. cDR sites: Reference Service of China's National Science Digital Library, Online Joint Knowledge Navigation, Collaborative Reference Network of Zhongshan Library at Guangdong Province.

Factual questions
- Economics:       0/0  0/0  0/0 | 0/1  1/1  0/0
- Literature:      0/1  1/1  1/1 | 1/1  1/1  1/1
- Library science: 0/0  1/1  1/1 | 1/1  1/1  1/1
Enumerative questions
- Economics:       1/1  1/2  2/2 | 1/1  1/1  0/0
- Literature:      0/0  1/1  2/2 | 0/0  1/1  1/1
- Library science: 1/2  1/1  2/2 | 1/1  1/1  1/1
Definition questions
- Economics:       1/1  1/1  2/2 | 1/1  1/1  1/1
- Literature:      1/2  1/2  1/2 | 0/0  1/1  1/1
- Library science: 1/2  1/1  1/1 | 1/1  1/1  0/1
Explorative questions
- Economics:       0/0  1/1  3/3 | 1/1  0/1  0/0
- Literature:      1/2  0/0  1/2 | 0/0  1/1  0/1
- Library science: 1/2  0/0  1/1 | 0/1  0/1  0/0

(Columns follow the site order above; "|" separates the cQA sites from the cDR sites.)
cQA: 43 answers for the 12 questions (3.58 answers per question on average); 33 answers correct (76.7%)
- Factual: 5 answers, 4 correct
- Enumerative: 13 answers, 11 correct
- Definition: 14 answers, 10 correct
- Explorative: 11 answers, 8 correct

cDR: 29 answers for the 12 questions (2.42 answers per question on average); 23 answers correct (79.3%)
- Factual: 8 answers, 7 correct
- Enumerative: 7 answers, 7 correct
- Definition: 8 answers, 7 correct
- Explorative: 6 answers, 2 correct
Results: Chinese Sites (ranked by correct answer rate)

1. SOSO Ask — 8 of 12 questions received answers; 17/19 answers correct (89.5%); average answering time: 1 day, 20 hours, 3 minutes
2. Online Joint Knowledge Navigation — 12 of 12; 10/12 correct (83.3%); average answering time: 3 days
3. Sina iAsk — 8 of 12; 9/11 correct (80%); average answering time: 13 days, 19 hours, 5 minutes
4. Reference Service of China's National Science Digital Library — 9 of 12; 7/9 correct (77.7%); average answering time: 7 days
5. Collaborative Reference Network of Zhongshan Library at Guangdong Province — 8 of 12; 6/8 correct (75%); average answering time: 8 hours
6. Baidu Zhidao — 8 of 12; 7/13 correct (53.8%); average answering time: 6 days, 15 hours
• SOSO Ask responded relatively quickly and produced the highest number of answers
• Online Joint Knowledge Navigation answered all 12 questions and responded very quickly
• The Collaborative Reference Network of Zhongshan Library had the shortest response time, but the quality of its answers varied
• cQA was not faster at providing answers when compared to cDR
Results: English Sites (correct answers / total answers per question)

cQA site: Yahoo! Answers. cDR sites: Library of Congress, IPL2.

Factual questions
- Economics:       0/0 | 1/1  1/1
- Literature:      1/1 | 1/1  1/1
- Library science: 0/0 | 1/1  1/1
Enumerative questions
- Economics:       1/1 | 0/0  1/1
- Literature:      1/2 | 0/0  1/1
- Library science: 1/1 | 1/1  1/1
Definition questions
- Economics:       1/2 | 0/0  1/1
- Literature:      0/1 | 0/0  1/1
- Library science: 1/1 | 1/1  1/1
Explorative questions
- Economics:       2/2 | 0/0  1/1
- Literature:      0/1 | 0/0  1/1
- Library science: 2/3 | 0/1  1/1

(Columns follow the site order above; "|" separates the cQA site from the cDR sites.)
cQA — Yahoo! Answers: 15 answers for 10 of the 12 questions; 10 answers correct (66.7%)
- Factual: 1 answer, 1 correct
- Enumerative: 4 answers, 3 correct
- Definition: 4 answers, 2 correct
- Explorative: 6 answers, 4 correct

cDR — IPL2 provided 12 answers to all 12 questions (100% correct); the Library of Congress provided 6 answers to 6 of the 12 questions (83.3% correct)
- Factual: 6 answers, all correct
- Enumerative: 4 answers, 4 correct
- Definition: 4 answers, 4 correct
- Explorative: 4 answers, 3 correct
Results: English Sites (ranked by correct answer rate)

1. IPL2 — 12 of 12 questions received answers; 12/12 answers correct (100%); average answering time: 14 days
2. Library of Congress — 6 of 12; 5/6 correct (83.3%); average answering time: 17 days
3. Yahoo! Answers — 10 of 12; 10/15 correct (66.7%); average answering time: 2 days
4. MadSci Net — 1 of 12; 0/1 correct (0%)
5. Ask a Librarian — 1 of 12; 0/0 correct (0%)
6. Answers.com — 0 of 12; 0/0 correct (0%)
• IPL2 is the best online service: a 100% correct answer rate, and all answers are of high quality
• Yahoo! Answers has the fastest answering speed and the largest number of answers, but its answer quality is lower than IPL2's and LC's
• Answers.com and Ask a Librarian did not answer our questions
• LC answered only half of our questions, and took a long time to answer
Between Chinese and English Sites
• Many similarities
  - cQA sites are good at enumerative and definition questions, and to some degree explorative questions, but perform poorly on factual questions, particularly in economics
  - cDR sites are more reliable and produce higher quality answers, even though the number of answers is smaller
• Some differences
  - Screening questions differently: our questions to the Chinese sites produced more responses, whereas two English sites did not answer our questions at all
  - Response time is shorter on the Chinese sites; only Yahoo! Answers is in a comparable response timeframe. Perhaps both IPL2 and the Library of Congress are very busy
What We Learned: Pros and Cons of cQA and cDR
• cQA's advantages: large user groups, more answers returned
  - Consistent with Shachaf (2009): cQA is more heavily utilized
• cQA's limitations: information of uneven quality and the shallowness of some answers
• cDR's advantages: rich and reliable reference resources, and the high literacy skills of reference librarians
  - Consistent with Connaway and Radford (2011): information quality and interpersonal relationships
  - Consistent with Shachaf (2009): librarians are valuable for answering more difficult questions
• cDR's limitations: slow response speed and smaller numbers of answers
What We Learned: Inspirations
• How to speed up and scale up cDR?
  - Make the cDR reference process and results as open as possible
  - Lankes (2004): the general DR model contains a Q&A archive
  - Add commenting, tagging, and discussion functions to cDR question and answer collections
  - Build more feedback and participatory mechanisms
• Use cQA answers in cDR services
  - An answer to the Connaway and Radford (2011) challenge: "users still do not really know about digital reference services"
  - Some high quality cDR services could make themselves available on well-known cQA sites, integrating cDR with cQA
What We Learned: Limitations of the Study
• The number of samples is small, considering the popularity of cQA sites, the many other cDR services, and the wide range of questions asked
• Our selected questions and our native language might have triggered or prevented some responses from the English sites
• It would be better to have a survey associated with the questions we asked, so that the reasons behind certain reactions from the sites (such as the lack of returned answers to our questions) could be better explained
Closing Remarks
Collaborative Search 2.0
• Better models of users and teams
  - People in different populations
  - Teams of bigger sizes
  - Team members with different roles
• New mobile and mixed platforms
  - Smartphones, tablets, laptops, etc.
• Collaborative search process and systems
  - Collaborative search is increasingly popular
  - But collaborative search systems are not widely used
Heterogeneously Social
• Heterogeneous information resources
  - Articles, web pages, blogs, tweets, Facebook posts, YouTube videos, search histories
• Heterogeneous platforms
  - Communication networks
  - Interaction platforms: mobiles, tablets, laptops, desktops, etc.
Integration with LIS
• Social information access develops many new technologies for information organization, storage, and retrieval
  - Scalable and quick, but noisy and shallow
• How can such knowledge be integrated with traditional expert generated knowledge?
  - Clean and deep, but static and hard to scale
Privacy and Security
• Social information in general is open
  - But people are still concerned about their privacy, particularly when information can be easily aggregated
• Social information belongs to the sites
  - But it is part of people's identities and assets
  - How do we maintain, preserve, and safeguard social information?
Access Increasingly More Social
• Know the boundaries of social information access
  - How to identify which tasks are good fits for social information access?
• How to effectively integrate social networking, direct messaging, and social recommendations with current search facilities?
Related Publications
• Dan Wu, Daqing He (2013). A study on Q&A services between community-based question answering and collaborative digital reference in two languages. iConference 2013 Proceedings, pp. 326-337. doi:10.9776/13205
• Shuguang Han, Zhen Yue, Daqing He (2013). Automatically identifying search tactics in individual information seeking: A hidden Markov model approach. iConference 2013.
• Zhen Yue, Shuguang Han, Daqing He (2012). A comparison of action transitions in individual and collaborative exploratory web search. The Eighth Asia Information Retrieval Societies Conference (AIRS 2012).
• Zhen Yue, Jiepu Jiang, Shuguang Han, Daqing He (2012). Where do the query terms come from? An analysis of query reformulation in collaborative web search. In Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM '12), pp. 2595-2598.
• Shuguang Han, Daqing He, Zhen Yue, Jiepu Jiang, Wei Jeng (2012). IRIS-IPS: An interactive people search system for the HCIR Challenge. 2012 Human-Computer Information Retrieval Symposium (HCIR Challenge 2012), Boston, IBM Research.
• Zhen Yue, Shuguang Han, Jiepu Jiang, Daqing He (2012). Search tactics as means of examining search processes in collaborative exploratory web search. In Proceedings of the 5th PhD Workshop on Information and Knowledge (PIKM '12), ACM, New York, NY, USA, pp. 59-66. doi:10.1145/2389686.2389699
Really Tough Questions Please!!!
Acknowledgement
• The work presented here was conducted by faculty and students in the Information Retrieval, Integration and Synthesis (IRIS) Lab at the School of Information Sciences
• Other people who participated in these works include Prof. Peter Brusilovsky, Prof. Dan Wu, etc.
• These works were partially supported by the National Science Foundation