Information Access on the Social Web
DESCRIPTION
A presentation given in May 2013. The talk covers related work developed at the IRIS lab at the School of Information Sciences, University of Pittsburgh.
TRANSCRIPT
Information Access on the Social Web
2013/5/20
Agenda
• Social Information Access
• Collaborative Exploratory Search
• Integrated People Search
• Dual Perspective Image Finding
• Virtual Reference and Community-based QA
• Closing Remarks
Information Access
• Information Access: an interactive process that starts with a user noticing his/her information needs and ends with the user obtaining the necessary information
• Iterative, with multiple stages and many loops back
[Figure: the social web — User Generated Content and Social Networks]
Social Information Access
• Social Information Access: information access using "community wisdom"
  - Distilled from the actions of a real or virtual community
  - Collaboration in an explicit or implicit manner
• Social information access technologies capitalize on the natural tendency of people to follow direct and indirect cues of others' activities
  - Going to a restaurant that attracts many customers
  - Asking others what movies to watch
Space of Social Information Access
• [Brusilovsky2012]'s taxonomy for social information access
• However, this taxonomy can be extended along several dimensions
More Social Information Access
• Collaboration can be explicit, not just implicit
• Explicit Collaboration: users work as a team to complete the same task
[Figure: Implicit Collaboration vs. Explicit Collaboration]
More Social Information Access
• Target can be people, not just documents
  - Documents can be used to represent people
• People should be modeled in a network, not just by themselves
  - Relationships are as important as the documents generated by the people
More Social Information Access
• Content can be user generated, not just expert generated
• User generated content is noisy and flat, but easy to scale up
[Figure: Expert Generated Content vs. User Generated Content]
More Social Information Access
• Can social information access learn from library services, or vice versa?
Explicit Collaboration: Collaborative Exploratory Search
In collaboration with Zhen Yue and Shuguang Han
Collaborative Exploratory Search
• Complex information needs such as exploratory search may lead to collaboration
  - Students working on a class project
  - Friends looking for information to plan a vacation
• Goal 1: Understand the group activities involved in the collaborative exploratory search process
• Goal 2: Accommodate and support user activities in collaborative exploratory search
[Overview: analyzing the collaborative search process (data analysis methods, user behavior) feeds into designing the collaborative search system]
CollabSearch System
• Search functions: web search
• Save/edit/rate/tag web pages and snippets
• Space for the search task description
CollabSearch System
• http://crystal.exp.sis.pitt.edu:8080/CollaborativeSearch/
Categorizing User Actions

Action          Description
Query (Q)       A user issues a query or clicks on a query from the search history
View (V)        A user clicks on a result in the returned result list
Save (S)        A user saves a snippet or bookmarks a webpage
Workspace (W)   A user clicks on or edits an item saved in the workspace
Topic (T)       A user clicks on the topic statement
Chat (C)        A user sends a message or views the chat history
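To make the coding scheme concrete, here is a minimal sketch of turning raw interaction logs into per-user action sequences for the analyses that follow. The raw event names and the log format are illustrative assumptions, not the actual CollabSearch log schema.

```python
# Minimal sketch (assumed log schema): map raw CollabSearch-style log
# events onto the six action codes above, producing per-user sequences.
EVENT_TO_ACTION = {
    "issue_query": "Q", "click_history_query": "Q",
    "click_result": "V",
    "save_snippet": "S", "bookmark_page": "S",
    "click_workspace_item": "W", "edit_workspace_item": "W",
    "view_topic": "T",
    "send_chat_message": "C", "view_chat_history": "C",
}

def to_action_sequences(log_events):
    """log_events: iterable of (user_id, event_name) tuples in time order."""
    sequences = {}
    for user_id, event in log_events:
        action = EVENT_TO_ACTION.get(event)
        if action is not None:  # skip events outside the coding scheme
            sequences.setdefault(user_id, []).append(action)
    return sequences
```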
Pre-Query Actions
[Figure: transition diagrams of the actions that immediately precede a query, for collaborative search (Collect, View, Workspace, Query, Chat, Topic) and individual search (Collect, View, Workspace, Topic, Query)]
• Possible benefit of explicit communication in collaborative search: helping users to generate queries
Pre-Chat and Post-Chat Analysis
[Figure: transition diagrams of the actions occurring immediately before and after chat events, over the same action set (Chat, View, Query, Collect/Save, Workspace, Topic)]
• Reasons that trigger chatting: the need to discuss task requirements and the items collected
• Post-chat actions: checking the workspace, issuing a query, checking the topic statement
Dimensions of User Interactions
Interactions in Collaborative Search

Interaction                               Description
Search-query-self (Q)                     A user issues a query
Select-item-self (V)                      A user clicks on a result in the returned result list
Capture-item-self (S)                     A user saves a snippet or bookmarks a webpage
Scan-list of saved items-mixed (Wm)       A user checks the workspace without clicking on any particular item
Select-single saved item-self (Ws)        A user clicks on an item in the workspace saved by him/herself
Select-single saved item-partner (Wp)     A user clicks on an item in the workspace saved by the partner
Scan-topic-shared (T)                     A user clicks the topic statement for viewing
Communicate-messages-self (Cs)            A user sends a message to the other user
Communicate-message-partner (Cp)          A user receives a message from the other user
Transition Analysis Using HMM
• Disadvantages of previous methods
  - Missing a global view of search behaviors
  - Hard to determine the segments of sequential behaviors as different search states
• Model search states as hidden variables
• A Hidden Markov Model for action transitions (a fitting sketch follows)
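As a concrete illustration of the modeling step, below is a minimal sketch of fitting such a model over coded action sequences with the hmmlearn library. The library choice, hyperparameters, and helper names are assumptions rather than the authors' actual setup; only the action alphabet and the six-state structure come from these slides.

```python
# Minimal sketch (not the authors' code): fit an HMM over coded action
# sequences so that hidden states play the role of latent search states.
# Assumes the hmmlearn package; CategoricalHMM handles discrete symbols.
import numpy as np
from hmmlearn import hmm

ACTIONS = ["Q", "V", "S", "Wm", "Ws", "Wp", "T", "Cs", "Cp"]
CODE = {a: i for i, a in enumerate(ACTIONS)}

def fit_action_hmm(sequences, n_states=6, seed=0):
    """sequences: list of action-code lists, e.g. [["Q", "V", "S"], ...]."""
    X = np.concatenate(
        [[CODE[a] for a in seq] for seq in sequences]
    ).reshape(-1, 1)                           # hmmlearn expects a column vector
    lengths = [len(seq) for seq in sequences]  # marks sequence boundaries
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=100,
                               random_state=seed)
    model.fit(X, lengths)
    return model

# model.transmat_ holds state-to-state transition probabilities and
# model.emissionprob_ holds per-state action probabilities, which together
# yield tables like the one on the following slide.
```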
Hidden States and Transitions
[Table: probabilities linking six hidden states (HQ, HV, HS, HD, HW, HC) to the nine observed actions (Q, V, S, Wm, Ws, Wp, T, Cs, Cp); the column alignment was lost in transcription. Dominant values per state: HQ 0.82 and 0.13; HV 0.87 and 0.1; HS 0.88; HD 0.36, 0.36, and 0.21; HW 0.37, 0.44, and 0.12; HC 0.44 and 0.47]
[Figures: hidden-state transition diagrams for collaborative search and individual search]
What We Learned
• Collaborative search processes have patterns
• More collaboration-oriented actions occur as the collaboration level increases
• Transitions within search-oriented actions and within collaboration-oriented actions are more frequent than transitions between them, in all three conditions
• Explicit and implicit communication have potential benefits in helping users generate query ideas
People Search in Their Networks: PeopleExplorer
In collaboration with Shuguang Han and Zhen Yue
Search for People
• People use search engines on a daily basis
• Many searches are people search
  - Find appropriate collaborators
  - Find conference program committee members
  - Find qualified job candidates
  - Find appropriate experts to answer questions in online QA (Question Answering) systems
Limitations of Existing People Search
Example query: "experts in information retrieval"
• Unable to support diverse tasks in one system
  - Each system focuses on one type of people search task, but task contexts are diverse
  - Finding keynote speakers: authoritativeness
  - Finding collaborators: social closeness
• Unable to support personalized user preferences
  - Even in the same task, users have different preferences, e.g., finding thesis committee members
  - Some users prefer to find a domain expert
  - Some prefer to find someone who is easy to connect with
• Unable to support the exploratory search process
  - Exploration is an iterative and interactive process; users may need to learn the importance of each criterion
The PeopleExplorer System
• The proposed method
  - Represent task diversity through multiple facets
  - Allow users to personalize the importance of each facet
  - Support exploring the importance of each facet (the system explains why each candidate is returned in the candidate surrogate)
• The dataset
  - 151,165 ACM-hosted conference papers
  - In the computer science and information science fields, from 2000 to 2011
  - 209,592 unique authors
  - Title, abstract, and authors of each paper
[Screenshot of PeopleExplorer for query = "recommender system", showing users' exploration on three facets, the candidate surrogate, and the workspace]
Method
• Content relevance
  - I: Retrieve a set of relevant documents for each query
  - II: Pass the score from each document to each of its authors
  - III: Rank authors by their integrated scores
  - Title and abstract were indexed for document search
• Authoritativeness
  - PageRank (illustration of authoritativeness from Wikipedia)
  - Each coauthorship link is decomposed into two directional links
(A sketch of both facets follows.)
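For concreteness, here is a minimal sketch of these first two facets under stated assumptions: document retrieval scores are taken as given, score propagation is a plain per-author sum (the slide does not specify the integration function), and networkx's pagerank stands in for the PageRank computation over the decomposed directed coauthor graph.

```python
# Minimal sketch (assumptions noted above): author relevance from document
# scores, plus PageRank authoritativeness over a directed coauthor graph.
from collections import defaultdict
import networkx as nx

def author_relevance(doc_scores, doc_authors):
    """doc_scores: {doc_id: retrieval score}; doc_authors: {doc_id: [authors]}.
    Passes each document's score to its authors and sums per author."""
    scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        for author in doc_authors.get(doc_id, []):
            scores[author] += score
    return dict(scores)

def author_authoritativeness(coauthor_pairs):
    """coauthor_pairs: iterable of (a, b) coauthorships. Each undirected
    link is decomposed into two directional links, as on the slide."""
    g = nx.DiGraph()
    for a, b in coauthor_pairs:
        g.add_edge(a, b)
        g.add_edge(b, a)
    return nx.pagerank(g, alpha=0.85)
```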
Method
• Social similarity
  - Measured by the number of common coauthors two people share
  - Users can also build their social profiles; similarity is then the aggregated similarity over all connections in the profile
• Integration
  - Log-linear combination, with weights indicating the importance of each facet (a sketch follows)
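And a sketch of the integration step, assuming the three facet scores above are available per candidate. The epsilon smoothing and the exact weight semantics are illustrative guesses, not the published formulation.

```python
# Minimal sketch: log-linear combination of non-negative facet scores.
# The eps smoothing guards log(0); weights might come from the UI sliders.
import math

def log_linear_score(relevance, authority, similarity, weights, eps=1e-9):
    """weights: (w_rel, w_auth, w_sim), e.g. set from the facet slider bars."""
    facets = (relevance, authority, similarity)
    return sum(w * math.log(f + eps) for w, f in zip(weights, facets))

def rank_candidates(candidates, weights):
    """candidates: {author: (relevance, authority, similarity)}."""
    return sorted(candidates,
                  key=lambda a: log_linear_score(*candidates[a], weights),
                  reverse=True)
```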
Experiment Design
• Exploratory people search tasks
  - Conference mentor finding; expectation: authoritativeness is important
  - New coauthor finding; expectation: more social similarity
  - External thesis committee member finding; expectation: both social similarity and authoritativeness are important
  - Reviewer suggestion; expectation: less social similarity
• Two systems: the experimental system and a baseline system
[Screenshots: the experimental system, the baseline system, and example tasks]
Participants
• 24 participants: 10 female, 14 male
• All are PhD students majoring in computer science or information science, from 8 universities
  - Research interests are diverse: information retrieval, computer graphics, GIS, information security, health informatics, graphical models
• 92% of them searched at least 2-3 times a month
• 67% of them searched for people at least once a week in academic search engines such as Google Scholar and Microsoft Academic Search
Result Analysis
• System usage: how did people use the two systems?
• System performance: is the experimental system better in terms of both efficiency and effectiveness?
• User perceptions: how did users perceive the performance of the systems?
• Task contexts: the importance of each facet in different tasks and among different users
System Usage
• Number of unique queries (NUQ): overall, no significant difference, but significant for the conference mentor finding task (p=0.037)
• Number of result pages users clicked (NP): the experimental system is significantly better
• Number of times users tuned the slider bars (NSB)
System Effectiveness
• Average rank position of the marked candidates (ARP)
• Average relevance score over the five selected candidates (ARel)
• Number of returned candidates (NC) and number of unique candidates (NUC) generated by the system for each task (a computation sketch follows)
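As a small illustration, the four measures could be computed from interaction logs roughly as follows; the input shapes are assumptions, not the actual log format.

```python
# Minimal sketch (assumed inputs): the four effectiveness measures.
def effectiveness_metrics(ranked, marked, relevance):
    """ranked: candidate ids in rank order as shown to the user;
    marked: ids the user marked; relevance: {id: judged score} for the
    five selected candidates."""
    marked_set = set(marked)
    positions = [i + 1 for i, c in enumerate(ranked) if c in marked_set]
    arp = sum(positions) / len(positions) if positions else None
    arel = sum(relevance.values()) / len(relevance) if relevance else None
    return {"ARP": arp, "ARel": arel,
            "NC": len(ranked),        # returned candidates
            "NUC": len(set(ranked))}  # unique candidates
```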
System Efficiency
• Overall, no significant difference has been found
  - But significant (p = 0.1) for Task 1
  - The time spent finding the first candidate differs significantly for Task 2
User Perceptions
• Usability questions
  - Interaction between task and satisfaction in Q4
Task Contexts Analysis
• The importance of each facet in different tasks
  - Record the weights of each facet when a candidate is selected
  - If the weight of a facet ≠ 0, we consider this facet important
  - Count the number of candidates for which each facet was viewed as important
Insights
• People-finding tasks do need iterative and interactive system support
  - Users only need to check fewer unique candidates in the top rank positions
  - The candidates are more relevant; overall, users were more satisfied
• The importance of each facet varies across tasks
Combine Expert Content with User Generated Content
In collaboration with Yiling Lin and Peter Brusilovsky
Finding Images
• A great number of images are created daily
• Most images come without textual content
• Teenie Harris Archive: 80,000 images; 5 full-time catalogers worked for 5 years
• The Flamenco search interface
• Images can be found more efficiently and effectively when more than one information indicator is provided to users in a combined manner
• Driven by the notion of information scent in information foraging theory
Dual-Perspective Image Finding
• Provide sufficiently strong information scent
• Allow users to incrementally reach their goal
• Offer efficient and informative feedback
(A filtering sketch follows.)
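One plausible reading of the dual-perspective idea is to let expert subject headings and user tags filter the collection simultaneously, so that a choice in one facet narrows the options in the other. The sketch below follows that reading under assumed data structures; it is not the actual DPIF implementation.

```python
# Minimal sketch (not the actual DPIF code): filter images by an expert
# subject heading and a user tag at the same time, then report the facet
# values that remain, which act as information scent for the next step.
def dual_filter(images, subject=None, tag=None):
    """images: list of dicts with 'id', 'subjects' (expert-assigned
    headings) and 'tags' (user-contributed tags)."""
    hits = [img for img in images
            if (subject is None or subject in img["subjects"])
            and (tag is None or tag in img["tags"])]
    remaining_subjects = {s for img in hits for s in img["subjects"]}
    remaining_tags = {t for img in hits for t in img["tags"]}
    return hits, remaining_subjects, remaining_tags
```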
Information Flow
[Figure: information flow in the system]
Research Design
• The "Teenie" Harris collection at the Carnegie Museum of Art
  - 1,986 of these images
  - 4,206 unique tags and 16,659 tag assignments collected via Mechanical Turk
• The Library of Congress image collection on Flickr
  - 12,541 images
  - 39,737 unique tags and 1,216,318 tag assignments, provided by the Library of Congress and Flickr users
[Screenshots: DPIF on the Flickr Library of Congress collection; Baseline 1: subject headings only; Baseline 2: tags only]
Research Design
• Controlled experiment with 52 participants from the greater Pittsburgh area
• Data recorded with multiple methods:
  - system logs
  - a pre-test (working memory capacity test and background survey)
  - post-questionnaires after each task, after each interface, and at the end
  - a structured interview
• Search tasks: lookup tasks and exploratory search tasks
Search Tasks
• Lookup search tasks: 3 per participant per system, 9 lookups in total
• Exploratory search tasks: 1 per participant per system, 3 exploratory tasks in total
Initial Results
[Figure: initial results]
Learn from the Current and the Traditional: Virtual Reference and Community-based QA
In collaboration with Dan Wu at Wuhan University
Two Social Services
• Community-based Q&A (cQA)
  - Provides knowledge sharing among community users
  - Has become a rapidly developing social collaboration platform
  - Builds a participatory platform for Q&A among community users
• Collaborative Digital Reference (cDR)
  - Extends reference service with patrons to online
  - Libraries with different expertise and working schedules collaborate
  - Libraries learn from and help each other
  - Allocates resources better according to users' needs
  - Builds a collaborative platform for Q&A among libraries
Research Motivations
• cQA and cDR are two instances of social Q&A
  - Both enable people to collaborate in answering questions
  - Important question: the differences and connections between cQA and cDR, and between different languages
• Research questions
  - Q1: Through the set of questions asked at the selected cQA and cDR sites, what are the service differences in terms of answer quality, responsiveness, and response time?
  - Q2: Do Chinese sites and English sites reveal differences in the answers to Q1?
  - Q3: What can be learned from cQA to improve cDR?
Study Design
• Sampling method
  - Aim to obtain a first-hand, focused evaluation
  - 2 languages: English and Chinese
  - 3 cDR sites and 3 cQA sites in each language
• 3×4 questions and domains
  - 3 domains: economics, literature, library science
  - 4 types of questions: factual, enumerative, definition, and explorative questions
• Reference answers: obtained from encyclopedias, Wikipedia, and online fact books; domain experts were also asked
Three Chinese cQA Sites
• Baidu Zhidao
• Sina iAsk
• SOSO Ask

Three English cQA Sites
• Yahoo! Answers
• Answers.com
• MadSci Net

Three Chinese cDR Sites
• Reference Service of China's National Science Digital Library
• Online Joint Knowledge Navigation
• The Collaborative Reference Network

Three English cDR Sites
• QuestionPoint
• IPL2
• Ask a Librarian
3×4 Questions and Domains

Factual questions
- Economics: 芒德尔•托宾效应最早是在哪篇文章中被提出? (In which paper was the idea later called the Mundell-Tobin effect first published?)
- Literature: 迄今为止,诺贝尔文学奖已有多少位获奖者? (How many people have won the Nobel Prize in Literature so far?)
- Library science: 世界图书首都评选是从哪一年开始的? (In which year did the selection of the "World Book Capital" begin?)

Enumerative questions
- Economics: 根据最新统计数据,中国有哪些企业进入世界五百强前十名之列? (According to the latest statistics, which Chinese corporations are among the top ten of the world's top five hundred enterprises?)
- Literature: 在所有诺贝尔文学奖得主中,有哪些人是从南美洲来的? (Among all the Nobel Literature Prize laureates, who are or were from South America?)
- Library science: 世界性的图书馆组织有哪些? (What international library organizations are there?)

Definition questions
- Economics: 什么是流动性补偿? (What does compensation for liquidity mean?)
- Literature: 什么是泛文学? (What does pan-literature mean?)
- Library science: 什么是iSchool? (What is an iSchool?)

Explorative questions
- Economics: 全球经济复苏还需要多长时间?为什么? (How much time is still needed for the global economy to recover? Why?)
- Literature: 博客对大众文学有哪些影响? (What impacts have blogs had on popular literature?)
- Library science: 数字图书馆的快速发展会给实体图书馆带来哪些方面的重大变化?为什么会有这些变化? (What important changes will the rapid development of digital libraries bring to physical libraries, and why?)
Results: Chinese Sites (correct answers / total answers per question)

cQA sites: Baidu Zhidao, Sina iAsk, SOSO Ask. cDR sites: Reference Service of China's National Science Digital Library, Online Joint Knowledge Navigation, Collaborative Reference Network of Zhongshan Library at Guangdong Province.

Factual questions
- Economics:       0/0  0/0  0/0 | 0/1  1/1  0/0
- Literature:      0/1  1/1  1/1 | 1/1  1/1  1/1
- Library science: 0/0  1/1  1/1 | 1/1  1/1  1/1
Enumerative questions
- Economics:       1/1  1/2  2/2 | 1/1  1/1  0/0
- Literature:      0/0  1/1  2/2 | 0/0  1/1  1/1
- Library science: 1/2  1/1  2/2 | 1/1  1/1  1/1
Definition questions
- Economics:       1/1  1/1  2/2 | 1/1  1/1  1/1
- Literature:      1/2  1/2  1/2 | 0/0  1/1  1/1
- Library science: 1/2  1/1  1/1 | 1/1  1/1  0/1
Explorative questions
- Economics:       0/0  1/1  3/3 | 1/1  0/1  0/0
- Literature:      1/2  0/0  1/2 | 0/0  1/1  0/1
- Library science: 1/2  0/0  1/1 | 0/1  0/1  0/0

(Columns follow the site order above; "|" separates the cQA sites from the cDR sites.)
cQA: 43 answers for the 12 questions (3.58 answers per question on average); 33 answers correct (76.7%)
- Factual: 5 answers, 4 correct
- Enumerative: 13 answers, 11 correct
- Definition: 14 answers, 10 correct
- Explorative: 11 answers, 8 correct

cDR: 29 answers for the 12 questions (2.42 answers per question on average); 23 answers correct (79.3%)
- Factual: 8 answers, 7 correct
- Enumerative: 7 answers, 7 correct
- Definition: 8 answers, 7 correct
- Explorative: 6 answers, 2 correct
Results: Chinese Sites (ranked by correct answer rate)

1. SOSO Ask — 8 of 12 questions received answers; 17/19 answers correct (89.5%); average answering time: 1 day, 20 hours, 3 minutes
2. Online Joint Knowledge Navigation — 12 of 12; 10/12 correct (83.3%); average answering time: 3 days
3. Sina iAsk — 8 of 12; 9/11 correct (80%); average answering time: 13 days, 19 hours, 5 minutes
4. Reference Service of China's National Science Digital Library — 9 of 12; 7/9 correct (77.7%); average answering time: 7 days
5. Collaborative Reference Network of Zhongshan Library at Guangdong Province — 8 of 12; 6/8 correct (75%); average answering time: 8 hours
6. Baidu Zhidao — 8 of 12; 7/13 correct (53.8%); average answering time: 6 days, 15 hours
• SOSO Ask responded relatively quickly and produced the highest number of answers
• Online Joint Knowledge Navigation answered all 12 questions and responded very quickly
• The Collaborative Reference Network of Zhongshan Library had the shortest response time, but the quality of its answers varied
• cQA was not faster at providing answers when compared to cDR
Results: English Sites (correct answers / total answers per question)

cQA site: Yahoo! Answers. cDR sites: Library of Congress, IPL2.

Factual questions
- Economics:       0/0 | 1/1  1/1
- Literature:      1/1 | 1/1  1/1
- Library science: 0/0 | 1/1  1/1
Enumerative questions
- Economics:       1/1 | 0/0  1/1
- Literature:      1/2 | 0/0  1/1
- Library science: 1/1 | 1/1  1/1
Definition questions
- Economics:       1/2 | 0/0  1/1
- Literature:      0/1 | 0/0  1/1
- Library science: 1/1 | 1/1  1/1
Explorative questions
- Economics:       2/2 | 0/0  1/1
- Literature:      0/1 | 0/0  1/1
- Library science: 2/3 | 0/1  1/1

(Columns follow the site order above; "|" separates the cQA site from the cDR sites.)
cQA — Yahoo! Answers: 15 answers for 10 of the 12 questions; 10 answers correct (66.7%)
- Factual: 1 answer, 1 correct
- Enumerative: 4 answers, 3 correct
- Definition: 4 answers, 2 correct
- Explorative: 6 answers, 4 correct

cDR — IPL2 provided 12 answers to all 12 questions (100% correct); the Library of Congress provided 6 answers to 6 of the 12 questions (83.3% correct)
- Factual: 6 answers, all correct
- Enumerative: 4 answers, 4 correct
- Definition: 4 answers, 4 correct
- Explorative: 4 answers, 3 correct
Results: English Sites (ranked by correct answer rate)

1. IPL2 — 12 of 12 questions received answers; 12/12 answers correct (100%); average answering time: 14 days
2. Library of Congress — 6 of 12; 5/6 correct (83.3%); average answering time: 17 days
3. Yahoo! Answers — 10 of 12; 10/15 correct (66.7%); average answering time: 2 days
4. MadSci Net — 1 of 12; 0/1 correct (0%)
5. Ask a Librarian — 1 of 12; 0/0 correct (0%)
6. Answers.com — 0 of 12; 0/0 correct (0%)
• IPL2 is the best online service: a 100% correct answer rate, and all answers are of high quality
• Yahoo! Answers has the fastest answering speed and the largest number of answers, but its answer quality is lower than IPL2's and LC's
• Answers.com and Ask a Librarian did not answer our questions
• LC answered only half of our questions, and took a long time to answer
Between Chinese and English Sites
• Many similarities
  - cQA sites are good at enumerative and definition questions, and to some degree explorative questions, but perform poorly on factual questions, particularly in economics
  - cDR sites are more reliable and produce higher quality answers, even though the number of answers is smaller
• Some differences
  - Screening questions differently: our questions to the Chinese sites produced more responses, whereas two English sites did not answer our questions at all
  - Response time is shorter on the Chinese sites; only Yahoo! Answers is in a comparable response timeframe. Perhaps both IPL2 and the Library of Congress are very busy
What We Learned: Pros and Cons of cQA and cDR
• cQA's advantages: large user groups, more answers returned
  - Consistent with Shachaf (2009): cQA is more heavily utilized
• cQA's limitations: information of uneven quality and the shallowness of some answers
• cDR's advantages: rich and reliable reference resources, and the high literacy skills of reference librarians
  - Consistent with Connaway and Radford (2011): information quality and interpersonal relationships
  - Consistent with Shachaf (2009): librarians are valuable for answering more difficult questions
• cDR's limitations: slow response speed and smaller numbers of answers
What We Learned: Inspirations
• How to speed up and scale up cDR?
  - Make the cDR reference process and results as open as possible
  - Lankes (2004): the general DR model contains a Q&A archive
  - Add commenting, tagging, and discussion functions to cDR question and answer collections
  - Build more feedback and participatory mechanisms
• Use cQA answers in cDR services
  - An answer to the Connaway and Radford (2011) challenge: "users still do not really know about digital reference services"
  - Some high quality cDR services could make themselves available on well-known cQA sites, integrating cDR with cQA
What We Learned: Limitations of the Study
• The number of samples is small, considering the popularity of cQA sites, the many other cDR services, and the wide range of questions asked
• Our selected questions and our native language might have triggered or prevented some responses from the English sites
• It would be better to have a survey associated with the questions we asked, so that the reasons behind certain reactions from the sites (such as the lack of returned answers to our questions) could be better explained
Closing Remarks
Collaborative Search 2.0
• Better models of users and teams
  - People in different populations
  - Teams of bigger sizes
  - Team members with different roles
• New mobile and mixed platforms
  - Smartphones, tablets, laptops, etc.
• Collaborative search process and systems
  - Collaborative search is increasingly popular
  - But collaborative search systems are not widely used
Heterogeneously Social
• Heterogeneous information resources
  - Articles, web pages, blogs, tweets, Facebook posts, YouTube videos, search histories
• Heterogeneous platforms
  - Communication networks
  - Interaction platforms: mobiles, tablets, laptops, desktops, etc.
Integration with LIS
• Social information access develops many new technologies for information organization, storage, and retrieval
  - Scalable and quick, but noisy and shallow
• How can such knowledge be integrated with traditional expert generated knowledge?
  - Clean and deep, but static and hard to scale
Privacy and Security
• Social information in general is open
  - But people are still concerned about their privacy, particularly when information can be easily aggregated
• Social information belongs to the sites
  - But it is part of people's identities and assets
  - How do we maintain, preserve, and safeguard social information?
Access Increasingly More Social
• Know the boundaries of social information access
  - How to identify which tasks are good fits for social information access?
• How to effectively integrate social networking, direct messaging, and social recommendations with current search facilities?
Related Publications
• Dan Wu, Daqing He (2013). A study on Q&A services between community-based question answering and collaborative digital reference in two languages. iConference 2013 Proceedings, pp. 326-337. doi:10.9776/13205
• Shuguang Han, Zhen Yue, Daqing He (2013). Automatically identifying search tactics in individual information seeking: A hidden Markov model approach. iConference 2013.
• Zhen Yue, Shuguang Han, Daqing He (2012). A comparison of action transitions in individual and collaborative exploratory web search. The Eighth Asia Information Retrieval Societies Conference (AIRS 2012).
• Zhen Yue, Jiepu Jiang, Shuguang Han, Daqing He (2012). Where do the query terms come from? An analysis of query reformulation in collaborative web search. In Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM '12), pp. 2595-2598.
• Shuguang Han, Daqing He, Zhen Yue, Jiepu Jiang, Wei Jeng (2012). IRIS-IPS: An interactive people search system for the HCIR Challenge. 2012 Human-Computer Information Retrieval Symposium (HCIR Challenge 2012), Boston, IBM Research.
• Zhen Yue, Shuguang Han, Jiepu Jiang, Daqing He (2012). Search tactics as means of examining search processes in collaborative exploratory web search. In Proceedings of the 5th PhD Workshop on Information and Knowledge (PIKM '12), ACM, New York, NY, USA, pp. 59-66. doi:10.1145/2389686.2389699
Really Tough Questions Please!!!
Acknowledgement
• The work presented here was conducted by faculty and students in the Information Retrieval, Integration and Synthesis (IRIS) Lab at the School of Information Sciences
• Other people who participated in these works include Prof. Peter Brusilovsky, Prof. Dan Wu, etc.
• These works were partially supported by the National Science Foundation