Crowdsourcing Location-based Queries. In: 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Seattle, WA, USA, March 21-25, 2011.



<ul><li><p>Crowdsourcing Location-based Queries</p><p>Muhammed Fatih Bulut, Yavuz Selim Yilmaz, Murat Demirbas</p><p>Ubiquitous Computing Lab, Department of Computer Science and Engineering, University at Buffalo, SUNY, Buffalo, NY, 14260. { mbulut | yavuzsel | demirbas }@buffalo.edu</p><p>Abstract: Location-based queries are quickly becoming ubiquitous. However, traditional search engines perform poorly for a significant fraction of location-based queries, which are non-factual (i.e., subjective, relative, or multi-dimensional). As an alternative, we investigate the feasibility of answering location-based queries by crowdsourcing over Twitter. More specifically, we study the effectiveness of employing location-based services (such as Foursquare) for finding appropriate people to answer a given location-based query. Our findings give insights into the feasibility of this approach and highlight some research challenges in social search engines.</p><p>I. INTRODUCTION</p><p>In recent years we have witnessed local businesses embracing location-based advertising over the Internet [7], as well as a rapid prevalence of smartphones that boosted mobile Internet access [6]. These two trends enabled location-based querying, which found applications in a variety of contexts, such as travel, food, entertainment, shopping, commuting, health, and work. Due to the convenience they provide to users, location-based queries are quickly becoming ubiquitous.</p><p>Location-based queries, however, have been an elusive target for traditional search engines. While traditional search engines perform very well in answering factual location-based queries (such as "hotels in Miami"), they perform poorly for non-factual location-based queries, which are more subjective, relative, and multi-dimensional (such as Anyone knows any cheap, good hotel, price ranges between 100 to 200 dollars in Miami?, Can anyone recommend a videographer in the Boston, MA area for videotaping a band?, Looking for an apartment in Boston. 
Anyone have any ideas? Oh, must love dogs.). Our preliminary investigation indicates that a significant fraction of location-based queries are non-factual queries. In our experiment, we mined Twitter for location-based queries and manually labeled the 269 queries we collected as factual or non-factual. We found that 63% of the queries were non-factual, while only 37% of them were factual. While this experiment does not provide conclusive evidence, since it is restricted to only one small dataset, it still indicates that a significant fraction of location-based queries are non-factual.</p><p>Without big breakthroughs in machine learning and natural language processing techniques, it is difficult to improve the performance of traditional search engines for these non-factual location-based queries. Instead, a promising alternative for location-based query answering is forwarding these queries to humans (i.e., crowdsourcing the query), as humans are capable of parsing, interpreting, and answering such queries, provided that they are familiar with the location. Recently Aardvark, a social search engine, demonstrated the feasibility of this approach [9]. Aardvark uses the social network of the asker to find suitable answerers for the query, forwards the query to the answerers, and returns any answer back to the asker.</p><p>In this paper, we investigate the problem of crowdsourcing location-based queries over Twitter. More specifically, we investigate whether the newly emerging location check-in services, such as Foursquare, can be useful for crowdsourcing location-based queries. In our crowdsourcing system, we use Foursquare to determine users that frequent the queried locale and that have interests in the queried category (e.g., food, nightlife). 
We hypothesize that these confirmed users will provide better answers than random people in that locale.</p><p>Our contributions in this paper are as follows:</p><p>• We design and implement a framework for crowdsourcing location-based queries. We build our framework on top of Twitter to utilize Twitter's large user community. The social network aspect of Twitter also provides benefits for answering location-based queries; we observed that some queried Twitter users retweet queries to their followers to provide better/more answers.</p><p>• We provide an analysis of crowdsourced location-based querying performance. Our findings show that employing Foursquare is helpful in finding appropriate people to answer location-based queries and increases the question-answering rates significantly for focused categories, such as food, nightlife, and college/education.</p><p>• We show that, compared to using traditional search engines, crowdsourcing location-based queries can be advantageous for non-factual queries. While our system can answer 75% of both the factual and non-factual queries it received, Google achieves a 78% answer rate for factual queries and only a 29% answer rate for non-factual queries. The latency of our crowdsourced system is also acceptable for many tasks; approximately 50% of answers are received within the first 20 minutes of the queries, and 90% of answers are received within 2 hours.</p><p>Second IEEE Workshop on Pervasive Collaboration and Social Networking</p><p>978-1-61284-937-9/11/$26.00 © 2011 IEEE 513</p></li><li><p>II. RELATED WORK</p><p>A. Location-based Queries</p><p>Due to the rapid increase in smart phone technology, location-based queries are getting more popular. Chow et al. [3] introduce a new database management system for scalable location-based networking services. 
This aims to provide a quick and scalable database management system for location-based queries.</p><p>Search companies and database researchers have worked on location-based queries, including the questions of spatial indexing, nearest neighbor search, and geometric approaches to location-based queries [11]. Those works have been algorithmic in nature, and assume that the queries specify the exact/correct object name, so the focus is on indexing and routing the query to an answer with minimum cost.</p><p>B. Question Answering Systems</p><p>A closely related work to ours is the work on Aardvark [9], a social search-engine startup which was recently acquired by Google. Aardvark accepts questions from users via IM, e-mail, web, SMS or voice and tries to route each question to the most appropriate person in the Aardvark community. Aardvark is reported to achieve a rate of 87.7% on answering questions, and for the rated answers, 84.5% of them are rated either good or OK. Our system is different from Aardvark in the following aspects: our system uses Twitter as a middleware, and employs the built-in social networking features of Twitter to its advantage. It also employs Foursquare to increase the question-answering rates. Finally, in our system, both questions and answers are crowdsourced for moderation to evaluate the system performance in more detail.</p><p>ChaCha [1] is another question answering engine based on SMS. It uses its users' location information (set by the users) and user statistics in order to find the appropriate human operator to whom to forward the question for answering. The human operator sends the answer via SMS and gets a commission per answered question. ChaCha also performs caching to answer previously encountered questions quickly. SMSFind is a recent study [2] that aims to automate what ChaCha does. SMSFind accepts questions via SMS and handles question answering by using web search engines in the back-end. 
It focuses on long-tail queries, whose topics are individually unpopular but collectively very numerous (inspired by the long-tail phenomenon). This type of query is therefore difficult to answer for existing automated SMS search engines, which rely on a set of known topics. SMSFind uses a ranking algorithm to find the best matching snippets from search engine results by using both Information Retrieval and Natural Language Processing techniques. The best matching answer snippet is sent back to the querier via SMS. In this way, SMSFind achieves 57.3% accuracy overall and 47.8% accuracy on the long-tail queries.</p><p>C. Crowdsourcing and Collaboration</p><p>In previous work [5], we proposed that Twitter can be used as a middleware for crowdsourced sensing and collaboration applications. In that work, for the sake of simplicity, the crowdsourced question was the current weather condition, a question which everyone in that locale can answer. In comparison, our work in this paper aims to answer general location-based questions and aims to find the most appropriate person to answer each question by employing question topic categorization and information from location check-in services.</p><p>TABLE I. CITIES: Boston, Buffalo, Chicago, Dallas, Houston, Las Vegas, Los Angeles, Miami, New York, San Francisco.</p><p>TABLE II. LOCATION TYPES (CATEGORIES) IN FOURSQUARE: Arts &amp; Entertainment, College &amp; Education, Food, Home/Work/Other, Nightlife, Parks &amp; Outdoors, Shops, Travel.</p><p>A recent work [16] focuses on crowdsourcing image search. Computer algorithms face inherent difficulties in image recognition, especially in unconstrained environments, due to variations in pose, illumination, size and occlusion. As an alternative, the paper proposes a system which combines the intelligence of humans with machines by employing the crowds on Mechanical Turk to validate answers given by the computer after processing the queried image. 
The findings in the paper support the feasibility and importance of crowdsourced human validation.</p><p>In other related work, Lange et al. [10] address some challenges in location-based social networks and propose offering relevant services to users by utilizing users' geolocation information. Also, Schuster et al. [12] propose a service-oriented approach which provides an environment for social collaborative application development.</p><p>III. SYSTEM ARCHITECTURE</p><p>In this section, we present the main components of our crowdsourced location-based question-answering system. These components are: Question Collector, Validator, Asker, Answer Collector and Forwarder. The overall architecture of the system can be seen in Figure 1.</p><p>A. Question Collector</p><p>Question Collector collects questions from Twitter using Twitter's Search API [13]. Since our focus is on location-based questions, we restricted this component to only collect questions that mention a city name among those given in Table I. In order to recognize tweets that ask a question, we require a question mark "?" in the text. However, just requiring a question mark returns several false positives; therefore we performed experiments to determine a good set of keywords that appear in location-based question tweets. We determined that the keywords "anyone", "any suggestion" and "where" work best for collecting location-based question tweets.</p></li><li><p>Fig. 1. Overall System Architecture</p><p>Thus, we use the following template to collect tweets (the order of the elements is not important): [question keyword][text][location keyword][?]. The following tweet is an example collected by our system using the above template: [Anyone][have dinner suggestions while in][San Francisco][?].</p><p>Finally, we automatically filter inappropriate and spam tweets by employing a bag of blacklisted words and filtering out tweets with links ("http://") and mentions ("@"). 
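The collection template and spam filters described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes plain substring matching over the Table I city names and the question keywords, and the blacklist contents are hypothetical (the paper does not list its bag of blacklisted words):

```python
# Cities (Table I) and question keywords from Section III-A.
CITIES = ["Boston", "Buffalo", "Chicago", "Dallas", "Houston", "Las Vegas",
          "Los Angeles", "Miami", "New York", "San Francisco"]
KEYWORDS = ["anyone", "any suggestion", "where"]
BLACKLIST = {"viagra", "casino"}  # hypothetical blacklisted words

def is_location_question(tweet: str) -> bool:
    """Check the template [question keyword][text][location keyword][?]
    (order-insensitive) and apply the link/mention/blacklist filters."""
    text = tweet.lower()
    if "?" not in tweet:                      # must ask a question
        return False
    if "http://" in text or "@" in tweet:     # drop links and conversations
        return False
    if any(word in text for word in BLACKLIST):
        return False
    return (any(k in text for k in KEYWORDS)
            and any(c.lower() in text for c in CITIES))
```

For instance, "Anyone have dinner suggestions while in San Francisco?" passes all the checks, while a tweet containing a link or an "@" mention is dropped.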
We observe that, in general, people do not insert links in their questions, and "@" indicates a conversation in the tweet.</p><p>B. Validator (for questions)</p><p>Despite our automatic filtering in the Question Collector component, it is hard to eliminate all inappropriate tweets, due to sarcasm and word-play in some tweets [4]. As an extra level of filtering, we employ moderators to validate the questions before forwarding them to get answers.</p><p>The unmoderated questions are kept in a queue and forwarded to available moderators over Twitter. Validating a question is a simple task; one can easily label, e.g., "Can anyone suggest a good, cheap hotel in Miami?" as a valid question without having detailed information on Miami or hotels. For our initial setup, we use moderators selected from our lab and university; however, we plan to expand the pool of moderators by introducing a karma system where people can earn karma by validating questions. Members can then use these karma points to ask questions with higher priority in our system.</p><p>The moderators are asked to label the category and quality of questions. For categorizing a question, moderators use one of the Foursquare location types given in Table II. For ranking a question, the moderators use the ranks in Table III.</p><p>TABLE III. RANKING LEVELS FOR A QUESTION: 1 = Inappropriate, 2 = Can be asked, 3 = Good Question.</p><p>A question is ranked Inappropriate if the question cannot be asked. The other two ranking levels are used for appropriate questions, with Good Question being more clear and better articulated than Can be asked.</p><p>Our system sends three consecutive tweets to our moderators for each question. The first lists the categories in Table II, the second lists the rankings in Table III, and the last is the question that needs to be validated. In addition, the system adds a random character at the end of each tweet. This is due to Twitter's policy that one cannot update the same status more than a few times [15]. 
So, the tweet flow for validation of a question is as follows:</p><p>@username A:Arts&amp;Entertainment, C:College&amp;Education, F:Food, H:Home/Work/Other, N:Nightlife, P:Parks&amp;Outdoors, S:Shops, T:Travel</p><p>@username 1: inappropriate, 2: can be asked, 3: good question</p><p>@username [Question]</p><p>After sending the above tweets, our system waits for a reply from the moderator. For the sake of simplicity, we set a strict reply format: the moderator should first give the category, by its initial letter, and then the rank of the question.</p></li><li><p>For example, the reply "N2" means that the tweet has a category of Nightlife and is ranked as "can be asked". If the moderator does not reply within a given time limit, then we forward the question to another available moderator. Our system does not forward further questions to a moderator until the moderator replies to the previous one. This provides an easy way for one to opt out of serving as a moderator.</p><p>After the validation step is completed, the questions are ready to be used by Asker.</p><p>C. Asker</p><p>The Asker component forwards validated questions to people identified as most appropriate to answer those questions. To identify appropriate people we use two different processes. In the first approach, Twitter users are selected based on their bio information: our system checks users' bios in order to select users living in the city that the question mentions. In the second approach, our system selects Twitter users who link their Foursquare accounts with their Twitter accounts. We use Foursquare because it is a location-based service where people check in to locations they frequent. The person who checks in to a location more than anyone else is called the mayor of that location; as such, users holding mayor badges have knowledge about their locations and the categories these locations imply. This makes our expert-finding problem easier.</p><p>Next, Asker forwards the question to the person over Twitter. 
If the person does not respond to our question, then our...</p></li></ul>
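The strict moderator reply format described above lends itself to trivial parsing. The following sketch is our own illustration (the function name and the None-on-error behavior are assumptions, not the paper's code), using the category initials from Table II and the ranking levels from Table III:

```python
# Category initials (Table II) and ranking levels (Table III).
CATEGORIES = {"A": "Arts & Entertainment", "C": "College & Education",
              "F": "Food", "H": "Home/Work/Other", "N": "Nightlife",
              "P": "Parks & Outdoors", "S": "Shops", "T": "Travel"}
RANKS = {1: "Inappropriate", 2: "Can be asked", 3: "Good Question"}

def parse_moderator_reply(reply: str):
    """Parse a strict-format reply such as 'N2' into (category, rank),
    returning None when the reply violates the format."""
    reply = reply.strip().upper()
    if len(reply) != 2 or reply[0] not in CATEGORIES or not reply[1].isdigit():
        return None
    rank = int(reply[1])
    if rank not in RANKS:
        return None
    return CATEGORIES[reply[0]], RANKS[rank]
```

Under this scheme, "N2" decodes to Nightlife / "Can be asked", and a malformed reply can simply trigger forwarding the question to the next available moderator, as the paper's timeout mechanism already does.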
