[ieee 2011 ieee international conference on pervasive computing and communications workshops (percom...

Crowdsourcing Location-based Queries

Muhammed Fatih Bulut, Yavuz Selim Yilmaz, Murat Demirbas

Ubiquitous Computing Lab
Department of Computer Science and Engineering
University at Buffalo, SUNY, Buffalo, NY, 14260
{ mbulut | yavuzsel | demirbas }@buffalo.edu

Abstract: Location-based queries are quickly becoming ubiquitous. However, traditional search engines perform poorly for a significant fraction of location-based queries, which are non-factual (i.e., subjective, relative, or multi-dimensional). As an alternative, we investigate the feasibility of answering location-based queries by crowdsourcing over Twitter. More specifically, we study the effectiveness of employing location-based services (such as Foursquare) for finding appropriate people to answer a given location-based query. Our findings give insights into the feasibility of this approach and highlight some research challenges in social search engines.

I. INTRODUCTION

In recent years we have witnessed local businesses embracing location-based advertising over the Internet [7], as well as a rapid spread of smartphones that boosted mobile Internet access [6]. These two trends enabled location-based querying, which found applications in a variety of contexts, such as travel, food, entertainment, shopping, commuting, health, and work. Due to the convenience they provide to users, location-based queries are quickly becoming ubiquitous.

Location-based queries, however, have been an elusive target for traditional search engines. While traditional search engines perform very well in answering factual location-based queries (such as “hotels in Miami”), they perform poorly for non-factual location-based queries, which are more subjective, relative, and multi-dimensional (such as “Anyone knows any cheap, good hotel, price ranges between 100 to 200 dollars in Miami?”, “Can anyone recommend a videographer in the Boston, MA area for videotaping a band?”, “Looking for an apartment in Boston. Anyone have any ideas? Oh, must love dogs.”). Our preliminary investigation indicates that a significant fraction of location-based queries are non-factual. In our experiment, we mined Twitter for location-based queries and manually labeled the 269 queries we collected as factual or non-factual. We found that 63% of the queries were non-factual, while only 37% were factual. This experiment is not conclusive evidence, since it is restricted to one small dataset, but it still suggests that non-factual queries make up a substantial share of location-based queries.

Without big breakthroughs in machine learning and natural language processing techniques, it is difficult to improve the performance of traditional search engines for location-based queries due to these non-factual queries. Instead, a promising alternative for location-based query answering is forwarding these queries to humans (i.e., crowdsourcing the query), as humans are capable of parsing, interpreting, and answering these queries, provided that they are familiar with the location, of course. Recently Aardvark, a social search engine, demonstrated the feasibility of this approach [9]. Aardvark uses the social network of the asker to find suitable answerers for the query, forwards the query to the answerers, and returns any answer back to the asker.

In this paper, we investigate the problem of crowdsourcing location-based queries over Twitter. More specifically, we investigate whether the newly emerging location check-in services, such as Foursquare, can be useful for crowdsourcing location-based queries. In our crowdsourcing system, we use Foursquare to determine users who frequent the queried locale and who are interested in the queried category (e.g., food, nightlife). We hypothesize that these confirmed users will provide better answers than random people in that locale.

Our contributions in this paper are as follows:

• We design and implement a framework for crowdsourcing location-based queries. We build our framework on top of Twitter to utilize Twitter’s large user community. The social network aspect of Twitter also benefits the answering of location-based queries; we observed that some queried Twitter users retweet queries to their followers to obtain better or more answers.

• We provide an analysis of crowdsourced location-based querying performance. Our findings show that employing Foursquare helps find appropriate people to answer location-based queries and increases the question-answering rates significantly for focused categories, such as food, nightlife, and college/education.

• We show that, compared to using traditional search engines, crowdsourcing location-based queries can be advantageous for non-factual queries. While our system can answer 75% of both the factual and the non-factual queries it received, Google achieves a 78% answer rate for factual queries and only a 29% answer rate for non-factual queries. The latency of our crowdsourced system is also acceptable for many tasks; approximately 50% of answers are received within the first 20 minutes of the queries, and 90% of answers are received within 2 hours of the queries.

Second IEEE Workshop on Pervasive Collaboration and Social Networking

978-1-61284-937-9/11/$26.00 ©2011 IEEE 513

II. RELATED WORK

A. Location-based Queries

With the rapid spread of smartphones, location-based queries are becoming more popular. Chow et al. [3] introduce a new database management system for scalable location-based networking services, which aims to provide quick and scalable data management for location-based queries.

Search companies and database researchers have worked on location-based queries, including the questions of spatial indexing and nearest neighbor search, and geometric approaches to location-based queries [11]. Those works have been algorithmic in nature, and assume that the queries specify the exact/correct object name, so the focus is on indexing and routing the query to an answer with minimum cost.

B. Question Answering Systems

A closely related work to ours is Aardvark [9], a social search-engine startup which was recently acquired by Google. Aardvark accepts questions from users via IM, e-mail, web, SMS or voice and tries to route each question to the most appropriate person in the Aardvark community. Aardvark is reported to answer 87.7% of questions, and 84.5% of the rated answers are rated either good or OK. Our system differs from Aardvark in the following aspects: our system uses Twitter as a middleware, and employs built-in social networking features of Twitter to its advantage. It also employs Foursquare to increase the question-answering rates. Finally, in our system, both questions and answers are crowdsourced for moderation to evaluate the system performance in more detail.

ChaCha [1] is another question answering engine based on SMS. It uses its users’ location information (set by users) and user statistics in order to find the appropriate human operator to forward the question to. The human operator sends the answer via SMS and gets a commission per answered question. ChaCha also performs caching to answer previously encountered questions quickly. SMSFind is a recent study [2] that aims to automate what ChaCha does. SMSFind accepts questions via SMS and handles question answering by using web search engines in the back-end. It focuses on long-tail queries, whose topics are individually unpopular but which are collectively numerous (inspired by the long-tail phenomenon). Such queries are difficult to answer for existing automated SMS search engines, which rely on a set of known topics. SMSFind uses a ranking algorithm to find the best matching snippets from search engines by using both Information Retrieval and Natural Language Processing techniques. The best matching answer snippet is sent via SMS back to the querier. In this way, SMSFind achieves 57.3% accuracy overall and 47.8% accuracy on long-tail queries.

C. Crowdsourcing and Collaboration

In previous work [5], we proposed that Twitter can be used as a middleware for crowdsourced sensing and collaboration applications. In that work, for the sake of simplicity, the crowdsourced question was the current weather condition, a question which everyone in that locale can answer. In comparison, our work in this paper aims to answer general location-based questions and to find the most appropriate person to answer each question by employing question topic categorization and information from location check-in services.

TABLE I
CITIES

Boston
Buffalo
Chicago
Dallas
Houston
Las Vegas
Los Angeles
Miami
New York
San Francisco

TABLE II
LOCATION TYPES (CATEGORIES) IN FOURSQUARE

Arts & Entertainment
College & Education
Food
Home/Work/Other
Nightlife
Parks & Outdoors
Shops
Travel

A recent work [16] focuses on crowdsourcing image search. Computer algorithms face inherent difficulties in image recognition, especially in unconstrained environments, due to variations in pose, illumination, size and occlusion. As an alternative, the paper proposes a system which combines the intelligence of humans with machines by employing the crowds in Mechanical Turk to validate answers given by the computer after processing the queried image. The findings in the paper support the feasibility and importance of crowdsourced human validation.

In other related work, Lange et al. [10] address some challenges in location-based social networks and propose offering relevant services to users by utilizing users’ geolocation information. Also, Schuster et al. [12] propose a service-oriented approach which provides an environment for social collaborative application development.

III. SYSTEM ARCHITECTURE

In this section, we present the main components of our crowdsourcing location-based question-answering system. These components are: “Question Collector”, “Validator”, “Asker”, “Answer Collector” and “Forwarder”. The overall architecture of the system is shown in Figure 1.

A. Question Collector

“Question Collector” collects questions from Twitter using Twitter’s Search API [13]. Since our focus is on location-based questions, we restricted this component to only collect questions that mention a city name among those given in Table I. In order to recognize tweets that ask a question, we require a question mark “?” in the text. However, just requiring a question mark returns several false positives; therefore we performed experiments to determine a good set of keywords that appear in location-based question tweets. We determined that the keywords “anyone”, “any suggestion” and “where” work best for collecting location-based question tweets. Thus, we use the following template to collect tweets (the order of the elements is not important):

[question keyword][text][location keyword][?]

The following tweet is an example collected by our system using the above template:

[Anyone][have dinner suggestions while in][San Francisco][?]

Fig. 1. Overall System Architecture

Finally, we automatically filter inappropriate and spam tweets by employing a bag of blacklisted words and filtering out tweets with links (“http://”) and mentions (“@”). We observe that in general people do not insert links in their questions, and “@” indicates a conversation in the tweet.
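The collection template and the filters above can be sketched as follows. This is a minimal illustration and not the system's actual implementation: the function name `is_candidate_question`, the case-insensitive substring matching, and the placeholder blacklist entry are assumptions, since the paper does not publish its word list or matching rules.

```python
import re

# City names from Table I and the question keywords reported in the text.
CITIES = ["Boston", "Buffalo", "Chicago", "Dallas", "Houston", "Las Vegas",
          "Los Angeles", "Miami", "New York", "San Francisco"]
QUESTION_KEYWORDS = ["anyone", "any suggestion", "where"]

# Hypothetical placeholder: the paper does not list its blacklisted words.
BLACKLIST = {"spamword"}


def is_candidate_question(tweet, blacklist=BLACKLIST):
    """Return True if the tweet fits the template and passes the filters."""
    text = tweet.lower()
    # Template: question mark, question keyword, and city name, in any order.
    if "?" not in text:
        return False
    if not any(keyword in text for keyword in QUESTION_KEYWORDS):
        return False
    if not any(city.lower() in text for city in CITIES):
        return False
    # Tweets with links or mentions are filtered out.
    if "http://" in text or "@" in text:
        return False
    # Tweets containing any blacklisted word are filtered out.
    words = set(re.findall(r"[a-z']+", text))
    return not (words & blacklist)
```

For the example tweet in the text, `is_candidate_question("Anyone have dinner suggestions while in San Francisco?")` passes every check. Substring matching is deliberately loose here (for instance, “where” also matches “everywhere”), which is one reason a human validation step remains useful.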

B. Validator (for questions)

Despite our automatic filtering in the “Question Collector” component, it is hard to eliminate all inappropriate tweets due to sarcasm and word-play in some tweets [4]. As an extra level of filtering, we employ moderators to validate the questions before forwarding the questions to get answers.

The unmoderated questions are kept in a queue and forwarded to available moderators over Twitter. Validating a question is a simple task; one can easily label, e.g., “Can anyone suggest a good, cheap hotel in Miami?” as a valid question without having detailed information on Miami or hotels. For our initial setup, we use moderators selected from our lab and university; however, we plan to expand the pool of moderators by introducing a karma system where people can earn karma by validating questions. Members can then use these karma points to ask our system questions with higher priority.

The moderators are asked to label the category and quality of questions. For categorizing questions, moderators use one of the Foursquare location types given in Table II. For ranking the question, the moderators use the ranks in Table III. A question is ranked “Inappropriate” if the question cannot be asked. The other two ranking levels are used for the appropriate questions, with “Good Questions” being clearer and better articulated than “Can be asked” questions.

TABLE III
RANKING LEVELS FOR A QUESTION

Rank  Meaning
1     Inappropriate
2     Can be asked
3     Good Question

Our system sends three consecutive tweets to a moderator for each question. The first lists the categories in Table II, the second lists the rankings in Table III, and the last contains the question that needs to be validated. In addition, the system appends a random character to the end of each tweet, because Twitter’s policy prevents posting the same status more than a few times [15]. The tweet flow for validating a question is as follows:

@username A:Arts&Entertainment,C:College&Education,F:Food,H:Home/Work/Other,N:Nightlife,P:Parks&Outdoors,S:Shops,T:Travel
@username 1: inappropriate, 2:can be asked, 3: good question
@username [Question]
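A minimal sketch of how these three validation tweets could be composed, with a random trailing character to keep repeated menus from being rejected as duplicate statuses. The function name `validation_tweets`, the choice of a lowercase letter as the random character, and the separating space before it are assumptions made for illustration.

```python
import random
import string

# Menu texts taken from the tweet flow described in the text.
CATEGORY_MENU = ("A:Arts&Entertainment,C:College&Education,F:Food,"
                 "H:Home/Work/Other,N:Nightlife,P:Parks&Outdoors,S:Shops,T:Travel")
RANK_MENU = "1: inappropriate, 2:can be asked, 3: good question"


def validation_tweets(moderator, question, rng=random):
    """Build the category menu, rank menu, and question tweets for a moderator."""
    tweets = []
    for body in (CATEGORY_MENU, RANK_MENU, question):
        # Assumed form of the random character: one lowercase letter.
        suffix = rng.choice(string.ascii_lowercase)
        tweets.append(f"@{moderator} {body} {suffix}")
    return tweets
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient when testing.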

After sending the above tweets, our system waits for a reply from the moderator. For the sake of simplicity, we set a strict reply format. The moderator should first include the category with its initial letter and then the rank of the question. For example, the reply “N2” means that the tweet has a category of “nightlife” and is ranked as “can be asked”. If the moderator does not reply within a given time limit, we forward the question to another available moderator. Our system does not forward further questions to a moderator until the moderator replies to the previous one. This provides an easy way for one to opt out of serving as a moderator.

After the validation step is completed, the questions are ready to be used by “Asker”.
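The strict reply format (category initial followed by rank) lends itself to a small parser. The sketch below is illustrative only: the function name `parse_moderator_reply`, the tolerance for a leading mention, lower-case letters, and surrounding whitespace are assumptions, not documented behavior of the system.

```python
import re

# Category initials from the validation tweet and ranks from Table III.
CATEGORIES = {
    "A": "Arts & Entertainment", "C": "College & Education", "F": "Food",
    "H": "Home/Work/Other", "N": "Nightlife", "P": "Parks & Outdoors",
    "S": "Shops", "T": "Travel",
}
QUESTION_RANKS = {1: "Inappropriate", 2: "Can be asked", 3: "Good Question"}

# Optional leading "@account ", then one category letter and one rank digit.
REPLY_PATTERN = re.compile(r"^\s*(?:@\w+\s+)?([ACFHNPST])\s*([123])\s*$",
                           re.IGNORECASE)


def parse_moderator_reply(reply):
    """Return (category, rank meaning), or None if the reply is malformed."""
    match = REPLY_PATTERN.match(reply)
    if match is None:
        return None
    letter, rank = match.group(1).upper(), int(match.group(2))
    return CATEGORIES[letter], QUESTION_RANKS[rank]
```

With this sketch, the reply “N2” from the text maps to the category “Nightlife” and the rank “Can be asked”. A malformed reply returns `None`, which a caller could treat like a missing reply.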

C. Asker

The Asker component forwards validated questions to people identified as most appropriate to answer those questions. To identify appropriate people we use two different approaches. In the first approach, Twitter users are selected based on their bio information: our system checks users’ bios in order to select users who live in the city mentioned in the question. In the second approach, our system selects Twitter users who link their Foursquare accounts with their Twitter accounts. We use Foursquare because it is a location-based service where people check in to locations they frequent. The person who checks in to a location more than anyone else is called the mayor of that location; as such, holders of the mayor badge can be expected to have knowledge about those locations and the categories which these locations imply. This makes our expert finding problem easier.
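The two selection approaches can be sketched over pre-fetched user data. Everything in this example is hypothetical: the `Candidate` record, its fields, and the choice to list Foursquare-confirmed users ahead of bio-matched users are illustrative assumptions, and no real Twitter or Foursquare API call is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical record assembled from a user's Twitter and Foursquare data."""
    username: str
    bio: str = ""
    # (city, category) pairs for venues where the user holds the mayor badge.
    mayorships: set = field(default_factory=set)
    # Set once a user ignores a question, so they are never asked again.
    opted_out: bool = False


def select_answerers(candidates, city, category):
    """Return Foursquare-confirmed users first, then users whose bio names the city."""
    eligible = [c for c in candidates if not c.opted_out]
    confirmed = [c for c in eligible if (city, category) in c.mayorships]
    by_bio = [c for c in eligible
              if c not in confirmed and city.lower() in c.bio.lower()]
    return confirmed + by_bio
```

Keeping the two groups separate also makes it easy to compare their reply rates, as done in the experiments.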

Next, Asker forwards the question to the person over Twitter. If the person does not respond to our question, then our system does not ask this person any further questions. This provides people an easy opt-out from our study.

@username Please help our research project by answering the following question. For more info visit [website link]
@username [Question]

Since Twitter enforces a rate limit (1000 status updates per day), one issue while asking questions is whether to ask in parallel or serially. Asking in parallel increases our chance of getting an answer quickly, but does not use Twitter’s rate limit wisely. From our experiments (see Figure 7), we find that over 50% of the people answer within the first 20 minutes, so asking serially seems to be a reasonable approach under the constraints of Twitter’s rate limit.
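To make the trade-off concrete, the following back-of-the-envelope sketch estimates how many questions one account could process per day as a function of how many people are contacted per question. The per-step tweet counts come from the tweet flows described in this section; the assumptions that a single account sends every tweet and that exactly one answer per question is validated and forwarded are simplifications introduced here.

```python
DAILY_STATUS_LIMIT = 1000   # rate limit quoted in the text
VALIDATION_TWEETS = 3       # category menu, rank menu, item to validate
ASK_TWEETS_PER_PERSON = 2   # request-for-help tweet plus the question
FORWARD_TWEETS = 3          # notice, question, and answer sent to the asker


def tweets_per_question(people_asked):
    """Status updates spent on one question under the stated assumptions."""
    return (VALIDATION_TWEETS                       # validate the question
            + ASK_TWEETS_PER_PERSON * people_asked  # contact candidates
            + VALIDATION_TWEETS                     # validate one answer (assumed)
            + FORWARD_TWEETS)                       # forward that answer


def questions_per_day(people_asked):
    """Number of whole questions that fit in the daily status budget."""
    return DAILY_STATUS_LIMIT // tweets_per_question(people_asked)
```

Under these assumptions, contacting 1 person per question costs 11 status updates and allows 90 questions per day, while contacting 20 people costs 49 status updates and allows 20 questions per day.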

D. Answer Collector

Answer Collector constantly polls our Twitter account (by using the Twitter API [13]) for any received answers to the questions asked. This component is similar to the “Question Collector” component. Once it gets answers from the queried Twitter users, the component processes them in the same way as the “Question Collector” component does. Answers that contain inappropriate words are filtered out by employing a set of blacklisted words. Finally, the component matches each answer with the respective question by using log data stored in the database, and stores the answer for the validation step.
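The matching step can be sketched with an in-memory stand-in for the log. The structure of `asked_log` (one outstanding question per username) and the function name `match_answers` are assumptions made for illustration; the paper states only that log data stored in the database is used.

```python
def match_answers(asked_log, replies, blacklist):
    """Pair each acceptable reply with the question its author was asked.

    asked_log: dict mapping username -> outstanding question text (assumed shape)
    replies:   iterable of (username, reply text) pairs polled from the account
    blacklist: set of lower-case words that disqualify a reply
    """
    matched = []
    for username, text in replies:
        question = asked_log.get(username)
        if question is None:
            continue  # reply from someone we did not ask
        if set(text.lower().split()) & blacklist:
            continue  # reply contains a blacklisted word
        matched.append((question, username, text))
    return matched
```

The matched triples would then be queued for answer validation.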

E. Validation (for answers)

Although our system screens answers for inappropriate words, it is still possible to have some answers which are inappropriate to forward. As we do for the questions, answers are also sent to moderators for validation. This step is similar to the validation step in “Validator (for questions)”, with the small exception of using Table IV for validation rankings.

TABLE IV
RANKING LEVELS FOR AN ANSWER

Rank  Meaning
1     Inappropriate
2     Can be forwarded
3     Good Answer

F. Forwarder

In this step, our system forwards the answers which have a ranking level of either 2 (Can be forwarded) or 3 (Good Answer) back to the Twitter users that asked the respective questions. Here are the tweets we send in this step:

@asker Our crowd-sourced question answering system found the following answer to your question contributed by @answerer
@asker [Question]
@asker [Answer]
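A minimal sketch of the forwarding rule: only answers ranked 2 or 3 produce tweets, and the three tweets follow the wording given above. The function name `forwarding_tweets` and the error raised for an unknown rank are assumptions.

```python
# Ranking levels from Table IV.
ANSWER_RANKS = {1: "Inappropriate", 2: "Can be forwarded", 3: "Good Answer"}


def forwarding_tweets(asker, answerer, question, answer, rank):
    """Return the tweets to send to the asker, or [] if the answer is withheld."""
    if rank not in ANSWER_RANKS:
        raise ValueError(f"unknown answer rank: {rank}")
    if rank < 2:
        return []  # rank 1 (Inappropriate) is never forwarded
    return [
        f"@{asker} Our crowd-sourced question answering system found the "
        f"following answer to your question contributed by @{answerer}",
        f"@{asker} {question}",
        f"@{asker} {answer}",
    ]
```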

IV. EXPERIMENTS

In this section we present experimental results from our system. We used Java as our primary programming language and utilized the open source libraries “twitter4j” [14] and “Foursquared” [8]. We wrote approximately 5 KLOC. We stored all our log data in our lab’s MySQL database, spread across 8 tables holding question, answer, user, and moderator data. Our question dataset consists of 269 questions that our system collected over Twitter and that the moderators validated as acceptable. We categorize questions as factual and non-factual. In our dataset, 63% of the questions are non-factual and the remaining 37% are factual. Table V shows examples of questions for each type.

Question Rates: We observed that Twitter is very noisy and has lots of “Inappropriate” tweets that cannot be forwarded as a question. Among all the questions that “Question Collector” collected, our moderators tagged approximately 75% as inappropriate to ask.

Answer Finding Rate: Here we present the answer finding rates for the questions in our dataset. In addition to using these questions in our system, we also forwarded the same questions to the Google search engine as a control experiment. We consider a question answered by Google if our human evaluators see the answer in the first 5 results. We consider a question answered by our system if our system provides at least one answer at ranking level 2 (Can be forwarded) or 3 (Good Answer).

One of the promising results for our system is that it answered approximately 75% of the questions, compared to Google’s 47% answer rate on these questions. Moreover, while our system answered 75% of both the factual and the non-factual questions, Google answered 78% of the factual questions and only 29% of the non-factual questions. This indicates that our system gives really promising results for non-factual questions, while not losing any performance on factual questions.

TABLE V
QUESTION TYPES

Factual Questions
• Anyone know what’s being filmed at B-way/Berwyn in Chicago?
• Does anyone know the exact location of the “three sisters” buildings in San Francisco?
• Looking for a good bike ride this weekend in the Boston area. Any ideas, anyone?

Non-factual Questions
• Looking for an apartment in Boston. Anyone have any ideas? Oh, must love dogs.
• Can anyone recommend a good hotel in San Francisco that’s baby friendly?
• Anyone know a good & cheap place to stay in new york? preferably without bedbugs
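The overall answer rates reported above follow from the per-type rates and the dataset composition (37% factual, 63% non-factual). The short check below recomputes them as a weighted average; the function and constant names are illustrative.

```python
# Dataset composition reported in the text.
FACTUAL_SHARE = 0.37
NON_FACTUAL_SHARE = 0.63


def overall_rate(factual_rate, non_factual_rate):
    """Weighted average of the per-type answer rates."""
    return FACTUAL_SHARE * factual_rate + NON_FACTUAL_SHARE * non_factual_rate


google = overall_rate(0.78, 0.29)  # 0.2886 + 0.1827, about 0.47
ours = overall_rate(0.75, 0.75)    # about 0.75
```

The weighted average for Google is about 0.4713, which matches the reported overall answer rate of 47%.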

Fig. 2. Reply Percentages

Reply Percentages: In Figure 2, as distinct from the answer finding rate, we compare the reply rates for each city. We found that Chicago is the most participating city, while Buffalo is the least participating city, and that the participation ratio is mostly similar for all cities at around 5%. The low reply rate is due to forwarding specific, sometimes difficult and unsolicited questions, and not providing any incentive.

Fig. 3. Reply Percentage based on Question Rankings

Reply Percentage based on Question Rankings: Figure 3 shows the reply percentage based on the question ranks. An interesting observation here is that people tend to answer “Good Questions” more than “Can be asked” questions. Although the difference between the reply percentages is not large, it suggests that the reply rate can be increased by asking better questions.

Fig. 4. Question Ranks vs. Answer Ranks

Question Ranks vs. Answer Ranks: Another interesting observation is on how the answer ranks change based on the question ranks. The questions that rank higher (better questions) get answers that rank higher (better answers), as seen in Figure 4. For the questions that rank at level 3 (Good Questions), 40% of the answers are “Good Answers” and 10% of the answers are “Can be forwarded”. On the other hand, for the questions that rank at level 2 (Can be asked), we got fewer “Good Answers” (nearly 27%) and more “Can be forwarded” answers (nearly 23%).

Fig. 5. Foursquare Reply Rate vs. Random User Reply Rate

Foursquare Reply Rate vs. Random User Reply Rate: Here we compare the effects on the reply rates of employing Foursquare information versus not employing it (randomly finding users from Twitter who live in the city that the question refers to). We found that for the categories of “College & Education”, “Food” and “Nightlife”, Foursquare users respond more than the random users. For the “Parks & Outdoors” and “Home/Work/Other” categories, it turns out that there are few Foursquare places, and therefore we have few users from Foursquare to get answers for these categories. In addition, we found that the categories “Arts & Entertainment” and “Shops” are too broad, and finding experts for a specific question is difficult for these broad categories.

Fig. 6. Response Intervals

Response Intervals: Here, we compare the response times of users for 4 different time slices. As shown in Figure 6, we got answers in each time slice of the day, but we got most of the answers in the afternoon and evening. Here we note that we asked questions uniformly distributed throughout the day, so the asking time did not affect the result.

Fig. 7. Cumulative Distribution for Response Time

Cumulative Distribution for Response Time: Here, we examine the users’ response latencies to queries. We found that approximately 50% of the answers were received within the first 20 minutes.
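For readers who want to reproduce this kind of latency analysis on their own logs, the sketch below computes an empirical cumulative distribution and the fraction of answers received within a threshold. The function names are illustrative and no data from the paper is embedded; the input is any list of response latencies in minutes.

```python
def fraction_within(latencies_min, threshold_min):
    """Fraction of answers whose latency is at most threshold_min minutes."""
    if not latencies_min:
        raise ValueError("no latencies given")
    hits = sum(1 for t in latencies_min if t <= threshold_min)
    return hits / len(latencies_min)


def empirical_cdf(latencies_min):
    """Return (latency, cumulative fraction) points, one per observation."""
    ordered = sorted(latencies_min)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]
```

Calling `fraction_within(latencies, 20)` on the logged latencies would give the quantity reported above as approximately 50%.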

Median Response Time: We posed 40 randomly selected questions from our dataset to Aardvark at the same time as to our system and compared the median response times of both systems. Our system achieved a median response time of 13 minutes. Aardvark was reported to achieve a median response time of 6 minutes in [9]; however, in our experiments Aardvark achieved a median response time of 27 minutes.

Client Type: In this part, we examine the use of mobile devices while answering. We observed that nearly 80% of the answers came from mobile device users, compared to 20% from PC/laptop users. This result highlights how prevalent mobile applications are becoming in our lives.

Retweeted Questions Rate: Another interesting point we observed is that some users retweeted the questions. Approximately 9% of our questions were retweeted by answerers to get help from their friends. Retweeting helps our questions spread more rapidly and get answered more quickly.

V. CONCLUDING REMARKS

In this work, we presented a crowdsourced location-based question-answering system. Our findings indicate that even without an incentive structure, we could answer 75% of the questions that we asked, and the latency in answering questions is low: 50% of the answers arrived within 20 minutes. In addition, for the categories of “College & Education”, “Food” and “Nightlife”, Foursquare users reply at higher rates than randomly selected users from the same cities. Finally, social aspects of Twitter provide a way to spread questions to even more people (9% of the questions are retweeted), and this is valuable in order to get answers more quickly.

REFERENCES

[1] http://www.chacha.com/.

[2] J. Chen, L. Subramanian, and E. Brewer. SMS-based web search for low-end mobile devices. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, MobiCom ’10, pages 125–136, New York, NY, USA, 2010. ACM.

[3] C. Chow, J. Bao, and M. F. Mokbel. Towards location-based social networking services. In Proceeding of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, LBSN 2010, co-located with ACM SIGSPATIAL GIS, San Jose, CA, November 2010.

[4] D. Davidov, O. Tsur, and A. Rappoport. Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceeding of the 23rd international conference on Computational Linguistics (COLING), July 2010.

[5] M. Demirbas, M. A. Bayir, C. G. Akcora, Y. S. Yilmaz, and H. Ferhatosmanoglu. Crowd-sourced sensing and collaboration using twitter. In 2010 IEEE International Symposium on “A World of Wireless, Mobile and Multimedia Networks” (WoWMoM), pages 1–9. IEEE, June 2010.

[6] N. Eagle and A. Pentland. Social serendipity: Mobilizing social software. IEEE Pervasive Computing, 4:28–34, April 2005.

[7] M. Ferris. Insights on mobile advertising, promotion and research. Journal of Advertising Research, 47(1):28–37, 2007.

[8] http://code.google.com/p/foursquared.

[9] D. Horowitz and S. D. Kamvar. The anatomy of a large-scale social search engine. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 431–440, New York, NY, USA, 2010. ACM.

[10] T. Lange, M. Kowalkiewicz, T. Springer, and T. Raub. Overcoming challenges in delivering services to social networks in location centric scenarios. In Proceedings of the 2009 International Workshop on Location Based Social Networks, LBSN ’09, pages 92–95, New York, NY, USA, 2009. ACM.

[11] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. SIGMOD Rec., 24:71–79, May 1995.

[12] D. Schuster, T. Springer, and A. Schill. Service-based development of mobile real-time collaboration applications for social networks. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2010 8th IEEE International Conference on, pages 232–237, 2010.

[13] http://dev.twitter.com.

[14] http://twitter4j.org.

[15] http://dev.twitter.com/pages/rate-limiting.

[16] T. Yan, V. Kumar, and D. Ganesan. Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 77–90, New York, NY, USA, 2010. ACM.
