Harvesting crowdsourcing biodiversity data from Facebook groups

Post on 12-May-2015

Jason Guan-Shuo Mai (1), Cheng-Hsin Hsu (1), Dong-Po Deng (2), De-En Lin (3), Hsu-Hong Lin (3), Kwang-Tsao Shao (1)

(1) Taiwan Biodiversity Information Facility (TaiBIF), Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(3) Taiwan Endemic Species Research Institute, Council of Agriculture, Nantou, Taiwan

The emergence of Web 2.0 enables people to contribute their biodiversity observations on the Web. These crowdsourced biodiversity data are becoming more valuable in scientific studies because of their potentially broader spatial and temporal scales. However, data provided in plain text hinder retrieval and analysis. In this study, we propose a framework that automatically structures the loose-format text, so that volunteers can keep providing data in their own familiar ways while interested citizens, biodiversity researchers and managers benefit from the semantically structured information. We take two Facebook biodiversity interest groups, Reptile-Road-Mortality and Enjoy-Moths, as examples.

0. Crowdsourcing: thread participants voluntarily provide unstructured data in Facebook interest groups (Reptile-Road-Mortality, Enjoy-Moths).
1. Crawling data from Facebook via its API.
2. Using natural language processing techniques, with the Taiwan Geographic Name and Taiwan Catalogue of Life (TaiCOL) databases as knowledge bases, to extract species vernacular names and place names from a thread.
3. Introducing the content management system Drupal for easier data management (including error correction) and display.
4. Publishing linked open data via a D2R server for open access and usage.
5. Developing browser plug-ins to give users digested feedback of the structured data.
6. Improving source data quality without changing users' own familiar ways.

[Figure: What a typical discussion thread looks like. A thread consists of a post message and a post picture, followed by comment messages.]

Our algorithm picks the most related species name appearing in a thread based on social networking characteristics.

The semantic annotation tool disambiguates toponymic homonyms.

One click on a message recognizes species vernacular names and related information.

[Flowchart: Algorithms used to recognize abbreviations of vernacular names and place names. For each vernacular name in TaiCOL, decision nodes ask whether the full-matched name, Prefix3, Postfix2, Prefix2 or Postfix1 occurs in the message or in the thread. Outcome labels: "Name doesn't exist in the message", "Matched abbreviation", "Calculate confidence score of this name". A "Main Database" label also appears.]

Our dataset is linked to other datasets on the linked open data cloud, such as DBPedia, GeoNames and LODE (Linked Open Data of Ecology), so it can benefit from the large amount of meta-information they provide.
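The abbreviation flowchart on the poster names its checks (full-matched name, Prefix3, Postfix2, Prefix2, Postfix1) and ends in a confidence score, but the transcript does not preserve the branch order or any score values. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes that "Prefix3" means the first three characters of a vernacular name and "Postfix2" the last two, that the rules are tried from longest to shortest, and that each rule carries a fixed weight. The function names `match_name` and `rank_names`, the weights, and the sample names and messages are all hypothetical. It also treats every message in a thread alike, whereas the poster distinguishes "occurs in the message" from "occurs in the thread" and uses social networking characteristics that are not modeled here.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (rule label, affix kind, affix length, confidence). Labels follow the
# poster's flowchart; reading the numbers as character counts and the
# weights themselves are assumptions made for this sketch.
AFFIX_RULES: List[Tuple[str, str, int, float]] = [
    ("prefix3", "prefix", 3, 0.8),
    ("postfix2", "postfix", 2, 0.6),
    ("prefix2", "prefix", 2, 0.5),
    ("postfix1", "postfix", 1, 0.2),
]


def match_name(name: str, message: str) -> Optional[Tuple[str, float]]:
    """Return (rule label, confidence) for the first rule that fires, else None."""
    if name in message:
        return ("full", 1.0)
    for label, kind, length, score in AFFIX_RULES:
        if len(name) <= length:
            continue  # the affix would be the whole name, not an abbreviation
        affix = name[:length] if kind == "prefix" else name[-length:]
        if affix in message:
            return (label, score)
    return None


def rank_names(names: Iterable[str], thread: List[str]) -> List[Tuple[str, str, float]]:
    """Score each knowledge-base name against every message in a thread,
    keep the best hit per name, and sort by confidence (highest first)."""
    best: Dict[str, Tuple[str, float]] = {}
    for name in names:
        for message in thread:
            hit = match_name(name, message)
            if hit and (name not in best or hit[1] > best[name][1]):
                best[name] = hit
    rows = [(name, label, score) for name, (label, score) in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Sample vernacular names and messages, invented for illustration only.
    knowledge_base = ["赤尾青竹絲", "雨傘節"]
    thread = ["路邊發現一條青竹絲", "應該是雨傘節"]
    for name, label, score in rank_names(knowledge_base, thread):
        print(name, label, score)
```

Keeping the rules in a table makes the weights easy to tune against hand-labeled threads, and a real system would likely add a threshold below which a short-affix hit is discarded as noise.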