m&e sharenet 2014_kolkata_connected customer audience analytics_twitter data analytics.doc

Upload: saikat-chatterjee

Post on 02-Jun-2018

  • 8/10/2019 M&E ShareNet 2014_Kolkata_Connected Customer Audience analytics_Twitter Data Analytics.doc

    Data Analytics using Twitter Social Media

    Author(s): Saikat Chatterjee

    Document: Techathon Solution Overview Template

    Owner: IBM Status: Draft

    Contents

    1 High Level Overview
    1.1 Introduction
    1.2 Solution Overview
    2 Detailed Description
    2.1 Architecture Overview
    2.2 Macro design
    3 Environment Needs

    1. High Level Overview

    1.1 Introduction

    Twitter is a massive social networking site tuned towards fast communication. More than 140 million active users publish over 400 million 140-character Tweets every day. Twitter's speed and ease of publication have made it an important communication medium for people from all walks of life. Twitter has played a prominent role in socio-political events, such as the Arab Spring and the Occupy Wall Street movement. Twitter has also been used to post damage reports and disaster preparedness information during large natural disasters, such as Hurricane Sandy.

    This document provides a very high-level overview of the proposed solution and the software/hardware requirements necessary for building it.

    1.2 Solution Overview

    This application showcases some of the data analytics work that can be achieved using the Twitter REST-based API:

    Collecting, storing, and analyzing Twitter data

    Storing this data in a tangible way for use in real-time applications

    Focusing on common measures and algorithms that are used to analyze social media data

    Visual analytics, an approach which helps humans inspect the data through intuitive visualizations

    2. Detailed Description

    Collecting, storing, and analyzing Twitter data

    Users on Twitter generate over 400 million Tweets every day [1]. Some of these Tweets are available to researchers and practitioners through public APIs at no cost. In this chapter we will learn how to extract the following types of information from Twitter:

    Information about a user,

    A user's network consisting of his connections,

    Tweets published by a user, and

    Search results on Twitter.
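
As a rough illustration, the four kinds of information listed above can be mapped onto REST request URLs. The sketch below is written in Python for brevity and only builds the URLs; it does not contact Twitter. The base URL and resource paths are assumptions based on version 1.1 of the Twitter REST API, which was current when this document was written and may no longer be available. The names `ENDPOINTS` and `build_request_url` are hypothetical helpers, and a real request would also need OAuth credentials.

```python
from urllib.parse import urlencode

# Assumption: Twitter REST API v1.1 base URL (current when this document was written).
BASE_URL = "https://api.twitter.com/1.1"

# Assumed mapping from each information type to a v1.1 resource path.
ENDPOINTS = {
    "user_info": "users/show.json",                # information about a user
    "followers": "followers/list.json",            # a user's network (who follows the user)
    "friends": "friends/list.json",                # a user's network (whom the user follows)
    "user_tweets": "statuses/user_timeline.json",  # Tweets published by a user
    "search": "search/tweets.json",                # search results
}


def build_request_url(kind, **params):
    """Return the GET URL for one information type.

    Raises KeyError if `kind` is not one of the keys in ENDPOINTS.
    Query parameters are sorted so the output is deterministic.
    """
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{ENDPOINTS[kind]}?{query}"


if __name__ == "__main__":
    print(build_request_url("user_info", screen_name="twitterapi"))
    print(build_request_url("search", q="#kolkata", count=10))
```

Keeping the mapping in one table makes it easy to see which request serves which kind of information, and to swap the paths if a different API version is used.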

    APIs to access Twitter data can be classified into two types based on their design and access method:

    REST APIs are based on the REST architecture, now popularly used for designing web APIs. These APIs use the pull strategy for data retrieval. To collect information, a user must explicitly request it.

    Streaming APIs provide a continuous stream of public information from Twitter. These APIs use the push strategy for data retrieval. Once a request for information is made, the Streaming APIs provide a continuous stream of updates with no further input from the user.

    The two types have different capabilities and limitations with respect to what and how much information can be retrieved. The Streaming API has three types of endpoints:

        Public streams: These are streams containing the public Tweets on Twitter.

        User streams: These are single-user streams, with access to all the Tweets of a user.

        Site streams: These are multi-user streams, intended for applications which access Tweets from multiple users.
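
To make the push strategy more concrete, here is a minimal Python sketch of how a client might consume a stream once the connection is open. It assumes that messages arrive as one JSON object per line and that blank lines are keep-alive signals to be skipped. An in-memory `io.StringIO` object with made-up sample messages stands in for the network connection, and `iter_stream_messages` and `collect_texts` are hypothetical helper names.

```python
import io
import json


def iter_stream_messages(stream):
    """Yield one decoded JSON message per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if not line:
            # Assumption: blank lines are keep-alives and carry no data.
            continue
        yield json.loads(line)


def collect_texts(stream):
    """Return the `text` field of every message read from the stream."""
    return [message["text"] for message in iter_stream_messages(stream)]


if __name__ == "__main__":
    # Stand-in for an open streaming connection (hypothetical sample payload).
    sample_stream = io.StringIO(
        '{"id": 1, "text": "first tweet"}\r\n'
        "\r\n"
        '{"id": 2, "text": "second tweet"}\r\n'
    )
    for text in collect_texts(sample_stream):
        print(text)
```

The client makes a single request and then only reads. With the REST APIs, by contrast, each new batch of data requires a new request.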

    Storing Twitter Data

    There has been an explosion in the size of data generated on social media. This data explosion calls for a new data storage paradigm. At the forefront of this movement is NoSQL, which promises to store big data in a more accessible way than the traditional, relational model. There are several NoSQL implementations. In this book, we choose MongoDB.
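
A document store such as MongoDB (the database listed under the environment needs) holds records as JSON-like documents. A collected Tweet can therefore be stored with very little transformation. The Python sketch below shows one possible document shape. The choice of fields is an assumption, `to_document` is a hypothetical helper, and the input layout follows the v1.1 Tweet JSON format (`id`, `text`, `user.screen_name`, `created_at`, `entities.hashtags`). No database is contacted.

```python
def to_document(tweet):
    """Map a decoded Tweet (a dict) to the document we would store.

    The Tweet id is reused as `_id`, the primary-key field in MongoDB,
    so storing the same Tweet twice cannot create a duplicate record.
    """
    entities = tweet.get("entities", {})
    return {
        "_id": tweet["id"],
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }


if __name__ == "__main__":
    # Hypothetical sample Tweet, trimmed to the fields used above.
    sample = {
        "id": 42,
        "text": "Hello #kolkata",
        "user": {"screen_name": "example_user"},
        "created_at": "Mon Jun 02 10:00:00 +0000 2014",
        "entities": {"hashtags": [{"text": "kolkata"}]},
    }
    print(to_document(sample))
    # With the pymongo driver, the result could then be stored with
    # collection.insert_one(to_document(sample)).
```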

    Analyzing Twitter Data

    Many of the questions that we ask of our Twitter data can be answered through network analysis. Questions such as "who is important?", "who talks to whom?", and "what is important?" can all be answered through a network. Using proper network measures, we can find these important actors or topics in a network.

    Centrality - Who is important?
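
One simple way to approach "who is important?" is in-degree centrality: count how often each user is the target of an interaction (for example, being retweeted or mentioned), then normalize by the number of other users. The document does not say which centrality measure the solution uses, so the Python sketch below is only one possible choice. The edge list is made-up sample data and `in_degree_centrality` is a hypothetical helper.

```python
from collections import Counter


def in_degree_centrality(edges):
    """Compute normalized in-degree centrality for a directed graph.

    `edges` is an iterable of (source, target) pairs, for example
    (retweeting user, original author). Each score is the number of
    incoming edges divided by (number of nodes - 1).
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    incoming = Counter(target for _, target in edges)
    denominator = max(len(nodes) - 1, 1)  # avoid dividing by zero
    return {node: incoming[node] / denominator for node in nodes}


if __name__ == "__main__":
    # Hypothetical retweet network: (retweeter, original author).
    sample_edges = [("bob", "alice"), ("carol", "alice"), ("alice", "dave")]
    scores = in_degree_centrality(sample_edges)
    for user, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{user}: {score:.2f}")
```

In this sample, alice receives two of the three edges and so gets the highest score. Measures such as eigenvector or betweenness centrality would weigh the same network differently.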

    2.2 Macro design

    3. Environment Needs

    IDE: NetBeans

    Language and Tools: JDK, JEE, Apache Ant latest version, Git

    Web Server: Apache Tomcat

    Development OS: Windows (Internet must be accessible as our application calls Twitter's REST API), Administrator access level

    Deployment OS: Linux (Internet must be accessible as our application calls Twitter's REST API), Administrator access level

    Database: MongoDB latest version
