potholes and bad road conditions- mining twitter to

66
Potholes and Bad Road Conditions- Mining Twitter to Extract Information on Killer Roads Swati Agarwal 1 , Nitish Mittal 2 , Ashish Sureka 3 1 BITS Pilani, Goa Campus, India 2 Noon.com, Dubai, UAE 3 Ashoka University, India Research Track 5 The ACM India Joint International Conference on CoDS & COMAD January 13, 2018 Goa, India

Upload: others

Post on 02-Jan-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Potholes and Bad Road Conditions- Mining Twitter to

Potholes and Bad Road Conditions- Mining

Twitter to Extract Information on Killer Roads

Swati Agarwal1, Nitish Mittal2, Ashish Sureka3

1 BITS Pilani, Goa Campus, India2 Noon.com, Dubai, UAE

3 Ashoka University, India

Research Track 5

The ACM India Joint International Conference on CoDS & COMAD

January 13, 2018

Goa, India

Page 2: Potholes and Bad Road Conditions- Mining Twitter to

Motivation

2

• Increasing trend of adaptation of social media by Indian Government and public

agencies to reach out to the people

• public citizens use Twitter to post their complaints and report the incidents to the

concerned authorities- Citizensourcing

• Complaints on killer roads contain the information about road irregularities and

other issues causing high risks and discomfort to the citizens.

• To develop a system that can automatically identify complaint reports and

overcome the challenge of manual inspection.

• To extract important information from tweets such as the exact geographical

location which can be used to locate the fault

Page 3: Potholes and Bad Road Conditions- Mining Twitter to

Examples- Bad Road Complaints

Page 4: Potholes and Bad Road Conditions- Mining Twitter to

Related Work

• Mining online communications from Twitter to automatically determine road

hazards by building language models. [Kumar et al 2014]

• Mining tweet texts to extract information on highways and arterial roads in

Pittsburg and Philadelphia Metropolitan Areas. [Gu et al 2016]

• Analyzing real-time traffic related data for incident management. [Fu et al

2015] [Eleonora 2015] [Mai et al 2013] [Schulz et al 2013]

• Identifying traffic information on Twitter by mining traffic congestion,

incidents and weather information. [Napong et al 2011]

• Examining the use of Twitter for making public awareness during heavy

snow and public riots. [Panagiotopoulos et al. 2016]

4

Page 5: Potholes and Bad Road Conditions- Mining Twitter to

Related Work- Steps Taken by Indian Government

• TwitterSeva

• A program to address public complaints on one shared portal.

• Uniting 200 social media handle

• Sub-units

• #DOTSeva, #BSNLSeva, #MTNLSeva, PostalSeva (Telecommunication

and stakeholder departments)

• #MociSeva (Ministry of Commerce and Industry)

• Cybercell

• Union minister of Women and Child Development

• Online complaints against women harassment

• Online trolls and abuse against women

5

Page 6: Potholes and Bad Road Conditions- Mining Twitter to

Novel Research Contributions

• First study on mining citizen's complaints and reports on killer roads posted on

official Twitter handle of Government.

• We investigate the efficacy of spatial, contextual and linguistic features for

identifying useful information from complaint posts.

• We build a text-analysis based model to enrich spatial features (geographical

location metadata) in a tweet that can be used to discover insights from less

informative reports.

• In order to conduct our experiments for this research, we create the very first

database of citizens’ complaints on killer roads and highways and make our

dataset publicly available for the research community so that our results can be

used for benchmark, comparison and further extension.

6

Page 7: Potholes and Bad Road Conditions- Mining Twitter to

Enriched

Tweets

Raw

Tweets

Complaint

Reports

Proposed Research Framework

#JointHashtags, Spelling

Errors, Slang and

Abbreviations, @username

REST

API

Data Collection Micropost Enrichment [1] Non-Complaint Tweets [1]

Features Extraction

Tweet Metadata, Named

Entity Recognizer, N-

gram modeling,

Knowledgebase Graph

Complaint Tweets

Classification

Unknown

Features

Vector ModelOne-Class/ Multi-Class

Classification

Empirical Analysis and

Visualization

Problem Specific

Identification

7

[1] Mittal, N., Agarwal, S. and Sureka, A. (2016, December). Got a Complaint? Keep

Calm and Tweet it!. In Proceedings of the 12th International Conference on Advanced

Data Mining and Applications (ADMA). Springer.

Page 8: Potholes and Bad Road Conditions- Mining Twitter to

Experimental Dataset Collection

• Extracting all the tweets posted to the public agencies’ accounts

• Twitter Search API to extract the tweets consisting of @username in the tweets

• Duration: 8 weeks (18 July 2016 to 13 September 2016)

• @MORTHIndia- Ministry of Road, Transports, and Highways

• @nitin_gadkari- Union Minster of MORTHIndia

8

Total #tweets 81,304

Original 17,511

Retweet 52,701

Replies 11,092

English 13, 368

Unique 13,208

Sampled 3,302

Users 2,604

Page 9: Potholes and Bad Road Conditions- Mining Twitter to

Features Extraction

9

• Location- to know the region (or city) from which the complaint is

reported

• Landmark- the pinpoint location (or landmark) of the problem

• Problem- to know the type of issue faced by the citizens so that the

complaint can be forwarded to the concerned department

@MORTHIndia @nitin_gadkari

Page 10: Potholes and Bad Road Conditions- Mining Twitter to

1. Location

10

ED2- Tweets filtered by AISP classifier

“Mahatma Gandhi Marg”, “Vasant Vihar Colony”, “PVR Saket”, and “Nehru Place”

Page 11: Potholes and Bad Road Conditions- Mining Twitter to

2. Problem Identification

11

Dysfunctional Facilities- street light, traffic light, drainage

Risky and Hazardous- pothole, street light, traffic light, hawkers, breaker, sign

board, construction, intersection, diversion, animal, turn, crash, barrier

Poor Condition- road, highway, traffic, flyover, underpass, bypass, chowk,

expressway

Indiscipline and Carelessness- barrier, repair, sign board, pothole, parking,

police

Page 12: Potholes and Bad Road Conditions- Mining Twitter to

2. Problem Identification

12

Page 13: Potholes and Bad Road Conditions- Mining Twitter to

Classification

13

Rule-based

Classifier

Features Vector

ED2

Unknown Tweets

Useful Tweet (ED3)

Irrelevant Tweet

Nearly- Useful Tweet (ED4)

• Irrelevant Tweets (IRT)

• off-topic complaints

• Useful Tweets (UT)

• complete reports

• Nearly-Useful Tweets (N-UT)

• some information is missing

Page 14: Potholes and Bad Road Conditions- Mining Twitter to

Enrichment of N-UT?

14

• Problem,

• Region,

• Landmark/Pin-point location

USER’S PROFILE COMPLAINT TWEET POSTED BY THE USER

OPENSTREETMAP

OUTPUT

Page 15: Potholes and Bad Road Conditions- Mining Twitter to

Characterization of Complaint Reports

15

Distribution of Distinct Geographical

Locations Identified in the Complaint

Reports in our Experimental Dataset

Page 16: Potholes and Bad Road Conditions- Mining Twitter to

Experimental Results

16

Tweet Type #Tweets

Irrelevant 734

Nearly-Useful (NC) 1668

Nearly-Useful (C) 50

Useful 170

Appreciation 166

News 417

Information Sharing 97

Overall accuracy:

67%

Page 17: Potholes and Bad Road Conditions- Mining Twitter to

Limitations

• API dependency

• Presence of multilingual scripts and text in a tweet

17

Page 18: Potholes and Bad Road Conditions- Mining Twitter to

Takeaways

• We propose an approach for identification of complaints reported in road

irregularities and bad road conditions.

• We propose to use the application of ConceptNet lexical Knowledgebase, NER,

and geocoding location APIs to extract important components of a killer road

complaint.

• Based on the available components in the tweets, we classify them into three

categories: useful, nearly-useful, and irrelevant tweets.

• Our results shows that the proposed approach classifies these complaint reports

with an accuracy of 67% and a recall of 65%.

• Maximum number of complaints are reported about the risky and accident prone

roads while most of them are due to the poor condition of amenities.

• Complaints are reported from all over the India while maximum complaints are

reported from Maharashtra, New Delhi, Uttar Pradesh, and Bihar states.

18

Page 19: Potholes and Bad Road Conditions- Mining Twitter to

Future Directions

• Integration with other domains

• Information Visualization

• Metaphor analysis

• Modality and Dimensions

• Multimedia content

• Cross-platform analysis

• Novel Applications

• Corruption barometer

• Query vs. complaints

19

Page 20: Potholes and Bad Road Conditions- Mining Twitter to

References

20

1. Agarwal, S., Mittal, N., Sureka, A.: Syntactic enhancement of killer road complaint tweets posted on twitter, mendeley data, v1, doi:

10.17632/dm6s252524.1 (2017)

2. D’Andrea, E.e.a.: Real-time detection of traffic from twitter stream analysis. IEEE Transactions on Intelligent Transportation Systems

16(4), 2269–2283 (2015)

3. Gu, Y., Qian, Z.S., Chen, F.: From twitter to detector: Real-time traffic incident detection using social media data. Transportation

Research Part C: Emerging Technologies 67 (2016)

4. Havasi, C., Speer, R., Alonso, J.: Conceptnet 3: a flexible, multilingual semantic network for common sense knowledge. In: Recent

advances in natural language processing. pp. 27–29. Citeseer (2007)

5. Kumar, A., Jiang, M., Fang, Y.: Where not to go?: detecting road hazards using twitter. In: Proceedings of the 37th international ACM

SIGIR conference on Research & development in information retrieval. pp. 1223–1226. ACM (2014)

6. Mai, E., Hranac, R.: Twitter interactions as a data source for transportation incidents. In: Proc. Transportation Research Board 92nd

Ann. Meeting (2013)

7. Panagiotopoulos, P., Barnett, J., Bigdeli, A.Z., Sams, S.: Social media in emergency management: Twitter as a tool for communicating

risks to the public. Technological Forecasting and Social Change 111, 86–96 (2016)

8. Wanichayapong, N.: Social-based traffic information extraction and classification. In: ITS Telecommunications (ITST), 11th

International Conference on. IEEE (2011)

Page 21: Potholes and Bad Road Conditions- Mining Twitter to

References

21

9. Agarwal, S., Mittal, N., Sureka, A.: Enhanced dataset of citizen centric complaints and grievances on twitter, mendeley data, v1

http://dx.doi.org/10.17632/w2cp7h53s5.1 (2016)

10. Amari, S., Wu, S.: Improving support vector machine classifiers by modifying kernel functions. Neural Networks 12(6), 783 – 789

(1999)

11. Anderson, M., Lewis, K., Dedehayir, O.: Diffusion of innovation in the public sector: Twitter adoption by municipal police

departments in the us. In: Portland International Conference on Management of Engineering and Technology, 2015

12. Edwards, A., Kool, D.: Webcare in Public Services: Deliver Better with Less?, pp. 151–166. Springer International Publishing, Cham

(2015)

13. Frias-Martinez, V., Sae-Tang, A., Frias-Martinez, E.: To call, or to tweet? understanding 3-1-1 citizen complaint behaviors. In: ASE

Big- Data/SocialCom/CyberSecurity Conference (2014)

14. Heverin, T., Zach, L.: Twitter for city police department information sharing. Pro- ceedings of the American Society for Information

Science and Technology (2010)

15. Khan, G.F., Swar, B., Lee, S.K.: Social media risks and benefits a public sector perspective. Social Science Computer Review 32(5),

606–627 (2014)

16. Loukis, E., Charalabidis, Y., Androutsopoulou, A.: Evaluating a Passive Social Media Citizensourcing Innovation, pp. 305–320.

Springer International Publishing

17. Meijer, A.J., Torenvlied, R.: Social media and the new organization of government communications an empirical analysis of twitter

usage by the dutch police. The American Review of Public Administration p. 0275074014551381 (2014)

Page 22: Potholes and Bad Road Conditions- Mining Twitter to

We are hiring full-time and visiting faculty in Computer Science

Our focus is on: AI, Networks, Systems,Software Science, and Theory (but get in touch anyway if you are doing exciting work in some other area of Computer Science)

Write to us: [email protected]

Tel: +91 832 2580 107

Page 23: Potholes and Bad Road Conditions- Mining Twitter to

Thank you!

23

Page 24: Potholes and Bad Road Conditions- Mining Twitter to

Questions?

24

Page 25: Potholes and Bad Road Conditions- Mining Twitter to

BACK-UP SLIDES

25

Page 26: Potholes and Bad Road Conditions- Mining Twitter to

Open Source Social Media Data

Introduction

Twitter

Content and Blogger

#likes #retweets

URL, hashtag, media

Tweet

User@username, profile name, bio information, location, and external URLs

Activity feeds, followers and followings

Tweets posted by the blogger

YouTube

Title, description, uploader, and related videos

#likes, #dislikes, and #comments received on the video

content and user of the comments and replies posted on the video

Video

UserTitle and description of the channel

#subscribers and #subscriptions of a channel

featured channel, contact and subscriptions (optional)

Tumblr

Content, type, blogger and tags of a post

#notes, #likes, #dislikes and comments received on a post

who liked or reblogged a post

Post

BloggerTitle and description of the blogger

Activity feeds of the blogger

followers, following and likes of a blogger (optional)

26

Page 27: Potholes and Bad Road Conditions- Mining Twitter to

OSSMInt Applications for Government

Motivation

image source: http://blog.marketwired.com/2014/12/23/best-2014-social-media-marketing-posts/

Radical Users and

Communities

Public Citizens’ Complaints and

Grievances

Religious

Conflicts

Online Recruitment in Radical

Groups

Women and Child

Online Abuse &

Harassment

Civil Protest and

Unrest Forecast

Detecting Fake

News on OSM

Extremism and

Hate Promotion

Secret Message

Detection

27

Page 28: Potholes and Bad Road Conditions- Mining Twitter to

Twitter

28

Profile Picture

Activity Feeds Interests Writing a new tweet

Multimedia attachment

Enable location

Maximum length Poll

New and Trending Topics

Change the

Region

Tweet Metadata

Likes, Reply, Retweet

External Entity

(Blog URL)

Twitter Blog

Recommendations

Blogs followed

by Visitor

Direction Mention

Username

Profile Name

Verified

User Bio

Location

External Links

Page 29: Potholes and Bad Road Conditions- Mining Twitter to

IntroductionMining Public Complaints

and Grievances on Twitter

Ministry of the Railway (@railminindia & @sureshpprabhu)

State Police (@delhipolice & @mumbaipolice)

Ministry of Road and Transport (@morthindia & @nitin_gadkari)

State Traffic Police (@dtptraffic)

Income Tax Department (@incometaxindia)

Ministry of External Affairs (@sushmaswaraj)

Department of children & youth affairs press

office account (@DCYAPress)

Department of health (@roinnslainte)

Prime Minister Theresa May’s office

(@Number10gov)

national government (@GOVUK)

Citizensourcing

disseminating information

collecting complaints and grievances from citizens

29

Page 30: Potholes and Bad Road Conditions- Mining Twitter to

Case Study 1- Mining Public Complaints and Grievances

• General issues that are faced by public citizens and seek for immediate actions by

the concerned authorities.

• Examples:

• Corruption, illegal transactions from banks

• No-refund on ticket cancellation, delay of train, poor facilities on train/station

• Traffic violations

• Crimes

• In 1st week of January 2017, 55 trains delayed in North India due to fog

• Heavy rainfall in Delhi-NCR caused massive traffic jam for hours

Motivation

30

Page 31: Potholes and Bad Road Conditions- Mining Twitter to

Motivation

Traffic Violation

PAN card delayed

Illegal Transaction

Poor facilities on train

Case Study 1- Mining Public Complaints and Grievances

31

Page 32: Potholes and Bad Road Conditions- Mining Twitter to

Case Study 2- Mining Twitter to Extract information on Killer Roads

• Bad Road Conditions

• Road Irregularities, Roughness, potholes, bumps, patchy surface and poorly

designed speed breakers

• Accidents Prone Roads

• Blind or improper curves, temporary diversions, traffic on under repair roads.

• Dysfunctional Amenities

• Poor street lights or dim lights, encroachments on roads by shops, hawkers and

broken road signboards

Motivation

32

Page 33: Potholes and Bad Road Conditions- Mining Twitter to

Case Study 2- Mining Twitter to Extract information on Killer Roads

Motivation

33

Page 34: Potholes and Bad Road Conditions- Mining Twitter to

Dataset Characterization

Case Study

1

34

Page 35: Potholes and Bad Road Conditions- Mining Twitter to

Case Study 2- Mining Twitter to Extract information on Killer Roads

Motivation

Road Irregularities Dysfunctional Amenities

Accident Prone RoadsUnder-construction Road

35

Page 36: Potholes and Bad Road Conditions- Mining Twitter to

Padding Space CorrectionMicropost Enrichment

Algorithm

• Add a space after all special characters

• Comma, semi colon, question mark, period, exclamatory mark

• Trim all extra whitespaces

36

@ nitin_gadkari Pls re tar our village road,

a farmer died y’day while jumping

pothole, road condition is horrible, cant

even walk, pls help

Page 37: Potholes and Bad Road Conditions- Mining Twitter to

Non-Complaint Tweets (AISP)

37

Appreciation Posts

Information Sharing

Promotional Tweets

Page 38: Potholes and Bad Road Conditions- Mining Twitter to

1. Frequent N-grams

38

• Grouped triplets of word n-grams frequently occurring in

complains

• Addresses the challenge of keyword spotting method

• Use of WordNet lexical resource to map similar terms

Case Study

1

Page 39: Potholes and Bad Road Conditions- Mining Twitter to

2. Closed-Domain Keywords

39

• Terms occurring in a complaint specific to a department of

public agency

• N-grams- to identify the type of the complaint

Case Study

1

Page 40: Potholes and Bad Road Conditions- Mining Twitter to

3. Events and Substances

40

• Concepts present in the tweets- not identified as the entities

• Extracted using IBM Watson Concept and Relationship Extraction API

• Uses various lexical resources and knowledgegraphs as the backend

databases (Yago, Dbpedia, WordNet, and Freebase)

Case Study

1

Page 41: Potholes and Bad Road Conditions- Mining Twitter to

4. Location

41

• Location reported in the complaints

• Identify people, location, and organization

• Verify the location by applying geocoding on the entities

• M. B. Road

• Gandhi Nagar

• AIIMS metro station

Case Study

1

Page 42: Potholes and Bad Road Conditions- Mining Twitter to

5. Media Presence

42

Case Study

1

Page 43: Potholes and Bad Road Conditions- Mining Twitter to

2. Problem Identification

43

Case Study

2 Big potholes on Delhi roads. It’s risky to drive at nights.

Streetlights on highway not working

Dim streetlights on highway.

Highway is full of dark.

there is complete blackout on highway.

Page 44: Potholes and Bad Road Conditions- Mining Twitter to

1. Location- Landmark

44

Case Study

2 private

administrative

locality

canal

residential

hospital

hamlet

aerodrome

station

trunk

commercial

construction

neighborhood

common

bank

motorway junction

suburb

kindergarten

river

bus stop

bus station

school

restaurant

unclassified

motorway

hotel

secondary

peak

primary

water

public building

bicycle parking

service

house

stadium

railway

Page 45: Potholes and Bad Road Conditions- Mining Twitter to

1. Location- Region

45

Case Study

2 Village

Town

City

District

Territory

Page 46: Potholes and Bad Road Conditions- Mining Twitter to

Enrichment of N-UT?

46

Case Study

2• Problem,

• Region,

• Landmark/Pin-point location

Sampled Data (ED1)

F1 .. .. Fi-1 Fi

Hashtag Expansion

Spell Error Correction

Information Sharing

Promotion Appreciation

AISP Tweet Classif cation

AISP

Unknown (ED2)

1Tweet Pre-Processing

Sentence Segmentation

Slang Treatment

@username mention treatment

Sampled Data (ED1)

F1 .. .. Fi-1 Fi

Tweets Pre-processing

AISP Tweet Classif cation

AISP

Unknown (ED2)

ED2

Geographical Location

Issue/Incident

Feature Extraction

Classif cation

Irrelevant

Incomplete Report (ED3)

Useful Tweets (ED4)

ED4 Spatial Feature Enrichment Gaining Insights and

Knowledge from Killer Road Complaints

ED3 Useful Tweets

Page 47: Potholes and Bad Road Conditions- Mining Twitter to

2. Problem Identification

47

Case Study

2

Page 48: Potholes and Bad Road Conditions- Mining Twitter to

Experimental Results

48

Case Study

1

@dtpTraffic @RailMinIndia @IncomeTaxIndia0

0.2

0.4

0.6

0.8

1

Experimental Dataset

Pre

cisi

on

Linear

Polynomial

RBF

Cascaded Ensemble

Parallel Ensemble

0.7

6

0.7

5

0.4

7

0.5

6

0.7

6

0.4

3

0.4

2

0.2

8

0.3

3

0.4

3

0.6

2

0.6

1

0.3

6

0.4

3

0.8

3

Page 49: Potholes and Bad Road Conditions- Mining Twitter to

Experimental Results

49

Case Study

1

Page 50: Potholes and Bad Road Conditions- Mining Twitter to

Experimental Results

50

Case Study

1

Page 51: Potholes and Bad Road Conditions- Mining Twitter to

Dynamics of Religious Conflicts on Social

Media

51

Notes

Type

Blogger

Text

Link

Chat

Photo

Audio

Video

Quote

Answer

Photo,Audio,Video,Link,Answer,Chat,Quote,Text

NoteBy,Type(Like/Reblog)

Tumblr Metadata

URL

Tag

Time

Notes

Type

Blogger

Seed

Tags

TumblrMetadata

Translator Language Identif er Posts and

Blogs

Contextual and Non-Text Metadata Text Data

English Language Data Non-

English

Description Blogger Metadata

Experimental Dataset

#Posts,Title,DescripFon

Tumblr Search API

Blogger Relations (Like/Reblog)

Posts Metadata

Notes Metadata

Blogger Metadata

Post Types

Tumblr Metadata

Topic Identif cation

Topic Identif cation

LIWC

Named Entities

Taxonomy

Word Count Features (WCF) Named Entities Features (NEF)

Religion

Unknown

Concepts

Multi-Label

Taxonomy Posts tagged with all religions mentioned in the textual content

LIWC PTopic

Feature 1 Feature 2

..

..

.. Feature N

Posts

User

NER

NER

<People, Date> <Organization> <Place, Time>

<Age, Location>

PNEF

UNEF

Experimental Dataset

Post Metadata Post Metadata

Post and Blogger Metadata

Page 52: Potholes and Bad Road Conditions- Mining Twitter to

Term Obfuscation Detection in Secret Message Communication

52

MEAN AVERAGE CONCEPTUAL SIMILARITY (MACS)

WORD OBFUSCATION CLASSIFIER

Page 53: Potholes and Bad Road Conditions- Mining Twitter to

Early forecast of civil protests

53

Learning(Models:(

Named(En2ty(Recogni2on:(• • •

Page 54: Potholes and Bad Road Conditions- Mining Twitter to

Online Radicalization Detection

54

• Hate promoting content identification

• Twitter

• Radicalized users and communities detection

• YouTube

• Tumblr

• Intent-based radicalized and racist posts identification

• Tumblr

Page 55: Potholes and Bad Road Conditions- Mining Twitter to

55

Sensitive Topic Content

Users and Hidden Community

Government Bodies

Monitoring OSM Social Media

Expressing their Opinions

Promoting Certain Ideology and Beliefs

Discussion on a variety of sensitive topics (religion and race), presence

of hate promoting, radicalized and extremist content

Shallow NLP Techniques

Deep NLP and Semantics

Users Uploading Extremist Posts and Forming Communities

Graph Traversal and Topical/Focused Crawler

N-gram Feature Vectors Model

Personality Traits, Writing Cues, Word Semantics, Emotions

One-Class KNN and SVM

Random Forest, Naïve Bayes, and

Decision Tree

Friends, Subscribers, Featured channels

Notes, Likes, Re-blogs

Shark Search, Best First Search

Random Walk

Linguistic and Contextual Metadata Analysis Links and User Relationship Analysis

Requirement of Monitoring Social Media

OSSMInt Applications (Identification and

Detection)

Purpose of Author for uploading such content

High-Level Approach

Social Media Platforms and Data Sources

Discriminatory Features (Extracted and Derived)

Content and Network Analysis (Data Mining

and Machine Learning)

Page 56: Potholes and Bad Road Conditions- Mining Twitter to

Future Directions

• Novel Applications

• Corruption barometer

• Intent vs. impact of hate promoting posts

• Recruitment analysis in online radicalized communities

• Fake news detection

• NLP systems for security informatics

56

Page 57: Potholes and Bad Road Conditions- Mining Twitter to

Micropost Enrichment AlgorithmMining Public Complaints and

Grievances on Twitter

• Consisting of data pre-processing and text enrichment techniques

• Addressing the problem of noisy data in tweets

Raw Tweets

Sentence Segmentation

Micropost Enhancement and Enrichment

Enriched Tweets

Hashtag Expansion

Padding Space Correction

Spell Error Correction

Acronym & Slang

Treatment

@Username Expansion

57

Page 58: Potholes and Bad Road Conditions- Mining Twitter to

Sentence SegmentationMicropost Enrichment

Algorithm

• Find ill hashtags in the tweet (ignore the hashtag present in URL)

• Removal of filling terms

• Umm, hmm, err,

• Trim consecutive special characters occurring together

• ????, !!!!!, …..

58

Raw Tweets

Sentence Segmentation

Micropost Enhancement and Enrichment

Enriched Tweets

Hashtag Expansion

Padding Space Correction

Spell Error Correction

Acronym & Slang

Treatment

@Username Expansion

Page 59: Potholes and Bad Road Conditions- Mining Twitter to

Joint Hashtag ExpansionMicropost Enrichment

Algorithm

1. Common Separator (’-’ and ’-’)

• #no-strong-action-by-police = no strong

action by police

• #black_money = black money

1. Uppercase Letters

• #MarchForDemocracy = March For

Democracy

• #CharchaOnRWH = Charcha On RWH

• #FANTomorrow = FAN Tomorrow

59

3. Alphanumeric String

• #TheriJoins100crClub = Theri Joins 100 cr Club

• #NH8= NH 8

4. Longest subsequence

• #seriousissue = serious issue

• #havesomesenseofchecking = have some sense of

checking

• #swachhbharat #swaranshatabdi

#modisarkar

Page 60: Potholes and Bad Road Conditions- Mining Twitter to

Spell Error CorrectionMicropost Enrichment

Algorithm

• Given Sentence S= ”plnty of ppl waiting frm past hour”,

• set of 3-grams= [n1: ’plnty of ppl’, n2: ‘of ppl waiting’, n3: ’ppl waiting frm’, n4:

’waiting frm past’, n5: ‘frm past hour’]

60

Page 61: Potholes and Bad Road Conditions- Mining Twitter to

Acronym and Slang TreatmentMicropost Enrichment

Algorithm

• Domain Specific Slangs (fetched from frequent n-grams, n=1, 2, 3, 4)

• NH (National Highway), Toll, RTO (Regional Transport Office), RC (Vehicle Registration

Certificate), KM (kilometers), STN (station), Acc (accident) and RD (Road)

• Numeric

• 2- to, two, too?

• 4- for, four?

• Others (General slang or abbreviations)

• B4- before, plz- please, idk- i don’t know, c- see

• NoSlang online dictionary

61

Page 62: Potholes and Bad Road Conditions- Mining Twitter to

Username ExpansionMicropost Enrichment

Algorithm

62

• Expanding @username mentioned directly in the tweet

• Extracting profile name using Twitter REST API

Page 63: Potholes and Bad Road Conditions- Mining Twitter to

Proposed Research FrameworkMining Public Complaints and

Grievances on Twitter

Enriched

Tweets

Non-Complaint Tweets

Raw

Tweets

#JointHashtags, Spelling

Errors, Slang and

Abbreviations, @username

Micropost Enrichment

REST

API

Data Collection

Complaint

Reports

Features Extraction

Tweet Metadata, Named

Entity Recognizer, N-

gram modeling,

Knowledgebase Graph

Complaint Tweets

Classification

Unknown

Features

Vector ModelOne-Class/ Multi-Class

Classification

Empirical Analysis and

Visualization

Problem Specific

Identification

63

Page 64: Potholes and Bad Road Conditions- Mining Twitter to

Non-Complaint Tweets (AISP)

64

Appreciation Posts

Tweet

Bag-of-

words

Core NLP

Exhaustive set of

Appreciation keywords

Similarity

Computation

Lemmatized

Match?

Tweet

Tone Analyzer

IBM Watson APIJoy Score?

> Threshold then

appreciation post

Sentence Level

Tone

Anger

Sadness

Fear

Joy

Anxiety

Non-Complaint Tweets

Page 65: Potholes and Bad Road Conditions- Mining Twitter to

Non-Complaint Tweets (AISP)

65

Information Sharing

• Tweets consisting of an external URL

• URL not an image, video

• Quote tweets

Non-Complaint Tweets

Page 66: Potholes and Bad Road Conditions- Mining Twitter to

Non-Complaint Tweets (AISP)

66

Promotional Tweets

• Tweets posted by other accounts or handle of same

department or public agency

• Tweet posted by a verified account

Non-Complaint Tweets