International Journal of Computer Trends and Technology (IJCTT) – volume 4 Issue 8–August 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2699

Classification of Unwanted Messages in Online Social Network Using Machine Learning Algorithms

Padma Priya.B#1, Sathiyakumari.K*2
#1Research Scholar, *2Assistant Professor
PSGR Krishnammal College for Women
Bharathiar University, Coimbatore, India

Abstract: People today are very active users of Online Social Networks (OSN). They share details of their day-to-day lives and stay in touch with their loved ones wherever in the world they live. The main issue is the ability to control the messages posted in a user's private messages or on their wall, so that unwanted messages can be detected and dealt with. This work focuses on predicting the emotion of a particular message or post in various OSNs such as Twitter and blogs, so that inappropriate messages can be filtered. The paper collects a corpus for sentiment analysis and applies linguistic analysis and machine learning techniques to predict emotions accurately. Using the corpus we define distinct emotions and filter unwanted messages.

Keywords: Online Social Networks (OSN), information filtering, short text classification, criteria-based personalization

I. INTRODUCTION

Online social networks are one of the standard platforms for social collaboration. In earlier days messages were sent through letters, telephones, emails and so on. With rapid technical development, people now share the details of their day-to-day lives through social networking websites. Continuous communication among people implies a considerable amount of data transfer, including text, audio and video, which explicitly depicts information about a person's life. Interpersonal communication is a growing area in which people explore themselves, their relationships and social and cultural artefacts. The huge and dynamic nature of this data motivates researchers to mine useful information from online social networks. In online social networks, information filtering can be used for a more sensitive purpose, since posted texts or comments may be inappropriate. In psychology and philosophy, emotion is a subjective conscious experience that is categorized into different types. Here we deal with emotions expressed in text, for example tweets and comments.

The aim of the present work is to propose a system that classifies short text messages into different categories and filters them accordingly. As learning models we use SVM and Naïve Bayes to classify emotion. So far, emotion analysis of text has been done on documents, stories and novels, which has its own limitations, whereas here we predict the emotion of user conversations, tweets and comments for a socially safe environment, since lately people try to misuse their privileges and spam messages and vulgar content are sometimes posted by users. The text is first classified into five categories. Primary emotions such as happy, sad, angry and surprise are detected, and among non-neutral content two emotions are detected: vulgar and offensive.

The data is collected from Twitter [2]. As we need to find the emotions of different people and different types of conversation, Twitter is a suitable medium for data collection. Conversations from blogs and microblogging sites are also collected. Nearly two thousand tweets are collected, along with text from various online social networks.

II. RELATED WORK

Adil et al. [1] have studied human emotions in text in a multimodal form that includes visual and acoustic features. Alec Go et al. [5] have classified tweets as positive, negative and neutral. Dan Roth et al. [9]. Diana et al. [14] have used two data sets, SemEval 2007 Task 14 and an emotion-annotated blog corpus, in which they classify six basic emotions using SVM and other machine learning algorithms. Schaffer and Diana 2011 [16].

III. DATA SET

The data set is collected from Twitter. Tweets are collected from the web [2]. The data set originally contained multilingual tweets; foreign-language tweets have been removed, so the data set contains only tweets in English. The resulting data set has 7500 tweets.

TABLE I
EXAMPLE OF REFINED TWEETS

Honesty hurts. :)
@im_rahultomar: frankly speaking i donno... Im a proud human being but when it comes to being Indian i don’t know
@Jiah Khan no more? Unbelievable! She was so young.
Ritu Da, a sensitive artistic mind, a gentle human, considerate and caring. Gone! Spoke while ago on doing another film together!

A. Data Annotation

Emotion labeling is more reliable and effective when there is more than one judgment for each label. Five judges have manually annotated the data. They label each tweet in the data set with the emotion category to which it belongs; if a tweet does not fit any emotion category it is described as undefined.

B. Measuring Annotations

The interpretation of emotion in text is very subjective, which leads to disagreement between judges. To assess the annotations we use Cohen’s kappa, a statistical measure of inter-annotator agreement that corrects for agreement expected by chance; this helps in settling on the accurate emotion for a particular text.
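As an illustration of the agreement measure, the following is a minimal sketch of Cohen's kappa for a pair of judges. The function name `cohen_kappa` and the toy labels are assumptions made for this example and are not taken from the paper's data. Cohen's kappa is defined for two annotators, so with five judges one option (an assumption here, not something the paper states) is to compute it for each pair of judges and average the results.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both judges pick the same label
    # if each labels independently according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both judges used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations from two judges for six tweets.
judge_1 = ["happy", "sad", "angry", "happy", "sad", "happy"]
judge_2 = ["happy", "sad", "happy", "happy", "angry", "happy"]
kappa = cohen_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.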

C. Learning Model and Feature Set

Our emotion classifier is based on machine learning algorithms. First, stop words are removed from the collected data set and the remaining words are stemmed. The normalized data thus obtained is used as the feature vector for training the classifier. The following features are extracted: unigrams, bigrams, personal pronouns, POS, POS bigrams, the WordNet-Affect emotion lexicon, BoW and Dp. Each word is stemmed using the Porter stemmer. Personal pronouns, adjectives, POS and POS bigrams are extracted using the Stanford POS tagger with Penn Treebank tags. The WordNet-Affect emotion lexicon captures the contextual information of a particular text. Using these features, the emotion of a text is determined. All proposed features are analyzed in our experiment in order to find the combination most appropriate for message classification.
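The unigram and bigram part of this feature set can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical one, the tokenizer is a simple regular expression, and the Porter stemming, POS tagging and lexicon features described above are left out to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "i", "so", "was"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Return counts of unigrams and bigrams after stop-word removal.

    Bigrams are built from the filtered token sequence, so they can span
    removed stop words and sentence boundaries.
    """
    tokens = [tok for tok in tokenize(text) if tok not in STOP_WORDS]
    features = Counter(tokens)  # unigram counts
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))  # bigram counts
    return features


features = extract_features("She was so young. Unbelievable!")
print(sorted(features.items()))
```

The resulting counts can be mapped onto a fixed vocabulary to obtain the numeric vector that a classifier is trained on.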

D. Experiment and Result

This section describes the data collection, classifiers and other parameters used to conduct the experiments, as well as the results obtained using the tool. The open source data mining tool RapidMiner 5 is used. Two classification algorithms are used for emotion classification: Naïve Bayes and support vector machine. These are implemented and trained using RapidMiner, a collection of state-of-the-art machine learning algorithms and data pre-processing tools. The robustness of the classifiers is evaluated using 10-fold cross-validation for all the algorithms. Predictive accuracy is used as the primary performance measure for predicting emotions in text. Precision, recall and F-score are the parameters used to evaluate predictive accuracy and thereby compare the machine learning algorithms. Using these metrics and the combined features, we compare the prediction accuracy of the two machine learning algorithms.
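The 10-fold cross-validation procedure can be illustrated with a short sketch. This is not the RapidMiner implementation; the helper `k_fold_indices`, the shuffling seed and the item count of 25 are assumptions chosen for the example.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each item is used for testing exactly once and for training in the remaining nine folds; the reported score is usually the average over the ten test folds.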

TABLE II
COMBINATION OF FEATURES IN TERMS OF PRECISION, RECALL, F SCORE

FEATURES                            PRECISION   RECALL   F-SCORE
DP                                  38%         25%      32%
BoW                                 42%         29%      35%
Bigram                              56%         30%      36%
Unigram                             28%         45%      40%
Pos                                 63%         47%      49%
Pos Bigram                          56%         58%      52%
Dp+BoW                              65%         59%      57%
Dp+Bow+Bigram                       55%         60%      59%
Dp+BoW+Bigram+Unigram               67%         64%      60%
Dp+BoW+Bigram+Unigram+Pos Bigram    74%         67%      67%
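For reference, the three metrics used in these tables are conventionally computed per class as sketched below. The function name and the toy labels are hypothetical, and the F-score is assumed here to be the balanced F1 (the harmonic mean of precision and recall), which the paper does not state explicitly.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(t == target and p == target for t, p in pairs)
    false_pos = sum(t != target and p == target for t, p in pairs)
    false_neg = sum(t == target and p != target for t, p in pairs)
    # Guard against division by zero when the class is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and classifier predictions for six messages.
true_labels = ["happy", "happy", "sad", "angry", "happy", "sad"]
predicted = ["happy", "sad", "sad", "happy", "happy", "sad"]
p, r, f = precision_recall_f1(true_labels, predicted, "happy")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because F1 is a harmonic mean, it always lies between the precision and recall values it is computed from.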

TABLE III
RESULT OF THE PROPOSED WORK IN TERMS OF PRECISION, RECALL, F SCORE IN CLASS VALUES

Metrics      Happy   Sad   Angry   Vulgar   Offensive
Precision    87%     53%   66%     65%      58%
Recall       78%     79%   69%     72%      63%
F Score      73%     81%   77%     80%      77%

TABLE IV
PREDICTION ACCURACY COMPARED WITH TWO ALGORITHMS NAÏVE BAYES AND SVM

Classifiers                         Naive Bayes   SVM
Time taken to build model (min)     3             5
Correctly classified instances      732           954
Incorrectly classified instances    115           95
Prediction accuracy                 67.64%        75%

The table above compares NB and SVM. The NB algorithm gives lower accuracy than SVM.
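For readers unfamiliar with the Naïve Bayes classifier compared above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the RapidMiner operator used in the experiments; the class name `NaiveBayesText` and the toy training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # per-class token counts
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Work in log space: log prior plus sum of log likelihoods.
            score = math.log(doc_count / total_docs)
            for word in tokens:
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                smoothed = (self.word_counts[label][word] + 1) / (total_words + vocab_size)
                score += math.log(smoothed)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already tokenized training messages.
train_docs = [["great", "day"], ["love", "this"], ["so", "sad", "today"], ["miss", "you", "sad"]]
train_labels = ["happy", "happy", "sad", "sad"]
model = NaiveBayesText().fit(train_docs, train_labels)
prediction = model.predict(["sad", "day"])
print(prediction)
```

The classifier assumes that words are conditionally independent given the class, which keeps training fast at the cost of ignoring word interactions.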

E. Future Work

In future work, other machine learning algorithms and fuzzy neural networks can be combined into a hybrid of algorithms in order to obtain more accurate results. Online social networks can apply these text mining and sentiment analysis techniques more extensively to filter unwanted text from the user's wall.

IV. REFERENCES

[1] Adil Alpkocak, Jan 1 2008, AISB 2008 Convention Communication.

[2] Information gathered from http://infolab.tamu.edu/resources

[3] http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

Go_Bhayani_Huang_2009, http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf, CS224N Project Report, Stanford

[4] Cecilia Ovesdotter Alm, Dan Roth, Richard Sproat 01/2005; In proceeding of: HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 6-8 October 2005, Vancouver, British Columbia, Canada

[5] KNOWLEDGE ENGINEERING: PRINCIPLE AND TECHNIQUE, KEPT 2008 International Conference on Knowledge Engineering Principles and Techniques Selected Papers, Cluj-Napoca (Romania), July 2-4 2000

[6] Soumaya Chaffar and Diana Inkpen, "Using a Heterogeneous Dataset for Emotion Analysis in Text", in Proceedings of the 24th Canadian Conference on Artificial Intelligence (AI 2011), St-John's, NFL, Canada, May 2011, pp. 62-67

[7] B. Liu. Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, Second Edition, (editors: N. Indurkhya and F. J. Damerau), 2010

[8] B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis.” Foundations and Trends in Information Retrieval 2(1-2), pp. 1–135, 2008.

[9] J. Wiebe, T. Wilson, R. Bruce, M. Bell, and M. Martin, “Learning Subjective Language,” Computational Linguistics, vol. 30, pp. 277–308, September 2004

[10] M. Hu and B. Liu, “Mining and Summarizing Customer Reviews,” Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177, 2004.

[11] N. Jindal, and B. Liu. “Opinion Spam and Analysis.” Proceedings of the ACM Conference on Web Search and Data Mining (WSDM), 2008.