

DSS Lab NTUA AICCSA 2014

Evmorfia Biliri, Michael Petychakis, Iosif Alvertis, Fenareti Lampathaki, Sotirios Koussouris, Dimitrios Askounis (National Technical University of Athens – NTUA, DSSLab)

Infusing Social Data Analytics into Future Internet applications for Manufacturing


About me and the lab

• Me

– PhD Student

– API Developer

– Semantic Web enthusiast

• DSS Lab

– Research in ICT including

• Future Internet Applications and Systems for Enterprises and Public Administrations

• Big, Open and Linked Data and Analytics

• APIs, Social Media Publishing and Analytics

• eGovernance and Policy Modeling

• Enterprise and Government Interoperability

• ICT for Manufacturing

• Software Services and Cloud Infrastructures

The problem (I)

• A new age of engagement and collaboration has emerged with the proliferation of user-generated content

• The quantity of information in the world is soaring, with businesses, governments and society only starting to tap its potential

• Harnessing collective intelligence represents a challenge for any manufacturing industry.

• To understand what is being discussed online about any topic of interest, instantly grasping the state of the market

• To identify sentiments about products and brands early, thus preventing potential damage to the corporate reputation

• To detect user trends in time for them to be incorporated in product design

The problem (II)

What we have

Many users

Many social platforms

What we want

• Aggregated sentiment per product or product feature

• Brand mention analytics

• Trending word identification and analysis

The problem (III)

Loooved it!!!lol

imho

#cinema

@mpetyx Have a look at http://... #greatdesign

TTYN

#omgfacts

#fail

#thingsilove

lol haha FTW yea right

Liz has finally managed to achieve what seems to have been her goal... to release an album that could have just as easily been made by anybody else.

The problem (IV)

IRONY IDENTIFICATION IS HARD…

Amazon.com (1 star)

“It took a couple of goes to get into it, but once the story hooked me, I found it difficult to put the book down -- except for those moments when I had to stop and shriek at my friends, "SPARKLY VAMPIRES!" or "VAMPIRE BASEBALL!" or "WHY IS BELLA SO STUPID?" These moments came increasingly often as I reached the climactic chapters, until I simply reached the point where I had to stop and flail around laughing.”

What if we did not have the rating (ground truth)?

Algorithm

• Lexicon-based

– Pre-defined lists of words and phrases associated with a sentiment

– LIWC, SentiWordNet

– Difficult to apply the same list in different contexts

– Semi-automatic construction of lexicons

• Machine learning based

– Naïve Bayes, SVM, Maximum entropy

– Supervised classification

– Adaptation to domain/context

– Labeled data difficult to find
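The two families above can be contrasted in a minimal sketch. The word list, the training sentences and the class name `NaiveBayes` are hypothetical and chosen only for illustration; real lexicons such as SentiWordNet are far larger, and this is not the FITMAN-Anlzer implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"love": 1, "great": 1, "good": 1, "fail": -1, "bad": -1, "ugly": -1}


def lexicon_polarity(text):
    """Lexicon-based: sum word scores and map the total to a label."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Machine-learning-based: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled data; in practice this is the hard part to obtain.
train_texts = ["great sturdy chair", "love this table",
               "bad wobbly chair", "ugly table fail"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

The lexicon scorer needs no training data but knows only its fixed list, whereas the classifier picks up domain words such as "sturdy" or "wobbly" from the labeled examples.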

NLP

• Tokenization

What about punctuation?

• Conversion to lowercase

Do capitalized words indicate sentiment (anger, excitement)?

• Emoticon detection and replacement

Commonly used as “ground-truth” polarity labels in the automatic creation of testing datasets.

• Stop-word filtering

• Repeated character removal

Could be indicative of feeling? Remove based on lexicon usage?

• Stemming

Use an aggressive stemmer? Possible loss of information?

• N-gram creation

n=? (usually between 1 and 3)
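Most of these preprocessing steps can be sketched with the standard library alone. The emoticon map, the stop-word list and the tag names `EMO_POS` / `EMO_NEG` are assumptions made for this example; stemming is left out because it would need an external stemmer.

```python
import re

# Hypothetical, deliberately tiny resources; real systems use fuller lists.
EMOTICONS = {":)": "EMO_POS", ":-)": "EMO_POS", ":D": "EMO_POS",
             ":(": "EMO_NEG", ":-(": "EMO_NEG"}
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "of", "and"}


def preprocess(text):
    """Return a list of cleaned tokens for one short social-media message."""
    # Replace emoticons first, before lowercasing or punctuation stripping
    # can destroy them (":D" would otherwise become ":d").
    for emoticon, tag in EMOTICONS.items():
        text = text.replace(emoticon, f" {tag} ")
    tokens = []
    for raw in text.split():
        if raw in ("EMO_POS", "EMO_NEG"):
            tokens.append(raw)
            continue
        word = raw.lower()
        # Collapse runs of 3+ identical characters to two: "loooved" -> "looved".
        word = re.sub(r"(.)\1{2,}", r"\1\1", word)
        # Strip punctuation from the edges, keeping "#" for hashtags.
        word = word.strip(".,!?;:\"'()")
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `preprocess("Loooved it!!! The design is great :)")` yields `["looved", "design", "great", "EMO_POS"]`. Collapsing repeats to two characters rather than one is one possible answer to the open question on the slide: the elongation is shortened but remains visible as a feature.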

NLP (II)

• Part-of-speech tagging

Study the effect of adverbs, adjectives and the POS structure of the sentence

• Negation detection

Replacement with one word?

• Hashtags

Are they more valuable? Could be used to map to preconfigured subjects and improve accuracy…
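Two of these ideas can be illustrated in a few lines: replacing every negation form with one token, and mapping hashtags to preconfigured subjects. The negation list, the `NOT` token and the hashtag-to-subject table are hypothetical; POS tagging is omitted because it requires an external tagger.

```python
import re

# Hypothetical resources for illustration only.
NEGATIONS = {"not", "no", "never", "dont", "don't", "didn't", "isn't",
             "cannot", "can't"}
HASHTAG_SUBJECTS = {"greatdesign": "design", "fail": "quality",
                    "thingsilove": "brand"}


def normalize_negations(tokens):
    """Replace every negation word with the single token NOT."""
    return ["NOT" if t.lower() in NEGATIONS else t for t in tokens]


def hashtag_subjects(text):
    """Map hashtags found in the text to preconfigured subjects."""
    tags = [t.lower() for t in re.findall(r"#(\w+)", text)]
    return [HASHTAG_SUBJECTS[t] for t in tags if t in HASHTAG_SUBJECTS]
```

With one shared token, "don't like" and "never like" produce the same bigram `("NOT", "like")`, which gives a classifier more evidence per feature.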


Software and services

And many more …


Our approach

The FITMAN “Unstructured and Social Data Analytics” Specific Enabler (FITMAN-Anlzer) extracts unstructured data from selected web resources and social data from selected social networks, and turns such user-generated content into knowledge to be used for the benefit of manufacturers.

A web infrastructure to…

Collect, Store, Process, Visualize, Interact with

Cloud-based, customizable, domain-independent solution with a user-friendly interface

Design goals

• Domain-independent

Optimizations are usually very specific and cannot be applied across different industries

• No coding skills or formal query language required

People who train and use the system are typically not IT specialists

• Flexibility

The system can be trained to meet the needs of a specific domain and even a specific department

e.g. a promotional tweet

• Real-time streaming and scalability

Important news goes viral within hours…

• Report history

Store and keep statistics and charts for future reference

The described design goals were decided in collaboration with, and validated by, the FITMAN trials with respect to real-life applications in the manufacturing domain


Functionalities Overview

Keyword- & Account-based Information Acquisition

Information Filtering

Sentiment Analysis

Trend Analysis

Added-value User Generated Content (UGC)

Repeated Characters Removal

Username Removal

Tokenization

Conversion to lowercase

Stop-words Filtering

……

SVM Training

+ Light Stemming

+ Term Frequency

Emoticons Identification

URL Removal

Snowball Stemming
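The username removal, URL removal and term-frequency boxes of this diagram could look roughly as follows. This is a sketch under assumptions: the regular expressions and the choice of relative frequency (count divided by total tokens) are illustrative, not taken from the FITMAN-Anlzer code.

```python
import re
from collections import Counter


def strip_usernames_and_urls(text):
    """Remove @mentions and http(s) URLs, then collapse extra whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return " ".join(text.split())


def term_frequencies(tokens):
    """Return relative term frequencies (count / total tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}
```

URLs are removed before usernames so that an "@" inside a link is not treated as a mention. The resulting frequency dictionaries are the kind of feature vector an SVM can be trained on.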


Technology Stack & Interactions

Trend & Sentiment Analysis Engine

Processing /Querying Engine

Visualization & Report Creator Engine

Data Connectors

Storage System

Scalable… Transferable… Extensible… Open-source…


Charts


Data retrieval

• Twitter

– Streaming API

Low latency access to Twitter’s global stream of Tweet data. Suitable for following specific users or topics.

• Facebook

– Graph API

Primary way to get data in and out of Facebook's social graph. It's a low-level HTTP-based API that you can use to query data.

Example: /{page-id}/posts to get the posts that were published by this page
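As a rough illustration of keyword- and account-based acquisition, the sketch below only builds the request parameters and URL; it makes no network call and leaves out OAuth. The `track` and `follow` parameters and the `statuses/filter` endpoint are as documented for Twitter API v1.1 around the time of the talk and may no longer be available; the helper names, page id and token are hypothetical.

```python
from urllib.parse import urlencode

# Endpoints as documented at the time of the talk; both APIs have since evolved.
TWITTER_FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"
GRAPH_API_BASE = "https://graph.facebook.com"


def twitter_filter_params(keywords=(), user_ids=()):
    """Build the form parameters for a keyword- and account-based stream."""
    params = {}
    if keywords:
        params["track"] = ",".join(keywords)
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    return params


def page_posts_url(page_id, access_token):
    """Build the Graph API URL for /{page-id}/posts."""
    query = urlencode({"access_token": access_token})
    return f"{GRAPH_API_BASE}/{page_id}/posts?{query}"
```

A connector would POST the Twitter parameters to `TWITTER_FILTER_URL` with signed credentials and keep the connection open, while the Facebook URL would be polled with ordinary GET requests.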


Storage System

Couchbase Server

NoSQL database

Dynamic schema design

Flexibility, Scalability

Free Enterprise edition

Source: http://www.couchbase.com/nosql-resources/what-is-no-sql


Indexing

Elasticsearch

• Flexible and powerful open source, distributed, real-time search and analytics engine.

• Easy to use

• Construction of structured queries also in JSON

• RESTful API for configuration/management

• Suitable for JSON documents.
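A structured query in JSON might look like the sketch below, which matches documents on a search term and counts them per sentiment label. The field names `text` and `sentiment` and the helper name are assumptions for illustration; the body would be sent to an index's `_search` endpoint.

```python
import json


def sentiment_report_query(search_term, size=10):
    """Build a query body: match documents on a term, count them per sentiment."""
    return {
        "size": size,
        "query": {"match": {"text": search_term}},
        "aggs": {"per_sentiment": {"terms": {"field": "sentiment"}}},
    }


# Serialize to the JSON that would be sent as the request body.
body = json.dumps(sentiment_report_query("furniture"), indent=2)
```

Because the query is plain JSON, a report definition can be stored and replayed later without any query-language parsing.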


Sentiment analysis engine

RapidMiner Studio

• Easy-to-use visual environment for predictive analytics

• No programming required

• Available implementation for SVM

• Powerful text processing plugin


Visualization

• Kibana

– No code required

– Real-time analysis of streaming data

– Highly scalable

– Open source, community driven

– Seamless integration with Elasticsearch

• Google Charts

– Great variety of charts

Scenario: The User Perspective

1. I want to monitor trends for furniture, so I access the FITMAN Unstructured & Social Data Analytics SE

2. I create a new project for the domain I am interested in

Step 2: Initialization


Scenario: The User Perspective

3. I provide the necessary training material

Users: publish UGC

Connectors: automatically collect data

Step 3: Training

• Download a CSV file with the 1000 most recently collected documents, edit it and upload it to train the system

• Upload your own CSV file

• One training file per language
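Reading such a training file could be sketched as follows. The column names `text` and `sentiment`, the function name and the sample rows are assumptions; the slides do not specify the actual file layout.

```python
import csv
import io


def load_training_rows(csv_text, language):
    """Parse one language's training file into (text, label) pairs.

    Rows without a label are skipped, since the user may not have
    annotated every collected document.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = (row.get("sentiment") or "").strip().lower()
        if label:
            rows.append((row["text"], label))
    return {"language": language, "examples": rows}


# Hypothetical file content: two annotated rows and one left unlabeled.
sample = (
    "text,sentiment\n"
    "Loooved it!!!,positive\n"
    "The table arrived broken,negative\n"
    "Just a link,\n"
)
training = load_training_rows(sample, "en")
```

Keeping one result per language mirrors the one-file-per-language rule, so that each language can get its own preprocessing and model.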


Scenario: The User Perspective

4. I select search terms to generate data reports

Step 4: Creating reports


Scenario: The User Perspective

5. I view collected data, navigate to the analysis and refine results

Step 5: Analysis & Visualization


Anlzer free search


Future steps

• Evaluation in the scope of real-life business cases in various domains

• Measure and evaluate the effect of domain-specific training

• Experiment with other NLP techniques (e.g. include POS-tagger in text preprocessing)

• Extend polarity tags (detect more sentiments)

• Try other machine learning algorithms

• Subjective-objective sentence identification, as a prior step to the sentiment analysis process

• Conversation-level analysis of Facebook comments

• Influencer detection

• Allow for even more flexible queries with Elasticsearch


Thank you!