Mining the Social Web - Leibniz Universität Hannover

Mining the Social Web

Asmelash Teka Hadgu

L3S Research Center

April 24, 2018



Outline

Introduction: The big picture

Deep Learning

Selected Papers

What is Web Science?

Definition from the Web Science Conference (http://websci16.org):

Web Science is the emergent science of the people, organizations, applications, and of policies that shape and are shaped by the Web.

Studying the online world to understand the offline world.

The Social Web

Web 2.0 describes World Wide Web sites that emphasize user-generated content, usability, and interoperability.

The social web is a set of social relations that link people through the World Wide Web.

Source: https://makeawebsitehub.com/social-media-sites

User Identification

Problem: How can we automatically construct user profiles?

User Identification Applications

Authoritative user extraction - Discovering expert users for a target topic.

Personalized web search - Personalized retrieval of social media posts.

User recommendation - Suggesting new interesting users to a target user.

Preprocessing step for many interesting downstream analyses.

Machine Learning for User Identification [1,2,3]

Challenges:

Labels - Generating labelled data to train models

Feature Engineering - What are good features?

Modeling - Mostly off-the-shelf predictive models

Feature Engineering

Feature engineering is key in a machine learning task.

● Profile features
● Tweeting behaviour features
● Linguistic content
● Social network features

Feature Engineering

Baseline

Bag-of-words (BOW) model with weighting (e.g., TF-IDF)

Each user is represented as a bag of the words in their tweets.

These words are then used as features to train a classifier.
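The bag-of-words baseline can be sketched with the standard library alone. This is a minimal illustration, assuming each user's tweets have already been concatenated into one string; the function name `tfidf_vectors`, the whitespace tokenizer and the plain `log(N / df)` weighting are illustrative choices, not necessarily what the lecture used.

```python
import math
from collections import Counter

def tfidf_vectors(user_tweets):
    """Map each user to a {word: tf-idf weight} bag-of-words vector.

    user_tweets: {user: all of that user's tweets joined into one string}
    (a hypothetical input format chosen for this sketch).
    """
    tokenized = {user: text.lower().split() for user, text in user_tweets.items()}
    n_users = len(tokenized)
    # Document frequency: for how many users does each word occur at least once?
    doc_freq = Counter(w for words in tokenized.values() for w in set(words))
    vectors = {}
    for user, words in tokenized.items():
        term_freq = Counter(words)
        vectors[user] = {
            # term frequency * inverse document frequency
            w: (count / len(words)) * math.log(n_users / doc_freq[w])
            for w, count in term_freq.items()
        }
    return vectors

# Toy users, invented for illustration only.
vectors = tfidf_vectors({
    "alice": "deep learning paper deadline",
    "bob": "football match tonight",
    "carol": "new paper on learning",
})
```

Words used by a single user ("football") receive a higher weight than words shared by several users ("paper"); the resulting vectors could then be passed to any off-the-shelf classifier.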

Profile Features

Ideally, profile information should be sufficient. In reality, it does not contain enough quality information to be used directly for user classification.

- Length of name, number of alphanumeric characters
- Capitalization forms in the user name
- Use of an avatar picture
- Number of followers, friends
- Regular expression matches in the bio, e.g.:

(I|i)(m|am|'m)[0-9]+(yo|year old)
(man|woman|boy|girl)
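A minimal sketch of how such profile features might be extracted. The field names, the function `profile_features` and the exact age pattern are assumptions made for this example; the pattern is a loosened variant of the bio regular expression on the slide, not the original.

```python
import re

# Hypothetical pattern in the spirit of the slide's bio regex: a self-reported
# age such as "I'm 25 yo" or "i am 30 years old".
AGE_PATTERN = re.compile(r"\b[Ii]\s?(?:'m|m|am)\s+(\d{1,2})\s?(?:yo|years? old)\b")

def profile_features(name, bio, followers, friends, has_avatar):
    """Turn raw profile fields (hypothetical names) into a feature dict."""
    match = AGE_PATTERN.search(bio)
    return {
        "name_length": len(name),
        "name_alnum_chars": sum(ch.isalnum() for ch in name),
        "name_is_title_case": name.istitle(),
        "has_avatar": bool(has_avatar),
        "followers": followers,
        "friends": friends,
        "bio_states_age": match is not None,
        "bio_age": int(match.group(1)) if match else None,
    }

features = profile_features("Jane Doe", "I'm 25 yo, coffee lover", 120, 80, True)
```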

Tweeting Behaviour Features

A set of statistics capturing the way users interact with the micro-blogging service.

- Number of tweets of a user.
- Number and fraction of retweets of a user.
- Average number of hashtags and URLs per tweet.
- Average time between tweets and its standard deviation.
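These statistics are simple to compute once the tweets are available. The sketch below assumes a hypothetical tweet representation (dicts with `text`, `timestamp` in seconds, and `is_retweet`); real API payloads differ and would need mapping to this form.

```python
from statistics import mean, pstdev

def behaviour_features(tweets):
    """Summarize how a user tweets.

    `tweets` is a non-empty list of dicts with the hypothetical keys
    'text' (str), 'timestamp' (seconds) and 'is_retweet' (bool).
    """
    n = len(tweets)
    tokens_per_tweet = [t["text"].split() for t in tweets]
    n_retweets = sum(1 for t in tweets if t["is_retweet"])
    hashtags = [sum(tok.startswith("#") for tok in toks) for toks in tokens_per_tweet]
    urls = [sum(tok.startswith(("http://", "https://")) for tok in toks)
            for toks in tokens_per_tweet]
    # Gaps between consecutive tweets, in seconds.
    times = sorted(t["timestamp"] for t in tweets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "n_tweets": n,
        "n_retweets": n_retweets,
        "retweet_fraction": n_retweets / n,
        "avg_hashtags": mean(hashtags),
        "avg_urls": mean(urls),
        "avg_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }

# Toy timeline, invented for illustration only.
stats = behaviour_features([
    {"text": "Hello #ml #ai http://a.b", "timestamp": 0, "is_retweet": False},
    {"text": "RT nice", "timestamp": 60, "is_retweet": True},
    {"text": "#web", "timestamp": 180, "is_retweet": False},
])
```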

Linguistic Content Features

Linguistic content captures the user’s lexical usage and the main topics of interest to the user.

- List of socio-linguistic words such as emoticons, ellipses etc.
- Prototypical words, hashtags etc. Given n classes, each class c_i containing seed users S_i, each word w from the seed users is assigned a score:

$$\mathrm{proto}(w, c_i) = \frac{|w, S_i|}{\sum_{j=1}^{n} |w, S_j|} \qquad (1)$$

where |w, S_i| is the number of times the word w is issued by all users for class c_i.

- Latent Dirichlet Allocation (LDA) to generate topics used as features for classification.
- Sentiment words
- Word embeddings
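A sketch of the prototypical-word score. It assumes the score is the word's count among the seed users of class c_i divided by its total count over the seed users of all classes; the function name and the toy seed tweets are hypothetical.

```python
from collections import Counter

def prototypical_scores(seed_tweets_by_class):
    """Score each word per class as count-in-class / count-over-all-classes.

    seed_tweets_by_class: {class label: [tweet text, ...]} written by the
    seed users S_i of that class (hypothetical input format).
    """
    counts = {
        label: Counter(w for tweet in tweets for w in tweet.lower().split())
        for label, tweets in seed_tweets_by_class.items()
    }
    totals = Counter()
    for class_counts in counts.values():
        totals.update(class_counts)
    return {
        label: {w: n / totals[w] for w, n in class_counts.items()}
        for label, class_counts in counts.items()
    }

# Toy seed data, invented for illustration only.
scores = prototypical_scores({
    "dem": ["vote obama", "obama healthcare"],
    "rep": ["vote romney", "tax vote"],
})
```

Words with a score close to 1 occur almost exclusively in one class and are candidates for that class's prototypical vocabulary.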

Social Network Features

These features capture the social connections between a user and the accounts they follow, reply to, or retweet.

Prototypical ‘friend’ accounts are found by exploring the social network of users in the training set, bootstrapping as in Eq. (1).

- Number of prototypical friends, percentage of prototypical friends
- Prototypical replied users, prototypical retweeted users

Ground-Truth and Validation

Manually building ground-truth data and validating results is not easy.

● Combine multiple data sources, e.g., scraping websites that list users in each class.
● Use key terms in user profiles. This is prone to systematic bias, and care should be taken with the features.
● Use crowdsourcing services such as Amazon Mechanical Turk (https://www.mturk.com) and CrowdFlower (http://www.crowdflower.com/).

A Complete Pipeline

Identifying Computer Science researchers on Twitter [3]

Deep Learning

● Current state-of-the-art methods for classification are based on deep learning.

● The key difference is representation learning: instead of hand-crafting features, layers of a neural network learn representations (features) from raw inputs.

● Bag of words -> words in semantic spaces through word embeddings, e.g., word2vec


Intro Deep Learning


Applications:

Face Recognition

Object Detection

Neural Style Transfer

From Feedforward to Convolutional Neural Networks

1. Deep learning on large images

Consider an RGB image of 1000x1000 pixels: a fully connected (FC) layer with 1000 hidden units needs 1000 x 1000 x 3 x 1000 = 3 billion parameters.

- Need lots of data to prevent overfitting
- Computational/memory requirements are huge

2. Preserve the spatial relationships of input images.
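The parameter count in point 1 can be checked with a couple of lines. The helper name `dense_params` is hypothetical; the figure quoted on the slide counts weights only, so the bias is switched off to reproduce it.

```python
def dense_params(n_inputs, n_units, bias=True):
    """Number of learnable parameters in one fully connected layer."""
    return n_inputs * n_units + (n_units if bias else 0)

# 1000 x 1000 RGB image flattened into a vector of 3 million inputs.
n_inputs = 1000 * 1000 * 3
weights_only = dense_params(n_inputs, 1000, bias=False)
with_bias = dense_params(n_inputs, 1000)
```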

The Big Picture (What’s inside the box?)

[Figure: an input image is fed to a Convolutional Neural Network, which outputs the label "cat".]


Why Convolutions?

Main advantages of convolutional layers over fully connected layers:

● Sparse interactions

● Parameter sharing

● Translation invariance


Hierarchy of Feature Representations

Earlier layers of deep neural networks detect low level features such as edges and later layers detect high level features.

Convolution Operation:

● Apply a dot-product of the filter over the image.
● Convolutional networks are neural networks that use convolution in at least one of their layers.

Image:

1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Filter/kernel:

1 0 1
0 1 0
1 0 1

Animation Source: https://www.coursera.org/learn/convolutional-neural-networks
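The operation can be reproduced on the 5x5 image and 3x3 filter from this slide. This is a plain-Python sketch (the function name `conv2d_valid` is hypothetical) with no padding and stride 1; as is usual in deep learning, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and, at each
    position, sum the elementwise products of the overlapping values."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(n_rows - f_rows + 1):
        row = []
        for j in range(n_cols - f_cols + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(f_rows) for b in range(f_cols)))
        output.append(row)
    return output

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = conv2d_valid(image, kernel)  # 3x3 result: [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```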

Padding

Problems with the basic convolution operation:

- Shrinking output
- Throwing away information from the edges

E.g., an n x n image convolved with an f x f filter becomes (n - f + 1) x (n - f + 1).

To solve these problems we can pad the image before applying convolution.

Given: an n x n input convolved with an f x f filter and padding p gives: (n + 2p - f + 1) x (n + 2p - f + 1)

Valid and Same Convolutions

Valid Convolution: Convolution with no padding

Input image: n by n becomes (n - f + 1) x (n - f + 1)

Same Convolution: Pad so that output size is the same as the input size

Input image: n by n, output feature map: (n + 2p - f + 1) x (n + 2p - f + 1)

n + 2p - f + 1 = n => p = (f-1)/2

Common filter sizes for Convolution: 1x1, 3x3, 5x5, 7x7

Strided Convolution

Strided convolution is another important basic building block.

Instead of shifting the filter by one position, we move it by a stride of s steps.

Given an n x n image convolved with an f x f filter with padding p and stride s, the output size is given by: (floor((n + 2p - f)/s) + 1) x (floor((n + 2p - f)/s) + 1)
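The size rules for valid, same and strided convolutions all follow from one formula, floor((n + 2p - f)/s) + 1 per spatial dimension. A small sketch, with hypothetical helper names:

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding that keeps the output size equal to the input size (stride 1)."""
    if f % 2 == 0:
        raise ValueError("same padding p = (f - 1) / 2 needs an odd filter size")
    return (f - 1) // 2

valid_size = conv_output_size(5, 3)                       # no padding
same_size = conv_output_size(64, 5, p=same_padding(5))    # padded to keep 64
strided_size = conv_output_size(7, 3, s=2)                # stride 2
```

Odd filter sizes (1x1, 3x3, 5x5, 7x7) are convenient precisely because p = (f - 1)/2 is then an integer.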

Convolutions over Volumes

● Allow us to operate directly on multi-dimensional input, e.g., RGB images.

● Multiple filters stacked together allow us to detect multiple features.

● The resulting output is a volume, on which we can in turn perform convolution to form a convolutional network.

Video Source: https://www.coursera.org/learn/convolutional-neural-networks

Convolution Computation [Dimensions]

Input: 5x5x3 image

2 filters 3x3 with stride = 2 and padding = 1

Output: (5 + 2*1 - 3)/2 + 1 = 3 per spatial dimension, with one channel per filter => 3x3x2

We add a bias and apply a nonlinearity, e.g., ReLU, to obtain the output, completing one step of a convolution layer.

Video Source: http://cs231n.github.io/convolutional-networks/

One Convolutional Layer [Parameters]

Consider a 64x64x3 input image. If you apply 10 filters that are 3x3 in the first CONV layer of a neural network, how many parameters does that layer have?

Each filter has 3x3x3 = 27 weights + 1 bias = 28 parameters; 28 x 10 filters = 280 parameters.

Note: No matter how big the input image is, the number of parameters stays the same. This makes ConvNets less prone to overfitting.
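The count for a convolutional layer depends only on the filter size, the number of input channels and the number of filters. A short sketch with a hypothetical helper name:

```python
def conv_layer_params(f, in_channels, n_filters, bias=True):
    """Parameters of a CONV layer with n_filters filters of size f x f x in_channels."""
    per_filter = f * f * in_channels + (1 if bias else 0)
    return per_filter * n_filters

# 10 filters of size 3x3 applied to an RGB (3-channel) input.
params_64 = conv_layer_params(f=3, in_channels=3, n_filters=10)
# The input resolution never enters the formula, so a 1000x1000x3 image
# would give exactly the same count.
params_1000 = conv_layer_params(f=3, in_channels=3, n_filters=10)
```

For comparison, a fully connected layer with just 10 units on the flattened 64x64x3 input would already need 64 * 64 * 3 * 10 + 10 = 122,890 parameters.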

Types of Layers in a Convolutional Neural Network

- Convolution (CONV)
- Pooling (POOL)
- Fully Connected (FC)

CONV -> POOL -> CONV -> POOL -> CONV -> FC -> FC

Pooling Layer

- Reduce the size of the representation
- Speed up computation
- Make features more robust

No parameters to learn!

Hyperparameters: filter size, stride, max or average pooling; mostly no padding.

Pooling Operation:

Max Pooling: take the max of the values in each filter window. Similarly,

Average Pooling: take the average of the values in each filter window.

Input (pooled with a 2x2 filter, stride 2):

3 4 0 1
2 0 2 0
5 3 2 1
1 0 6 7

Max pooling:

4 2
5 7

Average pooling:

2.25 0.75
2.25 4
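Both pooling variants can be reproduced on the 4x4 example with a 2x2 window and stride 2. The function name `pool2d` and its signature are hypothetical; the sketch assumes no padding.

```python
def pool2d(grid, f=2, s=2, mode="max"):
    """Apply max or average pooling with an f x f window and stride s."""
    if mode == "max":
        reduce_window = max
    else:
        reduce_window = lambda values: sum(values) / len(values)
    output = []
    for i in range(0, len(grid) - f + 1, s):
        row = []
        for j in range(0, len(grid[0]) - f + 1, s):
            window = [grid[i + a][j + b] for a in range(f) for b in range(f)]
            row.append(reduce_window(window))
        output.append(row)
    return output

grid = [
    [3, 4, 0, 1],
    [2, 0, 2, 0],
    [5, 3, 2, 1],
    [1, 0, 6, 7],
]
max_pooled = pool2d(grid, mode="max")
avg_pooled = pool2d(grid, mode="average")
```

Note that the function has hyperparameters (window size, stride, mode) but nothing to learn, which matches the "no parameters" remark for pooling layers.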

ConvNet Example

How activation dimensions change:

Input: 64x64x3
CONV (f = 5x5, s = 1, #f = 8) => 64x64x8
MAXPOOL (f = 2x2, s = 2) => 32x32x8
CONV (f = 5x5, s = 1, #f = 16) => 28x28x16
MAXPOOL (f = 2x2, s = 2) => 14x14x16
CONV (f = 3x3, s = 1, #f = 32)
FC
FC

Why Convolutions?

Main advantages of convolutional layers over fully connected layers:

● Sparse interactions: In each layer, each output value depends only on a small number of inputs.

● Parameter sharing: A feature detector (such as a vertical edge detector) that’s useful in one part of the image is probably useful in another part of the image.

● Translation invariance: An image shifted by a few pixels should result in similar features and should be assigned the same output label.

Training ConvNets

Training examples: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$

Cost: $J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$, where $\hat{y}^{(i)}$ is the network's prediction for $x^{(i)}$.

Maximum likelihood provides a guide for how to design a good cost function.

Use gradient descent to optimize parameters to reduce J.
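The training recipe (a maximum-likelihood cost J averaged over the m examples, reduced by gradient descent) is the same for any differentiable model. To keep the sketch short, a one-feature logistic model stands in for the ConvNet; the data, learning rate and function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    """Average cross-entropy, i.e. the negative log-likelihood per example."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(w * x + b)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / len(xs)

def gradient_descent(xs, ys, lr=0.5, steps=200):
    """Repeatedly step the parameters against the gradient of the cost."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # For sigmoid + cross-entropy, dJ/dz_i = y_hat_i - y_i.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / m
        b -= lr * sum(errors) / m
    return w, b

# Toy data, invented for illustration: negative inputs are class 0, positive class 1.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
initial_cost = cost(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
final_cost = cost(w, b, xs, ys)
```

In a ConvNet the gradients with respect to all filters and biases are obtained by backpropagation, but the update rule is the same.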

Summary Part-One

We have seen the basic building blocks:

● Convolution operation, padding and strided convolution
● Convolution layer
● Pooling layer

How to put these things together?

Case Studies: Deep ConvNet Architectures

- Start with what’s used or modify for own purpose
- Gain intuition from successful architectures
- An architecture that works for one task often works well for other tasks

ImageNet Classification

Classify 1.2 million high-resolution images into 1000 different classes.

LeNet-5

Architecture: CONV-POOL-CONV-POOL-CONV-FC-FC

● Relatively small network, ~60k parameters
● Notice the pattern of one or more CONVs followed by a POOL, and then finally FCs
● As you go deeper in the network, HxW decreases (32 => 28 => 14 => 10 => 5) and the number of channels increases (1 => 6 => 16)

Input: 32x32 grayscale image
VALID CONV: 6 filters 5x5, stride 1
AVG POOL: 2x2, stride 2
VALID CONV: 16 filters 5x5, stride 1
AVG POOL: 2x2, stride 2
FC 120 nodes
FC 84 nodes

LeCun et al., 1998, Gradient-Based Learning Applied to Document Recognition
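The shrinking height/width and growing channel count can be traced layer by layer. This sketch assumes valid convolutions and 16 filters in the second CONV layer, consistent with the channel progression 1 => 6 => 16; the layer tuples and function names are a hypothetical encoding chosen for the example.

```python
def out_size(n, f, s=1, p=0):
    """Output size along one spatial dimension."""
    return (n + 2 * p - f) // s + 1

def trace_shapes(input_shape, layers):
    """Return the activation shape (height, width, channels) after each layer.

    Each layer is (kind, filter size, stride, number of filters or None).
    """
    h, w, c = input_shape
    shapes = [(h, w, c)]
    for kind, f, s, n_filters in layers:
        h, w = out_size(h, f, s), out_size(w, f, s)
        if kind == "conv":
            c = n_filters  # one output channel per filter; pooling keeps c
        shapes.append((h, w, c))
    return shapes

LENET5_FEATURE_LAYERS = [
    ("conv", 5, 1, 6),
    ("pool", 2, 2, None),
    ("conv", 5, 1, 16),
    ("pool", 2, 2, None),
]
shapes = trace_shapes((32, 32, 1), LENET5_FEATURE_LAYERS)
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]  # input size of the first FC layer
```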

AlexNet

● Similar to LeNet but much bigger: ~60M parameters instead of ~60k
● Used the ReLU activation function

Input: 227x227x3
VALID CONV 96 filters 11x11, s=4
MAX-POOL 3x3, s=2
SAME CONV 256 filters 5x5
MAX-POOL 3x3, s=2
SAME CONV 384 filters 3x3
SAME CONV 384 filters 3x3
SAME CONV 256 filters 3x3
MAX-POOL 3x3, s=2
FC 4096
FC 4096
Softmax 1000

Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks

VGG-16

Notable for its simple and uniform architecture.

● CONV = 3x3 filter, s=1, same
● MAX-POOL = 2x2, s=2
● From 8 to 16 layers

Input: 224x224x3
[CONV 64] x 2
POOL
[CONV 128] x 2
POOL
[CONV 256] x 3
POOL
[CONV 512] x 3
POOL
[CONV 512] x 3
POOL
FC 4096
FC 4096
Softmax 1000

Simonyan and Zisserman, 2015, Very Deep Convolutional Networks for Large-Scale Image Recognition

ResNet

He et al., 2015, Deep residual learning for image recognition

Very deep networks are difficult to train because of vanishing and exploding gradients.

Skip connection: takes the activation from one layer and feeds it directly to a layer much deeper in the network.

[Figure: plain connection vs. residual block]

ResNet

ResNets are formed by stacking residual blocks together to form a deep network.

ResNets do well because it’s easy for a residual block to learn the identity function.

[Figure: plain network vs. residual network]

He et al., 2015, Deep residual learning for image recognition
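Why the identity is easy to learn can be seen in a toy residual block. The sketch uses two small fully connected layers in plain Python instead of convolutions, and the names `residual_block`, `W1` and `W2` are hypothetical; it illustrates the skip connection, not the architecture from the paper.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def residual_block(x, W1, W2):
    """Two layers plus a skip connection: relu(W2 @ relu(W1 @ x) + x)."""
    hidden = relu(matvec(W1, x))
    transformed = matvec(W2, hidden)
    # The skip connection adds the block's input back before the final ReLU.
    return relu([t + skip for t, skip in zip(transformed, x)])

# With all weights at zero, the block passes a non-negative input through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
x = [1.0, 2.0, 0.5]
y = residual_block(x, zeros, zeros)
```

Driving the weights toward zero is enough for the block to behave like the identity on non-negative activations, so adding such a block should not make the network worse than the shallower one.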

ResNet

● Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR)

● Swept 1st place in all ILSVRC and COCO 2015 competitions

He et al., 2015, Deep residual learning for image recognition

GoogLeNet

When designing a layer for a ConvNet you might ask:

1x1 CONV? 3x3 CONV? 5x5 CONV? MAX POOL?

The Inception network says: why not all of them?

Szegedy et al., 2014, Going deeper with convolutions

GoogLeNet

Szegedy et al., 2014, Going deeper with convolutions

- 22 layers
- Efficient ‘Inception’ module
- No FC layers
- Only 5 million parameters, 12x fewer than AlexNet
- ILSVRC’14 classification winner (6.7% top-5 error)


Summary Part Two

Use Cases and Architectural Choices

Classic Examples:

LeNet-5, AlexNet, VGG

Recent Advances:

ResNet, GoogLeNet (Inception)

References

Convolutional Neural Networks on Coursera https://www.coursera.org/specializations/deep-learning

CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/

Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville http://www.deeplearningbook.org/

Applications

Speech Recognition

Machine Translation

Named Entity Recognition

Image Captioning

Why Not a Standard Feed-Forward Network?

● Inputs and outputs can have different lengths in different examples

● It doesn’t share features learned across time

● Sharing parameters across time steps, as RNNs do, reduces the number of parameters

Why Not CNNs?

CNN - specialized for processing a grid of values such as an image
RNN - specialized for processing a sequence of values x(1), . . . , x(τ)

CNN - can readily scale to images with large width and height
RNN - can scale to much longer sequences

Types of RNNs (RNN Architectures)

[Figure: RNN architectures, labelled with example tasks: image classification, music generation, image captioning, sentiment classification, named entity recognition, video activity recognition, machine translation]

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

What are Recurrent Neural Networks (RNNs)?

x_t - the input at time step t
s_t - the hidden state at time step t. It’s the “memory” of the network. s_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U x_t + W s_{t-1}). The function f is usually a nonlinearity such as tanh or ReLU.
o_t - the output at step t

Training an RNN (learning the parameters U, V, W): backpropagation using the Backpropagation Through Time (BPTT) algorithm
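The recurrence can be written out directly. The sketch below uses tanh for f and, as an assumption not stated on the slide, a linear readout o_t = V s_t; all names and the tiny one-dimensional weights are hypothetical.

```python
import math

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_forward(xs, U, W, V, s0):
    """Run a vanilla RNN over the sequence xs; return hidden states and outputs."""
    s = s0
    states, outputs = [], []
    for x in xs:
        # s_t = tanh(U x_t + W s_{t-1}): the same U and W are reused at every step.
        pre_activation = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(z) for z in pre_activation]
        states.append(s)
        # Assumption: linear readout o_t = V s_t (a softmax is often applied on top).
        outputs.append(matvec(V, s))
    return states, outputs

# One-dimensional toy example: input 1 at the first step, 0 at the second.
states, outputs = rnn_forward(
    xs=[[1.0], [0.0]], U=[[1.0]], W=[[0.5]], V=[[2.0]], s0=[0.0]
)
```

The second hidden state is non-zero even though the second input is zero: the state carries information from the first step forward, which is the "memory" mentioned above.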

Standard Recurrent Neural Networks

The repeating module in a standard RNN contains a single-layer neural network.

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTMs

LSTMs are a special type of RNN capable of learning long-term dependencies.


Selected Papers


Gender Shades

● Computer vision is being utilized in high-stakes sectors such as health-care and law enforcement.

● Machine learning algorithms can discriminate based on classes like race and gender.

● An approach to evaluate bias present in automated facial analysis algorithms and datasets.

Buolamwini, Joy, and Timnit Gebru. "Gender shades: Intersectional accuracy disparities in commercial gender classification." Conference on Fairness, Accountability and Transparency. 2018.

Gender Shades

Pilot Parliaments Benchmark (PPB) dataset

● Members of parliament from Africa (Rwanda, Senegal, South Africa) and Europe (Iceland, Finland, Sweden)

● Public figures with known identities and photos available under non-restrictive licenses

● Gender labeling based on name, gendered title, prefixes such as Mr. or Ms., and appearance in the photo

Buolamwini, Joy, and Timnit Gebru. "Gender shades: Intersectional accuracy disparities in commercial gender classification." Conference on Fairness, Accountability and Transparency. 2018.

Gender Shades

● In the current benchmark datasets IJB-A and Adience, the majority of the subjects (79.6% for IJB-A and 86.2% for Adience) are lighter-skinned.

● Evaluation on commercial gender classification:
○ Microsoft - Microsoft’s Cognitive Services Face API
○ IBM - IBM's Watson Visual Recognition API
○ Face++ - Publicly available demonstration of Face++ from China

● Darker-skinned females are the most misclassified group (with error rates of up to 34.7%)

Buolamwini, Joy, and Timnit Gebru. "Gender shades: Intersectional accuracy disparities in commercial gender classification." Conference on Fairness, Accountability and Transparency. 2018.


CheXNet

● Deep learning is being applied to help with improved medical diagnoses, disease analyses, and pharmaceutical development.

● Chest X-rays are the most common imaging examination tool for diagnosing Pneumonia and other diseases.

● Automated detection of diseases from chest X-rays at level of expert radiologists helps in clinical settings, also important to deliver health care in populations with inadequate access to diagnostic imaging specialists.

Rajpurkar, Pranav, et al. "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning." arXiv preprint arXiv:1711.05225 (2017).

CheXNet

● An algorithm that detects pneumonia from chest X-rays at a level exceeding professional radiologists.

● CheXNet is a 121-layer convolutional neural network trained on ChestX-ray14, a publicly available chest X-ray dataset.

● Uses dense connections and batch normalization to make the optimization of such a deep network tractable.

Rajpurkar, Pranav, et al. "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning." arXiv preprint arXiv:1711.05225 (2017).

CheXNet

● Heatmaps visualize the areas of the image most indicative of disease using class activation maps (CAMs).

● Let $f_k$ be the kth feature map and $w_{c,k}$ the weight in the final classification layer for feature map k leading to pathology c. Compute a map $M_c$:

$$M_c = \sum_k w_{c,k} f_k$$

● The heatmap is generated by upscaling $M_c$ to the dimensions of the image and overlaying it on the image.

Rajpurkar, Pranav, et al. "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning." arXiv preprint arXiv:1711.05225 (2017).
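A class activation map is a weighted sum of the final feature maps, using the classification-layer weights of the chosen pathology. The sketch below computes only that sum on tiny 2x2 maps; the function name and the toy numbers are hypothetical, and the upscaling and overlay steps are left out.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: M_c = sum over k of w_{c,k} * f_k.

    feature_maps: list of K equally sized 2D grids (the f_k)
    class_weights: list of K weights for one class c (the w_{c,k})
    """
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for f_k, w_ck in zip(feature_maps, class_weights):
        for i in range(rows):
            for j in range(cols):
                cam[i][j] += w_ck * f_k[i][j]
    return cam

# Two toy 2x2 feature maps and their weights for one class (invented values).
feature_maps = [
    [[1, 0], [0, 1]],
    [[0, 2], [2, 0]],
]
cam = class_activation_map(feature_maps, class_weights=[0.5, -1.0])
```

Locations with large positive values are those whose features push the prediction toward the class, which is what the heatmap highlights.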


DeepMoji

● NLP tasks are often limited by scarcity of manually annotated data.

● Researchers typically use emoticons and specific hashtags as forms of distant supervision.

● Learn emotion, sentiment and sarcasm detection using a single model through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis.

Felbo, Bjarke, et al. "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm." arXiv preprint arXiv:1708.00524 (2017).

Top five most likely emojis for each text

DeepMoji

Pretraining on the classification task of predicting which emoji were initially part of a text.

● English tweets without URLs
● Word tokens, repeated characters shortened
● A single tweet for the pretraining per unique emoji type, regardless of the number of emojis in a tweet
● Balance the dataset to avoid overrepresentation of the most frequent emojis

[Figure: DeepMoji model; S is the text length and C the number of classes]

Felbo, Bjarke, et al. "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm." arXiv preprint arXiv:1708.00524 (2017).


DeepMoji

Transfer Learning: fine tuning the pretrained model to a target task.

- Using model as a feature extractor with all layers frozen

- Using model as an initialization where the full model is unfrozen

- Sequentially unfreezing and fine tuning a single layer at a time, chain-thaw.

Felbo, Bjarke, et al. "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm." arXiv preprint arXiv:1708.00524 (2017).

Chain-thaw transfer learning approach
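The chain-thaw idea can be expressed as a schedule that says which layers are trainable at each fine-tuning stage, with everything else frozen. This is a sketch under assumptions: the layer names are hypothetical, and both the initial stage for newly added layers and the closing pass with all layers unfrozen are assumptions about the full procedure rather than something stated on the slide.

```python
def chain_thaw_schedule(layers, new_layers=()):
    """Return a list of stages; each stage lists the layers left unfrozen."""
    stages = []
    if new_layers:
        # Assumption: first train only the freshly added layers (e.g. a new softmax).
        stages.append(list(new_layers))
    # Then unfreeze and fine-tune one layer at a time, in order.
    stages.extend([name] for name in layers if name not in new_layers)
    # Assumption: finish with a pass in which all layers are trainable.
    stages.append(list(layers))
    return stages

# Hypothetical layer names for a pretrained text model with a new output layer.
LAYERS = ["embedding", "bilstm_1", "bilstm_2", "attention", "softmax"]
schedule = chain_thaw_schedule(LAYERS, new_layers=["softmax"])
```

In a training loop, each stage would set only the listed layers as trainable and fine-tune until the validation loss stops improving before moving to the next stage.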

References

1. Pennacchiotti, Marco, and Ana-Maria Popescu. "Democrats, republicans and starbucks afficionados: user classification in twitter." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.

2. Rao, Delip, et al. "Classifying latent user attributes in twitter." Proceedings of the 2nd international workshop on Search and mining user-generated contents. ACM, 2010.

3. Hadgu, Asmelash Teka, and Robert Jäschke. "Identifying and analyzing researchers on twitter." Proceedings of the 2014 ACM conference on Web science. ACM, 2014.

4. Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

5. Rajpurkar, Pranav, et al. "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning." arXiv preprint arXiv:1711.05225 (2017).

6. Felbo, Bjarke, et al. "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm." arXiv preprint arXiv:1708.00524 (2017).

7. Buolamwini, Joy, and Timnit Gebru. "Gender shades: Intersectional accuracy disparities in commercial gender classification." Conference on Fairness, Accountability and Transparency. 2018.