EmojiNet: An Open Service and API for Emoji Sense Discovery
Presented by Sanjaya Wijeratne
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran, EmojiNet: An Open Service and API for Emoji Sense Discovery, In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran
Problems with the Current State of the Art
● The existing version of EmojiNet:
○ Supports only 35% of all emoji defined by the Unicode Consortium (845 out of 2,389)
○ Has very short emoji sense definitions (10 to 15 words)
○ Has no support for platform-specific emoji meanings
○ Is not available for download as a dataset
○ Does not support REST API access
What Is New in EmojiNet
● Supports all 2,389 emoji defined by the Unicode Consortium
○ 2,389 emoji (roughly a 3x increase)
○ 12,904 sense definitions (roughly a 4x increase)
● Sense embeddings learned over text corpora
○ Word embeddings learned from Twitter and Google News corpora are used to further strengthen sense definitions
● Platform-specific meanings for 40 commonly misunderstood emoji, obtained through an Amazon Mechanical Turk task
● Public release of the EmojiNet dataset with REST API access
Sense Filtering
● Our sense pool contained 50,115 senses in total
○ 21,779 of them were found to be incorrect English senses
○ We evaluated the remaining 28,336 sense labels
■ 15,432 sense labels were removed as incorrect (noisy data extracted from the Emoji Dictionary)
○ The remaining 12,904 sense labels were considered for sense disambiguation
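The filtering counts above chain together as a simple funnel. A minimal sketch that re-derives the remaining count after each stage from the numbers on this slide; `filtering_funnel` and the stage descriptions are hypothetical names used only for illustration, not part of EmojiNet.

```python
# Sense counts reported on the Sense Filtering slide.
TOTAL_SENSES = 50_115

# (stage description, number of senses removed); descriptions are illustrative labels.
REMOVED_STAGES = [
    ("incorrect English senses", 21_779),
    ("noisy Emoji Dictionary labels", 15_432),
]


def filtering_funnel(total, stages):
    """Return (stage, removed, remaining) rows for a sequence of filtering stages."""
    remaining = total
    rows = []
    for name, removed in stages:
        if removed > remaining:
            raise ValueError(f"stage {name!r} removes more senses than remain")
        remaining -= removed
        rows.append((name, removed, remaining))
    return rows


for name, removed, remaining in filtering_funnel(TOTAL_SENSES, REMOVED_STAGES):
    # Prints 28,336 remaining after the first stage and 12,904 after the second.
    print(f"{name}: -{removed:,} -> {remaining:,} remaining")
```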
EmojiNet Resource Evaluation
● Resource linking based on image similarity achieved 96.27% accuracy
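The slide does not say how image similarity was measured. As a minimal sketch, assuming each emoji image has already been reduced to a numeric feature vector (a hypothetical representation, e.g. a color histogram), linking can be done by nearest-neighbour search under cosine similarity and scored against gold links. The resource IDs, vectors, and function names below are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_by_image(source, target):
    """Link each source emoji ID to the target ID with the most similar image vector."""
    return {
        sid: max(target, key=lambda tid: cosine(svec, target[tid]))
        for sid, svec in source.items()
    }


def linking_accuracy(links, gold):
    """Fraction of predicted links that agree with the gold-standard links."""
    correct = sum(1 for sid, tid in links.items() if gold.get(sid) == tid)
    return correct / len(links) if links else 0.0


# Hypothetical image feature vectors for two resources describing the same emoji.
resource_a = {"a1": [1.0, 0.0, 0.2], "a2": [0.1, 0.9, 0.0], "a3": [0.0, 0.1, 1.0]}
resource_b = {"b1": [0.9, 0.1, 0.1], "b2": [0.0, 1.0, 0.1], "b3": [0.1, 0.0, 0.9]}
gold_links = {"a1": "b1", "a2": "b2", "a3": "b3"}

links = link_by_image(resource_a, resource_b)
print(links, linking_accuracy(links, gold_links))
```

Nearest-neighbour linking always produces a match, so a real pipeline would likely also need a similarity threshold for emoji that have no counterpart in the other resource.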
Adding Word Embeddings to EmojiNet
● We trained a Twitter word embedding model on 110 million tweets containing emoji. We also used a publicly available word embedding model trained on Google News to obtain word vectors
● Each word in each sense of each emoji was replaced by its 20 most related words according to the word embedding models. This led to three contexts for each emoji sense:
○ BabelNet-based context words
○ Twitter-based context words
○ Google News-based context words
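A minimal sketch of this context-expansion step, assuming each embedding model is available as a plain word-to-vector dictionary. The toy vectors and the helpers `most_similar` and `build_contexts` are hypothetical; a real pipeline would load the trained Twitter and Google News models instead. The slide uses the 20 nearest neighbours, while the toy example uses `k=2` because its vocabulary is tiny.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings, k=20):
    """Return the k vocabulary words closest to `word` (empty if out of vocabulary)."""
    if word not in embeddings:
        return []
    query = embeddings[word]
    scored = [(cosine(query, vec), other) for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first
    return [other for _, other in scored[:k]]


def build_contexts(sense_words, models, k=20):
    """Keep the original sense words and add one expanded context per embedding model."""
    contexts = {"BabelNet": list(sense_words)}
    for model_name, embeddings in models.items():
        expanded = []
        for word in sense_words:
            # Each sense word is replaced by its k nearest neighbours in this model.
            expanded.extend(most_similar(word, embeddings, k))
        contexts[model_name] = expanded
    return contexts


# Hypothetical 2-d toy vectors; real models have hundreds of dimensions.
twitter = {"laugh": [1.0, 0.1], "lol": [0.9, 0.2], "haha": [0.95, 0.1],
           "cry": [0.1, 1.0], "sad": [0.2, 0.9]}
google_news = {"laugh": [1.0, 0.0], "chuckle": [0.9, 0.1], "giggle": [0.8, 0.3],
               "weep": [0.0, 1.0]}

contexts = build_contexts(["laugh"], {"Twitter": twitter, "Google News": google_news}, k=2)
print(contexts)
```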
Adding Platform-Specific Senses to EmojiNet
● We conducted an experiment on Amazon Mechanical Turk to determine which senses of a given emoji are platform-specific
○ We selected 40 commonly misunderstood emoji for this experiment
○ We created 14,448 tasks, each asking workers to evaluate whether a particular platform-specific sense is valid
○ 1,128 tasks were filtered out as spam
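The slide does not state how the remaining judgments were aggregated. A minimal sketch, assuming a simple strict-majority vote over non-spam judgments for each (emoji, platform, sense) triple; the `Judgment` record, `accepted_senses`, and the toy data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One worker's answer to one task (hypothetical record layout)."""
    emoji: str
    platform: str
    sense: str
    is_valid: bool
    is_spam: bool = False


def accepted_senses(judgments):
    """Majority vote over non-spam judgments for each (emoji, platform, sense)."""
    votes = defaultdict(lambda: [0, 0])  # key -> [valid votes, invalid votes]
    for j in judgments:
        if j.is_spam:
            continue  # spam tasks are filtered out before aggregation
        key = (j.emoji, j.platform, j.sense)
        votes[key][0 if j.is_valid else 1] += 1
    # Strict majority: ties are treated as not platform-specific.
    return {key for key, (valid, invalid) in votes.items() if valid > invalid}


judgments = [
    Judgment("U+1F602", "Apple", "laugh", True),
    Judgment("U+1F602", "Apple", "laugh", True),
    Judgment("U+1F602", "Apple", "laugh", False),
    Judgment("U+1F64F", "Google", "high five", True, is_spam=True),
    Judgment("U+1F64F", "Google", "high five", False),
]
print(accepted_senses(judgments))
```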
Emoji Sense Disambiguation
● Based on past work, we selected the 25 most misunderstood emoji for an emoji sense disambiguation task
○ We randomly selected 50 tweets for each emoji
○ We used the Simplified Lesk algorithm for disambiguation
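Simplified Lesk picks the sense whose signature shares the most words with the surrounding context. A minimal sketch, assuming each emoji sense is given as a short signature string and using a small hypothetical stopword list; the sense labels, signatures, and tweets below are made up and do not come from EmojiNet.

```python
import re

# A small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "in", "for", "my", "so", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose signature has the largest word overlap with `context`.

    `senses` maps a sense label to its signature text. Ties keep the sense
    that appears first in `senses`.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, signature in senses.items():
        overlap = len(context_words & tokenize(signature))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical signatures for two senses of the folded-hands emoji.
folded_hands = {
    "pray": "pray to god, prayer, worship, bless",
    "high five": "high five, celebrate, slap hands, congrats",
}
print(simplified_lesk("Finally passed my exam, high five everyone!", folded_hands))
print(simplified_lesk("Please pray for my family, God bless", folded_hands))
```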
Emoji Similarity
● We used the 100 emoji in the EmoTwi50 dataset to build an emoji similarity graph
○ Emoji are represented as nodes
○ Two emoji are connected by an edge if they share a sense label
● We used a label propagation algorithm to find clusters in the emoji graph
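A minimal sketch of the graph construction and clustering, assuming sense labels are available as a set per emoji. It uses a deterministic variant of asynchronous label propagation (fixed node order, ties broken by the smallest label) so results are reproducible; the standard algorithm randomizes both. Function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_sense_graph(emoji_senses):
    """Connect two emoji with an edge when their sense-label sets overlap."""
    graph = {emoji: set() for emoji in emoji_senses}
    for a, b in combinations(emoji_senses, 2):
        if emoji_senses[a] & emoji_senses[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic asynchronous label propagation; returns a list of clusters."""
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated emoji keep their own label
            counts = Counter(labels[nb] for nb in graph[node])
            top = max(counts.values())
            # Adopt the most frequent neighbour label; break ties by the smallest label.
            best = min(label for label, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # converged: no node changed its label in this pass
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, set()).add(node)
    return list(clusters.values())


# Hypothetical sense labels for a handful of emoji.
emoji_senses = {
    "joy": {"laugh", "happy"},
    "rofl": {"laugh", "funny"},
    "smile": {"happy"},
    "sob": {"sad", "cry"},
    "cry": {"sad"},
}
graph = build_sense_graph(emoji_senses)
print(label_propagation(graph))
```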
Calculating Emoji Similarity Using the Jaccard Coefficient
● In another experiment, we computed the similarity of two emoji as the Jaccard similarity of their sense sets
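For two sense sets A and B, the Jaccard coefficient is |A ∩ B| / |A ∪ B|. A minimal sketch, assuming each emoji maps to a set of sense labels; the toy sense sets and helper names are hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
from itertools import combinations


def jaccard(senses_a, senses_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sense-label collections."""
    a, b = set(senses_a), set(senses_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # two empty sets -> 0.0 by convention


def pairwise_similarity(emoji_senses):
    """Jaccard similarity for every unordered pair of emoji, most similar first."""
    pairs = [
        ((x, y), jaccard(emoji_senses[x], emoji_senses[y]))
        for x, y in combinations(sorted(emoji_senses), 2)
    ]
    return sorted(pairs, key=lambda item: -item[1])


emoji_senses = {
    "joy": {"laugh", "happy", "tears"},
    "rofl": {"laugh", "funny", "tears"},
    "sob": {"sad", "cry", "tears"},
}
for pair, score in pairwise_similarity(emoji_senses):
    print(pair, round(score, 2))
```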
Questions?
Thank You!
Read more about EmojiNet at http://wiki.knoesis.org/index.php/EmojiNet