Making Sense of Millions of Thoughts: Finding Patterns in the Tweets


Upload: krist-wongsuphasawat

Post on 11-Aug-2014


Category:

Data & Analytics



DESCRIPTION

I gave this presentation at the Workshop on Interactive Language Learning, Visualization, and Interfaces / ACL 2014 in Baltimore, MD, on June 27, 2014. http://nlp.stanford.edu/events/illvi2014/index.html

ABSTRACT

Every day on Twitter, millions of thoughts are captured and shared with the world in the form of 140-character messages, or Tweets. There are many things we could learn from these thoughts if we could figure out a way to digest this gigantic dataset. Visualization is one of the many ways to extract information from these Tweets. In this presentation, I will talk about several visualizations based on Tweets, as well as share experiences and challenges from working with Tweet data.

TRANSCRIPT

Page 1: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Making Sense of Millions of Thoughts

Finding patterns in the Tweets

“Knowing comes from learning, finding from seeking.” (50 characters)

“What we call chaos is just patterns we haven't recognized.” (58 characters)

“I am looking for a needle in the haystack.” (42 characters)

“140-character text messages, called Tweets” (42 characters)

Krist Wongsuphasawat

Page 2: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

X-Men

Page 3: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Prof. X

Ability: Telepathy (mind reading)

Page 4: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Cerebro

Enhance telepathy

Prof. X

Page 5: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Cerebro

Page 6: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

With this power…

Page 7: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What are you thinking?

Page 8: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What are people thinking about x?

Product

Event

Person

etc.

Page 9: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Reality

Page 10: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Cerebro

Page 11: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Internet

Page 12: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Platform

thought

thought

thought

thought

thought

crowdsourcing social networks

Data

Page 13: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Twitter

tweet

tweet

tweet

tweet

tweet

Tweets

Page 14: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Tweets

• 140 characters

• text + media

• geo

• time

Page 15: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Twitter

tweet

tweet

tweet

tweet

tweet

Tweets

Page 16: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What can we learn from these Tweets?

Page 17: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

visual-insights@twitter

@miguelrios @philogb @trebor @kristw

Page 18: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

World Cup

Election

Oscars

Pure Curiosity

Grammy

TV Shows

New Year

Breaking news

Earthquake

Page 19: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets) DATA

with limited time

Audience: general public

Page 20: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Tools

• Hadoop

• Apache Pig

• Vertica

• node.js, python

• d3 & co.

Page 21: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Pig

Page 22: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets) DATA

Page 23: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets)

Filter

DATA

Page 24: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Having all Tweets

How people think I feel.

Page 25: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Having all Tweets

How people think I feel. How I really feel.

Page 26: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data

Good news:

Bad news:

Want only relevant Tweets

Have all Tweets

Too many Tweets

Page 27: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data (2)

• #hashtags, e.g. #world-cup

• easy to filter

• hashtags must be present

• typo?

Page 28: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data (2)

• #hashtags, e.g. #world-cup

• easy to filter

• hashtags must be present

• keywords, e.g. goal

• broader

• can be ambiguous
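
The hashtag and keyword trade-off above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used in the talk: the sample Tweets, the `matches` helper, and the lowercase `worldcup` tag are all made up for the example.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")

def hashtags(text):
    """Return the lowercased hashtags found in a Tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}

def matches(text, wanted_tags=(), keywords=()):
    """True if the text carries a wanted hashtag or contains a keyword as a whole word."""
    if hashtags(text) & {t.lower() for t in wanted_tags}:
        return True
    words = set(re.findall(r"\w+", text.lower()))
    return any(k.lower() in words for k in keywords)

# Hypothetical sample Tweets, invented for this sketch.
tweets = [
    "What a match! #WorldCup",
    "GOAL!!! Unbelievable finish",
    "My goal this year is to read more books",
    "Lunch time",
]

# Hashtag filter: precise, but misses Tweets that carry no hashtag.
by_tag = [t for t in tweets if matches(t, wanted_tags=["worldcup"])]

# Keyword filter: broader, but also catches the unrelated "goal" Tweet.
by_keyword = [t for t in tweets if matches(t, keywords=["goal"])]
```

Here the hashtag filter keeps one Tweet and the keyword filter keeps two, one of which has nothing to do with soccer. That second match is the ambiguity the slide warns about.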

Page 29: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data (3)

• Combine with other attributes

• Time

• during the first half of World Cup final

Page 30: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data (3)

• Combine with other attributes

• Time

• during the first half of World Cup final

• Location

• Tweets from Brazil

• Not every Tweet is geotagged.
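
Combining a time window with a location filter might look like the sketch below. Everything in it is an assumption for illustration: the record fields (`text`, `time`, `geo`), the 45-minute window, and the rough bounding box standing in for Brazil are hypothetical and do not reflect Twitter's actual data schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical records; field names are illustrative only.
tweets = [
    {"text": "Kickoff!", "time": datetime(2014, 7, 13, 19, 5, tzinfo=timezone.utc), "geo": (-22.91, -43.23)},
    {"text": "Watching from Berlin", "time": datetime(2014, 7, 13, 19, 20, tzinfo=timezone.utc), "geo": (52.52, 13.40)},
    {"text": "No location on this one", "time": datetime(2014, 7, 13, 19, 30, tzinfo=timezone.utc), "geo": None},
    {"text": "Second half now", "time": datetime(2014, 7, 13, 20, 10, tzinfo=timezone.utc), "geo": (-23.55, -46.63)},
]

START = datetime(2014, 7, 13, 19, 0, tzinfo=timezone.utc)  # assumed start of the window
END = START + timedelta(minutes=45)                        # assumed length of a first half
BOX = (-34.0, -74.0, 5.5, -34.5)  # rough (min_lat, min_lon, max_lat, max_lon), approximate only

def in_window(tweet):
    """True if the Tweet was posted inside the assumed time window."""
    return START <= tweet["time"] < END

def in_box(tweet, box=BOX):
    """True if the Tweet is geotagged and falls inside the bounding box."""
    if tweet["geo"] is None:  # not every Tweet is geotagged
        return False
    lat, lon = tweet["geo"]
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

selected = [t["text"] for t in tweets if in_window(t) and in_box(t)]
untagged = sum(1 for t in tweets if in_window(t) and t["geo"] is None)
```

Counting the untagged Tweets separately makes it visible how much data the location filter silently drops.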

Page 31: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Filter data (4)

• Languages

• Sometimes use only English Tweets

• Future

• Translation?

Page 32: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets)

Filter

Clean

DATA

Page 33: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Clean data

• Typo (Mobile input)

• Abbreviation (due to 140-character limit)

• Exaggeration (e.g. GOOOOALLLL)

• Twitter-specific, e.g. old-style retweet “RT …”

• Inappropriate content
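
Two of the cleaning steps above, old-style retweet prefixes and exaggerated spelling, can be approximated with regular expressions. This is a rough sketch under simple assumptions (the helper names are made up, and the retweet pattern only handles a leading `RT @user:` prefix). Typos, abbreviations, and inappropriate content need heavier tools such as dictionaries or classifiers.

```python
import re

def strip_old_style_rt(text):
    """Drop a leading old-style retweet prefix such as 'RT @user: '."""
    return re.sub(r"^RT\s+@\w+:?\s*", "", text)

def squeeze_elongation(text, max_run=2):
    """Collapse any run of one repeated character down to at most max_run copies."""
    pattern = r"(.)\1{%d,}" % max_run
    return re.sub(pattern, lambda m: m.group(1) * max_run, text)

def clean(text):
    """Apply both steps in order."""
    return squeeze_elongation(strip_old_style_rt(text))

cleaned = clean("RT @fan: GOOOOALLLL")
```

The result is `GOOALL`, not `GOAL`: capping runs at two characters keeps ordinary double letters (as in "hello") intact, so mapping back to a dictionary word would need an extra lookup step.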

Page 34: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets)

Filter

Clean

Visualize

DATA

Page 35: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

(+ media) photos, videos

What?

Where? When?

GEO TIME

TEXT

DATA

Page 36: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 37: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 38: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME Tweets/second

Page 39: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME Tweets/second

Page 40: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME Tweets/second + Annotation

http://www.flickr.com/photos/twitteroffice/5681263084/

Page 41: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME Tweets/second + Annotation

Manual

To automate

Top tweets (most Retweets, Favs)
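
One way to read the "to automate" idea is: count Tweets per second, find the peak, and label it with the most retweeted or favorited Tweet from that second. The sketch below assumes hypothetical records of the form (second, text, engagement) and is not the method used for the charts in the talk.

```python
from collections import Counter

# Hypothetical records: (unix_second, text, retweets + favorites).
tweets = [
    (100, "here we go", 3),
    (101, "GOAL", 250),
    (101, "what a goal", 40),
    (101, "unreal", 5),
    (102, "still shaking", 12),
]

# Tweets/second: one counter entry per second.
per_second = Counter(second for second, _, _ in tweets)

# The busiest second and its volume.
peak_second, peak_count = per_second.most_common(1)[0]

# Candidate annotation: the Tweet with the highest engagement in the peak second.
annotation = max((t for t in tweets if t[0] == peak_second), key=lambda t: t[2])
```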

Page 42: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 43: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO

Heatmap

Low density

High density
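
A density heatmap like this can be approximated by snapping each geotagged Tweet to a grid cell and counting Tweets per cell. The sketch below is a simplified illustration: the coordinates are made up, and the 0.5-degree cell size is an arbitrary assumption.

```python
from collections import Counter
import math

def cell(lat, lon, size_deg=0.5):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

# Hypothetical geotagged Tweets as (lat, lon): three near New York, one near San Francisco.
points = [(40.71, -74.00), (40.73, -73.99), (40.75, -73.98), (37.77, -122.42)]

# Tweets per cell; a renderer would map low counts to one color and high counts to another.
density = Counter(cell(lat, lon) for lat, lon in points)
```

Fixed-size cells in degrees are not equal in area at different latitudes, so a production map would likely bin in projected coordinates instead.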

Page 44: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO

New York City

flickr.com/photos/twitteroffice/8798020541

Page 45: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO

San Francisco

flickr.com/photos/twitteroffice/8798020541

Page 46: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO

San Francisco

Rebuild the world based on

tweet volumes

twitter.github.io/interactive/andes/

Page 47: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 48: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO

blog.twitter.com/2011/global-pulse

youtu.be/SybWjN9pKQk

Japan Earthquake 2011

Page 49: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
Page 50: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO Tweet pattern [Rios & Lin 2012]

Night

Late night

Daytime

Night

Late night

Daytime

Page 51: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 52: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT Trends

Page 53: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT

www.wordle.net

Some samples from World Cup

Page 54: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT Word cloud of Tweets right after the 1st goal

www.wordle.net

Page 55: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
Page 57: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT

• Now

• Derived information: Sentiment, Topic

• Combine with other information (geo & time) + context

• Future

• Better technique + involves more NLP e.g. key phrases, etc.

Page 58: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT Descriptive Keyphrases [Chuang et al. 2012]

Page 59: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT

• Challenge

• Scale

Page 60: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 61: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

Page 62: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

Page 63: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

most frequent

term
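
Picking the most frequent term per region can be sketched as a per-region word count. The region labels, sample Tweets, and tiny stopword list below are assumptions for illustration; a real-time map would also need proper tokenization and geographic binning.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "is", "in", "to", "and", "of", "so", "not"}  # tiny illustrative list

# Hypothetical (region, text) pairs.
tweets = [
    ("SF", "Gmail is down"),
    ("SF", "gmail not loading"),
    ("NYC", "snow in the city"),
    ("NYC", "so much snow"),
]

terms = defaultdict(Counter)
for region, text in tweets:
    words = re.findall(r"[a-z']+", text.lower())
    # Count each term once per Tweet so one repetitive Tweet cannot dominate a region.
    terms[region].update(w for w in set(words) if w not in STOPWORDS)

top = {region: counts.most_common(1)[0][0] for region, counts in terms.items()}
```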

Page 64: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

Gmail went down Jan 24, 2014

Page 65: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

Nelson Mandela passed away Dec 5, 2013

Page 66: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT Real-time Tweet map

• Next:

• Involves more NLP

• Tokenization - Languages without spaces between words

• etc.

• Challenge:

• Real-time

Page 67: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

GEO + TEXT

www.yelp.com/wordmap

Yelp Wordmap

Page 68: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 69: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT

http://www.babynamewizard.com/voyager

Baby Name Voyager

Page 70: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT

http://www.babynamewizard.com/voyager

Baby Name Voyager

Page 71: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT

UEFA Champions League

Biggest Tournament for European soccer clubs

Many Tweets during the matches

Page 72: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

Dortmund Bayern Munich

Count Tweets mentioning the teams every minute

Team 1 Team 2
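
Counting Tweets that mention each team every minute could be sketched as follows. The search terms and sample Tweets are hypothetical, and the sketch makes one explicit choice: a Tweet that mentions both teams is counted once for each.

```python
from collections import Counter

# Assumed search terms per team; a real filter would need aliases and hashtags.
TEAM_TERMS = {"Dortmund": ["dortmund"], "Bayern Munich": ["bayern"]}

# Hypothetical (minute_of_match, text) pairs.
tweets = [
    (1, "Come on Dortmund!"),
    (1, "Bayern look sharp tonight"),
    (1, "Dortmund vs Bayern, here we go"),
    (2, "What a save by the Bayern keeper"),
]

counts = Counter()
for minute, text in tweets:
    lowered = text.lower()
    for team, terms in TEAM_TERMS.items():
        if any(term in lowered for term in terms):
            counts[(minute, team)] += 1  # a Tweet naming both teams counts for both
```

Plotting `counts` per team over the minutes gives two time series, one on each side of the timeline.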

Page 73: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

Page 74: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

+ “goal” count + context

Page 75: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

+ “offside”

Page 76: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

+ players

Page 77: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

A B C D

A C

C

Competition Tree

vs vs

vs

Page 78: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

A B C D

A C

C

Competition Tree

+

vs vs

vs

Page 79: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

A B C D

A C

C

Competition Tree

+ =

uclfinal.twitter.com

vs vs

vs

Page 80: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + TEXT UEFA Champions League

• Challenges

• Filter relevant Tweets

• Multiple matches at the same time

• Ambiguous words: “goal”, “red”, “yellow”

• Tweets mentioning both teams e.g. “#GER 2-2 #GHA”

Page 81: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What?

Where? When?

GEO TIME

TEXT

Visualize Data

Page 82: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO + TEXT State of the Union

twitter.github.io/interactive/sotu2014

Page 83: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO + TEXT State of the Union

1) timeline + topic from Tweets

4) Density map of Tweets about selected topic

3) Volume of Tweets by topics

during selected part of the SOTU

2) context (speech)

twitter.github.io/interactive/sotu2014

Page 84: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO + TEXT New Year 2014

Page 85: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO + TEXT New Year 2014

Page 86: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TIME + GEO + TEXT New Year 2014

twitter.github.io/interactive/newyear2014/

Page 87: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Recap

Page 88: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

What can we learn from these Tweets?

many, many things.

Page 89: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

better

the examples in this talk

imagine…

DATA (Tweets)

Page 90: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Insights, Stories

(Tweets)

Filter

Clean

Visualize

DATA

Page 91: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

(Tweets)

Insights, Stories

Filter

Clean

Process & Visualize

DATA

Page 92: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

(Tweets)

Insights, Stories

Filter

Clean

Process & Visualize

DATA

NLP

Page 93: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

TEXT

What?

Where? When?

GEO TIME

Visualize data

Page 94: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

(Tweets)

Insights, Stories

Filter

Clean

Process & Visualize

DATA

Research

Page 95: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Page 96: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Computer (One machine, Cloud, MapReduce, etc.)

Page 97: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Ignored information

Processed information

Computer (One machine, Cloud, MapReduce, etc.)

Page 98: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Aggregated information

Ignored information

Processed information

Computer (One machine, Cloud, MapReduce, etc.)

Page 99: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Aggregated information

Ignored information

Processed information

Computer (One machine, Cloud, MapReduce, etc.)

NLP

Make computers think more like humans.

Page 100: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Aggregated information

Ignored information

Processed information

VIS

Help people consume information.

Computer (One machine, Cloud, MapReduce, etc.)

NLP

Make computers think more like humans.

Page 101: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Working together

Raw data

Human

Aggregated information

Ignored information

Processed information

VIS

Help people consume information.

Computer (One machine, Cloud, MapReduce, etc.)

NLP

Make computers think more like humans.

HCI

User interactions or

Provide feedback

Bridge the gap. Connect human & computer.

Page 102: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Advanced techniques vs.

Scalability

Page 103: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

LifeFlow => Flying Sessions

Research

System at Twitter

Page 104: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Summary

• Thoughts are captured in the Tweets: what, where, when

• Finding patterns from: text + geo + time

• Opportunities for NLP + HCI + VIS collaboration

• Better technique vs. Scalability + Real-time

@kristw / interactive.twitter.com

Page 105: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Questions?

Page 106: Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

Thank you