Slicing Big Data: Gambling, Twitter & Time Sensitive Information

Uploaded by: darryl-woodford

Posted on 01-Nov-2014

DESCRIPTION

Presented at the Association of Internet Researchers conference (IR14) in Denver, CO, 26 October 2013. Discusses gambling, reality TV and world events in the context of Twitter data, and how to select usable data from big data.

TRANSCRIPT

Page 1: Slicing Big Data: Gambling, Twitter & Time Sensitive Information

Gambling, Twitter & Time Sensitive Information

IR14 - Denver, CO

[email protected]

@dpwoodford

Wednesday, 23 October 13

Page 2

FORMAT

• Not going to simply repeat the paper.

• I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets.

• Happy to answer more specific questions about the use in the gambling industry.

• Examples from Sport, TV, Gambling & Fantasy Sports: a whistle-stop tour of current research projects.

Page 3

DEALING WITH THE TITLE: TWITTER

• Twitter => Large Data Sets, but specific research questions often require a small data set:
  – Australian users
  – Users registering on the platform during natural disasters
  – ‘Experts’ on Fantasy Sports
  – Sporting Participants: Golf, Tennis, NFL, College Football, etc.
  – Reality TV ‘fanatics’
  – Almost infinite examples

• Goal is to get from “Big Data” to what I’ve been calling “useful data”
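
As an illustration of slicing one of these subsets out of a larger collection, the Python sketch below keeps only accounts registered inside a given time window (for example, during a natural disaster). It assumes a hypothetical export of Twitter API v1.1 user objects stored one JSON object per line, each with the standard `created_at` string; the function name and the window dates are illustrative only, not the pipeline used in this research.

```python
import json
from datetime import datetime, timezone

# Twitter API v1.1 user objects carry creation times such as
# "Wed Oct 23 14:05:00 +0000 2013". %a/%b assume an English (C) locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def registered_between(user_json_lines, start, end):
    """Yield user objects whose account was created in [start, end).

    user_json_lines: iterable of strings, one JSON user object per line
    (a hypothetical export format).
    start, end: timezone-aware datetimes bounding the window.
    """
    for line in user_json_lines:
        if not line.strip():
            continue  # tolerate blank lines in the export
        user = json.loads(line)
        created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
        if start <= created < end:
            yield user


if __name__ == "__main__":
    # Hypothetical sample records and window, for demonstration only.
    sample = [
        '{"screen_name": "a", "created_at": "Mon Jan 10 09:00:00 +0000 2011"}',
        '{"screen_name": "b", "created_at": "Wed Jun 01 12:00:00 +0000 2011"}',
    ]
    window = (datetime(2011, 1, 9, tzinfo=timezone.utc),
              datetime(2011, 1, 20, tzinfo=timezone.utc))
    for user in registered_between(sample, *window):
        print(user["screen_name"])
```

Because it is a generator, the filter streams through the export without loading the whole user database into memory, which matters at the multi-terabyte scale described later in the talk.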

Page 4

DEALING WITH THE TITLE: GAMBLING

• Long-term interest in the gambling industry (one case study in my prior work on games).

• Many parallels between Gambling and Fantasy Sports (another current research project).

• When I was an ‘active participant’, Twitter was just becoming popular (2006-2010).

• It quickly became a crucial source of information, and websites started aggregating it.

Page 5

DEALING WITH THE TITLE: GAMBLING

Page 6

DEALING WITH THE TITLE: GAMBLING

Page 7

DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION

• Betting lines move incredibly fast: it is just as much a market as day-trading on the stock exchange.

Page 8

WHY IS DATA SLICED?

• The Streaming API is limited to ~1% of all tweets at any given time, and Firehose access is expensive.

• Large data sets are not easily malleable, or easily analyzed visually (e.g. with Tableau):
  – Our database of Twitter users is ~3.7TB, and growing.
  – A week's worth of selected TV data (current US shows) is 750MB in JSON format and 600MB in TSV (selected fields), and runs to millions of rows.

• Analyzing large data sets is slow, if it’s even possible => “Usable Data”
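
The JSON-to-TSV reduction mentioned on this slide can be sketched as a field projection: read line-delimited tweet JSON, keep a handful of columns, and write tab-separated rows that are lighter to load into a visual analysis tool. The field list is an assumption (these are standard Twitter API v1.1 tweet fields, but which ones to keep is a per-project choice), and `json_to_tsv` and `get_path` are hypothetical helper names.

```python
import csv
import io
import json

# Columns to keep. These are standard Twitter API v1.1 tweet fields, but the
# selection itself is a per-project choice (an assumption here).
FIELDS = ["id_str", "created_at", "user.screen_name", "text"]


def get_path(obj, dotted):
    """Follow a dotted path such as 'user.screen_name'; return '' if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return ""
        obj = obj[key]
    return obj


def json_to_tsv(json_lines, out_file, fields=FIELDS):
    """Write a header plus one TSV row per line-delimited JSON tweet; return the row count."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    rows = 0
    for line in json_lines:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        # Replace tabs and line breaks so every tweet stays on a single TSV row.
        row = [
            str(get_path(tweet, f)).replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for f in fields
        ]
        writer.writerow(row)
        rows += 1
    return rows


if __name__ == "__main__":
    demo = ['{"id_str": "1", "created_at": "Wed Oct 23 14:05:00 +0000 2013", '
            '"user": {"screen_name": "example"}, "text": "hello", "lang": "en"}']
    out = io.StringIO()
    json_to_tsv(demo, out)
    print(out.getvalue(), end="")
```

Every field not listed in `FIELDS` is dropped, which is where most of the size saving comes from; the original JSON can be kept alongside the TSV slice.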

Page 9

HOW IS DATA SLICED: COMPULSORY

Page 10

HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA

Page 11

HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS

Page 12

HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS

CLIP FROM YAHOO FANTASY FOOTBALL RE: CALVIN JOHNSON INJURY & TWITTER REPORTS

Page 13

BUT YOU STILL NEED A SANITY CHECK

Page 14

BUT YOU STILL NEED A SANITY CHECK

Page 15

HOW IS DATA SLICED: RANDOM SAMPLING

Source: Tony Hirst (Open University UK)
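
When a uniform random sample is good enough, one standard way to sample a stream whose final size is unknown in advance is reservoir sampling (Algorithm R). The Python sketch below is a generic illustration of that technique, not the method behind the figure credited to Tony Hirst; the `seed` parameter is only there to make runs reproducible.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream (Algorithm R).

    Memory use is O(k) regardless of how long the stream is.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream has fewer than `k` items, all of them are returned in their original order.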

Page 16

BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE

Source: Bruns / Woodford [Mapping Online Publics]

Page 17

HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS

Floods, Earthquake, Tsunami

Media Coverage

Page 18

HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC

Impact of Live Feed

Page 19

HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC

Page 20

HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC

Delayed TV sucks
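
The seasonal-versus-episodic split can be approximated by assigning each tweet to the broadcast it falls within and treating everything else as between-episode chatter. This Python sketch assumes a hypothetical sorted list of broadcast start times and a fixed window length per episode; real schedules, overruns and time-shifted airings would need more care, and `assign_to_episodes` is a hypothetical name.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def assign_to_episodes(tweet_times, episode_starts, window=timedelta(hours=1)):
    """Split tweet timestamps into per-episode slices and between-episode chatter.

    tweet_times: iterable of timezone-aware datetimes.
    episode_starts: sorted list of broadcast start times (a hypothetical schedule).
    window: how long after each start a tweet still counts as episodic
            (an assumption, e.g. the running time of the show).
    Returns (episodic, seasonal): a dict keyed by episode start, and a list.
    """
    episodic = {start: [] for start in episode_starts}
    seasonal = []
    for t in tweet_times:
        # Index of the latest episode that started at or before t (-1 if none).
        i = bisect_right(episode_starts, t) - 1
        if i >= 0 and t < episode_starts[i] + window:
            episodic[episode_starts[i]].append(t)
        else:
            seasonal.append(t)
    return episodic, seasonal


if __name__ == "__main__":
    utc = timezone.utc
    starts = [datetime(2013, 9, 1, 0, 0, tzinfo=utc), datetime(2013, 9, 4, 0, 0, tzinfo=utc)]
    tweets = [datetime(2013, 9, 1, 0, 30, tzinfo=utc), datetime(2013, 9, 2, 12, 0, tzinfo=utc)]
    episodic, seasonal = assign_to_episodes(tweets, starts)
    print({s.isoformat(): len(v) for s, v in episodic.items()}, len(seasonal))
```

With a single fixed schedule, live-tweeting from viewers watching a delayed airing in another time zone lands in the between-episode bucket unless that airing's start time is also added to `episode_starts`.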

Page 21

HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE

• The most active (#BB15, #BBLF) users often defend a housemate (HM) to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd).

Disclaimer: Scale changed to fit on slide

Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
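
One quick check on whether the most active accounts are representative is to measure how much of the total volume they produce. The Python sketch below assumes a plain list of author identifiers (one entry per tweet) and reports the share of tweets posted by the `top_n` most active accounts; `top_user_share` is a hypothetical helper, and a high value is a cue to examine the less active majority separately rather than a finding in itself.

```python
from collections import Counter


def top_user_share(tweet_authors, top_n):
    """Return the fraction of all tweets posted by the top_n most active accounts.

    tweet_authors: iterable with one author identifier per tweet.
    """
    counts = Counter(tweet_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(top_n) returns at most top_n (author, count) pairs.
    top_volume = sum(n for _, n in counts.most_common(top_n))
    return top_volume / total
```

Comparing this share with the number of distinct accounts (`len(Counter(authors))`) shows how few voices can account for how much of a hashtag's volume.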

Page 22

TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING

Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
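
Time-slicing tweet content can be sketched by flooring each timestamp to a fixed-length slice and counting terms within each slice. The tokenizer (a simple regex that keeps hashtags and @-mentions), the one-hour default slice and the optional lowercase stopword set are all assumptions for illustration, and `top_terms_by_slice` is a hypothetical name; timestamps are expected to be timezone-aware datetimes.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

# Simple tokenizer: words, keeping a leading '#' or '@' (an assumption).
TOKEN = re.compile(r"[#@]?\w+")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def top_terms_by_slice(tweets, slice_length=timedelta(hours=1), top_n=3, stopwords=frozenset()):
    """Group (timestamp, text) pairs into fixed-length slices and return the
    top_n most frequent lower-cased terms per slice, ordered by slice start.

    Timestamps must be timezone-aware; ties keep first-seen order.
    """
    counters = defaultdict(Counter)
    for ts, text in tweets:
        # Floor the timestamp to the start of the slice it falls in.
        slice_start = EPOCH + ((ts - EPOCH) // slice_length) * slice_length
        terms = [w.lower() for w in TOKEN.findall(text)]
        counters[slice_start].update(t for t in terms if t not in stopwords)
    return {
        start: [term for term, _ in counter.most_common(top_n)]
        for start, counter in sorted(counters.items())
    }
```

Passing the tracked hashtag itself as a stopword is one way to stop it dominating every slice, so that shifts in the surrounding vocabulary become visible.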

Page 23

TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING

Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]

Page 24

HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE

Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]

Page 25

HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE

• Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...

Page 26

AND A QUICK NOTE ON NON-TWITTER ANALYTICS

Page 27

AND A QUICK NOTE ON NON-TWITTER ANALYTICS

• There’s lots of data out there, but it needs to be sliced to be usable.

• You can work with large, original data sets, but this often adds complexity that isn't needed to answer your research questions.

• But don’t delete the data you don’t need!

Page 28

AND A QUICK NOTE ON NON-TWITTER ANALYTICS

Page 29

ACKNOWLEDGEMENTS

• ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net

• Social Media Research Group -- http://socialmedia.qut.edu.au

• Queensland University of Technology
