Chances and Challenges of Studying Social Media Data

Dr. Katrin Weller, GESIS – Leibniz-Institute for the Social Sciences
Data Archive for the Social Sciences / Computational Social Science
Cologne, Germany

Digital Studies Fellow at John W. Kluge Center
Library of Congress
Washington D.C.

Twitter: @kwelle ● Web: www.katrinweller.net


Posted on 06-Aug-2015


My Background

• PhD in Information Science (until 2012, University of Düsseldorf)

• Interests: Web Science, Social Media (focus on Twitter), Knowledge representation + Semantic Web, informetrics + altmetrics, scholarly communication

• Since 2013: GESIS, Social Web Data: New data types for social science research; research methods and data archiving.

• Jan-May 2015: Digital Studies Fellowship at the Library of Congress


Recent and Current Work

• Co-editor of „Twitter & Society“ (Peter Lang, 2014).

• With Katharina Kinder-Kurlanda: „The hidden data of social media research“

• #FAIL! Things that didn’t work out in social media research – and what we can learn from them (#fail2015a at Web Science Conference, Oxford; #fail2015b at Internet Research 16, Phoenix). https://failworkshops.wordpress.com

• Pilot project for archiving social media datasets in an election study (http://arxiv.org/abs/1312.4476v2).


Social media research – some insights from bibliometrics


What is social media research?

457,000

4,991 19,440

[Figure: Social media research 2000–2013, number of publications per year in Scopus.]

Scopus search, conducted March 2014: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999
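The search string above ORs four phrase searches on the combined title/abstract/keyword field and then restricts the publication year. A minimal Python sketch that assembles the same string from a term list, which makes it easier to re-run the search with terms added or removed; the helper name `build_scopus_query` is hypothetical, and the sketch only builds text, it does not call any Scopus API.

```python
def build_scopus_query(terms, after_year):
    """Combine phrase searches with OR and restrict by publication year.

    Hypothetical helper: it only assembles the query text that would be
    pasted into Scopus advanced search; it does not contact Scopus.
    """
    # Each term becomes an exact-phrase search over title, abstract and keywords.
    clauses = " OR ".join(f'TITLE-ABS-KEY("{term}")' for term in terms)
    # "PUBYEAR > 1999" keeps publications from 2000 onwards.
    return f"({clauses}) AND PUBYEAR > {after_year}"


TERMS = ["social media", "social web", "social software", "web 2.0"]

if __name__ == "__main__":
    print(build_scopus_query(TERMS, 1999))
```

Keeping the term list separate from the query syntax makes it explicit which vocabulary defines "social media research" in a bibliometric count, which is itself a methodological choice worth documenting.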

Scopus: 2000-2013 by country

[Figure: bar chart of publications per country (axis 0–7,000), listing United States, United Kingdom, Germany, Australia, China, Spain, Canada, Italy, France, Taiwan, Netherlands, South Korea, Finland, Austria, Japan, Greece, India, Singapore, Switzerland, Hong Kong, Ireland.]

Scopus: 2000-2013 by subject area

[Figure: publications by Scopus subject area. Labeled values: 10,650 (36%); 5,542 (19%); 2,384; 2,288; 2,151; 1,535; 773; 772; 65. Subject areas: Computer Science, Social Sciences, Engineering, Medicine, Business, Management and Accounting, Mathematics, Arts and Humanities, Decision Sciences, Psychology, Nursing, Economics, Econometrics and Finance, Biochemistry, Genetics and Molecular Biology, Health Professions, Environmental Science, Earth and Planetary Sciences, Agricultural and Biological Sciences, Pharmacology, Toxicology and Pharmaceutics, Physics and Astronomy, Materials Science, Multidisciplinary, Neuroscience, Immunology and Microbiology, Chemical Engineering, Veterinary, Dentistry, Chemistry, Energy.]

Twitter research by discipline


Challenge

• Interdisciplinarity!

• „Social media research“ is not a coherent research field.

• Influences from lots of different disciplines.

• Some disciplines still isolated, not all equally advanced in technical tasks.

• Challenge of keeping track of what is going on – across disciplines.


Example: Twitter research in social sciences


Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization, 41(3), 238-248.

Challenge vs. Chance

• Lots of room for exploration and innovation

• Few or no standards


Comparability

• Data?

• Collection Period?

• Method?

• Tools?


Different methods even in social science Twitter research


Example

[Figure: Publications on „Twitter and elections“ per year, 2008–2013 (Scopus and Web of Science).]

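Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets [Twitter and elections: Between 140 characters and billions of tweets]. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie [Big data: Analyses of the digital transformation of knowledge, power and economy] (pp. 239-257). Bielefeld: transcript.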


Year of election | Name of election | Country/region | No. of papers (2013) | Date of election
2008 | 40th Canadian General Election | Canada | 1 | 14.10.2008
2009 | European Parliament election, 2009 | Europe | 1 | 07.06.2009
2009 | German federal election, 2009 | Germany | 2 | 27.09.2009
2010 | 2010 UK general election | United Kingdom | 4 | 06.05.2010
2010 | South Korean local elections, 2010 | South Korea | 1 | 02.06.2010
2010 | Dutch general election, 2010 | Netherlands | 2 | 09.06.2010
2010 | Australian federal election, 2010 | Australia | 1 | 21.08.2010
2010 | Swedish general election, 2010 | Sweden | 1 | 19.09.2010
2010 | Midterm elections / United States House of Representatives elections, 2010 | USA | 4 | 02.11.2010
2010 | Gubernatorial elections: Georgia | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Ohio | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Rhode Island | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Vermont | USA | 1 | 02.11.2010
2010 | 2010 superintendent elections | South Korea | 1 | 17.12.2010
2011 | Baden-Württemberg state election, 2011 | Germany | 1 | 27.03.2011
2011 | Rhineland-Palatinate state election, 2011 | Germany | 1 | 27.03.2011
2011 | Scottish parliament election 2011 | Scotland | 1 | 05.05.2011
2011 | Singapore’s 16th parliamentary General Election | Singapore | 1 | 07.05.2011
2011 | Norwegian local elections, 2011 | Norway | 2 | 12.09.2011
2011 | 2011 Danish parliamentary election | Denmark | 2 | 15.09.2011
2011 | Berlin state election, 2011 | Germany | 2 | 18.09.2011
2011 | Gubernatorial elections: West Virginia | USA | 1 | 04.10.2011
2011 | Gubernatorial elections: Louisiana | USA | 1 | 22.10.2011
2011 | Swiss federal election, 2011 | Switzerland | 1 | 23.10.2011
2011 | 2011 Seoul mayoral elections | South Korea | 1 | 26.10.2011
2011 | Gubernatorial elections: Kentucky | USA | 1 | 08.11.2011
2011 | Gubernatorial elections: Mississippi | USA | 1 | 08.11.2011
2011 | Spanish national election 2011 | Spain | 1 | 20.11.2011
2012 | Queensland State election | Australia | 1 | 24.03.2012
2012 | South Korean legislative election, 2012 | South Korea | 1 | 11.04.2012
2012 | French presidential election, 2012 | France | 2 | 22.04.2012
2012 | Mexican general election, 2012 | Mexico | 1 | 01.07.2012
2012 | United States presidential election, 2012 / United States House of Representatives elections, 2012 | USA | 17 | 06.11.2012
2012 | South Korean presidential election, 2012 | South Korea | 2 | 19.12.2012
2013 | Ecuadorian general election, 2013 | Ecuador | 1 | 17.02.2013
2013 | Venezuelan presidential election, 2013 | Venezuela | 1 | 14.04.2013
2013 | Paraguayan general election, 2013 | Paraguay | 1 | 21.04.2013


Big DATA? 2013: Twitter and elections

No. of tweets | No. of publications (2013)
0–500 | 3
501–1,000 | 4
1,001–5,000 | 1
5,001–10,000 | 1
10,001–50,000 | 7
50,001–100,000 | 4
100,001–500,000 | 5
500,001–1,000,000 | 3
1,000,001–5,000,000 | 3
More than 5,000,000 | 3
More than 100,000,000 | 1
More than 1,000,000,000 | 1
No/insufficient information | 13


Comparability: Twitter and elections

Data collection methods


Data source | Number of publications (2013)
No information | 11
Collected manually from Twitter website (copy-paste / screenshot) | 6
Twitter API (no further information) | 8
Twitter Search API | 3
Twitter Streaming API | 1
Twitter REST API | 1
Twitter API user timeline | 1
Own program for accessing Twitter APIs | 4
Twitter Gardenhose | 1
Official reseller (Gnip, DataSift) | 3
YourTwapperKeeper | 3
Other tools (e.g. Topsy) | 6
Received from colleagues | 1

Comparability: Twitter and elections

Data collection periods


Data collection period | Number of publications (2013)
0–10 hours | 1
1–2 days | 6
3–7 days | 3
8–14 days | 5
2–4 weeks | 7
1–2 months | 13
2–6 months | 5
7–12 months | 3
More than 12 months | 0
No/insufficient information | 6
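The counts in each of the three tables on Twitter and election studies (tweet volume, data source, collection period) add up to 49 publications, so the share of studies that leave an aspect undocumented can be read off directly. A small Python sketch of that arithmetic; the counts are copied from the tables (with shortened labels), and the dictionary and function names are illustrative only.

```python
# Counts copied from the three tables on Twitter and election studies (2013);
# labels are shortened, names of the dicts and the function are illustrative.
TWEET_VOLUME = {
    "0-500": 3, "501-1,000": 4, "1,001-5,000": 1, "5,001-10,000": 1,
    "10,001-50,000": 7, "50,001-100,000": 4, "100,001-500,000": 5,
    "500,001-1,000,000": 3, "1,000,001-5,000,000": 3,
    "More than 5,000,000": 3, "More than 100,000,000": 1,
    "More than 1,000,000,000": 1, "No/insufficient information": 13,
}
DATA_SOURCE = {
    "No information": 11, "Collected manually": 6,
    "Twitter API (no further information)": 8, "Twitter Search API": 3,
    "Twitter Streaming API": 1, "Twitter REST API": 1,
    "Twitter API user timeline": 1, "Own program": 4,
    "Twitter Gardenhose": 1, "Official reseller": 3,
    "YourTwapperKeeper": 3, "Other tools": 6, "Received from colleagues": 1,
}
COLLECTION_PERIOD = {
    "0-10 hours": 1, "1-2 days": 6, "3-7 days": 3, "8-14 days": 5,
    "2-4 weeks": 7, "1-2 months": 13, "2-6 months": 5, "7-12 months": 3,
    "More than 12 months": 0, "No/insufficient information": 6,
}


def undocumented_share(table, missing_key):
    """Return (total publications, percentage lacking the information)."""
    total = sum(table.values())
    return total, round(100 * table[missing_key] / total, 1)


if __name__ == "__main__":
    for name, table, key in [
        ("tweet volume", TWEET_VOLUME, "No/insufficient information"),
        ("data source", DATA_SOURCE, "No information"),
        ("collection period", COLLECTION_PERIOD, "No/insufficient information"),
    ]:
        total, pct = undocumented_share(table, key)
        print(f"{name}: {pct}% of {total} publications undocumented")
```

Run as is, this reports 26.5% (tweet volume), 22.4% (data source) and 12.2% (collection period) of 49 publications, which puts a number on the comparability problem.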

Challenges

• Quickly changing landscape of social media platforms

• Twitter as a model organism of social media research?


Scopus: 2000-2013 popular keywords

Social networks (897), Social network (657)

User interfaces (1,007)

Social networking (online) (2,291)

Facebook (847)

Knowledge management (860)

Web services (869) Information systems (810)

Twitter (667)

Semantics (765), Semantic Web (669)

Communication (650)

Information technology (639)

E-learning (623), Students (579), Education (520), Teaching (504)


Social Media Research: Topics

• Political communication / elections

• Activism

• Popular culture, memes

• Brand communication, marketing

• Journalism (incl. agenda setting, citizen journalism, TV backchannel)

• Crisis communication, disaster response

• Scholarly communication

• Language

• And many more


pointless babble?


Social media research – some insights from expert interviews

• Weller, Katrin, and Katharina E. Kinder-Kurlanda. 2014. “‘I love thinking about ethics!’ Perspectives on ethics in social media research.” Internet Research (IR15), Daegu, South Korea, 22–24 October 2014. Paper to be published in Selected Papers of Internet Research. Preprint: HiddenDataEthics_Weller+Kinder-Kurlanda_IR15-preprint.

• Kinder-Kurlanda, K. E.; Weller, K. (2014). “I always feel it must be great to be a Hacker!”: The role of interdisciplinary work in social media research. In Proceedings of the 2014 ACM Web Science Conference WebSci’14, June 23–26, 2014, Bloomington, IN, USA, pp. 91-98. doi:10.1145/2615569.2615685. Preprint: hiddendata_websci14-preprint_Kinder-Kurlanda+Weller(2014).

• Weller, K. & Kinder-Kurlanda, K. (in press). Uncovering the Challenges in Collection, Sharing and Documentation: The Hidden Data of Social Media Research? To appear in Workshop on Standards and Practices in Large-Scale Social Media Research. ICWSM, Oxford, May 2015. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10657/10552.


CHANCES

• Researchers value social media as a new type of data.

• Previously „ephemeral data“ become visible

• Immediate – quick reaction to events

• Structured

• „natural“ data


What I find really interesting is that structure becomes manifest in internet communication. So it’s the first time in history actually that we can, that social structures between people become manifest within a technology. (...) They become visible, they become crawlable, they become analyzable.

Some of the CHALLENGES

Preliminary results, more detailed analysis to follow.

- Interdisciplinarity

- Ethics

- Standards

- Data access & infrastructure

- …


Unregulated and developing field

• Researchers showed a high awareness of the unregulated and developing character of social media research methods.

But, I think that (…) in like a couple of years, maybe five – it depends a lot, because the subject of the research is changing every day, (…) but I think that we’re going to have, (…) more or less shared qualitative approaches with a lot of good practices.

Data Sharing


But you can’t make your data available for others to look at, which means both your study can’t really be replicated and it can’t be tested for review. But also it just means your data can’t be made available for other people to say, Ah you have done this with it, I’ll see what I can do with it, (…) There is no open data.

Data Sharing

“I think probably a couple times we’ve asked around if anyone else happened to have a particular dataset. […] but not so much, because they probably have tracked in a different data format, and then merging the two together actually becomes quite difficult as well.”
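The interviewee’s point about merging datasets tracked in different formats can be made concrete. A minimal Python sketch, assuming two hypothetical export formats that store the same tweet fields under different names: records are mapped to a shared schema and de-duplicated by tweet ID. The field names and the `FIELD_MAPS` table are invented for illustration and do not describe any particular collection tool.

```python
# Hypothetical export formats: both hold the same information under
# different field names (invented for illustration).
FIELD_MAPS = {
    "format_a": {"id_str": "tweet_id", "text": "text", "created_at": "created_at"},
    "format_b": {"tweet_id": "tweet_id", "content": "text", "timestamp": "created_at"},
}


def normalize(record, fmt):
    """Rename a record's fields to the shared schema; values become strings."""
    mapping = FIELD_MAPS[fmt]
    return {target: str(record[source]) for source, target in mapping.items()}


def merge(datasets):
    """Merge (format, records) pairs, keeping the first copy of each tweet ID.

    Timestamp formats are passed through unchanged; in practice they would
    also need to be parsed into a common representation.
    """
    merged = {}
    for fmt, records in datasets:
        for record in records:
            row = normalize(record, fmt)
            merged.setdefault(row["tweet_id"], row)
    return list(merged.values())


if __name__ == "__main__":
    a = [{"id_str": "1", "text": "hello", "created_at": "Mon Sep 27 10:00:00 +0000 2010"}]
    b = [
        {"tweet_id": 1, "content": "hello", "timestamp": "2010-09-27T10:00:00Z"},
        {"tweet_id": 2, "content": "vote!", "timestamp": "2010-09-27T11:00:00Z"},
    ]
    for row in merge([("format_a", a), ("format_b", b)]):
        print(row)
```

Keeping the first copy of a duplicate is a simplifying choice; when two copies disagree (here, in their timestamp strings), a real merge would need an explicit, documented rule.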


Ethics / privacy


“I will not quote tweets.”

“if somebody plays a really important role in a particular event then maybe they deserve the credit of being accredited as well.”

Social media research – some more insights from critical literature

Representativeness

Blank, G. (2014). Who uses Twitter? Representativeness of Twitter Users. Presentation at General Online Research GOR 14. Retrieved from: http://conftool.gor.de/conftool14/index.php?page=downloadPaper&filename=Blank-Who_uses_Twitter_Representativeness-119.pptx&form_id=119&form_version=final

[Figure 6: Political Activities of Twitter Users. Percentage of Twitter users vs. non-users who have done each activity more than never: interest in political activities, interest in politics, sending a political message, contacting an MP online, re-posting political news, political comment on SNS, finding political facts, signing an online petition. Source: OxIS current users 2013, N=1,613.]

Data Quality

• E.g. comparison of Twitter API and Reseller data.


Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Retrieved from http://arxiv.org/abs/1306.5204
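One simple way to check whether a sample is good enough for a given question is to compare it with a fuller dataset on the measure of interest. A minimal Python sketch, assuming both datasets are available as lists of hashtags: it reports the sample’s coverage and the Jaccard overlap of the top-k hashtags. This is an illustrative check, not the procedure used by Morstatter et al., and the function names are hypothetical.

```python
from collections import Counter


def top_k(hashtags, k):
    """Return the set of the k most frequent hashtags.

    Ties are broken by first occurrence, as Counter.most_common does.
    """
    return {tag for tag, _ in Counter(hashtags).most_common(k)}


def compare_sample(sample, full, k=10):
    """Return (coverage, Jaccard overlap of top-k hashtags) for sample vs. fuller data.

    Assumes `full` is non-empty.
    """
    coverage = len(sample) / len(full)
    top_sample, top_full = top_k(sample, k), top_k(full, k)
    union = top_sample | top_full
    jaccard = len(top_sample & top_full) / len(union) if union else 1.0
    return coverage, jaccard


if __name__ == "__main__":
    full = ["#a"] * 5 + ["#b"] * 3 + ["#c"] * 2 + ["#d"]
    sample = ["#a"] * 2 + ["#b"] + ["#d"]
    print(compare_sample(sample, full, k=3))
```

In this toy example the sample holds 4 of 11 hashtag occurrences and shares two of the top three hashtags, giving an overlap of 0.5: a small sample can miss or reorder exactly the items an analysis focuses on.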

Inequality in data access possibilities


boyd, d., and Crawford, K. 2012. Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5):662–679. DOI: 10.1080/1369118X.2012.678878

• Data haves and data have nots

– Financial reasons

– Connections to companies

– Different skills

– …

Top 5 Challenges in Twitter research

• Representativeness and validity

• Cross-platform studies

• Comparisons (e.g. different countries, points in time)

• Multi-method approaches

• Context and meaning

Bruns, Axel, and Katrin Weller. 2014. "Twitter data analytics – or: the pleasures and perils of studying Twitter (guest editorial for special issue)". Aslib Journal of Information Management 66 (3): 246-249.

Summary

Three sources of challenges in social media research:

• The variety of user interactions that count as social media, and their ever-changing nature, which makes social media a moving target.

• The diversity of the research community, which challenges knowledge transfer and the development of standards.

• The dependency on commercial companies to open up access to their data.

Researchers themselves have only limited means to change these sources of challenges.


Weller, K. (2015). Accepting the challenges of social media research. Online Information Review 39(3).

Summary

Currently addressed challenges:

• Research infrastructure, including data collection and sharing facilities, and training in new methods and technologies.

• The call for more thoughtfulness in research ethics.

• Critical considerations on big data and data quality, including reflection on the power of algorithms and on misrepresentations through big data approaches.

• Requests for broader scopes by facilitating multi-method and multi-platform studies, as well as longitudinal studies.


Outlook

• Long-term preservation of social media, i.e. archiving of data as well as of social media platforms’ look and feel.

• Documentation of applied research methods, which should enable comparative studies across individual use cases, quality control and verification of results.

• Assessing social media users’ expectations of privacy in order to respond to them through ethical standards.


GREETINGS FROM COLOGNE AND DC!

QUESTIONS AND FEEDBACK


@kwelle

http://katrinweller.net