Production of new knowledge through automated Big Data extraction from Social Bookmarking Systems and analyzing of the resulting network: The case of the network of the globalization of agriculture in Delicious

1st IMASS Conference, Methods and Analyses in Social Sciences, 23-24 April 2014, Olhão, Portugal, http://imass.ca/imass/conference

University of Huelva, Spain

Juan D. Borrero, Estrella Gualda, José Carpio

Table of Contents

Introduction
– Web 2.0 and Social Tagging Systems
– Social tagging and folksonomy
– Folksonomy and collective tag structure

Context and Topic of Study
– Delicious
– Tagging on Delicious
– Tag structure, Delicious and social networks
– Globalization of Agriculture

Objectives

Methodology
– Data collection
– Analysis

Results
– Social network statistics from Delicious dataset
– Network centralization
– Top authoritative nodes
– Visualization User→URL net
– Cohesion and substructures
– Tag clouds

Discussion
– Centrality and power
– Central tags

Conclusions
– Further research
– Possible applications

Framework: Web 2.0 and Social Tagging Systems

Many users add metadata in the form of TAGS

Resulting collective tag structure

Source: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/

Source: http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why-Your-Social-Media-Strategy-Isn-t-Working.aspx/

Source: http://bvdt.tuxic.nl/index.php/the-wisdom-of-the-crowds-in-the-audiovisual-archive-domain/

Web 2.0 has made it possible for a wide range of people to produce, share, interact with, and organize data through tagging.

Framework: Social Tagging

Social tagging is the activity, in Web 2.0, of annotating digital resources with keywords, or tags (Golder and Huberman, 2006; Trant, 2009).

Tagging: a user enjoys a resource and, according to his or her mental model, identifies the terms that best describe the information conveyed by that resource.

Framework: Social tagging and folksonomy

Social tags produced by users are usually regarded as high-quality descriptors of web page topics and a good indicator of web users’ interests and preferences.

This process also allows the formation of a socially constructed classification schema called folksonomy (Vander Wal, 2004)…

Source: http://scot-project.net//

Framework: Folksonomy and collective tag structure

… that emerges via a bottom-up process, in which the tags of many different users are aggregated. The resulting collective tag structure, such as a tag cloud, depicts the collective knowledge of Web users (Cress et al., 2012).

Source: http://blog.cimmyt.org/?p=6052

Source: http://scot-project.net//

Context and Topic of Study: Context

Delicious is a free social bookmarking web service for storing, sharing and discovering web bookmarks.
• Content is created, annotated and viewed by its users.
• Non-hierarchical classification system: users can tag each of their bookmarks on the Delicious website, and each tag provides knowledge about the bookmarked URL.
• Collective nature: users can
  – view bookmarks added or annotated by other users;
  – organize existing tags into groups (tag bundles).

Source: www.delicious.com

Context and Topic of Study: Tagging on Delicious

People can classify the huge amount of information at their disposal by means of tags.

Tags are keywords, freely chosen by users or suggested by Delicious, that are used to annotate various types of digital content.

Source: www.delicious.com

Context and Topic of Study: Tag structure, Delicious and social networks

The structure of social tagging websites can be viewed as a network of three different node types: the users U, the resources R (websites, i.e. URLs) and the tags T that the users deploy to tag the resources.

We can see Delicious as a tripartite network whose representation can be described by two bipartite networks, for the user→tag and user→URL relations. We can also see undirected links (e.g. between users, drawn as straight lines) that represent a unipartite network.

Figure: a tripartite network made of three users U=(u,u’,u’’), four tags T=(t,t’,t’’,t’’’) and three URLs (url,url’,url’’)

In Delicious, an annotation is mainly composed of three interconnected components (Smith, 2008):

1. Link to the resource (website)
2. One or more tags
3. User who makes the annotation
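
The three-component annotation and the two bipartite relations can be illustrated with a minimal Python sketch. The `Annotation` type, the `bipartite_edges` helper and the sample users, URLs and tags are hypothetical names chosen for illustration; they are not data or code from the study.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One Delicious-style annotation: who tagged which resource with which tags."""
    user: str
    url: str
    tags: frozenset


# Hypothetical sample annotations (not taken from the study's dataset).
annotations = [
    Annotation("u1", "url1", frozenset({"t1", "t2"})),
    Annotation("u2", "url1", frozenset({"t2"})),
    Annotation("u2", "url2", frozenset({"t3", "t4"})),
    Annotation("u3", "url3", frozenset({"t4"})),
]


def bipartite_edges(annotations):
    """Split the tripartite data into user->URL and user->tag edge sets."""
    user_url = {(a.user, a.url) for a in annotations}
    user_tag = {(a.user, t) for a in annotations for t in a.tags}
    return user_url, user_tag


user_url, user_tag = bipartite_edges(annotations)
print(sorted(user_url))
print(sorted(user_tag))
```

Storing edges as sets means a user who bookmarks the same URL twice still contributes a single user→URL link, which is one possible convention among several.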

Context and Topic of Study: Topic

Globalization implies a larger market as a result of the reduction in the transaction costs of international trade.

Globalization of agriculture:
- trade (foods, goods)
- prices (food, goods)
- food consumption (bulk products versus processed products)
- R&D
- rules and laws (subsidies, WTO related to poverty)

[Diagram labels: implications; asymmetries; effects; Web 2.0: discussion/diffusion]

Objectives

To discover some type of structuration around the issue of the globalization of agriculture on Delicious.

By automatically extracting data from the Delicious social bookmarking website and applying Social Network Analysis (SNA), we examine:
1. what types of URLs around our topic have been recommended via collaborative tagging in Delicious,
2. what types of users label URLs around this topic,
3. whether there is some type of structuration and hierarchy to be discovered in the network of the globalization of agriculture (centrality, substructures, etc.), and
4. what types of tags are being used to label (and thus define and qualify) the URLs on the globalization of agriculture that users recommend through Delicious.

Methodology: Data collection / Procedure

(A) Starting point. Identify the search attributes, using an authoritative source as a baseline to find keywords connected to the idea of ‘globalization of agriculture’.
– Wikipedia definition of “critics of globalization” (popular, high reputation)
– Other starting points (future)
– Main concepts selected manually (researcher expertise) from the website homepages, tag clouds or topics
– Identified the 9 seed keywords (globalization + agriculture, development, activism, trade, poverty, food, organic, GMO)
– Other concepts rejected

(B) A Perl web-crawling program was written to gather the sample of users, URLs and tags for
- globalization+agriculture; globalization+development; globalization+activism; globalization+poverty; globalization+food; globalization+organic; globalization+GMO
- From 22 April 2011 to 21 May 2011

(C) Results
- 61,043 taggings that involved 3,668 users on 4,913 URLs and 5,724 tags.

(D) A Haskell program reduced the amount of data by cutting the URLs and consolidating keywords, including the identification of synonyms, the elimination of words with capital letters and the elimination of derivatives such as plurals.
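
The slides do not give the rules used in step (D), and the original program was written in Haskell. The Python sketch below only illustrates what such a reduction step might look like under stated assumptions: URLs are cut down to their host, capital letters are handled by lowercasing (the slide could equally mean dropping such words), plurals are removed with a naive trailing-"s" rule, and `SYNONYMS` and `KEEP_AS_IS` are hypothetical tables invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical synonym table; the study's actual list is not given in the slides.
SYNONYMS = {"economy": "economics", "globalisation": "globalization"}

# Hypothetical list of words that end in "s" but are not plurals.
KEEP_AS_IS = {"economics", "politics"}


def cut_url(url: str) -> str:
    """Reduce a bookmarked URL to its host so that pages group by website."""
    return urlsplit(url).netloc.lower()


def normalize_tag(tag: str) -> str:
    """Lowercase a tag, map synonyms, then strip a simple English plural."""
    tag = tag.strip().lower()
    tag = SYNONYMS.get(tag, tag)
    is_plural = len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss")
    if tag not in KEEP_AS_IS and is_plural:
        tag = tag[:-1]
    return tag


print(cut_url("http://www.nytimes.com/2011/04/22/world/example.html"))
print(normalize_tag("Foods"), normalize_tag("Economy"), normalize_tag("GMO"))
```

Rules of this kind merge tags and URLs that differ only in form, which is consistent with the drop from 4,913 to 2,148 URLs and from 5,724 to 4,776 tags between the raw crawl and the final dataset.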

Methodology: Data collection / Final dataset

2,148 URLs 4,776 tags 3,668 users

Methodology: Analysis

With the help of the software Pajek, we analyzed these social networks, first studying their properties (quantitative), and second visualizing the networks (qualitative) through force-directed graph layouts and tag clouds.

Results: Social network statistics from the Delicious dataset

Network     Type        Relation    # of nodes  # of links  Density  Av. degree
User→URL    Bipartite   Directed    5,816       7,200       0.09%    2.476
User–User   Unipartite  Undirected  3,668       134,833     1.97%    73.5187
URL–URL     Unipartite  Undirected  2,148       20,558      0.84%    19.141
Tag–Tag     Unipartite  Undirected  4,776       539,105     47.06%   225.756

A bipartite network with a directed relation is a network built from two different types of nodes (in this case “users” and “URLs”) that are directly connected by a relationship or link (in this work: users recommend URLs, or users tag URLs). This is a 2-mode network.

A unipartite network with an undirected relation is a network created by transforming the original matrix into a user-user, tag-tag, or URL-URL matrix. In these cases there is an undirected relation through a vertex (node) that connects both ends (1-mode network). For instance, a user-user matrix is built here through the URLs that connect users, because different people can tag or recommend the same URL.

The tag-tag network is much denser than the others: people usually use common tags.
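
The transformation from a 2-mode to a 1-mode network, and two of the statistics in the table, can be sketched in a few lines of Python. This is an illustration rather than the Pajek procedure used in the study: `project_users`, `average_degree` and `bipartite_density` are hypothetical helper names, the toy edges are invented, and only the User→URL row of the table is checked.

```python
from collections import defaultdict
from itertools import combinations


def project_users(user_url_edges):
    """One-mode projection: link two users whenever they bookmarked the same URL."""
    users_by_url = defaultdict(set)
    for user, url in user_url_edges:
        users_by_url[url].add(user)
    links = set()
    for users in users_by_url.values():
        for a, b in combinations(sorted(users), 2):
            links.add((a, b))
    return links


def average_degree(n_nodes, n_links):
    """Each link contributes to the degree of two nodes."""
    return 2 * n_links / n_nodes


def bipartite_density(n_users, n_urls, n_links):
    """Share of possible user->URL links that are actually present."""
    return n_links / (n_users * n_urls)


# Hypothetical toy edges: u1 and u2 share URL "a", u2 and u3 share URL "b".
toy_edges = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "b"), ("u4", "c")]
print(sorted(project_users(toy_edges)))  # [('u1', 'u2'), ('u2', 'u3')]

# Figures from the User->URL row: 3,668 users + 2,148 URLs = 5,816 nodes, 7,200 links.
print(round(average_degree(5816, 7200), 3))          # 2.476
print(f"{bipartite_density(3668, 2148, 7200):.2%}")  # 0.09%
```

The same projection applied to tags instead of users would yield the tag-tag network, linking two tags whenever they describe the same URL.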

Results: Network centralization

Hyperlink network (User→URL). The degree of variability in URL and user centrality scores according to indegree and outdegree.

The network is highly centralized around a few nodes. The power law is a defining characteristic of large-scale networks such as the Web (e.g. Barabási and Albert, 1999), which implies a high degree of network centralization.

Why are a few users and websites better connected than the majority?

2,148 URLs arranged in rank order by number of inbound links (URL’s indegree: sum of total inbound links)

3,668 users arranged in rank order by number of outbound links (user’s outdegree: sum of total outbound links)

• Only 10 URLs out of 2,148 (0.47%) account for 17.97% of links.
• 1% of URLs (22 out of 2,148) account for 26.50% of links.
• Only 10 users out of 3,668 (0.27%) account for 5.25% of links.
• 1% of users (37 out of 3,668) account for 12.01% of links.
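
The concentration figures for the top 10 nodes can be checked with a short Python sketch. The degree values are those listed in this deck's table of top authoritative nodes, and the total of 7,200 links comes from the network statistics table; `top_share` is a hypothetical helper name.

```python
def top_share(degrees, k, total_links):
    """Fraction of all links held by the k highest-degree nodes."""
    return sum(sorted(degrees, reverse=True)[:k]) / total_links


# Indegree of the ten most bookmarked URLs and outdegree of the ten most
# active users, as listed in the table of top authoritative nodes.
TOP10_URL_INDEGREE = [259, 170, 155, 144, 124, 95, 94, 94, 87, 72]
TOP10_USER_OUTDEGREE = [71, 51, 44, 42, 33, 30, 28, 28, 27, 24]
TOTAL_LINKS = 7200  # links in the User->URL network

print(f"{top_share(TOP10_URL_INDEGREE, 10, TOTAL_LINKS):.2%}")    # 17.97%
print(f"{top_share(TOP10_USER_OUTDEGREE, 10, TOTAL_LINKS):.2%}")  # 5.25%
```

With the full degree sequences, the same function would reproduce the 1% figures by passing k=22 for URLs and k=37 for users.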

Results: Top authoritative nodes in the Delicious “Globalization of agriculture” network

URLs ranked by indegree

Rank  Indegree  URL                     Description
1     259       www.nytimes.com         Online newspaper
2     170       www.independent.co.uk   Online newspaper
3     155       www.naomiklein.org      Activist media site
4     144       www.news.bbc.co.uk/     Online newspaper
5     124       www.globalresearch.ca   Activist media site
6     95        www.spiegel.de/         Online newspaper
7     94        www.guardian.co.uk/     Online newspaper
8     94        www.economist.com/      Online newspaper
9     87        www.corpwatch.org       Activist media site
10    72        www.theatlantic.com     Online magazine

Users ranked by outdegree

Rank  Outdegree  User                      Description
1     71         /garrygolden              Professional futurist; http://www.garrygolden.net/
2     51         /mritiunjoy               Mritiunjoy Mohanty; Professor, Economics, Indian Institute of Management Calcutta
3     44         /emmarlyb
4     42         /woldpublicopinion        Activist media site; http://www.worldpublicopinion.org/
5     33         /criticalspatialpractice  Nicholas Brown; artist
6     30         /pagolnari                Dr. Kathy Ward, pagol Nari; Professor, Carbondale, USA; feminist blogger; http://pagolnari.blogspot.com.es/
7     28         /bfunk                    Bryan Finoki; author of Subtopia (blog), http://subtopia.blogspot.com.es/; Senior Editor, Archinect; Adjunct, Woodbury University School of Architecture, San Diego
8     28         /chris.h.p
9     27         /maitreya11               Carlos Puentes
10    24         /matttbastard             Matthew Elliot; http://bastardlogic.wordpress.com/

The 10 most central websites: six of them were media-based (online newspapers such as The New York Times, The Independent, BBC, Spiegel, The Guardian, and The Economist) and three were activist sites (Naomi Klein, Global Research, and Corpwatch).

Identification of the users with the greatest degree of centrality:

The user Mritiunjoy plays a very important role in the network.

Mritiunjoy joined Delicious on 12 March 2007.

Mritiunjoy Mohanty is a professor at the Indian Institute of Management Calcutta, and his research interests are the political economy of growth and development.

Results: Visualization of the User→URL network (5,816 nodes)

Energy layout, Fruchterman-Reingold (Pajek) map. Color: cores

Results: Cohesion and substructures

Cluster k=1..5 (subnet)  Nodes   Frequency (%)  CumFreq (nodes)  CumFreq (%)
1                        4,445   76.43%         4,445            76.43%
2                        792     13.62%         5,237            90.04%
3                        387     6.65%          5,624            96.70%
4                        147     2.53%          5,771            99.23%
5                        45      0.77%          5,816            100.00%
Sum                      5,816   100.00%

k-core: A k-core of a graph G is a maximal connected sub-graph of G in which all vertices have a degree of at least k.
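
The k-core definition can be made concrete with a minimal Python sketch. The study used Pajek for this step; the function below is a generic peeling procedure offered only as an illustration, `core_numbers` is a hypothetical name, and the toy graph is invented. It assigns each vertex the largest k for which it still belongs to a k-core, treating links as undirected.

```python
def core_numbers(adjacency):
    """Return, for each vertex, the largest k such that it belongs to a k-core.

    `adjacency` maps each vertex to the set of its neighbours (undirected).
    The vertex of lowest remaining degree is peeled off repeatedly.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])   # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adjacency[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(toy_graph)
print(cores)
```

In the toy graph the triangle forms a 2-core, while the pendant vertex only reaches the 1-core. Counting the vertices at each core level gives a frequency table of the same shape as the one above.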

Results: Cohesion and substructures

• 2-core: 792 vertices. Density = 0.26%
• 3-core: 387 vertices. Density = 1.16%
• 4-core: 147 vertices. Density = 5.16%
• 5-core: 45 vertices. Density = 34.77%

We found that the mass media websites belong to the 5-core subgroup, while the main activist websites are included in the 4-core.

Results: Tag cloud, identifying the topical themes in the unipartite tag-tag network

Figure 9. Tag cloud for the globalization of agriculture network identified in Delicious (main tags of the network)

Main themes

Size is proportional to the weights; the cloud shows the 50 highest-weighted tags. Produced by Wordle.
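
The slides do not state how the tag weights were computed, and the cloud itself was rendered with Wordle. As one plausible reading, the Python sketch below assumes that a tag's weight is the sum of its co-occurrence counts in the tag-tag network, then scales the top-ranked tags linearly into a font-size range. The function names, the point sizes and the toy tag sets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_weights(tag_sets):
    """Sum, for each tag, the co-occurrence weights of its tag-tag links."""
    cooccurrence = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(tags), 2):
            cooccurrence[(a, b)] += 1
    weights = Counter()
    for (a, b), w in cooccurrence.items():
        weights[a] += w
        weights[b] += w
    return weights


def font_sizes(weights, top_n=50, min_pt=10, max_pt=48):
    """Scale the top_n weights linearly so the heaviest tag gets max_pt."""
    top = weights.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    return {tag: min_pt + (max_pt - min_pt) * w / highest for tag, w in top}


# Hypothetical tag sets, one per bookmarked URL.
toy_tag_sets = [
    {"economics", "politics", "trade"},
    {"economics", "politics"},
    {"economics", "food"},
]
weights = tag_weights(toy_tag_sets)
sizes = font_sizes(weights)
print(weights.most_common())
print(sizes)
```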

Discussion

• Because tagging is a bottom-up process, the constitution of a global network in this way evokes a very old sociological dilemma concerning the constitution of society.

– Do individuals (or micro entities) come first, or are communities and societies present from the very beginning?

– Does human agency determine social structures, or is an individual's behavior determined by social structures?

• We found that the bottom-up social tagging process is crucial, but it could not exist without Web 2.0 technology.

• What is especially interesting for us here is whether these questions can be transferred to understanding the society that lives around the process of social tagging inside Web 2.0, as exemplified in this article by the social bookmarking site Delicious.

• The approach of this study acknowledges the reciprocal influence of social and semantic characteristics. However, it is the user who ultimately decides whether a URL is to be included and whether he or she is going to write new tags. Thus, the constitution of the globalization of agriculture network is probably a mixture, as is society itself.

Discussion: Centrality and Power

• Very unequal distribution of power among the URLs cited by users on the topic of the globalization of agriculture.
– Important accumulation of inbound links.

• Mass media and activist sites far surpassed the other resources tagged in this network of the globalization of agriculture in Delicious.

• Identifying key collective actors (represented here through URLs, and through unknown users as well) allows a better comprehension of leadership, influence processes, and power-related structures.

• For social practitioners, this is a good way to identify key informants in a community through which to disseminate useful and important information.

ADVANTAGES OF THIS TYPE OF KNOWLEDGE FOR RESEARCHING AND INTERVENING

Discussion: Central Tags. Users producing Tags

• Tags are either suggested by the website or newly added by users in a creative way.

• Each user could label a URL with an unlimited number of tags.

• Tag cloud: a visual approach to the language used by users, and a way to identify discourses.

• Out of a total of 4,776 tags, two words were the main ones.

• The most frequently used tags were ‘economics’ and ‘politics’.

Conclusions: Achieved goals

• A first step towards the development of empirical techniques capable of automatically differentiating actors who occupy a more central position.

• A first stone in the difficult process of understanding and discovering patterns in the way users tag URLs for collaborative reasons.

• Utility for discovering latent patterns, which makes it possible to provide effective recommendations to different actors.

• Understanding a community of more than a thousand links.

• Retrieval and analysis of the information was complex, but was made easier by working in interdisciplinary teams.

Further research

FOCUS ON Users
• Identification of key actors that disseminate and share URLs, such as the previously cited Mritiunjoy.
– Determine where the key elements that structure the network emerge from.
• Why is a given actor so important in the network of the globalization of agriculture?
– Key actors in this type of network could configure and reconfigure the evolution of the network (TIME), and structure and even manipulate the type of interchange of resources in Delicious or in similar bookmarking sites.
• Use of certain tags in classifying URLs, and differences among users in the way they use certain words/tags.
– Distinction between scientists and other professionals or users?
– Identify users with the same tagging patterns, or URLs that were similarly labelled: study structural equivalences.
• Is it by chance? Do the most prominent actors on a website like Delicious correspond to a profile of very active and participative people? Do they usually work in this area (or have it as a hobby), and is this why they accumulate and tag so many URLs in Delicious?
• Go in-depth about users (if possible).
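
One simple way to start looking for users with similar tagging patterns is to compare their tag sets. The Python sketch below uses Jaccard similarity, which is a looser proxy and not structural equivalence in the strict SNA sense (identical ties to all other nodes). The helper names, the threshold and the toy profiles are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar_pairs(profiles, threshold=0.5):
    """Return user pairs whose tag sets overlap by at least `threshold`."""
    users = sorted(profiles)
    pairs = []
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            score = jaccard(profiles[u], profiles[v])
            if score >= threshold:
                pairs.append((u, v, score))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical tag profiles, one set of tags per user.
toy_profiles = {
    "u1": {"economics", "politics", "trade"},
    "u2": {"economics", "politics"},
    "u3": {"food", "organic"},
}
pairs = most_similar_pairs(toy_profiles)
print(pairs)
```

Swapping the profiles for sets of tags per URL would apply the same comparison to URLs that were similarly labelled.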

Further research

FOCUS ON Tags
• Reasons for the prominence of the top two tags around the globalization of agriculture.
– Influence of the first tags on the following ones.
• Role of innovation and creativity in tagging.
• Are some of the 4,776 tags found used interchangeably?
– Why is the word ‘economics’ used in some cases and ‘economy’ in others?
– Are they used in the same way when classifying the URLs?
– Evolution and usage of language around an issue over time.
– Ideological and terminological approaches in the national/international arena.
• Other possible studies based on retrieving the pages and performing content analysis.
• Why are some labels present or absent?
• Are there “traditions” or “fashions” in tagging in Web 2.0?

OTHERS
• Compare the results from Delicious with those from other social bookmarking sites.
• Longitudinal analysis.
• Other explorations, other starting points, other indicators, etc.

Possible Applications

• Producing and “manipulating” public opinion (by recommending and describing websites) and markets.
– If we know the interests of the users belonging to a network, we can also make recommendations.

• Important for researchers interested in formulating strategies for intervention and mobilization; practitioners and firms could also make use of this.

• The discovery of the central elements in a network (users and URLs), together with the tags used by users, could be key to designing future strategies for diffusion (spreading taglines, causes, rumours, etc.).

• Implementation of Information Retrieval and Recommender Systems techniques in social commerce and social media contexts.

• Applications in advertising, e-commerce, mobilizing, security…

• …