Using open datasets for research purposes

Erasmus Studio, Tuesday 20 January 2015

Martijn Kleppe, Erasmus Universiteit Rotterdam
Astrid van Aggelen, Vrije Universiteit
Laura Hollink, Vrije Universiteit

Program

• I. Introduction: PoliMedia (Martijn)

• II. Talk of Europe (Astrid)

• III. Concluding: Research with open datasets (Martijn)

www.polimedia.nl

Current approach of research

How do media cover debates in the Dutch Parliament?

Issues with current approach

• Too much work (travel and manual work)

• Limited material and different systems (no images, and only a selection of programs)

PoliMedia approach

PoliMedia Portal: search by debate and person

• Newspapers (KB)

• Television (Sound and Vision)

• Radio (KB)

• Staten Generaal Digitaal (KB)

Results

• Yeah! It works (but no television)

• Not perfect

• But still OK (recall: 62%; precision: 80%)

• It is open to everyone: www.polimedia.nl

• We won a prize with it

• People actually use it (!)
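For readers less familiar with the recall and precision figures reported above: precision is the share of the links a system produced that are correct, and recall is the share of all correct links that the system found. The sketch below assumes the figures refer to links between debates and media items. The identifiers are made up for illustration and are not PoliMedia's evaluation data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: links produced by a system vs. links judged correct by hand.
system_links = {"article-1", "article-2", "article-3", "article-4", "article-5"}
correct_links = {"article-1", "article-2", "article-3", "article-4",
                 "article-6", "article-7"}

precision, recall = precision_recall(system_links, correct_links)
```

In this toy case four of the five produced links are correct, and four of the six correct links were found. A system can therefore score well on precision while still missing a substantial share of the relevant material.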

Yeah! An article in NRC Handelsblad!

“Another digital source I often use is PoliMedia.nl”

“PoliMedia is mainly interesting because of the advanced search & filter options”

NRC Handelsblad, Ewoud Sanders, Voor al haar mantelzorgen, 14 April 2014

Oh no, he does not use PoliMedia for what it was made for…

Results

• Do people understand it?

• Ewoud Sanders is not the only one who does not use PoliMedia to its full potential. Neither do I…

• Which topic received the most press coverage?

• This can be answered via the SPARQL endpoint. Result: the “Indonesische Kwestie” (the Indonesian Question).

• But I do not know how to work with a SPARQL endpoint

Results

• Not really open data

• Only Dutch

• Follow-up: Talk of Europe

LinkedPolitics: Linked Open Data of political events, actors, media.

Talk of Europe

• Goal: publish the plenary debates of the European Parliament as Linked Data

• Linked Data: a format for publishing data on the Web, with URIs as permanent identifiers, designed for connecting pieces of data.

• Why is this important?

To allow large-scale analysis across time spans by social scientists interested in voting behavior, partisanship, lobbies, differences between countries, etc.

For residents of the European Union (the electorate), access to the proceedings of the European Parliament is a formal right.
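The Linked Data idea described above can be illustrated with a small sketch: every statement is a subject-predicate-object triple, and URIs serve as the identifiers that connect statements. All URIs and property names below are hypothetical placeholders under `http://example.org/`, not identifiers or vocabulary from the Talk of Europe dataset.

```python
# Hypothetical namespace: none of these URIs exist in the real dataset.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "speech/1", EX + "vocab/speaker", EX + "person/42"),
    (EX + "speech/1", EX + "vocab/spokenOn", "2014-01-15"),
    (EX + "speech/2", EX + "vocab/speaker", EX + "person/42"),
    (EX + "person/42", EX + "vocab/country", EX + "country/NL"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# All speeches given by person/42: match on predicate and object, keep the subject.
speeches_by_42 = [
    s for (s, _, _) in match(triples,
                             predicate=EX + "vocab/speaker",
                             obj=EX + "person/42")
]
```

Because the speaker is identified by a URI rather than a name string, the same identifier can appear in other datasets, which is what makes linking across sources possible.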

Data

14M triples about the 30K speeches by 3K speakers (and their affiliations) on the 1K session days held in the European Parliament so far (1999-2014)

Links to external datasets

• Country names

• Members of Parliament

• Members of Parliament + Parties

Access to the data

We provide access in four ways:

1. Through a SPARQL endpoint at http://linkedpolitics.ops.few.vu.nl/sparql/

2. Using the browse and search options of ClioPatria.

3. By downloading the data in Turtle or RDF/XML.

4. As Triple Pattern Fragments at http://data.linkeddatafragments.org/linkedpolitics (thanks to Ruben Verborgh).
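For those who, like us, are not used to working with a SPARQL endpoint, the sketch below shows roughly what a request looks like. It relies only on the standard SPARQL Protocol convention of passing the query text in a `query` parameter. The query itself is deliberately generic (any five triples) because it assumes nothing about the dataset's vocabulary, and `run_query` is a hypothetical helper that is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint address as given on the slide.
ENDPOINT = "http://linkedpolitics.ops.few.vu.nl/sparql/"

# Generic query: any five triples, no assumptions about property names.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and parse a JSON result set (requires network access)."""
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


url = build_query_url(ENDPOINT, QUERY)
```

Calling `run_query(ENDPOINT, QUERY)` would return the result set as a Python dictionary, assuming the endpoint is reachable and honors the JSON `Accept` header. The browse and search options of ClioPatria remain the easier route for exploring which properties the data actually uses before writing more specific queries.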

Searching the proceedings of the EU Parliament

Example queries on the Talk-of-Europe data

• What are differences between members in terms of terms mentioned?

• What are differences between EU parties in terms of terms mentioned?

• Which new member was discussed most when they joined?

• For each EU country, get the number of speeches held by its representatives that contains the word “agriculture".

• …
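The last concrete example query above (speeches per country that contain the word “agriculture”) comes down to a filter followed by a grouped count. The sketch below shows that logic on a handful of invented records. The field names `country` and `text` are assumptions made for illustration, and in practice the records would come from the SPARQL endpoint or a data download.

```python
from collections import Counter

# Invented records for illustration only; not taken from the real dataset.
speeches = [
    {"country": "Netherlands", "text": "Agriculture subsidies need reform."},
    {"country": "Netherlands", "text": "On the fisheries agreement."},
    {"country": "France", "text": "The common agriculture policy matters."},
    {"country": "France", "text": "Support for agriculture in rural regions."},
    {"country": "Italy", "text": "Transport networks across borders."},
]


def count_speeches_mentioning(speeches, word):
    """Count, per country, the speeches whose text contains `word`.

    Matching is a case-insensitive substring test, so it is a rough
    approximation of "contains the word".
    """
    needle = word.lower()
    counts = Counter()
    for speech in speeches:
        if needle in speech["text"].lower():
            counts[speech["country"]] += 1
    return dict(counts)


agriculture_counts = count_speeches_mentioning(speeches, "agriculture")
```

Countries without a matching speech are simply absent from the result. Speeches in other languages would need their own search terms, which this sketch does not handle.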

Creative Camps

• 3 events of one week each, where people are invited to work with our data on-site.

• Outcomes of Creative Camp 1 @ Hilversum:

• Links to the Italian parliament.

• Detection of people who speak about an unusual mix of topics.

• Sentiment analysis

Check out our current Call for Participation! Deadline: 30 January 2015. http://www.talkofeurope.eu/creativecamp2/call-for-participation/

Research with open datasets

Our experiences

• There are some really nice and interesting datasets

• How do you find an open dataset that matches your research question?

• Which datasets are really open? And which are not?

• Do you need to collaborate with computer scientists?

• Is an open dataset sufficient on its own, or is it a semi-finished product (‘half-fabrikaat’)? What was the goal of creating the dataset?

• What is the aim of using open datasets? Answering research questions or finding research questions?