using the mediawiki web api to get (only) the data you need · using the mediawiki web api to get...

26
Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt [email protected]

Upload: others

Post on 22-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Using the MediaWiki web API to get (only) the data you need

WikiConference USA 2014

Frances Hocutt@franceshocutt

[email protected]

Page 2: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

API = Application Programming Interface

web API = just another way to get data from a website

Page 3: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Humans are good at perceiving information, patterns

Computers are good at handling (lots of) data

APIs:programs::webpages:humans*

*more or less. The devil's in the details.

Page 4: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Where we start...

Page 5: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Where we go next...

Page 6: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

But if we want to see it differently...

http://sepans.com/wikistalker/#

Page 7: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

...how do we make that happen?

From the website:

● Open up Organic chemistry

● Read through it and find the links in it

● Copy/paste the links to your list

● Save your list

Through the API:

● Request that list of links from the API

● Save the resulting list

Or do you want the links on all pages starting with “Organic”? No problem!

How to get all the wiki links on “Organic chemistry”

Page 8: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

APIs make extracting data easier.

Page 9: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

To get or post data through an API, you want to know:

Where do I direct my request?

How do I write my request?

How do I send my request?

Page 10: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Where do I direct my request?

WP's endpoint: http://en.wikipedia.org/w/api.php

How do I write my request?

How do I send my request?

Page 11: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Where do I direct my request?

How do I write my request?

Look at documentation and/or code samples.

Resources: API sandbox + /w/api.php (main doc), [[API]], tutorials, Data and Developer Hub (new

project)

How do I send my request?

Page 12: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Where do I direct my request?

How do I write my request?

How do I send my request? Over http.

Web browserShell script: curl

Module: requests, urllib2 (Python)API client library: mwclient (Python)

(See: [[API:Client code]])

Page 13: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Putting it together:

http://en.wikipedia.org/w/api.php+

?action=query&prop=links&plnamespace=0&pllimit=max&title=Organic_chemistry

=http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&pllimit=max&title=Organic_chemistry

Page 14: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Or use a client library:

import mwclient

wikipedia = mwclient.Site('en.wikipedia.org')ochem = wikipedia.Pages['Organic chemistry']links = ochem.links(generator=False)

for item in links: print item

Page 15: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Nifty (potential) API-using projects

Page 16: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Wikipedia Gender

http://moebio.com/research/wikipediagender

Page 17: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Edits vs. location

WikiTrip

http://sonetlab.fbk.eu/wikitrip/

Page 18: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

WikiChanges

http://sergionunes.com/p/wikichanges/

Page 19: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Wikipedia Community Visualization

http://haithams.github.io/community_visualization/EnWp/network/index.html#

Page 20: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Twitterbot for random facts/claims

@AutoWikiFactshttps://github.com/fhocutt/obscure-enwiki-fact

Relevance/randomness in topic comes from using the API to fetch recently changed pages/Wikidata itemsTo increase obscurity, the bot selects the first sentence of the last paragraph of the longest section to tweet!

Page 21: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Acknowledgements:

Gnome OPW and the WMFWikiConference USA 2014

Sumana HarihareswaraTollef Fog Heen

Merlijn Van DeenBrad Jorsch

Questions?

Page 22: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Supplemental

Page 23: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Using the API sandbox

Page 24: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

Parameters in the sandbox

Page 25: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

API sandbox results

Page 26: Using the MediaWiki web API to get (only) the data you need · Using the MediaWiki web API to get (only) the data you need WikiConference USA 2014 Frances Hocutt @franceshocutt frances.hocutt@gmail.com

In-browser query resulthttp://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&pllimit=max&title=Organic_chemistry