Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
For #melhack, November 2009
http://lplabs.com/melbournehack/pmwiki/pmwiki.php/Main/HomePage
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
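A minimal sketch of turning the {{coord}} call above into decimal degrees. It assumes the degrees|minutes|seconds|hemisphere form shown on the slide and ignores the trailing options; the function names are made up for illustration, and other forms of the template would need more handling.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


def parse_coord(template):
    """Parse {{coord|D|M|S|N/S|D|M|S|E/W|...}} into (latitude, longitude).

    Only the eight-argument DMS form is supported (an assumption of this sketch).
    """
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    parts = [part.strip() for part in text[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected {{coord|D|M|S|N/S|D|M|S|E/W|...}}")
    args = parts[1:]
    latitude = dms_to_decimal(*args[0:4])
    longitude = dms_to_decimal(*args[4:8])
    return latitude, longitude


MELBOURNE = "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
```

Calling parse_coord(MELBOURNE) gives a southern (negative) latitude and an eastern (positive) longitude, which is the kind of pair a mapping tool expects.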
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br />Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
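A naive sketch of pulling named parameters out of an infobox like the one above. It splits on pipes that are not inside [[links]] or nested {{templates}}; the helper names are hypothetical, and real wikitext has many cases (comments, nowiki, parser functions) that this does not cover.

```python
def split_top_level(body):
    """Split template body on '|' characters that sit outside [[..]] and {{..}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext):
    """Return (template_name, {parameter: value}) for a single template call."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    name, *args = split_top_level(body)
    params = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:  # keep named parameters only; positional ones are skipped
            params[key.strip()] = value.strip()
    return name.strip(), params


SAMPLE = ("{{Infobox Company|name = Lonely Planet"
          "|genre = [[Guide book|Travel guides]]|foundation = 1972}}")
```

With SAMPLE, parse_template returns the template name plus a dictionary in which the piped link in genre stays in one piece.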
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
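The XML dumps are far too large to load in one go, so a streaming parse is the usual approach. The sketch below uses the standard library's iterparse and yields (title, wikitext) pairs; the tiny SAMPLE document and its namespace version are stand-ins for a real dump file, and iter_pages is a made-up name.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in a MediaWiki XML export stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children


# Illustrative stand-in for a dump; real files hold millions of pages.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company|name = Lonely Planet}}</text></revision>
  </page>
  <page>
    <title>Melbourne</title>
    <revision><text>{{coord|37|48|49|S|144|57|47|E}}</text></revision>
  </page>
</mediawiki>"""
```

For a real dump, pass an opened file (for example one wrapped with bz2.open) instead of io.BytesIO(SAMPLE).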
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
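A sketch of what "query them online" could look like: building a SPARQL request for everything DBpedia records about one resource. The endpoint URL, the https scheme and the format parameter are assumptions not taken from the slides, and both function names are hypothetical.

```python
import urllib.parse

# Assumed public endpoint; check the DBpedia site for the current address.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def describe_resource_query(name, limit=20):
    """Build a SPARQL query listing property/value pairs for one resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{name}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )


def build_sparql_url(query, endpoint=SPARQL_ENDPOINT):
    """Encode the query into a GET URL asking for JSON results (assumed parameter)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)
```

The URL from build_sparql_url(describe_resource_query("Lonely_Planet")) can be opened with any HTTP client; the properties returned should correspond to those listed on the dbpedia.org/page/Lonely_Planet page.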
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
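A minimal sketch of calling api.php directly with the standard library. The parameters shown (action=query, prop=categories, titles, cllimit, format=json) are standard API parameters; the https scheme, the User-Agent string and the function names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_api_url(params, endpoint=API_ENDPOINT):
    """Return a GET URL for the API, defaulting to JSON output."""
    query = dict(params)
    query.setdefault("format", "json")
    return endpoint + "?" + urllib.parse.urlencode(query)


def fetch(params, user_agent="melhack-example/0.1 (hypothetical contact)"):
    """Perform the request and decode the JSON reply (needs network access)."""
    request = urllib.request.Request(
        build_api_url(params), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Ask which categories the "Lonely Planet" article belongs to.
CATEGORY_QUERY = {
    "action": "query",
    "prop": "categories",
    "titles": "Lonely Planet",
    "cllimit": "max",
}
```

fetch(CATEGORY_QUERY) returns a nested dictionary; the categories sit under query, then pages, keyed by page ID.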
Client libraries!
mwclient
Python library for accessing MediaWiki APIs
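A short sketch of the same kind of lookup through mwclient. Site, pages[...], text() and categories() are mwclient calls; the wrapper functions connect and page_summary are made up for illustration, and mwclient itself must be installed separately.

```python
def connect(host="en.wikipedia.org"):
    """Open a connection to a wiki (imports mwclient lazily; needs network)."""
    import mwclient  # third-party: pip install mwclient
    return mwclient.Site(host)


def page_summary(site, title):
    """Collect a few basic facts about one page from a mwclient-style site."""
    page = site.pages[title]
    text = page.text()  # current wikitext of the page
    return {
        "title": title,
        "length": len(text),
        "categories": [category.name for category in page.categories()],
    }
```

Typical use would be page_summary(connect(), "Lonely Planet"). Because page_summary only relies on pages, text() and categories(), it can also be exercised against a stub object without touching the network.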
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks – tools
Tools often explicitly implement implicit editing-community standards (“community API”)
Toolserver
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise [email protected]
Thanks!
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher