Wiki[mp]edia data sources & the MediaWiki API

Upload: brianna-laugher

Post on 06-May-2015

DESCRIPTION

For #melhack - http://lplabs.com/melbournehack/pmwiki/pmwiki.php/Main/HomePage

TRANSCRIPT

Page 1: Wiki[mp]edia data sources & the MediaWiki API

Wiki[mp]edia data sources &

the MediaWiki API

Brianna Laugher

for #melhack
November 2009

Page 2: Wiki[mp]edia data sources & the MediaWiki API

...

Page 3: Wiki[mp]edia data sources & the MediaWiki API

Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!

Page 4: Wiki[mp]edia data sources & the MediaWiki API

{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}

stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
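The {{coord}} call above encodes degrees, minutes, seconds and a hemisphere letter for latitude and then longitude. A minimal sketch of turning that into decimal degrees follows. The function name parse_coord is made up for this example, and it only handles the degrees/minutes/seconds form shown on the slide; the real template accepts other forms too.

```python
def parse_coord(template):
    """Convert {{coord|d|m|s|N/S|d|m|s|E/W|...}} to (lat, lon) in decimal degrees.

    Assumption: the template uses the degrees/minutes/seconds form;
    any trailing parameters (type:, display=) are ignored.
    """
    # Drop the surrounding braces, then split on pipes.
    fields = [f.strip() for f in template.strip().strip("{}").split("|")]
    if fields[0].lower() != "coord":
        raise ValueError("not a coord template")
    d1, m1, s1, ns, d2, m2, s2, ew = fields[1:9]
    lat = float(d1) + float(m1) / 60 + float(s1) / 3600
    lon = float(d2) + float(m2) / 60 + float(s2) / 3600
    # Southern and western hemispheres are negative by convention.
    if ns.upper() == "S":
        lat = -lat
    if ew.upper() == "W":
        lon = -lon
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))  # -37.8136 144.9631
```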

Page 5: Wiki[mp]edia data sources & the MediaWiki API

{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br />Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
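An infobox is a template call whose parameters are separated by pipes, but pipes also appear inside [[links]], so a plain split on "|" is not enough. The sketch below splits only on top-level pipes. The helper names split_top_level and parse_infobox are invented for this example, and the approach is a rough approximation: as a later slide notes, only MediaWiki fully understands its own markup.

```python
def split_top_level(text):
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            buf.append(pair)
            i += 2
        elif text[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_infobox(wikitext):
    """Return (template_name, {parameter: raw_value}) for one template call."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    name, *params = split_top_level(body)
    fields = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:  # skip positional parameters that have no '='
            fields[key.strip()] = value.strip()
    return name.strip(), fields


# A shortened, closed copy of the infobox above, used as sample input.
sample = (
    "{{Infobox Company|name = Lonely Planet"
    "|type = [[United Kingdom|British]] [[Government-owned company|government-owned]]"
    "|foundation = 1972|founder = Tony Wheeler<br />Maureen Wheeler}}"
)
name, fields = parse_infobox(sample)
print(name, fields["foundation"])
```

Values come back as raw wikitext, so links such as [[United Kingdom|British]] still need their own cleanup step.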

Page 6: Wiki[mp]edia data sources & the MediaWiki API

Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr

Predominantly photographs, but also diagrams, maps, flags

Page 7: Wiki[mp]edia data sources & the MediaWiki API
Page 8: Wiki[mp]edia data sources & the MediaWiki API

Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries

French biggest at 1.5M (English second at 1.4M)

Page 9: Wiki[mp]edia data sources & the MediaWiki API

JavaScript Wiktionary lookup plugin for third parties:

http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html

http://en.wiktionary.org/wiki/Wiktionary:Parsing

Page 10: Wiki[mp]edia data sources & the MediaWiki API

Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories

MediaWiki structure

Page 11: Wiki[mp]edia data sources & the MediaWiki API

MediaWiki markup

The only thing that completely understands it is MediaWiki :(

Page 12: Wiki[mp]edia data sources & the MediaWiki API

XML

download.wikimedia.org OR Amazon Public Data Sets

meta.wikimedia.org/wiki/Data_dumps

Database dumps
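The XML dumps are far too large to load into memory, so they are usually read as a stream. A minimal sketch using the standard library's iterparse follows. The function name iter_pages and the two-page sample document are made up for illustration, and the element layout (page, title, revision, text) is an assumption based on the MediaWiki export format; the namespace version differs between dumps, so the code ignores it.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page>, one page at a time.

    Assumption: one revision per page, as in a current-revisions dump.
    With several revisions the last <text> seen is the one returned.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free the finished page's child elements


# Hand-written stand-in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company|name = Lonely Planet}}</text></revision>
  </page>
  <page>
    <title>Melbourne</title>
    <revision><text>{{coord|37|48|49|S|144|57|47|E}}</text></revision>
  </page>
</mediawiki>"""

for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(wikitext))
```

For a real dump, the same generator should work on a file opened with bz2.open(path, "rb") in place of the BytesIO object.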

Page 13: Wiki[mp]edia data sources & the MediaWiki API

DBpedia
Community project extracting structured data from Wikipedia and making it available

Can download data sets or query them online

Ontology++

e.g. dbpedia.org/page/Lonely_Planet
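Querying online is done with SPARQL. The sketch below builds a request that lists the properties of one resource. It assumes the public endpoint lives at dbpedia.org/sparql and that resources are named http://dbpedia.org/resource/<Title>; the function names are invented for this example, and the network call is left for the reader to run.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource, limit=20):
    """Build a GET URL asking for every property/value pair of a resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {limit}"
    )
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def fetch_bindings(url):
    """Run the query and return the rows of the SPARQL JSON result."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["results"]["bindings"]


url = build_sparql_url("Lonely_Planet")
print(url)
# rows = fetch_bindings(url)  # needs network access
```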

Page 14: Wiki[mp]edia data sources & the MediaWiki API

MediaWiki API

mediawiki.org/wiki/API

en.wikipedia.org/w/api.php

Client libraries!
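Without a client library, a request to the api.php endpoint above is an ordinary HTTP GET with query parameters. The sketch below asks for the current wikitext of one page. The function names and the User-Agent string are placeholders, and extract_wikitext assumes the older default JSON layout, in which pages are keyed by page ID and the content sits under the "*" key.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """URL for the latest revision content of one page, as JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the request; the User-Agent value here is a placeholder."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "melhack-example/0.1 (contact: you@example.org)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_wikitext(response):
    """Map page title to wikitext, skipping missing pages."""
    result = {}
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions:
            result[page["title"]] = revisions[0]["*"]
    return result


print(build_query_url("Lonely Planet"))
# text = extract_wikitext(fetch_json(build_query_url("Lonely Planet")))  # needs network
```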

Page 15: Wiki[mp]edia data sources & the MediaWiki API

mwclient

Python library for accessing MediaWiki APIs
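A minimal sketch of reading a page with mwclient follows. It assumes a recent mwclient release, where Site(host) connects to a wiki, site.pages[title] looks up a page and page.text() returns its wikitext; releases from around the time of this talk exposed the text differently. The helper names open_site and fetch_wikitext are made up for the example.

```python
def open_site(host="en.wikipedia.org"):
    """Connect to a wiki. mwclient is third-party (pip install mwclient),
    so it is imported here rather than at module level."""
    import mwclient

    return mwclient.Site(host)


def fetch_wikitext(site, title):
    """Return the current wikitext of a page via an mwclient Site object."""
    return site.pages[title].text()


# Usage (needs mwclient installed and network access):
#   site = open_site()
#   print(fetch_wikitext(site, "Lonely Planet")[:200])
```

Compared with building api.php URLs by hand, the library takes care of request details such as continuation when listing many pages.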

Page 16: Wiki[mp]edia data sources & the MediaWiki API
Page 17: Wiki[mp]edia data sources & the MediaWiki API

toolserver.org

Server for community-developed plugins, addons, extensions, stats and hacks – tools

Tools often explicitly implement implicit editing-community standards (“community API”)

Toolserver

Page 18: Wiki[mp]edia data sources & the MediaWiki API

TemplateTiger
toolserver.org/~kolossos/templatetiger/

For a few dozen Wikipedia languages, & Wikimedia Commons

Lets you query templates very much like SQL

Page 19: Wiki[mp]edia data sources & the MediaWiki API

identi.ca/pfctdayelise [email protected]

Thanks!

Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher