LoCloud - D3.6: Wikimedia Application

Download LoCloud - D3.6: Wikimedia Application

Post on 11-Jun-2015

90 views

Category:

Technology

0 download

Embed Size (px)

DESCRIPTION

This is one of 7 reports provided in work package 3: Micro services for small and medium institutions. Authors: Dimitris Gavrilis, Costis Dallas and Dimitra-Nefeli Makri (ATHENA)

TRANSCRIPT

  • 1. Local content in a Europeana cloudD3.6: Wikimedia ApplicationAuthors:Dimitris Gavrilis, Costis Dallas andDimitra-Nefeli MakriLoCloud is funded by the European CommissionsICT Policy Support ProgrammeVersion: 1.0

2. View the LoCloud project deliverablesLoCloudD3.6WikimediaApplication2RevisionHistoryRevisionDateAuthorOrganisationDescription0.120/08/2014DimitrisGavrilisATHENA0.225/08/2014Dimitra-NefeliMakriATHENA0.301/09/2014DimitrisGavrilisATHENA0.407/09/2014DimitrisGavrilis,Dimitra-NefeliMakriATHENA1.009/09/2014DimitrisGavrilisATHENAPublishedStatementoforiginality:Thisdeliverablecontainsoriginalunpublishedworkexceptwhereclearlyindicatedotherwise.Acknowledgementofpreviouslypublishedmaterialandoftheworkofothershasbeenmadethroughappropriatecitation,quotationorboth. 3. LoCloudD3.6WikimediaApplication3ContentsExecutivesummary_______________________________________________________________41.Introduction_________________________________________________________________5Overviewofthemicroservice_____________________________________________________________5Overviewofthedevelopmentmethodology_________________________________________________72.Gettingstarted_______________________________________________________________8Operatingsystem______________________________________________________________________8Server________________________________________________________________________________8Termsofuse(APIkey)___________________________________________________________________8Authentication_________________________________________________________________________8BaseURL_____________________________________________________________________________83.APIReference________________________________________________________________9Harvest_______________________________________________________________________________9ListItems_____________________________________________________________________________10GetItem_____________________________________________________________________________11HTMLStatusCodes____________________________________________________________________134.Howtoinstallthemicroservice_________________________________________________145.HowthemicroserviceisinstalledinLoCloud______________________________________146.Conclusions_________________________________________________________________147.References_________________________________________________________________15 4. ExecutivesummaryThisdeliverablepresentstheWikimediaapplicationwhichwillbeusedwithintheLoCloudinfrastructure.TheapplicationallowstheharvestingofcontentprovidedbysmallculturalinstitutionsorindependentexpertsanduploadedtoWikimediainstallations,andtheprovisionofenrichedcontenttoEuropeana.Theapplication(ormicroservice)hasbeenbuiltasawebservice(RESTbased)andusestheWikimediaAPIinordertocommunicatewithWikimedia.TheapplicationsmainfunctionalitiesaretoharvestcontentformWikimedia,parsetheharvestedcontent,andidentifyusefulentitiesthatcanbemappedtotheESEorEDMmetadataschemas.TheapplicationisconnectedtotheLoCloudinfrastructurethroughitsRESTservices.TheLoCloudaggregator(MoRe)usestheservicesinordertoallowuserstoinitiateaharvestandgetcontentintotheaggregator.ThemappedESE/EDMrecordsaredeliveredtoMoRe,wheretheycanbeenrichedusingthevariousenrichmentservicesavailableontheaggregatorandthenprovidedtoEuropeana.ThebaseURLoftheapplicationcanbefoundat:http://more.locloud.eu/wikimedia/TheAPIconsoleisaccessiblefrom:http://more.locloud.eu/wikimedia/console.phpLoCloudD3.6WikimediaApplication4 5. LoCloudD3.6WikimediaApplication51. IntroductionOverview of the microserviceTheWikimediaapplicationhasbeenbuiltasawebserviceandusesaRESTinterfaceinordertocommunicatewithWikimediacommonsandfacilitatetheexchangeofinformation.OntopoftheRESTservices,anAPIconsolehasbeenbuiltthatdemonstratesitsfunctionality.TheuserisrequiredtoprovidetheURLtotheWikimediaAPIinstallationendpoint(e.g.http://commons.wikimedia.org/w/api.php),plustheuseridentifierthathasprovidedthecontenttobeharvested(e.g.PMRMaeyaert).TheapplicationwillthenusetheAPIinordertoretrievetherecordsassociatedwiththisuser,parseandextractusefulinformationthatisthendisplayedtotheuser.Thisinformationincludesthefollowingelementssofar: author description source date image(url)TheharvestedinformationcanthenbemappedtoESEorEDM(EuropeanaSemanticElements,EuropeanaDataModel,seereferencesforfutherinformation)inordertobeingestedintotheLoCloudMetadataAggregator(MoRe).Figure1:Wikimediaapplicationconsole.Listofharvestedrecords 6. LoCloudD3.6WikimediaApplication6Figure2:Wikimediaapplicationconsole.ListofharvestedrecordsFigure3:Wikimediaapplicationconsole.ListofharvestedrecordsTheharvestedandtransformedrecordswillbeenrichedwithintheLoCloudMoReaggregatorandthroughthevariousenrichedmicro-services.EnrichmentthroughMoReallowsthestreamliningofallavailableenrichmentmicro-servicesintorobustworkflows.Typicalenrichmentscenariosinclude: AnnotationoftextusingWikipedialemmas Discoveryofassociationsofrecordsincloseproximity Annotationwithsubjecttermsfromvariousthesauri 7. TheoverallworkflowisdepictedintheaboveschemawheretheWikimediaapplicationharvestscontentfromWikimediaandingestsitintotheLoCloudinfrastrure(MoRe).There,thecontentisenrichedthroughthevariousenrichmentmicro-servicesandtheenrichedversionisdeliveredtoEuropeanaandbacktotheWikimediaapplicationforpossiblere-ingestionintoWikimediaasanewversion.Overview of the development methodologyTheWikimediaapplicationhasbeenbuiltasaswebserviceandthePHPlanguagewasusedtodevelopit.AlthoughitmakesextensiveuseofRESTservices,theWikimediaCommonsAPIlibraryhasbeenused.ThemainchallengeindevelopingtheWikimediaapplicationfocusesonextractingandcorrectlyparsingcontentfromWikimedia,becauseWikicontentisnotalwayssemanticallycorrectandlittleinformationcanbeextractedwithagooddegreeofconfidence.ThesemanticdisambiguitieshavetodomainlywiththefactthatWikisareinherentlydifficultin:a)capturingmetadataandb)annotatingpartsofthewikitext.Thismeansthatfewmetadataperpageareavailableandtextualdescriptionswithinthepagesofthewikiarenotannotatedwithmetadata(theyappearasplaintext).Thetestingoftheapplicationinvolvestwoparts:TestingoftheapplicationVariousAPIcallsweretestedusingChromeRESTclients.Thetestsinvolvedusingcombinationsofvariousparameters.TestingofthecontentTestingusingrealcontentwascarriedoutusingadummyWikimediainstallationandusingtherealWikimediaCommons.Theharvestedcontentwasparsedsothattheappropriateentitiescouldbeextracted.LoCloudD3.6WikimediaApplication7 8. LoCloudD3.6WikimediaApplication82. GettingstartedOperating systemTheWikimediaapplicationcanbeinstalledinanywebserverthathasApacheHTTPandPHPinstalled.Thisincludesthefollowingoperatingsystems: Windowsserver Linuxserver MacOSserverServerTheminimumserverrequirementsoftheWikimediaapplicationmatchthoseoftheApacheHTTPserver.WikimediaisanextremelylightweightmicroserviceprovidingitharvestsnomorethanfivesimultaneousWikimediainstallations.Terms of use (API key)Therearetwoaccesslevelsinplaceinordertoensuretheproperuseoftheapplication,andsafeguardtheserversresources.AccesstotheRESTAPIcallsTheserequireanAPIkeythatisprovidedonrequest.EachAPIcallmustbeaccompaniedbythisAPIkeyinordertoprovideproperauditandresourcelimitation(whenneeded).AccesstotheAPIconsoleTheaccesstotheAPIconsoleisfreeanddoesnotrequireanykeys.Itwasmeantasademooftheapplicationscapabilities.AuthenticationThereisnoauthenticationrequired(excepttheuseofanAPIkeywhenusingtheRESTAPIcalls).Base URLThebaseURLoftheapplicationcanbefoundunder:http://more.locloud.eu/wikimedia/TheAPIconsoleisaccessiblefrom:http://more.locloud.eu/wikimedia/console.php 9. LoCloudD3.6WikimediaApplication93. APIReferenceHarvestTheidofthebatchisreceivedoriscreated.RequestMethodURL[POSTorGET]http://more.locloud.eu/wikimedia/Harvest.php?api_key=AAAAA&harvest_id=1ParameterDatatypeDescriptionapi_keyStringTheAPIKeyurlStringThebaseURLoftheWIkimediainstallationcontributorStringThecontributornameharvest_idIntegerTheharvest_idofthebatch.Ifempty,anewharvestiscreatedandreturned.ResponseStatusResponse200AnXMLdocumentwiththeidofthebatchtoharvestisreceived(harvest_id).Exampleresponse:http://commons.wikimedia.org/w/api.php?action=query&list=usercontribs&uclimit=5&format=xmlPMRMaeyaert 10. LoCloudD3.6WikimediaApplication10ListItemsAlistwithalltheharvesteditemsisreceived.RequestMethodURL[GET]http://more.locloud.eu/wikimedia/ListItems.php?api_key=AAAAA&harvest_id=1ParameterDatatypeDescriptionapi_keyStringTheAPIKeyharvest_idStringTheidoftheharvestc_tokenStringAtokencontainingtheidofthepagetoharvest.Ifc_tokenisempty,thefirstpageisreceivedResponseStatusResponse200AnXMLdocumentwiththeharvesteditemsisreceived.Eachitemcontainsarecordidentifier.Exampleresponse:GetItemTheharvesteditemisreceived.RequestMethodURL[GET]http://more.locloud.eu/wikimedia/GetItem.php?api_key=AAAAA&harvest_id=1&item_id=Gent_Sint-Baafskathedraal_portret_bisschop_Lindanus_B_STB_602.jpgParameterDatatypeDescriptionapi_keyStringTheAPIKeyharvest_idStringTheidoftheharvestItem_idStringTheidoftheitemtoharvestResponseStatusResponse200AnXMLdocumentwiththeharvesteditemisreceived.Exampleresponse:Gent Sint-Baafskathedraal portret bisschop Lindanus B STB602.jpgFile:Gent_Sint-Baafskathedraal_portret_bisschop_Lindanus_B_STB_602.jpghttp://upload.wikimedia.org/wikipedia/commons/3/30/Gent_Sint-Baafskathedraal_portret_bisschop_Lindanus_B_STB_602.jpghttp://commons.wikimedia.org/wiki/File:Gent_Sint-Baafskathedraal_portret_bisschop_Lindanus_B_STB_602.jpg 12. 117595463600PMRMaeyaert2014-06-21T12:23:14Z3e862647cad40628a0027bd828611cf1ca5f897f5 January 2011,19:12:21PMRMaeyaertOwn work

Nederlands: Portret van Mgr. WillemLindanus, Sint-Baafskathedraal Inventaris van hetKunstpatrimonium van Oostvlaanderen, V Sint-BaafskathedraalGent, Dr. Elisabeth Dhanens, Gent 1965 inv nr: 499/2

All media needing categories as of2014Media needing categories as of 21 June2014Uploaded with UploadWizardCC-BY-SA-3.0Creative Commons Attribution Share-AlikeV3.00101http://upload.wikimedia.org/wikipedia/commons/thumb/7/79/CC_some_rights_reserved.svg/90px-CC_some_rights_reserved.svg.pnghttp://creativecommons.org/licenses/by-sa/LoCloudD3.6WikimediaApplication123.0/http://creativecommons.org/licenses/by-sa/3.0/legalcode 13. LoCloudD3.6WikimediaApplication13HTML Status CodesAllstatuscodesarestandardHTTPstatuscodes.TheonesbelowareusedinthisAPI.2XX-Successofsomekind4XX-Erroroccurredinclientspart5XX-ErroroccurredinserverspartStatusCodeDescription200OK201Created202Accepted(Requestaccepted,andqueuedforexecution)400Badrequest401Authenticationfailure403Forbidden404Resourcenotfound405MethodNotAllowed409Conflict412PreconditionFailed413RequestEntityTooLarge500InternalServerError501NotImplemented503ServiceUnavailable 14. 4. HowtoinstallthemicroserviceTheapplicationisawebserviceanddoesnotrequireanyinstallation,itcanbeuseddirectlythroughitsRESTinterface.However,itispossibletoinstallonanewserver.Therequirementsforthatinstallationare:LoCloudD3.6WikimediaApplication14 ApacheHTTPServer(version2.0andabove) PHPSupport(version5.0andabove) MySQL(version5.0andabove)5. HowthemicroserviceisinstalledinLoCloudTheWikimediaapplicationcanbeconnectedtotheLoCloudinfrastructurethroughitsRESTservices.TheLoCloudaggregator(MoRe)canusetheservicesinordertoallowuserstoinitiateanewharvestandgetcontentintotheaggregator,usingoneoftheintermediateformatssupportedbyLoCloud.OncetheWikimediarecordshavebeendeliveredtoMoRe,theycanbeenrichedusingthevariousenrichmentservicesavailable.Theseservicesincludetheadditionofvocabularyterms,theannotationwithWikipedialemmasetc.6. ConclusionsInconclusion,themicro-servicesgeneralapproachincludesharvestingofmetadataontopofaRESTfularchitecture.ThewebbasedapplicationandRESTendpointshavebeentestedusingreal-data.Themainchallengesindevelopingandoperatingsuchaservicehavetodomostlywiththeambiguitiesoftheharvestedmetadata.ThisisbecauseofWikimediaslackofformalization(andenforcingofthatnormalization)ofmetadata.OncetheWikimediarecordshavebeenharvestedusingtheapplicationandtheyareontheMoReaggregatortheycantakeadvantageofthefullrangeofenrichmentmicro-servicesbeforebeingpublishedonEuropeana. 15. 7. ReferencesESEEuropeanaSemanticElements-http://www.europeana.eu/portal/ese_index.htmlEDMEuropeanaDataModel-http://pro.europeana.eu/edm-documentationLoCloud,2013:D2.3ModifiedMoRePrototypeWikimediaCommons-http://commons.wikimedia.org/wiki/Main_PageWikimediaCommonsAPI-http://tools.wmflabs.org/magnus-toolserver/commonsapi.phpLoCloudD3.6WikimediaApplication15