D4.2 Joint infrastructure services

Document information
Title: Joint infrastructure services
ID: CLARINPLUS-D4.2 (CE-2017-0985)
Author(s): Claus Zinn, Twan Goosen, Marie Hinrichs, Emanuel Dima, Willem Elbers, Dieter Van Uytvanck, Dirk Goldhahn, Thorsten Trippel, Jozef Mišutka
Responsible WP leader: Erhard Hinrichs
Contractual Delivery Date: 2017-02-28
Actual Delivery Date: 2017-02-28
Distribution: Public
Document status in workplan: Deliverable

Project information
Project name: CLARIN-PLUS
Project number: 676529
Call: H2020-INFRADEV-1-2015-1
Duration: 2015-09-01 to 2017-08-31
Website: www.clarin.eu
Contact address: [email protected]


Table of contents

1 Executive Summary
2 Introduction
3 Cooperation with EUDAT
  3.1 B2ACCESS and B2SAFE
    3.1.1 B2ACCESS
    3.1.2 B2SAFE
  3.2 B2DROP and the CLARIN Language Resource Switchboard
    3.2.1 B2DROP
    3.2.2 The CLARIN Language Resource Switchboard
    3.2.3 Goals for Connecting the LRS with B2DROP
    3.2.4 Using B2DROP as Alternative to the MPG server
    3.2.5 Creating a bridge between B2DROP and the Switchboard
  3.3 EUDAT’s General Execution Framework
4 Cooperation with other e-Research infrastructures
  4.1 GÉANT
  4.2 RDA
    4.2.1 Data Foundation and Terminology (DFT)
    4.2.2 Dynamic Data Citation
    4.2.3 RDA/WDS Certification of Digital Repositories
    4.2.4 Federated Identity Management
    4.2.5 Legal Interoperability
    4.2.6 Data Fabric
    4.2.7 Domain Repositories
    4.2.8 Group of European Data Experts (GEDE)
  4.3 DARIAH
  4.4 EUROPEANA
  4.5 LAPPS Grid
5 Conclusion
References

1 Executive Summary

The CLARIN-PLUS partners actively follow the progress of a number of neighbouring infrastructure projects. To support the cross-fertilization between projects, it has been our aim to implement joint services with these infrastructures, whenever possible. In this deliverable, we report on joint work carried out in cooperation with EUDAT, GÉANT, RDA, DARIAH, EUROPEANA, and the LAPPS Grid. The joint work touches issues such as secure authentication and authorisation, data management policies, trusted exchange of data, sharing of metadata for language-related resources, and the availability of tools across communities.

2 Introduction

This deliverable reports on progress on work package 4 of the CLARIN-PLUS proposal. In brief, the objectives for WP4 are to:

• strengthen the ties with other research infrastructure initiatives, inside and outside the EU;

• obtain a higher degree of synergy, by re-using other infrastructural services within CLARIN and by promoting the usage of CLARIN services in different contexts;

• enhance the visibility of CLARIN’s infrastructure, paving the way for future collaborations and eventually the growth of the CLARIN ERIC member base (see WP5); and

• provide more and better services to the CLARIN user community.

In this deliverable, we report the work toward achieving the objectives with regard to the European infrastructure projects EUDAT, GÉANT, DARIAH, EUROPEANA as well as RDA and the LAPPS Grid.

3 Cooperation with EUDAT

In this section, we discuss the use of EUDAT’s B2 services for the CLARIN infrastructure. In Section 3.1, we focus on our contributions to B2ACCESS and B2SAFE. Section 3.2 describes the interface between B2DROP and the CLARIN Language Resource Switchboard. In Section 3.3, we discuss the ongoing cooperation between EUDAT and CLARIN staff to develop the General Execution Framework (GEF), and the inclusion of or access to CLARIN tools.

3.1 B2ACCESS and B2SAFE

3.1.1 B2ACCESS

The B2ACCESS service is “an easy-to-use and secure Authentication and Authorization platform developed by EUDAT. B2ACCESS is versatile and can be integrated with any service. When B2ACCESS is integrated with a given service, the user may log in by using different methods of authentication”, see https://eudat.eu/services/b2access. An architectural overview of the B2ACCESS service is shown in Figure 1.

To give the CLARIN user access to the EUDAT infrastructure, the CLARIN identity provider (IdP) has been integrated with B2ACCESS. Now, CLARIN users can use their account to access EUDAT services, e.g., to deposit data with B2SHARE.

B2ACCESS has been integrated with the eduGAIN¹ IdPs as well. However, this integration suffers from eduGAIN’s opt-in policy, which prevents users from logging in to the EUDAT services with their home organization account if their NREN² has an opt-in policy and their IdP has not yet opted in to B2ACCESS. This issue is one of the things solved by the CLARIN service provider federation (SPF).

Effort has been put into getting B2ACCESS into the SPF for approximately a year now, without much progress. A long process of legal advice and feedback on the possibility of computing centres in EUDAT joining the SPF has proven to be the bottleneck. Since CLARIN promotes usage of the home organization accounts, this is a major issue for adoption of the EUDAT services within CLARIN.

In parallel, CLARIN ERIC is also investigating the use of “unityIDM”, the core component of B2ACCESS, as the central identity management solution, replacing the current setup [3]. In this context, Charles University has developed an LDAP endpoint for unityIDM³, which is needed to integrate the CLARIN website as well as developer-centred services such as Trac, SVN and the Nexus repositories. This setup has already been deployed in a test environment and we are currently planning the migration.

¹ See http://www.geant.org/Services/Trust_identity_and_security/eduGAIN.
² A National Research and Education Network (NREN) is a specialised internet service provider dedicated to supporting the needs of the research and education communities within a country, see https://en.wikipedia.org/wiki/National_research_and_education_network.
³ “unityIDM”, a software package for identity, federation and inter-federation management, see http://unity-idm.eu.

The LDAP endpoint can play an important role for the EUDAT integration as well. Based on discussions with B2ACCESS and B2DROP developers, it has become clear, however, that a seamless integration of B2DROP for CLARIN users, using the same credentials within both CLARIN and EUDAT, might not be possible policy-wise. Currently, the best alternative solution seems to be a dedicated B2DROP instance for CLARIN that is directly connected to the CLARIN LDAP server.

Figure 1. B2ACCESS Architectural Diagram

3.1.2 B2SAFE

The B2SAFE service is “a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on their research data across multiple administrative domains in a trustworthy manner”, see https://eudat.eu/services/b2safe, and the illustration given in Figure 2.

In the last quarter of 2015, a survey was distributed across all CLARIN centres to explore their interest in the integration with B2SAFE. Responses were received from 10 centres. To get all participants to a basic level of knowledge about the service, a one-day workshop⁴ was organised at the end of 2015. Based on the participation in this workshop and because of the urgency to have an off-site copy of the repository, an initial plan to integrate about 8 centres was devised.

All CLARIN centres expressed interest in using B2SAFE (instead of joining B2SAFE), either using the icommands or GridFTP if secure transfers are needed.

Currently, four centre integrations have been either completed or suspended. The MPI-PL and SOAS integrations started earliest. The SOAS integration with B2SAFE was especially urgent because of the lack of proper off-site backups for their repository. Both integrations, however, have been problematic, mainly because of a lack of human resources. Both are connected to an EUDAT data centre on the technical level, but both are lacking a proper implementation of the backup scripts on the repository side. At the time of writing this report, SOAS has resources available again and work on implementing the backup scripts has resumed. The other two centres, CLARIN-AT and Språkbanken, have been integrated smoothly, each with a lead time of about two months from initial discussions to clarify the requirements to running the backup scripts in production.

Currently, three centres are in the process of being integrated, namely, Meertens, CELR and FIN-CLARIN. This leaves the LINDAT, CLARIN-PL and CMU centres on the list with lowest priority. Integration for these centres will be planned when Meertens, CELR and FIN-CLARIN have been fully integrated.

⁴ See https://www.clarin.eu/event/2015/clarin-b2safe-workshop

Figure 2. B2SAFE Overview.

3.2 B2DROP and the CLARIN Language Resource Switchboard

3.2.1 B2DROP

Following the description of B2DROP on https://eudat.eu/services/b2drop, B2DROP is “a secure and trusted data exchange service for researchers and scientists to keep their research data synchronized and up-to-date and to exchange with other researchers.”

Figure 3. The B2DROP Usage Scenario.

B2DROP allows individual users to store 20 GB of research data in the cloud, and to exchange such data with selected colleagues, over a given amount of time. B2DROP offers services for synchronizing multiple versions of data across devices and users, given configurable file permissions.

B2DROP is built upon ownCloud, see http://owncloud.org, which in turn is written in the PHP programming language, see https://owncloud.org/blog/owncloud-and-php/. For more details on B2DROP, see the deliverable on EUDAT services [2], and consult its user documentation, see https://eudat.eu/services/userdoc/b2drop#UserDocumentation-B2DROPUsage-Documentdata.

3.2.2 The CLARIN Language Resource Switchboard

The LR Switchboard (LRS) is being developed in WP2 of the CLARIN-PLUS project. It aims at easily connecting users and their resources with the tools that can process them, see [1]. The LRS has been connected with the Virtual Language Observatory (VLO), see https://vlo.clarin.eu. Here, users can easily invoke the switchboard from VLO’s resource viewer, which in turn suggests applicable tools for the resource in question. In the future, the LRS will also be connected to CLARIN’s Virtual Collection Registry (VCR).

The switchboard is also offered as a standalone version, see https://www.clarin.eu/switchboard. Here, users can simply upload a resource from their local file system to the LRS, for which applicable tools are then identified.

In the standalone version, the file uploaded by the user is currently stored on a file storage server at the MPCDF computing centre in Garching.

3.2.3 Goals for Connecting the LRS with B2DROP

The main goals for connecting the LRS with B2DROP are as follows:

• To re-use other infrastructural services within CLARIN.
• To promote the usage of CLARIN services in different contexts.
• To increase the visibility of the CLARIN infrastructure.

There are two possible avenues to work toward these goals:

1. In the standalone version of the LRS, use B2DROP rather than the file storage server in Garching for the storage of resources so that tools connected to the switchboard can access them, see Section 3.2.4.

2. Offer a bridge between B2DROP and the switchboard, therefore giving B2DROP users access to the CLARIN tool space, see Section 3.2.5.

3.2.4 Using B2DROP as Alternative to the MPG server

When users of the standalone version of the LRS upload a resource, it is temporarily stored at an external file storage server (MPCDF Garching). This server has a number of drawbacks. In particular, a very limited amount of disk space is available and, more gravely, there is little access control in place, so that users who know the server address can easily view and access all uploads. The server, hence, acts like a “public dropbox”.

To address privacy concerns, it is necessary to better restrict access to file uploads. As a first improvement to the situation, the MPG server could be replaced by using a B2DROP instance as alternative storage device. In this scenario, the B2DROP instance will have a single user account, say, switchboardAdmin, which is operated by the LR switchboard: whenever a user of the LR switchboard (standalone version) uploads a file to the switchboard, it will be transferred to the switchboardAdmin account of the B2DROP instance. Using B2DROP’s API, the admin user will associate a shared link to the resource in question, possibly with a set expiration date. Tools connected to the switchboard will be given this link to access the file.

We have developed a prototypical implementation of this scenario, which is being tested. A B2DROP instance associated with the LR switchboard is currently hosted on the same server as the LR switchboard, which also helps to tackle CORS-related issues.⁵
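The shared-link step of this scenario can be illustrated with a minimal sketch. It assumes that the B2DROP instance exposes the OCS Share API of the underlying ownCloud software unchanged; the host name, the credentials and the function names build_share_request and extract_share_url are hypothetical and are not taken from the prototype. The sketch only builds the request and parses a response, it does not contact a server.

```python
import base64
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Path of the ownCloud OCS Share API; that a given B2DROP instance
# serves it under this path is an assumption of this sketch.
OCS_SHARES_PATH = "/ocs/v1.php/apps/files_sharing/api/v1/shares"
PUBLIC_LINK = 3  # OCS share type for a public link


def build_share_request(base_url, user, password, file_path, expire_date=None):
    """Build (but do not send) the POST request asking for a public link."""
    fields = {"path": file_path, "shareType": str(PUBLIC_LINK)}
    if expire_date is not None:
        fields["expireDate"] = expire_date  # expected format: YYYY-MM-DD
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + OCS_SHARES_PATH, data=body, method="POST"
    )
    # HTTP Basic authentication for the single account used by the switchboard.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


def extract_share_url(ocs_xml):
    """Return the link from an OCS v1 response, or raise if the call failed."""
    root = ET.fromstring(ocs_xml)
    status = root.findtext("meta/statuscode")
    if status != "100":  # 100 signals success in OCS v1
        raise RuntimeError(f"share request failed with OCS status {status}")
    return root.findtext("data/url")
```

In use, the request would be sent with urllib.request.urlopen and the response body passed to extract_share_url; the resulting link is what the switchboard would hand to a connected tool.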

3.2.5 Creating a bridge between B2DROP and the Switchboard

To promote the usage of CLARIN-related tools across communities, we are creating a bridge between B2DROP and the CLARIN Language Resource Switchboard. Figure 4 depicts a screenshot of an initial prototype of B2DROP with an LRS plugin. When users click on the ‘…’ button, a menu opens giving access to a range of actions connected to the resource. This menu has been extended with an action “Switchboard”. When users select this option for a resource, the LR switchboard application opens in a new browser tab; the switchboard is invoked with a B2DROP reference to the resource in question, mirroring the connection between the VLO and the LRS.

Figure 4. Bridge between B2DROP and the LRS.

⁵ Cross-origin resource sharing (CORS) is a mechanism that allows restricted resources on a web page to be requested from another domain outside the domain from which the first resource was served, see https://en.wikipedia.org/wiki/Cross-origin_resource_sharing.

3.3 EUDAT’s General Execution Framework

The General Execution Framework (GEF) is a Docker-based platform⁶ that aims at enabling the execution of scientific workflows close to the data. The central idea of GEF is to encapsulate a scientific processing tool in a Docker “image”, making it portable and allowing its enactment in various suitable locations, where the access to the primary scientific data is fast and efficient.

A GEF service is defined as the encapsulation of a scientific (non-interactive) tool, together with some required metadata. The metadata specifies human-oriented information (service name and description) but also operational parameters such as the expected file system locations for the input and output.

The GEF platform provides a web-based user interface that allows authorised users to build new services, and regular users to run existing services and to inspect and download the results. The web interface is completely based on an HTTP API, which is also available to the users for programmatic access to the service.

GEF is developed in the frame of the EUDAT2020 project and is currently under active development. It is expected to reach a public testing phase around the middle of March 2017 and to be used in community use cases by Summer 2017. The source code of the project is available on GitHub: https://github.com/EUDAT-GEF/GEF.

In the CLARIN context, a GEF use case for the computer-supported annotation of data with WebLicht is being devised.
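The notion of a service as a tool plus metadata can be made concrete with a small sketch. The class ServiceDescription, its field names and the check it performs are hypothetical illustrations of the kinds of metadata named above (service name, description, input and output locations); they do not reproduce GEF’s actual metadata format or API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical container for the metadata of one encapsulated tool."""

    name: str         # human-oriented service name
    description: str  # human-oriented description
    image: str        # Docker image that wraps the tool
    input_dir: str    # where the tool expects its input inside the container
    output_dir: str   # where the tool writes its results inside the container

    def problems(self):
        """Return a list of human-readable problems; empty if none are found."""
        found = []
        if not self.name.strip():
            found.append("service name is empty")
        for label, value in (("input", self.input_dir), ("output", self.output_dir)):
            # Locations inside a container are checked as absolute POSIX paths.
            if not PurePosixPath(value).is_absolute():
                found.append(f"{label} location must be an absolute path: {value!r}")
        return found
```

A platform of this kind could run such a check before accepting a new service, so that the operational parameters are known before the tool is enacted near the data.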

4 Cooperation with other e-Research infrastructures

4.1 GÉANT

The CLARIN centres, and especially the members of the Authentication and Authorisation Infrastructure (AAI) task force, have regular contacts with GÉANT representatives about the functioning of eduGAIN and in general the experiences with SAML-based authentication. Examples of practical outcomes of this interaction are:

• The Attribute Aggregator⁷ service, described in CLARIN-PLUS D2.2, providing insight into the attribute release policies of the individual Identity Providers.

• The eduGAIN opt-in dashboard⁸, showing the percentage of identity providers per country that are connected to eduGAIN. This and the previous service are very instrumental in deciding on how to connect CLARIN service providers to national identity providers.

• The eduGAIN Connectivity Check Service⁹ that has been inspired by CLARIN’s Shibboleth IdP QA tool¹⁰.

⁶ Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, see https://www.docker.com.
⁷ See https://lindat.mff.cuni.cz/services/aaggreg/
⁸ See https://technical.edugain.org/isFederatedCheck/Federations/
⁹ See https://technical.edugain.org/eccs/
¹⁰ See https://github.com/ufal/lindat-aai-shibbie

Next to such interactions (e.g., at eduGAIN town hall meetings, FIM4R and TNC conferences), CLARIN ERIC is also participating in GÉANT’s international user advisory committee¹¹.

4.2 RDA

The role of the Research Data Alliance (RDA) in bringing together players in the research data and research infrastructure field on a world-wide scale is indisputable. Therefore, CLARIN has been participating in RDA activities from the beginning. In this section, we provide an overview of RDA outputs and ongoing activities to which CLARIN has contributed and is contributing.

4.2.1 Data Foundation and Terminology (DFT)

CLARIN contributed to the DFT recommendations¹² about defining a consistent terminology for data management. It was also among the first adopters.

4.2.2 Dynamic Data Citation

The working group on Dynamic Data Citation has come up with recommendations¹³ on how to reliably cite data sets that are changing over time, using persistent identifiers and timestamps.
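The idea of citing changing data through an identifier plus a timestamp can be sketched as follows. This is one simplified reading of the approach, not the working group’s reference design: the VersionedStore class, the cite function and the "hypothetical-pid:" identifier scheme are all assumptions made for illustration, and deletions are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class Version:
    record_id: str
    value: str
    valid_from: datetime  # moment at which this version was added


class VersionedStore:
    """Keeps every version of every record instead of overwriting."""

    def __init__(self):
        self._versions = []

    def put(self, record_id, value, when):
        self._versions.append(Version(record_id, value, when))

    def snapshot(self, when):
        """Return the state of the data as it was at time `when`."""
        state = {}
        for version in sorted(self._versions, key=lambda v: v.valid_from):
            if version.valid_from <= when:
                state[version.record_id] = version.value
        return state


def cite(store, prefix, when):
    """Identify and evaluate the query 'records whose id starts with prefix' at `when`."""
    result = {k: v for k, v in store.snapshot(when).items() if k.startswith(prefix)}
    # The identifier is derived from the query and its timestamp only, so the
    # same citation always resolves to the same subset of the data.
    digest = hashlib.sha256(f"{prefix}|{when.isoformat()}".encode("utf-8")).hexdigest()[:12]
    return f"hypothetical-pid:{digest}", result
```

Because the store is append-only and the timestamp is part of the citation, re-running a cited query after later updates yields the result the citing author originally saw.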

4.2.3 RDA/WDS Certification of Digital Repositories

This interest group¹⁴ prepared the harmonization between the Data Seal of Approval and World Data Systems certification procedures for data repositories. CLARIN provided input from its experience with the DSA procedure and its own centre certification.

4.2.4 Federated Identity Management

The interest group on Federated Identity Management¹⁵ was a forum (partially overlapping with the FIM4R initiative¹⁶) for exchanging experiences and best practices. CLARIN contributed with several presentations on the extension of its Service Provider Federation.

4.2.5 Legal Interoperability

Getting a better understanding of the legal frameworks to enable (research) data exchange and interoperability is the aim of this interest group¹⁷.

4.2.6 Data Fabric

The Data Fabric interest group¹⁸ (with several subgroups) works on the topic of registered data objects and making these machine-actionable. It pertains to metadata, persistent identifiers, repositories, registries of repositories and data typing. CLARIN is actively involved, providing insights it has gained from experience with e.g. the centre registry and the LR switchboard.

¹¹ See http://www.geant.net/Users/Pages/User_Advisory_Committee.aspx
¹² See http://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF
¹³ See http://dx.doi.org/10.15497/RDA00016
¹⁴ See https://www.rd-alliance.org/groups/rdawds-certification-digital-repositories-ig.html
¹⁵ See https://www.rd-alliance.org/groups/federated-identity-management.html
¹⁶ See https://indico.cern.ch/event/605369/
¹⁷ See https://www.rd-alliance.org/groups/rdacodata-legal-interoperability-ig.html
¹⁸ See https://www.rd-alliance.org/group/data-fabric-ig.html

4.2.7 Domain Repositories

This interest group¹⁹ about domain-specific repositories and data management plans receives regular input from CLARIN. One of the goals is to work on a protocol for language data management plans.

4.2.8 Group of European Data Experts (GEDE)

This expert umbrella group²⁰ tries to identify common recommendations based on the outputs of other RDA groups and international bodies (like the ITU). Its current activities focus mostly on persistent identifiers.

4.3 DARIAH

In CLARIN-PLUS, CLARIN is in constant consultation with DARIAH, which is also a European Research Infrastructure Consortium (ERIC) for the Humanities. Although the two infrastructures focus on different areas of the Humanities, there are obvious connections, as CLARIN focuses on language-related analysis, both as a method for answering questions in the Humanities and Social Sciences and as an object of research, as in Linguistics. The cooperation extends to technical backend activities to avoid duplication of work, such as in the area of persistent identification of digital objects in repositories and core components for user identification and management. Parts of these activities are conducted by national partners, others are developments on the European level.

As both research infrastructures have a partially overlapping user base, it was decided early on to enable users of the one infrastructure to use components of the other as well. The use is especially important with regard to the reuse of research data, licensed data and editions, which may be used for language-based analysis in CLARIN as well as in other contexts of the Humanities such as spatial recreations, object descriptions etc.

As a policy, CLARIN uses the Shibboleth system, which is also used by libraries; hence scholars from institutions in countries participating in CLARIN can immediately use the services whenever their institution provides a Shibboleth-based connection. To allow DARIAH users to access CLARIN services, the internal DARIAH Identity Provider was connected to the CLARIN Service Provider Federation via eduGAIN. Likewise, there are also plans to connect the CLARIN Identity Provider to the DARIAH services.

Finding the resources already available in one of the infrastructures is also an important requirement of the user community, who wish to have a single point of entry to find adequate material for reuse. This requires that the metadata is shared and interoperable. First important steps toward the integration of metadata from both CLARIN-D and DARIAH-DE have been taken. Based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a prototype system for metadata exchange is in place. On the CLARIN side, DARIAH-DE Dublin Core based metadata has been converted to and integrated into CLARIN's Component MetaData Infrastructure (CMDI). The data has been integrated into a test instance of the Virtual Language Observatory where it is already searchable.

For the integration of CLARIN metadata into the search engine operated by DARIAH-DE, it is required to create schema mappings first. Examples of these have been developed and are currently being tested; also, a more general mapping procedure is being devised.
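The harvesting side of such an exchange can be sketched with the standard OAI-PMH request parameters and XML namespaces. The endpoint URL used in the usage note and the function names list_records_url and parse_page are hypothetical; the sketch covers only the retrieval of Dublin Core records and leaves out the conversion to CMDI, which is specific to the prototype described above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for unqualified Dublin Core."""
    if resumption_token:
        # A resumption token is an exclusive argument: no metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records, next_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "languages": [e.text for e in record.iterfind(".//dc:language", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty or missing token ends the harvest
```

A harvester would fetch list_records_url("https://repository.example.org/oai"), pass the response to parse_page, and repeat with the returned token until it is None; each record dictionary would then be handed to the schema mapping step.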

¹⁹ See https://www.rd-alliance.org/groups/domain-repositories-interest-group.html
²⁰ See https://www.rd-alliance.org/groups/gede-group-european-data-experts-rda

DARIAH and CLARIN also collaborate on building a sustainable registry of courses and other educational material related to the use of digital language resources and tools in HSS research. For more details, see the upcoming CLARIN-PLUS deliverable D5.2 (Operational course and education material registry).

4.4 Europeana

The Europeana Digital Service Infrastructure (DSI) is funded under the Connecting Europe Facility (CEF) with the goal to develop Europeana into a widely-recognised platform of services and resources, not only for metadata references, but also for access to cultural content, tools and technologies, projects and other services. During the time span of CLARIN-PLUS, CLARIN participates in WP2 of the second phase of DSI (DSI-2)²¹. This work package is concerned with the “design of end-user products & services”. Task 2.6.3, “Data sharing with third parties”, assigned to CLARIN, closely matches the CLARIN-PLUS WP4 objectives of strengthening the ties with other research infrastructures and re-using other infrastructural services within CLARIN.

The work plan²² for task 2.6.3 of Europeana DSI-2 describes the following actions towards the goal of integrating Europeana resources into the CLARIN infrastructure and increasing the visibility and ease of access of Europeana and its data within the CLARIN community and vice versa:

1. Identification of data sets relevant to CLARIN’s community out of the full set available from Europeana’s OAI harvester.

2. Implementation of a conversion from Europeana’s EDM format to CLARIN’s CMDI.

3. Inclusion of obtained metadata in the Virtual Language Observatory (VLO).

4. Selection of tools from CLARIN’s infrastructure to be included in a processing workflow based on Europeana resources.

5. Adaptation of the CLARIN infrastructure for increased load (to accommodate a significant increase in the number of harvested and indexed metadata records) where necessary.

6. Inclusion of Europeana APIs of potential interest to CLARIN’s target audience in CLARIN’s “language resource and tool inventory”.

As of the completion of the present deliverable, the following concrete results have been achieved:

1. A CMDI profile for EDM has been created, see https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ap_1475136016208&registrySpace=public

2. An EDM-CMDI conversion stylesheet has been implemented, see https://github.com/clarin-eric/metadata-conversion

3. A selection of data sets to harvest has been made, building on work carried out by CLARIN in the first phase of DSI in 2015 (DSI-1).

4. Test OAI harvests and VLO imports have been carried out with a smaller selection of sets containing a total of about 3 million records.

Work on preparing the infrastructure components (OAI harvester, VLO) for the increased load is ongoing. In the months following the completion of this deliverable, possibilities for processing Europeana resources by means of the currently available pipeline for discovering resources (using the VLO), finding matching tools (using the LRS) and carrying out linguistic analyses (using applicable tools provided by CLARIN centres) will be investigated. A number of scenarios involving publicly available resources and tools will be prepared for demonstration purposes.

Additionally, a selection of Europeana APIs and services is planned to be described and registered in the Language Resource Inventory²³. Finally, one or more online publications (weblog posts) on Europeana’s and CLARIN’s web portals are planned towards the end of DSI-2 (third quarter of 2017) to provide mutual exposure and present the practice and potential of this integration.

Parallel to DSI-2, Europeana currently participates in an EUDAT pilot, as a part of which it investigates, with support from CLARIN, the possibility of integrating selected CLARIN resources into the Europeana ecosystem²⁴.

²¹ See https://www.clarin.eu/group-page/europeana-dsi-2
²² See https://www.clarin.eu/file/3932

4.5 LAPPS Grid

Since December 2015, there has been ongoing coordination with members of the LAPPS²⁵ Consortium (Brandeis University and Vassar College) about the technical coordination of workflows between CLARIN (WebLicht) and LAPPS. A joint funding proposal involving LAPPS partners Brandeis University and Vassar College, and CLARIN Centres at Charles University Prague and the University of Tübingen, was submitted to the Mellon Foundation in April 2016 and approved for funding in September 2016.

A preliminary meeting of the project PIs took place at the COLING conference in December 2016. Work on all work packages of the project started in January 2017, including implementation of software to convert between the internal data exchange formats used, mapping of linguistic terms to ensure semantic compatibility between the projects, and investigation of authentication and authorization issues. The kick-off meeting will take place in Prague in March 2017.

²³ See https://www.clarin.eu/content/language-resource-inventory
²⁴ The “EUROPEANA Data Pilot Meeting” was on the agenda of the EUDAT user forum in Helsinki in January 2017 (https://www.eudat.eu/events/user-meetings/eudat-helsinki-meeting-23-27-january-2017-helsinki-finland).
²⁵ See http://www.lappsgrid.org/

5 Conclusion

The CLARIN-PLUS project partners closely follow the progress of a number of neighbouring infrastructure projects, namely EUDAT, GÉANT, RDA, DARIAH, EUROPEANA, and the LAPPS Grid. In this deliverable, we have described the contributions of the CLARIN-PLUS partners to maximise cross-fertilization between projects. We have been working together with regard to identity management (authentication and authorization), see B2ACCESS, GÉANT and DARIAH; the implementation of data management policies, see B2SAFE; secure data exchange, see B2DROP; and the execution of scientific workflows, see EUDAT’s GEF.

The CLARIN community makes its resources and tools available to other communities such as EUROPEANA and DARIAH, solving technical issues such as metadata conversion from Europeana’s EDM format to CMDI, or tool inclusion in Europeana-based processing workflows. Also, B2DROP users will soon be able to connect the resources of their cloud drive with the CLARIN Language Resource Switchboard, thus giving these users access to tools that can process their resources. As the deliverable shows, the CLARIN-PLUS partners are in productive contact with the main e-Research infrastructures, including RDA and the LAPPS Grid.

As a result, CLARIN-PLUS is well prepared to meet the challenge of realising an e-Research infrastructure where researchers can easily and smoothly search for, manage, and process research data across institutional, national, and technological boundaries.

References

[1] C. Zinn. LR Switchboard (Software). Deliverable of the CLARIN-PLUS project, CLARINPLUS-D2.5. Retrieved from: https://office.clarin.eu/v/CE-2016-0881-CLARINPLUS-D2_5.pdf

[2] M. van de Sanden, C. Staiger, C. Cacciari, R. Mucci, C. J. Hakansson, A. Hasan, S. Coutin, H. Thiemann, B. von St. Vieth, J. Jensen (2015). D5.3: Final Report on EUDAT Services. Retrieved from: http://hdl.handle.net/11304/2433d23a-6079-49a6-9010-ca534f6e348d

[3] J. Mišutka. Robust SPF 1: workflow and monitoring. Deliverable of the CLARIN-PLUS project, CLARINPLUS-D2.2. Retrieved from: https://office.clarin.eu/v/CE-2016-0809-CLARINPLUS-D2_2.pdf