Bringing Europeana and CLARIN together:
Dissemination and exploitation of cultural heritage data
in a research infrastructure
Twan Goosen1 (CLARIN ERIC), Nuno Freire2, Clemens Neudecker3, Maria Eskevich1
1 CLARIN ERIC; 2 Europeana / INESC-ID; 3 Berlin State Library / Europeana Newspapers
Digital Infrastructures for Research (DI4R) 2017
Brussels, BE
30 November 2017
Europeana in six bullets
• Europeana is the European digital platform for cultural heritage that
• seeks to enable users to search and access knowledge in all the languages of Europe, either directly via its web portals, or indirectly via third-party applications leveraging its data service
• Europeana enables people to explore the digital resources of Europe's galleries, museums, libraries, archives and audiovisual collections
• working with partners and allies to develop frameworks, standards, strategy and policy relevant to digital cultural heritage, and to raise funds
• providing digital expertise and platforms for bringing cultural heritage to wider audiences
• championing the use of digitised cultural heritage in education, research and the creative industries through partnerships and international engagement campaigns
CLARIN in seven bullets
• CLARIN is the Common Language Resources and Technology Infrastructure
• ESFRI ERIC status since 2012, Landmark since 2016
• that provides easy and sustainable access for scholars in the humanities and social sciences and beyond
• to digital language data (in written, spoken, video or multimodal form)
• and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located
• through a single sign-on online environment
• and that serves as an ecosystem for knowledge sharing
CLARIN ERIC in members and centres
A consortium of:
• 19 members: AT, BG, CZ, DE, DK, DLU, EE, FI, GR, HU, IT, LT, LV, NL, NO, PL, PT, SE, SI
• 2 observers: FR, UK
• >40 centres
CLARIN & Europeana partnership in the context of DSI
Digital Service Infrastructure (DSI): Creation of a complete, cohesive and integrated Digital Service Infrastructure
• DSI (01.2015 – 06.2016):
– European Research Distribution Plan
– Assessment of relevant datasets available from The European Library (TEL)
• DSI-2 (07.2016 – 08.2017):
– Improvement of data quality and implementation of quality frameworks to improve metadata quality
– Integration of Europeana data into CLARIN infrastructure
• DSI-3 (09.2017 – 08.2018):
– Fostering content supply by optimising Europeana data and aggregation infrastructure
– Improving (meta-)data and content quality
– Fostering reuse of digital cultural heritage resources by improving content distribution mechanisms
– Maintaining an international interoperable licensing framework
Steps towards CLARIN & Europeana interoperability
1) Incorporate Europeana metadata in the VLO
2) Open up full-text resources, such as those from Europeana Newspapers, through CLARIN's Federated Content Search (FCS) mechanism
3) Exploit CLARIN's communication channels to increase awareness of Europeana within the community
4) Measure the impact of the dissemination of Europeana data
Metadata: access to cultural heritage
Europeana:
• Aggregation and exploitation of (meta)data about digitised objects from very different contexts.
• Europeana Data Model (EDM) as its model for interoperability of metadata, in line with the vision of linked open vocabularies
CLARIN:
• Aggregation of metadata from resource providers (CLARIN centres and selected "external" parties)
• Virtual Language Observatory (VLO) provides a uniform experience and consistent workflow.
• Language Resource Switchboard (LRS) allows researchers to invoke tools with the selected resources directly from its user interface.
Challenge: CLARIN and Europeana do not share a common metadata model
The CLARIN data architecture: repositories
[Figure: repository at a CLARIN centre, containing Language Data, Metadata and Language Tools, linked by the relation "describes". Language data: single text or recording, corpus, lexicon, wordnet, grammar, … Language tools: web application, web service, web service pipeline, stand-alone application, …]
The CLARIN data architecture: harvesting
[Figure: several repositories, each with Language Data, Metadata and Language Tools; their metadata is copied ("copy") into a central store of Harvested Metadata.]
The CLARIN data architecture: processing
The CLARIN data architecture: content search
[Figure: (Federated) Content Search across several repositories, each with Language Data, Metadata and Language Tools: (1) enter query, (2) perform local search, (3) retrieve results, (4) show aggregated results.]
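The four steps of the content search diagram can be sketched roughly as follows. This is only an illustration of the fan-out and aggregation idea, not CLARIN's FCS implementation: the centres are hypothetical in-memory lists, where a real federated search would query remote endpoints, and all names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for centre repositories; a real
# federated search would query remote endpoints over the network instead.
CENTRES = {
    "centre-a": ["the old newspaper archive", "a spoken corpus"],
    "centre-b": ["newspaper scans from 1900", "a wordnet"],
}


def local_search(centre, query):
    """Step 2: search one centre's own data and return its hits."""
    return [(centre, text) for text in CENTRES[centre] if query in text]


def federated_search(query):
    """Steps 1 to 4: fan the query out, then merge the per-centre results."""
    with ThreadPoolExecutor() as pool:
        # map() keeps the input order, so results stay grouped by centre.
        per_centre = list(pool.map(lambda c: local_search(c, query), sorted(CENTRES)))
    return [hit for hits in per_centre for hit in hits]


results = federated_search("newspaper")
```

Each centre only searches its own holdings; the aggregator never needs a copy of the full text, which is the point of the federated design.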
The CLARIN data architecture: workflows
[Figure: Web Service Pipelines across several repositories, each with Language Data, Metadata and Language Tools: (1) select input data, (2) construct pipeline, (3) execute, (4) use/analyse output data.]
Interoperability is key
• to the exchange of metadata
• to the exchange formats for the output of analytic tools
• to the options for supporting comparative research
CLARIN & Europeana interoperability highlights
• CLARIN's ingestion pipeline, based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), was extended to retrieve a set of selected collections from Europeana and apply the conversion in the process.
• Several infrastructure components had to be adapted to accommodate the significant increase in the amount of data to be handled and stored.
– Current status:
• 775 Europeana datasets (e.g. Newspapers) now found in the VLO
• 10K are technically suitable for processing with the LRS
– Goal:
• More records in the foreseeable future
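As a rough illustration of the harvesting side, the sketch below builds OAI-PMH ListRecords request URLs and reads the resumption token that links one page of results to the next. The base URL, the `edm` metadata prefix and the `newspapers` set name are assumptions made for the example, and the sample response is invented; no network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, set_spec=None, token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument: follow-up
    requests carry only the verb and the token.
    """
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# Invented sample response, trimmed to the part the sketch needs.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint, prefix and set name.
first = list_records_url("https://example.org/oai", "edm", "newspapers")
follow_up = list_records_url("https://example.org/oai", token=next_token(SAMPLE))
```

A harvester would repeat the follow-up request until `next_token` returns `None`, which marks the last page of a collection.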
Metadata retrieval and conversion: OAI-PMH protocol
• Europeana:
– EDM-structured Europeana metadata as RDF/XML documents
• CLARIN:
– Harvester performs the conversion by applying an XSLT stylesheet that converts the RDF/XML metadata documents to Component Metadata (CMD)
– Creation of a CMD profile for EDM in the CMDI Component Registry
– Implementation of an XSLT stylesheet that produces instances of the corresponding schema on the basis of the EDM records.
– Properties are defined as CMD elements in the order that they appear in the EDM specification, while object order is based on relevance.
– Concept links are assigned to most components and elements.
– Implemented conversion stylesheet: the header information and resource proxies (entities representing external documents) in the resulting record are produced on the basis of a list of static XPaths in the original document.
– The record's payload is produced mostly by means of a straightforward crosswalk where the properties in the document are mapped to CMD components or elements of an equivalent name.
• Test harvest of 11 selected metadata sets:
– Total of 3.2 million successfully retrieved and converted, schema-valid records
– Full harvest and import of a sample of this size takes roughly 48 hours
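The crosswalk idea (fixed paths for the header, same-name mapping for the payload) can be illustrated with a small sketch. The actual conversion is an XSLT stylesheet; this example swaps in Python's ElementTree purely for illustration. The sample record is invented, and the output element names (`Record`, `SelfLink`) are hypothetical placeholders, not the real CMD profile for EDM.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EDM = "http://www.europeana.eu/schemas/edm/"

# Invented, heavily simplified EDM-style record.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:edm="{EDM}">
  <edm:ProvidedCHO rdf:about="urn:example:item-1">
    <dc:title>Example newspaper issue</dc:title>
    <dc:language>de</dc:language>
  </edm:ProvidedCHO>
</rdf:RDF>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def crosswalk(rdf_xml):
    """Copy each property of the described object to a same-named element."""
    source = ET.fromstring(rdf_xml)
    cho = source.find(f"{{{EDM}}}ProvidedCHO")
    record = ET.Element("Record")  # hypothetical wrapper, not the real CMD schema
    # Fixed path, analogous to the static XPaths used for header information.
    ET.SubElement(record, "SelfLink").text = cho.get(f"{{{RDF}}}about")
    payload = ET.SubElement(record, "ProvidedCHO")
    for prop in cho:
        # Same-name mapping: dc:title becomes <title>, and so on.
        ET.SubElement(payload, local_name(prop.tag)).text = prop.text
    return record


converted = crosswalk(SAMPLE)
```

Because the mapping is by name and preserves document order, the target profile only has to declare elements with the same names as the source properties, which is what keeps the crosswalk straightforward.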
Processing pipeline issues
• General lack of technical information available in the provided EDM (e.g. the media type for linked resources)
• Direct links to machine-processable resources are commonly missing
• Limited functionality provided by the tools that are connected to the LRS (e.g. language variability, resource types, accessibility)
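The three issues above amount to a chain of checks a record has to pass before a tool can be invoked on it. The sketch below is a minimal illustration under stated assumptions: the record fields, the sample records and the set of accepted media types are all hypothetical and do not reflect the actual EDM or LRS data structures.

```python
# Hypothetical, simplified records; the field names are illustrative only.
RECORDS = [
    {"id": "a", "resource_url": "https://example.org/a.txt", "media_type": "text/plain"},
    {"id": "b", "resource_url": "https://example.org/b", "media_type": None},
    {"id": "c", "resource_url": None, "media_type": None},
]

# Assumed set of media types that some connected tool could accept.
PROCESSABLE_TYPES = {"text/plain", "application/xml"}


def processing_issue(record):
    """Return the first obstacle to processing a record, or None if there is none."""
    if not record["resource_url"]:
        return "no direct link"
    if not record["media_type"]:
        return "media type unknown"
    if record["media_type"] not in PROCESSABLE_TYPES:
        return "no tool for this type"
    return None


issues = {record["id"]: processing_issue(record) for record in RECORDS}
```

Records for which the function returns `None` are the ones that would count as technically suitable for processing in this simplified view.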
Get in touch
https://www.europeana.eu
https://pro.europeana.eu
https://pro.europeana.eu/project/europeana-dsi-3