

Post on 27-Mar-2015


TRANSCRIPT

  • Slide 1

  • Slide 2

Geert Van Grootel
Flemish government, Economy, Science and Innovation dept., Knowledge Management Division
euroCRIS Treasurer, CERIF taskgroup member
Contact: Koning Albert II-laan, 35 bus 10, 1030 Brussel, [email protected]

  • Slide 3

Overview
    - The FRIS vision & strategy
    - FRIS principles
    - Harvesting OAI-PMH repositories
    - Metadata format: CERIF
    - Metadata format: MODS
    - Techniques
    - Results
    - Conclusions

  • Slide 4

The research information space
Diagram labels: researchers, research organisation, investment opportunities, projects, publications, patents, equipment, government, financers, researchers, editors, libraries, data centres, research institutions, industry, products, research data, facilities

  • Slide 5

  • Slide 6

Vision and strategic goals
A simple, transparent and open research information space that contributes to the Flemish knowledge-based economy (and strengthens the international competitive position of Flanders)
    - More efficient and effective policy (monitoring)
    - Innovation value chain works faster
    - Better customer services (e-government)
    - Improved and faster valorisation
    - Increased networking capacities
    - Finding expertise
    - Improved information flow
    - Maximal reuse of data
    - Simple and uniform processes
    - Enhancing strategic intelligence
    - Better information: complete, correct, up to date
    - Higher responsiveness of the policy domain

  • Slide 7

Central principle
Generate the required information directly from the data within processes at and between the stakeholders:
    - Faster and reduced workload
    - Data quality guaranteed
    - Simultaneity: real-time data
Data, Information, Knowledge at the lowest process cost

  • Slide 8

Flanders Research Information Space
    - Globally (includes all stakeholders)
    - Network of federated repositories
    - For & by all stakeholders: researchers, educators & students, industry, management, public
    - Open Access via open standards (CERIF)
    - Semantically rich environment (SBVR)
    - Maximal formal information interconnection
Generate the required information directly from the data within processes at and between the stakeholders:
    - Faster and reduced workload
    - Data quality guaranteed
    - Simultaneity: real-time data

  • Slide 9

EWI Research Portal

  • Slide 10

Project: Publication metadata in Research Portal
    - According to FRIS principles
    - Repository content can be linked with Research Portal data
        - Preferably by identifiers: Person, OrgUnit
        - If feasible, by author name
    - Integration into Research Portal and exposing via portal interfaces
    - Two scenarios
        - In-repository conversion, harvesting, loading
        - Harvesting in native format, conversion, loading

  • Slide 11

Scenario 1: details
    - University DSpace repository
    - Metadata format: qualified DC
    - Not yet publicly accessible: data cleansing process ongoing
        - Based on large import from WoS
        - Merged with university data for Person and OrgUnit
    - 10 person days (@mire)
    - Product: CERIF2006 publication metadata available at OAI-PMH interface

  • Slide 12

Scenario 2: details
    - 2 running university repositories
    - DSpace: DC, qDC and MODS metadata formats available
    - Metadata format of choice: MODS
    - Different levels of integration
        - Workflow with controlled identifier insertion: correct author and OrgUnit identifiers in relations (for internal authors and OrgUnits)
        - Simple stand-alone interface: no identifiers available
    - 10 person days allocated (EWI)
    - Product: CERIF2006 XML file

  • Slide 13

Scenario 1: execution
    - Conversion of qDC into CERIF (@mire)
        - Steep learning curve; CERIF familiarization cost time
        - Several iterations needed to produce a harvest in a digestible format, to get the CERIF format right (at least for the quite flexible portal harvesting module)
        - Data error corrections
            - Date as text field: 2001, 2001-01, 01-2001, jan-2001, ?2001 or maybe 2002
            - Identifiers sourced from different internal databases
            - Ambiguous references due to personnel status
            - Authors with >1 identifier
    - Harvesting: possible
    - Loading into Research Portal database: failed due to persistent relation constraint violations on Person and OrgUnit
    - Project aborted after evaluation

  • Slide 14

Scenario 2: execution
    - Elimination of the identifier-less repository
        - Name-to-identifier mapping was considered too time-consuming for the resources of this scenario
        - Schema mapping, though, was successful
    - Harvesting of repository content in a single XML file
    - Create the following patterns in BS Studio
        - Generic Publication pattern
        - MODS pattern
        - CERIF2006 pattern
    - Test patterns against their sources by self commitment
    - Create mappings: mods2generic, generic2CERIF2006
    - Generate CERIF2006 XML as output

  • Slide 15

Scenario 2: execution
    - For several small test datasets this scenario was successful
    - Loading the full repository content proved difficult
        - Identifier mismatches
        - Data type support in source
            - Date fields
            - Numeric fields, e.g. page: 103 vs p.103 vs pag. 103

  • Slide 16

Conclusion: scenarios
    - End goal is too ambitious
    - The less desirable is scenario 1
        - Scalability problem
        - Ad hoc per repository and repository software
        - Learning curve
    - Scenario 2 is the more systematic approach
        - Requires tools & semantic mapping knowledge
        - Modification only in details on MODS patterns
        - Other patterns and the commitments reusable

  • Slide 17

  • Slide 18

Conclusions: metadata schemes
    - DC and qDC have too little formality and normalisation
        - Considerably hampers implementation in m2m
    - MODS is more robust but still needs more formality
    - CERIF has a steep learning curve and is less human friendly
        - Need for support for date-time fractions
        - Year of Date-Time for publication year

  • Slide 19

Conclusions: Workflow & process control
    - Data inconsistencies need an architectural or at least a system approach
        - Integration of information sources with workflow control
        - Strong integration in the researcher's workbench
    - Responsibilities
        - The management of the production is a core responsibility of each organisation. Why is research the exception?
        - Is it wise to buy back one's own production data from a third party?

  • Slide 20

Modelling needs

  • Slide 21

Methodology
    - Purpose
    - Definition
    - Scoping
    - Knowledge extraction
    - Verbalization
    - Build logical model
    - Resolve to CERIF
    - Conceptual level
    - Logical & Physical level

  • Slide 22

BSM: Business Semantics Management

  • Slide 23
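
Slides 13 and 15 name free-text date and page fields as a recurring obstacle when loading repository metadata. As a rough illustration only, and not the conversion used in either scenario, the Python sketch below normalizes the example values quoted on those slides. The function names, the English three-letter month table, and the choice to return `None` for ambiguous input are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumption: month abbreviations are English three-letter forms, as in "jan-2001".
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def _year_month(year: str, month: int) -> Optional[str]:
    """Return 'YYYY-MM' when the month is valid, otherwise None."""
    return f"{year}-{month:02d}" if 1 <= month <= 12 else None


def normalize_date(raw: str) -> Optional[str]:
    """Hypothetical helper: map a free-text date to 'YYYY' or 'YYYY-MM'.

    Returns None for anything it cannot interpret unambiguously, so such
    records can be routed to manual data cleansing instead of being guessed.
    """
    text = raw.strip().lower()
    if re.fullmatch(r"\d{4}", text):                      # 2001
        return text
    m = re.fullmatch(r"(\d{4})-(\d{1,2})", text)          # 2001-01
    if m:
        return _year_month(m.group(1), int(m.group(2)))
    m = re.fullmatch(r"(\d{1,2})-(\d{4})", text)          # 01-2001
    if m:
        return _year_month(m.group(2), int(m.group(1)))
    m = re.fullmatch(r"([a-z]{3})[a-z]*-(\d{4})", text)   # jan-2001
    if m and m.group(1) in MONTHS:
        return _year_month(m.group(2), MONTHS[m.group(1)])
    return None                                           # e.g. "?2001"


def normalize_page(raw: str) -> Optional[int]:
    """Hypothetical helper: extract a single page number from '103', 'p.103', 'pag. 103'."""
    m = re.fullmatch(r"(?:p|pag|page)?\.?\s*(\d+)", raw.strip().lower())
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    for value in ["2001", "2001-01", "01-2001", "jan-2001", "?2001 or maybe 2002"]:
        print(repr(value), "->", normalize_date(value))
    for value in ["103", "p.103", "pag. 103"]:
        print(repr(value), "->", normalize_page(value))
```

Returning `None` rather than a best guess keeps uncertain values visible, which fits the slides' point that data inconsistencies need a systematic approach rather than ad hoc fixes.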