![Page 1: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/1.jpg)
Emerging domain agnostic functionalities
on the handle-centered networks
Kei Kurakawa
National Institute of Informatics
Takayuki Sekiya
The University of Tokyo
Yasumasa Baba
The Institute of Statistical Mathematics
International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines
Joint Support-Center for Data science Research (DS), ROIS
NIPR / NINJAL, Tachikawa, Tokyo, Japan, 5-7 December 2017.
![Page 2: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/2.jpg)
Overview
• Research data sharing
• Domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data
• Kernel Information
• Handle-centered networks on Kernel Information metadata layer
• Future directions
• Summary
![Page 3: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/3.jpg)
The research data sharing mindset, from Open Access, Open Data, and Open Science
• Disciplinary historical events
– Meteorology and geoscience
• The first International Polar Year (IPY) (1882)
• The International Geophysical Year (IGY) (1957)
– Biology
• “Bermuda Rules” (1996)
• Interdisciplinary events
– Budapest Open Access Initiative (2002)
– Berlin Declaration (2003)
– G8 Open Data Charter (2013)
• The movement arrived at the RDA (Research Data Alliance) slogan “research data sharing without barriers” across all disciplines, with the aim of innovating and developing societal and technological specifications for scientific data infrastructures.
![Page 4: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/4.jpg)
Current procedure to aggregate and process the scientific data
• The procedure, which may be peculiar to each discipline, is a matter of craftsmanship and a very time-consuming task.
• The data consumer needs to understand the semantics of the data structure in domain-dependent schemes and typically has to choose a community-standard tool in a specific computational environment to process the data.
• It is difficult for those outside the field of expertise to do the same.
[Figure: the current workflow of a data consumer working with data on the Web]
1. Fetch and crawl data
2. Manually process the data
Manually check for:
– data format
– data structure
– data version
– data provenance
– data quality
![Page 5: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/5.jpg)
A community objective
• Data on the web
– Very large collections of scientific data, which are distributed on the web
– PID centric information model
• Two major processes in the scientific data use
– Data discovery
– Automatic data processing
• To develop a domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data
![Page 6: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/6.jpg)
PID centric information model and services
• Information elements
– Handle : PID
– Metadata
– Data
– Data type
• PID resolve service
– Handle server
• Metadata service
– Metadata repository
• Data services
– Data repository
– Data type registry
Working group outputs of the Data Fabric, Data Type Registries, PID Information Types, and PID Kernel Information
![Page 7: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/7.jpg)
Kernel Information : Metadata
Kernel Information represents a connection between Data and Data type.
[Figure, in Web Space: Data, Data type, and KI:Metadata objects, each identified by a Handle:PID. Each KI:Metadata record links to its Data via digitalObjectLocation and to the Data type for the “Data” via digitalObjectType.]
![Page 8: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/8.jpg)
Kernel Information : Metadata
KI:Metadata itself should also be data-typed.
[Figure, in Web Space: Data, Data type, and KI:Metadata objects, each identified by a Handle:PID. In addition to the digitalObjectLocation and digitalObjectType links, each KI:Metadata record refers via RDAKIProfileType to the Data type for the “KI:Metadata”.]
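To make the three links on these slides concrete, a Kernel Information record can be pictured as a small typed structure. This is only a sketch under stated assumptions: the field names digitalObjectLocation, digitalObjectType, and RDAKIProfileType come from the slides, while the class `KernelInformation`, the helper `missing_fields`, and every handle and URL value are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KernelInformation:
    """One KI:Metadata record, itself identified by a Handle (PID)."""
    pid: str                    # Handle of this KI record (hypothetical value)
    digitalObjectLocation: str  # where the Data can be retrieved
    digitalObjectType: str      # Handle of the Data type for the Data
    RDAKIProfileType: str       # Handle of the Data type for this KI record


REQUIRED = ("digitalObjectLocation", "digitalObjectType", "RDAKIProfileType")


def missing_fields(record: dict) -> List[str]:
    """Return the required KI fields that are absent or empty in a raw record."""
    return [name for name in REQUIRED if not record.get(name)]


# Hypothetical raw record, as it might be stored with a Handle.
raw = {
    "digitalObjectLocation": "https://repo.example.org/data/1",
    "digitalObjectType": "example/type-csv",
    "RDAKIProfileType": "example/ki-profile",
}

if not missing_fields(raw):
    ki = KernelInformation(pid="example/ki-1", **raw)
    print(ki.digitalObjectType)
```

Checking for missing fields before building the record mirrors the idea that KI:Metadata is itself data-typed: a client can reject a record that does not match its profile before touching the data.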
![Page 9: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/9.jpg)
Kernel Information : Metadata
KI represents structural relationships.
[Figure, in Web Space: Data, Data type, and KI:Metadata objects, each identified by a Handle:PID and connected by digitalObjectLocation, digitalObjectType, and RDAKIProfileType links, plus a wasDerivedFrom link representing a structural relationship.]
![Page 10: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/10.jpg)
Kernel Information structural data relationships defined
• wasDerivedFrom
• specializationOf
• revisionOf
• primarySourceOf
• quotationOf
• alternateOf
• hadMember
• externalW3CPROVDoc
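The relationships listed above can be read as labeled edges between PIDs. The following is a minimal sketch, assuming each KI record is a plain dictionary that holds at most one target per relationship and that targets are opaque strings; the record contents and the helpers `relation_edges` and `derived_from` are hypothetical.

```python
# Relationship names as listed on the slide.
RELATIONS = (
    "wasDerivedFrom", "specializationOf", "revisionOf", "primarySourceOf",
    "quotationOf", "alternateOf", "hadMember", "externalW3CPROVDoc",
)

# Hypothetical KI records keyed by PID.
records = {
    "example/raw": {},
    "example/cleaned": {"wasDerivedFrom": "example/raw"},
    "example/summary": {
        "wasDerivedFrom": "example/cleaned",
        "revisionOf": "example/summary-draft",
    },
}


def relation_edges(records):
    """Turn KI records into (source PID, relation, target) triples."""
    return [
        (pid, rel, rec[rel])
        for pid, rec in records.items()
        for rel in RELATIONS
        if rel in rec
    ]


def derived_from(target, edges):
    """PIDs whose KI record states wasDerivedFrom `target`."""
    return [s for s, rel, t in edges if rel == "wasDerivedFrom" and t == target]


edges = relation_edges(records)
print(derived_from("example/raw", edges))
```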
![Page 11: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/11.jpg)
PID profile metadata 17.03.06 from the WG
![Page 12: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/12.jpg)
PID centric information sequence
[Sequence diagram. Participants: Client, Handle server, Data type registry, and Data repository / Metadata repository or Landing page. Objects shown: Handle:PID, KI:Metadata, Metadata, Data type, Data.]
1. Query with Handle (Client to Handle server)
2. Handle information (e.g., PID to Profile, URL to target ROR, Data field for PID Kernel Information)
3. Query with Handle for DTR profile (Client to Data type registry)
4. DTR profile definition
5. HTTP GET Resources (data, metadata, landing page)
6. Resources
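The six steps of this sequence can be sketched as one client function. This is a minimal sketch that uses in-memory stand-ins instead of real network calls: the dictionaries `HANDLE_SERVER`, `TYPE_REGISTRY`, and `REPOSITORY`, the stub `fake_http_get`, the keys `profile_pid` and `url`, and all handle values are hypothetical and do not reflect any actual Handle or DTR API.

```python
# In-memory stand-ins for the services in the diagram (all values hypothetical).
HANDLE_SERVER = {
    "example/ki-1": {
        "profile_pid": "example/profile-1",
        "url": "https://repo.example.org/data/1",
    },
}
TYPE_REGISTRY = {
    "example/profile-1": {
        "name": "ExampleKIProfile",
        "fields": ["digitalObjectLocation", "digitalObjectType"],
    },
}
REPOSITORY = {"https://repo.example.org/data/1": b"station,temp\nA,1.5\n"}


def fake_http_get(url):
    """Stub for step 5; a real client would issue an HTTP GET here."""
    return REPOSITORY[url]


def resolve_and_fetch(handle, handle_server, type_registry, http_get):
    """Walk the sequence: resolve the Handle, look up the profile, fetch."""
    info = handle_server[handle]                  # steps 1-2: Handle information
    profile = type_registry[info["profile_pid"]]  # steps 3-4: DTR profile definition
    resource = http_get(info["url"])              # steps 5-6: resources
    return profile, resource


profile, resource = resolve_and_fetch(
    "example/ki-1", HANDLE_SERVER, TYPE_REGISTRY, fake_http_get
)
print(profile["name"], len(resource))
```

Passing the services in as arguments keeps the client independent of any one deployment, so the stand-ins could be swapped for real lookups without changing the control flow.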
![Page 13: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/13.jpg)
Data processing paradigm shift
[Figure: two workflows side by side]
Current manual method (Data on the Web, Data consumer)
1. Fetch and crawl data
2. Manually process the data
Manually check for:
– data format
– data structure
– data version
– data provenance
– data quality
Future automatic method (Data on PID centric information architecture, Client program)
1. Fetch the list of PIDs
2. Query/response for PID KI profile (Handle service)
3. Query/response for data type profile (Data type registry)
4. Fetch the data
5. Automatically process the data
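The last step of the automatic method, processing data without manual inspection, amounts to dispatching on the data type that Kernel Information points to. The sketch below illustrates that idea only: the registry `PROCESSORS`, the decorator `processor`, the type handles `example/type-csv` and `example/type-json`, and the sample payloads are hypothetical.

```python
import csv
import io
import json

# Maps a data type PID to the function that knows how to process that type.
PROCESSORS = {}


def processor(type_pid):
    """Decorator that registers a processing function for one data type PID."""
    def register(func):
        PROCESSORS[type_pid] = func
        return func
    return register


@processor("example/type-csv")
def process_csv(payload):
    # Parse CSV text into a list of row dictionaries.
    return list(csv.DictReader(io.StringIO(payload)))


@processor("example/type-json")
def process_json(payload):
    # Parse JSON text into Python objects.
    return json.loads(payload)


def process_automatically(type_pid, payload):
    """Select the processor from the data type alone, with no manual checks."""
    try:
        handler = PROCESSORS[type_pid]
    except KeyError:
        raise LookupError("no processor registered for data type " + type_pid) from None
    return handler(payload)


rows = process_automatically("example/type-csv", "station,temp\nA,1.5\n")
print(rows[0]["temp"])
```

An unknown type raises an error rather than guessing a format, which keeps the client from silently misreading data it has no type definition for.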
![Page 14: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/14.jpg)
Metadata splitting
[Figure: Metadata for data is split by generality level (domain independent, domain dependent) and by separation of concerns (automatic data processing, data discovery)]
• Domain independent: Kernel Information, for automatic data processing
• Domain dependent: Metadata for data discovery, for data discovery
![Page 15: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/15.jpg)
Handle-centered networks on Kernel Information metadata layer
[Figure: attribute augmented graph spanning three layers]
• Data layer
• Data type layer
• Kernel Information metadata layer
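One way to picture the attribute augmented graph is as nodes that carry a layer attribute and labeled edges between Handles. The sketch below is a simplification under stated assumptions: the node and edge values are hypothetical, digitalObjectLocation is treated as pointing at a data node, and the helper `data_by_type` is just one example of a question that can be answered on the Kernel Information layer alone.

```python
from collections import Counter, defaultdict

# Nodes keyed by Handle, each with a "layer" attribute (hypothetical values).
nodes = {
    "example/data-1": {"layer": "data"},
    "example/data-2": {"layer": "data"},
    "example/type-csv": {"layer": "data type"},
    "example/ki-1": {"layer": "kernel information"},
    "example/ki-2": {"layer": "kernel information"},
}

# Labeled edges (source, label, target) originating from KI records.
edges = [
    ("example/ki-1", "digitalObjectLocation", "example/data-1"),
    ("example/ki-1", "digitalObjectType", "example/type-csv"),
    ("example/ki-2", "digitalObjectLocation", "example/data-2"),
    ("example/ki-2", "digitalObjectType", "example/type-csv"),
]


def data_by_type(edges):
    """Group data nodes by data type using only the KI records' edges."""
    location, dtype = {}, {}
    for source, label, target in edges:
        if label == "digitalObjectLocation":
            location[source] = target
        elif label == "digitalObjectType":
            dtype[source] = target
    groups = defaultdict(list)
    for ki, data in location.items():
        if ki in dtype:
            groups[dtype[ki]].append(data)
    return dict(groups)


layer_sizes = Counter(attrs["layer"] for attrs in nodes.values())
print(data_by_type(edges), dict(layer_sizes))
```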
![Page 16: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/16.jpg)
Future directions
• Domain agnostic functionalities
• Data science approach
– Analysis of data
– Classification of data
– Recommendations of data
– Prediction of data
• On the Kernel Information metadata layer
– Trustworthiness and traceability analysis before downloading the data
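A traceability check before download could, for instance, follow wasDerivedFrom links through the Kernel Information layer and report whether every ancestor resolves to a record. The sketch below is only one possible reading of that idea: the function `traceability_report`, the criterion it uses (no unresolved ancestors and no cycles), and the sample records are assumptions, not a method defined in the talk.

```python
def traceability_report(pid, records):
    """Follow wasDerivedFrom links from `pid` using KI records only.

    Returns the resolved chain, any PID without a KI record, and whether the
    chain ends cleanly (no unresolved ancestor, no cycle).
    """
    chain, unresolved, seen = [], [], set()
    cyclic = False
    current = pid
    while current is not None:
        if current in seen:
            cyclic = True
            break
        seen.add(current)
        record = records.get(current)
        if record is None:
            unresolved.append(current)
            break
        chain.append(current)
        current = record.get("wasDerivedFrom")
    return {
        "chain": chain,
        "unresolved": unresolved,
        "traceable": not unresolved and not cyclic,
    }


# Hypothetical KI records keyed by PID; "example/lost" has no record.
records = {
    "example/raw": {},
    "example/cleaned": {"wasDerivedFrom": "example/raw"},
    "example/orphan": {"wasDerivedFrom": "example/lost"},
}

print(traceability_report("example/cleaned", records))
print(traceability_report("example/orphan", records))
```

Because the check reads only KI records, it can run before any data is fetched, which is the point of doing the analysis on the metadata layer.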
![Page 17: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/17.jpg)
Summary
• The objective is
– Domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data
• We introduced
– Kernel Information as an RDA working group output
• Data
• Data type
• Structural relation between data
• We viewed
– Handle-centered networks on the Kernel Information metadata layer
• Domain agnostic functionalities are emerging from
– Graph based reasoning on the framework.
![Page 18: Emerging domain agnostic functionalities on the handle-centered networks](https://reader031.vdocuments.mx/reader031/viewer/2022021923/5a6d79aa7f8b9af8418b5887/html5/thumbnails/18.jpg)
Acknowledgement
• This work is supported by the open collaborative research at the National Institute of Informatics (NII), Japan (FY2017).
• The authors are thankful to all RDA Kernel Information WG members for their great discussions in remote and in-person meetings.