
Grant Agreement N°825619

AI4EU Deliverable D2.7
Interoperability design and implementation choices reference document

WP 2: Platform design and implementation
Task 2.6: Interoperability and AI European partnering platforms

Dissemination level [1]: PU
Nature [2]: R
Due delivery date: 31/08/2019
Actual delivery date: 05/09/2019
Lead beneficiary: IDI
Contributing beneficiaries: KNO, BSC, TAS, IDSA and FHG

Document version history:

Version | Date | Author | Comments [3]
1.0 | Jul 31 | S. Marcel, A. Anjos and S. Gaist (IDI) | BEAT
1.1 | Aug 9 | M. Welß (FHG) | Acumos
1.1 | Aug 9 | D. Kowald, R. Kern and S. Kopeinik (KNO) | Added “European Data for AI” sections
1.1 | Aug 12 | D. Vincente (BSC) | HPC
1.2 | Aug 12 | S. Marcel, A. Anjos and S. Gaist (IDI) | Harmonisation and restructuring
1.3 | Aug 20 | M. Aubrun (TAS) | Mundi
1.3 | Aug 20 | S. Marcel, A. Anjos and S. Gaist (IDI) | Abstract, Introduction and Conclusion
1.3 | Aug 23 | S. Steinbuss (IDSA) | Input from IDSA
1.4 | Aug 27 | M. Aubrun (TAS) | Interoperability with Acumos
1.5 | Sep 3 | S. Marcel and A. Anjos (IDI) | Implementation of comments from reviewer 1

[1] Dissemination level: PU = Public, PP = Restricted to other programme participants (including the JU), RE = Restricted to a group specified by the consortium (including the JU), CO = Confidential, only for members of the consortium (including the JU)
[2] Nature of the deliverable: R = Report, P = Prototype, D = Demonstrator, O = Other
[3] Creation, modification, final version for evaluation, revised version following evaluation, final


Deliverable abstract

This deliverable describes the high-level interoperability requirements among the different AI Resources available within AI4EU. Following a concertation meeting held during a workshop in March 2019, we converged on a technical solution connecting the main AI4EU platform to different AI Resources. AI Resource producers will be expected to export containers with data and software, or brokers to access remote resources. AI Resource consumers shall be able to import these assets according to End-User needs.

Deliverable Review

Reviewer #1: MICHELA MILANO; Reviewer #2: (not completed)

1. Is the deliverable in accordance with
(i) the Description of the Action? Reviewer #1: Yes
(ii) the international State of the Art? Reviewer #1: Yes

2. Is the quality of the deliverable in a status
(i) that allows it to be sent to the European Commission? Reviewer #1: Yes
(ii) that needs improvement of the writing by the originator of the deliverable? Reviewer #1: No
(iii) that needs further work by the Partners responsible for the deliverable? Reviewer #1: No

* Type of comments: M = Major comment; m = minor comment; a = advice


1. Introduction

This deliverable describes the high-level interoperability requirements among the different AI Resources available within AI4EU. Europe currently has a large variety of actors and stakeholders providing data (e.g. Thales Alenia Space’s Mundi, Bonseyes, ELG), software (e.g. Acumos, IDIAP’s BEAT) and computing infrastructures (e.g. BSC’s HPC, CINECA). These assets are referred to as AI Resources in AI4EU and in this deliverable.

This deliverable is complementary to D2.1 on the architecture of the AI4EU platform and D3.2 on the AI4EU use cases. Its scope is therefore restricted to presenting high-level technical descriptions of how AI4EU actors intend to interoperate through a common container-based standard. AI4EU actors can be categorized as resource producers or consumers. Resource producers are data and software providers that feed the AI4EU ecosystem. Resource consumers are computing infrastructures that take data and software as input to produce models and predictions.

For example, in image classification (e.g. cats, dogs, cars, pedestrians, …) the input data consists of a large set of images containing the objects of interest, each assigned an identity label. To classify a new image, an End-User first creates a model from the dataset using dedicated software (a training algorithm). The End-User may then use the trained model to infer the class label of a test image. The AI Resources involved in this example are a dataset, software for training and inference, and the trained model. Training and inference are executed at resource consumers, while resource producers provide the data and software.

At the moment, the European landscape of AI Resources is highly heterogeneous and not interoperable. The aim of this deliverable is therefore to propose a common path towards federating AI Resources in Europe. To reach that objective, Section 2 first introduces relevant technical details of the main partner resources involved in WP2 T2.6. Section 3 then proposes a high-level technical solution connecting the different AI Resources: AI Resource producers will be expected to export containers with data and software, or brokers to access remote resources, and AI Resource consumers shall be able to import these assets according to End-User needs.


2. State of the Art

a. Acumos

AI4EU operates its own instance of Acumos [4], which can serve as a possible AI Resource Repository of the AI4EU platform. In the first step, it will contain AI Resources carefully selected by AI4EU experts, such as:

● Trained models that are ready to deploy,
● High-quality datasets,
● Connectors and brokers that can be used in conjunction with the above models and datasets to visually compose AI pipelines.

Acumos also includes a visual editor (Figure below), in which an AI Resource must expose an interface description (in the protobuf format).

Figure: Visual editor in Acumos

Resources can be onboarded either as a deployable artefact or as a catalog entry that refers to an external source. In the second step, the community is encouraged to add further resources as well as to comment on and rate the existing ones. All uploaded resources must undergo a well-defined publication process to ensure the integrity and quality of the content; this covers technical, legal and ethical aspects of the resource and its content.

Acumos accepts AI Resources in the form of a Docker container that exposes an interface (i.e. a protobuf specification [5]). As part of the onboarding process, the target execution environment (such as x86_64 or HPC) must be specified, so the resource can eventually be deployed into the cloud or the AI4EU HPC playground.

[4] https://www.acumos.org/
[5] https://developers.google.com/protocol-buffers/


b. Thales Alenia Space (TAS)

Existing Projects

Copernicus

Copernicus [6] is the European Union's Earth Observation (EO) programme, coordinated and managed by the European Commission in partnership with the European Space Agency (ESA) and EU Agencies. The objective of this programme is to provide global, continuous and high-quality EO data in order to address global challenges in six thematic areas: atmosphere, marine, land, climate, emergency and security. To achieve this objective, the European Commission has launched major initiatives to:

● Produce data: the Sentinel programme, which consists of building EO satellites and setting up ground segments to receive and process EO data. Currently, seven missions are developed by ESA, which include radar and super-spectral payloads. Note that the Copernicus programme has adopted a free, full and open data policy for all information produced in the framework of Copernicus.
● Access data:
○ Conventional Data Access Hubs: portals that provide free access to Copernicus satellite data through an interactive graphical user interface.
○ Data and Information Access Services (DIAS): cloud-based platforms that provide data and information access alongside processing resources, tools and other relevant data.

Mundi

Mundi is one of the DIAS projects; it is executed by a consortium of 9 parties, including Thales Alenia Space. As mentioned above, Mundi is a cloud-based platform that:

● Gives unlimited, free and complete access to Copernicus data and information, as well as access to additional commercial satellite or non-space data sets,
● Gives access to sophisticated processing tools,
● Provides a scalable computing and storage environment for third parties, either individuals or companies,
● Allows third parties to offer advanced value-adding services integrating Copernicus with their own data and tools to the benefit of their own users,
● Provides adapted technical, business and functional support.

The Mundi platform is accessible at https://mundiwebservices.com.

Position in AI4EU project

In the AI4EU project, Mundi's role is that of a data provider of EO satellite data. Note that no precondition is required to explore and view the EO satellite data, but users must be registered to download it. The only condition to register is to have a valid email address.

[6] https://www.copernicus.eu/en


c. BEAT platform from IDIAP (IDI)

BEAT [7,8] is a framework for the definition, execution and comparison of software-based, data-driven workflows that can be subdivided functionally (into processing blocks). The user provides the description of data exchange formats, algorithms, data flows (also known as toolchains) and experimental details (parameters). The framework can execute the experiment locally or on a computing infrastructure, transparently. Results can be shared and compared via traditional exchange mechanisms or by using a web-based platform.

The BEAT Platform and Framework were created as part of a pan-European project composed of both academic and industrial partners, one of whose goals was the design and development of a free, open-source, web-based platform for the development and certification of reproducible software-based machine learning (ML) and pattern recognition (PR) experiments. The main intent behind the web platform is to establish a framework for the certification and performance analysis of such systems while still respecting the privacy and confidentiality of built-in data and user contributions.

The BEAT Framework is, by design, task-independent, being adaptable to different problem domains and evaluation scenarios. At the conceptual phase, the platform was bound to support a number of use cases, which we summarize here:

● Benchmarking of ML and PR systems and components: users should be able to program and execute full systems so as to identify performance and computing requirements for complete toolchains or individual components;
● Comparative evaluation: it should be possible to run challenges and competitions on the platform, as is the case in similar systems such as Kaggle [9];
● Certification of ML and PR systems: the platform should be able to attest to the operation and performance of experiments so as to support the work of certification agencies or publication claims;
● Educational resource: the platform shall be usable as an educational resource for transmitting know-how about ML and PR applications. It should be possible to set up interest groups that share work assignments, such as in a teacher-student relationship.

In the context of AI4EU, training and inference pipelines will be exported from the existing BEAT Platform [10], which makes it a producer of AI Resources.

[7] https://arxiv.org/abs/1704.02319
[8] https://www.idiap.ch/software/beat
[9] https://www.kaggle.com
[10] https://www.beat-eu.org/platform

d. International Data Space Association (IDSA)

Today, there is a common understanding that data is of high value. Leveraging this value and trading data creates huge revenues for the large data platform providers. Rarely do the creators of data benefit from this value in an adequate way; often, only the costs for data creation and management remain with them. Furthermore, many give their data away for free or pay with it for the use of a service, while others keep it to themselves without taking advantage of its value.

There is a need for vendor-independent data ecosystems and marketplaces, open to all at low cost and with low entry barriers. This need is addressed by the International Data Spaces (IDS) Association, a nonprofit organization with, today, about 100 members from various industrial and scientific domains. The IDS Association has specified an architecture, interfaces and sample code for an open, secure data ecosystem of trusted partners. The specification of the IDS Association forms the basis for a data marketplace based on European values, i.e. data privacy and security, equal opportunities through a federated design, data sovereignty for the creator of the data, and trust among participants. It forms the strategic link between the creation of data in the Internet of Things on the one hand and the use of this data in machine learning (ML) and artificial intelligence (AI) algorithms on the other.

Digital responsibility is evolving from a hygiene factor into a key differentiator and source of competitive advantage. Future data platforms and markets will be built on design principles that go beyond our traditional understanding of cybersecurity and privacy. Based on strong data ethics principles, the IDS Reference Architecture Model puts the user at its center; its key value proposition is to ensure trustworthiness in ecosystems and sovereignty over data in the digital age. IDSA defines a reference architecture that supports the sovereign exchange and sharing of data between partners independent of their size and financial power. It thus meets the needs of both large enterprises and small and medium enterprises (SMEs), and further down the road it may be taken up by individuals as well. Whether the data concerned comes from IoT devices, on-premise systems or cloud platforms, the IDSA aims at providing the standard for sharing data between different endpoints while ensuring data sovereignty.

e. Barcelona Supercomputing Center (BSC)

The high-performance computing (HPC) infrastructures included in the project will be used as a potential execution platform for selected trial projects that require thousands of processors and/or accelerators to be completed in a reasonable time. These selected projects will initially be defined and tested on the Acumos platform, to be later adapted and executed on the HPC infrastructures. Most HPC infrastructures have some limitations in terms of supported applications, containers and security restrictions; all these aspects will be evaluated and configured in task T2.5 to provide an easy environment for porting projects from the AI4EU platform to the HPC environments.

The HPC centers currently involved in the AI4EU project are the Barcelona Supercomputing Center (BSC) from Spain and CINECA from Italy. In both cases, the infrastructures available at these two supercomputing centers include general-purpose processors and accelerated machines with GPUs.

At BSC, the hardware available for the trial projects will be Nord3 for non-accelerated codes; the description of the hardware of this machine is available online (Nord3 system configuration). This cluster runs SUSE Linux Enterprise Server 11 SP3 and uses the LSF batch scheduler (LSF documentation). The cluster currently supports Singularity containers at version 2.4.2, but the BSC support team is working to support version 3.2.0 before the end of September 2019.

For accelerated codes, BSC has two clusters to support different kinds of workflows, depending on their requirements. One of them uses Power9 processors with NVIDIA V100 GPUs (MN4-Power9 system overview) and provides more than 1 PFlops of compute power; the other uses x86 processors with NVIDIA K80 GPUs (MinoTauro system overview) and provides more than 250 TFlops. On both machines, the supported container technology is Singularity version 3.2.0. It is important to note that, for the Power9 machine, Docker images need to be built for the ppc64 architecture, which can complicate porting from Acumos; this infrastructure will therefore be used only for very demanding projects where the porting is really beneficial in terms of performance.


f. Know Center (KNO)

Data-driven services are becoming an increasingly important aspect of the modern economy, with data markets playing a pivotal role as brokers between stakeholders of data-driven ecosystems. As one example, the Data Market Austria (DMA) [11] is an initiative to create a digital ecosystem, i.e., a multi-sided market for shared datasets and data-driven algorithms. Specifically, DMA takes the role of a central hub for a variety of actors participating in the (Austrian) data economy, regardless of their industry sector. For successful collaborations in data markets, different actors need to collaborate to be able to create new solutions. Recommender services and the underlying models thus take the role of matchmakers that discover and suggest potential new combinations of users, datasets and services.

DMA is therefore built upon the scalable recommendation-as-a-service framework ScaR [12], which implements important aspects of modern recommender systems. This includes functionality to:

● support different forms of metadata and interaction data,
● process and consider streaming data for the recommendation process in (near) real time,
● scale the recommender system to be suitable for cloud-based environments,
● combine (near) real-time recommender approaches with context-dependent data.

To support these functionalities, ScaR follows the Microservice Architecture design pattern [13] and uses Apache Zookeeper [14] for scalability purposes. Furthermore, the high-performance enterprise search platform Apache Solr [15] is used as a database to allow for (near) real-time recommendation and search functionality.

The ScaR framework was initially applied and evaluated in the course of DMA to interlink users, datasets and algorithms [16]. However, the current implementation of the framework lacks so-called gatekeeper functionalities that assess technical and scientific properties of potential datasets prior to the recommendation and search process. The implementation of such gatekeeper functionalities is thus Know-Center's contribution to T2.6 of AI4EU; it can be understood as a controller of AI Resources (e.g., datasets), as it creates a “European Data for AI” database with recommendation and search services (see Section 3).

[11] https://datamarket.at/en/
[12] http://scar.know-center.tugraz.at/
[13] https://microservices.io/patterns/microservices.html
[14] https://zookeeper.apache.org/
[15] https://lucene.apache.org/solr/
[16] https://arxiv.org/abs/1908.04017


3. Results and Analysis

In sections (a) to (f), we provide examples of interoperability plans from AI4EU partners, while in section (g) we describe a possible component exchange format to which data producers and consumers may adhere if they would like to interoperate.

a. Acumos

Background

Acumos provides a data exchange format based on Docker containers for AI Resources (i.e. components). We advocate that a similar container-based framework should be used as the interoperability standard for AI4EU. We briefly introduce how it works below.

The Acumos platform defines an “onboarding” feature [17] allowing AI Resources to be uploaded to and downloaded from an existing platform instance. Resources are encapsulated and safe-kept using Docker containers [18]. Users uploading AI Resources typically describe them through a programming core, data components and an I/O exchange definition, which is later converted into a Docker container. In recent versions of Acumos, it is also possible to onboard Docker containers directly [19]. Such a Docker image must be created so that, upon start, it exposes the service defined in the protobuf file on port 80, and it must be made available either in a public Docker registry or in the Acumos registry, so it can be referenced during the onboarding process (Figure below).

Figure: Screenshot of the onboarding process on Acumos
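To make the port-80 requirement concrete, the following minimal sketch shows, in Python, a gRPC server that such an onboarded container could run. It assumes stubs generated with grpcio-tools from the Iris classifier .proto shown in Section 3.g; the module names (model_pb2, model_pb2_grpc) and the dummy inference logic are illustrative assumptions, not part of the Acumos documentation.

# Minimal sketch of a model runner exposing a protobuf-defined service over
# gRPC on port 80, as required when onboarding a Docker image directly.
# Hypothetical stubs generated beforehand with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. model.proto
from concurrent import futures

import grpc
import model_pb2        # hypothetical generated messages (IrisDataFrame, ClassifyOut)
import model_pb2_grpc   # hypothetical generated service classes for "Model"


class ModelServicer(model_pb2_grpc.ModelServicer):
    def classify(self, request, context):
        # Dummy inference: return one class label per input row.
        n = len(request.sepal_length)
        return model_pb2.ClassifyOut(value=[0] * n)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    model_pb2_grpc.add_ModelServicer_to_server(ModelServicer(), server)
    server.add_insecure_port("[::]:80")  # the port the onboarding process expects
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()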

[17] https://wiki.acumos.org/display/AC/Soup-to-Nuts+Example%3A+Onboarding%2C+Downloading%2C+Deploying%2C+and+Using+a+Python-Based+Model+in+Acumos
[18] https://www.docker.com/resources/what-container
[19] https://wiki.acumos.org/display/LM/Docker+file+using+new+model+runner


Interoperability plan

Acumos is already nearly compliant with the proposed container-based interoperability plan, as it uses a similar underlying representation model. Should this representation evolve over time, the mechanisms for exchanging assets with the Acumos platform may have to be adapted.

b. Thales Alenia Space (TAS)

Background

Mundi has a classical architecture (see Figure below) with:

● an IaaS, which has the specificity of providing a single access point to the entire Copernicus data archive. On this cloud environment, virtual machines (VMs) with storage and computing capacities are also installed. If the VM resources are not sufficient, it is possible to buy a tenant, which is guaranteed private, fully secured and compliant with European privacy policies;
● a PaaS;
● a SaaS that contains a Jupyter Notebook to manipulate data and run Docker containers. On this SaaS, it is also possible to offer complementary AI tools (based on Python 3 or R code) via the Mundi marketplace.

Figure: Mundi Solution

Interoperability plan

As EO data are huge, downloading them over the Internet is not an effective way of working. Moreover, it is only possible to download two EO data products at once. The best way to process EO data is to do so in the same cloud environment where they are hosted. The considered solution therefore consists in downloading Docker containers, which contain the AI tools of interest, from the AI4EU repository platform, and executing these containers on a virtual machine provided by the Mundi platform (see next Figure). Note that Thales Alenia Space also plans to negotiate with other members of the consortium to expose a link towards the AI4EU repository platform on the Mundi Marketplace, to promote the integration of AI tools from the AI4EU project. If AI4EU users want to discover what EO data looks like, it will be possible to download EO data (no more than two products at once for a free account) via the semantic search of the AI4EU project. This tool will also help users select the EO data of interest.


Figure: Interoperability between AI4EU and Mundi platforms
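A minimal sketch of this workflow on a Mundi VM follows, assuming Docker is available there; the registry, image name and data mount point are hypothetical placeholders, not actual AI4EU or Mundi identifiers.

# Sketch: pull an AI tool container from the AI4EU repository and run it on a
# Mundi VM, mounting locally hosted EO data so processing happens next to the
# data. The registry, image name and mount point below are hypothetical.
import subprocess

IMAGE = "registry.ai4eu.example/tools/eo-classifier:latest"  # hypothetical

subprocess.run(["docker", "pull", IMAGE], check=True)
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/mundi/eodata:/data:ro",  # hypothetical EO data mount (read-only)
        IMAGE,
    ],
    check=True,
)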

c. BEAT platform from IDIAP (IDI)

Background

Essentially, each processing unit in a BEAT workflow is represented by:

● an Algorithm [20] object, which is composed of:
○ a JSON description containing information about the inputs and outputs, the type of the algorithm, its parameters and some metadata,
○ the actual code that will be executed, following a predetermined class format,
○ the documentation of the algorithm;
● one or more DataFormat [21] objects, which describe the data types (simple or complex) that must be used to allow the Algorithm to read from its inputs and write to its outputs.

These processing units are executed in Docker containers that provide a specific Environment containing an arbitrary number of libraries (e.g. TensorFlow, PyTorch). Environments are versioned so they can evolve over time, while older versions are kept for reproducibility.
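To illustrate the “predetermined class format” mentioned above, here is a minimal, simplified sketch of what a BEAT algorithm looks like in Python. The input and output names are hypothetical and the exact API may differ across BEAT versions; the authoritative reference is the BEAT documentation [20].

# Simplified sketch of the predetermined class format for a BEAT algorithm
# (input/output names are hypothetical; see the BEAT docs for the exact API).
class Algorithm:

    def process(self, inputs, outputs):
        # Read the current unit of data from a declared input...
        sample = inputs["measurements"].data
        # ...apply the processing step, then write to a declared output.
        outputs["score"].write({"value": 0.5})
        # Returning True tells the executor this unit was processed correctly.
        return True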

Interoperability plan



Figure: Diagram illustrating the export of a BEAT processing pipeline to an Acumos Docker container

To allow exporting BEAT Algorithms, or complex sets of those, we will modify the BEAT framework to allow the user to arbitrarily create AI4EU-compatible containers from a subset of the Algorithms running in a BEAT Experiment [22].

More precisely, we will make the following modifications to the BEAT framework to support exporting pipelines to the AI4EU platform (a sketch of step 1a is given below):

1. A code generation tool to:
a. convert the DataFormat objects into a Protobuf description,
b. create a programming core from selected BEAT Algorithms [23].
2. An exporter tool to create Docker images such that they can be directly onboarded onto the AI4EU platform.

These containers will be based on the original BEAT Environments and will be augmented to contain the necessary programming core and input/output descriptions.
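As an illustration of step 1a, the following hypothetical sketch (not the actual AI4EU tooling) shows how a flat BEAT DataFormat declaration could be mapped onto a proto3 message. The type mapping and the function itself are assumptions for illustration only.

# Illustrative sketch of step 1a: render a flat BEAT DataFormat declaration
# (field name -> type) as a proto3 message definition. Hypothetical tooling.
BEAT_TO_PROTO = {
    "float64": "double",
    "int64": "int64",
    "string": "string",
    "bool": "bool",
}


def dataformat_to_proto(name: str, declaration: dict) -> str:
    """Generate a proto3 message from a flat BEAT DataFormat declaration."""
    lines = [f"message {name} {{"]
    for index, (field, beat_type) in enumerate(declaration.items(), start=1):
        lines.append(f"  {BEAT_TO_PROTO[beat_type]} {field} = {index};")
    lines.append("}")
    return "\n".join(lines)


# Example: a four-field format similar to the Iris example in Section 3.g.
print(dataformat_to_proto("IrisSample", {
    "sepal_length": "float64",
    "sepal_width": "float64",
    "petal_length": "float64",
    "petal_width": "float64",
}))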

[20] https://www.idiap.ch/software/beat/docs/beat/docs/stable/beat/algorithms.html
[21] https://www.idiap.ch/software/beat/docs/beat/docs/stable/beat/dataformats.html
[22] https://www.idiap.ch/software/beat/docs/beat/docs/stable/beat/experiments.html
[23] https://www.idiap.ch/software/beat/docs/beat/docs/stable/beat/algorithms.html


d. International Data Space Association (IDSA)

Background

The International Data Spaces connect the lower-level architectures for communication and basic data services with more abstract architectures for smart data services. They thereby support the establishment of secure data supply chains from data source to data use, while at the same time making sure that data sovereignty is guaranteed for data owners.

Via the IDS Connector, the International Data Spaces' central component, industrial data clouds, individual enterprise clouds, on-premises applications and individual connected devices can all be connected to the International Data Spaces.

Figure: International Data Spaces connecting different cloud platforms

The IDS Reference Architecture Model describes processes for the provision and consumption of data and algorithms, and provides a semantic data model (the IDS Infomodel) to describe offerings for the data economy. The IDS Connector is responsible for communication and interoperability in the IDS; it is supported by the Broker, App Store and Identity Provider components.

Figure: Interaction of technical components


A distributed network like the International Data Spaces relies on the connection of different member nodes where Connectors or other core components are hosted (a Connector comprising one or more Data Endpoints). The Connector is responsible for the exchange of data, or acts as a proxy in the exchange of data, as it executes the complete data exchange process from and to the internal data resources and enterprise systems of the participating organizations and the International Data Spaces. It provides metadata to the Broker as specified in the Connector self-description, e.g. technical interface description, authentication mechanism, exposed data sources, and associated data usage policies. It is important to note that the data is transferred between the Connectors of the Data Provider and the Data Consumer (peer-to-peer network concept).

There may be different types of implementations of the Connector, based on different technologies and depending on what specific functionality is required for the purpose of the Connector. Two fundamental variants are the Base Connector and the Trusted Connector, which differ in their capabilities regarding security and data sovereignty.

The Connector architecture uses application container management technology to ensure an isolated and secure environment for individual data services. A data service corresponds to a system that offers an API to store, access or process data. To ensure the privacy of sensitive data, data processing should take place as close to the data source as possible. Any data preprocessing (e.g., filtering, anonymization, or analysis) should be performed by Internal Connectors; only data intended to be made available to other participants should be made visible through External Connectors.

Data Apps are data services encapsulating data processing and/or data transformation functionality, bundled as container images for simple installation by application container management.

Using an integrated index service, the Broker manages the data sources available in the International Data Spaces and supports the publication and maintenance of associated metadata. Furthermore, the Broker Index Service supports the search for data resources. Both the App Store and the Broker are based on the Connector architecture (see below) in order to support secure and trusted data exchange with these services.

Connector Architecture

The details of the IDS Connector Architecture can be found in the IDS Reference Architecture Model [24].

[24] https://www.internationaldataspaces.org/wp-content/uploads/2019/03/IDS-Reference-Architecture-Model-3.0.pdf


Interoperability plan

As the IDS Connector is a generic concept based on virtualization and container management, it is easily adaptable to the AI4EU platform. Most current implementations of IDS Connectors rely on Docker or Kubernetes.

The Execution Core Container of the IDS Connector can be integrated as a Docker image in the AI4EU platform and can be used in different scenarios for consuming and providing data, models and applications in a data-sovereign way. From the IDSA perspective, it is still open how the IDS Connector components Data Bus and Data Router will be adopted in the AI4EU platform, as they connect the different containers. The IDS Reference Architecture Model relies on a certification scheme that includes an assessment of the technical components used and of the participants in the ecosystem. It is still open how this certification scheme will be adopted within the AI4EU platform (regarding Core Component Certification) and in general (regarding Participant Certification).

e. Know Center (KNO)

Background

The Know Center (KNO) will focus on technological developments that support the interoperability of external datasets managed within T2.7 of AI4EU, i.e., “External Data for AI”. To this end, a database that holds metadata of potential external datasets will be provided, based on the database created within the DMA project. This database will implement the metadata standards described in the initial data management plan (D2.9). It will further offer a Web-service-based interface that enables search and recommendation of datasets. The main functionalities will encompass:

● Recommendation algorithms that provide personalized suggestions of datasets based on prior user interactions (e.g., clicks on datasets, past search queries, etc.). The ScaR recommendation framework will be utilized for this purpose.
● Gatekeeper functionalities that assess technical and scientific properties of potential datasets prior to their database integration. The research will investigate the application of algorithms that allow an automatic assessment of datasets with respect to their suitability for specific use cases (e.g., is a dataset suitable for time-series analyses?). To this end, AI algorithms will be prototypically applied to the data in order to observe the properties of the learnt models; the outcome of this process will then be used as an additional set of descriptive features to feed the aforementioned search and recommendation services (see the sketch after this list).
● Data consumption functionalities via data brokers. Once suitable datasets are found for specific algorithms, they can be consumed via containerized data brokers. For this purpose, the data broker type (e.g., CSV file) as well as the source location (e.g., a Web URL) is saved in the metadata database as well.
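As an illustration of the gatekeeper idea, the following hypothetical Python sketch probes whether a tabular dataset supports a basic classification task by fitting a quick baseline model; the concrete checks used in AI4EU may differ, and the function, file layout and thresholds are assumptions for illustration only.

# Hypothetical gatekeeper probe (illustrative only): fit a quick baseline
# classifier on a tabular dataset and record descriptive features that the
# search and recommendation services could index.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def probe_classification_suitability(csv_path: str, target: str) -> dict:
    df = pd.read_csv(csv_path)
    features = df.drop(columns=[target]).select_dtypes("number")
    labels = df[target]
    # Cross-validated accuracy of a simple baseline as a rough suitability signal.
    score = cross_val_score(
        LogisticRegression(max_iter=1000), features, labels, cv=3
    ).mean()
    return {
        "n_rows": len(df),
        "n_numeric_features": features.shape[1],
        "baseline_cv_accuracy": round(float(score), 3),
    }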

Interoperability plan

The main aim of the “European Data for AI” database is to organize metadata of datasets. The database will be enriched by a set of intelligent services that support structuring, findability, recommendation and consumption. In the remainder of this section, the framework's different components as well as the interoperability with AI4EU (with Acumos as an example) are described:


● External Platforms: metadata of datasets from external platforms will be collected. This process will start with the approximately 15,000 datasets identified in DMA.
● Gatekeeper: prior to database integration, all datasets will be assessed by a so-called gatekeeper that determines the suitability of a dataset for AI algorithms. The gatekeeper will also be responsible for filtering out datasets with insufficient metadata quality.
● “European Data for AI” database and services: Apache Solr will serve as a data backend managing an inverted index. The inverted index stores all relevant features used for generating recommendations and search results.
● Recommendations and Search: flexible recommendation and search services will facilitate access to the dataset metadata. These services can be exploited by the AI4EU platform using a RESTful interface, i.e., HTTP as the communication protocol and JSON as the format for data transmission (see the sketch after this list). The recommendation and search services will be developed based on the ScaR framework.
● Interactions: in order to calculate personalized recommendations, the AI4EU platform will be able to provide interaction data. Thus, if a user of the platform works with a specific dataset or AI service, this interaction can be stored in the database. Such data also acts as a feedback loop for the personalization services, as it allows tracking whether recommended datasets or services have been interacted with or not.
● AI services / algorithms: similar to the handling of the interaction data described above, AI4EU might also provide metadata of AI services / algorithms. With this information, additional recommendation functionality can be offered. This allows going beyond typical item2user recommendations (e.g., recommending a dataset to a user) and additionally providing item2item recommendations (e.g., recommending a dataset to an AI service). This should lead to novel and useful combinations of datasets and AI services.
● AI4EU data broker: by providing an AI4EU data broker, datasets can be consumed by the platform. For this purpose, a service definition implemented as a Protobuf file, a license in JSON format and a container embedding the data broker will be provided. The data broker type (e.g., CSV file) as well as the source location (e.g., a Web URL) is stored in the “European Data for AI” database.
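A hypothetical sketch of such a query from the AI4EU platform side follows; the endpoint URL, parameters and response fields are illustrative assumptions, not a published API.

# Hypothetical query against the recommendation service's RESTful interface;
# endpoint, parameters and response fields are illustrative, not a published API.
import requests

response = requests.get(
    "https://data4ai.example/api/recommendations",  # hypothetical endpoint
    params={"user_id": "u42", "item_type": "dataset", "count": 5},
    timeout=10,
)
response.raise_for_status()
for item in response.json()["items"]:  # JSON is the agreed transmission format
    print(item["dataset_id"], item["score"])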

On top of this database, recommendation and search services will be employed that aim to provide novel and helpful suggestions for combining AI services and datasets, and to support platform users in completing targeted and extensive searches for datasets, respectively. Recommendation and search thus build on a data corpus describing offered items (i.e., datasets and AI services), actors (i.e., organizations and users) and (user) interactions. Interactions of users (e.g., viewing the description of an offer) and feedback of users (e.g., clicking on a recommendation) are stored separately. Metadata describes the properties of AI services, datasets, organizations and users.

f. Barcelona Supercomputing Center (BSC)

Background

The HPC environment is usually more restrictive in terms of security and performance than other environments such as HTC or the cloud. One of the limitations in HPC centers is the availability of container technologies for executions. The most widely used container technology up to now is Docker, but Docker is currently not supported in most HPC centers; the containers supported are mainly Singularity [25] and/or Shifter [26]. Both systems provide mechanisms to transform a Docker image into the corresponding container type, while limiting some of its features to increase the security of the final execution.

[25] https://sylabs.io/docs/
[26] https://docs.nersc.gov/programming/shifter/overview/

Interoperability plan

At BSC there is support only for Singularity, so the main way to bring assets from AI4EU to the HPC platforms will be to establish guidelines for users to generate containers that can be easily translated to Singularity for execution on HPC (see the sketch below). Another core point to be taken into account is the performance of the containers. This aspect must be checked before spending massive amounts of compute hours on a project, as it is very important to ensure the best possible performance and efficiency of the executions before scaling them up to several thousands of cores. The performance and portability aspects will be fully managed by task T2.5, while the interoperability design between AI4EU and HPC will be carried out with the effort allocated to BSC in T2.6.
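As a sketch of such a translation, assuming Singularity 3.x is deployed (the exact commands depend on the installed version) and using a hypothetical image reference, an AI4EU Docker image could be converted and executed as follows:

# Sketch of translating an AI4EU Docker image to Singularity and running it on
# an HPC node (assumes Singularity 3.x; the image reference is hypothetical).
import subprocess

DOCKER_REF = "docker://registry.ai4eu.example/models/iris:latest"  # hypothetical

# 'singularity build' flattens the Docker layers into a single .sif image file
# that can then be submitted through the batch scheduler.
subprocess.run(["singularity", "build", "iris.sif", DOCKER_REF], check=True)

# Execute the container's default runscript without requiring a root daemon.
subprocess.run(["singularity", "run", "iris.sif"], check=True)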

g. A container approach towards interoperability

Typical examples of AI Resources are:

● pre-trained machine learning models,
● rule-based models (expert systems, a.k.a. symbolic AI),
● algorithms to perform training or inference,
● datasets (for training or testing),
● data brokers to access remote data (e.g. satellite images).

We recommend that an AI Resource should comprise three objects (a minimal bundle check is sketched after the list):

1. A service definition implemented as a Protobuf file (.proto),
2. A license in JSON format (license.json),
3. A container (e.g. a Docker image) containing the programming core and/or data components (e.g. pre-trained models, datasets or data brokers).
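The following minimal sketch, with a hypothetical bundle layout and file names, shows how a consumer could check that these objects are present before onboarding:

# Minimal check (hypothetical layout and file names) that an AI Resource
# bundle ships the recommended service definition and license; the container
# image itself is referenced separately at onboarding time.
from pathlib import Path

REQUIRED_FILES = ["model.proto", "license.json"]  # hypothetical names


def validate_resource_bundle(bundle_dir: str) -> None:
    bundle = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (bundle / name).is_file()]
    if missing:
        raise FileNotFoundError(f"AI Resource bundle incomplete, missing: {missing}")
    print("Bundle OK: service definition and license are present.")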

Service definition



Any AI Resource must define an execution context and a data exchange interface via a Protobuf file. This data exchange interface must be self-contained, in the sense that it contains the definitions of the input and output data structures as well as the service definition.

The following example shows the Protobuf definition for a classifier inferring class labels for the Iris Flower dataset (a basic benchmark in machine learning):

syntax = "proto3";

package kTglehYxRGIPEoXkdoKpXCLzWgLrCbCp;

service Model {
  rpc classify (IrisDataFrame) returns (ClassifyOut);
}

message IrisDataFrame {
  repeated double sepal_length = 1;
  repeated double sepal_width = 2;
  repeated double petal_length = 3;
  repeated double petal_width = 4;
}

message ClassifyOut {
  repeated int64 value = 1;
}
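For completeness, this service can be called over gRPC as in the following minimal Python sketch, reusing the hypothetical module names (model_pb2, model_pb2_grpc) from the server sketch in Section 3.a:

# Hypothetical client for the Iris classifier service defined above.
import grpc
import model_pb2
import model_pb2_grpc

channel = grpc.insecure_channel("localhost:80")  # the container exposes port 80
stub = model_pb2_grpc.ModelStub(channel)

request = model_pb2.IrisDataFrame(
    sepal_length=[5.1], sepal_width=[3.5],
    petal_length=[1.4], petal_width=[0.2],
)
response = stub.classify(request)
print(response.value)  # e.g. [0], the inferred class label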


License

Additionally, the AI Resource must provide a suitable license file in JSON format. The specification of the format can be found online [27]. For instance, here is an example of a license file in JSON format wrapping an Apache 2.0 license:

{
  "modelLicenses": [
    {
      "keyword": "Apache-2.0",
      "intro": "Apache 2.0 License for Company A. Legal Text",
      "copyright": {
        "year": 2019,
        "company": "Company B",
        "suffix": "All rights reserved."
      },
      "swidTag": "Acumos Ai/ML Model|Data",
      "modelId": "AB123456",
      "licenseType": "trythenbuy|purchasemodel|purchaseartifacts",
      "rights": [
        {
          "id": "location",
          "name": "Locations Allowed",
          "desc": "The right to use this software is granted for the specified allowed locations",
          "limit": {
            "type": "location",
            "value": [
              "China,Europe,United States"
            ]
          }
        }
      ],
      "contact": {
        "desc": "Contact Company @ [email protected] To acquire the right to use this software"
      },
      "fullLegalLicense": "All legal text|url"
    }
  ]
}

Container

Any suitable container format, such as Docker or Singularity, may be used.

[27] https://docs.acumos.org/en/boreas/submodules/security-verification/license-manager-client-library/docs/license-json.html


4. Conclusion

In this deliverable, we proposed a common path towards federating AI Resources in Europe. We introduced relevant technical details of partner resources and proposed a high-level technical solution towards interoperability.

● The Acumos platform already defines an “onboarding” feature allowing AI Resources to be uploaded to and downloaded from an existing instance. Resources are encapsulated and safe-kept using Docker containers, described through a programming core, data components and an I/O exchange definition.
● BEAT will work as an AI Resource producer, exporting its Algorithms, or complex sets of those, and allowing End-Users to arbitrarily create compatible containers from a BEAT Experiment.
● Thales Alenia Space will work as an AI Resource producer/consumer by hosting executions of compatible containers.
● Know-Center will work as an AI Resource provider/controller by providing data brokers to consume datasets that are referenced in the “European Data for AI” database and recommendation infrastructure created in Task 2.7.
● The International Data Spaces Association will work as an AI Resource producer/consumer by providing a reference architecture model for the sovereign exchange of data, models and algorithms. The Execution Core of the IDS Connector can be made available as a reusable component in the AI4EU platform. The certification scheme of the IDS Reference Architecture Model will provide trustworthiness to the platform and the AI4EU ecosystem.
● The Barcelona Supercomputing Center will work as an AI Resource consumer by providing an adaptor between AI4EU containers and supported Singularity images.

We believe that the proposed container-based approach will fit many use cases and is flexible enough to accommodate the currently heterogeneous ecosystem of AI actors in Europe and beyond. In conclusion, we expect this deliverable to form one of the main pillars for the consolidation of AI Resources.