Euclid Data Model 101 - Episode 01: Overview

High level presentation of data modeler’s job & available tools

Upload: euc-dm-test

Post on 15-Jan-2015


DESCRIPTION

First episode of Euclid Data Model 101. High-level presentation of data modeler's job & available tools.

TRANSCRIPT

Page 1: Euclid Data Model 101 - Episode 01: Overview

High level presentation of data modeler’s job & available tools

Page 2: Euclid Data Model 101 - Episode 01: Overview

INTRODUCTION Objectives of Euclid Datamodel 101

Slidecast dedicated to Euclid data modelers & developers

Released as multiple episodes over time

1st episode: high-level overview of tools and process

2nd episode: the TIPS example

Following episodes: zoom on technical points

Help you understand what is expected and how to do it

Page 3: Euclid Data Model 101 - Episode 01: Overview

INTRODUCTION Objectives and contents of this presentation

Get an overview of the data modeling process

Understand the data model workflow

Know where to find information

Know what tools are available

No complex details or technical information here, but high-level information and pointers in the right direction.

Page 4: Euclid Data Model 101 - Episode 01: Overview

SUMMARY The data modeling process

1 - Understand the Euclid DataModel

- Why use a Euclid DataModel?

- Why choose XML?

- What is XML Schema?

- What are the Euclid-specific XML rules my schema shall comply with?

- How is the DataModel SVN repository structured?

- How are XML namespaces structured?

2 - Create your own DataModel

- What should my DataModel contain?

- What software can I use to write XML?

- How can I check if my DataModel is correct?

3 - Use the DataModel in your own code

- How can I use the DataModel in my code?

- How can I use XML data bindings?

- Can I get pre-configured tools all at once?

Page 5: Euclid Data Model 101 - Episode 01: Overview

Why use a Euclid DataModel?

The Euclid mission relies heavily on data transfer and manipulation

Data consistency between OUs, workflows, pipelines and storage is a key point

Your data products will be:

- stored on EAS

- queryable from EAS

- transmitted to/from pipelines by IAL

Your DataModel will be:

- used to structure the EAS database

- manipulated by your pipeline code

[Diagram labels: EAS, SDC, IAL, DataModel, pipeline code, data products in/out (compliant with the DataModel), design time vs. run time]

Page 6: Euclid Data Model 101 - Episode 01: Overview

Why choose XML?

Find information

- W3Schools tutorials: http://www.w3schools.com/schema/

<coord> <x>12.05</x> <y>3.1</y> </coord>

The XML language brings many benefits:

- Easy to read and understand by humans and machines

- Many tools available to create, control and check XML

- Widely used and supported across the world

- Self-contained: expresses both data and data structure

- Strong type/namespace control and definition

XML was chosen over many other alternatives

Page 7: Euclid Data Model 101 - Episode 01: Overview

What is XML Schema?

Two file formats you should be familiar with:

- XML: contains the actual data

<coord>
  <x>12.05</x>
  <y>3.1</y>
</coord>

- XSD (XML Schema): describes the data structure

<xs:element name="coord">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="x" type="xs:float" />
      <xs:element name="y" type="xs:float" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

The XML file complies with the XSD.

Find information

- W3Schools tutorials: http://www.w3schools.com/schema/

- Highlights on XML/XSD (DM Workshop): http://euclid.roe.ac.uk/attachments/download/2744/Workshop_Nov2013_XSD_XML%20-%202.02.ppt
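The "complies with" relation between an XML file and its schema can be checked in code. The following is only a minimal sketch: it assumes the third-party lxml library is installed, it wraps the coord declaration in an xs:schema root element so that it forms a complete schema, and the function name complies_with is a hypothetical helper, not part of any Euclid tool.

```python
from lxml import etree

# The coord declaration from the slide, wrapped in an xs:schema root
# (the wrapper is an assumption needed to make a complete schema).
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float"/>
        <xs:element name="y" type="xs:float"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XML = b"<coord><x>12.05</x><y>3.1</y></coord>"


def complies_with(xml_bytes, xsd_bytes):
    """Return True if the XML document validates against the schema."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    document = etree.fromstring(xml_bytes)
    return schema.validate(document)


if __name__ == "__main__":
    print(complies_with(XML, XSD))
    # A non-numeric x violates the xs:float type declared in the schema.
    print(complies_with(b"<coord><x>abc</x><y>3.1</y></coord>", XSD))
```

The second call shows the benefit of the schema: the document is still well-formed XML, but it is rejected because its content does not match the declared type.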

Page 8: Euclid Data Model 101 - Episode 01: Overview

What are the Euclid-specific XML rules my schema shall comply with?

Find information

- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47

- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf

Need for a fully consistent DataModel: everybody should follow the same rules

Rules are still in development, feedback is welcome and changes might be required

Among existing rules:

- XML Schema file name

- XML file name

- Single root element

- Element identifier name

- Numeric type restriction

- Recursive definitions

- Target namespaces

- Encoding

- Unqualified namespaces

- …

Page 9: Euclid Data Model 101 - Episode 01: Overview

How is the DataModel SVN repository structured?

Find information

- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf

- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html

- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management

EC/SGS/ST/4-2-05-DM/schema

Classic SVN structure

- trunk: latest stable work

- branches: parallel development of specific features

- tags: official releases

Dictionary and Interfaces for your products

- Dictionary: definition of the complexTypes and elements of your product's entire DataModel

- Interfaces: definition of the data exchanged between components. Only one root element per type, which you can see as a variable to access a product.

Page 10: Euclid Data Model 101 - Episode 01: Overview

How are XML namespaces structured?

Find information

- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf

- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html

- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management

EC/SGS/ST/4-2-05-DM/schema

Under Dictionary and Interfaces, 4 top-level namespaces

- bas: common definitions shared by everyone

- ins: instrument specific definitions

- pro: OU-specific definitions

- sys: system specific definitions (storage, processing…)

/pro sub-levels

- one directory per OU

- one responsible custodian per directory

Page 11: Euclid Data Model 101 - Episode 01: Overview

What should my DataModel contain?

Find information

- Fits DataModel (see dictionary and interfaces): schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd

- DM Workshop DataContainer presentation: http://euclid.roe.ac.uk/attachments/download/2765

- DM wiki homepage: http://euclid.roe.ac.uk/projects/eucrma/wiki

Your DataModel should contain:

- definitions of pipeline inputs

- definitions of output products

- definitions of intermediate elements used in your code

Your DataModel can use:

- new elements you define

- already existing elements

- dataContainers for files with no specific definition

<sgs:dataContainer> must have:

- ID

- Filename

- StorageNode

- Path

Page 12: Euclid Data Model 101 - Episode 01: Overview

What software can I use to write XML?

Find information

- Altova XMLSpy: http://www.altova.com/xmlspy.html

- Oxygen XML Editor: http://www.oxygenxml.com/

Of course, any text editor allows you to read and write XML

One of these two powerful XML development environments is recommended:

- Altova XMLSpy (license from 400€)

- Oxygen XML Editor (license from 99$ - 30 days free trial)

Main features:

- Project-oriented browsing, handling dependencies between files

- XML validation and detection of errors

- Content completion for elements, attributes & values

- Schema modeling with graph representation

Page 13: Euclid Data Model 101 - Episode 01: Overview

How can I check if my DataModel is correct?

Find information

- Altova XMLSpy: http://www.altova.com/xmlspy.html

- Oxygen XML Editor: http://www.oxygenxml.com/

- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47

- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf

- DataModelChecker readme (SVN): EC\SGS\ST\4-2-05-DM\tools\trunk\DataModelChecker\doc

Use Oxygen or XMLSpy to validate your XML and XML Schema files

- Well-formed XML: correct language syntax

- Document validation: XML conforms to the XML Schema definition

Use Euclid Data Model Checker tools

- Check compliance with Euclid DataModel rules

- Python module & scripts available in Euclid SVN (EC\SGS\ST\4-2-05-DM\tools\trunk\DataModelChecker)
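The first level of checking, well-formedness, can be illustrated with the Python standard library alone. This is only a sketch of that single check: the function name is_well_formed is hypothetical, and it does not reproduce what the Euclid Data Model Checker or an XML editor does for schema validation and Euclid rule compliance.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML (correct language syntax)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for syntax errors such as mismatched or unclosed tags.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<coord><x>12.05</x><y>3.1</y></coord>"))
    # The <x> element is closed with </y>: not well-formed.
    print(is_well_formed("<coord><x>12.05</y></coord>"))
```

A document that passes this check can still be invalid with respect to its schema, which is why document validation and rule compliance are separate steps.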

Page 14: Euclid Data Model 101 - Episode 01: Overview

How can I use the DataModel in my code?

Find information

- XML data bindings resources: http://www.rpbourret.com/xml/XMLDataBinding.htm

In your pipeline code, you might want to:

- Read and modify existing XML files

- Produce new XML files

- Manipulate data as specified in the DataModel (no XML file)

Multiple ways to do that:

- Manually parse XML files (must be avoided)

- Use XPath and XML libraries (Python lxml)

- Use bindings generation (preferred way)

[Diagram: pipeline code uses the Data Model; data in, data out]

Bindings: XML Schema elements become class definitions; an XML product becomes an object instance
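To give an idea of the XPath-based approach, here is a minimal sketch using the Python standard library module xml.etree.ElementTree, which supports a subset of XPath (lxml, named above, offers a similar API with full XPath support). The coord document and the helper names read_coord and set_y are illustrative assumptions, not Euclid code.

```python
import xml.etree.ElementTree as ET

COORD_XML = "<coord><x>12.05</x><y>3.1</y></coord>"


def read_coord(xml_text):
    """Read x and y from a coord document using path expressions."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("./x")), float(root.findtext("./y"))


def set_y(xml_text, new_y):
    """Return a new coord document with the y value replaced."""
    root = ET.fromstring(xml_text)
    root.find("./y").text = str(new_y)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(read_coord(COORD_XML))
    print(set_y(COORD_XML, 4.2))
```

With this approach the code still manipulates element paths and text values directly, so type conversions such as float() remain the caller's responsibility.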

Page 15: Euclid Data Model 101 - Episode 01: Overview

How do I use XML data bindings?

Find information

- Python Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/PythonBinding

- C++ Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/CppBinding

- DMWorkshop Python bindings presentation: http://euclid.roe.ac.uk/attachments/download/2734

- DMWorkshop C++ bindings presentation: http://euclid.roe.ac.uk/attachments/download/2745 & http://euclid.roe.ac.uk/attachments/download/2773

First step: generate classes from the DataModel

Two XML binding libraries available for Euclid:

- For Python, based on PyXB

- For C++, based on CodeSynthesis XSD

[Diagram: DataModel XML Schema (.xsd files) → generateStubs.py / generate_allbindings.sh → C++ classes (.hxx & .cxx) and Python classes (.py)]

Second step: use generated classes in your own code

Create and access elements as you would with ordinary classes/objects
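The idea behind bindings can be pictured with a small hand-written stand-in. The sketch below is not output of PyXB or of the Euclid binding scripts, and the Coord class with its from_xml and to_xml methods is entirely hypothetical; it only shows the principle that a schema element becomes a class and an XML product becomes an object with typed attributes.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Coord:
    """Hand-written stand-in for a class generated from a coord schema element."""
    x: float
    y: float

    @classmethod
    def from_xml(cls, xml_text):
        # An XML product becomes an object instance with typed fields.
        root = ET.fromstring(xml_text)
        return cls(x=float(root.findtext("x")), y=float(root.findtext("y")))

    def to_xml(self):
        # Serialize the object back to an XML document.
        root = ET.Element("coord")
        ET.SubElement(root, "x").text = repr(self.x)
        ET.SubElement(root, "y").text = repr(self.y)
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    coord = Coord.from_xml("<coord><x>12.05</x><y>3.1</y></coord>")
    coord.y = 4.2  # plain attribute access, no XML handling in user code
    print(coord.to_xml())
```

Real generated bindings add what this sketch leaves out, such as validation against the schema types and support for namespaces.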

Page 16: Euclid Data Model 101 - Episode 01: Overview

Can I get pre-configured tools all at once?

Find information

- CODEEN yum packages list: https://apceuclidrepo.in2p3.fr/nexus/content/groups/el6.euclid/

- Virtualbox virtualization tool: https://www.virtualbox.org/

- VMWare virtualization tool: http://www.vmware.com/fr/products/player/

We are building a virtual machine you can use on your own computer

Based on Scientific Linux 6 (OS supported for Euclid)

Linked to Euclid CODEEN yum repository for package installation

Linked to Euclid SVN for source code checkin/checkout

Containing:

- Required software libraries

- Pre-configured development environment

- C++ & Python bindings generation libraries

- Data Model Checker tools

- … and more

Still in development, hopefully available soon

Page 17: Euclid Data Model 101 - Episode 01: Overview

In the next episode…

TIPS DataModel from its creation to the pipeline code

Stay tuned!