Euclid Data Model 101 - Episode 01: Overview
DESCRIPTION
First episode of Euclid Data Model 101. High-level presentation of the data modeler's job & available tools.
TRANSCRIPT
High level presentation of data modeler’s job & available tools
INTRODUCTION Objectives of Euclid Datamodel 101
Slidecast dedicated to Euclid data modelers & developers
Released as multiple episodes over time
1st episode: high-level overview of tools and process
2nd episode: the TIPS example
Following episodes: zoom on technical points
Help you understand what is expected and how to do it
INTRODUCTION Objectives and contents of this presentation
Get an overview of the data modeling process
Understand the data model workflow
Know where to find information
Know what tools are available
No complex details or technical information here, but high-level information and pointers in the right direction.
1 - Understand Euclid DataModel
- Why use a Euclid DataModel?
- Why choose XML?
- What is XML Schema?
- What are the Euclid-specific XML rules my schema shall comply with?
- How is the DataModel SVN repository structured?
- How are XML namespaces structured?
2 - Create your own DataModel
- What should my DataModel contain?
- What software can I use to write XML?
- How can I check if my DataModel is correct?
3 - Use the DataModel in your own code
- How can I use the DataModel in my code?
- How can I use XML data bindings?
- Can I get pre-configured tools all at once?
SUMMARY The data modeling process
Why use a Euclid DataModel?
The Euclid mission relies heavily on data transfer and manipulation
Data consistency between OUs, workflows, pipelines and storage is a key point
Your data products will be:
- stored on EAS
- queryable from EAS
- transmitted to/from pipelines by IAL
Your DataModel will be:
- used to structure the EAS database
- manipulated by your pipeline code
[Diagram labels: EAS, SDC, IAL, DataModel, Pipeline code, Data products in/out, Compliant, use; DESIGN TIME vs. RUN TIME]
Why choose XML?
Find information
- W3Schools tutorials: http://www.w3schools.com/schema/
<coord> <x>12.05</x> <y>3.1</y> </coord>
The XML language brings many benefits:
- Easy to read and understand by humans and machines
- Many tools available to create, control and check XML
- Widely used and supported across the world
- Self-contained: expresses both data and data structure
- Strong type/namespace control and definition
XML was chosen over many other alternatives
What is XML Schema?
Two file formats you should be familiar with:
- XML: contains the actual data
  <coord> <x>12.05</x> <y>3.1</y> </coord>
- XSD (XML Schema): describes the data structure
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float" />
        <xs:element name="y" type="xs:float" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
Find information
- W3Schools tutorials: http://www.w3schools.com/schema/
- Highlights on XML/XSD (DM Workshop): http://euclid.roe.ac.uk/attachments/download/2744/Workshop_Nov2013_XSD_XML%20-%202.02.ppt
The XML file complies with the XSD (XML Schema).
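To make the "XML complies with XSD" relationship concrete, here is a minimal Python sketch using only the standard library. It is a toy check written for this one example, not a real XSD validator: it only compares child element names and order against the xs:sequence and tries to parse xs:float values with Python's float(), which is more permissive than the XSD float type. The function names are hypothetical. Real validation should be done with a schema-aware tool.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float"/>
        <xs:element name="y" type="xs:float"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

XML_TEXT = "<coord> <x>12.05</x> <y>3.1</y> </coord>"


def declared_children(xsd_text, root_name):
    """Return [(child name, type), ...] declared in the sequence of root_name."""
    schema = ET.fromstring(xsd_text)
    for element in schema.findall(f"{XS}element"):
        if element.get("name") == root_name:
            sequence = element.find(f"{XS}complexType/{XS}sequence")
            return [(c.get("name"), c.get("type"))
                    for c in sequence.findall(f"{XS}element")]
    raise KeyError(f"no top-level element named {root_name!r}")


def complies(xml_text, xsd_text):
    """Toy check: same children in the same order, floats parseable."""
    root = ET.fromstring(xml_text)
    expected = declared_children(xsd_text, root.tag)
    if [child.tag for child in root] != [name for name, _ in expected]:
        return False
    for child, (_, type_name) in zip(root, expected):
        if type_name == "xs:float":
            try:
                float(child.text)
            except (TypeError, ValueError):
                return False
    return True
```

With these definitions, complies(XML_TEXT, XSD_TEXT) accepts the slide's example, while a document with the children swapped or with non-numeric text in x is rejected.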
What are the Euclid-specific XML rules my schema shall comply with?
Find information
- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
A fully consistent DataModel is needed: everybody should follow the same rules
Rules are still in development, feedback is welcome and changes might be required
Among existing rules:
- XML Schema file name
- XML file name
- Single root element
- Element identifier name
- Numeric type restriction
- Recursive definitions
- Target namespaces
- Encoding
- Unqualified namespaces
- …
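As an illustration of how such rules can be checked automatically, the following Python sketch inspects a schema for two of the rule names listed above. The exact wording of the official rules is not reproduced here, so the interpretation is an assumption: "Target namespaces" is read as "the schema declares a targetNamespace" and "Single root element" as "exactly one top-level xs:element". The function name and the example namespace URI are hypothetical; this is not the Euclid Data Model Checker.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def check_schema(xsd_text):
    """Return a list of problems found (empty list if none)."""
    schema = ET.fromstring(xsd_text)
    problems = []
    # Assumed reading of the "Target namespaces" rule.
    if not schema.get("targetNamespace"):
        problems.append("no targetNamespace declared")
    # Assumed reading of the "Single root element" rule.
    roots = schema.findall(f"{XS}element")
    if len(roots) != 1:
        problems.append(f"expected 1 top-level element, found {len(roots)}")
    return problems


# Hypothetical schemas used only to exercise the checks.
GOOD_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/hypothetical/demo">
  <xs:element name="coord" type="xs:string"/>
</xs:schema>
"""

BAD_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:string"/>
  <xs:element name="b" type="xs:string"/>
</xs:schema>
"""
```

Calling check_schema(GOOD_XSD) returns an empty list, while check_schema(BAD_XSD) reports both problems.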
How is the DataModel SVN repository structured?
Find information
- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
EC/SGS/ST/4-2-05-DM/schema
Classic SVN structure
- trunk: latest stable work
- branches: parallel development of specific features
- tags: official releases
Dictionary and Interfaces for your products
- Dictionary: definition of the complexTypes and elements of your products' entire DataModel
- Interfaces: definition of the data exchanged between components. One root element only per type, which you can see as a variable to access a product.
How are xml namespaces structured?
Find information
- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
EC/SGS/ST/4-2-05-DM/schema
Under Dictionary and Interfaces, 4 top-level namespaces
- bas: common definitions shared by everyone
- ins: instrument specific definitions
- pro: OU-specific definitions
- sys: system specific definitions (storage, processing…)
/pro sub-levels
- one directory per OU
- one responsible custodian per directory
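The layout described above can be sketched as a small Python helper that classifies a schema path by area (Dictionary or Interfaces), top-level namespace and, under pro, OU directory. The helper name is hypothetical, and the assumption that the directory right after pro names the OU follows from "one directory per OU".

```python
from pathlib import PurePosixPath

# Top-level namespaces, as listed on the slide.
TOP_LEVEL = {
    "bas": "common definitions shared by everyone",
    "ins": "instrument-specific definitions",
    "pro": "OU-specific definitions",
    "sys": "system-specific definitions (storage, processing...)",
}


def classify(schema_path):
    """Return (area, namespace, ou); ou is None outside the pro namespace."""
    parts = PurePosixPath(schema_path).parts
    for index, part in enumerate(parts):
        if part in ("Dictionary", "Interfaces"):
            break
    else:
        raise ValueError("path is not under Dictionary or Interfaces")
    if index + 1 >= len(parts) or parts[index + 1] not in TOP_LEVEL:
        raise ValueError("unknown top-level namespace")
    namespace = parts[index + 1]
    ou = None
    # Under pro, the next directory (if any) is taken to be the OU directory.
    if namespace == "pro" and len(parts) > index + 3:
        ou = parts[index + 2]
    return parts[index], namespace, ou


example = classify("schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd")
```

For the TIPS schema path quoted in this presentation, the helper returns the Dictionary area, the pro namespace and the sim directory.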
What should my DataModel contain?
Find information
- Fits DataModel (see dictionary and interfaces): schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd
- DM Workshop DataContainer presentation: http://euclid.roe.ac.uk/attachments/download/2765
- DM wiki homepage: http://euclid.roe.ac.uk/projects/eucrma/wiki
Your DataModel should contain:
- definitions of pipeline inputs
- definitions of output products
- definitions of intermediate elements used in your code
Your DataModel can use:
- new elements you define
- already existing elements
- dataContainers for files with no specific definition
<sgs:dataContainer>
Must have:
- ID
- Filename
- StorageNode
- Path
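As a rough illustration, the sketch below builds a dataContainer-like element in Python and checks that the four required fields are present. Several things here are assumptions and not taken from the Euclid schema: the namespace URI bound to the sgs prefix, the choice of child elements (rather than attributes) for the four fields, the helper names and all example values.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI: the real namespace behind the "sgs" prefix is not given here.
SGS_NS = "http://example.org/hypothetical/sgs"
REQUIRED_FIELDS = ("ID", "Filename", "StorageNode", "Path")


def make_data_container(**fields):
    """Build a dataContainer element with one child element per field."""
    container = ET.Element(f"{{{SGS_NS}}}dataContainer")
    for name, value in fields.items():
        ET.SubElement(container, name).text = value
    return container


def missing_fields(container):
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not container.findtext(name)]


# Example values are made up for illustration.
complete = make_data_container(
    ID="example-0001",
    Filename="image.fits",
    StorageNode="node-a",
    Path="/data/example",
)
incomplete = make_data_container(ID="example-0002", Filename="image.fits")
```

Here missing_fields(complete) is empty, while missing_fields(incomplete) lists StorageNode and Path.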
What software can I use to write XML?
Find information
- Altova XMLSpy: http://www.altova.com/xmlspy.html
- Oxygen XML Editor: http://www.oxygenxml.com/
Of course, any text editor allows you to simply read and write XML
One of these two powerful XML development environments is recommended:
- Altova XMLSpy (license from 400€)
- Oxygen XML Editor (license from $99, 30-day free trial)
They offer:
- Project-oriented browsing that handles dependencies between files
- XML validation and detection of errors
- Content completion for elements, attributes & values
- Schema modeling with graph representation
How can I check if my DataModel is correct?
Find information
- Altova XMLSpy: http://www.altova.com/xmlspy.html
- Oxygen XML Editor: http://www.oxygenxml.com/
- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
- DataModelChecker readme (SVN): EC\SGS\ST\4-2-05-DM\tools\trunk\DataModelChecker\doc
Use Oxygen or XMLSpy to validate your XML and XML Schema files
- Well-formed XML: correct language syntax
- Document validation: the XML conforms to the XML Schema definition
Use the Euclid Data Model Checker tools
- Check compliance with Euclid DataModel rules
- Python module & scripts available in Euclid SVN (EC\SGS\ST\4-2-05-DM\tools\trunk\DataModelChecker)
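The first of these checks, well-formedness, can be tried quickly with Python's standard library, as in the minimal sketch below (the function name is hypothetical). It only tests XML syntax; validation against a schema and compliance with the Euclid rules still require the editors and the Data Model Checker mentioned above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


ok = is_well_formed("<coord> <x>12.05</x> <y>3.1</y> </coord>")
broken = is_well_formed("<coord> <x>12.05 <y>3.1</y> </coord>")  # <x> is never closed
```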
How can I use the DataModel in my code?
Find information
- XML data bindings resources: http://www.rpbourret.com/xml/XMLDataBinding.htm
In your pipeline code, you might want to:
- Read and modify existing XML files
- Produce new XML files
- Manipulate data as specified in the DataModel (no XML file)
Multiple ways to do that:
- Manually parse XML files (must be avoided)
- Use XPath and XML libraries (Python lxml)
- Use bindings generation (preferred way)
[Diagram: pipeline code uses the Data Model; data in/out]
Bindings: XML Schema elements become class definitions; an XML product becomes an object instance
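For comparison with bindings, here is a small sketch of the library-based approach: reading, modifying and producing the coord example through path queries. It uses Python's standard xml.etree.ElementTree, which supports a limited XPath subset, as a stand-in for lxml; the function names are hypothetical. Note how element names appear as plain strings throughout the code, which is one reason generated bindings are easier to keep consistent with the DataModel.

```python
import xml.etree.ElementTree as ET

XML_TEXT = "<coord> <x>12.05</x> <y>3.1</y> </coord>"


def read_coord(xml_text):
    """Read an existing document: return (x, y) as floats."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("./x")), float(root.findtext("./y"))


def set_x(xml_text, new_x):
    """Modify an existing document: replace the value of <x>."""
    root = ET.fromstring(xml_text)
    root.find("./x").text = str(new_x)
    return ET.tostring(root, encoding="unicode")


def new_coord(x, y):
    """Produce a new document from scratch."""
    root = ET.Element("coord")
    ET.SubElement(root, "x").text = str(x)
    ET.SubElement(root, "y").text = str(y)
    return ET.tostring(root, encoding="unicode")
```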
How do I use XML data bindings?
Find information
- Python Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/PythonBinding
- C++ Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/CppBinding
- DMWorkshop Python bindings presentation: http://euclid.roe.ac.uk/attachments/download/2734
- DMWorkshop C++ bindings presentation: http://euclid.roe.ac.uk/attachments/download/2745 & http://euclid.roe.ac.uk/attachments/download/2773
First step: generate classes from the DataModel
Two XML binding libraries available for Euclid
For Python, based on PyXB
For C++, based on CodeSynthesis XSD
[Diagram: DataModel XML Schema (.xsd files) → generateStubs.py / generate_allbindings.sh → C++ classes (.hxx & .cxx) and Python classes (.py)]
Second step: use generated classes in your own code
Create and access elements as you would do with usual classes/objects
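To give a feel for what working with bindings looks like, the sketch below hand-writes a Python class for the coord example and round-trips it through XML. It is only a stand-in: real bindings are generated from the .xsd files by the tools above, and the class and method names here (Coord, from_xml, to_xml) are hypothetical and do not reflect the API produced by PyXB or CodeSynthesis XSD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Coord:
    """Hand-written stand-in for a class generated from the coord element."""
    x: float
    y: float

    @classmethod
    def from_xml(cls, xml_text):
        """Turn an XML product into an object instance."""
        root = ET.fromstring(xml_text)
        return cls(x=float(root.findtext("x")), y=float(root.findtext("y")))

    def to_xml(self):
        """Serialize the object back to an XML string."""
        root = ET.Element("coord")
        ET.SubElement(root, "x").text = repr(self.x)
        ET.SubElement(root, "y").text = repr(self.y)
        return ET.tostring(root, encoding="unicode")


# Access and modify data as ordinary attributes, with no XML handling in sight.
coord = Coord.from_xml("<coord> <x>12.05</x> <y>3.1</y> </coord>")
coord.x = 0.5
round_trip = Coord.from_xml(coord.to_xml())
```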
Can I get pre-configured tools all at once?
Find information
- CODEEN yum packages list: https://apceuclidrepo.in2p3.fr/nexus/content/groups/el6.euclid/
- Virtualbox virtualization tool: https://www.virtualbox.org/
- VMWare virtualization tool: http://www.vmware.com/fr/products/player/
We are building a virtual machine you can use on your own computer
Based on Scientific Linux 6 (OS supported for Euclid)
Linked to Euclid CODEEN yum repository for package installation
Linked to Euclid SVN for source code checkin/checkout
Containing:
- Required software libraries
- Pre-configured development environment
- C++ & Python bindings generation libraries
- Data Model Checker tools
- … and more
Still in development, hopefully available soon
In the next episode…
TIPS DataModel from its creation to the pipeline code
Stay tuned!