slides apde2002 uren data dictionary

Upload: starofearth4u

Post on 30-May-2018


  • 8/14/2019 SLIDES APDE2002 Uren Data Dictionary

    1/16

    Developing a Distributed Data Dictionary Service

    Jim Uren

    Jet Propulsion Laboratory

    California Institute of Technology

    Design Hub, KM Standards Working Group & EDA Team

    April 11, 2002


    2001-07-30 2 Developing A Distributed Data Dictionary Service

    Problem

    1. Data dictionaries mean different things to different people:

    Vocabularies - human-readable collections of terms and definitions pertaining to a domain

    Data element dictionaries - machine-interpretable collections of data elements (from ISO/IEC 11179)

    Schemas (information models) - structured, machine-interpretable collections of information models consisting of structured relationships between data elements

    2. Dictionaries do not communicate with each other


    What is Needed

    A mechanism that can be used to access, publish, update, relate and integrate data dictionaries (vocabularies, data elements, and data models)

    Mechanism must be able to span domains and subdomains, e.g., engineering, science, and administrative

    Mechanism must have both manual and automated interfaces

    Mechanism should follow the distributed service model (e.g., DNS, the Internet Domain Name Service, X.500 Directory, etc.)


    A Solution

    Develop a distributed data dictionary service using:

    LDAP Internet service protocol (Lightweight Directory Access Protocol)

    ISO 11179 - a specification for standard data elements

    DSML XML DTD/Schema (Directory Services Markup Language)

    Dublin Core Meta-data

    The Service will store and relate vocabulary, data elements, and data model information


    Advantages of LDAP

    LDAP has many advantages, including:

    Universal Access - Internet directory standard, widely adopted and implemented by numerous vendors and open source software solutions

    Simple - a relatively simple, high-level protocol with a straightforward API

    Extensible - easily extended and adapted

    Access Control and Security - connections can be authenticated and secured using layered Internet security mechanisms

    Multi-Platform Development - C/C++, Perl, Java, JavaScript, Python, PHP and other APIs are available, making LDAP services accessible from virtually any language, platform, or development environment


    What is LDAP?

    An Internet Standard from an IETF working group

    RFC 1777 Lightweight Directory Access Protocol

    RFC 1778 String Representation of Standard Attribute Syntaxes

    RFC 1779 String Representation of Distinguished Names

    RFC 1959 LDAP URL Format

    RFC LDAP API

    A distributed, hierarchical database

    Uses a multi-part naming convention to create unique records (distinguished names)

    cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO

    cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO

    cn=TBR-apha1, dc=schema, dc=Part233, dc=10303, dc=ISO

    Includes ability to implement multiple levels of security
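
The multi-part naming on this slide can be illustrated with a short Python sketch. It assumes simple names with no escaped commas and no multi-valued components, both of which real LDAP distinguished names permit. The helper names `parse_dn` and `parent_dn` are hypothetical and do not belong to any LDAP library.

```python
from __future__ import annotations


def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split a simple DN into (attribute, value) pairs, most specific first."""
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        if not attr or not value:
            raise ValueError(f"malformed component: {rdn!r}")
        pairs.append((attr.strip(), value.strip()))
    return pairs


def parent_dn(dn: str) -> str:
    """Return the DN of the parent entry by dropping the leftmost component."""
    pairs = parse_dn(dn)
    return ", ".join(f"{attr}={value}" for attr, value in pairs[1:])


dn = "cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO"
components = parse_dn(dn)
print(components[0])   # ('cn', 'behaviour')
print(parent_dn(dn))   # dc=vocabulary, dc=Part233, dc=10303, dc=ISO
```

Reading a name left to right moves from the individual record up to the root, which is why dropping the leftmost component yields the enclosing container.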


    Example of an LDAP tree

    [Figure: LDAP tree. Root: ISO. Second level: 10303, 14496, 9000. Under 10303: 203, 209, 210, ..., 233, 235, 237, ... Under 233: Vocabulary, Data Elements, Schema.]
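
One way to see how distinguished names give rise to a tree of this shape is to walk each name from its rightmost component (the root) to its leftmost (the record) and insert it into a nested structure. The sketch below does this with plain dictionaries, using two of the example names from the "What is LDAP?" slide. The functions `build_tree` and `render` are hypothetical helpers, and the sketch assumes names without escaped commas.

```python
def build_tree(dns):
    """Insert each DN into a nested dict, walking from the root component down."""
    tree = {}
    for dn in dns:
        node = tree
        # The rightmost component is the root, so visit the components in reverse.
        for rdn in reversed([part.strip() for part in dn.split(",")]):
            value = rdn.partition("=")[2]
            node = node.setdefault(value, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, with children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


dns = [
    "cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO",
    "cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO",
]
tree = build_tree(dns)
print("\n".join(render(tree)))
```

Both names share the path ISO, 10303, Part233, so they end up as siblings two levels below that shared branch.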


    Data Dictionary Components for a given namespace


    Using Standards-based technology: LDAP Protocol | ISO 11179 meta-data schema | DSML | Dublin Core

    Prototype service viewable at: http://step.jpl.nasa.gov/ldap

    Supporting Validation Scenarios

    Supporting Automated Processes

    Supporting Terminology Lookups

    Supporting Data Modeling Activities


    A Proposed Data Element Naming Convention

    A structured, multi-part naming system similar to IP addressing and URLs

    dot-delimited names

    follows convention used by Dublin Core Meta-data Initiative

    short-name aliases could be supported in the planned distributed data dictionary service

    e.g., author = DC.Creator, keyword = DC.Subject, etc.

    Names would consist of domains, descriptors and qualifiers.
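
A minimal Python sketch of how such names might be parsed and how short-name aliases might be resolved. The slides do not define exactly where domain, descriptor and qualifier boundaries fall, so the split used here (first part is the domain, second the descriptor, the rest qualifiers) is an assumption. `ElementName`, `resolve` and `parse_name` are hypothetical names. The alias table holds only the two examples given on the slide.

```python
from typing import NamedTuple, Tuple


class ElementName(NamedTuple):
    domain: str
    descriptor: str
    qualifiers: Tuple[str, ...]


# Short-name aliases taken from the slide's own examples.
ALIASES = {"author": "DC.Creator", "keyword": "DC.Subject"}


def resolve(name: str) -> str:
    """Expand a short-name alias (case-insensitive); pass full names through."""
    return ALIASES.get(name.lower(), name)


def parse_name(name: str) -> ElementName:
    """Split a dot-delimited name, assuming domain.descriptor[.qualifier...]."""
    parts = resolve(name).split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected at least domain.descriptor, got {name!r}")
    return ElementName(parts[0], parts[1], tuple(parts[2:]))


print(parse_name("author"))
print(parse_name("DC.Date.Created"))
```

Resolving the alias before splitting means a client can accept either form and still arrive at the same fully qualified element.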


    Examples of the Data Element Naming Convention within JPL Domains

    Dublin Core Meta-data Initiative (a JPL-adopted standard)

    DC.Date

    DC.Date.Created

    DC.Date.LastModified

    JPL's Planetary Data System (PDS)

    PDS.Target_Name

    PDS.Sampling_Factor

    JPL's Product Data Management System (PDMS)

    PDMS.Version

    PDMS.ReferenceDesignator

    JPL New Business System (NBS)

    NBS.HR.start_date

    NBS.HR.employee_status


    Terminology Lookup Scenarios

    Resolving Ambiguous Terminology - an end user, needing to clarify the use and meaning of a word used in a specific context, performs a multi-domain vocabulary lookup across multiple DD services, looking for the published vocabulary of the referenced domain

    Finding the Correct Acronym - an end user, confronted with a number of new acronyms used in a presentation, accesses a local DD service to look up the acronyms within probable domains, thereby eliminating the alternative meanings, e.g., searching for STEP standards work versus the JPL STEP project

    Enabling Improved Search Engine Performance - as a search engine scans through a document, it discovers a keyword list and finds a reserved word; the document includes a reference to a domain-specific vocabulary list in a DD service; the search engine uses this vocabulary to be certain it is indexing the keywords in the right context

    Building Glossaries for Technical Papers - an engineer or scientist writing a technical paper needs to include a glossary of relevant terms in the paper; by performing a multi-service search, terms and definitions that relate to the topic of the paper are quickly found and inserted into the paper with the corresponding attributions
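
The acronym scenario can be sketched as a lookup that searches several services and optionally narrows the result to probable domains. In this illustration the services are in-memory dictionaries standing in for real DD services, the domain labels are assumptions, and the definitions are placeholders rather than published vocabulary. `SERVICES` and `lookup` are hypothetical names.

```python
# In-memory stand-ins for separate DD services; a real client would query each one.
# Keys are (domain, term). The definitions are placeholders, not published vocabulary.
SERVICES = {
    "local": {
        ("JPL", "STEP"): "placeholder: the JPL STEP project",
    },
    "remote": {
        ("ISO.10303", "STEP"): "placeholder: the STEP standards work",
    },
}


def lookup(term, domains=None):
    """Search every service for a term, optionally restricted to probable domains."""
    hits = []
    for service, vocabulary in SERVICES.items():
        for (domain, entry), definition in vocabulary.items():
            if entry.lower() != term.lower():
                continue
            if domains is not None and domain not in domains:
                continue
            hits.append({"service": service, "domain": domain, "definition": definition})
    return hits


all_hits = lookup("STEP")                          # ambiguous: one hit per domain
narrowed = lookup("step", domains={"ISO.10303"})   # restricted to the probable domain
print(len(all_hits), narrowed[0]["service"])
```

Keeping the service and domain with each hit is what would let a glossary builder attach the corresponding attribution to every definition it inserts.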


    Validation Scenarios

    Validating Units of Measure - a system integrator receives an MCAD geometry model (e.g., STEP AP203 Part 21 file) of a component to be integrated into an assembly; automatically, a standard validation routine is performed against the schema located in a referenced data dictionary that checks for use of the units of measure called for in the contract and identified in the exchange file

    Enabling Automated Repository Check-In - as a STEP model is checked into a PDM system, an automated validation routine checks the model using the schema (located in the DD service) that is identified in the Part 21 data file

    Improving Quality of Data Handoffs - an MCAD geometry model is sent from design to thermal analysis and validation is performed using the correct schema version as referenced in the model; validation is an automated process that occurs before any work is done with the model as it is transferred between domains

    Validating for Adequacy and Range - the PDS (NASA's Planetary Data System) central node receives a dataset description in template format to be ingested into the dataset catalogue database. Automatically, a standard validation routine is performed that checks for required keywords, keyword values and value types in the dataset in template format against a corresponding structure stored in the PDS domain of the data dictionary service
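
The last scenario (checking required keywords and value types against a stored structure) can be sketched in Python as follows. The keyword names are borrowed from the PDS naming examples earlier in the deck, but the rules attached to them (which keyword is required, which type is expected) are invented for illustration. `PDS_STRUCTURE` and `validate` are hypothetical names, and a real service would fetch the structure from the dictionary rather than hard-code it.

```python
# Hypothetical structure for one domain: keyword -> (required?, expected type).
# The rules are invented for illustration and are not taken from the PDS.
PDS_STRUCTURE = {
    "Target_Name": (True, str),
    "Sampling_Factor": (False, float),
}


def validate(record, structure):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for keyword, (required, expected_type) in structure.items():
        if keyword not in record:
            if required:
                problems.append(f"missing required keyword: {keyword}")
            continue
        if not isinstance(record[keyword], expected_type):
            problems.append(
                f"{keyword}: expected {expected_type.__name__}, "
                f"got {type(record[keyword]).__name__}"
            )
    # Keywords the structure does not know about are reported as well.
    for keyword in record:
        if keyword not in structure:
            problems.append(f"unknown keyword: {keyword}")
    return problems


good = {"Target_Name": "MARS", "Sampling_Factor": 2.0}
bad = {"Sampling_Factor": "two", "Colour": "red"}
print(validate(good, PDS_STRUCTURE))  # []
print(validate(bad, PDS_STRUCTURE))
```

Returning every problem instead of stopping at the first one suits an automated ingest step, where the submitter benefits from a complete report.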


    Data Modeling Scenarios

    Data Reuse in Modelling Activities - a data modeller, charged with developing an information model for a new application, uses data elements published in several DD services (much like a parts library), ensuring that the new information model will have compatible interfaces with data sets that share the same data elements or collection of elements

    Creating a TDP (technical data package) - an application performs a schema check against objects about to be wrapped into a TDP (e.g., STEP AP232 or PDM Schema TDP) to ensure their correct structure and meta-data content

    Data Integration Enabled - an analyst, charged with integrating data from two or more data sets, accesses the correct version of each schema as referenced in the data set from the DD service space, allowing them to identify/map interfaces between the data sets, e.g., MCAD-ECAD-cost data

    Extending a schema - to solve a "local" problem, a data modeller uses data elements from a published collection of data items to extend an existing official schema; the new schema is published in the DD service with traces/links back to the official schema
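
The schema-extension scenario can be sketched as a record that carries a link back to the schema it was derived from. This is an illustration under assumptions: the `Schema` class, the `extend` function, the `name@version` reference format and the schema names are all hypothetical. The element names are reused from the PDMS naming examples earlier in the deck.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Schema:
    name: str
    version: str
    elements: Tuple[str, ...]
    derived_from: Optional[str] = None  # trace back to the official schema, if any

    @property
    def ref(self) -> str:
        """Reference string used when another schema links back to this one."""
        return f"{self.name}@{self.version}"


def extend(base, name, version, extra_elements):
    """Create a local schema that adds elements and records where it came from."""
    added = tuple(e for e in extra_elements if e not in base.elements)
    return Schema(name, version, base.elements + added, derived_from=base.ref)


official = Schema("official-schema", "1.0", ("PDMS.Version",))
local = extend(official, "local-schema", "0.1",
               ["PDMS.Version", "PDMS.ReferenceDesignator"])
print(local.elements, local.derived_from)
```

Because the link names a specific version, an analyst integrating data sets later could still find the exact official schema the local one was built on.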


    What's next? (Completing the prototype)

    Architecture development: UML Model (50%), Naming Convention (50%), Linking ontology (25%)

    Server configuration: 2nd and 3rd DD test nodes (33%), Wrapping existing DD DBs (10%)

    Client configurations: LDAP URL (75%), Java (33%), Python (33%), Perl (33%), C/C++ (75%), Unix Shell (25%), PHP (25%), Native clients (25%)

    Reporting Module: Glossary Builder (0%), History/CM Report (0%), Summarized Listings (0%)

    Testing (0%): Scenarios, Server, Clients, Data population

    Documentation: White Paper (100%), FAQ (20%), Server Configuration Guide (15%), Recommended Best Practices (10%)