slides apde2002 uren data dictionary
TRANSCRIPT
-
8/14/2019 SLIDES APDE2002 Uren Data Dictionary
Developing a Distributed Data
Dictionary Service
Jim Uren
Jet Propulsion Laboratory, California Institute of Technology
Design Hub, KM Standards Working Group & EDA Team
April 11, 2002
-
2001-07-30 Developing A Distributed Data Dictionary Service
Problem
1. Data dictionaries mean different things to different people:
   - Vocabularies - human-readable collections of terms and definitions pertaining to a domain
   - Data element dictionaries - machine-interpretable collections of data elements (from ISO/IEC 11179)
   - Schemas (information models) - structured, machine-interpretable collections of information models consisting of structured relationships between data elements
2. Dictionaries do not communicate with each other
-
What is Needed
- A mechanism that can be used to access, publish, update, relate, and integrate data dictionaries (vocabularies, data elements, and data models)
- Mechanism must be able to span domains and subdomains, e.g., engineering, science, and administrative
- Mechanism must have both manual and automated interfaces
- Mechanism should follow the distributed service model (e.g., DNS, the Internet Domain Name Service; X.500 Directory; etc.)
-
A Solution
Develop a distributed data dictionary service using:
- LDAP (Lightweight Directory Access Protocol) Internet service protocol
- ISO/IEC 11179 - a specification for standard data elements
- DSML (Directory Services Markup Language) XML DTD/Schema
- Dublin Core Meta-data
The service will store and relate vocabulary, data element, and data model information
-
Advantages of LDAP
LDAP has many advantages, including:
- Universal Access - Internet directory standard, widely adopted and implemented by numerous vendors and open source software solutions
- Simple - a relatively simple, high-level protocol with a straightforward API
- Extensible - easily extended and adapted
- Access Control and Security - connections can be authenticated and secured with layered Internet security mechanisms
- Multi-Platform Development - C/C++, Perl, Java, JavaScript, Python, PHP, and other APIs are available, making LDAP services accessible from virtually any language, platform, or development environment
-
What is LDAP?
- An Internet Standard from an IETF working group
  - RFC 1777 Lightweight Directory Access Protocol
  - RFC 1778 String Representation of Standard Attribute Syntaxes
  - RFC 1779 String Representation of Distinguished Names
  - RFC 1959 LDAP URL Format
  - RFC LDAP API
- A distributed, hierarchical database
- Uses a multi-part naming convention to create unique records (distinguished names)
  - cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO
  - cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO
  - cn=TBR-apha1, dc=schema, dc=Part233, dc=10303, dc=ISO
- Includes ability to implement multiple levels of security
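The multi-part naming convention can be illustrated with a minimal Python sketch that assembles and splits distinguished names shaped like the slide's examples, and wraps one in a basic LDAP URL. The helper names (build_dn, parse_dn, ldap_url) and the host dd.example.org are hypothetical, and the comma splitting is a simplification that ignores the quoting and escaping rules of RFC 1779.

```python
from urllib.parse import quote


def build_dn(common_name, *components):
    """Join a record name and its containers into a DN string.

    Containers are listed from most specific to the root, matching
    the order used in the slide's examples.
    """
    parts = [f"cn={common_name}"] + [f"dc={c}" for c in components]
    return ", ".join(parts)


def parse_dn(dn):
    """Split a DN into (attribute, value) pairs.

    Simplification: splits on plain commas only; escaped or quoted
    commas are not handled.
    """
    pairs = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        pairs.append((attr.strip(), value.strip()))
    return pairs


def ldap_url(host, dn):
    """Build a basic ldap://host/dn URL, percent-encoding the DN."""
    compact = ",".join(f"{a}={v}" for a, v in parse_dn(dn))
    return f"ldap://{host}/{quote(compact, safe='=,')}"


dn = build_dn("behaviour", "vocabulary", "Part233", "10303", "ISO")
print(dn)
print(parse_dn(dn)[0])
print(ldap_url("dd.example.org", dn))  # hypothetical host
```

Reading a DN right to left walks from the root (ISO) down to the individual record, which is what makes each record name unique within the hierarchy.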
-
Example of an LDAP tree
[Figure: directory tree rooted at ISO. Second level: 10303, 14496, 9000. Under 10303: parts 203, 209, 210, ..., 233, 235, 237, ... Under 233: Vocabulary, Data Elements, Schema.]
-
Data Dictionary Components
for a given namespace
-
using Standards-based technology:
LDAP Protocol | ISO 11179 meta-data schema | DSML | Dublin Core
Prototype service viewable at:
http://step.jpl.nasa.gov/ldap
- Supporting Validation Scenarios
- Supporting Automated Processes
- Supporting Terminology Lookups
- Supporting Data Modeling Activities
-
A Proposed Data Element Naming Convention
- A structured, multi-part naming system similar to IP addressing and URLs
  - dot-delimited names
  - follows the convention used by the Dublin Core Meta-data Initiative
  - short-name aliases could be supported in the planned distributed data dictionary service
    - e.g., author = DC.Creator, keyword = DC.Subject, etc.
- Names would consist of domains, descriptors, and qualifiers.
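A minimal Python sketch of how such names might be handled: it resolves a short-name alias and splits a dot-delimited name into its parts. The slide does not say which position carries which role, so treating the first part as the domain, the second as the descriptor, and the rest as qualifiers is an assumption; the ElementName class, parse_name function, and ALIASES table are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical alias table, seeded with the two aliases named on the slide.
ALIASES = {"author": "DC.Creator", "keyword": "DC.Subject"}


@dataclass(frozen=True)
class ElementName:
    domain: str
    descriptor: str
    qualifiers: tuple = ()

    def __str__(self):
        return ".".join((self.domain, self.descriptor) + self.qualifiers)


def parse_name(name):
    """Resolve an alias if one exists, then split the name on dots.

    Assumption: position 1 is the domain, position 2 the descriptor,
    and any remaining parts are qualifiers.
    """
    full = ALIASES.get(name, name)
    parts = full.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected domain.descriptor[.qualifier...]: {name!r}")
    return ElementName(parts[0], parts[1], tuple(parts[2:]))


print(parse_name("DC.Date.Created"))
print(parse_name("author"))
```

A real service would presumably store aliases per domain rather than in one flat table; the flat dictionary here only keeps the sketch short.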
-
Examples of the Data Element Naming Convention within JPL Domains
- Dublin Core Meta-data Initiative (a JPL-adopted standard)
  - DC.Date
  - DC.Date.Created
  - DC.Date.LastModified
- JPL's Planetary Data System (PDS)
  - PDS.Target_Name
  - PDS.Sampling_Factor
- JPL's Product Data Management System (PDMS)
  - PDMS.Version
  - PDMS.ReferenceDesignator
- JPL New Business System (NBS)
  - NBS.HR.start_date
  - NBS.HR.employee_status
-
Terminology Lookup Scenarios
- Resolving Ambiguous Terminology - an end user, needing to clarify the use and meaning of a word used in a specific context, performs a multi-domain vocabulary lookup across multiple DD services, looking for the published vocabulary of the referenced domain
- Finding the Correct Acronym - an end user, confronted with a number of new acronyms used in a presentation, accesses a local DD service to look up the acronyms within probable domains, thereby eliminating the alternative meanings, e.g., searching for STEP standards work versus the JPL STEP project
- Enabling Improved Search Engine Performance - as a search engine scans through a document, it discovers a keyword list and finds a reserved word; the document includes a reference to a domain-specific vocabulary list in a DD service; the search engine uses this vocabulary to be certain it is indexing the keywords in the right context
- Building Glossaries for Technical Papers - an engineer or scientist writing a technical paper needs to include a glossary of relevant terms in the paper; by performing a multi-service search, terms and definitions that relate to the topic of the paper are quickly found and inserted into the paper with the corresponding attributions
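The multi-service, multi-domain lookup in these scenarios can be sketched in Python with in-memory dictionaries standing in for DD services. The service names, domain labels, and definitions below are placeholders invented for illustration, not real dictionary content, and a real client would query each service over LDAP rather than read a local structure.

```python
# Each hypothetical DD service maps a domain to its published vocabulary.
# All definitions are placeholders, not real dictionary entries.
SERVICES = {
    "service-a": {
        "ISO.10303": {"STEP": "placeholder: the standards-work meaning"},
    },
    "service-b": {
        "JPL": {"STEP": "placeholder: the JPL project meaning"},
        "PDS": {"target": "placeholder definition"},
    },
}


def lookup(term, domains=None, services=SERVICES):
    """Return (service, domain, definition) for every match of `term`.

    If `domains` is given, only those domains are searched, which is
    how the alternative meanings of an acronym are eliminated.
    """
    hits = []
    for service_name, vocabularies in services.items():
        for domain, vocabulary in vocabularies.items():
            if domains is not None and domain not in domains:
                continue
            if term in vocabulary:
                hits.append((service_name, domain, vocabulary[term]))
    return hits


print(len(lookup("STEP")))              # matches across all domains
print(lookup("STEP", domains={"JPL"}))  # restricted to one probable domain
```

Returning the service and domain with each definition keeps the attribution needed for the glossary-building scenario.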
-
Validation Scenarios
- Validating Units of Measure - a system integrator receives an MCAD geometry model (e.g., STEP AP203 Part 21 file) of a component to be integrated into an assembly; automatically, a standard validation routine is performed against the schema located in a referenced data dictionary that checks for use of the units of measure called for in the contract and identified in the exchange file
- Enabling Automated Repository Check-In - as a STEP model is checked into a PDM system, an automated validation routine checks the model using the schema (located in the DD service) that is identified in the Part 21 data file
- Improving Quality of Data Handoffs - an MCAD geometry model is sent from design to thermal analysis and validation is performed using the correct schema version as referenced in the model; validation is an automated process that occurs before any work is done with the model as it is transferred between domains
- Validating for Adequacy and Range - the PDS (NASA's Planetary Data System) central node receives a dataset description in template format to be ingested into the dataset catalogue database. Automatically, a standard validation routine is performed that checks for required keywords, keyword values, and value types in the dataset in template format against a corresponding structure stored in the PDS domain of the data dictionary service
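The last scenario (required keywords, keyword values, and value types checked against a stored structure) can be sketched in Python as follows. The keyword names come from the naming-convention examples in this deck; the types, the required flags, the minimum value, and the names PDS_STRUCTURE and validate are illustrative assumptions, and in practice the structure would be fetched from the DD service rather than hard-coded.

```python
# Hypothetical structure for the PDS domain: keyword -> rule.
# Types, required flags, and the minimum value are assumptions.
PDS_STRUCTURE = {
    "PDS.Target_Name": {"type": str, "required": True},
    "PDS.Sampling_Factor": {"type": float, "required": False, "min": 0.0},
}


def validate(dataset, structure):
    """Return a list of problems; an empty list means the dataset passed."""
    errors = []
    for keyword, rule in structure.items():
        if keyword not in dataset:
            if rule.get("required"):
                errors.append(f"missing required keyword: {keyword}")
            continue
        value = dataset[keyword]
        if not isinstance(value, rule["type"]):
            errors.append(f"{keyword}: expected {rule['type'].__name__}")
        elif "min" in rule and value < rule["min"]:
            errors.append(f"{keyword}: value {value} below minimum {rule['min']}")
    # Keywords the dictionary does not know about are also reported.
    for keyword in dataset:
        if keyword not in structure:
            errors.append(f"unknown keyword: {keyword}")
    return errors


print(validate({"PDS.Target_Name": "MARS", "PDS.Sampling_Factor": 2.0}, PDS_STRUCTURE))
print(validate({"PDS.Sampling_Factor": "2"}, PDS_STRUCTURE))
```

Collecting every problem instead of stopping at the first one suits an automated ingest step, since the submitter can fix the whole template in one pass.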
-
Data Modeling Scenarios
- Data Reuse in Modeling Activities - a data modeller, charged with developing an information model for a new application, uses data elements published in several DD services (much like a parts library), ensuring that the new information model will have compatible interfaces with data sets that share the same data elements or collection of elements
- Creating a TDP (technical data package) - an application performs a schema check against objects about to be wrapped into a TDP (e.g., STEP AP232 or PDM Schema TDP) to ensure their correct structure and meta-data content
- Data Integration Enabled - an analyst, charged with integrating data from two or more data sets, accesses the correct version of each schema, as referenced in the data set, from the DD service space, allowing them to identify/map interfaces between the data sets, e.g., MCAD-ECAD-cost data
- Extending a schema - to solve a "local" problem, a data modeller uses data elements from a published collection of data items to extend an existing official schema; the new schema is published in the DD service with traces/links back to the official schema
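Two of these scenarios, finding interfaces between data sets and extending an official schema with a trace back to it, can be sketched in Python under the assumption that a schema is simply a named, versioned list of data element names. The schema contents and the helpers shared_elements and extend_schema are hypothetical; the element names are borrowed from the naming-convention examples in this deck.

```python
def shared_elements(schema_a, schema_b):
    """Data elements used by both schemas: candidate interface points."""
    return sorted(set(schema_a["elements"]) & set(schema_b["elements"]))


def extend_schema(official, name, extra_elements):
    """Create a local schema that records a trace back to the official one."""
    return {
        "name": name,
        "version": "local-1",  # hypothetical version label
        "extends": (official["name"], official["version"]),
        "elements": list(official["elements"]) + list(extra_elements),
    }


# Hypothetical schemas for illustration only.
mcad = {
    "name": "mcad-model",
    "version": "1",
    "elements": ["PDMS.Version", "PDMS.ReferenceDesignator", "DC.Date.Created"],
}
cost = {
    "name": "cost-data",
    "version": "2",
    "elements": ["PDMS.ReferenceDesignator", "DC.Date.Created", "NBS.HR.start_date"],
}

print(shared_elements(mcad, cost))
local = extend_schema(mcad, "mcad-model-local", ["DC.Date.LastModified"])
print(local["extends"])
```

Because both schemas draw on the same published element names, the overlap falls out of a plain set intersection, which is the practical payoff of the "parts library" reuse described above.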
-
What's next? (Completing the prototype)
- Architecture development
  - UML Model (50%)
  - Naming Convention (50%)
  - Linking ontology (25%)
- Server configuration
  - 2nd and 3rd DD test nodes (33%)
  - Wrapping existing DD DBs (10%)
- Client configurations
  - LDAP URL (75%)
  - Java (33%)
  - Python (33%)
  - Perl (33%)
  - C/C++ (75%)
  - Unix Shell (25%)
  - PHP (25%)
  - Native clients (25%)
- Reporting Module
  - Glossary Builder (0%)
  - History/CM Report (0%)
  - Summarized Listings (0%)
- Testing (0%)
  - Scenarios
  - Server
  - Clients
  - Data population
- Documentation
  - White Paper (100%)
  - FAQ (20%)
  - Server Configuration Guide (15%)
  - Recommended Best Practices (10%)