Agenda - The Configurable Data Curation System (CDCS)
TRANSCRIPT
Agenda
• Migration Status
• New applications
• Stakeholder Engagements/Development - Ben
• Any Additional Items
Test Site | Alias Name/URL | Owner | Production URL
http://vm-itl-ssd-017.nist.gov:8029/ | test-ambench.nist.gov | AMBENCH (Carrie/Yanic) | ambench.nist.gov
http://vm-itl-ssd-017.nist.gov:8028/ | test-phasedata.nist.gov | Phasedata (Carrie/Yanic) | phasedata.nist.gov
http://vm-itl-ssd-017.nist.gov:8030/ | test-ghgr.nist.gov | GHG Registry (Gretchen Greene) | ghgr.nist.gov
http://vm-itl-ssd-017.nist.gov:8035/ | ammdtest.el.nist.gov | AMMD (Yan Lu) | ammd.nist.gov
http://vm-itl-ssd-017.nist.gov:8040/ | test-materials.registry.nist.gov | Material Registry (NMRR) (Ray Plante/Chander) | materials.registry.nist.gov
http://vm-itl-ssd-017.nist.gov:8045/ | test-jarvis.nist.gov | JARVIS-API (Kamal Choudhary) | jarvis.nist.gov
http://vm-itl-ssd-017.nist.gov:8050/ | test-potentials.nist.gov | Interatomic Potentials Repository (Lucas Hale) | potentials.nist.gov
http://vm-itl-ssd-017.nist.gov:8060/ | test-mrr.materialsdatafacility.nist.gov | MDF (Copy from NW) (Ben Blaizic/Ben Long) | http://mrr.materialsdatafacility.org
http://vm-itl-ssd-017.nist.gov:8100/ | - | BIPM International Metrology Resource Registry (Ray/Marcus) | http://imrr.bipm.org/
https://vm-itl-ssd-070.nist.gov/ | - | Query-able Data Repository (Tom H) | https://smstest.el.nist.gov
2.0 Data Migrations: Deployed (8 entries); Registry Schema Test Case; Deployed - Empty
Configurable Data Curation System (CDCS) Update
Single Sign-On enabled on all sites:
Internal:
• https://test-sample.nist.gov
• https://test-materials.registry.nist.gov
• https://test-ambench.nist.gov
• https://test-mrr.materialsdatafacility.nist.gov
• https://test-jarvis.nist.gov
• https://test-potentials.nist.gov
• https://test-phasedata.nist.gov
• https://test-ghgr.nist.gov
External:
• mdcs.nist.gov
• cdcs.registry.nist.gov
If you come in from a NIST network, SSO will work; otherwise the site defaults to Django login.
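The fallback rule above can be pictured with a small sketch. This is only an illustration of the idea, not how the sites are actually configured: the function name and the address range are hypothetical, and the range shown is a generic private network rather than a real NIST network.

```python
import ipaddress

# Hypothetical internal range, for illustration only (not a real NIST network).
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def choose_login_method(client_ip: str) -> str:
    """Return "sso" for clients on an internal network, otherwise "django"."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in INTERNAL_NETWORKS):
        return "sso"
    return "django"
```

A request arriving from inside the assumed range would be offered SSO; any other client would see the standard Django login form.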
Laboratory Information Management System (LIMS)
• This laboratory information management system (LIMS) allows for the automated creation and curation of microscopy experimental records using the schema co-developed by ODI and the MML Electron Microscopy Nexus Facility.
• Experimental records are automatically harvested from multiple data sources to facilitate browsing and searching of data collected from the varied instruments in the Nexus Facility.
Contacts: June Lau/Josh Taillon
NIST Sample Repository
• Issue: How to keep track of samples – Currently a spreadsheet and handwritten labels
• Solution: Automate storing of sample data in CDCS using QR Codes tied to the original sample. Can extend across multiple CDCS instances.
Contacts: Carrie Campbell/Yanic Congo
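One way to picture the QR code idea is as a link that the label encodes and that resolves back to the sample record. The sketch below only builds and parses such a link; the `/sample/` path and both function names are assumptions made for illustration, and rendering the QR image itself is left to whatever QR library is in use.

```python
from urllib.parse import quote, unquote, urlsplit


def sample_label_payload(base_url: str, sample_id: str) -> str:
    """Build the text a QR label could encode for one sample (hypothetical URL layout)."""
    return f"{base_url.rstrip('/')}/sample/{quote(sample_id, safe='')}"


def sample_id_from_payload(payload: str) -> str:
    """Recover the sample identifier from a scanned payload."""
    path = urlsplit(payload).path
    return unquote(path.rsplit("/", 1)[-1])
```

Because the base URL is a parameter, the same label scheme could point at different CDCS instances.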
NIST AMBench Repository
• New AMBench Repository deployed
• https://test-ambench.nist.gov/
Contacts: Carrie Campbell/Lyle Levine
Swedish AM Repository
• To be used by the Swedish AM-arena as a storage facility for their material data.
• https://amdata.proj.kth.se
Contacts: Carrie Campbell
WIPP Registry
• The Web Image Processing Pipeline Project (WIPP)
• wipp-plugins.nist.gov
Contacts: Peter Bajcsy/Mylene Simon
CDCS Resource Registry
• Have a demo system that contains useful information while showing implementation of a registry.
• Deployed to cdcs.registry.nist.gov
Contacts: Zach Trautt/Kevin Brady
CDCS Resource Registry
Pycdcs
Resource Type: Client Tool: API Library
Local ID: B3MNOJGAODDX20P9RASD
Status: active
Identity
Title: pycdcs
Providers
Publisher: National Institute of Standards and Technology
Contact Name: Lucas Hale
Contact Email Address: [email protected]
Content
Description: This is a base Python package for accessing instances of the NIST Configurable Data Curation System (CDCS) databases, versions 2+. It defines a Python CDCS class that streamlines REST calls to a database by: (1) Taking access settings once (username, password, etc) and saving them for subsequent REST calls. (2) Defining methods that wrap around REST calls to interact with the database in a more Pythonic way. (3) Automatically converting any accessed information to pandas Series and DataFrame objects to allow for the information to be easily manipulated.
Landing Page: https://github.com/lmhale99/pycdcs
Role
Type: Client Tool: API Library
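The first point of the description (take access settings once, reuse them for later REST calls) can be illustrated with a generic sketch. This is not the pycdcs API: the class name, method name, and the example path are hypothetical, and only the Python standard library is used. No request is sent; the helper just prepares one.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


class RestSession:
    """Hypothetical helper that stores access settings once and reuses them per call."""

    def __init__(self, host: str, username: str, password: str):
        self.host = host.rstrip("/")
        # Encode the credentials once as an HTTP Basic Authorization header.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self._headers = {"Authorization": f"Basic {token}"}

    def build_request(self, path: str, **params) -> Request:
        """Prepare (but do not send) a GET request that carries the saved settings."""
        url = f"{self.host}/{path.lstrip('/')}"
        if params:
            url += "?" + urlencode(params)
        return Request(url, headers=self._headers)
```

A caller would create one `RestSession` and then call `build_request` for each query, passing the result to `urllib.request.urlopen` when a live server is available.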
Parmenides-based CDCS Root-terms POC
Root terms approach background
• “Methodology: Root and Rule”
• R&D by Bhat et al
• Multiple domain usage
  • crystallography
  • biology
  • materials science
  • forensics
  • cybersecurity
  • and more
• Linguistic/morphological approach to natural language text
• Applications in linguistic
  • analysis
  • semantics
  • pragmatics
  • indexing
  • and more
Software tooling
• Supported by
  • linguistic software, Parmenides
• Developed
  • by Jacob Collard
  • with Eswaran Subrahmanian
• Analyzes natural language text
• Outputs root-term-structured metadata
• Used by text analysis applications
CDCS development
• Initial root-term applications
  • root-term-based search
  • dynamic faceting + indexing
  • atomic + compound terms
  • run against XML corpora
• Future applications
  • vocabulary generation, ontology, mining, and more
• Exploring and comparing usage with other approaches
  • student projects soon
• Exploratory usage coming soon
  • with June Lau et al soon
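As a rough picture of root-term-based search with dynamic facets, the toy sketch below indexes records by terms that are assumed to have been extracted already. It does not reproduce the Parmenides analysis or the CDCS implementation: the function names are hypothetical, and representing a compound term as space-separated atomic terms is an assumption made purely for illustration.

```python
from collections import Counter, defaultdict


def build_index(records):
    """Map each root term to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term].add(record_id)
            # Assumption: a compound term is written as space-separated atomic
            # terms, so the record is also indexed under each atomic part.
            for atom in term.split():
                index[atom].add(record_id)
    return index


def search(index, term):
    """Return the ids of records matching a root term, in sorted order."""
    return sorted(index.get(term, set()))


def facet_counts(records, hits):
    """Count the terms attached to the matching records, for dynamic facets."""
    return Counter(term for record_id in hits for term in records[record_id])
```

Searching on an atomic term returns every record tagged with it directly or through a compound term, and the facet counts are recomputed from whichever records matched.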
COVID-19 NIST-Based Sites
• https://covid19-data.nist.gov - The NIST COVID19-DATA repository is being made available to aid in meeting the White House Call to Action for the Nation’s artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19 - ~144,000 Papers, ~19,200 Institutions, ~405,000 Authors
• https://covid19-registry.nist.gov – NIST Registry of all known COVID-related sites – 52 Sites
• https://test-qdr.nist.gov - CAS COVID-19 Anti-Viral Candidate Compounds Dataset - As a specialist in scientific information solutions, CAS is partnering with research organizations around the globe to tackle the complex and rapidly evolving challenge of COVID-19. Aligned with our mission as a division of the American Chemical Society, CAS is making a wide range of assets, expertise, and resources available to support this fight. ~49,000 Entries
Essential Highlights: Release 2.9
Upgrades
• Front-end libraries for styling
  • bootstrap: 3.3.7 to 4.4.1
  • font awesome: 4.7.0 to 5.13.0
• Admin tool
  • dump records by user
  • assign records to user by moving records to a local file system folder
Introduced ability to
• configure for account request denial reply-emails
• toggle footer links display
• filter queries by workspace
• edit MRR resource title
• initial support for JSON (JSON schema/docs, insert/retrieval via REST)
• perform advanced searches using Elasticsearch
Fixes
• change owner view for only active users
• app-initialization only during data-migration
• exporters work on all pages
• removed errors on a few dashboard pages
• cosmetic fixes
Essential Highlights: Release Summary View
Past release (2.9): May 26
• Upgraded
  • front-end libraries for styling
  • admin tool
• Introduced
  • account request denial
  • footer links display
  • Workspace query filters
  • MRR resource title editing
  • initial JSON support
  • Elasticsearch
• Made fixes for
  • change ownership
  • app-initialization
  • exporters
  • cosmetic
Current release (2.10): May 18 – July 17
• Record migration
  • Ability to migrate records from older to newer schemas
• PID updates
  • Share PID for single record
• REST API updates
  • for registry data-migration operations
• Persistence updates
  • Preserve record publication-times across dump/load
• General Enhancements
  • Code quality
  • Performance
Future release (2.11): July 13 – Sept 11
• PID-updates
  • Share PIDs for record sets
  • Support operations on multiple PIDs
• UI updates
  • Support multiple views (XSLTs) per template (XSD)
• Search updates
  • Initial upgrades of search-by-type and search-by-periodic-table
• Persistence updates
  • Preserve record MAC-times across dump/load
Parallel / Ongoing
• Generalizations / extensions
  • JSON support – continued
• Admin tool enhancements
  • Vagrant to Docker conversion
  • Admin proxy operations
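The persistence item about preserving record publication times across dump/load amounts to carrying the timestamp inside the dump rather than letting the load step stamp a new one. A minimal sketch of that idea follows, assuming a JSON dump and hypothetical field names (`title`, `content`, `published`); it is not the CDCS admin tool's actual format.

```python
import json
from datetime import datetime, timezone


def dump_records(records):
    """Serialize records to JSON, writing each publication time as ISO 8601 text."""
    return json.dumps([
        {
            "title": record["title"],
            "content": record["content"],
            "published": record["published"].isoformat(),
        }
        for record in records
    ])


def load_records(text):
    """Rebuild records from a dump, restoring the original publication times."""
    return [
        {**item, "published": datetime.fromisoformat(item["published"])}
        for item in json.loads(text)
    ]
```

Loading a dump produced this way returns records whose `published` values equal the originals, including the time zone offset.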
Recent Activities
• Infrared Spectrum Repository
  Ted Heilweil / Aaron Massari – Infrared spectrum, jcamp-formatted data from WebBook, MDCS project for community data exchange
  https://mdcs.nist.gov/explore/keyword/
• NIST Transfection Data Repository
  Anne Plant et al – Bio-based transfection experiments, calendar data, end-to-end workflow engagement evolving
  https://test-sys.nist.gov/
• NIST Sample Repository
  Carrie Campbell / Yannick Congo – Sample QRCode/PID integration, end-to-end workflow use-case of interest to nexus-lims and Anne Plant’s group
  https://test-sample.nist.gov/
• MIL-STD-889 Repository
  Chandler Becker / Carrie Campbell / Rachel Stadler – Navy MDCS project to support the MIL-STD-889 Database
• COVID-19 Curator/Registry
  SSD Team / EL / MML – Support quality, access, search of COVID-19 research literature and anti-viral therapy research
  https://covid19-data.nist.gov/
  https://covid19-registry.nist.gov/
NIST CDCS Resource Repository
Registry of CDCS-related resources (70)
• Primary system distributions (4)
• Instances of distributions (12)
• Components (“plugins”, core/input modules, schemas, etc.) (51)
• Client tools (client libraries, etc.) (3)
Deployed at
https://cdcs.registry.nist.gov
Considerations
Only includes initial metadata on public repositories and sites
Will only include POC and related metadata based on review and approval
To be maintained incrementally
Annual CHiMaD Meeting
• Held virtually – June 9, 2020
• https://chimad2020.sched.com/
• Keynote by Jim Yurko – Director of Materials Design, Apple
• Pre-recorded presentations
• Live breakouts
  • Materials-specific
  • Informatics-related
    • Materials Data Facility (Ian F., Ben B., Marcus S., et al)
    • Artificial Intelligence and High-Performance Data Mining (Ankit A., Alok Ch., et al)
    • Uncertainty Quantification of Phase Equilibria and Thermodynamics (Marius Stan et al)
• Zach T. specific interactions
  • Possibly lead to new registry resource type: educational assets
  • Evolving group process discussion
Deployment Activities
Streamlined A&A
• Kevin B., Marcus N., ITSO Anne*
Migration / update of all major systems except BIPM to post Python 3, etc.
• Kevin B., Marcus N., Steve B., MDF, and many stakeholders
Evolution of deployment configurations
• more modular Dockers
• consolidated theming
Continual Interactions
Active weekly discussions and interactions
App developer’s meetings
Feature generation
• Feature requests
  • captured in discussions
  • advanced by issuers
  • worked on by different teams
Trello Board
https://trello.com/b/hOnW3inK/cdcs-development
Engagement Resources and Channels
• Clearinghouse website – https://cdcs.nist.gov/
• Demo servers are available with the latest versions of MDCS and NMRR:
  • MDCS: https://mdcs.nist.gov
  • NMRR: https://cdcs.registry.nist.gov
• Documentation
  • Tutorials
  • REST API Documentation and examples
    • Python 2
    • Python 3
• Client Library developed
  • Python 2
  • Python 3
• Vagrant distributions
  • REST API development
  • Developer level for Hackathons
• CDCS-Related Users and Affiliations
• Docker distribution
• Google Group
• Org Chart
• Weekly Developer Meetings
Configurable Data Curation System (CDCS) Update
Deployment:
1. Docker Deployments – updated to release 2.9
2. Vagrant Deployments – updated to release 2.9
https://cdcs.nist.gov/cdcs-documentation/downloads.pdf
Questions/Additional Items
• Please visit: https://cdcs.nist.gov/
• Send us your comments
• Send us your ideas
• Send us any content you feel is missing
Contacts:[email protected]@nist.gov