NIST Configurable Data Curation System (CDCS) Quarterly Stakeholders Meeting June 23, 2020

Agenda

• Migration Status

• New applications

• Stakeholder Engagements/Development - Ben

• Any Additional Items

Test Site | Alias Name/URL | Owner | Production URL
http://vm-itl-ssd-017.nist.gov:8029/ | test-ambench.nist.gov | AMBENCH (Carrie/Yanic) | ambench.nist.gov
http://vm-itl-ssd-017.nist.gov:8028/ | test-phasedata.nist.gov | Phasedata (Carrie/Yanic) | phasedata.nist.gov
http://vm-itl-ssd-017.nist.gov:8030/ | test-ghgr.nist.gov | GHG Registry (Gretchen Greene) | ghgr.nist.gov
http://vm-itl-ssd-017.nist.gov:8035/ | ammdtest.el.nist.gov | AMMD (Yan Lu) |
http://vm-itl-ssd-017.nist.gov:8040/ | test-materials.registry.nist.gov | Material Registry (NMRR) (Ray Plante/Chander) | materials.registry.nist.gov
http://vm-itl-ssd-017.nist.gov:8045/ | test-jarvis.nist.gov | JARVIS-API (Kamal Choudhary) | jarvis.nist.gov
http://vm-itl-ssd-017.nist.gov:8050/ | test-potentials.nist.gov | Interatomic Potentials Repository (Lucas Hale) | potentials.nist.gov
http://vm-itl-ssd-017.nist.gov:8060/ | test-mrr.materialsdatafacility.nist.gov | MDF (Copy from NW) (Ben Blaizic/Ben Long) | http://mrr.materialsdatafacility.org
http://vm-itl-ssd-017.nist.gov:8100/ | | BIPM International Metrology Resource Registry (Ray/Marcus) | http://imrr.bipm.org/
https://vm-itl-ssd-070.nist.gov/ | | Query-able Data Repository (Tom H) | https://smstest.el.nist.gov

2.0 Data Migrations

• Deployed (8 sites)
• Registry Schema Test Case
• Deployed - Empty: ammd.nist.gov

MRR and MDF upgraded to 2.8 and harvesting (May 20)

Added new type “Semantic Asset” to schema

Configurable Data Curation System (CDCS) Update

Single Sign-On enabled on all sites:

Internal:
• https://test-sample.nist.gov
• https://test-materials.registry.nist.gov
• https://test-ambench.nist.gov
• https://test-mrr.materialsdatafacility.nist.gov
• https://test-jarvis.nist.gov
• https://test-potentials.nist.gov
• https://test-phasedata.nist.gov
• https://test-ghgr.nist.gov

External:
• mdcs.nist.gov
• cdcs.registry.nist.gov

If you come in from a NIST network, SSO will work; otherwise the site defaults to the Django login.
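
The fallback rule above (SSO from an internal network, Django login otherwise) can be pictured with a minimal sketch. This is not the CDCS implementation: the function name `choose_login` and the network range are hypothetical, and the range is a reserved documentation block rather than any real NIST network.

```python
import ipaddress

# Hypothetical placeholder range (a reserved documentation block),
# NOT the real NIST network.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def choose_login(client_ip: str) -> str:
    """Return "sso" for clients on an internal network, else "django"."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in INTERNAL_NETWORKS):
        return "sso"
    return "django"


print(choose_login("192.0.2.15"))    # address inside the placeholder range
print(choose_login("198.51.100.7"))  # address outside the placeholder range
```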

Laboratory Information Management System (LIMS)

• This laboratory information management system (LIMS) allows for the automated creation and curation of microscopy experimental records using the schema co-developed by ODI and the MML Electron Microscopy Nexus Facility.

• Experimental records are automatically harvested from multiple data sources to facilitate browsing and searching of data collected from the varied instruments in the Nexus Facility.

Contacts: June Lau/Josh Taillon

NIST Sample Repository

• Issue: How to keep track of samples – Currently a spreadsheet and handwritten labels

• Solution: Automate storing of sample data in CDCS using QR Codes tied to the original sample. Can extend across multiple CDCS instances.

Contacts: Carrie Campbell/Yannick Congo

NIST AMBench Repository

• New AMBench Repository deployed

• https://test-ambench.nist.gov/

Contacts: Carrie Campbell/Lyle Levine

Swedish AM Repository

• To be used by the Swedish AM-arena as a storage facility for their material data.

• https://amdata.proj.kth.se

Contacts: Carrie Campbell

WIPP Registry

• The Web Image Processing Pipeline Project (WIPP)

• wipp-plugins.nist.gov

Contacts: Peter Bajcsy/Mylene Simon

CDCS Resource Registry

• Have a demo system that contains useful information while showing implementation of a registry.

• Deployed to cdcs.registry.nist.gov

Contacts: Zach Trautt/Kevin Brady

Pycdcs

Resource Type: Client Tool: API Library
Local ID: B3MNOJGAODDX20P9RASD
Status: active

Identity
Title: pycdcs

Providers
Publisher: National Institute of Standards and Technology
Contact Name: Lucas Hale
Contact Email Address: [email protected]

Content
Description: This is a base Python package for accessing instances of the NIST Configurable Data Curation System (CDCS) databases, versions 2+. It defines a Python CDCS class that streamlines REST calls to a database by:
(1) Taking access settings once (username, password, etc.) and saving them for subsequent REST calls.
(2) Defining methods that wrap around REST calls to interact with the database in a more Pythonic way.
(3) Automatically converting any accessed information to pandas Series and DataFrame objects to allow for the information to be easily manipulated.
Landing Page: https://github.com/lmhale99/pycdcs

Role
Type: Client Tool: API Library
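
The first two points of the description (store access settings once, wrap REST calls in methods) can be illustrated with a small standard-library sketch. It is not the pycdcs API: the class `CurationClient`, its methods, the `rest/data/` path, and the injected `transport` callable are all hypothetical names chosen for illustration, and the fake transport stands in for a real HTTP call so the example runs offline.

```python
import json
from urllib.parse import urlencode, urljoin


class CurationClient:
    """Hypothetical client: keeps access settings and reuses them per call."""

    def __init__(self, host, username=None, password=None, transport=None):
        self.host = host.rstrip("/") + "/"
        self.auth = (username, password) if username else None
        # transport(method, url, auth) -> JSON text; injectable for offline use
        self.transport = transport or self._no_transport

    @staticmethod
    def _no_transport(method, url, auth):
        raise RuntimeError("no transport configured")

    def get(self, path, **params):
        """Build the URL from the saved host and send a GET with saved auth."""
        url = urljoin(self.host, path.lstrip("/"))
        if params:
            url += "?" + urlencode(params)
        return json.loads(self.transport("GET", url, self.auth))

    def records(self, title=None):
        """Method-style wrapper over an assumed 'rest/data/' endpoint."""
        params = {"title": title} if title else {}
        return self.get("rest/data/", **params)


def fake_transport(method, url, auth):
    """Stand-in for an HTTP request; echoes what it was asked to fetch."""
    return json.dumps([{"title": "demo", "url": url, "user": auth[0]}])


client = CurationClient("https://cdcs.example.org", "alice", "secret",
                        transport=fake_transport)
rows = client.records(title="demo")
print(rows[0]["url"])
```

For the third point, a list of dictionaries such as `rows` is the kind of result that could be passed to `pandas.DataFrame` for tabular handling.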

Parmenides-based CDCS Root-terms POC

Root terms approach background: “Methodology: Root and Rule”

R&D by Bhat et al

Multiple domain usage
• crystallography
• biology
• materials science
• forensics
• cybersecurity
• and more

Linguistic/morphological approach to natural language text

Applications in linguistic
• analysis
• semantics
• pragmatics
• indexing
• and more

Software tooling

Supported by
• linguistic software, Parmenides

Developed
• by Jacob Collard
• with Eswaran Subrahmanian

Analyzes natural language text

Outputs root-term-structured metadata

Used by text analysis applications

CDCS development

Initial root-term applications
• root-term-based search
• dynamic faceting + indexing
• atomic + compound terms
• run against XML corpora

Future applications
• vocabulary generation, ontology, mining, and more

Exploring and comparing usage with other approaches
• student projects soon

Exploratory usage coming soon
• with June Lau et al soon
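
As a rough picture of root-term-based search with dynamic faceting, the sketch below indexes records by root term and counts terms over a result set. It makes several assumptions: the root terms are taken as already extracted (Parmenides itself is not modeled), the sample records are invented, and a compound query is treated simply as the intersection of its atomic terms, which may differ from the actual proof of concept.

```python
from collections import Counter, defaultdict

# Hypothetical records: root terms are assumed to be extracted already.
RECORDS = {
    "rec1": ["crystal", "lattice"],
    "rec2": ["crystal", "phase"],
    "rec3": ["phase", "diagram"],
}


def build_index(records):
    """Map each root term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, roots in records.items():
        for root in roots:
            index[root].add(record_id)
    return index


def search(index, *roots):
    """Return records containing every given root term (AND of atomic terms)."""
    hits = [index.get(root, set()) for root in roots]
    return set.intersection(*hits) if hits else set()


def facets(records, hits):
    """Count root terms across the hit set, for dynamic faceting."""
    return Counter(root for record_id in hits for root in records[record_id])


index = build_index(RECORDS)
hits = search(index, "crystal")
print(sorted(hits))
print(facets(RECORDS, hits))
```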

COVID-19 NIST-Based Sites

• https://covid19-data.nist.gov – The NIST COVID19-DATA repository is being made available to aid in meeting the White House Call to Action for the Nation’s artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. ~144,000 papers, ~19,200 institutions, ~405,000 authors.

• https://covid19-registry.nist.gov – NIST registry of all known COVID-related sites. 52 sites.

• https://test-qdr.nist.gov – CAS COVID-19 Anti-Viral Candidate Compounds Dataset. From CAS: “As a specialist in scientific information solutions, CAS is partnering with research organizations around the globe to tackle the complex and rapidly evolving challenge of COVID-19. Aligned with our mission as a division of the American Chemical Society, CAS is making a wide range of assets, expertise, and resources available to support this fight.” ~49,000 entries.

Essential Highlights: Release 2.9

Upgrades

• Front-end libraries for styling
  • bootstrap: 3.3.7 to 4.4.1
  • font awesome: 4.7.0 to 5.13.0
• Admin tool
  • dump records by user
  • assign records to user by moving records to a local file system folder

Introduced ability to

• configure for account request denial reply-emails
• toggle footer links display
• filter queries by workspace
• edit MRR resource title
• initial support for JSON (JSON schema/docs, insert/retrieval via REST)
• perform advanced searches using Elasticsearch

Fixes

• change owner view for only active users
• app-initialization only during data-migration
• exporters work on all pages
• removed errors on a few dashboard pages
• cosmetic fixes

Essential Highlights: Release Summary View

Past release (2.9): May 26

• Upgraded
  • front-end libraries for styling
  • admin tool
• Introduced
  • account request denial
  • footer links display
  • Workspace query filters
  • MRR resource title editing
  • initial JSON support
  • Elasticsearch
• Made fixes for
  • change ownership
  • app-initialization
  • exporters
  • cosmetic

Current release (2.10): May 18 – July 17

• Record migration
  • Ability to migrate records from older to newer schemas
• PID updates
  • Share PID for single record
• REST API updates
  • for registry data-migration operations
• Persistence updates
  • Preserve record publication-times across dump/load
• General Enhancements
  • Code quality
  • Performance

Future release (2.11): July 13 – Sept 11

• PID updates
  • Share PIDs for record sets
  • Support operations on multiple PIDs
• UI updates
  • Support multiple views (XSLTs) per template (XSD)
• Search updates
  • Initial upgrades of search-by-type and search-by-periodic-table
• Persistence updates
  • Preserve record MAC-times across dump/load

Parallel / Ongoing

Generalizations / extensions
• JSON support – continued

Admin tool enhancements
• Vagrant to Docker conversion
• Admin proxy operations
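
The persistence item "preserve record publication-times across dump/load" can be illustrated with a minimal round trip. This is only a sketch under assumptions: the record layout, the `published` field, and the functions `dump_records` and `load_records` are hypothetical and do not reflect the CDCS admin tool's actual dump format. The point shown is that the loader reuses the stored timestamp instead of stamping the load time.

```python
import json
from datetime import datetime, timezone


def dump_records(records):
    """Serialize records, writing each publication time as ISO 8601 text."""
    return json.dumps(
        [{**record, "published": record["published"].isoformat()}
         for record in records]
    )


def load_records(text):
    """Restore records, reusing the stored time rather than the load time."""
    return [
        {**record, "published": datetime.fromisoformat(record["published"])}
        for record in json.loads(text)
    ]


original = [
    {"title": "demo",
     "published": datetime(2020, 5, 26, 12, 0, tzinfo=timezone.utc)}
]
restored = load_records(dump_records(original))
print(restored[0]["published"].isoformat())
```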

Recent Activities

Infrared Spectrum Repository
Ted Heilweil / Aaron Massari – Infrared spectrum, jcamp-formatted data from WebBook, MDCS project for community data exchange
https://mdcs.nist.gov/explore/keyword/

NIST Transfection Data Repository
Anne Plant et al – Bio-based transfection experiments, calendar data, end-to-end workflow engagement evolving
https://test-sys.nist.gov/

NIST Sample Repository
Carrie Campbell / Yannick Congo – Sample QRCode/PID integration, end-to-end workflow use-case of interest to nexus-lims and Anne Plant’s group
https://test-sample.nist.gov/

MIL-STD-889 Repository
Chandler Becker / Carrie Campbell / Rachel Stadler – Navy MDCS project to support the MIL-STD-889 Database

COVID-19 Curator/Registry
SSD Team / EL / MML – Support quality, access, search of COVID-19 research literature and anti-viral therapy research
https://covid19-data.nist.gov/
https://covid19-registry.nist.gov/

NIST CDCS Resource Repository

Registry of CDCS-related resources (70)
• Primary system distributions (4)
• Instances of distributions (12)
• Components (“plugins”, core/input modules, schemas, etc.) (51)
• Client tools (client libraries, etc.) (3)

Deployed at https://cdcs.registry.nist.gov

Considerations
• Only includes initial metadata on public repositories and sites
• Will only include POC and related metadata based on review and approval
• To be maintained incrementally

Annual CHiMaD Meeting

• Held virtually – June 9, 2020
• https://chimad2020.sched.com/
• Keynote by Jim Yurko – Director of Materials Design, Apple
• Pre-recorded presentations
• Live breakouts
  • Materials-specific
  • Informatics-related
    • Materials Data Facility (Ian F., Ben B., Marcus S., et al)
    • Artificial Intelligence and High-Performance Data Mining (Ankit A., Alok Ch., et al)
    • Uncertainty Quantification of Phase Equilibria and Thermodynamics (Marius Stan et al)
• Zach T. specific interactions
  • Possibly lead to new registry resource type: educational assets
  • Evolving group process discussion

Deployment Activities

Streamlined A&A
• Kevin B., Marcus N., ITSO Anne*

Migration / update of all major systems except BIPM to post python 3, etc.
• Kevin B., Marcus N., Steve B., MDF, and many stakeholders

Evolution of deployment configurations
• more modular Dockers
• consolidated theming

Continual Interactions

Active weekly discussions and interactions

App developer’s meetings

Feature generation
• Feature requests
  • captured in discussions
  • advanced by issuers
  • worked on by different teams

Trello Board
https://trello.com/b/hOnW3inK/cdcs-development

Engagement Resources and Channels

• Clearinghouse website – https://cdcs.nist.gov/
• Demo servers are available with the latest versions of MDCS and NMRR:
  • MDCS: https://mdcs.nist.gov
  • NMRR: https://cdcs.registry.nist.gov
• Documentation
• Tutorials
• REST API Documentation and examples
  • Python 2
  • Python 3
• Client Library developed
  • Python 2
  • Python 3
• Vagrant distributions
• REST API development
• Developer level for Hackathons
• CDCS-Related Users and Affiliations
• Docker distribution
• Google Group
• Org Chart
• Weekly Developer Meetings

Configurable Data Curation System (CDCS) Update

Deployment:
1. Docker Deployments – updated to release 2.9
2. Vagrant Deployments – updated to release 2.9

https://cdcs.nist.gov/cdcs-documentation/downloads.pdf

Questions/Additional Items

• Please visit: https://cdcs.nist.gov/
• Send us your comments
• Send us your ideas
• Send us any content you feel is missing

Contacts:[email protected]@nist.gov