Pieper NISO Virtual Conf Feb17


Post on 09-Feb-2017





Open Source Technologies at the National Agricultural Library

Ursula Pieper
IT Specialist, Web Team Lead

National Agricultural Library
Agricultural Research Service
United States Department of Agriculture

Feb 17, 2016


Ursula Pieper
Ursula.Pieper@ars.usda.gov
301-504-7379

Acknowledgements:
  Knowledge Services Division (Susan McCarthy)
    Monica Poelchau and Chris Childers (i5K Workspace)
    Peter Arbuckle and Ezra Kahn (LCA Commons)
    Jeffrey Campbell (LTAR)
    Cynthia Parr (Ag Data Commons)
  Information Services Division (Vernon Chapman)
    Chuck Schoppet, NAL (Fedora Commons/Islandora)

Why Open Source?
  Benefit from community contributions and support
  Security managed by community
  Cost
  Vendor lock-in
  Can be customized locally
  Interoperability
  Re-use of skills


Available Expertise @ NAL
  PHP
  Drupal
  Python
  Grails
  Java
  Solr
  Django
  Subject Matter Experts


Open Source based Projects (Selection)
  Technologies: Drupal, Python, Grails, Java, Solr, Django
  Ag Data Commons: Scientific data catalog/repository
  LCA Commons: Life Cycle Assessment repo and tools
  PubAg: Catalog of agricultural scientific literature
  I5K@NAL Workspace: Repository and workspace for Arthropod Genomes
  Long Term Agro-ecosystem Research: Historical and future agricultural research data
  National Nutrient Database
  Dr. Duke's Phytochemical and Ethnobotanical Databases


Open Source based Projects (Selection)
  Technology groups: Drupal, Grails, Java Based
  Ag Data Commons: http://data.nal.usda.gov
  i5K@NAL Workspace: http://i5k.nal.usda.gov
  LCA Commons: http://lcacommons.gov
  PubAg Data Management System: http://pubag.nal.usda.gov
  National Nutrient Database: http://ndb.nal.usda.gov/ndb/
  Phytochem Database (Duke): http://phytochem.nal.usda.gov
  Long-term Agro-ecosystem Research: http://ltar.nal.usda.gov


Ag Data Commons
  Requirements
    Public Access to USDA funded research results
    Support scientific research and evidence-based policy
    Re-use / re-analysis
    REE Action Plan: 2012 goals
    Journal submission requirements
  Mandates
    America COMPETES Act
    OSTP Memorandum
    M-13-13, Open Data Policy

Ag Data Commons
  A data catalog and repository based on the Drupal DKAN distribution

Summary of Required Capabilities
  Comprehensive catalog of research results
  Support for compliance reporting
  Feeds Data.gov
  Enhanced dataset description for discovery and reuse
  Flexibility to support distributed data repositories
    Some disciplines already have repositories (e.g. GenBank)
  Preservation of valuable data for long-term research
  Supportive infrastructure for small agencies & labs
  Link scholarly literature to its supporting data
  Sustainable business model

Ag Data Commons Pilot
  Standard DKAN Features
    Drupal 7 Installation Profile
    Fulfills Project Open Data requirements
    Dataset content type: POD 1.1 metadata schema
    Unlimited number of resources can be uploaded
    data.json and RDF available
  Additional Features
    Social media links
    Some data analysis tools (map, graph through the Recline library)
    License display
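To make the POD 1.1 requirement more concrete, here is a minimal sketch of checking one dataset record from a data.json catalog for the fields the Project Open Data v1.1 schema marks as always required. The helper name and the sample record are hypothetical and only illustrate the idea; this is not DKAN's own validation code, and fields that are required only in some circumstances (for example bureauCode and programCode for US federal agencies) are left out.

```python
# Fields that POD v1.1 lists as always required for a dataset entry.
# Conditionally required fields are deliberately omitted from this sketch.
REQUIRED_FIELDS = (
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
)


def missing_required_fields(dataset):
    """Return the required POD fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not dataset.get(field)]


# Hypothetical record, invented for illustration only.
example_dataset = {
    "title": "Example soil moisture observations",
    "description": "Illustrative record; not a real Ag Data Commons dataset.",
    "keyword": ["soil", "moisture"],
    "modified": "2016-01-15",
    "publisher": {"name": "Example Research Unit"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.org"},
    "identifier": "example-dataset-001",
    "accessLevel": "public",
}

if __name__ == "__main__":
    print(missing_required_fields(example_dataset))
```

A record that passes this check can still fail full schema validation (value formats, conditional fields), so a check like this is best treated as a quick pre-flight step.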


Ag Data Commons Pilot
  What's missing from DKAN?
  DKAN's main use case: Government and organizational documents and datasets

  General improvements
    Large file upload, virus checking, file size display
    Harvest Dashboard for harvesting external POD datasets or data using other standards
    Solr search
    Versioning
    Data curation workflow

  Scientific data require additional functionality
    DOI assignments to datasets
    Identity management for authors (ORCID, etc.)
    Citation information (Primary citation, Methods citation, Related publications)
    Collection of additional metadata
    Long-term archiving capabilities
    Funding source reference
    Embargo period
    Specialized taxonomies


Ag Data Commons Pilot
  Lessons learned

  Keeping codebase compliant with standard DKAN
    All configuration changes need to be committed to code
    Codebase cannot clash with standard DKAN (which requires discipline when under time pressure)
    Significant pain merging NAL customizations with new DKAN releases
    Local programming and systems support is necessary (our model)

  Contributing back to DKAN and Drupal
    Many of NAL's customizations are adopted (and then maintained) by standard DKAN
    General Drupal functionality:
      Open data schema mapper
      NALT Thesaurus

  Taking advantage of customizations by other organizations
    Workflow, Stories, Visualizations


Ag Data Commons Pilot
  https://data.nal.usda.gov

I5k Workspace@NAL
  Provides tools and resources for scientists working on insect genomes.
  Goal: to store insect genome sequences, visualize them, enable their curation, and make them accessible to scientists.
  Designed specifically to handle and support genomic data.
  Website: https://i5k.nal.usda.gov


Key open-source software used by the i5k Workspace
  Main portal/website: built with Drupal/Tripal
  Key web application for genome visualization and feature annotation: JBrowse/Apollo

  We use other OSS, but are focusing on just these two here.


I5K Workspace @ NAL 1. Drupal + Tripal
  Chado is a database schema for biological data
  Tripal allows Drupal to access data stored in the Chado database to populate web pages using Drupal functionality.
  Community: small and academic
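As a rough illustration of the kind of data a Chado-style schema holds, the sketch below stores one gene and two transcripts and then looks up the transcripts that belong to the gene. It is heavily simplified and rests on assumptions: it uses an in-memory SQLite database so that it runs anywhere, whereas Chado is normally deployed on PostgreSQL; it keeps only a few columns of the feature and feature_relationship tables; and it stores types as plain text, where real Chado uses type_id references into its controlled vocabulary table (cvterm). The feature names are invented, and this is not how Tripal itself queries the database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny, simplified Chado-like database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            type       TEXT NOT NULL   -- simplified: Chado uses type_id -> cvterm
        );
        CREATE TABLE feature_relationship (
            subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
            object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
            type       TEXT NOT NULL   -- simplified, as above
        );
        """
    )
    # Hypothetical features: one gene with two transcripts.
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            (1, "GENE0001", "gene"),
            (2, "GENE0001-RA", "mRNA"),
            (3, "GENE0001-RB", "mRNA"),
        ],
    )
    # The subject (transcript) is part_of the object (gene).
    conn.executemany(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        [(2, 1, "part_of"), (3, 1, "part_of")],
    )
    return conn


def transcripts_of(conn, gene_name):
    """Return the names of features recorded as part_of the given gene."""
    rows = conn.execute(
        """
        SELECT child.uniquename
        FROM feature AS parent
        JOIN feature_relationship AS rel ON rel.object_id = parent.feature_id
        JOIN feature AS child ON child.feature_id = rel.subject_id
        WHERE parent.uniquename = ? AND rel.type = 'part_of'
        ORDER BY child.uniquename
        """,
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(transcripts_of(build_demo_db(), "GENE0001"))
```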


I5K Workspace @ NAL 2. Apollo
  Apollo is a web application that allows interactive, instantaneous editing of genome features
  It is one of the key features of the i5k Workspace
  Community: small and academic


I5K Workspace @ NAL Customized Resources
  Registration module for Apollo application
    Completely built in house
    Integrates notifications, account creation, and captcha
  Visualizing custom data types: gene pages
    Hierarchical view to display gene/transcript relationships
  Search website (many thousands of nodes)
    Apache Solr search
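For readers unfamiliar with Apache Solr, a search against an index is an HTTP request to a core's select handler. The sketch below only builds such a request URL using Solr's standard q, rows, and wt parameters. The base URL and the core name "i5k_nodes" are hypothetical; the slides do not say how the i5k Workspace index is named or configured, and on a Drupal site the query would normally be issued by a search module rather than by hand.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, rows=10):
    """Build a URL for Solr's standard /select handler.

    base_url and core are placeholders to be supplied by the caller;
    q, rows, and wt are standard Solr query parameters.
    """
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(build_solr_query_url("http://localhost:8983/solr", "i5k_nodes", "Apis mellifera", rows=5))
```

Fetching the resulting URL with any HTTP client would return the matching documents as JSON, provided a Solr server with that core is actually running.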


I5K Workspace @ NAL Tripal: Lessons learned
  Customization requires one full-time developer at the NAL
  Because our customizations are forked off the main repository, any updates in the main branch require more updates on our part
  Customizations are too specific to our website to be able to fully contribute back to/integrate with the main project


I5K Workspace @ NAL Apollo: Customized resources
  Instead of building customized resources, we contributed financially to the salary of the lead developer.
  Improvements were not specific to the NAL's goals, but were aimed at improving the stability of the application.
  Even without a financial contribution, bug reports and feature requests from the entire user community are usually addressed very quickly, thanks to an active development team and a lead developer solely focused on this project.


I5K Workspace @ NAL Apollo: Lessons learned
  How you interact with the development community of an OSS project depends on:
    1) the community itself
    2) the specificity of the customization required


I5K Workspace @ NAL https://i5k.nal.usda.gov


Life Cycle Assessment (LCA) Commons

LCA Commons is a repository that provides access to data and tools that support life cycle assessment of agricultural products.

We collect, curate, and provide access to data edited and formatted explicitly for use in LCA

The LCA Commons is designed specifically to handle and support unit process data for LCA.

Website: www.lcacommons.gov


LCA Commons Technology Stack
  Three separate applications accessed through the Drupal web content management system.
  Discovery and Editorial Applications
    Groovy/Grails web implementation of the domain-specific openLCA data model/modeling tool
  LCA Collection on Ag Data Commons
    DKAN catalog and datastore


LCA Commons Technology Stack (diagram)
  Applications: Discovery Application, Editorial Application, LCA Collection on Ag Data Commons (lcacommons.gov)
  Components: Groovy/Grails Framework, Solr Index, openLCA API, Activiti BPM, Drupal, Custom User Mgt., DKAN Datastore, DKAN Catalog
  Databases: openLCA MySQL (shown twice)


LCA Commons
Customized Resources

The openLCA datastore was not designed explicitly for data management beyond what is necessary for desktop modeling. This has required developing custom workarounds for data management.

Activiti BPM has required significant customization for editorial workflow for LCA data

Will need to develop customized search capabilities that enable search across all three applications through Drupal


LCA Commons
Lessons learned
  Technology selection based on clearly defined functional requirements is critical
  Using openLCA for an application for which it was not exactly designed has required custom development
  AND innovation in the field
    Spurred openLCA developer to build functionality that more closely meets our needs and pushed the domain forward in terms of data sharing and management


LCA Commons
  http://lcacommons.gov


PubAg Data Management System
  PubAg is the National Agricultural Library's search system for agricultural information.
  Content:
    Full-text articles relevant to the agricultural sciences
    Citations to peer-reviewed journal articles
  Repository (Data Management):
    Fedora Commons/Islandora/Drupal
  Public Interface:
    Apache Solr and Java application layer




PubAg Data Management System
  From Islandora (https://wiki.dur