ACS 248th Paper 146: VIVO/ScientistsDB Integration into Eureka

Post on 24-May-2015

DESCRIPTION

Development of plugins for accessing researchers listed in VIVO and on the ScientistsDB website. A plugin to access Elasticsearch from within Eureka was also developed.

TRANSCRIPT

Faculty Profiling and Searching in the Eureka Research Workbench using VIVO and ScientistsDB

Matthew Morse, Israel Hurst, and Stuart J. Chalk
Department of Chemistry, University of North Florida

schalk@unf.edu

2014 Fall ACS Meeting

Outline

Motivation
What is Eureka?
What is VIVO?
VIVO API
What is ScientistsDB?
MediaWiki API
Search Approaches
Elasticsearch Usage
Future Plans
Conclusion

Motivation

Eureka Research Workbench is an Electronic Laboratory Notebook (ELN) …

…plus representation of resources …and needs to be social

Find colleagues that you can collaborate with

There are many places to get this information

Scientists need to move to digital notebooks…

...and record not just the data but the flow and context

How science is done is important for searching, aggregation, and meta-analysis

We need more than an electronic version of a notebook

We need a science version of “Second Life” (SciLife?)

Electronic Notebooks

Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project

Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)

Eureka Research Workbench (ERW)

Experiment Markup Language (ExptML)

A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)

Annotation, Api, Calculation, Chemical, Citation, Customer, Data, Dataset, Definition, Element, Equipment, Event, Experiment, Group, Message, Project, Protocol, Quote, Report, Result, Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

What is VIVO?

An interdisciplinary network: Enabling collaboration and discovery among scientists across all disciplines.

Open source software out of Cornell University
Now part of DuraSpace (DSpace, Fedora-Commons, and VIVO)
Often integrated with other academic services
Semantic representation: the VIVO ontology (https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology)

http://vivoweb.org/

VIVO API

Interface to search for different types of ‘individuals’: faculty members, subjects, departments, …

Available in multiple download formats: N-Triples, RDF, N3, Turtle, JSON-LD

https://wiki.duraspace.org/display/VIVO/The+ListRDF+API

What is ScientistsDB?

MediaWiki site containing nearly 50,000 scientists

Wikipedia entries …plus manual additions

Tony Williams (RSC), Sean Atkins (CDD Vault)

http://www.scientistsdb.com/

MediaWiki API

MediaWiki is the software that runs Wikipedia
Available for download (http://www.mediawiki.org)
Access to all data in a MediaWiki MySQL database
Components: authentication, search, CRUD

http://www.mediawiki.org/wiki/API:Main_page
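With nearly 50,000 entries, ScientistsDB's list of scientists cannot come back in a single MediaWiki API response: `list=categorymembers` returns one batch (at most 500 titles for ordinary clients) plus a `continue` object that must be sent with the next request. The sketch below pages through a category that way. It assumes a MediaWiki version that emits the `continue` object (older releases used `query-continue` instead); `iter_category_members` and `http_fetch` are hypothetical helper names, not code from the Eureka plugins.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

# Endpoint taken from the ScientistsDB query shown in this talk.
API_URL = "http://www.scientistsdb.com/api.php"


def http_fetch(params: Dict[str, str]) -> dict:
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_category_members(
    fetch: Callable[[Dict[str, str]], dict],
    category: str = "Category:Scientist",
) -> Iterator[str]:
    """Yield every page title in `category`, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",  # largest batch allowed for ordinary clients
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        cont = data.get("continue")
        if not cont:
            break  # no continuation object: this was the last batch
        # Copy every continuation key (e.g. cmcontinue) into the next request.
        params = {**params, **cont}
```

Usage would be `titles = list(iter_category_members(http_fetch))`. Passing the fetch function in as a parameter keeps the paging logic testable without network access.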

Search Approaches

VIVO

listRDF API for faculty (http://<instance>/listrdf?vclass=http://vivoweb.org/ontology/core#FacultyMember)

Faculty member information as JSON (http://<instance>/individual/a52486491431389?format=json)
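Because the `vclass` value is itself a URI containing `#`, it has to be percent-encoded inside a query string; otherwise HTTP clients treat `#FacultyMember` as a URL fragment and it never reaches the server. A minimal Python sketch of building the listRDF request and collecting faculty URIs from the reply follows. It assumes the reply was obtained as N-Triples (one of the formats listed for the VIVO API) and that each individual appears with an `rdf:type` triple; the hostname `vivo.example.edu` and the function names are hypothetical.

```python
import re
import urllib.parse
from typing import List

FACULTY_CLASS = "http://vivoweb.org/ontology/core#FacultyMember"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One N-Triples statement whose object is a URI: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def listrdf_url(instance: str, vclass: str = FACULTY_CLASS) -> str:
    """Build a listRDF URL; urlencode escapes the '#' in the class URI as %23."""
    return f"http://{instance}/listrdf?" + urllib.parse.urlencode({"vclass": vclass})


def individuals_of_class(ntriples: str, vclass: str = FACULTY_CLASS) -> List[str]:
    """Return subject URIs typed as `vclass`, in first-seen order, without duplicates."""
    found: List[str] = []
    for line in ntriples.splitlines():
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == RDF_TYPE and m.group(3) == vclass:
            if m.group(1) not in found:
                found.append(m.group(1))
    return found
```

Each URI returned this way could then be fetched individually, as in the faculty member information request above.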

ScientistsDB

Retrieve infobox

(http://www.scientistsdb.com/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Scientist)

Extract records that have a ‘fields’ field
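Extracting the ‘fields’ record means parsing the infobox in each page's wikitext, which the MediaWiki API can return with `prop=revisions&rvprop=content`. The sketch below assumes a Wikipedia-style `{{Infobox scientist}}` with one `| fields = …` parameter per line and does only basic cleanup (wiki links and `<br />` separators); `extract_fields` is a hypothetical name, and real infobox data will likely need more cleaning than this.

```python
import re
from typing import List, Optional

# An infobox parameter line such as "| fields = [[Chemistry]], [[Physics]]".
# [ \t] (not \s) keeps an empty value from swallowing the next line.
FIELDS_LINE = re.compile(r"^[ \t]*\|[ \t]*fields[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)

# [[Target|Label]] -> Label, [[Target]] -> Target
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def extract_fields(wikitext: str) -> Optional[List[str]]:
    """Return the cleaned 'fields' values of an infobox, or None if absent or empty."""
    m = FIELDS_LINE.search(wikitext)
    if not m or not m.group(1):
        return None
    text = WIKILINK.sub(r"\1", m.group(1))  # keep only the visible link text
    text = re.sub(r"<br\s*/?>", ",", text, flags=re.IGNORECASE)  # <br /> as a separator
    return [part.strip() for part in text.split(",") if part.strip()]
```

Pages for which the function returns `None` would be the records without a ‘fields’ field and could be skipped.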

ElasticSearch

Data is stored on a cluster of computers running Elasticsearch NoSQL software

All data is ingested as JSON

Uses Apache Lucene to index data

http://www.elasticsearch.org/overview/elasticsearch
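Since all data goes into Elasticsearch as JSON, ingesting a batch of faculty profiles amounts to posting newline-delimited JSON to the `_bulk` endpoint, and searching means posting a query DSL object. The sketch below only builds those request bodies. The index name `faculty` and the document fields are hypothetical, and it follows current Elasticsearch conventions; the 1.x releases in use in 2014 also required a `_type` in each action line.

```python
import json
from typing import Dict, Iterable, List


def bulk_body(index: str, docs: Iterable[Dict]) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines: List[str] = []
    for doc in docs:
        # Action line, then the document source on the following line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def match_query(field: str, text: str) -> Dict:
    """A full-text match query; the text is analysed before Lucene looks it up."""
    return {"query": {"match": {field: text}}}
```

The bulk body would be sent with `Content-Type: application/x-ndjson`, and the query object as the JSON body of a `_search` request.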

Implementation

Development of CakePHP plugins for:
VIVO (multiple locations)
ScientistsDB
Elasticsearch

CakePHP can access each of these anywhere in its Model-View-Controller (MVC) code

Stuart Chalk
As of the presentation, the VIVO plugin was not working because of problems ingesting data into Elasticsearch. We have since decided to go back to using MySQL for the data until we fully understand those problems.

Future Plans

Ingest more installations of VIVO
Work with technical staff at VIVO to make multi-site search available to all VIVO users
Improve code to clean up infobox data
Work with Tony and Sean to evaluate whether there are better ways to retrieve subject fields

Conclusion

ScientistsDB plugin works
VIVO plugin very close…

Eureka needs to be collaborative software, so being able to find other researchers in your field is an important part of the system

Development of many more plugins to access online data sources within Eureka

schalk@unf.edu
Phone: 904-620-5311
Skype: stuartchalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013

Questions?
