ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka

Faculty Profiling and Searching in the Eureka Research Workbench using VIVO and ScientistsDB Matthew Morse, Israel Hurst, and Stuart J. Chalk Department of Chemistry University of North Florida [email protected] 2014 Fall ACS Meeting

Upload: stuart-chalk

Post on 24-May-2015


DESCRIPTION

Development of Eureka plugins for finding researchers listed in VIVO installations and on the ScientistsDB website, together with a plugin for accessing Elasticsearch from within Eureka.

TRANSCRIPT

Page 1: ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka

Faculty Profiling and Searching in the Eureka Research Workbench

using VIVO and ScientistsDB

Matthew Morse, Israel Hurst, and Stuart J. Chalk
Department of Chemistry
University of North Florida

[email protected]

2014 Fall ACS Meeting

Page 2: Outline

Motivation
What is Eureka?
What is VIVO?
VIVO API
What is ScientistsDB?
MediaWiki API
Search Approaches
Elasticsearch
Usage
Future Plans
Conclusion

Page 3: Motivation

Eureka Research Workbench is an Electronic Laboratory Notebook (ELN) …
…plus representation of resources
…and needs to be social

Find colleagues that you can collaborate with

There are many places to get this information

Page 4: Electronic Notebooks

Scientists need to move to digital notebooks…

...and record not just the data but the flow and context

How science is done is important for searching, aggregation, meta-analysis

We need more than an electronic version of a notebook

We need a science version of “Second Life” (SciLife?)

Page 5: Eureka Research Workbench (ERW)

Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project

Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world

How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)

Page 6: Experiment Markup Language (ExptML)

A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)

Annotation, Api, Calculation, Chemical, Citation, Customer, Data, Dataset, Definition, Element,
Equipment, Event, Experiment, Group, Message, Project, Protocol, Quote, Report, Result,
Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

Page 7: What is VIVO?

An interdisciplinary network: enabling collaboration and discovery among scientists across all disciplines.

Open source software out of Cornell University
Now part of DuraSpace (DSpace, Fedora-Commons, and VIVO)
Often integrated with other academic services
Semantic representation -> VIVO Ontology (https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology)

http://vivoweb.org/

Page 8: VIVO API

Interface to search for different types of ‘individuals’
  Faculty members
  Subjects
  Departments
  …

Available in multiple download formats
  N-Triples, RDF, N3, Turtle, JSON-LD

https://wiki.duraspace.org/display/VIVO/The+ListRDF+API
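
To illustrate what a client can do with an N-Triples download, here is a minimal Python sketch that pulls the IRIs of individuals typed as faculty members out of N-Triples text. It is not the Eureka plugin (which is written for CakePHP). The sample data, the host vivo.example.edu, and the function name individuals_of_class are assumptions made for illustration, and a real client would more likely use an RDF library such as rdflib than a regular expression.

```python
import re

# Matches one N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FACULTY = "http://vivoweb.org/ontology/core#FacultyMember"


def individuals_of_class(ntriples, vclass=FACULTY):
    """Return subject IRIs typed as `vclass`, in document order, without duplicates."""
    found = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank lines, comments and literal-valued triples
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and obj == vclass and subject not in found:
            found.append(subject)
    return found


# Hypothetical sample; the host and the n-numbers are made up.
SAMPLE = """\
<http://vivo.example.edu/individual/n1001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://vivoweb.org/ontology/core#FacultyMember> .
<http://vivo.example.edu/individual/n1002> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://vivoweb.org/ontology/core#FacultyMember> .
<http://vivo.example.edu/individual/n1003> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
"""

if __name__ == "__main__":
    for iri in individuals_of_class(SAMPLE):
        print(iri)
```

Each IRI returned this way can then be requested individually to obtain the full record for that person.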

Page 9: What is ScientistsDB?

MediaWiki site containing nearly 50,000 scientists

Wikipedia entries
…plus manual additions

Tony Williams, RSC
Sean Atkins, CDD Vault

http://www.scientistsdb.com/

Page 10: MediaWiki API

MediaWiki is the software that runs Wikipedia
Available for download (http://www.mediawiki.org)
Access to all data in a MediaWiki MySQL database
Components
  Authentication
  Search
  CRUD

http://www.mediawiki.org/wiki/API:Main_page

Page 11: Search Approaches

VIVO
  listRDF API for faculty
  (http://<instance>/listrdf?vclass=http://vivoweb.org/ontology/core#FacultyMember)
  Faculty member information (as JSON)
  (http://<instance>/individual/a52486491431389?format=json)

ScientistsDB
  Retrieve infobox
  (http://www.scientistsdb.com/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Scientist)
  Extract records with ‘fields’ field
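
The requests above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin code: the function names, the cmlimit value, and the sample wikitext are hypothetical, and the infobox text is assumed to have been fetched separately. One detail worth showing is that the ‘#’ in the vclass IRI has to be percent-encoded, otherwise it is treated as a URL fragment and never reaches the server.

```python
import re
from urllib.parse import urlencode

FACULTY = "http://vivoweb.org/ontology/core#FacultyMember"


def vivo_list_url(instance, vclass=FACULTY):
    """listRDF URL for one VIVO instance; urlencode escapes the '#' in vclass."""
    return f"http://{instance}/listrdf?" + urlencode({"vclass": vclass})


def vivo_individual_url(instance, local_name):
    """URL for one individual's record, using the format=json parameter from the slide."""
    return f"http://{instance}/individual/{local_name}?" + urlencode({"format": "json"})


def scientistsdb_members_url(category="Category:Scientist", limit=50):
    """MediaWiki API query listing the pages in a category (limit is an assumed value)."""
    params = {"action": "query", "format": "json", "list": "categorymembers",
              "cmtitle": category, "cmlimit": limit}
    return "http://www.scientistsdb.com/api.php?" + urlencode(params)


def member_titles(response):
    """Page titles from a parsed list=categorymembers JSON response."""
    return [m["title"] for m in response.get("query", {}).get("categorymembers", [])]


# One "| fields = ..." line of an infobox; only spaces and tabs may surround the '='.
FIELDS_LINE = re.compile(r"^[ \t]*\|[ \t]*fields[ \t]*=[ \t]*(.*)$",
                         re.MULTILINE | re.IGNORECASE)


def infobox_fields(wikitext):
    """Raw value of the infobox 'fields' parameter, or None if absent or empty."""
    match = FIELDS_LINE.search(wikitext)
    value = match.group(1).strip() if match else ""
    return value or None


# Hypothetical infobox text for illustration only.
SAMPLE_WIKITEXT = """{{Infobox scientist
| name = Jane Example
| fields = [[Analytical chemistry]], [[Cheminformatics]]
| workplaces = Example University
}}"""

if __name__ == "__main__":
    print(vivo_list_url("vivo.example.edu"))
    print(scientistsdb_members_url())
    print(infobox_fields(SAMPLE_WIKITEXT))
```

Pages whose infobox has no ‘fields’ line return None and can simply be skipped when building the searchable records.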

Page 12: Elasticsearch

Data is stored on a cluster of computers running Elasticsearch NoSQL software

All data is ingested as JSON

Uses Apache Lucene to index data

http://www.elasticsearch.org/overview/elasticsearch
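
As a rough picture of what "ingested as JSON" means in practice, the sketch below builds a request body in the newline-delimited format used by the Elasticsearch bulk endpoint: one action line followed by one document line per record. The index name, the record fields, and the helper bulk_payload are assumptions made for illustration. The action line shown follows the current format, and the Elasticsearch releases in use in 2014 also expected a _type entry in it.

```python
import json


def bulk_payload(index, records, id_key="id"):
    """Build a newline-delimited bulk request body that indexes each record."""
    lines = []
    for record in records:
        action = {"index": {"_index": index}}
        if id_key in record:
            # Reuse the record's own identifier so re-ingesting overwrites, not duplicates.
            action["index"]["_id"] = str(record[id_key])
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical researcher records.
RECORDS = [
    {"id": "n1001", "name": "Jane Example", "fields": ["Analytical chemistry"]},
    {"id": "n1002", "name": "John Sample", "fields": ["Cheminformatics"]},
]

if __name__ == "__main__":
    print(bulk_payload("researchers", RECORDS), end="")
```

The resulting text would be sent as the body of a POST to the cluster's _bulk endpoint; the HTTP call is left out so the sketch runs without a server.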

Page 13: Implementation

Development of CakePHP plugins for
  VIVO (multiple locations)
  ScientistsDB
  Elasticsearch

CakePHP can access each of these anywhere in its Model-View-Controller (MVC) code

Note from Stuart Chalk: As of the presentation, the VIVO plugin was not working due to problems ingesting data into Elasticsearch. We have since decided to go back to using MySQL for the data until we fully understand the problems with Elasticsearch.

Page 14: Future Plans

Ingest more installations of VIVO
Work with technical staff at VIVO to make multi-site search available to all VIVO users

Improve code to clean up infobox data
Work with Tony and Sean to evaluate if there are better ways to retrieve subject fields
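
The kind of infobox clean-up meant here might look like the following Python sketch, which turns a raw ‘fields’ value into a list of plain subject names. The rules it applies (drop references and templates, unwrap wiki links, split on commas, semicolons and line breaks) are assumptions about what the data needs, not a description of the existing plugin code, and the function name clean_fields is hypothetical.

```python
import re


def clean_fields(raw):
    """Turn a raw infobox 'fields' value into a de-duplicated list of plain names."""
    text = re.sub(r"<ref[^>]*/>", "", raw)                            # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # paired refs
    text = re.sub(r"<br\s*/?>", ",", text, flags=re.IGNORECASE)       # line breaks separate items
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)     # [[Page|label]] -> label
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)                        # templates are dropped
    seen, names = set(), []
    for part in re.split(r"[,;\n]", text):
        name = part.strip(" *'\t")      # strip list bullets and bold/italic quote marks
        key = name.lower()
        if name and key not in seen:    # compare case-insensitively, keep first spelling
            seen.add(key)
            names.append(name)
    return names


if __name__ == "__main__":
    raw = ("[[Analytical chemistry]], [[Cheminformatics|cheminformatics]]"
           "<br />[[Physics]]<ref>cite</ref>")
    print(clean_fields(raw))
```

Dropping templates outright loses any subject names written inside list templates, which is one reason a better source for subject fields would be worth pursuing.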

Page 15: Conclusion

ScientistsDB plugin works
VIVO plugin very close…

Eureka needs to be collaborative software, and therefore being able to find other researchers in your field is an important part of the system

Development of many more plugins to access online data sources within Eureka