Building Big Data in Food Science
Presentation at the SURF Research and Innovation Event 2013, February 28, The Hague University of Applied Sciences. Jan Top is Senior Scientist at Wageningen UR and Professor at VU University Amsterdam.
Jan Top
COMMIT/
Food data?
Data in Food Research
Health effects
Sensory effects
Physical properties
Genomics, metabolomics
Sustainable production
How to get high-quality, multidisciplinary, multi-location data in the first place?
Traditional lab notes
Basically unstructured pages
Personal way of working
Chronological, no erasing
Enable replication
Datasets are scattered, hard to find, understand and combine
Emphasis on data processing
Structured registration of methods, materials, data and observations is part of good science
Modern lab notes
New approach?
Three lines of support
Research workflow - Tiffany
Linking data - Rosanne
Vocabularies – ROC+
Research output structured
Objectives
Activities
Products
● Materials
● Methods
● Devices
● Data
● Models
● People
● ...
Network of activities (figure: my experiment, statistical analysis and conference, linked to methods, devices, e-notes, persons, datasets, models, a paper, a presentation and materials)
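The idea of research output as a network of activities can be illustrated with a small sketch. It assumes a hypothetical model in which each activity records the products it uses and the products it produces; the class `Activity` and the function `provenance` are illustrative names, not part of Tiffany. Following the links backwards answers a provenance question such as "which methods, devices and materials contributed to this presentation?"

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """One research activity and the products it uses and produces (hypothetical model)."""
    name: str
    uses: list[str] = field(default_factory=list)
    produces: list[str] = field(default_factory=list)


def provenance(product: str, activities: list[Activity]) -> set[str]:
    """Collect every product that directly or indirectly contributed to `product`."""
    found: set[str] = set()
    frontier = [product]
    while frontier:
        current = frontier.pop()
        for activity in activities:
            if current in activity.produces:
                for used in activity.uses:
                    if used not in found:  # guard against revisiting (and cycles)
                        found.add(used)
                        frontier.append(used)
    return found


# Example network, loosely following the node labels in the slide.
network = [
    Activity("my experiment", uses=["method", "device", "material", "person"], produces=["dataset"]),
    Activity("statistical analysis", uses=["dataset", "method"], produces=["model"]),
    Activity("conference", uses=["model", "person"], produces=["presentation"]),
]

print(sorted(provenance("presentation", network)))
# ['dataset', 'device', 'material', 'method', 'model', 'person']
```

Because every product is linked to the activity that produced it, context such as the device or method behind a dataset is recorded once and can be recovered later instead of being reconstructed from free-text notes.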
Tiffany
Theme Council KM Platform June 16, 2011
Rosanne
Manual annotation
Heuristic annotation
RDF export
SPARQL-based selection and integration
Scientific Table: proposed addition to SDMX and the RDF Data Cube Vocabulary
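A minimal sketch of heuristic annotation followed by RDF export, assuming a small hand-made vocabulary and a hypothetical namespace `http://example.org/food#`. Column headers are normalised (a bracketed unit is dropped) and matched against known labels; each table row then becomes a `qb:Observation` from the W3C RDF Data Cube vocabulary, written as N-Triples. This is not Rosanne's implementation, and the proposed Scientific Table extension is not modelled. The resulting triples could be loaded into a triple store and queried with SPARQL for selection and integration.

```python
import re
from typing import Optional

# Hypothetical application vocabulary: preferred term -> accepted labels.
VOCAB = {
    "sample": {"sample", "sample id", "specimen"},
    "temperature": {"temperature", "temp"},
    "firmness": {"firmness", "hardness"},
}
EX = "http://example.org/food#"  # hypothetical namespace
QB = "http://purl.org/linked-data/cube#"  # W3C RDF Data Cube vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def normalise(header: str) -> str:
    """Lower-case a header and drop a bracketed unit, e.g. 'Temp (°C)' -> 'temp'."""
    return re.sub(r"\s*[(\[].*?[)\]]", "", header).strip().lower()


def annotate(headers: list[str]) -> dict[str, Optional[str]]:
    """Heuristic annotation: map each header to a vocabulary term, or None."""
    mapping: dict[str, Optional[str]] = {}
    for header in headers:
        key = normalise(header)
        mapping[header] = next(
            (term for term, labels in VOCAB.items() if key in labels), None
        )
    return mapping


def literal(value: object) -> str:
    """Quote a value as an N-Triples string literal (escape backslash, quote, newline)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def to_ntriples(dataset: str, headers: list[str], rows: list[list[str]]) -> list[str]:
    """Export annotated rows as RDF Data Cube observations in N-Triples."""
    mapping = annotate(headers)
    triples = []
    for i, row in enumerate(rows):
        obs = f"<{EX}{dataset}/obs{i}>"
        triples.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        triples.append(f"{obs} <{QB}dataSet> <{EX}{dataset}> .")
        for header, value in zip(headers, row):
            term = mapping[header]
            if term is not None:  # unmatched columns are left for manual annotation
                triples.append(f"{obs} <{EX}{term}> {literal(value)} .")
    return triples


headers = ["Sample ID", "Temp (°C)", "Hardness (N)", "Notes"]
for triple in to_ntriples("texture-study", headers, [["A1", "20", "3.4", "ok"]]):
    print(triple)
```

Columns the heuristic cannot match (here `Notes`) are simply skipped; that is the point where manual annotation would take over.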
Creating ontologies
Ontologies can be created from scratch → very time-consuming
Ontologies can be downloaded → not optimally tuned to the application at hand
ROC+ allows domain experts to define an application-specific vocabulary by:
(i) Getting suggestions from existing ontologies
(ii) Getting suggestions from corpora
(iii) Structuring the identified terms
ROC+ recipe
Start with a few characteristic terms
Add related terms from the suggestions
Iterate for as long as it is useful
Structure the terms
● Broader or narrower
● Synonym
● Related
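The ROC+ recipe can be approximated in a few lines. This sketch assumes that corpus suggestions are words sharing a sentence with a current term, ranked by frequency, and that the domain expert is represented by an `accept` callback; the structuring step stores broader/narrower, synonym and related links loosely modelled on SKOS. All names are hypothetical, suggestions from existing ontologies are not included, and the sketch does not reproduce ROC+ itself.

```python
import re
from collections import Counter
from typing import Callable

STOPWORDS = {"the", "a", "of", "and", "is", "in", "on", "by", "with", "to"}
# Inverse of each relation, so every link is stored in both directions.
INVERSE = {"broader": "narrower", "narrower": "broader", "synonym": "synonym", "related": "related"}


def tokens(sentence: str) -> list[str]:
    """Lower-case words of a sentence, without stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def suggest(terms: set[str], corpus: list[str], k: int = 3) -> list[str]:
    """Rank words that share a sentence with any current term (most frequent first)."""
    counts = Counter()
    for sentence in corpus:
        words = set(tokens(sentence))
        if words & terms:
            counts.update(words - terms)
    # Sort by count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def build_vocabulary(seed: set[str], corpus: list[str],
                     accept: Callable[[str], bool], rounds: int = 3) -> set[str]:
    """Start from seed terms, add accepted suggestions, and repeat for a few rounds."""
    terms = set(seed)
    for _ in range(rounds):
        accepted = [w for w in suggest(terms, corpus) if accept(w)]
        if not accepted:
            break  # nothing useful suggested: stop iterating
        terms.update(accepted)
    return terms


def relate(relations: set[tuple[str, str, str]], a: str, rel: str, b: str) -> None:
    """Structure two terms: store `a rel b` together with its inverse."""
    relations.add((a, rel, b))
    relations.add((b, INVERSE[rel], a))


corpus = [
    "Firmness of the gel is measured by compression.",
    "Gel hardness increases with protein content.",
    "Compression tests measure texture of cheese.",
    "Cheese texture depends on protein and fat.",
]
vocab = build_vocabulary({"texture"}, corpus,
                         accept=lambda w: w in {"cheese", "compression", "firmness", "gel"})
print(sorted(vocab))

relations: set[tuple[str, str, str]] = set()
relate(relations, "firmness", "broader", "texture")  # texture is broader than firmness
relate(relations, "cheese", "related", "texture")
```

With only three suggestions per round, some relevant words (here `hardness`) are not reached within three rounds; in the recipe it is the expert who decides when further iteration stops being useful.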
In a few words...
Food data available but scattered
Workflow approach puts scientific food data into context
Annotated tables support interpretation, selection and integration
Develop application-specific vocabularies
http://www.afsg.nl/InformationManagement/
QUESTIONS, IDEAS?