Semantically Enhanced Model Experiment Evaluation Process (SeMEEP)
within the Atmospheric Chemistry Community
• Chris Martin 1,2, Mo Haji 2, Peter Dew 2, Peter Jimack 2, Mike Pilling 1
• 1 School of Chemistry, University of Leeds
• 2 School of Computing, University of Leeds
Outline of the Presentation
• Introduction
• Atmospheric community
• SeMEEP
• ELN Provenance capture
• Conclusion and next stage
Section 1 Overview
• Application domain – atmospheric community
– Reliance on computational models to evaluate data
• Motivation
– Study how to move from today's ad-hoc practices to a sustainable process
– The sustainable process should support:
• Gathering, community evaluation and sharing of data and models between scientists
• Minimal changes to the scientists' proven working practices
• Use within world-wide co-laboratories
Related projects
• CombeChem
– Experimental organic chemistry
– From source to long-term data preservation (knowledge)
– Semantically-enabled ELN
– Data-driven workflow
• Collaboratory for Multi-Scale Chemical Science
– Multi-layer chemical model
• myGrid
– Bio-informatics and related areas (semantic pattern matching)
– Reusable semantic workflow using SMD (semantic metadata)
– Data quality
• Karma2
– Weather forecasting (computational modelling)
– Data-driven workflow
[Figure: workflow fragment with labels Add, Sample, chem1, chem2; multi-scale chemical model layers: Quantum Chemistry, Thermochemistry, Kinetics, Mechanism, Reacting Flow Simulation]
Section 2 Atmospheric Chemistry
• Seeks to understand the chemical processes (reactions) taking place in the lower atmosphere (e.g. smoke)
• It has significant implications for both:
– Air Quality
– Climate Change
The Master Chemical Mechanism (MCM)
• Data repository of elementary chemical reactions & rate constants
• The mechanism is described by a computational model that is evaluated against experimental data
– Chamber experiments
– Field experiments
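As an illustration of evaluating a mechanism against measurements, the sketch below compares a toy model with a few sample points. Everything in it is an assumption made for illustration: the single reaction A -> B, the rate constant, the sample values (invented, not chamber data) and the function names `simulate_first_order` and `rmse`. The MCM itself is far larger and is not solved this way.

```python
import math


def simulate_first_order(c0, k, times):
    """Concentration of A at each time for the toy reaction A -> B."""
    # Analytic solution of d[A]/dt = -k [A].
    return [c0 * math.exp(-k * t) for t in times]


def rmse(model, measured):
    """Root-mean-square error between model output and measurements."""
    if len(model) != len(measured):
        raise ValueError("model and measured series must have the same length")
    squared = sum((m - o) ** 2 for m, o in zip(model, measured))
    return math.sqrt(squared / len(model))


# Invented sample values, used only to exercise the functions.
times = [0.0, 1000.0, 2000.0, 3000.0]    # s
measured = [100.0, 82.0, 67.0, 55.0]     # ppbv (hypothetical)

model = simulate_first_order(c0=100.0, k=2.0e-4, times=times)
score = rmse(model, measured)
print(f"RMSE between toy model and sample points: {score:.3f} ppbv")
```

A single score like this is only one possible evaluation measure; in practice the comparison is usually made by inspecting model and measured time series side by side.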
[Figure: Methyl glyoxal, 27.11.06. MGLYOX / ppbv (0 to 140) against time / s (0 to 40000); series: MCMv3.1 and measured (calibrated using isoprene)]
Section 3 SeMEEP
• Today
– Within the atmospheric chemistry community, provenance is typically recorded in an ad-hoc, unstructured fashion, using a combination of traditional lab-books, word-processing documents and spreadsheets.
• Move to a more sustainable evaluation process that supports the gathering, evaluation and sharing of data and models
• Using semantic metadata
SeMEEP Vision
[Figure: SeMEEP, with labels: Scientist(s) with personal ELN; Laboratory Database(s); Shared Community Semantic Database; Community Evaluation (people); Public Database(s); Data manager (three instances, one labelled "Com")]
• SeMEEP: semantically-enabled MEEP (Model Experiment Evaluation Process)
– Supports the organisation of information and, critically, records its provenance (e.g. to recover secondary data)
Mike Pilling: “SeMEEP approach will radically enhance the effectiveness of a research community to deliver new science”
Requirements for metadata capture for elementary reactions
[Figure labels: Physical Experiment; Raw Data; Metadata (twice); Analysis Process; Processed Data, e.g. k(T, p); Publication; ELN; Community evaluation (subjective); May be partial information; Historical Data; Theory (e.g. quantum mechanics); From other labs; IUPAC (kinetics; International Union of Pure and Applied Chemistry)]
• Only published data
• Rate constants from several labs
• No access to the raw data
• No access to secondary data
• SeMEEP will provide this.
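The gap listed above (evaluators see published rate constants but not the raw or secondary data behind them) can be pictured as a chain of linked records. The sketch below is a minimal illustration under stated assumptions: the `Record` class, the `derived_from` field, the record identifiers and the `trace_sources` helper are all hypothetical and are not taken from SeMEEP.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """One item in a hypothetical provenance store."""
    record_id: str
    kind: str                                   # e.g. "raw", "processed", "publication"
    derived_from: List[str] = field(default_factory=list)  # ids of parent records


def trace_sources(record_id: str, store: Dict[str, Record]) -> List[str]:
    """Return the ids of all ancestors of a record, nearest first."""
    seen = set()
    order = []
    queue = list(store[record_id].derived_from)
    while queue:
        current = queue.pop(0)          # breadth-first walk up the chain
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(store[current].derived_from)
    return order


# Hypothetical records: raw data -> processed k(T, p) value -> publication.
store = {
    "raw-001": Record("raw-001", "raw"),
    "k-001": Record("k-001", "processed", ["raw-001"]),
    "paper-001": Record("paper-001", "publication", ["k-001"]),
}

ancestors = trace_sources("paper-001", store)
print(ancestors)
```

With links like these recorded at capture time, an evaluator starting from a publication could walk back to the processed and raw data it rests on.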
Current Evaluation Processes for the MCM
Envisioned Evaluation Processes
[Figure labels: Laboratory Archive; Community Semantic Database; Workgroup database; Scientist’s Personal ELN Archive; Data sources; Inputs to the modelling process: benchmark data, model parameter sets etc.; ELN Capture of the Model Development Provenance: Model Development, Model Execution, Analysis; Links to experimental data and provenance generation processes; Community Evaluation (subjective); SeMEEP; Semantic-enabled ELN]
Section 4 Electronic Lab Notebooks (ELNs)
• ELNs address the limitations of the current methods of provenance capture.
• Southampton ELN for organic chemistry experiments.
• Benefits to the modeller
– Modelling process can be automatically captured
– Searchable
– Remote access is possible
– Provenance is structured
– Possible to use resolvable references to resources
Will users attach quality metadata?
• Motivate users:
– By demonstrating the value of provenance in their day-to-day work
• Writing publications
• Managing their data
• Reinterpreting the data
– Management
– Publishers
The Modelling Process - A Three Layer Mapping
[Figure: three-layer mapping of the modelling process]
• Experiment Layer
– Experiment Plan, Experiment, Experiment Conclusions
• Modelling Iteration Layer
– Iteration Plan, Modelling Iteration, Iteration Conclusion / Plan for Iteration n + 1
• Modelling Layer
– Model Development, Model Parameters, Model Execution, Model Output, Analysis
– Items listed: Iteration Plan; Model source code; …; Model output data from previous iterations; External data sources; …
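One way to read the three layers is as nested records, where the conclusion of one modelling iteration becomes the plan for iteration n + 1. The sketch below is only an illustration of that reading: the class names, fields and the `start_iteration` method are assumptions, and the example plan and notes are placeholders rather than content from the prototype.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModellingStep:
    """Modelling layer: one development, execution or analysis step."""
    stage: str   # "development", "execution" or "analysis"
    note: str


@dataclass
class ModellingIteration:
    """Modelling iteration layer: plan, steps and conclusion."""
    number: int
    plan: str
    steps: List[ModellingStep] = field(default_factory=list)
    conclusion: Optional[str] = None


@dataclass
class Experiment:
    """Experiment layer: overall plan, iterations and conclusions."""
    plan: str
    iterations: List[ModellingIteration] = field(default_factory=list)
    conclusions: Optional[str] = None

    def start_iteration(self) -> ModellingIteration:
        """Open iteration n + 1, planned from the conclusion of iteration n."""
        if self.iterations:
            previous = self.iterations[-1]
            if previous.conclusion is None:
                raise ValueError("previous iteration has no conclusion yet")
            plan = previous.conclusion
        else:
            plan = self.plan  # the first iteration follows the experiment plan
        iteration = ModellingIteration(number=len(self.iterations) + 1, plan=plan)
        self.iterations.append(iteration)
        return iteration


# Placeholder content, for illustration only.
experiment = Experiment(plan="Compare model output with benchmark data")
first = experiment.start_iteration()
first.steps.append(ModellingStep("execution", "Ran model with baseline parameters"))
first.conclusion = "Revisit the rate constant of one reaction"
second = experiment.start_iteration()
print(second.number, second.plan)
```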
MCM Mechanism being investigated
ELN Process
[Figure labels: Planning the Scientific Process; Modelling Plan; Ontology; Scientific Process: Mechanism Editing, Model Execution, Model Output Analysis; Automatic Metadata Capture; Mechanism version n and Mechanism version n-1: compare to generate metadata; Capture metadata at run time; User Annotation; Metadata Storage (twice)]
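The "compare to generate metadata" step can be pictured as a diff between two mechanism versions. The sketch below is a minimal illustration, assuming each version is a mapping from a reaction string to a rate-constant string; the function `diff_mechanisms`, the record fields and the placeholder reactions (A, B, C, D) are hypothetical and do not describe the prototype's actual format.

```python
def diff_mechanisms(previous, current):
    """List the changes between mechanism version n-1 and version n."""
    changes = []
    for reaction in sorted(set(previous) | set(current)):
        if reaction not in previous:
            changes.append({"change": "added", "reaction": reaction,
                            "new_rate": current[reaction]})
        elif reaction not in current:
            changes.append({"change": "removed", "reaction": reaction,
                            "old_rate": previous[reaction]})
        elif previous[reaction] != current[reaction]:
            changes.append({"change": "edited", "reaction": reaction,
                            "old_rate": previous[reaction],
                            "new_rate": current[reaction]})
    return changes


# Placeholder mechanisms with invented species and rate constants.
version_prev = {"A + OH -> B": "1.0e-11", "B -> C": "2.0e-5"}
version_curr = {"A + OH -> B": "1.2e-11", "C + OH -> D": "3.0e-12"}

changes = diff_mechanisms(version_prev, version_curr)
for entry in changes:
    # Each change is a natural point to ask the user for an annotation.
    print(entry["change"], entry["reaction"])
```

Each record produced this way could be stored as automatically captured metadata and paired with a user annotation explaining why the change was made.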
ELN Screenshots
• Prompts displayed when changing the chemical mechanism:
– Editing a reaction
– Adding a new reaction
ELN Modelling SMD Architecture
[Figure labels: User Interface: Workflow constructor, Annotation interface, Database Query & Retrieval; SMD Modelling sub-system: SMD creation (e.g. data-driven workflow), Context ontology (e.g. materials/process), 3-level scientific services (model dev; execution; analysis); SMD Middleware Services (e.g. ontology services, query etc.); DL-based reasoner; Simulation server; Data Storage (SMD, Model Output & Analysis); Semantic Metadata level; Grid Fabrics]
Evaluation Methodology
• In-depth interviews with members of the atmospheric chemistry model group here at Leeds, covering:
– Demonstration of the prototype
– User testing of the prototype
– Discussion of scenarios involving the use of the prototype (e.g. )
• Analysis
– Interviews recorded and transcribed
– Analysed using techniques from grounded theory
Evaluation
Barriers to adoption:
– Effort required at modelling time for provenance capture
• “[in] your lab book you can write down what ever you want [but with an ELN] it is going to take time to go through the different protocol steps”.
– When asked if they would use an ELN requiring a similar amount of user input to the prototype, the response was positive:
• “Yeah, I think it would be a good thing. I don’t think it is too much extra … work.”
– Rather than viewing the prompts for user annotation as an interruption to their normal work, the user recognised the value of being prompted:
• “is a good way to do it because otherwise you won’t [record the provenance].”
Evaluation
• Users intuitively grasped the benefits of recording provenance with an ELN, and recognised that these benefits would be realised after the time of modelling by a number of stakeholders:
– “if someone else wants to look at … [your provenance], that’s great because the person can see exactly what you have done, where you have been and where to go next. And for yourself, if you are writing up a PhD ... [you can] … see exactly what you’ve done whereas currently you have to rifle through lab-books to see exactly what you have done.”
Section 5 Conclusions and future work
• Outlined SeMEEP and ELN
– Users evaluated the proposed modelling ELN
• Addressed case studies
– IUPAC
– MCM
• Developing a case study with the Geomagnetic community
• User and System issues
– Application of activity theory to capture requirements and user evaluation
– Querying and inference
– Address QoS issues (e.g. security, scalability, dynamic role-based access control)
Questions