e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Sciences
October 12th, 2010
Delivering data mining to the Life Science Community
Simon Jupp, School of Computer Science
University of Manchester, United Kingdom
e-LICO project overview
Infrastructure to support collaborative, data-mining-enabled experimental research
Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
Support research in data-intensive, knowledge-rich domains
– Systems biology use case
European Project
European project, 9 partners (month 20 of 36)
– Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
• University of Manchester, UK
• University of Geneva, Switzerland
• Inserm, France
• Josef Stefan Institute, Slovenia
• NHRF, Greece
• Poznan University, Poland
• Rapid-I GmbH, Germany
• Ruder Boskovic Institute, Croatia
• University of Zurich, Switzerland
An EU FP7 collaborative project (2009-2012), Theme ICT-4.4: Intelligent Content and Semantics
Problems…
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non-data-miners
Problems… and solutions (e-LICO planned workflows)
Develop an “Intelligent Discovery Assistant” (IDA) for Data Analysis
– Automatically generate workflows by planning
– Assist the user in solving a DM task
– Structure workflows in workflow templates
– Self-improvement through meta-mining
Ontology-based data model
– Adds semantics
– OWL/RDF-based
– Data Mining Experiment Repository
The e-LICO workflow
Figure: the e-LICO workflow. Components: input data; ontology-based AI planner; workflow execution engine; publish and share; output (data, provenance and models); meta-mining.
Ontology based AI planner
Hierarchical Task Network (HTN) planning
A set of Tasks to achieve possible Data Mining Goals
Tasks have an I/O specification and a set of associated Methods to achieve that task
Methods are composed of simpler Tasks/Methods
Some methods are Operators with Conditions and Effects
Example: my task is ‘Data Mining With Evaluation’; my Goal is a workflow that performs this evaluation via cross-validation
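To make the decomposition idea concrete, here is a minimal total-order HTN planner sketch, assuming a much simpler model than e-LICO's planner: compound tasks are refined by methods into ordered subtasks, and primitive tasks are operators with preconditions and add-effects over a set of facts. The task, method, operator and fact names (DataMiningWithEvaluation, EvaluateByCrossValidation, ReplaceMissingValues, XValidation, "data", and so on) are hypothetical illustrations loosely modelled on the example above, not the DMWF's actual vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical toy HTN planner; names below are illustrative, not from e-LICO.

@dataclass
class Operator:
    name: str
    preconditions: Set[str]   # facts that must hold before the operator runs
    add_effects: Set[str]     # facts that hold after it runs

@dataclass
class Method:
    name: str
    task: str                 # the compound task this method achieves
    subtasks: List[str]       # ordered subtasks (compound tasks or operator names)

def plan(tasks: List[str], state: Set[str],
         methods: Dict[str, List[Method]],
         operators: Dict[str, Operator]) -> Optional[List[str]]:
    """Depth-first HTN decomposition; returns a list of operator names or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:                       # primitive task: apply the operator
        op = operators[head]
        if not op.preconditions <= state:
            return None
        tail = plan(rest, state | op.add_effects, methods, operators)
        return None if tail is None else [op.name] + tail
    for m in methods.get(head, []):             # compound task: try each method in turn
        result = plan(m.subtasks + rest, state, methods, operators)
        if result is not None:
            return result
    return None

operators = {
    "ReplaceMissingValues": Operator("ReplaceMissingValues", {"data"}, {"no_missing_values"}),
    "XValidation": Operator("XValidation", {"data", "no_missing_values"}, {"performance_estimate"}),
}
methods = {
    "DataMiningWithEvaluation": [
        Method("EvaluateByCrossValidation", "DataMiningWithEvaluation", ["CleanMV", "XValidation"]),
    ],
    "CleanMV": [Method("CleanMVByReplacement", "CleanMV", ["ReplaceMissingValues"])],
}

workflow = plan(["DataMiningWithEvaluation"], {"data"}, methods, operators)
print(workflow)  # ['ReplaceMissingValues', 'XValidation']
```

Called with an empty initial state, plan returns None because ReplaceMissingValues requires the "data" fact. A realistic planner would also check the tasks' I/O specifications and backtrack over many alternative methods.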
Workflow planning
The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IOObject | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Method | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction
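Because the ontology is OWL/RDF-based, the DMWF classes can be pictured as subject-predicate-object triples. The sketch below is a rough standard-library illustration under two assumptions: each example is modelled as an rdfs:subClassOf its class (the slide does not say whether examples are subclasses or instances), and a hypothetical "dmwf:" prefix stands in for the real DMWF namespace IRI.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS_OF = "rdfs:subClassOf"

# Class -> examples, taken from the DMWF table on this slide.
DMWF_TABLE: Dict[str, List[str]] = {
    "IOObject": ["Data", "Model", "Report"],
    "MetaData": ["Attribute", "AttributeType", "DataColumn", "DataFormat"],
    "Operator": ["DataTableProcessing", "ModelProcessing", "Modeling", "MethodEvaluation"],
    "Goal": ["DescriptiveModelling", "PatternDiscovery", "PredictiveModelling", "RetrievalByContent"],
    "Task": ["CleanMV", "CategorialToScalar", "DiscretizeAll", "PredictTarget"],
    "Method": ["CategorialToScalarRecursive", "CleanMVRecursive", "DiscretizeAllRecursive", "DoPrediction"],
}

def build_triples(table: Dict[str, List[str]]) -> Set[Triple]:
    """Assumption: each example is a subclass of its table class (hypothetical dmwf: prefix)."""
    return {(f"dmwf:{ex}", SUBCLASS_OF, f"dmwf:{cls}")
            for cls, examples in table.items() for ex in examples}

def members(triples: Set[Triple], cls: str) -> List[str]:
    """Direct subclasses of cls, sorted for stable output."""
    return sorted(s for (s, p, o) in triples if p == SUBCLASS_OF and o == cls)

def root_class(triples: Set[Triple], term: str) -> str:
    """Walk rdfs:subClassOf links upward until a term with no (unseen) parent is reached."""
    seen = {term}
    current = term
    while True:
        parents = sorted(o for (s, p, o) in triples if s == current and p == SUBCLASS_OF)
        if not parents or parents[0] in seen:
            return current
        current = parents[0]
        seen.add(current)

triples = build_triples(DMWF_TABLE)
print(root_class(triples, "dmwf:CleanMVRecursive"))  # dmwf:Method
print(members(triples, "dmwf:IOObject"))  # ['dmwf:Data', 'dmwf:Model', 'dmwf:Report']
```

A production setup would load the published OWL files into a proper RDF library and reasoner rather than hand-building triples; the point here is only the class/example structure.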
AI Planner
Brute-force planning
Probabilistic planning
– What will likely produce better results?
Case-based planning
– How did we solve that previously?
DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation
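The case-based question "How did we solve that previously?" can be illustrated with nearest-neighbour retrieval over past cases. This is a sketch under stated assumptions: each case stores hypothetical dataset meta-features (for instance share of numeric attributes, share of nominal attributes and missing-value ratio, already scaled to [0, 1]), the workflow used, and its observed accuracy. The k closest cases are retrieved and the best-performing one is reused. Operator names and numbers are made up for illustration and are not e-LICO results.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Case:
    meta_features: Sequence[float]  # hypothetical dataset descriptors, scaled to [0, 1]
    workflow: List[str]             # operator sequence used on that dataset
    accuracy: float                 # performance observed for that workflow

def retrieve(case_base: List[Case], query: Sequence[float], k: int = 2) -> List[Case]:
    """Return the k cases nearest to query (Euclidean distance), best accuracy first."""
    nearest = sorted(case_base, key=lambda c: math.dist(c.meta_features, query))[:k]
    return sorted(nearest, key=lambda c: c.accuracy, reverse=True)

# Hypothetical case base.
case_base = [
    Case([0.9, 0.1, 0.0], ["CleanMV", "DecisionTree", "XValidation"], 0.81),
    Case([0.2, 0.8, 0.3], ["DiscretizeAll", "NaiveBayes", "XValidation"], 0.74),
    Case([0.85, 0.15, 0.05], ["CleanMV", "SVM", "XValidation"], 0.86),
]
query = [0.88, 0.12, 0.02]
best = retrieve(case_base, query)[0]
print(best.workflow)  # ['CleanMV', 'SVM', 'XValidation']
```

Ranking the retrieved neighbours by past accuracy is one simple way to blend case-based and performance-driven choice; probabilistic planning would instead estimate the likelihood that each candidate workflow succeeds.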
Workflow Planning
Meta-Mining
Initially, the AI planner recommends applicable DM workflows, not
necessarily good ones
Self-improves with experience through meta-mining
The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive
performance
The planner uses these workflow patterns to design and recommend
promising workflows
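As a toy version of this meta-mining loop, the sketch below treats adjacent operator pairs as "workflow patterns". It keeps those whose mean accuracy over past experiments beats the overall mean by a margin (with a minimum support), then ranks candidate workflows by how many such signatures they contain. The pattern definition, min_support, margin and the experiment records are all hypothetical; the slides describe meta-mining by abstraction and generalisation via DMOP, which this does not attempt to reproduce.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str]

def patterns(workflow: List[str]) -> Set[Pattern]:
    """Hypothetical pattern definition: adjacent operator pairs in a workflow."""
    return set(zip(workflow, workflow[1:]))

def mine_signatures(experiments: List[Tuple[List[str], float]],
                    min_support: int = 2, margin: float = 0.02) -> Dict[Pattern, float]:
    """Keep patterns seen >= min_support times whose mean accuracy beats the baseline by margin."""
    baseline = mean(acc for _, acc in experiments)
    accs: Dict[Pattern, List[float]] = defaultdict(list)
    for workflow, acc in experiments:
        for p in patterns(workflow):
            accs[p].append(acc)
    return {p: mean(a) for p, a in accs.items()
            if len(a) >= min_support and mean(a) >= baseline + margin}

def rank(candidates: List[List[str]], signatures: Dict[Pattern, float]) -> List[List[str]]:
    """Order candidate workflows by how many high-performance signatures they contain."""
    keys = set(signatures)
    return sorted(candidates, key=lambda wf: len(patterns(wf) & keys), reverse=True)

# Hypothetical meta-data from past DM experiments: (workflow, accuracy).
experiments = [
    (["Normalize", "SVM", "XValidation"], 0.88),
    (["Normalize", "SVM", "XValidation"], 0.86),
    (["DiscretizeAll", "NaiveBayes", "XValidation"], 0.75),
    (["DiscretizeAll", "NaiveBayes", "XValidation"], 0.77),
    (["CleanMV", "DecisionTree", "XValidation"], 0.80),
]
signatures = mine_signatures(experiments)
candidates = [["CleanMV", "NaiveBayes", "XValidation"], ["Normalize", "SVM", "XValidation"]]
print(rank(candidates, signatures)[0])  # ['Normalize', 'SVM', 'XValidation']
```

In this sketch the planner side is reduced to re-ranking; feeding the signatures back into method selection during planning would be the more faithful analogue of the loop described above.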
Workflow Execution
04/21/23e-LICO Kick-Off, Geneva 12
All operators in the ontology (200+) are exposed as SOAP- or REST-based Web Services
Plans are converted to a workflow execution language (SCUFL2)
Provenance capture
– Execution times and intermediate models are returned to the planner
Taverna
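The provenance-capture idea can be sketched as a sequential executor that times each step and keeps its intermediate output, so both can be handed back to a planner. This is only an assumption-laden illustration: the steps are local Python functions standing in for the SOAP/REST operator services, the step names are hypothetical, and nothing here reflects the Taverna engine or the SCUFL2 format.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class ProvenanceRecord:
    step: str
    duration_s: float   # wall-clock execution time of the step
    output_type: str    # type name of the intermediate result

def run_workflow(steps: List[Tuple[str, Callable[[Any], Any]]], data: Any):
    """Run steps in sequence, feeding each output to the next, and log provenance."""
    provenance: List[ProvenanceRecord] = []
    intermediates: Dict[str, Any] = {}
    current = data
    for name, fn in steps:
        start = time.perf_counter()
        current = fn(current)
        elapsed = time.perf_counter() - start
        intermediates[name] = current            # kept so a planner could inspect it
        provenance.append(ProvenanceRecord(name, elapsed, type(current).__name__))
    return current, intermediates, provenance

def replace_missing(rows):
    # Naive stand-in for a remote operator: replace missing values with 0.0.
    return [[0.0 if v is None else v for v in r] for r in rows]

def train_mean_model(rows):
    # Stand-in "model": the mean of the first column.
    return {"mean": sum(r[0] for r in rows) / len(rows)}

result, intermediates, prov = run_workflow(
    [("ReplaceMissingValues", replace_missing), ("TrainMeanModel", train_mean_model)],
    [[1.0], [None], [2.0]],
)
print(result)                      # {'mean': 1.0}
print([r.step for r in prov])      # ['ReplaceMissingValues', 'TrainMeanModel']
```

Keeping provenance as plain records makes it easy to serialise them alongside the workflow when it is later published and shared.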
Workflow Publishing and Sharing
Workflows and data can be shared via myExperiment
Build a community of data miners
Set of re-usable workflows, data and workflow templates (packs)
Use case – Obstructive nephropathy
Demonstrated with a Systems Biology use case
– Biomarker discovery and pathway modelling in the study of chronic kidney disease
– KUP challenge initiated (August 2010)
Figure: KUP use case, linking expression data, the KUP KB (RDF store), text mining / image mining, new models and hypotheses, and further wet-lab experiments.
Research Questions
How and when does a planner-based “Intelligent Discovery Assistant” help the end user?
Can we improve planning and suggest better workflows through meta-mining?
Can we plan complex workflows with Scientific Goals that answer biological questions?
– The KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology
Where are we now: availability
http://www.e-lico.eu
1st year demo: http://www.youtube.com/watch?v=JtmqZfzyEKs
eProPlan plugin for Protégé 4.0
Ontologies available
Taverna
http://www.taverna.org.uk
RapidMiner
http://rapid-i.com
Summary
e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
Ontology-based AI planning of KDD workflows
Generic e-Science platform for DM
Application layer for Systems Biology
Acknowledgments
Robert Stevens (Manchester), Alan Williams (Manchester), Rishi Ramgolam (Manchester), Jorg-Uwe Kietz (Zurich), Melanie Hilario (Geneva), e-LICO consortium