challenges and opportunities in using workflow technology
TRANSCRIPT
INSTITUTE OF NANOTECHNOLOGY
KIT – The Research University in the Helmholtz Association www.kit.edu
Challenges and Opportunities in using Workflow Technologyfor Reproducible and Reusable Simulation Protocols
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges and Opportunities of Workflow Technology
Multiscale Materials Modelling and Virtual Design @KIT
Challenges in Materials Modelling
Opportunities of Workflow Technology
Challenges of Workflow Technology
9 July, 20192
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual Design @KIT
Challenges in Materials Modelling
Opportunities of Workflow Technology
Challenges of Workflow Technology
9 July, 20193
Challenges and Opportunities of Workflow Technology
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual DesignAG-Wenzel (INT)
9 July 20194
Development and application of methods for multi-scale simulations of nanoscale materials and devices
nanoscale electronics
• single molecule electronics
• organic electronics
carbon based systems
• graphene and carbon nanotubes
Workflows
• Rapid Prototyping with Graphical Workflow editor (SimStack)
Prof. Dr. Wolfgang Wenzel
à materials design and discovery
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual DesignAG-Wenzel (INT)
9 July 20195
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale simulation in Organic Electronics
1. Single molecule parametrization (QM)Geometry optimizationCustomized force-fields
2. Generation of atomistic morphologiesMolecules parametrized on quantum mechanical levelSimulation of physical vapor deposition
3. Calculation of charge hopping ratesFull quantum mechanical electronic structure analysisElectronic couplings, reorganization and orbital energies
4. Charge transport simulationsTime resolved charge carrier/exciton dynamicsIVs, IQEs, carrier balance, quenching, ...
9 July 20196
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual Design @KIT
Challenges in Materials Modelling
Opportunities of Workflow Technology
Challenges of Workflow Technology
9 July, 20197
Challenges and Opportunities of Workflow Technology
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges - Integration of new employeesNew student in the office
Reproduction of a given result as first task
Application of the solution to another data
set (Bachelor level)
Improvement of the given method (Master
level)
Development of a new method
(PhD Cand. level)
Reality
Bachelor Thesis – 3 months
Barely enough time to get familiar with the
work environment (command line, ssh,
HPC, software)
9 July 20198
Training of new Group Members
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges – Reproducability/Replicability
MotivationContinue ongoing projectsBuild on work of others (Publications)Use existing data for Benchmarking/Datamining (Repository)Increase impact of own publications (citations)
9 July 20199
Reproducibility
xkcd.com Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Quality of Published Data (Journal/Database)
9 July 201910
Scientifically sound
Scientifically sound but missing
metadata
missing replicas
methodologigal errors
works only in one lab
Scientifically sound
vs.
fictive data
Challenges – Reproducability/Replicability
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology9 July 201911
Friederich, Advanced Materials (2017)
Schaarschmidt, PLOS ONE (2014)
Preparation of input data
Missing steps/
customizations
e.g. Data sources
Hardware
Compiler options
Software Versions
…
à For Reproducibility, all
important aspects of a simulation
need to be captured
Challenges – Reproducability/Replicability
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
4. Challenges - Scalability
RequirementsAvailability of ResourcesAutomatization of the simulation protocolHomogeneous input data
9 July 201912
Scalability
xkcd.com
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges - Competence Drain
Highly specialized employee (PhD cand., etc.) implements method
Results in highly specialized hard to use software tool
Employee leavesKnowledge about usage of software tool leaves with employeeUsage/Maintenance/Support of software tool hard to impossible
9 July 201913
//Peter wrote this, nobody knows what it does, don't change it!Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges - InteroperabilitySoftware needs to be inter-operable
Multi-scale environmentsData-sharing
Software DevelopmentSoftware needs to be usable by other groupsSoftware aggregates need to be
preparedsharedarchivedpresented (Deliverables)
9 July 201914
Interoperability
//When I wrote this, only God and I understood what I was doing//Now, God only knows
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Challenges – Scientific Group
9 July 201915
Training of new group members
Interoperability
Reproducibility
Scalability
Workflows
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual Design @KIT
Challenges in Materials Modelling
Opportunities of Workflow Technology
Challenges of Workflow Technology
9 July, 201916
Challenges and Opportunities of Workflow Technology
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Workflows
9 July 201917
A workflow represents the coordinated execution of repeatable computational steps while accounting for dependencies and concurrency of tasks.
https://github.com/MD-Studio/MDStudio/
Focus
• Automation• Speedup• Provenance• Innovation
Components
• Applications• (Web)services/Utilities• Libraries• Data• Control Elements
Abstraction Level
• Command Line• Script based• Workflow file• GUI
Backend
• HPC Clusters• Cloud/Grid Resources• Workstation
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Workflows
9 July 201918
Focus
• Automation• Speedup• Provenance• Innovation
Components
• Applications• (Web)services/Utilities• Libraries• Data• Control Elements
Abstraction Level
• Command Line• Script based• Workflow file• GUI
Backend
• HPC Clusters• Cloud/Grid Resources• Workstation
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
The multiscale simulation workflow for Organic Electronics
1. Single molecule parametrization (QM)Geometry optimizationCustomized force-fields
2. Generation of atomistic morphologiesMolecules parametrized on quantum mechanicallevelSimulation of physical vapor deposition
3. Calculation of charge hopping ratesFull quantum mechanical electronic structureanalysisElectronic couplings, reorganization and orbital energies
4. Charge transport simulationsTime resolved charge carrier/exciton dynamicsIVs, IQEs, carrier balance, quenching, ...
9 July 201919 Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
The multiscale simulation workflow for Organic Electronics
1. Single molecule parametrization (QM)Geometry optimizationCustomized force-fields
2. Generation of atomistic morphologiesMolecules parametrized on quantum mechanicallevelSimulation of physical vapor deposition
3. Calculation of charge hopping ratesFull quantum mechanical electronic structureanalysisElectronic couplings, reorganization and orbital energies
4. Charge transport simulationsTime resolved charge carrier/exciton dynamicsIVs, IQEs, carrier balance, quenching, ...
7 July 20193
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
The multiscale simulation workflow for OrganicElectronics
à Traditional approach to multiscale modeling
High level of complexityof multiscale modeling:
Input preparationData transfer tocomputing resourcesJob submission andmonitoring,
9 July 201920
HPC
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Translation enabled by SimStack
9 July 201921
Embedded Scientific modules = „WaNos“
Saved workflows forreproducible multiscalesimulations
Connect to remote computationalresources
Define input filesand parameters foreach module
Construct a workflowby drag & drop Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Translation enabled by SimStack
Open to arbitrary software modulesRapid prototyping: 30min to include new modules, 1 h to construct functional workflowsMaximal reusability and scalabilityModule interoperability:
Fully automated, file based data transfer between modulesSchema based data transfer in developmentCompatible with OWL ontologies (e.g. EMMO, once developed)
9 July, 201922
… a generic workflow platform conquering complexity
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Deposit
Deposit is a Monte-Carlo tool to
generate organic thin-film
morphologies
Complex input language (roughly 80
parameters up front)
Minimum number of input files: 2
Minimum number of parameters,
which you have to actually know: 7
Learning curveLearn bash
Find out about the important parameters
Learn ssh
Scp files over
Learn specific qsub commands
Write (or at least adapt) a submission script
…
9 July 201923
Workflows @INT - Example Deposit
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
The multiscale simulation workflow for Organic Electronics
9 July 201924
HPC
Translation by experts intoReusable workflow elements
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Incorporation of Modules (Workflow Active Nodes - WaNos)
9 July 201925
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology9 July 201926
Incorporation of Modules (Workflow Active Nodes - WaNos)
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Deposit
Deposit is a Monte-Carlo tool to generate organic thin-film morphologies
Complex input language (roughly 80 parameters up front)
Minimum number of input files: 2
Minimum number of parameters, which you have to actually know: 7
9 July 201927
Workflows @INT - Example Deposit
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
The multiscale simulation workflow for Organic Electronics
1. Single molecule parametrization (QM)Geometry optimizationCustomized force-fields
2. Generation of atomistic morphologiesMolecules parametrized on quantum mechanical levelSimulation of physical vapor deposition
3. Calculation of charge hopping ratesFull quantum mechanical electronic structure analysisElectronic couplings, reorganization and orbital energies
4. Charge transport simulationsTime resolved charge carrier/exciton dynamicsIVs, IQEs, carrier balance, quenching, ...
9 July 201928
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Connection to Compute Resources
9 July 201929
HPC
Middleware actively developed in FZ JülichHandles User AuthenticationCan connect to all common schedulers
Handles data transfer between workflow stepsSetup in < 1h with installer provided by Nanomatch
Content adapted from:
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Multiscale Materials Modelling and Virtual Design @KIT
Challenges in Materials Modelling
Opportunities of Workflow Technology
Challenges of Workflow Technology
9 July, 201930
Challenges and Opportunities of Workflow Technology
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Reproducability of WorkflowsWhat needs to be captured?
Software, Tools, Scripts
WaNos
Workflow templates
Executed Workflows
Author
How to capture it
Containerization (e.g. udocker)
Versioning
Tags
Repositories
9 July 201931
https://github.com/indigo-dc/udocker
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
Interoperability of workflow frameworks
9 July 201932
It is unlikely that one workflow framework will meet all the needs of the community
à Shared data Formats and/or Converters will be required
Institute of NanotechnologyJörg Schaarschmidt - Challenges and Opportunities in using Workflow technology
AcknowledgementsProf. Dr. Wolfgang WenzelDr. Ivan Kondov (SCC)
Dr. Timo StrunkDr. Tobias Neumann
9 July 201933
Thank You!
http://www.nanomatch.com
http://www.simstack.eu