EU COST Action CM1404: WG4 - Efficient Data Exchange
TRANSCRIPT
Sustainable and Smart Energy Carriers
for Decentralised Energy Production
PRESENTATION
Data Mining Challenges in Distributed Generation
Edward S. Blurock
Blurock Consulting AB
(previously with
Malmö University: Computer Science Dept.
Lund University: Combustion Physics, Energy Sciences
Research Institute for Symbolic Computation
University of California, Irvine: Thesis, Computational Chemistry)
bottom line: a career in (chemical) modelling
(using data, AI and machine learning/data mining …)
(SLIGHTLY) REVISED TITLE
Data Mining Challenges in Distributed Generation
Community(?)
Combustion Community(?)
Data mining challenges from the widely distributed data generated by the scientific community,
specifically for those dealing with all aspects of combustion
WHAT WE ARE TALKING ABOUT
Data
Theme: you have to have data available before you can do something with it
PRESENTATION
• Introduction (with disclaimers and revisions)
• Motivation:
  • Data exchange moving into the clouds
• WG4:
  • Standard definition for data collection and mining toward a virtual chemistry of Smart Energy Carriers
• WG4 Task Force:
  • Toward efficient data exchange in the combustion community
DATA PERSPECTIVE: DATA EXCHANGE
• Data is the backbone of modern scientific research
• Exchange of data is paramount to successful interaction between research groups
OPEN DATA
Publications and conferences
Data exchanged between researchers (email, etc.)
Virtual Research Environment
Papers
Data files
Clouds (infrastructures)
TOWARD A VIRTUAL SCIENTIFIC ENVIRONMENT
We are not alone in this development (maybe a bit behind)
DATA PERSPECTIVE GOALS: SOCIAL NETWORK
Need tools to promote efficient data sharing within the community
DATA PERSPECTIVE: MANY SOURCES
Need to accommodate the varied data that needs to be handled
DATA PERSPECTIVE: INTERRELATIONSHIPS
There is no such thing as an isolated data point
DATA PERSPECTIVE: QUALITY CONTROL
Reproducibility
Reliability
Accountability
Due to accountability requirements (financial incentives), data management tools are already being used.
An important aspect of the interdependency of data is quality control (calculation of sensitivity or error bars).
Efficient data exchange and availability (beyond just published data) is the key.
ACCOUNTABILITY: ELECTRONIC LAB NOTEBOOKS (ELN)
In other fields (pharma), accountability has financial motivations (patents) and has led to the development and use of ELNs.
TOWARDS EFFICIENT DATA EXCHANGE
SMARTCATS
WG4
WG4 SUMMARY
WG4 can be summarized in one word: DATA
Management of data:
How do we keep track of, exchange and manage all the data
that is generated by the SMARTCATS community?
Use of data:
How can we efficiently use the immense amount of data
that the SMARTCATS community generates?
WG4: TITLE
Standard definition
for data collection and mining
toward a virtual chemistry of Smart Energy Carriers
WG4 CHALLENGE
The main challenge of this WG is to provide a
forum for all experts in the combustion
community to formulate a common set of
requirements for a universal combustion
database that is not only capable of efficiently
storing the vast amount of raw data generated
by experiments and modeling but is also, more
importantly, efficiently accessible for
future use and maintenance.
WG4: AIMS
• Identification of the main requirements and
tools for the development of databases,
software and mathematical tools for data
collection and handling as well as chemistry
optimization using data mining techniques.
• Definition of “crucial” experiments and
simulations, uncertainty and sensitivity
analysis in combustion modeling
DEFINITIONS, REQUIREMENTS AND TOOLS
WG4: INCREASED DIALOG ABOUT DATA
WG4: DATA PERSPECTIVE
Definition of specific sets of prerequisites and
goals for the establishment of a
combustion database that will allow
efficient electronic communication of
combustion-related data.
WG4 TASK FORCE
Goal: To use the expertise within the action
to promote efficient data exchange
among combustion researchers
First task: Cataloging
1. State of the art (in and out of the community)
2. Data within the community
GOALS OF PHASE I: CATALOGING
• Roles and Perspectives:
• For each role/perspective catalog a prioritised set of requirements,
expectations and desires
• Data to Disseminate
• For each apparatus and tool, outline (in words, mainly) the data
that could/should be available, from raw data to final published
results.
• Current efforts (inside the Action and outside)
• Catalog how different groups are storing data
• Catalog other data handling from other disciplines
• Projects/proposals/discussions having to do with data
PERSPECTIVES AND ROLES
• User: Interested in using the tools to acquire and use the data.
  • In this role, the user is interested in accessing data in a convenient and efficient way. The user is also interested in what data is available.
• Generator: Generates data, both experimental and theoretical.
  • The first focus is how much, in what detail and in what form the data should/can be disseminated.
  • An important aspect of this is to make dissemination as painless and efficient as possible, so as not to add to the generator's burden.
• Software/Database Developer: Developer of the tool.
  • From User: How and in what form the data can be accessed.
  • From Generator: Incorporating their data into whatever system they are developing.
LEVELS OF DATA
• Public: Tables or figures within the text of the publication, or more detailed information in supplementary material.
• Preliminary: Data leading up to published data.
• Experiment: Data directly from the device, uninterpreted and unedited.
• Intermediate: Data that has been processed, but is still very device dependent and not necessarily useful to others. In a sense, this is only useful within the research group.
• Collaboration: Data that is useful to exchange among (knowledgeable?) colleagues and collaborators.
PARTICULAR FOCUS
• Preliminary:
  • Data leading up to published data
• Accessibility
• Usefulness
• Characterisation
• Breadth of exchange: Public, collaborators, within group…
CATALOGING
WHAT data is to be cataloged, and its availability, has to be cataloged first (a major goal of the first phase):
• Catalog with respect to particular devices and models
• Within each:
  • What are the data types and forms
  • Characterisation of the data
  • Quality of the data
  • Usefulness
HOW the data is to be stored is of secondary importance.
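A catalog record along the lines listed above could be sketched as a small data structure. This is only an illustration: the class names, field names and example entries are assumptions made here, not something defined by WG4.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DataLevel(Enum):
    """Levels of data named on the LEVELS OF DATA slide."""
    PUBLIC = "public"
    PRELIMINARY = "preliminary"
    EXPERIMENT = "experiment"
    INTERMEDIATE = "intermediate"
    COLLABORATION = "collaboration"


@dataclass
class CatalogEntry:
    """One hypothetical catalog record for a device or model."""
    source: str                 # device or model the data comes from
    level: DataLevel            # which level of data this is
    data_types: List[str] = field(default_factory=list)  # types and forms
    characterisation: str = ""  # free-text description of the data
    quality: str = ""           # e.g. stated uncertainty or error bars
    usefulness: str = ""        # who could use it and for what


def group_by_source(entries):
    """Group catalog entries by the device or model they describe."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.source, []).append(entry)
    return grouped


# Illustrative entries only; not real catalog content.
entries = [
    CatalogEntry("jet stirred reactor", DataLevel.EXPERIMENT,
                 ["mole fraction vs temperature"]),
    CatalogEntry("jet stirred reactor", DataLevel.PUBLIC,
                 ["table in publication"]),
    CatalogEntry("kinetic model", DataLevel.COLLABORATION,
                 ["simulation output"]),
]
catalog = group_by_source(entries)
```

Grouping by device or model mirrors the "catalog with respect to particular devices and models" point; how such records would actually be stored is deliberately left open.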
HOW: SECONDARY FOCUS
• De facto standards:
  • In moving towards electronic representation, the community is already in the process of establishing standards.
• Convenience:
  • Researchers generate data and ‘store’ it in the most convenient form available to them (generation of data is the primary concern).
• Software:
  • As long as the format is ‘consistent’, intelligent software can interpret it and then convert it to another ‘standard’ form.
HOW the data is to be stored is of secondary importance.
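The 'Software' point can be illustrated with a minimal sketch: if a group consistently writes its data as 'Name: value unit' lines, a few lines of code can read it into a neutral structure and re-emit it in another form (JSON here). The input layout, the function name `parse_properties` and the choice of JSON as the target are assumptions for illustration, not a format prescribed by any database named in this talk.

```python
import json


def parse_properties(text):
    """Parse lines of the form 'Name: value [unit]' into a dict.

    Lines without a colon are skipped. A value that looks like a
    number is converted to float; anything else stays a string.
    """
    properties = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        try:
            value = float(parts[0])
            unit = " ".join(parts[1:])
        except ValueError:
            value = rest.strip()
            unit = ""
        properties[name.strip()] = {"value": value, "unit": unit}
    return properties


# Sample lines in a spreadsheet-like 'Name: value unit' layout.
sample = """Pressure: 106.7 kPa
Volume: 30 cm3
Residence Time: 2 s
Experiment Type: Jet stirred reactor measurement"""

properties = parse_properties(sample)
as_json = json.dumps(properties, indent=2)
```

The sketch works only because the input is consistent; it does not try to guess at free-form text.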
DIFFERENT ‘STANDARDS’: TWO TYPICAL FORMATS
<?xml version="1.0" encoding="utf-8"?>
<experiment>
<fileAuthor>Chemical Kinetics Laboratory, Institute of Chemistry, ELTE, Budapest, Hungary</fileAuthor>
<fileVersion>
<major>1</major>
<minor>0</minor>
</fileVersion>
<ReSpecThVersion>
<major>1</major>
<minor>0</minor>
</ReSpecThVersion>
<bibliographyLink preferredKey="N. Leplat, P. Dagaut, C. Togbe, J. Vandooren,
Combust. Flame 158 (2011) 705-725, Fig. 9, C3H6 not taken"/>
<apparatus>
<kind>stirred reactor</kind>
</apparatus>
<experimentType>Jet stirred reactor measurement</experimentType>
<commonProperties>
<property description="" label="P" name="pressure" units="atm">
<value>1</value></property>
<property description="" label="V" name="volume" units="cm3">
<value>30</value></property>
<property description="" label="tau" name="residence time" units="s" >
<value>0.07</value></property>
<property name="initial composition">
<component><speciesLink preferredKey="C2H5OH" />
<amount units="mole fraction">0.002</amount></component>
<component><speciesLink preferredKey="O2" />
<amount units="mole fraction">0.024</amount></component>
<component><speciesLink preferredKey="N2" />
<amount units="mole fraction">0.974</amount></component>
</property>
Table 1
Experiment Type: Jet stirred reactor measurement
Paper Title: Oxidation of Cyclohexane in a Jet-Stirred Reactor
Common Properties
Pressure: 106.7 kPa
Volume: 30 cm3
Phi: 0.5
Residence Time: 2 s
Fuel inlet mole fraction: 0.0067
Temperature range: 500 - 1100 K
Inlet mole composition
CH3CHO   0.0067   mole fraction
H2       0.0345   mole fraction
N2       0.9      mole fraction
Mole fraction by temperature
Temperature (K)   H2   O2         CO   CO2
500               0    5.87E-02   0    0
525               0    5.63E-02   0    0
XML format from ReSpecTh
Spreadsheet: CloudFlame
State of the art: what is in use now…
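To show how a structured format like the XML excerpt above can be read by software, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The fragment is a shortened copy of the excerpt with closing tags added so that it parses; the function name `read_common_properties` is hypothetical, and real ReSpecTh files contain more elements than are shown in the talk.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment modelled on the XML excerpt above.
# The closing tags are added here (an assumption) so the sketch parses.
FRAGMENT = """<experiment>
  <experimentType>Jet stirred reactor measurement</experimentType>
  <commonProperties>
    <property description="" label="P" name="pressure" units="atm">
      <value>1</value></property>
    <property description="" label="V" name="volume" units="cm3">
      <value>30</value></property>
    <property description="" label="tau" name="residence time" units="s">
      <value>0.07</value></property>
  </commonProperties>
</experiment>"""


def read_common_properties(xml_text):
    """Return {name: (value, units)} for properties that carry a <value>."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.iter("property"):
        value = prop.find("value")
        if value is None:
            continue  # e.g. a composition block without a single value
        result[prop.get("name")] = (float(value.text), prop.get("units"))
    return result


common = read_common_properties(FRAGMENT)
```

Because names and units are explicit attributes, the reader needs no knowledge of the column layout, which is the main contrast with the spreadsheet form.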
Review the Hierarchical Data Format (HDF5) used by the PrIMe database; the hierarchy enables
extension (new groups).
XML DATA REPRESENTATION: WE ARE NOT ALONE
XML is the language of the internet, so many tools exist for its manipulation
Important note:
Though understandable for humans, XML is not necessarily convenient to generate: tools are needed
Gaining ground in scientific computing
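Since hand-writing XML is inconvenient, a small helper can generate it from plain values. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names imitate the ReSpecTh excerpt shown in this talk, while the helper name `build_experiment` and its argument layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_experiment(experiment_type, properties):
    """Build an <experiment> element from (name, value, units) tuples."""
    root = ET.Element("experiment")
    ET.SubElement(root, "experimentType").text = experiment_type
    common = ET.SubElement(root, "commonProperties")
    for name, value, units in properties:
        prop = ET.SubElement(common, "property", name=name, units=units)
        ET.SubElement(prop, "value").text = str(value)
    return root


root = build_experiment(
    "Jet stirred reactor measurement",
    [("pressure", 1, "atm"), ("volume", 30, "cm3")],
)
# Serialise to a text string that can be written to a file or exchanged.
xml_text = ET.tostring(root, encoding="unicode")
```

The data generator only supplies names, values and units; the tool takes care of tags and escaping.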
SOFTWARE SOLUTION SUPPORTING MANY FORMATS
From a software technical point of view:
interchange between formats
Example in computational chemistry
DATABASES WITHIN THE ACTION
http://respecth.chem.elte.hu/respecth
http://www.chemicalkinetics.info
http://primekinetics.org/
OUTPUT
Through input from actors in the SMARTCATS action, a white paper on
Data within the combustion community
We need YOUR input