PNC, “Collaboration: Tools and Infrastructure”, December 7, 2012
PrIMe: Integrated Infrastructures for Data and Analysis
Michael Frenklach
Supported by AFOSR, Fung
COMBUSTION IS CENTRAL TO ENERGY
• IMPACT ON SOCIETY
  – Energy (power plants, car and jet engines, rockets, …)
  – Defense (engines, rockets, …)
  – Environment (pollutants, global modeling, …)
  – Space exploration
  – Astrophysics
  – Material synthesis
• ESTABLISHED PRACTICE OF COLLABORATION
  – Across different disciplines
  – Across different countries
• THERE IS AN ACCUMULATING EXPERIMENTAL PORTFOLIO
• THEORY/MODELING LINKS THE FUNDAMENTAL TO THE APPLIED LEVEL
[Figure: modeling chain linking experiments and theory, individual reactions, the model (mechanism of ignition, laminar flames, NOx, soot, …), numerical simulations, model reduction, and analysis (sensitivity, reaction path, …); panels show a log-scale plot (x axis 500 to 2500) and a bar chart over individual reactions labeled by reaction number (45, 2, 17, 11, …)]
Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
1970s: 15 reactions, 12 species
1980s: 75 reactions, 25 species
1990s: 300+ reactions, 50+ species
Larger molecular-size fuels:
2000s: 1,000+ reactions, 100+ species
2010s: 10,000+ reactions, 1,000+ species
The networks are complex, but the governing equations (rate laws) are known
Uncertainty exists, but much is known about where the uncertainty lies (rate parameters)
Numerical simulations with parameters fixed to certain values may be performed “reliably”
There is an accumulating experimental portfolio on the system
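The slide's point is that each rate law has a known functional form, while its parameters are uncertain. The following Python sketch assumes the common modified Arrhenius form k(T) = A·T^n·exp(-Ea/(R·T)). It also assumes a multiplicative uncertainty factor f, so the plausible range is k/f to k·f. The point it shows is that a fixed functional form still yields a band of rate coefficients. The parameter values and the helper name `rate_bounds` are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def arrhenius(A: float, n: float, Ea: float, T: float) -> float:
    """Modified Arrhenius rate coefficient k(T) = A * T**n * exp(-Ea / (R * T))."""
    return A * T**n * math.exp(-Ea / (R * T))


def rate_bounds(A: float, n: float, Ea: float, T: float, f: float):
    """Return (k_low, k_nominal, k_high) for a multiplicative uncertainty factor f >= 1."""
    if f < 1.0:
        raise ValueError("uncertainty factor must be >= 1")
    k = arrhenius(A, n, Ea, T)
    return k / f, k, k * f


# Hypothetical parameters, not taken from any evaluated mechanism.
A, n, Ea = 1.0e13, 0.0, 1.5e5  # A in units set by the reaction order; Ea in J/mol
for T in (1000.0, 1500.0, 2000.0):
    k_lo, k_nom, k_hi = rate_bounds(A, n, Ea, T, f=2.0)
    print(f"T = {T:6.0f} K: k = {k_nom:.3e}, range [{k_lo:.3e}, {k_hi:.3e}]")
```

Because the band is multiplicative, it spans a factor of f² at every temperature. A simulation run with parameters fixed to certain values picks a single point inside that band.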
and yet
Lack of predictability
Lack of consensus
but still
PROBLEMS
• current inability to perform truly predictive modeling
  – conflicting data in/among sources
  – poor documentation of data/models
  – no uncertainty reporting or analysis
  – not much focus on integration of data
• resistance to data sharing
  – no personal incentives
  – no easy-to-use technology
• no recognition of the problem
• models are not additive
• data are not additive
• need a system for synthesis of data
PrIMe: Process Informatics Model
INFRASTRUCTURE FOR UQ-PREDICTIVE MODELING
http://primekinetics.org
• Data sharing
• App sharing
• Automation
CURRENT STATUS
• registered members ~400
• countries ~15
• data records ~100,000
• apps ~20
• active “players”: UCB (lead), NSCU, Stanford, MIT, Cambridge, KAUST, Tsinghua
PrIMe
• Portal
  – Access to distributed resources
  – User authorization
  – Social networking
  – User forums
  – Data evaluation panels
  – Help, tutorials, examples
  – Customized Drupal (PHP), platform independent
• Workflow
  – “Browser-based” software
  – Users build projects
  – Data/app linking
  – Binary XML interfaces
  – Remote-server support
  – Project sharing
  – C#, Windows, IE; apps: C#, Matlab
• Warehouse
  – Data collections: models and experiments
  – Controlled by schemas
  – Submission forms
  – Multiple-mode access: WebDAV, XML
DATA ORGANIZATION:
• conceptual abstraction
• practical realization
CONCEPTUAL ABSTRACTION: DATA MODEL
• Chemical Kinetics Model: composed of Chemical Reactions
• Chemical Reactions: composed of Chemical Species; have rate law data
  – parameter values
  – uncertainties
  – reference
• Chemical Species: composed of Chemical Elements; have thermo data and transport data
• Chemical Elements: have atomic masses
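The data model above maps naturally onto nested record types. This sketch uses Python dataclasses with hypothetical names (`Element`, `Species`, `RateLaw`, `Reaction`, `KineticsModel`). PrIMe itself stores these objects as schema-controlled XML, and nothing here reflects its actual schema. One benefit of the nesting is that consistency checks become mechanical. For example, the element balance of the global methane reaction can be verified directly from the species compositions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Element:
    symbol: str
    atomic_mass: float  # g/mol


@dataclass
class Species:
    name: str
    composition: Dict[Element, int]  # element -> number of atoms
    thermo: Optional[dict] = None  # placeholder for thermo data
    transport: Optional[dict] = None  # placeholder for transport data

    def molar_mass(self) -> float:
        """Sum of atomic masses weighted by atom counts, in g/mol."""
        return sum(el.atomic_mass * count for el, count in self.composition.items())


@dataclass
class RateLaw:
    parameters: Dict[str, float]  # e.g. {"A": ..., "n": ..., "Ea": ...}
    uncertainties: Dict[str, float]  # uncertainty for each parameter
    reference: str  # bibliographic source of the values


@dataclass
class Reaction:
    reactants: Dict[str, int]  # species name -> stoichiometric coefficient
    products: Dict[str, int]
    rate_law: Optional[RateLaw] = None


@dataclass
class KineticsModel:
    species: Dict[str, Species]
    reactions: List[Reaction] = field(default_factory=list)

    def _atom_counts(self, side: Dict[str, int]) -> Dict[str, int]:
        """Total atoms of each element on one side of a reaction."""
        totals: Dict[str, int] = {}
        for name, nu in side.items():
            for el, count in self.species[name].composition.items():
                totals[el.symbol] = totals.get(el.symbol, 0) + nu * count
        return totals

    def is_balanced(self, rxn: Reaction) -> bool:
        return self._atom_counts(rxn.reactants) == self._atom_counts(rxn.products)


# Usage: the global methane reaction CH4 + 2 O2 -> CO2 + 2 H2O
H, C, O = Element("H", 1.008), Element("C", 12.011), Element("O", 15.999)
species = {
    "CH4": Species("CH4", {C: 1, H: 4}),
    "O2": Species("O2", {O: 2}),
    "CO2": Species("CO2", {C: 1, O: 2}),
    "H2O": Species("H2O", {H: 2, O: 1}),
}
model = KineticsModel(species)
global_rxn = Reaction({"CH4": 1, "O2": 2}, {"CO2": 1, "H2O": 2})
model.reactions.append(global_rxn)
print(model.is_balanced(global_rxn))  # True
print(round(species["CH4"].molar_mass(), 3))  # 16.043
```

The global reaction is used only because it appears earlier in the talk. A real kinetics model would hold hundreds or thousands of elementary reactions, each carrying its own `RateLaw`.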
PRACTICAL OUTCOME: TRANS-DISCIPLINARY COLLABORATION
[Figure: shared data (reactions, thermo, molecular structure, spectra, absorption coefficient) connecting combustion modeling, quantum chemistry, diagnostics, and thermosciences]
PRIME DATA MODEL: EXPERIMENTS
Data Attribute (QOI, quantity of interest; ‘target’): a specific feature extracted for modeling:
– peak value
– peak location
– induction time
– ratio of peaks (from multiple experiments)
– …
Experimental Record (archival record)
• reference
• apparatus
• conditions
• observations
  – inner: XML
  – remote: HDF5, …
• uncertainties
• additional items
  – links, docs, …
  – video files, …
[Figure labels: VVUQ data, instrumental model]
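Data attributes are features computed from an observed profile. This minimal sketch assumes a profile given as parallel lists of times and signal values. It defines induction time as the time of steepest rise, located by forward differences. That is one common convention, chosen here for illustration, and all function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def peak_features(t: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Peak value and peak location (first maximum) of a sampled profile."""
    i = max(range(len(y)), key=lambda k: y[k])
    return {"peak_value": y[i], "peak_location": t[i]}


def induction_time(t: Sequence[float], y: Sequence[float]) -> float:
    """Midpoint of the sampling interval with the steepest rise (assumed definition)."""
    slopes = [(y[k + 1] - y[k]) / (t[k + 1] - t[k]) for k in range(len(t) - 1)]
    k = max(range(len(slopes)), key=lambda j: slopes[j])
    return 0.5 * (t[k] + t[k + 1])


def ratio_of_peaks(y_a: Sequence[float], y_b: Sequence[float]) -> float:
    """Ratio of peak values from two experiments."""
    return max(y_a) / max(y_b)


def extract_targets(t: Sequence[float], y: Sequence[float],
                    y_reference: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Collect the data attributes for one experiment."""
    targets = dict(peak_features(t, y))
    targets["induction_time"] = induction_time(t, y)
    if y_reference is not None:
        targets["peak_ratio"] = ratio_of_peaks(y, y_reference)
    return targets


# Synthetic profile, for illustration only.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.1, 0.2, 2.0, 3.0, 1.0]
print(extract_targets(t, y, y_reference=[0.5, 1.5, 1.0]))
# {'peak_value': 3.0, 'peak_location': 4.0, 'induction_time': 2.5, 'peak_ratio': 2.0}
```

Storing only these few numbers as targets keeps the full record available in the archive. Modeling then works with compact, well-defined quantities.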
PRIME DATA MODEL
Initial Model: “Upload your data to PrIMe Warehouse” (“give me your data”)
New, Distributed Model: “You may, if you choose, connect your data to the communal system”
• with a switch in the OFF position: “you can use the communal data and tools, but your own data is private to you only”
• “but please flip the switch to the ON position when you are ready to share your own data”
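The ON/OFF switch amounts to a visibility rule: a user sees every shared record plus their own private ones. This sketch assumes a hypothetical `DataRecord` with an `owner` and a boolean `shared` flag. Real access control in PrIMe (user authorization through the Portal) is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DataRecord:
    record_id: str
    owner: str
    shared: bool = False  # the "switch": False = OFF (private), True = ON (shared)


def visible_records(records: Iterable[DataRecord], user: str) -> List[DataRecord]:
    """Records a user may use: all shared records plus the user's own."""
    return [r for r in records if r.shared or r.owner == user]


def flip_switch(record: DataRecord, user: str, on: bool) -> None:
    """Only the owner may change a record's sharing state."""
    if record.owner != user:
        raise PermissionError(f"{user} does not own {record.record_id}")
    record.shared = on


records = [
    DataRecord("x1", "alice", shared=True),
    DataRecord("x2", "bob"),
    DataRecord("x3", "alice"),
]
print([r.record_id for r in visible_records(records, "bob")])  # ['x1', 'x2']
flip_switch(records[1], "bob", on=True)
print([r.record_id for r in visible_records(records, "alice")])  # ['x1', 'x2', 'x3']
```

The default is OFF, so in this sketch connecting a record never exposes it until its owner flips the switch.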
SAME FOR APPS
“Connect your code to the communal system”: you control your own code
• release version
• user access, licenses
• collect fees, if desired
TECHNOLOGY: HOW
Remote server app: PrIMe Web Services (PWS)
• no restrictions on platform
• no restrictions on data formats
• no restrictions on local programming language(s)
PrIMe Workflow Interface (PWI) is the only “standard”
• developed, maintained, and controlled by the community
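PWI is described here only as a community-controlled standard, so this is a guess at its general shape, not its actual definition. The sketch defines a hypothetical component interface in which every component, local or remote, exposes the same `run(inputs) -> outputs` call. The payload is a Python dictionary, standing in for whatever serialized format (such as the binary XML interfaces) is really exchanged. Because the workflow depends only on that signature, each component is free to use any platform, data format, or language, which is the property the slide emphasizes.

```python
from typing import Any, Callable, Dict, Protocol, Sequence

Payload = Dict[str, Any]


class WorkflowComponent(Protocol):
    """Hypothetical interface every component implements."""

    name: str

    def run(self, inputs: Payload) -> Payload: ...


class LocalComponent:
    """Wraps a Python callable. A remote component would implement the same
    run() signature by forwarding the payload to a web service."""

    def __init__(self, name: str, fn: Callable[[Payload], Payload]) -> None:
        self.name = name
        self._fn = fn

    def run(self, inputs: Payload) -> Payload:
        return self._fn(inputs)


def run_workflow(components: Sequence[WorkflowComponent], payload: Payload) -> Payload:
    """Run components in order, merging each one's outputs into the payload."""
    data = dict(payload)
    for component in components:
        data.update(component.run(data))
    return data


# Hypothetical two-step workflow.
scale = LocalComponent("scale", lambda d: {"k": d["k"] * 2})
report = LocalComponent("report", lambda d: {"summary": f"k={d['k']}"})
print(run_workflow([scale, report], {"k": 1.5}))  # {'k': 3.0, 'summary': 'k=3.0'}
```

Merging outputs into one shared payload is the simplest possible wiring. A real dispatcher would also need to handle routing to remote servers, failures, and versioning.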
[Figure labels: client machine, client data, PrIMe web services, PrIMe Data Flow Network, PrIMe Dispatcher]
BIG DATA: excessively large data sets
• do not move the data
• but use “smart agents” (e.g., HTML5 walkers): web services with user-reloaded tasks that fetch data features for user-requested analysis
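The principle is to move the computation to the data, not the data to the computation. It can be illustrated locally. In this sketch a hypothetical `RemoteStore` stands in for a data host, and the data-set key `run_42` is also made up. The naive path serializes the whole data set. The agent path runs a small feature-extraction function next to the data and returns only the result. In a real deployment the agent would be a task registered with the remote web service, not a Python function passed in memory. HTML5 walkers are not modeled.

```python
import json
from typing import Any, Callable, Dict, List

Agent = Callable[[List[float]], Dict[str, Any]]


class RemoteStore:
    """Stand-in for a server holding large data sets (here, in memory)."""

    def __init__(self, datasets: Dict[str, List[float]]) -> None:
        self._datasets = datasets

    def fetch_all(self, key: str) -> str:
        """Naive approach: serialize and ship the whole data set."""
        return json.dumps(self._datasets[key])

    def run_agent(self, key: str, agent: Agent) -> str:
        """Agent approach: compute next to the data, ship only the features."""
        return json.dumps(agent(self._datasets[key]))


def peak_agent(y: List[float]) -> Dict[str, Any]:
    """Return the first maximum and its index."""
    i = max(range(len(y)), key=y.__getitem__)
    return {"peak_value": y[i], "peak_index": i}


signal = [float(k % 1000) for k in range(100_000)]  # synthetic large record
store = RemoteStore({"run_42": signal})
shipped_all = store.fetch_all("run_42")
shipped_features = store.run_agent("run_42", peak_agent)
print(len(shipped_all), len(shipped_features))
print(json.loads(shipped_features))  # {'peak_value': 999.0, 'peak_index': 999}
```

The two printed lengths differ by several orders of magnitude. This gap is why a data feature, not the raw data, is what travels over the network.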
workflow project: user specifies conditions of interest
workflow component retrieves archived data from the data warehouse:
• a set of relevant targets: target values and their uncertainty ranges
• surrogate models developed for relevant targets: active variables and their uncertainty ranges
workflow component:
• retrieves the pertinent kinetics model (via link in the dataset)
• performs simulations on the fly for the conditions specified and builds a new surrogate model
• performs UQ analysis combining the new surrogate model with the archived ones and the rest of the pertinent data
• reports results
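The UQ step can be illustrated with a deliberately crude stand-in, built on these assumptions:
• each archived target has a quadratic surrogate in normalized active variables x, each ranging over [-1, 1]
• each target has an uncertainty interval [lower, upper]
Parameter vectors that keep every surrogate inside its interval are consistent with all the data. The range of the new surrogate over those vectors is the prediction. The sketch below approximates that set by rejection sampling; it is not the analysis method used in PrIMe, and all coefficients are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class QuadraticSurrogate:
    """y(x) = c0 + sum_i lin[i] x_i + sum_ij quad[i][j] x_i x_j."""

    c0: float
    lin: Sequence[float]
    quad: Sequence[Sequence[float]]

    def __call__(self, x: Sequence[float]) -> float:
        n = len(x)
        linear = sum(self.lin[i] * x[i] for i in range(n))
        quadratic = sum(self.quad[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        return self.c0 + linear + quadratic


@dataclass
class Target:
    surrogate: QuadraticSurrogate
    lower: float  # target value minus its uncertainty
    upper: float  # target value plus its uncertainty


def feasible_samples(targets: List[Target], bounds: List[Tuple[float, float]],
                     n_samples: int = 20000, seed: int = 0) -> List[List[float]]:
    """Keep uniformly sampled x that satisfy every target interval."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if all(t.lower <= t.surrogate(x) <= t.upper for t in targets):
            kept.append(x)
    return kept


def predict_range(surrogate: QuadraticSurrogate,
                  samples: List[List[float]]) -> Tuple[float, float]:
    """Min and max of the surrogate over the consistent samples."""
    values = [surrogate(x) for x in samples]
    if not values:
        raise ValueError("no consistent samples: data may be mutually inconsistent")
    return min(values), max(values)


zero = [[0.0, 0.0], [0.0, 0.0]]
bounds = [(-1.0, 1.0), (-1.0, 1.0)]  # normalized active variables
archived = [
    Target(QuadraticSurrogate(1.0, [0.5, 0.2], zero), 0.9, 1.1),
    Target(QuadraticSurrogate(0.0, [0.1, 0.6], [[0.05, 0.0], [0.0, 0.0]]), -0.2, 0.2),
]
new_surrogate = QuadraticSurrogate(2.0, [0.3, -0.4], zero)  # built for the user's conditions

samples = feasible_samples(archived, bounds)
lo, hi = predict_range(new_surrogate, samples)
print(f"{len(samples)} consistent samples; prediction range [{lo:.3f}, {hi:.3f}]")
```

Without the archived targets, the same new surrogate ranges over [1.3, 2.7] on the full box. Combining it with the archived data is what narrows the prediction.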
DATA WAREHOUSE ENRICHMENT
workflow project: user specifies a new set of data
workflow component retrieves archived data from the data warehouse:
• a set of relevant targets: target values and their uncertainty ranges
• surrogate models developed for relevant targets: active variables and their uncertainty ranges
workflow component:
• retrieves the pertinent kinetics model (via link in the dataset)
• performs simulations on the fly for the new data and builds a new surrogate model
• performs UQ analysis combining the new surrogate model with the archived ones and the rest of the pertinent data
• reports results
• adds the new data to the dataset and archives it in the Warehouse
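Before new data are archived, it is reasonable to ask whether they are consistent with what the dataset already holds. This minimal sketch assumes linear surrogates in normalized active variables. Targets are stored as hypothetical `(surrogate, lower, upper)` tuples. The new target is appended only if some sampled parameter vector satisfies all intervals at once. Sampling can miss a very small consistent region, so a zero fraction here is evidence of inconsistency, not proof.

```python
import random
from typing import Callable, List, Sequence, Tuple

Surrogate = Callable[[Sequence[float]], float]
TargetT = Tuple[Surrogate, float, float]  # (surrogate, lower, upper)


def linear(c0: float, coeffs: Sequence[float]) -> Surrogate:
    """Linear surrogate y(x) = c0 + sum_i coeffs[i] * x_i."""
    return lambda x: c0 + sum(c * xi for c, xi in zip(coeffs, x))


def feasible_fraction(targets: List[TargetT], bounds: List[Tuple[float, float]],
                      n: int = 20000, seed: int = 1) -> float:
    """Fraction of uniform samples that satisfy every target interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if all(lo <= f(x) <= hi for f, lo, hi in targets):
            hits += 1
    return hits / n


def enrich(dataset: List[TargetT], new_target: TargetT,
           bounds: List[Tuple[float, float]]) -> Tuple[bool, float]:
    """Append new_target to dataset only if the combined data stay consistent."""
    frac = feasible_fraction(dataset + [new_target], bounds)
    if frac > 0.0:
        dataset.append(new_target)
        return True, frac
    return False, frac


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
dataset = [(linear(1.0, [0.5, 0.2]), 0.9, 1.1)]
print(enrich(dataset, (linear(0.0, [0.1, 0.6]), -0.2, 0.2), bounds))  # accepted
print(enrich(dataset, (linear(0.0, [0.1, 0.1]), 5.0, 6.0), bounds))   # (False, 0.0)
```

The second candidate is rejected because its surrogate cannot exceed 0.2 anywhere on the box, so it can never reach the interval [5, 6].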
FOCUS ON ANSWERING QUESTIONS
• prediction of (un)known observations
• prediction of an (un)known parameter
• prediction of multi-D correlations
ANSWER QUESTIONS
• What causes/skews model predictiveness?
• Are there new experiments to be performed, old ones to be repeated, or theoretical studies to be carried out?
• What impact could a planned experiment have?
• What is the information content of the data?
• What would it take to bring a given model to a desired level of accuracy?
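One way to pose "What impact could a planned experiment have?" is to compare the spread of a prediction with and without the planned measurement. The sketch rests on these assumptions:
• the surrogates are linear
• a hypothetical planned experiment is sensitive mainly to the second active variable
• its outcome and uncertainty are assumed values
The numbers illustrate the bookkeeping, not any real system.

```python
import random
from typing import Callable, List, Sequence, Tuple

Surrogate = Callable[[Sequence[float]], float]


def linear(c0: float, coeffs: Sequence[float]) -> Surrogate:
    """Linear surrogate y(x) = c0 + sum_i coeffs[i] * x_i."""
    return lambda x: c0 + sum(c * xi for c, xi in zip(coeffs, x))


def prediction_width(qoi: Surrogate, targets: List[Tuple[Surrogate, float, float]],
                     bounds: List[Tuple[float, float]], n: int = 20000, seed: int = 2) -> float:
    """Spread (max - min) of the QOI over sampled parameters consistent with all targets."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if all(lo <= f(x) <= hi for f, lo, hi in targets):
            values.append(qoi(x))
    return (max(values) - min(values)) if values else float("nan")


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
existing = [(linear(1.0, [0.5, 0.2]), 0.9, 1.1)]
qoi = linear(2.0, [0.3, -0.4])  # the prediction we care about
# Hypothetical planned experiment: depends only on x2, expected uncertainty +/-0.1,
# outcome assumed to equal its nominal value (0.0 at x = 0).
planned = (linear(0.0, [0.0, 1.0]), -0.1, 0.1)

before = prediction_width(qoi, existing, bounds)
after = prediction_width(qoi, existing + [planned], bounds)
print(f"prediction width before: {before:.3f}, after: {after:.3f}")
```

Repeating this for several candidate experiments ranks them by how much each would shrink the prediction. That is one concrete reading of the "information content" question above.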
A PARADIGM SHIFT
from an algorithm-centric view
to a data-centric view
[Figure labels: input data, code, output data]