OptiPa - KU Leuven

OptiPa (v. 6.2p) User Manual

Table of contents

OptiPa, a MatLab interface
    General information
    Legal disclaimer
    Installing OptiPa
    Structure of the working folders
    Parallel computing
Model implementation
    Model implementation: experimental data
    Model implementation: condition file
    Model implementation: linkage data
    Model implementation: model definition file
    The different time concepts
    Using a spreadsheet to create ASCII input files
    Using the proper MatLab syntax
Graphical User Interface
    GUI: File menu
    GUI: Options menu
    GUI: Help menu
    GUI: Model control menu
    GUI: Preparing the optimisation
    GUI: Selecting an optimisation method
    GUI: Optimisation control
    GUI: Transformations
    Removing outliers
Model simulation
    Graphical simulation output
    Numerical simulation output
Model optimisation
    Graphical optimisation output
    Numerical optimisation output
    Statistical optimisation output
Optimisation per experiment
    Optimisation per experiment: procedure
    Optimisation per experiment: output
Confidence regions
    Confidence regions: procedure
    Confidence regions: output
Sensitivity analysis
    Sensitivity analysis: procedure
    Sensitivity analysis: output
Bootstrapping
    Bootstrapping: procedure
    Bootstrapping: output
Monte Carlo simulations
    Monte-Carlo simulations: procedure
    Monte-Carlo simulations: output
Draw from distributions
    Draw from distributions: procedure
    Draw from distributions: output
Background
    Levenberg-Marquardt method
    Lagrange multipliers
    Model based error resampling bootstrap technique
    Generating random correlated non-Gaussian parameters
    Cholesky decomposition


OptiPa, a MatLab interface The MatLab program OptiPa was developed to estimate model parameters on ODE based models. This help file describes the functionality of OptiPa and some of its background, subsequently highlighting the following topics:

General information

How to prepare your model input files

How to prepare your model definition file

How to operate the graphical user interface

How to run a simulation

How to run an optimisation

How to run an optimisation per experiment

How to determine the conditional joint confidence regions for your model parameters

How to do a sensitivity analysis

How to run a bootstrap analysis

How to run Monte-Carlo simulations

How to generate random co-varying variables

In case of any problems, please contact the author and email your input files to facilitate diagnosing your problem. The text of this help file is also available in PDF format.

Maarten Hertog
KU Leuven BIOSYST-MeBioS
Faculty of Bioscience Engineering
W. de Croylaan 42 - bus 428
BE - 3001 Heverlee
Belgium
www.mebios.be


General information OptiPa was developed because of an urgent need for a flexible and versatile interface to estimate model parameters on ODE based models leaving only a minimum of programming for the end user. One of the innovative aspects of OptiPa is that it readily allows the user to identify the different sources of variation in a model as it allows, for instance, estimating certain model parameters either in common, per experiment, per treatment condition or per experimental unit. Analysing data this way enhances the interpretation of experimental data and the subsequent application of the model to different situations.

System requirements OptiPa is developed within the MatLab environment (The MathWorks, Inc., Natick, MA, USA). From the MatLab program a Windows standalone executable has been compiled which is what is being distributed to the end users. As a result OptiPa can be used without having MatLab installed. OptiPa has been compiled to suit 64 bit Windows based systems.

Important update notes OptiPa version v6.2p has been greatly improved with regard to user-friendliness.

o First of all, the structure of the model OMF-files has been simplified. While the old structure is still supported, users are encouraged to build their models following the new template. Check the help files to learn about the new structure of the model OMF-files.

o In addition, proper syntax checking of the model OMF-files has been introduced, resulting in clear indications of syntax errors including their line and position numbers.

o Furthermore, the statistical output of OptiPa has been completely revamped with the ASCII text output being replaced by rich HTML output combining text and graphical results together.

o Finally, the way users should be addressing the variables from the condition file has been simplified (though this creates a possible incompatibility with old OMF-files).

From version v6.1p onwards OptiPa supports parallel computing. Please read the information on this topic as this has some impact on the operation of OptiPa.

In older OptiPa versions the model files were saved as m-files (with the extension *.m). In the current OptiPa version the model files have been renamed to OMF-files (with the extension *.omf). To move models from previous OptiPa versions just rename the model m-files to OMF-files by changing their extension from *.m to *.omf.
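Moving an old model over therefore only requires a file rename. As a minimal sketch, this Python snippet renames every m-file in a models folder to the new OMF extension (the function name and folder layout are illustrative, not part of OptiPa):

```python
from pathlib import Path

def rename_models_to_omf(models_folder):
    """Rename every old model m-file in the folder to an OMF-file
    by changing its extension from .m to .omf."""
    renamed = []
    for m_file in sorted(Path(models_folder).glob("*.m")):
        omf_file = m_file.with_suffix(".omf")  # MyModel.m -> MyModel.omf
        m_file.rename(omf_file)
        renamed.append(omf_file.name)
    return renamed
```

The same rename can of course be done by hand in Windows Explorer, one file at a time.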


Legal disclaimer While we have made every attempt to ensure the reliability of OptiPa, KU Leuven or its employees cannot be held responsible for any errors or omissions, or for the results obtained from the use of OptiPa. OptiPa is provided "as is", with no guarantee of completeness, accuracy, timeliness or for the results obtained from the use of this information, and without warranty of any kind, express or implied, including, but not limited to warranties of performance, merchantability and fitness for a particular purpose. In no event will KU Leuven, or employees thereof be liable for any decision made or action taken in reliance on the information provided by OptiPa or for any consequential, special or similar damages.

License agreement The LICENSER authorizes the LICENSEE to use OptiPa without any fee for scientific purposes only.

The LICENSER accepts no liability for any problems that arise directly or indirectly from using OptiPa.

The copyright for OptiPa remains with the LICENSER. The LICENSEE has to ensure that OptiPa or parts of it are not distributed outside the institution of the LICENSEE.

When publishing results gained by using OptiPa, the LICENSEE has to mention OptiPa with appropriate citations (ask the author for an appropriate list of references).

Publishing of results comparing OptiPa with other similar software is to be arranged with the LICENSER.


Installing OptiPa OptiPa comes as a self-extractable installation file that will install the MATLAB Compiler Runtime, the OptiPa program files and a folder with some demo models. Just follow the instructions on screen. On running the installation file the user is asked to accept a simple disclaimer to keep us safe...

Fig. 1 OptiPa disclaimer.

Subsequently you are asked to provide a destination folder for the OptiPa program. This can be anywhere on your hard disk, but not in the official program folders. Make sure to provide a proper folder name as by default OptiPa will just unpack in the folder from where you activated the installation file.

Do NOT install OptiPa in the Windows Program Files folder, as OptiPa needs write access to work properly.

Instead, install OptiPa in one of your user folders.


Fig. 2 Provide the OptiPa destination folder.

On confirming the destination folder the installation will start to unpack itself.

Fig. 3 OptiPa is unpacking the installation file.

Once unpacked, the Matlab Runtime Component Installer will be started. The MATLAB Compiler Runtime (MCR) is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed. The version currently used is MCR 8.5 for 64-bit Windows systems, which comes as part of Matlab release 2015a.


Fig. 4 The Matlab Compiler Runtime installer is being extracted.

First time installation of Matlab Runtime Component If MCR is not detected on your machine you will have to go through the full installation procedure as pictured below, which will take about 10 minutes. You will need administrator rights to install the Matlab Compiler Runtime.

Fig. 5 Full installation process of a first time installation of the Matlab Runtime Component

Repeat installation of Matlab Runtime Component


If you already have Matlab Compiler Runtime v8.5 (Matlab 2015a, 64 bit) installed on your machine, you can skip the installation by pressing Cancel in the first screen (Fig. 6).

Fig. 6 Press Cancel if you know for sure you already have Matlab Compiler Runtime v8.5 (Matlab 2015a, 64 bit) installed

Once finished with installing the Matlab Runtime Component the help file of OptiPa is automatically started. Please take some time to read it carefully as everything you ever wanted to know about OptiPa is in here.

Fig. 7 OptiPa help file is automatically launched on installation

After installation, the OptiPa folder should look something like below. A simple shortcut is created in your Start menu as well.


Fig. 8 Structure of the OptiPa installation folder.

Starting OptiPa To start OptiPa, either select the OptiPa link created in your Start menu or go to the installation folder and start OptiPa by double clicking the file optipa.exe (Fig. 8). The very first time you start OptiPa it will take a bit longer, as the program still has to be unpacked from its archive (Fig. 9). The black Console window will remain visible throughout the operation of OptiPa, serving as a progress monitor displaying some of the OptiPa output. This screen replaces the MatLab command window.

Fig. 9 OptiPa is initiating.

After unpacking the archive, OptiPa is automatically started. On first startup you might get a firewall message. This has to do with the fact that MatLab can potentially make use of multiple computers connected through your network. However, OptiPa does not make use of this feature, so you can cancel this request for network access (Fig. 10).


Fig. 10 OptiPa is initiating.

The startup window will be shown while OptiPa is being initialised. Once finished, the startup screen will disappear and control is handed over to the actual program.

Fig. 11 The startup screen of OptiPa. If multiple cores have been selected there will be a further delay related to the initiation of the cores.

Important notes

Closing the black Console window terminates the whole OptiPa program. You can minimise it, though.

OptiPa does not add any entries to the registry of your computer. Deleting the OptiPa folder removes the program completely.

MCR needs to be uninstalled the official way (via the Windows control panel).


Structure of the working folders Working with OptiPa implies working with different file types. First of all, one can supply a total of four user-defined files:

one obligatory ASCII experimental data file, containing the dependent measured experimental data you want to model,

one non-obligatory ASCII condition file, containing any independent variables describing the experimental conditions under which the experiments were performed,

one non-obligatory ASCII linkage file, linking the experimental data to the condition profiles defined, and

one obligatory OptiPa model definition file (OMF-file) containing the actual definition of your mathematical model.

While running OptiPa, several output files are generated during simulation, optimisation, bootstrapping and Monte-Carlo simulation. To keep everything organised, OptiPa works with so-called projects. A project is organised around a certain modelling topic. It generally includes a limited number of experimental data files in combination with several model versions all relating to those same experimental data files. These models can be seen as different attempts to describe the data. Each model saves its output in a separate folder (Fig. 1).

The default location for saving all OptiPa projects is the subfolder …\projects relative to where the OptiPa program files are located. However, this project root can be changed through the options menu. In the example below this project root was changed to D:\OptiPa Projects. Within this project root you find the project folders, as shown in Fig. 1 for the Demos project and the projects ModelTopic1, ModelTopic2 and ModelTopic3.

Within such a project folder three main folders exist: data, models and settings, as is shown for the Demos project. These folders are automatically generated when creating a new project through the file menu. The data folder should contain the model data files: the experimental data file, the condition file and the linkage file (if applicable). The models folder contains all model definition OMF-files. On generating a new model through the file menu, a corresponding output folder is automatically created in the models folder to save output from that particular model.

Fig. 1 Structure of the work folders generated by OptiPa.

Manipulation of projects and folders is done through the file menu on the menu bar from the main OptiPa window.

Important notes

In previous OptiPa versions the model files were saved as m-files (with the extension *.m). In the current OptiPa version the model files have been renamed to OMF-files (with the extension *.omf). To move models from previous OptiPa versions just rename the model m-files to OMF-files.

If one creates a new project by hand (not going through the OptiPa file menu) one has to manually create the sub folders data, models and settings outlined above.

The name of a project and that of a model are not allowed to contain spaces. When creating a new project/model spaces will be automatically replaced by an underscore.

If one creates a model OMF-file by hand (not going through the OptiPa file menu), one has to manually create a corresponding output folder in the models folder as well. If the model OMF-file is named MyNewModel.omf, the corresponding output folder should be named OUTPUT [MyNewModel].
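The naming conventions above can be automated. As a hedged sketch, this hypothetical Python helper (the function name is illustrative, not part of OptiPa) creates an empty OMF-file together with its matching output folder:

```python
from pathlib import Path

def create_model_by_hand(models_folder, model_name):
    """Create an empty model OMF-file plus the matching output folder,
    following OptiPa's 'OUTPUT [ModelName]' naming convention.
    Spaces are replaced by underscores, as OptiPa itself would do."""
    models = Path(models_folder)
    safe_name = model_name.replace(" ", "_")
    (models / (safe_name + ".omf")).touch()
    output_folder = models / ("OUTPUT [" + safe_name + "]")
    output_folder.mkdir(exist_ok=True)
    return output_folder.name
```

Note how the helper also applies the no-spaces rule for project and model names mentioned above.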


Parallel computing As of version 6.1p, OptiPa has been extended with the possibility of parallel computing. Thanks to the Parallel Computing Toolbox™ from MatLab, computationally and data-intensive problems can be processed using multicore processors. OptiPa has been completely restructured and several of the numerical algorithms have been parallelised. MatLab, and therefore also OptiPa, supports the use of up to 12 cores to execute applications on a multicore desktop. The number of cores used can be set through the options menu. The maximum number of cores depends on your system but is limited to 12. When you select one single core you effectively rule out all parallel computations.

Benefits of parallel computing The number of cores initiated will dramatically affect the efficiency of OptiPa during the following tasks:

Simulation: When the data includes multiple experiments, simulations of the individual experiments are executed in parallel, speeding up the simulation task.

Optimisation: All optimisation methods are based on an iterative process of changing parameter values and simulating the model with the new parameter values. When the data includes multiple experiments, simulations of the individual experiments are executed in parallel, speeding up the optimisation task. Except for LSQnonlinear, all optimisation methods support parallel computing in a second way: these genetic/evolution/population-based approaches are all based on simulating large sets of model copies, so even when the data consists of only a single experiment the model copies generated by the various optimisation methods are processed in parallel, speeding up these global optimisation approaches.

Optimisation per experiment: When optimising the model per experiment, the optimisations of the individual experiments are started in parallel, speeding up the overall optimisation task. Except for LSQnonlinear, all other optimisation methods support parallel computing and benefit from the parallelisation as indicated above.

Calculating conditional joint confidence regions: Joint confidence regions are calculated based on an initial optimisation, which benefits from the parallelisation as indicated above. Subsequently, the parameter space is scanned for each pair of model parameters to calculate joint confidence intervals. This computationally intensive algorithm is now completely parallelised, further speeding up the calculation of conditional joint confidence regions.

Do a sensitivity analysis: This task is based on simulating the model several times with slightly different model parameter values. While these runs were previously done sequentially they are now run in parallel speeding up the sensitivity analysis.

Bootstrapping: Bootstrapping starts from an initial optimisation, which benefits from the parallelisation as indicated above. Subsequently, the model is repeatedly fitted to a large number of artificially generated bootstrap datasets. These optimisations are now started in parallel, further speeding up the overall bootstrap process.

Monte-Carlo simulation: Monte-Carlo simulations are based on simulating the model many times with different model parameter values. While these runs were previously done sequentially, they are now run in parallel, speeding up the Monte-Carlo simulations. The initial optimisation used also benefits from the parallelisation as indicated above. When started from an initial bootstrap dataset, the Monte-Carlo analysis starts by fitting the SKN distribution to the bootstrap parameter distributions to generate random correlated sets of variables. This fitting of the distributions is implemented in parallel as well.

Draw from distributions: This task is based on fitting the SKN distribution to a set of variables to generate random correlated sets of variables. This fitting of the distributions is implemented in parallel as well.

Memory Usage When deciding on the number of cores, be aware that OptiPa might take up all the capacity that you assign to it. If you need to do other work on your machine as well, it might not hurt to leave at least one core free for other processes. To help you monitor the load on your machine, a small CPU indicator is included with OptiPa. Whenever you start OptiPa, a white bar at the top of your screen will show the current load on your CPU, ranging from 0 to 100 %.


Fig. 1 OptiPa with the CPU usage indicator visible at the top border of the screen.

Freezing of your system Sometimes, when OptiPa is using full resources and you are trying to manipulate the screen, OptiPa seems to freeze and becomes unresponsive. As long as the CPU indicator remains active or remains at 100 %, all OptiPa processes are still operational and there is no reason to worry; just be patient. When frozen (but still operational) the STOP and BREAK buttons might no longer work. In this case you can still use CTRL-ALT-F11 to stop the calculations or CTRL-ALT-F12 to break the calculations. Please press these key combinations until you notice the message 'interrupted by user' on the Console window. When the system has frozen and the CPU indicator shows almost no activity (say, less than 10 %), most likely the parallel processes have stalled. Then you have no other choice but to kill OptiPa (by simply closing the black Console window).

Output files When parallel computation is activated by selecting more than one core, the order in which processes are handled becomes unpredictable. As a result, the order of subsequent runs when optimising per experiment, when doing multiple sensitivity runs, and when running multiple Monte-Carlo simulations will be scrambled, depending on the number of cores involved during the parallel computations. This has no real consequences, except that you will have to sort some of the output files yourself afterwards. This can always be done using the run number printed in the output files.
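Because the run number is printed in the output files, restoring the original order afterwards is straightforward. As a sketch, assuming the output has been parsed into rows whose first column holds the run number:

```python
def sort_by_run_number(rows, run_column=0):
    """Return the parsed output rows sorted by their run number column."""
    return sorted(rows, key=lambda row: int(row[run_column]))
```

The same sort can of course be done in a spreadsheet by sorting on the run number column.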

Important notes

The only situation when no benefit is obtained is when optimising data from a single experiment using the LSQnonlinear optimisation routine; this is the only routine that does not support parallelisation and with only one experiment there is nothing to simulate in parallel.

When you select one single core you effectively rule out all parallel computations.


Model implementation To implement a model for use with OptiPa, four user-defined files have to be generated:

one obligatory ASCII experimental data file, containing the dependent measured experimental data you want to model,

one non-obligatory ASCII condition file, containing any independent variables describing the experimental conditions under which the experiments were performed,

one non-obligatory ASCII linkage file, linking the experimental data to the condition profiles defined, and

one obligatory OptiPa model definition file (OMF-file) containing the actual definition of your mathematical model.


Model implementation: experimental data The ASCII data file with experimental data contains measured data on one or more of the dependent model variables. The experimental data is organised in numbered experiments, each of which coincides with a single model run (Fig. 1). If replicate measurements were taken, they can either be organised in a single experiment with replicates or in as many experiments as there were replicates. The latter would be the case when non-destructive measurements are taken from multiple fruit, resulting in a complete time course for each fruit; the results of each single fruit can then be considered a single experiment.

Fig. 1 Layout of the experimental data file. Panel A shows an example of experimental data with replicate measurements organised in as many experiments as there were replicates, while panel B shows a comparable example with replicates within an experiment.

Important notes

The first row of this file will be interpreted as column headings.

The first column of this file must contain the experiment number. The numbering of the experiments is allowed to be out of order or to omit certain numbers.

The second column of this file must contain experimental time. Time should be in ascending order, but need not be equidistant. Multiple observations at a single time point can be included by introducing multiple rows with the same time but different observation values (Fig. 1, panel B).

Subsequent columns can contain any kind of numerical information. Column headings used for these columns should be unique and not overlap with the names used in the condition file nor used as model variables in the model definition file.

Missing numbers can be indicated using: NaN

The file should be saved as some kind of delimited ASCII file using either:

o , (comma)


o ; (semicolon)

o (tab)

o (white space)

This experimental data file is obligatory, as it is used by OptiPa to identify the simulation time of the model for each of the experiments included. If one does not (yet) have any experimental data and only wants to use OptiPa to do a simulation, one can prepare a dummy experimental data file containing just two rows per experiment, defining the time interval t=tmin and t=tmax (the maximum simulation time), while filling in NaN values in the experimental data column.
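A dummy data file of this kind could, for instance, be generated with a short script. A sketch assuming a comma delimiter and one illustrative data column named MyData:

```python
import csv

def write_dummy_data_file(path, n_experiments, t_max):
    """Write a comma-delimited dummy experimental data file with two
    rows per experiment (t=0 and t=t_max) and NaN as the observation."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["experiment", "time", "MyData"])  # headings row
        for exp in range(1, n_experiments + 1):
            writer.writerow([exp, 0, "NaN"])      # start of simulation
            writer.writerow([exp, t_max, "NaN"])  # maximum simulation time
```

The same file can just as easily be prepared in a spreadsheet and saved as delimited ASCII text.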

Whenever you change the data file you need to reload the file through the model control menu.


Model implementation: condition file

The so-called condition file contains the model input variables but is not mandatory to provide. Besides defining model input variables, this file is also used to define a possible (sub)grouping of the experiments (Fig. 1). A possible grouping could be based on characteristics like cultivar, fruit, grower, orchard, year or harvest. This enables OptiPa to estimate multiple separate values for a single parameter depending on these groupings.

Fig. 1 Layout of the condition file showing an example of a condition file containing the independent variable time, a grouping variable cultivar and the input variables temperature and O2.
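As the figure may not render here, a hedged sketch of such a condition file (all names and values are hypothetical):

```
exp,time,cultivar,temperature,O2
1,0,1,18,21
1,100,1,18,21
2,0,2,4,3
2,100,2,4,3
```

Here cultivar is a grouping variable, while temperature and O2 are input variables interpolated over time.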

Important notes

The first row of this file will be interpreted as column headings.

The first column of this file must contain the experiment number. The numbering of the experiments should be in agreement with the numbering used in the experimental data file. If not, a linkage file should be provided.

The second column of this file must contain the experimental time. Time should be in ascending order, but need not be equidistant. If data from the condition file is used to provide independent input variables to the model and interpolation is used to generate a continuous data flow, the time span given for each of the experiments should start at t=0 and should at least cover the time span covered by the corresponding experimental data from the experimental data file. Generally, one should provide at least two rows of condition data per experiment: at t=0 and at t=tmax.

Multiple observations at a single time point are NOT allowed.

The sample frequency presented in the condition file will affect the integration process. When small time steps are introduced, the maximum time step allowed for the integrator is small as well. This is to prevent the ODE solver from overlooking short-lasting events. When larger time steps are used, the maximum integration step is larger as well. Using smaller time steps than required (providing temperature data for every second while a typical event takes hours) will unnecessarily slow down the ODE solver and thus the whole simulation/optimisation process. In this case the input data should be reduced to the appropriate resolution. Check out the Discontinuity strategy for more options.

Subsequent columns can contain any kind of numerical information. Column headings used for these columns should be unique and not overlap with the names used in the experimental data file nor used as model variables in the model definition file.

Missing numbers are not allowed in this file.

The file should be saved as some kind of delimited ASCII file using either:

o , (comma)

o ; (semicolon)

o | (tabular stop)

o _ (white space)

Whenever you change the condition data file you need to reload the file through the model control menu.

Update note

As of v6.2p the user variables provided in the condition file are available as functions with the same names as given in the condition file. These functions require a single input, time, and return the interpolated value of the column variable they refer to. As a result, the interpolated condition data is immediately available for use in the model definition file. At the same time this creates the one incompatibility issue with previous versions: the variables from the condition file are no longer available as column vectors, only as functions. So any existing user-defined interpolations in the model definition file will generate an error. In previous versions, to access the current temperature at time t, users would typically define this in the model definition file using the interpolation function as:

TempCurrent = interp1(t_cond,temp,t);

As of v6.2p this should be replaced by the function named after the variable defined in the condition file using:

TempCurrent = temp(t);


Model implementation: linkage data

If independent input variables are required to run the model, these should be provided through the condition file using the same numbering of the experiments as was used in the experimental data. However, situations might occur where several experiments all relate to one and the same condition. This would require repeating the same definition of that condition for all of the experiments involved. If these conditions are defined by some variable temperature regime obtained from a temperature logger, the condition file might quickly grow too large. In this case one can prepare a linkage data file that tells OptiPa which condition from the condition file to use for which experimental data from the experimental data file. If no linkage file is provided, the condition file and the experimental data file are assumed to be linked one to one through the experiment numbers mentioned in the first columns of both files. If a linkage file is needed it should be located in the data folder of the project.

Fig. 1 Layout of the linkage data file showing an example containing, per row, the number of one of the experiments and the number of the condition to which it should be linked.
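A hedged sketch of such a linkage file (headings are illustrative):

```
exp,cond
1,1
2,1
3,2
4,2
```

In this sketch, experiments 1 and 2 both use condition 1 from the condition file, while experiments 3 and 4 share condition 2.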

Important notes

The first row of this file will be interpreted as column headings.

The first column of this file must contain experiment numbers from the experimental data file.

The second column of this file must contain experiment numbers from the condition file.

The file should be saved as some kind of delimited ASCII file using either:

o , (comma)

o ; (semicolon)

o | (tabular stop)

o _ (white space)

Whenever you change the data file you need to reload the file through the model control menu.


Model implementation: model definition file

The OptiPa model file (OMF-file) containing the model definition (Fig. 1) should be located in the model folder of the project. A template for a new model definition file can be created through the file menu. It consists of four distinctive parts: model initialisation (Fig. 1a, line 1-63), data pre-processing (Fig. 1b, line 65-94), the actual model definition (Fig. 1c, line 96-124) and the model post-processing (Fig. 1d, line 126-163).

All text starting with the %-sign (the green text) consists of comments to help you implement the model.

All text marked by the yellow bars indicates the various sections of the model definition and should be left unchanged. These sections consist of pairs of tags opening and closing each section; the tags are enclosed in angle brackets, with the closing tag containing an additional forward slash.

The text marked by the blue bars are tags to mark the model parameters, the model ODE state variables, and other output variables generated from them.

The text marked in green are functions to define the derivatives of the model state variables.


The model initialisation part

Fig. 1a The model initialisation consists of two sections: the parameter definition section (line 15-36) and the state definition section (line 49-63). In the first part of the parameter definition section (line 16-26), global named model constants can be defined (optionally using some initial calculations in the free programming space) that will be available to the rest of the model definition by their name. The parameter definition section subsequently defines the model parameters for the estimation procedure in terms of their names and their initial values (line 27-35). Each parameter is preceded by the <Param> tag. In the state definition section, the ODE state variables are declared in terms of their names and their initial values (line 59-61). The initial values can be obtained using some initial calculations in the free programming space. As these initial values for the ODEs are initialised for each experiment simulated, experiment-specific values of the input variables can be used. These initial values for the ODEs can also be incorporated as model parameters to be estimated. Each state variable is preceded by the <State> tag.
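As a hedged sketch (all names and values are hypothetical; consult the template and demo files for the exact layout), these two sections could contain declarations like:

```
Rgas = 8.314;          % global named model constant (J/(mol K))
Tref = 15;             % reference temperature for the Arrhenius law

<Param> kref = 0.05;   % rate constant at Tref, to be estimated
<Param> Ea = 80000;    % activation energy (J/mol), to be estimated

<State> Firmness = 80; % ODE state variable with its initial value
```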

The pre-processing part

Fig. 1b The pre-processing part (line 65-94) can be used for analytical manipulations, i.e. transformations, of the original experimental data. No new variables can be generated, so the transformed data should always replace one of the existing data columns. This option was especially designed for cases where the transformation parameters need to be estimated as part of the whole model fitting process. If not required, lines 82-93 can just be deleted or commented out by adding a percentage sign at the start of each line. Do not delete the opening and closing tags of the section. See the demo files for examples of how to use this section.


The model definition part

Fig. 1c The model definition part (line 96-124) is where the actual model is being defined. Depending on the model this section might look different. However, in all cases it should contain a definition of the derivative for each of the state variables defined in the state definition section, using the Deriv(...) function in combination with the correct name of the ODE state variable. When one wants to use the model input variables identified in the condition file, one can access these by using the functions with the same names as given in the condition file. See the demo files for examples of how to use this section.
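As a hedged sketch, a hypothetical first-order decay model with an Arrhenius temperature dependence, assuming the condition file contains a column named temp and the parameters kref and Ea were declared in the initialisation part, could read:

```
k1 = karr(kref, Ea, temp(t), Tref);  % rate constant at the current temperature
Deriv(Firmness) = -k1 * Firmness;    % first-order decay of the state variable
```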


The post-processing part

Fig. 1d The post-processing part (line 126-163) can be used for analytical manipulations of the ODE model results. This can be used to calculate simple dependent variables that are directly derived from the ODE model results or to do additional transformations (for instance, to change the output to the appropriate units). Newly created output variables should be preceded by the <Output> tag. If not required, lines 145-162 can just be deleted or commented out by adding a percentage sign at the start of each line. Do not delete the opening and closing tags of the section. See the demo files for examples of how to use this section.
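As a hedged sketch, assuming an ODE state variable Firmness, a hypothetical post-processing transformation could read:

```
<Output> FirmRel = 100 * Firmness ./ Firmness(1); % firmness as a percentage of its initial value
```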

Most important note of all notes in OptiPa

All model parameter values are by definition assumed to be positive. The actual sign of a parameter is determined by the way it is used in the ODEs. So it is up to the user to determine beforehand whether a parameter has an additive or subtractive effect. This was done to prevent complete garbage when the optimisation routine would accidentally change the sign of a parameter from positive to negative or the other way round. The consequence is that one should be suspicious of any parameter that keeps drifting towards zero. In such a case one might try to change the sign of the model parameter to check whether one has made the wrong assumption about the sign of that particular parameter.

Important notes

All lines starting with the %-sign (the green text) are comments and can be deleted.

All section tags marked by the yellow bars should be left unchanged.

If a section is not in use, its tags should remain but the programming code they enclose can be removed.

A model should at least contain one parameter, one state variable and its corresponding derivative.


Therefore, each OMF-file should at least contain one <Param> tag, one <State> tag and one Deriv(..) function. The <Output> tag is not obligatory as it is only required when you want to define additional non-ODE variables.

The model parameters identified in the model initialisation part are declared through their names followed by initial starting values. Subsequently these named model parameters can be used in the remainder of the model definition part.

In the pre-processing part the experimental data can be transformed. The experimental data is available through column vectors with the same names as given in the experimental data file. The experimental time vector is given as: t_exp irrespective of its name in the header of the experimental data file. These transformations are calculated per experiment simulated, so you can make use of experiment specific conditions or parameter values. The current condition data is available through functions with the same names as given in the condition file. The condition time vector is given as: t_cond irrespective of its name in the header of the condition data file.

In the pre-processing part only transformation of existing experimental variables is allowed, not the creation of additional new variables.

In the model definition part a continuous data flow can be generated to provide the model with independent input variables based on the condition file using linear interpolation. The current condition data is available through functions with the same names as given in the condition file. These allow you to access the condition data in a continuous way by linear interpolation using: MyConditionColumnName(t) which will give you the value of MyConditionColumnName at the current time t.

There is a standard procedure available to calculate an Arrhenius based temperature dependence for your model parameters. The syntax for this is: karr(kref, Ea, Temp, Tref)

Make sure all of the parameters declared in the model initialisation part that need estimation are used in one way or another in the model definition part.

In the post-processing part, the model output can be transformed. The results of all ODE variables are available through column vectors with the same names as declared in the section <StateDef> (see at top). The model time vector is given as: t_mod. New output variables require declaration using the <Output> tag.

These transformations are calculated per experiment simulated, so you can make use of experiment specific conditions or parameter values. The current condition data is available through functions with the same names as given in the condition file. These allow you to access the condition data in a continuous way by linear interpolation using: MyConditionColumnName(t_mod) which will give you the value of MyConditionColumnName at all modelled time points of t_mod.

Constants, variables and parameters used in the model definition should be unique and not overlap with the names used in the experimental data file nor with the names used in the condition file.

In defining the model, the user can use any function available in the MatLab base system, while adhering to MatLab's syntax rules.

Whenever you change the model you need to reload the OMF-file through the model control menu. Once the model OMF-file has been coded, as illustrated above, OptiPa can be started to analyse the experimental data using the model. The actual programming required to implement a new model is thus limited to the bare essentials.

Update notes

As of v6.2p the user variables provided in the condition file are available as functions with the same names as given in the condition file. These functions require a single input, time, and return the interpolated value of the column variable they refer to. As a result, the interpolated condition data is immediately available for use in the model definition file. At the same time this creates the one incompatibility issue with previous versions: the variables from the condition file are no longer available as column vectors, only as functions. So any existing user-defined interpolations in the model definition file will generate an error. In previous versions, to access the current temperature at time t, users would typically define this in the model definition file using the interpolation function as:

TempCurrent = interp1(t_cond,temp,t);


As of v6.2p this should be replaced by the function named after the variable defined in the condition file using:

TempCurrent = temp(t);

In older OptiPa versions the model files were saved as m-files (with the extension *.m). In the current OptiPa version the model files have been renamed to OMF-files (with the extension *.omf). If you want to move models from previous OptiPa versions, just rename the model m-files to OMF-files by changing their extension from *.m to *.omf.

If OptiPa finds m-files in the current model folder it will offer to rename them for you into OMF-files.


The different time concepts

During the preparation of the input files (experimental data file and condition file) and the model OMF-file (model definition) you will have run into the various usages of time.

The experimental data file contains a column time which refers to the time points at which experimental data was collected.

The condition file contains a column time which refers to the time points at which data was collected on independent experimental variables (e.g. temperature) somehow serving as an input to the model. The time range in the condition file should at least cover the time period during which experimental data was collected, but the condition data can be collected at different time points and more or less frequently.

The model definition file contains multiple references to time:

o In the part where experimental data can be transformed the user has access to the time vector from the experimental data file. This experimental time vector is addressed as: t_exp, irrespective of the name given by the user in the actual experimental data file.

o Throughout the model file (when initialising ODEs, when transforming experimental data, when defining the model, and when transforming ODE model output) the user has access to the time vector from the condition data file. This condition time vector is addressed as: t_cond, irrespective of the name given by the user in the actual condition file.

o The ODE model definition part where the actual ODEs are defined is called by the ODE solver when numerically integrating the model. At every single time step during the integration the current time is available through the scalar t. So, to define an ODE which is a function of time, or when interpolating variables from the condition file to calculate their value at the current point in time, you should use this scalar t, which at any point during the simulation will refer to the current time.

o Once the ODE model integration is finished the ODE model output is available for transformations or to generate new additional variables. The ODE model output comes with a time vector t_mod which contains all time points at which the integrator generated the ODE output.
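The four usages of time can be summarised in one hedged sketch (all variable names other than t_exp, t and t_mod are hypothetical):

```
% pre-processing: t_exp and the data columns are vectors, per experiment
Hue = Hue - Hue(1);                 % transform an existing data column

% ODE definition: t is the current scalar time during integration
Deriv(Hue) = -karr(kref, Ea, temp(t), Tref) * Hue;

% post-processing: t_mod holds all time points generated by the integrator
<Output> TempTrace = temp(t_mod);   % interpolated input at all modelled times
```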

Update note

t_cond has become obsolete from v6.2 onwards as the user no longer needs access to this time vector: OptiPa automatically takes care of the linear interpolation of all the condition data.


Using a spreadsheet to create ASCII input files

To prepare the ASCII input files (experimental data file, condition file and linkage file) one can use a spreadsheet program like Excel. The spreadsheet should be saved as some kind of delimited ASCII file using either:

o , (comma)

o ; (semicolon)

o | (tabular stop)

o _ (white space)

If you prepare your data in Excel just save the file using for instance:

o CSV (comma delimited) (*.csv)

o Text (TAB delimited) (*.txt)

o Formatted text (space delimited) (*.prn)

If things go fine, the spreadsheet should result in the corresponding ASCII file as shown below (Fig. 1). If, by accident, your spreadsheet contains some cells containing spaces, this results in exporting empty columns and rows to the ASCII file (Fig. 2), which can be recognised from the redundant delimiters. This will generate an error message in OptiPa. You can either remove the redundant delimiters or remove the seemingly empty columns and rows from the spreadsheet file. A quick way to check for these ghost rows or columns is by opening the spreadsheet and pressing the key combination <Ctrl><End>, which will move your cursor to the bottom-right position in the file (marked in Fig. 1 and 2 by the position of the marked cells).

Fig. 1 Correct layout of the spreadsheet file with the corresponding ASCII file generated from it.

Fig. 2 Wrong layout of the spreadsheet file with the corresponding ASCII file generated from it.


Using the proper MatLab syntax

To prepare the model definition file the user has to adhere to the basic programming syntax of MatLab. Any line of code typed in the model definition needs to be evaluated by the MatLab compiler. Any syntax error will be indicated through an OptiPa error message. The full description of the functions supported through the MatLab Base system (and thus available for defining your OptiPa model) can be checked on the MatLab website. The most important aspects are, however, covered below.

First of all, familiarise yourself with the overall structure of the model definition file as described in detail elsewhere. Subsequently, have a look at the various demo model files to get some ideas of what is possible or not. Some of the MatLab features will be discussed below using the demo files as examples. We encourage you to play around with them and learn from the consequences! From our experience most models can be properly defined using the limited set of MatLab instructions outlined below. For more info check out the online MatLab help.

Comments

% (percentage sign): anything following the %-sign up to the next line break will be ignored.

Add as many comments as you like to your model files to make them readable and to help you remember why you did what you did. If you wrote some lines you want to ignore but keep safe for the future, just comment them out by adding the %-sign.

Example: The following line shows an example where additional info is given after the %-sign. Only the first part of the line is executed:

>> Tref=10; % my current reference temperature for Arrhenius law

The following line shows an example where the whole line is commented out by adding a %-sign at the start of the line. The whole line is ignored:

>> % Tref=20; % my old reference temperature for Arrhenius law

Suppressing output

; (semicolon): adding a semicolon to a statement suppresses the output of this statement to the screen.

You can experiment with this by removing any of the semicolons from an existing OMF-file and checking the output on screen. The output generated by your model will now become visible in the black Console window, showing the multiple evaluations of the various statements.

Example: The following statement without a semicolon at the end results in displaying the output as follows.

>> Tref=10

Tref =

10

The same statement with a semicolon at the end suppresses the output.

>> Tref=10;

When you forget to include the semicolon, simulation or optimisation of the model results in a continuous echoing of output to the black Console window, slowing down the operation of OptiPa.

Rules for variable names

MATLAB variable names must begin with a letter, which may be followed by any combination of letters, digits, and underscores. MATLAB distinguishes between uppercase and lowercase characters, so A and a are not the same variable. Although variable names can be of any length, MATLAB uses only the first 63 characters of the name and ignores the rest. Hence, it is important to make each variable name unique in the first 63 characters to enable MATLAB to distinguish variables. When naming a variable, make sure you are not using duplicate names.

Assigning data to variables

Variable names are used as placeholders for data. You first have to assign data to variables before you can use them.

Example

Each model definition file starts with a section to define fixed constants. Following the guideline above, the definition of these scalar variables should look something like:

>> VariableName = NumericalValue;

And again, feel free to add some explanatory comments:

>> Hmin=124; % the asymptotic hue value at minus infinite time

>> Tref=15; % reference temperature for Arrhenius law

Arithmetic operators

+ Addition
- Subtraction
* Multiplication
/ Division
^ Power
( ) Specify evaluation order
... 'Soft' line break to spread out long expressions over multiple lines

Example

>> Deriv(PGact) = ka_PG*PG_pre*PG_act - ki_PG*PG_act;

>> Deriv(FRC_C_U) = S00_C*ADP_C*k_SUSY_C + S10_C*ADP_C*k_SUSY_C ...

- G6P_C_T*FRC_C_U*ATP_C*k_SUSYR_C + FRC_V_U*k_FRC_VCt/Vcy ...

- FRC_C_U*k_FRC_CVt/Vcy - FRC_C_U*ATP_C*k_FRP_C ...

+ S00_C*k_INV_C + S10_C*k_INV_C - GLC_C_T*FRC_C_U*k_INVR_C ...

- k_ISO_C*FRC_C_U + k_ISOR_C*GLC_C_U;

>> Deriv(O2) = -Vmax*O2 / (km + O2);

>> EH_SS = Enz_0/(((10^(-pH))/KE1)+(KE2/(10^(-pH)))+1);

These examples are taken from the ODE model definition part where the core of the model is defined. At every single time step the ODE integrator will evaluate these statements to calculate the change over time needed to get to the next time step. All variables involved in these calculations are scalars (single values). This is true for the model constants (e.g. Tref, Rgas, ...), the model parameters (e.g. ka_PG, ki_PG, Vmax, km, Enz_0, KE1, KE2), but also the model output variables (e.g. PG_pre, PG_act, O2, pH), as they just contain the current value at the current time step of the ODE integrator.

Function calls

Depending on the needs of the model program, calls to external functions can be made, assuming these functions are part of the MatLab base system which is available in this runtime version of OptiPa.

Example: Elementary Math

Trigonometric (in radians / in degrees):

Y=sin(X) / Y=sind(X): sine
Y=cos(X) / Y=cosd(X): cosine
Y=tan(X) / Y=tand(X): tangent
Y=asin(X) / Y=asind(X): inverse sine
Y=acos(X) / Y=acosd(X): inverse cosine
Y=atan(X) / Y=atand(X): inverse tangent

Exponential:

Y=exp(X): exponential
Y=log(X): natural logarithm
Y=log10(X): common logarithm
Y=sqrt(X): square root

Temperature dependency according to Arrhenius

k = karr(kref, Ea, Temp, Tref)

OUTPUT:

k: value(s) of rate constant valid at temperature Temp

PARAMETERS:

kref: a single reference value for rate constant k, valid at reference temperature Tref

Ea: energy of activation (J/mol)

Temp: a single value, vector or matrix of temperatures (in °C)

Tref: an arbitrary reference temperature (in °C) at which kref is valid

USAGE: to calculate the Arrhenius equation using the various model parameters as input, in combination with a fixed global model parameter (Tref) and the actual temperature as interpolated from the condition file.

k1=karr(k1ref,Ea1,TempAct,Tref);

k2=karr(k2ref,Ea2,TempAct,Tref);

kv=karr(kvref,Eav,TempAct,Tref);
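Although the actual implementation is internal to OptiPa, karr presumably evaluates the standard reparameterised Arrhenius equation; a sketch, assuming Rgas = 8.314 J/(mol K) and temperatures supplied in °C:

```
k = kref .* exp((Ea ./ 8.314) .* (1./(Tref + 273.15) - 1./(Temp + 273.15)));
```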

Flow control

MatLab functions can be used to provide additional flow control. Think of if..else or for..while statements.

Example

This example shows how, based on the current time point t and the level of O2 (pO2, which is interpolated from the condition file), different values are assigned to the model variable k1ref. The O2 level varies depending on the experiment that is being analysed. The constants ULO and CA are part of the parameters being optimised.

%-------------------------------------------------------------------------------------

% choosing the correct value of k1ref depending on the O2 level

%-------------------------------------------------------------------------------------

if t >= 244 % this is the shelflife where we have AIR conditions throughout

k1ref=k1ref;

elseif t<244 % this is the storage part where the conditions vary per experiment

if 1<=pO2 & pO2<3 % these are ULO conditions 1 kPa <= O2 < 3 kPa

k1ref=ULO*k1ref; % reduce k1ref by a constant ULO

elseif 3<=pO2 & pO2<21 % these are CA conditions 3 kPa <= O2 < 21 kPa

k1ref=CA*k1ref; % reduce k1ref by a constant CA

elseif pO2==21 % these are AIR conditions O2 = 21 kPa

k1ref=k1ref; % back to original value

end

end

%-------------------------------------------------------------------------------------

Relational operators

< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to
== Equal to
~= Not equal to

Important notes

The >> in the descriptions above just marks your input. You are not supposed to type the >> in your model definition file when you start typing statements.

The full description of the functions supported through the MatLab Base system can be checked on the MatLab website.


Graphical User Interface

After having prepared the relevant ASCII input files and the appropriate OMF-file, the subsequent data analysis using the graphical user interface OptiPa (Fig. 1) requires only a few mouse clicks. To run an analysis one has to load the prepared model files, identify the experiments to be used, select the model parameters of interest, build an objective function and select an action to undertake. On completion the user will be presented with graphical, numerical and statistical outputs.

Fig. 1 Main screen of OptiPa.

OptiPa can perform the following tasks:

Simulation: This allows the user to run the model using the current parameter settings and visually compare the model outcomes against measured experimental data. The graphical and numerical output is the same as generated after an optimisation run.

Optimisation: This allows the user to fit the model to experimental data by optimising one or more of the model parameters. This results in both graphical, numerical and statistical output.

Optimisation per experiment: This allows the user to fit the model to the experimental data one experiment at a time. This results in both statistical and numerical output.

Calculating conditional joined confidence regions: By scanning, for each pair of model parameters, the parameter space near their estimated optimal values, conditional joined confidence regions are calculated. This results in both graphical and numerical output.

Do a sensitivity analysis: By running the model several times with different model parameter values, insight can be gained into the sensitivity of the model towards the different model parameters. This results in numerical output only.

Bootstrapping: This allows the user to determine confidence intervals for the model parameters by repeatedly fitting the model to a large number of bootstrap datasets generated using model-based error resampling. This results in graphical, statistical and numerical output.

Monte-Carlo simulation: Based on either an initial optimisation or previous bootstrap results, the user can generate 95% confidence intervals of the model by repeatedly simulating the model using randomly generated co-varying parameter values. This results in graphical, statistical and numerical output.

Draw from distributions: Based on an arbitrary set of co-varying variables, OptiPa can be used to generate new sets of random co-varying parameter values with the same structure as the starting set provided. This results in numerical output only.
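The model-based error resampling behind the bootstrapping task can be sketched as follows (this is not OptiPa's actual code; the data are invented): residuals of an initial fit are resampled with replacement and added back onto the fitted values to create each bootstrap dataset, to which the model is then refitted.

```matlab
ydata = [98; 62; 35; 24];             % measured data, invented
yfit  = [100; 60; 38; 22];            % fitted values from an initial fit, invented
res   = ydata - yfit;                 % residuals of the initial fit
n     = numel(res);
idx   = randi(n, n, 1);               % sample residual indices with replacement
yboot = yfit + res(idx);              % one bootstrap dataset
% refitting the model to yboot, and repeating this many times, yields a
% distribution of parameter estimates from which confidence intervals follow
```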

Main menu bar

The main window's menu bar has three menu buttons (Fig. 2). The first one is related to file and project handling. The second menu button is used to set the various program options. The third menu button gives access to some help options.

Fig. 2 The main window's menu bar

Button panel

Each of the tasks can be activated from the button panel (Fig. 3).

Fig. 3 Start button panel to activate the different tasks.

The Simulation button is always available and a simulation can be actioned by simply pressing the button. The second button refers by default to the Optimisation process and can be activated by pressing the white Start button. The function of this second button can, however, be changed by pressing the button itself and selecting a different function from the dropdown box that is then revealed (Fig. 4).

Fig. 4 Start button panel showing changing the functionality of the second button.

Each of the tasks (except for a simulation run) can be activated by pressing the white Start button and interrupted by pressing the Stop or Break button (Fig. 3). Pressing the Break button interrupts the process, leaving all parameter values at their initial values. Pressing the Stop button interrupts the process, setting all parameter values to their most recent values as assigned during the current process. This feature can be used to interactively tune the optimisation process by turning certain parameters on/off for optimisation when you do not want to wait until the optimisation has fully reached its optimum. When OptiPa becomes unresponsive during parallel computations, the Stop and Break buttons might no longer work. In this case you can use CTRL-ALT-F11 to stop the calculations or CTRL-ALT-F12 to break them. Keep pressing these key combinations until you see the message 'interrupted by user' on the Console window.


As the variables and parameters provided in the user data files and in the model file are all treated as variables within OptiPa, no duplicate names are allowed. When triggering any of the actions above, OptiPa will first check for this. If duplicate names are found a warning is issued (Fig. 5) together with an overview of all variable names with the duplicate names marked in yellow (Fig. 6).

Fig. 5 Error message indicating duplicate variable names.

Fig. 6 Overview of all known variable and parameter names to check for duplicate names.

The following sections describe the most important elements defining the functionality of OptiPa during a standard parameter estimation procedure. Bootstrapping and Monte-Carlo simulations will be discussed separately.


GUI: File menu

Manipulation of projects, models and parameter files is done through the file menu on the main menu bar (Fig. 1). This concerns the handling of projects (creating a new project and opening an existing project) and model files (creating a new model and editing an existing model), the loading and saving of model settings, the exporting of model parameters, and the encrypting, exporting and importing of completed models.

Fig. 1 File menu.

To start working with OptiPa the first thing to do is either to open an existing project or to create a new project through this menu. The name of the current project will be shown on the main OptiPa window (Fig. 2).

Fig. 2 Project name display.

When creating a new project a new project folder will be created within the root project folder including three sub folders: data, models and settings (Fig. 3). After starting a project one can either create a new model or edit an existing model. On generating a new model through the file menu a corresponding output folder is automatically generated in the models folder to save output from that particular model.


Fig. 3 Structure of the work folders generated by OptiPa.

Deleting and renaming projects, models, folders, etc. should be done through Windows Explorer. Note that the actual loading of selected files for simulation, optimisation, bootstrapping and Monte-Carlo simulation goes through the model control menu visible on the main OptiPa window (Fig. 4).

Fig. 4 Model control menu.

Model settings and parameter file

One of the file types not mentioned so far is the model settings file (with the same name as the model but with the file extension SET). This file is automatically generated by OptiPa and contains all settings of the last successful parameter estimation with a certain model. These files are stored in the settings folder of the current project and are overwritten time after time. Loading this settings file through the file menu (Fig. 5) is a quick way to load a model, as it will load all other files as well, and all parameter settings will be restored to those of the last successful parameter estimation. The user can manually save the current settings at any time through the file menu under a different name to store certain settings for later use. The SET file does support parameters not estimated in common, retaining all possible copy values as defined for the various groupings (see parameter control).


Fig. 5 Loading/Saving settings file through the file menu.

Optimised parameter values obtained after optimisation can also be saved in a user-friendly ASCII format through the option Export model parameters. The resulting PAR file is saved in the models folder of the current project. The content of this file can be used to update the model initialisation part of the model by simply copying it to the corresponding part of the model OMF file. The PAR file ONLY supports parameters estimated in common (see parameter control). Those parameters for which a specific grouping was defined will lose this information and only the last known common value is saved in the exported PAR file. This is to stay compatible with the format expected in the model definition file. The layout will be automatically adapted to the syntax version of the model OMF file.

Fig. 6 Layout of the PAR file containing the current (optimised) model parameter values. Left: old syntax - pre v6.2p, Right: new syntax - v6.2p onwards.

Recovering parameter estimates

In case OptiPa crashed or you accidentally interrupted it by pressing the Break button, the optimisation results are no longer available. However, as long as you have not initiated a new optimisation yet, the optimisation history is still available in the log file of OptiPa (Fig. 7), which is located in the installation folder of OptiPa. This file keeps track of all parameter combinations evaluated, including the residual sums of squares. From this file, based on the residual sums of squares, you can still recover the best parameter set obtained so far. The only restriction is that this has to be done immediately, before starting any other optimisation.


Fig. 7 Log file generated during the optimisation routine with one column per estimated model parameter and the last column containing the residual sums of squares.

To do so you should first save a new model settings file for the model you want to recover, making sure to make the same selection of parameters being estimated and to indicate whether they were estimated in common or according to any of your possible grouping parameters. If you want to recover parameters after pressing the Break button you can simply select Save model settings from the file menu. If you want to recover parameters after a crash you will have to restart OptiPa and reload the model completely, as described here, after which you select Save model settings from the file menu to create an up-to-date settings file. Only once you have this settings file can you proceed to the actual recovery by selecting Recover parameter estimates from the file menu. This will show a confirmation screen (Fig. 8), after which you will be requested to load your settings file containing the settings of the optimisation session you want to recover. If the recovery was successful you will be prompted to provide a name for the new settings file containing the recovered parameter values. This new settings file can be used to reload your model with the optimised parameter values through the Load model settings option on the file menu. If your settings file did not match the log file you get an error message (Fig. 8). Either you made a mistake in preparing the settings file (selecting the wrong parameters) or the log file was already overwritten by another optimisation. In the latter case the results are lost. In the first case you can try again by preparing a new settings file.

Fig. 8 Recovering parameter estimates from the log file. When the current settings file does not match the Log file an error message is generated.


Encrypting your model

The situation might occur that you want to share a model with someone else but do not want to give away the coding of the file. In such a situation you can decide to encrypt your model file. After encrypting the model file, the user of the encrypted file retains all functionality when using it with OptiPa, except for the fact that the model can no longer be opened in an editor and thus can no longer be changed or even properly viewed. The model is stored in an internal encrypted file format.

Fig. 9 Encrypting the current model through the file menu.

By selecting the option Encrypt current model from the file menu the user is asked to provide a name for the new encrypted copy of the current model (Fig. 10), after which a message on the successful encryption will be presented.

Fig. 10 Providing a name for the new encrypted model. In this example Demo2 was loaded and an encrypted copy was generated under the name Demo2_secret.omf.

The encrypted model will be saved as an OMF-file in the models folder of the current project. In addition, a new model output folder …\models\OUTPUT[modelname] (relative to the current project, with modelname the name of the encrypted OMF-file) will be created. At the same time a new settings file will be generated, based on the default settings file of the current model but now under the name of the encrypted model file. This settings file is stored in the settings folder of the current project. The encrypted model can thus be loaded as normal, either through the yellow buttons of the model control menu or through the load model settings option of the file menu. Note that the encrypted OMF-file is just an almost empty placeholder with the standard text '3ncript3d' (Fig. 11), while the actual model code has been encrypted and stored in a corresponding ENC-file with the same name as the OMF-file (in the case of the example from Fig. 10 this would be a file called Demo2_secret.enc).

Fig. 11 OMF-file generated as an empty place holder for the ENC-file containing the actual encrypted model code.

Exporting and importing a model

To facilitate the exchange of models between users an export-import functionality has been included. A model can be exported by selecting a recent settings file, which contains references to a model definition file with the corresponding data input files (the experimental data file, the condition file and the linkage file) and which contains the final optimised parameter values. Based on this information a packaged model file will be generated and saved as an OMP-file in the models folder of the current project. In the case of exporting an encrypted model the corresponding ENC-file will be included in the package as well. The two basic output files (modeldata.csv and simulation.csv) will also be included in the packaged OMP-file.

Fig. 12 Selecting a settings file to be exported as a packaged OMP-file. In this example Demo2secret with all its corresponding files and parameter settings was packaged and saved under the name Demo2_secret.omp.

The generated OMP-file can be exchanged with colleagues. To import such an OMP-file the Import model option is selected from the file menu. One is asked for the OMP-file to be imported (Fig. 13), after which one has to indicate the destination project into which the newly imported model will be copied (Fig. 14). There is a check to see whether such a model already exists in the selected project. If that is the case the user will be asked for confirmation before completing the import.


Fig. 13 Selecting an OMP-file to be imported.

Fig. 14 After selecting the OMP file to be imported one is asked for a destination project into which the imported model will be copied. The successful import is confirmed through a dialog window.

Important notes

The automatically generated settings file will automatically overwrite the file from the previous optimisation. So when one wants to save a certain setting, one can manually save the settings file under a different name using the file menu to prevent it from being replaced by a newer one.

As long as during parameter recovery the number of parameters selected for optimisation in the settings file matches the number of columns in the log file, the recovery will proceed. However, if this accidentally concerned a different selection with the same number of parameters, the recovery will erroneously proceed and OptiPa will not notice the mistake. Of course, the subsequent simulations will most likely no longer make sense, as the parameter values will most likely not be realistic.

After encryption make sure to keep your original version of the model as the encrypted copy cannot be viewed or edited.

If you want to share an encrypted model with a third party, make sure they have access to OptiPa v6.1p or later.


Regardless of the name of the selected settings file, the name of the packaged OMP-file is always based on the name of the OMF model file referred to in the settings file.

Do not change the name of an OMP-file as the name of the OMP-file will be used during the importing and should match the name of the packed model OMF-file.


GUI: Options menu

The second menu button is used to set the following general program options:

the number of cores used for parallel computations,

the ODE solver to use,

the optimisation method to use,

the discontinuity strategy to use,

the default location for projects,

which text editor to use,

whether to overwrite/append the statistical output to file,

the default delimiter for all ASCII output files.

Fig. 1 Set options menu showing all sub menus

Assign number of cores

Through the first entry of the options menu, the number of cores used for parallel processing can be set. Between the square brackets the currently active number of cores is indicated. To change the number of cores, OptiPa has to be restarted. The maximum number of cores you can request is automatically indicated (Fig. 2). It depends on your hardware configuration but is limited to 12, as this is the maximum number of cores currently supported by MatLab. When you select one single core you effectively rule out all parallel computations. If, in a case of wishful thinking, you enter a higher number of cores than you actually have, the number will be automatically changed to the maximum number of cores present on your machine.

Fig. 2 Core Selection menu.

ODE-solver

To be able to simulate a model it has to be implemented in terms of ODEs, and an ODE solver has to be selected for numerically integrating the model. In OptiPa a variable-step continuous solver, ODE45, was selected as the default solver. Variable-step solvers decrease the simulation step size to increase accuracy when a system's continuous states are changing rapidly, and increase the step size to save simulation time when they are changing slowly. In general, ODE45 is the best solver to apply as a first try for most problems. Besides ODE45 the other ODE solvers from MatLab are available as well (Fig. 3):

ode45 (nonstiff): most of the time; this should be the first solver you try.


ode23 (nonstiff): for problems with crude error tolerances or for solving moderately stiff problems.

ode113 (nonstiff): for problems with stringent error tolerances or for solving computationally intensive problems.

ode15s (stiff): if ode45 is slow because the problem is stiff.

ode23s (stiff): if using crude error tolerances to solve stiff systems and the mass matrix is constant.

ode23t (moderately stiff): for moderately stiff problems if you need a solution without numerical damping.

ode23tb (stiff): if using crude error tolerances to solve stiff systems.

Fig. 3 ODE solver menu.
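For readers unfamiliar with these solvers, the sketch below (not OptiPa code; the model and values are invented) shows how a MatLab ODE solver is called on a simple first-order decay model. Selecting a different solver amounts to swapping the function name; the model function stays the same.

```matlab
k    = 0.1;                          % decay rate constant (1/h), invented
dAdt = @(t, A) -k * A;               % ODE right-hand side: dA/dt = -k*A
[t, A] = ode45(dAdt, [0 48], 100);   % nonstiff default: t from 0 to 48 h, A(0) = 100
% for a stiff problem one would instead try:
% [t, A] = ode15s(dAdt, [0 48], 100);
```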

Optimisation method

OptiPa provides several ways of fitting the model to the data by estimating values for the unknown model parameters, thus minimising the discrepancy between the model predictions and the measured experimental data (Fig. 4). The following routines are available:

LSQnonlinear (MatLab, Optimisation Toolbox): This is the default optimisation method and makes use of the Levenberg-Marquardt method.

Direct pattern search (MatLab, Genetic Algorithm and Direct Search Toolbox): Direct pattern search is a method for solving optimization problems that does not require any information about the gradient of the objective function.

Genetic algorithm (MatLab, Genetic Algorithm and Direct Search Toolbox): The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection.

Differential evolution algorithm (Kenneth Price and Rainer Storn): The Differential evolution algorithm is a population-based optimizer that attacks the starting point problem by sampling the objective function at multiple, randomly chosen initial points.

Enhanced Scatter Search (Process Engineering Group IIM-CSIC): Scatter search is a population-based metaheuristic that has been shown to yield promising outcomes for solving combinatorial and nonlinear optimization problems.

Fig. 4 Optimisation Method menu.
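As a hedged illustration of the least-squares idea behind the default method (this is not OptiPa's internal code; the data and model are invented), lsqnonlin from the Optimisation Toolbox minimises the sum of squared residuals between model predictions and data:

```matlab
tdata = [0; 10; 20; 30];                      % measurement times, invented
ydata = [98; 61; 37; 23];                     % measured values, invented
model = @(k, t) 100 * exp(-k * t);            % simple decay model, A(0) fixed at 100
resid = @(k) model(k, tdata) - ydata;         % residual vector to be minimised
k_hat = lsqnonlin(resid, 0.01, 0, 1);         % start at 0.01, bounds [0, 1]
```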


Discontinuity strategy

The condition file might contain information on independent experimental conditions used as an input to the ODE model. One should always adjust the level of detail of the input data to what is relevant. However, sometimes it might not be possible to ignore or smooth out relatively short-lasting (discrete) events in the input data. OptiPa has two different approaches to deal with discontinuities in the input data from the condition file.

1. By selecting the default Continuous option (Fig. 5) as the discontinuity strategy, OptiPa simulates each experiment in a single run from t=0 to the final point in time. However, the data frequency in the condition file will affect the integration process. When short time steps are introduced, the maximum time step allowed for the ODE solver is reduced to match the smallest time step used in the condition data file. This is to prevent the ODE solver from overlooking such short-lasting events. When the condition file is limited to larger time steps the maximum integration step will be accordingly larger. Using smaller time steps than required (e.g., providing temperature data for every second while a typical event takes hours) will unnecessarily slow down the ODE solver and thus the whole simulation/optimisation process.

2. By selecting the Discrete option (Fig. 5) as the discontinuity strategy, OptiPa simulates each experiment by breaking up the condition file into multiple parts, from one time point to the next, taking the final simulation values of one part as the starting values for the simulation of the next part. The advantage is that the simulation of long continuous periods is not hampered by the presence of short discrete events elsewhere in the condition file. As such, the simulation of a scenario with a few discontinuities will go much faster than when using the Continuous option. The downside is that the ODE solver has to be initialised over and over again when starting each subsequent part of the overall scenario (which takes time as well). In addition, the first derivative is no longer continuous.

In general, when a scenario contains a few short, fast events it is worthwhile trying the Discrete option. When there are just a few events describing moderate to slow changes the Continuous option might be more efficient. When the condition file contains a continuous stream of events (like the output from a temperature recorder reading at 10-minute intervals) the Continuous option might still be more efficient as, in spite of being limited in its maximum integration step, it does not require the multiple initialisations of the ODE solver needed when using the Discrete option.

Fig. 5 The Discontinuity strategy sub menu.
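The Discrete strategy can be sketched as follows (not OptiPa's actual code; the condition values and model are invented): the scenario is split at the time points of the condition file and the final state of each part seeds the next part.

```matlab
tcond = [0 24 25 48];                % condition-file time points (h), invented
k     = [0.05 0.5 0.05];             % rate applying during each interval, invented
A0    = 100;                         % initial state
T = []; A = [];
for i = 1:numel(tcond) - 1
    dAdt = @(t, y) -k(i) * y;                        % rhs for this interval
    [ti, yi] = ode45(dAdt, [tcond(i) tcond(i+1)], A0);
    A0 = yi(end);                                    % carry the final state over
    T = [T; ti]; A = [A; yi];                        % collect the piecewise results
end
```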

Project root

The default location for saving OptiPa projects is the sub folder …\projects relative to where the OptiPa program files are located. By changing this location all input and output will be redirected to the new location. On selecting this menu entry you are prompted to select the root folder for your OptiPa projects (Fig. 6).


Fig. 6 Selecting the root folder for your projects.

Model editor

When using OptiPa within the MatLab environment the internal MatLab editor is used to edit the model OMF-files. In the standalone version this editor is no longer available. One can use the standard Windows Notepad editor (Fig. 7) or any user-defined ASCII text editor one has installed, by selecting the option Select Other and pointing to the executable program file of your editor of choice (Fig. 8). A good text editor to use is the freeware editor Notepad++, as it supports colour coding of programming syntax, making it easier to read the OMF-files.

Fig. 7 Model editor selection menu.

Fig. 8 Selection of your own favourite text editor.

If you decide to work with the freeware editor Notepad++ you might want to tell it that the OMF-files are similar to MatLab m-files. You can do this by going to the Settings menu of Notepad++ and selecting the Style Configurator. In the Language column scroll down to Matlab, select it, add the User extension OMF as indicated in Fig. 9, and finally press Save & Close. Now you are all set to start editing your own models.

Fig. 9 How to tell Notepad++ to recognise the OptiPa OMF-files as Matlab files.

Stats output file

The statistics output is written to file, by default appending to the existing file, thus keeping all information of subsequent optimisations together. However, this file might steadily grow to an unmanageable size. Through this menu option (Fig. 10) one can toggle between Append and Overwrite. The latter can be used to effectively remove all old statistical output and start from scratch again.

Fig. 10 Statistical output menu.

Default number of plots

After a simulation, model results for all selected experiments will be plotted in a single matrix plot. When a high number of experiments is selected the plots become too small to read and generating the plot takes too long. In this menu you set the maximum number of experiments to be plotted at once after a successful simulation or optimisation.


Fig. 11 Sub menu to select default number of experiments to be plotted at once during simulation and optimisation output.

ASCII delimiter

The available options for the ASCII delimiter are (Fig. 12):

, (comma)

; (semicolon)

| (tab stop)

_ (white space)

with the semicolon being the default setting. This setting is only relevant for output files. When reading input files, OptiPa will determine the delimiter itself.

Fig. 12 Set options menu showing the ASCII delimiter sub menu.

Important note

LSQnonlinear is the default optimisation routine. Though other algorithms can be used during optimisation (whether or not per experiment), LSQnonlinear is the routine exclusively used during Bootstrapping, Monte-Carlo simulation and the calculation of conditional joined confidence regions. Any other selection will be automatically overruled.

After an optimisation using the Discrete discontinuity strategy always check the results by simulating the model once more using the Continuous discontinuity strategy to make sure that no artefacts have been induced related to the derivative of the model no longer being continuous.


GUI: Help menu

The help menu (Fig. 1) provides access to this help file, including the PDF version. It furthermore provides a link to the OptiPa website, the PDF version of a publication on OptiPa, and the acknowledgements.

Fig. 1 Help menu.

Through the option Show all variable names the user is presented with an overview of all variable names used in the user data files and in the model file. Possible duplicate names are marked in yellow (Fig. 2).

Fig. 2 Overview of all known variable names to check for duplicate variable names.

Through the help menu a simple equation checker is provided. This is to facilitate the readability of complex equations in the model OMF-files. One can make use of the equation checker by first copying an equation text string to the computer's clipboard, by simply marking the text and pressing the key combination CTRL-C (Fig. 3).


Fig. 2 Selection of an equation text string to be checked with the equation checker.

Once the equation string has been copied to the clipboard one can select the option Equation Checker from the help menu. This will open the equation checker window for the copied equation string (Fig. 3). The top line contains the equation string, which can be edited by the user. The window underneath contains a graphical representation of the equation text string. When editing the text string, the graphical window can be updated by hitting <Enter> twice. Once finished, on pressing the OK-button the final (corrected) equation text string is copied to the clipboard and can be pasted back into the model OMF-file (by pressing the key combination CTRL-V in the text editor where the model file is opened).

Fig. 3 The equation checker window for the copied equation text string.

Additional options are available through the context menu that can be accessed by right-clicking on the graphical representation of the equation (Fig. 4). This includes changing the font size and adapting the vertical alignment. Changing this alignment mainly affects the representation of large divisions.

Fig. 4 Equation checker showing the additional options available through the context menu.

Through this context menu, the equation can also be copied to the clipboard as either raw LaTeX code or as a graphical image (Fig. 5).

$$EH_{SS}=\frac{Enz_{0}}{\left(\frac{{{10}^{\left(\textrm{--}pH\right)}}}{KE1}\right)+\left(\frac{KE2}{{{10}^{\left(\textrm{--}pH\right)}}}\right)+1}$$

Fig. 5 Equation image and corresponding LaTeX code generated by the equation checker.

If no equation (or an equation with a syntax error) was found on the clipboard on initiating the equation checker, the user will be presented with a dummy input screen to start typing an equation from scratch.

Fig. 6 Equation checker with dummy input.

If the equation is too long to fit in the graphical window of the equation editor, you can either reduce the font size or use the scroll buttons in the bottom left corner to move the equation sideways (Fig. 7).

Fig. 7 Scrolling the equation sideways in the equation editor.

GUI: Model control menu

OptiPa provides a simple model control menu to load the different files (experimental data file, condition file, linkage file and model definition file) required for subsequent simulation, optimisation, bootstrapping or Monte-Carlo simulation. Note that this model control menu is different from the File menu on the main menu bar.

Fig. 1 Model control menu.

Next to the yellow load button you find a notebook icon to quickly open the model file in your favourite text editor (as defined through the options menu) and some spying-eye icons to view any of the data input files using a simple spreadsheet viewer (Fig. 2).

Fig. 2 Spreadsheet viewer activated by selecting the 'Eye' icon (Fig. 1).

When you try to load a model file and OptiPa finds old model m-files in the current project, it will offer to rename them to OMF-files to bring them up to date with OptiPa version 6.* (Fig. 3).

Fig. 3 Old model files found.

The experimental data file and the model definition file are obligatory; the condition file and the linkage file are not. A condition file (and in some cases a linkage file) only has to be supplied when providing independent input variables (temperature, relative humidity, ethylene levels, etc.) or when using grouping variables to estimate multiple copies of certain model parameters. The linkage file should be loaded after the experimental data file and the condition file. On changing the experimental data file or the condition file, the current linkage file selection is cleared, as a linkage file is assumed to be unique for a certain experimental data file - condition file combination. After selecting any of the buttons, a file load menu appears showing a selection of files to choose from (Fig. 4). Pressing the OK button confirms your choice.

Fig. 4 File load menu.

Syntax checking

On loading the model OMF-file, the MatLab syntax is checked. In the case of syntax errors, OptiPa returns an error warning and generates an error log referring to line numbers and column positions (Fig. 5). Of course, some errors are not syntax errors but will still cause OptiPa to fail.

Fig. 5 Error warning on syntax checking of the OMF-file.

GUI: Preparing the optimisation

Selecting subsets of data

After loading the model files one can select or deselect the data of one or more experiments to specify the data to be used for parameter estimation, as indicated by the zeros and ones in front of the experiment numbers (Fig. 1). By (de)selecting the checkbox, all experiments can be (de)selected at once.

Fig. 1 Selecting experiments.

Fig. 2 Selecting experiments when a linkage file is supplied, showing which experiment is linked to which condition.

When a linkage file is supplied, the window displaying all experiments (Fig. 2) will contain a third column indicating which experiment is linked to which condition.

Parameter control

When the model is loaded, OptiPa shows the different model parameters in a dropdown box (Fig. 3). The zeros and ones in front of the parameters indicate whether or not they are selected for optimisation.

Fig. 3 Selecting parameters.

After selecting one of the parameters from the dropdown box one can change the settings for this parameter (Fig. 4). The initial value is set to the value given in the model OMF-file but can be changed using the slider control or by typing a new value. To include the parameter in the optimisation process, check the checkbox in front of the initial value. (To (de)select all parameters at once, (de)select the checkbox labelled (de)select all parameters for optimisation.) Parameters that are assigned a value of zero cannot be selected for optimisation, so always provide positive, non-zero values for parameters you want to estimate.

Fig. 4 Changing parameter settings.

By default each parameter will be estimated in common for all the experimental data supplied. However, the columns provided in the condition file can be used to introduce a certain grouping in the data. By default (if a condition file is provided) one can make use of the variable experiment, but one could also introduce groupings like cultivar, year, harvest, etc. (Fig. 5). These groupings can be used to estimate multiple copies of a certain parameter, depending on the number of groups defined.

Fig. 5 Generating multiple parameter copies.

OptiPa will automatically generate as many copies as there are groups defined. By selecting, in the example of Fig. 6, the grouping by experiment, 4 copies are created. After selecting one of the parameter copies from the dropdown box one can change the settings for that copy (Fig. 6). The initial value at creation of the copies is set to the value of the original parent but can be changed using the slider control or by typing a new value. To include a copy of the parameter in the current optimisation process, check the checkbox in front of the selected parameter copy. The zeros and ones in front of the experiment numbers indicate whether these copies are selected for optimisation, so one can manually fix certain copies at a set value while estimating the values of the other copies.

Fig. 6 Generating multiple parameter copies.

Objective function

Generally speaking, parameter estimation aims at minimising a so-called Objective function, which represents the distance between the experimentally measured state variables and the corresponding model predictions. This Objective function is defined by pressing the red button (Fig. 7).

Fig. 7 Button to define the Objective function.

In OptiPa, a least squares Objective function has been implemented. Any deviation of an observed value from a predicted value signifies some loss in the accuracy of the model prediction, for example due to random noise (error). As in standard multiple regression, the model parameters are estimated by finding those parameter values that minimise this residual variance (sum of squared residuals) around the regression curve. It is up to the user to define the Objective function (visualised in the matrix of Fig. 8) by indicating which model variable (the rows in Fig. 8) needs to be compared against which experimental variable (the columns in Fig. 8).

Fig. 8 A typical Objective function matrix of OptiPa.
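As a minimal illustration of such a least squares Objective function (a Python sketch for clarity only; OptiPa is a MatLab program and builds this internally from the Objective function matrix):

```python
import numpy as np

def sse_objective(observed, predicted):
    """Sum of squared residuals between measured and modelled values.
    Missing observations (NaN) are ignored, as they carry no information."""
    residuals = observed - predicted
    return float(np.nansum(residuals ** 2))

observed  = np.array([1.0, 2.0, np.nan, 4.0])   # one missing value
predicted = np.array([1.1, 1.9, 3.0, 4.2])
sse = sse_objective(observed, predicted)        # 0.01 + 0.01 + 0.04
```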

When the model is fitted on several variables at the same time (as in the example of Fig. 8), the total sum of squares is the combined sum of squares over all of the selected dependent variables. As indicated in the next section, one can interfere indirectly with this Objective function by transforming the data. Additionally, OptiPa provides the option to manually add weight factors to the different dependent variables by assigning values different from one to the Objective function matrix (Fig. 8, the coloured boxes). Depending on the transformations and weight factors applied, OptiPa will come back with different sets of parameter estimates.

By default the weight factor is based on the number of observations. If all variables have the same number of observations, the weight factor will be 1 for each variable. If, due to missing values, one variable has more observations than another, the weight factor is inversely related to the number of observations. This prevents over-emphasising variables that are measured automatically at a high sampling rate, as compared to variables measured manually at a much lower sampling rate; it assumes both have a comparable measuring accuracy. For example, the weight factor for the variable with the fewest observations is set to 1, while the weight factor for a variable with twice as many observations is set to 0.5. These default values can be overruled by manually entering alternative values. Using the Straight button, all entered values are reset to 1. Using the Clear all button, the Objective function matrix is emptied.

By entering a negative value in one of the boxes, the user is offered the option to select a column from the experimental data file to individually weigh the residuals accordingly through multiplication (Fig. 9).
This option can be useful when one has information about the reliability of the various measurements in the form of measuring errors, without wanting to feed the model with the individual replicate measurements. In such a case one could create columns with 1/SE values, so that inaccurate measurements (high SE) are down-weighted. Such additional weight columns will also show up as extra columns in the Objective function matrix, but these can be ignored. The negative value entered plays no role as such and will always be set to -1, regardless of the actual user input. In the end, it is the vector selected from the drop-down box that determines the actual weighing of the residuals of the measured-modelled combination concerned.

Fig. 9 An Objective function matrix with weighing of the variables firmness and PG using additional columns from the experimental data file.
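The default observation-count weighting can be sketched as follows (Python for illustration; `default_weights` is a hypothetical helper, not an OptiPa function):

```python
import numpy as np

def default_weights(n_obs):
    """Default per-variable weight factors: the variable with the
    fewest observations gets weight 1, and weights scale inversely
    with the number of observations."""
    n = np.asarray(n_obs, dtype=float)
    return n.min() / n

# firmness measured 10 times, colour logged automatically 100 times
w = default_weights([10, 100])   # firmness keeps full weight
```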

Important notes

One can assign a zero value to certain model parameters, but these parameters will automatically be excluded from the optimisation process. The reason is that MatLab cannot determine a step size for these zero values during the iterative process of parameter optimisation.

Parameters can only be included in the optimisation process when they are initiated at non-zero positive values.

Negative values for the model parameters are not accepted and will automatically be converted into their positive counterparts.

Columns from the experimental data file used for weighing of individual residuals through the objective function, should not contain missing values.

Columns from the experimental data file used for weighing individual residuals through the objective function, can contain zero-values to effectively remove individual data points from the residual sums of squares.
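The last two notes can be illustrated with a small Python sketch (a hypothetical helper, not OptiPa code): residuals are multiplied point-wise by the selected weight column, so a zero weight drops that point from the sum of squares.

```python
import numpy as np

def weighted_sse(observed, predicted, weights):
    """Sum of squares of residuals multiplied by a weight column
    (e.g. 1/SE values); a zero weight removes that data point."""
    r = (observed - predicted) * weights
    return float(np.sum(r ** 2))

obs  = np.array([10.0, 12.0, 14.0])
pred = np.array([10.5, 12.5, 13.0])
w    = np.array([2.0, 1.0, 0.0])     # third point effectively removed
wsse = weighted_sse(obs, pred, w)    # (1.0)^2 + (0.5)^2 = 1.25
```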

GUI: Selecting an optimisation method

Optimisation method

OptiPa provides several ways of fitting the model to the data by estimating values for the unknown model parameters, thus minimising the discrepancy between the model predictions and the measured experimental data. The following routines are available:

LSQnonlinear (MatLab, Optimisation Toolbox): Least squares optimisation is the default optimisation method and is carried out using the MatLab least squares nonlinear optimisation routine (lsqnonlin) with the Levenberg-Marquardt method. The Levenberg-Marquardt algorithm provides a numerical solution to the problem of minimising a function, generally nonlinear, over a space of parameters of the function. The method interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The Levenberg-Marquardt algorithm is a very popular curve-fitting algorithm used in many software applications for solving generic curve-fitting problems. However, it finds only a local minimum, not a global minimum. Therefore it is mandatory to select proper, realistic starting values for the model parameters and to compare the solutions obtained from different starting points. In this way one can increase the likelihood of obtaining the global minimum.

When the function has multiple local minima and one is very unsure about proper starting values for the various parameters one can use one of the other algorithms that all scan a much wider range of parameter values and therefore are potentially more successful in obtaining a global minimum.
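OptiPa itself calls MatLab's lsqnonlin; purely as an illustrative sketch, the same multi-start Levenberg-Marquardt strategy can be mimicked in Python with scipy's least_squares (the model and data below are synthetic):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic first-order decay data: y = y0 * exp(-k * t)
t = np.linspace(0.0, 10.0, 25)
y = 8.0 * np.exp(-0.4 * t)

def residuals(p):
    y0, k = p
    return y0 * np.exp(-k * t) - y

# Levenberg-Marquardt only finds a local minimum, so restart the fit
# from several different starting points and keep the best solution.
starts = [(1.0, 0.1), (5.0, 1.0), (20.0, 0.01)]
fits = [least_squares(residuals, x0, method='lm') for x0 in starts]
best = min(fits, key=lambda f: f.cost)   # recovers roughly (8.0, 0.4)
```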

Direct pattern search (MatLab, Genetic Algorithm and Direct Search Toolbox): Direct pattern search is a method for solving optimization problems that does not require any information about the gradient of the objective function. Unlike more traditional optimization methods that use information about the gradient or higher derivatives to search for an optimal point, a direct search algorithm searches a set of points around the current point, looking for one where the value of the objective function is lower than the value at the current point. You can use direct search to solve problems for which the objective function is not differentiable, stochastic, or even continuous. At each step, the algorithm searches a set of points, called a mesh, around the current point—the point computed at the previous step of the algorithm. The mesh is formed by adding the current point to a scalar multiple of a set of vectors called a pattern. If the pattern search algorithm finds a point in the mesh that improves the objective function at the current point, the new point becomes the current point at the next step of the algorithm.

Genetic algorithm (MatLab, Genetic Algorithm and Direct Search Toolbox): The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions. At each step, the genetic algorithm selects individuals at random from the current population to be parents and uses them to produce the children for the next generation. Over successive generations, the population "evolves" toward an optimal solution. You can apply the genetic algorithm to solve a variety of optimization problems that are not well suited for standard optimization algorithms, including problems in which the objective function is discontinuous, nondifferentiable, stochastic, or highly nonlinear. The genetic algorithm uses three main types of rules at each step to create the next generation from the current population: Selection rules select the individuals, called parents, that contribute to the population at the next generation. Crossover rules combine two parents to form children for the next generation. Mutation rules apply random changes to individual parents to form children. The genetic algorithm differs from a classical, derivative-based, optimization algorithm in two main ways. While a classical algorithm generates a single point at each iteration, with the sequence of points approaching an optimal solution, a genetic algorithm generates a population of points at each iteration, with the best point in the population approaching an optimal solution. While a classical algorithm selects the next point in the sequence by a deterministic computation, a genetic algorithm selects the next population by a computation that uses random number generators.

Differential evolution algorithm (Kenneth Price and Rainer Storn): The Differential evolution algorithm is a population-based optimizer that attacks the starting point problem by sampling the objective function at multiple, randomly chosen initial points. Pre-set parameter bounds define the domain from which the initial populations are chosen. The Differential evolution algorithm generates new points that are perturbations of existing points, but these deviations are neither reflections nor samples from a predefined probability density function. Instead, the Differential evolution algorithm perturbs

vectors with the scaled difference of two randomly selected population vectors. To produce the trial vector, the Differential evolution algorithm adds the scaled, random vector difference to a third randomly selected population vector. In the selection stage, the trial vector competes against the population vector of the same index. The procedure repeats until all vectors have competed against a randomly generated trial vector. Once the last trial vector has been tested, the survivors of the competitions become parents for the next generation in the evolutionary cycle. The following differential evolution strategies can be used in the optimization procedure:

o Classical version: the classical version of DE,

o Local-to-best: a version which attempts a balance between robustness and fast convergence,

o With jitter: tailored for small population sizes and fast convergence; dimensionality should not be too high,

o With per-vector-dither: Classical DE with dither per vector to become even more robust,

o With per-generation-dither: Classical DE with dither per generation to become even more robust,

o Either-or-algorithm: Alternates between differential mutation and three-point-recombination.
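As a rough illustration of this population-based approach (scipy's differential_evolution, not the Price and Storn implementation used by OptiPa), a bounded global fit on synthetic decay data looks like this:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic first-order decay data: y = y0 * exp(-k * t)
t = np.linspace(0.0, 10.0, 25)
y = 8.0 * np.exp(-0.4 * t)

def sse(p):
    y0, k = p
    return np.sum((y0 * np.exp(-k * t) - y) ** 2)

# Pre-set parameter bounds define the domain from which the initial
# population is drawn, as described above.
result = differential_evolution(sse, bounds=[(0.1, 50.0), (0.001, 5.0)],
                                seed=1, tol=1e-10)
```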

Enhanced Scatter Search (Process Engineering Group IIM-CSIC): Scatter search is a population-based meta-heuristic approach that has been shown to yield promising outcomes for solving combinatorial and nonlinear optimization problems. Scatter search uses strategies for combining solution vectors that have proved effective in a variety of problem settings. Scatter search orients its explorations systematically, relative to a set of reference points that typically consist of good solutions obtained by prior problem solving efforts. Scatter search is based on the five-method template:

o A Diversification Generation Method to generate a collection of diverse trial solutions.

o An Improvement Method to transform a trial solution into one or more enhanced trial solutions. Neither the input nor the output solutions are required to be feasible, though the output solutions are usually expected to be. If no improvement of the input trial solution results, the enhanced solution is considered to be the same as the input solution.

o A Reference Set Update Method to build and maintain a reference set consisting of the best solutions found, where the number of best solutions is typically small compared to the population size of other evolutionary algorithms, organized to provide efficient accessing by other parts of the method. Solutions gain membership to the reference set according to their quality or their diversity.

o A Subset Generation Method to operate on the reference set, to produce several subsets of its solutions as a basis for creating combined solutions.

o A Solution Combination Method to transform a given subset of solutions produced by the Subset Generation Method into one or more combined solution vectors.

The various optimisation methods can be selected from the Set Options menu (Fig. 1).

Fig. 1 Optimisation Method menu.

When one of the alternative optimisation methods is selected, the user will be asked to define the scanning fold range for the parameters (Fig. 2). If a fold range of, for instance, 10 is defined, then given parameter starting values P the optimisation routine will scan from P/10 to P×10.

Fig. 2 Scan range menu.
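The fold range translates directly into scan bounds; a minimal sketch (Python for illustration; `scan_bounds` is a hypothetical helper, not part of OptiPa):

```python
def scan_bounds(start_values, fold_range=10.0):
    """With fold range F and starting value P, scan from P/F to P*F."""
    return [(p / fold_range, p * fold_range) for p in start_values]

# starting values 0.4 and 8.0 scan (0.04, 4.0) and (0.8, 80.0)
bounds = scan_bounds([0.4, 8.0])
```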

Important notes

LSQnonlinear is the default optimisation routine. Though other algorithms can be used during optimisation (whether or not per experiment), LSQnonlinear is the routine exclusively used during Bootstrapping, Monte-Carlo simulation and the calculation of conditional joint confidence regions. Any other selection will automatically be overruled.

When an optimisation routine different from LSQnonlinear is chosen, the Correlation matrix and the parameters' standard deviations and confidence intervals cannot be calculated, as these optimisation methods do not provide a Jacobian matrix from which to calculate such information. The recommended strategy is to use one of the global methods to find the best global minimum and take this as the starting point for a final optimisation using LSQnonlinear.

GUI: Optimisation control

Each of the optimisation methods requires a user-defined Objective function, generally in terms of the sum of squared deviations of the observed values. The user can interfere with the iterative optimisation process by jointly adjusting the maximum number of iterations and function evaluations and the termination tolerance on the Objective function value, moving the slider control of Fig. 1 from nitpicking (very accurate) to quick and dirty (a first rough optimisation).

Fig. 1 Optimisation control menu.

Statistical Output

The statistics output is written to file, by default appending to the existing file, thus keeping all information of subsequent optimisations together. However, this file might steadily grow to an unmanageable size. Through this menu option (Fig. 2) one can toggle between Append and Overwrite. The latter can be used to effectively remove all old statistical output and start from scratch.

Fig. 2 Statistical output menu.

Important note

The progress of the optimisation routine strongly depends on the absolute values of the parameters to be optimised. When trying to optimise a large parameter (for instance an Ea of ±120 kJ·mol⁻¹) together with a small parameter (for instance a film permeability of ±2.18·10⁻¹⁵ mol·s⁻¹·m·m⁻²·Pa⁻¹) the optimisation routine will have severe difficulties. Therefore, in OptiPa, all parameters are made relative to their initial values at the start of the optimisation. As far as the optimisation routine is concerned, all parameters thus start at a relative value of one.
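This relative parameterisation can be sketched as follows (a Python illustration with a toy model and made-up values; OptiPa performs the equivalent internally in MatLab). The optimiser only ever sees values near one, while the model receives the absolute parameters:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy model with parameters of very different magnitude.
t = np.linspace(0.0, 2.0e5, 30)
y = 1.2e5 * np.exp(-2.18e-5 * t)          # 'true' data

p0 = np.array([1.0e5, 1.0e-5])            # user-supplied initial values

def residuals_relative(p_rel):
    # The optimiser works on p_rel, which starts at one for every
    # parameter; absolute values are recovered by multiplying with p0.
    A, k = p_rel * p0
    return A * np.exp(-k * t) - y

fit = least_squares(residuals_relative, np.ones(2), method='lm')
estimates = fit.x * p0                    # back to absolute values
```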

GUI: Transformations

Box-Cox transformation

Ordinary least squares techniques assume that the residual variance around the regression line is normally distributed and the same across all values of the input variables. Often this is not a realistic assumption in biological applications. Therefore, OptiPa offers the possibility to apply a Box-Cox transformation (Sokal and Rohlf, 1995) to the data to correct for non-normality and a heteroscedastic variance structure (Fig. 1).

Fig. 1 Optimisation control menu.

The Box-Cox transformation is a family of power transformations with a single parameter $a$ and with $\tilde{y}$ representing the geometric mean of variable $y$:

$$y'=\begin{cases}\dfrac{y^{a}-1}{a\,\tilde{y}^{\,a-1}} & a\neq 0\\ \tilde{y}\,\ln y & a=0\end{cases}\qquad\text{(Eq. 1)}$$

The value of $a$ is determined by conducting a standard least squares fit of the transformed model to the transformed experimental data, applying the transformation from Eq. 1. When a Box-Cox transformation is requested, it is applied to each of the dependent model variables, adding as many $a$-parameters to the list of parameters to be optimised as there are dependent variables. The $a$-values are not estimated separately but together with all other model parameters. An $a$-value of one indicates normality and homoscedasticity of the variable concerned.
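A sketch of this transformation in Python, assuming the standard Sokal and Rohlf form of the geometric-mean normalised Box-Cox (Eq. 1); note that OptiPa estimates a jointly with the model parameters rather than calling a helper like this:

```python
import numpy as np

def boxcox_gm(y, a):
    """Geometric-mean normalised Box-Cox transformation:
       y' = (y**a - 1) / (a * gm**(a - 1))  for a != 0
       y' = gm * ln(y)                      for a == 0
    with gm the geometric mean of y (y must be strictly positive)."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))
    if a == 0:
        return gm * np.log(y)
    return (y ** a - 1.0) / (a * gm ** (a - 1.0))

y = np.array([1.0, 2.0, 4.0, 8.0])
identity = boxcox_gm(y, 1.0)     # a = 1: data unchanged up to a shift
log_like = boxcox_gm(y, 0.0)     # a = 0: scaled log transform
```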

Normal transformation

The total sum of squares strongly depends on the absolute values of the variables concerned. When the model is fitted on several variables at the same time, the total sum of squares is the combined sum of squares over all dependent variables. When one variable (for instance hue colour, varying in a range from 40° to 90°) is of a completely different magnitude than another (for instance a rate ranging up to 40·10⁻⁹ mol·kg⁻¹·s⁻¹), the sum of squares of the larger variable will completely overshadow that of the smaller variable. As a consequence, the optimisation routine will favour the model fit for the larger variable, ignoring any lack of fit for the smaller variable. To compensate for this, the user can either manually introduce weighing factors in the Objective function or apply a normal transformation to each of the dependent variables (Fig. 1) to bring all data back to the same value range, with an average of zero and a standard deviation of one (N(0,1)). This normal transformation is performed per dependent variable, rescaling its overall range over all experiments included in the optimisation to approximately N(0,1).
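A sketch of this per-variable normal transformation (Python for illustration only):

```python
import numpy as np

def normal_transform(y):
    """Rescale one dependent variable over all experiments to roughly
    N(0,1): subtract the mean and divide by the standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - np.nanmean(y)) / np.nanstd(y)

hue = np.array([40.0, 60.0, 90.0, 75.0])   # degrees
z = normal_transform(hue)                  # now mean 0, sd 1
```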

Weighing replicate measurements

When replicate measurements are available within an experiment for the various time points, the variation around the mean gives information about the accuracy of these measurements: the higher the variation, the less accurate the measurement. This information can be used to weigh the individual data points. When selecting this option (Fig. 1), individual data points will be divided by the standard error of the replicate measurements measured within that particular experiment at that same time point. Individual data points not having any matching replicate measurements will be divided by the average

standard error observed within that particular experiment for the other time points. To be able to use this option, all experiments included in the optimisation should contain at least some replicate measurements. As soon as one of the experiments contains no replication at all, the option is automatically disabled again.

Reference: Sokal, R R and Rohlf, F J, Biometry: the principles and practice of statistics in biological research, 3rd edition, W H Freeman and Co, New York, 1995.

Removing outliers

After a successful optimisation, OptiPa will identify possible outliers based on the 99 % confidence interval applied to the residuals. By checking the option Remove outliers from data (Fig. 1), these outliers will be ignored during a subsequent optimisation run. One should be extremely careful in using this option: by systematically removing all outliers, any model will eventually give a perfect fit to any set of data. So this option should only be used when one has reason to treat the given points as actual outliers. When using this option, the optimisation output will mention that certain data points were removed. By repeating the optimisation process, more and more data points might be removed from the original dataset. One should basically go through the whole optimisation process without using this option and, only once one is comfortable with the optimisation obtained, run a final optimisation to check for the effect of possible outliers.

Fig. 1 Optimisation control menu.
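The outlier criterion can be sketched as follows (a Python illustration assuming a simple two-sided 99 % interval on normally distributed residuals; OptiPa's exact implementation may differ):

```python
import numpy as np
from scipy.stats import norm

def flag_outliers(residuals, level=0.99):
    """Flag residuals outside the two-sided 99 % confidence interval
    around the mean residual, assuming normality."""
    r = np.asarray(residuals, dtype=float)
    z = norm.ppf(0.5 + level / 2.0)              # ~2.576 for 99 %
    return np.abs(r - r.mean()) > z * r.std(ddof=1)

r = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0,
              0.2, -0.15, 0.1, -0.05, 50.0])     # one gross outlier
mask = flag_outliers(r)                          # only the last is True
```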

Model simulation

A model simulation is started by simply pressing the Simulation button (Fig. 1).

Fig. 1 Start button panel to activate the different tasks.

After a successful simulation, OptiPa returns graphical and numerical output.

Graphical simulation output

In the output graphs, data is plotted per experiment. For each experiment OptiPa generates time course plots of the combined experimental and simulated data, showing the (lack of) fit of the model (Fig. 1). When replicate measurements are included, the original data are plotted using symbols, while larger symbols of the same colour indicate the averages at each time point (Fig. 2). In the bottom corner of each figure there is a numbered tile referring to the experiment number. Note that not all experiments might be plotted at once, depending on the maximum number of plots as defined in the Set Options menu on the menu bar of the main OptiPa window. The control bar at the bottom of the plot window allows you to either:

reduce the number of experiments plotted by pressing the minus icon,

increase the number of experiments plotted by pressing the plus icon,

scroll backwards to earlier experiments not plotted by pressing the left icon,

scroll forwards to later experiments not plotted by pressing the right icon,

go to a specific experiment without changing the zoom level by typing a number in the central input box,

zoom to a specific experiment, making that graph full size, by selecting one of the numbered tiles. This allows you to change your zoom level from a single experiment up to all experiments selected for simulation, so you can quickly investigate the individual experiments in greater detail.

Fig. 1 Simulation output window outlining the control bar.

By default, when you include multiple output variables in your model, all dependent model variables are shown together using multiple coloured lines, corresponding to the colours used in the Objective function matrix. Per graph and per variable, data is scaled between its minimum and maximum value (the relative x and y-ranges going from 0 to 1). This allows quick evaluation of the model fit when the different variables are of completely different magnitudes. The downside is that it becomes harder to compare data from different experiments, as the scaling can also vary between experiments. The drop-down box labelled All variables contains a list of all variables included in the plot (Fig. 2). You can scroll through the graphs for the individual variables using the right-hand box, which by default shows All variables (Fig. 2). By selecting a single dependent model variable from the listed variables, this variable will be shown using absolute scaling on both axes (Fig. 2).
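The relative scaling used in the combined plots amounts to a simple min-max rescale per variable (Python sketch for illustration):

```python
import numpy as np

def relative_scale(y):
    """Scale a variable between its minimum (0) and maximum (1)."""
    y = np.asarray(y, dtype=float)
    return (y - np.nanmin(y)) / (np.nanmax(y) - np.nanmin(y))

firmness = np.array([90.0, 70.0, 50.0, 30.0])
scaled = relative_scale(firmness)   # 90 -> 1.0, 30 -> 0.0
```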


Fig. 2 Simulation output window showing the scrolling through the individual model output variables. The drop-down box All variables functions as a container holding all plotted variables. Note the relative scaling used when all variables are plotted together.

You can toggle between plotting the simulated model values and the residuals either by clicking on the white background of an individual graph, changing only that particular graph, or by hitting the button Show residuals/simulations, changing all graphs at once. When the residuals are plotted, as in the first three experiments in Fig. 3, the horizontal dashed line represents the absolute zero line. In case of a good fit, the residuals of each single experiment should be randomly distributed around this zero line without showing any systematic deviations. To help you evaluate the model fit at the level of individual experiments, the residuals of each individual experiment are tested for:

having an average of 0 by using the one-sample t-test: This tests the null hypothesis that the residuals come from a normal distribution with mean equal to zero and some unknown variance. The alternative hypothesis is that the population distribution does not have a mean equal to zero.

being normally distributed using the one-sample Kolmogorov-Smirnov test: This tests the null hypothesis that the residuals come from a normal distribution, against the alternative that they do not come from such a distribution.

showing no trending by using a run test for randomness: This tests the null hypothesis that the residuals come in random order, against the alternative that they do not. The test is based on the number of consecutive residuals above or below the mean residual value.

When the null hypotheses are accepted (p>0.05), no comments are plotted. When the null hypothesis is rejected for one or more tests (p<0.05), a comment is included with the plot to indicate which aspect was not met by the data.
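The three diagnostics can also be reproduced outside OptiPa for your own residual series. Below is a minimal sketch in Python/SciPy; it is an illustration, not OptiPa's internal code, and the runs test is hand-rolled as a normal approximation since OptiPa's exact variant is not documented here.

```python
import numpy as np
from scipy import stats

def residual_diagnostics(res):
    """Per-experiment residual checks analogous to the three OptiPa reports."""
    res = np.asarray(res, dtype=float)
    # 1) mean equal to zero: one-sample t-test
    p_mean = stats.ttest_1samp(res, 0.0).pvalue
    # 2) normality: one-sample Kolmogorov-Smirnov against a fitted normal
    p_norm = stats.kstest(res, 'norm', args=(res.mean(), res.std(ddof=1))).pvalue
    # 3) randomness: runs test on signs relative to the mean residual
    #    (normal approximation; an assumption on our side)
    signs = res > res.mean()
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    n1, n2 = int(signs.sum()), int((~signs).sum())
    mu = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (mu - 1.0) * (mu - 2.0) / (n1 + n2 - 1.0)
    z = (runs - mu) / np.sqrt(var)
    p_runs = 2.0 * stats.norm.sf(abs(z))
    return {'mean_zero': p_mean, 'normal': p_norm, 'random': p_runs}
```

A p-value below 0.05 for any entry corresponds to the comment OptiPa adds to the plot.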


Fig. 3 Simulation output showing the time course plots of the combined experimental and simulated data with the first three experiments showing the residuals.

One can toggle between the simulated model values and the residuals by clicking on one of the graphs, as was done in Fig. 3 for three of the graphs, or by hitting the button Show residuals/simulations. All graph windows in OptiPa are equipped with an export figure menu button (Fig. 4). This menu allows the user to copy the figure to the clipboard, to print the figure, or to export the figure to TIF, JPG or MatLab's internal FIG format.

Fig. 4 The export figure menu.

The model output windows are all equipped with a Plot Conditions menu button (Fig. 5). This menu allows the user to select one of the independent model variables from the condition file to be included in the graphs. This can be useful to see how the model responds to changes in these independent model variables. The selected variable is assigned a separate y-axis at the right hand side of the graph (Fig. 5).


Fig. 5 Including data from the condition file in the model output graphs.


Numerical simulation output

The following numerical output files are automatically generated after each simulation, overwriting previous files with the same name:

modelname.set - created in the project folder …\settings

modeldata.csv - created in the model output folder …\models\OUTPUT[modelname]

simulation.csv - created in the model output folder …\models\OUTPUT[modelname]

with modelname the file name of the original model definition file (OMF-file) used. The files are created in different subfolders relative to the current project folder.

modelname.set This file is the settings file automatically generated by OptiPa, containing all settings of the last successful parameter estimation with a certain model. The file is written in MatLab's internal format.

modeldata.csv This ASCII file contains simulation results at the same discrete time steps as in the experimental data input file. This enables the user to quickly align input and output files to compare the simulated versus measured values. All output variables generated by the model are saved, covering any intermediate state variables and output variables that were declared in the model definition file.

Fig. 1 The left panel shows the original experimental data file while the right panel shows the corresponding modeldata.csv output file containing all output at the identical time steps.

simulation.csv This ASCII file contains simulation results as a more or less continuous stream of data automatically generated by the ODE solver (Fig. 2). It covers the complete time span as identified in the experimental data file in combination with the condition file. This output file can be used for generating smooth graphs using third party software.


Fig. 2 The simulation.csv output file contains the output data as a more or less continuous data stream.

Important note

The automatically generated output files will overwrite the files from a previous simulation/optimisation. So, to save old output, rename the files to prevent them from being replaced by newer ones.


Model optimisation

Once the optimisation has been prepared as outlined before, the actual model optimisation is started by simply pressing the Start button next to the Optimisation button (Fig. 1).

Fig. 1 Start button panel to activate the different tasks.

After a successful optimisation OptiPa returns graphical, numerical and statistical outputs.


Graphical optimisation output

The graphical outputs consist of an overall plot of observed versus expected model values and a plot of residuals as a function of time (Fig. 1), presented in the main OptiPa window. One can toggle between the two plot types by clicking the graphs.

Fig. 1 Optimisation output showing the overall plot of observed versus expected model values and a plot of residuals as a function of time.

The colours used for plotting the data are the same as used in the Objective function matrix. When the residuals are plotted, the horizontal line represents the absolute zero line. In case of a good fit the residuals should be randomly distributed around this zero line not showing any systematic deviations. Furthermore, for each of the experiments, OptiPa generates time course plots of the combined experimental and simulated data identical to the normal simulation output.


Numerical optimisation output

The following numerical output files are automatically generated after each optimisation, overwriting previous files with the same name:

modelname.set - created in the project folder …\settings

modeldata.csv - created in the model output folder …\models\OUTPUT[modelname]

simulation.csv - created in the model output folder …\models\OUTPUT[modelname]

with modelname the file name of the original model definition file (OMF-file) used. The files are created in different subfolders relative to the current project folder. These files are identical to the ones generated during a simulation and are described there. Besides these files, a separate log file (logfile.csv) is generated in the folder where the OptiPa program files are located. This log file contains the progress of the most recent optimisation process, as it logs all parameter values assigned during the iterative optimisation process (Fig. 1). This can be useful to identify and visualise strong correlations between parameters, and to find out why a certain optimisation does not go in the direction you would expect it to go.

Fig. 1 The logfile.csv file contains all intermediate values assigned to the different model parameters during the iterative optimisation process. The last column contains the residual sums of squares.

Important note

The automatically generated output files will overwrite the files from a previous optimisation. So, to save old output, rename the files to prevent them from being replaced by newer ones.


Statistical optimisation output

The statistical output file, Stats.html, is automatically generated after a successful optimisation. The file is created in the model output folder …\models\OUTPUT[modelname] relative to the current project folder, with modelname the file name of the original model definition file (OMF-file) used. By default the statistics output is appended to the existing file, thus keeping all information of subsequent optimisations together. This default setting can, however, be changed through the options menu. During the optimisation all output is echoed to the console window and written to the Stats.html output file. Afterwards, this file can be opened in your default web browser through the Display Stats File button from the main OptiPa window (Fig. 1).

Fig. 1 Display Stats file button.

The statistical output (Fig. 2a-f) has been designed in accordance with the standard output provided by the non-linear regression routine from SAS (version 6.11, SAS Institute Inc., Cary, NC, USA). After a listing of the iterative optimisation process, OptiPa calculates a summary table of the regression in terms of the sums of squares. It furthermore calculates the approximate standard errors, the 95 % confidence intervals and the correlation matrix of the estimated parameters, based on the Jacobian matrix coming from the lsqnonlin routine. Furthermore, using a 99 % confidence interval on the residuals, the most extreme outliers are identified for each of the dependent variables. The file header (Fig. 2a) contains basic information regarding the date and time of the optimisation and the names of the input files used. Furthermore, it gives a listing of the selected experiments and mentions which type of transformation or weighting was selected. The termination tolerance and the maximum number of iterations and function evaluations are determined by setting the slider from the optimisation control menu. By moving the slider towards nitpicking, the termination tolerance will be lowered while the maximum number of iterations and function evaluations are raised.

Fig. 2a The statistical output file Stats.html: File header

The progress monitor (Fig. 2b) shows the progress of the iterative optimisation process. During each iteration step the model is called several times (as indicated by the function count). The first iteration takes as many function evaluations as there are parameters to be optimised plus one, as at each function call only one of the parameters will be changed. During the subsequent iterations the procedure might change more parameters at once during each of the function evaluations. The column Residual contains the sum of the squared deviations between observed and predicted values, which is affected by possible transformations and by the way the Objective function is defined. The aim is to minimise this value. The column Step-size contains the step size in the current search direction and is a measure of how quickly the optimisation is progressing. Closely related to this is the directional derivative, which is the gradient of the function along the search direction. The column Lambda contains the Lagrange multipliers at the solution.

Fig. 2b The statistical output file Stats.html: The progress monitor

Subsequently, any possible grouping is mentioned (Fig. 2c). The regression summary (Fig. 2c) shows the sums of squares explained and not explained by the model. Finally, several measures of goodness of fit are given. However, one needs to realise that their practical value is limited, as with non-linear models the assumptions of normality are most often not fulfilled. Both the regression summary and the goodness-of-fit statistics change depending on the weighting and transformation applied (both through the selected optimisation options and through the weights assigned in the Objective function). If certain values are down-weighted, their residuals will contribute less and a misfit for these points is therefore considered less important. As a result, the explained part in the weighted context will improve while the explanation of the raw, untransformed data often goes down. When working with variables of different scales which have been rescaled to N(0,1) by applying a normal transformation, the results can be hugely different. Therefore, when comparing subsequent model fits, make sure to always compare them using the same transformation/weighting. For this reason the statistics concerning the regression summary are always given twice, once in terms of the original untransformed data and once in terms of the transformed, weighted data. In this way the effect of weighting and/or transformation becomes explicitly visible to the user.


Fig. 2c The statistical output file Stats.html: Regression summary


The following statistics are provided to help the user judge the goodness of fit of the model (Fig. 2c). F-value: The F-value is the test statistic used to decide whether the model as a whole has statistically significant predictive capability, that is, whether the regression SS is big enough, considering the number of variables needed to achieve it. The null hypothesis is that the model has no predictive capability. The null hypothesis is rejected if the F ratio is large and the corresponding probability is smaller than p=0.05.

Explained part (or R2): The squared multiple correlation coefficient, also called the coefficient of determination. It is the proportion of the variability in the response that is fitted by the model. If a model has perfect predictability, R2=1. If a model has no predictive capability, R2=0. R, the multiple correlation coefficient and square root of R2, is the correlation between the observed and the predicted values.

Adjusted explained part (or R2adj): As additional variables are added to a regression equation, R2 increases even when the new variables have no real predictive capability. The R2adj is an R2-like measure that avoids this difficulty. When variables are added to the equation, R2adj doesn't increase unless the new variables have additional predictive capability.

Root Mean Square Error (RMSE): The RMSE, also known as the standard error of the estimate, is the square root of the Residual Mean Square. It is the standard deviation of the data about the regression line, rather than about the sample mean. The root mean-squared error gives the error value the same dimensionality as the actual and predicted values.

Mean Absolute Percent Error: This is the mean absolute percentage prediction error.

Akaike's information criterion (AIC): The AIC is an information criterion that penalises the addition of parameters, and thus selects a model that fits well but has a minimum number of parameters (i.e., simplicity and parsimony). In itself, the value of the AIC for a given data set has no meaning. It becomes interesting when it is compared to the AIC of a series of models specified a priori, the model with the lowest AIC being the 'best' model among all models specified for the data at hand.
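For reference, these measures can be computed from the observed values, the predicted values and the number of estimated parameters. The sketch below is illustrative, not OptiPa's internal code; in particular, the AIC is given in a common least-squares form, n·ln(SSE/n) + 2·np, which may differ from OptiPa's exact expression by a constant.

```python
import numpy as np

def fit_statistics(y, yhat, n_params):
    """Goodness-of-fit measures analogous to those listed in Stats.html."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    n = y.size
    sse = float(np.sum((y - yhat) ** 2))       # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))   # total (corrected) sum of squares
    r2 = 1.0 - sse / sst                       # explained part
    r2adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    rmse = np.sqrt(sse / (n - n_params))       # standard error of the estimate
    mape = 100.0 * np.mean(np.abs((y - yhat) / y))
    aic = n * np.log(sse / n) + 2 * n_params   # least-squares form (assumption)
    return {'R2': r2, 'R2adj': r2adj, 'RMSE': rmse, 'MAPE': mape, 'AIC': aic}
```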

Subsequently, a table is provided with the estimated parameter values and their approximated standard errors (Fig. 2d). Based on the standard error (under the often-violated assumptions of normality, etc.), 95 % confidence intervals are constructed. To generate more accurate confidence intervals, bootstrapping techniques could be applied. Large confidence intervals might indicate an over-parameterisation of your model and might hamper the optimisation in finding a solution. Such a situation might occur when there is not enough information present in the experimental data regarding the aspect covered by that particular model parameter. For parameters estimated per grouping, the multiple copies of the parameter are labelled with an extension label of the corresponding group (H0_1, H0_2, H0_3, etc.). If parameters were fixed (not estimated), the corresponding fixed value is given without further details. In case of the Box-Cox transformation, the estimated value of the transformation parameter is presented as well (as BoxCox_XXX), labelled with the name of the variable to which the transformation was applied.


Fig. 2d The statistical output file Stats.html: Parameter estimates
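The linearised standard errors and confidence intervals described above can be reconstructed from the Jacobian of a least-squares fit. An illustrative sketch follows (not OptiPa's internal code; OptiPa's computation may differ in detail):

```python
import numpy as np
from scipy import stats

def param_confidence(J, residuals, conf=0.95):
    """Approximate standard errors and confidence-interval half-widths from
    the Jacobian of a least-squares fit (linearisation around the optimum)."""
    J = np.asarray(J, float)
    n, p = J.shape
    s2 = np.sum(np.asarray(residuals, float) ** 2) / (n - p)  # residual variance
    cov = s2 * np.linalg.inv(J.T @ J)                         # parameter covariance
    se = np.sqrt(np.diag(cov))
    t = stats.t.ppf(0.5 + conf / 2.0, n - p)                  # two-sided t quantile
    return se, t * se
```

The interval for parameter i is then estimate ± half-width[i].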

The correlation matrix (Fig. 2e) shows the correlations between all estimated parameters. High correlations might indicate an over-parameterisation of your model and might hamper the optimisation in finding a solution. In that case it might be better to fix some of these parameters at known literature values or just at some arbitrary values. Such a situation might occur when two parameters always appear together. Correlations higher than 0.9 or smaller than -0.9 are marked in red. When the correlation matrix becomes too large to display (more than 30 by 30 model parameters), a button is generated instead; the full correlation matrix is only shown on explicit request of the user.

Fig. 2e The statistical output file Stats.html: Correlation matrix
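The correlation matrix itself follows directly from the parameter covariance matrix. A small illustrative sketch (not OptiPa's code) that also flags the pairs with |correlation| above 0.9, as marked in red by OptiPa:

```python
import numpy as np

def correlation_matrix(cov, threshold=0.9):
    """Parameter correlation matrix from a covariance matrix, plus the index
    pairs whose |correlation| exceeds the threshold (marked red by OptiPa)."""
    cov = np.asarray(cov, float)
    se = np.sqrt(np.diag(cov))
    corr = cov / np.outer(se, se)
    i, j = np.where(np.triu(np.abs(corr) > threshold, k=1))
    return corr, list(zip(i.tolist(), j.tolist()))
```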


Using a 99 % confidence interval on the residuals (assuming a normal distribution) the outliers (points outside this 99% confidence interval) are identified for each of the dependent variables (Fig. 2f). Using the option Remove outliers from data these outliers can be removed during a subsequent optimisation step.

Fig. 2f The statistical output file Stats.html: Outliers
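The outlier rule amounts to flagging residuals outside the central 99 % interval of a normal distribution fitted to the residuals. A minimal illustration (an assumption of the exact mechanics, not OptiPa's internal code):

```python
import numpy as np
from scipy import stats

def flag_outliers(residuals, conf=0.99):
    """Indices of residuals outside the central `conf` interval of a normal
    distribution fitted to the residuals (cf. the outlier listing in Stats.html)."""
    r = np.asarray(residuals, float)
    z = stats.norm.ppf(0.5 + conf / 2.0)   # ~2.576 for a 99 % interval
    half = z * r.std(ddof=1)
    return np.where((r < r.mean() - half) | (r > r.mean() + half))[0]
```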

Finally the plot for the model fit is included as part of the statistical output (Fig. 2g).

Fig. 2g The statistical output file Stats.html: Graphical output

Important notes

The automatically generated output files will overwrite the files from a previous optimisation. So, to save old output, rename the files to prevent them from being replaced by newer ones.

When an optimisation routine different from LSQnonlinear is chosen, the correlation matrix and the parameters' standard deviations and confidence intervals are omitted, as these optimisation methods do not provide a Jacobian matrix from which to calculate such information.

Remain alert for any parameter that tends to go to extremely low values, approaching zero. This might indicate that the process involved is not relevant and can be discarded. It might also indicate that the sign of the parameter in the model definition should be swapped (in case of, for example, a depletion process that was accidentally modelled as an accumulation process). In such a case one might change the sign of the model parameter in the model definition file, to check whether the wrong assumption was made about the sign of that particular parameter.


Optimisation per experiment

When fitting a model to a multitude of experiments, model parameters can be estimated in common or following a certain grouping in the data (Fig. 1). In the most extreme case this could lead to the estimation of multiple copies of each model parameter, providing a single value for each experiment.

Fig. 1 Generating multiple parameter copies.

A relatively simple model with, for instance, 4 model parameters but applied to 20 subsequent experiments would thus result in an 80-dimensional parameter space. The optimisation in such a high-dimensional space requires a huge amount of computational time. In the end, when no parameter value is estimated in common between the 20 experiments, the 20x4 parameters will not show any correlation over the experiments. Only the 4 parameters within a single experiment will show some degree of correlation. As a consequence, the experiments might as well be analysed one by one. In practice, optimising the model 20 times, estimating 4 parameters per experiment, goes much faster than optimising the model at once on all 20 experiments, estimating all 80 parameter values together. For this reason the option was created to optimise the model per selected experiment.
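The idea can be sketched with a generic least-squares loop: fit the same model to each experiment independently and collect one parameter vector per experiment. The decay model and data below are purely illustrative (not an OptiPa model), as a hedged sketch of the principle:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_per_experiment(experiments, model, p0):
    """Fit the same model to each experiment independently; one parameter
    vector per experiment (cf. OptimisedPerExperiment.csv)."""
    estimates = []
    for t, y in experiments:
        res = least_squares(lambda p: model(p, t) - y, p0)
        estimates.append(res.x)
    return np.array(estimates)

# illustrative two-parameter decay model and noise-free synthetic data
model = lambda p, t: p[0] * np.exp(-p[1] * t)
t = np.linspace(0.0, 5.0, 20)
experiments = [(t, model([10.0, 0.5], t)), (t, model([8.0, 0.9], t))]
estimates = fit_per_experiment(experiments, model, p0=[5.0, 1.0])
```

Each row of `estimates` corresponds to one experiment, analogous to one line of OptimisedPerExperiment.csv.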


Optimisation per experiment: procedure

The optimisation per experiment can be prepared in the same way as described earlier (see Preparing the optimisation). A subset of experiments to be analysed can be selected. All parameters can now be estimated in common, as in the end each selected experiment will be optimised one by one anyway. Finally, you can activate the Optimise per Experiment functionality (Fig. 1) and hit the Start button.

Fig. 1 Start button panel to activate the different tasks.


Optimisation per experiment: output

Once started, no graphical output is generated, to save as much computational time as possible. Only a progress monitor will keep track of the progress. After each experiment a short parameter summary is generated on the log screen and saved to the Stats.html output file. Once finalised, OptiPa generates for each of the experiments time course plots of the combined experimental and simulated data and of the residuals (Fig. 1). As usual, one can toggle between the two plot types by clicking the individual graphs or by hitting the button Show residuals/simulations.

Fig. 1 Optimisation output showing the time course plots of the combined experimental and simulated data.

After the optimisation per experiment the following output files are created:

modeldata.csv

simulation.csv

OptimisedPerExperiment.csv

Stats.html

modelname.set - created in the project folder …\settings

All files are created in the model output folder …\models\OUTPUT[modelname] relative to the current project, with modelname the file name of the original model definition file (OMF-file) used, except for the settings file. The settings file contains the parameter estimates accumulated over all selected experiments. The same parameter copy values are stored in the interface, with each of the estimated parameters assigned using grouping per experiment (Fig. 2).


Fig. 2 Multiple parameter values assigned per experiment after a successful optimisation per experiment.

The modeldata.csv and simulation.csv files are identical to the files normally generated after a successful optimisation and contain the accumulated results from the subsequent model fits on the selected experiments. The OptimisedPerExperiment.csv file contains a listing of the parameter estimates, with each line referring to a single experiment and the subsequent columns containing the parameters estimated.

Fig. 3 Parameter estimates generated during the call Optimise per experiment.

The Stats.html file (Fig. 4) contains the statistical output. Afterwards, this file can be opened using your default web browser through the Display Stats File button from the main OptiPa window. The header of the file contains the standard information regarding the optimisation settings and the selected experiments. This is followed by short summary tables of the parameter estimates for each experiment optimised. Subsequently, the regression summary shows the sums of squares explained and not explained by the model, accumulated over all the analysed experiments. Note that the model fit in this example is based on 54 degrees of freedom; in spite of having only 3 model parameters estimated per experiment, overall the set of 18 experiments has consumed 18 x 3 = 54 degrees of freedom. Finally, several measures of goodness of fit are given, similar to the standard statistical output generated during a normal optimisation. The regression summary is only generated in terms of the original untransformed raw data. The final table accumulates the results over all experiments, providing their mode and mean, with their approximated standard errors. Based on the parameter estimates from the multiple experiments, 95 % confidence intervals are constructed. Besides these 95 % intervals, skewness and kurtosis are indicated. The confidence intervals are now based on the actual distribution obtained from the multiple experiments and therefore might no longer be symmetric. The correlation matrix shows the correlations between the estimated parameters.


Fig. 4 Statistical output after an Optimisation per experiment.


Important notes

The automatically generated output files will overwrite the files from a previous optimisation. So, to save old output, rename the files to prevent them from being replaced by newer ones.

The order of the runs processed on screen and in Stats.html might be mixed up, depending on the number of cores involved in the parallel computations. You will have to sort the output file yourself afterwards.

When an optimisation routine different from LSQnonlinear is chosen, the intermediate optimisation results per experiment are omitted, as these methods do not provide a Jacobian matrix from which to calculate the parameters' standard deviations.


Confidence regions

The parameter estimation accuracy of simultaneously estimated model parameters can be assessed by joint confidence regions, which are bounded by the following inequality:

SSE(p) ≤ SSE(p̂) · (1 + np/(nt−np) · F(1−α, np, nt−np)) (Eq. 1)

Here, SSE(p) represents the sum of squared errors for a parameter vector p, SSE(p̂) denotes the optimised least sum of squared errors, and F(1−α, np, nt−np) is the corresponding value of the classical F-distribution, with nt the number of data points, np the number of model parameters and 1−α the confidence level. If np>2, it is impossible to correctly visualise the np-dimensional hypervolume defined by Eq. (1). As an alternative, two approaches can be followed: (i) a projection on a two-parameter plane is made (over-estimation), or (ii) the cross-section of the hypervolume with a two-parameter plane centred around p̂ is made (under-estimation). The latter method, being computationally less intensive, is applied here. The joint confidence region is constructed by calculating the SSE for a grid of two-parameter combinations while keeping the other np−2 model parameters at their optimised (estimated) values (conditional joint confidence region). This requires a total of np(np−1)/2 two-parameter combinations to obtain the overall picture. For each two-parameter combination the model needs to be simulated multiple times to scan the SSE surface in that part of the parameter space. As such, this is a time-consuming exercise, typically requiring up to 500 model simulations per parameter combination.
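The SSE contour bounding the confidence region can be computed directly from Eq. (1). A small sketch of this classical F-based bound (verify the np and nt values against your own fit):

```python
from scipy import stats

def sse_threshold(sse_opt, n_params, n_points, alpha=0.05):
    """SSE contour bounding the joint confidence region (Eq. 1):
    SSE(p) <= SSE(p_hat) * (1 + np/(nt - np) * F(1 - alpha, np, nt - np))."""
    f = stats.f.ppf(1.0 - alpha, n_params, n_points - n_params)
    return sse_opt * (1.0 + n_params / (n_points - n_params) * f)
```

Parameter combinations whose SSE stays below this threshold lie inside the (1−α) confidence region.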

Important note

Be careful with including ill-defined model parameters showing extreme correlations with other parameters; their confidence regions might also remain relatively ill-defined.


Confidence regions: procedure

Once a model is implemented in OptiPa, experimental data is available and the model has been shown to give a proper fit to the data, conditional joint confidence regions can be calculated. It is suggested to first run a normal optimisation to make sure the model fits the data. The calculation of the joint confidence regions can then be initiated by selecting the parameters you want to be involved, in the same way you would (de)select parameters for a normal optimisation, followed by activating the ConfidenceRegions functionality (Fig. 1) and selecting the Start button.

Fig. 1 Start button panel to activate the different tasks.

After the initial optimisation the user is asked to define the scan range and the resolution of the scanning, both expressed in standard deviation units. When all parameters are uncorrelated and behave as normally distributed, a scan range of 3 should suffice to establish up to the 99 % confidence ellipses. Depending on the model and the behaviour of the parameters, larger ranges can be required. As a first approach one can use the same scan range for all parameters selected (provide a common value and press the Set in Common button). If a first calculation shows that one parameter has a more widely stretched confidence ellipse, one can take a wider range for that particular parameter by providing different values for the different parameters (select a parameter from the drop-down menu and type a value in the entry field behind it). In all cases the same scan resolution will be applied to all parameters. The scan step needs to be larger than 0.01 (= 1 % of the standard deviation) and smaller than the smallest scan range defined for any of the parameters. Once finished, press the OK button. Pressing the Cancel button will abort the calculations.

Fig. 2 Confidence dialog to define scan range, either in common or per parameter, and scan resolution.
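To illustrate how a scan range and scan step expressed in standard deviation units translate into actual parameter values, consider the following sketch (stdlib-only Python, not OptiPa's own code; the optimum and standard deviation are made-up numbers). It also mimics OptiPa's positive-parameter convention by dropping negative values:

```python
def scan_values(p_opt, p_sd, scan_range, scan_step):
    """Hypothetical sketch: parameter values visited when scanning
    p_opt +/- scan_range standard deviations in steps of scan_step
    (both expressed in standard-deviation units).  Negative values
    are dropped, mirroring the positive-parameter convention."""
    n = int(round(scan_range / scan_step))
    values = [p_opt + k * scan_step * p_sd for k in range(-n, n + 1)]
    return [v for v in values if v > 0]

# Example: optimum 2.0, sd 0.5, scanning 3 sd at a resolution of 0.5 sd
grid = scan_values(2.0, 0.5, scan_range=3, scan_step=0.5)
```

A finer `scan_step` simply produces a denser grid, which is why smaller steps give smoother ellipses at the cost of more simulations.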


With larger ranges it becomes more likely to include negative parameter values as well. However, as parameters in OptiPa are defined positive by default, the negative part will be ignored and the scanning is limited to the positive part only. If you find that the confidence ellipses are a bit rough, reduce the scan interval to lower values (e.g. 0.1 instead of 0.5). Of course, the smaller the step size selected, the more time it takes to complete the calculations.

Important notes

The initial optimisation run will always use the default least square non-linear optimisation.

For the sake of calculating the conditional joined confidence regions, standard deviations will ALWAYS be limited to a maximum of half the parameter value. So for well-defined parameters with a small standard deviation, scanning ranges will be determined directly from the real standard deviation. For ill-defined parameters with a standard deviation larger than half the parameter value, the scanning ranges will be determined based on n times half the parameter value. This is to prevent the system from scanning unnecessarily large ranges. If, after an initial run, one notices a real need to scan a much larger range for a particular parameter, this can be changed by entering a large value for the scan range of that particular parameter only.


Confidence regions: output

Once started, OptiPa will scan, one-by-one for every possible parameter combination, the parameter space surrounding the optimised parameter values, keeping all other parameters constant at their optimised values. For each combination OptiPa will first do a rough scan over the defined range of standard deviation units. Then it will focus on the confidence region between the 0.70 and 0.999 levels, where it fine-tunes to obtain more accurate results. The intermediate hyperbolas representing the surfaces for the sum of squared errors (SSE) are shown on the fly (Fig. 1). SSE values far outside the 0.999 range are omitted from the start, sometimes resulting in flat parts of the SSE surface.

Fig. 1 OptiPa calculating the conditional joined confidence regions.

Once finished, OptiPa generates a matrix plot showing for each parameter combination the calculated conditional joined confidence regions (Fig. 2).


Fig. 2 Calculated conditional joined confidence regions. The center points represent the optimised parameter combinations while the three ellipses, going from the inside to the outside, represent the 0.90, 0.95 and 0.99 confidence regions.

After the optimisation per experiment the following output files are created:

modeldata.csv

simulation.csv

CondJoinedConfReg.csv

Stats.html

All files are created in the model output folder …\models\OUTPUT[modelname] relative to the current project, with modelname the file name of the original model definition file (OMF-file) used. The modeldata.csv and simulation.csv files are identical to the files normally generated after a successful optimisation and contain the results from the initial model fit. This model fit is used as the starting point to calculate the conditional joined confidence intervals. The CondJoinedConfReg.csv file (Fig. 3) contains the full coordinates of the confidence ellipses shown in Fig. 2.
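Since the ellipse columns in CondJoinedConfReg.csv are of unequal length and padded with NaNs, post-processing the file outside OptiPa takes a little care. A minimal stdlib-only sketch of reading such NaN-padded columns (the column names and numbers below are a made-up miniature of the layout, not actual OptiPa output):

```python
import csv
import io

# Hypothetical miniature of the CondJoinedConfReg.csv layout: two
# coordinate columns per ellipse, shorter ellipses padded with NaN.
raw = """x90,y90,x95,y95
0.1,0.2,0.0,0.1
0.3,0.4,0.2,0.3
NaN,NaN,0.4,0.5
"""

def read_ellipses(text):
    """Return a dict column-name -> list of floats, NaN padding removed."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = {name: [] for name in header}
    for row in data:
        for name, cell in zip(header, row):
            if cell and cell.lower() != 'nan':
                cols[name].append(float(cell))
    return cols

ellipses = read_ellipses(raw)
```

The cleaned columns can then be plotted pairwise (x against y) to redraw the confidence ellipses in your own plotting tool.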


Fig. 3 Output saved in the CondJoinedConfReg.csv file after the successful calculation of the conditional confidence regions. The blue region contains the names and optimised values of one of the parameter combinations. The red regions contain the full coordinates of the three subsequent confidence ellipses for this parameter combination. Subsequent parameter combinations are covered one after the other. As not all ellipses contain the same number of points, the columns are of unequal length. The missing values are filled with NaNs.

The Stats.html file contains the standard statistical output generated during the initial optimisation run. In case ill-defined model parameters are included, one might end up with incomplete confidence ellipses (Fig. 4). Depending on the nature of this lack of definition, one can try to cover the full ellipse by scanning a larger parameter range, but this is more computationally intensive.


Fig. 4 Calculated conditional joined confidence regions. Example of an ill-defined parameter q0 (see table) which can take any positive value without resulting in different model fits. The information in the table was generated assuming symmetric normal parameter distributions. From the conditional joined confidence regions it becomes clear that these confidence intervals are anything but symmetric.

Important notes

The automatically generated output files will overwrite the files from a previous optimisation. So, to save old output, rename the files to prevent them from being replaced by newer ones.

Beware of artefacts. When the scanning grid is too coarse, the ellipses might break up into multiple separated fragments. Lower the scan step to increase the resolution and obtain better defined ellipses.

When the ellipses break up into multiple separated fragments, the output file CondJoinedConfReg.csv will become mixed up, as there will be more columns than expected. You will have to rearrange the columns manually (moving the 'extra' columns containing the fragments to the columns containing the rest of the ellipses) or simply recalculate the confidence ellipses using a wider scan range and/or a finer scan step.


Sensitivity analysis

Once a model is implemented in OptiPa, experimental data is available and the model has been shown to give a proper fit to the data, a sensitivity analysis can be done. OptiPa provides the option to perform a systematic one-parameter-at-a-time sensitivity analysis, changing each of the original optimised model parameters (Pi) by a fixed relative delta (ΔPi), resulting in simulations with each of the selected parameters taking the respective values of Pi ± ΔPi. Depending on the research question, the user can study the model output at specific time points or take the complete time line and express the predictions with the disturbed parameters relative to the prediction using the original optimised parameter values. This sensitivity analysis does not take into account the real variation (either biological or technical) observed for the various model parameters, as a certain relative disturbance is often chosen arbitrarily and the correlation structure between the parameters is not taken into account. A better approach would be to work with a Monte-Carlo simulation, whether or not in combination with bootstrapping.
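The one-parameter-at-a-time scheme can be sketched as follows (a stdlib-only Python illustration, not OptiPa's own code; the parameter names kref and Ea and their values are made up). Run 0 is the undisturbed reference, and each subsequent pair of runs takes one parameter at Pi ± ΔPi while all others stay at their optimum:

```python
def sensitivity_runs(params, delta):
    """One-parameter-at-a-time disturbance: run 0 is the reference;
    each further pair of runs takes one parameter at Pi*(1 + delta)
    and Pi*(1 - delta), the rest kept at their optimised values."""
    runs = [dict(params)]                      # run 0: reference run
    for name, value in params.items():
        for sign in (+1, -1):
            run = dict(params)
            run[name] = value * (1 + sign * delta)
            runs.append(run)
    return runs

# Hypothetical optimised parameters, disturbed by 5 %
runs = sensitivity_runs({'kref': 0.10, 'Ea': 60.0}, delta=0.05)
```

With p parameters this yields 2p + 1 runs, which matches the run numbering used in the sensitivity output files.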


Sensitivity analysis: procedure

The sensitivity analysis can be initiated by selecting the parameters you want to involve, the same way you would (de)select parameters for a normal optimisation, followed by activating the Sensitivity functionality (Fig. 1) and selecting the Start button.

Fig. 1 Start button panel to activate the different tasks.

OptiPa will run the sensitivity analysis based on a condition file. As a first step, the user will be asked to load a condition file containing settings for the experiment(s) to simulate (Fig. 2). Make sure this file contains at least experiment numbers and time spans for the experiments to be run. If multiple experiments are included (e.g. at different experimental conditions) they are assumed to cover the same timespan.

Fig. 2 Selecting a condition file.


Subsequently, one needs to define the relative magnitude of the disturbance, expressed as a fraction (>0, <1) of the original optimised parameter values.

Fig. 3 Sensitivity dialog.

Finally, the user will be asked to indicate the number of output points (Fig. 4). This is used to define the output resolution of the simulation and to guarantee output synchronised with regard to the time points for which output is generated. In addition, specific time points can be requested. If no continuous time line is needed, the first field can be left empty or set to 0. If no specific time points are targeted, the second field can be left empty as well. However, the minimum number of time points required to perform a simulation is 2. So, if no time line is requested AND only one fixed time point is defined, OptiPa will automatically add the first and last time points from the condition file as well. The same definition of timespan and fixed time points will be applied to each of the experiments present in the selected condition file.

Fig. 4 Sensitivity dialog.
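The output-time logic described above (an evenly spaced time line merged with any fixed time points, topped up with the span limits when fewer than two points remain) can be sketched as follows. This is a stdlib-only illustration under stated assumptions, not OptiPa's actual implementation:

```python
def output_times(n_points, fixed, t_span):
    """Sketch of the output-time logic: an evenly spaced time line of
    n_points over t_span = (t0, t1), merged with any requested fixed
    time points; if fewer than two points result, the span limits
    (here standing in for the condition-file limits) are added."""
    t0, t1 = t_span
    times = set(fixed or [])
    if n_points and n_points >= 2:
        step = (t1 - t0) / (n_points - 1)
        times.update(t0 + i * step for i in range(n_points))
    if len(times) < 2:             # a simulation needs at least 2 points
        times.update((t0, t1))
    return sorted(times)
```

For example, requesting no time line and a single fixed point at t = 7.5 over a 0–14 span yields the three points 0, 7.5 and 14, exactly as the text describes.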


Sensitivity analysis: output

After the sensitivity analysis only numerical output is generated, which can be found in the following output files:

SensitivityPar.csv

SensitivitySim.csv

All files are created in the model output folder …\models\OUTPUT[modelname] relative to the current project, with modelname the file name of the original model definition file (OMF-file) used. The SensitivityPar.csv output file provides the parameter values used for the subsequent sensitivity runs (Fig. 1). Run 0 is always based on the original initial values provided (and is referred to as the reference run), while for all subsequent runs one parameter was changed at a time (taking the respective values of Pi ± ΔPi) with all other parameters kept at their original starting value.

Fig. 1 Parameter values used for the subsequent sensitivity runs.

The SensitivitySim.csv output file provides the actual simulations for the subsequent sensitivity runs (Fig. 2). The first column relates to the runs mentioned in the SensitivityPar.csv output file. In this case output was requested for three selected experiments. The time points for which output was generated depend on what the user requested and are kept identical for all simulation runs.


Fig. 2 Simulation results of the subsequent sensitivity runs.

Important notes

The automatically generated output files will overwrite the files from a previous analysis. So, to save old output, rename the files to prevent them from being replaced by newer ones.

The order of the runs in the SensitivityPar.csv and SensitivitySim.csv output files might have been mixed up, depending on the number of cores involved during the parallel computations. You will have to sort the output files yourself afterwards.
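Restoring the run order can be scripted rather than done by hand. A stdlib-only sketch, assuming the first column of the output file holds the run number (the rows below are a made-up miniature, not actual OptiPa output):

```python
import csv
import io

# Hypothetical miniature of a SensitivitySim.csv whose rows came back
# out of order from the parallel workers; the first column is the run.
raw = """run,time,value
2,0,1.10
0,0,1.00
1,0,0.95
"""

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]
body.sort(key=lambda r: int(r[0]))    # restore run order 0, 1, 2, ...
```

The same one-liner sort applies to SensitivityPar.csv, so both files end up in matching run order again.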


Bootstrapping

Given the non-linearity of the models, confidence intervals for the model parameters can only be approximated using standard statistics. To acquire accurate estimates of the confidence intervals, bootstrap techniques are required; these have been implemented in OptiPa. They also allow the identification of asymmetric confidence intervals.

The bootstrap is a resampling method for statistical inference and is commonly used to estimate confidence intervals, but it can also be used for sensitivity studies. In practical application, the bootstrap means using some form of resampling with replacement from the actual data to generate a large number of bootstrap samples. The exact nature of the resampling strategy depends on the structure of the data. In the area of postharvest biology and technology, most data consist of time series in which the data at subsequent time steps are heavily correlated and not necessarily equally distributed. As a consequence, simple random resampling of the data with replacement is not appropriate, as this completely removes the original correlation between subsequent observations. Therefore the model-based error resampling bootstrap technique was implemented in OptiPa.

Assuming a valid model is available to describe the dependence structure of the sequential observations from the time series, this information can be used for the bootstrap. Using the model predictions, residuals are calculated for each of the observations. Then a bootstrap sample of residuals is drawn with replacement from the observed residuals. The final bootstrap sample of the observations is constructed by adding the randomly sampled residuals to the predicted model values.
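The model-based error resampling described above can be sketched in a few lines (stdlib-only Python, not OptiPa's own code; the predicted and observed values are made-up illustrative numbers, and homoscedastic residuals are assumed):

```python
import random

def residual_bootstrap(predicted, observed, n_samples, seed=0):
    """Model-based error resampling bootstrap: residuals are drawn
    with replacement and added back onto the model predictions to
    form each synthetic bootstrap sample of the observations."""
    rng = random.Random(seed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    samples = []
    for _ in range(n_samples):
        drawn = rng.choices(residuals, k=len(predicted))
        samples.append([p + e for p, e in zip(predicted, drawn)])
    return samples

# Hypothetical model predictions and matching observations
pred = [1.0, 2.0, 3.0, 4.0]
obs  = [1.1, 1.9, 3.2, 3.8]
samples = residual_bootstrap(pred, obs, n_samples=1000)
```

Each synthetic dataset would then be refitted with the model, yielding one bootstrap estimate per parameter per sample.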

Important note

One of the prerequisites for this approach is that the residuals are homoscedastic. In the case of heteroscedastic variation, resampling the residuals of the raw data would completely distort the data structure and result in non-representative bootstrap samples. Bootstrapping should therefore be used in combination with a Box-Cox transformation to correct for heteroscedasticity.


Bootstrapping: procedure

Once a model is implemented in OptiPa and experimental data is available, a bootstrap analysis can be performed. It is suggested to first run a normal optimisation to make sure the model fits the data. The bootstrap analysis can then be initiated by selecting the parameters you want to involve in the bootstrap run, the same way you would (de)select parameters for a normal optimisation, followed by activating the BootStrap functionality (Fig. 1) and selecting the Start button.

Fig. 1 Start button panel to activate the different tasks.

Subsequently OptiPa will ask for the number of bootstrap runs, which is by default set to 1000 (Fig. 2).

Fig. 2 BootStrap dialog.

After an initial optimisation run to fit the model to the selected data, residuals are calculated. Subsequently, by combining randomly resampled residuals with the predicted model values, the requested number of bootstrap samples is generated, and the parameters selected for the bootstrap analysis are estimated for each of the generated bootstrap samples, keeping the remaining model parameters fixed. Afterwards, the 95 % confidence intervals of the bootstrapped model parameters are computed simply by sorting the estimates in ascending order and selecting the values that cut off the upper and lower 2.5 percentiles.
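The percentile interval described here amounts to sorting and slicing. A minimal sketch (stdlib-only Python, not OptiPa's own code; the estimates below are a made-up sequence so the cut-offs are easy to check):

```python
def percentile_ci(estimates, level=0.95):
    """Percentile confidence interval: sort the bootstrap estimates
    and cut off the outer (1 - level)/2 tails on each side."""
    s = sorted(estimates)
    n = len(s)
    lo = int((1 - level) / 2 * n)    # index of the lower cut-off
    hi = n - 1 - lo                  # index of the upper cut-off
    return s[lo], s[hi]

# Hypothetical bootstrap estimates 1 .. 1000 (one per bootstrap run)
ci = percentile_ci(list(range(1, 1001)))
```

Because the interval follows the empirical distribution of the estimates, it is naturally asymmetric whenever that distribution is skewed.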

Important notes

If the Box-Cox transformation is applied to correct for heteroscedastic variation, the Box-Cox parameter is optimised during the initial optimisation run. During the subsequent bootstrap runs this parameter is kept at its initially estimated value.

The initial optimisation run and the subsequent bootstrap runs will always use the default least square non-linear optimisation.


Bootstrapping: output

The estimated bootstrap parameters are graphically presented using a matrix plot (Fig. 1), visualising both the distributions and the correlation structure of the bootstrapped parameter values.

Fig. 1 Correlation structure and frequency distributions of the generated bootstrap data.

During the bootstrap analysis the following output files are created:

modeldata.csv

simulation.csv

bootstrap.bst

bootstrap.csv

Stats.html

All files are created in the model output folder …\models\OUTPUT[modelname] relative to the current project, with modelname the file name of the original model definition file (OMF-file) used. The modeldata.csv and simulation.csv files are identical to the files normally generated after a successful optimisation and contain the results from the initial model fit. This model fit is used as the base to generate all subsequent bootstrap datasets. The bootstrap.bst file contains all settings of this initial optimisation and is required if these bootstrap results are subsequently used for Monte-Carlo simulations. The file is written using the internal MatLab format. The bootstrap.csv file contains the actual parameter estimates obtained during the various bootstrap runs (Fig. 2).


Fig. 2 Bootstrap parameter estimates, with each line containing a single set of parameters obtained from a single bootstrap dataset.

The Stats.html file contains the standard statistical output generated during the initial optimisation run. In addition it contains output on the bootstrap parameters as shown in Fig. 3. This file can also be opened with your favourite text editor through the Display Stats File button in the main OptiPa window.


Fig. 3 Bootstrap statistical output.

The header contains basic information regarding the number of successful bootstrap runs. The table provides the estimated parameter values, their mode and mean, with their approximated standard errors. Based on the actual bootstrap parameter estimates, 95 % confidence intervals are constructed. Besides these 95 % intervals, skewness and kurtosis are indicated. Skewness is a parameter that describes asymmetry in a random variable's probability distribution, with one tail drawn out more than the other. A positive value indicates skewness to the right, while a negative value indicates skewness to the left.

Kurtosis is a parameter that describes the shape of a random variable's probability distribution, indicating its peakedness. A positive value indicates a peaked distribution, while a negative value indicates a broad, flat distribution.

The confidence intervals are now based on the actual distribution of the bootstrap parameters and therefore might no longer be symmetric. The correlation matrix shows the correlations between the estimated bootstrap parameters.
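The moment-based definitions of skewness and (excess) kurtosis used in this output can be sketched as follows (a stdlib-only illustration of the standard definitions, not OptiPa's own code; the sample data are made up so the moments come out to round numbers). A normal distribution scores 0 on both measures:

```python
import math

def skewness_kurtosis(xs):
    """Skewness is the standardised third central moment; kurtosis is
    the standardised fourth central moment minus 3 (excess kurtosis),
    so a normal distribution scores 0 for both."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sd = math.sqrt(m2)
    return m3 / sd ** 3, m4 / m2 ** 2 - 3

# Made-up right-skewed sample: four identical values and one outlier
skew, kurt = skewness_kurtosis([1.0, 1.0, 1.0, 1.0, 5.0])
```

The positive skewness here reflects the single tail drawn out to the right, matching the interpretation given in the text.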

Important notes

Subsequent bootstrap runs will automatically overwrite existing bootstrap output files. To prevent this, rename any existing bootstrap files before starting a new run.

To keep the bootstrap results available for subsequent Monte-Carlo simulations, the output files bootstrap.bst and bootstrap.csv should keep matching names, with their file extensions intact.


Monte-Carlo simulations

Monte-Carlo simulation is a numerical stochastic technique used to solve mathematical problems. A Monte-Carlo simulation is based on some model system that can be described as a function of random model parameters characterised by their probability distribution functions. The Monte-Carlo simulation simulates the model system after random sampling from these probability distribution functions. Monte-Carlo methods have been used since the late 1940s, but only since the availability of large computational power has the technique gained the status of a numerical method capable of addressing large, complex applications. The technique is useful to obtain numerical solutions to problems that are too complicated to solve analytically.

One way is to use the confidence intervals generated during an initial optimisation of the model. This option might, however, be less accurate depending on how heavily the assumptions on normality are violated. Therefore it is better to first do bootstrapping to generate accurate confidence intervals. Once accurate parameter confidence intervals have been established using bootstrap techniques, Monte-Carlo simulations can be performed to study model behaviour at different conditions, taking into account the parameters' inaccuracies. This can be done either using the parameter values estimated during the bootstrap analyses or by generating new random sets of parameters taking into account the correlation structure and distributions of the parameters identified in the bootstrap analyses.

Gaussian random parameter sets can easily be generated using the covariance decomposition algorithm. However, this technique is only applicable to generate co-varying Gaussian random parameter sets. As could already be seen in the previous section, model parameters are often not normally distributed and can show either skewness or kurtosis. The only way to deal with this is to find a transformation of the standard normal distribution matching the observed parameter distributions, so that the observed parameter distributions can be reshaped into standard normal distributions. If such a transformation is available, the covariance decomposition algorithm can be applied in the Gaussian normal parameter space, with the resulting parameter values being back-transformed to the original non-normal parameter space. In OptiPa a technique has been implemented to generate random correlated sets of parameters for a large family of normal-based, skewed and peaked distributions, based on a so-called SKN distribution.
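The covariance decomposition idea for the Gaussian case can be sketched for two parameters (stdlib-only Python, not OptiPa's own code; the means and covariance matrix are made-up numbers, and the SKN back-transformation mentioned above is not reproduced here). The covariance matrix is Cholesky-factored and independent standard normals are transformed into correlated draws:

```python
import math
import random

def correlated_normals(mean, cov, n, seed=0):
    """Generate n correlated Gaussian parameter pairs: factor the 2x2
    covariance matrix as cov = L L^T and map independent standard
    normals z through x = mean + L z."""
    rng = random.Random(seed)
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(a)                 # Cholesky factor of [[a, b],
    l21 = b / l11                      #                     [b, c]]
    l22 = math.sqrt(c - l21 ** 2)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        draws.append((mean[0] + l11 * z1,
                      mean[1] + l21 * z1 + l22 * z2))
    return draws

# Hypothetical parameter means and covariance (sds 0.2 and 0.3, corr 0.3)
draws = correlated_normals([1.0, 2.0], [[0.04, 0.018], [0.018, 0.09]], 5000)
```

For the non-Gaussian case, the same decomposition would be applied in the transformed (standard normal) parameter space and the draws back-transformed afterwards, as the text explains.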

Important notes

Monte-Carlo simulations often fail because negative parameter values are generated to perform the simulations, resulting in nonsense output. For this reason, preferably start from a bootstrap analysis, as this generates a more realistic range of parameter values (all being non-negative).

Most likely, also in case of a failure, the first two output files MCarloPar.csv and MCarloSim.csv will have been correctly generated; generally it is the subsequent processing that falls over. In the case of a failure you can check whether it might have been due to negative parameter values by inspecting the output file MCarloPar.csv. If you find unacceptable parameter combinations, note down their run numbers and remove the corresponding simulations from the MCarloSim.csv file. After this manual editing of the output files, start the Monte-Carlo procedure again. OptiPa will recognise the presence of the existing output files and will ask whether to process them again for you. After removing the erroneous simulations, this should result in the remaining output files MCarloSim95Conf.csv and MCarloSimFreq.csv. Please note that you have manually distorted the generated distribution of the model parameters. This might be acceptable for the accidental negative value, but not for a distribution that has a serious tail in the negative range.

Even when starting from a bootstrap analysis, due to the random generation of new parameters similar to the bootstrap sample, one might still end up with accidental negative parameter values, causing the Monte-Carlo analysis to fail. One workaround is not to generate new parameter values but to randomly draw existing parameter value combinations from the bootstrap parameter set.

When starting from an initial optimisation under the assumption of normality, always double check that the (symmetric) 95 % confidence intervals do NOT enclose zero. Any parameter confidence interval that does enclose zero will guarantee that your Monte-Carlo simulation fails. So, only select well-defined parameters.

Another generic workaround to get rid of negative parameter values (if you feel it appropriate to do so) is to adapt your model definition file and, instead of using some parameter kstc that every now and then becomes negative, use abs(kstc). In this way such parameter values will automatically be turned positive. Please note that you are now manually distorting the generated distribution of the model parameters. This might be acceptable for the accidental negative value, but not for a distribution that has a serious tail in the negative range.

So far we have not included a generic system to cope with negative parameter values, as this is too model specific and needs to be solved by the user.


Monte-Carlo simulations: procedure

Once a model is implemented in OptiPa and experimental data is available, Monte-Carlo simulations can be performed. The Monte-Carlo simulations can be initiated by activating the MonteCarlo functionality (Fig. 1) and selecting the Start button.

Fig. 1 Start button panel to activate the different tasks.

As a first step, the user is asked to decide on the type of Monte-Carlo simulation (Fig. 2). One way is to use the confidence intervals generated during an initial optimisation of the model. This option might, however, be less accurate depending on how heavily the assumptions on normality are violated. A more accurate way is to base the Monte-Carlo simulation on the bootstrap-based parameter distributions.

Fig. 2 Selecting the type of Monte-Carlo simulation to be done.

In case the user selected the bootstrap option, the user is asked to load a previously saved bootstrap.bst file containing the results from a previous bootstrap analysis (Fig. 3). Based on the selected file, the model and all its settings will be loaded.


Fig. 3 Selecting a previous *.bst file.

In case the user requested Monte-Carlo simulations based on an initial model optimisation, this initial optimisation is started. The parameters selected for optimisation are also the parameters that will be included as stochastic parameters in the subsequent Monte-Carlo simulations. Subsequently, for both types of Monte-Carlo simulations, the user will be asked to load a condition file containing settings for the experiment(s) to simulate (Fig. 4). Make sure this file contains at least experiment numbers and time spans for the experiments to be run.

Fig. 4 Selecting a condition file.

Finally, OptiPa will ask for the number of Monte-Carlo runs, which is by default set to 1000 (Fig. 5).


Fig. 5 Monte-Carlo dialog.

The Monte-Carlo procedure will start by fitting the SKN distribution to the bootstrap parameters, using the Cholesky decomposition to generate random correlated sets of parameters. To double check whether the generated Monte-Carlo parameters correspond to the original bootstrap parameters, some basic statistics are given for each of the parameters, comparing their mean, standard deviation and levels of skewness and kurtosis (Fig. 6).

Fig. 6 Output concerning the distributions of the generated Monte-Carlo parameters as compared to the distributions of the original bootstrap parameters, comparing them in terms of mean, standard deviation and levels of skewness and kurtosis, and testing the agreement in distributions through the non-parametric Kolmogorov-Smirnov two-sample test.

The generated Monte-Carlo parameters are furthermore represented using a matrix plot (Fig. 7).


Fig. 7 Covariance structure and frequency distributions of the original bootstrap data (blue data points), the fitted SKN distributions (blue lines) and 1000 newly generated random parameter sets (red data).

The user is given the options (Fig. 8) to either:

KEEP the newly generated parameter sets,

GENERATE a new set of Monte-Carlo parameters (of a different size),

draw existing parameter combinations from the original BOOTSTRAP data set (this option is not included when the user requested Monte-Carlo based on an initial optimisation), or

CANCEL the whole Monte-Carlo procedure.

After selecting one of these options the Monte-Carlo procedure continues accordingly.

Fig. 8 Monte-Carlo dialog.

Finally, the user will be asked to indicate the number of output points (Fig. 9). This is used to define the output resolution of the simulation and to guarantee output synchronised with regard to the time points for which output is generated. This also determines the time resolution of the generated frequency distributions.


Fig. 9 Monte-Carlo dialog.

Important note

The initial optimisation run will always use the default least square non-linear optimisation.


Monte-Carlo simulations: output

Once the final Monte-Carlo parameters have been accepted by the user, they are saved in a file named MCarloPar.csv (Fig. 1). The first column contains the run number, which coincides with the simulation output in MCarloSim.csv (Fig. 4).

Fig. 1 Monte-Carlo parameter file.

When the actual Monte-Carlo simulation runs are finished, the user can select one or more of the model output variables (Fig. 2) to be plotted, showing the whole distribution of simulations generated by the Monte-Carlo analysis. The user can select more than one entry by holding the left Ctrl key while selecting additional entries.

Fig. 2 Selecting plot variables.


Finally, OptiPa will generate graphical output for each variable-experiment combination individually (Fig. 3).

Fig. 3 Graphing the results from the Monte-Carlo simulation, with the colour scale representing the frequency distribution of the modelled response variable (Y-axis) as a function of time (X-axis). The white lines represent the 50 % and 95 % confidence intervals, while the blue line represents the mode of the model simulation.

The data from the probability density functions shown in the graphical output of Fig. 3 are saved in a file named MCarloSimFreq.csv. The structure of this file can be schematically represented by Fig. 4. The first two cells are not used and contain a "-1". The rest of the first row contains the time points at which the Monte-Carlo simulations were sampled. This row is valid for the whole file. Data from the various graphs generated are indicated by a plot_ID. At each time point, frequencies are counted for the concerning model output variable covering their full range of observed y-values as indicated in the second column for each of the plots. The observed frequencies are given per plot as one big matrix of values. One could reproduce the plots from Fig. 3 using the data from MCarloSimFreq.csv by creating a surface plot with time points being the X-values, y-values being the Y-values, and observed frequencies being the Z-values defining the height of the surface.
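Under the layout just described, that surface plot can be reconstructed with a few lines of Python. The following is only a sketch using a made-up miniature stand-in for MCarloSimFreq.csv; the real file is much larger, but the column interpretation follows the schematic of Fig. 4.

```python
import io
import numpy as np

# Toy stand-in for MCarloSimFreq.csv: the first row holds -1, -1 and the
# common time points; every following row holds a plot_ID, the y-value of
# one frequency bin, and the observed frequencies per time point.
# (All numbers below are invented for illustration.)
toy_csv = """-1,-1,0,5,10
1,0.0,3,1,0
1,0.5,7,4,2
1,1.0,2,6,9
2,0.0,5,5,5
2,0.5,1,2,3
"""

data = np.loadtxt(io.StringIO(toy_csv), delimiter=",")

def plot_arrays(data, plot_id):
    """Return (time, y_values, frequencies) for one plot_ID, ready for
    a surface plot such as matplotlib's pcolormesh(time, y, freq)."""
    time_points = data[0, 2:]                  # valid for the whole file
    rows = data[1:][data[1:, 0] == plot_id]
    return time_points, rows[:, 1], rows[:, 2:]

t, y, f = plot_arrays(data, 1)
print(t, y, f.shape)
```

With matplotlib, `plt.pcolormesh(t, y, f)` would then approximate one panel of Fig. 3.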

Fig. 4 Graphical representation of the numerical results from the Monte-Carlo simulation saved in the file MCarloSimFreq.csv.

The 50 % and 95 % confidence intervals of the simulation are stored in a separate file named MCarloSim95Conf.csv (Fig. 5). In this file, the data is again organised per generated plot, as indicated by the plot_ID.

Fig. 5 Numerical results from the Monte-Carlo simulation saved in the file MCarloSim95Conf.csv. This file contains the upper and lower 50 % and 95 % confidence intervals of the simulations, including the mean, median and mode.

The raw numerical outcome of the Monte-Carlo simulations is automatically saved in a file named: MCarloSim.csv (Fig. 6), which can be used for further data analyses. Note that this file quickly grows extremely large and eventually might no longer fit into an Excel spreadsheet. Asking for 1000 Monte-Carlo runs × 100 time points per simulation × 10 experiments already generates a file of 1 000 000 rows. As Excel can handle at most 1 048 576 rows, you would be approaching its technical limits. In such a case you might have to write your own software to read this output file for additional data analyses.
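One option for files beyond the Excel limit is to process MCarloSim.csv in chunks instead of loading it whole. A hedged Python/pandas sketch (the column names below are illustrative; check the header of your own output file):

```python
import io
import pandas as pd

# MCarloSim.csv can outgrow Excel's 1,048,576-row limit; reading it in
# chunks keeps memory use flat. The column names below are illustrative
# only -- check the header of your own output file.
big_csv = io.StringIO(
    "run,time,experiment,y\n"
    + "\n".join(f"{r},{t},1,0.0" for r in range(3) for t in range(4))
)

n_rows = 0
for chunk in pd.read_csv(big_csv, chunksize=5):   # 5 rows at a time
    n_rows += len(chunk)                          # or aggregate per chunk here

print(n_rows)  # 12
```

In a real analysis one would replace the row counter with whatever per-chunk aggregation (means, quantiles, filtering) is needed.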

Fig. 6 Numerical results from the Monte-Carlo simulation saved in the file MCarloSim.csv.

Important notes

The automatically generated output files will overwrite those from a previous Monte-Carlo simulation. To keep old output, rename the files before starting a new run.

The order of the runs in the MCarloSim.csv output file might be mixed up, depending on the number of cores involved in the parallel computations. You will have to sort the output file yourself afterwards.
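A possible way to restore the run order with Python/pandas (the column names here are assumptions; use the names found in your own file's header):

```python
import io
import pandas as pd

# After parallel execution the runs in MCarloSim.csv may be out of order;
# a sort on the run-number column restores it. Column names are assumed
# here -- substitute the names from your own header.
csv_text = "run,time,y\n3,0,0.1\n1,0,0.2\n2,0,0.3\n1,5,0.4\n"
df = pd.read_csv(io.StringIO(csv_text))
df_sorted = df.sort_values(["run", "time"]).reset_index(drop=True)
print(df_sorted["run"].tolist())  # [1, 1, 2, 3]
```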

Draw from distributions Gaussian random parameter sets can easily be generated using the covariance decomposition algorithm. However, this technique only applies to co-varying Gaussian random parameter sets. As seen in the previous section, model parameters are often not normally distributed and can show skewness or kurtosis. The only way to deal with this is to find a transformation that reshapes the observed parameter distributions into standard normal distributions. If such a transformation is available, the covariance decomposition algorithm can be applied in the Gaussian parameter space, with the resulting parameter values being back-transformed to the original non-normal parameter space. In OptiPa a technique has been implemented to generate random correlated sets of parameters for a large family of normal-based, skewed and peaked distributions, the so-called SKN distribution.

Draw from distributions: procedure Given a reference set of random co-varying variables, OptiPa can be used to generate new sets of values with the same properties as the original set of variables. This action can be initiated by activating the Draw from distributions functionality (Fig. 1) and selecting the Start button.

Fig. 1 Start button panel to activate the different tasks.

The only input needed is a file, located in the model output folder, containing the variables in columns with the variable names in the first row (Fig. 2). This file layout is similar to that of the bootstrap parameter output file.

Fig. 2 Example file containing co-varying random variables.

The user will be asked to select the required input file containing a sample of co-varying variables to be mimicked (Fig. 3).

Fig. 3 Selecting an existing file containing the co-varying variables

OptiPa will ask for the number of value combinations to be generated, which by default is set to 1000 sets (Fig. 4).

Fig. 4 Drawing from distributions dialog

The Draw from distributions procedure starts by fitting the SKN distribution to the variables and uses the Cholesky decomposition to generate random correlated sets of variables. To double-check whether the generated variables correspond to the original ones, some basic statistics are given for each of the parameters, comparing their mean, standard deviation and levels of skewness and kurtosis (Fig. 5).

Fig. 5 Output comparing the distributions of the generated variables to those of the original variables in terms of mean, standard deviation and the levels of skewness and kurtosis, and testing the agreement between distributions through the non-parametric Kolmogorov-Smirnov two-sample test.
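The same comparison can be reproduced outside OptiPa. A Python sketch with stand-in data (in practice "original" would be the bootstrap parameter sample and "generated" the SKN-generated one; scipy provides the skewness, kurtosis and two-sample KS routines):

```python
import numpy as np
from scipy import stats

# Stand-in data for illustration only.
rng = np.random.default_rng(0)
original = rng.normal(2.0, 0.5, size=2000)
generated = rng.normal(2.0, 0.5, size=2000)

# The summary statistics reported by OptiPa: mean, standard deviation,
# skewness and kurtosis ...
for name, x in (("original", original), ("generated", generated)):
    print(name, x.mean(), x.std(ddof=1), stats.skew(x), stats.kurtosis(x))

# ... plus the non-parametric two-sample Kolmogorov-Smirnov test.
ks_stat, p_value = stats.ks_2samp(original, generated)
print(ks_stat, p_value)
```

A large p-value means the test finds no evidence that the two samples come from different distributions.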

The generated variables are furthermore represented using a matrix plot (Fig. 6).

Fig. 6 Covariance structure and frequency distributions of the original data (blue data points), the fitted SKN distributions (blue lines) and of 1000 newly generated random sets (red data).

The user is given the option (Fig. 7) to either:

keep the generated variable sets,

generate a new set of co-varying variables (of a different size).

Fig. 7 Drawing from distributions dialog

Draw from distributions: output The final variable sets accepted by the user are saved in a file named: SKNparameters.csv, with a layout similar to that of the input file (Fig. 1).

Fig. 1 SKN parameter file.

Important note

The automatically generated output files will overwrite those from a previous Draw from distributions run. To keep old output, rename the files before starting a new run.

Theoretical background

Levenberg-Marquardt method Marquardt has put forth an elegant method, related to an earlier suggestion of Levenberg, for varying smoothly between the extremes of the inverse-Hessian method and the steepest descent method. The latter method is used far from the minimum, switching continuously to the former as the minimum is approached. This Levenberg-Marquardt method works very well in practice and has become the standard of nonlinear least-squares routines. Thus the method works well (and fast) if the fit-statistic surface is well behaved; however, there is no guarantee it will find the global fit-statistic minimum.
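The core of the method fits in a few lines. Below is a toy Python sketch for fitting y = a·exp(b·x), not OptiPa's actual implementation (OptiPa relies on MatLab's built-in least-squares routines); the damping parameter lam is what blends the two extremes:

```python
import numpy as np

# Toy Levenberg-Marquardt iteration for y = a * exp(b * x): small lam
# approaches Gauss-Newton (inverse Hessian), large lam approaches
# steepest descent. Illustrative only.
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(-1.5 * x)                     # noise-free synthetic data

def residuals(p):
    a, b = p
    return y - a * np.exp(b * x)

def jacobian(p):
    a, b = p
    e = np.exp(b * x)
    return np.column_stack([-e, -a * x * e])   # d(residual)/d(a, b)

p, lam = np.array([1.0, 0.0]), 1e-3
for _ in range(100):
    r, J = residuals(p), jacobian(p)
    A = J.T @ J
    step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -J.T @ r)
    if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam / 10            # accept: toward Gauss-Newton
    else:
        lam *= 10                              # reject: toward steepest descent

print(p)  # converges to a = 2.0, b = -1.5
```

Accepted steps shrink lam (trusting the local quadratic model); rejected steps grow it (shortening the step and rotating it toward the gradient), which is exactly the smooth interpolation the text describes.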

A Sherlock Holmes approach… Levenberg-Marquardt Methods and Nonlinear Estimation A. Conan Doyle might have called it "The Case of the Disappearing Duplicate Diagonal Dominator": Freshly returned from a cold walk in Regent's Park, Sherlock Holmes and Dr. Watson were warming themselves before the fire at Baker Street when Holmes announced, "Dr. Watson, I am sure you are familiar with the widely used scheme of Dr. Donald Marquardt for robust nonlinear least-squares estimation." Watson shook himself awake from a doze and returned a look of sleepy interest. Holmes continued, "Now I have lately received a letter inquiring about the fate of an earlier promulgation of quite similar ideas. These too recommended the addition of a positive term to the diagonal of the matrix obtained from the Jacobian. Should we not use our deductive powers to ascertain why this earlier foray attracted so little attention while Dr. Marquardt's later work was so remarkably influential?" As he reached from his wing chair to accept the letter proffered by Holmes, Watson mumbled, "Why, yes, yes, of course. We should investigate." After a moment's perusal of the letter, he continued in more certain tones. "This letter refers to the work of Kenneth Levenberg whilst he was employed at the Americans' Frankford Army Arsenal. It cites a 1944 paper in their Quarterly of Applied Mathematics. That was nearly twenty years before Marquardt's paper appeared--in '63, in the Journal of the Society for Industrial and Applied Mathematics, if memory serves--and the earlier work went largely unnoticed until Marquardt mentioned it again. "Now wasn't Marquardt employed by that chemical company with the French-sounding name--DuPont, I think? Possibly a connection with the Jacobins? Definitely a case of a disappearing duplicate diagonal dominator, I should say." "Yes, yes, Dr. Watson," Holmes agreed. "My belief is that the real clues to the neglect of Levenberg's work and the success of Marquardt's are several.
We should study Marquardt's Fortran code, his attention to a certain angle, and perhaps a subtle difference between the two approaches that only the power of modern computation can reveal." "Certainly," Watson answered, "you know best. But perhaps you could first tell me a bit about the importance of nonlinear estimation. Why all the fuss? And I would like to warm my toes a bit more before we venture out again!" If he had stayed awake while his feet warmed up, Dr. Watson would have learned that nonlinear estimation is a ubiquitous tool of modern technology and that the Levenberg-Marquardt method is the breakthrough that accounts for much of its commonplace use.

Nonlinear Estimation Estimation is the process of fitting a mathematical model to experimental data to determine unknown parameters in the model. The parameters are chosen so that the output of the model is the best match in some sense to the observed data. This best fit idea is a bit like observers on a beach deciding whether a distant ship on the horizon is a luxury liner or a freighter--the color of the hull, the presence or absence of rigging, and other features of the observed ship are compared with those of known types of ships until the best match is found. The estimation process is often nonlinear because the observed data do not vary in direct proportion to the parameters in question. Having found accurate values of the estimated parameters, investigators can differentiate among superficially similar situations, or they can make accurate predictions of behavior from the underlying model. These outcomes are an integral part of plant design, process refinement, drug development, quality control, and many other aspects of modern industrial practice.

Processes and Products Joe Skwish, a senior consultant at DuPont, describes a canonical problem of the chemical industry that

hinges on nonlinear estimation. "In chemical reaction rate studies, exponential rate coefficients determine the rate of transition from one chemical product to another. Typically, there is a multiple reaction chain. The sizing and construction of chemical plants depend upon accurate values of those reaction rates. They are determined by using nonlinear estimation to fit the exponential rate coefficients to laboratory data." Skwish explains that the nonlinearity of most scientific models arises from "powers, exponentials, and ratios, the same kinds of terms that scientists encounter in the course of their educations. They emulate those expressions when they formulate models of their own." As a specific application of nonlinear estimation, Skwish mentions the design of a waste treatment plant that required the use of mathematical models to characterize the efficiency and thus to determine the size of continuous stirred tank reactors. "Rate constants for these tanks--the coefficients for a model--were estimated from a pilot plant," Skwish explains. "Then data from actual waste streams were used as input to the model to determine the final design. The accuracy of those coefficients was verified by comparison with the actual coefficients measured from the final, full-scale plant." In an earlier position with Eastman Kodak, Skwish had used nonlinear estimation to predict the properties of photographic paper from an examination of the pulp from which it would be produced. By estimating the parameters in an appropriate model, paper chemists could predict the tear and puncture strength of the final product from such pulp characteristics as fiber length and time required to pass through a filter.

Exploring and Evaluating Multidrug Treatment Strategies James Minor, an independent consultant with experience in both the chemical and the pharmaceutical industries, points to the Levenberg-Marquardt method as "one of the most robust estimation methods of all those that are available, especially for industrial problems." He has found the method especially useful for accelerating the training of the neural networks he uses to evaluate different multidrug treatment strategies. "These multidrug treatments are of increasing interest to both the FDA and the drug industry," he explains. "In the past, just one drug was tested, and the remaining components in the blood stream--nutrients, other drugs, and so on--were considered a nuisance. Now biochemists want to take advantage of those interactions to treat complex viruses like HIV. The attitude is, `Why limit the treatment to a single drug? Why handicap yourself when you can evaluate and explore different multidrug treatments?'" Seth Michelson, research section leader in the Department of Biomathematics at Syntex Drug Discovery, elaborates on the impact of drug-drug combination therapies. "The question is, What is the effect of the second drug? Is it simply additive? Or is the effect of the second drug compounded, whether synergistically or antagonistically? "For example, HIV is accompanied by a co-virus, cytomegalovirus (CMV), that causes blindness. The drug AZT treats HIV but not CMV. Can a second drug counteract the threat of CMV without interfering with AZT's primary effect? Doctors must know when they can mix and match drugs, and virologists need to narrow down the number of pathways that are interacting (when they are developing therapies)." The neural net approach based on nonlinear estimation determines the combinations of drugs in which interactions occur and reveals whether those interactions are synergistic or antagonistic. 
"This information," Michelson suggests, "builds intuition for the virologists so they can formulate and test a more limited range of hypotheses about interaction pathways." Minor says simply, "It is the best tool for postulating new chemical pathways."

Quality Control Charles Pfeifer, consultant manager for quality management and technology at DuPont, describes the importance of nonlinear estimation from another angle: "The model-based process control approach uses first principles chemical-physical models or empirically derived models. Such inferential control approaches traditionally focus on determining set points for purposes of process regulation, stabilization, or obtaining complete chemical reactions. The assumption is that controlling for those in-process set points leads to good product. "But engineering or automatic process control doesn't usually relate those set points to finished product properties important to the customer. From a quality control perspective, the challenge is to estimate process parameters and, thus, process operating windows, to ensure consistent on-aim properties while satisfying traditional goals. Nonlinear estimation often is required in these kinds of models."

Marquardt's Problems Donald Marquardt was motivated to develop his nonlinear estimation technique by problems in this same mold. One such problem involved fitting the Van Laer model of vapor pressure versus temperature to determine the behavior of systems at temperatures intermediate to those at which the data were recorded. Another involved fitting paramagnetic spectral data so that relative peaks could be located precisely. From the location of the spectral peaks, the physical behavior of the molecules under study could be predicted with considerable confidence. For a third problem, heats of formation were estimated from laboratory data, providing what Marquardt calls "a real breakthrough."

Still other problems involved the identification of polymers, improvement of processes, reduction of impurities by means of precise identification, and even safety statistics.

Unavoidable Nonlinearity Eventually, what came to be known as the Levenberg-Marquardt method for nonlinear estimation was found to be useful, in Marquardt's words, in "hundreds and hundreds of applications because it was a technique that worked on most nonlinear problems most of the time. The practical reliability of the method--its ability to converge promptly from a wider range of initial guesses than other typical methods--is a factor in its continued popularity." Marquardt explains that in February 1953 he joined a consulting group at DuPont, "the premier chemical engineering research organization." At that time, he suggests, the demands of engineering modeling were running far ahead of current computational capabilities: "Even though estimation involving linear models was only beginning to be used, all of the engineering models were nonlinear. "The laboratory people often took very ingenious data, from which the engineers would extrapolate to plant-sized equipment. But the engineering groups would value our consulting services only if we could solve the nonlinear estimation problems that were crucial to these extrapolations." The computational problem is one of finding the minimum of the cost function. The most common cost function is the sum of the squares of the differences between the actual data and the values predicted by the current choice of the estimation parameters. The lowest point on this surface corresponds to the best estimate of the unknown parameters. Since the problems are nonlinear, the search for the minimum is always iterative. The insights leading to the Levenberg-Marquardt method arose from Marquardt's experience with several two-parameter estimation problems, including the Van Laer model mentioned earlier. 
The intuition of Marquardt's chemical engineering colleagues was often sufficient to provide good starting estimates for schemes like steepest descent, which iterate to the best estimate of the parameters by heading straight down the wall of a valley in the cost function surface. But without a good initial guess, many more iterations were usually needed. Worse, they often would not converge.

Primitive Computing Equipment Proves Beneficial Marquardt worked with an IBM Card-Programmed Calculator operating with 46 words of memory at 3.75 floating-point operations per second; by improvements in the machine's logic, he had increased the speed by 50% from its original 2.5 flops! From the output of that crude equipment, he plotted the contours of the cost function, as well as the individual iterates generated by the various estimation schemes he tried. "We had such slow, primitive equipment, it was easy to see the details of every step," he reports. He began to observe a generic geometric problem: Methods like steepest descent, which followed the gradient down the cost function surface, marched in a direction that was nearly orthogonal to that of Taylor series methods, which linearized the cost function. This geometric conflict was a consequence of the long, narrow, banana-shaped valleys in the cost function. The ideal method would have to find an angle of descent intermediate between these two extremes. At the same time, the step size would require adjustment to prevent stepping across the valley and entirely missing the floor where the best parameter values lay.

The Diagonal Term Eventually, Marquardt recognized that these goals could be met by adding a properly sized parameter to the diagonal of the system of equations defining the iterates. Levenberg, whose earlier work was unknown to Marquardt at the time, had provided some intuitive motivation for this diagonal term by deriving it from the argument that an overly long linearized step should offset the apparent reduction in the cost function. But the extraordinary effectiveness of Marquardt's approach hinged on two particular features that were absent from Levenberg's prior work. First, unlike Levenberg, Marquardt did not insist on finding a local minimum of the cost function at each step. In this way he avoided the relatively slow convergence often encountered in steepest descent techniques as they work their way along a narrow zigzag path, crossing and recrossing the floor of the banana-shaped valley in the cost function surface. Second, and of equal importance, Marquardt implemented his method in Fortran and tested it "on a large number of problems." His code contained a particular feature, mentioned only in a long footnote in his 1963 paper, that treated cases in which the diagonal parameter had grown unreasonably large. "Many people initially programming the method have omitted the step described in the footnote in their computer software," Marquardt explains, "but it is very critical. The algorithm is not as robust without it. However, I distributed many hundreds of copies of the code I had tested. I am convinced that this technique would not have received as much attention without the Fortran code. And I believe it is still true today that good results won't receive the attention they deserve if they are not packaged in good code." Marquardt's original ideas have evolved considerably in the hands of other researchers, points out David Gay of AT&T Bell Laboratories. "The success of the Levenberg-Marquardt algorithm led to the

development of trust-region algorithms, which are now popular in many contexts," he says. "For hard nonlinear least-squares problems, these algorithms are often more efficient than Marquardt's code. Modern 'Levenberg-Marquardt' codes actually implement trust-region algorithms, in which the step length, rather than the Marquardt parameter (the multiple of a diagonal matrix added to the Hessian approximation), is explicitly controlled."

Closing the Case So Sherlock Holmes and Dr. Watson might have found that the Levenberg-Marquardt method became a ubiquitous tool of nonlinear estimation because Marquardt made a subtle adjustment in the angle at which the method moved downhill and because he provided well-tested code that implemented a robust algorithm. "Clever insight implemented in robust, well-tested code. Obvious, my dear Dr. Watson," Holmes might conclude. "Not just obvious, Mr. Holmes," Watson could wisely reply, "but critical to a wealth of industrial processes."

Paul Davis is a professor of mathematics at Worcester Polytechnic Institute.

Reprinted from SIAM NEWS Volume 26-6, October 1993 (C) 1993 by Society for Industrial and Applied Mathematics All rights reserved.

Lagrange multipliers In mathematical optimization problems, Lagrange multipliers are a method for dealing with constraints. Suppose the task is to find local extrema of a function of several variables subject to one or more constraints, given by setting further functions of the variables to given values. The method introduces a new unknown scalar variable, the Lagrange multiplier, for each constraint, and forms a linear combination involving the multipliers as coefficients. This reduces the constrained problem to an unconstrained one, which can then be solved by the usual gradient methods.
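A worked toy example: maximise f(x, y) = x·y subject to x + y = 1. Setting the gradient of the Lagrangian L = x·y − λ·(x + y − 1) to zero gives y = λ and x = λ, so with the constraint x = y = 1/2 and f = 1/4. A short Python check by brute force on the constraint line:

```python
import numpy as np

# Maximise f(x, y) = x*y subject to x + y = 1.
# Lagrangian L = x*y - lam*(x + y - 1); grad L = 0 gives
#   y = lam, x = lam, x + y = 1  =>  x = y = 1/2, f = 1/4.
# Numerical check with the constraint eliminated (y = 1 - x):
xs = np.linspace(0.0, 1.0, 1001)
f = xs * (1.0 - xs)
x_best = xs[np.argmax(f)]
print(x_best, f.max())  # 0.5 0.25
```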

Model-based error resampling bootstrap technique The resampling technique applied in OptiPa during bootstrapping is error resampling, which proceeds as follows:

1. Fit the model and retain the fitted values ŷi and the residuals êi = yi − ŷi.

2. For each pair, (xi, yi), in which xi is the (possibly multivariate) explanatory variable, add a randomly resampled residual êj to the fitted value ŷi. In other words, create synthetic response variables yi* = ŷi + êj, where j is selected randomly from the list (1, …, n) for every i.

3. Refit the model using the synthetic response variables yi*, and retain the quantities of interest (the estimated model parameters).

4. Repeat steps 2 and 3 a statistically significant number of times.
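These four steps can be sketched for an ordinary straight-line fit (a generic illustration; OptiPa applies the same scheme to its own models):

```python
import numpy as np

# Error-resampling bootstrap for a straight-line fit y = b0 + b1*x.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2, x.size)

# Step 1: fit the model, retain fitted values and residuals.
b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
resid = y - fitted

# Steps 2-4: build synthetic responses from resampled residuals, refit,
# and collect the estimated parameters.
estimates = np.array([
    np.polyfit(x, fitted + rng.choice(resid, size=resid.size), 1)
    for _ in range(500)
])

print(estimates.mean(axis=0))  # close to (0.5, 1.0): slope, intercept
print(estimates.std(axis=0))   # bootstrap standard errors
```

The spread of the collected parameter estimates is what yields the bootstrap confidence information.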

Generating random correlated non-Gaussian parameters A technique has been implemented to generate random correlated sets of parameters for a large family of normal-based, skewed and peaked distributions. Azzalini and DallaValle (1996) developed the skew-normal distribution SN(0, 1, α) with a density function of the form 2·φ(z)·Φ(α·z), where φ and Φ are the N(0, 1) normal probability density and cumulative distribution functions. The shape parameter α controls the skewness of the distribution, with α = 0 resulting in a standard normal distribution, α < 0 in a distribution skewed to the left and α > 0 in a distribution skewed to the right. Delianedis (2000) used a mixture of two zero-mean normal distributions with unequal variances to introduce kurtosis. In the current approach these two techniques were combined, introducing both kurtosis and skewness and resulting in the so-called SKN distribution. Instead of using a combination of only two normal distributions with unequal variances to introduce kurtosis, a range of six distributions (η = 6) with increasing standard deviations was used, each balanced against a standard normal distribution. In this way a smooth overall distribution can be obtained. This resulted in:

with β ≥ 0 the shape factor controlling the kurtosis of the distribution and β = 0 resulting in a standard normal distribution. An impression of the different faces of the SKN distribution is given in Fig. 1. A numerical analysis of the SKN distribution has shown that the surface under the curve equals one, regardless of the values of α and β, making the function a true distribution function. It can also be proven analytically that the integral of the SKN distribution equals one.

Fig. 1 The different faces of the SKN distribution. The bold curve is the standard normal distribution with α = 0 and β = 0. The skewed curves are the result from α ranging from -10 to 10 (β = 0). The peaked curves in the middle are the result from β ranging from 0 to 10 (α = 0). By combining different values of α and β intermediate shapes can be obtained.
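The skewness half of the SKN family is easy to check numerically. A Python sketch of the Azzalini density 2·φ(z)·Φ(α·z), confirming that the area under the curve stays one for any α:

```python
import numpy as np
from scipy.stats import norm

# Azzalini's skew-normal density 2*phi(z)*Phi(alpha*z): alpha = 0 gives the
# standard normal, alpha < 0 skews left, alpha > 0 skews right.
def skew_normal_pdf(z, alpha):
    return 2.0 * norm.pdf(z) * norm.cdf(alpha * z)

z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]
areas = {alpha: skew_normal_pdf(z, alpha).sum() * dz for alpha in (-10, 0, 3)}
print(areas)  # every area is 1: a proper density for any alpha

mean_right = (z * skew_normal_pdf(z, 3)).sum() * dz
print(mean_right)  # positive mean: the distribution is skewed to the right
```

The kurtosis part (the η = 6 mixture) is not reproduced here, as its exact functional form is given by the SKN equation above.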

The algorithm to generate random correlated parameter values based on the SKN distribution (Fig. 2) can be summarised as follows:

Fit the appropriate transformed normal density function to each of the observed parameter distributions coming from the bootstrap (A)

Based on the original bootstrap data compile the cumulative distribution function (B)

Generate a standard normal cumulative distribution function (C)

Using the inverse transform method transform the original bootstrap data into their corresponding values in the normal parameter space (D)

Calculate the covariance matrix V for the normal transformed bootstrap data and calculate the Cholesky factor L

Using the Cholesky factor L generate a set of co-varying Gaussian random parameters (E)

Based on the random samples generate cumulative distribution functions (F)

Generate, for each of the observed parameter distributions coming from the bootstrap, the cumulative transformed normal distribution function based on the fitted transformed normal density function (G)

Using the inverse transform method transform random Gaussian samples into their corresponding values in the original non-normal parameter space (H).

Fig. 2 Algorithm to generate co-varying random parameter sets given the appropriate transformed normal density function to fit to the original bootstrap data.
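Steps (B), (D) and (H) amount to quantile mapping between the observed and the standard normal space. A simplified one-dimensional Python sketch using empirical CDFs (a lognormal sample stands in for a skewed bootstrap parameter, and no SKN fit is attempted; the empirical CDF plays the role of the fitted distribution):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sample = rng.lognormal(0.0, 0.5, size=5000)   # skewed stand-in parameter

# (B) + (D): empirical CDF value of each observation -> Gaussian quantile
# via the inverse transform method.
ranks = (np.argsort(np.argsort(sample)) + 0.5) / sample.size
z = norm.ppf(ranks)                      # approximately N(0, 1)

# (E) would generate co-varying Gaussian draws here; a plain sample
# suffices for this one-dimensional illustration.
g = rng.standard_normal(5000)

# (H): map the Gaussian draws back through the empirical quantile function.
back = np.quantile(sample, norm.cdf(g))

print(z.mean(), z.std())                 # close to 0 and 1
print(sample.mean(), back.mean())        # back-transformed draws mimic sample
```

In the full algorithm the Gaussian draws g would first be correlated with the Cholesky factor L before being mapped back.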

References: Azzalini A and DallaValle A, 'The multivariate skew-normal distribution', Biometrika, 1996, 83(4), 715-726. Delianedis G, Lagnado R and Tikhonov S, 'Monte-Carlo simulation of non-normal processes', discussion paper, MKIRisk, London, UK, 2000, <http://www.gloriamundi.org/picsresources/rlgdst.pdf> (retrieved July 2004).

Cholesky decomposition Given a covariance matrix V and a vector Y containing the average values of the Gaussian model parameters, the Cholesky decomposition can be used to determine the Cholesky factor L and its transpose Lᵀ so that V = L×Lᵀ, with L a lower triangular matrix. After generating a vector g containing standard Gaussian random numbers (μ = 0, σ = 1), the required covariance structure can be introduced by multiplying this vector by the Cholesky factor L. Finally, by adding the vector Y to correct for the non-zero means of the Gaussian parameters, the final vector y is obtained, containing the required set of co-varying Gaussian random parameters (y = Y + L×g).
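In code the whole recipe is a few lines of linear algebra. A Python/NumPy sketch with an arbitrary two-parameter example:

```python
import numpy as np

# Cholesky recipe from the text: V = L @ L.T, then y = Y + L @ g turns
# independent N(0, 1) draws g into parameters with mean Y and covariance V.
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])             # target covariance matrix
Y = np.array([2.0, 5.0])                 # parameter means
L = np.linalg.cholesky(V)                # lower triangular factor

rng = np.random.default_rng(3)
g = rng.standard_normal((2, 100_000))
y = Y[:, None] + L @ g                   # co-varying Gaussian parameter sets

print(np.cov(y))                         # close to V
print(y.mean(axis=1))                    # close to Y
```

Checking the sample covariance and means against V and Y confirms that the generated sets carry the required correlation structure.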