Manuscript BJMSP518R2 (2011, in press) to appear in
British Journal of Mathematical and Statistical Psychology
DOI:10.1111/j.2044-8317.2011.02016.x
Quick Approximation of Bivariate Functions
Pierre Courrieu
Laboratoire de Psychologie Cognitive, CNRS-Université de Provence, Marseille, France
Running head: Function Approximation
PsycINFO classification:
2340 (Cognitive Processes), 2323 (Visual Perception), 2240 (Statistics & Mathematics)
Main text length: 9022 words
Total length: 13273 words
Send correspondence to:
Pierre Courrieu
Laboratoire de Psychologie Cognitive
UMR 6146, CNRS-Université de Provence
Centre St Charles, Bat. 9, Case D
3 Place Victor Hugo
13331 Marseille cedex 03
France
E-mail: [email protected]
Phone: (+33) 4 13 55 09 89 - Fax: (+33) 4 13 55 09 98
Quick Approximation of Bivariate Functions
Abstract
This paper presents two experiments in which participants had to approximate function
values at various generalization points of a square, using given function values at a small set
of data points. A representative set of standard function approximation models was trained to
exactly fit the function values at the data points, and the models' responses at generalization
points were compared to those of humans. A large class of possible models (including the
two best identified predictors) was then defined, and the maximum prediction accuracy
attainable within this class was evaluated. A new model of quick multivariate function
approximation belonging to this class was proposed. Its prediction accuracy was close to the
maximum possible, and significantly better than that of all other tested models. The new
model also accounted for a significant part of the variability of human responses. Finally, it
was shown that this model is particularly suited to problem presentations in which the visual
system can perform some specific structuring of the data space. Thus, this model is
considered a suitable starting point for further investigation of quick multivariate function
approximation, which is to date a little-explored question in cognitive psychology.
PsycINFO classification:
2340 (Cognitive Processes), 2323 (Visual Perception), 2240 (Statistics & Mathematics)
Key words: Function approximation; Multivariate functions; Computational models; Voronoi
tessellation.
1. Introduction
Function approximation is a basic capability that humans commonly use in their everyday
life. For instance, whenever we look at a meteorological map for atmospheric temperature
forecasting, predicted temperatures are usually provided at a finite number of locations, while
we are possibly living at, or going to, a location that does not belong to the sample set. In this
this case, we commonly approximate the temperature at the desired location from the
available predicted temperatures at other locations (possibly the nearest ones). This is a
function (temperature) approximation on a support space described by two variables
(longitude and latitude), which we refer to as a "bivariate" function approximation. The
support space is also called "input space", and the function space is also called "output space".
Another example is that of designers and draughtsmen who must draw continuous surfaces
fulfilling given dimensions at a finite number of points of the plane (without using standard
computational interpolators). More generally, a function approximation problem consists of
estimating a certain quantity at any point of a given support space, while this quantity is
actually known only at a limited number of particular points. The data points where the
function values are known are called "learning examples", "control points", "scattered data" or
simply "data points" and "data set", depending on the context. Every point of the support
space that does not belong to the data set is called a "generalization point", and one
distinguishes between generalization points that are "interpolation points", and those that are
"extrapolation points". Intuitively, an interpolation point is located somewhere "between"
several data points in the support space, while an extrapolation point is outside the data
cluster. More rigorously, one says that a generalization point is an interpolation (respectively
extrapolation) point if it is inside (respectively outside) the convex hull polytope of the data
set in the support space (Courrieu, 1994; Pelillo, 1996). In general, a function approximation
problem constrained only by a finite set of data points is an ill-posed problem, given that there
is a priori an infinity of distinct continuous functions passing through these data points. In fact,
every continuous interpolator provides an example of such a function, and the question we
ask is "how does a human select solutions among the infinite set of possibilities?"
It can be noted that there is a strong analogy between function approximation and
function learning, from a formal point of view (Girosi & Poggio, 1990; Poggio & Girosi,
1990). In fact, available psychological data and models concern function learning. Boolean
function learning (or category learning) has been widely studied on multidimensional support
spaces (Ashby & Ell, 2002; Nosofsky & Kruschke, 2002). However, continuous function
learning is commonly restricted to one-variable support spaces (Bott & Heit, 2004; DeLosh,
Busemeyer, & McDaniel, 1997; Kalish, Lewandowsky, & Kruschke, 2004; McDaniel &
Busemeyer, 2005; McDaniel, Dimperio, Griego, & Busemeyer, 2009), or to a one-variable
space with an additional binary contextual variable (Lewandowsky, Kalish, & Ngang, 2002).
An exception to this is the work of Koh (1993), where participants learned to produce
specified response durations when presented with stimulus lines varying in length and angle
of orientation, thus on a two-variable space. However, Koh studied only the learning process,
not the generalization process. Another important exception is the work of Kelley and
Busemeyer (2008), who compared several function learning models on multivariate function
forecasting problems. These authors tested function learning models such as low degree
multivariate polynomial regression, and the "Associative Learning Model" (Busemeyer,
Myung, & McDaniel, 1993), which is a variant of standard Gaussian "Radial Basis Function
Networks". Standard versions of these models will be also tested in the present study, together
with other function approximation models.
Although mathematically related to function approximation, function learning as a
cognitive process is in fact very different from quick function approximation. Function
learning commonly involves repeated presentations of a large set of learning examples, until a
learning criterion is reached and a specific generalization capability ("expertise") is acquired.
By contrast, quick function approximation usually involves small data sets, and
generalizations are produced immediately, based on an available capability, without specific
learning. So, theoretical models suitable to function learning are not necessarily transposable
to quick function approximation, and conversely. In fact, psychological data are not available
concerning quick function approximation on multivariate spaces, and currently available
models mainly belong to the machine learning and computer science areas. It is instructive to note
that, in a famous paper, D. Shepard (1968) proposed an interpolation model whose goal was
to provide computers with quick function approximation capabilities similar to those we
attempt hereafter to study in humans.
The next section presents an experiment using a quick function approximation task on
a square, which provides initial behavioural data. Then, using these data, we test the
prediction capability of a representative set of 10 standard function approximation models
(section 3). After this, we try to characterise a class of plausible models, and we estimate its
maximal prediction capability (section 4). Then we instantiate a new model from this class
(section 5), and we test its actual prediction capability (section 6). Section 7 presents a second
experiment, and we conclude in section 8.
2. Experiment 1
This experiment introduces a quick bivariate function approximation task, where
approximation problems are presented using numbers (function values) on a square
(bidimensional support), the variables being the plane coordinates of numbers' locations. The
goal is simply to collect an initial set of human generalization responses in order to test
various function approximation models in the next sections.
2.1. Participants. Sixteen participants (8 men and 8 women, 23-62 years old) took part in the
experiment on a voluntary basis.
2.2. Material. A set of 16 bivariate function approximation problems was built in the
following way. A set of 5 input data points in a 10×10 square was chosen for interpolation
problems, and another set of 5 input data points was chosen for extrapolation problems. To
each of these data points, one associated two distinct arbitrary function values, thus there were
two functions to be approximated in interpolation problems ("I1" and "I2"), and two other
functions to be approximated in extrapolation problems ("E1" and "E2"). Four distinct
interpolation input points (numbered ".1", ".2", ".3", and ".4") were associated with interpolation
problems, and four distinct extrapolation input points (also numbered ".1", ".2", ".3", and ".4")
were associated with extrapolation problems. In all cases, the distance of a generalization point
from its nearest data point was the same (=3). Table 1 summarises all problems' data. Each of
the 16 approximation problems was presented (one at a time) to the participants on a sheet of
paper as illustrated in Figure 1. Data points were materialised by five circles with their
corresponding function values inside, while the considered generalization point (one per
problem) was materialised by an empty circle. For half the participants, the problems were
presented in the order (I1.1, E1.1, I1.2, E1.2, I1.3, E1.3, I1.4, E1.4, I2.1, E2.1, I2.2, E2.2, I2.3,
E2.3, I2.4, E2.4), while the reverse order was used for the other half. The whole set of
problems is visualized in Figure 3 (together with human average generalization responses).
2.3. Task. Participants were instructed that the problems' data were artificial, but that they could
consider them as atmospheric temperatures (in Celsius degrees) on a square map of side 1000
kilometres (with a distance unit of 100 km). Then, they had to approximate the temperature in
the empty circle on the basis of the temperatures given at other locations. Responses were
given with a pencil in the empty circle, corrections were allowed, and there was no time
constraint; however, all participants solved the 16 problems in less than 8 minutes.
Table 1
Figure 1
2.4. Basic results. The responses to the 16 problems of the 16 participants are reported in
Table 2, with the mean response and standard deviation for each problem. It has recently been
shown that one can estimate the reproducible proportion of variance, in the average response
vector, using an intraclass correlation coefficient (ICC), namely the so-called "ICC(C, k),
Cases 2 and 2A" in the nomenclature of McGraw and Wong (1996), computed on the raw
data of Table 2 (Courrieu, Brand-D'Abrescia, Peereman, Spieler, & Rey, 2011; Rey, Courrieu,
Schmidt-Weigand, & Jacobs, 2009). The obtained ICC is equal to 0.985, which indicates that,
despite the variability of responses, these are in fact highly consistent, and one can hope that
there is an underlying behaviour common to all participants. The 95%, 99%, and 99.9%
confidence intervals of the ICC are [0.973, 0.994], [0.967, 0.996], and [0.959, 0.997],
respectively. According to Courrieu et al. (2011), the squared correlation of the predictions of
an exact model with the average empirical response vector should belong to the ICC
confidence interval with the interval probability. A squared correlation lower than the ICC
lower confidence limit indicates a probable under-fitting, while a squared correlation greater
than the ICC upper confidence limit indicates a probable over-fitting. The very high value of
the observed ICC makes the prediction of such data highly challenging, given that
approximate models have little chance of satisfying the validation criterion.
Table 2
3. Test of standard function approximation models
A representative set of 10 function approximation models has been selected in order to
attempt to predict humans' generalization responses. All models were trained to exactly fit the
data set for each sampled function (I1, I2, E1, E2), and then the responses of each model at
the generalization points were recorded. Except for the Quadratic approximation model (see
below), human responses were never used in the training of models, in order to avoid over-
fitting problems (Pitt & Myung, 2002). For those models that have some global tuneable
parameter(s), standard tuning (which is commonly close to optimal) was finally used after
verifying that an optimization with respect to experimental data did not significantly improve
the prediction performance. Psychological function learning models such as EXAM (DeLosh
et al., 1997) and POLE (Kalish et al., 2004)) were not included in the tested set of models
because, in their current state, these models are not suited to quick multivariate function
approximation, and their possible extension requires non-trivial modifications, which prevents
us from regarding these models as standard in this framework.
3.1. Direct approximators. These models directly provide generalizations, and they do not
require learning or building a representation of the whole function, so they are possibly good
candidates for modelling quick function approximation. In this category, we consider the
usual Nearest Neighbour Approximator (abbreviated "NNA"), which simply approximates the
function at any generalization point by the function value of the nearest data point. We also
consider the original Shepard interpolator (abbreviated "She") (Shepard, 1968), also known as
the "inverse distance-weighted average":
$$f(X) = \frac{\sum_{i=1}^{m} f_i \, \|X - X_i\|^{-2}}{\sum_{j=1}^{m} \|X - X_j\|^{-2}} \,, \qquad (1)$$

where $\{(X_i, f_i),\ 1 \le i \le m\}$ is a data set of $m$ points, and $X$ is a generalization point. Finally,
another direct approximator exhibited interesting properties: the Lipschitz interpolator
("Lip") (Beliakov, 2006):

$$f(X) = \big(f^{+}(X) + f^{-}(X)\big)/2 \,, \qquad (2a)$$

with

$$f^{+}(X) = \min_{1 \le i \le m}\big(f_i + a\,\|X - X_i\|\big)\,, \qquad f^{-}(X) = \max_{1 \le i \le m}\big(f_i - a\,\|X - X_i\|\big)\,, \qquad (2b)$$

and the Lipschitz constant:

$$a = \max_{1 \le i < j \le m} |f_i - f_j| \,/\, \|X_i - X_j\| \,. \qquad (2c)$$
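For concreteness, the two approximators above translate directly into code. The following Matlab sketch is illustrative (the function names shepard and lipschitz are ours, and this is not the paper's Appendix listing); it assumes that X is an m-by-2 matrix of data points, f an m-by-1 vector of function values, and x a 1-by-2 generalization point.

function y = shepard(X, f, x)
% Inverse distance-weighted average, eq. (1).
m = size(X, 1);
d2 = sum((X - repmat(x, m, 1)).^2, 2);        % squared distances ||x - Xi||^2
k = find(d2 == 0, 1);
if ~isempty(k), y = f(k); return; end         % exact interpolation at data points
w = 1 ./ d2;                                  % weights ||x - Xi||^(-2)
y = sum(w .* f) / sum(w);
end

function y = lipschitz(X, f, x)
% Lipschitz interpolator, eqs. (2a)-(2c).
m = size(X, 1);
a = 0;                                        % Lipschitz constant, eq. (2c)
for i = 1:m-1
    for j = i+1:m
        a = max(a, abs(f(i) - f(j)) / norm(X(i,:) - X(j,:)));
    end
end
d = sqrt(sum((X - repmat(x, m, 1)).^2, 2));   % distances ||x - Xi||
y = (min(f + a*d) + max(f - a*d)) / 2;        % midpoint of upper/lower bounds, eq. (2a)
end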
3.2. Radial Basis Function Networks. These are widely used function approximation models
that just require learning by solving a least-squares (or linear) system before generalizing
(Micchelli, 1986; Girosi & Poggio, 1990; Poggio & Girosi, 1990; Yoon, 2001). In this
category, we consider Gaussian basis function networks ("Gau"), Hardy multiquadrics
("Har"), and Radial Splines ("Spl"). Given a data set of
!
m points, an RBFN interpolation at
point
!
X is of the form:
!
f (X) = wi g( X " Xii=1
m
# ) , (3)
with
!
f (Xi) = f i,
!
1" i " m, (for exact interpolation), (3a)
or
!
f (Xi) " f i ,
!
1" i " m, (for least-squares approximation), (3b)
where
!
g is a basis function, and
!
wi is a weight associated to the ith data point by learning.
For Gaussian networks, one has the basis function
!
g(d) = exp("d2/s2), while for Hardy
multiquadrics, one has
!
g(d) = (1+ d2)1/ 2, and for Radial Splines on two-dimensional spaces,
one has
!
g(d) = d2ln(d). The scale parameter (s) of Gaussians is tuned using the "global first
nearest-neighbour heuristic" (Moody & Darken, 1989). Despite the existence of a learning
phase, these models are possible candidates to quick function approximation since their
learning process reduces to solving a least-squares (or linear) system, and it has been shown
that biological neural networks could (theoretically) solve least-squares systems very fast,
say, in less than 250 ms (Courrieu, 2004). In practice, one considers a set of
!
L data points
from which one selects a subset of
!
m data points named "prototypes", with
!
m " L . Then one
builds a
!
L "m matrix
!
G = (gki) = (g( Xk " Xi )) ,
!
1" k " L,
!
1" i " m, and the corresponding
!
L "1 vector
!
F = ( fk ) = ( f (Xk )) ,
!
1" k " L, of expected function values. Then one computes
the
!
m "1 weight vector
!
W = (wi) ,
!
1" i " m, solving the least-squares problem
!
minW "R
mGW # F
2 , whose solution is known to be
!
W =G1,3( )F , for a suitable generalized
inverse
!
G1,3( ) of the matrix
!
G (Ben-Israel & Greville, 2003; Courrieu, 2009). If
!
m < L , then
the least-squares solution allows filtering possible noise in the data, however it does not
provide an exact interpolator. If
!
m = L , then the matrix
!
G is square and invertible, the least-
squares system reduces to a simple linear system, and one obtains an exact interpolator using
!
G1,3( ) =G"1, that is
!
W =G"1F , which satisfies (3a).
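As an illustration of the exact interpolation case ($m = L$), a minimal Matlab sketch of a Gaussian RBFN follows; the scale parameter s is assumed to be already chosen (e.g., by the first nearest-neighbour heuristic), and the code is ours, not the paper's implementation.

function y = rbfn_gauss(X, F, x, s)
% Exact Gaussian RBFN interpolation, eq. (3) with m = L and W = G^{-1}F.
m = size(X, 1);
G = zeros(m, m);
for k = 1:m
    for i = 1:m
        G(k, i) = exp(-norm(X(k,:) - X(i,:))^2 / s^2);  % g(||Xk - Xi||)
    end
end
W = G \ F;                     % solves the linear system GW = F, eq. (3a)
g = zeros(1, m);
for i = 1:m
    g(i) = exp(-norm(x - X(i,:))^2 / s^2);
end
y = g * W;                     % f(x) = sum_i w_i g(||x - Xi||)
end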
3.3. Similarity function networks. These are networks that use dissimilarity measures as
inputs, and they can efficiently approximate functions on every metric space, as well as on a
wide variety of non-metric topological spaces, such as those generated by Dynamic
Programming methods. To date, only two approximators are known to have such capabilities:
a simple generalization of the Nearest Neighbour Approximator, and the so-called Phi-
approximator ("Phi") (Courrieu, 2005), which, for interpolation on Euclidean spaces, takes
the form:
$$\Phi(X) = \frac{\sum_{i=1}^{m} w_i \exp(-\lambda \|X - X_i\|^2)}{\sum_{j=1}^{m} \exp(-\lambda \|X - X_j\|^2)} \,, \quad \text{with } \Phi(X_i) = f_i \,,\ 1 \le i \le m \,. \qquad (4)$$

The parameter $\lambda$ is tuned using a standard procedure ($\lambda^{*}$) defined in the reference above, and
the coefficients (the $w_i$'s) are learned by solving a least-squares (or linear) system, as in Section 3.2.
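Computationally, the Phi-approximator differs from a Gaussian RBFN mainly by the row normalization of its basis. A minimal illustrative sketch, with lambda supplied directly rather than by the $\lambda^{*}$ tuning procedure of Courrieu (2005), could be:

function y = phi_approx(X, F, x, lambda)
% Normalized Gaussian (Phi) interpolation, eq. (4).
m = size(X, 1);
K = zeros(m, m);
for k = 1:m
    for i = 1:m
        K(k, i) = exp(-lambda * norm(X(k,:) - X(i,:))^2);
    end
end
K = K ./ repmat(sum(K, 2), 1, m);   % row normalization of the basis
W = K \ F;                          % weights such that Phi(Xi) = fi
kx = zeros(1, m);
for i = 1:m
    kx(i) = exp(-lambda * norm(x - X(i,:))^2);
end
y = (kx / sum(kx)) * W;
end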
3.4. Multilayer Neural Networks. These models are more particularly devoted to function
learning, they commonly involve large data sets, and quite laborious learning procedures such
as the "Error Gradient Backpropagation" (Rumelhart, Hinton, & Williams, 1986) for layered
networks called Multilayer Perceptrons ("MLP"), or the "Cascade-Correlation" learning
algorithm (Fahlman & Lebiere, 1990), for both Multilayer Perceptrons and more general feed-
forward architectures called Cascade-Correlation networks ("CCo"). An important advantage
of the Cascade-Correlation learning algorithm over the Backpropagation algorithm is that it
automatically determines the number of necessary hidden neurons. So, we use a variant of the
Cascade-Correlation learning algorithm for both architectures (MLP and CCo). In this variant,
the computation of the hidden units is performed by means of the global optimization
"Hyperbell" algorithm (Courrieu, 1997) instead of the original multistart type procedure. Each
network was trained to exactly fit the learning set for each function (I1, I2, E1, and E2)
separately, and this was repeated 32 times in order to estimate mean generalization responses
that were used for the comparison with human generalization responses. In all cases, and for
all problems of Experiment 1, MLP completed learning with 4 hidden neurons, while CCo
learned with 2 hidden neurons. These models were tested in order to establish reference fits,
however, their laborious learning process makes them a priori implausible models for quick
function approximation.
3.5. Quadratic approximation ("Qua"). This is the minimum degree polynomial
approximation that can exactly fit five data points on a bivariate support. It is important
as an example of polynomial approximation, and also because locally quadratic
approximations are used in other well-known interpolators (Brodlie, Asim, & Unsworth,
2005; Renka, 1988). However, a bivariate polynomial of degree two has 6 monomials, thus
we have 6 coefficients, and only 5 data points to fit in each function. So, we can use the
remaining degree of freedom to choose the particular solution that minimizes the
generalization error with respect to the experimental data. This is known as a "constrained
least-squares solution" (Ben-Israel & Greville, 2003, p. 108), and it provides the best
quadratic approximation of human responses that we can hope for, given the constraint of exactly
fitting the data points.
3.6. Results. Figure 2 shows examples of generalization surfaces (E1 function) generated by 4
different models (She, Lip, Har, and Phi), which gives an idea of the diversity of solutions.
Results of simulations for all models are reported in Table 3, together with the correlations
between simulated and human generalization responses. The best prediction is provided by
the Phi-approximator (r=0.916), closely followed by the Shepard interpolator (r=0.910), and
the MLP (r=0.905). Despite the optimization of the solutions with respect to the empirical
data, the Quadratic approximation model provides a modest performance (r=0.759). If one
hypothesizes some multi-system model, where each model independently contributes to the
mean responses with a given probability, one can easily compute the optimal probabilities
using Algorithm 1 of Courrieu (1994), which provides the following models' probabilities:
0.2022 for NNA, 0 for Lip, 0.1537 for Gau, 0.0847 for Phi, 0.0059 for Har, 0 for Spl, 0.2636
for MLP, 0.0733 for CCo, 0.2166 for She, and 0 for Qua. Weighting the models' predictions
with these probabilities and summing, one obtains a composite prediction vector whose
correlation with human mean responses is r=0.945. Note that this is just an estimate of the
best fit that can be reached with this particular set of models, as a whole. One must not
interpret the individual probabilities as models fits since, in this type of analysis, it can even
happen that a zero probability is assigned to the best model. Despite the quite good
performance of several tested models, one can observe that none of the obtained correlations,
after squaring, belongs to standard confidence intervals of the human data ICC, and all
models under-fit the data (see Section 2.4). Thus, there is room for further investigation.
Table 3
Figure 2
4. Characterization of a class of suitable models
The three best predictors (Phi, She, and MLP) are very different models, and it seems
hard to see what they have in common. In this section, we define a large class of function
approximation models to which the two best predictors belong, and we determine the maximum
correlation with empirical data that models of this class could reach.
We hypothesize that the generalization function at any point $X \in \mathbb{R}^2$ can be rewritten as:

$$f(X) = \sum_{i=1}^{m} w_i(X) \, f_i \,, \qquad (5)$$

with $w_i(X_j) = \delta_{ij}$ (Kronecker delta), and the unit sum constraint $\sum_{i=1}^{m} w_i(X) = 1$, where $m$
is the number of data points (here $m = 5$), $f_i$ is the given function value at the $i$th data point,
and $w_i(X)$ is the value of the "weighting function" of the $i$th data point at point $X$. Moreover,
we hypothesize that the weighting functions only depend on the relative locations of
generalization and data points in the support space, and that they are invariant to shifts and
rotations of the coordinates. The resulting class of interpolators includes, among others,
Shepard interpolators and Nearest Neighbour approximators, and if one weakens the unit sum
constraints to be only approximately satisfied, then the class extends to most Radial Basis
Function Networks, and to Phi approximators when used on Euclidean spaces, as is the case
here.
Observe, in Figure 3, that all interpolation problems have similar input configurations
by quarters of turn of the support, and the same is true for extrapolation problems. If one
considers the data points in increasing order of their distance from the generalization point in
each problem, then one obtains a sequence of 5 distinct distances that is the same for all
interpolation problems, and another sequence of 5 distinct distances that is the same for all
extrapolation problems. The output data reordered in this way form a 5-component row
vector for each problem. Then one can group the vectors of the 8 interpolation problems in an
8×5 matrix, and similarly for the 8 extrapolation problems. Finally, we can append to each
matrix a row vector whose 5 components are equal to 1 (in order to represent the unit sum
constraint). Under the above hypotheses concerning the class of models, one can find a unique
5-weight vector WI for interpolation problems, and similarly, a unique 5-weight vector WE
for extrapolation problems, such that the scalar product of each row of each matrix with the
appropriate weight vector is equal to the generalization response to the corresponding
problem (or is equal to 1 for the unit sum constraint). The data and the solutions to this
problem are presented in Table 4, and the two weight vectors (WI and WE) were obtained by
use of the constrained least-squares technique. The correlation between the predicted
generalization responses ("Obtained" column) and the human generalization responses
("Target" column) is r = 0.988, which is clearly better than the best model fits previously
obtained, and this correlation is the maximum possible for models belonging to the class
defined above. Interestingly, the squared target correlation ($0.988^2 = 0.976$) belongs to the 95%
confidence interval of the data ICC (0.973, 0.994), which indicates that there is some chance
that the exact model belongs to this class. In order to approach such a goodness
of fit with a model, we must attempt to identify some relevant characteristics that previously
tested models do not have.
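The constrained least-squares step can be reproduced in several ways; the sketch below uses the standard Lagrange-multiplier (KKT) linear system to minimize $\|Dw - t\|^2$ under the exact unit sum constraint, where D is the 8-by-5 matrix of reordered output data and t the vector of mean human responses (both from Table 4, not reproduced here). This is one standard formulation, not necessarily the author's exact routine.

function w = unit_sum_ls(D, t)
% Least-squares weights under the exact constraint sum(w) = 1.
n = size(D, 2);
c = ones(n, 1);
KKT = [2*(D'*D), c; c', 0];   % stationarity conditions plus the constraint row
rhs = [2*(D'*t); 1];
sol = KKT \ rhs;
w = sol(1:n);                 % sol(n+1) is the Lagrange multiplier
end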
Observing the weight vectors (WI and WE) in Table 4, we can see that the weights of
the two most distant data points from the generalization point are always negative, which
indicates that the interpolation is based not only on data values, but also on variations
(differences) between data points. Moreover, although the nearest data point always has the
greatest weight, the weights do not decrease monotonically as the distance increases. A
prominent fact, in the weight vector WI, is that the two greatest weights are at distance ranks 1
and 3, corresponding to data points that are the extremities of a line segment to which the
generalization point belongs. This is easy to visualize in Figure 3, noting that in
interpolation problems, the distance rank 2 always corresponds to the central data point. In
summary, it appears that the distance of the generalization point from a line segment joining
two data points is more determinant than the distance from an individual data point, which
strongly suggests that the basic interpolation unit involves a pair of data points, hereafter
called "bipoint".
Now, given two data points with their function values, there is a unique linear function
(straight line) that passes through these points. The global function to be approximated is not
linear, in general, but the principle of using multiple pieces of linear functions to approximate
non-linear functions has been successfully used in several leading models of exemplar-based
univariate function learning. This is the case in the EXAM model (DeLosh et al., 1997), as
well as in the POLE model (Kalish et al., 2004), for instance. However, this principle is more
complex to use on multivariate support spaces because, contrary to the univariate case, most
generalization points do not belong to a straight line defined by a pair of data points. The
model presented in the next section provides a way of overcoming this difficulty. This will be
achieved using a special variant of R. Shepard's (1987) generalization law, suitable for
generalizing from bipoints.
Table 4
Figure 3
5. The ABI model
We now instantiate a model belonging to the class defined in Section 4, taking into
account the particular observations on the weight functions, in order to attempt to approach
the class maximum possible correlation level with human responses (r=0.988). We
hypothesize that the input space is structured by Voronoi tessellation (Okabe, Boots,
Sugihara, & Chiu, 2000), and that the basic function approximation elements are data
bipoints. A bipoint is simply a pair of data points, to which any generalization point can be
compared by means of two quantities: a special distance named "exteriority", and a "linear
interpolation/extrapolation" of function values, as explained below. This allows building a
simple model where various bipoints are sequentially sampled with probabilities depending
on the exteriority of the generalization point, and the corresponding function linear estimates
are averaged to build the subject's generalization response. This model is implemented in the
function "ABI" (for "Average of Bipoints Interpolations") listed in Appendix (Matlab 7.5
code). The model is described hereafter.
5.1. Voronoi tessellation and relevant bipoints. Observing visualized data sets (as in Figure
1), it is intuitively obvious that certain bipoints are of minor interest because their two points
are distant from each other and other data points are interposed between them. This intuition
probably results from the way we structure the data support space. There is to date strong
empirical evidence, together with theoretical arguments, suggesting that the visual system
generates a Voronoi-like representation at an early stage in visual processing (Dry, 2008; Dry,
Navarro, Preiss, & Lee, 2009). All around each data point is its "Voronoi-cell", which is the set
of all generalization points having this data point as their nearest neighbour. The juxtaposition
of all Voronoi-cells is the "Voronoi tessellation" of the data space. The second row of images in
Figure 4 provides a visualization of the Voronoi tessellations of the interpolation and
extrapolation data sets. Two data points are "Voronoi-neighbours" if their respective Voronoi-cells
have at least one border point in common. Some elementary geometric considerations
allow building a simple algorithm for extracting all Voronoi-neighbour pairs from a data set,
as does the sub-function "Neighbors" listed in the Appendix. We define the set of "relevant
bipoints" as the set of all Voronoi-neighbour pairs from the considered data set. In a two-dimensional
space, the generated set of bipoints coincides with the set of all edges of all
possible "Delaunay triangulations" of the considered set of points. The Delaunay triangulation
is a dual structure of the Voronoi tessellation; however, contrary to the Voronoi tessellation, the
Delaunay triangulation is not always unique.
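Since, in two dimensions, the Voronoi-neighbour pairs are exactly the Delaunay edges, the relevant bipoints can be obtained from Matlab's built-in delaunay function. The sketch below is an illustrative alternative to the paper's "Neighbors" sub-function; in degenerate (co-circular) configurations it returns the edges of one of the possible triangulations.

function V = relevant_bipoints(X)
% All Voronoi-neighbour index pairs (i < j) of the 2-D data set X.
T = delaunay(X(:,1), X(:,2));             % triangles, as rows of point indices
E = [T(:,[1 2]); T(:,[2 3]); T(:,[1 3])]; % all triangle edges
V = unique(sort(E, 2), 'rows');           % each undirected edge once
end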
5.2. Exteriority of a point from a bipoint. This is simply the Euclidean distance of the
considered point from the nearest point belonging to the line segment joining the two points
of the bipoint. Let $E(x, a, b)$ denote the exteriority of a point $x$ from a bipoint $(a, b)$. Set
$A = x - a$, $B = x - b$, and $C = a - b$. In the particular case where $a = b$, the bipoint reduces to
a point and one has simply $E(x, a, b) = \|A\| = \|B\|$. In all other cases, the squared exteriority is
given by:

$$E^2(x, a, b) = \Big(\|A\|^2 \|B\|^2 - (A \cdot B)^2 + \tfrac{1}{4}\big(|A \cdot C| + |B \cdot C| - \|C\|^2\big)^2\Big) \,/\, \|C\|^2 \,, \qquad (6)$$

where $X \cdot Y$ denotes the scalar (dot) product of vectors $X$ and $Y$, and $\|X\|$ denotes the
Euclidean norm of vector $X$. The sub-function named "Exteriority" in the Appendix listing
computes the exteriority of points from bipoints. The first row of Figure 4 shows the
exteriority function on the square for two distinct bipoints. The expression (6) has the
advantage of being direct and exact; however, note that there is a more general definition of
the exteriority for all convex polytopes (other than bipoints), and corresponding iterative
computation methods can be found in Courrieu (1994). Note also that a technical variant of
the exteriority is known as the "polytope distance" (Gärtner & Jaggi, 2009).
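A direct Matlab transcription of equation (6) (illustrative code, not the Appendix's "Exteriority" sub-function) is:

function e = exteriority(x, a, b)
% Euclidean distance from x to the segment [a, b], eq. (6). Row vectors.
A = x - a;  B = x - b;  C = a - b;
if norm(C) == 0
    e = norm(A);                          % degenerate bipoint: a single point
else
    perp2  = (norm(A)^2 * norm(B)^2 - dot(A, B)^2) / norm(C)^2;
    along2 = (abs(dot(A, C)) + abs(dot(B, C)) - norm(C)^2)^2 / (4 * norm(C)^2);
    e = sqrt(perp2 + along2);             % along2 = 0 when x projects inside [a, b]
end
end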
5.3. Sampling probability of a bipoint. Every relevant bipoint $(x_i, x_j)$ belonging to the data set
can be considered at any time with a probability $p_{ij}(x)$ that depends on the exteriority
$E(x, x_i, x_j)$ of the current generalization point $x$ from that bipoint, and on the distance of the
generalization point from its nearest neighbour data point, $d_0(x) = \min_{1 \le k \le m} \|x - x_k\|$, where $m$
is the number of data points. Introducing $d_0(x)$ is not only natural in a Voronoi-structured
data space, but it is also essential to obtain a stable and exact interpolator. First, both
$E(x, x_i, x_j)$ and $d_0(x)$ depend on the scale of the input data; however, the ratio
$E(x, x_i, x_j)/d_0(x)$ has the desirable property of being scale invariant. Secondly, whenever $x$
tends to a data point, there are two possibilities. (1) If the nearest data point does not belong to
the bipoint $(x_i, x_j)$, then the ratio $E(x, x_i, x_j)/d_0(x)$ tends to infinity, as an ordinary inverse
distance. (2) If the nearest data point belongs to the bipoint $(x_i, x_j)$, then the ratio
$E(x, x_i, x_j)/d_0(x)$ tends to a finite limit comprised between 0 and 1, because, by definition of
the exteriority, one always has $E(x, a, b) \le \min(\|x - a\|, \|x - b\|)$. We can now define
intermediate functions $v_{ij}(x)$, representing the strength of data bipoint $(x_i, x_j)$ at
generalization point $x$:

$$v_{ij}(x) = \exp\big(-\alpha \, (E(x, x_i, x_j)/d_0(x))^{\beta}\big) \,, \qquad (7)$$

where $\alpha$ and $\beta$ are two positive real global parameters to be estimated, since we have no
theory to fix them. Note that (7) is just a variant of Shepard's (1987, Eq. 10) exponential decay
generalization function, where the original Minkowskian metrics have been replaced by the
distance ratio discussed above, in order to account for bipoint generalization. Figure 4 shows
examples of the $E$, $d_0$, and $v$ functions associated with two bipoints, one belonging to the
interpolation data set (left column), and one belonging to the extrapolation data set (right
column).

Finally, the sampling probability of a relevant bipoint $(x_i, x_j)$ is given by:

$$p_{ij}(x) = v_{ij}(x) \Big/ \sum_{(k,l) \in V} v_{kl}(x) \,, \qquad (8)$$

where $V$ denotes the set of index pairs of Voronoi-neighbours in the data set. Note that (8) is
just an application of Luce's choice axiom (Luce, 1977), using the strength functions defined
in (7).
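Putting (7) and (8) together, the sampling distribution over relevant bipoints can be sketched as follows, using the illustrative helper exteriority defined above; the generalization point x is assumed not to coincide with a data point, so that $d_0(x) > 0$.

function p = bipoint_probs(X, V, x, alpha, beta)
% Sampling probabilities of the relevant bipoints, eqs. (7) and (8).
m = size(X, 1);
d = zeros(m, 1);
for k = 1:m, d(k) = norm(x - X(k,:)); end
d0 = min(d);                              % distance to the nearest data point
v = zeros(size(V, 1), 1);
for n = 1:size(V, 1)
    E = exteriority(x, X(V(n,1),:), X(V(n,2),:));
    v(n) = exp(-alpha * (E / d0)^beta);   % strength of the bipoint, eq. (7)
end
p = v / sum(v);                           % Luce's choice rule, eq. (8)
end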
5.4. Linear interpolation/extrapolation from one bipoint. Given a relevant bipoint $(x_i, x_j)$, and
the corresponding function values $(f_i, f_j)$ in the data set, a linear approximation of the
function at the generalization point $x$ is provided by:

$$f_{ij}(x) = f_i + (f_j - f_i) \, \frac{(x - x_i) \cdot (x_j - x_i)}{\|x_j - x_i\|^2} \,. \qquad (9)$$

As an example, consider the extrapolation problem E1.2 shown in Figure 1, and the data
bipoint consisting of $x_i = (1, 6)$, with $f_i = 8$, and $x_j = (4, 1)$, with $f_j = 13$. The generalization
point is $x = (1, 9)$, and the function value predicted by the considered bipoint is:

$$f_{ij}(x) = 8 + (13 - 8) \, \frac{(1-1,\, 9-6) \cdot (4-1,\, 1-6)}{(4-1)^2 + (1-6)^2} = 8 + 5 \times \frac{-15}{34} \approx 5.79 \,.$$
Note that linearly extrapolated values can be outside the range of the data function values.
One can assume that humans do not exactly compute the above quantities, but that they
approximate them with a random approximation error of mean zero (which is not modelled
for the moment).
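Equation (9) amounts to a one-line orthogonal projection; the following illustrative sketch reproduces the worked example above.

function fx = bipoint_estimate(xi, fi, xj, fj, x)
% Linear interpolation/extrapolation from one bipoint, eq. (9).
u = xj - xi;
t = dot(x - xi, u) / dot(u, u);   % projection coefficient on the bipoint axis
fx = fi + (fj - fi) * t;
end

For the example, bipoint_estimate([1 6], 8, [4 1], 13, [1 9]) returns 8 + 5*(-15/34), that is, approximately 5.79.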
5.5. Generalization response generating process. Given a generalization point $x$, we assume
that the subject repeatedly and independently samples relevant bipoints in the data set
according to their sampling probabilities (eq. 8), then he/she linearly estimates the function
value at $x$ from each sampled bipoint (eq. 9), and averages the successive estimates. The
number of sampled bipoints is presumably a positive integer random variable, which is not
modelled for the moment. We can nevertheless determine the expected generalization
response at any point:

$$f(x) = \sum_{(i,j) \in V} p_{ij}(x) \, f_{ij}(x) \,. \qquad (10)$$

We can also determine a component part of the variance of generalization responses at any
point:

$$\sigma_1^2(f(x)) = \sum_{(i,j) \in V} p_{ij}(x) \, \big(f_{ij}(x) - f(x)\big)^2 \,. \qquad (11)$$

The above variance does not take into account the (non-modelled) random errors in linear
approximations, and it corresponds to the case where only one bipoint is sampled for each
generalization response. In fact, this variance must be divided by the (non-modelled) number
of sampled bipoints used in each average. In addition, a simple inspection of individual
responses (see Table 2) shows that there is another (small) source of variance. We can note
that all individual responses are integer numbers, which means that the approximations are
rounded, and thus, there is a rounding error.
5.6. Main properties. Whatever the dimension of the support space, that of the function, and
the size of the data set, one can easily verify that:
- $f(x)$ is continuous at every point $x$;
- if $x_i$ is a data point, then $f(x_i) = f_i$ and $\sigma_1^2(f(x_i)) = 0$; that is, $f(x)$ is an exact interpolator.
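Combining the pieces, the expected response (10) and the variance component (11) can be computed as sketched below, using the illustrative helper functions of the previous sections (again, not the paper's Appendix code).

function [fx, s2] = abi_expected(X, F, x, alpha, beta)
% Expected ABI response (10) and one-sample variance component (11).
V = relevant_bipoints(X);                 % Voronoi-neighbour pairs
p = bipoint_probs(X, V, x, alpha, beta);
est = zeros(size(V, 1), 1);
for n = 1:size(V, 1)
    i = V(n, 1);  j = V(n, 2);
    est(n) = bipoint_estimate(X(i,:), F(i), X(j,:), F(j), x);
end
fx = p' * est;                            % eq. (10)
s2 = p' * (est - fx).^2;                  % eq. (11)
end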
Figure 4
6. Test of the model
The two general parameters, $\alpha$ and $\beta$, have been estimated using a local search
procedure (Matlab "fminsearch") in order to maximize the correlation of the model
predictions with human generalization responses. As a result, we obtained $\alpha = 1.2031$ and
$\beta = 2.3536$, and the correlation of predicted mean responses with observed ones is $r = 0.975$
(see Table 5), which is not far from the target correlation $r = 0.988$; however, the difference
between these correlations is marginally significant using Williams' T2 test (Steiger, 1980;
Williams, 1959), that is, $T2(13) = 1.82$, $p < .10$. Moreover, the squared obtained correlation
($0.975^2 = 0.951$) is just below the lower limit of the 99.9% confidence interval of the
data ICC (0.959, 0.997), which indicates that there is room for model improvement, though we
are not far from an acceptable solution. In addition, the predicted standard deviations have a
correlation of $r = 0.653$, $p < .01$, with the observed standard deviations (see Table 5). This last
result could be trivial if there were a correlation between the means and the standard
deviations, but this is not the case, since this correlation is $r = 0.179$, n.s., for humans, and
$r = -0.106$, n.s., for the model. We now verify that the predictions of the ABI model are
significantly better than those of the other tested models. For this purpose, we first computed the
correlation between each participant's response pattern and each model's prediction pattern. The
resulting $r$ values were transformed to $z$ using Fisher's transformation
$z(r) = \ln((1 + r)/(1 - r))/2$, providing approximately normally distributed measures.
resulted in a sample of 16 measures (one per subject) for each model, and Student t tests were
applied for pair-wise comparisons of the 11 models. The results are reported in Table 6,
where one can see that the ABI model provided significantly better predictions than all other
tested models. However, one could object that the ABI model was fitted to the empirical data
optimizing two free parameters, while other models (except Qua) were trained/tuned without
reference to human responses. In such circumstances, one commonly uses model selection
criteria such as the Akaike Information Criterion (AIC: Akaike, 1974), or the Bayesian
Information Criterion (BIC: Schwarz, 1978), which takes into account both the goodness of fit
(maximum log-likelihood, for these criteria), and the number of model parameters optimized
in order to fit the empirical data. There is another powerful model selection criterion known
as the Bayes factor (BF), but unfortunately this criterion theoretically requires a prior
distribution to be defined on the space of model parameters. However, one knows that a rough
approximation of the Bayes factors can be computed from the BIC values without using a
prior distribution on the parameter space (Kass & Raftery, 1995, p. 778). Given two models,
say model 1 and model 2, with data fits such that $BIC_1 \le BIC_2$, the Bayes factor
approximation is given by $BF_{1,2} \approx \exp((BIC_2 - BIC_1)/2)$. An interpretation scale of the Bayes
factors proposed by Jeffreys (1961) is commonly used:
1 ≤ BF < 3.2: "Not worth more than a bare mention"
3.2 ≤ BF < 10: "Substantial"
10 ≤ BF < 100: "Strong"
BF ≥ 100: "Decisive"
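As a worked example of the BIC-based approximation, with two hypothetical BIC values:

BIC1 = 100;  BIC2 = 105;              % hypothetical values, with BIC1 <= BIC2
BF = exp((BIC2 - BIC1) / 2)           % approx. 12.18: "strong" on Jeffreys' scale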
The AIC and BIC for all models are reported in Table 7, together with the usual
squared correlation (explained proportion of variance), and the BFs with respect to the model
with minimal BIC (ABI model in the present case). As one can see in Table 7, both the AIC
and the BIC are minimal (and the explained proportion of variance is maximal) for the ABI
model compared to other models, confirming the previous analysis. Examining the
approximations of BFs, one can observe that the advantage of the ABI model over other
models is always at least "substantial". Moreover, if one arbitrarily sets the ABI model
parameters to simple values such as $\alpha = \beta = 2$, then the correlation of the model's predictions
with the empirical data becomes 0.967, which is not much less than the fit obtained with the
optimal tuning, and which remains better than the performance of all other models. This also
shows that the ABI model's behaviour has the desirable property of being moderately sensitive
to variations of the parameters.
So, we can conclude that the proposed model is a suitable starting point for modelling
quick multivariate function approximation. Figure 5 visualizes the four generalization
surfaces (I1, I2, E1, and E2) generated by the model with the optimal parameter values. In the
next section, we try a step forward, examining the role of the visual presentation of
approximation problems and the intra-individual variability of responses.
Table 5
Table 6
Table 7
Figure 5
7. Experiment 2
A critical assumption in the ABI model is that of the use of a Voronoi-like tessellation
of the data space. Such a tessellation is plausibly performed by the visual system at an early
stage in visual processing (Dry, 2008; Dry et al., 2009). However, one can ask what happens
if the data are presented in a way that does not allow the visual system to perform a Voronoi
tessellation. Is there some form of abstract Voronoi tessellation? If not, the ABI model could
lose its advantage over other models, or even appear completely inappropriate for such a
situation. This is the main point examined in Experiment 2. The function approximation
problems used in this experiment are more complex, less regular, and more numerous than
those of Experiment 1, in order to more closely mimic real life problems. We also address the
question of the intra-participant variability of generalization responses. The ABI model
predicts the intra-participant variability exactly in the same way as the inter-participant
variability, thus the intra-participant variability must exist (which is not a priori obvious!),
and it must be correlated with the predicted variability, as was the inter-participant variability
in Experiment 1. Finally, with an exploratory aim, we also recorded the response times in this
experiment.
7.1 Participant
As noted in Section 2.4, the generalization responses of participants in the quick
function approximation task are highly consistent (ICC = 0.985), so we can reasonably
consider that the responses provided by a single participant are representative enough of the
whole population's responses. This allows us to schedule a quite long, one-participant
experiment with repeated measures, which is suited to the goals stated in the section
introduction. The participant in this experiment was a 46-year-old man, a computer engineer,
with corrected-to-normal vision.
7.2 Material
A set of 64 bivariate function approximation problems was built in the following way.
Each problem included 8 data points plus one generalization point. The coordinates of points
were randomly generated integer values in the range 1-9, as were the function values for data
points. The following constraints were applied in order to ensure that regular continuous
functions could interpolate these points. The minimum distance between two distinct points in
the support space was fixed to 2, and the function values of data points fulfilled a Lipschitz
condition such that, for each pair of data points, the absolute difference of the two function
values divided by the input points' Euclidean distance was never greater than 3 (Lipschitz
constant). An additional set of 8 problems was built in the same way for practice. The
problems were presented in two different ways on a computer screen. In the "spatial
presentation" condition, the problems were presented in a 10×10 square, each point location
corresponding to its input coordinates, while the point was simply materialized by its function
value for data points, or by a question-mark (?) for the generalization point. Figure 6
illustrates the spatial presentation of a problem. In the "non-spatial presentation" condition,
each point was represented as a line segment in a small square, and the heights of the segment
extremities corresponded to the point coordinates. Data point function values were shown
above the corresponding small squares, while a question-mark was used for the generalization
point. Figure 7 illustrates the non-spatial presentation of the same problem as in Figure 6.
Figure 6
Figure 7
7.3 Procedure
In each presentation condition, each trial began with the display of a problem on a
computer screen, then the participant had to estimate the function value at the generalization
point, entering his response on the computer keyboard. As in Experiment 1, there was no time
pressure; however, the computer recorded the response time. If the entered response was
empty or not a number, then the participant was prompted to re-enter his response; however,
the response time was always recorded at the first response. The next trial was initiated when
a suitable response had been recorded. The experiment was divided into 8 sessions of about
1h30 each, distributed over 8 days. Each session began with 8 practice trials, followed by a
short pause. Then the 64 experimental problems were presented in random order, by
sequences of 8 trials, with a short pause between sequences. Only one presentation condition
was used in a given session. Odd rank sessions used the spatial presentation condition, and
even rank sessions used the non-spatial presentation condition. At the end of the experiment,
we had 4 responses and 4 response times in each of the 2 presentation conditions, for each of
the 64 problems.
7.4 Competing models
The same standard models as in Experiment 1 were used, in the same conditions, in
order to predict the generalization responses. However, the Qua model was excluded because
it was not applicable to problems with 8 data points. The MLP model always completed learning
with 7 hidden units, while the CCo model always learned with 5 hidden units.
7.5 Results
7.5.1 Effect of the presentation condition and tuning of the ABI model
Four different tunings of the ABI model parameters were used. The first one, denoted
ABI1, is simply the optimal tuning obtained in Experiment 1. The second one, denoted ABIa,
is the optimal tuning obtained by fitting the generalization data of the spatial presentation
condition in Experiment 2. The third one, denoted ABIb, is the optimal tuning obtained by
fitting the generalization data of the non-spatial presentation condition in Experiment 2. The
corresponding parameter values are presented in Table 8. Averaging the three values for each
parameter, we obtain an average α of 1.8669 and an average β of 2.3853. Rounding these
values to the nearest integers, we define a default tuning, denoted ABI0, as α = β =
2. The correlation values (r) were computed between model predictions and the averaged four
responses of the participant to each problem, in each presentation condition. One can see in
Table 8 that the fits of the ABI model to human responses, whatever the model tuning, are
substantially and significantly worse in the non-spatial presentation condition than in the
spatial presentation condition. Moreover, the correlation between human responses in the
non-spatial presentation condition and those in the spatial presentation condition is only r =
0.819, which, although highly significant (p< .001), is significantly less than the 0.947
correlation of ABI0 model predictions (for instance) with the human responses in the spatial
presentation condition (T2(61) = 4.85, p< .001). So, clearly, something important is lost when
changing the presentation of problems, but we need further analyses to know what is lost. One
can also note that the default tuning (ABI0) is almost as good as optimal tunings in both
presentation conditions.
Table 8
7.5.2 Comparison of models
Table 9 presents pair-wise comparisons of the prediction performance of 11 models,
by means of Williams' T2 tests (Steiger, 1980; Williams, 1959) on the correlations between
predicted and observed responses, in the spatial presentation condition. As one can see, both
ABIa and ABI0 provided significantly better predictions than all other models in this
presentation condition. Table 10 presents the values of the model selection criteria r², AIC, and BIC,
and the approximated Bayes factor (BF) with respect to the minimal BIC. Once again, ABIa and
ABI0 are the best models with respect to both criteria, and the results obtained in the spatial
presentation condition confirm those obtained in Experiment 1. Moreover, the Bayes factor
indicates that ABI0 is "strongly" preferable to ABIa, but both models under-fit the data in
terms of squared correlation. Tables 11 and 12 present comparisons similar to those of Tables
9 and 10, but for the data obtained in the non-spatial presentation condition. The ABIb and
ABI0 models remain the best predictors, but not significantly, in terms of correlation of their
predictions with human responses (see Table 11). However, AIC and BIC criteria are minimal
for the standard Lipschitz approximator (Lip) (see Table 12). This emergence of the Lip
model as a possible candidate in the non-spatial presentation condition is interesting, and the
elegant simplicity of this model certainly merits some attention; however, the Bayes factor
indicates that Lip did not perform substantially better than ABI0 in this presentation
condition. The approximated Bayes factor with respect to ABI0 (last column of Table 12)
shows that ABI0 remains at least "substantially" better than all other models, except Lip. One
can note that the non-spatial presentation condition discriminates the models poorly, and that
all fits are poorer than in the spatial presentation condition. What is clear is that the special
advantage of the ABI model over other models is lost in the non-spatial presentation
condition. So, this clear advantage of the ABI model in the spatial presentation condition
probably results from a particular performance of the visual system, such as the Voronoi-like
tessellation of the data space, which is taken into account in the ABI model, but not in the
other tested models. This answers the main question of this experiment. Figure 8 visualizes
the generalization surface generated by the ABI0 model from the data points of the problem
of Figure 6.
Table 9
Table 10
Table 11
Table 12
Figure 8
7.5.3 Interpolation and extrapolation problems
Contrary to the problems of Experiment 1, those of Experiment 2 were randomly generated,
and they can thus be considered as a random sample of bivariate function approximation
problems (with some specified characteristics such as the number of data points). It turned out
that, among the 64 problems of Experiment 2, 29 problems were interpolation problems
where the generalization point was inside the convex hull polytope of the data set, and the
remaining 35 problems were extrapolation problems. Table 13 presents a comparison of 10
models' performance (correlations of predictions with human generalization responses), using
only the ABI0 version of the ABI model (without free parameters). The models are compared
in both conditions of Experiment 2, distinguishing interpolation and extrapolation problems.
In the spatial presentation condition, the best predictor is the ABI model for both interpolation
and extrapolation problems, with a particularly strong advantage in extrapolation problems.
However, in the non-spatial presentation condition, the best predictor for interpolation
problems is the Nearest Neighbour Approximator, suggesting that the difficulty in visualizing
the problems induces a regression toward the simplest strategy. This is not true for
extrapolation problems, where the best predictor is the Hardy multiquadric model. A possible
explanation of this is that Hardy's basis functions rapidly become approximately linear along
their radii as the distance from their centre increases, which can lead to approximately linear
extrapolations, that is, to a simplification of the ABI process. This is also in line with
previously reported data concerning extrapolation in univariate function learning tasks
(DeLosh, Busemeyer, & McDaniel, 1997). However, most correlation differences are not
significant in the non-spatial presentation condition, and it is not clear what model is actually
the best predictor, given that it seems to be the Lipschitz approximator, closely followed by
the ABI model, when interpolation and extrapolation generalization responses are mixed.
Note that the ABI model never performed significantly worse than the best predictors, even
in the non-spatial presentation condition; however, its special advantage is clearly lost in this
case. One can also observe that, contrary to intuition, most models better capture the
regularities of human behaviour in extrapolation problems than in interpolation problems. The
reason for this is not clear, and the matter warrants further investigation.
Table 13
7.5.4 Response variability and response time
The results concerning response variability and response time are summarized in
Table 14. As expected, in the spatial presentation condition, there was a significant correlation
between the response standard deviations as predicted by the ABI model (ABIa and ABI0),
and the observed standard deviations. The obtained correlations are lower than in Experiment
1, but this is expected given that human response standard deviations were estimated from
only 4 measures each in Experiment 2, whereas they were estimated from 16 measures
(participants) in Experiment 1. The SD estimates in Experiment 2 are therefore clearly less
accurate than in Experiment 1, resulting in a loss of correlation. An interesting fact is that we
also observed a significant positive correlation between the SDs predicted by the ABI model
and the human response times. In the framework of the model, this suggests that participants
could increase the number of sampled bipoints as a function of the variance of the successive
linear estimates. Intuitively, this means that one samples more data when the sampled data are
inconsistent, which seems reasonable. This suggests a future development of the model,
concerning the determination of the number of sampled bipoints used in estimating
generalization responses (see the sketch at the end of this paragraph). Similar, but weaker,
correlations were observed in the non-spatial presentation
condition, and they were significant for response times only. Finally, the mean response time
per problem was 66 (± 40) seconds in the spatial presentation condition, and 52 (± 26)
seconds in the non-spatial presentation condition, which is about twice the mean response
time in Experiment 1. This increase in response time simply confirms that the problems of
Experiment 2 were more complex than those of Experiment 1.
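As a purely illustrative sketch of this suggestion (an assumption of ours, not part of the
model as specified in the Appendix; the function name SequentialABI and the stopping
parameters relTol and nMin are hypothetical), the averaging stage of the ABI function could
sample bipoint estimates one at a time and stop as soon as the weighted running mean
stabilizes, the number of sampled bipoints then providing a natural response time correlate:

function [est,n] = SequentialABI(w,fbip,relTol,nMin)
% Hypothetical sequential variant of the ABI averaging stage. The weights
% w(k) and scalar bipoint estimates fbip(k) are those computed inside the
% ABI function (Appendix). Bipoints are sampled in random order, and
% sampling stops when the weighted running mean changes by less than
% relTol (after at least nMin samples). The returned n, number of sampled
% bipoints, is a candidate correlate of response time.
nb=numel(w); order=randperm(nb); % random sampling order
sw=0; fx=0; prev=Inf;
for n=1:nb
    k=order(n);
    sw=sw+w(k); fx=fx+w(k)*fbip(k);
    est=fx/sw; % current weighted running mean
    if n>=nMin && abs(est-prev)<=relTol*max(1,abs(prev))
        return % the estimate has stabilized: stop sampling
    end
    prev=est;
end
end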
Table 14
8. Conclusion
This work is a first approach to quick multivariate function approximation tasks and
modelling. The proposed model (ABI) clearly exhibited very encouraging prediction
performance in suitably visualized approximation problems, and thus it constitutes a possible
starting point for further developments. The new model shares a number of characteristics
with several standard models belonging to univariate function learning and computer science
areas. It includes an inverse distance principle as the well-known Shepard interpolator
(Shepard, 1968), and it approximates non-linear functions by averaging local linear
interpolations/extrapolations from data points, just as the EXAM model (DeLosh et al., 1997).
Note that the POLE model (Kalish et al., 2004) also uses pieces of linear functions to
approximate non-linear functions, so there is strong convergence of several successful models
on this point. However, the ABI model relies on a Voronoi structuring of the data space, as
suggested by a body of empirical evidence concerning visual processing (Dry, 2008; Dry et
al., 2009). Moreover, the use of bipoints as elementary approximation units in
multidimensional spaces required the introduction of a special polytope distance, named
"exteriority" (Courrieu, 1994), in order to estimate the distance of any generalization point
from each relevant bipoint. These last elements were strongly suggested by a detailed
examination of
Experiment 1 data. Experiment 2 showed that the model specifically applies to problem
presentations in which the visual system can perform some form of Voronoi-like tessellation
of the data space. However, even in cases where such a visual tessellation is not possible, the
ABI model remains a top level predictor, together with other models such as the standard
Lipschitz interpolator, and possibly the Nearest Neighbour Approximator in interpolation
problems, or the Hardy multiquadric model in extrapolation problems. This leaves open the
question of a possible abstract (non visual) Voronoi tessellation.
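To make the shared inverse distance principle concrete, a minimal Matlab sketch of a
basic Shepard (1968) interpolator is given below, in the style of the Appendix (the function
name Shepard and the power parameter p are conventions of this sketch, which was not used
in the reported simulations; p = 2 is the usual basic choice). Whereas ABI weights local linear
bipoint estimates by a decreasing function of their exteriority, Shepard's scheme directly
weights the data values by inverse powers of their distances from the test point:

function FT = Shepard(XD,FD,XT,p)
% Basic inverse distance weighting: expected output (FT) at test input
% points (XT), given data output (FD) at data input points (XD).
% Points=row vectors.
if nargin<4, p=2; end % usual power parameter
[ND,dF]=size(FD); [NT,dX]=size(XT);
FT=zeros(NT,dF);
for t=1:NT
    d=sqrt(sum((XD-ones(ND,1)*XT(t,:)).^2,2)); % distances to data points
    [dmin,i]=min(d);
    if dmin<=sqrt(eps)
        FT(t,:)=FD(i,:); % test point coincides with a data point
    else
        w=1./d.^p; % inverse distance weights
        FT(t,:)=(w'*FD)/sum(w); % weighted average of data outputs
    end
end
end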
The ABI model can work on spaces of any dimension, and we specifically studied
bivariate functions in this article. However, one can also restrict the support space to one
variable in order to compare the generalization behaviour of the ABI model with that of well-
known function learning models. This is particularly relevant in the case of the EXAM model
(DeLosh et al., 1997), because EXAM also builds predictions by averaging linear
interpolations and extrapolations from data points, but in a way that requires the data points to
be strictly ordered, that is, a one-dimensional support space. Figure 9 shows a comparison of
ABI and EXAM generalization responses from 5 data points belonging to 3 univariate
functions (linear, exponential, and quadratic) tested by DeLosh et al. (1997), and a fourth
function which is a perturbed variant of the quadratic function. As one can see, ABI and
EXAM provide the same predictions in the linear case, and their generalization responses are
very similar on the interpolation and nearby extrapolation areas of other functions. However,
discrepancies appear between the two models' predictions on the far extrapolation areas of
non-linear functions. As one can see in Figure 9, while EXAM extrapolates in a strictly linear
way from the nearest data points, ABI far extrapolations gradually tend to take into account
the whole set of data points, which draws the function towards the average of all bipoint
extrapolations. Examining the human and EXAM generalization responses in Figure 9
(exponential function) and Figure 10 (quadratic function) of DeLosh et al. (1997), one can
observe that human responses conform more closely to EXAM predictions in the backward
(low) extrapolation area, while they conform more closely to ABI predictions in the forward
(high) extrapolation area. So, for the moment, neither of the two models should be preferred
for predicting one-variable function extrapolation. We can also note that the perturbation of
the quadratic function has the same limited effect on the predictions of both models. Exact
interpolators (unlike smoothers) are frequently highly sensitive to small variations in the data,
which can generate undesirable oscillations of the generalization function. No such problem
appeared with the ABI and EXAM models, and their reaction to local data variations seems
very reasonable. The ABI model was not designed to manage noisy data. This is because the
task of generalizing from a small set of sparse data does not allow participants to detect
possible noise, so participants have no choice but to consider the data as exact. However, the
ABI model itself is not limited to small data sets, and it could be used to model other tasks,
possibly using large sets of noisy data, provided that some noise management mechanism is
added to the model. In order to remain within the direct approximation philosophy, a possible
choice is to apply a local smoother, such as a convolution filter, to the raw data in order to
generate a suitable filtered data set before applying the ABI process (see the sketch below).
However, this is a substantial extension of the model, and of its application domain, that
clearly belongs to future work.
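As a minimal sketch of such a pre-filter (our assumption, not part of the ABI model; the
function name SmoothData, the Gaussian kernel, and the bandwidth h are hypothetical
illustrative choices), each data output could be replaced by a locally weighted average of the
data outputs before calling the ABI function:

function FS = SmoothData(XD,FD,h)
% Gaussian convolution smoother over the input space: each data output is
% replaced by a weighted average of all data outputs, with weights that
% decrease with the squared input distance (bandwidth h).
[ND,dF]=size(FD);
FS=zeros(ND,dF);
for i=1:ND
    d2=sum((XD-ones(ND,1)*XD(i,:)).^2,2); % squared distances to point i
    w=exp(-d2/(2*h^2)); % Gaussian kernel weights
    FS(i,:)=(w'*FD)/sum(w); % locally smoothed output
end
end

One would then simply call ABI(XD, SmoothData(XD,FD,h), XT), so that the direct
approximation machinery itself remains unchanged.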
Figure 9
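For readers who wish to reproduce this kind of univariate comparison, the following usage
sketch restricts the ABI function (Appendix) to a one-variable support space; the exponential
function and the grids below are illustrative choices of ours, not the exact DeLosh et al.
(1997) material:

% Usage sketch: ABI on a one-variable support space (default ABI0 tuning)
XD = (10:20:90)';             % five data input points on a single variable
FD = 200*(1-exp(-XD/25));     % illustrative exponential outputs
XT = (0:2:100)';              % generalization grid (inter- and extrapolation)
EFT = ABI(XD,FD,XT);          % expected generalization responses
plot(XT,EFT,'-',XD,FD,'o');   % generalization curve and data points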
It remains to complete the ABI model with some missing variables (linear
interpolation/extrapolation error, response rounding, and number of sampled bipoints per
response), in order to build a full simulation model. Finally, we note that there is clearly room
for model improvement since all models, including ABI, under-fitted the empirical data in
terms of squared correlation with respect to the human data ICC (Courrieu et al., 2011). This
new model testing methodology has the advantage of providing a clear answer to the
questions of under-fitting and over-fitting of the data by the models, while traditional model
selection criteria only allow one to compare several given models with each other, without
any indication of their plausibility. In the present case, the model testing methodology based
on the
data ICC provides a quite severe diagnostic for all tested models, encouraging us to be
modest, and to envisage further investigations on quick multivariate function approximation.
______________________
Appendix
Matlab code of the ABI function (for "Average of Bipoints Interpolations") that
implements the proposed model (for academic use only, since exceptions are not managed).
Comments (following "%") provide indications about the use of the function.
function [EFT,VFT] = ABI(XD,FD,XT,alpha,beta)
% Average of Bipoints Interpolations
% Expected output (EFT) and variance (VFT) at test input points (XT),
% given data output (FD) at data input points (XD). Points=row vectors.
[ND,dX]=size(XD); [ND1,dF]=size(FD); [NT,dX1]=size(XT);
if nargin<5, alpha=2; beta=2; end % default parameters
EFT=zeros(NT,dF); VFT=EFT;
Bip=Neighbors(XD); [NBip,s]=size(Bip); % bipoints = pairs of Voronoi neighbors
for t=1:NT
    x=XT(t,:); dnn=inf;
    for i=1:ND % nearest data point distance (dnn) and index (nn)
        d=norm(x-XD(i,:));
        if d<dnn, dnn=d; nn=i; end
    end
    if dnn<=sqrt(eps) % x coincides with a data point: return its exact value
        EFT(t,:)=FD(nn,:); VFT(t,:)=zeros(1,dF);
    else
        sw=0; fx=0; f2x=0;
        for k=1:NBip
            a=XD(Bip(k,1),:); b=XD(Bip(k,2),:);
            fa=FD(Bip(k,1),:); fb=FD(Bip(k,2),:);
            ext=Exteriority(x,a,b);
            w=exp(-beta*(ext/dnn)^alpha); sw=sw+w; % weight of bipoint k
            fbip=fa+(fb-fa)*((x-a)*(b-a)')/((b-a)*(b-a)'); % linear estimate at x
            fx=fx+w*fbip; f2x=f2x+w*(fbip.*fbip);
        end
        fx=fx/sw; f2x=f2x/sw;
        EFT(t,:)=fx; VFT(t,:)=f2x-fx.^2; % weighted mean and variance
    end
end
end

function ext=Exteriority(x,a,b)
% Exteriority of point x from line segment ab
% (equal to the Euclidean distance from x to the segment)
C=a-b; CC=sum(C.*C);
if CC==0
    ext=norm(x-a);
else
    A=x-a; B=x-b;
    e2=sum(A.*A)*sum(B.*B)-sum(A.*B)^2; % CC times squared distance to line (ab)
    e2=e2+0.25*(abs(sum(A.*C))+abs(sum(B.*C))-CC)^2; % plus CC times squared overshoot beyond [a,b]
    ext=sqrt(max(0,e2/CC));
end
end

function pairs=Neighbors(X)
% Index pairs of all Voronoi neighbors from matrix X
[m,n]=size(X); pairs=[];
for i=1:(m-1)
    for j=(i+1):m
        % first candidate centre: the midpoint of [i,j]
        mid=0.5*(X(i,:)+X(j,:));
        Y=X-ones(m,1)*mid; D2=sum(Y.^2,2); minD2=min(D2);
        if minD2==D2(i) % no other data point closer to the centre than i and j
            pairs=[pairs;[i j]];
        else
            % otherwise, try a second candidate centre on the bisector of [i,j]
            A=eye(n)-Y(i,:)'*Y(i,:)/(Y(i,:)*Y(i,:)');
            k=find(D2==minD2); k=k(1);
            B=2*Y(k,:)*A;
            C=-(D2(k)+D2(i))*B/(B*B');
            D2=sum((Y-ones(m,1)*C).^2,2); minD2=min(D2);
            if minD2==D2(i), pairs=[pairs;[i j]]; end
        end
    end
end
end
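For instance, the following call (a usage sketch based on the data of Table 1) computes the
expected responses and variances of the model for the four interpolation problems of function
I1 in Experiment 1, with the default parameters (ABI0 tuning):

XD = [1 1; 1 9; 9 1; 9 9; 5 5];   % data input points (Table 1)
FD = [15; 10; 20; 5; 5];          % I1 output values at the data points
XT = [1 6; 6 9; 9 4; 4 1];        % generalization points of problems I1.1-I1.4
[EFT,VFT] = ABI(XD,FD,XT);        % default alpha=2, beta=2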
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, AC-19, 716-723.
Ashby, F.G., & Ell, S.W. (2002). Single versus multiple systems of category learning: Reply
to Nosofsky and Kruschke (2002). Psychonomic Bulletin & Review, 9 (1), 175–180.
Beliakov, G. (2006). Interpolation of Lipschitz functions. Journal of Computational and
Applied Mathematics, 196, 20–44.
Ben-Israel, A., & Greville, T.N.E. (2003). Generalized Inverses: Theory and Applications
(2nd ed.). New York, Springer. 420 pages.
Bott, L., & Heit, E. (2004). Nonmonotonic extrapolation in function learning. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 30(1), 38-50.
Brodlie, K.W., Asim, M.R., & Unsworth, K. (2005). Constrained visualization using the
Shepard interpolation family. Computer Graphics Forum, 24(4), 809-820.
Busemeyer, J.R., Myung, I.J., & McDaniel, M.A. (1993). Cue competition effects:
Theoretical implications for adaptive network learning models. Psychological Science,
4.
Courrieu, P. (1994). Three algorithms for estimating the domain of validity of feedforward
neural networks. Neural Networks, 7, 169-174.
Courrieu, P. (1997). The Hyperbell Algorithm for global optimization: a random walk using
Cauchy densities. Journal of Global Optimization, 10, 37-55.
Courrieu, P. (2004). Solving time of least square systems in Sigma-Pi unit networks. Neural
Information Processing: Letters and Reviews, 4(3), 39-45.
Courrieu, P. (2005). Function approximation on non-Euclidean spaces. Neural Networks, 18,
91-102.
Courrieu, P. (2009). Fast solving of Weighted Pairing Least-Squares systems. Journal of
Computational and Applied Mathematics, 231, 39-48.
Courrieu, P., Brand-D'Abrescia, M., Peereman, R., Spieler, D., & Rey, A. (2011). Validated
intraclass correlation statistics to test item performance models. Behavior Research
Methods, 43, 37-55. doi: 10.3758/s13428-010-0020-5
(preprint: http://arxiv.org/abs/1010.0173).
DeLosh, E.L., Busemeyer, J.R., & McDaniel, M.A. (1997). Extrapolation: the sine qua non
for abstraction in function learning. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 23(4), 968-986.
Dry, M. J. (2008). Using relational structure to detect symmetry: a Voronoi tessellation based
model of symmetry perception. Acta Psychologica, 128, 75-90.
Dry, M.J., Navarro, D.J., Preiss, K., & Lee, M.D. (2009). The perceptual organization of point
constellations. In N. Taatgen, H. van Rijn, J. Nerbonne, & L. Schomaker (Eds.),
Proceedings of the 31st Annual Conference of the Cognitive Science Society, 1151-
1156. Austin, TX: Cognitive Science Society.
Fahlman, S.E., & Lebiere, C. (1990). The Cascade-Correlation learning architecture. In D.S.
Touretzky (Ed.): Advances in Neural Information Processing Systems, 2. San Mateo,
CA: Morgan Kaufmann Publishers, pp. 525-532.
Gärtner, B., & Jaggi, M. (2009). Coresets for polytope distance. Proceedings of the 25th
Annual ACM Symposium on Computational Geometry, 33-42.
Girosi, F., & Poggio, T. (1990). Networks and the best approximation property. Biological
Cybernetics, 63, 169-176.
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford, UK: Oxford University Press.
Kalish, M.L., Lewandowsky, S., & Kruschke, J.K. (2004). Population of linear experts:
knowledge partitioning and function learning. Psychological Review, 111(4), 1072-
1099.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
Association, 90(430), 773-795.
Kelley, H., & Busemeyer, J. (2008). A comparison of models for learning how to dynamically
integrate multiple cues in order to forecast continuous criteria. Journal of Mathematical
Psychology, 52, 218–240.
Koh, K. (1993). Induction of combination rules in two-dimensional function learning.
Memory & Cognition, 21(5), 573-590.
Lewandowsky, S., Kalish, M., & Ngang, S.K. (2002). Simplified learning in complex
situations: knowledge partitioning in function learning. Journal of Experimental
Psychology: General, 131(2), 163-193.
Luce, R.D. (1977). The choice axiom after twenty years. Journal of Mathematical
Psychology, 15, 215-233.
McDaniel, M.A., & Busemeyer, J.R. (2005). The conceptual basis of function learning and
extrapolation: comparison of rule-based and associative-based models. Psychonomic
Bulletin & Review, 12 (1), 24-42.
McDaniel, M.A., Dimperio, E., Griego, J.A., & Busemeyer, J.R. (2009). Predicting transfer
performance: a comparison of competing function learning models. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 35(1), 173-195.
McGraw, K.O., & Wong, S.P. (1996). Forming inferences about some intraclass correlation
coefficients. Psychological Methods, 1(1), 30-46.
Micchelli, C.A. (1986). Interpolation of scattered data: distance matrices and conditionally
positive definite functions. Constructive Approximation, 2, 11-22.
Moody, J., & Darken, C.J. (1989). Fast learning in networks of locally-tuned processing units.
Neural Computation, 1, 281-294.
Nosofsky, R.M., & Kruschke, J.K. (2002). Single-system models and interference in category
learning: Commentary on Waldron and Ashby (2001). Psychonomic Bulletin & Review,
9, 169-174.
Okabe, A., Boots, B., Sugihara, K., & Chiu, S.N. (2000). Spatial Tessellations - Concepts and
Applications of Voronoi Diagrams (2nd ed.). Chichester, John Wiley. 671 pages.
Pelillo, M. (1996). A relaxation algorithm for estimating the domain of validity of
feedforward neural networks. Neural Processing Letters, 3, 113-121.
Pitt, M.A., & Myung, I.J. (2002). When a good fit can be bad. Trends in Cognitive Sciences,
6(10), 421-425.
Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the
IEEE, 78(9), 1481-1497.
Renka, R.J. (1988). Multivariate interpolation of large sets of scattered data. ACM
Transactions on Mathematical Software, 14(2), 139-148.
Rey, A., Courrieu, P., Schmidt-Weigand, F., & Jacobs, A.M. (2009). Item performance in
visual word recognition. Psychonomic Bulletin & Review, 16(3), 600-608.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by
error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.): Parallel Distributed
Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT
Press, pp. 318-362.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data. Proceedings
of the ACM National Conference, 517-524.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science.
Science, 237, 1317-1323.
Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological
Bulletin, 87(2), 245-251.
Williams, E. J. (1959). The comparison of regression variables. Journal of the Royal
Statistical Society, Series B, 21, 396-399.
Yoon, J. (2001). Interpolation by Radial Basis Functions on Sobolev space. Journal of
Approximation Theory, 112, 1-15.
-----------------------------------------------------------------------------------------------------------------
AUTHOR NOTE
The author would like to thank Mark McDaniel and an anonymous reviewer for their
constructive comments concerning this work. Many thanks also to Matthew Dry and John
Kruschke for their helpful suggestions.
Legends and Captions
Table 1. Input-output values of the data sets, and input coordinates of the generalization
points for interpolation and extrapolation problems. Each of the 16 approximation problems
(e.g. "E1.2") consists of a given sampled function (e.g. "E1"), together with a generalization
point (e.g. ".2"), as illustrated in Figure 1.
Table 2. Individual responses of 16 participants to the 16 problems, with mean and standard
deviation for each problem.
Table 3. Generalization responses provided by 10 standard models for the 16 problems,
compared to mean human responses ("Hum"), and correlations between models' responses
and human responses.
Table 4. Data and results of the model class characterization procedure. Each coefficient in
the weight vector WI or WE is the value, at the generalization point, of the weighting function
of the data point whose distance from this generalization point is just above (in the "Dist."
row). The corresponding data function values are given for each problem. Weighting and
summing them in each row, one obtains the values in the "Obtained" column, which are least-
squares approximations of the corresponding human responses in the "Target" column. The
"Const." rows correspond to the unit sum constraint of the model class.
Table 5. Mean values and standard deviations of the responses to the 16 problems provided
by humans and the proposed model (ABI function listed in Appendix). In the last row,
correlations between the predictions and the observations.
Table 6. Pair-wise comparisons of the prediction performance of 11 models by means of
Student t tests between samples (16 subjects) of Fisher r-to-z transformations, where each r
value is the correlation between one subject response pattern and one model prediction
pattern. A positive t value indicates that the row entry model provided better predictions than
the column entry model, while the inverse is true for negative t values. The ABI model
provided significantly better predictions than all other tested models.
Table 7. Number of fitted parameters, explained proportion of variance (r2), AIC, and BIC for
11 models tested with the data of Experiment 1. The best model is the one having minimal
AIC and BIC, and maximal r2. The last column provides approximated Bayes factors with
respect to the minimal BIC. Bayes factors greater than 3.2 are considered at least
"substantial". The table also provides the data ICC with its 99.9% confidence interval for
comparison to r2.
Table 8. Comparison (using Williams T2 tests) of the correlations between ABI model
predictions and the responses observed in the two presentation conditions of Experiment 2.
Model's predictions were generated using four tunings of the parameters: the optimal tuning
for Experiment 1 (ABI1), the optimal tuning for the spatial presentation of Experiment 2
(ABIa), the optimal tuning for the non-spatial presentation of Experiment 2 (ABIb), and a
default tuning (ABI0). The table also shows the corresponding parameter values.
Table 9. Pair-wise comparisons of the prediction performance of 11 models, by means of
Williams T2 tests on the correlations between predicted and observed responses, in the spatial
presentation condition of Experiment 2. A positive T2 value indicates that the row entry
model provided better predictions than the column entry model, while the inverse is true for
negative T2 values. The ABIa and ABI0 models provided significantly better predictions than
all other tested models. The last row provides the obtained correlation for each model.
Table 10. Number of fitted parameters, explained proportion of variance (r2), AIC and BIC
criteria, and approximated Bayes factors with respect to the minimal BIC, for 11 models
tested with the data of the spatial presentation condition of Experiment 2.
Table 11. Pair-wise comparisons of the prediction performance of 11 models, by means of
Williams T2 tests on the correlations between predicted and observed responses, in the non-
spatial presentation condition of Experiment 2. A positive T2 value indicates that the row
entry model provided better predictions than the column entry model, while the inverse is true
for negative T2 values. The ABIb and ABI0 models provided just, but not significantly better
predictions than standard models. The last row provides the obtained correlation for each
model.
Table 12. Number of fitted parameters, explained proportion of variance (r2), AIC and BIC
criteria, and approximated Bayes factors with respect to the two lowest BICs, for 11 models
tested with the data of the non-spatial presentation condition of Experiment 2.
Table 13. Correlations between 10 models' predictions and human generalization responses in
Experiment 2, in the spatial and non-spatial presentation conditions, for random interpolation
problems (N=29), and for random extrapolation problems (N=35).
Table 14. Summary of the correlations observed between the predicted standard deviations of
responses, with three tunings of the ABI model (ABI0, ABIa, and ABIb), the observed
response standard deviations, and the averaged response times for each problem, in the spatial
and non-spatial presentation conditions of Experiment 2.
Figure 1. An example of test sheet ("E1.2" extrapolation problem) in Experiment 1. The
function value in the empty circle must be approximated.
Figure 2. Generalization surfaces for the E1 function (dots are data points) generated by four
standard models (She, Lip, Har, and Phi).
Figure 3. Visualization of the data points (in black on white background) and average human
generalization responses (in white on black background) for the 4 experimental functions of
Experiment 1 (with 4 problems per function).
Figure 4. Examples of exteriority functions (first row) from bipoints whose extremity points
are marked by small white squares. The bipoint in the left column belongs to the
interpolation data set of Experiment 1, while the bipoint in the right column belongs to the
extrapolation data set. The second row shows the nearest data point distance function (d0) for
the interpolation (left) and the extrapolation (right) data sets, which also visualizes the
Voronoi tessellation of these sets. The third row shows the ν functions of the corresponding
bipoints, using the parameter values α=1.2031 and β=2.3536.
Figure 5. Generalization surfaces for the I1, I2, E1, and E2 functions (dots are data points)
generated by the ABI model (function listed in Appendix), with α=1.2031 and β=2.3536.
Figure 6. An example of function approximation problem as displayed in the spatial
presentation condition of Experiment 2.
Figure 7. An example of function approximation problem as displayed in the non-spatial
presentation condition of Experiment 2. This is formally the same problem as in Figure 6.
Figure 8. Visualization of the generalization surface generated by the ABI0 model from the
data points of Figure 6 (dots are data points).
Figure 9. Comparison of ABI and EXAM generalization responses from 5 data points
belonging to 3 univariate functions (linear, exponential, and quadratic) tested by DeLosh et al.
(1997), and a fourth function which is a perturbed variant of the quadratic function. The
default parameter values were used for the ABI model, while the EXAM model parameters
were set to γ=0.9 and α=0.1.
Table 1

Interpolation Data                     Extrapolation Data
Input (x, y)   Output I1   I2         Input (x, y)   Output E1   E2
(1, 1)         15          10         (1, 6)         8           3
(1, 9)         10          5          (6, 9)         3           3
(9, 1)         20          15         (9, 4)         18          13
(9, 9)         5           5          (4, 1)         13          8
(5, 5)         5           20         (5, 5)         3           18

Test point   Interpolation (Input x, Input y)   Extrapolation (Input x, Input y)
.1           (1, 6)                             (1, 1)
.2           (6, 9)                             (1, 9)
.3           (9, 4)                             (9, 1)
.4           (4, 1)                             (9, 9)
Table 2

Problem  Subject: 1  2   3   4   5   6   7   8   9   10  11  12  13  14  15  16   Mean   SD
I1.1     7   9   8   12  8   12  11  10  10  11  12  12  12  8   8   10   10.00  1.79
I1.2     5   7   5   6   6   6   6   7   7   6   5   7   7   6   8   5    6.19   0.91
I1.3     10  13  10  13  8   10  12  10  15  12  10  10  10  14  10  10   11.06  1.88
I1.4     15  17  17  16  16  18  16  17  17  16  18  18  17  16  15  15   16.50  1.03
I2.1     13  11  8   8   7   12  7   11  7   8   7   15  8   7   12  10   9.44   2.58
I2.2     5   6   5   9   5   5   6   5   5   7   5   5   5   7   9   5    5.88   1.41
I2.3     15  17  17  14  15  18  15  10  10  12  10  10  13  13  10  10   13.06  2.89
I2.4     13  14  13  12  12  13  14  12  13  13  10  18  13  12  12  10   12.75  1.81
E1.1     11  10  10  11  10  10  13  10  10  12  10  15  15  10  8   15   11.25  2.14
E1.2     5   3   6   7   8   5   8   5   5   6   5   5   0   5   6   5    5.25   1.88
E1.3     15  20  16  19  24  21  18  15  18  20  20  20  20  20  12  15   18.31  3.00
E1.4     3   2   5   4   2   4   3   3   5   6   8   3   0   7   10  5    4.38   2.50
E2.1     6   6   5   10  5   5   8   6   10  10  3   9   5   5   8   5    6.62   2.22
E2.2     3   0   2   5   1   2   0   3   2   2   3   4   0   2   6   3    2.38   1.71
E2.3     11  7   11  19  12  10  10  10  15  10  8   15  10  16  10  8    11.38  3.26
E2.4     2   -1  7   10  5   5   3   3   7   7   8   3   7   2   8   5    5.06   2.89
Table 3

Problem  Hum    NNA    Lip    Gau    Phi    Har    Spl     MLP    CCo    She    Qua
I1.1     10.0   10.0   9.0    9.4    9.1    9.4    8.5     9.2    9.2    9.8    7.3
I1.2     6.2    5.0    5.1    4.0    5.3    4.6    3.3     4.5    4.8    7.2    4.4
I1.3     11.1   20.0   14.0   12.8   14.3   12.1   10.8    11.6   13.0   12.9   9.8
I1.4     16.5   15.0   11.5   14.7   12.9   14.1   13.2    15.3   14.9   12.6   14.4
I2.1     9.4    5.0    11.0   10.5   9.7    10.1   16.1    10.9   9.9    10.1   12.8
I2.2     5.9    5.0    11.0   8.7    9.0    8.3    14.1    8.2    10.3   9.6    9.6
I2.3     13.1   15.0   13.7   15.8   15.4   14.5   20.3    17.2   15.2   13.8   17.2
I2.4     12.8   10.0   13.5   15.9   13.5   14.8   20.9    13.8   16.5   12.8   16.5
E1.1     11.2   13.0   12.8   8.9    13.1   15.0   19.2    12.9   14.1   10.3   17.5
E1.2     5.2    8.0    8.0    5.3    7.9    8.4    8.6     7.5    6.3    7.2    9.7
E1.3     18.3   18.0   15.3   13.4   18.2   21.9   30.0    18.8   26.1   13.2   28.5
E1.4     4.4    3.0    6.9    5.1    3.7    11.1   14.0    7.2    11.5   6.8    12.8
E2.1     6.6    8.0    8.2    2.6    7.5    5.3    -23.2   6.3    6.2    8.6    -4.3
E2.2     2.4    3.0    5.7    -0.1   2.6    1.0    -29.8   1.1    -6.3   6.1    -10.3
E2.3     11.4   13.0   13.0   7.0    12.6   12.1   -12.5   8.0    14.4   11.6   5.0
E2.4     5.1    3.0    5.7    1.9    3.0    5.7    -21.7   7.9    6.8    7.5    -3.1
r               .840   .839   .851   .916   .880   .619    .905   .858   .910   .759
Table 4

Problem  Reordered interpolation output data        Target   Obtained
I1.1     10.00   5.00    15.00   5.00    20.00      10.00    10.74
I2.1     5.00    20.00   10.00   5.00    15.00      9.44     8.83
I1.2     5.00    5.00    10.00   20.00   15.00      6.19     6.21
I2.2     5.00    20.00   5.00    15.00   10.00      5.88     6.88
I1.3     20.00   5.00    5.00    15.00   10.00      11.06    12.01
I2.3     15.00   20.00   5.00    10.00   5.00       13.06    12.30
I1.4     15.00   5.00    20.00   10.00   5.00       16.50    16.29
I2.4     10.00   20.00   15.00   5.00    5.00       12.75    14.12
Const.   1.00    1.00    1.00    1.00    1.00       1.00     1.00
Dist.    3.0000  4.1231  5.0000  8.5440  9.4340
WI       0.5003  0.1583  0.4244  -0.0160 -0.0670

Problem  Reordered extrapolation output data        Target   Obtained
E1.1     13.00   8.00    3.00    18.00   3.00       11.25    11.84
E2.1     8.00    3.00    18.00   13.00   3.00       6.62     6.79
E1.2     8.00    3.00    3.00    13.00   18.00      5.25     6.28
E2.2     3.00    3.00    18.00   8.00    13.00      2.38     2.43
E1.3     18.00   13.00   3.00    3.00    8.00       18.31    18.23
E2.3     13.00   8.00    18.00   3.00    3.00       11.38    12.97
E1.4     3.00    18.00   3.00    8.00    13.00      4.38     5.50
E2.4     3.00    13.00   18.00   3.00    8.00       5.06     5.03
Const.   1.00    1.00    1.00    1.00    1.00       1.00     1.00
Dist.    3.0000  5.0000  5.6569  8.5440  9.4340
WE       0.8854  0.2098  0.0048  -0.0710 -0.0290

r = .988
Table 5

         Mean values         Standard deviations
Problem  Human    Model      Human    Model
I1.1     10.00    10.59      1.79     2.83
I1.2     6.19     6.16       0.91     1.75
I1.3     11.06    13.73      1.88     3.63
I1.4     16.50    15.14      1.03     3.29
I2.1     9.44     8.47       2.58     4.11
I2.2     5.88     7.04       1.41     4.49
I2.3     13.06    13.07      2.89     3.70
I2.4     12.75    12.99      1.81     3.47
E1.1     11.25    11.50      2.14     3.38
E1.2     5.25     5.96       1.88     3.87
E1.3     18.31    18.81      3.00     5.23
E1.4     4.38     4.71       2.50     4.66
E2.1     6.62     7.00       2.22     4.35
E2.2     2.38     2.53       1.71     4.50
E2.3     11.38    13.87      3.26     4.32
E2.4     5.06     5.12       2.89     5.84
r        0.975               0.653
Table 6

t(15)  NNA     Lip     Gau     Phi     Har     Spl     MLP     CCo     She     Qua
Lip    -0.16
Gau    0.62    1.03
Phi    5.14    7.30    3.15
Har    2.08    2.41    0.93    -2.28
Spl    -5.84   -7.77   -12.62  -10.12  -13.22
MLP    2.72    3.21    3.26    -0.37   2.16    12.93
CCo    0.96    1.04    -0.08   -3.17   -2.80   9.29    -2.84
She    4.20    7.40    3.74    -0.65   1.77    10.66   0.00    2.91
Qua    -2.75   -3.47   -5.27   -6.91   -10.45  16.73   -9.14   -5.46   -6.71
ABI    19.70   9.83    4.93    4.97    6.58    10.90   3.25    8.98    4.78    8.75

Significance: |t(15)| ≥ 2.13, p<.05; |t(15)| ≥ 2.95, p<.01; |t(15)| ≥ 4.07, p<.001
Table 7

Model   Fitting parameters   r2      AIC      BIC      BF(ABI, *)
NNA     0                    0.705   28.86    28.86    >1000
Lip     0                    0.705   22.50    22.50    652
Gau     0                    0.725   25.09    25.09    >1000
Phi     0                    0.838   12.28    12.28    3.94
Har     0                    0.775   23.08    23.08    871
Spl     0                    0.383   793.95   793.95   >1000
MLP     0                    0.819   14.43    14.43    11.53
CCo     0                    0.737   53.88    53.88    >1000
She     0                    0.828   19.86    19.86    174.16
Qua     1                    0.576   147.91   148.69   >1000
ABI     2                    0.950   8.00     9.54     1

Data ICC = 0.985, .999 confidence interval: [0.959, 0.997]
Table 8

Model tuning   α        β        Spatial presentation   Non-spatial presentation   Williams T2 test
ABI0           2        2        r = 0.947              r = 0.782                  p < .001
ABI1           1.2031   2.3536   r = 0.931              r = 0.782                  p < .001
ABIa           3.1757   1.4783   r = 0.948              r = 0.781                  p < .001
ABIb           1.2219   3.3240   r = 0.939              r = 0.787                  p < .001
Table 9

T2(61)  ABIa    ABI0    CCo     Gau     Har     Lip     MLP     NNA     Phi     She     Spl
ABI0    -0.54
CCo     -7.02   -6.94
Gau     -7.66   -7.59   -0.85
Har     -4.35   -4.26   3.22    3.27
Lip     -3.83   -3.73   3.14    4.11    0.80
MLP     -3.83   -3.71   5.46    4.10    0.58    -0.34
NNA     -6.92   -6.68   1.00    1.82    -1.47   -3.32   -1.77
Phi     -4.14   -3.91   3.30    4.48    0.93    0.25    0.49    5.69
She     -6.45   -6.40   0.50    1.35    -2.16   -4.41   -2.29   -0.66   -3.70
Spl     -9.15   -9.02   -2.33   -1.21   -4.60   -4.86   -5.78   -2.69   -5.06   -2.22
r       0.948   0.947   0.712   0.650   0.850   0.877   0.865   0.782   0.881   0.749   0.545

Significance: |T2(61)| ≥ 2, p<.05; |T2(61)| ≥ 2.66, p<.01; |T2(61)| ≥ 3.46, p<.001
Table 10

Model   Fitting parameters   r2      AIC      BIC      BF(ABI0, *)
ABIa    2                    0.899   74.6     79.0     18.84
ABI0    0                    0.897   73.1     73.1     1
CCo     0                    0.506   629.1    629.1    >1000
Gau     0                    0.423   1076.3   1076.3   >1000
Har     0                    0.722   215.0    215.0    >1000
Lip     0                    0.768   191.7    191.7    >1000
MLP     0                    0.748   276.5    276.5    >1000
NNA     0                    0.611   305.4    305.4    >1000
Phi     0                    0.777   159.2    159.2    >1000
She     0                    0.560   369.5    369.5    >1000
Spl     0                    0.297   3709.8   3709.8   >1000

Data ICC = 0.978, .999 confidence interval: [0.957, 0.989]
Table 11

T2(61)  ABIb    ABI0    CCo     Gau     Har     Lip     MLP     NNA     Phi     She     Spl
ABI0    -0.47
CCo     -2.28   -2.23
Gau     -3.74   -3.60   -1.42
Har     -0.24   -0.14   2.55    3.29
Lip     -0.62   -0.52   1.65    3.17    -0.27
MLP     -0.88   -0.81   2.62    3.16    -0.88   -0.32
NNA     -0.29   -0.19   1.68    3.13    -0.04   0.29    0.46
Phi     -0.24   -0.11   1.92    3.67    0.05    0.58    0.62    0.19
She     -1.55   -1.45   0.66    2.09    -1.33   -1.60   -0.67   -1.38   -1.66
Spl     -4.11   -4.10   -2.30   -0.65   -3.96   -3.33   -3.89   -3.31   -3.63   -2.32
r       0.787   0.782   0.646   0.529   0.776   0.764   0.747   0.774   0.779   0.701   0.466

Significance: |T2(61)| ≥ 2, p<.05; |T2(61)| ≥ 2.66, p<.01; |T2(61)| ≥ 3.46, p<.001
Table 12

Model   Fitting parameters   r2      AIC      BIC      BF(Lip, *)   BF(ABI0, *)
ABIb    2                    0.619   57.27    61.59    20.49        15.09
ABI0    0                    0.612   56.16    56.16    1.35         1
CCo     0                    0.417   157.52   157.52   >1000        >1000
Gau     0                    0.280   221.99   221.99   >1000        >1000
Har     0                    0.603   65.05    65.05    115.03       85.09
Lip     0                    0.583   55.56    55.56    1            ----
MLP     0                    0.557   102.95   102.95   >1000        >1000
NNA     0                    0.599   64.15    64.15    73.24        54.18
Phi     0                    0.606   58.51    58.51    4.37         3.23
She     0                    0.491   72.20    72.20    >1000        >1000
Spl     0                    0.217   841.85   841.85   >1000        >1000

Data ICC = 0.875, .999 confidence interval: [0.762, 0.940]
Table 13

         Spatial presentation             Non-spatial presentation
Model    Interpolation   Extrapolation    Interpolation   Extrapolation
ABI0     0.890           0.968            0.662           0.837
CCo      0.668*          0.725*           0.514           0.702*
Gau      0.799*          0.679*           0.554*          0.583*
Har      0.831           0.880*           0.609           0.881
Lip      0.851           0.910*           0.700           0.805
MLP      0.723*          0.914*           0.541           0.837
NNA      0.674*          0.884*           0.757           0.805
Phi      0.800*          0.935*           0.700           0.825
She      0.869           0.730*           0.670           0.742*
Spl      0.431*          0.652*           0.362*          0.563*

(*) Significantly worse than the best predictor in the same column (Williams T2 test).
Table 14

                Spatial presentation                  Non-spatial presentation
Model tuning    Response SD       Response time       Response SD       Response time
ABI0 (SD)       r = 0.33, p<.01   r = 0.52, p<.01     r = 0.20, n.s.    r = 0.43, p<.01
ABIa (SD)       r = 0.39, p<.01   r = 0.55, p<.01     ----              ----
ABIb (SD)       ----              ----                r = 0.20, n.s.    r = 0.47, p<.01