
Manuscript BJMSP518R2 (2011, in press) to appear in

British Journal of Mathematical and Statistical Psychology

DOI:10.1111/j.2044-8317.2011.02016.x

Quick Approximation of Bivariate Functions

Pierre Courrieu

Laboratoire de Psychologie Cognitive, CNRS-Université de Provence, Marseille, France

Running head: Function Approximation

PsycINFO classification:

2340 (Cognitive Processes), 2323 (Visual Perception), 2240 (Statistics & Mathematics)

Main text length: 9022 words

Total length: 13273 words

Send correspondence to:

Pierre Courrieu

Laboratoire de Psychologie Cognitive

UMR 6146, CNRS-Université de Provence

Centre St Charles, Bat. 9, Case D

3 Place Victor Hugo

13331 Marseille cedex 03

France

E-mail: [email protected]

Phone: (+33) 4 13 55 09 89 - Fax: (+33) 4 13 55 09 98


Quick Approximation of Bivariate Functions

Abstract

This paper presents two experiments where participants had to approximate function

values at various generalization points of a square, using given function values at a small set

of data points. A representative set of standard function approximation models was trained to

exactly fit the function values at data points, and models' responses at generalization points

were compared to those of humans. Then a large class of possible models (including the two best identified predictors) was defined, and the maximal prediction accuracy attainable within this class was evaluated. A new model of quick multivariate function approximation

belonging to this class was proposed. Its prediction accuracy was close to the maximum

possible, and significantly better than that of all other tested models. The new model also

provided a significant account of the variability of human responses. Finally, it was shown that this model is particularly well suited to problem presentations in which the visual system can perform some specific structuring of the data space. This model is therefore considered a suitable starting point for further investigation of quick multivariate function approximation, which is to date a weakly explored question in cognitive psychology.

PsycINFO classification:

2340 (Cognitive Processes), 2323 (Visual Perception), 2240 (Statistics & Mathematics)

Key words: Function approximation; Multivariate functions; Computational models; Voronoi

tessellation.


1. Introduction

Function approximation is a basic capability that humans commonly use in their everyday life. For instance, whenever we look at a meteorological map for atmospheric temperature

forecasting, predicted temperatures are usually provided at a finite number of locations, while

we are possibly living at, or going to, a location that does not belong to the sample set. In

this case, we commonly approximate the temperature at the desired location from the

available predicted temperatures at other locations (possibly the nearest ones). This is a

function (temperature) approximation on a support space described by two variables

(longitude and latitude), which we refer to as a "bivariate" function approximation. The

support space is also called "input space", and the function space is also called "output space".

Another example is that of designers and draughtsmen who must draw continuous surfaces

fulfilling given dimensions at a finite number of points of the plane (without using standard

computational interpolators). More generally, a function approximation problem consists of

estimating a certain quantity at any point of a given support space, while this quantity is

actually known only at a limited number of particular points. The data points where the

function values are known are called "learning examples", "control points", "scattered data" or

simply "data points" and "data set", depending on the context. Every point of the support

space that does not belong to the data set is called a "generalization point", and one

distinguishes between generalization points that are "interpolation points", and those that are

"extrapolation points". Intuitively, an interpolation point is located somewhere "between"

several data points in the support space, while an extrapolation point is outside the data

cluster. More rigorously, one says that a generalization point is an interpolation (respectively

extrapolation) point if it is inside (respectively outside) the convex hull polytope of the data

set in the support space (Courrieu, 1994; Pelillo, 1996). In general, a function approximation

problem constrained only by a finite set of data points is an ill-posed problem, given that there

is a priori an infinity of distinct continuous functions passing through these data points. In fact,

every continuous interpolator provides an example of such a function, and the question we

ask is "how does a human select solutions among the infinite set of possibilities?"

It can be noted that there is a strong analogy between function approximation and

function learning, from a formal point of view (Girosi & Poggio, 1990; Poggio & Girosi,

1990). In fact, available psychological data and models concern function learning. Boolean

function learning (or category learning) has been widely studied on multidimensional support


spaces (Ashby & Ell, 2002; Nosofsky & Kruschke, 2002). However, continuous function

learning is commonly restricted to one-variable support spaces (Bott & Heit, 2004; DeLosh,

Busemeyer, & McDaniel, 1997; Kalish, Lewandowsky, & Kruschke, 2004; McDaniel &

Busemeyer, 2005; McDaniel, Dimperio, Griego, & Busemeyer, 2009), or to a one-variable

space with an additional binary contextual variable (Lewandowsky, Kalish, & Ngang, 2002).

An exception to this is the work of Koh (1993), where participants learned to produce

specified response durations when presented with stimulus lines varying in length and angle

of orientation, thus on a two-variable space. However, Koh studied only the learning process,

not the generalization process. Another important exception is the work of Kelley and

Busemeyer (2008), who compared several function learning models on multivariate function

forecasting problems. These authors tested function learning models such as low degree

multivariate polynomial regression, and the "Associative Learning Model" (Busemeyer,

Myung, & McDaniel, 1993), which is a variant of standard Gaussian "Radial Basis Function

Networks". Standard versions of these models will also be tested in the present study, together

with other function approximation models.

Although mathematically related to function approximation, function learning as a

cognitive process is in fact very different from quick function approximation. Function

learning commonly involves repeated presentations of a large set of learning examples, until a

learning criterion is reached and a specific generalization capability ("expertise") is acquired.

By contrast, quick function approximation usually involves small data sets, and

generalizations are produced immediately, based on an available capability, without specific

learning. So, theoretical models suitable to function learning are not necessarily transposable

to quick function approximation, and conversely. In fact, psychological data are not available

concerning quick function approximation on multivariate spaces, and currently available

models mainly belong to the machine learning and computer science areas. It is worth noting

that, in a famous paper, D. Shepard (1968) proposed an interpolation model whose goal was

to provide computers with quick function approximation capabilities similar to those we

attempt hereafter to study in humans.

The next section presents an experiment using a quick function approximation task on

a square, which provides initial behavioural data. Then, using these data, we test the

prediction capability of a representative set of 10 standard function approximation models

(section 3). After this, we try to characterise a class of plausible models, and we estimate its


maximal prediction capability (section 4). Then we instantiate a new model from this class

(section 5), and we test its actual prediction capability (section 6). Section 7 presents a second

experiment, and we conclude in section 8.

2. Experiment 1

This experiment introduces a quick bivariate function approximation task, where

approximation problems are presented using numbers (function values) on a square

(bidimensional support), the variables being the plane coordinates of numbers' locations. The

goal is simply to collect an initial set of human generalization responses in order to test

various function approximation models in the next sections.

2.1. Participants. Sixteen participants (8 men and 8 women, 23-62 years old) took part in the experiment on a voluntary basis.

2.2. Material. A set of 16 bivariate function approximation problems was built in the

following way. A set of 5 input data points in a 10×10 square was chosen for interpolation

problems, and another set of 5 input data points was chosen for extrapolation problems. To

each of these data points, one associated two distinct arbitrary function values, thus there were

two functions to be approximated in interpolation problems ("I1" and "I2"), and two other

functions to be approximated in extrapolation problems ("E1" and "E2"). 4 distinct

interpolation input points (numbered ".1", ".2", ".3", and ".4") were associated to interpolation

problems, and 4 distinct extrapolation input points (also numbered ".1", ".2", ".3", and ".4")

were associated to extrapolation problems. In all cases, the distance of a generalization point

from its nearest data point was the same (=3). Table 1 summarises all problems' data. Each of

the 16 approximation problems was presented (one at a time) to the participants on a sheet of

paper as illustrated in Figure 1. Data points were materialised by five circles with their

corresponding function values inside, while the considered generalization point (one per

problem) was materialised by an empty circle. For half the participants, the problems were

presented in the order (I1.1, E1.1, I1.2, E1.2, I1.3, E1.3, I1.4, E1.4, I2.1, E2.1, I2.2, E2.2, I2.3,

E2.3, I2.4, E2.4), while the reverse order was used for the other half. The whole set of

problems is visualized in Figure 3 (together with human average generalization responses).


2.3. Task. Participants were instructed that the problems' data were artificial, but that they could

consider them as atmospheric temperatures (in Celsius degrees) on a square map of side 1000

kilometres (with a distance unit of 100 km). Then, they had to approximate the temperature in

the empty circle on the basis of the temperatures given at other locations. Responses were

given with a pencil in the empty circle, corrections were allowed, and there was no time

constraint; however, all participants solved the 16 problems in less than 8 minutes.

Table 1

Figure 1

2.4. Basic results. The responses to the 16 problems of the 16 participants are reported in

Table 2, with the mean response and standard deviation for each problem. It has recently been

shown that one can estimate the reproducible proportion of variance, in the average response

vector, using an intraclass correlation coefficient (ICC), namely the so-called "ICC(C, k),

Cases 2 and 2A" in the nomenclature of McGraw and Wong (1996), computed on the raw

data of Table 2 (Courrieu, Brand-D'Abrescia, Peereman, Spieler, & Rey, 2011; Rey, Courrieu,

Schmidt-Weigand, & Jacobs, 2009). The obtained ICC is equal to 0.985, which indicates that,

despite the variability of responses, these are in fact highly consistent, and one can hope that

there is an underlying behaviour common to all participants. The 95%, 99%, and 99.9%

confidence intervals of the ICC are [0.973, 0.994], [0.967, 0.996], and [0.959, 0.997],

respectively. According to Courrieu et al. (2011), the squared correlation of the predictions of

an exact model with the average empirical response vector should belong to the ICC

confidence interval with the interval probability. A squared correlation lower than the ICC

lower confidence limit indicates a probable under-fitting, while a squared correlation greater

than the ICC upper confidence limit indicates a probable over-fitting. The very high value of

the observed ICC makes the prediction of such data highly challenging, given that

approximate models have little chance of satisfying the validation criterion.
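For readers who wish to reproduce this estimate, the following Matlab sketch (ours, not the author's code) computes ICC(C,k) in the McGraw and Wong (1996) sense from an n × k matrix Y of raw responses such as Table 2 (rows = problems, columns = participants), using the two-way consistency definition ICC(C,k) = (MSR - MSE)/MSR; the confidence limits are not computed here.

function icc = icc_Ck(Y)
% ICC(C,k): consistency of the k-column average of Y (n problems x k participants),
% computed from the two-way ANOVA mean squares (McGraw & Wong, 1996).
[n, k] = size(Y);
gm   = mean(Y(:));                                  % grand mean
rowm = mean(Y, 2);                                  % problem (row) means
colm = mean(Y, 1);                                  % participant (column) means
MSR  = k * sum((rowm - gm).^2) / (n - 1);           % rows mean square
SSE  = sum(sum((Y - repmat(rowm, 1, k) ...
              - repmat(colm, n, 1) + gm).^2));      % residual sum of squares
MSE  = SSE / ((n - 1) * (k - 1));                   % residual mean square
icc  = (MSR - MSE) / MSR;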

Table 2

3. Test of standard function approximation models

A representative set of 10 function approximation models has been selected in order to

attempt to predict humans' generalization responses. All models were trained to exactly fit the


data set for each sampled function (I1, I2, E1, E2), and then the responses of each model at

the generalization points were recorded. Except for the Quadratic approximation model (see

below), human responses were never used in the training of models, in order to avoid over-

fitting problems (Pitt & Myung, 2002). For those models that have some global tuneable

parameter(s), standard tuning (which is commonly close to optimal) was finally used after

verifying that an optimization with respect to experimental data did not significantly improve

the prediction performance. Psychological function learning models such as EXAM (DeLosh

et al., 1997) and POLE (Kalish et al., 2004) were not included in the tested set of models because, in their current state, these models are not suitable for quick multivariate function approximation, and their possible extension requires non-trivial modifications, which prevents us from regarding these models as standard in this framework.

3.1. Direct approximators. These models directly provide generalizations, and they do not

require learning or building a representation of the whole function, so they are possibly good

candidates for modelling quick function approximation. In this category, we consider the

usual Nearest Neighbour Approximator (abbreviated "NNA"), which simply approximates the

function at any generalization point by the function value of the nearest data point. We also

consider the original Shepard interpolator (abbreviated "She") (Shepard, 1968), also known as

the "inverse distance-weighted average":

f(X) = Σ_{i=1}^{m} f_i ||X - X_i||^{-2} / Σ_{j=1}^{m} ||X - X_j||^{-2} ,   (1)

where {(X_i, f_i), 1 ≤ i ≤ m} is a data set of m points, and X is a generalization point.
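As an illustration, a minimal Matlab sketch of (1) (ours, not the Appendix code; X is a row vector, Xi an m × d matrix of data points, fi an m × 1 vector of function values) is:

function fX = shepard(X, Xi, fi)
% Inverse distance-weighted average, Eq. (1).
d2 = sum((Xi - repmat(X, size(Xi, 1), 1)).^2, 2);   % squared distances ||X - Xi||^2
k  = find(d2 == 0, 1);
if ~isempty(k), fX = fi(k); return; end             % exact fit at a data point
w  = 1 ./ d2;                                       % weights ||X - Xi||^(-2)
fX = sum(w .* fi) / sum(w);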

Finally, another direct approximator that exhibited interesting properties is the Lipschitz interpolator ("Lip") (Beliakov, 2006):

f(X) = ( f^+(X) + f^-(X) ) / 2 ,   (2a)

with

f^+(X) = min_{1≤i≤m} ( f_i + a ||X - X_i|| ) ,   f^-(X) = max_{1≤i≤m} ( f_i - a ||X - X_i|| ) ,   (2b)

and the Lipschitz constant:

a = max_{1≤i<j≤m} |f_i - f_j| / ||X_i - X_j|| .   (2c)
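With the same conventions, a sketch (ours) of (2a)-(2c) is:

function fX = lipschitz_interp(X, Xi, fi)
% Lipschitz interpolator (Beliakov, 2006), Eqs. (2a)-(2c).
m = size(Xi, 1);
a = 0;                                              % Lipschitz constant, Eq. (2c)
for i = 1:m-1
    for j = i+1:m
        a = max(a, abs(fi(i) - fi(j)) / norm(Xi(i,:) - Xi(j,:)));
    end
end
d      = sqrt(sum((Xi - repmat(X, m, 1)).^2, 2));   % distances ||X - Xi||
fplus  = min(fi + a * d);                           % upper envelope, Eq. (2b)
fminus = max(fi - a * d);                           % lower envelope, Eq. (2b)
fX     = (fplus + fminus) / 2;                      % Eq. (2a)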

3.2. Radial Basis Function Networks. These are widely used function approximation models

that just require learning by solving a least-squares (or linear) system before generalizing

(Micchelli, 1986; Girosi & Poggio, 1990; Poggio & Girosi, 1990; Yoon, 2001). In this


category, we consider Gaussian basis function networks ("Gau"), Hardy multiquadrics

("Har"), and Radial Splines ("Spl"). Given a data set of m points, an RBFN interpolation at point X is of the form:

f(X) = Σ_{i=1}^{m} w_i g(||X - X_i||) ,   (3)

with

f(X_i) = f_i , 1 ≤ i ≤ m (for exact interpolation),   (3a)

or

f(X_i) ≈ f_i , 1 ≤ i ≤ m (for least-squares approximation),   (3b)

where g is a basis function, and w_i is a weight associated to the ith data point by learning. For Gaussian networks, one has the basis function g(d) = exp(-d^2 / s^2), while for Hardy multiquadrics, one has g(d) = (1 + d^2)^{1/2}, and for Radial Splines on two-dimensional spaces, one has g(d) = d^2 ln(d). The scale parameter (s) of the Gaussians is tuned using the "global first nearest-neighbour heuristic" (Moody & Darken, 1989). Despite the existence of a learning phase, these models are possible candidates for quick function approximation since their learning process reduces to solving a least-squares (or linear) system, and it has been shown that biological neural networks could (theoretically) solve least-squares systems very fast, say, in less than 250 ms (Courrieu, 2004). In practice, one considers a set of L data points from which one selects a subset of m data points named "prototypes", with m ≤ L. Then one builds an L × m matrix G = (g_{ki}) = (g(||X_k - X_i||)), 1 ≤ k ≤ L, 1 ≤ i ≤ m, and the corresponding L × 1 vector F = (f_k) = (f(X_k)), 1 ≤ k ≤ L, of expected function values. Then one computes the m × 1 weight vector W = (w_i), 1 ≤ i ≤ m, solving the least-squares problem min_{W ∈ R^m} ||GW - F||^2, whose solution is known to be W = G^{(1,3)} F, for a suitable generalized inverse G^{(1,3)} of the matrix G (Ben-Israel & Greville, 2003; Courrieu, 2009). If m < L, then the least-squares solution allows filtering possible noise in the data; however, it does not provide an exact interpolator. If m = L, then the matrix G is square and invertible, the least-squares system reduces to a simple linear system, and one obtains an exact interpolator using G^{(1,3)} = G^{-1}, that is, W = G^{-1} F, which satisfies (3a).
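For the exact interpolation case (m = L) used throughout this study, a minimal Matlab sketch (ours) of a Gaussian RBFN is given below; s is the Gaussian scale parameter, which would be set here by the global first nearest-neighbour heuristic:

function fX = rbfn_gauss(X, Xi, fi, s)
% Exact Gaussian RBFN interpolation (m = L): solve GW = F, then evaluate Eq. (3).
m = size(Xi, 1);
G = zeros(m, m);
for k = 1:m
    for i = 1:m
        G(k, i) = exp(-norm(Xi(k,:) - Xi(i,:))^2 / s^2);   % g(||Xk - Xi||)
    end
end
W  = G \ fi;                                               % W = G^(-1) F
gX = exp(-sum((Xi - repmat(X, m, 1)).^2, 2) / s^2);        % g(||X - Xi||)
fX = gX' * W;                                              % Eq. (3)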

3.3. Similarity function networks. These are networks that use dissimilarity measures as

inputs, and they can efficiently approximate functions on every metric space, as well as on a

wide variety of non-metric topological spaces, such as those generated by Dynamic

Programming methods. To date, only two approximators are known to have such capabilities:

a simple generalization of the Nearest Neighbour Approximator, and the so-called Phi-


approximator ("Phi") (Courrieu, 2005), which, for interpolation on Euclidean spaces, takes

the form:

φ(X) = Σ_{i=1}^{m} w_i exp(-λ ||X - X_i||^2) / Σ_{j=1}^{m} exp(-λ ||X - X_j||^2) , with φ(X_i) = f_i , 1 ≤ i ≤ m.   (4)

The parameter λ is tuned using a standard procedure (λ*) defined in the reference above, and the coefficients (the w_i's) are learned by solving a least-squares (or linear) system, as in Section 3.2.
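A sketch (ours, not the Appendix code) of (4) on a Euclidean space, writing the tuned parameter as lambda, is:

function fX = phi_interp(X, Xi, fi, lambda)
% Phi-approximator, Eq. (4): normalized Gaussian basis with weights chosen
% so that the approximator exactly fits the data points.
m = size(Xi, 1);
K = zeros(m, m);
for k = 1:m
    K(k, :) = exp(-lambda * sum((repmat(Xi(k,:), m, 1) - Xi).^2, 2))';
end
K  = K ./ repmat(sum(K, 2), 1, m);      % row-normalized basis values at data points
w  = K \ fi;                            % weights such that phi(Xi) = fi
b  = exp(-lambda * sum((Xi - repmat(X, m, 1)).^2, 2));
fX = (b' * w) / sum(b);                 % Eq. (4)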

3.4. Multilayer Neural Networks. These models are more particularly devoted to function learning; they commonly involve large data sets and quite laborious learning procedures such

as the "Error Gradient Backpropagation" (Rumelhart, Hinton, & Williams, 1986) for layered

networks called Multilayer Perceptrons ("MLP"), or the "Cascade-Correlation" learning

algorithm (Fahlman & Lebiere, 1990), for both Multilayer Perceptrons and more general feed-

forward architectures called Cascade-Correlation networks ("CCo"). An important advantage

of the Cascade-Correlation learning algorithm over the Backpropagation algorithm is that it

automatically determines the number of necessary hidden neurons. So, we use a variant of the

Cascade-Correlation learning algorithm for both architectures (MLP and CCo). In this variant,

the computation of the hidden units is performed by means of the global optimization

"Hyperbell" algorithm (Courrieu, 1997) instead of the original multistart type procedure. Each

network was trained to exactly fit the learning set for each function (I1, I2, E1, and E2)

separately, and this was repeated 32 times in order to estimate mean generalization responses

that were used for the comparison with human generalization responses. In all cases, and for

all problems of Experiment 1, MLP completed learning with 4 hidden neurons, while CCo

learned with 2 hidden neurons. These models were tested in order to establish reference fits,

however, their laborious learning process makes them a priori implausible models for quick

function approximation.

3.5. Quadratic approximation ("Qua"). This is the minimum degree polynomial

approximation that can exactly fit five data points on a bivariate support. It is important

as an example of polynomial approximation, and also because locally quadratic

approximations are used in other well-known interpolators (Brodlie, Asim, & Unsworth,

2005; Renka, 1988). However, a bivariate polynomial of degree two has 6 monomials, thus

we have 6 coefficients, and only 5 data points to fit in each function. So, we can use the

remaining degree of freedom to choose the particular solution that minimizes the

generalization error with respect to the experimental data. This is known as a "constrained


least-squares solution" (Ben-Israel & Greville, 2003, p. 108), and it provides the best

quadratic approximation of human responses that we can hope for, given the constraint of exactly

fitting the data points.

3.6. Results. Figure 2 shows examples of generalization surfaces (E1 function) generated by 4

different models (She, Lip, Har, and Phi), which gives an idea of the diversity of solutions.

Results of simulations for all models are reported in Table 3, together with the correlations

between simulated and human generalization responses. The best prediction is provided by

the Phi-approximator (r=0.916), closely followed by the Shepard interpolator (r=0.910), and

the MLP (r=0.905). Despite the optimization of the solutions with respect to the empirical

data, the Quadratic approximation model provides a modest performance (r=0.759). If one

hypothesizes some multi-system model, where each model independently contributes to the

mean responses with a given probability, one can easily compute the optimal probabilities

using Algorithm 1 from (Courrieu, 1994), which provides the following models' probabilities:

0.2022 for NNA, 0 for Lip, 0.1537 for Gau, 0.0847 for Phi, 0.0059 for Har, 0 for Spl, 0.2636

for MLP, 0.0733 for CCo, 0.2166 for She, and 0 for Qua. Weighting the models' predictions

with these probabilities and summing, one obtains a composite prediction vector whose

correlation with human mean responses is r=0.945. Note that this is just an estimate of the

best fit that can be reached with this particular set of models, as a whole. One must not

interpret the individual probabilities as model fits since, in this type of analysis, it can even

happen that a zero probability is assigned to the best model. Despite the quite good

performance of several tested models, one can observe that none of the obtained correlations,

after squaring, belongs to standard confidence intervals of the human data ICC, and all

models under-fit the data (see Section 2.4). Thus, there is room for further investigation.

Table 3

Figure 2

4. Characterization of a class of suitable models

The three best predictors (Phi, She, and MLP) are very different models, and it seems

hard to see what they have in common. In this section, we define a large class of function


approximation models to which the two best predictors belong, and we determine the maximum

correlation with empirical data that models of this class could reach.

We hypothesize that the generalization function at any point X ∈ R^2 can be rewritten as:

f(X) = Σ_{i=1}^{m} w_i(X) f_i ,   (5)

with w_i(X_j) = δ_{ij} (Kronecker delta), and the unit sum constraint that Σ_{i=1}^{m} w_i(X) = 1, where m is the number of data points (here m = 5), f_i is the given function value at the ith data point, and w_i(X) is the value of the "weighting function" of the ith data point at point X. Moreover,

we hypothesize that the weighting functions only depend on the relative locations of

generalization and data points in the support space, and that they are invariant to shifts and

rotations of the coordinates. The resulting class of interpolators includes, among others,

Shepard interpolators and Nearest Neighbour approximators, and if one weakens the unit sum

constraints to be only approximately satisfied, then the class extends to most Radial Basis

Function Networks, and to Phi approximators when used on Euclidean spaces, as is the case

here.

Observe, in Figure 3, that all interpolation problems have similar input configurations

up to quarter turns of the support, and the same is true for extrapolation problems. If one

considers the data points in increasing order of their distance from the generalization point in

each problem, then one obtains a sequence of 5 distinct distances that is the same for all

interpolation problems, and another sequence of 5 distinct distances that is the same for all

extrapolation problems. The output data reordered in this way form a 5 components row

vector for each problem. Then one can group the vectors of the 8 interpolation problems in an

8×5 matrix, and similarly for the 8 extrapolation problems. Finally, we can append to each

matrix a row vector whose 5 components are equal to 1 (in order to represent the unit sum

constraint). Under the above hypotheses concerning the class of models, one can find a unique

5 weights vector WI for interpolation problems, and similarly, a unique 5 weights vector WE

for extrapolation problems, such that the scalar product of each row of each matrix with the

appropriate weight vector is equal to the generalization response to the corresponding

problem (or is equal to 1 for the unit sum constraint). The data and the solutions to this

problem are presented in Table 4, and the two weight vectors (WI and WE) were obtained by

use of the constrained least-squares technique. The correlation between the predicted


generalization responses ("Obtained" column) and the human generalization responses

("Target" column) is r=0.988, which is clearly better than the best model fits previously

obtained, and this correlation is the maximum possible for models belonging to the above

defined class. Interestingly, the squared target correlation (0.988^2 = 0.976) belongs to the 95%

confidence interval of the data ICC (0.973, 0.994), which indicates that there is some chance

that the exact model belongs to the class above defined. In order to approach such a goodness

of fit with a model, we must attempt to identify some relevant characteristics that previously

tested models do not have.

Observing the weight vectors (WI and WE) in Table 4, we can see that the weights of

the two most distant data points from the generalization point are always negative, which

indicates that the interpolation is based not only on data values, but also on variations

(differences) between data points. Moreover, although the nearest data point always has the

greatest weight, the weights do not decrease monotonically as the distance increases. A

prominent fact, in the weight vector WI, is that the two greatest weights are at distance ranks 1

and 3, corresponding to data points that are the extremities of a line segment to which the generalization point belongs. This is easy to visualize in Figure 3, noting that in

interpolation problems, the distance rank 2 always corresponds to the central data point. In

summary, it appears that the distance of the generalization point from a line segment joining

two data points is more determinant than the distance from an individual data point, which

strongly suggests that the basic interpolation unit involves a pair of data points, hereafter

called "bipoint".

Now, given two data points with their function values, there is a unique linear function

(straight line) that passes through these points. The global function to be approximated is not

linear, in general, but the principle of using multiple pieces of linear functions to approximate

non-linear functions has been successfully used in several leading models of exemplar based

univariate function learning. This is the case in the EXAM model (DeLosh et al., 1997), as

well as in the POLE model (Kalish et al., 2004), for instance. However, this principle is more

complex to use on multivariate support spaces because, contrary to the univariate case, most

generalization points do not belong to a straight line defined by a pair of data points. The

model presented in the next section provides a way of overcoming this difficulty. This will be


achieved using a special variant of R. Shepard's (1987) generalization law, suited to generalizing from bipoints.

Table 4

Figure 3

5. The ABI model

We now instantiate a model belonging to the class defined in Section 4, taking into

account the particular observations on the weight functions, in order to attempt to approach

the class maximum possible correlation level with human responses (r=0.988). We

hypothesize that the input space is structured by Voronoi tessellation (Okabe, Boots,

Sugihara, & Chiu, 2000), and that the basic function approximation elements are data

bipoints. A bipoint is simply a pair of data points, to which any generalization point can be

compared by means of two quantities: a special distance named "exteriority", and a "linear

interpolation/extrapolation" of function values, as explained below. This allows building a

simple model where various bipoints are sequentially sampled with probabilities depending

on the exteriority of the generalization point, and the corresponding function linear estimates

are averaged to build the subject's generalization response. This model is implemented in the

function "ABI" (for "Average of Bipoints Interpolations") listed in Appendix (Matlab 7.5

code). The model is described hereafter.

5.1. Voronoi tessellation and relevant bipoints. Observing visualized data sets (as in Figure

1), it is intuitively obvious that certain bipoints are of minor interest because their two points

are distant from each other and other data points are interposed between them. This intuition

probably results from the way we structure the data support space. There is to date strong empirical evidence, together with theoretical arguments, suggesting that the visual system generates a

Voronoi-like representation at an early stage in visual processing (Dry, 2008; Dry, Navarro,

Preiss, & Lee, 2009). All around each data point is its "Voronoi-cell", which is the set of all

generalization points having this data point as their nearest neighbour. The juxtaposition of

all Voronoi cells is the "Voronoi tessellation" of the data space. The second row of images in

Figure 4 provides a visualization of Voronoi tessellations of the interpolation and

extrapolation data sets. Two data points are "Voronoi-neighbours" if their respective Voronoi-

cells have at least one border point in common. Some elementary geometric considerations


allow building a simple algorithm for extracting all Voronoi-neighbours pairs from a data set,

as does the sub-function "Neighbors" listed in Appendix. We define the set of "relevant

bipoints" as the set of all Voronoi-neighbours pairs from the considered data set. In a two-

dimensional space, the generated set of bipoints coincides with the set of all edges of all

possible "Delaunay triangulations" of the considered set of points. Delaunay triangulation is a

dual structure of the Voronoi tessellation; however, contrary to the Voronoi tessellation, the Delaunay

triangulation is not always unique.
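In two dimensions, and for data points in general (non-co-circular) position, the set of relevant bipoints can be obtained directly from a Delaunay triangulation. The paper's own "Neighbors" sub-function appears in the Appendix; the following Matlab sketch (ours) uses the built-in delaunay function instead:

function V = relevant_bipoints(Xi)
% Relevant bipoints = Voronoi-neighbour pairs = edges of a Delaunay
% triangulation of the data points (two-dimensional case).
tri = delaunay(Xi(:,1), Xi(:,2));            % triangles as rows of point indices
E   = [tri(:,[1 2]); tri(:,[2 3]); tri(:,[1 3])];
E   = sort(E, 2);                            % order each pair so that i < j
V   = unique(E, 'rows');                     % one row [i j] per distinct bipoint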

5.2. Exteriority of a point from a bipoint. This is simply the Euclidean distance of the

considered point from the nearest point belonging to the line segment joining the two points

of the bipoint. Let E(x, a, b) denote the exteriority of a point x from a bipoint (a, b). Set A = x - a, B = x - b, and C = a - b. In the particular case where a = b, the bipoint reduces to a point and one has simply E(x, a, b) = ||A|| = ||B||. In all other cases, the squared exteriority is given by:

E^2(x, a, b) = ( ||A||^2 ||B||^2 - (A·B)^2 + (1/4) ( |A·C| + |B·C| - ||C||^2 )^2 ) / ||C||^2 ,   (6)

where X·Y denotes the scalar (dot) product of vectors X and Y, and ||X|| denotes the Euclidean norm of vector X. The sub-function named "Exteriority" in the Appendix listing

computes the exteriority of points from bipoints. The first row of Figure 4 shows the

exteriority function on the square for two distinct bipoints. The expression (6) has the

advantage of being direct and exact, however, note that there is a more general definition of

the exteriority for all convex polytopes (other than bipoints), and corresponding iterative

computation methods can be found in Courrieu (1994). Note also that a technical variant of

the exteriority is known as the "polytope distance" (Gärtner & Jaggi, 2009).
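A direct Matlab transcription of (6) (ours; the author's "Exteriority" sub-function is in the Appendix), with x, a, and b given as row vectors, is:

function e = exteriority(x, a, b)
% Exteriority of point x from bipoint (a, b), Eq. (6): the Euclidean
% distance from x to the nearest point of the segment joining a and b.
A = x - a; B = x - b; C = a - b;
if norm(C) == 0                              % degenerate bipoint: a single point
    e = norm(A);
else
    e2 = (norm(A)^2 * norm(B)^2 - dot(A, B)^2 + ...
          0.25 * (abs(dot(A, C)) + abs(dot(B, C)) - norm(C)^2)^2) / norm(C)^2;
    e  = sqrt(max(e2, 0));                   % guard against rounding below zero
end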

5.3. Sampling probability of a bipoint. Every relevant bipoint (x_i, x_j) belonging to the data set can be considered at any time with a probability p_{ij}(x) that depends on the exteriority E(x, x_i, x_j) of the current generalization point x from that bipoint, and on the distance of the generalization point from its nearest neighbour data point, d_0(x) = min_{1≤k≤m} ||x - x_k||, where m is the number of data points. Introducing d_0(x) is not only natural in a Voronoi-structured data space, but it is also essential to obtain a stable and exact interpolator. First, both E(x, x_i, x_j) and d_0(x) depend on the scale of the input data; however, the ratio E(x, x_i, x_j)/d_0(x) has the desirable property of being scale invariant. Secondly, whenever x tends to a data point, there are two possibilities. (1) If the nearest data point does not belong to the bipoint (x_i, x_j), then the ratio E(x, x_i, x_j)/d_0(x) tends to infinity, as an ordinary inverse distance. (2) If the nearest data point belongs to the bipoint (x_i, x_j), then the ratio E(x, x_i, x_j)/d_0(x) tends to a finite limit between 0 and 1 because, by definition of the exteriority, one always has E(x, a, b) ≤ min(||x - a||, ||x - b||). We can now define intermediate functions v_{ij}(x), representing the strength of the data bipoint (x_i, x_j) at generalization point x:

v_{ij}(x) = exp( -α ( E(x, x_i, x_j) / d_0(x) )^β ) ,   (7)

where α and β are two positive real global parameters to be estimated, since we have no theory to fix them. Note that (7) is just a variant of Shepard's (1987, Eq. 10) exponential decay generalization function, where the original Minkowskian metrics have been replaced by the distance ratio discussed above, in order to account for bipoint generalization. Figure 4 shows examples of the E, d_0, and v functions associated to two bipoints, one belonging to the interpolation data set (left column), and one belonging to the extrapolation data set (right column).

Finally, the sampling probability of a relevant bipoint (x_i, x_j) is given by:

p_{ij}(x) = v_{ij}(x) / Σ_{(k,l)∈V} v_{kl}(x) ,   (8)

where V denotes the set of index pairs of Voronoi-neighbours in the data set. Note that (8) is just an application of Luce's choice axiom (Luce, 1977), using the strength functions defined in (7).
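Using the exteriority sketch given above, (7) and (8) can be transcribed as follows (ours; V is the list of Voronoi-neighbour index pairs, one row [i j] per bipoint, and x is assumed not to coincide with a data point, so that d_0(x) > 0):

function p = bipoint_probs(x, Xi, V, alpha, beta)
% Sampling probabilities of the relevant bipoints listed in V, Eqs. (7)-(8).
m  = size(Xi, 1);
d0 = min(sqrt(sum((Xi - repmat(x, m, 1)).^2, 2)));      % nearest-neighbour distance
v  = zeros(size(V, 1), 1);
for k = 1:size(V, 1)
    E    = exteriority(x, Xi(V(k,1),:), Xi(V(k,2),:));  % Eq. (6)
    v(k) = exp(-alpha * (E / d0)^beta);                 % bipoint strength, Eq. (7)
end
p = v / sum(v);                                         % Luce's choice rule, Eq. (8)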

5.4. Linear interpolation/extrapolation from one bipoint. Given a relevant bipoint (x_i, x_j) and the corresponding function values (f_i, f_j) in the data set, a linear approximation of the function at the generalization point x is provided by:

f_{ij}(x) = f_i + (f_j - f_i) (x - x_i)·(x_j - x_i) / ||x_j - x_i||^2 .   (9)

As an example, consider the extrapolation problem E1.2 shown in Figure 1, and the data bipoint consisting of x_i = (1, 6), with f_i = 8, and x_j = (4, 1), with f_j = 13. The generalization point is x = (1, 9), and the function value predicted by the considered bipoint is:

f_{ij}(x) = 8 + (13 - 8) (1 - 1, 9 - 6)·(4 - 1, 1 - 6) / ((4 - 1)^2 + (1 - 6)^2) = 8 + 5×(-15)/34 ≈ 5.79.


Note that linearly extrapolated values can be outside the range of the data function values.

One can assume that humans do not exactly compute the above quantities, but that they

approximate them with a random approximation error of mean zero (which is not modelled

for the moment).
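A one-line Matlab transcription of (9) (ours), with the worked example above as a check, is:

function f = bipoint_estimate(x, xi, xj, fi, fj)
% Linear interpolation/extrapolation from the bipoint (xi, xj), Eq. (9).
f = fi + (fj - fi) * dot(x - xi, xj - xi) / norm(xj - xi)^2;

% Check on problem E1.2:
%   bipoint_estimate([1 9], [1 6], [4 1], 8, 13) returns 8 + 5*(-15)/34 = 5.7941...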

5.5. Generalization response generating process. Given a generalization point x, we assume that the subject repeatedly and independently samples relevant bipoints in the data set according to their sampling probabilities (eq. 8), then he/she linearly estimates the function value at x from each sampled bipoint (eq. 9), and averages the successive estimates. The number of sampled bipoints is presumably a positive integer random variable, which is not modelled for the moment. We can nevertheless determine the expected generalization response at any point:

f(x) = Σ_{(i,j)∈V} p_{ij}(x) f_{ij}(x) .   (10)

We can also determine a component part of the variance of generalization responses at any point:

σ_1^2(f(x)) = Σ_{(i,j)∈V} p_{ij}(x) ( f_{ij}(x) - f(x) )^2 .   (11)

The above variance does not take into account the (non modelled) random errors in linear

approximations, and it corresponds to the case where only one bipoint is sampled for each

generalization response. In fact, this variance must be divided by the (non modelled) number

of sampled bipoints used in each average. In addition, a simple inspection of individual

responses (see Table 2) shows that there is another (small) source of variance. We can note

that all individual responses are integer numbers, which means that the approximations are

rounded, and thus, there is a rounding error.
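Putting the preceding sketches together (relevant_bipoints, bipoint_probs, and bipoint_estimate, all ours; the author's complete ABI function is listed in the Appendix), the expected response (10) and the single-sample variance component (11) can be computed as:

function [fx, s2] = abi_expectation(x, Xi, fi, alpha, beta)
% Expected ABI generalization response, Eq. (10), and the variance
% component of Eq. (11), at generalization point x.
V = relevant_bipoints(Xi);                   % Voronoi-neighbour pairs
p = bipoint_probs(x, Xi, V, alpha, beta);    % Eqs. (7)-(8)
f = zeros(size(V, 1), 1);
for k = 1:size(V, 1)
    f(k) = bipoint_estimate(x, Xi(V(k,1),:), Xi(V(k,2),:), ...
                            fi(V(k,1)), fi(V(k,2)));       % Eq. (9)
end
fx = p' * f;                                 % Eq. (10)
s2 = p' * (f - fx).^2;                       % Eq. (11)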

5.6. Main properties. Whatever the dimension of the support space, that of the function, and the size of the data set, one can easily verify that:

- f(x) is continuous at every point x;
- if x_i is a data point, then f(x_i) = f_i and σ_1^2(f(x_i)) = 0, that is, f(x) is an exact interpolator.

Figure 4


6. Test of the model

The two general parameters, α and β, have been estimated using a local search procedure (Matlab "fminsearch") in order to maximize the correlation of the model predictions with human generalization responses. As a result, we obtained α = 1.2031, β = 2.3536, and the correlation of predicted mean responses with observed ones is r = 0.975 (see Table 5), which is not far from the target correlation r = 0.988; however, the difference between these correlations is marginally significant using Williams' T2 test (Steiger, 1980; Williams, 1959), that is, T2(13) = 1.82, p < .10. Moreover, the squared obtained correlation (0.975^2 = 0.951) is just below the lower confidence limit of the 99.9% confidence interval of the data ICC (0.959, 0.997), which indicates that there is room for model improvement, but we are not far from an acceptable solution. In addition, the predicted standard deviations have a correlation of r = 0.653, p < .01, with the observed standard deviations (see Table 5). This last result could be trivial if there were a correlation between the means and the standard deviations, but this is not the case, since this correlation is r = 0.179, n.s., for humans, and r = -0.106, n.s., for the model. We now verify that the predictions of the ABI model are

significantly better than those of other tested models. For this purpose, we first computed the

correlation between each participant response pattern and each model prediction pattern. The

resulting r values were transformed in z using Fisher's transformation

z(r) = log((1 + r)/(1 - r))/2, providing approximately normally distributed measures. This

resulted in a sample of 16 measures (one per subject) for each model, and Student t tests were

applied for pair-wise comparisons of the 11 models. The results are reported in Table 6,

where one can see that the ABI model provided significantly better predictions than all other

tested models. However, one could object that the ABI model was fitted to the empirical data

optimizing two free parameters, while other models (except Qua) were trained/tuned without

reference to human responses. In such circumstances, one commonly uses model selection

criteria such as the Akaike Information Criterion (AIC: Akaike, 1974), or the Bayesian

Information Criterion (BIC: Schwarz, 1978), which take into account both the goodness of fit

(maximum log-likelihood, for these criteria), and the number of model parameters optimized

in order to fit the empirical data. There is another powerful model selection criterion known

as the Bayes factor (BF), but unfortunately this criterion theoretically requires a prior

distribution to be defined on the space of model parameters. However, one knows that a rough

approximation of the Bayes factors can be computed from the BIC values without using a


prior distribution on the parameter space (Kass & Raftery, 1995, p. 778). Given two models,

say model 1 and model 2, with data fits such that BIC_1 ≤ BIC_2, the Bayes factor approximation is given by BF_{1,2} ≈ exp((BIC_2 - BIC_1)/2). An interpretation scale of the Bayes

factors proposed by Jeffreys (1961) is commonly used:

1 ≤ BF < 3.2: "Not worth more than a bare mention"

3.2 ≤ BF < 10: "Substantial"

10 ≤ BF <100: "Strong"

BF ≥ 100: "Decisive"
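As a minimal illustration (ours), given the BIC values BIC1 ≤ BIC2 of two models in the Matlab workspace, the approximation and its verbal label can be obtained as:

% Approximate Bayes factor of model 1 over model 2 from their BIC values
% (Kass & Raftery, 1995), and the corresponding Jeffreys (1961) label.
BF = exp((BIC2 - BIC1) / 2);                 % assumes BIC1 <= BIC2
if BF < 3.2
    label = 'Not worth more than a bare mention';
elseif BF < 10
    label = 'Substantial';
elseif BF < 100
    label = 'Strong';
else
    label = 'Decisive';
end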

The AIC and BIC for all models are reported in Table 7, together with the usual

squared correlation (explained proportion of variance), and the BFs with respect to the model

with minimal BIC (ABI model in the present case). As one can see in Table 7, both the AIC

and the BIC are minimal (and the explained proportion of variance is maximal) for the ABI

model compared to other models, confirming the previous analysis. Examining the

approximations of BFs, one can observe that the advantage of the ABI model over other

models is always at least "substantial". Moreover, if one arbitrarily sets the ABI model parameters to simple values such as α = β = 2, then the correlation of the model's predictions

with the empirical data becomes 0.967, which is not much less than the fit obtained with the

optimal tuning, and which remains better than the performance of all other models. This also

shows that the ABI model behaviour has the desirable property of being moderately sensitive

to variations of the parameters.

So, we can conclude that the proposed model is a suitable starting point for modelling

quick multivariate function approximation. Figure 5 visualizes the four generalization

surfaces (I1, I2, E1, and E2) generated by the model with the optimal parameter values. In the

next section, we go a step further, examining the role of the visual presentation of

approximation problems and the intra-individual variability of responses.

Table 5

Table 6

Table 7

Figure 5


7. Experiment 2

A critical assumption in the ABI model is that of the use of a Voronoi-like tessellation

of the data space. Such a tessellation is plausibly performed by the visual system at an early

stage in visual processing (Dry, 2008; Dry et al., 2009). However, one can ask what happens

if the data are presented in a way that does not allow the visual system to perform a Voronoi

tessellation. Is there some form of abstract Voronoi tessellation? If not, the ABI model could

lose its advantage over other models, or even appear completely inappropriate for such a

situation. This is the main point examined in Experiment 2. The function approximation

problems used in this experiment are more complex, less regular, and more numerous than

those of Experiment 1, in order to more closely mimic real life problems. We also address the

question of the intra-participant variability of generalization responses. The ABI model

predicts the intra-participant variability exactly in the same way as the inter-participant

variability, thus the intra-participant variability must exist (which is not a priori obvious!),

and it must be correlated with the predicted variability, as was the inter-participant variability

in Experiment 1. Finally, for exploratory purposes, we also recorded the response times in this

experiment.

7.1 Participant

As noted in Section 2.4, the generalization responses of participants in the quick

function approximation task are highly consistent (ICC = 0.985), so we can reasonably

consider that the responses provided by a single participant are representative enough of the whole population's responses. This allows us to schedule a quite long, single-participant experiment with repeated measures, which is suitable for the goals stated in the section introduction. The participant in this experiment was a 46-year-old man, a computer engineer, with corrected-to-normal vision.

7.2 Material

A set of 64 bivariate function approximation problems was built in the following way.

Each problem included 8 data points plus one generalization point. The coordinates of points

were randomly generated integer values in the range 1-9, as were the function values for data

points. The following constraints were applied in order to ensure that regular continuous


functions could interpolate these points. The minimum distance between two distinct points in

the support space was fixed to 2, and the function values of data points fulfilled a Lipschitz

condition such that, for each pair of data points, the absolute difference of the two function

values divided by the input points' Euclidean distance was never greater than 3 (Lipschitz

constant). An additional set of 8 problems was built in the same way for practice. The

problems were presented in two different ways on a computer screen. In the "spatial

presentation" condition, the problems were presented in a 10×10 square, each point location

corresponding to its input coordinates, while the point was simply materialized by its function

value for data points, or by a question-mark (?) for the generalization point. Figure 6

illustrates the spatial presentation of a problem. In the "non-spatial presentation" condition,

each point was represented as a line segment in a small square, and the heights of the segment

extremities corresponded to the point coordinates. Data point function values were shown

above the corresponding small squares, while a question-mark was used for the generalization

point. Figure 7 illustrates the non-spatial presentation of the same problem as in Figure 6.

Figure 6

Figure 7

7.3 Procedure

In each presentation condition, each trial began with the display of a problem on a

computer screen, then the participant had to estimate the function value at the generalization

point, entering his response on the computer keyboard. As in Experiment 1, there was no time

pressure, however, the computer recorded the response time. If the entered response was

empty or not a number, then the participant was prompted to re-enter his response, however,

the response time was always recorded at the first response. The next trial was initiated when

a suitable response had been recorded. The experiment was divided into 8 sessions of about

1h30 each, distributed over 8 days. Each session began with 8 practice trials, followed by a

short pause. Then the 64 experimental problems were presented in random order, by

sequences of 8 trials, with a short pause between sequences. Only one presentation condition

was used in a given session. Odd rank sessions used the spatial presentation condition, and

even rank sessions used the non-spatial presentation condition. At the end of the experiment,

we had 4 responses and 4 response times in each of the 2 presentation conditions, for each of

the 64 problems.


7.4 Competing models

The same standard models as in Experiment 1 were used, in the same conditions, in

order to predict the generalization responses. However, the Qua model was excluded because

it was not suitable for problems with 8 data points. The MLP model always completed learning

with 7 hidden units, while the CCo model always learned with 5 hidden units.

7.5 Results

7.5.1 Effect of the presentation condition and tuning of the ABI model

Four different tunings of the ABI model parameters were used. The first one, denoted

ABI1, is simply the optimal tuning obtained in Experiment 1. The second one, denoted ABIa,

is the optimal tuning obtained by fitting the generalization data of the spatial presentation

condition in Experiment 2. The third one, denoted ABIb, is the optimal tuning obtained by

fitting the generalization data of the non-spatial presentation condition in Experiment 2. The

corresponding parameter values are presented in Table 8. Averaging the three values for each

parameter, we obtain an average α of 1.8669 and an average β of 2.3853. Rounding these

values to the nearest integer numbers, we define a default tuning, denoted ABI0, as α = β =

2. The correlation values (r) were computed between model predictions and the averaged four

responses of the participant to each problem, in each presentation condition. One can see in

Table 8 that the fits of the ABI model to human responses, whatever the model tuning, are

substantially and significantly worse in the non-spatial presentation condition than in the

spatial presentation condition. Moreover, the correlation between human responses in the

non-spatial presentation condition and those in the spatial presentation condition is only r =

0.819, which, although highly significant (p< .001), is significantly less than the 0.947

correlation of ABI0 model predictions (for instance) with the human responses in the spatial

presentation condition (T2(61) = 4.85, p< .001). So, clearly, something important is lost when

changing the presentation of problems, but we need further analyses to know what is lost. One

can also note that the default tuning (ABI0) is almost as good as optimal tunings in both

presentation conditions.

Table 8


7.5.2 Comparison of models

Table 9 presents pair-wise comparisons of the prediction performance of 11 models,

by means of Williams' T2 tests (Steiger, 1980; Williams, 1959) on the correlations between

predicted and observed responses, in the spatial presentation condition. As one can see, both

ABIa and ABI0 provided significantly better predictions than all other models in this

presentation condition. Table 10 presents the values of model selection criteria r2, AIC, BIC,

and the approximated Bayes factor (BF) with respect to the minimal BIC. Again, ABIa and

ABI0 are the best models with respect to both criteria, and the results obtained in the spatial

presentation condition confirm those obtained in Experiment 1. Moreover, the Bayes factor

indicates that ABI0 is "strongly" preferable to ABIa, but both models under-fit the data in

terms of squared correlation. Tables 11 and 12 present comparisons similar to those of Tables

9 and 10, but for the data obtained in the non-spatial presentation condition. The ABIb and

ABI0 models remain the best predictors, but not significantly, in terms of correlation of their

predictions with human responses (see Table 11). However, AIC and BIC criteria are minimal

for the standard Lipschitz approximator (Lip) (see Table 12). This emergence of the Lip

model as a possible candidate in the non-spatial presentation condition is interesting, and the

elegant simplicity of this model certainly merits some attention, however, the Bayes factor

indicates that Lip did not perform substantially better than ABI0 in this presentation

condition. The approximated Bayes factor with respect to ABI0 (last column of Table 12)

shows that ABI0 remains at least "substantially" better than all other models, except Lip. One

can note that the non-spatial presentation condition poorly discriminates the models, and that

all fits are poorer than in the spatial presentation condition. What is clear is that the special

advantage of the ABI model over other models is lost in the non-spatial presentation

condition. So, this clear advantage of the ABI model in the spatial presentation condition

probably results from a particular performance of the visual system, such as the Voronoi-like

tessellation of the data space, which is taken into account in the ABI model, but not in the

other tested models. This answers the main question of this experiment. Figure 8 visualizes

the generalization surface generated by the ABI0 model from the data points of the problem

of Figure 6.

Table 9

Table 10

Table 11


Table 12

Figure 8

7.5.3 Interpolation and extrapolation problems

Contrary to the problems of Experiment 1, those of Experiment 2 were randomly generated,

and they can thus be considered as a random sample of bivariate function approximation

problems (with some specified characteristics such as the number of data points). It turned out

that, among the 64 problems of Experiment 2, 29 problems were interpolation problems

where the generalization point was inside the convex hull polytope of the data set, and the

remaining 35 problems were extrapolation problems. Table 13 presents a comparison of 10

models' performance (correlations of predictions with human generalization responses), using

only the ABI0 version of the ABI model (without free parameters). The models are compared

in both conditions of Experiment 2, distinguishing interpolation and extrapolation problems.

In the spatial presentation condition, the best predictor is the ABI model for both interpolation

and extrapolation problems, with a particularly strong advantage in extrapolation problems.

However, in the non-spatial presentation condition, the best predictor for interpolation

problems is the Nearest Neighbour Approximator, suggesting that the difficulty in visualizing

the problems induces a regression toward the simplest strategy. This is not true for

extrapolation problems, where the best predictor is the Hardy multiquadric model. A possible

explanation of this is that Hardy's basis functions rapidly become approximately linear along

their radii as the distance from their centre increases, which can lead to approximately linear

extrapolations, that is, to a simplification of the ABI process. This is also in line with

previously reported data concerning extrapolation in univariate function learning tasks

(DeLosh, Busemeyer, & McDaniel, 1997). However, most correlation differences are not

significant in the non-spatial presentation condition, and it is not clear what model is actually

the best predictor, given that it seems to be the Lipschitz approximator, closely followed by

the ABI model, when interpolation and extrapolation generalization responses are mixed.

Note, that the ABI model never performed significantly worse than the best predictors, even

in the non-spatial presentation condition, however, its special advantage is clearly lost in this

case. One can also observe that, contrarily to the intuition, most models better capture human

behaviour regularities in extrapolation problems than in interpolation problems. The reason of

this is not clear, and there is matter for further investigations.
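For completeness, the classification of a bivariate problem as interpolation or extrapolation (generalization point inside or outside the convex hull of the data points) can be sketched in a few Matlab lines using the base functions convhull and inpolygon; the data values below are those of the extrapolation set of Experiment 1 (Table 1), and the variable names XD and xt are merely illustrative.

XD = [1 6; 6 9; 9 4; 4 1; 5 5];  % data input points (extrapolation set of Experiment 1)
xt = [1 1];                      % a generalization point (a corner of the square)
k = convhull(XD(:,1), XD(:,2));                        % vertices of the convex hull (closed polygon)
isInterp = inpolygon(xt(1), xt(2), XD(k,1), XD(k,2));  % true for interpolation, false for extrapolation
% Here isInterp is false: the corner lies outside the hull, so this is an extrapolation problem.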


Table 13

7.5.4 Response variability and response time

The results concerning response variability and response time are summarized in Table 14. As expected, in the spatial presentation condition, there was a significant correlation between the response standard deviations predicted by the ABI model (ABIa and ABI0) and the observed standard deviations. The obtained correlations are lower than in Experiment 1, but this is expected given that human response standard deviations were estimated from only 4 measurements each in Experiment 2, whereas they were estimated from 16 measurements (participants) in Experiment 1. The SD estimates of Experiment 2 are therefore clearly less accurate than those of Experiment 1, resulting in a loss of correlation. An interesting fact is that we also observed a significant positive correlation between the SDs predicted by the ABI model and the human response times. In the framework of the model, this suggests that participants may increase the number of sampled bipoints as a function of the variance of the successive linear estimates. Intuitively, this means that one samples more data when the sampled data are inconsistent, which seems reasonable. This provides suggestions for future developments of the model, concerning the determination of the number of sampled bipoints used in estimating generalization responses. Similar, but weaker, correlations were observed in the non-spatial presentation condition, and they were significant for response times only. Finally, the mean response time per problem was 66 (± 40) seconds in the spatial presentation condition, and 52 (± 26) seconds in the non-spatial presentation condition, which is about twice the mean response time in Experiment 1. This increase in response time simply confirms that the problems of Experiment 2 were more complex than those of Experiment 1.
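As a purely hypothetical illustration of the above suggestion concerning the number of sampled bipoints (it is not part of the published ABI function), the following Matlab sketch accumulates weighted bipoint estimates one at a time and stops when the running estimate stabilizes, so that inconsistent local estimates lead to more sampled bipoints and, presumably, to longer response times; the function name ABIsequential, the tolerance tol, and the minimum sample size nMin are illustrative assumptions.

function [fx, nUsed] = ABIsequential(ests, w, tol, nMin)
% ests: column vector of bipoint linear estimates (as computed in the Appendix code)
% w:    corresponding weights; tol: stopping tolerance; nMin: minimum number of bipoints
order = randperm(numel(ests));          % sample the bipoints in random order
sw = 0; sfx = 0; fx = NaN; nUsed = 0;
for k = order
    sw = sw + w(k); sfx = sfx + w(k)*ests(k);
    newfx = sfx/sw; nUsed = nUsed + 1;
    if nUsed >= nMin && abs(newfx - fx) < tol
        fx = newfx; return;             % estimate has stabilized: stop sampling
    end
    fx = newfx;
end
end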

Table 14

8. Conclusion

This work is a first approach to quick multivariate function approximation tasks and their modelling. The proposed model (ABI) clearly exhibited very encouraging prediction performance on suitably visualized approximation problems, and it thus constitutes a possible starting point for further developments. The new model shares a number of characteristics with several standard models from the univariate function learning and computer science literatures. It includes an inverse distance principle, like the well-known Shepard interpolator (Shepard, 1968), and it approximates non-linear functions by averaging local linear interpolations/extrapolations from data points, just as the EXAM model does (DeLosh et al., 1997). Note that the POLE model (Kalish et al., 2004) also uses pieces of linear functions to approximate non-linear functions, so there is a strong convergence of several successful models on this point. However, the ABI model relies on a Voronoi structuring of the data space, as suggested by a body of empirical evidence concerning visual processing (Dry, 2008; Dry et al., 2009). Moreover, the use of bipoints as elementary approximation units in multidimensional spaces required the introduction of a special polytope distance, named "exteriority" (Courrieu, 1994), in order to estimate the distance of any generalization point from each relevant bipoint. These last elements were strongly suggested by a detailed examination of the Experiment 1 data. Experiment 2 showed that the model specifically applies to problem presentations in which the visual system can perform some form of Voronoi-like tessellation of the data space. However, even in cases where such a visual tessellation is not possible, the ABI model remains a top-level predictor, together with other models such as the standard Lipschitz interpolator, and possibly the Nearest Neighbour Approximator in interpolation problems, or the Hardy multiquadric model in extrapolation problems. This leaves open the question of a possible abstract (non-visual) Voronoi tessellation.

The ABI model can work in spaces of any dimension, and we specifically studied bivariate functions in this article. However, one can also restrict the support space to one variable in order to compare the generalization behaviour of the ABI model with that of well-known function learning models. This is particularly relevant in the case of the EXAM model (DeLosh et al., 1997), because EXAM also builds predictions by averaging linear interpolations and extrapolations from data points, but in a way that requires the data points to be strictly ordered, hence a one-dimensional support space. Figure 9 shows a comparison of ABI and EXAM generalization responses from 5 data points belonging to 3 univariate functions (linear, exponential, and quadratic) tested by DeLosh et al. (1997), and a fourth function which is a perturbed variant of the quadratic function. As one can see, ABI and EXAM provide the same predictions in the linear case, and their generalization responses are very similar in the interpolation and nearby extrapolation areas of the other functions. However, discrepancies appear between the two models' predictions in the far extrapolation areas of non-linear functions. As Figure 9 shows, while EXAM extrapolates in a strictly linear way from the nearest data points, ABI far extrapolations gradually tend to take into account the whole set of data points, which draws the function towards the average of all bipoint extrapolations. Examining the human and EXAM generalization responses in Figures 9 (exponential function) and 10 (quadratic function) of DeLosh et al. (1997), one can observe that human responses conform more closely to EXAM predictions in the backward (low) extrapolation area, while they conform more closely to ABI predictions in the forward (high) extrapolation area. So, for the moment, neither of the two models should be preferred for predicting one-variable function extrapolation.

interpolators (unlike smoothers) are frequently highly sensitive to small variations in the data,

which can generate undesirable oscillations of the generalization function. No such a problem

appeared with ABI and EXAM models, and their reaction to local data variations seems very

reasonable. The ABI model was not designed to manage noisy data. This is because the task

of generalizing from a small set of sparse data does not allow participants to detect possible

noise, so participants have no other choice than considering the data as exact. However, the

ABI model itself is not limited to small data sets and it could be used to model other tasks,

possibly using large sets of noisy data, provided that some noise management mechanism be

added to the model. In order to remain in the direct approximation philosophy, a possible

choice is to apply a local smoother, such as a convolution filter, to the raw data in order to

generate a suitable filtered data set before applying the ABI process. However, this is a

substantial extension of the model, and of its application domain, that clearly concerns future

works.
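The following Matlab function is a minimal sketch of such a pre-filter, under the assumption that a simple Gaussian kernel smoother is an acceptable local smoother for scattered data; the function name presmooth and the bandwidth parameter h are illustrative, and the filtered outputs FDs would simply replace FD in a call to the ABI function of the Appendix.

function FDs = presmooth(XD, FD, h)
% Gaussian kernel smoothing of the outputs FD at the data input points XD (bandwidth h)
N = size(XD,1);
FDs = zeros(size(FD));
for i = 1:N
    d2 = sum((XD - repmat(XD(i,:), N, 1)).^2, 2);  % squared distances to data point i
    w  = exp(-d2/(2*h^2));                         % Gaussian weights
    FDs(i,:) = (w'*FD)/sum(w);                     % locally weighted average of the outputs
end
end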

Figure 9

It remains to complete the ABI model with some missing variables (linear interpolation/extrapolation error, response rounding, and number of sampled bipoints per response) in order to build a full simulation model. Finally, we note that there is clearly room for model improvement, since all models, including ABI, under-fitted the empirical data in terms of squared correlation with respect to the human data ICC (Courrieu et al., 2011). This new model testing methodology has the advantage of providing a clear answer to the questions of under-fitting and over-fitting of the data by the models, whereas traditional model selection criteria only allow several given models to be compared with each other, without any indication of their plausibility. In the present case, the model testing methodology based on the data ICC provides a quite severe diagnostic for all tested models, encouraging us to be modest and to envisage further investigations on quick multivariate function approximation.
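As a minimal sketch of this diagnostic, assuming that the relevant comparison is simply between a model's r2 and the confidence interval of the data ICC, the following Matlab lines apply the rule to the ABI values of Table 7 (Experiment 1); the variable names are illustrative.

r2  = 0.950;            % squared correlation of the ABI model in Experiment 1 (Table 7)
icc = [0.959, 0.997];   % 99.9% confidence interval of the data ICC (Table 7)
if r2 < icc(1), disp('under-fitting: the model misses reproducible variance');
elseif r2 > icc(2), disp('over-fitting: the model fits non-reproducible variance');
else, disp('fit compatible with the data ICC'); end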

______________________

Appendix

Matlab code of the ABI function (for "Average of Bipoints Interpolations") that

implements the proposed model (for academic use only, since exceptions are not managed).

Comments (following "%") provide indications about the use of the function.



function [EFT,VFT] = ABI(XD,FD,XT,alpha,beta)
% Average of Bipoints Interpolations
% Expected output (EFT) and variance (VFT) at test input points (XT),
% given data output (FD) at data input points (XD). Points=row vectors.
[ND,dX]=size(XD); [ND1,dF]=size(FD); [NT,dX1]=size(XT);
if nargin<5, alpha=2; beta=2; end % default parameters
EFT=zeros(NT,dF); VFT=EFT;
Bip=Neighbors(XD); [NBip,s]=size(Bip); % index pairs of Voronoi neighbour data points (bipoints)
for t=1:NT
    x=XT(t,:);
    dnn=inf; % distance to the nearest data point
    for i=1:ND
        d=norm(x-XD(i,:));
        if d<dnn, dnn=d; nn=i; end
    end
    if dnn<=sqrt(eps) % the test point coincides with a data point
        EFT(t,:)=FD(nn,:); VFT(t,:)=zeros(1,dF);
    else
        sw=0; fx=0; f2x=0;
        for k=1:NBip
            a=XD(Bip(k,1),:); b=XD(Bip(k,2),:);
            fa=FD(Bip(k,1),:); fb=FD(Bip(k,2),:);
            ext=Exteriority(x,a,b);
            w=exp(-beta*(ext/dnn)^alpha); % weight of the bipoint estimate
            sw=sw+w;
            fbip=fa+(fb-fa)*((x-a)*(b-a)')/((b-a)*(b-a)'); % linear estimate from the bipoint
            fx=fx+w*fbip; f2x=f2x+w*(fbip.*fbip);
        end
        fx=fx/sw; f2x=f2x/sw;
        EFT(t,:)=fx; VFT(t,:)=f2x-fx.^2;
    end
end
end

function ext=Exteriority(x,a,b)
% Exteriority of point x from line segment ab
C=a-b; CC=sum(C.*C);
if CC==0
    ext=norm(x-a);
else
    A=x-a; B=x-b;
    e2=sum(A.*A)*sum(B.*B)-sum(A.*B)^2;
    e2=e2+0.25*(abs(sum(A.*C))+abs(sum(B.*C))-CC)^2;
    ext=sqrt(max(0,e2/CC));
end
end

function pairs=Neighbors(X)
% Index pairs of all Voronoi neighbors from matrix X
[m,n]=size(X);
pairs=[];
for i=1:(m-1)
    for j=(i+1):m
        mid=0.5*(X(i,:)+X(j,:));
        Y=X-ones(m,1)*mid;
        D2=sum(Y.^2,2);
        minD2=min(D2);
        if minD2==D2(i)
            pairs=[pairs;[i j]];
        else
            A=eye(n)-Y(i,:)'*Y(i,:)/(Y(i,:)*Y(i,:)');
            k=find(D2==minD2); k=k(1);
            B=2*Y(k,:)*A;
            C=-(D2(k)+D2(i))*B/(B*B');
            D2=sum((Y-ones(m,1)*C).^2,2);
            minD2=min(D2);
            if minD2==D2(i), pairs=[pairs;[i j]]; end
        end
    end
end
end
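As a usage sketch (not part of the original listing), the function can be called on the extrapolation data set of Experiment 1 (function E1, Table 1) at the four generalization points, with the parameter values fitted in Experiment 1; with these settings the expected outputs EFT should be close to the corresponding model means reported for E1 in Table 5, although this has not been re-verified here.

XD = [1 6; 6 9; 9 4; 4 1; 5 5];   % data input points (extrapolation set)
FD = [8; 3; 18; 13; 3];           % E1 output values at these points (Table 1)
XT = [1 1; 1 9; 9 1; 9 9];        % generalization points of problems E1.1 to E1.4
[EFT, VFT] = ABI(XD, FD, XT, 1.2031, 2.3536);  % parameter values fitted on Experiment 1 (Table 8)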


References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 716-723.

Ashby, F.G., & Ell, S.W. (2002). Single versus multiple systems of category learning: Reply

to Nosofsky and Kruschke (2002). Psychonomic Bulletin & Review, 9 (1), 175–180.

Beliakov, G. (2006). Interpolation of Lipschitz functions. Journal of Computational and

Applied Mathematics, 196, 20–44.

Ben-Israel, A., & Greville, T.N.E. (2003). Generalized Inverses: Theory and Applications

(2nd ed.). New York, Springer. 420 pages.

Bott, L., & Heit, E. (2004). Nonmonotonic extrapolation in function learning. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 30(1), 38-50.

Brodlie, K.W., Asim, M.R., & Unsworth, K. (2005). Constrained visualization using the

Shepard interpolation family. Computer Graphics Forum, 24(4), 809-820.

Busemeyer, J.R., Myung, I.J., & McDaniel, M.A. (1993). Cue competition effects:

Theoretical implications for adaptive network learning models. Psychological Science,

4.

Courrieu, P. (1994). Three algorithms for estimating the domain of validity of feedforward

neural networks. Neural Networks, 7, 169-174.

Courrieu, P. (1997). The Hyperbell Algorithm for global optimization: a random walk using

Cauchy densities. Journal of Global Optimization, 10, 37-55.

Courrieu, P. (2004). Solving time of least square systems in Sigma-Pi unit networks. Neural

Information Processing: Letters and Reviews, 4(3), 39-45.

Courrieu, P. (2005). Function approximation on non-Euclidean spaces. Neural Networks, 18,

91-102.

Courrieu, P. (2009). Fast solving of Weighted Pairing Least-Squares systems. Journal of

Computational and Applied Mathematics, 231, 39-48.

Courrieu, P., Brand-D'Abrescia, M., Peereman, R., Spieler, D., & Rey, A. (2011). Validated

intraclass correlation statistics to test item performance models. Behavior Research

Methods, 43, 37-55. doi: 10.3758/s13428-010-0020-5

(preprint: http://arxiv.org/abs/1010.0173).

DeLosh, E.L., Busemeyer, J.R., & McDaniel, M.A. (1997). Extrapolation: the sine qua non

for abstraction in function learning. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 23(4), 968-986.


Dry, M. J. (2008). Using relational structure to detect symmetry: a Voronoi tessellation based

model of symmetry perception. Acta Psychologica, 128, 75-90.

Dry, M.J., Navarro, D.J., Preiss, K., & Lee, M.D. (2009). The perceptual organization of point constellations. In N. Taatgen, H. van Rijn, J. Nerbonne, & L. Schomaker (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 1151-1156). Austin, TX: Cognitive Science Society.

Fahlman, S.E., & Lebiere, C. (1990). The Cascade-Correlation learning architecture. In D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems, 2 (pp. 525-532). San Mateo, CA: Morgan Kaufmann.

Gärtner, B., & Jaggi, M. (2009). Coresets for polytope distance. ACM, Proceedings of the

25th annual symposium on Computational geometry, 33-42.

Girosi, F., & Poggio, T. (1990). Networks and the best approximation property. Biological

Cybernetics, 63, 169-176.

Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford, UK: Oxford University Press.

Kalish, M.L., Lewandowsky, S., & Kruschke, J.K. (2004). Population of linear experts:

knowledge partitioning and function learning. Psychological Review, 111(4), 1072-

1099.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical

Association, 90(430), 773-795.

Kelley, H., & Busemeyer, J. (2008). A comparison of models for learning how to dynamically

integrate multiple cues in order to forecast continuous criteria. Journal of Mathematical

Psychology, 52, 218–240.

Koh, K. (1993). Induction of combination rules in two-dimensional function learning.

Memory & Cognition, 21(5), 573-590.

Lewandowsky, S., Kalish, M., & Ngang, S.K. (2002). Simplified learning in complex

situations: knowledge partitioning in function learning. Journal of Experimental

Psychology: General, 131(2), 163-193.

Luce, R.D. (1977). The choice axiom after twenty years. Journal of Mathematical

Psychology, 15, 215-233.

McDaniel, M.A., & Busemeyer, J.R. (2005). The conceptual basis of function learning and

extrapolation: comparison of rule-based and associative-based models. Psychonomic

Bulletin & Review, 12 (1), 24-42.


McDaniel, M.A., Dimperio, E., Griego, J.A., & Busemeyer, J.R. (2009). Predicting transfer

performance: a comparison of competing function learning models. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 35(1), 173-195.

McGraw, K.O., & Wong, S.P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46.

Micchelli, C.A. (1986). Interpolation of scattered data: distance matrices and conditionally

positive definite functions. Constructive Approximation, 2, 11-22.

Moody, J., & Darken, C.J. (1989). Fast learning in networks of locally-tuned processing units.

Neural Computation, 1, 281-294.

Nosofsky, R.M., & Kruschke, J.K. (2002). Single-system models and interference in category

learning: Commentary on Waldron and Ashby (2001). Psychonomic Bulletin & Review,

9, 169-174.

Okabe, A., Boots, B., Sugihara, K., & Chiu, S.N. (2000). Spatial Tessellations - Concepts and

Applications of Voronoi Diagrams (2nd ed.). Chichester, John Wiley. 671 pages.

Pelillo, M. (1996). A relaxation algorithm for estimating the domain of validity of

feedforward neural networks. Neural Processing Letters, 3, 113-121.

Pitt, M.A., & Myung, I.J. (2002). When a good fit can be bad. Trends in Cognitive Sciences,

6(10), 421-425.

Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the

IEEE, 78(9), 1481-1497.

Renka, R.J. (1988). Multivariate interpolation of large sets of scattered data. ACM

Transactions on Mathematical Software, 14(2), 139-148.

Rey, A., Courrieu, P., Schmidt-Weigand, F., & Jacobs, A.M. (2009). Item performance in visual word recognition. Psychonomic Bulletin & Review, 16(3), 600-608.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by

error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.): Parallel Distributed

Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT

Press, pp. 318-362.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data. Proceedings of the ACM National Conference, 517-524.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science.

Science, 237, 1317-1323.


Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological

Bulletin, 87(2), 245-251.

Williams, E. J. (1959). The comparison of regression variables. Journal of the Royal

Statistical Society, Series B, 21, 396-399.

Yoon, J. (2001). Interpolation by Radial Basis Functions on Sobolev space. Journal of

Approximation Theory, 112, 1-15.

-----------------------------------------------------------------------------------------------------------------

AUTHOR NOTE

The author would like to thank Mark McDaniel and an anonymous reviewer for their constructive comments concerning this work. Many thanks also to Matthew Dry and John Kruschke for their helpful suggestions.


Legends and Captions

Table 1. Input-output values of the data sets, and input coordinates of the generalization

points for interpolation and extrapolation problems. Each of the 16 approximation problems

(e.g. "E1.2") consists of a given sampled function (e.g. "E1"), together with a generalization

point (e.g. ".2"), as illustrated in Figure 1.

Table 2. Individual responses of 16 participants to the 16 problems, with mean and standard

deviation for each problem.

Table 3. Generalization responses provided by 10 standard models for the 16 problems,

compared to mean human responses ("Hum"), and correlations between models' responses

and human responses.

Table 4. Data and results of the model class characterization procedure. Each coefficient in the weight vector WI or WE is the value, at the generalization point, of the weighting function of the data point whose distance from this generalization point appears just above it (in the "Dist." row). The corresponding data function values are given for each problem. Weighting and summing them in each row, one obtains the values in the "Obtained" column, which are least-squares approximations of the corresponding human responses in the "Target" column. The "Const." rows correspond to the unit sum constraint of the model class.

Table 5. Mean values and standard deviations of the responses to the 16 problems provided by humans and by the proposed model (ABI function listed in the Appendix). The last row gives the correlations between the predictions and the observations.

Table 6. Pair-wise comparisons of the prediction performance of 11 models by means of Student t tests between samples (16 subjects) of Fisher r-to-z transformations, where each r value is the correlation between one subject's response pattern and one model's prediction pattern. A positive t value indicates that the row-entry model provided better predictions than the column-entry model, while the reverse holds for negative t values. The ABI model provided significantly better predictions than all other tested models.

Table 7. Number of fitted parameters, explained proportion of variance (r2), AIC, and BIC for

11 models tested with the data of Experiment 1. The best model is the one having minimal

AIC and BIC, and maximal r2. The last column provides approximated Bayes factors with

respect to the minimal BIC. Bayes factors greater than 3.2 are considered at least

"substantial". The table also provides the data ICC with its 99.9% confidence interval for

comparison to r2.

Table 8. Comparison (using Williams T2 tests) of the correlations between ABI model

predictions and the responses observed in the two presentation conditions of Experiment 2.

Model's predictions were generated using four tunings of the parameters: the optimal tuning

for Experiment 1 (ABI1), the optimal tuning for the spatial presentation of Experiment 2

(ABIa), the optimal tuning for the non-spatial presentation of Experiment 2 (ABIb), and a

default tuning (ABI0). The table also shows the corresponding parameter values.

Table 9. Pair-wise comparisons of the prediction performance of 11 models, by means of Williams T2 tests on the correlations between predicted and observed responses, in the spatial presentation condition of Experiment 2. A positive T2 value indicates that the row-entry model provided better predictions than the column-entry model, while the reverse holds for negative T2 values. The ABIa and ABI0 models provided significantly better predictions than all other tested models. The last row provides the obtained correlation for each model.

Table 10. Number of fitted parameters, explained proportion of variance (r2), AIC and BIC

criteria, and approximated Bayes factors with respect to the minimal BIC, for 11 models

tested with the data of the spatial presentation condition of Experiment 2.

Table 11. Pair-wise comparisons of the prediction performance of 11 models, by means of Williams T2 tests on the correlations between predicted and observed responses, in the non-spatial presentation condition of Experiment 2. A positive T2 value indicates that the row-entry model provided better predictions than the column-entry model, while the reverse holds for negative T2 values. The ABIb and ABI0 models provided slightly, but not significantly, better predictions than the standard models. The last row provides the obtained correlation for each model.

Table 12. Number of fitted parameters, explained proportion of variance (r2), AIC and BIC

criteria, and approximated Bayes factors with respect to the two lowest BICs, for 11 models

tested with the data of the non-spatial presentation condition of Experiment 2.

Table 13. Correlations between 10 models' predictions and human generalization responses in Experiment 2, in the spatial and non-spatial presentation conditions, for random interpolation problems (N=29) and for random extrapolation problems (N=35).

Table 14. Summary of the correlations observed between the response standard deviations predicted by three tunings of the ABI model (ABI0, ABIa, and ABIb), the observed response standard deviations, and the average response times per problem, in the spatial and non-spatial presentation conditions of Experiment 2.

Figure 1. An example of test sheet ("E1.2" extrapolation problem) in Experiment 1. The

function value in the empty circle must be approximated.

Figure 2. Generalization surfaces for the E1 function (dots are data points) generated by four

standard models (She, Lip, Har, and Phi).

Figure 3. Visualization of the data points (in black on white background) and average human

generalization responses (in white on black background) for the 4 experimental functions of

Experiment 1 (with 4 problems per function).

Figure 4. Examples of exteriority functions (first row) from bipoints whose extremity points are marked by small white squares. The bipoint in the left column belongs to the interpolation data set of Experiment 1, while the bipoint in the right column belongs to the extrapolation data set. The second row shows the nearest data point distance function (d0) for the interpolation (left) and the extrapolation (right) data sets, which also visualizes the Voronoi tessellation of these sets. The third row shows the ν functions of the corresponding bipoints, using the parameter values α=1.2031 and β=2.3536.

Figure 5. Generalization surfaces for the I1, I2, E1, and E2 functions (dots are data points)

generated by the ABI model (function listed in Appendix), with α=1.2031 and β=2.3536.

Figure 6. An example of function approximation problem as displayed in the spatial

presentation condition of Experiment 2.

Figure 7. An example of function approximation problem as displayed in the non-spatial

presentation condition of Experiment 2. This is formally the same problem as in Figure 6.

Figure 8. Visualization of the generalization surface generated by the ABI0 model from the

data points of Figure 6 (dots are data points).

Figure 9. Comparison of ABI and EXAM generalization responses from 5 data points

belonging to 3 univariate functions (linear, exponential, and quadratic) tested by DeLosh et al.

(1997), and a fourth function which is a perturbed variant of the quadratic function. The

default parameter values were used for the ABI model, while the EXAM model parameters

were set to γ=0.9 and α=0.1.


Table 1

              Interpolation Data                        Extrapolation Data
      Input x   Input y     I1     I2           Input x   Input y     E1     E2
         1         1        15     10              1         6         8      3
         1         9        10      5              6         9         3      3
         9         1        20     15              9         4        18     13
         9         9         5      5              4         1        13      8
         5         5         5     20              5         5         3     18

              Interpolation Test Points             Extrapolation Test Points
  #           Input x   Input y                     Input x   Input y
 .1              1         6                           1         1
 .2              6         9                           1         9
 .3              9         4                           9         1
 .4              4         1                           9         9

Table 2

Problem  Subject: 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16   mean    SD
I1.1              7   9   8  12   8  12  11  10  10  11  12  12  12   8   8  10  10.00  1.79
I1.2              5   7   5   6   6   6   6   7   7   6   5   7   7   6   8   5   6.19  0.91
I1.3             10  13  10  13   8  10  12  10  15  12  10  10  10  14  10  10  11.06  1.88
I1.4             15  17  17  16  16  18  16  17  17  16  18  18  17  16  15  15  16.50  1.03
I2.1             13  11   8   8   7  12   7  11   7   8   7  15   8   7  12  10   9.44  2.58
I2.2              5   6   5   9   5   5   6   5   5   7   5   5   5   7   9   5   5.88  1.41
I2.3             15  17  17  14  15  18  15  10  10  12  10  10  13  13  10  10  13.06  2.89
I2.4             13  14  13  12  12  13  14  12  13  13  10  18  13  12  12  10  12.75  1.81
E1.1             11  10  10  11  10  10  13  10  10  12  10  15  15  10   8  15  11.25  2.14
E1.2              5   3   6   7   8   5   8   5   5   6   5   5   0   5   6   5   5.25  1.88
E1.3             15  20  16  19  24  21  18  15  18  20  20  20  20  20  12  15  18.31  3.00
E1.4              3   2   5   4   2   4   3   3   5   6   8   3   0   7  10   5   4.38  2.50
E2.1              6   6   5  10   5   5   8   6  10  10   3   9   5   5   8   5   6.62  2.22
E2.2              3   0   2   5   1   2   0   3   2   2   3   4   0   2   6   3   2.38  1.71
E2.3             11   7  11  19  12  10  10  10  15  10   8  15  10  16  10   8  11.38  3.26
E2.4              2  -1   7  10   5   5   3   3   7   7   8   3   7   2   8   5   5.06  2.89

Table 3

Problem   Hum    NNA    Lip    Gau    Phi    Har    Spl    MLP    CCo    She    Qua
I1.1     10.0   10.0    9.0    9.4    9.1    9.4    8.5    9.2    9.2    9.8    7.3
I1.2      6.2    5.0    5.1    4.0    5.3    4.6    3.3    4.5    4.8    7.2    4.4
I1.3     11.1   20.0   14.0   12.8   14.3   12.1   10.8   11.6   13.0   12.9    9.8
I1.4     16.5   15.0   11.5   14.7   12.9   14.1   13.2   15.3   14.9   12.6   14.4
I2.1      9.4    5.0   11.0   10.5    9.7   10.1   16.1   10.9    9.9   10.1   12.8
I2.2      5.9    5.0   11.0    8.7    9.0    8.3   14.1    8.2   10.3    9.6    9.6
I2.3     13.1   15.0   13.7   15.8   15.4   14.5   20.3   17.2   15.2   13.8   17.2
I2.4     12.8   10.0   13.5   15.9   13.5   14.8   20.9   13.8   16.5   12.8   16.5
E1.1     11.2   13.0   12.8    8.9   13.1   15.0   19.2   12.9   14.1   10.3   17.5
E1.2      5.2    8.0    8.0    5.3    7.9    8.4    8.6    7.5    6.3    7.2    9.7
E1.3     18.3   18.0   15.3   13.4   18.2   21.9   30.0   18.8   26.1   13.2   28.5
E1.4      4.4    3.0    6.9    5.1    3.7   11.1   14.0    7.2   11.5    6.8   12.8
E2.1      6.6    8.0    8.2    2.6    7.5    5.3  -23.2    6.3    6.2    8.6   -4.3
E2.2      2.4    3.0    5.7   -0.1    2.6    1.0  -29.8    1.1   -6.3    6.1  -10.3
E2.3     11.4   13.0   13.0    7.0   12.6   12.1  -12.5    8.0   14.4   11.6    5.0
E2.4      5.1    3.0    5.7    1.9    3.0    5.7  -21.7    7.9    6.8    7.5   -3.1
r                .840   .839   .851   .916   .880   .619   .905   .858   .910   .759

Table 4

Problem   Reordered interpolation output data           Target   Obtained
I1.1      10.00    5.00   15.00    5.00   20.00          10.00     10.74
I2.1       5.00   20.00   10.00    5.00   15.00           9.44      8.83
I1.2       5.00    5.00   10.00   20.00   15.00           6.19      6.21
I2.2       5.00   20.00    5.00   15.00   10.00           5.88      6.88
I1.3      20.00    5.00    5.00   15.00   10.00          11.06     12.01
I2.3      15.00   20.00    5.00   10.00    5.00          13.06     12.30
I1.4      15.00    5.00   20.00   10.00    5.00          16.50     16.29
I2.4      10.00   20.00   15.00    5.00    5.00          12.75     14.12
Const.     1.00    1.00    1.00    1.00    1.00           1.00      1.00
Dist.     3.0000  4.1231  5.0000  8.5440  9.4340
WI        0.5003  0.1583  0.4244 -0.0160 -0.0670

Problem   Reordered extrapolation output data           Target   Obtained
E1.1      13.00    8.00    3.00   18.00    3.00          11.25     11.84
E2.1       8.00    3.00   18.00   13.00    3.00           6.62      6.79
E1.2       8.00    3.00    3.00   13.00   18.00           5.25      6.28
E2.2       3.00    3.00   18.00    8.00   13.00           2.38      2.43
E1.3      18.00   13.00    3.00    3.00    8.00          18.31     18.23
E2.3      13.00    8.00   18.00    3.00    3.00          11.38     12.97
E1.4       3.00   18.00    3.00    8.00   13.00           4.38      5.50
E2.4       3.00   13.00   18.00    3.00    8.00           5.06      5.03
Const.     1.00    1.00    1.00    1.00    1.00           1.00      1.00
Dist.     3.0000  5.0000  5.6569  8.5440  9.4340
WE        0.8854  0.2098  0.0048 -0.0710 -0.0290
                                                        r = .988

Table 5

                Mean values               Standard deviations
Problem       Human      Model            Human       Model
I1.1          10.00      10.59             1.79        2.83
I1.2           6.19       6.16             0.91        1.75
I1.3          11.06      13.73             1.88        3.63
I1.4          16.50      15.14             1.03        3.29
I2.1           9.44       8.47             2.58        4.11
I2.2           5.88       7.04             1.41        4.49
I2.3          13.06      13.07             2.89        3.70
I2.4          12.75      12.99             1.81        3.47
E1.1          11.25      11.50             2.14        3.38
E1.2           5.25       5.96             1.88        3.87
E1.3          18.31      18.81             3.00        5.23
E1.4           4.38       4.71             2.50        4.66
E2.1           6.62       7.00             2.22        4.35
E2.2           2.38       2.53             1.71        4.50
E2.3          11.38      13.87             3.26        4.32
E2.4           5.06       5.12             2.89        5.84
r                  0.975                        0.653

Table 6

t(15)     NNA     Lip     Gau     Phi     Har     Spl     MLP     CCo     She     Qua
Lip     -0.16
Gau      0.62    1.03
Phi      5.14    7.30    3.15
Har      2.08    2.41    0.93   -2.28
Spl     -5.84   -7.77  -12.62  -10.12  -13.22
MLP      2.72    3.21    3.26   -0.37    2.16   12.93
CCo      0.96    1.04   -0.08   -3.17   -2.80    9.29   -2.84
She      4.20    7.40    3.74   -0.65    1.77   10.66    0.00    2.91
Qua     -2.75   -3.47   -5.27   -6.91  -10.45   16.73   -9.14   -5.46   -6.71
ABI     19.70    9.83    4.93    4.97    6.58   10.90    3.25    8.98    4.78    8.75

Significance: |t(15)| ≥ 2.13, p<.05; |t(15)| ≥ 2.95, p<.01; |t(15)| ≥ 4.07, p<.001

Table 7

Model   Fitting parameters     r2        AIC       BIC     BF(ABI, *)
NNA             0            0.705      28.86     28.86       >1000
Lip             0            0.705      22.50     22.50         652
Gau             0            0.725      25.09     25.09       >1000
Phi             0            0.838      12.28     12.28        3.94
Har             0            0.775      23.08     23.08         871
Spl             0            0.383     793.95    793.95       >1000
MLP             0            0.819      14.43     14.43       11.53
CCo             0            0.737      53.88     53.88       >1000
She             0            0.828      19.86     19.86      174.16
Qua             1            0.576     147.91    148.69       >1000
ABI             2            0.950       8.00      9.54           1

Data ICC: 0.985   .999 confidence interval: [0.959, 0.997]

Table 8

Model tuning      α         β       Spatial presentation   Non-spatial presentation   Williams T2 test
ABI0              2         2            r = 0.947               r = 0.782                p < .001
ABI1           1.2031    2.3536          r = 0.931               r = 0.782                p < .001
ABIa           3.1757    1.4783          r = 0.948               r = 0.781                p < .001
ABIb           1.2219    3.3240          r = 0.939               r = 0.787                p < .001

Table 9

T2(61)    ABIa    ABI0    CCo     Gau     Har     Lip     MLP     NNA     Phi     She     Spl
ABI0     -0.54
CCo      -7.02   -6.94
Gau      -7.66   -7.59   -0.85
Har      -4.35   -4.26    3.22    3.27
Lip      -3.83   -3.73    3.14    4.11    0.80
MLP      -3.83   -3.71    5.46    4.10    0.58   -0.34
NNA      -6.92   -6.68    1.00    1.82   -1.47   -3.32   -1.77
Phi      -4.14   -3.91    3.30    4.48    0.93    0.25    0.49    5.69
She      -6.45   -6.40    0.50    1.35   -2.16   -4.41   -2.29   -0.66   -3.70
Spl      -9.15   -9.02   -2.33   -1.21   -4.60   -4.86   -5.78   -2.69   -5.06   -2.22
r         0.948   0.947   0.712   0.650   0.850   0.877   0.865   0.782   0.881   0.749   0.545

Significance: |T2(61)| ≥ 2, p<.05; |T2(61)| ≥ 2.66, p<.01; |T2(61)| ≥ 3.46, p<.001

Table 10

Model   Fitting parameters     r2        AIC       BIC     BF(ABI0, *)
ABIa            2            0.899       74.6      79.0        18.84
ABI0            0            0.897       73.1      73.1            1
CCo             0            0.506      629.1     629.1        >1000
Gau             0            0.423     1076.3    1076.3        >1000
Har             0            0.722      215.0     215.0        >1000
Lip             0            0.768      191.7     191.7        >1000
MLP             0            0.748      276.5     276.5        >1000
NNA             0            0.611      305.4     305.4        >1000
Phi             0            0.777      159.2     159.2        >1000
She             0            0.560      369.5     369.5        >1000
Spl             0            0.297     3709.8    3709.8        >1000

Data ICC: 0.978   .999 confidence interval: [0.957, 0.989]

Table 11

T2(61)    ABIb    ABI0    CCo     Gau     Har     Lip     MLP     NNA     Phi     She     Spl
ABI0     -0.47
CCo      -2.28   -2.23
Gau      -3.74   -3.60   -1.42
Har      -0.24   -0.14    2.55    3.29
Lip      -0.62   -0.52    1.65    3.17   -0.27
MLP      -0.88   -0.81    2.62    3.16   -0.88   -0.32
NNA      -0.29   -0.19    1.68    3.13   -0.04    0.29    0.46
Phi      -0.24   -0.11    1.92    3.67    0.05    0.58    0.62    0.19
She      -1.55   -1.45    0.66    2.09   -1.33   -1.60   -0.67   -1.38   -1.66
Spl      -4.11   -4.10   -2.30   -0.65   -3.96   -3.33   -3.89   -3.31   -3.63   -2.32
r         0.787   0.782   0.646   0.529   0.776   0.764   0.747   0.774   0.779   0.701   0.466

Significance: |T2(61)| ≥ 2, p<.05; |T2(61)| ≥ 2.66, p<.01; |T2(61)| ≥ 3.46, p<.001

Table 12

Model   Fitting parameters     r2        AIC       BIC     BF(Lip, *)   BF(ABI0, *)
ABIb            2            0.619      57.27     61.59       20.49         15.09
ABI0            0            0.612      56.16     56.16        1.35             1
CCo             0            0.417     157.52    157.52       >1000         >1000
Gau             0            0.280     221.99    221.99       >1000         >1000
Har             0            0.603      65.05     65.05      115.03         85.09
Lip             0            0.583      55.56     55.56           1          ----
MLP             0            0.557     102.95    102.95       >1000         >1000
NNA             0            0.599      64.15     64.15       73.24         54.18
Phi             0            0.606      58.51     58.51        4.37          3.23
She             0            0.491      72.20     72.20       >1000         >1000
Spl             0            0.217     841.85    841.85       >1000         >1000

Data ICC: 0.875   .999 confidence interval: [0.762, 0.940]

Table 13

                  Spatial presentation               Non-spatial presentation
Model        Interpolation   Extrapolation        Interpolation   Extrapolation
ABI0             0.890           0.968                0.662           0.837
CCo              0.668*          0.725*               0.514           0.702*
Gau              0.799*          0.679*               0.554*          0.583*
Har              0.831           0.880*               0.609           0.881
Lip              0.851           0.910*               0.700           0.805
MLP              0.723*          0.914*               0.541           0.837
NNA              0.674*          0.884*               0.757           0.805
Phi              0.800*          0.935*               0.700           0.825
She              0.869           0.730*               0.670           0.742*
Spl              0.431*          0.652*               0.362*          0.563*

(*) Significantly worse than the best predictor in the same column (Williams T2 test).

Table 14

                          Spatial presentation                     Non-spatial presentation
Model tuning        Response SD        Response time         Response SD        Response time
ABI0 (SD)         r = 0.33, p<.01    r = 0.52, p<.01       r = 0.20, n.s.     r = 0.43, p<.01
ABIa (SD)         r = 0.39, p<.01    r = 0.55, p<.01             -                   -
ABIb (SD)               -                  -               r = 0.20, n.s.     r = 0.47, p<.01

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9