Eindhoven University of Technology

MASTER

Analysis and design of pharmacokinetic models

Eyzaguirre Pérez, R.H.

Award date: 2006

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.



TECHNISCHE UNIVERSITEIT EINDHOVEN

Department of Mathematics and Computer Science

Analysis and Design

of Pharmacokinetic Models

By

R. H. Eyzaguirre Pérez

Supervisors:

E. E. M. van Berkum (TU/e)

H. C. M. van der Knaap (Unilever R&D Vlaardingen)

Eindhoven, August 2006


Preface

In this report I present the results of the research carried out to complete my master's studies at the Department of Mathematics and Computer Science of the Technische Universiteit Eindhoven. The research was conducted at Unilever R&D Vlaardingen between March and August 2006, in the field of the analysis of pharmacokinetic models to describe biological systems.

This was a very instructive experience for my professional and academic training, and for this reason I want to thank Unilever R&D Vlaardingen, which provided all the facilities needed to successfully conclude this project. In particular I want to thank my Unilever supervisor in this project, Henk van der Knaap of the Data Science Skillbase, and Emiel van Berkum, my supervisor at the Technische Universiteit Eindhoven. Thanks also go to Guus Duchateau, Pieter van de Pijl, and Martin Folz, members of the Bioavailability & Gut Health Expertise Group at Unilever R&D Vlaardingen, for their recommendations and time.

Raul Eyzaguirre
Eindhoven, August 2006.


Contents

Definitions and Nomenclature

1. Introduction

2. Compartment Analysis
   2.1. Principal Compartment Models
   2.2. Further Considerations

3. Individual Pharmacokinetics
   3.1. Nonlinear Regression Model
   3.2. Inference for Functions of the Estimated Parameters
   3.3. Measures of Curvature or Nonlinearity
   3.4. Software
   3.5. Example: One-Compartment Model, Extravascular Administration

4. Population Pharmacokinetics
   4.1. Hierarchical Nonlinear Models
   4.2. Traditional Approaches
   4.3. Inference Based on Linearization
   4.4. Software
   4.5. Example 1: One-Compartment Model with Extravascular Administration
   4.6. Example 2: Comparison of Two Treatments in the One-Compartment Model with Extravascular Administration

5. Sampling Strategies
   5.1. Optimal Designs
   5.2. Simulation Studies
   5.3. Sparse Data Analysis

6. Conclusions and Recommendations
   6.1. Conclusions
   6.2. Data Analysis Recommendations

Appendix: R Code for the Measures of Curvature Computation

References


Definitions and Nomenclature

α : Hybrid constant related to the micro rate constants k12, k21, and kel

β : Hybrid constant related to the micro rate constants k12, k21, and kel

β : r×1 vector of fixed population parameters

γ : Vector of intra-individual covariance parameters in the functional part of the covariance model

δ : Intra-individual covariance parameter used to model heterogeneity of variances

θ : p×1 vector of regression parameters

θ̂_GLS : Generalized least squares estimator for θ

θ̂_OLS : Ordinary least squares estimator for θ

θ̂_WLS : Weighted least squares estimator for θ

µ : Mean response

ξ : Vector of intra-individual covariance parameters

σ² : Intra-individual variance

σ̃² : Maximum likelihood estimator for σ²

σ̂²_OLS : Ordinary least squares estimator for σ²

σ̂²_WLS : Weighted least squares estimator for σ²

Σ_OLS : p×p matrix given by Σ_OLS = (F.^T F.)^(−1)

Σ_GLS : p×p matrix given by Σ_GLS = (F.^T S^(−1)(β, γ) F.)^(−1)

a : a×1 covariate vector of individual characteristics

A.. : (p+p′)×p×p compact acceleration array

A^θ : Parameter-effects acceleration array

A^ι : Intrinsic acceleration array

AIC : Akaike's Information Criterion

AUC : Area under the concentration-time curve

AUMC : Area under the first moment curve

b : k×1 vector of random effects

C : (p+p′)×p×p array of relative curvatures

C^θ : Parameter-effects relative curvature array

C^ι : Intrinsic relative curvature array

C(t) : Concentration of drug at time t

ClT : Total clearance

c^θ : Root mean square parameter-effects curvature

c^ι : Root mean square intrinsic curvature

cmax : Maximum concentration

Cp : Concentration of drug in plasma or central compartment

d : p-dimensional vector-valued function

D : Covariance matrix for the random effects

D : Administered dose

e : Random error

f : Fraction of the administered dose which is absorbed

f(x, θ) : Nonlinear function of θ

f(θ) : n×1 vector, named the expectation surface, that contains the functions f(x, θ) for all x

F. : n×p matrix of derivatives of f(θ) with respect to the elements of θ

F.. : n×p×p array of second derivatives of f(θ) with respect to the elements of θ

k : Rate constant

ka : Absorption rate constant

kel : Elimination rate constant

k12 : Distribution rate constant for transfer of drug from compartment 1 to compartment 2

k21 : Distribution rate constant for transfer of drug from compartment 2 to compartment 1

m : Number of subjects in the sample

MRT : Mean residence time

n : Number of measurements per subject

p : Number of regression parameters in the nonlinear model

R : Intra-individual covariance matrix

s² : Residual mean square

S(θ) : Residual sum of squares

S : Matrix proportional to the intra-individual covariance matrix R; specifically, R = σ²·S

SBC : Schwarz's Bayesian Criterion

t1/2 : Elimination half-life

tlag : Lag-time

tmax : Time to maximum concentration

VC : Volume of the central compartment

VD : Volume of distribution

W : Weight matrix

Ŵ : Estimated weight matrix

x : Covariate vector for the nonlinear regression model

y : Response variable in the nonlinear regression model

z : Vector of constants which includes some or all of the covariates in x


1. Introduction

Pharmacokinetics is dedicated to the study of the time course of substances in an organism; a pharmacokinetic model is used to describe the concentration of such substances in the organism over time. In the modeling and analysis of these data an important distinction has to be made between the case where the data come from one individual and the case where the data come from several individuals. In the first case we deal with individual pharmacokinetics, and the results of the experiments are usually analyzed using nonlinear regression models where the concentration in the body of the administered substance at time t is the response variable; in the second case we deal with population pharmacokinetics, and the main statistical tools are hierarchical nonlinear regression models, which are also referred to as nonlinear mixed effects models. Unilever is constantly working on the development and improvement of healthy food, and in this task a great deal of experimentation is carried out in the field of pharmacokinetics. The objective of this project is to answer some particular questions in the analysis of such models.

A first concern is related to the parameterization of the model. In pharmacokinetics, practitioners are usually interested in several pharmacokinetic parameters which are related among themselves through nonlinear functions. For instance, the pharmacokinetic parameter total clearance (ClT) is related to the elimination rate constant (kel) and the volume of distribution (VD) through the function ClT = kel·VD. As a result, a pharmacokinetic model can be fitted under several different parameterizations, so a question arising here is which parameterization is more convenient in order to get more accurate parameter estimates.

A second subject is the sampling strategy. Pharmacokinetic data are gathered for each subject over time, so a first issue is to define the optimal sampling times. The strategy will differ depending on whether the main goal is to determine the most appropriate functional form to describe the data or to estimate the parameters of a model with a given functional form. Because data are frequently taken from human beings, there are several limitations on the sampling strategy (ethical concerns, budget limitations, poor control of the exact measurement times, etc.). In this respect it is important to be able to extract the maximum amount of information from sparse data, the extreme case being how to take advantage of individuals with fewer data points than parameters in the model (i.e., individuals for whom the individual nonlinear regression model is not estimable). Indeed, an important concern here is the trade-off between the number of subjects and the number of measurements per subject.

A third concern is the estimation of the population parameters. Traditionally these parameters have been estimated by the naïve pooled data approach, which pools all the data as if they came from a single individual, and by the two-stage approach, where individual models are fitted first for each subject, and the individual parameter estimates are then used as building blocks to obtain estimators for the population parameters. Both approaches have limitations. The naïve pooled data approach does not recognize differences among individuals, so inter-individual and intra-individual variation are lumped together; the two-stage approach cannot take advantage of subjects with fewer data points than parameters in the individual model, and the population parameters obtained with this approach tend to be biased. Here we explore a third approach, which seems to be more efficient, based on a linearization of the hierarchical nonlinear model that is used to model the two sources of random variation: the inter-individual and the intra-individual variation.

This report is organized as follows. In Section 2 we present a short introduction to compartment models, which are the main type of parametric models used in pharmacokinetics. In Section 3 we deal with the analysis of individual pharmacokinetic models, with a main focus on the effect of the parameterization of the model. In Section 4 we approach the problem of population analysis, which is based on the estimation of the hierarchical nonlinear regression model by linearization. In Section 5 we treat the problem of sampling strategies; this problem is approached both from the theory of optimal designs and by simulation studies. Finally, in Section 6 we present our conclusions and recommendations.

Throughout this report we illustrate the methods with some examples. All the computations and data analysis are done using the R language and environment, version 2.2.1 for Windows. A good reference for statistical applications with R is the book of Venables and Ripley (2002) and, for the particular subject of linear and nonlinear mixed effects models, which is the main statistical tool in this report, the book of Pinheiro and Bates (2000).


2. Compartment Analysis

In pharmacokinetics a compartment is an entity which can be described by a definite volume and a concentration of drug in that volume. Although the human body is composed of millions of compartments, the theoretical model simplifies this to just a few of them (mainly one or two). In practice, rarely more than two compartments are used to model pharmacokinetic data.

Compartment models are a special class of nonlinear models where the response variable (concentration of drug in the compartment) is described by an ordinary differential equation. These differential equations describe the change of the drug concentration in the compartment over time, a process that usually follows first order kinetics. First order kinetics means that the rate of change of drug concentration in a compartment at time t is directly proportional to the drug concentration in that compartment at that time. Therefore, a process with first order kinetics can be described by the following differential equation:

dC(t)/dt = −k·C(t),   (2.1)

where C(t) is the drug concentration at time t, and k is named the rate constant.
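Equation (2.1) integrates directly to C(t) = C(0)·e^(−k·t). As a minimal numerical sketch (in Python here, although the report's own analyses use R), we can compare a forward-Euler integration of (2.1) with this analytic solution; the values C(0) = 100 and k = 0.2 are hypothetical:

```python
import math

def euler_first_order(c0, k, t_end, dt=1e-4):
    """Integrate dC/dt = -k*C with the forward Euler method."""
    c, t = c0, 0.0
    while t < t_end - 1e-12:
        c += dt * (-k * c)   # first order kinetics: rate proportional to C
        t += dt
    return c

C0, k, t = 100.0, 0.2, 5.0               # hypothetical units
numeric = euler_first_order(C0, k, t)
analytic = C0 * math.exp(-k * t)          # direct integration of (2.1)
print(round(numeric, 2), round(analytic, 2))
```

With a small step size the two values agree to well under one percent, which is all this sketch is meant to show.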

2.1. Principal Compartment Models

In this section we present a description of some important compartment models. The main goal of this section is to give some insight into the kind of models we will deal with throughout this report, so this list is not exhaustive. In all the models presented here we assume first order kinetics for the rate constants and that the drug is administered as a single dose with intravascular or extravascular administration. In an intravascular administration the drug is directly introduced into the bloodstream, usually considered as the central compartment, and therefore we assume that the drug is rapidly mixed in the blood or plasma. With this kind of administration our model will contain just an elimination rate if a one-compartment model is used, and elimination and between-compartment distribution rates if a two-compartment (or larger) model is used. With extravascular administration (for instance oral or nasal), the drug must be absorbed by the central compartment, so an absorption rate is also included in the model. These descriptions are mainly based on Chapters 15 and 21 of Ritschel and Kearns (2004).

2.1.1. Open One-Compartment Model, Intravascular Administration

In this model there is just one rate constant, that is, the elimination rate constant. The pharmacokinetic model is obtained by direct integration of (2.1) and hence defined by

C(t) = C(0)·e^(−kel·t).

In this model kel is the elimination rate constant, and C(0) is the drug concentration at time zero. Let D be the administered dose. An important pharmacokinetic parameter is the apparent volume of distribution, denoted by VD. In this model, VD is given by

VD = D / C(0),

so the model can be written as

C(t) = (D/VD)·e^(−kel·t).   (2.2)
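Model (2.2) is simple enough to sketch directly. The short Python fragment below (the report's analyses use R; the dose, volume of distribution, and rate constant here are hypothetical) evaluates (2.2) and checks the elimination half-life t1/2 = ln(2)/kel, the time at which the concentration has fallen to half of C(0), a closed form that follows directly from (2.2):

```python
import math

def conc_1c_iv(t, D, VD, kel):
    """One-compartment model, intravascular administration, equation (2.2)."""
    return (D / VD) * math.exp(-kel * t)

# Hypothetical dose, volume of distribution, and elimination rate constant
D, VD, kel = 500.0, 10.0, 0.25

c0 = conc_1c_iv(0.0, D, VD, kel)        # C(0) = D / VD
t_half = math.log(2) / kel              # elimination half-life t1/2
c_at_half = conc_1c_iv(t_half, D, VD, kel)
print(c0, round(t_half, 3), round(c_at_half, 3))  # concentration halves at t1/2
```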

2.1.2. Open One-Compartment Model, Extravascular Administration

In this model we have absorption and elimination rate constants. The pharmacokinetic model is given by

C(t) = B·e^(−kel·t) − A·e^(−ka·t),

where ka is the absorption rate constant. The coefficients A and B are equal to

A = B = (D·f·ka) / (VD·(ka − kel)),

where f is the fraction of the administered dose which is absorbed. Therefore, the model can be written as

C(t) = (D·f·ka) / (VD·(ka − kel)) · (e^(−kel·t) − e^(−ka·t)).   (2.3)
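Setting dC/dt = 0 in (2.3) gives the closed form tmax = ln(ka/kel)/(ka − kel) for the time to maximum concentration. The sketch below (Python rather than the report's R; all parameter values are hypothetical) evaluates (2.3) and confirms the closed form against a grid search:

```python
import math

def conc_1c_ev(t, D, f, VD, ka, kel):
    """One-compartment model, extravascular administration, equation (2.3)."""
    coef = (D * f * ka) / (VD * (ka - kel))
    return coef * (math.exp(-kel * t) - math.exp(-ka * t))

# Hypothetical parameter values
D, f, VD, ka, kel = 500.0, 0.9, 10.0, 1.2, 0.25

# Closed-form time of maximum concentration (set dC/dt = 0 in (2.3))
t_max = math.log(ka / kel) / (ka - kel)

# Numerical check: a fine grid should peak at (approximately) the same time
grid = [i * 0.001 for i in range(20000)]
t_peak = max(grid, key=lambda t: conc_1c_ev(t, D, f, VD, ka, kel))
print(round(t_max, 3), round(t_peak, 3))
```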

2.1.3. Open Two-Compartment Model, Intravascular Administration

The pharmacokinetic model is given by

C(t) = B·e^(−β·t) + A·e^(−α·t).   (2.4)

In this model α and β are called hybrid constants and are related to the micro rate constants k12, k21, and kel by the following equations:

α + β = k12 + k21 + kel,
α·β = k21·kel.

The micro rate constant k12 is the distribution rate constant for transfer of drug from compartment 1 (central compartment) to compartment 2 (peripheral compartment), k21 is the distribution rate constant for transfer of drug from compartment 2 to compartment 1, and kel is the elimination rate constant of drug from the central compartment. The coefficients A and B are given by

A = D·(α − k21) / (VC·(α − β)),
B = D·(k21 − β) / (VC·(α − β)),

where VC is the volume of the central compartment. Therefore, this pharmacokinetic model can be written as

C(t) = (D/VC) · [ ((k21 − β)/(α − β))·e^(−β·t) + ((α − k21)/(α − β))·e^(−α·t) ].   (2.5)
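The two relations α + β = k12 + k21 + kel and α·β = k21·kel identify the hybrid constants as the roots of a quadratic, so they can be computed directly from the micro rate constants. A sketch (in Python, though the report uses R; the micro rate constants below are hypothetical):

```python
import math

def hybrid_constants(k12, k21, kel):
    """Solve for the hybrid constants alpha and beta from the relations
    alpha + beta = k12 + k21 + kel and alpha * beta = k21 * kel,
    i.e. as the roots of x**2 - (k12 + k21 + kel)*x + k21*kel = 0."""
    s = k12 + k21 + kel          # sum of the roots
    p = k21 * kel                # product of the roots
    disc = math.sqrt(s * s - 4.0 * p)
    alpha = (s + disc) / 2.0     # alpha > beta by convention
    beta = (s - disc) / 2.0
    return alpha, beta

# Hypothetical micro rate constants
k12, k21, kel = 0.5, 0.3, 0.2
alpha, beta = hybrid_constants(k12, k21, kel)
print(round(alpha, 4), round(beta, 4))
print(round(alpha + beta, 4), round(alpha * beta, 4))  # 1.0 and 0.06
```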

2.1.4. Open Two-Compartment Model, Extravascular Administration

The pharmacokinetic model is given by

C(t) = B·e^(−β·t) + A·e^(−α·t) − C(0)·e^(−ka·t).

Here, C(0) is the hypothetical drug concentration at time zero, obtained from A + B = C(0). The coefficients A and B are given by

A = (D·f·ka·(k21 − α)) / (VC·(ka − α)·(β − α)),
B = (D·f·ka·(k21 − β)) / (VC·(ka − β)·(α − β)).

In this model the volume of the central compartment, VC, is given by

VC = (D·f·ka·(ka − k21)) / (C(0)·(ka − α)·(ka − β)),

so the pharmacokinetic model can be written as

C(t) = (D·f·ka/VC) · [ ((k21 − β)/((ka − β)·(α − β)))·e^(−β·t) + ((k21 − α)/((ka − α)·(β − α)))·e^(−α·t) − ((ka − k21)/((ka − α)·(ka − β)))·e^(−ka·t) ].   (2.6)
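A useful sanity check on the coefficients of the two-compartment extravascular model is that A + B = C(0), so that the concentration vanishes at t = 0 (nothing has been absorbed yet). The fragment below (Python rather than the report's R; all rate constants and the dose are hypothetical) performs this check numerically:

```python
import math

# Hypothetical parameter values
D, f, ka, VC = 500.0, 0.9, 5.0, 10.0
alpha, beta, k21 = 3.0, 1.0, 2.0   # hybrid and micro rate constants

K = D * f * ka / VC
A = K * (k21 - alpha) / ((ka - alpha) * (beta - alpha))
B = K * (k21 - beta) / ((ka - beta) * (alpha - beta))
C0 = K * (ka - k21) / ((ka - alpha) * (ka - beta))   # time-zero coefficient

def conc(t):
    """Two-compartment model, extravascular administration."""
    return B * math.exp(-beta * t) + A * math.exp(-alpha * t) - C0 * math.exp(-ka * t)

print(round(A + B - C0, 12), round(conc(0.0), 12))  # both 0: C(0) = A + B
```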

2.2. Further Considerations

2.2.1. Lag-Time

When the drug is given by intravascular administration, we assume that the drug is rapidly mixed in the central compartment in such a way that the first appearance of drug in the circulation system is virtually immediate. However, with extravascular administration it is possible to observe a time interval between administration of the drug and its first appearance. This time, denoted by tlag, is called the lag-time and, when necessary, must be incorporated into the model. For instance, in the one-compartment model with extravascular administration, if a lag-time is considered, (2.3) becomes

C(t) = (D·f·ka) / (VD·(ka − kel)) · (e^(−kel·(t − tlag)) − e^(−ka·(t − tlag))).

2.2.2. Number of Compartments

As mentioned before, the human body is composed of millions of compartments. A good theoretical model must use the fewest compartments necessary to adequately describe the experimental data. Most of the time, one or two compartments are enough.

In Section 2.1 we noted that the compartment models are sums of exponential terms. If the exponents of these terms are sufficiently separated (which is usually the case with pharmacokinetic data), we can split the model up into different straight lines, one for each exponential term, when depicting it in a semilog plot. The number of straight lines then gives us insight into the number of components of the model. For instance, if β is considerably greater than α in model (2.4), then exp(−β·t) will tend to zero faster than exp(−α·t), and therefore, for sufficiently large t, we will have that

ln(C(t)) ≈ ln(A) − α·t.

This straight line can be observed in the right-hand side of Figure 2.1c. If we subtract A·exp(−α·t) from C(t) and depict a semilog plot of this amount against time, we will get a second straight line, namely

ln(C(t) − A·e^(−α·t)) ≈ ln(B) − β·t.

These two straight lines reveal the two compartments of the model and indeed show the different phases of the kinetic process. We can clearly observe these phases in the plots of Figure 2.1. In Figure 2.1a we observe just one straight line, which corresponds to the elimination phase (the only phase in the open one-compartment model with intravascular administration). In Figure 2.1b we observe an absorption phase (the first part of the curve, with positive slope) followed by the elimination phase (the second part of the curve, with negative slope). In the open two-compartment model with intravascular administration shown in Figure 2.1c we have a distribution phase (in this phase the drug is distributed between the two compartments until equilibrium is reached) followed by the elimination phase. Finally, in Figure 2.1d we can see the three phases of the open two-compartment model with extravascular administration: the absorption, distribution, and elimination phases. A very simple method that utilizes this graphical characteristic in order to determine the number of compartments is the method of residuals, also known as feathering or peeling. A description of this method can be found in Chapter 4 of Shargel and Yu (1999). Besides its utility in determining the number of compartments, this method is also proposed, mostly in the pharmacokinetic literature, as a technique to obtain rough estimates of the pharmacokinetic parameters.
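The method of residuals can be sketched in a few lines on noise-free data generated from the biexponential model (2.4). The parameter values and the cut-off times for the "terminal" and "early" points are hypothetical choices for this illustration, and the sketch is in Python although the report's own analyses use R:

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Noise-free data from model (2.4): C(t) = B*exp(-beta*t) + A*exp(-alpha*t),
# with beta >> alpha, so the terminal phase is A*exp(-alpha*t)
A, alpha, B, beta = 5.0, 0.1, 20.0, 1.5
times = [0.25 * i for i in range(49)]   # t = 0, 0.25, ..., 12
conc = [B * math.exp(-beta * t) + A * math.exp(-alpha * t) for t in times]

# Step 1: fit a straight line to ln C(t) over the terminal points (t >= 6)
late = [(t, math.log(c)) for t, c in zip(times, conc) if t >= 6.0]
lnA_hat, neg_alpha = linear_fit([t for t, _ in late], [y for _, y in late])
A_hat, alpha_hat = math.exp(lnA_hat), -neg_alpha

# Step 2: subtract the fitted terminal line ("feathering") and fit the residuals
early = [(t, c - A_hat * math.exp(-alpha_hat * t)) for t, c in zip(times, conc) if t <= 2.0]
lnB_hat, neg_beta = linear_fit([t for t, _ in early], [math.log(r) for _, r in early])
B_hat, beta_hat = math.exp(lnB_hat), -neg_beta

print(round(A_hat, 2), round(alpha_hat, 3), round(B_hat, 2), round(beta_hat, 3))
```

On these noise-free data the two log-linear fits recover the four parameters closely, which is why the method is a reasonable source of rough starting estimates.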

FIGURE 2.1 Semilog plots (log concentration against time) for one-compartment and two-compartment models with intravascular and extravascular administration: (a) open one-compartment model, intravascular; (b) open one-compartment model, extravascular; (c) open two-compartment model, intravascular; (d) open two-compartment model, extravascular.

The number of compartments may also depend on pharmacokinetic considerations. For example, theophylline follows the kinetics of a one-compartment model after oral administration but of a two-compartment model after intravascular administration. The reason is that the distribution phase is rapid, and therefore this phase is confounded with the absorption phase in oral administration (Shargel and Yu, 1999). Another factor that may affect the observed number of compartments is the timing of blood sampling. We can see this in Figure 2.1c: if the first sample comes too late, we can miss the distribution phase. We will not pay attention to this issue here, so we will assume in our examples that the compartment model is given.


3. Individual Pharmacokinetics

With data gathered from a single individual, compartment pharmacokinetic models such as those shown in equations (2.2), (2.3), (2.5), and (2.6) are fitted using standard nonlinear regression techniques. In all those models, the response variable, the concentration of drug, depends through a nonlinear model on the time after administration of the drug. In Section 3.1 we present a description of the nonlinear regression model, the estimation procedures, and asymptotic distributional results, and in Section 3.2 we focus on inference for functions of the parameters in the model and on reparameterization. This theory is mainly based on Chapter 2 of Davidian and Giltinan (1995). As we will see in Section 3.2, inference in the nonlinear regression model is based on a linearization of the expectation surface, so that the exact results of linear regression can be applied asymptotically. This approximation will be appropriate if the expectation surface around θ̂, the estimated regression parameters, is fairly flat, and different parameterizations will produce better or worse linear approximations. We pay some attention to this subject in Section 3.3. In Section 3.4 we mention some available software to analyse nonlinear regression models or the more specific individual pharmacokinetic models, and in Section 3.5 we illustrate the theory with an example.

3.1. Nonlinear Regression Model

3.1.1. Model and Assumptions

The nonlinear regression model for a response variable yj taken at the jth covariate value xj, j = 1, …, n, is usually written as

yj = f(xj, θ) + ej.   (3.1)

In this expression θ is a p×1 vector of regression parameters, f is a function which depends in a nonlinear fashion on θ, and ej is the random error associated with the jth measured response. We can aggregate the n equations in the more compact model

y = f(θ) + e,   (3.2)

where the jth element of (3.2) is given by (3.1). We will call E(y) = f(θ) the expectation surface.

The classical assumptions for the model specified in (3.1) are the following:

(i) The errors ej have mean zero.
(ii) The errors ej are uncorrelated.
(iii) The errors ej have common variance σ².
(iv) The errors ej are identically distributed for all xj.
(v) The errors ej are normally distributed.

Some nonlinear models can be transformed to a linear model by applying a suitable transformation. For

instance, the nonlinear model

Page 19: research.tue.nl · i Preface In this report I present the results of the research made to complete my master studies at the Department of Mathematics and Computer Science of the

8

( ) 1 2,

xf θ θ=x θ

can be transformed, applying a logarithmic transformation, to the linear model

( )( ) ( ) ( )1 2ln , ln lnf xθ θ= +x θ ,

and the nonlinear model (known as the Michaelis-Menten model)

f(x, θ) = θ1 x / (θ2 + x)

can be transformed, with a reciprocal transformation, to the linear model

1 / f(x, θ) = 1/θ1 + (θ2/θ1) · (1/x) .

These models are called “transformably linear” or “intrinsically linear”, and transforming models to a

linear form makes computations easier. However, we must consider that a transformation of the model

involves a transformation of the error terms too; while the transformation can sometimes help to satisfy the classical assumptions, it can also cause departures from them. Indeed, nonlinear

models are usually the result of some meaningful empirical or theoretical relation among the response

variable, the covariates, and the parameters, and it is desirable to preserve this relation in the analysis.
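The logarithmic linearization above is easy to verify numerically. The following sketch (with hypothetical noise-free data, so the transformed error terms play no role) recovers θ1 and θ2 of the model f(x, θ) = θ1 θ2^x by a linear fit on the log scale:

```python
import numpy as np

# Hypothetical noise-free data from f(x, theta) = theta1 * theta2**x
theta1, theta2 = 2.0, 0.8
x = np.arange(1, 9, dtype=float)
y = theta1 * theta2 ** x

# ln f = ln(theta1) + x * ln(theta2): a straight line in x
slope, intercept = np.polyfit(x, np.log(y), 1)
theta1_hat, theta2_hat = np.exp(intercept), np.exp(slope)
print(theta1_hat, theta2_hat)  # recovers 2.0 and 0.8
```

With noisy data the fit on the log scale is no longer equivalent to the fit on the original scale, which is precisely the caveat about transformed errors discussed above.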

The last four assumptions are very restrictive and may not hold in some applications. However, the

classical nonlinear regression model may be generalized to accommodate some departures from these

assumptions. Specifically we will consider, by relaxing assumptions (ii) and (iii), the possibility of a

general covariance matrix R for the errors

Cov(e) = R(θ, ξ) ,   (3.3)

which can depend on the regression parameters θ and on some intra-individual covariance parameters given in the vector ξ (σ is included in ξ).

3.1.2. Least Squares Estimation

Under the classical assumptions, the ordinary least squares (OLS) estimator for θ, θ̂_OLS, which minimizes the error sum of squares

S(θ) = Σ_{j=1}^{n} { y_j − f(x_j, θ) }² ,   (3.4)

is also the maximum likelihood estimator of θ. The maximum likelihood estimator for σ² is

σ̃² = (1/n) Σ_{j=1}^{n} { y_j − f(x_j, θ̂_OLS) }² ,

which is generally biased downward. As in the linear case, σ̃² is usually replaced by

σ̂²_OLS = (1/(n−p)) Σ_{j=1}^{n} { y_j − f(x_j, θ̂_OLS) }² .

With a general covariance structure as specified in (3.3), we can apply the generalized least squares

principle. For convenience let us write

R(θ, ξ) = σ² S(θ, γ) ,

with γ the vector of intra-individual covariance parameters, not including σ. If S(θ, γ)⁻¹ = W for

some known matrix W, then, under normality, the maximum likelihood estimator for θ is the weighted least squares estimator θ̂_WLS minimizing

S(θ) = { y − f(θ) }ᵀ W { y − f(θ) } ,   (3.5)

and σ² may be estimated by

σ̂²_WLS = (1/(n−p)) { y − f(θ̂_WLS) }ᵀ W { y − f(θ̂_WLS) } .

In most cases it is unlikely that such a complete specification of the covariance structure is known.

In those cases the following iterative process can be used:

1. Get an initial estimator for θ, for instance θ̂_OLS.

2. Obtain an estimator for γ, and form the estimated weight matrix Ŵ = S(θ̂, γ̂)⁻¹.

3. Using Ŵ, reestimate θ by minimizing (3.5), and return to step 2.

The final estimator for θ is called the generalized least squares estimator and is denoted by θ̂_GLS.

Closed-form solutions for θ̂_OLS and θ̂_WLS are rarely available, and therefore minimization of expressions (3.4) and (3.5) requires the use of iterative algorithms. The most common algorithms are based on modifications of the Gauss-Newton algorithm, such as the ones proposed by Levenberg (1944), Marquardt (1963), and Hartley (1961). Documentation about these algorithms can be found in Chapters 2 and 14 of Seber and Wild (1989) and in Chapters 2 and 3 of Bates and Watts (1988). Other options to estimate the parameters are the Steepest-Descent method (see Section 13.2.3 of Seber and Wild, 1989) and the Nelder-Mead Simplex algorithm (see Section 13.5.3 of Seber and Wild, 1989). All the Gauss-Newton based algorithms work on a linearization of the expectation surface. The same approximation is used to get the asymptotic results and is illustrated in the next section.
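To make the idea concrete, here is a minimal Gauss-Newton sketch (not the implementation used by any of the packages cited; the exponential model and the data are hypothetical). Each iteration solves the least squares problem obtained by linearizing the expectation surface, as in (3.6):

```python
import numpy as np

def gauss_newton(f, jac, x, y, theta, max_iter=50, tol=1e-12):
    """Minimize sum (y - f(x, theta))^2 by iterating on the linearization."""
    for _ in range(max_iter):
        residual = y - f(x, theta)
        F = jac(x, theta)                      # n x p derivative matrix F.
        step, *_ = np.linalg.lstsq(F, residual, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Hypothetical model f(x, theta) = theta1 * exp(-theta2 * x) with exact data
f = lambda x, th: th[0] * np.exp(-th[1] * x)
jac = lambda x, th: np.column_stack(
    [np.exp(-th[1] * x), -th[0] * x * np.exp(-th[1] * x)]
)
x = np.linspace(0.25, 6.0, 12)
y = f(x, np.array([2.0, 0.7]))
theta_hat = gauss_newton(f, jac, x, y, theta=np.array([1.5, 0.5]))
print(theta_hat)  # close to (2.0, 0.7)
```

In practice the Levenberg and Marquardt modifications damp the step to make the iteration robust when the starting values are poor.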

3.1.3. Asymptotic Results

An important distinction between linear and nonlinear regression is that, even under the classical

assumptions, it is not possible in the nonlinear case to obtain exact distributional results for the

estimators. However, it is possible to obtain asymptotic results based on large sample theory that hold

even when the assumption of normality does not hold. It is important to note that these asymptotic results

are obtained from a linear approximation of the nonlinear expectation surface (cf. (3.6)). Here we present

the basis of this approximation following the discussion of Section 2.1.2 of Seber and Wild (1989).

Let θ* be the true value of θ. We can approximate the expectation surface in a small neighbourhood of θ* by the linear Taylor expansion

f(x_j, θ) ≈ f(x_j, θ*) + Σ_{r=1}^{p} [ ∂f(x_j, θ)/∂θ_r ]|_{θ=θ*} (θ_r − θ_r*) ,

or

f(θ) ≈ f(θ*) + F.(θ − θ*) ,   (3.6)

where F. is the n×p matrix of derivatives of the expectation function f(θ) with respect to the parameters θ, given by

F. = ∂f(θ)/∂θ = [ ∂f(x_j, θ)/∂θ_r ] .

Equation (3.4) can be written as

S(θ) = { y − f(θ) }ᵀ { y − f(θ) } ,   (3.7)

and hence, applying approximation (3.6) in (3.7), we have

S(θ) ≈ { y − f(θ*) − F.(θ − θ*) }ᵀ { y − f(θ*) − F.(θ − θ*) }

     = { e − F.(θ − θ*) }ᵀ { e − F.(θ − θ*) } .   (3.8)

By analogy with the linear regression model, (3.8) is minimized when

θ − θ* = (F.ᵀ F.)⁻¹ F.ᵀ e .

If θ̂ is within the small neighbourhood of θ*, then we have that

θ̂ − θ* ≈ (F.ᵀ F.)⁻¹ F.ᵀ e .

This result shows how, under the linearization given in (3.6), we can obtain for the nonlinear regression model results similar to those of the linear regression model, and therefore that inference can be treated in a similar fashion. The results can be extended to the case of generalized least squares. Below we

summarize the asymptotic results presented in Davidian and Giltinan (1995).

1. Under assumptions (i) to (iv) and under the additional condition that the errors are independent, we

have that, asymptotically

θ̂_OLS ~ N(θ, σ² Σ_OLS) ,   Σ_OLS = (F.ᵀ F.)⁻¹ .   (3.9)

2. For a general covariance structure, as specified in (3.3), the estimator θ̂_GLS has asymptotic normal distribution

θ̂_GLS ~ N(θ, σ² Σ_GLS) ,   Σ_GLS = ( F.ᵀ S(θ, γ)⁻¹ F. )⁻¹ .   (3.10)

It is important to mention that this result holds regardless of whether γ is known or has been estimated.

These asymptotic results can be used to construct approximate confidence intervals and hypothesis testing

procedures. Some inferential procedures can be found in Davidian and Giltinan (1995), and a

comprehensive discussion for the OLS case is given in Chapter 5 of Seber and Wild (1989). For a single

parameter, inference is straightforward by using the corresponding marginal distribution. For linear and

nonlinear functions of the parameters we can use the results presented in the next section.

3.2. Inference for Functions of the Estimated Parameters

It is sometimes of interest to make inference about quantities that are functions of the estimated

parameters1. That is the case, in pharmacokinetics, of parameters such as half-life or clearance. For

instance, in the open one-compartment model with intravascular administration presented in (2.2)

C(t) = (D / VD) e^{−kel · t} ,

the elimination half-life parameter is defined by

1 In pharmacokinetics these parameters are called secondary parameters.

t_{1/2} = ln 2 / kel ,

and the total clearance by

Cl_T = kel · VD .

For a linear combination cᵀθ of the elements of θ, the asymptotic results in (3.9) and (3.10) imply that

cᵀθ̂ ~ N( cᵀθ, σ² cᵀΣc ) .   (3.11)

For a nonlinear function c(θ) of the elements of θ we can use, in the same way as in (3.6), the linear Taylor expansion

c(θ̂) ≈ c(θ) + cᵀ(θ̂ − θ) ,   (3.12)

where c is now the p×1 vector of partial derivatives of c with respect to the elements of θ. With this linearization we have the following asymptotic result:

c(θ̂) ~ N( c(θ), σ² cᵀΣc ) .   (3.13)

Note that this last expression also applies to the case of a linear function, where c(θ) = cᵀθ.

Since σ² and Σ are usually unknown, we can use the following result from standard statistical theory:

T = { c(θ̂) − c(θ) } / { σ̂ (cᵀ Σ̂ c)^{1/2} } ~ t_{n−p} ,

where t_{n−p} represents the Student t distribution with n−p degrees of freedom. Hence, an approximate 100(1−α)% confidence interval for a function c(θ) is given by

c(θ̂) ± t_{α/2, n−p} σ̂ (cᵀ Σ̂ c)^{1/2} .   (3.14)

The vector c may be a function of the parameters in θ; in that case it is replaced by ĉ, obtained by substituting θ̂ for θ.
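As a numerical sketch of (3.12)-(3.14), consider the secondary parameter c(θ) = ln 2 / θ2, a half-life, with illustrative values for the estimate and its variance (the numbers below are hypothetical, not taken from the thesis data):

```python
import math

# Hypothetical estimate of theta2 and its variance sigma^2 * Sigma_22
theta2_hat = 0.25
var_theta2 = 0.0002
t_quantile = 2.571                           # t_{0.025, 5}, i.e. n - p = 5

c_hat = math.log(2) / theta2_hat             # c(theta_hat)
grad = -math.log(2) / theta2_hat ** 2        # dc/dtheta2, evaluated at the estimate
se_c = abs(grad) * math.sqrt(var_theta2)     # delta-method standard error
interval = (c_hat - t_quantile * se_c, c_hat + t_quantile * se_c)
print(c_hat, se_c, interval)
```

The gradient plays the role of the vector ĉ in (3.14); for a function of several parameters the scalar variance is replaced by the quadratic form ĉᵀΣ̂ĉ.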

The problem of getting confidence intervals for pharmacokinetic parameters, which sometimes are

functions of the parameters fitted in the model, was treated by Sheiner (1986). He presents some

approaches to this problem, and we summarize them below:

1. If the parameter of interest, let us say φ, is a one-to-one function c(·) of only one of the original parameters, let us say θ, we can obtain a confidence interval for φ by applying the transformation c(·) to both sides of the original interval. That is, if [θ_L, θ_S] is a 100(1−α)% confidence interval for θ, then [c(θ_L), c(θ_S)] is a 100(1−α)% confidence interval for φ. This holds because the confidence level remains invariant under one-to-one transformations.

2. An approximate standard error can be computed for the new function (cf. (3.11) and (3.13)), and then

it can be used to construct confidence intervals or for hypothesis testing.

3. We can reparameterize the model in terms of the new parameter and then refit the data. If the parameter of interest, φ, is a function of some of the p original parameters, that is φ = c(θ1, …, θp), then we can choose one of the original parameters that is of little interest, let us say θ1, and replace it in the model by θ1 = h(θ2, …, θp, φ).
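Approach 1 can be sketched as follows for the half-life t1/2 = ln 2 / kel; since this transformation is monotone decreasing, the image interval has its endpoints swapped (the interval for kel used here is the one obtained in Section 3.5.3):

```python
import math

half_life = lambda k: math.log(2) / k

# 95% confidence interval for the elimination constant k_el (Section 3.5.3)
k_low, k_high = 0.2116, 0.2761

# c(.) is decreasing, so the image of the interval flips its endpoints
t_low, t_high = half_life(k_high), half_life(k_low)
print(t_low, t_high)  # interval for the half-life
```

Note that the resulting interval is not symmetric around ln 2 / 0.2438, illustrating the asymmetry discussed below.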

All the previous approaches seem reasonable, but the problem is that they can produce different results. For instance, under the asymptotic distributions presented in (3.9) and (3.10), the corresponding confidence intervals with approaches 2 and 3 will be symmetric; however, approach 1 will not necessarily produce a symmetric interval. Similarly, if we obtain confidence intervals for θ and φ by using the appropriate parameterization in each case, the resulting intervals will not necessarily be equivalent through the function φ = c(θ). From a computational point of view, Sheiner (1986) emphasizes that with approaches 1 or 2 it is not necessary to refit the model, while it is with approach 3. However, given the computational facilities available now, this is no longer an important consideration. Setting aside the computational aspect, Sheiner (1986) recommends approach 3 because it is more direct. Nevertheless, if we use approach 2 with the results (3.11) or (3.13) to get the standard error of the function of interest, approaches 2 and 3 are equivalent. In Section 3.5 we will illustrate these ideas with an example.

3.3. Measures of Curvature or Nonlinearity

As explained in Section 3.1.3, the key step in obtaining the asymptotic results is the linearization of the expectation surface given in (3.6), in such a way that the expectation surface is approximated by a tangent plane at θ̂. To some extent the precision of this approximation depends on the parameterization of the model, so a bad parameterization could produce misleading results. It is possible to get some insight into the appropriateness of the approximations under different parameterizations by computing measures of curvature or nonlinearity. A good explanation of this subject is given in Chapter 4 of Seber and Wild (1989) and Chapter 7 of Bates and Watts (1988). In this section we follow the theory of the latter.

3.3.1. Intrinsic and Parameter Effects Nonlinearity

There are two aspects that determine the appropriateness of the linear approximation of the expectation

surface given in (3.6). The first aspect is the intrinsic curvature of the expectation surface, also referred to as the planar assumption. The second aspect is whether straight, parallel, equispaced lines in the parameter space map into nearly straight, parallel, equispaced lines on the expectation surface, also referred to as the uniform coordinate assumption. In this section we present measures of both characteristics based

on the second derivatives of the expectation surface; for the planar assumption they are named measures

of intrinsic nonlinearity, and for the uniform coordinate assumption they are named measures of

parameter effects nonlinearity.

In Section 3.1.3 we defined the n×p matrix of derivatives of the expectation surface with respect to the parameters

F. = ∂f(θ)/∂θ = [ ∂f(x_j, θ)/∂θ_r ] .

Similarly, we introduce now the n×p×p array of second derivatives

F.. = ∂²f(θ) / ∂θ ∂θᵀ = [ ∂²f(x_j, θ) / ∂θ_r ∂θ_s ] .

This is an array of n faces F..j, where each face is a complete p×p matrix of second derivatives2.

The matrix F. can be decomposed into p vectors f.r, and the array F.. can be regarded as consisting of p²

vectors f..rs. Following the terminology of Bates and Watts (1988), the tangent vectors f.r are also called

velocity vectors, since they give the rate of change of f(θθθθ) with respect to each parameter, and the vectors

f..rs are called acceleration vectors, since they give the rates of change of the velocity vectors with respect

to the parameters. There are only p(p+1)/2 different acceleration vectors, so together with the p velocity

2 This matrix is also called the Hessian matrix.


vectors, the maximum dimension of the combined tangent and acceleration space is p(p+3)/2. Sometimes,

the combined dimension is only slightly larger than p, so we will denote the combined dimension by p+p′.

All the velocity vectors lie in the tangent plane. The acceleration vectors can be decomposed into two components: a tangential component (in the tangent plane) and a normal component (orthogonal to the tangent plane). This decomposition can be performed by a QR decomposition. To do that, we collect the p(p+1)/2 different acceleration vectors of F.. into a matrix W.. and form the n×(p(p+3)/2) matrix

D = (F., W..) .   (3.15)

Performing the QR decomposition of D we have

D = QR = (Q1 | Q1′ | Q2) R ,

where Q1 contains the first p columns of Q and Q1′ the next p′ columns. Then we form an array A.. by the multiplication

A.. = [Q1 | Q1′]ᵀ [F..] .   (3.16)

In this multiplication the element in the kth face, rth row, sth column of A.. is given by

{A..}krs = Σ_{j=1}^{n} { [Q1 | Q1′]ᵀ }kj {F..}jrs .

Then A.. is a compact acceleration array of p + p′ faces of dimension p×p (instead of the n faces of F..). The first p faces of A.. determine the projections of the acceleration vectors onto the tangent space, so they are the tangential components. These components measure the nonuniformity of the parameter lines on the tangent plane; that is, they are a measure of parameter effects nonlinearity. The last p′ faces of A.., which are the normal components, determine the projections of the acceleration vectors onto the space normal to the tangent space. These components measure how much the expectation surface deviates from a plane, so they are a measure of intrinsic nonlinearity and depend not on the parameterization but only on the design and the form of the nonlinear function. To differentiate these two components, we will write the first p faces of A.. as A..^θ, the parameter effects acceleration array, and the last p′ faces as A..^ι, the intrinsic acceleration array.
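The projection step can be sketched with a toy numerical example (hypothetical vectors with n = 3 and p = 1, not derived from a real model): the QR factorization of D supplies orthonormal bases for the tangent space and its complement, and each acceleration vector splits into a tangential and a normal part:

```python
import numpy as np

# Hypothetical velocity vector (F.) and acceleration vector (W..), n = 3, p = 1
F = np.array([[1.0], [1.0], [0.0]])   # tangent (velocity) vector
W = np.array([[1.0], [0.0], [1.0]])   # acceleration vector

D = np.hstack([F, W])
Q, R = np.linalg.qr(D)
Q1, Q1p = Q[:, :1], Q[:, 1:]          # tangent basis, basis normal to the tangent

tangential = Q1 @ (Q1.T @ W)          # projection onto the tangent plane
normal = Q1p @ (Q1p.T @ W)            # component orthogonal to the tangent plane
assert np.allclose(tangential + normal, W)
print(tangential.ravel(), normal.ravel())
```

In the general case the same projections, applied face by face, yield the arrays A..^θ and A..^ι.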

To illustrate the concepts of intrinsic nonlinearity and parameter effects nonlinearity we present an example taken from Bates and Watts (1988). In this example the nonlinear function is

f(x, θ) = 60 + 70 e^{−θx} ,

with the design xᵀ = (4, 41). In Figure 3.1 we plot the expectation surface for this design with marks for θ = 0, 0.05, 0.10, …, 0.95, 1.0. In Figure 3.2 we plot the expectation surface with a different parameterization, namely with φ = log10 θ, so the nonlinear function is expressed by

f(x, φ) = 60 + 70 exp(−10^φ x) .

In this case we plot marks for φ = −2.0, −1.9, …, −0.1, 0. In both cases the expectation curves are identical. This aspect is the intrinsic nonlinearity of the expectation surface and does not change with the reparameterization, because reparameterization is just a relabeling of the points on the curve. On the other hand, the points, which are equally spaced in the θ and φ spaces, do not map into equally spaced points on the expectation curves. This is the parameter effects nonlinearity, which does depend on the parameterization and, as we can see in Figures 3.1 and 3.2, is less severe with the φ parameterization. In Figure 3.3 we present the expectation curve for a different design, namely xᵀ = (4, 12), with the same parameterization and the same marks for θ as in Figure 3.1. As we can see, the different design affects the intrinsic nonlinearity (the shape of the curve has changed).
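The nonuniform mapping is easy to check numerically for the θ parameterization: equispaced values of θ produce points on the expectation curve whose spacing varies enormously (a small sketch using the design xᵀ = (4, 41) from the example above):

```python
import numpy as np

x = np.array([4.0, 41.0])
f = lambda theta: 60.0 + 70.0 * np.exp(-theta * x)   # point on the expectation curve

thetas = np.arange(0.0, 1.05, 0.05)                  # equispaced in parameter space
points = np.array([f(t) for t in thetas])
gaps = np.linalg.norm(np.diff(points, axis=0), axis=1)
print(gaps.max() / gaps.min())  # far larger than 1: severe parameter effects nonlinearity
```

A parameterization with low parameter effects curvature would make this ratio close to 1.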

FIGURE 3.1 Expectation surface with design xᵀ = (4, 41) and parameterization in terms of θ.

FIGURE 3.2 Expectation surface with design xᵀ = (4, 41) and parameterization in terms of φ = log10 θ.

FIGURE 3.3 Expectation surface with design xᵀ = (4, 12) and parameterization in terms of θ.

Summarizing, the curvature of the expectation surface affects the precision of the asymptotic inferential results. Intrinsic nonlinearity depends on the design (the time points at which measurements are taken and the nonlinear function in the regression model), and parameter effects nonlinearity depends on the parameterization.

A final consideration concerning accelerations as nonlinearity measures is that they depend on the scaling

of the data and parameters. To avoid this problem, accelerations are converted to relative curvatures with

the following transformation:

C = s√p · R11⁻ᵀ A.. R11⁻¹ .   (3.17)

Here each face of A.. is premultiplied by R11⁻ᵀ and postmultiplied by R11⁻¹ to get the corresponding face of C, the relative curvature array. In a similar way as with A.., we will use the notation C^θ to denote the parameter effects relative curvature array and C^ι to denote the intrinsic relative curvature array. The matrix R11 comes from the original QR decomposition and is given by

R1 = [Q1 | Q1′]ᵀ D = ( R11  R12 ; 0  R22 ) .

Here s is the root mean square error and p the number of parameters.

3.3.2. Reparameterization

There is little guidance in the literature about which parameterization is more suitable for particular models based on curvature measures. The problem is that a parameterization that gives good results for a particular model with some dataset can produce poor results with another dataset (Bates and Watts, 1981). Indeed, small changes in the design may change both the intrinsic and parameter effects curvatures in unexpected ways. For instance, Bates and Watts (1980) added two data points to two data sets; in one of them adding the two data points increased the intrinsic curvatures and decreased the parameter effects curvatures, while in the other both the intrinsic and parameter effects curvatures increased. Therefore, to gain some insight into which parameterization is more suitable in a particular case, we would need to compute the relative curvature measures for each parameterization with a specified model and design.

In order to facilitate comparisons, it is helpful to have a single overall measure of nonlinearity. Bates and Watts (1988) use the root mean square (RMS) curvature, denoted by c and defined by

c² = ( 2 / (p(p+2)) ) Σ_k { Σ_{r=1}^{p} Σ_{s=1}^{p} c_krs² + (1/2) ( Σ_{r=1}^{p} c_krr )² } ,   (3.18)

with c_krs the (r,s)th element of the kth face of C. Running the index k from 1 to p we obtain the RMS parameter effects curvature c^θ, and running k from p+1 to p+p′ we obtain the RMS intrinsic curvature c^ι.

With these measures we can compare different parameterizations to find the one with the smallest curvature effects. Hence, if we get the best results with a parameterization φ, where the elements of φ are functions of the parameters of main interest in θ defined by

φ = G(θ) ,   (3.19)

we can use the inverse of (3.19) to get confidence intervals on θ.

Scaling by s√p in (3.17) makes the curvature measures comparable with the F distribution because in the linear case a (1−α) joint confidence region for θ is given by the set

{ θ : (θ − θ̂)ᵀ Σ⁻¹ (θ − θ̂) ≤ p s² F_{1−α; p, n−p} } .

There is no clear criterion to decide whether the curvature measures in a particular case are satisfactory. Bates and Watts (1988) compute the RMS curvatures for 67 data sets and consider them acceptable if c^θ√F and c^ι√F are less than 0.3, with F the 0.95 quantile of the F distribution with p and n−p degrees of freedom; this reference value rests strongly on geometrical considerations. We present an example of these computations in Section 3.5.4 to compare three parameterizations of the one-compartment model with extravascular administration.

3.4. Software

In this section we present an overview of the characteristics of the commercial software packages SAS and WinNonlin and the free software R for fitting individual pharmacokinetic models and nonlinear regression models.

3.4.1. R

For nonlinear regression fitting R has the nls function. By default it works with the Gauss-Newton

algorithm, and it produces least squares estimates of the parameters of the model. Weighted least squares

estimates are not yet implemented.

R has two packages to estimate pharmacokinetic parameters and fit pharmacokinetic models, the PK and

PKfit packages. The PK package estimates the area under the concentration time curve (AUC), the area

under the first moment curve (AUMC), and the half-life parameter given concentration time data for a

single individual. The PKfit package allows estimating several compartment models (one-compartment,

two-compartment, and macroconstant exponential functions with one, two, and three exponential terms)

and performs simulations with each of them. Simulations can be done using normally and uniformly

distributed random errors, and it is possible to relate their variability to the actual concentrations. The

PKfit package is equipped with a menu-based interface which is invoked with the PKmenu() function.

This package fits the models using three different methods: the Nelder-Mead Simplex algorithm, the

Genetic algorithm, and by calling the nls function, and enables three weighting schemes: equal weight,

1/Cp, and 1/Cp².

3.4.2. SAS

The NLIN procedure produces least squares and weighted least squares estimates of the parameters of a

nonlinear model. The estimation can be performed using four different algorithms: Steepest-Descent,

Newton, Gauss-Newton, and Marquardt. Confidence intervals for the parameters of the model are

computed using formula (3.14).

3.4.3. WinNonlin

WinNonlin fits one, two, and three-compartment models using three different algorithms: the Nelder-

Mead Simplex, the Gauss-Newton algorithm with Hartley modification, and the Gauss-Newton algorithm

with Hartley and Levenberg modification, the latter used as default. It allows using different weighting

schemes: user-specified weights, 1/Cpⁿ, and 1/(predicted Cp)ⁿ for a user-specified power n. WinNonlin can

run simulations for several predefined compartment models with one, two, and three compartments and also for user-defined models; however, the models must be fitted with data first, which is a significant limitation. Confidence intervals are computed for all the parameters of the model but not for secondary

parameters. For secondary parameters standard errors are computed using the second approach mentioned

in Section 3.2. If the secondary parameter is a linear combination of the parameters of the model, the

computation of the standard error is direct according to (3.11), otherwise, it is computed from the linear

term of a Taylor series expansion of the secondary parameter according to (3.12) and (3.13).


3.5. Example: One-Compartment Model, Extravascular Administration

3.5.1. Data and Nonlinear Model

In this example we will analyze data from an open one-compartment model with intranasal administration

gathered from a single individual. The concentration time data is given in Table 3.1 and is stored in the

data frame ex1 in R.

TABLE 3.1 Concentration time data for a single individual after

intranasal administration of 5 mg of drug.

Time (hrs) Concentration (mg/l)

0.00 0.0000

0.25 0.0081

0.50 0.0092

0.75 0.0098

1.00 0.0089

2.00 0.0072

4.00 0.0043

6.00 0.0027

The open one-compartment model with extravascular administration, given in (2.3), is

C(t) = (D · f · ka) / ( VD (ka − kel) ) · ( e^{−kel · t} − e^{−ka · t} ) .

The parameters to estimate in this model are the absorption constant ka, the elimination constant kel, and

the volume of distribution VD. The administered dose D and the fraction of administered dose which is

absorbed f are given constants. In this example we will assume that f = 1 and D = 5 mg. Note that from a

statistical point of view, D, f, and VD form together a single parameter, so any specification on the values

of D and f will be reflected in the estimated value of VD. In order to follow the standard nonlinear

regression notation, we rewrite the model by

f(x, θ) = (5 θ1) / ( θ3 (θ1 − θ2) ) · ( e^{−θ2 x} − e^{−θ1 x} ) ,

with θ1 = ka, θ2 = kel, and θ3 = VD.

3.5.2. R Output

In Table 3.2 we show the results obtained with the PKfit package of R. The initial values for the

parameters required for the estimation process are θ1 = 5, θ2 = 0.2, and θ3 = 500. Note that the PKfit

package uses the labels ka, kel, and Vd for the parameters. The model is fitted using equal weights, that

is, by ordinary least squares.

The estimated parameters are (according to the Nelder-Mead Simplex algorithm and the nls function) θ̂1 = 5.521 hr⁻¹, θ̂2 = 0.2438 hr⁻¹, and θ̂3 = 451.3 L, so the estimated model is (with D = 5 mg and f = 1)

f(x, θ̂) = (5 · 5.521) / ( 451.3 (5.521 − 0.2438) ) · ( e^{−0.2438 x} − e^{−5.521 x} )

         = 0.01159 ( e^{−0.2438 x} − e^{−5.521 x} ) .


TABLE 3.2 Analysis of the concentration time data using the PKfit package.

<< The value of parameter fitted by genetic algorithm >>

Parameter Value

1 ka 7.3002211

2 kel 0.2174612

3 Vd 474.3527005

<< The value of parameter fitted by Nelder-Mead Simplex algorithm >>

Parameter Value

1 ka 5.5209976

2 kel 0.2438322

3 Vd 451.3250264

<< Residual sum-of-squares and parameter values fitted by nls >>

2.591939e-07 : 5.5209976 0.2438322 451.3250264

<< Output >>

time Observed Calculated Wtd Residuals AUC AUMC

1 0.00 0.0000 0.000000000 0.000000e+00 0.0000000 0.000000000

2 0.25 0.0081 0.007989802 1.101983e-04 0.0010125 0.000253125

3 0.50 0.0092 0.009526818 -3.268180e-04 0.0031750 0.001081250

4 0.75 0.0098 0.009468888 3.311119e-04 0.0055500 0.002575000

5 1.00 0.0089 0.009036054 -1.360537e-04 0.0078875 0.004606250

6 2.00 0.0072 0.007116987 8.301315e-05 0.0159375 0.016256250

7 4.00 0.0043 0.004370272 -7.027193e-05 0.0274375 0.047856250

8 6.00 0.0027 0.002683714 1.628555e-05 0.0344375 0.081256250

<< AUC (0 to infinity) computed by trapezoidal rule >>

[1] 0.04551069

<< AUMC (0 to infinity) computed by trapezoidal rule >>

[1] 101.0658

<< Akaike's Information Criterion (AIC) >>

[1] -109.2580

<< Log likelihood >>

'log Lik.' 57.62902 (df=3)

<< Schwarz's Bayesian Criterion (SBC) >>

[1] -109.0197

Formula: conc ~ modfun(time, ka, kel, Vd)

Parameters:

Estimate Std. Error t value Pr(>|t|)

ka 5.52100 0.42989 12.84 5.10e-05 ***

kel 0.24383 0.01254 19.44 6.64e-06 ***

Vd 451.32503 9.27550 48.66 6.93e-08 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:

ka kel

kel -0.5188

Vd 0.6862 -0.761

In Figure 3.4 we show concentration time plots of the data together with the fitted curve in linear and

logarithmic scale, and plots of residuals3 versus time and versus fitted values. These plots are also

generated by the PKfit package.

3 These are standardized residuals.

FIGURE 3.4 Fitted curve in linear and semi-log scale, and weighted residuals versus time and versus calculated concentration (Calc Cp(i)).

The PKfit package computes the residuals and the calculated concentration values, and they are shown in

Table 3.2 below the title <<Output>>. The pharmacokinetic parameters area under the concentration time

curve (AUC) and area under the first moment curve (AUMC) are also computed.

The Akaike Information Criterion (AIC), the log-likelihood, and the Schwarz Bayesian Criterion (SBC4) are measures of the goodness of fit of the model and can be used as criteria to decide on the best

model to describe the data. Evidently, a model with more parameters will fit the data better, and therefore it will always have a greater likelihood than a model with fewer parameters. To obtain a compromise

between the goodness of fit of a model and its simplicity, the AIC and SBC criteria introduce a penalty in

the likelihood of the model as a function of the number of estimated parameters. In this respect, the AIC

and SBC values are defined by5

AIC = −2 l(θ̂) + 2p ,

SBC = −2 l(θ̂) + ln(n) · p ,

where l(θ̂) is the log-likelihood evaluated at the estimated parameters θ̂, n is the number of observations, and p the number of parameters in the model. Hence, according to these criteria, when comparing models the one with the lower AIC or SBC should be preferred. In our example we have

Log-likelihood $= 57.63$,

$$\mathrm{AIC} = -2 \times 57.63 + 2 \times 3 = -109.3,$$

4 It is usually called in the literature the Bayes Information Criterion (BIC).
5 These are the formulas used in R.


$$\mathrm{SBC} = -2 \times 57.63 + \ln(8) \times 3 = -109.0.$$
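The two criteria can be reproduced directly from the reported log-likelihood; a minimal sketch in Python, using the numbers of this example:

```python
import math

def aic(loglik, p):
    # AIC = -2 l(theta_hat) + 2 p
    return -2.0*loglik + 2.0*p

def sbc(loglik, n, p):
    # SBC = -2 l(theta_hat) + ln(n) p
    return -2.0*loglik + math.log(n)*p

# values from the example: l(theta_hat) = 57.63, n = 8 observations, p = 3 parameters
print(round(aic(57.63, 3), 1))     # -109.3
print(round(sbc(57.63, 8, 3), 1))  # -109.0
```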

Below these measures, the R output presents a table with the estimated parameters, their asymptotic standard deviations, and the results of a t test. In this example all parameters are significant. Finally, it presents the correlations between the estimated parameters. Since we did not use weighted least squares, these correlations are computed from the asymptotic result given in (3.9) with $\hat\sigma = 0.0002277$.

3.5.3. Confidence Intervals

Confidence Interval for the Absorption Constant, the Elimination Constant, and the Volume of

Distribution

Given a one-dimensional linear or nonlinear function $c(\boldsymbol\theta)$ of the parameters in $\boldsymbol\theta$, we can compute an approximate $100(1-\alpha)\%$ confidence interval for $c(\boldsymbol\theta)$ using formula (3.14)

$$c(\hat{\boldsymbol\theta}) \pm t_{\alpha/2,\,n-p}\,\hat\sigma\left(\mathbf{c}^{\mathrm T}\hat{\boldsymbol\Sigma}\,\mathbf{c}\right)^{1/2}.$$

In this example $\hat\sigma = 0.0002277$, $n = 8$, $p = 3$, and

$$\hat{\boldsymbol\Sigma} = \begin{bmatrix} 3566181 & -54034 & 52811240 \\ -54034 & 3037 & -1709438 \\ 52811240 & -1709438 & 1660369983 \end{bmatrix}.$$

Hence, the approximate 95% confidence intervals for $\theta_1$, $\theta_2$, and $\theta_3$ are given by (respectively)

$$5.521 \pm 2.571\,(0.4300) = [4.416;\ 6.626],$$

$$0.2438 \pm 2.571\,(0.01255) = [0.2116;\ 0.2761],$$

$$451.3 \pm 2.571\,(9.277) = [427.5;\ 475.2].$$
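As a quick numerical check, these intervals follow from the estimates and standard errors of the R output alone; a minimal Python sketch (2.571 is the 0.975 quantile of the t distribution with $n - p = 5$ degrees of freedom):

```python
t5 = 2.571  # t quantile for a 95% interval with 5 degrees of freedom

def wald_ci(est, se, t=t5):
    # interval est +/- t * se, as in formula (3.14)
    return est - t*se, est + t*se

# estimates and standard errors as reported by nls
ka_ci  = wald_ci(5.52109, 0.43000)   # approx [4.416, 6.626]
kel_ci = wald_ci(0.24383, 0.01255)   # approx [0.2116, 0.2761]
vd_ci  = wald_ci(451.3271, 9.2781)   # approx [427.5, 475.2]
```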

Confidence Interval for the Elimination Half-Life

Now we compute a confidence interval for the elimination half-life parameter with the three approaches discussed in Section 3.2. For the open one-compartment model with extravascular administration we have that $t_{1/2} = \ln 2 / k_{el}$, so the reparameterization $\boldsymbol\varphi = G(\boldsymbol\theta)$ is given by

$$\varphi_1 = \theta_1, \qquad \varphi_2 = \frac{\ln 2}{\theta_2}, \qquad \varphi_3 = \theta_3.$$

Following the first approach, we apply this transformation to the confidence interval computed for $\boldsymbol\theta$. Hence, with this method the estimated value of $\varphi_2$ is 2.843 and the corresponding 95% confidence interval is $[2.511;\ 3.276]$. Note that this interval is not symmetric.

The second approach consists in computing an approximate standard error for the new parameter. The computations are straightforward if we apply the result (3.13) with $c(\boldsymbol\theta) = \ln 2/\theta_2$ and the result (3.14). Then we have that $\hat\varphi_2 = 2.843$ (the same as in approach 1), $\mathbf{c}^{\mathrm T} = \left[0,\ -\ln 2/\theta_2^2,\ 0\right]$, and by replacing $\theta_2$ with its estimate, $\hat{\mathbf{c}}^{\mathrm T} = [0,\ -11.66,\ 0]$. The resulting 95% confidence interval is

$$2.843 \pm 2.571\,(0.1463) = [2.467;\ 3.219].$$
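Both computations are easy to verify numerically; a Python sketch using the estimates above and the covariance entry $\hat\Sigma_{22} = 3037$:

```python
import math

kel, se_kel = 0.24383, 0.01255
sigma, S22, t5 = 0.0002277, 3037.0, 2.571
ln2 = math.log(2.0)

# approach 1: transform the endpoints of the interval for kel (the order reverses)
lo1, hi1 = ln2/(kel + t5*se_kel), ln2/(kel - t5*se_kel)

# approach 2: delta method with gradient c = (0, -ln2/kel^2, 0)
c2 = -ln2/kel**2                        # the only nonzero gradient component
se_half = sigma*math.sqrt(c2**2*S22)    # approx 0.1463
est = ln2/kel                           # approx 2.843
lo2, hi2 = est - t5*se_half, est + t5*se_half
```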

The third approach, reparameterization, implies fitting the model

$$f(x, \boldsymbol\varphi) = \frac{5\varphi_1}{\varphi_3\left(\varphi_1 - \ln 2/\varphi_2\right)}\left(e^{-(\ln 2/\varphi_2)\,x} - e^{-\varphi_1 x}\right).$$


The ordinary least squares estimation of this model using the nls function of R is shown in Table 3.3.

The starting values are φ1 = 5, φ2 = 3.5, and φ3 = 500.

TABLE 3.3 Analysis of the model reparameterized in terms of elimination half-life using the

nls function.

> half_life <- nls(Conc~5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-

+ exp(-Ka*Time)), data=ex1, start = c(Ka=5, t_half=3.5, Vd=500), model=T)

> summary(half_life)

Formula: Conc ~ 5 * Ka/(Vd * (Ka - log(2)/t_half)) * (exp(-log(2)/t_half *

Time) - exp(-Ka * Time))

Parameters:

Estimate Std. Error t value Pr(>|t|)

Ka 5.5211 0.4300 12.84 5.10e-05 ***

t_half 2.8427 0.1463 19.43 6.66e-06 ***

Vd 451.3271 9.2781 48.64 6.94e-08 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:

Ka t_half

t_half 0.5192

Vd 0.6863 0.7612

With these results, the estimated half-life is $\hat\varphi_2 = 2.843$6, and again using formula (3.14) with $c(\boldsymbol\varphi) = \varphi_2$, the approximate 95% confidence interval is

$$2.843 \pm 2.571\,(0.1463) = [2.467;\ 3.219].$$

As we can see, approaches 2 and 3 give the same results.

Confidence Interval for Total Clearance

As a second example, we compute a confidence interval for total clearance, $Cl_T$, which in the one-compartment model is defined by

$$Cl_T = k_{el} \cdot V_D.$$

Hence, the reparameterization $\boldsymbol\varphi = G(\boldsymbol\theta)$ is given by

$$\varphi_1 = \theta_1, \qquad \varphi_2 = \theta_2, \qquad \varphi_3 = \theta_2\theta_3.$$

Applying the second approach7 we have that $c(\boldsymbol\theta) = \theta_2\theta_3$, $\mathbf{c}^{\mathrm T} = [0,\ \theta_3,\ \theta_2]$, $\hat{\mathbf{c}}^{\mathrm T} = [0,\ 451.3,\ 0.2438]$, and by (3.14), the approximate 95% confidence interval is

$$110.0 \pm 2.571\,(4.205) = [99.24;\ 120.86].$$
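Because the gradient now has two nonzero components, the covariance entries $\hat\Sigma_{22}$, $\hat\Sigma_{23}$, and $\hat\Sigma_{33}$ all enter the delta-method variance; a Python sketch of the computation:

```python
import math

kel, vd = 0.24383, 451.3271
sigma, t5 = 0.0002277, 2.571
# relevant entries of Sigma_hat from above
S22, S23, S33 = 3037.0, -1709438.0, 1660369983.0

cl = kel*vd                                   # approx 110.05
# delta method with gradient c = (0, vd, kel):
var = vd**2*S22 + 2.0*vd*kel*S23 + kel**2*S33
se_cl = sigma*math.sqrt(var)                  # approx 4.205
lo, hi = cl - t5*se_cl, cl + t5*se_cl         # approx [99.24, 120.86]
```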

In Table 3.4 we present the results of the estimation of the model reparameterized in terms of total clearance

6 Note that with the three approaches the point estimates are the same. This fact relies on an appealing characteristic of least squares estimators, namely that for $\boldsymbol\varphi = G(\boldsymbol\theta)$, if $\hat{\boldsymbol\theta}$ is the least squares estimator of $\boldsymbol\theta$, then $\hat{\boldsymbol\varphi} = G(\hat{\boldsymbol\theta})$ is the least squares estimator of $\boldsymbol\varphi$.
7 The first approach does not apply in this case because total clearance is a function of more than one parameter.


$$f(x, \boldsymbol\varphi) = \frac{5\varphi_1}{(\varphi_3/\varphi_2)\left(\varphi_1 - \varphi_2\right)}\left(e^{-\varphi_2 x} - e^{-\varphi_1 x}\right).$$

TABLE 3.4 Analysis of the model reparameterized in terms of total clearance using the nls

function.

> total_clearance<-nls(Conc~5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),

+ data=ex1, start=c(Ka=5, Kel=0.2, Cl=100), model=T)

> summary(total_clearance)

Formula: Conc ~ 5 * Ka/(Cl/Kel * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka *

Time))

Parameters:

Estimate Std. Error t value Pr(>|t|)

Ka 5.52109 0.43000 12.84 5.10e-05 ***

Kel 0.24383 0.01255 19.43 6.66e-06 ***

Cl 110.04715 4.20543 26.17 1.52e-06 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:

Ka Kel

Kel -0.5192

Cl -0.3300 0.9372

Thus, the approximate 95% confidence interval for total clearance with the third approach is

$$110.0 \pm 2.571\,(4.205) = [99.24;\ 120.86].$$

3.5.4. Measures of Curvature

In this section we compute the measures of intrinsic and parameter effects nonlinearity for each of the

three parameterizations treated in Section 3.5.3; the R code for these computations is presented in the

final Apendix. In each case we have three parameters, so we have three velocity vectors and nine

acceleration vectors, six of them being different.

Given the original parameterization

$$f(x, \boldsymbol\theta) = \frac{5\theta_1}{\theta_3\left(\theta_1-\theta_2\right)}\left(e^{-\theta_2 x} - e^{-\theta_1 x}\right),$$

the elements of $\dot{F}$ are defined by

$$\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_1} = -\frac{5\theta_2\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3(\theta_1-\theta_2)^2} + \frac{5\theta_1\,x\,e^{-\theta_1 x}}{\theta_3(\theta_1-\theta_2)},$$

$$\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_2} = \frac{5\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3(\theta_1-\theta_2)^2} - \frac{5\theta_1\,x\,e^{-\theta_2 x}}{\theta_3(\theta_1-\theta_2)},$$

$$\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_3} = -\frac{5\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3^2(\theta_1-\theta_2)},$$

and the elements of $\ddot{F}$ by


$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1^2} = \frac{10\theta_2\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3(\theta_1-\theta_2)^3} - \frac{10\theta_2\,x\,e^{-\theta_1 x}}{\theta_3(\theta_1-\theta_2)^2} - \frac{5\theta_1\,x^2 e^{-\theta_1 x}}{\theta_3(\theta_1-\theta_2)},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_2} = -\frac{10\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3(\theta_1-\theta_2)^3} + \frac{5\left(e^{-\theta_2 x}-e^{-\theta_1 x}+\theta_1 x\,e^{-\theta_1 x}+\theta_1 x\,e^{-\theta_2 x}\right)}{\theta_3(\theta_1-\theta_2)^2} - \frac{5\,x\,e^{-\theta_2 x}}{\theta_3(\theta_1-\theta_2)},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_3} = \frac{5\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3^2(\theta_1-\theta_2)^2} - \frac{5\left(e^{-\theta_2 x}-e^{-\theta_1 x}+\theta_1 x\,e^{-\theta_1 x}\right)}{\theta_3^2(\theta_1-\theta_2)},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_2\partial\theta_1} = \frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_2},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_2^2} = \frac{10\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3(\theta_1-\theta_2)^3} - \frac{10\theta_1\,x\,e^{-\theta_2 x}}{\theta_3(\theta_1-\theta_2)^2} + \frac{5\theta_1\,x^2 e^{-\theta_2 x}}{\theta_3(\theta_1-\theta_2)},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_2\partial\theta_3} = -\frac{5\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3^2(\theta_1-\theta_2)^2} + \frac{5\theta_1\,x\,e^{-\theta_2 x}}{\theta_3^2(\theta_1-\theta_2)},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_3\partial\theta_1} = \frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_3}, \qquad \frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_3\partial\theta_2} = \frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_2\partial\theta_3},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_3^2} = \frac{10\theta_1\left(e^{-\theta_2 x}-e^{-\theta_1 x}\right)}{\theta_3^3(\theta_1-\theta_2)}.$$

In Table 3.5 we present the velocity and acceleration vectors for our data evaluated at $\hat{\boldsymbol\theta} = (5.521,\ 0.2438,\ 451.3)^{\mathrm T}$.

TABLE 3.5 Velocity and acceleration vectors.

Velocity Acceleration

Time f.1 f.2 f.3 f..11 f..12 f..13 f..22 f..23 f..33

0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

0.25 0.000662 -0.001212 -0.000018 -0.000169 -0.000139 -0.000001 0.000222 0.000003 0.000000

0.50 0.000287 -0.003325 -0.000021 -0.000159 -0.000245 -0.000001 0.001305 0.000007 0.000000

0.75 0.000059 -0.005446 -0.000021 -0.000076 -0.000268 0.000000 0.003366 0.000012 0.000000

1.00 -0.000029 -0.007370 -0.000020 -0.000019 -0.000254 0.000000 0.006289 0.000016 0.000000

2.00 -0.000059 -0.012886 -0.000016 0.000022 -0.000148 0.000000 0.023585 0.000029 0.000000

4.00 -0.000037 -0.016653 -0.000010 0.000014 -0.000018 0.000000 0.063615 0.000037 0.000000

6.00 -0.000022 -0.015594 -0.000006 0.000009 0.000034 0.000000 0.090703 0.000035 0.000000

The tangent space has dimension three. The combined tangent and acceleration spaces have dimension

five because four of the acceleration vectors are linear combinations of the velocity vectors. We show the

linear combinations below.

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_2} = \frac{1}{\theta_1-\theta_2}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_1} - \frac{\theta_2}{\theta_1(\theta_1-\theta_2)}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_2} + \frac{\theta_3}{\theta_1(\theta_1-\theta_2)}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_3},$$


$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_1\partial\theta_3} = -\frac{1}{\theta_3}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_1},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_2\partial\theta_3} = -\frac{1}{\theta_3}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_2},$$

$$\frac{\partial^2 f(x,\boldsymbol\theta)}{\partial\theta_3^2} = -\frac{2}{\theta_3}\,\frac{\partial f(x,\boldsymbol\theta)}{\partial\theta_3}.$$
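The last three of these linear combinations are easy to verify numerically without using any derivative formulas; a sketch with central finite differences of the model function at the estimated parameters (evaluated at x = 1):

```python
import math

def f(t1, t2, t3, x):
    # model function of the original parameterization (dose = 5)
    return 5.0*t1/(t3*(t1 - t2))*(math.exp(-t2*x) - math.exp(-t1*x))

theta = (5.521, 0.2438, 451.3)
x = 1.0

def d(g, i, t, rel=1e-3):
    # central finite difference with respect to component i
    h = rel*abs(t[i])
    tp, tm = list(t), list(t)
    tp[i] += h; tm[i] -= h
    return (g(*tp, x) - g(*tm, x))/(2.0*h)

def d2(i, j, t, rel=1e-3):
    # mixed second derivative via nested central differences
    h = rel*abs(t[j])
    tp, tm = list(t), list(t)
    tp[j] += h; tm[j] -= h
    return (d(f, i, tuple(tp)) - d(f, i, tuple(tm)))/(2.0*h)

t3 = theta[2]
f1, f2, f3 = (d(f, i, theta) for i in range(3))
# the identities: the theta3 rows of the acceleration array lie in the tangent plane
check_13 = d2(0, 2, theta) - (-f1/t3)
check_23 = d2(1, 2, theta) - (-f2/t3)
check_33 = d2(2, 2, theta) - (-2.0*f3/t3)
```

The first derivatives computed this way also reproduce the velocity vectors of Table 3.5 at time 1.00.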

Hence, for our further computations, $p = 3$ and $p' = 2$. The vectors in Table 3.5 form the matrix $D$ given in (3.15). Now we perform a QR decomposition on $D$ and compute the compact acceleration array $\ddot{A}$ given in (3.16). In Table 3.6 we show the results obtained with R for the array $\ddot{A}$, which is composed of five faces, each represented as a $3\times 3$ matrix.

TABLE 3.6 R output for the compact acceleration array A...

> A1

[,1] [,2] [,3]

[1,] 2.245701e-04 2.222226e-04 1.613188e-06

[2,] 2.222226e-04 7.174226e-03 -4.267707e-07

[3,] 1.613188e-06 -4.267707e-07 -1.035020e-07

> A2

[,1] [,2] [,3]

[1,] 2.131654e-05 2.120827e-04 -1.596977e-21

[2,] 2.120827e-04 -1.017923e-01 -6.199946e-05

[3,] -1.596977e-21 -6.199946e-05 -1.276624e-07

> A3

[,1] [,2] [,3]

[1,] 7.864738e-05 3.801505e-04 -1.337680e-21

[2,] 3.801505e-04 4.114165e-02 3.243375e-21

[3,] -1.337680e-21 3.243375e-21 -1.087519e-07

> A4

[,1] [,2] [,3]

[1,] -6.095127e-05 1.325061e-20 -5.494046e-23

[2,] 1.325061e-20 1.847316e-02 -5.277412e-22

[3,] -5.494046e-23 -5.277412e-22 -7.973730e-24

> A5

[,1] [,2] [,3]

[1,] 9.367820e-22 -5.755854e-20 3.568832e-23

[2,] -5.755854e-20 -2.087244e-02 -1.273858e-21

[3,] 3.568832e-23 -1.273858e-21 9.649902e-24

The first three faces (A1, A2, and A3 in Table 3.6) give the projections of the acceleration vectors on the tangent plane and constitute a measure of the parameter effects nonlinearity; we denote these three faces by Aθ. The last two faces (A4 and A5) give the projections of the acceleration vectors on the space normal to the tangent plane and are a measure of intrinsic nonlinearity; they are denoted by Aι. As these results show, the parameter effects nonlinearity is larger than the intrinsic nonlinearity.

In Table 3.7 we present the relative curvatures, computed with formula (3.17). In this example, $s = 0.0002277$ and $p = 3$. The matrix R11 is also presented in Table 3.7.


TABLE 3.7 R output for the relative curvature array C

> R11

[,1] [,2] [,3]

[1,] -7.280754e-04 1.926132e-04 2.335663e-05

[2,] -2.268129e-21 2.798204e-02 2.880874e-05

[3,] 1.518704e-20 -1.389134e-19 2.454134e-05

> C1

[,1] [,2] [,3]

[1,] 0.167079417 -0.005451944 -0.188220720

[2,] -0.005451944 0.003680740 0.000867986

[3,] -0.188220720 0.000867986 0.144227621

> C2

[,1] [,2] [,3]

[1,] 0.015859437 -0.004214736 -0.01014622

[2,] -0.004214736 -0.051214704 0.02852463

[3,] -0.010146215 0.028524631 -0.06562669

> C3

[,1] [,2] [,3]

[1,] 0.058513386 -0.007761854 -0.04657717

[2,] -0.007761854 0.020826800 -0.01706114

[3,] -0.046577168 -0.017061138 -0.00685723

> C4

[,1] [,2] [,3]

[1,] -0.0453475368 0.0003121479 0.04279200

[2,] 0.0003121479 0.0093026323 -0.01121731

[3,] 0.0427919974 -0.0112173116 -0.02755840

> C5

[,1] [,2] [,3]

[1,] 6.969627e-19 1.399622e-18 -3.094041e-18

[2,] 1.399622e-18 -1.051328e-02 1.234139e-02

[3,] -3.094041e-18 1.234139e-02 -1.448739e-02

Again, the first three faces (C1, C2, and C3) correspond to the parameter effects nonlinearity and the last two (C4 and C5) to the intrinsic nonlinearity. We use these values with formula (3.18) to obtain the RMS curvatures for parameter effects and intrinsic nonlinearity:

$$c^{\theta} = \left[\frac{1}{3(3+2)}\sum_{k=1}^{3}\left(2\sum_{r=1}^{3}\sum_{s=1}^{3}c_{krs}^{2} + \Big(\sum_{r=1}^{3}c_{krr}\Big)^{2}\right)\right]^{1/2} = 0.16124,$$

$$c^{\iota} = \left[\frac{1}{3(3+2)}\sum_{k=4}^{5}\left(2\sum_{r=1}^{3}\sum_{s=1}^{3}c_{krs}^{2} + \Big(\sum_{r=1}^{3}c_{krr}\Big)^{2}\right)\right]^{1/2} = 0.03611.$$
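These two numbers can be reproduced from the faces of the relative curvature array C in Table 3.7; a Python sketch (the matrices are copied, symmetrized, from the R output, and p = 3):

```python
import math

C1 = [[0.167079417, -0.005451944, -0.188220720],
      [-0.005451944, 0.003680740, 0.000867986],
      [-0.188220720, 0.000867986, 0.144227621]]
C2 = [[0.015859437, -0.004214736, -0.010146215],
      [-0.004214736, -0.051214704, 0.028524631],
      [-0.010146215, 0.028524631, -0.065626690]]
C3 = [[0.058513386, -0.007761854, -0.046577168],
      [-0.007761854, 0.020826800, -0.017061138],
      [-0.046577168, -0.017061138, -0.006857230]]
C4 = [[-0.045347537, 0.000312148, 0.042791997],
      [0.000312148, 0.009302632, -0.011217312],
      [0.042791997, -0.011217312, -0.027558400]]
C5 = [[0.0, 0.0, 0.0],
      [0.0, -0.010513280, 0.012341390],
      [0.0, 0.012341390, -0.014487390]]

def rms_curvature(faces, p=3):
    # for each face: twice the sum of squared entries plus the squared trace,
    # summed over faces and scaled by 1/(p(p+2)), then the square root
    total = 0.0
    for c in faces:
        total += 2.0*sum(c[r][s]**2 for r in range(p) for s in range(p))
        total += sum(c[r][r] for r in range(p))**2
    return math.sqrt(total/(p*(p + 2)))

c_theta = rms_curvature([C1, C2, C3])  # parameter effects, approx 0.16124
c_iota  = rms_curvature([C4, C5])      # intrinsic, approx 0.03611
```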

Finally, we compute the RMS curvatures for the parameterizations in terms of the half-life and total clearance. The relative curvature array can be computed in the same way as for the original parameterization. For the half-life parameterization we have

$$c^{\theta} = 0.17636, \qquad c^{\iota} = 0.03611,$$

and for the total clearance parameterization

$$c^{\theta} = 0.21682, \qquad c^{\iota} = 0.03611.$$

As we can see, different parameterizations change the parameter effects curvature, but the intrinsic curvature remains constant. For the original parameterization, with $F$ the 0.95 quantile of the $F$ distribution with 3 and 5 degrees of freedom, we have that $c^{\theta}\sqrt{F} = 0.375$ and $c^{\iota}\sqrt{F} = 0.084$. For the half-life parameterization $c^{\theta}\sqrt{F} = 0.410$, and for the total clearance parameterization $c^{\theta}\sqrt{F} = 0.504$. If we compare these values with the 0.3 reference value given by Bates and Watts (1988), the parameter effects curvatures are somewhat high. Nevertheless, for this specific model and design, the parameterization in terms of $k_a$, $k_{el}$, and $V_D$ seems to be the most convenient.
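These scaled values follow directly from the RMS curvatures; a minimal Python check (the 0.95 quantile of the F(3, 5) distribution, approximately 5.4095, is hardcoded here rather than looked up):

```python
import math

F = 5.4095                    # 0.95 quantile of the F distribution with 3 and 5 df
sf = math.sqrt(F)

scaled = {name: c*sf for name, c in [
    ("original c_theta", 0.16124),
    ("original c_iota", 0.03611),
    ("half-life c_theta", 0.17636),
    ("clearance c_theta", 0.21682),
]}
# each value can then be compared with the 0.3 reference of Bates and Watts
```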

An important feature to note is that in all three cases the nonlinearity caused by the parameterization is much larger than the intrinsic nonlinearity of the model. This characteristic was noted by Bates and Watts (1980) in a study in which they computed nonlinearity measures for 24 data sets.


4. Population Pharmacokinetics

Different individuals respond in different ways to drugs. Hence, under the reasonable assumption that in a specific situation (that is, for a given drug, concentration, and administration route) the pharmacokinetic model explaining the concentrations of the drug in the body over time is the same for all individuals, the estimated parameters for each individual will be different; in fact, the actual unknown individual parameters must be different. If we consider the different individual parameters as members of a population, then the new goal is to find a probabilistic description of this population. In this section we discuss this problem, and as usual, this description will be based mainly on the first two moments of the distribution.

In Section 3 we dealt with individual pharmacokinetics, so there we considered only intra-individual variation. Now, with several individuals, the inter-individual variation is incorporated. In Section 4.1 we start by setting up the general model that adequately accommodates both sources of variability. In Section 4.2 we present two traditional methods, quite often used in the past, to estimate the population pharmacokinetic parameters, and in Section 4.3 we present a different, apparently more efficient approach based on linearization of the model. In Section 4.4 we briefly discuss available software, and in Sections 4.5 and 4.6 we illustrate the different methods presented here with examples.

4.1. Hierarchical Nonlinear Models

The hierarchical nonlinear model gives the general framework to analyze repeated measures data in

nonlinear models, which is the kind of data that arise in the field of population pharmacokinetics. The

theory presented in this section is mainly based on Chapter 4 of Davidian and Giltinan (1995).

4.1.1. The Model

Let $y_{ij}$ denote the $j$th response, $j = 1,\ldots,n_i$, for the $i$th individual, $i = 1,\ldots,m$. The hierarchical nonlinear model is given by

$$y_{ij} = f(\mathbf{x}_{ij}, \boldsymbol\theta_i) + e_{ij},$$

where $f(\mathbf{x}_{ij}, \boldsymbol\theta_i)$ is a nonlinear function common to all individuals which depends on a vector of covariates $\mathbf{x}_{ij}$ and a vector of possibly different parameters $\boldsymbol\theta_i$ of dimension $p\times 1$, and $e_{ij}$ is the random error term associated with the $j$th response of the $i$th individual.
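As an illustration of the two levels of this hierarchy, a minimal Python simulation sketch for a one-compartment model with extravascular administration; all population values, variances, and sample sizes below are invented for illustration:

```python
import math, random

def f(x, ka, kel, vd, dose=5.0):
    # one-compartment model with extravascular administration
    return dose*ka/(vd*(ka - kel))*(math.exp(-kel*x) - math.exp(-ka*x))

random.seed(1)
times = [0.25, 0.5, 1.0, 2.0, 4.0, 6.0]             # covariates x_ij
data = []
for i in range(4):                                   # m = 4 individuals
    # individual parameters theta_i: a population value times a lognormal random effect
    ka  = 5.5   * math.exp(random.gauss(0.0, 0.1))
    kel = 0.24  * math.exp(random.gauss(0.0, 0.1))
    vd  = 450.0 * math.exp(random.gauss(0.0, 0.1))
    for x in times:
        e_ij = random.gauss(0.0, 1e-4)               # intra-individual error
        data.append((i, x, f(x, ka, kel, vd) + e_ij))
```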

4.1.2. Intra-Individual Variation

We can summarize the data for the $i$th individual as

$$\mathbf{y}_i = \mathbf{f}_i(\boldsymbol\theta_i) + \mathbf{e}_i, \qquad (4.1)$$

with $\mathbf{y}_i = [y_{i1},\ldots,y_{in_i}]^{\mathrm T}$, $\mathbf{f}_i(\boldsymbol\theta_i) = [f(\mathbf{x}_{i1},\boldsymbol\theta_i),\ldots,f(\mathbf{x}_{in_i},\boldsymbol\theta_i)]^{\mathrm T}$, and $\mathbf{e}_i = [e_{i1},\ldots,e_{in_i}]^{\mathrm T}$.


Intra-individual variation is specified by the systematic variation given through the function f and the

random variation characterized by an assumption on the conditional distribution of the random errors

given the ith individual. The general assumption is that

$$\mathrm{E}(\mathbf{e}_i \mid \boldsymbol\theta_i) = \mathbf{0}, \qquad \mathrm{Cov}(\mathbf{e}_i \mid \boldsymbol\theta_i) = \mathbf{R}_i(\boldsymbol\theta_i, \boldsymbol\xi), \qquad (4.2)$$

with some defined probability distribution. The covariance matrix $\mathbf{R}$ (cf. (3.3)) depends on the parameters of the model $\boldsymbol\theta_i$ and on some intra-individual covariance parameters given in the vector $\boldsymbol\xi$. The functional form of $\mathbf{R}_i$ and the covariance parameters $\boldsymbol\xi$ are the same for all individuals, so $\mathbf{R}_i$ differs across individuals only through its dependence on $\boldsymbol\theta_i$. Although it is possible to extend the model for intra-individual covariance to allow the covariance parameters to vary from individual to individual (actually, it is possible to do so with R), Davidian and Giltinan (1995) recommend against this because it could be impossible to estimate the elements of such a complicated structure reliably. Note that for a given individual $i$, this setting is equal to the one presented in Section 3.1. The most common assumption for the conditional distribution of $\mathbf{e}_i$ given $\boldsymbol\theta_i$ is

$$\mathbf{e}_i \mid \boldsymbol\theta_i \sim \mathrm{N}\left(\mathbf{0},\ \mathbf{R}_i(\boldsymbol\theta_i, \boldsymbol\xi)\right).$$

4.1.3. Inter-Individual Variation

Differences among individuals are specified through differences in the individual parameters $\boldsymbol\theta_i$. In order to have a quite general model for variation among individuals, we define the following model for $\boldsymbol\theta_i$:

$$\boldsymbol\theta_i = \mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i). \qquad (4.3)$$

In this expression, $\mathbf{d}$ is a $p$-dimensional vector-valued function, $\mathbf{a}_i$ is an $a\times 1$ covariate vector corresponding to individual characteristics of individual $i$, $\mathbf{b}_i$ is a $k\times 1$ vector of random effects associated with individual $i$, and $\boldsymbol\beta$ is an $r\times 1$ vector of fixed population parameters. Each element of $\mathbf{d}$ is associated with the corresponding element of $\boldsymbol\theta_i$, so the functional relation may be different for each element. Inter-individual random variation is specified by an assumption on the distribution of the random effects $\mathbf{b}_i$. The general assumption is that the $\mathbf{b}_i$ are independent and identically distributed with mean $\mathbf{0}$ and covariance matrix $\mathbf{D}$. As in the case of intra-individual variation, the most common assumption is that

$$\mathbf{b}_i \sim \mathrm{N}(\mathbf{0}, \mathbf{D}). \qquad (4.4)$$

Note that this assumption does not imply normality for $\boldsymbol\theta_i$, since its distribution will depend on the form of the function $\mathbf{d}$. For instance, pharmacokinetic parameters such as clearance exhibit skewed distributions with constant coefficient of variation. If we model the individual parameters by

$$\theta_{ri} = \beta_r \exp(b_{ri})$$

and assume a normal distribution for the random effects $b_{ri}$ (here, the subscripts $r$ and $i$ refer to the $r$th parameter of the $i$th individual), then the individual parameters $\theta_{ri}$ will have a lognormal distribution, which is skewed with constant coefficient of variation.
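The constant coefficient of variation follows from the lognormal moments; a short Python check, where 0.09 is an assumed random-effect variance:

```python
import math

def lognormal_cv(omega2):
    # CV of beta_r * exp(b_ri) with b_ri ~ N(0, omega2); it does not depend on beta_r
    return math.sqrt(math.exp(omega2) - 1.0)

def moments(beta_r, omega2):
    # mean and standard deviation of beta_r * exp(b_ri)
    mean = beta_r*math.exp(omega2/2.0)
    sd = mean*lognormal_cv(omega2)
    return mean, sd

omega2 = 0.09                     # assumed Var(b_ri)
m1, s1 = moments(2.0, omega2)     # a small parameter ...
m2, s2 = moments(200.0, omega2)   # ... and one 100 times larger
# s1/m1 equals s2/m2: the coefficient of variation is the same
```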

4.1.4. Modelling

We can model different kinds of relations and assumptions with suitable choices of the function d, the matrix D, and the matrices Ri.

Individual Parameters Distribution

In the previous section we have seen that, although the general assumption is that the random effects have

a normal distribution, an adequate form of the function d will enable us to assume a different distribution

for the individual parameters. Similarly, we can incorporate a systematic dependence of the individual

parameters on subject characteristics by including some covariates in the function d. For example, if we

know that a parameter such as clearance depends on weight, we can define the function


$$\theta_{ri} = \left(\beta_1 + \beta_2 w_i\right)\exp(b_{ri}),$$

with $w_i$ the weight of the $i$th individual. Furthermore, not all pharmacokinetic parameters of the model need to be random. We can specify some parameters as fixed and some as random by setting the corresponding elements of the vectors $\mathbf{b}_i$ to zero.

Treatment Comparisons

The effect of different treatments can also be incorporated in the analysis through the function $\mathbf{d}$. For instance, consider an experiment to test two treatments in an open one-compartment model with intravascular administration (cf. Section 2.1.1). We can assume that both parameters, $V_D$ and $k_{el}$, vary between treatments by assuming that

$$\boldsymbol\theta_i = \mathbf{A}_i\boldsymbol\beta + \mathbf{b}_i,$$

with $\boldsymbol\beta = (\beta_1, \beta_2, \beta_3, \beta_4)^{\mathrm T}$, where $\beta_1$ and $\beta_2$ represent the fixed effects for the parameters $V_D$ and $k_{el}$ for the first treatment, and $\beta_3$ and $\beta_4$ represent the corresponding fixed effects for the second treatment. If each individual is subjected to just one treatment, then the matrix $\mathbf{A}_i$ would be of the form

$$\mathbf{A}_i = \left[\mathbf{I}_2 \mid \mathbf{0}_{2\times 2}\right]$$

if individual $i$ receives the first treatment and

$$\mathbf{A}_i = \left[\mathbf{0}_{2\times 2} \mid \mathbf{I}_2\right]$$

if individual $i$ receives the second treatment. If we assume that just one parameter varies between treatments (say, $k_{el}$), then we can assume that

$$\begin{bmatrix}\theta_{1i}\\ \theta_{2i}\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} + \begin{bmatrix}b_{1i}\\ b_{2i}\end{bmatrix}$$

for the first treatment and

$$\begin{bmatrix}\theta_{1i}\\ \theta_{2i}\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} + \begin{bmatrix}b_{1i}\\ b_{2i}\end{bmatrix}$$

for the second treatment, with $\beta_1$ the fixed effect for $V_D$.
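The way θi is assembled from the design matrix, the fixed effects, and the random effects can be sketched in Python; all numbers below are hypothetical:

```python
def theta_i(beta, b_i, treatment):
    # beta = (beta1, beta2, beta3): beta1 for VD, beta2/beta3 for kel under treatment 1/2
    A = [[1, 0, 0], [0, 1, 0]] if treatment == 1 else [[1, 0, 0], [0, 0, 1]]
    # theta_i = A_i beta + b_i
    return [sum(a*bb for a, bb in zip(row, beta)) + b
            for row, b in zip(A, b_i)]

beta = (450.0, 0.20, 0.30)          # illustrative fixed effects
# the same individual random effects (2.0, 0.01) under each treatment:
t1 = theta_i(beta, (2.0, 0.01), 1)  # VD = 452.0, kel about 0.21
t2 = theta_i(beta, (2.0, 0.01), 2)  # VD = 452.0, kel about 0.31
```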

Inter-Individual and Intra-Individual Structure

We can assume uncorrelated random effects by considering a diagonal matrix D. If we assume that some random effects are correlated and some are not, a block diagonal matrix is adequate. Autocorrelation and nonconstant variances in the measurements of an individual can be incorporated in the matrices Ri.

4.2. Traditional Approaches

In this section we present a brief description of two traditional approaches to the population parameter estimation problem. Since these approaches are well known and easy to implement, we will not go into further detail, and we pay more attention to the approach presented in Section 4.3.

4.2.1. The Naïve Pooled Data Approach (NPD)

In this approach all individuals' data are pooled together, as though there were no differences among individuals, and analyzed using nonlinear regression models as though they had all come from one individual. Because this method ignores individuals, both intra- and inter-individual variation are combined in one single error term. Although the method has the advantage of simplicity, it produces biased and imprecise estimators, as was shown by Sheiner and Beal in a series of papers where they applied this method to estimate the pharmacokinetic parameters of three models: the Michaelis-Menten model (Sheiner and Beal, 1980), the biexponential model (Sheiner and Beal, 1981), and the monoexponential model (Sheiner and Beal, 1983). As a result of their research, the authors suggested that this approach should be abandoned.

4.2.2. The Two-Stage Approach

This approach is useful when there are enough measurements per individual in order to fit individual

models. Then, in a second stage, the individual parameter estimates are used as building blocks to obtain

estimators for the population parameters.

In the first stage the individual parameters are estimated using nonlinear regression methods in a similar

way as described in Section 3.1.2. If we consider a model as in (4.1) with a general covariance structure

as in (4.2), we can use the same iterative process described in Section 3.1.2 to obtain the individual

parameters with a slight modification such that the functional form of Ri and the intra-individual

covariance parameter ξξξξ remain the same across individuals. We can do that by fitting the m individual

regressions simultaneously and then using the residuals from all these fits to estimate ξξξξ.

In the second stage, following the model specification given in (4.3) and (4.4) for the inter-individual variation, the objective is to estimate the population coefficients given in $\boldsymbol\beta$ and the covariance matrix $\mathbf{D}$. In the traditional method, called the Standard Two-Stage (STS) method, the individual estimates $\hat{\boldsymbol\theta}_i$ are treated as if they were the true parameters $\boldsymbol\theta_i$. In the simplest case, where $\boldsymbol\theta_i = \boldsymbol\beta + \mathbf{b}_i$, the $\boldsymbol\theta_i$ are independent and identically distributed $\mathrm{N}(\boldsymbol\beta, \mathbf{D})$, and then the STS estimates are the sample mean and covariance of the $\hat{\boldsymbol\theta}_i$:

$$\hat{\boldsymbol\theta}_{STS} = \frac{1}{m}\sum_{i=1}^{m}\hat{\boldsymbol\theta}_i,$$

$$\hat{\mathbf{D}}_{STS} = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{\boldsymbol\theta}_i - \hat{\boldsymbol\theta}_{STS}\right)\left(\hat{\boldsymbol\theta}_i - \hat{\boldsymbol\theta}_{STS}\right)^{\mathrm T}.$$
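The STS estimates are just the sample mean and sample covariance of the individual fits; a Python sketch with three hypothetical two-parameter individual estimates:

```python
def sts(theta_hats):
    # sample mean and sample covariance (divisor m - 1) of the individual estimates
    m = len(theta_hats)
    p = len(theta_hats[0])
    mean = [sum(t[r] for t in theta_hats)/m for r in range(p)]
    cov = [[sum((t[r] - mean[r])*(t[s] - mean[s]) for t in theta_hats)/(m - 1)
            for s in range(p)] for r in range(p)]
    return mean, cov

# three hypothetical individual estimates (values chosen for illustration only)
theta_hats = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0)]
mean, cov = sts(theta_hats)   # mean = [2.0, 12.0], cov = [[1.0, 2.0], [2.0, 4.0]]
```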

Since no account is taken of the uncertainty in estimating $\boldsymbol\theta_i$, the STS estimator for the variances in $\mathbf{D}$ is upwardly biased. A further drawback of the STS method is that no refinement of the individual $\hat{\boldsymbol\theta}_i$, such as shrinkage toward the mean, is implemented (Davidian and Giltinan, 1995). Sheiner and Beal (1980, 1981, 1983) suggest using the geometric mean instead of the arithmetic mean to estimate $\boldsymbol\theta$. Their suggestion is based on the assumption that the functional relation (4.3) for pharmacokinetic parameters is usually of the form

$$\ln(\boldsymbol\theta_i) = \ln(\boldsymbol\beta) + \mathbf{b}_i.$$

These authors mention that in the data-rich situation the population pharmacokinetic parameter estimates obtained with this approach are as good as those obtained with the linearization-based methods that we treat in the next section. For the random effects, however, they mention that the inter-individual variance estimates are biased and imprecise.

Davidian and Giltinan (1995) recommend not using the STS method because of the drawbacks mentioned above. As an alternative they discuss the Global Two-Stage (GTS) method, which does incorporate the uncertainty of estimating $\boldsymbol\theta_i$. However, iterative methods are needed to obtain the estimators, which makes this method computationally quite a bit more complicated than the STS method.


4.3. Inference Based on Linearization

The NPD approach uses all the data points but as if they came from a single individual. On the other side,

the Two-Stage approach uses the data of the individuals as if they were single data points. This approach

takes a middle course between the two previous ones. It pools all the data but recognizing the individuals

they come from. An advantage of this approach over the Two-Stage approach is that it is capable to use

data from all the individuals, while in the Two-Stage approach, data from individuals without enough data

points to fit the individual models are not considered. Therefore, this approach is particularly valuable

when extensive measurements are not available on all the subjects, which is typically the case in

pharmacokinetics with routine type data8.

Consider the model presented in Section 4.1,

$$\mathbf{y}_i = \mathbf{f}_i(\boldsymbol\theta_i) + \mathbf{e}_i, \qquad \boldsymbol\theta_i = \mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i),$$

with the general assumption that

$$\mathbf{e}_i \mid \boldsymbol\theta_i \sim \left(\mathbf{0},\ \mathbf{R}_i(\boldsymbol\theta_i, \boldsymbol\xi)\right), \qquad \mathbf{b}_i \sim (\mathbf{0}, \mathbf{D}).$$

There are two problems in the estimation of this model: the random effects enter the model in a nonlinear fashion, and $\mathbf{e}_i$ and $\mathbf{b}_i$ are not independent, since the distribution of $\mathbf{e}_i$ depends on $\boldsymbol\theta_i$, which in turn depends on $\mathbf{b}_i$. This approach is based on a linearization of the hierarchical nonlinear model by Taylor series expansion in such a way that both problems are solved and the marginal distribution of $\mathbf{y}_i$ may be computed. In this section we present two linear approximations: the first-order linearization suggested by Beal and Sheiner (1982) and a refinement of it, the conditional first-order linearization suggested by Lindstrom and Bates (1990). This discussion is mainly based on Chapter 6 of Davidian and Giltinan (1995).

4.3.1. First-Order Linearization

The first-order linearization scheme proceeds as follows. Let us consider

( )1 2,

i i i i=e R θ ξ ε ,

with ( )1 2,

i iR θ ξ the Cholesky decomposition of ( ),

i iR θ ξ . Hence,

iε has mean zero, covariance matrix

inI , and is independent of bi. The model given in (4.1) may be written as

( )( ) ( )( )1 2, , , , ,i i i i i i i i= +y f d a β b R d a β b ξ ε . (4.5)

In this expression we replace $\boldsymbol\theta_i$ by $\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i)$ to show explicitly the dependence on the random effects $\mathbf{b}_i$. A Taylor series expansion of (4.5) in $\mathbf{b}_i$ about its mean $\mathrm{E}(\mathbf{b}_i) = \mathbf{0}$, retaining the first two terms in the expansion of $\mathbf{f}_i\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i)\right)$ and the first term in $\mathbf{R}_i^{1/2}\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i), \boldsymbol\xi\right)\boldsymbol\varepsilon_i$, produces the approximation

$$\mathbf{y}_i \approx \mathbf{f}_i\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{0})\right) + \mathbf{F}_i(\boldsymbol\beta, \mathbf{0})\,\boldsymbol\Delta_{\mathbf{b}_i}(\boldsymbol\beta, \mathbf{0})\,\mathbf{b}_i + \mathbf{R}_i^{1/2}\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{0}), \boldsymbol\xi\right)\boldsymbol\varepsilon_i, \qquad (4.6)$$

with $\mathbf{F}_i(\boldsymbol\beta, \mathbf{0})$ the $n_i\times p$ matrix of derivatives of $\mathbf{f}_i(\boldsymbol\theta_i)$ with respect to $\boldsymbol\theta_i$ evaluated at $\boldsymbol\theta_i = \mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{0})$, and $\boldsymbol\Delta_{\mathbf{b}_i}(\boldsymbol\beta, \mathbf{0})$ the $p\times k$ matrix of derivatives of $\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{b}_i)$ with respect to $\mathbf{b}_i$ evaluated at $\mathbf{b}_i = \mathbf{0}$. Defining the $n_i\times k$ matrix $\mathbf{Z}_i(\boldsymbol\beta, \mathbf{0}) = \mathbf{F}_i(\boldsymbol\beta, \mathbf{0})\,\boldsymbol\Delta_{\mathbf{b}_i}(\boldsymbol\beta, \mathbf{0})$ and $\mathbf{e}_i^{*} = \mathbf{R}_i^{1/2}\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{0}), \boldsymbol\xi\right)\boldsymbol\varepsilon_i$, (4.6) may be written as

$$\mathbf{y}_i \approx \mathbf{f}_i\left(\mathbf{d}(\mathbf{a}_i, \boldsymbol\beta, \mathbf{0})\right) + \mathbf{Z}_i(\boldsymbol\beta, \mathbf{0})\,\mathbf{b}_i + \mathbf{e}_i^{*}. \qquad (4.7)$$

8 These are the data collected during the routine care of patients receiving the drug of interest. In these cases, one usually takes a few samples per individual in a large group of individuals. This kind of data is quite important because it comes from the target population and not from healthy subjects.


From (4.7), the mean and covariance of $y_i$ may be specified by

$$E(y_i) \approx f_i(d(a_i, \beta, 0)), \qquad V(y_i) \approx Z_i(\beta, 0)\,D\,Z_i^{T}(\beta, 0) + R_i(d(a_i, \beta, 0), \xi). \qquad (4.8)$$

If $b_i$ and $e_i^{*}$ are assumed to be normally distributed, it follows from (4.7) that the marginal distribution of $y_i$ may be taken as approximately normal with moments given by (4.8). Maximum likelihood estimation is based on taking the model (4.7) as exact, with normal distributions for $b_i$ and $e_i^{*}$; this is the framework used by Sheiner and Beal (1980, 1981, 1983) in the studies where they compare the NPD, STS, and NONMEM9 approaches. Generalized least squares methods rely on the assumption that the model in (4.7) and the moments in (4.8) are exact, and are inspired by multivariate extensions of the individual nonlinear regression models.
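The moment approximation can be checked numerically. The following sketch (Python is used here purely for illustration; the computations in this thesis are done in R) works through a hypothetical scalar case with $f_i(\theta) = e^{-\theta t}$ and $\theta = \beta + b_i$, so that $\Delta_{b_i} = 1$ and $Z_i$ reduces to the gradient of $f_i$ at $b_i = 0$; all numbers are made up.

```python
import numpy as np

# Hypothetical scalar illustration of the first-order linearization:
# f(theta) = exp(-theta * t), theta = d(a_i, beta, b_i) = beta + b_i,
# so Delta_{b_i} = 1 and Z_i is the gradient of f at b_i = 0.
t = np.array([0.5, 1.0, 2.0])
beta, D, sigma2 = 0.3, 0.04, 0.01   # fixed effect, Var(b_i), residual variance

def f(theta):
    return np.exp(-theta * t)

eps = 1e-6
Z = (f(beta + eps) - f(beta - eps)) / (2 * eps)   # numerical n_i x 1 derivative

mean_y = f(beta)                                       # E(y_i) ~ f_i(d(a_i, beta, 0))
cov_y = D * np.outer(Z, Z) + sigma2 * np.eye(len(t))   # V(y_i) ~ Z D Z' + R_i
```

The covariance splits into a rank-one between-individual part, $Z D Z^T$, plus the intra-individual part $R_i$, which is exactly the structure used by the ML and GLS methods described above.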

4.3.2. Conditional First-Order Linearization

Lindstrom and Bates (1990) argue that the approximation given in (4.6) by Taylor series expansion about $b_i = 0$ may be poor. Instead, they propose a Taylor series expansion of (4.5) about some value $b_i^{*}$ closer to $b_i$ than $0$. Then, the linear approximation given in (4.6) becomes

$$y_i \approx f_i(d(a_i, \beta, b_i^{*})) + F_i(\beta, b_i^{*})\,\Delta_{b_i}(\beta, b_i^{*})(b_i - b_i^{*}) + R_i^{1/2}(d(a_i, \beta, b_i^{*}), \xi)\,\varepsilon_i, \qquad (4.9)$$

with $F_i(\beta, b_i^{*})$ the $n_i \times p$ matrix of derivatives of $f_i(\theta_i)$ with respect to $\theta_i$ evaluated at $\theta_i = d(a_i, \beta, b_i^{*})$, and $\Delta_{b_i}(\beta, b_i^{*})$ the $p \times k$ matrix of derivatives of $d(a_i, \beta, b_i)$ with respect to $b_i$ evaluated at $b_i = b_i^{*}$. Defining, in a similar way as in the previous section, the $n_i \times k$ matrix $Z_i(\beta, b_i^{*}) = F_i(\beta, b_i^{*})\,\Delta_{b_i}(\beta, b_i^{*})$ and $e_i^{*} = R_i^{1/2}(d(a_i, \beta, b_i^{*}), \xi)\,\varepsilon_i$, (4.9) may be written as

$$y_i \approx f_i(d(a_i, \beta, b_i^{*})) - Z_i(\beta, b_i^{*})\,b_i^{*} + Z_i(\beta, b_i^{*})\,b_i + e_i^{*},$$

and the mean and covariance of $y_i$ are specified by

$$E(y_i) \approx f_i(d(a_i, \beta, b_i^{*})) - Z_i(\beta, b_i^{*})\,b_i^{*}, \qquad V(y_i) \approx Z_i(\beta, b_i^{*})\,D\,Z_i^{T}(\beta, b_i^{*}) + R_i(d(a_i, \beta, b_i^{*}), \xi).$$

Estimation of this model requires a reasonable choice of $b_i^{*}$. The strategy suggested by Lindstrom and Bates (1990) consists of obtaining a suitable estimate of $b_i$, using this value as $b_i^{*}$ in either an ML or a GLS estimation procedure, and then using the ML or GLS estimates to update the estimate of $b_i$. The process is iterated, with a new value of $b_i^{*}$ at each iteration.
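This iteration can be sketched for a single hypothetical individual. Under the linearized model with $\beta$, $D$, and $\sigma^2$ held fixed, a natural update of $b_i^{*}$ is the BLUP $\hat{b}_i = D Z^T V^{-1}(y - f(b^{*}) + Z b^{*})$ with $V = Z D Z^T + R$; the sketch below (Python for illustration only, scalar random effect, made-up numbers) alternates linearization and update until $b^{*}$ stabilizes.

```python
import numpy as np

# One hypothetical individual; theta = beta + b, f(theta) = exp(-theta * t).
t = np.array([0.5, 1.0, 2.0])
beta, D, sigma2 = 0.3, 0.04, 0.01

def f(b):                                   # f(d(a_i, beta, b)) with theta = beta + b
    return np.exp(-(beta + b) * t)

y = np.array([0.82, 0.71, 0.52])            # hypothetical observations
b_star = 0.0                                # start the iteration at b* = 0
for _ in range(20):
    eps = 1e-6
    Z = (f(b_star + eps) - f(b_star - eps)) / (2 * eps)   # linearize at current b*
    V = D * np.outer(Z, Z) + sigma2 * np.eye(len(t))      # marginal covariance
    resid = y - f(b_star) + Z * b_star                    # working residual
    b_star = float(D * Z @ np.linalg.solve(V, resid))     # BLUP update of b*
```

In the full Lindstrom–Bates algorithm the fixed effects and covariance parameters are re-estimated between these updates; the sketch isolates only the re-centering of the expansion point.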

Lindstrom and Bates (1990) developed their work under a more restricted model than the one presented in Section 4.1. They assumed that $d$ is a linear function of the fixed and random effects given in $\beta$ and $b_i$ respectively, and that the intra-individual covariance matrix $R_i(\theta_i, \xi)$ does not depend on $\theta_i$ (and therefore not on $b_i$), but on $i$ only through its dimension. Davidian and Giltinan (1995) extend their technique to accommodate the more general model presented in Section 4.1.

9 NONMEM is an acronym for nonlinear mixed effects modelling. In the pharmacokinetic literature, this term refers to the analysis of hierarchical nonlinear models via linearization.


4.4. Software

Fitting a hierarchical nonlinear model by linearization (as discussed in Section 4.3) is, from a computational point of view, considerably more difficult than fitting a nonlinear model with data from a single individual. Even in the simplest case, when both the conditional distribution of $e_i$ given $\theta_i$ and the distribution of $b_i$ are normal and the covariance matrix $R_i$ is given by $\sigma^2 I_{n_i}$ (that is, the assumption of uncorrelated random errors with common variance), the algorithms used to fit the model may fail to converge or may run into problems with singular matrices. In these cases it is necessary to try different initial values and to perform a meticulous data cleaning process. Here we discuss some capabilities of R and SAS for estimating these models. WinNonlin does not provide population pharmacokinetic analysis.

4.4.1. R

The nlme package fits nonlinear mixed effects models with the linearization method proposed by Lindstrom and Bates (1990) and described in Section 4.3.2, but allowing for nested random effects (we will see this characteristic in the example of Section 4.6). A normal distribution is assumed for both components of variation, and the intra-individual errors are allowed to be correlated and to have unequal variances (that is, a complete specification as in (4.2)). The parameters of the model can be linear functions of fixed and random effects, so the function $d$ given in (4.3) is defined by

$$\theta_i = A_i \beta + B_i b_i,$$

with $A_i$ and $B_i$ design matrices of dimensions $p \times r$ and $p \times k$ respectively. Although this specification is more restrictive than (4.3), it offers a considerable range of possibilities for modelling inter-individual variation. For instance, as shown in Section 4.1.4, we can specify different fixed effects among groups of individuals by defining different design matrices $A_i$ for each group; in this way, different treatment groups may be compared. Analogously, random effects with different normal distributions may be assigned to different groups. Dependence of the fixed and random effects on covariates may also be included.
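The design-matrix specification $\theta_i = A_i \beta + B_i b_i$ can be made concrete with a small numeric sketch (Python for illustration; all matrices and numbers below are hypothetical): two treatment groups share $k_{el}$ and $V_D$ but get separate $k_a$ fixed effects, so $r = 4$ while $p = k = 3$.

```python
import numpy as np

# theta_i = A_i @ beta + B_i @ b_i for p = 3 parameters (ka, kel, Vd).
beta = np.array([5.0, 6.0, 0.25, 300.0])    # [ka group 1, ka group 2, kel, Vd]

A1 = np.array([[1.0, 0.0, 0.0, 0.0],        # subject in treatment group 1
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
A2 = np.array([[0.0, 1.0, 0.0, 0.0],        # subject in treatment group 2
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
B = np.eye(3)                               # every parameter gets a random effect

b_i = np.array([0.3, -0.02, 15.0])          # hypothetical random effects for one subject
theta1 = A1 @ beta + B @ b_i                # -> [5.3, 0.23, 315.0]
theta2 = A2 @ beta + B @ b_i                # -> [6.3, 0.23, 315.0]
```

Only the first row of $A_i$ changes between groups, so the two groups differ in their population $k_a$ while sharing the other fixed effects and the random-effects structure.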

4.4.2. SAS

The procedure NLMIXED fits nonlinear mixed effects models. We can specify different distributions for $e_i|\theta_i$; the available options are normal, binomial, gamma, negative binomial, Poisson, or a general distribution defined with SAS programming statements. For the random effects $b_i$, only the normal distribution is available. PROC NLMIXED fits the models by numerically maximizing an approximation to the marginal likelihood, that is, the likelihood integrated over the random effects. The principal methods for approximating this integral are adaptive Gaussian quadrature and a first-order Taylor series approximation around zero. This procedure does not offer the possibility of modelling correlation structures or unequal variances for the intra-individual errors.

Another option in SAS for fitting nonlinear mixed effects models is the %NLINMIX macro. This macro works by linearization of the nonlinear mixed effects model as described in Section 4.3. Both the first-order linearization and the conditional first-order linearization treated in Sections 4.3.1 and 4.3.2 are available. The methods implemented in this macro are more similar to the method of R than to the methods implemented in the NLMIXED procedure, and consequently, the results obtained with %NLINMIX are also closer to the R results.

4.5. Example 1: One-Compartment Model with Extravascular Administration

4.5.1. Data and Nonlinear Model

In this example we analyze data from an open one-compartment model with intranasal administration of

drug gathered from 4 individuals. This example is a continuation of the one treated in Section 3.5, and the

goal now is to estimate the population parameters. The subject analyzed in Section 3.5 is also included in


the current data (subject 1). The data are presented in Table 4.1, and they are stored in the ex2 grouped

data object for the R computations.

TABLE 4.1 Concentration-time data for 4 subjects after intranasal administration of 5 mg of drug.

                       Concentration (mg/l)
Time (hrs)   Subject 1   Subject 2   Subject 3   Subject 4
0.00         0.0000      0.0000      0.0006      0.0007
0.25         0.0081      0.0128      0.0221      0.0127
0.50         0.0092      0.0152      0.0213      0.0108
0.75         0.0098      0.0164      0.0216      0.0127
1.00         0.0089      0.0180      0.0199      0.0121
2.00         0.0072      0.0147      0.0129      0.0152
4.00         0.0043      0.0055      0.0066      0.0072
6.00         0.0027      -           0.0036      0.0051
8.00         -           0.0034      0.0024      -

In the same way as in Section 3.5, for each individual we have the following pharmacokinetic model

$$C(t) = \frac{D \cdot f \cdot k_a}{V_D (k_a - k_{el})} \left( e^{-k_{el} t} - e^{-k_a t} \right).$$

Considering $f = 1$ and $D = 5$ mg, we express the model for individual $i$, following the standard nonlinear regression notation, by

$$f(x_i, \theta_i) = \frac{5\,\theta_{1i}}{\theta_{3i}(\theta_{1i} - \theta_{2i})} \left( e^{-\theta_{2i} x_i} - e^{-\theta_{1i} x_i} \right),$$

with $\theta_{1i} = k_a$, $\theta_{2i} = k_{el}$, and $\theta_{3i} = V_D$ for individual $i$. In Figure 4.1 we present the concentration-time data for the four individuals.
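For readers who want to evaluate this curve numerically, here is a small sketch (Python purely for illustration; the analyses in this chapter use R). It takes the subject-1 individual estimates reported later in Table 4.2 ($k_a \approx 5.52$, $k_{el} \approx 0.244$, $V_D \approx 451$) as example values, together with the standard peak-time result $t_{max} = \ln(k_a/k_{el})/(k_a - k_{el})$ obtained by setting $dC/dt = 0$.

```python
import numpy as np

# One-compartment extravascular curve with f = 1 and D = 5 mg.
def conc(t, ka, kel, vd, dose=5.0, f=1.0):
    return dose * f * ka / (vd * (ka - kel)) * (np.exp(-kel * t) - np.exp(-ka * t))

ka, kel, vd = 5.52, 0.244, 451.0            # subject-1 estimates (example values)
t_max = np.log(ka / kel) / (ka - kel)       # time of peak concentration, ~0.59 h
c_max = conc(t_max, ka, kel, vd)            # ~0.0096 mg/l
```

The computed peak near 0.6 h with concentration just under 0.01 mg/l is consistent with the subject-1 rows of Table 4.1.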

> plot(ex2)

FIGURE 4.1 Concentration of drug over time after extravascular administration for four subjects


4.5.2. Analysis of the Population Pharmacokinetic Model

Here we consider the simplest setting, that is, that the intra-individual random errors are independent and normally distributed with constant variance, and that the random effects are normally distributed. Hence, the hierarchical nonlinear model is defined by

$$y_i = f_i(\theta_i) + e_i,$$

with the assumptions

$$e_i | \theta_i \sim N(0, \sigma^2 I_{n_i}), \qquad \theta_i = \beta + b_i, \qquad b_i \sim N(0, D).$$

We start by fitting an individual model for each subject using the nlsList function; the results are

stored in the oneEV.lis object and are shown in Table 4.2. The initial values for the parameters

required for the estimation process are, as in Section 3.5, θ1 = 5, θ2 = 0.2, and θ3 = 500.

TABLE 4.2 Individual analysis using the nlsList function.

> oneEV.lis<-nlsList(Conc~5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),

+ data=ex2, start=c(Ka=5, Kel=0.2, Vd=500)); oneEV.lis

Call:

Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time)) |

Subject

Coefficients:

Ka Kel Vd

1 5.521096 0.2438301 451.3271

2 3.046351 0.3135334 220.1265

3 8.984343 0.3416069 192.8556

4 8.242108 0.1410363 353.0241

Degrees of freedom: 33 total; 21 residual

Residual standard error: 0.001414658

> fixed.effects(oneEV.lis)

Ka Kel Vd

6.4484745 0.2600017 304.3333496

The estimated parameters for subject 1 are equal to the ones obtained in Section 3.5 and presented in Table 3.2. The function fixed.effects extracts the fixed effects estimates, shown at the end of Table 4.2; in this case they are just the arithmetic means of the individual estimates10.

Pinheiro and Bates (2000) recommend using a diagonal matrix D when the number of random effects is large relative to the number of individuals. The reason is that a general positive definite structure for D would introduce too many parameters into the model, which can cause convergence problems due to overparameterization. Here we follow their recommendation, and in Table 4.3 we present the results of the analysis considering a diagonal matrix D, that is, assuming that the random effects are uncorrelated. As we can see at the top of Table 4.3, the nlme function is applied directly to the oneEV.lis object, which contains the results of the first fitting. In this way, the fixed effects estimates obtained with the nlsList function and presented in Table 4.2 are used as initial values. The results obtained with the nlme function are stored in the oneEV1.nlme object.

In this analysis we consider that all the individual parameters, ka, kel, and VD, contain a random effect. However, it may sometimes be the case that some of the parameters can be considered fixed among subjects. We will use an ANOVA procedure to decide whether the ka parameter can be considered fixed, though in practice additional non-statistical criteria must be considered. To do this, we fit a second model, oneEV2.nlme, where just kel and VD have random effects, and then we compare these two models using a likelihood-ratio test. We present the fitted model in Table 4.4 and the ANOVA results

10 This corresponds to the STS approach treated in Section 4.2.2.


in Table 4.5. Comparing model 1 and model 2, we see that the log-likelihood values do not differ very much and that the AIC and BIC values obtained with model 2 are lower. These results support the idea that model 2 is more suitable. Indeed, the high p-value suggests that the more complicated model 1 does not fit the data significantly better than model 2, so we will consider that the parameter ka has only a fixed effect. In addition, we performed the same test for the kel and VD parameters, but in both cases the p-values were significant (0.0182 and <0.0001 respectively).

TABLE 4.3 Estimated hierarchical nonlinear model under the assumption of uncorrelated

random effects.

> oneEV1.nlme<-nlme(oneEV.lis, random = pdDiag(Ka+Kel+Vd~1)); oneEV1.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))

Log-likelihood: 157.8977

Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)

Ka Kel Vd

5.3135642 0.2657170 294.2156330

Random effects:

Formula: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)

Level: Subject

Structure: Diagonal

Ka Kel Vd Residual

StdDev: 1.605036 0.0667344 90.40686 0.001403244

Number of Observations: 33

Number of Groups: 4

TABLE 4.4 Estimated hierarchical nonlinear model under the assumption of uncorrelated

random effects. Parameters kel and VD are considered as random and ka as fixed.

> oneEV2.nlme<-update(oneEV1.nlme, random = pdDiag(Kel+Vd~1)); oneEV2.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))

Log-likelihood: 157.2821

Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)

Ka Kel Vd

6.2177185 0.2502124 305.4877720

Random effects:

Formula: list(Kel ~ 1, Vd ~ 1)

Level: Subject

Structure: Diagonal

Kel Vd Residual

StdDev: 0.07140224 93.36056 0.001514504

Number of Observations: 33

Number of Groups: 4

TABLE 4.5 ANOVA procedure to test if the random effect for the ka parameter can be removed.

> anova(oneEV1.nlme, oneEV2.nlme)

Model df AIC BIC logLik Test L.Ratio p-value

oneEV1.nlme 1 7 -301.7953 -291.3198 157.8977

oneEV2.nlme 2 6 -302.5641 -293.5851 157.2821 1 vs 2 1.231189 0.2672


In Table 4.6 we explore a possible correlation between kel and VD. To do this, we update the model stored in oneEV2.nlme with a general covariance structure. In Table 4.7 we compare both models with an ANOVA procedure. The high negative correlation obtained between these two random effects (-0.704) suggests that model 3 could be appropriate, but the ANOVA results indicate that this more complicated model does not fit the data significantly better. Therefore, we choose model 2, stored in the oneEV2.nlme object, as the more appropriate model to describe these data.

TABLE 4.6 Estimated hierarchical nonlinear model under the assumption of a general

covariance structure. Parameters kel and VD are considered as random and ka as fixed.

> oneEV3.nlme<-update(oneEV2.nlme, random = Kel+Vd~1); oneEV3.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))

Log-likelihood: 158.1040

Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)

Ka Kel Vd

6.188203 0.243500 307.955015

Random effects:

Formula: list(Kel ~ 1, Vd ~ 1)

Level: Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

Kel 0.074010013 Kel

Vd 98.107180378 -0.704

Residual 0.001513284

Number of Observations: 33

Number of Groups: 4

TABLE 4.7 ANOVA procedure to test if a correlation between the random effects of kel and VD

must be included in the model.

> anova(oneEV2.nlme, oneEV3.nlme)

Model df AIC BIC logLik Test L.Ratio p-value

oneEV2.nlme 1 6 -302.5641 -293.5851 157.2821

oneEV3.nlme 2 7 -302.2080 -291.7325 158.1040 1 vs 2 1.643892 0.1998

4.5.3. Heterogeneity of Variances

In the analysis so far we have assumed that the random errors have constant variance. However, with concentration-time data, the variance of the random errors at measurement time t typically depends on the concentration at that time. We can see this relation in Figure 4.2, where the random errors are more dispersed for larger fitted values.

We can use the varPower function of R to model the variance of the random errors as a power function of the concentrations. Following the general specification in (4.2),

$$\operatorname{Cov}(e_i | \theta_i) = R_i(\theta_i, \xi),$$

we will assume that the covariance matrices $R_i$ are diagonal matrices with elements given by

$$\operatorname{Var}(e_{ij}) = \sigma^2 y_{ij}^{2\delta},$$


so the standard deviation of the jth random error for individual i is proportional to some power $\delta$ of the corresponding jth concentration11. Therefore, in this case the vector of covariance parameters $\xi$ contains the scale parameter $\sigma$ and the power parameter $\delta$. The varPower function in R fits the best value for $\delta$, which, as shown in Table 4.8, turns out to be 0.6063. In Figure 4.3 we present the plot of residuals for this new model, which looks much better than the one shown in Figure 4.2.
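The effect of the power variance function can be seen in a quick simulation (Python for illustration, with synthetic data; $\sigma$ and $\delta$ are set near the Table 4.8 estimates): errors generated with standard deviation proportional to $\mu^{\delta}$ fan out as the mean grows, and dividing by $\sigma \mu^{\delta}$ standardizes them back to unit variance, which is what the improved residual plot reflects.

```python
import numpy as np

# Simulate heteroscedastic errors under Var(e_ij) = sigma^2 * mu^(2*delta).
rng = np.random.default_rng(1)
sigma, delta = 0.027, 0.61
mu = np.linspace(0.002, 0.020, 2000)        # synthetic "fitted" concentrations
e = rng.normal(0.0, sigma * mu**delta)      # sd proportional to mu^delta
std_resid = e / (sigma * mu**delta)         # standardizing removes the fan shape
```

Plotting `e` against `mu` would reproduce the fan pattern of Figure 4.2, while `std_resid` against `mu` shows a roughly constant band as in Figure 4.3.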

> plot(oneEV2.nlme)

FIGURE 4.2 Standardized residuals versus fitted values for model oneEV2.nlme

TABLE 4.8 Estimated hierarchical nonlinear model considering heterogeneity of variances for

the intra-individual random errors.

> oneEV4.nlme<- update(oneEV2.nlme, weights = varPower(0,form=~Conc+0.00001))

> oneEV4.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))

Log-likelihood: 168.5022

Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)

Ka Kel Vd

5.9268517 0.2555056 306.0218075

Random effects:

Formula: list(Kel ~ 1, Vd ~ 1)

Level: Subject

Structure: Diagonal

Kel Vd Residual

StdDev: 0.05066358 89.05437 0.02703865

Variance function:

Structure: Power of variance covariate

Formula: ~Conc + 1e-05

Parameter estimates:

power

0.6062694

Number of Observations: 33

Number of Groups: 4

11 As we can see in the first line of Table 4.8, a small quantity (0.00001) is added to each concentration in the weight

function to avoid computational problems.


TABLE 4.8 Continuation.

> ranef(oneEV4.nlme)

Kel Vd

1 0.001624147 125.86434

2 -0.007037481 -39.39552

3 0.060773043 -108.27353

4 -0.055359709 21.80472

> plot(oneEV4.nlme)

FIGURE 4.3 Standardized residuals versus fitted values for model oneEV4.nlme

Finally, we compare models oneEV2.nlme and oneEV4.nlme with an ANOVA procedure. The results in Table 4.9 confirm our conclusions from the residual plots: allowing for heterogeneity of variances considerably improves the fit of the model.

TABLE 4.9 ANOVA procedure to compare the models with constant and nonconstant variances

> anova(oneEV2.nlme, oneEV4.nlme)

Model df AIC BIC logLik Test L.Ratio p-value

oneEV2.nlme 1 6 -302.5641 -293.5851 157.2821

oneEV4.nlme 2 7 -323.0044 -312.5289 168.5022 1 vs 2 22.44028 <.0001

Summarizing, the final proposed model is

$$y_{ij} = \frac{5\,\beta_1}{(\beta_3 + b_{3i})\left(\beta_1 - (\beta_2 + b_{2i})\right)} \left( e^{-(\beta_2 + b_{2i}) x_{ij}} - e^{-\beta_1 x_{ij}} \right) + e_{ij},$$

with $\beta_1$, $\beta_2$, and $\beta_3$ the fixed effects (population parameters), and $b_{2i}$ and $b_{3i}$ the random effects for individual $i$. The individual parameters are given by $\beta_1$, $\beta_2 + b_{2i}$, and $\beta_3 + b_{3i}$. For the random effects we have that

$$b_i = \begin{pmatrix} b_{2i} \\ b_{3i} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} d_{22} & 0 \\ 0 & d_{33} \end{pmatrix} \right).$$


The estimated fixed and random effects are (from Table 4.8) $\hat{\beta}_1 = 5.9269$, $\hat{\beta}_2 = 0.2555$, $\hat{\beta}_3 = 306.0218$, $\hat{b}_{21} = 0.0016$, $\hat{b}_{22} = -0.0070$, $\hat{b}_{23} = 0.0608$, $\hat{b}_{24} = -0.0554$, $\hat{b}_{31} = 125.86$, $\hat{b}_{32} = -39.40$, $\hat{b}_{33} = -108.27$, and $\hat{b}_{34} = 21.80$. For the random errors $e_{ij}$ we assume that they are independent and normally distributed with mean 0 and variance given by

$$\operatorname{Var}(e_{ij}) = \sigma^2 y_{ij}^{2\delta},$$

where $\hat{\sigma} = 0.02704$ and $\hat{\delta} = 0.6063$. In Figure 4.4 we present the population and individual fitted curves for this last model.
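As a sanity check, plugging the estimated effects into the final model reproduces the observed concentrations closely; for instance, for subject 1 at t = 0.75 h (Python used only for this arithmetic check):

```python
import numpy as np

# Individual parameters for subject 1: ka = beta1 (fixed), kel = beta2 + b_21,
# Vd = beta3 + b_31, using the Table 4.8 estimates.
beta1, beta2, beta3 = 5.9269, 0.2555, 306.0218
b21, b31 = 0.0016, 125.86

ka, kel, vd = beta1, beta2 + b21, beta3 + b31
t = 0.75
pred = 5 * ka / (vd * (ka - kel)) * (np.exp(-kel * t) - np.exp(-ka * t))
# pred is about 0.0098 mg/l, matching the observed value in Table 4.1
```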

> plot(augPred(oneEV4.nlme, level=0:1))

FIGURE 4.4 Population and individual fitted curves for the final model

4.6. Example 2: Comparison of Two Treatments in the One-Compartment Model

with Extravascular Administration

4.6.1. Data and Nonlinear Model

In this example we analyze data from an open one-compartment model with oral administration of drug, gathered from 6 subjects. The main goal is to compare two different formulations (treatments), labeled E and PO, in terms of their effects on the pharmacokinetic parameters, especially on the absorption rate constant. Each treatment was applied twice to each subject, in randomized order and with a sufficiently long time between applications to completely wash out the previous dose before the administration of the next one. Hence we have four repetitions of the experiment per subject (two with each treatment), which gives a total of 24 individual data sets. To avoid confusion we must keep in mind


that "subjects" and "individuals" are not the same; we will call each of the 24 subject-treatment-repetition combinations an "individual", and we will refer to them in the R code as "Unit" (we will use "Subject" and "Treat" for the other two classification criteria). We show these data in Table 4.10, and they are stored, for the R computations, in the grouped data object ex3.

TABLE 4.10 Concentration-time data for 6 subjects after oral administration of 1.5 g of drug, with two treatments and two repetitions per treatment.

                           Concentration (mg/l)
Time (min)  Treat  Rep   Sub 1   Sub 2   Sub 3   Sub 4   Sub 5   Sub 6

5 E 1 2.374 0.000 0.000 0.000 0.000 0.000

10 E 1 7.711 1.860 4.672 2.177 0.000 0.000

15 E 1 16.374 12.277 10.115 9.918 0.000 2.026

20 E 1 21.817 13.623 16.374 13.940 3.735 4.022

30 E 1 16.269 13.003 19.202 16.102 5.519 7.363

40 E 1 14.711 14.862 25.884 14.454 10.327 8.089

50 E 1 14.817 16.102 20.774 12.489 10.841 7.771

60 E 1 14.182 15.785 20.562 12.186 12.413 6.320

75 E 1 11.052 13.517 15.966 15.074 10.735 6.426

90 E 1 11.990 12.277 14.500 15.376 9.903 7.469

105 E 1 9.586 11.355 13.668 13.214 9.797 8.301

120 E 1 8.542 9.193 12.307 12.700 9.586 8.920

150 E 1 7.076 8.361 10.417 10.735 9.903 7.666

180 E 1 6.138 6.925 9.072 8.875 8.542 6.214

240 E 1 4.884 4.657 6.033 6.501 5.413 4.642

5 E 2 2.132 0.000 0.000 0.000 0.000 0.000

10 E 2 5.277 1.512 0.771 2.797 0.423 0.000

15 E 2 12.171 4.748 1.194 6.290 1.149 6.154

20 E 2 18.219 7.469 7.257 10.826 8.164 7.363

30 E 2 21.772 13.320 14.560 14.530 13.320 9.676

40 E 2 18.854 14.363 24.070 13.683 18.370 12.489

50 E 2 11.430 12.685 21.046 12.171 15.694 12.398

60 E 2 12.368 12.791 16.238 15.195 14.243 12.700

75 E 2 10.599 12.065 16.556 14.469 13.003 15.014

90 E 2 9.767 9.344 14.772 12.262 11.355 17.024

105 E 2 9.344 8.618 13.623 11.324 11.566 12.700

120 E 2 7.877 8.406 11.959 10.493 10.946 9.873

150 E 2 7.666 8.195 9.344 9.026 9.298 7.862

180 E 2 5.685 5.579 7.257 8.089 8.467 7.167

240 E 2 3.281 3.598 5.065 5.897 5.685 4.944

5 PO 1 0.000 0.000 0.000 0.000 0.000 0.000

10 PO 1 11.869 17.115 3.523 0.000 0.257 0.922

15 PO 1 19.292 12.096 9.389 0.015 5.231 7.393

20 PO 1 25.688 16.072 5.987 3.432 16.329 10.009

30 PO 1 17.342 15.119 22.906 14.711 16.435 12.897

40 PO 1 16.919 13.562 18.884 13.562 11.461 15.271

50 PO 1 14.137 15.119 13.623 11.884 13.471 14.862

60 PO 1 12.594 13.139 12.383 10.629 13.789 13.003

75 PO 1 11.355 11.990 14.757 9.692 13.048 10.432

90 PO 1 11.385 11.264 14.545 11.778 11.672 9.389

105 PO 1 9.329 10.417 14.757 12.096 11.355 9.298

120 PO 1 8.301 8.860 13.940 11.158 10.085 8.164

150 PO 1 7.061 8.119 10.946 9.797 9.041 6.925

180 PO 1 5.715 5.624 8.467 8.225 8.830 6.411

240 PO 1 4.173 3.946 5.791 5.413 6.607 4.657

5 PO 2 6.758 0.000 0.000 0.000 0.000 0.000

10 PO 2 14.212 8.195 4.702 0.000 0.922 0.000


TABLE 4.10 Continuation.

                           Concentration (mg/l)
Time (min)  Treat  Rep   Sub 1   Sub 2   Sub 3   Sub 4   Sub 5   Sub 6

15 PO 2 19.141 15.800 18.446 1.618 1.724 5.095

20 PO 2 21.560 19.081 18.234 6.653 9.571 10.251

30 PO 2 22.966 12.201 18.446 10.478 16.722 11.294

40 PO 2 17.130 12.836 16.435 9.178 15.618 11.083

50 PO 2 14.212 13.683 15.271 12.594 12.700 11.294

60 PO 2 11.491 12.625 16.752 10.584 12.398 11.702

75 PO 2 9.178 10.402 14.106 9.178 11.491 11.597

90 PO 2 7.756 9.147 12.731 10.886 10.689 10.145

105 PO 2 7.454 9.253 12.096 10.085 10.085 9.435

120 PO 2 6.154 7.666 11.249 10.175 7.454 9.223

150 PO 2 4.944 6.607 9.873 9.072 8.966 8.089

180 PO 2 3.538 6.078 9.253 8.059 6.048 7.061

240 PO 2 2.026 3.115 5.655 5.957 4.748 4.475

In Figure 4.5 we present plots of the concentration-time data for each subject and treatment combination. We observe a similar shape for both curves in each subject-treatment combination, with the exception of subject 6 with treatment E. Another strange characteristic of these data is the presence of two peaks in some curves (it is clear for subject 4). These peaks may be related to some factor not considered in the experiment, so attention must be paid to this strange effect in future experimentation.

In Table 4.10 we see that in most cases a zero concentration is recorded at 5 minutes after the administration of the drug, and sometimes even at 10 and 15 minutes. Hence, in this example we will include a lag-time in the model, denoted by $t_{lag}$. With the addition of this component the model is defined by

$$C(t) = \frac{D \cdot f \cdot k_a}{V_D (k_a - k_{el})} \left( e^{-k_{el}(t - t_{lag})} - e^{-k_a (t - t_{lag})} \right).$$

In this example the dose is 1.5 g, and the fraction of drug which is absorbed is assumed to be 0.85, so $D \cdot f = 1275$ mg. The pharmacokinetic parameters $k_a$, $k_{el}$, and $V_D$ are positive quantities, and sometimes, in order to ensure positiveness of the estimates, the model is parameterized in terms of the logarithms of the parameters. We will do that in this example, so the model is finally defined by

$$C(t) = \frac{1275\, e^{k_a^{*}}}{e^{V_D^{*}} \left( e^{k_a^{*}} - e^{k_{el}^{*}} \right)} \left( \exp\!\left( -e^{k_{el}^{*}} (t - t_{lag}) \right) - \exp\!\left( -e^{k_a^{*}} (t - t_{lag}) \right) \right),$$

with $k_a^{*} = \ln k_a$, $k_{el}^{*} = \ln k_{el}$, and $V_D^{*} = \ln V_D$. In the standard nonlinear notation we have for each individual the model

$$f(x, \theta_i) = \frac{1275\, e^{\theta_1}}{e^{\theta_3} \left( e^{\theta_1} - e^{\theta_2} \right)} \left( \exp\!\left( -e^{\theta_2} (x - t_{lag}) \right) - \exp\!\left( -e^{\theta_1} (x - t_{lag}) \right) \right), \qquad (4.10)$$

with $\theta_1 = k_a^{*} = \ln k_a$, $\theta_2 = k_{el}^{*} = \ln k_{el}$, and $\theta_3 = V_D^{*} = \ln V_D$. The inclusion of the lag-time as a parameter in

the hierarchical nonlinear model brings about computational problems. To avoid these problems, we estimate a different lag-time for each of the 24 individual models by fitting equation (4.10) using nonlinear regression, with the restriction that the lag-time should not be greater than the lowest time with positive concentration in the data. We fit different lag-times for each individual model because the lag-time can depend on the subjects and the treatments, and because the first appearance of drug differs between the two repetitions of the treatments for some subjects. The estimated lag-times range from 3 to 15 minutes. Then, we use the estimated individual lag-times as fixed values in the population analysis, so in the population stage we consider for each individual the model


$$f(x, \theta_i) = \frac{1275\, e^{\theta_1}}{e^{\theta_3} \left( e^{\theta_1} - e^{\theta_2} \right)} \left( \exp\!\left( -e^{\theta_2} x^{*} \right) - \exp\!\left( -e^{\theta_1} x^{*} \right) \right),$$

where $x^{*} = x - t_{lag}$. For the computations presented in the next section, we store the data from Table 4.10, with the lag-time adjustment applied and the zero-concentration data points removed, in the grouped data object ex3clean.
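A quick numeric look at the log parameterization helps fix the scales involved (Python for illustration only; the 5-minute lag below is a made-up example value within the 3 to 15 minute range reported above). Back-transforming the starting values used in the next section ($\theta_1 = -2.3$, $\theta_2 = -4.6$, $\theta_3 = 4.1$) shows the implied pharmacokinetic parameters, which are positive for any real $\theta$.

```python
import numpy as np

theta = np.array([-2.3, -4.6, 4.1])     # [lnKa, lnKel, lnVd] starting values
ka, kel, vd = np.exp(theta)             # ~0.100 1/min, ~0.0101 1/min, ~60.3 l

def conc(t, t_lag=5.0):                 # t_lag is a hypothetical example value
    x = t - t_lag                       # lag-adjusted time x* = x - t_lag
    return 1275 * ka / (vd * (ka - kel)) * (np.exp(-kel * x) - np.exp(-ka * x))
```

Evaluating `conc(30.0)` gives a concentration in the 10–20 mg/l range, roughly the scale of the observations in Table 4.10, which is why these starting values are reasonable.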

> plot(ex3, outer=~sub_treat)

FIGURE 4.5 Concentration of drug over time after extravascular administration for six subjects

There are two important features in this example: the inclusion of the treatment factor, and the two levels of random effects, given by subjects and the repetitions within subjects. These two features make the modelling considerably more difficult.

4.6.2. Analysis of the Population Pharmacokinetic Model

In this example we have to include in the model the fixed effect of treatments and two levels of random effects, given by subjects and the repetitions within subjects. Hence, we define the model by

$$y_{ij} = f_{ij}(\theta_{ij}) + e_{ij}, \qquad \theta_{ij} = \beta^{(h)} + b_i + b_{ij},$$
$$e_{ij} | \theta_{ij} \sim N(0, \sigma^2 I_{n_{ij}}), \qquad b_i \sim N(0, D_1), \qquad b_{ij} \sim N(0, D_2), \qquad (4.11)$$
$$h = 1, 2; \quad i = 1, \ldots, 6; \quad j = 1, \ldots, 4,$$


where $h$, $i$, and $j$ represent the $h$th treatment, $i$th subject, and $j$th repetition within the $i$th subject; $\beta^{(h)}$ is the vector of fixed effects (depending on treatment $h$), $b_i$ is the vector of random effects for subject $i$, and $b_{ij}$ is the vector of random effects for repetition $j$ within subject $i$. When repetition $j$ within subject $i$ corresponds to treatment E, $h$ equals 1, and when it corresponds to treatment PO, $h$ equals 2. The random effects $b_i$ and $b_{ij}$ are assumed to be independent and normally distributed.
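The two-level structure of (4.11) can be illustrated with a scalar simulation (Python for illustration; all numbers are made up): the subject-level and repetition-level variance components add in the marginal variance of the individual parameter.

```python
import numpy as np

# theta_hij = beta_h + b_i + b_ij with independent subject-level (d1) and
# repetition-level (d2) components; many subjects to stabilize the check.
rng = np.random.default_rng(0)
beta = {1: -2.3, 2: -2.0}                 # hypothetical fixed effect per treatment
d1, d2 = 0.09, 0.04                       # Var(b_i) and Var(b_ij), made up

n_subj, n_rep = 2000, 4
b_i = rng.normal(0.0, np.sqrt(d1), size=(n_subj, 1))     # shared within a subject
b_ij = rng.normal(0.0, np.sqrt(d2), size=(n_subj, n_rep))  # one per repetition
theta = beta[1] + b_i + b_ij              # all repetitions under treatment E

var_theta = theta.var()                   # close to d1 + d2 = 0.13
```

Repetitions within one subject share $b_i$, so they are correlated (covariance $d_1$), while the marginal variance is $d_1 + d_2$; this is exactly the nesting that the nlme function exploits in the fits below.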

We start the analysis by fitting individual nonlinear regression models using the nlsList function; the initial values for the parameters required for the estimation process are θ1 = -2.3, θ2 = -4.6, and θ3 = 4.1. The results of the nlsList function are stored in the onelag1.lis object, and in Figure 4.6 we show the confidence intervals for the individual pharmacokinetic parameters.

> onelag1.lis<-nlsList(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*

+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),

+ data=ex3clean, start=c(lnKa=-2.3, lnKel=-4.6, lnVd=4.1))

> plot(intervals(onelag1.lis))

FIGURE 4.6 95% confidence intervals for the individual pharmacokinetic parameters (lnKa, lnKel, and lnVd) for the 24 individual models

The first four intervals (numbered from bottom to top as 1, 2, 3, and 4) correspond to subject 1, the next four to subject 2, and so on. Within each subject, the first two intervals correspond to treatment E and the other two to treatment PO. The individual model for unit 7 cannot be estimated (it corresponds to one repetition within subject 2 under treatment PO). However, there are no computational problems when we include this unit in the population analysis using the nlme function. We can see in this graph that, for each subject, the constant of absorption tends to be greater with treatment PO.

We start the population analysis by fitting a hierarchical nonlinear model with all the fixed and random effects defined in (4.11). After that, we test the suitability of the fixed effects using likelihood-ratio tests and the AIC value. The results of this first fit are stored in the onelag1.nlme object and are shown in Table 4.11.

TABLE 4.11 Estimated hierarchical nonlinear model. All the fixed and random effects are

included.

> onelag1.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*

+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),

+ fixed = lnKa+lnKel+lnVd~Treat, random = lnKa+lnKel+lnVd~1|Subject/Unit,

+ data=ex3clean, start=c(-2.3,0,-4.6,0,4.1,0))

> onelag1.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 1275 * exp(lnKa)/(exp(lnVd) * (exp(lnKa) - exp(lnKel))) *

(exp(-exp(lnKel) * t) - exp(-exp(lnKa) * t))

Log-likelihood: -712.0107

Fixed: lnKa + lnKel + lnVd ~ Treat

lnKa.(Intercept) lnKa.TreatPO lnKel.(Intercept) lnKel.TreatPO

-2.49726501 0.78591411 -5.16345068 -0.01963519

lnVd.(Intercept) lnVd.TreatPO

4.23964597 0.05720349

Random effects:

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)

Level: Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

lnKa.(Intercept) 0.2692887 lnK.(In) lnKl.(I)

lnKel.(Intercept) 0.3023348 0.876

lnVd.(Intercept) 0.1814035 -0.503 -0.856

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)

Level: Unit %in% Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

lnKa.(Intercept) 0.4603061 lnK.(In) lnKl.(I)

lnKel.(Intercept) 0.2654344 -0.445

lnVd.(Intercept) 0.1645496 0.217 -0.970

Residual 1.7502720

Number of Observations: 332

Number of Groups:

Subject Unit %in% Subject

6 24

> anova(onelag1.nlme)

numDF denDF F-value p-value

lnKa.(Intercept) 1 303 99.2063 <.0001

lnKa.Treat 1 303 24.8467 <.0001

lnKel.(Intercept) 1 303 259.2543 <.0001

lnKel.Treat 1 303 1.4860 0.2238

lnVd.(Intercept) 1 303 2572.0818 <.0001

lnVd.Treat 1 303 0.5483 0.4596

The ANOVA results at the end of the table suggest that the effect of treatments on the volume of distribution and the constant of elimination could be negligible. Indeed, on pharmacokinetic grounds this must be the case, because both the constant of elimination and the volume of distribution are subject characteristics and therefore should not depend on the treatment. To confirm this, we fit a model without the effect of treatments on the volume of distribution, stored in the onelag2.nlme object, and a model without the effect of treatments on both the volume of distribution and the constant of elimination, stored in the onelag3.nlme object. Then we perform likelihood-ratio tests to compare model onelag1.nlme with model onelag2.nlme, and model onelag2.nlme with model onelag3.nlme. The results of these comparisons are shown in Table 4.12.

TABLE 4.12 ANOVA procedures to evaluate the effect of treatments on the volume of

distribution and constant of elimination.

> onelag2.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*

+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),

+ fixed = list(lnKa~Treat, lnKel~Treat, lnVd~1),

+ random = lnKa+lnKel+lnVd~1|Subject/Unit,

+ data=ex3clean, start=c(-2.3,0,-4.6,0,4.1))

> anova(onelag1.nlme,onelag2.nlme)

Model df AIC BIC logLik Test L.Ratio p-value

onelag1.nlme 1 19 1462.022 1534.319 -712.0107

onelag2.nlme 2 18 1460.571 1529.063 -712.2854 1 vs 2 0.5492719 0.4586

> onelag3.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*

+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),

+ fixed = list(lnKa~Treat, lnKel~1, lnVd~1),

+ random = lnKa+lnKel+lnVd~1|Subject/Unit,

+ data=ex3clean, start=c(-2.3,0,-4.6,4.1))

> anova(onelag2.nlme,onelag3.nlme)

Model df AIC BIC logLik Test L.Ratio p-value

onelag2.nlme 1 18 1460.571 1529.063 -712.2854

onelag3.nlme 2 17 1460.234 1524.921 -713.1168 1 vs 2 1.662786 0.1972

We see that including the effect of treatments on the volume of distribution and on the constant of elimination does not fit the data significantly better (p-values of 0.4586 and 0.1972). Indeed, as we go from the complete model in onelag1.nlme to the models in onelag2.nlme and onelag3.nlme, we get lower AIC values. Hence we conclude that the treatments only affect the constant of absorption, which, as mentioned before, is pharmacokinetically meaningful. This final model is shown in Table 4.13. Confidence intervals for the fixed and random effects parameters can be obtained with the command intervals. From this table we can get the estimates for the parameters of the model specified in (4.11), and they are

$$
\hat{\boldsymbol{\beta}}_1 = \begin{pmatrix} -2.549 \\ -5.170 \\ 4.268 \end{pmatrix}, \qquad
\hat{\boldsymbol{\beta}}_2 = \begin{pmatrix} -1.664 \\ -5.170 \\ 4.268 \end{pmatrix},
$$
$$
\hat{\mathbf{D}}_1 = \begin{pmatrix} 0.0707 & 0.0715 & -0.0248 \\ 0.0715 & 0.0913 & -0.0461 \\ -0.0248 & -0.0461 & 0.0322 \end{pmatrix}, \qquad
\hat{\mathbf{D}}_2 = \begin{pmatrix} 0.2157 & -0.0524 & 0.0134 \\ -0.0524 & 0.0637 & -0.0400 \\ 0.0134 & -0.0400 & 0.0273 \end{pmatrix},
$$
$$
\hat{\sigma}^2 = 3.073.
$$
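As a quick check, the subject-level covariance matrix D1 above can be recovered from the standard deviations and correlations that nlme reports in Table 4.13. A sketch in Python (not part of the original analysis; the numbers are copied from the table):

```python
import numpy as np

# Subject-level StdDev and Corr from Table 4.13 (order: lnKa, lnKel, lnVd)
sd = np.array([0.2658372, 0.3021092, 0.1795318])
corr = np.array([[1.00,  0.89, -0.52],
                 [0.89,  1.00, -0.85],
                 [-0.52, -0.85, 1.00]])

# Covariance matrix = correlation matrix scaled by the outer product of the sds
D1 = corr * np.outer(sd, sd)
print(np.round(D1, 4))   # matches the D_1 estimate reported above
```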

We can see in Table 4.13 that the estimated effect of treatment PO on $k_a^*$ is 0.8852 greater than the estimated effect of treatment E. This means that the estimated constant of absorption with treatment PO is approximately 2.42 times ($\exp(0.8852) = 2.42$) the estimated constant of absorption with treatment E.
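The back-transformation from the log scale is a plain exponential; for instance:

```python
import math

# Treatment effect on the log scale (lnKa.TreatPO in Table 4.13)
effect = 0.885244
print(round(math.exp(effect), 2))   # -> 2.42, the PO/E ratio of absorption constants
```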


TABLE 4.13 Final estimated hierarchical nonlinear model. Factor Treatments only affects the

constant of absorption.

> onelag3.nlme

Nonlinear mixed-effects model fit by maximum likelihood

Model: Conc ~ 1275 * exp(lnKa)/(exp(lnVd) * (exp(lnKa) - exp(lnKel))) *

(exp(-exp(lnKel) * t) - exp(-exp(lnKa) * t))

Log-likelihood: -713.1168

Fixed: list(lnKa ~ Treat, lnKel ~ 1, lnVd ~ 1)

lnKa.(Intercept) lnKa.TreatPO lnKel lnVd

-2.548838 0.885244 -5.170350 4.268024

Random effects:

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)

Level: Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

lnKa.(Intercept) 0.2658372 lK.(I) lnKel

lnKel 0.3021092 0.89

lnVd 0.1795318 -0.52 -0.85

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)

Level: Unit %in% Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

lnKa.(Intercept) 0.4644615 lK.(I) lnKel

lnKel 0.2524257 -0.447

lnVd 0.1652529 0.174 -0.958

Residual 1.7529901

Number of Observations: 332

Number of Groups:

Subject Unit %in% Subject

6 24

> intervals(onelag3.nlme)

Approximate 95% confidence intervals

Fixed effects:

lower est. upper

lnKa.(Intercept) -2.8941050 -2.548838 -2.203570

lnKa.TreatPO 0.5390877 0.885244 1.231400

lnKel -5.4483971 -5.170350 -4.892303

lnVd 4.1050106 4.268024 4.431038

attr(,"label")

[1] "Fixed effects:"

Random Effects:

Level: Subject

lower est. upper

sd(lnKa.(Intercept)) 0.09278093 0.2658372 0.7616806

sd(lnKel) 0.14602780 0.3021092 0.6250176

sd(lnVd) 0.08650154 0.1795318 0.3726137

cor(lnKa.(Intercept),lnKel) -0.91630921 0.8903210 0.9997060

cor(lnKa.(Intercept),lnVd) -0.96099913 -0.5199324 0.6675543

cor(lnKel,lnVd) -0.97758356 -0.8500074 -0.2658948

Level: Unit

lower est. upper

sd(lnKa.(Intercept)) 0.3083837 0.4644615 0.6995327

sd(lnKel) 0.1444497 0.2524257 0.4411134

sd(lnVd) 0.1054579 0.1652529 0.2589520

cor(lnKa.(Intercept),lnKel) -0.8007522 -0.4467078 0.1386471

cor(lnKa.(Intercept),lnVd) -0.3768424 0.1743702 0.6343818

cor(lnKel,lnVd) -0.9957065 -0.9578051 -0.6448654

Within-group standard error:

lower est. upper

1.608233 1.752990 1.910777


5. Sampling Strategies

The sampling design is an important issue in most statistical applications, and in population pharmacokinetics it has a special relevance due to some particular characteristics of the field. Firstly, since two sources of variation are present in the population model, that is, inter- and intra-individual variation, the sample size determination is a two-sided problem, each side with its own difficulties. For the intra-individual variation, the main problem is that in some situations it is not possible to obtain several samples per individual. With routine clinical data, for instance, only a few samples per individual are available in a group of several individuals, usually with just one or two samples for many of them. With experimental data, other kinds of considerations (e.g., popular beliefs, superstitions, and clinical considerations) tend to limit the number of samples per individual. For the inter-individual variation, limitations stem more from budget considerations, although ethical and clinical considerations can also be important. Hence, the trade-off between the number of individuals and the number of measures per individual is a central aspect here. Another important consideration in the sampling design of pharmacokinetic studies is the choice of measurement times. A limitation here is that it is sometimes quite difficult to control the measurement times strictly; this is the case with routine clinical data, where measures are taken at the time patients arrive.

There are several studies addressing these topics in the literature, and in this section we present some results. The problem of finding optimal sampling times has been approached with simulation studies (mostly in the pharmacokinetic literature) and with the theory of optimal designs (mostly in the statistical literature). In Section 5.1 we present some results from optimal design theory; the studies in this field focus mainly on the individual model analysis. In Section 5.2 we present a summary of some simulation studies, which are more focused on the population model analysis; these studies address problems such as the trade-off between the number of subjects and the number of measures per subject, and the most appropriate individual sampling times, including the effect of some randomness in the sampling times, which is typical of pharmacokinetic studies. Finally, in Section 5.3 we perform some simulations to evaluate the effect of subjects with just one or two measures on the population analysis of the one-compartment model with intra- and extravascular administration.

5.1. Optimal Designs

We start this section with definitions of D- and c-optimal designs, mainly based on the book of Atkinson and Donev (1996). To go into this subject, we must first define a design.

A continuous design is represented by a measure ξ over the design region. If the design has trials at k different points in the design region, we write

$$
\xi = \begin{Bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \\ w_1 & w_2 & \cdots & w_k \end{Bmatrix},
$$

where $\mathbf{x}_i$ are the design points (or support points) and $w_i$ the corresponding design weights. A design with n trials is exact if it consists of $n_i$ trials at location $\mathbf{x}_i$ with $\sum_{i=1}^{k} n_i = n$.


Given an optimum continuous design ξ*, if n trials are available, in practice we will perform the exact design ξn with $n_i$ the integer approximation to $w_i^* n$.
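The integer approximation can be carried out with a largest-remainder rule; a minimal Python sketch (the weights and the helper name `exact_design` are illustrative, not from the text):

```python
import numpy as np

def exact_design(w, n):
    """Round continuous design weights w to integers n_i with sum(n_i) = n.

    Start from floor(w * n) and assign the remaining trials to the
    support points with the largest fractional remainders."""
    raw = w * n
    ni = np.floor(raw).astype(int)
    for i in np.argsort(raw - ni)[::-1][: n - ni.sum()]:
        ni[i] += 1
    return ni

w = np.array([1/3, 1/3, 1/3])   # equal weights on three support points
print(exact_design(w, 12))      # -> [4 4 4]
```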

Consider a nonlinear regression model as defined in Section 3.1. A design is D-optimal if it maximizes the determinant of the information matrix $\mathbf{F}_{.}^{T}\mathbf{F}_{.}$. A D-optimal design is appropriate if our interest is in precise estimation of all the parameters of the model. If there is a particular interest in the estimation of some linear combination of the parameters, $\mathbf{c}^{T}\boldsymbol{\theta}$, we must use a c-optimal design. A c-optimal design minimizes the variance of the linear combination of interest, which is given by

$$
\mathrm{var}\big(\mathbf{c}^{T}\hat{\boldsymbol{\theta}}\big) = \sigma^{2}\,\mathbf{c}^{T}\big(\mathbf{F}_{.}^{T}\mathbf{F}_{.}\big)^{-1}\mathbf{c}.
$$
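Both criteria are easy to evaluate numerically once the derivative matrix F. has been computed at the design points; a Python sketch with a made-up three-point, two-parameter F. (the values are illustrative only):

```python
import numpy as np

# Hypothetical derivative matrix F.: one row per design point,
# one column per parameter
F = np.array([[1.0, 0.2],
              [0.8, 0.9],
              [0.1, 1.5]])
M = F.T @ F                                # information matrix (up to sigma^2)

d_criterion = np.linalg.det(M)             # D-optimality: maximize this
c = np.array([1.0, -1.0])                  # linear combination c^T theta of interest
c_criterion = c @ np.linalg.solve(M, c)    # c-optimality: minimize c^T M^-1 c
print(round(d_criterion, 3), round(c_criterion, 3))
```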

Because the model is nonlinear in the parameters, the matrix of derivatives F. (cf. (3.7)) depends on the parameter values, and hence so does the optimum design. If the interest is in a nonlinear function of the parameters c(θ) (as is the case for AUC, MRT, or t1/2), the nonlinear function must be expanded in a Taylor series as in (3.12), and the resulting c-optimal design will then depend on the parameters through both the information matrix and the derivatives of c(θ) with respect to θ. Therefore D- and c-optimal designs are only locally optimum. We can deal with the problem of the dependence on the unknown parameters with the following approaches:

1. Assume a prior value for the parameters.

2. Assume a prior distribution for the parameters (Bayesian approach).

3. Sequential designs.

In the sequential approach we assume a prior value or a prior distribution for the parameters, find the optimal design for this prior specification, carry out an experiment, and estimate the model. If the parameter estimates do not differ considerably from the prior specification, the process stops; otherwise, we repeat the process with the parameter estimates as the new initial approximations. The main limitations of this approach are the costs and time of experimentation. However, in a population study, parameter estimates from previous subjects could be used to compute optimal designs for the next subject (D’Argenio, 1981).

D-optimal designs for nonlinear models constructed with a prior value for the parameters have some

limitations. If the nonlinear model has p parameters, the D-optimal design usually has p different support

points with equal weights, and hence, there are no degrees of freedom to check the model. Indeed, if the

initial approximation for the parameters is far from the true values, the efficiency of the resulting optimal

design could be very low. For c-optimal designs, the number of support points can be even less than p,

and therefore not all the parameters can be estimated. Indeed, a c-optimal design for a specific parameter

may produce poor estimates for other parameters in the model. Assuming a prior distribution for the

parameters will produce more support points, which allows for model checking, and the number of

support points will be larger as the prior distribution becomes more dispersed. However, as the prior

distribution becomes more diffuse, the relative efficiency of the design will be lower. A similar approach

to the Bayesian is the Maximin approach where we specify a discrete set of possible parameter values

instead of a continuous prior distribution. In the Maximin approach we choose the design which

maximizes the minimum efficiency through the different parameter values. Maximin and Bayesian

optimal designs are more robust to misspecifications of the parameter values. Biedermann, Dette, and

Pepelyshev (2004) explore the Maximin approach in a two-exponential compartment model based on D-efficiencies; Dette, Haines, and Imhof (2005) analyze the relationship between the Maximin and Bayesian approaches for linear and nonlinear regression models.

Because of the dependence of the optimal designs on the parameter values, it is not possible to make general recommendations. However, there are some studies in the literature addressing this problem in the field of pharmacokinetics that give some insights. Atkinson et al. (1993) computed D-optimal and c-optimal designs for the area under the concentration curve (AUC), the maximum concentration (cmax), and the time to maximum concentration (tmax) in an open one-compartment model with extravascular administration and compared the efficiency of these designs with the typical geometric design12 with 18 measures.

12 In the geometric design readings are taken at approximately equal intervals in log-time. In this particular case, measures are taken at times, in hours, 0.166, 0.333, 0.5, 0.666, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8, 10, 12, 24, 30, and 48.

They also proposed a “c-omnibus” design where the sum of suitably scaled asymptotic

variances of the three parameters of interest is minimized. They worked with prior values for the

parameters and prior distributions (Bayesian approach). The D-optimal design with a prior value for the

parameters has three support points with equal weights (it takes measures at 0.23, 1.39, and 18.42 hours).

c-optimal designs rely on fewer than three points (2, 2, and 1 for AUC, tmax, and cmax, respectively), so their utility is dubious. Optimal designs based on a prior distribution for the parameters give more support

points. They show that with good knowledge of the parameter values, the D-optimal or c-omnibus designs are considerably better than the 18-point geometric design, but with great uncertainty in the parameter values the 18-point geometric design may be reasonable to use. Indeed, if the interest is in estimating several parameters, the 18-point geometric design is a good alternative. Mentre, Mallet,

and Baccar (1997) analyzed the open one-compartment model with intravascular administration in a

population setting and paid attention to the trade-off of number of subjects and number of measures per

subject under a D-optimality criterion. Although they did not carry out a simulation study, we present

their results in the next section because their analysis is closer to those presented there. Nevertheless, for

this particular model (cf. (2.3)) it is possible to obtain an analytic solution under the D-efficiency criterion, which corresponds to taking the first measure at time 0 and the second one at time 1/kel (Melas, 2005).

Finally, it is important to note that all the preceding discussion holds under the assumption of independent errors with constant variance.
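The analytic result for the two-point design can be checked numerically; a Python sketch with hypothetical parameter values (the dose, volume, and kel below are illustrative, not from the text):

```python
import numpy as np

# One-compartment IV bolus: f(t) = (D/V) * exp(-kel * t), parameters (kel, V)
D, V, kel = 1.0, 100.0, 0.1   # illustrative prior values

def abs_det_F(t1, t2):
    """|det F.| for a two-point design; rows of F. are df/d(kel, V)."""
    dfk = lambda t: -(D / V) * t * np.exp(-kel * t)       # d f / d kel
    dfV = lambda t: -(D / V**2) * np.exp(-kel * t)        # d f / d V
    return abs(dfk(t1) * dfV(t2) - dfV(t1) * dfk(t2))

grid = np.linspace(0.0, 30.0, 301)   # 0.1-hour steps
best = max(((t1, t2) for t1 in grid for t2 in grid if t1 < t2),
           key=lambda p: abs_det_F(*p))
print(f"{best[0]:.1f} {best[1]:.1f}")   # -> 0.0 10.0, i.e. t = 0 and t = 1/kel
```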

5.2. Simulation Studies

In this section we present a summary of some simulation studies approaching the sampling strategies

problem in pharmacokinetics. These studies focus mainly on the following aspects:

1. Optimal sampling times.

2. Trade-off between number of subjects and number of samples per subject.

3. The effect of randomness in the sampling times.

4. The gain in the precision of the estimates due to the inclusion of subjects with just one measure.

Sheiner and Beal (1983)

They worked with the open one-compartment model with intravascular administration of a regular

repetitive bolus dose. They simulated 2 measures per individual at random times for 50 individuals (50×2 design) and then explored some departures from this base design. They also compared the STS

method (cf. Section 4.2.2) with the method based on linearization (cf. Section 4.3.1). They found that the

inter-individual variances were poorly estimated even with the large sample size of 50 individuals. They

compared the 50×2 design with 33×3 and 25×4 designs; while with the STS method the trade-off clearly

favored more samples per individual, with the method based on linearization the gain was not so

important. Finally, they investigated the effect of adding data from individuals with a single measure.

They added 50 and 100 single observations to the basic design. While the STS method clearly cannot take advantage of these data, the method based on linearization can; with the addition of these points, the linearization-based method showed an improvement in the precision of all the estimates except for the intra-individual variance. With respect to the measure times, they explored some rigid designs and compared the results with those obtained from routine-type data, simulated by randomly choosing the dosing interval and the sampling times, but the better-designed data did not yield importantly better estimates.

Al-Banna, Kelman, and Whiting (1989)

They worked with the open one-compartment model with intravascular administration of a single dose.

They ran simulations under the assumption of independent normal distributions for the individual

parameters and a normal distribution for the random errors with variance proportional to the actual

concentration to compare three sampling schemes: 2 measures per individual in a group of 50 (50×2

design), 3 measures per individual in a group of 50 (50×3 design), and 3 measures per individual in a


group of 33 (33×3 design). They tried different measure times for each design, from as early as possible after the intravenous bolus dose (5 minutes after administration) to as late as possible such that a minimal response was still observable (20 hours after administration); to each time they added a random element from a uniform distribution with a range of ±1 hour to mimic a real study. Each sampling design was evaluated on the precision and bias of the parameter estimates

obtained using a method based on linearization (cf. Section 4.3.1). Considering the trade-off of number of

individuals and number of measures per individual, they compared the 33×3 designs with the 50×2

designs with the first measure at 5 minutes after administration, the second one at different times between

1 and 20 hours, and the third one, for the 33×3 designs, at 20 hours. The 33×3 designs gave considerably better parameter estimates than the 50×2 designs, regardless of the time of the second measure. There were no clear insights about the best moment to take the second measure with the 50×2 designs: earlier times produced better estimates for some parameters, while later times worked better for others.

Ette et al. (1994)

They worked with a setting similar to that of Al-Banna, Kelman, and Whiting (1989) but tested three- and four-point designs with the first measure at 5 minutes after dose, the last measure at 240 minutes after dose, and the intermediate measures at different time points. In both designs a fixed number of 48 measures was considered. In the three-point design 16 different subjects were sampled at each time point, and in the four-point design 12 different subjects were sampled at each time point, so just one measure per subject was taken13. Similarly to the results of Al-Banna, Kelman, and Whiting (1989), they found that, with the three-point designs, the overall efficiency of the estimation does not depend on the location of the third sample, provided that the other two were as early and as late as possible. While earlier times for the intermediate measure produced better estimates for the volume of distribution, later times produced better estimates for clearance. The four-point design was not markedly better than the three-point design in overall efficiency.

Jonsson, Wade, and Karlsson (1996)

They evaluated the effect of taking two samples instead of one during each visit to a clinic (that is, in the context of routine clinical data) with simulated data from a one-compartment model with extravascular

administration at steady state. The dosing interval was 12 hours and the basic design considered two visits

per day, one during the morning and one during the afternoon. The second measures were taken at the

same time (time 0) or 1 or 2 hours after the first one. They also analyzed a real data set with similar

characteristics to the simulated data. They compared the following designs with a fixed number of 200

samples: 100 patients with two visits and one sample per visit, 75 patients of which 25 had two visits and

two samples per visit and 50 had two visits and one sample per visit, and 50 patients with two visits and

two samples per visit. They found that the quality of the parameter estimates, with respect to precision and bias, was greater when two measures per visit were taken in some of the patients (the 75-patient design), and, perhaps more interestingly, that this improvement does not depend on the time of the second measure (at time 0, 1, or 2). Sampling designs where one fraction of the patients had only early samples (morning) and the other fraction only late samples (afternoon) were inferior to designs where the patients had both early and late samples, even when the number of samples was the same in both designs.

Mentre, Mallet, and Baccar (1997)

They focused on the population analysis of the one-compartment model with intravascular administration and restricted their analysis to a finite set of sampling times (0.5, 1, 2, 4, 7, and 24 hours after administration). They compared different designs with a fixed total of 60 measures (60×1, 30×2, 20×3,

15×4, 12×5, and 10×6). Among them the optimal design is the 30×2 design with the first measure at 0.5

and the second one at 24 hours after administration. The 20×3 design with a third measure at 0.5 or 1 hour

is 83% as efficient as the optimal design, and the efficiency decreases as more measures per subject are

considered. If only one measure is available per subject (60×1 design), the optimum design takes 19, 23,

and 18 measures at 0.5, 7, and 24 hours after administration respectively, and it is 58% as efficient as the

13

This is the case of destructive population pharmacokinetic studies where animals are sacrificed when the measure

is taken.


optimal 30×2 design. An important result is that the optimal population design based on D-optimality

generally repeats the D-optimal design for all the individuals.

5.3. Sparse Data Analysis

In Section 4.3 we mentioned that an important advantage of the method based on linearization over the Two-Stage approach is that the former is capable of using data from individuals without enough data points to estimate the individual pharmacokinetic models. Because in pharmacokinetics it is sometimes impossible to obtain several measures per subject, this characteristic of the methods based on linearization is quite promising.

Sheiner and Beal (1983) made a simulation study to investigate the effect of the addition of single

measures in the one-compartment model with intravascular administration. In this section we will

perform some simulations to get more information about how much we can gain from individuals with

fewer measures than parameters in the individual model. In Section 5.3.1 we will work with the one-

compartment model with intravascular administration but considering a correlation between the random

effects, which was not the case in the study of Sheiner and Beal (1983). In Section 5.3.2 we will work

with the one-compartment model with extravascular administration and assuming uncorrelated random

effects. In all the cases we will produce 200 simulations to have a clear idea about the distribution of the

estimated parameters.

5.3.1. Addition of Individuals with One Measure in the One-Compartment Model with

Intravascular Administration

Here we will simulate data from a one-compartment model with intravascular administration of a dose of

1 unit. Following the recommendation of Sheiner and Beal (1983) for the relation between the individual

parameters and the fixed and random effects, we have that the model is given by

$$
y_{ij} = \frac{1}{\phi_{2i}} \exp\left(-\phi_{1i} x_{ij}\right) + e_{ij}
       = \frac{1}{\beta_2 e^{b_{2i}}} \exp\left(-\beta_1 e^{b_{1i}} x_{ij}\right) + e_{ij},
$$
$$
\ln(\phi_{1i}) = \ln(\beta_1) + b_{1i},
$$
$$
\ln(\phi_{2i}) = \ln(\beta_2) + b_{2i}.
$$

The constant of elimination and volume of distribution fixed effects are set at β1 = 0.01 and β2 = 100. The random effects, b1i and b2i, are assumed to have a bivariate normal distribution, each with a mean of 0 and a standard deviation of 0.15, and with a correlation of -0.7. For the intra-individual variation we assume that the error terms, eij, have independent normal distributions with mean 0 and standard deviation equal to 0.1 times the actual concentration.

We simulate data from 10 subjects with 10 measures per subject taken at equidistant times on a logarithmic scale; the sampling times are, on the linear scale, 10, 14, 20, 29, 41, 58, 83, 118, 168, and 240. Then we sequentially add simulated data for 20, 40, and 60 subjects with single measures taken at times chosen at random from the 10 original sampling times. Therefore we compare the following designs: 10×10, 10×10 + 20×1, 10×10 + 40×1, and 10×10 + 60×1. We show the results in Figures 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6 using boxplots. The dashed horizontal lines represent the real values of the parameters and the “+” symbols the estimated means.
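The data-generating step just described can be sketched as follows (a Python/numpy sketch, not the code actually used for the thesis; the helper name `simulate` is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 0.01, 100.0           # kel and Vd fixed effects
sd_b, rho, cv = 0.15, -0.7, 0.10     # random-effect sds, correlation, error CV
times = np.array([10., 14, 20, 29, 41, 58, 83, 118, 168, 240])

def simulate(times_per_subject):
    """Simulate concentrations for a unit IV dose, one row per subject."""
    n = times_per_subject.shape[0]
    cov = sd_b**2 * np.array([[1.0, rho], [rho, 1.0]])
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # (b_1i, b_2i)
    kel = beta1 * np.exp(b[:, 0])    # phi_1i = beta1 * exp(b_1i)
    vd = beta2 * np.exp(b[:, 1])     # phi_2i = beta2 * exp(b_2i)
    conc = np.exp(-kel[:, None] * times_per_subject) / vd[:, None]
    return conc * (1.0 + cv * rng.standard_normal(conc.shape))  # proportional error

rich = simulate(np.tile(times, (10, 1)))            # the 10x10 design
single = simulate(rng.choice(times, size=(20, 1)))  # 20 subjects with one measure
```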

In Figures 5.1 and 5.2 we can see that, for the fixed effects, the addition of individuals with just one measure improves the precision of the estimators. Regarding bias, we see that for the constant of elimination fixed effect the addition of the single measures increases the estimated value, while for the volume of distribution we observe a slight decrease. The mean and median of the simulated values are quite close to the real value for the constant of elimination and slightly differ from it for the volume of distribution. The small bias that we get with all the estimators can, however, be a result of the linearization of the expectation surface applied to fit the model. In Figures 5.3, 5.4, and 5.5 we see that for the random effects the gain in precision is less important than for the fixed effects. For the random effect variances we observe that the estimated values increase as more individual measures are included, while for the correlation between them the mean and median of the estimated values remain approximately constant. Finally, we see in Figure 5.6 that the addition of the individual measures has no effect on the intra-individual variation, which is indeed theoretically consistent.


FIGURE 5.1 Boxplots for the estimated constants of elimination obtained from 200 simulations with different

designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.


FIGURE 5.2 Boxplots for the estimated volumes of distribution obtained from 200 simulations with different

designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.



FIGURE 5.3 Boxplots for the estimated constant of elimination random effect standard deviations obtained from

200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +

marks the sample means.


FIGURE 5.4 Boxplots for the estimated volume of distribution random effect standard deviations obtained from 200

simulations with different designs. The horizontal dashed line represents the real parameter value and the + marks the

sample means.



FIGURE 5.5 Boxplots for the estimated constant of elimination and volume of distribution random effects

correlations obtained from 200 simulations with different designs. The horizontal dashed line represents the real

parameter value and the + marks the sample means.


FIGURE 5.6 Boxplots for the estimated intra-individual standard deviations obtained from 200 simulations with

different designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

5.3.2. Addition of Individuals with One and Two Measures in the One-Compartment Model with

Extravascular Administration

In this section we simulate data from a one-compartment model with extravascular administration of a dose of 1 unit, and for simplicity we assume that the entire dose is absorbed by the body. As in Section 5.3.1, we parameterize the model in terms of the logarithms of the individual parameters, so the model is given by


$$
y_{ij} = \frac{\phi_{1i}}{\phi_{3i}\,(\phi_{1i}-\phi_{2i})}\Bigl(\exp(-\phi_{2i}\,x_{ij})-\exp(-\phi_{1i}\,x_{ij})\Bigr)+e_{ij}
= \frac{\beta_{1}e^{b_{1i}}}{\beta_{3}e^{b_{3i}}\bigl(\beta_{1}e^{b_{1i}}-\beta_{2}e^{b_{2i}}\bigr)}\Bigl(\exp\bigl(-\beta_{2}e^{b_{2i}}\,x_{ij}\bigr)-\exp\bigl(-\beta_{1}e^{b_{1i}}\,x_{ij}\bigr)\Bigr)+e_{ij},
$$

$$
\ln(\phi_{1i})=\ln(\beta_{1})+b_{1i}, \qquad
\ln(\phi_{2i})=\ln(\beta_{2})+b_{2i}, \qquad
\ln(\phi_{3i})=\ln(\beta_{3})+b_{3i}.
$$

The constant of absorption, constant of elimination, and volume of distribution fixed effects are set at β1 = 0.1, β2 = 0.01, and β3 = 100. The random effects, b1i, b2i, and b3i, are assumed to have independent normal distributions, each with mean 0 and standard deviation 0.15. For the intra-individual variation we assume that the error terms, eij, have independent normal distributions with mean 0 and standard deviation equal to 0.1 times the actual concentration.
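A minimal simulation sketch of this setup (the helper function below is our own illustration, not from the report; parameter values are the ones stated above) could look as follows:

```r
## Sketch: simulate one subject's concentrations from the extravascular model above.
## beta = (constant of absorption, constant of elimination, volume of distribution).
simulate_subject <- function(times, beta = c(0.1, 0.01, 100),
                             omega = 0.15, sigma = 0.1, dose = 1) {
  phi <- beta * exp(rnorm(3, mean = 0, sd = omega))   # individual parameters, log-normal
  conc <- dose * phi[1] / (phi[3] * (phi[1] - phi[2])) *
    (exp(-phi[2] * times) - exp(-phi[1] * times))
  ## intra-individual error with sd equal to 0.1 times the actual concentration
  conc + rnorm(length(times), mean = 0, sd = sigma * conc)
}

set.seed(42)
simulate_subject(c(10, 14, 20, 29, 41, 58, 83, 118, 168, 240))
```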

We simulate data from 10 subjects with 10 measures per subject (10×10) taken at the same times as in the previous section, that is, at 10, 14, 20, 29, 41, 58, 83, 118, 168, and 240. We then generate three additional designs by adding to the original 10×10 design data simulated for 20 subjects with single measures (20×1), data simulated for 10 subjects with two measures per subject (10×2), and both the 20×1 and 10×2 data. We therefore compare the following designs: 10×10, 10×10 + 20×1, 10×10 + 10×2, and 10×10 + 20×1 + 10×2. The results are shown in Figures 5.7 through 5.13.

In Figures 5.7, 5.8, and 5.9 we can see that the gain in precision for the fixed effects is quite small. The improvement due to the addition of the 20×1 points is quite similar to that of the 10×2 points, and adding both the 20×1 and the 10×2 data points produces a clearer improvement in the precision of the estimates. The addition of individuals with just one or two measures produces a small change in the mean and median of the estimates: while for the constant of absorption and volume of distribution the addition of these data decreases the estimated values, for the constant of elimination it increases them. For the variances of the random effects, the improvement in precision is not noticeable for the constant of absorption in Figure 5.10, barely perceptible for the constant of elimination in Figure 5.11, and quite clear for the volume of distribution in Figure 5.12. In Figure 5.13 we can see that the addition of the data with two measures per individual does not produce an improvement in the precision of the estimation of the intra-individual variation.


FIGURE 5.7 Boxplots for the estimated constants of absorption obtained from 200 simulations with different

designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.



FIGURE 5.8 Boxplots for the estimated constants of elimination obtained from 200 simulations with different

designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.


FIGURE 5.9 Boxplots for the estimated volumes of distribution obtained from 200 simulations with different

designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.



FIGURE 5.10 Boxplots for the estimated constant of absorption random effect standard deviations obtained from

200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +

marks the sample means.


FIGURE 5.11 Boxplots for the estimated constant of elimination random effect standard deviations obtained from

200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +

marks the sample means.



FIGURE 5.12 Boxplots for the estimated volume of distribution random effect standard deviations obtained from

200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +

marks the sample means.


FIGURE 5.13 Boxplots for the estimated intra-individual standard deviations obtained from 200 simulations with

different designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

Considering the results of both simulations, it is clear that in the estimation of the fixed and random effects the method based on linearization takes advantage of individuals whose individual models are not estimable, and that this additional information is perhaps more important for the estimation of the fixed effects. For the intra-individual variance it is clear that individuals with just one measure cannot contribute to the estimation of this parameter, and our simulation does not show a contribution from the individuals with two measures either. In addition, we note that most of the boxplots show symmetric distributions, with the exception of the random effects correlation in Figure 5.5 and the intra-individual variances in Figures 5.6 and 5.13. The asymmetry is stronger for the intra-individual variances.


6. Conclusions and Recommendations

In this section we present our conclusions and recommendations. Our conclusions come in Section 6.1, and in Section 6.2 we present a list of practical recommendations to take into account when analyzing pharmacokinetic data. We hope that this list can serve as a reference point, and that the theory and references presented in the previous sections will guide the reader who wants to go further.

6.1. Conclusions

6.1.1. Individual Pharmacokinetics

Estimation and Inference

• Individual pharmacokinetic analysis is carried out using standard nonlinear regression techniques, which are based on a linearization of the expectation surface. Under this linearization, asymptotic results similar to the ones from linear regression hold.

• The classical assumptions in nonlinear regression are that the random errors are independent and normally distributed with constant variance. Deviations from these assumptions can be accommodated in the model, but computations are considerably more complicated.

• Confidence intervals can be computed based on the asymptotic results. However, we must take into account that the confidence intervals are only approximate, not just because of the asymptotic results, but because the standard errors are computed based on a linearization of the nonlinear model.

• Once the model has been fitted, it is straightforward to get approximate standard errors for any kind of secondary pharmacokinetic parameter. These standard errors can be used to compute confidence intervals.
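As a purely illustrative sketch of this point (simulated data; the model and dose of 5 units mirror the Appendix code), the delta method gives an approximate standard error for the half-life t1/2 = ln(2)/kel after an nls fit:

```r
## Sketch: delta-method standard error of the half-life after an nls fit.
set.seed(1)
Time <- c(0.25, 0.5, 0.75, 1, 2, 4, 8, 12, 18, 24)
Conc <- 5 * 5 / (500 * (5 - 0.2)) * (exp(-0.2 * Time) - exp(-5 * Time)) *
  exp(rnorm(length(Time), 0, 0.05))          # 5% multiplicative noise

fit <- nls(Conc ~ 5 * Ka / (Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time)),
           start = c(Ka = 5, Kel = 0.2, Vd = 500))

kel    <- coef(fit)["Kel"]
t_half <- log(2) / kel
## Delta method: SE(t1/2) = |d t1/2 / d kel| * SE(kel) = (log(2)/kel^2) * SE(kel)
se_t_half <- (log(2) / kel^2) * sqrt(vcov(fit)["Kel", "Kel"])
c(estimate = unname(t_half), se = unname(se_t_half))
```

An approximate 95% confidence interval then follows as the estimate plus or minus the usual normal or t quantile times this standard error.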

Transformations and Reparameterizations

• Transformations of the data change the structure of the random errors. The functional form of nonlinear models usually corresponds to physical, biological, or chemical relationships which are important to preserve in the analysis. Reparameterizations do not change the structure of the data or the residuals as transformations do. Hence, we must be more cautious with transformations of the model than with reparameterizations.

• Different parameterizations will produce different accuracies in the linear approximation of the expectation surface and therefore will produce different inferential results. A methodology based on second derivatives of the expectation surface, to obtain measures of curvature or nonlinearity, has been proposed in recent decades to quantify the appropriateness of the linear approximation of the expectation surface under different parameterizations. These measures constitute a mechanism to gain insight into which parameterization is more convenient in each case. However, the results, besides being difficult to interpret, are quite dependent on the design, so there are no general guidelines about which parameterization is more convenient for each pharmacokinetic model. In our opinion, although very promising, this theory turns out to be of little practical use.


• The computations involved in obtaining the measures of curvature are quite complicated. In our opinion, the main difficulty is the determination of the dimension of the vector space spanned by the vectors of second derivatives.

Software

• Nonlinear regression analysis is available in most statistical software, which mostly works with some modification of the Gauss-Newton algorithm to estimate the model. Regarding departures from the classical assumptions, these tools usually allow for different weighting schemes, so the problem of heterogeneous variances is covered. Flexibility to deal with autocorrelation or departures from normality is not available in WinNonlin or in the nonlinear regression tools of SAS and R. However, as mentioned in Section 3.1.3, the asymptotic results hold even when the normality assumption does not.

6.1.2. Population Pharmacokinetics

Estimation and Inference

• Hierarchical nonlinear models constitute the natural framework for the statistical analysis in population pharmacokinetics. They provide great flexibility in modeling the data, so we can specify different relations between the fixed effects and random effects, and their dependence on different treatments.

• The naive pooled approach produces biased and imprecise estimators, and it is not able to estimate the inter-individual variability. Therefore this approach must not be considered in the population analysis.

• The Two-Stage approach cannot use the data from individuals whose individual models cannot be estimated. This is a main drawback when analyzing routine clinical data, or in situations where there are strong limitations on getting several samples per individual. On the other hand, when several samples per individual are available, the population pharmacokinetic parameter estimates obtained with this approach are, according to the simulation studies of Sheiner and Beal (1980, 1981, 1983), as good as the ones obtained with the methods based on linearization.

• The methods based on linearization are the best alternative for analyzing population pharmacokinetic models. This approach produces reasonably good estimates of the population and individual pharmacokinetic parameters, and of the inter- and intra-individual variation.

• The main difficulty with the methods based on linearization is that they rely strongly on intensive computations, so numerical and convergence problems occur quite often. Because the structure of a hierarchical nonlinear model is quite complicated, the modeling phase demands a considerable amount of time, and sometimes it is necessary to sacrifice some data.

Transformations and Reparameterizations

• As mentioned before, the measures of curvature approach to deciding which parameterization is more suitable is quite appealing but of little practical use. Indeed, these measures apply only to the case of individual pharmacokinetics (that is, nonlinear regression models), and there is no similar approach in the literature for the population case. Since the real interest in pharmacokinetics is most of the time in population models, the measures of curvature theory turns out to be of even less applicability.

Software

• It is possible to fit hierarchical nonlinear models by linearization with SAS and R. With R it is also possible to model heterogeneity of variances and autocorrelation schemes for the intra-individual errors.


6.1.3. Sampling Strategies

Optimal Designs

• Optimal designs for nonlinear models depend on the parameter values. Since these are unknown, we have to make an assumption about their values, and therefore the resulting optimal designs are only locally optimal. This characteristic constitutes a great limitation.

• There are three approaches to deal with the problem of unknown parameters: to assume a prior value for the parameters, to assume a prior distribution (the Bayesian approach), or to use sequential designs. Assuming a prior value for the parameters can lead to misleading results if our guess is quite far from the real values. Indeed, the resulting designs often do not have enough support points for model checking, and with c-optimal designs not all the parameters can be estimated. The sequential designs approach is of little practical use in pharmacokinetics. Therefore, the Bayesian approach, or the similar-in-spirit maximin approach, turns out in our opinion to be the best option. However, we must consider that the lower the precision of the prior distribution, the lower the relative efficiency of the resulting design. If the prior distribution has much dispersion, then the efficiency of the resulting optimal design can be very close to the efficiency of a typical geometric design.

• Another limitation of optimal designs is that they do not assure the same efficiency for all the pharmacokinetic parameters. Since we are usually interested in several parameters, the optimal design can be quite efficient in the estimation of some parameters but of low efficiency for others.

• In the population context, optimal designs can be of even less practical use due to the intrinsic differences among individuals. Since the individual parameters differ among individuals, the efficiency of the optimal design is not constant among them.

• Optimal designs are continuous designs. In practice, an exact design is used, so its relative efficiency may be lower than 100%.

• Considering all the previous factors (the uncertainty about the parameter values, the differences in the parameter values among individuals, that our interest is usually in the estimation of several pharmacokinetic parameters, and that exact designs must be implemented), the task of deciding on the best design for a specific case is quite complicated. If in addition we consider the fact that the assumption of independent errors with constant variance usually does not hold, the utility of optimal design in this field is fairly limited.

Sparse Data Analysis

• The methods based on linearization of the hierarchical nonlinear model can take advantage of subjects with a small number of measures, even of subjects whose individual models cannot be fitted. This is a clear advantage over the traditional Two-Stage approach.

6.2. Data Analysis Recommendations

6.2.1. Individual Pharmacokinetics

Estimation and Inference

• We recommend fitting the model in its original functional form and with the usual measurement scale for the variables. Data and model transformations should be considered only when strong statistical and non-statistical foundations are available.

• Residual plots should be examined to decide on the best weighting scheme.

• In order to get confidence intervals for functions of the parameters of the original parameterization, we suggest applying the first approach mentioned in Section 3.2 if possible, unless we have enough evidence that the new parameterization produces a better linear approximation of the expectation surface. When the parameter of interest is not a one-to-one function of one of the original parameters, or is a function of more than one of the original parameters, then we can use the second approach with the asymptotic results in (3.11) and (3.13).

6.2.2. Population Pharmacokinetics

Estimation and Inference

• Due to its simplicity, the Two-Stage approach can be an alternative for obtaining population estimates of the pharmacokinetic parameters when there are several data points per individual or when the methods based on linearization do not converge.

• When using the methods based on linearization, the NPD approach can be used to get initial approximations for the iterative methods.

• As a first step in the population analysis, we can estimate individual models. This allows us to identify individuals with strange data points or individual curves that do not fit the specified compartment model well. Indeed, individual estimates and individual confidence intervals are useful for a first insight into the dependence of the fixed effects on the different treatments and into the suitability of the random effects in the model.

• When the number of parameters in the model is high in relation to the number of individuals, a diagonal covariance matrix for the random effects may be useful to avoid numerical problems.

• We can use likelihood-ratio tests to decide whether a more complex model fits the data significantly better than a simpler one, when the more complex model differs from the simpler one only by the addition of one or more parameters. In addition, we can use the AIC or BIC to compare any pair of models. With these tools we can decide on the best structure for the fixed and random effects in the model.
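As an illustrative sketch of these last two recommendations in R's nlme, using the built-in theophylline data (Theoph) rather than the data of this report:

```r
library(nlme)

## SSfol is the self-starting one-compartment extravascular model (log parameters);
## pdDiag() specifies a diagonal covariance matrix for the random effects.
fm1 <- nlme(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph,
            fixed = lKe + lKa + lCl ~ 1,
            random = pdDiag(lKe + lKa + lCl ~ 1))

## Simpler nested model: drop the random effect on the elimination constant
fm2 <- update(fm1, random = pdDiag(lKa + lCl ~ 1))

anova(fm1, fm2)   # reports AIC, BIC, and the likelihood-ratio test
```

The anova() output supports exactly the decision described above: whether the extra random effect is worth keeping.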

6.2.3. Sampling Strategies

Optimal Designs

• Optimal design studies, simulation studies, and practical experience must all be considered when making a decision about the sampling strategy.

• For the one-compartment model with intravascular administration, optimal designs and simulation studies suggest taking one measure as soon as possible, one measure as late as possible, and a third measure in between. For the first measure, a small delay must be allowed in order to ensure the drug has mixed throughout the entire compartment, so that a representative positive concentration is recorded. For the last measure, the optimal time is around 1/kel. There is no clear insight about the time for the measure in between.

Sparse Data Analysis

• Individuals with just one or two measures can be incorporated in the analysis by using the methods based on linearization. However, we recommend the inclusion of a small number of subjects with several samples in order to get good estimates of the inter-individual and intra-individual variation. In particular, intra-individual variation estimates are quite poor with sparse data.


Appendix: R Code for the Measures of Curvature Computation

########################################################################################

## These libraries are required for the computations presented in this report

########################################################################################

library(PK)

library(rgenoud)

library(odesolve)

library(PKfit)

library(nlme)

library(lattice)

########################################################################################

## This is the code for computing the measures of curvature on Section 3.5.4

## with the original parameterization

########################################################################################

## First and second derivatives

f<- expression(5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)))

df<-deriv(f,c("Ka","Kel","Vd"),hessian=TRUE)

## Input values

original<-nls(Conc~5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
              data=ex1, start=c(Ka=5, Kel=0.2, Vd=500), model=T)

Ka<-coef(original)[1]

Kel<-coef(original)[2]

Vd<-coef(original)[3]

## Here we compute the matrices F.. (label F2) and D

F2<-matrix(nrow=8,ncol=9)

D<-matrix(nrow=8,ncol=9)

for (i in 1:8){
  Time<-ex1$Time[i]
  eval(df)
  ## Rows of the matrices F.. and D
  F2[i,]<-c(.hessian[1],.hessian[2],.hessian[3],.hessian[4],.hessian[5],.hessian[6],
            .hessian[7],.hessian[8],.hessian[9])
  D[i,]<-c(.grad[1],.grad[2],.grad[3],.hessian[1],.hessian[2],.hessian[3],.hessian[5],
           .hessian[6],.hessian[9])
}

D

## QR decomposition

## The matrix Q1 contains the first p+p' columns of Q

QR<-qr(D)

Q<-qr.Q(QR)

Q1<-matrix(nrow=8, ncol=5)
for (j in 1:5){
  for (i in 1:8){
    Q1[i,j]<-Q[i,j]
  }
}

R1<-t(Q1)%*%D

## Computing A..

## The array A.. has 5 faces. Each face is a 3x3 matrix, and they are given

## in A1, A2, A3, A4, and A5

A1<-matrix(nrow=3,ncol=3)

A2<-matrix(nrow=3,ncol=3)

A3<-matrix(nrow=3,ncol=3)


A4<-matrix(nrow=3,ncol=3)

A5<-matrix(nrow=3,ncol=3)

A<-t(Q1)%*%F2

for (i in 1:3){
  for (j in 1:3){
    A1[i,j]<-A[1,3*(i-1)+j]
    A2[i,j]<-A[2,3*(i-1)+j]
    A3[i,j]<-A[3,3*(i-1)+j]
    A4[i,j]<-A[4,3*(i-1)+j]
    A5[i,j]<-A[5,3*(i-1)+j]
  }
}

A1; A2; A3; A4; A5

## Relative Curvatures

R11<-matrix(nrow=3,ncol=3)

for (i in 1:3){
  for (j in 1:3){
    R11[i,j]<-R1[i,j]
  }
}

R11

C1<-t(solve(R11))%*%A1%*%solve(R11)*0.0002277*3^0.5; C1

C2<-t(solve(R11))%*%A2%*%solve(R11)*0.0002277*3^0.5; C2

C3<-t(solve(R11))%*%A3%*%solve(R11)*0.0002277*3^0.5; C3

C4<-t(solve(R11))%*%A4%*%solve(R11)*0.0002277*3^0.5; C4

C5<-t(solve(R11))%*%A5%*%solve(R11)*0.0002277*3^0.5; C5

## RMS parameter effects curvature: original parametrization

cb<-((2*sum(C1*C1,C2*C2,C3*C3)+sum(diag(C1))^2+sum(diag(C2))^2+sum(diag(C3))^2)/15)^.5

ci<-((2*sum(C4*C4,C5*C5)+sum(diag(C4))^2+sum(diag(C5))^2)/15)^.5

cb; cb*qf(0.95,3,5)^0.5

ci; ci*qf(0.95,3,5)^0.5

########################################################################################

## To compute the measures of curvature for the half-life parameterization, change
## the first lines of the code given above to the following lines

########################################################################################

## First and second derivatives

f<- expression(5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-exp(-Ka*Time)))

df<-deriv(f,c("Ka","t_half","Vd"),hessian=TRUE)

## Input values

half_life<-nls(Conc~5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-exp(-Ka*Time)),
               data=ex1, start=c(Ka=5, t_half=3.5, Vd=500), model=T)

Ka<-coef(half_life)[1]

t_half<-coef(half_life)[2]

Vd<-coef(half_life)[3]

########################################################################################

## To compute the measures of curvature for the total clearance parameterization,

## change the first lines of the code given above to the following lines

########################################################################################

## First and second derivatives

f<- expression(5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)))

df<-deriv(f,c("Ka","Kel","Cl"),hessian=TRUE)

## Input values

total_clearance<-nls(Conc~5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
                     data=ex1, start=c(Ka=5, Kel=0.2, Cl=100), model=T)

Ka<-coef(total_clearance)[1]

Kel<-coef(total_clearance)[2]

Cl<-coef(total_clearance)[3]


References

1. Al-Banna M. K., Kelman A. W., and Whiting B. (1989). Experimental Design and Efficient

Parameter Estimation in Population Pharmacokinetics. Journal of Pharmacokinetics and

Biopharmaceutics, Vol. 18, 347–360.

2. Atkinson A. C., Chaloner K., Herzberg A. M., and Juritz J. (1993). Optimum Experimental Designs

for Properties of a Compartmental Model. Biometrics, Vol. 49, 325–337.

3. Atkinson A. C. and Donev A. N. (1996). Optimum Experimental Designs. Clarendon Press, Oxford.

4. Bates D. M. and Watts D. G. (1980). Relative Curvature Measures of Nonlinearity. Journal of the

Royal Statistical Society. Series B (Methodological), Vol. 42, 1–25.

5. Bates D. M. and Watts D. G. (1981). Parameter Transformations for Improved Approximate

Confidence Regions in Nonlinear Least Squares. The Annals of Statistics, Vol. 9, 1152–1167.

6. Bates D. M. and Watts D. G. (1988). Nonlinear regression analysis and its applications. Wiley.

7. Beal S. L. and Sheiner L. B. (1982). Estimating Population Kinetics. Critical Reviews in Biomedical

Engineering, Vol. 8, 195–222.

8. Biedermann S., Dette H., and Pepelyshev A. (2004). Maximin Optimal Designs for a Compartmental

Model. mODa 7 – Advances in Model-Oriented Design and Analysis. Physica Verlag, 41–49.

9. D’Argenio D. Z. (1981). Optimal Sampling Times for Pharmacokinetic Experiments. Journal of

Pharmacokinetics and Biopharmaceutics. Vol. 9, 739–756.

10. Davidian M. and Giltinan D. M. (1995). Nonlinear Models for Repeated Measurement Data.

Chapman & Hall.

11. Dette H., Haines L. M., and Imhof L. A. (2005). Maximin and Bayesian Optimal Designs for Linear

and Non-Linear Regression Models. Statistica Sinica.

12. Ette E. I., Howie C. A., Kelman A. W., and Whiting B. (1994). Experimental Design and Efficient

Parameter Estimation in Preclinical Pharmacokinetic Studies. Pharmaceutical Research, Vol. 12,

729–737.

13. Jonsson E. N., Wade J. R., and Karlsson M. O. (1996). Comparison of Some Practical Sampling

Strategies for Population Pharmacokinetic Studies. Journal of Pharmacokinetics and

Biopharmaceutics, Vol. 24, 245–263.

14. Lindstrom M. J. and Bates D. M. (1990). Nonlinear Mixed Effects Models for Repeated Measures

Data. Biometrics, Vol. 46, 673–687.

15. Melas V. B. (2005). On the Functional Approach to Optimal Designs for Nonlinear Models. Journal

of Statistical Planning and Inference, Vol. 132, 93–116.

16. Mentre F., Mallet A., and Baccar D. (1997). Optimal Design in Random-Effects Regression Models.

Biometrika, Vol. 84, 429–442.

17. Pinheiro J. C. and Bates D. M. (2000). Mixed Effects Models in S and S-Plus. Springer.

18. Ritschel W. A. and Kearns G. L. (2004). Handbook of Basic Pharmacokinetics …Including Clinical

Applications (sixth edition). American Pharmacists Association.


19. Seber G. A. F. and Wild C. J. (1989). Nonlinear Regression. Wiley.

20. Shargel L. and Yu A. B. C. (1999). Applied Biopharmaceutics and Pharmacokinetics (fourth

edition). McGraw-Hill.

21. Sheiner, L. B. (1986). Analysis of Pharmacokinetic Data Using Parametric Models. III. Hypothesis

Test and Confidence Intervals. Journal of Pharmacokinetics and Biopharmaceutics, Vol. 14, 539–

555.

22. Sheiner, L. B. and Beal S. L. (1980). Evaluation of Methods for Estimating Population
Pharmacokinetic Parameters. I. Michaelis-Menten Model: Routine Clinical Data. Journal of
Pharmacokinetics and Biopharmaceutics, Vol. 8, 553–571.

23. Sheiner, L. B. and Beal S. L. (1981). Evaluation of Methods for Estimating Population

Pharmacokinetic Parameters. II. Biexponential Model and Experimental Pharmacokinetic Data.

Journal of Pharmacokinetics and Biopharmaceutics, Vol. 9, 635–651.

24. Sheiner, L. B. and Beal S. L. (1983). Evaluation of Methods for Estimating Population

Pharmacokinetic Parameters. III. Monoexponential Model: Routine Clinical Pharmacokinetic Data.

Journal of Pharmacokinetics and Biopharmaceutics, Vol. 11, 303–319.

25. Venables W. N. and Ripley B. D. (2002). Modern Applied Statistics with S (Fourth edition).

Springer.