Modeling compositional data
Some collaborators
Deformations: Paul Sampson, Wendy Meiring, Doris Damian
Space-time: Tilmann Gneiting, Francesca Bruno
Deterministic models: Montserrat Fuentes, Peter Challenor
Markov random fields: Finn Lindström
Wavelets: Don Percival, Brandon Whitcher, Peter Craigmile, Debashis Mondal
Background
NAPAP, 1980’s
Workshop on biological monitoring, 1986
Dirichlet process: Gary Grunwald, 1987
Current framework: Dean Billheimer, 1995
Other co-workers: Adrian Raftery, Mariabeth Silkey, Eun-Sug Park
Compositional data
Vector of proportions
Proportion of taxes in different categories
Composition of rock samples
Composition of biological populations
Composition of air pollution
$z = (z_1, \dots, z_k)^T$, $z_i > 0$, $\sum_{i=1}^k z_i = 1$, so $z \in \nabla^{k-1}$ (the simplex)
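In practice a composition is usually obtained by "closing" a vector of positive amounts: each part is divided by the total so the parts sum to one. The sketch below is a minimal illustration of that step and of the two defining conditions; the function names `closure` and `in_simplex` and the input numbers are hypothetical, not taken from the talk.

```python
def closure(x):
    """Divide each positive part by the total, giving a point of the simplex."""
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    total = sum(x)
    return [v / total for v in x]


def in_simplex(z, tol=1e-12):
    """True if every z_i > 0 and the parts sum to 1 (within floating-point tolerance)."""
    return all(v > 0 for v in z) and abs(sum(z) - 1.0) <= tol


# Raw amounts of three constituents (illustrative numbers) become a composition.
z = closure([11, 3, 6])
print(z)              # [0.55, 0.15, 0.3]
print(in_simplex(z))  # True
```

Closing discards the total, which is exactly the information compositional analysis treats as irrelevant.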
The triangle plot
[Figure: triangle plot with axes Proportion 1, Proportion 2 and Proportion 3, each running from 0 to 1, showing the composition (0.55, 0.15, 0.30).]
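A three-part composition is placed in the triangle by treating its parts as barycentric weights on the three corners. The sketch below assumes one common layout (Proportion 1 at the lower-left corner, Proportion 2 at the lower-right, Proportion 3 at the top); the slide's orientation may differ, and `ternary_xy` is a hypothetical helper name.

```python
import math


def ternary_xy(p):
    """Map (p1, p2, p3) to Cartesian (x, y) inside an equilateral triangle.

    Assumed corners: p1 = 1 at (0, 0), p2 = 1 at (1, 0), p3 = 1 at (1/2, sqrt(3)/2).
    The point is p1*V1 + p2*V2 + p3*V3, which simplifies to the formulas below.
    """
    p1, p2, p3 = p
    total = p1 + p2 + p3  # re-close in case the input does not sum exactly to 1
    p2, p3 = p2 / total, p3 / total
    x = p2 + p3 / 2
    y = p3 * math.sqrt(3) / 2
    return x, y


print(ternary_xy((0.55, 0.15, 0.30)))  # the example point on the slide
```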
The spider plot
[Figure: spider plot of the five-part composition (0.40, 0.20, 0.10, 0.05, 0.25), with radial gridlines at 0.2, 0.4, 0.6, 0.8 and 1.0.]
An algebra for compositions
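Perturbation: For $\xi, \alpha \in \nabla^{k-1}$ define
$\xi \oplus \alpha = \left( \frac{\xi_1 \alpha_1}{\sum_{i=1}^k \xi_i \alpha_i}, \dots, \frac{\xi_k \alpha_k}{\sum_{i=1}^k \xi_i \alpha_i} \right) \in \nabla^{k-1}$.
The composition $\iota = \left( \frac{1}{k}, \dots, \frac{1}{k} \right)$ acts as a zero, so $\xi \oplus \iota = \xi$.
Set $\xi^{-1} = \left( \frac{1/\xi_1}{\sum_{i=1}^k 1/\xi_i}, \dots, \frac{1/\xi_k}{\sum_{i=1}^k 1/\xi_i} \right)$, so $\xi \oplus \xi^{-1} = \iota$.
Finally define $\xi - \eta = \xi \oplus \eta^{-1}$.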
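The perturbation operations are easy to check numerically. This minimal Python sketch implements $\oplus$, the zero $\iota$, the inverse and the difference directly from their definitions; the function names and the two example compositions are illustrative choices.

```python
def closure(x):
    """Rescale positive parts so they sum to one."""
    total = sum(x)
    return [v / total for v in x]


def perturb(xi, alpha):
    """Perturbation xi (+) alpha: multiply componentwise, then re-close."""
    return closure([a * b for a, b in zip(xi, alpha)])


def identity(k):
    """The neutral element iota = (1/k, ..., 1/k)."""
    return [1.0 / k] * k


def inverse(xi):
    """xi^{-1}: reciprocal parts, re-closed, so that xi (+) xi^{-1} = iota."""
    return closure([1.0 / v for v in xi])


def difference(xi, eta):
    """xi - eta, defined as xi (+) eta^{-1}."""
    return perturb(xi, inverse(eta))


xi = [0.55, 0.15, 0.30]
alpha = [0.20, 0.50, 0.30]
print(perturb(xi, alpha))        # approx [0.4, 0.2727, 0.3273]
print(perturb(xi, identity(3)))  # xi again, up to rounding
print(difference(xi, xi))        # approx iota = [1/3, 1/3, 1/3]
```

Because every operation ends with a closure, the normalizing sums in the definitions never have to be tracked by hand.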
The logistic normal
If
$\mathrm{alr}(z) = \left( \log\frac{z_1}{z_k}, \dots, \log\frac{z_{k-1}}{z_k} \right)^T \sim \mathrm{MVN}(\mu, \Sigma)$,
we say that z is logistic normal, in short $z \sim LN(\mu, \Sigma)$.
Other distributions on the simplex:
Dirichlet: ratios of independent gammas
“Danish”: ratios of independent inverse Gaussians
Both have very limited correlation structure.
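The definition gives a direct recipe for simulation: draw the alr coordinates from $\mathrm{MVN}(\mu, \Sigma)$ and map back with the inverse alr. The sketch below uses only the Python standard library, so it includes a small Cholesky factorization; the parameter values are hypothetical and chosen to show a negative log-ratio correlation, the kind of structure a full $\Sigma$ can encode.

```python
import math
import random


def alr(z):
    """Additive log-ratio: (log(z_1/z_k), ..., log(z_{k-1}/z_k))."""
    return [math.log(v / z[-1]) for v in z[:-1]]


def alr_inv(y):
    """Inverse alr: exponentiate, append 1 for the reference part z_k, close."""
    e = [math.exp(v) for v in y] + [1.0]
    total = sum(e)
    return [v / total for v in e]


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def sample_ln(mu, Sigma, rng=random):
    """One draw of z ~ LN(mu, Sigma): alr(z) = mu + L e with e iid N(0, 1)."""
    L = cholesky(Sigma)
    e = [rng.gauss(0.0, 1.0) for _ in mu]
    y = [m + sum(L[i][j] * e[j] for j in range(i + 1)) for i, m in enumerate(mu)]
    return alr_inv(y)


rng = random.Random(1)
mu = [0.5, -0.2]                    # hypothetical values, k = 3 parts
Sigma = [[0.3, -0.2], [-0.2, 0.4]]  # negatively correlated log-ratios
z = sample_ln(mu, Sigma, rng)
print(z, sum(z))
```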
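Scalar multiplication: Let a be a scalar. Define
$\xi \otimes a = \left( \frac{\xi_1^a}{\sum_{i=1}^k \xi_i^a}, \dots, \frac{\xi_k^a}{\sum_{i=1}^k \xi_i^a} \right)$.
$(\nabla^{k-1}, \oplus, \otimes)$ is a complete inner product space, with inner product given, e.g., by
$\langle \xi, \eta \rangle = \mathrm{alr}(\xi)^T N^{-1} \mathrm{alr}(\eta)$,
where N is the multinomial covariance $N = I + jj^T$ and j is a vector of k−1 ones.
$\|\xi\| = \sqrt{\langle \xi, \xi \rangle}$ is a norm on the simplex.
The inner product and norm are invariant to permutations of the components of the composition.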
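The inner product can be computed without inverting N: by the Sherman-Morrison formula, $N^{-1} = I - jj^T/k$. The sketch below uses that identity; the function names and example compositions are illustrative. Its usage lines check the permutation invariance stated above by moving the reference part $z_k$ to the front, which changes the alr coordinates but not the inner product.

```python
import math


def closure(x):
    """Rescale positive parts so they sum to one."""
    total = sum(x)
    return [v / total for v in x]


def power(xi, a):
    """Scalar multiplication xi (x) a: raise every part to the power a, then close."""
    return closure([v ** a for v in xi])


def alr(z):
    """Additive log-ratios against the last part."""
    return [math.log(v / z[-1]) for v in z[:-1]]


def inner(xi, eta):
    """<xi, eta> = alr(xi)^T N^{-1} alr(eta) with N = I + j j^T (j: k-1 ones).

    Sherman-Morrison gives N^{-1} = I - j j^T / k, so no matrix inverse is
    needed: <xi, eta> = sum(a_i b_i) - sum(a_i) * sum(b_i) / k.
    """
    k = len(xi)
    a, b = alr(xi), alr(eta)
    return sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b) / k


def norm(xi):
    """||xi|| = sqrt(<xi, xi>), guarded against tiny negative rounding error."""
    return math.sqrt(max(inner(xi, xi), 0.0))


xi = [0.55, 0.15, 0.30]
eta = [0.20, 0.50, 0.30]
print(inner(xi, eta), norm(xi))
# Same inner product after moving the reference part to the front:
print(inner(xi[-1:] + xi[:-1], eta[-1:] + eta[:-1]))
```

The same value also equals the dot product of the centred log parts $\log \xi_i - \overline{\log \xi}$, which makes the permutation invariance evident.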
Some models
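Measurement error: $z_j = \xi \oplus \varepsilon_j$, where $\varepsilon_j \sim LN(0, \Sigma)$.
Regression: $\xi_j = \xi \oplus \gamma \otimes u_j$, where $\xi_j$ and $\xi$ are compositions and $u_j$ is a centered covariate.
Correspondence in Euclidean space: the linear model
$\mu_j = \beta_0 + \beta_1 (x_j - \bar{x})$
corresponds to
$\mathrm{alr}^{-1}(\mu_j) = \mathrm{alr}^{-1}(\beta_0) \oplus \mathrm{alr}^{-1}(\beta_1) \otimes (x_j - \bar{x})$.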
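The correspondence holds because alr turns $\oplus$ into vector addition and $\otimes$ into ordinary scalar multiplication. The sketch below checks this numerically: it evaluates the straight line on the alr scale and maps it back, then evaluates the same fitted value with simplex operations. The coefficients, covariate values and the helper name `fitted_composition` are hypothetical.

```python
import math


def closure(x):
    total = sum(x)
    return [v / total for v in x]


def perturb(a, b):
    """a (+) b."""
    return closure([x * y for x, y in zip(a, b)])


def power(xi, a):
    """xi (x) a."""
    return closure([v ** a for v in xi])


def alr_inv(y):
    """Inverse alr with the last part as reference."""
    return closure([math.exp(v) for v in y] + [1.0])


def fitted_composition(beta0, beta1, x, xbar):
    """alr^{-1}(beta0) (+) alr^{-1}(beta1) (x) (x - xbar)."""
    return perturb(alr_inv(beta0), power(alr_inv(beta1), x - xbar))


beta0 = [0.4, -0.1]  # hypothetical alr-scale intercept (k = 3 parts)
beta1 = [0.8, 0.3]   # hypothetical alr-scale slope
xbar = 2.0
for x in [0.0, 2.0, 5.0]:
    lhs = alr_inv([b0 + b1 * (x - xbar) for b0, b1 in zip(beta0, beta1)])
    rhs = fitted_composition(beta0, beta1, x, xbar)
    print(x, lhs, rhs)  # the two columns agree up to rounding
```

In practice this means a compositional regression can be fitted as an ordinary multivariate regression on alr coordinates and reported back on the simplex.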
Some regression lines
Time series (AR 1)
$z_{k+1} = (z_k \otimes \varphi) \oplus \varepsilon_k$
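Applying alr to the recursion $z_{k+1} = (z_k \otimes \varphi) \oplus \varepsilon_k$ gives $\mathrm{alr}(z_{k+1}) = \varphi\,\mathrm{alr}(z_k) + \mathrm{alr}(\varepsilon_k)$, an ordinary AR(1) in alr coordinates. The simulation sketch below assumes, purely for illustration, errors $\varepsilon_k \sim LN(0, \sigma^2 I)$; the talk does not specify the error covariance, and `simulate_ar1` is a hypothetical name.

```python
import math
import random


def closure(x):
    total = sum(x)
    return [v / total for v in x]


def perturb(a, b):
    """a (+) b."""
    return closure([x * y for x, y in zip(a, b)])


def power(xi, a):
    """xi (x) a."""
    return closure([v ** a for v in xi])


def alr(z):
    return [math.log(v / z[-1]) for v in z[:-1]]


def alr_inv(y):
    return closure([math.exp(v) for v in y] + [1.0])


def simulate_ar1(z0, phi, n, sd, rng):
    """Iterate z_{k+1} = (z_k (x) phi) (+) eps_k for n steps, starting from z0."""
    path = [z0]
    for _ in range(n):
        # Assumed error model: eps_k ~ LN(0, sd^2 I), i.e. iid normal alr coordinates.
        eps = alr_inv([rng.gauss(0.0, sd) for _ in range(len(z0) - 1)])
        path.append(perturb(power(path[-1], phi), eps))
    return path


rng = random.Random(0)
path = simulate_ar1([0.55, 0.15, 0.30], phi=0.7, n=5, sd=0.1, rng=rng)
for z in path:
    print([round(v, 3) for v in z])
```

With $|\varphi| < 1$ the process is pulled toward $\iota$, the simplex counterpart of a zero-mean stationary AR(1).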
A source receptor model
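Observe relative concentrations $Y_i$ of k species at a location over time.
Consider p sources with chemical profiles $\theta_j$. Let $\alpha_i$ be the vector of mixing proportions of the different sources at the receptor on day i.
$\theta_j \sim LN$, $\alpha_i \sim$ indep. LN, $\varepsilon_i \sim$ zero-mean LN
$E\,Y_i = \sum_{j=1}^p \alpha_{ij} \theta_j = \Theta \alpha_i$, where $\Theta$ is the $k \times p$ matrix with columns $\theta_j$
$Y_i = \Theta \alpha_i \oplus \varepsilon_i$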
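The model mixes ordinary linear algebra with simplex operations: $\Theta\alpha_i$ is a convex combination of source profiles, hence itself a composition, and the error then enters by perturbation. The sketch below generates one observation under hypothetical values (3 species, 2 sources). Its error draw closes $k$ iid log-normal parts, which yields a zero-mean $LN(0, \sigma^2 N)$ error with $N = I + jj^T$; that particular covariance is an assumption for illustration only.

```python
import math
import random


def closure(x):
    total = sum(x)
    return [v / total for v in x]


def perturb(a, b):
    """a (+) b."""
    return closure([x * y for x, y in zip(a, b)])


def mix(Theta, alpha):
    """Theta alpha = sum_j alpha_j theta_j; column j of Theta (k x p) is profile theta_j."""
    k, p = len(Theta), len(alpha)
    return [sum(Theta[i][j] * alpha[j] for j in range(p)) for i in range(k)]


def observe(Theta, alpha, sd, rng):
    """Y = Theta alpha (+) eps, with eps the closure of k iid exp(N(0, sd^2)) parts."""
    eps = closure([math.exp(rng.gauss(0.0, sd)) for _ in Theta])
    return perturb(mix(Theta, alpha), eps)


# Hypothetical profiles: each column sums to one.
Theta = [[0.6, 0.1],
         [0.3, 0.3],
         [0.1, 0.6]]
alpha = [0.25, 0.75]  # hypothetical mixing proportions on one day
print(mix(Theta, alpha))  # approx [0.225, 0.3, 0.475]
print(observe(Theta, alpha, sd=0.1, rng=random.Random(3)))
```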
Juneau air quality
50 observations of relative mass of 5 chemical species. Goal: determine the contribution of wood smoke to local pollution load.
Prior specification:
Inference by MCMC.
f(,α i, i,α ,Γ, ) =
f(α i α ,Γ) f( i )f(α )f(Γ)f( )
Wood smoke contribution
[Figure: estimated wood smoke contribution, with 95% CL and 50% CL.]
Source profiles
[Figure: estimated source profiles over the five species fluoranthene, pyrene, benzo(a), chrysene, benzo(b).]
State-space model
Space-time model of proportions
State-space model:
$z_j$: unobservable composition, $z_j \sim LN(\mu_j, \Sigma_j)$
$y_j$: k-vector of counts, $y_j \sim \mathrm{Mult}\left( \sum_{i=1}^k [y_j]_i,\ z_j \right)$
Inference using MCMC again
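The observation layer conditions on the total count $\sum_i [y_j]_i$, so only the split among the k classes carries information about $z_j$. The generative sketch below draws a latent composition and then multinomial counts with a fixed total; the diagonal covariance $\sigma^2 I$, the parameter values and the function names are hypothetical, and the multinomial draw is built from categorical draws with `random.choices`.

```python
import math
import random
from collections import Counter


def alr_inv(y):
    """Inverse alr: exponentiate, append 1 for the reference part, close."""
    e = [math.exp(v) for v in y] + [1.0]
    total = sum(e)
    return [v / total for v in e]


def multinomial(n, z, rng):
    """One Mult(n, z) draw, built from n categorical draws (adequate for modest n)."""
    draws = rng.choices(range(len(z)), weights=z, k=n)
    counts = Counter(draws)
    return [counts[i] for i in range(len(z))]


def simulate_site(mu_j, sd, n_j, rng):
    """Latent z_j ~ LN(mu_j, sd^2 I) (assumed covariance), observed y_j ~ Mult(n_j, z_j)."""
    z_j = alr_inv([m + rng.gauss(0.0, sd) for m in mu_j])
    y_j = multinomial(n_j, z_j, rng)
    return z_j, y_j


rng = random.Random(42)
z_j, y_j = simulate_site([0.4, -0.3], sd=0.2, n_j=120, rng=rng)
print(z_j)
print(y_j, sum(y_j))  # the counts add up to n_j = 120
```

An MCMC sampler for this model would alternate updates of the latent $z_j$ (or their alr coordinates) given the counts, and of $\mu_j, \Sigma_j$ given the $z_j$.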
Stability of arthropod food webs
Omnivory thought to destabilize ecological communities
Stability: Capacity to recover from shock (relative abundance in trophic classes)
Mount St. Helens experiment: 6 treatments in a 2-way factorial design; 5 reps.
Predator manipulation (3 levels)
Vegetation disturbance (2 levels)
Count arthropods 6 wks after treatment. Divide into specialized herbivores, general herbivores, predators.
Specification of structure
is generated from independent observations at each treatment
mean depends only on treatment
Benthic invertebrates in estuary
EMAP estuaries monitoring program: Delaware Bay 1990. 25 locations, 3 grab samples of bottom sediment during summer
Invertebrates in samples classified into:
– pollution tolerant
– pollution intolerant
– suspension feeders (control group; mainly palp worms)
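Site j, subsample t:
$\mu_j \sim$ CAR process, $z_{jt} \sim LN(\mu_j + \beta x_j, \Psi)$
$E(\mu_j \mid \mu_{-j}) = \mu + \frac{\lambda}{n_j} \sum_{k \in N(j)} (\mu_k - \mu)$
$\mathrm{Var}(\mu_j \mid \mu_{-j}) = \Gamma / n_j$
(N(j): the neighbours of site j; $n_j$: their number)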
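The CAR specification is defined through each site's conditional distribution given the others. The sketch below computes those conditional moments for vector-valued site effects on the alr scale, assuming the usual form $E(\mu_j \mid \mu_{-j}) = \mu + (\lambda/n_j)\sum_{k \in N(j)}(\mu_k - \mu)$ and $\mathrm{Var}(\mu_j \mid \mu_{-j}) = \Gamma/n_j$. The four-site transect, its values and the function names are hypothetical, not the Delaware Bay configuration.

```python
def car_conditional_mean(j, mu, neighbours, mu_bar, lam):
    """E(mu_j | mu_{-j}) = mu_bar + (lam / n_j) * sum over k in N(j) of (mu_k - mu_bar).

    mu maps each site to its vector of alr-scale effects; neighbours[j] lists
    the sites adjacent to j; mu_bar is the overall mean vector.
    """
    nbrs = neighbours[j]
    n_j = len(nbrs)
    return [
        m + lam / n_j * sum(mu[k][d] - m for k in nbrs)
        for d, m in enumerate(mu_bar)
    ]


def car_conditional_cov(j, neighbours, Gamma):
    """Var(mu_j | mu_{-j}) = Gamma / n_j: more neighbours, less conditional spread."""
    n_j = len(neighbours[j])
    return [[g / n_j for g in row] for row in Gamma]


# Four hypothetical sites along a transect, each adjacent to the next.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mu = {0: [1.0, 2.0], 1: [0.0, 0.0], 2: [3.0, -2.0], 3: [0.5, 0.5]}
print(car_conditional_mean(1, mu, neighbours, mu_bar=[0.0, 0.0], lam=0.5))  # [1.0, 0.0]
print(car_conditional_cov(1, neighbours, [[0.2, 0.05], [0.05, 0.1]]))
```

These full conditionals are what a Gibbs-type MCMC update for the site effects would use.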
Effect of salinity