why are you using that regression? western mensurationist meeting jim flewelling july, 2003


WHY ARE YOU USING THAT REGRESSION?

Western Mensurationist Meeting

Jim Flewelling July, 2003

FOCUS

• POPULATIONS
– VARIANCE IN RELATIONSHIPS

• OBJECTIVES
– USE OF REGRESSION

• TECHNIQUES are SECONDARY

TWO WORLDS

• SURVEY SAMPLING
– Fixed Populations
– Objective refers to Population

• REGRESSION ANALYSIS
– Relationships between variables
– Objectives refer to individuals or populations

SURVEY SAMPLING

• Fixed Population.

• Specified probability-sampling processes.

• Estimation of population parameters
– Unbiased estimators.

SURVEY SAMPLING

“If we are to infer from sample to population, the selection process is an integral part of the inference.”

- Stuart (1984, p. 4)

REGRESSIONS IN SURVEY SAMPLING

• AUXILIARY INFORMATION (X)
– Known for population.
– Increased precision.

• MODEL-ASSISTED ESTIMATORS COMMON (Särndal et al., 1992)

• MODEL-BASED ESTIMATORS

MODEL-ASSISTED SURVEY SAMPLING

Ratio of Means Estimator:

$$\hat{Y}_R = (\bar{y}/\bar{x})\,X$$

Asymptotically unbiased, whether or not y proportional to x.

Could be used to estimate individual y’s. No claim of unbiasedness here.
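
A minimal sketch of the ratio of means estimator, assuming X denotes the known population total of the auxiliary variable and that the result estimates the population total of y. The function name and the plot data are hypothetical and serve only to illustrate the arithmetic.

```python
def ratio_of_means_estimate(x_sample, y_sample, x_total):
    """Estimate the population total of y as (ybar / xbar) * X.

    x_total is assumed to be the known population total of x.
    """
    if len(x_sample) != len(y_sample) or len(x_sample) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    x_bar = sum(x_sample) / len(x_sample)
    y_bar = sum(y_sample) / len(y_sample)
    return (y_bar / x_bar) * x_total


# Hypothetical sample: x = auxiliary variable, y = variable of interest.
x_sample = [10.0, 20.0, 30.0, 40.0]
y_sample = [22.0, 38.0, 63.0, 77.0]
x_total = 5000.0  # assumed known for the whole population

y_total_hat = ratio_of_means_estimate(x_sample, y_sample, x_total)
print(y_total_hat)
```

The estimator uses only the two sample means, so it does not require y to be proportional to x for individual units.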

MODEL-BASED SURVEY SAMPLING

• Assumptions from Regression Analysis.
– True model
– E(e|x) = 0
– Errors are independent.

• Random selection avoids a source of bias.

• Inference from regression theory, not the distribution of samples.

• Theory from Royall (1970).

REGRESSION ANALYSIS

• Least Squares - Legendre (1805) and Gauss.

• Sir Francis Galton (1877, 1885):

Offspring of seeds “did not tend to resemble their parent seeds in size, but to be always more mediocre [i.e., more average] than they - to be smaller than the parents, if the parents were large … the mean filial regression towards mediocrity was directly proportional to the parental deviation from it.” (quoted from Draper & Smith)

LEAST SQUARES REGRESSION

$$\hat{y} = \bar{y} + \hat{\rho}\,(s_y/s_x)\,(x - \bar{x})$$

$$\mathrm{Var}(\hat{y}) < \mathrm{Var}(y)$$
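
A short check of the variance claim, written out under the usual sample definitions. Here $\hat{\rho}$ is taken to be the sample correlation and $s_x$, $s_y$ the sample standard deviations, which is an assumption about the slide's notation.

$$\hat{y}_i - \bar{y} = \hat{\rho}\,\frac{s_y}{s_x}\,(x_i - \bar{x})$$

$$\mathrm{Var}(\hat{y}) = \hat{\rho}^{2}\,\frac{s_y^{2}}{s_x^{2}}\,s_x^{2} = \hat{\rho}^{2}\,s_y^{2} \le s_y^{2} = \mathrm{Var}(y)$$

Equality holds only when $|\hat{\rho}| = 1$, so predictions are less variable than observations whenever the fit is imperfect.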

GEOMETRIC MEAN REGRESSION

$$\hat{y} = \bar{y} + \mathrm{sign}(\hat{\rho})\,(s_y/s_x)\,(x - \bar{x})$$

Preserves Variance

Discussion by Ricker (1984)
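
A small sketch comparing the two fits on simulated data. It assumes the least squares slope is r(s_y/s_x) and the geometric mean regression slope is sign(r)(s_y/s_x), with both lines passing through the means. The function name and the data are hypothetical.

```python
import numpy as np


def fit_line(x, y, method="ols"):
    """Return (intercept, slope) for a least squares or geometric mean fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    scale = y.std(ddof=1) / x.std(ddof=1)
    if method == "ols":
        slope = r * scale
    elif method == "gmr":
        slope = np.sign(r) * scale
    else:
        raise ValueError("method must be 'ols' or 'gmr'")
    # Both lines pass through the point of means.
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


# Simulated (hypothetical) data with an imperfect linear relationship.
rng = np.random.default_rng(0)
x = rng.normal(50.0, 10.0, 500)
y = 5.0 + 0.8 * x + rng.normal(0.0, 8.0, 500)

for method in ("ols", "gmr"):
    intercept, slope = fit_line(x, y, method)
    predicted = intercept + slope * x
    variance_ratio = predicted.var(ddof=1) / y.var(ddof=1)
    print(method, slope, variance_ratio)
```

The printed variance ratio is below one for the least squares fit and equal to one (up to rounding) for the geometric mean fit, which is the sense in which the latter preserves variance.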

HEIGHT-AGE CURVES

• Site Curves (Curtis)

• Site Index Prediction Functions

• Geometric Mean Regression

• Stochastic Differential Equation

• Height Growth Models

• Percentile Models

Site Curves and SI Prediction Functions

• Curtis et al. (1974)

• Site Curve - Yield table construction
– H = f(A, SI).

• SI Prediction Function - Site Classification
– SI = f(A, H).

SITE CURVES, SI PREDICTION, and GMR

SI = H (index age)

HA = H (age A)

3 Lines:

All at mean (HA, SI)

Slope = $(\sigma_{SI}/\sigma_{HA}) \times \{\rho,\ 1,\ 1/\rho\}$

Straight-line assumption valid for bivariate normal.
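
A sketch of the three slopes on simulated bivariate normal data, assuming they are expressed in the plane of SI against HA: the SI prediction line (SI regressed on HA), the geometric mean regression, and the site curve (HA regressed on SI, then inverted). The means, variances and correlation below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: HA sd 4, SI sd 5, correlation 0.72.
mean = [20.0, 30.0]
cov = [[16.0, 14.4], [14.4, 25.0]]
ha, si = rng.multivariate_normal(mean, cov, size=2000).T

rho = np.corrcoef(ha, si)[0, 1]
sd_ratio = si.std(ddof=1) / ha.std(ddof=1)

slope_si_prediction = rho * sd_ratio   # SI regressed on HA
slope_gmr = sd_ratio                   # geometric mean regression
slope_site_curve = sd_ratio / rho      # HA regressed on SI, inverted

# Cross-check the two regression slopes with ordinary least squares fits.
check_si_prediction = np.polyfit(ha, si, 1)[0]
check_site_curve = 1.0 / np.polyfit(si, ha, 1)[0]

print(slope_si_prediction, slope_gmr, slope_site_curve)
```

With a positive correlation the three slopes are ordered, and the geometric mean regression slope is the geometric mean of the other two.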

Stochastic Differential Equation (Garcia, 1979)

• $dH/dt = (b/c)\,H\,\{(a/H)^{c} - 1\}$
– b is plot-specific; (a, c) are global.

• Integrates to Chapman-Richards.

• Add Wiener process error to growth.

• Add measurement errors at intervals.

• Fit with Maximum Likelihood.

• It’s a growth model; also a base-age invariant site curve.
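
A sketch of the deterministic part only, with no Wiener process or measurement error. Substituting u = H^c turns dH/dt = (b/c) H {(a/H)^c − 1} into du/dt = b(a^c − u), which gives the Chapman-Richards form used in `height_closed_form` below. The parameter values and function names are hypothetical, and the numerical integration is only a check on that algebra.

```python
import math


def growth_rate(h, a, b, c):
    """Right-hand side dH/dt = (b/c) * H * ((a/H)**c - 1)."""
    return (b / c) * h * ((a / h) ** c - 1.0)


def height_closed_form(t, h0, t0, a, b, c):
    """Chapman-Richards solution passing through (t0, h0)."""
    decay = math.exp(-b * (t - t0))
    return a * (1.0 - (1.0 - (h0 / a) ** c) * decay) ** (1.0 / c)


def height_rk4(t, h0, t0, a, b, c, steps=2000):
    """Integrate the growth equation from t0 to t with classical RK4."""
    h = h0
    dt = (t - t0) / steps
    for _ in range(steps):
        k1 = growth_rate(h, a, b, c)
        k2 = growth_rate(h + 0.5 * dt * k1, a, b, c)
        k3 = growth_rate(h + 0.5 * dt * k2, a, b, c)
        k4 = growth_rate(h + dt * k3, a, b, c)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h


# Hypothetical parameters: asymptote a, rate b, shape c.
a, b, c = 40.0, 0.03, 0.6
h0, t0 = 2.0, 5.0  # height 2 at age 5

numeric = height_rk4(50.0, h0, t0, a, b, c)
exact = height_closed_form(50.0, h0, t0, a, b, c)
print(numeric, exact)
```

Because the closed form can start from any (t0, h0) on the curve, no particular index age is needed, which is one way to read the base-age invariance noted above.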

Height Growth Model

• Family of H-A curves.

• From any one age, predict height difference to next or previous age.

• Parameters adjusted to minimize errors in predicted growth (Bonnor et al., 1995; Flewelling et al., 2001).

• Crude, ignores measurement errors and correlations between periods. Flexible model form.

• It’s a growth model - attempts to model H-A trajectories of plots. Base-age invariant.

Percentile Models

• Concept by Pienaar and Clutter (Clutter et al., 1983).

• Example by Bi et al. (2002).

• Extends to irregular data. (Flewelling, 1982, unpublished).

• Current econometrics theory, rich history.

Percentile Models

• Pienaar and Clutter:

Percentiles as a labeling device: “useful in illustrating the fact that index age is not a fundamental or required concept in the use of site index to express site quality.”

Percentile Models, Example

• Bi et al. (2002)

• Temporary plots (age and site assumed orthogonal).

• H(t) assumed to have normal distribution.

• Q0.75 and Q0.25, fit as functions of t.

– methodology from Koenker and Bassett (1978)

• Mean H(t) fit with weighted regression.
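
A sketch of fitting quartile lines by minimizing the Koenker and Bassett check function. Two things here are assumptions for illustration: the quantiles are modelled as straight lines in t (the functional forms used by Bi et al. are not given here), and the fit uses a brute-force search over lines through pairs of points rather than the linear-programming algorithms used in practice. The data are simulated.

```python
import itertools

import numpy as np


def check_loss(residuals, tau):
    """Sum of tau*r for r >= 0 and (tau - 1)*r for r < 0."""
    return np.sum(residuals * (tau - (residuals < 0).astype(float)))


def fit_linear_quantile(t, h, tau):
    """Return (intercept, slope) minimizing the check loss.

    For a two-parameter line an optimum can be found among lines that
    pass through two data points, so all pairs are tried.
    """
    best = None
    for i, j in itertools.combinations(range(len(t)), 2):
        if t[i] == t[j]:
            continue  # vertical line, not a function of t
        slope = (h[j] - h[i]) / (t[j] - t[i])
        intercept = h[i] - slope * t[i]
        loss = check_loss(h - (intercept + slope * t), tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Simulated (hypothetical) temporary-plot data: age t and height h.
rng = np.random.default_rng(2)
t = rng.uniform(10.0, 60.0, 80)
h = 5.0 + 0.4 * t + rng.normal(0.0, 3.0, 80)

q25 = fit_linear_quantile(t, h, 0.25)
q75 = fit_linear_quantile(t, h, 0.75)
frac_below_q75 = np.mean(h < q75[0] + q75[1] * t)
print(q25, q75, frac_below_q75)
```

Roughly three quarters of the observations fall below the fitted 0.75 line, which is the defining property of a regression quantile.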

Percentile Model, Irregular data.

• Sectioned tree data, height every year.

• Younger ages: full data set.

• Older ages: reduced data set.

• Establish tree percentiles at young age.

• Reassign censored percentiles at older ages.

• Compute (and model) means and standard deviations from heights and percentiles.

Percentile models, econometrics

• Koenker (2000):

• wonderful discussion of least squares, alternative methods, and statistical history.

• Minimization of summed absolute errors dates from 1760’s.

Height-Age Curves. Questions

• Should height growth models be the same as constant percentile curves?

• Are regressions from one age to another wanted?

• Is there any use for an index age other than as a label?

POPULATIONS

WHICH PROJECTION IS WANTED?

TREE GROWTH MODELS

DBH

• Mortality fractions.

• What ensures that the variance of projected stand table is correct?
– Need variance models as constraints?
– Different fitting techniques?
– Good luck and occasional checking?

RIGHT INDEPENDENT VARIABLES?

Regional H-DBH Curves.

Biased by Age or position in stand.

Alternative: local curves, another variable.

Bayesian Regression

• Neglected in Forestry?

• Empirical Bayes used in volume equations (Green and Strawderman, 1985).

• Taper and volume equations by forest district (McTague, Stansfield and Lan, 1992).

• Other opportunities?

Bayesian Opportunity

• Fit y = a0 + a1x1 + a2x2 + a3x3 + …..

– Often by species or other category.
– Coefficients tested and omitted if non-significant.
– Or, selected coefficients fit in common for all species.
– Bayesian regression or other methods better?
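
One possible alternative to testing and omitting coefficients is to shrink each species' coefficient toward a common value. The sketch below is a simple normal-normal empirical Bayes calculation with a crude moment estimate of the between-species variance. It is an illustration only, not the method of any paper cited here, and the slope estimates and their sampling variances are hypothetical.

```python
def empirical_bayes_shrink(estimates, variances):
    """Shrink per-group estimates toward a precision-weighted mean.

    estimates: per-species coefficient estimates
    variances: their sampling variances
    Returns (grand_mean, between_variance, shrunk_estimates).
    """
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    grand_mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Crude method-of-moments between-group variance, truncated at zero.
    spread = sum((e - grand_mean) ** 2 for e in estimates) / (k - 1)
    between_variance = max(0.0, spread - sum(variances) / k)
    shrunk = []
    for e, v in zip(estimates, variances):
        # Shrinkage factor: near 1 for noisy estimates, near 0 for precise ones.
        factor = v / (v + between_variance)
        shrunk.append(factor * grand_mean + (1.0 - factor) * e)
    return grand_mean, between_variance, shrunk


# Hypothetical slope estimates for four species; the last is poorly sampled.
estimates = [0.80, 1.10, 0.95, 1.60]
variances = [0.01, 0.02, 0.01, 0.20]

grand_mean, between_variance, shrunk = empirical_bayes_shrink(estimates, variances)
print(grand_mean, between_variance, shrunk)
```

Well-estimated coefficients stay close to their own values, while the poorly sampled species is pulled strongly toward the common mean instead of being either kept as is or dropped.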

OTHER REGRESSION TECHNIQUES

• ML with better error characterization.

• Mixed models.

• Systems: Seemingly unrelated regression, 2SLS, 3SLS, …

• Generally more efficient, with better estimates of parameter variance; possibly avoid some biases. Necessary?

• Imputation?

SUMMARY

• What does population look like?

• What should be described?

• What techniques allow that?

REFERENCES

• Bi, H., A.D. Kozek and I.S. Ferguson. 2002. Quantile-based site index curves: a brief introductory note. Proc. of IUFRO Symposium on Statistics and Technology in Forestry, Sept 8-12, 2002, Blacksburg. [May be a related 2003 paper in J. of Agr., Biological, and Environmental Statistics.]

• Bonnor, G.M., R. J. DeJong, P. Boudewyn and J. Flewelling. 1995. A guide to the STIM growth model. Nat. Res. Canada. Info Rpt X-353.

• Clutter, J.L., J.C. Fortson, L.V. Pienaar, G.H. Brister and R.L. Bailey. 1983. Timber management: a quantitative approach. Krieger Publ., Malabar, FL. 333 p.

• Curtis, R.O., D.J. DeMars and F.R. Herman. 1974. Which dependent variable in site index-height-age regressions? For. Sci. 20: 74-87.

• Draper, N.R. and H. Smith. 1998. Applied Regression Analysis. Wiley, New York. 706 p.

• Flewelling, J. 1982. Dominant height trends for plantations of loblolly pine at the Mississippi/Alabama region of Weyerhaeuser Company. Research Rpt 050-3415/3. Weyerhaeuser Forestry Research, Hot Springs. (unpublished)

• Flewelling, J., R. Collier, B. Gonyea, D. Marshall and E. Turnblom. 2001. Height-age curves for planted stands of Douglas fir, with adjustments for density. SMC Working Paper No. 1, Univ. of WA, Seattle.

REFERENCES

• Garcia, O. 1979. A stochastic differential equation model for height growth of forest stands. Biometrics 39: 1059-1072.

• Green, E. and W.E. Strawderman. 1985. The use of Bayes/Empirical Bayes Estimation in Individual Tree Volume Equation Development. For. Sci. 31: 975-990.

• Koenker, R. 2000. Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. J of Econometrics 95: 347-374.

• Koenker, R.W. and G.W. Bassett. 1978. Regression quantiles. Econometrica 46: 33-50.

• McTague, J.P., W.F. Stansfield, Z. Lan. 1992. Southwestern ponderosa pine, Douglas fir and white fir volume and taper functions. Report to USFS. Northern Arizona University.

• Ricker, W.E. 1984. Computation and uses of central trend lines. Can. J. Zool. 62: 1897-1905.

• Royall, R.M. 1970. On finite population sampling theory under certain linear regression models. Biometrika 57: 377-387.

• Särndal, C., B. Swensson, J. Wretman . 1992. Model assisted survey sampling. Springer-Verlag, New York. 694 p.

• Stuart, A. 1984. The ideas of sampling. Macmillan, New York. 91 p.

COMMENTS?

QUESTIONS?