Understanding Species’ Abundance Bikas K Sinha Retired Faculty INDIAN STATISTICAL INSTITUTE KOLKATA ********************* RU Workshop : 18/04/2012


Page 1: Understanding   Species’ Abundance

Understanding Species’ Abundance

Bikas K Sinha
Retired Faculty

INDIAN STATISTICAL INSTITUTE KOLKATA

*********************
RU Workshop : 18/04/2012

Page 2: Understanding   Species’ Abundance

ABSTRACT

We consider a multi-species assemblage in an infinite population with unknown and possibly heterogeneous species' abundance levels. A random sample of fixed size (n) has been drawn, realizing a certain species distribution.

• At this stage, there are two interesting inference problems :

• (i) Prediction of the unknown proportion of the collective abundance of all hitherto unrealized species

• (ii) Assessment of the quantum of additional units to be sampled [in terms of the sample size n] to realize a certain number of hitherto unobserved species.

Page 3: Understanding   Species’ Abundance

ABSTRACT…..contd.

Finite population analogue of this problem is also worth discussing.

I propose to review the available literature in this fascinating area of research.

Page 4: Understanding   Species’ Abundance

Formulation….

• Measurement of biodiversity is an important issue in ecological studies and in planning wildlife conservation. The conventional method of measurement involves sampling in various forms, such as line transects or quadrats. Many field studies appear to be rather ad hoc in their design, especially with respect to the amount of effort put in. A crucial question in this context is how much sampling effort should be considered enough.

Page 5: Understanding   Species’ Abundance

Formulation…..

• Two quantitative aspects of diversity are widely regarded as central to its measurement. These are: Species Richness (number of species of a taxon in a given geographical area) and Species Evenness (differences in relative abundance). The two can be combined in various ways to form diversity indices (see e.g. Patil and Taillie (1982))

Page 6: Understanding   Species’ Abundance

Formulation….

• Earlier studies (Gore and Paranjpe 1997) indicate that for estimation of diversity indices, a sample of 1000 units should suffice. On the other hand, effort needed to estimate species richness is one order of magnitude higher.

• Consider a virtually infinite population with an underlying species’ distribution which is naturally unknown. It makes sense to ‘estimate’ the number of [distinct] likely available species PROVIDED ALL ARE ‘Equally Abundant’ ! In the 1950’s – 1980’s there have been studies along this direction.

Page 7: Understanding   Species’ Abundance

Formulation….

• More realistically, we can put a cap ‘K’ on the number of species and investigate the nature of their unknown and most likely heterogeneous abundance distribution given by $P_1, P_2, \ldots, P_K$. The labels are ‘hypothetical’ in nature.

A random sample of fixed size (n) has been drawn, realizing a certain species distribution. We may ‘re-label’ the realized species as #1, #2, …, #$k_n$, say, with a decreasing observed abundance distribution $p_1 \ge p_2 \ge \cdots \ge p_{k_n}$.

Page 8: Understanding   Species’ Abundance

Formulation…..

• At this stage, there are two interesting inference problems :

• (i) Prediction of the unknown proportion of the collective abundance of all hitherto unrealized species

• (ii) Assessment of the quantum of additional units to be sampled [in terms of the sample size n] to realize a certain number of hitherto unobserved species.

Page 9: Understanding   Species’ Abundance

Approach…..

• Note : As n tends to infinity, $k_n$ tends to K and the $p_i$'s tend to cover all the $P_i$'s up to a permutation.

• Finite Sample Study :

• $I(i; s_n) = 1$ if the species labelled ‘i’ has not been realized in the sample $s_n$ of size n, and 0 otherwise [i = 1 to K]

The unexplored collective species’ abundance parameter is given by

$\Theta(s_n) = \sum_{i=1}^{K} P_i \, I(i; s_n)$

and we want to predict $\Theta(s_n)$, based on the given realization.

Note that $\Theta(s_n)$ is a sample-dependent random quantity.

Page 10: Understanding   Species’ Abundance

Solution…..

• Concept of Unbiased Prediction

• $U(s_n)$ : Unbiased Predictor of $\Theta(s_n)$ iff

• $E[U(s_n) - \Theta(s_n)] = 0$.

• Note : $E[\Theta(s_n)] = \Theta_n$, say, given by

• $\Theta_n = \sum_{i=1}^{K} P_i (1 - P_i)^n$

• Interpretation of $\Theta_n$ : Having drawn a random sample of size n and having encountered some species, it is the chance that the next observation, randomly drawn from the same population, will produce a NEW species.
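The interpretation of Θ_n as the chance that draw n+1 yields a new species can be checked numerically. The sketch below is only an illustration: the abundance vector `probs` is hypothetical (it is not from the talk), and the function names are made up for this example.

```python
import random


def expected_unseen_mass(probs, n):
    """Theta_n = sum_i P_i * (1 - P_i)**n for a known abundance vector."""
    return sum(p * (1.0 - p) ** n for p in probs)


def simulate_new_species_rate(probs, n, runs=20000, seed=0):
    """Monte Carlo estimate of P(draw n+1 is a species absent from the first n draws)."""
    rng = random.Random(seed)
    labels = range(len(probs))
    hits = 0
    for _ in range(runs):
        draws = rng.choices(labels, weights=probs, k=n + 1)
        if draws[-1] not in draws[:-1]:
            hits += 1
    return hits / runs


# Hypothetical abundances for K = 5 species (an assumption for illustration).
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
theta_10 = expected_unseen_mass(probs, 10)
simulated_10 = simulate_new_species_rate(probs, 10)
print(f"Theta_10 = {theta_10:.4f}, simulated = {simulated_10:.4f}")
```

In practice the abundances are unknown, which is why the talk turns to predictors built from the sample alone.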

Page 11: Understanding   Species’ Abundance

SOLUTION….

• Estimation of $\Theta_n$ : In the exact sense

• Each summand above is a polynomial of degree (n+1) in $P_i$, the last term being $(-1)^n P_i^{n+1}$.

It is therefore impossible to estimate $\Theta_n$ unbiasedly based on a sample of size n. Ignoring each summand's last term in $\Theta_n$, $U(s_n)$ has a routine form: term-by-term estimation is a routine task for each ‘i’.

This gives a biased predictor, which may be OK in most cases, even for moderate n.
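The degree argument can be spelled out with a binomial expansion. This is standard algebra added for illustration, not taken from the slides; $n_i$ denotes the frequency of species i among the n draws, a symbol introduced here.

```latex
P_i (1 - P_i)^n \;=\; \sum_{r=0}^{n} (-1)^r \binom{n}{r} P_i^{\,r+1},
\qquad \text{last term } (r = n):\; (-1)^n P_i^{\,n+1}.

% For each power r \le n, the factorial-moment identity gives an unbiased estimator:
E\!\left[ \frac{n_i (n_i - 1) \cdots (n_i - r + 1)}{n (n - 1) \cdots (n - r + 1)} \right] \;=\; P_i^{\,r},
\qquad 1 \le r \le n.
```

Every term up to degree n can thus be estimated unbiasedly from n draws, while the degree-(n+1) term cannot, which is what forces either the biased predictor or extra observations.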

Page 12: Understanding   Species’ Abundance

Unbiased Prediction : Second Stage Sampling…..

• Go for additional ‘m’ observations ($s^*_m$) and mix them with $s_n$ to form ‘updated’ frequency counts of ‘already observed’ and possibly ‘new’ species categories.

• We are then able to define $U(s_n \cup s^*_m)$ for unbiased prediction of $\Theta(s_n)$, i.e., unbiased estimation of $\Theta_n$.

• Theoretical Results & Illustrative Examples follow.

Page 13: Understanding   Species’ Abundance

Theory of Estimation…..

• $\Theta_n = \sum_i P_i (1 - P_i)^n = \sum_i \Theta_{i,n}$

• $\Theta_{i,n}$ = polynomial of degree (n+1) in $P_i$

The standard formula for unbiased estimation of a polynomial function, based on (n+m; m ≥ 1) observations, gives $\hat{\Theta}_{i,n} = U_i(s_n \cup s^*_m)$ for each i.

These can be combined over all i's as $\sum_i U_i(\cdot)$ and expressed in a nice form as $\hat{\Theta}_n = U(\cdot)$.

Page 14: Understanding   Species’ Abundance

Formulae…..

$U(\cdot) = \sum_{j=1}^{m} \dfrac{\binom{m-1}{j-1} \, V_{j,\,n+m}}{\binom{n+m}{j}}$

$V_{j,\,n+m}$ = # of species categories each with frequency ‘j’ in the combined sample of size n+m

Starr [1979, Annals of Statistics]

Special Case : m = 1

$U(\cdot) = V_{1,\,n+1} / (n+1)$ : Robbins [1968, AMS]

Note : $U(\cdot)$ is the UMVUE for $\Theta_n$.

Clayton & Frees [JASA, 1987]; Nayak [JSPI, 1992]
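The predictor above needs only the frequency counts of the combined sample. A minimal sketch follows, using exact fractions; the function name `starr_predictor` is chosen here for illustration and is not from the references.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def starr_predictor(freqs, m):
    """U = sum_{j=1..m} C(m-1, j-1) * V_j / C(n+m, j).

    freqs: species frequency counts in the combined sample of size n+m.
    m:     number of second-stage observations (1 <= m <= n+m).
    """
    total = sum(freqs)                      # n + m
    v = Counter(f for f in freqs if f > 0)  # v[j] = V_{j, n+m}
    return sum(
        Fraction(comb(m - 1, j - 1) * v[j], comb(total, j))
        for j in range(1, m + 1)
    )


# m = 1 reduces to the Robbins form V_1 / (n + 1).
print(starr_predictor([3, 1, 1], m=1))  # 2 singletons in 5 draws
```

Only species with frequency at most m contribute, so abundant species never enter the sum.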

Page 15: Understanding   Species’ Abundance

Illustrative Examples….

• K = 50; n = 100; m = 5 addl. units in Stage II

• Species Freq. Counts Before / After Stage II

Label   Before   Added   After
1       47       +2      49
2       33       +2      35
3       11       +0      11
4        9       +0       9
5        -       +1       1

Page 16: Understanding   Species’ Abundance

Computations….

• $V_{1,\,105} = 1$; the rest are all 0's

• $U(\cdot) = 1/105 < 1\%$

• Very little chance of discovery of new species at this stage.

Page 17: Understanding   Species’ Abundance

Back Tracking……

• If we have maintained a record of all the 100 incoming observations, we may use the last 5 to develop a prediction formula for the initial sample of 95. Also we can randomly select a subset of 5 and do the same exercise. Repeated sampling will produce a prediction distribution for an additional sample of size 5 and an initial sample of size 95.

Page 18: Understanding   Species’ Abundance

Computations….

• K = 50; n_ = 95 [deleting last 5]; n = 100

• Species Freq. Counts

Label   Total of 95   Added   Total of 100
1       45            +2      47
2       31            +2      33
3       11            +0      11
4        8            +1       9

• $V_{j,\,100} = 0$ for all 1 ≤ j ≤ 5

• $U(\cdot) = 0$ : very low chance of discovery of new species after 95 draws.

Page 19: Understanding   Species’ Abundance

Finite Population Search….

• Measurement of Species’ Richness [# Species]

• Prediction of Species’ Abundance [Prop. Unexplored]

• Natural Community : Trees / Birds / Mussels

Gore, A.P. and Paranjpe, S.A. (2001). A Course in Mathematical and Statistical Ecology, Kluwer Academic Publishers, London.

• Empirical findings : Sample size : 1000 for Species’ Abundance; 10,000 for Species’ Richness [for a bird community with 500 species]

Page 20: Understanding   Species’ Abundance

Prediction of Species’ Abundance: Finite Population Inference

• N = Number of units in a finite population

• K = Cap on the number of distinct species

• $N_1, N_2, \ldots, N_K$ : species-specific sizes

n = sample size under SRSWR/SRSWOR sampling

$s_n$ = sample of n units out of N units, With / Without Replacement

WLOG : Observed species 1, 2, …, $k_n$ with frequency counts $n_1, n_2, \ldots, n_{k_n}$ so that

$n = \sum_{i=1}^{k_n} n_i$; $n_i > 0$

Page 21: Understanding   Species’ Abundance

Finite Population Inference

Unexplored Species Abundance = $\sum_{j > k_n} (N_j / N)$

Prediction of $\Theta(s_n) = \sum_{j > k_n} (N_j / N)$, excluding the abundance of all those $k_n$ species captured by $s_n$.

Page 22: Understanding   Species’ Abundance

SRSWR (N, n) : Inference…..

• Under SRSWR(N, n) : SAME RESULTS HOLD

• Define $P_i = N_i / N$; i = 1, 2, …, K

• As before, $\Theta(s_n) = \sum_i P_i \, I(i; s_n)$ and we want to predict $\Theta(s_n)$

• The Unbiased Predictor calls for m [≥ 1] additional sample units and is given by

$U(\cdot) = \sum_{j=1}^{m} \dfrac{\binom{m-1}{j-1} \, V_{j,\,n+m}}{\binom{n+m}{j}}$

$V_{j,\,n+m}$ = # of species categories each with frequency ‘j’ in the combined sample of size n+m

Page 23: Understanding   Species’ Abundance

SRSWOR (N, n) : Results

Use $I(i; s_n) = 1$ if Species # i is not represented in $s_n$

$\Theta(s_n) = \sum_i P_i \, I(i; s_n)$ = combined proportion of units of unobserved species

$\Theta_n = E[\Theta(s_n)] = \sum_i P_i \, E[I(i; s_n)]$

SRSWR : $E[I(i; s_n)] = (1 - P_i)^n$

SRSWOR : $E[I(i; s_n)] = \binom{N - N_i}{n} / \binom{N}{n}$

$\Theta_n = \sum_i P_i \binom{N - N_i}{n} / \binom{N}{n}$

P[Discovery of New Species in next draw] is given by

$\sum_i \dfrac{N_i}{N - n} \cdot \dfrac{\binom{N - N_i}{n}}{\binom{N}{n}} = \dfrac{N \, \Theta_n}{N - n} = \Theta^*_n$, say.
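The without-replacement quantities can be evaluated exactly for a small population. The sketch below assumes a hypothetical population with species sizes 5, 3 and 2 (not from the talk) and checks the identity Θ*_n = N Θ_n / (N - n); the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def theta_n_swor(sizes, n):
    """Theta_n under SRSWOR: sum_i (N_i/N) * C(N - N_i, n) / C(N, n)."""
    N = sum(sizes)
    return sum(Fraction(Ni, N) * Fraction(comb(N - Ni, n), comb(N, n)) for Ni in sizes)


def theta_star_n_swor(sizes, n):
    """P(new species on draw n+1) under SRSWOR; requires n < N."""
    N = sum(sizes)
    return sum(Fraction(Ni, N - n) * Fraction(comb(N - Ni, n), comb(N, n)) for Ni in sizes)


def theta_n_swr(sizes, n):
    """Theta_n under SRSWR: sum_i P_i * (1 - P_i)**n with P_i = N_i / N."""
    N = sum(sizes)
    return sum(Fraction(Ni, N) * (1 - Fraction(Ni, N)) ** n for Ni in sizes)


# Hypothetical population: N = 10 units in K = 3 species.
sizes = [5, 3, 2]
n = 3
N = sum(sizes)
print(theta_n_swor(sizes, n), theta_star_n_swor(sizes, n), theta_n_swr(sizes, n))
```

`math.comb` returns 0 when fewer than n units lie outside a species, so a species that must appear in the sample contributes nothing.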

Page 24: Understanding   Species’ Abundance

SRSWOR (N, n) : Results

Needed : additional units based on SRSWOR(N-n, m)

Theorem : The UMVUE of $\Theta^*_n$ is given by

$\sum_{j=1}^{m} \dfrac{\binom{m-1}{j-1} \, V_{j,\,n+m}}{\binom{n+m}{j}}$

$V_{j,\,n+m}$ = # of species categories each with frequency ‘j’ in the combined sample of size n+m

Note : Estimate of $\Theta_n$ = (1 - f) × Estimate of $\Theta^*_n$

f = n/N = sampling fraction

Page 25: Understanding   Species’ Abundance

SRSWR (N, n) : Distinguishable UNITS ?

So far, the tacit assumption has been that units within species are indistinguishable, so the frequency counts $V_{j,\,n+m}$ were relevant and informative.

For finite populations with within-species distinguishable units, WR sampling allows for repeated units, and use of the distinct units will improve the estimation results. SINHA & SENGUPTA (1993) : CSA Bulletin, 43, 75-84.

Page 26: Understanding   Species’ Abundance

SRSWR : Data Analysis…..

Notations

• n = initial SRSWR (N, n) sample size

• $k_n$ = Number of distinct species observed initially [WLOG : 1, 2, …, $k_n$]

• m = additional SRSWR (N, m) sample size

• $\Theta(s_n) = \sum_{j > k_n} [N_j / N]$ = Abundance of Unobserved Species

• $U(\cdot)$ = Unbiased Predictor of $\Theta(s_n)$ based on $[s_n \cup s^*_m]$

• d = Number of distinct units in $[s_n \cup s^*_m]$

• $S_d$ = set of d distinct units

Page 27: Understanding   Species’ Abundance

Rao-Blackwellization….

• Improved Estimator = $E[U(\cdot) \mid S_d]$

• Recall the expression for $U(\cdot)$ given by $U = \sum_i U_i(s_n \cup s^*_m)$

The conditional expectation of the i-th term is $E[U_i(s_n \cup s^*_m) \mid S_d]$ and it is evaluated as

$\sum_{j=1}^{m} \binom{m-1}{j-1} \, \dfrac{[\Delta^{d_i} 0^{j}] \, [\Delta^{d - d_i} 0^{\,n+m-j}]}{\Delta^{d} 0^{\,n+m}}$

where Δ = Delta Operator and $d_i$ = No. of distinct units from the i-th species category in the combined sample of size n+m

Page 28: Understanding   Species’ Abundance

Special Cases….

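Below we derive expressions for the i-th term.

• m = 1 : $\Delta^{d-1} 0^{n} / \Delta^{d} 0^{n+1}$ if $d_i = 1$; 0 otherwise

• m = 2 : $2 \, \Delta^{d-2} 0^{n} / \Delta^{d} 0^{n+2}$ if $d_i = 2$; $[\Delta^{d-1} 0^{n+1} + \Delta^{d-1} 0^{n}] / \Delta^{d} 0^{n+2}$ if $d_i = 1$; 0 otherwise

And so on…..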

Page 29: Understanding   Species’ Abundance

Illustrative Examples…

• SRSWR(N, n) : Distinguishable Units within each species

• Population Size N = 1000

• K = Cap on the no. of species = 20

• Initial Sample Size n = 50

• $k_n$ = observed no. of species = 6

• Freq. counts of obs. species : 21, 13, 9, 4, 2, 1

• Addl. Sample size m = 5

• Revised Freq. Counts : 21+2, 13+1, 9+0, 4+0, 2+0, 1+0, 0+1, 0+1 [2 new species are observed]

Page 30: Understanding   Species’ Abundance

Computations….

• Species-specific distinct units in the combined sample of size 55 : 4, 3, 2, 1, 1, 1, 1

• d = 13; $d_1 = 4$, $d_2 = 3$, etc.

• We need computations of $U_i(\cdot)$ for i = 1 to 7, conditional on the sets, holding the $d_i$'s fixed.

• Case : i = 1; $d_1 = 4$

$\sum_{j=1}^{5} \binom{4}{j-1} \, \dfrac{[\Delta^{4} 0^{j}] \, [\Delta^{9} 0^{\,55-j}]}{\Delta^{13} 0^{55}}$

Etc.
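The "differences of zero" in these expressions can be computed from the usual finite-difference expansion, $\Delta^r 0^s = \sum_k (-1)^{r-k} \binom{r}{k} k^s$. The sketch below evaluates the i-th Rao-Blackwellized term under that reading of the slide notation; the function names are illustrative, not the authors'.

```python
from fractions import Fraction
from math import comb


def diff_of_zero(r, s):
    """Delta^r 0^s = sum_{k=0..r} (-1)^(r-k) * C(r, k) * k^s (zero whenever s < r)."""
    return sum((-1) ** (r - k) * comb(r, k) * k ** s for k in range(r + 1))


def rb_term(d_i, d, n, m):
    """Conditional expectation of the i-th term given the d distinct units.

    d_i: distinct units of species i in the combined sample
    d:   total distinct units, n + m: combined sample size
    """
    total = n + m
    numerator = sum(
        comb(m - 1, j - 1) * diff_of_zero(d_i, j) * diff_of_zero(d - d_i, total - j)
        for j in range(1, m + 1)
    )
    return Fraction(numerator, diff_of_zero(d, total))


# Case i = 1 of the example: d_1 = 4, d = 13, n = 50, m = 5.
term_1 = rb_term(4, 13, 50, 5)
print(float(term_1))
```

Python integers have arbitrary precision, so the large alternating sums are exact; with floating point they would suffer severe cancellation.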

Page 31: Understanding   Species’ Abundance

Open Issues….

• Actual Study in Western India

• Peninsular India [Western Ghats] : This biodiversity hotspot (area 200,000 sq. km) is home to some 480 bird species. An earlier study suggests observing about 10,000 birds to estimate the species richness of the Western Ghats. But it leaves open the issue of distributing the total effort over time and space. A fresh study began with Anil Gore, Environmental Statistician with UniPune.

Page 32: Understanding   Species’ Abundance

Improved Sampling Strategies

• Planning a Field Study on a Smaller Scale for a reliable count of bird species

• Reference Site : Silent Valley Nat’l Park, Kerala

• Three Habitats : Evergreen [EV], Semi-Evergreen [SE] and Teak Plantations [P]

• Study of Avian Diversity : The typical sampling unit is a transect, which leads to Transect Sampling

Coverage : 2 yrs x 12 months x 3 habitats x 2 visits [visits : Morning & Afternoon] = 144 visits

Page 33: Understanding   Species’ Abundance

Baseline Data…

• Every visit : One transect was covered over a period of two hours. This could be either in the morning or in the evening. Thus the total number of transects covered = 144

• The total number of birds seen is 4898, and in all 180 distinct species were seen. Earlier checklists show that there are 185 species in that area. This constitutes the baseline data or the universe for simulation.

Page 34: Understanding   Species’ Abundance

Sampling Strategy….

• We have a matrix of 180 rows corresponding to the species seen and 144 columns corresponding to the transects traversed. Entries in the matrix are the numbers of individuals of each species recorded on each transect.

• In the simulation study we shall propose different strategies and compare their performance on the basis of estimate of species richness. We try to answer the question “Can we arrive at a good estimate of species richness with less effort than that put in the reference data?”
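A resampling study of this kind can be sketched as follows. The real 180 x 144 matrix is not reproduced in the slides, so the code uses a small synthetic matrix `toy` (purely hypothetical) and a made-up function name; it draws transects with replacement and counts the species seen in each run.

```python
import random
import statistics


def richness_under_srswr(matrix, n, runs=1000, seed=0):
    """matrix[s][t] = individuals of species s recorded on transect t.

    Draws n transects with replacement, `runs` times, and returns the
    number of distinct species seen in each run.
    """
    rng = random.Random(seed)
    n_transects = len(matrix[0])
    counts = []
    for _ in range(runs):
        chosen = [rng.randrange(n_transects) for _ in range(n)]
        seen = sum(1 for row in matrix if any(row[t] > 0 for t in chosen))
        counts.append(seen)
    return counts


# Synthetic stand-in for the species-by-transect matrix: 6 species, 8 transects.
toy = [
    [1, 2, 1, 3, 1, 1, 2, 1],  # common species, present on every transect
    [1, 1, 2, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 2, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],  # rare species, seen on one transect only
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]

for n in (2, 4, 8):
    counts = richness_under_srswr(toy, n)
    print(n, statistics.mean(counts), min(counts), max(counts), statistics.stdev(counts))
```

The printed mean, minimum, maximum and standard deviation per effort level mirror the kind of summary used to compare sampling strategies against the baseline.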

Page 35: Understanding   Species’ Abundance

Check List….

• The sample data set will be a list of sample transects out of the 144 mentioned above, the time of observation, and species abundances as recorded on each sampled transect.

• Based on the data and method of sampling, we will provide an estimate of the species richness and then compare different sampling strategies with the baseline scenario.

• Target : Coverage of 80% of the Species

Page 36: Understanding   Species’ Abundance

SRSWR (144, n) : 1000 Runs

• Table 1 : Species Richness Estimates Based on Simple Random Sampling (With Replacement)

# Transects (n)   24        48        72        96        120      144
Mean              110       137       151       159       165      169
Minimum           84        119       132       138       148      154
Maximum           127       154       166       171       177      179
Stdev             6.33      5.81      5.16      4.78      4.38     4.00
MSE               4936.04   1866.67   883.156   460.282   254.51   146.17

Page 37: Understanding   Species’ Abundance

Data Analysis…..

The minimum number of species seen increases from 84 (46.67% of 180) with 24 transects to 154 (85.55% of 180) with 144 transects.

The maximum number of species seen increases from 127 (70%) at 24 transects to 179 (99.5%) at 144 transects.

The SD decreases from 6.3 at 24 transects to 4.0 at 144.

The table confirms that there is underestimation of species richness. The extent of bias decreases considerably as effort increases.

Our target of 80% is reached easily with 72 transects. This is only half of the actual effort put in.
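The 80% claim can be verified directly from the mean richness values of Table 1. The short check below assumes coverage is measured against the 180 species of the baseline data, which is how the percentages above are quoted.

```python
# Mean richness by number of transects, copied from Table 1.
transects = [24, 48, 72, 96, 120, 144]
mean_richness = [110, 137, 151, 159, 165, 169]
BASELINE_SPECIES = 180  # species seen in the reference data
TARGET_PERCENT = 80

coverage = {n: 100 * mean / BASELINE_SPECIES for n, mean in zip(transects, mean_richness)}
smallest_effort = next(n for n in transects if coverage[n] >= TARGET_PERCENT)

for n in transects:
    print(f"{n:3d} transects: {coverage[n]:.1f}% of baseline")
print("smallest effort meeting the target:", smallest_effort)
```

On these figures, 48 transects fall short of the target (about 76%) while 72 exceed it (about 84%).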

Page 38: Understanding   Species’ Abundance

Species Accumulation Curve….

• Not much improvement after 72 transects

Page 39: Understanding   Species’ Abundance

Effect of Intra-Day Division of Efforts on Species Count

Morning : Evening   Mean Richness   S.D.
100:0               133             3.8
67:33               152             4.9
50:50               151             5.2
33:67               147             5.1
0:100               144             4.4

Page 40: Understanding   Species’ Abundance

Conclusion…..

• Barring the first row, all other estimates are fairly close to each other. Common practice among ornithologists is to distribute equal efforts between morning and evening. In view of the above results it seems reasonable to stick to the convention.

Page 41: Understanding   Species’ Abundance

Choice of Season of the Year

• It is common to concentrate efforts in migratory season obviously because migratory species are not observable in the other season. If all effort (72 transects) is put in migratory season, we get to see on an average 143 species (S.D.=4.4). We further tried adding a small unit of effort (24 transects) in non-migratory season. This improved the estimate to 158 (S.D. = 4.4). Thus an improvement of 10% was possible. Hence our recommendation is that most effort indeed should be put in the migratory season.

Page 42: Understanding   Species’ Abundance

HABITATS….

• The issue is how to divide total effort among available habitats. Since the aim is species accumulation, it seems intuitively obvious that allotment of effort should be related to the number of species that use a particular habitat. We happen to have an overall picture of Western Ghats as a whole. This provides relevant data for the three habitats under consideration.

Page 43: Understanding   Species’ Abundance

Habitat….

Species Richness by Habitat Type

Habitat Type                Species Count
EVF (Evergreen Forest)      145
MDC (Semi Evergreen)        183
Teak Plantation [Manmade]   215

• In the above table, the numbers of species in these three habitats are in the proportion 27:34:39 (145:183:215). So effort can also be divided in the same proportion.
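Proportional allocation of effort can be sketched with integer arithmetic. The slides do not say how 27:34:39 was rounded; the largest-remainder rule used below is an assumption that happens to reproduce it, and the 72-transect split is only a hypothetical illustration.

```python
def proportional_allocation(counts, total):
    """Split `total` units in proportion to `counts` (largest-remainder rounding)."""
    denom = sum(counts)
    alloc = [c * total // denom for c in counts]      # integer parts
    remainders = [c * total % denom for c in counts]  # exact fractional parts, scaled by denom
    leftover = total - sum(alloc)
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


species_counts = [145, 183, 215]  # EVF, MDC, Teak Plantation
percent_split = proportional_allocation(species_counts, 100)
transect_split = proportional_allocation(species_counts, 72)  # hypothetical total effort
print(percent_split, transect_split)
```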

Page 44: Understanding   Species’ Abundance

Sequence to Follow…..

• What sequence of habitats should be followed in this study? We first compare the performance of different sequences of habitats. This comparison is based on species accumulation. A sequence is preferred if the corresponding species accumulation curve rises faster and becomes flat quickly. In this case no new species are seen in the last few transects in the habitat observed last. By this criterion, the sequence EVF-MDC-Manmade seems the best. Here, in fact, new species are hardly seen after 82 transects.

Page 45: Understanding   Species’ Abundance

Cycle Sampling : A New Study….

In view of the above considerations we propose the following sampling strategy. List the habitats to be studied. Traverse one transect in each habitat. This completes one cycle of field work. Now take up the corresponding analysis. This consists of generating species accumulation curves for different sequences of habitats. Our interest is to see if any particular habitat is redundant at any stage [in view of the species already discovered up till the previous stage].

Page 46: Understanding   Species’ Abundance

Cycle Sampling…..

A habitat will be regarded as redundant if it fails to add any species in this accumulation curve. Next take a cycle of one transect per habitat, replacing a redundant habitat by any other habitat left out earlier. Cycle sampling continues till the total number of transects traversed reaches the predetermined limit, or the accumulation curve reaches a plateau, or all habitats are dropped as redundant, whichever happens earlier.
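The stopping logic of cycle sampling can be sketched as a loop. This is a simplified reading, not the authors' implementation: `survey` is a hypothetical callback standing in for field work, replacement of a dropped habitat by one left out earlier is omitted, and a plateau is treated as the case where every habitat has been dropped.

```python
def cycle_sampling(habitats, survey, max_transects, per_cycle=1, min_rate=0.0):
    """Return (species seen, transects walked).

    survey(h) -> set of species seen on one new transect in habitat h.
    After each cycle a habitat is dropped if it added no new species, or if
    its new species per transect fell below min_rate.
    """
    seen = set()
    active = list(habitats)
    total = 0
    while active and total < max_transects:
        kept = []
        for h in active:
            new_here = set()
            walked = 0
            while walked < per_cycle and total < max_transects:
                observed = survey(h)
                new_here |= observed - seen
                seen |= observed
                walked += 1
                total += 1
            if walked and new_here and len(new_here) / walked >= min_rate:
                kept.append(h)
        active = kept
    return seen, total


def make_survey(script):
    """Build a deterministic survey callback from scripted observations (demo only)."""
    queues = {h: list(obs) for h, obs in script.items()}

    def survey(h):
        return queues[h].pop(0) if queues[h] else set()

    return survey


script = {"A": [{1, 2}, {3}, set()], "B": [{1}, {1}]}
species, effort = cycle_sampling(["A", "B"], make_survey(script), max_transects=10)
print(sorted(species), effort)
```

In the scripted demo, habitat B is dropped after the first cycle because it adds nothing new, and A is dropped once a transect yields no further species. Setting `per_cycle=4` and `min_rate=1.0` would correspond to a rule of four transects per habitat per cycle with a threshold of one new species per transect.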

Page 47: Understanding   Species’ Abundance

Cycle Sampling…..

• The cycle sampling strategy was adopted in this study. In each cycle, 4 transects of each chosen habitat were sampled. At the end of first cycle (12 transects), teak plantation yielded 6 new species (1.5/transect). Hence there was no redundancy and the entire sequence was repeated. At the end of second cycle (24 transects), it turned out that teak plantation yielded 3 new species in 4 transects. This was below the threshold of 1 new species / transect. Hence unrewarding habitat, namely, teak plantation was dropped. Now each cycle consisted of 8 transects only.

Page 48: Understanding   Species’ Abundance

Cycle Sampling…..

• At the end of cycle 6, yield from habitat MDC fell below threshold. Hence it was dropped. In the next cycle EVF also failed to remain above threshold. Thus with 60 transects we terminate the sampling exercise. We have observed 158 (88% of 180) species at this effort level, which is 42% of 144 transects traversed in the reference data set.
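The effort figures can be reconstructed from the cycle description: three habitats for cycles 1 and 2, two habitats for cycles 3 to 6 after teak plantation is dropped, and EVF alone in cycle 7, with 4 transects per habitat per cycle. A small arithmetic check under that reading:

```python
TRANSECTS_PER_HABITAT = 4
# Habitats sampled in cycles 1..7, as read from the description above.
habitats_in_cycle = [3, 3, 2, 2, 2, 2, 1]

total_transects = sum(k * TRANSECTS_PER_HABITAT for k in habitats_in_cycle)
effort_share = 100 * total_transects / 144  # percent of reference effort
species_share = 100 * 158 / 180             # percent of baseline species

print(total_transects, round(effort_share), round(species_share))
```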

Page 49: Understanding   Species’ Abundance

Cycle Sampling…..

• We have suggested an adaptive sampling strategy that is dynamic and adjusts decisions at a point of time according to the accumulated information available at that point. This strategy called cycle sampling seems capable of saving effort to a substantial extent.

Page 50: Understanding   Species’ Abundance

Thanks….

• This is the end of my Technical Presentation….

• Bikas K Sinha • RU Workshop• April 18, 2012