in understanding the fossil record


Upload: jaime-r-calderon

Post on 26-Dec-2015


Description: biological anthropology


In understanding the fossil record, we are generally making comparisons between the morphology we observe in a fossil or set of fossils and what we see in a comparative sample we know more about. For example, if I want to understand whether or not the difference in mandibular morphology between two fossil specimens means something, I might compare it to variation in a sample of humans or chimpanzees, for whom we know attributes like species, population, sex, and age (although not necessarily all of these things, all the time).

Such a comparison allows us to test hypotheses about the observed fossil variation. Is this normal variation for two individuals of the same species? Is this normal variation for two individuals of opposite sexes in the same species? The rejection of such hypotheses allows us to refine our understanding of the fossils.

To put such hypotheses into action, we need to understand a few concepts about variation. In particular, we need to understand what "normal" variation looks like. In the above example, we might fail to reject the idea that the observed variation falls within what is expected for two individuals of the same species, assuming a human model of variation (i.e., comparisons with a human sample). This does not mean that the two specimens above are exactly like any given pair of human mandibles, only that the differences between the two fall within a range of variation in human mandibles that we might consider "normal." But what does normal mean, and what properties does it have?

As a starting point, we might consider variation in year of birth among the students enrolled in 207x (see below). The following histogram shows the self-reported year of birth of more than 17,000 of our 19,000+ students. In other words, it represents more than 17,000 observations of a single variable. So what is the "normal" year of birth in our class?


 

There are a number of ways to answer this question. The most common interpretation might be rephrased as, "What is the expected year of birth of a randomly drawn student in this class?"

This rephrasing is one way of asking for the mean year of birth in our class. The mean (or arithmetic mean, to be more precise) is what, in English, we most commonly refer to as the average.

To calculate the mean, you add up every value in your dataset and divide by the total sample size. Written mathematically, for N observations x_1, x_2, ..., x_N:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$


The mean year of birth for our class is approximately 1979. In other words, if you add up every year of birth for the students in our class and divide that sum by the total number of students with reported values (~17,000), the value you are left with is approximately 1979.

To put it simply, 1979 is the expected year of birth for our course.
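As a minimal sketch in Python (a language chosen here only for illustration; the course does not prescribe one), the mean is simply the sum of the observations divided by N. The function name `arithmetic_mean` and the five toy years are hypothetical, not the course data; the last line cross-checks the result against the standard library's `statistics.mean`.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum every observation and divide by the sample size N."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


# Toy years of birth, for illustration only (not the course data).
years = [1979, 1984, 1990, 1990, 1962]
print(arithmetic_mean(years))  # 1981.0

# Cross-check against Python's standard-library implementation.
assert arithmetic_mean(years) == mean(years)
```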

But the mean is not the only way to express the expected value of our distribution. We could also describe the expected value with the median, or the middle value of our dataset. If we lined up every value in our dataset in order, from oldest to youngest (or youngest to oldest), the value that sits in the middle of our dataset would be the median value. (When a dataset has an even number of values, there is no single middle value; the median is then the mean of the two middle values.)

The median value for our class is 1984, meaning that the student at the middle of our course distribution was born in 1984. The median is another expression of the expected value of an observed distribution.
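The median can be sketched the same way, assuming the data fit in memory so they can simply be sorted. The hypothetical `sample_median` below handles an even number of values by averaging the two middle values, which is also the convention Python's `statistics.median` uses; the toy years are illustrative only.

```python
from statistics import median


def sample_median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [1990, 1962, 1984, 1979, 1990]  # toy values, N = 5
even = [1990, 1962, 1984, 1979]       # toy values, N = 4
print(sample_median(odd))   # 1984
print(sample_median(even))  # 1981.5

# Cross-check against the standard library.
assert sample_median(odd) == median(odd)
assert sample_median(even) == median(even)
```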

We might also consider the mode of our data. The mode is the observed value that occurs most frequently within our dataset; to find it, you simply count how often each value occurs. Notice that the modal value may well differ from both the mean and the median.

In our course, the modal year of birth is 1990. 
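Finding the mode amounts to counting how often each value occurs. In this sketch (the function name `sample_modes` and the toy years are hypothetical), every value tied for the highest count is returned, because a dataset can have more than one mode; Python's `statistics.multimode` follows the same convention.

```python
from collections import Counter
from statistics import multimode


def sample_modes(values):
    """Return every value tied for the highest frequency (a sample can be multimodal)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # Values are kept in first-seen order, matching statistics.multimode.
    return [value for value, count in counts.items() if count == top]


years = [1990, 1962, 1984, 1979, 1990]  # toy values
print(sample_modes(years))  # [1990]
assert sample_modes(years) == multimode(years)
```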

average values, review 1

Consider our course distribution below, with the mean, median, and modal values highlighted. The following questions are for review only and are not graded, but are intended to help introduce you to these concepts:


The mean, median, and mode all differ in this data set.

True or false: If you were to remove the left side of the distribution by removing everyone born before 1945, the mean year of birth would decrease.


True or false: If you were to remove the left side of the distribution by removing everyone born before 1945, the median year of birth would decrease.


True or false: If you were to remove the left side of the distribution by removing everyone born before 1945, the modal year of birth would remain unchanged.


average values, review 2

Still referring to the chart above, consider the following review question (ungraded):


What value do you think is the most accurate representation of the "expected" value for this distribution?

Mean / Median / Mode

Mean, median, mode 1

(1 point possible)

If we measure the height of everyone in our course to the nearest 0.01 cm, which of the "average" values would likely be least meaningful?

Mean / Median / Mode

Mean, median, mode 2

(1 point possible)

You are a primatologist measuring the frequency of grooming bouts between pairs of primates. You observe that most pairs of primates in your troop groom each other between 0 and 6 times per day. However, a few exceptional pairs regularly groom each other 20 or more times per day. Which value will likely be most meaningful for understanding how frequently, on average, primates in your troop groom each other?

Mean / Median / Mode

Mean, median, mode 3

(1 point possible)

You are recording first molar tooth breadth in a large sample of recent humans housed in a museum collection. Your sample size is greater than 1,000, with molar breadth ranging from 6.80-12.41 mm. Which value will likely be most meaningful?

Mean / Median / Mode

Mean, median, mode 4

(3 points possible)

The following data represent the breadth of the mandibular second molar, measured to the tenth of a millimeter, in a sample of 10 humans.

9.3,9.9,10.7,10.6,10.8,10.9,10.0,10.8,10.5,11.5

Calculate the mean, median, and mode. (NOTE: numerous free platforms, including Google Docs, will let you enter data and calculate these values automatically. You can either do the work by hand or use such a platform to come up with the values.)

What is the mean (to the nearest 0.1 mm)?


What is the median (to the nearest 0.1 mm)?


What is the mode (to the nearest 0.1 mm)?


Mean, median, mode 5

(3 points possible)


The following data represent the breadth of the mandibular canine, measured to the tenth of a millimeter, in a sample of 10 gorillas.

10.3,9.4,9.4,10.6,14.3,12.3,9.5,15.1,12.2,14.4

Calculate the mean, median, and mode. (NOTE: numerous free platforms, including Google Docs, will let you enter data and calculate these values automatically. You can either do the work by hand or use such a platform to come up with the values.)

What is the mean (to the nearest 0.1 mm)?


What is the median (to the nearest 0.1 mm)?


What is the mode (to the nearest 0.1 mm)?


As it turns out, the expected value or central tendency of a distribution is not the only aspect of "normal" variation we are interested in. Part of what represents "normal" is the spread of values around that expected value. For example, consider the two distributions below:

 

The red and green distributions clearly differ not only in their expected values, but also in the scatter of values around that expectation. The green distribution is tightly clustered, whereas the red distribution is more scattered. In the case of the red distribution, more variation is "normal," with the opposite true in the green distribution.

Just as we had three common measures of the expected value, we have several commonly used measures of the dispersion of values around the expectation. This dispersion is also an important component of what we might consider normal variation.


The simplest measure of dispersion is the range of values in a sample (maximum - minimum). Range, however, is highly sensitive to outliers in the data and provides very little information about the overall shape of a distribution. Still, in samples of limited size, such as some fossil samples, it may be the only measure of dispersion available to us.
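To see why the range is so sensitive to outliers, here is a small sketch with arbitrary toy values (not real data): adding a single extreme observation changes the range dramatically, even though the other values are untouched. `sample_range` is a hypothetical helper name.

```python
def sample_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


typical = [4, 5, 5, 6, 7]          # toy values
with_outlier = typical + [30]      # the same values plus one extreme observation
print(sample_range(typical))       # 3
print(sample_range(with_outlier))  # 26
```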

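A more commonly used measure of dispersion is the standard deviation. The standard deviation is the square root of the sum of squared deviations from the mean, divided by one fewer than the sample size. Writing x_i for each observation, \bar{x} for the sample mean, and N for the sample size:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$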

To put this formula in plain English, here is the process for calculating the standard deviation:

1) Take the difference between each observation and the mean and square it (this is the squared deviation)

2) Add up those squared deviations (i.e. take the summation of them)

3) Divide that sum by N - 1, one fewer than the size of your sample. (NOTE: in some instances you would divide by N rather than N - 1, but that version of the standard deviation is rarely used in paleoanthropology.)

4) Take the square root of that value.
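The four steps above can be sketched in Python as follows. `sample_sd` is a hypothetical name and the toy values are arbitrary; the result is checked against `statistics.stdev`, which also divides by N - 1, while `statistics.pstdev` divides by N (the alternative mentioned in step 3).

```python
import math
from statistics import pstdev, stdev


def sample_sd(values):
    """Sample standard deviation using the N - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(values) / n
    # Step 1: squared deviation of each observation from the mean.
    squared_devs = [(x - m) ** 2 for x in values]
    # Step 2: add up the squared deviations.
    total = sum(squared_devs)
    # Step 3: divide by N - 1.
    variance = total / (n - 1)
    # Step 4: take the square root.
    return math.sqrt(variance)


toy = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary toy values, mean = 5
print(sample_sd(toy))           # about 2.14 (divides by N - 1 = 7)
print(pstdev(toy))              # 2.0 (divides by N = 8)
assert math.isclose(sample_sd(toy), stdev(toy))
```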

Standard deviation describes how widely values are spread around the mean, and it does so on the same scale as your observations. So if you are measuring height in centimeters, the standard deviation will also be in centimeters. In a true normal distribution, the standard deviation has the added benefit of providing a convenient way of assessing how observations are distributed in a sample. In a normal distribution, approximately 2/3 of observations fall within ±1 standard deviation of the mean, about 95% within ±2 standard deviations, and about 99% within ±2.5 standard deviations.
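These proportions can be checked with the error function: for a normally distributed variable, the fraction of observations within ±k standard deviations of the mean is erf(k/√2). A minimal sketch using Python's `math.erf` (the helper name `fraction_within` is hypothetical):

```python
import math


def fraction_within(k):
    """Proportion of a normal distribution lying within +/- k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 2.5):
    print(f"within +/- {k} SD: {fraction_within(k):.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
# within +/- 2.5 SD: 98.8%
```

The exact values confirm the approximations in the text; the "99%" figure for ±2.5 standard deviations is a slight rounding up of about 98.8%.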

One disadvantage of the standard deviation is that it does not account for the scale of your observations. For example, if you are measuring the body mass of gorillas, you might find that the standard deviation of your observations is 15 kg. If you are measuring the body mass of dwarf lemurs (one of the smallest primates), you might find that the standard deviation in body mass is 0.1 g. Gorilla body mass has a much higher standard deviation, but this does not necessarily mean that the values are more dispersed relative to the mean, only that the mean itself is quite a bit larger.

One way around this problem is to "normalize" the standard deviation relative to the mean. To do this, we calculate the standard deviation as before and then divide it by the mean. For convenience's sake, we also multiply the result by 100. We refer to this value as the coefficient of variation, or CV, where s is the standard deviation and \bar{x} is the mean:

$$CV = \frac{s}{\bar{x}} \times 100$$

The CV is useful because it places samples with very different means on a comparable scale. In the above example, whereas the standard deviation will inevitably be larger in gorillas, the CV should be able to tell you which distribution is more "spread out" relative to its mean value. The CV is unitless: it is not measured in kilograms or grams, as in the example above, but is simply a number indicating the relative degree of dispersion.
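A short sketch of why the CV is unitless, assuming arbitrary toy values (not real body masses): expressing the same measurements in grams instead of kilograms multiplies the standard deviation by 1,000 but leaves the CV unchanged. The function name `coefficient_of_variation` is hypothetical, and it assumes a positive mean, as with measurements such as mass.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (sample standard deviation / mean) * 100, a unitless number.

    Assumes a positive mean (e.g., ratio-scale measurements such as mass).
    """
    return stdev(values) / mean(values) * 100


sample_kg = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy values in kilograms
sample_g = [x * 1000 for x in sample_kg]             # the same values in grams

# The standard deviation is 1,000 times larger in grams...
print(stdev(sample_kg), stdev(sample_g))
# ...but the CV is the same (up to floating-point rounding), about 42.8.
print(coefficient_of_variation(sample_kg), coefficient_of_variation(sample_g))
```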

 

 

dispersion review

Before asking you to complete a few graded exercises, consider the following thought exercises (not graded):

I am a researcher examining human dietary practices. I have a large sample of total caloric intake per day for individuals in a population. I find that the average individual in my sample consumes 2,200 calories per day, but I want to get a sense of how many individuals consume between 1,700 and 2,700 calories per day. Which dispersion statistic is likely to be most helpful?

Range / Standard deviation / CV

Now say that, instead, I combine forces with another researcher on campus who has recorded similar data on capuchin monkeys. We are interested in whether humans or capuchins are more variable in their overall daily caloric intake. Now which measure of dispersion will likely be most useful?

Range / Standard deviation / CV

Finally, an enterprising graduate student has examined a large sample of living primates and determined that the estimated volume of the thoracic cavity, as measured by the dimensions of the rib cage, is a good predictor of daily caloric intake. She would like to apply that relationship to the fossil record for the genus Homo. The challenge is that only three partial skeletons are preserved well enough in the fossil record to allow such measurements of the rib cage. What measure of dispersion will likely be most useful for her?

Range / Standard deviation / CV

Dispersion 1

(3 points possible)

Going back to the earlier data set of human second molar breadth, calculate the following:

What is the range (to the nearest 0.1 mm)?

What is the standard deviation (to the nearest 0.1 mm)?

What is the coefficient of variation (to the nearest 0.1)?

Dispersion 2

(3 points possible)

Going back to the earlier data set of gorilla canine breadth, calculate the following:

What is the range (to the nearest 0.1 mm)?

What is the standard deviation (to the nearest 0.1 mm)?

What is the coefficient of variation (to the nearest 0.1)?

Dispersion 3

(1 point possible)

Given the above values, which is the best explanation of the variability in the two measurements (human M2 breadth or gorilla canine breadth)?

Gorilla canines have a higher standard deviation and are therefore more variable

Gorilla canines have a higher CV and are therefore more variable

Human molars are less variable because they have a narrower range

Human molars are more variable because they have a higher CV

Dispersion 4

(1 point possible)

If I were to reveal to you that individuals 1, 2, 3, 4, and 7 in the gorilla sample were female, and that the remainder were male, how might you interpret the greater variability in the gorilla sample?

The female gorillas are unusually small, contributing to a higher CV value for the sample

The male gorillas are unusually large, contributing to a higher CV value for the sample

Gorillas have high sexual dimorphism, so a normal mixed-sex sample is highly variable

Gorillas have low sexual dimorphism, so you would expect a lot of male-female overlap

normal variation

(1 point possible)

In summary, which of the following is the best definition of "normal variation"?

The average of a sample

The relative shape of a distribution of observations

The observations between the biggest and the smallest

The average value, and the distribution of observations around that value