Data Analysis
E3: Lecture 8
Lecture Outline
• Processing and Visualizing Data
- Why do we do this?
- Processing the Luria-Delbruck Data
- Processing the Public Goods Data
• Analyzing Data (using Excel)
- Difference in means (t-test)
- Difference in distributions (χ² test)
Handling Data
• After a laboratory experiment or time out in the field, you will have several data points.
• How should one process this (potentially voluminous) data?
1) Organize it (spreadsheet programs, like Excel, can help)
2) Process it
I) Investigate portions of the data set
II) Look at relevant descriptive statistics
III) Transform data points in a well-defined way
IV) Combine data points in a well-defined way
3) Visualize it
4) Subject it to an appropriate statistical test
Massaging? Dressing-up? *Focusing
[Figure labels: 3 P colonies, 4 R colonies]
Picture = 1000 Words
[Figure: "Grade Distribution" chart with categories A, B, C, D, E]
[Figure: "Understanding the Black Box", a plot of Accuracy of Prediction (0 to 1) against Number of Trials (0 to 20)]
• We are visual animals and often can see patterns when data is presented visually
• Examples:
- Pie-chart illustrates the distribution of values of a single variable
- X-Y plot illustrates the form of the relationship between two variables
- Paired histograms illustrate the relationship between the distributions of two variables.
• The most appropriate picture will often depend on the data:
- Categorical or quantitative?
- Frequencies, counts or measurements?
- Relationship between data points?
The Data
• Go to our class website:
http://depts.washington.edu/kerrpost/Bio481/HomePage
• On the DATA link, download the following Excel (xls) files:
- “E3_LD_Processed_Data”
- “E3_PG_Processed_Data”
• Take care as you process and visualize the class data; the product of your efforts can be used directly in your first two lab reports.
Processing the Luria-Delbruck Data
[Figure: experimental timeline. DAY 1 (Tuesday): setup (labels ×24 and ×3), followed by 48 hours at 37°C. DAY 3 (Thursday): COUNT.]
• We’ll start by computing some useful statistics:
- Mean number of colonies on a rifampicin plate.
- Variance in number of colonies on a rifampicin plate.
- Total number of rifampicin plates (number of replicates in the class).
• Next we will compile the full distribution of rifampicin plate counts:
- Actual distribution (COUNTIF function will be useful)
- Expected distribution (get ready to write a complicated function!)
- Let’s plot these distributions.
• Finally, let’s compute the density of cells in the original wells.
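For anyone who wants to check their spreadsheet outside Excel, here is a minimal Python sketch of the same steps. Several things in it are assumptions and not part of the class materials: the plate counts are made up, the expected distribution is taken to be Poisson (the usual prediction when every plate has the same small chance of producing resistant colonies), and the `cells_per_ml` helper with its `dilution_factor` and `volume_plated_ml` parameters is hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance

# Made-up counts of colonies per rifampicin plate (NOT the class data).
counts = [0, 1, 0, 2, 0, 0, 5, 1, 0, 3, 0, 1]

n_plates = len(counts)      # number of replicates (Excel: COUNT)
avg = mean(counts)          # mean colonies per plate (Excel: AVERAGE)
var = variance(counts)      # sample variance (Excel: VAR)

# Actual distribution: how many plates show exactly k colonies.
# This is what one COUNTIF per value of k gives you in Excel.
observed = Counter(counts)


def poisson_expected(k, mean_count, n):
    """Expected number of plates with k colonies, ASSUMING a Poisson model."""
    return n * math.exp(-mean_count) * mean_count ** k / math.factorial(k)


def cells_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Hypothetical helper: back-calculate density from one plate count.

    dilution_factor is the fraction of the original culture left after
    dilution (for example 1e-6); volume_plated_ml is the volume spread
    on the plate.
    """
    return colonies / (dilution_factor * volume_plated_ml)


print("plates:", n_plates, "mean:", round(avg, 3), "variance:", round(var, 3))
for k in range(max(counts) + 1):
    print(k, observed[k], round(poisson_expected(k, avg, n_plates), 2))
```

Comparing the variance to the mean is a quick first look: under a Poisson model the two should be roughly equal.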
Processing the Public Goods Data
• We’ll start by computing the densities when alone.
• Next, let’s compute the relative fitnesses (BK26+pBR relative to BK27) & plot these.
[Figure: experimental timeline over DAY 1 (Tuesday), DAY 2 (Wednesday) and DAY 3 (Thursday). Monocultures (BK27 alone, BK26+pBR alone) are shown above and competitions (Init. Mix, then Agar comp. and Liquid comp.) below; each step is 24 hours at 37°C, followed by COUNT. Other label: amp.]
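The slides do not spell out the formula for relative fitness, so the sketch below rests on an assumption: it uses the ratio of Malthusian parameters (log of final over initial density), which is one common convention in competition experiments. The function names and the densities are hypothetical; use whatever definition is given in lab.

```python
import math


def malthusian(initial_density, final_density):
    """Realized growth over the competition: ln(final / initial)."""
    return math.log(final_density / initial_density)


def relative_fitness(a_initial, a_final, b_initial, b_final):
    """Fitness of strain A relative to strain B (ASSUMED definition).

    A value above 1 means A grew more than B during the competition;
    a value below 1 means it grew less.
    """
    return malthusian(a_initial, a_final) / malthusian(b_initial, b_final)


# Made-up densities (cells per ml) for BK26+pBR (strain A) and BK27 (strain B).
w = relative_fitness(a_initial=1e5, a_final=1e7, b_initial=1e5, b_final=1e9)
print("relative fitness of BK26+pBR vs BK27:", w)
```

Computing one value per replicate competition gives a column of fitnesses that can be plotted and later compared between the agar and liquid treatments.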
Save Your Work
• Save your work by renaming the data files:
- “E3_LD_Processed_Data_YOUR_INITIALS”
- “E3_PG_Processed_Data_YOUR_INITIALS”
• Save these files on a thumb drive or email them to yourself.
• We will continue to work on these during class today.
How do we statistically evaluate data?
• Suppose you hear that a high-protein diet during puberty leads to an increased height as an adult.
- The mean height in a high-protein treatment was 5’11” and the mean height in a control treatment was 5’5”.
- What would you feed your kids? How do you gauge this?
• The New York Times has just done an exposé about sexism in graduate admissions in a famous department of mathematics.
- While the number of male and female applicants was equal, the number of males admitted was greater.
- Should a formal inquiry take place? How do you evaluate the data?
• When you were a child, your father told you he would let you stay up late if the coin he flipped came up heads.
- Suppose the coin comes up heads 25% of the time.
- Is your Dad using a fair coin? How would you evaluate this?
Number of flips    Number of “heads”
4                  1
12                 3
100                25
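One way to make the coin-flip question concrete is to ask how often a fair coin would give so few heads. The sketch below is an illustration only; the function name `prob_at_most` is made up here. It sums the binomial probabilities for each row of the table.

```python
from math import comb


def prob_at_most(heads, flips, p=0.5):
    """Probability of seeing `heads` or fewer heads in `flips` tosses."""
    return sum(
        comb(flips, i) * p ** i * (1 - p) ** (flips - i)
        for i in range(heads + 1)
    )


# The three rows of the table: the same 25% heads, very different evidence.
for flips, heads in [(4, 1), (12, 3), (100, 25)]:
    print(flips, heads, prob_at_most(heads, flips))
```

With 4 flips, a fair coin gives one head or fewer about 31% of the time, so 25% heads is no cause for suspicion. With 12 flips the chance drops to about 7%, and with 100 flips it is far below 0.05. Sample size matters as much as the observed percentage.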
[Figures: frequency plots with the labels Control and High-protein, and ♀a, ♂a, ♀e, ♂e]
The Data
• Go to your email or thumb drive and download your processed data files:
- “E3_LD_Processed_Data_YOUR_INITIALS”
- “E3_PG_Processed_Data_YOUR_INITIALS”
• Take care as you analyze the class data; the product of your efforts can be used directly in your first two lab reports.
Student’s t-test
William Sealy Gosset
• Using a pseudonym, “Student,” Gosset described a test for distinguishing the difference between the means of 2 data sets.
• The t-test uses the statistics from two groups of data (means and s.d.) to generate a third statistic (the t statistic).
• If the two groups of data come from populations with the same mean, the t statistic has a characteristic distribution itself (note the shape will depend on the sample sizes).
• If the computed t is extreme, then such a value would rarely arise if the two groups had equal means (how rarely is quantified by the p-value of the test). By convention, the means are called significantly different if p<0.05.
• Assumptions
- Each datum is independent
- Data is normally distributed
• DEMO: Performing a t-test
- Computing a p-value from a t-test
- Distinguish the different types of t-tests:
  Paired versus Unpaired data
  Equal versus Unequal variance
  One-tailed versus Two-tailed tests
• How can we use t-tests for the Public Goods Data? What type of t-test is appropriate? How do you report it?
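To see what goes on inside the t-test, here is a rough Python sketch of one variant only: the unpaired, unequal-variance (Welch) test with a two-tailed p-value. This is not the class demo, which uses Excel. The fitness values are made up, the function names are hypothetical, and the p-value comes from a simple numerical integration of the t density, not from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom (unpaired)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# Made-up relative fitness values for two treatments (NOT the class data).
agar = [1.21, 1.35, 1.18, 1.40, 1.29]
liquid = [0.92, 1.01, 0.88, 0.97, 1.05]
t_stat, dof = welch_t(agar, liquid)
print("t =", t_stat, "df =", dof, "p =", two_tailed_p(t_stat, dof))
```

In Excel, the T.TEST function takes the two data ranges plus a tails argument and a type argument, which together select among the paired/unpaired, equal/unequal variance, and one-/two-tailed variants listed above.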
χ² Test
Karl Pearson
• Karl Pearson introduced a test to distinguish whether an observed set of frequencies differs from a specified frequency distribution.
• The χ²-test uses frequency data to generate a statistic (χ²).
• If the observed frequencies come from a population with the specified frequencies, the χ² statistic has a characteristic distribution (the shape will depend on the # of classes).
• If the computed χ² is extreme, then such a value would rarely arise if the observed frequencies derived from the specified distribution (how rarely is quantified by the p-value from the test). By convention, the observed frequencies are called significantly different if p<0.05.
• Assumptions
- Frequencies are derived from independent sampling
- There are not several frequencies that are very small
• DEMO: Performing a chi-square test
- Computing a p-value from a chi-square test
• Perform a χ² test to see if the Luria-Delbruck Data from the class differs from the frequencies expected under directed mutation. What can you conclude?
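As a rough companion to the Excel demo, the sketch below shows the arithmetic behind a χ² test in Python. It is an illustration under stated assumptions: the observed and expected counts are made up, the function names are hypothetical, and the degrees of freedom are passed in by hand (commonly the number of classes minus 1, with a further reduction if parameters such as the mean were estimated from the same data). The p-value uses a series for the incomplete gamma function, which is adequate for moderate χ² values.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum over classes of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(chi2, df):
    """Upper-tail probability of a chi-square distribution with df d.o.f."""
    if chi2 <= 0:
        return 1.0
    a, x = df / 2, chi2 / 2
    # Series for the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Made-up counts of plates per class (NOT the class data).
observed = [10, 20, 30]
expected = [20, 20, 20]
chi2 = chi_square_statistic(observed, expected)
print("chi-square =", chi2, "p =", chi_square_p_value(chi2, df=len(observed) - 1))
```

Before trusting the result, check the second assumption above: classes with very small expected counts are usually pooled with their neighbors.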