experiment design & statistical analyses for fyp students_june 2015

Experiment Design & Statistical Analyses for FYP Students (A Refresher Biostatistics session for FYP students) 23, June 2015 Conducted by Dr Nevil LSCT, NP 1

Upload: ruther-teo

Post on 16-Dec-2015


  • Experiment Design & Statistical Analyses for FYP Students

    (A Refresher Biostatistics session for FYP students)

    23, June 2015

    Conducted by Dr Nevil

    LSCT, NP

    1

  • Outline of today's session

    Experiment design & Testing hypothesis

    Data validation & Analysis

    Interpretation of results and making a conclusion

    Statistical analyses in Excel: paired t-test, unpaired t-test, analysis of variance (ANOVA) and multiple comparison

    Some examples of published scientific presentations in different types of charts

    How to create a chart with error bars in Excel

    2

    Important note: Any research involving human participants requires IRB approval. Any research involving animals requires IACUC approval.

  • The Most Important Points to note in your FYP in relation to statistics are

    Experiment design & Hypothesis testing

    Clear objectives

    State your hypotheses: Null and Alternate (Research)

    Sample size

    Replication of experiment

    Execution of experiment

    Data collection and validation

    Check your data for normal distribution and outliers

    Set criteria

    Significance level (α) = 0.05 or 0.01 or 0.001

    Test statistics

    Choose the correct statistical test (refer to the flow chart in annex slide 58)

    3

  • The Most Important points to note in your FYP in relation to statistics are

    Interpretation of results & making conclusion

    Interpret the result based on p-value and significance level.

    Interpret the result based on the critical value and the calculated value of the test statistic.

    Interpret the trendline (relationship / association between 2 variables).

    Make your conclusion on the null or alternative hypothesis.

    Presentation of results

    Choose suitable graph/table

    Use error bars to compare your group means, or a trendline to show the relationship between variables.

    Write clear title, axis labels, legends and footnote.

  • What is Hypothesis Testing?

    Hypothesis testing, otherwise called significance testing, is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample (e.g. the claim that the mean height of the male student population in NP is 170 cm).

    What is the goal of hypothesis testing?

    The goal of hypothesis testing is to decide whether the sample data are consistent with a claimed value of a population parameter, such as the mean height of Asian males.

  • Null & Alternative Hypotheses (Two tail / Non-directional)

    The null hypothesis, denoted by H0, is a tentative assumption about a population parameter. It is assumed to be true until the sample data give strong evidence against it.

    E.g. The mean height of the male student population in NP is 170 cm.

    The alternative hypothesis, otherwise called the research hypothesis and denoted by Ha, is the opposite of what is stated in the null hypothesis. It states that the null assumption is NOT true.

    E.g. The mean height of the male student population in NP is NOT 170 cm.

    Researchers use data from sample(s) to test these two competing statements.

    Researchers always challenge the null hypothesis, that is, they look for evidence that it is NOT TRUE.

    Tips to remember

    Null means: always neutral, no effect, same, or no difference

    Alternative means: not neutral, has effect, not same, different, increase or decrease

  • Three Forms for Null and Alternative Hypotheses

    When we do hypothesis testing, we must follow one of the following three forms.

    One-tailed (lower-tail): H0: μ ≥ μ0, Ha: μ < μ0

    E.g. H0: The height of Asian male population is equal to or more than 173 cm. Ha: The height of Asian male population is less than 173 cm.

    One-tailed (upper-tail): H0: μ ≤ μ0, Ha: μ > μ0

    E.g. H0: The height of Asian male population is equal to or less than 173 cm. Ha: The height of Asian male population is more than 173 cm.

    Two-tailed: H0: μ = μ0, Ha: μ ≠ μ0

    E.g. H0: The height of Asian male population is 173 cm. Ha: The height of Asian male population is not 173 cm. This is the most commonly used method.

    μ0 is the hypothesized population mean; μ is the true population mean.
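    The three forms differ only in which tail of the distribution counts as evidence against H0. A minimal Python sketch of this, using a one-sample z-test, is given here. The function name, the sample numbers and the assumption that the population SD is known are illustrative only and are not from the session.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p) for a one-sample z-test.

    tail = "two"   tests Ha: mu != mu0
    tail = "lower" tests Ha: mu <  mu0
    tail = "upper" tests Ha: mu >  mu0
    Assumption of this sketch: the population SD (sigma) is known.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
    if tail == "lower":
        p = cdf
    elif tail == "upper":
        p = 1.0 - cdf
    elif tail == "two":
        p = 2.0 * min(cdf, 1.0 - cdf)
    else:
        raise ValueError("tail must be 'two', 'lower' or 'upper'")
    return z, p


# Hypothetical data: 36 men, sample mean 171 cm, assumed sigma 6 cm, mu0 = 173 cm.
for tail in ("two", "lower", "upper"):
    z, p = z_test_p_value(171.0, 173.0, 6.0, 36, tail)
    print(f"{tail}-tailed: z = {z:.2f}, p = {p:.4f}")
```

    The same sample gives a different p-value for each form, which is why the form must be chosen before looking at the data.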

  • Null & Alternative Hypotheses (One tail/Directional)

    Another example: A newly developed drug is believed to reduce the serum glucose level in diabetic patients more than the existing drugs.

    Null Hypothesis (H0): The new drug does not reduce the serum glucose level in diabetic patients more than the existing drugs.

    Alternative Hypothesis (Ha): The new drug reduces the serum glucose level in diabetic patients more than the existing drugs.

  • Steps of hypothesis testing

    1. State the hypotheses: Null and Alternate (Research).

    2. Set the criteria for a decision (significance level, called alpha (α)).

    3. Select and compute test(s) statistics.

    4. Interpret the results

    5. Make a decision.

  • Step 1. State the hypothesis

    The null hypothesis (H0), stated as the null, is a statement about a parameter, such as the population Mean, that is assumed to be true.

    The null hypothesis is a starting point. We will test whether the parameter (Population Mean) stated in the null hypothesis is likely to be true.

    E.g. The Mean height of Male students population in NP is 170 cm.

    An alternative hypothesis (Ha) is a statement that directly contradicts the null hypothesis by stating that the actual value of a parameter (population mean) is less than, greater than, or not equal to the value stated in the null hypothesis.

    The alternative hypothesis states that the null hypothesis is wrong. E.g. The mean height of the male student population in NP is NOT 170 cm.

  • 2. Set the criteria (α-value) to make a decision

    To set the criteria, we need to state the level of significance, otherwise called alpha and indicated by the symbol α.

    By tradition, the level of significance is typically set at α = 0.05. The other levels used are α = 0.01 and α = 0.001.

    The level of significance, or significance level, is the cut-off point at which we say the calculated probability (p-value) is small enough to reject the null hypothesis, or too large to reject it.

    Probability (p) value

    The p-value, or calculated probability, is the probability of obtaining a result at least as extreme as the one observed in the sample, assuming the null hypothesis (H0) is true. The significance level α, in contrast, is the probability of rejecting H0 when it is actually true.

    If p is low, H0 must GO

    Key points to remember

    If the p-value is smaller than α, we say that the mean values are significantly different between groups. Hence, we reject the null hypothesis.

    If the p-value is bigger than α, we say that the mean values are NOT significantly different between groups. Hence, we fail to reject (loosely, "accept") the null hypothesis.

  • Significance level & Confidence level

    Significance levels are related to confidence levels through the rule CL = 1 - α.

    1-0.05 = 0.95

    1-0.01 = 0.99

    1-0.001 = 0.999

    It is common to express the confidence level in %.

    CL = 1-0.05 = 0.95 = 95%

    CL = 1-0.01 = 0.99 = 99%

    CL = 1-0.001 = 0.999 = 99.9%
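    The rule CL = 1 - α can be checked with a few lines of Python. As an illustration only, the sketch also prints the two-tailed critical z value that goes with each α. The function names are hypothetical, and the use of the standard normal distribution (rather than the t distribution) is an assumption of this sketch.

```python
from statistics import NormalDist


def confidence_level(alpha):
    """Confidence level that corresponds to a significance level alpha."""
    return 1.0 - alpha


def two_tailed_z_critical(alpha):
    """Critical z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.05, 0.01, 0.001):
    cl = confidence_level(alpha)
    z_crit = two_tailed_z_critical(alpha)
    print(f"alpha = {alpha}: CL = {cl:.1%}, two-tailed z critical = {z_crit:.3f}")
```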

  • 3. Select and Compute the Test Statistic

    The test statistic is obtained by selecting and applying the appropriate statistical test (e.g. t-test, z-test or ANOVA), which allows researchers to determine how likely the sample outcome would be if the null hypothesis were true.

    The value of the test statistic is used to make a decision regarding the null hypothesis.

  • 4. Make a decision

    We use the result of the test statistic and the significance level to make a decision about the null hypothesis.

    In practice, a researcher can make one of these two decisions:

    1. Reject the null hypothesis when the p-value is less than the set α (e.g. 0.05). That means the difference between μ and μ0 is significant. The difference is unlikely to be caused by chance alone; it is attributed to the experiment. The experiment provides strong evidence to reject the null hypothesis.

    2. Accept (more precisely, fail to reject) the null hypothesis when the p-value is more than the set α (e.g. 0.05). That means the difference between μ and μ0 is NOT significant. The difference can be explained by chance; it cannot be attributed to the experiment. The experiment does not provide strong evidence to reject the null hypothesis.

    P < 0.05: the difference is significant, reject H0.

    P > 0.05: the difference is not significant, accept H0.
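    The decision rule can be written as a tiny Python function. This is only a sketch; the function name and the wording of the returned strings are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with the chosen alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0: the difference is significant"
    return "fail to reject H0: the difference is not significant"


print(decide(0.03))              # below alpha = 0.05
print(decide(0.03, alpha=0.01))  # the same p-value against a stricter alpha
```

    The second call shows that the same p-value can lead to a different decision when a stricter α is set, which is why α must be fixed before the analysis.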

  • Experiment Design - Objectives

    Example 1: The aim of the project is to evaluate the effect of a new antibiotic on the mortality of bacterium sp. X

    15

    Example 2. The aim of the project is to study the effect of a new drug on diabetic mice to reduce blood serum glucose level.

    Example 3. The aim of the project is to evaluate the effect of herbal supplement on the immune parameters of juvenile fish.

    Let's take example 3 for hypothesis testing

  • Experiment design: Making Null and Alternate Hypotheses

    E.g. Objective. The aim of the project is to evaluate the effect of herbal supplement on immune parameters of juvenile fish.

    Write your null and alternative hypothesis and symbols for this objective

    Null hypothesis (H0): Herbal supplement will not affect the immune parameters of juvenile fish.

    Alternative hypothesis (Ha): Herbal supplement will affect the immune parameters of juvenile fish.

    16

    H0: μ = μ0

    Ha: μ ≠ μ0

    μ = value of true population mean; μ0 = value of hypothesized population mean

    Is this hypothesis one tail or two tail?

  • 17

    Steps of Hypothesis Testing

    Step 1: Develop Null and Alternative hypotheses

    Refer to slides 6 & 7

    Step 2: Set the criteria for decision

    Level of significance (conventionally α = 0.05 or 0.01)

    Step 3: Compute the test statistics

    Appropriate statistical test like t-test, z-test and ANalysis Of Variance (ANOVA)

    Step 4: Interpret results

    Make your decision based on the set alpha (α) and the p-value, or based on the critical and calculated values.

    Step 5: Make a decision / conclusion

    When we reach statistical significance, the null hypothesis is rejected and the research (alternative) hypothesis is accepted.

    When we fail to reach statistical significance, the null hypothesis is not rejected (loosely, it is accepted) and the research (alternative) hypothesis is not supported.

  • Sample Size

    One of the main issues in experimental design is setting the sample size.

    Certainly, a larger sample size is desirable, because the data are then more likely to be close to a normal distribution and the estimates show less variation.

    If your project involves a population study (e.g. a survey), the sample size should, as a rule of thumb, be at least 10% of that population. In practice, this number is a challenge in studies that require an animal model, due to cost and animal ethics issues.

    The sample size required depends on the significance level you need and on how small a difference you want to detect. A stricter significance level or a smaller difference needs a bigger sample size.

    Some textbooks treat a sample size of 30 and above as a large sample for biological research. Below 30 is classified as a small sample size, which is also accepted in biological research.

    It's a good practice to refer to scientific articles related to your project to set your sample size.

    In general, clinical trials and population studies require a large sample size.

    Websites are available to calculate the sample size at different significance levels.
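    As a rough illustration of what such online calculators do, the sketch below uses one common textbook formula for estimating a mean, n = (z × σ / E)², where E is the margin of error you can tolerate. The formula choice, the function name and the example numbers are assumptions of this sketch and were not part of the session; calculators for comparing groups use different formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, alpha=0.05):
    """Smallest n so that a mean is estimated within +/- margin at confidence 1 - alpha.

    sigma  : assumed population standard deviation (e.g. from a pilot study)
    margin : tolerated margin of error, in the same unit as sigma
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical z
    return ceil((z * sigma / margin) ** 2)


# Hypothetical example: heights with an assumed SD of 7 cm, margin of error 2 cm.
for alpha in (0.05, 0.01):
    n = sample_size_for_mean(7.0, 2.0, alpha)
    print(f"alpha = {alpha}: n = {n}")
```

    A stricter α gives a larger n for the same margin of error.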

    18

  • Replication of experiment

    It is very important to obtain consistency and reproducibility in your experiment results.

    Replication of the whole experiment is laborious, time consuming and costly.

    One way to address this issue is having 3 or more replicates of samples within the same experiment.

    However, in some long term research or clinical studies replication of the whole experiment is required.

    19

  • Types of data you may collect in your experiments

    Data type: Qualitative data

    Description: Descriptive, observed, categorical or interpretive data

    Examples: Colour of bacteria colonies (yellow, red, brown, grey); behaviour of mice fed on caffeine diet; survey on elderly habits in taking prescribed medication

    Types of stats analysis to apply: Classification; factor analysis; cluster analysis; prediction

    Data type: Quantitative data (count / frequency data)

    Examples: No. of larvae that survived after heat treatment; no. of viable cells after treatment with anti-tumour drug

    Types of stats analysis to apply: Chi-square test, goodness of fit

    Data type: Quantitative data (measured data)

    Examples: Length of fish larvae after 2 weeks of treatment with supplemented feed; amount of insulin in the blood of diabetic mice in relation to their glycemic load diet over 2 weeks

    Types of stats analysis to apply: t-test, z-test, ANOVA, multiple comparison

    20

    Once you have designed your experiments, you will need to think about the data you will collect and what are the tests to apply.

  • Data validation prior to analysis

    Check your data for normal distribution.

    This can be done by subjecting your data to descriptive statistics.

    It can be visualized in Excel with a scatter plot or by constructing a frequency histogram.

    If your data points are normally distributed, you should see a roughly bell-shaped curve.

    If your data points are normally distributed, proceed to parametric tests such as the paired t-test, unpaired t-test or ANOVA, whichever is suitable for your data.

    If your data are not normally distributed, proceed to a non-parametric test such as the Wilcoxon signed-rank test (counterpart of the paired t-test), Mann-Whitney test (counterpart of the unpaired t-test), Kruskal-Wallis test (counterpart of ANOVA) or Chi-square test (for categorical data), whichever is suitable for your data.

    21

    Refer to annex slide 58 for more info

  • Some examples of statistical tests and their use

    Descriptive statistics: preliminary check to validate your data

    Paired t-test: compares the means of two groups of measurements that came from the same subjects. Used in before-and-after situations.

    Unpaired t-test: compares the means of two independent groups. The sample sizes need not be equal, and the data do not come from the same subjects.

    ANalysis Of Variance (ANOVA)*: analyzes the variances to compare the means of more than 2 groups

    Multiple comparison test* (e.g. Tukey's test): continuation of ANOVA to compare means in pairs

    Correlation & regression: examines the relationship between 2 or more variables

    Chi-square test: tests the distribution of categorical variables

    22

  • 23

    Descriptive Statistics

    Key in your data sets in Excel and follow the screen shots

    These are the basic statistics that summarize your data, such as mean, median, mode, variance and standard deviation.

    A preliminary check to know about your data before analysis.

  • 24

  • Descriptive statistics output in MS Excel

  • 26

    The important values you need to know from this table are mean, standard deviation, skewness and kurtosis. Why?
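    Skewness and kurtosis matter because they summarize how far the data depart from a bell curve: for normally distributed data both should be close to 0. The Python sketch below computes them with the usual bias-corrected sample formulas, which are believed to match Excel's SKEW and KURT functions. The function names and the example data are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(data):
    """Bias-corrected sample skewness (0 for perfectly symmetric data)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis (about 0 for normal data)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values")
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
    return scaled - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


symmetric = [1, 2, 3, 4, 5]        # hypothetical, evenly spread values
right_skewed = [1, 1, 2, 2, 10]    # hypothetical, one large value pulls the tail right
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed))
```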

  • Paired t-test

    You can use the paired t-test when you have 2 groups of data.

    Conditions

    Data are normally distributed.

    Samples are dependent; sample sizes must be equal.

    Used in before-and-after situations. In this case, the subjects will be the same.

    Two types of data are collected from the same subjects.

    Matching pairs by gender or age.

    It is most commonly used in clinical trial studies to evaluate the efficacy of new drugs for various diseases and disorders before and after treatments.
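    The calculation behind the paired t-test can be sketched in Python: take the difference within each subject, then divide the mean difference by its standard error. The data below are hypothetical before-and-after readings, and the critical value 2.776 is the two-tailed t table value for α = 0.05 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired t-test on two equal-length lists."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    diffs = [b - a for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical serum glucose readings for the same 5 subjects.
before = [140, 152, 138, 160, 149]
after = [132, 147, 135, 150, 141]

t, df = paired_t(before, after)
T_CRITICAL = 2.776  # t table: two-tailed, alpha = 0.05, df = 4
print(f"t = {t:.3f}, df = {df}, significant = {abs(t) > T_CRITICAL}")
```

    Comparing the calculated t with the table value is the "critical value" route to the same decision that Excel reports through its p-value.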

    27

  • 28

    Do you see a significant difference between S1 and S2?

    Example of Paired t-test

    In Excel it is called t-Test: Paired Two Sample for Means

  • Unpaired or Independent sample t-test

    You can use the unpaired t-test when you have 2 groups of data.

    Conditions

    Data are normally distributed.

    Samples are independent.

    Sample sizes need not be equal.

    Data collected from different subjects

    It is most commonly used in comparing Means of 2 groups.

  • Unpaired t- test

    Example: A pair of students tested the effect of hand washing liquid on bacterial growth. The students prepared several petri dishes with culture medium. Hand washing liquid was added to some of the petri dishes and not to others. The same amount of bacteria was inoculated in all petri dishes and incubated for 48 hrs. After 48 hrs, the number of colonies present in each petri dish was counted. The results are tabulated in the next slide.

    What are your null and alternative hypotheses?

  • Unpaired or Independent sample t-test in Excel

    In Excel it is called t-Test: Two-Sample Assuming Equal Variances


  • Unpaired or Independent sample t-test output in Excel

    Interpretation: The p-value is smaller than 0.05, and the t Stat value is larger than the t critical value. These values indicate there is a significant difference between the control group and the treatment group in terms of bacterial colonies.

    What is your conclusion on the null hypothesis?
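    The "t Stat" that Excel reports for the equal-variance test can be reproduced with a short Python sketch. The colony counts below are hypothetical and are not the data from the slide; the critical value 2.262 is the two-tailed t table value for α = 0.05 with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return (t, df) for an unpaired t-test assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df


# Hypothetical colony counts; note the group sizes are not equal.
control = [52, 48, 55, 50, 53, 49]
treated = [30, 35, 28, 33, 31]

t, df = pooled_t(control, treated)
T_CRITICAL = 2.262  # t table: two-tailed, alpha = 0.05, df = 9
print(f"t = {t:.2f}, df = {df}, significant = {abs(t) > T_CRITICAL}")
```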

  • ANOVA: ANalysis Of VAriance

    You can use ANOVA when you have more than 2 groups of data to compare and your data are normally distributed. Sample sizes need not be the same.

    Advantage: It can tell us whether there is an overall significant difference among the experimental groups.

    Limitation: It doesn't tell us which group is significantly different from the other groups.
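    What ANOVA computes can be sketched in Python: it compares the variation between the group means with the variation within the groups, and reports their ratio as F. The four groups below are hypothetical values, and 3.24 is the F table value for α = 0.05 with 3 and 16 degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical measurements for 4 groups of 5 samples each.
groups = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [5, 6, 7, 8, 9],
    [3, 4, 5, 6, 7],
]
f, df_b, df_w = one_way_anova(groups)
F_CRITICAL = 3.24  # F table: alpha = 0.05, df = (3, 16)
print(f"F = {f:.3f}, df = ({df_b}, {df_w}), significant = {f > F_CRITICAL}")
```

    A significant F only says that at least one group differs; it does not say which one.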

    34

  • One-way ANOVA (ANOVA single factor in Excel)

    Example: A biologist quantified the levels of cadmium present in different species of seaweed. He collected the same amounts of 4 different species of seaweed and measured cadmium levels separately. The data are tabulated in the next slide.

    What are your null and alternative hypotheses?

  • 36

    Prior to doing ANOVA, check the data for normality

    Are data in this example normally distributed? Why?

  • 37

    ANOVA single factor in MS Excel

  • 38

    ANOVA single factor in MS Excel

  • What is your interpretation on the ANOVA results?

    Do you accept or reject the null hypothesis? Why?

    What is your next step in analysis? Why?

  • Multiple comparison: Tukey's test

    You can use Tukey's test when ANOVA shows an overall significant difference (P < 0.05) in your data.

    Tukey's test is able to tell you which group is significantly different from the others.

    It makes pairwise comparisons to show the difference in means. It is a very useful test available in Excel for multiple comparison.

    40

    How to do Multiple Comparisons in Excel?

    Select data with labels > click Add-Ins > Data Analysis Plus > Multiple Comparisons > OK

    Check Labels if you included labels when you selected the data; otherwise, leave it unchecked. By default α is 0.05. Press OK (unless you need to change this).

    You will see the multiple comparison output in another Excel sheet.

  • Multiple comparison in MS Excel (Tukey's Test)

    41

  • 42

    Multiple comparison in MS Excel (Tukey's Test)

  • 43

    The Excel output of Tukey's test looks like this

    How to interpret?

    Look at the values in the difference column and the Omega column.

    If the value in the difference column is LARGER than the value in the Omega column for a particular pair, the mean difference between the pair is statistically significant.

    If the value in the difference column is SMALLER than the value in the Omega column for a particular pair, the mean difference between the pair is NOT statistically significant.

    In your conclusion, if you have many pairs to interpret, you may write only about the pairs that are significantly different.
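    The comparison of difference against Omega can be sketched in Python. The sketch assumes that Omega is Tukey's critical difference, q × sqrt(MS within / n per group), with equal group sizes; that is how textbooks define it, but it is an assumption about what the add-in reports. The group means, the mean square and the q value 4.05 (studentized range table, α = 0.05, 4 groups, 16 degrees of freedom) are illustrative inputs only.

```python
from itertools import combinations
from math import sqrt


def tukey_pairs(group_means, ms_within, n_per_group, q_critical):
    """Return (omega, rows); each row is (group_a, group_b, difference, significant).

    Assumes every group has the same number of samples.
    q_critical must be looked up in a studentized range table by the user.
    """
    omega = q_critical * sqrt(ms_within / n_per_group)
    rows = []
    for (name_a, mean_a), (name_b, mean_b) in combinations(group_means.items(), 2):
        difference = abs(mean_a - mean_b)
        rows.append((name_a, name_b, difference, difference > omega))
    return omega, rows


# Hypothetical inputs: 4 group means, MS within = 2.5, 5 samples per group.
means = {"A": 3.0, "B": 4.0, "C": 7.0, "D": 5.0}
omega, rows = tukey_pairs(means, ms_within=2.5, n_per_group=5, q_critical=4.05)

print(f"omega = {omega:.3f}")
for name_a, name_b, difference, significant in rows:
    print(f"{name_a} vs {name_b}: difference = {difference:.1f}, significant = {significant}")
```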

  • Scientific Presentation of Results

    What you mainly need are

    Suitable chart

    Mean values of results

    Standard deviation or standard error of mean

    Confidence interval (in some cases)

    Interpretation of results

    Proper X and Y axes labels

    Adequate title

    Footnote

    In the footnote, indicate the statistical output, α (level of significance), sample size, test statistics used and whether the experiment is replicated or not.

    44

    You can refer to slides 46-50 for some examples of published scientific data presentations using different kinds of charts.

  • When do you use Standard deviation (SD), Standard error of mean (SEM) and Confidence Interval (CI) in your data presentation?

    45

    When your Goal is to SD SEM CI

    show the variation in each group

    show how precisely you have determined the mean.

    compare means of different groups.

    Whatever you choose to show, be sure to state your choice in the footnote of your graph /table.

  • Samples of data presentation

    Example 1: Table format

    46

    Table format is fine if you have many groups to compare. However, it may not be impressive, and it is not easy for readers to visualize the effects of the experiment.

    Note the title is on top. The footnote at the bottom of the table contains details of test groups, sample size, statistical tests & software used.

  • Example 2: Simple bar chart to compare Means of 3 groups

    47

    Note the title and footnote are combined at the bottom of the graphs.

    Bars represent mean values and error bars represent SEM.

    The bar with * indicates that the No Gum group is significantly different from the other 2 groups.

  • Example 3: Vertical Cluster bar chart used to compare means of 5 groups (different time intervals) with different blood cells

    48

    Bars indicate group Means and error bars indicate Standard error of Means

  • 49

    Note: confidence interval is stated

    Mean value

    Example 4. Line chart to present survival studies (a line chart is suitable to present data that were collected over a period of time, cell culture studies and incubation studies)

  • Example 5. Line and clustered bar charts

    50

    A line chart is well suited for incubation experiments that show data over the period of the experiment (shows trends).

    It is easy to visualize if you have too many groups to compare.

  • Important Steps to prepare a graph with error bars in MS Excel !!!

    Key in raw data in Excel spreadsheet.

    Find the Mean, Standard Deviation (SD) and Standard Error of Mean (SEM) in Excel or manually.

    Use Mean values to draw a chart. NOT THE RAW DATA.

    Use SD as error bars on the chart if your interest is to show the deviation of data from the mean.

    Use SEM as error bars on the chart, if your interest is to compare the means of different groups. This is the most commonly used (E.g. Compare the mean height of kangkong plant grown in different conditions).

    Label the X and Y axes with proper terms and units.

    Give appropriate title to your chart.

    Indicate the statistical test used, level of significance, sample size (n), and choice of error bar used in the footnote. (E.g. test used is ANOVA, level of significance is p

  • How to prepare chart with error bars in Excel

    52

    In Excel, go to the formula bar and type

    =AVERAGE(select cells with data) E.g. =AVERAGE(b2:b6)

    =STDEV(select cells with data) E.g. =STDEV(b2:b6)

    =(cell with STDEV value)/SQRT(n) E.g. =B8/SQRT(5)

    Or select descriptive statistics as shown in slides 23-26.

    E.g. Key in data, calculate the mean, SD and SEM as shown in this slide.

    n is sample size
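    For those who want to cross-check the Excel values, the same three quantities can be computed in Python. This is only a sketch: the function name and the data are hypothetical, and stdev here is the sample standard deviation (n - 1 in the denominator), the same definition that Excel's STDEV uses.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return n, mean, sample SD and SEM for one group of measurements."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / sqrt(n)}


# Hypothetical plant heights (cm) for one feed group.
heights = [10, 12, 14, 16, 18]
summary = summarize(heights)
print(f"n = {summary['n']}, mean = {summary['mean']}, "
      f"SD = {summary['sd']:.3f}, SEM = {summary['sem']:.3f}")
```

    The mean goes on the chart as the bar height, and the SEM (or SD) is the value entered as the custom error bar.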

  • To draw bar chart

    Copy and paste the feed labels and mean values onto new cells as shown in this slide. Select the feed labels and mean values > Insert > select column chart > 2D column and press Enter. You will see a bar chart as shown below.

    53

  • To enter error bars

    Click the chart > chart elements > click the error bar option and the arrow, and select more options as shown in this slide.

    54

  • Under error bar option select Both (by default it should be both), scroll down and select custom as shown in this slide

    55

    Click specify values and fill in the positive and negative error values by selecting the SEM values in the spreadsheet, once for the positive error value and a second time for the negative error value, as shown in this slide, and press OK.

    56

  • You can see a chart with SEM as error bars. Fill in axes and title as shown in this slide

    57

  • Flow chart used to choose an appropriate statistical test for your data

    58

    It may look complicated, but it's not and it's a very good reference.

    Refer to the soft copy for an enlarged view

    Annex

  • Q & A

    59