INFERENTIAL STATISTICS
BY:
ADIBARAWIAH BT ABDUL RAHIM (D20102040002)
NIK NURHAFIZAH BT NIK DAUD (D20102039968)
NOOR AZIE HARYANIS BT ABDUL AZIZ (D20102039969)
WHAT ARE INFERENTIAL STATISTICS?
› Inferential statistics refer to certain procedures that allow researchers to make inferences about a population based on data obtained from a sample.
THE LOGIC OF INFERENTIAL STATISTICS
Sampling Error
› Definition:
– The difference between a sample and its population (based on the data obtained).
– Arises as a result of taking a sample from the population rather than using the whole population.
Distribution of Sample Means
› Also called the sampling distribution.
– It has its own mean and standard deviation.
– Its mean is the mean of the sample means, which is equal to the mean of the population.
› Large collections of random samples:
– Pattern themselves in such a way that researchers can accurately predict some characteristics of the population from which the samples were selected.
– Their means tend to be normally distributed (provided each sample is large; more than 30).
Standard Error Of The Mean (SEM)
› The standard deviation of the sampling distribution of means is called the standard error of the mean (SEM).
› Formula: SEM = SD / √n, where SD = the standard deviation and n = sample size.
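As an illustration, the SEM can be computed directly from a sample. A minimal sketch in Python; the scores are hypothetical, invented for illustration:

```python
import numpy as np

# Estimate the standard error of the mean (SEM) from a sample,
# using SEM = SD / sqrt(n). Scores are hypothetical illustration data.
scores = np.array([72, 85, 90, 68, 77, 81, 94, 66, 73, 88])

n = len(scores)                  # sample size
sd = scores.std(ddof=1)          # sample standard deviation
sem = sd / np.sqrt(n)            # standard error of the mean

print(f"n = {n}, SD = {sd:.2f}, SEM = {sem:.2f}")
```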
CONFIDENCE INTERVALS
› A confidence interval is a region extending both above and below a sample statistic (such as a sample mean) within which a population parameter (such as the population mean) may be said to fall with a specified probability of being wrong.
› Limits or boundaries within which the population mean lies:
– 68% fall within ± 1 SEM of the mean
– 95% fall within ± 2 SEM of the mean
– 99% fall within ± 3 SEM of the mean
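A sketch of these rule-of-thumb intervals, again with the hypothetical scores from above, using scipy.stats.sem for the standard error:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of test scores (illustration only).
scores = np.array([72, 85, 90, 68, 77, 81, 94, 66, 73, 88])

mean = scores.mean()
sem = stats.sem(scores)          # SD / sqrt(n), ddof=1 by default

# Approximate intervals using the rule of thumb from the slide.
for k, level in [(1, "68%"), (2, "95%"), (3, "99%")]:
    print(f"{level}: {mean - k * sem:.2f} to {mean + k * sem:.2f}")
```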
The Standard Error Of The Difference Between Sample Means
› The standard error of the difference (SED):
– is the standard deviation of the distribution of differences between sample means.
– Formula: SED = √(SEM1² + SEM2²), where SEM1 and SEM2 are the standard errors of the means of the two samples.
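A minimal sketch of the SED computed from two hypothetical samples, assuming the combined-SEM formula above:

```python
import numpy as np

# Standard error of the difference (SED) between two independent
# sample means, combining the two SEMs.
def sed(sample_a, sample_b):
    sem_a = np.std(sample_a, ddof=1) / np.sqrt(len(sample_a))
    sem_b = np.std(sample_b, ddof=1) / np.sqrt(len(sample_b))
    return np.sqrt(sem_a**2 + sem_b**2)

group_a = [85, 88, 90, 79, 84, 91]   # hypothetical scores
group_b = [78, 82, 75, 80, 77, 83]
print(f"SED = {sed(group_a, group_b):.2f}")
```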
HYPOTHESIS TESTING
› Statistical hypothesis testing is a way of determining the probability that an obtained sample statistic will occur, given a hypothetical population parameter.
RESEARCH HYPOTHESIS
› A research hypothesis specifies the nature of the relationship the researcher thinks exists in the population.
› E.g.:
“The population mean of students using method A is greater than the population mean of students using method B.”
NULL HYPOTHESIS
› The null hypothesis typically specifies that there is no relationship in the population.
› E.g.:
“There is no difference between the population mean of students using method A and the population mean of students using method B.”
› (This is the same thing as saying the difference between the means of the two populations is zero.)
Steps in hypothesis testing
1. State the research hypothesis: “There is a difference between the population mean of students using method A and the population mean of students using method B.”
2. State the null hypothesis: “There is no difference between the population mean of students using method A and the population mean of students using method B.”
3. Determine the sample statistics pertinent to the hypothesis: the mean of sample A and the mean of sample B.
4. Determine the probability of obtaining the sample results: the difference between the mean of sample A and the mean of sample B.
5. If the probability is small, reject the null hypothesis, thus affirming the research hypothesis. If the probability is large, do not reject the null hypothesis, which means you cannot affirm the research hypothesis. (A worked sketch of these steps follows.)
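The five steps can be walked through with a two-sample t-test in Python. All data here are simulated for illustration; scipy.stats.ttest_ind performs the test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical achievement scores for two teaching methods.
method_a = rng.normal(85, 10, 31)
method_b = rng.normal(80, 10, 31)

# Steps 3-4: compute the sample statistics and the probability of
# obtaining a difference this large if the null hypothesis is true.
t_stat, p_value = stats.ttest_ind(method_a, method_b)
print(f"mean A = {method_a.mean():.1f}, mean B = {method_b.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Step 5: reject the null hypothesis if the probability is small.
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Do not reject the null hypothesis.")
```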
PRACTICAL VS. STATISTICAL SIGNIFICANCE
PRACTICAL SIGNIFICANCE
› A calculated difference is practically significant if the actual difference it is estimating will affect a decision to be made.
› Practical significance is more subjective and is based on other factors like cost, requirements, program goals, etc.
› When determining practical significance the researcher must consider the following:
– The quality of the research questions
– The relative size of the effect
– The size of the sample
– The importance of the finding
– Confidence intervals
– The link to previous research
– The strength of correlation
STATISTICAL SIGNIFICANCE
› Statistical significance only means that one’s results are likely to occur by chance less than a certain percentage of the time, say 5 percent.
› It is the degree of risk that you are willing to take that you will reject a null hypothesis when it is actually true.
› Statistical significance is mathematical - it comes from the data (sample size) and from your confidence (how confident you want to be in your results).
TESTS OF STATISTICAL SIGNIFICANCE
› A one-tailed test of significance involves the use of probabilities based on one-half of a sampling distribution because the research hypothesis is a directional hypothesis.
› A two-tailed test, on the other hand, involves the use of probabilities based on both sides of a sampling distribution because the research hypothesis is a nondirectional hypothesis.
TWO TAILED TESTS
› If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction.
› This means that .025 is in each tail of the distribution of your test statistic.
› When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions.
› For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test will test both whether the mean is significantly greater than x and whether the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.
ONE TAILED TESTS
› If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest.
› This means that .05 is in one tail of the distribution of your test statistic.
› When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction.
› Our null hypothesis is that the mean is equal to x. A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x, but not both.
› Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05.
› The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction.
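A sketch contrasting the two kinds of test, using scipy.stats.ttest_1samp with its alternative argument (available in newer versions of SciPy); the sample data are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(52, 10, 40)   # hypothetical sample
x = 50                            # hypothesized population mean

# Two-tailed: is the mean different from x (in either direction)?
t2, p2 = stats.ttest_1samp(sample, x, alternative="two-sided")

# One-tailed: is the mean greater than x (one direction only)?
t1, p1 = stats.ttest_1samp(sample, x, alternative="greater")

print(f"two-tailed: t = {t2:.2f}, p = {p2:.4f}")
print(f"one-tailed: t = {t1:.2f}, p = {p1:.4f}")  # half of p2 when t > 0
```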
TYPE I AND TYPE II ERRORS
› Type I Error: rejecting the null hypothesis when it is true.
› The probability of making a Type I error:
– Set by the researcher.
– E.g., .01 = 1% chance of rejecting the null when it is true.
– E.g., .05 = 5% chance of rejecting the null when it is true.
– Not the probability of making one or more Type I errors on multiple tests of the null.
› Type II Error: not rejecting the null hypothesis when it is not true.
› The probability of making a Type II error:
– Not directly controlled by the researcher.
– Reduced by increasing sample size.
Making a decision
› If you reject the null hypothesis when it is actually true (there really are no differences), you have made a Type I error.
› If you reject the null hypothesis when it is actually false (there really are differences), you have made a correct decision.
› If you accept the null hypothesis when it is actually false (there really are differences), you have made a Type II error.
› If you accept the null hypothesis when it is actually true (there really are no differences), you have made a correct decision.
SIGNIFICANCE LEVELS
› The term significance level (or level of significance), as used in research, refers to the probability of a sample statistic occurring as a result of sampling error.
› The significance levels most commonly used in educational research are the .05 and .01 levels.
› Statistical significance and practical significance are not necessarily the same. Even if a result is statistically significant, it may not be practically (i.e., educationally) significant.
Probability Values
› p > .05 (deemed likely to be a result of chance)
› p < .05 (not likely to be a result of chance)
› p < .01 (less likely to be a result of chance)
› p < .001 (even less likely to be a result of chance)
Researchers are more often reporting the actual probability value rather than using < or > signs (e.g., p = .063).
INFERENCE TECHNIQUES
Commonly Used Inferential Techniques

Quantitative data
– Parametric: t-test for means; analysis of variance (ANOVA); analysis of covariance (ANCOVA); multivariate analysis of variance (MANOVA); t-test for r
– Nonparametric: Mann-Whitney U test; Kruskal-Wallis one-way analysis of variance; sign test; Friedman two-way analysis of variance

Categorical data
– Parametric: t-test for difference in proportions
– Nonparametric: chi square
PARAMETRIC TESTS FOR QUANTITATIVE DATA
› A parametric statistical test requires various kinds of assumptions about the nature of the population from which the samples involved in the research study were taken.
The t-Test for Means
› Used to see whether a difference between the means of two samples is significant.
› Produces a value for t (called an obtained t), which is checked to determine whether the chosen level of significance (e.g., p = .05) has been reached.
– The researcher rejects the null hypothesis and concludes that a real difference does exist when the level of significance is reached.
› Two forms of t-test:
– A t-test for independent means
– A t-test for correlated means
t-test for independent means
› Used to compare the mean scores of two different, or independent, groups.
› Example: Two randomly selected groups of students (31 in each group) were exposed to two different methods of teaching for a semester and were given the same achievement test at the end of the semester. Their achievement scores could be compared using a t-test.
– Null hypothesis: population mean of method A = population mean of method B
– Research hypothesis: population mean of method A > population mean of method B
› The mean score of the achievement test for method A = 85
› The mean score of the achievement test for method B = 80
› A one-tailed t-test is conducted on the difference between the two means (85 − 80 = 5) to conclude whether the difference is statistically significant or not (see the sketch below).
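A hedged sketch of this example with simulated scores (the raw data behind the two means aren't given), using scipy.stats.ttest_ind with alternative="greater" for the one-tailed test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical achievement scores for the two methods (31 students each).
method_a = rng.normal(85, 8, 31)
method_b = rng.normal(80, 8, 31)

# One-tailed test of the research hypothesis: mean A > mean B.
t_stat, p_value = stats.ttest_ind(method_a, method_b, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```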
t-test for correlated means
› Used to compare the mean scores of the same group before and after a treatment of some sort is given.
› Used when the same subjects receive two different treatments in a study.
› Example: A researcher investigates the effectiveness of relaxation training for reducing the level of anxiety athletes experience and thus improving their performance at the free throw line. She formulates these hypotheses:
– Null hypothesis: There will be no change in performance at the free throw line.
– Research hypothesis: Performance at the free throw line will improve.
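A sketch with invented before/after free-throw percentages, using scipy.stats.ttest_rel (the paired-samples t-test):

```python
from scipy import stats

# Hypothetical free-throw percentages for ten athletes before and
# after relaxation training.
before = [62, 55, 70, 48, 66, 59, 73, 51, 64, 58]
after  = [68, 60, 71, 55, 70, 57, 78, 56, 69, 63]

# Correlated-means (paired) t-test; one-tailed since the research
# hypothesis predicts improvement.
t_stat, p_value = stats.ttest_rel(after, before, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```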
Analysis of Variance (ANOVA)
› Compares more than two means by forming a ratio of the observed differences between groups to the variation within groups.
› It is more versatile than a t-test and should be used in most cases instead of the t-test.
› The analysis allows comparison of the means of the samples and testing of the null hypothesis that there is no significant difference between the means of the samples.
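A minimal sketch with three hypothetical groups, using scipy.stats.f_oneway:

```python
from scipy import stats

# Hypothetical scores for three teaching methods.
group1 = [85, 88, 90, 79, 84, 91]
group2 = [78, 82, 75, 80, 77, 83]
group3 = [88, 92, 86, 90, 85, 89]

# One-way ANOVA: F is the ratio of between-group to within-group variance.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```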
Analysis of Covariance (ANCOVA)
› Used when groups are given a pretest related in some way to the dependent variable and their mean scores on this pretest are found to differ.
› Enables the researcher to adjust the posttest mean scores on the dependent variable for each group to compensate for the initial differences between the groups on the pretest. The pretest is called the covariate.
› How much the posttest mean scores must be adjusted depends on how large the difference between the pretest means is and the degree of relationship between the covariate and the dependent variable.
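One common way to run an ANCOVA is as a linear model with the pretest as the covariate; this sketch uses the statsmodels formula API on invented pretest/posttest data (the model specification here is an illustrative assumption, not the slides' own procedure):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pretest/posttest scores for two groups.
df = pd.DataFrame({
    "group":    ["A"] * 6 + ["B"] * 6,
    "pretest":  [60, 65, 58, 70, 62, 68, 55, 59, 52, 63, 57, 61],
    "posttest": [78, 84, 75, 88, 80, 86, 70, 74, 68, 79, 72, 76],
})

# ANCOVA as a linear model: posttest adjusted for the pretest (covariate).
model = smf.ols("posttest ~ pretest + C(group)", data=df).fit()
print(model.summary())
```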
Multivariate analysis of variance (MANOVA)
› Differs from ANOVA in only one respect: It incorporates two or more dependent variables in the same analysis, thus permitting a more powerful test of differences among means.
› It is justified only when the researcher has reason to believe correlations exist among the dependent variables.
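A sketch on invented data using statsmodels' MANOVA class (an assumption about tooling, not part of the original slides):

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two correlated dependent variables per student.
df = pd.DataFrame({
    "group":   ["A"] * 5 + ["B"] * 5,
    "reading": [78, 84, 75, 88, 80, 70, 74, 68, 79, 72],
    "writing": [74, 80, 72, 85, 79, 68, 71, 65, 75, 70],
})

# Tests group differences on both dependent variables at once.
mv = MANOVA.from_formula("reading + writing ~ group", data=df)
print(mv.mv_test())
```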
The t-Test for r
› Used to see whether a correlation coefficient calculated on sample data is significant; that is, whether it represents a nonzero correlation in the population from which the sample was drawn.
› The statistic being dealt with is a correlation coefficient ( r ) rather than a difference between means. The test produces a value for t (again called an obtained t ), which the researcher checks in a statistical probability table to see whether it is statistically significant. As with the other parametric tests, the larger the obtained value for t , the greater the likelihood that significance has been achieved.
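A sketch using scipy.stats.pearsonr, which returns both r and the p-value for the test that the population correlation is zero; the paired data are hypothetical:

```python
from scipy import stats

# Hypothetical paired observations (e.g., study hours and exam scores).
hours  = [2, 5, 1, 8, 4, 6, 3, 7, 5, 9]
scores = [55, 70, 50, 88, 66, 75, 60, 85, 72, 90]

# pearsonr tests whether the population correlation is nonzero
# (equivalent to the t-test for r).
r, p_value = stats.pearsonr(hours, scores)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```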
NONPARAMETRIC TESTS FOR QUANTITATIVE DATA
› A nonparametric statistical technique makes few, if any, assumptions about the nature of the population from which the samples in the study were taken.
› Some of the commonly used nonparametric techniques for analyzing quantitative data are the Mann-Whitney U test, the Kruskal-Wallis one-way analysis of variance, the sign test, and the Friedman two-way analysis of variance.
The Mann-Whitney U Test
› A nonparametric alternative to the t-test, used when a researcher wishes to analyze ranked data. The researcher intermingles the scores of the two groups and then ranks them as if they were all from just one group.
› The test produces a value (U), whose probability of occurrence is then checked by the researcher in the appropriate statistical table.
› The logic of the test is as follows:
– If the parent populations are identical, then the sum of the pooled rankings for each group should be about the same.
– If the summed ranks are markedly different, on the other hand, then this difference is likely to be statistically significant.
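A minimal sketch on invented ratings, using scipy.stats.mannwhitneyu:

```python
from scipy import stats

# Hypothetical ranked-scale ratings from two independent groups.
group_a = [14, 18, 11, 20, 16, 15, 19]
group_b = [9, 12, 7, 13, 10, 8, 11]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b,
                                     alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```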
The Kruskal-Wallis One-Way Analysis of Variance
› Used when researchers have more than two independent groups to compare.
› The procedure is quite similar to the Mann-Whitney U test, except that the sums of the ranks for each of the separate groups are compared.
› This analysis produces a value (H), whose probability of occurrence is checked by the researcher in the appropriate statistical table.
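A sketch with three hypothetical groups, using scipy.stats.kruskal:

```python
from scipy import stats

# Hypothetical ratings from three independent groups.
group1 = [14, 18, 11, 20, 16]
group2 = [9, 12, 7, 13, 10]
group3 = [15, 19, 13, 21, 17]

h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```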
The Sign Test
› Used when a researcher wants to analyze two related (as opposed to independent) samples. Related samples are connected in some way.
› For example, often a researcher will try to equalize groups on IQ, gender, age, or some other variable.
› Another example of a related sample is when the same group is both pre- and posttested (that is, tested twice). Each individual, in other words, is tested on two different occasions (as with the t-test for correlated means).
› Procedure:
– Simply line up the pairs of related subjects and then determine how many times the paired subjects in one group scored higher than those in the other group. If the groups do not differ significantly, the totals for the two groups should be about equal. If there is a marked difference in scoring (such as many more in one group scoring higher), the difference may be statistically significant.
The Friedman Two-Way Analysis of Variance
› If more than two related groups are involved, then this test can be used.
› Example: This test would be appropriate if a researcher employs four matched groups.
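A sketch on invented repeated measures, using scipy.stats.friedmanchisquare:

```python
from scipy import stats

# Hypothetical scores for the same ten subjects under three conditions.
cond1 = [7, 9, 6, 8, 7, 9, 6, 8, 7, 9]
cond2 = [5, 7, 4, 6, 5, 7, 5, 6, 4, 6]
cond3 = [8, 9, 7, 9, 8, 9, 7, 9, 8, 9]

chi2, p_value = stats.friedmanchisquare(cond1, cond2, cond3)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```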
PARAMETRIC TESTS FOR CATEGORICAL DATA
› The most common parametric technique for analyzing categorical data is the t-test for differences in proportions.
t-Test for Proportions
› Used to analyze whether the proportion in one category (e.g., males) is different from the proportion in another category (e.g., females).
› Two forms, similar to the t-tests for means with quantitative data:
– t-test for independent proportions
– t-test for correlated proportions
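The textbook's exact t-test-for-proportions procedure isn't shown on the slide, so the sketch below uses the closely related pooled z-test for two independent proportions on hypothetical counts:

```python
import math
from scipy import stats

# Hypothetical counts: successes out of n in two independent groups.
x1, n1 = 30, 50   # e.g., males who passed
x2, n2 = 20, 50   # e.g., females who passed

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

# Test statistic for the difference between two independent proportions.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * stats.norm.sf(abs(z))   # two-tailed

print(f"z = {z:.2f}, p = {p_value:.4f}")
```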
NONPARAMETRIC TESTS FOR CATEGORICAL DATA
› The chi-square test is the nonparametric technique most commonly used to analyze categorical data.
The Chi-Square Test
› The chi-square statistic can be used to determine the strength of the relationship (i.e., does knowing someone’s gender help you predict their outcome score/value?).
› The test statistic is:
χ² = Σ (O − E)² / E
where:
χ² = chi-square value
O = observed frequency for each category
E = expected frequency for each category
Chi-square example
› We are interested in whether male students vs. female students are more likely to own cats vs. dogs.
› Notice that both variables are categorical.
– Kind of pet: people are classified as owning cats or dogs (or both or neither). We can count the number of people belonging to each category; we don’t scale them along a dimension of pet ownership.
– Sex: people are male or female. We count the number of people in each category; we don’t scale each person along a sex dimension.
Example Data
› Males are more likely to have dogs as opposed to cats.
› Females are more likely to have cats than dogs.

         Cat   Dog   Total
Male      20    30      50
Female    30    20      50
Total     50    50     100
NHST Question: Are these differences best accounted for by the null hypothesis or by the hypothesis that there is a real relationship between gender and pet ownership?
› To answer this question, we need to know what we would expect to observe if the null hypothesis were true (i.e., that there is no relationship between these two variables, and any observed relationship is due to sampling error).
Example Data
› To find the expected value for a cell of the table, multiply the corresponding row total by the column total, and divide by the grand total.
› For the first cell (and all other cells): (50 × 50) / 100 = 25.
› Thus, if the two variables are unrelated, we would expect to observe 25 people in each cell.

         Cat   Dog   Total
Male      25    25      50
Female    25    25      50
Total     50    50     100
Example Data
› The differences between these expected values and the observed values are aggregated according to the chi-square formula:

χ² = (20 − 25)²/25 + (30 − 25)²/25 + (30 − 25)²/25 + (20 − 25)²/25
   = 25/25 + 25/25 + 25/25 + 25/25
   = 1 + 1 + 1 + 1
   = 4
Null Hypothesis Significance Testing (NHST) and chi-square
› Once you have the chi-square statistic, it can be evaluated against a chi-square sampling distribution.
› The sampling distribution characterizes the range of chi-square values we might observe if the null hypothesis is true but sampling error is giving rise to deviations from the expected values.
› You can look up the probability value associated with a chi-square statistic in a table or using a computer.
› In our example, in which the chi-square was 4.0 (with 1 degree of freedom), the associated p-value is about .046, which is less than .05. (The chi-square statistic had to be larger than 3.84 to be significant at the .05 level, and 4.0 exceeds that critical value.)
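The same computation can be reproduced with scipy.stats.chi2_contingency; note that correction=False is needed to match the hand calculation, since SciPy otherwise applies Yates' continuity correction to 2×2 tables:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the pet-ownership example.
observed = np.array([[20, 30],    # male:   cat, dog
                     [30, 20]])   # female: cat, dog

# correction=False reproduces the hand computation (chi-square = 4.0).
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p_value:.4f}")
print("expected counts:\n", expected)
```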
POWER OF A STATISTICAL TEST
› The power of a statistical test for a particular set of data is the likelihood of identifying a difference between population parameters when in fact that difference exists.
› Parametric tests are generally, but not always, more powerful than nonparametric tests.
REFERENCES
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education. New York: McGraw-Hill.
Idre. (2013). Tail tests. Retrieved 6 November 2013 from http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm
Sauro, J. (2004–2013). The standard error of the mean. Retrieved from http://www.usablestats.com/lessons/sem