summary of data collection and analysis process

Upload: dr-muhammad-mushtaq-mangat

Post on 10-Apr-2018


  • 8/8/2019 Summary of Data Collection and Analysis Process


    Data Collection and Analysis:

    An Introduction

    Presented to:

    Prof. Jiri Militky & Prof. Lubos Hes

    By

    Muhammad Mushtaq Ahmed Mangat

    Textile Faculty

    Technical University Liberec

    Sep 09, 2010


    Table of Contents, Tables and Figures

    Part One: Statistics Definition and Functions

    Descriptive statistics

    Inferential statistics

    Populations and samples

    Types of samples

    Number systems

    Data

    Variable

    Independent variable

    Dependent variable

    Univariate data

    Bivariate Data

    Multivariate data

    Discrete quantitative data

    Continuous Quantitative data

    Ordinal

    Nominal

    Time Series Data

    Cross-sectional data

    Primary

    Secondary

    Data Arranging and Presentation

    List of data

    Data Frequency

    Part Two

    Frequency Table

    Pie Chart

    Bar Chart

    Area Charts

    Line Charts

    Dot Plot

    Histogram

    Histogram with normal curve

    Radar Charts

    Map chart

    Stem and Leaf Plot

    Box and Whisker Plot (or Boxplot)

    Polygon charts

    Range

    Arithmetic mean

    Geometric mean

    Trimmed Mean

    Median

    Mode

    Percentiles

    Extremes and quartiles

    Variance

    Standard deviation ($\sigma$)

    Variance sum law

    Percentile summary

    Normal Distribution

    Skewed Distribution

    Kurtosis

    Sampling distribution

    Standard error (standard deviation of sampling)

    Normal Distribution and central limit theorem

    Characteristics of Normal distributions

    Binomial distribution

    Bivariate and Multivariate analysis

    Correlation

    Hypothesis Testing

    Null hypothesis

    The research hypothesis (alternative hypothesis)

    Two Tail and One Tail Test

    Hypothesis Testing Methods

    Z Score Test

    Types of Z test

    Z value calculation

    t Test (William Sealy Gosset, 1908)

    One sample t test

    P value

    Correlation

    Regression Analysis

    Example of regression analysis

    Explanation of model

    Multinomial logistic regression

    Results of Multinomial Logistic Regression

    Chi-Square Test

    Crosstabs

    Part One: Statistics Definition and Functions

    Statistics is the art and science of collecting and understanding data. Its main functions are:

    1. Gathering data

    2. Arranging data

    3. Analyzing data

    4. Exploring the data

    5. Estimating unknown quantities

    6. Presenting results

    7. Interpreting results

    8. Making results available for decisions

    9. Designing a plan for data collection

    10. Hypothesis testing

    Descriptive statistics

    Descriptive statistics are used to describe the main features of a collection of data in quantitative terms (en.wikipedia.org/wiki/Descriptive_statistics).

    Inferential statistics

    A statistical inference is a conclusion drawn from data that are subject to random variation of some kind, such as observation errors or sampling variation (en.wikipedia.org/wiki/Inferential_statistics).

    Populations and samples

    The population is the larger set from which a sample is drawn; the sample is a small subset of that larger set.

    Types of samples

    Random sample, Stratified sample, Quota sample, Purposive sample, Convenience sample
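The first two designs in this list rely on chance and can be illustrated with Python's standard `random` module. This is a minimal sketch, not a prescribed procedure: the student population, the stratum labels ("M"/"F") and the 10% sampling fraction are hypothetical. Quota, purposive and convenience samples depend on the researcher's judgement and are not shown.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sample: every item has the same chance of selection (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Stratified sample: group items into strata, then draw the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 students, 40 male ("M") and 20 female ("F")
students = [(f"s{i}", "M" if i < 40 else "F") for i in range(60)]

srs = simple_random_sample(students, 6, seed=1)
strat = stratified_sample(students, lambda s: s[1], 0.10, seed=1)
print(len(srs), len(strat))  # 6 6: the stratified sample always has 4 "M" and 2 "F"
```

The stratified version guarantees that each group appears in proportion to its size, whereas a simple random sample of 6 may by chance over- or under-represent one group.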


    Number systems

    Natural: 0, 1, 2, 3, 4, 5, 6, 7, ..., n

    Integers: -n, ..., -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, ..., n

    Positive integers: 1, 2, 3, 4, 5, ..., n

    Rational: a/b where a and b are integers and b is not zero (e.g. 3/4)

    Real: the limit of a convergent sequence of rational numbers (e.g. -1.23, 1.234)

    Complex: a + bi where a and b are real numbers and i is the square root of -1

    Prime numbers: a natural number that has exactly two distinct natural number divisors, 1 and itself (e.g. 2, 3, 5, 7, 11)

    Irrational numbers: precisely those numbers whose decimal expansions are infinite and non-repeating (e.g. $\pi$; 22/7 is only a rational approximation of $\pi$)
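The prime-number definition (exactly two distinct divisors, 1 and itself) can be checked directly. A small sketch using trial division, which is only one of several possible methods; it also shows why 1 is not prime:

```python
def is_prime(n: int) -> bool:
    """A natural number is prime if it has exactly two distinct divisors: 1 and itself."""
    if n < 2:
        return False  # 0 and 1 do not have two distinct divisors
    d = 2
    while d * d <= n:  # a composite n always has a divisor no larger than sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True

print([n for n in range(1, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```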

    Data

    Data refers to any kind of recorded information.

    Variable

    A piece of information recorded for every item is called a variable.

    Independent variable

    A variable that is manipulated (controlled) during an experiment.

    Dependent variable

    A variable that is affected by the manipulation of the independent variable.

    Univariate data

    A data set in which one piece of information is recorded for each item.

    Bivariate Data

    Such data sets have exactly two pieces of information recorded for each item

    Multivariate data

    Such data sets have three or more pieces of information recorded for each item


    Discrete quantitative data

    A discrete variable can take values only from a list of specific numbers, e.g. the number of people or the number of classrooms.

    Continuous Quantitative data

    It can take any value within a range, e.g. the weight of students or the air temperature.

    Ordinal

    The values have a meaningful order, e.g. a scale from 1 to 5 where 1 is dull and 5 is fully bright.

    Nominal

    The values have no meaningful order, e.g. the names of different departments.

    Time Series Data

    Data recorded in a meaningful time sequence, e.g. a daily stock exchange report or the weekly temperature of a patient.

    Cross-sectional data

    Data collected at a single point in time, e.g. students' grades in the first term.

    Primary

    Data collected for a specific purpose.

    Secondary

    Data previously collected for another use.

    Data Arranging and Presentation

    List of data

    It is the simplest way of arranging data: the values are simply listed as they were recorded.

    Data Frequency

    The frequency of a value shows how often it occurs in the data set. Frequencies are normally presented as a frequency table or in the shape of a histogram.

    (source: http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#freqtab and results of a Google image search)


    Part Two


    Central Tendency and Data Spread

    Frequency Table

    Score   Frequency   Frequency (%)
    0       4           13%
    1       3           10%
    2       5           17%
    3       5           17%
    4       6           20%
    5       7           23%
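The frequency table can be reproduced from raw scores with `collections.Counter`. In this sketch the 30 individual scores are rebuilt from the table's own frequencies, since the raw data are not given; percentages are rounded to whole numbers as in the table.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, percentage) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, f, 100 * f / total) for v, f in sorted(counts.items())]

# Rebuild the 30 scores from the frequencies in the table
scores = [0] * 4 + [1] * 3 + [2] * 5 + [3] * 5 + [4] * 6 + [5] * 7

for value, freq, pct in frequency_table(scores):
    print(f"{value}  {freq}  {pct:.0f}%")
```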

    Pie Chart

    Bar Chart


    Area Charts


    Line Charts


    Dot Plot

    Useful for identifying outliers; a simple line of values is also useful for this purpose.

    Histogram


    Histogram with normal curve

    Radar Charts

    Map chart


    Stem and Leaf Plot

    Box and Whisker Plot (or Boxplot)

    Polygon charts


    Variability means the extent to which data values differ from each other. Diversity, dispersion, spread and uncertainty are used with the same meaning.

    Quantity                                  Population (parameter)        Sample (statistic)
    Size                                      $N$                           $n$
    Mean                                      $\mu$ (mu)                    $\bar{x}$ (x bar)
    Median                                    n/a                           $M$ or $\tilde{x}$ (x tilde)
    Proportion                                $\pi$ (pi)                    $p$
    Variance                                  $\sigma^2$ (sigma squared)    $s^2 = \sum (x - \bar{x})^2 / (n - 1)$ (s squared)
    Standard deviation                        $\sigma$ (sigma)              $s$
    z score, $(x - \text{mean})/\text{sd}$    $Z$                           $z$
    Correlation coefficient                   $\rho$ (rho)                  $r$
    Slope                                     $\beta_1$ (beta one)          $b_1$
    Intercept                                 $\beta_0$ (beta naught)       $b_0$

    In this report, for simplicity, only the population symbols are used.

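    Range

    Highest value - smallest value

    Arithmetic mean

    $$\mu = \frac{\sum x}{N}$$

    Geometric mean

    $$\sqrt[N]{x_1 x_2 \cdots x_N}$$

    Trimmed Mean

    Some of the smallest and largest values are removed before averaging, so that the mean is less affected by extreme values.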
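A short sketch computing the range and the three means with Python's `statistics` module, using the usual definitions (arithmetic mean = sum of the values / N; geometric mean = Nth root of the product of the values, defined only for positive values). The data set and the 10% trimming proportion per tail are hypothetical; the document does not fix how many extreme values are removed.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Drop `proportion` of the values from each end of the sorted data, then average the rest."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    return statistics.mean(data[k:len(data) - k])

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 45]  # hypothetical data with one extreme value (45)

value_range = max(data) - min(data)          # highest value - smallest value
arithmetic = statistics.mean(data)           # sum(x) / N
geometric = statistics.geometric_mean(data)  # (x1 * x2 * ... * xN) ** (1 / N), all x > 0
trimmed = trimmed_mean(data, 0.1)            # removes the 2 and the 45

print(value_range, arithmetic, round(geometric, 3), trimmed)
```

The single extreme value pulls the arithmetic mean up to 10, while the trimmed mean of the remaining eight values is 6.625.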


    Median

    The halfway point of the ordered data set: for an odd number of values it is the value at position (n+1)/2; for an even number it is the mean of the two middle values.

    Mode

    The most common value (or category).

    Percentiles

    Percentiles are summary measures that express ranks as percentages from 0% to 100% rather than from 1 to n. They are used:

    To indicate the data value at a given percentage

    To indicate the percentage ranking of a given data value

    Extremes and quartiles

    The extremes are the smallest and largest values; the quartiles are the values at 25% and 75%.

    Variance

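    For population:

    $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

    For samples:

    $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

    Standard deviation ($\sigma$)

    It is the square root of the variance and indicates the typical distance of the values from the mean.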

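    Variance sum law

    For independent variables X and Y:

    $$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$$

    Percentile summary

    The value below which a given percentage of the data values fall, after the values have been ordered from smallest to largest.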
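The measures of position and spread above map onto functions in Python's `statistics` module. This sketch uses a hypothetical data set; note that several conventions exist for quartiles, and `statistics.quantiles` uses its default `"exclusive"` method, so other software may report slightly different quartiles.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]  # hypothetical data set (n = 8, even); ordered: 2 3 4 5 7 7 7 9

median = statistics.median(data)              # even n: mean of the two middle values, (5 + 7) / 2
mode = statistics.mode(data)                  # most common value
q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th and 75th percentiles
extremes = (min(data), max(data))

pop_var = statistics.pvariance(data)          # sum((x - mu) ** 2) / N
sample_var = statistics.variance(data)        # sum((x - xbar) ** 2) / (n - 1)
pop_sd = statistics.pstdev(data)              # square root of the population variance
sample_sd = statistics.stdev(data)            # square root of the sample variance

print(median, mode, (q1, q2, q3), extremes)   # 6.0 7 (3.25, 6.0, 7.0) (2, 9)
print(pop_var, round(sample_var, 3), round(pop_sd, 3), round(sample_sd, 3))
```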


    Standard Deviation

    It indicates how different the numbers are from one another.

    Normal Distribution

    It is an idealized, smooth, bell-shaped histogram with all of the randomness removed.

    It represents an ideal set that has lots of numbers concentrated in the middle.

    It is common for statistical procedures to assume that the data set is reasonably approximated by a

    normal distribution. Example with 5 and standard deviation:

    Skewed Distribution

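    It is neither symmetric nor normal, because the data values trail off more sharply on one side than on the other. Pearson suggested the following equation to measure skewness¹:

    $$\text{Skew} = \frac{3(\mu - \text{Median})}{\sigma}$$

    The equation more commonly used now is:

    $$\text{Skew} = \frac{1}{N} \sum \frac{(x - \mu)^3}{\sigma^3}$$

    (Figure: negatively and positively skewed distributions.)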

    1 Online Statistics: An Interactive Multimedia Course of Study


    Kurtosis
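A sketch of the shape measures from the Skewed Distribution section and this one, assuming population formulas (dividing by N and using $\sigma$): Pearson's median-based skewness $3(\mu - \text{Median})/\sigma$, the moment skewness (average cubed z score) and, as an assumed definition of kurtosis, the excess kurtosis (average fourth-power z score minus 3, so that a normal distribution scores about 0). The data sets are hypothetical; statistical packages often apply small-sample corrections, so their values can differ.

```python
import statistics

def pearson_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sigma."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.pstdev(values)

def moment_skewness(values):
    """Average cubed z score: (1/N) * sum(((x - mu) / sigma) ** 3)."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

def excess_kurtosis(values):
    """Average fourth-power z score minus 3 (about 0 for a normal distribution)."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 4 for x in values) / len(values) - 3

symmetric = [1, 2, 3, 4, 5]        # no skew; flatter than a normal curve
right_tail = [1, 1, 2, 2, 3, 10]   # long tail to the right: positive skew

print(pearson_skewness(right_tail), moment_skewness(right_tail), excess_kurtosis(symmetric))
```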

    Sampling distribution

    It is the distribution of a statistic over all possible samples of a given size drawn from a population. It depends strongly on the distribution of the population, especially for small samples.

    Standard error (standard deviation of sampling)

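    The mean of the sampling distribution of the mean is equal to the mean of the population:

    $$\mu_M = \mu$$

    The variance of the sampling distribution of the mean is:

    $$\sigma_M^2 = \frac{\sigma^2}{n}$$

    where n is the sample size. The standard deviation of a sampling distribution is referred to as the standard error of the statistic; for the mean it is $\sigma_M = \sigma / \sqrt{n}$.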

    Normal Distribution and central limit theorem

    The means of repeated samples from a population that may not itself be normally distributed will be approximately normally distributed, and the approximation improves as the sample size increases².

    2 Online Statistics: An Interactive Multimedia Course of Study
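The central limit theorem and the standard error can be checked by simulation. In this sketch the exponential population, the sample size n = 25, the 4000 repetitions and the seed are all arbitrary choices: repeated samples are drawn from a clearly non-normal population, and the mean of the sample means should come out close to the population mean $\mu = 1$, with their standard deviation close to the standard error $\sigma/\sqrt{n} = 1/5 = 0.2$.

```python
import random
import statistics

def sample_means(draw, sample_size, repetitions, seed=0):
    """Draw `repetitions` samples of `sample_size` values each; return the mean of every sample."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(sample_size)])
            for _ in range(repetitions)]

def exponential_draw(rng):
    """One value from an exponential population with mean 1 and SD 1 (strongly right-skewed)."""
    return rng.expovariate(1.0)

n = 25
means = sample_means(exponential_draw, n, repetitions=4000)

print(statistics.fmean(means))   # should be close to the population mean, 1.0
print(statistics.pstdev(means))  # should be close to the standard error 1 / sqrt(25) = 0.2
```

A histogram of `means` would look roughly bell-shaped even though individual exponential values are heavily skewed.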


    The following figures show normal distributions with different means and SDs.

    Characteristics of Normal distributions

    1. Symmetric around their mean

    2. Mean, median and mode are at the same point

    3. The area under the normal curve is 1.00

    4. Dense in the center and thin at the tails

    5. Completely described by the mean and SD

    6. 68.27% of the data lie within 1 SD of the mean

    7. 95.45% of the data lie within 2 SD of the mean

    8. 99.73% of the data lie within 3 SD of the mean

    9. The area between z = -1.96 and z = +1.96 is 95%

    10. The area between z = -1.645 and z = +1.645 is 90%
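The areas in this list can be verified with `statistics.NormalDist` from Python's standard library (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def central_area(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * central_area(k), 2))   # 68.27, 95.45, 99.73

print(round(central_area(1.96), 4))             # 0.95
print(round(std_normal.inv_cdf(0.95), 3))       # 1.645: the central 90% lies within +/-1.645
```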


    Binomial distribution

    Each trial has only two possible, mutually exclusive outcomes (for example heads or tails of a coin), the trials are independent, and the probability of success is the same in every trial.
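The binomial probability of exactly k successes in n trials is $\binom{n}{k} p^k (1 - p)^{n - k}$. A minimal sketch using `math.comb` (Python 3.8+), with a fair coin as the example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin
print(binomial_pmf(5, 10, 0.5))                            # 0.24609375 (= 252 / 1024)
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))    # 1.0: all outcomes together
```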

    Bivariate and Multivariate analysis

    Bivariate analysis deals with the association or relationship between two sets of data from two different variables, whereas multivariate analysis deals with data on more than two variables and their joint effect. These analyses are used to test hypotheses and to identify the strength of the correlation between variables, or simply the dependency of one variable on another.

    Correlation

    1. A causal, complementary, parallel, or reciprocal relationship, especially a structural,

    functional, or qualitative correspondence between two comparable entities: a correlation

    between drug abuse and crime.

    2. Statistics. The simultaneous change in value of two numerically valued random

    variables: the positive correlation between cigarette smoking and the incidence of lung

    cancer; the negative correlation between age and normal vision.

    3. An act of correlating or the condition of being correlated³.

    Hypothesis Testing

    A statistical hypothesis test, or more briefly a hypothesis test, is a procedure for deciding between the alternatives (for or against the hypothesis) in a way that minimizes certain risks of error.

    Null hypothesis

    It is denoted by $H_0$ and represents the default possibility about the population, which you accept unless you have convincing evidence to the contrary.

    The research hypothesis (alternative hypothesis)

    It is denoted by $H_a$ and is accepted if there is convincing evidence that rules out the null hypothesis as a reasonable possibility. Example:

    $H_0: a = 0$

    $H_a: a \neq 0$

    3 http://www.answers.com/topic/correlation

    Two Tail and One Tail Test

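    One Tail Test

    In this case the researcher claims that the sample mean differs from the population mean in one stated direction only (either greater or lesser).

    Two Tail Test

    In this case the researcher claims that the sample mean may differ from the population mean in either direction (greater or lesser).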

    Hypothesis Testing Methods

    Z test

    t Test

    p value

    Z Score Test

    Because of the central limit theorem, many statistical analyses are possible, since the distribution of sample means is approximately normal. Z-tests are appropriate when the sample size is not too small. A z score gives the distance of a value from the mean of a data set in units of standard deviation.

    A z-test is a statistical test in which the normal distribution is applied; it is mainly used for problems involving large samples, when n ≥ 30 (http://www.experiment-resources.com/z-test.html#ixzz0zCnm9iX5).

    Types of Z test

    1. Z test for a single proportion: tests a hypothesis about a specific value of the proportion, $H_0: P = P_0$.

    2. Z test for the difference between two groups of data, e.g. the drinking habits of males and females.

    3. Z test for a single mean: tests whether the population mean equals a specific value. It is used when the sample size is > 30 and the population standard deviation is known.

    4. Test of a specific value of the population variance.

    5. Test of the equality of the variances of two sets of data when the sample size is > 30⁴.

    Z value calculation

    Formula of the z value for a single mean:

    $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

    The z value is used to find the corresponding p value in a table, or it is compared with the critical z value. If the p value is less than $\alpha$ (alpha), we reject the null hypothesis.
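A one-sample z test following these steps, sketched with `statistics.NormalDist`. The numbers (n = 100, known $\sigma$ = 10, hypothesized mean 50, observed mean 52) and $\alpha$ = 0.05 are hypothetical, and a two-tailed alternative is assumed:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)); returns z and the two-tailed p value."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical numbers: n = 100, known sigma = 10, H0: mu = 50, observed mean = 52
z, p = one_sample_z_test(52, 50, 10, 100)
alpha = 0.05
print(z, round(p, 4), "reject H0" if p < alpha else "do not reject H0")  # 2.0 0.0455 reject H0
```

For a one-tailed test the factor 2 would be dropped and only the tail in the claimed direction would be used.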

    t Test (William Sealy Gosset, 1908)

    It is used for:

    1. Single (one) sample t test

    2. Two independent samples t test

    3. Paired (compared groups) t test, e.g. before treatment and after treatment

    4. Testing whether a regression coefficient (slope) is equal to zero or not

    It is useful for small samples (n < 30).

    Assumptions:

    The data should be approximately normally distributed, which can be checked using a histogram, and the variances should be equal, which can be checked using Levene's test.

    One sample t test:

    $$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

    with n - 1 degrees of freedom.
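A sketch of the one-sample t statistic, $t = (\bar{x} - \mu)/(s/\sqrt{n})$ with n - 1 degrees of freedom, using hypothetical measurements. The standard library has no t distribution, so the p value is not computed here; it would be read from a t table with n - 1 degrees of freedom, or obtained from a library such as SciPy (`scipy.stats.ttest_1samp`).

```python
import math
import statistics

def one_sample_t(values, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    n = len(values)
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical measurements tested against H0: mu = 5.0
t, df = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], 5.0)
print(round(t, 4), df)  # 1.4142 4
```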

    P value

    The p value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p value, the stronger the evidence against the null hypothesis. The most common significance level is 0.05 (95% confidence); 0.1 and 0.01 are also used.

    4 Choudhury, Amit (2009). Z-Test. Experiment Resources: http://www.experiment-resources.com/z-test.html


    Correlation

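    1. For a parametric statistic (Pearson's product-moment correlation)

    2. For a nonparametric statistic (Spearman's rank correlation)⁵

    The following equation is used to compute the correlation coefficient⁶:

    $$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

    where x and y are the deviations of each value from its mean ($x = X - \bar{X}$, $y = Y - \bar{Y}$).

    Another equation to calculate the correlation coefficient:

    $$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$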
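Both correlation coefficients can be sketched in plain Python: Pearson's r from the deviation form $r = \sum xy / \sqrt{\sum x^2 \sum y^2}$, and Spearman's rank correlation as Pearson's r applied to the ranks (tied values receive the average of their ranks). The example data are hypothetical and chosen to be monotonic but not linear, so Spearman's coefficient is exactly 1 while Pearson's is slightly below 1.

```python
import math

def pearson_r(x, y):
    """r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), dx and dy being deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(round(pearson_r(x, y), 3), spearman_rho(x, y))  # 0.981 1.0
```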

    Regression Analysis

    Regression analysis is a process for finding the best-fit line that explains the relationship between the independent and dependent variables. It is written as:

    Simple regression:

    $$Y = b_0 + b_1 X + \varepsilon$$

    Multiple regression:

    $$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n + \varepsilon$$

    5 http://www.answers.com/topic/correlation-coefficient

    6 Online Statistics: An Interactive Multimedia Course of Study


    Where:

    $b_0$ = intercept on the Y axis

    $Y$ = value of the dependent variable

    $b_1 \dots b_n$ = coefficients of the independent variables

    $X$ = independent variable

    $\varepsilon$ = noise, or the effect of unknown variables (it may be ignored)

    Assumptions for regression analysis:

    1. The sample is a true representative of the population

    2. Linearity in the data

    3. Homoscedasticity (constant variance of the errors)

    Equation for regression

    Slope of the line:

    $$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

    For the intercept:

    $$b_0 = \bar{y} - b_1 \bar{x}$$

    Example taken from http://faculty.uncfsu.edu/dwallace/lesson%2018.pdf
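The slope and intercept equations, $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$, can be applied directly. A minimal ordinary-least-squares sketch with hypothetical data (it does not reproduce the cited lesson's example):

```python
def least_squares_line(x, y):
    """Fit Y = b0 + b1 * X by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data roughly following Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = least_squares_line(x, y)
print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```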


    Example of regression analysis

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .733a   .537       .512                12.92502

    Change Statistics

    Model   R Square Change   F Change   df1   df2   Sig. F Change
    1       .537              21.659     3     56    .000

    a. Predictors: (Constant), Thermal Conductivity at Dry State (W m^-1 K^-1), Sample Thickness at Dry State (mm), Thermal Resistance at Dry State (K m^2 W^-1)


    ANOVAb

    Model             Sum of Squares   df   Mean Square   F        Sig.
    1  Regression     10854.967        3    3618.322      21.659   .000a
       Residual       9355.145         56   167.056
       Total          20210.112        59

    a. Predictors: (Constant), Thermal Conductivity at Dry State (W m^-1 K^-1), Sample Thickness at Dry State (mm), Thermal Resistance at Dry State (K m^2 W^-1)

    b. Dependent Variable: Thermal Absorptivity at Dry State (W m^-2 s^1/2 K^-1)


    Coefficientsa

    B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.

    Model 1                                           B           Std. Error   Beta     t        Sig.   Tolerance   VIF
    (Constant)                                        -80.481     173.186               -.465    .644
    Thermal Resistance at Dry State (K m^2 W^-1)      12696.547   10202.604    1.786    1.244    .219   .004        249.080
    Sample Thickness at Dry State (mm)                -334.270    199.663      -1.937   -1.674   .100   .006        161.903
    Thermal Conductivity at Dry State (W m^-1 K^-1)   6192.067    3260.818     .774     1.899    .063   .050        20.091

    a. Dependent Variable: Thermal Absorptivity

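    Explanation of the model:

    Adjusted R Square = .512 (51.2%) means that about 51.2% of the variance in the dependent variable is explained by these independent variables, after adjusting for the number of predictors. The significant F change (F = 21.659 with df = 3 and 56, Sig. = .000, i.e. p < 0.001) shows that the model as a whole is significant. Standardized coefficients (Beta) are the coefficients of the independent variables when all variables are expressed in standard-deviation units, so their sizes can be compared with each other. The significance value (Sig.) of each coefficient shows whether that variable contributes significantly to the regression equation; a value below 0.05 indicates that the variable is significant.

    Note that none of the three predictors reaches this level individually (Sig. = .219, .100 and .063), even though the model as a whole is significant; the very high VIF values (20.091 to 249.080) indicate strong multicollinearity among the predictors.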
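
    The statistics in the three SPSS tables are linked by standard formulas. As a sketch in plain Python (no SPSS involved), the following recomputes R Square, Adjusted R Square, F, the standard error of the estimate and the t values from the sums of squares, degrees of freedom and coefficients printed above. The short variable names are assumptions for illustration; small differences come only from rounding in the printed tables.

```python
import math

# Values copied from the ANOVA table
ss_reg, df_reg = 10854.967, 3      # Regression
ss_res, df_res = 9355.145, 56      # Residual
ss_tot, df_tot = ss_reg + ss_res, df_reg + df_res  # Total: 20210.112, 59

r_square = ss_reg / ss_tot
# Adjusted R^2 penalises extra predictors: 1 - MS_residual / MS_total
adj_r_square = 1 - (ss_res / df_res) / (ss_tot / df_tot)
f_value = (ss_reg / df_reg) / (ss_res / df_res)   # MS_regression / MS_residual
std_error_estimate = math.sqrt(ss_res / df_res)   # square root of MS_residual

# (B, Std. Error) pairs from the Coefficients table; t = B / Std. Error
coefficients = {
    "(Constant)": (-80.481, 173.186),
    "Thermal Resistance": (12696.547, 10202.604),
    "Sample Thickness": (-334.270, 199.663),
    "Thermal Conductivity": (6192.067, 3260.818),
}
t_values = {name: b / se for name, (b, se) in coefficients.items()}

print(f"R^2 = {r_square:.3f}, adjusted R^2 = {adj_r_square:.3f}")
print(f"F = {f_value:.3f}, SE of estimate = {std_error_estimate:.5f}")
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

    The printed values agree with the SPSS output to the precision shown, which is a quick consistency check when copying regression tables into a report.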

    Multinomial logistic regression [7]

    It is used to:

    1. Analyze the relationship between a non-metric dependent variable and metric or dichotomous independent variables

    2. Compare multiple groups through a combination of binary logistic regressions

    For a dependent variable with three groups, it is used to predict:

    1. Coefficients for each of the two comparisons (each group against the reference group)

    2. Three equations, one for each group defined by the dependent variable, giving the probability of membership in that group

    3. Predicted group membership, which can be compared with the actual group to obtain a measure of classification accuracy

    7 Source: www.utexas.edu/.../MultinomialLogisticRegression_BasicRelationships.ppt SW388R7 Data Analysis & Computers II
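
    To make the equations and the classification accuracy concrete, here is a minimal Python sketch of how an already fitted multinomial (baseline-category) logistic model turns two logit equations into probabilities for three groups and classifies cases. The coefficients, the group labels A/B/C, the predictor values and the actual groups are all hypothetical; estimating the coefficients (for example by maximum likelihood in SPSS) is not shown.

```python
import math

# Hypothetical fitted coefficients (intercept, slope) for one predictor x.
# Group "C" is the reference group: its linear predictor is fixed at 0, so
# only two logit equations (A vs C, B vs C) are estimated.
COEFFICIENTS = {"A": (2.0, -1.0), "B": (1.0, -0.25)}
REFERENCE_GROUP = "C"


def group_probabilities(x: float) -> dict[str, float]:
    """Probability of membership in each of the three groups for predictor value x."""
    scores = {g: b0 + b1 * x for g, (b0, b1) in COEFFICIENTS.items()}
    scores[REFERENCE_GROUP] = 0.0
    denominator = sum(math.exp(s) for s in scores.values())
    return {g: math.exp(s) / denominator for g, s in scores.items()}


def predict_group(x: float) -> str:
    """Assign the case to the group with the highest predicted probability."""
    probs = group_probabilities(x)
    return max(probs, key=probs.get)


# Hypothetical cases: predictor values and their actual groups
xs = [0, 1, 2, 3, 5, 6]
actual = ["A", "A", "B", "C", "C", "C"]

predicted = [predict_group(x) for x in xs]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(predicted)                                      # ['A', 'A', 'B', 'B', 'C', 'C']
print(f"classification accuracy = {accuracy:.1%}")   # classification accuracy = 83.3%
```

    Only two sets of coefficients are stored, yet every case receives a probability for all three groups; comparing predicted with actual groups gives the classification accuracy.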

    Requirements of multinomial logistic regression analysis:

    1. The dependent variable should be non-metric; the independent variables should be metric or dichotomous

    2. Dichotomous, nominal, and ordinal variables satisfy the non-metric requirement for the dependent variable

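    Results of multinomial logistic regression:

    1. The overall relationship between the independent variables and the groups defined by the dependent variable

    2. The difference in -2 log-likelihood between the model with only the constant and the model with the independent variables follows a chi-square distribution and is used for significance testing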

    Examples:

    1. Influence of the father's profession and education on occupational preference

    2. Effect of food and exercise on a certain disease

    3. Selection of brands based on gender and age

    Chi-Square Test

    The chi-square test is used to test for an association between two categorical variables whose frequencies are arranged as a matrix, i.e. a two-way table [8]:

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    Where:

    χ² = chi-square value

    O = observed frequency

    E = expected frequency

    Example:

    | Short | Tall | Total
    Male | 24 | 20 | 44
    Female | 36 | 5 | 41
    Total | 60 | 25 | 85

    8 Source: http://science.jrank.org/pages/1401/Chi-Square-Test.html

    Expected values are calculated with probability rules, assuming that height and gender are independent (the null hypothesis):

    Probability that a person is short: 60/85 = 0.706

    Probability that a person is male: 44/85 = 0.518

    Probability that a person is male and short: 0.706 × 0.518 = 0.366

    Expected frequency of people who are male and short: 0.366 × 85 = 31.1

    (All other expected values can be calculated by the same method.)

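    Cell | Observed value (O) | Expected value (E) | (O - E)²/E
    Male, short | 24 | 31.1 | 1.62
    Male, tall | 20 | 12.9 | 3.91
    Female, short | 36 | 28.9 | 1.74
    Female, tall | 5 | 12.1 | 4.17
    Total | 85 | 85 | χ² = 11.4

    (The value 11.4 results from rounding E to one decimal place; unrounded expected values give χ² = 11.31.)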

    Degrees of freedom = (rows - 1)(columns - 1) = (2 - 1)(2 - 1) = 1

    Value from the chi-square distribution table:

    For df = 1 and p = 0.05, the critical value of χ² is 3.84, whereas our value is 11.4, which is well above it.

    We therefore reject the null hypothesis of no association: there is a significant association between gender (male/female) and height.
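
    The same calculation can be sketched in Python with unrounded expected frequencies (which is why it gives χ² ≈ 11.31 rather than 11.4). The p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for one degree of freedom; 3.841 is the tabulated 5% critical value for df = 1. The variable names are illustrative assumptions.

```python
import math

# Observed counts: rows = Male, Female; columns = Short, Tall
observed = [[24, 20],
            [36, 5]]

row_totals = [sum(row) for row in observed]          # [44, 41]
col_totals = [sum(col) for col in zip(*observed)]    # [60, 25]
n = sum(row_totals)                                  # 85

# Expected count = P(row) * P(column) * n, the probability rule used above
expected = [[(r / n) * (c / n) * n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)    # (2-1)(2-1) = 1

# Valid only for df = 1: P(chi-square with 1 df > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
critical_value = 3.841                               # df = 1, alpha = 0.05
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0: association" if chi2 > critical_value else "do not reject H0")
```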

    Crosstabs

    Crosstabs (cross-tabulation) is a non-parametric procedure used to measure the association between two categorical variables, optionally while controlling for other categorical variables.


    Example:

    People with high salaries are more likely to go on vacation than people with low salaries.

    The Pearson chi-square and the likelihood-ratio chi-square are the most commonly used tests of significance.
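
    The likelihood-ratio chi-square is G² = 2 Σ O ln(O/E), computed with the same expected frequencies and degrees of freedom as the Pearson statistic. A minimal Python sketch for an r × c crosstab follows; the function name crosstab_tests is hypothetical, and it is applied here, purely as an illustration, to the height/gender counts from the chi-square example.

```python
import math


def crosstab_tests(observed: list[list[int]]) -> tuple[float, float, int]:
    """Return (Pearson chi-square, likelihood-ratio chi-square G^2, df) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            pearson += (o - e) ** 2 / e
            if o > 0:  # a cell with O = 0 contributes nothing to G^2
                g2 += 2 * o * math.log(o / e)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return pearson, g2, df


pearson, g2, df = crosstab_tests([[24, 20], [36, 5]])
print(f"Pearson = {pearson:.2f}, likelihood ratio = {g2:.2f}, df = {df}")
```

    Both statistics are compared with the same chi-square distribution; they usually lead to the same conclusion, although their values differ slightly (here about 11.31 and 11.95).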

    Results from SPSS:

    Interpretation of the results: the difference is due to chance, i.e. there is no significant difference in the services offered by different stores.