Applied Multivariate Analysis
Adam Smith Business School, Glasgow University
February 25–28, 2019
A Course Overview
www.research-training.net/Glasgow
University of Manchester
Graeme Hutcheson Applied Multivariate Analysis
The slides and R-files for this session are available for download from the course website...
www.research-training.net/Glasgow
Statistical analysis often appears complex and inaccessible to many postgraduate students.
There are a huge number of tests that can be applied to many different research designs and types of data.
It is not always obvious which statistical test or technique can be applied to each particular research problem.
This course attempts to make this process easier by outlining a general method for analysing quantitative data.
A general method for analysis is essential in order that researchers have access to a system that can be applied to a wide variety of data and methodologies and is also learnable within the time available to a typical student.
This session introduces a system of analysis that...
- is based on a theoretically coherent and consistent statistical theory.
- provides a method for analysing many different types of data and study designs.
- utilises modern graphics for interpretation that are more powerful and intuitive than ‘traditional’ statistical output.
- is easy to apply using readily-available software.
In general, the process of data analysis can be separated into three distinct parts...
1. Represent research questions using a standard format.
2. Run analyses using a general technique that applies to multiple data types and research designs.
3. Interpret results using a common set of techniques and graphics.
1. A general method for representing research questions...
The course uses an equation-based format to represent research questions. This technique is useful as it...
- explicitly identifies which relationships and parameters are to be estimated from the data.
- allows us to identify the statistical technique to be used, how to structure the data and how to input the model into the software.
- highlights the fundamental similarities between models for different data types and research designs.
A variety of research questions can be defined using a general formulation...
A variable may be predicted by other variables
which is written as...
variable Y ∼ variable X + variable Z
or, more generally as...
Y ∼ Xi + Xj + ... + Xn
This simple notation can represent a range of research questions applied to different data scales...
weight (kg) ∼ treatment group
(the final weight of the animals may be predicted by the experimental group to which the rats are allocated)
grade in maths (A to F) ∼ teacher style
(the grade achieved in a mathematics exam may be predicted by the style of teaching adopted by the teacher)
school selected ∼ parental occupation + gender + family income
(the school selected may be predicted by parental occupation, the gender of the child and the income level of the family)
number of car accidents ∼ age*gender
(the number of accidents may be predicted by a combination of the age of the driver and their gender; it is suspected that young males have more accidents)
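The `*` in the last example can be unpacked with a short sketch. It is written in Python (standard library only) and assumes the R formula convention that `a*b` is shorthand for `a + b + a:b`, that is, both main effects plus their interaction. The function name `expand_terms` is hypothetical, the ASCII `~` stands in for the ∼ used on the slides, and the parser is a toy that handles only `+` and two-way `*`.

```python
def expand_terms(formula):
    """Split 'y ~ a*b + c' into the response and its expanded list of terms.

    Toy parser (an illustration, not R's own): supports '+' and two-way '*'.
    """
    response, rhs = (part.strip() for part in formula.split("~"))
    terms = []
    for chunk in rhs.split("+"):
        factors = [f.strip() for f in chunk.split("*")]
        if len(factors) == 1:
            candidates = factors
        elif len(factors) == 2:
            a, b = factors
            candidates = [a, b, f"{a}:{b}"]  # main effects plus interaction
        else:
            raise ValueError("this sketch handles at most two-way products")
        for term in candidates:
            if term not in terms:  # keep each term once, in order of appearance
                terms.append(term)
    return response, terms


print(expand_terms("accidents ~ age*gender"))
print(expand_terms("school ~ occupation + gender + income"))
```

Under this reading, the suspicion that young males have more accidents corresponds to the `age:gender` interaction term rather than to either main effect alone.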
and can represent a range of research designs...
For example, an independent groups design...
School A        School B        School C
child01 = 56    child06 = 72    child11 = 61
child02 = 71    child07 = 69    child12 = 52
child03 = 59    child08 = 75    child13 = 43
child04 = 62    child09 = 61    child14 = 60
child05 = 68    child10 = 76    child15 = 62
can be represented using the equation...
Score ∼ School
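As a rough illustration of what fitting Score ∼ School involves, the sketch below estimates the model by ordinary least squares in Python (standard library only) using the fifteen scores above. It assumes treatment (dummy) coding with School A as the reference group, which is a choice made here for illustration; the helper names `design_row`, `solve` and `ols` are hypothetical.

```python
# Scores from the independent groups example, grouped by school.
scores = {
    "A": [56, 71, 59, 62, 68],
    "B": [72, 69, 75, 61, 76],
    "C": [61, 52, 43, 60, 62],
}


def design_row(school, reference="A", levels=("A", "B", "C")):
    """Intercept plus one 0/1 indicator per non-reference school (assumed coding)."""
    return [1.0] + [1.0 if school == lv else 0.0 for lv in levels if lv != reference]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


X = [design_row(s) for s, vals in scores.items() for _ in vals]
y = [float(v) for vals in scores.values() for v in vals]
intercept, effect_b, effect_c = ols(X, y)

print(f"School A mean (intercept): {intercept:.1f}")
print(f"School B minus School A:   {effect_b:+.1f}")
print(f"School C minus School A:   {effect_c:+.1f}")
```

With this coding the intercept is the School A mean and the other two coefficients are differences from it, which is the same comparison of group means that a one-way ANOVA makes.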
or a dependent groups design...
[Table: scores for child01 to child05, each measured in class A and class B: 58, 72, 40, 62, 48, 61, 69, 53, 65, 62]
This design is represented using the equation...
Score ∼ Class + Child
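One way to see what this equation implies for the data layout is a short Python sketch (standard library only). The scores below are hypothetical, invented for illustration, and are not the values from the slide; the point is that each score occupies its own row, labelled by both Class and Child.

```python
# Hypothetical wide-format data: one entry per child, one score per class.
# These values are invented for illustration and are NOT the slide's data.
wide = {
    "child01": {"A": 50, "B": 55},
    "child02": {"A": 60, "B": 62},
    "child03": {"A": 45, "B": 51},
}

# Long format for Score ~ Class + Child: one row per observed score.
long_rows = [
    {"Child": child, "Class": cls, "Score": score}
    for child, by_class in wide.items()
    for cls, score in by_class.items()
]

for row in long_rows:
    print(row)

# With every child measured in both classes, the Class effect in this
# additive model reduces to the mean of the within-child differences (B - A).
differences = [by_class["B"] - by_class["A"] for by_class in wide.values()]
class_effect = sum(differences) / len(differences)
print(f"mean within-child difference (B - A): {class_effect:.2f}")
```

Including Child in the model is what makes the comparison a within-child one: each child acts as their own baseline, as in a paired t-test.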
Correlational designs can also be represented using equations...
choice of school ∼ SES + gender + religion
probability of being stopped by police ∼ age * ethnicity
number of job applications ∼ age * education + gender
rating of happiness ∼ gender + education + wealth + age*health
All of our statistical models can be represented using equations...
t-tests, ANOVAs, chi-square, cross-tabs, Kruskal-Wallis, Mann-Whitney, Pearson and Spearman correlations, log-linear, survival analysis, neural nets, structural equation and regression models can all be written in equation format.
This is very important because, once we have the equation, choosing which statistic to use, structuring the data, inputting the model into the software and interpreting the results all become much more straightforward.
2. A general method for analysing data...
The statistical techniques used in this course are from the family of Generalized Linear Models (GLMs).
These models are used as they...
- provide a powerful and theoretically-consistent method for data analysis.
- may be applied to many different types of data and research design.
- are economical with respect to the time taken to learn the techniques.
- replicate or replace many of the more traditional statistical tests.
Once the research question has been identified using the general formula notation, the choice of GLM is a simple one, based on the distribution of the variable being predicted (Y). The most common models are...
- If Y is continuous: OLS regression.
- If Y is a count: Poisson regression.
- If Y is ordered categorical: Proportional-odds regression.
- If Y is unordered categorical: Multinomial regression.
GLMs are particularly powerful, as they are all conceptually very similar. Learning to apply and interpret results from one technique greatly helps in applying and interpreting results from the others.
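The selection rule above amounts to a simple lookup, sketched here in Python (standard library only). The function name `choose_model` is hypothetical, and the measurement level assigned to each of the earlier example responses is this sketch's reading of those examples rather than something the slides state outright.

```python
# Response type -> model, as listed above.
MODEL_FOR_RESPONSE = {
    "continuous": "OLS regression",
    "count": "Poisson regression",
    "ordered categorical": "Proportional-odds regression",
    "unordered categorical": "Multinomial regression",
}


def choose_model(response_type):
    """Return the model paired with a given type of response variable Y."""
    try:
        return MODEL_FOR_RESPONSE[response_type]
    except KeyError:
        raise ValueError(f"no model listed for response type {response_type!r}") from None


# Assumed measurement levels for the responses in the earlier examples.
example_responses = {
    "weight (kg)": "continuous",
    "number of car accidents": "count",
    "grade in maths (A to F)": "ordered categorical",
    "school selected": "unordered categorical",
}

for response, response_type in example_responses.items():
    print(f"{response}: {choose_model(response_type)}")
```

Only the response variable drives the choice; the explanatory variables on the right-hand side of the formula do not change which model is selected.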
The techniques identified above can be applied to both correlational and experimental designs. The regression models replicate or replace many commonly used statistical tests including t and F tests (related and unrelated), ANOVA, ANCOVA, Mann-Whitney, Kruskal-Wallis, Friedman, Pearson and Spearman correlation, Page's L trend, contingency table chi-square, log-linear etc...
GLMs allow a vast array of individual statistical tests to be replaced by a single conceptual model (Y ∼ Xi + Xj ...), making the statistical landscape much simpler to understand.
We do not need the older-style hypothesis tests anymore: the GLM makes these redundant!
The GLM models reproduce or replace many of the traditional tests. For example, tests for independent group designs...
Traditional test                                   GLM
One independent variable                           Y ∼ X
  t-test (unrelated)
  Mann-Whitney
  1-way ANOVA (unrelated)
  Kruskal-Wallis
  Jonckheere trend
  chi-square (contingency table)
  etc.
Multiple independent variables                     Y ∼ X1 + X2
  complex selection of multi-way ANOVA models
  multi-way contingency tables (log-linear)
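The first row of the table can be checked numerically. The sketch below, in Python (standard library only), computes the classical unrelated t statistic (equal variances assumed) for the School A and School B scores from the independent groups example, then the t statistic for the slope in an OLS fit of Y ∼ X with X coded 0 for School A and 1 for School B. The function names `pooled_t` and `regression_t` are hypothetical, and the 0/1 coding is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean

group_a = [56, 71, 59, 62, 68]  # School A scores from the earlier example
group_b = [72, 69, 75, 61, 76]  # School B scores from the earlier example


def pooled_t(a, b):
    """Classical unrelated (equal-variance) t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))


def regression_t(a, b):
    """t statistic for the slope in OLS Y ~ X, with X = 0 for a and 1 for b."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = [float(v) for v in a + b]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = rss / (len(y) - 2)
    return slope / sqrt(residual_var / sxx)


t_classic = pooled_t(group_a, group_b)
t_glm = regression_t(group_a, group_b)
print(f"unrelated t-test:    t = {t_classic:.4f}")
print(f"OLS slope for Y ~ X: t = {t_glm:.4f}")
```

The two statistics agree because the slope is the difference between the group means and the residual variance of the regression is the pooled within-group variance.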
... and tests for dependent (or matched) group designs...
Traditional test                                   GLM
One independent variable                           Y ∼ subject + X
  paired t-test
  Wilcoxon
  1-way ANOVA (related)
  Friedman
  Page's L trend
  etc.
Multiple independent variables                     Y ∼ subject + X1 + X2
  complex selection of multi-way ANOVA models
  multi-way contingency tables (log-linear)
Running GLMs: statistical software
GLMs can be easily applied using the R statistical analysis software with the Rcmdr package and RStudio. These programmes make it very easy to input models for many different types of data...
...all we need to know is the formula of the research model (for example, score ∼ age + gender) and the level of measurement of the variable being predicted (continuous, count, ordered categorical or unordered categorical)...
The following examples show how to input models for continuous, count, ordered and unordered response variables (note the similarities between the analyses)...
Select the appropriate model in the Rcmdr.
• Generalized linear model... for continuous and count responses
• Multinomial logit model... for unordered categorical responses
• Ordinal regression model... for ordered categorical responses
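For readers working at the R console rather than through the Rcmdr menus, the three menu items correspond roughly to the calls sketched below. This is a minimal sketch on simulated data (not the course data): the variable names score, visits, grade and choice are invented for illustration, and it assumes the nnet and MASS packages, which ship with standard R installations, are available.

```r
# Simulated data for illustration only; none of these values come from the course.
set.seed(1)
n <- 200
d <- data.frame(
  age    = round(runif(n, 18, 65)),
  gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
d$score  <- 50 + 0.3 * d$age + rnorm(n, sd = 5)          # continuous response
d$visits <- rpois(n, lambda = exp(0.5 + 0.01 * d$age))    # count response
latent   <- 0.03 * d$age + rlogis(n)
d$grade  <- cut(latent, breaks = quantile(latent, c(0, 1/3, 2/3, 1)),
                labels = c("low", "mid", "high"),
                include.lowest = TRUE, ordered_result = TRUE)  # ordered response
d$choice <- factor(sample(c("bus", "car", "train"), n, replace = TRUE))  # unordered response

# Same right-hand side throughout; only the fitting function or family changes.
m_cont  <- glm(score  ~ age + gender, family = gaussian, data = d)  # OLS regression
m_count <- glm(visits ~ age + gender, family = poisson,  data = d)  # Poisson regression
m_multi <- nnet::multinom(choice ~ age + gender, data = d, trace = FALSE)
m_ord   <- MASS::polr(grade ~ age + gender, data = d, Hess = TRUE)
```

The point of the sketch is the one made in the slides: the formula stays the same and only the choice of model follows from the level of measurement of the response.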
3. A general method for interpreting statistical output...
Statistical models can be interpreted using graphs (effect plots) that show predictions for the response variable over the range of each explanatory variable in the model.
For example, an OLS regression model that predicts ice cream consumption at different outdoor temperatures
consumption ∼ temperature
can be represented graphically...
Effect plots are easy to obtain using the Rcmdr.
[Figure: Temperature effect plot. x-axis: Temperature (20 to 70); y-axis: Consumption (0.25 to 0.45).]
Over the range of observations, it appears that consumption increases as temperature increases.
The effect plot suggests a positive relationship between consumption and temperature.
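What an effect plot draws can be reproduced by hand: fit the model, then predict the response over the observed range of the explanatory variable. The sketch below does this in base R on simulated data (the numbers are invented, chosen only to resemble the scale of the plot). With the effects package installed, plot(allEffects(fit)) is the call the slides rely on for this kind of display.

```r
# Simulated stand-in for the ice cream data (values invented for illustration).
set.seed(42)
ice <- data.frame(temperature = round(runif(30, 24, 72)))
ice$consumption <- 0.20 + 0.003 * ice$temperature + rnorm(30, sd = 0.03)

fit <- lm(consumption ~ temperature, data = ice)

# Predictions (with 95% confidence limits) across the observed temperature range.
grid <- data.frame(temperature = seq(min(ice$temperature),
                                     max(ice$temperature),
                                     length.out = 50))
pred <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))

# Base-graphics version of the effect plot: fitted line plus confidence band.
plot(pred$temperature, pred$fit, type = "l",
     ylim = range(pred$lwr, pred$upr),
     xlab = "Temperature", ylab = "Consumption",
     main = "Temperature effect plot")
lines(pred$temperature, pred$lwr, lty = 2)
lines(pred$temperature, pred$upr, lty = 2)
```

The direction of the relationship is read from the slope of the fitted line, and the dashed limits show how precisely each prediction is estimated.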
Other examples show different relationships...
[Figure: DELAY effect plot. x-axis: DELAY (0 to 100); y-axis: QUALITY (50 to 70).]
[Figure: Income effect plot. x-axis: Income (80 to 95); y-axis: Price (0.270 to 0.280).]
[Figure: colour*age effect plot, with one panel for colour = Black and one for colour = White. x-axis: age (10 to 70); y-axis: checks (1.5 to 3.5).]
A negative relationship between the quality of statements taken from children and the delay between the incident and interview; a negative relationship between the price charged for an item and the average income of the customer; a positive relationship between the number of police checks made and the age of those checked (differences are, however, evident between black and white).
Significance can also be estimated from effect plots...
[Figure: three effect plots. Temperature effect plot: Consumption (0.25 to 0.45) against Temperature (20 to 70). DELAY effect plot: QUALITY (50 to 70) against DELAY (0 to 100). Income effect plot: Price (0.270 to 0.280) against Income (80 to 95).]
The first plot suggests that estimates of consumption differ markedly according to temperature, suggesting a highly significant relationship. The second plot suggests that estimates of quality differ according to the delay, but there is more overlap than with the first graph, suggesting a less significant relationship. Estimates of price, on the other hand, do not change much as income changes, suggesting that this relationship is not significant.
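One way to make the "overlap" reading concrete is to compare the confidence limits of the predictions at the two ends of the explanatory variable's range. The sketch below uses simulated data (invented values) and base R. The non-overlap check is only a rough visual heuristic assumed here for illustration, and the helper name ends_overlap is hypothetical; the p-value from summary() remains the formal test.

```r
# Simulated data: y_strong depends on x, y_flat does not (values invented).
set.seed(7)
x <- runif(30, 20, 70)
dat <- data.frame(
  x        = x,
  y_strong = 0.20 + 0.003 * x + rnorm(30, sd = 0.03),
  y_flat   = 0.275 + rnorm(30, sd = 0.005)
)

# Do the 95% confidence intervals at the lowest and highest x overlap?
ends_overlap <- function(fit, data) {
  ends <- data.frame(x = range(data$x))
  ci <- predict(fit, newdata = ends, interval = "confidence")
  # Two intervals overlap when each starts below the other's upper limit.
  unname(ci[1, "lwr"] <= ci[2, "upr"] && ci[2, "lwr"] <= ci[1, "upr"])
}

fit_strong <- lm(y_strong ~ x, data = dat)
fit_flat   <- lm(y_flat   ~ x, data = dat)

p_strong <- summary(fit_strong)$coefficients["x", "Pr(>|t|)"]
p_flat   <- summary(fit_flat)$coefficients["x", "Pr(>|t|)"]

cat("strong: overlap =", ends_overlap(fit_strong, dat),
    " p =", signif(p_strong, 3), "\n")
cat("flat:   overlap =", ends_overlap(fit_flat, dat),
    " p =", signif(p_flat, 3), "\n")
```

Clearly separated intervals go with a small p-value, which is the pattern the first plot illustrates; intervals that overlap heavily leave the question to the formal test.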
With a little practice, it is relatively simple to obtain an accurate indication of the direction, size and significance of the relationships in a statistical model using just the graphical displays. These can all be verified by the standard statistics, although the graphics are simpler to use and more powerful.
Conclusions...
The system of analysis presented here allows a wide range of research designs and data types to be analysed using generalised methods.
It enables complex analyses to be simply run and interpreted using standardised graphics.
It is relatively non-technical and does not require advanced statistical or mathematical skills.
It can be learned within the time-frame available to a typical postgraduate student.
Example Analyses...
The following show a number of analyses conducted using the analytical framework identified above. The analyses are relatively complex and involve categorical variables and multiple interactions. The major aim of the analyses is to provide illustrations of the techniques and not necessarily to provide the ‘best’ model.
The R commands needed to run these analyses are contained in the R notebook 2018Glasgow01.Rmd for session01 available with the course documentation.
Modelling a continuous variable...
An example from survey research...
The following data provide information about the amount of ice cream bought by people with different incomes and when price and outdoor temperature also vary.
Consumption  Price  Income  Temperature
0.386        0.270  78      41
0.374        0.282  79      56
...          ...    ..      ..
...          ...    ..      ..
0.393        0.277  81      63
0.425        0.280  80      68
1. Define the research model
The aim of this analysis is to predict ice cream consumption when it is sold at different outdoor temperatures, to neighbourhoods with different income levels and at different prices. The following model assumes that the explanatory variables do not interact and will therefore just model the ‘main effects’.
consumption ∼ income + price + temperature
This is not a particularly ‘good’ model for reasons that may become apparent later. It is used here in order to demonstrate a main-effects regression model.
2. Select the analysis technique
As the variable being modelled is continuous (consumption), an OLS regression model is selected...
3. Interpret the output...
Use effect plots to illustrate the results...
[Figure: Income effect plot. x-axis: Income (80 to 95); y-axis: Consumption (0.32 to 0.42).]
[Figure: Price effect plot. x-axis: Price (0.260 to 0.290); y-axis: Consumption (0.32 to 0.40).]
[Figure: Temperature effect plot. x-axis: Temperature (20 to 70); y-axis: Consumption (0.25 to 0.45).]
What is the relationship between each variable and consumption?
Estimate the significance of each relationship.
Traditional statistical output
Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   0.1973151   0.2702162    0.730  0.47179
Income        0.0033078   0.0011714    2.824  0.00899 **
Price        -1.0444140   0.8343573   -1.252  0.22180
Temperature   0.0034584   0.0004455    7.762  3.1e-08 ***
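To see how a coefficient table turns into a prediction, the sketch below plugs the first row of the ice cream data shown earlier (Income 78, Price 0.270, Temperature 41) into the fitted main-effects equation, and recomputes the t values as estimate divided by standard error. The coefficients are copied from the traditional output, so the results carry the rounding of the displayed digits; the object names are illustrative only.

```r
# Coefficients and standard errors copied from the traditional output.
est <- c("(Intercept)" = 0.1973151, Income = 0.0033078,
         Price = -1.0444140, Temperature = 0.0034584)
se  <- c("(Intercept)" = 0.2702162, Income = 0.0011714,
         Price = 0.8343573, Temperature = 0.0004455)

# First row of the ice cream data; the leading 1 multiplies the intercept.
row1 <- c("(Intercept)" = 1, Income = 78, Price = 0.270, Temperature = 41)

# Predicted consumption = intercept + sum of (coefficient * value).
predicted <- sum(est * row1)
observed  <- 0.386
cat("predicted:", round(predicted, 3), " observed:", observed, "\n")

# Each t value is the estimate divided by its standard error.
t_values <- est / se
print(round(t_values, 2))
```

The gap between the predicted and observed values is the residual for that row, and the t values agree with the table up to rounding.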
Evaluating the model
It is useful to ask some questions about these results...
• Are these results what we might have expected? Is it a good model?
• Experience suggests that Price is related to sales. Why is this relationship not significant in our model?
• Might our model be a bit simplistic?
• It might be that people tend to buy ice cream if they can afford it AND if they want it (i.e., on hot days) AND if it is priced accordingly.
• The interaction between Income, Temperature and Price may be important and should be included in the model...
Improving the model...
1. Define the research model...
Consumption ∼ Price*Income*Temperature
2. Select the analysis technique...
As consumption is continuous, use OLS regression.
3. Interpret the results...
Inspect the effect plots (note: these are much easier to interpret than the traditional output).
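In R's formula notation, * is shorthand for "all main effects plus all interactions", so the three-way model contains seven terms. The short base-R sketch below lists them without needing any data; the object names full_model and long_hand are just for illustration.

```r
# The model formula from the slide, written with a plain tilde.
full_model <- Consumption ~ Price * Income * Temperature

# terms() expands the '*' shorthand into the individual model terms.
term_labels <- attr(terms(full_model), "term.labels")
print(term_labels)

# Equivalent long-hand formula: ':' denotes an interaction term.
long_hand <- Consumption ~ Price + Income + Temperature +
  Price:Income + Price:Temperature + Income:Temperature +
  Price:Income:Temperature
```

Each of the seven terms gets its own row in the traditional output for this model, which is part of why that output is harder to read than the effect plot.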
[Figure: Price*Income*Temperature effect plot. A grid of nine panels, one for each combination of Income (80, 90, 100) and Temperature (20, 50, 70). x-axis: Price (0.260 to 0.290); y-axis: Consumption (−0.2 to 0.8).]
It appears that those in the market to buy ice cream (i.e., those with enough money and desire to buy due to the hot weather) are influenced by the price. Cheaper ice cream encourages greater sales!
This relationship is not easy to interpret from the traditional statistical output!
Call: glm(formula = Consumption ~ Price * Income * Temperature)
Coefficients:
                            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                20.665656    7.970792    2.593  0.016613 *
Price                     -76.452035   29.171664   -2.621  0.015610 *
Income                     -0.250381    0.092277   -2.713  0.012692 *
Temperature                -0.757488    0.180361   -4.200  0.000370 ***
Price:Income                0.935945    0.337905    2.770  0.011174 *
Price:Temperature           2.818488    0.664467    4.242  0.000334 ***
Income:Temperature          0.009276    0.002130    4.355  0.000253 ***
Price:Income:Temperature   -0.034391    0.007858   -4.377  0.000240 ***
Analysis of Deviance Table (Type II tests)
                                SS  Df        F  Pr(>F)
Price                     0.001538   1   2.2518  0.1476763
Income                    0.009582   1  14.0297  0.0011194 **
Temperature               0.056755   1  83.1017  0.000000006325 ***
Price:Income              0.006693   1   9.8002  0.0048649 **
Price:Temperature         0.002087   1   3.0556  0.0944035 .
Income:Temperature        0.000325   1   0.4755  0.4976583
Price:Income:Temperature  0.013081   1  19.1541  0.0002405 ***
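Because Price appears in four of the eight coefficients, its effect on consumption is not a single number: the slope for Price depends on the values of Income and Temperature. The sketch below computes that slope from the coefficients in the output (so results carry the rounding of the displayed digits); the function name price_slope is invented for illustration.

```r
# Price-related coefficients copied from the coefficient table.
b <- c(Price = -76.452035,
       "Price:Income" = 0.935945,
       "Price:Temperature" = 2.818488,
       "Price:Income:Temperature" = -0.034391)

# Change in predicted consumption per unit change in Price,
# holding Income and Temperature fixed at the given values.
price_slope <- function(income, temperature) {
  unname(b["Price"] +
         b["Price:Income"] * income +
         b["Price:Temperature"] * temperature +
         b["Price:Income:Temperature"] * income * temperature)
}

# Slopes at the Income and Temperature values used in the effect plot panels.
grid <- expand.grid(Income = c(80, 90, 100), Temperature = c(20, 50, 70))
grid$PriceSlope <- round(price_slope(grid$Income, grid$Temperature), 2)
print(grid)
```

At Income 100 and Temperature 70 the slope is strongly negative (about −26.3), whereas at Income 80 and Temperature 20 it is close to zero (about −0.23). This is consistent with the reading of the effect plot: price matters most for those with the money and the hot weather to be in the market for ice cream.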
Using a more appropriate research question, we find that Price IS a significant indicator of consumption, but it operates in conjunction with other variables.
Many research questions will involve interactions.
It is important to be able to include these in your analyses and be able to interpret and illustrate these.
Modelling a continuous variable...
An example from experimental research...
The following data provide information about the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Length  Delivery  Dose
 4.2    VC        0.5
11.5    VC        0.5
 7.3    VC        0.5
..
..
..
27.3    OJ        2
29.4    OJ        2
23.0    OJ        2
1. Define the research model
This is a simple experiment where tooth growth is assessed when dose and delivery are controlled. As the delivery method may affect the doses differently, an interaction between these two variables needs to be considered...
Length ∼ Dose ∗ Delivery
Note: This is an identical analysis to a multi-way ANOVA.
2. Select the analysis technique
As the variable being modelled is continuous (length), an OLS regression model is selected...
3. Interpret the output...
Use effect plots to illustrate the results...
[Figure: dose*supp effect plot, with separate lines for supp = OJ and supp = VC. x-axis: dose (0.5 to 2.0); y-axis: len (10 to 30).]
Lower dosage is associated with lower tooth growth. Supplement OJ is associated with higher tooth growth at lower doses compared to supplement VC.
Improve the model...
A better indication of how dose and supplement affect growth is achieved if we consider dose to be categorical (recode 0.5, 1 and 2 into low, medium and high).
[Figure: doseCAT*supp effect plot, with separate lines for supp = OJ and supp = VC. x-axis: doseCAT (1low, 2medium, 3high); y-axis: len (10 to 25).]
Lower dosage is associated with lower tooth growth. Supplement OJ is associated with higher tooth growth but only at low to medium doses compared to supplement VC.
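The data described here match R's built-in ToothGrowth data frame (columns len, supp and dose), so both versions of the model can be sketched in base R. The object names fit_num and fit_cat are assumptions, and the doseCAT labels are chosen to mirror the plot; with the effects package installed, plot(allEffects(fit_cat)) should give a display like the one described.

```r
# ToothGrowth ships with base R (datasets package): 60 rows of len, supp, dose.
tg <- ToothGrowth

# Dose treated as numeric: one slope per delivery method.
fit_num <- lm(len ~ dose * supp, data = tg)

# Dose recoded as a three-level factor (0.5, 1, 2 -> low, medium, high).
tg$doseCAT <- factor(tg$dose, levels = c(0.5, 1, 2),
                     labels = c("1low", "2medium", "3high"))
fit_cat <- lm(len ~ doseCAT * supp, data = tg)

# Predicted mean length for every dose-by-supplement combination.
cells <- expand.grid(doseCAT = levels(tg$doseCAT), supp = levels(tg$supp))
cells$predicted <- predict(fit_cat, newdata = cells)
print(cells)

# Two-way ANOVA table for the categorical version of the model.
print(anova(fit_cat))
```

With dose as a factor, the model estimates a separate mean for each of the six cells, which is why the categorical plot can show the OJ advantage shrinking at the high dose instead of forcing two straight lines.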
Modelling a categorical variable...
A binary response: logistic regression
These data (the ‘Arrests’ dataset, which is available in the carData library) give information about whether someone who is arrested is released or not, the number of databases the person is on (the variable ‘checks’, an indication of their criminal history), their colour and sex, and the year in which they were arrested.
released  colour  year  age  sex     checks
Yes       White   2002  21   Male    3
No        Black   1999  17   Female  3
..        .....   ....  ..   .....   .
..        .....   ....  ..   .....   .
Yes       White   2000  24   Male    1
1. Define the research model
The following model predicts whether or not someone is released given checks, colour, sex and year.
As we are mostly interested in demonstrating how a binary variable can be modelled, we assume that the explanatory variables do not interact and will only include the ‘main effects’.
released ∼ checks + colour + sex + year
2. Select the analysis technique
As the variable being modelled (released) is unordered categorical with two categories, a binomial logit model is selected (a multinomial model could also be used and will provide identical results)...
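As a minimal sketch of how this model might be fitted in R (not necessarily the exact commands used for the slides): the code below assumes that yearCAT is simply the year variable recoded as a factor, and it uses base R's drop1() for the likelihood-ratio tests, which for a main-effects-only model should correspond to Type II tests. The working copy named arrests is introduced here for illustration. Coefficient labels will follow R's default naming (for example colourWhite) rather than the colour[T.White] style shown in the slides.

```r
library(carData)  # provides the Arrests data frame

# Assumption: yearCAT is 'year' recoded as a categorical variable (factor)
arrests <- Arrests
arrests$yearCAT <- factor(arrests$year)

# Binomial logit model of released (No/Yes), main effects only
released01 <- glm(released ~ checks + colour + sex + yearCAT,
                  family = binomial(link = "logit"),
                  data = arrests)

summary(released01)                # coefficients on the log-odds scale
drop1(released01, test = "Chisq")  # likelihood-ratio test for each term
```

Because released is a factor with levels No and Yes, glm() models the probability of the second level (Yes, i.e. being released).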
3. Interpret the output...
Use effect plots to illustrate the results...
[Figure: effect plots for checks, colour, sex and yearCAT; each panel shows the predicted probability of being released (y-axis) against one explanatory variable (x-axis).]
R command: plot(allEffects(released01), type="response")
Try to predict the direction and significance of the variables directly from these plots...
The ‘standard output’ describes the model, but is not easy to interpret as the ‘Estimates’ refer to log-odds and it is not at all obvious how to interpret yearCAT (which contrast coding scheme or which reference category has been used). You do not need to directly interpret these tables, however, as all the information provided below can be derived directly from the effect plots.
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.55599    0.19659   7.915 2.48e-15 ***
checks          -0.40367    0.02516 -16.044  < 2e-16 ***
colour[T.White]  0.54187    0.08183   6.622 3.54e-11 ***
sex[T.Male]      0.09156    0.14711   0.622   0.5337
yearCAT[T.1998]  0.34079    0.14471   2.355   0.0185 *
yearCAT[T.1999]  0.36675    0.13958   2.627   0.0086 **
yearCAT[T.2000]  0.57144    0.13926   4.103 4.07e-05 ***
yearCAT[T.2001]  0.33515    0.13688   2.448   0.0143 *
yearCAT[T.2002]  0.17366    0.19278   0.901   0.3677

Analysis of Deviance Table (Type II tests)
        LR Chisq Df Pr(>Chisq)
checks   270.567  1  < 2.2e-16 ***
colour    42.574  1  6.807e-11 ***
sex        0.381  1   0.536901
yearCAT   18.002  5   0.002944 **
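For readers who do want to read the table directly, the following sketch shows one way the log-odds estimates could be converted to more familiar quantities. The values are copied from the coefficient table above; the object names are illustrative only. The [T.White], [T.Male] and [T.1998] labels indicate that Black, Female and 1997 are the reference categories, so the intercept describes that baseline group.

```r
# Estimates copied from the coefficient table above (log-odds scale)
log_odds <- c(intercept   = 1.55599,
              checks      = -0.40367,
              colourWhite = 0.54187,
              sexMale     = 0.09156)

# Exponentiating a log-odds coefficient gives an odds ratio:
# below 1 means lower odds of release, above 1 means higher odds
odds_ratios <- exp(log_odds[-1])
print(round(odds_ratios, 3))

# plogis() is the inverse logit: it maps log-odds to a probability.
# Baseline group (Black, Female, 1997) with 0, 1, 2 and 3 checks
p_baseline <- plogis(log_odds[["intercept"]] + log_odds[["checks"]] * 0:3)
print(round(p_baseline, 3))
```

These baseline probabilities will not match the effect plots exactly, because an effect plot holds the other explanatory variables at typical values rather than at their reference categories.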
There is a negative relationship between checks and the probability of being released. This relationship is highly significant.
Black people have a lower probability of being released than do white people (0.79 compared to 0.87); this relationship is highly significant.
Males have a higher probability of being released than do females (0.85 compared to 0.84). This relationship is not significant.
Year does appear to be related to the probability of being released; the biggest difference being between 1997 and 2000 (0.80 to 0.88). This relationship is significant.
There are many other models you might consider for these data. For example, has the probability of being released changed for black and white people over the years (i.e., have recent changes to the law designed to alleviate discrimination been effective)?
To address this question, we need to ask whether the probability of being released has changed for black people compared to white people over the years. This relationship is represented as an interaction between colour and year...
released ∼ checks + sex + colour*year
The effect plots for the model above show that the probability of release changes dramatically for black people compared to white people over the years. Before 2000, black people have a significantly lower probability of being released than white people; after 2000, the probabilities of release are similar for black and white people.
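A hedged sketch of how the interaction model might be fitted and compared with the main-effects model is given below. It assumes, as before, that year enters as the factor yearCAT; the name Release.Model follows the plotting command used in these slides, while arrests and main_model are names introduced here for illustration.

```r
library(carData)  # provides the Arrests data frame

# Assumption: yearCAT is 'year' recoded as a factor
arrests <- Arrests
arrests$yearCAT <- factor(arrests$year)

# Main-effects model, for comparison
main_model <- glm(released ~ checks + sex + colour + yearCAT,
                  family = binomial, data = arrests)

# colour * yearCAT expands to colour + yearCAT + colour:yearCAT
Release.Model <- glm(released ~ checks + sex + colour * yearCAT,
                     family = binomial, data = arrests)

# Likelihood-ratio test of the colour:yearCAT interaction
anova(main_model, Release.Model, test = "Chisq")
```

The anova() comparison tests whether adding the interaction improves the fit overall; the effect plot is still needed to see in which years the two groups differ.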
[Figure: yearCAT*colour effect plot; probability of being released (y-axis, 0.70 to 0.90) by yearCAT (1997 to 2002), with separate lines for colour = Black and colour = White.]
R code: plot(allEffects(Release.Model), type="response",
multiline=TRUE, ci.style="bars")
Modelling a count variable...
Evidence that ‘stop and search’ unfairly targets sections of the population.
1. Define the research model.
There is concern that a policy of ‘stop and search’ unfairly targets young black people. It has also been suggested that this situation has not changed over the years, despite attempts to address the issue.
The following analysis investigates this by looking at the number of databases people appear on (checks) based on their age, colour, sex and the year they were arrested. As we are interested in the combination of age and colour (young black people), we need to include the interaction colour * age. We also need to investigate the interaction between year and colour (colour * yearCAT) to see if there is evidence that the number of checks has changed for black and white people over the years.
The following model predicts the number of checks given sex and interactions between ‘age and colour’ and ‘year and colour’...
checks ∼ sex + colour*age + colour*year
2. Select the analysis technique.
As the variable being modelled is a count (the number of checks), a Poisson model is selected...
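A minimal sketch of fitting such a Poisson model in R follows. It assumes that year is entered as the factor yearCAT (as the effect plots suggest); the name ChecksModel01 follows the plotting command used in these slides, and arrests is an illustrative working copy of the data.

```r
library(carData)  # provides the Arrests data frame

# Assumption: yearCAT is 'year' recoded as a factor
arrests <- Arrests
arrests$yearCAT <- factor(arrests$year)

# Poisson model with a log link for the count of checks;
# colour * age and colour * yearCAT include the main effects
# as well as the two interactions
ChecksModel01 <- glm(checks ~ sex + colour * age + colour * yearCAT,
                     family = poisson(link = "log"),
                     data = arrests)

summary(ChecksModel01)  # coefficients on the log scale
```

The only changes from the logistic example are the response variable and the family argument; the model formula is written in the same way.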
3. Interpret the results.
The ‘standard output’ describes the model, but is not easy to interpret... the ‘Estimates’ are in logs, the lower-order terms are not readily interpretable (compare the significance of ‘age’ in the two tables) and it is not obvious how to interpret the categorical explanatory variables...
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                2.909e-01  9.448e-02   3.079  0.00208 **
sex[T.Male]                3.977e-01  4.692e-02   8.478  < 2e-16 ***
age                        3.919e-03  2.180e-03   1.798  0.07222 .
colour[T.White]           -5.351e-01  9.882e-02  -5.415 6.13e-08 ***
year[T.1998]              -5.264e-02  7.619e-02  -0.691  0.48966
year[T.1999]               7.005e-03  7.479e-02   0.094  0.92538
year[T.2000]              -7.169e-05  7.347e-02  -0.001  0.99922
year[T.2001]              -6.767e-02  7.311e-02  -0.926  0.35462
year[T.2002]              -3.251e-02  9.701e-02  -0.335  0.73751
age:colour[T.White]        1.328e-02  2.621e-03   5.069 4.00e-07 ***
col[T.White]:year[T.1998] -4.342e-02  9.165e-02  -0.474  0.63571
col[T.White]:year[T.1999] -1.704e-01  8.927e-02  -1.909  0.05629 .
col[T.White]:year[T.2000] -1.586e-01  8.756e-02  -1.811  0.07010 .
col[T.White]:year[T.2001] -1.210e-01  8.770e-02  -1.380  0.16770
col[T.White]:year[T.2002] -1.665e-01  1.212e-01  -1.374  0.16947

Response: checks
            LR Chisq Df Pr(>Chisq)
sex           80.859  1  < 2.2e-16 ***
age          108.406  1  < 2.2e-16 ***
colour       175.067  1  < 2.2e-16 ***
year          15.176  5   0.009637 **
age:colour    26.308  1  2.911e-07 ***
colour:year    6.445  5   0.265274
You do not need to interpret this...
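To illustrate why the lower-order terms are hard to read, the sketch below works through the age coefficients by hand. The values are copied from the coefficient table above and the object names are illustrative. Because the model includes the age:colour interaction, the ‘age’ row is the age slope for the reference colour (Black) only; the slope for White is the sum of the ‘age’ and ‘age:colour[T.White]’ estimates.

```r
# Estimates copied from the coefficient table above (log scale)
b_sex_male  <- 3.977e-01  # sex[T.Male]
b_age       <- 3.919e-03  # age slope for the reference colour (Black)
b_age_white <- 1.328e-02  # additional age slope for White

# exp() turns a log-scale coefficient into a multiplicative rate ratio
rate_ratio_male <- exp(b_sex_male)

# Age slope within each colour group
slope_black <- b_age
slope_white <- b_age + b_age_white

# Multiplicative change in expected checks per 10 years of age
rate_ratio_decade <- exp(10 * c(Black = slope_black, White = slope_white))

print(round(c(Male_vs_Female = rate_ratio_male, rate_ratio_decade), 3))
```

This is why ‘age’ can look non-significant in the coefficient table yet highly significant in the analysis of deviance table: the first tests the slope in one group, the second tests the age effect as a whole.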
The effect plots are easier to interpret...
[Figure: effect plots for sex, colour*age and colour*yearCAT; each panel shows the predicted number of checks (y-axis), with the two interaction plots drawn in separate panels for colour = Black and colour = White.]
R command: plot(allEffects(ChecksModel01), type="response")
It is clear that young black people appear on more databases.
Although the relationship between yearCAT and colour is not significant in the tabular output, an interesting relationship is suggested in the plot. Further analysis using a more specific statistical test shows that there IS a significant decreasing linear trend for Whites (this relationship was only identified from the effect plot).