Research Question: What Determines a Person’s Height?
Research Question
What determines a person’s height?
• Genetics
• Nutrition
• Immigration / Origins
• Disease
Hypothesis Brainstorming
• Sons will be similar in height to their fathers
• Daughters will be similar in height to their mothers
Hypotheses:
Literature Review: Article #1
• Invented Regression
• When mid-parents are taller than mediocrity (the population average), their children tend to be shorter than they are
• When mid-parents are shorter than mediocrity, their children tend to be taller than they are
Francis Galton
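Galton's regression toward mediocrity can be illustrated with a small simulation. The sketch below does not use Galton's data: the population mean, the slope of 0.65, and the noise levels are all assumed values chosen only to show the effect.

```r
# Simulated illustration of regression toward the mean (all parameters are assumptions)
set.seed(1)
n <- 1000
mean_height <- 68                       # assumed population average, in inches
midparent <- rnorm(n, mean = mean_height, sd = 1.8)

# Children follow their mid-parent only partially: slope below 1 (assumed 0.65)
child <- mean_height + 0.65 * (midparent - mean_height) + rnorm(n, sd = 2.2)

tall <- midparent > mean_height
diff_tall  <- mean(child[tall]  - midparent[tall])   # children of tall parents
diff_short <- mean(child[!tall] - midparent[!tall])  # children of short parents

# Negative for tall parents, positive for short parents
c(tall_parents = diff_tall, short_parents = diff_short)
```

Because the slope is below 1, children of above-average parents end up closer to the average than their parents, and the same holds in the other direction for below-average parents.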
Literature Review: Article #2
Variables:
• Genes
• First two years of life
• Illnesses
• Infant mortality rates
• Smaller Families
• Higher income
• Better education
Literature Review: Article #3
“we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance”
“the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance”
Literature Review: Summary
Variable: Galton | Hatton | Aulchenko
Height: Individuals | Country Average | Individuals
Gender: Men and Women | Men Only | Men and Women
Age: Individuals
Countries
Infant Mortality: Country Average
GDP: Country Average
Family Size: Country Average
Time: X
Genome: Individuals
Observations: ~1,000 | 550 | 5,478
Variables
Dependent Variable (Y): Height
Independent Variables (X’s): X1, X2, X3, X4
Height Dataset Variables
heights <- read.csv("GaltonFamilies.csv")
Data Types: Numbers and Factors/Categorical
Dataset Variables: Type
Summary Statistics
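As a rough sketch of how the variable types and summary statistics could be inspected, the code below builds a small simulated data frame in place of the real file. The column names `father`, `mother`, `gender`, and `childHeight` are meant to mirror the dataset, but every value and effect size here is made up.

```r
# Simulated stand-in for the height data; all values are invented for illustration
set.seed(42)
n <- 200
toy <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Child height depends on both parents plus a shift for sons (assumed effect sizes)
toy$childHeight <- 16 + 0.4 * toy$father + 0.3 * toy$mother +
  5 * (toy$gender == "male") + rnorm(n, sd = 2)

str(toy)                                   # column types: num vs. Factor
types <- sapply(toy, class)                # the same information as a named vector
types
summary(toy)                               # quartiles and mean for numbers, counts for factors
tapply(toy$childHeight, toy$gender, mean)  # mean child height per gender
```

On the real data, the same calls would be applied to `heights` instead of `toy`.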
Frequency Distribution, Histogram
hist(heights$childHeight)
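h <- heights  # assumption: h is the data frame loaded above
# Density-scaled histogram with a fitted normal curve overlaid
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)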
Bimodal: two modes
Mode, Bimodal
Q-Q Plot
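A Q-Q plot compares the sample quantiles with those of a normal distribution. The sketch below uses simulated heights drawn from two groups with assumed means and spreads (not the Galton data), so that the bimodal shape noted above shows up as a departure from the reference line.

```r
# Simulated bimodal sample: two normal groups with assumed means (not real data)
set.seed(2)
x <- c(rnorm(450, mean = 64, sd = 2.4),   # shorter group
       rnorm(480, mean = 69, sd = 2.6))   # taller group

# Normal Q-Q plot; points far from the line suggest non-normality
qq <- qqnorm(x, main = "Normal Q-Q plot (simulated heights)")
qqline(x, col = "red")
```

For the actual dataset, `x` would be replaced by `heights$childHeight`.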
Correlation Matrix for Continuous Variables
library(PerformanceAnalytics)  # provides chart.Correlation()
chart.Correlation(num2)  # num2: presumably the numeric columns of the dataset
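The slide does not show how `num2` was built. One plausible way, sketched here on simulated data with made-up values, is to keep only the numeric columns and pass them to base R's `cor()`, which gives the same correlations that the chart displays.

```r
# Simulated stand-in data; column names mirror the dataset, values are invented
set.seed(7)
n <- 300
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
toy <- data.frame(
  father, mother,
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  childHeight = 20 + 0.4 * father + 0.3 * mother + rnorm(n, sd = 2.5)
)

# Assumed construction of num2: drop factor columns, keep numeric ones
num2 <- toy[sapply(toy, is.numeric)]
cm <- cor(num2)      # Pearson correlation matrix
round(cm, 2)
```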
Correlations Matrix: Both Types
library(car)
scatterplotMatrix(heights)
Zoom in on Gender
Categorical: Revisit Box Plot
Note there is an equation here: Y = mx + b
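With a two-level categorical predictor such as gender, the line Y = mx + b has a simple reading: b is the mean of the reference group and m is the difference between the two group means. The sketch below checks this on simulated heights; the group means of 64 and 69 inches are assumptions, not values from the dataset.

```r
# Simulated heights for two groups (assumed means; not the Galton data)
set.seed(3)
gender <- factor(rep(c("female", "male"), each = 100))
childHeight <- ifelse(gender == "male", 69, 64) + rnorm(200, sd = 2.5)

fit <- lm(childHeight ~ gender)   # R codes gender as 0 (female) / 1 (male)
b <- unname(coef(fit)["(Intercept)"])   # intercept: mean of the reference group
m <- unname(coef(fit)["gendermale"])    # slope: male mean minus female mean

group_means <- tapply(childHeight, gender, mean)
c(b = b, m = m)
group_means
```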
Correlation will depend on spread of distributions
Children’s Height by Gender
Linear Regression: Model 1
Child’s Height = f(Father’s Height)
Linear Regression: Model 2
model.5 <- lm(childHeight ~ gender, data = h)  # gender as the only predictor
Child’s Height = f(Mother’s Height)
• Mom
• MidParent Height
Linear Regression: Additional Models
Compare Models
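Model | 1 | 2 | 12 | 3 | 4
Intercept | 40.1 | 46.6 | 22.6 | 22.63 | 22.64
Father | 0.385 | - | 0.36 | - | 0.01
Mom | - | 0.314 | 0.29 | - | NA
midparentHeight | - | - | - | 0.637 | 0.538
Gender | - | - | - | - | -
R-squared | 0.070 | 0.0395 | 0.105 | 0.102 | 0.1033
r | 0.27 | 0.2 | - | 0.32 | -
R^2 | 0.073 | 0.04 | - | 0.102 | -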
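The comparison above can be sketched in R as follows. This uses simulated data, so the numbers will not match the table. The formula `(father + 1.08 * mother) / 2` for `midparentHeight` is, to my understanding, how the GaltonFamilies data defines it; treat it and all other parameters here as assumptions. Fitting mid-parent height together with both parents shows why one coefficient comes out as NA: the three predictors are exactly collinear.

```r
# Simulated stand-in data (assumed parameters; results differ from the table)
set.seed(11)
n <- 500
d <- data.frame(father = rnorm(n, 69, 2.5), mother = rnorm(n, 64, 2.3))
d$midparentHeight <- (d$father + 1.08 * d$mother) / 2   # assumed definition
d$childHeight <- 22 + 0.64 * d$midparentHeight + rnorm(n, sd = 3.4)

m1  <- lm(childHeight ~ father, data = d)
m2  <- lm(childHeight ~ mother, data = d)
m12 <- lm(childHeight ~ father + mother, data = d)
m3  <- lm(childHeight ~ midparentHeight, data = d)
m4  <- lm(childHeight ~ midparentHeight + father + mother, data = d)

# R-squared for each model, side by side
r2 <- sapply(list(m1 = m1, m2 = m2, m12 = m12, m3 = m3, m4 = m4),
             function(m) summary(m)$r.squared)
round(r2, 3)

coef(m4)   # mother is NA: it is a linear combination of the other two predictors

# For a single predictor, R-squared equals the squared correlation r^2
r_father <- cor(d$father, d$childHeight)
c(r = r_father, r_squared = r_father^2)
```

Since models 12 and 4 span the same predictor space, they fit equally well; adding mid-parent height to both parents adds no new information.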
Discussion Summary
• Key Findings:
  • Gender was the biggest factor
  • Parents’ height played a lesser role
• Downsides:
  • Dataset used did not include more variables of interest
  • Dataset for X Country for 1877
Future Research
• Include More Predictor Variables
  • Literature review of a few articles suggests several important factors:
    • Nutrition
• Analyze a Contemporary Dataset
  • Dataset used was from 18??
  • Location Specific as Well