
Lecture 1: Introduction

Math Boot Camp

Will Terry

Department of Political Science University of Oregon

September 16, 2013

Objectives of Math Camp

Have a good time learning about the wonders of math(s)!

Get ready for PS545-546….

Objectives of PS545-546

• The objectives of our sequence are twofold:

(1.) to improve your ability to read mainstream quantitative research, and

(2.) to provide a broad overview of the main tools of quantitative analysis.

• We will focus on the linear regression model.

• You will become familiar with Stata.

Statistical software

• This course will focus on practical computing skills that you might find useful in your future research.

– There are reasons to spend some time with R to appreciate the capabilities of statistical computing.

– Given the limited time, we will focus on developing Stata skills as much as possible.

• We will master the basic components of statistical computing.

– Data management

– Estimating regression models

– Graphing

The standard political science stats education

I. Basic probability theory: random variables, PDFs, CDFs

II. Statistical inference theory: confidence intervals, hypothesis testing, p-values, etc.

III. Linear regression analysis: the workhorse model of the social sciences

IV. Binary Outcome Models & Other Extensions of the Basic Linear Model

V. Time Series Cross Sectional Models

First, some key terms…

Causality: Phenomenon Y (e.g., income) is affected by factor X (e.g., gender).

Statistical inference: Drawing conclusions about the world based on characteristics of sample data. Typically we are interested in understanding “population parameters.”

Independent variable (syn. “regressor”, RHS var): The variable that is exogenously manipulated or changed.

Dependent variable (syn. “regressand”, LHS var): Its value “depends” on the value taken by the independent variables.

Random variables and hypothesis testing

Random Variable (RV): A variable whose values are determined by chance.

Probability Density Function (PDF): Describes how an RV is “distributed,” i.e., how likely it is that the RV takes any particular value.

Parameter: Characteristic or measure that describes a population.

Statistic (not to be confused with Statistics): Characteristic or measure obtained from a sample.

Common ways to distinguish variables

Qualitative Variables: Variables that take non-numerical values (e.g., eye color; gun ownership).

Quantitative Variables: Variables that take numerical values (e.g., number of credit cards in one’s wallet; time elapsed since the Compromise of 1877).

Discrete Variables: Variables which assume a finite or countable number of possible values. Usually obtained by counting (e.g., the number of credit cards in one’s wallet).

Continuous Variables: Variables which can assume any value in an interval, and hence an uncountably infinite number of possible values. Usually obtained by measurement (e.g., time elapsed since the Compromise of 1877).

Hypothesis testing terminology

Population: All subjects possessing a common characteristic that is being studied.

Sample: A subgroup or subset of the population.

Statistics: Collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions.

Hypothesis testing

Research design

• Research design is the means by which we attempt to uncover causal relationships between variables using data that we collect.

• In the jargon of the trade, the objective is to “identify” the effect of a “treatment.”

• Conceptually, one wants to make a comparison between two identical subjects—one who received the treatment, and one who did not.

• A pure experiment is the gold standard. Unfortunately, this ideal is generally infeasible in the social sciences.

Language of research design

Treatment group: The group that receives the treatment.

Control group: The group that does not receive the treatment.

Experimental data: Data derived from a process whereby the researcher determines the receipt of the treatment.

Non-experimental data (syn. “observational data”): Data in which the administration of the treatment is determined by factors beyond the researcher’s control.

The standard political science stats education

I. Basic probability theory: random variables, PDFs, CDFs

II. Statistical inference theory: confidence intervals, hypothesis testing, p-values, etc.

III. Linear regression analysis: the workhorse model of the social sciences

IV. Binary Outcome Models & Other Extensions of the Basic Linear Model

V. Time Series Cross Sectional Models

Linear regression analysis

A. Univariate regression model

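$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ (There is one IV, i.e., one independent variable)

B. Multivariate regression model

$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$ (There are two IVs)

$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_N x_{Ni} + \varepsilon_i$ (There are N IVs)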
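As a rough illustration of the univariate model, the sketch below estimates β0 and β1 from data. It assumes the coefficients are fit by ordinary least squares, which the slides do not state explicitly, and the function name `fit_univariate_ols` and the data are hypothetical.

```python
from statistics import fmean


def fit_univariate_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x.

    Assumption: ordinary least squares is the estimator of interest.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data generated exactly from y = 2 + 3x (error term set to zero).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]

b0_hat, b1_hat = fit_univariate_ols(x, y)
print(b0_hat, b1_hat)
```

Because the illustrative data contain no error term, the estimates recover the intercept and slope used to generate them. With real data the εi are nonzero and the estimates only approximate the population parameters.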

IV. Binary dependent variable models

Used when the dependent variable takes one of two possible values:

$\text{Democrat}_i = \begin{cases} 1 & \text{if citizen } i \text{ is a Democrat} \\ 0 & \text{if citizen } i \text{ is not a Democrat} \end{cases}$

$\text{Democrat}_i = f(\text{gender}_i, \text{income}_i, \text{race}_i, \text{age}_i)$

V. Time series cross sectional models

State      Year   GDP per capita   Ave. Education
Alabama    1970   $5,000           10.3 years
Alabama    1980   $9,500           11.2 years
Alabama    1990   $11,200          12.4 years
Illinois   1970   $7,000           9.3 years
...
New York   1970   $6,000           8.4 years
New York   1980   $11,500          10.1 years
New York   1990   $18,00           14.5 years

When the researcher observes the objects of analysis at multiple points in time.

(These data have both time series and cross section features.)

What we won’t cover in PS545-6 but might be useful in your dissertation, future research, etc.

A. Maximum likelihood estimation (MLE) and other procedures

B. Model selection

C. Simultaneous equations/instrumental variables (IV) estimation

D. Matching

E. Non-parametric models

F. Case study selection for qualitative research

And much, much more!

Causality and research design

• Causality is often difficult to determine (wait for the next slide), which is why research design is important.

• An experiment is the gold standard.

• If a treated subject and a control subject are the same in every respect (as they are in a perfect experiment), we can logically attribute any difference in the observed outcome to receipt of the treatment.

• In the social sciences, we generally can’t run experiments so we use statistical techniques to make the treatment and control group as alike as we can.

Common difficulties in determining causality

One variable causes another, but how do you know which is causal?

Douglas firs --?-- Rainfall (direction of causation unknown)

Two variables cause each other.

Expected closeness of race <--> Candidate expenditures

Common difficulties in determining causality

An omitted third variable causes both. (One reason correlation ≠ causation.)

Old age --> Bad Driving

Old age --> Gray Hair

If one were to look at the relationship between Bad Driving and Gray Hair only one might be led to the erroneous conclusion that Gray Hair causes people to drive badly (or Bad Driving causes one to have Gray Hair).

How could one test these competing hypotheses?

Recall the relationship between ice cream consumption and the NY homicide rate…
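One way to probe the omitted-variable story is to compare people within the same age group. The simulation below is a minimal sketch: every probability in it is made up for illustration, and age is reduced to a hypothetical old/young indicator that drives both Gray Hair and Bad Driving, with no direct link between the two.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
rows = []
for _ in range(20_000):
    old = rng.random() < 0.5
    # Hypothetical probabilities: age affects each outcome separately.
    gray = 1 if rng.random() < (0.8 if old else 0.1) else 0
    bad = 1 if rng.random() < (0.6 if old else 0.2) else 0
    rows.append((old, gray, bad))

# Pooled data: gray hair and bad driving look related.
overall = pearson([g for _, g, _ in rows], [b for _, _, b in rows])

# Holding age fixed: the relationship should largely disappear.
among_old = pearson([g for o, g, _ in rows if o], [b for o, _, b in rows if o])
among_young = pearson([g for o, g, _ in rows if not o],
                      [b for o, _, b in rows if not o])

print(f"overall: {overall:.3f}")
print(f"old only: {among_old:.3f}, young only: {among_young:.3f}")
```

In the pooled data the correlation is clearly positive, while within each age group it is close to zero. That pattern is what one would expect if Old age, rather than Gray Hair, is responsible for the association.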

A research design schematic

R denotes randomized assignment.

N denotes non-randomized assignment.

X denotes receipt of the treatment.

O denotes that the subject is tested.

Some basic mathematical tools

We will review some basic mathematical tools:

- Functions

- Summation operators

- Differential Calculus

Functions

A function is a rule that assigns exactly one value to each input of a specified type.

A function expresses the intuitive idea that one quantity (the argument of the function, also known as the input) completely determines another quantity (the value, or the output).
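A short sketch of the "exactly one value" requirement follows. The names `square` and `square_roots` are hypothetical and chosen only for illustration.

```python
def square(x):
    """A function: each input x determines exactly one output, x squared."""
    return x * x


def square_roots(y):
    """NOT a function in the sense above: a positive y has two square roots."""
    root = y ** 0.5
    return {root, -root}


print(square(3), square(-3))   # two different inputs may share one output
print(square_roots(4.0))       # one input, two candidate values
```

Note that different inputs may map to the same output (3 and -3 both give 9). What the definition rules out is a single input mapping to more than one output.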

Summation operators

Summation operators are a useful way to represent the sum of a large set of numbers:

$\sum_{i=1}^{N} x_i = x_1 + x_2 + \dots + x_{N-1} + x_N$

The index i indicates which numbers in the set are to be included in the sum.

The product operator works in a similar fashion:

$\prod_{i=1}^{N} x_i = x_1 \times x_2 \times \dots \times x_N$

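Summation operators

Suppose your data were $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\} = \{-100, -10, -1, 0, 1, 10, 100\}$.

Compute the following:

(1.) $\sum_{i=1}^{7} x_i$

(2.) $\sum_{i=1}^{3} x_i$

(3.) $\sum_{i=3}^{5} x_i$

(4.) $\sum_{i=1}^{7} 8(x_i)$

(5.) $\sum_{i \neq 3} \frac{x_i}{4}$

(6.) $\sum_{i \text{ is an odd number}} x_i$

(7.) $\prod_{i=1}^{7} x_i$

(8.) $\prod_{i \geq 6} x_i$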
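To check hand calculations for the exercise above, the sums and products can be evaluated directly. This is a sketch under two assumptions: the helper `xi` is a hypothetical convenience for the slides' 1-based indexing, and the expression with the condition i ≠ 3 is read as the sum of x_i / 4 over all i other than 3.

```python
from math import prod

x = [-100, -10, -1, 0, 1, 10, 100]
N = len(x)


def xi(i):
    """Return x_i using the slides' 1-based index (Python lists are 0-based)."""
    return x[i - 1]


indices = range(1, N + 1)

sum_all = sum(xi(i) for i in indices)                      # 0
sum_1_to_3 = sum(xi(i) for i in range(1, 4))               # -111
sum_3_to_5 = sum(xi(i) for i in range(3, 6))               # 0
sum_8x = sum(8 * xi(i) for i in indices)                   # 0
# Assumed reading: x_i / 4 summed over every index except 3.
sum_quarter_not_3 = sum(xi(i) / 4 for i in indices if i != 3)   # 0.25
sum_odd = sum(xi(i) for i in indices if i % 2 == 1)        # 0
prod_all = prod(xi(i) for i in indices)                    # 0, since x_4 = 0
prod_ge_6 = prod(xi(i) for i in indices if i >= 6)         # 1000

print(sum_all, sum_1_to_3, sum_3_to_5, sum_8x)
print(sum_quarter_not_3, sum_odd, prod_all, prod_ge_6)
```

The condition written under the operator simply becomes a filter on the index, which is why the same pattern handles "i = 1 to 3", "i ≠ 3", and "i is an odd number".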

Sample mean and sample variance

Every population has a mean (μ) and a variance (σ²); note this implies it has a standard deviation (σ) as well.

The population mean tells you where the population is “centered.” There’s a sense in which the mean is the middle of the data.

The population variance (or standard deviation) measures how far “spread out” individuals in the population are. (The variance and standard deviation are always non-negative.)

The sample mean and sample variance are two fundamental statistics. They estimate the parameters of the population the data were drawn from.

$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$

$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2$
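The two estimators can be computed in a few lines. The sketch below reuses the seven data points from the summation exercise purely for illustration and follows the slide in dividing by N. The comparison with Python's `statistics` module is an added aside, not something the slides discuss.

```python
from math import isclose
from statistics import pvariance, variance

x = [-100, -10, -1, 0, 1, 10, 100]
N = len(x)

# Sample mean: (1/N) * sum of x_i.
mu_hat = sum(x) / N

# Sample variance as written on the slide: divide by N.
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / N

print(mu_hat)       # 0.0
print(sigma2_hat)   # 2886.0

# statistics.pvariance also divides by N, so it matches the slide's formula.
assert isclose(sigma2_hat, pvariance(x))

# statistics.variance divides by N - 1 instead, so it is slightly larger.
print(variance(x))  # 3367.0
```

Many software packages report the N - 1 version by default, so it is worth checking which divisor a given routine uses before comparing results.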

Derivatives

Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity.
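A numerical sketch of this idea: nudge the input slightly and see how much the output changes per unit of input. The helper name `derivative`, the step size `h`, and the example function are assumptions made for illustration.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: rise over run around x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    """Example function f(x) = x^2, whose exact derivative is 2x."""
    return x ** 2


slope_at_3 = derivative(f, 3.0)
print(slope_at_3)  # approximately 6, i.e. 2 * 3
```

Near x = 3, a small increase in x raises f(x) by about six times as much, which is the sense in which the derivative measures responsiveness.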

Integrals

A definite integral of a function can be represented as the signed area of the region bounded by its graph, the horizontal axis, and the limits of integration.
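The word "signed" matters: area below the horizontal axis counts as negative. The sketch below approximates a definite integral by adding up thin rectangles (a midpoint Riemann sum). The helper name `integrate` and the choice of n are assumptions for illustration.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    width = (b - a) / n
    # Each rectangle's height is f evaluated at the midpoint of its base.
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width


def identity(x):
    """Example function f(x) = x."""
    return x


area_0_to_1 = integrate(identity, 0.0, 1.0)    # triangle above the axis
area_m1_to_1 = integrate(identity, -1.0, 1.0)  # negative and positive parts

print(area_0_to_1)   # approximately 0.5
print(area_m1_to_1)  # approximately 0: the two triangles cancel
```

From -1 to 1 the triangle below the axis offsets the triangle above it, so the integral is zero even though the total unsigned area is 1.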

Math Camp game plan: Time to get down to business…

In the remainder of this lecture we will discuss some elementary results in a branch of mathematics called Real Analysis—i.e., the branch of math that studies real numbers.

Q: Why do we care about Real Analysis?

A: Because it provides the logical structure that undergirds the math we use as social scientists.

The next few slides follow a text that is slightly more advanced than we need, but let’s follow along to develop a few ideas about the real number line…

The set of real numbers: Special symbols

The real number line

The set of real numbers: Properties

Inequalities

A cheat sheet of handy rules re real numbers

(see the Math Camp website for the complete sheet)

Quadratic equations

Quadratic equations (cont.)

Absolute value

Achilles and the tortoise

Achilles and the tortoise (cont.)

Achilles and the tortoise (cont.)

Bounds

Intervals

Next lecture…

Functions and graphs

- Functions

- Graphs

- Functional forms
