M28- Categorical Analysis Department of ISM, University of Alabama, 1992-2003 Categorical Data



Lesson Objective

Understand basic rules of probability.

Calculate marginal and conditional probabilities.

Determine if two categorical variables are independent.

Recall Rule of Thumb:

Quantitative variables: averages or differences have meaning.

Ex: weight, height, income, age


Recall Rule of Thumb:

Categorical variables: classify people or things.

Ex: gender, race, occupation, political affiliation, country of origin


Note: Sometimes quantitative variables are expressed as categorical.

Income (Family Economic Income):

Class   Definition
1       Less than $30,000
2       $30,000 but less than $100,000
3       $100,000 or more

Relationships between variables

Relationship between two quantitative variables?

Is relationship linear (scatterplot)?

Use Correlation & Least Squares Regression.

Data transformations.

Recall: Boxplots

Best graphical tool for examining the relationship between a quantitative variable and a categorical variable (i.e., comparing distributions).

Example: Weight vs. Country of Origin

Boxplot can be used to answer: “Do the distributions of weights vary for different countries of origin?”

[Figure: side-by-side boxplots of weight (axis ticks 2000, 3000, 4000) by origin (US, Far East, Europe; coded 1 to 3)]

Relationship between two categorical variables?

Use two-way frequency tables:

Look at marginal probabilities and conditional probabilities.

STATISTICS is the science of transforming data into information to make decisions in the face of uncertainty.

How do we measure "uncertainty"?

Probability: A numerical measure of the likelihood that an outcome or an event occurs.

P(A) = probability of event A

Three Methods for Assessing Probability

Classical

Relative Frequency

Subjective

Probability requirements for discrete variables:

1. 0 ≤ P(A) ≤ 1

P(A) = 0 impossible event

P(A) = 1 certain event

2. Sum of the probabilities of all possible outcomes must equal 1. (Binomial, Poisson)

Conditional probability: The chance one event happens, given that another event will occur.

P(A | B) = P(A and B) / P(B)

         = (All outcomes belonging to BOTH A AND B) / (Those outcomes in the restricted group, B)
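As a brief worked step (an added illustration: the count notation n(·) and the total n are assumptions, not notation from the slides, and the outcomes are taken to be equally likely), the total cancels, which is why a conditional probability can be read directly from counts in a table:

```latex
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}
            = \frac{n(A \text{ and } B)/n}{n(B)/n}
            = \frac{n(A \text{ and } B)}{n(B)}
```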

Problem: Credit Card Manager

New credit test to determine credit worthiness.

Credit test checked against 500 previous customers.

Credit History vs. Credit Test A

Credit History   Passed (P)   Failed (F)   Total
Good (G)            350           50        400
Default (D)          20           80        100
Total               370          130        500

What is the probability of a customer defaulting?

P(D) = P(Defaults) = ____________

What is the probability of a customer defaulting given that he fails test A?

P(D | F) = P(Defaults given failed test A) = ____________
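The two questions above can be worked from the Credit Test A counts. The following is a minimal sketch, not part of the original slides; the dictionary layout and the helper names `proportion` and `conditional` are hypothetical choices made for illustration.

```python
# Counts from the Credit Test A table, keyed by (credit history, test result).
counts_a = {
    ("G", "P"): 350, ("G", "F"): 50,
    ("D", "P"): 20,  ("D", "F"): 80,
}

def proportion(counts, history=None, result=None):
    """Share of all customers matching the given history and/or test result."""
    total = sum(counts.values())
    matching = sum(
        n for (h, r), n in counts.items()
        if (history is None or h == history) and (result is None or r == result)
    )
    return matching / total

def conditional(counts, history, given_result):
    """P(history | test result): restrict to one test-result column first."""
    in_group = sum(n for (h, r), n in counts.items() if r == given_result)
    return counts[(history, given_result)] / in_group

p_d = proportion(counts_a, history="D")        # 100 / 500
p_d_given_f = conditional(counts_a, "D", "F")  # 80 / 130
print(f"P(D) = {p_d:.3f}, P(D | F) = {p_d_given_f:.3f}")
```

Under these counts, P(D) = 100/500 = .200 while P(D | F) = 80/130, about .615, so restricting to customers who failed the test changes the chance of default considerably.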


General Rules:

P(A and B) = P(A) P(B|A)

= P(B) P(A|B)

P(A or B) = P(A) + P(B) - P(A and B)

P(Fails AND Defaults) = P(F) P(D|F)
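As a quick check of the multiplication rule with the Credit Test A counts, the sketch below (an added illustration; the variable names are hypothetical) computes P(F) P(D|F) and compares it with the joint proportion read directly from the table cell.

```python
# Counts from the Credit Test A table.
total = 500
n_failed = 130              # column total for Failed (F)
n_failed_and_default = 80   # cell: Default (D) and Failed (F)

p_f = n_failed / total                          # P(F) = 130/500
p_d_given_f = n_failed_and_default / n_failed   # P(D | F) = 80/130
p_f_and_d = p_f * p_d_given_f                   # multiplication rule

# The product should agree with the joint proportion taken from the cell itself.
direct = n_failed_and_default / total           # 80/500
print(round(p_f_and_d, 3), round(direct, 3))
```

Both routes give 80/500 = .16, because the column total 130 cancels in the product.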


P(Fails OR Defaults) = P(F) + P(D) - P(D AND F)

Note: The “overlap” group would be counted twice if no subtraction.
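The addition rule can be checked the same way. This is a small sketch using the Credit Test A counts (variable names are hypothetical); it compares the rule with a direct count in which each customer is counted once.

```python
# Counts from the Credit Test A table.
total = 500
n_f = 130        # Failed (F) column total
n_d = 100        # Default (D) row total
n_f_and_d = 80   # overlap: Failed AND Default

# Addition rule: subtract the overlap so it is not counted twice.
p_f_or_d = n_f / total + n_d / total - n_f_and_d / total

# Cross-check: every cell except Good-and-Passed, each counted once.
counted_once = (50 + 20 + 80) / total
print(round(p_f_or_d, 3), round(counted_once, 3))
```

Both give 150/500 = .30. Without the subtraction, the 80 customers in the overlap would be added twice, giving 230/500.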


Does knowledge of “test A result” help you make a better decision?

Compare P(D) with P(D | F).

Do you want to know the test A results before you give the loan?

“Credit test A results” and “defaulting” are ____________ on each other.

A “Newer” Credit Test. Is it even better?

A different sample of 500 credit records

Credit History vs. Credit Test B

Credit History   Passed (P)   Failed (F)   Total
Good (G)            340           60        400
Default (D)          85           15        100
Total               425           75        500

What is the probability of a customer defaulting?

P(D) = P(Defaults) = ____________

What is the probability of a customer defaulting given that he fails test B?

P(D | F) = P(Defaults given failed test B) = ____________
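The same two quantities for Test B can be computed from its table. A minimal sketch (variable names are hypothetical, counts are from the Credit Test B table):

```python
# Counts from the Credit Test B table.
total = 500
n_default = 100             # row total for Default (D)
n_failed = 75               # column total for Failed (F)
n_default_and_failed = 15   # cell: Default (D) and Failed (F)

p_d = n_default / total                         # 100 / 500
p_d_given_f = n_default_and_failed / n_failed   # 15 / 75
print(f"P(D) = {p_d:.3f}, P(D | F) = {p_d_given_f:.3f}")
```

Here both values equal .200: among customers who failed Test B, the share who defaulted is the same as in the whole sample.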


Does knowledge of “test B result” help you make a better decision?

Compare P(D) with P(D | F).

Test B tells me ____________. “Credit test B results” and “defaulting” are ____________ of each other.

Independence

Two events are independent if the occurrence, or non-occurrence, of one does not affect the chances of the other occurring, or not occurring.

Otherwise, we say the events are dependent.

If A and B are independent, then

P(A and B) = P(A) P(B)

P(A or B) = P(A) + P(B) - P(A) P(B)

P(A|B) = P(A)

P(B|A) = P(B)

Note: The condition does NOT change the probability.
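The product rule for independent events suggests a simple numerical check. The sketch below is an added illustration: the function name `is_independent` and the tolerance are assumptions, and with sample data an exact match is rare, so in practice the comparison is a judgment of closeness rather than strict equality.

```python
import math

def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

# Test B: P(D) = 100/500, P(F) = 75/500, P(D and F) = 15/500
test_b = is_independent(100 / 500, 75 / 500, 15 / 500)

# Test A: P(D) = 100/500, P(F) = 130/500, P(D and F) = 80/500
test_a = is_independent(100 / 500, 130 / 500, 80 / 500)

print("Test B independent of defaulting:", test_b)
print("Test A independent of defaulting:", test_a)
```

For Test B the product .20 × .15 = .03 matches the joint proportion 15/500, while for Test A the product .20 × .26 = .052 is far from 80/500 = .16.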

Survey of randomly selected voters in Jan. 2001:

Q1: Did you vote in the 2000 election?

Q2: Do you favor an amendment to require a balanced budget?

Q3: To which political party do you belong?

Do you favor amendment for a balanced budget?

Political Party   Yes    No   Total
Republican         90    82    172
Democrat           44   104    148
Other              48    32     80
Total             182   218    400

Sample size: 400

Marginal totals for opinion: the Total row (182 Yes, 218 No).

Marginal totals for Party: the Total column (172 Republican, 148 Democrat, 80 Other).

What proportion favor the amend.?

What proportion claim to be Rep?

What proportion favor the amend. and are Other?
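These three questions are marginal and joint proportions, each divided by the full sample size. A minimal sketch using the survey counts (the dictionary layout and variable names are hypothetical):

```python
# Two-way table of counts from the survey: party -> (Yes, No).
table = {
    "Republican": (90, 82),
    "Democrat": (44, 104),
    "Other": (48, 32),
}
n = sum(yes + no for yes, no in table.values())     # sample size

p_yes = sum(yes for yes, _ in table.values()) / n   # marginal for opinion: 182/400
p_rep = sum(table["Republican"]) / n                # marginal for party: 172/400
p_yes_and_other = table["Other"][0] / n             # joint proportion: 48/400

print(round(p_yes, 3), round(p_rep, 3), round(p_yes_and_other, 3))
```

With these counts the proportions are .455, .43, and .12 respectively.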


What proportion favor the amend., given those that claim to be Rep?

Of those that claim to be Democrat, what proportion favor the amend.?

Considering only those opposed, what proportion are not Republican?
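Each of these questions restricts the sample to one group before taking a proportion, so the denominator is a row or column total rather than 400. A short sketch with the survey counts (names are hypothetical):

```python
# Two-way table of counts from the survey: party -> (Yes, No).
table = {
    "Republican": (90, 82),
    "Democrat": (44, 104),
    "Other": (48, 32),
}

yes_rep, no_rep = table["Republican"]
p_yes_given_rep = yes_rep / (yes_rep + no_rep)   # 90 / 172

yes_dem, no_dem = table["Democrat"]
p_yes_given_dem = yes_dem / (yes_dem + no_dem)   # 44 / 148

total_no = sum(no for _, no in table.values())   # 218 opposed
p_not_rep_given_no = (total_no - no_rep) / total_no   # (104 + 32) / 218

print(round(p_yes_given_rep, 3),
      round(p_yes_given_dem, 3),
      round(p_not_rep_given_no, 3))
```

The results are about .523, .297, and .624.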


Conditional Distribution:

Restrict subjects to only those that meet a condition. Within this restricted group, what is the distribution of some other var.?

Distribution of “opinion” given those that claim to be Republican (the bar “|” is read “given that”):

P( Yes | Rep. ) = 90/172 = .523

P( No | Rep. ) = 82/172 = .477

Is there a relationship between the party and the opinion on the amendment?

What would you expect to happen if no relationship existed?

Three Conditional Distributions:

P( Yes | Rep.) = .523, P( No | Rep.) = ____________

P( Yes | Demo) = .297, P( No | Demo) = ____________

P( Yes | Other) = .600, P( No | Other) = ____________

Marginal Distribution: P( Yes ) = .455, P( No ) = .545

Is there a relationship? Why? or Why not?

If there is NO relationship (i.e., independence) between the party and the opinion, then

“the three conditional probabilities should be close to each other and close to the marginal probability.”

Three Conditional Probabilities:

P( Yes | Rep.) = .523

P( Yes | Demo) = .297

P( Yes | Other) = .600

Marginal Probability: P( Yes ) = .455

Are these close to each other? AND close to the “marginal”?

Not close; therefore, “party” and the “opinion” are ____________.
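The comparison can be laid out in a few lines. This is a sketch only: the "largest gap" summary is an added convenience, not a measure defined in the slides, and the slides give no numerical cutoff for what counts as "close".

```python
# Two-way table of counts from the survey: party -> (Yes, No).
table = {
    "Republican": (90, 82),
    "Democrat": (44, 104),
    "Other": (48, 32),
}
n = sum(yes + no for yes, no in table.values())
p_yes = sum(yes for yes, _ in table.values()) / n   # marginal P(Yes)

# Conditional P(Yes | party): each row count divided by its own row total.
cond_yes = {party: yes / (yes + no) for party, (yes, no) in table.items()}

# Hypothetical summary: how far the conditionals stray from the marginal.
largest_gap = max(abs(p - p_yes) for p in cond_yes.values())

for party, p in cond_yes.items():
    print(f"P(Yes | {party}) = {p:.3f}  vs. marginal P(Yes) = {p_yes:.3f}")
print(f"largest gap from marginal: {largest_gap:.3f}")
```

The conditional proportions range from about .297 to .600 around a marginal of .455, which is the informal evidence the slide points to.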

Visual Displays

Create with “Pivot Tables” in Excel.

[Figure: Barchart- Clustered. Frequency by party (Rep., Demo., Other); legend entry: Yes]

[Figure: Barchart- Stacked. Frequency by party (Rep., Demo., Other); legend entry: Yes]

[Figure: Barchart- Percents. Percent by party (Rep., Demo., Other); legend entry: Yes]

Summary

For two categorical variables: Must use conditional probabilities to determine if a relationship exists.

Cannot use correlation.

Visual display: Stacked percentage bar charts

Associations between TWO Variables

Variables           Numerical                                             Graphical
Quant. vs. Quant.   LS regression line, r, r-sq, std error                Scatterplot, residual plots
Quant. vs. Cat.     X-bar and s for each category                         Side-by-side box plots
Cat. vs. Cat.       Two-way table, conditional & marginal distributions   Bar chart: stacked, percent.

The End