Lecture 1 Dustin Lueker


Post on 29-Jan-2016



Statistical terminology
Descriptive methods
Probability and distribution functions
Estimation (confidence intervals)
Hypothesis testing
Inferential methods for two samples
Simple linear regression and correlation

STA 291 Summer 2008 Lecture 1


Research in all fields is becoming more quantitative
  ◦ Look at research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
Computers make complex statistical methods easier to use


Many times statistics are used in an incorrect and misleading manner
Purposely misused
  ◦ Companies/people wanting to further their agenda
      Cooking the data
      Completely making up data
      Massaging the numbers
Incidentally misused
  ◦ Using inappropriate methods
      Vital to understand a method before using it


Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
Statistics are used for making informed decisions
  ◦ Business
  ◦ Government


Population
  ◦ Total set of all subjects of interest
      Entire group of people, animals, products, etc. about which we want information
Elementary Unit
  ◦ Any individual member of the population
Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population


Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
      Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
Parameter
  ◦ Numerical characteristic of the population
      Calculated using the whole population
Statistic
  ◦ Numerical characteristic of the sample
      Calculated using the sample
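The difference between a parameter and a statistic can be sketched in a few lines of Python. The population values below are invented for illustration and do not come from the lecture; the fixed seed is only there so the sketch is repeatable.

```python
import random
from statistics import mean

# Hypothetical population: ages of all 10 members of a small club (made-up values).
population_ages = [19, 20, 20, 21, 22, 23, 25, 28, 31, 41]

# Parameter: numerical characteristic calculated using the whole population.
population_mean = mean(population_ages)

# Statistic: numerical characteristic calculated using only a sample.
random.seed(291)  # arbitrary fixed seed, chosen only for repeatability
sample_ages = random.sample(population_ages, k=4)
sample_mean = mean(sample_ages)

print(f"parameter (population mean): {population_mean}")
print(f"statistic (mean of sample {sample_ages}): {sample_mean}")
```

The parameter is a single fixed number, while the statistic would change if a different sample were drawn.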


Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy
      May not be able to find every unit in the population
  ◦ Time
      Speed of response from units
  ◦ Money
  ◦ Infinite Population
  ◦ Destructive Sampling or Testing


University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “have you regretted something you did while drinking?”
What is the population? Sample?


Descriptive Statistics
  ◦ Summarizing the information in a collection of data
Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population


The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
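One way a range like "25.5% to 27.5%" can arise from a single sample percentage is sketched below. This is only an illustration under stated assumptions: it uses the textbook normal-approximation interval for a proportion with z = 1.96 (roughly 95% confidence), and the sample size `n_hypothetical` is made up, since the lecture does not say how many "FH" families were surveyed. It is not a description of how the survey report actually computed its range.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a population proportion.

    p_hat: sample proportion (a statistic)
    n:     sample size
    z:     critical value; 1.96 corresponds to about 95% confidence
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

p_hat = 0.265          # 26.5% of sampled "FH" families (from the slide)
n_hypothetical = 7500  # ASSUMPTION: invented sample size, not given in the lecture

low, high = proportion_interval(p_hat, n_hypothetical)
print(f"sample statistic: {p_hat:.1%}")
print(f"interval estimate for the population parameter: {low:.1%} to {high:.1%}")
```

With these assumed inputs the interval comes out close to the range quoted above. The sample percentage describes the sample, and the interval is a statement about the whole population.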


Univariate data
  ◦ Consists of observations on a single attribute
Multivariate data
  ◦ Consists of observations on several attributes
  ◦ Special case: Bivariate data
      Consists of observations on two attributes


Quantitative or Numerical
  ◦ Variables with numerical values associated with them
Qualitative or Categorical
  ◦ Variables without numerical values associated with them


Nominal
  ◦ Gender, nationality, hair color, state of residence
      Nominal variables have a scale of unordered categories
      It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Ordinal
  ◦ Disease status, company rating, grade in STA 291
      Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
      One unit can have more of a certain property than does another unit


Quantitative
  ◦ Age, income, height
      Quantitative variables are measured numerically; that is, for each subject a number is observed
      The scale for quantitative variables is called the interval scale


A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff?
      Yes
      No
  ◦ Ordinal (Qualitative): Plaque score
      No visible plaque
      Small amounts of plaque
      Moderate amounts of plaque
      Abundant plaque
  ◦ Interval (Quantitative): Number of teeth


A birth registry database collects the following information on newborns
  ◦ Birthweight: in grams
  ◦ Infant’s Condition:
      Excellent
      Good
      Fair
      Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background:
      African-American
      Caucasian
      Hispanic
      Native American
      Other
What are the appropriate scales?
      Quantitative (Interval)
      Qualitative (Ordinal, Nominal)


Statistical methods vary for quantitative and qualitative variables
Methods for quantitative data cannot be used to analyze qualitative data
Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
      Interval (Quantitative)
      Can be treated as Qualitative
          Ordinal: Short, Average, Tall
          Nominal: <60in or >72in, 60in-72in
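Recoding a quantitative variable into categories can be sketched as follows. The nominal grouping is the one given above; the cutoffs for Short/Average/Tall are an assumption (the slide does not state them) and simply reuse the 60in and 72in boundaries. The height values are hypothetical.

```python
def height_ordinal(height_in):
    """Recode height in inches into ordered categories: Short < Average < Tall."""
    # ASSUMPTION: cutoffs borrowed from the nominal example; the lecture gives none.
    if height_in < 60:
        return "Short"
    if height_in <= 72:
        return "Average"
    return "Tall"

def height_nominal(height_in):
    """Recode height in inches into the two unordered groups from the slide."""
    return "60in-72in" if 60 <= height_in <= 72 else "<60in or >72in"

heights_in = [58.5, 64.0, 70.2, 74.1]  # hypothetical measurements in inches
for h in heights_in:
    print(h, height_ordinal(h), height_nominal(h))
```

Going from the measured value to a category is always possible, but the original measurement cannot be recovered from the category, which is why detailed measurement is preferable.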


Try to measure variables in as much detail as possible
  ◦ Quantitative
      More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
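The GPA caution can be made concrete with a small sketch. The values A = 4.0 and B = 3.0 appear earlier in the lecture; the remaining grade points and the list of grades are assumptions added for illustration.

```python
from statistics import mean

# A = 4.0 and B = 3.0 are from the lecture; C, D, and E values are ASSUMED
# here to complete a conventional 4-point scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

grades = ["A", "B", "B", "C"]  # hypothetical grades for one student

# Averaging treats the ordinal grades as if they were quantitative.
gpa = mean(GRADE_POINTS[g] for g in grades)
print(f"GPA: {gpa}")
```

The average is only meaningful if the step from A to B is taken to be the same size as the step from B to C, which the ordinal scale itself does not guarantee.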


A variable is discrete if it can take on a finite (or countable) number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
Qualitative variables are discrete


Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
      43 minutes
      2 minutes
      27.487 minutes
      27.48682 minutes
      Can be subdivided into more accurate values
      Therefore continuous


Discrete or continuous?
  ◦ Number of children in a family
  ◦ Distance a car travels on a tank of gas
  ◦ % grade on an exam


Quantitative variables can be discrete or continuous
Age, income, height?
  ◦ Depends on the scale
      Age is potentially continuous, but usually measured in years (discrete)


Simple random sample (SRS)
  ◦ Each possible sample of a given size has the same probability of being selected
The sample size is usually denoted by n


Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
  ◦ 6 possible samples of size 2
      A,B  A,C  A,D  B,C  B,D  C,D
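The six possible samples can be listed mechanically. The sketch below uses Python's standard `itertools.combinations`; the variable names are illustrative only.

```python
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2  # sample size

# Every unordered selection of n distinct students is one possible sample.
samples = list(combinations(population, n))

for sample in samples:
    print(sample)
print("number of possible samples:", len(samples))
```

The count agrees with the formula for choosing 2 out of 4: 4! / (2! · 2!) = 6.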


Each of the six possible samples has to have the same probability of being selected
  ◦ How could we do this?
      Roll a die
      Random number generator
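Both suggestions amount to picking one of the six samples with probability 1/6 each. A minimal sketch, assuming a simulated six-sided die in which face k selects the k-th sample; the seed value and the number of repetitions are arbitrary choices for illustration.

```python
import random
from collections import Counter
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
possible_samples = list(combinations(population, 2))  # the 6 samples of size 2

rng = random.Random(291)  # arbitrary fixed seed so the sketch is repeatable

# "Roll a die": faces 1..6 map onto the six possible samples.
die_roll = rng.randint(1, 6)
chosen = possible_samples[die_roll - 1]
print("die shows", die_roll, "so the sample is", chosen)

# Rough check of equal probability: repeat the selection many times
# and count how often each sample comes up.
counts = Counter(
    possible_samples[rng.randint(1, 6) - 1] for _ in range(60_000)
)
for sample, count in sorted(counts.items()):
    print(sample, count)
```

Each sample should appear roughly 10,000 times out of 60,000, which is what equal selection probability looks like in practice.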


Convenience sample
  ◦ Selecting subjects that are easily accessible to you
Volunteer sample
  ◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
  ◦ Proper representation of the population
  ◦ Bias
Examples
      Mall interview
      Street corner interview
