fundamentals of statistical analysis

Fundamentals of Statistical Analysis DR. SUREJ P JOHN

Post on 24-Feb-2016

TRANSCRIPT

Page 1: Fundamentals of Statistical Analysis

Fundamentals of Statistical Analysis

DR. SUREJ P JOHN

Page 2: Fundamentals of Statistical Analysis

Definition of Variables

A variable is an attribute of a person or an object that varies.

Measurement is a set of rules for assigning numbers to objects to represent quantities of attributes.

Page 3: Fundamentals of Statistical Analysis

Definition

A datum is one observation about the variable being measured.

Data are a collection of observations.

A population consists of all subjects about whom the study is being conducted.

A sample is a subgroup of the population that is being examined.

Page 4: Fundamentals of Statistical Analysis

What Is Statistics?

Statistics is the science of describing or making inferences about the world from a sample of data.

Descriptive statistics are numerical summaries that organize, summarize, and present the data.

Inferential statistics is the process of inferring from a sample to the population.

Page 5: Fundamentals of Statistical Analysis

Five Types of Statistical Analysis

1. Descriptive analysis – data distribution

2. Inferential analysis – hypothesis testing

3. Differences analysis – hypothesis testing

4. Association analysis – correlation

5. Predictive analysis – regression

Page 6: Fundamentals of Statistical Analysis

Descriptive vs. Inferential Statistics

A Hypothesis:

A statement relating to an observation that may be true but for which a proof (or disproof) has not been found.

The results of a well-designed experiment or data collection may lead to the proof or disproof of a hypothesis.

Page 7: Fundamentals of Statistical Analysis

Inferential Statistics

[Diagram: Population, Samples, Sub-samples]

Page 8: Fundamentals of Statistical Analysis

For example, consider the heights (H) of males vs. females at age 25. Our observation: male H > female H; this may be linked to genetics, consumption, exercise, etc.

Is it true that male H > female H? i.e. Null hypothesis: male H ≤ female H

Scenario I: Randomly select 1 person from each sex. Male: 170; Female: 175

Then, is female H > male H?

Scenario II: Randomly select 3 persons from each sex. Male: 171, 163, 168; Female: 160, 172, 173

What is your conclusion then? Which is the better scenario?
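As a rough check on Scenario II, the two sample means can be computed directly. This is a minimal Python sketch that uses only the numbers given on the slide; the variable names are illustrative.

```python
from statistics import mean

# Heights from Scenario II (3 randomly selected persons per sex)
male = [171, 163, 168]
female = [160, 172, 173]

male_mean = mean(male)
female_mean = mean(female)

print(f"male mean:   {male_mean:.2f}")
print(f"female mean: {female_mean:.2f}")
print(f"difference (male - female): {male_mean - female_mean:.2f}")
```

With only three people per group, the difference between the means is small compared with the spread inside each group, so neither scenario settles the question on its own.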

Page 9: Fundamentals of Statistical Analysis

Important messages here:

(1) Sample size is very important and will affect your conclusion

(2) Measurement results vary among samples (or subjects) – that is “variation” or “uncertainty”.

(3) Variation can be due to measurement errors (random or systematic) and to inherent variation within samples. For example, at age 20, female height varies from 158 to 189 cm. Why?

(4) Therefore, in Statistics, we always deal with distributions of data rather than a single point of measurement or event.

Page 10: Fundamentals of Statistical Analysis

[Figure: probability density curve of height. x-axis: Height (cm), 140 to 190; y-axis: Probability density, 0.00 to 0.10]

Page 11: Fundamentals of Statistical Analysis

Moments of a Normal Distribution

Each moment measures a different dimension of the distribution.

1. Mean (1st moment)

2. Standard deviation (square root of the variance, the 2nd central moment)

3. Skewness (3rd moment)

4. Kurtosis (4th moment)

Page 12: Fundamentals of Statistical Analysis

Mean

Mean (µ) is equal to the sum of the n observations divided by the number of observations (sample size).

Mean = Sum of values / n = $\sum_{i=1}^{n} X_i / n$

e.g. length of 8 fish larvae at day 3 after hatching:

0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

mean length = (0.6+0.7+1.2+1.5+1.7+2.0+2.2+2.5)/8

= 1.55 mm
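The same calculation can be reproduced in a few lines of Python. This sketch computes the mean both from the formula and with the standard library's `statistics.mean`; the variable names are illustrative.

```python
from statistics import mean

# Lengths (mm) of the 8 fish larvae at day 3 after hatching
lengths_mm = [0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5]

n = len(lengths_mm)
mean_by_formula = sum(lengths_mm) / n   # sum of values / n
mean_by_library = mean(lengths_mm)      # same quantity via the standard library

print(round(mean_by_formula, 2), round(mean_by_library, 2))
```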

[Figure: axis from 0 to 4.0 mm with the mean marked]

Page 13: Fundamentals of Statistical Analysis

Standard deviation

The standard deviation (SD), represented by the Greek letter sigma (σ), shows how much variation or dispersion from the average exists.

A low standard deviation indicates that the data points tend to be very close to the mean (also called expected value); a high standard deviation indicates that the data points are spread out over a large range of values.

The SD is the square root of the variance. The variance is defined as the average of the squared differences from the mean.
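The definition can be followed step by step in Python. This is a minimal sketch that reuses the fish larvae lengths from the mean example as input (an assumption made here for illustration; the slides do not compute their SD). It divides by n, as the definition above states, which gives the population SD.

```python
from math import sqrt
from statistics import pstdev

# Fish larvae lengths (mm) from the mean example
lengths_mm = [0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5]

n = len(lengths_mm)
mean_length = sum(lengths_mm) / n

# Variance: the average of the squared differences from the mean
squared_diffs = [(x - mean_length) ** 2 for x in lengths_mm]
variance = sum(squared_diffs) / n

# Standard deviation: the square root of the variance
sd = sqrt(variance)

print(f"variance = {variance:.4f} mm^2, SD = {sd:.4f} mm")
print(f"statistics.pstdev gives {pstdev(lengths_mm):.4f} mm")
```

When the data are a sample used to estimate a population SD, the sum of squared differences is commonly divided by n - 1 instead of n; `statistics.stdev` does this.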

Page 14: Fundamentals of Statistical Analysis

Standard deviation

Page 15: Fundamentals of Statistical Analysis

Calculate SD?

Page 16: Fundamentals of Statistical Analysis

Skewness

In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined.
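One common way to compute sample skewness is the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes that formula (the slides do not name one), and the three small data sets are hypothetical, chosen only to show the sign of the result.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5.

    Assumes the values are not all equal; otherwise m2 is zero and
    the ratio is undefined.
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # 3rd central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]       # long tail to the right
left_tailed = [-10, -3, -2, -2, -1, -1]  # long tail to the left

for name, data in [("symmetric", symmetric),
                   ("right-tailed", right_tailed),
                   ("left-tailed", left_tailed)]:
    print(f"{name}: skewness = {skewness(data):.3f}")
```

A symmetric data set gives a skewness of zero, a long right tail gives a positive value, and a long left tail gives a negative value.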

Page 17: Fundamentals of Statistical Analysis

Kurtosis

The coefficient of kurtosis measures the degree of peakedness or flatness of a variable's distribution.

[Figure: three distributions, labelled Kurtosis < 0, Kurtosis = 0, Kurtosis > 0]
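Since the reference value on the slide is 0, the coefficient appears to be excess kurtosis, which is 0 for a normal distribution. The sketch below assumes the moment-based formula m4 / m2² - 3; the two data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3.

    Assumes the values are not all equal (m2 must be non-zero).
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # 2nd central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # 4th central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data sets
flat = [1, 2, 3, 4, 5, 6, 7, 8]      # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 10]   # most values at the centre, two far out

print(f"flat:   {excess_kurtosis(flat):.3f}")
print(f"peaked: {excess_kurtosis(peaked):.3f}")
```

The evenly spread data give a negative value and the centre-heavy data with outlying points give a positive value.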

Page 18: Fundamentals of Statistical Analysis

Frequency Distribution

In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample.

Frequency distribution tables can be used for both categorical and numeric variables.

Page 19: Fundamentals of Statistical Analysis

Table 1. Frequency table for the number of cars registered in each household

Number of cars (x)    Frequency (f)
0                     4
1                     6
2                     5
3                     3
4                     2
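A frequency table like Table 1 can be built from raw observations with `collections.Counter`. The raw list below is hypothetical: the slides give only the final counts, so these 20 household values were made up to reproduce them.

```python
from collections import Counter

# Hypothetical raw data: number of cars registered in each of 20 households,
# constructed so that the counts match Table 1
cars_per_household = [1, 0, 2, 1, 3, 2, 1, 0, 4, 2,
                      1, 3, 0, 2, 1, 4, 2, 1, 3, 0]

frequency = Counter(cars_per_household)

print("Number of cars (x)  Frequency (f)")
for x in sorted(frequency):
    print(f"{x:<20}{frequency[x]}")
```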

Page 20: Fundamentals of Statistical Analysis

Cross Tabulation

A cross-tabulation (or cross-tab for short) is a display of data that shows how many cases in each category of one variable are divided among the categories of one or more additional variables.

In a cross-tab, a cell is a combination of two or more characteristics, one from each variable.

If one variable has two categories and the second variable has four categories, for instance, the cross-tab will have 8 cells (2 × 4), each with a count specific to that combination of categories.

Page 21: Fundamentals of Statistical Analysis

Sample #    Gender    Handedness
1           Female    Right-handed
2           Male      Left-handed
3           Female    Right-handed
4           Male      Right-handed
5           Male      Left-handed
6           Male      Right-handed
7           Female    Right-handed
8           Female    Left-handed
9           Male      Right-handed
10          Female    Right-handed

Page 22: Fundamentals of Statistical Analysis

           Left-handed    Right-handed    Total
Males      2              3               5
Females    1              4               5
Total      3              7               10
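The cross-tab can be derived from the ten raw observations by counting each (gender, handedness) pair. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# The ten observations from the sample table, in order
observations = [
    ("Female", "Right-handed"), ("Male", "Left-handed"),
    ("Female", "Right-handed"), ("Male", "Right-handed"),
    ("Male", "Left-handed"),    ("Male", "Right-handed"),
    ("Female", "Right-handed"), ("Female", "Left-handed"),
    ("Male", "Right-handed"),   ("Female", "Right-handed"),
]

cells = Counter(observations)                                  # one count per cell
row_totals = Counter(gender for gender, _ in observations)     # totals per gender
col_totals = Counter(hand for _, hand in observations)         # totals per handedness

for gender in ("Male", "Female"):
    left = cells[(gender, "Left-handed")]
    right = cells[(gender, "Right-handed")]
    print(f"{gender:<8}{left:<4}{right:<4}{row_totals[gender]}")
print(f"{'Total':<8}{col_totals['Left-handed']:<4}"
      f"{col_totals['Right-handed']:<4}{len(observations)}")
```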

Page 23: Fundamentals of Statistical Analysis

Comparing Means

In inferential statistics, we often need to compare the means of groups.

T-tests and ANOVA (Analysis of Variance) are the methods commonly used for comparing means.

Independent T-tests

Independent t-tests are used for testing the difference between the means of two independent groups. For an independent t-test, there should be only one independent variable, with exactly two levels, and only one dependent variable.

Ex: gender (male and female)

How do male and female students differ in academic performance?
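To make the idea concrete, the sketch below computes the t statistic for an independent t-test, assuming the pooled-variance (equal variances) form. The slides give no data for the academic performance example, so the input here is the Scenario II height data from earlier in the deck; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming equal population variances."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2

# Scenario II heights: independent variable = sex (two levels),
# dependent variable = height
male = [171, 163, 168]
female = [160, 172, 173]

t, df = pooled_t_statistic(male, female)
print(f"t = {t:.3f}, df = {df}")

# Two-sided critical value of t for df = 4 at the 5% level (from a t table)
T_CRITICAL = 2.776
print("reject Ho" if abs(t) > T_CRITICAL else "do not reject Ho")
```

Here |t| is far below the critical value, so these six measurements give no grounds to reject a null hypothesis of equal mean heights.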

Page 24: Fundamentals of Statistical Analysis

ANOVA (Analysis of Variance)

ANOVA is used as an extension of the independent t-test.

It is used when the researcher is interested in whether the means from several (> 2) independent groups differ.

For ANOVA, there should be only one dependent variable and only ONE independent variable (but, unlike in independent t-tests, it can have many levels).
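The F statistic of a one-way ANOVA compares the variation between group means with the variation inside the groups. The following is a minimal sketch of that calculation; the three groups of scores are hypothetical and the function name is illustrative.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of levels of the independent variable
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical scores for three independent groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_statistic = one_way_anova_f(groups)
print(f"F = {f_statistic:.2f}")
```

A large F means the group means differ by more than the within-group scatter would suggest; the value is then compared with an F distribution with (k - 1, n - k) degrees of freedom.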

Page 25: Fundamentals of Statistical Analysis

Statistical errors in hypothesis testing

Page 26: Fundamentals of Statistical Analysis

Statistical Errors in Hypothesis Testing

Consider court judgments where the accused is presumed innocent until proved guilty beyond reasonable doubt (i.e. Ho = innocent).

                              If the accused is innocent (Ho is true)    If the accused is guilty (Ho is false)
Court's decision: Guilty      Wrong judgement                            OK
Court's decision: Innocent    OK                                         Wrong judgement

Page 27: Fundamentals of Statistical Analysis

Statistical Errors in Hypothesis Testing

Similar to court judgments, when testing a null hypothesis in statistics we are subject to similar kinds of errors:

                          If Ho is true    If Ho is false
If Ho is rejected         Type I error     No error
If Ho is not rejected     No error         Type II error
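Both error types can be observed in a small simulation. The sketch below is illustrative only: it assumes a two-sided z-test with known standard deviation, a 5% significance level, and made-up settings (sample size 20, true mean 0.5 when Ho is false). None of these values come from the slides.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of Ho: population mean == mu0, with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, trials=2000, seed=0):
    """Fraction of simulated samples for which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

# Ho is true (true mean equals mu0): every rejection is a Type I error
type_i_rate = rejection_rate(true_mu=0.0)

# Ho is false (true mean is 0.5): every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_mu=0.5)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate should land near the chosen significance level, while the Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.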