lesson 2_statistical concepts

Copyright 2014, Simplilearn, All rights reserved. Lesson 2 Statistical Concepts And Their Applications In Business

Upload: pragativbora

Post on 24-Dec-2015


Page 1

Lesson 2

Statistical Concepts And Their Applications In Business

Page 2

Agenda

After completing this lesson, you will be able to understand:

• Statistical Methods overview

• Population and Samples

• Developing a sampling plan and Sampling Methods

• What is Descriptive Statistics

• What are its components

• Business usage of Descriptive Statistics via a Case Study

• Probability theory and distributions

• Confidence Interval

• The concept of tests of significance

• One sided and two sided hypothesis testing

• The various tests of significance

• Non parametric testing

Page 3

Statistical Methods

Descriptive Statistics

• Sample

• Measure of Central Tendency

• Measure of Dispersion

Inferential Statistics

• Population

• Estimation

• Hypothesis Testing

• Statistics is a branch of applied (business) mathematics used to estimate the present and predict the future.

Page 4

Population and Samples

• A population is any entire collection of objects or observations from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about.

• For each population there are many possible samples.

• It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.

• A sample is a group of units selected from a larger group (the population). By studying the sample, we hope to draw valid conclusions about the larger group.

• A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling.

Page 5

Developing a sampling plan

• Define the target population – in terms of its elements, sampling units, extent and time.

• Select a sampling method – probability or non-probability sampling.

• Obtain the sampling frame – the list of units from which the sample is drawn; it should cover the whole target population.

• Determine the sample size – large enough for the desired level of accuracy.

• Choose the data collection method – the procedure used to obtain the data.

• Develop an operational plan – decide which techniques fit best and how they will be carried out.

• Execute the operational plan – carry out the specified procedure and verify that it is followed.

Page 6

Sampling techniques

Sampling methods fall into two groups:

• Probability sampling: Simple Random, Systematic, Stratified, Cluster

• Non-Probability sampling: Convenience, Judgmental, Quota, Snowball
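As a rough illustration (not part of the original lesson), three of the probability methods can be sketched in Python with the standard random module. The customer IDs, the two regional strata and the helper names simple_random_sample, systematic_sample and stratified_sample are all hypothetical. The stratified helper assumes proportional allocation, meaning the same sampling fraction is used in every stratum.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every unit has the same chance of selection (drawn without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Random start, then every k-th unit, where k = population size // n."""
    k = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(strata, fraction, seed=None):
    """Simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


customers = list(range(1, 101))                               # hypothetical frame of 100 IDs
regions = {"north": customers[:60], "south": customers[60:]}  # hypothetical strata

print(simple_random_sample(customers, 5, seed=1))
print(systematic_sample(customers, 5, seed=1))
print(stratified_sample(regions, 0.1, seed=1))
```

Fixing the seed makes the draws reproducible. In a real survey, the sampling frame from the plan would replace the hypothetical ID list.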

Page 7

Descriptive Statistics

● Help describe, show and summarize data in a meaningful manner

● Non-conclusive, as they are limited to the data being analysed

Score Range | Number of Students
Below 40 | 20
40-50 | 22
50-60 | 33
60-70 | 21
70-80 | 13
>80 | 5
Total | 114

[Figure: bar chart of Number of Students by Score Range]
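A minimal Python sketch, assuming the score-range table is all the data available. It recomputes the total and each range's share of students, and prints a crude text bar chart in place of the figure.

```python
# Frequency table from the slide: score range -> number of students
freq = {
    "Below 40": 20,
    "40-50": 22,
    "50-60": 33,
    "60-70": 21,
    "70-80": 13,
    ">80": 5,
}

total = sum(freq.values())
print(f"Total students: {total}")
for score_range, count in freq.items():
    share = count / total
    # One '#' per student stands in for the bar chart
    print(f"{score_range:>8} {count:3d} {share:6.1%}  " + "#" * count)

most_common_range = max(freq, key=freq.get)
print(f"Most common score range: {most_common_range}")
```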

Page 8

Measure of Central Tendency

• Mean

• Median

• Mode

● Describe a set of data with a single value that represents its center

● Also called measures of central location

Page 9

Mean

● The mean is the arithmetic average of the numbers: their sum divided by how many there are

● It is a calculated "central" value of a set of numbers

Page 10

Median

● The median is the middle number when the values are sorted in order

● The number of values above and below the median is the same; with an even number of values, the median is the average of the two middle values

Page 11

Mode

● The mode is the value that occurs most often

● A set of data can have more than one mode

[Figure: frequency chart illustrating the mode (y-axis: Frequency)]
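The three measures can be checked with Python's standard statistics module. The data values below are hypothetical. multimode (Python 3.8+) returns every most-frequent value, which covers data sets with more than one mode.

```python
import statistics

data = [4, 8, 6, 8, 2, 8, 6]          # hypothetical values

mean = statistics.mean(data)          # sum / count = 42 / 7
median = statistics.median(data)      # middle of 2, 4, 6, 6, 8, 8, 8
modes = statistics.multimode(data)    # most frequent value(s)
print(mean, median, modes)

# A data set can have more than one mode
print(statistics.multimode([1, 1, 2, 2, 3]))
```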

Page 12

When to use what?

• Mean:

• The average is required

• The variable is continuous or discrete

• Median:

• The variable is discrete

• There are extreme values (outliers) or the data are non-normal

• The characteristic under study is qualitative but can be ordered

• Mode:

• The variable is discrete

• There are extreme values (outliers)

• The characteristic under study is qualitative

Page 13

Measure of Dispersion

• Variance

• Standard Deviation

● The spread or dispersion of a set of scores around some central value

● Describes the amount of heterogeneity or variation within a distribution of scores

Page 14

Variance and Standard Deviation

● Variance is the average of the squared deviations about the mean

● Standard deviation is the square root of the variance

Example data: 2, 5, 4, 6, 8

● n = 5

● Mean = (2 + 5 + 4 + 6 + 8)/5 = 5

● Variance: $\sigma^2 = \frac{(2-5)^2 + (5-5)^2 + (4-5)^2 + (6-5)^2 + (8-5)^2}{5} = \frac{20}{5} = 4$

● Standard Deviation: $\sigma = \sqrt{4} = 2$

● Dividing by n gives the population variance; the sample variance divides by n − 1 instead (here 20/4 = 5).
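Here is a short sketch on the slide's example data (2, 5, 4, 6, 8), first computing the variance by hand and then with Python's standard statistics module. pvariance and pstdev divide by n, matching the slide's 20/5 = 4. variance divides by n − 1 (the sample variance), which is why it returns 5 rather than 4.

```python
import statistics

data = [2, 5, 4, 6, 8]                    # example data from the slide

mean = statistics.mean(data)              # 25 / 5 = 5
squared_devs = [(x - mean) ** 2 for x in data]
manual_variance = sum(squared_devs) / len(data)   # 20 / 5

pop_variance = statistics.pvariance(data)    # divides by n
sample_variance = statistics.variance(data)  # divides by n - 1
pop_sd = statistics.pstdev(data)             # square root of pop_variance

print(manual_variance, pop_variance, sample_variance, pop_sd)
```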

Page 15

Case Study – Descriptive Statistics

Business Case: A telecommunications company maintains a customer database that includes, among other things, information on how much each customer spent on long distance, toll-free, equipment rental, calling card, and wireless services in the previous month.

The telecom company surveyed 1000 of its customers on all the above services.

Use descriptive analysis of customer spending to determine which services are the most profitable.

Page 16

Case Study – Descriptive Statistics (Contd.)

Service | N | Valid N | Min | Max | Mean | Standard Deviation
Long distance last month | 1000 | 1000 | 0.90 | 99.95 | 11.72 | 10.36
Toll free last month | 1000 | 475 | 0.00 | 173.00 | 13.27 | 16.90
Equipment last month | 1000 | 386 | 0.00 | 77.70 | 14.21 | 19.07
Calling card last month | 1000 | 678 | 0.00 | 109.25 | 13.78 | 14.08
Wireless last month | 1000 | 296 | 0.00 | 111.95 | 11.58 | 19.72

• On average, customers spend the most on equipment rental, but there is a lot of variation in the amount spent.

• Customers with calling card service spend only slightly less, on average, than equipment rental customers, and there is much less variation in the values.

• The real problem here is that most customers don't have every service, so many 0s are being counted. One solution is to treat 0s as missing values, so that the analysis for each service becomes conditional on having that service.

Page 17

Probability Theory

[Image: coin showing HEAD and TAIL]

• Probability is a branch of mathematics that deals with the uncertainty of an event happening in the future.

• A probability always lies in the range 0 to 1.

• For equally likely outcomes, the probability of an event is $P(E) = \frac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}$

Page 18

Assigning Probabilities

• Classical method – based on equally likely outcomes.

E.g.: Rolling a die.

• Relative frequency method – based on experimentation or historical data.

E.g.: A car agency has 5 cars. Its records for the past 60 days, shown in the table below, give the number of cars used per day.

No. of cars used | No. of days | Probability
0 | 3 | 3/60 = 0.05
1 | 10 | 10/60 = 0.17
2 | 16 | 16/60 = 0.27
3 | 15 | 15/60 = 0.25
4 | 9 | 9/60 = 0.15
5 | 7 | 7/60 = 0.12

• Subjective method – based on judgment.

E.g.: A 75% chance that England will adopt the euro currency by 2020.
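Both the classical and the relative frequency methods reduce to counting. Below is a minimal Python sketch. The classical case assumes a fair six-sided die and uses the event "an even number" (both are assumptions for illustration). The relative frequency case uses the car agency's table.

```python
from fractions import Fraction

# Classical method: one roll of a fair die, outcomes equally likely (assumed)
faces = [1, 2, 3, 4, 5, 6]
favorable = [f for f in faces if f % 2 == 0]          # event: an even number
p_even = Fraction(len(favorable), len(faces))
print("P(even) =", p_even)

# Relative frequency method: cars used per day -> number of days (from the table)
days = {0: 3, 1: 10, 2: 16, 3: 15, 4: 9, 5: 7}
total_days = sum(days.values())
probabilities = {cars: count / total_days for cars, count in days.items()}
for cars, p in probabilities.items():
    print(f"{cars} cars used: {p:.2f}")
```

Rounded to two decimals, 7/60 is 0.12. The exact relative frequencies sum to 1.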

Page 19

Probability Distribution

• The probability distribution of a random variable describes how probabilities are distributed over the values of that random variable.

• It is defined by a probability function f(x), which gives the probability of each value.

• E.g.: Suppose we have data on daily AC sales over the last 300 days.

Units sold | No. of days | Probability of units sold, f(x)
0 | 10 | 0.03
1 | 55 | 0.18
2 | 150 | 0.50
3 | 55 | 0.18
4 | 25 | 0.08
5 | 5 | 0.02

[Figure: bar chart of the probability of units sold, f(x)]
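A sketch in Python that builds f(x) from the AC sales counts. It also checks the two properties any discrete probability function must have: every f(x) ≥ 0, and the values sum to 1. The final P(X ≥ 3) line is an extra illustration of reading a probability off the distribution, not a figure from the original slide.

```python
# Units of AC sold per day -> number of days, over 300 days (from the table)
days_by_units = {0: 10, 1: 55, 2: 150, 3: 55, 4: 25, 5: 5}
n_days = sum(days_by_units.values())

# f(x) = relative frequency of each value x
f = {x: days / n_days for x, days in days_by_units.items()}

# Conditions for a valid discrete probability function
assert all(p >= 0 for p in f.values())
assert abs(sum(f.values()) - 1.0) < 1e-12

for x, p in f.items():
    print(x, round(p, 2))

# Example use: probability of selling 3 or more units on a given day
p_three_or_more = sum(p for x, p in f.items() if x >= 3)
print(round(p_three_or_more, 3))
```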

Page 20

Binomial Distribution

• A discrete probability distribution of the number of successes in a series of trials

• The following conditions should be satisfied:

• A fixed number of trials

• Each trial has only two possible outcomes (success or failure)

• Each trial is independent of the others

• The probability of each outcome remains constant from trial to trial

• Examples

• Tossing a coin 10 times and counting the occurrences of heads

• Surveying 100 people to find out whether or not they watch television

• Rolling a die repeatedly and counting the occurrences of a 2

Page 21

Case Study—Binomial Distribution

Example of binomial distribution: Amir buys a chocolate bar every day during a promotion that says one out of six chocolate bars has a gift coupon inside. Answer the following questions:

• What is the distribution of the number of chocolates with gift coupons in seven days?

• What is the probability that Amir gets no chocolates with gift coupons in seven days?

• Amir gets no gift coupons for the first six days of the week. What is the chance that he will get one on the seventh day?

• Amir buys a bar every day for six weeks. What is the probability that he gets at least three gift coupons?

• How many days of purchase are required so that Amir’s chance of getting at least one gift coupon is 0.95 or greater?

(Assume that the conditions of the binomial distribution apply: the outcomes of Amir’s purchases are independent, and the population of chocolate bars is effectively infinite.)

Page 22

Case Study—Binomial Distribution (contd.)

Steps:

Formula: $P(X = r) = \binom{n}{r} p^{r} q^{n-r}$

where n is the number of trials, r is the number of successful outcomes, p is the probability of success, q is the probability of failure, and $\binom{n}{r}$ (nCr) is the number of ways of choosing r items from n.

Other important formulae include

$p + q = 1$, hence $q = 1 - p$

Here one bar in six has a coupon, so $p = 1/6$ and $q = 5/6$.

Page 23

Case Study—Binomial Distribution (contd.)

1. Distribution of the number of chocolates with gift coupons in 7 days: $P(X = r) = \binom{7}{r} (1/6)^{r} (5/6)^{7-r}$, for r = 0, 1, …, 7

2. Probability of getting no coupons in 7 days: $P(X = 0) = (5/6)^{7} \approx 0.279$

3. Probability of getting a coupon on the 7th day: 1/6 (the days are independent, so the six days without a coupon do not change it)

4. Probability of getting at least 3 gift coupons in six weeks (n = 42 days):

$P(X \ge 3) = 1 - P(X \le 2) = 1 - [P(X=0) + P(X=1) + P(X=2)] = 1 - (0.0005 + 0.0040 + 0.0163) = 0.979$

5. Number of purchase days required so that the probability of at least one coupon is 0.95 or greater:

$P(X \ge 1) = 1 - P(X = 0) = 1 - (5/6)^{n} \ge 0.95 \Rightarrow (5/6)^{n} \le 0.05 \Rightarrow n \ge \frac{\ln 0.05}{\ln(5/6)} \approx 16.43$

So Amir needs 17 days.
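The answers can be reproduced with a small binomial helper. This is a sketch, not the course's own solution. binom_pmf is a hypothetical helper name, math.comb needs Python 3.8+, and six weeks is taken as 42 daily purchases.

```python
import math


def binom_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


p = 1 / 6   # one bar in six has a gift coupon

# 1. Distribution of coupons in 7 days: P(X = r) for r = 0..7
dist_7 = [binom_pmf(r, 7, p) for r in range(8)]

# 2. No coupons in 7 days
p_none_7 = binom_pmf(0, 7, p)

# 3. Coupon on day 7 after six empty days: trials are independent, so just p
p_day_7 = p

# 4. At least 3 coupons in six weeks (42 purchases)
p_at_least_3 = 1 - sum(binom_pmf(r, 42, p) for r in range(3))

# 5. Smallest n with 1 - (5/6)^n >= 0.95, i.e. n >= ln(0.05) / ln(5/6)
n_needed = math.ceil(math.log(0.05) / math.log(1 - p))

print(round(p_none_7, 3), round(p_at_least_3, 3), n_needed)
```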

Page 24

Normal Distribution

• A theoretical (continuous) model of the whole population

• Bell-shaped: centered around the mean and symmetrical on both sides

• Standard normal distribution – mean 0 and standard deviation 1
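Normal probabilities are available in Python's standard statistics.NormalDist (Python 3.8+). The height distribution below (mean 69 inches, SD 3 inches) is hypothetical. It only illustrates converting a value to a z-score on the standard normal distribution.

```python
from statistics import NormalDist

z_dist = NormalDist(mu=0, sigma=1)      # standard normal distribution

# Symmetry: half the area lies below the mean
print(z_dist.cdf(0))

# About 95% of the area lies within 1.96 SD of the mean
print(round(z_dist.cdf(1.96) - z_dist.cdf(-1.96), 3))

# Standardizing a value from a hypothetical normal population
heights = NormalDist(mu=69, sigma=3)    # hypothetical: inches
x = 72
z = (x - heights.mean) / heights.stdev  # (72 - 69) / 3 = 1.0
print(z, round(heights.cdf(x), 4))
```

Because any normal variable can be standardized this way, one table (or one function) for the standard normal covers every normal distribution.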

Page 25

Poisson distribution

• A discrete probability distribution for the number of events that happen randomly in a given interval of time (or region)

• The following conditions need to be satisfied:

• The event results in a success or failure

• The average number of successes, μ, in the interval is known

• The probability of success is proportional to the size of the region/time interval

• The probability of success in an extremely small region/time interval is almost zero

• Properties: The mean and variance are equal, and both are denoted by μ.

• Examples

• A company sells an average of 5 houses per day. What is the probability that exactly 4 houses will be sold tomorrow?

• The average number of births in a hospital is 2.1 per hour. What is the probability that there will be exactly 6 births in the next two hours?
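The Poisson probability of exactly k events is $P(X = k) = e^{-\mu}\mu^{k}/k!$. This formula is not stated on the slide, so treat the sketch below as a supplement. It answers the two example questions, scaling the birth rate to μ = 2.1 × 2 = 4.2 for the two-hour window. It also checks numerically that the mean and variance both equal μ (the sums are truncated at k = 100, which is an approximation).

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Houses: mean 5 per day, P(exactly 4 tomorrow)
p_houses = poisson_pmf(4, 5)

# Births: 2.1 per hour, so the mean over two hours is 4.2; P(exactly 6)
p_births = poisson_pmf(6, 2.1 * 2)

print(round(p_houses, 3), round(p_births, 3))

# Mean and variance are both mu (checked numerically, truncating at k = 100)
mu = 5
mean = sum(k * poisson_pmf(k, mu) for k in range(100))
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in range(100))
print(round(mean, 6), round(variance, 6))
```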

Page 26

Skewness and Kurtosis

• Skewness – a measure of deviation from symmetry

• Related to the difference between the mean and the median

• A distribution can be right or left skewed

• Negative skewness – longer tail of values on the left (left skewed); the mean is usually below the median

• Positive skewness – longer tail of values on the right (right skewed); the mean is usually above the median

• Kurtosis – a measure of the peakedness of the distribution

• High kurtosis – tall peak, rapid decline in the tails

• Low kurtosis – flat peak, gradual decline in the tails

• Extreme case – uniform distribution
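One common way to compute these two measures uses the second, third and fourth central moments. The sketch below applies those population-moment formulas and reports excess kurtosis (kurtosis minus 3), so a normal distribution scores 0. Statistical packages such as SPSS may apply small-sample corrections, so their values can differ slightly. shape_measures is a hypothetical helper name, and the data sets are hypothetical.

```python
def shape_measures(data):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal distribution
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]          # hypothetical, evenly spread
right_skewed = [1, 1, 1, 2, 10]      # hypothetical, one large value in the right tail

print(shape_measures(symmetric))     # skewness 0, negative excess kurtosis (flat)
print(shape_measures(right_skewed))  # positive skewness
```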

Page 27

Case Study – Skewness and Kurtosis

Service | N | Skewness (Statistic) | Skewness (Std. Error) | Kurtosis (Statistic) | Kurtosis (Std. Error)
Long distance last month | 1000 | 2.966 | 0.077 | 14.012 | 0.155
Toll free last month | 475 | 3.465 | 0.112 | 26.735 | 0.224
Equipment last month | 386 | 0.756 | 0.124 | 0.641 | 0.248
Calling card last month | 678 | 2.150 | 0.094 | 7.572 | 0.187
Wireless last month | 296 | 1.359 | 0.142 | 3.079 | 0.282

• Equipment last month has by far the lowest skewness (0.756) and kurtosis (0.641). Its spending is therefore the closest to a symmetric distribution, with fewer extreme values than the other services.

• Conclusion – Together with its highest average spend, this suggests that equipment is the segment where the telecom company gets the most consistent profits, and where it can invest more.

Page 28

Confidence interval

• A confidence interval is a range of values, computed from sample information, that is likely to include an unknown population parameter.

• Suppose random samples are drawn repeatedly from the population and an interval is calculated from each sample in the same way. A certain percentage of these intervals will contain the unknown parameter.

• For example, if 95% of the intervals calculated in this fashion contain the true value of the parameter, each interval is called a 95% confidence interval for that parameter.

• Data Requirements

• Confidence level

• Statistic

• Margin of error

• Range of the confidence interval = sample statistic ± margin of error.

• The uncertainty associated with the confidence interval is specified by the confidence level.

Page 29

Constructing a Confidence Interval

• Identify a sample statistic – choose the statistic that will be used to estimate a population parameter.

• Select a confidence level – it describes the uncertainty of the sampling method.

• Find the margin of error:

• Margin of error = Critical value × Standard error of the statistic

• Specify the confidence interval – its range is defined by the following equation:

• Confidence interval = sample statistic ± margin of error
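A sketch of the construction steps for a population mean, assuming a large sample. Under that assumption the critical value comes from the standard normal distribution; for small samples a t critical value would be used instead. The summary numbers (mean 69.5, SD 3.0, n = 100) and the helper name mean_confidence_interval are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    standard_error = sample_sd / math.sqrt(n)
    margin_of_error = critical * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error


low, high = mean_confidence_interval(69.5, 3.0, 100)   # hypothetical sample summary
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Raising the confidence level increases the critical value, so a 99% interval from the same sample is wider than the 95% one.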

Page 30

Tests of Significance

• Tests used to assess the evidence in favor of or against a given assumption

• Begin with a null hypothesis, H0

• Tests either fail to reject the null hypothesis, or reject it in favor of an alternative hypothesis, Ha

• Two types of tests:

• One-sided tests

• Two-sided tests

• The result is decided by calculating the “p-value”

• Interpretation:

• If the p-value is less than the significance level α, reject the null hypothesis.

• Common values of α are 0.05 and 0.01.

• General assumptions:

• The distribution is approximately normal

• The samples have approximately equal variances

Page 31

One Sided Hypothesis Testing

• μ0 = null value

• Null hypothesis: μ = μ0

• Alternative hypothesis: μ < μ0 OR μ > μ0

• Example: Given a sample of heights of 100 males in New York, decide whether the average height has increased from a given average height of 5 feet 9 inches.

• Null value: μ0 = 5 feet 9 inches = 69 inches

• Null hypothesis: μ = 69 inches

• Alternative hypothesis: μ > 69 inches

• Using an appropriate hypothesis test, calculate the p-value and reject the null hypothesis if the p-value is less than 0.05.

Page 32

Two Sided Hypothesis Testing

• μ0 = null value

• Null hypothesis: μ = μ0

• Alternative hypothesis: μ ≠ μ0

• Example: Given a sample of heights of 100 males in New York, decide whether the average height has increased or decreased from a given average height of 5 feet 9 inches.

• Null value: μ0 = 5 feet 9 inches = 69 inches

• Null hypothesis: μ = 69 inches

• Alternative hypothesis: μ ≠ 69 inches

• Using an appropriate hypothesis test, calculate the p-value and reject the null hypothesis if the p-value is less than 0.05.
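To make the decision rule concrete, here is a sketch of a one-sample z-test for the height example. The sample summary (mean 69.6 inches, SD 3 inches, n = 100) and the helper name one_sample_z_test are hypothetical. The z-test assumes the sample is large enough for the normal approximation. The same z statistic gives both the one-sided p-value (Ha: μ > 69) and the two-sided one (Ha: μ ≠ 69).

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, sample_sd, n, mu0):
    """Return z, the one-sided p-value (Ha: mu > mu0) and the two-sided p-value."""
    z = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    std_normal = NormalDist()
    p_greater = 1 - std_normal.cdf(z)                 # right tail only
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))    # both tails
    return z, p_greater, p_two_sided


mu0 = 69.0      # 5 feet 9 inches, in inches
alpha = 0.05
z, p_one, p_two = one_sample_z_test(69.6, 3.0, 100, mu0)   # hypothetical sample
print(f"z = {z:.2f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
print("Reject H0 (one-sided)" if p_one < alpha else "Fail to reject H0 (one-sided)")
print("Reject H0 (two-sided)" if p_two < alpha else "Fail to reject H0 (two-sided)")
```

With this hypothetical sample both tests reject H0. The two-sided p-value is always twice the one-sided one, so a result near the cut-off can be significant one-sided but not two-sided.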

Page 33

Tests of Significance

• One sample z-test

• Two sample z-test

• One sample t-test

• Paired t-test

• Two sample t-test

• Chi-squared test

• F test - Analysis of Variance (ANOVA)

• F test - Regression

Page 34

Chi-Squared tests

• Compare the observed counts against the counts expected under a hypothesis

• Steps:

• State the null hypothesis

• Prepare the contingency table for the variables

• Determine the expected counts

• Calculate the chi-squared value

• Calculate the degrees of freedom

• Based on the above, calculate the p-value

• If the p-value < 0.05, reject the null hypothesis.

• Test of independence:

• Verifies whether two categorical variables are independent

• Same steps as above.

Page 35

Case Study—Chi-Squared Test

• A city has a newly opened nuclear plant, and some families live dangerously close to it. A health safety officer wants to take up their case and get them relocated. To make a strong case, he wants to prove with numbers that exposure to radiation is leading to an increase in disease. He prepares a contingency table of exposure and disease.

• Does the data suggest an association between the disease and exposure?

Exposure | Disease: Yes | Disease: No | Total
Yes | 37 | 13 | 50
No | 17 | 53 | 70
Total | 54 | 66 | 120

Page 36

Case Study—Chi-Squared Test (contd.)

Steps:

• Calculate the number of individuals in the exposed and unexposed groups expected in each disease category (yes and no) if exposure had no effect.

• If exposure had no effect, the observed counts would be close to the expected counts and the chi-squared statistic would be small.

Proportion of population exposed = 50/120 ≈ 0.42

Proportion of population not exposed = 70/120 ≈ 0.58

Thus, the expected values are:

Population with disease = 54

Exposure Yes: 54 × 50/120 = 22.5

Exposure No: 54 × 70/120 = 31.5

Population without disease = 66

Exposure Yes: 66 × 50/120 = 27.5

Exposure No: 66 × 70/120 = 38.5

Page 37

Case Study—Chi-Squared Test (contd.)

• Calculate the chi-squared statistic:

$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(37 - 22.5)^2}{22.5} + \frac{(13 - 27.5)^2}{27.5} + \frac{(17 - 31.5)^2}{31.5} + \frac{(53 - 38.5)^2}{38.5} \approx 29.1$

where O is an observed count and E the corresponding expected count.

• Calculate the degrees of freedom:

df = (Number of rows – 1) × (Number of columns – 1) = (2 – 1) × (2 – 1) = 1

• Calculate the p-value from the chi-squared table:

For a chi-squared value of 29.1 with 1 degree of freedom, the table gives p < 0.001.

• Interpretation: If there were no association, there would be less than a 0.001 chance of obtaining discrepancies this large between the expected and observed values.

• Conclusion: There is an association between exposure and disease.
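The whole calculation can be reproduced in a few lines of Python. This sketch computes the expected counts, the χ² statistic and the degrees of freedom from the table. For the p-value it relies on the fact that, with df = 1, the chi-squared variable is the square of a standard normal, so p = erfc(√(χ²/2)). That shortcut applies only to df = 1.

```python
import math

# Observed counts from the contingency table
observed = [
    [37, 13],   # exposure yes: disease yes, disease no
    [17, 53],   # exposure no:  disease yes, disease no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under no association: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: chi-squared(1) is a squared standard normal
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.2e}")
```

Library routines may report a different statistic for 2×2 tables. For instance, scipy.stats.chi2_contingency applies Yates' continuity correction by default when df = 1, so pass correction=False to match the uncorrected 29.1.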

Page 38

ANOVA

• Analysis of Variance – used to compare the means of more than two groups

• An extension of the independent two-sample t-test

• Factor variable – the variable defining the groups

• Response variable – the variable being compared

• One-way ANOVA

• Groups defined by a single variable

• E.g.: Is there a difference in students’ scores based on the row they sit in – front/middle/back?

• Two-way ANOVA

• Two independent variables

• E.g.: Do race and gender affect a person’s yearly income?

Page 39

Case Study—One Way ANOVA

• The marks obtained in the same subject by three students from each of three different schools are given below.

• Does the data suggest a difference in marks between the schools?

Basic idea: Partition the total variation in the data into the variation between groups and the variation within groups.

School A | School B | School C
82 | 83 | 38
83 | 78 | 59
97 | 68 | 55

Page 40

Case Study—One Way ANOVA (contd.)

Steps:

• Calculate the group means

• School A: mean(82, 83, 97) = 87.3

• School B: mean(83, 78, 68) = 76.3

• School C: mean(38, 59, 55) = 50.7

• Calculate the grand mean

• Grand mean = mean(82, 83, 97, 83, 78, 68, 38, 59, 55) = 71.4

• Calculate the variations

• Sum of squared deviations of all observed values about the grand mean: SSTotal = 2630.2

• Sum of squared deviations of the group means about the grand mean, each weighted by its group size: SSBetween = 2124.2

• Sum of squared deviations of the observations within each group about their group mean, added across all groups: SSWithin = 506

• Check: SSTotal = SSBetween + SSWithin

Page 41

Case Study—One Way ANOVA (contd.)

• Calculate the degrees of freedom for each variation

• dfTotal = Number of observations – 1 = 9 – 1 = 8

• dfBetween = Number of groups – 1 = 3 – 1 = 2

• dfWithin = Number of observations – Number of groups = 9 – 3 = 6

• Calculate the mean squares

• Mean square between groups: MSBetween = SSBetween / dfBetween = 2124.2 / 2 = 1062.1

• Mean square within groups: MSWithin = SSWithin / dfWithin = 506 / 6 = 84.3

• Calculate the F-statistic

• F-value = MSBetween / MSWithin = 1062.1 / 84.3 = 12.59

• Calculate the p-value from the F-table

• The p-value for F = 12.59 with 2 and 6 degrees of freedom is 0.007

• Conclusion: Since the p-value is less than α = 0.05, we reject the null hypothesis that the three school means are equal. We conclude that the marks differ between students from different schools.
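The ANOVA steps can be checked with plain Python. This is a sketch. The p-value uses the closed-form upper tail of the F distribution, P(F > f) = (1 + 2f/df_within)^(−df_within/2). That formula is valid only when df_between = 2, as in this case study; other designs need an F-table or a statistics library.

```python
# Marks of three students from each school (from the table)
schools = {
    "A": [82, 83, 97],
    "B": [83, 78, 68],
    "C": [38, 59, 55],
}

all_marks = [m for marks in schools.values() for m in marks]
grand_mean = sum(all_marks) / len(all_marks)
group_means = {s: sum(m) / len(m) for s, m in schools.items()}

ss_total = sum((m - grand_mean) ** 2 for m in all_marks)
ss_between = sum(len(schools[s]) * (group_means[s] - grand_mean) ** 2 for s in schools)
ss_within = sum((m - group_means[s]) ** 2 for s in schools for m in schools[s])

df_between = len(schools) - 1                 # 2
df_within = len(all_marks) - len(schools)     # 6
f_value = (ss_between / df_between) / (ss_within / df_within)

# Closed-form upper tail of F(2, d): P(F > f) = (1 + 2f/d) ** (-d/2); needs df_between == 2
assert df_between == 2
p_value = (1 + 2 * f_value / df_within) ** (-df_within / 2)

print(f"SS total={ss_total:.1f}, between={ss_between:.1f}, within={ss_within:.1f}")
print(f"F = {f_value:.2f}, p = {p_value:.3f}")
```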

Page 42

Non Parametric Testing

• Referred to as “distribution free”, as they do not assume that the data follow a particular distribution (such as the normal distribution).

• When the assumptions of a parametric test hold, non-parametric tests have lower power, so they are usually the second choice after parametric tests.

• These tests typically focus on the median rather than the mean.

• They involve straightforward procedures such as counting and ordering.

• Most parametric tests have at least one non-parametric counterpart; these fall into the following categories:

• Tests of differences between groups (independent samples)

• Tests of differences between variables (dependent samples)

• Tests of relationships between variables

Page 43

Non Parametric Tests

Data | Parametric test | Non-parametric test
One qualitative response variable | One sample test | Sign test
One quantitative response variable – two values from paired samples | Paired sample t-test | Wilcoxon signed-rank test
One quantitative response variable – one qualitative independent variable with two groups | Two independent sample t-test | Wilcoxon rank-sum (Mann-Whitney) test
One quantitative response variable – one qualitative independent variable with three or more groups | ANOVA | Kruskal-Wallis test
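To show how simple these procedures can be, here is a sketch of the two-sided sign test. It uses only counting and the binomial distribution with p = 0.5. The spending values, the hypothesized median of 10 and the helper name sign_test are hypothetical. Values equal to the hypothesized median are dropped, which is one common convention.

```python
import math


def sign_test(data, hypothesized_median):
    """Two-sided sign test p-value; under H0 each sign is + or - with probability 0.5."""
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below                  # ties with the median are dropped
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


spend = [12, 15, 9, 14, 18, 11, 13, 16, 10, 17]   # hypothetical daily spend
p = sign_test(spend, 10)
print(f"p = {p:.4f}")   # 8 values above 10, 1 below, 1 tie dropped
```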

Page 44

Correlation

• A measure of association between variables

• Correlation can be positive or negative, ranging between -1 and +1

• Positive correlation example:

• Earnings and expenditure

• Negative correlation example:

• Speed and travel time (for a fixed distance)

• Parametric – assumes normal distribution and homogeneous variance

• Pearson correlation

• Non-parametric – no distributional assumptions; suitable for ranked (ordinal) variables

• Spearman correlation

Page 45

Correlation coefficient

• r : correlation coefficient

• +1 : Perfectly positive

• -1 : Perfectly negative

• Strength of association, based on the absolute value of r:

• 0 – 0.2 : No or very weak association

• 0.2 – 0.4 : Weak association

• 0.4 – 0.6 : Moderate association

• 0.6 – 0.8 : Strong association

• 0.8 – 1 : Very strong to perfect association
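A sketch of both coefficients in plain Python. Pearson's r comes from the deviation formula, and Spearman's rho is Pearson's r applied to ranks, with ties sharing the average rank. The earnings, expenditure, speed and travel-time data are hypothetical. The strength() helper is also hypothetical; it simply encodes the strength bands from the slide, using the absolute value of r.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient from the deviation formula."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def strength(r):
    """Label the size of |r| using the bands on the slide."""
    a = abs(r)
    if a < 0.2:
        return "no or very weak"
    if a < 0.4:
        return "weak"
    if a < 0.6:
        return "moderate"
    if a < 0.8:
        return "strong"
    return "very strong to perfect"


earnings = [20, 25, 30, 35, 40]         # hypothetical, in thousands
expenditure = [15, 18, 24, 26, 33]      # hypothetical, in thousands
speed = [30, 40, 60, 80]                # hypothetical, km/h
travel_time = [4, 3, 2, 1.5]            # hours for a fixed 120 km trip

r = pearson(earnings, expenditure)
print(round(r, 3), strength(r), spearman(earnings, expenditure))
print(round(pearson(speed, travel_time), 3), spearman(speed, travel_time))
```

Because Spearman's rho only looks at order, any data that rise (or fall) together in a strictly monotonic way give exactly +1 (or −1), even when the relationship is not linear.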

Page 46

Summary

Here is a quick recap of what we have learned in this lesson:

● Overview of statistical methods

● Descriptive statistics – Measures of Central Tendency and Measures of Dispersion

● A business case study to understand the concepts of descriptive statistics

● Probability distribution

● What are tests of significance

● The process flow of hypothesis testing

● One sided and two sided hypothesis tests

● Various tests used in calculating the p-value

● What is non parametric testing and why is it used

● Non parametric alternatives for the usual tests of significance

Page 47

Quiz

Page 48

QUIZ

1. Which of the following is NOT a measure of central tendency?

a. Median

b. Mode

c. Standard Deviation

d. Mean

Page 49

QUIZ

Answer: c. Standard Deviation

Explanation: Standard deviation is used to measure dispersion, not central tendency.

Page 50

QUIZ

2. Calculate the mean, median and mode of the following data and choose the right option:

13, 3, 10, 9, 7, 10, 12, 8, 6

a. Mean = 9, Median = 9, Mode = 10

b. Mean = 9, Median = 10, Mode = 9

c. Mean = 8.67, Median = 10, Mode = 10

d. Mean = 8.67, Median = 9, Mode = 10

Page 51

QUIZ

Answer: d. Mean = 8.67, Median = 9, Mode = 10

Explanation: The mean is the average of all the values (78/9 ≈ 8.67). The median is the middle value of the sorted data (3, 6, 7, 8, 9, 10, 10, 12, 13), which is 9. The mode is the most commonly occurring value, 10.

Page 52

QUIZ

3. Calculate the variance of the following data and choose the right option:

5, 10, 12, 4, 8, 9, 16

a. 14.41

b. 9.14

c. 12.41

d. 15.41

Page 53

QUIZ

Answer: a. 14.41

Explanation: Variance is the average of the squared deviations about the mean. Here the mean is 64/7 ≈ 9.14, the sum of squared deviations is about 100.86, and the variance is 100.86/7 ≈ 14.41. (9.14 is the mean, not the variance.)

Page 54

QUIZ

4. From the research question below, choose the alternative hypothesis from the following options.

Given a sample of body temperatures of 50 persons, and the average temperature μ0, decide if the average body temperature has increased over time.

a. μ > μ0

b. μ < μ0

c. μ ≠ μ0

d. μ = μ0

Page 55

QUIZ

Answer: a. μ > μ0

Explanation: The question forms a one-sided hypothesis, checking whether the average temperature has increased, that is, whether μ > μ0.

Page 56

QUIZ

5. Choose the commonly used value for the significance level α from the values given below:

a. 0.5

b. 1.0

c. 0.05

d. 0.1

Page 57

QUIZ

Answer: c. 0.05

Explanation: The commonly used values for the significance level are 0.05 and 0.01.

Page 58

QUIZ

6. Non-parametric tests can also be referred to as:

a. Deviation free

b. Dispersion free

c. Decision free

d. Distribution free

Page 59

QUIZ

Answer: d. Distribution free

Explanation: Non-parametric tests are distribution free.

Page 60

QUIZ

7. Descriptive statistics measures which of the following?

a. Hypothesis testing

b. Dispersion

c. Data mining

d. Estimation

Page 61

QUIZ

Answer: b. Dispersion

Explanation: Descriptive statistics includes measures of dispersion; hypothesis testing and estimation belong to inferential statistics.

Page 62

QUIZ

8. Which of the following is NOT a condition of the binomial distribution?

a. Each trial is independent of the others

b. The probability of each outcome remains constant from trial to trial

c. Normal distribution

d. A fixed number of trials

Page 63

QUIZ

Answer: c. Normal distribution

Explanation: The other three options are conditions of the binomial distribution; normal distribution is not one of them.

Page 64

QUIZ

9. The probability of an event always lies between:

a. -1 and 1

b. Negative and positive

c. Only positive

d. 0 and 1

Page 65: Lesson 2_statistical Concepts

Copyright 2014, Simplilearn, All rights reserved.

QUIZ

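9 The probability of an event always lies between which values?

a. 0 and 1
b. -1 and 1
c. Negative and positive
d. Only positive

Answer: a.

Explanation: The probability of an event always lies between 0 and 1 inclusive. A probability of 0 means the event cannot occur (certain failure) and a probability of 1 means it is certain to occur (certain success).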
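The sketch below shows why a probability can never leave the interval [0, 1], assuming equally likely outcomes (classical probability). Favorable outcomes can be no fewer than 0 and no more than all outcomes, so their ratio stays within [0, 1]. The helper `classical_probability` and the die example are hypothetical.

```python
from fractions import Fraction

def classical_probability(favorable: int, total: int) -> Fraction:
    """Classical probability: favorable outcomes / equally likely outcomes.

    Because 0 <= favorable <= total, the result always lies in [0, 1].
    """
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)

# Hypothetical example: one roll of a fair six-sided die
impossible = classical_probability(0, 6)  # rolling a 7
even = classical_probability(3, 6)        # rolling 2, 4 or 6
certain = classical_probability(6, 6)     # rolling any of 1 to 6
print(impossible, even, certain)          # 0 1/2 1
```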

Page 66: Lesson 2_statistical Concepts


QUIZ

10 Which is not a part of descriptive statistics?

a. Sample
b. Measure of central tendency
c. Measures of dispersion
d. Hypothesis testing

Page 67: Lesson 2_statistical Concepts


QUIZ

10 Which is not a part of descriptive statistics?

a. Sample
b. Measure of central tendency
c. Measures of dispersion
d. Hypothesis testing

Answer: d.

Explanation: Hypothesis testing is not part of descriptive statistics; it belongs to inferential statistics.

Page 68: Lesson 2_statistical Concepts


QUIZ

11 Which is used to calculate the “central” value, or average, of a set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Page 69: Lesson 2_statistical Concepts


QUIZ

11 Which is used to calculate the “central” value, or average, of a set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Answer: c.

Explanation: The mean is used to calculate the central value, or average, of a given set of numbers: add up the values and divide by how many there are.
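A short Python sketch of the mean calculation described above, using the standard-library `statistics` module. The median is printed alongside because it is the other common "central" value. A single large value pulls the mean away from the middle of the data, while the median is unaffected. The sales figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical daily sales (units); the last day is unusually large
sales = [12, 15, 11, 14, 48]

# Mean: sum of the values divided by how many there are
print(mean(sales))               # (12 + 15 + 11 + 14 + 48) / 5 = 20
print(sum(sales) / len(sales))   # the same calculation by hand: 20.0

# Median, for comparison: the middle value of the sorted data
print(median(sales))             # sorted: 11, 12, 14, 15, 48 -> 14
```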

Page 70: Lesson 2_statistical Concepts


QUIZ

12 Which is used to find the value with the highest frequency in a given set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Page 71: Lesson 2_statistical Concepts


QUIZ

12 Which is used to find the value with the highest frequency in a given set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Answer: b.

Explanation: The mode is the value that occurs with the highest frequency in a given set of numbers.
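In Python, the mode can be found by counting how often each value occurs. This sketch uses `collections.Counter` for the counts and `statistics.multimode`, which returns every value tied for the highest frequency. The shoe-size data are hypothetical.

```python
from collections import Counter
from statistics import multimode

# Hypothetical shoe sizes sold in one day
sizes = [8, 9, 9, 10, 9, 8, 11]

counts = Counter(sizes)          # frequency of each value: 8 -> 2, 9 -> 3, ...
print(counts.most_common(1))     # [(9, 3)]: size 9 occurs most often
print(multimode(sizes))          # [9]; lists all values tied for the top count
```

Returning a list from `multimode` covers data sets with more than one mode, where a single "most common" value would hide the tie.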

Page 72: Lesson 2_statistical Concepts


QUIZ

13 Which is used to measure the dispersion in a given set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Page 73: Lesson 2_statistical Concepts


QUIZ

13 Which is used to measure the dispersion in a given set of numbers?

a. Median
b. Mode
c. Mean
d. Standard deviation

Answer: d.

Explanation: Standard deviation is used to measure the dispersion in a given set of numbers.
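The following Python sketch computes standard deviation step by step: the mean, the squared deviations from it, and then the square root of their average. The population version divides by N and the sample version divides by N - 1. Both are checked against the standard-library `statistics` functions. The data set is hypothetical.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]

m = sum(data) / len(data)                               # mean = 5.0
squared_devs = [(x - m) ** 2 for x in data]             # squared deviations from the mean
pop_sd = sqrt(sum(squared_devs) / len(data))            # divide by N (population)
sample_sd = sqrt(sum(squared_devs) / (len(data) - 1))   # divide by N - 1 (sample)

print(pop_sd)                  # sqrt(32 / 8) = 2.0
print(sample_sd)               # sqrt(32 / 7), roughly 2.14
print(pstdev(data), stdev(data))  # library versions of the same two measures
```

The N - 1 divisor is the usual choice when the numbers are a sample and the goal is to estimate the spread of the wider population.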

Page 74: Lesson 2_statistical Concepts


QUIZ

14 What is the logical conclusion if the p-value is less than 0.05?

a. Accept null hypothesis
b. Reject null hypothesis
c. Reject alternate hypothesis
d. None of the above

Page 75: Lesson 2_statistical Concepts


QUIZ

14 What is the logical conclusion if the p-value is less than 0.05?

a. Accept null hypothesis
b. Reject null hypothesis
c. Reject alternate hypothesis
d. None of the above

Answer: b.

Explanation: At the conventional 0.05 significance level, a p-value below 0.05 (p < 0.05) means we reject the null hypothesis: a result at least this extreme would be unlikely if the null hypothesis were true.
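The decision rule above is sketched here with a two-sided one-sample z-test. The sketch assumes the population standard deviation is known and that the sample mean is approximately normally distributed. For a standard normal Z, the two-sided p-value P(|Z| ≥ |z|) equals erfc(|z| / √2), so only Python's `math` module is needed. The function name and all figures are hypothetical.

```python
from math import erfc, sqrt

def two_sided_z_test(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided p-value for H0: population mean == mu0 (illustrative helper).

    Assumes a known population standard deviation `sigma` and an approximately
    normal sampling distribution of the mean.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    return erfc(abs(z) / sqrt(2))                 # P(|Z| >= |z|) for standard normal Z

alpha = 0.05  # conventional significance level

# Hypothetical figures: claimed mean order value 50, observed mean 53
# over 36 orders, known sigma 8  ->  z = 3 / (8 / 6) = 2.25
p = two_sided_z_test(sample_mean=53.0, mu0=50.0, sigma=8.0, n=36)
print(round(p, 4))                                      # about 0.024
print("Reject H0" if p < alpha else "Fail to reject H0")  # Reject H0
```

Statisticians usually say "fail to reject" rather than "accept" the null hypothesis when p ≥ 0.05, because a large p-value does not prove the null hypothesis true.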

Page 76: Lesson 2_statistical Concepts


QUIZ

15 Which is used to measure the peakedness of a distribution?

a. Skewness
b. Outlier
c. Kurtosis
d. Variance

Page 77: Lesson 2_statistical Concepts


QUIZ

15 Which is used to measure the peakedness of a distribution?

a. Skewness
b. Outlier
c. Kurtosis
d. Variance

Answer: c.

Explanation: Kurtosis is the measure traditionally used to describe the peakedness of a data set's distribution. In practice it mainly reflects how heavy the tails of the distribution are.
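A minimal Python sketch of moment-based (population) excess kurtosis, m4 / m2² − 3. Here m2 and m4 are the second and fourth central moments. A normal distribution scores about 0 on this scale. The helper name and both data sets are hypothetical: one is spread evenly, the other has most values at the centre and a few far out.

```python
def excess_kurtosis(data: list[float]) -> float:
    """Population (moment-based) excess kurtosis: m4 / m2**2 - 3 (illustrative helper).

    About 0 for normal data; positive values indicate heavier tails (often
    with a sharper peak), negative values lighter tails and a flatter shape.
    """
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # values spread evenly
peaked = [5, 5, 5, 5, 5, 4, 6, 1, 9]   # most values at the centre, a few far out
print(excess_kurtosis(flat))     # negative (about -1.23)
print(excess_kurtosis(peaked))   # positive (about 1.0)
```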

Page 78: Lesson 2_statistical Concepts


QUIZ

16 Which is the measure of deviation from symmetry?

a. Skewness
b. Outlier
c. Kurtosis
d. Variance

Page 79: Lesson 2_statistical Concepts


QUIZ

16 Which is the measure of deviation from symmetry?

a. Skewness
b. Outlier
c. Kurtosis
d. Variance

Answer: a.

Explanation: Skewness measures a distribution's deviation from symmetry. A distribution may be left (negatively) skewed or right (positively) skewed.
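The sketch below computes moment-based (population) skewness, m3 / m2^1.5, in Python. Here m2 is the variance and m3 the third central moment. It returns 0 for symmetric data, a positive value when the long tail points to the right, and a negative value when it points to the left. The helper name and data sets are hypothetical.

```python
def skewness(data: list[float]) -> float:
    """Population (moment-based) skewness: m3 / m2**1.5 (illustrative helper).

    0 for symmetric data, > 0 for right (positive) skew, < 0 for left (negative) skew.
    """
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # variance
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]          # long tail towards large values
left_skewed = [-x for x in right_skewed]    # mirror image: long tail towards small values

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```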

Page 80: Lesson 2_statistical Concepts


Thank You
