Farah Adibah Adnan, Engineering Mathematics Institute (IMK): Chapter 1, Basic Statistics

CHAPTER 1 BASIC STATISTICS


Statistics in Engineering

Collecting Engineering Data

Data Summary and Presentation

Probability Distributions

- Discrete Probability Distribution

- Continuous Probability Distribution

Sampling Distributions of the Mean and Proportion

STATISTICS IN ENGINEERING

Statistics is the area of science that deals with the collection, organization, analysis, and interpretation of data.

It provides methods and techniques for drawing conclusions about the characteristics of a large set of data points, called a population, by using a smaller subset of that data, called a sample.

Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to an engineer.

Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and improving production processes.

The methods of statistics allow scientists and engineers to design valid experiments and to draw reliable conclusions from the data they produce.

COLLECTING ENGINEERING DATA

Direct observation
- The simplest method of obtaining data.
- Advantage: relatively inexpensive.
- Disadvantage: it is difficult to produce useful information, since direct observation does not consider all aspects of the issue.

Experiments
- A more expensive method, but a better way to produce data.
- Data produced this way are called experimental data.

Surveys
- The most familiar method of data collection.
- Quality depends on the response rate.

Personal interview
- Has the advantage of a higher expected response rate.
- Fewer incorrect responses.

DATA PRESENTATION

Data can be categorized into two types:
- Qualitative data: qualitative attributes
- Quantitative data: quantitative attributes

There are two sources of data:
- Primary (e.g. questionnaire, telephone interview)
- Secondary (e.g. Internet, annual report)

Data should be summarized in a more informative form, such as tables, charts, or graphs.


Data can be summarized or presented in two ways: 1) Tabular 2) Charts/graphs.

Data Presentation of Qualitative Data

1) Frequency Distribution Table - represents the number of times the observation occurs in the data.

Example: Ethnic Group

Observation    Frequency
Malay          33
Chinese        9
Indian         6
Others         2

2) Charts for qualitative data are:
- Bar chart (example: Ethnic Group)
- Pie chart (example: Gender)
- Line chart (example: Number of Sandpipers from Jan 1989 to Dec 1989)

Data Presentation of Quantitative Data

1) Frequency Distribution Table - lists all classes and the number of values that belong to each class.

Weekly Earnings (dollars)   Number of      Class              Class      Class         Cumulative
(Class Limit)               Employees, f   Boundaries         Width, c   Midpoint, x   Frequency, F
801-1000                    9              800.5 - 1000.5     200        900.5         9
1001-1200                   22             1000.5 - 1200.5    200        1100.5        9 + 22 = 31
1201-1400                   39             1200.5 - 1400.5    200        1300.5        31 + 39 = 70
1401-1600                   15             1400.5 - 1600.5    200        1500.5        70 + 15 = 85
1601-1800                   9              1600.5 - 1800.5    200        1700.5        85 + 9 = 94
1801-2000                   6              1800.5 - 2000.5    200        1900.5        94 + 6 = 100

The following definitions and formulas are used to form a frequency distribution table from raw data.

Class - an interval that includes all the values that fall between two numbers, the lower and upper class limits.

Class Boundary - the midpoint between the upper limit of one class and the lower limit of the next class. For example, the boundary between the classes 801-1000 and 1001-1200 is (1000 + 1001)/2 = 1000.5.

Class Width/Size/Interval, c - the difference between the two boundaries of a class. Formula: c = Upper Boundary - Lower Boundary

Class Midpoint/Mark, x - formula: x = (Lower Limit + Upper Limit)/2
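The definitions above can be checked against the weekly earnings table with a short Python sketch. The function name `class_table` and the dictionary keys are illustrative choices, not part of the original slides, and the sketch assumes that every pair of consecutive classes is separated by the same gap.

```python
# Class limits and frequencies taken from the weekly earnings table.
limits = [(801, 1000), (1001, 1200), (1201, 1400),
          (1401, 1600), (1601, 1800), (1801, 2000)]
freqs = [9, 22, 39, 15, 9, 6]


def class_table(limits, freqs):
    """Return one dict per class: boundaries, width, midpoint, cumulative frequency."""
    # Gap between the upper limit of one class and the lower limit of the next
    # (1 for this table). Boundaries lie half a gap outside the class limits.
    gap = limits[1][0] - limits[0][1]
    rows = []
    cumulative = 0
    for (lower, upper), f in zip(limits, freqs):
        cumulative += f
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "f": f,
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,   # c = upper - lower boundary
            "midpoint": (lower + upper) / 2,             # x = (lower + upper limit) / 2
            "F": cumulative,
        })
    return rows


rows = class_table(limits, freqs)
for row in rows:
    print(row)
```

Each printed row should match the corresponding line of the table, for example boundaries 800.5 and 1000.5, width 200 and midpoint 900.5 for the first class.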

How to Form Frequency Distribution Table

1) Decide the number of classes to be used.

2) Determine the class width.

When the number of classes is given:

$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}}$$

When the number of classes is not given:

$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{1 + 3.3 \log n}$$

where the number of classes $= 1 + 3.3 \log n$ and $n$ is the number of observations.

Always round the class width up to the nearest whole number. Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
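As a minimal sketch of the class-width rule, the Python function below covers both cases. It assumes that "log" means the base-10 logarithm, and the function name `class_width` and its parameters are hypothetical names chosen for illustration.

```python
import math


def class_width(highest, lowest, num_classes=None, n=None):
    """Class width, rounded up to the nearest whole number.

    If num_classes is not given, it is estimated as 1 + 3.3 * log10(n),
    where n is the number of observations (base-10 log is an assumption).
    """
    if num_classes is None:
        if n is None:
            raise ValueError("give either num_classes or n")
        num_classes = 1 + 3.3 * math.log10(n)
    return math.ceil((highest - lowest) / num_classes)


# Number of classes given: (29 - 5) / 5 = 4.8, rounded up to 5.
print(class_width(29, 5, num_classes=5))

# Number of classes not given, 30 observations:
# 1 + 3.3 * log10(30) is about 5.87, and 24 / 5.87 is about 4.09, rounded up to 5.
print(class_width(29, 5, n=30))
```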

Example:
The following data give the total number of iPods sold by a mail order company on each of 30 days. Construct a frequency distribution table. (Hint: use 5 classes.)

 8 25 11 15 29 22 10  5 17 21
22 13 26 16 18 12  9 26 20 16
23 14 19 23 20 16 27  9 21 14

Solution:
Number of classes = 5

$$\text{Class width} = \frac{\text{highest value} - \text{lowest value}}{\text{number of classes}} = \frac{29 - 5}{5} = 4.8 \approx 5$$

Frequency Distribution Table

Class Interval    Frequency, f
5 - 9             4
10 - 14           6
15 - 19           7
20 - 24           8
25 - 29           5
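The construction in this example can be reproduced with a few lines of Python. This is only a sketch for integer data: the function name `frequency_table` is an illustrative choice, and the first class is assumed to start at the lowest value in the data.

```python
import math

# Number of iPods sold on each of the 30 days (from the example).
sales = [8, 25, 11, 15, 29, 22, 10, 5, 17, 21,
         22, 13, 26, 16, 18, 12, 9, 26, 20, 16,
         23, 14, 19, 23, 20, 16, 27, 9, 21, 14]


def frequency_table(data, num_classes):
    """Return (lower limit, upper limit, frequency) for each class of integer data."""
    lowest, highest = min(data), max(data)
    width = math.ceil((highest - lowest) / num_classes)  # round the width up
    table = []
    for k in range(num_classes):
        lower = lowest + k * width
        upper = lower + width - 1          # integer data: limits are inclusive
        count = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, count))
    # Guard: with some data sets the top value can fall outside the last class.
    if sum(count for _, _, count in table) != len(data):
        raise ValueError("classes do not cover all values; adjust width or start")
    return table


table = frequency_table(sales, 5)
for lower, upper, count in table:
    print(f"{lower:2d} - {upper:2d}   {count}")
```

The printed classes and frequencies should agree with the frequency distribution table in the solution.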

2) Graphs for quantitative data are:
- Histogram (example: Students' CGPA)
- Polygon (example: Students' CGPA)
- Ogive (example: Students' CGPA)

DATA SUMMARY

Summary statistics are used to summarize a set of observations.

Two basic types of summary statistics are:

1) Measures of central tendency
- Mean
- Median
- Mode

2) Measures of dispersion
- Range
- Variance
- Standard deviation

MEASURES OF CENTRAL TENDENCY

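1) Mean ($\bar{x}$ / $\mu$)

The mean of a sample ($\bar{x}$) or population ($\mu$) is the sum of the data divided by the total number of observations.

Mean for ungrouped data:

Sample:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}, \quad i = 1, 2, \ldots, n$$

Population:

$$\mu = \frac{\sum x}{N}$$

Mean for grouped data:

Sample:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{n}$$

Population:

$$\mu = \frac{\sum f_i x_i}{\sum f_i} \quad \text{or} \quad \mu = \frac{\sum fx}{N}$$

where f = class frequency and x = class mark (midpoint).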

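Example:

1) Find the mean for the set of data 4, 6, 3, 1, 2, 5, 7.

Solution:

$$\bar{x} = \frac{4 + 6 + 3 + 1 + 2 + 5 + 7}{7} = 4$$

2) Find the mean of the frequency distribution table below.

[Table: frequency distribution with columns x, f and (x)(f)]

Solution:

From the table, $\sum f = 50$ and $\sum fx = 161.75$. Therefore, the mean of the frequency distribution above is:

$$\bar{x} = \frac{161.75}{50} = 3.235$$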
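The two mean formulas can be sketched in Python as follows. The function names are illustrative. Because the frequency table for the second example is not reproduced here, the grouped formula is demonstrated on the weekly earnings table from the data presentation section, and the second example is checked only from its totals.

```python
def mean(values):
    """Mean of ungrouped data: sum of the values divided by their count."""
    return sum(values) / len(values)


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return total_fx / sum(freqs)


# Example 1: ungrouped data.
print(mean([4, 6, 3, 1, 2, 5, 7]))          # 4.0

# Weekly earnings table: class midpoints x and frequencies f.
midpoints = [900.5, 1100.5, 1300.5, 1500.5, 1700.5, 1900.5]
freqs = [9, 22, 39, 15, 9, 6]
print(grouped_mean(midpoints, freqs))        # 1322.5

# Example 2, using only the totals given in the solution.
print(round(161.75 / 50, 3))                 # 3.235
```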

2) Median ($\tilde{x}$)

The median is the middle value of a set of observations arranged in ascending order and is normally denoted by $\tilde{x}$.

Median for ungrouped data:

- The median depends on the number of observations in the data, $n$.

- If $n$ is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations (the middle value).

- If $n$ is even, the median is the average of the two middle values: the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2} + 1\right)$th observation.

Median for grouped data (frequency distribution):

The median of a frequency distribution is defined by:

$$\tilde{x} = L + c\left(\frac{\frac{n}{2} - F_{j-1}}{f_j}\right)$$

where,

$L$ = the lower class boundary of the median class;

$c$ = the size of the median class interval;

$F_{j-1}$ = the sum of frequencies of all classes lower than the median class;

$f_j$ = the frequency of the median class.

Example:

1) Find the median for the set of data 4, 6, 3, 1, 2, 5, 7, 3.

Solution:
Arrange in order of magnitude: 1, 2, 3, 3, 4, 5, 6, 7.
As n = 8 (even), the median is the mean of the 4th and 5th values.
Therefore, the median is (3 + 4)/2 = 3.5.

2) Find the median of the frequency distribution table below.

[Table: frequency distribution with a cumulative frequency column]

Solution:

To determine the median class:

$$\frac{n}{2} = \frac{50}{2} = 25$$

So, the median falls in the class 3.00 - 3.25.

$$L = 3.00, \quad c = 3.25 - 3.00 = 0.25, \quad F_{j-1} = 12, \quad f_j = 15$$

$$\tilde{x} = 3.00 + 0.25\left(\frac{\frac{50}{2} - 12}{15}\right) = 3.217$$
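Both median rules can be written as small Python functions. This is a sketch: the function names and parameter names (`L`, `c`, `n`, `F_prev`, `f_j`) are chosen to mirror the symbols in the formula and are not from the original slides.

```python
def median(values):
    """Median of ungrouped data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation (index mid, counting from 0).
        return ordered[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


def grouped_median(L, c, n, F_prev, f_j):
    """Median of grouped data: L + c * ((n / 2 - F_prev) / f_j)."""
    return L + c * ((n / 2 - F_prev) / f_j)


# Example 1: ordered data 1, 2, 3, 3, 4, 5, 6, 7.
print(median([4, 6, 3, 1, 2, 5, 7, 3]))                      # 3.5

# Example 2: median class 3.00 - 3.25.
print(round(grouped_median(3.00, 0.25, 50, 12, 15), 3))      # 3.217
```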

3) Mode ($\hat{x}$)

The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. The mode can also be used to describe qualitative data.

Note:

- If a data set has only one value that occurs with the highest frequency, it has one mode and is called unimodal.

- If a data set has two values that occur with the highest frequency, it has two modes and is called bimodal.

- If a data set has more than two values that occur with the highest frequency, it has more than two modes and is called multimodal.

For ungrouped data:

- The mode is the value that occurs most frequently.

Example: The mode for the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.
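For ungrouped data, Python's standard library function `statistics.multimode` returns every value that ties for the highest frequency, which makes the unimodal, bimodal and multimodal cases easy to illustrate. The helper `describe_modes` below is a hypothetical name used only for this sketch.

```python
from statistics import multimode


def describe_modes(values):
    """Return the list of modes and a label based on how many modes there are."""
    modes = multimode(values)   # all values sharing the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        # Caveat: if every value occurs once, all values are returned here.
        label = "multimodal"
    return modes, label


print(describe_modes([4, 6, 3, 1, 2, 5, 7, 3]))   # ([3], 'unimodal')
print(describe_modes([1, 1, 2, 2, 3]))            # ([1, 2], 'bimodal')
```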

For grouped data

When data have been grouped into classes and a frequency curve is drawn to fit the data, the mode is the value of $x$ corresponding to the maximum point on the curve.

- Determining the mode using the formula:

$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$

where

$L$ = the lower class boundary of the modal class;

$c$ = the size of the modal class interval;

$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and

$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.

Note: The class which has the highest frequency is called the modal class.

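Example:

Find the mode of the frequency distribution table below.

[Table: frequency distribution]

Solution:

$$L = 3.00, \quad c = 3.25 - 3.00 = 0.25, \quad \Delta_1 = 15 - 10 = 5, \quad \Delta_2 = 15 - 13 = 2$$

$$\hat{x} = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$$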
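The grouped-mode formula is a one-line computation. The sketch below uses a hypothetical function `grouped_mode` that takes the modal class frequency and the frequencies of its two neighbouring classes, then forms the two differences itself.

```python
def grouped_mode(L, c, f_modal, f_before, f_after):
    """Mode of grouped data: L + c * d1 / (d1 + d2)."""
    d1 = f_modal - f_before   # modal class frequency minus the class before it
    d2 = f_modal - f_after    # modal class frequency minus the class after it
    return L + c * d1 / (d1 + d2)


# Values from the example: modal class 3.00 - 3.25 with frequency 15,
# neighbouring class frequencies 10 (before) and 13 (after).
x_hat = grouped_mode(3.00, 0.25, 15, 10, 13)
print(round(x_hat, 3))   # 3.179
```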

MEASURES OF DISPERSION

A measure of dispersion (spread) is the degree to which a set of data tends to spread around the average value.

It shows whether a data set is concentrated around the mean or scattered.

The common measures of dispersion are:
1) Range
2) Variance
3) Standard deviation

The standard deviation is the square root of the variance.

The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.

1) Range

The simplest measure of dispersion. It applies to both grouped and ungrouped data.

Ungrouped data:
Formula: Range = Largest value - Smallest value

Grouped data:
Formula: Range = Largest value (class limit) - Smallest value (class limit)

Example: Find the range of the total areas of the four states below.

State        Total Area (square miles)
Arkansas     53,182
Louisiana    49,651
Oklahoma     69,903
Texas        267,277

Solution:

Range = Largest value - Smallest value
      = 267,277 - 49,651 = 217,626 square miles.

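2) Variance ($\sigma^2$ / $s^2$)

Measures the variability in a set of data.

The variance for ungrouped data:

Sample:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Population:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

The variance for grouped data:

Sample:

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$

Population:

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$$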

Example: The variance for grouped data.

Solution:

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{528.031 - \frac{(161.75)^2}{50}}{49} = 0.0973$$
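The two forms of the grouped sample variance (the definition and the computational shortcut) should give the same result. The sketch below checks this on the weekly earnings table from the data presentation section and then evaluates the shortcut with the totals used in this example. The function names are illustrative, not from the original slides.

```python
def grouped_variance_definition(midpoints, freqs):
    """Sample variance from the definition: sum f*(x - mean)^2 / (n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)


def grouped_variance_shortcut(sum_fx2, sum_fx, n):
    """Sample variance from totals: (sum fx^2 - (sum fx)^2 / n) / (n - 1)."""
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Weekly earnings table: class midpoints x and frequencies f.
midpoints = [900.5, 1100.5, 1300.5, 1500.5, 1700.5, 1900.5]
freqs = [9, 22, 39, 15, 9, 6]
sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, freqs))

by_definition = grouped_variance_definition(midpoints, freqs)
by_shortcut = grouped_variance_shortcut(sum_fx2, sum_fx, sum(freqs))
print(by_definition, by_shortcut)   # the two values should agree

# Totals from the example: sum fx^2 = 528.031, sum fx = 161.75, n = 50.
s2 = grouped_variance_shortcut(528.031, 161.75, 50)
print(round(s2, 4))                 # 0.0973
```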

3) Standard Deviation ($\sigma$ / $s$)

The positive square root of the variance is the standard deviation.

A larger value of the standard deviation means the values of the data set are spread relatively far from the mean.

A lower value of the standard deviation means the values of the data set are spread relatively close to the mean.

The standard deviation for ungrouped data:

Sample:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Population:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

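The standard deviation for grouped data:

Sample:

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$$

Population:

$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}}$$

Example: From the previous example.

$$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} = \sqrt{\frac{528.031 - \frac{(161.75)^2}{50}}{49}} = \sqrt{0.0973} = 0.3119$$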
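The standard deviation for this example can be reproduced in Python as the square root of the grouped sample variance. The function name `grouped_sample_std` is a hypothetical choice for this sketch.

```python
import math


def grouped_sample_std(sum_fx2, sum_fx, n):
    """Sample standard deviation of grouped data from the totals."""
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return math.sqrt(variance)


# Totals from the example: sum fx^2 = 528.031, sum fx = 161.75, n = 50.
s = grouped_sample_std(528.031, 161.75, 50)
print(round(s, 4))                    # 0.312, from the unrounded variance

# The worked example takes the square root of the rounded variance 0.0973.
print(round(math.sqrt(0.0973), 4))    # 0.3119
```

The small difference between the two printed values comes only from rounding the variance to 0.0973 before taking the square root.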

EXERCISE

The final results in business statistics of 40 students are recorded below.

a) Present the data in a frequency distribution table.

b) Construct a histogram.

c) Calculate the mean, median, mode, variance, and standard deviation.

59 90 71 77 64 89 83 70 81 73

68 75 90 82 88 94 61 69 79 86

63 60 84 79 76 65 72 68 71 72

63 68 76 78 90 74 76 62 75 86
