Farah Adibah Adnan, Engineering Mathematics Institute (IMK): Chapter 1 Basic Statistics
CHAPTER 1 BASIC STATISTICS
Statistics in Engineering
Collecting Engineering Data
Data Summary and Presentation
Probability Distributions
- Discrete Probability Distribution
- Continuous Probability Distribution
Sampling Distributions of the Mean and Proportion
STATISTICS IN ENGINEERING
Statistics - the area of science that deals with the collection, organization, analysis, and interpretation of data.
Statistics - deals with methods and techniques for drawing conclusions about the characteristics of a large set of data points, commonly called a population, by using a smaller subset of that data, called a sample.
Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to an engineer.
Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and improving production processes.
The methods of statistics allow scientists and engineers to design valid experiments and to draw reliable conclusions from the data they produce.
COLLECTING ENGINEERING DATA
Direct observation
- The simplest method of obtaining data.
- Advantage: relatively inexpensive.
- Disadvantage: difficult to produce useful information, since it does not consider all aspects of the issue.
Experiments
- A more expensive method, but a better way to produce data.
- Data produced are called experimental data.
Surveys
- The most familiar method of data collection.
- Depends on the response rate.
Personal interview
- Has the advantage of a higher expected response rate.
- Fewer incorrect responses.
DATA PRESENTATION
Data can be categorized into two types:
- Qualitative data - qualitative attributes
- Quantitative data - quantitative attributes
Two sources of data:
- Primary (e.g. questionnaire, telephone interview)
- Secondary (e.g. Internet, annual report)
Data should be summarized in a more informative form, such as tables, charts or graphs.
Data can be summarized or presented in two ways: 1) Tabular 2) Charts/graphs.
Data Presentation of Qualitative Data
1) Frequency Distribution Table - represents the number of times the observation occurs in the data.
Example: Ethnic Group
Observation | Frequency
Malay | 33
Chinese | 9
Indian | 6
Others | 2
2) Charts for qualitative data are:
- Bar chart (example: Ethnic Group)
- Pie chart (example: Gender)
- Line chart (example: Number of Sandpipers from Jan 1989 – Dec 1989)
Data Presentation of Quantitative Data
1) Frequency Distribution Table – lists all classes and the number of values that belong to each class.

Weekly Earnings in dollars (Class Limits) | Number of Employees, f | Class Boundaries | Class Width, c | Class Midpoint, x | Cumulative Frequency, F
801 - 1000 | 9 | 800.5 – 1000.5 | 200 | 900.5 | 9
1001 - 1200 | 22 | 1000.5 – 1200.5 | 200 | 1100.5 | 9 + 22 = 31
1201 - 1400 | 39 | 1200.5 – 1400.5 | 200 | 1300.5 | 31 + 39 = 70
1401 - 1600 | 15 | 1400.5 – 1600.5 | 200 | 1500.5 | 70 + 15 = 85
1601 - 1800 | 9 | 1600.5 – 1800.5 | 200 | 1700.5 | 85 + 9 = 94
1801 - 2000 | 6 | 1800.5 – 2000.5 | 200 | 1900.5 | 94 + 6 = 100
The following terms are used to form a frequency distribution table from raw data.
Class - an interval that includes all the values that fall between two numbers, the lower and upper class limits.
Class Boundary - the midpoint of the upper limit of one class and the lower limit of the next class.
Class Width/Size/Interval, c - the difference between the two boundaries of a class. Formula: c = Upper boundary – Lower boundary
Class Midpoint/Mark, x – formula: x = (Lower limit + Upper limit)/2
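As a minimal sketch of these definitions (Python is used only for illustration; the variable names are assumptions, not part of the original notes), the remaining columns of the weekly earnings table can be derived from the class limits and frequencies alone:

```python
from itertools import accumulate

# Class limits and frequencies taken from the weekly earnings table.
limits = [(801, 1000), (1001, 1200), (1201, 1400),
          (1401, 1600), (1601, 1800), (1801, 2000)]
freqs = [9, 22, 39, 15, 9, 6]

# Gap between the upper limit of one class and the lower limit of the next
# (1 here); each class boundary sits halfway across that gap.
gap = limits[1][0] - limits[0][1]
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

widths = [ub - lb for lb, ub in boundaries]        # c = upper - lower boundary
midpoints = [(lo + hi) / 2 for lo, hi in limits]   # x = (lower + upper limit) / 2
cumulative = list(accumulate(freqs))               # running total, F

for row in zip(limits, freqs, boundaries, widths, midpoints, cumulative):
    print(row)
```

The first printed row should match the first table row: boundaries 800.5 and 1000.5, width 200, midpoint 900.5, cumulative frequency 9.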
How to Form Frequency Distribution Table
1) Decide the number of classes to be used.
2) Determine the class width:
When the number of classes is given,
$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}}$$
When the number of classes is not given,
$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{1 + 3.3 \log n}$$
where the number of classes $= 1 + 3.3 \log n$.
Always round the class width/interval up to the nearest whole number.
Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
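The class width rule can be sketched as a small helper. This is an illustration only: the function name `class_width` and the sample values are hypothetical, and the logarithm is assumed to be base 10, which is the usual convention for the 1 + 3.3 log n rule.

```python
import math

def class_width(values, num_classes=None):
    """Class width, rounded up to the nearest whole number."""
    if num_classes is None:
        # Number of classes not given: use 1 + 3.3 log10(n) (base 10 assumed).
        num_classes = 1 + 3.3 * math.log10(len(values))
    return math.ceil((max(values) - min(values)) / num_classes)

sample = [12, 7, 30, 18, 25, 9, 21, 15]     # hypothetical values, not from the notes
print(class_width(sample, num_classes=4))   # (30 - 7) / 4 = 5.75, rounded up to 6
print(class_width(sample))                  # number of classes estimated from n
```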
Example:
The following data give the total number of iPods sold by a mail order company on each of 30 days. Construct a frequency distribution table. (Hint: use 5 classes.)

8 25 11 15 29 22 10 5 17 21
22 13 26 16 18 12 9 26 20 16
23 14 19 23 20 16 27 9 21 14

Solution:
Number of classes = 5
$$\text{Class width} = \frac{\text{highest value} - \text{lowest value}}{\text{number of classes}} = \frac{29 - 5}{5} = 4.8 \approx 5$$
Frequency Distribution Table
Class Interval | Frequency, f
5 - 9 | 4
10 - 14 | 6
15 - 19 | 7
20 - 24 | 8
25 - 29 | 5
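The construction of this table can be checked with a short script. It is a sketch under the example's own choices (5 classes, first class starting at the smallest value); the variable names are illustrative.

```python
import math

# Number of iPods sold on each of 30 days (from the example).
sales = [8, 25, 11, 15, 29, 22, 10, 5, 17, 21,
         22, 13, 26, 16, 18, 12, 9, 26, 20, 16,
         23, 14, 19, 23, 20, 16, 27, 9, 21, 14]

num_classes = 5
width = math.ceil((max(sales) - min(sales)) / num_classes)  # ceil(4.8) = 5

# Start the first class at the smallest value, as in the example.
lower = min(sales)
table = []
for k in range(num_classes):
    lo = lower + k * width
    hi = lo + width - 1                    # integer data: class limits are lo..hi
    f = sum(lo <= v <= hi for v in sales)  # count values falling in this class
    table.append((lo, hi, f))

for lo, hi, f in table:
    print(f"{lo:2d} - {hi:2d}: {f}")
```

The frequencies come out as 4, 6, 7, 8 and 5, which sum to the 30 observations.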
2) Graphs for quantitative data are:
- Histogram (example: Student's CGPA)
- Polygon (example: Student's CGPA)
- Ogive (example: Student's CGPA)
DATA SUMMARY
Summary statistics are used to summarize a set of observations.
Two basic summary statistics are:
1) Measures of central tendency
- Mean
- Median
- Mode
2) Measures of dispersion
- Range
- Variance
- Standard deviation
MEASURES OF CENTRAL TENDENCY
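1) Mean ($\bar{x}$ / $\mu$)
The mean of a sample ($\bar{x}$) or population ($\mu$) is the sum of the data values divided by the total number of values.
Mean for ungrouped data is given by:
Sample:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x_i}{n}, \quad i = 1, 2, \dots, n$$
Population:
$$\mu = \frac{\sum x}{N}$$
Mean for grouped data is given by:
Sample:
$$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{n}$$
Population:
$$\mu = \frac{\sum_i f_i x_i}{\sum_i f_i} \quad \text{or} \quad \mu = \frac{\sum fx}{N}$$
where f = class frequency; x = class mark (midpoint); and $n = \sum f$ (or $N = \sum f$) is the total frequency.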
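Example:
1) Find the mean for the set of data 4, 6, 3, 1, 2, 5, 7.
Solution:
$$\bar{x} = \frac{4 + 6 + 3 + 1 + 2 + 5 + 7}{7} = \frac{28}{7} = 4$$
2) Find the mean of the frequency distribution table below.
[Frequency distribution table, including an (x)(f) column, not reproduced here]
Solution:
Therefore, the mean of the frequency distribution above is:
$$\bar{x} = \frac{\sum fx}{n} = \frac{161.75}{50} = 3.235$$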
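Both mean formulas can be sketched in a few lines. The function names are illustrative. Because the table for the second example is not available here, the grouped version is applied instead to the iPod frequency table from the earlier example, using class midpoints computed as (lower limit + upper limit)/2.

```python
def mean_ungrouped(data):
    """Sum of the values divided by the number of values."""
    return sum(data) / len(data)

def mean_grouped(midpoints, freqs):
    """Sum of f*x divided by the total frequency."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

print(mean_ungrouped([4, 6, 3, 1, 2, 5, 7]))   # 28 / 7 = 4.0

# iPod table: classes 5-9, 10-14, 15-19, 20-24, 25-29.
midpoints = [7, 12, 17, 22, 27]
freqs = [4, 6, 7, 8, 5]
print(mean_grouped(midpoints, freqs))          # 530 / 30
```

The grouped mean is an approximation, because every value in a class is treated as if it sat at the class midpoint.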
2) Median ($\tilde{x}$)
The median is the middle value of a set of observations arranged in ascending order and is normally denoted by $\tilde{x}$.
Median for ungrouped data:
- The median depends on the number of observations in the data, $n$.
- If $n$ is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations (the middle value).
- If $n$ is even, then the median is the average of the 2 middle values (the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2} + 1\right)$th observation).
Median for grouped data (frequency distribution)
The median of a frequency distribution is defined by:
$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right)$$
where
$L$ = the lower class boundary of the median class;
$c$ = the size of the median class interval;
$F_{j-1}$ = the sum of frequencies of all classes lower than the median class;
$f_j$ = the frequency of the median class;
and $\sum f = n$ is the total frequency.
Example:
1) Find the median for the set of data 4, 6, 3, 1, 2, 5, 7, 3.
Solution:
Arrange in order of magnitude: 1, 2, 3, 3, 4, 5, 6, 7.
As n = 8 (even), the median is the mean of the 4th and 5th values.
Therefore, the median is (3 + 4)/2 = 3.5.
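2) Find the median of the frequency distribution table below.
[Frequency distribution table, including a Cumulative Frequency column, not reproduced here]
Solution:
To determine the median class:
$$\frac{n}{2} = \frac{50}{2} = 25$$
So, the median falls in the class 3.00 – 3.25.
$$L = 3.00, \quad c = 3.25 - 3.00 = 0.25, \quad F_{j-1} = 12$$
$$\tilde{x} = 3.00 + 0.25\left(\frac{\frac{50}{2} - 12}{15}\right) = 3.217$$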
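The two median rules can be sketched as follows. The function and parameter names are illustrative; the grouped version takes the quantities from the worked solution (L = 3.00, c = 0.25, n = 50, cumulative frequency 12 below the median class, median class frequency 15) as inputs rather than reading them from a table.

```python
def median_ungrouped(data):
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # the ((n + 1) / 2)th observation
    return (s[mid - 1] + s[mid]) / 2   # average of the (n/2)th and (n/2 + 1)th

def median_grouped(L, c, n, F_prev, f_med):
    """L + c * ((n/2 - F_prev) / f_med) for the median class."""
    return L + c * ((n / 2 - F_prev) / f_med)

print(median_ungrouped([4, 6, 3, 1, 2, 5, 7, 3]))        # 3.5
print(round(median_grouped(3.00, 0.25, 50, 12, 15), 3))  # 3.217
```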
3) Mode ($\hat{x}$)
The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. The mode can also be used to describe qualitative data.
*Note:
- If a data set has only 1 value that occurs with the highest frequency, it has 1 mode and is called unimodal.
- If a data set has 2 values that occur with the highest frequency, it has 2 modes and is called bimodal.
- If a data set has more than 2 values that occur with the highest frequency, it has more than 2 modes and is called multimodal.
For ungrouped data:
- Defined as the value which occurs most frequently.
Example:
The mode for the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.
For grouped data
- When data have been grouped into classes and a frequency curve is drawn to fit the data, the mode is the value of $x$ corresponding to the maximum point on the curve.
- Determining the mode using the formula:
$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
where
$L$ = the lower class boundary of the modal class;
$c$ = the size of the modal class interval;
$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and
$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.
Note: The class which has the highest frequency is called the modal class.
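Example:
Find the mode of the frequency distribution table below.
[Frequency distribution table not reproduced here]
Solution:
$$L = 3.00, \quad c = 3.25 - 3.00 = 0.25$$
$$\Delta_1 = 15 - 10 = 5, \quad \Delta_2 = 15 - 13 = 2$$
$$\hat{x} = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$$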
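A short sketch of both mode calculations follows. `statistics.multimode` is a standard-library function (Python 3.8+) that returns every value tied for the highest frequency; `mode_grouped` and its parameter names are illustrative, with `d1` and `d2` standing for the two frequency differences in the formula (modal class frequency 15, with 10 before and 13 after, as in the worked solution).

```python
from statistics import multimode

def mode_grouped(L, c, d1, d2):
    """L + c * (d1 / (d1 + d2)) for the modal class."""
    return L + c * (d1 / (d1 + d2))

# Ungrouped data: one mode means the data set is unimodal.
print(multimode([4, 6, 3, 1, 2, 5, 7, 3]))                    # [3]

# Grouped data, with the values from the worked solution.
print(round(mode_grouped(3.00, 0.25, 15 - 10, 15 - 13), 3))   # 3.179
```

A list with two entries from `multimode` would indicate bimodal data, and more than two would indicate multimodal data.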
MEASURES OF DISPERSION
The measure of dispersion/spread is the degree to which a set of data tends to spread around the average value.
It shows whether the data set is concentrated around the mean or scattered.
The common measures of dispersion are:
1) Range
2) Variance
3) Standard deviation
The standard deviation is the square root of the variance.
The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.
1) Range
The simplest measure of dispersion. It applies to both grouped and ungrouped data.
Ungrouped data:
Formula: Range = Largest value – Smallest value
Grouped data:
Formula: Range = Largest value (class limit) – Smallest value (class limit)
Example:
State | Total Area (square miles)
Arkansas | 53,182
Louisiana | 49,651
Oklahoma | 69,903
Texas | 267,277
Solution:
Range = Largest value – Smallest value
= 267,277 – 49,651 = 217,626 square miles.
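The range calculation for the four states can be reproduced directly. This is a sketch, and the dictionary name is illustrative.

```python
# Total area in square miles, from the example table.
areas = {"Arkansas": 53_182, "Louisiana": 49_651,
         "Oklahoma": 69_903, "Texas": 267_277}

# Range = largest value - smallest value
value_range = max(areas.values()) - min(areas.values())
print(value_range)   # 217626
```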
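2) Variance ($s^2$ / $\sigma^2$)
Measures the variability in a set of data.
The variance for ungrouped data:
Sample:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Population:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
The variance for grouped data:
Sample:
$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$
Population:
$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$$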
Example:
The variance for grouped data:
Solution:
$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{528.031 - \frac{(161.75)^2}{50}}{49} = 0.0973$$
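The sample variance formulas can be sketched as below. The function names are illustrative. The grouped version uses the computational form with the sums from the worked solution (sum of fx² = 528.031, sum of fx = 161.75, n = 50), and the ungrouped version is applied to the data set from the earlier mean example.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_grouped(sum_fx2, sum_fx, n):
    """(sum fx^2 - (sum fx)^2 / n) / (n - 1)."""
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(sample_variance([4, 6, 3, 1, 2, 5, 7]))                  # 28 / 6
print(round(sample_variance_grouped(528.031, 161.75, 50), 4))  # 0.0973
```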
3) Standard Deviation ($s$ / $\sigma$)
The positive square root of the variance is the standard deviation.
A larger value of the standard deviation means the values of the data set are spread relatively far from the mean.
A smaller value of the standard deviation means the values of the data set are spread relatively close to the mean.
The standard deviation for ungrouped data:
Sample:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Population:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
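The standard deviation for grouped data:
Sample:
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$$
Population:
$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}}$$
Example:
From the previous example,
$$s = \sqrt{\frac{528.031 - \frac{(161.75)^2}{50}}{49}} = \sqrt{0.0973} = 0.3119$$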
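As a sketch, the grouped standard deviation is the square root of the grouped variance. The function name is illustrative and the sums are those of the worked example. The script also shows the effect of rounding: taking the square root of the already rounded variance 0.0973 gives 0.3119, while carrying full precision through gives a value that rounds to 0.312.

```python
import math

def sample_std_grouped(sum_fx2, sum_fx, n):
    """Square root of (sum fx^2 - (sum fx)^2 / n) / (n - 1)."""
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

s = sample_std_grouped(528.031, 161.75, 50)
print(round(s, 3))                   # 0.312 (full precision carried through)
print(round(math.sqrt(0.0973), 4))   # 0.3119 (variance rounded first)
```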
EXERCISE
The final results in business statistics of 40 students are recorded below:

59 90 71 77 64 89 83 70 81 73
68 75 90 82 88 94 61 69 79 86
63 60 84 79 76 65 72 68 71 72
63 68 76 78 90 74 76 62 75 86

a) Present the data in a frequency distribution table.
b) Construct a histogram.
c) Calculate the mean, median, mode, variance and standard deviation.