aed1222 lesson 5

Introduction to Statistics for Built Environment Course Code: AED 1222 Compiled by DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED) CENTRE FOR FOUNDATION STUDIES (CFS) INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

Upload: nurun2010

Post on 14-May-2015


TRANSCRIPT

Page 1: Aed1222 lesson 5

Introduction to Statistics for Built Environment

Course Code: AED 1222

Compiled by DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED)

CENTRE FOR FOUNDATION STUDIES (CFS) INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

Page 2: Aed1222 lesson 5

Lecture 5: Summarizing Quantitative Data 1

Today’s Lecture: Summarizing Quantitative Data:

• The data array
• Frequency Distribution
• Relative Frequency Distribution
• Cumulative Frequency Distribution

Page 3: Aed1222 lesson 5

An overview of common data presentation:

[Diagram: Data splits into Qualitative and Quantitative, each summarized in Tabular and Graphical form. Qualitative: Frequency Distribution, Rel. Freq. Distribution (tabular); Bar Graph, Pie Chart (graphical). Quantitative: Frequency Distribution, Rel. Freq. Distribution, Cumulative Freq. Dist. (tabular); Histograms & Polygons, Stem and Leaf Plot, Ogives (graphical). The diagram also shows Contingency Table and the labels LECTURE 4 and LECTURE 6.]

Page 4: Aed1222 lesson 5

Raw data

Raw data (sometimes called source data or atomic data) is data that has not been processed for use. A distinction is sometimes made between data and information to the effect that information is the end product of data processing.

Although raw data has the potential to become "information," it requires selective extraction, organization, and sometimes analysis and formatting for presentation.

The simplest way of systematically organizing raw data is the DATA ARRAY.

Page 5: Aed1222 lesson 5

The data array

The data array is an arrangement of data items in either ascending order (from lowest to highest value) or descending order (from highest to lowest value).

The advantages of the data array:
• Identifying the range of data, which is the difference between the largest and smallest numbers in the data set.
• Identifying the upper and lower halves of the data.
• Showing the presence of large concentrations of items at particular values.

Page 6: Aed1222 lesson 5

The data array cont.

In spite of these advantages, the array is an awkward data organization tool, especially when the number of data items is very large.

Therefore, there is a need to arrange the data into a more compact form for analysis and communication purposes.

Page 7: Aed1222 lesson 5

Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.

The data array cont.

Example: A manufacturer of insulation randomly selects 20 days and records the daily high temperature.

RAW DATA

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Sort raw data in ascending order:

DATA ARRAY

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Insulation manufacturer 20 days high temperature record.
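As a minimal sketch in Python, the raw temperature readings can be sorted into a data array, and the range and the lower and upper halves can then be read off directly. The variable names are illustrative and are not part of the lecture.

```python
# Daily high temperatures for the 20 sampled days (raw data from the example).
raw_data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
            32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# The data array is the raw data sorted in ascending order.
data_array = sorted(raw_data)

# Range: difference between the largest and smallest values.
data_range = data_array[-1] - data_array[0]

# Lower and upper halves (20 items split evenly into 10 and 10).
half = len(data_array) // 2
lower_half = data_array[:half]
upper_half = data_array[half:]

print("Data array:", data_array)
print("Range:", data_range)
print("Lower half:", lower_half)
print("Upper half:", upper_half)
```

Sorting in descending order instead would only require `sorted(raw_data, reverse=True)`.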

Page 8: Aed1222 lesson 5

Constructing a frequency table

To construct a frequency distribution table, it is necessary to determine the following:

1. The range of the collected data.

2. The number of classes that will be used to group the data.

3. The width of these classes.

4. The class boundaries.

5. The frequency of each class (counted from the data collected).

Page 9: Aed1222 lesson 5

Determining the number of classes

Few Classes: Fewer classes with a very large width can result in the loss of important detail.

Many Classes: Many classes with small width can be used for preliminary analysis, but may contain too much detail to be used in a formal data presentation.

How to determine the number of classes?

The number of classes depends on the number of observations being grouped, the purpose of the distribution, and the preference of the researcher.

Page 10: Aed1222 lesson 5

Determining the number of classes cont.

In formal presentations, the number of classes used to group the data generally varies from 5 to 20.

The key is to use classes that give you a good view of the data pattern and enable you to gain insights into the information that is there.

• Therefore, the researcher has to determine the number of classes that best suits the study.

Page 11: Aed1222 lesson 5


Determining the number of classes cont.

General Guidelines

Number of Data Points    Number of Classes
under 50                 5 - 7
50 – 100                 6 - 10
100 – 250                7 - 12
over 250                 10 - 20

– Class widths can typically be reduced as the number of observations increases

– Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful

Page 12: Aed1222 lesson 5

Determining class interval

How to determine Class Interval?

Class Interval must satisfy two conditions:
1. All data items from the smallest to the largest must be included.
2. Each item must be assigned to only one class, i.e. no gaps or overlapping among classes.

The width of each class (the class interval) should be equal.

To determine the interval of each class, divide the range (the difference between the highest and lowest items in the data set) by the desired number of classes, and then round up.

Page 13: Aed1222 lesson 5


Determining the class interval cont.

• The class width is the distance between the lowest possible value and the highest possible value for a frequency class.

The class width formula is:

$$W = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$$
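The class width calculation can be sketched in a few lines of Python, using the temperature example (largest value 58, smallest value 12, 5 classes). The function name `class_width` is hypothetical, and rounding up to the next whole number is an assumption. In practice the width is often rounded up to a convenient value instead.

```python
import math

def class_width(largest, smallest, num_classes):
    """Divide the range by the number of classes and round up to a whole number."""
    return math.ceil((largest - smallest) / num_classes)

# Temperature example: range = 58 - 12 = 46, and 46 / 5 = 9.2.
width = class_width(58, 12, 5)
print("Class width:", width)
```

Note that when the division comes out exact, `math.ceil` does not increase the result, so it is worth checking that the largest value still falls inside the last class.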

Page 14: Aed1222 lesson 5

Class Interval & Boundary

Table: Number of respondents by age and gender.

• 25 = lower class limit
• 34 = upper class limit
• Class midpoint: (35 + 44)/2 = 39.5
• Open class interval

Page 15: Aed1222 lesson 5

Class interval & boundary cont.

Table: Heights of 100 male students at XYZ University.

Includes all measurements from 62.5 in. to 65.5 in. (class boundary)

62.5 = lower class boundary
65.5 = upper class boundary

Size of class interval = Upper class boundary - Lower class boundary
65.5 – 62.5 = 3
68.5 – 65.5 = 3
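The two calculations on these slides, the class midpoint and the size of the class interval, can be sketched in Python as follows. The function names are illustrative only. The numbers are the ones quoted on the slides.

```python
def class_midpoint(lower_limit, upper_limit):
    """Midpoint of a class: the average of its lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

def class_size(lower_boundary, upper_boundary):
    """Size of a class interval: upper class boundary minus lower class boundary."""
    return upper_boundary - lower_boundary

midpoint = class_midpoint(35, 44)      # limits 35 and 44 from the age table
size_first = class_size(62.5, 65.5)    # boundaries from the heights table
size_second = class_size(65.5, 68.5)

print("Midpoint:", midpoint)
print("Class sizes:", size_first, size_second)
```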

Page 16: Aed1222 lesson 5


Constructing a frequency distribution table cont.

Back to the earlier example: Insulation manufacturer 20 days high temperature record.

Sorted raw data from low to high (DATA ARRAY):

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Then:

1. Find range: 58 - 12 = 46

2. Select number of classes: 5 (usually between 5 and 20)

3. Compute class width: 10 (46/5 = 9.2, then round up)

4. Determine class boundaries: 10, 20, 30, 40, 50, 60. (Sometimes class midpoints are reported: 15, 25, 35, 45, 55)

5. Count the number of values in each class
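The counting step can be sketched in Python using the data array and the class boundaries 10, 20, 30, 40, 50, 60 from the example. One assumption is made here that the slides do not spell out: each class includes its lower boundary and excludes its upper boundary, so a value such as 30 is counted in the class "30 but less than 40".

```python
data_array = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
              32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

boundaries = [10, 20, 30, 40, 50, 60]

# Pair consecutive boundaries into classes: (10, 20), (20, 30), ...
classes = list(zip(boundaries, boundaries[1:]))

# Assumption: lower boundary inclusive, upper boundary exclusive,
# so every value is assigned to exactly one class.
frequencies = []
for lower, upper in classes:
    count = sum(1 for x in data_array if lower <= x < upper)
    frequencies.append(count)
    print(f"{lower} but less than {upper}: {count}")

print("Total:", sum(frequencies))
```

The total should equal the number of observations (20), which is a quick check that no value was missed or counted twice.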

Page 17: Aed1222 lesson 5

Constructing a frequency distribution table cont.

Example (Cont.): Insulation manufacturer 20 days high temperature record.

Classes: 5
Width: 10

Sorted raw data from low to high (DATA ARRAY):

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Page 18: Aed1222 lesson 5

Frequency Distribution

Why use Frequency Distribution?

• Frequency distribution tables provide insights about the data that cannot be quickly obtained by looking only at the original data (raw data).

• In addition, it is a method of organizing data items into a compact form without obscuring (covering) essential facts.

• This purpose is achieved by grouping the data into a relatively small number of classes.

• Therefore, a frequency distribution (for quantitative data) groups data items into classes and then records the number of items that appear in each class.

Page 19: Aed1222 lesson 5

Relative frequency

Why use Relative Frequency?

• The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

• A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

• Relative frequencies can be written as fractions, percents, or decimals.
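As an illustration, the sketch below computes relative frequencies for the 20 temperature readings from the insulation example and prints each one as a fraction, a decimal, and a percent. The class boundaries (10 to 60 in steps of 10) come from that example. Treating each class as lower-inclusive and upper-exclusive is an assumption.

```python
from fractions import Fraction

data_array = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
              32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
boundaries = [10, 20, 30, 40, 50, 60]
classes = list(zip(boundaries, boundaries[1:]))

total = len(data_array)
relative = []
for lower, upper in classes:
    # Assumption: a value equal to a boundary belongs to the class it opens.
    count = sum(1 for x in data_array if lower <= x < upper)
    fraction = Fraction(count, total)   # reduced automatically, e.g. 6/20 -> 3/10
    relative.append(float(fraction))
    print(f"{lower} but less than {upper}: "
          f"{fraction} = {float(fraction):.2f} = {float(fraction):.0%}")

print("Sum of relative frequencies:", round(sum(relative), 10))
```

The relative frequencies of all classes always add up to 1 (or 100%).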

Page 20: Aed1222 lesson 5

Cumulative frequency

What is a Cumulative frequency?

• Cumulative frequency analysis is the analysis of how frequently the values of a phenomenon fall at or below a reference value.

• That is, it tells how often the value of the variable is less than or equal to a particular reference value.
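The definition can be tried directly on a sorted data array. The sketch below reuses the 20 temperature readings from the insulation example and counts how many values are less than or equal to a reference value. The function name `cumulative_frequency` is illustrative.

```python
from bisect import bisect_right

data_array = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
              32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def cumulative_frequency(sorted_values, reference):
    """Count the values that are less than or equal to the reference value.

    bisect_right returns the position just after the last item <= reference,
    which equals that count when the list is sorted in ascending order.
    """
    return bisect_right(sorted_values, reference)

at_most_30 = cumulative_frequency(data_array, 30)
at_most_58 = cumulative_frequency(data_array, 58)
print("Days with a high of 30 or less:", at_most_30)
print("Days with a high of 58 or less:", at_most_58)
```

This relies on the data already being arranged as an ascending array. For unsorted data, sort first.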

Page 21: Aed1222 lesson 5

Cumulative frequency cont.

Table: Time spent by students surfing the internet.

Surfing time   No. of students   Cumulative         Relative          Percentage
(minutes)      (frequency)       frequency          frequency
300-399        14                14 + 0 = 14        14/400 = 0.035    3.5
400-499        46                14 + 46 = 60       46/400 = 0.115    11.5
500-599        58                60 + 58 = 118      58/400 = 0.145    14.5
600-699        76                118 + 76 = 194     76/400 = 0.19     19.0
700-799        68                194 + 68 = 262     68/400 = 0.17     17.0
800-899        62                262 + 62 = 324     62/400 = 0.155    15.5
900-999        48                324 + 48 = 372     48/400 = 0.12     12.0
1000-1099      22                372 + 22 = 394     22/400 = 0.055    5.5
1100-1199      6                 394 + 6 = 400      6/400 = 0.015     1.5

From the table above, we can state that:

• 118 students surfed the internet for up to 599 minutes (i.e. 599 minutes or less)

• 324 students surfed the internet for up to 899 minutes (i.e. 899 minutes or less)
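The cumulative frequency, relative frequency, and percentage columns of the surfing-time table can be reproduced from the frequency column alone. The following is a small Python sketch. The class labels and frequencies are taken from the table, and the variable names are illustrative.

```python
from itertools import accumulate

classes = ["300-399", "400-499", "500-599", "600-699", "700-799",
           "800-899", "900-999", "1000-1099", "1100-1199"]
frequencies = [14, 46, 58, 76, 68, 62, 48, 22, 6]

total = sum(frequencies)

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

# Each frequency divided by the total gives the relative frequency column.
relative = [f / total for f in frequencies]

for label, f, c, r in zip(classes, frequencies, cumulative, relative):
    print(f"{label:<10} {f:>4} {c:>5} {r:>7.3f} {r * 100:>6.1f}")

# Reading a cumulative value: students who surfed for up to 599 minutes.
up_to_599 = cumulative[classes.index("500-599")]
up_to_899 = cumulative[classes.index("800-899")]
print("Up to 599 minutes:", up_to_599)
print("Up to 899 minutes:", up_to_899)
```

The last cumulative frequency always equals the total number of observations (here 400), which is a convenient check on the arithmetic.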

Page 22: Aed1222 lesson 5

An exercise

Conduct a survey of the number of siblings (brothers and sisters) each student in your group has.

1. Arrange the obtained raw data in an ascending array.
2. Group the data and create a frequency table.
3. Add to it a cumulative frequency column and a relative frequency column.

Answer the following questions:

1. What is the range of the data?
2. Identify the upper and lower halves of the data.
3. What percentage of the students have from 2 to 3 siblings?
4. What percentage of the students have fewer than 4 siblings?
5. How many students have up to 5 siblings?