Lecture 02 Dr. MUMTAZ AHMED MTH 161: Introduction To Statistics

Upload: terence-robinson

Post on 20-Jan-2016

Page 1: Lecture 02 Dr. MUMTAZ AHMED MTH 161: Introduction To Statistics

Lecture 02

Dr. MUMTAZ AHMED

MTH 161: Introduction To Statistics

Objectives

Methods of Data Presentation
Classification of Data
  Bases of Classification
  Types of Classification
Tabulation of Data
  Types of Tabulation
  Constructing a Statistical Table
  General Rules of Tabulation
Table of Frequency Distributions
  Frequency Distribution
  Relative Frequency Distribution
  Cumulative Frequency Distribution

Organizing Data

After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.

Raw Data: Data which is not organized is called raw data.

Un-Grouped Data: Data in its original form is called Un-Grouped Data.

Note: Raw data is also called ungrouped data.

Different Ways of Organizing Data

To get an understanding of the data, it is organized and arranged into a meaningful form.

This is done by the following methods:

Classification
Tabulation (e.g. simple tables, frequency tables, stem and leaf plots etc.)
Graphs (Bar Graph, Pie Chart, Histogram, Frequency Ogive etc.)

Classification of Data

The process of arranging data into homogeneous groups or classes according to some common characteristic present in the data is called classification.

Example:

When letters are sorted in a post office, they are first classified according to cities and then further arranged according to streets.

Bases of Classification

There are four important bases of classification:

Qualitative Base
Quantitative Base
Geographical Base
Chronological or Temporal Base

Bases of Classification

Qualitative Base:

The data are classified according to some quality or attribute, such as sex, religion etc.

Quantitative Base:

The data are classified by quantitative characteristics, such as heights, weights, ages, income etc.

Bases of Classification

Geographical Base:

The data are classified by geographical region or location, such as states, provinces, cities, countries etc.

Chronological or Temporal Base:

The data are classified or arranged by their time of occurrence, such as years, months, weeks, days etc. (e.g. time series data).

Types of Classification

There are three main types of classification:

One-way Classification
Two-way Classification
Multi-way Classification

One-way Classification

If we classify the observed data with respect to a single characteristic, the classification is known as one-way classification.

Example:

The population of the world may be classified by religion as Muslim, Christian etc.

Two-way Classification

If we classify the observed data by two characteristics at a time, the classification is known as two-way classification.

Example:

The population of the world may be classified by Religion and Sex.

Multi-way Classification

If we classify the observed data by more than two characteristics at a time, the classification is known as multi-way classification.

Example:

The population of the world may be classified by Religion, Sex and Literacy.

Tabulation of Data

The process of placing classified data into tabular form is known as tabulation.

A table is a systematic arrangement of statistical data in rows and columns.

Rows are horizontal arrangements whereas columns are vertical arrangements.

Types of Tabulation

There are three types of tabulation:

Simple or One-way Table
Double or Two-way Table
Complex or Multi-way Table

Simple or One-way Table

When the data are tabulated according to one characteristic, the result is called simple tabulation or one-way tabulation.

Example:

Tabulation of data on the population of the world classified by one characteristic, such as Religion, is an example of simple tabulation.

Double or Two-way Table

When the data are tabulated according to two characteristics at a time, the result is called double tabulation or two-way tabulation.

Example:

Tabulation of data on the population of the world classified by two characteristics, such as Religion and Sex, is an example of double tabulation.
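
As a rough illustration of a two-way table, the sketch below counts records by two characteristics at once. The six records are hypothetical, invented only for this example; they are not data from the lecture.

```python
from collections import Counter

# Hypothetical (religion, sex) records, invented only for illustration.
records = [
    ("Muslim", "Male"), ("Christian", "Female"), ("Muslim", "Female"),
    ("Muslim", "Male"), ("Christian", "Male"), ("Muslim", "Female"),
]

# One count per (row characteristic, column characteristic) combination.
cell_counts = Counter(records)
rows = sorted({religion for religion, _ in records})  # stub entries
cols = sorted({sex for _, sex in records})            # column captions

# Print the box head, then one line per stub entry with a row total.
print(f"{'Religion':<12}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [cell_counts[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{n:>8}" for n in counts) + f"{sum(counts):>8}")
```

Dropping the second characteristic from each record would give a simple (one-way) table; adding a third would give a complex (multi-way) table.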

Complex or Multi-way Table

When the data are tabulated according to many characteristics (generally more than two), the result is called complex tabulation.

Example:

Tabulation of data on the population of the world classified by three characteristics, such as Religion, Sex and Literacy, is an example of complex tabulation.

Construction of Statistical Table

A statistical table has at least four major parts and some other minor parts.

Major parts:
The Title
The Box Head (column captions)
The Stub (row captions)
The Body

Minor parts:
Prefatory Notes
Foot Notes
Source Notes

General Sketch of Table

THE TITLE
(Prefatory Notes)

                 Box Head
Row Caption      Column Caption
Stub Entries     The Body

Foot Notes…
Source Notes…

General Sketch of Table

THE TITLE

A title is the main heading, written in capital letters and shown at the top of the table.

It must explain the contents of the table and throw light on the table as a whole.

Different parts of the heading can be separated by commas, and no full stop should be used in the title.

General Sketch of Table

THE Box Head (Column Captions)

The vertical headings and subheadings of the columns are called column captions.

The space where these column headings are written is called the box head.

Only the first letter of a column caption is written as a capital; the remaining letters must be written in small letters.

General Sketch of Table

THE Stub (Row Captions)

The horizontal headings and subheadings of the rows are called row captions.

The space where these row headings are written is called the stub.

General Sketch of Table

THE Body

It is the main part of the table which contains the numerical information classified with respect to row and column captions.

General Sketch of Table

Prefatory Notes

A prefatory note is a statement given below the title and enclosed in brackets. It usually describes the units of measurement.

General Sketch of Table

Foot Notes

Foot notes appear immediately below the body of the table and provide additional explanation.

General Sketch of Table

Source Notes

The source note is given at the end of the table and indicates the source from which the information has been taken.

It includes information about the compiling agency, publication etc.

General Rules of Tabulation

A table should be simple and attractive. A complex table may be broken into relatively simple tables.

Headings for columns and rows should be proper and clear.

Suitable approximation may be adopted and figures may be rounded off. But this should be mentioned in the prefatory note or in the foot note.

The unit of measurement and nature of data should be well defined.

Organizing Data via Frequency Tables

One method for simplifying and organizing data is to construct a frequency distribution.

Frequency Distribution: The organization of a set of data in a table showing the distribution of the data into classes or groups together with the number of observations in each class or group is called a Frequency Distribution.

Class Frequency: The number of observations falling in a particular class is called class frequency or simply frequency, denoted by ‘f’.

Grouped Data: Data presented in the form of a frequency distribution is called grouped data.

Why Use Frequency Distributions?

A frequency distribution is a way to summarize data.

A frequency distribution condenses the raw data into a more meaningful form.

A frequency distribution allows for a quick visual interpretation of the data.

Frequency Distributions can be drawn for qualitative data as well as quantitative data.

Frequency Distribution of Discrete Data

Example: Number of children in 20 families.

2 3 1 3 2 5 4 1 4 2 3 5 2 5 2 1 3 1 2 0

Construct an ungrouped (discrete) frequency distribution.

No. of Children   Tally      No. of Families (frequency) f
0                 |          1
1                 ||||       4
2                 ||||| |    6
3                 ||||       4
4                 ||         2
5                 |||        3
Total                        20

Interpretation: There is 1 family with no children, 4 families with 1 child, 6 families with 2 children, 4 families with 3 children, 2 families with 4 children and 3 families with 5 children.
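
The tally above can be reproduced with a short script. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter

# Number of children in each of the 20 families (data from the example).
children = [2, 3, 1, 3, 2, 5, 4, 1, 4, 2, 3, 5, 2, 5, 2, 1, 3, 1, 2, 0]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(children)

print("No. of Children  Tally     f")
for value in sorted(frequency):
    f = frequency[value]
    # One stroke per observation stands in for the hand-drawn tally marks.
    print(f"{value:<16} {'|' * f:<9} {f}")
print(f"{'Total':<16} {'':<9} {sum(frequency.values())}")
```

The frequencies should always add up to the number of observations (here 20), which is a quick check that no value was missed.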

Grouped Frequency Distribution

Sometimes, when the data are continuous or cover a wide range of values, listing every distinct value becomes burdensome because the list would be too long.

To remedy this situation, a grouped frequency distribution table is used.

Grouped Frequency Distribution for Continuous Data

Example (Temperature Data):

Temperature of 20 winter days in Pakistan is recorded below:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Construct frequency distribution.

Note: Temperature is a continuous variable because it could be measured to any degree of precision desired.

Steps in Constructing Grouped Frequency Distribution

Sort raw data from low to high:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Find range:

Range = maximum value - minimum value = 58 - 12 = 46

Select number of classes: 5 (usually between 5 and 20)

Compute class width:

Class width = Range / number of classes = 46 / 5 = 9.2, rounded up to 10

Determine class limits: 11-20, 21-30, 31-40, 41-50, 51-60

(Note: the above classes should cover the full data)

Count the number of values in each class
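
The steps above can be sketched in code as follows. The starting lower limit of 11 is simply the lecture's choice and is taken as given here; the variable names are illustrative assumptions.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)                      # step 1: sort low to high
data_range = max(data) - min(data)        # step 2: 58 - 12 = 46
num_classes = 5                           # step 3: chosen by the analyst
# Step 4: 46 / 5 = 9.2, rounded up to the next whole number.
class_width = math.ceil(data_range / num_classes)

# Step 5: class limits. The first lower limit (11) is the lecture's choice.
first_lower = 11
classes = [(first_lower + i * class_width,
            first_lower + (i + 1) * class_width - 1)
           for i in range(num_classes)]

# Step 6: count the values falling within each pair of (inclusive) limits.
frequency = {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in classes}

for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}: {f}")
```

Inclusive limits work here because the recorded temperatures are whole numbers; for measurements with decimals, class boundaries would be needed to avoid gaps between classes.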

Frequency Distribution of Grouped Data

Sorted Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution (Temp Data)

Classes   Tally       Frequency (f)
11-20     |||         3
21-30     ||||| ||    7
31-40     ||||        4
41-50     ||||        4
51-60     ||          2
Total                 20

Frequency Distribution of Qualitative Data

Political Party Affiliations: Professor X asked his introductory statistics students to state their political party affiliations as PML-N (N), PPP (P), PTI or PML-Q (Q). The responses of the 30 students in the class are:

PPP N Q PTI N Q N PPP PTI N
PTI N PTI PPP N Q N PTI Q PTI
PPP PTI N PTI Q PTI N PTI Q PPP

Construct a frequency distribution.

Party       Tally          Freq (f)
PTI         ||||| |||||    10
PML-N (N)   ||||| ||||     9
PML-Q (Q)   ||||| |        6
PPP (P)     |||||          5
Total                      30

Interpretation: Out of 30 students in the class, 10 are in favor of PTI, 9 are in favor of PML-N, 6 are in favor of PML-Q and 5 are in favor of PPP.

Relative Frequency Distribution

Relative Frequency is the ratio of the frequency to the total number of observations.

Relative frequency = Frequency / Number of observations

Example:

Relative frequency of students who favored PTI = 10/30 = 0.3333 = 33.33%

Relative frequency of students who favored PML-N = 9/30 = 0.30 = 30%

Relative frequency of students who favored PML-Q = 6/30 = 0.20 = 20%

Relative frequency of students who favored PPP = 5/30 = 0.1667 = 16.67%

Frequency Distribution of Qualitative Data

Party Affiliation Example:

Party       Freq (f)   Relative Freq
PTI         10         10/30 = 0.3333
PML-N (N)   9          9/30 = 0.30
PML-Q (Q)   6          6/30 = 0.20
PPP (P)     5          5/30 = 0.1667
Total       30         1

Interpretation: Out of 30 students in the class, 33.3% are in favor of PTI, 30% are in favor of PML-N, 20% are in favor of PML-Q and 16.7% are in favor of PPP.
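
The relative frequencies can be checked with a few lines of code. This is a minimal sketch; the responses are the 30 answers listed in the example, read as three rows of ten, and the variable names are illustrative.

```python
from collections import Counter

# The 30 responses from the party affiliation example.
responses = """
PPP N Q PTI N Q N PPP PTI N
PTI N PTI PPP N Q N PTI Q PTI
PPP PTI N PTI Q PTI N PTI Q PPP
""".split()

frequency = Counter(responses)
total = sum(frequency.values())

# Relative frequency = frequency / number of observations.
relative = {party: f / total for party, f in frequency.items()}

# List the parties from most to least frequent.
for party, f in frequency.most_common():
    print(f"{party:<5} {f:>3}  {relative[party]:.4f}  {relative[party]:.2%}")
```

Because every observation belongs to exactly one category, the relative frequencies add up to 1 (or 100%).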

Cumulative Frequency Distribution

Cumulative Frequency:

The total frequency of a variable from one end up to a certain value (usually an upper class boundary in grouped data), called the base, is known as the cumulative frequency "less than" or "more than" that base.

Cumulative Frequency Distribution:

The table showing cumulative frequencies is called cumulative frequency distribution.

Cumulative Frequency Distribution

Constructing Class Boundaries: Take the difference between the lower limit of the second class and the upper limit of the first class (e.g. 21 - 20 = 1), then divide this difference by 2 (i.e. 1/2 = 0.5). Subtract the result (0.5) from the lower class limit of each class and add it to the upper class limit of each class. The newly obtained classes are called Class Boundaries (C.B).

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20
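
The boundary construction described above can be sketched as follows, assuming the class limits 11-20, ..., 51-60 from the temperature example. The variable names are illustrative.

```python
# Class limits from the temperature example.
class_limits = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]

# Half the gap between the upper limit of the first class and the
# lower limit of the second class: (21 - 20) / 2 = 0.5.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Subtract the half gap from each lower limit and add it to each upper limit.
class_boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in class_limits]

for (lo, hi), (lb, ub) in zip(class_limits, class_boundaries):
    print(f"{lo}-{hi}  ->  {lb}-{ub}")
```

After this adjustment the upper boundary of each class coincides with the lower boundary of the next, so the classes cover the scale without gaps.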

Less than Cumulative Frequency Distribution

Less than Cumulative Frequency Distribution of Temp Data

Class Boundaries   Cumulative Frequency
Less than 10.5     0
Less than 20.5     3
Less than 30.5     3 + 7 = 10
Less than 40.5     10 + 4 = 14
Less than 50.5     14 + 4 = 18
Less than 60.5     18 + 2 = 20

The cumulative frequencies are running totals of the class frequencies:

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20
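
A "less than" cumulative frequency column is a running total of the class frequencies. The sketch below recomputes it from the 20 recorded temperatures; the variable names are illustrative, and `itertools.accumulate` is used only as a convenient way to form running sums.

```python
from itertools import accumulate

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10.5, 20.5, 30.5, 40.5, 50.5, 60.5]

# Class frequencies: values lying between consecutive class boundaries.
frequencies = [sum(lo < x < hi for x in temps)
               for lo, hi in zip(boundaries, boundaries[1:])]

# Running totals; the leading 0 is the count below the first boundary.
less_than = [0] + list(accumulate(frequencies))

for base, cf in zip(boundaries, less_than):
    print(f"Less than {base}: {cf}")
```

The last cumulative frequency equals the total number of observations, which gives a simple check on the arithmetic.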

More than Cumulative Frequency Distribution

More than Cumulative Frequency Distribution of Temp Data

Class Boundaries   Cumulative Frequency
More than 10.5     20
More than 20.5     20 - 3 = 17
More than 30.5     17 - 7 = 10
More than 40.5     10 - 4 = 6
More than 50.5     6 - 4 = 2
More than 60.5     2 - 2 = 0

Each cumulative frequency is obtained by subtracting the class frequencies in turn from the total:

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20
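
The "more than" column can also be obtained by counting directly from the raw data, which gives an independent check on the subtraction. This is a minimal sketch with illustrative variable names.

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10.5, 20.5, 30.5, 40.5, 50.5, 60.5]

# For each base, count the observations that exceed it.
more_than = [sum(x > base for x in temps) for base in boundaries]

for base, cf in zip(boundaries, more_than):
    print(f"More than {base}: {cf}")
```

Since no observation falls exactly on a class boundary, the "less than" and "more than" cumulative frequencies at any base add up to the total of 20.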

Stem and Leaf Plot

Disadvantage of Frequency Table:

An obvious disadvantage of using a frequency table is that the identity of individual observations is lost in the grouping process.

A stem and leaf plot addresses this by offering a quick and clear way of sorting and displaying the data simultaneously.

Stem and Leaf Plot

METHOD:

Sort the data series.
Separate the sorted data series into leading digits (the stems) and trailing digits (the leaves). e.g. In 13, the leading digit (stem) is 1 and the trailing digit (leaf) is 3; in 21, the leading digit (stem) is 2 and the trailing digit (leaf) is 1.
List all stems in a column from low to high.
For each stem, list all associated leaves.

Stem and Leaf Plot

Example 1: Consider the temp data again.

The sorted data from low to high is shown below:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Here, use the 10's digit for the stem unit:

Value   Stem   Leaf
13      1      3
21      2      1
35      3      5

Stem and Leaf Plot

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Completed Stem-and-leaf diagram

Stem   Leaf
1      2 3 7
2      1 4 4 6 7 7
3      0 2 5 7 8
4      1 3 4 6
5      3 8
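
The stem and leaf method can be sketched in code as follows, using the 20 temperatures as originally recorded (which include 27 twice). Splitting each value with `divmod(value, 10)` assumes two-digit whole numbers, as in this example; the variable names are illustrative.

```python
from collections import defaultdict

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

stems = defaultdict(list)
for value in sorted(temps):              # sort first so leaves come out ordered
    stem, leaf = divmod(value, 10)       # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

print("Stem | Leaf")
for stem in sorted(stems):               # list stems from low to high
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

Unlike the grouped frequency table, every original observation can be read back from the display by joining a stem with one of its leaves.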

Review

Let’s review the main concepts:

Methods of Data Presentation
Classification of Data
  Bases of Classification
  Types of Classification
Tabulation of Data
  Types of Tabulation
  Constructing a Statistical Table
  General Rules of Tabulation
Table of Frequency Distributions
  Frequency Distribution
  Relative Frequency Distribution
  Cumulative Frequency Distribution

Next Lecture

In the next lecture, we will study:

Graphical Methods of Data Presentation
Graphs for qualitative data
  Bar Charts
    Simple Bar Chart
    Multiple Bar Chart
    Component Bar Chart
  Pie Charts
Graphs for quantitative data
  Histograms
  Frequency Polygon
  Cumulative Frequency Polygon (Frequency Ogive)