data collection and presentation
Data Collection & Presentation
Presented by:
Nasif Hassan Khan Abir, ID # 61531-24-007
Md. Ferdaus Alam, ID # 61531-24-010
Zakir Husain, ID # 61325-18-058
Md. Faruqul Islam, ID # 61325-18-029
Data
Data Collection
The collection, organization, and presentation of data are basic background material for learning descriptive and inferential statistics and their applications.
Method of Collecting Data
On the basis of the source of collection, data may be classified as:
- Primary data
- Secondary data
Types of Data
There are two types of data:
- Numerical data
- Categorical data
Collection of Data
Collection of Data
Primary Data
Data collected originally, for the first time, for the purpose of the survey are called primary data. For example, an investigator may collect facts about the habit of taking tea or coffee in a village.
Method of Collecting Primary Data
There are several methods for collecting primary data. Some of them are:
- Direct personal investigation
- Indirect investigation
- Through correspondents
- By mailed questionnaire
- Through schedules
Collection of Data (cont’d)
Secondary Data
When we use data that have already been collected by others, the data are called secondary data. Such data are primary for the agency that collects them first and secondary for all other users.
Method of Collecting Secondary Data
- Published reports of newspapers, RBI, and periodicals
- Publications from trade associations
- Financial data reported in annual reports
- Information from official publications
- Publications of international bodies such as UNO, World Bank, etc.
- Internal reports of government departments
- Records maintained by institutions
- Research reports prepared by students in universities
Types of Data
Categorical Data
Categorical data is the statistical data type consisting of categorical variables, or of data that has been converted into that form, for example as grouped data. Examples: marital status, political party, eye color.
Numerical Data
Numerical data are values or observations that can be counted or measured, and these numbers can be placed in ascending or descending order. Numerical data can be divided into two groups:
- Discrete (counted items, such as number of children or defects per hour)
- Continuous (measured characteristics, such as weight or voltage)
Types of Data (cont’d)
Level of Measurement / Measurement Scale
- Ratio data: differences between measurements, true zero exists. Examples: height, age, weekly food spending.
- Interval data: differences between measurements but no true zero. Examples: temperature in Fahrenheit, standardized exam score.
- Ordinal data: ordered categories (rankings, order, or scaling). Examples: service quality rating, Standard & Poor’s bond rating, student letter grades.
- Nominal data: categories (no ordering or direction). Examples: marital status, type of car owned.
Data Presentation
Presentation of Data
Data collected in the form of schedules and questionnaires are not self-explanatory. They are raw data; to make them meaningful, they must be organized and presented.
Presentation of Categorical Data
Categorical data can be presented in two ways:
- Tabulating data (summary table)
- Graphing data (bar chart, pie chart, Pareto diagram)
The Summary Table
The summary table is a visualization that summarizes statistical information about data in table form.
Example: Current Investment Portfolio
Investment Type   Amount (in thousands $)   Percentage (%)
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110.0                     100.0
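The percentage column of the summary table can be reproduced from the amounts alone. The following is a minimal sketch, not part of the original slides; the function name `summary_table` is an illustrative assumption.

```python
# Amounts (in thousands of dollars) taken from the summary table above.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def summary_table(amounts):
    """Return (rows, total), where each row is (category, amount, percentage)."""
    total = sum(amounts.values())
    rows = [
        (name, amount, round(100 * amount / total, 2))
        for name, amount in amounts.items()
    ]
    return rows, total


rows, total = summary_table(portfolio)
for name, amount, pct in rows:
    print(f"{name:<10}{amount:>8.1f}{pct:>8.2f}")
print(f"{'Total':<10}{total:>8.1f}{100.0:>8.2f}")
```

Each percentage is the category amount divided by the total, multiplied by 100 and rounded to two decimals.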
Bar Chart
- Bar charts are often used for qualitative data (categories or nominal scale).
- The height of each bar shows the frequency or percentage for that category.
The bar chart for the previous summary table:
[Figure: bar chart "Investor's Portfolio"; categories: Stocks, Bonds, CD, Savings; value axis: Amount in $1000's, 0 to 50]
Pie Chart
- Pie charts are often used for qualitative data (categories or nominal scale).
- The size of each pie slice shows the frequency or percentage for that category.
The pie chart for the previous summary table:
[Figure: pie chart for the previous summary table]
Pareto Diagram
- Used to portray categorical data
- A bar chart in which categories are shown in descending order of frequency
- A cumulative polygon is often shown in the same graph
- Used to separate the “vital few” from the “trivial many”
[Figure: Pareto diagram "Current Investment Portfolio"; categories in order: Stocks, Bonds, Savings, CD; bar graph: % invested in each category (axis 0% to 45%); line graph: cumulative % invested (axis 0% to 100%)]
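The numbers behind a Pareto diagram are the categories sorted in descending order, together with a running cumulative percentage. The sketch below works these out for the investment portfolio; the function name `pareto` is an illustrative assumption, not something from the slides.

```python
# Amounts (in thousands of dollars) from the Current Investment Portfolio example.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def pareto(amounts):
    """Return rows of (category, percent, cumulative percent), largest first."""
    total = sum(amounts.values())
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0.0
    for name, amount in ordered:
        running += amount
        rows.append(
            (name, round(100 * amount / total, 2), round(100 * running / total, 2))
        )
    return rows


for name, pct, cum_pct in pareto(portfolio):
    print(f"{name:<10}{pct:>8.2f}{cum_pct:>9.2f}")
```

The percent column would drive the bars and the cumulative column the line, so the first one or two rows show the "vital few".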
Presentation of Numerical Data
Numerical data can be presented in two ways:
- Ordered array (stem-and-leaf display)
- Frequency/cumulative distributions (histogram, polygon, ogive)
Ordered Array
A sequence of data in rank order:
- Shows the range (min to max)
- Provides some signals about variability within the range
- May help identify outliers (unusual observations)
- If the data set is large, the ordered array is less useful
Example:
- Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
- Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
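As a small illustration (a sketch, not part of the original slides), the ordered array and the range it exposes can be obtained by sorting the raw data:

```python
# Raw data as collected, from the example above.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# The ordered array is the same data in rank order.
ordered = sorted(raw)

# The ends of the ordered array give the minimum, maximum, and range.
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print(smallest, largest, data_range)
```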
Stem-and-Leaf Diagram
A simple way to see distribution details in a data set. To make this diagram, first separate the sorted data series into leading digits (the stems) and trailing digits (the leaves).
Stems and leaves of 21, 38, and 41:

Stem | Leaf
2    | 1
3    | 8
4    | 1
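The split into stems and leaves can be sketched in a few lines. This assumes non-negative two-digit integers, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted values by leading digit (stem); keep trailing digits (leaves)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 38 -> stem 3, leaf 8
        groups[stem].append(leaf)
    return dict(groups)


# Data from the ordered array example.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted before grouping, the stems appear in ascending order and the leaves within each stem are ordered too.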
Frequency/Cumulative Distributions
What is a Frequency Distribution?
A frequency distribution is a list or a table containing:
- Class groupings (ranges within which the data fall)
- The corresponding frequencies with which data fall within each grouping or category
The reasons for using frequency distributions are:
- It is a way to summarize numerical data
- It condenses the raw data into a more useful form
- It allows for a quick visual interpretation of the data
Frequency/Cumulative Distributions (cont’d)
Class Intervals and Class Boundaries
- Each class grouping has the same width
- Determine the width of each interval by:
  Width of interval ≈ range / number of desired class groupings
- Usually at least 5 but no more than 15 groupings
- Class boundaries never overlap
- Round up the interval width to get desirable endpoints
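The width rule can be sketched as a small function. The name `class_width` and the `step` parameter (the multiple to round up to, as one possible reading of "desirable endpoints") are assumptions made for illustration, not part of the slides.

```python
import math


def class_width(values, num_classes, step=1):
    """Range divided by the number of classes, rounded up to a multiple of `step`."""
    data_range = max(values) - min(values)
    raw_width = data_range / num_classes
    return math.ceil(raw_width / step) * step


# Data from the ordered array example: range = 41 - 21 = 20.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
print(class_width(data, 5))          # 20 / 5 = 4.0, already a whole number
print(class_width(data, 3, step=5))  # 20 / 3 is about 6.67, rounded up to 10
```

Rounding up rather than to the nearest value ensures that the chosen number of classes still covers the whole range.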
Frequency Distributions Example
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
To build the frequency distribution, follow these steps:
- Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find range: 58 - 12 = 46
- Select number of classes: 5 (usually between 5 and 15)
- Compute class interval (width): 10 (46/5 = 9.2, then round up)
- Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute class midpoints: 15, 25, 35, 45, 55
- Count observations and assign to classes
Frequency Distributions Example (cont’d)
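The final counting step can be sketched as follows, using the 20 temperatures and the class boundaries from the example. The convention that a class runs from its lower boundary up to but not including its upper boundary is an assumption here; the function name `frequency_distribution` is illustrative.

```python
# Daily high temperatures and class boundaries from the example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]


def frequency_distribution(values, bounds):
    """Count values in each class [lower, upper) and compute relative frequency."""
    rows = []
    for lower, upper in zip(bounds, bounds[1:]):
        count = sum(1 for v in values if lower <= v < upper)  # assumed convention
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "frequency": count,
            "relative": count / len(values),
        })
    return rows


for row in frequency_distribution(temps, boundaries):
    lower, upper = row["class"]
    print(f"{lower} but less than {upper}: "
          f"frequency {row['frequency']}, relative {row['relative']:.2f}")
```

The frequencies must sum to the number of observations (20), which is a quick check that no value fell outside the boundaries.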
The Histogram
- A graph of the data in a frequency distribution is called a histogram
- The class boundaries (or class midpoints) are shown on the horizontal axis
- The vertical axis is either frequency, relative frequency, or percentage
- Bars of the appropriate heights are used to represent the number of observations within each class
Example: the histogram for the previous data, with no gaps between the bars.
[Figure: "Histogram: Daily High Temperature"; horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7)]
The Frequency Polygon
A frequency polygon plots each class midpoint against its frequency and joins the points with straight lines. In a percentage polygon, the vertical axis shows the percentage of observations per class.
Example: the frequency polygon for the previous data.
[Figure: "Frequency Polygon: Daily High Temperature"; horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7)]
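The points of a frequency (or percentage) polygon can be sketched from the class midpoints and frequencies. The frequencies 3, 6, 5, 4, 2 are the counts of the 20 temperatures in the classes 10-20 through 50-60. Closing the polygon with zero-frequency points one class width beyond each end is an assumption, suggested by the 5 to 65 axis; the function name `polygon_points` is illustrative.

```python
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]  # counted from the daily high temperature data


def polygon_points(mids, freqs, as_percentage=False):
    """Return (x, y) points, closed with zero-height points at both ends."""
    width = mids[1] - mids[0]
    total = sum(freqs)
    xs = [mids[0] - width] + list(mids) + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    if as_percentage:
        ys = [100 * y / total for y in ys]
    return list(zip(xs, ys))


print(polygon_points(midpoints, frequencies))
print(polygon_points(midpoints, frequencies, as_percentage=True))
```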
The Ogive
The ogive is also known as the cumulative percent polygon.
Example: the ogive (cumulative percent polygon) for the previous data.
[Figure: "Ogive: Daily High Temperature"; horizontal axis: Class Boundaries (not midpoints), 10 to 60; vertical axis: Cumulative Percentage (0 to 100)]
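The ogive plots the cumulative percentage of observations against the class boundaries, starting from 0% at the lowest boundary. A minimal sketch follows, using the class frequencies 3, 6, 5, 4, 2 counted from the temperature data; the function name `ogive_points` is illustrative.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]  # counted from the daily high temperature data


def ogive_points(bounds, freqs):
    """Return (boundary, cumulative percent) pairs, starting from 0%."""
    total = sum(freqs)
    cumulative = [0] + list(accumulate(freqs))
    return [(b, 100 * c / total) for b, c in zip(bounds, cumulative)]


for boundary, cum_pct in ogive_points(boundaries, frequencies):
    print(f"less than {boundary}: {cum_pct:.0f}%")
```

The points sit at class boundaries rather than midpoints because each cumulative value counts every observation below that boundary.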
Guidelines for good data presentation
- Do not distort the data
- Avoid unnecessary adornments (no “chart junk”)
- Use a scale for each axis on a two-dimensional graph
- Begin the vertical axis scale at zero
- Properly label all axes
- Give the graph a title
- Use the simplest graph for a given set of data
THANK YOU !!!