data processing and analysis of data

A Talk On

‘Data Processing and Analysis of Data’ (Research Methodology)

Introduction

• After collection, the data have to be processed and analyzed in accordance with the research plan.

• This is essential for scientific study and comparisons.

• Processing implies
– Editing
– Coding
– Classification
– Tabulation

• Analysis implies
– Computation of certain measures
– Searching for patterns of relationship that exist among data groups.

Processing Operations

1. Editing
– The process of examining the collected raw data to detect errors and omissions and to correct them.
– It involves scrutiny of the completed questionnaires and/or schedules.
– There are two variations of editing:
• Field editing
• Central editing

• Field editing
– Consists of the investigator's review of the reporting forms, to complete (rewrite) what was written in abbreviated form at the time of recording the response.

– This editing is expected to be done as soon as possible after the interview.

– While doing field editing the investigator should not try to correct errors or omissions by simply guessing the suitable option.

• Central editing
– Takes place when all forms or schedules have been completed and returned to the office.
– All the forms should be edited by a single editor in a small study, or by a team of editors in a large inquiry.

– Corrections are allowed in this editing.

– Editors should keep the following points in view while performing their work:

a) Editors should be familiar with instructions given to the interviewers and coders.

b) A single line should be drawn to cross out an entry, so that the original remains legible.

c) Entries should be made in some distinctive color and in standardized form.

d) They should initial all answers which they change or supply.

e) The editor’s initials and the date of editing should be placed on each completed form or schedule.

2. Coding
– Refers to the process of assigning numerals or other symbols to answers so that the responses can be put into a limited number of categories.
– Necessary for efficient analysis.
– Coding decisions are usually taken at the design stage of the questionnaire.
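As a rough illustration of coding, the sketch below maps raw answers to numerals. The codebook, the missing-value code 9, and the sample answers are assumptions made up for this example, not part of the talk.

```python
# Hypothetical codebook: each answer category is assigned a numeral.
CODEBOOK = {"yes": 1, "no": 2, "don't know": 3}
MISSING = 9  # assumed code for blank or unrecognized answers


def code_responses(responses, codebook=CODEBOOK, missing=MISSING):
    """Map raw answers to numeric codes, ignoring case and surrounding spaces."""
    return [codebook.get(r.strip().lower(), missing) for r in responses]


raw = ["Yes", "no", " YES ", "Don't know", ""]
coded = code_responses(raw)
print(coded)  # [1, 2, 1, 3, 9]
```

Fixing the codebook before data collection mirrors the point that coding decisions belong to the questionnaire design stage.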

3. Classification
– Individual data should be reduced into homogeneous groups to get meaningful relationships.
– Classification is the process of arranging data in groups or classes on the basis of some common characteristics.

• Broadly, there are two types of classification, based on the nature of the phenomenon involved:
a) Classification according to attributes.
b) Classification according to class-interval.

• Classification according to attributes:
– Data are classified on the basis of common characteristics, which may be either descriptive or numerical.
– Descriptive characteristics refer to qualitative phenomena which cannot be measured quantitatively.
– Data obtained this way are known as statistics of attributes.
– This classification can be either simple or manifold.
– In simple classification, we consider only one attribute and make two classes: one possessing the attribute and the other devoid of it.
– In manifold classification, more than one attribute is considered and the data are divided into a number of classes.

• Classification according to class-interval:
– Data relating to income, production, age, etc. are known as statistics of variables and are classified on the basis of class intervals.
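Classification by class interval can be sketched as follows. The ages, the interval width of 10, and the starting point of 20 are made-up assumptions, and the upper bound of each class is treated as exclusive.

```python
from collections import Counter


def class_interval(value, width=10, start=0):
    """Return the (lower, upper) class containing value; upper bound is exclusive."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)


ages = [23, 35, 31, 47, 29, 52, 38, 41, 20, 59]  # made-up sample
table = Counter(class_interval(a, width=10, start=20) for a in ages)

# Print the frequency of each class in ascending order.
for (low, high), freq in sorted(table.items()):
    print(f"{low}-{high}: {freq}")
```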

4. Tabulation
– Tabulation refers to the process of summarizing the raw data and displaying them in compact form.
– It is essential because:
• It conserves space and reduces explanatory statements to a minimum.
• It facilitates the process of comparison.

Elements/Types of Analysis

• In case of survey or experimental data, analysis involves
– Estimating the values of unknown parameters of the population
– Testing of hypotheses for drawing inferences.

• Categories of analysis:
a) Descriptive
b) Inferential

• Correlation analysis:
– Studies the joint variation of two or more variables to determine the amount of correlation between them.

• Causal analysis:
– Studies how one or more variables affect changes in another variable.

• Multivariate analysis:
– “All statistical methods which simultaneously analyze more than two variables on a sample of observations.”
– It involves:
a) Multiple regression analysis
b) Multiple discriminant analysis
c) Multivariate analysis of variance
d) Canonical analysis

STATISTICS IN RESEARCH

• Statistics in research functions as a tool in designing research, analyzing its data, and drawing conclusions therefrom.

• The important statistical measures used to summarize survey/research data are:
1) Measures of central tendency or statistical averages
2) Measures of dispersion
3) Measures of asymmetry (skewness)
4) Measures of relationship
5) Other measures

Measure of Central Tendency

– It tells the point about which items have a tendency to cluster.

– Mean, median, and mode are the most popular averages.

– Mean is also known as the arithmetic average.

– Median is the value of the middle item of a series when it is arranged in ascending or descending order.

– Mode is the most commonly or frequently occurring value in a series.
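The three averages can be checked on a small series with Python's standard `statistics` module. The marks below are made-up values chosen only for illustration.

```python
import statistics

marks = [4, 6, 6, 7, 9, 10, 6, 8]  # made-up series

mean = statistics.mean(marks)      # arithmetic average
median = statistics.median(marks)  # middle of the sorted series
mode = statistics.mode(marks)      # most frequently occurring value

print(mean, median, mode)
```

With an even number of items, `statistics.median` averages the two middle values of the sorted series.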

Measure of Dispersion

– It gives an idea of the scatter of the values of the items of a variable around the average of the series.

– Important measures of dispersion are:
a) Range
b) Mean deviation
c) Standard deviation

• Range
– Is the simplest possible measure of dispersion.
– It is defined as the difference between the values of the extreme items of a series.

• Mean deviation
– It is the average of the absolute differences of the values of items from some average of the series.

• Standard deviation
– Most widely used measure of dispersion.
– Denoted by the symbol σ.

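– Standard deviation is defined as the square root of the average of the squared deviations of the items from their arithmetic mean: $\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$, where $X_i$ is the $i$-th item, $\bar{X}$ is the arithmetic mean, and $n$ is the number of items.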
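A minimal sketch of the three measures of dispersion follows. It assumes deviations are taken from the arithmetic mean and that the standard deviation divides by n (the population form); the data are made up.

```python
import math


def value_range(xs):
    """Difference between the extreme items of the series."""
    return max(xs) - min(xs)


def mean_deviation(xs):
    """Average absolute deviation from the arithmetic mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def std_dev(xs):
    """Square root of the average squared deviation from the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up series with mean 5
print(value_range(data), mean_deviation(data), std_dev(data))  # 7 1.5 2.0
```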

Measure of Asymmetry

– When the distribution of the items in a series is perfectly symmetrical, we get a bell-shaped curve in which the mean, median, and mode coincide. Technically, such a curve is described as a normal curve.

• If the curve is distorted, it is said to exhibit asymmetrical distribution which indicates the presence of skewness.

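– A common measure is the difference between the mean and the mode, $\text{Skewness} = \bar{X} - Z$, where $\bar{X}$ is the mean and $Z$ is the mode; dividing this difference by $\sigma$ gives the coefficient of skewness.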

Measures of Relationship

– In the context of bivariate and multivariate populations, we need to know how the two or more variables in the data relate to one another.

– Association (correlation) is studied using correlation techniques, and cause-and-effect relationships are studied using regression techniques.

• In case of a bivariate population:
– Correlation can be studied through:
a) Cross tabulation
b) Charles Spearman’s coefficient of correlation
c) Karl Pearson’s coefficient of correlation

– Cause-and-effect relationships can be studied through the simple regression technique.

1. Cross tabulation:
– Useful when the data are in nominal form.
– Classify each variable into two or more categories and then cross-classify the variables in these categories.
– The interaction between them can be as follows:
• Symmetrical
• Reciprocal
• Asymmetrical

• In a symmetrical relationship the two variables vary together.

• In reciprocal relationship the two variables mutually influence or reinforce each other.

• In an asymmetric relationship one variable (independent variable) is responsible for another variable (dependent variable).
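Cross tabulation of two nominal variables can be sketched with a counter keyed by category pairs. The variables (gender and drink preference) and the records are assumptions invented for this example.

```python
from collections import Counter

# Made-up nominal data: (gender, preferred drink) for each respondent.
records = [
    ("male", "tea"), ("female", "coffee"), ("male", "coffee"),
    ("female", "tea"), ("female", "coffee"), ("male", "tea"),
]

cells = Counter(records)                  # frequency of each category pair
rows = sorted({r for r, _ in records})    # categories of the first variable
cols = sorted({c for _, c in records})    # categories of the second variable

# Print the table: one row per gender, one column per drink.
print("gender".ljust(8) + "".join(c.rjust(8) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(cells[(r, c)]).rjust(8) for c in cols))
```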

2. Charles Spearman’s coefficient of correlation:
– This technique deals with ordinal data, where ranks are given to the different values of the variables.
– The objective is to determine the extent to which the two sets of rankings are similar or dissimilar.
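One common way to compute the rank coefficient, assuming there are no tied ranks, is $1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of item $i$. The sketch below applies it to two made-up sets of rankings.

```python
def spearman_rho(rank_a, rank_b):
    """Rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


judge_1 = [1, 2, 3, 4, 5]  # made-up ranks given by one judge
judge_2 = [2, 1, 4, 3, 5]  # made-up ranks given by another judge
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 3))  # 0.8
```

Identical rankings give 1, and exactly reversed rankings give -1.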

3. Karl Pearson’s coefficient of correlation:
– Most widely used method to measure the degree of relationship between two variables.
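A minimal sketch of the product-moment coefficient, computed from deviations about the means, is given below. The paired observations are made up.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```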

• Simple regression analysis:
– Regression is the determination of a statistical relationship between two or more variables, where one variable is the cause of the behavior of another variable.
– If X is the independent variable and Y is the dependent variable, then the regression equation of Y on X is $\hat{Y} = a + bX$, where $\hat{Y}$ is the estimated value of Y for a given X, $a$ is the Y-intercept, and $b$ is the slope of the line.
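Assuming the usual least-squares line $\hat{Y} = a + bX$, the intercept and slope can be estimated as in the sketch below. The data are made up, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + b*X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through the point of means
    return a, b


x = [1, 2, 3, 4, 5]  # made-up independent variable
y = [2, 4, 5, 4, 5]  # made-up dependent variable
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```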

• In case of a multivariate population:
– Correlation can be studied through:
a) Coefficient of multiple correlation.
b) Coefficient of partial correlation.

– Cause-and-effect relationships can be studied through multiple regression equations.

1. Multiple Correlation and Regression
– When there are two or more independent variables, the analysis of the relationship is known as multiple correlation.
– The equation describing such a relationship is known as the multiple regression equation.

• In the context of two independent variables and one dependent variable, the equation can be given as $\hat{Y} = a + b_1 X_1 + b_2 X_2$, where $X_1$ and $X_2$ are the independent variables and $\hat{Y}$ is the estimated value of the dependent variable.
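For two independent variables, the least-squares coefficients of $\hat{Y} = a + b_1 X_1 + b_2 X_2$ can be obtained by solving the two normal equations in deviation form. The sketch below is one way to do this with plain Python; the data are made up so that Y is exactly 1 + 2·X1 + 3·X2, and `fit_two_predictors` is a name chosen for this example.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares (a, b1, b2) for Y = a + b1*X1 + b2*X2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations about the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero when X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]  # exact linear relation
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```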

• Partial correlation:
– Partial correlation measures separately the relationship between two variables in such a way that the effects of the other related variables are eliminated.
– In other words, the aim is to measure the relation between a dependent variable and a particular independent variable while holding all other variables constant.
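With three variables, the first-order partial correlation between variables 1 and 2, holding variable 3 constant, is commonly written $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$. The sketch below evaluates it for made-up simple correlations; the subscripts are this example's notation, not taken from the talk.

```python
import math


def partial_corr(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Made-up simple correlations among three variables.
r_partial = partial_corr(r12=0.8, r13=0.6, r23=0.5)
print(round(r_partial, 4))  # 0.7217
```

When variable 3 is uncorrelated with both of the others, the partial correlation equals the simple correlation.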

Other Measures

1. Index number:
– Used when the series are expressed in different units.
– In such a scenario, the series is converted into a series of index numbers.
– For example, the given figures can be expressed as percentages of a chosen base figure.
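Converting a series into index numbers can be sketched as expressing every figure as a percentage of a base figure. The prices and the choice of the first period as the base are assumptions for illustration.

```python
def to_index(series, base_position=0):
    """Express each figure as a percentage of the figure at base_position."""
    base = series[base_position]
    return [100 * value / base for value in series]


prices = [50, 55, 60, 75]  # made-up figures; the first period is the base
index = to_index(prices)
print(index)  # [100.0, 110.0, 120.0, 150.0]
```

Once two series are on this common base of 100, they can be compared even if their original units differ.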

2. Time-Series Analysis:
– When the data collected relate to some time period concerning a given phenomenon, particularly in economic and business scenarios, such data are labeled ‘Time-Series’.
– Factors affecting such series are:
I. Secular trend (T): changes taking place over a long duration of time.
II. Short-time oscillations: changes taking place over a short duration of time.

• Short-time oscillations are affected by the following factors:

a) Cyclic fluctuations (C): fluctuations that result from business cycles.

b) Seasonal fluctuations (S): fluctuations of short duration that occur in a regular sequence at specific intervals of time.

c) Irregular fluctuations (I): fluctuations that take place in a completely unpredictable fashion.

• For analyzing time series there are two models:
a) Multiplicative model
b) Additive model

The multiplicative model assumes that the various components interact in a multiplicative manner to produce the given values of the overall time series, and can be stated as $Y = T \times C \times S \times I$, where $Y$ is the observed value of the series.

The additive model treats the given values of the overall time series as the total of the various components, and can be stated as $Y = T + C + S + I$.
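The difference between the two models can be sketched with made-up component values. In the multiplicative form, C, S, and I are assumed here to be ratios around 1; in the additive form, they are amounts in the same units as the trend.

```python
def multiplicative(trend, cyclic, seasonal, irregular):
    """Observed value under the multiplicative model: Y = T * C * S * I."""
    return trend * cyclic * seasonal * irregular


def additive(trend, cyclic, seasonal, irregular):
    """Observed value under the additive model: Y = T + C + S + I."""
    return trend + cyclic + seasonal + irregular


# Made-up components for a single period.
y_mult = multiplicative(trend=200, cyclic=1.1, seasonal=0.9, irregular=1.0)
y_add = additive(trend=200, cyclic=20, seasonal=-20, irregular=0)
print(round(y_mult, 2), y_add)
```

Under the multiplicative model, dividing an observed value by its seasonal ratio removes the seasonal effect; under the additive model, the seasonal amount is subtracted instead.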
