data processing and analysis of data

A Talk On ‘Data Processing and Analysis of Data’ (Research Methodology)

Upload: ankita3031

Post on 27-Jan-2016


DESCRIPTION

PowerPoint presentation on data processing and analysis of data

TRANSCRIPT


A Talk On

‘Data Processing and Analysis of Data’ (Research Methodology)


Introduction

• Collected data must be processed and analyzed in accordance with the research plan.

• This is essential for scientific study and for comparisons.

• Processing implies:
  – Editing
  – Coding
  – Classification
  – Tabulation

• Analysis implies:
  – Computation of certain measures
  – Searching for patterns of relationship that exist among data groups

Processing Operations

1. Editing
  – The process of examining the collected raw data to detect errors and omissions and to correct them where possible.
  – It involves scrutiny of the completed questionnaires and/or schedules.
  – There are two variations of editing:
    • Field editing
    • Central editing

• Field editing
  – Consists of the investigator reviewing the reporting forms to complete (rewrite) what was written in abbreviated form at the time of recording the response.
  – This editing should be done as soon as possible after the interview.
  – While doing field editing, the investigator should not try to correct errors or omissions by simply guessing the suitable option.

• Central editing
  – Takes place when all forms or schedules have been completed and returned to the office.
  – All the forms should be edited by a single editor in a small study, or by a team of editors in a large inquiry.
  – Corrections are allowed in this editing.

  – Editors should keep the following points in view while performing their work:
    a) Editors should be familiar with the instructions given to the interviewers and coders.
    b) When crossing out an entry, only a single line should be drawn through it, so that the original remains legible.
    c) Entries made on the form should be in a distinctive color and in a standardized form.
    d) They should initial all answers which they change or supply.
    e) The editor’s initials and the date of editing should be placed on each completed form or schedule.

2. Coding
  – Refers to the process of assigning numerals or other symbols to answers so that the responses can be put into a limited number of categories.
  – Necessary for efficient analysis.
  – Coding decisions are usually taken at the design stage of the questionnaire.
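As a minimal sketch of coding, assume a hypothetical question with the answer categories yes, no and undecided. The code book, the missing-value code 9 and the names `CODE_BOOK`, `MISSING` and `code_responses` are illustrative assumptions, not part of the talk:

```python
# Hypothetical code book: maps each answer category to a numeral.
CODE_BOOK = {"yes": 1, "no": 2, "undecided": 3}
MISSING = 9  # assumed code for blank or unrecognised answers


def code_responses(responses):
    """Return the numeric code for each raw answer."""
    coded = []
    for answer in responses:
        key = answer.strip().lower()  # normalise spacing and case first
        coded.append(CODE_BOOK.get(key, MISSING))
    return coded


raw = ["Yes", "no ", "UNDECIDED", "", "maybe"]
coded = code_responses(raw)
print(coded)
```

Fixing the code book before data collection, as the slide suggests, means every answer falls into a known category or into the explicit missing code.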

3. Classification
  – Individual data should be reduced into homogeneous groups in order to get meaningful relationships.
  – Classification is the process of arranging data in groups or classes on the basis of some common characteristics.

• Broadly, there are two types of classification, based on the nature of the phenomenon involved:
  a) Classification according to attributes
  b) Classification according to class intervals

• Classification according to attributes:
  – Data are classified on the basis of common characteristics, which can be either descriptive or numerical.
  – Descriptive characteristics refer to qualitative phenomena, which cannot be measured quantitatively; only their presence or absence can be noted.
  – Data obtained this way are known as statistics of attributes.

  – This classification can be either simple or manifold.
  – In simple classification, we consider only one attribute and make two classes: one possessing the attribute and the other devoid of it.
  – In manifold classification, two or more attributes are considered and the data are divided into a number of classes.

• Classification according to class intervals:
  – Data relating to income, production, age, etc. are known as statistics of variables and are classified on the basis of class intervals.
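Classification by class interval can be sketched as follows. The ages, the interval width of 10 and the choice to exclude the upper limit of each class are assumptions made for illustration, and `classify_by_interval` is a hypothetical helper:

```python
def classify_by_interval(values, lower, width, n_classes):
    """Count how many values fall in each class interval.

    Intervals are lower..lower+width, lower+width..lower+2*width, and so on.
    The upper limit of each class is excluded (an assumption of this sketch),
    and values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    labels = [
        f"{lower + i * width}-{lower + (i + 1) * width}" for i in range(n_classes)
    ]
    return dict(zip(labels, counts))


ages = [21, 34, 45, 29, 38, 52, 41, 33, 27, 49]  # hypothetical respondents
table = classify_by_interval(ages, lower=20, width=10, n_classes=4)
print(table)
```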

4. Tabulation
  – Tabulation refers to the process of summarizing the raw data and displaying them in compact form.
  – It is essential because:
    • It conserves space and reduces explanatory statements to a minimum.
    • It facilitates the process of comparison.
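A simple one-way frequency table illustrates the idea. The answers and the helper name `tabulate` are assumptions made for this sketch:

```python
from collections import Counter


def tabulate(answers):
    """Return (category, frequency, percentage) rows sorted by category."""
    counts = Counter(answers)
    total = sum(counts.values())
    return [
        (category, n, round(100 * n / total, 1))
        for category, n in sorted(counts.items())
    ]


answers = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]
rows = tabulate(answers)
for category, frequency, percent in rows:
    print(f"{category:<10}{frequency:>5}{percent:>8}")
```

Six raw answers collapse into three rows, which is the space saving and ease of comparison the slide refers to.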

Elements/Types of Analysis

• In the case of survey or experimental data, analysis involves:
  – Estimating the values of unknown parameters of the population
  – Testing hypotheses for drawing inferences
• Categories of analysis:
  a) Descriptive
  b) Inferential

• Correlation analysis:
  – Studies the joint variation of two or more variables to determine the amount of correlation between them.
• Causal analysis:
  – Studies how one or more variables affect changes in another variable.

• Multivariate analysis:
  – “All statistical methods which simultaneously analyze more than two variables on a sample of observations.”
  – It involves:
    a) Multiple regression analysis
    b) Multiple discriminant analysis
    c) Multivariate analysis of variance
    d) Canonical analysis

STATISTICS IN RESEARCH

• Statistics functions as a tool in designing research, analyzing its data and drawing conclusions therefrom.
• The important statistical measures used to summarize survey/research data are:
  1) Measures of central tendency (statistical averages)
  2) Measures of dispersion
  3) Measures of asymmetry (skewness)
  4) Measures of relationship
  5) Other measures

Measure of Central Tendency

  – It tells us the point about which items have a tendency to cluster.
  – Mean, median and mode are the most popular averages.
  – The mean is also known as the arithmetic average.
  – The median is the value of the middle item of a series when it is arranged in ascending or descending order.
  – The mode is the most frequently occurring value in a series.
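The three averages can be checked on a small series with Python's standard `statistics` module. The scores below are hypothetical:

```python
import statistics

scores = [4, 6, 6, 7, 9, 10, 6, 8]  # hypothetical observations

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle of the sorted series
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

With an even number of items, as here, `statistics.median` averages the two middle values of the sorted series.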

Measure of Dispersion

  – It gives an idea of the scatter of the values of the items of a variable in the series around the true value of the average.
  – Important measures of dispersion are:
    a) Range
    b) Mean deviation
    c) Standard deviation

• Range
  – Is the simplest possible measure of dispersion.
  – It is defined as the difference between the values of the extreme items of a series.
• Mean deviation
  – It is the average of the absolute differences (signs ignored) between the values of items and some average of the series.
• Standard deviation
  – The most widely used measure of dispersion.
  – Denoted by the symbol σ.

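  – Standard deviation is defined as the square root of the average of the squares of deviations, the deviations being measured from the arithmetic mean:

    $$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$

    where $X_i$ is the value of the $i$-th item, $\bar{X}$ is the arithmetic mean of the series and $n$ is the number of items.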
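The three measures of dispersion can be computed directly from their verbal definitions. The data are hypothetical, and the deviations are taken from the arithmetic mean and divided by the number of items n, which is an assumption about the form intended:

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(values)
mean = sum(values) / n

# Range: difference between the extreme items.
value_range = max(values) - min(values)

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in values) / n

# Standard deviation: square root of the average squared deviation.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)

print(value_range, mean_deviation, std_dev)
```

Dividing by n treats the series as the whole population; sample-based work often divides by n - 1 instead.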

Measure of Asymmetry

  – When the distribution of items in a series happens to be perfectly symmetrical, we get a symmetrical, bell-shaped curve. Technically, such a curve is described as a normal curve.

• If the curve is distorted, it is said to exhibit an asymmetrical distribution, which indicates the presence of skewness.
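One widely used measure is Karl Pearson's coefficient of skewness, (mean − mode) / σ. Whether the talk used this particular formula is an assumption, and the data below are hypothetical:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(values)
mode = statistics.mode(values)
sigma = statistics.pstdev(values)  # population standard deviation

# Pearson's coefficient of skewness: (mean - mode) / sigma
skewness = (mean - mode) / sigma
print(skewness)
```

A positive value indicates a longer tail on the right, a negative value a longer tail on the left, and zero a symmetrical distribution.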


Measures of Relationship

  – In the context of bivariate and multivariate populations, we need to know how the two or more variables in the data are related to one another.

  – Association (correlation) is studied using correlation techniques, and cause-and-effect relationships are studied using regression techniques.

• In the case of a bivariate population:
  – Correlation can be studied through:
    a) Cross tabulation
    b) Charles Spearman’s coefficient of correlation
    c) Karl Pearson’s coefficient of correlation
  – Cause-and-effect relationships can be studied through simple regression.

1. Cross tabulation:
  – Useful when the data are in nominal form.
  – Classify each variable into two or more categories and then cross-classify the variables in these categories.
  – The relationship between the variables can be:
    • Symmetrical
    • Reciprocal
    • Asymmetrical

• In a symmetrical relationship, the two variables vary together, but neither is assumed to be due to the other.
• In a reciprocal relationship, the two variables mutually influence or reinforce each other.
• In an asymmetrical relationship, one variable (the independent variable) is responsible for another variable (the dependent variable).
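Cross tabulation of two nominal variables can be sketched as a table of counts. The respondents and the helper `cross_tabulate` are hypothetical:

```python
from collections import Counter

# Hypothetical nominal data: (gender, answer) for each respondent.
respondents = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "no"), ("female", "yes"), ("male", "yes"),
]


def cross_tabulate(pairs):
    """Count how often each (row category, column category) pair occurs."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: cells[(r, c)] for c in cols} for r in rows}


table = cross_tabulate(respondents)
for row, counts in table.items():
    print(row, counts)
```

The table only shows how the categories occur together; deciding whether the relationship is symmetrical, reciprocal or asymmetrical remains a matter of interpretation.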

2. Charles Spearman’s coefficient of correlation:
  – This technique deals with ordinal data, where ranks are given to the different values of the variables.
  – The objective is to determine the extent to which the two sets of rankings are similar or dissimilar.
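A minimal sketch of the rank correlation uses the standard formula 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of an item. The formula in this form assumes there are no tied ranks, and the two judges' rankings are hypothetical:

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """Spearman's coefficient for two rankings of the same n items (no ties)."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rankings of equal length, with n >= 2")
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


judge_a = [1, 2, 3, 4, 5]  # hypothetical ranks given by the first judge
judge_b = [2, 1, 4, 3, 5]  # hypothetical ranks given by the second judge
rho = spearman_rank_correlation(judge_a, judge_b)
print(rho)
```

Identical rankings give +1 and exactly reversed rankings give −1.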

3. Karl Pearson’s coefficient of correlation:
  – The most widely used method of measuring the degree of relationship between two variables.
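Pearson's coefficient is the sum of the products of the deviations of the two variables from their means, divided by the square root of the product of their sums of squared deviations. The sketch below follows that definition on hypothetical data; `pearson_r` is an illustrative name:

```python
import math


def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need paired observations, with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant.
    return sum_xy / math.sqrt(sum_xx * sum_yy)


hours = [1, 2, 3, 4, 5]  # hypothetical values of the first variable
marks = [2, 4, 5, 4, 5]  # hypothetical values of the second variable
r = pearson_r(hours, marks)
print(round(r, 4))
```

The coefficient lies between −1 and +1; its sign gives the direction of the relationship and its magnitude the degree.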

• Simple regression analysis:
  – Regression is the determination of a statistical relationship between two or more variables. In simple regression there are only two variables: one (the independent variable) is the cause of the behavior of the other (the dependent variable).
  – If X is the independent variable and Y is the dependent variable, then the regression equation of Y on X is:

    $$\hat{Y} = a + bX$$

    where $\hat{Y}$ is the estimated value of Y for a given value of X, $a$ is the intercept and $b$ is the slope of the regression line.
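Writing the regression line of Y on X in the common form Ŷ = a + bX (the symbols a and b are an assumption of this sketch), the least-squares values of the two constants can be computed as follows. The data and the name `fit_simple_regression` are hypothetical:

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need paired observations, with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx      # slope; undefined if all x values are equal
    a = mean_y - b * mean_x  # the line passes through the point of means
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable
a, b = fit_simple_regression(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

A fitted line describes a statistical relationship only; whether X actually causes Y has to be argued from the research design.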

• In the case of a multivariate population:
  – Correlation can be studied through:
    a) Coefficient of multiple correlation
    b) Coefficient of partial correlation
  – Cause-and-effect relationships can be studied through multiple regression equations.

1. Multiple Correlation and Regression
  – When there are two or more independent variables, the analysis of the relationship is known as multiple correlation.
  – The equation describing such a relationship is known as the multiple regression equation.

• In the case of two independent variables and one dependent variable, the equation can be given as:

    $$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

    where $X_1$ and $X_2$ are the independent variables, $\hat{Y}$ is the estimated value of the dependent variable, and $a$, $b_1$ and $b_2$ are constants.

• Partial correlation:
  – Partial correlation measures the relationship between two variables separately, in such a way that the effect of the other related variables is eliminated.
  – In other words, the aim is to measure the relation between a dependent variable and a particular independent variable while holding all other variables constant.
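For three variables, the first-order partial correlation can be obtained from the three simple correlation coefficients with the standard formula (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The coefficient values below are hypothetical and `partial_correlation` is an illustrative name:

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with the effect of z held constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical simple correlations among three variables.
r = partial_correlation(r_xy=0.8, r_xz=0.6, r_yz=0.5)
print(round(r, 4))
```

When z is uncorrelated with both x and y, the partial correlation equals the simple correlation r_xy.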

Other Measures

1. Index number:
  – Used when series are expressed in different units and so cannot be compared directly.
  – In such a scenario, each series is converted into a series of index numbers.
  – For example, the given figures can be expressed as percentages of a base figure.
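As a sketch of the idea, two hypothetical series in different units (tonnes and rupees) are each expressed as percentages of their first figure. The choice of the first period as the base and the name `to_index_numbers` are assumptions:

```python
def to_index_numbers(series, base_position=0):
    """Express each figure as a percentage of the base-period figure."""
    base = series[base_position]
    return [round(100 * value / base, 1) for value in series]


production_tonnes = [250, 275, 300, 325]  # hypothetical series
price_rupees = [40, 42, 50, 48]           # hypothetical series

production_index = to_index_numbers(production_tonnes)
price_index = to_index_numbers(price_rupees)
print(production_index)
print(price_index)
```

Once both series start at 100, their movements can be compared even though the original units differ.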

2. Time-Series Analysis:
  – When the data collected relate to some time period concerning a given phenomenon, particularly in economic and business contexts, such data are labeled ‘time series’.
  – Factors affecting such series are:
    I. Secular trend (T): changes taking place over a long period of time
    II. Short-time oscillations: changes taking place over a short period of time

• Short-time oscillations are affected by the following factors:
  a) Cyclical fluctuations (C): fluctuations that result from business cycles.
  b) Seasonal fluctuations (S): fluctuations of short duration that occur in a regular sequence at specific intervals of time.
  c) Irregular fluctuations (I): fluctuations that take place in a completely unpredictable fashion.

• For analyzing time series, there are two models:
  a) Multiplicative model
  b) Additive model

The multiplicative model assumes that the various components interact in a multiplicative manner to produce the given values of the overall time series, and can be stated as:

    $$Y = T \times C \times S \times I$$

The additive model considers the total of the various components as resulting in the given values of the overall time series, and can be stated as:

    $$Y = T + C + S + I$$

where Y is the observed value of the time series, T the secular trend, C the cyclical fluctuations, S the seasonal fluctuations and I the irregular fluctuations.
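The difference between the two models shows up in the units of the components. In the sketch below, which uses hypothetical values and writes Y for the observed value (an assumed symbol), the multiplicative model takes C, S and I as ratios around 1.0, while the additive model takes them in the same units as the trend:

```python
def multiplicative_model(trend, cyclical, seasonal, irregular):
    """Y = T * C * S * I, with C, S and I expressed as ratios around 1.0."""
    return trend * cyclical * seasonal * irregular


def additive_model(trend, cyclical, seasonal, irregular):
    """Y = T + C + S + I, with every component in the units of Y."""
    return trend + cyclical + seasonal + irregular


# Hypothetical period: trend of 200 units, a mild upswing, an off-season dip.
y_mult = multiplicative_model(200, 1.05, 0.90, 1.00)
y_add = additive_model(200, 10, -20, 0)
print(round(y_mult, 1), y_add)
```

Under the multiplicative model a seasonal effect grows or shrinks with the level of the trend; under the additive model it stays the same size.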
