processing & data analysis lecture ppts unit iv

Unit IV Data processing

Upload: amit-kumar

Post on 11-Aug-2015


DESCRIPTION

Processing & Analysis of Data: processing operations; problems in processing; types of analysis. Hypothesis Testing: chi-square test, Z test, t-test, F-test.

TRANSCRIPT

Page 1: Processing & Data Analysis Lecture PPTs Unit IV

Unit IV

Data processing

Data

The word data comes from Latin. It is the plural of datum, although data is usually used as a singular term. Data is any collection of facts or figures. Data is the raw material to be processed by a computer.

Example

Names of students, marks obtained in an examination, designations of employees, addresses, quantities, rates, sales figures, or anything else that is input to the computer is data. Even pictures, photographs, drawings, charts and maps can be treated as data. The computer processes the data and produces the output or result.

Types of Data

Data is mainly divided into two types:

1. Numeric Data
2. Character Data

1. Numeric Data

Data represented in the form of numbers is known as numeric data. It includes the digits 0-9, a decimal point (.), the +, / and – signs, and the letters “E” or “D” (typically used to mark an exponent, as in 1.5E3).

2. Character Data

Character data falls into two groups:

i. String Data
ii. Graphical Data

String Data

String data consists of a sequence of characters. Characters may be English letters, digits or spaces. The space that separates two words is also a character. String data is further divided into two types:

a. Alphabetic Data
b. Alphanumeric Data

Graphical Data

Pictures, charts and maps can also be treated as data. A scanner is normally used to enter this type of data. A common use of this data is the National Identity Card.
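
The distinction between numeric, alphabetic and alphanumeric data can be illustrated with a small Python sketch. The function name classify_value and the sample inputs are hypothetical, and the rule used here (anything Python's float() can parse counts as numeric) is only one possible reading of the definitions above.

```python
def classify_value(text: str) -> str:
    """Classify a raw input string as numeric, alphabetic, or alphanumeric."""
    try:
        # float() accepts digits, a sign, a decimal point and "E" exponents.
        # It does not accept the "D" exponent letter, and it also parses a
        # few special words such as "nan" and "inf".
        float(text)
        return "numeric"
    except ValueError:
        pass
    # Spaces are characters too, so ignore them when checking for letters.
    if text.replace(" ", "").isalpha():
        return "alphabetic"
    return "alphanumeric"


samples = ["42", "-3.5E2", "Amit Kumar", "Room 12B"]
labels = [classify_value(s) for s in samples]
```

Here labels comes out as numeric, numeric, alphabetic, alphanumeric for the four sample strings.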

Information

A collection of data that conveys some meaningful idea is information. It may provide answers to questions such as who, which, when, why, what, and how.

or

Raw input is data, and it has no significance in that form. When data is collated or organized into something meaningful, it gains significance. This meaningful organization is information.

or

Observations and recordings are made to obtain data, while analysis is done to obtain information.

Data Processing

Data processing is any operation or set of operations performed upon data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, in order to convert it into useful information.

Data Processing Cycle

Once data is collected, it is processed to convert it into useful information. The data is processed again and again until an accurate result is achieved. This is called the data processing cycle.

Data processing is a very important activity and involves careful planning. It usually involves three basic activities:

1. Input
2. Processing
3. Output

Data Processing Cycle: Step 1

1. Input

Input is the process through which collected data is transformed into a form that the computer can understand. It is a very important step because correct output depends entirely on the input data. The following activities can be performed in the input step.

i) Verification
The collected data is verified to determine whether it is correct as required. For example, the collected data of all B.Sc. students who appeared in the final examination of the university is verified. If errors are found in the collected data, the data is corrected or collected again.

ii) Coding
The verified data is coded or converted into machine-readable form so that it can be processed by the computer.

iii) Storing
The data is stored in a file on secondary storage. The stored data is then given to the program as input for processing.

Data Processing Cycle: Step 2

2. Processing

The term processing denotes the actual data manipulation techniques, such as classifying, sorting, calculating, summarizing and comparing, that convert data into information.

i) Classification
The data is classified into different groups and subgroups so that each group or subgroup can be handled separately.

ii) Sorting
The data is arranged in an order so that it can be accessed quickly as and when required.

iii) Calculations
Arithmetic operations are performed on the numeric data to get the required results. For example, the total marks of each student are calculated.

iv) Summarizing
The data is processed to represent it in a summarized form, that is, a summary of the data is prepared for top management. For example, a summary of the student data is prepared to show the percentage of students who passed and failed the examination.

Data Processing Cycle: Step 3

3. Output

After the processing step is completed, output is generated. The main purpose of data processing is to get the required result. Usually the output is stored on storage media for later use. The following activities can be performed in the output step.

i) Retrieval
Output stored on storage media can be retrieved at any time. For example, the results of students are prepared and stored on disk. These results can be retrieved when required for different purposes.

ii) Conversion
The generated output can be converted into different forms. For example, it can be represented in graphical form.

iii) Communication
The generated output is sent to different places. For example, a weather forecast is prepared and sent to the agencies and newspapers where it is required.
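
The three steps of the cycle (input, processing, output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records, the 0-100 validity range and the pass mark of 40 are all invented for the example and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical raw records: (name, class, marks in three subjects)
RAW = [
    ("Asha", "B.Sc. I", [62, 71, 58]),
    ("Ravi", "B.Sc. I", [35, 28, 41]),
    ("Meena", "B.Sc. II", [80, 77, 91]),
    ("Karan", "B.Sc. II", [45, -5, 60]),  # invalid mark, fails verification
]
PASS_MARK = 40  # assumed pass threshold on the average mark


def verify(records):
    """Input step: keep only records whose marks all lie in 0..100."""
    return [r for r in records if all(0 <= m <= 100 for m in r[2])]


def process(records):
    """Processing step: classify by class, calculate totals, sort, summarize."""
    groups = defaultdict(list)
    for name, cls, marks in records:            # classification
        total = sum(marks)                      # calculation
        passed = total / len(marks) >= PASS_MARK
        groups[cls].append((name, total, passed))
    summary = {}
    for cls, rows in groups.items():
        rows.sort(key=lambda row: row[1], reverse=True)   # sorting by total
        pass_pct = 100 * sum(1 for row in rows if row[2]) / len(rows)
        summary[cls] = {"ranking": rows, "pass_percent": pass_pct}
    return summary


def output(summary):
    """Output step: convert the summary into printable report lines."""
    return [f"{cls}: {info['pass_percent']:.0f}% passed"
            for cls, info in sorted(summary.items())]


report = output(process(verify(RAW)))
```

With the sample records above, the invalid record is dropped during verification and the report reads "B.Sc. I: 50% passed" and "B.Sc. II: 100% passed".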

Types of Data Processing

1. Manual Data Processing

This method of data processing involves human intervention. Manual data entry creates many opportunities for problems: delays in data capture, because every data field has to be keyed in by hand; a high number of operator misprints or typos; and high labor costs from the amount of manual work required. Manual processing also implies higher expenses for equipment, supplies, rent, etc.

Types of Data Processing

2. Electronic Data Processing (EDP)

EDP (electronic data processing), an infrequently used term for what is today usually called "IS" (information services or systems) or "MIS" (management information services or systems), is the processing of data by a computer and its programs in an environment involving electronic communication. EDP evolved from "DP" (data processing), a term that was created when most computing input was physically put into the computer in punched-card form and output as punched cards or paper reports.

Types of Data Processing

3. Real-Time Processing

In real-time processing, there is continual input, processing and output of data. Data has to be processed within a small stipulated time period (real time); otherwise it will create problems for the system. For example, when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible, allowing both the bank and the customer to keep track of funds.

Types of Data Processing

4. Batch Processing

In batch processing, a group of transactions is collected over a period of time, entered and processed, and then the batch results are produced. Batch processing requires separate programs for input, processing and output. It is an efficient way of processing a high volume of data. Examples: payroll systems, examination systems and billing systems.
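
The difference between real-time and batch processing can be sketched with the bank-withdrawal example. The Account class and both function names below are hypothetical, and the rule for rejecting a withdrawal is an assumption made only to keep the example small.

```python
class Account:
    """Hypothetical bank account holding only a balance."""

    def __init__(self, balance):
        self.balance = balance


def process_real_time(account, amount):
    """Apply one withdrawal immediately and return the updated balance."""
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount
    return account.balance


def process_batch(account, transactions):
    """Apply a collected group of withdrawals in one run.

    Returns two lists: the amounts applied and the amounts rejected.
    """
    applied, rejected = [], []
    for amount in transactions:
        if amount > account.balance:
            rejected.append(amount)
        else:
            account.balance -= amount
            applied.append(amount)
    return applied, rejected


# Real time: the balance is updated as soon as the transaction arrives.
live = Account(100)
balance_after = process_real_time(live, 30)

# Batch: transactions collected during the day are processed together.
nightly = Account(100)
applied, rejected = process_batch(nightly, [30, 80, 50])
```

In the real-time call the customer sees the new balance (70) at once; in the batch run the results (30 and 50 applied, 80 rejected) are only known after the whole batch has been processed.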

Hypothesis Testing

A decision-making process.

Statistics are used as a tool to assist with decision-making.

A scientific hypothesis is a statement of the predicted relationship among the variables.

A null hypothesis is a statement of no relationship among the variables.

Null Hypothesis Not Rejected

[Diagram labels: Total population; Sample reared in enriched environment; Sample reared in sterile environment]

Null Hypothesis Rejected

[Diagram labels: Total population of rats reared in sterile environment, with the sample used in study; Total population of rats reared in enriched environment, with the sample used in study]

Hypothesis Testing in Experimental Studies

Your research design determines the kind of statistical test you will use.

Experimental studies test hypotheses, while quasi-experimental studies tend to focus more on generating hypotheses.

Research Designs/Approaches

Type | Purpose | Time frame | Degree of control | Examples
Experimental | Test for cause/effect relationships | Current | High | Comparing two types of treatments for anxiety
Quasi-experimental | Test for cause/effect relationships without full control | Current or past | Moderate to high | Gender differences in visual/spatial abilities

Research Designs/Approaches

Type | Purpose | Time frame | Degree of control | Examples
Non-experimental, correlational | Examine relationship between two variables | Current (cross-sectional) or past | Low to medium | Relationship between studying style and grade point average
Ex post facto | Examine the effect of a past event on current functioning | Past and current | Low to medium | Relationship between history of child abuse and depression

Research Designs/Approaches

Type | Purpose | Time frame | Degree of control | Examples
Non-experimental, correlational | Examine relationship between two variables where one is measured later | Future (predictive) | Low to moderate | Relationship between history of depression and development of cancer
Cohort-sequential | Examine change in a variable over time in overlapping groups | Future | Low to moderate | How mother-child negativity changed over adolescence

Research Designs/Approaches

Type | Purpose | Time frame | Degree of control | Examples
Survey | Assess opinions or characteristics that exist at a given time | Current | None or low | Voting preferences before an election
Qualitative | Discover potential relationships; descriptive | Past or current | None or low | People’s experiences of quitting smoking

Tests of Significance

The question | Null hypothesis | Statistical test
Group differences: between means of 2 different groups | H0: μ_g1 = μ_g2 | Independent-samples t-test
Group differences: between 2 means of related groups | H0: μ_g1a = μ_g1b | Dependent-samples t-test
Group differences: between means of 3 groups | H0: μ_g1 = μ_g2 = μ_g3 | ANOVA
Group relationships: between 2 variables | H0: ρ_xy = 0 | t-test for significance of a correlation
Group relationships: between 2 correlations | H0: ρ_ab = ρ_cd | t-test for significance of the difference between 2 correlations
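
As a rough illustration of the first two tests in this table, the sketch below computes an independent-samples and a dependent-samples t statistic with the Python standard library. The slides give no formula, so the textbook pooled-variance form is assumed for the independent case; the function names and the scores are invented for the example.

```python
import math
import statistics


def t_independent(a, b):
    """Pooled-variance t statistic and df for two independent groups."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def t_dependent(first, second):
    """Paired t statistic and df, computed on the difference scores."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical scores
drug = [12, 14, 11, 15, 13]
placebo = [9, 10, 8, 11, 12]

# Treated as two different groups of people
t_ind, df_ind = t_independent(drug, placebo)

# Treated as the same five people measured twice (placebo first, then drug)
t_dep, df_dep = t_dependent(placebo, drug)
```

For these numbers the independent-samples statistic is t = 3.0 with 8 degrees of freedom, and the paired statistic is about 5.48 with 4 degrees of freedom. The same scores give a different t depending on whether the groups are related, which is why the table lists the two tests separately.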

Experimental Designs

Examine differences between experimentally manipulated groups or variables (e.g., one group gets a certain drug and the other gets a placebo).

At minimum, the experimental (independent) variable has two levels (e.g., drug vs. placebo).
– Advantage: you can determine causality.
– Disadvantage: cost, and many variables cannot be experimentally manipulated (e.g., smoke exposure over time).

Null Hypothesis Significance Testing

Null hypothesis
– Results are due to “chance” (H0)

Alternative (scientific) hypothesis
– Results are due to a true “effect” (H1)

Assess
– Assuming H0 is true, what is the probability or “chance” of obtaining the data we did?

Decide
– If the chance is small enough, reject H0 and infer the “effect” is real.
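
The assess and decide steps can be made concrete with an exact permutation test, which is one simple way (not the one used in the slides) to answer "assuming H0 is true, how likely is a difference at least this large?". The scores, the function name and the alpha level of .05 are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Under H0 the group labels are arbitrary, so every way of splitting the
    pooled scores into two groups of the original sizes is equally likely.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    return extreme / count


# Hypothetical scores for two samples of rats
enriched = [14, 15, 16, 17]
sterile = [10, 11, 12, 13]

ALPHA = 0.05                       # assumed significance level
p = permutation_p_value(enriched, sterile)   # assess
reject_h0 = p < ALPHA                        # decide
```

Of the 70 possible splits of these eight scores, only 2 give a difference as large as the observed one, so p = 2/70 (about .029) and H0 is rejected at the .05 level.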

Experimental Designs: Hypothesis Testing

Type of experimental research design

Between-subject (choose by the number of independent variables)
– One independent variable, two groups: independent-samples t-test
– One independent variable, more than two groups: one-way ANOVA
– Two independent variables: two-way ANOVA

Within-subject (choose by the number of groups or levels of the independent variable)
– Two groups or two levels of the independent variable: correlated t-tests
– More than two groups or more than two levels of the independent variable: repeated-measures ANOVA

Parametric vs. Non-Parametric Statistics: Two-Sample Cases

Level of measurement | Related samples | Independent samples
Nominal | McNemar test | Fisher exact test; χ² test
Ordinal | Sign test; Wilcoxon matched-pairs signed-ranks test | Median test; Mann-Whitney U test
Interval | t-test for matched pairs | Independent-samples t-test
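
For nominal data from two independent samples, the chi-square test listed in this table can be sketched as follows. This is a minimal version for a 2x2 table of counts without a continuity correction; the counts are invented, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables are unrelated (H0)
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = two groups, columns = pass / fail
chi2, p = chi_square_2x2([[30, 10], [20, 20]])
```

For these counts the statistic is 16/3 (about 5.33) and the p-value is about .021, so the hypothesis of no relationship would be rejected at the .05 level.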

Parametric vs. Non-Parametric Statistics: > 2-Sample Cases

Level of measurement | Related samples | Independent samples
Nominal | Cochran Q test | χ² test
Ordinal | Friedman two-way ANOVA | Kruskal-Wallis one-way ANOVA
Interval | Repeated-measures ANOVA | ANOVA

Parametric vs. Non-Parametric Statistics: Correlation

Level of measurement | Correlation
Nominal | Contingency coefficient
Ordinal | Spearman rank correlation; Kendall rank correlation, etc.
Interval | Pearson’s correlation coefficient
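
The difference between the interval-level and ordinal-level coefficients can be seen in a short sketch. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, under the simplifying assumption that there are no tied values; the data and function names are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: hours studied per day and grade point average
hours = [1, 2, 3, 4, 5]
gpa = [2.0, 2.1, 2.3, 3.0, 3.9]

r = pearson_r(hours, gpa)
rho = spearman_rho(hours, gpa)
```

Because GPA rises with every extra hour, the rank correlation is exactly 1, while Pearson's r is about .93 because the relationship is not perfectly linear.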

Sampling Distribution of Mean Difference Scores

[Figure: normal curve with regions marked “95% of all cases” and “99% of all cases”; 0 is marked on the horizontal axis.]

Critical Values of t

Need to determine the degrees of freedom
– df = N - 2

Need to determine the p value for rejecting the null hypothesis (alpha).

Need to determine whether this is a 1-tailed or 2-tailed level of significance.
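
These three decisions (degrees of freedom, alpha, and number of tails) can be sketched as a table lookup. The critical values below are copied from a standard t table for alpha = .05 and only a handful of df values; the dictionary layout and the function name are assumptions made for this example, and the one-tailed branch assumes the predicted direction is positive.

```python
# Critical values of t at alpha = .05 for a few degrees of freedom,
# taken from a standard t table (abbreviated on purpose).
CRITICAL_T_05 = {
    10: {"two_tailed": 2.228, "one_tailed": 1.812},
    20: {"two_tailed": 2.086, "one_tailed": 1.725},
    30: {"two_tailed": 2.042, "one_tailed": 1.697},
    60: {"two_tailed": 2.000, "one_tailed": 1.671},
    120: {"two_tailed": 1.980, "one_tailed": 1.658},
}


def is_significant(t_value, n_total, tails="two_tailed"):
    """Compare an obtained t with the tabled critical value for df = N - 2."""
    df = n_total - 2
    if df not in CRITICAL_T_05:
        raise KeyError(f"df={df} is not in this abbreviated table")
    critical = CRITICAL_T_05[df][tails]
    if tails == "one_tailed":
        # Assumes the hypothesis predicted a positive difference.
        return t_value >= critical
    return abs(t_value) >= critical


# 122 scores in total gives df = 120
two_tailed_result = is_significant(2.00, 122)
borderline_two = is_significant(1.90, 122)
borderline_one = is_significant(1.90, 122, tails="one_tailed")
```

A t of 2.00 with 120 degrees of freedom exceeds the two-tailed critical value of 1.980, so it is significant at the .05 level. A t of 1.90 is not significant two-tailed but would be significant one-tailed, which is why the choice of tails has to be made before looking at the data.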

t-Values

t(120) = 2.00, p < 0.05

What is one of the major criticisms of employing statistical tests of the null hypothesis to determine if effects are true?

Limitations of Statistical Tests of the Null Hypothesis

They do not take into account the size of the difference between means (effect size).
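
One common way to express the size of a difference between two means is Cohen's d, the difference divided by the pooled standard deviation. The slides do not name a particular effect-size measure, so Cohen's d is an assumption here, as are the function names and the scores.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def t_from_d(d, n_per_group):
    """t statistic implied by d for two equal-sized independent groups."""
    return d * math.sqrt(n_per_group / 2)


# Hypothetical scores
d = cohens_d([12, 14, 11, 15, 13], [9, 10, 8, 11, 12])

# The same small effect (d = 0.2) with small and very large samples
t_small_sample = t_from_d(0.2, 20)
t_large_sample = t_from_d(0.2, 2000)
```

With d fixed at 0.2, the t statistic is about 0.63 for 20 participants per group but about 6.32 for 2000 per group. A significance test alone therefore cannot tell a large effect from a small effect measured on a very large sample.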

Analysis of Variance (ANOVA)

F-ratio = MSbet / MSwithin

Essentially, this is the between-group variance divided by the within-group variance.

If the groups come from similar populations, the variance between the groups will be similar to the variance within groups (the null hypothesis is not rejected).

ANOVA

Between-group variance consists of:
– Variability due to the effect of the independent variable (treatment effect)
– Variability due to chance factors

Within-group variance consists of:
– Variability in the data within the treatment groups that is due to chance: if the treatment effect were consistent, all subjects within a treatment group would experience a similar magnitude of effect.

Analysis of Variance (ANOVA)

F-ratio = MSbet / MSwithin

MS refers to the mean square, which is the sum of squares divided by the appropriate degrees of freedom.

Df for MSbet is the number of groups minus 1.

Df for MSwithin is the total number of scores in the experiment minus the number of groups.

ANOVA

MSbet = treatment effect + chance variability

MSwithin = chance variability

The ratio will be about 1 if there is no treatment effect.

Example of a reported result: F(2,144) = 5.56, p < 0.05.
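
The F-ratio and its two degrees of freedom can be computed directly from the definitions of MSbet and MSwithin. The sketch below is a minimal one-way ANOVA using only the standard library; the function name and the three small groups of scores are invented for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Sum of squares between groups: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1                  # number of groups minus 1
    df_within = len(all_scores) - len(groups)     # scores minus groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical scores for three treatment groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio, df_b, df_w = one_way_anova(groups)
```

For these scores MSbet = 12 and MSwithin = 1, so F(2,6) = 12. Read the same way, a reported F(2,144) implies 3 groups (2 + 1) and 147 scores in total (144 + 3).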

Two-Way ANOVA

Where you have 2 independent variables, each having at least 2 levels. For example,
– Drug dose (none vs. 5 mg)
– Delivery mode (intravenous vs. oral)

Factorial design, so you can test both main effects and interaction effects.

Mixed Model: 2 Between-Subject Factors, 1 Within-Subject Factor

Where you have 2 between-subject independent variables, each having at least 2 levels. For example,
– Drug dose (none vs. 5 mg)
– Delivery mode (intravenous vs. oral)

One within-subject factor with, for example, 3 levels
– Pre-treatment, 3-month follow-up and 6-month follow-up

Factorial design, so you can test both main effects and interaction effects (including 3-way interaction effects).

Rejecting the Null Hypothesis

The null hypothesis can be rejected but not accepted.

Arguments have been made for allowing some flexibility in being able to conclude that the null hypothesis is true:
– No other studies of the phenomenon have rejected the null hypothesis
– The p value for the test of the null hypothesis is large (e.g., > .20 or .40)
– The research design is sufficiently powerful

Errors in Statistical Decision-Making

Type I error: falsely rejecting the null hypothesis
– At p < .05 there is a 5% chance (5 in 100) of falsely rejecting a null hypothesis that is true

Type II error: failing to reject the null hypothesis when it is false
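
The 5% figure for Type I errors can be checked with a small simulation: draw many samples from a population in which the null hypothesis really is true and count how often a test at alpha = .05 rejects it. The sketch uses a two-tailed z-test with a known standard deviation of 1; the sample size, number of experiments and random seed are arbitrary choices for the example.

```python
import random
from statistics import NormalDist, mean


def type_i_error_rate(n_experiments=2000, n=25, alpha=0.05, seed=1):
    """Fraction of experiments that reject a TRUE null hypothesis.

    Every sample is drawn from a normal population with mean 0 and
    standard deviation 1, so H0 (mean = 0) is true in every experiment.
    """
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * n ** 0.5    # z = (mean - 0) / (1 / sqrt(n))
        if abs(z) > critical_z:
            rejections += 1            # a Type I error
    return rejections / n_experiments


rate = type_i_error_rate()
```

The returned rate should come out close to .05. Every one of those rejections is a false alarm, because no real effect exists in the simulated population.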

External Validity

Chapter 14

Goals of Psychology Research

The goal is to understand the underlying laws governing the behaviour of organisms.

The more the results of your study inform us about these underlying laws, the more valuable the findings.

The importance of the findings is limited by their internal and external validity.

External Validity

The extent to which the results of the study can be generalized across different persons, settings, and times.

We typically think of generalizing to specific populations (e.g., North American elementary school students) rather than to the world at large.

The best safeguard is random selection, but this is not usually feasible.

Threats to External Validity

Lack of population validity
Lack of ecological validity
Lack of time validity

Population Validity

Generalizing to the defined population (i.e., target population) from which the sample was drawn.

The sample is drawn from the experimentally accessible population.

Population Validity

[Diagram labels: Target population; Experimentally accessible population; Sample]

Population Validity

Threatened by a selection-by-treatment interaction:
– Treatment results may not be exactly reproducible in the target population.

Even willingness to volunteer for studies has been shown to result in a selection-by-treatment interaction effect.

Ecological Validity

The extent to which the results can be generalized across settings or environmental conditions.
– E.g., would the treatment effect observed in patients recruited from a first-class medical centre be the same as the treatment effect observed in patients recruited from a local community hospital?

Ecological Validity

Multiple-Treatment Interference
– Sequencing effect whereby exposure to one treatment influences responses to another treatment; or
– Exposure to one experiment influences responses in another experiment (e.g., sophisticated participants).

Ecological Validity

Hawthorne Effect
– Knowing one is in a study can affect one’s behaviour
– Participant bias effects (e.g., social acceptability, compliance)

Novelty or Disruption Effect
– Effects are simply due to novelty and wear off once the novelty diminishes.

Ecological Validity

Experimenter Effect
– An enthusiastic experimenter/clinician may get different effects than a clinician who is implementing the treatment in routine care.

Pre-testing Effect
– Administering a pre-test may sensitize the participant in such a way that he/she responds differently to the experiment than he/she would have without a pre-test.

Temporal Validity

The extent to which the results would generalize to other times.
– Results might vary depending on the time elapsed between presentation of the independent variable and the measurement of the dependent variable.

Temporal Validity

Seasonal Variation
– Variation that appears regularly over time (e.g., change in traffic accident rates between daylight saving time and non-daylight saving time).
– Fixed-time variation: variation at specific, predictable time points
– Variable-time variation: the timing of the variation is not known, but when it occurs there are predictable responses.

Temporal Validity

Cyclical Variation
– Predictable variation within people or other organisms

Personological Variation
– Variation in the characteristics of the individual over time

Internal vs. External Validity

There tends to be an inverse relationship:
– As internal validity increases, external validity tends to decrease

In testing for between-group differences, you want to minimize within-group variability and maximize between-group differences.

To do so, you want to ensure high control over factors that could confound the results, but this often results in increasingly artificial experimental conditions.

When Is External Validity Less Important?

When you don’t need to demonstrate that “X” will happen but rather that “X” can happen.

Sometimes the main goal is to test a theory, and the extent to which it reflects “real life” is less important.