RM Semester4 Project


1. Explain Briefly The Stages In Data Processing?

Introduction

Data processing is simply the conversion of raw data to meaningful

information through a process. Data is manipulated to produce results

that lead to a resolution of a problem or improvement of an existing

situation. Similar to a production process, it follows a cycle where

inputs (raw data) are fed to a process (computer systems, software,

etc.) to produce output (information and insights).

Generally, organizations employ computer systems to carry out a

series of operations on the data in order to present, interpret, or obtain

information. The process includes activities like data entry, summary,

calculation, storage, etc. Useful and informative output is presented in

various appropriate forms such as diagrams, reports, graphics, etc.

Stages of Data Processing

There are five main stages in data processing. They are listed below:

Collection of data

Preparation of data

Input of data

Processing of data

Output of data


Collection of Data

Here data are obtained or gathered from the various sources available. The two main sources are:

Primary Data

Primary data are data collected for the first time; they are first-hand data.

Secondary Data

Secondary data are data extracted from primary data; they are second-hand data.

Preparation of Data

In this stage data are made ready for use by performing various processes, that is:

Classifying the data,

Rearranging the data,

Editing the raw data, etc.

Here the data are filtered for further use. Preparation of data is the stage where researchers make sure that they have sufficient raw data with which to address the research subject. The researcher turns raw data into usable material so that it can be used in further investigation.

Input of Data


Input of data is the stage where the prepared data are put into the data processing system to obtain information. Here the data are passed to the person or department responsible for processing them. For instance, if a computer is used, the data are recorded into the computer.

Processing of Data

In this stage the data are manipulated by sorting, studying, analysing, calculating, updating, etc., to get answers to the research questions. Usually a set of working procedures or instructions is followed.

Output of Information

This is the final stage, where the information is made available for future use. The end result is produced in a usable format.
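The five stages can be illustrated with a minimal Python sketch. Everything in it (the survey answers, the process function) is hypothetical and exists only to show how raw data flows through collection, preparation, input, processing, and output:

```python
# Minimal sketch of the five data processing stages.
# The data and function names are illustrative, not a real system.

# 1. Collection: raw data gathered from a source (here, hard-coded survey answers).
raw_responses = ["  25 ", "31", "", "28", "forty", "22  "]

# 2. Preparation: classify, rearrange, and edit the raw data (drop blanks and
#    non-numeric entries, strip whitespace, sort).
prepared = sorted(int(r.strip()) for r in raw_responses
                  if r.strip().isdigit())

# 3. Input: feed the prepared data into the processing step.
def process(ages):
    # 4. Processing: calculate summary values from the input.
    return {"count": len(ages), "mean_age": sum(ages) / len(ages)}

result = process(prepared)

# 5. Output: present the information in a usable format (a simple report).
print(f"Respondents: {result['count']}, average age: {result['mean_age']:.1f}")
```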

2. Explain In Brief The Measures Of Central Tendency?

Introduction

A measure of central tendency is a single value that describes the way

in which a group of data cluster around a central value. In other words, it is a way to describe the centre of a data set. There are

three measures of central tendency: the mean, the median, and the

mode.

Importance of Central Tendency


Central tendency is very useful in psychology. It lets us know what

is normal or 'average' for a set of data. It also condenses the data set

down to one representative value, which is useful when you are

working with large amounts of data. Could you imagine how difficult

it would be to describe the central location of a 1,000 item data set if

you had to consider every number individually?

Central tendency also allows you to compare one data set to another.

For example, let's say you have a sample of girls and a sample of

boys, and you are interested in comparing their heights. By

calculating the average height for each sample, you could easily draw

comparisons between the girls and boys.

Central tendency is also useful when you want to compare one piece

of data to the entire data set. Let's say you received a 60% on your last

psychology quiz, which is usually in the D range. You go around and

talk to your classmates and find out that the average score on the quiz

was 43%. In this instance, your score was significantly higher than

those of your classmates. Since your teacher grades on a curve, your

60% becomes an A. Had you not known about the measures of central

tendency, you probably would have been really upset by your grade

and assumed that you had bombed the test.

Three Measures of Central Tendency


There are three types of measures of central tendency. Each of these measures gives a different indication of the typical or central value in the distribution. They are:

Mean

Median

Mode

Mean

The mean is the average of the data. It is calculated in two steps:

1. Add the data together to find the sum

2. Take the sum of the data and divide it by the total number of

data

Now let's see how this is done using the height example from earlier.

Let's say we have a sample of 10 girls and 9 boys.

The girls' heights in inches are 60, 72, 61, 66, 63, 66, 59, 64, 71, 68.

Here are the steps to calculate the mean height for the girls:

First, you add the data together: 60 + 72 + 61 + 66 + 63 + 66 + 59 +

64 + 71 + 68 = 650. Then, we take the sum of the data (650) and

divide it by the total number of data (10 girls): 650 / 10 = 65. The

average height for the girls in the sample is 65 inches. If you look at

the data, you can see that 65 is a good representation of the data set

because 65 lands right around the middle of the data set.


The mean is the preferred measure of central tendency because it

considers all of the values in the data set. However, the mean is not

without limitations. In order to calculate the mean, data must be

numerical. You cannot use the mean when you are working with

nominal data, which is data on characteristics like gender, appearance,

and race. For example, there is no way that you can calculate the

mean of the girls' eye colours. The mean is also very sensitive to

outliers, which are numbers that are much higher or much lower than

the rest of the data set and thus, it should not be used when outliers

are present.

To illustrate this point, let's look at what happens to the mean when

we change 68 to 680. Again, we add the data together: 60 + 72 + 61 +

66 + 63 + 66 + 59 + 64 + 71 + 680 = 1262. Then we take the sum of

the data (1262) and divide it by the total number of data (10 girls):

1262 / 10 = 126.2. The mean height (in inches) for the sample of girls

is now 126.2. This number is not a good estimate of the central height

for the girls. This number is almost twice as high as the height of most

of the girls.
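Both calculations can be checked with a minimal Python sketch; the height values are exactly those from the example above:

```python
# Heights (in inches) of the 10 girls from the example.
heights = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]

mean = sum(heights) / len(heights)
print(mean)  # 65.0 -- a good representation of the data set

# Replace 68 with the outlier 680 and the mean is pulled far upward.
with_outlier = [60, 72, 61, 66, 63, 66, 59, 64, 71, 680]
print(sum(with_outlier) / len(with_outlier))  # 126.2 -- no longer typical
```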

Median

The median is determined by sorting the data set from lowest to

highest values and taking the data point in the middle of the sequence.

There are an equal number of points above and below the median. For

example, in the data set {1, 2, 3, 4, 5} the median is 3; there are two


data points greater than this value and two data points less than this

value. In this case, the median is equal to the mean. But consider the

data set {1, 2, 3, 4, 10}. In this data set, the median is still 3, but

the mean is equal to 4. If there is an even number of data points in the

set, then there is no single point at the middle and the median is

calculated by taking the mean of the two middle points.
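This procedure translates directly into code. A minimal Python sketch, handling both the odd and even cases (the standard library's statistics.median does the same job):

```python
def median(data):
    """Sort the data and take the middle point; for an even number
    of points, average the two middle points."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                        # odd: single middle point
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2      # even: mean of the two middle points

print(median([1, 2, 3, 4, 5]))   # 3
print(median([1, 2, 3, 4, 10]))  # 3 -- unaffected by the outlier 10
print(median([1, 2, 3, 4]))      # 2.5
```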

The median can be determined for ordinal data as well as interval and

ratio data. Unlike the mean, the median is not influenced by outliers at

the extremes of the data set. For this reason, the median often is used

when there are a few extreme values that could greatly influence the

mean and distort what might be considered typical. This often is the

case with home prices and with income data for a group of people,

which often is much skewed. For such data, the median often is

reported instead of the mean. For example, in a group of people, if the

salary of one person is 10 times the mean, the mean salary of the

group will be higher because of the unusually large salary. In this

case, the median may better represent the typical salary level of the

group.

Mode

The mode is the most frequently occurring value in the data set. For

example, in the data set {1, 2, 3, 4, 4}, the mode is equal to 4. A data

set can have more than a single mode, in which case it is multimodal.

In the data set {1, 1, 2, 3, 3} there are two modes: 1 and 3.
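A minimal Python sketch that finds every mode, so multimodal data sets are handled as well (the standard library's statistics.multimode does the same job):

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, freq in counts.items() if freq == top]

print(modes([1, 2, 3, 4, 4]))   # [4]
print(modes([1, 1, 2, 3, 3]))   # [1, 3] -- multimodal
```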


The mode can be very useful for dealing with categorical data. For

example, if a sandwich shop sells 10 different types of sandwiches,

the mode would represent the most popular sandwich. The mode also

can be used with ordinal, interval, and ratio data. However, in interval

and ratio scales, the data may be spread thinly with no data points

having the same value. In such cases, the mode may not exist or may

not be very meaningful.

3. What Is Hypothesis? Explain Steps In Hypothesis Testing?

Introduction

A hypothesis is a specific, testable prediction. It describes in concrete

terms what you expect will happen in a certain circumstance.

A hypothesis is not a written conclusion. It is a supposition that is made before any research is done.

Definition

William Goode and Paul Hatt define hypothesis as “a proposition,

which can be put to a test to determine its validity.”

G.A. Lundberg defines hypothesis as “a tentative generalization, the

validity of which remains to be tested.”


A hypothesis can also be defined as “an unproved theory, proposition, supposition, etc., tentatively accepted to explain certain facts or to provide a basis for further investigation, argument, etc.”

PURPOSE OF HYPOTHESIS

A hypothesis is used in an experiment to define the relationship

between two variables. The purpose of a hypothesis is to find the

answer to a question. A formalized hypothesis will force us to think

about what results we should look for in an experiment. The first

variable is called the independent variable. This is the part of the

experiment that can be changed and tested. The independent variable

happens first and can be considered the cause of any changes in the

outcome. The outcome is called the dependent variable. The

independent variable in our example is not studying for a test; the dependent variable that you are using to measure the outcome is your test score.

Let's use this example to illustrate these ideas. The hypothesis is testable because you will receive a score on your test

performance. It is measurable because you can compare test scores

received from when you did study and test scores received from when

you did not study.

A hypothesis should always:

Explain what you expect to happen


Be clear and understandable

Be testable

Be measurable

And contain an independent and dependent variable

HYPOTHESIS TESTING

A hypothesis test is a statistical test that is used to determine whether

there is enough evidence in a sample of data to infer that a certain

condition is true for the entire population. A hypothesis test examines

two opposing hypotheses about a population: the null hypothesis and

the alternative hypothesis. The null hypothesis is the statement being

tested. Usually the null hypothesis is a statement of "no effect" or "no

difference". The alternative hypothesis is the statement you want to be

able to conclude is true.

Based on the sample data, the test determines whether to reject the null hypothesis. You use a p-value to make the determination. If the p-value is less than or equal to the level of significance, which is a cut-off point that you define, then you can reject the null hypothesis.

A common misconception is that statistical hypothesis tests are

designed to select the more likely of two hypotheses. Instead, a test

will remain with the null hypothesis until there is enough evidence

(data) to support the alternative hypothesis.


Example of performing a basic hypothesis test

We can follow six basic steps to correctly set up and perform a

hypothesis test. For example, the manager of a pipe manufacturing

facility must ensure that the diameters of its pipes equal 5 cm. The

manager follows the basic steps for doing a hypothesis test.

NOTE

We should determine the criteria for the test and the required sample

size before we collect the data.

1. Specify the hypotheses.

First, the manager formulates the hypotheses. The null hypothesis is: the population mean of all the pipes is equal to 5 cm. Formally, this is written as: H0: μ = 5

Then, the manager chooses from the following alternative hypotheses:

Condition to test                                 Alternative hypothesis
The population mean is less than the target.      one-sided: μ < 5
The population mean is greater than the target.   one-sided: μ > 5
The population mean differs from the target.      two-sided: μ ≠ 5

Because they need to ensure that the pipes are not larger or smaller

than 5 cm, the manager chooses the two-sided alternative hypothesis,

which states that the population mean of all the pipes is not equal to 5

cm. Formally, this is written as H1: μ ≠ 5

2. Determine the power and sample size for the test.

The manager uses a power and sample size calculation to determine

how many pipes they need to measure to have a good chance of

detecting a difference of 0.1 cm or more from the target diameter.

3. Choose a significance level (also called alpha or α).

The manager selects a significance level of 0.05, which is the most commonly used significance level.

4. Collect the data.

They collect a sample of pipes and measure their diameters.

5. Compare the p-value from the test to the significance level.

After they perform the hypothesis test, the manager obtains a p-value

of 0.004. The p-value is less than the significance level of 0.05.


6. Decide whether to reject or fail to reject the null hypothesis.

The manager rejects the null hypothesis and concludes that the mean

pipe diameter of all pipes is not equal to 5 cm.
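The source does not say which statistical package the manager used; purely as an illustration, here is how the same six-step test could be run in Python with scipy. The diameter values below are made up for the sketch:

```python
from scipy import stats

# Hypothetical sample of measured pipe diameters (cm); real data would
# come from step 4 of the procedure above.
diameters = [5.07, 5.11, 4.98, 5.12, 5.09, 5.04, 5.13, 5.08, 5.10, 5.06]

alpha = 0.05   # step 3: significance level
target = 5.0   # H0: the population mean diameter is 5 cm

# Two-sided one-sample t-test (H1: mean differs from 5), matching step 1.
t_stat, p_value = stats.ttest_1samp(diameters, popmean=target)

# Steps 5 and 6: compare the p-value to alpha and decide.
if p_value <= alpha:
    print(f"p = {p_value:.4f} <= {alpha}: reject H0 (mean differs from 5 cm)")
else:
    print(f"p = {p_value:.4f} > {alpha}: fail to reject H0")
```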

Data that can be analysed with a hypothesis test

Hypothesis tests can be used to evaluate many different parameters of

a population. Each test is designed to evaluate a parameter associated

with a certain type of data. Knowing the difference between the types

of data, and which parameters are associated with each data type, can

help you choose the most appropriate test.

Continuous Data

You will have continuous data when you evaluate the mean, median,

standard deviation, or variance.

When you measure a characteristic of a part or process, such as

length, weight, or temperature, you usually obtain continuous data.

Continuous data often includes fractional (or decimal) values.

For example, a quality engineer wants to determine whether the mean

weight differs from the value stated on the package label (500 g). The

engineer samples cereal boxes and records their weights.

Binomial Data

You will have binomial data when you evaluate a proportion or a

percentage.


When you classify an item, event, or person into one of two

categories you obtain binomial data. The two categories should be

mutually exclusive, such as yes/no, pass/fail, or defective/non-defective.

For example, engineers examine a sample of bolts for severe cracks

that make the bolts unusable. They record the number of bolts that are

inspected and the number of bolts that are rejected. The engineers

want to determine whether the percentage of defective bolts is less

than 0.2%.
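As an illustration (not from the source), such a one-sided proportion test can be run with scipy's exact binomial test; the counts below are invented:

```python
from scipy.stats import binomtest

# Hypothetical inspection results: 4 rejected bolts out of 4,000 inspected.
rejected, inspected = 4, 4000

# H0: defective proportion = 0.2%; H1: proportion < 0.2% (one-sided).
result = binomtest(rejected, inspected, p=0.002, alternative="less")
print(f"p-value = {result.pvalue:.3f}")  # compare to the chosen alpha
```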

Poisson Data

You will have Poisson data when you evaluate a rate of occurrence.

When you count the presence of a characteristic, result, or activity

over a certain amount of time, area, or other length of observation,

you obtain Poisson data. Poisson data are evaluated in counts per unit,

with units of the same size.

For example, inspectors at a bus company count the number of bus

breakdowns each day for 30 days. The company wants to determine

the daily rate of bus breakdowns.

About the Null and Alternative hypotheses

A hypothesis test examines two opposing hypotheses about a

population: the null hypothesis and the alternative hypothesis. How

you set up these hypotheses depends on what you are trying to show.


Null hypothesis (H0)

The null hypothesis states that a population parameter is equal to a

value. The null hypothesis is often an initial claim that researchers

specify using previous research or knowledge.

Alternative Hypothesis (H1)

The alternative hypothesis states that the population parameter is

different from the value of the population parameter in the null

hypothesis. The alternative hypothesis is what you might believe to be

true or hope to prove true.

When you do a hypothesis test, two types of errors are possible. They are as follows:

Type I

Type II

The risks of these two errors are inversely related and determined by

the level of significance and the power for the test. Therefore, you

should determine which error has more severe consequences for your

situation before you define their risks.

No hypothesis test is 100% certain. Because the test is based on

probabilities, there is always a chance of drawing an incorrect

conclusion.


Type I error

When the null hypothesis is true and you reject it, you make a type I

error. The probability of making a type I error is α, which is the level

of significance you set for your hypothesis test. An α of 0.05 indicates

that you are willing to accept a 5% chance that you are wrong when

you reject the null hypothesis. To lower this risk, you must use a

lower value for α.

Type II error

When the null hypothesis is false and you fail to reject it, you make a

type II error. The probability of making a type II error is β, which

depends on the power of the test. You can decrease your risk of

committing a type II error by ensuring your test has enough power.

You can do this by ensuring your sample size is large enough to

detect a practical difference when one truly exists.

The probability of rejecting the null hypothesis when it is false is

equal to 1 - β. This value is the power of the test.

Decision         Null hypothesis is true          Null hypothesis is false

Fail to reject   Correct decision                 Type II error: failing to reject
                 (probability = 1 - α)            the null when it is false
                                                  (probability = β)

Reject           Type I error: rejecting the      Correct decision
                 null when it is true             (probability = 1 - β)
                 (probability = α)
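These error rates can be made concrete with a small simulation (illustrative only, not from the source): draw many samples from a population where H0 is actually true and count how often a test at α = 0.05 wrongly rejects.

```python
import random
from scipy import stats

random.seed(1)
alpha, trials, false_rejections = 0.05, 2000, 0

# H0 is true by construction: every sample is drawn from a population
# whose mean really is 5. Any rejection is therefore a Type I error.
for _ in range(trials):
    sample = [random.gauss(5.0, 0.1) for _ in range(30)]
    if stats.ttest_1samp(sample, popmean=5.0).pvalue <= alpha:
        false_rejections += 1

# The observed Type I error rate should come out near alpha (~0.05).
print(false_rejections / trials)
```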

Example of type I and type II error

To understand the interrelationship between type I and type II error,

and to determine which error has more severe consequences for your

situation, consider the following example.

A medical researcher wants to compare the effectiveness of two

medications. The null and alternative hypotheses are:

Null hypothesis (H0): μ1= μ2

The two medications are equally effective.

Alternative hypothesis (H1): μ1≠ μ2

The two medications are not equally effective.

A type I error occurs if the researcher rejects the null hypothesis and

concludes that the two medications are different when, in fact, they

are not. If the medications have the same effectiveness, the researcher

may not consider this error too severe because the patients still benefit

from the same level of effectiveness regardless of which medicine

they take. However, if a type II error occurs, the researcher fails to

reject the null hypothesis when it should be rejected. That is, the

researcher concludes that the medications are the same when, in fact,

they are different. This error is potentially life-threatening if the less-effective medication is sold to the public instead of the more effective one.

As you conduct your hypothesis tests, consider the risks of making

type I and type II errors. If the consequences of making one type of

error are more severe or costly than making the other type of error,

then choose a level of significance and a power for the test that will

reflect the relative severity of those consequences.

4. Explain the meaning and significance of interpretation of

data?

Data interpretation refers to the process of critiquing and determining

the significance of important information, such as survey results,

experimental findings, observations or narrative reports. Interpreting

data is an important critical thinking skill that helps you comprehend

textbooks, graphs and tables. Researchers use a similar but more

meticulous process to gather, analyze and interpret data. Experimental

scientists base their interpretations largely on objective data and

statistical calculations. Social scientists interpret the results of written

reports that are rich in descriptive detail but may be devoid of

mathematical calculations.


Data interpretation is part of daily life for most

people. Interpretation is the process of making sense of

numerical data that has been collected, analyzed, and presented. There

are two types of data interpretation:

Quantitative data interpretation

Qualitative data interpretation

Quantitative Interpretation

Scientists interpret the results of rigorous experiments that are

performed under specific conditions. Quantifiable data are entered

into spreadsheets and statistical software programs, and then

interpreted by researchers seeking to determine if the results they

achieved are statistically significant or more likely due to chance or

error. The results help prove or disprove hypotheses generated from

an existing theory. By using scientific methods, researchers can

generalize about how their results might apply to a larger population.

For example, if data show that a small group of cancer patients in a

voluntary drug study went into remission after taking a new drug,

other cancer patients might also benefit from it.

Qualitative Interpretation

Certain academic disciplines, such as sociology, anthropology and

women’s studies, rely heavily on the collection and interpretation of


qualitative data. Researchers seek new knowledge and insight into

phenomena such as the stages of grief following a loss, for example.

Instead of controlled experiments, data is collected through

techniques such as field observations or personal interviews of

research subjects that are recorded and transcribed. Social scientists

study field notes or look for themes in transcriptions to make

meaning out of the data.

The interpretation of data is based on the workings of the human

mind. Since the human mind is not 100 percent objective, the

interpretation of data may not be 100 percent accurate. There are various aspects to the interpretation of data; they are discussed below:

Correct Interpretation

In order to understand misinterpretation, the correct way to interpret

data must be understood. Data interpretation must be approached

without personal bias or preconceived opinions. A researcher forms

an initial opinion, called the hypothesis. He runs an experiment based

on the hypothesis. The data collected prove or disprove his original

hypothesis. For example, a researcher states that the sky is blue

because of nitrogen. He runs an experiment, and the data collected

reveal a high concentration of ozone. In his conclusion, he states the

original hypothesis was wrong, and the facts collected indicate ozone

is the colorant gas. By interpreting data objectively, the correct


conclusion is reached. Unfortunately, having a 100 percent bias-free

and objective frame of mind is difficult.

Subjectivity

Suppose you are writing a technical manual and in a step you state:

"Move the part up a little bit, and sideways a little bit." The words "a

little bit" are extremely subjective. To one person, this may mean 1

inch. To another, this may mean 1 foot. Furthermore, "sideways" does

not specify to the left or right. Two different people will interpret the

data you presented completely differently. Stating "move part number

30 to the left one inch" eliminates the error in interpreting the data.

For data to be effectively interpreted, it has to be objective and

accurate.

Background and Experience

According to Drs. Anne E. Egger and Anthony Carpi at Vision

Learning, people base the interpretation of data upon their

background and prior experience. Since backgrounds vary widely, the

interpretation varies widely as well. Drs. Egger and Carpi stated that

even scientists (who are supposed to be objective) can interpret the

same set of data and reach differing opinions depending on their

backgrounds.

Abnormal Mental States


People with an abnormal mental state will interpret data in abnormal

ways. Researchers M.R. Broom et al., writing for the British Journal of Psychiatry, reported their findings in 2007. The findings were that

people with delusional attributes jumped to conclusions quickly after

interpreting only a little bit of data. Furthermore, they did not tolerate

ambiguity. For example, a person with severe paranoia may read that

the law enforcement does wiretaps. He may stop reading there, never

reading that this is done only with a search warrant and court approval.

He jumps to the conclusion, based on incomplete data, that he is being

wiretapped.

Cultural Background

In 1968, researchers Marshall Segall et al. presented a series of optical

illusions to people of different cultural groups. The conclusion

reached was that different groups perceived the illusions in various

ways. This experiment illustrated that a person's cultural background

influences how data is interpreted.

Significance of interpretation of data

Interpretation is essential for the simple reason that the usefulness and

utility of research findings lie in proper interpretation. It is considered a basic component of the research process for the following reasons:


1. It is through interpretation that the researcher can well

understand the abstract principle that works beneath his

findings. Through this he can link up his findings with those of

other studies, having the same abstract principle, and thereby

can make predictions about the concrete world of events. Fresh inquiries

can test these predictions later on. This way the continuity in

research can be maintained.

2. Interpretation leads to the establishment of explanatory concepts

that can serve as a guide for future research studies; it opens

new avenues of intellectual adventure and stimulates the quest

for more knowledge.

3. Researcher can better appreciate only through interpretation why

his findings are what they are and can make others understand

the real significance of his research findings.

4. The interpretation of the findings of exploratory research study

often results in hypotheses for experimental research and as

such interpretation is involved in the transition from exploratory

to experimental research. Since an exploratory study does not

have a hypothesis to start with, the findings of such a study have

to be interpreted on a post-factum basis in which case the

interpretation is technically described as ‘post factum’

interpretation.


5. What are the precautions essential for interpretation of data?

One should always remember that even if the data are properly

collected and analysed, wrong interpretation would lead to inaccurate

conclusions. It is, therefore, absolutely essential that the task of

interpretation be accomplished with patience in an impartial manner

and also in correct perspective. Researcher must pay attention to the

following points for correct interpretation:

1. At the outset, the researcher must invariably satisfy himself that the data are appropriate, trustworthy and adequate for drawing inferences; that the data reflect good homogeneity; and that proper analysis has been done through statistical methods.

2. The researcher must remain cautious about the errors that can

possibly arise in the process of interpreting results. Errors can

arise due to false generalization and/or due to wrong

interpretation of statistical measures, such as the application of

findings beyond the range of observations, identification of

correlation with causation and the like. Another major pitfall is

the tendency to affirm that definite relationships exist on the

basis of confirmation of particular hypotheses. In fact, the

positive test results accepting the hypothesis must be interpreted

as “being in accord” with the hypothesis, rather than as

“confirming the validity of the hypothesis”. The researcher must

remain vigilant about all such things so that false generalization


may not take place. He should be well equipped with and must

know the correct use of statistical measures for drawing

inferences concerning his study.

3. He must always keep in view that the task of interpretation is

very much intertwined with analysis and cannot be distinctly

separated. As such he must take the task of interpretation as a

special aspect of analysis and accordingly must take all those

precautions that one usually observes while going through the

process of analysis viz., precautions concerning the reliability of

data, computational checks, validation and comparison of

results.

4. He must never lose sight of the fact that his task is not only to

make sensitive observations of relevant occurrences, but also to

identify and disengage the factors that are initially hidden to the

eye. This will enable him to do his job of interpretation on

proper lines. Broad generalisation should be avoided as most

research is not amenable to it because the coverage may be

restricted to a particular time, a particular area and particular

conditions. Such restrictions, if any, must invariably be

specified and the results must be framed within their limits.

5. The researcher must remember that “ideally in the course of a

research study, there should be constant interaction between

initial hypothesis, empirical observation and theoretical


conceptions. It is exactly in this area of interaction between

theoretical orientation and empirical observation that

opportunities for originality and creativity lie." He must pay

special attention to this aspect while engaged in the task of

interpretation.

6. Discuss the essentials of the good research report?
