MBA 3rd Sem Assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Assignment (Set-1), Subject code: MB0050

Research Methodology

Q 1. Give examples of specific situations that would call for the following types of research, explaining why – a) Exploratory research b) Descriptive research c) Diagnostic research d) Evaluation research.

Ans.: Research may be classified broadly according to its major intent or its methods. According to intent, research may be classified as:

Basic (aka fundamental or pure) research is driven by a scientist's curiosity or interest in a scientific question. The main motivation is to expand man's knowledge, not to create or invent something. There is no obvious commercial value to the discoveries that result from basic research.

For example, basic science investigations probe for answers to questions such as:

How did the universe begin?

What are protons, neutrons, and electrons composed of?

How do slime molds reproduce?

What is the specific genetic code of the fruit fly?

Most scientists believe that a basic, fundamental understanding of all branches of science is needed in order for progress to take place. In other words, basic research lays down the foundation for the applied science that follows. If basic work is done first, then applied spin-offs often eventually result from this research. As Dr. George Smoot of LBNL says, "People cannot foresee the future well enough to predict what's going to develop from basic research. If we only did applied research, we would still be making better spears."

Applied research is designed to solve practical problems of the modern world, rather than to acquire knowledge for knowledge's sake. One might say that the goal of the applied scientist is to improve the human condition.

For example, applied researchers may investigate ways to:

Improve agricultural crop production

Treat or cure a specific disease

Improve the energy efficiency of homes, offices, or modes of transportation

Some scientists feel that the time has come for a shift in emphasis away from purely basic research and toward applied science. This trend, they feel, is necessitated by the problems resulting from global overpopulation, pollution, and the overuse of the earth's natural resources.

Exploratory research provides insights into and comprehension of an issue or situation. It should draw definitive conclusions only with extreme caution. Exploratory research is a type of research conducted because a problem has not been clearly defined. Exploratory research helps determine the best research design, data collection method and selection of subjects. Given its fundamental nature, exploratory research often concludes that a perceived problem does not actually exist.


Exploratory research often relies on secondary research such as reviewing available literature and/or data, or qualitative approaches such as informal discussions with consumers, employees, management or competitors, and more formal approaches through in-depth interviews, focus groups, projective methods, case studies or pilot studies. The Internet allows for research methods that are more interactive in nature: E.g., RSS feeds efficiently supply researchers with up-to-date information; major search engine search results may be sent by email to researchers by services such as Google Alerts; comprehensive search results are tracked over lengthy periods of time by services such as Google Trends; and Web sites may be created to attract worldwide feedback on any subject.

The results of exploratory research are not usually useful for decision-making by themselves, but they can provide significant insight into a given situation. Although the results of qualitative research can give some indication as to the "why", "how" and "when" something occurs, it cannot tell us "how often" or "how many."

Exploratory research is not typically generalizable to the population at large.

A defining characteristic of causal research is the random assignment of participants to the conditions of the experiment, e.g., an experimental and a control condition. Such assignment results in the groups being comparable at the beginning of the experiment, so any difference between the groups at the end of the experiment is attributable to the manipulated variable. Observational research, by contrast, typically looks for differences among intact, pre-defined groups. A common example compares smokers and non-smokers with regard to health problems. Causal conclusions cannot be drawn from such a study because of other possible differences between the groups; e.g., smokers may drink more alcohol than non-smokers, and other unknown differences could exist as well. Hence, we may see a relation between smoking and health, but a conclusion that smoking is a cause would not be warranted in this situation.

Descriptive research, also known as statistical research, describes data and characteristics about the population or phenomenon being studied. Descriptive research answers the questions who, what, where, when and how.

Although the data description is factual, accurate and systematic, the research cannot describe what caused a situation. Thus, descriptive research cannot be used to create a causal relationship, where one variable affects another. In other words, descriptive research can be said to have a low requirement for internal validity.

The description is used for frequencies, averages and other statistical calculations. Often the best approach, prior to writing descriptive research, is to conduct a survey investigation. Qualitative research often has the aim of description, and researchers may follow up with examinations of why the observations exist and what the implications of the findings are.

In short, descriptive research deals with everything that can be counted and studied, but there are always restrictions to that. Your research must have an impact on the lives of the people around you. For example, finding the most frequent disease that affects the children of a town lets the readers of the research know what to do to prevent that disease; thus, more people will live a healthy life.

Diagnostic study: It is similar to a descriptive study but with a different focus. It is directed towards discovering what is happening and what can be done about it. It aims at identifying the causes of a problem and the possible solutions for it. It may also be concerned with discovering and testing whether certain variables are associated. This type of research requires prior knowledge of the problem, its thorough formulation, a clear-cut definition of the given population, adequate methods for collecting accurate information, precise measurement of variables, statistical analysis and tests of significance.

Evaluation Studies: It is a type of applied research. It is made for assessing the effectiveness of social or economic programmes implemented, or for assessing the impact of development on the project area. It is thus directed to assess or appraise the quality and quantity of an activity and its performance, and to specify its attributes and the conditions required for its success. It is concerned with causal relationships and is more actively guided by hypotheses. It is concerned also with change over time.

Action research is a reflective process of progressive problem solving led by individuals working with others in teams or as part of a "community of practice" to improve the way they address issues and solve problems. Action research can also be undertaken by larger organizations or institutions, assisted or guided by professional researchers, with the aim of improving their strategies, practices, and knowledge of the environments within which they practice. As designers and stakeholders, researchers work with others to propose a new course of action to help their community improve its work practices (Center for Collaborative Action Research). Kurt Lewin, then a professor at MIT, first coined the term “action research” in about 1944, and it appears in his 1946 paper “Action Research and Minority Problems”. In that paper, he described action research as “a comparative research on the conditions and effects of various forms of social action and research leading to social action” that uses “a spiral of steps, each of which is composed of a circle of planning, action, and fact-finding about the result of the action”.

Action research is an interactive inquiry process that balances problem solving actions implemented in a collaborative context with data-driven collaborative analysis or research to understand underlying causes enabling future predictions about personal and organizational change (Reason & Bradbury, 2001). After six decades of action research development, many methodologies have evolved that adjust the balance to focus more on the actions taken or more on the research that results from the reflective understanding of the actions. This tension exists between

● those that are more driven by the researcher’s agenda and those more driven by participants;
● those that are motivated primarily by instrumental goal attainment and those motivated primarily by the aim of personal, organizational, or societal transformation; and
● 1st-, 2nd-, and 3rd-person research, that is: my research on my own action, aimed primarily at personal change; our research on our group (family/team), aimed primarily at improving the group; and ‘scholarly’ research aimed primarily at theoretical generalization and/or large-scale change.

Action research challenges traditional social science, by moving beyond reflective knowledge created by outside experts sampling variables to an active moment-to-moment theorizing, data collecting, and inquiring occurring in the midst of emergent structure. “Knowledge is always gained through action and for action. From this starting point, to question the validity of social knowledge is to question, not how to develop a reflective science about action, but how to develop genuinely well-informed action — how to conduct an action science” (Tolbert 2001).

Q 2.In the context of hypothesis testing, briefly explain the difference between a) Null and alternative hypothesis b) Type 1 and type 2 error c) Two tailed and one tailed test d) Parametric and non-parametric tests.

Ans.: Some basic concepts in the context of testing of hypotheses are explained below –

1) Null Hypotheses and Alternative Hypotheses: In the context of statistical analysis, we often talk about null and alternative hypotheses. If we are to compare the superiority of method A with that of method B and we proceed on the assumption that both methods are equally good, then this assumption is termed as a null hypothesis. On the other hand, if we think that method A is superior, then it is known as an alternative hypothesis.

These are symbolically represented as: Null hypothesis = H0 and Alternative hypothesis = Ha


Suppose we want to test the hypothesis that the population mean is equal to the hypothesized mean (µH0) = 100. Then we would say that the null hypothesis is that the population mean is equal to the hypothesized mean of 100, and symbolically we can express it as H0: µ = µH0 = 100. If our sample results do not support this null hypothesis, we should conclude that something else is true. What we conclude on rejecting the null hypothesis is known as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha. For H0: µ = µH0 = 100, we may consider three possible alternative hypotheses as follows:

Ha: µ ≠ µH0 (to be read as: the population mean is not equal to 100, i.e., it may be more or less than 100)

Ha: µ > µH0 (to be read as: the population mean is greater than 100)

Ha: µ < µH0 (to be read as: the population mean is less than 100)

The null hypotheses and the alternative hypotheses are chosen before the sample is drawn (the researcher must avoid the error of deriving hypotheses from the data he collects and testing the hypotheses from the same data). In the choice of null hypothesis, the following considerations are usually kept in view:

a. The alternative hypothesis is usually the one, which is to be proved, and the null hypothesis is the one that is to be disproved. Thus a null hypothesis represents the hypothesis we are trying to reject, while the alternative hypothesis represents all other possibilities.

b. If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as null hypothesis, because then the probability of rejecting it when it is true is α (the level of significance) which is chosen very small.

c. The null hypothesis should always be a specific hypothesis i.e., it should not state an approximate value.

Generally, in hypothesis testing, we proceed on the basis of the null hypothesis, keeping the alternative hypothesis in view. Why so? The answer is that on the assumption that the null hypothesis is true, one can assign the probabilities to different possible sample results, but this cannot be done if we proceed with alternative hypotheses. Hence the use of null hypotheses (at times also known as statistical hypotheses) is quite frequent.
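To make the H0: µ = µH0 = 100 example concrete, here is a minimal sketch, not part of the original assignment, of how such a null and alternative hypothesis could be tested with SciPy's one-sample t-test. The sample values are invented purely for illustration.

```python
# Illustrative sketch: testing H0: mu = 100 against Ha: mu != 100
# on an invented sample, using SciPy's one-sample t-test.
from scipy import stats

sample = [102, 98, 105, 110, 99, 97, 103, 108, 101, 96]   # invented data

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

alpha = 0.05                      # level of significance chosen in advance
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject H0: evidence that the population mean differs from 100")
else:
    print("Fail to reject H0: no evidence the population mean differs from 100")
```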

2) The Level of Significance: This is a very important concept in the context of hypothesis testing. It is always some percentage (usually 5%), which should be chosen with great care, thought and reason. In case we take the significance level at 5%, then this implies that H0 will be rejected when the sampling result (i.e., observed evidence) has a less than 0.05 probability of occurring if H0 is true. In other words, the 5% level of significance means that the researcher is willing to take as much as a 5% risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum value of the probability of rejecting H0 when it is true and is usually determined in advance, before testing the hypothesis.

3) Decision Rule or Test of Hypotheses: Given a null hypothesis H0 and an alternative hypothesis Ha, we make a rule, known as a decision rule, according to which we accept H0 (i.e., reject Ha) or reject H0 (i.e., accept Ha). For instance, if H0 is that a certain lot is good (there are very few defective items in it), against Ha, that the lot is not good (there are many defective items in it), then we must decide the number of items to be tested and the criterion for accepting or rejecting the hypothesis. We might test 10 items in the lot and plan our decision by saying that if there is none or only one defective item among the 10, we will accept H0; otherwise we will reject H0 (i.e., accept Ha). This sort of basis is known as a decision rule.

4) Type I & II Errors: In the context of testing of hypotheses, there are basically two types of errors that we can make. We may reject H0 when H0 is true, or we may accept H0 when it is not true. The former is known as a Type I error and the latter as a Type II error. In other words, a Type I error means rejecting a hypothesis that should have been accepted, and a Type II error means accepting a hypothesis that should have been rejected. The probability of a Type I error is denoted by α (alpha), also called the level of significance of the test; the probability of a Type II error is denoted by β (beta).

Decision

                   Accept H0                   Reject H0
H0 (true)          Correct decision            Type I error (α error)
H0 (false)         Type II error (β error)     Correct decision

The probability of a Type I error is usually determined in advance and is understood as the level of significance of testing the hypothesis. If the Type I error is fixed at 5%, it means there are about 5 chances in 100 that we will reject H0 when H0 is true. We can control the Type I error just by fixing it at a lower level; for instance, if we fix it at 1%, the maximum probability of committing a Type I error would be only 0.01. But with a fixed sample size n, when we try to reduce the Type I error, the probability of committing a Type II error increases. Both types of errors cannot be reduced simultaneously, since there is a trade-off between them. Decision makers decide the appropriate level of Type I error by examining the costs or penalties attached to both types of errors. If a Type I error involves the time and trouble of reworking a batch of chemicals that should have been accepted, whereas a Type II error means taking the chance that an entire group of users of this chemical compound will be poisoned, then in such a situation one should prefer a Type I error to a Type II error; as a result, one must set a very high level for the Type I error in one's testing technique for the given hypothesis. Hence, in testing of hypotheses, one must make all possible efforts to strike an adequate balance between Type I and Type II errors.
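The trade-off described above can be shown with a small simulation. This is an illustrative sketch with assumed values (n = 30, σ = 10, and a true mean of 103 when H0 is false), not part of the original text: for a fixed sample size, lowering α reduces Type I errors but raises β.

```python
# Illustrative simulation of the alpha/beta trade-off for H0: mu = 100.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu0, mu_true, sigma, trials = 30, 100.0, 103.0, 10.0, 5_000

def rejection_rate(mu, alpha):
    """Fraction of simulated samples in which H0: mu = 100 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = rng.normal(mu, sigma, n)
        _, p = stats.ttest_1samp(sample, mu0)
        rejections += p < alpha
    return rejections / trials

for alpha in (0.05, 0.01):
    type1 = rejection_rate(mu0, alpha)       # H0 true: rejection is a Type I error
    power = rejection_rate(mu_true, alpha)   # H0 false: non-rejection is a Type II error
    print(f"alpha={alpha}: Type I ~ {type1:.3f}, Type II (beta) ~ {1 - power:.3f}")
```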

5) Two-Tailed Test & One-Tailed Test: In the context of hypothesis testing, these two terms are quite important and must be clearly understood. A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the hypothesized value of the population mean. Such a test is appropriate when we have H0: µ = µH0 and Ha: µ ≠ µH0, which may mean µ > µH0 or µ < µH0. If the significance level is 5% and a two-tailed test is applied, the probability of the rejection region will be 0.05 (equally split between the two tails of the curve as 0.025 each) and that of the acceptance region will be 0.95. If we take µH0 = 100 and our sample mean deviates significantly from 100 in either direction, we shall reject the null hypothesis. But there are situations when only a one-tailed test is considered appropriate. A one-tailed test is used when we are to test, say, whether the population mean is either lower than or higher than some hypothesized value.
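As a quick numeric illustration (the z value of 1.80 is assumed, not taken from the text), the same test statistic can fail a two-tailed test at the 5% level yet reject H0 in a one-tailed test, because the whole 0.05 rejection probability then sits in a single tail.

```python
# Sketch of one-tailed vs two-tailed p-values for the same assumed z statistic.
from scipy.stats import norm

z = 1.80                                    # assumed standardized test statistic

p_two_tailed = 2 * (1 - norm.cdf(abs(z)))   # Ha: mu != 100
p_upper_tail = 1 - norm.cdf(z)              # Ha: mu > 100
p_lower_tail = norm.cdf(z)                  # Ha: mu < 100

print(f"two-tailed p = {p_two_tailed:.4f}")          # ~0.072 -> not rejected at 5%
print(f"one-tailed (upper) p = {p_upper_tail:.4f}")  # ~0.036 -> rejected at 5%
print(f"one-tailed (lower) p = {p_lower_tail:.4f}")
```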

Parametric statistics is a branch of statistics that assumes the data come from a known type of probability distribution and makes inferences about the parameters of that distribution. Most well-known elementary statistical methods are parametric. Generally speaking, parametric methods make more assumptions than non-parametric methods. If those extra assumptions are correct, parametric methods can produce more accurate and precise estimates; they are said to have more statistical power. However, if the assumptions are incorrect, parametric methods can be very misleading, and for that reason they are often not considered robust. On the other hand, parametric formulae are often simpler to write down and faster to compute. In some, but definitely not all, cases their simplicity makes up for their non-robustness, especially if care is taken to examine diagnostic statistics.

Because parametric statistics require a probability distribution, they are not distribution-free. Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term non-parametric is not meant to imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance. For example, kernel density estimation provides better estimates of a density than histograms do; non-parametric and semi-parametric regression methods have been developed based on kernels, splines, and wavelets; and Data Envelopment Analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
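A short sketch of the contrast, using invented data: the parametric two-sample t-test assumes roughly normal populations, while the non-parametric Mann-Whitney U test works on ranks and is therefore much less affected by the outlier placed in the second group.

```python
# Sketch comparing a parametric test (t-test) with a non-parametric
# counterpart (Mann-Whitney U) on the same two invented samples.
from scipy import stats

group_a = [12.1, 14.3, 13.8, 15.2, 12.9, 14.0, 13.5]
group_b = [15.9, 16.4, 14.8, 17.1, 15.5, 16.0, 40.0]   # note the outlier

# Parametric: assumes (roughly) normal populations; sensitive to the outlier.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric: based on ranks, so it makes no normality assumption.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t-test: p = {t_p:.3f}")
print(f"Mann-Whitney U: p = {u_p:.3f}")
```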

Q 3. Explain the difference between a causal relationship and correlation, with an example of each. What are the possible reasons for a correlation between two variables?


Ans.: Correlation: Correlation here is about knowing what the consumer wants and providing it. Marketing research looks at trends in sales and studies all of the variables, i.e., price, color, availability, and styles, and the best way to give the customer what he or she wants. If you can give customers what they want, they will buy, and they will let friends and family know where they got it. Making them happy makes the money.

Causal relationship: Relationship Marketing was first defined as a form of marketing developed from direct response marketing campaigns, which emphasizes customer retention and satisfaction rather than a dominant focus on sales transactions.

As a practice, Relationship Marketing differs from other forms of marketing in that it recognizes the long term value of customer relationships and extends communication beyond intrusive advertising and sales promotional messages.

With the growth of the internet and mobile platforms, Relationship Marketing has continued to evolve and move forward as technology opens more collaborative and social communication channels. This includes tools for managing relationships with customers that go beyond simple demographic and customer service data. Relationship Marketing extends to include Inbound Marketing efforts (a combination of search optimization and strategic content), PR, social media and application development.

Just like Customer Relationship Management (CRM), Relationship Marketing is a broadly recognized, widely implemented strategy for managing and nurturing a company's interactions with clients and sales prospects. It also involves using technology to organize and synchronize business processes (principally sales and marketing activities) and, most importantly, to automate marketing and communication activities as concrete marketing sequences that can run on autopilot. The overall goals are to find, attract, and win new clients, nurture and retain those the company already has, entice former clients back into the fold, and reduce the costs of marketing and client service. Once simply a label for a category of software tools, today it generally denotes a company-wide business strategy embracing all client-facing departments and even beyond. When an implementation is effective, people, processes, and technology work in synergy to increase profitability and reduce operational costs.

Reasons for a correlation between two variables: chance association (the relationship is due to chance) or causative association (one variable causes the other).

The information given by a correlation coefficient is not enough to define the dependence structure between random variables. The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a multivariate normal distribution. In the case of elliptical distributions it characterizes the (hyper-)ellipses of equal density; however, it does not completely characterize the dependence structure (for example, a multivariate t-distribution's degrees of freedom determine the level of tail dependence).

Distance correlation and Brownian covariance/Brownian correlation were introduced to address the deficiency of Pearson's correlation that it can be zero for dependent random variables; zero distance correlation and zero Brownian correlation imply independence.

The correlation ratio is able to detect almost any functional dependency, and the entropy-based mutual information/total correlation is capable of detecting even more general dependencies. The latter are sometimes referred to as multi-moment correlation measures, in comparison to those that consider only second-moment (pairwise or quadratic) dependence. The polychoric correlation is another correlation applied to ordinal data that aims to estimate the correlation between theorised latent variables. One way to capture a more complete view of the dependence structure is to consider a copula between the variables.
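The distinction between correlation and causation can be shown with a small simulation; the variable names and numbers here are invented and are not from the text. Two variables that do not cause each other can still be strongly correlated because a common cause (a confounder) drives both, which is one of the reasons listed above for a correlation between two variables.

```python
# Illustrative sketch: a confounder ("temperature") produces a strong
# correlation between two variables that do not cause each other.
import numpy as np

rng = np.random.default_rng(42)
temperature = rng.normal(25, 5, 1_000)                     # common cause

ice_cream_sales = 10 * temperature + rng.normal(0, 20, 1_000)
drownings       = 0.5 * temperature + rng.normal(0, 2, 1_000)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"Pearson r between sales and drownings: {r:.2f}")   # high, yet neither causes the other
```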

Q 4. Briefly explain any two factors that affect the choice of a sampling technique. What are the characteristics of a good sample?


Ans.: The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection while probability sampling does. Does that mean that non-probability samples aren't representative of the population? Not necessarily. But it does mean that non-probability samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented the population well, and we are able to estimate confidence intervals for the statistic. With non-probability samples, we may or may not represent the population well, and it will often be hard for us to know how well we've done so. In general, researchers prefer probabilistic or random sampling methods over non-probabilistic ones, and consider them to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of non-probabilistic alternatives.

We can divide non-probability sampling methods into two broad types: accidental or purposive.

Most sampling methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind. The most important distinctions among these types of sampling methods are the ones between the different types of purposive sampling approaches.

Accidental, Haphazard or Convenience Sampling

One of the most common methods of sampling goes under the various titles listed here. I would include in this category the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although non representative) reading of public opinion. I would also argue that the typical use of college students in much psychological research is primarily a matter of convenience. (You don't really believe that psychologists use college students because they believe they're representative of the population at large, do you?). In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to -- and in many cases we would clearly suspect that they are not.

Purposive Sampling

In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and who are stopping various people and asking if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30-40 years old. They size up the people passing by and anyone who looks to be in that category they stop to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample. Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible.

All of the methods that follow can be considered subcategories of purposive sampling methods. We might sample for specific groups or types of people as in modal instance, expert, or quota sampling. We might sample for diversity as in heterogeneity sampling. Or, we might capitalize on informal social networks to identify specific respondents who are hard to locate otherwise, as in snowball sampling. In all of these methods we know what we want -- we are sampling with a purpose.

Modal Instance Sampling

In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot of informal public opinion polls, for instance, they interview a "typical" voter. There are a number of problems with this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the modal voter is a person who is of average age, educational level, and income in the population. But it's not clear that using the averages of these is the fairest (consider the skewed distribution of income, for instance). And how do you know that those three variables -- age, education, income -- are the only or even the most relevant for classifying the typical voter? What if religion or ethnicity is an important discriminator? Clearly, modal instance sampling is only sensible for informal sampling contexts.

Expert Sampling

Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts." There are actually two reasons you might do expert sampling. First, because it would be the best way to elicit the views of persons who have specific expertise. In this case, expert sampling is essentially just a specific subcase of purposive sampling. But the other reason you might use expert sampling is to provide evidence for the validity of another sampling approach you've chosen. For instance, let's say you do modal instance sampling and are concerned that the criteria you used for defining the modal instance are subject to criticism. You might convene an expert panel consisting of persons with acknowledged experience and insight into that field or topic and ask them to examine your modal definitions and comment on their appropriateness and validity. The advantage of doing this is that you aren't out on your own trying to defend your decisions -- you have some acknowledged experts to back you. The disadvantage is that even the experts can be, and often are, wrong.

Quota Sampling

In quota sampling, you select people non-randomly according to some fixed quota. There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and that you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample, but not the 60 men, you will continue to sample men; even if legitimate women respondents come along, you will not sample them, because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.? Non-proportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you're not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population. This method is the non-probabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
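As a small arithmetic illustration of the proportional quota example above (40% women, 60% men, total sample of 100), a minimal sketch; the dictionary of shares is simply restating the figures from the text.

```python
# Working out proportional quota targets from population shares.
population_shares = {"women": 0.40, "men": 0.60}
sample_size = 100

quotas = {group: round(share * sample_size)
          for group, share in population_shares.items()}
print(quotas)   # {'women': 40, 'men': 60} -- stop sampling a group once its quota is met
```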

Heterogeneity Sampling

We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not in identifying the "average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.

Snowball Sampling

In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others whom they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them.

Characteristics of a good sample: The decision process is a complicated one. The researcher has to first identify the limiting factor or factors and must judiciously balance the conflicting factors. The various criteria governing the choice of the sampling technique are:


1. Purpose of the Survey: What does the researcher aim at? If he intends to generalize the findings based on the sample survey to the population, then an appropriate probability sampling method must be selected. The choice of a particular type of probability sampling depends on the geographical area of the survey and the size and the nature of the population under study.

2. Measurability: The application of statistical inference theory requires computation of the sampling error from the sample itself. Only probability samples allow such computation. Hence, where the research objective requires statistical inference, the sample should be drawn by applying a simple random sampling method or a stratified random sampling method, depending on whether the population is homogeneous or heterogeneous.

3. Degree of Precision: Should the results of the survey be very precise, or could even rough results serve the purpose? The desired level of precision is one of the criteria for sampling method selection. Where a high degree of precision of results is desired, probability sampling should be used. Where even crude results would serve the purpose (e.g., marketing surveys, readership surveys etc.), any convenient non-random sampling like quota sampling would be enough.

4. Information about Population: How much information is available about the population to be studied? Where no list of population and no information about its nature are available, it is difficult to apply a probability sampling method. Then an exploratory study with non-probability sampling may be done to gain a better idea of the population. After gaining sufficient knowledge about the population through the exploratory study, an appropriate probability sampling design may be adopted.

5. The Nature of the Population: In terms of the variables to be studied, is the population homogeneous or heterogeneous? In the case of a homogeneous population, even simple random sampling will give a representative sample. If the population is heterogeneous, stratified random sampling is appropriate (a brief sketch of this contrast appears at the end of this answer).

6. Geographical Area of the Study and the Size of the Population: If the area covered by a survey is very large and the size of the population is quite large, multi-stage cluster sampling would be appropriate. But if the area and the size of the population are small, single stage probability sampling methods could be used.

7. Financial Resources: If the available finance is limited, it may become necessary to choose a less costly sampling plan like multistage cluster sampling, or even quota sampling as a compromise. However, if the objectives of the study and the desired level of precision cannot be attained within the stipulated budget, there is no alternative but to give up the proposed survey. Where the finance is not a constraint, a researcher can choose the most appropriate method of sampling that fits the research objective and the nature of population.

8. Time Limitation: The time limit within which the research project should be completed restricts the choice of a sampling method. Then, as a compromise, it may become necessary to choose less time consuming methods like simple random sampling, instead of stratified sampling/sampling with probability proportional to size; or multi-stage cluster sampling, instead of single-stage sampling of elements. Of course, the precision has to be sacrificed to some extent.

9. Economy: It should be another criterion in choosing the sampling method. It means achieving the desired level of precision at minimum cost. A sample is economical if the precision per unit cost is high, or the cost per unit of variance is low. The above criteria frequently conflict with each other and the researcher must balance and blend them to obtain a good sampling plan. The chosen plan thus represents an adaptation of the sampling theory to the available facilities and resources. That is, it represents a compromise between idealism and feasibility. One should use simple workable methods, instead of unduly elaborate and complicated techniques.
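As referenced in criterion 5 above, the following is a minimal sketch, with an invented sampling frame of 800 urban and 200 rural units, contrasting simple random sampling with proportional stratified random sampling; it is illustrative only and not part of the original text.

```python
# Sketch: simple random vs proportional stratified sampling from an invented frame.
import random

random.seed(1)
frame = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]

# Simple random sampling: adequate if the population is homogeneous.
srs = random.sample(frame, 100)

# Stratified random sampling: draw from each stratum in proportion to its size,
# which guarantees that both strata are represented.
stratified = []
for stratum in ("urban", "rural"):
    units = [u for u in frame if u[0] == stratum]
    n_h = round(100 * len(units) / len(frame))      # proportional allocation
    stratified += random.sample(units, n_h)

print("SRS rural count:", sum(1 for s, _ in srs if s == "rural"))
print("Stratified rural count:", sum(1 for s, _ in stratified if s == "rural"))
```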

Q 5. Select any topic for research and explain how you will use both secondary and primary sources to gather the required information.


Ans.: Primary Sources of Data

Primary sources are original sources from which the researcher directly collects data that have not been previously collected, e.g., collection of data directly by the researcher on brand awareness, brand preference, brand loyalty and other aspects of consumer behaviour from a sample of consumers by interviewing them. Primary data is first-hand information collected through various methods such as surveys, experiments and observation, for the purposes of the project immediately at hand.

The advantages of primary data are:
● It is unique to a particular research study.
● It is recent information, unlike published information that is already available.

The disadvantages are:
● It is expensive to collect, compared to gathering information from available sources.
● Data collection is a time-consuming process.
● It requires trained interviewers and investigators.

Secondary Sources of Data

These are sources containing data which have been collected and compiled for another purpose. Secondary sources may be internal sources, such as annual reports, financial statements, sales reports, inventory records, minutes of meetings and other information that is available within the firm, in the form of a marketing information system. They may also be external sources, such as government agencies (e.g. census reports, reports of government departments), published sources (annual reports on currency and finance published by the Reserve Bank of India, publications of international organizations such as the UN, World Bank and International Monetary Fund, trade and financial journals, etc.), trade associations (e.g. Chambers of Commerce) and commercial services (outside suppliers of information).

Methods of Data Collection

The researcher directly collects primary data from its original sources. In this case, the researcher can collect the required data precisely according to his research needs, and he can collect them when he wants and in the form that he needs. But the collection of primary data is costly and time consuming. Yet, for several types of social science research, the required data is not available from secondary sources and has to be gathered directly from primary sources. Primary data has to be gathered in cases where the available data is inappropriate, inadequate or obsolete. This includes socio-economic surveys, social anthropological studies of rural and tribal communities, sociological studies of social problems and social institutions, marketing research, leadership studies, opinion polls, attitudinal surveys, radio listening and T.V. viewing surveys, knowledge-awareness-practice (KAP) studies, farm management studies, business management studies etc. There are various methods of primary data collection, including surveys, audits and panels, observation and experiments.

Survey Research

A survey is a fact-finding study. It is a method of research involving collection of data directly from a population or a sample at a particular time. A survey has certain characteristics: it is always conducted in a natural setting; it is a field study; it seeks responses directly from the respondents; it can cover a very large population; it may be an extensive or an intensive study; and it covers a definite geographical area.

A survey involves the following steps:
● Selection of a problem and its formulation
● Preparation of the research design
● Operationalization of concepts and construction of measuring indexes and scales
● Sampling
● Construction of tools for data collection
● Field work and collection of data
● Processing of data and tabulation
● Analysis of data
● Reporting


There are four basic survey methods:
● Personal interview
● Telephone interview
● Mail survey
● Fax survey

Personal Interview

Personal interviewing is one of the prominent methods of data collection. It may be defined as a two-way systematic conversation between an investigator and an informant, initiated for obtaining information relevant to a specific study. It involves not only conversation, but also learning from the respondent’s gestures, facial expressions and pauses, and his environment.

Interviewing may be used either as a main method or as a supplementary one in studies of persons. Interviewing is the only suitable method for gathering information from illiterate or less educated respondents. It is useful for collecting a wide range of data, from factual demographic data to highly personal and intimate information relating to a person’s opinions, attitudes, values, beliefs, experiences and future intentions. Interviewing is appropriate when qualitative information is required, or probing is necessary to draw out the respondent fully. Where the area covered for the survey is compact, or when a sufficient number of qualified interviewers are available, personal interview is feasible.

Interview is often superior to other data-gathering methods. People are usually more willing to talk than to write. Once rapport is established, even confidential information may be obtained. It permits probing into the context and reasons for answers to questions.

Interview can add flesh to statistical information. It enables the investigator to grasp the behavioral context of the data furnished by the respondents. It permits the investigator to seek clarifications and brings to the forefront those questions, which for some reason or the other the respondents do not want to answer. Interviewing as a method of data collection has certain characteristics. They are:

1. The participants – the interviewer and the respondent – are strangers; hence, the investigator has to get himself/herself introduced to the respondent in an appropriate manner.

2. The relationship between the participants is a transitory one. It has a fixed beginning and termination points. The interview proper is a fleeting, momentary experience for them.

3. The interview is not a mere casual conversational exchange, but a conversation with a specific purpose, viz., obtaining information relevant to a study.

4. The interview is a mode of obtaining verbal answers to questions put verbally.

5. The interaction between the interviewer and the respondent need not necessarily be on a face-to-face basis, because the interview can also be conducted over the telephone.

6. Although the interview is usually a conversation between two persons, it need not be limited to a single respondent. It can also be conducted with a group of persons, such as family members, or a group of children, or a group of customers, depending on the requirements of the study.

7. The interview is an interactive process. The interaction between the interviewer and the respondent depends upon how they perceive each other.

8. The respondent reacts to the interviewer’s appearance, behavior, gestures, facial expression and intonation, his perception of the thrust of the questions and his own personal needs. As far as possible, the interviewer should try to be close to the socio-economic level of the respondents.

9. The investigator records information furnished by the respondent in the interview. This poses a problem of seeing that recording does not interfere with the tempo of conversation.

10. Interviewing is not a standardized process like that of a chemical technician; it is rather a flexible, psychological process.

Telephone Interviewing

Telephone interviewing is a non-personal method of data collection. It may be used as a major method or as a supplementary method. It will be useful in the following situations:

1. When the universe is composed of those persons whose names are listed in telephone directories, e.g. business houses, business executives, doctors and other professionals.

2. When the study requires responses to five or six simple questions, e.g. a radio or television program survey.


3. When the survey must be conducted in a very short period of time, provided the units of study are listed in the telephone directory.

4. When the subject is interesting or important to respondents, e.g. a survey relating to trade conducted by a trade association or a chamber of commerce, or a survey relating to a profession conducted by the concerned professional association.

5. When the respondents are widely scattered and when there are many call-backs to make.

Group Interviews

A group interview may be defined as a method of collecting primary data in which a number of individuals with a common interest interact with each other. Unlike a personal interview, the flow of information is multi-dimensional. The group may consist of about six to eight individuals with a common interest. The interviewer acts as the discussion leader. Free discussion is encouraged on some aspect of the subject under study. The discussion leader stimulates the group members to interact with each other. The desired information may be obtained through a self-administered questionnaire or interview, with the discussion serving as a guide to ensure consideration of the areas of concern. In particular, the interviewers look for evidence of common elements of attitudes, beliefs, intentions and opinions among individuals in the group. At the same time, the interviewer must be aware that a single comment by a member can provide important insight. Samples for group interviews can be obtained through schools, clubs and other organized groups.

Mail Survey

The mail survey is another method of collecting primary data. This method involves sending questionnaires to the respondents with a request to complete them and return them by post. This can be used in the case of educated respondents only. The mail questionnaire should be simple, so that the respondents can easily understand the questions and answer them. It should preferably contain mostly closed-ended and multiple-choice questions, so that it can be completed within a few minutes. The distinctive feature of the mail survey is that the questionnaire is self-administered by the respondents themselves and the responses are recorded by them, not by the investigator as in the case of the personal interview method. It does not involve face-to-face conversation between the investigator and the respondent. Communication is carried out only in writing, and this requires more cooperation from the respondents than verbal communication does. The researcher should prepare a mailing list of the selected respondents by collecting the addresses from the telephone directory of the association or organization to which they belong. The following procedure should be followed: a covering letter should accompany a copy of the questionnaire; it must explain to the respondent the purpose of the study and the importance of his cooperation to the success of the project; anonymity must be assured. The sponsor’s identity may be revealed; however, when such information may bias the result, it is not desirable to reveal it, and a disguised organization name may be used instead. A self-addressed stamped envelope should be enclosed with the covering letter.

After a few days from the date of mailing the questionnaires to the respondents, the researcher can expect the return of completed ones from them. The progress in return may be watched and at the appropriate stage, follow-up efforts can be made.

The response rate in mail surveys is generally very low in developing countries like India. Certain techniques have to be adopted to increase the response rate. They are:

1. Quality printing: The questionnaire may be neatly printed on quality light colored paper, so as to attract the attention of the respondent.

2. Covering letter: The covering letter should be couched in a pleasant style, so as to attract and hold the interest of the respondent. It must anticipate objections and answer them briefly. It is desirable to address the respondent by name.

3. Advance information: Advance information can be provided to potential respondents by a telephone call, or advance notice in the newsletter of the concerned organization, or by a letter. Such preliminary contact with potential respondents is more successful than follow-up efforts.


4. Incentives: Money, stamps for collection and other incentives are also used to induce respondents to complete and return the mail questionnaire.

5. Follow-up contacts: In the case of respondents belonging to an organization, they may be approached through someone in that organization known to the researcher.

6. Larger sample size: A larger sample may be drawn than the estimated sample size. For example, if the required sample size is 1000, a sample of 1500 may be drawn. This may help the researcher to secure an effective sample size closer to the required size.
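A quick arithmetic sketch of the over-sampling idea in point 6 above; the response rate of 0.67 is an assumed figure chosen to match the 1000-versus-1500 example, not something stated in the text.

```python
# How many questionnaires to mail so that the expected returns meet the target.
import math

required_responses = 1000
expected_response_rate = 0.67            # assumed, not stated in the text

questionnaires_to_mail = math.ceil(required_responses / expected_response_rate)
print(questionnaires_to_mail)            # 1493, i.e. roughly the 1500 used in the example
```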

Q 6. Case Study: You are engaged to carry out a market survey on behalf of a leading Newspaper that is keen to increase its circulation in Bangalore City, in order to ascertain reader habits and interests. Develop a title for the study; define the research problem and the objectives or questions to be answered by the study.

Ans.: Title: A Study of Newspaper Reading Habits and Interests of Readers in Bangalore City

Research problem: A research problem is the situation that causes the researcher to feel apprehensive, confused and ill at ease. It is the demarcation of a problem area within a certain context involving the WHO or WHAT, the WHERE, the WHEN and the WHY of the problem situation.

There are many problem situations that may give rise to research. Three sources usually contribute to problem identification. One's own experience or the experience of others may be a source of research problems. A second source could be scientific literature: you may read about certain findings and notice that a certain field was not covered, and this could lead to a research problem. Theories could be a third source; shortcomings in theories could be researched.

Research can thus be aimed at clarifying or substantiating an existing theory, at clarifying contradictory findings, at correcting a faulty methodology, at correcting the inadequate or unsuitable use of statistical techniques, at reconciling conflicting opinions, or at solving existing practical problems.

Types of questions to be asked: For more than 35 years, the news about newspapers and young readers has been mostly bad for the newspaper industry. Long before any competition from cable television or Nintendo, American newspaper publishers were worrying about declining readership among the young.

As early as 1960, at least 20 years prior to Music Television (MTV) or the Internet, media research scholars began to focus their studies on young adult readers' decreasing interest in newspaper content. The concern over a declining youth market preceded and perhaps foreshadowed today's fretting over market penetration. Even where circulation has grown or stayed stable, there is rising concern over penetration, defined as the percentage of occupied households in a geographic market that are served by a newspaper. Simply put, population growth is occurring more rapidly than newspaper readership in most communities.

This study looks at trends in newspaper readership among the 18-to-34 age group and examines some of the choices young adults make when reading newspapers.

One of the underlying concerns behind the decline in youth newspaper reading is the question of how young people view the newspaper. A number of studies explored how young readers evaluate and use newspaper content.

Comparing reader content preferences over a 10-year period, Gerald Stone and Timothy Boudreau found differences between readers ages 18-34 and those 35-plus. Younger readers showed increased interest in national news, weather, sports, and classified advertisements over the decade between 1984 and 1994, while older readers ranked weather, editorials, and food advertisements higher. Interest in international news and letters to the editor was lower among younger readers, while older readers showed less interest in reports of births, obituaries, and marriages.


David Atkin explored the influence of telecommunication technology on newspaper readership among students in undergraduate media courses.17 He reported that computer-related technologies, including electronic mail and computer networks, were unrelated to newspaper readership. The study found that newspaper subscribers preferred print formats over electronic. In a study of younger, school-age children, Brian Brooks and James Kropp found that electronic newspapers could persuade children to become news consumers, but that young readers would choose an electronic newspaper over a printed one.18

In an exploration of leisure reading among college students, Leo Jeffres and Atkin assessed dimensions of interest in newspapers, magazines, and books,19 exploring the influence of media use, non-media leisure, and academic major on newspaper content preferences. The study discovered that overall newspaper readership was positively related to students' focus on entertainment, job / travel information, and public affairs. However, the students' preference for reading as a leisure-time activity was related only to a public affairs focus. Content preferences for newspapers and other print media were related. The researchers found no significant differences in readership among various academic majors, or by gender, though there was a slight correlation between age and the public affairs readership index, with older readers more interested in news about public affairs.

Methodology

Sample

Participants in this study (N=267) were students enrolled in 100- and 200-level English courses at a midwestern public university. Courses that comprise the framework for this sample were selected because they could fulfill basic studies requirements for all majors. A basic studies course is one that is listed within the core curriculum required for all students. The researcher obtained permission from seven professors to distribute questionnaires in the eight classes during regularly scheduled class periods. The students' participation was voluntary; two students declined. The goal of this sampling procedure was to reach a cross-section of students representing various fields of study. In all, 53 majors were represented.

Of the 267 students who participated in the study, 65 (24.3 percent) were male and 177 (66.3 percent) were female. A total of 25 participants chose not to divulge their genders. Ages ranged from 17 to 56, with a mean age of 23.6 years. This mean does not include the 32 respondents who declined to give their ages. A total of 157 participants (58.8 percent) said they were of the Caucasian race, 59 (22.1 percent) African American, 10 (3.8 percent) Asian, five (1.9 percent) African/Native American, two (.8 percent) Hispanic, two (.8 percent) Native American, and one (.4 percent) Arabic. Most (214) of the students were enrolled full time, whereas a few (28) were part-time students. The class rank breakdown was: freshmen, 45 (16.9 percent); sophomores, 15 (5.6 percent); juniors, 33 (12.4 percent); seniors, 133 (49.8 percent); and graduate students, 16 (6 percent).

Procedure

After two pre-tests and revisions, questionnaires were distributed and collected by the investigator. In each of the eight classes, the researcher introduced herself to the students as a journalism professor who was conducting a study on students' use of newspapers and other media. Each questionnaire included a cover letter with the researcher's name, address, and phone number. The researcher provided pencils and was available to answer questions if anyone needed further assistance. The average time spent on the questionnaires was 20 minutes, with some individual students taking as long as an hour. Approximately six students asked to take the questionnaires home to finish. They returned the questionnaires to the researcher's mailbox within a couple of day.

Page 14

Page 15: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Assignment (Set-2)Subject code: MB0050

Research Methodology

Q 1.Discuss the relative advantages and disadvantages of the different methods of distributing questionnaires to the respondents of a study.

Ans.: There are some alternative methods of distributing questionnaires to the respondents. They are:

1) Personal delivery, 2) Attaching the questionnaire to a product, 3) Advertising the questionnaire in a newspaper or magazine, and 4) News-stand inserts.

Personal delivery: The researcher or his assistant may deliver the questionnaires to the potential respondents, with a request to complete them at their convenience. After a day or two, the completed questionnaires can be collected from them. Often referred to as the self-administered questionnaire method, it combines the advantages of the personal interview and the mail survey. Alternatively, the questionnaires may be delivered in person and the respondents may return the completed questionnaires through mail.

Attaching questionnaire to a product: A firm test marketing a product may attach a questionnaire to a product and request the buyer to complete it and mail it back to the firm. A gift or a discount coupon usually rewards the respondent.

Advertising the questionnaire: The questionnaire with the instructions for completion may be advertised on a page of a magazine or in a section of newspapers. The potential respondent completes it, tears it out and mails it to the advertiser. For example, the committee of Banks Customer Services used this method for collecting information from the customers of commercial banks in India. This method may be useful for large-scale studies on topics of common interest. Newsstand inserts: This method involves inserting the covering letter, questionnaire and self addressed reply-paid envelope into a random sample of newsstand copies of a newspaper or magazine.

Advantages and Disadvantages:

The advantages of Questionnaire are:

this method facilitates collection of more accurate data for longitudinal studies than any other method, because under this method, the event or action is reported soon after its occurrence.

Page 15

Page 16: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

this method makes it possible to have before and after designs made for field based studies. For example, the effect of public relations or advertising campaigns or welfare measures can be measured by collecting data before, during and after the campaign.

the panel method offers a good way of studying trends in events, behavior or attitudes. For example, a panel enables a market researcher to study how brand preferences change from month to month; it enables an economics researcher to study how employment, income and expenditure of agricultural laborers change from month to month; a political scientist can study the shifts in inclinations of voters and the causative influential factors during an election. It is also possible to find out how the constituency of the various economic and social strata of society changes through time and so on.

A panel study also provides evidence on the causal relationship between variables. For example, a cross sectional study of employees may show an association between their attitude to their jobs and their positions in the organization, but it does not indicate as to which comes first - favorable attitude or promotion. A panel study can provide data for finding an answer to this question.

It facilities depth interviewing, because panel members become well acquainted with the field workers and will be willing to allow probing interviews.

The major limitations or problems of Questionnaire method are:

this method is very expensive. The selection of panel members, the payment of premiums, periodic training of investigators and supervisors, and the costs involved in replacing dropouts, all add to the expenditure.

it is often difficult to set up a representative panel and to keep it representative. Many persons may be unwilling to participate in a panel study. In the course of the study, there may be frequent dropouts. Persons with similar characteristics may replace the dropouts. However, there is no guarantee that the emerging panel would be representative.

A real danger with the panel method is “panel conditioning” i.e., the risk that repeated interviews may sensitize the panel members and they become untypical, as a result of being on the panel. For example, the members of a panel study of political opinions may try to appear consistent in the views they express on consecutive occasions. In such cases, the panel becomes untypical of the population it was selected to represent. One possible safeguard to panel conditioning is to give members of a panel only a limited panel life and then to replace them with persons taken randomly from a reserve list.

Q 2. In processing data, what is the difference between measures of central tendency and measures of dispersion? What is the most important measure of central tendency and dispersion?

Ans.: Measures of Central tendency:

Arithmetic Mean : The arithmetic mean is the most common measure of central tendency. It simply the sum of the numbers divided by the number of numbers. The symbol m is used for the mean of a population. The

symbol M is used for the mean of a sample. The formula for m is shown below: m=

ΣX

N

Page 16

Page 17: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Where ΣX is the sum of all the numbers in the numbers in the sample and N is the number of numbers in

the sample. As an example, the mean of the numbers 1+2+3+6+8=

20

5

=4 regardless of whether the numbers constitute the entire population or just a sample from the population.

The table, Number of touchdown passes, shows the number of touchdown (TD) passes thrown by each of the 31 teams in the National Football League in the 2000 season. The mean number of touchdown passes

thrown is 20.4516 as shown below. m=

ΣX

N

=

634

31

=20.4516

37 33 33 32 29 28 28 23

22 22 22 21 21 21 20 20

19 19 18 18 18 18 16 15

14 14 14 12 12 9 6

Table 1: Number of touchdown passes

Although the arithmetic mean is not the only "mean" (there is also a geometric mean), it is by far the most commonly used. Therefore, if the term "mean" is used without specifying whether it is the arithmetic mean, the geometric mean, or some other mean, it is assumed to refer to the arithmetic mean.

Median

The median is also a frequently used measure of central tendency. The median is the midpoint of a distribution: the same number of scores is above the median as below it. For the data in the table, Number of touchdown passes, there are 31 scores. The 16th highest score (which equals 20) is the median because there are 15 scores below the 16th score and 15 scores above the 16th score. The median can also be thought of as the 50th percentile.

Let's return to the made up example of the quiz on which you made a three discussed previously in the

module Introduction to Central Tendency and shown in Table 2.

Student Dataset 1 Dataset 2 Dataset 3

You 3 3 3

Page 17

Page 18: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Student Dataset 1 Dataset 2 Dataset 3

John's 3 4 2

Maria's 3 4 2

Shareecia's 3 4 2

Luther's 3 5 1

Table 2: Three possible datasets for the 5-point make-up quiz

For Dataset 1, the median is three, the same as your score. For Dataset 2, the median is 4. Therefore, your score is below the median. This means you are in the lower half of the class. Finally for Dataset 3, the median is 2. For this dataset, your score is above the median and therefore in the upper half of the distribution.

Computation of the Median: When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is

4+7

2

=5.5.

Mode

The mode is the most frequently occurring value. For the data in the table, Number of touchdown passes, the mode is 18 since more teams (4) had 18 touchdown passes than any other number of touchdown passes. With continuous data such as response time measured to many decimals, the frequency of each value is one since no two scores will be exactly the same (see discussion of continuous variables). Therefore the mode of continuous data is normally computed from a grouped frequency distribution. The Grouped frequency

distribution table shows a grouped frequency distribution for the target response time data. Since the interval with the highest frequency is 600-700, the mode is the middle of that interval (650).

Range Frequency

500-600 3

600-700 6

700-800 5

800-900 5

900-1000 0

1000-1100 1

Table 3: Grouped frequency distribution

Page 18

Page 19: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Measures of Dispersion: A measure of statistical dispersion is a real number that is zero if all the data are identical, and increases as the data becomes more diverse. It cannot be less than zero.

Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units. Such measures of dispersion include:

Standard deviation Interquartile range

Range

Mean difference

Median absolute deviation

Average absolute deviation (or simply called average deviation)

Distance standard deviation

These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity they are called estimates of scale.

All the above measures of statistical dispersion have the useful property that they are location-invariant, as well as linear in scale. So if a random variable X has a dispersion of SX then a linear transformation Y = aX + b for real a and b should have dispersion SY = |a|SX.

Other measures of dispersion are dimensionless (scale-free). In other words, they have no units even if the variable itself has units. These include:

Coefficient of variation Quartile coefficient of dispersion

Relative mean difference, equal to twice the Gini coefficient

There are other measures of dispersion:

Variance (the square of the standard deviation) — location-invariant but not linear in scale. Variance-to-mean ratio — mostly used for count data when the term coefficient of dispersion is used

and when this ratio is dimensionless, as count data are themselves dimensionless: otherwise this is not scale-free.

Some measures of dispersion have specialized purposes, among them the Allan variance and the Hadamard variance.

For categorical variables, it is less common to measure dispersion by a single number. See qualitative variation. One measure that does so is the discrete entropy.

Sources of statistical dispersion

In the physical sciences, such variability may result only from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible. One may assume that the quantity being measured is unchanging and stable, and that the variation between measurements is due to observational error.

In the biological sciences, this assumption is false: the variation observed might be intrinsic to the phenomenon: distinct members of a population differ greatly. This is also seen in the arena of manufactured

Page 19

Page 20: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

products; even there, the meticulous scientist finds variation.The simple model of a stable quantity is preferred when it is tenable. Each phenomenon must be examined to see if it warrants such a simplification.

Q 3. What are the characteristics of a good research design? Explain how the research design for exploratory studies is different from the research design for descriptive and diagnostic studies.

Ans.: Good research design:Much contemporary social research is devoted to examining whether a program, treatment, or manipulation causes some outcome or result. For example, we might wish to know whether a new educational program causes subsequent achievement score gains, whether a special work release program for prisoners causes lower recidivism rates, whether a novel drug causes a reduction in symptoms, and so on. Cook and Campbell (1979) argue that three conditions must be met before we can infer that such a cause-effect relation exists:

1. Covariation. Changes in the presumed cause must be related to changes in the presumed effect. Thus, if we introduce, remove, or change the level of a treatment or program, we should observe some change in the outcome measures.

2. Temporal Precedence. The presumed cause must occur prior to the presumed effect.

3. No Plausible Alternative Explanations. The presumed cause must be the only reasonable explanation for changes in the outcome measures. If there are other factors, which could be responsible for changes in the outcome measures, we cannot be confident that the presumed cause-effect relationship is correct.

In most social research the third condition is the most difficult to meet. Any number of factors other than the treatment or program could cause changes in outcome measures. Campbell and Stanley (1966) and later, Cook and Campbell (1979) list a number of common plausible alternative explanations (or, threats to internal validity). For example, it may be that some historical event which occurs at the same time that the program or treatment is instituted was responsible for the change in the outcome measures; or, changes in record keeping or measurement systems which occur at the same time as the program might be falsely attributed to the program. The reader is referred to standard research methods texts for more detailed discussions of threats to validity.

This paper is primarily heuristic in purpose. Standard social science methodology textbooks (Cook and Campbell 1979; Judd and Kenny, 1981) typically present an array of research designs and the alternative explanations, which these designs rule out or minimize. This tends to foster a "cookbook" approach to research design - an emphasis on the selection of an available design rather than on the construction of an appropriate research strategy. While standard designs may sometimes fit real-life situations, it will often be necessary to "tailor" a research design to minimize specific threats to validity. Furthermore, even if standard textbook designs are used, an understanding of the logic of design construction in general will improve the comprehension of these standard approaches. This paper takes a structural approach to research design. While this is by no means the only strategy for constructing research designs, it helps to clarify some of the basic principles of design logic.

Minimizing Threats to Validity

Good research designs minimize the plausible alternative explanations for the hypothesized cause-effect relationship. But such explanations may be ruled out or minimized in a number of ways other than by design. The discussion, which follows, outlines five ways to minimize threats to validity, one of which is by research design:

1. By Argument. The most straightforward way to rule out a potential threat to validity is to simply argue that the threat in question is not a reasonable one. Such an argument may be made either a priori or a posteriori, although the former will usually be more convincing than the latter. For example, depending on the situation, one might argue that an instrumentation threat is not likely

Page 20

Page 21: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

because the same test is used for pre and post test measurements and did not involve observers who might improve, or other such factors. In most cases, ruling out a potential threat to validity by argument alone will be weaker than the other approaches listed below. As a result, the most plausible threats in a study should not, except in unusual cases, be ruled out by argument only.

2. By Measurement or Observation. In some cases it will be possible to rule out a threat by measuring it and demonstrating that either it does not occur at all or occurs so minimally as to not be a strong alternative explanation for the cause-effect relationship. Consider, for example, a study of the effects of an advertising campaign on subsequent sales of a particular product. In such a study, history (i.e., the occurrence of other events which might lead to an increased desire to purchase the product) would be a plausible alternative explanation. For example, a change in the local economy, the removal of a competing product from the market, or similar events could cause an increase in product sales. One might attempt to minimize such threats by measuring local economic indicators and the availability and sales of competing products. If there is no change in these measures coincident with the onset of the advertising campaign, these threats would be considerably minimized. Similarly, if one is studying the effects of special mathematics training on math achievement scores of children, it might be useful to observe everyday classroom behavior in order to verify that students were not receiving any additional math training to that provided in the study.

3. By Design. Here, the major emphasis is on ruling out alternative explanations by adding treatment or control groups, waves of measurement, and the like. This topic will be discussed in more detail below.

4. By Analysis. There are a number of ways to rule out alternative explanations using statistical analysis. One interesting example is provided by Jurs and Glass (1971). They suggest that one could study the plausibility of an attrition or mortality threat by conducting a two-way analysis of variance. One factor in this study would be the original treatment group designations (i.e., program vs. comparison group), while the other factor would be attrition (i.e., dropout vs. non-dropout group). The dependent measure could be the pretest or other available pre-program measures. A main effect on the attrition factor would be indicative of a threat to external validity or generalizability, while an interaction between group and attrition factors would point to a possible threat to internal validity. Where both effects occur, it is reasonable to infer that there is a threat to both internal and external validity.

The plausibility of alternative explanations might also be minimized using covariance analysis. For example, in a study of the effects of "workfare" programs on social welfare caseloads, one plausible alternative explanation might be the status of local economic conditions. Here, it might be possible to construct a measure of economic conditions and include that measure as a covariate in the statistical analysis. One must be careful when using covariance adjustments of this type -- "perfect" covariates do not exist in most social research and the use of imperfect covariates will not completely adjust for potential alternative explanations. Nevertheless causal assertions are likely to be strengthened by demonstrating that treatment effects occur even after adjusting on a number of good covariates.

5. By Preventive Action. When potential threats are anticipated some type of preventive action can often rule them out. For example, if the program is a desirable one, it is likely that the comparison group would feel jealous or demoralized. Several actions can be taken to minimize the effects of these attitudes including offering the program to the comparison group upon completion of the study or using program and comparison groups which have little opportunity for contact and communication. In addition, auditing methods and quality control can be used to track potential experimental dropouts or to insure the standardization of measurement.

The five categories listed above should not be considered mutually exclusive. The inclusion of measurements designed to minimize threats to validity will obviously be related to the design structure and is likely to be a factor in the analysis. A good research plan should, where possible. make use of multiple methods for reducing threats. In general, reducing a particular threat by design or preventive action will probably be stronger than by using one of the other three approaches. The choice of which strategy to use for any particular threat is complex and depends at least on the cost of the strategy and on the potential seriousness of the threat.

Page 21

Page 22: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Design Construction

Basic Design Elements. Most research designs can be constructed from four basic elements:

1. Time. A causal relationship, by its very nature, implies that some time has elapsed between the occurrence of the cause and the consequent effect. While for some phenomena the elapsed time might be measured in microseconds and therefore might be unnoticeable to a casual observer, we normally assume that the cause and effect in social science arenas do not occur simultaneously, In design notation we indicate this temporal element horizontally - whatever symbol is used to indicate the presumed cause would be placed to the left of the symbol indicating measurement of the effect. Thus, as we read from left to right in design notation we are reading across time. Complex designs might involve a lengthy sequence of observations and programs or treatments across time.

2. Program(s) or Treatment(s). The presumed cause may be a program or treatment under the explicit control of the researcher or the occurrence of some natural event or program not explicitly controlled. In design notation we usually depict a presumed cause with the symbol "X". When multiple programs or treatments are being studied using the same design, we can keep the programs distinct by using subscripts such as "X1" or "X2". For a comparison group (i.e., one which does not receive the program under study) no "X" is used.

3. Observation(s) or Measure(s). Measurements are typically depicted in design notation with the symbol "O". If the same measurement or observation is taken at every point in time in a design, then this "O" will be sufficient. Similarly, if the same set of measures is given at every point in time in this study, the "O" can be used to depict the entire set of measures. However, if different measures are given at different times it is useful to subscript the "O" to indicate which measurement is being given at which point in time.

4. Groups or Individuals. The final design element consists of the intact groups or the individuals who participate in various conditions. Typically, there will be one or more program and comparison groups. In design notation, each group is indicated on a separate line. Furthermore, the manner in which groups are assigned to the conditions can be indicated by an appropriate symbol at the beginning of each line. Here, "R" will represent a group, which was randomly assigned, "N" will depict a group, which was nonrandom assigned (i.e., a nonequivalent group or cohort) and a "C" will indicate that the group was assigned using a cutoff score on a measurement.

Q 4. How is the Case Study method useful in Business Research? Give two specific examples of how the case study method can be applied to business research.

Ans.: While case study writing may seem easy at first glance, developing an effective case study (also called a success story) is an art.  Like other marketing communication skills, learning how to write a case study takes time.  What’s more, writing case studies without careful planning usually results in sub optimal results?Savvy case study writers increase their chances of success by following these ten proven techniques for writing an effective case study:

Page 22

Page 23: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Involve the customer throughout the process. Involving the

customer throughout the case study development process helps ensure customer cooperation and approval, and results in an improved case study. Obtain customer permission before writing the document, solicit input during the development, and secure approval after drafting the document.

Write all customer quotes for their review. Rather than asking the customer to draft their quotes, writing them for their review usually results in more compelling material.

Case Study Writing Ideas Establish a document template. A template serves as a roadmap for the case study process, and

ensures that the document looks, feels, and reads consistently. Visually, the template helps build the brand; procedurally, it simplifies the actual writing. Before beginning work, define 3-5 specific elements to include in every case study, formalize those elements, and stick to them.

Start with a bang. Use action verbs and emphasize benefits in the case study title and subtitle.  Include a short (less than 20-word) customer quote in larger text.  Then, summarize the key points of the case study in 2-3 succinct bullet points.  The goal should be to tease the reader into wanting to read more.

Organize according to problem, solution, and benefits. Regardless of length, the time-tested, most effective organization for a case study follows the problem-solution-benefits flow.  First, describe the business and/or technical problem or issue; next, describe the solution to this problem or resolution of this issue; finally, describe how the customer benefited from the particular solution (more on this below). This natural story-telling sequence resonates with readers.

Use the general-to-specific-to-general approach. In the problem section, begin with a general discussion of the issue that faces the relevant industry.  Then, describe the specific problem or issue that the customer faced.  In the solution section, use the opposite sequence.  First, describe how the solution solved this specific problem; then indicate how it can also help resolve this issue more broadly within the industry.  Beginning more generally draws the reader into the story; offering a specific example demonstrates, in a concrete way, how the solution resolves a commonly faced issue; and concluding more generally allows the reader to understand how the solution can also address their problem.

Quantify benefits when possible. No single element in a case study is more compelling than the ability to tie quantitative benefits to the solution. For example, “Using Solution X saved Customer Y over $ZZZ, ZZZ after just 6 months of implementation;” or, “Thanks to Solution X, employees at Customer Y have realized a ZZ% increase in productivity as measured by standard performance indicators.” Quantifying benefits can be challenging, but not impossible. The key is to present imaginative ideas to the customer for ways to quantify the benefits, and remain flexible during this

Page 23

Page 24: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

discussion.  If benefits cannot be quantified, attempt to develop a range of qualitative benefits; the latter can be quite compelling to readers as well.

Use photos. Ask the customer if they can provide shots of personnel, ideally using the solution. The shots need not be professionally done; in fact, “homegrown” digital photos sometimes lead to surprisingly good results and often appear more genuine. Photos further personalize the story and help form a connection to readers.

Reward the customer. After receiving final customer approval and finalizing the case study, provide a pdf, as well as printed copies, to the customer.  Another idea is to frame a copy of the completed case study and present it to the customer in appreciation for their efforts and cooperation.

Writing a case study is not easy. Even with the best plan, a case study is doomed to failure if the writer lacks the exceptional writing skills, technical savvy, and marketing experience that these documents require.  In many cases, a talented writer can mean the difference between an ineffective case study and one that provides the greatest benefit. If a qualified internal writer is unavailable, consider outsourcing the task to professionals who specialize in case study writing.

Q 5. What are the differences between observation and interviewing as methods of data collection? Give two specific examples of situations where either observation or interviewing would be more appropriate.

Ans.: Observation means viewing or seeing. Observation may be defined as a systematic viewing of a specific phenomenon on its proper setting for the specific purpose of gathering data for a particular study. Observation is classical method of scientific study.The prerequisites of observation consist of:

Observations must be done under conditions, which will permit accurate results. The observer must be in vantage point to see clearly the objects to be observed. The distance and the light must be satisfactory. The mechanical devices used must be in good working conditions and operated by skilled persons.

Observation must cover a sufficient number of representative samples of the cases.

Recording should be accurate and complete.

The accuracy and completeness of recorded results must be checked. A certain number of cases can be observered again by another observer/another set of mechanical devices as the case may be. If it is feasible two separate observers and set of instruments may be used in all or some of the original observations. The results could then be compared to determine their accuracy and completeness.

Advantages of observation

o The main virtue of observation is its directness it makes it possible to study behavior as it

occurs. The researcher needs to ask people about their behavior and interactions he can simply watch what they do and say.

o Data collected by observation may describe the observed phenomena as they occur in their

natural settings. Other methods introduce elements or artificiality into the researched situation for instance in interview the respondent may not behave in a natural way. There is no such artificiality in observational studies especially when the observed persons are not aware of their being observed.

Page 24

Page 25: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

o Observations in more suitable for studying subjects who are unable to articulate

meaningfully e.g. studies of children, tribal animals, birds etc.

o Observations improve the opportunities for analyzing the contextual back ground of

behavior. Furthermore verbal resorts can be validated and compared with behavior through observation. The validity of what men of position and authority say can be verified by observing what they actually do.

o Observations make it possible to capture the whole event as it occurs. For example only

observation can be providing an insight into all the aspects of the process of negotiation between union and management representatives.

o Observation is less demanding of the subjects and has less biasing effect on their conduct

than questioning.

o It is easier to conduct disguised observation studies than disguised questioning.

o Mechanical devices may be used for recording data in order to secure more accurate data

and also of making continuous observations over longer periods.

Interviews are a crucial part of the recruitment process for all Organisations. Their purpose is to give the interviewer(s) a chance to assess your suitability for the role and for you to demonstrate your abilities and personality. As this is a two-way process, it is also a good opportunity for you to ask questions and to make sure the organisation and position are right for you.Interview format

Interviews take many different forms. It is a good idea to ask the organisation in advance what format the interview will take.

Competency/criteria based interviews - These are structured to reflect the competencies or qualities that an employer is seeking for a particular job, which will usually have been detailed in the job specification or advert. The interviewer is looking for evidence of your skills and may ask such things as: ‘Give an example of a time you worked as part of a team to achieve a common goal.’

The organisation determines the selection criteria based on the roles they are recruiting for and then, in an interview, examines whether or not you have evidence of possessing these.

Technical interviews - If you have applied for a job or course that requires technical knowledge, it is likely that you will be asked technical questions or has a separate technical interview. Questions may focus on your final year project or on real or hypothetical technical problems. You should be prepared to prove yourself, but also to admit to what you do not know and stress that you are keen to learn. Do not worry if you do not know the exact answer - interviewers are interested in your thought process and logic.

Academic interviews - These are used for further study or research positions. Questions are likely to center on your academic history to date.

Structured interviews - The interviewer has a set list of questions, and asks all the candidates the same questions.

Formal/informal interviews - Some interviews may be very formal, while others will feel more like an informal chat about you and your interests. Be aware that you are still being assessed, however informal the discussion may seem.

Portfolio based interviews - If the role is within the arts, media or communications industries, you may be asked to bring a portfolio of your work to the interview, and to have an in-depth discussion about the pieces you have chosen to include.

Senior/case study interviews - These ranges from straightforward scenario questions (e.g. ‘What would you do in a situation where…?’) to the detailed analysis of a hypothetical business problem. You will be evaluated on your analysis of the problem, how you identify the key issues, how you

Page 25

Page 26: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

pursue a particular line of thinking and whether you can develop and present an appropriate framework for organising your thoughts.

Specific types of interview

The Screening Interview

Companies use screening tools to ensure that candidates meet minimum qualification requirements. Computer programs are among the tools used to weed out unqualified candidates. (This is why you need a digital resume that is screening-friendly. See our resume center for help.) Sometimes human professionals are the gatekeepers. Screening interviewers often have honed skills to determine whether there is anything that might disqualify you for the position. Remember-they does not need to know whether you are the best fit for the position, only whether you are not a match. For this reason, screeners tend to dig for dirt. Screeners will hone in on gaps in your employment history or pieces of information that look inconsistent. They also will want to know from the outset whether you will be too expensive for the company.

Some tips for maintaining confidence during screening interviews:

Highlight your accomplishments and qualifications. Get into the straightforward groove. Personality is not as important to the screener as verifying your

qualifications. Answer questions directly and succinctly. Save your winning personality for the person making hiring decisions!

Be tactful about addressing income requirements. Give a range, and try to avoid giving specifics by replying, "I would be willing to consider your best offer."

If the interview is conducted by phone, it is helpful to have note cards with your vital information sitting next to the phone. That way, whether the interviewer catches you sleeping or vacuuming the floor, you will be able to switch gears quickly.

The Informational Interview

On the opposite end of the stress spectrum from screening interviews is the informational interview. A meeting that you initiate, the informational interview is underutilized by job-seekers who might otherwise consider themselves savvy to the merits of networking. Job seekers ostensibly secure informational meetings in order to seek the advice of someone in their current or desired field as well as to gain further references to people who can lend insight. Employers that like to stay apprised of available talent even when they do not have current job openings, are often open to informational interviews, especially if they like to share their knowledge, feel flattered by your interest, or esteem the mutual friend that connected you to them. During an informational interview, the jobseeker and employer exchange information and get to know one another better without reference to a specific job opening. 

This takes off some of the performance pressure, but be intentional nonetheless:

Come prepared with thoughtful questions about the field and the company. Gain references to other people and make sure that the interviewer would be comfortable if you

contact other people and use his or her name.

Give the interviewer your card, contact information and resume.

Write a thank you note to the interviewer.

The Directive Style

In this style of interview, the interviewer has a clear agenda that he or she follows unflinchingly. Sometimes companies use this rigid format to ensure parity between interviews; when interviewers ask each candidate the same series of questions, they can more readily compare the results. Directive interviewers rely upon

Page 26

Page 27: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

their own questions and methods to tease from you what they wish to know. You might feel like you are being steam-rolled, or you might find the conversation develops naturally. Their style does not necessarily mean that they have dominance issues, although you should keep an eye open for these if the interviewer would be your supervisor.

Either way, remember:

Flex with the interviewer, following his or her lead. Do not relinquish complete control of the interview. If the interviewer does not ask you for information

that you think is important to proving your superiority as a candidate, politely interject it.

The Meandering Style

This interview type, usually used by inexperienced interviewers, relies on you to lead the discussion. It might begin with a statement like "tell me about yourself," which you can use to your advantage. The interviewer might ask you another broad, open-ended question before falling into silence. This interview style allows you tactfully to guide the discussion in a way that best serves you.

The following strategies, which are helpful for any interview, are particularly important when interviewers use a non-directive approach:

Come to the interview prepared with highlights and anecdotes of your skills, qualities and experiences. Do not rely on the interviewer to spark your memory-jot down some notes that you can reference throughout the interview.

Remain alert to the interviewer. Even if you feel like you can take the driver's seat and go in any direction you wish, remain respectful of the interviewer's role. If he or she becomes more directive during the interview, adjust.

Ask well-placed questions. Although the open format allows you significantly to shape the interview, running with your own agenda and dominating the conversation means that you run the risk of missing important information about the company and its needs.

Q 6. Case Study: You are engaged to carry out a market survey on behalf of a leading Newspaper that is keen to increase its circulation in Bangalore City, in order to ascertain reader habits and interests. What type of research report would be most appropriate? Develop an outline of the research report with the main sections.

Ans.: There are four major interlinking processes in the presentation of a literature review:1. Critiquing rather than merely listing each item a good literature review is led by your own critical

thought processes - it is not simply a catalogue of what has been written.

Once you have established which authors and ideas are linked, take each group in turn and really think about what you want to achieve in presenting them this way. This is your opportunity for showing that you did not take all your reading at face value, but that you have the knowledge and skills to interpret the authors' meanings and intentions in relation to each other, particularly if there are conflicting views or incompatible findings in a particular area.

Rest assured that developing a sense of critical judgment in the literature surrounding a topic is a gradual process of gaining familiarity with the concepts, language, terminology and conventions in the field. In the early stages of your research you cannot be expected to have a fully developed appreciation of the implications of all findings.

As you get used to reading at this level of intensity within your field you will find it easier and more purposeful to ask questions as you read:

Page 27

Page 28: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

o What is this all about?o Who is saying it and what authorities do they have?

o Why is it significant?

o What is its context?

o How was it reached?

o How valid is it?

o How reliable is the evidence?

o What has been gained?

o What do other authors say?

o How does it contribute?

o So what?

2. Structuring the fragments into a coherent body through your reading and discussions with your supervisor during the searching and organising phases of the cycle, you will eventually reach a final decision as to your own topic and research design.

As you begin to group together the items you read, the direction of your literature review will emerge with greater clarity. This is a good time to finalise your concept map, grouping linked items, ideas and authors into firm categories as they relate more obviously to your own study.

Now you can plan the structure of your written literature review, with your own intentions and conceptual framework in mind. Knowing what you want to convey will help you decide the most appropriate structure.

A review can take many forms; for example:

o An historical survey of theory and research in your fieldo A synthesis of several paradigms

o A process of narrowing down to your own topic

It is likely that your literature review will contain elements of all of these.

As with all academic writing, a literature review needs:

o An introductiono A body

o A conclusion

The introduction sets the scene and lays out the various elements that are to be explored.

The body takes each element in turn, usually as a series of headed sections and subsections. The first paragraph or two of each section mentions the major authors in association with their main ideas and areas of debate. The section then expands on these ideas and authors, showing how each relates to the others, and how the debate informs your understanding of the topic. A short conclusion at the end of each section presents a synthesis of these linked ideas.

Page 28

Page 29: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

The final conclusion of the literature review ties together the main points from each of your sections and this is then used to build the framework for your own study. Later, when you come to write the discussion chapter of your thesis, you should be able to relate your findings in one-to-one correspondence with many of the concepts or questions that were firmed up in the conclusion of your literature review.

3. Controlling the 'voice' of your citations in the text (by selective use of direct quoting, paraphrasing and summarizing)

You can treat published literature like any other data, but the difference is that it is not data you generated yourself.

When you report on your own findings, you are likely to present the results with reference to their source, for example:

o 'Table 2 shows that sixteen of the twenty subjects responded positively.'

When using published data, you would say:

o 'Positive responses were recorded for 80 per cent of the subjects (see table 2).'o 'From the results shown in table 2, it appears that the majority of subjects responded

positively.'

In these examples your source of information is table 2. Had you found the same results on page 17 of a text by Smith published in 1988, you would naturally substitute the name, date and page number for 'table 2'. In each case it would be your voice introducing a fact or statement that had been generated somewhere else.

You could see this process as building a wall: you select and place the 'bricks' and your 'voice' provides the ‘mortar’, which determines how strong the wall will be. In turn, this is significant in the assessment of the merit and rigor of your work.

There are three ways to combine an idea and its source with your own voice:

o Direct quoteo Paraphrase

o Summary

In each method, the author's name and publication details must be associated with the words in the text, using an approved referencing system. If you don't do this you would be in severe breach of academic convention, and might be penalized. Your field of study has its own referencing conventions you should investigate before writing up your results.

Direct quoting repeats exact wording and thus directly represents the author:

o 'Rain is likely when the sky becomes overcast' (Smith 1988, page 27).

If the quotation is run in with your text, single quotation marks are used to enclose it, and it must be an identical copy of the original in every respect.

Overuse or simple 'listing' of quotes can substantially weaken your own argument by silencing your critical view or voice.

Paraphrasing is repeating an idea in your own words, with no loss of the author's intended meaning:

Page 29

Page 30: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

o As Smith (1988) pointed out in the late eighties, rain may well be indicated by the presence of cloud in the sky.

Paraphrasing allows you to organize the ideas expressed by the authors without being rigidly constrained by the grammar, tense and vocabulary of the original. You retain a degree of flexibility as to whose voice comes through most strongly.

Summarizing means to shorten or crystallize a detailed piece of writing by restating the main points in your own words and in the order in which you found them. The original writing is 'described' as if from the outside, and it is your own voice that is predominant:

o Referring to the possible effects of cloudy weather, Smith (1988) predicted the likelihood of rain.

o Smith (1988) claims that some degree of precipitation could be expected as the result of clouds in the sky: he has clearly discounted the findings of Jones (1986).

4. Using appropriate language

Your writing style represents you as a researcher, and reflects how you are dealing with the subtleties and complexities inherent in the literature.Once you have established a good structure with appropriate headings for your literature review, and once you are confident in controlling the voice in your citations, you should find that your writing becomes more lucid and fluent because you know what you want to say and how to say it.The good use of language depends on the quality of the thinking behind the writing, and on the context of the writing. You need to conform to discipline-specific requirements. However, there may still be some points of grammar and vocabulary you would like to improve. If you have doubts about your confidence to use the English language well, you can help yourself in several ways:

o Ask for feedback on your writing from friends, colleagues and academicso Look for specific language information in reference materialso Access programs or self-paced learning resources which may be available on your campus

Grammar tips - practical and helpful

The following guidance on tenses and other language tips may be useful.Which tense should I use?Use present tense:

o For generalizations and claims: The sky is blue.

o To convey ideas, especially theories, which exist for the reader at the time of reading: I think therefore I am.

o For authors' statements of a theoretical nature, which can then be compared on equal terms with others:

Smith (1988) suggests that...o In referring to components of your own document:

Table 2 shows...Use present perfect tense for:

o Recent events or actions that are still linked in an unresolved way to the present: Several studies have attempted to...

Use simple past tense for:o Completed events or actions:

Smith (1988) discovered that...Use past perfect tense for:

o Events which occurred before a specified past time: Prior to these findings, it had been thought that...

Use modals (may, might, could, would, should) to:o Convey degrees of doubt

Page 30

Page 31: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

This may indicate that ... this would imply that...

Other language tipso Convey your meaning in the simplest possible way. Don't try to use an intellectual tone for

the sake of it, and do not rely on your reader to read your mind!o Keep sentences short and simple when you wish to emphasise a point.o Use compound (joined simple) sentences to write about two or more ideas which may be

linked with 'and', 'but', 'because', 'whereas' etc.o Use complex sentences when you are dealing with embedded ideas or those that show the

interaction of two or more complex elements.o Verbs are more dynamic than nouns, and nouns carry information more densely than verbs.o Select active or passive verbs according to whether you are highlighting the 'doer' or the

'done to' of the action.o Keep punctuation to a minimum. Use it to separate the elements of complex sentences in

order to keep subject, verb and object in clear view.o Avoid densely packed strings of words, particularly nouns.

The total processThe story of a research studyIntroductionI looked at the situation and found that I had a question to ask about it. I wanted to investigate something in particular.

Review of literatureSo I read everything I could find on the topic - what was already known and said and what had previously been found. I established exactly where my investigation would fit into the big picture, and began to realise at this stage how my study would be different from anything done previously.

MethodologyI decided on the number and description of my subjects, and with my research question clearly in mind, designed my own investigation process, using certain known research methods (and perhaps some that are not so common). I began with the broad decision about which research paradigm I would work within (that is, qualitative/quantitative, critical/interpretive/ empiricist). Then I devised my research instrument to get the best out of what I was investigating. I knew I would have to analyse the raw data, so I made sure that the instrument and my proposed method(s) of analysis were compatible right from the start. Then I carried out the research study and recorded all the data in a methodical way according to my intended methods of analysis. As part of the analysis, I reduced the data (by means of my preferred form of classification) to manageable thematic representation (tables, graphs, categories, etc). It was then that I began to realise what I had found.

Findings/resultsWhat had I found? What did the tables/graphs/categories etc. have to say that could be pinned down? It was easy enough for me to see the salient points at a glance from these records, but in writing my report, I also spelled out what I had found truly significant to make sure my readers did not miss it. For each display of results, I wrote a corresponding summary of important observations relating only elements within my own set of results and comparing only like with like. I was careful not to let my own interpretations intrude or voice my excitement just yet. I wanted to state the facts - just the facts. I dealt correctly with all inferential statistical procedures, applying tests of significance where appropriate to ensure both reliability and validity. I knew that I wanted my results to be as watertight and squeaky clean as possible. They would carry a great deal more credibility, strength and thereby academic 'clout' if I took no shortcuts and remained both rigorous and scholarly.

DiscussionNow I was free to let the world know the significance of my findings. What did I find in the results that answered my original research question? Why was I so sure I had some answers? What about the unexplained or unexpected findings? Had I interpreted the results correctly? Could there have been any other factors involved? Were my findings supported or contested by the results of similar studies? Where did that leave mine in terms of contribution to my field? Can I actually generalise from my findings in a breakthrough of some kind, or do I simply see myself as reinforcing existing knowledge? And so what, after


all? There were some obvious limitations to my study, which, even so, I'll defend to the hilt. But I won't become over-apologetic about the things left undone, or the abandoned analyses, the fascinating byways sadly left behind. I have my memories...

Conclusion
We'll take a long hard look at this study from a broad perspective. How does it rate? How did I end up answering the question I first thought of? The conclusion needs to be a few clear, succinct sentences. That way, I'll know that I know what I'm talking about. I'll wrap up with whatever generalizations I can make, and whatever implications have arisen in my mind as a result of doing this thing at all. The more you find out, the more questions arise. How I wonder what you are ... how I speculate. OK, so where do we all go from here?

Three stages of research
1. Reading
2. Research design and implementation
3. Writing up the research report or thesis

Use an active, cyclical writing process: draft, check, reflect, revise, redraft.

Establishing good practice
1. Keep your research question always in mind.
2. Read widely to establish a context for your research.
3. Read widely to collect information, which may relate to your topic, particularly to your hypothesis or research question.
4. Be systematic with your reading, note-taking and referencing records.
5. Train yourself to select what you do need and reject what you don't need.
6. Keep a research journal to reflect on your processes, decisions, state of mind, changes of mind, reactions to experimental outcomes etc.
7. Discuss your ideas with your supervisor and interested others.
8. Keep a systematic log of technical records of your experimental and other research data, remembering to date each entry, and noting any discrepancies or unexpected occurrences at the time you notice them.
9. Design your research approaches in detail in the early stages so that you have frameworks to fit findings into straightaway.
10. Know how you will analyse data so that your formats correspond from the start.

Keep going back to the whole picture. Be thoughtful and think ahead about the way you will consider and store new information as it comes to light.


Assignment (Set-1)
Subject code: MB0051

Legal Aspects of Business

Q.1 Explain the concept and limitations of the theory of comparative costs.

Ans. Theory of comparative costs

In economics, the law of comparative advantage refers to the ability of a party (an individual, a firm, or a country) to produce a particular good or service at a lower marginal cost and opportunity cost than another party. It can be contrasted with absolute advantage, which refers to the ability of a party to produce a particular good at a lower absolute cost than another. Comparative advantage explains how trade can create value for both parties even when one can produce all goods with fewer resources than the other. The net benefits of such an outcome are called gains from trade.

Origins of the theory
David Ricardo explained comparative advantage in his 1817 book On the Principles of Political Economy and Taxation, in an example involving England and Portugal. In Portugal it is possible to produce both wine and cloth with less labor than it would take to produce the same quantities in England. However, the relative costs of producing those two goods are different in the two countries. In England it is very hard to produce wine, and only moderately difficult to produce cloth. In Portugal both are easy to produce. Therefore, while it is cheaper to produce cloth in Portugal than in England, it is cheaper still for Portugal to produce excess wine, and trade that for English cloth. Conversely, England benefits from this trade because its cost for producing cloth has not changed but it can now get wine at a lower price, closer to the cost of cloth. The conclusion drawn is that each country can gain by specializing in the good where it has comparative advantage, and trading that good for the other.

Example 1
Two men live alone on an isolated island. To survive they must undertake a few basic economic activities like water carrying, fishing, cooking, and shelter construction and maintenance. The first man is young, strong, and educated. He is also faster, better and more productive at everything. He has an absolute advantage in all activities. The second man is old, weak, and uneducated. He has an absolute disadvantage in all economic activities. In some activities the difference between the two is great; in others it is small.

Despite the fact that the younger man has absolute advantage in all activities, it is not in the interest of either of them to work in isolation since they both can benefit from specialization and exchange. If the two men divide the work according to comparative advantage then the young man will specialize in tasks at which he is most productive, while the older man will concentrate on tasks where his productivity is only a little less than that of the young man. Such an arrangement will increase total production for a given amount of labor supplied by both men and it will benefit both of them.

Example 2
Suppose there are two countries of equal size, Northland and Southland, that both produce and consume two goods, Food and Clothes. The productive capacities and efficiencies of the countries are such that if both countries devoted all their resources to Food production, output would be as follows:

• Northland: 100 tonnes


• Southland: 400 tonnes

If all the resources of the countries were allocated to the production of Clothes, output would be:

• Northland: 100 tonnes
• Southland: 200 tonnes

Assume each country has constant opportunity costs of production between the two products and that both economies have full employment at all times. All factors of production are mobile within the countries between the clothing and food industries, but are immobile between the countries. The price mechanism must be working to provide perfect competition. Southland has an absolute advantage over Northland in the production of Food and Clothes. There seems to be no mutual benefit in trade between the economies, as Southland is more efficient at producing both products. The opportunity costs show otherwise. Northland's opportunity cost of producing one tonne of Food is one tonne of Clothes, and vice versa. Southland's opportunity cost of one tonne of Food is 0.5 tonne of Clothes, and its opportunity cost of one tonne of Clothes is 2 tonnes of Food. Southland has a comparative advantage in Food production because of its lower opportunity cost of production with respect to Northland. Northland has a comparative advantage over Southland in the production of Clothes, the opportunity cost of which is higher in Southland with respect to Food than in Northland. To show that these different opportunity costs lead to mutual benefit if the countries specialize production and trade, first consider the situation in which the countries produce and consume only domestically. The volumes are:

Production and consumption before trade

              Food   Clothes
Northland       50        50
Southland      200       100
TOTAL          250       150

This example includes no formulation of the preferences of consumers in the two economies which would allow the determination of the international exchange rate of Clothes and Food. Given the production capabilities of each country, in order for trade to be worthwhile Northland requires a price of at least one tonne of Food in exchange for one tonne of Clothes; and Southland requires at least one tonne of Clothes for two tonnes of Food. The exchange price will be somewhere between the two.

The remainder of the example works with an international trading price of one tonne of Food for 2/3 tonne of Clothes. If both specialize in the goods in which they have comparative advantage, their outputs will be:

Production after trade

              Food   Clothes
Northland        0       100
Southland      300        50
TOTAL          300       150

World production of food increased. Clothing production remained the same. Using the exchange rate of one tonne of Food for 2/3 tonne of Clothes, Northland and Southland are able to trade to yield the following level of consumption:

Consumption after trade

              Food   Clothes
Northland       75        50
Southland      225       100
World total    300       150

Northland traded 50 tonnes of Clothes for 75 tonnes of Food. Both countries benefited, and now consume at points outside their production possibility frontiers.

Example 3
The economist Paul Samuelson provided another well-known example in his Economics. Suppose that in a particular city the best lawyer happens also to be the best secretary; that is, he would be the most productive lawyer and he would also be the best secretary in town. However, if this lawyer focused on the task of being an attorney and, instead of pursuing both occupations at once, employed a secretary, both the output of the lawyer and the secretary would increase. The example given by Greg Mankiw in his Economics textbook is


almost identical, although instead of a lawyer and a secretary it uses Tiger Woods, who is supposed to be both the best golf player and the fastest lawn mower in town.
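The arithmetic behind Example 2 can be checked mechanically. The following Python sketch (an illustration, not part of the original text) recomputes the opportunity costs and the post-trade consumption bundles from the output figures above, assuming the stated trading price of one tonne of Food for 2/3 tonne of Clothes:

# Maximum output (tonnes) if each country devotes all resources to one good.
max_output = {
    "Northland": {"Food": 100, "Clothes": 100},
    "Southland": {"Food": 400, "Clothes": 200},
}

# Opportunity cost of one tonne of Food, in tonnes of Clothes forgone.
for country, out in max_output.items():
    print(country, "-> 1 tonne of Food costs", out["Clothes"] / out["Food"], "tonnes of Clothes")
# Northland: 1.0, Southland: 0.5, so Southland has the comparative advantage in Food.

# After specialization: Northland makes only Clothes, Southland mostly Food.
production = {
    "Northland": {"Food": 0, "Clothes": 100},
    "Southland": {"Food": 300, "Clothes": 50},
}

# Northland sells 50 tonnes of Clothes; at 1 tonne Food = 2/3 tonne Clothes,
# one tonne of Clothes buys 1.5 tonnes of Food, so 50 tonnes buy 75 tonnes of Food.
clothes_traded = 50
food_traded = clothes_traded * 3 / 2

consumption = {
    "Northland": {"Food": food_traded,
                  "Clothes": production["Northland"]["Clothes"] - clothes_traded},
    "Southland": {"Food": production["Southland"]["Food"] - food_traded,
                  "Clothes": production["Southland"]["Clothes"] + clothes_traded},
}
print(consumption)
# {'Northland': {'Food': 75.0, 'Clothes': 50}, 'Southland': {'Food': 225.0, 'Clothes': 100}}

Running the sketch reproduces the consumption table above, confirming that both countries end up consuming beyond what they could produce on their own.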

Limitations:

• Two countries, two goods - the theory is no different for larger numbers of countries and goods, but the principles are clearer and the argument easier to follow in this simpler case.
• Equal size economies - again, this is a simplification to produce a clearer example.
• Full employment - if one or other of the economies has less than full employment of factors of production, then this excess capacity must usually be used up before the comparative advantage reasoning can be applied.
• Constant opportunity costs - with a more realistic treatment of opportunity costs the reasoning is broadly the same, but specialization of production can only be taken to the point at which the opportunity costs in the two countries become equal. This does not invalidate the principles of comparative advantage, but it does limit the magnitude of the benefit.
• Perfect mobility of factors of production within countries - this is necessary to allow production to be switched without cost. In real economies this cost will be incurred: capital will be tied up in plant (sewing machines are not sowing machines) and labour will need to be retrained and relocated. This is why it is sometimes argued that 'nascent industries' should be protected from fully liberalised international trade during the period in which a high cost of entry into the market (capital equipment, training) is being paid for.
• Immobility of factors of production between countries - why are there different rates of productivity? The modern version of comparative advantage (developed in the early twentieth century by the Swedish economists Eli Heckscher and Bertil Ohlin) attributes these differences to differences in nations' factor endowments. A nation will have comparative advantage in producing the good that uses intensively the factor with which it is abundantly endowed. For example, suppose the US has a relative abundance of capital and India has a relative abundance of labour. Suppose further that cars are capital intensive to produce, while cloth is labour intensive. Then the US will have a comparative advantage in making cars, and India will have a comparative advantage in making cloth. If there is international factor mobility, this can change nations' relative factor abundance. The principle of comparative advantage still applies, but who has the advantage in what can change.
• Negligible transport costs - cost is not a cause of concern when countries decide to trade; it is ignored and not factored in.
• Half of resources devoted to each good before specialization - it is assumed that, before specialization, each country uses half its resources to produce each good.
• Perfect competition - this is a standard assumption that allows perfectly efficient allocation of productive resources in an idealized free market.

Q.2 What are the different market entry strategies for a company which is interested in entering international markets? Discuss briefly.

Ans: Definition: A market entry strategy is the plan for finding the best method of delivering your goods to your market and of distributing them there. This applies to domestic and international sales.

The transactions associated with exporting are generally more complicated than those relating to the domestic market. Remember you are dealing with different cultures underpinned by different legal systems. Language barriers may also cause misunderstanding.

The market entry strategies into international markets can be listed as follows:

1. Direct Sales

This means total distribution and pricing control for the exporter, and high profit potential due to the elimination of any middlemen. On the other hand, your company provides all services, including advertising, marketing, customer service, translation and required labeling; you must become an expert in that market; credit risks are, on average, higher than in any other strategy; and potential sales volume is low.

2. Agent or Representative


An agent or representative is an individual or company legally authorized to act on your behalf in your target market. If you perform due diligence and find the right agent, your export products are represented by an expert in the local market with established customer contacts; sales potential increases. However, the exporter must grant exclusive agreements regarding geographic regions or product lines; there is no control over prices and profit rate is lowered due to sales commission.

3. Distributor

The exporter essentially deals with a single customer who takes ownership of the product, in exchange assuming total responsibility for promotion, marketing, delivery, returns and customer relations. Sales volume potential increases, credit risk decreases. But the relationship is harder to legally terminate than with an agent or representative.

4. Licensing

When working on a licensing basis, the credit risk is low; there is a minimal level of commitment and risk for the licensor company since the overseas licensee is responsible for all production, marketing, distribution, credit and collections. Against this model, we have increased risks of loss of intellectual property. Despite potential high sales volume, profit is limited to a small percentage on each sale.

5. Joint Venture

The production cost per unit can be significantly lowered by moving selected manufacturing overseas; higher sales volume, market penetration and profit potential than any other strategy. However this results in a high level of commitment, investment, resource allocation and risk. This high risk, high commitment, but potentially high reward strategy is for exporters already experienced in the target market who are prepared to walk the last meters to take maximum advantage of that market's potential. In some countries, a joint venture is the only legal way for a foreign company to set up operations.

6. Franchising

This approach can result in wide and quick market coverage, some protection from copying, and reasonable profitability. You must also consider the high cost of studying laws and regulations in different countries, the cost of frequent visits to support franchisees, and the potential to lose the contract to a major franchisee. Franchising your idea, product and style of presentation to foreign franchisees carries with it a moderate degree of risk.

7. Export Merchant

You will have all financial and legal matters handled in your own country, which helps reduce the risk level. But you will never learn about exporting; there will be no input into marketing decisions; and there will be scant feedback on product performance for future Research & Development.

8. Subsidiary or subcontracting

This requires either setting up your own facility or subcontracting the manufacturing of your products to an assembly operator. It offers greater control over operations, lower transportation costs, low tariffs or duties (as with imports), lower production costs and possibly foreign government investment incentives (e.g. tax holidays). On the other hand, a subsidiary requires greater investment than joint ventures and licensed manufacturing, a substantial commitment of time, and exposure to local market risks, among other factors.

There are five points to be considered by firms for entry into new markets:

a. Technical innovation strategy - perceived and demonstrable superior products

b. Product adaptation strategy - modifications to existing products

c. Availability and security strategy - overcome transport risks by countering perceived risks


d. Low price strategy - penetration price and,

e. Total adaptation and conformity strategy - foreign producer gives a straight copy.

In marketing products from less developed countries to developed countries, point “c” poses major problems. Buyers in the interested foreign country are usually very careful, as they perceive transport, currency, quality and quantity problems.

9. Risks involved

The risks involved in a market entry strategy range from Systematic Credit Risk (different from Systemic Risk) and Exchange Risk (also known as Currency Risk) to Liquidity Risk, Country or Sovereign (or Geographical) Risk and, in some products, particularly agricultural commodities, Weather Risk.

Q.3 (a) What are the benefits of MNCs?

Ans: MNCs

A multinational corporation (MNC), also called a transnational corporation (TNC), or multinational enterprise (MNE), is a corporation or an enterprise that manages production or delivers services in more than one country. It can also be referred to as an international corporation. The International Labour Organization (ILO) has defined an MNC as a corporation that has its management headquarters in one country, known as the home country, and operates in several other countries, known as host countries.

The Dutch East India Company was the first multinational corporation in the world and the first company to issue stock. It was also arguably the world's first mega corporation, possessing quasi-governmental powers, including the ability to wage war, negotiate treaties, coin money, and establish colonies.

The first modern multinational corporation is generally thought to be the East India Company. Many corporations have offices, branches or manufacturing plants in different countries from where their original and main headquarters is located.

Benefits:

Multinational companies (MNCs) are not without benefits, which may accrue to the government, the economy, and the people, or even to the company itself. Cole (1996) stated that the size of multinational organizations is enormous; many of them have total sales well in excess of the GNP of many of the world's nations. Cole also stated that World Bank statistics comparing multinational companies and national GNPs show, for example, that large oil firms such as Exxon and Shell are larger in economic terms than nations such as South Africa, Australia and Argentina, and substantially greater than nations such as Greece, Bulgaria and Egypt.

Other large multinational companies include General Motors, British Petroleum, Ford and International Business Machine (IBM). Some of the benefits of multinational companies are:

1. There is usually huge capital investment in major economic activities

2. The country enjoys varieties of products, services and facilities, brought to their doorsteps

3. There is creation of more jobs for the populace

4. The nation's pool of skills is best utilized and put to use effectively and efficiently

5. There is advancement in technology as these companies bring in state-of-the-art-technology for their businesses

6. The demand for training and retraining and advancement in the people's education becomes absolutely necessary. This will in turn help strengthen the economy of the nation

7. The living standard of the people is boosted


8. Friendliness between and among trading nations, i.e. it strengthens international relations

9. The balance of payments of trading nations is improved

Cole (1996) stated that the sheer size (and wealth) of multinationals means that they can have a significant effect on the host country. To Cole, most of the effects are beneficial and include some or all of the above. The Electronic Library of Scientific Literature (1996) explained the benefits of MNCs under a theory known as 'The Theory of Externalities'. The theory considers the benefits of MNCs from the point of view of those who maintain the importance of Foreign Direct Investment (FDI) as part of the engine necessary for growth. Davies (1989) also offered some theories on the benefits and advantages of multinationals; Davies (1989:260) tagged this 'Economic Theory and the Multinational', in which he took a comprehensive and critical look at the benefits of MNCs.

These theories point to further benefits, some of which are:

1. There is significant injection into the local economy in respect to investment

2. Best utilization of the country's natural resources

3. They help in strengthening domestic competition

4. They are good source of technological expertise

5. Expansion of market in the host country

Q.3 (b) Give a short note on OPEC.

Ans: OPEC:

The Organization of the Petroleum Exporting Countries (OPEC, pronounced /ˈoʊpɛk/ OH-pek) is a cartel of twelve developing countries made up of Algeria, Angola, Ecuador, Iran, Iraq, Kuwait, Libya, Nigeria, Qatar, Saudi Arabia, the United Arab Emirates, and Venezuela. OPEC has maintained its headquarters in Vienna since 1965, and hosts regular meetings among the oil ministers of its Member Countries. Indonesia withdrew in 2008 after it became a net importer of oil, but stated it would likely return if it became a net exporter in the world again.

According to its statutes, one of the principal goals is the determination of the best means for safeguarding the cartel's interests, individually and collectively. It also pursues ways and means of ensuring the stabilization of prices in international oil markets with a view to eliminating harmful and unnecessary fluctuations; giving due regard at all times to the interests of the producing nations and to the necessity of securing a steady income to the producing countries; an efficient and regular supply of petroleum to consuming nations, and a fair return on their capital to those investing in the petroleum industry.

OPEC's influence on the market has been widely criticized since it became effective in determining production and prices. Arab members of OPEC alarmed the developed world when they used the “oil weapon” during the Yom Kippur War by implementing oil embargoes and initiating the 1973 oil crisis. Although political explanations for the timing and extent of the OPEC price increases are also valid, from OPEC’s point of view these changes were triggered largely by previous unilateral changes in the world financial system and the ensuing period of high inflation in both the developed and developing world. This explanation encompasses OPEC actions both before and after the outbreak of hostilities in October 1973, and concludes that “OPEC countries were only 'staying even' by dramatically raising the dollar price of oil.”

OPEC's ability to control the price of oil has diminished somewhat since then, due to the subsequent discovery and development of large oil reserves in Alaska, the North Sea, Canada, the Gulf of Mexico, the opening up of Russia, and market modernization. OPEC nations still account for two-thirds of the world's oil


reserves, and, as of April 2009, 33.3% of the world's oil production, affording them considerable control over the global market. The next largest group of producers, members of the OECD and the Post-Soviet states produced only 23.8% and 14.8%, respectively, of the world's total oil production. As early as 2003, concerns that OPEC members had little excess pumping capacity sparked speculation that their influence on crude oil prices would begin to slip.

Q.4 (a) How will the socio-cultural environment of a country have an impact on a multinational business? Explain with an example.

Ans: Social and cultural environment

The socio-cultural environment of every nation is unique. Therefore, it is very essential for marketers to consider the differences existing between the cultures in the home country and the host country.

The experience faced by The Coca Cola Company during the launch of its soft drink product in China is often cited as an example for emphasizing the impact of cultural differences on global marketing. The company, during the product’s launch in China, spelled Coca Cola as ‘Ke-Kou-ke-la’ in Chinese. Later, the company searched from around 40,000 Chinese characters and came up with the word ‘ko-kou-ko-le,’ which when translated in Chinese meant ‘happiness in the mouth.’

Equally important to the international manager are sociocultural elements. These include the attitudes, values, norms, beliefs, behaviors, and demographic trends of the host country. Learning these things frequently requires a good deal of self-awareness in order to recognize and control culturally specific behaviors in one's self and in others. International managers must know how to relate to and motivate foreign workers, since motivational techniques differ among countries. They must also understand how work roles and attitudes differ. For instance, the boundaries and responsibilities of occupations sometimes have subtle differences across cultures, even if they have equivalent names and educational requirements. Managers must be attuned to such cultural nuances in order to function effectively. Moreover, managers must keep perspective on cultural differences once they are identified and not subscribe to the fallacy that all people in a foreign culture think and act alike.

The Dutch social scientist Geert Hofstede divided sociocultural elements into four categories: (1) power distance, (2) uncertainty avoidance, (3) individualism-collectivism, and (4) masculinity-femininity. International managers must understand all four elements in order to succeed.

Power distance is a cultural dimension that involves the degree to which individuals in a society accept differences in the distribution of power as reasonable and normal. Uncertainty avoidance involves the extent to which members of a society feel uncomfortable with and try to avoid situations that they see as unstructured, unclear, or unpredictable. Individualism-collectivism involves the degree to which individuals concern themselves with their own interests and those of their immediate families as opposed to the interests of a larger group. Finally, masculinity-femininity is the extent to which a society emphasizes traditional male values, e.g., assertiveness, competitiveness, and material success, rather than traditional female values, such as passivity, cooperation, and feelings. All of these dimensions can have a significant impact on a manager's success in an international business environment.

The inability to understand the concepts Hofstede outlined can hinder managers' capacity to manage—and their companies' chances of surviving in the international arena.

The social dimension or environment of a nation determines the value system of the society, which in turn affects the functioning of the business. Sociological factors such as cost structure, customs and conventions, cultural heritage, views toward wealth, income and scientific methods, respect for seniority, mobility of labour, etc. have a far-reaching impact on the business. These factors determine the work culture, the mobility of labour, work groups, etc. For instance, the nature of goods and services to be produced depends upon the demand of the people, which in turn is affected by their attitudes, customs, cultural values, fashion, etc. The socio-cultural environment determines the code of conduct the business should follow. Social groups such as trade unions or consumer forums will intervene if the business follows unethical practices. For instance, if the firm is not paying fair wages to its employees, or is indulging in black marketing or adulteration, consumer forums and various government agencies will take action against the business.


Q. 4(b). Discuss the origin of WTO and its principles.

Ans: WTO

The WTO is the successor to a previous trade agreement called the General Agreement on Tariffs and Trade (GATT), which was created in 1948. The WTO has a larger membership than GATT, and covers more subjects. Nevertheless, it was GATT that established, multilaterally, the principles underlying this trading system. The WTO is both an institution and a set of rules, called the “WTO law”. Each of the almost 150 WTO members is required to implement these rules, and to provide other members with the specific trade benefits to which they have committed themselves.

The main body of WTO law is composed of over sixty individual agreements and decisions. All of these are overseen by councils and committees at the WTO’s headquarters in Geneva; the WTO doesn’t have any local or regional offices. Large-scale negotiations, like the Doha Round, require their own special negotiating forum. At least once every two years, WTO members meet at the ministerial level. For the rest of the time, national delegates, who are usually diplomats and national trade officials, conduct the day-to-day work.

All this amounts to a heavy burden for many small and poor WTO members. To help lighten the load and ensure effective participation, technical assistance is available from the WTO and other international agencies, including training courses for national trade officials. The assistance available, however, is insufficient for a country like Cambodia to contribute actively in every area of the WTO. Cambodia will need to prioritize its objectives in WTO membership, and the issues it raises before the organization. Occasionally Cambodia may join a group, with other countries leading the negotiations. The group of least-developed country WTO members works together when they have similar objectives. One recent example involved seeking to make the WTO rules on special and differential treatment for developing countries more concrete. Additionally, a much larger group (the “G90”) of least-developed and other relatively poor WTO members has worked together in the Doha Round negotiations, particularly on agriculture.

PRIMARY WTO PRINCIPLES

A small number of relatively simple principles underlie the rules of the WTO as they affect Cambodia and all other members:

1. LAWS AND REGULATIONS MUST BE TRANSPARENT

Transparency is the primary principle of the WTO. Nothing is more important to business people than knowing and having confidence in the regulatory environment in which they operate, at home and overseas. WTO agreements usually include some form of transparency requirement that obliges governments and other authorities to publish all laws, regulations, and practices that can impact trade or investment.

2. NON-DISCRIMINATION

A second key principle of the WTO rulebook is non-discrimination. The principle applies at two levels. At the first level, non-discrimination means that Cambodian goods cannot be discriminated against in export markets with respect to the same goods arriving from competing countries. At the second level, once they enter those export markets, Cambodian goods cannot be treated differently than the same goods produced locally.


3. PROGRESSIVE TRADE LIBERALIZATION

A third principle is progressive trade liberalization through negotiation. The WTO is not a free-trade agreement; there is scope for the legal protection of markets from import competition. However, the underlying goal of the WTO is to create trade and investment through increasingly open markets. Governments are free to open their markets independently of the WTO. After accession, Cambodia can liberalize further to the extent, and at the speed, the government thinks is appropriate.

4. SPECIAL AND DIFFERENTIAL TREATMENT

A fourth principle is “special and differential treatment” for developing countries. In practice, this permits easier conditions for poorer countries. This can mean not applying certain provisions of new agreements to developing countries. It can also mean providing poorer nations with more time than developed countries to implement such provisions. This is an important aspect of the Doha Round.

Q. 5 (a). Explain the merits and demerits of BoP theory?

Ans: BOP Theory

A balance of payments (BOP) sheet is an accounting record of all monetary transactions between a country and the rest of the world. These transactions include payments for the country's exports and imports of goods, services, and financial capital, as well as financial transfers. The BOP summarises international transactions for a specific period, usually a year, and is prepared in a single currency, typically the domestic currency of the country concerned. Sources of funds for a nation, such as exports or the receipts of loans and investments, are recorded as positive or surplus items. Uses of funds, such as for imports or to invest in foreign countries, are recorded as negative or deficit items.

When all components of the BOP sheet are included it must balance – that is, it must sum to zero – there can be no overall surplus or deficit. For example, if a country is importing more than it exports, its trade balance will be in deficit, but the shortfall will have to be counterbalanced in other ways – such as by funds earned from its foreign investments, by running down reserves or by receiving loans from other countries.
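The balancing identity can be illustrated with a toy ledger. The Python sketch below uses invented figures (they are not taken from the text) to show how a deficit on the current account is offset by other recorded flows so that the full account sums to zero:

# Toy balance of payments ledger (hypothetical figures, in billions of domestic currency).
# Credits (sources of funds) are positive, debits (uses of funds) are negative.
bop = {
    "exports of goods and services": +120.0,
    "imports of goods and services": -150.0,   # trade deficit of 30
    "net income from foreign investments": +10.0,
    "net inward loans and investment": +15.0,
    "drawdown of official reserves": +5.0,
}

current_account = (bop["exports of goods and services"]
                   + bop["imports of goods and services"]
                   + bop["net income from foreign investments"])
print("Current account balance:", current_account)   # -20.0, a deficit

# The deficit is financed by capital inflows and reserve changes,
# so all recorded flows sum to zero.
print("Overall balance:", sum(bop.values()))          # 0.0

The point of the sketch is only the accounting identity: whatever deficit appears on one part of the account must show up, with the opposite sign, somewhere else.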

While the overall BOP sheet will always balance when all types of payments are included, imbalances are possible on individual elements of the BOP, such as the current account. This can result in surplus countries accumulating hoards of wealth, while deficit nations become increasingly indebted. Historically there have been different approaches to the question of how to correct imbalances and debate on whether they are something governments should be concerned about. With record imbalances held up as one of the contributing factors to the financial crisis of 2007–2010, plans to address global imbalances are now high on the agenda of policy makers for 2010.

Q. 5 (b). Distinguish between fixed and flexible exchange rates.

Ans: Exchange rates allow trade between currencies; they determine the value of money when it is exchanged. A fixed exchange rate means the amount of currency received is set in advance. A floating exchange rate means that the rate moves, and the currency received depends on the time of the exchange.

Until 1971, governments with the major currencies in the world maintained fixed exchange rates. The rates were originally based upon the price of gold, and then the value of the US dollar. Fixed exchange rates allowed for stability. Everyone knew the cost of money. There was no uncertainty in the foreign trade of goods. After 1971, governments with major currencies, such as the United States and European countries, could no longer control the exchange rate and the rate was allowed to float. In many developing countries governments continued to use a fixed exchange rate for their currency.


Fixed exchange rate
Fixed exchange rates are set by governments. A fixed exchange rate is based upon the government's view of the value of its currency as well as its monetary policy. It has advantages: stability is one, predictability another. Businesses and individuals can plan their activities with certainty about the value of money. A businessman shipping goods overseas knows the value in advance. A tourist travelling in other countries can budget knowing what his money will buy.

Floating exchange rate
The floating exchange rate, in its true form, allows the marketplace to set the rate. The forces of supply and demand determine the value of a currency. For example, when the US dollar is considered strong, it takes more euros (the currency of most European countries) to buy it. When the US dollar is considered weak or in decline, the amount of euros needed to buy it will fall.

In reality, floating rates do not change solely with the forces of the marketplace. Governments are constantly trying to steer the floating rate by taking action in the marketplace. Government action cannot fix the rate, but it can affect the rate through intervention. Such intervention involves either the buying or selling of currency, depending on which way the government wants the rate to go. Some governments, like China's, have a modified fixed rate. They set a rate and then allow the rate to float within certain defined limits. Such limits are usually very small. These small allowed changes mean that the rate will always come back to the set figure after going up or down. For China, it is a way to put a small amount of free market into the currency while maintaining government control.
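The 'float within defined limits' arrangement described above can be pictured with a short Python sketch. The central rate and band width below are invented for illustration and do not describe any actual currency regime:

def managed_float(market_rate, central_rate, band=0.005):
    # Clamp the market-determined rate to within +/- `band` (here 0.5%)
    # of the government's central rate.
    lower = central_rate * (1 - band)
    upper = central_rate * (1 + band)
    return min(max(market_rate, lower), upper)

# Hypothetical central rate: 6.50 units of local currency per US dollar.
for market in (6.45, 6.50, 6.58):
    print(f"market {market} -> allowed {managed_float(market, 6.50):.4f}")
# market 6.45 -> allowed 6.4675  (pushed up to the band floor)
# market 6.5 -> allowed 6.5000   (inside the band, unchanged)
# market 6.58 -> allowed 6.5325  (pulled down to the band ceiling)

Because the band is narrow, the allowed rate never strays far from the set figure, which is exactly the behaviour the paragraph above describes.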

Q. 6. Discuss the need for HRM Strategies and International employee relations strategies in International business.

Ans: The environment within which international business is carried out in the first decade of the new millennium is increasingly competitive.

The technological environment is such that technological supremacy is fleeting and, since it does not last long, cannot be considered a strong advantage for a company.

The economic environment is affected by too many uncontrollable factors, which means a stable economic situation is less certain. The economy can be affected negatively by things over which large companies and federal governments have no control.

The political environment responds to the socio-cultural environment, which in many countries is undergoing the stresses of large immigration movements and cultural and religious frictions. Very few regions of the world are free of conflict, so no place has a distinctively advantageous political environment.

The geographic environment, long affected by rampant pollution, deforestation, greenhouse gases from autos and factories, acid rain from coal-fired generators, declining water reserves and so on, has seen a bit of Mother Nature fighting back in 2003-2005 with some spectacular events such as a massive tidal wave, numerous destructive tornadoes, larger and more frequent hurricanes, volcanoes, mudslides, sandstorms, drought and crop failures. As a consequence of the changes to and changes by the geographic environment, almost every place on the planet has had to endure weather that has negatively affected business and agricultural productivity.

The one area in which companies can become more competitive is having the best people and having those people serve their customers in the best way.

Therefore one of the key things for companies in the "new new" economy is to focus on the people in the company, and the customers they serve - ergo, Human Resource Management has become a "big issue" for international business.

Although Dilbert has many jokes about Catbert, the "Evil H.R. Director", fact is, morale of employees is increasingly important, especially in international business, since companies are more and more challenged to cut expenses, and the # 1 expense cut is staff cuts - meaning, more productivity out of fewer people.


The way to get more productivity is partly by enhancing morale.

An expatriate manager (expat) is simply defined as a citizen of one country working abroad in another country. Another slang expression often used is "the expat community", describing a group of foreign country nationals, most often educated executives with good jobs, benefits and privileges, who can sometimes be seen by the local population as behaving in a way that is "elite".

Types of Staffing policies

Ethnocentric Staffing Policy: when the company sends people from the home company's country overseas.

Polycentric Staffing Policy: when the company allows local staff to rise to the executive level and be managers.

Geocentric Staffing Policy: when the company uses staff in foreign operations - no matter what country they come from.

Global HR Challenges

Things that make it difficult for companies to manage Human Resources situations in other countries.

o Compensation varies

o Labour Laws

o Social-Cultural Environment

Compensation varies:

Software development in India costs $15-$20/hour including the cost of the hardware, software and a satellite link. Compare this with $60-$80/hour in the US

- still true in 2009?

from http://home.alltel.net/bsundquist1/gcib.html

Foundry workers (casting metal things) in India earn $1 for working an 8-hour day

Mexican auto industry wages and benefits average $4.00/hr, vs. $30/hr in the US

Despite seemingly low wages and wretched conditions, Mexico is losing garment assembly jobs to Central America, call centers to Argentina, data processing to India, and electronics manufacture to China

Joel Millman, David Luhnow, "Decade After NAFTA, Prospects for Mexico Seem to be Dimming", Wall Street Journal, 2003

Japanese companies can hire 3 Chinese software engineers for the price of one in Japan (Thomas L. Friedman, "Doing our homework", Pittsburgh Post Gazette, 6/25/04).

Vietnamese Nike workers earn $1.60/day, while three simple meals cost $2.00

Labour Laws, Rules and Regulations:


Minimum wage in Mexico is $0.50/hour. In one US auto plant in Mexico, the workers went on strike. The Mexican police shot several of them and put the strikers back to work - and cut their wages by 45%.

Nobody has ever been shot by police for being on strike in Canada !!!

Social-Cultural Environment

o Language issues

o Religious practices

Canada

- Christmas & New Years & Canada Day

+ Chinese New Year

+ Ramadan

+ Jewish High Holidays

o Gender issues

o Vacations and holidays

Social-Cultural Environment:

affects Canadian managers operating Canadian companies overseas

affects Canadian managers operating in large "multi-cultural" cities in Canada

affects Foreign Company managers operating Foreign Companies in Canada

Canadian managers operating Canadian companies overseas

Canadian managers of "European background"

Canadian managers whose background matches the region in which the company operates, e.g. a Canadian IT company using Chinese-Canadian managers in China

Canadian managers operating in large "multi-cultural" cities in Canada

Canadian managers of "European background"

trying to deal with employees of non-Northern European background - "Managing Diversity"

e.g. Canadian Bank Vice Presidents dealing with branch managers from a "blended" community

Canadian managers operating in large "multi-cultural" cities in Canada

Canadian managers whose background matches the workers in the company, e.g. a Canadian ISP company using Canadian-Desi managers for call centre employees from India, Pakistan, Bangladesh and Sri Lanka


Foreign Company managers operating Foreign Companies in Canada

Foreign Company managers using English to manage Canadians of European Background

Foreign Company managers communicating and managing employees in Canada, who do not have English as a first language

e.g. Japanese auto executives managing employees of South-Asian heritage

Social-Cultural Environment - "Managing Diversity"

"Programs or corporate environments that value multiculturalism must answer hard questions about managing diversity."

Promoting Diversity

- equal treatment?

- or differential treatment?

Antidiscrimination laws in Canada and other OECD countries require that employers do not treat applicants for jobs, and employees, differently.

Treating people "equally" can be both a positive and negative for ethnic minorities and those who laud and "celebrate diversity".

For example - if we treated people "equally", we'd have just one written driver's test - in English.

from http://www.referenceforbusiness.com/encyclopedia/Mor-Off/Multicultural-Workforce.html#MANAGING_DIVERSITY

Social-Cultural Environment - "Managing Diversity"

"On the other hand, treating people differently often creates resentment and erodes morale with perceptions of preferential treatment."

Some employees resent other employees who get special consideration for holidays, or prayer times, or special food considerations.

"Other questions to be answered are: Will the company emphasize commonalities or differences in facilitating a multicultural environment? Should the successful diverse workplace recognize differentiated applicants as equals or some as unequals? How does the company achieve candor in breaking down stereotypes and insensitivity towards women and minority groups?"


How do you make decisions about managing situations where it might be considered "favouritism" to make allowances or considerations for a special category of "diversity"?

Assignment (Set-2)
Subject code: MB0051

Legal Aspects of Business

Q.1 Discuss the issues involved in international product policy and International branding with a few examples.

Ans: International product policy

When going international, product decisions are critical for the firm's marketing activity, as they define its business, customers and competitors, as well as the other marketing policies, such as pricing, distribution and promotion.

Improper product policy decisions are very easily made with negative consequences for the company as the following examples illustrate:

• Ikea, the Swedish furniture chain, insists that all its stores carry the basic product line with little or no adaptation to local tastes. When it entered the USA market with the basic product line, it did not understand the reluctance of US customers to buy beds. Eventually the firm discovered that the Ikea beds were a different size than US beds, so the bed linen consumers already owned did not fit; they would have had to buy bed linen specially from Ikea to fit the beds. Ikea remedied the situation by ordering larger beds and bed linen from its suppliers.

• When Ford introduced the Pinto model in Brazil, it was unaware of the fact that 'pinto' in Brazilian slang means small male genitals. Not surprisingly, sales were poor. When the company found out why sales of the Pinto model were so low, it changed the name to Corcel (which means horse).


These examples show how easily companies, even experienced ones, commit international "blunders", and emphasize once again the importance of product policy at the international level. The main product policy decisions that a company faces when going abroad comprise aspects such as:

1) What is the degree of adaptation /standardization of the company products on each foreign market?

2) What are the products that the company is going to sell abroad (product portfolio decisions)?

3) What products have to be developed for what markets?

4) What is the branding strategy abroad?

We will start our discussion about product policy by first looking at what a product is and how it can be defined.

Product Issues in International Marketing

Products and Services. Some marketing scholars and professionals tend to draw a strong distinction between conventional products and services, emphasizing service characteristics such as heterogeneity (variation in standards among providers, frequently even among different locations of the same firm), inseparability from consumption, intangibility, and, in some cases, perishability—the idea that a service cannot generally be created during times of slack and be “stored” for use later. However, almost all products have at least some service component—e.g., a warranty, documentation, and distribution—and this service component is an integral part of the product and its positioning. Thus, it may be more useful to look at the product-service continuum as one between very low and very high levels of tangibility of the service. Income tax preparation, for example, is almost entirely intangible—the client may receive a few printouts, but most of the value is in the service. On the other hand, a customer who picks up rocks for construction from a landowner gets a tangible product with very little value added for service. Firms that offer highly tangible products often seek to add an intangible component to improve perception. Conversely, adding a tangible element to a service—e.g., a binder with information—may address many consumers’ psychological need to get something to show for their money.

On the topic of services, cultural issues may be even more prominent than they are for tangible goods. There are large variations in willingness to pay for quality, and often very large differences in expectations. In some countries, it may be more difficult to entice employees to embrace a firm’s customer service philosophy. Labor regulations in some countries make it difficult to terminate employees whose treatment of customers is substandard. Speed of service is typically important in the U.S. and western countries but personal interaction may seem more important in other countries.

Product Need Satisfaction. We often take for granted the “obvious” need that products seem to fill in our own culture; however, functions served may be very different in others—for example, while cars have a large transportation role in the U.S., they are impractical to drive in Japan, and thus cars there serve more of a role of being a status symbol or providing for individual indulgence. In the U.S., fast food and instant drinks such as Tang are intended for convenience; elsewhere, they may represent more of a treat. Thus, it is important to examine through marketing research consumers’ true motives, desires, and expectations in buying a product.

Approaches to Product Introduction. Firms face a choice of alternatives in marketing their products across markets. An extreme strategy involves customization, whereby the firm introduces a unique product in each country, usually with the belief that tastes differ so much between countries that it is necessary more or less to start from “scratch” in creating a product for each market. On the other extreme, standardization involves making one global product in the belief that the same product can be sold across markets without significant modification—e.g., Intel microprocessors are the same regardless of the country in which they are sold. Finally, in most cases firms will resort to some kind of adaptation, whereby a common product is


modified to some extent when moved between some markets—e.g., in the United States, where fuel is relatively less expensive, many cars have larger engines than their comparable models in Europe and Asia; however, much of the design is similar or identical, so some economies are achieved. Similarly, while Kentucky Fried Chicken serves much the same chicken with the eleven herbs and spices in Japan, a lesser amount of sugar is used in the potato salad, and fries are substituted for mashed potatoes.

There are certain benefits to standardization. Firms that produce a global product can obtain economies of scale in manufacturing, and higher quantities produced also lead to a faster advancement along the experience curve. Further, it is more feasible to establish a global brand as less confusion will occur when consumers travel across countries and see the same product. On the down side, there may be significant differences in desires between cultures and physical environments—e.g., software sold in the U.S. and Europe will often utter a “beep” to alert the user when a mistake has been made; however, in Asia, where office workers are often seated closely together, this could cause embarrassment.

Adaptations come in several forms. Mandatory adaptations involve changes that have to be made before the product can be used—e.g., appliances made for the U.S. and Europe must run on different voltages, and a major problem was experienced in the European Union when hoses for restaurant frying machines could not simultaneously meet the legal requirements of different countries. “Discretionary” changes are changes that do not have to be made before a product can be introduced (e.g., there is nothing to prevent an American firm from introducing an overly sweet soft drink into the Japanese market), although products may face poor sales if such changes are not made. Discretionary changes may also involve cultural adaptations—e.g., in Sesame Street, the Big Bird became the Big Camel in Saudi Arabia.

Another distinction involves physical product vs. communication adaptations. In order for gasoline to be effective in high altitude regions, its octane must be higher, but it can be promoted much the same way. On the other hand, while the same bicycle might be sold in China and the U.S., it might be positioned as a serious means of transportation in the former and as a recreational tool in the latter. In some cases, products may not need to be adapted in either way (e.g., industrial equipment), while in other cases, it might have to be adapted in both (e.g., greeting cards, where the both occasions, language, and motivations for sending differ). Finally, a market may exist abroad for a product which has no analogue at home—e.g., hand-powered washing machines.

Branding. While Americans seem to be comfortable with category specific brands, this is not the case for Asian consumers. American firms observed that their products would be closely examined by Japanese consumers who could not find a major brand name on the packages, which was required as a sign of quality. Note that Japanese keiretsus span and use their brand name across multiple industries—e.g., Mitsubishi, among other things, sells food, automobiles, electronics, and heavy construction equipment.

The International Product Life Cycle (PLC). Consumers in different countries differ in the speed with which they adopt new products, in part for economic reasons (fewer Malaysian than American consumers can afford to buy VCRs) and in part because of attitudes toward new products (pharmaceuticals upset the power afforded to traditional faith healers, for example). Thus, it may be possible, when one market has been saturated, to continue growth in another market—e.g., while somewhere between one third and one half of American homes now contain a computer, the corresponding figures for even Europe and Japan are much lower and thus, many computer manufacturers see greater growth potential there. Note that expensive capital equipment may also cycle between countries—e.g., airlines in economically developed countries will often buy the newest and most desired aircraft and sell off older ones to their counterparts in developing countries. While in developed countries, “three part” canning machines that solder on the bottom with lead are unacceptable for health reasons, they have found a market in developing countries.

Diffusion of innovation. Good new innovations often do not spread as quickly as one might expect—e.g., although the technology for microwave ovens has existed since the 1950s, they really did not take off in the United States until the late seventies or early eighties, and their penetration is much lower in most other countries. The typewriter, telephone answering machines, and cellular phones also existed for a long time before they were widely adopted.

Certain characteristics of products make them more or less likely to spread. One factor is relative advantage. While a computer offers a huge advantage over a typewriter, for example, the added gain from having an electric typewriter over a manual one was much smaller. Another issue is compatibility, both in the social and physical sense. A major problem with the personal computer was that it could not read the manual files that firms had maintained, and birth control programs are resisted in many countries due to conflicts with religious values. Complexity refers to how difficult a new product is to use—e.g., some people have resisted getting computers because learning to use them takes time. Trialability refers to the extent to which one can examine the merits of a new product without having to commit a huge financial or personal investment—e.g., it is relatively easy to try a restaurant with a new ethnic cuisine, but investing in a global positioning navigation system is riskier since this has to be bought and installed in one’s car before the consumer can determine whether it is worthwhile in practice. Finally, observability refers to the extent to which consumers can readily see others using the product—e.g., people who do not have ATM cards or cellular phones can easily see the convenience that other people experience using them; on the other hand, VCRs are mostly used in people’s homes, and thus only an owner’s close friends would be likely to see one in use.

At the societal level, several factors influence the spread of an innovation. Not surprisingly, cosmopolitanism, the extent to which a country is connected to other cultures, is useful. Innovations are more likely to spread where there is a higher percentage of women in the work force; these women both have more economic power and are able to see other people use the products and/or discuss them. Modernity refers to the extent to which a culture values “progress.” In the U.S., “new and improved” is considered highly attractive; in more traditional countries, the potential for disruption causes new products to be viewed with more skepticism. Although U.S. consumers appear to adopt new products more quickly than those of other countries, the U.S. actually scores lower on homophily, the extent to which consumers are relatively similar to each other, and on physical distance, since consumers who are more spread out are less likely to interact with other users of the product. Japan, which ranks second only to the U.S., on the other hand, scores very well on these latter two factors.

Branding strategies at international level

For a company that goes international, branding is important, and it is more difficult than branding in the domestic market. Branding is usually rooted in the culture of a country, and brand names designed for one country can have different meanings in other languages, or no meaning at all. A brand is a name, a sign, a symbol, a logo, a term or a combination of these used by a firm to differentiate its offerings from those of competitors. In most product categories, companies do not compete with products, but with brands—with the way the augmented products are differentiated and positioned relative to other brands. All brands are products or services in that they serve a functional purpose, but not all products or services are brands. A product is a physical entity, but it is not always a brand, as brands are created by marketers. A brand is a product or a service that, besides its functional benefits, also provides some added value, such as11:

familiarity, as brands identify products,

reliability and risk reduction, as brands in most instances offer a quality guarantee,

association with the kind of people who are known users of the brand, such as young and glamorous or rich and snobbish.

For many firms the brands they own are their most valuable assets. Associated with the brand is brand equity, which refers to brand name awareness, perceived quality or any association made by the customer with the brand name. A brand can be an asset (for Coca-Cola the brand is an asset) or a liability (for Nestle the brand was a liability when the international boycott over its infant milk formula was launched).

How a company chooses a brand is an elaborate process. In France there is a company that specializes in finding international brand names. Jeannet and Hennessey present the steps undertaken by this company12:

1. The company brings citizens of many countries together and asks them to state names in their particular language that they think would be suitable for the product to be named. Speakers of different languages can immediately react if names appear that sound unpleasant in their language or carry unwanted connotations.

2. The thousands of names accumulated in a few such sessions are then reduced to five hundred by the company.

3. The client company is asked to choose fifty names from the five hundred.

4. The fifty chosen names are then searched to determine which ones have not been registered in any of the countries under consideration.

5. From the roughly ten names that remain after this phase, the company and the client together make the final decision.

When choosing a name for products to be marketed internationally a company may consider different naming strategies, as those exemplified in box no. 9.1.

There are a number of branding strategies that a company may use at international level:

1. According to the existence or not of a brand there are:

non-branded products, which have the advantage of lower production and marketing costs but the disadvantage of having no market identity and competing severely on price,

branded products, which can benefit greatly from their brands if brand awareness is high and the image is positive. Sometimes the brand can be considered the most valuable asset of the company.

For instance, Coca-Cola’s brand equity was evaluated at over $35 billion according to one source. The fact that brands are assets for companies is illustrated by their market value. In 1987 Nestle bought the UK chocolate maker Rowntree for $4.5 billion, five times the book value, due to its ownership of well-known brands such as After Eight, Kit Kat and Rolo. Similarly, Philip Morris bought Kraft for $12.9 billion, a price four times the book value13.

2. According to the number of products that have the same name, there are:

individual brands, when each of the company’s products has its own name, usually with no association with the company name. Individual brands are used when the company addresses different market segments. In the cigarette industry one producer has the Camel, Winston and Winchester brands, each of them addressing a different market segment,

family/umbrella/corporate brands. When all products of the company, or a group of its products, carry the same name, we have family or umbrella branding. When this name is the corporate name, we have corporate branding. Such corporate brands are Shell, Levi’s, Sony, Kodak, Daewoo, Virgin, etc.

Sometimes companies combine a specific individual name with the name of the corporation. For instance, Chocapic from Nestle, or Toyota Lexus.

3. According to the number of brands commercialised in one market, the company may have:

a single brand, which is usually the case when market homogeneity is high. The main disadvantage of having just one brand in a country is the limited shelf space obtained at retailer level, resulting in lower exposure for the company. The advantage is that brand confusion for the customer is eliminated and more focused and efficient marketing is possible for the company,

multiple brands, when a company has several brands in one market. This strategy is used when the market is segmented and consumers have various needs. The Coca-Cola Company has multiple brands on the Romanian market, among them Coca-Cola, Sprite, Cappy, Fanta, etc. The advantage of this strategy is that the company gains more shelf space (if the consumer does not buy Coca-Cola but buys Fanta, the money still goes to the Coca-Cola Company). Among the disadvantages are higher marketing costs, as different marketing plans and programs must be designed for each brand, and the loss of economies of scale.

4. According to the owner of the brand, there are:

manufacturers’ brands, which include most brands we know: Levi’s, Coca-Cola, Nike, Adidas, etc.

private brands, which are retailers’ brands or store brands. Retailers started to buy products and then resell them under their own name. Private brands have recently become very popular. They offer higher margins for retailers compared with the margins on manufacturers’ brands, they get extensive and better shelf space, they benefit from heavy in-store promotion, and they are usually low-price/good-quality products. In the UK they represent one third of supermarket sales, and their share of sales is increasing in continental Europe, too.

5. According to the geographical spread of the brand, there are:

global brands that have been defined by Chee and Harris14 as brands that are marketed with the same positioning and marketing approaches in every part of the world. Some other authors consider that it is not so easy to define a global brand. However, using global brands (at least the same name everywhere) offers some advantages to the company:

obtaining economies of scale,

building easier brand awareness, as global brands are more visible than local brands,

by using global brands the company can capitalize on the media overlap that exists in many regions (for instance, Germany and Austria),

using global brands contributes to increased prestige for the company, as it gives consumers a signal that the company has the resources to compete globally and has the willpower and commitment to support the brand worldwide.

local brands are more appropriate in certain conditions, such as the following:

there are legal constraints. A few years ago in India, Pepsi was called Lehar because legislation required all brand names to be local.

if the brand name is already used for a similar or dissimilar product in that country, another brand name has to be chosen. Budweiser is an American brand of beer, but in Europe a Czech beer company owned the name, so the US company calls its beer Bud in Europe.

when there are cultural barriers and the global name is either difficult to pronounce or has an undesirable association. A New Zealand dairy company renamed its powdered milk sold in Malaysia from Anchor (the domestic name) to Fern (a local name) because Anchor was a beer brand heavily advertised in Malaysia. The company considered that consumers would not buy a product intended for children if its name were associated with an alcoholic beverage, especially since a large proportion of Malaysia’s population is Muslim.

Many international companies have used local brands by adapting their domestic brands to the environment of the new foreign markets. Procter and Gamble, for instance, adapted the name of its household cleaner Mr. Clean to the European markets by translating it. The brand became Monsieur Propre in France and Meister Proper in Germany15. General Motors also adapted its brand for Europe, even though it was selling the same product: the automobile became Opel in Germany and Vauxhall in the U.K.16

Brand name selection procedures for international markets are therefore important, as the company has to choose either to adapt or to standardize its brand name. A key issue for companies in international marketing is whether they should use global or local brands. The decision should be taken according to what each market dictates. In countries where patriotism is high and consumers have a strong buy-local attitude, local brands are recommended. Local brands are also to be used in countries where global brands are not known and where local brands have strong brand equity. When the brand is strong, companies should go global with it. In short, a company should use global brands where possible and national/local brands where necessary.

Q.2 a. Why do you think International quality standards are essential in International business?

Ans: Quality Control. Quality control is a process within an organization designed to ensure a set level of quality for the products or services offered by a company. This control includes the actions necessary to verify and control the quality of the products and services delivered. The overall goal includes meeting the customer's requirements and providing satisfactory, fiscally sound, and dependable output. Most companies provide a service or a product, and control is important to confirm that the output being provided is of top quality overall. Quality is important to companies for liability purposes, name recognition or branding, and maintaining a position against the competition in the marketplace.

This process can be implemented within a company in many ways. Some organizations bring in a quality assurance department and test products before they are delivered to the shelves. When quality assurance is used, a set of requirements is determined and the quality assurance team verifies that the product not only meets all of the requirements but also passes fault testing. Companies with a customer service department often implement quality controls by recording phone conversations, sending out customer surveys, and requiring employees to follow a specific set of guidelines when speaking to customers over the phone. Implementing a quality control department or strategy allows a company to find faults or problems with products or services before they reach the customer.

It is common for a company to send out products that have defects or problems, or to provide poor service to customers. A good strategy and sound techniques can help eliminate the issues that give the company a bad name, because quality control monitors overall quality by comparing the product or service with the requirements. Making sure the products or services meet or exceed the requirements set forth allows a business to be more successful and to improve the organization.

Quality control concerns not only products and services but also how well an organization works as a whole, both internally and in the marketplace. A strategy to manage and improve quality within an organization can help a company become and remain a success. Quality is an ongoing effort that must be consistent and improve every day. Every organization or business can benefit from using quality control for its products or services, within the internal organization, and in its interactions in the marketplace.

To be competitive on both a national and a global basis, organizations must adopt a forward-thinking approach in developing their management strategies. Here we review ISO 9000 and ISO 14000 and suggest how these standards may be used to move an organization toward that paradigm and thus enable it to compete more effectively in today's global marketplace.

Many of our current quality management and environmental management systems are reactive—that is, they have been developed in response to federal, state, or local regulations. We need to ask ourselves, is this a competitive way to work? When we are in this reactive mode, are we really listening to our customers? Are we able to seek out innovative means of getting the job done?

International standards force companies to look at their processes in a new light and to take a more active approach to management. For example, if a company wishes to pursue the new environmental standard, ISO 14000, its environmental management system's pollution control policy will have to be revamped to focus on prevention rather than command-and-control. As the company moves in that direction it will truly become more competitive, and will do so on a global basis.

Q.2 b. Give a note on Robotics and flexible manufacturing.

Ans: The most powerful long-term technological trend impinging on the factory of the future is the move toward computer-integrated manufacturing. Behind this trend lies the unique capability of the computer to automate, optimize, and integrate the operations of the total system of manufacturing. The vitality of this trend is attested to by technological forecasts made over the past 10 years. The rapidity of the development is due not only to this technological vitality, but also to powerful long-term economic and social forces impinging on manufacturing. As a result many industrialized nations are pursuing large national programs of research, development, and implementation of computer-integrated manufacturing to hasten the technological evolution of computer-integrated automatic factories. Programs receiving major emphasis include development and application of integrated manufacturing software systems, group technology and cellular manufacturing, computer control of manufacturing processes and equipment, computer-controlled robots, flexible manufacturing systems, and prototype computer-automated factories. This evolution poses some significant challenges to American industry.

The Role of Robotics in Flexible Manufacturing

When most engineers think about “flexibility,” they imagine robots. Because of programmable controls, end-of-arm tooling and machine vision systems, these devices can perform a wide variety of repeatable tasks. “Robotics is a key component of flexible manufacturing,” claims Ted Wodoslawsky, vice president of marketing at ABB Robotics Inc. (Auburn Hills, MI). “Any applications that involve high-mix, high-volume assembly require flexible automation. Manufacturers need the ability to run different products on the same line. That’s much more difficult to do with hard automation.” The automotive industry is still considered the role model for robotic flexibility. However, Wodoslawsky says many of the lessons learned by automakers and suppliers can easily be applied to other industries and processes.

“Automotive manufacturers are faced with producing a greater mix of vehicles in a shrinking number of plants,” adds Walter Saxe, automotive business development manager at Applied Robotics Inc. (Glenville, NY). “This practice is driving the need for higher payloads, faster tool changeover and greater control of data to achieve maximum flexibility and exacting production details. This in turn is challenging the makers of robots and tools to stay ahead of the ever-increasing market needs by advancing technologies before they are needed.” For instance, state-of-the-art robots feature force control, which offers an extra degree of flexibility for critical applications such as powertrain assembly. Other new tools and features that make robots more suitable for flexible production applications include open architecture that allows easy integration with commonly used PLC platforms and offline simulation from desktop computers. “[Manufacturing engineers should ensure their] controls platform has the ability to manage, manipulate and store all the data that is required with flexible implementation schemes,” says David Huffstetler, market manager at Staubli Robotics (Duncan, SC). “It can become a critical issue in places where you least expect it to happen.”

It is true that flexible manufacturing cuts the number of employees needed for production. Quicker equipment changeover between production jobs has a direct bearing on improved capital utilization. It also reduces costs per production job due to the decrease in man-hours needed for set-up of equipment. Automated control of the manufacturing process yields consistent and higher-quality output. Fewer man-hours are needed for overall production, which reduces the cost of products. There are significant savings from reduced indirect labor costs, errors in production, repairs, and product rejects.

Q.3 (a). What is transfer pricing?

Ans: Transfer pricing refers to the setting, analysis, documentation, and adjustment of charges made between related parties for goods, services, or the use of property (including intangible property). Transfer prices among components of an enterprise may be used to reflect allocation of resources among such components, or for other purposes. The OECD Transfer Pricing Guidelines state, “Transfer prices are significant for both taxpayers and tax administrations because they determine in large part the income and expenses, and therefore taxable profits, of associated enterprises in different tax jurisdictions.”

Many governments have adopted transfer pricing rules that apply in determining or adjusting income taxes of domestic and multinational taxpayers. The OECD has adopted guidelines followed, in whole or in part, by many of its member countries in adopting rules. United States and Canadian rules are similar in many respects to OECD guidelines, with certain points of material difference. A few countries follow rules that are materially different overall.

The rules of nearly all countries permit related parties to set prices in any manner, but permit the tax authorities to adjust those prices where the prices charged are outside an arm's length range. Rules are generally provided for determining what constitutes such arm's length prices, and how any analysis should proceed. Prices actually charged are compared to prices or measures of profitability for unrelated transactions and parties. The rules generally require that market level, functions, risks, and terms of sale of unrelated party transactions or activities be reasonably comparable to such items with respect to the related party transactions or profitability being tested.

Most systems allow use of multiple methods, where appropriate and supported by reliable data, to test related party prices. Among the commonly used methods are comparable uncontrolled prices, cost plus, resale price or markup, and profitability based methods. Many systems differentiate methods of testing goods from those for services or use of property due to inherent differences in business aspects of such broad types of transactions. Some systems provide mechanisms for sharing or allocation of costs of acquiring assets (including intangible assets) among related parties in a manner designed to reduce tax controversy.
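As a rough illustration of how two of these methods translate into numbers, the sketch below (in Python; the cost figures, margin benchmarks and product scenario are hypothetical, not drawn from any OECD or statutory example) computes the arm's-length range implied by a cost-plus markup and by a resale-price margin, and checks a charged transfer price against one of those ranges.

# Illustrative only: hypothetical figures, not an actual tax computation.

def cost_plus_price(production_cost, markup):
    # Cost-plus method: add an arm's-length markup to the producer's cost.
    return production_cost * (1 + markup)

def resale_price_method(resale_price, gross_margin):
    # Resale-price method: strip an arm's-length gross margin from the resale price.
    return resale_price * (1 - gross_margin)

production_cost = 70.0            # manufacturer's cost per unit (hypothetical)
resale_price_to_customers = 130.0 # distributor's selling price (hypothetical)
markup_range = (0.15, 0.25)       # markups seen in comparable uncontrolled transactions
margin_range = (0.30, 0.40)       # gross margins of comparable independent distributors

cp_low, cp_high = (cost_plus_price(production_cost, m) for m in markup_range)
rp_high, rp_low = (resale_price_method(resale_price_to_customers, g) for g in margin_range)

print("Cost-plus arm's-length range:    %.2f - %.2f" % (cp_low, cp_high))
print("Resale-price arm's-length range: %.2f - %.2f" % (rp_low, rp_high))

actual_transfer_price = 95.0      # price actually charged between the related parties
print("Within cost-plus range?", cp_low <= actual_transfer_price <= cp_high)

A price falling outside such a range is the kind of result that, under the rules described above, invites the tax authority to adjust it.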

Most tax treaties and many tax systems provide mechanisms for resolving disputes among taxpayers and governments in a manner designed to reduce the potential for double taxation. Many systems also permit advance agreement between taxpayers and one or more governments regarding mechanisms for setting related party prices.

Many systems impose penalties where the tax authority has adjusted related party prices. Some tax systems provide that taxpayers may avoid such penalties by preparing documentation in advance regarding prices charged between the taxpayer and related parties. Some systems require that such documentation be prepared in advance in all cases.

Q.3(b). Write a short note on Bills of Exchange and Letters of credit.

Ans: A bill of exchange or "draft" is a written order by the drawer to the drawee to pay money to the payee. A common type of bill of exchange is the cheque (check in American English), defined as a bill of exchange drawn on a banker and payable on demand. Bills of exchange are used primarily in international trade, and are written orders by one person to his bank to pay the bearer a specific sum on a specific date. Prior to the advent of paper currency, bills of exchange were a common means of exchange. They are not used as often today.

A bill of exchange is an unconditional order in writing addressed by one person to another, signed by the person giving it, requiring the person to whom it is addressed to pay on demand or at fixed or determinable future time a sum certain in money to order or to bearer. (Sec.126)

It is essentially an order made by one person to another to pay money to a third person. A bill of exchange requires in its inception three parties—the drawer, the drawee, and the payee.

In brief, a "bill of exchange" or a "Hundi" is a kind of legal negotiable instrument used to settle a payment at a future date. It is drawn by a drawer on a drawee, who accepts the liability to pay at the date stated in the instrument. The drawer of the bill of exchange draws the bill on the drawee and sends it to him for his acceptance. Once accepted by the drawee, it becomes a legitimate negotiable instrument in the financial market and a debt against the drawee. The drawer may, on acceptance, have the bill of exchange discounted with his bank for immediate payment to fund his working capital. On the due date, the bill is again presented to the drawee for the payment he accepted, as stated in the bill.

A Letter of Credit (LC) is a declaration of financial soundness and commitment by a bank, on behalf of its client, for the amount stated in the LC document, to the other party (beneficiary) named therein. LCs may or may not be endorsable. In case of default of payment by the party under obligation to pay, the LC-issuing bank undertakes to honour the payment, with or without conditions. Normally, LCs are either sight LCs or DA (usance) LCs, each containing a set of conditions. Bills of exchange may be drawn under the overall limit of the LC amount for payment later on.

Q.4. Discuss the modern theory of international trade along with its criticisms.

Ans: Modern Trade Theories:

The orthodox neo-classical trade theories are the basis of the trade regime advocated by the WTO and GATT. However, comparative advantage and specialization do not explain many real-world trade patterns. Moreover, the assumptions of these theories are very simplistic compared with real-world competitive conditions and other economic factors, such as returns to scale, the impact of demand across income levels, and the sizes of the firms involved in trade. These limitations of orthodox neo-classical trade theory gave birth to modern trade theories and trade policies, as well as to government assistance to industry in international trade.

The Assumptions of Modern Trade Theories

The modern trade theories relax the assumptions of the orthodox trade theories. The assumptions of modern trade theory are:

Non-identical preferences among consumers

Economies of scale rather than constant returns to scale

Imperfect competition and other market structures

The existence of externalities, as opposed to the absence of externalities assumed by orthodox trade theory

The State of Modern Trade theories

At present, modern trade theories do not form a single consistent framework. However, elements such as the Linder hypothesis about preferences, models with economies of scale, and strategic trade policy are important building blocks toward a consistent modern trade theory, one that could assist firms and governments in devising policies and practices in the area of international trade.

A form of globalization and global trading where all nations prosper and develop fairly and equitably is probably what most people would like to see.

It is common to hear of today’s world economic system as being “free trade” or “globalization”. Some describe the historical events leading up to today’s global free trade and the existing system as “inevitable”. The UK’s former Prime Minister, Margaret Thatcher, was famous for her TINA (“There Is No Alternative”) acronym. Yet, as discussed in the Neoliberalism Primer page earlier, the modern world system has hardly been inevitable. Instead, various factors such as political decisions, military might, wars, imperial processes and social changes throughout the last few decades and centuries have pulled the world system in various directions. Today’s world economic system is a result of such processes. Power is always a factor.

Capitalism has been successful in nurturing technological innovation, in promoting initiative, and in creating wealth (and increasing poverty). Many economists agree that, in general, capitalism can be a powerful engine for development. But political interests and specific forms of capitalism can have different results. The monopoly capitalism of the colonial era, for example, was very destructive. Likewise, there is growing criticism of the current model of corporate-led neoliberalism and the version of globalization and capitalism that has resulted. This criticism comes from many quarters, including NGOs, developing-nation governments and ordinary citizens.

In March 2003, the IMF itself admitted in a paper that globalization may actually increase the risk of financial crisis in the developing world. “Globalization has heightened these risks since cross-country financial linkages amplify the effects of various shocks and transmit them more quickly across national borders” the IMF notes and adds that, “The evidence presented in this paper suggests that financial integration should be approached cautiously, with good institutions and macroeconomic frameworks viewed as important.” In addition, they admit that it is hard to provide a clear road-map on how this should be achieved, and instead it should be done on a case by case basis. This would sound like a move slightly away from a “one size fits all” style of prescription that the IMF has been long criticized for.

In critical respects I would argue that the problem with economic globalization is that it has not gone far enough. Major barriers to trade remain in key sectors of export interest to developing countries such as agriculture and textiles and clothing, and trade remedy actions (antidumping, countervail, and safeguards) have proliferated (often directed at developing countries), in many cases replacing prior tariffs. Indeed, tariffs facing developing country exports to high-income countries are, on average, four times those facing industrial country exports for manufactured goods and much higher again for agricultural products. Agricultural subsidies in developed countries further restrict effective market access by developing countries.73 Economic estimates have found that the costs of protection inflicted on developing countries by developed countries negate most or all of the entire value of foreign aid in recent years.

Q. 5 (a). Make a note of the functions and achievements of UNCTAD.

Ans: UNCTAD was created in 1964 as an expression of the belief that a cooperative effort of the international community was required to bring about changes in the world economic order that would allow developing countries to participate more fully in a prospering world economy. UNCTAD was the product of efforts aimed at countering self-perpetuating asymmetries and inequities in the world economy, strengthening multilateral institutions and disciplines, and promoting sustained and balanced growth and development. The creation of UNCTAD marked the commitment of Member States "to lay the foundations of a better world economic order" through the recognition that "international trade is an important instrument for economic development".

Despite profound economic and political transformations in the world in the last thirty years, the essence of UNCTAD's development mission has not changed. Its thrust continues to be to enlarge opportunities in particular for developing countries to create their own wealth and income and to assist them to take full advantage of new opportunities.

Functions:

The themes addressed by UNCTAD over the years have included:

- expanding and diversifying the exports of goods and services of developing countries, which are their main sources of external finance for development;

- encouraging developed countries to adopt supportive policies, particularly by opening their markets and adjusting their productive structures;

- strengthening international commodity markets on which most developing countries depend for export earnings and enhancing such earnings through their increased participation in the processing, marketing and distribution of commodities, and the reduction of that dependence through the diversification of their economies;

- expanding the export capacity of developing countries by mobilizing domestic and external resources, including development assistance and foreign investment;

- strengthening technical capabilities and promoting appropriate national policies;

- alleviating the impact of debt on the economies of developing countries and reducing their debt burden;

- supporting the expansion of trade and economic cooperation among developing countries as a mutually beneficial complement to their traditional economic linkages with developed countries; and

- special measures in support of the world's poorest and most vulnerable countries.

UNCTAD's early years coincided with economic growth particularly in developed countries, worsening terms of trade for developing countries' exports, especially for commodities, and an increasing income gap between developed and developing countries. The situation became even more difficult through the 1980s which came to be known as "the lost decade for development". One consequence was that the multilateral economic negotiations between developed and developing countries became deadlocked in most forums. As a result, a perceptible loss of confidence occurred in UNCTAD's role as a facilitator of consensus and conciliator of divergent views. Multilateralism as a method of dealing with international trade and development problems was eroded and several countries opted for bilateral approaches.

But the profound changes that took place in the world in the late 1980s forced a reassessment of international economic cooperation. A fresh consensus emerged in the early 1990s on the need for new actions to support the international trade and economic development of developing countries. UNCTAD, and in particular UNCTAD VIII, added impetus to the forging of the development consensus for the 1990s and of a new partnership for development as envisaged in the Declaration on International Economic Cooperation, in particular the revitalization of Economic Growth and Development of the Developing Countries, adopted by the General Assembly at its eighteenth special session held in April-May 1990.

Major achievements

The functions of UNCTAD comprise four building blocks:

(i) policy analysis;

(ii) intergovernmental deliberation, consensus-building and negotiations;

(iii) monitoring, implementation and follow-up; and

(iv) technical cooperation.

UNCTAD VIII added a new dimension, namely the exchange of experiences among Member States so as to enable them to draw appropriate lessons for the formulation and implementation of policies at the national and international levels. These functions are interrelated and call for constant cross-fertilization between the relevant activities. Thus, UNCTAD is at once a negotiating instrument, a deliberative forum, a generator of new ideas and concepts, and a provider of technical assistance. As a result of this multifaceted mandate, UNCTAD was entrusted with a wide spectrum of activities cutting across several dimensions of development.

Its achievements have therefore been of different kinds and of varying impact. Among the most significant achievements reported to the Inspector by the UNCTAD secretariat could be included:

- the agreement on the Generalized System of Preferences (GSP) (1971), under which over $70 billion worth of developing countries' exports receive preferential treatment in most developed country markets every year;

- the setting up of the Global System of Trade Preferences among Developing Countries (1989);

- the adoption of the Set of Multilaterally Agreed Principles for the Control of Restrictive Business Practices (1980);

- negotiations of International Commodity Agreements, including those for cocoa, sugar, natural rubber, jute and jute products, tropical timber, tin, olive oil and wheat;

- the establishment of transparent market mechanisms in the form of intergovernmental commodity expert and study groups, involving consumers and producers, including those for iron ore, tungsten, copper and nickel;

- the negotiation of the Common Fund for Commodities (1989), set up to provide financial backing for the operation of international stocks and for research and development projects in the field of commodities, and which did not fulfil many expectations of the developing countries;

- the adoption of the resolution on the retroactive adjustment of terms of Official Development Assistance (ODA) debt of low-income developing countries under which more than fifty of the poorer developing countries have benefited from debt relief of over $6.5 billion;

- the establishment of guidelines for international action in the area of debt rescheduling (1980);

- the Agreement on a Special New Programme of Action for the Least Developed Countries (1981);

- the Programme of Action for the Least Developed Countries for the 1990s (1990);

- the negotiation of conventions in the area of maritime transport: United Nations Convention on a Code of Conduct for Liner Conferences (1974), United Nations Convention on International Carriage of Goods by Sea (1978), United Nations Convention on International Multimodal Transport of Goods (1980), United Nations Convention on Conditions for Registration of Ships (1986), United Nations Convention on Maritime Liens and Mortgages (1993).

In addition, UNCTAD made some contributions on matters for implementation in other fora, such as:

- the agreement on ODA targets, including the 0.7 per cent of GDP target for developing countries in general and the 0.20 per cent target for LDCs;

- the improvement of the IMF's Compensatory Financing Facility for export earnings shortfalls of developing countries;

- the creation of the Special Drawing Rights (SDRs) by the IMF;

- the reduction of commercial bank debt for the highly indebted countries promoted by the World Bank;

- the principle of the "enabling clause" for preferential treatment of developing countries, which was later reflected in GATT legal instruments, e.g., Part IV of GATT on trade and development.

UNCTAD has also made a valuable contribution at the practical level, especially in the formulation of national policies, instruments, rules and regulations, as well as in the development of national institutions, infrastructure and human resources, in practically all its fields of activity. These achievements, usually involving an important technical cooperation component, have proved their value and have been much appreciated by the Governments concerned. Special mention should be made of UNCTAD's computerized systems in the area of customs (ASYCUDA) and debt management (DMFAS) which are considered among the best products on the market.

Furthermore, UNCTAD supported the Uruguay Round negotiations by assisting developing countries in understanding the implications for their economies of discussions on various issues or sectors and in defining their positions for the negotiations. For this purpose, UNCTAD prepared special studies on specific issues and provided relevant trade information and advice at regional and national level within its technical assistance programme. Through its three annual flagship publications, namely the Trade and Development Report, the World Investment Report and the Least Developed Countries Report, the UNCTAD secretariat has made a significant contribution to international understanding of major economic and development issues.

Q. 5 (b). Give reasons for the slow growth towards achieving international accounting standards.

Ans: The rapid growth of international trade and the internationalization of firms, the development of new communication technologies, and the emergence of international competitive forces are reshaping the financial environment to a great extent. Under this global business scenario, the business community badly needs a common accounting language spoken by all of its members across the globe. A financial reporting system of global standard is a pre-requisite for attracting foreign as well as domestic investors, present and prospective alike, and this should be achieved through the harmonization of accounting standards.

Accounting Standards are policy documents (authoritative statements of best accounting practice) issued by recognized expert accountancy bodies relating to various aspects of the measurement, treatment and disclosure of accounting transactions and events. They relate to the codification of Generally Accepted Accounting Principles (GAAP), and serve as norms of accounting policies and practices, by way of codes or guidelines, directing how the items that make up the financial statements should be dealt with in the accounts and presented in the annual accounts. The aim of setting standards is to bring about uniformity in financial reporting and to ensure consistency and comparability in the data published by enterprises.

Accounting standards prevalent all across the world:

* Accounting standards are being established both at national and international levels. But the variety of accounting standards and principles among the nations of the world has been a persistent problem for globalizing the business environment.

* There are several standard setting bodies and organizations that are now actively involved in the process of harmonization of accounting practices. The most remarkable phenomenon in the sphere of promoting global harmonization process in accounting is the emergence of international accounting standards.

* In India the Accounting Standards Board (ASB) was constituted by the Institute of Chartered Accountants of India (ICAI) on 21st April 1977 with the function of formulating accounting standards.

* Accounting standards vary from one country to another. There are various factors that are responsible for this. Some of the important factors are

- legal structure

- sources of corporate finance

- maturity of accounting profession

- degree of conformity of financial accounts

- government participation in accounting and

- degree of exposure to international markets.

* Diversity in accounting standards not only means additional costs of financial reporting but can also cause difficulties for multinational groups in the way they undertake transactions. It is quite possible for a transaction to give rise to a profit under the accounting standards of one country, whereas it may require a deferral under the standards of another.

Issues in adopting global accounting standards:

Why does there seem to be a reluctance to adopt the International Accounting Standards Committee (IASC) norms in the US?

This is definitely a problem. The US is the largest market and it is important for IASC standards to be harmonized with those prevailing there. The US lobby is strong, and they have formed the G4 nations, with the UK, Canada, and Australia (with New Zealand) as the other members. IASC merely enjoys observer status in the meetings of the G4, and cannot vote. Even when the standards are only slightly different, the US accounting body treats them as a big difference, the idea being to show that their standards are the best. We have to work towards bringing about greater acceptance of the IASC standards.

How real is the threat from G4?

G4 has evolved as a standard-setting body and has recently issued its first standard, on the pooling-of-interests method. (Mergers can either be in the nature of a purchase or in the form of a pooling of interests, like HLL-BBLIL.) It is also expected to publish new or revised papers on reporting financial performance, business combinations, joint ventures, leases, and contributions. So far, the FASB (the US standard-setting body) has been the world's standard setter because of mandatory compliance with US GAAP for listing on the New York Stock Exchange (NYSE). The US Congress, however, had to step in and overrule the FASB standard on stock options.

The current status of IAS (Indian Accounting Standards):

In India, Statements on Accounting Standards are issued by the Institute of Chartered Accountants of India (ICAI) to establish standards that have to be complied with to ensure that financial statements are prepared in accordance with generally accepted accounting standards in India (India GAAP). From 1973 to 2000 the IASC issued 32 accounting standards, which interested countries around the world may adopt with confidence. It is observed, however, that many countries have not adopted these standards in the presentation of accounting information. An analysis of the time gap for the Indianisation of International Accounting Standards shows that the average gap is 6.13 years; in other words, it takes India about 6.13 years to adopt one international accounting standard. This points to weak research and development work in the accounting field.

A significant criticism of IAS:

* The standards are too broad-based and general to ensure that similar accounting methods are applied in similar circumstances. For instance, in accounting for expenses incurred under a Voluntary Retirement Scheme (VRS), the methods used range from pay-as-you-go to amortization of the present value of future pension payments over the period of benefit.

* It may be noted that in several important areas, when the Indian Standards are implemented, the accounting treatment in these areas could lead to differences in the restatement of accounts in accordance with US GAAP. Some of these areas are:

- Consolidated financial statements

- Accounting for taxes on income

- Financial Instruments

- Intangible Assets

Restatement to US GAAP:

A restatement of financial statements prepared under India GAAP to U.S. GAAP requires careful planning in the following areas:

- Involvement of personnel within the accounts function and the time frame within which the task is to be completed.

- Identification of significant accounting policies that would need to be disclosed under U.S. GAAP and the differences that exist between India GAAP and U.S. GAAP

- The extent of training required within the organisation to create an awareness of the requirements under U.S. GAAP

- Subsidiaries and associate companies and restatement of their accounts in conformity with U.S. GAAP

- Adjustment entries that are required for conversion of India GAAP accounts.

- Reconciliation of differences arising on restatement to U.S. GAAP in respect of income for the periods under review and for the statement of Shareholder's equity.

The timetable for restatement of the financial statements to US GAAP would depend upon the size of the company, the nature of its operations, and the number of subsidiaries and associates. The process of conversion would normally take up to 16 weeks in a large company in the initial year. It is thus necessary to streamline the accounting systems to provide for restatement to U.S. GAAP on a continuing basis. At first sight the restatement of financial statements in accordance with U.S. GAAP appears formidable. However, as the Indian accounting standards are built on the foundation of international accounting standards, on which a truly global GAAP might be built, there is no cause for concern.

Another reason for the prevailing divergent accounting practices is that the Accounting Standards, the provisions of the Income Tax Act 1961 and the Indian Companies Act 1956 do not go together.

(a) Company law and Accounting Standards:

In India, though accounting standard setting is presently being done by the ICAI, one can discern a tentative and half-hearted foray by company legislation into the making of accounting rules of measurement and reporting. This action by itself is not the sore point, but the failure to keep pace with changes while simultaneously not allowing scope for someone else to do so is disturbing.

A study of the requirements of company law regarding financial statements reveals several lacunae, such as earnings per share, information about future cash flows, consolidation, mergers, acquisitions, etc.

(b) Income Tax Act and Accounting Standards:

The Income Tax Act does not recognize the accounting standards for most of the items while computing income under the head "Profits & Gains of Business or Profession". Section 145(2) of the I.T. Act has empowered the Central Government to prescribe accounting standards. The standards prescribed so far constitute a rehash of the related accounting standards prescribed by the ICAI for corporate accounting. On a close scrutiny of these standards, one is left wondering about the purpose and value of this effort. Examples are the application of prudence, substance over form, and adherence to the going-concern principle.

(c) Other regulations and accounting standards:

In respect of banks, financial institutions, and finance companies, the Reserve Bank of India (RBI) pronounces policies on, among other things, revenue recognition, provisioning and asset classification.

Similarly, the Foreign Exchange Dealers Association (FEDAI) provides guidelines regarding accounting for foreign exchange transactions. Since the Securities & Exchange Board of India (SEBI) is an important regulatory body, it would also like to have its own accounting standards, and in fact it has started the process by notifying a cash flow reporting format. It is also in the process of issuing a standard on accounting policies for mutual funds. It appears as if several authorities in our country are keen to have a say in the matter of framing accounting rules of measurement and reporting. This tentative and half-hearted legal and regulatory intervention in accounting has come in the way of the development of robust, continuously evolving and dynamic accounting theory and standards.

India is slowly entering the arena of accounting standards. But the progress of formulation of accounting standards has been very slow compared with the developments at international levels. Differences are still there but they are narrowing. It is expected that the pace of progress in the sphere of harmonization will accelerate further in the coming years.

Q. 6 (a). Give a note on the Japanese approach to HRM.

Ans: When it comes to Human Resource Management, much ado has been made in recent years over the comparison between East and West. That comparison usually boils down to Japanese HRM practices that are slow to change in the face of increasing global competitiveness versus their more adaptable U.S. counterparts. Optimists even claim that Japan is finally showing signs of "catching up." However, experts say this comparison is too often oversimplified.

Take, for example, a study led by Markus Pudelko of the University of Edinburgh Management School last year. It confirms that the "seniority principle" for which Japan's traditional HRM model is famed is waning more than any other principle and will likely continue to do so. It echoes a 2002 study conducted at the University of Melbourne, which notes that while surveyed firms were undergoing changes in HRM, only one in four could be considered transformative, according to the Australian Human Resource Institute.

While the Melbourne study raises questions about what specific changes lie ahead for Japanese firms adopting more performance-based promotion and compensation systems, Pudelko's study suggested that whatever changes are made in this area should be suited to a Japanese context. Others agree, not only regarding promotion and compensation, but also for broader aspects of Japanese HRM. Needless to say, they also raise questions about which specific HR practices in Japan need changing.

While it's generally agreed that more emphasis on performance than seniority and a more equitable assessment of employees could go a long way to improve morale and competitiveness, little has been offered in the realm of revamping Japanese HRM.

"The perception that Japanese companies have to become more like U.S. companies to survive in this global environment isn't born out by fact," Sanford M Jacoby, author and professor at UCLA Anderson School of Management told Veritude newsletter in 2005. A survey of Japanese and U.S. firms he coauthored found when it comes to HR, while some Japanese companies are more likely to hire mid-career staff, have boards made up primarily of outsiders and use financial incentives like their U.S. counterparts, they are still in the minority.

In fact, Jacoby's study shows that some U.S. companies are looking a bit more like Japanese firms, though they also are a minority. Some, especially those more insulated from financial markets, are viewing HR as an essential "resource-based" asset for business strategy and pay close attention to human assets or intellectual capital. They invest heavily in training and retention and their HR executives play a major role in grooming executives for senior posts - much as firms traditionally have done in Japan. The implication, counter to conventional wisdom abroad, is that's a good thing.

It is wise for HR executives to position themselves much in the same way that CFOs have in the past two decades to further this, according to Jacoby. And in that respect Japanese HRM, which continues to play a central role ranging from performance assessment and overall training to strategizing and senior promotions (even before Enron and the Sarbanes-Oxley act helped cast doubt on outside executive hires as a panacea) may have as much to teach as learn.

What seems to be missing from the typical Japan-U.S. comparison is recognition that HRM and other business models naturally change over time and each has its pluses and minuses, Jacoby says. Just as the shareholder-value model gave rise to the excesses that led to Enron and WorldCom, he notes, Japanese firms such as Canon and Toyota have done well by the traditional Japanese model.

As Japanese companies turn more toward other Asian nations for trade—especially burgeoning China—pressure to adopt western HR models may decrease, affecting the trend to succumb to such pressure. And as Japan's government continues to spearhead regional HR development through Official Development Assistance (ODA) and the Employment and Human Resources Development Organization of Japan (EHDO), it may in the long run do as much exporting of HR-management models as importing. It's a possibility that extends well beyond Japanese governmental aid.

In the private sector, just last year Japanese HR giant Recruit Co., Ltd. tied up with its Chinese counterpart 51jobs.com to collaborate on the development of 51job's products and services in China. The deal allows Recruit to buy up to a 40 percent stake in the firm and it will be sharing its management experience as well as technical expertise, according to ChinaTechNews.com. It would seem prudent for those working in Japan's HR industry to learn, as well as look, before they leap - especially if they are eyeing future prospects elsewhere in the region.

Q. 6 (b). Explain briefly the Purchasing power parity theory.

Ans: Purchasing power parity (PPP) is a theory of long-term equilibrium exchange rates based on the relative price levels of two countries. The idea originated with the School of Salamanca in the 16th century and was developed in its modern form by Gustav Cassel in 1918.[2] The concept is founded on the law of one price: the idea that, in the absence of transaction costs, identical goods will have the same price in different markets.

In its "absolute" version, the purchasing power of different currencies is equalized for a given basket of goods. In the "relative" version, the difference in the rate of change in prices at home and abroad—the difference in the inflation rates—is equal to the percentage depreciation or appreciation of the exchange rate.

The best-known and most-used purchasing power parity exchange rate is the Geary-Khamis dollar (the "international dollar").

PPP exchange rate (the "real exchange rate") fluctuations are mostly due to different rates of inflation between the two economies. Aside from this volatility, consistent deviations between market and PPP exchange rates are observed; for example, prices of non-traded goods and services are usually lower where incomes are lower (a U.S. dollar exchanged and spent in India will buy more haircuts than a dollar spent in the United States). Basically, PPP deduces exchange rates between currencies by finding goods available for purchase in both currencies and comparing the total cost of those goods in each currency.

There can be marked differences between PPP and market exchange rates. For example, the World Bank's World Development Indicators 2005 estimated that in 2003, one Geary-Khamis dollar was equivalent to about 1.8 Chinese yuan by purchasing power parity[5]—considerably different from the nominal exchange rate. This discrepancy has large implications; for instance, GDP per capita in the People's Republic of China is about US$1,800 while on a PPP basis it is about US$7,204. This is frequently used to assert that China is the world's second-largest economy, but such a calculation would only be valid under the PPP theory. At the other extreme, Denmark's nominal GDP per capita is around US$62,100, but its PPP figure is only US$37,304.

Types of PPP

There are two types of PPP. They are:

Page 64

Page 65: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

Absolute purchasing power parity is based on the maintenance of equal prices for the same basket of goods in the two countries concerned. Relative PPP relates exchange-rate movements to inflation: the rate of appreciation or depreciation of a currency is determined by the difference between the inflation rates of the two countries.

Calculation of PPP

Purchasing power parity is calculated by comparing the price of an identical good in both countries. The Economist magazine's "Big Mac Index" (a hamburger index) presents such a comparison in a jovial manner every year. But the calculation is not free from problems, because consumers in every country consume different types of products. Another index is the iPod index: the iPod is considered a standard consumer product these days, so PPP can be calculated by comparing its price across countries.
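As a rough illustration of this calculation, the sketch below derives an implied PPP exchange rate from the price of a single identical good in two currencies and compares it with the market rate. All prices and the market rate are hypothetical placeholder figures, not data from the text.

```python
# Minimal sketch: implied PPP rate from the price of one identical good.
# All numbers below are hypothetical, for illustration only.

def implied_ppp_rate(price_home: float, price_foreign: float) -> float:
    """PPP rate = foreign price / home price (foreign currency per home unit)."""
    return price_foreign / price_home

def valuation_vs_market(ppp_rate: float, market_rate: float) -> float:
    """Negative => the foreign currency is undervalued at the market rate."""
    return (ppp_rate / market_rate) - 1.0

burger_usd = 4.00           # hypothetical U.S. price in dollars
burger_inr = 160.00         # hypothetical Indian price in rupees
market_inr_per_usd = 74.0   # hypothetical market exchange rate

ppp = implied_ppp_rate(burger_usd, burger_inr)       # 40 INR per USD
gap = valuation_vs_market(ppp, market_inr_per_usd)   # about -0.46

print(f"Implied PPP rate: {ppp:.1f} INR/USD")
print(f"Rupee under/overvaluation vs. market rate: {gap:+.0%}")
```

With these placeholder figures, the implied PPP rate of 40 rupees per dollar is well below the assumed market rate, so the rupee would appear undervalued at market exchange rates, which mirrors the haircut example above.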

The PPP measure is also unable to display the full picture of the standard of living, and there are certain difficulties because the PPP figure varies with the specific basket of goods used. Even so, PPP figures are very often utilized for cross-country comparisons of income and output.

Assignment (Set-1) Subject code: MI0033

Software Engineering

Q. 1 : Discuss the Objectives & Principles Behind Software Testing.


Ans. Software Testing Fundamentals:

Testing presents an interesting anomaly for the software engineer. During earlier software engineering activities, the engineer attempts to build software from an abstract concept to a tangible product. Now comes testing. The engineer creates a series of test cases that are intended to "demolish" the software that has been built. Testing is the one step in the software process that could be viewed (psychologically, at least) as destructive rather than constructive.

(i) Testing Objectives:

In an excellent book on software testing, Glen Myers states a number of rules that can serve well as testing objectives:

1. Testing is a process of executing a program with the intent of finding an error.

2. A good test case is one that has a high probability of finding an as-yet-undiscovered error.

3. A successful test is one that uncovers an as-yet-undiscovered error.

These objectives imply a dramatic change in viewpoint. They move counter to the commonly held view that a successful test is one in which no errors are found. Our objective is to design tests that systematically uncover different classes of errors, and to do so with a minimum amount of time and effort. If testing is conducted successfully (according to the objectives stated previously), it will uncover errors in the software. As a secondary benefit, testing demonstrates that software functions appear to be working according to specification, and that behavioral and performance requirements appear to have been met. In addition, data collected as testing is conducted provide a good indication of software reliability, and some indication of software quality as a whole. But testing cannot show the absence of errors and defects; it can show only that software errors and defects are present. It is important to keep this (rather gloomy) statement in mind as testing is being conducted.
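To make the notion of a "good test case" concrete, here is a minimal sketch in Python. The function and its boundary condition are hypothetical, not taken from the text; the point is that the second test deliberately probes the boundary where an undiscovered error is most likely to hide, rather than merely confirming that typical inputs work.

```python
import unittest

def bulk_discount(quantity: int) -> float:
    """Hypothetical example: 10% discount for orders of 100 units or more."""
    # Deliberate boundary bug: '>' should be '>=' per the stated requirement.
    return 0.10 if quantity > 100 else 0.0

class TestBulkDiscount(unittest.TestCase):
    def test_typical_order_gets_no_discount(self):
        # Low-yield test: almost any implementation passes it.
        self.assertEqual(bulk_discount(5), 0.0)

    def test_boundary_order_of_exactly_100_units(self):
        # High-yield test: exercises the boundary and exposes the off-by-one error.
        self.assertEqual(bulk_discount(100), 0.10)

if __name__ == "__main__":
    unittest.main()
```

Run as written, the boundary test fails and thereby "succeeds" in Myers's sense: it uncovers an as-yet-undiscovered error.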

(ii) Testing Principles:

Before applying methods to design effective test cases, a software engineer must understand the basic principles that guide software testing. Davis [DAV95] suggests a set of testing principles that have been adapted for use in this book:

- All tests should be traceable to customer requirements. As we have seen, the objective of software testing is to uncover errors. It follows that the most severe defects (from the customer's point of view) are those that cause the program to fail to meet its requirements.

- Tests should be planned long before testing begins. Test planning can begin as soon as the requirements model is complete. Detailed definition of test cases can begin as soon as the design model has been solidified. Therefore, all tests can be planned and designed before any code has been generated.

- The Pareto principle applies to software testing. Stated simply, the Pareto principle implies that 80 percent of all errors uncovered during testing will most likely be traceable to 20 percent of all program components. The problem, of course, is to isolate these suspect components and to thoroughly test them.

- Testing should begin "in the small" and progress toward testing "in the large". The first tests planned and executed generally focus on individual components. As testing progresses, focus shifts in an attempt to find errors in integrated clusters of components and ultimately in the entire system.

- Exhaustive testing is not possible. The number of path permutations for even a moderately sized program is exceptionally large. For this reason, it is impossible to execute every combination of paths during testing. It is possible, however, to adequately cover program logic and to ensure that all conditions in the component-level design have been exercised.

- To be most effective, testing should be conducted by an independent third party. By most effective, we mean testing that has the highest probability of finding errors (the primary objective of testing). For reasons that have been introduced earlier in this unit, the software engineer who created the system is not the best person to conduct all tests for the software.

(iii) Testability:

In ideal circumstances, a software engineer designs a computer program, a system, or a product with "testability" in mind. This enables the individuals charged with testing to design effective test cases more easily. But what is testability? James Bach describes testability in the following manner.

Software testability is simply how easily [a computer program] can be tested. Since testing is so profoundly difficult, it pays to know what can be done to streamline it. Sometimes programmers are willing to do things that will help the testing process, and a checklist of possible design points, features, etc., can be useful in negotiating with them. There are certainly metrics that could be used to measure testability in most of its aspects. Sometimes, testability is used to mean how adequately a particular set of tests will cover the product. It's also used by the military to mean how easily a tool can be checked and repaired in the field. Those two meanings are not the same as software testability. The checklist that follows provides a set of characteristics that lead to testable software.

Q. 2. Discuss the CMM 5 Levels for Software Process.

Ans. The Software Process:

In recent years, there has been a significant emphasis on "process maturity". The Software Engineering Institute (SEI) has developed a comprehensive model predicated on a set of software engineering capabilities that should be present as organizations reach different levels of process maturity. To determine an organization's current state of process maturity, the SEI uses an assessment that results in a five-point grading scheme. The grading scheme determines compliance with a capability maturity model (CMM) [PAU93] that defines key activities required at different levels of process maturity. The SEI approach provides a measure of the global effectiveness of a company's software engineering practices, and establishes five process maturity levels that are defined in the following manner:


Level 1: Initial – The software process is characterized as ad hoc and occasionally even chaotic. Few processes are defined, and success depends on individual effort.

Level 2: Repeatable – Basic project management processes are established to track cost, schedule, and functionality. The necessary process discipline is in place to repeat earlier successes on projects with similar applications.

Level 3: Defined – The software process for both management and engineering activities is documented, standardized, and integrated into an organization-wide software process. All projects use a documented and approved version of the organization's process for developing and supporting software. This level includes all characteristics defined for level 2.

Level 4: Managed – Detailed measures of the software process and product quality are collected. Both the software process and products are quantitatively understood and controlled using detailed measures. This level includes all characteristics defined for level 3.

Level 5: Optimizing – Continuous process improvement is enabled by quantitative feedback from the process and from testing innovative ideas and technologies. This level includes all characteristics defined for level 4.

The five levels defined by the SEI were derived as a consequence of evaluating responses to the SEI assessment questionnaire that is based on the CMM. The results of the questionnaire are distilled to a single numerical grade that provides an indication of an organization's process maturity.

The SEI has associated key process areas (KPAs) with each of the maturity levels. The KPAs describe those software engineering functions (e.g., software project planning, requirements management) that must be present to satisfy good practice at a particular level. Each KPA is described by identifying the following characteristics (a small illustrative sketch follows the list):

- Goals – the overall objectives that the KPA must achieve.

- Commitments – requirements (imposed on the organization) that must be met to achieve the goals, or provide proof of intent to comply with the goals.

- Abilities – those things that must be in place (organizationally and technically) to enable the organization to meet the commitments.

- Activities – the specific tasks required to achieve the KPA function.

- Methods for monitoring implementation – the manner in which the activities are monitored as they are put into place.

- Methods for verifying implementation – the manner in which proper practice for the KPA can be verified.
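As a rough illustration only (not part of the SEI's own notation), a KPA and its six describing characteristics can be pictured as a simple record. The example values below for the level-2 "Software Project Planning" KPA are paraphrased placeholders, not official CMM wording.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KeyProcessArea:
    """Illustrative record mirroring the six characteristics used to describe a KPA."""
    name: str
    maturity_level: int
    goals: List[str] = field(default_factory=list)
    commitments: List[str] = field(default_factory=list)
    abilities: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    monitoring_methods: List[str] = field(default_factory=list)
    verification_methods: List[str] = field(default_factory=list)

# Hypothetical, paraphrased content for a well-known level-2 KPA.
project_planning = KeyProcessArea(
    name="Software Project Planning",
    maturity_level=2,
    goals=["Estimates are documented for use in planning and tracking"],
    commitments=["A project manager is designated as responsible for planning"],
    abilities=["A documented statement of work exists for the project"],
    activities=["Develop the project plan", "Derive size, effort, and cost estimates"],
    monitoring_methods=["Planning activities are reviewed by senior management"],
    verification_methods=["The SQA group audits planning activities and work products"],
)

print(f"{project_planning.name} (Level {project_planning.maturity_level} KPA)")
```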

Q.3. Discuss the Water Fall Model for Software Development.

Ans. The Linear Sequential Model:

Sometimes called the classic life cycle or the waterfall model, the linear sequential model suggests a systematic, sequential approach to software development that begins at the system level and progresses through analysis, design, coding, testing, and support. Modeled after a conventional engineering cycle, the linear sequential model encompasses the following activities:

System / information engineering and modeling – Because software is always part of a larger system (or business), work begins by establishing requirements for all system elements and then allocating some subset of these requirements to software. This system view is essential when software must interact with other elements such as hardware, people, and databases. System engineering and analysis encompass requirements gathering at the system level, with a small amount of top-level design and analysis. Information engineering encompasses requirements gathering at the strategic business level and at the business area level.

Software requirements analysis – The requirements gathering process is intensified and focused specifically on software. To understand the nature of the program(s) to be built, the software engineer ("analyst") must understand the information domain for the software, as well as required function, behavior, performance, and interface. Requirements for both the system and the software are documented and reviewed with the customer.

Design – Software design is actually a multistep process that focuses on four distinct attributes of a program: data structure, software architecture, interface representations, and procedural (algorithmic) detail. The design process translates requirements into a representation of the software that can be assessed for quality before coding begins. Like requirements, the design is documented and becomes part of the software configuration.

Code generation – The design must be translated into a machine-readable form. The code generation step performs this task. If design is performed in a detailed manner, code generation can be accomplished mechanistically.

Test – Once the code has been generated, program testing begins. The testing process focuses on the logical internals of the software, ensuring that all statements have been tested, and on the functional externals; that is, conducting tests to uncover errors and ensure that defined input will produce actual results that agree with the required results.

Support – Software will undoubtedly undergo change after it is delivered to the customer (a possible exception is embedded software). Change will occur because errors have been encountered, because the software must be adapted to accommodate changes in its external environment (e.g., a change required because of a new operating system or peripheral device), or because the customer requires functional or performance enhancements. Software support/maintenance reapplies each of the preceding phases to an existing program rather than a new one.

The linear sequential model is the oldest and the most widely used paradigm for software engineering. However, criticism of the paradigm has caused even active supporters to question its efficacy [HAN95]. Among the problems that are sometimes encountered when the linear sequential model is applied are:


1. Real projects rarely follow the sequential flow that the model proposes. Although the linear model can accommodate iteration, it does so indirectly. As a result, changes can cause confusion as the project team proceeds.

2. It is often difficult for the customer to state all requirements explicitly. The linear sequential model requires this and has difficulty accommodating the natural uncertainty that exists at the beginning of many projects.

3. The customer must have patience. A working version of the program(s) will not be available until late in the project time-span. A major blunder, if undetected until the working program is reviewed, can be disastrous.

In an interesting analysis of actual projects, Bradac [BRA94] found that the linear nature of the classic life cycle leads to "blocking states" in which some project team members must wait for other members of the team to complete dependent tasks. In fact, the time spent waiting can exceed the time spent on productive work! The blocking state tends to be more prevalent at the beginning and end of a linear sequential process.

Each of these problems is real. However, the classic life cycle paradigm has a definite and important place in software engineering work. It provides a template into which methods for analysis, design, coding, testing, and support can be placed. The classic life cycle remains a widely used procedural model for software engineering. While it does have weaknesses, it is significantly better than a haphazard approach to software development.

Q. 4 . Explain the different types of Software Measurement Techniques.

Ans. Software Measurement Techniques:

Measurements in the physical world can be categorized in two ways: direct measures (e.g., the length of a bolt) and indirect measures (e.g., the "quality" of bolts produced, measured by counting rejects). Software metrics can be categorized similarly. Direct measures of the software engineering process include cost and effort applied. Direct measures of the product include lines of code (LOC) produced, execution speed, memory size, and defects reported over some set period of time. Indirect measures of the product include functionality, quality, complexity, efficiency, reliability, maintainability, and many other "-abilities".

1. Size-Oriented Metrics:

Size-oriented software metrics are derived by normalizing quality and/or productivity measures by the size of the software that has been produced. If a software organization maintains simple records, a table of size-oriented measures can be created. The table lists each software development project that has been completed over the past few years and the corresponding measures for that project. For example, for project alpha, 12,100 lines of code were developed with 24 person-months of effort at a cost of $168,000. It should be noted that the effort and cost recorded in the table represent all software engineering activities (analysis, design, code, and test), not just coding. Further information for project alpha indicates that 365 pages of documentation were developed, 134 errors were recorded before the software was released, and 29 defects were encountered after release to the customer within the first year of operation. Three people worked on the development of software for project alpha.
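From the project alpha figures quoted above, size-oriented metrics are obtained by normalizing each measure to thousands of lines of code (KLOC). A minimal sketch of that arithmetic:

```python
# Size-oriented metrics for "project alpha", using the figures quoted in the text.
loc = 12_100          # lines of code
effort_pm = 24        # person-months
cost_usd = 168_000
doc_pages = 365
errors_before_release = 134
defects_after_release = 29

kloc = loc / 1000

metrics = {
    "errors per KLOC": errors_before_release / kloc,     # ~11.1
    "defects per KLOC": defects_after_release / kloc,    # ~2.4
    "cost per LOC ($)": cost_usd / loc,                  # ~13.9
    "documentation pages per KLOC": doc_pages / kloc,    # ~30.2
    "LOC per person-month": loc / effort_pm,             # ~504
}

for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```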

2. Function-Oriented Metrics:

Function-oriented software metrics use a measure of the functionality delivered by the application as a normalization value. Since 'functionality' cannot be measured directly, it must be derived indirectly using other direct measures. Function-oriented metrics were first proposed by Albrecht [ALB79], who suggested a measure called the function point. Function points are derived using an empirical relationship based on countable (direct) measures of software's information domain and assessments of software complexity.
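The commonly published form of Albrecht's relationship computes a weighted count of five information-domain values and then adjusts it with fourteen complexity factors: FP = count-total x (0.65 + 0.01 x sum of Fi). The sketch below uses the widely quoted "average" complexity weights; the counts and adjustment ratings are hypothetical, not from the text.

```python
# Function-point sketch using the standard empirical relationship
#   FP = count_total * (0.65 + 0.01 * sum(Fi))
# Weights below are the commonly quoted "average" complexity weights;
# the counts and the 14 adjustment ratings (0-5 each) are hypothetical.

AVERAGE_WEIGHTS = {
    "external inputs": 4,
    "external outputs": 5,
    "external inquiries": 4,
    "internal logical files": 10,
    "external interface files": 7,
}

counts = {                      # hypothetical measured values
    "external inputs": 24,
    "external outputs": 16,
    "external inquiries": 22,
    "internal logical files": 4,
    "external interface files": 2,
}

adjustment_ratings = [3] * 14   # hypothetical: every factor rated "average" (3)

count_total = sum(AVERAGE_WEIGHTS[k] * counts[k] for k in counts)
fp = count_total * (0.65 + 0.01 * sum(adjustment_ratings))

print(f"count total = {count_total}, FP = {fp:.0f}")
```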

3. Extended Function Point Metrics:

The function point measure was originally designed to be applied to business information systems applications. To accommodate these applications, the data dimension (the information domain values discussed previously) was emphasized to the exclusion of the functional and behavioral (control) dimensions. For this reason, the function point measure was inadequate for many engineering and embedded systems (which emphasize function and control). A number of extensions to the basic function point measure have been proposed to remedy this situation.

Q. 5. Explain the COCOMO Model & Software Estimation Technique.

Ans. Software Estimation Technique:

Software cost and effort estimation will never be an exact science. Too many variables – human, technical, environmental, political – can affect the ultimate cost of software and the effort applied to develop it. However, software project estimation can be transformed from a black art into a series of systematic steps that provide estimates with acceptable risk.

To achieve reliable cost and effort estimates, a number of options arise:

1. Delay estimation until late in the project (obviously, we can achieve 100% accurate estimates after the project is complete!).

2. Base estimates on similar projects that have already been completed.


3. Use relatively simple decomposition techniques to generate project cost and effort estimates.

4. Use one or more empirical models for software cost and effort estimation.

Unfortunately, the first option, however attractive, is not practical. Cost estimates must be provided "up front". However, we should recognize that the longer we wait, the more we know, and the more we know, the less likely we are to make serious errors in our estimates.

The second option can work reasonably well if the current project is quite similar to past efforts and other project influences (e.g., the customer, business conditions, the SEE, deadlines) are equivalent. Unfortunately, past experience has not always been a good indicator of future results.

The COCOMO Model:

In his classic book on software engineering economics, Barry Boehm [BOE81] introduced a hierarchy of software estimation models bearing the name COCOMO, for Constructive Cost Model. The original COCOMO model became one of the most widely used and discussed software cost estimation models in the industry; its basic form is sketched after the list below. It has evolved into a more comprehensive estimation model, called COCOMO II [BOE96, BOE00]. Like its predecessor, COCOMO II is actually a hierarchy of estimation models that address the following areas:

Application composition model – Used during the early stages of software engineering, when prototyping of user interfaces, consideration of software and system interaction, assessment of performance, and evaluation of technology maturity are paramount.

Early design stage model – Used once requirements have been stabilized and basic software architecture has been established.

Post-architecture-stage model – Used during the construction of the software.
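For reference, the original basic COCOMO relationships published by Boehm [BOE81] estimate effort and duration directly from estimated size. The sketch below uses the standard basic-COCOMO coefficients for the three project modes; the 33.2 KLOC size is a hypothetical input, and the model shown is the simple 1981 form, not COCOMO II.

```python
# Basic COCOMO (Boehm, 1981): effort E = a * KLOC**b person-months,
# development time D = c * E**d months. Coefficients per project mode.
BASIC_COCOMO = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kloc: float, mode: str = "organic"):
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kloc ** b          # person-months
    duration = c * effort ** d      # chronological months
    staff = effort / duration       # average headcount
    return effort, duration, staff

kloc = 33.2  # hypothetical estimated size
effort, duration, staff = basic_cocomo(kloc, "organic")
print(f"Effort: {effort:.0f} person-months, "
      f"Duration: {duration:.1f} months, Avg staff: {staff:.1f}")
```

With this hypothetical input the organic-mode equations give roughly 95 person-months over about 14 months, which illustrates why an empirical model is more defensible than an "estimate at the end" approach.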

Q. 6 : Write a note on myths of Software.

Ans. Most knowledgeable professionals recognize myths for what they are – misleading attitudes that have caused serious problems for managers and technical people alike. However, old attitudes and habits are difficult to modify, and remnants of software myths are still believed.

Primarily, there are three types of software myths, all three of which are stated below:

1. Management Myths – Managers with software responsibility, like managers in most disciplines, are often under pressure to maintain budgets, keep schedules from slipping, and improve quality. Like a drowning person who grasps at a straw, a software manager often grasps at belief in a software myth, if that belief will lessen the pressure (even temporarily).

Myth – We already have a book that's full of standards and procedures for building software; won't that provide my people with everything they need to know?

Reality – The book of standards may very well exist, but is it used? Are software practitioners aware of its existence? Does it reflect modern software engineering practice? Is it complete? Is it streamlined to improve time to delivery while still maintaining a focus on quality? In many cases, the answer to all of these questions is "no".

Myth – My people have state-of-the-art software development tools; after all, we buy them the newest computers.

Reality – It takes much more than the latest model mainframe, workstation, or PC to do high-quality software development. Computer-aided software engineering (CASE) tools are more important than hardware for achieving good quality and productivity, yet the majority of software developers still do not use them effectively.

Myth – If we get behind schedule, we can add more programmers and catch up (sometimes called the "Mongolian horde" concept).

Reality – Software development is not a mechanistic process like manufacturing. In the words of Brooks [BRO75]: "adding people to a late software project makes it later". At first, this statement may seem counterintuitive. However, as new people are added, people who were working must spend time educating the newcomers, thereby reducing the amount of time spent on productive development effort. People can be added, but only in a planned and well-coordinated manner.

Myth – If I decide to outsource the software project to a third party, I can just relax and let that firm build it.

Reality – If an organization does not understand how to manage and control software projects internally, it will invariably struggle when it outsources software projects.

2. Customer Myths – A customer who requests computer software may be a person at the next desk, a technical group down the hall, the marketing/sales department, or an outside company that has requested software under contract. In many cases, the customer believes myths about software because software managers and practitioners do little to correct misinformation. Myths lead to false expectations (by the customer) and, ultimately, dissatisfaction with the developer.

Myth – A general statement of objectives is sufficient to begin writing programs – we can fill in the details later.

Reality – A poor up-front definition is the major cause of failed software efforts. A formal and detailed description of the information domain, function, behavior, performance, interfaces, design constraints, and validation criteria is essential. These characteristics can be determined only after thorough communication between customer and developer.

Myth – Project requirements continually change, but change can be easily accommodated because software is flexible.

Reality – It is true that software requirements change, but the impact of change varies with the time at which it is introduced. If serious attention is given to up-front definition, early requests for change can be accommodated easily. The customer can review requirements and recommend modifications with relatively little impact on cost. When changes are requested during software design, the cost impact grows rapidly. Resources have been committed and a design framework has been established. Change can cause upheaval that requires additional resources and major design modification, that is, additional cost. Changes in function, performance, interface, or other characteristics during implementation (code and test) have a severe impact on cost. Change, when requested after software is in production, can be over an order of magnitude more expensive than the same change requested earlier.

3. Practitioner's Myths – Myths that are still believed by software practitioners have been fostered by 50 years of programming culture. During the early days of software, programming was viewed as an art form. Old ways and attitudes die hard.

Myth – Once we write the program and get it to work, our job is done.

Reality – Someone once said that "the sooner you begin 'writing code', the longer it'll take you to get done". Industry data ([LIE80], [JON91], [PUT97]) indicate that between 60 and 80 percent of all effort expended on software will be expended after it is delivered to the customer for the first time.

Myth – Until I get the program "running" I have no way of assessing its quality.

Reality – One of the most effective software quality assurance mechanisms can be applied from the inception of a project – the formal technical review. Software reviews are a "quality filter" that have been found to be more effective than testing for finding certain classes of software defects.

Myth – The only deliverable work product for a successful project is the working program.

Reality – A working program is only one part of a software configuration that includes many elements. Documentation provides a foundation for successful engineering and, more importantly, guidance for software support.

Myth – Software engineering will make us create voluminous and unnecessary documentation and will invariably slow us down.

Reality – Software engineering is not about creating documents. It is about creating quality. Better quality leads to reduced rework, and reduced rework results in faster delivery times. Many software professionals recognize the fallacy of the myths just described. Regrettably, habitual attitudes and methods foster poor management and technical practices, even when reality dictates a better approach. Recognition of software realities is the first step toward formulation of practical solutions for software engineering.

Software Myths:

Myth is defined as a "widely held but false notion" by the Oxford dictionary, so, as in other fields, the software arena also has some myths to demystify. Pressman insists: "Software myths - beliefs about software and the process used to build it - can be traced to the earliest days of computing. Myths have a number of attributes that have made them insidious." So software myths prevail, and though they are not clearly visible, they have the potential to harm all the parties involved in the software development process, mainly the developer team.


Tom DeMarco observes: "In the absence of meaningful standards, a new industry like software comes to depend instead on folklore." This statement points out that the software industry gathered pace only some decades ago, so it has not yet matured to a formidable level and there are no strict standards in software development. There is no single best method of software development, and that vacuum ultimately gives rise to the ubiquitous software myths.

Primarily, there are three types of software myths, all three of which are stated below:

1. Management Myth

2. Customer Myth

3. Practitioner/Developer Myth

Before defining the above three myths one by one, let us scrutinize why these myths occur in the first place. The picture below tries to clarify the complexity of requirement analysis in software development, mainly between the developer team and the clients.


The above picture elucidates that the techies understand the problem differently from what it really is, and this results in a different solution because the problem itself is misunderstood. So problem understanding, i.e. requirement analysis, must be done properly to avoid problems in later stages, as such misunderstandings have devastating effects.

1. Management Myths: Managers with software responsibility, like managers in most disciplines, are often under pressure to maintain budgets, keep schedules from slipping, and improve quality. Like a drowning person who grasps at a straw, a software manager often grasps at belief in a software myth, if those beliefs will lessen the pressure (even temporarily). Some common managerial myths stated by Roger Pressman include:

I. We have standards and procedures for building software, so developers have everything they need to know.

II. We have state-of-the-art software development tools; after all, we buy the latest computers.

III. If we're behind schedule, we can add more programmers to catch up.

IV. A good manager can manage any project.

The managers completely ignore the fact that they are working on something intangible yet very important to the clients, which invites more trouble than solutions. So a software project manager must have worked closely with the software development process, analyzing the minute details associated with the field and learning the nitty-gritty and the tips and tricks of the trade. The realities are self-evident, as it has already been stated how complex the software development process is.

2. Customer Myths: A customer who requests computer software may be a person at the next desk, a technical group down the hall, the marketing/sales department, or an outside company that has requested software under contract. In many cases, the customer believes myths about software because software managers and practitioners do little to correct misinformation. Myths lead to false expectations (by the customer) and, ultimately, dissatisfaction with the developer. Commonly held myths by the clients are:

I. A general statement of objectives is sufficient to begin writing programs - we can fill in the details later.

II. Requirement changes are easy to accommodate because software is flexible.

III. I know what my problem is; therefore I know how to solve it.

This is seen primarily because the clients do not have first-hand experience in software development and they think that it is an easy process.

3. Practitioner/Developer Myths: Myths that are still believed by software practitioners have been fostered by over 50 years of programming culture. During the early days of software, programming was viewed as an art form. Old ways and attitudes die hard. A common malpractice is that developers think they know everything and neglect the peculiarities of each problem.

I. If I miss something now, I can fix it later.

II. Once the program is written and running, my job is done.

III. Until a program is running, there's no way of assessing its quality.

IV. The only deliverable for a software project is a working program.

Every developer should try to get all requirements in relevant detail to effectively design and code the system.

Some misplaced assumptions that intensify the myths are listed below:

1. All requirements can be pre-specified

2. Users are experts at specification of their needs

3. Users and developers are both good at visualization

4. The project team is capable of unambiguous communication

On the whole, realities are always different from the myths. So the myths must be demystified, and work should be based on systematic, scientific and logical grounds rather than on irrational myths. A systemic view must be taken to determine the success of any software project: it is not only a matter of the hard skills of the developer team; their soft skills also matter in coming up with an efficient system.


Assignment (Set-2) Subject code: MI0033

Software Engineering

Q 1. Quality and reliability are related concepts but are fundamentally different in a number of ways. Discuss them.

Ans: Software quality is defined as conformance to explicitly stated functional and non-functional requirements, explicitly documented development standards, and implicit characteristics that are expected of all professionally developed software.

This definition emphasizes upon three important points:

• Software requirements are the foundation from which quality is measured. Lack of conformance is lack of quality

• Specified standards define a set of development criteria that guide the manner in which software is engineered. If the criteria are not followed, lack of quality will almost surely result.

• A set of implicit requirements often goes unmentioned (ease of use, good maintainability etc.)

DeMarco defines product quality as a function of how much it changes the world for the better.

So, there are many different ways to look at quality.

Quality Assurance

The goal of quality assurance is to provide management with the data necessary to be informed about product quality. It consists of the auditing and reporting functions of management.

Cost of quality

Does quality assurance add any value?

If we try to prevent problems, obviously we will have to incur cost. This cost includes:

• Quality planning

• Formal technical reviews

• Test equipment

• Training

The cost of appraisal includes activities undertaken to gain insight into product condition. Comparing these prevention and appraisal costs with the cost of removing defects after the product has been shipped to the customer shows that, in most cases, investing in quality up front is profitable.

Software Reliability:

“Probability of failure free operation of a computer program in a specified environment for a specified time”. For example, a program X can be estimated to have a reliability of 0.96 over 8 elapsed hours.


Software reliability can be measured, directed, and estimated using historical and development data. The key to this measurement is the meaning of term failure. Failure is defined as non-conformance to software requirements. It can be graded in many different ways as shown below:

• From annoying to catastrophic

• Time to fix from minutes to months

• Ripples from fixing

It is also pertinent to understand the difference between hardware and software reliability. Hardware reliability is predicated on failure due to wear rather than failure due to design. In the case of software, there is no wear and tear. The reliability of software is measured using the mean time between failures (MTBF). MTBF is calculated as:

MTBF = MTTF + MTTR

Where MTTF is the Mean Time to Failure and MTTR is the Mean time required to Repair.

Arguably MTBF is far better than defects/kloc as each error does not have the same failure rate and the user is concerned with failure and not with total error count.

A related issue is the notion of availability. It is defined as the probability that a program is operating according to requirements at a given point in time. It can be calculated as

Availability = (MTTF/MTBF) x 100

and clearly depends upon MTTR.
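A quick numerical sketch of these two formulas, using hypothetical failure and repair times (the 68-hour and 2-hour figures are placeholders, not from the text):

```python
# MTBF and availability per the formulas above:
#   MTBF = MTTF + MTTR
#   Availability = (MTTF / MTBF) * 100
mttf_hours = 68.0   # hypothetical mean time to failure
mttr_hours = 2.0    # hypothetical mean time to repair

mtbf_hours = mttf_hours + mttr_hours
availability_pct = (mttf_hours / mtbf_hours) * 100

print(f"MTBF = {mtbf_hours:.1f} h, Availability = {availability_pct:.1f}%")
# -> MTBF = 70.0 h, Availability = 97.1%
```

Shrinking MTTR raises availability even when MTTF is unchanged, which is why availability, unlike MTBF alone, is sensitive to how quickly failures can be repaired.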

Q.2. Explain Version Control & Change Control.

Ans. Version Control:

Code evolves. As a project moves from first-cut prototype to deliverable, it goes through multiple cycles in which you explore new ground, debug, and then stabilize what you've accomplished. And this evolution doesn't stop when you first deliver for production. Most projects will need to be maintained and enhanced past the 1.0 stage, and will be released multiple times. Tracking all that detail is just the sort of thing computers are good at and humans are not.

Why Version Control?

Code evolution raises several practical problems that can be major sources of friction and drudgery — thus a serious drain on productivity. Every moment spent on these problems is a moment not spent on getting the design and function of your project right.

Perhaps the most important problem is reversion. If you make a change, and discover it's not viable, how can you revert to a code version that is known good? If reversion is difficult or unreliable, it's hard to risk making changes at all (you could trash the whole project, or make many hours of painful work for yourself).

Almost as important is change tracking. You know your code has changed; do you know why? It's easy to forget the reasons for changes and step on them later. If you have collaborators on a project, how do you know what they have changed while you weren't looking, and who was responsible for each change?


Amazingly often, it is useful to ask what you have changed since the last known-good version, even if you have no collaborators. This often uncovers unwanted changes, such as forgotten debugging code. I now do this routinely before checking in a set of changes.

-- Henry Spencer

Another issue is bug tracking. It's quite common to get new bug reports for a particular version after the code has mutated away from it considerably. Sometimes you can recognize immediately that the bug has already been stomped, but often you can't. Suppose it doesn't reproduce under the new version. How do you get back the state of the code for the old version in order to reproduce and understand it?

To address these problems, you need procedures for keeping a history of your project, and annotating it with comments that explain the history. If your project has more than one developer, you also need mechanisms for making sure developers don't overwrite each others' versions.

Version Control by Hand

The most primitive (but still very common) method is all hand-hacking. You snapshot the project periodically by manually copying everything in it to a backup. You include history comments in source files. You make verbal or email arrangements with other developers to keep their hands off certain files while you hack them.

As with most hand-hacking, this method does not scale well. It restricts the granularity of change tracking, and tends to lose metadata details such as the order of changes, who did them, and why. Reverting just a part of a large change can be tedious and time consuming, and often developers are forced to back up farther than they'd like after trying something that doesn't work.

Automated Version Control

To avoid these problems, you can use a version-control system (VCS), a suite of programs that automates away most of the drudgery involved in keeping an annotated history of your project and avoiding modification conflicts.

Most VCSs share the same basic logic. To use one, you start by registering a collection of source files — that is, telling your VCS to start archive files describing their change histories. Thereafter, when you want to edit one of these files, you have to check out the file — assert an exclusive lock on it. When you're done, you check in the file, adding your changes to the archive, releasing the lock, and entering a change comment explaining what you did.
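The basic register / check-out / check-in cycle described above can be pictured with a toy sketch. This is illustrative only and not modeled on any particular VCS: each file has an archive of revisions plus an exclusive lock, a check-out asserts the lock, and a check-in records the change with its comment and releases the lock.

```python
# Toy illustration of the register / check-out / check-in cycle described above.
# Not a real VCS: no deltas, no branching, no concurrency control beyond a lock flag.

class ToyVCS:
    def __init__(self):
        self.archives = {}   # filename -> list of (contents, comment) revisions
        self.locks = {}      # filename -> user currently holding the exclusive lock

    def register(self, filename, contents):
        self.archives[filename] = [(contents, "initial registration")]

    def checkout(self, filename, user):
        if self.locks.get(filename):
            raise RuntimeError(f"{filename} is locked by {self.locks[filename]}")
        self.locks[filename] = user
        return self.archives[filename][-1][0]    # latest revision's contents

    def checkin(self, filename, user, new_contents, comment):
        if self.locks.get(filename) != user:
            raise RuntimeError(f"{user} does not hold the lock on {filename}")
        self.archives[filename].append((new_contents, comment))
        del self.locks[filename]                  # release the exclusive lock

vcs = ToyVCS()
vcs.register("hello.c", "int main(void) { return 0; }")
text = vcs.checkout("hello.c", "alice")
vcs.checkin("hello.c", "alice", text + "\n/* fixed */", "add fix comment")
```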

Most of the rest of what a VCS does is convenience: labeling, and reporting features surrounding these basic operations, and tools which allow you to view differences between versions, or to group a given set of versions of files as a named release that can be examined or reverted to at any time without losing later changes.

Another problem is that some kinds of natural operations tend to confuse VCSs. Renaming files is a notorious trouble spot; it's not easy to automatically ensure that a file's version history will be carried along with it when it is renamed. Renaming problems are particularly difficult to resolve when the VCS supports branching.

Change Control:

Change control within Quality management systems (QMS) and Information Technology (IT) systems is a formal process used to ensure that changes to a product or system are introduced in a controlled and coordinated manner. It reduces the possibility that unnecessary changes will be introduced to a system without forethought, introducing faults into the system or undoing changes made by other users of software. The goals of a change control procedure usually include minimal disruption to services, reduction in back-out activities, and cost-effective utilization of resources involved in implementing change.

Change control is currently used in a wide variety of products and systems. For Information Technology (IT) systems it is a major aspect of the broader discipline of change management. Typical examples from the computer and network environments are patches to software products, installation of new operating systems, upgrades to network routing tables, or changes to the electrical power systems supporting such infrastructure.


Certain experts describe change control as a set of six steps (a small workflow sketch follows the list):

Record / Classify

Assess

Plan

Build / Test

Implement

Close / Gain Acceptance
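A minimal sketch of how those six steps might be enforced as an ordered workflow for a single change request. The step names come from the list above; the state-machine representation itself is only an illustration, not a prescribed implementation.

```python
# Illustrative workflow: a change request must pass through the six steps in order.
STEPS = [
    "Record / Classify",
    "Assess",
    "Plan",
    "Build / Test",
    "Implement",
    "Close / Gain Acceptance",
]

class ChangeRequest:
    def __init__(self, title):
        self.title = title
        self.completed = []          # steps finished so far, in order

    def advance(self, step):
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"Next step must be '{expected}', not '{step}'")
        self.completed.append(step)

    @property
    def closed(self):
        return len(self.completed) == len(STEPS)

cr = ChangeRequest("Upgrade network routing tables")
for step in STEPS:
    cr.advance(step)
print(cr.closed)   # True
```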

Q. 3. Discuss the SCM Process.

Ans: In software engineering, software configuration management (SCM) is the task of tracking and controlling changes in the software. Configuration management practices include revision control and the establishment of baselines.

SCM concerns itself with answering the question "Somebody did something, how can one reproduce it?" Often the problem involves not reproducing "it" identically, but with controlled, incremental changes. Answering the question thus becomes a matter of comparing different results and of analysing their differences. Traditional configuration management typically focused on controlled creation of relatively simple products. Now, implementers of SCM face the challenge of dealing with relatively minor increments under their own control, in the context of the complex system being developed. The goals of SCM are generally:

Configuration identification - Identifying configurations, configuration items and baselines.

Configuration control - Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline.

Configuration status accounting - Recording and reporting all the necessary information on the status of the development process.

Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals.

Build management - Managing the process and tools used for builds.

Process management - Ensuring adherence to the organization's development process.

Environment management - Managing the software and hardware that host the system.

Teamwork - Facilitate team interactions related to the process.

Defect tracking - Making sure every defect has traceability back to the source.

Effective Configuration Management can be defined as stabilising the evolution of software products and process at key points in the life cycle. The focus of CM includes:

Identification of Artefacts

Early identification and change control of artefacts and work products is integral to the project. The configuration manager needs to fully identify and control changes to all the elements that are required to recreate and maintain the software product.


Version Control

The primary goal of version control is to identify and manage project elements as they change over time. The Configuration Manager should establish a version control library to maintain all lifecycle entities. This library will ensure that changes (deltas) are controlled at their lowest atomic level eg documents, source files, scripts and kits etc.

Development Streaming (Branching)

To provide some level of stability and allow fluidity of parallel development (streaming) it is quite normal for project development to be split into branches (development groups).

The CM manager has to identify what branches will be required and ensure they are appropriately set up (eg security etc).

Baselining

Baselining provides the division with a concise picture of the project artifacts and relationships at a particular instance in time. It provides an official foundation on which subsequent work can be based, and to which only authorized changes can be made.

Through baselining (i.e. labelling, tagging) all constituent project components are aligned, uniquely identifiable and reproducible at both the atomic level (eg file) and at the higher kit levels.

Reasons for baselining include:

A baseline supports ease of roll back

A baseline improves the CM manager's ability to create change reports etc

A baseline supports creation of new parallel branches (e.g. dev branches)

A baseline supports troubleshooting and element comparison

A baseline provides a stable bill of material for the build system

Build Management

The fundamental objective of the build management process is to deliver a disciplined and automated build process.

Activities to consider (a minimal automation sketch follows this list):

Create automated build scripts (i.e. fetching from repository)

Enforce baselining before all formal builds (support bill of materials/traceability)

Set up stable build machines
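As a sketch of what "automated, baseline-enforced builds" can look like in practice (assuming a Git repository and a project-specific build command, both of which are hypothetical here), the script below refuses to build unless the working copy sits exactly on a baseline tag, then records that label as the bill of materials for the resulting package:

```python
# Hedged sketch: build only from an exact baseline (Git tag), and record the label.
# The "make all" build command and output file name are hypothetical placeholders.
import subprocess, sys

def run(*cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

def baseline_label():
    # "git describe --exact-match --tags" fails unless HEAD is exactly on a tag.
    try:
        return run("git", "describe", "--exact-match", "--tags")
    except subprocess.CalledProcessError:
        sys.exit("Refusing to build: HEAD is not on a baseline tag.")

def build(label):
    run("make", "all")                      # hypothetical project build step
    with open("BILL_OF_MATERIALS.txt", "w") as f:
        f.write(f"baseline: {label}\ncommit: {run('git', 'rev-parse', 'HEAD')}\n")

if __name__ == "__main__":
    build(baseline_label())
```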

Packaging

Typically the packaging process (see next section) will be synonymous with [or tightly coupled to] the build process, i.e. the build process will do packaging automatically after the build is complete.

Primary objectives of packaging are:


Manageable (i.e. often a single zipped up file or exe)

Reusable (i.e. Try to avoid need for rebuild)

Secure (i.e. Packages should be free from malicious or accidental modification)

Deployment

The configuration manager will typically be involved in the deployment process. Primary considerations include:

Ensuring deployment is automated (reducing the possibility of manual errors).

Promoting best-practice concepts like promotion-based releases (as opposed to environmental rebuilds).

Ensuring releases are authorised and appropriate windows selected for deployment.

Providing streamlined rollback mechanism in case of problem.

Change Request Management

Change Request management can be described as management of change/enhancement requests.

Typically the Configuration Manager should set up a repository to manage these requests and support activities like status tracking, assignment etc.

Issue Tracking

Issue tracking is the formal tracking of problems/defects in your systems or environments.

Typically the Configuration Manager should set up a repository to track these problems as they occur, and track their status to eventual closure.

Q 4. Explain i. Software doesn’t Wear Out.

ii. Software is engineered & not manufactured.

Ans: Software:

Ans: Software is: (1) instructions (computer programs) that, when executed, provide the desired features and functions; (2) data structures that enable the programs to adequately manipulate information; and (3) documents that describe the operation and use of the programs.

Software can be categorized into two types:

Generic software:

Generic software is developed for a broad category of customers - users whose environment is well understood and common to all. This type of software is sold in the open market, where it faces several competitors.

Customized software:

This type of software is developed keeping in mind the needs of a particular customer, e.g. a hospital management system. Such users have their own unique domain, environment and requirements.


Following are the relevant characteristics of software:

(I) Software doesn't wear out:

Hardware can wear out, whereas software cannot. In the case of hardware we have a "bathtub"-like curve, a plot of failure rate against time. On this curve there is a relatively high failure rate early in the life of the hardware; after some period of time, defects get corrected and the failure rate drops to a steady state. Eventually, however, the failure rate rises again due to the cumulative effects of dust, vibration, temperature extremes and other environmental effects: the hardware begins to wear out.

Figure above depicts failure rate as a function of time for hardware. The relationship often called the "bath tub curve" indicates that hardware exhibits relatively high failure rates early in its life (these failures are often attributable to design or manufacturing defects); defects are corrected and the failure rate drops to a steady-state level (ideally, quite low) for some period of time. As time passes, however, the failure rate rises again as hardware components suffer from the cumulative effects of dust, vibration, abuse, temperature extremes, and any other environmental maladies. Stated simply, the hardware begins to wear out.

Bath tub curve

Software is not susceptible to the environmental maladies that cause hardware to wear out. In theory, therefore, the failure-rate curve for software should take the form of the "idealized curve": undiscovered defects cause high failure rates early in the life of a program, but these are corrected (ideally, without introducing other errors) and the curve flattens. The implication is clear - software doesn't wear out. But it does deteriorate!

(II) Software is not manufactured in the classical sense, but it is developed or engineered:

Software and hardware projects both begin from a design model, but they differ fundamentally in how the product is realized: hardware is manufactured, whereas software is implemented through coding, i.e. it is developed or engineered rather than manufactured in the classical sense. This difference is also reflected in where the costs of hardware and software are concentrated.

Although some similarities exist between software development and hardware manufacture, the two activities are fundamentally different. In both activities, high quality is achieved through good design, but the manufacturing phase for hardware can introduce quality problems that are nonexistent (or easily corrected) for software. Both activities are dependent on the people, but the relationship between people applied and work accomplished is entirely different. Both activities require the construction of a "product" but the approaches are different. Software costs are concentrated in engineering. This means that software projects can not be managed as if they were manufacturing projects.

Q. 5. Explain the Advantages of Prototype Model, & Spiral Model in Contrast to Water Fall model.

Ans: Many life cycle models have been proposed so far. Each of them has some advantages as well as some disadvantages. A few important and commonly used life cycle models are as follows:

• Classical Waterfall Model


• Iterative Waterfall Model

• Prototyping Model

• Evolutionary Model

• Spiral Model

Classical Waterfall Model

The classical waterfall model is intuitively the most obvious way to develop software. Though the classical waterfall model is elegant and intuitively obvious, we will see that it is not a practical model in the sense that it can not be used in actual software development projects. Thus, we can consider this model to be a theoretical way of developing software. But all other life cycle models are essentially derived from the classical waterfall model. So, in order to be able to appreciate other life cycle models, we must first learn the classical waterfall model.

Classical waterfall model divides the life cycle into the following phases as shown below:

Feasibility study

Requirements analysis and specification

Design

Coding

Testing

Maintenance

Feasibility Study

The main aim of the feasibility study is to determine whether it would be financially and technically feasible to develop the product.

• At first project managers or team leaders try to have a rough understanding of what is required to be done by visiting the client side. They study different input data to the system and output data to be produced by the system. They study what kind of processing is needed to be done on these data and they look at the various constraints on the behaviour of the system.

• After they have an overall understanding of the problem, they investigate the different solutions that are possible. Then they examine each of the solutions in terms of what kinds of resources are required, what would be the cost of development and what would be the development time for each solution.

• Based on this analysis, they pick the best solution and determine whether the solution is feasible financially and technically. They check whether the customer budget would meet the cost of the product and whether they have sufficient technical expertise in the area of development.

The following is an example of a feasibility study undertaken by an organization. It is intended to give one a feel of the activities and issues involved in the feasibility study phase of a typical software project.

Requirements Analysis and Specification

The aim of the requirements analysis and specification phase is to understand the exact requirements of the customer and to document them properly. This phase consists of two distinct activities, namely

• Requirements gathering and analysis, and

• Requirements specification

The goal of the requirements gathering activity is to collect all relevant information from the customer regarding the product to be developed with a view to clearly understand the customer requirements and weed out the incompleteness and inconsistencies in these requirements.

The requirements analysis activity is begun by collecting all relevant data regarding the product to be developed from the users of the product and from the customer through interviews and discussions. For example, to perform the requirements analysis of a business accounting software required by an organization, the analyst might interview all the accountants of the organization to ascertain their requirements. The data collected from such a group of users usually contain several contradictions and ambiguities, since each user typically has only a partial and incomplete view of the system. Therefore it is necessary to identify all ambiguities and contradictions in the requirements and resolve them through further discussions with the customer. After all ambiguities, inconsistencies, and incompleteness have been resolved and all the requirements properly understood, the requirements specification activity can start. During this activity, the user requirements are systematically organized into a Software Requirements Specification (SRS) document.

The customer requirements identified during the requirements gathering and analysis activity are organized into an SRS document. The important components of this document are functional requirements, the non-functional requirements, and the goals of implementation.

Design

The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. In technical terms, during the design phase the software architecture is derived from the SRS document. Two distinctly different approaches are available: the traditional design approach and the object-oriented design approach.

Traditional design approach: Traditional design consists of two different activities; first a structured analysis of the requirements specification is carried out where the detailed structure of the problem is examined. This is followed by a structured design activity. During structured design, the results of structured analysis are transformed into the software design.

Object-oriented design approach: In this technique, various objects that occur in the problem domain and the solution domain are first identified, and the different relationships that exist among these objects are identified. The object structure is further refined to obtain the detailed design.

Coding and Unit Testing

The purpose of the coding and unit testing phase (sometimes called the implementation phase) of software development is to translate the software design into source code. Each component of the design is implemented as a program module. The end-product of this phase is a set of program modules that have been individually tested.

During this phase, each module is unit tested to determine the correct working of all the individual modules. It involves testing each module in isolation as this is the most efficient way to debug the errors identified at this stage.

Integration and System Testing

Integration of different modules is undertaken once they have been coded and unit tested. During the integration and system testing phase, the modules are integrated in a planned manner.

The different modules making up a software product are almost never integrated in one shot. Integration is normally carried out incrementally over a number of steps. During each integration step, the partially integrated system is tested and a set of previously planned modules are added to it. Finally, when all the modules have been successfully integrated and tested, system testing is carried out. The goal of system testing is to ensure that the developed system conforms to the requirements laid out in the SRS document. System testing usually consists of three different kinds of testing activities:

• α – testing: It is the system testing performed by the development team.

• β – testing: It is the system testing performed by a friendly set of customers.

• Acceptance testing: It is the system testing performed by the customer himself after product delivery to determine whether to accept or reject the delivered product.

System testing is normally carried out in a planned manner according to the system test plan document. The system test plan identifies all testing-related activities that must be performed, specifies the schedule of testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.

Maintenance

Maintenance of a typical software product requires much more effort than the effort needed to develop the product itself. Many studies carried out in the past confirm this and indicate that the ratio of development effort to maintenance effort for a typical software product is roughly 40:60. Maintenance involves performing one or more of the following three kinds of activities:

• Correcting errors that were not discovered during the product development phase. This is called corrective maintenance.

• Improving the implementation of the system, and enhancing the functionalities of the system according to the customer’s requirements. This is called perfective maintenance.

• Porting the software to work in a new environment. For example, porting may be required to get the software to work on a new computer platform or with a new operating system. This is called adaptive maintenance.

Shortcomings of the Classical Waterfall Model

The classical waterfall model is an idealistic one since it assumes that no development error is ever committed by the engineers during any of the life cycle phases. However, in practical development environments, the engineers do commit a large number of errors in almost every phase of the life cycle. The source of the defects can be many: oversight, wrong assumptions, use of inappropriate technology, communication gap among the project engineers, etc. These defects usually get detected much later in the life cycle. For example, a design defect might go unnoticed till we reach the coding or testing phase. Once a defect is detected, the engineers need to go back to the phase where the defect had occurred and redo some of the work done during that phase and the subsequent phases to correct the defect and its effect on the later phases. Therefore, in any practical software development work, it is not possible to strictly follow the classical waterfall model.

Prototyping Model

A prototype is a toy implementation of the system. A prototype usually exhibits limited functional capabilities, low reliability, and inefficient performance compared to the actual software. A prototype is usually built using several shortcuts. The shortcuts might involve using inefficient, inaccurate, or dummy functions. The shortcut implementation of a function, for example, may produce the desired results by using a table look-up instead of performing the actual computations. A prototype usually turns out to be a very crude version of the actual system.

The Need for a Prototype

There are several uses of a prototype. An important purpose is to illustrate the input data formats, messages, reports, and the interactive dialogues to the customer. This is a valuable mechanism for gaining a better understanding of the customer’s needs, for example:

• what the screens might look like

• how the user interface would behave

• how the system would produce outputs, etc.

This is something similar to what the architectural designers of a building do; they show a prototype of the building to their customer. The customer can evaluate whether he likes it or not and the changes that he would need in the actual product. A similar thing happens in the case of a software product and its prototyping model.

Spiral Model

The diagrammatic representation of the spiral model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the software process. For example, the innermost loop might be concerned with the feasibility study, the next loop with requirements specification, the next one with design, and so on. Each phase in this model is split into four sectors (or quadrants).

First quadrant (Objective Setting):

• During the first quadrant, we need to identify the objectives of the phase.

• Examine the risks associated with these objectives

Second quadrant (Risk Assessment and Reduction):

• A detailed analysis is carried out for each identified project risk.

• Steps are taken to reduce the risks. For example, if there is a risk that the requirements are inappropriate, a prototype system may be developed

Third quadrant (Development and Validation):

• Develop and validate the next level of the product after resolving the identified risks.

Fourth quadrant (Review and Planning):

• Review the results achieved so far with the customer and plan the next iteration around the spiral.

• With each iteration around the spiral, progressively a more complete version of the software gets built.

The spiral model is suitable for the development of technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.

Comparison of Different Life Cycle Models

The classical waterfall model can be considered the basic model, and all other life cycle models as embellishments of this model. However, the classical waterfall model cannot be used in practical development projects, since this model supports no mechanism to handle the errors committed during any of the phases.

This problem is overcome in the iterative waterfall model. The iterative waterfall model is probably the most widely used software development model evolved so far. This model is simple to understand and use. However, this model is suitable only for well-understood problems; it is not suitable for very large projects and for projects that are subject to many risks.

The prototyping model is suitable for projects for which either the user requirements or the underlying technical aspects are not well understood. This model is especially popular for development of the user-interface part of the projects.

The evolutionary approach is suitable for large problems which can be decomposed into a set of modules for incremental development and delivery. This model is also widely used for object-oriented development projects. Of course, this model can only be used if the incremental delivery of the system is acceptable to the customer.

The spiral model is called a meta-model since it encompasses all other life cycle models. Risk handling is inherently built into this model. The spiral model is suitable for development of technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.

The different software life cycle models can be compared from the viewpoint of the customer. Initially, customer confidence in the development team is usually high irrespective of the development model followed. During the long development process, customer confidence normally drops, as no working product is immediately visible. Developers answer customer queries using technical slang, and delays are announced. This gives rise to customer resentment. On the other hand, an evolutionary approach lets the customer experiment with a working product much earlier than the monolithic approaches. Another important advantage of the incremental model is that it reduces the customer’s trauma of getting used to an entirely new system. The gradual introduction of the product via incremental phases provides time to the customer to adjust to the new product. Also, from the customer’s financial viewpoint, incremental development does not require a large upfront capital outlay. The customer can order the incremental versions as and when he can afford them.

Q. 6. Write a Note on Spiral Model.

Ans: SPIRAL Model:

While the waterfall methodology offers an orderly structure for software development, demands for reduced time-to-market make its serial steps inappropriate. The next evolutionary step from the waterfall is one where the various steps are staged for multiple deliveries or handoffs. The ultimate evolution from the waterfall is the spiral, which takes advantage of the fact that development projects work best when they are both incremental and iterative, where the team is able to start small and benefit from enlightened trial and error along the way. The spiral methodology reflects the relationship of tasks with rapid prototyping, increased parallelism, and concurrency in design and build activities. The spiral method should still be planned methodically, with tasks and deliverables identified for each step in the spiral.

The spiral model is a relatively recent approach to IT project system development, originally devised by Barry W. Boehm in his 1985 paper "A Spiral Model of Software Development and Enhancement".

This model of development unites the features of the prototyping model with an iterative approach to system development, combining elements of design and prototyping-in-stages. It is an effort to combine the advantages of the top-down and bottom-up concepts, and it is highly suitable for large, expensive, volatile, and complex projects.

The term "spiral" is used to describe the process that is followed in this model, as the development of the system takes place, the mechanisms go back several times over to earlier sequences, over and over again, circulating like a spiral.

The spiral model represents the evolutionary approach of IT project system development and carries the same activities over a number of cycles in order to elucidate system requirements and its solutions.

Similar to the waterfall model, the spiral model has sequential cycles/stages, with each stage having to be completed before moving on to the next.

The prime difference between the waterfall model and the spiral model is that, although the development cycle moves towards eventual completion in both models, in the spiral model the cycles repeatedly go back over earlier stages.

Progress Cycles

The progress cycle of this model is divided into four quadrants, each quadrant having a different purpose:

Determining Objectives (I)------------Evaluating Alternatives & Risks (II)
*************************************************************
Planning the Next Phase (IV)------------Developing & Verifying (III)

First Quadrant: the top left quadrant determines and identifies the project objectives, alternatives, and constraints of the project. Similar to the system conception stage in the waterfall model, objectives are determined here along with possible obstacles, and alternative approaches are weighed.

Second Quadrant: the top right quadrant carries out risk analysis for the different alternatives of the project and evaluates each alternative, eventually resolving the identified risks. Probable alternatives are inspected and the associated risks are recognized; resolutions of the project risks are evaluated, and prototyping is used wherever necessary.

Third Quadrant: the bottom right quadrant develops the system; this quadrant corresponds to the waterfall model, with the detailed requirements determined for the project.

Fourth Quadrant: the bottom left quadrant plans the next phase of the development process, providing an opportunity to analyze the results and the feedback obtained.

Each cycle begins with a system design and terminates with the client reviewing the progress, typically through a prototype.

The major advantage of the spiral model over the waterfall model is that it builds the setting of project objectives, project risk management and project planning into the overall development cycle. Another significant advantage is that the user can be given some of the functionality before the entire system is completed.

The spiral model addresses the difficulty of determining system behaviour in advance by providing an iterative approach to system development, repeating the same activities in order to clarify the problem and provide an accurate definition of the requirements within the bounds of multiple constraints.

Assignment (Set-1) Subject code: MI0034

Database Management Systems

Q.1: Differentiate between Traditional File System & Modern Database System. Describe the properties of Database & the advantages of Database.

Ans.: Differences between the Traditional File System and a Modern Database Management System:

Traditional File System vs. Modern Database Management System:

1. The traditional file system is the approach that was followed before the advent of the DBMS, i.e. it is the older way. The DBMS is the modern approach that has replaced the older file-system concept.

2. In traditional file processing, the data definition is part of the application program and works only with that specific application. In a DBMS, the data definition is part of the DBMS itself; the application is independent of it, and the data can be used by any application.

3. File systems are design driven; they require a design/coding change whenever a new kind of data occurs. E.g.: if a traditional employee master file has Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept and Emp_sal, and we want to insert one more column ‘Emp_Mob number’, a complete restructuring of the file or a redesign of the application code is required, even though all the data except that in the one column are the same. In a DBMS, one extra column (attribute) can be added without any difficulty; at most, minor coding changes in the application program may be required.

4. The traditional file system keeps redundant (duplicate) information in many locations, which might result in the loss of data consistency. For example, employee names might exist in separate files such as the Payroll Master File and the Employee Benefit Master File; if an employee changes his or her last name, the name might be changed in the payroll master file but not in the Employee Benefit Master File. In a DBMS, redundancy is eliminated to the maximum extent if the database is properly defined.

5. In a file system, data is scattered across various files, and each of these files may be in a different format, making it difficult to write new application programs to retrieve the appropriate data. This problem is completely solved in a DBMS.

6. In a file system, security features have to be coded in the application program itself. In a DBMS, coding for most security requirements is not required, as they are taken care of by the DBMS.

Hence, a database management system is the software that manages a database, and is responsible for its storage, security, integrity, concurrency, recovery and access. The DBMS has a data dictionary, referred to as the system catalog, which stores data about everything it holds, such as names, structures, locations and types. This data is also referred to as metadata.

Describe the properties of Database & the advantages of Database:

Properties of Database:

The following are the important properties of a database:

1. A database is a logical collection of data having some implicit meaning. If the data are not related, then it is not a proper database. E.g. a student studying in class II got 5th rank:

Stud_name | Class | Rank obtained
Vijetha | Class II | 5th

2. A database consists of both data as well as the description of the database structure and constraints. E.g.:

Field Name | Type | Description
Stud_name | Character | It is the student’s name
Class | Alphanumeric | It is the class of the student

3. A database can be of any size and of varying complexity. If we consider the above example of an employee database, the name and address of the employee may consist of very few records, each with a simple structure. E.g.:

Emp_name | Emp_id | Emp_addr | Emp_desig | Emp_Sal
Prasad | 100 | “Shubhodaya”, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project Leader | 40000
Usha | 101 | #165, 4th main Chamrajpet, Bangalore | Software engineer | 10000
Nupur | 102 | #12, Manipal Towers, Bangalore | Lecturer | 30000
Peter | 103 | Syndicate house, Manipal | IT executive | 15000

Like this there may be ‘n’ number of records.

4. The DBMS is considered a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.

5. A database provides insulation between programs and data through data abstraction. Data abstraction is a feature that provides the integration of the data source of interest and helps to leverage the physical data regardless of how it is structured.

6. The data in the database is used by a variety of users for a variety of purposes. For example, in a hospital database management system, the way the patient database is viewed and used at the reception is different from the way it is used by the doctor. The data appear to be stored separately for the different users, but in fact they are stored in a single database. This property is nothing but multiple views of the database.

7. A multi-user DBMS must allow the data to be shared by multiple users simultaneously. For this purpose the DBMS includes concurrency-control software to ensure that updates made to the database by several users at the same time are applied correctly. This property explains multi-user transaction processing.

Advantages of Database (DBMS):

1. Redundancy is reduced.
2. Data located on a server can be shared by clients.
3. Integrity (accuracy) can be maintained.
4. Security features protect the data from unauthorized access.
5. Modern DBMSs support internet-based applications.
6. In a DBMS the application program and the structure of the data are independent.
7. Consistency of data is maintained.
8. A DBMS supports multiple views. As a DBMS has many users, each of them might use it for a different purpose and may need to view and manipulate only a portion of the database, depending on requirements.

Q.2: What are the disadvantages of sequential file organization? How do you overcome them? What are the advantages & disadvantages of Dynamic Hashing?

Ans.: One disadvantage of sequential file organization is that we must use a linear search or a binary search to locate the desired record, which results in more I/O operations and a number of unnecessary comparisons. In the hashing technique, or direct file organization, the key value is converted into an address by performing some arithmetic manipulation on the key value, which provides very fast access to records:

Key Value -> Hash function -> Address

Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address.

The basic terms associated with the hashing technique are:

1) Hash table: it is simply an array that holds the addresses of records.

2) Hash function: it is the transformation of a key into the corresponding location or address in the hash table (it can be defined as a function that takes a key as input and transforms it into a hash table index).

3) Hash key: let ‘R’ be a record; its key hashes into a value called the hash key.

Internal Hashing:

For internal files, the hash table is an array of records, with indices in the range 0 to M-1. Let us consider a hash function H(K) such that H(K) = K mod M, which produces a remainder between 0 and M-1 depending on the value of the key. This value is then used as the record address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses; a collision occurs when two non-identical keys are hashed to the same location.

For example, let us assume that there are two non-identical keys k1 = 342 and k2 = 352, and we have some mechanism to convert key values to addresses. Then a simple hashing function is:

h(k) = k mod 10

Here h(k) produces a bucket address.

To insert a record with key value k, we first compute its hash address. E.g.: h(k1) = 342 mod 10 gives 2 as the hash value, so the record with key value 342 is placed at location 2. Another record, with 352 as its key value, produces the same hash address, i.e. h(k1) = h(k2). When we try to place this record at the location where the record with key k1 is already stored, a collision occurs. The process of finding another position is called collision resolution. There are numerous methods for collision resolution:

1) Open addressing: with open addressing we resolve the hash clash by inserting the record in the next available free or empty location in the table.

2) Chaining: various overflow locations are kept; a pointer field is added to each record, and the pointer is set to the address of the overflow location. A small sketch of both techniques follows below.
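
The following is a minimal Python sketch (not part of the original text) of the two collision-resolution strategies described above, using the same h(k) = k mod 10 function and the example keys 342 and 352; all names are illustrative.

```python
# Illustrative sketch of collision resolution for h(k) = k mod 10.

M = 10  # number of slots / buckets

def h(key):
    """Simple division-method hash function."""
    return key % M

# --- Chaining: each slot holds a list ("chain") of colliding records ---
chained_table = [[] for _ in range(M)]

def insert_chained(key, record):
    chained_table[h(key)].append((key, record))

# --- Open addressing (linear probing): use the next free slot ---
open_table = [None] * M

def insert_open(key, record):
    slot = h(key)
    for i in range(M):                      # probe at most M slots
        probe = (slot + i) % M
        if open_table[probe] is None:
            open_table[probe] = (key, record)
            return probe
    raise RuntimeError("hash table is full")

# Keys 342 and 352 both hash to slot 2, causing a collision.
insert_chained(342, "record A")
insert_chained(352, "record B")            # chained in the same slot
insert_open(342, "record A")               # goes to slot 2
insert_open(352, "record B")               # probes and lands in slot 3

print(chained_table[2])                    # [(342, 'record A'), (352, 'record B')]
print(open_table[2], open_table[3])        # (342, 'record A') (352, 'record B')
```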

External Hashing for Disk Files:

Handling Overflow for Buckets by Chaining:

Hashing for disk files is called external hashing. Disk storage is divided into buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number. A table maintained in the file header converts the bucket number into the corresponding disk block address.

The collision problem is less severe with buckets, because many records can fit in the same bucket. A collision occurs only when a bucket is filled to capacity and we try to insert a new record into it. In that case we can maintain a pointer in each bucket to the addresses of overflow records.

The hashing scheme described above is called static hashing, because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose M is the number of buckets and m is the maximum number of records that can fit in one bucket; then at most m*M records will fit in the allocated space. If the number of records grows well beyond m*M, numerous collisions will occur and retrieval will be slowed down by the long chains of overflow records.

What are the advantages & disadvantages of Dynamic Hashing?

Advantages of Dynamic Hashing:

1. The main advantage is that splitting causes minor reorganization, since only the records in one bucket are redistributed to the two new buckets.

2. The space overhead of the directory table is negligible.

3. The main advantage of extendable hashing is that performance does not degrade as the file grows. The main space saving is that no buckets need to be reserved for future growth; rather, buckets can be allocated dynamically.

Disadvantages of Dynamic Hashing:

1. The index tables grow rapidly and may become too large to fit in main memory. When part of the index table is stored on secondary storage, it requires extra accesses.

2. The directory must be searched before accessing the bucket, resulting in two block accesses instead of one as in static hashing.

3. A disadvantage of extendable hashing is that it involves an additional level of indirection.

Q.3: What is a relationship type? Explain the difference among a relationship instance, a relationship type & a relationship set.

Ans.: In the real world, items have relationships to one another. E.g.: a book is published by a particular publisher. The association or relationship that exists between the entities relates data items to each other in a meaningful way. A relationship is an association between entities. A collection of relationships of the same type is called a relationship set.

A relationship type R is a set of associations between entity types E1, E2, …, En. Mathematically, R is a set of relationship instances ri.

E.g.: consider a relationship type WORKS_FOR between two entity types, EMPLOYEE and DEPARTMENT, which associates each employee with the department the employee works for. Each relationship instance ri in WORKS_FOR associates one employee entity and one department entity, i.e. ri connects the employee and department entities that participate in it. For instance, employees e1, e3 and e6 work for department d1; e2 and e4 work for d2; and e5 and e7 work for d3. The relationship type R is the set of all such relationship instances. A small sketch of this distinction is given below.
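
As a minimal illustration (not from the original text), the WORKS_FOR example above can be written in Python, with each relationship instance represented as an (employee, department) pair and the relationship set as the collection of all those pairs; the entity names are taken from the example.

```python
# Entity sets: employees and departments from the WORKS_FOR example.
employees = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}
departments = {"d1", "d2", "d3"}

# Relationship set for the relationship type WORKS_FOR:
# each tuple is one relationship instance ri = (employee, department).
works_for = {
    ("e1", "d1"), ("e3", "d1"), ("e6", "d1"),
    ("e2", "d2"), ("e4", "d2"),
    ("e5", "d3"), ("e7", "d3"),
}

# A single relationship instance associates one employee with one department.
ri = ("e1", "d1")
print(ri in works_for)                            # True

# All employees related to department d1 via WORKS_FOR.
print({e for (e, d) in works_for if d == "d1"})   # {'e1', 'e3', 'e6'}
```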

Degree of a relationship type:

The degree of a relationship type is the number of entity sets that participate in the relationship. A unary relationship exists when an association is maintained within a single entity. A binary relationship exists when two entities are associated. A ternary relationship exists when three entities are associated.

Role Names and Recursive Relationships:

Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance. E.g.: in the WORKS_FOR relationship type, the employee plays the role of employee or worker and the department plays the role of department or employer. However, in some cases the same entity type participates more than once in a relationship type, in different roles. Such relationship types are called recursive. E.g.: the EMPLOYEE entity type participates twice in SUPERVISION, once in the role of supervisor and once in the role of supervisee.

Q.4: What is SQL? Discuss.

Ans.: Structured Query Language (SQL) is a specialized language for updating, deleting, and requesting information from databases. SQL is an ANSI and ISO standard, and is the de facto standard database query language. A variety of established database products support SQL, including Oracle and Microsoft SQL Server. It is widely used in both industry and academia, often for enormous, complex databases.

In a distributed database system, a program often referred to as the database's "back end" runs constantly on a server, interpreting data files on the server as a standard relational database. Programs on client computers allow users to manipulate that data, using tables, columns, rows, and fields. To do this, client programs send SQL statements to the server. The server then processes these statements and returns replies to the client program.

Examples

To illustrate, consider a simple SQL command, SELECT. SELECT retrieves a set of data from the database according to some criteria, using the syntax:

SELECT list_of_column_names FROM list_of_relation_names WHERE conditional_expression_that_identifies_specific_rows

The list_of_relation_names may be one or more comma-separated table names or an expression operating on whole tables. The conditional_expression will contain assertions about the values of individual columns within individual rows of a table, and only those rows meeting the assertions will be selected. Conditional expressions within SQL are very similar to conditional expressions found in most programming languages.

For example, to retrieve from a table called Customers all columns (designated by the asterisk) with a value of Smith for the column Last_Name, a client program would prepare and send this SQL statement to the server back end:

SELECT * FROM Customers WHERE Last_Name='Smith';

The server back end may then reply with data such as this:

+---------+-----------+------------+
| Cust_No | Last_Name | First_Name |
+---------+-----------+------------+
| 1001    | Smith     | John       |
| 2039    | Smith     | David      |
| 2098    | Smith     | Matthew    |
+---------+-----------+------------+
3 rows in set (0.05 sec)

Following is an SQL command that displays only two columns, column_name_1 and column_name_3, from the table myTable:

SELECT column_name_1, column_name_3 FROM myTable

Below is a SELECT statement displaying all the columns of the table myTable2 for each row whose column_name_3 value includes the string "brain":

SELECT * FROM myTable2 WHERE column_name_3 LIKE '%brain%'
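
As a small illustration of how an application program submits such statements (this sketch is not from the original text; it uses Python's built-in sqlite3 module, an embedded engine rather than a client/server back end, and the table and column names follow the Customers example above):

```python
import sqlite3

# In-memory database standing in for the server "back end".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create and populate the example Customers table.
cur.execute("CREATE TABLE Customers (Cust_No INTEGER, Last_Name TEXT, First_Name TEXT)")
cur.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1001, "Smith", "John"), (2039, "Smith", "David"),
     (2098, "Smith", "Matthew"), (3120, "Jones", "Mary")],
)

# The client sends a SELECT statement; the engine returns the matching rows.
cur.execute("SELECT * FROM Customers WHERE Last_Name = ?", ("Smith",))
for row in cur.fetchall():
    print(row)   # (1001, 'Smith', 'John'), (2039, 'Smith', 'David'), ...

conn.close()
```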

Q.5: What is Normalization? Discuss the various types of Normal Forms.

Ans.: Normalization is the process of building database structures to store data, because any application ultimately depends on its data structures. If the data structures are poorly designed, the application will start from a poor foundation, and a lot more work will be required to create a useful and efficient application. Normalization is the formal process for deciding which attributes should be grouped together in a relation. It serves as a tool for validating and improving the logical design, so that the logical design avoids unnecessary duplication of data, i.e. it eliminates redundancy and promotes integrity. In the normalization process we analyze and decompose complex relations into smaller, simpler and well-structured relations.

Discuss the various types of Normal Forms:

1. Normal forms based on primary keys (First Normal Form, 1NF):

A relation schema R is in first normal form if every attribute of R takes only single, atomic values. We can also define it as: the intersection of each row and column contains one and only one value. To transform an un-normalized table (a table that contains one or more repeating groups) to first normal form, we identify and remove the repeating groups within the table.

E.g. consider the relation Dept:

D.Name | D.No. | D.location
R&D | 5 | (England, London, Delhi)
HRD | 4 | Bangalore

(Figure A)

In this figure each department can have a number of locations. This is not in first normal form because D.location is not an atomic attribute; the domain of D.location contains multiple values.

There is a technique to achieve first normal form: remove the attribute D.location that violates first normal form and place it into a separate relation Dept_location.

Ex.: decompose Dept into two relations, Dept (Dept_No, D.Name) and Dept_location (Dept_No, D.location):

Dept:
Dept_No | D.Name
5 | R&D
4 | HRD

Dept_location:
Dept_No | D.location
5 | England
5 | London
5 | Delhi
4 | Bangalore

Functional dependency: the concept of functional dependency was introduced by Prof. Codd in 1970 during the emergence of the definitions of the three normal forms. A functional dependency is a constraint between two sets of attributes in a relation of a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (written X -> Y) if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y the dependent attribute.

For e.g., consider the STUDENT_COURSE database:

STUDENT_COURSE
Sid | Sname | Address | Cid | Course | Max marks | Marks Obtained (%)
001 | Nupur | Lucknow | MB010 | Database Concepts | 100 | 83
001 | Nupur | Lucknow | MB011 | C++ | 100 | 90
002 | Priya | Chennai | MB010 | Database Concepts | 100 | 85
002 | Priya | Chennai | MB011 | C++ | 100 | 75
002 | Priya | Chennai | MQ040 | Computer Networks | 75 | 65
003 | Pal | Bengal | MB009 | Unix | 100 | 70
004 | Prasad | Bangalore | MC011 | System Software | 100 | 85

In the STUDENT_COURSE database, the student id (Sid) does not uniquely identify a tuple and therefore cannot be the primary key. Similarly, the course id (Cid) cannot be the primary key. But the combination (Sid, Cid) uniquely identifies a row in STUDENT_COURSE. Therefore (Sid, Cid) is the primary key, and Sname, Address, Course and the marks are dependent on this primary key. A small sketch for checking such dependencies appears below.
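
As an illustration (not part of the original text), the following minimal Python sketch checks whether a functional dependency X -> Y holds in a set of rows, using a few rows from the STUDENT_COURSE table above; the column names used here are the assumption.

```python
# Check whether the functional dependency X -> Y holds in a table:
# every value of X must be associated with exactly one value of Y.

def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False          # same X value maps to two different Y values
        seen[x_val] = y_val
    return True

student_course = [
    {"Sid": "001", "Sname": "Nupur", "Cid": "MB010", "Marks": 83},
    {"Sid": "001", "Sname": "Nupur", "Cid": "MB011", "Marks": 90},
    {"Sid": "002", "Sname": "Priya", "Cid": "MB010", "Marks": 85},
]

print(fd_holds(student_course, ["Sid"], ["Sname"]))           # True: Sid -> Sname
print(fd_holds(student_course, ["Sid"], ["Marks"]))           # False: Sid alone does not determine Marks
print(fd_holds(student_course, ["Sid", "Cid"], ["Marks"]))    # True: (Sid, Cid) -> Marks
```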

2. Second Normal Form (2NF):

The second normal form is based on the concept of full functional dependency. A relation is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

(Figure: normalizing EMP_PROJ into 2NF relations)

A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on only part of the primary key. It creates redundancy in the relation, which results in anomalies when the table is updated.

3. Third Normal Form (3NF):

This is based on the concept of transitive dependency. We should design relation schemas in such a way that there are no transitive dependencies, because they lead to update anomalies. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z such that X -> Z and Z -> Y hold, where Z is neither a key nor a subset (part) of a key of R. For example, the dependency SSN -> Dmgr is transitive through Dnum in the EMP_DEPT relation, because SSN -> Dnum and Dnum -> Dmgr hold and Dnum is neither a key nor a subset of the key.

According to Codd’s definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute is transitively dependent on the primary key. The EMP_DEPT relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.

Note: transitivity is a mathematical property which states that if a relation is true between the first value and the second value, and between the second value and the third value, then it is true between the first and the third value.

Example 2:

Consider a relation schema LOTS, which describes parcels of land for sale in the various counties of a state. Suppose there are two candidate keys: Property_ID and {County_name, Lot#}; that is, lot numbers are unique only within each county, but Property_ID numbers are unique across counties for the entire state.

Based on the two candidate keys Property_ID and {County_name, Lot#}, we know that the functional dependencies FD1 and FD2 hold. Suppose the following two additional functional dependencies hold in LOTS:

FD3: County_name -> Tax_rate
FD4: Area -> Price

Here, FD3 says that the tax rate is fixed for a given county (County_name -> Tax_rate), and FD4 says that the price of a lot is determined by its area (Area -> Price). The LOTS relation schema violates 2NF, because Tax_rate is partially dependent on the candidate key {County_name, Lot#}. Because of this, LOTS is decomposed into two relations, LOTS1 and LOTS2.

LOTS1 still violates 3NF, because Price is transitively dependent on the candidate key of LOTS1 via the attribute Area. Hence we decompose LOTS1 into LOTS1A and LOTS1B.

In summary, a relation is in 3NF when every non-prime attribute satisfies two conditions:

1. It is fully functionally dependent on every key of R.
2. It is non-transitively dependent on every key of R.

Q.6: What do you mean by Shared Lock & Exclusive Lock? Describe briefly the two-phase locking protocol.

Ans.: Shared Lock:

A shared lock is used for read-only operations, i.e. for operations that do not change or update the data, e.g. a SELECT statement. Shared locks allow concurrent transactions to read (SELECT) a data item. No other transaction can modify the data while shared locks exist on it. Shared locks are released as soon as the data has been read.

Exclusive Lock:

Exclusive locks are used for data-modification operations, such as UPDATE, DELETE and INSERT. An exclusive lock ensures that multiple updates cannot be made to the same resource simultaneously. No other transaction can read or modify data locked by an exclusive lock. Exclusive locks are held until the transaction commits or rolls back, since they are used for write operations.

There are three locking operations: read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X, LOCK(X), now has three possible states: “read-locked”, “write-locked”, or “unlocked”. A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.

Each record in the lock table has four fields: <data item name, LOCK, no_of_reads, locking_transaction(s)>, where the value (state) of LOCK is either read-locked or write-locked.

read_lock(X):
B: if LOCK(X) = "unlocked"
       then begin LOCK(X) <- "read-locked";
                  no_of_reads(X) <- 1
            end
   else if LOCK(X) = "read-locked"
       then no_of_reads(X) <- no_of_reads(X) + 1
   else begin
            wait (until LOCK(X) = "unlocked" and
                  the lock manager wakes up the transaction);
            goto B
        end;

write_lock(X):
B: if LOCK(X) = "unlocked"
       then LOCK(X) <- "write-locked"
   else begin
            wait (until LOCK(X) = "unlocked" and
                  the lock manager wakes up the transaction);
            goto B
        end;

unlock(X):
   if LOCK(X) = "write-locked"
       then begin LOCK(X) <- "unlocked";
                  wakeup one of the waiting transactions, if any
            end
   else if LOCK(X) = "read-locked"
       then begin
                no_of_reads(X) <- no_of_reads(X) - 1;
                if no_of_reads(X) = 0
                    then begin LOCK(X) <- "unlocked";
                               wakeup one of the waiting transactions, if any
                         end
            end;

The Two-Phase Locking Protocol

The two-phase locking protocol is a process by which transactions access shared data items in a controlled manner, so that the resulting schedules are serializable. The process consists of two phases.

1. Growing Phase: in this phase the transaction may acquire locks, but may not release any lock. This phase is therefore also called the resource-acquisition phase.

2. Shrinking Phase: in this phase the transaction may release locks, but may not acquire any new lock. The modification of data and the release of locks are grouped together to form this second phase.

In the beginning, the transaction is in the growing phase and acquires locks as they are needed. As soon as it releases a lock, the transaction enters the shrinking phase and must stop acquiring new locks. A minimal sketch of this rule is given below.
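
The following short Python sketch (not from the original text; the class and method names are illustrative) enforces the two-phase rule: once a transaction has released any lock, further lock requests are rejected.

```python
# Minimal illustration of the two-phase locking rule for one transaction.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held_locks = set()
        self.shrinking = False      # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire a lock in the shrinking phase")
        self.held_locks.add(item)   # growing phase: acquiring is allowed

    def unlock(self, item):
        self.held_locks.discard(item)
        self.shrinking = True       # first release ends the growing phase

t = TwoPhaseTransaction("T1")
t.lock("X")
t.lock("Y")        # still growing: OK
t.unlock("X")      # shrinking phase begins
try:
    t.lock("Z")    # violates two-phase locking
except RuntimeError as e:
    print(e)
```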

Strict two-phase locking:

In the two-phase locking protocol, cascading rollbacks are not avoided. In order to avoid this, a slight modification is made to two-phase locking, called strict two-phase locking. In strict two-phase locking, all the locks acquired by the transaction are kept on hold until the transaction commits.

Deadlock and starvation: in a deadlock state there exists a set of transactions in which every transaction in the set is waiting for another transaction in the set. Suppose there exists a set of waiting transactions {T1, T2, T3, …, Tn} such that T1 is waiting for a data item held by T2, T2 for one held by T3, and so on, and Tn is waiting for a data item held by T1. In this state none of the transactions can progress.

Assignment (Set-2) Subject code: MI0034

Database Management Systems

Q. 1. Define Data Model & discuss the categories of Data Models? What is the difference between logical data Independence & Physical Data Independence?

Ans: DATA MODEL:

A data model is the product of the database design process, which aims to identify and organize the required data logically and physically. A data model says what information is to be contained in a database, how the information will be used, and how the items in the database will be related to each other. For example, a data model might specify that a customer is represented by a customer name and credit card number, and a product by a product code and price, and that there is a one-to-many relation between a customer and a product. It can be difficult to change a database layout once code has been written and data inserted. A well thought-out data model reduces the need for such changes. Data modelling enhances application maintainability, and future systems may re-use parts of existing models, which should lower development costs. A data modelling language is a mathematical formalism with a notation for describing data structures and a set of operations used to manipulate and validate that data. One of the most widely used methods for developing data models is the entity-relationship model. The relational model is the most widely used type of data model. Another example is NIAM.

Categories of Data Models:

1. Conceptual (high-level, semantic) data models:

A conceptual schema or conceptual data model is a map of concepts and their relationships. This describes the semantics of an organization and represents a series of assertions about its nature. Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).

Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction. The original ANSI four-schema architecture began with the set of external schemas that each represent one person's view of the world around him or her. These are consolidated into a single conceptual schema that is the superset of all of those external views. A data model can be as concrete as each person's perspective, but this tends to make it inflexible. If that person's world changes, the model must change. Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples.

A conceptual data model identifies the highest-level relationships between the different entities. Features of conceptual data model include:

Includes the important entities and the relationships among them.

No attribute is specified.

No primary key is specified.

The figure is an example of a conceptual data model.

From the figure above, we can see that the only information shown via the conceptual data model is the entities that describe the data and the relationships between those entities. No other information is shown through the conceptual data model.

2. Physical (low -level, internal) data models

Features of physical data model include:

Specification all tables and columns. Foreign keys are used to identify relationships between tables.

Denormalization may occur based on user requirements.

Physical considerations may cause the physical data model to be quite different from the logical data model.

At this level, the data modeler will specify how the logical data model will be realized in the database schema.

The steps for physical data model design are as follows:

1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.

3. Logical Data Model

Features of the logical data model include:

Includes all entities and the relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.

At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.

In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).

The steps for designing the logical data model are as follows:

1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.

Differences between a logical and physical data model

The difference between a logical and a physical data model is hard to grasp at first, but once you see the difference it seems obvious. A logical data model describes your model entities and how they relate to each other. A physical data model describes each entity in detail, including information about how you would implement the model using a particular (database) product.

In a logical model describing a person in a family tree, each person node would have attributes such as name(s), date of birth, place of birth, etc. The logical diagram would also show some kind of unique attribute or combination of attributes called a primary key that describes exactly one entry (a row in SQL) within this entity.

The physical model for the person would contain implementation details. These details are things like data types, indexes, constraints, etc.

The logical and physical models serve two different, but related, purposes. A logical model is a way to draw your mental roadmap from a problem specification to an entity-based storage system. The user (problem owner) must understand and approve the logical model.

Q. 2. What is a B+Trees? Describe the structure of both internal and leaf nodes of a B+Tree?

Ans. B+ Trees :

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. (Comer, p. 123) Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

Each internal node of a B-tree will contain a number of keys. Usually, the number of keys is chosen to vary between d and 2d. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If an internal node has 2d keys, then adding a key to that node can be accomplished by splitting the 2d key node into two d key nodes and adding the key to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have d keys, then a key may be deleted from the internal node by combining with its neighbor. Deleting the key would make the internal node have d − 1 keys; joining the neighbor would add d keys plus one more key brought down from the neighbor's parent. The result is an entirely full node of 2d keys.

The number of branches (or child nodes) from a node will be one more than the number of keys stored in the node. In a 2-3 B-tree, the internal nodes will store either one key (with two child nodes) or two keys (with three child nodes). A B-tree is sometimes described with the parameters (d + 1) — (2d + 1) or simply with the highest branching order, (2d + 1).

A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node further away from the root.

B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when the nodes are in secondary storage such as disk drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full disk block or an analogous size in secondary storage. While 2-3 B-trees are easier to explain, practical B-trees using secondary storage want a large number of child nodes to improve performance.

Variants

The term B-tree may refer to a specific design or it may refer to a general class of designs. In the narrow sense, a B-tree stores keys in its internal nodes but need not store those keys in the records at the leaves. The general class includes variations such as the B+-tree and the B*-tree.

In the B+-tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access.(Comer, p. 129)
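
To make the node structure concrete, here is a minimal Python sketch (not part of the original text) of B+-tree internal and leaf nodes and a search routine; the class and field names are illustrative assumptions, and balancing/splitting is omitted.

```python
# Simplified B+-tree node structure: internal nodes hold only keys and child
# pointers; leaf nodes hold keys with their records and a pointer to the next
# leaf to support sequential access.

class InternalNode:
    def __init__(self, keys, children):
        self.keys = keys            # sorted separator keys; len(children) == len(keys) + 1
        self.children = children    # child nodes (InternalNode or LeafNode)

class LeafNode:
    def __init__(self, keys, records, next_leaf=None):
        self.keys = keys            # sorted keys stored in this leaf
        self.records = records      # records corresponding to the keys
        self.next_leaf = next_leaf  # pointer to the next leaf (for range scans)

def search(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while isinstance(node, InternalNode):
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.children[i]
    # node is now a leaf
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None

# Tiny example tree: one internal node over two leaves.
leaf1 = LeafNode([10, 20], ["rec10", "rec20"])
leaf2 = LeafNode([30, 40], ["rec30", "rec40"])
leaf1.next_leaf = leaf2
root = InternalNode([30], [leaf1, leaf2])

print(search(root, 20))   # rec20
print(search(root, 35))   # None
```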

The B*-tree balances more neighboring internal nodes to keep the internal nodes more densely packed.(Comer, p. 129) For example, a non-root node of a B-tree must be only half full, but a non-root node of a B*-tree must be two-thirds full.

Counted B-trees store, with each pointer within the tree, the number of nodes in the subtree below that pointer.[1] This allows rapid searches for the Nth record in key order, or counting the number of records between any two records, and various other related operations.

Q.3: Describe the Projection operation, the Set theoretic operations & the Join operation.

Ans.: Projection Operator

Projection is a unary operator, written with the Greek letter pi (π). Projection limits the attributes that will be returned from the original relation. The general syntax is π_attributes(R), where attributes is the list of attributes to be displayed and R is the relation. The resulting relation will have the same number of tuples as the original relation (unless duplicate tuples are produced). The degree of the resulting relation may be equal to or less than that of the original relation.

Projection Examples

Assume the same EMP relation as used below is given. Project only the names and departments of the employees: π_{Name, Dept}(EMP)

Results:

Name | Dept
Smith | CS
Jones | Econ
Green | Econ
Brown | CS
Smith | Fin

Combining Selection and Projection The selection and projection operators can be combined to perform both operations.

Show the names of all employees working in the CS department: name ( Dept = 'CS' (EMP) )Results: Name Smith Brown

Show the name and rank of those Employees who are not in the CS department or Adjuncts: name, rank ( (Rank = 'Adjunct' Dept = 'CS') (EMP) )Result: Name Rank

Green Assistant Smith Associate

ExercisesEvaluate the following expressions: name, rank ( (Rank = 'Adjunct' Dept = 'CS') (EMP) ) fname, age ( Age > 22 (R S) )

Page 111

Page 112: MBA 3rd Sem assignment

By-V.K. Saini MBA(IS) Semester- 3 2010(Fall)

For this expression, use R and S from the Set Theoretic Operations section above.

σ Office > 300 (π name, rank (EMP))
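As an illustration only (not part of the original material), the selection and projection operators can be mimicked in a few lines of Python over an in-memory relation. The EMP tuples below are the Name/Office/Dept/Salary tuples used in the aggregate-function examples that follow; the Rank-based queries above would additionally need a Rank column.

EMP = [
    {"Name": "Smith", "Office": 400, "Dept": "CS",   "Salary": 45000},
    {"Name": "Jones", "Office": 220, "Dept": "Econ", "Salary": 35000},
    {"Name": "Green", "Office": 160, "Dept": "Econ", "Salary": 50000},
    {"Name": "Brown", "Office": 420, "Dept": "CS",   "Salary": 65000},
    {"Name": "Smith", "Office": 500, "Dept": "Fin",  "Salary": 60000},
]

def select(relation, predicate):
    # sigma: keep only the tuples that satisfy the predicate
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    # pi: keep only the listed attributes, dropping duplicate tuples
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# pi name ( sigma Dept = 'CS' (EMP) )  ->  Smith, Brown
print(project(select(EMP, lambda t: t["Dept"] == "CS"), ["Name"]))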

Aggregate Functions – We can also apply aggregate functions to attributes and tuples: SUM, MINIMUM, MAXIMUM, AVERAGE/MEAN/MEDIAN, COUNT. Aggregate functions are sometimes written using the Projection operator or the script F character (ℱ), as in the Elmasri/Navathe book.

Aggregate Function Examples – Assume the relation EMP has the following tuples:

Name    Office  Dept    Salary
Smith   400     CS      45000
Jones   220     Econ    35000
Green   160     Econ    50000
Brown   420     CS      65000
Smith   500     Fin     60000

Find the minimum salary: ℱ MIN(salary) (EMP)

Results:
MIN(salary)
35000

Find the average salary: ℱ AVG(salary) (EMP)

Results:
AVG(salary)
51000

Count the number of employees in the CS department: ℱ COUNT(name) (σ Dept = 'CS' (EMP))

Results:
COUNT(name)
2

Find the total payroll for the Economics department: ℱ SUM(salary) (σ Dept = 'Econ' (EMP))

Results:
SUM(salary)
85000
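The aggregate results above can be checked with ordinary Python built-ins; this is just a sanity-check sketch using the same five EMP tuples:

EMP = [("Smith", 400, "CS", 45000), ("Jones", 220, "Econ", 35000),
       ("Green", 160, "Econ", 50000), ("Brown", 420, "CS", 65000),
       ("Smith", 500, "Fin", 60000)]          # (Name, Office, Dept, Salary)

salaries = [sal for _, _, _, sal in EMP]
print(min(salaries))                                          # MIN(salary)   -> 35000
print(sum(salaries) / len(salaries))                          # AVG(salary)   -> 51000.0
print(sum(1 for _, _, dept, _ in EMP if dept == "CS"))        # COUNT, CS     -> 2
print(sum(sal for _, _, dept, sal in EMP if dept == "Econ"))  # SUM, Econ     -> 85000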

Set Theoretic Operations – Consider the following relations R and S:

R:
First   Last    Age
Bill    Smith   22
Sally   Green   28
Mary    Keen    23
Tony    Jones   32

S:
First    Last     Age
Forrest  Gump     36
Sally    Green    28
DonJuan  DeMarco  27

Union: R ∪ S – Result: relation with the tuples from R and S, with duplicates removed.

Difference: R - S – Result: relation with the tuples that appear in R but not in S.

Intersection: R ∩ S – Result: relation with the tuples that appear in both R and S.


R ∪ S:
First    Last     Age
Bill     Smith    22
Sally    Green    28
Mary     Keen     23
Tony     Jones    32
Forrest  Gump     36
DonJuan  DeMarco  27

R - S:
First   Last    Age
Bill    Smith   22
Mary    Keen    23
Tony    Jones   32

R ∩ S:
First   Last    Age
Sally   Green   28
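Because relations are sets of tuples, the three results above can be reproduced directly with Python's set operators; a small illustrative sketch:

R = {("Bill", "Smith", 22), ("Sally", "Green", 28),
     ("Mary", "Keen", 23), ("Tony", "Jones", 32)}
S = {("Forrest", "Gump", 36), ("Sally", "Green", 28),
     ("DonJuan", "DeMarco", 27)}

print(R | S)   # union: six tuples, the duplicate Sally Green appears once
print(R - S)   # difference: Bill, Mary, Tony
print(R & S)   # intersection: Sally Green only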

Join Operation – Join operations bring together two relations and combine their attributes and tuples in a specific fashion. The generic join operator (called the Theta Join) is ⋈. It takes as arguments the attributes from the two relations that are to be joined. For example, assume we have the EMP relation as above and a separate DEPART relation with attributes (Dept, MainOffice, Phone): EMP ⋈ EMP.Dept = DEPART.Dept DEPART

The join condition can be any comparison operator (=, <, ≤, >, ≥, ≠). When the join condition operator is =, we call this an Equijoin. Note that the attributes in common are repeated.

Join Examples – Assume we have the EMP relation from above and the following DEPART relation:

Dept  MainOffice  Phone
CS    404         555-1212
Econ  200         555-1234
Fin   501         555-4321
Hist  100         555-9876

Find all information on every employee, including their department info: EMP ⋈ EMP.Dept = DEPART.Dept DEPART

Results:

Name   Office  EMP.Dept  Salary  DEPART.Dept  MainOffice  Phone
Smith  400     CS        45000   CS           404         555-1212
Jones  220     Econ      35000   Econ         200         555-1234
Green  160     Econ      50000   Econ         200         555-1234
Brown  420     CS        65000   CS           404         555-1212
Smith  500     Fin       60000   Fin          501         555-4321

Find all information on every employee, including their department info, where the employee works in an office numbered less than the department main office: EMP ⋈ (EMP.Office < DEPART.MainOffice) ∧ (EMP.Dept = DEPART.Dept) DEPART

Results:

Name   Office  EMP.Dept  Salary  DEPART.Dept  MainOffice  Phone
Smith  400     CS        45000   CS           404         555-1212
Green  160     Econ      50000   Econ         200         555-1234
Smith  500     Fin       60000   Fin          501         555-4321

Natural Join – Notice that in the generic (theta) join operation, any attributes in common (such as Dept above) are repeated. The Natural Join operation removes these duplicate attributes. The natural join operator is *. When using *, the join condition is implicitly equality on the attributes the two relations have in common.


Example: EMP * DEPART

Results:

Name   Office  Dept  Salary  MainOffice  Phone
Smith  400     CS    45000   404         555-1212
Jones  220     Econ  35000   200         555-1234
Green  160     Econ  50000   200         555-1234
Brown  420     CS    65000   404         555-1212
Smith  500     Fin   60000   501         555-4321

Outer Join – In the join operations so far, only those tuples from both relations that satisfy the join condition are included in the output relation. The outer join includes other tuples as well, according to a few rules. There are three types of outer joins: the Left Outer Join includes all tuples in the left-hand relation and only those matching tuples from the right-hand relation; the Right Outer Join includes all tuples in the right-hand relation and only those matching tuples from the left-hand relation; the Full Outer Join includes all tuples from both the left-hand and the right-hand relation.

Examples – Assume we have two relations, PEOPLE and MENU:

PEOPLE:
Name   Age  Food
Alice  21   Hamburger
Bill   24   Pizza
Carl   23   Beer
Dina   19   Shrimp

MENU:
Food       Day
Pizza      Monday
Hamburger  Tuesday
Chicken    Wednesday
Pasta      Thursday
Tacos      Friday

Left outer join: PEOPLE ⟕ people.Food = menu.Food MENU

Name   Age  people.Food  menu.Food  Day
Alice  21   Hamburger    Hamburger  Tuesday
Bill   24   Pizza        Pizza      Monday
Carl   23   Beer         NULL       NULL
Dina   19   Shrimp       NULL       NULL

Right outer join: PEOPLE ⟖ people.Food = menu.Food MENU

Name   Age   people.Food  menu.Food  Day
Bill   24    Pizza        Pizza      Monday
Alice  21    Hamburger    Hamburger  Tuesday
NULL   NULL  NULL         Chicken    Wednesday
NULL   NULL  NULL         Pasta      Thursday
NULL   NULL  NULL         Tacos      Friday

Full outer join: PEOPLE ⟗ people.Food = menu.Food MENU

Name   Age   people.Food  menu.Food  Day
Alice  21    Hamburger    Hamburger  Tuesday
Bill   24    Pizza        Pizza      Monday
Carl   23    Beer         NULL       NULL
Dina   19    Shrimp       NULL       NULL
NULL   NULL  NULL         Chicken    Wednesday
NULL   NULL  NULL         Pasta      Thursday
NULL   NULL  NULL         Tacos      Friday
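A rough Python sketch of the equijoin and left outer join on the PEOPLE and MENU data above (the right and full outer joins follow the same pattern with the roles reversed or combined); this is an illustration only, not a general join implementation:

PEOPLE = [("Alice", 21, "Hamburger"), ("Bill", 24, "Pizza"),
          ("Carl", 23, "Beer"), ("Dina", 19, "Shrimp")]
MENU = {"Pizza": "Monday", "Hamburger": "Tuesday", "Chicken": "Wednesday",
        "Pasta": "Thursday", "Tacos": "Friday"}

# Equijoin: only people whose food appears on the menu.
inner = [(name, age, food, MENU[food])
         for name, age, food in PEOPLE if food in MENU]

# Left outer join: every person, with None standing in for NULL when
# there is no matching menu entry.
left_outer = [(name, age, food, MENU.get(food)) for name, age, food in PEOPLE]

print(inner)        # Alice and Bill only
print(left_outer)   # Carl and Dina appear with None in the Day column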

Q. 4. Discuss Multi Table Queries?


Ans: Multiple Table Queries :

Most of the queries you create in Microsoft Access will more than likely need to include data from more than one table, and you will have to join the tables in the query. The capability to join tables is the power of the relational database. As you know, in order to join database tables, they must have a field in common. The fields on which you join tables must be of the same or compatible data types and they must contain the same kind of data; however, they do not have to have the same field name (although they probably will). Occasionally, the two database tables that you want to bring the data from may not have a field in common, and you will have to add another table to the query with the sole purpose of joining the tables.

Different types of query joins will return different sets of results. When creating new queries, it is prudent to test them on a set of records for which you know what the result should be. That’s a good way to be sure that you have the correct join and are getting accurate results. Just because a query runs and doesn't give you an error doesn't mean that the resulting data set is what you intended to return.

Failure to join tables in a database query will result in a cross join, or Cartesian product, in which every record in one table is joined with every record in the second table - probably not very meaningful data. (A Cartesian product is defined as all possible combinations of rows in all tables. Be sure you have joins before trying to return data, because a Cartesian product on tables with many records and/or on many tables could take several hours to complete.)

There are inner joins and outer joins - each with variations on the theme.

Inner Join

A join of two tables that returns records for which there is a matching value in the field on which the tables are joined.

The most common type of join is the inner join, or equi-join. It joins records in two tables when the values in the fields on which they are joined are equal. For example, if you had the following Customers and Orders tables and did an equi-join on (or, as is sometimes said, over) the CustomerID fields, you would see the set of records that have the same CustomerID in both tables. With the following data, that would be a total of 7 records. Customers listed in the Customers table who had not placed an order would not be included in the result. There has to be the same value in the CustomerID field in both tables.

An Inner Join of the Customer and Order Data

If, in the query result, you eliminated redundant columns - that is, displayed the CustomerID column only once in the result - this would be called a natural join.

An inner join returns the intersection of two tables. Following is a graphic of joining these tables. The Customers table contains data in areas 1 and 2. The Orders table contains data in areas 2 and 3. An inner join returns only the data in area 2.


Outer Joins

A join between two tables that returns all the records from one table and, from the second table, only those records in which there is a matching value in the field on which the tables are joined.

An outer join returns all the records from one table and only the records from the second table where the value in the field on which the tables are joined matches a value in the first table. Outer joins are referred to as left outer joins and right outer joins. The left and right concept comes from the fact that, in a traditional database diagram, the table on the one side of a 1:N relationship was drawn on the left.

Using our Customers and Orders tables again, if you performed a left outer join, the result would include a listing of all Customers and, for those that had placed orders, the data on those orders. You would get a total of 11 records from this data, which is a very different result from the 7 records provided by the inner join.

An Outer Join of the Customers and Orders table

In the diagram below, a left outer join on the Customers table will return the data in areas 1 and 2. By the way, this type of diagram is called a Venn diagram.
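The inner-versus-outer-join contrast can be reproduced outside Access as well. The sketch below uses Python's built-in sqlite3 module with small made-up Customers and Orders tables (the actual rows from the text's worked example appear only in figures that are not reproduced here), just to show how the two joins return different record counts:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers(CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders(OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cho');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

inner = con.execute("""SELECT c.Name, o.OrderID
                       FROM Customers c JOIN Orders o
                         ON c.CustomerID = o.CustomerID""").fetchall()

left = con.execute("""SELECT c.Name, o.OrderID
                      FROM Customers c LEFT JOIN Orders o
                        ON c.CustomerID = o.CustomerID""").fetchall()

print(len(inner), inner)  # 3 rows: only customers who placed orders
print(len(left), left)    # 4 rows: Cho appears with OrderID = None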

Not All Data Can Be Edited


Earlier, it was mentioned that the results of a query represent "live" data, meaning that a change to that data is actually a change to the data in the base table. However, you will find that you cannot edit all data that is returned by a query. You can edit values in all fields from a query based on a single table or on two tables with a one-to-one relationship. But you can't edit all fields in a query based on tables with a one-to-many relationship, nor from crosstab queries or those with totals.

In general, you can edit:
- all fields in a single-table query
- all fields in tables with a one-to-one relationship
- all fields in the table on the many side of a one-to-many relationship
- non-key fields in the table on the one side of a one-to-many relationship

You can't edit:
- fields in the primary key in the table on the one side of a one-to-many relationship
- fields returned by a crosstab query
- values in queries in which aggregate operations are performed
- calculated fields

There are ways to work around some of these editing limitations but the precise technique will depend on the RDBMS you’re using.

Q.5. Discuss the Transaction Processing concept. Describe the properties of transactions.

Ans: In computer science, transaction processing is information processing that is divided into individual, indivisible operations, called transactions. Each transaction must succeed or fail as a complete unit; it cannot remain in an intermediate state.

Description

Transaction processing is designed to maintain a computer system (typically a database or some modern filesystems) in a known, consistent state, by ensuring that any operations carried out on the system that are interdependent are either all completed successfully or all canceled successfully.

For example, consider a typical banking transaction that involves moving $700 from a customer's savings account to a customer's checking account. This transaction is a single operation in the eyes of the bank, but it involves at least two separate operations in computer terms: debiting the savings account by $700, and crediting the checking account by $700. If the debit operation succeeds but the credit does not (or vice versa), the books of the bank will not balance at the end of the day. There must therefore be a way to ensure that either both operations succeed or both fail, so that there is never any inconsistency in the bank's database as a whole. Transaction processing is designed to provide this.

Transaction processing allows multiple individual operations to be linked together automatically as a single, indivisible transaction. The transaction-processing system ensures that either all operations in a transaction are completed without error, or none of them are. If some of the operations are completed but errors occur when the others are attempted, the transaction-processing system “rolls back” all of the operations of the transaction (including the successful ones), thereby erasing all traces of the transaction and restoring the system to the consistent, known state that it was in before processing of the transaction began. If all operations of a transaction are completed successfully, the transaction is committed by the system, and all changes to the database are made permanent; the transaction cannot be rolled back once this is done.
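A minimal sketch of the savings-to-checking transfer using Python's built-in sqlite3 module (an in-memory database with a hypothetical accounts table): both updates run inside one transaction, so either both are committed or both are rolled back.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts(name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('savings', 1000), ('checking', 200)")
con.commit()

try:
    with con:   # transaction scope: commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 700 "
                    "WHERE name = 'savings'")
        con.execute("UPDATE accounts SET balance = balance + 700 "
                    "WHERE name = 'checking'")
except sqlite3.Error:
    pass        # the rollback has already restored the consistent state

print(con.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('checking', 900), ('savings', 300)]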

Transaction processing guards against hardware and software errors that might leave a transaction partially completed, with the system left in an unknown, inconsistent state. If the computer system crashes in the middle of a transaction, the transaction processing system guarantees that all operations in any uncommitted (i.e., not completely processed) transactions are cancelled.

Transactions are processed in a strict chronological order. If transaction n+1 intends to touch the same portion of the database as transaction n, transaction n+1 does not begin until transaction n is committed.


Before any transaction is committed, all other transactions affecting the same part of the system must also be committed; there can be no “holes” in the sequence of preceding transactions.

Methodology

The basic principles of all transaction-processing systems are the same. However, the terminology may vary from one transaction-processing system to another, and the terms used below are not necessarily universal.

Rollback

Transaction-processing systems ensure database integrity by recording intermediate states of the database as it is modified, then using these records to restore the database to a known state if a transaction cannot be committed. For example, copies of information on the database prior to its modification by a transaction are set aside by the system before the transaction can make any modifications (this is sometimes called a before image). If any part of the transaction fails before it is committed, these copies are used to restore the database to the state it was in before the transaction began.

Rollforward

It is also possible to keep a separate journal of all modifications to a database (sometimes called after images); this is not required for rollback of failed transactions, but it is useful for updating the database in the event of a database failure, so some transaction-processing systems provide it. If the database fails entirely, it must be restored from the most recent back-up. The back-up will not reflect transactions committed since the back-up was made. However, once the database is restored, the journal of after images can be applied to the database (rollforward) to bring the database up to date. Any transactions in progress at the time of the failure can then be rolled back. The result is a database in a consistent, known state that includes the results of all transactions committed up to the moment of failure.

Deadlocks

In some cases, two transactions may, in the course of their processing, attempt to access the same portion of a database at the same time, in a way that prevents them from proceeding. For example, transaction A may access portion X of the database, and transaction B may access portion Y of the database. If, at that point, transaction A then tries to access portion Y of the database while transaction B tries to access portion X, a deadlock occurs, and neither transaction can move forward. Transaction-processing systems are designed to detect these deadlocks when they occur. Typically both transactions will be cancelled and rolled back, and then they will be started again in a different order, automatically, so that the deadlock doesn't occur again. Or sometimes, just one of the deadlocked transactions will be cancelled, rolled back, and automatically re-started after a short delay.

Deadlocks can also occur between three or more transactions. The more transactions involved, the more difficult they are to detect, to the point that transaction processing systems find there is a practical limit to the deadlocks they can detect.
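The "cancel, roll back and retry after a short delay" strategy can also be applied on the application side. The following Python sketch is generic and hedged: DeadlockError is a placeholder class, since the real exception type depends on the database driver in use.

import random
import time

class DeadlockError(Exception):
    """Placeholder for a driver-specific deadlock/serialization error."""

def run_with_retry(transaction, attempts=5):
    for attempt in range(attempts):
        try:
            return transaction()          # the callable commits on success
        except DeadlockError:
            # the driver has already rolled the transaction back;
            # wait a short, randomized delay and try again
            time.sleep(random.uniform(0.01, 0.1) * (attempt + 1))
    raise RuntimeError("transaction kept deadlocking, giving up")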

Compensating transaction

In systems where commit and rollback mechanisms are not available or undesirable, a Compensating transaction is often used to undo failed transactions and restore the system to a previous state.

Transaction Properties

You can control the behavior of the transactions in your OpenAccess ORM applications by setting various transaction properties. The properties are always set for a specific transaction, and they are valid until the IObjectScope instance is disposed of. Transaction properties can be changed only if the transaction is not active.

The previous sections showed some transaction properties. In the following code example, the RetainValues property is set to true:

C#:

// prepare transaction properties
scope.TransactionProperties.RetainValues = true;
scope.Transaction.Begin();
// ...
Console.WriteLine("RetainValues is "
    + scope.Transaction.Properties.RetainValues);
scope.Transaction.Commit();
// Properties are still valid
scope.Transaction.Begin();

VB.NET:

' prepare transaction properties
scope.TransactionProperties.RetainValues = True
scope.Transaction.Begin()
' ...
Console.WriteLine("RetainValues is " & scope.Transaction.Properties.RetainValues)
scope.Transaction.Commit()
' Properties are still valid
scope.Transaction.Begin()

Following is a list of the transaction properties, their allowed and default values, and a brief description.

RetainValues – This property controls whether persistent class instances retain their values after commit of the transaction and whether read access is allowed. By default it is set to true. However, regardless of this setting, objects are refreshed from the data store the next time the object is accessed within an active transaction.

RestoreValues – This property controls whether the values of objects are restored to their original values when a transaction (or one particular nesting level) is rolled back. By default it is set to false.

No Automatic Refreshes – As described earlier in this chapter, OpenAccess ORM uses optimistic concurrency control by default (refer to Concurrency Control Algorithms for more information about the various concurrency control mechanisms). This means that OpenAccess ORM does not validate read (but unmodified) objects at commit time, and therefore it is possible that if an object is read inside a transaction, it might be changed in the database while the transaction is running.

So, in order to avoid long-living stale objects, OpenAccess ORM will refresh such objects if they are accessed in a subsequent transaction. This happens on the first access to such objects. Thus, only short-living stale objects are possible, at the cost of an SQL call for refreshing the object in a subsequent transaction.

It is possible to have more control over this refreshing behavior by disabling the automatic refresh function, which can be done as shown below:

scope.TransactionProperties.RefreshReadObjectsInNewTransaction = false;

The advantage of using this is that objects can keep their data for a long time, without the need for executing an SQL statement again. However, if you enable "no automatic refreshes" then you are responsible for avoiding stale data, i.e. you will need to call Refresh() or Evict() at appropriate times.


Therefore, please use this with care, since read (but not modified) objects will not be refreshed automatically in new transactions of the same ObjectScope, i.e., if an object is fetched from the database in the first transaction and is subsequently never explicitly refreshed, evicted or modified in later transactions, it will still have the values from the first transaction.

Concurrency – This property determines the concurrency settings of a transaction. The default is TransactionMode.OPTIMISTIC|TransactionMode.NO_LOST_UPDATES.

AutomaticBegin – This property allows the user to specify that every Commit()/Rollback() of a transaction will start a new transaction immediately. Therefore, the user does not need to call Begin(), and one can work with just Commit() and Rollback() calls. In other words, there is always a started transaction. This holds regardless of a failure of a Commit() (the next transaction will still be started).

This is especially useful for multithreaded applications, i.e., working with multiple threads in one object scope by setting <option.Multithreaded> to "true", since this allows an automatically synchronized Commit() + Begin().

FailFast – This property determines whether a transaction commit or flush should fail at the first failure. When this property is set to true (the default value), the transaction will fail on the occurrence of the first OptimisticVerificationException. When this property is set to false, the IObjectScope will collect all the failures, i.e., the commit or flush will continue and collect all the OptimisticVerificationExceptions that occur. Because collecting all the failures can be very time consuming, this property should be set to false only when the information about the failing objects is necessary.

Q. 6. Describe the advantages of Distributed databases. What is the Client/Server model? Discuss briefly security and Internet violations.

Ans: A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites.

Besides distributed database replication and fragmentation, there are many other distributed database design technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, consistency and integrity.

Advantages of distributed databases

- Management of distributed data with different levels of transparency.
- Increased reliability and availability.
- Easier expansion.
- Reflects organizational structure – database fragments are located in the departments they relate to.
- Local autonomy – a department can control the data about itself (as it is the one familiar with it).
- Protection of valuable data – if there were ever a catastrophic event such as a fire, all of the data would not be in one place, but distributed in multiple locations.
- Improved performance – data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules in a distributed database.)
- Economics – it costs less to create a network of smaller computers with the power of a single large computer.
- Modularity – systems can be modified, added and removed from the distributed database without affecting other modules (systems).
- Reliable transactions – due to replication of the database.
- Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
- Continuous operation.
- Distributed query processing.
- Distributed transaction management.

A single-site failure does not affect the performance of the system. All transactions follow the A.C.I.D. properties: Atomicity – the transaction takes place as a whole or not at all; Consistency – maps one consistent DB state to another; Isolation – each transaction sees a consistent DB; Durability – the results of a transaction must survive system failures. The Merge Replication method is used to consolidate the data between databases.

Client–server model

The client–server model of computing is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients.[1] Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server machine is a host that is running one or more server programs which share their resources with clients. A client does not share any of its resources, but requests a server's content or service function. Clients therefore initiate communication sessions with servers, which await incoming requests.

The client–server characteristic describes the relationship of cooperating programs in an application. The server component provides a function or service to one or many clients, which initiate requests for such services.

Functions such as email exchange, web access and database access, are built on the client–server model. Users accessing banking services from their computer use a web browser client to send a request to a web server at a bank. That program may in turn forward the request to its own database client program that sends a request to a database server at another bank computer to retrieve the account information. The balance is returned to the bank database client, which in turn serves it back to the web browser client displaying the results to the user. The client–server model has become one of the central ideas of network computing. Many business applications being written today use the client–server model. So do the Internet's main application protocols, such as HTTP, SMTP, Telnet, and DNS.

The interaction between client and server is often described using sequence diagrams. Sequence diagrams are standardized in the Unified Modeling Language.

Specific types of clients include web browsers, email clients, and online chat clients.

Specific types of servers include web servers, ftp servers, application servers, database servers, name servers, mail servers, file servers, print servers, and terminal servers. Most web services are also types of servers.
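A minimal sketch of the request/response interaction described above, using Python's standard socket module (the port number and message are arbitrary, chosen only for illustration): the server offers a trivial "service" by uppercasing the request, and the client initiates the session.

import socket
import threading
import time

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 5050))
        srv.listen(1)
        conn, _ = srv.accept()              # wait for an incoming client request
        with conn:
            request = conn.recv(1024)
            conn.sendall(request.upper())   # the "service" the server provides

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                             # crude: give the server time to start listening

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 5050))        # the client initiates the session
    cli.sendall(b"balance for account 42?")
    print(cli.recv(1024))                   # b'BALANCE FOR ACCOUNT 42?'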

Security

A condition that results from the establishment and maintenance of protective measures that ensure a state of inviolability from hostile acts or influences.

INTERNET VIOLATIONS


Internet crime is among the newest and most constantly evolving areas of American law. Although the Internet itself is more than three decades old, greater public usage began in the late 1980s, with widespread adoption only following in the 1990s. During that decade the Net was transformed from its modest military and academic roots into a global economic tool, used daily by over 100 million Americans and generating upwards of $100 billion in domestic revenue annually. But as many aspects of business, social, political, and cultural life moved online, so did crime, creating new challenges for lawmakers and law enforcement.

Crime on the Net takes both old and new forms. The medium has facilitated such traditional offenses as fraud and child pornography. But it has also given rise to unique technological crimes, such as electronic intrusion in the form of hacking and computer viruses. High-speed Internet accounts helped fuel a proliferation of copyright infringement in software, music, and movie piracy. National security is also threatened by the Internet's potential usefulness for terrorism. Taken together, these crimes have earned a new name: when FBI Director Louis J. Freeh addressed the U.S. Senate in 2000, he used the widely-accepted term "cybercrime."

Example: internet violations in school

Cause/Event – Consequences

- Hacking into school servers or other computers – Suspension; administrative discretion and possible legal actions (action level III)
- Using e-mail or Web sites to intimidate students (cyber-bullying) – Detention/suspension; immediately sent to an administrator; considered harassment under the district Rights & Responsibilities Handbook
- Downloading illegal music/media files from the Internet – Possible civil legal actions; data files wiped
- Using inappropriate instant messaging/chats during class – Loss of computer for one or more classes; teacher/team discretion
- Using or carrying the computer in an unsafe manner – Loss of computer; note to parents
- Plagiarizing information by using the Internet – Failed assignment; loss of points and possible legal actions
- Accessing pornographic/hate speech websites – Administrative discretion and possible legal actions; loss of computer privileges
- Playing games on laptops or PDAs during class – Content barrier installed at parent's expense
- Physically damaging a computer through misuse or neglect (throwing, dropping, snagging) – Loss of computer; administrative discretion/restitution
- Posing as another person; misrepresenting oneself on the web; using another's identity (ID theft) – Suspension
- Manipulating or changing settings without authorization – Administrative restrictions


Assignment (Set-1) Subject code: MI0035

Computer Networks

Q. 1 : Explain the design issues for the layers in computer networks. What is connection-oriented and connectionless service?

Ans. Design issues for the layers: Several key design issues recur across the layers of a computer network. The important design issues are:

1. Addressing – Mechanisms for identifying senders and receivers on the network need some form of addressing. There are multiple processes running on one machine, so some means is needed for a process on one machine to specify with whom it wants to communicate.

2. Error Control – There may be erroneous transmission due to several problems during communication, such as faults in the communication circuits or physical medium, thermal noise and interference. Many error-detecting and error-correcting codes are known, but both ends of the connection must agree on which one is being used. In addition, the receiver must have some mechanism of telling the sender which messages have been received correctly and which have not.

3. Flow Control – If there is a fast sender at one end sending data to a slow receiver, then there must be a flow control mechanism to prevent loss of data at the slow receiver. Several mechanisms are used for flow control, such as increasing the buffer size at the receiver, slowing down the fast sender, and so on. Some processes will not be in a position to accept arbitrarily long messages, so there must also be a mechanism for disassembling, transmitting and then reassembling messages.

4. Multiplexing/De-multiplexing – If data has to be transmitted on the transmission media separately, it is inconvenient or expensive to set up a separate connection for each pair of communicating processes. So multiplexing is needed in the physical layer at the sender end and de-multiplexing is needed at the receiver end.

5. Routing – When data has to be transmitted from source to destination, there may be multiple paths between them. An optimized (shortest) route must be chosen. This decision is made on the basis of several routing algorithms, which choose an optimized route to the destination.

Connection Oriented and Connectionless Services :

Layers can offer two types of services, namely connection-oriented service and connectionless service.

Connection Oriented Service – The service user first establishes a connection, uses the connection and then releases the connection. Once the connection is established between source and destination, the path is fixed. The data transmission takes place through this established path, and the order of the messages sent is preserved at the receiver end. The service is reliable and there is no loss of data. However, the acknowledgement that a reliable service provides is an overhead and adds delay.

Connectionless Service – In this type of service, no connection is established between source and destination, and there is no fixed path. Therefore, the messages must carry the full destination address, and each message is sent independently of the others. Messages will not necessarily be delivered at the destination in the order in which they were sent, so grouping and ordering are required at the receiver end, and the service is not reliable: there is no acknowledgement or confirmation from the receiver. An unreliable connectionless service is often called a datagram service, which does not return an acknowledgement to the sender. In some cases, establishing a connection just to send one short message is not worthwhile, but reliability is still required; an acknowledged datagram service can then be used for these applications.

Another service is the request-reply service. In this type of service, the sender transmits a single datagram containing a request from the client side; the server's reply at the other end contains the answer. Request-reply is commonly used to implement communication in the client-server model.
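For contrast with the connection-oriented (TCP) example shown earlier in this document, the following small Python sketch sends UDP datagrams: there is no connection setup, and every datagram carries the full destination address (the port number is arbitrary and chosen only for illustration).

import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 6060))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram one", ("127.0.0.1", 6060))   # no connection setup
sender.sendto(b"datagram two", ("127.0.0.1", 6060))   # sent independently

for _ in range(2):
    data, addr = receiver.recvfrom(1024)
    print(data, "from", addr)

sender.close()
receiver.close()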

Q.2 : Discuss OSI Reference model.

Ans. The OSI Reference Model :

The OSI model is based on a proposal developed by the International Standards Organization as a first step towards international standardization of the protocols used in the various layers. The model is called the ISO-OSI (International Standards Organization – Open Systems Interconnection) Reference Model because it deals with connecting open systems – that is, systems that follow the standard are open for communication with other systems, irrespective of manufacturer.

Its main objectives were to:

- Allow manufacturers of different systems to interconnect equipment through standard interfaces.
- Allow software and hardware to integrate well and be portable on different systems.

The OSI model has seven layers. The principles that were applied to arrive at the seven layers are as follows:

1. Each layer should perform a well-defined function.
2. The function of each layer should be chosen with an eye toward defining internationally standardized protocols.
3. The layer boundaries should be chosen to minimize the information flow across the interfaces.

The set of rules for communication between entities in a layer is called the protocol for that layer.

Q. 3 : Describe different types of Data Transmission Modes.

Ans. Data Transmission Modes :

The transmission of binary data across a link can be accomplished in either parallel or serial mode. In parallel mode, multiple bits are sent with each clock tick. In serial mode, 1 bit is sent with each clock tick. While there is only one way to send parallel data, there are three subclasses of serial transmission: asynchronous, synchronous, and isochronous.

Serial and Parallel

Serial Transmission :

In serial transmission one bit follows another, so we need only one communication channel rather than n to transmit data between two communicating devices.

The advantage of serial over parallel transmission is that, with only one communication channel, serial transmission reduces the cost of transmission over parallel by roughly a factor of n.

Since communication within devices is parallel, conversion devices are required at the interface between the sender and the line (parallel-to-serial) and between the line and the receiver (serial-to-parallel). Serial transmission occurs in one of three ways: asynchronous, synchronous, and isochronous.

Parallel Transmission :

Binary data, consisting of 1s and 0s, may be organized into groups of n bits each. Computers produce and consume data in groups of bits, much as we conceive of and use spoken language in the form of words rather than letters. By grouping, we can send data n bits at a time instead of 1. This is called parallel transmission.

The mechanism for parallel transmission is a simple one: use n wires to send n bits at one time. That way each bit has its own wire, and all n bits of one group can be transmitted with each clock tick from one device to another.

The advantage of parallel transmission is speed. All else being equal, parallel transmission can increase the transfer speed by a factor of n over serial transmission.

But there is a significant disadvantage: cost. Parallel transmission requires n communication lines just to transmit the data stream. Because this is expensive, parallel transmission is usually limited to short distances.

Simplex, Half-duplex and Full-duplex :

There are three modes of data transmission that correspond to the three types of circuits available. These are:

a) Simplex
b) Half-duplex
c) Full-duplex

Simplex :

Simplex communications imply a simple method of communicating, which they are. In simplex communication mode, there is one-way transmission only. Television transmission is a good example of simplex communication: the main transmitter sends out a signal (broadcast), but it does not expect a reply, as the receiving units cannot issue a reply back to the transmitter. Other examples are a data collection terminal on a factory floor, a line printer (receive only), and a keyboard attached to a computer, because the keyboard can only send data to the computer.

At first thought this might appear adequate for many types of application in which the flow of information is unidirectional. However, in almost all data processing applications, communication in both directions is required. Even for a "one-way" flow of information from a terminal to a computer, the system will be designed to allow the computer to signal the terminal that data has been received. Without this capability, the remote user might enter data and never know that it was not received by the other terminal. Hence, simplex circuits are seldom used, because a return path is generally needed to send acknowledgement, control or error signals.

Half-duplex :

In half-duplex mode, both units communicate over the same medium, but only one unit can send at a time. While one is in send mode, the other unit is in receive mode. It is like two polite people talking to each other – one talks, the other listens, but they do not talk at the same time. Thus, a half-duplex line can alternately send and receive data. It requires two wires. This is the most common type of transmission for voice communications, because only one person is supposed to speak at a time. It is also used to connect a terminal with a computer: the terminal might transmit data and then the computer responds with an acknowledgement. The transmission of data to and from a hard disk is also done in half-duplex mode.

Full-duplex :

In a half-duplex system, the line must be "turned around" each time the direction is reversed. This involves a special switching circuit and requires a small amount of time (approximately 150 milliseconds). With the high-speed capabilities of the computer, this turn-around time is unacceptable in many instances. Also, some applications require simultaneous transmission in both directions. In such cases, a full-duplex system is used that allows information to flow simultaneously in both directions on the transmission path. Use of a full-duplex line improves efficiency, as the line turn-around time required in a half-duplex arrangement is eliminated. It requires four wires.

Synchronous and Asynchronous Transmission :

Synchronous Transmission :

In synchronous transmission, the bit stream is combined into longer "frames", which may contain multiple bytes. Each byte, however, is introduced onto the transmission link without a gap between it and the next one. It is left to the receiver to separate the bit stream into bytes for decoding purposes. In other words, data are transmitted as an unbroken string of 1s and 0s, and the receiver separates that string into the bytes, or characters, it needs to reconstruct the information.

Without gaps and start and stop bits, there is no built-in mechanism to help the receiving device adjust its bit synchronization in midstream. Timing therefore becomes very important, because the accuracy of the received information is completely dependent on the ability of the receiving device to keep an accurate count of the bits as they come in.

The advantage of synchronous transmission is speed. With no extra bits or gaps to introduce at the sending end and remove at the receiving end, and, by extension, with fewer bits to move across the link, synchronous transmission is faster than asynchronous transmission of data from one computer to another. Byte synchronization is accomplished in the data link layer.


Asynchronous Transmission :

Asynchronous transmission is so named because the timing of a signal is unimportant. Instead, information is received and translated by agreed-upon patterns. As long as those patterns are followed, the receiving device can retrieve the information without regard to the rhythm in which it is sent. Patterns are based on grouping the bit stream into bytes. Each group, usually 8 bits, is sent along the link as a unit. The sending system handles each group independently, relaying it to the link whenever ready, without regard to a timer.

Without synchronization, the receiver cannot use timing to predict when the next group will arrive. To alert the receiver to the arrival of a new group, therefore, an extra bit is added to the beginning of each byte. This bit, usually a 0, is called the start bit. To let the receiver know that the byte is finished, 1 or more additional bits are appended to the end of the byte. These bits, usually 1s, are called stop bits.

By this method, each byte is increased in size to at least 10 bits, of which 8 bits are information and 2 bits or more are signals to the receiver. In addition, the transmission of each byte may then be followed by a gap of varying duration. This gap can be represented either by an idle channel or by a stream of additional stop bits.

The start and stop bits and the gap alert the receiver to the beginning and end of each byte and allow it to synchronize with the data stream. This mechanism is called asynchronous because, at the byte level, the sender and receiver do not have to be synchronized. But within each byte, the receiver must still be synchronized with the incoming bit stream. That is, some synchronization is required, but only for the duration of a single byte. The receiving device resynchronizes at the onset of each new byte. When the receiver detects a start bit, it sets a timer and begins counting bits as they come in. After n bits, the receiver looks for a stop bit. As soon as it detects the stop bit, it waits until it detects the next start bit.
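A rough Python sketch of the framing just described, assuming 8 data bits, one start bit (0), one stop bit (1) and least-significant-bit-first ordering (a common but not universal convention):

def frame_byte(byte):
    data_bits = [(byte >> i) & 1 for i in range(8)]   # LSB first
    return [0] + data_bits + [1]                      # start bit + data + stop bit

def deframe(bits):
    assert bits[0] == 0 and bits[9] == 1, "framing error"
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

for original in b"Hi":
    line = frame_byte(original)        # 10 bits on the line for 8 bits of data
    print(original, line, deframe(line))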


Isochronous Transmission :

In real-time audio and video, in which uneven delays between frames are not acceptable, synchronous transmission fails. For example, TV images are broadcast at the rate of 30 images per second, and they must be viewed at the same rate. If each image is sent using one or more frames, there should be no delays between frames. For this type of application, synchronization between characters is not enough; the entire stream of bits must be synchronized. Isochronous transmission guarantees that the data arrive at a fixed rate.

Q. 4 : Define Switching. What is the difference between Circuit Switching and Packet Switching?

Ans. Switching :

A network is a set of connected devices. Whenever we have multiple devices, we have the problem of how to connect them to make one-to-one communication possible. One of the better solutions is switching. A switched network consists of a series of interlinked nodes, called switches. Switches are devices capable of creating temporary connections between two or more devices linked to the switch. In a switched network, some of these nodes are connected to the end systems (computers or telephones); others are used only for routing. Switched networks are commonly divided into circuit-switched and packet-switched networks.

Difference between Circuit Switching and Packet Switching


Item                                Circuit Switching   Packet Switching
What is sent                        Voice               Message (divided into packets)
Call setup                          Required            Not required
Dedicated physical path             Yes                 No
Each packet follows the same route  Yes                 No
Packets arrive in order             Yes                 No
Is a switch crash fatal             Yes                 No
Bandwidth available                 Yes                 No
Time of possible congestion         At setup time       On every packet
Store-and-forward                   No                  Yes

Q.5 : Classify Guided Medium (wired). Compare Fiber Optics and Copper Wire.

Ans. Guided Transmission Media (wired) :

Guided media, which are those that provide a conduit from one device to another, include twisted-pair cable, coaxial cable, and fiber-optic cable. A signal traveling along any of these media is directed and contained by the physical limits of the medium. Twisted-pair and coaxial cable use metallic (copper) conductors that accept and transport signals in the form of electric current. Optical fiber is a cable that accepts and transports signals in the form of light.

1. Twisted Pair :

A twisted pair consists of two insulated copper wires, typically about 1 mm thick. The wires are twisted together in a helical form, just like a DNA molecule.


Twisting is done because two parallel wires constitute a fine antenna. When the wires are twisted, the waves from different twists cancel out, so the wire radiates less effectively.

The most common application of the twisted pair is the telephone system. All telephones are connected to the telco office by twisted pair. It can run several kilometers without amplification, but for longer distances repeaters are needed. If many wires are coming from one building or apartment, they are bundled together and encased in a protective sheath.

Twisted pairs can be used for transmitting either analog or digital signals. The bandwidth depends on the thickness of the wire and the distance traveled, but several megabits/sec can be achieved for a few kilometers. Due to their adequate performance and low cost, twisted pairs are widely used and are likely to remain so for years to come.

Twisted pair cabling comes in several varieties, two of which are important for computer networks:

1. Category 3 twisted pairs consist of two insulated wires gently twisted together. Four such pairs are typically grouped in a plastic sheath to protect the wires and keep them together. They are capable of handling signals with a bandwidth of 16 MHz. This scheme allowed up to four regular telephones or two multi-line telephones in each office to connect to the telephone company equipment in the wiring closet.

2. Category 5 twisted pairs are similar to category 3 pairs, but with more twists per centimeter, which results in less crosstalk and a better-quality signal over longer distances, making them more suitable for high-speed computer communication. They are capable of handling signals with a bandwidth of 100 MHz.

2. Coaxial Cable :


The coaxial cable consists of a stiff copper wire as the core, surrounded by an insulating material. The insulator is encased by a cylindrical conductor, often a closely-woven braided mesh. The outer conductor is covered in a protective plastic sheath.

The construction and shielding of the coaxial cable give it a good combination of high bandwidth and excellent noise immunity. The bandwidth possible depends on the cable quality, length, and signal-to-noise ratio of the data signal, but coaxial cables have bandwidths close to 1 GHz.

Coaxial cable was widely used within the telephone system for long-distance lines but has now been replaced by fiber optics. Coax is still widely used for cable television and metropolitan area networks.

3. Optical Fiber :

A fiber-optic cable is made of glass or plastic and transmits signals in the form of light. To understand optical fiber, we need to look at the components of an optical transmission system.

An optical transmission system has three key components:

1. The light source
2. The transmission medium
3. The detector

A pulse of light indicates a 1 bit and the absence of light indicates a 0 bit. The transmission medium is an ultra-thin fiber of glass. The detector generates an electrical pulse when light falls on it. By attaching a light source to one end of an optical fiber and a detector to the other, we have a unidirectional data transmission system that accepts an electrical signal, converts and transmits it as light pulses, and then reconverts the output to an electrical signal at the receiving end.

When light passes from one medium to another, for example from fused silica to air, the ray is refracted (bent) at the silica/air boundary. A light ray incident on the boundary at one angle emerges refracted at another angle, and the amount of refraction depends on the properties of the two media. For angles of incidence above a certain critical value, the light is refracted back into the silica; none of it escapes into the air. Thus, a light ray incident at or above the critical angle is trapped inside the fiber, and can propagate for many kilometers with virtually no loss.


When a light ray is trapped by total internal reflection in this way, many different rays will be bouncing around at different angles. Each ray is said to have a different mode, so a fiber having this property is called a multimode fiber.

If the fiber's diameter is reduced to a few wavelengths of light, the fiber acts like a wave guide, and the light can propagate only in a straight line, without bouncing, yielding a single-mode fiber.

Fiber optic cables are similar to coax. At the center is the glass core through which the light propagates. In multimode fibers, the core is typically 50 microns in diameter, about the thickness of a human hair. In single-mode fibers, the core is 8 to 10 microns.

The core is surrounded by a glass cladding with a lower index of refraction than the core, to keep all the light in the core. Next comes a thin plastic jacket to protect the cladding. Fibers are typically grouped in bundles, protected by an outer sheath.

Comparison of Fiber Optics and Copper Wire :

Fiber has many advantages over copper wire as a transmission medium. These are:

- It can handle much higher bandwidths than copper. Due to the low attenuation, repeaters are needed only about every 30 km on long lines, versus about every 5 km for copper.
- Fiber is not affected by power surges, electromagnetic interference, or power failures. Nor is it affected by corrosive chemicals in the air, making it ideal for harsh factory environments.
- Fiber is lighter than copper. One thousand twisted-pair copper cables 1 km long weigh about 8000 kg, whereas two fibers have more capacity and weigh only 100 kg, which greatly reduces the need for expensive mechanical support systems that must be maintained.
- Fibers do not leak light and are quite difficult to tap. This gives them excellent security against potential wire-tappers.
- When new routes are designed, fiber is the first choice because of its lower installation cost.

Q. 6 : What are different types of Satellites ?

Ans. Classification of Satellites :

Four different types of satellite orbits can be identified, depending on the shape and diameter of the orbit:

- GEO (Geostationary Orbit)
- LEO (Low Earth Orbit)
- MEO (Medium Earth Orbit) or ICO (Intermediate Circular Orbit)
- HEO (Highly Elliptical Orbit) – elliptical orbits

Van Allen belts: ionized particles at 2000–6000 km and 15000–30000 km above the earth's surface.

1. GEO (Geostationary Orbit) :

Altitude :

Ca. 36000 km. above earth surface.

Coverage :


Ideally suited for continuous, regional coverage using a single satellite. Can also be used equally effectively for global coverage using a minimum of three satellites.

Visibility :

Mobile-to-satellite visibility decreases with increased latitude of the user. Poor visibility in built-up, urban regions.

2. LEO (Low Earth Orbit) :

Altitude :

Ca. 500 – 1500 km.

Coverage :

Multi-satellite constellations of upwards of 30-50 satellites are required for global, continuous coverage. Single satellites can be used in store-and-forward mode for localized coverage, but only appear for short periods of time.

Visibility :

The use of satellite diversity, by which more than one satellite is visible at any given time, can be used to optimize the link. This can be achieved by either selecting the optimum link or combining the reception of two or more links. The higher the guaranteed minimum elevation angle to the user, the more satellites are needed in the constellation.

3. MEO (Medium Earth Orbit) :

Altitude :

Ca. 6000 – 20000 km.

Coverage :

Multi-satellite constellations of between 10 and 20 satellites are required for global coverage.

Visibility :

Good to excellent global visibility, augmented by the use of satellite diversity techniques.

4. HEO (Highly Elliptical Orbit) :

Altitude :

Apogee: 40,000–50,000 km; Perigee: 1,000–20,000 km.

Coverage :

Three or four satellites are needed to provide continuous coverage to a region.

Visibility :

Particularly designed to provide a high guaranteed elevation angle to the satellite for Northern and Southern temperate latitudes.


Assignment (Set-2) Subject code: MI0035

Computer Networks

Q.1 Write down the features of Fast Ethernet and Gigabit Ethernet.

Ans: Fast Ethernet

In computer networking, Fast Ethernet is a collective term for a number of Ethernet standards that carry traffic at the nominal rate of 100 Mbit/s, against the original Ethernet speed of 10 Mbit/s. Of the Fast Ethernet standards, 100BASE-TX is by far the most common and is supported by the vast majority of Ethernet hardware currently produced. Fast Ethernet was introduced in 1995[1] and remained the fastest version of Ethernet for three years before being superseded by gigabit Ethernet.[2]

A Fast Ethernet adapter can be logically divided into a Media Access Controller (MAC), which deals with the higher-level issues of medium availability, and a Physical Layer Interface (PHY). The MAC may be linked to the PHY by a 4-bit 25 MHz synchronous parallel interface known as a Media Independent Interface (MII), or a 2-bit 50 MHz variant, the Reduced Media Independent Interface (RMII). Repeaters (hubs) are also allowed and connect to multiple PHYs for their different interfaces.

The MII may (rarely) be an external connection but is usually a connection between ICs in a network adapter or even within a single IC. The specs are written based on the assumption that the interface between MAC and PHY will be a MII but they do not require it.


The MII fixes the theoretical maximum data bit rate for all versions of Fast Ethernet to 100 Mbit/s. The data signaling rate actually observed on real networks is less than the theoretical maximum, due to the necessary header and trailer (addressing and error-detection bits) on every frame, the occasional "lost frame" due to noise, and time waiting after each sent frame for other devices on the network to finish transmitting.

100BASE-TX is the predominant form of Fast Ethernet, and runs over two wire-pairs inside a category 5 or above cable (a typical category 5 cable contains 4 pairs and can therefore support two 100BASE-TX links). Like 10BASE-T, the proper pairs are the orange and green pairs (canonical second and third pairs) in TIA/EIA-568-B's termination standards, T568A or T568B. These pairs use pins 1, 2, 3 and 6.

In T568A and T568B, wires are in the order 1, 2, 3, 6, 4, 5, 7, 8 on the modular jack at each end. The color-order would be green/white, green, orange/white, blue, blue/white, orange, brown/white, brown for T568A, and orange/white, orange, green/white, blue, blue/white, green, brown/white, brown for T568B.

Each network segment can have a maximum distance of 100 metres (328 ft). In its typical configuration, 100BASE-TX uses one pair of twisted wires in each direction, providing 100 Mbit/s of throughput in each direction (full-duplex). See IEEE 802.3 for more details.

Gigabit Ethernet

Gigabit Ethernet (GbE or 1 GigE) is a term describing various technologies for transmitting Ethernet frames at a rate of a gigabit per second, as defined by the IEEE 802.3-2008 standard. Half-duplex gigabit links connected through hubs are allowed by the specification, but in the marketplace full-duplex operation with switches is the norm.

IEEE 802.3ab, ratified in 1999, defines gigabit Ethernet transmission over unshielded twisted pair (UTP) category 5, 5e, or 6 cabling and became known as 1000BASE-T. With the ratification of 802.3ab, gigabit Ethernet became a desktop technology as organizations could use their existing copper cabling infrastructure.

IEEE 802.3ah, ratified in 2004, added two more gigabit fiber standards, 1000BASE-LX10 (which was already widely implemented as a vendor-specific extension) and 1000BASE-BX10. These were part of a larger group of protocols known as Ethernet in the First Mile.

Initially, gigabit Ethernet was deployed in high-capacity backbone network links (for instance, on a high-capacity campus network). In 2000, Apple's Power Mac G4 and PowerBook G4 were the first mass produced personal computers featuring the 1000BASE-T connection.[1] It quickly became a built-in feature in many other computers. As of 2009 Gigabit NICs (1000BASE-T) are included in almost all desktop and server computer systems.

Higher-bandwidth 10 Gigabit Ethernet standards have since become available, as the IEEE ratified a fiber-based standard in 2002 and a twisted-pair standard in 2006. As of 2009, 10 Gigabit Ethernet is replacing Gigabit Ethernet as the backbone network technology and has begun to migrate down to high-end server systems.

Varieties

There are five different physical layer standards for gigabit Ethernet using optical fiber (1000BASE-X), twisted pair cable (1000BASE-T), or balanced copper cable (1000BASE-CX).

The IEEE 802.3z standard includes 1000BASE-SX for transmission over multi-mode fiber, 1000BASE-LX for transmission over single-mode fiber, and the nearly obsolete 1000BASE-CX for transmission over balanced copper cabling. These standards use 8b/10b encoding, which inflates the line rate by 25%, from 1,000 Mbit/s to 1,250 Mbit/s, to ensure a DC-balanced signal. The symbols are then sent using NRZ.

IEEE 802.3ab, which defines the widely used 1000BASE-T interface type, uses a different encoding scheme in order to keep the symbol rate as low as possible, allowing transmission over twisted pair.


Ethernet in the First Mile later added 1000BASE-LX10 and -BX10.

Q.2 Differentiate the working between pure ALOHA and slotted ALOHA.

Ans: ALOHA :

ALOHA is a medium access protocol that was originally designed for ground-based radio broadcasting; however, it is applicable to any system in which uncoordinated users compete for the use of a shared channel. Pure ALOHA and slotted ALOHA are the two versions of ALOHA.

Pure ALOHA uses a very simple idea: let users transmit whenever they have data to send. Pure ALOHA relies on feedback, which enables a sender to listen to the channel and find out whether its frame was destroyed by a collision. Feedback is immediate in LANs, but there is a delay of about 270 ms over a satellite link. If listening to the channel is not possible for some reason, acknowledgments are required. Pure ALOHA provides a channel utilization of only about 18 percent, which is not appealing, but it offers the advantage of being able to transmit at any time.

Slotted ALOHA divides time into discrete intervals, with each interval corresponding to one frame. It requires users to agree on slot boundaries. A station cannot transmit at an arbitrary time; instead, it has to wait for the beginning of the next slot.
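The efficiency difference between the two schemes is usually summarized by the classical throughput formulas S = G * e^(-2G) for pure ALOHA and S = G * e^(-G) for slotted ALOHA, where G is the offered load in frames per frame time. The short Python sketch below (an illustration added here, not part of any ALOHA implementation) evaluates both curves and confirms peak utilizations of roughly 18% and 37%:

import math

def pure_aloha_throughput(G):
    # Pure ALOHA: S = G * e^(-2G); a frame collides with anything sent in a 2-frame window.
    return G * math.exp(-2 * G)

def slotted_aloha_throughput(G):
    # Slotted ALOHA: S = G * e^(-G); collisions only occur within the same slot.
    return G * math.exp(-G)

loads = [i / 100 for i in range(1, 301)]                    # offered load G from 0.01 to 3.00
best_pure = max(loads, key=pure_aloha_throughput)           # peak is near G = 0.5
best_slotted = max(loads, key=slotted_aloha_throughput)     # peak is near G = 1.0
print("pure ALOHA   : S =", round(pure_aloha_throughput(best_pure), 3))        # about 0.184
print("slotted ALOHA: S =", round(slotted_aloha_throughput(best_slotted), 3))  # about 0.368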

Q.3 Write down distance vector algorithm. Explain path vector protocol.

Ans: Distance Vector Algorithms

Routing is the task of finding a path from a sender to a desired destination. In the IP "Catenet model" this reduces primarily to a matter of finding gateways between networks. As long as a message remains on a single network or subnet, any routing problems are solved by technology that is specific to the network. For example, the Ethernet and the ARPANET each define a way in which any sender can talk to any specified destination within that one network. IP routing comes in primarily when messages must go from a sender on one such network to a destination on a different one. In that case, the message must pass through gateways connecting the networks. If the networks are not adjacent, the message may pass through several intervening networks, and the gateways connecting them. Once the message gets to a gateway that is on the same network as the destination, that network's own technology is used to get to the destination.

Throughout this section, the term "network" is used generically to cover a single broadcast network (e.g., an Ethernet), a point to point line, or the ARPANET. The critical point is that a network is treated as a single entity by IP. Either no routing is necessary (as with a point to point line), or that routing is done in a manner that is transparent to IP, allowing IP to treat the entire network as a single fully-connected system (as with an Ethernet or the ARPANET). Note that the term "network" is used in a somewhat different way in discussions of IP addressing. A single IP network number may be assigned to a collection of networks, with "subnet" addressing being used to describe the individual networks. In effect, we are using the term "network" here to refer to subnets in cases where subnet addressing is in use.

A number of different approaches for finding routes between networks are possible. One useful way of categorizing these approaches is on the basis of the type of information the gateways need to exchange in order to be able to find routes. Distance vector algorithms are based on the exchange of only a small amount of information. Each entity (gateway or host) that participates in the routing protocol is assumed to keep information about all of the destinations within the system. Generally, information about all entities connected to one network is summarized by a single entry, which describes the route to all destinations on that network. This summarization is possible because as far as IP is concerned, routing within a network is invisible. Each entry in this routing database includes the next gateway to which datagrams destined for the entity should be sent. In addition, it includes a "metric" measuring the total distance to the entity. Distance is a somewhat generalized concept, which may cover the time delay in getting messages to the entity, the dollar cost of sending messages to it, etc. Distance vector algorithms get their name from the fact that it is possible to compute optimal routes when the only information exchanged is the list of these distances. Furthermore, information is only exchanged among entities that are adjacent, that is, entities that share a common network.

Although routing is most commonly based on information about networks, it is sometimes necessary to keep track of the routes to individual hosts. The RIP protocol makes no formal distinction between networks and hosts. It simply describes exchange of information about destinations, which may be either networks or hosts. (Note however, that it is possible for an implementor to choose not to support host routes. See section 3.2.) In fact, the mathematical developments are most conveniently thought of in terms of routes from one host or gateway to another. When discussing the algorithm in abstract terms, it is best to think of a routing entry for a network as an abbreviation for routing entries for all of the entities connected to that network. This sort of abbreviation makes sense only because we think of networks as having no internal structure that is visible at the IP level. Thus, we will generally assign the same distance to every entity in a given network.

We said above that each entity keeps a routing database with one entry for every possible destination in the system. An actual implementation is likely to need to keep the following information about each destination:

- address: in IP implementations of these algorithms, this will be the IP address of the host or network.
- gateway: the first gateway along the route to the destination.
- interface: the physical network which must be used to reach the first gateway.
- metric: a number, indicating the distance to the destination.
- timer: the amount of time since the entry was last updated.

In addition, various flags and other internal information will probably be included. This database is initialized with a description of the entities that are directly connected to the system. It is updated according to information received in messages from neighboring gateways.

The most important information exchanged by the hosts and gateways is that carried in update messages. Each entity that participates in the routing scheme sends update messages that describe the routing database as it currently exists in that entity. It is possible to maintain optimal routes for the entire system by using only information obtained from neighboring entities. The algorithm used for that will be described in the next section.

As we mentioned above, the purpose of routing is to find a way to get datagrams to their ultimate destinations. Distance vector algorithms are based on a table giving the best route to every destination in the system. Of course, in order to define which route is best, we have to have some way of measuring goodness. This is referred to as the "metric".

In simple networks, it is common to use a metric that simply counts how many gateways a message must go through. In more complex networks, a metric is chosen to represent the total amount of delay that the message suffers, the cost of sending it, or some other quantity which may be minimized. The main requirement is that it must be possible to represent the metric as a sum of "costs" for individual hops.

Formally, if it is possible to get from entity i to entity j directly (i.e., without passing through another gateway between), then a cost, d(i,j), is associated with the hop between i and j. In the normal case where all entities on a given network are considered to be the same, d(i,j) is the same for all destinations on a given network, and represents the cost of using that network. To get the metric of a complete route, one just adds up the costs of the individual hops that make up the route. For the purposes of this memo, we assume that the costs are positive integers.

Let D(i,j) represent the metric of the best route from entity i to entity j. It should be defined for every pair of entities. d(i,j) represents the costs of the individual steps. Formally, let d(i,j) represent the cost of going directly from entity i to entity j. It is infinite if i and j are not immediate neighbors. (Note that d(i,i) is infinite. That is, we don't consider there to be a direct connection from a node to itself.) Since costs are additive, it is easy to show that the best metric must be described by

D(i,i) = 0,                               for all i
D(i,j) = min over k of [d(i,k) + D(k,j)], otherwise


and that the best routes start by going from i to those neighbors k for which d(i,k) + D(k,j) has the minimum value. (These things can be shown by induction on the number of steps in the routes.) Note that we can limit the second equation to k's that are immediate neighbors of i. For the others, d(i,k) is infinite, so the term involving them can never be the minimum.

It turns out that one can compute the metric by a simple algorithm based on this. Entity i gets its neighbors k to send it their estimates of their distances to the destination j. When i gets the estimates from k, it adds d(i,k) to each of the numbers. This is simply the cost of traversing the network between i and k. Now and then i compares the values from all of its neighbors and picks the smallest.

A proof is given in [2] that this algorithm will converge to the correct estimates of D(i,j) in finite time in the absence of topology changes. The authors make very few assumptions about the order in which the entities send each other their information, or when the min is recomputed. Basically, entities just can't stop sending updates or recomputing metrics, and the networks can't delay messages forever. (Crash of a routing entity is a topology change.) Also, their proof does not make any assumptions about the initial estimates of D(i,j), except that they must be non-negative. The fact that these fairly weak assumptions are good enough is important. Because we don't have to make assumptions about when updates are sent, it is safe to run the algorithm asynchronously. That is, each entity can send updates according to its own clock. Updates can be dropped by the network, as long as they don't all get dropped. Because we don't have to make assumptions about the starting condition, the algorithm can handle changes. When the system changes, the routing algorithm starts moving to a new equilibrium, using the old one as its starting point. It is important that the algorithm will converge in finite time no matter what the starting point. Otherwise certain kinds of changes might lead to non-convergent behavior.

The statement of the algorithm given above (and the proof) assumes that each entity keeps copies of the estimates that come from each of its neighbors, and now and then does a min over all of the neighbors. In fact real implementations don't necessarily do that. They simply remember the best metric seen so far, and the identity of the neighbor that sent it. They replace this information whenever they see a better (smaller) metric. This allows them to compute the minimum incrementally, without having to store data from all of the neighbors.

There is one other difference between the algorithm as described in texts and those used in real protocols such as RIP: the description above would have each entity include an entry for itself, showing a distance of zero. In fact this is not generally done. Recall that all entities on a network are normally summarized by a single entry for the network. Consider the situation of a host or gateway G that is connected to network A. C represents the cost of using network A (usually a metric of one). (Recall that we are assuming that the internal structure of a network is not visible to IP, and thus the cost of going between any two entities on it is the same.) In principle, G should get a message from every other entity H on network A, showing a cost of 0 to get from that entity to itself. G would then compute C + 0 as the distance to H. Rather than having G look at all of these identical messages, it simply starts out by making an entry for network A in its table, and assigning it a metric of C. This entry for network A should be thought of as summarizing the entries for all other entities on network A. The only entity on A that can't be summarized by that common entry is G itself, since the cost of going from G to G is 0, not C. But since we never need those 0 entries, we can safely get along with just the single entry for network A. Note one other implication of this strategy: because we don't need to use the 0 entries for anything, hosts that do not function as gateways don't need to send any update messages. Clearly hosts that don't function as gateways (i.e., hosts that are connected to only one network) can have no useful information to contribute other than their own entry D(i,i) = 0. As they have only the one interface, it is easy to see that a route to any other network through them will simply go in that interface and then come right back out it. Thus the cost of such a route will be greater than the best cost by at least C. Since we don't need the 0 entries, non- gateways need not participate in the routing protocol at all.

Let us summarize what a host or gateway G does. For each destination in the system, G will keep a current estimate of the metric for that destination (i.e., the total cost of getting to it) and the identity of the neighboring gateway on whose data that metric is based. If the destination is on a network that is directly connected to G, then G simply uses an entry that shows the cost of using the network, and the fact that no gateway is needed to get to the destination. It is easy to show that once the computation has converged to the correct metrics, the neighbor that is recorded by this technique is in fact the first gateway on the path to the destination. (If there are several equally good paths, it is the first gateway on one of them.) This combination of destination, metric, and gateway is typically referred to as a route to the destination with that metric, using that gateway.


The method so far only has a way to lower the metric, as the existing metric is kept until a smaller one shows up. It is possible that the initial estimate might be too low. Thus, there must be a way to increase the metric. It turns out to be sufficient to use the following rule: suppose the current route to a destination has metric D and uses gateway G. If a new set of information arrived from some source other than G, only update the route if the new metric is better than D. But if a new set of information arrives from G itself, always update D to the new value. It is easy to show that with this rule, the incremental update process produces the same routes as a calculation that remembers the latest information from all the neighbors and does an explicit minimum. (Note that the discussion so far assumes that the network configuration is static. It does not allow for the possibility that a system might fail.)

To summarize, here is the basic distance vector algorithm as it has been developed so far. (Note that this is not a statement of the RIP protocol. There are several refinements still to be added.) The following procedure is carried out by every entity that participates in the routing protocol. This must include all of the gateways in the system. Hosts that are not gateways may participate as well.

- Keep a table with an entry for every possible destination in the system. The entry contains the distance D to the destination, and the first gateway G on the route to that network. Conceptually, there should be an entry for the entity itself, with metric 0, but this is not actually included.
- Periodically, send a routing update to every neighbor. The update is a set of messages that contain all of the information from the routing table. It contains an entry for each destination, with the distance shown to that destination.
- When a routing update arrives from a neighbor G', add to each entry the cost associated with the network that is shared with G'. (This should be the network over which the update arrived.) Call the resulting distance D'. Compare the resulting distances with the current routing table entries. If the new distance D' for N is smaller than the existing value D, adopt the new route. That is, change the table entry for N to have metric D' and gateway G'. If G' is the gateway from which the existing route came, i.e., G' = G, then use the new metric even if it is larger than the old one.
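To make the update rule concrete, here is a minimal Python sketch of the distance vector relaxation on a hypothetical three-node topology (the node names and link costs are invented for the example; real RIP adds split horizon, timers, and a maximum metric that are omitted here):

import math

INF = math.inf
nodes = ["A", "B", "C"]
# Hypothetical direct link costs d(i, j); absent pairs are not neighbors.
cost = {("A", "B"): 1, ("B", "A"): 1,
        ("B", "C"): 2, ("C", "B"): 2,
        ("A", "C"): 5, ("C", "A"): 5}

def neighbors(i):
    return [j for j in nodes if (i, j) in cost]

# Each entity starts knowing only itself: D(i,i) = 0, everything else infinite.
table = {i: {j: (0 if i == j else INF) for j in nodes} for i in nodes}

changed = True
while changed:                                # keep exchanging updates until nothing improves
    changed = False
    for i in nodes:
        for k in neighbors(i):                # an update arrives from neighbor k
            for j in nodes:                   # destination advertised by k
                candidate = cost[(i, k)] + table[k][j]   # D' = d(i,k) + D(k,j)
                if candidate < table[i][j]:   # adopt the better route
                    table[i][j] = candidate
                    changed = True

print(table["A"])   # {'A': 0, 'B': 1, 'C': 3} -- A reaches C through B rather than directly

The sketch only tracks metrics; a real implementation would also record the gateway G' for each entry and apply the "always accept updates from the current gateway" rule described above.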

A path vector protocol is a computer network routing protocol which maintains the path information that gets updated dynamically. Updates which have looped through the network and returned to the same node are easily detected and discarded. This algorithm is sometimes used in Bellman–Ford routing algorithms to avoid "Count to Infinity" problems.

It is different from the distance vector routing and link state routing. Each entry in the routing table contains the destination network, the next router and the path to reach the destination.

Path Vector Messages in BGP: The autonomous system boundary routers (ASBR), which participate in path vector routing, advertise the reachability of networks. Each router that receives a path vector message must verify that the advertised path is according to its policy. If the messages comply with the policy, the ASBR modifies its routing table and the message before sending it to the next neighbor. In the modified message it sends its own AS number and replaces the next router entry with its own identification.

BGP is an example of a path vector protocol. In BGP the routing table maintains the autonomous systems that are traversed in order to reach the destination system. Exterior Gateway Protocol (EGP) does not use path vectors.

Path vector protocols are a class of distance vector protocol, in contrast to link state protocols.

Q.4 State the working principle of TCP segment header and UDP header.

Ans: Transmission Control Protocol:

Transmission Control Protocol, or TCP as it is commonly referred to, is a transport-layer protocol that runs on top of IP. TCP is a connection-oriented, end-to-end reliable protocol designed to fit into a layered hierarchy of protocols which support multi-network applications. The TCP provides for reliable inter-process communication between pairs of processes in host computers attached to distinct but interconnected computer communication networks. Very few assumptions are made as to the reliability of the communication protocols below the TCP layer. TCP assumes it can obtain a simple, potentially unreliable datagram service from the lower level protocols. In principle, the TCP should be able to operate above a wide spectrum of communication systems ranging from hard-wired connections to packet-switched or circuit-switched networks.

TCP was specifically designed to be a reliable end-to-end byte stream transmission protocol over an unreliable network. The IP layer does not provide any guarantees that datagrams will be delivered with any degree of reliability. Hence it is up to the upper-layer protocol to provide this reliability. The key functionality associated with TCP is basic data transfer.

Basic Data Transfer. From an application perspective, TCP transfers a contiguous stream of bytes through the network. The application does not have to bother with chopping the data into basic blocks or datagrams. TCP does this by grouping the bytes in TCP segments, which are passed to IP for transmission to the destination.

Reliability. TCP assigns a sequence number to each byte transmitted and expects a positive acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data are retransmitted. Since the data are transmitted in blocks (TCP segments), only the sequence number of the first data byte in the segment is sent to the destination host.

Flow Control. The receiving TCP, when sending an ACK back to the sender, also indicates to the sender the number of bytes it can receive beyond the last received TCP segment, without causing overrun and overflow in its internal buffers. This is sent in the ACK in the form of the highest sequence number it can receive without problems.

Multiplexing. Multiplexing is achieved through the concept of ports. A port is a 16-bit number used by the host-to-host protocol to identify to which higher-level protocol or application process it must deliver incoming messages. Two types of ports exist: (1) Well-known: these ports belong to standard applications servers such as telnet, ftp, and http. The well-known ports are controlled and assigned by the Internet Assigned Numbers Authority (IANA). Well-known ports range from 1 to 1023. (2) Ephemeral: A client can negotiate the use of a port dynamically and such ports can be called ephemeral. These ports are maintained for the duration of the session and then released. Ephemeral ports range from 1024 to 65535. Multiple applications can use the ports as a means of multiplexing for communicating with other nodes.

Connections. The reliability and flow control mechanisms require that TCP initializes and maintains certain status information for each data stream. The combination of this status, including sockets, sequence numbers, and window sizes, is called a logical connection. Each connection is uniquely identified by the pair of sockets used by the sending and receiving processes.

TCP entities exchange data in the form of segments. A segment consists of a fixed 20-byte header and an optional part followed by zero or more data bytes.

The fields in the TCP header are described as follows:

Source Port and Destination Port: These fields identify the local endpoints of a connection. Each TCP entity decides how to allocate its own ports. A number of well-known ports are reserved for specific applications (e.g., FTP).

Sequence and Acknowledgment Number: The sequence number identifies the position of the segment's data in the sender's byte stream. The acknowledgment number specifies the next byte expected, not the last byte correctly received.

TCP Header Length: Indicates how many 32-bit words are contained in the TCP header. This is required because of the Options field, which is of variable length.

Reserved: For future use.

The six 1-bit flags are as follows:


URG— Set to 1 if the Urgent pointer is in use.

ACK— Set to 1 to indicate that the Acknowledgment number is valid.

PSH— Indicates PUSHed data. The receiver is requested to deliver the data to the application and not buffer it until a full buffer has been received.

RST— Used to reset a connection.

SYN— Used to establish connections.

FIN— Used to release a connection.

Window Size: This field tells how many bytes may be sent starting at the byte acknowledged. Flow control in TCP is handled using a variable-size sliding window.

Checksum: Provided for reliability. It checksums the header and the data (and the pseudoheader when applicable). While computing the checksum, the Checksum field itself is replaced with zeros.

Urgent Pointer: Used to indicate a byte offset from the current sequence number at which urgent data are to be found.

Options: This field was designed to provide a way to add extra facilities not covered by the regular header.
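As a purely illustrative view of this 20-byte layout, the following Python sketch packs and unpacks a minimal TCP header with the standard struct module; the port, sequence, and window values are invented for the example, and the checksum is left at zero rather than computed over the pseudo-header:

import struct

# 20-byte TCP header: source port, destination port, sequence number, acknowledgment
# number, data offset/reserved, flags, window size, checksum, urgent pointer.
TCP_HEADER_FORMAT = "!HHIIBBHHH"
SYN = 0x02                                   # the SYN flag bit described above

def build_tcp_header(src_port, dst_port, seq, ack, flags, window):
    data_offset = 5 << 4                     # header length of 5 x 32-bit words (no options)
    checksum = 0                             # a real stack computes this over header, data and pseudo-header
    urgent_pointer = 0
    return struct.pack(TCP_HEADER_FORMAT, src_port, dst_port, seq, ack,
                       data_offset, flags, window, checksum, urgent_pointer)

header = build_tcp_header(src_port=49152, dst_port=80, seq=1000, ack=0,
                          flags=SYN, window=65535)
print(len(header))                           # 20
print(struct.unpack(TCP_HEADER_FORMAT, header))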

TCP has been the workhorse of the Internet, and a significant portion of Internet traffic today is carried via TCP. The reliability and congestion control aspects of TCP make it ideally suited for a large number of applications. TCP is formally defined in RFC 793. RFC 1122 provides some clarification and bug fixes, and a few extensions are defined in RFC 1323.

User Datagram Protocol

User Datagram Protocol (UDP) is a connectionless transport protocol. UDP is basically an application interface to IP. It adds no reliability, flow control, or error recovery to IP. It simply serves as a multiplexer/demultiplexer for sending and receiving datagrams, using ports to direct the datagrams. UDP is a lightweight protocol with very minimal overhead. The responsibility for recovering from errors, retransmission, etc., rests with the application. Applications that need to communicate must identify a target more specifically than simply by its IP address; UDP provides this function via the concept of ports. The fields of the UDP header are described below.

The following is a description of the fields of the UDP header:

Source and Destination Port: The two ports serve the same function as in TCP; they identify the endpoints within the source and destination nodes.

UDP Length: This field includes the 8-byte UDP header and the data.

UDP Checksum: The checksum is computed over the UDP header, a pseudo-header derived from the IP header (source and destination addresses, protocol, and UDP length), and the data.

Although UDP does not implement flow control or reliable/ordered delivery, it does a little more work than simply demultiplexing messages to some application: it ensures the correctness of the message via the checksum. UDP uses the same checksum algorithm as IP. UDP is described in RFC 768.
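That shared algorithm is the 16-bit one's-complement sum. A minimal sketch of the calculation in Python is shown below; it illustrates the algorithm only and is applied to a made-up byte string rather than a real datagram:

def internet_checksum(data):
    # 16-bit one's-complement sum used by IP, UDP and TCP.
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add each 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low 16 bits
    return ~total & 0xFFFF                        # return the one's complement of the sum

sample = bytes.fromhex("0035d4310026")            # hypothetical header words
print(hex(internet_checksum(sample)))             # value the sender would place in the checksum field

A receiver repeats the same sum over the received data including the checksum field; a result of zero indicates that no detectable corruption occurred.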


Q.5 What is IP addressing? Discuss different classes of IP Addressing.

Ans: IP Addressing: An Internet Protocol address (IP address) is a numerical label assigned to each device (e.g. computer, printer) participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing. Its role has been characterized as follows: "A name indicates what we seek. An address indicates where it is. A route indicates how to get there."

The designers of computer network communication protocols defined an IP address as a 32-bit number and this system, known as Internet Protocol Version 4 (IPv4), is still in use today. However, due to the enormous growth of the Internet and the predicted depletion of available addresses, a new addressing system (IPv6), using 128 bits for the address, was developed in 1995, standardized in 1998, and is now being deployed world-wide.

Although IP addresses are stored as binary numbers, they are usually displayed in human-readable notations, such as 172.16.254.1 (for IPv4), and 2001:db8:0:1234:0:567:1:1 (for IPv6).

The Internet Assigned Numbers Authority (IANA) manages the IP address space allocations globally and cooperates with five regional Internet registries (RIRs) to allocate IP address blocks to local Internet registries (Internet service providers) and other entities.

IP addresses were originally organized into classes. The address class determined the potential size of the network.

The class of an address specified which bits were used to identify the network (the network ID) and which bits were used to identify the host computer (the host ID). It also defined the total number of possible hosts and subnets per network. There were five classes of IP addresses: classes A through E.

Classful addressing is no longer in common usage and has now been replaced with classless addressing. Any netmask can now be assigned to any IP address range.

The four octets that make up an IP address are conventionally represented by a, b, c, and d respectively. The following table shows how the octets are distributed in classes A, B, and C.

Class   IP Address   Network ID   Host ID
A       a.b.c.d      a            b.c.d
B       a.b.c.d      a.b          c.d
C       a.b.c.d      a.b.c        d

Class A: Class A addresses are assigned to networks with a very large number of hosts. Class A allows for 126 networks by using the first octet for the network ID. The first bit of this octet is always fixed to zero, and the remaining seven bits identify the network. The 24 bits in the remaining three octets represent the host ID, allowing approximately 17 million hosts per network. Class A network number values begin at 1 and end at 126 (127 is reserved for loopback).

Class B: Class B addresses are assigned to medium-to-large sized networks. Class B allows for 16,384 networks by using the first two octets for the network ID. The first two bits of the first octet are always fixed to 1 0. The remaining 6 bits, together with the second octet, complete the network ID. The 16 bits in the third and fourth octets represent the host ID, allowing for approximately 65,000 hosts per network. Class B network number values begin at 128 and end at 191.


Class C: Class C addresses are used in small local area networks (LANs). Class C allows for approximately 2 million networks by using the first three octets for the network ID. The first three bits of the first octet are always fixed to 1 1 0, and the remaining 21 bits of the first three octets complete the network ID. The 8 bits of the last octet represent the host ID, allowing for 254 hosts per network. Class C network number values begin at 192 and end at 223.

Class D and E: Classes D and E are not allocated to hosts. Class D addresses are used for multicasting, and class E addresses are not available for general use: they are reserved for future purposes.
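Since the class of a classful address is determined entirely by its leading bits, it can be read off from the first octet. The small Python helper below (illustrative only) captures the ranges described above:

def ip_class(address):
    # Return the historical class (A-E) of a dotted-quad IPv4 address.
    first_octet = int(address.split(".")[0])
    if 0 <= first_octet <= 127:
        return "A"      # leading bit 0 (0 and 127 are reserved / loopback)
    if 128 <= first_octet <= 191:
        return "B"      # leading bits 1 0
    if 192 <= first_octet <= 223:
        return "C"      # leading bits 1 1 0
    if 224 <= first_octet <= 239:
        return "D"      # leading bits 1 1 1 0, multicast
    if 240 <= first_octet <= 255:
        return "E"      # leading bits 1 1 1 1, reserved
    raise ValueError("not a valid IPv4 first octet")

print(ip_class("10.1.2.3"))        # A
print(ip_class("172.16.254.1"))    # B
print(ip_class("192.168.1.1"))     # C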

Q..6 Define Cryptography. Discuss two cryptographic techniques.

Ans: Cryptography Definition :

The practice and study of encryption and decryption - encoding data so that it can only be decoded by specific individuals. A system for encrypting and decrypting data is a cryptosystem. These usually involve an algorithm for combining the original data ("plaintext") with one or more "keys" - numbers or strings of characters known only to the sender and/or recipient. The resulting output is known as "ciphertext".

The security of a cryptosystem usually depends on the secrecy of some of the keys rather than on the supposed secrecy of the algorithm. A strong cryptosystem has a large range of possible keys, so that it is not possible to simply try all possible keys (a "brute force" approach). A strong cryptosystem will produce ciphertext which appears random to all standard statistical tests. A strong cryptosystem will resist all previously known methods for breaking codes ("cryptanalysis").

Symmetric-key cryptography

Symmetric-key cryptography refers to encryption methods in which both the sender and receiver share the same key (or, less commonly, in which their keys are different, but related in an easily computable way). This was the only kind of encryption publicly known until June 1976.[12]

The modern study of symmetric-key ciphers relates mainly to the study of block ciphers and stream ciphers and to their applications. A block cipher is, in a sense, a modern embodiment of Alberti's polyalphabetic cipher: block ciphers take as input a block of plaintext and a key, and output a block of ciphertext of the same size. Since messages are almost always longer than a single block, some method of knitting together successive blocks is required. Several have been developed, some with better security in one aspect or another than others. They are the modes of operation and must be carefully considered when using a block cipher in a cryptosystem.

The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are block cipher designs which have been designated cryptography standards by the US government (though DES's designation was finally withdrawn after the AES was adopted).[14] Despite its deprecation as an official standard, DES (especially its still-approved and much more secure triple-DES variant) remains quite popular; it is used across a wide range of applications, from ATM encryption[15] to e-mail privacy[16] and secure remote access.[17] Many other block ciphers have been designed and released, with considerable variation in quality. Many have been thoroughly broken; see Category:Block ciphers.[13][18]

Stream ciphers, in contrast to the 'block' type, create an arbitrarily long stream of key material, which is combined with the plaintext bit-by-bit or character-by-character, somewhat like the one-time pad. In a stream cipher, the output stream is created based on a hidden internal state which changes as the cipher operates. That internal state is initially set up using the secret key material. RC4 is a widely used stream cipher; see Category:Stream ciphers.[13] Block ciphers can be used as stream ciphers; see Block cipher modes of operation.
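As a hedged, practical illustration of symmetric-key encryption, the sketch below uses the third-party Python cryptography package (assumed to be installed, e.g. via pip install cryptography); its Fernet recipe wraps an AES block cipher together with an authentication check, and the same secret key is used to encrypt and to decrypt:

from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the shared secret; both parties must hold this key
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"transfer 100 units to account 42")   # plaintext -> ciphertext
plaintext = cipher.decrypt(ciphertext)                              # ciphertext -> plaintext

print(ciphertext)                    # opaque token, useless without the key
print(plaintext)                     # b'transfer 100 units to account 42'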


Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length as input, and output a short, fixed length hash which can be used in (for example) a digital signature. For good hash functions, an attacker cannot find two messages that produce the same hash. MD4 is a long-used hash function which is now broken; MD5, a strengthened variant of MD4, is also widely used but broken in practice. The U.S. National Security Agency developed the Secure Hash Algorithm series of MD5-like hash functions: SHA-0 was a flawed algorithm that the agency withdrew; SHA-1 is widely deployed and more secure than MD5, but cryptanalysts have identified attacks against it; the SHA-2 family improves on SHA-1, but it isn't yet widely deployed, and the U.S. standards authority thought it "prudent" from a security perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash algorithm toolkit."[19] Thus, a hash function design competition is underway and meant to select a new U.S. national standard, to be called SHA-3, by 2012.
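The fixed-length, collision-resistant behaviour described above is easy to observe with Python's standard hashlib module; in the sketch below a one-character change in the message produces a completely different digest of the same length:

import hashlib

digest1 = hashlib.sha256(b"pay Alice 10 dollars").hexdigest()
digest2 = hashlib.sha256(b"pay Alice 100 dollars").hexdigest()

print(len(digest1), len(digest2))    # both are 64 hex characters (256 bits), whatever the input size
print(digest1 == digest2)            # False: a tiny change in the message changes the whole hash
print(digest1)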

Message authentication codes (MACs) are much like cryptographic hash functions, except that a secret key can be used to authenticate the hash value[13] upon receipt.

Public-key cryptography

Symmetric-key cryptosystems use the same key for encryption and decryption of a message, though a message or group of messages may have a different key than others. A significant disadvantage of symmetric ciphers is the key management necessary to use them securely. Each distinct pair of communicating parties must, ideally, share a different key, and perhaps each ciphertext exchanged as well. The number of keys required increases as the square of the number of network members, which very quickly requires complex key management schemes to keep them all straight and secret. The difficulty of securely establishing a secret key between two communicating parties, when a secure channel does not already exist between them, also presents a chicken-and-egg problem which is a considerable practical obstacle for cryptography users in the real world.

In a groundbreaking 1976 paper, Whitfield Diffie and Martin Hellman proposed the notion of public-key (also, more generally, called asymmetric key) cryptography in which two different but mathematically related keys are used—a public key and a private key. A public key system is so constructed that calculation of one key (the 'private key') is computationally infeasible from the other (the 'public key'), even though they are necessarily related. Instead, both keys are generated secretly, as an interrelated pair.[21] The historian David Kahn described public-key cryptography as "the most revolutionary new concept in the field since polyalphabetic substitution emerged in the Renaissance".

In public-key cryptosystems, the public key may be freely distributed, while its paired private key must remain secret. The public key is typically used for encryption, while the private or secret key is used for decryption. Diffie and Hellman showed that public-key cryptography was possible by presenting the Diffie–Hellman key exchange protocol.
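The key exchange itself can be sketched with Python's built-in modular exponentiation. The numbers below are deliberately tiny, textbook values chosen only to show the algebra (a real exchange uses primes of 2048 bits or more):

import secrets

p = 23                                # toy public prime modulus (far too small for real use)
g = 5                                 # public generator

a = secrets.randbelow(p - 2) + 1      # Alice's private exponent
b = secrets.randbelow(p - 2) + 1      # Bob's private exponent

A = pow(g, a, p)                      # Alice publishes g^a mod p
B = pow(g, b, p)                      # Bob publishes g^b mod p

shared_alice = pow(B, a, p)           # (g^b)^a mod p
shared_bob = pow(A, b, p)             # (g^a)^b mod p
print(shared_alice == shared_bob)     # True: both sides derive the same shared secret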

In 1978, Ronald Rivest, Adi Shamir, and Len Adleman invented RSA, another public-key system.

In 1997, it finally became publicly known that asymmetric key cryptography had been invented by James H. Ellis at GCHQ, a British intelligence organization, and that, in the early 1970s, both the Diffie–Hellman and RSA algorithms had been previously developed (by Malcolm J. Williamson and Clifford Cocks, respectively).

The Diffie–Hellman and RSA algorithms, in addition to being the first publicly known examples of high quality public-key algorithms, have been among the most widely used. Others include the Cramer–Shoup cryptosystem, ElGamal encryption, and various elliptic curve techniques. See Category:Asymmetric-key cryptosystems.


In addition to encryption, public-key cryptography can be used to implement digital signature schemes. A digital signature is reminiscent of an ordinary signature; they both have the characteristic that they are easy for a user to produce, but difficult for anyone else to forge. Digital signatures can also be permanently tied to the content of the message being signed; they cannot then be 'moved' from one document to another, for any attempt will be detectable. In digital signature schemes, there are two algorithms: one for signing, in which a secret key is used to process the message (or a hash of the message, or both), and one for verification, in which the matching public key is used with the message to check the validity of the signature. RSA and DSA are two of the most popular digital signature schemes. Digital signatures are central to the operation of public key infrastructures and many network security schemes (e.g., SSL/TLS, many VPNs, etc.).
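As a hedged sketch of the sign/verify flow just described, the example below uses the RSA-PSS primitives from recent versions of the third-party Python cryptography package (assumed to be installed); verify() raises an exception if either the signature or the message has been altered:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"quarterly report, version 1"
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

signature = private_key.sign(message, pss, hashes.SHA256())    # signing uses the secret key
public_key.verify(signature, message, pss, hashes.SHA256())    # anyone with the public key can verify
print("signature verified")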

Public-key algorithms are most often based on the computational complexity of "hard" problems, often from number theory. For example, the hardness of RSA is related to the integer factorization problem, while Diffie–Hellman and DSA are related to the discrete logarithm problem. More recently, elliptic curve cryptography has developed in which security is based on number theoretic problems involving elliptic curves. Because of the difficulty of the underlying problems, most public-key algorithms involve operations such as modular multiplication and exponentiation, which are much more computationally expensive than the techniques used in most block ciphers, especially with typical key sizes. As a result, public-key cryptosystems are commonly hybrid cryptosystems, in which a fast high-quality symmetric-key encryption algorithm is used for the message itself, while the relevant symmetric key is sent with the message, but encrypted using a public-key algorithm. Similarly, hybrid signature schemes are often used, in which a cryptographic hash function is computed, and only the resulting hash is digitally signed.

Assignment (Set-1) Subject code: MI0036

Business intelligence & Tools

Q1. Define the term business intelligence tools? Briefly explain how the data from the one end gets transformed into information at the other end?

Ans:

This section introduces the definition of BI and the applications involved in it. Business intelligence (BI) is a wide category of applications and technologies that gather, store, analyse, and provide access to data.


It helps enterprise users to make better business decisions. BI applications involve the activities of decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

Business intelligence tools provide information on how trade is presently being conducted and which areas need to be developed. Business intelligence tools are a kind of application software developed to report, evaluate, and present data. The tools generally read data that has previously been stored, though not necessarily in a data warehouse. Some commonly used types of business intelligence tools are described below.

Multiplicity of business intelligence tools:

A multiplicity of business intelligence tools offers historical, current, and predictive views of business operations. The main features of business intelligence methodologies include reporting, online analytical processing (OLAP), analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.

BI encourages enhanced business decision making, and is therefore also called a decision support system. Even though the term business intelligence is frequently used as a synonym for competitive intelligence, BI uses technologies, processes, and applications to analyse mainly internal, structured data and business processes, whereas competitive intelligence gathers, analyses, and disseminates information, with or without support from technology and applications, and mainly concentrates on all-source information and data, mostly external but also internal to an organization, to help in decision making.

Q2. What Do mean by data ware house? What are the major concepts and terminology used in the study of data warehouse?

Ans. In order to survive market competition, an organization has to monitor changes both within the organization and outside it. An organization that cannot study the current trends within the organization and in its external environment, such as corporate relations, will not be able to survive in the world today. This is where data warehousing and its applications come into play.

Data warehousing technology is the process by which the historical data of a company (also referred to as corporate memory) is created and utilized. A data warehouse is the database that contains data relevant to corporate information. This includes sales figures, market performance, accounts payable, and leave details of employees. However, the data warehouse is not limited to the above-mentioned data. The data available is useful in making decisions based on past performance of employees, expenses, and experiences.

To utilize data warehousing technology, companies can opt for online transaction processing (OLTP) or online analytical processing (OLAP). The uses of data warehousing are many. Let us consider a bank scenario to analyse the importance of data warehousing. In a bank, the accounts of several customers have to be maintained, including balance, savings, and deposit details. The particulars of the bank employees and the information regarding their performance also have to be maintained. Data warehousing technologies are used for the same.

Data warehousing transforms data to information and enables the organizations to analyse its operations and performances. This task is done by the staging and transformation of data from data sources. The data stores may be stored on disk or memory.

To extract, clean, and load data from online transaction processing (OLTP) systems and other repositories of data, the data warehousing system uses backend tools. The data storage area of a data warehousing system is composed of the data warehouse, the data marts, and the data store. It also provides tools like OLAP to organize, partition, and summarise data in the data warehouse and data marts. Mining, querying, and reporting on data requires front-end tools.

Contrasting OLTP and Data Warehousing Environments

The key differences between an OLTP system and a data warehouse are outlined below.

One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:

Workload: Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.

Data modifications: A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

Schema design: Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.

Typical operations: A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."

Historical data: Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores historical data only as needed to successfully meet the requirements of the current transaction.

Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:


- Data Warehouse Architecture (Basic)
- Data Warehouse Architecture (with a Staging Area)
- Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic): This is the simplest architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.

In this architecture, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data, summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. For example, a typical data warehouse query is to retrieve something like August sales. A summary in Oracle is called a materialized view.

Data Warehouse Architecture (with a Staging Area)

In this architecture, you need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the staging-area architecture is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. An example is a warehouse in which purchasing, sales, and inventories are separated into distinct data marts. In this example, a financial analyst might want to analyze historical data for purchases and sales.


Q3. What are the data modeling techniques used in data warehousing environment?

Ans: Data Modelling Multi-fact Star Schema or Snowflake Schema

Each of the dimension tables has a single-field primary key that has a one-to-many relationship with a foreign key in the fact table. Let us look at some facts related to the star and snowflake schemas.

Model

The fact table contains the main data, while the other, smaller dimension tables contain descriptions for each value in the dimensions. The dimension tables are connected to the fact table: the fact table holds a set of foreign keys that together make up a composite primary key, while each dimension table has its own single-column primary key.

One of the reasons for using a star schema is its simplicity: queries are not complex, as the joins and conditions involve a fact table and a few single-level dimension tables. In a snowflake schema, queries are more complex because of the multiple levels of dimension tables. An example of a typical star-schema query is given below.
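The flavour of such a query can be seen in the short sqlite3 sketch below; the table and column names are hypothetical, chosen only to mirror the fact-table/dimension-table split described above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, month TEXT);
    -- The fact table holds the measures plus one foreign key per dimension.
    CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product(product_id),
                              date_id    INTEGER REFERENCES dim_date(date_id),
                              amount     REAL);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date    VALUES (10, 'August'), (11, 'September');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (2, 10, 250.0), (1, 11, 75.0);
""")

# A typical star-schema query: one join per dimension, then aggregate the fact measures.
rows = conn.execute("""
    SELECT d.month, p.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY d.month, p.product_name
""").fetchall()
print(rows)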

Uses :

The star and snowflake schemas are used for dimensional data where the speed of retrieval is more important than the efficiency of data management; therefore, the data is not normalized much. The decision as to which schema should be used depends on two factors: the database platform and the query tool to be used. A star schema is relevant in environments where the queries are much simpler and the query tools expose the users to the fundamental table structures. A snowflake schema is apt for environments with several queries that have complex conditions and where the user is detached from the fundamental table structures.

Data Normalization and storage:

The data in a database can be repeated. To reduce redundancy we use normalization: commonly repeated data are moved into a new table. As a result, the number of tables to be joined to execute a query increases. However, normalization reduces the space required to store redundant data and the number of places where it has to be updated. Where storage is concerned, the dimension tables are small compared to the fact table.

What is Data Modeling?


Data modeling is the act of exploring data-oriented structures.  Like other modeling artifacts data models can be used for a variety of purposes, from high-level conceptual models to physical data models.  From the point of view of an object-oriented developer data modeling is conceptually similar to class modeling. With data modeling you identify entity types whereas with class modeling you identify classes.  Data attributes are assigned to entity types just as you would assign attributes and operations to classes.  There are associations between entities, similar to the associations between classes – relationships, inheritance, composition, and aggregation are all applicable concepts in data modeling.

Traditional data modeling is different from class modeling because it focuses solely on data – class models allow you to explore both the behavior and data aspects of your domain, with a data model you can only explore data issues.  Because of this focus data modelers have a tendency to be much better at getting the data “right” than object modelers.  However, some people will model database methods (stored procedures, stored functions, and triggers) when they are physical data modeling.  It depends on the situation of course, but I personally think that this is a good idea and promote the concept in my UML data modeling profile (more on this later).

Although the focus of this article is data modeling, there are often alternatives to data-oriented artifacts (never forget Agile Modeling’s Multiple Models principle).  For example, when it comes to conceptual modeling ORM diagrams aren’t your only option – In addition to LDMs it is quite common for people to create UML class diagrams and even Class Responsibility Collaborator (CRC) cards instead.  In fact, my experience is that CRC cards are superior to ORM diagrams because it is very easy to get project stakeholders actively involved in the creation of the model.  Instead of a traditional, analyst-led drawing session you can instead facilitate stakeholders through the creation of CRC cards.

 How are Data Models Used in Practice?

Although methodology issues are covered later, we need to discuss how data models can be used in practice to better understand them.   You are likely to see three basic styles of data model:

Conceptual data models.  These models, sometimes called domain models, are typically used to explore domain concepts with project stakeholders.  On Agile teams high-level conceptual models are often created as part of your initial requirements envisioning efforts as they are used to explore the high-level static business structures and concepts.  On traditional teams conceptual data models are often created as the precursor to LDMs or as alternatives to LDMs. 

Logical data models (LDMs).  LDMs are used to explore the domain concepts, and their relationships, of your problem domain.  This could be done for the scope of a single project or for your entire enterprise.  LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on Agile projects although often are on traditional projects (where they rarely seem to add much value in practice).

Physical data models (PDMs).  PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to be useful on both Agile and traditional projects and as a result the focus of this article is on physical modeling.

Although LDMs and PDMs sound very similar, and in fact they are, the level of detail that they model can be significantly different.  This is because the goals of each diagram are different – you can use an LDM to explore domain concepts with your stakeholders and the PDM to define your database design.  Figure 1 presents a simple LDM and Figure 2 a simple PDM, both modeling the concept of customers and addresses as well as the relationship between them.  Both diagrams apply the Barker notation, summarized below.

Notice how the PDM shows greater detail, including an associative table required to implement the association as well as the keys needed to maintain the relationships.  More on these concepts later.  PDMs should also reflect your organization’s database naming standards, in this case an abbreviation of the entity name is appended to each column name and an abbreviation for “Number” was consistently introduced.  A PDM should also indicate the data types for the columns, such as integer and char(5).  Although Figure 2 does not show them, lookup tables (also called reference tables or description tables) for how the address is used as well as for states and countries are implied by the attributes ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE.

  A simple logical data model.

  A simple physical data model.

An important observation about Figures 1 and 2 is that I’m not slavishly following Barker’s approach to naming relationships.  For example, between Customer and Address there really should be two names “Each CUSTOMER may be located in one or more ADDRESSES” and “Each ADDRESS may be the site of one or more CUSTOMERS”.  Although these names explicitly define the relationship I personally think that they’re visual noise that clutter the diagram.  I prefer simple names such as “has” and then trust my readers to interpret the name in each direction.  I’ll only add more information where it’s needed, in this case I think that it isn’t.  However, a significant advantage of describing the names the way that Barker suggests is that it’s a good test to see if you actually understand the relationship – if you can’t name it then you likely don’t understand it.

Data models can be used effectively at both the enterprise level and on projects.  Enterprise architects will often create one or more high-level LDMs that depict the data structures that support your enterprise, models typically referred to as enterprise data models or enterprise information models.  An enterprise data model is one of several views that your organization’s enterprise architects may choose to maintain and support – other views may explore your network/hardware infrastructure, your organization structure, your software infrastructure, and your business processes (to name a few).  Enterprise data models provide information that a project team can use both as a set of constraints as well as important insights into the structure of their system. 

Project teams will typically create LDMs as a primary analysis artifact when their implementation environment is predominantly procedural in nature, for example they are using structured COBOL as an implementation language.  LDMs are also a good choice when a project is data-oriented in nature, perhaps a data warehouse or reporting system is being developed (having said that, experience seems to show that usage-centered approaches appear to work even better).  However LDMs are often a poor choice when a project team is using object-oriented or component-based technologies because the developers would rather work with UML diagrams or when the project is not data-oriented in nature.  As Agile Modeling advises, apply the right artifact(s) for the job. Or, as your grandfather likely advised you, use the right tool for the job.  It's important to note that traditional approaches to Master Data Management (MDM) will often motivate the creation and maintenance of detailed LDMs, an effort that is rarely justifiable in practice when you consider the total cost of ownership (TCO) when calculating the return on investment (ROI) of those sorts of efforts.

When a relational database is used for data storage, project teams are best advised to create a PDM to model its internal schema.  My experience is that a PDM is often one of the critical design artifacts for business application development projects.

What About Conceptual Models?

Halpin (2001) points out that many data professionals prefer to create an Object-Role Model (ORM), an example is depicted in Figure 3, instead of an LDM for a conceptual model.  The advantage is that the notation is very simple, something your project stakeholders can quickly grasp, although the disadvantage is that the models become large very quickly.  ORMs enable you to first explore actual data examples instead of simply jumping to a potentially incorrect abstraction – for example Figure 3 examines the relationship between customers and addresses in detail. 

A simple Object-Role Model.

In practice, people will capture information in the best place that they know.  As a result I typically discard ORMs after I'm finished with them.  I sometimes use ORMs to explore the domain with project stakeholders but later replace them with a more traditional artifact such as an LDM, a class diagram, or even a PDM.  As a generalizing specialist, someone with one or more specialties who also strives to gain general skills and knowledge, this is an easy decision for me to make; I know that this information that I've just "discarded" will be captured in another artifact – a model, the tests, or even the code – that I understand.  A specialist who only understands a limited number of artifacts and therefore "hands off" their work to other specialists doesn't have this as an option.  Not only are they tempted to keep the artifacts that they create but also to invest even more time to enhance the artifacts.  Generalizing specialists are more likely than specialists to travel light.

Common Data Modeling Notations

The figure presents a summary of the syntax of four common data modeling notations: Information Engineering (IE), Barker, IDEF1X, and the Unified Modeling Language (UML).  This diagram isn't meant to be comprehensive; instead its goal is to provide a basic overview.  Furthermore, for the sake of brevity I wasn't able to depict the highly detailed approach to relationship naming that Barker suggests.  Although I provide a brief description of each notation in Table 1, I highly suggest David Hay's paper A Comparison of Data Modeling Techniques as he goes into greater detail than I do.

  Comparing the syntax of common data modeling notations.

Table 1. Discussing common data modeling notations.

Notation and comments:

IE: The IE notation (Finkelstein 1989) is simple and easy to read, and is well suited for high-level logical and enterprise data modeling.  The only drawback of this notation, arguably an advantage, is that it does not support the identification of attributes of an entity.  The assumption is that the attributes will be modeled with another diagram or simply described in the supporting documentation.

Barker: The Barker notation is one of the more popular ones, it is supported by Oracle's toolset, and is well suited for all types of data models.  Its approach to subtyping can become clunky with hierarchies that go several levels deep.

IDEF1X: This notation is overly complex.  It was originally intended for physical modeling but has been misapplied for logical modeling as well. Although popular within some U.S. government agencies, particularly the Department of Defense (DoD), this notation has been all but abandoned by everyone else.  Avoid it if you can.

UML: This is not an official data modeling notation (yet).  Although several suggestions for a data modeling profile for the UML exist, none are complete and, more importantly, none are "official" UML yet.  However, in December 2005 the Object Management Group (OMG) announced an RFP for data-oriented models.

How to Model Data

It is critical for an application developer to have a grasp of the fundamentals of data modeling so they can not only read data models but also work effectively with Agile DBAs who are responsible for the data-oriented aspects of your project.  Your goal in reading this section is not to learn how to become a data modeler; instead it is simply to gain an appreciation of what is involved.  The following tasks are performed in an iterative manner:

Identify entity types
Identify attributes
Apply naming conventions
Identify relationships
Apply data model patterns
Assign keys
Normalize to reduce data redundancy
Denormalize to improve performance

Identify Entity Types

An entity type, also simply called entity (not exactly accurate terminology, but very common in practice), is similar conceptually to object-orientation's concept of a class – an entity type represents a collection of similar objects.  An entity type could represent a collection of people, places, things, events, or concepts.  Examples of entities in an order entry system would include Customer, Address, Order, Item, and Tax.  If you were class modeling you would expect to discover classes with the exact same names.  However, the difference between a class and an entity type is that classes have both data and behavior whereas entity types just have data. Ideally an entity should be normal, the data modeling world's version of cohesive.  A normal entity depicts one concept, just like a cohesive class models one concept.  For example, customer and order are clearly two different concepts; therefore it makes sense to model them as separate entities.
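As a small illustration of the data-only nature of entity types, here is a minimal Python sketch using hypothetical Customer and Order entity types; the attribute names are invented for the example:

```python
# Minimal sketch (hypothetical attributes) of the entity-type vs. class distinction:
# an entity type carries data only, whereas a class also carries behaviour.
from dataclasses import dataclass

@dataclass
class CustomerEntity:          # entity type: data attributes only
    customer_number: int
    first_name: str
    surname: str

@dataclass
class OrderEntity:             # a second, separate ("normal"/cohesive) concept
    order_number: int
    customer_number: int       # reference back to the customer
    total: float

class Customer(CustomerEntity):      # class: same data plus behaviour
    def full_name(self) -> str:
        return f"{self.first_name} {self.surname}"

print(Customer(1001, "Scott", "Ambler").full_name())
```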

Identify Attributes

Each entity type will have one or more data attributes.  For example, in Figure 1 you saw that the Customer entity has attributes such as First Name and Surname and in Figure 2 that the TCUSTOMER table had corresponding data columns CUST_FIRST_NAME and CUST_SURNAME (a column is the implementation of a data attribute within a relational database). Attributes should also be cohesive from the point of view of your domain, something that is often a judgment call – in Figure 1 we decided that we wanted to model the fact that people had both first and last names instead of just a name (e.g. "Scott" and "Ambler" vs. "Scott Ambler") whereas we did not distinguish between the sections of an American zip code (e.g. 90210-1234-5678).  Getting the level of detail right can have a significant impact on your development and maintenance efforts.  Refactoring a single data column into several columns can be difficult (database refactoring is described in detail in Database Refactoring), although over-specifying an attribute (e.g. having three attributes for zip code when you only needed one) can result in overbuilding your system, and hence you incur greater development and maintenance costs than you actually needed.

Apply Data Naming Conventions

Your organization should have standards and guidelines applicable to data modeling, something you should be able to obtain from your enterprise administrators (if they don't exist you should lobby to have some put in place).  These guidelines should include naming conventions for both logical and physical modeling; the logical naming conventions should be focused on human readability whereas the physical naming conventions will reflect technical considerations.  You can clearly see that different naming conventions were applied in the figures. As you saw in Introduction to Agile Modeling, AM includes the Apply Modeling Standards practice.  The basic idea is that developers should agree to and follow a common set of modeling standards on a software project. Just like there is value in following common coding conventions (clean code that follows your chosen coding guidelines is easier to understand and evolve than code that doesn't), there is similar value in following common modeling conventions.

Identify Relationships

In the real world entities have relationships with other entities.  For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders.  Place, live at, and are part of are all terms that define relationships between entities.  The relationships between entities are conceptually identical to the relationships (associations) between objects. The figure below depicts a partial LDM for an online ordering system.  The first thing to notice is the various styles applied to relationship names and roles – different relationships require different approaches.  For example the relationship between Customer and Order has two names, places and is placed by, whereas the relationship between Customer and Address has one.  In this example having a second name on the relationship, the idea being that you want to specify how to read the relationship in each direction, is redundant – you're better off finding a clear wording for a single relationship name, decreasing the clutter on your diagram.  Similarly, you will often find that specifying the roles that an entity plays in a relationship negates the need to give the relationship a name (although some CASE tools may inadvertently force you to do this).  For example the role of billing address and the label billed to are clearly redundant; you really only need one.  Similarly, the role part of that Line Item has in its relationship with Order is sufficiently obvious without a relationship name.

A logical data model (Information Engineering notation).

It is also necessary to identify the cardinality and optionality of a relationship (the UML combines the concepts of optionality and cardinality into the single concept of multiplicity).  Cardinality represents the concept of "how many" whereas optionality represents the concept of "whether you must have something."  For example, it is not enough to know that customers place orders.  How many orders can a customer place?  None, one, or several?  Furthermore, relationships are two-way streets: not only do customers place orders, but orders are placed by customers.  This leads to questions like: how many customers can be involved in any given order, and is it possible to have an order with no customer involved?  Figure 5 shows that customers place one or more orders and that any given order is placed by one customer and one customer only.  It also shows that a customer lives at one or more addresses and that any given address has zero or more customers living at it.

Although the UML distinguishes between different types of relationships – associations, inheritance, aggregation, composition, and dependency – data modelers often aren't as concerned with this issue as object modelers are.  Subtyping, one application of inheritance, is often found in data models, an example of which is the is a relationship between Item and its two "sub entities" Service and Product.  Aggregation and composition are much less common and typically must be implied from the data model, as you see with the part of role that Line Item takes with Order.  UML dependencies are typically a software construct and therefore wouldn't appear on a data model, unless of course it was a very highly detailed physical model that showed how views, triggers, or stored procedures depended on other aspects of the database schema.

Assign Keys

There are two fundamental strategies for assigning keys to tables.  First, you could assign a natural key, which is one or more existing data attributes that are unique to the business concept.  In the Customer table of Figure 6 there are two candidate keys, in this case CustomerNumber and SocialSecurityNumber.  Second, you could introduce a new column, called a surrogate key, which is a key that has no business meaning.  An example is the AddressID column of the Address table in Figure 6.  Addresses don't have an "easy" natural key because you would need to use all of the columns of the Address table to form a key for itself (you might be able to get away with just the combination of Street and ZipCode depending on your problem domain), therefore introducing a surrogate key is a much better option in this case.

Figure 6. Customer and Address revisited (UML notation).

Let's consider Figure 6 in more detail.  It presents an alternative design to the one presented earlier: a different naming convention was adopted and the model itself is more extensive.  In Figure 6 the Customer table has the CustomerNumber column as its primary key and SocialSecurityNumber as an alternate key.  This indicates that the preferred way to access customer information is through the value of a person's customer number, although your software can get at the same information if it has the person's social security number.  The CustomerHasAddress table has a composite primary key, the combination of CustomerNumber and AddressID.  A foreign key is one or more attributes in an entity type that represents a key, either primary or secondary, in another entity type.  Foreign keys are used to maintain relationships between rows.  For example, the relationships between rows in the CustomerHasAddress table and the Customer table are maintained by the CustomerNumber column within the CustomerHasAddress table.  The interesting thing about the CustomerNumber column is the fact that it is part of the primary key for CustomerHasAddress as well as the foreign key to the Customer table.  Similarly, the AddressID column is part of the primary key of CustomerHasAddress as well as a foreign key to the Address table to maintain the relationship with rows of Address.

Although the "natural vs. surrogate" debate is one of the great religious issues within the data community, the fact is that neither strategy is perfect and you'll discover that in practice (as we see in Figure 6) sometimes it makes sense to use natural keys and sometimes it makes sense to use surrogate keys.  In Choosing a Primary Key: Natural or Surrogate? I describe the relevant issues in detail.
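A minimal sketch of these key strategies, expressed as SQLite DDL driven from Python, is shown below. The table and column names follow the Customer/Address example discussed above, but the exact definitions are assumptions for illustration rather than the article's actual schema:

```python
# Minimal sketch (hypothetical columns) of a natural key (CustomerNumber), a
# surrogate key (AddressID), a composite primary key, and the foreign keys that
# maintain the many-to-many relationship between customers and addresses.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    CustomerNumber       INTEGER PRIMARY KEY,          -- natural key
    SocialSecurityNumber TEXT UNIQUE,                  -- alternate key
    FirstName            TEXT,
    Surname              TEXT
);
CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY AUTOINCREMENT,       -- surrogate key, no business meaning
    Street    TEXT,
    City      TEXT,
    ZipCode   TEXT
);
CREATE TABLE CustomerHasAddress (                      -- associative table
    CustomerNumber INTEGER REFERENCES Customer(CustomerNumber),
    AddressID      INTEGER REFERENCES Address(AddressID),
    PRIMARY KEY (CustomerNumber, AddressID)            -- composite key made of two foreign keys
);
""")
```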

Normalize to Reduce Data Redundancy

Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types.  In other words, the goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places.  Table 2 summarizes the three most common normalization rules describing how to put entity types into a series of increasing levels of normalization.  Higher levels of data normalization (Date 2000) are beyond the scope of this article.  With respect to terminology, a data schema is considered to be at the level of normalization of its least normalized entity type.  For example, if all of your entity types are at second normal form (2NF) or higher then we say that your data schema is at 2NF.

Table 2. Data Normalization Rules.

First normal form (1NF): An entity type is in 1NF when it contains no repeating groups of data.

Second normal form (2NF): An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.

Third normal form (3NF): An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.

Figure 7 depicts a database schema in 0NF whereas Figure 8 depicts a normalized schema in 3NF.  Read the Introduction to Data Normalization essay for details.

Why data normalization?  The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data.  Furthermore, highly-normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes results in similar solutions (at least from a data point of view).  This generally makes it easier to map your objects to your data schema.  Unfortunately, normalization usually comes at a performance cost.  With the data schema of Figure 7 all the data for a single order is stored in one row (assuming orders of up to nine order items), making it very easy to access.  With the data schema of Figure 7 you could quickly determine the total amount of an order by reading the single row from the Order0NF table.  To do so with the data schema of Figure 8 you would need to read data from a row in the Order table, data from all the rows from the OrderItem table for that order and data from the corresponding rows in the Item table for each order item.  For this query, the data schema of Figure 7 very likely provides better performance.

  An Initial Data Schema for Order (UML Notation).

Normalized schema in 3NF (UML Notation).
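To make the 0NF versus 3NF trade-off concrete, here is a minimal Python sketch with invented order data: the denormalized record answers the order-total question from a single row, while the normalized layout stores each fact once but must combine several structures:

```python
# A minimal sketch (invented order data) contrasting a denormalized 0NF order
# record, where item details repeat inside the order row, with a 3NF layout in
# which each fact is stored once and the total is derived by joining.
order_0nf = {                       # 0NF: repeating group embedded in the order row
    "order_number": 5001,
    "item1_number": "A10", "item1_price": 9.50, "item1_qty": 2,
    "item2_number": "B20", "item2_price": 4.00, "item2_qty": 1,
}
total_0nf = (order_0nf["item1_price"] * order_0nf["item1_qty"] +
             order_0nf["item2_price"] * order_0nf["item2_qty"])
print(total_0nf)                    # 23.0 -- one row read, but item data repeats per order

# 3NF: one structure per concept; item details live in the Item "table" only.
items       = {"A10": {"name": "Widget", "price": 9.50},
               "B20": {"name": "Gadget", "price": 4.00}}
order_items = [{"order_number": 5001, "item_number": "A10", "qty": 2},
               {"order_number": 5001, "item_number": "B20", "qty": 1}]

def order_total(order_number: int) -> float:
    """Total an order by joining OrderItem rows to the Item table."""
    return sum(items[oi["item_number"]]["price"] * oi["qty"]
               for oi in order_items if oi["order_number"] == order_number)

print(order_total(5001))            # 23.0 -- needs reads from multiple structures
```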

In class modeling, there is a similar concept called Class Normalization although that is beyond the scope of this article.

 Denormalize to Improve Performance

Normalized data schemas, when put into production, often suffer from performance problems.  This makes sense – the rules of data normalization focus on reducing data redundancy, not on improving performance of data access.  An important part of data modeling is to denormalize portions of your data schema to improve database access times.  For example, the data model of Figure 9 looks nothing like the normalized schema of Figure 8.  To understand why the differences between the schemas exist you must consider the performance needs of the application.  The primary goal of this system is to process new orders from online customers as quickly as possible.  To do this customers need to be able to search for items and add them to their order quickly, remove items from their order if need be, then have their final order totaled and recorded quickly.  The secondary goal of the system is to process, ship, and bill the orders afterwards.

 

A Denormalized Order Data Schema (UML notation).

To denormalize the data schema the following decisions were made:

1. To support quick searching of item information the Item table was left alone.

2. To support the addition and removal of order items to an order the concept of an OrderItem table was kept, albeit split in two to support outstanding orders and fulfilled orders.  New order items can easily be inserted into the OutstandingOrderItem table, or removed from it, as needed.

3. To support order processing the Order and OrderItem tables were reworked into pairs to handle outstanding and fulfilled orders respectively.  Basic order information is first stored in the OutstandingOrder and OutstandingOrderItem tables and then, when the order has been shipped and paid for, the data is removed from those tables and copied into the FulfilledOrder and FulfilledOrderItem tables respectively.  Data access time to the two tables for outstanding orders is reduced because only the active orders are being stored there.  On average an order may be outstanding for a couple of days, whereas for financial reporting reasons fulfilled orders may be stored in the fulfilled order tables for several years until archived.  There is a performance penalty under this scheme because of the need to delete outstanding orders and then resave them as fulfilled orders, clearly something that would need to be processed as a transaction.

4. The contact information for the person(s) the order is being shipped and billed to was also denormalized back into the Order table, reducing the time it takes to write an order to the database because there is now one write instead of two or three.  The retrieval and deletion times for that data would also be similarly improved.
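Decision 3 above hinges on the delete-and-copy step running as a single transaction. The following is a minimal Python/SQLite sketch of that idea; the table layouts are simplified assumptions, not the actual schema of Figure 9:

```python
# Minimal sketch (hypothetical tables): moving a shipped-and-paid order from the
# outstanding tables to the fulfilled tables happens inside one transaction so the
# order is never lost or duplicated if something fails halfway through.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OutstandingOrder     (OrderNumber INTEGER PRIMARY KEY, Total REAL);
CREATE TABLE OutstandingOrderItem (OrderNumber INTEGER, ItemNumber TEXT, Qty INTEGER);
CREATE TABLE FulfilledOrder       (OrderNumber INTEGER PRIMARY KEY, Total REAL);
CREATE TABLE FulfilledOrderItem   (OrderNumber INTEGER, ItemNumber TEXT, Qty INTEGER);
INSERT INTO OutstandingOrder     VALUES (5001, 23.0);
INSERT INTO OutstandingOrderItem VALUES (5001, 'A10', 2), (5001, 'B20', 1);
""")

def fulfill_order(order_number: int) -> None:
    with conn:  # one transaction: copy then delete, or roll everything back
        conn.execute("INSERT INTO FulfilledOrder SELECT * FROM OutstandingOrder WHERE OrderNumber = ?", (order_number,))
        conn.execute("INSERT INTO FulfilledOrderItem SELECT * FROM OutstandingOrderItem WHERE OrderNumber = ?", (order_number,))
        conn.execute("DELETE FROM OutstandingOrderItem WHERE OrderNumber = ?", (order_number,))
        conn.execute("DELETE FROM OutstandingOrder WHERE OrderNumber = ?", (order_number,))

fulfill_order(5001)
print(conn.execute("SELECT COUNT(*) FROM OutstandingOrder").fetchone())  # (0,)
print(conn.execute("SELECT COUNT(*) FROM FulfilledOrder").fetchone())    # (1,)
```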

Note that if your initial, normalized data design meets the performance needs of your application then it is fine as is.  Denormalization should be resorted to only when performance testing shows that you have a problem with your objects and subsequent profiling reveals that you need to improve database access time.  As my grandfather said, if it ain’t broke don’t fix it.

Evolutionary/Agile Data Modeling

Evolutionary data modeling is data modeling performed in an iterative and incremental manner.  The article Evolutionary Development explores evolutionary software development in greater detail.  Agile data modeling is evolutionary data modeling done in a collaborative manner.   The article Agile Data Modeling: From Domain Modeling to Physical Modeling works through a case study which shows how to take an agile approach to data modeling.

Although you wouldn’t think it, data modeling can be one of the most challenging tasks that an Agile DBA can be involved with on an agile software development project.  Your approach to data modeling will often be at the center of any controversy between the agile software developers and the traditional data professionals within your organization.  Agile software developers will lean towards an evolutionary approach where data modeling is just one of many activities whereas traditional data professionals will often lean towards a big design up front (BDUF) approach where data models are the primary artifacts, if not THE artifacts.  This problem results from a combination of the cultural impedance mismatch, a misguided need to enforce the "one truth", and “normal” political maneuvering within your organization.  As a result Agile DBAs often find that navigating the political waters is an important part of their data modeling efforts.

Q4. Discuss the categories in which data is divided before structuring it into a data warehouse?

Ans. Data Warehouse Testing Categories

Categories of data warehouse testing cover the different stages of the process. The testing is done both on an individual and an end-to-end basis.

A good part of data warehouse testing can be linked to 'Data Warehouse Quality Assurance'. Data warehouse testing will include the following areas:

Extraction Testing

This testing checks the following:

The extraction process is able to extract the required fields.
The extraction logic for each source system is working.
Extraction scripts are granted security access to the source systems.
Updating of the extract audit log and time stamping is happening.
Movement of data from source to the extraction destination is working in terms of completeness and accuracy.
Extraction is getting completed within the expected window.

Transformation Testing

Transformation scripts are transforming the data as per the expected logic.
The one-time transformation for historical snapshots is working.
Detailed and aggregated data sets are created and are matching.
The transformation audit log and time stamping are happening.
There is no pilferage of data during the transformation process.
Transformation is getting completed within the given window.

Loading Testing

There is no pilferage during the loading process.
Any transformations during the loading process are working.
Movement of data sets from staging to the loading destination is working.
One-time historical snapshots are working.
Both incremental and total refresh are working.
Loading is happening within the expected window.
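As an illustration of the completeness and "no pilferage" checks listed for extraction and loading, here is a minimal Python sketch that reconciles record counts and a control total between a source system and a target area; the tables and the trivial ETL job are hypothetical:

```python
# Minimal sketch (hypothetical tables) of an ETL reconciliation test: record counts
# and a control total are compared between the source system and the target area.
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

def run_extract_and_load() -> None:
    """Stand-in for the real ETL job: copy every source row into the target."""
    rows = source.execute("SELECT id, amount FROM sales").fetchall()
    target.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    target.commit()

def test_no_pilferage() -> None:
    src_count, src_sum = source.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
    tgt_count, tgt_sum = target.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
    assert src_count == tgt_count, "row counts differ between source and target"
    assert abs(src_sum - tgt_sum) < 1e-9, "control totals differ between source and target"

run_extract_and_load()
test_no_pilferage()
print("extraction/loading reconciliation passed")
```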

End User Browsing and OLAP Testing

The business views and dashboards are displaying the data as expected.
The scheduled reports are accurate and complete.
The scheduled reports and other batch operations (view refresh, etc.) are happening in the expected window.
'Analysis Functions' and 'Data Analysis' are working.
There is no pilferage of data between the source systems and the views.

Ad-hoc Query Testing

Ad-hoc query creation is as per the expected functionality.
Ad-hoc query output response time is as expected.

Downstream Flow Testing

Data is extracted from the data warehouse and updated in the downstream systems/data marts. There is no pilferage.

One-Time Population Testing

The one-time ETL for the production data is working.
The production reports and the data warehouse reports are matching.
The time taken for the one-time processing is manageable within the conversion weekend.

End-to-End Integrated Testing

End-to-end data flow from the source system to the downstream system is complete and accurate.

Stress and Volume Testing

This part of testing involves applying maximum data volumes or introducing failure points to check the robustness and capacity of the system. The level of stress testing depends upon the configuration of the test environment and the level of capacity planning done. Here are some examples from the ideal world:

Server shutdown during a batch process.
Extraction, transformation, and loading with two to three times the maximum imagined data volume (for which the capacity is planned).
Having two to three times more users placing large numbers of ad-hoc queries.
Running a large number of scheduled reports.

Parallel Testing

Parallel testing is done where the data warehouse is run on production data as it would be in real life and its outputs are compared with the existing set of reports to ensure that they are in sync or that any mismatches can be explained.
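A minimal sketch of the comparison step in parallel testing, using invented report figures, might look like this: every region whose warehouse figure differs from the existing production report is flagged so it can be explained or fixed:

```python
# Minimal sketch (invented figures) of comparing warehouse output with the existing
# production report during parallel testing; mismatches are listed for explanation.
existing_report  = {"North": 1200.0, "South": 980.0, "West": 310.0}
warehouse_report = {"North": 1200.0, "South": 975.0, "West": 310.0}

mismatches = {region: (existing_report[region], warehouse_report.get(region))
              for region in existing_report
              if existing_report[region] != warehouse_report.get(region)}

print(mismatches or "reports are in sync")   # {'South': (980.0, 975.0)}
```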

Q5. Discuss the purpose of executive information system in an organization?

Ans: An EIS is not a piece of hardware or software, but an infrastructure that supplies a firm's executives with up-to-the-minute operational data, gathered and sifted from various databases. The typical information mix presented to the executive may include financial information, work in process, inventory figures, sales figures, market trends, industry statistics, and the market price of the firm's shares. It may even suggest what needs to be done, but it differs from a decision support system (DSS) in that it is targeted at executives rather than managers.

An Executive Information System (EIS) is a computer-based system intended to facilitate and support the information and decision making needs of senior executives by providing easy access to both internal and external information relevant to meeting the strategic goals of the organization. It is commonly considered as a specialized form of Decision Support System (DSS).

The emphasis of an EIS is on graphical displays and easy-to-use user interfaces. EIS offer strong reporting and drill-down capabilities. In general, EIS are enterprise-wide DSS that help top-level executives analyze, compare, and highlight trends in important variables so that they can monitor performance and identify opportunities and problems. EIS and data warehousing technologies are converging in the marketplace.

Executive Information System (EC-EIS)

Purpose

An executive information system (EIS) provides information about all the factors that influence the business activities of a company. It combines relevant data from external and internal sources and provides the user with important current data which can be analyzed quickly. The EC-Executive Information System (EC-EIS) is a system which is used to collect and evaluate information from different areas of a business and its environment. Among others, sources of this information can be the Financial Information System (meaning external accounting and cost accounting), the Human Resources Information System and the Logistics Information System. The information provided serves both management and the employees in Accounting.

Implementation Considerations

EC-EIS is the information system for upper management. It is generally suitable for the collection and evaluation of data from different functional information systems in one uniform view.

Integration

The Executive Information System is based on the same data basis, and has the same data collection facilities, as Business Planning. In EC-EIS you can report on the data planned in EC-BP.

Features

When customizing your Executive Information System you set up an individual EIS database for your business and have it supplied with data from various sub-information systems (Financial Information System, Human Resources Information System, Logistics Information System, cost accounting, etc.) or with external data. Since this data is structured heterogeneously, you can structure the data basis into separate EIS data areas for different business purposes. These data areas are called aspects. You can define various aspects for your enterprise containing, for example, information on the financial situation, logistics, human resources, the market situation, and stock prices. For each aspect you can create reports to evaluate the data. You can either carry out your own basic evaluations in the EIS presentation (reporting) system or analyze the data using certain report groups created specifically for your requirements. To access the EIS presentation functions, choose Information systems → EIS.

In this documentation the application functions are described in detail and the customizing functions in brief. It is intended for the EC-EIS user but also for those responsible for managing the system. To access the EC-EIS application menu, choose Accounting → Enterprise control → Executive InfoSystem.

To call up the presentation functions from the application menu, choose Environment → Executive menu. Necessary preliminary tasks and settings are carried out in Customizing. You can find a detailed description of the customizing functions in the implementation guidelines.

Setting Up the Data Basis

An aspect consists of characteristics and key figures. Characteristics are classification terms such as division, region, department, or company. A combination of characteristic values for certain characteristics (such as Division: Pharmaceuticals, Region: Northwest) is called an evaluation object. Key figures are numerical values such as revenue, fixed costs, variable costs, number of employees and quantity produced. They also form part of the structure of the aspect.

The key figure data is stored according to the characteristics in an aspect. Besides these key figures stored in the database, you can also define calculated key figures in EC-EIS and EC-BP. Calculated key figures are calculated with a formula and the basic key figures of the aspect (for example: CM1 per employee = (sales - sales deductions – variable costs) / Number of employees).
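As a small worked example of a calculated key figure, the CM1-per-employee formula quoted above can be computed directly from the basic key figures of an evaluation object (the figures below are invented):

```python
# Minimal sketch (invented figures) of a calculated key figure derived from basic
# key figures, following the CM1-per-employee formula quoted above.
def cm1_per_employee(sales: float, sales_deductions: float,
                     variable_costs: float, employees: int) -> float:
    return (sales - sales_deductions - variable_costs) / employees

# Basic key figures for one evaluation object (e.g. Division: Pharmaceuticals, Region: Northwest)
print(cm1_per_employee(sales=1_200_000, sales_deductions=50_000,
                       variable_costs=700_000, employees=90))  # 5000.0
```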

Determining characteristics and key figures when setting up the system provides the framework for the possible evaluations. You make these settings in customizing. When the structure of the aspect and the data basis have been defined in Customizing, you can evaluate data.

Presentation of the Data

Drilldown reporting and the report portfolio help you to evaluate and present your data. You can evaluate EC-EIS data interactively using drilldown reporting. You make a selection of the characteristics and key figures from the data basis. You can analyze many types of variance (plan/actual comparisons, time comparisons, object comparisons). Drilldown reporting contains easy-to-use functions for navigating through the dataset. In addition, there are a variety of functions for interactively processing a report (selection conditions, exceptions, sorting, top n and so on). You can also access SAPgraphics and SAPmail and print using Microsoft Word for Windows and Microsoft Excel.

Drilldown reporting, with its numerous functions, is aimed at trained users, especially financial controllers and managers. By using the various function levels appropriately, other users can execute reports without needing extensive training. The reports created in drilldown reporting can be combined for specific user groups and stored in the graphical report portfolio. The report portfolio is aimed at users with basic knowledge of the system who wish to access information put together for their specific needs. You can call up report portfolio reports via a graphical menu. This menu can be set up individually for different user groups. The navigation function in the report portfolio is limited to scrolling through reports created by the relevant department.

Q6. Discuss the challenges involved in data integration and coordination process?

Ans:

Simple Data Integration

The data integration process can often seem overwhelming, and this is often compounded by the vast number of large-scale, complex, and costly enterprise integration applications available on the market.

MapForce seeks to alleviate this burden with powerful data integration capabilities built into a straightforward graphical user interface.

MapForce allows you to easily associate target and source data structures using drag and drop functionality. Advanced data processing filters and functions can be added via a built-in function library, and you can use the visual function builder to combine multiple inline and/or recursive operations in more complex data integration scenarios.

Integrating Data from/into Multiple Files

MapForce lets you easily integrate data from multiple files or split data from one file into many. Multiple files can be specified through support for wildcard characters (e.g., ? or *), a database table, auto-number sequences, or other methods. This feature is very useful in a wide variety of data integration scenarios; for example, it may be necessary to integrate data from a file collection or to generate individual XML files for each main table record in a large database. The screenshot below shows an example in which two files from a directory are integrated into a single target file.
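MapForce performs this wildcard-driven merging through its own mapping engine; as a tool-independent illustration of the idea, the following Python sketch merges records from several source files matched by a wildcard into one target file (the file and element names are hypothetical, not MapForce artifacts):

```python
# A rough, tool-independent illustration of the idea described above: records from
# several source files matched by a wildcard are merged into a single target file.
import glob
import xml.etree.ElementTree as ET

target_root = ET.Element("Customers")

# e.g. customers_1.xml, customers_2.xml ... picked up via a wildcard
for path in sorted(glob.glob("customers_*.xml")):
    for record in ET.parse(path).getroot().findall("Customer"):
        target_root.append(record)          # copy each record into the target tree

ET.ElementTree(target_root).write("all_customers.xml", encoding="utf-8", xml_declaration=True)
```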

As a complement to this feature, MapForce also allows you to use file names as parameters in your data integration projects. This lets you create dynamic mappings in which this information is defined at run-time.

Re-usable Data Mappings

Whether it is an XML or database schema, EDI configuration file, or XBRL taxonomy and beyond, MapForce integrates data based on data structures regardless of the underlying content. This means that you can re-use your data integration mappings again and again as your business data changes.

Simply right click the data structure and choose Properties to access the component settings dialog to change your data source and, consequently, the output of your data integration project.

If you need to make some changes to your mapping along the way - to accommodate for underlying schema changes, for instance - MapForce offers a variety of automation features that help ease this process. For example, when you re-map a parent element, you will be asked if you would like to automatically reassign child elements or any other descendent connections accordingly.

Data integration output is created on-the-fly, and can be viewed at any time by simply clicking the Output tab in the design pane.

Automated Data Integration

For XML mappings, MapForce automatically generates data integration code on-the-fly in XSLT 1.0/2.0 or XQuery, based on your selection.

MapForce data mappings can also be fully automated through the generation of royalty-free data integration application code in Java, C#, or C++. This enables you to implement scheduled or event-triggered data integration/migration operations for inclusion in any reporting, e-commerce, or SOA-based applications.

MapForce data integration operations can also be automated via data integration API, ActiveX control, or the command line.

Full integration with the Visual Studio and Eclipse IDEs helps developers use MapForce data integration functionality as part of large-scale enterprise projects, without the hefty price tag.

Legacy Data Integration

As technology rapidly advances in the information age, organizations are often left burdened with legacy data repositories that are no longer supported, making the data difficult to access and impossible to edit in its native format. Here, MapForce provides the unique FlexText utility for parsing flat file output so that it can easily be integrated with any other target structure.

FlexText enables you to create reusable legacy data integration templates for mapping flat files to modern data formats like XML, databases, Excel 2007+, XBRL, Web services, and more.

In addition, legacy data formats like EDI can easily be integrated with modern accounting systems like ERP and relational databases, or even translated to modern formats like XML.
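FlexText templates themselves are proprietary, but the underlying flat-file-to-XML idea can be illustrated with a short, tool-independent Python sketch; the file name and field layout below are hypothetical:

```python
# A tool-independent sketch of parsing a legacy flat-file extract and re-emitting it
# as XML. The input file name and its three-field layout are assumptions for the example.
import csv
import xml.etree.ElementTree as ET

root = ET.Element("Orders")
with open("legacy_orders.txt", newline="") as flat_file:
    for order_no, customer, amount in csv.reader(flat_file):
        order = ET.SubElement(root, "Order", number=order_no.strip())
        ET.SubElement(order, "Customer").text = customer.strip()
        ET.SubElement(order, "Amount").text = amount.strip()

ET.ElementTree(root).write("orders.xml", encoding="utf-8", xml_declaration=True)
```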

Data Coordination Process:

1. A data coordination method of coordinating data between a source application program and a destination application program in an information processing terminal, the information processing terminal including a data storage unit configured to store therein data, a virus pattern file describing characteristics of a computer virus, and a data string pattern file describing a detecting data string; and an applications storage unit that stores therein a plurality of application programs each capable of creating data and storing the data in the data storage unit, and a virus detection program configured to detect a virus contained in the data created by any one of the application programs based on the virus pattern file before storing the data in the data storage unit, the application programs including a source application program that creates a specific data and a

destination application program that makes use of the specific data, the data coordination method comprising: executing the virus detection program whereby the virus detection program looks for a data string in the specific data based on the detecting data string in the data string pattern file in the data storage unit and extracts the data string if such a data string is present in the specific data; and notifying the data string extracted by the virus detection program at the executing and path information that specifies path of the specific data as data coordination information to the destination application program.

2. The data coordination method according to claim 1, further comprising creating and storing the data string pattern file in the storage unit.

3. The data coordination method according to claim 1, wherein the virus pattern file and the data string pattern file being separate files.

4. The data coordination method according to claim 1, wherein the data string pattern file includes pattern information for detecting a data string relating to any one of date information and position information or both included in the data created by any one of the application programs.

5. The data coordination method according to claim 1, wherein the executing includes looking for a data string in the specific data each time the destination application program requests the virus detection program to detect a virus contained in the data.

6. The data coordination method according to claim 1, wherein the executing includes looking for a data string in the specific data each time the destination application program is activated.

7. The data coordination method according to claim 1, wherein the executing includes looking for a data string in the specific data at a timing specified by a user.

8. The data coordination method according to claim 1, wherein the destination application program is a schedule management program that manages schedule by using at least one of calendar information and map information.

9. The data coordination method according to claim 8, wherein the data is e-mail data, and the schedule management program extracts e-mail data from the data storage unit based on the storage destination information, and handles extracted e-mail data in association with the calendar information based on the date information.

10. The data coordination method according to claim 8, wherein the data is image data, and the schedule management program extracts image data from the data storage unit based on the storage destination information, and handles extracted image data in association with the calendar information based on the date information.

11. The data coordination method according to claim 8, wherein the data is e-mail data, and the schedule management program extracts e-mail data from the data storage unit based on the storage destination information, and handles extracted e-mail data in association with the map information.

12. The data coordination method according to claim 8, wherein the data is image data, and the schedule management program extracts image data from the data storage unit based on the storage destination information, and handles extracted image data in association with the map information.

13. A computer-readable recording medium that stores therein a computer program that implements on a computer a data coordination method of coordinating data between a source application program and a destination application program in an information processing terminal, the information processing terminal including a data storage unit configured to store therein data, a virus pattern file describing characteristics of a computer virus, and a data string pattern file describing a detecting data string; and an applications storage

unit that stores therein a plurality of application programs each capable of creating data and storing the data in the data storage unit, and a virus detection program configured to detect a virus contained in the data created by any one of the application programs based on the virus pattern file before storing the data in the data storage unit, the application programs including a source application program that creates a specific data and a destination application program that makes use of the specific data, the computer program causing the computer to execute: executing the virus detection program whereby the virus detection program looks for a data string in the specific data based on the detecting data string in the data string pattern file in the data storage unit and extracts the data string if such a data string is present in the specific data; and notifying the data string extracted by the virus detection program at the executing and path information that specifies path of the specific data as data coordination information to the destination application program.

14. An information processing terminal comprising: a data storage unit configured to store therein data, a virus pattern file describing characteristics of a computer virus, and a data string pattern file describing a detecting data string; an applications storage unit that stores therein a plurality of application programs each capable of creating data and storing the data in the data storage unit, and a virus detection program configured to detect a virus contained in the data created by any one of the application programs based on the virus pattern file before storing the data in the data storage unit, the application programs including a source application program that creates a specific data and a destination application program that makes use of the specific data; an executing unit that executes the virus detection program whereby the virus detection program looks for a data string in the specific data based on the detecting data string in the data string pattern file in the data storage unit and extracts the data string if such a data string is present in the specific data; and a notifying unit that notifies the data string extracted by the virus detection program and path information that specifies path of the specific data as data coordination information to the destination application program.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technology for coordinating data between various application programs that handle storage of data in a storage unit and a virus detection program that detects a virus contained in the data based on a virus pattern file describing characteristics of viruses.

2. Description of the Related Art

Mobile phones are becoming multifunctional. Most mobile phones now have an e-mail function, a web page browsing function, a music reproducing function, and a photograph function.

In a mobile phone, data files are generally stored in a storage device within the mobile phone. Along with the increase of the functions of mobile phones, types and number of the data files that need to be stored have increased. As a result, there is a need to efficiently group and manage the data files. Various techniques have been proposed to achieve this. One approach includes adding unique classification information to data files. Another approach includes using a data file name to facilitate classification.

For example, Japanese Patent Application Laid-Open (JP-A) No. 2003-134454 discloses a technique that appends a date on which image data was photographed to be included in a data file indicative of the image data. On the contrary, JP-A No. 2001-34632 discloses a technique that appends the name of the place where image data was photographed to be included in a data file indicative of the image data.

Thus, the techniques disclosed in JP-A Nos. 2003-134454 and 2001-34632 include appending unique information to data files. However, some application programs (AP) cannot handle such data files that are
appended with additional information. In other words, there is a limitation on where the conventional techniques can be employed.

Many APs (for example, an e-mail AP, a camera AP, and a web browser AP) are installed in a multifunctional mobile phone, and generally, file formats that are handled by these APs are not the same. Accordingly, although some APs can handle the data files that are appended with additional information, others cannot.

For example, when a user of a mobile phone wishes to rearrange data files stored in a predetermined directory by the camera AP, it is necessary to shift the data file by creating a new directory. However, such an operation is troublesome, and is not convenient for the user.

Data files such as e-mail data files, music files, web page data files, image data files that are handled by mobile phones are in conformity with a standardized format. That is, these data files already include information such as the date when the file is created.

One approach could be to create a search program that can extract information, such as date, from these data files, and install the program to mobile phones. However, creation of a new search program increases the costs.

Therefore, there is a need for a technique that can easily and at low cost coordinate data files among various APs. This issue is not limited to mobile phones; it applies likewise to information processing terminals such as personal digital assistants (PDAs).

Assignment (Set-2)
Subject code: MI0036

Business intelligence & Tools

Q.1 Explain business development life cycle in detail?

Ans: The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in systems engineering, information systems and software engineering, is the process of creating or altering systems, and the models and methodologies that people use to develop these systems. The concept generally refers to computer or information systems.

In software engineering the SDLC concept underpins many kinds of software development methodologies. These methodologies form the framework for planning and controlling the creation of an information system: the software development process.

Systems Development Life Cycle (SDLC) is a process used by a systems analyst to develop an information system, including requirements, validation, training, and user (stakeholder) ownership. Any SDLC should result in a high quality system that meets or exceeds customer expectations, reaches completion within time and cost estimates, works effectively and efficiently in the current and planned Information Technology infrastructure, and is inexpensive to maintain and cost-effective to enhance.[2]

Computer systems are complex and often (especially with the recent rise of Service-Oriented Architecture) link multiple traditional systems potentially supplied by different software vendors. To manage this level of
complexity, a number of SDLC models have been created: "waterfall"; "fountain"; "spiral"; "build and fix"; "rapid prototyping"; "incremental"; and "synchronize and stabilize". [3]

SDLC models can be described along a spectrum of agile to iterative to sequential. Agile methodologies, such as XP and Scrum, focus on light-weight processes which allow for rapid changes along the development cycle. Iterative methodologies, such as Rational Unified Process and Dynamic Systems Development Method, focus on limited project scopes and expanding or improving products by multiple iterations. Sequential or big-design-upfront (BDUF) models, such as Waterfall, focus on complete and correct planning to guide large projects and risks to successful and predictable results. Other models, such as Anamorphic Development, tend to focus on a form of development that is guided by project scope and adaptive iterations of feature development.

In project management a project can be defined both with a project life cycle (PLC) and an SDLC, during which slightly different activities occur. According to Taylor (2004) "the project life cycle encompasses all the activities of the project, while the systems development life cycle focuses on realizing the product requirements".[4]

Systems development phases

The Systems Development Life Cycle framework provides system designers and developers with a sequence of activities to follow. It consists of a set of steps or phases in which each phase of the SDLC uses the results of the previous one.

A Systems Development Life Cycle (SDLC) adheres to important phases that are essential for developers, such as planning, analysis, design, and implementation; these are explained in the section below. A number of SDLC models have been created: waterfall, fountain, spiral, build and fix, rapid prototyping, incremental, and synchronize and stabilize. The oldest of these, and the best known, is the waterfall model: a sequence of stages in which the output of each stage becomes the input for the next. These stages can be characterized and divided up in different ways, including the following[6]:

Project planning, feasibility study: Establishes a high-level view of the intended project and determines its goals.

Systems analysis, requirements definition: Refines project goals into defined functions and operation of the intended application. Analyzes end-user information needs.

Systems design: Describes desired features and operations in detail, including screen layouts, business rules, process diagrams, pseudocode and other documentation.

Implementation: The real code is written here.

Integration and testing: Brings all the pieces together into a special testing environment, then checks for errors, bugs and interoperability.

Acceptance, installation, deployment: The final stage of initial development, where the software is put into production and runs actual business.

Maintenance: What happens during the rest of the software's life: changes, correction, additions, moves to a different computing platform and more. This, the least glamorous and perhaps most important step of all, goes on seemingly forever.

In some treatments, these stages of the Systems Development Life Cycle are divided into ten steps, from definition to the creation and modification of IT work products.

Systems development life cycle topics

Management and control


Figure: SDLC Phases Related to Management Controls.

The Systems Development Life Cycle (SDLC) phases serve as a programmatic guide to project activity and provide a flexible but consistent way to conduct projects to a depth matching the scope of the project. The objectives of each SDLC phase are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives during each SDLC phase while executing projects. Control objectives help to provide a clear statement of the desired result or purpose and should be used throughout the entire SDLC process. Control objectives can be grouped into major categories (Domains), and relate to the SDLC phases as shown in the figure.

To manage and control any SDLC initiative, each project will be required to establish some degree of a Work Breakdown Structure (WBS) to capture and schedule the work necessary to complete the project. The WBS and all programmatic material should be kept in the “Project Description” section of the project notebook. The WBS format is mostly left to the project manager to establish in a way that best describes the project work. There are some key areas that must be defined in the WBS as part of the SDLC policy. Three key areas, in particular, must be addressed in the WBS in a manner established by the project manager.[8]

Work breakdown structured organization

Figure: Work Breakdown Structure.


The upper section of the Work Breakdown Structure (WBS) should identify the major phases and milestones of the project in a summary fashion. In addition, the upper section should provide an overview of the full scope and timeline of the project and will be part of the initial project description effort leading to project approval. The middle section of the WBS is based on the seven Systems Development Life Cycle (SDLC) phases as a guide for WBS task development. The WBS elements should consist of milestones and “tasks” as opposed to “activities” and have a definitive period (usually two weeks or more). Each task must have a measurable output (e.g. document, decision, or analysis). A WBS task may rely on one or more activities (e.g. software engineering, systems engineering) and may require close coordination with other tasks, either internal or external to the project. Any part of the project needing support from contractors should have a Statement of Work (SOW) written to include the appropriate tasks from the SDLC phases. The SOW is not developed during any specific phase of the SDLC; rather, it is developed to include the work from the SDLC process that may be conducted by external resources such as contractors.

Q.2 Discuss the various components of a data warehouse.

Ans: The data warehouse architecture is based on a relational database management system server that functions as the central repository for informational data. Operational data and processing is completely separated from data warehouse processing. This central information repository is surrounded by a number of key components designed to make the entire environment functional, manageable and accessible by both the operational systems that source data into the warehouse and by end-user query and analysis tools.

Typically, the source data for the warehouse comes from the operational applications. As the data enters the warehouse, it is cleaned up and transformed into an integrated structure and format. The transformation process may involve conversion, summarization, filtering and condensation of data. Because the data contains a historical component, the warehouse must be capable of holding and managing large volumes of data as well as different data structures for the same database over time.

The next sections look at the seven major components of data warehousing:

Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing environment. This database is almost always implemented using relational database management system (RDBMS) technology. However, this kind of implementation is often constrained by the fact that traditional RDBMS products are optimized for transactional database processing. Certain data warehouse attributes, such as very large database size, ad hoc query processing and the need for flexible user view creation including aggregates, multi-table joins and drill-downs, have become drivers for different technological approaches to the data warehouse database. These approaches include:

Parallel relational database designs for scalability that include shared-memory, shared disk, or shared-nothing models implemented on various multiprocessor configurations (symmetric multiprocessors or SMP, massively parallel processors or MPP, and/or clusters of uni- or multiprocessors).

An innovative approach to speed up a traditional RDBMS by using new index structures to bypass relational table scans.

Multidimensional databases (MDDBs) that are based on proprietary database technology; conversely, a dimensional data model can be implemented using a familiar RDBMS. Multi-dimensional databases are designed to overcome any limitations placed on the warehouse by the nature of the relational data model. MDDBs enable on-line analytical processing (OLAP) tools that architecturally belong to a group of data warehousing components jointly categorized as the data query, reporting, analysis and mining tools.

Sourcing, Acquisition, Cleanup and Transformation Tools


A significant portion of the implementation effort is spent extracting data from operational systems and putting it in a format suitable for informational applications that run off the data warehouse.

The data sourcing, cleanup, transformation and migration tools perform all of the conversions, summarizations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by the decision support tool. They produce the programs and control statements, including the COBOL programs, MVS job-control language (JCL), UNIX scripts, and SQL data definition language (DDL), needed to move data into the data warehouse from multiple operational systems. These tools also maintain the meta data. The functionality includes the following (a short illustrative sketch appears after the list):

Removing unwanted data from operational databases

Converting to common data names and definitions

Establishing defaults for missing data

Accommodating source data definition changes
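As a rough illustration of these cleanup steps, the sketch below applies them with pandas to a small made-up operational extract; the column names, default values and the DataFrame itself are invented for illustration only.

import pandas as pd

# Hypothetical extract from an operational system (all names are invented).
orders = pd.DataFrame({
    "cust_no": [101, 102, 103],
    "ord_dt": ["2010-01-05", None, "2010-02-11"],
    "amt_usd": [250.0, None, 99.5],
    "scratch_col": ["x", "y", "z"],          # unwanted operational field
})

# Removing unwanted data from the operational extract.
orders = orders.drop(columns=["scratch_col"])

# Converting to common (warehouse-wide) data names and definitions.
orders = orders.rename(columns={"cust_no": "customer_id",
                                "ord_dt": "order_date",
                                "amt_usd": "order_amount"})

# Establishing defaults for missing data.
orders["order_amount"] = orders["order_amount"].fillna(0.0)
orders["order_date"] = orders["order_date"].fillna("1900-01-01")

print(orders)

In a real warehouse these rules would normally be driven by the meta data rather than hard-coded in a script.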

The data sourcing, cleanup, extract, transformation and migration tools have to deal with some significant issues including:

Database heterogeneity. DBMSs are very different in data models, data access language, data navigation, operations, concurrency, integrity, recovery etc.

Data heterogeneity. This is the difference in the way data is defined and used in different models - homonyms, synonyms, unit compatibility (U.S. vs metric), different attributes for the same entity and different ways of modeling the same fact.

These tools can save a considerable amount of time and effort. However, significant shortcomings do exist. For example, many available tools are generally useful for simpler data extracts. Frequently, customized extract routines need to be developed for the more complicated data extraction procedures.

Meta data

Meta data is data about data that describes the data warehouse. It is used for building, maintaining, managing and using the data warehouse. Meta data can be classified into:

Technical meta data, which contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks.

Business meta data, which contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.

Equally important, meta data provides interactive access to users to help them understand content and find data. One of the issues with meta data is that the capabilities of many data extraction tools to gather meta data remain fairly immature. Therefore, there is often the need to create a meta data interface for users, which may involve some duplication of effort.

Meta data management is provided via a meta data repository and accompanying software. Meta data repository management software, which typically runs on a workstation, can be used to map the source data to the target database; generate code for data transformations; integrate and transform the data; and control moving data to the warehouse.

As users' interactions with the data warehouse increase, their approaches to reviewing the results of their requests for information can be expected to evolve from relatively simple manual analysis for trends and exceptions to agent-driven initiation of the analysis based on user-defined thresholds. The definition of these thresholds, configuration parameters for the software agents using them, and the information directory indicating where the appropriate sources for the information can be found are all stored in the meta data repository as well.

Access Tools


The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the data warehouse using front-end tools. Many of these tools require an information specialist, although many end users develop expertise in the tools. Tools fall into four main categories: query and reporting tools, application development tools, online analytical processing tools, and data mining tools.

Query and Reporting tools can be divided into two groups: reporting tools and managed query tools. Reporting tools can be further divided into production reporting tools and report writers. Production reporting tools let companies generate regular operational reports or support high-volume batch jobs such as calculating and printing paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for end-users.

Managed query tools shield end users from the complexities of SQL and database structures by inserting a metalayer between users and the database. These tools are designed for easy-to-use, point-and-click operations that either accept SQL or generate SQL database queries.
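As a toy sketch of that metalayer idea (all table and column names here are hypothetical), the following function turns point-and-click style selections into a SQL query string, so the end user never has to write SQL directly.

def build_query(table, columns, filters=None, order_by=None):
    """Translate point-and-click selections into a SQL SELECT statement."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{column} = ?")   # parameterized to avoid injection
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql, params

# A user picks a subject area, a few fields and a filter in the GUI:
sql, params = build_query("sales_fact",
                          ["region", "product", "revenue"],
                          filters={"year": 2010},
                          order_by="revenue DESC")
print(sql, params)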

Often, the analytical needs of the data warehouse user community exceed the built-in capabilities of query and reporting tools. In these cases, organizations will often rely on the tried-and-true approach of in-house application development using graphical development environments such as PowerBuilder, Visual Basic and Forte. These application development platforms integrate well with popular OLAP tools and access all major database systems including Oracle, Sybase, and Informix.

OLAP tools are based on the concepts of dimensional data models and corresponding databases, and allow users to analyze the data using elaborate, multidimensional views. Typical business applications include product performance and profitability, effectiveness of a sales program or marketing campaign, sales forecasting and capacity planning. These tools assume that the data is organized in a multidimensional model. A critical success factor for any business today is the ability to use information effectively. Data mining is the process of discovering meaningful new correlations, patterns and trends by digging into large amounts of data stored in the warehouse using artificial intelligence, statistical and mathematical techniques.

Data Marts

The concept of a data mart is causing a lot of excitement and attracts much attention in the data warehouse industry. Mostly, data marts are presented as an alternative to a data warehouse that takes significantly less time and money to build. However, the term data mart means different things to different people. A rigorous definition of this term is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on the data warehouse rather than a physically separate store of data. In most instances, however, the data mart is a physically separate store of data and is resident on a separate database server, often on a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational OLAP technology which creates a highly denormalized dimensional model (e.g., a star schema) implemented on a relational database. The resulting hypercubes of data are used for analysis by groups of users with a common interest in a limited portion of the database.
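A minimal sketch of the kind of denormalized dimensional (star schema) model such a relational OLAP data mart might use is shown below; the table and column names are invented for illustration, and SQLite merely stands in for the mart's database server.

import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for the data mart's database server

# One fact table surrounded by denormalized dimension tables (a star schema).
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER,
    revenue     REAL
);
""")
conn.close()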

These types of data marts, called dependent data marts because their data is sourced from the data warehouse, have a high value because no matter how they are deployed and how many different enabling technologies are used, different users are all accessing the information views derived from the single integrated version of the data.

Unfortunately, the misleading statements about the simplicity and low cost of data marts sometimes result in organizations or vendors incorrectly positioning them as an alternative to the data warehouse. This viewpoint defines independent data marts that in fact, represent fragmented point solutions to a range of business problems in the enterprise. This type of implementation should be rarely deployed in the context of an overall technology or applications architecture. Indeed, it is missing the ingredient that is at the heart of the data warehousing concept -- that of data integration. Each independent data mart makes its own assumptions about how to consolidate the data, and the data across several data marts may not be consistent.


Moreover, the concept of an independent data mart is dangerous -- as soon as the first data mart is created, other organizations, groups, and subject areas within the enterprise embark on the task of building their own data marts. As a result, you create an environment where multiple operational systems feed multiple non-integrated data marts that are often overlapping in data content, job scheduling, connectivity and management. In other words, you have transformed a complex many-to-one problem of building a data warehouse from operational and external data sources to a many-to-many sourcing and management nightmare.

Data Warehouse Administration and Management

Data warehouses tend to be as much as 4 times as large as related operational databases, reaching terabytes in size depending on how much history needs to be saved. They are not synchronized in real time to the associated operational data but are updated as often as once a day if the application requires it.

In addition, almost all data warehouse products include gateways to transparently access multiple enterprise data sources without having to rewrite applications to interpret and utilize the data. Furthermore, in a heterogeneous data warehouse environment, the various databases reside on disparate systems, thus requiring inter-networking tools. The need to manage this environment is obvious.

Managing data warehouses includes security and priority management; monitoring updates from the multiple sources; data quality checks; managing and updating meta data; auditing and reporting data warehouse usage and status; purging data; replicating, subsetting and distributing data; backup and recovery and data warehouse storage management.

Information Delivery System

The information delivery component is used to enable the process of subscribing for data warehouse information and having it delivered to one or more destinations according to some user-specified scheduling algorithm. In other words, the information delivery system distributes warehouse-stored data and other information objects to other data warehouses and end-user products such as spreadsheets and local databases. Delivery of information may be based on time of day or on the completion of an external event. The rationale for the delivery systems component is based on the fact that once the data warehouse is installed and operational, its users don't have to be aware of its location and maintenance. All they need is the report or an analytical view of data at a specific point in time. With the proliferation of the Internet and the World Wide Web such a delivery system may leverage the convenience of the Internet by delivering warehouse-enabled information to thousands of end-users via the ubiquitous world wide network.

In fact, the Web is changing the data warehousing landscape since at the very high level the goals of both the Web and data warehousing are the same: easy access to information. The value of data warehousing is maximized when the right information gets into the hands of those individuals who need it, where they need it and when they need it most. However, many corporations have struggled with complex client/server systems to give end users the access they need. The issues become even more difficult to resolve when the users are physically remote from the data warehouse location. The Web removes a lot of these issues by giving users universal and relatively inexpensive access to data. Couple this access with the ability to deliver required information on demand and the result is a web-enabled information delivery system that allows users dispersed across continents to perform a sophisticated business-critical analysis and to engage in collective decision-making.


Q.3 Discuss the data extraction process. What are the various methods used for data extraction?

Ans: Data extract: Data extract is the output of the data extraction process, a very important aspect of data warehouse implementation.

A data warehouse gathers data from several sources and utilizes these data to serve as vital information for the company. These data will be used to spot patterns and trends both in business operations and in industry standards.

Since the data coming into the data warehouse may come from different sources, which commonly are disparate systems with different data formats, a data warehouse uses three processes to make use of the data. These processes are extraction, transformation and loading (ETL).

Data extraction is a process that involves the retrieval of all formats and types of data from unstructured or badly structured data sources. These data will be further used for processing or data migration. Raw data is usually imported into an intermediate extracting system before being processed for data transformation, where it will possibly be padded with meta data before being exported to another stage in the data warehouse workflow. The term data extraction is often applied when experimental data is first imported into a computer server from primary sources such as recording or measuring devices.
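A minimal sketch of the extract step is given below; it assumes a hypothetical CSV export from a source system and copies the raw records into a staging table from which later transformation and load steps would read.

import csv
import sqlite3

def extract_to_staging(source_csv, warehouse_db):
    """Copy raw source records into a staging table, leaving the source untouched."""
    conn = sqlite3.connect(warehouse_db)
    conn.execute("CREATE TABLE IF NOT EXISTS stg_orders "
                 "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)")
    with open(source_csv, newline="") as f:
        rows = [(r["order_id"], r["customer"], r["amount"], r["order_date"])
                for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()

# extract_to_staging("orders_export.csv", "warehouse_staging.db")   # hypothetical files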

During the process of data extraction in a data warehouse, data may be removed from the source system, or a copy may be made with the original data being retained in the source system. It is also practiced in some data extraction implementations to move historical data that accumulates in the operational system to a data warehouse in order to maintain performance and efficiency.

Data extracts are loaded into the staging area of a relational database for further manipulation in the ETL methodology.

The data extraction process in general is performed within the source system itself. This can be most appropriate if the extraction is added to a relational database. Some database professionals implement data extraction using extraction logic in the data warehouse staging area and query the source system for data using an application programming interface (API).

Data extraction is a complex process but there are various software applications that have been developed to handle this process.

Some generic extraction applications can be found free on the internet. CD extraction software can create digital copies of audio CDs on the hard drive. There are also email extraction tools which can extract email addresses from different websites, including results from Google searches. These emails can be exported to text, HTML or XML formats.

Another data extracting tool is a web data or link extractor which can extract URLs, meta tags (like keywords, title and descriptions), body text, email addresses, phone and fax numbers and many other data from a website.

There is a wide array of data extracting tools. Some are used for individual purposes such as extracting data for entertainment while some are used for big projects like data warehousing.

Since data warehouses need to do other processes and not just extraction alone, database managers or programmers usually write programs that repetitively check many different sites for new data updates. This way, the code just sits in one area of the data warehouse, sensing new updates from the data sources. Whenever new data is detected, the program automatically does its function to update and transfer the data to the ETL process.
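A hedged sketch of such a watcher program is shown below; the source file list and the run_etl() step are placeholders, and a real implementation would typically poll databases or web services rather than local files.

import hashlib
import time

def fingerprint(path):
    """Hash the file contents so a change in the source can be detected."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def watch(sources, run_etl, interval_seconds=3600):
    """Poll the source files and push any changed one into the ETL process."""
    seen = {}
    while True:
        for path in sources:
            digest = fingerprint(path)
            if seen.get(path) != digest:      # new or updated data detected
                run_etl(path)
                seen[path] = digest
        time.sleep(interval_seconds)

# watch(["orders_export.csv"], run_etl=lambda p: print("loading", p))  # hypothetical source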

Three common methods for data extraction

Probably the most common technique used traditionally to do this is to cook up some regular expressions that match the pieces you want (e.g., URLs and link titles). Our screen-scraper software actually started out as an application written in Perl for this very reason. In addition to regular expressions, you might also use some code written in something like Java or Active Server Pages to parse out larger chunks of text. Using raw regular expressions to pull out the data can be a little intimidating to the uninitiated, and can get a bit messy when a script contains a lot of them. At the same time, if you're already familiar with regular expressions, and your scraping project is relatively small, they can be a great solution.
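For example, a small regular-expression pass in Python (the HTML snippet here is made up) can pull out URLs and link titles roughly as described.

import re

html = """
<a href="http://example.com/news/1">First headline</a>
<a class="promo" href="http://example.com/news/2">Second headline</a>
"""

# Match anchor tags, capturing the href value and the link text.
# Deliberately "fuzzy": extra attributes and whitespace are tolerated.
link_pattern = re.compile(r'<a\b[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
                          re.IGNORECASE | re.DOTALL)

for url, title in link_pattern.findall(html):
    print(url, "->", title.strip())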

Other techniques for getting the data out can get very sophisticated as algorithms that make use of artificial intelligence and such are applied to the page. Some programs will actually analyze the semantic content of an HTML page, then intelligently pull out the pieces that are of interest. Still other approaches deal with developing “ontologies”, or hierarchical vocabularies intended to represent the content domain.

There are a number of companies (including our own) that offer commercial applications specifically intended to do screen-scraping. The applications vary quite a bit, but for medium to large-sized projects they’re often a good solution. Each one will have its own learning curve, so you should plan on taking time to learn the ins and outs of a new application. Especially if you plan on doing a fair amount of screen-scraping it’s probably a good idea to at least shop around for a screen-scraping application, as it will likely save you time and money in the long run.

So what’s the best approach to data extraction? It really depends on what your needs are, and what resources you have at your disposal. Here are some of the pros and cons of the various approaches, as well as suggestions on when you might use each one:


Raw regular expressions and code

Advantages:

If you’re already familiar with regular expressions and at least one programming language, this can be a quick solution.

Regular expressions allow for a fair amount of “fuzziness” in the matching such that minor changes to the content won’t break them.

You likely don’t need to learn any new languages or tools (again, assuming you’re already familiar with regular expressions and a programming language).

Regular expressions are supported in almost all modern programming languages. Heck, even VBScript has a regular expression engine. It’s also nice because the various regular expression implementations don’t vary too significantly in their syntax.

Disadvantages:

They can be complex for those that don’t have a lot of experience with them. Learning regular expressions isn’t like going from Perl to Java. It’s more like going from Perl to XSLT, where you have to wrap your mind around a completely different way of viewing the problem.

They’re often confusing to analyze. Take a look through some of the regular expressions people have created to match something as simple as an email address and you’ll see what I mean.

If the content you’re trying to match changes (e.g., they change the web page by adding a new “font” tag) you’ll likely need to update your regular expressions to account for the change.

The data discovery portion of the process (traversing various web pages to get to the page containing the data you want) will still need to be handled, and can get fairly complex if you need to deal with cookies and such.

When to use this approach: You’ll most likely use straight regular expressions in screen-scraping when you have a small job you want to get done quickly. Especially if you already know regular expressions, there’s no sense in getting into other tools if all you need to do is pull some news headlines off of a site.

Ontologies and artificial intelligence

Advantages:

You create it once and it can more or less extract the data from any page within the content domain you’re targeting.

The data model is generally built in. For example, if you’re extracting data about cars from web sites the extraction engine already knows what the make, model, and price are, so it can easily map them to existing data structures (e.g., insert the data into the correct locations in your database).

There is relatively little long-term maintenance required. As web sites change you likely will need to do very little to your extraction engine in order to account for the changes.

Disadvantages:

It’s relatively complex to create and work with such an engine. The level of expertise required to even understand an extraction engine that uses artificial intelligence and ontologies is much higher than what is required to deal with regular expressions.

These types of engines are expensive to build. There are commercial offerings that will give you the basis for doing this type of data extraction, but you still need to configure them to work with the specific content domain you’re targeting.

You still have to deal with the data discovery portion of the process, which may not fit as well with this approach (meaning you may have to create an entirely separate engine to handle data discovery). Data discovery is the process of crawling web sites such that you arrive at the pages where you want to extract data.

When to use this approach: Typically you’ll only get into ontologies and artificial intelligence when you’re planning on extracting information from a very large number of sources. It also makes sense to do this when the data you’re trying to extract is in a very unstructured format (e.g., newspaper classified ads). In cases where the data is very structured (meaning there are clear labels identifying the various data fields), it may make more sense to go with regular expressions or a screen-scraping application.

Screen-scraping software

Advantages:

Abstracts most of the complicated stuff away. You can do some pretty sophisticated things in most screen-scraping applications without knowing anything about regular expressions, HTTP, or cookies.

Dramatically reduces the amount of time required to set up a site to be scraped. Once you learn a particular screen-scraping application the amount of time it requires to scrape sites vs. other methods is significantly lowered.

Support from a commercial company. If you run into trouble while using a commercial screen-scraping application, chances are there are support forums and help lines where you can get assistance.

Disadvantages:

The learning curve. Each screen-scraping application has its own way of going about things. This may imply learning a new scripting language in addition to familiarizing yourself with how the core application works.

A potential cost. Most ready-to-go screen-scraping applications are commercial, so you’ll likely be paying in dollars as well as time for this solution.

A proprietary approach. Any time you use a proprietary application to solve a computing problem (and proprietary is obviously a matter of degree) you’re locking yourself into using that approach. This may or may not be a big deal, but you should at least consider how well the application you’re using will integrate with other software applications you currently have. For example, once the screen-scraping application has extracted the data how easy is it for you to get to that data from your own code?

When to use this approach: Screen-scraping applications vary widely in their ease-of-use, price, and suitability to tackle a broad range of scenarios. Chances are, though, that if you don’t mind paying a bit, you can save yourself a significant amount of time by using one. If you’re doing a quick scrape of a single page you can use just about any language with regular expressions. If you want to extract data from hundreds of web sites that are all formatted differently you’re probably better off investing in a complex system that uses ontologies and/or artificial intelligence. For just about everything else, though, you may want to consider investing in an application specifically designed for screen-scraping.

As an aside, I thought I should also mention a recent project we’ve been involved with that has actually required a hybrid approach of two of the aforementioned methods. We’re currently working on a project that deals with extracting newspaper classified ads. The data in classifieds is about as unstructured as you can get. For example, in a real estate ad the term “number of bedrooms” can be written about 25 different ways. The data extraction portion of the process is one that lends itself well to an ontologies-based approach, which is what we’ve done. However, we still had to handle the data discovery portion. We decided to use screen-scraper for that, and it’s handling it just great. The basic process is that screen-scraper traverses the various pages of the site, pulling out raw chunks of data that constitute the classified ads. These ads then get passed to code we’ve written that uses ontologies in order to extract out the individual pieces we’re after. Once the data has been extracted we then insert it into a database.

Q.4 Discuss the need for developing OLAP tools in detail.


Ans: OLAP

OLAP is short for Online Analytical Processing, a category of software tools that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data; for example, they provide time series and trend analysis views. OLAP is often used in data mining.
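To illustrate the multidimensional, slice-and-dice view that OLAP provides, the following sketch pivots a small made-up sales table by two dimensions with pandas; a real OLAP server would do this against cubes rather than an in-memory DataFrame.

import pandas as pd

sales = pd.DataFrame({
    "year":    [2009, 2009, 2010, 2010, 2010],
    "region":  ["East", "West", "East", "West", "West"],
    "product": ["A", "A", "A", "B", "B"],
    "revenue": [120.0, 90.0, 150.0, 60.0, 75.0],
})

# Aggregate revenue along the year and region dimensions (a simple "cube view").
cube = pd.pivot_table(sales, values="revenue",
                      index="year", columns="region",
                      aggfunc="sum", fill_value=0)
print(cube)

# "Slicing" one dimension member, e.g. the West region over time:
print(cube["West"])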

The chief component of OLAP is the OLAP server, which sits between a client and a database management system (DBMS). The OLAP server understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major database systems.

The first commercial multidimensional (OLAP) products appeared approximately 30 years ago (Express). When Edgar Codd introduced the OLAP definition in his 1993 white paper, there were already dozens of OLAP products for client/server and desktop/file server environments. Usually those products were expensive, proprietary, standalone systems afforded only by large corporations, and performed only OLAP functions.

After Codd's research appeared, the software industry began appreciating OLAP functionality and many companies have integrated OLAP features into their products (RDBMS, integrated business intelligence suites, reporting tools, portals, etc.). In addition, for the last decade, pure OLAP tools have considerably improved and become cheaper and more user-friendly.

These developments brought OLAP functionality to a much broader range of users and organizations. Now OLAP is used not only for strategic decision-making in large corporations, but also to make daily tactical decisions about how to better streamline business operations in organizations of all sizes and shapes.

However, the acceptance of OLAP is far from maximized. For example, one year ago, The OLAP Survey 2 found that only thirty percent of its participants actually used OLAP.

General purpose tools with OLAP capabilities

Some organizations do not want to use pure OLAP tools or integrated business intelligence suites, for various reasons. But many organizations may want to use OLAP capabilities integrated into the popular general purpose application development tools they already use. In this case, the organizations do not need to buy and deploy new software products, train staff to use them or hire new people.

There is another argument for creating general purpose tools with OLAP capabilities. End users work with the information they need via applications. The effectiveness of this work depends very much on the number of applications (and the interfaces, data formats, etc. associated with them). So it is very desirable to reduce the number of applications (ideally to one application). General purpose tools with OLAP capabilities allow us to reach the goal. In other words, there's no need to use separate applications based on pure OLAP tools.

The advantages of such an approach to developers and end users are clear, and Microsoft and Oracle have recognized this. Both corporations have steadily integrated OLAP into their RDBMSs and general purpose database application development tools.

Microsoft provides SQL Server to handle a relational view of data and the Analysis Services OLAP engine to handle a multidimensional cube view of data. Analysis Services provides the OLE DB for OLAP API and the MDX language for processing multidimensional cubes, which can be physically stored in relational tables or a multidimensional store. Microsoft Excel and Microsoft Office both provide access to Analysis Services data.

Oracle has finally incorporated the Express OLAP engine into the Oracle9i Database Enterprise Edition Release 2 (OLAP Option). Multidimensional cubes are stored in analytical workspaces, which are managed in an Oracle database using an abstract data type. The existing Oracle tools such as PL/SQL, Oracle Reports, Oracle Discoverer and Oracle BI Beans can query and analyze analytical workspaces. The OLAP API is Java-based and supports a rich OLAP manipulation language, which can be considered to be the multidimensional equivalent of Oracle PL/SQL.

Granulated OLAP

There is yet another type of OLAP tool, different from pure OLAP tools and general purpose tools with OLAP capabilities: OLAP components. It seems that this sort of OLAP is not as appreciated.

The OLAP component is the minimal and elementary tool (granule) for developers to embed OLAP functionality in applications. So we can say that OLAP components are granulated OLAP.

Each OLAP component is used within some application development environment. At present, almost all known OLAP components are ActiveX or VCL ones.

All OLAP components are divided into two classes: OLAP components without an OLAP engine (MOLAP components) and OLAP components with this engine (ROLAP components).

The OLAP components without OLAP engines allow an application to access existing multidimensional cubes on a MOLAP server, which performs the required operations and returns results to the application. At present, all the OLAP components of this kind are designed for access to MS Analysis Services (more precisely, they use OLE DB for OLAP) and were developed by Microsoft partners (Knosys, Matrix, etc.).

The ROLAP components (with OLAP engine) are of much greater interest than the MOLAP ones and general purpose tools with OLAP capabilities because they allow you to create flexible and efficient applications that cannot be developed with other tools.

The ROLAP component with an OLAP engine has three interfaces:

A data access mechanism interface through which it gets access to data sources

APIs through which a developer uses a language like MDX to define data access and processing to build a multidimensional table (cube). The developer manages the properties and behavior of the component so that it fully conforms to the application in which the component is embedded

An end-user GUI that can pivot, filter, drill down and drill up data and generate a number of views from a multidimensional table (cube)

As a rule, ROLAP components are used in client-side applications, so the OLAP engine functions on a client PC. Data access mechanisms like the BDE (Borland Database Engine) or ADO.NET allow you to get source data from relational tables and flat files of an enterprise. PC performance nowadays allows the best ROLAP components to quickly process hundreds of thousands or even millions of records from these data sources, and to dynamically build multidimensional cubes and perform operations on them. So very effective ROLAP and DOLAP are realized -- and in many cases, they are preferable to MOLAP.

For example, the low price and simplicity of using a ROLAP component make it the obvious (and possibly the only) choice for developers creating mass-deployed, small and cheap applications, especially single-user DOLAP applications. Another field in which these components may be preferable is real-time analytical applications (there is no need to create and maintain a MOLAP server or load cubes).

The most widely-known OLAP component is the Microsoft Pivot Table, which has an OLAP engine and access to MS Analysis Services, so the component is both MOLAP and ROLAP. Another well-known OLAP (ROLAP) component is DecisionCube from Borland Corporation.

Some ROLAP components have the ability to store dynamically-built multidimensional cubes, which are usually named microcubes. This feature deserves attention from application architects and designers because it allows them to develop flexible and cheap applications like an enterprise-wide distributed corporate reporting system or Web-based applications.

For example, the ContourCube component stores a microcube with all associated metadata (in fact, it is a container of an analytical application like an Excel workbook) in compressed form (from 10 up to 100 times). So this microcube is optimized for use on the Internet and can be transferred through HTTP and FTP protocols, and via e-mail. An end user with ContourCube is able to fully analyze the microcube. InterSoft Lab, developer of ContourCube component, has also developed several additional tools to facilitate the development, deployment and use of such distributed applications.

At present, there is a broad range of pure OLAP tools, general purpose tools with OLAP capabilities and OLAP components. To make the best choice, developers must understand the benefits and disadvantages of all sorts of OLAP.

Q.5 What do you understand by the term statistical analysis? Discuss the most important statistical techniques.

Ans: Developments in the field of statistical data analysis often parallel or follow advancements in other fields to which statistical methods are fruitfully applied. Because practitioners of statistical analysis often address particular applied decision problems, methods development is consequently motivated by the search for better decision making under uncertainty.

The decision-making process under uncertainty is largely based on the application of statistical data analysis for probabilistic risk assessment of your decision. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities, and second, to apply the concept for the purpose of continuous improvement. This course will provide you with hands-on experience to promote the use of statistical thinking and techniques and to apply them to make educated decisions whenever there is variation in business data. Therefore, it is a course in statistical thinking via a data-oriented approach.

Statistical models are currently used in various fields of business and science. However, the terminology differs from field to field. For example, the fitting of models to data, called calibration, history matching, and data assimilation, are all synonymous with parameter estimation.

Your organization database contains a wealth of information, yet the decision technology group members tap a fraction of it. Employees waste time scouring multiple sources for a database. The decision-makers are frustrated because they cannot get business-critical data exactly when they need it. Therefore, too many decisions are based on guesswork, not facts. Many opportunities are also missed, if they are even noticed at all.

Knowledge is what we know well. Information is the communication of knowledge. In every knowledge exchange, there is a sender and a receiver. The sender makes common what is private; the sender does the informing, the communicating. Information can be classified into explicit and tacit forms. Explicit information can be explained in structured form, while tacit information is inconsistent and fuzzy to explain. Know that data are only crude information and not knowledge by themselves.

Data is known to be crude information and not knowledge by itself. The sequence from data to knowledge is: from data to information, from information to facts, and finally, from facts to knowledge. Data becomes information when it becomes relevant to your decision problem. Information becomes fact when the data can support it. Facts are what the data reveals. However, decisive instrumental (i.e., applied) knowledge is expressed together with some statistical degree of confidence.

Fact becomes knowledge when it is used in the successful completion of a decision process. Once you have a massive amount of facts integrated as knowledge, your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process, based on data, in constructing statistical models for decision making under uncertainties.

The above figure depicts the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases. That's why we need statistical data analysis. Statistical data analysis arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.

Statistical inference aims at determining whether any statistical significance can be attached to results after due allowance is made for any random variation as a source of error. Intelligent and critical inferences cannot be made by those who do not understand the purpose, the conditions, and the applicability of the various techniques for judging significance.

Considering the uncertain environment, the chance that "good decisions" are made increases with the availability of "good information." The chance that "good information" is available increases with the level of structuring the process of Knowledge Management. The above figure also illustrates the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases.

Knowledge is more than knowing something technical. Knowledge needs wisdom. Wisdom is the power to put our time and our knowledge to the proper use. Wisdom comes with age and experience. Wisdom is the accurate application of accurate knowledge, and its key component is knowing the limits of your knowledge. Wisdom is about knowing how something technical can be best used to meet the needs of the decision-maker. Wisdom, for example, creates statistical software that is useful, rather than technically brilliant. For example, ever since the Web entered the popular consciousness, observers have noted that it puts information at your fingertips but tends to keep wisdom out of reach.

Almost every professional needs a statistical toolkit. Statistical skills enable you to intelligently collect, analyze and interpret data relevant to your decision-making. Statistical concepts enable us to solve problems in a diversity of contexts. Statistical thinking enables you to add substance to your decisions.

The appearance of computer software, JavaScript Applets, Statistical Demonstrations Applets, and Online Computation are the most important events in the process of teaching and learning concepts in model-based statistical decision making courses. These tools allow you to construct numerical examples to understand the concepts, and to find their significance for yourself.

We will apply the basic concepts and methods of statistics you've already learned in the previous statistics course to real-world problems. The course is tailored to meet your needs in statistical business-data analysis using widely available commercial statistical computer packages such as SAS and SPSS. By doing this, you will inevitably find yourself asking questions about the data and the method proposed, and you will have the means at your disposal to settle these questions to your own satisfaction. Accordingly, all the application problems are borrowed from business and economics. By the end of this course you'll be able to think statistically while performing any data analysis.


There are two general views of teaching/learning statistics: Greater and Lesser Statistics. Greater statistics is everything related to learning from data, from the first planning or collection, to the last presentation or report. Lesser statistics is the body of statistical methodology. This is a Greater Statistics course.

There are basically two kinds of "statistics" courses. The real kind shows you how to make sense out of data. These courses would include all the recent developments and all share a deep respect for data and truth. The imitation kind involves plugging numbers into statistics formulas. The emphasis is on doing the arithmetic correctly. These courses generally have no interest in data or truth, and the problems are generally arithmetic exercises. If a certain assumption is needed to justify a procedure, they will simply tell you to "assume the ... are normally distributed" -- no matter how unlikely that might be. It seems like you all are suffering from an overdose of the latter. This course will bring out the joy of statistics in you.

Statistics is a science assisting you to make decisions under uncertainty (based on some numerical and measurable scales). The decision-making process must be based on data, not on personal opinion or belief.

It is already an accepted fact that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let us be ahead of our time.

Analysis Of Variance

An important technique for analyzing the effect of categorical factors on a response is to perform an Analysis of Variance. An ANOVA decomposes the variability in the response variable amongst the different factors. Depending upon the type of analysis, it may be important to determine: (a) which factors have a significant effect on the response, and/or (b) how much of the variability in the response variable is attributable to each factor.

STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:

1. One-Way ANOVA - used when there is only a single categorical factor. This is equivalent to comparing multiple groups of data.

2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in a crossed pattern. When factors are crossed, the levels of one factor appear at more than one level of the other factors.

3. Variance Components Analysis - used when there are multiple factors, arranged in a hierarchical manner. In such a design, each factor is nested in the factor above it.

4. General Linear Models - used whenever there are both crossed and nested factors, when some factors are fixed and some are random, and when both categorical and quantitative factors are present.

One-Way ANOVA

A one-way analysis of variance is used when the data are divided into groups according to only one factor. The questions of interest are usually: (a) Is there a significant difference between the groups? and (b) If so, which groups are significantly different from which others? Statistical tests are provided to compare group means, group medians, and group standard deviations. When comparing means, multiple range tests are used, the most popular of which is Tukey's HSD procedure. For equal size samples, significant group differences can be determined by examining the means plot and identifying those intervals that do not overlap.
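A minimal one-way ANOVA sketch using SciPy is shown below, with made-up measurement groups (not the STATGRAPHICS data discussed here); it returns the F-ratio and p-value used to judge whether the group means differ.

from scipy import stats

# Hypothetical response values measured under three levels of a single factor.
group_a = [21.5, 22.1, 20.9, 23.0]
group_b = [24.2, 25.0, 23.8, 24.7]
group_c = [22.0, 21.7, 22.5, 22.9]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A small p-value (e.g. below 0.05) suggests at least one group mean differs;
# multiple range tests such as Tukey's HSD would then identify which groups.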

Multifactor ANOVA

When more than one factor is present and the factors are crossed, a multifactor ANOVA is appropriate. Both main effects and interactions between the factors may be estimated. The output includes an ANOVA table and a new graphical ANOVA from the latest edition of Statistics for Experimenters by Box, Hunter and Hunter (Wiley, 2005). In a graphical ANOVA, the points are scaled so that any levels that differ by more than is exhibited in the distribution of the residuals are significantly different.

Variance Components Analysis

A Variance Components Analysis is most commonly used to determine the level at which variability is being introduced into a product. A typical experiment might select several batches, several samples from each batch, and then run replicate tests on each sample. The goal is to determine the relative percentages of the overall process variability that are being introduced at each level.

General Linear Model

The General Linear Models procedure is used whenever the above procedures are not appropriate. It can be used for models with both crossed and nested factors, models in which one or more of the variables is random rather than fixed, and when quantitative factors are to be combined with categorical ones. Designs that can be analyzed with the GLM procedure include partially nested designs, repeated measures experiments, split plots, and many others. For example, pages 536-540 of the book Design and Analysis of Experiments (sixth edition) by Douglas Montgomery (Wiley, 2005) contains an example of an experimental design with both crossed and nested factors. For that data, the GLM procedure produces several important tables, including estimates of the variance components for the random factors.

Analysis of Variance for Assembly Time

Source                      Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model                       243.7            23   10.59         4.54      0.0002
Residual                    56.0             24   2.333
Total (Corr.)               299.7            47

Type III Sums of Squares

Source                      Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Layout                      4.083            1    4.083         0.34      0.5807
Operator(Layout)            71.92            6    11.99         2.18      0.1174
Fixture                     82.79            2    41.4          7.55      0.0076
Layout*Fixture              19.04            2    9.521         1.74      0.2178
Fixture*Operator(Layout)    65.83            12   5.486         2.35      0.0360
Residual                    56.0             24   2.333
Total (corrected)           299.7            47

Expected Mean Squares

Source                      EMS
Layout                      (6)+2.0(5)+6.0(2)+Q1
Operator(Layout)            (6)+2.0(5)+6.0(2)
Fixture                     (6)+2.0(5)+Q2
Layout*Fixture              (6)+2.0(5)+Q3
Fixture*Operator(Layout)    (6)+2.0(5)
Residual                    (6)

Variance Components

Source                      Estimate
Operator(Layout)            1.083
Fixture*Operator(Layout)    1.576
Residual                    2.333

 
