
SIKKIM MANIPAL UNIVERSITY

BUSINESS INTELLIGENCE TOOLS – 4 CREDITS

SUBJECT CODE – MI0036

ASSIGNMENT SET – 1

Q.1 Define the term business intelligence tools. Discuss the roles in a Business Intelligence project.

Business Intelligence (BI) is a generic term used to describe leveraging an organization's internal and external data and information to make the best possible business decisions. The field of business intelligence is very diverse and comprises the tools and technologies used to access and analyze various types of business information. These tools gather and store the data and allow the user to view and analyze the information from a wide variety of dimensions, thereby assisting decision-makers in making better business decisions. Thus Business Intelligence (BI) systems and tools play a vital role in helping organizations make improved decisions in the current cut-throat competitive scenario. In simple terms, Business Intelligence is an

environment in which business users receive reliable, consistent, meaningful and timely

information. This data enables business users to conduct analyses that yield an overall

understanding of how the business has been, how it is now and how it will be in the near

future. Also, the BI tools monitor the financial and operational health of the organization

through generation of various types of reports, alerts, alarms, key performance indicators

and dashboards. Business intelligence tools are a type of application software designed to

help in making better business decisions. These tools aid in the analysis and presentation

of data in a more meaningful way and so play a key role in the strategic planning

process of an organization. They illustrate business intelligence in the areas of market

research and segmentation, customer profiling, customer support, profitability, and

inventory and distribution analysis to name a few. Various types of BI systems viz.

Decision Support Systems, Executive Information Systems (EIS), Multidimensional


Analysis software or OLAP (On-Line Analytical Processing) tools, data mining tools are

discussed further. Whatever the type, the Business Intelligence capability of the system is to let its users slice and dice the information from their organization's numerous

databases without having to wait for their IT departments to develop complex queries and

elicit answers.

Although it is possible to build BI systems without the benefit of a data warehouse, most

of the systems are an integral part of the user-facing end of the data warehouse in

practice. In fact, one can hardly think of building a data warehouse without BI systems. That is why the terms 'data warehousing' and 'business intelligence' are sometimes used interchangeably.

The figure below depicts how data at one end gets transformed into information for business use at the other end.

Roles in Business Intelligence project:

A typical BI Project consists of the following roles and the responsibilities of each of

these roles are detailed below:

Project Manager: Monitors progress on a continuous basis and is responsible for

the success of the project.

Technical Architect: Develops and implements the overall technical architecture

of the BI system, from the backend hardware/software to the client desktop

configurations.

Database Administrator (DBA): Keeps the database available for the applications

to run smoothly and is also involved in planning and executing a backup/recovery

plan, as well as performance tuning.

ETL Developer: Plans, develops, and deploys the extraction, transformation, and loading routines that move data from the legacy systems into the data warehouse.


Front End Developer: Develops the front-end, whether it be client-server or over

the web.

OLAP Developer: Develops the OLAP cubes.

Data Modeler: Is responsible for taking the data structure that exists in the enterprise and modeling it into a schema that is suitable for OLAP analysis.

QA Group: Ensures the correctness of the data in the data warehouse.

Trainer: Works with the end users to make them familiar with how the front end

is set up so that the end users can get the most benefit out of the system.

Q.2 What do you mean by a data warehouse? What are the major concepts and terminology used in the study of data warehouses?

In computing, a data warehouse (DW) is a database used for reporting and analysis. The

data stored in the warehouse is uploaded from the operational systems. The data may pass

through an operational data store for additional operations before it is used in the DW for

reporting.


A data warehouse maintains its functions in three layers: staging, integration, and

access. Staging is used to store raw data for use by developers. The integration layer is

used to integrate data and to have a level of abstraction from users. The access layer is for

getting data out for users.

Data warehouses can be subdivided into data marts. Data marts store subsets of data from

a warehouse.

This definition of the data warehouse focuses on data storage. The main source of the

data is cleaned, transformed, catalogued and made available for use by managers and

other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien, 2009). However, the means to retrieve

and analyze data, to extract, transform and load data, and to manage the data

dictionary are also considered essential components of a data warehousing system. Many

references to data warehousing use this broader context. Thus, an expanded definition for

data warehousing includes business intelligence tools, tools to extract, transform and

load data into the repository, and tools to manage and retrieve metadata.

A common way of introducing data warehousing is to refer to the characteristics of a data

warehouse as set forth by William Inmon:

Subject Oriented

Integrated

Nonvolatile

Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about

your company's sales data, you can build a warehouse that concentrates on sales. Using

this warehouse, you can answer questions like "Who was our best customer for this item

last year?" This ability to define a data warehouse by subject matter, sales in this case,

makes the data warehouse subject oriented.

Integrated


Integration is closely related to subject orientation. Data warehouses must put data from

disparate sources into a consistent format. They must resolve such problems as naming

conflicts and inconsistencies among units of measure. When they achieve this, they are

said to be integrated.

Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is

logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very

much in contrast to online transaction processing (OLTP) systems, where performance

requirements demand that historical data be moved to an archive. A data warehouse's

focus on change over time is what is meant by the term time variant.

DATA WAREHOUSE TERMINOLOGY

Bruce W. Johnson, M.S.

Ad Hoc Query:

A database search that is designed to extract specific information from a database. It is

ad hoc if it is designed at the point of execution as opposed to being a “canned” report.

Most ad hoc query software uses the structured query language (SQL).
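
For illustration, a minimal sketch of an ad hoc query in Python using the standard sqlite3 module; the sales table, its columns, and the sample rows are hypothetical:

import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, item TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Acme", "Widget", 2010, 1200.0), ("Binford", "Widget", 2010, 950.0)],
)

# An ad hoc question, composed at the point of execution rather than canned:
# "Who was our best customer for this item last year?"
row = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM sales"
    " WHERE item = ? AND year = ?"
    " GROUP BY customer ORDER BY total DESC LIMIT 1",
    ("Widget", 2010),
).fetchone()
print(row)  # ('Acme', 1200.0)

Because the SQL is composed at the moment it is needed, the same connection can answer any new question without waiting for a canned report to be built.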

Aggregation:

The process of summarizing or combining data.

Catalog:

A component of a data dictionary that describes and organizes the various aspects of a

database such as its folders, dimensions, measures, prompts, functions, queries and other

database objects. It is used to create queries, reports, analyses and cubes.

Cross Tab:

A type of multi-dimensional report that displays values or measures in cells created by

the intersection of two or more dimensions in a table format.
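
As a minimal sketch (again with a hypothetical sales table), a cross tab can be produced in SQL by keeping one dimension on the rows and pivoting a second dimension into columns with conditional aggregation:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 2010, 100.0), ("East", 2011, 150.0),
     ("West", 2010, 80.0), ("West", 2011, 120.0)],
)

# Rows carry the region dimension; columns pivot the year dimension;
# each cell holds the summed measure at that intersection.
for row in conn.execute(
    "SELECT region,"
    " SUM(CASE WHEN year = 2010 THEN amount ELSE 0 END) AS y2010,"
    " SUM(CASE WHEN year = 2011 THEN amount ELSE 0 END) AS y2011"
    " FROM sales GROUP BY region"
):
    print(row)
# ('East', 100.0, 150.0)
# ('West', 80.0, 120.0)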


Dashboard:

A data visualization method and workflow management tool that brings together useful

information on a series of screens and/or web pages. Some of the information that may

be contained on a dashboard includes reports, web links, calendar, news, tasks, e-mail,

etc. When incorporated into a DSS or EIS, key performance indicators may be represented as graphics that are linked to various hyperlinks, graphs, tables and other reports. The dashboard draws its information from multiple sources: applications, office products, databases, the Internet, etc.

Cube:

A multi-dimensional matrix of data that has multiple dimensions (independent variables)

and measures (dependent variables) that are created by an Online Analytical Processing

System (OLAP). Each dimension may be organized into a hierarchy with multiple levels.

The intersection of two or more dimensional categories is referred to as a cell.

Data-based Knowledge:

Factual information used in the decision making process that is derived from data marts

or warehouses using business intelligence tools. Data warehousing organizes information

into a format so that it represents an organization's knowledge with respect to a particular

subject area, e.g. finance or clinical outcomes.

Data Cleansing:

The process of cleaning or removing errors, redundancies and inconsistencies in the data

that is being imported into a data mart or data warehouse. It is part of the quality

assurance process.
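
A minimal cleansing sketch in Python, assuming hypothetical incoming records: it removes an unusable row, fixes stray whitespace, and drops a redundant duplicate before the data is loaded:

# Hypothetical incoming rows with errors and redundancies.
raw = [
    {"id": 1, "name": " Acme "},
    {"id": 1, "name": "Acme"},      # duplicate of id 1 after trimming
    {"id": 2, "name": "Binford"},
    {"id": None, "name": "???"},    # unusable row
]

cleaned, seen = [], set()
for rec in raw:
    if rec["id"] is None:           # remove records that fail validation
        continue
    rec = {"id": rec["id"], "name": rec["name"].strip()}  # fix whitespace errors
    if rec["id"] in seen:           # remove redundant rows
        continue
    seen.add(rec["id"])
    cleaned.append(rec)

print(cleaned)  # [{'id': 1, 'name': 'Acme'}, {'id': 2, 'name': 'Binford'}]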

Data Mart:

A database that is similar in structure to a data warehouse, but is typically smaller and is

focused on a more limited area. Multiple, integrated data marts are sometimes referred to

as an Integrated Data Warehouse. Data marts may be used in place of a larger data


warehouse or in conjunction with it. They are typically less expensive to develop and

faster to deploy and are therefore becoming more popular with smaller organizations.

Data Migration:

The transfer of data from one platform to another. This may include conversion from one

language, file structure and/or operating environment to another.

Data Mining:

The process of researching data marts and data warehouses to detect specific patterns in

the data sets. Data mining may be performed on databases and multi-dimensional data

cubes with ad hoc query tools and OLAP software. The queries and reports are typically

designed to answer specific questions to uncover trends or hidden relationships in the

data.

Data Scrubbing:

See Data Cleansing

Data Transformation:

The modification of transaction data extracted from one or more data sources before it is

loaded into the data mart or warehouse. The modifications may include data cleansing,

translation of data into a common format so that it can be aggregated and compared,

summarizing the data, etc.

Data Warehouse:

An integrated, non-volatile database of historical information that is designed around

specific content areas and is used to answer questions regarding an organization's

operations and environment.

Database Management System:


The software that is used to create data warehouses and data marts. For the purposes of

data warehousing, they typically include relational database management systems and

multi-dimensional database management systems. Both types of database management

systems create the database structures, store and retrieve the data and include various

administrative functions.

Decision Support System (DSS):

A set of queries, reports, rule-based analyses, tables and charts that are designed to aid

management with their decision-making responsibilities. These functions are typically

“wrapped around” a data mart or data warehouse. The DSS tends to employ more

detailed level data than an EIS.

Dimension:

A variable, perspective or general category of information that is used to organize and

analyze information in a multi-dimensional data cube.

Drill Down:

The ability of a data-mining tool to move down into increasing levels of detail in a data

mart, data warehouse or multi-dimensional data cube.

Drill Up:

The ability of a data-mining tool to move back up into higher levels of data in a data

mart, data warehouse or multi-dimensional data cube.

Executive Information System (EIS):

A type of decision support system designed for executive management that reports

summary level information as opposed to greater detail derived in a decision support

system.

Extraction, Transformation and Loading (ETL) Tool:


Software that is used to extract data from a data source like an operational system or data

warehouse, modify the data and then load it into a data mart, data warehouse or multi-

dimensional data cube.

Granularity:

The level of detail in a data store or report.

Hierarchy:

The organization of data, e.g. a dimension, into an outline or logical tree structure. The

strata of a hierarchy are referred to as levels. The individual elements within a level are

referred to as categories. The next lower level in a hierarchy is the child; the next higher

level containing the children is their parent.

Legacy System:

Older systems developed on platforms that tend to be one or more generations behind the

current state-of-the-art applications. Data marts and warehouses were developed in large

part due to the difficulty in extracting data from these systems and the inconsistencies and

incompatibilities among them.

Level:

A tier or strata in a dimensional hierarchy. Each lower level represents an increasing

degree of detail. Levels in a location dimension might include country, region, state,

county, city, zip code, etc.
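
A minimal sketch of rolling a measure up and down such a hierarchy, with hypothetical fact rows and a location dimension:

# Hypothetical fact rows tagged with levels of a location hierarchy.
facts = [
    {"country": "US", "state": "GA", "city": "Atlanta", "sales": 120.0},
    {"country": "US", "state": "GA", "city": "Savannah", "sales": 45.0},
    {"country": "US", "state": "CA", "city": "Fresno", "sales": 80.0},
]

def rollup(rows, level):
    """Aggregate the sales measure at one level of the location hierarchy."""
    totals = {}
    for r in rows:
        totals[r[level]] = totals.get(r[level], 0.0) + r["sales"]
    return totals

print(rollup(facts, "country"))  # drill up to the coarsest level: {'US': 245.0}
print(rollup(facts, "state"))    # drill down one level: {'GA': 165.0, 'CA': 80.0}
print(rollup(facts, "city"))     # finest level of detail shown here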

Measure:

A quantifiable variable or value stored in a multi-dimensional OLAP cube. It is a value

in the cell at the intersection of two or more dimensions.

Member:

One of the data points for a level of a dimension.

Meta Data:


Information in a data mart or warehouse that describes the tables, fields, data types,

attributes and other objects in the data warehouse and how they map to their data sources.

Meta data is contained in database catalogs and data dictionaries.

Multi-Dimensional Online Analytical Processing (MOLAP):

Software that creates and analyzes multi-dimensional cubes to store its information.

Non-Volatile Data:

Data that is static or that does not change. In transaction processing systems the data is

updated on a continual basis. In a data warehouse the database is added to or

appended, but the existing data seldom changes.

Normalization:

The process of eliminating duplicate information in a database by creating a separate

table that stores the redundant information. For example, it would be highly inefficient to

re-enter the address of an insurance company with every claim. Instead, the database

uses a key field to link the claims table to the address table. Operational or transaction

processing systems are typically “normalized”. On the other hand, some data warehouses

find it advantageous to de-normalize the data allowing for some degree of redundancy.
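
The insurance example above can be sketched as follows (table and column names are hypothetical); the claims table stores only a key field that links to the insurer's address instead of repeating it on every claim:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE insurer (
    insurer_id INTEGER PRIMARY KEY,
    name       TEXT,
    address    TEXT          -- stored once, not re-entered with every claim
);
CREATE TABLE claim (
    claim_id   INTEGER PRIMARY KEY,
    insurer_id INTEGER REFERENCES insurer(insurer_id),  -- key field link
    amount     REAL
);
INSERT INTO insurer VALUES (1, 'Acme Insurance', '1 Main St, Atlanta GA');
INSERT INTO claim VALUES (100, 1, 250.0), (101, 1, 75.5);
""")

# The join reassembles the address on demand instead of duplicating it.
for row in conn.execute(
    "SELECT c.claim_id, i.name, i.address, c.amount"
    " FROM claim c JOIN insurer i ON i.insurer_id = c.insurer_id"
):
    print(row)

A de-normalized warehouse design would instead copy the address into each claim row, trading redundancy for fewer joins at query time.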

Online Analytical Processing (OLAP):

The process employed by multi-dimensional analysis software to analyze the data

resident in data cubes. There are different types of OLAP systems named for the type of

database employed to create them and the data structures produced.

Open Database Connectivity (ODBC):

A database standard developed by Microsoft and the SQL Access Group Consortium that

defines the “rules” for accessing or retrieving data from a database.

Relational Database Management System:


Database management systems that have the ability to link tables of data through a

common or key field. Most databases today use relational technologies and support a

standard programming language called Structured Query Language (SQL).

Relational Online Analytical Processing (ROLAP):

OLAP software that employs a relational strategy to organize and store the data in its

database.

Replication:

The process of copying data from one database table to another.

Scalable:

The attribute or capability of a database to significantly expand the number of records

that it can manage. It also refers to hardware systems and their ability to be expanded or

upgraded to increase their processing speed and handle larger volumes of data.

Structured Query Language (SQL):

A standard programming language used by contemporary relational database

management systems.

Synchronization:

The process by which the data in two or more separate databases are synchronized so that

the records contain the same information. If the fields and records are updated in one

database the same fields and records are updated in the other.


Q.3 What are the data modeling techniques used in a data warehousing environment?

Two data modeling techniques that are relevant in a data warehousing environment are

ER modeling and dimensional modeling.

ER modeling produces a data model of the specific area of interest, using two basic

concepts: entities and the relationships between those entities. Detailed

ER models also contain attributes, which can be properties of either the entities or the

relationships. The ER model is an abstraction tool because it can be used to understand

and simplify the ambiguous data relationships in the business world and complex systems

environments.

Dimensional modeling uses three basic concepts: measures, facts, and dimensions.

Dimensional modeling is powerful in representing the requirements of the business user

in the context of database tables.

Both ER and dimensional modeling can be used to create an abstract model of a specific

subject. However, each has its own limited set of modeling concepts and associated

notation conventions. Consequently, the techniques look different, and they are indeed

different in terms of semantic representation. The following sections describe the

modeling concepts and notation conventions for both ER modeling and dimensional

modeling that are used in the remainder of this answer.

ER Modeling

A basic knowledge of ER modeling is assumed here, so the focus is not on that traditional technique. This section simply defines the necessary terms to form some consensus and presents the notation conventions used in what follows.


Figure 12. A Sample ER Model. Entity, relationship, and attributes in an ER diagram.

Basic Concepts

An ER model is represented by an ER diagram, which uses three basic graphic symbols

to conceptualize the data: entity, relationship, and attribute.

Entity

An entity is defined to be a person, place, thing, or event of interest to the business or the

organization. An entity represents a class of objects, which are things in the real world

that can be observed and classified by their properties and characteristics. In some books

on IE, the term entity type is used to represent classes of objects and entity for an instance

of an entity type. Here, the two terms are used interchangeably.

Relationship

A relationship is represented with lines drawn between entities. It depicts the structural

interaction and association among the entities in a model. A relationship is designated

grammatically by a verb, such as owns, belongs, and has. The relationship between two

entities can be defined in terms of the cardinality. This is the maximum number of

instances of one entity that are related to a single instance of another entity, and vice versa.

The possible cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many

(M:M).


In a detailed (normalized) ER model, any M:M relationship is not shown because it is

resolved to an associative entity.
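
A minimal sketch of such a resolution, with hypothetical orders and product entities: the M:M relationship between them becomes the associative entity order_line:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE orders  (order_id   INTEGER PRIMARY KEY, order_date  TEXT);

-- The M:M relationship between orders and product is resolved into an
-- associative entity, order_line, with one row per (order, product) pair.
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);

INSERT INTO product VALUES (1, 'Widget');
INSERT INTO orders  VALUES (10, '2010-05-01');
INSERT INTO order_line VALUES (10, 1, 3);  -- one order-product pairing
""")
print(conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0], "order line(s)")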

Attributes

Attributes describe the characteristics or properties of the entities. In Figure 12,

Product ID, Description, and Picture are attributes of the PRODUCT entity. For

clarification, attribute naming conventions are very important. An attribute name should

be unique in an entity and should be self-explanatory. For example, simply saying date1 or date2 is not allowed; each must be clearly defined, for instance as the order date and the delivery date.

Dimensional Modeling

In some respects, dimensional modeling is simpler, more expressive, and easier to

understand than ER modeling. But dimensional modeling is a relatively new concept, not yet firmly defined in detail, especially when compared to ER modeling techniques. This section presents the terminology used here to discuss dimensional modeling.

Basic Concepts

Dimensional modeling is a technique for conceptualizing and visualizing data models as

a set of measures that are described by common aspects of the business. It is especially

useful for summarizing and rearranging the data and presenting views of the data to

support data analysis. Dimensional modeling focuses on numeric data, such as values,

counts, weights, balances, and occurrences.

Dimensional modeling has several basic concepts:

· Facts

· Dimensions

· Measures (variables)

Fact

A fact is a collection of related data items, consisting of measures and context data. Each

fact typically represents a business item, a business transaction, or an event that can be

used in analyzing the business or business processes.

In a data warehouse, facts are implemented in the core tables in which all of the numeric

data is stored.


Dimension

A dimension is a collection of members or units of the same type. In a diagram,

a dimension is usually represented by an axis. In a dimensional model, every data point in

the fact table is associated with one and only one member from each of the multiple

dimensions. That is, dimensions determine the contextual background for the facts. Many

analytical processes are used to quantify the impact of dimensions on the facts.

Dimensions are the parameters over which we want to perform Online Analytical

Processing (OLAP).

Measure

A measure is a numeric attribute of a fact, representing the performance or behavior of

the business relative to the dimensions. The actual numbers are called variables. For

example, measures are the sales in money, the sales volume, the quantity supplied, the

supply cost, the transaction amount, and so forth. A measure is determined by

combinations of the members of the dimensions and is located on facts.
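
A minimal star-schema sketch ties the three concepts together (all names are hypothetical): the fact table holds one foreign key per dimension plus the numeric measures, and a query aggregates the measures over combinations of dimension members:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE fact_sales (
    product_id   INTEGER REFERENCES dim_product(product_id),
    date_id      INTEGER REFERENCES dim_date(date_id),
    sales_amount REAL,      -- measure
    sales_volume INTEGER    -- measure
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (20100101, 2010), (20110101, 2011);
INSERT INTO fact_sales VALUES (1, 20100101, 500.0, 5), (2, 20110101, 300.0, 2);
""")

# Measures aggregated over combinations of dimension members.
for row in conn.execute(
    "SELECT p.description, d.year, SUM(f.sales_amount), SUM(f.sales_volume)"
    " FROM fact_sales f"
    " JOIN dim_product p ON p.product_id = f.product_id"
    " JOIN dim_date d    ON d.date_id    = f.date_id"
    " GROUP BY p.description, d.year"
):
    print(row)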

Q.4 Discuss the categories into which data is divided before structuring it into a data warehouse.

A Data Warehouse is not an individual repository product. Rather, it is an overall

strategy, or process, for building decision support systems and a knowledge-based

applications architecture and environment that supports both everyday tactical decision

making and long-term business strategizing. The Data Warehouse environment positions

a business to utilize an enterprise-wide data store to link information from diverse

sources and make the information accessible for a variety of user purposes, most notably,

strategic analysis. Business analysts must be able to use the Warehouse for such strategic

purposes as trend identification, forecasting, competitive analysis, and targeted market

research.

Data Warehouses and Data Warehouse applications are designed primarily to support

executives, senior managers, and business analysts in making complex business

decisions. Data Warehouse applications provide the business community with access to

accurate, consolidated information from various internal and external sources.


The primary objective of Data Warehousing is to bring together information from

disparate sources and put the information into a format that is conducive to making

business decisions. This objective necessitates a set of activities that are far more

complex than just collecting data and reporting against it. Data Warehousing requires

both business and technical expertise and involves the following activities:

Accurately identifying the business information that must be contained in the

Warehouse

Identifying and prioritizing subject areas to be included in the Data Warehouse

Managing the scope of each subject area which will be implemented into the

Warehouse on an iterative basis

Developing a scalable architecture to serve as the Warehouse’s technical and

application foundation, and identifying and selecting the

hardware/software/middleware components to implement it

Extracting, cleansing, aggregating, transforming and validating the data to ensure

accuracy and consistency

Defining the correct level of summarization to support business decision making

Establishing a refresh program that is consistent with business needs, timing and

cycles

Providing user-friendly, powerful tools at the desktop to access the data in the

Warehouse

Educating the business community about the realm of possibilities that are

available to them through Data Warehousing

Establishing a Data Warehouse Help Desk and training users to effectively utilize

the desktop tools


Establishing processes for maintaining, enhancing, and ensuring the ongoing

success and applicability of the Warehouse

Until the advent of Data Warehouses, enterprise databases were expected to serve

multiple purposes, including online transaction processing, batch processing, reporting,

and analytical processing. In most cases, the primary focus of computing resources was

on satisfying operational needs and requirements. Information reporting and analysis

needs were secondary considerations. As the use of PCs, relational databases, 4GL

technology and end-user computing grew and changed the complexion of information

processing, more and more business users demanded that their needs for information be

addressed. Data Warehousing has evolved to meet those needs without disrupting

operational processing.

In the Data Warehouse model, operational databases are not accessed directly to perform

information processing. Rather, they act as the source of data for the Data Warehouse,

which is the information repository and point of access for information processing. There

are sound reasons for separating operational and informational databases, as described

below.

The users of informational and operational data are different. Users of

informational data are generally managers and analysts; users of operational data

tend to be clerical, operational and administrative staff.

Operational data differs from informational data in context and currency.

Informational data contains an historical perspective that is not generally used by

operational systems.

The technology used for operational processing frequently differs from the

technology required to support informational needs.

The processing characteristics for the operational environment and the

informational environment are fundamentally different.

The Data Warehouse functions as a Decision Support System (DSS) and an Executive

Information System (EIS), meaning that it supports informational and analytical needs by


providing integrated and transformed enterprise-wide historical data from which to do

management analysis. A variety of sophisticated tools are readily available in the

marketplace to provide user-friendly access to the information stored in the Data

Warehouse.

Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-

volatile collections of data used to support analytical decision making. The data in the

Warehouse comes from the operational environment and external sources. Data

Warehouses are physically separated from operational systems, even though the

operational systems feed the Warehouse with source data.

Subject Orientation

Data Warehouses are designed around the major subject areas of the enterprise; the

operational environment is designed around applications and functions. This difference in

orientation (data vs. process) is evident in the content of the database. Data Warehouses

do not contain information that will not be used for informational or analytical

processing; operational databases contain detailed data that is needed to satisfy

processing requirements but which has no relevance to management or analysis.

Integration and Transformation

The data within the Data Warehouse is integrated. This means that there is consistency

among naming conventions, measurements of variables, encoding structures, physical

attributes, and other salient data characteristics. An example of this integration is the

treatment of codes such as gender codes. Within a single corporation, various

applications may represent gender codes in different ways: male vs. female, m vs. f, and

1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way,

regardless of the many ways by which it may be encoded and stored in the source data.


As the data is moved to the Warehouse, it is transformed into a consistent representation

as required.
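
A minimal sketch of this transformation step, assuming three hypothetical source systems that encode gender differently, as described above:

# Each hypothetical source system encodes gender in its own way.
SOURCE_ENCODINGS = {
    "hr_system":     {"male": "M", "female": "F"},
    "sales_system":  {"m": "M", "f": "F"},
    "legacy_system": {"1": "M", "0": "F"},
}

def integrate_gender(source: str, raw_value: str) -> str:
    """Transform a source-specific code into the warehouse's single
    consistent representation ('M' or 'F')."""
    return SOURCE_ENCODINGS[source][raw_value.strip().lower()]

print(integrate_gender("hr_system", "Female"))   # F
print(integrate_gender("legacy_system", "1"))    # M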

Time Variance

All data in the Data Warehouse is accurate as of some moment in time, providing an

historical perspective. This differs from the operational environment in which data is

intended to be accurate as of the moment of access. The data in the Data Warehouse is, in

effect, a series of snapshots. Once the data is loaded into the enterprise data store and data

marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the

business need. The operational data store, if included in the Warehouse architecture, may

be updated.

Non-Volatility

Data in the Warehouse is static, not dynamic. The only operations that occur in Data

Warehouse applications are the initial loading of data, access of data, and refresh of data.

For these reasons, the physical design of a Data Warehouse optimizes the access of data,

rather than focusing on the requirements of data update and delete processing.

Data Warehouse Configurations

A Data Warehouse configuration, also known as the logical architecture, includes the

following components:

One Enterprise Data Store (EDS) - a central repository which supplies atomic

(detail level) integrated information to the whole organization.

(optional) one Operational Data Store - a "snapshot" of enterprise-wide data at a moment in time

(optional) one or more individual Data Mart(s) - summarized subset of the

enterprise's data specific to a functional area or department, geographical region,

or time period


One or more Metadata Store(s) or Repository(ies) - catalog(s) of reference

information about the primary data. Metadata is divided into two categories:

information for technical use, and information for business end-users.

The EDS is the cornerstone of the Data Warehouse. It can be accessed for both

immediate informational needs and for analytical processing in support of strategic

decision making, and can be used for drill-down support for the Data Marts which

contain only summarized data. It is fed by the existing subject area operational systems

and may also contain data from external sources. The EDS in turn feeds individual Data

Marts that are accessed by end-user query tools at the user's desktop. It is used to

consolidate related data from multiple sources into a single source, while the Data Marts

are used to physically distribute the consolidated data into logical categories of data, such

as business functional departments or geographical regions. The EDS is a collection of

daily "snapshots" of enterprise-wide data taken over an extended time period, and thus

retains and makes available for tracking purposes the history of changes to a given data

element over time. This creates an optimum environment for strategic analysis. However,

access to the EDS can be slow, due to the volume of data it contains, which is a good

reason for using Data Marts to filter, condense and summarize information for specific

business areas. In the absence of the Data Mart layer, users can access the EDS directly.

Metadata is "data about data," a catalog of information about the primary data that

defines access to the Warehouse. It is the key to providing users and developers with a

road map to the information in the Warehouse. Metadata comes in two different forms:

end-user and transformational. End-user metadata serves a business purpose; it translates

a cryptic name code that represents a data element into a meaningful description of the

data element so that end-users can recognize and use the data. For example, metadata

would clarify that the data element "ACCT_CD" represents "Account Code for Small

Business." Transformational metadata serves a technical purpose for development and

maintenance of the Warehouse. It maps the data element from its source system to the

Data Warehouse, identifying it by source field name, destination field code,

transformation routine, business rules for usage and derivation, format, key, size, index


and other relevant transformational and structural information. Each type of metadata is

kept in one or more repositories that service the Enterprise Data Store.
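
A minimal sketch of the two kinds of metadata, using the ACCT_CD example above (the transformational details shown are hypothetical):

# End-user metadata: translates cryptic element names into business terms.
end_user_metadata = {
    "ACCT_CD": "Account Code for Small Business",
}

# Transformational metadata: maps the element from source to warehouse.
transformational_metadata = {
    "ACCT_CD": {
        "source_system":  "billing",     # hypothetical source system
        "source_field":   "ACT_CODE",
        "destination":    "account_code",
        "transformation": "uppercase; pad to 6 characters",
        "format":         "CHAR(6)",
        "is_key":         True,
    },
}

print(end_user_metadata["ACCT_CD"])
print(transformational_metadata["ACCT_CD"]["source_field"])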

While an Enterprise Data Store and Metadata Store(s) are always included in a sound

Data Warehouse design, the specific number of Data Marts (if any) and the need for an

Operational Data Store are judgment calls. Potential Data Warehouse configurations

should be evaluated and a logical architecture determined according to business

requirements.

The Data Warehouse Process

The James Martin + Co. Data Warehouse Process does not encompass the analysis and

identification of organizational value streams, strategic initiatives, and related business

goals, but it is a prescription for achieving such goals through a specific architecture. The

Process is conducted in an iterative fashion after the initial business requirements and

architectural foundations have been developed, with the emphasis on populating the Data

Warehouse with "chunks" of functional subject-area information each iteration. The

Process guides the development team through identifying the business requirements,

developing the business plan and Warehouse solution to business requirements, and

implementing the configuration, technical, and application architecture for the overall

Data Warehouse. It then specifies the iterative activities for the cyclical planning, design,

construction, and deployment of each population project. The following is a description

of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also

includes conventional project management, startup, and wrap-up activities which are

detailed in the Plan, Activate, Control and End stages, not described here.)

Business Case Development

A variety of kinds of strategic analysis, including Value Stream Assessment, have likely

already been done by the customer organization at the point when it is necessary to

develop a Business Case. The Business Case Development stage launches the Data

Warehouse development in response to previously identified strategic business initiatives

and "predator" (key) value streams of the organization. The organization will likely have

identified more than one important value stream. In the long term it is possible to


implement Data Warehouse solutions that address multiple value streams, but it is the

predator value stream or highest priority strategic initiative that usually becomes the

focus of the short-term strategy and first run population projects resulting in a Data

Warehouse.

At the conclusion of the relevant business reengineering, strategic visioning, and/or value

stream assessment activities conducted by the organization, a Business Case can be built

to justify the use of the Data Warehouse architecture and implementation approach to

solve key business issues directed at the most important goals. The Business Case defines

the outlying activities, costs, benefits, and critical success factors for a multi-generation

implementation plan that results in a Data Warehouse framework of an information

storage/access system. The Warehouse is an iteratively designed/developed/refined solution

to the tactical and strategic business requirements. The Business Case addresses both the

short-term and long-term Warehouse strategies (how multiple data stores will work

together to fulfill primary and secondary business goals) and identifies both immediate

and extended costs so that the organization is better able to plan its short and long-term

budget appropriation.

Business Question Assessment

Once a Business Case has been developed, the short-term strategy for implementing the

Data Warehouse is mapped out by means of the Business Question Assessment (BQA)

stage. The purpose of BQA is to:

Establish the scope of the Warehouse and its intended use

Define and prioritize the business requirements and the subsequent information

(data) needs the Warehouse will address


Identify the business directions and objectives that may influence the required

data and application architectures

Determine which business subject areas provide the most needed information;

prioritize and sequence implementation projects accordingly

Drive out the logical data model that will direct the physical implementation

model

Measure the quality, availability, and related costs of needed source data at a high

level

Define the iterative population projects based on business needs and data

validation

The prioritized predator value stream or most important strategic initiative is analyzed to

determine the specific business questions that need to be answered through a Warehouse

implementation. Each business question is assessed to determine its overall importance to

the organization, and a high-level analysis of the data needed to provide the answers is

undertaken. The data is assessed for quality, availability, and cost associated with

bringing it into the Data Warehouse. The business questions are then revisited and

prioritized based upon their relative importance and the cost and feasibility of acquiring

the associated data. The prioritized list of business questions is used to determine the

scope of the first and subsequent iterations of the Data Warehouse, in the form of

population projects. Iteration scoping is dependent on source data acquisition issues and

is guided by determining how many business questions can be answered in a three to six

month implementation time frame. A "business question" is a question deemed by the

business to provide useful information in determining strategic direction. A business

question can be answered through objective analysis of the data that is available.

Architecture Review and Design


The Architecture is the logical and physical foundation on which the Data Warehouse

will be built. The Architecture Review and Design stage, as the name implies, is both a

requirements analysis and a gap analysis activity. It is important to assess what pieces of

the architecture already exist in the organization (and in what form) and to assess what

pieces are missing which are needed to build the complete Data Warehouse architecture.

During the Architecture Review and Design stage, the logical Data Warehouse

architecture is developed. The logical architecture is a configuration map of the necessary

data stores that make up the Warehouse; it includes a central Enterprise Data Store, an

optional Operational Data Store, one or more (optional) individual business area Data

Marts, and one or more Metadata stores. In the metadata store(s) are two different kinds

of metadata that catalog reference information about the primary data.

Once the logical configuration is defined, the Data, Application, Technical and Support

Architectures are designed to physically implement it. Requirements of these four

architectures are carefully analyzed so that the Data Warehouse can be optimized to serve

the users. Gap analysis is conducted to determine which components of each architecture

already exist in the organization and can be reused, and which components must be

developed (or purchased) and configured for the Data Warehouse.

The Data Architecture organizes the sources and stores of business information and

defines the quality and management standards for data and metadata.

The Application Architecture is the software framework that guides the overall

implementation of business functionality within the Warehouse environment; it controls

the movement of data from source to user, including the functions of data extraction, data

cleansing, data transformation, data loading, data refresh, and data access (reporting,

querying).

The Technical Architecture provides the underlying computing infrastructure that enables

the data and application architectures. It includes platform/server, network,

communications and connectivity hardware/software/middleware, DBMS, client/server

2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical

architecture design must address the requirements of scalability, capacity and volume


handling (including sizing and partitioning of tables), performance, availability, stability,

chargeback, and security.

The Support Architecture includes the software components (e.g., tools and structures for

backup/recovery, disaster recovery, performance monitoring, reliability/stability

compliance reporting, data archiving, and version control/configuration management) and

organizational functions necessary to effectively manage the technology investment.

Architecture Review and Design applies to the long-term strategy for development and

refinement of the overall Data Warehouse, and is not conducted merely for a single

iteration. This stage develops the blueprint of an encompassing data and technical

structure, software application configuration, and organizational support structure for the

Warehouse. It forms a foundation that drives the iterative Detail Design activities. Where Design tells you what to do, Architecture Review and Design tells you what pieces you need in order to do it.

The Architecture Review and Design stage can be conducted as a separate project that

runs mostly in parallel with the Business Question Assessment stage, because the technical, data, application and support infrastructure that enables and supports the storage and access of information is generally independent of the business requirements that determine which data is needed to drive the Warehouse.

receiving input from certain BQA activities (data source system identification and data

modeling), so the BQA stage must conclude before the Architecture stage can conclude.

The Architecture will be developed based on the organization's long-term Data

Warehouse strategy, so that future iterations of the Warehouse will have been provided

for and will fit within the overall architecture.

Tool Selection

The purpose of this stage is to identify the candidate tools for developing and

implementing the Data Warehouse data and application architectures, and for performing

technical and support architecture functions where appropriate. Select the candidate tools

that best meet the business and technical requirements as defined by the Data Warehouse


architecture, and recommend the selections to the customer organization. Procure the

tools upon approval from the organization.

It is important to note that the process of selecting tools is often dependent on the existing

technical infrastructure of the organization. Many organizations feel strongly for various

reasons about using tools for the Data Warehouse applications that they already have in

their "arsenal" and are reluctant to purchase new application packages. It is recommended

that a thorough evaluation of existing tools and the feasibility of their reuse be done in the

context of all tool evaluation activities. In some cases, existing tools can be form-fitted to

the Data Warehouse; in other cases, the customer organization may need to be convinced

that new tools would better serve their needs.

It may even be feasible to skip this series of activities altogether, if the

organization is insistent that particular tools be used (no room for negotiation), or if tools

have already been assessed and selected in anticipation of the Data Warehouse project.

Tools may be categorized according to the following data, technical, application, or

support functions:

Source Data Extraction and Transformation

Data Cleansing

Data Load

Data Refresh

Data Access

Security Enforcement

Version Control/Configuration Management

Backup and Recovery

Disaster Recovery

Performance Monitoring


Database Management

Platform

Data Modeling

Metadata Management

Iteration Project Planning

The Data Warehouse is implemented (populated) one subject area at a time, driven by

specific business questions to be answered by each implementation cycle. The first and

subsequent implementation cycles of the Data Warehouse are determined during the

BQA stage. At this point in the Process the first (or next if not first) subject area

implementation project is planned. The business requirements discovered in BQA and, to

a lesser extent, the technical requirements of the Architecture Design stage are now

refined through user interviews and focus sessions to the subject area level. The results

are further analyzed to yield the detail needed to design and implement a single

population project, whether initial or follow-on. The Data Warehouse project team is

expanded to include the members needed to construct and deploy the Warehouse, and a

detailed work plan for the design and implementation of the iteration project is developed

and presented to the customer organization for approval.

Detail Design

In the Detail Design stage, the physical Data Warehouse model (database schema) is

developed, the metadata is defined, and the source data inventory is updated and

expanded to include all of the necessary information needed for the subject area

implementation project, and is validated with users. Finally, the detailed design of all

procedures for the implementation project is completed and documented. Procedures to

achieve the following activities are designed:

Warehouse Capacity Growth


Data Extraction/Transformation/Cleansing

Data Load

Security

Data Refresh

Data Access

Backup and Recovery

Disaster Recovery

Data Archiving

Configuration Management

Testing

Transition to Production

User Training

Help Desk

Change Management

Implementation

Once the Planning and Design stages are complete, the project to implement the current

Data Warehouse iteration can proceed quickly. Necessary hardware, software and

middleware components are purchased and installed, the development and test

environment is established, and the configuration management processes are

implemented. Programs are developed to extract, cleanse, transform and load the source

data and to periodically refresh the existing data in the Warehouse, and the programs are

individually unit tested against a test database with sample source data. Metrics are

captured for the load process. The metadata repository is loaded with transformational

and business user metadata. Canned production reports are developed and sample ad-hoc


queries are run against the test database, and the validity of the output is measured. User

access to the data in the Warehouse is established. Once the programs have been

developed and unit tested and the components are in place, system functionality and user

acceptance testing is conducted for the complete integrated Data Warehouse system.

System support processes of database security, system backup and recovery, system

disaster recovery, and data archiving are implemented and tested as the system is

prepared for deployment. The final step is to conduct the Production Readiness Review

prior to transitioning the Data Warehouse system into production. During this review, the

system is evaluated for acceptance by the customer organization.
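
A highly simplified sketch of one such extract/transform/load routine together with its unit test; the source rows, the target table, and the expected metrics are hypothetical:

import sqlite3

def extract():
    # Stand-in for reading rows from the operational source system.
    return [("acme ", "2010-05-01", "250.00"), ("binford", "2010-05-02", "75.50")]

def transform(rows):
    # Cleanse and convert the rows into the warehouse's representation.
    return [(name.strip().title(), day, float(amount)) for name, day, amount in rows]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, day TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

# Unit test against a test database with sample source data, capturing
# metrics for the load process.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone()
assert (count, total) == (2, 325.5), "load metrics do not match expectations"
print("unit test passed:", count, "rows,", total, "total")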

Transition to Production

The Transition to Production stage moves the Data Warehouse development project into

the production environment. The production database is created, and the

extraction/cleanse/transformation routines are run on the operations system source data.

The development team works with the Operations staff to perform the initial load of this

data to the Warehouse and execute the first refresh cycle. The Operations staff is trained,

and the Data Warehouse programs and processes are moved into the production libraries

and catalogs. Rollout presentations and tool demonstrations are given to the entire

customer community, and end-user training is scheduled and conducted. The Help Desk

is established and put into operation. A Service Level Agreement is developed and

approved by the customer organization. Finally, the new system is positioned for ongoing

maintenance through the establishment of a Change Management Board and the

implementation of change control procedures for future development cycles.

Q.5 Discuss the purpose of an executive information system in an organization.

Implementing an Executive Information System (EIS)

An EIS is a tool that provides direct on-line access to relevant information about aspects

of a business that are of particular interest to the senior manager.

Introduction


Many senior managers find that direct on-line access to organizational data is

helpful. For example, Paul Frech, president of Lockheed-Georgia, monitored employee

contributions to company-sponsored programs (United Way, blood drives) as a surrogate

measure of employee morale (Houdeshel and Watson, 1987). C. Robert Kidder, CEO of

Duracell, found that productivity problems were due to salespeople in Germany wasting

time calling on small stores and took corrective action (Main, 1989).

Information systems have long been used to gather and store information, to

produce specific reports for workers, and to produce aggregate reports for managers.

However, senior managers rarely use these systems directly, and often find the aggregate

information to be of little use without the ability to explore underlying details (Watson &

Rainer, 1991; Crockett, 1992).

An Executive Information System (EIS) is a tool that provides direct on-line

access to relevant information in a useful and navigable format. Relevant information is

timely, accurate, and actionable information about aspects of a business that are of

particular interest to the senior manager. The useful and navigable format of the system

means that it is specifically designed to be used by individuals with limited time, limited

keyboarding skills, and little direct experience with computers. An EIS is easy to

navigate so that managers can identify broad strategic issues, and then explore the

information to find the root causes of those issues.

Executive Information Systems differ from traditional information systems in the

following ways:

They are specifically tailored to executives' information needs.

They are able to access data about specific issues and problems as well as aggregate reports.

They provide extensive on-line analysis tools, including trend analysis, exception reporting and "drill-down" capability (a drill-down sketch follows this list).

They access a broad range of internal and external data.

They are particularly easy to use (typically mouse or touchscreen driven).

They are used directly by executives without assistance.

They present information in a graphical form.
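
The "drill-down" capability above can be sketched in a few lines. The sales figures and column names are hypothetical, and pandas stands in for whatever query layer a real EIS would use.

# Drill-down sketch: from an aggregate view to its underlying detail.
# The sales data and column names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "revenue": [120, 80, 200, 40],
})

# The top-level view an executive sees first: revenue by region.
print(sales.groupby("region")["revenue"].sum())

# Drill-down: the West total looks unusual, so break it out by product.
west = sales[sales["region"] == "West"]
print(west.groupby("product")["revenue"].sum())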


Purpose of EIS

The primary purpose of an Executive Information System is to support

managerial learning about an organization, its work processes, and its interaction with the

external environment. Informed managers can ask better questions and make better

decisions. Vandenbosch and Huff (1992) from the University of Western Ontario found

that Canadian firms using an EIS achieved better business results if their EIS promoted

managerial learning. Firms with an EIS designed to maintain managers' "mental models"

were less effective than firms with an EIS designed to build or enhance managers'

knowledge.

This distinction is supported by Peter Senge in The Fifth Discipline. He

illustrates the benefits of learning about the behaviour of systems versus simply learning

more about their states. Learning more about the state of a system leads to reactive

management fixes. Typically these reactions feed into the underlying system behaviour

and contribute to a downward spiral. Learning more about system behaviour and how

various system inputs and actions interrelate will allow managers to make more proactive

changes to create long-term improvement.

A secondary purpose for an EIS is to allow timely access to information. All of

the information contained in an EIS can typically be obtained by a manager through

traditional methods. However, the resources and time required to manually compile

information in a wide variety of formats, and in response to ever changing and ever more

specific questions usually inhibit managers from obtaining this information. Often, by the

time a useful report can be compiled, the strategic issues facing the manager have

changed, and the report is never fully utilized.

Timely access also influences learning. When a manager obtains the answer to a

question, that answer typically sparks other related questions in the manager's mind. If

those questions can be posed immediately, and the next answer retrieved, the learning

cycle continues unbroken. Using traditional methods, by the time the answer is produced,

the context of the question may be lost, and the learning cycle will not continue. An

executive in Rockart & Treacy's 1982 study noted that:


Your staff really can't help you think. The problem with giving a question to the

staff is that they provide you with the answer. You learn the nature of the real question

you should have asked when you muck around in the data (p. 9).

A third purpose of an EIS is commonly misperceived. An EIS has a powerful

ability to direct management attention to specific areas of the organization or specific

business problems. Some managers see this as an opportunity to discipline subordinates.

Some subordinates fear the directive nature of the system and spend a great deal of time

trying to outwit or discredit it. Neither of these behaviours is appropriate or productive.

Rather, managers and subordinates can work together to determine the root causes of

issues highlighted by the EIS.

The powerful focus of an EIS is due to the maxim "what gets measured gets

done." Managers are particularly attentive to concrete information about their

performance when it is available to their superiors. This focus is very valuable to an

organization if the information reported is actually important and represents a balanced

view of the organization's objectives.

Misaligned reporting systems can result in inordinate management attention to

things that are not important or to things which are important but to the exclusion of other

equally important things. For example, a production reporting system might lead

managers to emphasize volume of work done rather than quality of work. Worse yet,

productivity might have little to do with the organization's overriding customer service

objectives.

Contents of EIS

A general answer to the question of what data is appropriate for inclusion in an

Executive Information System is "whatever is interesting to executives." While this

advice is rather simplistic, it does reflect the variety of systems currently in use.

Executive Information Systems in government have been constructed to track data about

Ministerial correspondence, case management, worker productivity, finances, and human

resources to name only a few. Other sectors use EIS implementations to monitor

information about competitors in the news media and databases of public information in


addition to the traditional revenue, cost, volume, sales, market share and quality

applications.

Frequently, EIS implementations begin with just a few measures that are clearly

of interest to senior managers, and then expand in response to questions asked by those

managers as they use the system. Over time, the presentation of this information becomes

stale, and the information diverges from what is strategically important for the

organization. A "Critical Success Factors" approach is recommended by many

management theorists (Daniel, 1961; Crockett, 1992; Watson and Frolick, 1992).

Practitioners such as Vandenbosch (1993) found that:

While our efforts usually met with initial success, we often found that after six

months to a year, executives were almost as bored with the new information as they had

been with the old. A strategy we developed to rectify this problem required organizations

to create a report of the month. That is, in addition to the regular information provided for

management committee meetings, the CEO was charged with selecting a different

indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).

While the above indicates that selection of data for inclusion in an EIS is difficult,

there are several guidelines that help to make that assessment. A practical set of

principles to guide the design of measures and indicators to be included in an EIS is

presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting

measures that reflect organizational objectives, see the section "EIS and Organizational

Objectives."

EIS measures must be easy to understand and collect. Wherever possible, data

should be collected naturally as part of the process of work. An EIS should not add

substantially to the workload of managers or staff.

EIS measures must be based on a balanced view of the organization's objectives.

Data in the system should reflect the objectives of the organization in the areas of

productivity, resource management, quality and customer service.

Performance indicators in an EIS must reflect everyone's contribution in a fair and

consistent manner. Indicators should be as independent as possible from variables outside

the control of managers.


EIS measures must encourage management and staff to share ownership of the

organization's objectives. Performance indicators must promote both team-work and

friendly competition. Measures must be meaningful for all staff; people must feel that

they, as individuals, can contribute to improving the performance of the organization.

EIS information must be available to everyone in the organization. The objective

is to provide everyone with useful information about the organization's performance.

Information that must remain confidential should not be part of the EIS or the

management system of the organization.

EIS measures must evolve to meet the changing needs of the organization.
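
One way to apply the principles above is to give every EIS measure an explicit definition recording its objective area, data source, and owner. A minimal sketch, with hypothetical field and measure names:

# Hypothetical structure for an EIS measure definition, reflecting the
# principles above: collected naturally during work, tied to a balanced
# objective area, owned by someone who can influence the result, and shared.
from dataclasses import dataclass

@dataclass
class EISMeasure:
    name: str             # e.g. "On-time delivery rate"
    objective_area: str   # productivity, resource management, quality or customer service
    source: str           # system where the data arises naturally as part of work
    owner: str            # manager accountable for (and able to influence) the result
    shared: bool = True   # EIS information is available to everyone

on_time = EISMeasure("On-time delivery rate", "customer service",
                     "order management system", "logistics manager")
print(on_time)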

Barriers to Effectiveness

There are many ways in which an EIS can fail. Dozens of high profile, high cost

EIS projects have been cancelled, implemented and rarely used, or implemented and used

with negative results. An EIS is a high risk project precisely because it is intended for use

by the most powerful people in an organization. Senior managers can easily misuse the

information in the system with strongly detrimental effects on the organization. Senior

managers can refuse to use a system if it does not respond to their immediate personal

needs or is too difficult to learn and use.

Unproductive Organizational Behaviour Norms

Issues of organizational behaviour and culture are perhaps the most deadly

barriers to effective Executive Information Systems. Because an EIS is typically

positioned at the top of an organization, it can create powerful learning experiences and

lead to drastic changes in organizational direction. However, there is also great potential

for misuse of the information. Grant, Higgins and Irving (1988) found that performance

monitoring can promote bureaucratic and unproductive behaviour, can unduly focus

organizational attention to the point where other important aspects are ignored, and can

have a strongly negative impact on morale.

The key barrier to EIS effectiveness, therefore, is the way in which the

organization uses the information in the system. Managers must be aware of the dangers

of statistical data, and be skilled at interpreting and using data in an effective way. Even


more important is the manager's ability to communicate with others about statistical data

in a non-defensive, trustworthy, and constructive manner. Argyris (1991) suggests a

universal human tendency towards strategies that avoid embarrassment or threat, as

well as feelings of vulnerability or incompetence. These strategies include:

Stating criticism of others in a way that you feel is valid, but in a way that

prevents others from deciding for themselves

Failing to include any data that others could use to objectively evaluate your

criticism

Stating your conclusions in ways that disguise their logical implications, and

denying those implications if they are suggested

To make effective use of an EIS, managers must have the self-confidence to accept

negative results and focus on the resolution of problems rather than on denial and blame.

Since organizations with limited exposure to planning and targeting, data-based decision-

making, statistical process control, and team-based work models may not have dealt with

these behavioural issues in the past, they are more likely to react defensively and reject an

EIS.

Technical Excellence

An interesting result from the Vandenbosch & Huff (1992) study was that the

technical excellence of an EIS has an inverse relationship with effectiveness. Systems

that are technical masterpieces tend to be inflexible, and thus discourage innovation,

experimentation and mental model development.

Flexibility is important because an EIS has such a powerful ability to direct

attention to specific issues in an organization. A technical masterpiece may accurately

direct management attention when the system is first implemented, but by its first

anniversary it will still be directing attention to the issues that mattered a year ago. There is

substantial danger that the exploration of issues necessary for managerial learning will be

limited to those subjects that were important when the EIS was first developed. Managers

must understand that as the organization and its work changes, an EIS must continually

be updated to address the strategic issues of the day.


A number of explanations as to why technical masterpieces tend to be less

flexible are possible. Developers who create a masterpiece EIS may become attached to

the system and consciously or unconsciously dissuade managers from asking for changes.

Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece

EIS may not want to spend more on system maintenance and improvements. The time

required to create a masterpiece EIS may mean that it is outdated before it is

implemented.

While usability and response time are important factors in determining whether

executives will use a system, cost and flexibility are paramount. A senior manager will be

more accepting of an inexpensive system that provides 20% of the needed information

within a month or two than of an expensive system that provides 80% of the needed

information after a year of development. The manager may also find that the inexpensive

system is easier to change and adapt to the evolving needs of the business. Changing a

large system would involve throwing away parts of a substantial investment. Changing

the inexpensive system means losing a few weeks of work. As a result, fast, cheap,

incremental approaches to developing an EIS increase the chance of success.

Technical Problems

Paradoxically, technical problems are also frequently reported as a significant

barrier to EIS success. The most difficult technical problem -- that of integrating data

from a wide range of data sources both inside and outside the organization -- is also one

of the most critical issues for EIS users. A marketing vice-president, who had spent

several hundred thousand dollars on an EIS, attended a final briefing on the system. The

technical experts demonstrated the many graphs and charts of sales results, market share

and profitability. However, when the vice-president asked for a graph of market share

and advertising expense over the past ten years, the system was unable to access

historical data. The project was cancelled in that meeting.

The ability to integrate data from many different systems is important because it

allows managerial learning that is unavailable in other ways. The president of a

manufacturing company can easily get information about sales and manufacturing from

the relevant VPs. Unfortunately, the information the president receives will likely be


incompatible, and learning about the ways in which sales and manufacturing processes

influence each other will not be easy. An EIS will be particularly effective if it can

overcome this challenge, allowing executives to learn about business processes that cross

organizational boundaries and to compare business results in disparate functions.

Another technical problem that can kill EIS projects is usability. Senior managers

simply have the choice to stop using a system if they find it too difficult to learn or use.

They have very little time to invest in learning the system, a low tolerance for errors, and

initially may have very little incentive to use it. Even if the information in the system is

useful, a difficult interface will quickly result in the manager assigning an analyst to

manipulate the system and print out the required reports. This is counter-productive

because managerial learning is enhanced by the immediacy of the question-and-answer

learning cycle provided by an EIS. If an analyst is interacting with the system, the analyst

will acquire more learning than the manager, but will not be in a position to put that

learning to its most effective use.

Usability of Executive Information Systems can be enhanced through the use of

prototyping and usability evaluation methods. These methods ensure that clear

communication occurs between the developers of the system and its users. Managers

have an opportunity to interact with systems that closely resemble the functionality of the

final system and thus can offer more constructive criticism than they might be able to

after reading an abstract specification document. Systems developers also are in a

position to listen more openly to criticisms of a system since a prototype is expected to be

disposable. Several evaluation protocols are available including observation and

monitoring, software logging, experiments and benchmarking (Preece et al., 1994).

The most appropriate methods for EIS design are those with an ethnographic flavour

because the experience base of system developers is typically so different from that of

their user population (senior executives).
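
As an illustration of the software-logging method mentioned above, the sketch below records which prototype screens a user opens so developers can see what is actually used; the screen names and log format are hypothetical.

# Software-logging sketch for usability evaluation of an EIS prototype.
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_usage(screen):
    """Record one screen view with a timestamp for later analysis."""
    logging.info("%s opened %s", datetime.now().isoformat(timespec="seconds"), screen)

log_usage("market_share_trend")
log_usage("sales_by_region")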

Misalignment Between Objectives & EIS

A final barrier to EIS effectiveness was mentioned earlier in the section on

purpose. As noted there, the powerful ability of an EIS to direct organizational attention

can be destructive if the system directs attention to the wrong variables. There are many


examples of this sort of destructive reporting. Grant, Higgins and Irving (1988) report the

account of an employee working under a misaligned reporting system.

I like the challenge of solving customer problems, but they get in the way of

hitting my quota. I'd like to get rid of the telephone work. If (the company) thought

dealing with customers was important, I'd keep it; but if it's just going to be production

that matters, I'd gladly give all the calls to somebody else.

Traditional cost accounting systems are also often misaligned with organizational

objectives, and placing these measures in an EIS will continue to draw attention to the

wrong things. Cost accounting allocates overhead costs to direct labour hours. In some

cases the overhead burden on each direct labour hour is as much as 1000%. A manager

operating under this system might decide to sub-contract 100 hours of direct labor at $20

per hour. On the books, this $2,000 saving is accompanied by $20,000 of savings in

overhead. If the sub-contractor charges $5,000 for the work, the book savings are $2,000

+ $20,000 - $5,000 = $17,000. In reality, however, the overhead costs for an idle machine

in a factory do not go down much at all. The sub-contract actually ends up costing $5,000

- $2,000 = $3,000. (Peters, 1987)
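
The book-versus-actual arithmetic of this example can be verified directly:

# Reproduces the cost-accounting figures from the example above.
direct_labour_saved = 100 * 20                    # 100 hours at $20/hour
overhead_allocated = direct_labour_saved * 10     # 1000% overhead burden
subcontract_charge = 5000

book_savings = direct_labour_saved + overhead_allocated - subcontract_charge
print(book_savings)                               # 17000 on the books

# In reality the allocated overhead does not disappear with the labour,
# so the sub-contract is a net cost:
actual_cost = subcontract_charge - direct_labour_saved
print(actual_cost)                                # 3000 actually spent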

Characteristics of Successful EIS Implementations

Find an Appropriate Executive Champion

EIS projects that succeed do so because at least one member of the senior

management team agrees to champion the project. The executive champion need not fully

understand the technical issues, but must be a person who works closely with all of the

senior management team and understands their needs, work styles and their current

methods of obtaining organizational information. The champion's commitment must

include a willingness to set aside time for reviewing prototypes and implementation

plans, influencing and coaching other members of the senior management team, and

suggesting modifications and enhancements to the system.

Deliver a Simple Prototype Quickly


Executives judge a new EIS on the basis of how easy it is to use and how relevant

the information in the system is to the current strategic issues in the organization. As a

result, the best EIS projects begin as a simple prototype, delivered quickly, that provides

data about at least one critical issue. If the information delivered is worth the hassle of

learning the system, a flurry of requirements will shortly be generated by executives who

like what they see, but want more. These requests are the best way to plan an EIS that

truly supports the organization, and are more valuable than months of planning by a

consultant or analyst.

One caveat concerning the simple prototype approach is that executive requests

will quickly scatter to questions of curiosity rather than strategy in an organization where

strategic direction and objectives are not clearly defined. A number of methods are

available to support executives in defining business objectives and linking them to

performance monitors in an EIS. These are discussed further in the section on EIS and

Organizational Objectives below.

Involve Your Information Systems Department

In some organizations, the motivation for an EIS project arises in the business

units quite apart from the traditional information systems (IS) organization. Consultants

may be called in, or managers and analysts in the business units may take the project on

without consulting or involving IS. This is a serious mistake. Executive Information

Systems rely entirely on the information contained in the systems created and maintained

by this department. IS professionals know best what information is available in an

organization's systems and how to get it. They must be involved in the team. Involvement

in such a project can also be beneficial to IS by giving them a more strategic perspective

on how their work influences the organization.

Communicate & Train to Overcome Resistance

A final characteristic of successful EIS implementations is that of communication.

Executive Information Systems have the potential to drastically alter the prevailing

patterns of organizational communication and thus will typically be met with resistance.

Some of this resistance is simply a matter of a lack of knowledge. Training on how to use


statistics and performance measures can help. However, resistance can also be rooted in

the feelings of fear, insecurity and cynicism experienced by individuals throughout the

organization. These attitudes can only be influenced by a strong and vocal executive

champion who consistently reinforces the purpose of the system and directs the attention

of the executive group away from unproductive and punitive behaviours.

EIS and Organizational Culture

Henry Mintzberg (1972) has argued that impersonal statistical data is irrelevant to

managers. John Dearden (1966) argued that the promise of real-time management

information systems was a myth and would never be of use to top managers. Grant,

Higgins, and Irving (1988) argue that computerized performance monitors undermine

trust, reduce autonomy and fail to illuminate the most important issues.

Many of these arguments against EISs have objective merit. Managers really do

value the tangible tidbits of detail they encounter in their daily interactions more highly

than abstract numerical reports. Rumours suggest a future, while numbers describe a past.

Conversations are rich in detail and continuously probe the reasons for the situation,

while statistics are vague approximations of reality. When these vague approximations

are used to intimidate or control behaviour rather than to guide learning, they really do

have a negative impact on the organization.

Yet both of these objections point to a deeper set of problems -- the assumptions,

beliefs, values and behaviours that people in the organization hold and use to respond to

their environment. Perhaps senior managers find statistical data to be irrelevant because

they have found too many errors in previous reports? Perhaps people in the organization

prefer to assign blame rather than discover the true root cause of problems? The culture

of an organization can have a dramatic influence on the adoption and use of an Executive

Information System. The following cultural characteristics will contribute directly to the

success or failure of an EIS project.

Learning vs Blaming

A learning organization is one that seeks first to understand why a problem

occurred, and not who is to blame. It is a common and natural response for managers to


try to deflect responsibility for a problem on to someone else. An EIS can help to do this

by indicating very specifically who failed to meet a statistical target, and by how much. A

senior manager, armed with EIS data, can intimidate and blame the appropriate person.

The blamed person can respond by questioning the integrity of the system, blaming

someone else, or even reacting in frustration by slowing work down further.

In a learning organization, any unusual result is seen as an opportunity to learn

more about the business and its processes. Managers who find an unusual statistic explore

it further, breaking it down to understand its components and comparing it with other

numbers to establish cause and effect relationships. Together as a team, management uses

numerical results to focus learning and improve business processes across the

organization. An EIS facilitates this approach by allowing instant exploration of a

number, its components and its relationship to other numbers.

Continuous Improvement vs Crisis Management

Some organizations find themselves constantly reacting to crises, with little time

for any proactive measures. Others have managed to respond to each individual crisis

with an approach that prevents other similar problems in the future. They are engaged in

a continual cycle of improving business practices and finding ways to avoid crisis.

Crises in government are frequently caused by questions about organizational

performance raised by an auditor, the Minister, or members of the Opposition. An EIS

can be helpful in responding to this sort of crisis by providing instant data about the

actual facts of the situation. However, this use of the EIS does little to prevent future

crises.

An organizational culture in which continual improvement is the norm can use the

EIS as an early warning system pointing to issues that have not yet reached the crisis

point, but are perhaps the most important areas on which to focus management attention

and learning. Organizations with a culture of continuous improvement already have an

appetite for the sort of data an EIS can provide, and thus will exhibit less resistance.

Team Work vs Hierarchy


An EIS has the potential to substantially disrupt an organization that relies upon

adherence to a strict chain of command. The EIS provides senior managers with the

ability to micro-manage details at the lowest levels in the organization. A senior manager

with an EIS report who is surprised at the individual results of a front-line worker might

call that person directly to understand why the result is unusual. This could be very

threatening for the managers between the senior manager and the front-line worker. An

EIS can also provide lower level managers with access to information about peer

performance and even the performance of their superiors.

Organizations that are familiar with work teams, matrix managed projects and

other forms of interaction outside the chain of command will find an EIS less disruptive.

Senior managers in these organizations have learned when micro-management is

appropriate and when it is not. Middle managers have learned that most interactions

between their superiors and their staff are not threatening to their position. Workers are

more comfortable interacting with senior managers when the need arises, and know what

their supervisor expects from them in such an interaction.

Data-based Decisions vs Decisions in a Vacuum

The total quality movement, popular in many organizations today, emphasizes a

set of tools referred to as Statistical Process Control (SPC). These analytical tools provide

managers and workers with methods of understanding a problem and finding solutions

rather than allocating blame and passing the buck. Organizations with training and

exposure to SPC and analytical tools will be more open to an EIS than those who are

suspicious of numerical measures and the motives of those who use them.

It should be noted that data-based decision making does not deny the role of

intuition, experience, or negotiation amongst a group. Rather, it encourages decision-

makers to probe the facts of a situation further before coming to a decision. Even if the

final decision contradicts the data, chances are that an exploration of the data will help

the decision-maker to understand the situation better before a decision is reached. An EIS

can help with this decision-making process.

Information Sharing vs Information Hoarding


Information is power in many organizations, and managers are motivated to hoard

information rather than to share it widely. For example, managers may hide information

about their own organizational performance, but jump at any chance to see information

about performance of their peers.

A properly designed EIS promotes information sharing throughout the organization.

Peers have access to information about each other's domain; junior managers have

information about how their performance contributes to overall organizational

performance. An organization that is comfortable with information sharing will have

developed a set of "good manners" for dealing with this broad access to information.

These behavioural norms are key to the success of an EIS.

Specific Objectives vs Vague Directions

An organization that has experience developing and working toward Specific,

Measurable, Achievable and Consistent (SMAC) objectives will also find an EIS to be

less threatening. Many organizations are uncomfortable with specific performance

measures and targets because they believe their work to be too specialized or

unpredictable. Managers in these organizations tend to adopt vague generalizations and

statements of the exceedingly obvious in place of SMAC objectives that actually focus

and direct organizational performance. In a few cases, it may actually be true that

numerical measures are completely inappropriate for certain aspects of the business. In

most cases, managers with this attitude have a poor understanding of the purpose of

objective and target-setting exercises. Some business processes are more difficult to

measure and set targets for than others. Yet almost all business processes have at least a

few characteristics that can be measured and improved through conscientious objective

setting. (See the following section on EIS and Organizational Objectives.)

EIS and Organizational Objectives

A number of writers have discovered that one of the major difficulties with EIS

implementations is that the information contained in the EIS either does not meet

executive requirements, or meets executive requirements, but fails to guide the


organization towards its objectives. As discussed earlier, organizations that are

comfortable in establishing and working towards Specific, Measurable, Achievable, and

Consistent (SMAC) objectives will find it easier to create an EIS that actually drives

organizational performance. Yet even these organizations may have difficulty because

their stated objectives do not represent all of the things that are important.

Crockett (1992) suggests a four step process for developing EIS information

requirements based on a broader understanding of organizational objectives. The steps

are: (1) identify critical success factors and stakeholder expectations, (2) document

performance measures that monitor the critical success factors and stakeholder

expectations, (3) determine reporting formats and frequency, and (4) outline information

flows and how information can be used. Crockett begins with stakeholders to ensure that

all relevant objectives and critical success factors are reflected in the EIS.
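
A minimal sketch of what the output of Crockett's four steps might look like; the stakeholders, factors, and measures are hypothetical examples.

# Hypothetical record of EIS information requirements, one entry per step
# of Crockett's four-step process.
eis_requirements = {
    # Step 1: critical success factors and the stakeholders who expect them
    "csf": {"customer retention": ["customers", "board"]},
    # Step 2: performance measures that monitor each factor
    "measures": {"customer retention": ["churn rate", "repeat order rate"]},
    # Step 3: reporting format and frequency
    "reporting": {"churn rate": ("trend chart", "monthly")},
    # Step 4: information flows and intended use
    "flows": {"churn rate": "reviewed at the monthly management committee"},
}
print(eis_requirements["measures"]["customer retention"])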

Kaplan and Norton (1992) suggest that goals and measures need to be developed

from each of four perspectives: financial, customer, internal business, and innovation and

learning. These perspectives help managers to achieve a balance in setting objectives, and

presenting them in a unified report exposes the tough tradeoffs in any management

system. An EIS built on this basis will not promote productivity while ignoring quality,

or customer satisfaction while ignoring cost.
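
A balanced measure set in these four perspectives might be sketched as follows; the measure names are hypothetical examples, not Kaplan and Norton's.

# One hypothetical goal per balanced-scorecard perspective.
balanced_scorecard = {
    "financial": ["operating margin"],
    "customer": ["customer satisfaction index"],
    "internal business": ["order fulfilment cycle time"],
    "innovation and learning": ["new products as a share of sales"],
}
for perspective, measures in balanced_scorecard.items():
    print(perspective, "->", measures)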

Meyer (1994) raises several questions that should be asked about measurement

systems for teams. Four are appropriate for evaluating objectives and measures

represented in an EIS. They are:

Are all critical organizational outcomes tracked?

Are all "out-of-bounds" conditions tracked? (Conditions that are serious enough to trigger

a management review.)

Are all the critical variables required to reach each outcome tracked?

Is there any measure that would not cause the organization to change its behaviour?

In summary, proper definition of organizational objectives and measures is a helpful

precondition for reducing organizational resistance to an EIS and is the root of effective


EIS use. The benefits of an EIS will be fully realized only when it helps to focus

management attention on issues of true importance to the organization.

Methodology

Implementation of an effective EIS requires clear consensus on the objectives and

measures to be monitored in the system and a plan for obtaining the data on which those

measures are based. The sections below outline a methodology for achieving these two

results. As noted earlier, successful EIS implementations generally begin with a simple

prototype rather than a detailed planning process. For that reason, the proposed planning

methodologies are as simple and scope-limited as possible.

Q.6 Discuss the challenges involved in data integration and coordination process?

Data Integration Primer

Challenges to Data Integration

One of the most fundamental challenges in the process of data integration is

setting realistic expectations. The term data integration conjures a perfect coordination of

diversified databases, software, equipment, and personnel into a smoothly functioning

alliance, free of the persistent headaches that mark less comprehensive systems of

information management. Think again.

The requirements analysis stage offers one of the best opportunities in the process

to recognize and digest the full scope of complexity of the data integration task.

Thorough attention to this analysis is possibly the most important ingredient in creating a

system that will live to see adoption and maximum use.

As the field of data integration progresses, however, other common impediments

and compensatory solutions will be easily identified. Current integration practices have

already highlighted a few familiar challenges as well as strategies to address them, as

outlined below.


Heterogeneous Data

Challenges

For most transportation agencies, data integration involves synchronizing huge

quantities of variable, heterogeneous data resulting from internal legacy systems that vary

in data format. Legacy systems may have been created around flat file, network, or

hierarchical databases, unlike newer generations of databases which use relational data.

Data in different formats from external sources continue to be added to the legacy

databases to improve the value of the information. Each generation, product, and home-

grown system has unique demands to fulfill in order to store or extract data. So data

integration can involve various strategies for coping with heterogeneity. In some cases,

the effort becomes a major exercise in data homogenization, which may not enhance the

quality of the data offered.

Strategies

A detailed analysis of the characteristics and uses of data is necessary to mitigate

issues with heterogeneous data. First, a model is chosen, either a federated or data

warehouse environment, that serves the requirements of the business applications

and other uses of the data. Then the database developer will need to ensure that

various applications can use this format or, alternatively, that standard operating

procedures are adopted to convert the data to another format.

Bringing disparate data together in a database system or migrating and fusing

highly incompatible databases is painstaking work that can sometimes feel like an

overwhelming challenge. Thankfully, software technology has advanced to

minimize obstacles through a series of data access routines that allow structured

query languages to access nearly all DBMS and data file systems, relational or

non-relational.
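
As an illustration of such a conversion routine, the sketch below parses a hypothetical fixed-width legacy extract and loads it into a relational table; the record layout and names are invented for the example.

# Homogenizing a flat-file legacy extract into a relational target.
import sqlite3

legacy_lines = [
    "0001ROUTE 12  0450",    # id (4 chars), road name (10), traffic count (4)
    "0002MAIN ST   1200",
]

def parse_fixed_width(line):
    """Split one fixed-width record into typed fields."""
    return (int(line[0:4]), line[4:14].strip(), int(line[14:18]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (id INTEGER, road TEXT, count INTEGER)")
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)",
                 [parse_fixed_width(line) for line in legacy_lines])
print(conn.execute("SELECT * FROM traffic").fetchall())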

Bad Data

Challenges


Data quality is a top concern in any data integration strategy. Legacy data must be

cleaned up prior to conversion and integration, or an agency will almost certainly face

serious data problems later. Legacy data impurities have a compounding effect; by

nature, they tend to concentrate around high volume data users.

If this information is corrupt, so, too, will be the decisions made from it. It is not

unusual for undiscovered data quality problems to emerge in the process of cleaning

information for use by the integrated system. The issue of bad data leads to procedures

for regularly auditing the quality of information used. But who holds the ultimate

responsibility for this job is not always clear.

Strategies

The issue of data quality exists throughout the life of any data integration system.

So it is best to establish both practices and responsibilities right from the start, and

make provisions for each to continue in perpetuity.

The best processes result when developers and users work together to determine

the quality controls that will be put in place in both the development phase and

the ongoing use of the system.
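
A recurring audit of this kind can start very simply. The sketch below checks a table for null keys, duplicate keys, and out-of-range values; the column names and limits are hypothetical.

# Minimal recurring data-quality audit for any tabular dataset.
import pandas as pd

def audit(df, key, numeric_col, lo, hi):
    """Return basic quality counts for the given key and numeric column."""
    return {
        "rows": len(df),
        "null_keys": int(df[key].isna().sum()),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "out_of_range": int((~df[numeric_col].between(lo, hi)).sum()),
    }

legacy = pd.DataFrame({"asset_id": [1, 1, None], "length_km": [2.5, -1.0, 3.2]})
print(audit(legacy, "asset_id", "length_km", lo=0, hi=500))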

Lack of Storage Capacity

Challenges

The unanticipated need for additional performance and capacity is one of the most

common challenges to data integration, particularly in data warehousing. Two storage-

related requirements generally come into play: extensibility and scalability. In an

environment where the need for storage can increase exponentially once a system is

initiated, anticipating the extent of growth is difficult, driving fears that storage costs

will exceed the benefits of data integration. Introducing such massive quantities of data can push the limits

of hardware and software. This may force developers to instigate costly fixes if an

architecture for processing much larger amounts of data must be retrofitted into the

planned system.


Strategies

Alternative storage is becoming routine for data warehouses that are likely to

grow in size. Planning for such options helps keep expanding databases

affordable.

The cost per gigabyte of storage on disk drives continues to decline as technology

improves. From 2000 to 2004, for instance, the cost of data storage declined ten-

fold. High-performance storage disks are expected to follow the downward

pricing spiral.
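
A back-of-envelope projection helps frame the extensibility question; the growth and price figures below are hypothetical assumptions, not data from the text.

# Projecting warehouse size and storage cost over five years.
size_gb = 500.0
growth_per_year = 0.60        # assumed 60% annual data growth
cost_per_gb = 0.50            # assumed starting price per gigabyte
price_decline = 0.25          # assumed 25% annual fall in price per gigabyte

for year in range(1, 6):
    size_gb *= 1 + growth_per_year
    cost_per_gb *= 1 - price_decline
    print(f"year {year}: {size_gb:,.0f} GB at ${size_gb * cost_per_gb:,.0f}")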

Unanticipated Costs

Challenges

Data integration costs are fueled largely by items that are difficult for the

uninitiated to quantify, and thus predict. These might include:

Labor costs for initial planning, evaluation, programming and additional data

acquisition

Software and hardware purchases

Unanticipated technology changes/advances

Both labor and the direct costs of data storage and maintenance

It is important to note that, regardless of efforts to streamline maintenance, the

realities of a fully functioning data integration system may demand a great deal more

maintenance than could be anticipated.

Unrealistic estimating can be driven by an overly optimistic budget, particularly in

these times of budget shortfall and doing more with less. More users, more analysis needs

and more complex requirements may drive performance and capacity problems. Limited

resources may cause project timelines to be extended, without commensurate funding.

Unanticipated issues, or new issues, may call for expensive consulting help. And the

dynamic atmosphere of today's transportation agency must be taken into account, in


which lack of staff, changes in business processes, problems with hardware and software,

and shifting leadership can drive additional expense.

The investment in time and labor required to extract, clean, load, and maintain data

can creep if the quality of the data presented is weak. It is not unusual for this to produce

unanticipated labor costs that are rather alarmingly out of proportion to the total project

budget.

Strategies

The approach to estimating project costs must be both far-sighted and realistic.

This requires an investment in experienced analysts, as well as cooperation, where

possible, among sister agencies on lessons learned.

Special effort should be made to identify items that may seem unlikely but could

dramatically impact total project cost.

Extraordinary care in planning, investing in expertise, obtaining stakeholder buy-

in and participation, and managing the process will each help ensure that cost

overruns are minimized and, when encountered, can be most effectively resolved.

Data integration is a fluid process in which such overruns may occur at each step

along the way, so trained personnel with vigilant oversight are likely to return

dividends instead of adding to cost.

A viable data integration approach must recognize that the better data integration

works for users, the more fundamental it will become to business processes. This

level of use must be supported by consistent maintenance. It might be tempting to

think that a well-designed system will, by nature, function without much upkeep

or tweaking. In fact, the best systems and processes tend to thrive on the routine

care and support of well-trained personnel, a fact that wise managers generously

anticipate in the data integration plan and budget.

Lack of Cooperation from Staff

Challenges


User groups within an agency may have developed databases on their own,

sometimes independently from information systems staff, that are highly responsive to

the users' particular needs. It is natural that owners of these functioning standalone units

might be skeptical that the new system would support their needs as effectively.

Other proprietary interests may come into play. For example, division staff may

not want the data they collect and track to be at all times transparently visible to

headquarters staff without the opportunity to address the nuances of what the data appear

to show. Owners or users may fear that higher-ups without an appreciation of the

peculiarities of a given method of operation will gain more control over how data is

collected and accessed organization-wide.

In some agencies, the level of personnel, consultants, and financial support

emanating from the highest echelons of management may be insufficient to dispel these

fears and gain cooperation. Top management must be fully invested in the project.

Otherwise, the likelihood is smaller that the strategic data integration plan and the

resources associated with it will be approved. The additional support required to engage

and convey to everyone in the agency the need for and benefits of data integration is

unlikely to flow from leaders who lack awareness of or commitment to the benefits of

data integration.

Strategies

Any large-scale data integration project, regardless of model, demands that

executive management be fully on board. Without it, the initiative is, quite

simply, likely to fail.

Informing and involving the diversity of players during the crucial requirements

analysis stage, and then in each subsequent phase and step, is probably the single

most effective way to gain buy-in, trust, and cooperation. Collecting and

addressing each user's concerns may be a daunting proposition, particularly for

knowledgeable information professionals who prefer to "cut to the chase."


However, without a personal stake in the process and a sense of ownership of the

final product, the long-term health of this major investment is likely to be

compromised by users who feel that change has been enforced upon them rather

than designed to advance their interests.

Incremental education, another benefit of stakeholder involvement, is easier to

impart than after-the-fact training, particularly since it addresses both the

capabilities and limitations of the system, helping to calibrate appropriate

expectations along the way.

Since so much of the project's success is dependent upon understanding and

conveying both human and technical issues, skilled communicators are a logical

component of any data integration team. Whether staff or consultants,

professional communications personnel are most effective as core participants,

rather than occasional or outside contributors. They are trained to recognize and

ameliorate gaps in understanding and motivation. Their skills also help maximize

the conditions for cooperation and enthusiastic adoption. In many transportation

agencies, public information personnel actually focus a significant amount of their

time and budget on internal audiences rather than external customers. This makes

them well attuned to the operational realities of a variety of internal stakeholders.

Peer Perspectives...

At least three conditions were required for the success of Virginia DOT's

development effort:

Upper management had to support the business objectives of the project and the

creation of a new system to meet the objectives

Project managers had to receive the budget, staff, and IT resources necessary to

initiate and complete the process


All stakeholders and eventual system users from the agency's districts and

headquarters had to cooperate with the project team throughout the process (22)

Lack of Data Management Expertise

Challenges

As more transportation agencies nationwide undertake the integration of data, the

availability of experienced personnel increases. However, since data integration is a

multi-year, highly complex proposition, even these leaders may not have the kind of

expertise that evolves over a full project life-cycle. Common problems develop at

different stages of the process and these can better be anticipated and addressed when key

personnel have managed the typical variables of each project phase.

Also, the process of transferring historical data from its independent source to the

integrated system may benefit from the knowledge of the manager who originally

captured and stored the information. High turnover in such positions, along with early

retirements and other personnel shifts driven by an historically tight budget environment,

may complicate the mining and preparation of this data for convergence with the new

system.

Strategies

A seasoned and highly knowledgeable data integration project leader and a data

manager with state of the practice experience are the minimum required to design

a viable approach to integration. Choosing this expertise very carefully can help

ensure that the resulting architecture is sufficiently modular, can be maintained,

and is robust enough to support a wide range of owner and user needs while

remaining flexible enough to accommodate changing transportation decision-

support requirements over a period of years.

Perception of Data Integration as an Overwhelming Effort

Challenges


When transportation agencies consider data integration, one pervasive notion is

that the analysis of existing information needs and infrastructure, much less the

organization of data into viable channels for integration, requires a monumental initial

commitment of resources and staff. Resource-scarce agencies identify this perceived

major upfront overhaul as "unachievable" and "disruptive." In addition, uncertainties

about funding priorities and potential shortfalls can hamper efforts to move forward.

Strategies

Methodical planning is essential in data integration. Setting incremental (or

phased) goals helps ensure that each phase can be understood, achieved, and

funded adequately. This approach also allows the integration process to be

flexible and agile, minimizing risks associated with funding and other resource

uncertainties and priority shifts. In addition, the smaller, more accurate goals will

help sustain the integration effort and make it less disruptive to those using and

providing data.


SIKKIM MANIPAL UNIVERSITY

BUSINESS INTELLIGENCE TOOLS – 4 CREDITS

SUBJECT CODE – MI0036

ASSIGNMENT SET – 2

Q.1 Explain business development life cycle in detail?

Business Life Cycle

Your business is changing. With the passage of time, your company will go

through various stages of the business life cycle. Learn the focus areas, challenges

and financing sources you will need to succeed at each stage.

A business goes through stages of development similar to the cycle of life for the

human race. Parenting strategies that work for your toddler cannot be applied to your

teenager. The same goes for your small business. It will face different challenges

throughout its life. What you focus on today will change and require different approaches

to be successful.

The 7 Stages of the Business Life Cycle


Seed

The seed stage of your business life cycle is when your business is just a thought or an

idea. This is the very conception or birth of a new business.

Challenge: Most seed stage companies will have to overcome the challenge of

market acceptance and pursue one niche opportunity. Do not spread money and time

resources too thin.

Focus: At this stage of the business the focus is on matching the business opportunity

with your skills, experience and passions. Other focal points include: deciding on a

business ownership structure, finding professional advisors, and business planning.

Money Sources: Early in the business life cycle with no proven market or customers

the business will rely on cash from owners, friends and family. Other potential

sources include suppliers, customers, government grants and banks.

WNB products to consider: Classic Checking Account / Business Savings Account /

SBA Resources / Minnesota SBDC / Minnesota Community Capital Fund

Start-Up


Your business is born and now exists legally. Products or services are in production and

you have your first customers.

Challenge: If your business is in the start-up life cycle stage, it is likely you have

underestimated money needs and the time to market. The main challenge is not to

burn through what little cash you have. You need to learn what profitable needs

your clients have and do a reality check to see if your business is on the right track.

Focus: Start-ups require establishing a customer base and market presence along with

tracking and conserving cash flow.

Money Sources: Owner, friends, family, suppliers, customers, grants, and banks.

WNB products to consider: Seed Stage Products / Working Capital Loan / Line of

Credit / Equipment Financing / Business Internet Banking / Bill Payer / Credit Card

Processing

Growth

Your business has made it through the toddler years and is now a child. Revenues and

customers are increasing with many new opportunities and issues. Profits are strong, but

competition is surfacing.

Challenge: The biggest challenge growth companies face is dealing with the constant

range of issues bidding for more time and money. Effective management, and possibly

a new business plan, will be required. Learn how to train and delegate to conquer this

stage of development.

Focus: Growth life cycle businesses are focused on running the business in a more

formal fashion to deal with the increased sales and customers. Better accounting and

management systems will have to be set up. New employees will have to be hired to

deal with the influx of business.

Money Sources: Banks, profits, partnerships, grants and leasing options.

WNB products to consider: Line of Credit / Equipment Financing / Construction

Loan / Commercial Real Estate Loan / Health Savings Account / Remote Deposit /

Cash Management / Business Credit Card


Established

Your business has now matured into a thriving company with a place in the market and

loyal customers. Sales growth is not explosive but manageable. Business life has become

more routine.

Challenge: It is far too easy to rest on your laurels during this life stage. You have

worked hard and have earned a rest but the marketplace is relentless and

competitive. Stay focused on the bigger picture. Issues like the economy,

competitors or changing customer tastes can quickly end all you have worked for.

Focus: An established life cycle company will be focused on improvement and

productivity. To compete in an established market, you will require better business

practices along with automation and outsourcing to improve productivity.

Money Sources: Profits, banks, investors and government.

WNB products to consider: Premium Checking Account / Business Money Fund

Account / Sweep Account / Private Financial / 401K Planning / Investment

Brokerage / Health Savings Account / Remote Deposit / Cash Management /

Business Credit Card / Line of Credit

Expansion

This life cycle is characterized by a new period of growth into new markets and

distribution channels. This stage is often the choice of the business owner to gain a larger

market share and find new revenue and profit channels.

Challenge: Moving into new markets requires the planning and research of a seed or

start-up stage business. Focus should be on businesses that complement your

existing experience and capabilities. Moving into unrelated businesses can be

disastrous.

Focus: Add new products or services to existing markets or expand existing business

into new markets and customer types.

Money Sources: Joint ventures, banks, licensing, new investors and partners.


WNB products to consider: Acquisition Financing / Private Financial / Line of

Credit / Equipment Financing / Construction Loan / Commercial Real Estate Loan /

Investment Brokerage

Mature

Year over year sales and profits tend to be stable, however competition remains fierce.

Eventually sales start to fall off and a decision is needed whether to expand or exit the

company.

Challenge: Businesses in the mature stage of the life cycle will be challenged with

dropping sales, profits, and negative cash flow. The biggest issue is how long the

business can support a negative cash flow. Ask whether it is time to move back to the

expansion stage or move on to the final life cycle stage: exit.

Focus: Search for new opportunities and business ventures. Cutting costs and finding

ways to sustain cash flow are vital for the mature stage.

Money Sources: Suppliers, customers, owners, and banks.

WNB products to consider: Private Financial / 401K Planning / Employee Stock

Ownership Plans (ESOP) / Investment Brokerage / Health Savings Account /

Remote Deposit / Cash Management / Line of Credit

Exit

This is the big opportunity for your business to cash out on all the effort and years of hard

work. Or it can mean shutting down the business.

Challenge: Selling a business requires a realistic valuation. It may have taken

years of hard work to build the company, but what is its real value in the current

marketplace? If you decide to close your business, the challenge is to deal with the

financial and psychological aspects of a business loss.

Focus: Get a proper valuation on your company. Look at your business operations,

management and competitive barriers to make the company worth more to the

buyer. Set-up legal buy-sell agreements along with a business transition plan.


Money Sources: Find a business valuation partner. Consult with your accountant and

financial advisors for the best tax strategy to sell or close down the business.

WNB products to consider: Acquisition Financing / Employee Stock Ownership

Plans (ESOP) / Investment Brokerage / Trust

Q.2. Discuss the various components of a data warehouse?

Components of a Data Warehouse

Overall Architecture

The data warehouse architecture is based on a relational database management

system server that functions as the central repository for informational data. Operational

data and processing is completely separated from data warehouse processing. This central

information repository is surrounded by a number of key components designed to make

the entire environment functional, manageable and accessible by both the operational

systems that source data into the warehouse and by end-user query and analysis tools.

Typically, the source data for the warehouse comes from the operational

applications. As the data enters the warehouse, it is cleaned up and transformed into an

integrated structure and format. The transformation process may involve conversion,

summarization, filtering and condensation of data. Because the data contains a historical

component, the warehouse must be capable of holding and managing large volumes of

data as well as different data structures for the same database over time.

The next sections look at the major components of data warehousing:

Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing

environment. This database is almost always implemented on the relational database

management system (RDBMS) technology. However, this kind of implementation is

often constrained by the fact that traditional RDBMS products are optimized for

transactional database processing. Certain data warehouse attributes, such as very large

database size, ad hoc query processing and the need for flexible user view creation


including aggregates, multi-table joins and drill-downs, have become drivers for different

technological approaches to the data warehouse database. These approaches include:

Parallel relational database designs for scalability that include shared-memory,

shared disk, or shared-nothing models implemented on various multiprocessor

configurations (symmetric multiprocessors or SMP, massively parallel processors

or MPP, and/or clusters of uni- or multiprocessors).

An innovative approach to speed up a traditional RDBMS by using new index

structures to bypass relational table scans.

Multidimensional databases (MDDBs) that are based on proprietary database

technology; conversely, a dimensional data model can be implemented using a

familiar RDBMS. Multi-dimensional databases are designed to overcome any

limitations placed on the warehouse by the nature of the relational data model.

MDDBs enable on-line analytical processing (OLAP) tools that architecturally

belong to a group of data warehousing components jointly categorized as the data

query, reporting, analysis and mining tools.

Sourcing, Acquisition, Cleanup and Transformation Tools

A significant portion of the implementation effort is spent extracting data from

operational systems and putting it in a format suitable for informational applications that

run off the data warehouse.

The data sourcing, cleanup, transformation and migration tools perform all of the

conversions, summarizations, key changes, structural changes and condensations needed

to transform disparate data into information that can be used by the decision support tool.

They produce the programs and control statements, including the COBOL programs,

MVS job-control language (JCL), UNIX scripts, and SQL data definition language

(DDL) needed to move data into the data warehouse from multiple operational systems.

These tools also maintain the meta data. The functionality includes:

Removing unwanted data from operational databases

Converting to common data names and definitions

Establishing defaults for missing data


Accommodating source data definition changes
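
A minimal sketch of such a transformation step, assuming hypothetical staging and warehouse tables (staging_customers, dw_customers) and columns; none of these names come from any particular tool:

-- Hedged illustration: standardize a code value and establish a
-- default for missing data while loading from staging to the warehouse.
INSERT INTO dw_customers (cust_id, country_code, credit_limit)
SELECT cust_id,
       UPPER(TRIM(country_code)),   -- convert to a common name/format
       NVL(credit_limit, 0)         -- establish a default for missing data
FROM   staging_customers;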

The data sourcing, cleanup, extract, transformation and migration tools have to deal with

some significant issues including:

Database heterogeneity. DBMSs are very different in data models, data access

language, data navigation, operations, concurrency, integrity, recovery etc.

Data heterogeneity. This is the difference in the way data is defined and used in

different models - homonyms, synonyms, unit compatibility (U.S. vs metric),

different attributes for the same entity and different ways of modeling the same

fact.

These tools can save a considerable amount of time and effort. However, significant

shortcomings do exist. For example, many available tools are generally useful for simpler

data extracts. Frequently, customized extract routines need to be developed for the more

complicated data extraction procedures.

Meta data

Meta data is data about data that describes the data warehouse. It is used for

building, maintaining, managing and using the data warehouse. Meta data can be

classified into:

Technical meta data, which contains information about warehouse data for use by

warehouse designers and administrators when carrying out warehouse

development and management tasks.

Business meta data, which contains information that gives users an easy-to-

understand perspective of the information stored in the data warehouse.

Equally important, meta data provides interactive access to users to help

understand content and find data. One of the issues dealing with meta data relates to the

fact that many data extraction tool capabilities to gather meta data remain fairly

immature. Therefore, there is often the need to create a meta data interface for users,

which may involve some duplication of effort.

Meta data management is provided via a meta data repository and accompanying

software. Meta data repository management software, which typically runs on a


workstation, can be used to map the source data to the target database; generate code for

data transformations; integrate and transform the data; and control moving data to the

warehouse.

As users' interactions with the data warehouse increase, their approaches to

reviewing the results of their requests for information can be expected to evolve from

relatively simple manual analysis for trends and exceptions to agent-driven initiation of

the analysis based on user-defined thresholds. The definition of these thresholds,

configuration parameters for the software agents using them, and the information

directory indicating where the appropriate sources for the information can be found are

all stored in the meta data repository as well.

Access Tools

The principal purpose of data warehousing is to provide information to business

users for strategic decision-making. These users interact with the data warehouse using

front-end tools. Many of these tools require an information specialist, although many end

users develop expertise in the tools. Tools fall into four main categories: query and

reporting tools, application development tools, online analytical processing tools, and

data mining tools.

Query and Reporting tools can be divided into two groups: reporting tools and

managed query tools. Reporting tools can be further divided into production reporting

tools and report writers. Production reporting tools let companies generate regular

operational reports or support high-volume batch jobs such as calculating and printing

paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for

end-users.

Managed query tools shield end users from the complexities of SQL and database

structures by inserting a metalayer between users and the database. These tools are

designed for easy-to-use, point-and-click operations that either accept SQL or generate

SQL database queries.

Often, the analytical needs of the data warehouse user community exceed the

built-in capabilities of query and reporting tools. In these cases, organizations will often

rely on the tried-and-true approach of in-house application development using graphical


development environments such as PowerBuilder, Visual Basic and Forte. These

application development platforms integrate well with popular OLAP tools and access all

major database systems including Oracle, Sybase, and Informix.

OLAP tools are based on the concepts of dimensional data models and

corresponding databases, and allow users to analyze the data using elaborate,

multidimensional views. Typical business applications include product performance and

profitability, effectiveness of a sales program or marketing campaign, sales forecasting

and capacity planning. These tools assume that the data is organized in a

multidimensional model.

A critical success factor for any business today is the ability to use information

effectively. Data mining is the process of discovering meaningful new correlations,

patterns and trends by digging into large amounts of data stored in the warehouse using

artificial intelligence, statistical and mathematical techniques.

Data Marts

The concept of a data mart is causing a lot of excitement and attracts much

attention in the data warehouse industry. Mostly, data marts are presented as an

alternative to a data warehouse that takes significantly less time and money to build.

However, the term data mart means different things to different people. A rigorous

definition of this term is a data store that is subsidiary to a data warehouse of integrated

data. The data mart is directed at a partition of data (often called a subject area) that is

created for the use of a dedicated group of users. A data mart might, in fact, be a set of

denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on

the data warehouse rather than a physically separate store of data. In most instances,

however, the data mart is a physically separate store of data and is resident on a separate

database server, often on a local area network serving a dedicated user group. Sometimes the

data mart simply comprises relational OLAP technology which creates a highly

denormalized dimensional model (e.g., a star schema) implemented on a relational

database. The resulting hypercubes of data are used for analysis by groups of users with a

common interest in a limited portion of the database.


These types of data marts, called dependent data marts because their data is

sourced from the data warehouse, have a high value because no matter how they are

deployed and how many different enabling technologies are used, different users are all

accessing the information views derived from the single integrated version of the data.

Unfortunately, the misleading statements about the simplicity and low cost of data

marts sometimes result in organizations or vendors incorrectly positioning them as an

alternative to the data warehouse. This viewpoint defines independent data marts that in

fact, represent fragmented point solutions to a range of business problems in the

enterprise. This type of implementation should be rarely deployed in the context of an

overall technology or applications architecture. Indeed, it is missing the ingredient that is

at the heart of the data warehousing concept -- that of data integration. Each independent

data mart makes its own assumptions about how to consolidate the data, and the data

across several data marts may not be consistent.

Moreover, the concept of an independent data mart is dangerous -- as soon as the

first data mart is created, other organizations, groups, and subject areas within the

enterprise embark on the task of building their own data marts. As a result, you create an

environment where multiple operational systems feed multiple non-integrated data marts

that are often overlapping in data content, job scheduling, connectivity and management.

In other words, you have transformed a complex many-to-one problem of building a data

warehouse from operational and external data sources to a many-to-many sourcing and

management nightmare.

Data Warehouse Administration and Management

Data warehouses tend to be as much as 4 times as large as related operational

databases, reaching terabytes in size depending on how much history needs to be saved.

They are not synchronized in real time to the associated operational data but are updated

as often as once a day if the application requires it.

In addition, almost all data warehouse products include gateways to transparently

access multiple enterprise data sources without having to rewrite applications to interpret

and utilize the data. Furthermore, in a heterogeneous data warehouse environment, the


various databases reside on disparate systems, thus requiring inter-networking tools. The

need to manage this environment is obvious.

Managing data warehouses includes security and priority management;

monitoring updates from the multiple sources; data quality checks; managing and

updating meta data; auditing and reporting data warehouse usage and status; purging

data; replicating, subsetting and distributing data; backup and recovery and data

warehouse storage management.

Q.3. Discuss the data extraction process? What are the various methods used for

data extraction?

Overview of Extraction in Data Warehouses

Extraction is the operation of extracting data from a source system for further use

in a data warehouse environment. This is the first step of the ETL process. After the

extraction, this data can be transformed and loaded into the data warehouse.

The source systems for a data warehouse are typically transaction processing

applications. For example, one of the source systems for a sales analysis data warehouse

might be an order entry system that records all of the current order activities.

Designing and creating the extraction process is often one of the most time-

consuming tasks in the ETL process and, indeed, in the entire data warehousing process.

The source systems might be very complex and poorly documented, and thus determining

which data needs to be extracted can be difficult. Normally, the data has to be extracted

not only once, but several times in a periodic manner to supply all changed data to the

data warehouse and keep it up-to-date. Moreover, the source system typically cannot be

modified, nor can its performance or availability be adjusted, to accommodate the needs

of the data warehouse extraction process.

These are important considerations for extraction and ETL in general. This

section, however, focuses on the technical considerations of having different kinds of

sources and extraction methods. It assumes that the data warehouse team has already

identified the data that will be extracted, and discusses common techniques used for

extracting data from source databases.


Designing this process means making decisions about the following two main

aspects:

Which extraction method do I choose?

This influences the source system, the transportation process, and the time needed

for refreshing the warehouse.

How do I provide the extracted data for further processing?

This influences the transportation method, and the need for cleaning and

transforming the data.

Introduction to Extraction Methods in Data Warehouses

The extraction method you should choose is highly dependent on the source

system and also on the business needs in the target data warehouse environment. Very

often, there is no possibility to add additional logic to the source systems to enhance an

incremental extraction of data due to the performance or the increased workload of these

systems. Sometimes even the customer is not allowed to add anything to an out-of-the-

box application system.

The estimated amount of the data to be extracted and the stage in the ETL process

(initial load or maintenance of data) may also impact the decision of how to extract, from

a logical and a physical perspective. Basically, you have to decide how to extract data

logically and physically.

Logical Extraction Methods

There are two types of logical extraction:

Full Extraction

Incremental Extraction

Full Extraction

The data is extracted completely from the source system. Because this extraction

reflects all the data currently available on the source system, there's no need to keep track

of changes to the data source since the last successful extraction. The source data will be

provided as-is and no additional logical information (for example, timestamps) is


necessary on the source site. An example for a full extraction may be an export file of a

distinct table or a remote SQL statement scanning the complete source table.
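
A minimal sketch of the latter, assuming a database link named source_db (an assumption for illustration):

-- Full extraction: scan the complete source table over a database link.
SELECT * FROM orders@source_db;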

Incremental Extraction

At a specific point in time, only the data that has changed since a well-defined

event back in history will be extracted. This event may be the last time of extraction or a

more complex business event like the last booking day of a fiscal period. To identify this

delta change there must be a possibility to identify all the changed information since this

specific time event. This information can be provided either by the source data itself,

such as an application column reflecting the last-changed timestamp, or by a change table

where an appropriate additional mechanism keeps track of the changes alongside the

originating transactions. In most cases, using the latter method means adding extraction logic to the

source system.
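
A minimal sketch of the timestamp-based variant, assuming a hypothetical last_modified column and a bind variable holding the time of the previous extraction:

-- Incremental extraction: only rows changed since the last run.
-- last_modified and :last_extract_time are assumptions for illustration.
SELECT * FROM orders
WHERE last_modified > :last_extract_time;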

Many data warehouses do not use any change-capture techniques as part of the

extraction process. Instead, entire tables from the source systems are extracted to the data

warehouse or staging area, and these tables are compared with a previous extract from the

source system to identify the changed data. This approach may not have significant

impact on the source systems, but it clearly can place a considerable burden on the data

warehouse processes, particularly if the data volumes are large.

Oracle's Change Data Capture mechanism can extract and maintain such delta

information. See Chapter 16, " Change Data Capture" for further details about the Change

Data Capture framework.

Physical Extraction Methods

Depending on the chosen logical extraction method and the capabilities and

restrictions on the source side, the extracted data can be physically extracted by two

mechanisms. The data can either be extracted online from the source system or from an

offline structure. Such an offline structure might already exist or it might be generated by

an extraction routine.

There are the following methods of physical extraction:


Online Extraction

Offline Extraction

Online Extraction

The data is extracted directly from the source system itself. The extraction process can

connect directly to the source system to access the source tables themselves or to an

intermediate system that stores the data in a preconfigured manner (for example, snapshot

logs or change tables). Note that the intermediate system is not necessarily physically

different from the source system.

With online extractions, you need to consider whether the distributed transactions are

using original source objects or prepared source objects.

Offline Extraction

The data is not extracted directly from the source system but is staged explicitly outside

the original source system. The data already has an existing structure (for example, redo

logs, archive logs or transportable tablespaces) or was created by an extraction routine.

You should consider the following structures:

Flat files

Data in a defined, generic format. Additional information about the source

object is necessary for further processing.

Dump files

Oracle-specific format. Information about the containing objects may or

may not be included, depending on the chosen utility.

Redo and archive logs

Information is in a special, additional dump file.

Transportable tablespaces

A powerful way to extract and move large volumes of data between

Oracle databases. A more detailed example of using this feature to extract

and transport data is provided in Chapter 13, " Transportation in Data


Warehouses". Oracle Corporation recommends that you use transportable

tablespaces whenever possible, because they can provide considerable

advantages in performance and manageability over other extraction

techniques.

See Oracle Database Utilities for more information on using export/import.

Change Data Capture

An important consideration for extraction is incremental extraction, also called

Change Data Capture. If a data warehouse extracts data from an operational system on a

nightly basis, then the data warehouse requires only the data that has changed since the

last extraction (that is, the data that has been modified in the past 24 hours). Change Data

Capture is also the key-enabling technology for providing near real-time, or on-time, data

warehousing.

When it is possible to efficiently identify and extract only the most recently

changed data, the extraction process (as well as all downstream operations in the ETL

process) can be much more efficient, because it must extract a much smaller volume of

data. Unfortunately, for many source systems, identifying the recently modified data may

be difficult or intrusive to the operation of the system. Change Data Capture is typically

the most challenging technical issue in data extraction.

Because change data capture is often desirable as part of the extraction process and it

might not be possible to use the Change Data Capture mechanism, this section describes

several techniques for implementing a self-developed change capture on Oracle Database

source systems:

Timestamps

Partitioning

Triggers

These techniques are based upon the characteristics of the source systems, or may require

modifications to the source systems. Thus, each of these techniques must be carefully

evaluated by the owners of the source system prior to implementation.


Each of these techniques can work in conjunction with the data extraction technique

discussed previously. For example, timestamps can be used whether the data is being

unloaded to a file or accessed through a distributed query. See Chapter 16, " Change Data

Capture" for further details.

Timestamps

The tables in some operational systems have timestamp columns. The timestamp

specifies the time and date that a given row was last modified. If the tables in an

operational system have columns containing timestamps, then the latest data can easily be

identified using the timestamp columns. For example, the following query might be

useful for extracting today's data from an orders table:

SELECT * FROM orders
WHERE TRUNC(order_date) = TRUNC(SYSDATE);   -- rows last modified today

If the timestamp information is not available in an operational source system, you

will not always be able to modify the system to include timestamps. Such modification

would require, first, modifying the operational system's tables to include a new

timestamp column and then creating a trigger to update the timestamp column following

every operation that modifies a given row.

Partitioning

Some source systems might use range partitioning, such that the source tables are

partitioned along a date key, which allows for easy identification of new data. For

example, if you are extracting from an orders table, and the orders table is partitioned by

week, then it is easy to identify the current week's data.
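
For illustration, assuming a hypothetical orders table partitioned by week, the current week's rows could be read directly with partition-extended syntax (the partition name is an assumption; real names come from the table's DDL):

-- Read only the partition holding the current week's data.
SELECT * FROM orders PARTITION (orders_week_23);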

Triggers

Triggers can be created in operational systems to keep track of recently updated

records. They can then be used in conjunction with timestamp columns to identify the

exact time and date when a given row was last modified. You do this by creating a trigger

on each source table that requires change data capture. Following each DML statement


that is executed on the source table, this trigger updates the timestamp column with the

current time. Thus, the timestamp column provides the exact time and date when a given

row was last modified.
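
A minimal sketch of such a trigger, assuming the source table orders has been given a last_modified timestamp column (both names are illustrative):

-- Hypothetical trigger: stamp each inserted or updated row with the
-- time of the change so the extraction can find recent modifications.
CREATE OR REPLACE TRIGGER orders_last_modified_trg
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW
BEGIN
  :NEW.last_modified := SYSDATE;
END;
/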

A similar internalized trigger-based technique is used for Oracle materialized

view logs. These logs are used by materialized views to identify changed data, and these

logs are accessible to end users. However, the format of the materialized view logs is not

documented and might change over time.

If you want to use a trigger-based mechanism, use synchronous Change Data

Capture, because CDC provides an externalized interface for accessing the change

information and a framework for maintaining the distribution of this information to

various clients.

Materialized view logs rely on triggers, but they provide an advantage in that the

creation and maintenance of this change-data system is largely managed by the database.
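
As a sketch, creating such a log is a single statement, after which the database maintains it; the table and column names below are assumptions:

-- Track changes to the hypothetical orders table via a materialized view log.
CREATE MATERIALIZED VIEW LOG ON orders
  WITH ROWID, SEQUENCE (order_date, order_total)
  INCLUDING NEW VALUES;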


Trigger-based techniques might affect performance on the source systems, and

this impact should be carefully considered prior to implementation on a production

source system.

Data Warehousing Extraction Examples

You can extract data in two ways:

Extraction Using Data Files

Extraction Through Distributed Operations

Extraction Using Data Files

Most database systems provide mechanisms for exporting or unloading data from

the internal database format into flat files. Extracts from mainframe systems often use


COBOL programs, but many databases, as well as third-party software vendors, provide

export or unload utilities.

Data extraction does not necessarily mean that entire database structures are

unloaded in flat files. In many cases, it may be appropriate to unload entire database

tables or objects. In other cases, it may be more appropriate to unload only a subset of a

given table such as the changes on the source system since the last extraction or the

results of joining multiple tables together. Different extraction techniques vary in their

capabilities to support these two scenarios.

When the source system is an Oracle database, several alternatives are available for

extracting data into files:

Extracting into Flat Files Using SQL*Plus

Extracting into Flat Files Using OCI or Pro*C Programs

Exporting into Export Files Using the Export Utility

Extracting into Export Files Using External Tables

Extracting into Flat Files Using SQL*Plus

The most basic technique for extracting data is to execute a SQL query in

SQL*Plus and direct the output of the query to a file. For example, to extract a flat file,

country_city.log, with the pipe sign as delimiter between column values, containing a list

of the cities in the US in the tables countries and customers, the following SQL script

could be run:

SET ECHO OFF
SET PAGESIZE 0
SPOOL country_city.log
SELECT DISTINCT t1.country_name ||'|'|| t2.cust_city
FROM countries t1, customers t2
WHERE t1.country_id = t2.country_id
AND t1.country_name = 'United States of America';
SPOOL OFF

The exact format of the output file can be specified using SQL*Plus system

variables.


This extraction technique offers the advantage of storing the result in a

customized format. Note that using the external table data pump unload facility, you can

also extract the result of an arbitrary SQL operation. The previous example extracts the

results of a join.

This extraction technique can be parallelized by initiating multiple, concurrent

SQL*Plus sessions, each session running a separate query representing a different portion

of the data to be extracted. For example, suppose that you wish to extract data from an

orders table, and that the orders table has been range partitioned by month, with partitions

orders_jan1998, orders_feb1998, and so on. To extract a single year of data from the

orders table, you could initiate 12 concurrent SQL*Plus sessions, each extracting a single

partition. The SQL script for one such session could be:

SPOOL order_jan.dat

SELECT * FROM orders PARTITION (orders_jan1998);

SPOOL OFF

These 12 SQL*Plus processes would concurrently spool data to 12 separate files.

You can then concatenate them if necessary (using operating system utilities) following

the extraction. If you are planning to use SQL*Loader for loading into the target, these 12

files can be used as is for a parallel load with 12 SQL*Loader sessions. See Chapter 13, "

Transportation in Data Warehouses" for an example.

Even if the orders table is not partitioned, it is still possible to parallelize the

extraction either based on logical or physical criteria. The logical method is based on

logical ranges of column values, for example:

SELECT ... WHERE order_date

BETWEEN TO_DATE('01-JAN-99') AND TO_DATE('31-JAN-99');

The physical method is based on a range of values. By viewing the data

dictionary, it is possible to identify the Oracle Database data blocks that make up the

orders table. Using this information, you could then derive a set of rowid-range queries

for extracting data from the orders table:


SELECT * FROM orders WHERE rowid BETWEEN value1 and value2;

Parallelizing the extraction of complex SQL queries is sometimes possible,

although the process of breaking a single complex query into multiple components can be

challenging. In particular, the coordination of independent processes to guarantee a

globally consistent view can be difficult. Unlike the SQL*Plus approach, using the new

external table data pump unload functionality provides transparent parallel capabilities.

Note that all parallel techniques can use considerably more CPU and I/O

resources on the source system, and the impact on the source system should be evaluated

before parallelizing any extraction technique.

Extracting into Flat Files Using OCI or Pro*C Programs

OCI programs (or other programs using Oracle call interfaces, such as Pro*C

programs), can also be used to extract data. These techniques typically provide improved

performance over the SQL*Plus approach, although they also require additional

programming. Like the SQL*Plus approach, an OCI program can extract the results of

any SQL query. Furthermore, the parallelization techniques described for the SQL*Plus

approach can be readily applied to OCI programs as well.

When using OCI or SQL*Plus for extraction, you need additional information

besides the data itself. At minimum, you need information about the extracted columns. It

is also helpful to know the extraction format, which might be the separator between

distinct columns.

Exporting into Export Files Using the Export Utility

The Export utility allows tables (including data) to be exported into Oracle Database

export files. Unlike the SQL*Plus and OCI approaches, which describe the extraction of

the results of a SQL statement, Export provides a mechanism for extracting database

objects. Thus, Export differs from the previous approaches in several important ways:

The export files contain metadata as well as data. An export file contains not only

the raw data of a table, but also information on how to re-create the table,

potentially including any indexes, constraints, grants, and other attributes

associated with that table.


A single export file may contain a subset of a single object, many database

objects, or even an entire schema.

Export cannot be directly used to export the results of a complex SQL query.

Export can be used only to extract subsets of distinct database objects.

The output of the Export utility must be processed using the Import utility.

Oracle provides the original Export and Import utilities for backward compatibility

and the data pump export/import infrastructure for high-performance, scalable and parallel

extraction. See Oracle Database Utilities for further details.

Extracting into Export Files Using External Tables

In addition to the Export Utility, you can use external tables to extract the results

from any SELECT operation. The data is stored in the platform-independent, Oracle-

internal data pump format and can be processed as a regular external table on the target

system. The following example extracts the result of a join operation in parallel into the

four specified files. The only allowed external table type for extracting data is the Oracle-

internal format ORACLE_DATAPUMP.

CREATE DIRECTORY def_dir AS '/net/dlsun48/private/hbaer/WORK/FEATURES/et';

DROP TABLE extract_cust;

CREATE TABLE extract_cust
ORGANIZATION EXTERNAL
(TYPE ORACLE_DATAPUMP
 DEFAULT DIRECTORY def_dir
 ACCESS PARAMETERS (NOBADFILE NOLOGFILE)
 LOCATION ('extract_cust1.exp', 'extract_cust2.exp',
           'extract_cust3.exp', 'extract_cust4.exp'))
PARALLEL 4 REJECT LIMIT UNLIMITED AS
SELECT c.*, co.country_name, co.country_subregion, co.country_region
FROM customers c, countries co
WHERE co.country_id = c.country_id;


The total number of extraction files specified limits the maximum degree of

parallelism for the write operation. Note that the parallelizing of the extraction does not

automatically parallelize the SELECT portion of the statement.

Unlike using any kind of export/import, the metadata for the external table is not

part of the created files when using the external table data pump unload. To extract the

appropriate metadata for the external table, use the DBMS_METADATA package, as

illustrated in the following statement:

SET LONG 2000
SELECT DBMS_METADATA.GET_DDL('TABLE','EXTRACT_CUST') FROM DUAL;

Extraction Through Distributed Operations

Using distributed-query technology, one Oracle database can directly query tables

located in various different source systems, such as another Oracle database or a legacy

system connected with the Oracle gateway technology. Specifically, a data warehouse or

staging database can directly access tables and data located in a connected source system.

Gateways are another form of distributed-query technology. Gateways allow an Oracle

database (such as a data warehouse) to access database tables stored in remote, non-

Oracle databases. This is the simplest method for moving data between two Oracle

databases because it combines the extraction and transformation into a single step, and

requires minimal programming. However, this is not always feasible.

Suppose that you wanted to extract a list of country and city names from a source

database and store this data in the data warehouse. Using an

Oracle Net connection and distributed-query technology, this can be achieved using a

single SQL statement:

CREATE TABLE country_city AS SELECT distinct t1.country_name, t2.cust_city

FROM countries@source_db t1, customers@source_db t2

WHERE t1.country_id = t2.country_id

AND t1.country_name='United States of America';


This statement creates a local table in a data mart, country_city, and populates it

with data from the countries and customers tables on the source system.

This technique is ideal for moving small volumes of data. However, the data is

transported from the source system to the data warehouse through a single Oracle Net

connection. Thus, the scalability of this technique is limited. For larger data volumes,

file-based data extraction and transportation techniques are often more scalable and thus

more appropriate.

Q.4 Discuss the needs of developing OLAP tools in detail?

MOLAP or ROLAP

OLAP tools take you a step beyond query and reporting tools. Via OLAP tools,

data is represented using a multidimensional model rather than the more traditional

tabular data model. The traditional model defines a database schema that focuses on

modeling a process or function, and the information is viewed as a set of transactions,

each of which occurred at some single point in time. The multidimensional model usually

defines a star schema, viewing data not as a single event but rather as the cumulative

effect of events over some period of time, such as weeks, then months, then years. With

OLAP tools, the user generally views the data in grids or crosstabs that can be pivoted to

offer different perspectives on the data. OLAP also enables interactive querying of the

data. For example, a user can look at information at one aggregation (such as a sales

region) and then drill down to more detail information, such as sales by state, then city,

then store.
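
In SQL terms, such a drill-down is the same aggregation re-run at a finer grain with a filter on the selected member. A minimal sketch, assuming a hypothetical sales table with region, state and amount columns:

-- Top level: sales by region (table and column names are assumptions).
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;

-- Drill down into one region: sales by state.
SELECT state, SUM(amount) AS total_sales
FROM sales
WHERE region = 'West'
GROUP BY state;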

OLAP tools do not indicate how the data is actually stored. Given that, it’s not

surprising that there are multiple ways to store the data, including storing the data in a

dedicated multidimensional database (also referred to as MOLAP or MDD). Examples

include Arbor Software's Essbase and Oracle Express Server. The other choice involves

storing the data in relational databases and having an OLAP tool work directly against the

data, referred to as relational OLAP (also referred to as ROLAP or RDBMS). Examples

include MicroStrategy’s DSS server and related products, Informix’s Informix-


MetaCube, Information Advantage’s Decision Suite, and Platinum Technologies’

Platinum InfoBeacon. (Some also include Red Brick's Warehouse in this category, but it

isn't really an OLAP tool. Rather, it is a relational database optimized for performing the

types of operations that ROLAP tools need.)

ROLAP versus MOLAP

Relational OLAP (ROLAP)                  Multidimensional OLAP (MOLAP)
Scale to terabytes                       Under 50 GB capacity
Managing of summary tables/indexes       Instant response
Platform portability                     Easier to implement
SMP and MPP                              SMP only
Secure                                   Integrated meta data
Proven technology
Data modeling required

Data warehouses can be implemented on standard or extended relational DBMSs,

called relational OLAP (ROLAP) servers. These servers assume that data is stored in

relational databases and they support extensions to SQL and special access and

implementation methods to efficiently implement the multidimensional data model and

operations. In contrast, multidimensional OLAP (MOLAP) servers are servers that

directly store multidimensional data in special data structures (like arrays or cubes) and

implement OLAP operations over these data in free-form fashion (free-form within the

framework of the DBMS that holds the multidimensional data). MOLAP servers have

sparsely populated matrices, numeric data, and a rigid structure of data once the data

enters the MOLAP DBMS framework.

Relational Databases

ROLAP servers contain both numeric and textual data, serving a much wider

purpose than their MOLAP counterparts. Unlike MOLAP DBMSs, which are supported by

specialized database management systems, ROLAP DBMSs (or RDBMSs) are

supported by relational technology. RDBMSs support numeric, textual, spatial, audio,

graphic, and video data, general-purpose DSS analysis, freely structured data, numerous

indexes, and star schemas. ROLAP servers can have both disciplined and ad hoc usage

and can contain both detailed and summarized data.


ROLAP supports large databases while enabling good performance, platform

portability, exploitation of hardware advances such as parallel processing, robust

security, multi-user concurrent access (including read-write with locking), recognized

standards, and openness to multiple vendors' tools. ROLAP is based on familiar, proven,

and already selected technologies.

ROLAP tools take advantage of parallel RDBMSs for those parts of the

application processed using SQL (SQL not being a multidimensional access or

processing language). So, although it is always possible to store multidimensional data in

a number of relational tables (the star schema), SQL does not, by itself, support

multidimensional manipulation of calculations. Therefore, ROLAP products must do

these calculations either in the client software or intermediate server engine. Note,

however, that Informix has integrated the ROLAP calculation engine into the RDBMS,

effectively mitigating the above disadvantage.
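
As a hedged sketch of what such a product generates, a star-schema aggregation might look like the following; the sales fact table and the time_dim and product_dim dimension tables are assumptions for illustration:

-- Star-schema aggregation over a hypothetical sales fact table;
-- any further multidimensional calculation happens in the client
-- or an intermediate engine.
SELECT t.quarter, p.product_line, SUM(s.amount) AS total_sales
FROM sales s
JOIN time_dim t ON t.time_id = s.time_id
JOIN product_dim p ON p.product_id = s.product_id
GROUP BY t.quarter, p.product_line;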

Multidimensional Databases

MDDs deliver impressive query performance by pre-calculating or pre-

consolidating transactional data rather than calculating on-the-fly. (MDDs pre-calculate

and store every measure at every hierarchy summary level at load time and store them in

efficiently indexed cells for immediate retrieval.) However, to fully preconsolidate

incoming data, MDDs require an enormous amount of overhead both in processing time

and in storage. An input file of 200MB can easily expand to 5GB; obviously, a file this

size takes many minutes to load and consolidate. As a result, MDDs do not scale, making

them a lackluster choice for the enterprise atomic-level data in the data warehouse.

However, MDDs are great candidates for the <50GB department data marts.

To manage large amounts of data, MDD servers aggregate data along hierarchies.

Not only do hierarchies provide a mechanism for aggregating data, they also provide a

technique for navigation. The ability to navigate data by zooming in and out of detail is

key. With MDDs, application design is essentially the definition of dimensions and

calculation rules, while the RDBMS requires that the database schema be a star or

snowflake. With MDDs, for example, it is common to see the structure of time separated

from the repetition of time. One dimension may be the structure of a year, month, quarter,


half-year, and year. A separate dimension might be different years: 1996, 1997, and so

on. Adding a new year to the MDD simply means adding a new member to the calendar

dimension. Adding a new year to a RDBMS usually requires that each month, quarter,

half-year and year also be added.

In General

Usually, a scalable, parallel database is used for the large, atomic, organizationally-

structured data warehouse, and subsets or summarized data from the warehouse are

extracted and replicated to proprietary MDDs. Because MDD vendors have enabled drill-

through features, when a user reaches the limit of what is actually stored in the MDD and

seeks more detail data, he/she can drill through to the detail stored in the enterprise

database. However, the drill through functionality usually requires creating views for

every possible query.

As relational database vendors incorporate sophisticated analytical

multidimensional features into their core database technology, the resulting capacity for

higher performance, scalability and parallelism will enable more sophisticated analysis.

Proprietary database and nonintegrated relational OLAP query tool vendors will find it

difficult to compete with this integrated ROLAP solution.

Both storage methods have strengths and weaknesses -- the weaknesses, however,

are being rapidly addressed by the respective vendors. Currently, data warehouses are

predominantly built using RDBMSs. If you have a warehouse built on a relational

database and you want to perform OLAP analysis against it, ROLAP is a natural fit. This

isn’t to say that MDDs can’t be a part of your data warehouse solution. It’s just that

MDDs aren’t currently well-suited for large volumes of data (10-50GB is fine, but

anything over 50GB is stretching their capabilities). If you really want the functionality

benefits that come with MDD, consider subsetting the data into smaller MDD-based data

marts.

When deciding which technology to go for, consider:

1) Performance: How fast will the system appear to the end-user? MDD server vendors

believe this is a key point in their favor. MDD server databases typically contain indexes


that provide direct access to the data, making MDD servers quicker when trying to solve

a multidimensional business problem. However, MDDs have significant performance

differences due to the differing ability of data models to be held in memory, sparsity

handling, and use of data compression. And, the relational database vendors argue that

they have developed performance improvement techniques, such as IBM’s DB2 Starburst

optimizer and Red Brick’s Warehouse VPT STARindex capabilities. (Before you use

performance as an objective measure for selecting an OLAP server, remember that OLAP

systems are about effectiveness (how to make better decisions), not efficiency (how to

make faster decisions).)

2) Data volume and scalability: While MDD servers can handle up to 50GB of storage,

RDBMS servers can handle hundreds of gigabytes and terabytes. And, although MDD

servers can require up to 50% less disk space than relational databases to store the same

amount of data (because of relational indexes and overhead), relational databases have

more capacity. MDD advocates believe that you should perform multidimensional

modeling on summary, not detail, information, thus mitigating the need for large

databases.

In addition to performance, data volume, and scalability, you should consider which

architecture better supports systems management and data distribution, which vendors

have a better user interface and functionality, which architecture is easier to understand,

which architecture better handles aggregation and complex calculations, and your

perception of open versus proprietary architectures. Besides these issues, you must also

consider which architecture will be a more strategic technology. In fact, MDD servers

and RDBMS products can be used together -- one for fast responses, the other for access to

large databases.

What if?

IF
A. You require write access for "what if?" analysis
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. You don't have DBA or data modeler personnel


E. You’re developing a general-purpose application for inventory movement or asset management

THEN
Consider an MDD solution for your data mart (like Oracle Express, Arbor's Essbase, and Pilot's Lightship)

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement

THEN
Consider an RDBMS for your data mart.

IF
A. Your data is over 1 TB
B. You need data mining at a detail level

THEN
Consider an MPP hardware platform like IBM's SP and DB2 RDBMS

If you’ve decided to build a data mart using an MDD, you don’t need a data

modeler. Rather, you need an MDD data mart application builder who will design the

business model (identifying dimensions and defining business measures based on the

source systems identified).

Prior to building separate stove pipe data marts, understand that at some point you

will need to: 1) integrate and consolidate these data marts at the detail enterprise level; 2)

load the MDD data marts; and 3) drill through from the data marts to the detail. Note that

your data mart may outgrow the storage limitations of an MDD, creating the need for an

RDBMS (in turn, requiring data modeling similar to constructing the detailed, atomic

enterprise-level RDBMS).

Q.5 What do you understand by the term statistical analysis? Discuss the most

important statistical techniques?

Data mining is a relatively new data analysis technique. It is very different from

query and reporting and multidimensional analysis in that it uses what is called a

discovery technique. That is, you do not ask a particular question of the data but rather


use specific algorithms that analyze the data and report what they have discovered.

Unlike query and reporting and multidimensional analysis where the user has to create

and execute queries based on hypotheses, data mining searches for answers to questions

that may have not been previously asked. This discovery could take the form of finding

significance in relationships between certain data elements, a clustering together of

specific data elements, or other patterns in the usage of specific sets of data elements.

After finding these patterns, the algorithms can infer rules. These rules can then be used

to generate a model that can predict a desired behavior, identify relationships among the

data, discover patterns, and group clusters of records with similar attributes.

Data mining is most typically used for statistical data analysis and knowledge discovery.

Statistical data analysis detects unusual patterns in data and applies statistical and

mathematical modeling techniques to explain the patterns. The models are then used to

forecast and predict. Types of statistical data analysis techniques include linear and

nonlinear analysis, regression analysis, multivariate analysis, and time series analysis.
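
As one illustration, a simple linear regression of one measure on another can be computed directly in the database using Oracle's built-in regression aggregates; the monthly_sales table and its columns are assumptions:

-- Fit sales_amount = slope * ad_spend + intercept (hypothetical columns).
SELECT REGR_SLOPE(sales_amount, ad_spend)     AS slope,
       REGR_INTERCEPT(sales_amount, ad_spend) AS intercept
FROM   monthly_sales;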

Knowledge discovery extracts implicit, previously unknown information from the data.

This often results in uncovering unknown business facts.

Data mining is data driven. There is a high level of complexity

in stored data and data interrelations in the data warehouse that are difficult to discover

without data mining. Data mining offers new insights into the business that may not be

discovered with query and reporting or multidimensional analysis. Data mining can help

discover new insights about the business by giving us answers to questions we might

never have thought to ask.

Even within the scope of your data warehouse project, when mining data you want to define a data scope, or possibly multiple data scopes. Because patterns are based on various forms of statistical analysis, you must define a scope in which a statistically significant pattern is likely to emerge. For example, buying patterns that show different products being purchased together may differ greatly across geographical locations; simply lumping all of the data together may hide the patterns that exist in each location. Of course, by imposing such a scope you are defining some, though not all, of the business rules. It is therefore important that data scoping be done in concert with someone knowledgeable in both the business and statistical analysis, so that artificial patterns are not imposed and real patterns are not lost.
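The effect of scoping can be illustrated with plain Python: counting which product pairs are bought together within each region rather than over the lumped data. The transactions below are invented, and the region field stands in for whatever scope the business chooses.

```python
from collections import Counter
from itertools import combinations

# Invented transactions: (region, items bought together)
transactions = [
    ("north", {"tea", "biscuits"}),
    ("north", {"tea", "biscuits", "milk"}),
    ("south", {"coffee", "sugar"}),
    ("south", {"coffee", "sugar", "milk"}),
]

# Count co-purchased pairs per region: the scope in which patterns are sought
by_region = {}
for region, items in transactions:
    pairs = combinations(sorted(items), 2)
    by_region.setdefault(region, Counter()).update(pairs)

for region, counts in by_region.items():
    print(region, counts.most_common(1))
# ('biscuits', 'tea') dominates the north; ('coffee', 'sugar') the south.
# Lumping both regions together would dilute each local pattern.
```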

Data architecture modeling and advanced modeling techniques, such as those suitable for multimedia databases and statistical databases, are beyond the scope of this discussion.

Q.6 What are the methods for determining executive needs?

Implementing an Executive Information System (EIS)

An EIS is a tool that provides direct on-line access to relevant information about aspects of a business that are of particular interest to the senior manager.

Contents of EIS

A general answer to the question of what data is appropriate for inclusion in an Executive Information System is "whatever is interesting to executives." While this advice is rather simplistic, it does reflect the variety of systems currently in use. Executive Information Systems in government have been constructed to track data about ministerial correspondence, case management, worker productivity, finances, and human resources, to name only a few. Other sectors use EIS implementations to monitor information about competitors in the news media and in databases of public information, in addition to the traditional revenue, cost, volume, sales, market share, and quality applications.

Frequently, EIS implementations begin with just a few measures that are clearly of interest to senior managers, and then expand in response to questions asked by those managers as they use the system. Over time, the presentation of this information becomes stale, and the information diverges from what is strategically important for the organization. A "Critical Success Factors" approach is recommended by many management theorists (Daniel, 1961; Crockett, 1992; Watson and Frolick, 1992).

Practitioners such as Vandenbosch (1993) found that:

"While our efforts usually met with initial success, we often found that after six months to a year, executives were almost as bored with the new information as they had been with the old. A strategy we developed to rectify this problem required organizations to create a report of the month. That is, in addition to the regular information provided for management committee meetings, the CEO was charged with selecting a different indicator to focus on each month" (Vandenbosch, 1993, pp. 8-9).

While the above indicates that selecting data for inclusion in an EIS is difficult, there are several guidelines that help to make that assessment. A practical set of principles to guide the design of measures and indicators to be included in an EIS is presented below (Kelly, 1992b), with a small illustrative sketch after the list. For a more detailed discussion of methods for selecting measures that reflect organizational objectives, see the section "EIS and Organizational Objectives."

EIS measures must be easy to understand and collect. Wherever possible, data should be collected naturally as part of the process of work. An EIS should not add substantially to the workload of managers or staff.

EIS measures must be based on a balanced view of the organization's objectives. Data in the system should reflect the objectives of the organization in the areas of productivity, resource management, quality, and customer service.

Performance indicators in an EIS must reflect everyone's contribution in a fair and consistent manner. Indicators should be as independent as possible from variables outside the control of managers.

EIS measures must encourage management and staff to share ownership of the organization's objectives. Performance indicators must promote both teamwork and friendly competition. Measures must be meaningful for all staff; people must feel that they, as individuals, can contribute to improving the performance of the organization.

EIS information must be available to everyone in the organization. The objective is to provide everyone with useful information about the organization's performance. Information that must remain confidential should not be part of the EIS or the management system of the organization.

EIS measures must evolve to meet the changing needs of the organization.
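As a purely illustrative sketch (no EIS product or standard schema is implied), the principles above can be made concrete by recording, for each measure, which objective it serves, who owns it, and whether it satisfies the collection and visibility guidelines. All field and measure names here are invented.

```python
from dataclasses import dataclass

@dataclass
class EISMeasure:
    """Hypothetical record of one EIS indicator; field names are assumptions."""
    name: str
    objective: str               # which organizational objective it reflects
    owner: str                   # manager accountable for the indicator
    collected_in_workflow: bool  # gathered naturally as part of the work?
    shared_org_wide: bool        # visible to everyone, not just executives?

measures = [
    EISMeasure("Average case turnaround (days)", "customer service",
               "Operations", collected_in_workflow=True, shared_org_wide=True),
    EISMeasure("Cost per transaction", "resource management",
               "Finance", collected_in_workflow=False, shared_org_wide=True),
]

# Flag measures that violate the collection or availability guidelines above
for m in measures:
    if not (m.collected_in_workflow and m.shared_org_wide):
        print("Review:", m.name)
```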


Barriers to Effectiveness

There are many ways in which an EIS can fail. Dozens of high-profile, high-cost EIS projects have been cancelled, implemented and rarely used, or implemented and used with negative results. An EIS is a high-risk project precisely because it is intended for use by the most powerful people in an organization. Senior managers can easily misuse the information in the system, with strongly detrimental effects on the organization, and they can refuse to use a system if it does not respond to their immediate personal needs or is too difficult to learn and use.

Unproductive Organizational Behaviour Norms

Issues of organizational behaviour and culture are perhaps the most deadly barriers to effective Executive Information Systems. Because an EIS is typically positioned at the top of an organization, it can create powerful learning experiences and lead to drastic changes in organizational direction. However, there is also great potential for misuse of the information. Green, Higgins and Irving (1988) found that performance monitoring can promote bureaucratic and unproductive behaviour, can unduly focus organizational attention to the point where other important aspects are ignored, and can have a strongly negative impact on morale.

Technical Excellence

An interesting result from the Vandenbosch & Huff (1988) study was that the technical excellence of an EIS has an inverse relationship with effectiveness. Systems that are technical masterpieces tend to be inflexible, and thus discourage innovation, experimentation, and mental model development.

Flexibility is important because an EIS has such a powerful ability to direct attention to specific issues in an organization. A technical masterpiece may accurately direct management attention when the system is first implemented, but on its first anniversary it may still be directing attention to issues that were important a year earlier. There is a substantial danger that the exploration of issues necessary for managerial learning will be limited to those subjects that were important when the EIS was first developed. Managers must understand that as the organization and its work change, an EIS must continually be updated to address the strategic issues of the day.

A number of explanations are possible as to why technical masterpieces tend to be less flexible. Developers who create a masterpiece EIS may become attached to the system and consciously or unconsciously dissuade managers from asking for changes. Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece EIS may not want to spend more on system maintenance and improvements. And the time required to create a masterpiece EIS may mean that it is outdated before it is implemented.

While usability and response time are important factors in determining whether executives will use a system, cost and flexibility are paramount. A senior manager will be more accepting of an inexpensive system that provides 20% of the needed information within a month or two than of an expensive system that provides 80% of the needed information after a year of development. The manager may also find that the inexpensive system is easier to change and adapt to the evolving needs of the business: changing a large system would mean throwing away part of a substantial investment, while changing the inexpensive system means losing only a few weeks of work. As a result, fast, cheap, incremental approaches to developing an EIS increase the chance of success.

Methodology

Implementation of an effective EIS requires clear consensus on the objectives and measures to be monitored in the system, and a plan for obtaining the data on which those measures are based. The sections below outline a methodology for achieving these two results. As noted earlier, successful EIS implementations generally begin with a simple prototype rather than a detailed planning process; for that reason, the proposed planning methodologies are as simple and scope-limited as possible.


EIS Project Team

The process of establishing organizational objectives and measures is intimately linked with the task of locating relevant data in existing computer systems to support those measures. Objectives must be specific and measurable, and data availability is critical to measuring progress against objectives.

Since there is little use in defining measures for which data is not available, it is recommended that an EIS project team including technical staff be established at the outset. This cross-functional team can provide early warning if data is not available to support objectives, or if senior managers' expectations for the system are impractical.

A preliminary EIS project team might consist of as few as three people. An EIS Project Leader organizes and directs the project. An Executive Sponsor promotes the project in the organization, contributes senior management requirements on behalf of the senior management team, and reviews project progress regularly. A Technical Leader participates in requirements gathering, reviews plans, and ensures the technical feasibility of all proposals during EIS definition.

As the focus of the project becomes more technical, the EIS project team may be complemented by additional technical staff, who will be directly involved in extracting data from legacy systems and constructing the EIS data repository and user interface.

Establishing Measures & EIS Requirements

Most organizations have a number of high-level objectives and direction statements that help to shape organizational behaviour and priorities. In many cases, however, these direction statements have not yet been linked to performance measures and targets. As well, senior managers may have other critical information requirements that would not be reflected in a simple analysis of existing direction statements. It is therefore essential that EIS requirements be derived directly from interaction with the senior managers who will use the system, and that practical measures of progress towards organizational objectives be established during these interactions.


Measures and EIS requirements are best established through a three-stage process. First, the EIS team solicits the input of the most senior executives in the organization in order to establish a broad, top-down perspective on EIS requirements. Second, interviews are conducted with the managers who will be most directly involved in the collection, analysis, and monitoring of data in the system, to assess bottom-up requirements. Third, a summary of results and recommendations is presented to senior executives and operational managers in a workshop where final decisions are made.

Interview Format

The focus of the interviews would be to establish all of the measures managers require in the EIS. Questions would include the following:

What are the five most important pieces of information you need to do your job?

What expectations does the Board of Directors have for you?

What results do you think the general public expects you to accomplish?

On what basis would consumers and customers judge your effectiveness?

What expectations do other stakeholders impose on you?

What is it that you have to accomplish in your current position?

Senior Management Workshop
