ISM3610

Decision Support and Intelligence System

Data Warehousing

By Group B

Chan Chi Leung (03012034)

Chan Wing Sze (03012077)

Cheung Helios Su Ho (03012107)

Fong Yau Shing (03000923)

Kong Kevin Tsz Wang (03012239)

Lau Ka Wing (03012255)

Pong Shuk Ting (04001737)

Wong Chi Ho (03012468)

19th April, 2007

Table of Contents

1. Introduction

1.1 What is data warehouse?

1.2 Construction

1.3 Data Acquisition and Collection

1.4 Metadata

1.5 Data Marts

1.6 Trustworthiness and Security

2. Characteristics of a Data Warehouse

2.1 Subject Oriented

2.2 Integrated

2.3 Time Variant

2.4 Non-Volatile

3. Data Warehouse Architecture

3.1 Operational Database / External Database Layer

3.2 Information Access Layer

3.3 Data Access Layer

3.4 Data Warehouse (Physical) Layer

3.5 Application Messaging Layer

3.6 Process Management Layer

3.7 Data Directory (Metadata) Layer

3.8 Data Staging Layer

4. Examples on Data Warehousing Vendors

4.1 IBM

4.1.1 Introduction to IBM

4.1.2 Features of DB2 Data Warehouse Edition

4.1.3 Advantages

4.1.4 Disadvantages

4.1.5 Application of DB2 DWE in Copenhagen’s TDC

4.2 Oracle

4.2.1 Introduction to Oracle

4.2.2 Features of Oracle Data Warehousing

4.2.3 Advantages

4.2.4 Disadvantages

4.2.5 Application of Oracle Datawarehousing in Absa Group Limited

4.3 SAS

4.3.1 Introduction to SAS

4.3.2 Features of SAS Warehousing Administrator

4.3.3 Advantages

4.3.4 Disadvantages

4.3.5 Application of SAS Data Warehousing in the HK Trade Development Council

5. How to implement Data Warehouse successfully

5.1 “If you Built It, They Will Come”

5.2 Omission of an Architectural Framework

5.3 Understanding the Importance of Documenting Assumptions

5.4 Failure to Use the Right Tool for the Job

5.5 Life Cycle Abuse

5.6 Ignorance Concerning the Resolution of Data Conflicts

5.7 Failure to Learn from Mistakes

6. Concerns & Conclusion

7. References

1. Introduction

Since the early 1990s, data warehouses have been at the forefront of

information technology applications as a way for organizations to

effectively use digital information for business planning and decision

making. As information professionals, we no doubt will encounter the data

warehouse phenomenon if we have not already been exposed to it in our

work. Hence, an understanding of data warehouse system architecture is

or will be important in our roles and responsibilities in information

management.

1.1 What is data warehouse?

Simply put, a data warehouse can be thought of as a place for secondhand data. That data originates either in other corporate applications, such as the one our company uses to resolve printer problems reported by customers and by our front-line and second-line support staff, or in a data source external to the company, such as a public database that contains customer support information gathered from our competitors.

Technically, a data warehouse is the coordinated, architected, and

periodic copying of data from various sources, both inside and outside the

enterprise, into an environment optimized for analytical and informational

processing. The key here is that the data is copied (duplicated) in a

controlled manner and is copied periodically (batch-oriented processing).

Data warehousing is also, therefore, the process of creating an

architected information-management solution to enable analytical and

informational processing despite platform, application, organizational, and

other barriers. The key concept here is that barriers are being broken and

distributed information is being consolidated for analysis, although no

preconceived notion exists for the exact means of doing so, such as

duplicating data.

Large companies use software packages that gather and store data in special configurations called data warehouses. Because a data warehouse is an integrated collection of data, it can support management analysis and decision making. For example, in a typical company, data is generated by transaction-based systems such as order entry, inventory, accounts receivable, and payroll. If a user wants to know the customer number on a particular sales order, they can retrieve it easily from the order entry application.

On the other hand, suppose that a user wants to see May sales results for the sales representative assigned to a specific customer, as shown in Figure 1 for a typical data warehouse.

Figure 1 – Typical Data Warehouse

Although the information systems are interactive, it is difficult for a user to

extract specific data that spans several systems and time frames; the

average user might need assistance from the IT staff.

What is nice about a data warehouse is that, rather than requiring users to access separate systems, it stores transaction data in a format that allows users to access, combine, and analyze the data. This should also help in taming and controlling data volume. A data warehouse allows users to specify certain dimensions, or characteristics. In a consumer products data warehouse, dimensions might include time, customer, and sales representative. By selecting values for each characteristic, a user can obtain multidimensional information from the stored data.
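As a minimal sketch of selecting values for each dimension, the following uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table name, column names and figures are all hypothetical and chosen only to mirror the May sales example; a real warehouse schema would differ.

```python
import sqlite3

# Hypothetical fact table keyed by three dimensions
# (time, customer, sales representative). All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (
        sale_month TEXT,   -- time dimension, e.g. '2007-05'
        customer   TEXT,   -- customer dimension
        sales_rep  TEXT,   -- sales representative dimension
        amount     REAL
    );
    INSERT INTO sales_fact VALUES
        ('2007-04', 'Acme',   'Lee',  1200.0),
        ('2007-05', 'Acme',   'Lee',   800.0),
        ('2007-05', 'Acme',   'Lee',   450.0),
        ('2007-05', 'Globex', 'Wong',  990.0);
""")

def sales_for(month, customer):
    """Return (sales_rep, total) rows for one month and one customer."""
    return conn.execute(
        "SELECT sales_rep, SUM(amount) FROM sales_fact "
        "WHERE sale_month = ? AND customer = ? GROUP BY sales_rep",
        (month, customer),
    ).fetchall()

may_acme = sales_for("2007-05", "Acme")
print(may_acme)  # [('Lee', 1250.0)]
```

Because all three dimensions live in one place, the question is answered with a single query instead of a manual combination of several operational systems.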

Data warehousing is also a collection of decision support technologies aimed at enabling the knowledge worker, who could be an executive, manager, or analyst, to make better and faster decisions.

1.2 Construction

The steps in planning a data warehouse are identical to the steps for any

other type of computer application. Users must be involved to determine

the scope of the warehouse and what business requirements need to be

met. After selecting a focus area, for example, analyzing the use of state

government records over time, a data warehouse team of business users

and information professionals compiles a list of different types of data that

should go into the warehouse. After business requirements have been

gathered and validated, data elements are organized into a conceptual

data model. The conceptual model is used as a blueprint to develop a

physical database design. As in all systems design projects, there are a

number of iterations, prototypes, and technical decisions that need to be

made between the steps of systems analysis, design, development,

implementation, and support.

1.3 Data Acquisition and Collection

The data warehouse team must determine what data should go into the warehouse and where those particular pieces of information can be found. Some of the data will be internal to the organization; other data can be obtained from an external source. Another team of analysts and programmers creates extraction programs to collect data from the various databases, files, and legacy systems that have been identified, copying selected data to a staging area outside the warehouse. At this point, they check the data for errors and then copy it into the data warehouse. This source data extraction, selection, and transformation process is unique to data warehousing. Source data analysis and the efficient, accurate movement of source data into the warehouse environment are critical to the success of a data warehouse project.
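The extract, stage, check and load sequence described above can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular tool: the source rows, the validation rules and the `warehouse_orders` table are all hypothetical.

```python
import sqlite3

# Hypothetical source rows, as an extraction program might collect them
# from an order-entry system. Two rows are deliberately invalid.
source_rows = [
    {"order_id": 1, "customer": "Acme", "amount": "120.50"},
    {"order_id": 2, "customer": "", "amount": "75.00"},      # missing customer
    {"order_id": 3, "customer": "Globex", "amount": "abc"},  # bad amount
    {"order_id": 4, "customer": "Globex", "amount": "310.00"},
]

def is_valid(row):
    """Simple checks applied while the row sits in the staging area."""
    if not row["customer"]:
        return False
    try:
        float(row["amount"])
    except ValueError:
        return False
    return True

def load(rows, conn):
    """Stage rows, keep only the valid ones, and copy them to the warehouse."""
    staging = list(rows)  # staging area outside the warehouse
    clean = [r for r in staging if is_valid(r)]
    rejected = [r for r in staging if not is_valid(r)]
    conn.executemany(
        "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
        [(r["order_id"], r["customer"], float(r["amount"])) for r in clean],
    )
    return len(clean), len(rejected)

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse_orders (order_id INTEGER, customer TEXT, amount REAL)"
)
loaded, rejected = load(source_rows, warehouse)
print(loaded, rejected)  # 2 2
```

Keeping the rejected rows countable, rather than silently dropping them, is one way to make the quality of each load visible to the warehouse team.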

1.4 Metadata

Good metadata is essential to the effective operation of a data warehouse

and it is used in data acquisition/collection, data transformation, and data

access. Acquisition metadata maps the translation of information from the

operational system to the analytical system. This includes an extract

history describing data origins, updates, algorithms used to summarize

data, and frequency of extractions from operational systems.

Transformation metadata includes a history of data transformations,

changes in names, and other physical characteristics. Access metadata

provides navigation and graphical user interfaces that allow non-technical

business users to interact intuitively with the contents of the warehouse.

On top of these three types of metadata, a warehouse needs basic operational metadata, such as procedures for how the data warehouse is used and accessed, procedures for monitoring the growth of the data warehouse relative to the available storage space, and authorizations that record who is responsible for, and who has access to, the data in the data warehouse and in the operational systems.

1.5 Data Marts

Data in a data warehouse should be reasonably current, but not

necessarily up to the minute, although developments in the data

warehouse industry have made frequent and incremental data dumps

more feasible. Data marts are smaller than data warehouses and

generally contain information from a single department of a business or

organization. The current trend in data warehousing is to develop a data

warehouse with several smaller related data marts for specific kinds of

queries and reports.

1.6 Trustworthiness and Security

As with any information system, trustworthiness of data is determined by

the trustworthiness of the hardware, software, and the procedures that

created them. The reliability and authenticity of the data and information

extracted from the warehouse will be a function of the reliability and

authenticity of the warehouse and the various source systems that it

encompasses. In data warehouse environments specifically, there needs

to be a means to ensure the integrity of data first by having procedures to

control the movement of data to the warehouse from operational systems

and second by having controls to protect warehouse data from

unauthorized changes. Data warehouse trustworthiness and security are

contingent upon acquisition, transformation and access metadata and

systems documentation.

2. Characteristics of a Data Warehouse

This part focuses on the fundamental characteristics of a data warehouse. Bill Inmon, recognized as the “father of data warehousing”, defined a data warehouse as a database containing Subject Oriented, Integrated, Time Variant and Non-volatile information used to support the decision-making process (Martyn R Jones, 1999). The following sections explain these four fundamental characteristics.

2.1 Subject Oriented

Operational databases, such as order processing and payroll databases,

are organized around business processes or functional areas. These

databases grew out of the applications they served. Thus, the data was

relative to the order processing application or the payroll application. Data

on a particular subject, such as products or employees, was maintained

separately (and usually inconsistently) in a number of different databases.

In contrast, a data warehouse is organized around subjects. This subject

orientation presents the data in a much easier-to-understand format for

end users and non-IT business analysts.

2.2 Integrated

Integration of data within a warehouse is accomplished by making the

data consistent in format, naming, and other aspects. Operational

databases, for historic reasons, often have major inconsistencies in data

representations. For example, a set of operational databases may

represent "male" and "female" by using codes such as "m" and "f", by "1"

and "2", or by "b" and "g". Often, the inconsistencies are more complex

and subtle. In a data warehouse, on the other hand, data is always

maintained in a consistent fashion.
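A minimal sketch of this kind of integration is a per-source lookup that maps each system's codes to one warehouse representation. The source system names and the "unknown" fallback are assumptions made for illustration only.

```python
# Hypothetical per-source code tables; the source systems and their names
# are invented for illustration.
GENDER_CODES = {
    "payroll":   {"m": "male", "f": "female"},
    "orders":    {"1": "male", "2": "female"},
    "marketing": {"b": "male", "g": "female"},
}

def normalize_gender(source, raw):
    """Map a source-specific code to the warehouse's single representation."""
    try:
        return GENDER_CODES[source][str(raw).strip().lower()]
    except KeyError:
        return "unknown"  # assumption: unmapped codes are flagged, not guessed

records = [("payroll", "M"), ("orders", 2), ("marketing", "g"), ("orders", "x")]
normalized = [normalize_gender(s, v) for s, v in records]
print(normalized)  # ['male', 'female', 'female', 'unknown']
```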

2.3 Time Variant

Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values, and they generally keep this information for no more than a year (often much less). Data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly and is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.

Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible, and in most cases certainly not cost effective, to determine with an operational database.
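To illustrate why retained history matters for a question like the coupon example, the sketch below keeps one row per month and compares average sales in months with and without a promotion. The table, columns and numbers are invented for illustration; they are not data from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (
        month        TEXT,     -- load period
        coupon_promo INTEGER,  -- 1 if a coupon promotion ran that month
        units_sold   INTEGER
    );
    INSERT INTO monthly_sales VALUES
        ('2006-01', 0, 1000), ('2006-02', 1, 1400),
        ('2006-03', 0, 1100), ('2006-04', 1, 1600),
        ('2006-05', 0,  900), ('2006-06', 1, 1500);
""")

# Average units sold, grouped by whether a promotion ran.
rows = conn.execute(
    "SELECT coupon_promo, AVG(units_sold) FROM monthly_sales "
    "GROUP BY coupon_promo ORDER BY coupon_promo"
).fetchall()
avg_by_promo = dict(rows)
print(avg_by_promo)  # {0: 1000.0, 1: 1500.0}
```

An operational database that holds only current values would have nothing to group over, which is the point the paragraph above makes.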

2.4 Non-Volatile

Non-volatility, the final primary aspect of data warehouses, means that

after the data warehouse is loaded there are no changes, inserts, or

deletes performed against the informational database. The data

warehouse is, of course, first loaded with transformed data that originated

in the operational databases.

The data warehouse is subsequently reloaded or, more likely, appended

on a periodic basis (usually nightly, weekly, or monthly) with new

transformed data from the operational databases. Outside of this loading

process, the data warehouse generally stays static. Due to non-volatility,

the data warehouse can be heavily optimized for query processing.
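One way to picture non-volatility is an append-only table: the periodic load adds rows, and any later update or delete is rejected. The sketch below enforces this with SQLite triggers purely for illustration; the use of triggers, and all table and trigger names, are assumptions rather than how any vendor's warehouse implements the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_history (load_date TEXT, product TEXT, units INTEGER);

    -- Reject changes and deletes; only the periodic load may add rows.
    CREATE TRIGGER no_update BEFORE UPDATE ON sales_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;

    CREATE TRIGGER no_delete BEFORE DELETE ON sales_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
""")

def append_load(load_date, rows):
    """Append one periodic batch of transformed rows."""
    conn.executemany(
        "INSERT INTO sales_history VALUES (?, ?, ?)",
        [(load_date, product, units) for product, units in rows],
    )

append_load("2007-04-01", [("cola", 120), ("lemonade", 80)])
append_load("2007-04-02", [("cola", 135)])

try:
    conn.execute("UPDATE sales_history SET units = 0 WHERE product = 'cola'")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(update_blocked, row_count)  # True 3
```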

3. Data Warehouse Architecture

A Data Warehouse Architecture (DWA) is a way of representing the

overall structure of data, communication, processing and presentation that

exists for end-user computing within the enterprise. The architecture is

made up of several components:

Operational Database / External Database Layer

Information Access Layer

Data Access Layer

Data Warehouse Layer

Application Messaging Layer

Process Management Layer

Data Directory (Metadata) Layer

Data Staging Layer

The figure below shows how the different layers are interconnected.

Figure 2 – Data Warehouse Architecture

3.1 Operational Database / External Database Layer

Operational systems process data to support critical business operational needs. Operational databases have been created to provide an efficient processing structure for a relatively small number of well-defined business transactions. However, because operational systems are implemented for a narrow purpose, the databases designed to support them make it difficult to access the data for other management or informational purposes. This difficulty is amplified by the fact that many operational systems are very old, which means that the data access technology available to obtain operational data is itself dated.

The goal of data warehousing is to free the information that is locked up in

the operational databases and to mix it with information from other

external sources of data. Nowadays, many large organizations are

acquiring additional data from outside databases. This information

includes demographic, economic, competitive and purchasing trends. The

so-called "information superhighway" is providing access to more data

resources every day.

3.2 Information Access Layer

The Information Access layer of the Data Warehouse Architecture is the

layer that the end-user deals with directly. In particular, it represents the

tools that the end-user normally uses day to day, e.g., Excel, Lotus 1-2-3,

Access, SAS, etc. This layer also includes the hardware and software

involved in displaying and printing reports, spreadsheets, graphs and

charts for analysis and presentation. Over the past two decades, the

Information Access layer has expanded enormously, especially as end-users

have moved to PCs and PC/LANs.

Today, more and more sophisticated tools exist on the desktop PC for

manipulating, analyzing and presenting data; however, there are

significant problems in making the raw data contained in operational

systems available easily to end-user tools. One of the key problems is to

find a common data language that can be used throughout the enterprise.

3.3 Data Access Layer

The Data Access Layer allows the Information Access Layer to communicate with the Operational Layer. The common data language that has emerged today is SQL. SQL was originally developed by IBM as a query language, but over the last twenty years it has become the standard for data interchange.

One of the key breakthroughs of the last few years has been the

development of a series of data access "filters" such as Enterprise Data

Access (EDA)/SQL that make it possible for SQL to access nearly all

DBMSs and data file systems, relational or non-relational. These filters

make it possible for Information Access tools to access data stored on

database management systems that are even twenty years old.

The Data Access Layer not only spans different DBMSs and file systems

on the same hardware, it spans manufacturers and network protocols as

well. One of the keys to a Data Warehousing strategy is to provide end-users

with "universal data access". Universal data access means that,

theoretically, end-users, regardless of location or Information Access tool,

should be able to access any or all of the data in the enterprise that is

necessary for them.

In some cases, this is all that certain end-users need. However, in

general, organizations are developing a much more sophisticated scheme

to support Data Warehousing.

3.4 Data Warehouse (Physical) Layer

The core Data Warehouse is where the actual data used for informational purposes resides. In some cases, one can think of the Data Warehouse simply as a logical or virtual view of data. In many instances, the data warehouse may not actually involve storing data.

In a Physical Data Warehouse, copies, in some cases many copies, of operational and/or external data are actually stored in a form that is easy to access and highly flexible. Increasingly, Data Warehouses are stored on client/server platforms, but they are often stored on mainframes as well.

3.5 Application Messaging Layer

The Application Messaging Layer transports information around the enterprise computing network. Application Messaging is also referred to as "middleware", but it can involve more than just networking protocols. For example, Application Messaging can be used to isolate applications, operational or informational, from the exact data format. It can also be used to collect transactions or messages and deliver them to a certain location at a certain time. Application Messaging is the transport system underlying the Data Warehouse.

3.6 Process Management Layer

The Process Management Layer schedules the various tasks that must be completed to build and maintain the data warehouse and the data directory information. It can be regarded as the scheduler, or high-level job controller, for the many processes that must run to keep the Data Warehouse up to date.
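A high-level job controller of this kind has to run tasks in an order that respects their dependencies. The sketch below derives such an order with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The task names are hypothetical, and a real scheduler would also handle timing, retries and failures.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical warehouse maintenance tasks mapped to the tasks they depend on.
dependencies = {
    "extract_orders":   set(),
    "extract_payroll":  set(),
    "stage_and_clean":  {"extract_orders", "extract_payroll"},
    "load_warehouse":   {"stage_and_clean"},
    "refresh_metadata": {"load_warehouse"},
}

def plan(deps):
    """Return a run order in which every task follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())

run_order = plan(dependencies)
print(run_order)
```

The two extract tasks have no dependency on each other, so their relative order in the result is not significant; only the constraint that staging follows both extracts, and loading follows staging, is guaranteed.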

3.7 Data Directory (Metadata) Layer

In order to provide for universal data access, it is necessary to maintain some form of data directory or repository of meta-data information. Meta-data is data about the data within the enterprise. Record descriptions in a COBOL program are meta-data. So are DIMENSION statements in a FORTRAN program, or SQL CREATE statements.

In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available, including data about the end-user views of data and data about the operational databases. Ideally, end-users should be able to access data from the data warehouse without having to know where that data resides or the form in which it is stored.
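As a small illustration of SQL CREATE statements acting as meta-data, the sketch below reads column descriptions back from a database's own catalog. It uses SQLite's `PRAGMA table_info` and `sqlite_master` through Python's sqlite3 module; the `customer` table is hypothetical, and other database systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)

def describe(table):
    """Return (column name, declared type) pairs from the database catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

columns = describe("customer")

# The stored CREATE statement is itself kept by SQLite as meta-data.
create_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]

print(columns)
print(create_sql)
```

A data directory built on information like this lets a tool present columns to end-users without them knowing where or how the table is stored.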

3.8 Data Staging Layer

The final component of the Data Warehouse Architecture is Data Staging, also called copy management or replication management. It includes all the processes necessary to select, edit, summarize, combine and load data warehouse and information access data from operational and/or external databases.

Data Staging often involves complex programming, but increasingly data

warehousing tools provide help in this process. Data Staging may also

involve data quality analysis programs and filters that identify patterns and

data structures within existing operational data.
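A simple form of the data quality analysis mentioned here is pattern profiling: each value is reduced to its shape, and the shapes are counted so that unusual formats stand out. The sketch below is one possible approach under that assumption; the sample customer numbers are invented.

```python
from collections import Counter

def pattern(value):
    """Reduce a value to its shape: letters -> 'A', digits -> '9'."""
    out = []
    for ch in str(value):
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # keep punctuation as-is
    return "".join(out)

def profile(values):
    """Count how often each shape occurs in a column of source data."""
    return Counter(pattern(v) for v in values)

# Hypothetical customer numbers from an operational system.
customer_numbers = ["C-1001", "C-1002", "c1003", "C-1004", "10-05"]
shapes = profile(customer_numbers)
print(shapes.most_common())
```

Here the dominant shape is `A-9999`, so the two values with other shapes are candidates for review before loading.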

4. Examples on Data Warehousing Vendors

IBM, Oracle, and SAS are well-known software vendors, and the data warehouse technology they provide is widely used across different industries. Therefore, we chose data warehouse examples from these companies.

4.1 IBM

4.1.1 Introduction to IBM

IBM is aligned around a single, focused business model: innovation. It

takes its breadth and depth of insight on issues, processes and operations

across a variety of industries, and invents and applies technology to help

solve its clients' most intractable business and competitive problems. It

offers several data warehouse products to deliver dynamic warehousing. One of them is the DB2 Data Warehouse Edition (DB2 DWE).

4.1.2 Features of DB2 Data Warehouse Edition

DB2 DWE integrates and simplifies the data warehouse environment, delivering the capabilities needed to consolidate, manage, deliver and analyze business information. It is optimized for reporting and analysis, and data is summarized and stored in a dimension-based model. It can give people a good high-level understanding of what it takes to implement a successful data warehouse project in their business. It represents the IBM offering for implementing integrated Business Intelligence solutions, reducing the cost and time of data analysis for the business.

Figure 3 – Platform of IBM DB2 Warehouse

4.1.2.1 Powerful DB2 data server foundation

The IBM DB2 platform is the foundation for the DB2 Warehouse solution.

With its massively scalable, shared-nothing distributed architecture, DB2 9

provides high performance for mixed workload query processing against

both relational and native XML data. Advanced features such as data

partitioning, new row compression, multidimensional clustering and

materialized query tables (MQTs) make DB2 a powerful engine for

dynamic warehousing.

4.1.2.2 Do it right

DB2 DWE captures new opportunities with a highly flexible, scalable data

warehousing framework, and combines common design tools, advanced

compression technology, inline analytics and pre-built mining capabilities.

4.1.2.3 Do it smarter

You can increase the return on your data warehouse investment by choosing a high-performance, open-standards-based solution that can be rapidly implemented with reduced risk to your business.

4.1.2.4 Modeling and design tool

It provides the core components to graphically model data structures, move and transform data within the data warehouse, implement online analytical processing (OLAP), build and score data mining models, and develop embedded analytic application components.

4.1.3 Advantages

DB2 DWE is a comprehensive and integrated solution for enterprise data warehouse development. It provides tools that help data warehouse administrators design, deploy and maintain an enterprise data warehouse. DB2 DWE’s multidimensional database provides OLAP (online analytical processing), which allows users to view data in the system dynamically from different points of view. Users can generate statistics by specifying their own requirements in DB2 DWE. In addition, DB2 DWE provides advanced data compression, which lowers the cost of storing large volumes of data. Benchmarks report that DB2 DWE can save 45-69 percent of disk space. Compression also reduces the read/write frequency of the storage devices, so queries run more efficiently than on uncompressed data.

4.1.4 Disadvantages

The disadvantages of using DB2 DWE are its high cost and high system requirements. A license costs about US$1,000 per year. The hardware requirements are high because of the data compression scheme: more processing power is needed for both compression and decompression whenever data is accessed. Typical personal computers cannot meet the requirements, so a powerful server is needed, and the additional hardware raises the cost of using DB2 DWE.

4.1.5 Application of DB2 DWE in Copenhagen’s TDC

Here is a real case of a company benefiting from DB2 DWE. Copenhagen’s TDC, Denmark’s leading telecommunications company, can testify to the ongoing love affair of people with their telephones. Danish customers make so many calls that each month TDC has to deal with 1.5 terabytes of new raw data. The company’s information technology (IT) team realized that its existing system was rapidly running out of storage capacity. After upgrading to DB2 DWE, TDC found that productivity increased, because the system offers higher levels of performance and additional applications to internal users. Customer service improved, because TDC can offer the most economical service plan based on usage. Marketing was also enhanced through better customer targeting and campaign tracking. Before the new system, TDC’s batch window hardly afforded enough time for the required data to be processed by the next morning. In the past, if the team lost even one day of productivity, it would take as much as a week to catch up. Now, TDC has no problems receiving, loading and processing data by morning.

For more information about the case, please refer to the following URL:

http://www-306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?OpenDocument&Site=dmbi&cty=en_us

Some screenshots of DB2 DWE are shown below. The following figure shows how DB2 DWE works: it can generate reports according to the user’s needs.

Figure 4 – Example on Reports Generation on user’s needs (DB2 DWE)

If the user wants a more detailed report, one can be shown with a very simple operation.

Figure 5 – Example on Details Report (DB2 DWE)

DB2 DWE provides an interactive platform between the system developers and the users. When the developers make changes to the system, a message prompts users to tell them what has been changed.

Figure 6 – Example on Prompt Message (DB2 DWE)

Users can also set criteria on the reports as needed. Data entries that conflict with the criteria are highlighted, which makes it easier for users to notice the characteristics of the data.

4.2 Oracle

4.2.1 Introduction to Oracle

Oracle Database has the ideal technology for the data warehouse

because the software’s open interfaces offered easy integration with

multiple systems. This was important as the company wanted to import

data from existing applications into the data warehouse. It also

accommodated large amounts of detail, down to individual flights in

specific segments.

Furthermore, the Oracle solution offered a flexible structure for reporting,

enabling customers to design customized reports and allowing staff to

undertake multi-dimensional analysis. Oracle is also highly scalable,

ensuring it can cater for future growth.

Research from Winter Corporation and Oracle stated that the size of data warehouses has tripled every two years since 2001. Many reasons lie behind this significant growth, and many of them can be explained by industry trends in data warehousing.

Oracle stated that the first reason for this phenomenon is the development of real-time business (Oracle.com, 2006). Organizations strive to react to market changes as quickly as possible to gain market advantage. Data latency has to be reduced in order to achieve a real-time business model and, as a result, data size gradually increases. Enterprises also tend to keep detailed logs of enterprise data, because regulations such as Basel II (the International Convergence of Capital Measurement and Capital Standards) require organizations to capture and retain detailed transaction histories. Moreover, new types of storage-intensive information, such as RFID data, can create new business opportunities, which is another reason for the huge size of data warehouses.

Besides data volume, data warehousing is growing in other dimensions. Traditionally, data warehouses, or databases, were used only for reporting and analysis, but nowadays they often come with a prediction function and are shared with and integrated into applications. Furthermore, access to the data warehouse is no longer limited to users within the enterprise but extends to customers, partners, and suppliers. Together with the increasing complexity of the queries that sophisticated business intelligence applications run for intensive analysis, this means a data warehouse must satisfy many criteria to meet today’s needs.


4.1.7 Features of Oracle Data Warehousing

Oracle, one of the most popular databases for data warehousing, has developed its data warehousing application to fit the above needs. Several key features of the Oracle application increase its capability.

1.20.1.1 Partitioning

Partitioning is the “foundation” for achieving effective performance in large-scale Oracle data warehouses. It means splitting data into separate “chunks”. It can shorten the response time and increase throughput. Several of the features discussed below depend on the partitioning of the data warehouse.
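The idea of splitting data into chunks can be sketched in Python as follows. This is a conceptual illustration, not Oracle's implementation; the `year` partition key and the sample rows are assumptions. A query for one year scans only that year's chunk instead of the whole table.

```python
from collections import defaultdict


def partition_by_year(rows):
    """Split rows into separate chunks keyed by the year of the sale."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["year"]].append(row)
    return partitions


def total_for_year(partitions, year):
    """Scan only the chunk for the requested year, skipping all others."""
    return sum(row["amount"] for row in partitions.get(year, []))


# Hypothetical sales facts.
sales = [
    {"year": 2005, "amount": 120},
    {"year": 2006, "amount": 200},
    {"year": 2006, "amount": 50},
    {"year": 2007, "amount": 75},
]
partitions = partition_by_year(sales)
print(total_for_year(partitions, 2006))
```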

1.20.1.2 Parallel Operation

Parallelism enables scalability, which makes large workloads, large databases, and very large data warehouses (VLDW) possible. This is because, if not all parts of the system function in parallel, any single-threaded path can become a bottleneck for the throughput of the system and so limit its ability to scale.
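The bottleneck argument above is commonly expressed as Amdahl's law, which the source does not name but which captures the same point: the serial part of a workload caps the achievable speedup. The sketch below assumes, purely for illustration, that 90% of the work can run in parallel.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload: 90% parallel, 10% single-threaded.
for workers in (1, 4, 16, 64):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a 10% single-threaded path, the speedup stays below 10 no matter how many workers are added.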

1.20.1.3 Materialized views

Materialized views enable sophisticated data analysis on large data sets. Without them, significant processing would have to be devoted to complicated joins and aggregations in order to produce the complex summaries required. With materialized views, queries can be written against the logically designed tables and views, and the database deals with the physical tables. Materialized views often provide the performance boost necessary to turn a runaway query into a powerful analytical tool.
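The following Python sketch imitates the idea with the standard `sqlite3` module. SQLite has no materialized views, so the example stores an aggregated result once in an ordinary table; the table names and data are hypothetical, and Oracle features such as automatic refresh are not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "A", 100), ("North", "B", 50), ("South", "A", 70), ("South", "A", 30)],
)

summary_sql = (
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
)

# Without a precomputed summary, every report re-runs the aggregation.
on_the_fly = conn.execute(summary_sql).fetchall()

# Emulate a materialized view: store the aggregated result once as a table.
conn.execute(f"CREATE TABLE sales_by_region_mv AS {summary_sql}")
precomputed = conn.execute(
    "SELECT region, total FROM sales_by_region_mv ORDER BY region"
).fetchall()

print(on_the_fly)
print(precomputed)
```

Later reports can read the small summary table instead of scanning and aggregating the detail rows again.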

1.20.1.4 Intelligent Optimization

The intelligent optimizer selects the best strategy and optimizes the order of operations. As a result, queries that rely on indexing, partitioning, and other data access features can run faster.

Figure 7 – Example on query (Oracle Datawarehousing)

1.20.1.5 Table Compression

Compressing data is an attractive way to save costs by decreasing storage requirements. Whereas traditional data compression can degrade query performance, Oracle’s table compression feature is described as eliminating duplicate, or redundant, data values without any negative impact on query performance.
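One generic way to eliminate duplicate values is dictionary encoding, sketched below in Python. This illustrates the general idea only and is not a description of Oracle's internal format; the function names and sample column are assumptions.

```python
def compress_column(values):
    """Store each distinct value once and replace rows with small integer codes."""
    symbols = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        codes.append(index_of[value])
    return symbols, codes


def decompress_column(symbols, codes):
    """Rebuild the original column from the symbol table and the codes."""
    return [symbols[code] for code in codes]


# Hypothetical column with many repeated values.
cities = ["Hong Kong", "London", "Hong Kong", "Hong Kong", "London"]
symbols, codes = compress_column(cities)
print(symbols, codes)
```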

1.20.1.6 Online Analytical Processing

OLAP is deployed to gain better visibility into the business. It helps to understand what’s happening, why it’s happening, and what will happen to the business. Thus, the knowledge and information needed for planning, budgeting, forecasting, sales, and marketing functions can be derived from the existing databases. Oracle’s OLAP product uses a single database platform for all query processing: both SQL and OLAP API queries can be directed to a single data store. Because data need not be transferred to different environments, users benefit from reduced data latency, faster access to more recent data, and lower cost and complexity.
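A core OLAP operation is the roll-up, which summarizes a measure along chosen dimensions. The Python sketch below shows the idea on a tiny in-memory fact list; the dimension names, the `revenue` measure, and the `roll_up` helper are hypothetical and unrelated to Oracle's OLAP API.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"year": 2006, "region": "Asia", "product": "Loans", "revenue": 40},
    {"year": 2006, "region": "Asia", "product": "Cards", "revenue": 25},
    {"year": 2006, "region": "Europe", "product": "Loans", "revenue": 30},
    {"year": 2007, "region": "Asia", "product": "Loans", "revenue": 55},
]

print(roll_up(facts, ["year"], "revenue"))
print(roll_up(facts, ["year", "region"], "revenue"))
```

The same facts answer questions at different levels of detail simply by changing the list of dimensions.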


Figure 8 – Example on OLAPs (Oracle Datawarehousing)

1.20.1.7 Data Mining

Data mining sifts through volumes of data to find hidden patterns. These patterns can yield new business insights that help to attract and retain customers, enhance customer and supplier relationships, identify new sales opportunities, or identify potentially fraudulent behavior.

4.1.8 Advantages

Provides an integrated view of the business by building an enterprise data warehouse.

Supports decision-making and business analysis at all levels of the company.

Improves performance through early detection of market opportunities.

Caters for future growth with a scalable solution.

Oracle is better known, and management often feels more comfortable with a better-known vendor and product.

There are more Oracle DBAs in the job market.

There are more books written on supporting Oracle.

Vendors of data warehouse products will almost always write their products to support Oracle first. In addition, there is usually more experience of using these products with Oracle.

Oracle has a fairly complete suite of products for data warehousing.

A company using Oracle’s ERP products almost always uses Oracle’s RDBMS for its data warehouse.

4.1.9 Disadvantages

It can be extremely expensive to build and maintain. You may need buy-in from senior management to get approval for a data warehouse.

You need large amounts of storage space, potentially one terabyte or more.

Because there is a huge amount of data, it is possible to write queries that seem to run forever and never come back with an answer (the query from the Twilight Zone).

The data is not up to date; in some cases it is 24 hours old or more.

The data is not easily changed. If you spot an error in the data warehouse, you will have to correct it in the source system. If that system cannot be changed, the data warehouse cannot be changed and you will have to live with incorrect data. For example, company "ABC Widgets" could be stored in the database as "A.B.C. Widgets", "AB and C Widgets", or "AB&C". Unless you know about these possible irregularities, you will get incomplete results. You may have a difficult time persuading your company to change its procedures to satisfy the data warehouse.

Because the data comes from different sources, you may not get the same answer from your OLTP system as you do from the data warehouse. It will be difficult, if not impossible, to identify whether any OLTP transactions are missing from the data warehouse.
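The "ABC Widgets" example above can be sketched in Python to show why unknown name variants give incomplete results. The alias table, the order amounts, and the `total_for_company` helper are hypothetical; in practice each variant would have to be discovered and listed by hand.

```python
CANONICAL_NAMES = {
    # Hypothetical alias table mapping known variants to one canonical name.
    "ABC Widgets": "ABC Widgets",
    "A.B.C. Widgets": "ABC Widgets",
    "AB and C Widgets": "ABC Widgets",
    "AB&C": "ABC Widgets",
}


def total_for_company(orders, company, aliases=None):
    """Sum order amounts for a company, optionally resolving known name variants."""
    total = 0
    for name, amount in orders:
        resolved = aliases.get(name, name) if aliases else name
        if resolved == company:
            total += amount
    return total


# Hypothetical order rows as (stored company name, amount).
orders = [
    ("ABC Widgets", 100),
    ("A.B.C. Widgets", 40),
    ("AB and C Widgets", 25),
    ("AB&C", 10),
    ("XYZ Tools", 99),
]

print(total_for_company(orders, "ABC Widgets"))                   # exact match only
print(total_for_company(orders, "ABC Widgets", CANONICAL_NAMES))  # variants resolved
```

The exact-match query silently misses the three variant spellings, which is the incomplete result the text warns about.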


4.1.10 Application of Oracle Datawarehousing in Absa Group Limited

Absa Group Limited, one of South Africa’s largest financial services organizations, offers a complete range of products and services. Absa has assets of R372 billion (US$62 billion), 686 staffed outlets, 5,468 ATMs and South Africa’s largest internet banking customer base. In 2005, Barclays took a majority stake to help Absa become the financial services leader in South Africa and ultimately the pre-eminent bank on the African continent.

1.20.1.8 Challenges

Improve Absa’s business responsiveness by consolidating its fragmented business intelligence environment, which required compiling 1,200 reports and 31 business intelligence projects

Align business intelligence to corporate strategy through a standardized methodology, architecture, tools and measurement

Deliver reports required by all business units

Cut costs of delivering and printing manually generated reports

Replace paper-based reports with electronic intelligence for individual business units to reduce report delivery-to-desk time

1.20.1.9 Solution

Used Oracle Database as the single source of data to make the Enterprise Data Warehouse more efficient

Consolidated the business intelligence environment on Oracle Application Server to reduce duplication of reports

Implemented a business intelligence methodology with OLAP and Oracle Balanced Scorecard tools to support common strategic planning across the group

Aligned business performance measurement to focus on causes of problems and thus ensure better business decision making

Used Oracle Warehouse Builder to create common processes for extracting data and loading into the Data Warehouse

Able to source data from 52 core banking systems, on a daily, weekly or monthly basis, as well as external data sources

Implemented Oracle Discoverer for end-user analysis

Anticipated cost savings from reduced manual reporting and removal of disparate BI projects represent a possible return on investment of more than 300% over five years

1.21 SAS

4.1.11 Introduction to SAS

SAS Institute Inc. has been a major producer of software since it was founded in 1976. SAS was originally an acronym for Statistical Analysis System, but for many years it has been used as an arbitrary trade name to refer to the company as a whole.

The SAS System, originally Statistical Analysis System, is an integrated

system of software products provided by SAS Institute that enables the

programmer to perform:

Data entry, retrieval, management, and mining

Report writing and graphics

Statistical and mathematical analysis


Business planning, forecasting, and decision support

Operations research and project management

Quality improvement

Applications development

Data warehousing (extract, transform, load)

Platform-independent and remote computing

In addition, the SAS System integrates with many SAS business solutions

that enable large scale software solutions for areas such as human

resource management, financial management, business intelligence,

customer relationship management and more.

4.1.12 Features of SAS Warehousing Administrator

SAS/Warehousing Administrator is designed for the IT professional responsible for creating and managing data warehouse and data mart processes. It provides a customizable solution that offers a single point of control, making it easier to respond to the ever-changing needs of the business community. It also simplifies the creation and maintenance of data warehouses.

The main benefit of using SAS/Warehousing Administrator is that it simplifies the setup and management of multiple data warehouses and data marts. The details are as follows:

Integrates extraction, transformation and loading tools for building and managing data warehouses/data marts.

Provides a framework for effective warehouse management through a metadata-driven architecture.

Facilitates business subject definition, consolidation of business rules, scheduling of processes for warehouse maintenance and integration with decision-support tools for effective warehouse exploitation.

Leverages the strengths of SAS software and rapid warehousing to deliver the well-proven benefits of a data warehouse even faster.
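The extraction, transformation and loading steps named in the list above can be sketched generically in Python. This is a minimal illustration using the standard library, not SAS code; the CSV source, the `sales` table, and the cleaning rules are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw source with inconsistent casing and spacing.
SOURCE_CSV = """customer,amount
alice, 10.50
BOB,3
alice,4.5
"""


def extract(text):
    """Read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Standardize customer names and convert amounts to numbers."""
    return [(row["customer"].strip().lower(), float(row["amount"])) for row in rows]


def load(conn, rows):
    """Write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```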

The graphical user interface simplifies the visualization, navigation and maintenance of the data warehouse and eliminates much of the coding work required to build and manage it. Moreover, it offers the adaptability and manageability needed as business and information needs change, as more data is added, as processes become more complex, and as users require greater support.

Figure 9 – Example on user interface (SAS/Warehousing Administrator)


Figure 10 – Example on reports (SAS Data Warehouse)

1.21.1.1 SAS Enterprise Data Integration

Unlike common data warehousing products, SAS provides fully functional software for data capture, storage, integration and analysis across the enterprise. SAS Enterprise Data Integration obtains and manages consistent and trusted data throughout the organization in a flexible and reliable manner.


Graphical user interface provides technicians with an interactive, single point of control for managing data integration processes, including wizards for building and executing data access, transformations and storage process flows.

Connectivity to more data sources on more platforms such as IBM DB2, Oracle DB, Microsoft Access, Sybase, etc.

Data quality embedded into batch, near-time and real-time processes

Metadata is captured and documented throughout transformation and data integration processes

Migrate or synchronize data between database structures, enterprise applications, mainframe legacy files, text, XML and message queues

Join data across these virtual data sources for real-time access and analysis

Business metadata design interface allows data analysts to quickly build a semantic layer

Business rules library of reusable business rules that clean, standardize, match and enhance data as it moves into the master reference file and is reused for downstream processes


Figure 11 – Overview on SAS Data Integration

4.1.13 Advantages

1.21.1.2 High Compatibility

SAS provides access to ERP systems such as Baan, PeopleSoft, and SAP; relational data sources such as DB2, Oracle, Informix, ODBC, MS SQL Server, Sybase, and Teradata; and non-relational sources such as Adabas and PC file formats.

1.21.1.3 Point-and-click interface

The user-friendly interface enables data management specialists to implement the warehousing application without the assistance of programmers or operators.

4.1.14 Disadvantages

1.21.1.4 Unknown Implementation cost

Compared with other vendors, such as Oracle, SAS does not have a clearly stated pricing policy. This makes it difficult for customers to choose between the available products.

1.21.1.5 Unknown difficulty of implementation

Other companies, such as Oracle, offer data warehouse and analytics-specific services that combine technical leadership and expertise with their own technology to provide a complete business intelligence solution. By comparison, SAS does not state how difficult its products are to implement.


4.1.15 Application of SAS Data Warehousing in the HK Trade Development Council

The Hong Kong Trade Development Council launched Business-Stat On-line using Data Warehousing and Web Enablement technology from SAS.

Business-Stat On-line (BSO) is an interactive on-line service allowing

companies to access monthly trade figures compiled by the Census and

Statistics Department. Information available includes Hong Kong’s total

trade figures, overseas trade, and trade according to specific types of

product and service.

The project involved the design and implementation of a Data Warehouse

containing five years of export, domestic export and import data broken

down by a wide range of product and market areas. Other trade service

data was also imported into the system using customized tools provided

by SAS. SAS also developed an extensive number of statistical reports.

Over 6000 pre-summarized general tables were created for on-line

access, designed as a starting point to Hong Kong’s general trade

performance. In addition, users of the service can view an unlimited

number of dynamic reports based on selection criteria such as region,

industry and product type. Registration and administration tools provided

by SAS allow subscribers to register for the BSO service on-line free of

charge. They are then automatically notified by e-mail of their logon ID

and password, allowing them full access to the service.


5. How to implement Data Warehouse successfully

Having discussed the many benefits of applying a data warehouse in an enterprise, how can we implement DW technology successfully in our operational processes? Denis Kozar described the “seven deadly sins” of DW implementation.

1.22 “If You Build It, They Will Come”

Blind faith in DW technology leads to a failure to recognize the importance of defining a set of business objectives for the data warehouse prior to its implementation. A clearly defined data warehouse plan is important to the needs of the entire enterprise, and a documented set of requirements is necessary to guide the design, construction, and rollout of the project.

1.23 Omission of an Architectural Framework

One of the most important factors in a successful data warehouse implementation is the development and maintenance of a comprehensive architectural framework. The framework serves as the blueprint for the construction and use of the various DW components. In the DW architecture, developers need to consider the number of end-users, the volume and diversity of data, the expected data-refresh cycle, and so on.


1.24 Underestimating the Importance of Documenting Assumptions

The assumptions and potential data conflicts associated with the DW must be included in the architectural framework for the project. Several questions need to be considered during the requirements phase of the project to reveal these important underlying assumptions about the DW. How much data will be loaded into the warehouse? How often does the data need to be refreshed? On what platform will the DW be developed? Answers to these questions are essential to the success of the DW implementation.

1.25 Failure to Use the Right Tool for the Job

The design and construction of a DW is much different from that of an

operational application system. The DW tools can be categorized into four

areas:

Analysis Tools – assist in identification of data requirements.

Development Tools – responsible for data cleansing, code generation, data integration, and loading of the data into the data repository.

Implementation Tools – contain data acquisition tools to gather, process, clean, replicate, and consolidate data.

Delivery Tools – assist in data conversion, derivation, and reporting for the application platform.

Correct application of these tools could help to implement the DW

efficiently and effectively.


1.26 Life Cycle Abuse

The life cycle of DW development is a continuous, ongoing set of activities that flow from the initial investigation of DW requirements through data administration and back again. Development of the DW project should continue without interruption if the DW is to remain a viable source of decision-making support in the ever-changing business environment.

1.27 Ignorance Concerning the Resolution of Data Conflicts

Analysis must be conducted to determine the best data sources available

within an organization. Once these systems have been identified, the

conflicts associated with disparate naming conventions, file formats and

sizes, and value ranges must be resolved. This process may involve

working with data owners to establish an understanding with regard to

future planned or unplanned changes to the source data. Failure to allow

sufficient time and resources to resolve data conflicts can delay a

warehouse implementation and result in an organizational deadlock that

can threaten the success of the project.
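Resolving the naming, format, and value-range conflicts described above can be sketched in Python as a small harmonization step. The two source systems, their field names, the cents-to-dollars conversion, and the valid range are all hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical per-source field mappings onto one warehouse schema.
FIELD_MAPS = {
    "billing": {"cust_no": "customer_id", "amt_cents": "amount"},
    "crm": {"CustomerID": "customer_id", "Amount": "amount"},
}
# Hypothetical unit conversions so that every amount ends up in dollars.
CONVERTERS = {
    "billing": {"amount": lambda cents: cents / 100},
    "crm": {},
}


def harmonize(source, record, valid_range=(0, 1_000_000)):
    """Rename fields, convert units, and reject out-of-range amounts."""
    unified = {FIELD_MAPS[source][key]: value for key, value in record.items()}
    for field, convert in CONVERTERS[source].items():
        unified[field] = convert(unified[field])
    low, high = valid_range
    if not low <= unified["amount"] <= high:
        raise ValueError(f"amount out of range in {source}: {unified['amount']}")
    return unified


print(harmonize("billing", {"cust_no": 7, "amt_cents": 1250}))
print(harmonize("crm", {"CustomerID": 7, "Amount": 12.5}))
```

Both source records end up in the same shape, so they can be loaded into one warehouse table; a record outside the agreed value range is rejected rather than loaded silently.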

1.28 Failure to Learn from Mistakes

The ongoing nature of the DW development cycle suggests that DW projects are related to one another. Because of this, careful documentation of the mistakes made in previous projects will directly impact the quality assurance activities of all future projects. By learning from the past, a strong DW with lasting benefits can be built.


If developers can pay attention to the above areas, the implementation of

Data Warehouse will certainly bring great benefits to the business.


6. Concerns & Conclusion

A data warehouse can bring many benefits to enterprises; however, there are concerns about using it.

Extracting, cleaning and loading data is time consuming.

Data warehousing project scope must be actively managed to deliver a release of defined content and value.

Problems with compatibility with systems already in place.

Security could develop into a serious issue, especially if the data warehouse is web accessible.

Data storage design controversy warrants careful consideration and perhaps prototyping of the data warehouse solution for each project's environments.

Managers therefore need to be aware of these concerns when using a data warehouse, so that they can obtain the benefits of data warehousing while avoiding these problems.


7. References

George M. Marakas (©1999), pp. 343-346, Decision support system in the twenty-first century: DSS and data mining technologies for tomorrow’s manager

IBM, Background, http://www-03.ibm.com/press/us/en/background.wss

IBM, DB2 Data Warehouse Edition, “Features and benefits”, http://www-306.ibm.com/software/data/db2/dwe/features.html?S_CMP=rnav

IBM, DB2 Data Warehouse Edition, “Overview”, http://www-306.ibm.com/software/data/db2/dwe/

IBM, “Denmarks’ TDC answers the call of Danish telephone consumers with IBM Data Warehouse”, http://www-306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?OpenDocument&Site=dmbi&cty=en_us

Ken Orr (©1996, revised 2000), Data Warehouse Technology, http://www.kenorrinst.com/dwpaper.html

Manufacturing Business Technology: Software Finder, “Oracle vs SAS”, http://softwarefinder.mbtmag.com/search/for/Oracle-vs-SAS.html

Martyn R Jones (1999), “Brief defining characteristics of a Data Warehouse”, http://www.brint.com/wwwboard/messages/4599.html

Paul Westerman, Data Warehousing: using the Wal-Mart model

SAS, Data Integration, http://www.sas.com/technologies/dw/

Wikipedia, “Bill Inmon”, http://en.wikipedia.org/wiki/Bill_Inmon

Wikipedia, “SAS Institute”, http://en.wikipedia.org/wiki/SAS_Institute

Wikipedia, “SAS System”, http://en.wikipedia.org/wiki/SAS_System
