group2.doc.doc
DESCRIPTION
TRANSCRIPT
ISM3610
Decision Support and Intelligence
System
Data Warehousing
By Group B
Chan Chi Leung (03012034)
Chan Wing Sze (03012077)
Cheung Helios Su Ho (03012107)
1
Fong Yau Shing (03000923)
Kong Kevin Tsz Wang (03012239)
Lau Ka Wing (03012255)
Pong Shuk Ting (04001737)
Wong Chi Ho (03012468)
19th April, 2007
2
Table of Contents
1. Introduction.........................................................................................5
1.1 What is data warehouse?.........................................................5
1.2 Construction.............................................................................8
1.3 Data Acquisition and Collection................................................8
1.4 Metadata..................................................................................9
1.5 Data Marts..............................................................................10
1.6 Trustworthiness and Security.................................................10
2. Characteristics of a Data Warehouse................................................11
2.1 Subject Oriented.....................................................................11
2.2 Integrated...............................................................................11
2.3 Time Variant...........................................................................12
2.4 Non-Volatile............................................................................12
3. Data Warehouse Architecture...........................................................14
3.1 Operational Database / External Database Layer..................15
3.2 Information Access Layer.......................................................15
3.3 Data Access Layer.................................................................16
3.4 Data Warehouse (Physical) Layer..........................................17
3.5 Application Messaging Layer.................................................18
3.6 Process Management Layer..................................................18
3.7 Data Directory (Metadata) Layer............................................18
3.8 Data Staging Layer.................................................................19
4. Examples on Data Warehousing Vendors.........................................20
4.1 IBM.........................................................................................20
3
4.1.1 Introduction to IBM......................................................20
4.1.2 Features of DB2 Data Warehouse Edition...................20
4.1.3 Advantages..................................................................22
4.1.4 Disadvantages.............................................................23
4.1.5 Application of DB2 DWE in Copenhagen’s TDC..........23
4.2 Oracle.....................................................................................27
4.2.1 Introduction to Oracle..................................................27
4.2.2 Features of Oracle Data Warehousing........................28
4.2.3 Advantages..................................................................32
4.2.4 Disadvantages.............................................................33
4.2.5 Application of Oracle Datawarehousing in Absa Group
Limited .....................................................................................34
4.3 SAS........................................................................................35
4.3.1 Introduction to SAS......................................................35
4.3.2 Features of SAS Warehousing Administrator..............36
4.3.3 Advantages..................................................................40
4.3.4 Disadvantages.............................................................40
4.3.5 Application of SAS Data Warehousing in the HK Trade
Development Council.................................................................41
4
5. How to implement Data Warehouse successfully.............................42
5.1 “If you Built It, They Will Come”..............................................42
5.2 Omission of an Architectural Framework................................42
5.3 Understanding the Importance of Documenting Assumptions43
5.4 Failure to Use the Right Tool for the Job................................43
5.5 Life Cycle Abuse.....................................................................44
5.6 Ignorance Concerning the Resolution of Data Conflicts.........44
5.7 Failure to Learn from Mistakes...............................................44
6. Concerns & Conclusion.....................................................................46
7. References........................................................................................47
5
1. Introduction
Since the early 1990s, data warehouses have been at the forefront of
information technology applications as a way for organizations to
effectively use digital information for business planning and decision
making. As information professionals, we no doubt will encounter the data
warehouse phenomenon if we have not already been exposed to it in our
work. Hence, an understanding of data warehouse system architecture is
or will be important in our roles and responsibilities in information
management.
1.1 What is data warehouse?
Simply saying, a data warehouse could be thought of as a place for
secondhand data that originates in either other corporate applications,
such as the one our company uses to solve printer problems that are
reported from customers, and our front and second line support staff, or
some other data source external to our company, such as a public
database that contains customer support information gathered from our
competitors.
Technically, a data warehouse is the coordinated, architected, and
periodic copying of data from various sources, both inside and outside the
enterprise, into an environment optimized for analytical and informational
processing. The key here is that the data is copied (duplicated) in a
controlled manner and is copied periodically (batch-oriented processing).
6
Data warehousing is also, therefore, the process of creating an
architected information-management solution to enable analytical and
informational processing despite platform, application, organizational, and
other barriers. The key concept here is that barriers are being broken and
distributed information is being consolidated for analysis, although no
preconceived notion exists for the exact means of doing so, such as
duplicating data.
As we all know that, large companies use software packages that gather
and store data in special configurations called data warehouses. Since a
data warehouse is an integrated collection of data it can support
management analysis and decision making. For example, in a typical
company, data is generated by transaction-based systems, such as order
entry, inventory, accounts receivable, and payroll. If a user wants to know
the customer number on a particular sales order, they can retrieve the
data easily from the order entry system application.
On the other hand, suppose that a user wants to see May sales results for
the sales representative assigned to a specific customer, as shown in the
figure 1 for a typical data warehouse.
7
Figure 1 – Typical Data Warehouse
Although the information systems are interactive, it is difficult for a user to
extract specific data that spans several systems and time frames; the
average user might need assistance from the IT staff.
What's nice about a data warehouse is that rather then accessing
separate systems, the data warehouse stores transaction data in a format
that allows users to access, combine, and analyze the data. Again, this
should help in taming and controlling data volume. A data warehouse
allows users to specify certain dimensions, or characteristics. In a
consumer products data warehouse, dimensions might include time,
customer, and sales representative. By selecting values for each
8
characteristic, a user can obtain multidimensional information from the
stored data.
Data warehousing is also a collection of decision support technologies,
aimed at enabling the knowledge worker, who could be an executive,
manager, or analyst to make better and faster decisions.
1.2 Construction
The steps in planning a data warehouse are identical to the steps for any
other type of computer application. Users must be involved to determine
the scope of the warehouse and what business requirements need to be
met. After selecting a focus area, for example, analyzing the use of state
government records over time, a data warehouse team of business users
and information professionals compiles a list of different types of data that
should go into the warehouse. After business requirements have been
gathered and validated, data elements are organized into a conceptual
data model. The conceptual model is used as a blueprint to develop a
physical database design. As in all systems design projects, there are a
number of iterations, prototypes, and technical decisions that need to be
made between the steps of systems analysis, design, development,
implementation, and support.
1.3 Data Acquisition and Collection
The data warehouse team must determine what data should go into the
warehouse and where those particular pieces of information can be found.
Some of the data will be internal to an organization. In other cases, it can
9
be obtained from another source. Another team of analysts and
programmers create extraction programs to collect data from the various
databases, files, and legacy systems that have been identified, copying
certain data to a staging area outside of the warehouse. At this point, they
ensure that the data has no errors, and then copy it all into the data
warehouse. This source data extraction, selection, and transformation
process is unique to data warehousing. Source data analysis and the
efficient and accurate movement of source data into the warehouse
environment are critical to the success of a data warehouse project.
1.4 Metadata
Good metadata is essential to the effective operation of a data warehouse
and it is used in data acquisition/collection, data transformation, and data
access. Acquisition metadata maps the translation of information from the
operational system to the analytical system. This includes an extract
history describing data origins, updates, algorithms used to summarize
data, and frequency of extractions from operational systems.
Transformation metadata includes a history of data transformations,
changes in names, and other physical characteristics. Access metadata
provides navigation and graphical user interfaces that allow non-technical
business users to interact intuitively with the contents of the warehouse.
And on top of these three types of metadata, a warehouse needs basic
operational metadata, such as procedures on how a data warehouse is
used and accessed, procedures on monitoring the growth of the data
warehouse relative to the available storage space, and authorizations on
10
who is responsible for and who has access to the data in the data
warehouse and data in the operational system.
1.5 Data Marts
Data in a data warehouse should be reasonably current, but not
necessarily up to the minute, although developments in the data
warehouse industry have made frequent and incremental data dumps
more feasible. Data marts are smaller than data warehouses and
generally contain information from a single department of a business or
organization. The current trend in data warehousing is to develop a data
warehouse with several smaller related data marts for specific kinds of
queries and reports.
1.6 Trustworthiness and Security
As with any information system, trustworthiness of data is determined by
the trustworthiness of the hardware, software, and the procedures that
created them. The reliability and authenticity of the data and information
extracted from the warehouse will be a function of the reliability and
authenticity of the warehouse and the various source systems that it
encompasses. In data warehouse environments specifically, there needs
to be a means to ensure the integrity of data first by having procedures to
control the movement of data to the warehouse from operational systems
and second by having controls to protect warehouse data from
unauthorized changes. Data warehouse trustworthiness and security are
contingent upon acquisition, transformation and access metadata and
systems documentation.
11
2. Characteristics of a Data Warehouse
This part focuses on the fundamental characteristics of a data warehouse.
Bill Inmon, is recognized as “father of data warehousing”, has defined
data warehousing as a database containing Subject Oriented,
Integrated, Time Variant and Non-volatile information used to support
the decision making process (Martyn R Jones,1999). The following will
explain these four fundamental characteristics of data warehouse.
1.7 Subject Oriented
Operational databases, such as order processing and payroll databases,
are organized around business processes or functional areas. These
databases grew out of the applications they served. Thus, the data was
relative to the order processing application or the payroll application. Data
on a particular subject, such as products or employees, was maintained
separately (and usually inconsistently) in a number of different databases.
In contrast, a data warehouse is organized around subjects. This subject
orientation presents the data in a much easier-to-understand format for
end users and non-IT business analysts.
1.8 Integrated
Integration of data within a warehouse is accomplished by making the
data consistent in format, naming, and other aspects. Operational
databases, for historic reasons, often have major inconsistencies in data
representations. For example, a set of operational databases may
12
represent "male" and "female" by using codes such as "m" and "f", by "1"
and "2", or by "b" and "g". Often, the inconsistencies are more complex
and subtle. In a data warehouse, on the other hand, data is always
maintained in a consistent fashion.
1.9 Time Variant
Data warehouses are time variant in the sense that they both maintain
historical and (nearly) current data. Operational databases, in contrast,
contain only the most current, up-to-date data values. Furthermore, they
generally maintain this information for no more than a year (and often
much less). In contrast, data warehouses contain data that is generally
loaded from the operational databases daily, weekly, or monthly which is
then typically maintained for a period of 3 to 10 years. This is a major
difference between the two types of environments.
Historical information is of high importance to decision makers, who often
want to understand trends and relationships between data. For example,
the product manager for a Liquefied Natural Gas soda drink may want to
see the relationship between coupon promotions and sales. This is
information that is almost impossible - and certainly in most cases not cost
effective - to determine with an operational database.
1.10 Non-Volatile
Non-volatility, the final primary aspect of data warehouses, means that
after the data warehouse is loaded there are no changes, inserts, or
deletes performed against the informational database. The data
13
warehouse is, of course, first loaded with transformed data that originated
in the operational databases.
The data warehouse is subsequently reloaded or, more likely, appended
on a periodic basis (usually nightly, weekly, or monthly) with new
transformed data from the operational databases. Outside of this loading
process, the data warehouse generally stays static. Due to non-volatility,
the data warehouse can be heavily optimized for query processing.
14
3. Data Warehouse Architecture
A Data Warehouse Architecture (DWA) is a way of representing the
overall structure of data, communication, processing and presentation that
exists for end-user computing within the enterprise. The architecture is
made up of several components:
Operational Database / External Database Layer
Information Access Layer
Data Access Layer
Data Warehouse Layer
Application Messaging Layer
Process Management Layer
Data Directory (Metadata) Layer
Data Staging Layer
The figure below shows how the different layers are inter-connected
together.
Figure 2 – Data Warehouse Architecture
15
1.11 Operational Database / External Database
Layer
Operational systems process data to support critical business operational
needs. Operational databases have been created to provide an efficient
processing structure for a relatively small number of well-defined business
transactions. However, because of the limited implementation of
operational systems, the databases designed to support operational
systems have difficulty accessing the data for other management or
informational purposes. This difficulty in accessing operational data is
amplified by the fact that many operational systems are often very old in
age. This means that the data access technology available to obtain
operational data itself is dated.
The goal of data warehousing is to free the information that is locked up in
the operational databases and to mix it with information from other
external sources of data. Nowadays, many large organizations are
acquiring additional data from outside databases. This information
includes demographic, economic, competitive and purchasing trends. The
so-called "information superhighway" is providing access to more data
resources every day.
1.12 Information Access Layer
The Information Access layer of the Data Warehouse Architecture is the
layer that the end-user deals with directly. In particular, it represents the
tools that the end-user normally uses day to day, e.g., Excel, Lotus 1-2-3,
16
Access, SAS, etc. This layer also includes the hardware and software
involved in displaying and printing reports, spreadsheets, graphs and
charts for analysis and presentation. Over the past two decades, the
Information Access layer has expanded enormously, especially as end-
users have moved to PCs and PC/LANs.
Today, more and more sophisticated tools exist on the desktop PC for
manipulating, analyzing and presenting data; however, there are
significant problems in making the raw data contained in operational
systems available easily to end-user tools. One of the key problems is to
find a common data language that can be used throughout the enterprise.
1.13 Data Access Layer
The Data Access Layer is involved with allowing the Information Access
Layer to communicate to the Operational Layer. Today the common data
language that has emerged is SQL. Originally, SQL was developed by
IBM as a query language, but over the last twenty years has become the
standard for data interchange.
One of the key breakthroughs of the last few years has been the
development of a series of data access "filters" such as Enterprise Data
Access (EDA)/SQL that make it possible for SQL to access nearly all
DBMSs and data file systems, relational or non-relational. These filters
make it possible for Information Access tools to access data stored on
database management systems that are even twenty years old.
17
The Data Access Layer not only spans different DBMSs and file systems
on the same hardware, it spans manufacturers and network protocols as
well. One of the keys to a Data Warehousing strategy is to provide end-
users with "universal data access". Universal data access means that,
theoretically, end-users, regardless of location or Information Access tool,
should be able to access any or all of the data in the enterprise that is
necessary for them.
In some cases, this is all that certain end-users need. However, in
general, organizations are developing a much more sophisticated scheme
to support Data Warehousing.
1.14 Data Warehouse (Physical) Layer
The core Data Warehouse is where the actual data used for informational
uses occurs. In some cases, one can think of the Data Warehouse simply
as a logical or virtual view of data. In many instances, the data warehouse
may not actually involve storing data.
In a Physical Data Warehouse, copies, in some cases many copies, of
operational and or external data are actually stored in a form that is easy
to access and is highly flexible. Increasingly, Data Warehouses are stored
on client/server platforms, but they are often stored on main frames as
well.
18
1.15 Application Messaging Layer
The Application Message Layer has to do with transporting information
around the enterprise computing network. Application Messaging is also
referred to as "middleware", but it can involve more than just networking
protocols. Application Messaging for example can be used to isolate
applications, operational or informational, from the exact data format.
Application Messaging can also be used to collect transactions or
messages and deliver them to a certain location at a certain time.
Application Messaging is the transport system underlying the Data
Warehouse.
1.16 Process Management Layer
The Process Management Layer is involved in scheduling the various
tasks that must be completed to build and maintain the data warehouse
and data directory information. The Process Management Layer can be
regard as the scheduler or the high-level job controller for the many
processes that must be done to keep the Data Warehouse up-to-date.
1.17 Data Directory (Metadata) Layer
In order to provide for universal data access, it is necessary to maintain
some form of data directory or repository of meta-data information. Meta-
data is the data about data within the enterprise. Record descriptions in a
COBOL program are meta-data. So are DIMENSION statements in a
FORTRAN program, or SQL Create statements.
In order to have a fully functional warehouse, it is necessary to have a
19
variety of meta-data available, data about the end-user views of data and
data about the operational databases. Ideally, end-users should be able to
access data from the data warehouse without having to know where that
data resides or the form in which it is stored.
1.18 Data Staging Layer
The final component of the Data Warehouse Architecture is Data Staging.
Data Staging is also called copy management or replication management.
Actually, it includes all the processes necessary to select, edit,
summarize, combine and load data warehouse and information access
data from operational and/or external databases.
Data Staging often involves complex programming, but increasingly data
warehousing tools provide help in this process. Data Staging may also
involve data quality analysis programs and filters that identify patterns and
data structures within existing operational data.
20
4. Examples on Data Warehousing Vendors
As IBM, Oracle, and SAS are the famous software vendors. Also, their
data warehouse technology that provided by those vendors are widely
common used by different industries. Therefore, we chose data
warehouse examples from these companies.
1.19 IBM
4.1.1 Introduction to IBM
IBM is aligned around a single, focused business model: innovation. It
takes its breadth and depth of insight on issues, processes and operations
across a variety of industries, and invents and applies technology to help
solve its clients' most intractable business and competitive problems. It
provides different types of data warehouses for the users to deliver
dynamic warehousing. One of the data warehouse is the DB2 Data
Warehouse Edition (DB2 DWE).
4.1.2 Features of DB2 Data Warehouse Edition
DB2 DWE integrates and simplifies the data warehouse environment to
deliver all of the capabilities in order to consolidate, manage, deliver and
analyze your business information. It is optimized for reporting and
analysis and data are summarized and stored in a dimension-based
model. It can allow the people get a good high-level understanding of
what it takes to implement a successful data warehouse project in their
business. It represents the IBM offering for implementing integrated
21
Business Intelligence solutions in order to remove cost and time to
facilitate the data analysis for the business.
Figure 3 – Platform of IBM DB2 Warehouse
1.19.1.1 Powerful DB2 data server foundation
The IBM DB2 platform is the foundation for the DB2 Warehouse solution.
With its massively scalable, shared-nothing distributed architecture, DB2 9
provides high performance for mixed workload query processing against
both relational and native XML data. Advanced features such as data
partitioning, new row compression, multidimensional clustering and
materialized query tables (MQTs) make DB2 a powerful engine for
dynamic warehousing.
22
1.19.1.2 Do it right
DB2 DWE captures new opportunities with a highly flexible, scalable data
warehousing framework, and combines common design tools, advanced
compression technology, inline analytics and pre-built mining capabilities.
1.19.1.3 Do it smarter
It can increase the return on your data warehouse investment by choosing
a high-performance, open-standards-based solution that can be rapidly
implemented with reduced risk to your business.
1.19.1.4 Modeling and design tool
It provides the core components to graphically model data structures,
move and transform data within the data warehouse, implement online
analytical process (OLAP), build and score data mining models, and
finally the ability to develop embedded analytic application components.
4.1.3 Advantages
DB2 DWE is a comprehensive and integrated solution to enterprise data
warehouse development. It provides tools to help data warehouse
administrator on designing, deploying and maintaining enterprise data
warehouse. DB2 DWE’s multidimensional database provides OLAP
(Online analytical processing) which allow users to view data in the
system from different point of view dynamically. Users can generate
statistics by specifying their own requirement in DB2 DWE. On the other
23
hand, DB2 DWE provides advanced compression of data which lowers
the cost of storing large volume of data. Benchmark reports that DB2
DWE can save 45-69 percent of disk spaces. The compression of data
also reduces the read/write frequency of the storage devices. Thus, the
efficiency of querying is higher than uncompressed data.
4.1.4 Disadvantages
The disadvantages of using DB DWE are the high cost and the high
system requirement. It costs about US $1,000 for each year license. The
system hardware requirement is high because of the data compression
scheme. Higher processing power is needed for both compression and
decompression on data access. Typical personal computers cannot meet
the requirement. Thus, a powerful server is needed. The cost of using
DB2 DWE rises for additional hardware.
4.1.5 Application of DB2 DWE in Copenhagen’s TDC
Here is a real case of company getting benefits from DB2 DWE.
Copenhagen’s TDC, Denmark’s leading telecommunications company,
can testify to the ongoing love affair of people with their telephones.
Danish customers make so many calls that each month TDC has to deal
with 1.5 terabytes of new raw data. The company’s information technology
24
(IT) team realized that its existing technology system rapidly was running
out of storage capability. After upgrading their system to DB2 DWE, they
found that the productivity is increased by offering higher levels of
performance and additional applications to internal users. Customer
service is improved by offering most economical service plan based on
usage. Also, the marketing is enhanced through better customer targeting
and campaign tracking. TDC found that before the new system, TDC’s
batch window hardly afforded enough time for the required data to be
processed by the next morning. In the past, if the team lost even one day
of productivity, it would take us as much as a week to catch up. Now, TDC
has no problems receiving, loading and processing data by morning.
For more information about the case, please refer to the URL, http://www-
306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?
OpenDocument&Site=dmbi&cty=en_us .
25
Some screen shots of DB2 DWE are shown as the following.
The following figure shows how DB2 DWE works. DB2 DWE can generate
reports on user’s needs.
Figure 4 – Example on Reports Generation on user’s needs (DB2 DWE)
26
If user wants a report of more detailed level, detailed reports can be show
in very simple operation.
Figure 5 – Example on Details Report (DB2 DWE)
DB2 DWE provides an interactive platform between the system
developers and the users. When the developers make some changes to
the system, message is prompted to users to tell them what have been
changed.
27
Figure 6 – Example on Prompt Message (DB2 DWE)
On users’ need, some criteria can be set to the reports. The data entry
which conflicts with the criteria would be highlighted. This makes user
easier to notice the characteristics of the data.
1.20 Oracle
4.1.6 Introduction to Oracle
Oracle Database having the ideal technology for the data warehouse
because the software’s open interfaces offered easy integration with
multiple systems. This was important as the company wanted to import
data from existing applications into the data warehouse. It also
accommodated large amounts of detail, down to individual flights in
specific segments.
Furthermore, the Oracle solution offered a flexible structure for reporting,
enabling customer to design customized reports and allowing staff to
undertake multi-dimensional analysis. Oracle is also highly scalable,
ensuring it can cater for future growth.
A research from Winter Corporation and Oracle stated that the size of a
data warehouse triple itself in every 2 years since 2001. There are many
28
reasons that lead to the significant growth in size of the data warehouse,
which many of them can be explained by the industrial trends in data
warehousing.
Oracle stated that the first reason that can explain this phenomenon is the
development of real-time business (Oracle.com, 2006). Organizations
strive to react to the market changes as quickly as possible to gain market
advantages. Data latency has to be reduced in order to achieve a real-
time business model and as a result, the data size will gradually increase.
Enterprises also tend to have a detailed log of the enterprise data as
some regulatory compliance like Basel II (the International Convergence
of Capital Measurement and Capital Standards) requires organizations to
capture and retain detailed transaction histories. Moreover, new types of
storage-intensive information can create new business opportunities, like
the RFID technology, which is also one of the reasons in resulting with a
huge size of the data warehouses.
Besides data volume, data warehousing is experiencing growth in
different dimensions. Traditionally data warehouses, or databases, were
only used for reporting and analysis, but nowadays they often come with a
prediction function and are shared and integrated to application.
Furthermore, the accessibility of data warehouse is no longer limited to
users within the enterprise, but to other customers, partners, and
suppliers. Together with the increasing complexity of queries for intensive
analysis of sophisticated business intelligence applications, there are
many criteria for a data warehouse to satisfy today’s need.
29
4.1.7 Features of Oracle Data Warehousing
Oracle, one of the most popular databases for data warehousing, has
developed its data warehousing application to fit to the above needs.
There are some key features of the Oracle application that increase its
capability.
1.20.1.1 Partitioning
Partitioning is the “foundation” for achieving effective performance in
large-scale Oracle data warehouses. It means splitting data into separate
“chunks”. It can shorten the response time and increase throughput.
Some other different features that will be discussed below have to
function depending on the partitioning of the data warehouse.
1.20.1.2 Parallel Operation
Parallelism enables scalability, which makes large workloads, large
databases, and very large data warehouses (VLDW) possible. It is
because if not all parts in the system is functioning in parallel, any single-
threaded path can potentially bottleneck the throughput of the system and
as a result limiting its ability to scale.
1.20.1.3 Materialized views
Materialized views can enable sophisticated data analysis on large data
sets. Significant processes would have to be allocated to complicated
joins and aggregations in order to produce the complex summaries
required without materialized views. On the contrary, however, queries
30
can be written against tables and views that they have been logically
designed, and the application will deal with the physical tables.
Materialized views often provide the performance boost necessary to turn
a runaway query into a powerful analytical tool.
1.20.1.4 Intelligent Optimization
The intelligent optimizer selects the best strategy and optimizes the order
of operations. As a result the query performances of indexing, partitioning,
and other data access features can be speeded up.
Figure 7 – Example on query (Oracle Datawarehousing)
1.20.1.5 Table Compression
Data compression to save disk space is an attractive option to save costs
31
by decreasing storage requirements. Whereas traditional data
compression will lead to query performance degradation, Oracle’s table
compression feature eliminates duplicate, or redundant, data values
without any negative impacts on the query performance.
1.20.1.6 Online Analytical Processing
OLAPs are deployed to gain better visibility into the business. It helps to
understand what’s happening, why it’s happening, and what will happen to
the business. Thus, all necessary knowledge and information for planning,
budgeting, forecasting, sales, and marketing functions can be derived
from the existing databases. Oracle’s OLAP product uses a single
database platform for all query processing. Both SQL and OLAP API
queries can be directed to 1 single data store. Without the need of
transferring data to different environments, users can benefit from
reducing data latency, faster access to more recent data, and reducing
low cost and complexity.
32
Figure 8 – Example on OLAPs (Oracle Datawarehousing)
1.20.1.7 Data Mining
Data mining is intended to sift through volumes of data to find hidden
patterns. These patterns can derive new business insights that can attract
and retain customers, enhance customer and supplier relationships,
identify new sales opportunities, or identify potentially fraudulent behavior.
4.1.8 Advantages
Provided an integrated view of the business by building an enterprise
data warehouse
Supported decision-making and business analysis at all levels of the
company
33
Improved performance through early detection of market
opportunities
Catered for future growth with scalable solution
Oracle is better known and management often feels more
comfortable with a better-known vendor and product.
There are more Oracle DBAs in the job market
There are more books written on supporting Oracle
Vendors of data warehouse products will almost always write their
products to support Oracle first. In addition, there is usually more
experience with these products with Oracle
Oracle has a pretty complete suite of products for data warehouse.
A company using Oracle's ERP products almost always use Oracle’s
RDBMS for their data warehouse.
4.1.9 Disadvantages
It can be extremely expensive to build and maintain. You may need to
have buy-in from senior management to get approval for a data
warehouse.
You need large amounts of storage space, potentially one terabyte or
more.
Because there is a huge amount of data, it is possible to write queries
that seem to run forever and never come back with an answer (the
query from the Twilight Zone).
The data is not up-to-date? In some cases, 24 hours or more old.
They are not easily changed. If you spot an error in the data
warehouse, you will have to correct it in the source system. If that
34
system cannot be changed, the data warehouse cannot be changed
and you will have to live with incorrect data. For example, company
"ABC Widgets" could be stored in the database as "A.B.C. Widgets",
"AB and C Widgets", or "AB&C". Unless you know about these
possible irregularities, you will get incomplete results. You may have
a difficult time persuading your company to change their procedures
to satisfy the data warehouse.
Because the data is coming from different sources, you may not be
able to get the same answer from your OLTP system as you do from
the data warehouse. It will be difficult, if not impossible, to identify if
any OLTP transactions are missing from the data warehouse.
35
4.1.10 Application of Oracle Datawarehousing in Absa
Group Limited
Absa Group Limited, one of South Africa’s largest financial services
organizations offers a complete range of products and services. Absa has
assets of R372 billion (US$62 billion), 686 staffed outlets, 5,468 ATMs and
South Africa’s largest internet banking customer base. In 2005, Barclays
took a majority stake, to help Absa become the financial services leader in
South Africa and ultimately the pre-eminent bank on the African continent.
1.20.1.8 Challenges
Improve Absa’s business responsiveness by consolidating its
fragmented business intelligence environment, which required
compiling 1,200 reports and 31 business intelligence projects
Align business intelligence to corporate strategy by standardized
methodology, architecture, tools and measurement
Deliver reports required by all business units
Cut costs of delivering and printing manually generated reports
Replace paper-based reports with electronic intelligence for individual
business units to reduce report delivery-to-desk time
1.20.1.9 Solution
Used Oracle Database as the single source of data to make the
Enterprise Data Warehouse more efficient
Consolidated the business intelligence environment on Oracle
Application Server to reduce duplication of reports
Implemented a business intelligence methodology with OLAP and
36
Oracle Balanced Scorecard tools to support common strategic
planning across the group
Aligned business performance measurement to focus on causes of
problems and thus ensure better business decision making
Used Oracle Warehouse Builder to create common processes for
extracting data and loading into the Data Warehouse
Able to source data from 52 core banking systems, on a daily, weekly
or monthly basis, as well as external data sources
Implemented Oracle Discoverer for end-user analysis
Anticipated cost savings from reduced manual reporting and removal
of disparate BI projects represents a possible return on investment of
more than 300% over five years
1.21 SAS
4.1.11 Introduction to SAS
SAS Institute Inc., has been a major producer of software since it was
founded in 1976. SAS was originally an acronym for Statistical Analysis
System but for many years has been used as an arbitrary trade-name to
refer the company as a whole.
The SAS System, originally Statistical Analysis System, is an integrated
system of software products provided by SAS Institute that enables the
programmer to perform:
Data entry, retrieval, management, and mining
Report writing and graphics
Statistical and mathematical analysis
37
Business planning, forecasting, and decision support
Operations research and project management
Quality improvement
Applications development
Data warehousing (extract, transform, load)
platform independent and remote computing
In addition, the SAS System integrates with many SAS business solutions
that enable large scale software solutions for areas such as human
resource management, financial management, business intelligence,
customer relationship management and more.
4.1.12 Features of SAS Warehousing Administrator
SAS/warehousing administrator is designed for the IT professional
responsible for creating and managing data warehouse / data mart
processes. It provides Customizable solution that offers a single point of
control, making it easier to respond to the ever-changing needs of the
business community. Also, it simplifies the creation and maintenance of
data warehouses.
The main benefit of using SAS/warehousing administrator is simplifying
the setup and management of multiple data warehouses and data marts.
The details are as follows,
Integrates extraction, transformation and loading tools for building
and managing data warehouses/data marts.
Provides a framework for effective warehouse management through
38
a metadata-driven architecture.
Facilitates business subject definition, consolidation of business
rules, scheduling of processes for warehouse maintenance and
integration with decision-support tools for effective warehouse
exploitation.
Leverages the strengths of SAS software and rapid warehousing to
deliver the well-proven benefits of a data warehouse even faster.
With using the graphical user interface, the visualization, navigation and
maintenance of the data warehouse are simplified and eliminate much of
the coding work required to build and manage it. Moreover, it offers the
adaptability and the manageability you need as your business and
information needs change, as more data is added, as processes become
more complex, and as users require greater support.
Figure 9 – Example on use interface (SAS/Warehousing Administrator)
39
Figure 10 – Example on reports (SAS Data Warehouse)
1.21.1.1 SAS Enterprise Data Integration
In different from common data warehousing, SAS provides a complete
functional data capturing, storage, integration and analysis software
across the enterprise. The SAS Enterprise Data Integration attains and
manages consistent and trusted data throughout the organization in a
flexible and reliable manner.
40
Graphical user interface provides technicians with an interactive,
single point of control for managing data integration processes,
including wizards for building and executing data access,
transformations and storage process flows.
Connectivity to more data sources on more platforms such as IBM
DB2, Oracle DB, Microsoft Access, Sybase, etc.
Data quality embedded into batch, near-time and real-time processes
Metadata is captured and documented throughout transformation and
data integration processes
Migrate or synchronize data between database structures, enterprise
applications, mainframe legacy files, text, XML and message queues
Join data across these virtual data sources for real-time access and
analysis
Business metadata design interface allows data analysts to quickly
build semantic layer
Business rules library for reusable business rules clean, standardize,
match and enhance data as it moves into the master reference file
and is reused for downstream processes
41
Figure 11 – Overview on SAS Data Integration
4.1.13 Advantages
1.21.1.2 High Compatibility
Access to ERP systems such as Baan, People Soft, and SAP; relational
databases such as DB2, Oracle, Informix, ODBC, MS SQL Server,
Sybase, and Teradata; and non-relational databases such as Adabas and
PC file formats.
1.21.1.3 Point-and-click interface
The user friendly interface enable data management specialist
implementing the warehousing application without the assist of
programmer and also operators.
4.1.14 Disadvantages
1.21.1.4 Unknown Implementation cost
When compare with other, like Oracle, SAS does not has a well pricing
policy. It gives difficulty for customers to choose between available
products.
1.21.1.5 Unknown difficulty of implementation
Compare with other company, such as Oracle offers data warehouse and
analytic specific services that combine technical leadership and expertise
with Oracle technology to provide a complete business intelligence
solution, SAS does not mention the degree of difficulty of implementation.
42
4.1.15 Application of SAS Data Warehousing in the HK
Trade Development Council
The Hong Kong Trade Development Council, which launched Business-
Stat On-line using Data Warehousing and Web Enablement technology
from SAS.
Business-Stat On-line (BSO) is an interactive on-line service allowing
companies to access monthly trade figures compiled by the Census and
Statistics Department. Information available includes Hong Kong’s total
trade figures, overseas trade, and trade according to specific types of
product and service.
The project involved the design and implementation of a Data Warehouse
containing five years of export, domestic export and import data broken
down by a wide range of product and market areas. Other trade service
data was also imported into the system using customized tools provided
by SAS. SAS also developed an extensive number of statistical reports.
Over 6000 pre-summarized general tables were created for on-line
access, designed as a starting point to Hong Kong’s general trade
performance. In addition, users of the service can view an unlimited
number of dynamic reports based on selection criteria such as region,
industry and product type. Registration and administration tools provided
by SAS allow subscribers to register for the BSO service on-line free of
charge. They are then automatically notified by e-mail of their logon ID
and password, allowing them full access to the service.
43
5. How to implement Data Warehouse
successfully
Talking so many benefits about the application of data warehouse in an
enterprise, but how we could implement the DW technology successfully
into our operational processes? Denis Kozar suggested the “seven deadly
sins” on the DW implementation.
1.22 “If you Built It, They Will Come”
The blind faith on the DW technology leads to the failure to recognize the
importance of defining a set of business objectives for the data warehouse
prior to its implementation. A clearly defined data warehouse plan is
important to the needs of the entire enterprise and a documented set of
requirements is necessary to guide the design, construction, and rollout of
the project.
1.23 Omission of an Architectural Framework
One the most important factors in a successful data warehouse
implementation is the development and maintenance of a comprehensive
architectural framework. The framework serves as the blueprint for
construction and use of the various DW components. Developers need to
consider, the number of end-users, volume and diversity of data, expected
data-refresh cycle, etc., in the DW architecture.
44
1.24 Understanding the Importance of
Documenting Assumptions
The assumptions and potential data conflicts associated with the DW
must be included in the architectural framework for the project. Several
questions need to be considered during the requirements phase of the
project that serve to reveal these important underlying assumptions about
the DW. How much data would be loaded into the warehouse? How often
the data need to be refreshed? On what platform the DW will be
developed? Answers to these questions are essential to the success of
DW implementation.
1.25 Failure to Use the Right Tool for the Job
The design and construction of a DW is much different from that of an
operational application system. The DW tools can be categorized into four
areas:
Analysis Tools – assist in identification of data requirements
Development Tools – responsible for data cleansing, code
generation, data integration, and loading of the data into the data
repository.
Implementation Tools – contain data acquisition tools to gather
process, clean, replicate, and consolidate data.
Delivery Tools – assist in data conversion, derivation, and reporting
for the application platform.
Correct application of these tools could help to implement the DW
efficiently and effectively.
45
1.26 Life Cycle Abuse
The life cycle of DW development is a continuous, ongoing set of activities
that flow from initial investigation of DW requirements through data
administration and back again. The development of DW project should be
kept running continuously as if the DW is to remain a viable source of
decision-making support in the ever changing business environment.
1.27 Ignorance Concerning the Resolution of Data
Conflicts
Analysis must be conducted to determine the best data sources available
within an organization. Once these systems have been identified, the
conflicts associated with disparate naming conventions, file formats and
sizes, and value ranges must be resolved. This process may involve
working with data owners to establish an understanding with regard to
future planned or unplanned changes to the source data. Failure to allow
sufficient time and resources to resolve data conflicts can delay a
warehouse implementation and result in an organizational deadlock that
can threaten the success of the project.
1.28 Failure to Learn from Mistakes
The ongoing nature of the DW development cycle suggests that DW
project simply relates one another. Because of this, careful documentation
of the mistakes made in the previous projects will directly impact the
quality assurance activities of all future projects. By learning from the past,
a strong DW with lasting benefits can be built.
46
If developers can pay attention to the above areas, the implementation of
Data Warehouse will certainly bring great benefits to the business.
47
6. Concerns & Conclusion
Data warehouse can bring many benefits to enterprises, however, there
are concerns of using it.
Extracting, cleaning and loading data is time consuming.
Data warehousing project scope must be actively managed to deliver
a release of defined content and value.
Problems with compatibility with systems already in place.
Security could develop into a serious issue, especially if the data
warehouse is web accessible.
Data Storage design controversy warrants careful consideration and
perhaps prototyping of the data warehouse solution for each project's
environments
So, managers need to aware of the concerns when using the data
warehouse, so that they can get the benefits of data warehousing without
any problems.
48
7. References
George M. Marakas. (©1999) pp. 343-346, Decision support system in the
twenty-first century: DSS and data mining technologies for tomorrow’s
manager
IBM, Background, http://www-03.ibm.com/press/us/en/background.wss
IBM, DB2 Data Warehouse Edition, “Features and benefits”,
http://www-306.ibm.com/software/data/db2/dwe/features.html?S_CMP=rnav
IBM, DB2 Data Warehouse Edition, “Overview”,
http://www-306.ibm.com/software/data/db2/dwe/
IBM, “Denmarks’ TDC answers the call of Danish telephone consumers
with IBM Data Warehouse”,
http://www-306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?
OpenDocument&Site=dmbi&cty=en_us
Ken Orr (©1996, revised 2000), Data Warehouse Technology,
http://www.kenorrinst.com/dwpaper.html
Manufacturing Business Technology: Software Finder, “Oracle vs SAS”,
http://softwarefinder.mbtmag.com/search/for/Oracle-vs-SAS.html
Martyn R Jones (1999), “Brief defining characteristics of a Data
Warehouse”, http://www.brint.com/wwwboard/messages/4599.html
Paul Westerman, Data Warehousing : using the Wal-Mart model
SAS, Data Integration, http://www.sas.com/technologies/dw/
Wikipedia, “Bill Inmon”, http://en.wikipedia.org/wiki/Bill_Inmon
Wikipedia, “SAS Institute”, http://en.wikipedia.org/wiki/SAS_Institute
Wikipedia, “SAS System”, http://en.wikipedia.org/wiki/SAS_System
49