8/6/2019 Data Warehousing Concepts - 1
http://slidepdf.com/reader/full/data-warehousing-concepts-1 1/9
Data Warehousing Concepts
Data Warehousing
Data Warehousing is a process whereby data from dispersed business units, originating from different sources or operational systems, is integrated for the purpose of analysis
and reporting from a single version of the truth. It is the foundation of Business Intelligence.
Data warehousing is commonly used by companies to analyze trends over time. In other
words, companies may very well use data warehousing to view day-to-day operations, but its primary function is facilitating strategic planning resulting from long-term data
overviews. From such overviews, business models, forecasts, and other reports and projections can be made.
Why Data Warehousing?
There are a number of reasons why many large corporations have spent large amounts of
money implementing data warehouses. The most fundamental benefit of using data warehouses is that they store and present information in a way that allows
business executives to make important decisions.
Instead of looking at an organization in terms of the departments that it comprises, data
warehouses allow business executives to look at the company as a whole.
Data warehouses can be highly efficient because they let users query data on a regular
basis. The data can come from numerous transaction systems, and it can also come
from outside sources. Before the advent of data warehouses, companies
that wanted reports from numerous systems had to produce data extracts and run special
logic programs to combine this data. In most cases, this strategy worked fine. Even so,
companies with large amounts of data could run into problems if they needed to
sort through it frequently. These scenarios pose a number of challenges, but a
company can handle them if it takes the time to establish the right procedures.
Simplicity plays an important role in the success of a data warehouse, and companies
will want to pay attention to it early on. Most data warehouses can be set up so that
workers without much technical skill can write simple queries, although such workers
will still often run into problems when trying to perform certain tasks. A data
warehouse can also act as a repository for cleaned data from transaction processing
systems. Reports can be run against that data, which may not require the transaction
processing systems themselves to be fixed or calibrated.
In older systems, data that was considered old would often be removed from transaction processing systems. This was done to keep response
times manageable. For tasks that require querying, both the older data and the recent data may be stored in the data warehouse in a way that gives the user control over the
response time. Workers may run into some challenges depending on the information they need. When data warehouses are designed and implemented properly, they can bring a
large number of advantages to the companies that use them. The data warehouse can give the company a picture of how it is performing as a whole, and it can allow
executives and managers to make crucial decisions that can help the company succeed.
Why not Transaction Systems for Analysis and Reporting?
The vast majority of companies set up transaction systems so that there is a good chance transactions will be completed within a desirable time frame. The
biggest problem with reports and queries is that they can reduce the chances of a transaction being completed within that time frame. It should also be emphasized that
running reports through the transaction systems themselves can be quite challenging. Because of these challenges, many companies seek to alleviate the problem by implementing a data
warehouse system. Another powerful benefit of data warehouses is that they allow companies to use data models for querying tasks that are quite difficult for transaction
processing.
There are a number of ways that data can be modeled, and the goal of modeling is to
improve the performance of reporting. This will often be done via a star schema, which is generally not recommended for transaction processing systems because
such modeling methods can slow down transaction processing. Conversely, a design that
speeds up transaction processing tends to slow down querying.
Perhaps one of the most important benefits of data warehouses is
that they set the stage for an environment where a small amount of technical knowledge
about databases can be used to write queries and speed up the maintenance of these
queries.
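The star schema mentioned above can be sketched as follows. This is a minimal illustration, not a recommended production design: the table names (fact_sales, dim_date, dim_product), the columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for a real warehouse. One central fact table holds the measures, and each dimension table describes one way of slicing them.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
""")

# Made-up sample data, for illustration only.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20190101, 2019, 1), (20190201, 2019, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"),
                  (2, "Gadget", "Hardware"),
                  (3, "Manual", "Books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20190101, 1, 2, 20.0), (20190101, 3, 1, 5.0),
                  (20190201, 2, 1, 15.0), (20190201, 1, 3, 30.0)])


def sales_by_category_and_month(connection):
    """Join the fact table to its dimensions and total the sales amount."""
    return connection.execute("""
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


report = sales_by_category_and_month(conn)
for row in report:
    print(row)
```

Every reporting query follows the same shape (fact table joined to a few dimensions, then grouped), which is one reason this layout suits reporting better than transaction processing.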
Data Warehouse
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from
transaction data. A data warehouse environment includes an extraction, transformation, and loading (ETL) solution, online analytical processing (OLAP) and data mining
capabilities, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
It is a series of processes, procedures
and tools (h/w & s/w) that help the enterprise understand more about itself, its products, its customers and the market it services.
Data Warehouse Characteristics
A Data Warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions.
Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates
on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by
subject matter, sales in this case, makes the data warehouse subject oriented.
Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems
as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze
what has occurred.
Time Variant: In order to discover trends in business, analysts need large amounts of data.
This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A
data warehouse's focus on change over time is what is meant by the term time variant.
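A small sketch may make the integrated, nonvolatile and time-variant characteristics more concrete. Everything in it is assumed for illustration: the two source systems, their field names (prod_id, ProductNo, stock_kg, stock_lb) and the sample figures are hypothetical, and a plain Python list stands in for a warehouse table.

```python
from datetime import date

LB_TO_KG = 0.45359237  # one pound in kilograms (exact by definition)


def from_system_a(record):
    """Hypothetical source A: key 'prod_id', stock already in kilograms."""
    return {"product_id": record["prod_id"], "stock_kg": record["stock_kg"]}


def from_system_b(record):
    """Hypothetical source B: key 'ProductNo', stock in pounds.

    Resolves the naming conflict and the unit-of-measure inconsistency.
    """
    return {"product_id": record["ProductNo"],
            "stock_kg": round(record["stock_lb"] * LB_TO_KG, 3)}


warehouse = []  # stand-in for a warehouse table; rows are only ever appended


def load(rows, snapshot_date):
    """Nonvolatile load: stamp each row with its date and append it."""
    for row in rows:
        warehouse.append({**row, "snapshot_date": snapshot_date})


load([from_system_a({"prod_id": 1, "stock_kg": 10.0})], date(2019, 1, 31))
load([from_system_b({"ProductNo": 2, "stock_lb": 10.0})], date(2019, 1, 31))
load([from_system_a({"prod_id": 1, "stock_kg": 12.5})], date(2019, 2, 28))


def history(product_id):
    """Time-variant view: every recorded value for one product, in load order."""
    return [(r["snapshot_date"], r["stock_kg"])
            for r in warehouse if r["product_id"] == product_id]


print(history(1))
```

Because the February load appends a new row instead of overwriting the January one, both values remain available for trend analysis.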
The Benefits of Data Warehouse
A data warehouse can be used to efficiently answer analytical business questions. The
benefits of data warehousing are multifaceted, ranging from basic analytics and profitability analysis to risk assessment and business trends. A well-designed data
warehouse can deliver a very high ROI (return on investment) to an organization.
With a data warehouse, companies have the opportunity to manage enterprisewide data as
an asset.
Depending on their data warehousing strategies, companies may experience the following
advantages:
• One consistent data store for reporting, forecasting, and analysis
• Easier and timely access to data
• Improved end-user productivity
• Improved IS productivity
• Reduced costs
• Scalability
• Flexibility
• Reliability
• Competitive advantage
• Trend analysis and detection
• Key ratio indicator measurement and tracking
• Drill down analysis
• Problem monitoring
• Executive analysis
OLTP
OLTP (online transaction processing) is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions in a
number of industries, including banking, airlines, mail order, supermarkets, and
manufacturers. Probably the most widely installed OLTP product is IBM's CICS (Customer Information Control System).
Today's online transaction processing increasingly requires support for transactions that
span a network and may include more than one company. For this reason, new OLTP
software uses client/server processing and brokering software that allows transactions to
run on different computer platforms in a network.
OLTP vs. Data Warehouse (DW)
OLTP systems are tuned for known transactions and workloads, while the workload is not
known in advance in a data warehouse. Special data organization, access methods and
implementation methods are needed to support data warehouse queries (typically multidimensional queries),
e.g., average amount spent on phone calls between 9AM-5PM in Tampa during the month of December.
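The example query above combines three dimensions (time of day, city, month) with one measure (amount spent). A minimal sketch of how it might be expressed, assuming a hypothetical calls table with made-up rows in an in-memory SQLite database, and treating "9AM-5PM" as calls that start at or after 09:00 and before 17:00:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per phone call.
conn.execute("CREATE TABLE calls (city TEXT, started_at TEXT, amount REAL)")
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("Tampa", "2018-12-03 09:30:00", 2.00),
    ("Tampa", "2018-12-10 16:45:00", 4.00),
    ("Tampa", "2018-12-10 20:15:00", 9.00),  # outside 9AM-5PM
    ("Tampa", "2018-11-20 10:00:00", 7.00),  # not December
    ("Miami", "2018-12-05 11:00:00", 5.00),  # not Tampa
])


def average_call_amount(connection, city, month, start_hour, end_hour):
    """Average amount for calls in one city and month, within an hour range.

    The range is half-open: start_hour <= hour of call start < end_hour.
    """
    row = connection.execute("""
        SELECT AVG(amount)
        FROM calls
        WHERE city = ?
          AND CAST(strftime('%m', started_at) AS INTEGER) = ?
          AND CAST(strftime('%H', started_at) AS INTEGER) >= ?
          AND CAST(strftime('%H', started_at) AS INTEGER) < ?
    """, (city, month, start_hour, end_hour)).fetchone()
    return row[0]


tampa_december_avg = average_call_amount(conn, "Tampa", 12, 9, 17)
print(tampa_december_avg)
```

Only the first two rows satisfy all three conditions, so the average is taken over those two calls. On a real warehouse the same question would scan far more rows, which is why special data organization and access methods matter.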
OLTP                                              Data Warehouse
------------------------------------------------  --------------------------------------------
Application Oriented                              Subject Oriented
Used to run business                              Used to analyze business
Detailed data                                     Summarized and refined
Current up to date                                Snapshot data
Isolated Data                                     Integrated Data
Repetitive access                                 Ad-hoc access
Clerical User                                     Knowledge User (Manager)
Performance Sensitive                             Performance relaxed
Few Records accessed at a time (tens)             Large volumes accessed at a time (millions)
Read/Update Access                                Mostly Read (Batch Update)
No data redundancy                                Redundancy present
Database Size 100 MB - 100 GB                     Database Size 100 GB - few terabytes
Transaction throughput is the performance metric  Query throughput is the performance metric
Thousands of users                                Hundreds of users
Managed in entirety                               Managed by subsets
To Summarize...
OLTP systems are used to "Run" a business
The Data Warehouse helps to "Optimize" the business
Data Flow in a Data Warehousing Environment
Data Mart
A data mart is a logical subset of an organizational data store, usually oriented to a
specific purpose or major subject area, that may be distributed to support business needs.
A Data Mart covers a specific area of the business: billing, inventory, transactions, claims, etc.
Reasons for creating a data mart
• Easy access to frequently needed data
• Creates collective view by a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full Data warehouse
• Potential users are more clearly defined than in a full Data warehouse
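One simple way to picture a data mart as a "logical subset" is a database view over a larger warehouse table. The sketch below assumes a hypothetical warehouse_transactions table and a billing_mart view in an in-memory SQLite database. Real data marts are often separate physical stores, so this shows only the subsetting idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table covering several subject areas.
CREATE TABLE warehouse_transactions (
    subject_area TEXT,
    customer TEXT,
    amount REAL,
    posted_on TEXT
);
INSERT INTO warehouse_transactions VALUES
    ('billing', 'Acme', 120.0, '2019-01-15'),
    ('billing', 'Globex', 80.0, '2019-01-20'),
    ('claims', 'Acme', 300.0, '2019-01-22'),
    ('inventory', 'Initech', 45.0, '2019-02-01');

-- The data mart: only the billing subject area, only the columns its users need.
CREATE VIEW billing_mart AS
    SELECT customer, amount, posted_on
    FROM warehouse_transactions
    WHERE subject_area = 'billing';
""")

billing_rows = conn.execute(
    "SELECT customer, amount FROM billing_mart ORDER BY customer"
).fetchall()
print(billing_rows)
```

Users of the billing mart query a small, clearly scoped object and never see claims or inventory rows.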
Operational Data Store (ODS)
An Operational Data Store (ODS) integrates data from multiple business operation sources to address operational problems that span one or more business functions.
ODS Characteristics
An ODS has the following features:
• Subject-oriented: Organized around major subjects of an organization (customer, product, etc.), not specific applications (order entry, accounts
receivable, etc.).
• Integrated: Presents an integrated image of subject-oriented data which is pulled from fragmented operational source systems.
• Current *: Contains a snapshot of the current content of legacy source systems. History is not kept, and might be moved to the data warehouse for
analysis.
• Volatile *: Since ODS content is kept current, it changes frequently. Identical queries run at different times may yield different results.
• Detailed *: ODS data is generally more detailed than data warehouse data. Summary data is usually not stored in an ODS; the exact granularity depends on the subject that is being supported.
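The "current" and "volatile" characteristics listed above can be illustrated with a small sketch. It assumes a hypothetical customer balance record, a dictionary standing in for the ODS and a list standing in for the data warehouse. Moving the superseded row to the warehouse on every update is one possible policy, chosen here only for illustration.

```python
ods = {}        # stand-in for the ODS: current row per customer, overwritten in place
warehouse = []  # stand-in for the data warehouse: superseded rows are appended


def apply_update(customer_id, balance, as_of):
    """Overwrite the current ODS row, moving the old row to the warehouse."""
    previous = ods.get(customer_id)
    if previous is not None:
        warehouse.append(previous)  # history leaves the ODS
    ods[customer_id] = {"customer_id": customer_id,
                        "balance": balance,
                        "as_of": as_of}


def current_balance(customer_id):
    """The same query; its answer depends on when it is run."""
    return ods[customer_id]["balance"]


apply_update(42, 100.0, "2019-08-01")
first_answer = current_balance(42)

apply_update(42, 75.0, "2019-08-06")
second_answer = current_balance(42)

print(first_answer, second_answer, len(warehouse))
```

The identical call to current_balance returns different results before and after the second update, while the earlier value survives only in the warehouse stand-in.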
The ODS provides an integrated view of data in operational systems. As the figure below indicates, there is a clear separation between the ODS and the data warehouse.
Benefits of ODS
• Supports operational reporting needs of the organization
• Provides a complete view of customer relationships, the data for which might be stored in several operational databases -- this data can include data from an organization's internal systems, as well as external data from third-party vendors.
• Operates as a store for detailed data, updated frequently and used for drill-downs from the data warehouse which contains summary data.
• Reduces the burden placed on other operational or data warehouse platforms by providing an additional data store for reporting.
• Provides more current data than in a data warehouse and more integrated than an OLTP system
• Feeds other operational systems in addition to the data warehouse