data warehousing concepts - 1


Upload: rama-umamageswaran

Post on 08-Apr-2018


8/6/2019 Data Warehousing Concepts - 1

http://slidepdf.com/reader/full/data-warehousing-concepts-1 1/9

Data Warehousing Concepts 

Data Warehousing

Data Warehousing is a process whereby data from dispersed business units, originating from different sources or operational systems, is integrated for the purpose of analysis and reporting from a single version of the truth. It is the foundation of Business Intelligence.

Data warehousing is commonly used by companies to analyze trends over time. In other words, companies may very well use data warehousing to view day-to-day operations, but its primary function is facilitating strategic planning resulting from long-term data overviews. From such overviews, business models, forecasts, and other reports and projections can be made.

Why Data Warehousing?

There are a number of reasons why many large corporations have spent large amounts of money implementing data warehouses. The most fundamental benefit of a data warehouse is that it stores and presents information in a way that allows business executives to make important decisions.

Instead of looking at an organization in terms of the departments that it comprises, data warehouses allow business executives to look at the company as a whole.

Data warehouses can be highly efficient because they allow users to query data on a regular basis. The data can come from numerous transaction systems as well as from outside sources. Before the advent of data warehouses, companies that wanted reports from numerous systems had to produce data extracts and run special logic programs to combine this data. In most cases, this strategy worked fine. However, companies that had large amounts of data may have had problems if they wanted to sort through it frequently. While these scenarios present a number of challenges, a company can handle them if it takes the time to establish the right procedures.

Simplicity plays an important role in the success of a data warehouse, and companies will want to pay attention to it early on. Most data warehouses can be set up so that simple queries can be written by workers who do not have a lot of technical skill, although such workers will still often run into problems with certain tasks. Data warehouses are unique in that they can act as a repository for cleaned data from transaction processing systems. Reports can be run against this data, and doing so may not require the transaction processing systems themselves to be fixed or calibrated.

In older systems, data that was considered old would often be removed from transaction processing systems in order to keep response times manageable. For tasks that require querying, both the older data and the recent data may be stored in the data warehouse in a way that gives the user control over the response time. Workers may run into some challenges depending on the information they need. When data warehouses are designed and implemented properly, they can bring a large number of advantages to the companies that use them. The data warehouse can give a picture of how the company is performing as a whole, and it can allow executives and managers to make crucial decisions that help the company succeed.

Why not Transaction Systems for Analysis and Reporting?

Most companies set up transaction systems so that transactions have a good chance of being completed within a desirable time frame. The biggest problem with reports and queries is that they can reduce the chances of a transaction being completed in good time. Running reports on a server that is also handling transactions can therefore be quite challenging. Because of these challenges, many companies seek to alleviate the problem by implementing a data warehouse system. Another powerful benefit of data warehouses is that they allow companies to use data models for querying tasks that are quite difficult for transaction processing.

There are a number of ways that data can be modeled, and the goal of modeling is to improve the performance of reporting. This is often done via a star schema, which is generally not recommended for transaction processing systems because certain modeling methods can slow transaction processing down. At the same time, a server set up to speed up transaction processing will tend to slow down querying. Perhaps one of the most important benefits of data warehouses is that they set the stage for an environment where a small amount of technical knowledge about databases is enough to write queries and to speed up the maintenance of those queries.
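To make the star schema mentioned above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_customer, dim_product, fact_sales) and the sample rows are hypothetical and do not come from the original text. The sketch only shows the general shape: one central fact table joined to small dimension tables by a simple reporting query.

```python
import sqlite3

def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            sale_year   INTEGER,
            amount      REAL
        );
    """)
    # Hypothetical sample data, for illustration only.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "Widget"), (20, "Gadget")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 10, 2018, 100.0),
        (2, 10, 2018, 250.0),
        (1, 10, 2018, 75.0),
        (1, 20, 2018, 900.0),
        (2, 10, 2017, 999.0),
    ])

def best_customer(conn, product_name, year):
    """Return (customer name, total sales) for the top buyer of a product in a year."""
    return conn.execute("""
        SELECT c.name, SUM(f.amount) AS total
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        JOIN dim_product  p ON p.product_id  = f.product_id
        WHERE p.name = ? AND f.sale_year = ?
        GROUP BY c.name
        ORDER BY total DESC
        LIMIT 1
    """, (product_name, year)).fetchone()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(best_customer(conn, "Widget", 2018))
```

The query touches every matching fact row and aggregates them, which is the access pattern a reporting model is built for and a transaction system usually is not.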

Data Warehouse

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data. A data warehouse environment includes an extraction, transformation, and loading (ETL) solution, online analytical processing (OLAP) and data mining capabilities, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users. It is a series of processes, procedures and tools (hardware and software) that help the enterprise understand more about itself, its products, its customers and the market it serves.

Data Warehouse Characteristics

A data warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions.

Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.


Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant: In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
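The integrated, nonvolatile and time-variant characteristics described above can be sketched together in a few lines of Python. Everything named here is hypothetical: the two source record layouts (one using `cust_no` and kilograms, the other `CustomerID` and pounds), the `Warehouse` class, and the sample values. It is an illustration of the ideas, not a description of any particular product.

```python
from datetime import date

LB_TO_KG = 0.45359237  # international pound expressed in kilograms

def normalize(record):
    """Integrated: map source-specific field names and units to one format."""
    customer_id = record.get("cust_no", record.get("CustomerID"))
    if "weight_lb" in record:
        weight_kg = round(record["weight_lb"] * LB_TO_KG, 3)
    else:
        weight_kg = record["weight_kg"]
    return {"customer_id": customer_id, "weight_kg": weight_kg}

class Warehouse:
    """Nonvolatile and time variant: rows are appended with a load date, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, load_date, source_records):
        """Append normalized copies of the source records, stamped with load_date."""
        for record in source_records:
            self._rows.append({"load_date": load_date, **normalize(record)})

    def history(self, customer_id):
        """All rows for a customer, oldest first."""
        rows = [r for r in self._rows if r["customer_id"] == customer_id]
        return sorted(rows, key=lambda r: r["load_date"])

wh = Warehouse()
wh.load(date(2018, 1, 31), [{"cust_no": 7, "weight_kg": 2.5}])        # hypothetical source A
wh.load(date(2018, 2, 28), [{"CustomerID": 7, "weight_lb": 10.0}])    # hypothetical source B
print(wh.history(7))
```

Because earlier rows are never overwritten, the history for a customer grows with each load, which is what makes trend analysis possible.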


The Benefits of Data Warehouse

A data warehouse can be used to answer business analytical questions efficiently. The benefits of data warehousing are multifaceted, ranging from basic analytics and profitability analysis to risk assessment and the identification of business trends. A properly designed data warehouse can offer a very high return on investment (ROI) to an organization.

With a data warehouse, companies have the opportunity to manage enterprise-wide data as an asset.

Depending on their data warehousing strategies, companies may experience the following advantages:

• One consistent data store for reporting, forecasting, and analysis
• Easier and timely access to data
• Improved end-user productivity
• Improved IS productivity
• Reduced costs
• Scalability
• Flexibility
• Reliability
• Competitive advantage
• Trend analysis and detection
• Key ratio indicator measurement and tracking
• Drill-down analysis
• Problem monitoring
• Executive analysis

OLTP

OLTP (online transaction processing) is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturers. Probably the most widely installed OLTP product is IBM's CICS (Customer Information Control System).

Today's online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new OLTP software uses client/server processing and brokering software that allows transactions to run on different computer platforms in a network.


OLTP vs. Data Warehouse (DW)

OLTP systems are tuned for known transactions and workloads, whereas the workload of a data warehouse is not known in advance. Special data organization, access methods and implementation methods are needed to support data warehouse queries, which are typically multidimensional, e.g., the average amount spent on phone calls between 9AM and 5PM in Tampa during the month of December.
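The example query above can be sketched with Python's built-in sqlite3 module. The `calls` table, its columns and the sample rows are hypothetical. The sketch also assumes that "between 9AM and 5PM" means calls that start from 09:00 up to 16:59. The query filters on three dimensions (time of day, city and month) before aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (city TEXT, started_at TEXT, amount REAL)")
# Hypothetical sample data, for illustration only.
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("Tampa",   "2018-12-03 09:15:00", 2.00),
    ("Tampa",   "2018-12-10 16:45:00", 4.00),
    ("Tampa",   "2018-12-10 20:30:00", 9.00),   # outside 9AM-5PM
    ("Tampa",   "2018-11-20 10:00:00", 7.00),   # not December
    ("Orlando", "2018-12-05 11:00:00", 5.00),   # different city
])

def average_daytime_spend(conn, city, month):
    """Average call amount for a city and month ('01'..'12'), 09:00-16:59 starts only."""
    row = conn.execute("""
        SELECT AVG(amount)
        FROM calls
        WHERE city = ?
          AND strftime('%m', started_at) = ?
          AND strftime('%H', started_at) BETWEEN '09' AND '16'
    """, (city, month)).fetchone()
    return row[0]

print(average_daytime_spend(conn, "Tampa", "12"))
```

None of the three filters corresponds to a single known transaction, which is why a workload like this is hard to tune for in advance.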

OLTP                                              Data Warehouse
------------------------------------------------  -------------------------------------------
Application oriented                              Subject oriented
Used to run business                              Used to analyze business
Detailed data                                     Summarized and refined
Current, up to date                               Snapshot data
Isolated data                                     Integrated data
Repetitive access                                 Ad-hoc access
Clerical user                                     Knowledge user (manager)
Performance sensitive                             Performance relaxed
Few records accessed at a time (tens)             Large volumes accessed at a time (millions)
Read/update access                                Mostly read (batch update)
No data redundancy                                Redundancy present
Database size 100 MB - 100 GB                     Database size 100 GB - few terabytes
Transaction throughput is the performance metric  Query throughput is the performance metric
Thousands of users                                Hundreds of users
Managed in entirety                               Managed by subsets

To Summarize...

OLTP systems are used to "Run" a business.

The Data Warehouse helps to "Optimize" the business.


Data Flow in a Data Warehousing Environment

Data Mart

A data mart is a logical subset of an organizational data store, usually oriented to a specific purpose or major subject area, that may be distributed to support business needs. A data mart covers a specific area of the business: billing, inventory, transactions, claims, etc.

Reasons for creating a data mart

• Easy access to frequently needed data
• Creates a collective view for a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full data warehouse
• Potential users are more clearly defined than in a full data warehouse
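One way to picture a data mart as a logical subset is a database view over warehouse data, restricted to a single subject area. The sketch below uses Python's built-in sqlite3 module. The `warehouse_facts` table, the `billing_mart` view and the sample rows are hypothetical, and in practice a data mart is often a separate physical store rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (subject_area TEXT, account TEXT, amount REAL)"
)
# Hypothetical sample data covering several subject areas.
conn.executemany("INSERT INTO warehouse_facts VALUES (?, ?, ?)", [
    ("billing",   "A-1", 40.0),
    ("billing",   "A-2", 60.0),
    ("inventory", "A-1", 15.0),
    ("claims",    "A-3", 500.0),
])

# The mart exposes only the billing subject area to its users.
conn.execute("""
    CREATE VIEW billing_mart AS
    SELECT account, amount
    FROM warehouse_facts
    WHERE subject_area = 'billing'
""")

billing_rows = conn.execute(
    "SELECT account, amount FROM billing_mart ORDER BY account"
).fetchall()
print(billing_rows)
```

Users of the mart query a smaller, clearly scoped data set, which reflects the easy access and response time reasons listed above.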


Operational Data Store (ODS)

An Operational Data Store (ODS) integrates data from multiple business operation sources to address operational problems that span one or more business functions.

ODS Characteristics

An ODS has the following features:

• Subject-oriented: Organized around major subjects of an organization (customer, product, etc.), not specific applications (order entry, accounts receivable, etc.).
• Integrated: Presents an integrated image of subject-oriented data which is pulled from fragmented operational source systems.
• Current*: Contains a snapshot of the current content of legacy source systems. History is not kept, and might be moved to the data warehouse for analysis.
• Volatile*: Since ODS content is kept current, it changes frequently. Identical queries run at different times may yield different results.
• Detailed*: ODS data is generally more detailed than data warehouse data. Summary data is usually not stored in an ODS; the exact granularity depends on the subject that is being supported.
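The "current" and "volatile" characteristics listed above can be illustrated with a store that overwrites each record in place, so the same lookup returns different answers at different times. The class name, method names and sample records are hypothetical.

```python
class OperationalDataStore:
    """Keeps only the latest version of each record; no history is retained."""

    def __init__(self):
        self._current = {}  # customer_id -> latest record

    def upsert(self, customer_id, record):
        """Insert a record, or overwrite the existing one for this customer."""
        self._current[customer_id] = dict(record)

    def get(self, customer_id):
        """Return the current record for a customer, or None if absent."""
        return self._current.get(customer_id)

ods = OperationalDataStore()
ods.upsert(42, {"status": "open", "balance": 100.0})
first_read = ods.get(42)

ods.upsert(42, {"status": "closed", "balance": 0.0})
second_read = ods.get(42)

# The identical lookup yields different results before and after the update.
print(first_read, second_read)
```

Once the second upsert runs, the store itself no longer holds the earlier state. Keeping that history is the job of the data warehouse.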


The ODS provides an integrated view of data in operational systems. As the figure below indicates, there is a clear separation between the ODS and the data warehouse.

Benefits of ODS

• Supports operational reporting needs of the organization
• Provides a complete view of customer relationships, the data for which might be stored in several operational databases. This data can include data from an organization's internal systems, as well as external data from third-party vendors.
• Operates as a store for detailed data, updated frequently and used for drill-downs from the data warehouse, which contains summary data.
• Reduces the burden placed on other operational or data warehouse platforms by providing an additional data store for reporting.
• Provides more current data than a data warehouse and more integrated data than an OLTP system
• Feeds other operational systems in addition to the data warehouse