bi curs 6-7 olap fin.docx

28
OLAP The best starting point to approach the multidimensional model - queries for which this model is best suited (Jarke et al., 2000): "What is the total amount of receipts recorded last year per state and per product category?" "What is the relationship between the trend of PC manufacturers' shares and quarter gains over the last five years?" "Which orders maximize receipts?" "Which one of two new treatments will result in a decrease in the average period of admission?" "What is the relationship between profit gained by the shipments consisting of less than 10 items and the profit gained by the shipments of more than 10 items?" It is clear that using traditional languages, such as SQL, to express these types of queries can be a very difficult task for inexperienced users. It is also clear that running these types of queries against operational databases would result in an unacceptably long response time. The multidimensional model begins with the observation that the factors affecting decision-making processes are enterprise- specific facts, such as sales, shipments, hospital admissions, surgeries, and so on. Instances of a fact correspond to events that occurred. For example, every single sale or shipment carried out is an event. Each fact is described by the values of a set of relevant measures that provide a quantitative description of events. For example, sales receipts, amounts shipped, hospital admission costs, and surgery time are measures. Obviously, a huge number of events occur in typical enterprises—too many to analyze one by one. Imagine placing them all into an n-dimensional space to help us quickly select and sort them out. The n-dimensional space axes are called analysis dimensions, and they define different perspectives to single out events. Examples: 1

Upload: darlene-russo

Post on 24-Sep-2015

239 views

Category:

Documents


0 download

TRANSCRIPT

OLAPThe best starting point to approach the multidimensional model - queries for which this model is best suited (Jarke et al., 2000): "What is the total amount of receipts recorded last year per state and per product category?" "What is the relationship between the trend of PC manufacturers' shares and quarter gains over the last five years?" "Which orders maximize receipts?" "Which one of two new treatments will result in a decrease in the average period of admission?" "What is the relationship between profit gained by the shipments consisting of less than 10 items and the profit gained by the shipments of more than 10 items?"It is clear that using traditional languages, such as SQL, to express these types of queries can be a very difficult task for inexperienced users. It is also clear that running these types of queries against operational databases would result in an unacceptably long response time.

The multidimensional model begins with the observation that the factors affecting decision-making processes are enterprise-specific facts, such as sales, shipments, hospital admissions, surgeries, and so on. Instances of a fact correspond to events that occurred. For example, every single sale or shipment carried out is an event. Each fact is described by the values of a set of relevant measures that provide a quantitative description of events. For example, sales receipts, amounts shipped, hospital admission costs, and surgery time are measures.Obviously, a huge number of events occur in typical enterprisestoo many to analyze one by one. Imagine placing them all into an n-dimensional space to help us quickly select and sort them out. The n-dimensional space axes are called analysis dimensions, and they define different perspectives to single out events.

Examples:- the sales in a store chain can be represented in a three-dimensional space whose dimensions are products, stores, and dates. - as far as shipments are concerned, products, shipment dates, orders, destinations, and terms & conditions can be used as dimensions. - hospital admissions can be defined by the department-date-patient combination, and you would need to add the type of operation to classify surgery operations.

The concept of dimension gave life to the broadly used metaphor of cubes to represent multidimensional data. According to this metaphor, events are associated with cube cells and cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is called a hypercube. Each cube cell is given a value for each measure. Its analysis dimensions are store, product and date. An event stands for a specific item sold in a specific store on a specific date, and it is described by two measures: the quantity sold and the receipts. This figure highlights that the cube is sparsethis means that many events did not actually take place. Of course, you cannot sell every item every day in every store.

HistoryThe first commercial multidimensional (OLAP) products appeared approximately 30 years ago (Express). When Edgar Codd introduced the OLAP definition in his 1993 white paper, there were already dozens of OLAP.After Codd's research appeared, the software industry began appreciating OLAP functionality and many companies have integrated OLAP features into their products (RDBMS, integrated business intelligence suites, reporting tools, portals, etc.). In addition, for the last decade, pure OLAP tools have considerably improved and become cheaper and more user-friendly. These developments brought OLAP functionality to a much broader range of users and organizations. Now OLAP is used not only for strategic decision-making in large corporations, but also to make daily tactical decisions about how to better streamline business operations in organizations of all sizes and shapes. However, the acceptance of OLAP is far from maximized. For example, one year ago, The OLAP Survey 2 found that only thirty percent of its participants actually used OLAP.

DefinitionsAn OLAP cube is an array of data understood in terms of its 0 or more dimensions. OLAP is an acronym for online analytical processing. OLAP is a computer-based technique for analyzing business data in the search for business intelligence. Source: Wikipedia.org

An online analytical processing cube (OLAP cube) is a multidimensional array of data that serves as a database optimized for OLAP applications and data warehousing. It is a way of storing relevant data in a multidimensional form to make it appear more logical when used to generate reports and facilitate more efficient analytics.Source: http://www.techopedia.com/definition/21142/online-analytical-processing-cube-olap-cube

An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications.An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases. The query language used to interact and perform tasks with OLAP cubes is multidimensional expressions (MDX). The MDX language was originally developed by Microsoft in the late 1990s, and has been adopted by many other vendors of multidimensional databases.Although it stores data like a traditional database does, an OLAP cube is structured very differently. Databases, historically, are designed according to the requirements of the IT systems that use them. OLAP cubes, however, are used by business users for advanced analytics. Thus, OLAP cubes are designed using business logic and understanding. They are optimized for analytical purposes, so that they can report on millions of records at a time. Source: http://searchoracle.techtarget.com/tip/Why-OLAP-deserves-more-attention

Stands for "Online Analytical Processing." OLAP allows users to analyze database information from multiple database systems at one time. While relational databases are considered to be two-dimensional, OLAP data is multidimensional, meaning the information can be compared in many different ways. For example, a company might compare their computer sales in June with sales in July, then compare those results with the sales from another location, which might be stored in a different database.Source: http://www.techterms.com/definition/olap

Basically OLAP is an awful name, Nigel Pendse, author of the OLAP reportcalls the same thing FASMI, which I think is a far better term: Fast - 90% of queries back in under 10 secs and no query takes longer than 30 secs. Analysis - Drill down, multiple aggregation techniques, sophisticated graphics, trends all form part of this Shareable - good security at the back end and available to a wide community of users.also multi currency, multi lingual to cope with the global economy. Multi-Dimensional - Excel pivot tables but more so. The ability to have any multipledimensions of information on each axisof a cross-tab with other dimensions being used to furtherfilter theresults returned. Information -Real world KPI's(Key Performance Indicators) rather than raw numbers.Source: Andrew.Fryer, OLAP, Cubes and Multidimensional Analysis, available at: http://blogs.technet.com/b/andrew/archive/2007/08/22/olap-cubes-and-multidimensional-analysis.aspx

In OLAP the cube is the database structurethat is queried on and to get a handle on how this works below is a simple 3dimensional cubeExemple1.

The coordinate system in a cube not only has a reference to a point in multidimensional space it also has an understanding of hierarchies. So the cube 'knows' that January 2007 has a parent called 2007 in the example above. This forms a key part of the OLAP concept - that the results of calculations can be stored at the parent level rather than using on the fly aggregation of all the children e.g. the sales totalfor 2007 is stored in the cube for bike, components etc. as is the cost of sale. The profit margin % has to be worked out on the fly for bikes for 2007 but this is quick as the cost of sales and the sales that contribute to this calculation are pre-calculated. This gives OLAP it's speed while allowing for rich calculations to be stored. As always in IT there is a catch, and in this case is the complexity of the language used to query a cube and that is MDX or multi-dimensional expressions.

Exemple2Years later, the technology has been sufficiently perfected to make OLAP against large data warehouses feasible, truly bringing the "intelligence" to business intelligence. A huge departure from traditional relational design, OLAP allows the data to be stored and accessed in the most efficient mannerallowing end-users to traverse the edges of a hypothetical "cube" of many dimensions. (See below for an example of such a data cube).

The cube's dimensions are associated with facts (also called "measures"). In relational terms, the facts have a many-to-one relationship with the dimensions. For example, Acme Computer Supplies may have a database for sales. Dimensions are usually Customers, Products, and Time Element (month, quarter, etc.). The sales figure for a specific product (Cat5e cables) to a specific customer (Oracle Corp.) during a specific time period (Aug 2008) is one measure. The dimensions are stored on individual tables and so are the factsi.e. the sales figure. So the fact table, in relational terminology, is a child table of the dimension tables.But that's where the analogy ends. The access to the measures in relational design would have been through indexes created on the customer, product, or time columns of the fact table. In the OLAP approach, specific cells (the measures) are accessed by traversing the cube: in this example, by going to the slice containing the time - Aug 08; then product - Cat5e; and finally the customer - Oracle.Oracle knows how to go to these slices by calculating the destination as in an array, not a table. For instance, suppose the dimensions are organized as shown below:Dimension Time := {'May','Jun','Jul','Aug'}Dimension Customer := {'Microsoft','IBM','Oracle','HP'}Dimension Product := {'Fiber','Cat6e','Cat5e','Serial'}

To find the measure for Oracle + Aug + Cat5e, the OLAP engine performs the navigation like this:1. Aug 08 is the fourth element of the array called Time, so travel to the fourth cell along the time dimension of the cube.2. Cat5e is the third element of the Product array, so travel to the third element.3. Oracle is the third element of the Customer array, so travel to the third element.That's it; now you've arrived at the measure you want. This is done without indexes since the dimension values serve as array pointers. Similarly, if you want to calculate the total sales to all customers in Aug 08, you do the same thing as above, except that in Step 3 you total the measures of the elements of the array without going to a specific cell.

OLAP versus OLTPHari Mailvaganam, Slice, Dice and Drill! , available at http://www.dwreview.com/OLAP/Introduction_OLAP.htmlOLAP allows business users to slice and dice data at will. Normally data in an organization is distributed in multiple data sources and are incompatible with each other. A retail example: Point-of-sales data and sales made via call-center or the Web are stored in different location and formats. It would a time consuming process for an executive to obtain OLAP reports such as - What are the most popular products purchased by customers between the ages 15 to 30?Part of the OLAP implementation process involves extracting data from the various data repositories and making them compatible. Making data compatible involves ensuring that the meaning of the data in one repository matches all other repositories. An example of incompatible data: Customer ages can be stored as birth date for purchases made over the web and stored as age categories (i.e. between 15 and 30) for in store sales.It is not always necessary to create a data warehouse for OLAP analysis. Data stored by operational systems, such as point-of-sales, are in types of databases called OLTPs. OLTP, Online Transaction Process, databases do not have any difference from a structural perspective from any other databases. The main difference, and only, difference is the way in which data is stored.Examples of OLTPs can include ERP, CRM, SCM, Point-of-Sale applications, Call Center.OLTPs are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transactions to occur instantaneously. With a database design, call data modeling, optimized for transactions the record 'Consumer name, Address, Telephone, Order Number, Order Name, Price, Payment Method' is created quickly on the database and the results can be recalled by managers equally quickly if needed.

Figure 1. Data Model for OLTPData are not typically stored for an extended period on OLTPs for storage cost and transaction speed reasons.OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview analysis of what happened. Hence the data storage (i.e. data modeling) has to be set up differently. The most common method is called the star design.

Figure 2. Star Data Model for OLAP

The central table in an OLAP start data model is called the fact table. The surrounding tables are called the dimensions. Using the above data model, it is possible to build reports that answer questions such as: The supervisor that gave the most discounts. The quantity shipped on a particular date, month, year or quarter. In which zip code did product A sell the most. To obtain answers, such as the ones above, from a data model OLAP cubes are created. OLAP cubes are not strictly cuboids - it is the name given to the process of linking data from the different dimensions. The cubes can be developed along business units such as sales or marketing. Or a giant cube can be formed with all the dimensions.

Figure 3. OLAP Cube with Time, Customer and Product DimensionsOLAP can be a valuable and rewarding business tool. Aside from producing reports, OLAP analysis can aid an organization evaluate balanced scorecard targets.

Figure 4. Steps in the OLAP Creation Process

OLAP Storage

OLAP storage is one of the critical choices to be made when designing the solution. OLAP storage comes in three forms:MOLAP - Multidimensional OLAP. In MOLAP, both the source data and the aggregations are stores in a multidimensional format. MOLAP is the fastest option for data retrieval, but requires the most disk space. Disk space is less of a concern these days with lowering storage and processing cost.ROLAP - Relational OLAP. All data, including the aggregations are stored within the source relational database. This will be a concern for larger data warehousing implementations which have higher usage needs. ROLAP is the slowest for data retrieval. Whether an aggregation exists or not, a ROLAP database must access the data warehouse itself. ROLAP is best suited for smaller data warehousing implementations.HOLAP - Hybrid OLAP. HOLAP is a combination of both the above storage methodologies. HOLAP databases store the aggregations that exist within a multidimensional structure, leaving the cell-level data itself in a relational form. Where the data is pre aggregated, HOLAP offers the performance of MOLAP, where the data must be fetched from the tables. HOLAP is as slow as ROLAP.Due to shrinking hardware and processing cost, MOLAP are generally most often used. HOLAP is a better solution if the solution is accessing a stand-alone database. ROLAP are more convenient to set up when the query demands are relatively low and also on a stand-alone database.http://businessintelligence.ittoolbox.com/documents/advantagesdisadvantages-of-molap-rolap-and-holap-15897

MOLAP Excellent performance- this is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Advantages: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations. They can also perform complex calculations. All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly. Disadvantages: It is limited in the amount of data it can handle. Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself. It requires an additional investment. Cube technology are often proprietary and do not already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional investments in human and capital resources are needed.

ROLAP This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement. Advantages: It can handle large amounts of data. The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount. It can leverage functionalities inherent in the relational database. Often, relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities. Disadvantages: Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large. It has limited by SQL functionalities. Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.

HOLAP HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. Disclaimer: Contents are not reviewed for correctness and are not endorsed or recommended by Toolbox.com or any vendor. Popular Q&A contents include summarized information from Business Intelligence Career discussion unless otherwise noted.

Operations

The information in a multidimensional cube is very difficult for users to manage because of its quantity, even if it is a concise version of the information stored to operational databases. If, for example, a store chain includes 50 stores selling 1000 items, and a specific data warehouse covers three-year-long transactions (approximately 1000 days), the number of potential events totals 50 1000 1000 = 5 10(7th). Assuming that each store can sell only 10 percent of all the available items per day, the number of events totals 5 10(6th). This is still too much data to be analyzed by users without relying on automatic tools.You have essentially two ways to reduce the quantity of data and obtain useful information: restriction and aggregation. The cube metaphor offers an easy-to-use and intuitive way to understand both of these methods, as we will discuss in the following paragraphs.

RestrictionRestricting data means separating part of the data from a cube to mark out an analysis field. In relational algebra terminology, this is called making selections and/or projections.Selection has two forms: slicing and dicing.

Restriction - selections - slicing dicing projections

Common operations include Slice and Dice, Drill-Down, Roll-Up, and Pivot:Source: Multidimensional OLAP Cubes, available at: http://www.practicaldb.com/blog/cubes/

When you slice data, you decrease cube dimensionality by setting one or more dimensions to a specific value. For example, if you set one of the sales cube dimensions to a value, such as store='EverMore', this results in the set of events associated with the items sold in the EverMore store. According to the cube metaphor, this is simply a plane of cellsthat is, a data slice that can be easily displayed in spreadsheets. In the store chain example given earlier, approximately 10(5th) events still appear in your result. If you set two dimensions to a value, such as store='EverMore' and date='4/5/2008', this will result in all the different items sold in the EverMore store on April 5 (approximately 100 events). Graphically speaking, this information is stored at the intersection of two perpendicular planes resulting in a line. If you set all the dimensions to a particular value, you will define just one event that corresponds to a point in the three-dimensional space of sales.

Dicing is a generalization of slicing. It poses some constraints on dimensional attributes to scale down the size of a cube. For example, you can select only the daily sales of the food items in April 2008 in Florida. In this way, if five stores are located in Florida and 50 food products are sold, the number of events to examine changes to 5 50 30 = 7500.Finally, a projection can be referred to as a choice to keep just one subgroup of measures for every event and reject other measures.

Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

Slice is any two-dimensional slice of the data cube. You slice a data cube to filter information. For example, the figure below shows a data cube with following dimensions: Retailer, Date and ProductIf you are interested only in the data for a specific retailer you can slice off a single (two dimensional) layer. In our example the slice contains information on date and product for department stores.

Dice: The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

Dice is the "rotation" of the cube to reveal another, different slice of data. For exploring data from various perspectives, you can dice a data cube by exchanging the dimension for other dimensions. For example, after exploring the data by date and product for a specific retailer (orange slice on the left cube), you want to get deeper information on date and retailer for a specific product.

Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

Drill Down is the exploration of data to subsequent levels of more detail along a dimension. For example, the dimension "Retailer" can be drilled-down to specific retailers, the dimension "Date" can be drilled-down to months, and the dimension "Product" finally, can be explored in more detail by single products.

Roll-up: (Aggregate, Consolidate) A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.

Roll-up is the aggregation of data to subsequent levels of summary, along a dimension. This implies that dimensions are typically hierarchical in nature based on parent/child relationships between dimension values. Pivot: This operation is also called rotate operation. It rotates the data in order to provide an alternative presentation of data the report or page display takes a different dimensional orientation.

ConclusionsIn summary, a multidimensional cube hinges on a fact relevant to decision-making. It shows a set of events for which numeric measures provide a quantitative description. Each cube axis shows a possible analysis dimension. Each dimension can be analyzed at different detail levels specified by hierarchically structured attributes.

OLAP Benefits Successful OLAP applications increase the productivity of business managers, developers, and whole organizations. The inherent flexibility of OLAP systems means business users of OLAP applications can become more self-sufficient. Managers are no longer dependent on IT to make schema changes, to create joins, or worse. Perhaps more importantly, OLAP enables managers to model problems that would be impossible using less flexible systems with lengthy and inconsistent response times. More control and timely access to strategic information equal more effective decision-making. IT developers also benefit from using the right OLAP software. Although it is possible to build an OLAP system using software designed for transaction processing or data collection, it is certainly not a very efficient use of developer time. By using software specifically designed for OLAP, developers can deliver applications to business users faster, providing better service. Faster delivery of applications also reduces the applications backlog. OLAP reduces the applications backlog still further by making business users self-sufficient enough to build their own models. However, unlike standalone departmental applications running on PC networks, OLAP applications are dependent on Data Warehouses and transaction processing systems to refresh their source level data. As a result, IT gains more self-sufficient users without relinquishing control over the integrity of the data. IT also realizes more efficient operations through OLAP. By using software designed for OLAP, IT reduces the query drag and network traffic on transaction systems or the Data Warehouse. Lastly, by providing the ability to model real business problems and a more efficient use of people resources, OLAP enables the organization as a whole to respond more quickly to market demands. Market responsiveness, in turn, often yields improved revenue and profitability.

OLAP functionality is: Multidimensional -- OLAP services provide a wide variety of possible views or a multidimensional conceptual view of the data by supporting a dimensional aggregation path or hierarchies and/or multiple hierarchies. Easy to understand -- The data mart designed for OLAP analysis should handle any business logic and statistical analysis that is relevant to the application and the developer, while at the same time, keeps it easy enough for the target user. Interactive -- OLAP helps the user synthesize business information through comparative, personalized viewing, as well as thorough analysis of historical and projected data in various "what-if" data model scenarios. The users are allowed to define new ad hoc calculations as part of the analysis and can report on the data in any desired way. Fast -- OLAP services are usually implemented in a multi-user client/server mode and offer consistently rapid responses to queries, regardless of database size and complexity. The consolidated business data can be pre-aggregated along with the hierarchies in all dimensions to reduce the runtime calculation for building the OLAP reports.

Overview of the Dimensional Data ModelAvailable at: http://docs.oracle.com/cd/B28359_01/olap.111/b28124/overview.htm#OLAUG9115

Dimensional objects are an integral part of OLAP. Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the queries are complex. The dimensional objects and the OLAP engine are designed to solve complex queries in real time.The dimensional objects include cubes, measures, dimensions, attributes, levels and hierarchies. The simplicity of the model is inherent because it defines objects that represent real-world business entities. Analysts know: which business measures they are interested in examining which dimensions and attributes make the data meaningful how the dimensions of their business are organized into levels and hierarchies.

Figure 1. Diagram of the OLAP Dimensional Model

Description of Diagram of the OLAP Dimensional ModelThe dimensional data model is highly structured. Structure implies rules that govern the relationships among the data and control how the data can be queried. Cubes are the physical implementation of the dimensional model, and thus are highly optimized for dimensional queries. The OLAP engine leverages this innate dimensionality in performing highly efficient cross-cube joins for inter-row calculations, outer joins for time series analysis, and indexing. Dimensions are pre-joined to the measures. The technology that underlies cubes is based on an indexed multidimensional array model, which provides direct cell access.The OLAP engine manipulates dimensional objects in the same way that the SQL engine manipulates relational objects. However, because the OLAP engine is optimized to calculate analytic functions, and dimensional objects are optimized for analysis, analytic and row functions can be calculated much faster in OLAP than in SQL.The dimensional model enables Oracle OLAP to support high-end business intelligence tools and applications such as OracleBI Discoverer Plus OLAP, OracleBI Spreadsheet Add-In, OracleBI Suite Enterprise Edition, BusinessObjects Enterprise, and Cognos ReportNet.

CubesCubes provide a means of organizing measures that have the same shape, that is, they have the exact same dimensions. Measures in the same cube can easily be analyzed and displayed together. A cube usually corresponds to a single fact table or view.

MeasuresMeasures populate the cells of a cube with the facts collected about business operations. Measures are organized by dimensions, which typically include a Time dimension.An analytic database contains snapshots of historical data, derived from data in a transactional database, legacy system, syndicated sources, or other data sources. Three years of historical data is generally considered to be appropriate for analytic applications.Measures are static and consistent while analysts are using them to inform their decisions. They are updated in a batch window at regular intervals: weekly, daily, or periodically throughout the day. Some administrators refresh their data by adding periods to the time dimension of a measure, and may also roll off an equal number of the oldest time periods. Each update provides a fixed historical record of a particular business activity for that interval. Other administrators do a full rebuild of their data rather than performing incremental updates.A critical decision in defining a measure is the lowest level of detail. Users may never view this detail data, but it determines the types of analysis that can be performed. For example, market analysts (unlike order entry personnel) do not need to know that Beth Miller in Ann Arbor, Michigan, placed an order for a size 10 blue polka-dot dress on July 6, 2006, at 2:34 p.m. But they might want to find out which color of dress was most popular in the summer of 2006 in the Midwestern United States.The base level determines whether analysts can get an answer to this question. For this particular question, Time could be rolled up into months, Customer could be rolled up into regions, and Product could be rolled up into items (such as dresses) with an attribute of color. However, this level of aggregate data could not answer the question: At what time of day are women most likely to place an order? An important decision is the extent to which the data has been aggregated before being loaded into a data warehouse.

DimensionsDimensions contain a set of unique values that identify and categorize data. They form the edges of a cube, and thus of the measures within the cube. Because measures are typically multidimensional, a single value in a measure must be qualified by a member of each dimension to be meaningful. For example, the Sales measure has four dimensions: Time, Customer, Product, and Channel. A particular Sales value (43,613.50) only has meaning when it is qualified by a specific time period (Feb-06), a customer (Warren Systems), a product (Portable PCs), and a channel (Catalog).Base-level dimension values correspond to the unique keys of a fact table.

Hierarchies and LevelsA hierarchy is a way to organize data at different levels of aggregation. In viewing data, analysts use dimension hierarchies to recognize trends at one level, drill down to lower levels to identify reasons for these trends, and roll up to higher levels to see what affect these trends have on a larger sector of the business.The elements of a dimension can be organized as a hierarchy, a set of parent-child relationships, typically where a parent member summarizes its children. Parent elements can further be aggregated as the children of another parent. For example May 2005's parent is Second Quarter 2005 which is in turn the child of Year 2005. Similarly cities are the children of regions; products roll into product groups and individual expense items into types of expenditure.

Level-Based HierarchiesEach level represents a position in the hierarchy. Each level above the base (or most detailed) level contains aggregate values for the levels below it. The members at different levels have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of 2005, thus 2005 is the parent of Q1-05 and Q2-05.Suppose a data warehouse contains snapshots of data taken three times a day, that is, every 8 hours. Analysts might normally prefer to view the data that has been aggregated into days, weeks, quarters, or years. Thus, the Time dimension needs a hierarchy with at least five levels.Hierarchies and levels have a many-to-many relationship. A hierarchy typically contains several levels, and a single level can be included in more than one hierarchy.Each level typically corresponds to a column in a dimension table or view. The base level is the primary key.

Value-Based HierarchiesAlthough hierarchies are typically composed of named levels, they do not have to be. The parent-child relations among dimension members may not define meaningful levels. For example, in an employee dimension, each manager has one or more reports, which forms a parent-child relation. Creating levels based on these relations (such as individual contributors, first-level managers, second-level managers, and so forth) may not be meaningful for analysis. Likewise, the line item dimension of financial data does not have levels. This type of hierarchy is called a value-based hierarchy.

AttributesAn attribute provides additional information about the data. Some attributes are used for display. You might have attributes like colors, flavors, or sizes. This type of attribute can be used for data selection and answering questions such as: Which colors were the most popular in women's dresses in the summer of 2005? How does this compare with the previous summer?Time attributes can provide information about the Time dimension that may be useful in some types of analysis, such as identifying the last day or the number of days in each time period.Each attribute typically corresponds to a column in dimension table or view.

17