o lap mining

Upload: atanu-misra

Post on 04-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 o Lap Mining

    1/92

  • 8/13/2019 o Lap Mining

    2/92

    2

    Chapter 32 - Objectives

    Purpose of online analytical processing (OLAP)and how OLAP differs from data warehousing.

    Key features of OLAP applications.

    Potential benefits associated with successfulOLAP applications.

    Rules for OLAP tools and main types of toolsincluding: multi-dimensional OLAP (MOLAP),relational OLAP (ROLAP), and managed queryenvironment (MQE).

  • 8/13/2019 o Lap Mining

    3/92

  • 8/13/2019 o Lap Mining

    4/92

    4

    Data Warehousing and End-User Access Tools

    Accompanying growth in data warehouses isincreasing demands for more powerful accesstools providing advanced analytical

    capabilities.

    Key developments include:

    Online analytical processing (OLAP).

    SQL extensions for complex data analysis. Data mining tools.

  • 8/13/2019 o Lap Mining

    5/92

    5

    Introducing OLAP

    The dynamic synthesis, analysis, and

    consolidation of large volumes of multi-

    dimensional data, Codd (1993).

    Describes a technology that uses a multi-

    dimensional view of aggregate data to provide

    quick access to strategic information for

    purposes of advanced analysis.

  • 8/13/2019 o Lap Mining

    6/92

    6

    Introducing OLAP

    Enables users to gain a deeper understanding

    and knowledge about various aspects of their

    corporate data through fast, consistent,

    interactive access to a wide variety of possibleviews of the data.

    Allows users to view corporate data in such a

    way that it is a better model of the true

    dimensionality of the enterprise.

  • 8/13/2019 o Lap Mining

    7/92

    7

    Introducing OLAP

    Can easily answer who? and what?

    questions, however, ability to answer what if?

    and why? type questions distinguishes OLAP

    from general-purpose query tools.

    Types of analysis ranges from basic navigation

    and browsing (slicing and dicing) tocalculations, to more complex analyses such as

    time series and complex modeling.

  • 8/13/2019 o Lap Mining

    8/92

    8

    OLAP Benchmarks

    OLAP Council published an analytical

    processing benchmark referred to as the APB-

    1 (OLAP Council, 1998).

    Aim is to measure a servers overall OLAP

    performance rather than the performance of

    individual tasks.

  • 8/13/2019 o Lap Mining

    9/92

    9

    OLAP Benchmarks

    APB-1 assesses most common business operationsincluding:

    bulk loading of data from internal/external datasources;

    incremental loading of data from operational systems;

    aggregation of input/level data along hierarchies;

    calculation of new data based on business models;

    time series analysis;

    queries with a high degree of complexity;

    drill-down through hierarchies;

    ad hocqueries;

    multiple online sessions.

  • 8/13/2019 o Lap Mining

    10/92

    10

    OLAP Benchmarks

    OLAP applications are judged on their abilityto provide just-in-time (JIT) information, acore requirement of supporting effectivedecision-making.

    Assessing a servers ability to satisfy thisrequirement is more than measuring

    processing performance but includes itsabilities to model complex businessrelationships and to respond to changingbusiness requirements.

  • 8/13/2019 o Lap Mining

    11/92

    11

    OLAP Benchmarks

    APB-1 uses a standard benchmark metric

    called AQM (Analytical Queries per Minute).

    AQM represents number of analytical queries

    processed per minute including data loading

    and computation time. Thus, AQM

    incorporates data loading performance,calculation performance, and query

    performance into a singe metric.

  • 8/13/2019 o Lap Mining

    12/92

    12

    OLAP Benchmarks

    Publication of APB-1 benchmark results must

    include both the database schema and all code

    required for executing the benchmark.

    An essential requirement of all OLAP

    applications is ability to provide users with JIT

    information, to make effective decisions aboutan organizations strategic directions.

  • 8/13/2019 o Lap Mining

    13/92

    13

    OLAP Applications

    JIT information is computed data that usually

    reflects complex relationships and is often

    calculated on the fly.

    Also, as data relationships may not be knownin advance, the data model must be flexible.

  • 8/13/2019 o Lap Mining

    14/92

    14

    Examples of OLAP Applications in Various

    Functional Areas

  • 8/13/2019 o Lap Mining

    15/92

    15

    OLAP Applications

    Although OLAP applications are found in

    widely divergent functional areas, all have

    following key features:

    multi-dimensional views of data;

    support for complex calculations;

    time intelligence.

  • 8/13/2019 o Lap Mining

    16/92

  • 8/13/2019 o Lap Mining

    17/92

    17

    OLAP Applications - Support for Complex

    Calculations

    Must provide a range of powerful

    computational methods such as that required

    by sales forecasting, which uses trend

    algorithms such as moving averages andpercentage growth.

    Mechanisms for implementing computationalmethods should be clear and non-procedural.

  • 8/13/2019 o Lap Mining

    18/92

    18

    OLAP ApplicationsTime Intelligence

    Key feature of almost any analyticalapplication as performance is almost always

    judged over time.

    Time hierarchy is not always used in samemanner as other hierarchies.

    Concepts such as year-to-date and period-over-period comparisons should be easily defined.

  • 8/13/2019 o Lap Mining

    19/92

  • 8/13/2019 o Lap Mining

    20/92

    20

    Representing Multi-Dimensional Data

    Example of two-dimensional query.

    What is the total revenue generated by property

    sales in each city, in each quarter of 1997?

    Choice of representation is based on types of

    queries end-user may ask.

    Compare representation - three-field relationaltable versus two-dimensional matrix.

  • 8/13/2019 o Lap Mining

    21/92

    21

    Multi-Dimensional Data as Three-Field Table

    versus Two-Dimensional Matrix

  • 8/13/2019 o Lap Mining

    22/92

    22

    Representing Multi-Dimensional Data

    Example of three-dimensional query.

    What is the total revenue generated by property

    sales for each type of property (Flat or House) in

    each city, in each quarter of 1997?

    Compare representation - four-field relational

    table versus three-dimensional cube.

  • 8/13/2019 o Lap Mining

    23/92

    23

    Multi-Dimensional Data as Four-Field

    Table versus Three-Dimensional Cube

  • 8/13/2019 o Lap Mining

    24/92

    24

    Representing Multi-Dimensional Data

    Cube represents data as cells in an array.

    Relational table only represents multi-

    dimensional data in two dimensions.

  • 8/13/2019 o Lap Mining

    25/92

    25

    Multi-Dimensional OLAP Servers

    Use multi-dimensional structures to store data

    and relationships between data.

    Multi-dimensional structures are bestvisualized as cubes of data, and cubes within

    cubes of data. Each side of cube is a dimension.

    A cube can be expanded to include otherdimensions.

  • 8/13/2019 o Lap Mining

    26/92

    26

    Multi-Dimensional OLAP Servers

    A cube supports matrix arithmetic.

    Multi-dimensional query response time

    depends on how many cells have to be addedon the fly.

    As number of dimensions increases, number of

    the cubes cells increases exponentially.

  • 8/13/2019 o Lap Mining

    27/92

    27

    Multi-Dimensional OLAP Servers

    However, majority of multi-dimensionalqueries use summarized, high-level data.

    Solution is to pre-aggregate (consolidate) alllogical subtotals and totals along alldimensions.

    Pre-aggregation is valuable, as typical

    dimensions are hierarchical in nature. (e.g. Time dimension hierarchy - years, quarters,

    months, weeks, and days)

  • 8/13/2019 o Lap Mining

    28/92

    28

    Multi-Dimensional OLAP Servers

    Predefined hierarchy allows logical pre-

    aggregation and, conversely, allows for a

    logical drill-down.

    Supports common analytical operations

    Consolidation.

    Drill-down.

    Slicing and dicing.

  • 8/13/2019 o Lap Mining

    29/92

    29

    Multi-Dimensional OLAP Servers

    Consolidation - aggregation of data such assimple roll-ups or complex expressionsinvolving inter-related data.

    Drill-Down - is reverse of consolidation andinvolves displaying the detailed data thatcomprises the consolidated data.

    Slicing and Dicing - (also called pivoting) refersto the ability to look at the data from differentviewpoints.

  • 8/13/2019 o Lap Mining

    30/92

    30

    Multi-Dimensional OLAP servers

    Can store data in a compressed form by

    dynamically selecting physical storage

    organizations and compression techniques that

    maximize space utilization.

    Dense data (i.e., data that exists for high

    percentage of cells) can be stored separately

    from sparse data (i.e., significant percentage of

    cells are empty).

  • 8/13/2019 o Lap Mining

    31/92

    31

    Multi-Dimensional OLAP Servers

    Ability to omit empty or repetitive cells can

    greatly reduce the size of the cube and the

    amount of processing.

    Allows analysis of exceptionally large amounts

    of data.

  • 8/13/2019 o Lap Mining

    32/92

    32

    Multi-Dimensional OLAP Servers

    In summary, pre-aggregation, dimensional

    hierarchy, and sparse data management can

    significantly reduce the size of the cube and the

    need to calculate values on-the-fly.

    Removes need for multi-table joins and

    provides quick and direct access to arrays of

    data, thus significantly speeding up execution

    of multi-dimensional queries.

  • 8/13/2019 o Lap Mining

    33/92

    33

    Codds Rules for OLAP Systems

    In 1993, E.F. Codd formulated twelve rules as

    the basis for selecting OLAP tools.

    Multi-dimensional conceptual view

    Transparency

    Accessibility

    Consistent reporting performance

    Client-server architectureGeneric dimensionality

  • 8/13/2019 o Lap Mining

    34/92

    34

    Codds Rules for OLAP

    Dynamic sparse matrix handling

    Multi-user support

    Unrestricted cross-dimensional operations

    Intuitive data manipulation

    Flexible reporting

    Unlimited dimensions and aggregation levels.

  • 8/13/2019 o Lap Mining

    35/92

    35

    Codds Rules for OLAP Systems

    There are proposals to re-define or extend the

    rules. For example, to also include:

    Comprehensive database management tools.

    Ability to drill down to detail (source record) level. Incremental database refresh.

    SQL interface to the existing enterprise

    environment.

  • 8/13/2019 o Lap Mining

    36/92

    36

    Categories of OLAP Tools

    OLAP tools are categorized according to the

    architecture of the underlying database.

    Three main categories of OLAP tools include Multi-dimensional OLAP (MOLAP or MD-OLAP)

    Relational OLAP (ROLAP), also called multi-

    relational OLAP

    Managed query environment (MQE)

  • 8/13/2019 o Lap Mining

    37/92

    37

    Multi-Dimensional OLAP (MOLAP)

    Uses specialized data structures and multi-

    dimensional Database Management Systems

    (MDDBMSs) to organize, navigate, and

    analyze data.

    Data is typically aggregated and stored

    according to predicted usage to enhance query

    performance.

  • 8/13/2019 o Lap Mining

    38/92

    38

    Multi-Dimensional OLAP (MOLAP)

    Use array technology and efficient storage

    techniques that minimize the disk space

    requirements through sparse data

    management.

    Provides excellent performance when data is

    used as designed, and the focus is on data for a

    specific decision-support application.

  • 8/13/2019 o Lap Mining

    39/92

    39

    Multi-Dimensional OLAP (MOLAP)

    Traditionally, require a tight coupling with the

    application layer and presentation layer.

    Recent trends segregate the OLAP from the

    data structures through the use of published

    application programming interfaces (APIs).

  • 8/13/2019 o Lap Mining

    40/92

    40

    Typical Architecture for MOLAP Tools

  • 8/13/2019 o Lap Mining

    41/92

    41

    MOLAP Tools - Development Issues

    Underlying data structures are limited in their

    ability to support multiple subject areas and to

    provide access to detailed data.

    Navigation and analysis of data is limited

    because the data is designed according to

    previously determined requirements.

  • 8/13/2019 o Lap Mining

    42/92

    42

    MOLAP Tools - Development Issues

    MOLAP products require a different set of

    skills and tools to build and maintain the

    database, thus increasing the cost and

    complexity of support.

  • 8/13/2019 o Lap Mining

    43/92

    43

    Relational OLAP (ROLAP)

    Fastest growing style of OLAP technology.

    Supports RDBMS products using a metadata

    layer - avoids need to create a static multi-

    dimensional data structure - facilitates the

    creation of multiple multi-dimensional views of

    the two-dimensional relation.

  • 8/13/2019 o Lap Mining

    44/92

    44

    Relational OLAP (ROLAP)

    To improve performance, some products use

    SQL engines to support complexity of multi-

    dimensional analysis, while others recommend,

    or require, the use of highly denormalizeddatabase designs such as the star schema.

  • 8/13/2019 o Lap Mining

    45/92

    45

    Typical Architecture for ROLAP Tools

  • 8/13/2019 o Lap Mining

    46/92

    46

    ROLAP Tools - Development Issues

    Middleware to facilitate the development of

    multi-dimensional applications. (Software that

    converts the two-dimensional relation into a

    multi-dimensional structure).

    Development of an option to create persistent,

    multi-dimensional structures with facilities to

    assist in the administration of these structures.

  • 8/13/2019 o Lap Mining

    47/92

    47

    Hybrid OLAP (HOLAP)

    Can use data from either a RDBMS directly or

    a multi-dimension server.

  • 8/13/2019 o Lap Mining

    48/92

    48

    Managed Query Environment (MQE)

    Relatively new development.

    Provide limited analysis capability, either

    directly against RDBMS products, or by usingan intermediate MOLAP server.

  • 8/13/2019 o Lap Mining

    49/92

    49

    Managed Query Environment (MQE)

    Deliver selected data directly from DBMS or

    via a MOLAP server to desktop (or local

    server) in form of a datacube, where it is

    stored, analyzed, and maintained locally.

    Promoted as being relatively simple to install

    and administer with reduced cost and

    maintenance.

  • 8/13/2019 o Lap Mining

    50/92

    50

    Typical Architecture for MQE Tools

  • 8/13/2019 o Lap Mining

    51/92

    51

    MQE Tools - Development Issues

    Architecture results in significant dataredundancy and may cause problems fornetworks that support many users.

    Ability of each user to build a custom datacubemay cause a lack of data consistency amongusers.

    Only a limited amount of data can beefficiently maintained.

  • 8/13/2019 o Lap Mining

    52/92

    52

    OLAP Extensions to SQL

    SQL promoted as easy to learn, non-procedural, free-format, DBMS-independent,and international standard.

    However, major disadvantage has been inabilityto represent many of the questions mostcommonly asked by business analysts.

    IBM and Oracle jointly proposed OLAPextensions to SQL early in 1999, adopted as anamendment to SQL.

  • 8/13/2019 o Lap Mining

    53/92

  • 8/13/2019 o Lap Mining

    54/92

    54

    OLAP Extensions to SQL - RISQL

    Designed for business analysts.

    Set of extensions that augments SQL with a

    variety of powerful operations appropriate todata analysis and decision-support applications

    such as ranking, moving averages,

    comparisons, market share, this year versus

    last year.

  • 8/13/2019 o Lap Mining

    55/92

    55

    Use of the RISQL CUME Function

    Show the quarterly sales for branch office B003,along with the monthly year-to-date figures.

    SELECT quarter, quarterlySales, CUME(quarterlySales)

    AS Year-to-DateFROM BranchSales

    WHERE branchNo = B003;

  • 8/13/2019 o Lap Mining

    56/92

    56

    Use of the RISQL MOVINGAVG / MOVINGSUM

    Function

    Show the first six monthly sales for branch office

    B003 without the effect of seasonality.

    SELECT month, monthlySales,

    MOVINGAVG(monthlySales) AS 3-MonthMovingAvg,MOVINGSUM(monthlySales) AS 3-MonthMovingSum

    FROM BranchSales

    WHERE branchNo = B003;

  • 8/13/2019 o Lap Mining

    57/92

    57

    Data Mining

    The process of extracting valid, previously

    unknown, comprehensible, and actionable

    information from large databases and using it

    to make crucial business decisions (Simoudis,1996).

    Involves analysis of data and use of software

    techniques for finding hidden and unexpected

    patterns and relationships in sets of data.

  • 8/13/2019 o Lap Mining

    58/92

    58

    Data Mining

    Reveals information that is hidden andunexpected, as little value in finding patternsand relationships that are already intuitive.

    Patterns and relationships are identified byexamining the underlying rules and features inthe data.

    Tends to work from the data up and most

    accurate results normally require largevolumes of data to deliver reliable conclusions.

  • 8/13/2019 o Lap Mining

    59/92

    59

    Data Mining

    Starts by developing an optimal representationof structure of sample data, during which timeknowledge is acquired and extended to largersets of data.

    Data mining can provide huge paybacks forcompanies who have made a significantinvestment in data warehousing.

    Relatively new technology, however alreadyused in a number of industries.

  • 8/13/2019 o Lap Mining

    60/92

    60

    Examples of Applications of Data Mining

    Retail / Marketing

    Identifying buying patterns of customers.

    Finding associations among customer

    demographic characteristics.Predicting response to mailing campaigns.

    Market basket analysis.

  • 8/13/2019 o Lap Mining

    61/92

    61

    Examples of Applications of Data Mining

    Banking

    Detecting patterns of fraudulent credit card

    use.

    Identifying loyal customers.

    Predicting customers likely to change their

    credit card affiliation.

    Determining credit card spending bycustomer groups.

  • 8/13/2019 o Lap Mining

    62/92

    62

    Examples of Applications of Data Mining

    Insurance

    Claims analysis.

    Predicting which customers will buy new policies.

    Medicine

    Characterizing patient behavior to predict surgery

    visits.

    Identifying successful medical therapies fordifferent illnesses.

  • 8/13/2019 o Lap Mining

    63/92

    63

    Data Mining Operations

    Four main operations include: Predictive modeling.

    Database segmentation.

    Link analysis.

    Deviation detection.

    There are recognized associations between theapplications and the corresponding operations.

    e.g. Direct marketing strategies use databasesegmentation.

  • 8/13/2019 o Lap Mining

    64/92

    64

    Data Mining Techniques

    Techniques are specific implementations of the

    data mining operations.

    Each operation has its own strengths and

    weaknesses.

    Data mining tools sometimes offer a choice of

    operations to implement a technique.

  • 8/13/2019 o Lap Mining

    65/92

    65

    Data Mining Techniques

    Criteria for selection of tool includes

    Suitability for certain input data types.

    Transparency of the mining output.

    Tolerance of missing variable values. Level of accuracy possible.

    Ability to handle large volumes of data.

    Data Mining Operations and Associated

  • 8/13/2019 o Lap Mining

    66/92

    66

    Data Mining Operations and Associated

    Techniques

  • 8/13/2019 o Lap Mining

    67/92

    67

    Predictive Modeling

    Similar to the human learning experience

    uses observations to form a model of the important

    characteristics of some phenomenon.

    Uses generalizations of real world and abilityto fit new data into a general framework.

    Can analyze a database to determine essential

    characteristics (model) about the data set.

  • 8/13/2019 o Lap Mining

    68/92

    68

    Predictive Modeling

    Model is developed using a supervised learning

    approach, which has two phases: training and

    testing.

    Training builds a model using a large sample ofhistorical data called a training set.

    Testing involves trying out the model on new,

    previously unseen data to determine its accuracy

    and physical performance characteristics.

  • 8/13/2019 o Lap Mining

    69/92

    69

    Predictive Modeling

    Applications of predictive modeling include

    customer retention management, credit

    approval, cross selling, and direct marketing.

    Two techniques associated with predictive

    modeling: classification and value prediction,

    distinguished by nature of the variable being

    predicted.

  • 8/13/2019 o Lap Mining

    70/92

    70

    Predictive Modeling - Classification

    Used to establish a specific predetermined class

    for each record in a database from a finite set

    of possible class values.

    Two specializations of classification: tree

    induction and neural induction.

  • 8/13/2019 o Lap Mining

    71/92

    71

    Example of Classification using Tree Induction

    Example of Classification using Neural

  • 8/13/2019 o Lap Mining

    72/92

    72

    Example of Classification using Neural

    Induction

  • 8/13/2019 o Lap Mining

    73/92

    73

    Predictive Modeling - Value Prediction

    Used to estimate a continuous numeric value

    that is associated with a database record.

    Uses the traditional statistical techniques oflinear regression and nonlinear regression.

    Relatively easy to use and understand.

  • 8/13/2019 o Lap Mining

    74/92

    74

    Predictive Modeling - Value Prediction

    Linear regression attempts to fit a straight line

    through a plot of the data, such that the line is

    the best representation of the average of all

    observations at that point in the plot.

    Problem is that the technique only works well

    with linear data and is sensitive to the presence

    of outliers (i.e., data values, which do notconform to the expected norm).

  • 8/13/2019 o Lap Mining

    75/92

    75

    Predictive Modeling - Value Prediction

    Although nonlinear regression avoids the main

    problems of linear regression, still not flexible

    enough to handle all possible shapes of the data

    plot.

    Statistical measurements are fine for building

    linear models that describe predictable data

    points, however, most data is not linear in

    nature.

  • 8/13/2019 o Lap Mining

    76/92

    76

    Predictive Modeling - Value Prediction

    Data mining requires statistical methods that

    can accommodate non-linearity, outliers, and

    non-numeric data.

    Applications of value prediction include credit

    card fraud detection or target mailing list

    identification.

  • 8/13/2019 o Lap Mining

    77/92

  • 8/13/2019 o Lap Mining

    78/92

    78

    Database Segmentation

    Less precise than other operations thus lesssensitive to redundant and irrelevant features.

    Sensitivity can be reduced by ignoring a subset

    of the attributes that describe each instance orby assigning a weighting factor to eachvariable.

    Applications of database segmentation includecustomer profiling, direct marketing, and crossselling.

    Example of Database Segmentation using a

  • 8/13/2019 o Lap Mining

    79/92

    79

    Example of Database Segmentation using a

    Scatterplot

  • 8/13/2019 o Lap Mining

    80/92

    80

    Database Segmentation

    Associated with demographic or neural

    clustering techniques, distinguished by:

    Allowable data inputs.

    Methods used to calculate the distancebetween records.

    Presentation of the resulting segments for

    analysis.

  • 8/13/2019 o Lap Mining

    81/92

    81

    Link Analysis

    Aims to establish links (associations) betweenrecords, or sets of records, in a database.

    There are three specializations

    Associations discovery.Sequential pattern discovery.

    Similar time sequence discovery.

    Applications include product affinity analysis,direct marketing, and stock price movement.

  • 8/13/2019 o Lap Mining

    82/92

    82

    Link Analysis - Associations Discovery

    Finds items that imply the presence of other

    items in the same event.

    Affinities between items are represented by

    association rules.

    e.g. When customer rents property for more than 2

    years and is more than 25 years old, in 40% of cases,

    customer will buy a property. Association happens in

    35% of all customers who rent properties.

  • 8/13/2019 o Lap Mining

    83/92

    83

    Link Analysis - Sequential Pattern Discovery

    Finds patterns between events such that the

    presence of one set of items is followed by

    another set of items in a database of events

    over a period of time. e.g. Used to understand long-term customer buying

    behavior.

    Link Analysis Similar Time Sequence

  • 8/13/2019 o Lap Mining

    84/92

    84

    Link Analysis - Similar Time Sequence

    Discovery

    Finds links between two sets of data that are

    time-dependent, and is based on the degree of

    similarity between the patterns that both time

    series demonstrate.

    e.g. Within three months of buying property, new

    home owners will purchase goods such as cookers,

    freezers, and washing machines.

  • 8/13/2019 o Lap Mining

    85/92

    85

    Deviation Detection

    Relatively new operation in terms of

    commercially available data mining tools.

    Often a source of true discovery because itidentifies outliers, which express deviation

    from some previously known expectation and

    norm.

  • 8/13/2019 o Lap Mining

    86/92

    86

    Deviation Detection

    Can be performed using statistics and

    visualization techniques or as a by-product of

    data mining.

    Applications include fraud detection in the use

    of credit cards and insurance claims, quality

    control, and defects tracing.

    Example of Database Segmentation using a

  • 8/13/2019 o Lap Mining

    87/92

    87

    p S g g

    Visualization

  • 8/13/2019 o Lap Mining

    88/92

    88

    Data Mining Tools

    There are a growing number of commercialdata mining tools on the marketplace.

    Important characteristics of data mining toolsinclude:

    Data preparation facilities.

    Selection of data mining operations.

    Product scalability and performance. Facilities for visualization of results.

  • 8/13/2019 o Lap Mining

    89/92

    89

    Data Mining and Data Warehousing

    Major challenge to exploit data mining is

    identifying suitable data to mine.

    Data mining requires single, separate, clean,integrated, and self-consistent source of data.

  • 8/13/2019 o Lap Mining

    90/92

    90

    Data Mining and Data Warehousing

    A data warehouse is well equipped for

    providing data for mining.

    Data quality and consistency is a prerequisitefor mining to ensure the accuracy of the

    predictive models. Data warehouses are

    populated with clean, consistent data.

  • 8/13/2019 o Lap Mining

    91/92

    91

    Data Mining and Data Warehousing

    Advantageous to mine data from multiplesources to discover as many interrelationships

    as possible. Data warehouses contain data from

    a number of sources.

    Selecting relevant subsets of records and fields

    for data mining requires query capabilities of

    the data warehouse.

  • 8/13/2019 o Lap Mining

    92/92

    Data Mining and Data Warehousing

    Results of a data mining study are useful if

    there is some way to further investigate the

    uncovered patterns. Data warehouses provide

    capability to go back to the data source.