Download - Data Warehouse

Transcript
Page 1: Data Warehouse

Data WarehouseData WarehouseLutfi FreijLutfi Freij

Konstantin RimarchukKonstantin RimarchukVasken ChamlaianVasken Chamlaian

John SahakianJohn SahakianSuzan TonSuzan Ton

Page 2: Data Warehouse

InmonInmon

Father of the data warehouseFather of the data warehouse

Co-creator of the CorporateCo-creator of the Corporate

Information Factory.Information Factory.

He has 35 years ofHe has 35 years of

experience in databaseexperience in database

technology managementtechnology management

and data warehouse design. and data warehouse design.

Page 3: Data Warehouse

Inmon-Cont’dInmon-Cont’d

Bill has written about a varietyBill has written about a variety of topics on the building, usage,of topics on the building, usage, & maintenance of the data warehouse& maintenance of the data warehouse & the Corporate Information Factory.& the Corporate Information Factory.

He has written more than 650He has written more than 650articles (Datamation, ComputerWorld, articles (Datamation, ComputerWorld, and Byte Magazine).and Byte Magazine).

Inmon has published 45 books.Inmon has published 45 books. Many of books has been translated to Chinese, Dutch, French, German, Many of books has been translated to Chinese, Dutch, French, German,

Japanese, Korean, Portuguese, Russian, and Spanish. Japanese, Korean, Portuguese, Russian, and Spanish.

Page 4: Data Warehouse

IntroductionIntroduction

What is Data Warehouse?What is Data Warehouse?A data warehouse is a collection of integrated A data warehouse is a collection of integrated databases designed to support a DSS.databases designed to support a DSS.

According to Inmon’s (father of data warehousing) According to Inmon’s (father of data warehousing) definition(Inmon,1992a,p.5):definition(Inmon,1992a,p.5): It is a collection of integrated, subject-oriented It is a collection of integrated, subject-oriented

databases designed to support the DSS function, databases designed to support the DSS function, where each unit of data is non-volatile and relevant where each unit of data is non-volatile and relevant to some moment in time.to some moment in time.

Page 5: Data Warehouse

Introduction-Cont’d.Introduction-Cont’d.

Where is it used?Where is it used?

It is used for evaluating future strategy.It is used for evaluating future strategy.

It needs a successful technician:It needs a successful technician: Flexible.Flexible. Team player.Team player. Good balance of business and technical Good balance of business and technical

understanding.understanding.

Page 6: Data Warehouse

Introduction-Cont’d.Introduction-Cont’d.

The ultimate use of data warehouse is Mass Customization.The ultimate use of data warehouse is Mass Customization. For example, it increased Capital One’s customers from For example, it increased Capital One’s customers from

1 million to approximately 9 millions in 8 years.1 million to approximately 9 millions in 8 years.

Just like a muscle: DW increases in strength with active use.Just like a muscle: DW increases in strength with active use. With each new test and product, valuable information is With each new test and product, valuable information is

added to the DW, allowing the analyst to learn from the added to the DW, allowing the analyst to learn from the success and failure of the past.success and failure of the past.

The key to survival:The key to survival: Is the ability to analyze, plan, and react to changing Is the ability to analyze, plan, and react to changing

business conditions in a much more rapid fashion.business conditions in a much more rapid fashion.

Page 7: Data Warehouse

Data WarehouseData Warehouse

In order for data to be effective, DW must be:In order for data to be effective, DW must be: Consistent.Consistent. Well integrated.Well integrated. Well defined.Well defined. Time stamped.Time stamped.

DW environment:DW environment: The data store, data mart & the metadata.The data store, data mart & the metadata.

Page 8: Data Warehouse

The Data StoreThe Data Store

An operational data store (ODS) stores data for a An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a specific application. It feeds the data warehouse a stream of desired raw data.stream of desired raw data.

Is the most common component of DW environment.Is the most common component of DW environment.

Data store is generally subject oriented, volatile, Data store is generally subject oriented, volatile, current commonly focused on customers, products, current commonly focused on customers, products, orders, policies, claims, etc…orders, policies, claims, etc…

Page 9: Data Warehouse

Data Store & Data Warehouse Data Store & Data Warehouse

Data store & Data warehouse, table 10-1 page Data store & Data warehouse, table 10-1 page 296296

Page 10: Data Warehouse

The data store-Cont’d.The data store-Cont’d.

Its day-to-day function is to store the data for a Its day-to-day function is to store the data for a single specific set of operational application.single specific set of operational application.

Its function is to feed the data warehouse data Its function is to feed the data warehouse data for the purpose of analysis.for the purpose of analysis.

Page 11: Data Warehouse

The Data MartThe Data Mart

It is lower-cost, scaled down version of the It is lower-cost, scaled down version of the DW.DW.

Data Mart offer a targeted and less costly Data Mart offer a targeted and less costly method of gaining the advantages associated method of gaining the advantages associated with data warehousing and can be scaled up to with data warehousing and can be scaled up to a full DW environment over time.a full DW environment over time.

Page 12: Data Warehouse

The Meta DataThe Meta Data

Last component of DW environments.Last component of DW environments.

It is information that is kept about the warehouse It is information that is kept about the warehouse rather than information kept within the warehouse.rather than information kept within the warehouse.

Legacy systems generally don’t keep a record of Legacy systems generally don’t keep a record of characteristics of the data (such as what pieces of data characteristics of the data (such as what pieces of data exist and where they are located).exist and where they are located).

The metadata is simply data about data. The metadata is simply data about data.

Page 13: Data Warehouse

ConclusionConclusion

A Data Warehouse is a collection of integrated subject-A Data Warehouse is a collection of integrated subject-oriented databases designed to support a DSS.oriented databases designed to support a DSS.

Each unit of data is non-volatile and relevant to some moment in time.Each unit of data is non-volatile and relevant to some moment in time.

An operational data store (ODS) stores data for a specific An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired application. It feeds the data warehouse a stream of desired raw data.raw data.

A data mart is a lower-cost, scaled-down version of a data A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users warehouse, usually designed to support a small group of users (rather than the entire firm).(rather than the entire firm).

The metadata is information that is kept about the warehouse.The metadata is information that is kept about the warehouse.

Page 14: Data Warehouse

Data WarehouseData Warehouse

Subject orientedSubject oriented

Data integratedData integrated

Time variantTime variant

NonvolatileNonvolatile

Page 15: Data Warehouse

Characteristics of Data WarehouseCharacteristics of Data Warehouse

Subject oriented. Subject oriented. Data are organized based on Data are organized based on how the users refer to them. how the users refer to them. IntegratedIntegrated. All inconsistencies regarding naming . All inconsistencies regarding naming convention and value representations are convention and value representations are removed.removed.NonvolatileNonvolatile. Data are stored in read-only format . Data are stored in read-only format and do not change over time.and do not change over time.Time variantTime variant. . Data are not current but normally Data are not current but normally time series.time series.

Page 16: Data Warehouse

Characteristics of Data WarehouseCharacteristics of Data Warehouse

SummarizedSummarized Operational data are mapped into Operational data are mapped into a decision-usable formata decision-usable formatLarge volumeLarge volume. . Time series data sets are Time series data sets are normally quite large.normally quite large.Not normalizedNot normalized. . DW data can be, and often DW data can be, and often are, redundant.are, redundant.MetadataMetadata. . Data about data are stored.Data about data are stored.Data sourcesData sources. Data come from internal and . Data come from internal and external unintegrated operational systems.external unintegrated operational systems.

Page 17: Data Warehouse

A Data Warehouse is A Data Warehouse is SubjectSubject Oriented Oriented

Page 18: Data Warehouse

Subject OrientationSubject Orientation

Application EnvironmentApplication Environment Data warehouse Data warehouse EnvironmentEnvironment

Design activities must be equally Design activities must be equally focused on both process and database focused on both process and database design design

DW world is primarily void of process DW world is primarily void of process design and tends to focus exclusively on design and tends to focus exclusively on issues of data modeling and database issues of data modeling and database designdesign

Page 19: Data Warehouse

Data IntegratedData Integrated

IntegrationIntegration –consistency naming –consistency naming conventions and measurement attributers, conventions and measurement attributers, accuracy, and common aggregation.accuracy, and common aggregation.

Establishment of a common unit of Establishment of a common unit of measure for all synonymous data measure for all synonymous data elements from dissimilar database.elements from dissimilar database.

The data must be stored in the DW in an The data must be stored in the DW in an integrated, globally acceptable manner integrated, globally acceptable manner

Page 20: Data Warehouse

Data IntegratedData Integrated

Page 21: Data Warehouse

Time VariantTime Variant

In an operational application system, the In an operational application system, the expectation is that all data within the database expectation is that all data within the database are accurate as of the moment of access. In the are accurate as of the moment of access. In the DW data are simply assumed to be accurate as DW data are simply assumed to be accurate as of some moment in time and not necessarily of some moment in time and not necessarily right now. right now. One of the places where DW data display time One of the places where DW data display time variance is in the structure of the record key. variance is in the structure of the record key. Every primary key contained within the DW Every primary key contained within the DW must contain, either implicitly or explicitly an must contain, either implicitly or explicitly an element of time( day, week, month, etc)element of time( day, week, month, etc)

Page 22: Data Warehouse

Time VariantTime Variant

Every piece of data contained within the Every piece of data contained within the warehouse must be associated with a warehouse must be associated with a particular point in time if any useful particular point in time if any useful analysis is to be conducted with it.analysis is to be conducted with it.

Another aspect of time variance in DW Another aspect of time variance in DW data is that, once recorded, data within the data is that, once recorded, data within the warehouse cannot be updated or warehouse cannot be updated or changed.changed.

Page 23: Data Warehouse

NonvolatilityNonvolatility

Typical activities such as deletes, inserts, Typical activities such as deletes, inserts, and changes that are performed in an and changes that are performed in an operational application environment are operational application environment are completely nonexistent in a DW completely nonexistent in a DW environment.environment.

Only two data operations are ever Only two data operations are ever performed in the DW: data loading and performed in the DW: data loading and data access data access

Page 24: Data Warehouse

NonvolatilityNonvolatility

Application DWThe design issues must focus on data integrity and update anomalies. Complex processes must be coded to ensure that the data update processes allow for high integrity of the final product.

Such issues are no concern to in a DW environment because data update is never performed.

Data is placed in normalized form to ensure a minimal redundancy (totals that could be calculated would never be stored)

Designers find it useful to store many of such calculations or summarizations.

The technologies necessary to support issues of transaction and data recovery, roll back, and detection and remedy of deadlock are quite complex.

Relative simplicity in technology

Page 25: Data Warehouse

Issues of Data Redundancy between Issues of Data Redundancy between DW and operational environmentsDW and operational environments

The lack of relevancy of issues such as data The lack of relevancy of issues such as data normalization in the DW environment may suggest that normalization in the DW environment may suggest that existence of massive data redundancy within the data existence of massive data redundancy within the data warehouse and between the operational and DW warehouse and between the operational and DW environments.environments.

Inmon(1992) pointed out and proved that it is not true.Inmon(1992) pointed out and proved that it is not true.

Page 26: Data Warehouse

Issues of Data Redundancy between Issues of Data Redundancy between DW and operational environmentsDW and operational environments

The data being loaded into the DW are filtered and “cleansed” as they The data being loaded into the DW are filtered and “cleansed” as they pass from the operational database to the warehouse. Because of this pass from the operational database to the warehouse. Because of this cleansing numerous data that exists in the operational environment cleansing numerous data that exists in the operational environment never pass to the data warehouse. Only the data necessary for never pass to the data warehouse. Only the data necessary for processing by the DSS or EIS are ever actually loaded into the DW. processing by the DSS or EIS are ever actually loaded into the DW.

The time horizons for warehouse and operational data elements are The time horizons for warehouse and operational data elements are unique. Data in the operational environment are fresh, whereas unique. Data in the operational environment are fresh, whereas warehouse data are generally much older.(so there is minimal warehouse data are generally much older.(so there is minimal opportunity of the data to overlap between two environments )opportunity of the data to overlap between two environments )

The data loaded into the DW often undergo a radical transformation as The data loaded into the DW often undergo a radical transformation as they pass form operational to the DW environment. So data in DW are they pass form operational to the DW environment. So data in DW are not the same.not the same.

Given this factors, Inmon suggests that data redundancy between the two Given this factors, Inmon suggests that data redundancy between the two environments is a rare occurrence with a typical redundancy factor of environments is a rare occurrence with a typical redundancy factor of less than 1 % less than 1 %

Page 27: Data Warehouse

The Data Warehouse The Data Warehouse ArchitectureArchitecture

The architecture consists of various The architecture consists of various interconnected elements:interconnected elements: Operational and external database layerOperational and external database layer – the – the

source data for the DWsource data for the DW Information access layerInformation access layer – the tools the end – the tools the end

user access to extract and analyze the datauser access to extract and analyze the data Data access layerData access layer – the interface between the – the interface between the

operational and information access layersoperational and information access layers Metadata layerMetadata layer – the data directory or – the data directory or

repository of metadata informationrepository of metadata information

Page 28: Data Warehouse

Components of the Data Components of the Data Warehouse ArchitectureWarehouse Architecture

Page 29: Data Warehouse

The Data Warehouse The Data Warehouse ArchitectureArchitecture

Additional layers are:Additional layers are: Process management layerProcess management layer – the scheduler or job – the scheduler or job

controllercontroller Application messaging layerApplication messaging layer – the “middleware” that – the “middleware” that

transports information around the firmtransports information around the firm Physical data warehouse layerPhysical data warehouse layer – where the actual – where the actual

data used in the DSS are locateddata used in the DSS are located Data staging layerData staging layer – all of the processes necessary to – all of the processes necessary to

select, edit, summarize and load warehouse data select, edit, summarize and load warehouse data from the operational and external data basesfrom the operational and external data bases

Page 30: Data Warehouse

Data Warehousing TypologyData Warehousing Typology

The The virtual data warehousevirtual data warehouse – the end users – the end users have direct access to the data stores, using have direct access to the data stores, using tools enabled at the data access layertools enabled at the data access layer

The The central data warehousecentral data warehouse – a single physical – a single physical database contains all of the data for a specific database contains all of the data for a specific functional areafunctional area

The The distributed data warehousedistributed data warehouse – the – the components are distributed across several components are distributed across several physical databasesphysical databases

Page 31: Data Warehouse

The MetadataThe Metadata

The name suggests some high-level The name suggests some high-level technological concept, but it really is fairly technological concept, but it really is fairly simple. Metadata is “data about data”.simple. Metadata is “data about data”.With the emergence of the data warehouse as a With the emergence of the data warehouse as a decision support structure, the metadata are decision support structure, the metadata are considered as much a resource as the business considered as much a resource as the business data they describe.data they describe.Metadata are abstractions -- they are high level Metadata are abstractions -- they are high level data that provide concise descriptions of lower-data that provide concise descriptions of lower-level data.level data.

Page 32: Data Warehouse

The MetadataThe Metadata

For example, a line in a sales database may contain: For example, a line in a sales database may contain: 4056 KJ596 223.454056 KJ596 223.45

This is mostly meaningless until we consult the metadata This is mostly meaningless until we consult the metadata that tells us it was store number 4056, product KJ596 that tells us it was store number 4056, product KJ596 and sales of $223.45and sales of $223.45

The metadata are essential ingredients in the The metadata are essential ingredients in the transformation of raw data into knowledge. They are the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.“keys” that allow us to handle the raw data.

Page 33: Data Warehouse

General Metadata IssuesGeneral Metadata Issues

General metadata issues associated with Data General metadata issues associated with Data Warehouse use:Warehouse use: What tables, attributes and keys does the DW What tables, attributes and keys does the DW

contain?contain? Where did each set of data come from?Where did each set of data come from? What transformations were applied with cleansing?What transformations were applied with cleansing? How have the metadata changed over time?How have the metadata changed over time? How often do the data get reloaded?How often do the data get reloaded? Are there so many data elements that you need to be Are there so many data elements that you need to be

careful what you ask for?careful what you ask for?

Page 34: Data Warehouse

Components of the MetadataComponents of the Metadata

Transformation mapsTransformation maps – records that show – records that show what transformations were appliedwhat transformations were appliedExtraction & relationship historyExtraction & relationship history – records – records that show what data was analyzedthat show what data was analyzedAlgorithms for summarizationAlgorithms for summarization – methods – methods available for aggregating and summarizingavailable for aggregating and summarizingData ownershipData ownership – records that show origin – records that show originPatterns of accessPatterns of access – records that show – records that show what data are accessed and how oftenwhat data are accessed and how often

Page 35: Data Warehouse

Typical Mapping MetadataTypical Mapping Metadata

Transformation mapping records include:Transformation mapping records include: Identification of original sourceIdentification of original source Attribute conversionsAttribute conversions Physical characteristic conversionsPhysical characteristic conversions Encoding/reference table conversionsEncoding/reference table conversions Naming changesNaming changes Key changesKey changes Values of default attributesValues of default attributes Logic to choose from multiple sourcesLogic to choose from multiple sources Algorithmic changesAlgorithmic changes

Page 36: Data Warehouse

Implementing the Data WarehouseImplementing the Data Warehouse

Kozar list of “seven deadly sins” of data warehouse Kozar list of “seven deadly sins” of data warehouse implementation:implementation:

1.1. ““If you build it, they will come”If you build it, they will come” – the DW needs to be – the DW needs to be designed to meet people’s needsdesigned to meet people’s needs

2.2. Omission of an architectural frameworkOmission of an architectural framework – you need – you need to consider the number of users, volume of data, to consider the number of users, volume of data, update cycle, etc.update cycle, etc.

3.3. Underestimating the importance of documenting Underestimating the importance of documenting assumptionsassumptions – the assumptions and potential – the assumptions and potential conflicts must be included in the frameworkconflicts must be included in the framework

Page 37: Data Warehouse

““Seven Deadly Sins”, Seven Deadly Sins”, continuedcontinued

4.4. Failure to use the right toolFailure to use the right tool – a DW project needs – a DW project needs different tools than those used to develop an different tools than those used to develop an applicationapplication

5.5. Life cycle abuseLife cycle abuse – in a DW, the life cycle really – in a DW, the life cycle really never endsnever ends

6.6. Ignorance about data conflictsIgnorance about data conflicts – resolving these – resolving these takes a lot more effort than most people realizetakes a lot more effort than most people realize

7.7. Failure to learn from mistakesFailure to learn from mistakes – since one DW – since one DW project tends to beget another, learning from the project tends to beget another, learning from the early mistakes will yield higher quality laterearly mistakes will yield higher quality later

Page 38: Data Warehouse

Data Warehouse TechnologiesData Warehouse Technologies

No one currently offers an end-to-end DW No one currently offers an end-to-end DW solution. Organizations buy bits and pieces from solution. Organizations buy bits and pieces from a number of vendors and hopefully make them a number of vendors and hopefully make them work together.work together.

SAS, IBM, Software AG, Information Builders SAS, IBM, Software AG, Information Builders and Platinum offer solutions that are at least and Platinum offer solutions that are at least fairly comprehensive.fairly comprehensive.

The market is very competitive. Table 10-6 in The market is very competitive. Table 10-6 in the text lists 90 firms that produce DW products.the text lists 90 firms that produce DW products.

Page 39: Data Warehouse

The Future of Data WarehousingThe Future of Data Warehousing

As the DW becomes a standard part of an As the DW becomes a standard part of an organization, there will be efforts to find new organization, there will be efforts to find new ways to use the data. This will likely bring with it ways to use the data. This will likely bring with it several new challenges:several new challenges: Regulatory constraintsRegulatory constraints may limit the ability to combine may limit the ability to combine

sources of disparate data.sources of disparate data. These disparate sources are likely to contain These disparate sources are likely to contain

unstructured dataunstructured data, which is hard to store., which is hard to store. The The InternetInternet makes it possible to access data from makes it possible to access data from

virtually “anywhere”. Of course, this just increases virtually “anywhere”. Of course, this just increases the disparity.the disparity.

Page 40: Data Warehouse

ObjectiveObjective

Interesting FactsInteresting Facts

Data Can be Used ToData Can be Used To

Robust InfrastructureRobust Infrastructure

Success of Data Success of Data Warehouse ProjectsWarehouse Projects

Implementing Data Implementing Data WarehouseWarehouse

Real Time Alerts & Real Time Alerts & IntegrationIntegration

Identity TheftIdentity Theft

What Can You Do?What Can You Do?

Page 41: Data Warehouse

Interesting FactsInteresting Facts

Harrah’s Entertainment’s Data Warehouse holds Harrah’s Entertainment’s Data Warehouse holds 30 terabytes, or 30 trillion bytes of data, roughly 30 terabytes, or 30 trillion bytes of data, roughly three times the number of printed characters in three times the number of printed characters in the Library of Congressthe Library of Congress

Casinos, retailers, airlines, and banks are piling Casinos, retailers, airlines, and banks are piling up data so vast, it would have been unthinkable up data so vast, it would have been unthinkable years ago; result from the curse of cheap years ago; result from the curse of cheap storagestorage

Page 42: Data Warehouse

Interesting FactsInteresting Facts

Storage Shipments as of 2004: 22 Storage Shipments as of 2004: 22 exabytes or 22 million trillion bytes of hard exabytes or 22 million trillion bytes of hard disk space, double the amount in 2002.disk space, double the amount in 2002.

Equivalent to 4x’s the space needed to Equivalent to 4x’s the space needed to store every word ever spoken by every store every word ever spoken by every human being who has ever lived.human being who has ever lived.

Should double again in 2006Should double again in 2006

Page 43: Data Warehouse

Data Can be Used ToData Can be Used To

Quantify the volume impact of vehicles across the Quantify the volume impact of vehicles across the marketing matrixmarketing matrix

Account for decay and saturation factors in the Account for decay and saturation factors in the determination of investment choices and returnsdetermination of investment choices and returns

Execute “what-if” simulations of pricing or promotional Execute “what-if” simulations of pricing or promotional scenarios before a proposed action is takenscenarios before a proposed action is taken

Provide a continuous planning, measurement, analysis and Provide a continuous planning, measurement, analysis and optimization cycle supported by a software structureoptimization cycle supported by a software structure

Deliver robust data feeds into other systems supporting Deliver robust data feeds into other systems supporting supply chain, sales, and financial reporting and endeavorssupply chain, sales, and financial reporting and endeavors

Page 44: Data Warehouse

Robust InfrastructureRobust Infrastructure

Data Identification and AcquisitionData Identification and Acquisition

Data Cleansing, Mapping, and Data Cleansing, Mapping, and TransformationTransformation

Production System Loading and Ongoing Production System Loading and Ongoing UpdateUpdate

Page 45: Data Warehouse

Success of Data Warehouse Success of Data Warehouse ProjectsProjects

Over half of Data Warehouse projects are DoomedOver half of Data Warehouse projects are Doomed

Fail due to lack of attention to Data Quality IssuesFail due to lack of attention to Data Quality Issues

More than half only have limited acceptanceMore than half only have limited acceptance

Consistency and Accuracy of DataConsistency and Accuracy of Data

Most businesses fail to use business intelligence (BI) Most businesses fail to use business intelligence (BI) strategicallystrategically

IT organizations build data warehouses with little to no business IT organizations build data warehouses with little to no business involvementinvolvement

Page 46: Data Warehouse

““A real-time A real-time enterprise without enterprise without real-time business real-time business

intelligence is a real intelligence is a real fast, dumb fast, dumb

organization.”organization.”

Stephen BrobstStephen BrobstChief Technology OfficeChief Technology Office

TeradataTeradata

Page 47: Data Warehouse

Success of Data Warehouse Success of Data Warehouse ProjectsProjects

Most challenging type of deployment for an Most challenging type of deployment for an enterpriseenterprise

Large scale and complex system configurationsLarge scale and complex system configurations

Sophisticated data modeling and analysis toolsSophisticated data modeling and analysis tools

High visibility in broad range of important business High visibility in broad range of important business functions within companyfunctions within company

Adoption of Linux-Based PlatformAdoption of Linux-Based Platform

Page 48: Data Warehouse

Implementing Data WarehouseImplementing Data Warehouse

Challenges:Challenges: Identifying new processesIdentifying new processes Assuring there were of real useAssuring there were of real use Implementing and ensuring cultural shiftsImplementing and ensuring cultural shifts Managing content and New communities Managing content and New communities

towards a common benefittowards a common benefit Linear modelsLinear models Standards, Governance, Controls, ValuationStandards, Governance, Controls, Valuation

Page 49: Data Warehouse

TeradataTeradata

Division of NCR in Dayton, OhioDivision of NCR in Dayton, Ohio

Competitor of IBM and OracleCompetitor of IBM and Oracle

Multi-million Dollar Machines to run the Multi-million Dollar Machines to run the world’s biggest data warehousesworld’s biggest data warehouses Wal-MartWal-Mart Bank of AmericaBank of America Verizon WirelessVerizon Wireless

Page 50: Data Warehouse

Teradata’s SuccessTeradata’s Success

Conventional IBM or Sun Microsystems Conventional IBM or Sun Microsystems overload for a couple hours to days on a overload for a couple hours to days on a few terabytes and/or data queriesfew terabytes and/or data queries

IBM cannot return computation on certain IBM cannot return computation on certain complex requestscomplex requests

Equivalent to having data but not able to Equivalent to having data but not able to use it.use it.

Page 51: Data Warehouse

Real Time Alerts & IntegrationReal Time Alerts & Integration

Teradata 8.0 Version released in Oct 2004Teradata 8.0 Version released in Oct 2004 Improves real-time alerts and integrationImproves real-time alerts and integration

Businesses can analyze operational info against Businesses can analyze operational info against historical info to identify events in near real-time historical info to identify events in near real-time using the new table designusing the new table design

Used by:Used by: Continental Airlines in the US: reroute passengers on Continental Airlines in the US: reroute passengers on

delayed flights, reissuing tickets, reserving a room in delayed flights, reissuing tickets, reserving a room in a hotel booking systema hotel booking system

Southwest Airlines- savings between $1.2-$1.4 MillionSouthwest Airlines- savings between $1.2-$1.4 Million

Page 52: Data Warehouse

Identity TheftIdentity Theft

Government Regulation of Personal Data is Needed Government Regulation of Personal Data is Needed (National Consumer Protection Standards)(National Consumer Protection Standards)

ChoicePoint FollyChoicePoint Folly

Georgia-based data-collection companyGeorgia-based data-collection company

Founded in 1997 to analyze insurance claims information, but Founded in 1997 to analyze insurance claims information, but now provides data to customers including finance companies, now provides data to customers including finance companies, law enforcement, and governmentlaw enforcement, and government

Obtain personal information by perusing public records, or Obtain personal information by perusing public records, or purchasing the information from other companiespurchasing the information from other companies

Page 53: Data Warehouse

Identity TheftIdentity Theft

Duped by scammers who set 150 phony Duped by scammers who set 150 phony accounts to access personal data of as many as accounts to access personal data of as many as 145,000 people nationwide 145,000 people nationwide

Scammers set user accounts by faxing in phony Scammers set user accounts by faxing in phony business licenses, undetected for one yearbusiness licenses, undetected for one year

750 people had their identities stolen750 people had their identities stolen

Theft would have gone unnoticed without Theft would have gone unnoticed without California Identity theft law SB 1386 California Identity theft law SB 1386

Page 54: Data Warehouse

Identity TheftIdentity Theft

MSN EventMSN Event

Data Warehouse Information GatheringData Warehouse Information Gathering

Over the Phone InterviewsOver the Phone Interviews

Trash Can HuntingTrash Can Hunting

Gathered from Doctors, Internet Transactions, Gathered from Doctors, Internet Transactions, Telephone Operators (Overseas or Prisoners)Telephone Operators (Overseas or Prisoners)

Page 55: Data Warehouse

MSN EmailMSN Email

Page 56: Data Warehouse

What Can You Do?What Can You Do?

Carefully monitor your credit card bills and credit Carefully monitor your credit card bills and credit reportsreports

Request a once a year free access credit report Request a once a year free access credit report via the three big credit agencies.via the three big credit agencies. Equifax, Experian, TransUnion Equifax, Experian, TransUnion

Victims: contact Federal Trade Commission to Victims: contact Federal Trade Commission to report the theft and monitor credit reports. report the theft and monitor credit reports. 1-800-IDTHEFT1-800-IDTHEFT

Page 57: Data Warehouse

ReferencesReferencesDecision Support Systems in the 21Decision Support Systems in the 21stst Century Century 2 2ndnd Edition, by George M. Edition, by George M. Marakas, Prentice Hall, Upper Saddle River, NJ, 2003Marakas, Prentice Hall, Upper Saddle River, NJ, 2003

http://seattletimes.nwsource.com/html/editorialsopinion/2002191098_credited27.htmlSeattle times, plugging holes in data warehousing Seattle times, plugging holes in data warehousing

Teradata warehouse improves real-time alerts and integrationTeradata warehouse improves real-time alerts and integrationCliff Saran.Cliff Saran. Computer Weekly.Computer Weekly. Sutton: Oct 12, 2004. p. 22 (1 page) Sutton: Oct 12, 2004. p. 22 (1 page)

ON THE MARKON THE MARKMark Hall.Mark Hall. Computerworld.Computerworld. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. 6 (1 page)6 (1 page)

Optimization: It's All About the Data Optimization: It's All About the Data Brandweek: Ellen Pederson, Mark Brandweek: Ellen Pederson, Mark AndersonAnderson

THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP Intelligent Enterprises, Michael GonzalezIntelligent Enterprises, Michael Gonzalez

Page 58: Data Warehouse

ReferencesReferenceshttp://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence- http://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence- Beyond the Data WarehouseBeyond the Data Warehouse

http://www.computerworld.com/printthis/2001/0,4814,56969,00.html Micro-http://www.computerworld.com/printthis/2001/0,4814,56969,00.html Micro-segmentation – Computerworldsegmentation – Computerworld

Too Much InformationToo Much Information Forbes article on data warehouseForbes article on data warehouse

http://reviews.cnet.com/4520-3513_7-5690533-1.html http://reviews.cnet.com/4520-3513_7-5690533-1.html When identity When identity thieves strike data warehousesthieves strike data warehouses

Over half of data warehouse projects doomedOver half of data warehouse projects doomed VNU Business Publications VNU Business Publications Limited, Robert Jaques 25 February 2005 Limited, Robert Jaques 25 February 2005

http://www.linuxworld.com/magazine/?issueid=571 Linux World Article http://www.linuxworld.com/magazine/?issueid=571 Linux World Article

Page 59: Data Warehouse

Questions?Questions?


Top Related