Download - Data Warehouse
Data WarehouseData WarehouseLutfi FreijLutfi Freij
Konstantin RimarchukKonstantin RimarchukVasken ChamlaianVasken Chamlaian
John SahakianJohn SahakianSuzan TonSuzan Ton
InmonInmon
Father of the data warehouseFather of the data warehouse
Co-creator of the CorporateCo-creator of the Corporate
Information Factory.Information Factory.
He has 35 years ofHe has 35 years of
experience in databaseexperience in database
technology managementtechnology management
and data warehouse design. and data warehouse design.
Inmon-Cont’dInmon-Cont’d
Bill has written about a varietyBill has written about a variety of topics on the building, usage,of topics on the building, usage, & maintenance of the data warehouse& maintenance of the data warehouse & the Corporate Information Factory.& the Corporate Information Factory.
He has written more than 650He has written more than 650articles (Datamation, ComputerWorld, articles (Datamation, ComputerWorld, and Byte Magazine).and Byte Magazine).
Inmon has published 45 books.Inmon has published 45 books. Many of books has been translated to Chinese, Dutch, French, German, Many of books has been translated to Chinese, Dutch, French, German,
Japanese, Korean, Portuguese, Russian, and Spanish. Japanese, Korean, Portuguese, Russian, and Spanish.
IntroductionIntroduction
What is Data Warehouse?What is Data Warehouse?A data warehouse is a collection of integrated A data warehouse is a collection of integrated databases designed to support a DSS.databases designed to support a DSS.
According to Inmon’s (father of data warehousing) According to Inmon’s (father of data warehousing) definition(Inmon,1992a,p.5):definition(Inmon,1992a,p.5): It is a collection of integrated, subject-oriented It is a collection of integrated, subject-oriented
databases designed to support the DSS function, databases designed to support the DSS function, where each unit of data is non-volatile and relevant where each unit of data is non-volatile and relevant to some moment in time.to some moment in time.
Introduction-Cont’d.Introduction-Cont’d.
Where is it used?Where is it used?
It is used for evaluating future strategy.It is used for evaluating future strategy.
It needs a successful technician:It needs a successful technician: Flexible.Flexible. Team player.Team player. Good balance of business and technical Good balance of business and technical
understanding.understanding.
Introduction-Cont’d.Introduction-Cont’d.
The ultimate use of data warehouse is Mass Customization.The ultimate use of data warehouse is Mass Customization. For example, it increased Capital One’s customers from For example, it increased Capital One’s customers from
1 million to approximately 9 millions in 8 years.1 million to approximately 9 millions in 8 years.
Just like a muscle: DW increases in strength with active use.Just like a muscle: DW increases in strength with active use. With each new test and product, valuable information is With each new test and product, valuable information is
added to the DW, allowing the analyst to learn from the added to the DW, allowing the analyst to learn from the success and failure of the past.success and failure of the past.
The key to survival:The key to survival: Is the ability to analyze, plan, and react to changing Is the ability to analyze, plan, and react to changing
business conditions in a much more rapid fashion.business conditions in a much more rapid fashion.
Data WarehouseData Warehouse
In order for data to be effective, DW must be:In order for data to be effective, DW must be: Consistent.Consistent. Well integrated.Well integrated. Well defined.Well defined. Time stamped.Time stamped.
DW environment:DW environment: The data store, data mart & the metadata.The data store, data mart & the metadata.
The Data StoreThe Data Store
An operational data store (ODS) stores data for a An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a specific application. It feeds the data warehouse a stream of desired raw data.stream of desired raw data.
Is the most common component of DW environment.Is the most common component of DW environment.
Data store is generally subject oriented, volatile, Data store is generally subject oriented, volatile, current commonly focused on customers, products, current commonly focused on customers, products, orders, policies, claims, etc…orders, policies, claims, etc…
Data Store & Data Warehouse Data Store & Data Warehouse
Data store & Data warehouse, table 10-1 page Data store & Data warehouse, table 10-1 page 296296
The data store-Cont’d.The data store-Cont’d.
Its day-to-day function is to store the data for a Its day-to-day function is to store the data for a single specific set of operational application.single specific set of operational application.
Its function is to feed the data warehouse data Its function is to feed the data warehouse data for the purpose of analysis.for the purpose of analysis.
The Data MartThe Data Mart
It is lower-cost, scaled down version of the It is lower-cost, scaled down version of the DW.DW.
Data Mart offer a targeted and less costly Data Mart offer a targeted and less costly method of gaining the advantages associated method of gaining the advantages associated with data warehousing and can be scaled up to with data warehousing and can be scaled up to a full DW environment over time.a full DW environment over time.
The Meta DataThe Meta Data
Last component of DW environments.Last component of DW environments.
It is information that is kept about the warehouse It is information that is kept about the warehouse rather than information kept within the warehouse.rather than information kept within the warehouse.
Legacy systems generally don’t keep a record of Legacy systems generally don’t keep a record of characteristics of the data (such as what pieces of data characteristics of the data (such as what pieces of data exist and where they are located).exist and where they are located).
The metadata is simply data about data. The metadata is simply data about data.
ConclusionConclusion
A Data Warehouse is a collection of integrated subject-A Data Warehouse is a collection of integrated subject-oriented databases designed to support a DSS.oriented databases designed to support a DSS.
Each unit of data is non-volatile and relevant to some moment in time.Each unit of data is non-volatile and relevant to some moment in time.
An operational data store (ODS) stores data for a specific An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired application. It feeds the data warehouse a stream of desired raw data.raw data.
A data mart is a lower-cost, scaled-down version of a data A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users warehouse, usually designed to support a small group of users (rather than the entire firm).(rather than the entire firm).
The metadata is information that is kept about the warehouse.The metadata is information that is kept about the warehouse.
Data WarehouseData Warehouse
Subject orientedSubject oriented
Data integratedData integrated
Time variantTime variant
NonvolatileNonvolatile
Characteristics of Data WarehouseCharacteristics of Data Warehouse
Subject oriented. Subject oriented. Data are organized based on Data are organized based on how the users refer to them. how the users refer to them. IntegratedIntegrated. All inconsistencies regarding naming . All inconsistencies regarding naming convention and value representations are convention and value representations are removed.removed.NonvolatileNonvolatile. Data are stored in read-only format . Data are stored in read-only format and do not change over time.and do not change over time.Time variantTime variant. . Data are not current but normally Data are not current but normally time series.time series.
Characteristics of Data WarehouseCharacteristics of Data Warehouse
SummarizedSummarized Operational data are mapped into Operational data are mapped into a decision-usable formata decision-usable formatLarge volumeLarge volume. . Time series data sets are Time series data sets are normally quite large.normally quite large.Not normalizedNot normalized. . DW data can be, and often DW data can be, and often are, redundant.are, redundant.MetadataMetadata. . Data about data are stored.Data about data are stored.Data sourcesData sources. Data come from internal and . Data come from internal and external unintegrated operational systems.external unintegrated operational systems.
A Data Warehouse is A Data Warehouse is SubjectSubject Oriented Oriented
Subject OrientationSubject Orientation
Application EnvironmentApplication Environment Data warehouse Data warehouse EnvironmentEnvironment
Design activities must be equally Design activities must be equally focused on both process and database focused on both process and database design design
DW world is primarily void of process DW world is primarily void of process design and tends to focus exclusively on design and tends to focus exclusively on issues of data modeling and database issues of data modeling and database designdesign
Data IntegratedData Integrated
IntegrationIntegration –consistency naming –consistency naming conventions and measurement attributers, conventions and measurement attributers, accuracy, and common aggregation.accuracy, and common aggregation.
Establishment of a common unit of Establishment of a common unit of measure for all synonymous data measure for all synonymous data elements from dissimilar database.elements from dissimilar database.
The data must be stored in the DW in an The data must be stored in the DW in an integrated, globally acceptable manner integrated, globally acceptable manner
Data IntegratedData Integrated
Time VariantTime Variant
In an operational application system, the In an operational application system, the expectation is that all data within the database expectation is that all data within the database are accurate as of the moment of access. In the are accurate as of the moment of access. In the DW data are simply assumed to be accurate as DW data are simply assumed to be accurate as of some moment in time and not necessarily of some moment in time and not necessarily right now. right now. One of the places where DW data display time One of the places where DW data display time variance is in the structure of the record key. variance is in the structure of the record key. Every primary key contained within the DW Every primary key contained within the DW must contain, either implicitly or explicitly an must contain, either implicitly or explicitly an element of time( day, week, month, etc)element of time( day, week, month, etc)
Time VariantTime Variant
Every piece of data contained within the Every piece of data contained within the warehouse must be associated with a warehouse must be associated with a particular point in time if any useful particular point in time if any useful analysis is to be conducted with it.analysis is to be conducted with it.
Another aspect of time variance in DW Another aspect of time variance in DW data is that, once recorded, data within the data is that, once recorded, data within the warehouse cannot be updated or warehouse cannot be updated or changed.changed.
NonvolatilityNonvolatility
Typical activities such as deletes, inserts, Typical activities such as deletes, inserts, and changes that are performed in an and changes that are performed in an operational application environment are operational application environment are completely nonexistent in a DW completely nonexistent in a DW environment.environment.
Only two data operations are ever Only two data operations are ever performed in the DW: data loading and performed in the DW: data loading and data access data access
NonvolatilityNonvolatility
Application DWThe design issues must focus on data integrity and update anomalies. Complex processes must be coded to ensure that the data update processes allow for high integrity of the final product.
Such issues are no concern to in a DW environment because data update is never performed.
Data is placed in normalized form to ensure a minimal redundancy (totals that could be calculated would never be stored)
Designers find it useful to store many of such calculations or summarizations.
The technologies necessary to support issues of transaction and data recovery, roll back, and detection and remedy of deadlock are quite complex.
Relative simplicity in technology
Issues of Data Redundancy between Issues of Data Redundancy between DW and operational environmentsDW and operational environments
The lack of relevancy of issues such as data The lack of relevancy of issues such as data normalization in the DW environment may suggest that normalization in the DW environment may suggest that existence of massive data redundancy within the data existence of massive data redundancy within the data warehouse and between the operational and DW warehouse and between the operational and DW environments.environments.
Inmon(1992) pointed out and proved that it is not true.Inmon(1992) pointed out and proved that it is not true.
Issues of Data Redundancy between Issues of Data Redundancy between DW and operational environmentsDW and operational environments
The data being loaded into the DW are filtered and “cleansed” as they The data being loaded into the DW are filtered and “cleansed” as they pass from the operational database to the warehouse. Because of this pass from the operational database to the warehouse. Because of this cleansing numerous data that exists in the operational environment cleansing numerous data that exists in the operational environment never pass to the data warehouse. Only the data necessary for never pass to the data warehouse. Only the data necessary for processing by the DSS or EIS are ever actually loaded into the DW. processing by the DSS or EIS are ever actually loaded into the DW.
The time horizons for warehouse and operational data elements are The time horizons for warehouse and operational data elements are unique. Data in the operational environment are fresh, whereas unique. Data in the operational environment are fresh, whereas warehouse data are generally much older.(so there is minimal warehouse data are generally much older.(so there is minimal opportunity of the data to overlap between two environments )opportunity of the data to overlap between two environments )
The data loaded into the DW often undergo a radical transformation as The data loaded into the DW often undergo a radical transformation as they pass form operational to the DW environment. So data in DW are they pass form operational to the DW environment. So data in DW are not the same.not the same.
Given this factors, Inmon suggests that data redundancy between the two Given this factors, Inmon suggests that data redundancy between the two environments is a rare occurrence with a typical redundancy factor of environments is a rare occurrence with a typical redundancy factor of less than 1 % less than 1 %
The Data Warehouse The Data Warehouse ArchitectureArchitecture
The architecture consists of various The architecture consists of various interconnected elements:interconnected elements: Operational and external database layerOperational and external database layer – the – the
source data for the DWsource data for the DW Information access layerInformation access layer – the tools the end – the tools the end
user access to extract and analyze the datauser access to extract and analyze the data Data access layerData access layer – the interface between the – the interface between the
operational and information access layersoperational and information access layers Metadata layerMetadata layer – the data directory or – the data directory or
repository of metadata informationrepository of metadata information
Components of the Data Components of the Data Warehouse ArchitectureWarehouse Architecture
The Data Warehouse The Data Warehouse ArchitectureArchitecture
Additional layers are:Additional layers are: Process management layerProcess management layer – the scheduler or job – the scheduler or job
controllercontroller Application messaging layerApplication messaging layer – the “middleware” that – the “middleware” that
transports information around the firmtransports information around the firm Physical data warehouse layerPhysical data warehouse layer – where the actual – where the actual
data used in the DSS are locateddata used in the DSS are located Data staging layerData staging layer – all of the processes necessary to – all of the processes necessary to
select, edit, summarize and load warehouse data select, edit, summarize and load warehouse data from the operational and external data basesfrom the operational and external data bases
Data Warehousing TypologyData Warehousing Typology
The The virtual data warehousevirtual data warehouse – the end users – the end users have direct access to the data stores, using have direct access to the data stores, using tools enabled at the data access layertools enabled at the data access layer
The The central data warehousecentral data warehouse – a single physical – a single physical database contains all of the data for a specific database contains all of the data for a specific functional areafunctional area
The The distributed data warehousedistributed data warehouse – the – the components are distributed across several components are distributed across several physical databasesphysical databases
The MetadataThe Metadata
The name suggests some high-level The name suggests some high-level technological concept, but it really is fairly technological concept, but it really is fairly simple. Metadata is “data about data”.simple. Metadata is “data about data”.With the emergence of the data warehouse as a With the emergence of the data warehouse as a decision support structure, the metadata are decision support structure, the metadata are considered as much a resource as the business considered as much a resource as the business data they describe.data they describe.Metadata are abstractions -- they are high level Metadata are abstractions -- they are high level data that provide concise descriptions of lower-data that provide concise descriptions of lower-level data.level data.
The MetadataThe Metadata
For example, a line in a sales database may contain: For example, a line in a sales database may contain: 4056 KJ596 223.454056 KJ596 223.45
This is mostly meaningless until we consult the metadata This is mostly meaningless until we consult the metadata that tells us it was store number 4056, product KJ596 that tells us it was store number 4056, product KJ596 and sales of $223.45and sales of $223.45
The metadata are essential ingredients in the The metadata are essential ingredients in the transformation of raw data into knowledge. They are the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.“keys” that allow us to handle the raw data.
General Metadata IssuesGeneral Metadata Issues
General metadata issues associated with Data General metadata issues associated with Data Warehouse use:Warehouse use: What tables, attributes and keys does the DW What tables, attributes and keys does the DW
contain?contain? Where did each set of data come from?Where did each set of data come from? What transformations were applied with cleansing?What transformations were applied with cleansing? How have the metadata changed over time?How have the metadata changed over time? How often do the data get reloaded?How often do the data get reloaded? Are there so many data elements that you need to be Are there so many data elements that you need to be
careful what you ask for?careful what you ask for?
Components of the MetadataComponents of the Metadata
Transformation mapsTransformation maps – records that show – records that show what transformations were appliedwhat transformations were appliedExtraction & relationship historyExtraction & relationship history – records – records that show what data was analyzedthat show what data was analyzedAlgorithms for summarizationAlgorithms for summarization – methods – methods available for aggregating and summarizingavailable for aggregating and summarizingData ownershipData ownership – records that show origin – records that show originPatterns of accessPatterns of access – records that show – records that show what data are accessed and how oftenwhat data are accessed and how often
Typical Mapping MetadataTypical Mapping Metadata
Transformation mapping records include:Transformation mapping records include: Identification of original sourceIdentification of original source Attribute conversionsAttribute conversions Physical characteristic conversionsPhysical characteristic conversions Encoding/reference table conversionsEncoding/reference table conversions Naming changesNaming changes Key changesKey changes Values of default attributesValues of default attributes Logic to choose from multiple sourcesLogic to choose from multiple sources Algorithmic changesAlgorithmic changes
Implementing the Data WarehouseImplementing the Data Warehouse
Kozar list of “seven deadly sins” of data warehouse Kozar list of “seven deadly sins” of data warehouse implementation:implementation:
1.1. ““If you build it, they will come”If you build it, they will come” – the DW needs to be – the DW needs to be designed to meet people’s needsdesigned to meet people’s needs
2.2. Omission of an architectural frameworkOmission of an architectural framework – you need – you need to consider the number of users, volume of data, to consider the number of users, volume of data, update cycle, etc.update cycle, etc.
3.3. Underestimating the importance of documenting Underestimating the importance of documenting assumptionsassumptions – the assumptions and potential – the assumptions and potential conflicts must be included in the frameworkconflicts must be included in the framework
““Seven Deadly Sins”, Seven Deadly Sins”, continuedcontinued
4.4. Failure to use the right toolFailure to use the right tool – a DW project needs – a DW project needs different tools than those used to develop an different tools than those used to develop an applicationapplication
5.5. Life cycle abuseLife cycle abuse – in a DW, the life cycle really – in a DW, the life cycle really never endsnever ends
6.6. Ignorance about data conflictsIgnorance about data conflicts – resolving these – resolving these takes a lot more effort than most people realizetakes a lot more effort than most people realize
7.7. Failure to learn from mistakesFailure to learn from mistakes – since one DW – since one DW project tends to beget another, learning from the project tends to beget another, learning from the early mistakes will yield higher quality laterearly mistakes will yield higher quality later
Data Warehouse TechnologiesData Warehouse Technologies
No one currently offers an end-to-end DW No one currently offers an end-to-end DW solution. Organizations buy bits and pieces from solution. Organizations buy bits and pieces from a number of vendors and hopefully make them a number of vendors and hopefully make them work together.work together.
SAS, IBM, Software AG, Information Builders SAS, IBM, Software AG, Information Builders and Platinum offer solutions that are at least and Platinum offer solutions that are at least fairly comprehensive.fairly comprehensive.
The market is very competitive. Table 10-6 in The market is very competitive. Table 10-6 in the text lists 90 firms that produce DW products.the text lists 90 firms that produce DW products.
The Future of Data WarehousingThe Future of Data Warehousing
As the DW becomes a standard part of an As the DW becomes a standard part of an organization, there will be efforts to find new organization, there will be efforts to find new ways to use the data. This will likely bring with it ways to use the data. This will likely bring with it several new challenges:several new challenges: Regulatory constraintsRegulatory constraints may limit the ability to combine may limit the ability to combine
sources of disparate data.sources of disparate data. These disparate sources are likely to contain These disparate sources are likely to contain
unstructured dataunstructured data, which is hard to store., which is hard to store. The The InternetInternet makes it possible to access data from makes it possible to access data from
virtually “anywhere”. Of course, this just increases virtually “anywhere”. Of course, this just increases the disparity.the disparity.
ObjectiveObjective
Interesting FactsInteresting Facts
Data Can be Used ToData Can be Used To
Robust InfrastructureRobust Infrastructure
Success of Data Success of Data Warehouse ProjectsWarehouse Projects
Implementing Data Implementing Data WarehouseWarehouse
Real Time Alerts & Real Time Alerts & IntegrationIntegration
Identity TheftIdentity Theft
What Can You Do?What Can You Do?
Interesting FactsInteresting Facts
Harrah’s Entertainment’s Data Warehouse holds Harrah’s Entertainment’s Data Warehouse holds 30 terabytes, or 30 trillion bytes of data, roughly 30 terabytes, or 30 trillion bytes of data, roughly three times the number of printed characters in three times the number of printed characters in the Library of Congressthe Library of Congress
Casinos, retailers, airlines, and banks are piling Casinos, retailers, airlines, and banks are piling up data so vast, it would have been unthinkable up data so vast, it would have been unthinkable years ago; result from the curse of cheap years ago; result from the curse of cheap storagestorage
Interesting FactsInteresting Facts
Storage Shipments as of 2004: 22 Storage Shipments as of 2004: 22 exabytes or 22 million trillion bytes of hard exabytes or 22 million trillion bytes of hard disk space, double the amount in 2002.disk space, double the amount in 2002.
Equivalent to 4x’s the space needed to Equivalent to 4x’s the space needed to store every word ever spoken by every store every word ever spoken by every human being who has ever lived.human being who has ever lived.
Should double again in 2006Should double again in 2006
Data Can be Used ToData Can be Used To
Quantify the volume impact of vehicles across the Quantify the volume impact of vehicles across the marketing matrixmarketing matrix
Account for decay and saturation factors in the Account for decay and saturation factors in the determination of investment choices and returnsdetermination of investment choices and returns
Execute “what-if” simulations of pricing or promotional Execute “what-if” simulations of pricing or promotional scenarios before a proposed action is takenscenarios before a proposed action is taken
Provide a continuous planning, measurement, analysis and Provide a continuous planning, measurement, analysis and optimization cycle supported by a software structureoptimization cycle supported by a software structure
Deliver robust data feeds into other systems supporting Deliver robust data feeds into other systems supporting supply chain, sales, and financial reporting and endeavorssupply chain, sales, and financial reporting and endeavors
Robust InfrastructureRobust Infrastructure
Data Identification and AcquisitionData Identification and Acquisition
Data Cleansing, Mapping, and Data Cleansing, Mapping, and TransformationTransformation
Production System Loading and Ongoing Production System Loading and Ongoing UpdateUpdate
Success of Data Warehouse Success of Data Warehouse ProjectsProjects
Over half of Data Warehouse projects are DoomedOver half of Data Warehouse projects are Doomed
Fail due to lack of attention to Data Quality IssuesFail due to lack of attention to Data Quality Issues
More than half only have limited acceptanceMore than half only have limited acceptance
Consistency and Accuracy of DataConsistency and Accuracy of Data
Most businesses fail to use business intelligence (BI) Most businesses fail to use business intelligence (BI) strategicallystrategically
IT organizations build data warehouses with little to no business IT organizations build data warehouses with little to no business involvementinvolvement
““A real-time A real-time enterprise without enterprise without real-time business real-time business
intelligence is a real intelligence is a real fast, dumb fast, dumb
organization.”organization.”
Stephen BrobstStephen BrobstChief Technology OfficeChief Technology Office
TeradataTeradata
Success of Data Warehouse Success of Data Warehouse ProjectsProjects
Most challenging type of deployment for an Most challenging type of deployment for an enterpriseenterprise
Large scale and complex system configurationsLarge scale and complex system configurations
Sophisticated data modeling and analysis toolsSophisticated data modeling and analysis tools
High visibility in broad range of important business High visibility in broad range of important business functions within companyfunctions within company
Adoption of Linux-Based PlatformAdoption of Linux-Based Platform
Implementing Data WarehouseImplementing Data Warehouse
Challenges:Challenges: Identifying new processesIdentifying new processes Assuring there were of real useAssuring there were of real use Implementing and ensuring cultural shiftsImplementing and ensuring cultural shifts Managing content and New communities Managing content and New communities
towards a common benefittowards a common benefit Linear modelsLinear models Standards, Governance, Controls, ValuationStandards, Governance, Controls, Valuation
TeradataTeradata
Division of NCR in Dayton, OhioDivision of NCR in Dayton, Ohio
Competitor of IBM and OracleCompetitor of IBM and Oracle
Multi-million Dollar Machines to run the Multi-million Dollar Machines to run the world’s biggest data warehousesworld’s biggest data warehouses Wal-MartWal-Mart Bank of AmericaBank of America Verizon WirelessVerizon Wireless
Teradata’s SuccessTeradata’s Success
Conventional IBM or Sun Microsystems Conventional IBM or Sun Microsystems overload for a couple hours to days on a overload for a couple hours to days on a few terabytes and/or data queriesfew terabytes and/or data queries
IBM cannot return computation on certain IBM cannot return computation on certain complex requestscomplex requests
Equivalent to having data but not able to Equivalent to having data but not able to use it.use it.
Real Time Alerts & IntegrationReal Time Alerts & Integration
Teradata 8.0 Version released in Oct 2004Teradata 8.0 Version released in Oct 2004 Improves real-time alerts and integrationImproves real-time alerts and integration
Businesses can analyze operational info against Businesses can analyze operational info against historical info to identify events in near real-time historical info to identify events in near real-time using the new table designusing the new table design
Used by:Used by: Continental Airlines in the US: reroute passengers on Continental Airlines in the US: reroute passengers on
delayed flights, reissuing tickets, reserving a room in delayed flights, reissuing tickets, reserving a room in a hotel booking systema hotel booking system
Southwest Airlines- savings between $1.2-$1.4 MillionSouthwest Airlines- savings between $1.2-$1.4 Million
Identity TheftIdentity Theft
Government Regulation of Personal Data is Needed Government Regulation of Personal Data is Needed (National Consumer Protection Standards)(National Consumer Protection Standards)
ChoicePoint FollyChoicePoint Folly
Georgia-based data-collection companyGeorgia-based data-collection company
Founded in 1997 to analyze insurance claims information, but Founded in 1997 to analyze insurance claims information, but now provides data to customers including finance companies, now provides data to customers including finance companies, law enforcement, and governmentlaw enforcement, and government
Obtain personal information by perusing public records, or Obtain personal information by perusing public records, or purchasing the information from other companiespurchasing the information from other companies
Identity TheftIdentity Theft
Duped by scammers who set 150 phony Duped by scammers who set 150 phony accounts to access personal data of as many as accounts to access personal data of as many as 145,000 people nationwide 145,000 people nationwide
Scammers set user accounts by faxing in phony Scammers set user accounts by faxing in phony business licenses, undetected for one yearbusiness licenses, undetected for one year
750 people had their identities stolen750 people had their identities stolen
Theft would have gone unnoticed without Theft would have gone unnoticed without California Identity theft law SB 1386 California Identity theft law SB 1386
Identity TheftIdentity Theft
MSN EventMSN Event
Data Warehouse Information GatheringData Warehouse Information Gathering
Over the Phone InterviewsOver the Phone Interviews
Trash Can HuntingTrash Can Hunting
Gathered from Doctors, Internet Transactions, Gathered from Doctors, Internet Transactions, Telephone Operators (Overseas or Prisoners)Telephone Operators (Overseas or Prisoners)
MSN EmailMSN Email
What Can You Do?What Can You Do?
Carefully monitor your credit card bills and credit Carefully monitor your credit card bills and credit reportsreports
Request a once a year free access credit report Request a once a year free access credit report via the three big credit agencies.via the three big credit agencies. Equifax, Experian, TransUnion Equifax, Experian, TransUnion
Victims: contact Federal Trade Commission to Victims: contact Federal Trade Commission to report the theft and monitor credit reports. report the theft and monitor credit reports. 1-800-IDTHEFT1-800-IDTHEFT
ReferencesReferencesDecision Support Systems in the 21Decision Support Systems in the 21stst Century Century 2 2ndnd Edition, by George M. Edition, by George M. Marakas, Prentice Hall, Upper Saddle River, NJ, 2003Marakas, Prentice Hall, Upper Saddle River, NJ, 2003
http://seattletimes.nwsource.com/html/editorialsopinion/2002191098_credited27.htmlSeattle times, plugging holes in data warehousing Seattle times, plugging holes in data warehousing
Teradata warehouse improves real-time alerts and integrationTeradata warehouse improves real-time alerts and integrationCliff Saran.Cliff Saran. Computer Weekly.Computer Weekly. Sutton: Oct 12, 2004. p. 22 (1 page) Sutton: Oct 12, 2004. p. 22 (1 page)
ON THE MARKON THE MARKMark Hall.Mark Hall. Computerworld.Computerworld. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. 6 (1 page)6 (1 page)
Optimization: It's All About the Data Optimization: It's All About the Data Brandweek: Ellen Pederson, Mark Brandweek: Ellen Pederson, Mark AndersonAnderson
THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP Intelligent Enterprises, Michael GonzalezIntelligent Enterprises, Michael Gonzalez
ReferencesReferenceshttp://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence- http://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence- Beyond the Data WarehouseBeyond the Data Warehouse
http://www.computerworld.com/printthis/2001/0,4814,56969,00.html Micro-http://www.computerworld.com/printthis/2001/0,4814,56969,00.html Micro-segmentation – Computerworldsegmentation – Computerworld
Too Much InformationToo Much Information Forbes article on data warehouseForbes article on data warehouse
http://reviews.cnet.com/4520-3513_7-5690533-1.html http://reviews.cnet.com/4520-3513_7-5690533-1.html When identity When identity thieves strike data warehousesthieves strike data warehouses
Over half of data warehouse projects doomedOver half of data warehouse projects doomed VNU Business Publications VNU Business Publications Limited, Robert Jaques 25 February 2005 Limited, Robert Jaques 25 February 2005
http://www.linuxworld.com/magazine/?issueid=571 Linux World Article http://www.linuxworld.com/magazine/?issueid=571 Linux World Article
Questions?Questions?