data mining bi

Upload: kunal-kalra

Post on 06-Apr-2018


  • 8/2/2019 Data Mining BI

    In the past, mining for gold consisted of choosing a site and then sifting through it with an endless amount of effort. Sometimes the prospector found only a few valuable nuggets, sometimes he hit upon an entire vein, but most of the time he found nothing at all and decided either to move on to another promising spot or to give up mining altogether and stop wasting his time. Today, with scientific methods and specialized tools, mineral mining is much more accurate and productive. Mining for data has evolved in much the same way. Older methods executed by business mathematicians and statisticians took a long time to yield constructive information. Now, current software and techniques help make data mining a lucrative, more accessible process for businesses.

    How Data Mining Works

    Data mining is simply filtering through large amounts of raw data for useful information that gives a business a competitive edge. This information is made up of meaningful patterns and trends that are already there but were previously unseen.

    A company might be collecting enormous amounts of data from sources such as cash registers, bar code sweeps, surveys, inventory reports, Web hits, registration cards, and so on. Companies usually end up with so much data that it is difficult to go through it all, and by the time it's compiled and analyzed, it could be severely outdated. Instead of letting this data sit in database archives indefinitely, companies could mine it efficiently within a relatively short amount of time.

    The end result of data mining should be the acquisition of new and useful information that can help a company make better decisions that improve business. For example, let's assume that your company sells pet supplies. Mining through customer and product databases reveals that cat food purchasers tend to rent apartments and dog food purchasers usually own homes. With this type of information, which may have been difficult to piece together before, you can now target your marketing efforts more effectively.

    The overall mining process actually begins with a targeted problem. To keep the project manageable, the business should narrow the scope of the mining process to a single issue, such as increasing repeat business. Data mining is more successful when the company first decides what it wants to get out of the mining or what business problem it wants to solve.

    Searching through vast amounts of data for anything and everything wastes resources and could generate unusable results such as irrelevant data (people with the last name of Young always buy dog food), erroneous data (all customers who buy fish food also buy gum, but only three customers fit this trend), or obvious data (people who buy cat food own cats). Once the company defines a problem or focus, it can also identify the mining targets, such as the databases or data stores that are relevant to the problem.
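
The "erroneous data" case can be made concrete with two standard association-rule measures, support and confidence. The sketch below is an illustration only: the baskets, the `rule_stats` helper, and the `MIN_SUPPORT_COUNT` cutoff are assumptions made up for this example, not part of any particular mining package.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (count, support, confidence) for the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    count = len(with_both)
    support = count / len(transactions)
    confidence = count / len(with_antecedent) if with_antecedent else 0.0
    return count, support, confidence


# Hypothetical baskets: 3 fish-food buyers (all bought gum) among 1,000 sales.
transactions = [{"fish food", "gum"}] * 3 + [{"dog food"}] * 997

MIN_SUPPORT_COUNT = 30  # assumed cutoff; a real project would tune this

count, support, confidence = rule_stats(transactions, "fish food", "gum")
is_reliable = count >= MIN_SUPPORT_COUNT
print(f"count={count} support={support:.3f} confidence={confidence:.2f} reliable={is_reliable}")
```

The rule holds for every fish-food buyer, so its confidence is perfect, yet it rests on only three baskets. Filtering on a minimum count is one simple way to keep such patterns out of a report.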

    Next, the actual mining begins. This data-defining process looks for patterns, trends, relationships, frequencies, and influences on the data. Depending on how much data is involved, this process might take anywhere from a few hours to several days to sift through data, pick out patterns, and then verify and re-verify them.

    Figure 1: Microsoft's BI Architecture

    Intelligence In Mining Methods.

    The most popular tool used when mining is artificial intelligence (AI). AI technologies try to work the way the human brain works, by making intelligent guesses, learning by example, and using deductive reasoning.

    When mining, you want the computer to think like you as much as possible. Say that you want to predict stock prices. What you really want is to predict what the right action to take is. Suppose the computer predicts that IBM will go up $1, and then it goes up $3. You are pleasantly surprised, but the computer thinks it made a mistake and tries very hard to correct it.

    Once you define the problem, the computer should be doing the guesswork. Some of the more popular AI methods used in data mining include neural networks, clustering, and decision trees.

    Neural networks look at the rules of using data, which are based on the connections found or on a sample set of data. For example, you can set up mining software so that it looks at an investment database for patterns of stock prices in relation to other factors you dictate.

    As a result, the software continually analyzes the value and compares it to the other factors, and it compares these factors repeatedly until it finds patterns emerging. These patterns are known as rules. The software then looks for other patterns based on these rules or sends out an alarm when a trigger value is hit. Neural networks are more accurate than many other methods; however, they're less interpretable and usually slower.
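
The "learning by example" idea can be shown in its smallest possible form: a single linear neuron that repeatedly corrects its own prediction errors. This is a deliberately reduced sketch, not how a commercial mining package works; the `train_neuron` function, the learning rate, and the sample data are assumptions chosen for illustration.

```python
def train_neuron(examples, learning_rate=0.05, epochs=1000):
    """Fit prediction = weight * x + bias by repeatedly correcting its own errors."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = (weight * x + bias) - target
            # Nudge both parameters in the direction that shrinks the error.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


# Hypothetical training examples that follow target = 2 * x + 1.
examples = [(1, 3), (2, 5), (3, 7), (4, 9)]

weight, bias = train_neuron(examples)
print(f"learned weight={weight:.3f} bias={bias:.3f}")
print(f"prediction for x=10: {weight * 10 + bias:.2f}")
```

A real neural network stacks many such units with nonlinear activations, which is part of why its results are harder to interpret and slower to compute.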

    Clustering (also called family clusters and symbolic classifiers) divides data into groups based on similar features or limited data ranges. In a very simple example, mining might uncover a cluster of customers with low incomes. But when complex adjoining factors are filtered in and analyzed, mining might also reveal that young customers fall into this cluster as well. Hence, the business learns that its younger customers tend to make less money.

    Clusters are used when data isn't labeled in a way that is favorable to mining. For instance, an insurance company that wants to find instances of fraud wouldn't have its records labeled as fraudulent or not fraudulent. But after analyzing patterns within clusters, the mining software can start to figure out the rules that point to which claims are likely to be false.
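
As a rough illustration of grouping unlabeled records, here is a minimal k-means sketch in plain Python. k-means is only one of many clustering techniques and is used here as a stand-in; the `kmeans` function, the customer data, and the hand-picked starting centers are all assumptions for the example.

```python
def kmeans(points, centers, iterations=10):
    """Very small k-means: assign each point to its nearest center, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[distances.index(min(distances))].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical customers as (age, annual income in $1,000s); no labels attached.
customers = [(22, 18), (25, 21), (23, 20), (47, 62), (52, 70), (49, 66)]

# Starting centers are picked by hand to keep the run deterministic.
centers, groups = kmeans(customers, centers=[(20, 20), (50, 60)])
for center, group in zip(centers, groups):
    print(f"center={center} members={group}")
```

In this made-up data, the low-income group turns out to contain only the younger customers, which mirrors the example in the text.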

    Decision trees, like clusters, separate the data into subsets and then analyze the subsets to divide them into further subsets, and so on (for a few more levels). The final subsets are then small enough that the mining process can find interesting patterns and relationships within the data. To illustrate this, let's assume that customers are divided into one-time visitors and repeat customers. The decision tree method reviews the factors of the one-time visitor tree and then separates this data according to type of sale, such as an in-store sale or an online sale. Mining in this manner might eventually reveal that online customers do more repeat shopping.

    Commercial and academic entities tend to use decision trees more than other methods. Decision trees are easy to understand and map, but unfortunately they also tend to produce the least accurate results.
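
One level of a decision tree can be sketched as choosing the factor that separates the records most cleanly. The example below scores candidate splits with Gini impurity, a common criterion that is assumed here rather than taken from the article; the `gini` and `best_split` helpers and the customer records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(v) / total) ** 2 for v in set(labels))


def best_split(rows, features, target):
    """Pick the feature whose split gives the lowest weighted Gini impurity."""
    scores = {}
    for feature in features:
        weighted = 0.0
        for value in {row[feature] for row in rows}:
            subset = [row[target] for row in rows if row[feature] == value]
            weighted += len(subset) / len(rows) * gini(subset)
        scores[feature] = weighted
    return min(scores, key=scores.get), scores


# Hypothetical customer records.
rows = [
    {"channel": "online", "used_coupon": True, "repeat": True},
    {"channel": "online", "used_coupon": False, "repeat": True},
    {"channel": "online", "used_coupon": True, "repeat": True},
    {"channel": "store", "used_coupon": True, "repeat": False},
    {"channel": "store", "used_coupon": False, "repeat": False},
    {"channel": "store", "used_coupon": False, "repeat": True},
]

feature, scores = best_split(rows, ["channel", "used_coupon"], "repeat")
print(feature, scores)
```

A full tree would apply `best_split` again to each resulting subset for a few more levels. The single split is also easy to read aloud, which is the interpretability the text describes.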

    Older methods used traditionally in statistical analysis can also contribute a lot to mining. Regression analysis, for instance, is a good method for finding linear patterns in the data. By looking at several explanatory factors, regression analysis draws some aggregate conclusion about a single factor. Older methods are not always as fancy or as easy to use, but they still provide some of the most accurate analyses.
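
For a feel of what regression does, the sketch below fits a straight line by ordinary least squares. It is simplified to a single explanatory factor, whereas the text describes several; the `fit_line` function and the ad-spend figures are assumptions invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory factor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend ($1,000s) versus units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = intercept + slope * 6
print(f"units = {intercept:.1f} + {slope:.1f} * spend; forecast at 6: {forecast:.1f}")
```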


    It is best to remember that no single method is a universal solution and some are better with particular types of data. Often, using more than one method for data mining yields the best results. In doing so, you can use different data mining tools or choose a software package that incorporates more than one method.

    Reporting The Results.

    When the process is complete, the mining software generates a report. An analyst goes over the report to see if further work needs to be done, such as refining mining parameters, using other data analysis tools to examine the data, or even scrapping the data if it's unusable. If no further work is required, the report proceeds to the decision makers for appropriate action.

    Traditional query-and-reporting tools for analyzing data involve designing specific queries or search criteria to find usable information. This involves making an assumption, such as "cat food shoppers purchase more upholstery cleaner," and then searching through the database for data that confirms or rejects that assumption. This is fine if the user knows specifically what to look for. Mining, however, goes beyond this by finding useful information based on a general question.

    A way to think of it is this: with a query, you have to make a hypothesis about relationships yourself, whereas mining is an automatic hypothesis generator. By searching through the data from the bottom up, mining combs the records for specific details, and then you can assemble answers to your queries based on what the data is actually saying.
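
The contrast between checking one hypothesis and generating many can be sketched in a few lines. The baskets and the `query` and `mine_pairs` helpers below are hypothetical, and counting co-occurring pairs is only a simple stand-in for what real mining software does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"cat food", "upholstery cleaner"},
    {"cat food", "upholstery cleaner", "litter"},
    {"cat food", "litter"},
    {"dog food", "leash"},
    {"dog food", "leash", "gum"},
    {"fish food", "gum"},
]


def query(baskets, item_a, item_b):
    """Query style: count the baskets that match one hypothesis stated up front."""
    return sum(1 for b in baskets if item_a in b and item_b in b)


def mine_pairs(baskets, min_count=2):
    """Mining style: count every pair that co-occurs and keep the frequent ones."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(query(baskets, "cat food", "upholstery cleaner"))
print(mine_pairs(baskets))
```

The query answers only the question it was asked, while the pair count also surfaces relationships nobody thought to ask about, such as dog food with leashes in this made-up data.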

    Data mining is also more than merely reporting because it tells you not just what has happened in the past but also what could happen in the future, based on the trends it finds. Query-and-reporting tools, on the other hand, can only tell you what has happened. In comparison, data mining has a lot more relevant and valuable information to offer.

    This can be the fun and interesting part of data mining. It is like looking into a crystal ball [and] being a wizard looking into the future.

    As technologies allow more and more data to be stored, query-and-reporting tools may not always be the most efficient method of finding out what you need to know. Using querying to investigate numerous scenarios and what-ifs might slow down large databases that are in use by other users. Mining does not take the place of querying, however. Querying to answer targeted or complex questions is still valuable, and using query-and-reporting tools in conjunction with mining might be the best way to refine the information that mining uncovers.

    Decide What To Mine.

    Naturally, to mine data, there must be data available. Believe it or not, we've seen companies that start mining without any data, in hopes of getting it or collecting it. If the information is not there, no algorithm will find it.

    Once the data to be mined is identified, it should be cleansed. Cleansing data frees it from duplicate information and erroneous data. Next, the data should be stored in a uniform format within relevant categories or fields. Because mining must use good data to ensure productive results, cleansing or formatting the data can take up a significant amount of time in the mining process.
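
A small cleansing pass might look like the sketch below, which normalizes records to one format, drops an obviously invalid row, and removes duplicates. The `cleanse` function, the field names, and the rule that a repeated email marks a duplicate are all assumptions for illustration; real cleansing rules depend on the data at hand.

```python
def cleanse(records):
    """Normalize each record to one format, drop invalid rows, and remove duplicates."""
    seen = set()
    clean = []
    for record in records:
        name = " ".join(record["name"].split()).title()
        email = record["email"].strip().lower()
        if "@" not in email:  # crude validity check, for illustration only
            continue
        if email in seen:  # treat a repeated email as a duplicate customer
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean


# Hypothetical raw customer rows with inconsistent formatting.
raw = [
    {"name": "  ann  lee ", "email": "Ann.Lee@Example.com"},
    {"name": "Ann Lee", "email": "ann.lee@example.com "},
    {"name": "BOB RAY", "email": "not-an-email"},
    {"name": "cy  dunn", "email": "cy@example.com"},
]

cleaned = cleanse(raw)
print(cleaned)
```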

    Mining tools can work with all types of data storage, from large data warehouses to smaller desktop databases to flat files. Data warehouses and data marts are storage methods that involve archiving large amounts of data in a way that makes it easy to access when necessary.

    Data mining and massive data storage are highly complementary activities. In fact, some of the same vendors that offer data mining solutions also produce and market data storage tools. Even so, you do not have to collect years of data to make use of the power of data mining.

    Mining is most useful with a large number of factors. You don't need lots of data. For example, if a company keeps track of 20 to 50 products or 50 to 100 customer traits, it will probably make better use of mining and find several overlapping relationships than a company that just maintains years of data kept in only a few fields or in a memo format.

    Is Mining For You?

    Most of the data mining examples in this article involve marketing issues, and this is certainly a common use of mining, but data mining is also popular in areas such as investment analysis and detecting insurance or banking fraud. According to Dr. Elder, these industries adopted mining techniques early because they have had a great deal of data that is very hard to sift through. Mining turns up data that gives them just a little edge, but the payoff for this small edge is large.

    In addition, the power of data mining is being used for many other purposes, such as analyzing Supreme Court decisions, discovering patterns in health care, pulling stories about competitors from newswires, resolving bottlenecks in production processes, and analyzing sequences in the human genetic makeup. There really is no limit to the type of business or area of study where data mining can be beneficial.

    Data mining is simple in theory, but it can get quite involved in practice. That doesn't mean that mining is not accessible; it just means that you'll probably need some help getting set up. Many data mining software solutions are available for small and large businesses in most industries, and these tools are constantly improving. They range from desktop applications costing a few hundred dollars to complete server hardware and software solutions costing tens of thousands of dollars.

    The mining software you select should be able to test various scenarios, provide analysis in a decipherable format, and most importantly, give you information that you don't already have. Not all data mining tools are equal, nor are all tools appropriate for your type of business data, but with a little research, you'll find the one that best fits your situation.

    Any company looking into data mining should first consider a few things. First, you should expect that the mining effort will make a difference in how you do business and affect the bottom line. Next, you must have sources of information that need to be mined. In addition, you must be ready to commit the company to the mining project. Mining data takes time: time to prepare the data, time for the mining software to discover related patterns, and time for the mining process to adjust when changes occur.

    When beginning a mining project, we tell a company to estimate about six months to make a meaningful advance.

    Finally, the business should also have someone who has the time and is willing to become proficient at the mining process. It is vital to remember that when considering a process as intense as data mining, human intervention is still an extremely important part of the loop. You need to have someone who is willing to understand the tools, the methods, and the queries, as well as how to interpret the data. A user can easily misinterpret mining results just to fit some preconceived perception. Hiring a consultant might help your company understand the process better and select the most appropriate software package. We don't mean to suggest, however, that experimentation is not helpful, especially if you are a smaller company or you already have some software in mind.

    Sometimes, there's a breakthrough when trying something new. Some mistakes will be made, mistakes that can be remedied with statistical expertise, but the technology is becoming more available for more users.

    Data mining is not designed to fix all business problems or magically tell you what the real problem is. If properly used, however, data mining is a useful tool that gives a company an information-sensitive microscope that it didn't already have, which in turn can be used to help make intelligent business decisions. You might not always get the answer you expected or needed, but you'll often find new information that is still constructive.

    Most of all, data mining is an ongoing process that involves a lot of analysis and refining along the way, so think of it as a worthy investment. And like any investment, even if your data-mining portfolio only uncovers small golden nuggets at first, those nuggets, if properly managed, can yield a lot of value.

    Data Mining Glossary

    For a quick review of data mining terms and definitions, see the entries below.

    data mart: As a data storage term, data marts tend to store information for a single subject or department, and can be subsets of a larger warehouse.

    data warehouse: A massive database for efficiently storing large amounts of data to accommodate the rapid processing of queries and summaries.

    Online Analytical Processing (OLAP): A tool that helps organize databases more efficiently for quick access to internal data, especially when querying large amounts. This type of tool is used commonly with databases that organize data in multiple dimensions or aspects.

    query: To request specific data from a database or ask a question that you want the mining process to answer.

    rule: When referring to data mining processes, a rule signifies a pattern of factors that a mining method follows to analyze and compare data effectively.

    structured query language (SQL): A type of programming language used to perform queries and maintenance in databases. SQL uses several words in plain English, and many databases are able to incorporate SQL commands, which makes it a very popular language for working with databases.