Data Mining Applications CIS 435

Francisco E. Figueroa

I. Executive Summary

The fundamental concept of data mining is extracting useful knowledge from data to solve business problems and improve decision making. These business problems can be treated systematically by following the Cross Industry Standard Process for Data Mining (CRISP-DM), which codifies the data mining process. There is considerable evidence that integrating data-driven and big data technologies can help an organization improve business performance. Applying data mining to healthcare, which is becoming increasingly essential, can benefit the industry greatly from the payer, provider, and patient perspectives. For example, payers can detect fraud and abuse, physicians can identify effective treatments and best practices, hospitals can identify patients at risk of readmission, and patients can receive better and more affordable healthcare services. Because healthcare transactions generate large volumes of complex data, it is important to apply data mining processes, methods, and tasks to convert that data into useful information and knowledge for decision making.

II. Data Mining Applications

Many businesses and scientific communities are employing data mining tools and techniques, and more and more success stories are becoming known. Because of the large volume of financial and online transactions, it is critical for the financial sector to identify the right information in order to increase customer loyalty by collecting and analyzing client behavior data, predict client behavior and offer relevant products and services, distinguish fraudulent from non-fraudulent transactions based on historical data, and estimate the profitability of a new customer. In healthcare, data mining applications are used to identify and monitor high-risk and chronic diseases and design the right treatment; compare symptoms, causes, treatments, and negative effects; and detect medical insurance abuse.

III. Data Mining Themes

A large number of data mining algorithms have been developed, but they address only a handful of task types. Classification, together with class probability estimation, attempts to predict which of a small set of classes each individual belongs to. For example, which patients are likely to have diabetes? A closely related task is scoring. A scoring model is applied to each individual and produces a score representing the probability that the individual belongs to a certain class. In the diabetes example, the score gives the probability that each patient has diabetes.
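To make the difference between scoring and classification concrete, the following is a minimal sketch rather than a production model. It assumes a small logistic regression trained by gradient descent, which is only one of many possible classifiers and is not prescribed by the sources used here. The two features (standardized BMI and standardized glucose level) and all patient values are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Scoring: estimated probability that patient x belongs to the positive class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def classify(weights, bias, x, threshold=0.5):
    """Classification: turn the score into a class label (1 = likely diabetic)."""
    return 1 if score(weights, bias, x) >= threshold else 0

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = score(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical training data: (standardized BMI, standardized glucose level).
patients = [(-1.2, -0.9), (-0.8, -1.0), (-1.0, -0.6),
            (0.9, 1.1), (1.1, 0.7), (0.7, 1.2)]
has_diabetes = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(patients, has_diabetes)

new_high = (1.0, 0.9)     # hypothetical patient resembling the diabetic group
new_low = (-1.0, -0.95)   # hypothetical patient resembling the non-diabetic group
print("score (high-risk profile):", round(score(weights, bias, new_high), 3))
print("score (low-risk profile):", round(score(weights, bias, new_low), 3))
print("class labels:", classify(weights, bias, new_high), classify(weights, bias, new_low))
```

The score is the quantity a care team could rank patients by, while the class label depends on a threshold that the business problem has to justify.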

If you want to know how much a patient will spend on medications based on their conditions, then regression, or “value estimation”, is the task to use. Regression attempts to estimate or predict, for each individual, the numerical value of some variable. Similarity matching attempts to identify similar individuals based on data known about them. This method is very popular for making product recommendations by finding people who are similar in terms of the products they have purchased. Applied to healthcare, it can find patients who are similar in terms of physician diagnoses and medications prescribed.
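The two tasks in this paragraph can be sketched in a few lines each. The example below is illustrative only: it assumes a one-variable least-squares line for value estimation and Jaccard similarity over sets of diagnosis codes and medications for similarity matching. Both are simple stand-ins chosen for this sketch, and every number, patient identifier, and attribute set is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def jaccard(a, b):
    """Share of attributes two patients have in common (0 = none, 1 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def most_similar(target_id, profiles):
    """Return (patient id, similarity) of the patient closest to target_id."""
    target = profiles[target_id]
    candidates = [(jaccard(target, attrs), pid)
                  for pid, attrs in profiles.items() if pid != target_id]
    best_score, best_id = max(candidates)
    return best_id, best_score

# Regression: hypothetical number of chronic conditions vs. monthly medication spend.
conditions = [1, 2, 3, 4]
monthly_spend = [50.0, 90.0, 130.0, 170.0]
intercept, slope = fit_line(conditions, monthly_spend)
predicted_spend = intercept + slope * 5   # estimate for a patient with 5 conditions

# Similarity matching: hypothetical diagnosis codes and prescribed medications.
profiles = {
    "p1": {"E11", "I10", "metformin"},
    "p2": {"E11", "metformin", "insulin"},
    "p3": {"J18", "amoxicillin"},
}
match_id, match_score = most_similar("p1", profiles)

print("predicted monthly spend:", predicted_spend)
print("patient most similar to p1:", match_id, match_score)
```

Regression returns a number for each patient, while similarity matching returns another patient, which is why the two tasks answer different business questions even when they use the same data.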

Clustering attempts to group individuals in a population together by their similarity. It is important to note that the grouping is not driven by any specific purpose. In a hospital, clustering can help the organization understand patient segments, or the formation of natural groups, and then apply other data mining tasks to drill down on specific questions. Clustering can provide input to decision-making processes focused on questions such as: What services can we offer? How can care teams be coordinated for a better patient experience?
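As an illustration of how natural groups can emerge without a target variable, here is a minimal k-means sketch. K-means is an assumption made for this example, since the text does not name a clustering algorithm, and the two features (hospital visits per year and number of active prescriptions) and their values are hypothetical. The initial centroids are simply the first k points so that the result is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, max_iterations=20):
    """Plain k-means with deterministic initialization (first k points)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical patients: (hospital visits per year, active prescriptions).
patients = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0),
            (9.0, 8.0), (1.0, 0.5), (8.5, 9.5)]

centroids, labels = kmeans(patients, k=2)
print("segment per patient:", labels)
print("segment centers:", centroids)
```

The algorithm only reports which patients belong together. Interpreting a segment, for example as "frequent visitors with many prescriptions", is left to the analyst and to follow-up analysis.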

Co-occurrence grouping is also known as frequent itemset mining, association rule discovery, and market-basket analysis. This method attempts to find associations between entities based on transactions involving them. A typical question is: What items are commonly purchased when a patient buys a medication? The method analyzes purchase records from the pharmacy. The results help pharmacies or health plans build promotions, discounts, or alternative products for patients who are acquiring certain medications.
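The pharmacy question above can be sketched by counting how often pairs of items appear in the same purchase. The example below is a simplified illustration restricted to item pairs, not a full frequent itemset miner, and the baskets are hypothetical. It reports the usual association rule measures: support, confidence, and lift.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # above 1 means bought together more than chance
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical pharmacy purchase records, one set of items per visit.
baskets = [
    {"metformin", "glucose test strips", "lancets"},
    {"metformin", "glucose test strips"},
    {"metformin", "vitamin d"},
    {"ibuprofen", "vitamin d"},
    {"glucose test strips", "lancets"},
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A rule with high lift, such as lancets with glucose test strips in this toy data, is the kind of pattern a pharmacy or health plan could use when designing a promotion.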

Profiling attempts to characterize the typical behavior of an individual, group, or population. It is used to establish behavioral norms for anomaly detection applications such as fraud detection, and it can also be used to monitor intrusions into computer systems. Profiling can help a health plan identify misuse of healthcare services.
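A simple way to picture profiling is to summarize past behavior and then flag new observations that fall far outside it. The sketch below assumes a mean and standard deviation profile with a z-score threshold of 3, which is a common rule of thumb rather than a method taken from the sources, and the monthly claim counts for a single provider are hypothetical.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarize typical behavior as the mean and sample standard deviation."""
    return {"mean": mean(history), "stdev": stdev(history)}

def z_score(profile, value):
    """How many standard deviations the value lies from typical behavior."""
    return (value - profile["mean"]) / profile["stdev"]

def is_anomalous(profile, value, z_threshold=3.0):
    """Flag a value that deviates strongly from the established norm."""
    if profile["stdev"] == 0:
        return value != profile["mean"]
    return abs(z_score(profile, value)) > z_threshold

# Hypothetical number of claims submitted per month by one provider.
monthly_claims = [20, 22, 19, 21, 23, 20, 22, 21]
profile = build_profile(monthly_claims)

print("typical claims per month:", profile["mean"])
print("23 claims anomalous?", is_anomalous(profile, 23))
print("60 claims anomalous?", is_anomalous(profile, 60))
```

A flagged month is only a signal for review. Whether it reflects misuse, a data error, or a legitimate change in patient volume still has to be investigated.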

Other methods include link prediction, data reduction, and causal modeling. Link prediction attempts to predict connections between data items, usually by suggesting that a link should exist. It is common in social networking systems. Data reduction takes a large set of data and replaces it with a smaller set that contains much of the important information in the larger set. The smaller set is easier to analyze and may reveal information more effectively than the larger set. Causal modeling helps the analyst understand what events or actions influence others. Techniques for causal modeling include those that require a substantial investment in data, such as randomized controlled experiments like A/B testing, and methods for drawing causal conclusions from observational data.

IV. Use Cases

According to Kaiser Health News, “UnitedHealth and University of California form ACO to focus on mining patient data. In accountable care organizations, or ACOs, physicians, hospitals and an insurer work together to coordinate care, control spending and share savings”. A healthcare data warehouse can serve as the primary source for identifying patients with specific conditions who are particularly important targets for data mining and predictive analytics. For example, patients with type 2 diabetes or pneumonia can be identified. Data mining can help detect patterns among patients with type 2 diabetes by integrating thousands of attributes associated with them. These attributes can include patient visits, charges, type of insurance coverage, admit locations, diagnoses, and procedures.

Some of the use cases include:

a) estimating the probability that a patient will be readmitted for the same condition by finding correlations among the patient account, charge, and clinical data elements that are most likely to result in a readmission;
b) clustering patients aged 55 and older with type 2 diabetes who have been admitted to the hospital more than two times;
c) regression to understand and predict the financial penalties for next month's readmissions of type 2 diabetes patients;
d) for hospital planning purposes, using clustering to identify other services that can be promoted to type 2 diabetes patients based on the services they have obtained, and applying co-occurrence grouping to decide which promotions to offer based on services obtained and products purchased.

V. Standard Data Mining Process

To apply data mining correctly, it is important to use a standard process such as the Cross Industry Standard Process for Data Mining (CRISP-DM). This process provides a framework for structuring the thinking about data analytics problems and a useful codification of the data mining process. The process includes:

a) Business understanding helps the analyst cast the business problem as one or more data science problems involving building models for classification, regression, probability estimation, etc.;
b) Data understanding identifies the limitations of the available data, the data sources, and their reliability, and estimates the investment that will be needed;
c) Data preparation manipulates and converts the data, handles missing values, etc.;
d) Modeling is where data mining techniques are applied;
e) Evaluation assesses the data mining results to confirm that they are valid and reliable;
f) Deployment is the implementation of the predictive model in an information system or business process.

VI. Challenges in Data Mining Adoption

When adopting data mining techniques, the management team and the business and IT areas need to understand them. The algorithms can be complex, and the objects and attributes are not always available in the form needed to apply the right techniques. Challenges can include interoperability, data diversity, data consistency, data accuracy, performance issues, and the organization's maturity as a data-driven organization that makes decisions based on data mining techniques.

VII. Data Mining Software

According to Gartner, “predictive analytics and other categories of advanced analytics represent the fastest-growing segment of the analytics market. Dell joins SAS, IBM, KNIME and RapidMiner as a Leader in this market.” In this case, we decided to use SAS because it continues to have the broadest range of advanced analytics capabilities across its product stack and the widest applicability across use cases. Furthermore, the new SAS Factory Miner is designed to increase the productivity of data science teams by fostering collaboration and through automated large-scale machine learning. In addition, SAS has proven applied data mining techniques for fraud and improper payments, case management, and value-based care, including predicting the cost of care, patient risks, the most effective treatment, and high risk of readmission, among others.

References:

HFMA. Predictive Analytics can Support the ACO Model. August 3, 2014. Zirmed. Retrieved from http://public.zirmed.com/predictive-analytics-can-support-aco-model/

Kaiser Health News. UnitedHealth and University of California form ACO to focus on mining patient data. Healthcare IT News. September 30, 2016. Retrieved from http://www.healthcareitnews.com/news/unitedhealth-and-university-california-form-aco-focus-mining-patient-data

Provost, F. and Fawcett, T. (2013). Data Science for Business. O’Reilly.

The Modeling Agency. How Data Mining is Helping Healthcare. July 24, 2015. Retrieved from https://the-modeling-agency.com/how-data-mining-is-helping-healthcare/

Kart, L., Herschel, G., Linden, A., and Hare, J. Magic Quadrant for Advanced Analytics Platforms. February 8, 2016. Gartner. Retrieved from http://www.gartner.com/document/3204117?ref=solrAll&refval=174751235&qid=86443308f07ef37a237c3c2d0cc59a31
