data mining

Post on 22-Jun-2015


Data Mining

Priyabrata Satapathy, M.Tech 1st Year

SIS NO.-MCS12121

Contents

What is Data Mining

Why Data Mining is Needed

Data, Information, Knowledge

Data Mining & KDD

Data Warehouses

Data Cleaning

Applications of Data Mining

What is Data Mining

Data mining (knowledge discovery in databases) is the extraction of interesting information or patterns from data in large databases.

Knowledge discovery in databases (KDD) is the process of identifying valid, useful, and ultimately understandable patterns in data from large databases.

Why Data Mining is Needed

Data mining is needed to provide tools for discovering knowledge from data.

Data mining turns a large collection of data into knowledge.

The Data

Data are any facts, numbers, or text that can be processed by a computer.

Operational or transactional data: such as sales, cost, inventory, payroll, and accounting.

Metadata: data about the data itself, such as logical database design or data dictionary definitions.

Nonoperational data: such as industry sales, forecast data, and macroeconomic data.

The Information

The patterns, associations, or relationships among all this data can provide information.

For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
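As a minimal sketch of the point-of-sale example, the snippet below counts units sold per product and per hour. The transaction records, product names, and variable names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical point-of-sale records: (hour of day, product sold).
transactions = [
    (9, "coffee"), (9, "bagel"), (9, "coffee"),
    (13, "sandwich"), (13, "coffee"), (18, "sandwich"),
]

# Which products are selling: units sold per product.
units_per_product = Counter(product for _, product in transactions)

# When they are selling: units sold per hour of day.
units_per_hour = Counter(hour for hour, _ in transactions)

# The single best-selling product and its count.
best_seller, best_count = units_per_product.most_common(1)[0]

print(units_per_product)
print(units_per_hour)
print(best_seller, best_count)
```

The raw tuples are data; the per-product and per-hour counts are the information derived from them.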

The Knowledge

Information can be converted into knowledge about historical patterns and future trends.

For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior.

Data, Information & Knowledge

Data Mining & KDD

Data cleaning: used to remove noise and inconsistent data.

Data integration: where multiple data sources may be combined.

Data selection: where data relevant to the analysis task are retrieved from the database.

Data transformation: where data are transformed or consolidated into forms appropriate for mining, for example by performing summary operations.

Data mining: an essential process where intelligent methods are applied in order to extract data patterns.
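The steps above can be sketched as a small pipeline. This is only an illustrative sketch: the two source lists, the field names, and the helper functions (clean, integrate, select, transform, mine) are hypothetical, and the "mining" step is a simple co-occurrence count standing in for a real mining method.

```python
from collections import Counter

# Hypothetical records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "item": "bread", "qty": 2},
    {"customer": "c2", "item": "milk", "qty": None},  # incomplete record
]
store_b = [
    {"customer": "c1", "item": "milk", "qty": 1},
    {"customer": "c3", "item": "bread", "qty": 1},
    {"customer": "c3", "item": "milk", "qty": 3},
]

def clean(records):
    """Data cleaning: drop records whose quantity is missing."""
    return [r for r in records if r["qty"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, items):
    """Data selection: keep only records relevant to the task."""
    return [r for r in records if r["item"] in items]

def transform(records):
    """Data transformation: summarise records into one item set per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Data mining (stand-in): count how often each pair of items co-occurs."""
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs

records = integrate(clean(store_a), clean(store_b))
records = select(records, {"bread", "milk"})
baskets = transform(records)
pair_counts = mine(baskets)
print(pair_counts)
```

Each function corresponds to one step in the list, so any step can be swapped for a more realistic implementation without changing the others.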


Data Warehouse

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and residing at a single site.

A data warehouse is constructed through a process of data cleaning, data integration, data transformation, data loading, and data refreshing.

Data Cleaning

Data that is to be analyzed by data mining techniques can be incomplete, noisy, and inconsistent.

Data cleaning routines attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data.

Missing Values

We can handle missing values in data by:

Ignoring the tuple.

Filling in the missing value manually.

Using a global constant to fill in the values.

Using a measure such as the mean or median to fill in the missing value.

Using the most probable value to fill in the missing value.
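A minimal sketch of these strategies on a single attribute follows. The ages list, the sentinel constant -1, and the fill helper are assumptions made for illustration. The most frequent value (the mode) is used here only as a crude stand-in for "the most probable value", which in practice may be estimated with a predictive model.

```python
from statistics import mean, median, mode

# Hypothetical attribute column; None marks a missing value.
ages = [25, None, 31, 40, None, 31]

# Values that are actually present.
observed = [v for v in ages if v is not None]

def fill(values, replacement):
    """Return a copy of values with every None replaced by replacement."""
    return [replacement if v is None else v for v in values]

# Ignoring the tuple: keep only the complete values.
ignored = observed

# Using a global constant (here -1, an arbitrary sentinel).
with_constant = fill(ages, -1)

# Using the mean or the median of the observed values.
with_mean = fill(ages, mean(observed))
with_median = fill(ages, median(observed))

# Using the most frequent observed value as a rough "most probable" value.
with_mode = fill(ages, mode(observed))

print(with_constant)
print(with_mean)
print(with_median)
print(with_mode)
```

Filling in values manually is not shown, since it requires a person to inspect each record.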

Noisy Data

Noisy data is data that contains errors. To handle noisy data:

Binning: binning methods smooth a sorted data value by consulting its neighborhood, that is, the values around it.

Regression: data can be smoothed by regression, in which the data values are fitted to a function.

Outlier analysis: outliers may be detected by clustering. Similar values are organized into clusters, and values that fall outside the clusters are considered outliers.
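As one illustration of binning, the sketch below smooths values by bin means: the sorted values are split into bins of equal size, and each value is replaced by the mean of its bin. The price list, the bin size of 3, and the function name smooth_by_bin_means are assumptions for this example; other variants (such as smoothing by bin medians or bin boundaries) are not shown.

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of the bin it falls into."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Illustrative price data, split into three bins of three values each.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
print(smoothed)  # [9, 9, 9, 22, 22, 22, 29, 29, 29]
```

Each value is pulled toward its neighbors, which reduces the effect of small random errors while keeping the overall trend of the sorted data.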

Applications of Data Mining

Data mining for financial data analysis.

Data mining for retail and telecommunication industries.

Data mining for science and engineering.

Data mining and recommender systems.

Thank You
