deepalipawar.files.wordpress.com… · QUESTION BANK of DMTA

QUESTION BANK of DMTA

1) Define the term “Entropy” in classification.
2) Explain the decision tree classification algorithm with the tree pruning phase.
3) Write a short note on KNN.
4) Explain the Apriori algorithm for generating association rules. How are candidate itemsets generated in the Apriori algorithm?
5) What are the various applications of KNN?
6) Explain the various scalable decision tree algorithms. How can performance be improved with the help of scalable decision tree algorithms?
7) Write a short note on the FP-growth algorithm.
8) Compare supervised learning and unsupervised learning.
9) Define the term “Gini Index”.
10) What is Bayes' theorem? Explain the Naïve Bayes classification technique with an example.
11) What do you mean by classification? What are the various requirements for classification?
12) Explain the K-nearest neighbor algorithm for a sales prediction application.
13) Explain the decision tree classification algorithm with
a) Attribute selection phase
b) Tree pruning phase
14) What do you mean by a frequent itemset? How can association rules be generated from frequent itemsets?

Explain the storage models of OLAP.

MOLAP (Multidimensional Online Analytical Processing): In MOLAP, data is stored in the form of multidimensional cubes and not in relational databases...

Differentiate between Data Mining and Data Warehousing.

Data warehousing is merely extracting data from different sources, cleaning the data, and storing it in the warehouse, whereas data mining aims to examine or explore the data using queries...

What is Data purging?

The process of cleaning junk data is termed data purging. Purging data would mean getting rid of unnecessary NULL values of columns...

What are CUBES?

A data cube stores data in a summarized version, which helps in faster analysis of the data. The data is stored in such a way that it allows easy reporting...

What are OLAP and OLTP?

OLTP: Online Transaction Processing helps and manages applications based on transactions involving a high volume of data...

What are the different problems that “Data mining” can solve?

Data mining helps analysts make faster business decisions, which increases revenue with lower costs...

What are the different stages of “Data mining”?

Exploration: This stage involves preparation and collection of data. It also involves data cleaning and transformation. Based on the size of the data, different tools to analyze the data may be required. This stage helps to determine different variables of the data to determine their behavior...

What are Discrete and Continuous data in the Data mining world?

Discrete data can be considered as defined or finite data, e.g. mobile numbers, gender. Continuous data can be considered as...

What is a MODEL in the Data mining world?

Models in data mining help the different algorithms in decision making or pattern matching. The second stage of data mining...

How do data mining and data warehousing work together?

Data warehousing can be used for analyzing the business needs by storing data in a meaningful form. Using data mining,...

What is a Decision Tree Algorithm?

A decision tree is a tree in which every node is either a leaf node or a decision node. This tree takes an object as input and outputs some decision...

What is the Naïve Bayes Algorithm?

The Naïve Bayes algorithm is used to generate mining models. These models help to identify relationships between the input columns and the predictable columns...

Explain clustering algorithm.

A clustering algorithm is used to group sets of data with similar characteristics, also called clusters...

What is the Time Series algorithm in data mining?

The Time Series algorithm can be used to predict continuous values of data. Once the algorithm is trained to predict a series of data, it can predict the outcome of other series...

Explain the Association algorithm in Data mining.

The Association algorithm is used for recommendation engines that are based on market basket analysis. Such an engine suggests products to customers based on what they bought earlier. The model is built on a dataset containing identifiers...

What is the Sequence clustering algorithm?

The Sequence clustering algorithm collects similar or related paths, sequences of data containing events...

Explain the concepts and capabilities of data mining.

Data mining is used to examine or explore the data using queries. These queries can be fired on the data warehouse. Exploring the data in data mining helps in reporting,...

Explain how to work with the data mining algorithms included in SQL Server data mining.

SQL Server data mining offers Data Mining Add-ins for Office 2007 that allow discovering the patterns and relationships in the data...

Explain how to use DMX, the data mining query language.

Data Mining Extensions (DMX) is based on the syntax of SQL. It is based on relational concepts and is mainly used to create and manage data mining models...

Explain how to mine an OLAP cube.

A data mining extension can be used to slice the data in the source cube in the order discovered by data mining...

Question: Explain Full-Text Query in SQL Server.

Answer - SQL Server supports searches on character string columns using Full-Text Query...

Question: Explain the phases a transaction has to undergo.

Answer - The several phases a transaction has to go through are listed here. Database...

Question: What is XPath?

Answer - XPath is a language defined by the W3C, used to select nodes from XML documents...

Question: Define the rules for designing Files and Filegroups in SQL Server.

Answer - A file or filegroup can only be used by one database. For example, the files abc.mdf and abc.ndf contain...

Question: What are the Authentication Modes in SQL Server?

Answer - SQL Server supports two security (authentication) modes...

Question: Explain Data Definition Language, Data Control Language and Data Manipulation Language.

Answer - Data definition language is used to define and manage all attributes and properties of a database...

1. Differentiate between Data Mining and Data warehousing.
2. Define Data purging.
3. Define Analysis services.
4. What are CUBES?
5. What are OLAP and OLTP?
6. Differentiate between Data Mining and Data warehousing.
7. What is Data purging?
8. What are CUBES?
9. What are OLAP and OLTP?
10. What are the different problems that “Data mining” can solve?
11. What are the different stages of “Data mining”?
12. What are Discrete and Continuous data in the Data mining world?
13. What is a MODEL in the Data mining world?
14. How do data mining and data warehousing work together?
15. What is a Decision Tree Algorithm?
16. What is the Naïve Bayes Algorithm?
17. Explain clustering algorithm.
18. What is the Time Series algorithm in data mining?
19. Explain the Association algorithm in Data mining.
20. What is the Sequence clustering algorithm?
21. How do data mining and data warehousing work together?

What are the uses of statistics in data mining?
Statistics is used to:
estimate the complexity of a data mining problem;
suggest which data mining techniques are most likely to be successful; and
identify data fields that contain the most “surface information”.

What are the factors to be considered while selecting the sample in statistics?
The sample should be:
Large enough to be representative of the population.
Small enough to be manageable.
Accessible to the sampler.
Free of bias.

Name some advanced database systems.

Object-oriented databases, Object-relational databases.

Name some specific application oriented databases.

Spatial databases, time-series databases, text databases and multimedia databases.

Define Relational databases.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Define Transactional Databases.
A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans_ID) and a list of the items making up the transaction.

Define Spatial Databases.
Spatial databases contain spatial-related information. Such databases include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases. Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.

What is a Temporal Database?
A temporal database stores time-related data. It usually stores relational data that include time-related attributes. These attributes may involve several time stamps, each having different semantics.

What are Time-Series databases?
A time-series database stores sequences of values that change with time, such as data collected regarding the stock exchange.

Why is machine learning done?
1. To understand and improve the efficiency of human learning.
2. To discover new things or structure that is unknown to human beings.
3. To fill in skeletal or incomplete specifications about a domain.

Give the components of a learning system.
1. Critic
2. Sensors
3. Learning Element
4. Performance Element
5. Effectors
6. Problem generators

What are the steps in the data mining process?
a. Data cleaning
b. Data integration
c. Data selection
d. Data transformation
e. Data mining
f. Pattern evaluation
g. Knowledge representation

Define data cleaning.
Data cleaning means removing inconsistent data or noise and collecting the necessary information.

Define data mining.
Data mining is the process of extracting or mining knowledge from huge amounts of data.

Define pattern evaluation.
Pattern evaluation is used to identify the truly interesting patterns representing knowledge, based on some interestingness measures.

Define knowledge representation.
Knowledge representation techniques are used to present the mined knowledge to the user.

What is Visualization?
Visualization is for the depiction of data and for gaining intuition about the data being observed. It assists the analyst in selecting display formats, viewer perspectives and data representation schemas.

Define Spatial Visualization.
Spatial visualization depicts actual members of the population in their feature space.

What is Descriptive and predictive data mining?
Descriptive data mining describes the data set in a concise and summarized manner and presents interesting general properties of the data. Predictive data mining analyzes the data in order to construct one model or a set of models, and attempts to predict the behavior of new data sets.

What is Data Generalization?
It is a process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels. There are 2 approaches for generalization:
a. Data cube approach
b. Attribute-oriented induction approach

Define Attribute Oriented Induction.
This method collects the task-relevant data using a relational database query and then performs generalization based on the examination of the relevant set of data.

What is bootstrap?
An interpretation of the jackknife is that the construction of pseudo-values is based on repeatedly and systematically sampling without replacement from the data at hand. This leads to the generalized concept of repeated sampling with replacement, called the bootstrap.
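The repeated sampling with replacement described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names, the sample values, the number of resamples and the percentile interval are all assumptions made for the example.

```python
import random

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Sampling WITH replacement: one observation may be drawn several times.
        resample = rng.choices(data, k=n)
        means.append(sum(resample) / n)
    return means

def percentile_interval(values, lower=0.025, upper=0.975):
    """Rough percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lo_idx = int(lower * (len(ordered) - 1))
    hi_idx = int(upper * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]  # hypothetical observations
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

The spread of the resampled means gives an estimate of how much the sample mean would vary, without assuming a particular distribution for the data.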

View of statistical approach?
The statistical method is interested in interpreting the model. It may sacrifice some performance to be able to extract meaning from the model structure. If accuracy is acceptable, then a model that can be decomposed into revealing parts is often more useful than a 'black box' system, especially during the early stages of the investigation and design cycle.

Define Deterministic models?
Deterministic models take no account of random variables, but give precise, fixed, reproducible output.

Define Systems and Models?
A system is a collection of interrelated objects, and a model is a description of a system. Models are abstract and conceptually simple.

How do you choose the best model?
All things being equal, the smallest model that explains the observations and fits the objectives is the one that should be accepted. In reality, "smallest" means the model should optimize a certain scoring function (e.g. least nodes, most robust, least assumptions).

What is clustering?
Clustering is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.
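As one illustration of grouping similar objects, the sketch below runs k-means (a partitioning method) on one-dimensional points. The choice of k-means, the data, the starting centroids and the fixed iteration count are assumptions for the example; other clustering methods work differently.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd-style k-means on 1-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # hypothetical data
centroids, clusters = k_means(points, centroids=[1.0, 2.0])
print(centroids)
print(clusters)
```

Points inside each resulting cluster are close to one another and far from the points of the other cluster, which is the property the definition asks for.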

What are the requirements of clustering?
Scalability
Ability to deal with different types of attributes
Ability to deal with noisy data
Minimal requirements for domain knowledge to determine input parameters
Constraint-based clustering
Interpretability and usability

State the categories of clustering methods?
Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods
Model-based methods

What is linear regression?
In linear regression, data are modeled using a straight line. Linear regression is the simplest form of regression. Bivariate linear regression models a random variable Y, called the response variable, as a linear function of another random variable X, called the predictor variable:

Y = a + bX

State the types of linear model and state their use.
Generalized linear models represent the theoretical foundation on which linear regression can be applied to the modeling of categorical response variables. The types of generalized linear model are:
Logistic regression
Poisson regression
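For the bivariate model Y = a + bX in the linear regression answer above, the coefficients a and b are commonly estimated by least squares. The following is a minimal sketch under that assumption; the function name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from y = 1 + 2x, so the fit is exact
a, b = fit_line(xs, ys)
print(a, b)
```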

Write the preprocessing steps that may be applied to the data for classification and prediction.

a. Data Cleaning
b. Relevance Analysis
c. Data Transformation

Define Data Classification.
It is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. In the second step, the model is used for classification.

What is a “decision tree”?
It is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. A decision tree is a predictive model. Each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.
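The structure described above (internal nodes testing an attribute, branches for outcomes, leaves holding classes) can be sketched as a nested dictionary. The tree below is hand-written for illustration, not learned from data, and the attribute names and class labels are hypothetical.

```python
# Internal node: {"attribute": name, "branches": {outcome: subtree}}
# Leaf node: a plain class label string.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}

def classify(node, record):
    """Follow attribute tests from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]  # result of the test at this node
        node = node["branches"][outcome]     # follow the matching branch
    return node

record = {"outlook": "sunny", "humidity": "high", "windy": "false"}
print(classify(tree, record))
```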

Where are decision trees mainly used?
Used for exploration of datasets and business problems
Data preprocessing for other predictive analysis
Statisticians use decision trees for exploratory analysis

What is Association rule?
Association rule mining finds interesting association or correlation relationships among a large set of data items, which are used for decision-making processes. Association rules analyze buying patterns of items that are frequently associated or purchased together.

Define support.
Support is the ratio of the number of transactions that include all items in the antecedent and consequent parts of the rule to the total number of transactions. Support is an association rule interestingness measure.

Define Confidence.
Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent to the number of transactions that include all items in the antecedent. Confidence is an association rule interestingness measure.
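The two ratios defined above can be computed directly from a list of transactions. This is a small sketch; the function names and the market-basket data are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

transactions = [  # hypothetical baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread => milk
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

Here two of the four baskets contain both bread and milk, and three contain bread, so the rule bread => milk has support 2/4 and confidence 2/3.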

How are association rules mined from large databases?
Association rule mining is a two-step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
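The two steps can be sketched as follows. Step 1 uses a level-wise, Apriori-style search (join frequent k-itemsets, prune candidates with an infrequent subset); step 2 keeps rules that meet a minimum confidence. The function names, thresholds and transactions are assumptions for the example, and no attempt is made at efficiency.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for all itemsets with support >= min_support."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    while level:
        kept = {c: s for c, s in ((c, sup(c)) for c in level) if s >= min_support}
        frequent.update(kept)
        if not kept:
            break
        size = len(next(iter(kept))) + 1
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-subset.
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, size - 1))
        ]
    return frequent

def strong_rules(frequent, min_confidence):
    """Step 2: rules antecedent => consequent with confidence >= min_confidence."""
    rules = []
    for itemset, sup_all in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup_all / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

transactions = [  # hypothetical baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = strong_rules(frequent, min_confidence=0.6)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 2))
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent and was recorded at an earlier level.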

What is the classification of association rules based on various criteria?
1. Based on the types of values handled in the rule.
a. Boolean association rule.
b. Quantitative association rule.
2. Based on the dimensions of data involved in the rule.
a. Single-dimensional association rule.
b. Multidimensional association rule.
3. Based on the levels of abstraction involved in the rule.
a. Single-level association rule.
b. Multilevel association rule.
4. Based on various extensions to association mining.
a. Max-patterns.
b. Frequent closed itemsets.

What are the advantages of Dimensional modeling?
Ease of use
High performance
Predictable, standard framework
Understandable
Extensible to accommodate unexpected new data elements and new design decisions

Define Dimensional Modeling?
Dimensional modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access. It is inherently dimensional and adheres to a discipline that uses the relational model with some important restrictions.

What does a dimensional model comprise?
A dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.

Define a data mart?
A data mart is a pragmatic collection of related facts, but does not have to be exhaustive or exclusive. A data mart is both a kind of subject area and an application. A data mart is a collection of numeric facts.

What are the advantages of a data-modeling tool?
Integrates the data warehouse model with other corporate data models.
Helps assure consistency in naming.
Creates good documentation in a variety of useful formats.
Provides a reasonably intuitive user interface for entering comments about objects.

What is data warehouse performance issue?
The performance of a data warehouse is largely a function of the quantity and type of data stored within a database and the query/data loading workload placed upon the system.

What are the types of performance issue?
1. Capacity planning for the data warehouse
2. Data placement techniques within a data warehouse
3. Application performance techniques
4. Monitoring the data warehouse

Why do you need data warehouse life cycle process?
The data warehouse life cycle approach is essential because it ensures that the project pieces are brought together in the right order and at the right time.

What are the steps in the life cycle approach?
Project Planning
Business Requirements definition
Data track: Dimensional modeling, Physical Design, Data Staging Design & Development
Technology track: Technical Architecture design, Product Selection & Installation
Application track: End user Application Specification, End user Application Development
Deployment
Maintenance & Growth
Project Management

Merits of Data Warehouse.
Ability to make effective decisions from the database
Better analysis of data and decision support
Discover trends and correlations that benefit the business
Handle huge amounts of data

What are the characteristics of data warehouse?
Separate
Available
Integrated
Subject Oriented
Not Dynamic
Consistency
Iterative Development
Aggregation Performance

List some of the Data Warehouse tools?
OLAP (Online Analytic Processing)
ROLAP (Relational OLAP)
End User Data Access tool
Ad Hoc Query tool
Data Transformation services
Replication

Explain OLAP?
The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of “OLAP vendors”. The OLAP vendors' technology is non-relational and is almost always based on an explicit multidimensional cube of data. OLAP databases are also known as multidimensional cube databases.

Explain ROLAP?
ROLAP is a set of user interfaces and applications that give a relational database a dimensional flavour. ROLAP stands for Relational Online Analytic Processing.

Explain End User Data Access tool?
An end user data access tool is a client of the data warehouse. In a relational data warehouse, such a client maintains a session with the presentation server, sending a stream of separate SQL requests to the server. Eventually the end user data access tool is done with the SQL session and turns around to present a screen of data or a report, a graph, or some other higher form of analysis to the user. An end user data access tool can be as simple as an ad hoc query tool or as complex as a sophisticated data mining or modeling application.

Explain Ad Hoc query tool?
A specific kind of end user data access tool that invites users to form their own queries by directly manipulating relational tables and their joins. Ad hoc query tools, as powerful as they are, can only be effectively used and understood by about 10% of all the potential end users of a data warehouse.

Name some of the data mining applications?
Data mining for Biomedical and DNA data analysis
Data mining for Financial data analysis
Data mining for the Retail industry
Data mining for the Telecommunication industry

What is the difference between “supervised” and “unsupervised” learning schemes?
In data mining, during classification the class label of each training sample is provided. This type of training is called supervised learning, i.e. the learning of the model is supervised in that it is told to which class each training sample belongs. E.g. classification. In unsupervised learning, the class label of each training sample is not known, and the number or set of classes to be learned may not be known in advance. E.g. clustering.

Explain the various OLAP operations.
a) Roll-up: The roll-up operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
b) Drill-down: It is the reverse of roll-up. It navigates from less detailed data to more detailed data.
c) Slice: Performs a selection on one dimension of the given cube, resulting in a subcube.

Why is data quality so important in a data warehouse environment?
Data quality is important in a data warehouse environment to facilitate decision-making. In order to support decision-making, the stored data should provide information from a historical perspective and in a summarized manner.
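The roll-up, drill-down and slice operations listed earlier can be illustrated on a very small cube held as a list of facts. This is only a sketch: the dimensions (city, quarter, product), the city-to-state hierarchy and the sales figures are hypothetical, and a real OLAP engine would not store data this way.

```python
from collections import defaultdict

# Each fact: (city, quarter, product, sales). Hypothetical data.
facts = [
    ("Pune", "Q1", "phone", 10),
    ("Pune", "Q2", "phone", 15),
    ("Mumbai", "Q1", "phone", 20),
    ("Mumbai", "Q1", "laptop", 5),
]
# Concept hierarchy for the location dimension: city -> state.
city_to_state = {"Pune": "Maharashtra", "Mumbai": "Maharashtra"}

def aggregate(rows, key):
    """Sum the sales measure over groups defined by `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Starting view: sales per city.
by_city = aggregate(facts, key=lambda r: r[0])

# Roll-up: climb the hierarchy from city to state.
by_state = aggregate(facts, key=lambda r: city_to_state[r[0]])

# Drill-down: move to more detailed data by adding the quarter dimension.
by_city_quarter = aggregate(facts, key=lambda r: (r[0], r[1]))

# Slice: select a single value on one dimension (quarter = "Q1").
q1_slice = [r for r in facts if r[1] == "Q1"]

print(by_city, by_state, by_city_quarter, q1_slice, sep="\n")
```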

How can data visualization help in decision-making?
Data visualization helps the analyst gain intuition about the data being observed. Visualization applications frequently assist the analyst in selecting display formats, viewer perspectives and data representation schemas that foster deep intuitive understanding, thus facilitating decision-making.

What do you mean by high performance data mining?
Data mining refers to extracting or mining knowledge. It involves an integration of techniques from multiple disciplines like database technology, statistics, machine learning, neural networks, etc. When it involves techniques from high performance computing, it is referred to as high performance data mining.

Explain the various data mining issues?
Explain about:
Knowledge mining
User interaction
Performance
Diversity in data types

Explain the data mining functionalities?
The data mining functionalities are:
Concept/class description
Association analysis
Classification and prediction
Cluster analysis
Outlier analysis

Explain the different types of data repositories on which mining can be performed?
The different types of data repositories on which mining can be performed are:
Relational databases
Data warehouses
Transactional databases
Advanced databases
Flat files
World Wide Web

Explain the architecture of data warehouse.
Steps for the design and construction of a DW:
Top-down view
Data source view
Data warehouse view
Business query view
3-tier DW architecture

What is Data Mining? Explain the steps in Knowledge Discovery?
Data mining refers to extracting or mining knowledge from large amounts of data. The steps in knowledge discovery are:
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge representation

Explain the data pre-processing techniques in detail?
The data preprocessing techniques are:
Data cleaning
Data integration
Data transformation
Data reduction

Explain the smoothing Techniques?
Binning
Clustering
Regression
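Of the smoothing techniques listed, binning is the simplest to show in code. The sketch below assumes equal-frequency bins and smoothing by bin means (each value is replaced by the mean of its bin); the function name, the bin size and the price values are illustrative choices.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sorted prices
print(smooth_by_bin_means(prices, bin_size=3))
```

Other variants replace each value by the bin median or by the nearest bin boundary instead of the mean.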

Explain Data transformation in detail?
Smoothing
Aggregation
Generalization
Normalization
Attribute construction

Explain Normalization in detail?
Min-max normalization
Z-score normalization
Normalization by decimal scaling
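The three normalization methods named above can be sketched as follows. The function names and sample values are assumptions; the z-score version uses the population standard deviation, which is one common convention among several.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer such that
    every scaled absolute value is below 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

incomes = [200, 300, 400, 600, 1000]  # hypothetical attribute values
print(min_max(incomes))
print(z_score(incomes))
print(decimal_scaling([-986, 917]))
```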

Explain data reduction?
Data cube aggregation
Attribute subset selection
Dimensionality reduction
Numerosity reduction

Explain parametric methods and non-parametric methods of reduction?
Parametric methods:
Regression model
Log-linear model
Non-parametric methods:
Sampling
Histogram
Clustering

Explain Data Discretization and Concept Hierarchy Generation?
Discretization and concept hierarchy generation for numerical data:
Segmentation by natural partitioning
Binning
Histogram analysis
Cluster analysis

Explain Data mining Primitives?
There are 5 data mining primitives. They are:
Task-relevant data
Kinds of knowledge to be mined
Concept hierarchies
Interestingness measures
Knowledge presentation and visualization techniques to be used for discovered patterns

Explain Attribute Oriented Induction?
Explain:
Attribute-oriented induction for data characterization
Algorithm
Presentation of derived generalization
Example

Explain Statistical measures in databases?
Measuring the central tendency
Measuring the dispersion of data
Graph displays
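Measures of central tendency and dispersion like those named above are available in Python's standard `statistics` module. The data values below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical attribute values

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion
value_range = max(data) - min(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

print(mean, median, mode)
print(value_range, variance, std_dev)
```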

Explain multilevel association rule?
Example
Explanation
Variations

Explain Multidimensional Database briefly?
Star schema
Snowflake schema
Fact constellation

Explain, with examples and diagrams, how to define star, snowflake and fact constellation schemas.

Explain Indexing with suitable examples?
Bitmap indexing
Join indexing
Bitmapped join indexing

Explain the Back Propagation technique?
Definition
Back propagation algorithm & diagram
Example

Explain Partition Methods?
Explain:
K-Means partition
K-Medoids partition
CLARANS method with examples

Explain Hierarchical methods of clustering?
Explain:
Agglomerative hierarchical clustering
Divisive hierarchical clustering
BIRCH
Chameleon
CURE

Explain classification by Decision tree induction?
Explain the steps in decision tree induction
Generation of decision tree algorithm
Example and diagram
Tree pruning

Explain the types of data in cluster analysis.
Data matrix
Dissimilarity matrix
Interval-scaled variables
Binary variables
Nominal, ordinal and ratio-scaled variables

Explain Outlier analysis?
Statistical-based outlier detection
Distance-based outlier detection
Deviation-based outlier detection

Explain Mining complex types of data?
Multidimensional analysis and descriptive mining
Mining spatial databases
Mining multimedia databases
Mining text databases
Mining time-series and sequence data
Mining WWW

Briefly explain about Data Mining Application?
Financial data analysis
Retail industry
Telecommunication industry
Biological data analysis
Scientific application

Explain social impacts of data mining?
Innovators
Early adopters
Chasm
Early majority
Late majority
Laggards

Explain Additional themes in data mining?
Audio and visual mining
Scientific and statistical data mining