Implementation of C-TREND for Commercial Application
Prof. Nilesh Shah#1, Latika Chaudhary#2, Aditya Rajmane#3, Neha Patil#4, Komal Khatke#5
Department of Computers, Padmabhushan Vasantdada Patil Pratishthan’s College of Engineering, Mumbai
[email protected](8879309384), [email protected](9768975739),
[email protected](9870669255),[email protected](9664683567)
Abstract— Data mining has made broad and significant
progress since its beginnings. Today data mining is used in
a vast array of areas, and numerous commercial data mining
systems and research prototypes are available. When selecting a
data mining product that is appropriate for one's task, it is
important to consider the various features of data mining systems
from a multidimensional point of view. Researchers have been
striving to build theoretical foundations for data mining.
Various clustering techniques have been used for identifying
and visualizing trends in multi-attribute transactional data.
This paper introduces Cluster-based Temporal Representation
of EveNt Data (C-TREND), a system that implements the
temporal cluster graph construct, which maps multi-attribute
temporal data to a two-dimensional directed graph that
identifies trends in dominant data types over time.
Keywords— Clustering, Data Visualization, Trend Analysis,
Multiattribute Data
I. INTRODUCTION
Since the inception of information storage, the ability to sift
through and analyze huge amounts of information has been a
goal pursued in many different ways. With the advent of
electronic and magnetic data storage, relational databases
emerged as one of the most efficient and widely used methods
of storing data. Data stored in such large databases is not
always comprehensible to humans; it needs to be filtered and
analyzed first. Stored records are raw data, poor in
information: they are not only large and seemingly irrelevant
but also continuously increasing, updating, and changing[8].
Organizations and firms are increasingly capturing more
transactional data containing multiple attributes and some
measure of time. For example, through their websites,
e-commerce firms capture the clickstream and purchasing
behavior of their customers[2]. Real-life transactional data
often poses challenges such as very large size and high
dimensionality. Consider the problem of technology forecasting
for a firm. Technologies possess many features that change
over time, and understanding how a technology evolves requires
trend analysis of multiple attributes at once. Similar issues
arise in the trend analysis of consumer purchasing behavior
and many other business intelligence applications[3]. Business
intelligence applications help firms gather and analyze
information about their performance, customers, competitors,
and business environment. Knowledge representation and data
visualization tools constitute one form of business
intelligence technique that presents information to users in a
manner that supports business decision-making processes[5].
Identifying temporal relationships or trends in data
constitutes an important problem that is relevant in many
business and academic settings, and the data mining
literature has provided analytical techniques for some
specialized types of temporal data, such as time series
analysis and sequence analysis techniques. However,
temporal data can take many forms, the most common being
general multi-attribute transactional data with a timestamp,
for which time series or sequence analysis methods are not
particularly well suited. The ability to identify trends in such
general temporal data can provide significant benefits,
including a competitive advantage for a firm performing
forecasts or making decisions on future investments and
strategies. In this paper, we present C-TREND (Cluster-based
Temporal Representation of EveNt Data), a new method for
discovering and visualizing trends and temporal patterns in
transactional attribute-value data that builds upon standard
data mining clustering techniques.
II. RELATED WORK
Data mining and visualization are knowledge discovery
tools used for automated analysis of data stored in large
data sets. Data mining is the analysis step of knowledge
discovery in databases (KDD). Identifying and
visualizing temporal relationships (e.g., trends) in data
constitutes an important problem that is relevant in many
business, scientific, and academic settings[5]. Large data sets
cannot possibly be analyzed manually; mining tools
and visualization provide automated means to comprehend
such data sets. In this section, we provide a brief review of
related research in the temporal data mining and visualization
streams.
© 2013 IJEA. ALL RIGHTS RESERVED 27
International Journal of Engineering Associates (2320 – 0804) / #27 / Volume 2 Issue 6
Fig 1.What is Data Mining?
1. Temporal Data Mining
Temporal data mining is a single step in the process of
knowledge discovery in temporal databases that enumerates
structures (temporal patterns or models) over the temporal
data. It is concerned with the analysis of temporal data and
with finding temporal patterns and regularities in sets of
temporal data. Temporal data mining techniques also allow for
computer-driven, automatic exploration of the data.
2. Clustering
Clustering is the task of grouping a set of records so that
records in the same group (a cluster) are more similar to one
another than to records in other clusters. Similarity is
usually measured by a distance function over the record
attributes, and each cluster can be summarized by properties
such as its center and its size. In C-TREND, the records in
each time period are clustered separately, and the resulting
clusters represent the dominant data types of that period.
3. Trend Analysis
The term "trend analysis" refers to the concept of collecting
information and attempting to spot a pattern, or trend, in the
information. In some fields of study, the term has more
formally defined meanings. In classical time-series analysis,
for example, the data are decomposed into trend movements,
cyclic movements, seasonal movements, and irregular movements.
The data mining technique considered here instead uses clusters
identified in multiple time periods and identifies trends based
on similarities between clusters over time. It is a clustering
approach for discovering temporal patterns, which builds on
temporal clustering methods.
4. Data Visualization
Visual data mining integrates data mining and data
visualization to discover implicit and useful knowledge from
large data sets. It includes data visualization, data mining
result visualization, data mining process visualization, and
interactive visual data mining. Data visualization is the
process of presenting data in some visual form and allowing
the user to interact with the data.
i. Hierarchical Clustering: Hierarchical clustering is
one of the visualization techniques that partition all
dimensions into subsets. The subsets are visualized in a
hierarchical manner.
ii. Dendrogram: A dendrogram is a tree-structured
graph, often drawn alongside heat maps, that visualizes the
result of a hierarchical clustering calculation. The result of
a clustering is presented either as the distance or the
similarity between the clustered rows or columns, depending on
the selected distance measure.
iii. Temporal cluster graph: To construct a temporal
cluster graph, the transactional data set is first partitioned
with respect to time, and a clustering technique is then
applied to each partition. The temporal cluster graph is a
directed graph that consists of a set of nodes and directed
edges.
Fig 2.Reducing multi-attribute temporal complexity by
partitioning data into time periods and producing a temporal
cluster graph
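The partitioning step behind Fig 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the record layout (a timestamp paired with a tuple of attribute values), the function name partition_by_period, and the choice of one calendar year per partition are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

def partition_by_period(records, period_of=lambda ts: ts.year):
    """Group (timestamp, attributes) records into time partitions.

    period_of maps a timestamp to a partition key; one calendar year
    per partition is an assumed default, not something C-TREND prescribes.
    """
    partitions = defaultdict(list)
    for timestamp, attributes in records:
        partitions[period_of(timestamp)].append(attributes)
    # Return the partitions in chronological order of their keys.
    return dict(sorted(partitions.items()))

# Hypothetical shopping-cart records: (sale date, items bought per category).
records = [
    (date(2011, 3, 14), (2, 0, 1)),
    (date(2012, 7, 2), (0, 3, 0)),
    (date(2011, 11, 30), (1, 1, 0)),
]
partitions = partition_by_period(records)
```

Each partition would then be clustered on its own; passing a different period_of function (for example, one that returns a year and month) changes the time granularity without touching the rest of the pipeline.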
III. THE C-TREND TECHNIQUE
1. C-TREND Overview
C-TREND is designed to work with what we term
transactional attribute-value data. Specifically, transactional
attribute-value data is a general form of temporal data that
consists of a collection of records, each with a timestamp and
described by a set of attributes. Examples of this type of data
include shopping cart data with a sale timestamp and
numerical values representing the number of products
purchased in certain categories, as well as product description
data that includes a release date and a set of indicators
representing the presence and/or quantity of specific product
features.
Fig 3.Overview Of C-Trend Process
C-TREND is the system implementation of the temporal
cluster-graph-based trend identification and visualization
technique; it provides an end user with the ability to generate
graphs from data and adjust the graph parameters. C-TREND
consists of two main phases: 1) offline preprocessing of the
data and 2) online interactive analysis and graph rendering.
In the preprocessing phase, the data set is partitioned based
on time periods, and each partition is clustered using one of
many traditional clustering techniques, such as a hierarchical
approach. The results of the clustering for each partition are
used to generate two data structures: the node list and the
edge list.
Fig 4.The C-Trend Process
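As a rough sketch of how a node list and an edge list might be derived from per-partition clustering results, consider the Python below. The field names, the function name build_node_and_edge_lists, the restriction of candidate edges to adjacent time periods, and the use of the Euclidean distance between cluster centers as the edge weight are assumptions for illustration; the text above only states that the two lists are generated during preprocessing.

```python
import math

def build_node_and_edge_lists(clustered_partitions):
    """clustered_partitions[t] holds the clusters of time period t;
    each cluster is a non-empty list of equal-length numeric tuples."""
    nodes = []
    for t, clusters in enumerate(clustered_partitions):
        for members in clusters:
            dims = range(len(members[0]))
            center = tuple(sum(m[d] for m in members) / len(members) for d in dims)
            nodes.append({"partition": t, "center": center, "size": len(members)})

    edges = []
    for i, src in enumerate(nodes):
        for j, dst in enumerate(nodes):
            # Candidate edges run from a cluster to every cluster in the next period.
            if dst["partition"] == src["partition"] + 1:
                weight = math.dist(src["center"], dst["center"])  # assumed weight
                edges.append({"src": i, "dst": j, "weight": weight})
    return nodes, edges

clustered = [
    [[(0.0, 0.0), (2.0, 0.0)]],                # period 0: one cluster
    [[(1.0, 4.0)], [(1.0, 0.0), (1.0, 0.0)]],  # period 1: two clusters
]
nodes, edges = build_node_and_edge_lists(clustered)
```

Because every candidate edge and its weight are computed once up front, later filtering only has to compare stored values against the current parameters.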
Creating the node and edge lists in the preprocessing phase
allows for more efficient (real-time) visualization updates of
the C-TREND output graphs. Based on these data structures, graph
entities (nodes and edges) are generated and rendered as a
temporal cluster graph in the system output window. In the
interactive analysis phase, C-TREND allows the user to modify
ki (i = 1,…,t), α, and β on demand in real time and, as a
result, update the view of the temporal cluster graph. Here ki
is the number of clusters shown for partition i, α is the node
filter on cluster size, and β is the edge filter on edge
weight. Note that in this initial implementation, the time
partition size is set exogenously by the user and stays
constant throughout preprocessing and online interactive
analysis. We followed this approach because of the
domain-specific nature of time granularity. For example, for
analyzing technology evolution, the desired time granularity
could be a year, whereas financial market trend analysis may
require a much smaller time window. For this reason, we decided
to rely on the domain expert to specify the most appropriate
time granularity for a given application. However, an important
future extension for this research would be to provide the
ability to adjust the time granularity interactively in real
time.
2. Data Preprocessing
An important requirement for real-time graph customization
in C-TREND is the precomputation of multiple clustering
solutions from the initial data set. Depending on the type of
clustering algorithm employed, the cluster solution can be
stored in a way that maximizes the efficiency of the output
graph customization. C-TREND can be implemented with
multiple different standard clustering algorithms (e.g.,
agglomerative or divisive hierarchical clustering or
partition-based clustering). C-TREND utilizes optimized
dendrogram data structures for storing and extracting cluster
solutions generated by hierarchical clustering algorithms.
C-TREND produces a dendrogram for each data partition and
utilizes a global input value N that represents the
maximum-sized cluster solution maintained for each data
partition. For all practical purposes, a useful solution will
consist of a set of N << n clusters (n is the number of data
points in partition i) and, therefore, C-TREND has to store
only 2N - 1 nodes per partition: the N clusters of the largest
solution plus the N - 1 merges above them. We have found that
maintaining a maximum solution size of N = 50 clusters is more
than sufficient for many practical applications of data
visualization. The value of N can also be set by the user
before the preprocessing phase.
Fig 5. Dendrogram example with n = 10
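The 2N - 1 storage bound can be illustrated with a small agglomerative clustering sketch. Everything specific here is an assumption chosen for brevity: the data are one-dimensional, clusters are merged by centroid distance, and the names build_truncated_dendrogram, _centroid, and _closest_pair are hypothetical. The point is only that the merges below the N-cluster level need not be stored, and that numbering nodes in merge order leaves the root with the highest number.

```python
from itertools import combinations

def _centroid(values):
    return sum(values) / len(values)

def _closest_pair(active):
    """Return the ids of the two active clusters with the nearest centroids."""
    return min(
        combinations(sorted(active), 2),
        key=lambda pair: abs(_centroid(active[pair[0]]) - _centroid(active[pair[1]])),
    )

def build_truncated_dendrogram(values, max_clusters):
    """Agglomerative clustering of 1-D values that stores only the top of the tree."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    # Phase 1: merge without recording until max_clusters clusters remain.
    active = {i: [v] for i, v in enumerate(values)}
    next_id = len(values)
    while len(active) > max_clusters:
        a, b = _closest_pair(active)
        active[next_id] = active.pop(a) + active.pop(b)
        next_id += 1

    # Phase 2: renumber the survivors 0..N-1 as stored leaves, then record merges.
    active = dict(enumerate(active.values()))
    nodes = {i: {"members": m, "left": None, "right": None} for i, m in active.items()}
    next_id = len(active)
    while len(active) > 1:
        a, b = _closest_pair(active)
        members = active.pop(a) + active.pop(b)
        active[next_id] = members
        nodes[next_id] = {"members": members, "left": a, "right": b}
        next_id += 1
    return nodes  # the root is the highest-numbered node

values = [1.0, 1.1, 5.0, 5.2, 9.0, 9.1, 9.3, 20.0]
dendrogram = build_truncated_dendrogram(values, max_clusters=3)
```

With eight input values and N = 3, only five nodes are kept (three stored leaves and two merges), regardless of how many data points the partition contains.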
A dendrogram data structure allows for quick extraction of
any specific clustering solution for each data partition when
the user changes partition zoom level ki. To obtain a specific
clustering solution from the data structure for data partition
Di, C-TREND uses the DENDRO_EXTRACT algorithm
(Algorithm 1), which takes the desired number of clusters in
the solution ki as an input and returns the set CurrCl
containing the clusters corresponding to the ki-sized solution.
Cluster attributes such as center and size are then accessible
from the corresponding dendrogram data structure by
referencing the clusters in CurrCl.
DENDRO_EXTRACT starts at the root of the dendrogram
and traverses the dendrogram by splitting the highest-numbered
node in the current set of clusters until ki clusters
are included in the set. MaxCl represents the highest-numbered
element in the current cluster set CurrCl. Because of the
specific dendrogram structure (nodes are numbered in merge
order, so the root has the highest number), it is always the
case that MaxCl = DendrogramRooti - |CurrCl| + 1. Furthermore,
the dendrogram data array maintains the successive levels of
the hierarchical solution in order; therefore, replacing MaxCl
by its children MaxCl.Left (left child) and MaxCl.Right (right
child) is sufficient for identifying the next solution level in
the dendrogram. DENDRO_EXTRACT has linear time
complexity, O(ki), which provides for the real-time extraction
of cluster solutions.
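The extraction procedure described above can be sketched in Python as follows. The loop follows the prose description (start at the root, repeatedly replace MaxCl by its two children), but the DendroNode class, the dictionary-based storage, and the zero-based node numbering are assumptions of this sketch and are not taken from Algorithm 1 itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

@dataclass
class DendroNode:
    left: Optional[int] = None   # id of the left child, None for a stored leaf
    right: Optional[int] = None  # id of the right child, None for a stored leaf

def dendro_extract(nodes: Dict[int, DendroNode], root: int, k: int) -> Set[int]:
    """Return the ids of the clusters forming the k-cluster solution.

    Assumes nodes are numbered in merge order, so the root has the
    highest id and each split removes the highest id in the current set.
    """
    n_leaves = (len(nodes) + 1) // 2
    if not 1 <= k <= n_leaves:
        raise ValueError("k must be between 1 and the number of stored leaves")
    curr_cl = {root}
    while len(curr_cl) < k:
        max_cl = root - len(curr_cl) + 1  # MaxCl = DendrogramRoot - |CurrCl| + 1
        node = nodes[max_cl]
        curr_cl.remove(max_cl)
        curr_cl.update((node.left, node.right))
    return curr_cl

# Hypothetical dendrogram: leaves 0..4, merges 5..8 in the order they occurred.
nodes = {i: DendroNode() for i in range(5)}
nodes[5] = DendroNode(0, 1)
nodes[6] = DendroNode(3, 4)
nodes[7] = DendroNode(5, 2)
nodes[8] = DendroNode(7, 6)  # root
solution = dendro_extract(nodes, root=8, k=3)
```

In this example the root 8 is split into 7 and 6, then 7 is split into 5 and 2, which leaves the three clusters 2, 5, and 6. The loop body runs k - 1 times, consistent with the O(ki) bound stated above.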
3. Interactive Data Analysis
C-TREND utilizes a series of validation flags to maintain
and update the displayed state of the output trend graph.
Combinations of the validation flags are used to determine
whether or not each possible edge and node should be
displayed in the graph, and as these flags change, the
displayed components of the graph also change. Each cluster
in the node list possesses two flags: ki-pass and α-pass.
These flags are used to indicate whether the cluster should
be included in the output graph based on the ki value and the
α value, respectively. Specifically, when ki is changed, the
dendrogram data structure is updated so that only the clusters
that should be extracted for the clustering solution of size ki
have a valid ki-pass flag. Similarly, when α is changed, the
dendrogram data structure is updated so that only the clusters
that are large enough to pass the node filter based on α are
assigned a valid α-pass flag. The nodes that have both valid
ki-pass and α-pass flags make up the set of nodes that are
both large enough and in the desired clustering solution and
therefore are included in the output graph. In our
implementation, a list of all possible edges and their weights
is generated during preprocessing. Each edge in the list
possesses a β-pass flag. When β is changed, all edges with a
passing weight (based on β) are assigned a valid β-pass flag,
and all others are assigned an invalid flag. Only edges that
have a valid β-pass flag and are incident to two valid nodes
(nodes with valid ki-pass and α-pass flags) are included in the
output graph. Using the implementation described above,
C-TREND can update output graphs based on user changes to
the ki, α, and β parameters very efficiently. Changing any
one parameter requires only one operation to update the
corresponding flag in the data structure for a given node or
edge[5].
Fig 6.Implementation Of C-Trend
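The flag-based filtering can be sketched as below. The class and function names are hypothetical, and two details are assumptions made for the example: a node passes the size filter when its size is at least the threshold, and an edge passes when its weight is at most the threshold (treating the weight as a distance). The source states only that each flag depends on its own parameter.

```python
from dataclasses import dataclass

@dataclass
class Node:
    size: int
    k_pass: bool = False      # in the currently extracted clustering solution
    alpha_pass: bool = False  # large enough to pass the node filter

@dataclass
class Edge:
    src: int
    dst: int
    weight: float
    beta_pass: bool = False   # weight passes the edge filter

def set_k_solution(nodes, extracted_ids):
    for node_id, node in nodes.items():
        node.k_pass = node_id in extracted_ids

def set_alpha(nodes, alpha):
    for node in nodes.values():
        node.alpha_pass = node.size >= alpha   # assumed comparison

def set_beta(edges, beta):
    for edge in edges:
        edge.beta_pass = edge.weight <= beta   # assumed comparison

def visible_graph(nodes, edges):
    """Nodes need both flags; edges need their flag and two visible endpoints."""
    shown_nodes = {i for i, n in nodes.items() if n.k_pass and n.alpha_pass}
    shown_edges = [e for e in edges
                   if e.beta_pass and e.src in shown_nodes and e.dst in shown_nodes]
    return shown_nodes, shown_edges

nodes = {0: Node(40), 1: Node(3), 2: Node(25), 3: Node(30)}
edges = [Edge(0, 2, 1.5), Edge(0, 3, 6.0), Edge(1, 2, 0.5)]
set_k_solution(nodes, {0, 1, 2})
set_alpha(nodes, alpha=10)
set_beta(edges, beta=2.0)
shown_nodes, shown_edges = visible_graph(nodes, edges)
```

Here node 1 is too small and node 3 is outside the extracted solution, so only nodes 0 and 2 and the edge between them are drawn. Changing one parameter touches only its own flag, which is what keeps the update cheap.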
IV. ADVANTAGES
C-TREND provides three advantages over existing
techniques. First, C-TREND presents temporal data in a
unique and intuitive manner that emphasizes trends between
dominant transaction types over time, and its output graphs
resemble evolutionary diagrams and naturally portray the
changes in data characteristics over time. Second, C-TREND
is a meta-analysis tool for data mining results and, therefore,
is designed to provide the domain expert with substantial
control over the data presentation. In particular, C-TREND
provides the user with the ability to adjust all key parameters
for creating output trend graphs, which allows a domain
expert to visualize the data in a manner that provides the
most value. Third, C-TREND presents a set of graph
statistics which, in our future work, will provide a means for
developing new trend metrics and a framework for
performing hypothesis testing on the existence and
characteristics of trends.
V. CONCLUSION
By harnessing computational techniques of data mining, we
have developed a temporal clustering technique for
discovering, analyzing, and visualizing trends in multi-
attribute temporal data. The proposed technique is versatile
and gives significant data representation power to the user –
domain experts have the ability to adjust parameters and
clustering mechanisms to fine-tune trend graphs. It is also
scalable: the time required to adjust trend parameters is quite
low even for larger data sets, which provides for real-time
visualization capabilities. The proposed technique is
applicable in many data analysis contexts, and can provide
insights for analysts performing historical analyses and
generating forecasts.
VI. REFERENCES
[1] Simmi Bagga, Dr. G.N. Singh, “Application of Data Mining”,
International Journal for Science and Emerging Technologies
with Latest Trends, 1(1):19-23.
[2] B. Ratnamala, P.M. Kiran, “Temporal Cluster graphs for
visualizing Trends”, National Conference on Advances in
Computer Science and Applications with International Journal of
Computer Applications (NCACSA 2012), Proceedings published in
International Journal of Computer Applications (IJCA).
[3] D. Radha Rani, A. Vini Bharati, P. Lakshmi Durga Madhuri,
M. Phaneendra Babu, A. Sravani, “Analysis of Dendrogram Tree for
Identifying and Visualizing Trends in Multi-attribute
Transactional Data”, International Journal of Engineering Trends
and Technology, Volume 3, Issue 1, 2012, pp. 14-18.
[4] Arna Prabha Jena, Annan Naidu, “A Review of C-TREND Using
Complete-Link Clustering for Transactional Data”, International
Journal of Computer Science & Engineering Technology (IJCSET),
Vol. 4, No. 07, Jul 2013, pp. 850-854.
[5] Gediminas Adomavicius, Jesse Bockstedt, “C-TREND: Temporal
Cluster Graphs for Identifying and Visualizing Trends in
Multiattribute Transactional Data”, IEEE Transactions on
Knowledge and Data Engineering, Vol. 20, No. 6, June 2008,
pp. 721-735.
[6] Sotiris Kotsiantis, Dimitris Kanellopoulos, “Association Rules
Mining: A Recent Overview”, GESTS International Transactions on
Computer Science and Engineering, Vol. 32 (1), 2006, pp. 71-82.
[7] Jeffrey Hsu, “Data Mining Trends and Developments: The Key
Data Mining Technologies and Applications for the 21st Century”.
[8] AbdulRahman R. Alazmi, AbdulAziz R. Alazmi, “Data Mining
and Visualization of Large Databases”, International Journal of
Computer Science and Security (IJCSS), Volume 6, Issue 5, 2012,
pp. 295-314.