International Journal of Computational Intelligence and Information Security, December 2011, Vol. 2, No. 1



... leaks. We demonstrate several synthetic as well as real-world examples of heap dumps for which our approach provides more insight into the problem than state-of-the-art tools such as Eclipse's MAT.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Memory leaks are a frequent source of bugs in applications that use dynamic memory allocation. They occur if programmers' mistakes prevent the deallocation of memory that is no longer used. Undetected memory leaks cause slowdowns and eventually the exhaustion of all available memory, triggering out-of-memory conditions that usually lead to application crashes. These crashes significantly affect availability, particularly of long-running server applications, which is why memory leaks are one of the most frequently reported types of bugs against server frameworks. Memory leaks are challenging to identify and debug for several reasons. First, the observed failure may be far removed from the error that caused it, requiring the use of heap analysis tools that examine the state of the reachability graph when a failure occurred. Second, real-world applications usually make heavy use of several layers of frameworks whose implementation details are unknown to the developers debugging the encountered memory leaks. Often, these developers cannot distinguish whether an observed reference chain is legitimate (such as when objects are kept in a cache in anticipation of future uses) or represents a leak. Third, the sheer size of the heap (large-scale server applications can easily contain tens of millions of objects) makes manual inspection of even a small subset of objects difficult or impossible.

Our key contributions can be summarized as follows:

1. Although analysis techniques are widely used in heap analysis, our work is the first to employ graph mining for detecting leaking candidates. Specifically, we demonstrate that graph grammar mining used in an offline manner can detect both seeded and known memory leaks in real applications.

2. Compared to other offline analysis techniques, our approach does not require any a priori knowledge about which classes are containers, or about their internal structure. It captures containers even when these are embedded into application classes, such as ad-hoc lists or arrays.

3. Our approach can identify leaks even if the leaks' locations within the graph do not share a common ancestor node, or if the paths from that ancestor to the instances are difficult to find by the manual examination that is required in existing tools such as Eclipse Memory Analyzer (MAT).

4. Graph grammar mining can find recursive structures, giving a user insight into the data structures used in a program. For instance, linked lists and trees can be identified by their distinct signatures.

5. Finally, the ability to combine subgraph frequency with location information makes our algorithm robust to the presence of object structures that occur naturally with high frequency without constituting a leak (a simplified scoring sketch follows this list).
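To make contribution 5 concrete, here is a minimal sketch of how subgraph frequency might be combined with location (allocation-site) information to rank leak candidates. The input structures (`pattern_instances`, `allocation_site`) and the frequency-times-concentration score are illustrative assumptions, not the implementation described in the paper.

```python
from collections import Counter

def score_leak_candidates(pattern_instances, allocation_site):
    """Rank mined subgraph patterns as leak candidates.

    pattern_instances: maps a pattern id to the list of root object ids
        at which that pattern occurs in the heap object graph.
    allocation_site: maps an object id to the code location that
        allocated it (e.g. "Class.method").

    A pattern is suspicious when it is both frequent and concentrated at
    few allocation sites; structures that occur naturally all over the
    heap are down-weighted by the concentration term.
    """
    scores = {}
    for pattern, roots in pattern_instances.items():
        frequency = len(roots)
        sites = Counter(allocation_site[r] for r in roots)
        # Fraction of instances coming from the single dominant site.
        concentration = max(sites.values()) / frequency
        scores[pattern] = frequency * concentration
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Example: a list-like pattern allocated mostly at one site ranks first.
patterns = {"list-node": ["o1", "o2", "o3", "o4"], "string-pair": ["o5", "o6"]}
sites = {"o1": "Cache.add", "o2": "Cache.add", "o3": "Cache.add",
         "o4": "Log.write", "o5": "Util.fmt", "o6": "Parser.run"}
print(score_leak_candidates(patterns, sites))
```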

SECTION IV

4. Graph Mining Based on Anomaly Detection

GAD is a graph-based approach to finding anomalies in data by searching for three factors: modifications, insertions, and deletions of vertices and edges. Each factor runs its own algorithm that finds a normative substructure and attempts to find the substructures that are similar, but not completely identical, to the discovered normative substructure. A normative substructure is a recurring subgraph of vertices and edges that, when coalesced into a single vertex, most compresses the overall graph.
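As a rough illustration of the compression criterion described above, the sketch below picks the candidate substructure that, when each of its instances is collapsed into a single vertex, yields the smallest description of the graph. The size and count inputs and the simplified MDL-style formula are assumptions; the exact measure used by GAD is not given in the text.

```python
def compression_score(graph_size, pattern_size, instance_count):
    """Approximate size of the graph after replacing every instance of the
    pattern with a single vertex; the pattern definition is stored once."""
    return graph_size - instance_count * (pattern_size - 1) + pattern_size

def normative_substructure(graph_size, candidates):
    """Pick the candidate that compresses the overall graph the most.

    candidates: iterable of (pattern_id, pattern_size, instance_count)
    tuples, assumed to come from a separate subgraph enumeration step.
    """
    return min(candidates, key=lambda c: compression_score(graph_size, c[1], c[2]))

# Example: 25 two-vertex instances compress a 100-vertex graph more than
# 10 three-vertex instances do.
print(normative_substructure(100, [("triangle", 3, 10), ("edge", 2, 25)]))
```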

4.1. Categories of Intrusion Detection Systems

Intrusion detection is classified into two types: 1) misuse detection and 2) anomaly detection. Misuse detection uses well-defined patterns of attacks that exploit weaknesses in system and application software to identify intrusions (Kumar and Spafford 1995). These patterns are encoded in advance and used to match against user behavior to detect intrusions. Anomaly detection identifies deviations from the normal usage behavior patterns to identify intrusions. The normal usage patterns are constructed from statistical measures of the system features, for example the CPU and I/O activities of a particular user or program. The behavior of the user is observed and any deviation from the constructed normal behavior is detected as an intrusion.
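The following is a minimal sketch of the anomaly detection idea just described: a per-user profile of statistical measures (such as CPU and I/O activity) is built from observed behavior, and an observation is flagged when it deviates too far from the profile. The class name, the features, and the three-standard-deviation threshold are illustrative assumptions rather than a scheme taken from the cited work.

```python
import math

class UsageProfile:
    """Per-user profile of feature statistics (e.g. CPU time, I/O bytes).

    Running mean and variance are kept per feature (Welford's method);
    an observation is anomalous when any feature deviates from the
    profile by more than `threshold` standard deviations.
    """
    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def update(self, features):
        self.n += 1
        for name, value in features.items():
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def is_anomalous(self, features):
        if self.n < 2:
            return False
        for name, value in features.items():
            std = math.sqrt(self.m2.get(name, 0.0) / (self.n - 1)) or 1e-9
            if abs(value - self.mean.get(name, 0.0)) / std > self.threshold:
                return True
        return False

# Normal sessions, then one with roughly ten times the usual CPU activity.
profile = UsageProfile()
for cpu in (1.0, 1.2, 0.9, 1.1):
    profile.update({"cpu_seconds": cpu, "io_bytes": 2e6})
print(profile.is_anomalous({"cpu_seconds": 10.0, "io_bytes": 2e6}))  # True
```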

4.2. What Is an Anomaly?

Anomaly detection refers to detecting patterns in a given data set that do not conform to an established normal behavior. The patterns thus detected are called anomalies and translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, surprises, deviations, etc.

Most anomaly detection algorithms require a set of purely normal data to train the model, and they implicitly assume that anomalies can be treated as patterns not observed before. Since an outlier may be defined as a data point which is very different from the rest of the data, based on some measure, we employ several detection schemes in order to see how efficiently these schemes may deal with the problem of anomaly detection. The statistics community has studied the concept of outliers quite extensively. In these techniques, the data points are modeled using a stochastic distribution, and points are determined to be outliers depending upon their relationship with this model. However, with increasing dimensionality, it becomes increasingly difficult and inaccurate to estimate the multidimensional distributions of the data points. The more recent outlier detection algorithms that we utilize in this study are therefore based on computing the full-dimensional distances of the points from one another as well as on computing the densities of local neighborhoods.
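As an illustration of the full-dimensional distance-based schemes mentioned above, the sketch below scores each point by its mean distance to its k nearest neighbours, so points in sparse regions stand out. A density-based variant (comparing a point's local density with that of its neighbours, as in LOF) would build on the same distance matrix; the function name and the choice of k are illustrative.

```python
import numpy as np

def knn_outlier_scores(points, k=5):
    """Distance-based outlier score: mean distance to the k nearest
    neighbours of each point. points is an (n, d) array."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)          # ignore each point's distance to itself
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances per point
    return knn.mean(axis=1)

# The isolated last point receives by far the largest score.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
print(knn_outlier_scores(pts, k=2))
```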

The deviation measure is our extension of the traditional method of discrepancy detection. As in discrepancy detection, comparisons are made between predicted and actual sensor values, and differences are interpreted to be indications of anomalies. This raw discrepancy is entered into a normalization process identical to that used for the value change score, and it is this representation of relative discrepancy which is reported. The deviation score for a sensor is minimum if there is no discrepancy and maximum if the discrepancy between predicted and actual is the greatest seen to date on that sensor. Deviation requires that a simulation be available in any form for generating sensor value predictions. However, the remaining sensitivity and cascading alarms measures require the ability to simulate and reason with a causal model of the system being monitored.
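A minimal sketch of the deviation score as described above: the raw discrepancy between predicted and actual sensor values is normalized against the largest discrepancy seen to date on that sensor, so the score is 0 when prediction and observation agree and 1 for the worst discrepancy observed so far. The linear normalization and the class interface are illustrative assumptions.

```python
class DeviationScore:
    """Normalized discrepancy between predicted and actual sensor values."""

    def __init__(self):
        self.max_discrepancy = {}  # largest discrepancy seen per sensor

    def score(self, sensor, predicted, actual):
        discrepancy = abs(predicted - actual)
        worst = max(self.max_discrepancy.get(sensor, 0.0), discrepancy)
        self.max_discrepancy[sensor] = worst
        return 0.0 if worst == 0.0 else discrepancy / worst

dev = DeviationScore()
print(dev.score("temp", predicted=20.0, actual=20.0))  # 0.0: no discrepancy
print(dev.score("temp", predicted=20.0, actual=24.0))  # 1.0: worst seen so far
print(dev.score("temp", predicted=20.0, actual=22.0))  # 0.5: half of the worst case
```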

An appealing way to assess whether current behavior is anomalous or not is via comparison to past behavior. This is the essence of the surprise measure. It is designed to highlight a sensor which behaves other than it has historically [18]. Specifically, surprise uses the historical frequency distribution for the sensor to examine the relative likelihoods of different values of the sensor. It is those sensors which display unlikely values, when other values of the sensor are more likely, which get a high surprise score [19]. Surprise is not high if the only reason a sensor's value is unlikely is that there are many possible values for the sensor, all equally unlikely.
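The sketch below captures the surprise idea in the paragraph above: a value is surprising only when other values of the same sensor have been clearly more likely historically, so a uniform history yields no surprise. The log-ratio form and the smoothing for unseen values are illustrative choices, not the measure's actual formula.

```python
import math
from collections import Counter

def surprise_score(history, value):
    """Surprise for one sensor reading given its historical values."""
    counts = Counter(history)
    total = sum(counts.values())
    # Probability of the observed value; unseen values get a small smoothed mass.
    p_observed = counts[value] / total if value in counts else 1.0 / (total + 1)
    p_most_likely = max(counts.values()) / total
    return max(0.0, math.log(p_most_likely / p_observed))

print(surprise_score(["ok"] * 95 + ["warn"] * 5, "warn"))  # high surprise
print(surprise_score(["a", "b", "c", "d"], "c"))           # 0.0: all equally likely
```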

SECTION V

5. Conclusion

Trends obtained through data mining, intended to be used for marketing or other ethical purposes, may be misused. Unethical businesses may use the information obtained through data mining to take advantage of vulnerable people or to discriminate against a certain group of people. In addition, data mining techniques are not 100 percent accurate; thus mistakes do happen, which can have serious consequences. Although it is against the law to sell or trade personal information between different organizations, the selling of personal information has occurred. For example, according to the Washington Post, in 1998 CVS sold their patients' prescription purchases to a different company. In addition, American Express also sold their customers' credit card purchases to another company. What CVS and American Express did clearly violates privacy law, because they were selling personal information without the consent of their customers. The selling of personal information may also bring harm to these customers, because they do not know what the other companies are planning to do with the personal information that they have purchased. In our paper we briefly discussed the process of mining in graphs and the techniques used in graph management; enhancements come through our novel algorithm implementation to prevent misuse and preserve privacy.

References

[1] C. Aggarwal, N. Ta, J. Feng, J. Wang, M. J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents. KDD Conference, 2007.
[2] R. Agrawal, A. Borgida, H. V. Jagadish. Efficient Maintenance of Transitive Relationships in Large Data and Knowledge Bases. ACM SIGMOD Conference, 1989.
[3] D. Chakrabarti, Y. Zhan, C. Faloutsos. R-MAT: A Recursive Model for Graph Mining. SDM Conference, 2004.
[4] J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computing Reachability Labelings for Large Graphs with High Compression Rate. EDBT Conference, 2008.
[5] J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computation of Reachability Labelings in Large Graphs. EDBT Conference, 2006.
[6] E. Cohen. Size-Estimation Framework with Applications to Transitive Closure and Reachability. Journal of Computer and System Sciences, 55(3):441-453, Dec. 1997.
[7] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-Hop Labels. ACM Symposium on Discrete Algorithms, 2002.
[8] D. Cook, L. Holder. Mining Graph Data. John Wiley & Sons Inc., 2007.
[9] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty Years of Graph Matching in Pattern Recognition. Int. Journal of Pattern Recognition and Artificial Intelligence, 18(3):265-298, 2004.
[10] M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. SIGCOMM Conference, 1999.
[11] G. Flake, R. Tarjan, M. Tsioutsiouliklis. Graph Clustering and Minimum Cut Trees. Internet Mathematics, 1(4):385-408, 2003.
[12] D. Gibson, R. Kumar, A. Tomkins. Discovering Large Dense Subgraphs in Massive Graphs. VLDB Conference, 2005.
[13] M. Hay, G. Miklau, D. Jensen, D. Towsley, P. Weis. Resisting Structural Re-identification in Social Networks. VLDB Conference, 2008.
[14] H. He, A. K. Singh. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. In Proc. of SIGMOD '08, pages 405-418, Vancouver, Canada, 2008.
[15] H. He, H. Wang, J. Yang, P. S. Yu. BLINKS: Ranked Keyword Searches on Graphs. In SIGMOD, 2007.
[16] H. Kashima, K. Tsuda, A. Inokuchi. Marginalized Kernels between Labeled Graphs. ICML, 2003.
[17] L. Backstrom, C. Dwork, J. Kleinberg. Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography.
[18] T. Kudo, E. Maeda, Y. Matsumoto. An Application of Boosting to Graph Classification. NIPS Conference, 2004.
[19] J. Leskovec, J. Kleinberg, C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1), 2007.

N. Swapna Goud received her B.Tech in Computer Science & Information Technology from Vijay Rural Engineering College and her M.Tech in Computer Science Engineering from Anurag Group of Institutions (CVSR Engineering College). She is currently working as an Assistant Professor at CVSR Engineering College, has seven and a half years of academic experience, and has guided many UG and PG students. Her research areas include Data Mining, Design and Analysis of Algorithms, and Service Oriented Architecture, with our work focusing on Graph Mining.

S. Vaishnavi is pursuing an M.Tech in Computer Science Engineering at JNTUH and holds a B.Tech in Computer Science & Engineering. She is currently working as an Assistant Professor at Narayanamma Institute of Technology & Science, has seven and a half years of academic experience, and has guided many UG students. Her research areas include Data Mining, Design and Analysis of Algorithms, and Service Oriented Architecture, with our work focusing on Graph Mining.
