International Journal of Computational Intelligence and Information Security, December 2011, Vol. 2, No. 1



... leaks. We demonstrate several synthetic as well as real-world examples of heap dumps for which our approach provides more insight into the problem than state-of-the-art tools such as Eclipse's MAT.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Memory leaks are a frequent source of bugs in applications that use dynamic memory allocation. They occur if programmers' mistakes prevent the deallocation of memory that is no longer used. Undetected memory leaks cause slowdowns and eventually the exhaustion of all available memory, triggering out-of-memory conditions that usually lead to application crashes. These crashes significantly affect availability, particularly of long-running server applications, which is why memory leaks are one of the most frequently reported types of bugs against server frameworks. Memory leaks are challenging to identify and debug for several reasons. First, the observed failure may be far removed from the error that caused it, requiring the use of heap analysis tools that examine the state of the reachability graph when a failure occurred. Second, real-world applications usually make heavy use of several layers of frameworks whose implementation details are unknown to the developers debugging the encountered memory leaks. Often, these developers cannot distinguish whether an observed reference chain is legitimate (such as when objects are kept in a cache in anticipation of future uses) or represents a leak. Third, the sheer size of the heap (large-scale server applications can easily contain tens of millions of objects) makes manual inspection of even a small subset of objects difficult or impossible.

Our key contributions can be summarized as follows:

1. Although analysis techniques are widely used in heap analysis, our work is the first to employ graph mining for detecting leaking candidates. Specifically, we demonstrate that graph grammar mining used in an offline manner can detect both seeded and known memory leaks in real applications.

2. Compared to other offline analysis techniques, our approach does not require any a priori knowledge about which classes are containers, or about their internal structure. It captures containers even when these are embedded into application classes, such as ad-hoc lists or arrays.

3. Our approach can identify leaks even if the leaks' locations within the graph do not share a common ancestor node, or if the paths from that ancestor to the instances are difficult to find by the manual examination that is required in existing tools such as Eclipse Memory Analyzer (MAT).

4. Graph grammar mining can find recursive structures, giving a user insight into the data structures used in a program. For instance, linked lists and trees can be identified by their distinct signatures.

5. Finally, the ability to combine subgraph frequency with location information makes our algorithm robust to the presence of object structures that occur naturally with high frequency without constituting a leak (a simplified scoring sketch follows this list).
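To make contribution 5 concrete, here is a minimal sketch of how subgraph frequency might be combined with location (allocation-site) information to rank leak candidates. The input structures (`pattern_instances`, `allocation_site`) and the frequency-times-concentration score are illustrative assumptions, not the implementation described in the paper.

```python
from collections import Counter

def score_leak_candidates(pattern_instances, allocation_site):
    """Rank mined subgraph patterns as leak candidates.

    pattern_instances: maps a pattern id to the list of root object ids
        at which that pattern occurs in the heap object graph.
    allocation_site: maps an object id to the code location that
        allocated it (e.g. "Class.method").

    A pattern is suspicious when it is both frequent and concentrated at
    few allocation sites; structures that occur naturally all over the
    heap are down-weighted by the concentration term.
    """
    scores = {}
    for pattern, roots in pattern_instances.items():
        frequency = len(roots)
        sites = Counter(allocation_site[r] for r in roots)
        # Fraction of instances coming from the single dominant site.
        concentration = max(sites.values()) / frequency
        scores[pattern] = frequency * concentration
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Example: a list-like pattern allocated mostly at one site ranks first.
patterns = {"list-node": ["o1", "o2", "o3", "o4"], "string-pair": ["o5", "o6"]}
sites = {"o1": "Cache.add", "o2": "Cache.add", "o3": "Cache.add",
         "o4": "Log.write", "o5": "Util.fmt", "o6": "Parser.run"}
print(score_leak_candidates(patterns, sites))
```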

SECTION IV

4. Graph Mining Based on Anomaly Detection

GAD is a graph-based approach to finding anomalies in data by searching for three factors: modifications, insertions, and deletions of vertices and edges. Each factor runs its own algorithm that finds a normative substructure and attempts to find the substructures that are similar, but not completely identical, to the discovered normative substructure. A normative substructure is a recurring subgraph of vertices and edges that, when coalesced into a single vertex, most compresses the overall graph.
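As a rough illustration of the compression criterion described above, the sketch below picks the candidate substructure that, when each of its instances is collapsed into a single vertex, yields the smallest description of the graph. The size and count inputs and the simplified MDL-style formula are assumptions; the exact measure used by GAD is not given in the text.

```python
def compression_score(graph_size, pattern_size, instance_count):
    """Approximate size of the graph after replacing every instance of the
    pattern with a single vertex; the pattern definition is stored once."""
    return graph_size - instance_count * (pattern_size - 1) + pattern_size

def normative_substructure(graph_size, candidates):
    """Pick the candidate that compresses the overall graph the most.

    candidates: iterable of (pattern_id, pattern_size, instance_count)
    tuples, assumed to come from a separate subgraph enumeration step.
    """
    return min(candidates, key=lambda c: compression_score(graph_size, c[1], c[2]))

# Example: 25 two-vertex instances compress a 100-vertex graph more than
# 10 three-vertex instances do.
print(normative_substructure(100, [("triangle", 3, 10), ("edge", 2, 25)]))
```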

4.1. Categories of Intrusion Detection Systems

Intrusion detection is classified into two types: 1) misuse detection and 2) anomaly detection. Misuse detection uses well-defined patterns of attacks that exploit weaknesses in system and application software to identify intrusions (Kumar and Spafford 1995). These patterns are encoded in advance and used to match against user behavior to detect intrusions. Anomaly detection identifies deviations from the normal usage behavior patterns to identify intrusions. The normal usage patterns are constructed from statistical measures of the system features, for example the CPU and I/O activities of a particular user or program. The behavior of the user is observed and any deviation from the constructed normal behavior is detected as an intrusion.
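The following is a minimal sketch of the anomaly detection idea just described: a per-user profile of statistical measures (such as CPU and I/O activity) is built from observed behavior, and an observation is flagged when it deviates too far from the profile. The class name, the features, and the three-standard-deviation threshold are illustrative assumptions rather than a scheme taken from the cited work.

```python
import math

class UsageProfile:
    """Per-user profile of feature statistics (e.g. CPU time, I/O bytes).

    Running mean and variance are kept per feature (Welford's method);
    an observation is anomalous when any feature deviates from the
    profile by more than `threshold` standard deviations.
    """
    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def update(self, features):
        self.n += 1
        for name, value in features.items():
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def is_anomalous(self, features):
        if self.n < 2:
            return False
        for name, value in features.items():
            std = math.sqrt(self.m2.get(name, 0.0) / (self.n - 1)) or 1e-9
            if abs(value - self.mean.get(name, 0.0)) / std > self.threshold:
                return True
        return False

# Normal sessions, then one with roughly ten times the usual CPU activity.
profile = UsageProfile()
for cpu in (1.0, 1.2, 0.9, 1.1):
    profile.update({"cpu_seconds": cpu, "io_bytes": 2e6})
print(profile.is_anomalous({"cpu_seconds": 10.0, "io_bytes": 2e6}))  # True
```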

4.2. What Is an Anomaly?

Anomaly detection refers to detecting patterns in a given data set that do not conform to an established normal behavior. The patterns thus detected are called anomalies and translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, surprises, deviations, etc.

Most anomaly detection algorithms require a set of purely normal data to train the model, and they implicitly assume that anomalies can be treated as patterns not observed before. Since an outlier may be defined as a data point which is very different from the rest of the data, based on some measure, we employ several detection schemes in order to see how efficiently these schemes may deal with the problem of anomaly detection. The statistics community has studied the concept of outliers quite extensively. In these techniques, the data points are modeled using a stochastic distribution, and points are determined to be outliers depending upon their relationship with this model. However, with increasing dimensionality, it becomes increasingly difficult and inaccurate to estimate the multidimensional distributions of the data points. The more recent outlier detection algorithms that we utilize in this study are therefore based on computing the full-dimensional distances of the points from one another as well as on computing the densities of local neighborhoods.
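As an illustration of the full-dimensional distance-based schemes mentioned above, the sketch below scores each point by its mean distance to its k nearest neighbours, so points in sparse regions stand out. A density-based variant (comparing a point's local density with that of its neighbours, as in LOF) would build on the same distance matrix; the function name and the choice of k are illustrative.

```python
import numpy as np

def knn_outlier_scores(points, k=5):
    """Distance-based outlier score: mean distance to the k nearest
    neighbours of each point. points is an (n, d) array."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)          # ignore each point's distance to itself
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances per point
    return knn.mean(axis=1)

# The isolated last point receives by far the largest score.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
print(knn_outlier_scores(pts, k=2))
```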

The deviation measure is our extension of the traditional method of discrepancy detection. As in discrepancy detection, comparisons are made between predicted and actual sensor values, and differences are interpreted to be indications of anomalies. This raw discrepancy is entered into a normalization process identical to that used for the value change score, and it is this representation of relative discrepancy which is reported. The deviation score for a sensor is minimum if there is no discrepancy and maximum if the discrepancy between predicted and actual is the greatest seen to date on that sensor. Deviation requires that a simulation be available in any form for generating sensor value predictions. However, the remaining sensitivity and cascading alarms measures require the ability to simulate and reason with a causal model of the system being monitored.
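A minimal sketch of the deviation score as described above: the raw discrepancy between predicted and actual sensor values is normalized against the largest discrepancy seen to date on that sensor, so the score is 0 when prediction and observation agree and 1 for the worst discrepancy observed so far. The linear normalization and the class interface are illustrative assumptions.

```python
class DeviationScore:
    """Normalized discrepancy between predicted and actual sensor values."""

    def __init__(self):
        self.max_discrepancy = {}  # largest discrepancy seen per sensor

    def score(self, sensor, predicted, actual):
        discrepancy = abs(predicted - actual)
        worst = max(self.max_discrepancy.get(sensor, 0.0), discrepancy)
        self.max_discrepancy[sensor] = worst
        return 0.0 if worst == 0.0 else discrepancy / worst

dev = DeviationScore()
print(dev.score("temp", predicted=20.0, actual=20.0))  # 0.0: no discrepancy
print(dev.score("temp", predicted=20.0, actual=24.0))  # 1.0: worst seen so far
print(dev.score("temp", predicted=20.0, actual=22.0))  # 0.5: half of the worst case
```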

An appealing way to assess whether current behavior is anomalous or not is via comparison to past behavior. This is the essence of the surprise measure. It is designed to highlight a sensor which behaves other than it has historically [18]. Specifically, surprise uses the historical frequency distribution for the sensor to examine the relative likelihoods of different values of the sensor. It is those sensors which display unlikely values, when other values of the sensor are more likely, which get a high surprise score [19]. Surprise is not high if the only reason a sensor's value is unlikely is that there are many possible values for the sensor, all equally unlikely.
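The sketch below captures the surprise idea in the paragraph above: a value is surprising only when other values of the same sensor have been clearly more likely historically, so a uniform history yields no surprise. The log-ratio form and the smoothing for unseen values are illustrative choices, not the measure's actual formula.

```python
import math
from collections import Counter

def surprise_score(history, value):
    """Surprise for one sensor reading given its historical values."""
    counts = Counter(history)
    total = sum(counts.values())
    # Probability of the observed value; unseen values get a small smoothed mass.
    p_observed = counts[value] / total if value in counts else 1.0 / (total + 1)
    p_most_likely = max(counts.values()) / total
    return max(0.0, math.log(p_most_likely / p_observed))

print(surprise_score(["ok"] * 95 + ["warn"] * 5, "warn"))  # high surprise
print(surprise_score(["a", "b", "c", "d"], "c"))           # 0.0: all equally likely
```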

SECTION V

5. Conclusion

Trends obtained through data mining, intended to be used for marketing or other ethical purposes, may be misused. Unethical businesses may use the information obtained through data mining to take advantage of vulnerable people or to discriminate against a certain group of people. In addition, data mining techniques are not 100 percent accurate; thus mistakes do happen, which can have serious consequences. Although it is against the law to sell or trade personal information between different organizations, the selling of personal information has occurred. For example, according to the Washington Post, in 1998 CVS sold their patients' prescription purchases to a different company. In addition, American Express also sold their customers' credit card purchases to another company. What CVS and American Express did clearly violates privacy law, because they were selling personal information without the consent of their customers. The selling of personal information may also bring harm to these customers, because they do not know what the other companies are planning to do with the personal information that they have purchased. In our paper we briefly discussed the process of mining in graphs and the techniques used in graph management; enhancements come through our novel algorithm implementation to prevent misuse and preserve privacy.

References

[1] C. Aggarwal, N. Ta, J. Feng, J. Wang, M. J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents. KDD Conference, 2007.
[2] R. Agrawal, A. Borgida, H. V. Jagadish. Efficient Maintenance of Transitive Relationships in Large Data and Knowledge Bases. ACM SIGMOD Conference, 1989.
[3] D. Chakrabarti, Y. Zhan, C. Faloutsos. R-MAT: A Recursive Model for Graph Mining. SDM Conference, 2004.
[4] J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computing Reachability Labelings for Large Graphs with High Compression Rate. EDBT Conference, 2008.
[5] J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu. Fast Computation of Reachability Labelings in Large Graphs. EDBT Conference, 2006.
[6] E. Cohen. Size-Estimation Framework with Applications to Transitive Closure and Reachability. Journal of Computer and System Sciences, 55(3):441-453, Dec. 1997.
[7] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-Hop Labels. ACM Symposium on Discrete Algorithms, 2002.
[8] D. Cook, L. Holder. Mining Graph Data. John Wiley & Sons Inc., 2007.
[9] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty Years of Graph Matching in Pattern Recognition. Int. Journal of Pattern Recognition and Artificial Intelligence, 18(3):265-298, 2004.
[10] M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. SIGCOMM Conference, 1999.
[11] G. Flake, R. Tarjan, M. Tsioutsiouliklis. Graph Clustering and Minimum Cut Trees. Internet Mathematics, 1(4):385-408, 2003.
[12] D. Gibson, R. Kumar, A. Tomkins. Discovering Large Dense Subgraphs in Massive Graphs. VLDB Conference, 2005.
[13] M. Hay, G. Miklau, D. Jensen, D. Towsley, P. Weis. Resisting Structural Re-identification in Social Networks. VLDB Conference, 2008.
[14] H. He, A. K. Singh. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. In Proc. of SIGMOD '08, pages 405-418, Vancouver, Canada, 2008.
[15] H. He, H. Wang, J. Yang, P. S. Yu. BLINKS: Ranked Keyword Searches on Graphs. In SIGMOD, 2007.
[16] H. Kashima, K. Tsuda, A. Inokuchi. Marginalized Kernels between Labeled Graphs. ICML, 2003.
[17] L. Backstrom, C. Dwork, J. Kleinberg. Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography.
[18] T. Kudo, E. Maeda, Y. Matsumoto. An Application of Boosting to Graph Classification. NIPS Conference, 2004.
[19] J. Leskovec, J. Kleinberg, C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1), 2007.

N. Swapna Goud received her B.Tech in Computer Science & Information Technology from Vijay Rural Engineering College and her M.Tech in Computer Science Engineering from Anurag Group of Institutions (CVSR Engineering College). She is currently working as an Assistant Professor at CVSR Engineering College, has seven and a half years of academic experience, and has guided many UG and PG students. Her research areas include Data Mining, Design and Analysis of Algorithms, and Service Oriented Architecture, with our work focusing on Graph Mining.

S. Vaishnavi is pursuing an M.Tech in Computer Science Engineering at JNTUH and holds a B.Tech in Computer Science & Engineering. She is currently working as an Assistant Professor at Narayanamma Institute of Technology & Science, has seven and a half years of academic experience, and has guided many UG students. Her research areas include Data Mining, Design and Analysis of Algorithms, and Service Oriented Architecture, with our work focusing on Graph Mining.
