Analyzing Software Code and Execution – Plagiarism and Bug Detection, Shoaib Jameel

Posted on 20-Dec-2015

TRANSCRIPT

<ul><li> Slide 1 </li> <li> Analyzing Software Code and Execution – Plagiarism and Bug Detection, Shoaib Jameel </li> <li> Slide 2 </li> <li> Preliminaries: Plagiarism is the "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work." Funny quote: "When one copies from one source, it's plagiarism; when one copies from many sources, it's research." </li> <li> Slide 3 </li> <li> So what happens when you plagiarize? In countries like the US. </li> <li> Slide 4 </li> <li> In India </li> <li> Slide 5 </li> <li> Let's get to the main theme now! GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis </li> <li> Slide 6 </li> <li> Motivation: It is time-consuming and labour-intensive to design large software systems with many lines of code, so the easiest way is to plagiarize, especially from open-source software! </li> <li> Slide 7 </li> <li> Review of Plagiarism Detection: 1. String-based 2. AST-based 3. Token-based </li> <li> Slide 8 </li> <li> String-based: B. S. Baker. On finding duplication and near-duplication in large software systems. In Proc. of the 2nd Working Conf. on Reverse Engineering, 1995. </li> <li> Slide 9 </li> <li> AST-based: I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier. Clone detection using abstract syntax trees. In Proc. of Int. Conf. on Software Maintenance, 1998. </li> <li> Slide 10 </li> <li> Background </li> <li> Slide 11 </li> <li> PDG (Program Dependence Graph): A PDG is a graph representation of the source code of a procedure. Basic statements such as variable declarations, assignments, and procedure calls are represented by program vertices in a PDG, and the data and control dependencies between statements are represented by edges between vertices. 
</li> <li> Slide 12 </li> <li> Slide 13 </li> <li> Original and Plagiarized Code </li> <li> Slide 14 </li> <li> PDG-Based Plagiarism Detection: Given an original program P and a plagiarism suspect P′, plagiarism detection searches for duplicate structures between P and P′ in order to prove or disprove the existence of plagiarism. Because each program is represented as a set of PDGs, the search for duplicates is performed on PDGs. </li> <li> Slide 15 </li> <li> Slide 16 </li> <li> Plagiarism as Subgraph Isomorphism. The disguises are: 1. Format alteration and identifier renaming 2. Statement reordering 3. Control replacement 4. Code insertion </li> <li> Slide 17 </li> <li> The mature rate γ is set based on one's belief about what proportion of a PDG will stay untouched under plagiarism. It is set to 0.9 in the experiments, because overhauling (without introducing errors) 10% of a PDG of reasonable size is almost equivalent to rewriting the code. </li> <li> Slide 18 </li> <li> Pruning the Plagiarism Search Space: Let G and G′ be the sets of PDGs of P and P′, with n and m PDGs respectively. To find plagiarized PDG pairs, n × m pairwise (relaxed) subgraph isomorphism tests are needed in principle. Two kinds of filters are used: 1. Lossless filter 2. Lossy filter </li> <li> Slide 19 </li> <li> Lossless Filter: 1. PDGs smaller than an interesting size K are excluded from both G and G′. 2. Based on the definition of γ-isomorphism, a PDG pair (g, g′), with g ∈ G and g′ ∈ G′, can be excluded if </li> <li> Slide 20 </li> <li> Lossy Filter: A vertex histogram is constructed as a summarized representation of each PDG, and the similarity between g and g′ is measured in terms of their vertex histograms. 
</li> <li> Slide 21 </li> <li> The Main Idea: Treat the vertex histogram h(g) over k vertex types as a sample, estimate the k-dimensional multinomial distribution P<sub>g</sub> from it, and then consider whether h(g′) is likely to be an observation from P<sub>g</sub>. </li> <li> Slide 22 </li> <li> GPLAG Algorithm </li> <li> Slide 23 </li> <li> Experimental Evaluation </li> <li> Slide 24 </li> <li> Slide 25 </li> <li> Efficiency of GPLAG </li> <li> Slide 26 </li> <li> Slide 27 </li> <li> Core-Part Plagiarism </li> <li> Slide 28 </li> <li> Conclusion: Some questions still remain. How is this implementation better than debuggers? How is this approach better than reverse engineering? Human intervention! </li> </ul>
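The following minimal Python sketch makes the PDG slide concrete. It is not GPLAG's implementation, and it rests on several assumptions:

- The code is straight-line with no branches, so only data-dependence edges appear and control dependencies are left out.
- Each statement is described by hand as a hypothetical `Stmt` record holding a vertex type plus the variables it defines and uses.

Renaming identifiers (disguise 1) leaves the edges unchanged, which is one reason a PDG resists that disguise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical program vertex: a vertex type plus the variables it defines and uses."""
    label: str                        # vertex type, e.g. "decl", "assign", "call"
    defs: frozenset = frozenset()     # variables written by this statement
    uses: frozenset = frozenset()     # variables read by this statement


def data_dependence_edges(stmts):
    """Return edges (i, j) where statement j reads a value last written by statement i.

    Assumes straight-line code: the most recent definition of a variable is the
    one that reaches each later use. Control dependencies are not modelled.
    """
    last_def = {}  # variable name -> index of its most recent defining statement
    edges = set()
    for j, stmt in enumerate(stmts):
        for var in stmt.uses:          # resolve uses before this statement's own defs
            if var in last_def:
                edges.add((last_def[var], j))
        for var in stmt.defs:
            last_def[var] = j
    return edges


# int a = 1; int b = 2; int c = a + b; print(c);
original = [
    Stmt("decl", defs=frozenset({"a"})),
    Stmt("decl", defs=frozenset({"b"})),
    Stmt("assign", defs=frozenset({"c"}), uses=frozenset({"a", "b"})),
    Stmt("call", uses=frozenset({"c"})),
]

# The same code with identifiers renamed (a -> x, b -> y, c -> total).
renamed = [
    Stmt("decl", defs=frozenset({"x"})),
    Stmt("decl", defs=frozenset({"y"})),
    Stmt("assign", defs=frozenset({"total"}), uses=frozenset({"x", "y"})),
    Stmt("call", uses=frozenset({"total"})),
]

print(sorted(data_dependence_edges(original)))  # [(0, 2), (1, 2), (2, 3)]
print(data_dependence_edges(original) == data_dependence_edges(renamed))  # True
```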
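Slides 16 and 17 frame plagiarism as relaxed subgraph isomorphism governed by the mature rate γ. The brute-force sketch below is one reading of that idea under stated assumptions, not GPLAG's actual search:

- A PDG is a hypothetical `(labels, edges)` pair.
- The subgraph S of g is the one induced by a vertex subset, and |S| counts vertices.
- S must map injectively into g′, keeping vertex types and sending every edge of S onto an edge of g′.

The search is exponential, so it only suits toy graphs.

```python
import math
from itertools import combinations, permutations


def is_gamma_isomorphic(g, g_prime, gamma=0.9):
    """Brute force: does an induced subgraph S of g with |S| >= gamma * |g| embed in g'?

    g and g_prime are (labels, edges) pairs: labels maps vertex -> vertex type and
    edges is a set of (u, v) dependence edges. Exponential time; toy inputs only.
    """
    labels, edges = g
    labels_p, edges_p = g_prime
    need = math.ceil(gamma * len(labels))  # smallest |S| that still counts
    for size in range(len(labels), need - 1, -1):
        for subset in combinations(labels, size):
            for image in permutations(labels_p, size):  # injective vertex maps
                m = dict(zip(subset, image))
                same_types = all(labels[v] == labels_p[m[v]] for v in subset)
                edges_kept = all(
                    (m[u], m[v]) in edges_p for (u, v) in edges if u in m and v in m
                )
                if same_types and edges_kept:
                    return True
    return False


# Original PDG: two declarations feed an assignment whose result is printed.
orig = ({0: "decl", 1: "decl", 2: "assign", 3: "call"}, {(0, 2), (1, 2), (2, 3)})

# Suspect with code insertion: an extra, unrelated call vertex "dbg".
inserted = (
    {"a": "decl", "b": "decl", "c": "assign", "d": "call", "dbg": "call"},
    {("a", "c"), ("b", "c"), ("c", "d")},
)

# Suspect that dropped the final call: only 3 of the 4 original vertices survive.
truncated = ({"a": "decl", "b": "decl", "c": "assign"}, {("a", "c"), ("b", "c")})

print(is_gamma_isomorphic(orig, inserted))         # True
print(is_gamma_isomorphic(orig, truncated))        # False: needs ceil(0.9 * 4) = 4
print(is_gamma_isomorphic(orig, truncated, 0.75))  # True: 3 of 4 vertices suffice
```

Lowering γ tolerates more rewriting but also admits more false matches, which matches the trade-off Slide 17 describes.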
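Slide 19's lossless filter can be sketched as a cheap size check that runs before any isomorphism test. This sketch measures PDG size in vertices. Because the slide's second condition is cut off, it assumes that condition is |g′| < γ·|g|. The reasoning: if at least γ·|g| vertices of g must embed injectively into g′, then a smaller g′ can never qualify, so dropping it loses nothing. The function name and its list-of-sizes input are hypothetical.

```python
from itertools import product


def lossless_candidates(sizes_g, sizes_g_prime, k, gamma=0.9):
    """Return the index pairs (i, j) that survive the size-based lossless filter.

    sizes_g / sizes_g_prime: vertex counts of the PDGs in G (original program P)
    and G' (suspect P'). k: the "interesting size" K. gamma: the mature rate.
    """
    survivors = []
    for (i, n_g), (j, n_gp) in product(enumerate(sizes_g), enumerate(sizes_g_prime)):
        if n_g < k or n_gp < k:
            continue  # rule 1: PDGs smaller than K are excluded from G and G'
        if n_gp < gamma * n_g:
            continue  # rule 2 (assumed form): g' cannot hold gamma * |g| vertices of g
        survivors.append((i, j))
    return survivors


# 3 x 2 = 6 candidate pairs before filtering.
print(lossless_candidates([3, 20, 40], [19, 30], k=10))  # [(1, 0), (1, 1)]
```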
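Slides 20 and 21 describe the lossy filter only at a high level. One way to test whether h(g′) is likely an observation from P_g is a likelihood-ratio (G) goodness-of-fit test, sketched below. Several pieces are assumptions of this sketch, since the slides do not state GPLAG's exact test or significance level:

- the five vertex types;
- the add-one smoothing of P_g, so that no type gets probability zero;
- the chi-square threshold.

Because a statistical test can reject a genuinely plagiarized pair, a filter like this is lossy rather than lossless.

```python
import math
from collections import Counter

VERTEX_TYPES = ["decl", "assign", "call", "control", "return"]  # hypothetical k = 5


def histogram(labels):
    """Vertex histogram h(g): count of each vertex type, in VERTEX_TYPES order."""
    counts = Counter(labels)
    return [counts[t] for t in VERTEX_TYPES]


def estimate_multinomial(h, pseudocount=1.0):
    """Estimate P_g from h(g) with add-one smoothing (an assumption of this sketch)."""
    total = sum(h) + pseudocount * len(h)
    return [(c + pseudocount) / total for c in h]


def g_statistic(h_prime, p):
    """Likelihood-ratio statistic G = 2 * sum(O * ln(O / E)) for h(g') under P_g."""
    n = sum(h_prime)
    return 2.0 * sum(o * math.log(o / (n * pi)) for o, pi in zip(h_prime, p) if o > 0)


def passes_lossy_filter(labels_g, labels_g_prime, critical=9.488):
    """Keep the pair if h(g') is plausibly drawn from P_g.

    9.488 is the 0.95 quantile of chi-square with k - 1 = 4 degrees of freedom.
    """
    p = estimate_multinomial(histogram(labels_g))
    return g_statistic(histogram(labels_g_prime), p) <= critical


original = ["decl"] * 4 + ["assign"] * 6 + ["call"] * 3 + ["control"] * 2 + ["return"]
reordered = list(reversed(original))  # same vertex types, different order
unrelated = ["call"] * 16             # a PDG made almost entirely of calls

print(passes_lossy_filter(original, reordered))  # True
print(passes_lossy_filter(original, unrelated))  # False
```

Statement reordering leaves the histogram unchanged, so this filter is only a coarse pre-screen. Surviving pairs would still need the subgraph isomorphism test.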