CSC/ECE 517 Fall 2009/wiki3 2 pp: Difference between revisions




Revision as of 22:13, 15 November 2009

Clone detection and clone manipulation

The DRY ("Don't Repeat Yourself") principle says that a particular code fragment should appear only once in a program. In practice, duplication still happens. When it does, it is useful to be able to find the duplicated code "clones", because software cloning complicates maintenance by giving maintainers unnecessary code to examine. According to Burd, when faced with adding new functionality, a programmer's natural instinct is to copy, paste and modify existing code to meet the new requirements, thereby creating a software clone. The reasons behind this approach are uncertain, but one possibility is the time pressure on maintainers to complete a change. Ducasse points out that “making a code fragment is simpler and faster than writing from scratch” and that, if a programmer’s pay is related to the amount of code they produce, the proliferation of software clones will continue.

Once a clone is created it is effectively lost within the source code, so the copies must be maintained as separate units despite their similarities. Komondoor states that if an error is identified in one clone, modifications are likely to be needed in its counterpart clones as well. Detection is therefore required if clones are to be re-identified to assist the maintenance process. Once clones have been detected, their similarities can be exploited and the clones replaced with a single new code unit during preventative maintenance, which eliminates the problems identified above.
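As a small illustration of replacing clones with a single code unit, the sketch below shows two copy-paste-modify functions and one possible merged replacement. All function names and tax rates are hypothetical and chosen only for this example.

```python
# Before: two clones created by copy, paste and modify (hypothetical names).
def net_price_books(price):
    tax = price * 0.05
    return round(price + tax, 2)


def net_price_games(price):
    tax = price * 0.20
    return round(price + tax, 2)


# After: a single unit in which the only varying part is a parameter.
def net_price(price, tax_rate):
    tax = price * tax_rate
    return round(price + tax, 2)
```

A fix to the rounding rule now has to be made in one place rather than in every copy.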

A good number of clone detection tools are available, both commercially and within academia. These tools implement several different approaches to software clone detection, including string analysis, program slicing, metric analysis and abstract syntax tree comparison. This page surveys a set of clone detection tools and compares them.

Clone Detection Technique

This article focuses on five established detection tools: JPlag, MOSS, Covet, CCFinder and CloneDr. JPlag and MOSS are web-based academic tools for detecting plagiarism in students' source code. CloneDr and CCFinder are stand-alone tools that look at code duplication in general. Covet is a prototype tool that detects clones using metrics.

Figure 1. Comparison table.

Figure 1 summarizes the clone detection tools. It lists the languages each tool's analysis supports and the analysis approach it uses. The column labeled "domain" indicates whether a tool's main purpose is clone detection or plagiarism detection.

CCFinder

CCFinder focuses on analyzing large-scale systems with a limited amount of language dependence. It transforms the source code into a sequence of tokens, aiming to identify "portions of interest (but syntactically not exactly identical structures)". Once the source has been tokenised, a token-by-token matching algorithm is run over the token sequence. CCFinder also provides a dot-plot visualisation tool that allows matches to be recognised visually within large amounts of code.
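The idea of token-based matching can be sketched as follows. This is a simplified illustration and not CCFinder's actual algorithm: it uses Python's standard `tokenize` module as the tokeniser, replaces identifiers and literals with the placeholder tokens `ID` and `LIT` (an assumption made here so that renamed copies still match), and looks for windows of `k` identical consecutive tokens. The function names and the window size are hypothetical.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def normalized_tokens(source):
    """Tokenise Python source, abstracting identifiers and literals."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Keep keywords, replace every other name with a placeholder.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # Comments, newlines and indentation tokens are dropped.
    return out


def shared_windows(tokens_a, tokens_b, k=6):
    """Return (i, j) pairs where k consecutive tokens of a and b are equal."""
    index = defaultdict(list)
    for i in range(len(tokens_a) - k + 1):
        index[tuple(tokens_a[i:i + k])].append(i)
    matches = []
    for j in range(len(tokens_b) - k + 1):
        for i in index.get(tuple(tokens_b[j:j + k]), []):
            matches.append((i, j))
    return matches


a = normalized_tokens("total = price * qty + tax\n")
b = normalized_tokens("sum_ = cost * count + fee\n")
c = normalized_tokens("print('hello')\n")

print(shared_windows(a, b))  # the renamed copy still matches
print(shared_windows(a, c))  # unrelated code yields no matches
```

The `(i, j)` pairs returned by `shared_windows` are the kind of coordinates a dot plot marks: position `i` in one token sequence against position `j` in the other, so that long clones show up as diagonal runs.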

CloneDr

CloneDr analyses software at the syntactic level to produce abstract syntax tree (AST) representations. A series of algorithms is then applied to the tree to detect clones. The first algorithm searches for sub-tree matches within the ASTs. Then a “sequence detection” algorithm attempts to detect “variable size sequences of sub-tree clones”. A third algorithm uses combinations of previously detected clones and looks for “more complex near-miss clones”. The final clone set includes the clones detected by the second and third algorithms. CloneDr can automatically replace cloned code with a functionally equivalent subroutine or macro.
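The first step, searching for matching sub-trees, can be illustrated with Python's standard `ast` module. This is a minimal sketch under stated assumptions rather than CloneDr's implementation: it finds only exact sub-tree matches, it groups sub-trees by their `ast.dump` text, and the `min_nodes` threshold and function names are hypothetical. Sequence detection and near-miss detection are not shown.

```python
import ast
from collections import defaultdict


def subtree_size(node):
    """Number of AST nodes in the sub-tree rooted at node."""
    return sum(1 for _ in ast.walk(node))


def duplicate_subtrees(source, min_nodes=8):
    """Map each repeated sub-tree (as dumped text) to the lines where it starts."""
    buckets = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        # Only statements and expressions carry line numbers.
        if isinstance(node, (ast.stmt, ast.expr)) and subtree_size(node) >= min_nodes:
            # ast.dump omits line/column attributes by default, so two
            # structurally identical sub-trees produce the same text.
            buckets[ast.dump(node)].append(node.lineno)
    return {dump: lines for dump, lines in buckets.items() if len(lines) > 1}


sample = """
def area(w, h):
    return (w * h) + (w * h) / 2

def cost(w, h):
    return (w * h) + (w * h) / 2
"""

for dump, lines in duplicate_subtrees(sample).items():
    print(lines, dump[:40])
```

The size threshold keeps trivial sub-trees such as single variable references from being reported as clones.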

Covet

Covet uses a number of the metrics defined by Mayrand. These metrics were selected by taking known clones and identifying which of the Datrix metrics best highlighted the known clone set. Covet does not apply Mayrand's scale of clone likelihood. Instead it simplifies the classification: functions are classed either as clones or as distinct. The tool is still at the prototype stage and cannot process industrial-sized programs.
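A metric-based, clone-or-distinct classification can be sketched as below. The four metrics used here (parameter count, statement count, branching statements and calls) are hypothetical stand-ins chosen for illustration; they are not the Datrix metrics, and the code is not Covet's implementation. Two functions are classed as clones when their metric vectors are equal.

```python
import ast


def metrics(func):
    """Compute a small metric vector for one function definition."""
    nodes = list(ast.walk(func))
    return (
        len(func.args.args),                                   # parameters
        sum(isinstance(n, ast.stmt) for n in nodes) - 1,       # statements, excluding the def
        sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes),  # branching
        sum(isinstance(n, ast.Call) for n in nodes),           # calls
    )


def clone_pairs(source):
    """Return pairs of top-level function names whose metric vectors are equal."""
    funcs = [n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]
    vectors = {f.name: metrics(f) for f in funcs}
    names = sorted(vectors)
    return [(a, b) for i, a in enumerate(names)
            for b in names[i + 1:] if vectors[a] == vectors[b]]


sample = """
def total(items):
    s = 0
    for x in items:
        s += x
    return s

def product_sum(values):
    acc = 0
    for v in values:
        acc += v * 2
    return acc

def greet(name):
    print("hi", name)
"""

print(clone_pairs(sample))
```

Because only the metric values are compared, functions that differ in detail can still be classed as clones, which is why the choice of metrics matters.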

JPlag

JPlag uses tokenised substring matching to determine similarity in source code. Its specific purpose is to detect plagiarism within academic institutions. First, the source code is translated into tokens, which requires a language-dependent process. JPlag aims to tokenise in such a way that the "essence" of a program is captured, which makes it effective at catching copied functionality. Once converted, the tokenised strings are compared to determine the percentage of matching tokens, which is used as a similarity value. JPlag is an online service freely available to academia.
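The comparison step can be sketched as a greedy search for long common token runs. The JPlag paper listed as reference [8] describes its comparison as Greedy String Tiling; the version below is a simplified, unoptimised sketch of that idea, not JPlag's code. The token names, the `min_match` value, and the similarity formula (twice the covered tokens divided by the total length of both strings) are assumptions made for this example.

```python
def greedy_tiles(a, b, min_match=3):
    """Return non-overlapping matching runs as (start_a, start_b, length)."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        best_len, best = min_match - 1, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the match over unmarked, equal tokens.
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best = k, [(i, j)]
                elif k == best_len and best:
                    best.append((i, j))
        if not best:
            return tiles
        for i, j in best:
            # Skip candidates that overlap a tile marked earlier in this round.
            if any(marked_a[i:i + best_len]) or any(marked_b[j:j + best_len]):
                continue
            for k in range(best_len):
                marked_a[i + k] = marked_b[j + k] = True
            tiles.append((i, j, best_len))


def similarity(a, b, min_match=3):
    """Fraction of tokens covered by tiles, between 0.0 and 1.0."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    covered = sum(length for _, _, length in greedy_tiles(a, b, min_match))
    return 2 * covered / total


# Hypothetical token strings: b is a copy of a with one extra declaration.
a = ["BEGIN_METHOD", "VARDEF", "ASSIGN", "BEGIN_WHILE",
     "APPLY", "END_WHILE", "END_METHOD"]
b = ["BEGIN_METHOD", "VARDEF", "VARDEF", "ASSIGN", "BEGIN_WHILE",
     "APPLY", "END_WHILE", "END_METHOD"]

print(greedy_tiles(a, b))
print(similarity(a, b))
```

Requiring a minimum match length keeps short coincidental runs from inflating the similarity value.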

MOSS

Aiken [Aik02] does not publish the method MOSS uses to detect source code plagiarism, since doing so could compromise its ability to detect plagiarism. MOSS, like JPlag, is an online service provided free for academic use. Source code is submitted via a Perl script and the results are posted on the MOSS website. Users are emailed a URL for the results.

References

[1] Evaluating clone detection tools for use during preventative maintenance

[2] Investigating the maintenance implications of the replication of code

[3] A Language Independent Approach for Detecting Duplicated Code

[4] Using Slicing to Identify Duplication in Source Code

[5] Experiment on the automatic detection of function clones in a software system using metrics

[6] Clone detection using abstract syntax trees

[7] Maintenance Support Tools for JAVA Programs: CCFinder and JAAT

[8] Finding Plagiarisms among a Set of Programs with JPlag

[9] A System for Detecting Software Plagiarism