CSC/ECE 517 Fall 2009/wiki3 2 pp

Revision as of 21:39, 15 November 2009

Clone detection and clone manipulation

The DRY (Don't Repeat Yourself) principle says that a particular code fragment should appear only once in a program. In practice, duplication still happens. When it does, it is useful to be able to find the resulting code "clones", because software cloning complicates maintenance by giving maintainers unnecessary code to examine. According to Burd, when presented with the challenge of adding new functionality, the natural instinct of a programmer is to copy, paste and modify existing code to meet the new requirements, thereby creating a software clone. While the basis for this approach is uncertain, one possible reason is the time pressure on maintainers to complete a change. Ducasse points out that “making a code fragment is simpler and faster than writing from scratch” and that if a programmer’s pay is related to the amount of code they produce, the proliferation of software clones will continue.

Once a clone is created, it is effectively lost within the source code, so the original and its copy must be maintained as separate units despite their similarities. Komondoor states that if an error is identified in one clone, corresponding modifications are likely to be needed in its counterpart clones. Detection is therefore required if the clones are to be re-identified to assist the maintenance process. Once clones have been detected, their similarities can be exploited: during preventative maintenance they can be replaced with a single new code unit, which eliminates the problems identified above.
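As a small illustration of replacing clones with a single code unit, consider the sketch below. The function names and the discount rule are invented for this example and do not come from any of the cited studies. Two copy-pasted functions differ only in a constant; the replacement unit turns that constant into a parameter.

```python
# Hypothetical clone pair: senior_price was copied from student_price
# and only the discount rate was changed.
def student_price(base):
    if base < 0:
        raise ValueError("price must be non-negative")
    return round(base * (1 - 0.20), 2)


def senior_price(base):
    if base < 0:
        raise ValueError("price must be non-negative")
    return round(base * (1 - 0.15), 2)


# Single replacement unit: the varying constant becomes a parameter, so a
# fix to the validation logic only has to be made in one place.
def discounted_price(base, rate):
    if base < 0:
        raise ValueError("price must be non-negative")
    return round(base * (1 - rate), 2)
```

With the single unit in place, the two original functions can be deleted or reduced to one-line wrappers around discounted_price.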

A good number of clone detection tools are available, both commercially and within academia. These tools implement several different approaches to software clone detection, including string analysis, program slicing, metric analysis and abstract syntax tree comparison. This page surveys a set of clone detection tools and compares them.
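The simplest of these approaches, string analysis, can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the method of any tool named on this page: the function name find_duplicate_windows and the fixed window of three lines are choices made for the example. It strips whitespace, drops blank lines, and reports every run of consecutive lines that occurs more than once.

```python
from collections import defaultdict


def find_duplicate_windows(source, window=3):
    """Map each repeated run of `window` lines to its starting line numbers."""
    # Normalise: strip surrounding whitespace and drop blank lines, while
    # remembering the original 1-based line numbers.
    lines = [(number, text.strip())
             for number, text in enumerate(source.splitlines(), 1)
             if text.strip()]
    seen = defaultdict(list)
    for k in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[k:k + window])
        seen[chunk].append(lines[k][0])
    # Keep only the runs that occur at least twice.
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}
```

A comparison this literal misses clones in which a variable has been renamed, which is one motivation for the token-based and tree-based approaches.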

Clone Detection Technique

This article focuses primarily on five established detection tools: JPlag, MOSS, Covet, CCFinder and CloneDr. JPlag and MOSS are web-based academic tools for detecting plagiarism in students' source code. CloneDr and CCFinder are standalone tools that look at code duplication in general.

Figure 1. Comparison table.

Figure 1 summarizes the clone detection tools. It lists the languages each tool can analyse and the analysis approach it uses. The column labeled Domain gives each tool's main purpose: clone detection or plagiarism detection.

CCFinder

CCFinder focuses on analyzing large-scale systems with a limited amount of language dependence. It transforms the source code into a sequence of tokens and aims to identify "portions of interest (but syntactically not exactly identical structures)". After the source is tokenised, a token-by-token matching algorithm is applied. CCFinder also provides a dotplot visualisation tool that allows matches to be recognised visually within large amounts of code.
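The general idea of token-based matching can be sketched as follows. This is not CCFinder's implementation: it is a minimal illustration that uses Python's standard tokenize module on Python source, and the normalisation rules (every identifier becomes ID, every number NUM, every string STR) and the function names are assumptions made for the example. Because names are normalised before comparison, two fragments that differ only in variable names yield the same token sequence.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Layout and comment tokens are ignored in the comparison.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalised_tokens(source):
    """Tokenise Python source, replacing names and literals by placeholders."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords and operators are kept as-is
    return out


def longest_shared_run(src_a, src_b):
    """Return the longest run of normalised tokens common to both sources."""
    a, b = normalised_tokens(src_a), normalised_tokens(src_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def dotplot(a, b):
    """Text dotplot: row i, column j holds '*' where a[i] equals b[j]."""
    return ["".join("*" if x == y else "." for y in b) for x in a]
```

In a dotplot, a cloned region shows up as a diagonal run of marks, which is why the technique scales visually to large amounts of code.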

CloneDr

CloneDr analyses software at the syntactic level, producing abstract syntax tree (AST) representations. A series of algorithms is then applied to the trees to detect clones. The first algorithm searches for sub-tree matches within the ASTs. A “sequence detection” algorithm then attempts to detect “variable size sequences of sub-tree clones”. A third algorithm combines previously detected clones to look for “more complex near-miss clones”. The final clone set includes the clones detected by the second and third algorithms. CloneDr can automatically replace cloned code by producing a functionally equivalent subroutine or macro.
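The first step, finding matching sub-trees, can be illustrated with Python's standard ast module. The sketch below is not CloneDr's algorithm: it only groups statement sub-trees that are exactly identical, and the function name subtree_clones and the min_nodes size threshold are assumptions made for the example. The sequence detection and near-miss stages described above are not shown.

```python
import ast
from collections import defaultdict


def subtree_clones(source, min_nodes=5):
    """Return groups of line numbers whose statement sub-trees are identical."""
    tree = ast.parse(source)
    buckets = defaultdict(list)
    for node in ast.walk(tree):
        if not isinstance(node, ast.stmt):
            continue
        # Skip very small sub-trees, which would match trivially.
        size = sum(1 for _ in ast.walk(node))
        if size < min_nodes:
            continue
        # ast.dump leaves out line and column attributes by default, so two
        # sub-trees with the same structure and names share the same key.
        buckets[ast.dump(node)].append(node.lineno)
    return [sorted(lines) for lines in buckets.values() if len(lines) > 1]
```

Each returned group lists the starting lines of statements that share the same sub-tree. A group of this kind is a candidate for replacement by a single subroutine.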

References

[1] Evaluating clone detection tools for use during preventative maintenance

[2] Investigating the maintenance implications of the replication of code

[3] A Language Independent Approach for Detecting Duplicated Code

[4] Using Slicing to Identify Duplication in Source Code

[5] Experiment on the automatic detection of function clones in a software system using metrics

[6] Clone detection using abstract syntax trees

[7] Maintenance Support Tools for JAVA Programs: CCFinder and JAAT