CSC/ECE 517 Fall 2009/wiki3 2 clone
Clone Detection and Clone Manipulation
Introduction
"Software clones are segments of code that are similar according to some definition of similarity" (Ira Baxter, 2002)[1]. By this definition, two code snippets may be similar in their text, syntactic structure, or semantics, or because they follow the same pattern. Two fragments count as clones if their program text is similar, even when they are not semantically equivalent. Such snippets are also called redundant, because a change to one usually has to be repeated in the other. However, some clones cannot be substituted for one another. For example, two snippets may be identical at the textual level yet refer to different variables that share a name but are declared in different places.
An example of such clones is shown below:
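#include <cstddef>
#include <string>

int count = 0;  // global variable

int methodA(std::string str) {
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;  // updates the global count
    }
    return count;
}

int methodB(std::string str) {
    int count = 0;  // local variable that hides the global one
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;  // updates the local count
    }
    return count;
}
In the above code snippet, count in methodA refers to the global variable, whereas in methodB it refers to the local variable of the same name.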
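To make the difference observable, the following minimal sketch repeats the two methods and returns the counter from each. The return statements and the const-reference parameters are assumptions added for illustration; the loop bodies are the same text in both functions.

```cpp
#include <cstddef>
#include <string>

int count = 0;  // global counter: its value survives across calls

// The loop text is identical to methodB, but `count` binds to the global.
int methodA(const std::string& str) {
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;
    }
    return count;  // return added for illustration (assumption)
}

// Identical loop text, but the local declaration hides the global.
int methodB(const std::string& str) {
    int count = 0;
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;
    }
    return count;  // return added for illustration (assumption)
}
```

Calling methodA("abc") twice returns 3 and then 6, because the global keeps accumulating, while methodB("abc") returns 3 on every call. A tool that merged the two loops into one procedure without checking variable bindings would therefore change behaviour.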
Clone Detection Tools
1. Duploc: This language-independent tool detects clones and offers a clickable matrix display that helps developers locate the cloned source code. It uses an information mural algorithm, which lets it show a matrix of 100,000 lines per side in its entirety on a 600x800 screen[2]. Its performance is good for systems smaller than 1 MLOC. For maintenance projects that seldom change, duplication can be computed overnight and interpreted the next day.
2. Moss:
3. JPlag:
4. Dup: This tool identifies cloned code in large software systems. It was motivated by the observation that in the later stages of development, mainly the maintenance phase, when enhancements are added or bugs are fixed, developers tend to copy and paste existing code[4], which makes the software harder to maintain. The tool lets the user find exact clones as well as clones that differ only in some set of variable names and constants (parameterized matching). It thereby helps identify code snippets that should be rewritten as procedures. Consider the code snippet below, from a B-Tree implementation:
1: fstream fp;
2: long rightOffset;
3: fp.open(s,fstream::in | fstream::out | fstream::binary);
4: fp.seekp(rootOffset,fstream::beg);
5: fp.write((char*)&test_node,sizeof(test_node));
6: rightOffset = fp.tellp();
7: fp.write((char*)&rNode,sizeof(rNode));
8: btree_node pNode = CreateParentNode(temp[BTREE_ORDER/2],rootOffset,rightOffset);
9: fp.seekp(0,fstream::end);
10: fp.write((char*)&pNode,sizeof(pNode));
11: rootOffset = fp.tellp();
12: fp.clear();
13: fp.close();
Lines 4-6 and 9-11 form a parameterized match in which rootOffset is replaced by 0, fstream::beg by fstream::end, test_node by pNode, and rightOffset (line 6) by rootOffset (line 11). Dup will include that match in its report. It qualifies as a parameterized match because the two three-line fragments are identical except for those variable names and constants.
5. Dotplot:
6. Covet:
7. CloneDR:
8. CLAN:
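The parameterized matching described in the Dup entry can be sketched as follows. This is a simplified illustration and not Dup's actual implementation: it assumes that every identifier or numeric constant is a parameter (keywords included), that all other characters are fixed punctuation, and that two fragments match when their parameters are related by a consistent one-to-one renaming. The names Token, tokenize, and parameterizedMatch are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Token {
    std::string text;
    bool isParam;  // true for identifiers and numeric constants
};

// Split source text into word tokens (parameters) and single-character
// punctuation tokens; whitespace is discarded.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            out.push_back({src.substr(start, i - start), true});
        } else {
            out.push_back({std::string(1, src[i]), false});
            ++i;
        }
    }
    return out;
}

// True if b can be obtained from a by a consistent, one-to-one renaming
// of parameters, with all punctuation tokens identical.
bool parameterizedMatch(const std::string& a, const std::string& b) {
    std::vector<Token> ta = tokenize(a);
    std::vector<Token> tb = tokenize(b);
    if (ta.size() != tb.size()) return false;
    std::map<std::string, std::string> fwd;  // name in a -> name in b
    std::map<std::string, std::string> bwd;  // name in b -> name in a
    for (std::size_t i = 0; i < ta.size(); ++i) {
        if (ta[i].isParam != tb[i].isParam) return false;
        if (!ta[i].isParam) {
            if (ta[i].text != tb[i].text) return false;
            continue;
        }
        // emplace keeps an existing entry, so a conflict shows up below.
        auto f = fwd.emplace(ta[i].text, tb[i].text).first;
        auto r = bwd.emplace(tb[i].text, ta[i].text).first;
        if (f->second != tb[i].text || r->second != ta[i].text) return false;
    }
    return true;
}
```

Applied to lines 4-6 and lines 9-11 of the B-Tree snippet (each group joined into one string), parameterizedMatch returns true. It returns false for "a = b + b;" against "x = y + z;", because b cannot be renamed to both y and z.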
Clone detection and manipulation tools
Software maintenance is by far the costliest stage in the life cycle of software. Clones increase the size of the source code and replicate errors across every copy. This calls for tools that not only detect clones but also edit the code to remove them and ease maintenance.
Before editing code to remove clones, a number of factors have to be considered, such as the percentage of cloned code relative to the entire project, the percentage change after removal of the clone, and its effect on readability. Editing to remove or alter clones should be done only if these factors are favourable.
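The two percentage-style factors can be estimated with simple arithmetic. The sketch below is only one possible way to define them, and both definitions are assumptions: clone percentage is taken as cloned lines over total lines, and the saving assumes each copy of a fragment is replaced by a one-line call to a single shared procedure. The function names are hypothetical.

```cpp
// Share of the project that is cloned, in percent (assumed definition).
double clonePercentage(int clonedLines, int totalLines) {
    if (totalLines <= 0) return 0.0;  // avoid division by zero
    return 100.0 * clonedLines / totalLines;
}

// Lines saved when `copies` copies of a `fragmentLines`-line clone are
// replaced by one shared procedure body plus a one-line call per copy.
// Procedure header and brace lines are ignored (assumption).
int linesSaved(int copies, int fragmentLines) {
    int before = copies * fragmentLines;
    int after = fragmentLines + copies;
    return before - after;
}
```

For example, four copies of a ten-line fragment save 26 lines under these assumptions, whereas two copies of a one-line fragment give a negative saving, which suggests leaving such a clone alone.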
CloneDR
CloneDR (Clone Doctor) is a tool that both detects and removes clones from application software. It detects clones using compiler technology rather than simple string matching, and it includes settings to disregard whitespace, line breaks, modified variable names, and case changes.
CloneDR can also filter and automatically remove clones, in addition to the interactive clone removal editing it provides. It replaces clones with subroutine calls, directives, or declarations.
As shown in the figure, duplicate statements are factored into a #define macro, which is then invoked wherever the sorting needs to be done.
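The idea of factoring duplicated sorting statements into a macro can be illustrated as follows. This is a hypothetical example and not CloneDR's actual output: the macro name SORT_PAIR and the function sortThree are invented here to show the shape of the transformation.

```cpp
#include <utility>

// Hypothetical macro holding the statements that were previously
// duplicated at each call site: put the smaller of two values first.
#define SORT_PAIR(a, b)          \
    do {                         \
        if ((b) < (a)) {         \
            std::swap((a), (b)); \
        }                        \
    } while (0)

// Each use below replaces one copy of the compare-and-swap statements.
void sortThree(int& x, int& y, int& z) {
    SORT_PAIR(x, y);
    SORT_PAIR(y, z);
    SORT_PAIR(x, y);
}
```

Calling sortThree on the values 3, 1, 2 leaves them as 1, 2, 3. In C++ an inline function or template would often be preferred over a macro; the macro form is kept here only because it is the form the text describes.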
Comparison With Refactoring
Clone detection, as mentioned before, is a technique for finding duplicated patterns of code. Refactoring, on the other hand, is concerned with modifying the internal structure of code while keeping its external behaviour the same. Currently, the two kinds of tools are loosely coupled: a clone detection tool only detects duplicated patterns, while a refactoring tool is a separate entity that only refactors and performs no detection.
The two tools complement each other. Refining code with clone detection and refactoring can be a two-step process. In the first step, clones are identified based on patterns. The detected clones are then handed to a refactoring tool that, based on various parameters, decides whether to keep a single copy or to modify the code in a way that retains readability.
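As a sketch of what the second step could produce, consider the parameterized match on the B-Tree snippet in the Dup entry. The two three-line fragments could be extracted into one procedure whose arguments are the names that differ. The helper name writeAt and the in-memory demonstration function are assumptions made for illustration, not the output of any particular refactoring tool.

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical helper extracted from the two matching fragments: seek,
// write the raw bytes of `value`, and report the position afterwards.
template <typename T>
std::streamoff writeAt(std::ostream& out, std::streamoff offset,
                       std::ios_base::seekdir dir, const T& value) {
    out.seekp(offset, dir);
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return static_cast<std::streamoff>(out.tellp());
}

// Demonstration on an in-memory stream, so the sketch needs no file:
// one write at the beginning, then one at the end, as in the original.
std::streamoff demoTwoWrites() {
    std::ostringstream out;
    int first = 1;
    int second = 2;
    writeAt(out, 0, std::ios_base::beg, first);
    return writeAt(out, 0, std::ios_base::end, second);
}
```

With this helper, lines 4-6 of the original snippet would become rightOffset = writeAt(fp, rootOffset, fstream::beg, test_node); and lines 9-11 would become rootOffset = writeAt(fp, 0, fstream::end, pNode);. Whether the shorter form is actually more readable is the kind of judgment the refactoring step has to make.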
Conclusion
See Also
1. CCFinderX
2. sif
3. SDD
4. GPLAG
5. CloneDigger
6. SimScan
7. Clone Detective
8. JPlag
9. Moss
References
1. http://drops.dagstuhl.de/opus/volltexte/2007/962/pdf/06301.KoschkeRainer.962.pdf
2. http://scg.unibe.ch/archive/papers/Duca99bCodeDuplication.pdf
3. Clone Detection and Refactoring by Robert Tairas
4. http://eprints.kfupm.edu.sa/20225/1/20225.pdf
5. http://www.semdesigns.com/Products/Clone/