CSC/ECE 517 Fall 2009/wiki3 2 clone

From Expertiza_Wiki
Revision as of 07:16, 18 November 2009 by Knrajwan

Clone Detection and Clone Manipulation

Introduction

"Software clones are segments of code that are similar according to some definition of similarity" (Ira Baxter, 2002)[1]. Under this definition, two code snippets may be similar in their text, their syntactic structure, or their semantics, or because they follow the same pattern. The simplest case is textual similarity: two code fragments are clones if their program text is similar, even though they may not be semantically equivalent. Clones are also called redundant code, because a change to one copy usually has to be repeated in the others. However, one clone cannot always be substituted for another. For example, two code snippets may be identical at the textual level yet refer to different variables that share a name but are declared in different places. An example of such clones is shown below:

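 #include <string>

 int count = 0;  // global counter

 // Increments the global count once per character of str.
 int methodA(const std::string& str)
 {
     for (std::size_t i = 0; i < str.length(); i++)
     {
         count = count + 1;
     }
     return count;
 }

 // Textually identical loop, but count here is a local variable.
 int methodB(const std::string& str)
 {
     int count = 0;
     for (std::size_t i = 0; i < str.length(); i++)
     {
         count = count + 1;
     }
     return count;
 }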

In the code above, count in methodA refers to the global variable, whereas count in methodB refers to the local variable declared inside the method.
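
To make the difference concrete, the following minimal sketch repeats the two methods and adds a helper that calls a method twice on the same input. It assumes that string means std::string and that each method returns its counter; the helper callTwice is a hypothetical name introduced only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>

int count = 0;  // global counter, as in the example above

// Accumulates into the global count, so the result depends on earlier calls.
int methodA(const std::string& str) {
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;
    }
    return count;
}

// Same loop text, but count is local and starts at zero on every call.
int methodB(const std::string& str) {
    int count = 0;
    for (std::size_t i = 0; i < str.length(); i++) {
        count = count + 1;
    }
    return count;
}

// Hypothetical helper: calls f twice on the same input and returns both results.
template <typename F>
std::pair<int, int> callTwice(F f, const std::string& s) {
    int first = f(s);
    int second = f(s);
    return {first, second};
}
```

Starting from count equal to zero, callTwice(methodA, "abc") yields two different results because the global counter keeps growing, while callTwice(methodB, "abc") yields the same result twice. The two loops are textual clones, but replacing one method with the other would change the program's behaviour.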

Clone Manipulation

Clone Detection Tools

1. Duploc: This language-independent tool detects clones and presents them in a clickable matrix display that lets developers locate the cloned source code. It uses an information mural algorithm, which allows a matrix of 100,000 lines per side to be shown in its entirety on a 600x800 screen[2]. Performance is good for systems smaller than 1 MLOC. For maintenance projects that seldom change, duplication can be computed overnight and interpreted the next day.
2. Moss:
3. JPlag:
4. Dup: This tool identifies cloned code in large software systems. It was motivated by the observation that in the later stages of development, mainly the maintenance phase, when enhancements are added or bugs are fixed, developers tend to copy and paste existing code[4], which makes the software harder to maintain. The tool finds exact clones as well as clones that differ only in a set of variable names and constants (parameterized matching), and it helps identify code that should be rewritten as a procedure. Consider the following code snippet from a B-Tree implementation:

   1: fstream fp;
   2: long rightOffset;
   
   3: fp.open(s,fstream::in | fstream::out | fstream::binary);
  
   4: fp.seekp(rootOffset,fstream::beg);
   5: fp.write((char*)&test_node,sizeof(test_node));
   6: rightOffset = fp.tellp();
  
   7: fp.write((char*)&rNode,sizeof(rNode));
 
   8: btree_node pNode = CreateParentNode(temp[BTREE_ORDER/2],rootOffset,rightOffset);
   
   9: fp.seekp(0,fstream::end);
  10: fp.write((char*)&pNode,sizeof(pNode));
  11: rootOffset = fp.tellp();
  
  12: fp.clear();
  13: fp.close();

Lines 4-6 and 9-11 form a parameterized match: the two three-line sequences are identical once rootOffset is replaced by 0, fstream::beg by fstream::end, and test_node by pNode (and, on the assignment line, rightOffset by rootOffset). Dup reports this match.
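
The idea of a parameterized match can be sketched in a few lines of C++. This is a simplified illustration and not Dup's actual implementation: it assumes that every identifier, constant, or qualified name is a potential parameter, that punctuation must match exactly, and that the renaming between the two fragments must be consistent and one-to-one. The function names isWordChar, tokenize, and parameterizedMatch are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Word characters form parameter candidates: identifiers, numeric constants,
// and (as a simplification) qualified names such as fstream::beg.
bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == ':';
}

// Splits source text into word tokens and single-character punctuation tokens.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) {
            ++i;
        } else if (isWordChar(text[i])) {
            std::size_t start = i;
            while (i < text.size() && isWordChar(text[i])) ++i;
            tokens.push_back(text.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return tokens;
}

// Returns true when b can be obtained from a by a consistent, one-to-one
// renaming of word tokens; on success the renaming is stored in `renaming`.
bool parameterizedMatch(const std::string& a, const std::string& b,
                        std::map<std::string, std::string>& renaming) {
    const std::vector<std::string> ta = tokenize(a);
    const std::vector<std::string> tb = tokenize(b);
    if (ta.size() != tb.size()) return false;

    std::map<std::string, std::string> forward, backward;
    for (std::size_t i = 0; i < ta.size(); ++i) {
        const bool pa = isWordChar(ta[i][0]);
        const bool pb = isWordChar(tb[i][0]);
        if (pa != pb) return false;
        if (!pa) {
            // Punctuation is not a parameter and must be identical.
            if (ta[i] != tb[i]) return false;
            continue;
        }
        // emplace keeps an existing entry, so a conflicting pair is detected.
        auto f = forward.emplace(ta[i], tb[i]);
        auto r = backward.emplace(tb[i], ta[i]);
        if (f.first->second != tb[i] || r.first->second != ta[i]) return false;
    }
    renaming = forward;
    return true;
}
```

Passing the text of lines 4-6 as the first argument and the text of lines 9-11 as the second should report a match, with the renaming mapping rootOffset to 0, fstream::beg to fstream::end, test_node to pNode, and rightOffset to rootOffset. A pair such as "x = x + y;" and "a = b + c;" is rejected, because x would have to be renamed to both a and b.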


5. Dotplot:
6. Covet:
7. CloneDR:
8. CLAN:
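
The matrix display described for Duploc can be approximated with a line-by-line comparison. The sketch below is an assumption about the general technique, not Duploc's code: it strips whitespace from each line, marks cell (i, j) when line i of one source equals line j of the other, and renders the result as text. The names normalize, comparisonMatrix, and render are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Removes all whitespace so that formatting differences do not hide a match.
std::string normalize(const std::string& line) {
    std::string out;
    for (char c : line) {
        if (!std::isspace(static_cast<unsigned char>(c))) out.push_back(c);
    }
    return out;
}

// Builds a comparison matrix: cell (i, j) is true when line i of `a` and
// line j of `b` are equal after normalization. Blank lines never match.
std::vector<std::vector<bool>> comparisonMatrix(
        const std::vector<std::string>& a,
        const std::vector<std::string>& b) {
    std::vector<std::vector<bool>> m(a.size(),
                                     std::vector<bool>(b.size(), false));
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::string na = normalize(a[i]);
        for (std::size_t j = 0; j < b.size(); ++j) {
            m[i][j] = !na.empty() && na == normalize(b[j]);
        }
    }
    return m;
}

// Renders the matrix as text, one row per line: '#' for a match, '.' otherwise.
std::string render(const std::vector<std::vector<bool>>& m) {
    std::string out;
    for (const auto& row : m) {
        for (bool cell : row) out.push_back(cell ? '#' : '.');
        out.push_back('\n');
    }
    return out;
}
```

In such a matrix, a run of consecutive matching lines shows up as a diagonal of marked cells, which is the visual pattern a developer looks for when locating cloned sequences.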

Comparison With Refactoring

Clone detection, as mentioned before, is a technique for finding duplicated patterns of code. Refactoring, on the other hand, modifies the internal structure of code while keeping its external behaviour the same. Currently, tools for the two tasks are only loosely coupled: a clone detection tool detects duplicated patterns, while a refactoring tool is a separate entity that performs refactoring but no detection.

The two kinds of tool complement each other. Refining code with clone detection and refactoring can be a two-step process. In the first step, clones are identified based on patterns. In the second, the detected clones are handed to a refactoring tool that decides, based on various parameters, whether to keep a single copy or to modify the code in a way that preserves readability.

Conclusion

See Also

1. CCFinderX

References

1. http://drops.dagstuhl.de/opus/volltexte/2007/962/pdf/06301.KoschkeRainer.962.pdf
2. http://scg.unibe.ch/archive/papers/Duca99bCodeDuplication.pdf
3. Clone Detection and Refactoring by Robert Tairas
4. http://eprints.kfupm.edu.sa/20225/1/20225.pdf