=<font color="windowtext">Clone detection and clone manipulation</font>=
The [http://en.wikipedia.org/wiki/Don't_repeat_yourself DRY principle] says that a particular code fragment should not be repeated more than once in a program. But it happens. And when it does, it is good to be able to find the multiple code "clones", because ''software cloning'' complicates the maintenance process by giving the maintainers unnecessary code to examine. As per [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=624265&isnumber=13575&punumber=4921&k2dockey=624265@ieeecnfs&query=((investigating+the+maintenance+implications+of+the+replication+of+code)%3Cin%3Emetadata)&pos=0&access=no Burd], when presented with the challenge of adding new functionality, the natural instinct of a programmer seems to be to copy, paste and modify existing code to meet the new requirements, thereby creating a software clone. While the basis behind such an approach is uncertain, one possible reason is the time restriction placed on maintainers to complete a maintenance change. [http://scg.unibe.ch/archive/papers/Duca99bCodeDuplication.pdf Ducasse] points out that “making a code fragment is simpler and faster than writing from scratch” and that, if a programmer’s pay is related to the amount of code they produce, then the proliferation of software clones will continue. A variety of tools have been designed to detect clones, and some of them even allow joint editing of the clones. This page surveys techniques for dealing with the problem and compares their effectiveness with refactorings such as Extract Method.
Once a clone is created it is effectively lost within the source code, and so both clones must be maintained as separate units despite their similarities. [http://pages.cs.wisc.edu/~raghavan/sas01.pdf Komondoor] states that if errors are identified within one clone, then modifications are likely to be necessary in its counterpart clones. Detection is therefore required if any of the clones are to be re-identified to assist the maintenance process. If clones can be detected, their similarities can be exploited: during preventative maintenance the clones can be replaced with a single new code unit, eliminating the problems identified above.
The Double Submit Problem is a specific instance of the generic duplicate form submission problem. It is a common issue in any web-based application that performs server-side transactions. Submitting POST data more than once can produce undesirable results, hence the name: <i>Double Submit Problem</i>. POST requests generally contain input data and can change the state of the server, so submitting a form twice or multiple times can leave the server in an inconsistent state.
There are a good number of clone detection tools available, both commercially and within academia. Within these tools several different approaches to software clone detection have been implemented, including [http://www.cs.ucsb.edu/~yuf/stranger.pdf string analysis], [http://en.wikipedia.org/wiki/Program_slicing program slicing], [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=565012&isnumber=12288&punumber=4204&k2dockey=565012@ieeecnfs&query=((automatic+detection+of+function+clones+in+a+software+system+using+metrics)%3Cin%3Emetadata)&pos=0&access=no metric analysis] and [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=738528&isnumber=15947&punumber=5960&k2dockey=738528@ieeecnfs&query=((clone+detection+using+abstract+syntax+trees)%3Cin%3Emetadata)&pos=0&access=no abstract syntax tree comparisons]. This page will survey a set of clone detection tools and compare them.
This problem can occur for any of the following reasons:
(a) User clicking the submit button more than once before the response is rendered
This article focuses primarily on the five established detection tools below: [https://www.ipd.uni-karlsruhe.de/jplag/ JPlag], [http://theory.stanford.edu/~aiken/moss/ MOSS], [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=565012&isnumber=12288&punumber=4204&k2dockey=565012@ieeecnfs&query=((automatic+detection+of+function+clones+in+a+software+system+using+metrics)%3Cin%3Emetadata)&pos=0&access=no Covet], [http://www.ccfinder.net/ CCFinder] and [http://www.semdesigns.com/Products/Clone/ CloneDr]. [http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D4812745%26isnumber%3D4812721&authDecision=-203 Extract Method Refactoring] as used in the [http://www.eclipse.org/ Eclipse] IDE will also be visited briefly. JPlag and MOSS are web-based academic tools for detecting plagiarism in students' source code. CloneDr and CCFinder are standalone tools that look at code duplication in general.
(b) User inadvertently pressing the back button on the web browser and resubmitting the same form, thereby invoking a duplicate transaction
Figure 1 summarizes the clone detection tools. The languages supported by the analysis process are highlighted, as is the analysis approach. The column labeled domain highlights the main purpose of the tools for either clone detection or for plagiarism detection.
(c) User refreshes the page after successful completion of the transaction. This will invoke the previous transaction again on the server side
[http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=919197&isnumber=19875&punumber=7340&k2dockey=919197@ieeecnfs&query=((maintenance+support+tools+for+java+programs:+ccfinder+and+jaat)%3Cin%3Emetadata)&pos=0&access=no CCFinder] focuses on analyzing large-scale systems with a limited amount of language dependence. It transforms the source code into tokens, aiming to identify "portions of interest (but syntactically not exactly identical structures)". After the source is tokenised, a token-by-token matching algorithm is performed. CCFinder also provides a dotplotting visualisation tool that allows visual recognition of matches within large amounts of code.
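The general flavour of such token-based matching can be illustrated with a small sketch (a toy illustration of the approach only, not CCFinder's actual algorithm; the class and method names are hypothetical): identifiers and literals are normalised to a common token so that renamed copies still match, and any window of consecutive tokens that occurs more than once is reported as a candidate clone.
<pre>
import java.util.*;

// Toy illustration of token-based clone detection (not CCFinder's real algorithm).
public class TokenCloneSketch {

    // Normalise the source into a token stream: identifiers and numbers become "ID",
    // a handful of keywords are kept literally so control structure still matters.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String raw : source.split("[^A-Za-z0-9_]+")) {
            if (raw.isEmpty()) continue;
            tokens.add(raw.matches("if|else|for|while|return") ? raw : "ID");
        }
        return tokens;
    }

    // Report every window of 'window' consecutive tokens that occurs at least twice,
    // together with the token positions where it starts.
    static Map<String, List<Integer>> findClones(List<String> tokens, int window) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i + window <= tokens.size(); i++) {
            String key = String.join(" ", tokens.subList(i, i + window));
            positions.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        positions.values().removeIf(p -> p.size() < 2);
        return positions;
    }

    public static void main(String[] args) {
        String code = "int a = b + c; return a; int x = y + z; return x;";
        System.out.println(findClones(tokenize(code), 5));
    }
}
</pre>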
(d) User invokes the transaction but stops it after it has completed on the server side, before the response is rendered on the screen, and then refreshes the page, causing the transaction to be duplicated
[http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=738528&isnumber=15947&punumber=5960&k2dockey=738528@ieeecnfs&query=((clone+detection+using+abstract+syntax+trees)%3Cin%3Emetadata)&pos=0&access=no CloneDr] analyses software at the syntactic level to produce abstract syntax tree (AST) representations. A series of algorithms are then applied to the tree to detect clones. The first algorithm searches for sub-tree matches within the ASTs. Then a “sequence detection” algorithm attempts to detect “variable size sequences of sub-tree clones”. A third algorithm uses combinations of previously detected clones and looks for “more complex near-miss clones”. The final clone set includes the clones detected in the second and third algorithms. CloneDr can automatically replace cloned code by producing a functionally equivalent subroutine or macro.
For instance, assume a user is checking out a cart on an e-commerce site such as amazon.com. The user checks out, pays by credit card, and the transaction succeeds. If the user then inadvertently refreshes the page, the credit card could be charged twice.
Covet is a prototype tool that detects function clones by comparing software metrics calculated for each function, following the metrics-based approach of [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=565012&isnumber=12288&punumber=4204&k2dockey=565012@ieeecnfs&query=((automatic+detection+of+function+clones+in+a+software+system+using+metrics)%3Cin%3Emetadata)&pos=0&access=no Mayrand]. These metrics were selected by taking known clones and identifying which of the Datrix metrics best highlighted the known clone set. Covet does not apply the same scale of clone-likelihood classification as Mayrand; within Covet this is simplified, so there is no scale of cloning and functions are either classed as clones or distinct. The tool is still in the prototype stage and is not capable of processing industrial-sized programs.
==<font color="windowtext">Solutions for Double Submit Problem</font>==
[http://www.jucs.org/jucs_8_11/finding_plagiarisms_among_a/Prechelt_L.pdf JPlag] uses tokenised substring matching to determine similarity in source code. Its specific purpose is to detect plagiarism within academic institutions. First the source code is translated into tokens (a language-dependent process). JPlag aims to tokenise in such a way that the "essence" of a program is captured, so it can be effective at catching copied functionality. Once converted, the tokenised strings are compared to determine the percentage of matching tokens, which is used as a similarity value. JPlag is an online service freely available to academia.
Multiple solutions to handle these scenarios are discussed below.
Aiken does not publish the method [http://theory.stanford.edu/~aiken/moss/ MOSS] uses to detect source code plagiarism, as its ability to detect plagiarism might otherwise be compromised. MOSS, like JPlag, is an online service provided freely for academic use. Source code is submitted via a Perl script and the results are then posted on the MOSS web page; users are emailed a URL for the results.
Extract Method has been recognized as one of the most important refactorings, since it decomposes large methods and can be used in combination with other
refactorings for fixing a variety of design problems. However, existing tools and methodologies support extraction of methods based on a set of statements
selected by the user in the original method. The goal of the methodology proposed by [http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D4812745%26isnumber%3D4812721&authDecision=-203 Tsantalis] is to automatically identify Extract Method refactoring opportunities and present them as suggestions to the designer of an object oriented system.
Users can be warned not to submit again and asked to wait for a response after submitting. Web browsers display a warning when the user tries to reissue a POST request; however, refreshing a page loaded using a GET request is allowed and no browser warning is shown.
Method extraction has a positive effect on maintenance, since it simplifies the code by breaking large methods into smaller ones and creates new methods which can be reused. Method extraction techniques are based on the concept of ''program slicing''. According to Weiser, a slice consists of all the statements in a program that may affect the value of a variable x at a specific point of interest p. The pair (p, x) is referred to as the slicing criterion. In general, slices are computed by finding sets of directly or indirectly relevant statements based on control and data dependencies. After the original definition by Weiser, several notions of slicing have been proposed. [http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D4812745%26isnumber%3D4812721&authDecision=-203 Tsantalis] has discussed a block-based slicing technique and performed experimental evaluations of it. This page will discuss the evaluation results in the next section.
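As a small, hypothetical illustration of Weiser's definition (not taken from any of the cited papers), consider the backward slice for the criterion (p, product), where p is the final print statement: only the statements that can affect the value of product at p belong to the slice.
<pre>
public class SliceExample {
    public static void main(String[] args) {
        int n = 5;                       // in the slice: product depends on the loop bound
        int sum = 0;                     // NOT in the slice for (p, product)
        int product = 1;                 // in the slice
        for (int i = 1; i <= n; i++) {   // in the slice: controls the updates of product
            sum = sum + i;               // NOT in the slice
            product = product * i;       // in the slice
        }
        System.out.println(product);     // p: the point of interest
    }
}
</pre>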
[[Image:Browser Warning Display.png|frame|center|alt=Browser Warning Display|Figure 1 ''[[Display warning message in the browser]]''.]]
[http://ieeexplore.ieee.org/search/freesrchabstract.jsp?arnumber=1134103&isnumber=25179&punumber=8211&k2dockey=1134103@ieeecnfs&query=1134103%3Cin%3Earnumber&pos=0 Burd] performed an experimental evaluation of these tools, and the results of the analysis were used to investigate which of the tools were best suited to assist the process of software maintenance in general, and preventative maintenance in particular. The results were obtained using two metrics, precision and recall, and were gathered both by detecting replication within a single program and by detecting replication across distinct programs. Precision measures the extent to which the clones identified by a tool are extraneous or irrelevant; recall measures the extent to which the clones identified by the tool match the known clone base within the application being evaluated. In other words, precision is the proportion of reported clones that are genuine, and recall is the proportion of known clones that are reported.
====<font color="windowtext">Client Side Solution - JavaScript Control Disabling</font>====
With this, [http://ieeexplore.ieee.org/search/freesrchabstract.jsp?arnumber=1134103&isnumber=25179&punumber=8211&k2dockey=1134103@ieeecnfs&query=1134103%3Cin%3Earnumber&pos=0 Burd] analysed the results against the six categories below.
The client-based strategy for this issue is to disable the submit button once the form has been submitted. Although this is easy to implement using JavaScript, it is not as dependable as a server-side solution because it is browser-based and can be overridden in the browser.
* Output of a high proportion or all of the clones present within the code - CCFinder identified more clones than the other tools, but the greater proportion of the clones it identified were across files. Proportionally, CloneDr identified more clones that were internally replicated within a file. However, the most predictive assessment of this requirement is the recall metric, the percentage of clones identified from the total known set. CCFinder identified the greatest total number of clones, resulting in the highest level of recall at 72%.
* Output of a low proportion or no incorrectly identified clones - CloneDr was the only tool that provided perfect precision, identifying no false positive matches and therefore not incurring wasted maintenance effort. This is due to its automated clone-removal process: if a clone cannot be automatically removed, it is not identified as a clone.
* Matching and output of clones with high frequencies of replication - The results of the analysis showed that Covet, followed by CCFinder, best satisfied this requirement. However, the benefit of CloneDr’s ability to conduct an automated clone replacement process should not be underestimated.
* Output of clones that are large in terms of lines of code - The largest clone identified was by Covet at 123 LOC, but the tool generating the largest mean clone size was JPlag. Overall, however, all tools showed fairly similar performance levels.
* Output of clones that can be modified or removed with minimum impact to the application - CloneDr was the only tool that could be tested for this, since it alone provides automatic clone removal.
* Ease of use of the tool - No analysis was performed for this category, so it was not evaluated.
The results identified no single outright winner for clone detection during preventative maintenance. Each tool had some factors that may ultimately prove useful to the maintainer.
The POST-REDIRECT-GET idiom, or PRG pattern (Jouravlev 2004), offers a decent solution to this problem. The idiom suggests that a web application should internally issue a REDIRECT response to the initial POST request. This redirect then causes the browser to issue a subsequent GET request.
For Extract Method refactoring, [http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D4812745%26isnumber%3D4812721&authDecision=-203 Tsantalis] evaluated the results and indicated that the proposed methodology is able to identify slice-extraction refactorings which decompose complex methods, create new methods with useful functionality and preserve the behavior of the code. However, there is a clear need to extend the evaluation to more systems from different domains in order to further improve the effectiveness of the methodology.
So far, Extract Method refactoring using Eclipse has been a popular choice among the Java-based development community.
A web browser that receives a redirect response adds only the redirected GET request to its browsing history, instead of the original POST request. This elegantly solves the double submit problem, because the redirected request, being a GET request, can safely be refreshed multiple times.
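A minimal servlet sketch of the idiom is shown below; the servlet, its URL mapping and the processPayment() helper are hypothetical stand-ins for whatever framework the application actually uses.
<pre>
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// POST-REDIRECT-GET sketch: the POST performs the transaction exactly once, then redirects,
// so refreshing or navigating back only repeats the idempotent GET of the confirmation page.
public class CheckoutServlet extends HttpServlet {

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String orderId = processPayment(request);   // perform the transaction once
        // redirect instead of rendering the result directly from the POST
        response.sendRedirect(request.getContextPath() + "/checkout?orderId=" + orderId);
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // safe to refresh: rendering the confirmation does not change server state
        response.setContentType("text/html");
        response.getWriter().println("Order " + request.getParameter("orderId") + " confirmed.");
    }

    private String processPayment(HttpServletRequest request) {
        return "12345";   // placeholder for the real payment/order logic
    }
}
</pre>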
=<font color="windowtext">References</font>=
Also, since the navigational history in the browser will now contain only idempotent GET requests, the user can use the back and forward buttons or bookmark pages without disrupting the control flow.
Synchronizer token pattern which is explained below can also be applied to this problem as a server side solution.
[1] [http://ieeexplore.ieee.org/search/freesrchabstract.jsp?arnumber=1134103&isnumber=25179&punumber=8211&k2dockey=1134103@ieeecnfs&query=1134103%3Cin%3Earnumber&pos=0 Evaluating clone detection tools for use during preventative maintenance]
[2] [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=624265&isnumber=13575&punumber=4921&k2dockey=624265@ieeecnfs&query=((investigating+the+maintenance+implications+of+the+replication+of+code)%3Cin%3Emetadata)&pos=0&access=no Investigating the maintenance implications of the replication of code]
The Synchronizer Token pattern is a well-known design pattern (Alur 2003) that is normally used in web application development to avoid duplicate form submissions.
The Synchronizer Token pattern works as follows: When a conversation starts, the application generates a unique token that is stored in the HTTP session. The token is also embedded in each page generated during the conversation, and every request that is part of the conversation is required to include the token value. When a request arrives on the server, it compares the value of the token in the HTTP session with the value included in the request. If the two values match, the request is allowed to continue processing. If they don’t match, an error is generated informing the user that the conversation has ended or was invalidated. The token is removed from the HTTP session when the conversation ends or the session expires.
Using POST-REDIRECT-GET avoids accidental double submits of a single request but does not help prevent a user from completing the same business process twice. Such a business process is typically composed of multiple pages spanning several requests. Synchronizer token pattern adds additional safety on top of the POST-REDIRECT-GET idiom by preventing a possibly intentional resubmit of a page. Both the techniques should typically be combined to deliver a complete solution.
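Before looking at framework support, the following framework-agnostic sketch shows the handshake described above in a plain servlet; the servlet and the session attribute name are hypothetical, and real applications would normally rely on the framework facilities discussed in the next section.
<pre>
import java.io.IOException;
import java.math.BigInteger;
import java.security.SecureRandom;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Hand-rolled Synchronizer Token: GET stores a fresh token in the session and embeds it
// in the form; POST only processes the request when the submitted token matches.
public class TransferServlet extends HttpServlet {
    private static final String TOKEN_KEY = "SYNCHRONIZER_TOKEN";
    private static final SecureRandom RANDOM = new SecureRandom();

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String token = new BigInteger(130, RANDOM).toString(32);
        request.getSession().setAttribute(TOKEN_KEY, token);
        response.getWriter().println(
            "<form method='post'><input type='hidden' name='token' value='" + token + "'/>"
          + "<input type='submit'/></form>");
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession();
        String expected = (String) session.getAttribute(TOKEN_KEY);
        if (expected != null && expected.equals(request.getParameter("token"))) {
            session.removeAttribute(TOKEN_KEY);   // reset so resubmitting the same form fails
            // ... perform the transaction exactly once ...
            response.getWriter().println("Transaction processed.");
        } else {
            response.sendError(HttpServletResponse.SC_CONFLICT, "Duplicate or invalid submission");
        }
    }
}
</pre>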
=<font color="windowtext">Implementations related to Web Control Flow</font>=
The basic idea of the Synchronizer Token pattern is to set a token in a session variable before returning a transactional page to the client. This page carries the token inside a hidden field. Upon submission, request processing first tests for the presence of a valid token in the request parameters by comparing it with the one registered in the session. If the token is valid, processing can continue normally; otherwise an alternate course of action is taken. After testing, the token is reset to null to prevent subsequent submissions until a new token is saved in the session, which must be done at the appropriate time based on the desired application flow of control. Many web-based frameworks provide built-in support for this. However, some frameworks require serious developer attention, whereas others provide configurable automatic support. This section describes how Struts and Spring Web Flow provide web control flow.
[http://struts.apache.org/ Apache Struts] provides built in mechanism for handling tokens in [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html org.apache.struts.action.Action] class using the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] and [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] methods. [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] method creates a token (a unique string) and saves that in the user's current session, while [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] checks if the token stored in the user's current session is the same as that was passed as the request parameter.
To do this the JSP has to be loaded through an [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action]. Before loading the [http://java.sun.com/products/jsp/ JSP] call [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] to save the token in the user session. When the form is submitted, check the token against that in the session by calling [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()], as shown in the following code snippet:
<pre>
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;
import org.apache.struts.actions.DispatchAction;

public class PurchaseOrderAction extends DispatchAction {

    public ActionForward load(ActionMapping mapping,
                              ActionForm form,
                              HttpServletRequest request,
                              HttpServletResponse response) throws Exception {
        // save the token in the user's session before rendering the form
        saveToken(request);
        // ... rest of the code for loading the form ...
        return mapping.findForward("success");   // forward names depend on the struts-config mapping
    }

    public ActionForward submitOrder(ActionMapping mapping,
                                     ActionForm form,
                                     HttpServletRequest request,
                                     HttpServletResponse response) throws Exception {
        // check the token; proceed only if it is valid (passing true also resets the token)
        if (isTokenValid(request, true)) {
            // ... implement the order submit functionality here ...
            return mapping.findForward("success");
        }
        // duplicate or out-of-order submission
        return mapping.findForward("failure");
    }
}
</pre>
This is actually what happens behind the scenes in the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action] class. [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] works as follows:
The method generates a random token using session id, current time and a [http://java.sun.com/j2se/1.4.2/docs/api/java/security/MessageDigest.html MessageDigest] and stores it in the session using a key name [http://struts.apache.org/1.1/api/constant-values.html org.apache.struts.action.TOKEN] (This is the value of the static variable [http://struts.apache.org/1.1/api/org/apache/struts/Globals.html#TRANSACTION_TOKEN_KEY TRANSACTION_TOKEN_KEY] in [http://struts.apache.org/1.1/api/org/apache/struts/Globals.html org.apache.struts.Globals] class).
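The sketch below approximates that logic; it is not the actual Struts source, but it shows the session id and the current time being hashed with a MessageDigest and the result being stored under the "org.apache.struts.action.TOKEN" session key.
<pre>
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

// Approximation of what saveToken() does internally (not the actual Struts source).
public class TokenSketch {
    private static final String TRANSACTION_TOKEN_KEY = "org.apache.struts.action.TOKEN";

    public static void saveToken(HttpServletRequest request) {
        HttpSession session = request.getSession();
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(session.getId().getBytes());
            md.update(String.valueOf(System.currentTimeMillis()).getBytes());
            StringBuilder token = new StringBuilder();
            for (byte b : md.digest()) {
                token.append(String.format("%02x", b & 0xff));
            }
            session.setAttribute(TRANSACTION_TOKEN_KEY, token.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
</pre>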
The [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action] class that renders the form (PurchaseOrderAction.load) invokes the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] method to create a session attribute with the above name. In the JSP, the token then needs to be rendered as a hidden form field using the [http://struts.apache.org/1.2.x/userGuide/struts-bean.html#write <bean:write>] tag. That tag looks for a bean named [http://struts.apache.org/1.1/api/constant-values.html org.apache.struts.action.TOKEN] (which is the value of [http://struts.apache.org/1.1/api/org/apache/struts/Globals.html#TRANSACTION_TOKEN_KEY Globals.TRANSACTION_TOKEN_KEY]) in session scope and renders its value as the value attribute of the hidden input field. The name of the hidden input field is [http://struts.apache.org/1.x/struts-taglib/apidocs/constant-values.html org.apache.struts.taglib.html.TOKEN] (the value of the static variable [http://struts.apache.org/1.x/struts-taglib/apidocs/org/apache/struts/taglib/html/Constants.html#TOKEN_KEY TOKEN_KEY] in the class [http://struts.apache.org/1.x/struts-taglib/apidocs/org/apache/struts/taglib/html/Constants.html org.apache.struts.taglib.html.Constants]).
When the client submits the form, the hidden field is also submitted. In the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action] that handles the form submission, i.e. PurchaseOrderAction.submitOrder (which most likely is different from the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action] that rendered the form), the token in the form submission is compared with the token in the session by using the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] method. The method compares the two tokens and returns true if both are the same. Be sure to pass reset=”true” to the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] method to clear the token from the session after comparison. If the two tokens are equal, the form was submitted for the first time. However, if the two tokens do not match, or if there is no token in the session, then it is a duplicate submission, which should be handled in a manner acceptable to your users.
If the form spans multiple pages, then the form is submitted every time the user goes from one page to the next. You definitely want to validate the token on every page submission. However, you also want to allow the user to traverse back and forth using the browser Back button until the point of final submission. If the token is reset on every page submission, the possibility of back-and-forth traversal using the browser button is ruled out. The solution is not to disable the Back button (using [http://en.wikipedia.org/wiki/JavaScript JavaScript] hacks) but to handle the token intelligently. This is where the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) reset parameter] is useful. The token is initially set before showing the first page of the form. The [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) reset parameter] is false for all [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest, boolean) isTokenValid()] invocations except in the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action] for the last page. The last page uses a true value for the reset argument, and hence the token is reset in the [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] method. From that point onward you cannot use the Back button to traverse to the earlier form pages and successfully submit the form, as sketched below.
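The sketch below illustrates that usage; the action, its dispatch method names and the forward names are hypothetical, while saveToken() and isTokenValid() are the Struts methods discussed above.
<pre>
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;
import org.apache.struts.actions.DispatchAction;

// Token handling for a form that spans several pages: intermediate pages validate the token
// without resetting it, so the Back button still works; only the final submission resets it.
public class WizardAction extends DispatchAction {

    public ActionForward start(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        saveToken(request);                        // set the token once, before page 1
        return mapping.findForward("page1");
    }

    public ActionForward nextPage(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        if (!isTokenValid(request, false)) {       // validate but keep the token
            return mapping.findForward("failure");
        }
        return mapping.findForward("page2");
    }

    public ActionForward finish(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        if (!isTokenValid(request, true)) {        // validate and reset: no resubmission after this
            return mapping.findForward("failure");
        }
        // ... persist the accumulated data exactly once ...
        return mapping.findForward("success");
    }
}
</pre>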
<u>Pros and Cons</u> - Token generation and checking support is built in by struts. Although the above approach is good, it requires application developer to add the token checking method pair – [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#saveToken(javax.servlet.http.HttpServletRequest) saveToken()] and [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html#isTokenValid(javax.servlet.http.HttpServletRequest) isTokenValid()] in methods rendering and submitting the sensitive forms respectively. Since the two tasks are generally performed by two different [http://struts.apache.org/1.x/apidocs/org/apache/struts/action/Action.html Action]s, the pairs need to be identified and added manually.
However, there is another problem with this implementation. The [http://java.sun.com/products/servlet/2.2/javadoc/javax/servlet/http/HttpSession.html HTTPSession] can be viewed as a [http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html HashMap] that holds key-value pairs; no duplicate keys are allowed. Struts uses a fixed key ([http://struts.apache.org/1.1/api/org/apache/struts/Globals.html#TRANSACTION_TOKEN_KEY Globals.TRANSACTION_TOKEN_KEY]) to store the token in the session, which means that at any time only one token can be stored in the session. This prohibits users from opening new windows and having multiple conversations at the same time: as soon as a new window is loaded, a new token is generated, overwriting the previous window's token in the session. So if the first form is submitted, the server cannot process it, because the token in the session and the token in the request no longer match. This is a serious limitation. In order to support multiple windows with Struts, developers must modify the Struts implementation: instead of keeping the token directly in the session, it would be stored in a separate [http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html HashMap] with the window id as key and the token as value, and this [http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html HashMap] would be stored in the session under the key [http://struts.apache.org/1.1/api/org/apache/struts/Globals.html#TRANSACTION_TOKEN_KEY Globals.TRANSACTION_TOKEN_KEY]. All the places where the session is accessed to retrieve the token must be modified so that the token is read from this inner map instead of directly from the session. This is not only tedious but extremely problematic, because you can then no longer simply upgrade to newer versions of Struts: the same modifications are required before adopting each new version. A sketch of such a workaround is given below.
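The sketch is not part of Struts, and the window_id request parameter (which each browser window would have to carry) is purely an assumption made for illustration.
<pre>
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

// Multi-window workaround sketch: tokens live in a per-window map stored in the session,
// keyed by a hypothetical "window_id" request parameter, so each window has its own token.
public class WindowScopedTokens {
    private static final String TOKEN_MAP_KEY = "org.apache.struts.action.TOKEN"; // reused session key

    @SuppressWarnings("unchecked")
    private static Map<String, String> tokenMap(HttpSession session) {
        Map<String, String> map = (Map<String, String>) session.getAttribute(TOKEN_MAP_KEY);
        if (map == null) {
            map = new HashMap<>();
            session.setAttribute(TOKEN_MAP_KEY, map);
        }
        return map;
    }

    public static void saveToken(HttpServletRequest request, String token) {
        tokenMap(request.getSession()).put(request.getParameter("window_id"), token);
    }

    public static boolean isTokenValid(HttpServletRequest request, boolean reset) {
        Map<String, String> map = tokenMap(request.getSession());
        String windowId = request.getParameter("window_id");
        String expected = map.get(windowId);
        boolean valid = expected != null && expected.equals(request.getParameter("token"));
        if (valid && reset) {
            map.remove(windowId);   // reset only this window's token
        }
        return valid;
    }
}
</pre>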
[3] [http://scg.unibe.ch/archive/papers/Duca99bCodeDuplication.pdf A Language Independent Approach for Detecting Duplicated Code]
The next framework provides automatic support for generating and checking the token, so developers do not have to worry about forming these pairs. It also supports multiple windows.
[4] [http://pages.cs.wisc.edu/~raghavan/sas01.pdf Using Slicing to Identify Duplication in Source Code]
==<font color="windowtext">Spring Web Flow</font>==
[5] [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=565012&isnumber=12288&punumber=4204&k2dockey=565012@ieeecnfs&query=((automatic+detection+of+function+clones+in+a+software+system+using+metrics)%3Cin%3Emetadata)&pos=0&access=no Experiment on the automatic detection of function clones in a software system using metrics]
Spring Web Flow was designed with the intention to help developers implement complex conversations within web applications. Spring Web Flow acts as a controller component in the MVC triad and integrates into hosting web MVC frameworks. It serves as an application controller and handles the screen navigation by coordinating the flow of a business process.
Spring Web Flow captures business processes or conversations in modules called '''''flows'''''. A '''''flow''''' is a blueprint for the interaction a user can have with a web application; it reacts to user events to drive the process to completion. You can look at a flow as a simple manifestation of a finite state machine (FSM), consisting of a number of states that define the activities to execute while progressing through the flow. A state can allow a user to participate in the flow, or it can call business services. The flow can move from one state to another using transitions triggered by events. As a common practice, business processes are defined using UML state diagrams, and Spring Web Flow flow definitions use a similar model. The following screenshot (Figure 4) shows an example Spring Web Flow flow definition that mirrors such a process definition:
[6] [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=738528&isnumber=15947&punumber=5960&k2dockey=738528@ieeecnfs&query=((clone+detection+using+abstract+syntax+trees)%3Cin%3Emetadata)&pos=0&access=no Clone detection using abstract syntax trees]
[[Image:spring_web_flow_process_definition.JPG|frame|center|alt=Spring Web Flow flow definition|Figure 4 ''[[Spring Web Flow flow definition]]''.]]
Given the navigational rules set out in a flow definition, Spring Web Flow automatically takes care of navigational control. Using web continuations, Spring Web Flow can guarantee stable, predictable behavior of a web application even when the user uses the browser’s Back, Forward, or Refresh buttons; revisits bookmarked pages; or opens multiple windows in the same conversation. The '''''POST-REDIRECT-GET''''' idiom will also be automatically applied, without any need for developer intervention.
Typically, a web flow will define a process spanning multiple requests into the web application. While completing the process, the user interacts with the application through several different pages, accumulating data along the way. It is the responsibility of a '''''flow execution repository''''' to maintain all data associated with a flow execution in between separate requests participating in that flow execution.
[7] [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=919197&isnumber=19875&punumber=7340&k2dockey=919197@ieeecnfs&query=((maintenance+support+tools+for+java+programs:+ccfinder+and+jaat)%3Cin%3Emetadata)&pos=0&access=no Maintenance Support Tools for JAVA Programs: CCFinder and JAAT]
[[Image:flow_execution_repository.JPG|frame|left|alt= FlowExecutionRepository |Figure 5 ''[[FlowExecutionRepository Interface]]''.]] Spring Web Flow provides multiple repository implementations in which the control flow behavior is predefined. Depending on the requirements, the developer can configure a particular repository implementation. All of these implementations implement the [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/execution/repository/FlowExecutionRepository.html FlowExecutionRepository] interface.
[8] [http://www.jucs.org/jucs_8_11/finding_plagiarisms_among_a/Prechelt_L.pdf Finding Plagiarisms among a Set of Programs with JPlag]
<br/>
* Line 5: To adequately manage flow executions, a flow execution repository needs to assign a unique key to each flow execution. Using [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/execution/repository/FlowExecutionRepository.html#generateKey(org.springframework.webflow.execution.FlowExecution) generateKey(flowExecution)], a new [http://static.springsource.org/spring-webflow/docs/2.0.x/javadoc-api/org/springframework/webflow/execution/FlowExecutionKey.html FlowExecutionKey] will be generated for a freshly launched flow execution. When an existing flow execution, with a key already assigned, needs to be persisted again, getNextKey(flowExecution, previousKey) generates the next key to use. This implies that a new key can be obtained every time a flow execution needs to be stored in a repository, allowing the repository to potentially change the key every time. [http://static.springsource.org/spring-webflow/docs/2.0.x/javadoc-api/org/springframework/webflow/execution/FlowExecutionKey.html FlowExecutionKey] objects can be marshaled into a string form using their toString() method. This string form will be embedded in HTML pages and later travels back to the server using the _flowExecutionKey request parameter. Using [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/execution/repository/FlowExecutionRepository.html#parseFlowExecutionKey(java.lang.String) parseFlowExecutionKey(encodedKey)], you can unmarshal a string form of a flow execution key back into its object form.
<br/>
* Line 12: To make sure all access to a flow execution object occurs in an orderly fashion, a flow execution repository provides a [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/executor/jsf/FlowExecutionHolder.html FlowExecutionLock]. A flow execution needs to be locked before it is manipulated and unlocked afterward to ensure that all processing done by a flow execution is serialized: the next request is only processed when the previous one completed processing.
<br/>
* Line 15: Finally, the [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/execution/repository/FlowExecutionRepository.html FlowExecutionRepository] interface provides methods to store FlowExecution objects in the repository (putFlowExecution(key, flowExecution)), obtain them from the repository (getFlowExecution(key)), or remove them from the repository (removeFlowExecution(key)). Before using any of these methods, the flow execution needs to be locked in the repository. A sketch of the full interface is given below.
<br/><br/>
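For reference, the contract described in the bullets above can be sketched roughly as follows. The collaborating types are the Spring Web Flow types linked above, the grouping mirrors the line numbers of the listing in Figure 5, and the exact package locations and signatures may differ between Spring Web Flow versions.
<pre>
import org.springframework.webflow.execution.FlowExecution;
import org.springframework.webflow.execution.FlowExecutionKey;
import org.springframework.webflow.execution.repository.FlowExecutionLock;

// Rough sketch of the FlowExecutionRepository contract described in the text above.
public interface FlowExecutionRepositorySketch {

    // key management (around line 5 of the listing in Figure 5)
    FlowExecutionKey generateKey(FlowExecution flowExecution);
    FlowExecutionKey getNextKey(FlowExecution flowExecution, FlowExecutionKey previousKey);
    FlowExecutionKey parseFlowExecutionKey(String encodedKey);

    // serialized access (line 12)
    FlowExecutionLock getLock(FlowExecutionKey key);

    // storage operations (line 15)
    FlowExecution getFlowExecution(FlowExecutionKey key);
    void putFlowExecution(FlowExecutionKey key, FlowExecution flowExecution);
    void removeFlowExecution(FlowExecutionKey key);
}
</pre>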
Figure 6 shows all the built-in repository implementations provided by Spring Web Flow. An appropriate implementation can be chosen by setting the repository type on the flow executor configuration.
[9] [http://theory.stanford.edu/~aiken/moss/ A System for Detecting Software Plagiarism]
With that setting, the simple repository will be used. When using classic Spring bean definitions instead of the Spring Web Flow configuration schema, set the repositoryType property of the flow executor to the value SIMPLE.
[10] [http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D4812745%26isnumber%3D4812721&authDecision=-203 Identification of Extract Method Refactoring Opportunities]
The [http://static.springsource.org/spring-webflow/docs/1.0.5/api/org/springframework/webflow/execution/repository/support/SimpleFlowExecutionRepository.html SimpleFlowExecutionRepository] guards access to the flow execution by changing the flow execution key for every request, thereby imitating the use of a synchronizer token. The method getNextKey(flowExecution, previousKey) always returns a new key that is different from the previous key. A user can no longer access the flow execution using browser navigation, since such a request is considered stale; use of the Back button to access history is completely disabled, and the user may get an error. The same situation occurs when the conversation ends: Spring Web Flow cleans up the conversation state and prevents the user from resuming the terminated conversation or causing duplicate submits.
<br/><u>Pros and Cons</u> - The single most important benefit of using the simple repository is that it maintains only a single copy of the FlowExecution data. This results in very low memory requirements, making this repository type ideal for environments where memory resources are scarce or load requirements are very high.
A downside is that it does not support use of the browser Back button or navigation history and generates an exception if Back button usage is detected. This kind of strict no-Back-button policy enforcement is typically not suited for Internet facing applications. It can be ideal, however, in intranet settings, where you might be able to deploy a custom browser that can completely disable the browser navigational aids.
The single key repository is a variation on the simple repository, also implemented by [http://static.springsource.org/spring-webflow/docs/1.0.5/api/org/springframework/webflow/execution/repository/support/SimpleFlowExecutionRepository.html SimpleFlowExecutionRepository] but configured to keep the flow execution key constant for the entire flow execution. Because the key does not change on every request, resubmitting a page no longer produces a stale-key error; however, since only a single snapshot of the flow execution is maintained, the Back button still cannot be used to step back through earlier states of the flow.
When the single key repository is used in combination with “always redirect on pause”, all pages of the flow will appear to have been served from exactly the same URL. This is caused by the redirect Spring Web Flow issues before rendering a view, which triggers a flow execution refresh. Since the flow execution key remains constant for the entire flow execution when using the single key repository, every page of the flow is served from the same URL, the one carrying the constant flow execution key.
The browser will not notice that you are actually navigating from one page to another, because every page of the flow has exactly the same URL. The end result is that the browser does not build up a navigation history, making the Back button useless. If you click the Back button, you end up on the last page before the flow execution started!
Just like all other repository implementations, the single key repository removes a flow execution when it ends. This prevents a user from jumping back into a terminated conversation or causing double submits.
<u>Pros and Cons </u> - Like the simple repository, the single key repository has an important benefit of using very little memory (since it only stores a single flow execution snapshot). This makes it ideal for high-load or low-memory environments.
Although the simple repository completely disallows use of the Back button, the single key repository tricks the browser into not accumulating any browsing history inside the flow execution. This makes going back in the flow using the Back button impossible. This compromise, unlike the strict rules enforced by the simple repository, is typically acceptable for Internet applications.
[[Image:continuation_repository.JPG|frame|right|alt= ContinuationFlowExecutionRepository |Figure 8 ''[[Continuation Repository]]''.]] The continuation repository is a powerful execution repository provided by Spring Web Flow, implemented by the [http://www.jarvana.com/jarvana/view/org/springframework/spring-webflow/1.0.5/spring-webflow-1.0.5-javadoc.jar!/org/springframework/webflow/execution/repository/continuation/ContinuationFlowExecutionRepository.html ContinuationFlowExecutionRepository] class. As the name suggests, the continuation repository manages flow executions using a web-continuations algorithm, which is an elegant way to deal with complex navigation in web applications. Web continuations allow a web application to respond correctly if the user uses the browser Back button or navigation history, or even opens multiple windows on a single conversation, a capability for which Struts provides no built-in mechanism and instead requires the modifications described earlier. The continuation repository takes a snapshot of the [http://static.springsource.org/spring-webflow/docs/pr5/api/org/springframework/webflow/execution/FlowExecution.html FlowExecution] object at the end of every request that comes into the flow execution; in other words, the “game” is saved at the end of every request. Each continuation snapshot has a unique ID, which is part of the flow execution key. When a request comes in, Spring Web Flow restores the flow execution from the identified continuation snapshot and continues processing. Figure 8 shows this graphically.
* The first request that comes into the flow executor does not contain a flow execution key and hence the flow executor launches a new flow execution for this simple three-state flow. The flow progresses from the first state to a view state which is the second state and then pauses. The flow execution repository takes a snapshot of the FlowExecution and a unique key is assigned. The control is then returned to the Browser. This key is embedded in the rendered view to make sure a next request can submit it again. At the end of the first request, the continuation repository contains a flow execution continuation snapshot, snapshot1, indexed by key1.
* The second request coming into the flow contains key1 as the flow execution key; to produce the response to this request, the flow executor restores the flow execution from the identified continuation, snapshot1. The flow resumes processing and pauses after moving to the third state. At this point, the flow execution repository takes another snapshot of the FlowExecution and assigns it a new unique key, which is embedded in the rendered view. The second request therefore causes a second continuation snapshot, snapshot2, to be stored in the repository, indexed using key2. At this point the first snapshot is still present, which allows the user to click the Back button and jump back to the previous request to continue from that point onward. Opening a new browser window on the same conversation would allow the user to continue from the current snapshot (snapshot2) independently in each window.
* The third request continues from the continuation identified using key2. In this case, the flow resumes processing and terminates by reaching an end state. As a consequence, the flow execution, and all its continuation snapshots, will be removed from the repository. This prevents double submits, even when using web continuations: if the user clicks the Back button to go back to request two, an error will be produced because the identified continuation snapshot (snapshot2) is no longer available. It was cleaned up along with all other snapshots when the third request terminated the overall conversation. To be able to associate continuation snapshots with the governing logical conversation, Spring Web Flow needs to track both a continuation snapshot ID and the unique ID of the overall conversation. Both of these IDs are embedded in the flow execution key, which consists of two parts:
* The conversation ID is prefixed with _c and is followed by the continuation ID prefixed with _k. The conversation ID always remains constant throughout a conversation, while the continuation ID changes on every request; a minimal sketch of this bookkeeping is given below.
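The toy class below mimics that bookkeeping; it is not the Spring Web Flow implementation, but it shows the essential moves: every saved snapshot gets a fresh _c..._k... key, restoring an unknown key fails, and ending a conversation discards every snapshot belonging to it.
<pre>
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of continuation bookkeeping (not the actual Spring Web Flow implementation).
public class ContinuationRepositorySketch {
    private final Map<String, byte[]> snapshots = new LinkedHashMap<>();
    private final AtomicLong conversationIds = new AtomicLong();
    private final AtomicLong continuationIds = new AtomicLong();

    // Store a snapshot of the serialized flow state; pass null to start a new conversation.
    public String saveSnapshot(Long conversationId, byte[] serializedFlowState) {
        long c = (conversationId != null) ? conversationId : conversationIds.incrementAndGet();
        String key = "_c" + c + "_k" + continuationIds.incrementAndGet();
        snapshots.put(key, serializedFlowState);
        return key;                                  // embedded in the rendered view
    }

    // Resuming from a removed or never-issued key fails, exactly like a stale Back-button request.
    public byte[] restoreSnapshot(String key) {
        byte[] state = snapshots.get(key);
        if (state == null) {
            throw new IllegalStateException("Snapshot " + key + " no longer exists");
        }
        return state;
    }

    // Ending the conversation removes every snapshot of it, preventing double submits via Back.
    public void endConversation(String anyKeyOfConversation) {
        String conversationPrefix = anyKeyOfConversation.substring(0, anyKeyOfConversation.indexOf("_k"));
        snapshots.keySet().removeIf(key -> key.startsWith(conversationPrefix + "_k"));
    }
}
</pre>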
The continuation repository is the default repository in Spring Web Flow. You can explicitly configure a flow executor to use the continuation repository by specifying continuation as the repository type (or CONTINUATION when using the [http://static.springsource.org/spring-webflow/docs/1.0.5/api/org/springframework/webflow/config/FlowExecutorFactoryBean.html FlowExecutorFactoryBean]). The continuation repository has one additional property that can be configured: the maximum number of continuation snapshots allowed per conversation (the maxContinuations property). Using a first-in-first-out algorithm, the oldest snapshot is thrown away when a new one needs to be taken and the maximum has been reached.
By default, a maximum of 30 continuation snapshots per conversation is maintained. In practice this is equivalent to an unlimited number of snapshots, since it allows a user to backtrack 30 steps in the browsing history, more than any normal user would ever do. Constraining the number of continuation snapshots is important to prevent an attacker from mounting a [http://en.wikipedia.org/wiki/Denial-of-service_attack denial of service] attack by generating a large number of snapshots.
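The first-in-first-out cap itself is a simple idea; a generic illustration (not Spring Web Flow code) is a bounded map that silently evicts its oldest entry once the limit is reached.
<pre>
import java.util.LinkedHashMap;
import java.util.Map;

// Generic FIFO cap: once maxSnapshots entries are stored, adding a new one evicts the oldest.
public class BoundedSnapshotMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxSnapshots;

    public BoundedSnapshotMap(int maxSnapshots) {
        super(16, 0.75f, false);   // insertion order, so the eldest entry is the oldest snapshot
        this.maxSnapshots = maxSnapshots;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSnapshots;
    }
}
</pre>
A repository along these lines could then keep its snapshots in a BoundedSnapshotMap capped at 30 entries, mirroring the default above.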
<u>Pros and Cons</u> - The continuation repository allows you to have completely controlled navigation while still allowing the user to use all of the browser navigational aids. This promise is very compelling indeed, and it is the reason why this is the default repository used by Spring Web Flow.
The most important downside of the continuation repository is the increased memory usage caused by the multiple continuation snapshots that are potentially maintained for each flow execution. By appropriately configuring the maxContinuations property, you can control this, however, making the continuation repository ideal for most web applications.
The last type of flow execution repository provided by Spring Web Flow is the client continuation repository, implemented by the ClientContinuationFlowExecutionRepository class. As the name suggests, the client continuation repository also uses a web-continuations-based approach to flow execution management, similar to the default continuation repository. The difference between the two is where they store the continuation snapshots. The default continuation repository stores continuation snapshots on the server side, in the HTTP session (using the SessionBindingConversationManager). The client continuation repository stores the continuation snapshots on the client side, avoiding the use of any server-side state. To make this possible, the client continuation repository encodes the entire continuation snapshot inside the flow execution key, which is embedded as a hidden field in the HTML form.
The continuation ID is a base-64–encoded, GZIP-compressed form of the serialized FlowExecution object.
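That encoding can be sketched as follows; this is not the Spring Web Flow implementation, and java.util.Base64 (Java 8+) is used here purely for brevity.
<pre>
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Serialize, GZIP-compress and Base64-encode the flow state so it can travel to the client
// inside the flow execution key (a sketch of the idea, not the Spring Web Flow code).
public class ClientContinuationEncoder {

    public static String encode(Serializable flowState) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(flowState);
        }
        // URL-safe encoding so the key can also be sent as a request parameter
        return Base64.getUrlEncoder().encodeToString(bytes.toByteArray());
    }
}
</pre>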
<u>Pros and Cons</u> - The client continuation repository is a remarkable specimen. It has the major advantage of not requiring any server-side state. This has important benefits in terms of scalability, failover, and application and server management. Additionally, since this repository uses web continuations, it has complete support for use of the browser’s Back button. There is a high cost to pay however:
* By default, the client continuation repository does not use a real conversation manager. As a consequence, it cannot properly prevent double submits and does not support the conversation scope. These issues can be resolved, however, by plugging in a real ConversationManager.
* Because of the long flow execution key, applications are limited to using POST requests. This also rules out applying the POST-REDIRECT-GET idiom using “always redirect on pause”.
* Exchanging a relatively large flow execution key between the client and server on every request consumes quite a bit of bandwidth, potentially making the application slower for users on slow connections.
* Storing state on the client has security implications. An attacker could try to reverse engineer the continuation snapshot, extracting sensitive data or manipulating certain data structures. The ClientContinuationFlowExecutionRepository was designed for extensibility, however, allowing you to plug in continuation snapshot encryption. The client continuation repository provided by Spring Web Flow is an interesting experiment. Future versions of Spring Web Flow will certainly investigate this idea in more depth, for instance, in trying to manage conversations using state stored in client-side cookies. For now, make sure you understand the consequences of using the client continuation repository before deciding to use it.
===<font color="windowtext">Selecting a Repository</font>===
A comparison matrix of the repository types makes it easy to pick the correct repository, depending on whether or not you need to support browser Back button usage and on the memory constraints of the deployment environment.
In general, though, most applications are best served by the continuation repository. Since this is the default repository in Spring Web Flow, you don’t need to configure anything; only select a different repository type if you have real requirements or constraints that force you away from the default. So, unlike Struts, the complete token management and checking is done automatically, without the developer having to worry about pairing issues.
==<font color="windowtext">Ruby on Rails</font>==
[http://rubyonrails.org/ Rails] introduced the concept of form authenticity tokens in Rails 2.0. These tokens are designed to block naive attempts to call Rails controller methods from outside of views rendered by Rails. Such attacks are called [http://en.wikipedia.org/wiki/Cross-site_request_forgery Cross-site request forgery (CSRF)]. Form authenticity tokens are one-time hashcodes that are generated as a hidden parameter for any form that is rendered by Rails. When the form is submitted, the hashcode is passed as a hidden parameter to the Rails controller, and Rails validates this hashcode to ensure that the form submission came from a view generated by Rails. This provides a measure of security against naive attempts to submit the form from other clients, since they will not have the proper hashcode needed to pass the Rails authenticity filter for the form submission. Only HTML/JavaScript requests are checked, so this will not protect the application's XML APIs. Also, GET requests are not protected, as these should be idempotent anyway.
When a form is rendered using [http://api.rubyonrails.org/classes/ActionView/Helpers/FormTagHelper.html form_tag] or [http://api.rubyonrails.org/classes/ActionView/Helpers/FormHelper.html form_for], the hidden authenticity_token is placed right after the <form> tag automatically.
This feature is turned on with the [http://api.rubyonrails.org/classes/ActionController/RequestForgeryProtection/ClassMethods.html#M000514 protect_from_forgery] method, which will check the token and raise an ActionController::InvalidAuthenticityToken error if it doesn’t match what was expected. You can customize the error message in production by editing public/422.html. A call to this method in ApplicationController is generated by default in post-Rails 2.0 applications. The token parameter is named authenticity_token by default. If you are generating an HTML form manually (without the use of Rails’ form_for, form_tag or other helpers), you have to include a hidden field with that name and set its value to what is returned by form_authenticity_token. The same applies to manually constructed Ajax requests. To make the token available to scripts on a certain page, you can expose the value returned by form_authenticity_token through a global JavaScript variable in the view.
Forgery protection can also be turned off for the whole application via the allow_forgery_protection configuration setting. If you want to turn it off only for a single controller:
<pre>
class FooController < ApplicationController
  # verify the authenticity token for every action except :index
  protect_from_forgery :except => :index

  # or disable CSRF protection for this controller entirely:
  # skip_before_filter :verify_authenticity_token
end
</pre>
Valid options for protect_from_forgery:
:only / :except - Passed to the before_filter call; sets which actions are verified.
As you can see, Rails 2.0 does not yet have an approach to handling web control flow as sophisticated as that of Spring Web Flow. We expect Rails to mature over the next few years and add features like those provided by Spring Web Flow.
=<font color="windowtext">References</font>=
<font face=""Times New Roman","serif""><font size="12.0pt"> </font></font>
[1] [http://ieeexplore.ieee.org/search/freesrchabstract.jsp?arnumber=1134103&isnumber=25179&punumber=8211&k2dockey=1134103@ieeecnfs&query=1134103%3Cin%3Earnumber&pos=0 Evaluating clone detection tools for use during preventative maintenance]
[2] [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/search/srchabstract.jsp?arnumber=624265&isnumber=13575&punumber=4921&k2dockey=624265@ieeecnfs&query=((investigating+the+maintenance+implications+of+the+replication+of+code)%3Cin%3Emetadata)&pos=0&access=no Investigating the maintenance implications of the replication of code]
There are a good number of clone detection tools available, both commercially and within academia. Several different approaches to software clone detection have been implemented in these tools, including string analysis, program slicing, metric analysis and abstract syntax tree comparison. This page surveys a set of clone detection tools and compares them.
=<font color="windowtext">Clone Detection Techniques</font>=
This article focuses primarily on five established detection tools: JPlag, MOSS, Covet, CCFinder and CloneDr. The Extract Method refactoring offered by the Eclipse IDE is also visited briefly. JPlag and MOSS are web-based academic tools for detecting plagiarism in students' source code; CloneDr and CCFinder are stand-alone tools aimed at code duplication in general.
Figure 1 summarizes the clone detection tools. The languages supported by each tool's analysis process are highlighted, as is the analysis approach. The column labeled ''domain'' indicates whether the main purpose of the tool is clone detection or plagiarism detection.
==CCFinder==
CCFinder focuses on analyzing large-scale systems with a limited amount of language dependence. It transforms the source code into a token sequence and aims to identify "portions of interest (but syntactically not exactly identical structures)". After the code is tokenised, a token-by-token matching algorithm is performed. CCFinder also provides a dot-plotting visualisation tool that allows visual recognition of matches within large amounts of code.
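To make the token-based idea concrete, below is a minimal, self-contained sketch (it is not CCFinder's actual transformation rules or matching algorithm, which uses a much richer set of normalisations and suffix-tree based matching): identifiers and literals are normalised to a single placeholder token, and common token runs above a length threshold are reported as clone candidates.
<pre>
import java.util.ArrayList;
import java.util.List;

// Illustrative token-based clone detection in the spirit of CCFinder (sketch only).
public class TokenCloneSketch {

    // Very rough lexer: split on word/symbol boundaries and map every
    // identifier or number to the placeholder "ID" so that renamed
    // variables still match.
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        for (String raw : code.split("(?<=\\W)|(?=\\W)")) {
            String t = raw.trim();
            if (t.isEmpty()) continue;
            tokens.add(t.matches("[A-Za-z_]\\w*|\\d+") ? "ID" : t);
        }
        return tokens;
    }

    // Report common token runs of at least minLength tokens
    // (naive quadratic scan; contained matches are not filtered out).
    static void reportClones(List<String> a, List<String> b, int minLength) {
        for (int i = 0; i < a.size(); i++) {
            for (int j = 0; j < b.size(); j++) {
                int k = 0;
                while (i + k < a.size() && j + k < b.size()
                        && a.get(i + k).equals(b.get(j + k))) {
                    k++;
                }
                if (k >= minLength) {
                    System.out.println("clone of " + k + " tokens at a[" + i + "], b[" + j + "]");
                }
            }
        }
    }

    public static void main(String[] args) {
        String f1 = "total = total + price * qty;";
        String f2 = "sum = sum + cost * count;";
        reportClones(tokenize(f1), tokenize(f2), 5);
    }
}
</pre>
Real tools replace the quadratic scan with suffix-tree or suffix-array matching so that million-line systems can be processed.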
==CloneDr==
CloneDr analyses software at the syntactic level to produce abstract syntax tree (AST) representations. A series of algorithms is then applied to the tree to detect clones. The first algorithm searches for sub-tree matches within the ASTs. A “sequence detection” algorithm then attempts to detect “variable size sequences of sub-tree clones”. A third algorithm uses combinations of previously detected clones and looks for “more complex near-miss clones”. The final clone set includes the clones detected by the second and third algorithms. CloneDr can automatically replace cloned code by producing a functionally equivalent subroutine or macro.
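As a rough illustration of the sub-tree matching step (a sketch only; CloneDr's real algorithms additionally prune small trees, detect clone sequences and handle near-miss clones), every sub-tree can be hashed into a bucket by a structural fingerprint, and buckets with more than one member reported as clone candidates:
<pre>
import java.util.*;

// Illustrative AST sub-tree matching in the spirit of CloneDr's first algorithm.
public class AstCloneSketch {

    static class Node {
        final String label;          // e.g. operator or statement kind
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
        // Structural fingerprint of the whole sub-tree
        // (recomputed per node here, which is fine for a sketch).
        String fingerprint() {
            StringBuilder sb = new StringBuilder(label).append('(');
            for (Node c : children) sb.append(c.fingerprint()).append(',');
            return sb.append(')').toString();
        }
    }

    // Collect every sub-tree into a bucket keyed by its fingerprint.
    static Map<String, List<Node>> bucketize(Node root) {
        Map<String, List<Node>> buckets = new HashMap<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            buckets.computeIfAbsent(n.fingerprint(), k -> new ArrayList<>()).add(n);
            n.children.forEach(stack::push);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Two structurally identical assignments: x = a + b;  y = a + b;
        Node tree = new Node("block",
                new Node("assign", new Node("var"), new Node("+", new Node("var"), new Node("var"))),
                new Node("assign", new Node("var"), new Node("+", new Node("var"), new Node("var"))));
        bucketize(tree).forEach((fp, nodes) -> {
            if (nodes.size() > 1) System.out.println(nodes.size() + " clones of sub-tree " + fp);
        });
    }
}
</pre>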
==Covet==
Covet uses a number of the metrics defined by Mayrand. These metrics were selected by taking known clones and identifying which of the Datrix metrics best highlighted the known clone set. Covet does not apply Mayrand's scale of clone likelihood; within Covet this is simplified, and functions are classed as either clones or distinct. The tool is still at the prototype stage and is not capable of processing industrial-sized programs.
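The sketch below illustrates the general metrics-based idea; the three measurements used here (lines of code, a crude call count and a conditional count) are placeholders for illustration, not the actual Datrix metrics used by Covet:
<pre>
import java.util.Arrays;

// Illustrative metrics-based comparison: describe each function by a small
// vector of measurements and flag a pair as a clone when the vectors are
// (almost) identical.
public class MetricCloneSketch {

    static int[] metrics(String body) {
        int loc = body.split("\n").length;
        int calls = body.split("\\(", -1).length - 1;          // crude: counts '(' occurrences
        int branches = body.split("\\bif\\b", -1).length - 1;  // crude conditional count
        return new int[] { loc, calls, branches };
    }

    // Binary decision, as in Covet: clone or distinct, no likelihood scale.
    static boolean clones(String f, String g, int tolerance) {
        int[] a = metrics(f), b = metrics(g);
        int distance = 0;
        for (int i = 0; i < a.length; i++) distance += Math.abs(a[i] - b[i]);
        return distance <= tolerance;
    }

    public static void main(String[] args) {
        String f = "int max(int a, int b) {\n  if (a > b) return a;\n  return b;\n}";
        String g = "int min(int x, int y) {\n  if (x < y) return x;\n  return y;\n}";
        System.out.println(Arrays.toString(metrics(f)));
        System.out.println("clone pair? " + clones(f, g, 0));
    }
}
</pre>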
==JPlag==
JPlag uses tokenised substring matching to determine similarity in source code. Its specific purpose is to detect plagiarism within academic institutions. First the source code is translated into tokens (a language-dependent step). JPlag aims to tokenise in such a way that the "essence" of a program is captured, so it can be effective at catching copied functionality. Once converted, the tokenised strings are compared and the percentage of matching tokens is used as a similarity value. JPlag is an online service freely available to academia.
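The sketch below shows one simplified way such a similarity percentage can be derived from matched tokens. JPlag itself uses a Greedy String Tiling algorithm with a minimum match length; this sketch approximates the matched-token count with a longest common subsequence instead:
<pre>
// Illustrative similarity value over token streams (sketch only).
public class TokenSimilaritySketch {

    // Length of the longest common subsequence of two token streams.
    static int lcs(String[] a, String[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++)
            for (int j = 1; j <= b.length; j++)
                dp[i][j] = a[i - 1].equals(b[j - 1])
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
        return dp[a.length][b.length];
    }

    // similarity = 2 * matched / (|A| + |B|), expressed as a percentage
    static double similarity(String[] a, String[] b) {
        return 200.0 * lcs(a, b) / (a.length + b.length);
    }

    public static void main(String[] args) {
        String[] submissionA = { "BEGIN_METHOD", "ASSIGN", "LOOP_BEGIN", "CALL", "LOOP_END", "RETURN", "END_METHOD" };
        String[] submissionB = { "BEGIN_METHOD", "ASSIGN", "LOOP_BEGIN", "CALL", "ASSIGN", "LOOP_END", "END_METHOD" };
        System.out.printf("similarity: %.1f%%%n", similarity(submissionA, submissionB));
    }
}
</pre>
With this definition, the example pair in main() comes out at roughly 86% similar.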
==MOSS==
Aiken does not publish the method MOSS uses to detect source code plagiarism, as publishing it might compromise its ability to detect plagiarism. MOSS, like JPlag, is an online service provided freely for academic use. Source code is submitted via a Perl script and the results are posted on the MOSS web page; users are emailed a URL for the results.
==Extract Method Refactoring==
Extract Method has been recognized as one of the most important refactorings, since it decomposes large methods and can be used in combination with other
refactorings for fixing a variety of design problems. However, existing tools and methodologies support extraction of methods based on a set of statements
selected by the user in the original method. The goal of the methodology proposed by Tsantalis is to automatically identify Extract Method refactoring opportunities and present them as suggestions to the designer of an object oriented system.
Method extraction has a positive effect on maintenance, since it simplifies the code by breaking large methods into smaller ones and creates new methods that can be reused. Method extraction techniques are based on the concept of program slicing. According to Weiser, a slice consists of all the statements in a program that may affect the value of a variable x at a specific point of interest p; the pair (p, x) is referred to as the slicing criterion. In general, slices are computed by finding sets of directly or indirectly relevant statements based on control and data dependencies. After Weiser's original definition, several notions of slicing have been proposed. Tsantalis discusses a block-based slicing technique and performs an experimental evaluation of it; the evaluation results are discussed in the section below.
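The following small, hypothetical Java example (not output from Tsantalis's tool) illustrates the kind of transformation such approaches aim to suggest: the statements that compute the variable average (the slice for the criterion consisting of the point where average is printed and the variable average) are extracted into a method of their own.
<pre>
// Before/after illustration of the Extract Method refactoring.
public class ExtractMethodExample {

    // Before: printing and computing are tangled in one method.
    static void printReportBefore(int[] grades) {
        int sum = 0;
        for (int g : grades) sum += g;
        double average = (double) sum / grades.length;
        System.out.println("average grade: " + average);
    }

    // After: the slice that computes 'average' becomes a method of its own...
    static double average(int[] grades) {
        int sum = 0;
        for (int g : grades) sum += g;
        return (double) sum / grades.length;
    }

    // ...and the original method simply calls it.
    static void printReportAfter(int[] grades) {
        System.out.println("average grade: " + average(grades));
    }

    public static void main(String[] args) {
        int[] grades = { 70, 85, 90 };
        printReportBefore(grades);
        printReportAfter(grades);
    }
}
</pre>
The extracted method is shorter, behavior-preserving and reusable, which is exactly the benefit the surrounding text describes.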
=<font color="windowtext">Comparing evaluation results</font>=
Burd performed an experimental evaluation of these tools, and the results of the analysis were used to investigate which of the tools were best suited to assist the process of software maintenance in general and preventative maintenance in particular. The results were obtained using two metrics, precision and recall, and were gathered both by detecting replication within a single program and by detecting replication across distinct programs. Precision measures the extent to which the clones identified by a tool are genuine rather than extraneous or irrelevant. Recall measures the extent to which the clones identified by the tool cover the known clone base of the application being evaluated.
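Using the standard definitions of these two metrics (the exact counting rules in Burd's study may differ in detail), they can be computed from the set of clones a tool reports and the known clone base, as in this sketch:
<pre>
import java.util.Set;

//   precision = correctly reported clones / all reported clones
//   recall    = correctly reported clones / all known clones
public class PrecisionRecall {

    static double precision(Set<String> reported, Set<String> known) {
        long truePositives = reported.stream().filter(known::contains).count();
        return (double) truePositives / reported.size();
    }

    static double recall(Set<String> reported, Set<String> known) {
        long truePositives = reported.stream().filter(known::contains).count();
        return (double) truePositives / known.size();
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("c1", "c2", "c3", "c4"); // reference clone base
        Set<String> reported = Set.of("c1", "c2", "c5");    // tool output
        System.out.println("precision = " + precision(reported, known)); // 2/3
        System.out.println("recall    = " + recall(reported, known));    // 2/4
    }
}
</pre>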
With this, Burd analysed the results against the following six categories.
* '''Output of a high proportion or all of the clones present within the code''' - CCFinder identified more clones than the other tools, but the greater proportion of the clones it identified were across files. Proportionally, CloneDr identified more clones that were internally replicated within a file. However, the most predictive assessment of this requirement is the recall metric, being the percentage of clones identified from the total known set. CCFinder identified the greatest total number of clones, resulting in the highest level of recall, 72%.
* '''Output of a low proportion or no incorrectly identified clones''' - CloneDr was the only tool that provided perfect precision, identifying no false positive matches and therefore incurring no wasted maintenance effort. This is due to its automated clone removal process: if a clone cannot be automatically removed then it is not identified as a clone.
* '''Matching and output of clones with a high frequency of replication''' - The results of the analysis showed that Covet, followed by CCFinder, best satisfied this requirement. However, the benefit of CloneDr's ability to conduct an automated clone replacement process should not be underestimated.
* '''Output of clones that are large in terms of lines of code''' - The largest clone identified was by Covet, at 123 LOC, but the tool generating the largest mean clone size was JPlag. Overall, however, all tools showed fairly similar performance levels.
* '''Output of clones that can be modified or removed with minimum impact to the application''' - CloneDr was the only tool that could be tested, since it is the only one that provides automatic clone removal.
* '''Ease of usability of the tool''' - No analysis was performed for this category, and hence no evaluation was made.
The results identified that there is no single, outright winner for clone detection for preventative maintenance. Each tool had some factors that may ultimately prove useful to the maintainer.
For Extract Method refactoring, Tsantalis evaluated the results and indicated that the proposed methodology is able to identify slice extraction refactorings which decompose complex methods, create new methods with useful functionality, and preserve the behavior of the code. However, there is a clear need to extend the evaluation to more systems from different domains in order to further improve the effectiveness of the methodology.
So far, Extract Method refactoring using Eclipse has been the popular choice within the Java-based development community.