CSC/ECE 517 Fall 2009/wiki1a 9 pp: Difference between revisions

Revision as of 20:31, 7 September 2009

Research in refactoring tools

Refactoring is a very practical subject. Much of the work on refactoring environments has been done on an ad hoc basis by practicing programmers. This article briefly summarizes initiatives that have been taken to improve the current generation of refactoring tools. Although tools for automated refactoring are comparatively recent, refactoring itself has been performed informally for many years, and its automation is the result of sustained academic research in this area. The article then summarizes the academic underpinnings of some popular refactoring methods.

1. Introduction

Refactoring is a process of restructuring code without changing the way it behaves. Refactoring can be semi-automated with the help of tools, such as those that are integrated into the Eclipse environment. If these tools help programmers refactor with fewer errors, in less time, and with greater overall satisfaction than refactoring by hand, then programmers are likely to adopt such tools.

While refactoring tools have been integrated into more and more development environments, the usability of these tools has remained stagnant. Specifically, when a refactoring fails, tools communicate the failure to the programmer poorly, causing the programmer to restructure slowly, conservatively, and without preserving behavior. Various techniques have been developed to improve refactoring correctness, speed, and user satisfaction, and with them the usability of refactoring tools. In the long run, better usability will aid in the adoption and utilization of refactoring tools.

2. Improvements in the Refactoring Tools

Programmers encounter two main difficulties when trying to refactor. First, they are often unable to select the intended list of statements because of very long statements and inconsistent formatting. Second, they have trouble understanding and interpreting the error messages produced by the tools, such as “Ambiguous return value: selected block contains more than one assignment to local variable.” These error messages are generally caused by violated refactoring preconditions.

Several research efforts aim to make the tools more user friendly. Murphy-Hill [1] proposed add-ons to Eclipse such as SelectionAssist: when the cursor is placed in the whitespace in front of a statement, a green highlight appears over the text range of that statement. Another proposal was Refactoring Annotations, which can tell the programmer, for example, that one parameter must be passed in to the extracted method and that two values must be returned, which violates an Extract Method precondition. Refactoring Annotations can also inform the programmer about precondition violations involving control flow. Through experimental validation, Murphy-Hill [1] found that these changes significantly increase the usability of refactoring tools, thus increasing programmers’ speed and correctness.
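
The kind of information a Refactoring Annotation reports can be approximated with a small data-flow check. The following is only a minimal sketch, not the Eclipse implementation: it assumes straight-line code with no control flow, and the helper names names_used and analyze_selection are hypothetical. It uses Python's standard ast module to find which variables flow into a selected block (parameters) and which flow out of it (results).

```python
import ast

def names_used(stmt_source):
    """Return (reads, writes): variable names a statement loads and stores."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(stmt_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            else:
                reads.add(node.id)
        # "x += 1" both reads and writes x, but ast marks the target as a store only.
        if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            reads.add(node.target.id)
    return reads, writes

def analyze_selection(statements, start, end):
    """Classify variables crossing the boundary of statements[start:end].

    Returns (parameters, results): names that would have to be passed in to
    the extracted method, and names it would have to hand back to the caller.
    Assumption: straight-line code, one statement per list entry.
    """
    defined_before = set()
    for stmt in statements[:start]:
        defined_before |= names_used(stmt)[1]

    parameters, assigned_inside = set(), set()
    for stmt in statements[start:end]:
        reads, writes = names_used(stmt)
        # A read needs a parameter only if its value comes from outside the selection.
        parameters |= (reads - assigned_inside) & defined_before
        assigned_inside |= writes

    read_after = set()
    for stmt in statements[end:]:
        read_after |= names_used(stmt)[0]
    return parameters, assigned_inside & read_after

statements = [
    "count = 3",
    "total = count * 5",   # selected
    "tax = total // 10",   # selected
    "print(total, tax)",
]
parameters, results = analyze_selection(statements, 1, 3)
print(sorted(parameters), sorted(results))  # ['count'] ['tax', 'total']
```

In a language where a method returns a single value, a selection with more than one result, as here, is the situation behind the "Ambiguous return value" message quoted above.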

3. Academic underpinnings

Lexical Analysis [4], various Parsing [5] techniques, Regular Expressions [6] and Graph Theory [7] are the four basic components used in refactoring tools. Over the last several years these components have matured through ongoing research, and they provide a strong foundation for building refactoring tools. All of the refactoring methods below make use of these components.

3.1 Extract Method

Extract Method has been recognized as one of the most important refactorings, since it decomposes large methods and can be used in combination with other refactorings to fix a variety of design problems. Extract Method may look like a simple cut-and-paste operation, but it is not. The method needs to be analyzed, all the temporary variables need to be found, and only then can the refactoring strategy be applied. This analysis cannot be done by regular expression manipulation alone; it requires careful work.

The main problem with existing tools and methodologies is that they support extraction only for a set of statements that the user selects in the original method. Research is being carried out on suggesting Extract Method opportunities. Chatzigeorgiou [12] proposed a methodology that automatically identifies Extract Method refactoring opportunities and presents them as suggestions to the designer of an object-oriented system. Such techniques rely on Program Slicing [2]. Weiser [3] defined a slice as all the statements in a program that may affect the value of a variable x at a specific point of interest p. The pair (p, x) is referred to as the slicing criterion. In general, slices are computed by finding sets of directly or indirectly relevant statements based on control and data dependencies. The vast majority of papers in the method extraction literature are based on the concept of Program Slicing. Since Weiser's original definition [3], researchers have proposed several other notions of slicing, such as static and dynamic slicing, forward and backward slicing, syntax-preserving slicing and intra-procedural slicing.
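
To make the slicing criterion (p, x) concrete, here is a minimal sketch of a backward slice. It is a deliberate simplification and not Weiser's full algorithm: it assumes straight-line code, follows data dependencies only, and ignores control dependencies. The program representation and the function name backward_slice are hypothetical.

```python
def backward_slice(program, p, x):
    """Indices of the statements that may affect variable x just after statement p.

    program is a list of (target, uses) pairs for straight-line code:
    statement i assigns `target` using the variables listed in `uses`.
    """
    relevant = {x}      # variables whose current value still matters
    in_slice = []
    for i in range(p, -1, -1):
        target, uses = program[i]
        if target in relevant:
            in_slice.append(i)
            relevant.discard(target)   # this definition supplies the value...
            relevant |= set(uses)      # ...so its inputs now matter instead
    return sorted(in_slice)

program = [
    ("n", []),                      # 0: n = read()
    ("i", []),                      # 1: i = 1
    ("total", []),                  # 2: total = 0
    ("product", []),                # 3: product = 1
    ("total", ["total", "i"]),      # 4: total = total + i
    ("product", ["product", "i"]),  # 5: product = product * i
]
print(backward_slice(program, 5, "product"))  # [1, 3, 5]
print(backward_slice(program, 5, "total"))    # [1, 2, 4]
```

The two slices share only statement 1, which suggests that the computations of total and product could each be extracted into a method of their own.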

3.2 Code cloning and detection

When a section of code is duplicated in more than one location among a collection of source files, that section of code and all of its duplicates are considered code clones. Clones are typically generated because a programmer copies a section of code and pastes it into other parts of the same program. Such clones can produce maintenance nightmares. In particular, if one clone is updated, the other clones in the same group may also need to be updated. The maintainer therefore has to recognize that the clones exist and update all relevant sections of code uniformly. One approach to improving maintainability is to use refactoring to eliminate the duplicate code through abstraction and improved modularity, so that an update is required in only one location. Determining which clones have potential for being refactored requires analysis of the clones.
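
As an illustration of eliminating a clone through abstraction, consider the following sketch. All function names are invented for the example; the "before" functions contain the same pasted validation logic, and the "after" functions share one helper.

```python
# Before: the same validation logic is pasted into two functions.
def register_user_before(name):
    cleaned = name.strip().lower()
    if not cleaned:
        raise ValueError("empty name")
    return "registered:" + cleaned

def invite_user_before(name):
    cleaned = name.strip().lower()
    if not cleaned:
        raise ValueError("empty name")
    return "invited:" + cleaned

# After: the clone is abstracted into one helper, so a fix is made once.
def normalize_name(name):
    cleaned = name.strip().lower()
    if not cleaned:
        raise ValueError("empty name")
    return cleaned

def register_user(name):
    return "registered:" + normalize_name(name)

def invite_user(name):
    return "invited:" + normalize_name(name)

# Behavior is preserved: old and new versions agree on the same input.
print(register_user(" Ann ") == register_user_before(" Ann "))  # True
```

A change to the validation rule, such as rejecting names that contain digits, now has to be made in normalize_name only.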

Clone detection analysis makes use of:

String Matching – Represents and evaluates code using string comparisons. There are various types of string matching such as Exact String Matching, Parameterized Matching and Substring Matching.

Token Parsing – Transforms the code, using language-specific constructs, into a single token string. It then finds similarities within this token string and transforms the token clones back into code clones for presentation.

Graph Matching – Forms a machine representation of the code using graphs and identifies clones as identical subgraphs.
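
The token-based approach can be sketched with Python's standard tokenize module. This is a simplified illustration rather than the algorithm of any particular clone detection tool: every identifier is replaced by the same placeholder, which is coarser than true parameterized matching (that requires renamings to be consistent), and the function names are hypothetical.

```python
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def normalized_tokens(source):
    """Turn source code into a token list with names and literals abstracted."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)  # keywords and operators are kept as-is
    return tokens

def are_clones(a, b):
    """Two fragments count as clones if their normalized token strings match."""
    return normalized_tokens(a) == normalized_tokens(b)

a = "total = price * count + 1  # compute\n"
b = "sum_ = cost*qty + 42\n"
c = "total = price + count * 1\n"
print(are_clones(a, b))  # True
print(are_clones(a, c))  # False
```

Fragments a and b differ in names, spacing, literal values and comments, so exact string matching would miss them, while the normalized token strings are identical.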

Refactoring Annotations, as discussed earlier, could also be applied to other software engineering tools, such as compilers. For example, type error messages are notoriously hard for beginners to understand and for experts to track down.

4. References

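[1] E. Murphy-Hill, "Improving usability of refactoring tools," ACM, October 2006.

[2] Program Slicing, http://en.wikipedia.org/wiki/Program_slicing

[3] M. Weiser, "Program Slicing," IEEE Transactions on Software Engineering, vol. 10, no. 4, pp. 352-357, 1984.

[4] Lexical Analysis, http://en.wikipedia.org/wiki/Lexical_analysis

[5] Parsing, http://en.wikipedia.org/wiki/Parsing

[6] Regular Expressions, http://en.wikipedia.org/wiki/Regular_expression

[7] Graph Theory, http://en.wikipedia.org/wiki/Graph_theory

[8] Code Cloning Detection, http://www.cs.drexel.edu/~spiros/teaching/CS675/slides/code_cloning.ppt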