CSC/ECE 517 Summer 2008/wiki3 4 mb

From Expertiza_Wiki
Revision as of 17:52, 31 July 2008 by Mabaker4

Assignment: High Cohesion

We introduced the idea of high cohesion in Lecture 20, after a brief mention in Lecture 15. But we've really only scratched the surface of what there is to know about achieving high cohesion. Browse the Web and the ACM DL for other information, both theoretical and practical, and produce a guide to what there is to know about high cohesion. Be sure to highlight those aspects that would be appropriate for inclusion in CSC/ECE 517.

Cohesion: what is it?

Cohesion describes how closely related the responsibilities of a class are [1]. Strong cohesion tends to make code more reusable, understandable, and reliable.

A high-level goal of software design is therefore to maximize cohesion. Each class within a software system should have a specific role [2], and the instance variables and methods of that class should all work toward that common goal.

High cohesion matters for several reasons. A highly cohesive class has a distinct role to play within a software system, so it is easy to understand why the class exists and how it should be used. The purpose of a class with low cohesion can be difficult to determine, which can lead to the class being misused or underused. When a class with an unclear purpose is combined with other classes in the system, the maintainability, readability, and understandability of the whole system suffer.

Disadvantages of low cohesion

Software systems are generally dynamic entities that often require updates, extensions, and refactoring. Because of this, low cohesion carries real costs. Consider what it takes to change a class with low cohesion [2]. If the methods within the class are highly interdependent, changing one method will require changes to others, and it can be difficult to see how a subtle change in one method affects object state that other methods of the class depend on.

Testing is another problem for classes with low cohesion. If executing a method requires another method of the class or, even worse, numerous other methods of the class, the number of tests required increases. Because of these interdependencies, it may be difficult to determine whether all relevant combinations of method invocations are actually being tested.
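To make the growth concrete, here is a deliberately simplified model in Python. It assumes, for illustration only, that independent methods need one test each, while methods that all depend on state set by one another may behave differently for every invocation order, so each ordering becomes a candidate test scenario. The function and method names are hypothetical.

```python
from itertools import permutations


def scenarios_to_test(methods, interdependent):
    """Count test scenarios under a simplified model.

    Independent methods: one scenario per method, tested in isolation.
    Interdependent methods: every invocation order is a potential scenario.
    """
    if not interdependent:
        return len(methods)
    return sum(1 for _ in permutations(methods))


for n in range(1, 6):
    methods = [f"m{i}" for i in range(n)]  # hypothetical method names
    print(n, scenarios_to_test(methods, False), scenarios_to_test(methods, True))
```

Under this model, five independent methods need five tests, while five mutually dependent ones already allow 120 orderings. Real classes fall somewhere in between, but the trend is the point.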

Code reusability also suffers for classes with low cohesion. If developers find it difficult to understand how the methods in a class work, or even what the class is for, they will probably not want to reuse it. Inheriting from a class with low cohesion is almost always a bad idea: when a class has a seemingly arbitrary mix of methods and functionality, developers will find little value in subclassing it.

Recognizing low cohesion

If we agree that low cohesion within a class is bad, how can we recognize classes that exhibit it? First, utility classes are almost always poor examples of cohesion. They are usually created to hold a disparate collection of functionality, so it is hard to tell when the utility they provide applies and hard to take advantage of it.
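As a sketch of what such a class tends to look like, here is a hypothetical Python utility class (the class and method names are illustrative, not from any real library). Nothing ties its methods together except the class they happen to live in, which is the coincidental grouping listed later in this guide as the worst type of cohesion.

```python
import math
import re


class MiscUtils:
    """Hypothetical utility class: the methods share no data and no purpose."""

    @staticmethod
    def circle_area(radius):
        return math.pi * radius ** 2

    @staticmethod
    def is_valid_email(address):
        # Deliberately simplistic check, for illustration only.
        return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address) is not None

    @staticmethod
    def fahrenheit_to_celsius(degrees_f):
        return (degrees_f - 32) * 5 / 9
```

Each method would more naturally sit beside related code: geometry with geometry, validation with the data it validates, and unit conversion with other temperature logic.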

Class names offer a good indicator of how cohesive the code is. If the name of a class does not convey the role that the class plays within the system, the class is probably not cohesive. If a class name seems too complex, the class itself is probably too complex. For example, consider a class named ParseAnalyzeAndRecordDataFiles. Such a class will almost certainly have numerous interdependencies between its methods, and its name alone suggests that it should be split into several classes, each with a specific role to play.
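One possible decomposition is sketched below in Python. The three class names, the CSV input format, and the in-memory recorder are all assumptions made for illustration; the point is that each class has a single job, and a small coordinating function wires them together.

```python
import csv
import io
import statistics


class DataFileParser:
    """Turns CSV text into rows of floats; knows nothing about analysis or storage."""

    def parse(self, text):
        reader = csv.reader(io.StringIO(text))
        return [[float(cell) for cell in row] for row in reader if row]


class DataAnalyzer:
    """Computes summary statistics for parsed rows."""

    def column_means(self, rows):
        return [statistics.mean(column) for column in zip(*rows)]


class ResultRecorder:
    """Stores results; an in-memory list stands in for a file or database."""

    def __init__(self):
        self.records = []

    def record(self, result):
        self.records.append(result)


def process(text, parser, analyzer, recorder):
    # Thin coordinator: each step is delegated to a single-purpose class.
    rows = parser.parse(text)
    recorder.record(analyzer.column_means(rows))


recorder = ResultRecorder()
process("1,2\n3,4\n", DataFileParser(), DataAnalyzer(), recorder)
print(recorder.records)  # prints [[2.0, 3.0]]
```

Each class can now be tested and reused on its own, and swapping the storage mechanism touches only ResultRecorder.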

Beyond class names, the number of methods defined within a class can also indicate how cohesive it is. If a class has many methods, it is probably trying to do too many things. Individual methods should also avoid a large number of execution branches: if a method takes several Boolean flags that determine what should be done, or which other methods should be called, it should probably be broken up into several distinct methods.
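The sketch below contrasts a flag-driven routine with the split version, written as plain Python functions for brevity. All names and the report format are hypothetical. In the first version each Boolean selects a different branch inside one function; in the second, each distinct output format gets its own small function and callers choose by name.

```python
# Before: one function whose Boolean flags select different behaviours.
def format_report(rows, as_csv, include_header):
    lines = []
    if include_header:
        lines.append("name,score" if as_csv else "NAME  SCORE")
    for name, score in rows:
        lines.append(f"{name},{score}" if as_csv else f"{name:<5} {score}")
    return "\n".join(lines)


# After: one small function per distinct output format.
def format_csv(rows):
    return "\n".join(["name,score"] + [f"{name},{score}" for name, score in rows])


def format_table(rows):
    return "\n".join(["NAME  SCORE"] + [f"{name:<5} {score}" for name, score in rows])


scores = [("ann", 3), ("bob", 10)]  # hypothetical data
print(format_csv(scores))
print(format_table(scores))
```

The split versions always emit a header; if a header-less variant were genuinely needed, it would be another small, separately named function rather than a new flag.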

Theoretical high cohesion

With a basic understanding of what is and is not cohesive, we can investigate cohesion from a more theoretical standpoint. To begin with, let us look at the types of cohesion that can be defined for a system [7].

Types of cohesion
Coincidental cohesion (worst): Coincidental cohesion is when parts of a module are grouped arbitrarily (at random); the parts have no significant relationship (e.g. a module of frequently used mathematical functions).
Logical cohesion: Logical cohesion is when parts of a module are grouped because they are logically categorized as doing the same kind of thing, even if they are different by nature (e.g. grouping all I/O handling routines).
Temporal cohesion: Temporal cohesion is when parts of a module are grouped by when they are processed; the parts are processed at a particular time in program execution (e.g. a function, called after an exception is caught, that closes open files, creates an error log, and notifies the user).
Procedural cohesion: Procedural cohesion is when parts of a module are grouped because they always follow a certain sequence of execution (e.g. a function which checks file permissions and then opens the file).
Communicational cohesion: Communicational cohesion is when parts of a module are grouped because they operate on the same data (e.g. a module which operates on the same record of information).
Sequential cohesion: Sequential cohesion is when parts of a module are grouped because the output from one part is the input to another part like an assembly line (e.g. a function which reads data from a file and processes the data).
Functional cohesion (best): Functional cohesion is when parts of a module are grouped because they all contribute to a single well-defined task of the module.

While the preceding classification may help shed some light on whether a class has low or high cohesion, it does not provide a mechanism for moving from one type of cohesion to the next. Even worse, the classification is a subjective measure of cohesion, although efforts have been made to formalize these seven types and to use them to evaluate cohesion [3].

This leads us to more formal cohesion metrics, which help quantify how related the methods of a class are to one another. A basic interpretation of class cohesiveness examines whether the methods of a class work on the same instance variables. The methods of a highly cohesive class work on the same variables, while the methods of a class with low cohesion work on different variables altogether. Highly cohesive classes promote strong encapsulation and information hiding: their variables are accessed appropriately, and it is easy to see how each method affects them.
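Checking whether methods work on the same instance variables can be partly automated. The sketch below assumes Python classes that follow the usual `self` naming convention and uses the standard `ast` module to collect, for each method, the `self.<name>` attributes it touches. The Account class is a hypothetical example. Calls such as `self.helper()` are also counted, since they appear as attribute accesses.

```python
import ast
import textwrap


def attributes_used(class_source):
    """Map each method of the first class in class_source to the set of
    `self.<name>` attributes it reads, writes, or calls."""
    tree = ast.parse(textwrap.dedent(class_source))
    cls = next(node for node in tree.body if isinstance(node, ast.ClassDef))
    usage = {}
    for node in cls.body:
        if isinstance(node, ast.FunctionDef):
            usage[node.name] = {
                sub.attr
                for sub in ast.walk(node)
                if isinstance(sub, ast.Attribute)
                and isinstance(sub.value, ast.Name)
                and sub.value.id == "self"  # assumes the conventional `self` name
            }
    return usage


SOURCE = '''
class Account:
    def __init__(self):
        self.balance = 0
        self.owner = ""
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def rename(self, name):
        self.owner = name
'''
print(attributes_used(SOURCE))
```

Here `deposit` and `withdraw` share `balance`, while `rename` touches only `owner`; a map like this is the raw input that variable-based cohesion metrics work from.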

A common metric for class cohesion is LCOM, Lack of Cohesion in Methods [4]. It formalizes the approach just described: a class is considered cohesive if its methods use the same instance variables.
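Several variants of LCOM exist. The sketch below uses the formulation introduced by Chidamber and Kemerer: count the method pairs that share no instance variables (P) and the pairs that share at least one (Q), and report P - Q when it is positive, otherwise 0. The input is a hand-written map from method name to the instance variables it uses; both example classes are hypothetical.

```python
from itertools import combinations


def lcom(usage):
    """Chidamber-Kemerer style LCOM.

    usage maps each method name to the set of instance variables it uses.
    P = method pairs sharing no variable, Q = pairs sharing at least one.
    Returns P - Q if P > Q, else 0. Higher values suggest lower cohesion.
    """
    p = q = 0
    for vars_a, vars_b in combinations(usage.values(), 2):
        if vars_a & vars_b:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0


cohesive = {"deposit": {"balance"}, "withdraw": {"balance"}, "report": {"balance", "owner"}}
scattered = {"area": {"radius"}, "send": {"smtp"}, "log": {"logfile"}}
print(lcom(cohesive), lcom(scattered))  # prints 0 3
```

The first class scores 0 because every pair of methods shares `balance`; the second scores 3 because no pair shares anything.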

Another approach to determining class cohesion examines the parameter types of the methods within a class. The Cohesion Among Methods in a Class (CAMC) metric and the Normalized Hamming Distance (NHD) metric both analyze method signatures to determine cohesiveness [4]. A distinct advantage of metrics that analyze method parameters is that method signatures exist much earlier in a system's design than method bodies do. The parameters of a method can also indicate the kind of functionality the method will provide.
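The sketch below computes both metrics from a method-by-parameter-type occurrence matrix, where each row is a method and each column is a parameter type used anywhere in the class. It follows the commonly cited definitions: CAMC is the fraction of 1s in the matrix, and NHD is 1 minus the average Hamming distance between pairs of rows, normalized by the number of columns. Some published variants also add a column for the class's own type; this sketch omits it. The signatures are hypothetical.

```python
def parameter_matrix(signatures):
    """Build a method-by-parameter-type 0/1 matrix.

    signatures maps method name -> set of parameter type names. Assumes at
    least two methods and at least one parameter type in the class.
    """
    types = sorted(set().union(*signatures.values()))
    rows = [[1 if t in params else 0 for t in types] for params in signatures.values()]
    return types, rows


def camc(signatures):
    """Fraction of 1s in the k x l occurrence matrix."""
    _, rows = parameter_matrix(signatures)
    k, l = len(rows), len(rows[0])
    return sum(map(sum, rows)) / (k * l)


def nhd(signatures):
    """1 minus the normalized Hamming distance averaged over all row pairs.

    A column with c ones among k rows contributes c * (k - c) disagreeing pairs.
    """
    _, rows = parameter_matrix(signatures)
    k, l = len(rows), len(rows[0])
    disagreements = sum(c * (k - c) for c in map(sum, zip(*rows)))
    return 1 - 2 * disagreements / (l * k * (k - 1))


account_sigs = {
    "deposit": {"Money"},
    "withdraw": {"Money"},
    "transfer": {"Money", "Account"},
}
print(round(camc(account_sigs), 3), round(nhd(account_sigs), 3))  # prints 0.667 0.667
```

Both metrics are at most 1; higher values suggest that the methods of the class share more of their parameter types.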

Viewing class cohesion outside the context of the rest of the software system can also be problematic. When the internal cohesion of the JDK and Eclipse is measured with standard metrics, the results may seem poor [5]. The problem with considering cohesion locally is that a class can often have several valid implementations, which may affect the number and role of its instance variables. Many classes also have one set of methods that works on some instance variables and another set that works on different ones, yet together these methods maintain the overall state of the class within the software system. This leads to the idea of external cohesion [5], which takes into account the context in which a class is used within a software system. For instance, a Person class may appear to have low cohesion when examined with the LCOM or CAMC metrics, yet when its role within the software system is considered, a Person class will generally turn out to be highly cohesive.
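The purely internal view can be sketched by grouping the methods of a Person class according to the instance variables they share, either directly or through a chain of other methods. This grouping is similar in spirit to the connected-component variants of LCOM. The attribute and method names below are assumptions made for illustration.

```python
def method_groups(usage):
    """Partition methods into groups linked, directly or through chains,
    by shared instance variables.

    usage maps each method name to the set of instance variables it uses.
    Invariant: the groups built so far never share an instance variable.
    """
    groups = []  # list of (set of method names, set of instance variables)
    for method, attrs in usage.items():
        merged_methods, merged_attrs = {method}, set(attrs)
        remaining = []
        for group_methods, group_attrs in groups:
            if group_attrs & merged_attrs:
                # This group shares a variable with the new method: merge it in.
                merged_methods |= group_methods
                merged_attrs |= group_attrs
            else:
                remaining.append((group_methods, group_attrs))
        groups = remaining + [(merged_methods, merged_attrs)]
    return [sorted(methods) for methods, _ in groups]


person_usage = {
    "full_name": {"first_name", "last_name"},
    "rename": {"first_name", "last_name"},
    "mailing_label": {"street", "city"},
    "move": {"street", "city"},
}
print(method_groups(person_usage))  # prints [['full_name', 'rename'], ['mailing_label', 'move']]
```

The result is two disconnected groups, one about the name and one about the address, which an internal metric reads as a sign of low cohesion even though, in most systems, a Person is a natural and well-defined abstraction.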

Many papers have been written on the practical application of these metrics in user studies. One of them suggests that developers with different backgrounds often judge the cohesiveness of a class quite differently [6]. In Object-oriented Cohesion Subjectivity Amongst Experienced and Novice Developers: an Empirical Study [6], the researchers found that experienced developers did not consider the number of methods in a class to be strongly indicative of its cohesion, while the novice developers in the study regarded classes with more methods as more cohesive. Once again, developers' views on class cohesion appear to depend on subjective judgment and skill level.

Aspects appropriate to include in CSC/ECE 517

No clear metric for determining the cohesion of a class seems to exist. There have been many attempts to define metrics that measure class cohesion both internally and externally. While these studies are interesting and important to the field of software engineering, I am not sure that they are appropriate to include in the CSC/ECE 517 course (with the exception of wiki research assignments).

On the other hand, the disadvantages of low cohesion and how to identify it seem like topics that could be covered in greater depth. It is important to understand how cohesion affects code robustness, reliability, reusability, and understandability. A focus on developing test cases for the software projects in this course may be a good way to help students recognize low cohesion and ultimately become better developers.

References

[1] Wikipedia: Cohesion

[2] Steve Rowe’s Blog

[3] Lakhotia, A. (1993). Rule-based Approach to Computing Module Cohesion, Proceedings of the 15th International Conference on Software Engineering, Baltimore, Maryland.

[4] Counsell, S., Swift, S., Crampton, J. (2006). The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design, ACM Transactions on Software Engineering and Methodology, Volume 15, pp. 123-149.

[5] Mäkelä, S., Leppänen, V. (2007). External Views on Class Cohesion, Proceedings of the 2007 International Conference on Computer Systems and Technologies, Bulgaria.

[6] Counsell, S., Swift, S., Tucker, A., Mendes, E. (2006). Object-oriented Cohesion Subjectivity Amongst Experienced and Novice Developers: an Empirical Study, ACM SIGSOFT Software Engineering Notes, Volume 31, pp. 1-10.

[7] Cohesion