CSC/ECE 517 Summer 2008/wiki3 4 mb
Assignment: High Cohesion
We introduced the idea of high cohesion in Lecture 20, after a brief mention in Lecture 15. But we've really only scratched the surface on what there is to know about achieving high cohesion. Browse the Web and the ACM DL for other information, both theoretical and practical, and produce a guide to what there is to know about high cohesion. Be sure to highlight those aspects that would be appropriate for inclusion in CSC/ECE 517.
Cohesion, What is it?
Cohesion is a way to describe how related the responsibilities of a class are |1|. Strong cohesion tends to yield traits such as code reusability, understandability, and reliability.
Therefore, a high-level goal of software design is to maximize cohesion. Classes within a software system should have a specific role |2|. The instance variables and methods associated with a class should all seek to achieve the common goal.
It is important for software systems to exhibit high cohesion for many reasons. Classes that exhibit high cohesion have a distinct role to play within a software system. Thus, it is easy to understand why a class exists and how that class should be used. The purpose of a class with low cohesion can be difficult to determine, which can lead to that class being misused or underused. When a class with an unclear purpose is used with other classes in the system, the overall maintainability, readability, and understandability of the system are weakened.
Disadvantages of low cohesion
Software systems are generally dynamic entities that will often require updates, extensions, and refactoring. Because of this, systems with low cohesion tend to have many disadvantages. If we consider changing a class with low cohesion we can begin to understand the difficulties that may arise |2|.
Interdependent methods
If methods within a class are highly interdependent, changing one method will require making changes in other methods. It can be difficult to understand how subtle changes in a method may affect the state of an object that is important to other methods of the class.
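The following is a minimal sketch of this kind of interdependence. The ReportBuilder class and its methods are hypothetical and chosen only for illustration: median() silently relies on state that sort_rows() sets, so a change to either method can break the other. The free function at the end shows one way to remove the hidden dependency.

```python
class ReportBuilder:
    """Hypothetical class whose methods communicate through shared state."""

    def __init__(self):
        self.rows = []
        self.is_sorted = False

    def load(self, values):
        self.rows = list(values)
        self.is_sorted = False

    def sort_rows(self):
        self.rows.sort()
        self.is_sorted = True

    def median(self):
        # Hidden dependency: only correct if sort_rows() ran after load().
        if not self.is_sorted:
            raise RuntimeError("call sort_rows() before median()")
        return self.rows[len(self.rows) // 2]


def median(values):
    """Self-contained alternative with no required call order.

    Returns the upper-middle element for even-length input.
    """
    ordered = sorted(values)
    return ordered[len(ordered) // 2]
```

With the class version, every caller has to know the required call order; with the function version, there is no order to get wrong.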
Testing classes with low cohesion
Testing is another problem for classes with low cohesion. If a method can only operate after another method of the class or, even worse, numerous other methods of the class have been executed, the number of tests required increases. It may be difficult to determine whether all combinations of method invocations are actually being tested, due to the interdependencies between the methods.
Code reuse
Code reusability also suffers for classes with low cohesion. If it is difficult for developers to understand how methods in a class work or even what the class is used for, then those developers will probably not want to reuse the code in the class. Inheriting from classes with low cohesion almost always seems like a bad idea. When classes have a seemingly arbitrary mix of methods and functionality, developers will find little value in subclassing such a class.
Recognizing low cohesion
If we can agree that low cohesion within a class is bad, then how can we recognize classes that exhibit poor cohesion? First of all, utility classes almost always exhibit low cohesion. Utility classes are usually created to hold a disparate collection of functionality, so it is hard to understand what the class offers and hard to take advantage of it.
Class names offer a good indicator of how cohesive the code is. If the name of a class does not convey the role that the class plays within the system, then it is probably not a cohesive class. If the name of a class appears too complex, it probably means the class is too complex. For example, let us say we have a class named ParseAnalyzeAndRecordDataFiles. This class will almost certainly be defined with numerous interdependencies between its methods, which indicates that it should be separated into several classes, each with a specific role to play.
Beyond class names, the number of methods defined within a class can also help to indicate how cohesive a class is. If the number of methods in a class is high, then that class is probably trying to do too many things. Methods should not be designed with a large number of execution branches. If a method takes many Boolean flags that determine what should be done, or what other methods should be called, then the method should probably be broken up into several distinct methods.
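As a rough sketch of the separation suggested for ParseAnalyzeAndRecordDataFiles, the three roles in the name can each become a small class. All class names, method names, and the whitespace-separated input format below are assumptions made for illustration; they do not come from any real system.

```python
class DataFileParser:
    """Turns raw text into numbers; knows nothing about analysis or storage."""

    def parse(self, text):
        return [float(token) for token in text.split()]


class DataAnalyzer:
    """Summarizes a list of numbers; knows nothing about files."""

    def analyze(self, values):
        return {"count": len(values), "mean": sum(values) / len(values)}


class ResultRecorder:
    """Keeps the summaries it is given; knows nothing about how they were made."""

    def __init__(self):
        self.records = []

    def record(self, summary):
        self.records.append(summary)


def process(text, parser, analyzer, recorder):
    """Wires the three roles together without merging them into one class."""
    summary = analyzer.analyze(parser.parse(text))
    recorder.record(summary)
    return summary
```

Each class now has a name that states its single role, and each can be tested or reused without the other two.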
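To illustrate the point about Boolean flags, here is a small hypothetical example. The function export_flagged takes two flags that select among unrelated branches; the three functions after it do the same work with one clear purpose each. The function names and the hard-coded header are invented for this sketch.

```python
def export_flagged(rows, as_csv, with_header):
    """Flag-driven version: one function, several execution branches."""
    if as_csv:
        lines = [",".join(str(v) for v in row) for row in rows]
        if with_header:
            lines.insert(0, "name,age")
        return "\n".join(lines)
    return "\n".join(" ".join(str(v) for v in row) for row in rows)


def to_csv(rows):
    """Comma-separated output, one row per line."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def to_csv_with_header(rows, header):
    """Same as to_csv, preceded by a header line."""
    return ",".join(header) + "\n" + to_csv(rows)


def to_plain_text(rows):
    """Space-separated output, one row per line."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)
```

A caller of the split version states its intent in the function name rather than in a pair of positional True/False arguments.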
Theoretical high cohesion
With a basic understanding of what is cohesive and what is not cohesive, we should investigate cohesion from a more theoretical standpoint. To begin with, let us look at a list of the types of cohesion that can be defined for a system |7|.
Type of Cohesion: Description
Coincidental cohesion (worst): parts of a module are grouped arbitrarily (at random); the parts have no significant relationship.
e.g. a module of frequently used mathematical functions
Logical cohesion: parts of a module are grouped because they are logically categorized to do the same thing, even if they are different by nature.
e.g. grouping all I/O handling routines
Temporal cohesion: parts of a module are grouped by when they are processed; the parts are processed at a particular time in program execution.
e.g. a function which is called after catching an exception which closes open files, creates an error log, and notifies the user
Procedural cohesion: parts of a module are grouped because they always follow a certain sequence of execution.
e.g. a function which checks file permissions and then opens the file
Communicational cohesion: parts of a module are grouped because they operate on the same data.
e.g. a module which operates on the same record of information
Sequential cohesion: parts of a module are grouped because the output from one part is the input to another part, like an assembly line.
e.g. a function which reads data from a file and processes the data
Functional cohesion (best): parts of a module are grouped because they all contribute to a single well-defined task of the module.
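To make two of these categories concrete, the sketch below contrasts a temporally cohesive function, modeled on the exception-handling example in the list, with a functionally cohesive one. Both functions, their names, and the message formats are hypothetical; the functional example (computing a mean) is our own illustration and not taken from the cited classification.

```python
import io


def handle_failure(open_files, log, error):
    """Temporally cohesive: the steps share only the moment at which they run."""
    for handle in open_files:
        handle.close()
    log.append(f"ERROR: {error}")
    return f"Something went wrong: {error}"


def mean(values):
    """Functionally cohesive: every statement serves one well-defined task."""
    total = sum(values)
    return total / len(values)


def demo():
    """Exercises both functions with in-memory stand-ins for real files."""
    files = [io.StringIO("scratch data")]
    log = []
    message = handle_failure(files, log, "disk full")
    return files[0].closed, log, message, mean([2, 4, 6])
```

Closing files, logging, and notifying the user have nothing in common except timing, which is why a change to any one of them forces us to reopen handle_failure.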
While the preceding classification of the various types of cohesion may help shed some light on whether a class has low or high cohesion, it does not provide a mechanism for moving from one type of cohesion to the next. Even worse, the classification is a subjective measure of cohesion. However, efforts have been made to formalize these seven types of cohesion and to provide evaluations of class cohesion |3|. This leads us to a discussion of formal cohesion metrics.
Cohesion Metrics
Cohesion metrics provide a way to help understand how related the methods of a class are to one another.
Metric: Lack of Cohesion Of Methods (LCOM)
One way to assess class cohesiveness is to examine whether or not the methods of a class work on the same instance variables. The methods of classes with high cohesion will work on the same variables, while methods of classes with low cohesion will work on different variables altogether. When classes have high cohesion, they promote strong encapsulation and information hiding. The variables of cohesive classes can be accessed appropriately, and it is easy to see how the class methods affect those variables.
A common metric used to determine class cohesion is LCOM |4|. This is a formalization of the approach mentioned previously, where classes are considered to be cohesive if the same instance variables appear in the methods of the class.
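Several variants of LCOM exist. As a minimal sketch, the code below assumes the widely cited Chidamber and Kemerer formulation: count the method pairs that share no instance variables (P) and the pairs that share at least one (Q), and report P - Q, or zero if that difference is negative. The input format, a hand-written map from method name to the set of instance variables it uses, is an assumption made for this example; a real tool would extract it from source code.

```python
from itertools import combinations


def lcom(method_vars):
    """LCOM in the Chidamber-Kemerer style (assumed variant).

    method_vars maps each method name to the set of instance
    variables that the method reads or writes.
    """
    disjoint_pairs = 0  # P: pairs of methods with no variable in common
    sharing_pairs = 0   # Q: pairs of methods with at least one in common
    for vars_a, vars_b in combinations(method_vars.values(), 2):
        if vars_a & vars_b:
            sharing_pairs += 1
        else:
            disjoint_pairs += 1
    return max(disjoint_pairs - sharing_pairs, 0)


# Hypothetical classes described by hand.
account = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "statement": {"balance", "owner"},
}
grab_bag = {
    "parse": {"text"},
    "analyze": {"values"},
    "record": {"log"},
}
```

Here lcom(account) is 0 because every pair of methods shares the balance variable, while lcom(grab_bag) is 3 because none of the three pairs shares anything. A higher value indicates a greater lack of cohesion.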
Metrics: Cohesion Among Methods in a Class (CAMC), and Normalized Hamming Distance (NHD)
Another approach to determining class cohesion is to examine the parameter types of the methods within a class. The CAMC metric and the NHD metric are two techniques that analyze method signatures as a way to determine cohesiveness |4|. A distinct advantage of metrics that analyze method parameters is that method definitions exist much earlier in a system design than method bodies do. Also, the parameters of a method can be indicative of the types of functionality that will be provided by the method.
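The sketch below shows how these two signature-based metrics can be computed, under our reading of their usual definitions. Both start from a parameter occurrence matrix with one row per method and one column per distinct parameter type, holding 1 where the method takes a parameter of that type. CAMC is taken here as the fraction of ones in the matrix, and NHD as the average agreement between pairs of rows. Published variants differ in details (for example, whether the class's own type counts as a parameter), so treat this as an approximation rather than the definitive formulas. The input format is an assumption for this example.

```python
def occurrence_matrix(signatures):
    """One row per method, one column per distinct parameter type."""
    types = sorted({t for params in signatures.values() for t in params})
    return [[1 if t in params else 0 for t in types]
            for params in signatures.values()]


def camc(signatures):
    """Fraction of ones in the occurrence matrix (assumed CAMC variant)."""
    matrix = occurrence_matrix(signatures)
    if not matrix or not matrix[0]:
        raise ValueError("need at least one method and one parameter type")
    k, l = len(matrix), len(matrix[0])
    return sum(map(sum, matrix)) / (k * l)


def nhd(signatures):
    """Average agreement between pairs of rows (assumed NHD variant)."""
    matrix = occurrence_matrix(signatures)
    if len(matrix) < 2 or not matrix[0]:
        raise ValueError("need at least two methods and one parameter type")
    k, l = len(matrix), len(matrix[0])
    column_ones = [sum(row[j] for row in matrix) for j in range(l)]
    # In column j, c ones and (k - c) zeros disagree in c * (k - c) row pairs.
    disagreements = sum(c * (k - c) for c in column_ones)
    return 1 - 2 * disagreements / (l * k * (k - 1))


# Hypothetical signatures: method name -> set of parameter type names.
example = {"scale": {"int", "str"}, "shift": {"int"}}
```

For the example, the matrix is [[1, 1], [1, 0]], so camc(example) is 0.75 and nhd(example) is 0.5. Both values reach 1.0 when every method takes the same set of parameter types.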
Metric: External Cohesion
Viewing class cohesion outside of the context of the rest of the software system can also be problematic. When measuring internal cohesion for the JDK and Eclipse using standard metrics, the results may seem poor |5|. The problem with considering cohesion locally is that classes can often have several valid implementations, which may affect the number and role of instance variables of the class. It is also the case that many classes will have some methods that work on one set of instance variables while other methods work on a different set, yet together they help to maintain the overall state of the class within a software system. This leads to the idea of considering external cohesion |5|, which attempts to consider the context in which a class will be used within a software system. For instance, a Person class may appear to have low cohesion when examined using the LCOM or CAMC metrics, since it will probably have instance variables for name and age, and the methods that affect one of those variables will probably not affect the other. Yet a Person class will generally be highly cohesive when its role within the software system is considered.
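A small sketch of the Person example follows. The method names and the hand-written usage map are hypothetical. The point is only that the two methods touch disjoint instance variables, which is what an internal, variable-sharing metric would penalize, even though the class plainly models one concept.

```python
class Person:
    """One clear role in the system, but methods that share no variables."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def rename(self, new_name):
        self.name = new_name  # touches only name

    def have_birthday(self):
        self.age += 1  # touches only age


# Hand-written map of which instance variables each method uses.
PERSON_USAGE = {"rename": {"name"}, "have_birthday": {"age"}}


def share_variables(usage, method_a, method_b):
    """True if the two methods use at least one common instance variable."""
    return bool(usage[method_a] & usage[method_b])
```

Here share_variables(PERSON_USAGE, "rename", "have_birthday") is False, so a metric based purely on shared variables would count this pair against the class.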
Empirical Studies
Many papers have been written regarding the practical application of these metrics in user studies. One of these papers attempts to show that developers with different backgrounds often regard the cohesiveness of a class quite differently |6|. For instance, in Object-oriented Cohesion Subjectivity Amongst Experienced and Novice Developers: an Empirical Study |6|, researchers found that experienced developers did not consider the number of methods in a class to be strongly indicative of a class's cohesion, while the novice developers in the study found that the more methods a class had, the more cohesive the class was. Once again, subjectivity and skill level seem to shape developers' views of class cohesion.
Aspects appropriate to include in CSC/ECE 517
It seems that no clear metric for determining the cohesion of a class exists. There have been many attempts to define metrics that measure class cohesion both internally and externally. While these studies are interesting and important to the field of software engineering, I am not sure that they are appropriate to include within the CSC/ECE 517 course (with the exception of wiki research assignments).
On the other hand, the disadvantages of low cohesion and how to identify low cohesion seem like topics that could be covered in greater depth. It is important to understand how cohesion affects code robustness, reliability, reusability, and understandability. A focus on developing test cases for software projects in this course may be a good way to help students recognize low cohesion and ultimately become better developers.
References
[3] Lakhotia, A. (1993). Rule-based Approach to Computing Module Cohesion, In Proceedings of the 15th international conference on Software Engineering, Baltimore, Maryland.
[4] Counsell, S., Swift, S., Crampton, J. (2006). The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design, ACM Transactions on Software Engineering and Methodology, Volume 15, pp. 123-149.
[5] Mäkelä, S., Leppänen, V. (2007). External Views on Class Cohesion, Proceedings of the 2007 international conference on Computer systems and technologies, Bulgaria.
[6] Counsell, S., Swift, S., Tucker, A., Mendes, E. (2006). Object-oriented Cohesion Subjectivity Amongst Experienced and Novice Developers: an Empirical Study, ACM SIGSOFT Software Engineering Notes, Volume 31, pp. 1-10.
[7] Cohesion