CSC/ECE 517 Fall 2009/wiki3 4 br

Topic: DRY principle for data
Most of the literature on the DRY principle relates to code. But the principle also applies to data. Survey the issues related to copying the same data, and give reasons (e.g., caching) why it might sometimes be desirable, and how one should decide where it is helpful not to follow this general rule.
Latest revision as of 02:25, 24 November 2009

Don't Repeat Yourself - Introduction

DRY (or Don't Repeat Yourself) is a software engineering principle that says that "every piece of knowledge must have a single, unambiguous, authoritative representation within a system" [1]. Applying DRY breaks a system down into smaller parts, with logically unrelated pieces separated, so that one element can be changed without affecting the rest of the system. DRY also keeps related code together and ensures that the same code (or even just the same functionality) does not appear in two different locations in the system. As a result, fixing one bug, or enhancing one part of the system, does not leave code (or functionality) somewhere else unmodified and out of sync.

This article briefly explains how the DRY principle applies not only to code but also to data. It further describes situations in which violating the DRY principle is accepted because it yields gains that would not be possible without some way of duplicating information. In such situations, it is extremely important for the original source of the information to be known.

DRY principle - Code

The reason for not wanting more than one representation of something in a system is simple: if something is represented in more than one way, the different representations are likely to drift out of sync over time. As Dave Thomas, co-author of Programming Ruby: The Pragmatic Programmers' Guide, says, "A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation." [2]

  • Example: Violation of the DRY principle
 public class Student {
   private String name;
   private String address;
   private String gpa;
   public String getName() { return name; }
   public String getAddress() { return address; }
   public String getGPA() { return gpa; }
   // ... other methods and data ...
 }
 public class Employee {
   private String name;
   private String address;
   private String salary;
   public String getName() { return name; }
   public String getAddress() { return address; }
   public String getSalary() { return salary; }
   // ... other methods and data ...
 }
  • Example: DRY Principle
 public class Person {
   private String name;
   private String address;
   public String getName() { return name; }
   public String getAddress() { return address; }
 }
 public class Student {
   private Person me;
   private String gpa;
   public String getName() { return me.getName(); }
   public String getAddress() { return me.getAddress(); }
   public String getGPA() { return gpa; }
   // ... other methods and data ...
 }
 public class Employee {
   private Person me;
   private String salary;
   public String getName() { return me.getName(); }
   public String getAddress() { return me.getAddress(); }
   public String getSalary() { return salary; }
   // ... other methods and data ...
 }

The same reasoning applies to classes where the common code does not simply return a string but performs some kind of procedure or calculation. In such a scenario, if a programmer changed one copy of the procedure without being aware of the other, the two copies would now differ and could even produce different results.
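The scenario above can be sketched in code. The following is only an illustration: the mailingLabel() procedure and its formatting rule are hypothetical and do not appear in the original example. The point is that the procedure is implemented once, in Person, so a change to it cannot leave Student and Employee producing different results.

```java
// Hypothetical sketch: mailingLabel() is an illustrative procedure,
// not part of the original Student/Employee example.
class Person {
    private final String name;
    private final String address;

    Person(String name, String address) {
        this.name = name;
        this.address = address;
    }

    // The single, authoritative implementation of the formatting procedure.
    String mailingLabel() {
        return name.trim() + "\n" + address.trim();
    }
}

class Student {
    private final Person me;

    Student(Person me) { this.me = me; }

    // Delegates to Person instead of re-implementing the procedure.
    String mailingLabel() { return me.mailingLabel(); }
}

class Employee {
    private final Person me;

    Employee(Person me) { this.me = me; }

    // Delegates to Person instead of re-implementing the procedure.
    String mailingLabel() { return me.mailingLabel(); }
}
```

If the label format ever needs to change, only Person.mailingLabel() is edited, and both Student and Employee pick up the change.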

DRY principle - Data

Most of the documentation on the DRY principle discusses the duplication of code in the methods and functions of a system; however, the principle also applies to data. Data can be created, passed around, copied and destroyed, and in general it should not be duplicated, for the following reasons:

  • Stale data - a duplicate must be kept in sync with its source, or it becomes stale and no longer valid.
  • Overhead - to keep a duplicate from becoming stale, it must be updated whenever the original source is updated, which adds processing just to keep the data in sync.
  • Additional memory - duplicated data usually requires more memory to store and manipulate.
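As a rough illustration of the stale-data and overhead problems, consider the sketch below. The PriceList class and its method names are assumptions made up for this example. The list of prices is the authoritative source, and the stored total is a duplicate that is only correct as long as every update remembers to maintain it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are illustrative only.
class PriceList {
    // Authoritative source: the individual prices, in cents.
    private final List<Integer> pricesInCents = new ArrayList<>();
    // Duplicated data: a stored copy of the sum (extra memory, extra upkeep).
    private int storedTotalInCents = 0;

    // Correct update: pays the overhead of keeping the duplicate in sync.
    void addPrice(int cents) {
        pricesInCents.add(cents);
        storedTotalInCents += cents;
    }

    // Buggy update: changes the source but forgets the duplicate.
    void addPriceForgettingTotal(int cents) {
        pricesInCents.add(cents);
    }

    // Reads the duplicate, which may be stale.
    int storedTotal() {
        return storedTotalInCents;
    }

    // Derives the value from the authoritative source every time.
    int computedTotal() {
        int sum = 0;
        for (int cents : pricesInCents) {
            sum += cents;
        }
        return sum;
    }
}
```

After one call to addPrice(100) and one call to addPriceForgettingTotal(250), storedTotal() still reports 100 while computedTotal() reports 350: the duplicate has gone stale.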

Scenarios where DRY principle is violated

There are certain scenarios, however, where duplication is not only acceptable but also desirable. To avoid the problems just mentioned, certain rules should be followed.

Source Version Control

Source version control is a management control tool that tracks different versions of data. It is usually used by software development teams in which several people might be making changes to the same file. It allows multiple copies of the data, sometimes with slight variations, to exist in different branches and tags. This is good practice for such a tool, since at any given time there could be code in several stages: development, testing and production. Two such tools that directly violate the DRY principle are SVN and CVS.

This is a direct violation of the DRY principle, since at any given time there could be copies of the same file in different branches, allowing developers to work concurrently in different areas of the code without affecting one another. Developers can even work on different versions independently and merge them together at a later time.

Caching

Caching is the duplication of data that has been retrieved from a certain location or calculated at an earlier time. It is used when retrieving or calculating the data again would be expensive in time or processing power. Caching has proved useful in numerous applications, including caching large amounts of data transferred over a network and caching, inside the processor, data that has been retrieved from memory.

This violates the DRY principle by copying data locally in order to achieve faster processing than would be possible if the same data had to be fetched each time across the network or from another slower location.
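A minimal sketch of this idea follows, assuming a hypothetical ReadThroughCache class (the name and its methods are invented for illustration). The cache holds local copies, but every copy comes from a single, known authoritative source, and a copy can be discarded when the source changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a cache in front of a slow, authoritative source.
class ReadThroughCache<K, V> {
    private final Function<K, V> source;          // the authoritative source
    private final Map<K, V> copies = new HashMap<>(); // local duplicates
    private int sourceLookups = 0;                // counts the expensive fetches

    ReadThroughCache(Function<K, V> source) {
        this.source = source;
    }

    // Returns the local copy if present; otherwise fetches from the source once.
    V get(K key) {
        if (!copies.containsKey(key)) {
            sourceLookups++;
            copies.put(key, source.apply(key));
        }
        return copies.get(key);
    }

    // Discards the local copy so the next get() goes back to the source.
    void invalidate(K key) {
        copies.remove(key);
    }

    int sourceLookups() {
        return sourceLookups;
    }
}
```

Repeated get() calls for the same key reach the source only once. If the source changes, the cached copy stays stale until invalidate() is called, which is why the authoritative source must always be known.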

Documentation

Documentation can be automatically extracted and generated from code, which essentially duplicates code and comments to create the document. This is a useful technique, since updated documentation can be produced directly from the modified code without requiring anyone to edit the document manually. It also ensures that the document is up to date with the latest code.

Developers therefore spend less time updating documentation every time the code changes. It is nevertheless a violation of the DRY principle, as the same data or code now appears in two different locations.
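The following is a simplified sketch of generating documentation from code. It is not how any particular documentation tool works: the @Doc annotation, the Account class and the DocGenerator class are all hypothetical names introduced for this example. The descriptions live only in the source code, and the document is regenerated from them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical annotation holding the description next to the code.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Doc {
    String value();
}

// Hypothetical class whose documentation is generated.
class Account {
    @Doc("Returns the current balance in cents.")
    long balance() { return 0L; }

    @Doc("Adds the given amount in cents to the balance.")
    void deposit(long cents) { }
}

class DocGenerator {
    // Builds a plain-text document from the annotations on a class's methods.
    static String generate(Class<?> type) {
        StringBuilder out = new StringBuilder(type.getSimpleName()).append('\n');
        Method[] methods = type.getDeclaredMethods();
        // Reflection does not guarantee an order, so sort by name.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            Doc doc = m.getAnnotation(Doc.class);
            if (doc != null) {
                out.append("  ").append(m.getName())
                   .append(": ").append(doc.value()).append('\n');
            }
        }
        return out.toString();
    }
}
```

Calling DocGenerator.generate(Account.class) yields a duplicate of the descriptions, but the source code remains the authoritative copy, and the document can be regenerated whenever the code changes.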

How to determine when to duplicate data

When considering whether data duplication might be helpful and acceptable, one must "identify the single, definitive source of every piece of knowledge used in your system, and then use that source to generate applicable instances of that knowledge (code, documentation, tests, etc)." [3] The duplication of data should only be considered when it will save time, resources, or both. Furthermore, the authoritative source of the data must be well known. In short, you should only consider data duplication if:

  • It simplifies the system in a case where up-to-date or precise information is not needed, such as a result calculated at one point in time that does not need to be recalculated later with updated information.
  • It increases system performance without greatly decreasing maintainability.
  • There is a benefit in being able to access different versions of the code over time, or in sharing code between developers so that throughput is increased, as in the case of source control tools.

Conclusion

Ideally, code and data should not be duplicated, which keeps maintenance and synchronization of information easy. However, this document has described several scenarios where violating the DRY principle with regard to the duplication of data is acceptable. In all of the examples provided, duplicating data saved network bandwidth, processing time, or human time. Also, in all of the examples, a single source of data remained the authoritative source.

References

1. The DRY Principle, http://en.wikipedia.org/wiki/DRY

2. Orthogonality and the DRY Principle - A Conversation with Andy Hunt and Dave Thomas, Part II, http://www.artima.com/intv/dry.html

3. http://c2.com/cgi/wiki?DontRepeatYourself

Further Readings

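1. Applying the DRY principle to writing computer code, http://www.stat.auckland.ac.nz/~paul/ItDT/HTML/node23.html

2. Refactoring to comply with the DRY Principle, http://reinholdweber.com/css/refactoring-your-css-styles-to-comply-with-the-dry-principle/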