CSC/ECE 517 Fall 2009/wiki3 4 dt: Difference between revisions
Revision as of 00:02, 19 November 2009
Introduction
The DRY (Don’t Repeat Yourself) principle states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system. It is a software engineering principle that supports efficient development, build, test, deployment and documentation, and it can be applied at every level of the software development life cycle. It was formulated by Andy Hunt and Dave Thomas with the intention that a change to the data or code in a single element should not affect other, unrelated elements. Following the principle brings advantages such as easier maintenance and code that is easier to understand. The DRY principle is not confined to coding practices; it is much broader and extends to any duplication of data. This article looks at several real-world situations in which data is duplicated and describes the pros and cons of each.
Examples of the DRY principle applied to data
Database
For a database it is desirable to hold a single copy of each piece of data. A database, especially one that satisfies the third normal form, stores a single piece of information only once. This uses storage efficiently, since less space is needed to hold the data. The database also becomes easier to maintain and update, because a change needs to be made in only one place. It is less prone to errors as well: if multiple copies of the same data existed, they could easily stop matching, whereas with a single copy there is nothing to mismatch.
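The "change in only one place" property can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The schema, table names and sample rows below are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
import sqlite3

# Hypothetical schema: each customer's address is stored in exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', '12 Oak St')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "book"), (1, "lamp")],
)

# A single UPDATE changes the only stored copy of the address...
conn.execute("UPDATE customers SET address = '99 Elm Ave' WHERE id = 1")

# ...and every order picks up the new value through the join.
rows = conn.execute("""
    SELECT o.item, c.address
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)
```

Had the address been copied into every order row instead, the same change would need one update per order, and a missed row would leave the copies in disagreement.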
Specific Project requirements
There are certain scenarios in which duplication is hard to avoid. Suppose a client wants a certain feature implemented and also wants that feature to run on different platforms. Even though the basic requirement of the feature remains the same, different documents need to be written for the different target platforms, as each has its own programming language, libraries and development environment. Hence there will be different documents even though they share definitions and procedures.
The platforms may also differ only in the version of the operating system or in the processor. In such cases, duplication of both the code and the documentation cannot be avoided.
There are workarounds that minimize the duplication. Consider documentation inside the code. Programmers are taught to comment their code. Under the DRY principle, low-level knowledge belongs in the code and comments should be reserved for higher-level explanations; otherwise the same knowledge is duplicated, and every change to the code also requires a change to the comments, which is undesirable.
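The point about comments can be made concrete with a small Python sketch. The function names and the scheduler mentioned in the comment are hypothetical; the example only contrasts a comment that restates the code with one that explains intent.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds_redundant(days):
    # Multiply days by 86400 and return the result.
    # (This comment repeats the code: if the constant changes,
    # the comment must be edited as well.)
    return days * 86400


def days_to_seconds(days):
    # Timeouts are configured in days, but the (hypothetical) scheduler
    # expects seconds. The comment says why; the code says how.
    return days * SECONDS_PER_DAY


print(days_to_seconds(2))
```

Both functions compute the same value, but only the first one stores the low-level detail twice, once in the code and once in the comment.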
Caching
A cache is a collection of duplicate data that is already stored elsewhere; when retrieving the original is expensive, the cache is used to improve performance and reduce latency. Caching is applied in many areas, such as the web, networking and computer architecture. It clearly violates the DRY principle, especially in processors, where several levels of cache are maintained. There the objective is to reduce the number of processor cycles needed to execute an instruction, so that the saved cycles can be used elsewhere. On the web, caching has been shown to speed up data transfer, and bandwidth utilization is reported to improve by approximately 40%. Most caching schemes rely on the fact that a user is likely to access data that has already been accessed many times before.
Advantages:
- Enhanced performance and effective utilization of resources
- End user satisfaction.
Disadvantages:
- More than one cache may operate at different levels, and any change in the data has to be propagated to all of them. If the caches are not updated periodically or whenever the data changes, the user may receive stale or inconsistent data.
- The duplicate data may consume a lot of space, which can itself affect performance.
The trade-off should be made at design time, taking all of the above factors into account. Depending on the requirements and the emphasis placed on each criterion, data duplication can be introduced or avoided.
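The trade-off described in this section can be sketched in a few lines of Python. The class below is a hypothetical, single-level illustration rather than a real caching library: it keeps a duplicate of each value for speed and discards that duplicate whenever the authoritative copy changes.

```python
class CachedStore:
    """A dict-backed 'slow' store fronted by a cache (illustrative only)."""

    def __init__(self, backing):
        self._backing = backing   # authoritative copy of the data
        self._cache = {}          # duplicate copy, kept only for speed
        self.backing_reads = 0    # counts the expensive reads

    def get(self, key):
        if key not in self._cache:
            # Cache miss: pay for one read from the backing store.
            self.backing_reads += 1
            self._cache[key] = self._backing[key]
        return self._cache[key]

    def put(self, key, value):
        # Update the authoritative copy, then drop the duplicate so
        # that later reads cannot return a stale value.
        self._backing[key] = value
        self._cache.pop(key, None)


store = CachedStore({"price": 10})
first = store.get("price")    # miss: read from the backing store
second = store.get("price")   # hit: served from the cache
store.put("price", 12)        # invalidates the cached duplicate
third = store.get("price")    # miss again, returns the new value
print(first, second, third, store.backing_reads)
```

Removing the `pop` call in `put` would make the third read return the old value, which is the stale-data risk listed under the disadvantages. With several cache levels, the same invalidation would have to reach every level.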
Configuration management
Software configuration management (SCM) has steadily grown in importance over the past decade and is now mandatory for any software application. It is considered one of the best ways to handle changes to code and documents. From requirements gathering through design and development to testing, many elements are chosen as configuration items. Common SCM tools include IBM Rational ClearCase, SVN and WinCVS. SCM software keeps multiple copies of the data (both code and documents), maintains their history from creation onward, and requires changes to be checked in periodically. This is clearly a violation of the DRY principle, but there is a purpose behind keeping multiple copies of the data and allowing multiple checkouts of the same file.
Advantages:
- More than one user can work on the same code, each with their own copy, and merge the changes back into the development branch.
- The client may sometimes require previous versions of the working code for various reasons. Although most of the functionality may be shared with the newer version, it makes sense to maintain complete copies of previous releases.
- A bug in an older version may surface only after many later releases, and the stored copy of that version is then needed to investigate it.
Disadvantages:
- The newer version should contain a sufficient amount of change from the previous one; only then is it reasonable to keep another copy of the data.