CSC/ECE 517 Fall 2009/wiki3 4 dt: Difference between revisions
Line 26: | Line 26: | ||
Caching is collection of duplicate data which is already stored elsewhere but when the operations of retrieval are expensive, the cache is used for enhanced performance and low latency. It is applied in many places like web, networking, computer architecture etc. Caching clearly violates DRY principle to a large extent because especially in processors, various levels of caches are maintained. The objective is to reduce the number of processor cycles required to execute the instruction, so that it can use that number of cycles effectively elsewhere. In the web, caching has proven to improve the speed of data transfer and also the bandwidth utilization is approximately increased by 40%. Most of the caching principles utilize the fact that user may want to access the same data that has been accessed many times before<br />. | Caching is collection of duplicate data which is already stored elsewhere but when the operations of retrieval are expensive, the cache is used for enhanced performance and low latency. It is applied in many places like web, networking, computer architecture etc. Caching clearly violates DRY principle to a large extent because especially in processors, various levels of caches are maintained. The objective is to reduce the number of processor cycles required to execute the instruction, so that it can use that number of cycles effectively elsewhere. In the web, caching has proven to improve the speed of data transfer and also the bandwidth utilization is approximately increased by 40%. Most of the caching principles utilize the fact that user may want to access the same data that has been accessed many times before<br />. | ||
'''Advantages:''' | '''Advantages:''' | ||
* Enhanced performance and effective utilization of resources | * Enhanced performance and effective utilization of resources | ||
* End user satisfaction. <br /> | * End user satisfaction. <br /> | ||
'''Disadvantages:''' | |||
* There can be scenarios of more than one cache operating at various levels. Any change in the data has to be updated at all these levels. If not updated periodically or whenever there is a change, there are potential chances that user may get the corrupt data. | '''Disadvantages:''' | ||
*The duplicate data may consume a lot of space and may affect performance. | * There can be scenarios of more than one cache operating at various levels. Any change in the data has to be updated at all these levels. If not updated periodically or whenever there is a change, there are potential chances that user may get the corrupt data. | ||
*The duplicate data may consume a lot of space and may affect performance. | |||
The trade off should be made at the design level, taking into consideration all of the above factors and based on the requirements and emphasis on each criterion, data duplication can be induced or avoided. | The trade off should be made at the design level, taking into consideration all of the above factors and based on the requirements and emphasis on each criterion, data duplication can be induced or avoided. | ||
Revision as of 01:45, 19 November 2009
Terminology
- DRY - Don't repeat yourself
- SCM - Software configuration management
- SDLC - Software development life cycle
- REST - Representational state transfer
- URI - Uniform Resource Identifier
- UI - User Interface
Introduction
The DRY principle states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system.Pragmatic Programmer, The: From Journeyman to Master It is a software engineering principle for efficient software development, build, test, deployment and documentation. It can be applied to all the levels in the software development life cycle. It was formulated by Andy Hunt and Dave Thomas with an intention that a change in the data / code in a single element should not affect other unrelated elements. There are advantages of following this principle in software like ease of maintenance, good understanding of the code etc. DRY principle is not only confined to coding methodologies but is much broader and is extended to any duplication of data. This article throws some light upon various instances where data is duplicated in the real world. Each of the instances are discussed in detail.
Examples of DRY principal on data
Database
In case of a database it is desirable to have a single copy of data. A database, specially the one that satisfies the third normal form, has the feature that a single piece of information is stored only once. This leads to efficient use of memory as less memory would be required for storing data. It becomes easier to maintain and update because changes need to be made only in one place. This also makes the database less prone to errors because if there were multiple copies of this data then it is quite possible that they do not match. But if only one copy is present then there is no question of mismatch of data.
Specific Project requirements
There are certain scenarios in which duplication is hard to avoid. Suppose there is a client and it wants to have a certain feature implemented. Also, it desires that this feature should be able to run on different platforms. Even though the basic requirement of the feature will remain the same, different documents needs to be written for different target platforms as each would have it’s own programming language, libraries and development environment. Hence, there will be different document even thought they share definitions and procedures.
Also the platforms may differ only in the version of the operating system, or the processor might be different. In this case both the code and the documentation duplication cannot be avoided.
There are certain workarounds in order to minimize the duplication. For instance, let us discuss about documentation in the code. A programmer is taught to comment the code. The DRY principle says that the low-level knowledge should be in the code and the higher level explanations should be kept for the comments because otherwise we are duplicating knowledge and any change in the code would also require a change in the comments which is not desirable.
Caching
Caching is collection of duplicate data which is already stored elsewhere but when the operations of retrieval are expensive, the cache is used for enhanced performance and low latency. It is applied in many places like web, networking, computer architecture etc. Caching clearly violates DRY principle to a large extent because especially in processors, various levels of caches are maintained. The objective is to reduce the number of processor cycles required to execute the instruction, so that it can use that number of cycles effectively elsewhere. In the web, caching has proven to improve the speed of data transfer and also the bandwidth utilization is approximately increased by 40%. Most of the caching principles utilize the fact that user may want to access the same data that has been accessed many times before
.
Advantages: * Enhanced performance and effective utilization of resources * End user satisfaction.
Disadvantages: * There can be scenarios of more than one cache operating at various levels. Any change in the data has to be updated at all these levels. If not updated periodically or whenever there is a change, there are potential chances that user may get the corrupt data. *The duplicate data may consume a lot of space and may affect performance.
The trade off should be made at the design level, taking into consideration all of the above factors and based on the requirements and emphasis on each criterion, data duplication can be induced or avoided.
Configuration management
Software configuration management has steadily grown in importance over the past decade and it has become mandatory for any software application now. It is considered to be one of the best solutions to handle changes in the code and documents. From the requirement gathering phase through the design, development and till the testing phase, many elements are chosen as configuration item. Most common SCM tools are IBM Rational Clearcase, SVN, WinCVS etc. The SCM software keeps multiple copies of the data (both code and documents) and the history of each is maintained from the creation and the changes need to be updated periodically. It is clearly a violation of DRY principle. There is a purpose behind having multiple copies of data and allowing multiple checkouts for the same file.
Advantages:
- More than one user can work on the same code with his own copy and merge back into the development branch.
- The client sometimes may require previous versions of the working code for various reasons. Although most of the functionality in the newer version may be common, it makes sense to maintain a complete version of the previous releases.
- There may be a bug related to the older versions, which might have come up after many releases.
Disadvantages:
- It should be ensured that the newer version should contain sufficient amount of changes from the previous one. In such a case it is reasonable to have another copy of the data.
Web Architecture
In most web 2.0 applications the users cannot freely access their data. Hence they cannot re-use their data in other web application. The current architecture for most web 2.0 applications is shown below.
Take for example "Flickr". It is popular because of the interesting features it provides e.g. tagging. The users are not concerned about the massive data storage facilities. Hence this application would still function the same even if the data layer as shown in the figure above is not owned by "Flickr". Also, "Flickr" allows users to add the "Flickr" photos to other applications.
This leads to a possibility of a general purpose data layer which not a part of any one particular web application. The web applications would only point to the data in this layer and the users can have full access and control of their data. The architecture shown below is of the future web application which has been put in place in some parts.
Since most of the applications in the internet community share data, for e.g. social networking websites, it would be very useful to have such an architecture as it would greatly minimize the duplication of data.
(Images have been taken from http://wolfbyte-net.blogspot.com/2009/01/ccd-red-degree-principle-staying-dry.html)
Conclusion
We have seen many scenarios above where the DRY principle is violated and in some of the cases it is very useful for the data. In all of those instances where the DRY principle is not followed, the designers have potential reasons for duplicating the data. In general for any system, it is ideal to have a single copy of data which will suffice the purpose. But it is not applicable for every scenario as mentioned in the article.