CSC/ECE 517 Fall 2009/wiki3 4 dt

From Expertiza_Wiki
Revision as of 23:37, 18 November 2009 by Tvishwa (talk | contribs)

Introduction

The DRY (Don’t Repeat Yourself) principle states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system. It is a software engineering principle that applies to development, build, testing, deployment and documentation, and therefore to every level of the software development life cycle. It was formulated by Andy Hunt and Dave Thomas with the intention that a change to a single element of code or data should not require changes to other, logically unrelated elements. Following the principle makes software easier to maintain and easier to understand. The DRY principle is not confined to coding; it is much broader and extends to any duplication of data. This article looks at several real-world situations in which data is duplicated and describes the pros and cons of each.

Examples of the DRY principle on data

Database

In a database it is desirable to keep a single copy of each piece of data. A database, especially one that satisfies the third normal form, stores a given piece of information only once. This uses storage efficiently, since less space is needed to hold the data. It also makes the data easier to maintain and update, because a change needs to be made in only one place. Finally, it makes the database less prone to errors: if multiple copies of the same data exist, they can drift apart and stop matching, whereas a single copy leaves no room for a mismatch.
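The "change in only one place" property can be sketched with Python's built-in sqlite3 module. The schema below (a `customers` table and an `orders` table) and the sample rows are illustrative assumptions, not taken from any particular system. The customer's city is stored once, so a single UPDATE is seen by every order.

```python
import sqlite3

def build_normalized_db():
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL            -- stored once per customer
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            item        TEXT NOT NULL     -- no copy of the city here
        );
    """)
    conn.execute(
        "INSERT INTO customers (id, name, city) VALUES (1, 'Alice', 'Raleigh')"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
        [(1, "book"), (1, "pen")],
    )
    conn.commit()
    return conn

def cities_on_orders(conn):
    """Look up the city for each order through the single customers row."""
    return conn.execute("""
        SELECT o.item, c.city
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
    """).fetchall()

conn = build_normalized_db()
# One update in one place; every order reflects it through the join.
conn.execute("UPDATE customers SET city = 'Durham' WHERE id = 1")
rows = cities_on_orders(conn)
print(rows)
```

Had the city been copied into every row of `orders`, the same change would need one update per copy, and a missed row would leave the data inconsistent.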

Duplication of information

There are certain scenarios in which duplication is hard to avoid. Suppose a client wants a certain feature implemented and wants it to run on several platforms. Even though the basic requirement of the feature remains the same, a different document needs to be written for each target platform, as each has its own programming language, libraries and development environment. Hence there will be separate documents even though they share definitions and procedures.

The platforms may also differ only in the version of the operating system or in the processor. Even in this case, duplication of both the code and the documentation often cannot be avoided.

There are workarounds that minimize the duplication. Consider documentation in the code. Programmers are taught to comment their code. Under the DRY principle, the low-level knowledge belongs in the code and the comments should be reserved for higher-level explanations. A comment that merely restates the code duplicates knowledge, so any change in the code would also require a change in the comment, which is not desirable.
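As a small illustration of the point about comments, the sketch below contrasts a comment that repeats the code with one that explains intent. The function names and the one-day session rule are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def is_expired_redundant(age_seconds):
    # Return True if age_seconds is greater than 86400.
    # (This comment repeats the code. If the limit changes, the
    # number in the comment must be changed too, or it becomes wrong.)
    return age_seconds > SECONDS_PER_DAY

def is_expired(age_seconds):
    # Sessions are trusted for one day only; older ones must log in again.
    # (This comment gives the reason, which the code cannot express,
    # and leaves the low-level detail to the code itself.)
    return age_seconds > SECONDS_PER_DAY

print(is_expired(10), is_expired(SECONDS_PER_DAY + 1))
```

Both functions behave identically; only the second keeps each piece of knowledge in one place.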


Caching

A cache is a collection of duplicate data whose original is stored elsewhere. When retrieving the original is expensive, the cache is used instead for better performance and lower latency. Caching is applied in many places, such as the web, networking and computer architecture. It clearly violates the DRY principle to a large extent, especially in processors, where several levels of cache are maintained. The objective there is to reduce the number of processor cycles required to execute an instruction, so that those cycles can be used effectively elsewhere. On the web, caching has been shown to improve the speed of data transfer, and it reportedly improves bandwidth utilization by approximately 40%. Most caching schemes rely on the fact that a user is likely to access data that has already been accessed before.

Advantages:

1. Enhanced performance and effective utilization of resources.
2. End user satisfaction.

Disadvantages:

1. More than one cache may operate at various levels, and any change in the data has to be propagated to all of them. If the caches are not updated periodically or whenever the data changes, the user may get stale or incorrect data.
2. The duplicate data may consume a lot of space, which may itself affect performance.

The trade-off should be made at design time. Taking all of the above factors into account, and depending on the requirements and the emphasis placed on each criterion, data duplication can be introduced or avoided.
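The benefit of caching and the risk of stale copies can both be seen in a minimal sketch. The `CachedStore` class and its method names are hypothetical, and a plain dictionary stands in for the expensive backing store.

```python
class CachedStore:
    """A toy read cache in front of a backing store (illustrative only)."""

    def __init__(self, backing):
        self.backing = backing      # the authoritative copy of the data
        self.cache = {}             # the duplicate copy kept for speed
        self.backing_reads = 0      # counts the "expensive" retrievals

    def get(self, key):
        # Go to the backing store only on a cache miss.
        if key not in self.cache:
            self.backing_reads += 1
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def put_without_invalidation(self, key, value):
        # Updates the original but forgets the duplicate: the cache goes stale.
        self.backing[key] = value

    def put(self, key, value):
        # Updates the original and drops the cached copy so the next
        # read fetches the new value.
        self.backing[key] = value
        self.cache.pop(key, None)

store = CachedStore({"price": 10})
first = store.get("price")                   # miss: read from the backing store
second = store.get("price")                  # hit: served from the cache
store.put_without_invalidation("price", 12)
stale = store.get("price")                   # still the old cached value
store.put("price", 15)
fresh = store.get("price")                   # miss again: the new value
print(first, second, stale, fresh, store.backing_reads)
```

The repeated read costs nothing extra, which is the performance gain, but the update that skips the cache returns an outdated value. With several cache levels, every level would need the same invalidation step.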


Conclusion