CSC/ECE 517 Fall 2009/wiki1a 3 ve


Strengths and Weaknesses of Distributed Version Control

Introduction

A version control system is a software package that allows developers to record and track how the source code for a project changes over time. Typically, each change is recorded in some form of database, along with who made the change and often a description of what the change was or why it was made.

Because each change is recorded, it is easy to compare the current version of a source file with any older version. It is also possible to restore any source file to the state it was in at a particular time in the past, which is useful when a change to a file breaks part of the project and needs to be undone.
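The recording, comparing, and restoring described above can be illustrated with a minimal sketch. It is not how any particular version control system stores its data; the `Revision` and `FileHistory` names are hypothetical, and the sketch keeps full copies of one file in memory purely for illustration.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str    # who made the change
    message: str   # what the change was or why it was made
    content: str   # full text of the file at this revision


@dataclass
class FileHistory:
    """Hypothetical in-memory history for a single source file."""
    revisions: list = field(default_factory=list)

    def check_in(self, author, message, content):
        # Record the change and return its revision number.
        self.revisions.append(Revision(author, message, content))
        return len(self.revisions) - 1

    def diff(self, old, new):
        # Compare two recorded versions line by line.
        a = self.revisions[old].content.splitlines()
        b = self.revisions[new].content.splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def restore(self, number, author):
        # Undo later changes by checking in the old content as a new revision,
        # so the history of the mistake is kept as well.
        old = self.revisions[number]
        return self.check_in(author, f"Restore r{number}", old.content)


history = FileHistory()
good = history.check_in("alice", "Initial version", "print('hi')\n")
bad = history.check_in("bob", "Change greeting", "print('bye')\n")
print("\n".join(history.diff(good, bad)))
history.restore(good, "alice")
```

Restoring by adding a new revision, rather than deleting the broken one, is a design choice made here so that nothing is ever removed from the recorded history.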

Centralized Version Control

Traditionally, version control systems such as CVS and Subversion have relied on each developer having access to a central server that holds the source code repository for the project. The repository contains a complete copy of all of the files in the project, along with all of the changes that have been made to each file since it was first added to the project.

When a new file needs to be added, the developer connects to the server with the repository and checks in the new file. When a developer wants to make a change to a file, he or she is usually expected to connect to the server and retrieve the current version of the file. The developer then makes the changes and checks in the file to the repository. This records the changes that were made and makes them available to the other developers working on the project.

Distributed Version Control

A new type of version control system called distributed version control has become popular in recent years. Distributed version control systems do not require a central server to maintain the repository for a project. Instead, each developer working on the project has his or her own repository, located on his or her workstation.

Any changes a developer makes to a file are tracked in the developer’s own repository. Each change is given a unique identifier. When developer A wants to share changes with other developers, developer A can either push the changes directly to another developer’s repository or ask the other developers to retrieve the changes from developer A’s repository.
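A rough sketch of this model follows. It assumes, for illustration only, that the unique identifier is a SHA-1 hash of the change's parent, author, and description; the `Repository` class and its methods are hypothetical and do not correspond to the API of any real tool. Merging the pulled changes into the working files is deliberately left out.

```python
import hashlib


class Repository:
    """Hypothetical per-developer repository holding change records."""

    def __init__(self, owner):
        self.owner = owner
        self.changes = {}   # change id -> (parent id, author, description)
        self.head = None    # id of the owner's most recent local change

    def commit(self, description):
        # Assumption: the identifier is derived from the change's own data,
        # so two repositories never need a server to hand out numbers.
        parent = self.head
        payload = f"{parent}|{self.owner}|{description}".encode()
        change_id = hashlib.sha1(payload).hexdigest()
        self.changes[change_id] = (parent, self.owner, description)
        self.head = change_id
        return change_id

    def pull(self, other):
        # Retrieve every change the other repository has that this one lacks.
        new_ids = [cid for cid in other.changes if cid not in self.changes]
        for cid in new_ids:
            self.changes[cid] = other.changes[cid]
        return new_ids

    def push(self, other):
        # Pushing is the same transfer, started from the sending side.
        return other.pull(self)


a = Repository("developer A")
b = Repository("developer B")
a.commit("Add parser")
a.commit("Fix parser edge case")
print(len(b.pull(a)))   # B retrieves A's two changes
print(len(b.pull(a)))   # nothing new the second time
```

Because identifiers are compared rather than positions in a list, pulling twice from the same repository transfers nothing the second time.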

Advantages of Distributed Version Control Systems

No network connectivity required

One of the more obvious advantages of a distributed system is that developers do not need a network connection or access to a central repository while they are working on a project. Since each developer has his or her own repository, work can continue without any form of connectivity. All changes are committed to the developer’s local repository and can be shared later, whenever the developer has a connection to other developers.

Changes can be kept private while preserving change history

Developers often want to keep their changes private until they are done working on them and are ready to share them. In a traditional centralized version control system, this can result in a developer checking in a large number of changes as a single revision rather than doing incremental check-ins for each change. Much of the change history is then lost.

In a distributed system the entire change history is stored in the developer’s local repository. When the developer shares his or her changes, the other developers also receive the history for the files involved in the change. This allows the complete change history to be preserved for each file in the project while still giving the developer control over when the changes are shared.

Project managers do not need to grant permission to developers

In a centralized version control system the project managers often have to grant each developer access to the repository before he or she can begin to make changes. In a distributed system this is not necessary, because a new developer only needs a reasonably current copy of the source code of a project to begin to make changes. Once the developer is ready to share his or her changes, the developer can send them to other developers, who can then choose either to accept the changes into their own repositories or to reject them.

No single point of failure

In a centralized version control system, if the server with the repository fails or becomes unavailable, developers cannot check in new changes or retrieve the change history of anything in the project. In a distributed system, if one repository goes down, everyone else can continue to work without any problems because everything is stored in their local repositories. If a developer’s local repository becomes corrupted or destroyed, he or she can simply get a new copy of the repository (perhaps from another developer) and continue working.

Disadvantages of Distributed Version Control Systems

Security

Sometimes it is desirable to be able to prevent certain developers from accessing certain sections of a project, particularly in a commercial development project. In a centralized system this can be accomplished by having the server only allow access to those parts of the project that the project managers have defined for each developer. In a distributed version control system it is very hard to do this because developers can share changes with each other and there is no easy way to determine what sections of a project a developer may have in his or her repository.

Conflict Resolution

In any version control system there will be the need for conflict resolution. If more than one developer makes changes to a single file in a project at approximately the same time a conflict can occur. Someone must compare the various versions of the file and merge the changes into a single new version before it can be checked in.

In a distributed version control system each developer works from his or her own repository, so conflicts are somewhat more likely to occur than in a centralized system. Fortunately, most distributed version control systems can resolve many conflicts automatically when a change is shared, typically when the competing changes touch different parts of a file. When a conflict must be resolved manually, the resulting merged changes can then be easily shared with the other developers working on the project.
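One common way to decide which conflicts can be resolved automatically is a three-way merge, which compares both developers' versions against the version they started from. The sketch below is a deliberately simplified illustration, not the algorithm of any specific tool: the `merge_lines` function is hypothetical and assumes that neither developer added or removed lines, so the three versions can be compared position by position.

```python
def merge_lines(base, ours, theirs):
    """Simplified three-way merge over lists of lines.

    Assumption: all three versions have the same number of lines.
    Returns (merged, conflicts), where conflicts holds the indexes of
    lines that both sides changed differently and that need a person.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")

    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)        # both sides agree (changed or not)
        elif o == b:
            merged.append(t)        # only the other developer changed it
        elif t == b:
            merged.append(o)        # only we changed it
        else:
            merged.append(None)     # both changed it differently
            conflicts.append(index)
    return merged, conflicts


base = ["x = 1", "y = 2", "z = 3"]
ours = ["x = 10", "y = 2", "z = 3"]
theirs = ["x = 1", "y = 2", "z = 30"]
print(merge_lines(base, ours, theirs))
```

In this usage example the two developers changed different lines, so the merge completes without manual work; had both edited the first line in different ways, its index would be reported as a conflict instead.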

Architectural Awareness

Distributed version control systems allow for great flexibility when choosing how changes should be shared. A project can be completely decentralized, with changes shared in a peer-to-peer fashion; it can resemble a centralized system; or it can use a mixture of the two.

Different developers will have different changes in their repositories, and it may be necessary to connect to the repositories of multiple developers to retrieve the latest version of each file. This can make retrieving a current copy of the source files for a project somewhat more difficult, especially if the files are being collected to make a release build for the project.

However, it is possible to set up a central repository that developers can send their changes to when they are ready to share them. This allows developers to take advantage of being able to work “offline” from the central repository and share changes directly with other developers while still maintaining a central location where changes can be checked in once the developer is happy with them.

Yet another possibility is for project leaders to set up a central repository but only allow project leaders to send changes to it. In this architecture the project leaders receive changes from the developers and choose whether or not to send them on to the central repository.
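The leader-controlled arrangement can be sketched as follows. This is only an illustration of the idea under simple assumptions: the `GatedRepository` class, the `review_and_forward` function, and the `approve` callback are all hypothetical names, and changes are represented as plain strings.

```python
class GatedRepository:
    """Hypothetical central repository that only project leaders may push to."""

    def __init__(self, leaders):
        self.leaders = set(leaders)
        self.changes = []

    def push(self, sender, change):
        if sender not in self.leaders:
            raise PermissionError(f"{sender} may not push to the central repository")
        self.changes.append(change)


def review_and_forward(leader, central, submissions, approve):
    """A leader forwards only the submitted changes that pass review.

    submissions is a list of (author, change) pairs received from developers;
    approve is the leader's decision function (an assumption of this sketch).
    """
    forwarded = []
    for author, change in submissions:
        if approve(author, change):
            central.push(leader, change)
            forwarded.append(change)
    return forwarded


central = GatedRepository(leaders=["carol"])
submissions = [("alice", "Fix login bug"), ("bob", "WIP: experimental cache")]
accepted = review_and_forward(
    "carol", central, submissions,
    approve=lambda author, change: not change.startswith("WIP"),
)
print(accepted)
```

Developers in this arrangement never write to the central repository themselves; an attempt to call `push` with a non-leader name raises `PermissionError`.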

In a centralized system, the developer knows that all changes are checked in to a single repository and that the latest files are always in that repository. When using a distributed system, developers must be aware of the overall architecture used by the project to ensure they share their changes in the expected way. A developer must learn whether he or she is expected to send changes to a central repository or to send them to someone else for approval. Some projects may not use a central repository at all and may simply require each developer to send changes to all of the other developers in the project.

Conclusion

Distributed version control systems are becoming very popular among development teams, and I believe the primary reason is that a distributed system can be set up to resemble virtually any architecture. Development teams can even set one up to act much like a centralized system while retaining some of the advantages of a distributed system: developers can work while disconnected from the repository, and change history is preserved even when a large group of changes is checked in to a central repository all at once (as long as the developer checked in each change incrementally to his or her own repository).

Unless a project needs tight control over which developers can access certain areas of the project, there is little reason for a development team to use a centralized version control system instead of a distributed one. However, cooperation between developers is essential for any project.

Some may see distributed version control systems as encouraging developers to work alone with little contact with other team members and only share their changes when they are finished. This can cause problems if the other team members disagree with the changes or how they were implemented. Proper planning and communication can go a long way to prevent this from happening in a project.

Common Distributed Version Control Systems

References