CSC/ECE 517 Fall 2010/ch1 1a vc

Introduction

Version control systems are typically used in software organizations, primarily to support collaborative work. When a team works on a project with common resources (files, folders, information, etc.), it becomes essential to protect the integrity of the data whenever those resources are editable.

For example, suppose 10 people are working on a project and 3 of them are working on the same file. It becomes important to define how they can add or modify data so that nobody is locked out waiting for a lock held by someone else, and no inconsistency creeps into the data. Also, at any point in the life cycle of the software, the developers might need to go back to an earlier state of a file. Version control systems allow them to return to any previous version of a file, as each version is stored in the central repository and is given a label.

Version control systems typically let us create versions of the resources so that each developer can work on a different version, which provides flexibility. Newer version control systems offer many further features. The subsequent sections look at the evolution and history of these systems.

Taxonomy of Version Control

Branch: When the development team needs to work on two distinct copies of a project, a branch is created. [4] The branch is a replica of the existing code at the time of its creation. Once the branch is created, changes made to it are confined to it and are not visible in any other branch.

Repository: The repository is the data store where all the versions of all the files in the project are stored. A repository may be centralized or distributed.

Check-Out: Creating a working copy of the files in the repository on a local machine is called check-out. Every user who checks out a file has a working copy of the file on their own computer. Changes made to the working copy are not visible to the other team members.

Check-In: Putting the edited files back into the repository for use by everybody is called check-in.

Merge: Merging is the process of combining different working copies of a file into the repository.[3] Each developer is allowed to work on an independent working copy, and everyone's changes to a single file are combined by merging.

Label: A label is an identifier given to a branch when it is created. It is also known as a tag.

Trunk: The highest location of the repository is called the trunk.
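The terms above can be tied together with a small in-memory model. This is only an illustrative sketch, not the design of any real version control system: the class `ToyRepository` and all of its method names are hypothetical, and it assumes that a label is attached to the newest version on a branch.

```python
class ToyRepository:
    """In-memory stand-in for a repository; every name here is hypothetical."""

    def __init__(self):
        # branch name -> list of versions (oldest first);
        # each version maps a file path to its content
        self.branches = {"trunk": [{}]}
        # label -> (branch name, version index)
        self.labels = {}

    def check_out(self, branch="trunk"):
        """Return a private working copy of the newest version on a branch."""
        return dict(self.branches[branch][-1])

    def check_in(self, working_copy, branch="trunk"):
        """Store the working copy as a new version and return its index."""
        self.branches[branch].append(dict(working_copy))
        return len(self.branches[branch]) - 1

    def create_branch(self, name, source="trunk"):
        """Start a branch as a replica of the source branch's newest version."""
        self.branches[name] = [dict(self.branches[source][-1])]

    def label(self, name, branch="trunk"):
        """Attach a label (tag) to the newest version on a branch."""
        self.labels[name] = (branch, len(self.branches[branch]) - 1)

    def get_labeled(self, name):
        """Return a copy of the version that a label points at."""
        branch, index = self.labels[name]
        return dict(self.branches[branch][index])


repo = ToyRepository()

work = repo.check_out()
work["README"] = "first draft"
before_check_in = repo.check_out()   # the edit is not yet visible to others
repo.check_in(work)                  # now it is
repo.label("v1.0")

repo.create_branch("experiment")
exp = repo.check_out("experiment")
exp["README"] = "experimental rewrite"
repo.check_in(exp, branch="experiment")  # confined to the branch
```

After this runs, the trunk still holds "first draft" while the "experiment" branch holds the rewrite, and the label "v1.0" keeps pointing at the version that was current when it was applied.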


History of Open Source Version Control Systems

Source Code Control System (SCCS)

Exactly when and how version control systems came into existence is unclear. However, their roots can be roughly traced to the Source Code Control System (SCCS) [3],[6], which was developed at Bell Labs in 1972. SCCS is no longer prevalent, but its file format is still used internally by other software such as BitKeeper and Sablime.[6]

Revision Control System (RCS)

The next generation of version control systems was RCS. It was first released in 1982 and was developed by Walter F. Tichy of Purdue University. It was considered an improvement over SCCS. However, its version syntax was very complicated. Developers usually relied on a locking mechanism and worked on a single branch (the head).

One situation where RCS can prove useful is when a single user works on a project: RCS does not require a central repository to be set up, and the administration and maintenance of its files is easy.

The main limitation of RCS was that it could only work on a single file at a time. There was no way to version multiple files in a project together.[7]


Concurrent Versions System (CVS)

CVS was the next step in the evolution of version control systems. CVS is distributed under the GNU General Public License. One major advantage of CVS over RCS is its client-server architecture. RCS did not support a client-server architecture, which made distributed development difficult.

CVS-based systems provided, for the first time, the ability to check out a version of the main copy in the repository. Developers could work on their individual copies and later push them back into their branches. Once all the development on the files was done, the developers could merge their code with the live copy. CVS does, however, impose a restriction: it only allows modification of code based on the latest version. Developers therefore have to sync their code regularly with the main copy in the repository so that the modifications made by other people are also covered.
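The restriction described above, that a change is accepted only when it is based on the latest version, can be sketched as follows. This is a simplified model rather than CVS's actual implementation: `ToyFileHistory`, `OutOfDateError` and the revision numbering are hypothetical, and the "sync" step here simply re-reads the latest text instead of merging.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""


class ToyFileHistory:
    """Revision history of one file; list index = revision number (hypothetical)."""

    def __init__(self, initial_text):
        self.revisions = [initial_text]

    @property
    def head(self):
        return len(self.revisions) - 1

    def check_out(self):
        """Return (revision number, text) of the latest revision."""
        return self.head, self.revisions[-1]

    def commit(self, base_revision, new_text):
        """Accept the change only if it was made against the latest revision."""
        if base_revision != self.head:
            raise OutOfDateError(
                f"working copy is at revision {base_revision}, "
                f"repository is at {self.head}; sync first"
            )
        self.revisions.append(new_text)
        return self.head


history = ToyFileHistory("hello")
alice_base, alice_text = history.check_out()
bob_base, bob_text = history.check_out()

history.commit(alice_base, alice_text + " from Alice")  # becomes revision 1

try:
    history.commit(bob_base, bob_text + " from Bob")     # still based on revision 0
    bob_rejected = False
except OutOfDateError:
    bob_rejected = True

# Bob syncs with the latest revision, reapplies his edit, and commits again.
bob_base, bob_text = history.check_out()
history.commit(bob_base, bob_text + " and Bob")
```

Bob's first attempt is refused because Alice's commit moved the head; once he has synced, his change goes in on top of hers.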

CVS uses a delta compression technique to save space when storing multiple copies of the same file. In short, CVS keeps only one full copy of a file. Wherever another copy is needed, it stores just the differences from the original copy. If there is no difference between the two, the two files are in sync; otherwise they are out of sync.
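The idea of storing one full copy plus differences can be sketched with Python's standard `difflib` module. This is a minimal illustration of delta storage in general and not CVS's actual on-disk format; the helper names `make_delta` and `apply_delta` and the tuple layout of the delta are assumptions made for this example.

```python
from difflib import SequenceMatcher


def make_delta(base_lines, new_lines):
    """Describe new_lines as 'copy' ranges from base_lines plus inserted lines."""
    delta = []
    matcher = SequenceMatcher(a=base_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: remember only where it sits in the base copy.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed run: the lines themselves must be stored.
            delta.append(("insert", new_lines[j1:j2]))
        # tag == "delete": nothing to store, the base lines are simply skipped.
    return delta


def apply_delta(base_lines, delta):
    """Rebuild the newer copy from the base copy and a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            _, start, end = op
            result.extend(base_lines[start:end])
        else:
            result.extend(op[1])
    return result


base = ["line one", "line two", "line three"]
new = ["line one", "line 2", "line three", "line four"]

delta = make_delta(base, new)
rebuilt = apply_delta(base, delta)

# Two identical copies produce a delta that is a single 'copy' of everything,
# which corresponds to the "in sync" case.
in_sync_delta = make_delta(base, base)
```

Only the changed and added lines ("line 2" and "line four") are stored literally in `delta`; the unchanged lines are referenced by position in the base copy.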

One of the biggest advantages of CVS was its support for distributed development. However, it fell short in quite a few areas that called for better methods. To name a few, CVS could only work reliably with ASCII characters. It did not work well with other character sets, for example UTF. Also, operations were not atomic, i.e. there was no way to ensure that a commit either completed fully or was rolled back in case of an error. A detailed list of criticisms is given at http://en.wikipedia.org/wiki/Concurrent_Versions_System#Criticism.

Subversion (SVN)

In 2000, an attempt to address most of the criticisms of CVS mentioned above led to the development of SVN. The developers did not want to change the philosophy or structure of CVS in a totally new way; they just wanted to improve on what had been done before, partly because CVS was then the de facto standard for version control. [9]

SVN was designed to match CVS's features and preserve the same development model without duplicating CVS's most obvious flaws. It was meant to be similar enough that any CVS user could make the switch with little effort.

SVN provided the following additions/changes to CVS features[11]:

Directory versioning: CVS only tracks the history of individual files, but Subversion implements a “virtual” versioned filesystem that tracks changes to whole directory trees over time. Files and directories are versioned in SVN.

True version history: Since CVS is limited to file versioning, operations such as copies and renames—which might happen to files, but which are really changes to the contents of some containing directory—aren't supported in CVS. Additionally, in CVS you cannot replace a versioned file with some new thing of the same name without the new item inheriting the history of the old—perhaps completely unrelated—file. With Subversion, you can add, delete, copy, and rename both files and directories. And every newly added file begins with a fresh, clean history all its own.

Atomic commits: A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.

Versioned metadata: Each file and directory has a set of properties (key-value pairs) associated with it. You can create and store any arbitrary key/value pairs you wish. Properties are versioned over time, just like file contents.

Choice of network layers: Subversion has an abstracted notion of repository access, making it easy for people to implement new network mechanisms. Subversion can plug into the Apache HTTP Server as an extension module. This gives Subversion a big advantage in stability and interoperability, and instant access to existing features provided by that server—authentication, authorization, wire compression, and so on. A more lightweight, standalone Subversion server process is also available. This server speaks a custom protocol which can be easily tunneled over SSH.

Consistent data handling: Subversion expresses file differences using a binary differencing algorithm, which works identically on both text (human-readable) and binary (human-unreadable) files. Both types of files are stored equally compressed in the repository, and differences are transmitted in both directions across the network.

Efficient branching and tagging: The cost of branching and tagging need not be proportional to the project size. Subversion creates branches and tags by simply copying the project, using a mechanism similar to a hard-link. Thus these operations take only a very small, constant amount of time.

Hackability: Subversion has no historical baggage; it is implemented as a collection of shared C libraries with well-defined APIs. This makes Subversion extremely maintainable and usable by other applications and languages.
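The "atomic commits" property in the list above can be sketched as an all-or-nothing update. This is a simplified single-process illustration and not Subversion's implementation: `ToyStore`, `CommitError` and the `validate` callback are hypothetical names, and the model assumes changes are staged on a private copy that is published only if every change succeeds.

```python
class CommitError(Exception):
    """Raised when any change in a commit is rejected."""


class ToyStore:
    """Hypothetical store mapping file paths to contents, with a revision counter."""

    def __init__(self):
        self.files = {}
        self.revision = 0

    def commit(self, changes, validate=lambda path, content: True):
        """Apply every change or none of them; return the new revision number."""
        staged = dict(self.files)          # work on a private copy
        for path, content in changes.items():
            if not validate(path, content):
                # The published state (self.files) has not been touched.
                raise CommitError(f"rejected change to {path!r}; nothing committed")
            staged[path] = content
        self.files = staged                # publish all changes in one step
        self.revision += 1
        return self.revision


store = ToyStore()
store.commit({"a.c": "int a;", "b.c": "int b;"})

try:
    # The second change is invalid, so the first must not be committed either.
    store.commit(
        {"a.c": "int a = 1;", "b.c": None},
        validate=lambda path, content: content is not None,
    )
    failed = False
except CommitError:
    failed = True
```

After the failed commit the store still holds the contents and revision number of the first commit, so no partial set of changes ever becomes visible.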

Git

In 2002, BitMover released a community-licensed version of BitKeeper, its proprietary VCS, which developers could use free of charge provided they did not participate in the development of a competing tool.[10] One of the major drawbacks of the community version of BitKeeper was that users were not allowed to see metadata and compare past versions. In April 2005, BitMover withdrew the community version of BitKeeper and the Git project was launched. Git was aimed at becoming the Linux kernel's source configuration management software, and was eventually adopted by Linux developers.

The main characteristics of Git can be found here

History of Commercial Version Control Systems

Polytron Version Control System (PVCS)

PVCS was originally published by Polytron in 1985 and is currently sold by Serena Software. PVCS uses a locking mechanism for concurrency control, creating a parallel branch for the second committer so that modifications to the same project can exist in parallel.[10] This is unlike CVS and Subversion, where the second committer first needs to merge the changes via the update command and then resolve conflicts (when they exist) before actually committing.[13]

ClearCase

ClearCase was originally developed by Atria Software in 1992 on Unix and was written in C++. It was partly derived from HP software. Atria Software later merged into Rational Software, which was acquired by IBM in 2003.

ClearCase mainly works on the concept of views. A view is a version of the software that is specific to each developer. Each developer 'sees' the version of the project that is related to the view. Views are defined by means of a config spec. A config spec is a set of rules that govern which version of each file is loaded into the view.
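The idea of a rule set deciding which version of each file appears in a view can be sketched as below. This is not ClearCase's actual config spec syntax or selection algorithm: the rule format (a file pattern paired with a version selector), the "first matching rule wins" policy, the `LATEST` keyword as used here, and all file names and labels are assumptions made for illustration.

```python
from fnmatch import fnmatch

# Hypothetical store: path -> {version label -> content}, oldest version first.
versions = {
    "src/main.c": {"REL1": "main v1", "REL2": "main v2"},
    "src/util.c": {"REL1": "util v1", "REL2": "util v2"},
    "docs/guide.txt": {"REL1": "guide v1"},
}


def select_view(versions, rules):
    """Build a view by applying the first rule that matches each file.

    rules is an ordered list of (pattern, selector) pairs. A selector is
    either a version label or the keyword "LATEST". A rule whose label does
    not exist for a file is skipped and the next rule is tried.
    """
    view = {}
    for path, history in versions.items():
        for pattern, selector in rules:
            if not fnmatch(path, pattern):
                continue
            if selector == "LATEST":
                view[path] = list(history.values())[-1]
                break
            if selector in history:
                view[path] = history[selector]
                break
    return view


# One developer pins util.c to the REL1 version and takes the newest of
# everything else.
rules = [
    ("src/util.c", "REL1"),
    ("*", "LATEST"),
]
view = select_view(versions, rules)
```

Two developers with different rule lists would see different contents for the same paths, which is the sense in which each view is specific to its developer.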

ClearCase repositories are called VOBs (Versioned Object Bases). ClearCase has a proprietary network file system called MVFS (MultiVersion File System).[12] MVFS can be used to mount VOBs as a virtual file system through a dynamic view. Every developer in the team has a dynamic view, which resides on the network server. The local copies of the views that are created on developer machines are called sandbox views. Developers make changes to their sandbox views and merge the changes into the dynamic view.

Visual SourceSafe

SourceSafe was developed by One Tree Software, which was bought by Microsoft in 1994.[14] Soon after acquiring One Tree, Microsoft discontinued all versions of SourceSafe except the Windows version. Microsoft also made some architectural changes to the software that were not well received by users, and Visual SourceSafe gained the reputation of "Visual SourceUnsafe".[14]

MKS Integrity

MKS Integrity was released in 2001 by MKS Inc. It is based on a client-server architecture and is available with both desktop and web client interfaces. MKS Integrity enables project teams to track all aspects of their work, such as work items, source control, reporting, and build management, in a single product.[15] Plug-ins for MKS Integrity are available for Eclipse and Visual Studio.

References