CSC/ECE 517 Fall 2009/wiki1a 5 History of version control by av
History of version control
Version control is a software utility used to manage data, especially source code, within a software development environment. Managing changes is important for software engineers, since most of the time they need to add, modify or fix a small portion of the code (that is, create a new version). Version control does the housekeeping of these changes by keeping a detailed history of each document. It implements several techniques to ensure the integrity of the information while using minimal resources. This is a comprehensive list of version control software.
Introduction
Version control is a fundamental part of software configuration management (SCM). Also known as revision control, it is a key lifeline in a team environment. Version control works by keeping a record of every unit of information and tracking the changes and updates made to it.
It also allows multiple users to edit the same document at the same time, which is valuable for team projects and makes for a better working environment. Keeping a record of all changes made to a set of information provides reliable data management and an archive for future reference.
Version control works by uniquely identifying each piece of information and recording the changes made to it. The information is stored in a repository, an information database that can be accessed remotely or locally depending on the type of system. When users need to change a document, they check out the corresponding file from the system, getting the most up-to-date version of it. Once the file has been amended, they commit (re-submit) it, updating the copy in the repository so that it is ready for use by other users.
While file systems provide operations like open, save, rename and delete, version control systems provide check-in and check-out. Much like save and open in a file system, check-in stores a new file version and check-out retrieves a file revision from the system.
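The check-out, amend and commit cycle described above can be sketched as a tiny in-memory repository. This is a minimal illustration, not how any real system stores its data: the `Repository` and `Revision` classes and the `check_in`, `check_out` and `log` methods are hypothetical names, and every version is kept as a full copy for simplicity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    number: int
    author: str
    message: str
    content: str


@dataclass
class Repository:
    """Hypothetical in-memory repository: file name -> list of revisions."""

    files: dict[str, list[Revision]] = field(default_factory=dict)

    def check_in(self, name: str, content: str, author: str, message: str) -> int:
        """Store a new version of `name` and return its revision number."""
        history = self.files.setdefault(name, [])
        revision = Revision(len(history) + 1, author, message, content)
        history.append(revision)
        return revision.number

    def check_out(self, name: str, number: Optional[int] = None) -> str:
        """Return the latest version of `name`, or revision `number` if given."""
        history = self.files[name]
        if number is None:
            return history[-1].content
        return history[number - 1].content

    def log(self, name: str) -> list[tuple[int, str, str]]:
        """Return (revision, author, message) for every stored version."""
        return [(r.number, r.author, r.message) for r in self.files[name]]
```

Because check-ins never overwrite earlier revisions, `check_out("main.c", 1)` still returns the first version after later check-ins. This is what makes reverting and inspecting history possible.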
The trunk is the main line of a project on which development progresses. Team members create branches (similar to a fork) from the trunk and work on those copies, so the code can be modified in parallel on the trunk and on each branch. Eventually the changes are merged back into the trunk, and the process continues.
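A minimal sketch of the trunk-and-branch bookkeeping, assuming each line of development is simply a list of whole-file snapshots. The `Project` class and its methods are hypothetical. To keep it short, a merge is only allowed when the trunk has not moved since the branch last synchronized with it. If both sides changed, a real three-way merge would be needed, and this sketch does not attempt one.

```python
class Project:
    """Hypothetical model: each line of development is a list of snapshots."""

    def __init__(self, initial: str) -> None:
        self.lines = {"trunk": [initial]}  # line name -> snapshots, oldest first
        self.base = {}                     # branch name -> trunk index it last synced with

    def branch(self, name: str) -> None:
        """Fork a new branch from the current tip of the trunk."""
        trunk = self.lines["trunk"]
        self.lines[name] = [trunk[-1]]
        self.base[name] = len(trunk) - 1

    def commit(self, line: str, content: str) -> None:
        """Record a new snapshot on the trunk or on a branch."""
        self.lines[line].append(content)

    def merge_into_trunk(self, name: str) -> None:
        """Bring the branch tip onto the trunk if the trunk has not moved meanwhile."""
        trunk = self.lines["trunk"]
        if len(trunk) - 1 != self.base[name]:
            # Both sides changed since the fork: a real (three-way) merge is needed.
            raise RuntimeError(f"trunk changed since {name!r} forked; merge by hand")
        trunk.append(self.lines[name][-1])
        self.base[name] = len(trunk) - 1  # branch and trunk are level again
```

After a successful merge, the branch can keep receiving commits and can be merged again later. This models the repeating branch-and-merge cycle.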
Issues Addressed by Version Control
The basic purpose of a version control system (VCS) is to maintain different versions of a file.
The following issues call for a version control system:
- Change tracking: In a team project it is difficult to keep track of what changes were made to the code, why they were made and who made them. A VCS enables change tracking by documenting every change with all the requisite details.
- Reversion: At times a change to a single code module causes the entire application to fail regression testing, which calls for reverting to a version of the code that is known to be good. If reversion is difficult or unreliable, it is risky to make changes at all.
- Bug tracking: In an agile software development environment it is quite common to receive bug reports against a particular version after the code has moved considerably away from it. When the bug does not reproduce in the new version, it is hard to know whether it still exists or has already been fixed. In such cases developers need to go back to the older version of the code to reproduce and understand the bug.
- Concurrency: The ability for many people across separate geographic locations to modify the same collection of files, knowing that conflicting modifications can be detected and resolved.
- History: The ability to attach historical data to your data, such as explanatory comments about the intention behind each change. Even for a programmer working solo, change histories are an important aid to memory; for a multi-person project, they are a vitally important form of communication among developers.
History
Change and Configuration Control (CCC)
The commercial history of version control dates back to 1975, when software configuration management (SCM) became commercial for the first time with the advent of CCC, developed by the SoftTool Corporation. CCC offered a central repository and provided a trunk-based system that documented every change, enhancing the accountability of the system and the validity of the information stored. Changes were recorded as they occurred, at regular intervals during development and maintenance, and at baseline release.
Source Code Control System(SCCS)
Another early system was SCCS, developed by Marc J. Rochkind in the very early 1970s. Designed for Unix, it provided a simple locking model in which only one person could check out and edit a file at a time, which led to serialized development. Based on a central repository, it used the notion of discrete deltas to record changes, which were combined to reconstruct the final version of the product.
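The locking model can be illustrated with a small sketch. It does not reproduce SCCS's actual file format or commands; `LockingRepository`, `check_out_for_edit` and `check_in` are hypothetical names. A second user who tries to check out a locked file for editing is refused until the lock holder checks it back in, which is why development became serialized.

```python
class LockingRepository:
    """Hypothetical sketch of a lock-based (reserved checkout) model."""

    def __init__(self) -> None:
        self.content: dict[str, str] = {}
        self.locks: dict[str, str] = {}  # file name -> user holding the edit lock

    def add(self, name: str, text: str) -> None:
        """Put a new file under version control."""
        self.content[name] = text

    def check_out_for_edit(self, name: str, user: str) -> str:
        """Lock `name` for `user`; refuse if someone else already holds the lock."""
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.content[name]

    def check_in(self, name: str, user: str, text: str) -> None:
        """Store the edited text and release the lock for the next developer."""
        if self.locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self.content[name] = text
        del self.locks[name]
```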
Diff Algorithm
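The diff utility was developed at AT&T Bell Labs and first shipped with Unix in 1974. James W. Hunt wrote its initial prototype, and the algorithm was published by Hunt and Douglas McIlroy in 1976. It works by finding the longest common subsequence (LCS) of lines shared by two files, then comparing the data preceding and following the common lines. The lines outside the LCS are recorded as insertions and deletions in a diff or patch file.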
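The core idea can be sketched with the textbook dynamic-programming LCS, which runs in O(nm) time for files of n and m lines. This sketch illustrates an LCS-based diff; it is not the Hunt-McIlroy algorithm itself, which uses a more economical search. The function names `lcs_table` and `diff` are hypothetical.

```python
def lcs_table(a: list[str], b: list[str]) -> list[list[int]]:
    """dp[i][j] is the length of the LCS of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def diff(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Return (op, line) pairs: ' ' kept, '-' deleted from a, '+' added in b."""
    dp = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:                     # line is part of the common subsequence
            out.append((" ", a[i]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:   # dropping a[i] keeps the LCS longest
            out.append(("-", a[i]))
            i += 1
        else:                                # otherwise b[j] must be an insertion
            out.append(("+", b[j]))
            j += 1
    out.extend(("-", line) for line in a[i:])
    out.extend(("+", line) for line in b[j:])
    return out
```

For example, `diff(["a", "b", "c", "d"], ["a", "c", "d", "e"])` returns `[(" ", "a"), ("-", "b"), (" ", "c"), (" ", "d"), ("+", "e")]`. The kept lines are exactly the LCS, and everything else becomes the patch.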
Revision Control System (RCS)
In the early 1980s Walter Tichy introduced RCS, which used both forward and reverse deltas to store file revisions efficiently. With forward deltas, the original version is stored in full and every later version is stored as a set of changes (deltas) from its predecessor. With reverse deltas, the most recent version is stored in full and every earlier version is stored as a set of changes from its successor |1|. Logically similar to SCCS, RCS has a cleaner command interface and good facilities for grouping entire project releases under symbolic names. It is well suited to single-developer or small-group projects hosted at a single development site |2|.
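Reverse deltas can be sketched as follows, assuming a file is a list of lines. The newest revision is kept in full. Each older revision is stored only as the edits needed to turn its successor back into it, so checking out the latest version costs nothing and older versions are rebuilt by walking backwards. RCS's real storage format is different. Here the deltas are computed with Python's standard `difflib.SequenceMatcher`, and `ReverseDeltaStore`, `make_delta` and `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher

Delta = list[tuple[int, int, list[str]]]  # (start, end, replacement lines) edits


def make_delta(src: list[str], dst: list[str]) -> Delta:
    """Edits on `src` that turn it into `dst`."""
    ops = SequenceMatcher(None, src, dst).get_opcodes()
    return [(i1, i2, dst[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(src: list[str], delta: Delta) -> list[str]:
    """Apply edits back to front so that earlier indices stay valid."""
    out = list(src)
    for start, end, lines in reversed(delta):
        out[start:end] = lines
    return out


class ReverseDeltaStore:
    """Keeps the newest revision in full plus one reverse delta per older revision."""

    def __init__(self, first: list[str]) -> None:
        self.head = list(first)
        self.deltas: list[Delta] = []  # deltas[k] turns revision k+2 into revision k+1

    def commit(self, new: list[str]) -> None:
        """Make `new` the head and keep only a delta back to the old head."""
        self.deltas.append(make_delta(list(new), self.head))
        self.head = list(new)

    def get(self, rev: int) -> list[str]:
        """Return revision `rev` (1-based), walking backwards from the head."""
        text = self.head
        for delta in reversed(self.deltas[rev - 1:]):
            text = apply_delta(text, delta)
        return list(text)
```

A forward-delta store would do the opposite: it would keep revision 1 in full and apply deltas forwards. That makes old versions cheap to retrieve and the newest one the most expensive.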
Concurrent Versions System (CVS)
Next came CVS, designed and originally implemented by Dick Grune in 1986 and later modified by Berliner et al. in 1990. It was the de facto standard in the open source community for many years because it did not require files to be locked while checked out. Instead, it reconciled non-conflicting changes mechanically and requested human intervention on conflicts. One notable drawback of CVS was that it did not support versioning of renamed or moved files, treating them as new files rather than new versions.
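Reconciling non-conflicting changes and asking a human about conflicts is a three-way merge. Each line of the developer's copy and of the repository's newer copy is compared against the common base version that both started from. The sketch below simplifies by assuming all three versions have the same number of lines (in-place edits only). Real tools such as diff3 first align the versions with a diff. `merge3` is a hypothetical name, and the conflict markers only imitate the general style such tools write into files.

```python
def merge3(base: list[str], mine: list[str], theirs: list[str]) -> tuple[list[str], list[int]]:
    """Line-by-line three-way merge of two edited copies of `base`.

    Simplifying assumption: all three versions have the same number of lines.
    Returns (merged_lines, conflicts), where `conflicts` lists the base line
    indices that both sides changed differently and a human must resolve.
    """
    if not len(base) == len(mine) == len(theirs):
        raise ValueError("this sketch only handles in-place line edits")
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:   # both sides agree, or only my side changed
            merged.append(m)
        elif m == b:           # only their side changed
            merged.append(t)
        else:                  # both changed the line differently: conflict
            merged.extend(["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"])
            conflicts.append(i)
    return merged, conflicts
```

Two developers who edit different lines of the same file are merged automatically without any locking. Only lines that both of them changed in different ways are flagged for a person to resolve.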
Subversion (SVN)
Perceived shortcomings and faults in CVS eventually led to a new version control system, SVN, around 2001. It was developed by CollabNet Inc. Unlike CVS, SVN committed changes atomically and had significantly better support for branches.
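An atomic commit means that a change touching several files is applied entirely or not at all. SVN labels each successful commit with a single repository-wide revision number. Below is a minimal sketch that assumes a simplified out-of-date check: if any file changed after the client's base revision, the whole commit is rejected. `AtomicRepository` and its `commit` method are hypothetical.

```python
class AtomicRepository:
    """Hypothetical sketch: a multi-file commit succeeds completely or not at all."""

    def __init__(self) -> None:
        self.revision = 0                     # repository-wide revision number
        self.files: dict[str, str] = {}       # path -> latest content
        self.changed_in: dict[str, int] = {}  # path -> revision that last changed it

    def commit(self, changes: dict[str, str], based_on: int) -> int:
        """Apply every change in `changes` as one new revision.

        `based_on` is the revision the client last updated to. If any file in the
        commit changed after that, the whole commit is rejected and nothing changes.
        """
        stale = sorted(p for p in changes if self.changed_in.get(p, 0) > based_on)
        if stale:
            raise RuntimeError(f"out of date, nothing committed: {stale}")
        new_revision = self.revision + 1
        new_files = {**self.files, **changes}
        new_changed = {**self.changed_in, **{p: new_revision for p in changes}}
        # Everything was prepared on copies; publish the new state in one step.
        self.files, self.changed_in, self.revision = new_files, new_changed, new_revision
        return new_revision
```

Because the new state is built on copies and published only after validation, a rejected commit leaves no half-applied changes behind.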
Distributed Version Control
The paradigm then shifted from a client-server architecture to a distributed one around 2001, with the development of systems such as SVK, BitKeeper, Mercurial, GNU Arch, Darcs, Git, Bazaar, Monotone, Codeville, Vesta, Aegis and many more. Distributed revision control took a peer-to-peer approach, as opposed to the client-server approach of centralized systems. The repository could be split into several sub-repositories, one for each section or module of a project, and these could be stored on servers or on local machines. This let every developer edit a local, shareable copy without needing a network connection. Synchronization was done by exchanging patches (change sets) from peer to peer.
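Exchanging change sets between peers can be sketched as follows. Each peer keeps the full history locally, commits without any network access, and pulls the change sets it is missing from another peer. Identifying change sets by a hash of their parent and contents mirrors what systems like Git and Mercurial do. However, the `Peer` class, its methods and the truncated SHA-1 ids are hypothetical simplifications. When two peers have diverged, `pull` copies the change sets but leaves the head alone, because a real system would then need a merge.

```python
import hashlib
from typing import Optional


class Peer:
    """Hypothetical peer that keeps the full history as a set of change sets."""

    def __init__(self) -> None:
        self.changesets: dict[str, tuple[Optional[str], str]] = {}  # id -> (parent id, patch)
        self.head: Optional[str] = None

    def commit(self, patch: str) -> str:
        """Record a change locally; no network access is needed."""
        cid = hashlib.sha1(f"{self.head}:{patch}".encode()).hexdigest()[:12]
        self.changesets[cid] = (self.head, patch)
        self.head = cid
        return cid

    def _ancestors(self, cid: Optional[str]) -> set[str]:
        """Return `cid` and every change set it descends from."""
        seen = set()
        while cid is not None:
            seen.add(cid)
            cid = self.changesets[cid][0]
        return seen

    def pull(self, other: "Peer") -> int:
        """Copy the change sets `other` has that we lack; return how many were copied."""
        missing = {c: v for c, v in other.changesets.items() if c not in self.changesets}
        self.changesets.update(missing)
        # Fast-forward only if our head is already part of their history;
        # otherwise the two lines of development have diverged and need a merge.
        if other.head is not None and (self.head is None or self.head in self._ancestors(other.head)):
            self.head = other.head
        return len(missing)
```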
A detailed comparison of the version control systems available to date can be found here.
Big Shift in Paradigm
Traditional version control systems were based on a client-server model. It is a simple, easy-to-use model built around a central repository that is accessible to all users, allowing them to get an up-to-date version instantly. It works well for backup, undo and synchronization but has the following drawbacks:
- Branching and merging are cumbersome. Users have to track revisions between merged branches manually.
- Peer-to-peer synchronization is not supported.
- Offline commits are not supported, as all operations go through a connection to a centralized server.
- Data backup is inadequate, as there is only a single repository.
- Performance is slow, since most operations require a round trip to the server.
To overcome the drawbacks inherent in the centralized model, most recent version control systems have adopted a distributed model. Instead of a single repository, there are several clones of the main repository, each downloaded to a local machine for use by a single user. This offers the following advantages:
- Greater flexibility, as it allows many different workflows, from a classic centralized workflow to a purely ad hoc one or a mixture of the two |3|.
- Users have full access to the repository even when they are not connected to a network.
- Peer-to-peer synchronization is supported, as users rely on a group of peers rather than a central entity.
- Branching and merging capabilities are better.
- Information is more secure, as each local repository acts as a backup of the central repository.
- Commands, including viewing transaction and error reports, execute fast because no network connection is involved.
A detailed explanation of the two models, using an analogy, can be found here.
Conclusion
Version control is becoming standard across most areas of computing. It is embedded in word processors (e.g. OpenOffice.org Writer, Microsoft Word, KOffice, Pages, Google Docs), spreadsheets (e.g. OpenOffice.org Calc, Google Spreadsheets, Microsoft Excel), various content management systems and wikis. Whether a waterfall, spiral, iterative, agile or open source approach is used, improvements to VCS technology have a direct impact on collaboration effectiveness. Overall, versioning systems are powerful development management tools that provide security, efficiency and, most importantly, an asynchronous platform for software development.
See Also
[1] Vincenzo Ambriola, Lars Bendix and Paolo Ciancarini, "The evolution of configuration management and version control", Software Engineering Journal, November 1990.
[2] Benjamin Neal, "Version Control: An Overview", ECM 3406 Dissertation, 11 December 2008.
[3] Bryan O'Sullivan, "Making Sense of Revision-control Systems", ACM, 2009.
[4] A. Löh, W. Swierstra and D. Leijen, "A principled approach to version control", 2007.
[5] Marc J. Rochkind, "The Source Code Control System", IEEE Transactions on Software Engineering, December 1975.
[6] Ian Clatworthy, "Distributed Version Control Systems – Why and How".
References
[1] http://www.cs.colorado.edu/~kena/classes/3308/f03/lectures/lecture12.pdf
[2] http://catb.org/esr/writings/taoup/html/ch15s05.html
[3] http://www.ibm.com/developerworks/aix/library/au-dist_ver_control/