CSC/ECE 517 Fall 2011/ch1 1f sv: Difference between revisions
Revision as of 04:38, 7 September 2011
Comparing version control systems from the programmer's standpoint
Introduction: Version Control Systems
A Version Control System (VCS) is software used by a group of users working simultaneously on the same document, program, image or other information. It lets people collaborate on a project without constantly having to swap files between them. The system tracks the changes made to the files over time, so at any point the users can roll back to a previous version of a file, for example to the last known good version after an error is introduced. Versions are usually identified by a number or letter code known as a revision number. Each change is labeled with the name of the person who introduced it, the time of the change and an optional description, which makes it possible to trace the evolution of the file. A VCS also allows you to branch from the source file, that is, to create a parallel copy, make your own changes and later merge the two copies.
Here we look at the different types of Version Control Systems and a few examples of each, discussing their features and limitations from a programmer's standpoint.
Terminology
- Repository: A repository is where the current and historical data of the files under a project are stored, often on a server. It is sometimes also called a depot, for example in Perforce.
- Branch: A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may develop at different speeds or in different ways independently of each other.
- Trunk: The unique line of development that is not a branch. This is sometimes also known as the mainline.
- Tag: A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
- Working copy: The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.
- Check-out: A check-out (or co) is the act of creating a local working copy from the repository. A user may specify a specific revision or obtain the latest. The term 'checkout' can also be used as a noun to describe the working copy.
- Commit: A commit (check-in) is the action of writing or merging the changes made in the working copy back to the repository.
- Merge: A merge or integration is an operation in which two sets of changes are applied to a file or set of files.
- Promote: The act of copying file content from a less controlled location into a more controlled location.
Criteria for comparing the different Version Control Systems
A few of the criteria on which the different types of VCS would be compared are stated below:
General Properties
- The repository model: describes the relationship between the various copies of the source code repository.
- The concurrency model: describes how simultaneous edits to the working copies are managed so that users do not step on each other's toes.
- Platform: specifies the operating system that the software supports.
Technical Properties
- Scope of change: describes whether changes are recorded on a file basis or a directory basis.
- Version IDs: describes whether version identifiers are based on a hash of the content, on sequential numbering, or on some other scheme.
Features
- Atomic commits: ensures either all the changes are committed or none are.
- File renames: describes if the software allows files to be renamed while retaining their version history.
- Merge file renames: describes whether the software can merge changes made to a file on one branch into the same file that has been renamed on another branch.
- Symbolic links: describes whether the software allows version control of symbolic links as with regular files.
- Merge tracking: describes whether the software tracks which changes have already been merged between the respective branches, so that merging one branch into another applies only the changes that are still missing.
Types of Version Control Systems
The version control systems can be classified into three categories:
1. Local Version Control
In the local-only approach, all developers must use the same computer system. These tools often manage single files individually and have largely been replaced by, or embedded within, newer software.
Examples of this approach are:
Revision Control System
Revision Control System (RCS) stores the latest version of a file in full, together with backward deltas from which earlier versions are reconstructed. Compared to SCCS, this gives the fastest access to the tip of the trunk and comes with an improved user interface, at the cost of slow access to branch tips and no support for included/excluded deltas.
Features of RCS
- Multiple versions of the same file are maintained.
- RCS stores deltas.
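The backward-delta idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual RCS file format: the names `ReverseDeltaStore`, `make_delta` and `apply_delta` are hypothetical, and the deltas are built with the standard `difflib` module.

```python
import difflib


def make_delta(new_lines, old_lines):
    """Return instructions that rebuild old_lines from new_lines."""
    matcher = difflib.SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # These lines can be copied from the newer text.
            delta.append(("copy", i1, i2))
        else:
            # Lines that exist only in the older text are stored literally.
            delta.append(("insert", old_lines[j1:j2]))
    return delta


def apply_delta(new_lines, delta):
    """Rebuild the older text from the newer text and a backward delta."""
    old_lines = []
    for op in delta:
        if op[0] == "copy":
            old_lines.extend(new_lines[op[1]:op[2]])
        else:
            old_lines.extend(op[1])
    return old_lines


class ReverseDeltaStore:
    """Hypothetical store: full text of the tip plus backward deltas."""

    def __init__(self, first_lines):
        self.tip = list(first_lines)   # latest revision, stored in full
        self.deltas = []               # deltas[k] turns revision k+2 into k+1

    def check_in(self, new_lines):
        self.deltas.append(make_delta(new_lines, self.tip))
        self.tip = list(new_lines)
        return len(self.deltas) + 1    # number of the new revision

    def get(self, revision):
        lines = self.tip
        # Walk backwards from the tip, one delta per older revision.
        for delta in reversed(self.deltas[revision - 1:]):
            lines = apply_delta(lines, delta)
        return list(lines)
```

Reading the tip needs no delta at all, while each step back in history applies one more delta, which is why access to old revisions is slower than access to the latest one.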
Limitations of RCS
- It is error-prone.
2. Centralized Version Control
Centralized Version Control, also known as the client-server model, consists of a single shared repository, which acts as the server, and the users, who are the clients. The repository is located in one place, and all clients access it to commit changes and to send and receive information.
General Properties of the Centralized Version Control
- It employs a client-server (centralized) repository model.
- It employs a merge concurrency model.
- It runs on Windows, Mac OS X and UNIX-like operating systems.
Technical Properties of the Centralized Version Control
- Changes made to the files are recorded on a file basis.
- Sequential numbering of files is used for versioning.
Examples of the Centralized Version Control
Concurrent Versions System
Concurrent Versions System (CVS) was originally built on RCS and is licensed under the GPL. It uses a client–server architecture: the server stores the current versions of a project and its history, and clients connect to the server to "check out" a complete copy of the project, make changes to this copy and later "check in" their changes. The client and server can connect over a LAN or over the Internet, or both may run on the same machine if CVS only has to track the version history of a project with local developers.
The Concurrent Versions System is an important component of the Source Configuration Management.
Features of CVS
- Several developers can work concurrently on the same project, each one "checking out" files of the project within their "working copy", and "checking in" their changes to the server.
- To keep users from stepping on each other's toes, the server only accepts check-ins made against the most recent version of a file.
- Developers are therefore expected to keep their working copy up-to-date by incorporating other people's changes on a regular basis.
- On a successful check-in, the version numbers of all the files involved are incremented, and the server records a user-supplied description line, the author’s name and the time of the change.
Clients can also compare versions, request a complete history of changes, or check out a historical snapshot of the project as of a given date or as of a revision number.
CVS labels a single project (set of related files) which it manages as a module. A CVS server stores the modules it manages in its repository. Programmers acquire copies of modules by checking out. The checked-out files serve as a working copy, sandbox or workspace. Changes to the working copy will be reflected in the repository by committing them. To update is to acquire or merge the changes in the repository with the working copy.
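The check-out, check-in and update cycle described above can be illustrated with a small Python sketch. It is a toy model, not CVS itself: `ToyServer`, `UpToDateCheckFailed` and the integer revision numbers are assumptions made for the example (real CVS uses per-file revision numbers such as 1.1, 1.2 and can merge changes during an update).

```python
from datetime import datetime, timezone


class UpToDateCheckFailed(Exception):
    """Raised when a check-in is based on an outdated revision."""


class ToyServer:
    """Hypothetical server keeping a separate revision history per file."""

    def __init__(self):
        self.files = {}   # file name -> list of log entries, one per revision

    def head(self, name):
        """Number of the most recent revision of a file (0 if it is new)."""
        return len(self.files.get(name, []))

    def check_out(self, name):
        """Return (revision number, text) of the most recent revision."""
        history = self.files[name]
        return len(history), history[-1]["text"]

    def check_in(self, name, base_revision, text, author, message):
        # Only accept changes made against the most recent revision.
        if base_revision != self.head(name):
            raise UpToDateCheckFailed(
                f"{name}: based on revision {base_revision}, "
                f"but the server is at revision {self.head(name)}"
            )
        entry = {
            "text": text,
            "author": author,
            "message": message,
            "time": datetime.now(timezone.utc),
        }
        self.files.setdefault(name, []).append(entry)
        return self.head(name)   # the incremented revision number
```

A developer whose check-in is rejected would check out the newest revision again, fold their change into it and retry, which is the "keep your working copy up-to-date" discipline the list above refers to.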
Limitations of CVS
- Revisions created by a commit are per file, rather than spanning the collection of files that make up the project or spanning the entire repository.
- CVS does not version the moving or renaming of files and directories.
- Versioning of symbolic links is not enabled.
- Limited support for Unicode and non-ASCII filenames.
- Commits are not atomic.
- Branch operations are expensive.
Subversion
Subversion (SVN) is an open-source version control system, distributed under the Apache License, that was inspired by CVS and is available on the major operating systems.
"Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations."
Features of SVN
SVN provides developers with the following advantages when compared to legacy CVS:
- All commits are atomic operations.
- Full revision history is maintained for files renamed/copied/moved/removed.
- Versioning is maintained for directories, renames, and file metadata which enables developers to move and/or copy entire directory-trees while retaining the entire revision history.
- Versioning of symbolic links.
- Branching as a cheap operation, independent of file size.
- File locking, known as "reserved checkouts", for files that cannot be merged.
The Subversion filesystem uses transactions to keep changes atomic. A transaction operates on a specified revision of the filesystem, not necessarily the latest. Changes are made to the transaction's own root, which either becomes the latest revision when the transaction is committed or is discarded when it is aborted. Several clients can access the same transaction and work together on an atomic change.
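The all-or-nothing behaviour of such a transaction can be sketched in Python. This is a simplified model under stated assumptions, not Subversion's filesystem API: `ToyRepository`, `Transaction` and `ConflictError` are hypothetical names, a revision is just a dictionary from path to content, and a commit is rejected when any path it changed has also changed since the transaction's base revision.

```python
class ConflictError(Exception):
    """Raised when a changed path is out of date with respect to the head."""


class ToyRepository:
    def __init__(self):
        self.revisions = [{}]            # revision 0 is the empty tree

    @property
    def youngest(self):
        return len(self.revisions) - 1

    def begin(self, base_revision=None):
        """Start a transaction on a given revision (default: the latest)."""
        if base_revision is None:
            base_revision = self.youngest
        return Transaction(self, base_revision)


class Transaction:
    def __init__(self, repo, base_revision):
        self.repo = repo
        self.base_revision = base_revision
        # The transaction gets its own root: a private copy of the base tree.
        self.root = dict(repo.revisions[base_revision])
        self.changed = set()

    def write(self, path, content):
        self.root[path] = content
        self.changed.add(path)

    def delete(self, path):
        self.root.pop(path, None)
        self.changed.add(path)

    def commit(self):
        head = self.repo.revisions[self.repo.youngest]
        base = self.repo.revisions[self.base_revision]
        # Validate everything first, so a failure leaves the repository untouched.
        for path in self.changed:
            if head.get(path) != base.get(path):
                raise ConflictError(path)
        new_tree = dict(head)
        for path in self.changed:
            if path in self.root:
                new_tree[path] = self.root[path]
            else:
                new_tree.pop(path, None)
        # A single append publishes all changes at once as the new revision.
        self.repo.revisions.append(new_tree)
        return self.repo.youngest
```

Aborting needs no extra code in this sketch: a transaction that is never committed simply leaves `revisions` unchanged.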
Repository types and Branching
SVN offers two types of repository storage:
- FSFS
- Berkeley DB
SVN employs the inter-file branching model of trunks, branches and tags to handle versioning.
The 'svn copy' command creates a new branch. The original and the copy are linked together internally, and history is preserved for both. Only the differences between the copy and the original are stored in the repository, so the copy takes up very little space. The versions in a branch retain the history of the file up to the point of the copy, plus any changes made since. Changes can be merged back into the trunk or between branches.
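Why such a copy is cheap can be shown with a copy-on-write sketch in Python. It is an illustration only and does not reflect how Subversion lays out its repository: `ToyRepo`, `trees` and `copied_from` are hypothetical names, and every stored tree is treated as immutable once it has been written.

```python
class ToyRepo:
    """Hypothetical repository in which a branch is a shared, immutable tree."""

    def __init__(self):
        self.trees = {}        # top-level path -> {file name: content}
        self.copied_from = {}  # branch path -> path it was copied from

    def copy(self, src, dst):
        # The branch initially refers to the very same tree object, so the
        # cost of the copy does not depend on how large the tree is.
        self.trees[dst] = self.trees[src]
        self.copied_from[dst] = src

    def write(self, top, filename, content):
        # Copy on write: build a new tree for this path only, so every other
        # path that shares the old tree keeps seeing the old content.
        new_tree = dict(self.trees.get(top, {}))
        new_tree[filename] = content
        self.trees[top] = new_tree
```

After `copy("trunk", "branches/feature")`, a write to the branch replaces only the branch's tree; files that were not touched still refer to the same content objects as the trunk.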
Limitations of SVN
- It lacks some repository administration and management features.
- It does not store the modification times of files.
- It stores additional copies of data on the local machine which can cause space issues for large projects.
- It does not provide support for merge of file renames.
3. Distributed Version Control
In this model of version control, clients do not check out individual files, but mirror the entire repository. Each developer works directly on their own local repository, and the changes are shared among the repositories in a separate step.
Git
Git is a distributed version control system with an emphasis on speed. Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server. It has a simple design and offers strong support for non-linear development. It is fully distributed and can handle large projects like the Linux kernel efficiently.
Git differs from other version control systems in the way it thinks about its data: as a set of snapshots of a mini filesystem rather than as a list of file-based changes.
The Three States
A file in Git can reside in one of three states: committed, modified and staged.
- Committed: Data is stored safely in the database.
- Modified: The file has been changed but not yet committed to the database.
- Staged: A modified file has been marked in its current version to go into the next commit snapshot.
Workflow of Git
- Developer modifies files in the working directory.
- The files are staged, which adds snapshots of them to the staging area.
- The changes are committed, which takes the files as they are in the staging area and stores that snapshot permanently in the Git directory.
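The three states and the modify, stage, commit workflow can be modelled with a short Python sketch. It is a toy under stated assumptions rather than Git's implementation: `ToyGit` and its methods are hypothetical, files are plain strings, and a brand-new file is simply reported as modified (real Git would call it untracked).

```python
class ToyGit:
    """Hypothetical model of the working directory, staging area and commits."""

    def __init__(self):
        self.working = {}   # working directory: path -> content
        self.index = {}     # staging area: path -> staged content
        self.commits = []   # committed snapshots, oldest first

    def head(self):
        """Most recent committed snapshot (empty before the first commit)."""
        return self.commits[-1] if self.commits else {}

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Staging records the file exactly as it is right now.
        self.index[path] = self.working[path]

    def commit(self):
        # A commit stores what is in the staging area, not the working directory.
        self.commits.append(dict(self.index))
        return len(self.commits)

    def state(self, path):
        content = self.working[path]
        if self.head().get(path) == content:
            return "committed"
        if self.index.get(path) == content:
            return "staged"
        return "modified"
```

Editing a file after it has been staged puts it back into the modified state, while the staged snapshot stays as it was until the file is added again.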
General Properties of Git
- Git employs a distributed repository model.
- Git uses a merge concurrency model.
- Git was developed primarily for Linux, but also runs on other UNIX-like and POSIX-based operating systems and on Windows.
Technical Properties of Git
- Version IDs are SHA-1 checksums: 40-character hexadecimal strings computed from the content being stored.
- Git knows everything by hash and not by file name.
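As an illustration of content-based IDs, the following Python sketch computes the ID Git assigns to a file's content (a "blob"): the SHA-1 of a short header, `blob`, the content length and a NUL byte, followed by the content itself. The function name `git_blob_id` is made up for this example; commits and directory trees are hashed in a similar way with their own headers.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the 40-character hexadecimal ID Git gives to this file content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Because the ID depends only on the content, two files with identical content get the same ID regardless of their names, which is what "by hash and not by file name" means in practice.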
Features of Git
- All commits are atomic operations.
- Git provides partial support for file renames. It does not explicitly track file renames as it does not track individual files.
- Git provides support for merge file renames.
- Versioning of symbolic links is enabled.
- Every time a commit is performed, Git takes a snapshot of the files.
- Files that have not been modified are not stored again; the snapshot refers to the copy already stored.
- Operations such as browsing history and committing are local.
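How snapshots avoid storing unmodified files again can be sketched with a small content-addressed store in Python. This is a simplification, not Git's object format: `SnapshotStore` is a hypothetical name, and the object ID here is a plain SHA-1 of the content without the headers Git adds.

```python
import hashlib


class SnapshotStore:
    """Hypothetical store: every commit is a full snapshot of path -> object ID."""

    def __init__(self):
        self.objects = {}     # object ID -> file content (bytes)
        self.snapshots = []   # one {path: object ID} mapping per commit

    def commit(self, files):
        snapshot = {}
        for path, content in files.items():
            object_id = hashlib.sha1(content).hexdigest()
            # Identical content is stored only once; unchanged files just
            # point at the object that is already in the store.
            self.objects.setdefault(object_id, content)
            snapshot[path] = object_id
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1   # index of the new snapshot

    def read(self, snapshot_index, path):
        return self.objects[self.snapshots[snapshot_index][path]]
```

Each snapshot lists every file, yet committing a change to one file adds only one new object to the store.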
A developer's working copy is a mirror of the repository, called a Git clone: a complete repository with the entire history and full revision tracking capabilities. Such a repository depends on neither network access nor a central server. This model makes branching and merging fast and easy.