CSC/ECE 517 Fall 2011/ch1 1f sv: Difference between revisions

From Expertiza_Wiki
==Comparing Version Control Systems from the Programmer's Standpoint==


[http://en.wikipedia.org/wiki/Revision_control Version Control System (VCS)] is software used by a group of users working simultaneously on the same documents, programs, images or other information stored as computer files. It lets people collaborate on a project without constantly having to swap files between them. The system tracks the changes made to the files over time, so at any point the users can roll back to a previous version of a file. Versions are usually identified by an incrementing number or letter code known as a revision number, which makes it easy to return to the last known good version in case of errors. Changes can range from fixing a typo in a text file to a large refactoring spanning hundreds of files.

Each change is labeled with the name of the person who introduced it, the time of the change and an optional description, which makes it possible to track the evolution of a file over time. A VCS also allows you to branch from the source file, creating a parallel copy on which you make your own changes, and to merge the two lines of work later.

Here we look at the different types of version control systems and a few examples of each, discussing their features and limitations from a programmer's standpoint.


__TOC__
 
==Overview==
 
Version control is gaining ground as more software development projects require large groups of programmers to work on the same source code. As many developers collaborate on new projects, it becomes all the more important to ensure consistency among the work being done and deployed. Besides enabling collaborative development, version control also helps track and fix bugs in the source without affecting other parts of the project. It is critically important when developers need to identify the root cause of a bug by retrieving and running different versions of the project. Version control systems range from a simple file-based model, where individual files are versioned, to a distributed model, where the entire repository is versioned.
 
In many projects, the programmers are geographically separated from one another, work on different parts of the code and pursue varied interests and goals.
 
==Terminology==
 
*'''[http://en.wikipedia.org/wiki/Repository Repository]:''' A repository is the place where the current and historical data of the files in a project are stored, often on a server. It can follow a file-based, centralized or distributed model.
 
*'''[http://www.ericsink.com/scm/scm_branches.html Branch]:''' When two or more developers intend to work on distinct copies of the files, the set of files under version control may be branched so that each developer can work independently on their own copy. The changes made to a file in a particular branch are not visible to the other developers until they have been merged back.
 
*'''[http://en.wikipedia.org/wiki/Trunk_(software) Trunk]:''' The unique line of development that is not a branch. This is sometimes also known as the mainline.
 
*'''[http://svnbook.red-bean.com/en/1.5/svn.branchmerge.tags.html Tag]:''' A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
 
*'''[http://www.tigris.org/nonav/scdocs/ddCVS_cvsglossary.html Working copy]:''' The working copy is the local copy of files from a repository on which the developer works, at a specific time or revision. Any changes to be done to the repository are initially done on a working copy. Conceptually, it is a sandbox.
 
*'''[http://www.tigris.org/nonav/scdocs/ddCVS_cvsglossary.html Check-out]:''' Creating a local working copy from the repository is called a check-out. The developer may check out either a particular revision of a file or the latest one.
 
*'''[http://en.wikipedia.org/wiki/Commit_(data_management) Commit]:''' The act of merging the changes made in the working copy back to the repository is known as a commit or check-in.
 
*'''[http://en.wikipedia.org/wiki/Merge_(revision_control) Merge]:''' When two sets of changes are applied to a file or set of files, it is known as a merge operation.
 
*'''Promote:''' Copying the content of a file from a less controlled location into a more controlled location is called promoting, for example from the workspace of a user to the central repository.
 
==Criteria for comparing the different Version Control Systems==
 
Listed below are the criteria for comparing the different types of VCS:
 
====General Properties====
 
 
*'''The repository model:''' describes the relationship between the various copies of the source code repository.
*'''The concurrency model:''' describes how simultaneous edits to the working copy are managed so that users do not overwrite each other's work.
*'''Platform:''' specifies the operating system that the software supports.
 
====Technical Properties====
 
*'''Scope of change:''' describes if changes are recorded on a file basis or a directory basis.
*'''Version IDs:''' describes whether version identifiers are based on a hash of the content, on sequential numbering, or on some other scheme.
 
====Features====
 
*'''Programming Language:''' describes the programming language in which the software has been written.
*'''Atomic commits:''' ensures either all the changes are committed or none are.
*'''File renames:''' describes if the software allows files to be renamed while retaining their version history.
*'''Merge file renames:''' describes if the software can merge changes made to a file on one branch into the same file that has been renamed on another branch.
*'''Symbolic links:''' describes if the software allows version control of symbolic links as with regular files.
*'''Merge tracking:''' describes if the software tracks which changes have been merged between the respective branches, so that only the missing changes are merged when one branch is merged into another.
 
==Types of Version Control Systems==


Version control systems can be classified into three categories: local, centralized (client-server) and distributed.

==Local Version Control==
 
In this approach, all developers use the same computer system. Tools of this kind usually manage single files individually, and they have largely been replaced by, or embedded within, newer software.
 
====General Properties of Local Version Control====
 
*It employs a file server repository model, maintaining multiple copies of the same file.
*Checksums and deltas are employed to maintain file integrity.
*It was originally developed for UNIX operating systems.


====Technical Properties of Local Version Control====

*It versions only individual files.
*Versioning is achieved through sequential numbering of file revisions.

==Examples of Local Version Control==


===Source Code Control System===

[http://en.wikipedia.org/wiki/Source_Code_Control_System Source Code Control System (SCCS)] is an early version control system that shipped as part of UNIX. It has largely been superseded by the Revision Control System. SCCS is based on interleaved deltas and can construct versions as arbitrary sets of revisions. Extracting any version takes essentially the same time, which makes SCCS more useful in environments that rely heavily on branching and merging with multiple "current" and identical versions.


====Features of SCCS====

*It was initially written in SNOBOL and later rewritten in C.
*SCCS provides facilities for storing, updating and retrieving all versions of modules by version number.
*It records each change together with the time of the change, the user who made it and the location of the change.
*It helps retrieve an earlier version of a file after a system crash or an unintended deletion.

====Limitations of SCCS====
*It versions only individual files, so it is suitable mainly for small projects with a single developer.


===Revision Control System===

[[Image:revision_control.jpg|thumb|225px|alt=Revision Control System.|[http://en.wikipedia.org/wiki/Revision_Control_System ''Revision Control System'']]]

[http://en.wikipedia.org/wiki/Revision_Control_System Revision Control System (RCS)] is software used to manage multiple revisions of a file. For text that is edited frequently, RCS automates the storing, retrieval, logging, identification and merging of revisions.
It suits single-user work because it does not require a central repository for storing revisions.

====Features of RCS====

*It is written in C.
*Multiple copies of the same file are maintained.
*RCS stores the latest version in full and earlier versions as backward deltas, which gives fast access to the tip of the trunk at the cost of slower access to branch tips.
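
The idea of storing deltas can be illustrated with a minimal sketch. It assumes a line-based delta built on Python's <code>difflib.SequenceMatcher</code>; the names <code>make_delta</code>, <code>apply_delta</code> and <code>checkout</code> are hypothetical, and the delta layout is not the actual RCS file format. Only the newest revision is kept in full, and older revisions are rebuilt by applying backward deltas one after another.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Return edit operations that turn the `source` lines into the `target` lines."""
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: refer to the source lines instead of storing them again.
            ops.append(("copy", i1, i2))
        else:
            # replace / insert / delete: store the target lines literally
            # (an empty list for a pure deletion).
            ops.append(("lines", target[j1:j2]))
    return ops


def apply_delta(source, ops):
    """Rebuild the target lines from `source` and a delta made by make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# Three revisions of one file, oldest first (illustrative data only).
v1 = ["int main() {", "}"]
v2 = ["int main() {", "  return 0;", "}"]
v3 = ["#include <stdio.h>", "int main() {", "  return 0;", "}"]

# Keep the newest revision in full, plus backward deltas: v3 -> v2, then v2 -> v1.
tip = v3
backward_deltas = [make_delta(v3, v2), make_delta(v2, v1)]


def checkout(steps_back):
    """Return the revision that is `steps_back` revisions older than the tip."""
    lines = tip
    for delta in backward_deltas[:steps_back]:
        lines = apply_delta(lines, delta)
    return lines
```

In this sketch, <code>checkout(0)</code> returns the tip without applying any delta, while each older revision costs one more delta application, which mirrors why access to the newest revision is the fastest.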


====Limitations of RCS====

*It is error-prone, due to many copies of the same file.
*It works best with a single developer on the project. With multiple developers, administrative intervention is needed to maintain the files.
*When several developers work on the same project, the built-in locking mechanism is used instead of branching individual files, because the syntax for branching is very cumbersome.

==Centralized Version Control==


The Centralized Version Control model, also known as the client-server model, consists of a single shared repository that acts as the server, with the users as clients. The repository is located in one place, and all clients access it to commit changes and to send and receive information.

====General Properties of Centralized Version Control====

*It employs a client-server (centralized) repository model.
*It employs a merge concurrency model.
*It runs on Windows, Mac OS and UNIX-like operating systems.

====Technical Properties of Centralized Version Control====

*Changes made to the files are recorded on a file basis.
*Sequential numbering is used for versioning.

==Examples of Centralized Version Control==

===Concurrent Versions System===
 
[http://en.wikipedia.org/wiki/Concurrent_Versions_System Concurrent Versions System (CVS)] was originally built on the Revision Control System and is licensed under the GPL. It uses a client–server architecture: clients connect to the server, where the current versions of a project and its history are stored, "check out" a complete copy of the project, make changes to this copy and later "check in" their changes. The client can connect to the server over a LAN or the Internet. The client and server can also run on the same machine when a project's version history only needs to be tracked for local developers.
 
The Concurrent Versions System is an important component of Source Configuration Management.
 
====Features of CVS====
 
*It is written in C.
*Several developers can work concurrently on the same project, each one "checking out" files of the project into their "working copy" and "checking in" their changes to the server.
*In CVS, the client can continue to work on the checked-out copy even if the network connection between the client and the server is lost.
*Developers are kept from overwriting each other's work because the server accepts a "check-in" only against the most recent version of a file. Developers are therefore expected to keep their working copy up to date by incorporating other people's changes on a regular basis.
*On a successful check-in, the version numbers of all the files involved are incremented, and the server records a user-supplied description line, the author's name and the date.
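
The up-to-date rule in the list above can be sketched as a toy model. The names <code>ToyServer</code>, <code>check_in</code> and <code>StaleWorkingCopy</code> are hypothetical, and this is not the CVS implementation or its protocol; the sketch only assumes that the server compares the revision a working copy was based on with the newest revision it holds.

```python
class StaleWorkingCopy(Exception):
    """Raised when a check-in is based on an outdated revision."""


class ToyServer:
    def __init__(self):
        # file name -> list of (content, author, message);
        # the revision number of an entry is its index + 1
        self.history = {}

    def latest_revision(self, name):
        return len(self.history.get(name, []))

    def check_out(self, name):
        """Return (revision, content) of the newest revision of a file."""
        revisions = self.history[name]
        return len(revisions), revisions[-1][0]

    def check_in(self, name, base_revision, content, author, message):
        """Accept the change only if it was made against the newest revision."""
        if base_revision != self.latest_revision(name):
            raise StaleWorkingCopy(f"{name}: update your working copy first")
        self.history.setdefault(name, []).append((content, author, message))
        return self.latest_revision(name)


server = ToyServer()
server.check_in("main.c", 0, "v1", "alice", "initial import")

# Both developers check out revision 1.
rev_alice, _ = server.check_out("main.c")
rev_bob, _ = server.check_out("main.c")

# Alice checks in first, so the newest revision becomes 2.
server.check_in("main.c", rev_alice, "v2 by alice", "alice", "fix bug")

try:
    server.check_in("main.c", rev_bob, "v2 by bob", "bob", "add feature")
    bob_accepted = True
except StaleWorkingCopy:
    # Bob has to update (merge Alice's change into his copy) and try again.
    bob_accepted = False
```

The second check-in is refused because it was prepared against revision 1 while the server already holds revision 2, which is the situation in which a developer is expected to update the working copy first.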
 
Clients can also compare versions, request a complete history of changes, or check out a historical snapshot of the project as of a given date or as of a revision number.

CVS labels a single project (set of related files) which it manages as a module. A CVS server stores the modules it manages in its repository. Programmers acquire copies of modules by checking out. The checked-out files serve as a working copy, sandbox or workspace. Changes to the working copy will be reflected in the repository by committing them. To update is to acquire or merge the changes in the repository with the working copy.
 
====Limitations of CVS====
 
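*Revisions created by a commit are per file, rather than spanning the collection of files that make up the project or the entire repository.
*CVS does not version the moving or renaming of files and directories.
*Symbolic links are not versioned. Symbolic links stored in a version control system can pose a security risk: a link named index.htm that points to /etc/passwd would expose that file once the code is exported to a web server.
*Support for Unicode and non-ASCII filenames is limited.
*Commits are not atomic, so the network and server must be resilient enough for a commit to complete without a crash.
*Branch operations are expensive. CVS assumes that most work takes place on the trunk and that branches are short-lived or historical.
*Files are treated as text by default. Binary files are supported, and files with a particular extension can be recognized automatically as binary.
*There is no support for distributed revision control or unpublished changes.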
 
===Subversion===
 
[http://en.wikipedia.org/wiki/Subversion_(software) Subversion (SVN)] is an open-source version control system, distributed under the Apache License, that was inspired by CVS and is available on the major operating systems.
 
''"[http://subversion.apache.org/ Subversion] exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations."''
 
====Features of SVN====


SVN provides developers with the following advantages when compared to legacy CVS:

*It is written in C.
*All commits are atomic operations: no part of a commit takes effect until the entire commit has succeeded.
*Subversion versions directories as first-class objects, just like files.
*Copying, deleting and renaming are versioned operations.
*Subversion allows properties to be attached to files. These properties are versioned along with the files, and they can be added to a file even after it has been committed.
*Revision numbers are per-commit, not per-file, and a commit's log message is attached to its revision, not stored redundantly in all the files affected by that commit.
*A tag is created by a copy operation. A copy remains a tag as long as nothing is committed to it; once a commit is made on it, it becomes a branch.
*Subversion supports (but does not require) locking files so that users can be warned when multiple people try to edit the same file. A file can be marked as requiring a lock before being edited, in which case Subversion will present the file in read-only mode until a lock is acquired.
*Full revision history is maintained for files that are renamed, copied, moved or removed.
*Versioning is maintained for directories, renames and file metadata, which enables developers to move or copy entire directory trees while retaining the entire revision history.
*Versioning of symbolic links is supported.
*Branching in SVN is a cheap operation, independent of file size.
*Files that cannot be merged can be locked by developers, which is known as "reserved checkouts".
 
The Subversion filesystem uses a transaction model to keep changes truly atomic. A transaction operates on a specified revision of the filesystem, not necessarily the latest. Changes are made against the root of the transaction, which either is committed and becomes the latest revision, or is aborted. Several developers can access the same transaction and work together on an atomic change.
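
The all-or-nothing behaviour of a commit can be sketched with a toy repository. The class <code>ToyRepository</code> and its methods are hypothetical and greatly simplified compared with the Subversion filesystem; in particular, this sketch assumes that every transaction starts from the latest revision. Changes are applied to a private copy of the tree, and the copy becomes a new revision only if every change succeeds.

```python
class ToyRepository:
    def __init__(self):
        self.revisions = [{}]   # revision 0 is the empty tree: path -> content
        self.log = [""]         # one log message per revision, not per file

    def commit(self, changes, message):
        """Apply a list of (path, content) changes; content None deletes the path."""
        # The transaction works on a private copy of the latest tree.
        txn = dict(self.revisions[-1])
        for path, content in changes:
            if content is None:
                # Deleting a missing path raises KeyError and aborts the whole
                # transaction before anything has been published.
                del txn[path]
            else:
                txn[path] = content
        # Reached only when every change succeeded: publish the new revision.
        self.revisions.append(txn)
        self.log.append(message)
        return len(self.revisions) - 1


repo = ToyRepository()
first = repo.commit([("a.txt", "1"), ("b.txt", "2")], "add two files")

try:
    # The second change is invalid, so the first one must not take effect either.
    repo.commit([("a.txt", "changed"), ("missing.txt", None)], "broken commit")
    aborted = False
except KeyError:
    aborted = True
```

After the failed commit the repository still has the same number of revisions, and <code>a.txt</code> keeps its earlier content, because the partially modified copy is simply discarded.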
 
'''Repository types and Branching'''
 
SVN offers two types of repository storage:
*'''Berkeley DB''', a reliable database system. It provides real transactional support and hot backups, which allow the user to back up the database without taking it offline. One of its major limitations is that it is not portable.
 
*'''FSFS''', an alternative to the original Berkeley DB implementation that does not use a database. Its advantages include platform independence, little or no need for recovery, smaller repositories and compatibility with standard backup software. Beginning with Subversion 1.2, FSFS is the default data store for new repositories.
 
SVN employs the inter-file branching model of trunks, branches and tags to handle versioning.
 
[[File:Branches and Tags.jpg|thumb|center|600px|'''''Illustration of Trunk, Branches and Tags''''']]
 
 
The 'svn copy' command creates a new branch. The original and the copy are linked together internally, and history is preserved for both. Only the differences between the copy and the original are stored in the repository, so the copy takes up only a small amount of space. The versions in a branch retain the history of the file up to the point of the copy, plus any changes made since. These changes can be merged back into the trunk or between branches.
 
====Limitations of SVN====
 
*It does not have repository administration and management features.
*It does not store the timestamps of modifications.
*It stores additional copies of data on the local machine which can cause space issues for large projects.
*It does not support merging of file renames.
 
==Distributed Version Control==
 
[[File:Distributed Version Control.jpg|thumb|right|280px|'''''Distributed Version Control Model''''']]
 
In [http://en.wikipedia.org/wiki/Distributed_revision_control Distributed Version Control Systems], there is no centralized shared repository. Instead, each developer maintains the entire repository, and changes are shared between developers on a peer-to-peer basis. This model emphasizes sharing changes among peers rather than tracking, backing up and synchronizing changes through one server. Everyone maintains a personal sandbox that holds the incremental history, so programmers do not need to log in to a central server or stay connected to the network at all times. Even if a server is down, programmers can continue working from anywhere.
 
If required, a Distributed Version Control System can use centrally administered locations to which users push their changes. Most of the time, though, each user maintains a repository independently. Users are not required to share their changes on a regular basis and can do so at their discretion. Because changes are made to the local repository, commits and reverts are much faster than in a centralized system. DVCSs are also easier to set up and manage, since no permanently running server has to be maintained. Adding peers is not centrally administered, so programmers can attach themselves to the system.
 
===General Properties of Distributed Version Control===
 
*It has a distributed repository model.
*It has a merge concurrency model.
*It is supported on Windows, POSIX based systems and Unix-like systems, such as FreeBSD, Mac OS X and Linux.
 
===Technical Properties of Distributed Version Control===
 
* Every working copy is a mirror of the repository, and changes are recorded on a directory (tree) basis.
* Versioning is based on a globally unique identifier (GUID), such as a hash, which makes it simpler for the system to track and share changes.
* The GUIDs help in merging files without creating duplicates.
 
==Examples of Distributed Version Control Systems==
===Git===
 
[http://git-scm.com/ Git] is a distributed version control system with an emphasis on speed. Every working directory is a full-fledged repository with complete history and full revision tracking capabilities, independent of network access or a central server. It has a simple design and offers strong support for non-linear development. It is fully distributed and can handle large projects such as the Linux kernel efficiently.
 
Git differs from other version control systems in the way it thinks about its data. It treats its data as a set of snapshots of a mini file system rather than as a list of file-based changes.
 
Git was originally designed as a low-level version control engine on top of which others could write front ends. It has since grown into a complete revision control system that programmers can use directly.
 
====Features of Git====
 
*It is written in C, shell scripts and Perl.
*All commits are atomic operations.
*Git provides partial support for file renames. It does not explicitly track renames because it does not track individual files.
*Git supports merging of file renames.
*Versioning of symbolic links is supported.
*Every time a commit is performed, Git takes a snapshot of the files.
*Files that have not been modified are not copied.
*All operations, such as browsing history and committing, are local.
 
In Git, version IDs are [http://en.wikipedia.org/wiki/SHA-1 SHA-1] hashes, 40-character hexadecimal strings computed from content. Git identifies everything by hash and not by file name. A developer's working copy is a mirror of the repository, called a Git clone, and it is a complete repository with the entire history and full revision tracking capabilities. Such a repository depends on neither network access nor a central server. This model makes branching and merging fast and easy.
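
Content-based identification can be illustrated for a single file. The sketch below computes the ID Git assigns to a file's content (a "blob"): the SHA-1 of a short header followed by the bytes of the file. The function name <code>git_blob_id</code> is hypothetical; the result should match what <code>git hash-object</code> reports for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the 40-character hexadecimal ID Git gives to this file content."""
    # Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    # followed by the content itself.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


readme_id = git_blob_id(b"hello world\n")
copy_id = git_blob_id(b"hello world\n")      # same bytes under another file name
edited_id = git_blob_id(b"hello, world\n")   # any change yields a different ID
```

Because the ID depends only on the bytes, two files with identical content share one ID regardless of their names, which is why identical content does not need to be stored twice.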
 
[[File:The Three States.jpg|thumb|right|300px|The Three States]]
'''The Three States'''
 
Git has three states in which a file can reside: committed, modified and staged.
*'''Committed:''' Data is stored safely in the database.
*'''Modified:''' The file has been changed but not yet committed to the database.
*'''Staged:''' A modified file has been marked in its current version to go into the next commit snapshot.
 
 
'''Workflow of Git'''
 
The developer modifies files in the working directory and then stages them, which adds snapshots of them to the staging area. When the changes are committed, the snapshots are taken from the staging area and stored permanently in the Git directory.
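
The workflow above can be modelled with a small sketch. The class <code>ToyGit</code> and its methods are hypothetical and ignore almost everything real Git does; the sketch only assumes three areas (working directory, staging area and committed snapshots) and derives the state of a file by comparing them.

```python
class ToyGit:
    def __init__(self):
        self.working_dir = {}    # path -> content as edited by the developer
        self.staging_area = {}   # path -> content marked for the next commit
        self.commits = [{}]      # list of snapshots, newest last

    def edit(self, path, content):
        self.working_dir[path] = content

    def stage(self, path):
        # Record the current version of the file for the next snapshot.
        self.staging_area[path] = self.working_dir[path]

    def commit(self):
        # Store the staged snapshot permanently.
        self.commits.append(dict(self.staging_area))

    def state(self, path):
        work = self.working_dir.get(path)
        if work != self.staging_area.get(path):
            return "modified"    # changed since it was last staged
        if work != self.commits[-1].get(path):
            return "staged"      # staged, but not yet in the newest snapshot
        return "committed"


repo = ToyGit()
observed = []

repo.edit("notes.txt", "first draft")
observed.append(repo.state("notes.txt"))
repo.stage("notes.txt")
observed.append(repo.state("notes.txt"))
repo.commit()
observed.append(repo.state("notes.txt"))
repo.edit("notes.txt", "second draft")
observed.append(repo.state("notes.txt"))
```

The list <code>observed</code> records the state after each step, so it follows the file through the cycle of editing, staging and committing.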
 
====Limitations of Git====
 
* No central repository is available by default if a programmer loses data and needs to recover it.
* There is no single, centrally held stable latest version of the code.
* There are no sequential revision numbers, only hash-based change identifiers, so going back to a previous revision is less convenient.
 
===Mercurial===
 
[http://mercurial.selenic.com/ Mercurial] is an open-source distributed version control tool for software developers. It is designed to run on multiple platforms. Mercurial is a fast, lightweight and efficient tool for handling very large distributed projects, and it supports both personal projects and large-scale enterprise projects. Mercurial is implemented mainly in Python, but it also includes a binary diff implementation written in C.
 
====Features of Mercurial====
 
*It is written in Python and C.
*All commits are atomic operations.
*Mercurial supports file renames.
*Mercurial supports the merging of file renames.
*Mercurial supports versioning of symbolic links.
*Mercurial supports merge tracking.
 
Mercurial uses an HTTP based protocol which tries to reduce the round trip time, new connections and data transferred. It can also work over a similar protocol in SSH. Similar to Git, in Mercurial revisions are identified using [http://en.wikipedia.org/wiki/SHA-1 SHA-1] Hashes.
 
====Limitations of Mercurial====
 
* No central repository is available by default if a programmer loses data and needs to recover it. Only working copies exist.
* There is no single, centrally held stable latest version of the code.
* Revision numbers are local to each clone. Only the hash-based changeset IDs identify a revision across repositories, so referring to a previous revision is less convenient.
 
==Comparison in a Nutshell==
 
{| class="wikitable sortable" style="font-size: 90%; text-align: center; width: auto;"
|-
! Software
! Repository model
! Concurrency model
! Programming Language
! Platforms supported
! Version IDs
! Scope of Change
! Atomic Commits
! File Renames
! Merge File Renames
! Symbolic Links
! Merge Tracking
|-
!SCCS
| File Server
| Lock
| C
| Windows, UNIX-like operating systems, Mac OS
| Sequential numbering
| Individual files are versioned
| No
| No
| No
| No
| No
|-
!RCS
| File Server
| Lock
| C
| Windows, UNIX-like operating systems, Mac OS
| Sequential Numbering
| Individual files are versioned
| No
| No
| No
| No
| No
|-
!CVS
| Centralized Server
| Merge
| C
| Windows, UNIX-like operating systems, Mac OS
| Sequential Numbering
| Individual files are versioned
| No
| No
| No
| No
| No
|-
!SVN
| Centralized Server
| Merge or Lock
| C
| Windows, UNIX-like operating systems, Mac OS
| Sequential Numbering
| Tree structured
| Yes
| Yes
| No
| Yes
| Yes
|-
!Git
| Distributed
| Merge
| C, shell scripts and Perl
| Windows, UNIX-like operating systems, Mac OS, POSIX-based systems
| SHA-1 hash
| Tree
| Yes
| Partial
| Yes
| Yes
| Yes
|-
!Mercurial
| Distributed
| Merge
| Python, C
| Windows, UNIX-like operating systems, Mac OS
| SHA-1 hash
| Tree
| Yes
| Yes
| Yes
| Yes
| Yes
|}
 
==See also==
 
* [http://en.wikipedia.org/wiki/List_of_revision_control_software List of Revision Control Software]
 
==References==
 
[http://en.wikipedia.org/wiki/Revision_control  1. Wikipedia - Revision Control]
 
[http://en.wikipedia.org/wiki/Subversion_(software) 2. Wikipedia - Subversion]
 
[http://en.wikipedia.org/wiki/Comparison_of_revision_control_software 3. Wikipedia - Comparison of Revision Control Software]
 
[http://better-scm.berlios.de/comparison/comparison.html 4. Comparison of SCM]
 
[http://git-scm.com/ 5. GIT - SCM]
 
[http://progit.org/book/ 6. PROGIT]


[http://subversion.apache.org/ 7. Subversion Apache]


Repository types
[http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_517_Fall_2010/ch1_1a_br 8. Expertiza Fall 2010 Wiki]
Subversion offers two types of repository storage — FSFS and Berkeley DB.


[http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_517_Fall_2010/ch1_1a_vc 9. Expertiza Fall 2010 Wiki1]
FSFS
FSFS works faster on directories with a large number of files and takes less disk space, due to less logging.[5] Beginning with Subversion 1.2, FSFS is the default data store for new repositories.
Berkeley DB
Subversion has some limitations with Berkeley DB usage when a program that accesses the database crashes or terminates forcibly. No data loss or corruption occurs, but the repository is offline while Berkeley DB replays the journal and cleans up any outstanding locks. When using Berkeley DB repository, the only way to use it safely is on the dedicated server and by a single server process running as one user, according to Version Control with Subversion.[6] Existing tools for Berkeley DB repository recovery are not completely reliable, so system administrators need to make frequent repository backups[citation needed].
Repository access
Access to Subversion repositories can take place by the following means:
Local filesystem or network filesystem,[7] accessed by client directly. This mode uses the file:///path access scheme.
WebDAV/Delta-V (over http or https) using the mod_dav_svn module for Apache 2. This mode uses the http://host/path access scheme or https://host/path for secure connections using ssl.
Custom "svn" protocol (default port 3690), using plain text or over TCP/IP. This mode uses either the svn://host/path access scheme for unencrypted transport or svn+ssh://host/path scheme for tunneling over ssh.
All three means can access both FSFS and Berkeley DB repositories.
Any 1.x version of a client can work with any 1.x server. Newer clients and servers have additional features and performance capabilities, but have fallback support for older clients/servers.[8]
Layers


Internally, a Subversion system comprises several libraries arranged as layers. Each performs a specific task and allows developers to create their own tools at the desired level of complexity and specificity.
[http://en.wikipedia.org/wiki/Revision_Control_System 10. Wikipedia - Revision Control System]
Fs
The lowest level; it implements the versioned filesystem which stores the user data.
Repos


[http://mercurial.selenic.com/ 11. Mercurial Selenic]


These are versioned just like other changes to the filesystem. Users can add any property they wish, and the Subversion client uses a set of properties, which it prefixes with 'svn:'.
[http://en.wikipedia.org/wiki/SHA-1 12. Wikipedia SHA -1]
svn:executable
Makes files on Unix-hosted working copies executable.
svn:mime-type
Stores the Internet media type ("MIME type") of a file. Affects the handling of diffs and merging.
svn:ignore
A list of filename patterns to ignore in a directory. Similar to CVS's .cvsignore file.
svn:keywords
A list of keywords to substitute into a file when changes are made. The file itself must also reference the keywords as $keyword$ or $keyword:...$. This is used to maintain certain information (e.g., author, date of last change, revision number) in a file without human intervention.
The keyword substitution mechanism originates from rcs[10] and from cvs.
svn:eol-style
Makes the client convert end-of-line characters in text files. Used when the working copy is needed with a specific EOL style. "native" is commonly used, so that EOLs match the user's OS EOL style. Repositories may require this property on all files to prevent inconsistent line endings, which can cause a problem in itself.
svn:externals
Allows parts of other repositories to be automatically checked-out into a sub-directory.
svn:needs-lock
Specifies that a file is to be checked out with file permissions set to read-only. This is designed for use with the locking mechanism. The read-only permission reminds one to obtain a lock before modifying the file: obtaining a lock makes the file writable, and releasing the lock makes it read-only again. Locks are only enforced during a commit operation. Locks can be used without setting this property. However, that is not recommended, because it introduces the risk of someone modifying a locked file; they will only discover it has been locked when their commit fails.
svn:special
This property is not meant to be set or modified directly by users. As of 2010 only used for having symbolic links in the repository. When a symbolic link is added to the repository, a file containing the link target is created with this property set. When a Unix-like system checks out this file, the client converts it to a symbolic link.
svn:mergeinfo
Used to track merge data (revision numbers) in Subversion 1.5 (or later). This property is automatically maintained by the merge command, and it is not recommended to change its value manually.[11]
Subversion also uses properties on revisions themselves. Like the above properties on filesystem entries the names are completely arbitrary, with the Subversion client using certain properties prefixed with 'svn:'. However, these properties are not versioned and can be changed later.
svn:date
the date and time stamp of a revision
svn:author
the name of the user that submitted the change(s)
svn:log
the user-supplied description of the change(s);
Branching and tagging


SVN employs the inter-file branching model to handle branches.  
[http://en.wikipedia.org/wiki/Mercurial 13. Wikipedia Mercurial]


a. A trunk is a location where the main development occurs.
==External Links==
b. A branch is a separate line of development, used to isolate changes.
c. Tags refer to a snapshot of the content.


A new branch is created by using the 'svn copy' command. This creates old and new version copies which are linked together internally, and history is preserved for both. The copied versions take up only a small amount of space, as only the differences from the original versions are saved in the repository. All the versions in each branch maintain the history of the file up to the point of the copy, plus any changes made since. One can "merge" changes back into the trunk or between branches.
*[http://en.wikipedia.org/wiki/Repository Wikipedia - Repository ]
*[http://www.ericsink.com/scm/scm_branches.html Branch]
*[http://en.wikipedia.org/wiki/Trunk_(software) Wikipedia - Trunk ]
*[http://svnbook.red-bean.com/en/1.5/svn.branchmerge.tags.html Tag]
*[http://www.tigris.org/nonav/scdocs/ddCVS_cvsglossary.html Working copy and Checking - out]
*[http://en.wikipedia.org/wiki/Commit_(data_management) Wikipedia - Commit]
*[http://en.wikipedia.org/wiki/Merge_(revision_control) Wikipedia - Merge]

Latest revision as of 03:03, 25 September 2011

Comparing Version - Control Systems from the programmer's stand point

A Version Control System (VCS) is software used by a group of users working simultaneously on the same document, program, image or other information. It lets people collaborate on a project without constantly having to swap files between them. The system lets users track the changes made to the files over time, which means that at any point the users can roll back to a previous version of a file. The versions are usually identified by an incrementing number or letter code known as a revision number. This makes it possible to go back to the last known good version of the file in case of errors. Each change is labeled with the name of the person who introduced it, the time of the change and an optional description, so that the evolution of the file can be tracked over time. A VCS also allows you to branch from the source file, create a parallel copy of the file, make your own changes and then merge the two files in the future.

Here we look at the different types of Version Control Systems available and a few examples of each, discussing their features and limitations from a programmer's standpoint.

Overview

The concept of version control is gaining ground as more software development projects require a large group of programmers to work on the same source code. As many developers collaborate on new projects, it becomes all the more important to ensure consistency among the work being done and deployed. Besides enabling collaborative development, version control also helps track and fix bugs in the source without the other parts of the project being affected. It is critically important when developers need to identify the root cause of a bug by retrieving and running different versions of the project. Version control systems range from a simple file-based model, where individual files are versioned, to a distributed model, where the entire repository is versioned.

In most cases, the programmers are geographically separated from one another, are working on different parts of the code and are pursuing varied interests and goals.

Terminology

  • Repository: A repository is the place where the current and historical data of the files under a project are stored, often on a server. It can follow a file-based, a centralized or a distributed model.
  • Branch: When two or more developers intend to work on distinct copies of a file, the set of files under version control may be branched so that each of them can work independently on their own copy. The changes made to a file in a particular branch are not visible to the other developers until they have been merged back.
  • Trunk: The unique line of development that is not a branch. This is sometimes also known as the mainline.
  • Tag: A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
  • Working copy: The working copy is the local copy of files from a repository on which the developer works, at a specific time or revision. Any changes to be done to the repository are initially done on a working copy. Conceptually, it is a sandbox.
  • Check-out: Creating a local working copy from the repository is called check-out. The developer may either check out a particular revision of the file or just the latest.
  • Commit: The act of merging the changes made in the working copy back to the repository is known as a commit or check-in.
  • Merge: When two sets of changes are applied to a file or set of files, it is known as a merge operation.
  • Promote: Copying the content of a file from a less controlled location into a more controlled location is called promoting, for example from the workspace of a user to the central repository.

Criteria for comparing the different Version Control Systems

Listed below are the criteria for comparing the different types of VCS:

General Properties

  • The repository model: describes the relationship between the various copies of the source code repository.
  • The concurrency model: describes how simultaneous edits to the working copy can be made without users stepping on each other's feet.
  • Platform: specifies the operating system that the software supports.

Technical Properties

  • Scope of change: describes if changes are recorded on a file basis or a directory basis.
  • Version IDs: describes if version number is based on hashed content of the file, or sequential numbering of files etc.

Features

  • Programming Language: describes the programming language in which the software has been written.
  • Atomic commits: ensures either all the changes are committed or none are.
  • File renames: describes if the software allows files to be renamed while retaining their version history.
  • Merge file renames: describes if the software can merge changes made to a file on one branch into the same file that has been renamed on another branch.
  • Symbolic links: describes if the software allows version control of symbolic links as with regular files.
  • Merge tracking: describes if the software tracks the changes which have been merged between the respective branches and only merges the changes that are missing when merging one branch into another.

Types of Version Control Systems

The version control systems can be classified into three categories:

Local version Control

In this approach, all developers make use of the same computer system. Single files are managed individually and are largely replaced or embedded within newer software.

General Properties of Local Version Control

  • It employs a file server repository model, maintaining multiple copies of the same file.
  • Checksums and deltas are employed to maintain file integrity
  • It was basically developed for the UNIX Operating Systems.

Technical Properties of Local Version Control

  • It versions only individual files.
  • Versioning is achieved through a sequential numbering of the files.

Examples of Local Version Control

Source Code Control System

Source Code Control System (SCCS) is an early version control system and is a part of UNIX. SCCS is now obsolete and has largely been replaced by Revision Control System. It is based on interleaved deltas and can construct versions as arbitrary sets of revisions. Extracting an arbitrary version takes essentially the same time, which makes it more useful in environments that rely heavily on branching and merging with multiple "current" and identical versions.

Features of SCCS

  • It was initially written in SNOBOL but later rewritten in C.
  • SCCS provides facilities for storing, updating and retrieving all versions of modules by version number.
  • It records the change, time of change, user who did the change and location of change.
  • It helps retrieve an earlier version of the file in case of system crashes or unintended deletion of the file.

Limitations of SCCS

  • It versions only individual files and is therefore suitable only for small projects with a single developer.

Revision Control System


Revision Control System (RCS) is software used to manage multiple revisions of a file. For text that is edited frequently, RCS automates the storing, retrieval, logging, identification, and merging of revisions. It is advantageous for single-user applications as it does not require a central repository for storing revisions.

Features of RCS

  • It is written in C.
  • Multiple copies of the same file are maintained.
  • RCS stores deltas.
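The idea of storing deltas rather than full copies can be sketched with Python's standard `difflib` module. This is only an illustration of the concept: it does not reproduce the actual RCS file format, and the helper names `make_delta` and `restore` are made up for this example.

```python
import difflib

def make_delta(old_lines, new_lines):
    """Return a line-based delta describing how old_lines became new_lines."""
    return list(difflib.ndiff(old_lines, new_lines))

def restore(delta, which):
    """Rebuild revision 1 (old) or revision 2 (new) from a stored delta."""
    return list(difflib.restore(delta, which))

# Two hypothetical revisions of the same small source file.
rev1 = ["int main() {\n", "  return 0;\n", "}\n"]
rev2 = ["int main() {\n", "  puts(\"hi\");\n", "  return 0;\n", "}\n"]

# Only the delta needs to be kept to recover either revision.
delta = make_delta(rev1, rev2)
```

Calling `restore(delta, 1)` gives back the older revision and `restore(delta, 2)` the newer one, which is the property a delta-based store relies on.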

Limitations of RCS

  • It is error-prone, due to many copies of the same file.
  • It is best suited to a single developer working on the project. With multiple developers, administrative intervention is needed to maintain the files.
  • When several developers work on the same project, the built-in locking mechanism is used instead of branching individual files, as the syntax for branching files is very cumbersome.

Centralized Version Control

The Centralized Version Control, also known as the Client - Server model, consists of a single shared repository which acts as the server and the users are the clients. The repository is located at one place and provides access to all the clients for making changes, commits and sending and receiving information.

General Properties of the Centralized version Control

  • It employs a client-server (centralized) repository model.
  • It employs a merge concurrency model.
  • It runs on Windows, Mac OS and UNIX-like operating systems.

Technical Properties of the Centralized version Control

  • Changes made to the files are recorded on a file basis.
  • Sequential numbering of files is used for versioning.

Examples of the Centralized Version Control

Concurrent Versions System

Concurrent Versions System (CVS) was originally built on Revision Control System and is licensed under the GPL. It uses a client–server architecture: the server stores the current version of a project along with its history, and clients connect to it to "check out" a complete copy of the project, make changes to this copy and later "check in" their changes. The client can connect to the server over a LAN or the Internet. The client and server can also run on the same machine if the version history of a project needs to be tracked only for local developers.

The Concurrent Versions System is an important component of the Source Configuration Management.

Features of CVS

  • It is written in C.
  • Several developers can work concurrently on the same project, each one "checking out" files of the project within their "working copy", and "checking in" their changes to the server.
  • In CVS the client can continue to work on the checked out copy even if the network between the client and the server is disconnected.
  • The problem of users stepping on each other's feet is avoided, as the server accepts a "check-in" only against the most recent version of the file. Developers are therefore expected to keep their working copy up-to-date by incorporating other people's changes on a regular basis.
  • On a successful check-in, the version numbers of all the files involved are incremented and the copies are updated with a user-supplied description line, the author's name and the time.
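The up-to-date rule described in the list above can be sketched as follows. This is a toy model, not CVS code: the class `FileHistory`, its methods and the exception `UpToDateError` are hypothetical names chosen for the illustration, and real CVS revision numbers look like 1.1, 1.2 rather than plain integers.

```python
class UpToDateError(Exception):
    """Raised when a check-in is based on an outdated revision."""

class FileHistory:
    """Hypothetical per-file history with one revision counter per file."""

    def __init__(self, content):
        self.revisions = [content]  # list index 0 holds revision 1

    @property
    def head(self):
        # Number of the most recent revision.
        return len(self.revisions)

    def check_out(self):
        # Hand out the latest content together with its revision number.
        return self.head, self.revisions[-1]

    def check_in(self, base_revision, content):
        # Accept the change only if it was made against the latest revision.
        if base_revision != self.head:
            raise UpToDateError(
                f"working copy is at revision {base_revision}, "
                f"repository is at {self.head}: update first"
            )
        self.revisions.append(content)
        return self.head
```

If two developers check out revision 1 and the first one checks in, the second one's check-in is rejected until that working copy has been updated.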

Clients can also compare versions, request a complete history of changes, or check out a historical snapshot of the project as of a given date or as of a revision number. CVS labels a single project (set of related files) which it manages as a module. A CVS server stores the modules it manages in its repository. Programmers acquire copies of modules by checking out. The checked-out files serve as a working copy, sandbox or workspace. Changes to the working copy will be reflected in the repository by committing them. To update is to acquire or merge the changes in the repository with the working copy.

Limitations of CVS

  • Revisions created by a commit are per file, rather than spanning the collection of files that make up the project or spanning the entire repository.
  • CVS does not version the moving or renaming of files and directories.
  • Versioning of symbolic links is not enabled.
  • Limited support for Unicode and non-ASCII filenames.
  • Commits are not atomic.
  • Branch operations are expensive.

Subversion

Subversion (SVN) is an open-source version control system released under the Apache License. It was inspired by CVS and is available on the major operating systems.

"Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations."

Features of SVN

SVN provides developers with the following advantages when compared to legacy CVS:

  • It is written in C.
  • All commits are atomic operations - No part of a commit takes effect until the entire commit has succeeded.
  • Subversion versions directories as first-class objects, just like files.
  • Copying, deleting and renaming are versioned operations.
  • Subversion allows properties to be attached to files. These properties are also versioned when the files are versioned. These properties can be added to the files even after the commit is made.
  • Revision numbers are per-commit, not per-file, and commit's log message is attached to its revision, not stored redundantly in all the files affected by that commit.
  • A Tag is created when a copy operation is done on the file. Any copy is a tag until a commit operation is done on it. After a commit operation, it becomes a branch.
  • Subversion supports (but does not require) locking files so that users can be warned when multiple people try to edit the same file. A file can be marked as requiring a lock before being edited, in which case Subversion will present the file in read-only mode until a lock is acquired.
  • Full revision history is maintained for files renamed/copied/moved/removed.
  • Versioning is maintained for directories, renames, and file metadata which enables developers to move and/or copy entire directory-trees while retaining the entire revision history.
  • Versioning of symbolic links is enabled.
  • Branching in SVN is a cheap operation, independent of file size.
  • Files that cannot be merged can be locked by developers; this is known as "reserved checkouts".

The Subversion filesystem employs a transaction model to keep changes truly atomic. A transaction operates on a specified revision of the filesystem, not necessarily the latest. The changes are made on the root of the transaction, which either becomes the latest version when committed or is aborted. Several developers can access the same transaction and work together on an atomic change.
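The all-or-nothing behaviour of such a transaction can be sketched in a few lines. This is a toy model and not Subversion's implementation: the class `Repository` and its `commit` method are hypothetical, and a change value of `None` is an assumed convention meaning "delete this path".

```python
class Repository:
    """Hypothetical repository where each revision is a whole-tree snapshot."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def commit(self, changes):
        """Apply every change to a private copy, then publish it as one revision.

        `changes` maps a path to its new content, or to None to delete the path.
        If any single change fails, nothing is published.
        """
        tree = dict(self.revisions[-1])  # private transaction root
        for path, content in changes.items():
            if content is None:
                if path not in tree:
                    raise KeyError(f"cannot delete missing path: {path}")
                del tree[path]
            else:
                tree[path] = content
        # Reached only when all changes succeeded: one new revision number
        # covers the whole commit, not each file separately.
        self.revisions.append(tree)
        return len(self.revisions) - 1
```

Because the changes are applied to a copy and published in a single step, a failure halfway through leaves the latest revision exactly as it was.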

Repository types and Branching

SVN offers two types of repository storage :

  • Berkeley DB, a reliable database system. It provides real transactional support and hot backups. Hot backups allow the user to back up the database without taking it offline. Lack of portability is one of the major limitations of Berkeley DB.
  • FSFS, an alternative to the original Berkeley DB implementation, does not use a database. Platform independence, little or no need for recovery, smaller repositories and the ability to use standard backup software are a few advantages of FSFS.

SVN employs the inter-file branching model of trunks, branches and tags to handle versioning.


Illustration of Trunk, Branches and Tags


The 'svn copy' command creates a new branch. This creates an old and a new version copy which are linked together internally, and history is preserved for both. Only the differences between the copied and the original versions are stored in the repository, so the copied version takes up only a small amount of space. The versions in a branch maintain the history of the file up to the point of the copy, plus any changes made since. The changes made can be merged back into the trunk or between branches.

Limitations of SVN

  • It does not have repository administration and management features.
  • It does not store the timestamps of modifications.
  • It stores additional copies of data on the local machine which can cause space issues for large projects.
  • It does not provide support for merge of file renames.

Distributed Version Control

Distributed Version Control Model

In Distributed Version Control Systems, instead of maintaining a centralized shared repository, the entire repository is maintained by each developer and the changes are shared between them on a peer-to-peer basis. This system emphasizes sharing changes among peers rather than tracking, backing up and synchronizing changes. Everyone maintains a personal sandbox, which holds the incremental history, and thus programmers are not required to log in to a central server. The programmers are not required to be connected to the network at all times. Even if the server is down, the programmers can continue working from anywhere.

If required, a Distributed Version Control System can have centrally administered locations where the users publish their changes, though most of the time the repository is independently maintained by the users. The users are not required to share their changes on a regular basis and can do so at their discretion. The changes are made to the local repository, and hence commits and reverts are much faster than in a centralized system. DVCSs are easier to set up and manage. A permanently running server does not need to be maintained. Adding peers is not centrally administered, and thus programmers can attach themselves to the system.

General Properties of Distributed Version Control

  • It has a distributed repository model.
  • It has a merge concurrency model.
  • It is supported on Windows, POSIX based systems and Unix-like systems, such as FreeBSD, Mac OS X and Linux.

Technical Properties of Distributed Version Control

  • Every working copy is a mirror of the repository and changes are done on a directory basis.
  • Versioning is based on a "guid" or a Unique id like a hash id, which makes it simpler for the system to track and share changes.
  • The guids help in merging files without creating duplicates.
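How hash-based IDs let peers exchange changes without creating duplicates can be sketched as below. The functions `changeset_id` and `pull`, and the way the ID is derived from parent, description and diff text, are assumptions made for this illustration; real systems hash their own internal formats.

```python
import hashlib

def changeset_id(parent_id, description, diff_text):
    """Derive a unique ID from what the changeset contains (illustrative scheme)."""
    payload = "\n".join([parent_id, description, diff_text]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

def pull(local, remote):
    """Copy into `local` every changeset it lacks and return how many were added."""
    missing = {cid: cs for cid, cs in remote.items() if cid not in local}
    local.update(missing)
    return len(missing)

# Two hypothetical peers that start from the same initial changeset.
root = changeset_id("", "initial import", "+hello")
alice = {root: "initial import"}
bob = dict(alice)

# Bob records one more changeset locally.
fix = changeset_id(root, "fix typo", "-hello\n+Hello")
bob[fix] = "fix typo"
```

Because a changeset has the same ID in every repository, pulling twice from the same peer adds nothing the second time.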

Examples of Distributed Version - Control System

Git

Git is a distributed version control system where the emphasis is laid on speed. In this, the working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server. It comes with a simple design and offers strong support for non-linear development. It is fully distributed and can handle large projects like the Linux Kernel efficiently.

Git differs from the other types of version control systems in the way it thinks about its data. It treats its data as a set of snapshots of a mini file system rather than as a list of file-based changes.

Git was originally designed to be a low level version control system engine to which others could code their front end. It has eventually grown into a complete revision control system which can be used directly by the programmers.

Features of Git

  • It is written in C, Shell Scripts, Perl.
  • All commits are atomic operations.
  • Git provides partial support for file renames. It does not explicitly track file renames as it does not track individual files.
  • Git provides support for merge file renames.
  • Versioning of symbolic links is enabled.
  • Every time a commit is performed, Git takes a snapshot of the files.
  • Files that have not been modified are not copied.
  • All operations like browse history and commit are local.

In Git, version IDs are SHA-1 checksums: 40-character hexadecimal strings computed from the content being stored. Git knows everything by hash and not by file name. The working copy of a developer is a mirror repository, called a Git clone, and it is a complete repository with the entire history and full revision tracking capabilities. Such a repository depends on neither network access nor a central server. This model makes branching and merging fast and easy.
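As a small illustration of content-based IDs, the sketch below computes the ID Git assigns to a file's content (a "blob"): the SHA-1 of a short header followed by the raw bytes. The function name `git_blob_id` is made up for this example; commit IDs are computed in a similar way over the commit object rather than over a single file.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the 40-character SHA-1 ID Git would give to this file content."""
    # Git hashes "blob <size in bytes>\0" followed by the content itself.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Two files with identical content get the same ID regardless of their names, which is why Git can refer to everything by hash.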

The Three States


Git has three states in which a file can reside : Committed, Modified and Staged.

  • Committed: Data is stored safely in the database.
  • Modified: The file has been changed but not yet committed to the database.
  • Staged: A modified file has been marked in its current version to go into the next commit snapshot.


Workflow of Git

Files are modified by the developer in the working directory and are then staged in the staging area, where snapshots of them are added. Once the changes are committed, the files are taken from the staging area and the snapshots are stored permanently in the Git directory.
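The workflow above can be modelled with three dictionaries standing in for the working directory, the staging area and the Git directory. This is a conceptual sketch only: `MiniRepo` and its methods are hypothetical and do not correspond to Git's internal data structures.

```python
class MiniRepo:
    """Toy model of the modified / staged / committed states of a file."""

    def __init__(self):
        self.working = {}  # path -> content in the working directory
        self.staged = {}   # path -> content marked for the next commit
        self.commits = []  # list of committed snapshots (path -> content)

    def edit(self, path, content):
        self.working[path] = content

    def stage(self, path):
        # Record the current version of the file for the next snapshot.
        self.staged[path] = self.working[path]

    def commit(self):
        # New snapshot = previous snapshot plus everything that was staged.
        head = self.commits[-1] if self.commits else {}
        self.commits.append({**head, **self.staged})
        self.staged = {}

    def state(self, path):
        head = self.commits[-1] if self.commits else {}
        current = self.working.get(path)
        if path in self.staged and self.staged[path] == current:
            return "staged"
        if path in head and head[path] == current:
            return "committed"
        return "modified"
```

A file that is edited, staged and then committed moves through the states "modified", "staged" and "committed" in that order.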

Limitations of Git

  • There is no mandatory central repository to fall back on if a programmer loses data and needs to access the core data.
  • There is no single, central copy holding the stable latest version of the code.
  • There are no sequential revision numbers; commits are identified by SHA-1 hashes, which makes it less convenient to refer to a previous revision.

Mercurial

Mercurial is an open-source distributed version control tool for software developers. It is designed to be used on multiple platforms. Mercurial is a fast, lightweight and efficient tool for handling very large distributed projects. It supports both personal projects and large-scale enterprise-level projects. Mercurial is mainly implemented in Python, but also includes a binary diff implementation written in C.

Features of Mercurial

  • It is written in Python, C.
  • All commits are atomic operations.
  • Mercurial provides support for File Renames.
  • Mercurial provides support for the merging of file renames.
  • Mercurial supports versioning of symbolic links.
  • Mercurial supports merge tracking.

Mercurial uses an HTTP based protocol which tries to reduce the round trip time, new connections and data transferred. It can also work over a similar protocol in SSH. Similar to Git, in Mercurial revisions are identified using SHA-1 Hashes.

Limitations of Mercurial

  • There is no mandatory central repository to fall back on if a programmer loses data and needs to access the core data. Only working copies exist.
  • There is no single, central copy holding the stable latest version of the code.
  • There are no globally sequential revision numbers; changesets are identified by hash-based IDs, which makes it less convenient to refer to a previous revision.

Comparison in a Nutshell

{| class="wikitable sortable" style="font-size: 90%; text-align: center; width: auto;"
|-
! Software !! Repository model !! Concurrency model !! Programming Language !! Platforms supported !! Version IDs !! Scope of Change !! Atomic Commits !! File Renames !! Merge File Renames !! Symbolic Links !! Merge Tracking
|-
! SCCS
| File Server || Checksums and Deltas || C || Windows, UNIX-like operating systems, Mac OS || Sequential numbering || Individual files are versioned || No || No || No || No || No
|-
! RCS
| File Server || Checksums and Deltas || C || Windows, UNIX-like operating systems, Mac OS || Sequential numbering || Individual files are versioned || No || No || No || No || No
|-
! CVS
| Centralized Server || Merge || C || Windows, UNIX-like operating systems, Mac OS || Sequential numbering || Individual files are versioned || No || No || No || No || No
|-
! SVN
| Centralized Server || Merge or Lock || C || Windows, UNIX-like operating systems, Mac OS || Sequential numbering || Tree structured || Yes || Yes || No || Yes || Yes
|-
! Git
| Distributed Server || Merge || C, Shell Scripts and Perl || Windows, UNIX-like operating systems, Mac OS, POSIX-based systems || SHA-1 hash || Tree || Yes || Partial || Yes || Yes || Yes
|-
! Mercurial
| Distributed Server || Merge || Python, C || Windows, UNIX-like operating systems, Mac OS || SHA-1 hash || Tree || Yes || Yes || Yes || Yes || Yes
|}

See also

References

1. Wikipedia - Revision Control

2. Wikipedia - Subversion

3. Wikipedia - Comparison of Revision Control Software

4. Comparison of SCM

5. GIT - SCM

6. PROGIT

7. Subversion Apache

8. Expertiza Fall 2010 Wiki

9. Expertiza Fall 2010 Wiki1

10. Wikipedia - Revision Control System

11. Mercurial Selenic

12. Wikipedia SHA -1

13. Wikipedia Mercurial

External Links