CSC/ECE 517 Fall 2009/wiki1a 5 rp: Difference between revisions

From Expertiza_Wiki
Revision as of 05:48, 7 September 2009

Introduction

The defining characteristic of a version control system is its ability to track a document, or set of documents, through a series of changes, or revisions. Most version control systems have focused on tracking plain text files, such as programming source code, HTML documents, and other markup formats.

The history of the development of version control tools can be roughly categorized into three main phases:

  1. Local Version Control
  2. Client-Server Version Control
  3. Distributed Version Control

This breakdown is focused on the mechanisms that underlie how data is shared and stored in a version control system. It should not be inferred from this structure that other attributes are unimportant to the history of version control systems. There have been many advances in how:

  • conflicts are recognized and merges are performed
  • groups of logically coherent changes are tracked
  • data is stored

We will discuss each of these in turn in Other Advances.

Local Version Control

The first version control systems focused on local version control; that is, they ran on a single centralized computer shared by many users, often at the same time. The repository, or location in which the data was stored, was simply a directory on that machine to which the users had access. Because of this use case, these systems focused on two main features:

  • File revision tracking
  • File Checkout and Locking

We will address each of these fundamental features in turn.

File Revision Tracking

The primary feature of these early systems was the ability to check in files at various points as they were altered, so that the history of changes made to files under version control was kept permanently. Thus, many users could alter many files over time, and the entire set of documents under version control at a given point in time could be recovered. This prevented the loss of valuable data and provided a record of which users made changes to files over time.
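The idea can be sketched in a few lines of Python. This is only an illustration: the names Revision, FileHistory, check_in and content_at are hypothetical, and the sketch stores a full copy of every revision, whereas real systems typically store only the differences between revisions.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int    # 1-based revision number
    author: str    # user who checked the revision in
    content: str   # full file content at this revision


@dataclass
class FileHistory:
    """Hypothetical permanent history of one file under version control."""
    revisions: list = field(default_factory=list)

    def check_in(self, author, content):
        # Every check-in appends a new revision; nothing is overwritten.
        rev = Revision(len(self.revisions) + 1, author, content)
        self.revisions.append(rev)
        return rev.number

    def content_at(self, number):
        # Recover the file exactly as it was at a given revision.
        return self.revisions[number - 1].content

    def authors(self):
        # Record of which user made each change.
        return [(r.number, r.author) for r in self.revisions]


history = FileHistory()
history.check_in("alice", "first draft")
history.check_in("bob", "second draft")
```

After the two check-ins, history.content_at(1) still returns the first draft, and history.authors() lists who made each revision.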

File Checkout and Locking

Because many users in a shared system may want to edit a file simultaneously, one of the first features developed for version control systems was the ability to check out and lock a file. When a user checks out a file with a lock, he or she reserves the right to be the sole editor of that file until it is checked back in to revision control. The first revision control systems designed for use in a shared environment, such as RCS, allowed files to be checked out and locked in this way. Other users could check out the file, but only to view it. Thus, the file was locked from editing by everyone except the user who held the lock.
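A minimal Python sketch of this check-out-and-lock discipline follows. It is a toy model under stated assumptions, not how RCS is implemented: the class LockingRepository, the exception LockError, and the method names are all hypothetical.

```python
class LockError(Exception):
    """Raised when a user tries to edit a file locked by someone else."""


class LockingRepository:
    """Hypothetical repository in which each file has at most one editor."""

    def __init__(self):
        self.files = {}   # file name -> current content
        self.locks = {}   # file name -> user holding the lock

    def add(self, name, content):
        self.files[name] = content

    def check_out(self, name, user, lock=False):
        # A read-only checkout is always allowed; a locking one is exclusive.
        if lock:
            holder = self.locks.get(name)
            if holder is not None and holder != user:
                raise LockError(f"{name} is locked by {holder}")
            self.locks[name] = user
        return self.files[name]

    def check_in(self, name, user, content):
        # Only the lock holder may check in; checking in releases the lock.
        if self.locks.get(name) != user:
            raise LockError(f"{user} does not hold the lock on {name}")
        self.files[name] = content
        del self.locks[name]
```

In this model a second user can still call check_out without lock=True to view the file, but any attempt to lock it or check in a change raises LockError until the first user checks the file back in.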

Weaknesses of Local Version Control Systems

These local systems had two primary problems.

First, they required that every user log into a single computer to edit or access the information in the repository.

Second, they restricted a particular file to having only one editor at any given time. The next development in revision control, embodied by CVS, sought to address both of these problems.

Networked Revision Control: Client-Server

As users moved away from logging into systems locally to make their changes to files, the need emerged for a revision control system that supported remote operations. The natural way to implement such remote operations was as an extension of the existing system, and by far the most prominent example of this approach was CVS, the Concurrent Versions System, which was initially based on RCS.

The main feature driving the development of CVS was the need for many users, each on his or her own machine, to be able to perform all the operations present in the original RCS, but over a network connection, and in a way that allowed concurrent editing to take place. This led to the development of a client-server model of revision control, in which one central server holds the canonical version of the repository, and various clients connect to the central server to perform file checkouts and commits. This model is very similar to the original RCS model, but rather than requiring users to log into the revision control system locally, it allows them to access and alter the contents of the repository over the network.

Although CVS supports locking in the same way RCS does, CVS was among the first version control systems to support a non-locking repository, which is the source of the "concurrent" in its name, "concurrent versions system". A non-locking repository allows concurrent editing of files under revision control, which in turn created the need for new features to handle the resulting complexities. Chief among these were the notions of branching and merging.

Branching and Merging

Inherent in the notion of concurrent editing is the problem of how to reconcile conflicting changes to the same file. A conflict is essentially two or more changes made to the same file that may be difficult to merge into a final file containing both sets of changes. For example, a conflict would occur if two users both edited line 49 of a file, one changing the word "blue" to "red" and the other changing the same word to "green". The first user would commit his or her changes back to the repository. When the second user then tried to commit, the version control system would see that the repository had changed since the second user obtained the file (because the first user had committed a change in the meantime). At that point, the version control system would detect a conflict and prompt the second user to resolve it, typically in coordination with the first user, by determining what text should be on line 49.
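The blue/red/green scenario can be illustrated with a simplified three-way comparison in Python. This is a sketch under a strong assumption, namely that both users only changed existing lines, so all three versions have the same number of lines; real tools first align the versions with a diff algorithm. The function name three_way_merge is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, comparing line by line.

    Assumes all three versions have the same number of lines.
    Returns (merged_lines, conflicts); each conflict is a tuple of
    (line_number, our_text, their_text).
    """
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:
            merged.append(o)      # both sides agree (or neither changed)
        elif o == b:
            merged.append(t)      # only the other user changed this line
        elif t == b:
            merged.append(o)      # only we changed this line
        else:
            merged.append(b)      # both changed it differently: keep base
            conflicts.append((number, o, t))
    return merged, conflicts


base = ["the sky is blue", "the grass is short"]
first_user = ["the sky is red", "the grass is short"]
second_user = ["the sky is green", "the grass is short"]

merged, conflicts = three_way_merge(base, first_user, second_user)
```

Here both users changed line 1 in different ways, so conflicts contains one entry for line 1 and a person has to choose the final text. Had the second user edited only line 2, the two changes would merge cleanly with no conflicts reported.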

The solution to this problem lies in allowing users of the revision control system to branch a version of the repository and make (possibly many) changes to that branch independent of the changes occurring on the main branch of the repository, known as the trunk. Once a logical set of changes was completed on a branch, that branch would then need to have its changes reconciled with the current state of the repository on the trunk. This process of reconciliation is known as merging.

This feature is critical in a multi-user client environment as it allows work to progress on multiple fronts simultaneously, only requiring that the files be merged once the users of the system are ready to reconcile changes with other users.

Along with the development of mechanisms to allow this sort of concurrent access to the repository over the network, version control systems became more adept in the algorithms they used to detect conflicts and merge changes. This aspect of revision control is discussed further in Merge Algorithms.

Client-Server Beyond CVS

Although CVS developed good approaches to many of these problems, it had shortcomings of its own that gained attention when it became the most widely used revision control system for open source development. An exhaustive list would be lengthy, but a few examples are illustrative.

  • CVS doesn't provide atomic operations, which means that if there were a network failure during a commit, the repository could become corrupted.
  • CVS does not version control directories or symbolic links, which means the repository is really a lossy copy of a developer's environment, sometimes resulting in failure to track changes accurately.
  • CVS doesn't track which files were committed at the same time, so if you make a logical group of changes to several files and want to track the fact that those files were changed together, you can only derive that information from log messages. CVS will not track it for you.
  • CVS cannot track when files are renamed; rather, a rename of a file in CVS looks like the original file was deleted and a new file added, thus losing the file's history.
  • Creating branches and managing the subsequent merges is slow and difficult.
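To make the first and third points concrete, the following Python sketch shows what an all-or-nothing commit that also records which files changed together might look like. It is a toy model, not the design of any particular system: the class Repository, its commit method, and the use of a non-string value to simulate a failure partway through a commit are all assumptions made for illustration.

```python
class Repository:
    """Hypothetical repository with atomic, grouped commits."""

    def __init__(self):
        self.files = {}        # file name -> content
        self.changesets = []   # one (message, file names) entry per commit

    def commit(self, message, changes):
        # Stage every change on a copy so a failure leaves self.files intact.
        staged = dict(self.files)
        for name, content in changes.items():
            if not isinstance(content, str):
                # Stand-in for a failure in the middle of a commit.
                raise ValueError(f"could not store {name}")
            staged[name] = content
        # Reached only if every file was staged: publish all changes at once
        # and record the group of files that were committed together.
        self.files = staged
        self.changesets.append((message, sorted(changes)))
        return len(self.changesets)
```

If any file in the group fails, the exception is raised before self.files is replaced, so the repository is left exactly as it was before the commit began. Each successful commit leaves one entry in changesets naming every file it touched.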

In short, while CVS provided a host of new features and advanced the state of the art in version control, it left room for improvement. As a result, many client-server version control systems entered the market following CVS. One of the latest and most notable of these is Subversion, which seeks to address all of the issues mentioned above, among others.

Distributed Revision Control

Distributed revision control took many of the advances seen in client-server version control systems and moved them into a less centralized architecture. The original version control systems were completely centralized, requiring every user to log in locally to the server on which the repository was located. Client-server version control systems were slightly more distributed, allowing users to connect to the repository from across the network, copy files from the repository to other machines for editing, and then commit them back to the server when edits were complete. Distributed version control continues the trend of decentralization by putting an entire repository, complete with a history of changes and the ability to support remote connections, on each user's machine.

One of the strengths of CVS is that it supports file locking even though the main advance it provides is a non-locking repository. This allows users of RCS to switch to CVS while maintaining their workflow. When users are ready, they can take advantage of the non-locking repository features CVS provides. In the same way that CVS supports legacy locking workflows, distributed version control systems support a centralized repository. The main improvement distributed version control systems offer, however, is that they do not require a central server, and they allow each user to maintain a complete copy of the repository on his or her local machine.


Other Advances

In addition to the evolution of the way version control systems allowed users to access, modify and share data in the repository, many advances have been made in the way changes are merged, tracked and stored.

Merge Algorithms

Tracking Groups of Changes

Repository Data Storage

Further Reading

IBM has an excellent DeveloperWorks article (http://www.ibm.com/developerworks/java/library/j-subversion/index.html) describing the ways in which Subversion improved upon CVS, along with some history of version control up until Subversion.