CSC/ECE 517 Fall 2012/ch1 1w1 rk
Introduction
A version control system (VCS) is a system for managing source code, files, and directory structures, together with the updates made to them, during software development, web development, and similar projects, as well as during subsequent project maintenance. Version control is also known as revision control, source control, or software configuration management (SCM)[6].
A software development project is a dynamic, fast-paced environment. In a typical setting, many developers work on a project simultaneously, so incorporating all of their changes, whether made at the same time or at different times, poses a distinct challenge.
This wikibook chapter focuses mainly on the development history of different version control systems.
Why are version control systems needed?
The major issues in a dynamic development environment can be described as:
- Merging: changes to many files have to be merged. If the merging were done manually, the sheer volume of communication required would be overwhelming. This communication can be among the developers themselves in a small team, or between the team leader in charge of merging and the rest of the group.
- Accountability: if one of the developers breaks the code during development, it is hard to detect who did it, and naturally nobody steps forward to take the blame. A version control system can show precisely which change, and whose, caused the break.
- Reverting: if a code change does not work as expected, it may be necessary to revert to the last satisfactory state.
- Avoiding code loss: development on stale code may result in lost work. For example, suppose two developers download the same file at 10:00 AM. One developer modifies the foo_1() method and uploads the modified file at 10:30 AM. The other developer modifies the foo_2() method and uploads the same file at 10:45 AM. Since both functions are in the same file, the second upload overwrites the file uploaded at 10:30 AM, and the first developer's work is lost.
- History: even a lone developer may need to review why he or she made a certain change. Comments entered at check-in are a useful resource in such situations.
- Tagging: version control provides a form of documentation that makes tracking easy. For example, a tag is a snapshot of all files and documents at a particular stage of development, usually a stable release. This allows developers to work with the exact files that were included in that release when fixing bugs.
- Sandboxing / Branching: version control makes it possible to make temporary code changes in an isolated area, usually a branch, for testing purposes. If the outcome is satisfactory, the changes can be merged into the existing code; otherwise they can be discarded without any impact on the main code.
- Distributed teams: nowadays, software development teams are often spread across different countries and time zones, which makes a version control system for combining their work practically indispensable.
Evolution of version control systems
VCSs can be broadly categorized into three groups based on the repository model. The sequence below also follows the historical evolution of version control systems:
Local repository model <ref>http://en.wikipedia.org/wiki/Source_Code_Control_System</ref>
Local version control systems stored files in a single location. They are called local because, unlike later systems, they did not support networked commands or access, so all developers had to use the same computer system to access or modify files.
Source Code Control System (SCCS)<ref>http://en.wikipedia.org/wiki/Source_Code_Control_System</ref>
It is one of the pioneering source code revision control programs.[5] SCCS was developed at Bell Labs in 1972 by Marc J. Rochkind. Although it was originally developed for OS/MVT, SCCS was later included in some UNIX distributions.
- SCCS was the dominant version control system for Unix until the release of the Revision Control System (RCS). Today, SCCS is generally considered obsolete. However, its file format is still used internally by a few other revision control programs, including BitKeeper and TeamWare; the latter is a frontend to SCCS. Sablime was developed from a modified version of SCCS but uses a history file format that is incompatible with SCCS. The SCCS file format uses a storage technique called interleaved deltas (or the weave). Many revision control system developers now consider this technique foundational to advanced merging and versioning techniques, such as the "Precise Codeville" ("pcdv") merge.
- What does a source code control system do?
- If you are creating or maintaining a text file (perhaps a document, a script, or a program), a source code control system can do several things for you:
- It can keep track of the changes made to the file: what was changed, when it was changed, and by whom.
- It provides a version numbering scheme so you can tell which versions of a file are more recent.
- It can retrieve previous versions of your file, so that you can retreat to an older version if you decide that the current version is a bad idea, or if you want to see some text that has since been changed or deleted.
- If you accidentally delete the current file, you can get back the last version.
- If several people are working with the same file, a source code control system can help you coordinate your work and keep track of who did what, and when.
SCCS is also known for the sccsid string, for example:
static char sccsid[] = "@(#)ls.c 8.1 (Berkeley) 6/11/93";
This string contains the file name, version number, and date, and can also contain a comment. After compilation, the string can be found in binary and object files by searching for the pattern "@(#)", and it can be used to determine which source code files were used during compilation.
Revision Control System (RCS)<ref>http://en.wikipedia.org/wiki/Revision_Control_System</ref>
RCS automates the storing, retrieval, logging, identification, and merging of frequent revisions of text files. It was first released in 1982 by Walter F. Tichy as an alternative to SCCS, quickly gained popularity, and largely replaced SCCS.[9]
- RCS performed better than SCCS for the common case of retrieving the latest revision, because it stores the most recent copy of a file in full and stores earlier revisions only as reverse differences called "deltas". One shortcoming of RCS was that only one person at a time could edit a file; another was that RCS could manage only single files, not a whole project or directory.
- Ref.[3] is a comprehensive book on SCCS and RCS.
Client-server model
The client-server model uses a single centralized repository for version control, which developers access over a network or the internet. The diagram below [1] provides a high-level overview of the model.
Concurrent Versions System (CVS)<ref>http://en.wikipedia.org/wiki/Concurrent_Versions_System</ref>
It was one of the early client-server based systems. Dick Grune started developing CVS in 1984-1985 to allow his students to collaborate on a project according to their own schedules, and CVS was publicly released in 1986. The main improvement of CVS over RCS was that CVS could manage a whole project, while RCS could only work on one file at a time. At first, CVS was a set of scripts that called RCS in the background; later it was developed into a full-fledged program. Many IDEs and editors (Emacs, Eclipse/Aptana, NetBeans, Komodo, PHPEdit, etc.) support CVS. CVSNT is a cross-platform port of CVS with some modifications. CVS, which was the de facto standard for version control for a long time, has today been largely replaced by SVN. Several disadvantages of CVS hindered its continued use in industry; prominent ones were:
- CVS check-in is not an atomic operation. For example, given the files 1.c, 2.c, 3.c, and 4.c, if someone runs cvs ci 1.c 3.c while someone else runs cvs update, the person running update might get only the change to 3.c and not the change to 1.c. Likewise, if a check-in is interrupted partway through, the repository can be left in an inconsistent state.
- CVS tracks modifications only on a file-by-file basis.
- CVS was designed with only text files in mind and has only limited support for other formats. Extensive modifications are necessary at both the client and the server to support other file types.
- Conflict resolution support in CVS is weak. Conflict markers left in a file may go unnoticed, and such files can find their way into the repository.
- CVS is slow.
Subversion (SVN)<ref>http://en.wikipedia.org/wiki/Subversion_%28software%29</ref>
As projects grew more complex, the shortcomings of CVS became more and more prominent, which led to the development of Subversion. CollabNet started developing Subversion (SVN) as a successor to CVS with added functionality[2][7]. Many open source projects (e.g., Apache Software Foundation, Ruby, SourceForge, Tigris.org, PHP, Python, and MediaWiki) use Subversion. Subversion is open source software, released under the Apache License[1].
Features: <ref>http://en.wikipedia.org/wiki/Apache_Subversion#Features</ref>
- Commits as true atomic operations (interrupted commit operations in CVS would cause repository inconsistency or corruption).
- Renamed/copied/moved/removed files retain full revision history.
- The system maintains versioning for directories, renames, and file metadata (but not for timestamps). Users can move and/or copy entire directory-trees very quickly, while retaining full revision history.
- Versioning of symbolic links.
- Native support for binary files, with space-efficient binary-diff storage.
- Apache HTTP Server as network server, WebDAV/Delta-V for protocol. There is also an independent server process called svnserve that uses a custom protocol over TCP/IP.
- Branching as a cheap operation, independent of file size (though Subversion itself does not distinguish between a branch and a directory)
- Natively client–server, layered library design.
- Client/server protocol sends diffs in both directions.
- Costs proportional to change size, not to data size.
- Parsable output, including XML log output.
- Open source licensed: Apache License from the 1.7 release onward; prior versions use a derivative of the Apache Software License, v1.1.
- Internationalized program messages.
- File locking for unmergeable files ("reserved checkouts").
- Path-based authorization.
- Language bindings for C#, PHP, Python, Perl, Ruby, and Java.
- Full MIME support: users can view or change the MIME type of each file, and the software knows which MIME types can have their differences from previous versions shown.
- Merge tracking: merges between branches are tracked, which allows automatic merging between branches without telling Subversion what does (or does not) need to be merged.
Eclipse Subversive - Subversion (SVN) Team Provider<ref>http://www.eclipse.org/subversive/</ref>
The Subversive project aims to integrate the Subversion (SVN) version control system with the Eclipse platform. Using the Subversive plug-in, you can work with projects stored in Subversion repositories directly from the Eclipse workbench, in a way similar to working with other Eclipse version control providers such as CVS and Git.
Subversive Features: Subversive plug-in provides access to Subversion repositories from the Eclipse workbench.
- Full-Scale SVN Client
- Subversive is designed to be used as a full-featured SVN client, so you can update, commit, merge changes, work with SVN properties, view change history and perform other operations with SVN directly from the Eclipse environment.
- Advanced SVN Features
- Subversive includes several features that extend functionality of the standard SVN client. In particular, Subversive can show the SVN repository content grouped by the logical structures of trunk, branch and tag and display changes on a visual revisions graph.
- Seamless Integration with Eclipse
- Subversive is an official Eclipse project and an integral part of Eclipse Simultaneous releases. The project follows all Eclipse guidelines and requirements to deliver a quality SVN team provider plug-in similar to CVS and Git implementations.
- Support of the Latest SVN Versions
- Subversive evolves together with the Subversion project to provide Eclipse users with the features that appeared in new versions of the SVN implementation. You can use the new SVN functionality in Eclipse by installing the Early Access version of Subversive.
Configuration Management Version Control (CMVC)<ref>http://en.wikipedia.org/wiki/IBM_Configuration_Management_Version_Control_(CMVC)</ref>
It was developed by IBM Corporation in the mid-to-late 1990s and was derived in part from software purchased from HP and from IBM's internal-use-only system, IDSS. CMVC was superseded by IBM Rational ClearCase and ClearQuest. The system was used to manage the IBM OS/2 and IBM AIX source code repositories in the 1990s, and it is still widely used within IBM by various teams.
One of the drawbacks of CMVC is that once a user checks out a file, it remains locked to all other users until either the new version is checked in or the owner of the lock releases it explicitly.
Some proprietary client-server version control systems are:
- Autodesk Vault was specifically developed for Autodesk applications.
- ClearCase is developed by IBM Rational Software. It is compliant with the Source Code Control (SCC) API.
- Visual SourceSafe is developed by Microsoft and targets small development teams.
- Visual Studio Team System, also developed by Microsoft, targets larger groups.
Distributed model
Distributed version control systems (DVCS), also called decentralized version control systems, can have many repositories rather than a single central one. This is a peer-to-peer approach, and the concept began to develop in the late 1990s. In a DVCS, each developer works with his or her own local repository: changes are committed locally and then synchronized between repositories. DVCSs are better suited to large teams with partially independent developers, e.g., open source software development.[8]
- Monotone and GNU arch were first-generation DVCSs (released 2001-2003), followed by Darcs (stable release: 2010). Mercurial, Git, and Bazaar (2005-2007) are widely adopted among developers.
- BitKeeper, Code Co-op, and Sun WorkShop TeamWare are some proprietary DVCSs.
Many distributed version control systems also support the client-server model. One benefit of a DVCS is that each developer keeps a complete copy of the source code for as long as needed, even if the central repository (in the client-server model) is discarded at the end of a project.
Git (Version control)<ref>http://en.wikipedia.org/wiki/Git_(software)</ref>
Git development began after many Linux kernel developers chose to give up access to BitKeeper, a proprietary SCM system that had previously been used to maintain the project. The copyright holder of BitKeeper, Larry McVoy, had withdrawn free use of the product after he claimed that Andrew Tridgell had reverse-engineered the BitKeeper protocols.
Git was created by Linus Torvalds (the founder of Linux) because he really didn’t like Concurrent Versions System (CVS), which, at the time, was the most popular version control system. Torvalds wanted something to help keep versions of the kernel he was working on and he figured that he would have to build a system of his own.
Git is an open source project, and since Torvalds’ initial development, there have been many other primary authors and contributors to the project.
Advantages of Using Git
- Git is easy to install.
- Git is easier to learn than many other systems.
- Git is fast: so fast that using it does not become a chore you have to force yourself to remember, and it can be integrated seamlessly into an existing workflow.
- Git is decentralized: if many people are working on a project, each of them can have their own copy without overwriting each other's work.
Disadvantages of Using Git
- Git has a learning curve: although it is one of the easier version control systems to use, any new tool introduced into a workflow takes some time to learn. Learning Git is similar to learning a new software application such as Word or Excel.
File Sharing Models
Every version control system has to address one fundamental problem: how can the files in the repository be shared safely among users while preventing them from accidentally stepping on each other's toes? It is all too easy for users to overwrite each other's changes in the repository.
Consider a scenario with two users, A and B. Both check out the same version of the same file. Suppose A finishes his modification and commits the file first. A few moments later B completes his modification. Unaware of A's check-in, B commits his code to the repository. B has now accidentally overwritten A's changes, leaving the system in an inconsistent state. This problem can be solved in two different ways:
- Lock-Modify-Unlock
- Copy-Modify-Merge
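CMVC follows the lock-modify-unlock approach, while SVN follows copy-modify-merge, which allows users to modify the same piece of code concurrently. In the copy-modify-merge model, B's commit in the scenario above would be rejected as out of date; B would first have to update his copy, merging in A's changes, and then commit again.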
The problem with the lock-modify-unlock model is that it is somewhat restrictive and often becomes a roadblock for users:
- Locking may cause unnecessary delay, which is undesirable for teams and projects with a short cycle time between releases. For example, suppose user A checks out (locks) a file, and an interruption causes him to switch to another task of higher priority. He may completely forget about the file he checked out. If another user B now wants to edit the same file, he cannot do so without having an administrator unlock it for him; his hands are tied!
- Locking may cause unnecessary serialization: what if two users, A and B, want to edit different, non-overlapping portions of the same file? With a proper merging solution in place, they could easily do so without worrying about what the other user does with the file.
- Locking may create a false sense of security: as with any model, the lock-modify-unlock approach allows two different users to work on separate files simultaneously. However, the model does not define how dependencies between files are handled.
For example, suppose A locks and edits file f1 while B simultaneously locks and edits file f2. If f1 calls a function defined in f2, and B removes or renames that function while editing f2, the two files no longer work together, even though neither user violated a lock.
Thus, locking files by itself does not protect against such incompatible changes. A lot of manual synchronization is needed to avoid such situations, which only delays the project.
Use of version control in other areas
Nowadays, many document-editing applications have adopted version control. Wikis, Google Docs, Microsoft Office, and the OpenOffice.org suite are examples of software that can save a document's editing history.
Other considerations
Most open source version control software has distributions for Linux, and in most cases the Linux and Windows versions are released concurrently. Many VCSs have plugins for common IDEs such as Eclipse, Visual Studio, and Oracle JDeveloper. The NetBeans IDE and Xcode have integrated version control support.
References
<references/>
Additional References
3. Don Bolinger, Tan Bronson: Applying RCS and SCCS. O'Reilly Media. ISBN 978-1-56592-117-7.
4. Eric Sink: A collection of articles on source control and best practices. Accessed: September 2010.
5. M. J. Rochkind: The Source Code Control System. IEEE Transactions on Software Engineering SE-1:4 (Dec. 1975), pp. 364-370.
7. Mike Mason: Pragmatic Version Control Using Subversion. Pragmatic Bookshelf, 2005 (1st edition, paperback). ISBN 0-9745140-6-3.
See also
Wikipedia: List of revision control software
Wikipedia: Detailed comparison of revision control software
Dick Grune's website on CVS history
The original usenet post on CVS archived at google groups.