CSC/ECE 517 Fall 2012/ch1 1w1 rk


Introduction

A version control system (VCS) manages source code, files and directory structures, together with the updates made to them, during software or web development projects and during subsequent project maintenance. Version control is also known as revision control, source control or software configuration management (SCM)[6].

Any software development project is a dynamic and fast-paced environment. In a typical setting, many developers work on a project simultaneously. Incorporating all of their changes, whether made simultaneously or at different times, is therefore a challenge in its own right.

This wikibook chapter focuses mainly on the development history of different version control systems.

Why are version control systems needed?

The major issues in a dynamic development environment can be described as:

  • Merging a large number of files. If the merging is done manually, the sheer volume of communication required is overwhelming. This communication may take place among the developers themselves in a small team, or between the team leader in charge of merging and the rest of the group.
  • Accountability: if one of the developers breaks the code during development, the break is hard to trace, and naturally nobody steps forward to take the blame. A version control system can indicate precisely who introduced the change that broke the code.
  • If a certain implementation doesn't work as expected, reverting to the last satisfactory state may be required.
  • Development on stale code may result in code loss. For example, two developers download a certain file at 10 AM. One of them modifies the foo_1() method and uploads the modified file at 10:30 AM. The other modifies the foo_2() method and uploads the same file at 10:45 AM. Since both functions are in the same file, the second upload overwrites the file uploaded at 10:30 AM, and the first developer's work is lost.
  • Even a lone coder might need to review why he or she made a certain change. Comments entered during code check-in are a useful resource in such a situation.
  • Version control provides a form of documentation that makes tracking easy. For example, a tag is a snapshot of all files and documents at a particular stage of development, usually a stable release. It allows developers to work with the exact files that were included in that release when fixing bugs.
  • Sandboxing / Branching: Version control makes it possible to make temporary code changes in an isolated area, usually a branch folder, for testing purposes. If the outcome is satisfactory, the code can be merged with the existing code; otherwise it can be discarded without any impact on the main code.
  • Nowadays, software development teams are spread across different countries and work in different time zones, which makes a version control system practically unavoidable for bringing the project together.
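
Several of the points above (accountability, reverting, and tagging) can be illustrated with a minimal sketch. ToyHistory and all of its methods are hypothetical names invented for this example; it tracks a single file in memory and is not the interface of any real version control system.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    message: str
    content: str


@dataclass
class ToyHistory:
    """Hypothetical in-memory history of one file (not a real VCS API)."""
    revisions: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

    def commit(self, author, message, content):
        # Every check-in records who made it and why.
        rev = Revision(len(self.revisions) + 1, author, message, content)
        self.revisions.append(rev)
        return rev.number

    def tag(self, name):
        # A tag is a named pointer to the current revision.
        self.tags[name] = self.revisions[-1].number

    def checkout(self, number):
        return self.revisions[number - 1].content

    def revert_to(self, number, author):
        # Reverting adds a new revision; earlier history is kept.
        return self.commit(author, f"Revert to r{number}", self.checkout(number))

    def last_author(self):
        return self.revisions[-1].author


history = ToyHistory()
history.commit("alice", "Add foo_1", "def foo_1(): return 1\n")
history.tag("release-1.0")
history.commit("bob", "Rewrite foo_1", "def foo_1(): return 1 / 0\n")

culprit = history.last_author()                      # who made the latest change
history.revert_to(history.tags["release-1.0"], "alice")
restored = history.checkout(len(history.revisions))  # content of the tagged release
```

Real systems store far more (timestamps, whole directory trees, branches), but the idea of an append-only list of attributed revisions is the same.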

Evolution of version control systems

VCSs can be broadly categorized into three groups based on the repository model. The sequence below also follows the evolution of version control systems:

Local repository model

Local version control systems had a single storage location for the files. They were called local because they did not support network commands or access as later software did. All developers therefore had to use the same computer system to access or modify files.

  1. Source Code Control System (SCCS) was one of the pioneering source code revision systems.[5] SCCS was developed at Bell Labs in 1972 by Marc J. Rochkind. Although it was originally developed for OS/MVT, SCCS was included in some UNIX distributions. SCCS remained the dominant VCS until Revision Control System was released.
  2. Revision Control System (RCS) automated the storing, retrieval, logging, identification and merging of frequent revisions of text files. RCS was first released in 1982 by Walter F. Tichy as an alternative to SCCS. It quickly gained popularity and almost replaced SCCS.[9]
RCS performed better than SCCS by storing the most recent copy of a file in full and storing older revisions only as reverse differences, called "deltas". One of the shortcomings of RCS was that only one person at a time could edit a file. RCS could also manage only single files, not a whole project or directory.
Ref. [3] is a comprehensive book on SCCS and RCS.
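
The reverse-delta idea can be sketched in a few lines. This is an illustration only, not the actual RCS file format or algorithm: ReverseDeltaFile, make_delta and apply_delta are hypothetical names, and the deltas are computed with Python's standard difflib module.

```python
import difflib


def make_delta(source, target):
    """Return edit steps that turn the `source` lines into the `target` lines."""
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    delta = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            delta.append(("copy", i1, i2))           # reuse source[i1:i2]
        else:
            delta.append(("insert", target[j1:j2]))  # literal target lines (empty for a deletion)
    return delta


def apply_delta(source, delta):
    result = []
    for step in delta:
        if step[0] == "copy":
            result.extend(source[step[1]:step[2]])
        else:
            result.extend(step[1])
    return result


class ReverseDeltaFile:
    """Keeps the newest revision in full and older ones as reverse deltas."""

    def __init__(self, lines):
        self.head = list(lines)
        self.reverse_deltas = []  # the last entry turns head into the previous revision

    def check_in(self, new_lines):
        new_lines = list(new_lines)
        self.reverse_deltas.append(make_delta(new_lines, self.head))
        self.head = new_lines

    def check_out(self, steps_back=0):
        lines = self.head
        start = len(self.reverse_deltas) - steps_back
        for delta in reversed(self.reverse_deltas[start:]):
            lines = apply_delta(lines, delta)
        return lines


v1 = ["a\n", "b\n"]
v2 = ["a\n", "b\n", "c\n"]
v3 = ["a\n", "B\n", "c\n"]

rcs_file = ReverseDeltaFile(v1)
rcs_file.check_in(v2)
rcs_file.check_in(v3)
```

Checking out the newest revision needs no delta at all, which is the common case; older revisions cost one delta application per step back.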

Client-server model

The client-server model uses a single centralized repository that developers access over a network or the internet. The diagram at http://svnbook.red-bean.com/en/1.7/svn.basic.version-control-basics.html provides a high-level overview of the model.

  1. Concurrent Versions System (CVS) was one of the early schemes based on the client-server model. Dick Grune developed CVS during 1984-1985 to allow his students to collaborate on a project according to their own schedules. CVS was publicly released in 1986. The main improvement of CVS over RCS was that CVS could manage a whole project, while RCS could only work on one file at a time. At first, CVS was a set of scripts that called RCS in the background; it was later developed into a full-fledged program. Many IDEs and editors (Emacs, Eclipse/Aptana, NetBeans, Komodo, PHPEdit, etc.) support CVS. CVSNT is a cross-platform port of CVS with some modifications.
CVS, which was the de facto standard for version control for a long time, has today been almost entirely replaced by SVN. CVS had many disadvantages that hindered its continued use in the industry. The prominent ones were:
  • CVS check-in is not an atomic operation. For example, given the files 1.c, 2.c, 3.c and 4.c, if someone runs

                     cvs ci 1.c 3.c

    and someone else runs cvs update at the same time, the person running update might get only the change to 3.c and not the change to 1.c. Likewise, if the check-in process is interrupted, the project in the repository can be left in a damaged state.

  • CVS tracks modifications only on a file-by-file basis, which does not scale to larger projects.
  • CVS was designed with only text files in mind and has no native support for other formats. Extensive modifications are necessary on the client and the server to support other file types.
  • Conflict resolution in CVS is weak. The conflict markers that CVS inserts may go unnoticed, and files containing them can find their way into the repository.
  • CVS is slow.
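
The non-atomic check-in problem from the first point above can be simulated with a small sketch. PerFileRepository and AtomicRepository are hypothetical classes written for this illustration, and the order in which the two files are written is an assumption; neither class reflects how CVS or any other tool is actually implemented.

```python
class PerFileRepository:
    """Each file is written on its own, so a reader can see a half-finished check-in."""

    def __init__(self, files):
        self.files = dict(files)

    def check_in_steps(self, changes):
        # Generator: pauses after every file so another user can act in between.
        for name, content in changes.items():
            self.files[name] = content
            yield name

    def update(self):
        return dict(self.files)


class AtomicRepository:
    """The whole change set becomes visible in one step, or not at all."""

    def __init__(self, files):
        self.snapshots = [dict(files)]

    def check_in(self, changes):
        new_snapshot = dict(self.snapshots[-1])
        new_snapshot.update(changes)
        self.snapshots.append(new_snapshot)  # published as a single new revision

    def update(self):
        return dict(self.snapshots[-1])


initial = {"1.c": "old", "3.c": "old"}
changes = {"3.c": "new", "1.c": "new"}  # assumed write order: 3.c first

per_file = PerFileRepository(initial)
steps = per_file.check_in_steps(changes)
next(steps)                       # only 3.c has been written so far
partial_view = per_file.update()  # a second user updates mid-check-in
for _ in steps:                   # the check-in then finishes
    pass

atomic = AtomicRepository(initial)
before = atomic.update()
atomic.check_in(changes)
after = atomic.update()
```

In the per-file variant the second user receives the new 3.c together with the old 1.c; in the atomic variant an update returns either the complete old state or the complete new state.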
  2. Subversion (SVN): As the complexity of projects increased, the shortcomings of CVS became more and more prominent. This led to the development of Subversion. CollabNet started developing SVN as a successor to CVS with added functionality[2][7]. Many open source projects and organizations (e.g., the Apache Software Foundation, Ruby, SourceForge, Tigris.org, PHP, Python and MediaWiki) use Subversion. Subversion is open source software, released under the Apache license[1].
    Figure: Visualization of a very simple Subversion project. Source: Wikimedia Commons. Explanations of the terms can be found at http://en.wikipedia.org/wiki/Apache_Subversion, "Branching and tagging", and at http://en.wikipedia.org/wiki/Revision_control, "Common vocabulary".
  3. Configuration Management Version Control (CMVC) was developed by IBM Corporation in the mid-to-late 1990s and was derived in part from software purchased from HP and from IBM's internal-use-only system, IDSS. CMVC was superseded by IBM Rational ClearCase and ClearQuest. The system was used to manage the IBM OS/2 and IBM AIX source code repositories in the 1990s. It is still widely used within IBM by various teams.
One of the drawbacks of CMVC is that once a file is checked out by a user, it remains locked to all other users until either the new version is checked in or the lock is explicitly released by its owner.
  4. Some proprietary client-server version control software:
    • Autodesk Vault was developed specifically for Autodesk applications.
    • ClearCase was developed by IBM Rational Software. It is Source Code Control (SCC) compliant.
    • Visual SourceSafe was developed by Microsoft for small development teams.
    • Visual Studio Team System was also developed by Microsoft, for larger groups.

Distributed model

Distributed version control systems (DVCS), also called decentralized version control systems, can have many repositories rather than a single central one. This is a peer-to-peer approach. The concept started developing in the late 1990s. In a DVCS, each developer commits to his or her own local repository, and changes are then synchronized between repositories (for example, by pushing and pulling commits). DVCS are better suited to large teams of partially independent developers, e.g., open source software development.[8]

    1. Monotone and GNU arch were the first-generation DVCS (released 2001-2003), followed by Darcs (stable release: 2010). Mercurial, Git and Bazaar (2005-2007) are widely adopted among developers.
    2. BitKeeper, Code Co-op and Sun WorkShop TeamWare are some proprietary DVCS.

It should be noted that many distributed version control systems support the client-server model too. One of the benefits of a DVCS is that a developer can keep the source code for as long as he or she wants, even if the central repository (in a client-server setup) is discarded at the end of a project.
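
The peer-to-peer idea can be sketched as follows. Repository, commit, pull and history are hypothetical names chosen for this example; the sketch records only commit messages, handles only the simple fast-forward case, and does not mirror the storage format of Git, Mercurial or any other tool.

```python
import hashlib


class Repository:
    """Toy repository: every copy holds the full commit history."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, message)
        self.head = None

    def commit(self, message):
        # Commits are purely local; no server is contacted.
        payload = f"{self.head}:{message}".encode()
        commit_id = hashlib.sha1(payload).hexdigest()[:8]
        self.commits[commit_id] = (self.head, message)
        self.head = commit_id
        return commit_id

    def history(self):
        ids, commit_id = [], self.head
        while commit_id is not None:
            ids.append(commit_id)
            commit_id = self.commits[commit_id][0]
        return ids

    def pull(self, other):
        """Copy the commits we lack; move head only if that is a fast-forward."""
        missing = {cid: c for cid, c in other.commits.items() if cid not in self.commits}
        self.commits.update(missing)
        if other.head is not None and (self.head is None or self.head in other.history()):
            self.head = other.head
        return len(missing)


alice = Repository()
bob = Repository()

c1 = alice.commit("Initial import")
bob.pull(alice)               # bob now holds the full history
c2 = bob.commit("Fix bug")    # recorded locally in bob's repository
fetched = alice.pull(bob)     # alice synchronizes with her peer
```

Because bob's repository contains every commit, his copy remains complete and usable even if alice's repository is later deleted.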

Figure: A generalized timeline. Source: http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_517_Fall_2009/wiki1a_5_History_of_version_control_by_av

File Sharing Models

Every system that implements version control has to address one fundamental problem: how can the files in the repository be shared safely across users while preventing them from accidentally stepping on each other's feet? It's all too easy for users to accidentally overwrite each other's changes in the repository.

Consider a scenario with two users, A and B. Both check out the same version of the same file. Suppose A finishes his modification and commits the file first. A few moments later, B completes his modification. B has no clue about A's check-in and happily commits his code to the repository. B has now accidentally overwritten the changes made by A, leaving the repository in an inconsistent state. This problem can be solved in two different ways:

  • Lock-Modify-Unlock
  • Copy-Modify-Merge
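
The overwrite scenario with users A and B can be reproduced in a short sketch. NaiveRepository and VersionedRepository are hypothetical classes for illustration; the second one shows one possible safeguard, assumed here for the example, in which a commit based on an out-of-date revision is rejected so that the user must update first.

```python
class NaiveRepository:
    """No protection at all: the last writer wins."""

    def __init__(self, content):
        self.content = content

    def checkout(self):
        return self.content

    def commit(self, content):
        self.content = content


class VersionedRepository:
    """Rejects a commit that is based on an out-of-date revision."""

    def __init__(self, content):
        self.content = content
        self.revision = 1

    def checkout(self):
        return self.revision, self.content

    def commit(self, base_revision, content):
        if base_revision != self.revision:
            raise RuntimeError("working copy is out of date: update and merge first")
        self.content = content
        self.revision += 1
        return self.revision


original = "foo_1 v1\nfoo_2 v1\n"

# Both users start from the same version of the file.
naive = NaiveRepository(original)
copy_a = naive.checkout()
copy_b = naive.checkout()
naive.commit(copy_a.replace("foo_1 v1", "foo_1 v2"))  # A commits first
naive.commit(copy_b.replace("foo_2 v1", "foo_2 v2"))  # B overwrites A
lost = "foo_1 v2" not in naive.content

checked = VersionedRepository(original)
rev_a, text_a = checked.checkout()
rev_b, text_b = checked.checkout()
checked.commit(rev_a, text_a.replace("foo_1 v1", "foo_1 v2"))
try:
    checked.commit(rev_b, text_b.replace("foo_2 v1", "foo_2 v2"))
    rejected = False
except RuntimeError:
    rejected = True
```

In the naive repository A's change silently disappears; in the versioned one B's commit is refused and A's change survives until B has merged it into his copy.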

While CMVC follows the lock-modify-unlock solution, SVN follows copy-modify-merge, which allows users to modify the same piece of code concurrently.
The problem with the lock-modify-unlock model is that it is restrictive and often becomes a roadblock for users:

Locking may cause administrative problems. Sometimes user A will lock a file and then forget about it. Meanwhile, because B is still waiting to edit the file, his hands are tied. And then A goes on vacation. Now B has to get an administrator to release A's lock. The situation ends up causing a lot of unnecessary delay and wasted time.
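
The stale-lock situation can be sketched with a minimal lock table. LockTable and its methods are hypothetical and exist only for this illustration; the admin flag stands in for whatever override mechanism a real system provides.

```python
class LockTable:
    """Toy lock-modify-unlock bookkeeping: one owner per file."""

    def __init__(self):
        self.owners = {}  # file name -> user holding the lock

    def lock(self, user, filename):
        holder = self.owners.get(filename)
        if holder is not None and holder != user:
            return False  # someone else is editing; the caller must wait
        self.owners[filename] = user
        return True

    def unlock(self, user, filename, admin=False):
        # Only the lock owner or an administrator may release a lock.
        if admin or self.owners.get(filename) == user:
            self.owners.pop(filename, None)
            return True
        return False


locks = LockTable()
a_got_lock = locks.lock("A", "report.txt")      # A locks the file and leaves
b_got_lock = locks.lock("B", "report.txt")      # B is blocked
b_released = locks.unlock("B", "report.txt")    # B cannot release A's lock
admin_released = locks.unlock("admin", "report.txt", admin=True)
b_retry = locks.lock("B", "report.txt")         # only now can B proceed
```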

Locking may cause unnecessary serialization. What if A is editing the beginning of a text file, and B simply wants to edit the end of the same file? These changes don't overlap at all. They could easily edit the file simultaneously, and no great harm would come, assuming the changes were properly merged together. There's no need for them to take turns in this situation.
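
A minimal sketch of how non-overlapping edits can be merged is shown below. It is a deliberately simplified three-way merge that assumes every version has the same number of lines (edits change lines in place); merge_line_by_line is a hypothetical helper, not the merge algorithm of SVN or any other tool.

```python
def merge_line_by_line(base, mine, theirs):
    """Three-way merge for versions with the same number of lines.

    Returns the merged lines and the indexes of conflicting lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only the other side changed this line
            merged.append(t)
        elif t == b:      # only my side changed this line
            merged.append(m)
        else:             # both changed it differently: a real conflict
            merged.append(m)
            conflicts.append(index)
    return merged, conflicts


base = ["intro", "body", "end"]
edit_a = ["INTRO by A", "body", "end"]  # A edits the beginning
edit_b = ["intro", "body", "END by B"]  # B edits the end

merged, conflicts = merge_line_by_line(base, edit_a, edit_b)

# When both users change the same line, the merge reports a conflict instead.
_, overlap = merge_line_by_line(base, ["A's intro", "body", "end"], ["B's intro", "body", "end"])
```

Comparing each side against the common base version is what lets the merge tell an untouched line from an edited one.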

Locking may create a false sense of security. Suppose A locks and edits file f1, while B simultaneously locks and edits file f2. But what if f1 and f2 depend on one another, and the changes made to each are semantically incompatible? Suddenly f1 and f2 don't work together anymore. The locking system was powerless to prevent the problem, yet it somehow provided a false sense of security. It's easy for A and B to imagine that by locking files, each is beginning a safe, insulated task, and thus they need not bother discussing their incompatible changes early on. Locking often becomes a substitute for real communication.

Use of version control in other areas

Nowadays, many document editing applications have adopted version control. Wikis, Google Docs, Microsoft Office and the OpenOffice.org suite are some examples of software that can save a document's editing history.

Other considerations

Most open source version control software has distributions for Linux. In most open source cases, Linux and Windows versions are released concurrently. Many VCSs have plugins for common IDEs such as Eclipse, Visual Studio and Oracle JDeveloper. The NetBeans IDE and Xcode have integrated version control support.

References

1. Apache Software Foundation Announcements: "Subversion becomes Apache Subversion" (2010-02-17). Accessed: September 2010.

2. Ben Collins-Sussman, Brian W. Fitzpatrick, C. Michael Pilato: "Version Control with Subversion", an O'Reilly book available online. Accessed: September 2010.

3. Don Bolinger, Tan Bronson: Applying RCS and SCCS. O'Reilly Media. ISBN 978-1-56592-117-7.

4. Eric Sink: A collection of articles on source control and best practices. Accessed: September 2010.

5. M. J. Rochkind: The Source Code Control System. In IEEE Transactions on Software Engineering SE-1:4 (Dec. 1975), pages 364–370.

6. Martin, Robert L. (2002): "Software configuration management for the 21st century". Bell Labs Technical Journal, Volume 2, Issue 1, page 154.

7. Mike Mason: Pragmatic Version Control Using Subversion. Pragmatic Bookshelf. ISBN 0-9745140-6-3 (1st edition, paperback, 2005).

8. Noah Gift, Adam Shand: Introduction to distributed version control systems (compare how to use Bazaar, Mercurial, and Git) (Apr 2009). IBM Technical Library. Accessed: September 2010.

9. Walter F. Tichy: RCS - a system for version control. In Software: Practice and Experience, Volume 15, Issue 7 (July 1985), pages 637–654.

See also

Wikipedia: Revision control

Wikipedia: List of revision control software

Wikipedia: Detailed comparison of revision control software

CVS webpage

Dick Grune's website on CVS history

The original Usenet post on CVS, archived at Google Groups.

Linux availability of various version control software