CSC/ECE 517 Fall 2009/wiki1a 4 srhi4

From Expertiza_Wiki

Best Practices for Source Code Management with Version Control


Introduction

         Source Code Management (also called Revision Control) is a technique used to manage and monitor the codebase of a software product in order to track the changes made to the code. It plays an important role in any setup where many developers share a codebase and have to modify the same code. The problem gets more complex when multiple teams want to maintain different versions of the same basic codebase in order to add new features or fix bugs. In such a scenario, merging different versions of the same code and keeping the latest version of the code available become critical.

The diagram below gives a quick overview of a source code management system managing multiple codebases.

An example of version control

Terminology

• Repository - Storehouse for files. Usually integrated with a database
• Trunk - The main line of the codebase, where the source code can be found. It is at the root of the version tree
• Working set - Downloaded copy of the codebase
• Client - Application that connects to the repository
• Server - The system that hosts the repository
• Code Check-out - Downloading a file or a set of files from the codebase to a workspace in order to run / modify the code
• Code Check-in - Uploading new / modified files to a codebase from a workspace
• Branching - A technique used to aid the concurrent development of software. Simply put, it allows code to be developed simultaneously by creating multiple paths for development. A branch is for a single logical change in the code
• Codeline - A codeline is similar to a branch but can support multiple logical changes to the code. Often, branch and codeline are used interchangeably
• Merging - Process of integrating changes in the code with the codeline. For example, if developer A branches out from a codeline, development continues on the codeline. A would then have to merge his / her changes with the codeline to keep the codeline up to date.
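
The terms above can be tied together with a small sketch. The following is a toy, in-memory model written for illustration only; the class and method names (Repository, check_out, check_in, branch) are hypothetical and do not correspond to the API of any real SCM tool.

```python
import copy


class Repository:
    """Toy repository: keeps every checked-in revision of each codeline."""

    def __init__(self):
        # Each codeline maps to a list of revisions; a revision maps
        # file name -> file contents. The trunk starts with one empty revision.
        self.codelines = {"trunk": [{}]}

    def check_out(self, codeline="trunk"):
        """Return a private working set copied from the latest revision."""
        return copy.deepcopy(self.codelines[codeline][-1])

    def check_in(self, working_set, codeline="trunk"):
        """Store the working set as a new revision and return its number."""
        self.codelines[codeline].append(copy.deepcopy(working_set))
        return len(self.codelines[codeline]) - 1

    def branch(self, name, source="trunk"):
        """Start a new codeline from the latest revision of the source."""
        self.codelines[name] = [copy.deepcopy(self.codelines[source][-1])]


repo = Repository()

# Check out the trunk, add a file locally, then check it in.
ws = repo.check_out()
ws["prog.c"] = "int main(void) { return 0; }"
rev = repo.check_in(ws)

# Branch for a bug fix; work on the branch does not touch the trunk.
repo.branch("bugfix")
fix = repo.check_out("bugfix")
fix["prog.c"] = "int main(void) { return 1; }"
repo.check_in(fix, "bugfix")
```

After this runs, the trunk still holds the original prog.c while the bugfix codeline holds the modified one; bringing the two together again is the job of merging.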

Motivation for Source code management (SCM)

         In a development environment, a single project can be in multiple phases simultaneously. One team may be building new features into the product, a second team may be fixing bugs (tech support), and a third team may be concentrating on prototypes for future development. Different lines are created for each of these activities: functional line, development line, maintenance line, release line, integration line and so on. In such an environment, management of the codebase becomes critical for the following reasons:

  1. Allow simultaneous / parallel development of software 
  2. Integrate code changes from different teams / developers
  3. Propagate bug fixes to future versions of the software
  4. Isolate, coordinate and tidily separate work item units
  5. Track and revert to older versions
  6. Keep related projects in sync with one another
  7. Reduce costs of late merging
  8. Reduce cost of maintaining codebases

Efficient Source Code Management (SCM)

         There are many challenges associated with Source Code Management (SCM), and there is no standard way of doing SCM. When do you create new branches? When do you create new codelines? Who should be responsible for a branch? When do you merge code? How do you integrate code fixes with future releases? How do you keep track of multiple releases? These questions have no definite answers. Each project uses a different SCM technique depending on its development cycles, releases, bug fixes and size.

          Source code management techniques invariably include trade-offs. While branching early can be good, the costs of merging can add up. The trade-offs involved in maintaining multiple lines, such as development and maintenance lines, can also be confusing. It is tempting to maintain a single line for both maintenance and development. This would make new changes / bug fixes from the maintenance team immediately visible in the development line, thus incorporating all the bug fixes into the new development cycles. However, the cost of merging each time into the new development branch is an important criterion. What happens if the new code being written on the development branch conflicts with a new bug fix? Who reworks the code to get around the problem? Such trade-offs, and many more, become important for efficient source code management.
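
The question of a bug fix conflicting with new development can be made concrete with a minimal three-way merge sketch. It is an illustration under simplifying assumptions, not how any particular SCM tool merges: each version is modelled as a mapping from a unit name (for example a function) to its content, and the function name three_way_merge and the sample data are hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`.

    Each argument maps a unit name to its content. Returns (merged, conflicts).
    A unit is reported as a conflict, and left out of `merged`, when both
    sides changed it in different ways.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree, or neither changed the unit
            result = o
        elif o == b:      # only "theirs" changed the unit
            result = t
        elif t == b:      # only "ours" changed the unit
            result = o
        else:             # both changed it differently: needs a human
            conflicts.append(name)
            continue
        if result is not None:   # None means the unit was deleted
            merged[name] = result
    return merged, conflicts


base = {"parse": "v1", "render": "v1"}
development = {"parse": "v2-new-feature", "render": "v1", "export": "v1"}
maintenance = {"parse": "v1-bugfix", "render": "v1-bugfix"}

merged, conflicts = three_way_merge(base, development, maintenance)
```

Here the fix to render and the new export unit merge cleanly, while parse is reported as a conflict because both lines changed it. The longer the two lines go without merging, the more units tend to end up in the conflict list, which is the merging cost discussed above.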

         Fortunately, there are a few common guidelines and patterns that allow for efficient management of source code. The following list describes these guidelines for a typical scenario:

  1. Have a main line from which all the code is derived
     The main line sets up the base of the project. Typically, when code is being developed for two different platforms, say,
     Windows and Linux, common modules are placed in the main line. The two codelines for Linux and Windows are then taken
     from the main line to create parallel development lines.
  2. Have parallel maintenance and development lines
     Product development and maintenance (e.g. bug fixes) happen simultaneously. For this reason, we need two parallel lines, and
     these lines should interact closely and integrate / merge frequently to stay up to date with the fixes.
  3. Have one codeline per release
     Separation / isolation of the codeline for a specific release becomes important since parallel work (maintenance) continues
     on the previous version of the software. Thus a new codeline should be created whenever a release of the product is planned.
  4. Create policies for each codeline
     Policies for a codeline define the stability and maintainability of a source code management system. The policy for the
     development line could encourage late merges, while the policy for bug fixes could encourage merging often.
  5. Merge early and often
     As far as possible, developers must be working with the latest copy of the code. Thus, merging often becomes important.
     Early merges help in having common bases for further development since the early parts of the code are most critical. Frequent
     merging also helps in avoiding large merges later in the cycle, which could lead to inconsistent code. It is the best way to keep
     all developers in sync with the product code.
  6. Do not isolate too much
     Isolation could be beneficial in the case of large projects. However, creating many codelines / branches is inadvisable because
     the cost of merging could become high.
  7. Analyse and realign
     The versioning tree could grow out of bounds and become wider and wider. The wider the tree, the more codelines / branches it
     has and the more difficult it is to manage. SCM systems must be checked often for such widening trees, and codelines must be
     merged as appropriate to sync up versions.
  8. Have an owner for every codeline
     Ownership is a key term used in source code management. Every codeline is assigned an owner who is responsible for the codeline.
     The owner's typical tasks would be to assist in code integration and changes in the codeline, clarify ambiguous code policies,
     decide when to freeze and unfreeze code, and coordinate across teams to make successful merges.
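
Several of these guidelines (an owner and a merge policy per codeline, and periodic checks on how wide the tree has grown) can be expressed as a small bookkeeping sketch. Everything in it is an assumption made for illustration: the Codeline fields, the helper names, the owners and the day counts are hypothetical, and real teams would choose their own policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Codeline:
    name: str
    owner: str                     # guideline 8: every codeline has an owner
    parent: Optional[str]          # None only for the main line
    max_days_between_merges: int   # guideline 4: per-codeline merge policy
    days_since_last_merge: int = 0


def overdue_for_merge(codelines):
    """Names of codelines that have gone longer than their policy allows."""
    return [
        c.name
        for c in codelines
        if c.parent is not None
        and c.days_since_last_merge > c.max_days_between_merges
    ]


def too_wide(codelines, limit):
    """True when the tree has more derived codelines than `limit`."""
    return sum(1 for c in codelines if c.parent is not None) > limit


lines = [
    Codeline("main", "alice", None, 0),
    # Bug fixes merge often; development is allowed to merge later.
    Codeline("maintenance", "bob", "main", 2, days_since_last_merge=5),
    Codeline("development", "carol", "main", 14, days_since_last_merge=10),
]

overdue = overdue_for_merge(lines)   # codelines whose owners should merge now
wide = too_wide(lines, limit=3)      # whether the tree needs realigning
```

With these sample values only the maintenance line is overdue, and the tree is still within the chosen width limit. The owner recorded on each codeline is the person who would be asked to act on such a report.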


         The above guidelines help in improving the manageability of source code versions. They lead to increased coordination among developers, improve traceability, isolate changes, define clear roles and responsibilities, and reduce complexity.

Examples

         IBM Rational ClearCase is a widely used version control tool with secure version management and an easy-to-follow interface. It follows the best-practice guidelines for source code management with version control. It centralizes the code, making it accessible to everyone on the software development team for a particular project. ClearCase supports parallel development and easy merging techniques. One of its most useful features is views. It provides two different kinds of view, snapshot and dynamic. Snapshot views help to create a local copy of the code from the codebase, while the dynamic view is used when the codebase has to be modified (during a merge). This feature helps to maintain the version of the code by allowing the developer to choose the level of access required for the code.

         As discussed in the previous section, ClearCase maintains the main branch as the baseline, and when a new project has to be derived from the main branch, a sub-branch is created which uses the baseline. The project is built on this sub-branch. Every developer who is part of the project team and has access to the project's main branch is given a unique id. Whenever a file is checked out by a developer, the developer's name appears in the ClearCase view. The rest of the team can always see which file is being modified / looked at by which team member. This sometimes helps a developer decide whether to modify the file now or wait. ClearCase also maintains every version of a file. Every time a file is checked out, a copy is kept, so that when the modified version of the file is checked in, the view still has the old version of the file as well. One of the most powerful features of this tool is that when a project is successfully completed and needs to be released to the other projects (branches of the main branch), it can be merged with the main branch, which makes it accessible to all the other branches.

         The following diagram is taken from the IBM Boulder site. It shows a snapshot of ClearCase.

A snapshot of IBM Rational ClearCase


The above diagram is a snapshot view of a shared file, Prog.c, which is stored in ClearCase. The circles show the different versions of the file, i.e. the number of circles is the number of times the file was checked out and checked back in. The dark circle means that the file is currently checked out by someone called "user", as shown. When Prog.c is checked out, it is copied locally to the user's machine. The changes that the user makes remain local until the user checks the file in again.
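
The behaviour described for Prog.c (every checked-in version is kept, the team can see who has the file checked out, and edits stay local until check-in) can be sketched with a toy model. This is not the ClearCase API: the VersionedFile class and its methods are hypothetical, and the sketch deliberately ignores features such as reserved checkouts and branches.

```python
class VersionedFile:
    """Toy model of one file under version control."""

    def __init__(self, name, contents):
        self.name = name
        self.versions = [contents]    # every checked-in version is kept
        self.checked_out_by = set()   # visible to the whole team

    def check_out(self, user):
        """Record who has the file and hand back a local copy."""
        self.checked_out_by.add(user)
        return self.versions[-1]

    def check_in(self, user, new_contents):
        """Add a new version; older versions remain available."""
        if user not in self.checked_out_by:
            raise ValueError(f"{user} has not checked out {self.name}")
        self.checked_out_by.remove(user)
        self.versions.append(new_contents)
        return len(self.versions) - 1


prog = VersionedFile("Prog.c", "int x;")

local = prog.check_out("user")
holders_during_edit = set(prog.checked_out_by)   # what teammates would see
local += "\nint y;"                              # edit stays local for now
versions_before_check_in = list(prog.versions)

version = prog.check_in("user", local)
```

Until check_in is called, the stored versions are unchanged; afterwards the file has two versions, and the original one can still be retrieved.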

The details of the tool can be found in the IBM publication cited in the References section of this page.

         Another widely used tool for source code management is Subversion (SVN), an open source initiative. Subversion has superseded CVS as the version control system of choice for many projects, since it includes important functionality, such as deleting and renaming directories, that was absent in CVS. The users of SVN range from those working on very small classroom projects to mammoth ones like those at Sun Microsystems. SVN can use Berkeley DB, an embedded (non-relational) database, as its backend and works much faster than CVS. With space for adding metadata to each file and support for "all or nothing" commits, SVN is more stable and allows for better management of the codebase.
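
The idea behind an "all or nothing" commit can be illustrated with a short sketch. It is a simplified model and not Subversion's implementation: the Repo class, the commit method and the not_empty check are hypothetical, and the check merely stands in for anything that could make part of a commit fail (a conflict, a dropped connection and so on).

```python
class Repo:
    """Toy repository whose commits either apply fully or not at all."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.revision = 0

    def commit(self, changes, is_valid):
        """Apply every change in `changes`, or none of them."""
        staged = dict(self.files)          # work on a copy
        for path, contents in changes.items():
            if not is_valid(path, contents):
                return False               # self.files is untouched
            staged[path] = contents
        self.files = staged                # publish all changes at once
        self.revision += 1
        return True


def not_empty(path, contents):
    """Stand-in validation rule: reject empty files."""
    return contents != ""


repo = Repo()
first_ok = repo.commit({"a.c": "int a;", "b.c": "int b;"}, not_empty)

# One bad file causes the whole commit to be rejected,
# including the otherwise valid change to a.c.
second_ok = repo.commit({"a.c": "int a2;", "c.c": ""}, not_empty)
```

After the second call the repository is exactly as the first commit left it, so other developers never see a half-applied change.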

The open source documentation for SVN can be found in the References section of this page.

Conclusion

         Source Code Management, or Revision Control, helps software development teams deliver faster and more efficiently, and reduces the amount of rework and manual code maintenance. In a typical development team setup, it is a critical tool used by every team member. Hence it is very important for a version control system to be accurate and user friendly. It needs to be not only simple to understand but also reliable and transparent, so that changes are not lost or silently overwritten.

See also

•  For Beginners
      The high level overview of SCM best practices.

•  For Advanced Readers
      An in-depth view of source code management using different patterns. It helps in organizing related lines of development into appropriately diverging and converging streams of source code changes.
      This is the product RedBook released by IBM on IBM Rational ClearCase. It explains the different features of their Version Control tool which follow the best practices discussed in this page.

•  For Developers
      The IBM Rational ClearCase Version Control tool document contains tutorials and examples that are easy to follow.
      Information about SVN and its free download versions by O'Reilly media can be found here


References

1. http://www.sunsource.net/scdocs/ddCVS
2. http://www.codewalkers.com/c/a/Server-Administration/Source-Code-Version-Control-Solutions
3. http://en.wikipedia.org/wiki/Revision_control