CSC/ECE 517 Fall 2009/wiki1a 4 SCM

From Expertiza_Wiki

In software engineering, Source Code Management (SCM) is a mechanism for tracking and storing the modifications made to source files during large-scale software development. It does this by assigning a unique version number (also called a revision number) to each set of changes made to a file. Along with the version number, it stores the name of the user who made the changes, a timestamp, and the user's comments. Version control provides a suitable environment for distributed, collaborative software development because it supports version comparison, restoration, and merging.

Some commonly used version control systems are CVS, SVN, Git, Mercurial, Bazaar, LibreSource, Monotone, ClearCase, and Perforce [5].

Overview

A simple source control mechanism. Each changed/updated file is manually copied with a different name.

Source Code Management is also known as version control, revision control, or source control; the terms are used interchangeably. In general, source control provides a way to track and manage the changes made to a project's source code, documentation, and configuration files.

The simplest version control mechanism is to copy a file or folder before changing it. For example, we may have come across filenames such as project_xyz_03_09_09.zip and project_xyz_03_15_09.zip, where the old files were manually copied, changed, and saved under a new name. Although this method is very simple, it is inefficient and lacks the other features that an SCM system provides.
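
The manual scheme can be sketched in a few lines of Python. This is only an illustration of the naming convention seen in the example filenames; the function name `snapshot_name` and its parameters are assumptions made for this sketch, not part of any SCM tool.

```python
from datetime import date


def snapshot_name(project: str, day: date, ext: str = "zip") -> str:
    """Build a date-stamped archive name in the month_day_year style."""
    return f"{project}_{day.strftime('%m_%d_%y')}.{ext}"


# Every snapshot is a full copy under a new name; nothing records who
# changed what, or why, between two snapshots.
print(snapshot_name("project_xyz", date(2009, 3, 9)))   # project_xyz_03_09_09.zip
print(snapshot_name("project_xyz", date(2009, 3, 15)))  # project_xyz_03_15_09.zip
```

The sketch makes the weakness visible: the only metadata is the date in the filename, so comparing, reverting, and merging all have to be done by hand.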

A typical source control tool would generally provide the following features [3]:

  • A space to store the files (source code, documents, etc.), usually called the repository.
  • A record of the changes made to each file since it was added to version control.
  • A way to revert changes.
  • A way for developers and testers to work together without affecting each other. This is usually achieved by giving each user a private workspace that holds a separate copy of the files.
  • A way to correctly merge the changes made by different users.

Need for Source Code Management

When software is developed over a period of time, there will be many versions of it. Sometimes we need to see the list of changes made between two versions, which can be done by running a diff tool over them. Version history can also be used to locate the version in which a bug was introduced: to debug the error, it is necessary to retrieve a version in which it can be reproduced [4].

In some cases it may be necessary to maintain two or more versions of the software at the same time. For example, one version with a specific set of features is released to company X and another, with a different set of features, to company Y. Source code management techniques handle these cases with minimal maintenance overhead [4].

In large software development projects, multiple developers often work independently on the same software at the same time. In these cases it is essential to manage access and to provide a suitable mechanism for merging the updates from each user [4].
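
As a rough illustration of both uses, the following Python sketch diffs two versions with the standard `difflib` module and then searches a version history for the first version in which a bug appears. The helper names and the sample history are assumptions made for this example, and the search assumes that the bug, once introduced, is present in every later version.

```python
import difflib


def diff_versions(old: str, new: str) -> list:
    """Return unified-diff lines describing how `old` became `new`."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old", tofile="new", lineterm=""))


def first_bad_version(versions, is_bad):
    """Binary-search for the index of the earliest bad version.

    Assumes every version after the first bad one is also bad.
    Returns None when no version is bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # the culprit is at mid or earlier
        else:
            lo = mid + 1      # the culprit is after mid
    return lo if lo < len(versions) else None


# Hypothetical history of one file; index 0 is the oldest version.
history = [
    "total = a + b",
    "total = a + b  # sum",
    "total = a - b  # sum",
    "total = a - b  # sum\nprint(total)",
]

culprit = first_bad_version(history, lambda src: "a - b" in src)
print("bug introduced in version", culprit + 1)
print("\n".join(diff_versions(history[culprit - 1], history[culprit])))
```

The binary search needs only a handful of retrievals even for a long history, which is why being able to fetch any old version cheaply matters for debugging.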

Terminologies and Definitions

An example version control tree with basic operations


Before looking at the best practices of Source Code Management, the reader should be familiar with the terminology involved. The following is a non-exhaustive list of SCM terms [1].

  • Repository: The database that stores the files. It can be either distributed or centralized. In a distributed system, every user has their own local copy of the repository and merging takes place peer to peer, whereas in a centralized system there is only one main repository server.
  • Client / Server: The computer hosting the repository is known as the server, and a computer that connects to the repository is known as a client.
  • Workspace: The space used by the user for editing, testing, debugging, and building. The workspace is a private copy of the files. Any change made to it is confined to the local copy and does not reach the repository until the user explicitly commits it. Workspaces are also known as "sandboxes" or "views".
  • Add: The action of putting a new file or folder into the repository for the first time. Version control of a file starts only once it has been added.
  • Get: The operation of copying files from the repository to the workspace. Files retrieved using Get are read-only copies and are not intended to be edited.
  • Checkout: The action of downloading a file from the repository to the workspace for editing. The SCM maintains a list of checked-out files and the name of the user editing each one.
  • Checkin: The action of uploading a changed file to the repository. The SCM updates the repository by recording a new version of the changed file.
  • Version Number: Identifies a version of a given file. Using the version number we can retrieve, revert to, or diff against old versions of the file. This makes it possible to store the history of changes made to the file.
  • Branch: A separate private copy of a file that can be used for a specific purpose such as testing, debugging, development, or bug fixing.
  • Merge: The process of combining two different versions of the same file. Merging usually involves copying the changes made to files in one branch to another branch.
  • Lock: Obtaining exclusive modification rights to a file through the SCM, so that no two users can modify the same file at the same time.
  • History: The list of changes made to a file since it was added to the repository. The history also provides details such as which users changed the file and their comments on each change.
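
To show how these terms relate to one another, here is a minimal in-memory sketch in Python. It models a centralized repository that uses exclusive locks; the class `Repository` and all of its method names are hypothetical and do not correspond to the API of any particular SCM tool.

```python
class Repository:
    """Toy centralized repository with exclusive check-outs."""

    def __init__(self):
        self._versions = {}  # filename -> list of (user, comment, content)
        self._locks = {}     # filename -> user who has the file checked out

    def add(self, name, content, user, comment="initial version"):
        """Add: put a file under version control as version 1."""
        if name in self._versions:
            raise ValueError(f"{name} is already under version control")
        self._versions[name] = [(user, comment, content)]

    def get(self, name, version=None):
        """Get: return the content of a version (latest by default)."""
        history = self._versions[name]
        number = len(history) if version is None else version
        if not 1 <= number <= len(history):
            raise IndexError(f"{name} has no version {number}")
        return history[number - 1][2]

    def checkout(self, name, user):
        """Checkout: lock the file for `user` and return it for editing."""
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self._locks[name] = user
        return self.get(name)

    def checkin(self, name, content, user, comment):
        """Checkin: store a new version and release the lock."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._versions[name].append((user, comment, content))
        del self._locks[name]
        return len(self._versions[name])  # the new version number

    def history(self, name):
        """History: (version number, user, comment) for every version."""
        return [(number, user, comment)
                for number, (user, comment, _content)
                in enumerate(self._versions[name], start=1)]


repo = Repository()
repo.add("main.c", "int main() { return 1; }", "alice")
text = repo.checkout("main.c", "bob")
new_version = repo.checkin("main.c", text.replace("1", "0"), "bob",
                           "Return 0 so the shell sees success")
print(new_version)
print(repo.history("main.c"))
```

Branches and merging are left out of this sketch on purpose; it only covers the terms needed for a single line of development.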

Best practices in SCM

Some of the best practices for getting the maximum advantage from an SCM system are described in the following sections.

Workspace

The workspace is the private local space assigned to a user of the SCM system. Since it is a private space, changes made in it do not affect other users. Developers use the workspace to edit source files and build components; testers use it to build, test, and release software.

Some of the best practices when using a workspace are:

  • Workspaces should not be shared. The SCM cannot track activity by user if a workspace is shared.
  • Always work within the workspace, as the SCM can track updates only to files that are in the proper workspace.
  • Keep the workspace expendable: The workspace contains a copy of files from the repository. When the latest versions of the files have been copied to the workspace, we say that the workspace is in sync with the repository. Between the time the copied files are changed and the time the changes are committed, the workspace is said to be in a "valuable" state, because it holds the only copy of the latest modifications. A system crash during this period would result in loss of data. Hence we should commit changes frequently, updating the repository and returning the workspace to a state in which it is in sync with the repository and holds nothing that would be lost.
  • Use working folder states to keep the workspace up to date: The SCM displays one of several states for each file in the workspace. Using these states we can make sure that the workspace is always in sync with the repository.
    State        Meaning
    -----------  -------------------------------------------------------------
    Old          The file in the workspace is an older version than the current version in the repository.
    Edited       The file has been edited in the workspace.
    Needs Merge  Two different versions of the same file exist in different branches.
    Missing      The working file does not exist and needs to be retrieved from the repository.
    Renegade     The local copy in the workspace has been modified without being checked out from the repository.
    Unknown      A local copy of the file exists in the workspace, but the SCM has no hidden state information for it.
  • Review changes before you check in: Changes made in the workspace that have not yet been committed are displayed as the "pending change set" in the workspace. The pending change set shows all types of changes, including adds, deletes, renames, moves, and modified files. It is good practice to keep an eye on the pending change set so that you do not forget to check anything in. Reviewing modified files with the file diff tool provided by the SCM lets you compare the latest version in the repository with the working version in the workspace. Reviewing changes before check-in prevents many merge conflicts.
  • Update the workspace frequently: After you copy files to the workspace, other users may modify some of them and update the repository. From that point on, the copies in your workspace are old. It is good practice to perform a "Get Latest Version" from the repository periodically.
  • Do not forget to add comments when committing changes: After files have been modified in the workspace, the user updates the repository by committing the changes. It is always a good idea to write a comment on what was changed and why, so that other users can gain insight into the latest modifications.
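
The working folder states listed above can be thought of as the result of a few simple checks. The Python sketch below is one possible way to derive them; the `WorkingFile` fields and the order of the checks are assumptions for illustration, "Needs Merge" is approximated here as "edited locally while a newer version exists in the repository", and "Current" is an extra state added only so that the function always returns something.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkingFile:
    exists: bool                 # is there a local copy on disk?
    base_version: Optional[int]  # repository version the copy came from,
                                 # or None if the SCM has no record of it
    modified: bool               # does the copy differ from base_version?
    checked_out: bool            # did the user check the file out?


def folder_state(f: WorkingFile, repo_version: int) -> str:
    """Classify a working file against the latest repository version."""
    if not f.exists:
        return "Missing"
    if f.base_version is None:
        return "Unknown"
    if f.modified and not f.checked_out:
        return "Renegade"
    if f.modified and f.base_version < repo_version:
        return "Needs Merge"   # approximation, see the note above
    if f.modified:
        return "Edited"
    if f.base_version < repo_version:
        return "Old"
    return "Current"           # hypothetical name for "in sync"


print(folder_state(WorkingFile(True, 4, False, False), repo_version=5))
print(folder_state(WorkingFile(True, 5, True, True), repo_version=5))
```

A workspace is in sync with the repository when every file is in the "Current" state, which is the condition the practices in this section try to restore as often as possible.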

Check-out

Checking out a file involves tasks on both the server and the client side. When a file is checked out, the SCM records that the file is checked out, the name of the person who checked it out, and a timestamp. On the client side, it prepares the file for modification: the file is made writable only when it is checked out. When a user checks out a file for modification, the SCM updates the file's status to in use and does not let any other user check out the same file. A file checked out for mutually exclusive access is said to be locked.

SCM tools provide an "undo checkout" option for a checked-out file. When we select it, the SCM tool releases the lock on the file and any modifications made to the file are lost.

There are a few important things to keep in mind when checking out a file:

  • Be sure that you need to check out or lock a file before doing so, because it prevents other users from modifying the file. Once the file is locked, other users must wait until you release the lock.
  • Check out only the single file you are interested in. Do not check out the entire folder in which the file is located.
  • Checking out many files at once is not recommended. Check out a few files at a time, finish the task, check in, and then check out more files if needed.
  • Confine exclusive locks to the duration of the work. Holding exclusive locks for longer periods is not recommended.
  • Keep track of the files on which you hold an exclusive lock. Forgetting any of them would prevent other users from modifying those files.
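
The last two points can be supported by a small bookkeeping structure. The following Python sketch records who holds each exclusive lock and since when, so that a user can list their own locks and a team can spot locks that have been held too long. The class `LockTable`, its methods, and the one-day threshold in the usage lines are all assumptions made for this example.

```python
from datetime import datetime, timedelta


class LockTable:
    """Toy record of exclusive check-outs."""

    def __init__(self):
        self._locks = {}  # filename -> (user, time the file was checked out)

    def checkout(self, name, user, now):
        """Take the exclusive lock, or fail if someone already holds it."""
        if name in self._locks:
            holder, _since = self._locks[name]
            raise PermissionError(f"{name} is already checked out by {holder}")
        self._locks[name] = (user, now)

    def release(self, name, user):
        """Release on check-in or undo checkout; only the holder may do so."""
        holder, _since = self._locks.get(name, (None, None))
        if holder != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        del self._locks[name]

    def held_by(self, user):
        """Files on which `user` currently holds an exclusive lock."""
        return sorted(name for name, (holder, _since) in self._locks.items()
                      if holder == user)

    def stale(self, now, max_age):
        """Files that have been locked for longer than `max_age`."""
        return sorted(name for name, (_holder, since) in self._locks.items()
                      if now - since > max_age)


t0 = datetime(2009, 9, 1, 9, 0)
locks = LockTable()
locks.checkout("parser.c", "alice", t0)
locks.checkout("lexer.c", "alice", t0 + timedelta(hours=1))
locks.checkout("ui.c", "bob", t0 + timedelta(hours=2))

print(locks.held_by("alice"))
print(locks.stale(now=t0 + timedelta(days=1, minutes=30),
                  max_age=timedelta(days=1)))
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.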

Check-in

Checking out and editing are followed by check-in, where the changes are uploaded to the repository. SCM tools usually provide an option to write a comment while checking in a modified file. The comment is stored permanently in the repository along with the changes. When a file is checked in, the SCM increments its version number by one and removes the checked-out (locked) state, making the file available for further modification. The working copy in the workspace is made read-only again.

SCM tools treat changes such as creating a folder, deleting a file or folder, adding a file to a folder, and renaming or moving a file or folder as modifications that need to be checked in to the repository.

Important things to pay attention to when checking in files:

  • Every change should be accompanied by a proper comment. The comment should explain what was modified and why. This is very important in large-scale development, where it is easy to forget what changes were made and the reasons behind them. For example, when a change is made to fix a bug, it is good practice to state which bug was fixed and briefly explain how, rather than writing only "Bug 12345 fixed."
  • Some tasks require modifications to more than one file. It is good practice to check in all the files related to a single task together.
  • Restrict each check-in to files related to one bug. Checking in files related to multiple fixes makes it hard to maintain a one-to-one mapping between bugs and the files holding their fixes.
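
A team could enforce part of this advice automatically before a check-in is accepted. The Python sketch below is a rough heuristic, not a feature of any specific SCM tool: the function `comment_problems`, the regular expressions, and the assumed "bug NNN" way of referring to bugs are all made up for this example.

```python
import re

# A comment that is nothing more than "Bug 12345 fixed." or "bug #12345".
BARE_BUG = re.compile(r"^\s*bug\s*#?\d+\s*(fixed)?\s*\.?\s*$", re.IGNORECASE)
# Any reference of the form "bug 123" or "bug #123" inside a comment.
BUG_ID = re.compile(r"bug\s*#?(\d+)", re.IGNORECASE)


def comment_problems(comment: str) -> list:
    """Return a list of reasons to reject a check-in comment (empty if fine)."""
    problems = []
    if not comment.strip():
        problems.append("comment is empty")
    elif BARE_BUG.match(comment):
        problems.append("comment names a bug but does not explain the change")
    if len(set(BUG_ID.findall(comment))) > 1:
        problems.append("check-in refers to more than one bug")
    return problems


print(comment_problems("Bug 12345 fixed."))
print(comment_problems(
    "Bug 12345: reject empty header in the parser, which caused a crash"))
print(comment_problems("Fix bug 12 and bug 34 in one go"))
```

Such a check can only catch the most obvious cases; whether a comment really explains what changed and why still needs a human reviewer.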

Branching

Branching means creating an alternate codeline from an existing codeline. Branching is needed when a team requires multiple separate copies of the project at the same time because each copy is governed by a different policy. For example, a development codeline may accept frequent check-ins of bug fixes, while a release codeline must stay stable and should not receive check-ins unless explicitly approved.

  • Create a new branch only when required. Each new branch adds overhead: more builds, more change propagation, and so on.
  • Create a branch instead of copying files. Copying files has the same overhead as branching, without the advantage of the SCM system's branching support.
  • One simple rule for deciding when to create a new branch is to ask whether a separate check-in policy is required.
  • To minimize the number of changes that need to be propagated, branch as late as possible. For example, suppose we have a development branch and a release branch. Every bug fixed on the development branch needs to be propagated to the release branch. If we instead branch the release line after the bugs have been fixed, the need for change propagation is eliminated.
  • Branch instead of imposing a check-in freeze. For example, if check-ins need to be frozen so that testing can be done on a stable version, we can create a new branch where changes continue to be checked in frequently and are later merged into the release line when required.
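
The effect of branching late can be seen in a very small model. In the Python sketch below a codeline is just an ordered list of change identifiers; the class `Codeline`, the helper `fixes_needing_propagation`, and the change names are hypothetical and exist only to count how many fixes would have to be propagated.

```python
class Codeline:
    """A named line of development holding an ordered list of change ids."""

    def __init__(self, name, changes=()):
        self.name = name
        self.changes = list(changes)

    def commit(self, change):
        self.changes.append(change)

    def branch(self, name):
        """Start a new codeline that shares all history up to this point."""
        return Codeline(name, self.changes)

    def not_yet_in(self, other):
        """Changes on this codeline that `other` still needs to receive."""
        return [c for c in self.changes if c not in other.changes]


def fixes_needing_propagation(branch_after: int) -> list:
    """Cut the release branch after `branch_after` of three fixes are in."""
    fixes = ["fix-1", "fix-2", "fix-3"]
    dev = Codeline("development", ["feature"])
    for fix in fixes[:branch_after]:
        dev.commit(fix)
    release = dev.branch("release")
    for fix in fixes[branch_after:]:
        dev.commit(fix)          # made after the branch point
    return dev.not_yet_in(release)


print(fixes_needing_propagation(0))  # branched early
print(fixes_needing_propagation(3))  # branched late
```

When the branch is cut before any fix is made, all three fixes have to be propagated to the release line; when it is cut after the last fix, none do.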

Labeling

In the repository, versions of files can be associated with a "label", also called a "tag". A label identifies a specific version of a file in the repository, which lets us retrieve the file using a "Get by Label" operation. For example, we can have a label version_1.4_rel. A Get on this label would retrieve all the files at the versions matching the label.

Best practices for labeling a branch, file, or folder:

  • Keep the label name as descriptive as possible. This helps when retrieving files later.
  • If the label name cannot be descriptive, use a comment to briefly explain the label.
  • It is better to label an entire folder than a single file in the repository.
  • Include previous versions of a file in the same label if necessary.
  • Use labels at the time of new releases. When it is time to build the source code and release it, label the release, for example "Release 1.0". Labeling a release lets us retrieve its source code and related files using the "Get by Label" feature of the SCM.
  • Use labels often. Consider two branches A and B, where the code in branch A undergoes frequent modification and each modification is labeled. At some point you wish to migrate changes from branch A to branch B. This is easy if you remember the last label that was merged from A to B: only the changes made after that label need to be considered.
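
One way to picture a label is as a saved mapping from each filename to a version number. The Python sketch below follows that idea; the class `LabeledRepository` and its methods are hypothetical names chosen for this example and do not reflect how any particular SCM stores labels.

```python
class LabeledRepository:
    """Toy repository that supports labels over whole-folder snapshots."""

    def __init__(self):
        self._versions = {}  # filename -> list of contents (index 0 = v1)
        self._labels = {}    # label -> {filename: version number}

    def checkin(self, name, content):
        self._versions.setdefault(name, []).append(content)
        return len(self._versions[name])

    def label(self, label):
        """Attach `label` to the current version of every file."""
        if label in self._labels:
            raise ValueError(f"label {label!r} already exists")
        self._labels[label] = {name: len(contents)
                               for name, contents in self._versions.items()}

    def get_by_label(self, label):
        """Return every labeled file at the version the label points to."""
        return {name: self._versions[name][number - 1]
                for name, number in self._labels[label].items()}

    def changed_since(self, label):
        """Files modified or added after `label` was applied."""
        snapshot = self._labels[label]
        return sorted(name for name, contents in self._versions.items()
                      if len(contents) > snapshot.get(name, 0))


repo = LabeledRepository()
repo.checkin("a.c", "a, version 1")
repo.checkin("b.c", "b, version 1")
repo.label("Release 1.0")
repo.checkin("a.c", "a, version 2")
repo.checkin("c.c", "c, version 1")

print(repo.get_by_label("Release 1.0"))
print(repo.changed_since("Release 1.0"))
```

The `changed_since` method corresponds to the last practice above: if "Release 1.0" was the last label merged from one branch to another, only the files it reports need to be considered for the next merge.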

Branch Merging

Branch merging is the process of taking changes made in one branch and applying them to another branch. For example, suppose we have a development branch and a release branch. All bug fixes go to the development branch. Once the bug fixes have been approved for release, the changes need to be merged into the release line.

Some of the important considerations during merging are:

  • Review the merge before the commit. SCM tools usually have an option to review the changes before the merge is committed. A thorough review must be done to ensure the correctness of the merge, and any conflicts must be resolved.
  • Propagate changes early and often. Merging often reduces the number and complexity of conflicts.
  • The right person must perform the merge. Since the owner of a file has the best idea of what the result of a merge should be when there are conflicts, he or she should be responsible for merging it.

File Merging

Several scenarios require files to be merged. To mention a few:

  1. Some SCM tools allow multiple checkouts of the same file, i.e. they allow the same file to be modified by two different people at the same time. This results in two versions of the same file, which must be merged.
  2. Branching: Maintaining branches off the main line makes it necessary to merge changes from one branch to another.
  3. Some developers use the "edit-merge-commit" method of modifying files, which allows two users to modify the same file at the same time and therefore requires files to be merged.

The overhead of merging files can be avoided by serializing modifications, that is, by allowing only one user to modify a file at any given time. But this method hinders the high performance that concurrent development offers. To maximize concurrency, parallel development is encouraged, which makes merging files necessary [6].

  • Update the working directory frequently: Say user A has copied file "xyz" to his workspace at version 4. If other users then modify the same file and update the repository until the current version of "xyz" is 21, there is a huge gap between the file in user A's workspace and the current version in the repository. In such cases merging the file becomes a very tedious job. Hence it is recommended to keep an eye on the latest versions of files in the repository and update accordingly [2].
  • Auto-merge on "Get": This is a feature provided by many SCM tools. When we get a file from the repository, the SCM tool performs an automatic merge between the working version of the file in the workspace and the current version in the repository. In case of conflicts, the user can visually merge, review, verify, and resolve them, thus making sure that the merge was successful.
  • Avoid auto-merge on check-in: This is best explained with an example. Say user A has made changes to a local copy of file "xyz" at version 4, while the current version of the same file in the repository is 5. When user A checks in "xyz", an automated merge between user A's local copy and version 5 in the repository could produce merge conflicts. Since the auto-merge happens without user A's involvement, the SCM tool might resolve these conflicts by itself and create version 6. The problem is that version 6 of "xyz" has not been reviewed or verified after the merge, so there is no guarantee that it is free of compile errors [6].
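
The idea behind an automatic merge, and the reason conflicts need a person, can be sketched with a simplified three-way merge. Real tools merge text line by line; to stay short, the Python sketch below instead treats a file as a mapping from a named section (for example a function name) to its text. The functions `three_way_merge` and `checkin_allowed` and the sample data are assumptions made for this illustration.

```python
_MISSING = object()  # marks a section that is absent from a version


def three_way_merge(base, mine, theirs):
    """Merge two edited copies of `base`.

    Returns (merged, conflicts). A section is a conflict when both sides
    changed it in different ways; conflicting sections are left out of
    `merged` so that a person has to resolve them.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b = base.get(key, _MISSING)
        m = mine.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if m == t:        # both sides agree (unchanged or same edit)
            value = m
        elif m == b:      # only the other side changed this section
            value = t
        elif t == b:      # only my side changed this section
            value = m
        else:             # both changed it differently
            conflicts.append(key)
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


def checkin_allowed(workspace_base_version, repo_version):
    """Refuse a check-in that would need an unreviewed merge."""
    return workspace_base_version == repo_version


base = {"parse": "v1", "render": "v1", "save": "v1"}
mine = {"parse": "v2 (mine)", "render": "v1", "save": "v1"}
theirs = {"parse": "v1", "render": "v2 (theirs)", "save": "v1"}

print(three_way_merge(base, mine, theirs))
print(checkin_allowed(workspace_base_version=4, repo_version=5))
```

In the scenario from the last bullet, `checkin_allowed(4, 5)` is false: user A would first get version 5, merge it into the workspace, review the result, and only then check in.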

References

  1. http://betterexplained.com/articles/a-visual-guide-to-version-control/
  2. http://www.perforce.com/perforce/papers/bestpractices.html
  3. http://www.oss-watch.ac.uk/resources/versioncontrol.xml
  4. http://www.klariti.com/technical-writing/What-is-Revision-Control.shtml
  5. http://www.smashingmagazine.com/2008/09/18/the-top-7-open-source-version-control-systems/
  6. http://www.ericsink.com/scm/scm_basics.html
  7. http://blog.looplabel.net/2008/07/28/best-practices-for-version-control/