CSC/ECE 517 Fall 2014/ch1a 15 gs: Difference between revisions
Revision as of 04:08, 15 September 2014
Git Version Control System
Git is a widely used open source distributed version control system that can manage small as well as large projects. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, a convenient staging area, and support for multiple distributed workflows. It is also fast.
Introduction
Source code management (SCM) systems all attempt to store system source code and provide the ability to see what changes have been made to that source code over time. The Git SCM, however, differs in many ways from more traditional SCM systems.
The Git version control system differs from many other software change management products such as Subversion, CVS, Perforce, SourceSafe and Team Foundation Server in the way that it stores source files. Many other source control systems store deltas, that is, only the parts of the files that differ from one version to the next. Git instead takes a snapshot of the full repository with each commit, storing the files that changed and only referencing the files that have not changed, which is what makes branching in Git so easy.
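The snapshot model can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names and the identity `Dev <dev@example.com>` are hypothetical. It makes two commits and compares the object IDs that each snapshot records for a file that changed and for a file that did not.

```shell
set -eu
# Hypothetical identity, used only so the commits in this sketch succeed.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
git init -q "$repo"

# First snapshot: two files.
echo "stable" > "$repo/unchanged.txt"
echo "v1" > "$repo/changed.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "First snapshot"

# Second snapshot: only one of the files is modified.
echo "v2" > "$repo/changed.txt"
git -C "$repo" commit -q -am "Second snapshot"

# Ask each snapshot which stored object it records for each file.
unchanged_before=$(git -C "$repo" rev-parse HEAD~1:unchanged.txt)
unchanged_after=$(git -C "$repo" rev-parse HEAD:unchanged.txt)
changed_before=$(git -C "$repo" rev-parse HEAD~1:changed.txt)
changed_after=$(git -C "$repo" rev-parse HEAD:changed.txt)

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "changed.txt:   $changed_before -> $changed_after"
```

When the two IDs for unchanged.txt match, both snapshots reference the same stored object instead of holding two copies of the file.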
A characteristic that Git shares with some other SCM services like BitKeeper and Monotone, but not with Subversion, is that when a repo is cloned, the repository and all of its history are copied onto the local machine, making it much easier and faster to read the history and to roll code back or forward. Because of this local nature, a developer can work and commit changes on his or her own machine without access to an external server, and can then sync up and merge as needed once access is available again. With Subversion, a developer cannot commit without access to the central repository, because there is no full repository on the local machine.
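Committing without a server can be sketched as follows. This is an illustration under stated assumptions: the "server" is just a second directory on the same machine, it is renamed to simulate losing network access, and all paths and the identity are hypothetical.

```shell
set -eu
# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# A stand-in for the shared server repository, with one commit of history.
git init -q "$work/server"
echo "v1" > "$work/server/notes.txt"
git -C "$work/server" add notes.txt
git -C "$work/server" commit -q -m "Initial commit"

# Cloning copies the repository and its full history to the local machine.
git clone -q "$work/server" "$work/laptop"

# Simulate losing access to the server.
mv "$work/server" "$work/server.offline"

# Committing and reading history still work, because both are local operations.
echo "v2" >> "$work/laptop/notes.txt"
git -C "$work/laptop" commit -q -am "Work done offline"
git -C "$work/laptop" log --oneline
offline_commits=$(git -C "$work/laptop" rev-list --count HEAD)
echo "commits available offline: $offline_commits"
```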
Most other version control systems also use a version number to distinguish one version of a file from another. The number is sequential, and how it is applied to the changes in the repository (or repo) varies from one SCM to another. Subversion applies a single repository-wide revision number to each commit, while many others just number individual files sequentially. This type of version numbering is arbitrary and posed a significant problem for Linus Torvalds when he considered how many people contribute to the development and maintenance of Linux. Another issue with per-file history is the potential for inaccuracies when files are merged or split, since such a change may be recorded only as a rename. In Git, files do not hold an explicit history; instead the history is derived implicitly by comparing tree snapshots. Files that have the same name are considered to be directly related, and if an equivalently named file is not found, the snapshot is searched for a file with similar content, which allows renames to be identified.
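Rename detection by content comparison can be sketched as follows, assuming a throwaway repository with hypothetical file names. The file is renamed with a plain `mv`, so Git is never told that a rename took place; the `-M` option of `git diff` asks it to pair removed and added files that have similar content.

```shell
set -eu
# Hypothetical identity, used only so the commits in this sketch succeed.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
git init -q "$repo"

printf 'line one\nline two\nline three\n' > "$repo/old_name.txt"
git -C "$repo" add old_name.txt
git -C "$repo" commit -q -m "Add file"

# Rename outside of Git, then record the resulting snapshot.
mv "$repo/old_name.txt" "$repo/new_name.txt"
git -C "$repo" add -A
git -C "$repo" commit -q -m "Rename file"

# Compare the two snapshots with rename detection enabled.
rename_line=$(git -C "$repo" diff -M --name-status HEAD~1 HEAD)
echo "$rename_line"

# The first character of the status column is R when a rename was inferred.
rename_code=$(printf '%s' "$rename_line" | cut -c1)
```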
What's in a number?
Linus had a problem. Traditional sequential version numbers required synchronization between everyone contributing to a repository. Git did away with sequential version numbers in favor of SHA-1 hashes.
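Because a hash is computed from the content itself, any machine can derive the same identifier without consulting a central counter. The sketch below assumes the default SHA-1 object format and the GNU `sha1sum` utility; it computes the ID of the same content once with Git and once by hand.

```shell
set -eu
# Git names file content by hashing a short header plus the bytes themselves.
id_from_git=$(printf 'hello\n' | git hash-object --stdin)

# The same ID computed by hand: SHA-1 over "blob <size>\0<content>".
# The content "hello\n" is 6 bytes long.
id_by_hand=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

echo "git:     $id_from_git"
echo "by hand: $id_by_hand"
```

Two developers who never communicate would arrive at the same identifier for the same content, which is why no synchronization is needed to assign it.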
Git Workflows
Centralized Workflow
The centralized workflow is similar to working with a central Subversion repository (repo). The developer commits all changes to a single branch, ‘master’ in the case of Git. Git, however, provides features beyond the functionality found in Subversion. Developers are able to work in an environment isolated from everyone else developing features in the same system. Without that isolation, receiving a new feature can unexpectedly delay a developer's work because the new feature in some way changes the public interface of code the developer is using. Often these changes have no direct link to what the developer is working on, but because they were introduced into his/her environment they must be addressed in the code immediately. If the developer is working on a hotfix for a production issue, this could cause undue delay in deploying the hotfix. When using Git, however, all commits occur in the developer’s local environment, buffered from any changes being committed by others. Once the code they have been working on is stable, the developer can then consider merging in changes made by other developers. The point is that the developer has better control over when to pull in changes made by someone else.
With that said, the Centralized Workflow holds the upstream repository as sacred ground. The central repository should exist on a server that all developers assigned to the project can access, and it should be created as a bare repository, that is, one that has no working directory associated with it. Note that the convention is to name bare repos with an extension of .git. In short, the repository only ever changes when users push their changes into it.
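Creating such a repository can be sketched as follows. The temporary directory stands in for a location on a shared server, and the name `project.git` is a hypothetical example of the naming convention.

```shell
set -eu
work=$(mktemp -d)

# --bare creates a repository that holds history only, with no working directory.
git init -q --bare "$work/project.git"

# Confirm what was created.
is_bare=$(git -C "$work/project.git" rev-parse --is-bare-repository)
echo "bare repository: $is_bare"
ls "$work/project.git"
```

The listing shows the repository internals (such as HEAD and objects) sitting directly in project.git, rather than inside a .git subdirectory next to checked-out files.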
An additional constraint placed on the central repository is that if a developer's local commits diverge from the central repo, the developer cannot simply push his/her changes into the central repo. A merge is a 3-way comparison of the two commits and their common ancestor. Developers whose local repo diverges from the central repo must instead do what is called a rebase before they are allowed to push their changes back to the central repo. A rebase replays the developer’s local commits on top of the commits fetched from the central repo. If a conflict occurs, the developer must resolve the conflict and continue the rebase. Once the developer has rebased his/her local repo, it can be pushed back to the central repo, and this push results in a fast-forward merge. The end result of a rebase followed by a fast-forward merge contains the same changes as a 3-way merge made with the merge command without rebasing; the advantage of requiring the rebase is that the fast-forward merge produces a linear commit history that is much easier to read and manage over time. The commits in the central repository appear linear even though their development may have been parallel; they are serialized by their relative local commit order and by the order of the pushes.
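The rejected push and the rebase that resolves it can be sketched with two local clones. Everything here is an assumption made for illustration: the developers "alice" and "bob", the file names, and a central repo that is just a bare directory on the same machine.

```shell
set -eu
# Hypothetical identity shared by both clones, to keep the sketch short.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# Alice creates the project; the central bare repo is seeded from her repo.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/master  # name the branch master
echo base > "$work/alice/app.txt"
git -C "$work/alice" add app.txt
git -C "$work/alice" commit -q -m "Base"
git clone -q --bare "$work/alice" "$work/central.git"
git -C "$work/alice" remote add origin "$work/central.git"
git clone -q "$work/central.git" "$work/bob"

# Bob pushes first, so the central repo moves ahead.
echo bob > "$work/bob/bob.txt"
git -C "$work/bob" add bob.txt
git -C "$work/bob" commit -q -m "Bob's change"
git -C "$work/bob" push -q origin master

# Alice commits locally; her history has now diverged and her push is refused.
echo alice > "$work/alice/alice.txt"
git -C "$work/alice" add alice.txt
git -C "$work/alice" commit -q -m "Alice's change"
if git -C "$work/alice" push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi
echo "first push: $first_push"

# Rebase: replay Alice's commit on top of the central history, then push again.
git -C "$work/alice" pull -q --rebase origin master
git -C "$work/alice" push -q origin master

# The central history is linear: three commits and no merge commits.
total=$(git -C "$work/central.git" rev-list --count master)
merges=$(git -C "$work/central.git" rev-list --merges --count master)
echo "commits: $total, merge commits: $merges"
```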
Feature Branch Workflow
The Feature Branch Workflow builds upon the centralized workflow. In this workflow however developers create a new branch each time they start a new feature or hotfix. There is also a role reversal when it comes to how the commits end up back in the central master branch. When the developer is ready to release their code back to the master branch they make a pull request rather than pushing it.
There are a couple of advantages to this approach. Not every feature will involve just one developer. In the centralized workflow there is only one branch that everyone uses. Implementing that workflow with Git has the advantage that each developer is isolated from the others’ work, which might otherwise cause heartburn due to incompatible changes. The downside is that same isolation: two developers cannot “share” code without making that code visible to everyone. This poses a significant problem if the code the developers are working on is some crazy idea that may well end up being sent to the bit bucket.
The feature branch workflow allows sharing of such code between multiple developers without touching the master branch code. Developers push their feature branch changes back to the server in order to share it with other developers. This allows multiple developers to work on a feature while preserving the pristine state of the master branch.
When a feature reaches the stage where the developer is ready to have it merged back into the master branch, they issue a pull request. A pull request identifies the start commit, which is a commit the target branch already contains and would likely be the common ancestor of the two branches. It also identifies the developer's repo (the source repo) and the end commit, which if omitted defaults to the tip of the commit history. A pull request is really a request for a discussion about the feature in the branch. Developers can review the changes included in the branch. The changes are then either accepted as is, sent back for modification before they can be accepted, or denied.
This type of workflow works very well in a larger environment where a separate change management group controls what code makes it into the master branch. It also is very advantageous when features require more than one developer to implement. Also by pushing feature commits back to the server a backup of commits is effectively made in case a system failure takes out a developer’s system.
Gitflow Workflow
Gitflow is a workflow proposed by Vincent Driessen. It uses no Git features beyond those of the feature branch workflow. It does, however, prescribe a strict branching model that follows the project release cycle. This model works well in a large project due to the predictable and repeatable way that it handles releases while continuing further development of the system. Even though the workflow uses no new features, there are a number of open source projects, such as nvie/gitflow provided by Vincent Driessen on github.com and the Git Flow Integration plugin for RubyMine (http://plugins.jetbrains.com/plugin/7315), that provide helper scripts and menus to help developers follow the structure the workflow prescribes.
In the Gitflow Workflow there are a number of persistent and intermittent branches. Specifically there are two persistent branches, master and develop (or development). When you initialize a Gitflow Workflow you will always have these two branches. Intermittently there will exist a release branch and a hotfix branch as well as many feature branches. The feature branches will be named something that indicates what feature will be developed.
The process in a nutshell is as follows. The master branch is initialized. For a new project this branch will be tagged as a beta version or maybe even an alpha version; if migrating from another version control system, it may be the last stable release of the product. The development branch is branched from master. All developers then branch their feature branches from the development branch. No real development actually occurs on the development branch. Along with the master branch, the development branch should be kept as pristine as possible at all times. It is simply a staging branch for holding completed features. All developer implementation occurs on feature branches. When a developer completes a feature they make a pull request, just like in the feature branch workflow.
Over time features build up in the development branch, and at some point the decision is made to release a version of the software. At this point a release branch is created from the development branch. The release branch is then used to do final testing. No new features are added to the branch. Only fixes to existing features are allowed, in a process often called release hardening. As issues are discovered they are fixed in the release branch, and eventually, after all tests have passed, that code is deemed a release. Versions from the release branch may be major or minor releases. The release branch is then merged into the master branch to be built and released to production, and it is also merged back into the development branch so that the fixes made during the hardening process are visible to future development. Once the merges have occurred, the release branch is deleted.
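The branch lifecycle described above can be sketched with plain Git commands. This is a simplified illustration, not the gitflow helper scripts: local `--no-ff` merges stand in for accepted pull requests, empty commits stand in for real work, and the branch names `feature/login` and `release/1.0` and the version tags are hypothetical.

```shell
set -eu
# Hypothetical identity, used only so the commits in this sketch succeed.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
g() { git -C "$repo" "$@"; }   # shorthand for running git inside the repo

# Initialize master and tag the starting point.
git init -q "$repo"
g symbolic-ref HEAD refs/heads/master  # name the branch master
g commit -q --allow-empty -m "Initial version"
g tag v0.1.0

# The development branch comes from master; features branch from it.
g checkout -q -b develop master
g checkout -q -b feature/login develop
g commit -q --allow-empty -m "Implement login"

# A completed feature is merged into develop, then its branch is removed.
g checkout -q develop
g merge -q --no-ff -m "Merge feature/login" feature/login
g branch -q -d feature/login

# Release hardening: only fixes are committed on the release branch.
g checkout -q -b release/1.0 develop
g commit -q --allow-empty -m "Fix bug found in release testing"
fix=$(g rev-parse HEAD)

# The release goes to master (tagged) and back to develop, then is deleted.
g checkout -q master
g merge -q --no-ff -m "Release 1.0" release/1.0
g tag v1.0.0
g checkout -q develop
g merge -q --no-ff -m "Merge release/1.0 fixes" release/1.0
g branch -q -d release/1.0

# Check that the hardening fix reached both persistent branches.
if g merge-base --is-ancestor "$fix" master &&
   g merge-base --is-ancestor "$fix" develop; then
  fix_in_both=yes
else
  fix_in_both=no
fi
remaining=$(g for-each-ref --format='%(refname:short)' refs/heads | xargs)
echo "fix in master and develop: $fix_in_both"
echo "remaining branches: $remaining"
```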
The final piece of this puzzle is the hotfix branch. No matter how extensively software is tested, bugs occasionally make it into a production system. When issues are discovered in production that must be fixed immediately, a hotfix branch is made from the master branch. The fix is made and tested. It is then merged back into the master branch as well as into the development branch, and the hotfix branch is deleted, as it is no longer needed until the next hotfix. It is possible to keep the release and hotfix branches around between uses, but the advantage of deleting them is that they don’t need to be kept up to date in the meantime.
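A hotfix cycle can be sketched in the same way. The sketch assumes, as in Driessen's model, that the fix is merged into both master and the development branch (named `develop` here); the branch name `hotfix/1.0.1` and the tag are hypothetical, and an empty commit stands in for the actual fix.

```shell
set -eu
# Hypothetical identity, used only so the commits in this sketch succeed.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
g() { git -C "$repo" "$@"; }   # shorthand for running git inside the repo

# Starting state: a released master and a develop branch.
git init -q "$repo"
g symbolic-ref HEAD refs/heads/master  # name the branch master
g commit -q --allow-empty -m "Release 1.0"
g branch develop

# The hotfix branch starts from master, where the production code lives.
g checkout -q -b hotfix/1.0.1 master
g commit -q --allow-empty -m "Fix production bug"
fix=$(g rev-parse HEAD)

# Merge the fix into master for release, and into develop for future work.
g checkout -q master
g merge -q --no-ff -m "Hotfix 1.0.1" hotfix/1.0.1
g tag v1.0.1
g checkout -q develop
g merge -q --no-ff -m "Merge hotfix 1.0.1" hotfix/1.0.1

# The hotfix branch is no longer needed.
g branch -q -d hotfix/1.0.1

if g merge-base --is-ancestor "$fix" master &&
   g merge-base --is-ancestor "$fix" develop; then
  fix_everywhere=yes
else
  fix_everywhere=no
fi
remaining=$(g for-each-ref --format='%(refname:short)' refs/heads | xargs)
echo "fix in master and develop: $fix_everywhere"
echo "remaining branches: $remaining"
```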
Forking Workflow
The Forking Workflow is significantly different from the other types of workflows. In this workflow every developer has a Git repository on the server. That means that every developer has both a private local repository and a private remote, or server-side, repository. Project maintainers control when code is merged into the official repository. Because this workflow puts each developer’s code base in a silo, it is ideal for, and frequently the workflow of choice for, open source projects where untrusted developers may be contributing code.
When a developer forks a repo, a copy of the official repo is made on the server side for the developer. The act of forking a repo is nothing more than creating a copy of an existing repository, so no special functionality is necessary in Git. Git does, however, support this type of workflow by providing the ability to set more than one remote repository. In the simplest case a developer would fork a repository, possibly on a hosting service like GitHub, and would then clone the fork to a local repo on their development machine. In order to keep up to date with changes being made in the official repo, the developer would add another remote using the git remote add command; by convention this remote is named upstream.<ref>https://help.github.com/articles/fork-a-repo</ref>
git remote add upstream https://github.com/octocat/Spoon-Knife.git
Once the developer has also linked the local repo to the upstream repo, it can be kept up to date by periodically fetching changes from the upstream repo and merging them into the developer's branch. <ref>https://help.github.com/articles/syncing-a-fork</ref>
git fetch upstream
git merge upstream/master
Getting the changes into the developer's server-side repo just requires the developer to execute a push.
git push origin master
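The whole fork, sync, and push cycle can be tried without a hosting service. In the sketch below, bare directories on the local machine stand in for the official repo and the developer's server-side fork, so every path is a hypothetical substitute for a hosted URL.

```shell
set -eu
# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# The official repo, seeded by a maintainer with one commit.
git init -q "$work/maintainer"
git -C "$work/maintainer" symbolic-ref HEAD refs/heads/master  # name the branch master
echo one > "$work/maintainer/README"
git -C "$work/maintainer" add README
git -C "$work/maintainer" commit -q -m "Initial commit"
git clone -q --bare "$work/maintainer" "$work/official.git"

# Forking is a server-side copy of the official repo.
git clone -q --bare "$work/official.git" "$work/fork.git"

# The developer clones the fork (origin) and adds the official repo (upstream).
git clone -q "$work/fork.git" "$work/dev"
git -C "$work/dev" remote add upstream "$work/official.git"

# Meanwhile, new work lands in the official repo.
echo two >> "$work/maintainer/README"
git -C "$work/maintainer" commit -q -am "Upstream change"
git -C "$work/maintainer" push -q "$work/official.git" master

# Sync the local repo from upstream, then update the server-side fork.
git -C "$work/dev" fetch -q upstream
git -C "$work/dev" merge -q upstream/master
git -C "$work/dev" push -q origin master

official_head=$(git -C "$work/official.git" rev-parse master)
fork_head=$(git -C "$work/fork.git" rev-parse master)
echo "official: $official_head"
echo "fork:     $fork_head"
```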
Git Vs. SVN
Git and Subversion are the most widely used representatives of the two seemingly dichotomous approaches to SCM, distributed and centralized. Git has become the most widely used SCM and lends itself to a highly distributed workflow: multiple developers can work independently on parallel branches of a project and then push their changes to each other or to a central repository, depending on the workflow model. Subversion, on the other hand, is based around a central repository from which developers can check out a revision but do not clone the entire repository history.
- Everyone who has cloned the central repository in order to work on it has a complete working backup of all the files. Careful backups of the central server are therefore less critical, since every developer's clone serves as a local backup.
- As a result of having the repository and its complete history locally, users can institute version control and choose for themselves what to track and what to merge and when.
References
<references/>