CSC/ECE 517 Fall 2014/ch1a F1415 rv: Difference between revisions

From Expertiza_Wiki
 
(43 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<div style="float: right;">
<div style="float: right;">
[[File:git-logo.png]]
[[File:git-logo.png|center]]
{| class="wikitable"
|-
! Author(s)
| Ravish Roshan, Vikas Piddempally
|-
! Initial release
| 7 April 2005
|-
! Written in
| C, Bourne Shell, Tcl, Perl
|-
! Operating Systems
| Linux, POSIX, Windows, OS X
|-
! Type
| Version Control
|-
! License
| GNU General Public License v2
|-
! Website
| [http://git-scm.com/ git-scm.com]
|-
|}
</div>
</div>
== Introduction ==
== Background ==
 
[http://git-scm.com/ Git] (/ɡɪt/) is a distributed revision control and [https://en.wikipedia.org/wiki/Revision_control source code management (SCM)] system originally designed and developed by [https://en.wikipedia.org/wiki/Linus_Torvalds Linus Torvalds] in 2005 for [http://www.linux.org/ Linux] kernel development, with the aim of managing the Linux source code without requiring a central server. It has since become the most widely adopted version control system for software development. Git emphasizes speed, data integrity, and support for distributed, non-linear workflows.
 
In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were as listed below:
 
*Speed
*Simple design
*Strong support for non-linear development (thousands of parallel branches)
*Fully distributed
*Able to handle large projects like the Linux kernel efficiently (speed and data size)
 
Since its birth in 2005, Git has evolved and matured to be easy to use and yet retain these initial qualities. It’s incredibly fast, it’s very efficient with large projects, and it has an incredible branching system for non-linear development.
Further, [https://github.com/ GitHub], a web-based hosting service for Git repositories, is the largest code host on the planet, with over 15.7 million repositories. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects.<ref>[http://git-scm.com/book/en/Getting-Started-A-Short-History-of-Git Getting started a short history of Git]</ref>
 
Like the Linux kernel, Git is free software distributed under the terms of the GNU General Public License version 2.


=== What is Git? ===
== Centralized Vs Distributed Version Control System ==
Git (/ɡɪt/) is a locally enabled distributed revision control and source code management (SCM) system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was initially designed and developed by Linus Torvalds for Linux kernel development in 2005, and has since become the most widely adopted version control system for software development. Github, a git repository web-based hosting service is in fact the largest code host on the planet with over 15.7 million repositories. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects.
As with most other distributed revision control systems, and unlike most client–server systems, every Git working directory is a full-fledged repository with complete history and full version-tracking capabilities, independent of network access or a central server. Like the Linux kernel, Git is free software distributed under the terms of the GNU General Public License version 2.


=== Why Distributed Version Control System? ===
=== Centralized Version Control System ===


[https://en.wikipedia.org/wiki/Distributed_revision_control Distributed Revision Control Systems (DRCS)] take a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona-fide repository. Distributed revision control conducts synchronization by exchanging patches(change-sets) from peer to peer. This results in some important differences from a centralized system:
Centralized version control systems are based on the idea that there is a single “central” copy of your project somewhere (probably on a server), and programmers will “commit” their changes to this central copy.
 
“Committing” a change simply means recording the change in the central system. Other programmers can then see this change. They can also pull down the change, and the version control tool will automatically update the contents of any files that were changed. <br/>Most modern version control systems deal with “changesets”, which are simply groups of changes (possibly to many files) that should be treated as a cohesive whole. <br/>For example, a change to a C header file and the corresponding .c file should always be kept together. <br/>Programmers no longer have to keep many copies of files on their hard drives manually, because the version control tool can talk to the central copy and retrieve any version they need on the fly. <br/>Some of the most popular centralized version control systems are [http://www.nongnu.org/cvs/ CVS], [https://subversion.apache.org/ Subversion (or SVN)] and [http://www.perforce.com/ Perforce].
 
=== Distributed Version Control System ===
 
A [https://en.wikipedia.org/wiki/Distributed_revision_control Distributed Revision Control System (DRCS)] takes a [https://en.wikipedia.org/wiki/Peer-to-peer peer-to-peer] approach, as opposed to the [https://en.wikipedia.org/wiki/Client%E2%80%93server_model client-server] approach of centralized systems. Rather than a single, central repository with which clients synchronize, each peer's working copy of the codebase is a bona fide repository. Distributed revision control synchronizes by exchanging patches (changesets) from peer to peer. This results in some important differences from a centralized system:


*No canonical, reference copy of the codebase exists by default; only working copies.
*No canonical, reference copy of the codebase exists by default; only working copies.
Line 21: Line 64:
*Committing new changesets can be done locally without anyone else seeing them. Once you have a group of changesets ready, you can push all of them at once.
*Committing new changesets can be done locally without anyone else seeing them. Once you have a group of changesets ready, you can push all of them at once.
*Everything but pushing and pulling can be done without an internet connection. So you can work on a plane, and you won’t be forced to commit several bugfixes as one big changeset.
*Everything but pushing and pulling can be done without an internet connection. So you can work on a plane, and you won’t be forced to commit several bugfixes as one big changeset.
*Since each programmer has a full copy of the project repository, they can share changes with one or two other people at a time if they want to get some feedback before showing the changes to everyone.
*Since each programmer has a full copy of the project repository, they can share changes with one or two other people at a time if they want to get some feedback before showing the changes to everyone.<ref>[http://en.wikipedia.org/wiki/Comparison_of_revision_control_software Comparison of Revision Control Software]</ref>


=== Why Git? ===
== Why use Git? <ref>[http://thkoch2001.github.io/whygitisbetter/ Why Git is better than X]</ref> ==
;Branching Model: Git's most compelling feature that really makes it stand apart from nearly every other SCM out there is its branching model.Git will allow you to have multiple local branches that can be entirely independent of each other and the creation, merging and deletion of those lines of development take seconds.<br/>This means that you can do things like:  
=== Branching Model ===
Git's most compelling feature, the one that really makes it stand apart from nearly every other SCM out there, is its [http://nvie.com/posts/a-successful-git-branching-model/ branching model]. Git allows you to have multiple local branches that can be entirely independent of each other, and the creation, merging and deletion of those lines of development take seconds. <br/>This means that you can do things like: 
:* Create experimental branches.
:* Create experimental branches.
:* Clear cut Branch delineation into Production, Development, Testing, Fixes, Patches, etc.
:* Clear cut Branch delineation into Production, Development, Testing, Fixes, Patches, etc.
Line 31: Line 75:
:* Independent Merging and Management of one's content changes.
:* Independent Merging and Management of one's content changes.


;Everything is Local: There is very little outside of 'fetch', 'pull' and 'push' that communicates in any way with anything other than your hard disk.This not only makes most operations much faster than you may be used to, but it also allows you to work on stuff offline. That may not sound like a big deal, but I'm always amazed at how often I actually do work offline. Being able to branch, merge, commit and browse history of your project while on the plane or train is very productive.
=== Everything is Local===
There is very little outside of 'fetch', 'pull' and 'push' that communicates in any way with anything other than your hard disk. This not only makes most operations much faster than anyone may be used to, but it also allows you to work on stuff offline. Being able to branch, merge, commit and browse history of one's project while on the plane or train is very productive.


;Git is Fast: Git is fast. Everyone—even most of the hard core users of these other systems—generally give Git this title. With Git, all operations are performed locally giving it a bit of a leg up on [https://subversion.apache.org/ SVN] and [http://www.perforce.com/ Perforce], both of which require network access for certain operations. However, even compared to the other [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs] that also perform operations locally, Git is pretty fast. <br/> Part of this is likely because it was built to work on the Linux kernel, which means that it has had to deal effectively with large repositories from day one. Additionally, [http://git-scm.com/ Git] is written in [https://en.wikipedia.org/wiki/C_(programming_language) C], reducing the overhead of runtimes associated with higher-level languages. Another reason that Git is so fast is that the primary developers have made this a design goal of the application.
===Git is Fast===
Git is fast. Nearly everyone, including most hard-core users of other systems, grants Git this title. With Git, nearly all operations are performed locally, giving it a bit of a leg up on [https://subversion.apache.org/ SVN] and [http://www.perforce.com/ Perforce], both of which require network access for certain operations. However, even compared to the other [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs] that also perform operations locally, Git is pretty fast. <br/>Part of this is likely because it was built to work on the Linux kernel, which means that it has had to deal effectively with large repositories from day one. Additionally, [http://git-scm.com/ Git] is written in [https://en.wikipedia.org/wiki/C_(programming_language) C], reducing the overhead of runtimes associated with higher-level languages. Another reason that Git is so fast is that the primary developers have made speed a design goal of the application.


;Git is Small: Git is really good at conserving disk space. Your Git directory will (in general) barely be larger than an [https://subversion.apache.org/ SVN] checkout—in some cases actually smaller (apparently a lot can go in those .svn dirs). <br /> The following numbers were taken from clones of the [https://www.djangoproject.com/ Django] project in each of its semi-official Git mirrors at the same point in its history.
===Git is Small===
Git is really good at conserving disk space. A Git directory will (in general) barely be larger than an [https://subversion.apache.org/ SVN] checkout—in some cases actually smaller (apparently a lot can go in those .svn dirs). <br />The following numbers were taken from clones of the [https://www.djangoproject.com/ Django] project in each of its semi-official Git mirrors at the same point in its history.
:{| class="wikitable" style="text-align: center;"
:{| class="wikitable" style="text-align: center;"
|+ Disk Usage
|+ Disk-space Usage
!
!
! [http://git-scm.com/ Git ]
! [http://git-scm.com/ Git ]
Line 57: Line 104:
|}
|}


;The Staging Area: Unlike the other systems, Git has what it calls the '''"staging area"''' or '''"index"'''. This is an intermediate area that you can setup what you want your commit to look like before you commit it. <br /> The cool thing about the staging area, and what sets Git apart from all these other tools, is that you can easily stage some of your files as you finish them and then commit them without committing all the modified files in your working directory.
===The Staging Area===
Unlike the other systems, Git has what it calls the '''"staging area"''' or '''"index"'''. This is an intermediate area where you can set up what you want your commit to look like before you commit it. <br />The cool thing about the staging area, and what sets Git apart from all these other tools, is that you can easily stage some of your files as you finish them and then commit them without committing all the modified files in your working directory.
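As a minimal sketch of the staging behaviour described above, the following shell session modifies two files but stages and commits only one of them. The temporary repository, the file names and the identity settings are made up for illustration.

```shell
set -e
repo=$(mktemp -d)              # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # illustrative identity
git config user.email "user@example.com"

# Record an initial version of two files.
echo "one" > finished.txt
echo "two" > unfinished.txt
git add finished.txt unfinished.txt
git commit -q -m "Initial commit"

# Modify both files, but stage only the one that is ready.
echo "done" >> finished.txt
echo "work in progress" >> unfinished.txt
git add finished.txt
git commit -q -m "Finish finished.txt"

# Files recorded in the last commit, and files still modified but uncommitted.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git diff --name-only)
echo "committed: $committed"
echo "pending:   $pending"
```

Only the staged file ends up in the commit; the other modification stays in the working directory until it is staged in turn.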


;Distributed: One of the coolest features of any of the [https://en.wikipedia.org/wiki/Distributed_revision_control Distributed SCMs], Git included, is that it's distributed. This means that instead of doing a "checkout" of the current tip of the source code, you do a "clone" of the entire repository. <br/> This means that even if you're using a centralized workflow, every user has what is essentially a full backup of the main server, each of which could be pushed up to replace the main server in the event of a crash or corruption. There is basically no single point of failure with Git unless there is only a single point. <br />This does not slow things down much, either. On average, an [https://subversion.apache.org/ SVN] checkout is only marginally faster than any of the [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs]. Of the [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs] I tested, Git was the fastest.
===Distributed===
One of the coolest features of any of the [https://en.wikipedia.org/wiki/Distributed_revision_control Distributed SCMs], Git included, is that it's distributed. This means that instead of doing a "checkout" of the current tip of the source code, you do a "clone" of the entire repository. <br/> This means that even if you're using a centralized workflow, every user has what is essentially a full backup of the main server, each of which could be pushed up to replace the main server in the event of a crash or corruption. There is basically no single point of failure with Git unless there is only a single point. <br />This does not slow things down much, either. On average, an [https://subversion.apache.org/ SVN] checkout is only marginally faster than any of the [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs]. Of the [https://en.wikipedia.org/wiki/Distributed_revision_control DSCMs] that were tested, Git was the fastest.


:{| class="wikitable" style="text-align: center; width: 200px;"
:{| class="wikitable" style="text-align: center; width: 200px;"
Line 77: Line 126:
|}
|}


;Any Workflow: One of the amazing things about Git is that because of its distributed nature and super branching system, you can easily implement pretty much any workflow you can think of relatively easily.
===Any Workflow===
One of the amazing things about Git is that, because of its distributed nature and powerful branching system, you can implement pretty much any workflow you can think of relatively easily.


:*;Subversion-Style Workflow: A very common Git workflow, especially from people transitioning from a centralized system, is a centralized workflow. Git will not allow you to push if someone has pushed since the last time you fetched, so a centralized model where all developers push to the same server works just fine.
:*;Subversion-Style Workflow: A very common Git workflow, especially from people transitioning from a centralized system, is a centralized workflow. Git will not allow you to push if someone has pushed since the last time you fetched, so a centralized model where all developers push to the same server works just fine.
Line 83: Line 133:
:*;Integration Manager Workflow: Another common Git workflow is where there is an integration manager—a single person who commits to the 'blessed' repository, and then a number of developers who clone from that repository, push to their own independent repositories and ask the integrator to pull in their changes. This is the type of development model you often see with open source or [https://github.com/ GitHub] repositories.
:*;Integration Manager Workflow: Another common Git workflow is where there is an integration manager—a single person who commits to the 'blessed' repository, and then a number of developers who clone from that repository, push to their own independent repositories and ask the integrator to pull in their changes. This is the type of development model you often see with open source or [https://github.com/ GitHub] repositories.


:*;Dictator and Lieutenants Workflow: For more massive projects, you can setup your developers similar to the way the Linux kernel is run, where people are in charge of a specific subsystem of the project ('lieutenants') and merge in all changes that have to do with that subsystem. Then another integrator (the 'dictator') can pull changes from only his/her lieutenants and push those to the 'blessed' repository that everyone then clones from again.
:*;Dictator and Lieutenants Workflow: For more massive projects, you can setup your developers similar to the way the [http://www.linux.org/ Linux] kernel is run, where people are in charge of a specific subsystem of the project ('lieutenants') and merge in all changes that have to do with that subsystem. Then another integrator (the 'dictator') can pull changes from only his/her lieutenants and push those to the 'blessed' repository that everyone then clones from again.


== Git Work Flow Terminology ==
== Git Work Flow Terminology ==
<div style="float: right;">
<div style="float: right;">
<p align="right">
[[File:Git_data_flow_simplified.svg.png|frame]]
[[File:Git_data_flow_simplified.svg.png]]
</p>
</div>
</div>
'''Git Data Flow'''
Understanding the Git and a typical Distributed VCS terminology :
A project is the files/source code a developer is currently writing. A git-project is a project that is using git to track/manage changes to the project.
'''Remote repository'''
A centralize Git server to host all your git-projects. A central repository (repo) is not required but it is the best way to distribute (share) your git-projects to different user/developers. Most would call the remote repository (repo) the Origin and keeping it up-to-date is highly recommended. In our case, GitHub would be our remote repo. They all work together and it all one thing.


'''Local repository'''
;Remote repository: A centralized Git server that hosts all your git-projects. A central repository (repo) is not required, but it is the best way to distribute (share) your git-projects among different users and developers. Most would call the remote repository the origin, and keeping it up to date is highly recommended. In our case, GitHub would be our remote repo. They all work together and it is all one thing.
A local repo is a git-project on your computer. If you were to have a Remote repository (GitHub), then the local repo would be a clone (exact copy) of the remote git-project. When you make changes to your local repo like adding code, you will need to update (push) the git-project to remote repo.


'''Index (cache) - Staging'''
;Local repository: A local repo is a git-project on your computer. If you were to have a Remote repository (GitHub), then the local repo would be a clone (exact copy) of the remote git-project. When you make changes to your local repo like adding code, you will need to update (push) the git-project to remote repo.
When you make changes to your files, you can not commit (snapshot in time) until you "stage" the files. Once you stage the file, you can then commit the changes.The purpose of this Index (staging area) is that it allows you to control what you want to commit. Typically,any code changes that are not complete can be left out of the commit.


'''Working directory'''
;Index (cache) - Staging: When you make changes to your files, you cannot commit them (take a snapshot in time) until you "stage" the files. Once you stage a file, you can then commit the changes. The purpose of this index (staging area) is to let you control what you want to commit. Typically, any code changes that are not complete can be left out of the commit.
A working directory is the current git-project, which is your source code with git files. A working directory is a local repo which you can make changes (commit, branches) to it.


 
;Working directory: A working directory is the current git-project, which is your source code together with Git's files. It is backed by a local repo, in which you can make changes (commits, branches).<ref>[https://www.atlassian.com/git/articles/simple-git-workflow-is-simple/ Git workflow is simple]</ref>
 
<div>
<h2>Commands</h2>
<p>For a list of command line commands, please go to [Git-Reference]</p>
<table>
<tr>
<th width="120px">Term</th>
<th>Explanation</th>
</tr>
<tr>
<td>branches</td>
<td>A branch lets you create a separate working copy of your code and develop it in a
new way without affecting the original code.  It is like creating a new project (branch) based on the original, which
is usually called the Master.  The branch is completely independent of the Master, and you can develop both the Master and
the branch at the same time.  You can also have multiple branches and <code>merge</code> any of them together.
<br /><br />Branching, cloning and forking are sometimes confused.
</td>
</tr>
<tr>
<td>clone</td>
<td>
A clone is a copy of a repository.  You clone when you copy a remote repository to your local machine.
<br /><br />Branching, cloning and forking are sometimes confused.
</td>
</tr>
<tr>
<td>collaborator</td>
<td>Collaborators are other GitHub users that you’ve given write access to one or more of your repositories.
Collaborators may fork any private repository you’ve added them to without their own paid plan. Their forks do not count against your private repository quota.
<br />
<em>This was taken from github.com</em>
</td>
</tr>
<tr>
<td>commit</td>
<td>A commit is like a bookmark/milestone that is recorded in the repository.  It creates a revision that you can view and
revert back to at any time.  When using Git, you are encouraged to make many small commits, since all of these commits are made
on your computer. When you update (push) to the remote repository to back up or share with other developers, all the commits will
be viewable.
<br /><br />
In projects with multiple developers, making many small commits and updating the remote repository many times a day allows your coworkers
to merge your changes into their code and resolve discrepancies early, which is much easier than waiting
and doing a massive merge.
<br /><br />
A commit always records an author and a committer.  The author is the person who originally wrote the change,
and the committer is the person who applied it to the repository; often they are the same person.</td>
</tr>
<tr>
<td>fetch</td>
<td>
Fetch is a Git command that you use to synchronize your local repository with the remote repository.  It is important to know that
fetch does not merge any new changes into your source code (local repository).  New changes from the fetch are kept in &quot;remote branches&quot;;
you can combine them with your code by using the <code>merge</code> command.
 
<br /><br />When will this happen?
<ul>
<li>When a collaborator/coworker asks you to test the changes with your code</li>
<li>Before you do a push (upload to remote repository), you would want to do a <code>fetch</code>, which will synchronize your
local repository to the remote repository.  So you are making sure your local repository is up-to-date with the remote repository
and if new code is downloaded, then you can check out what is new.  You will also need to do a <code>merge</code>.
</li>
</ul>
Note: fetch is not a clone. A clone is a new copy brought down from the repository, while a fetch updates your existing local repository.
</td>
</tr>
<tr>
<td>fork</td>
<td>
Fork is NOT a Git term; it is a term born from the community.  Forking is similar to branching in that you copy the origin
(Master) to your account.  The difference is that you are copying the source code from the author (owner) to your
account, whereas branching is copying within the same account.
<br /><br />
So when does this happen? Forking usually happens for 2 reasons:
<ol>
<li>You fork it to take the project in a new direction.  Obviously you can only do this with open source projects</li>
<li>You fork the project to add to it and request a merge back to the origin/master.  You, as an independent
developer, want to contribute to the code.</li>
</ol>
Branching, cloning and forking are sometimes confused.
</td>
</tr>
<tr>
<td>merge</td>
<td>
A merge combines the changes from one branch into another, most commonly a branch into your Master. When you do a <code>pull</code>, a fetch and a merge are performed to update your local repository.
When merging two code sets together, if a conflict occurs, it is up to you to resolve it.
</td>
</tr>
<tr>
<td>pull</td>
<td>
A pull is basically two commands executed together.  Pull will do a <code>fetch</code> and a <code>merge</code> on your local repository.
When using the GitHub Windows Client, you will usually do a pull.
<br /><br />When will this happen?
<ul>
<li>When a collaborator/coworker asks you to test the change with your code</li>
<li>Before you do a push (upload to remote repository), you would want to do a <code>pull</code>, which will synchronize your
local repository to the remote repository.  So you are making sure your local repository is up-to-date with the remote repository and you can
test your code with any new code committed to the remote repository.
</li>
</ul>
</td>
</tr>
<tr>
<td>push</td>
<td>
Push is the command that uploads your local repository to the remote repository.  Any changes that you want to push to the remote repository
must first be recorded with a <code>commit</code> in your local repository.
</td>
</tr>
<tr>
<td>repository</td>
<td>A repository contains all the history, such as commits, branches and tags, along with your source code.  A remote repository is a central
store used for sharing projects.  A local repository is what the user/developer works with (makes changes to).</td>
</tr>
<tr>
<td>revision</td>
<td>
A revision represents a version of the source code. Git identifies revisions with SHA1 ids. SHA1 ids are 160 bits long and are represented
in hexadecimal. The latest version can be addressed via "HEAD", the version before that via "HEAD~1" and so on.
A revision is created when you do a commit.  So a commit is a command and a revision is like the bookmark, so when you hear someone
say &quot;revert back to the last commit&quot;, technically they should say &quot;revert back to the last revision&quot;.
</td>
</tr>
<tr>
<td>tags</td>
<td>A tag is like a commit in that it allows you to permanently mark a point in your project.  A common use of tags is to mark the
application version number.  This version number is the same one displayed in the help/about.</td>
</tr>
<tr>
<td>URL</td>
<td>Used to tell Git where the remote repository is. 
<br />There is more than one type of URL
<ul>
<li>HTTP: Should be a https (secure) https://github.com/&lt;user&gt;/&lt;git_project&gt;</li>
<li>GIT: Uses git communication git://github.com/&lt;user&gt;/&lt;git_project&gt;</li>
<li>SSH: Uses SSH git@github.com:&lt;user&gt;/&lt;git_project&gt;</li>
</ul>
</td>
</tr>
</table>
</div>
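To make the relationship between <code>fetch</code>, <code>merge</code> and <code>pull</code> concrete, here is a minimal sketch that uses a local bare repository in place of a hosted remote. The directory names (remote.git, alice, bob), the identity settings and the file name are assumptions made for illustration only.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the shared remote (hypothetical local path).
git init -q --bare remote.git

# Alice clones the empty remote, makes the first commit and pushes it.
git clone -q remote.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)   # default branch name on this machine
git push -q origin "$branch"
first=$(git rev-parse HEAD)
cd ..

# Bob clones the remote and receives the first commit.
git clone -q remote.git bob

# Alice publishes a second commit.
cd alice
echo "second" >> notes.txt
git commit -q -am "Second commit"
git push -q origin "$branch"
second=$(git rev-parse HEAD)
cd ../bob

# fetch only updates the remote-tracking branch; Bob's own branch does not move.
git fetch -q origin
before_merge=$(git rev-parse HEAD)
tracking=$(git rev-parse "origin/$branch")

# merge brings the fetched changes into Bob's branch.
git merge -q "origin/$branch"
after_merge=$(git rev-parse HEAD)
echo "before merge: $before_merge"
echo "after merge:  $after_merge"
```

Running <code>git pull</code> in Bob's clone instead of the separate fetch and merge steps would be expected to leave his branch in the same final state.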


== Characteristics of Git ==
== Characteristics of Git ==
Line 274: Line 158:
;Cryptographic authentication of history: The Git history is stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. Also, tags can be cryptographically signed.
;Cryptographic authentication of history: The Git history is stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. Also, tags can be cryptographically signed.


;Toolkit-based design: Following the [http://www.unix.org/ Unix] tradition, Git is a collection of many small tools written in C, and a number of scripts that provide convenient wrappers. Git provides tools for both convenient human usage and easy scripting to perform new clever operations.
;Toolkit-based design: Following the [http://www.unix.org/ Unix] tradition, Git is a collection of many small tools written in C, and a number of scripts that provide convenient wrappers. Git provides tools for both convenient human usage and easy scripting to perform new clever operations.<ref>[https://en.wikipedia.org/wiki/Git_(software)#Characteristics Git Characteristics]</ref>
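The history-authentication property listed above can be observed directly: each commit object records the ID of its parent, so a commit's own ID depends on everything before it. The sketch below is illustrative only; the temporary repository, file name and identity settings are assumptions, while <code>git rev-parse</code> and <code>git cat-file</code> are standard Git commands.

```shell
set -e
repo=$(mktemp -d)              # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # illustrative identity
git config user.email "user@example.com"

echo "a" > file.txt
git add file.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)

echo "b" >> file.txt
git commit -q -am "Second"
second=$(git rev-parse HEAD)

# HEAD~1 addresses the revision before the latest one.
previous=$(git rev-parse HEAD~1)

# The raw commit object names its parent, which ties its ID to the history.
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "second commit:   $second"
echo "recorded parent: $recorded_parent"
```

Because the parent ID is part of the data that gets hashed, rewriting an older commit would give it a new ID and, in turn, change the ID of every commit that follows it.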


== How do I use Git? ==
== How do I use Git? ==
Line 305: Line 189:


This will add all the files to staging area. To stage is simply to prepare it finely for a commit. Git, with its index allows you to commit only certain parts of the changes you've done since the last commit.
This will add all the files to staging area. To stage is simply to prepare it finely for a commit. Git, with its index allows you to commit only certain parts of the changes you've done since the last commit.
[[File:GitThreeStates.png|frame|center]]
[[File:Git-3.png|center]]<ref>[http://git-scm.com/book/en/Git-Basics-Recording-Changes-to-the-Repository Git Basics Chapter 1]</ref><ref>[http://git-scm.com/book/en/Getting-Started-Git-Basics Git Basics Chapter 2]</ref>


After a user has staged the required files, they need to commit the files by giving a commit description.
After a user has staged the required files, they need to commit the files by giving a commit description.
Line 330: Line 214:
git push origin master
git push origin master
</pre>
</pre>
<ref>[http://code.tutsplus.com/tutorials/easy-version-control-with-git--net-7449 Easy Version Control System with Git]</ref>
== Real Power-Git Branches==
Branches allow you to diverge from the main line of development. This lets developers work on bug fixes and new features without interfering with the main (live) branch (master). Later you can merge branches with each other or with the master branch. The basics are quite simple:
Open up a new branch:
<pre>git checkout -b <branchname> </pre>


To list all the branches, use the following command.
<pre> git branch </pre>
To merge a branch into the main branch (commonly referred to as master), use the command below.
<pre>
git merge <branch name>
</pre>
If the branch is no longer needed, delete it using the following command.
<pre>
git branch -d <branchname>
</pre>


This is the simplest example of branching. More advanced options exist, such as three-way merging (which combines two branch tips using their common ancestor) and rebasing.
<ref>[http://git-scm.com/book/ch3-6.html Git Rebasing]</ref>


'''Merging & Branching in a Nutshell'''


In a nutshell, you can create a branch with git branch (branchname), switch into that context with git checkout (branchname), record commit snapshots while in that context, and then switch back and forth easily. When you switch branches, Git replaces the contents of your working directory with the snapshot of the latest commit on that branch, so you don't need multiple directories for multiple branches. You merge branches together with git merge. You can merge from the same branch multiple times over time, or alternatively you can delete a branch immediately after merging it.


== Comparison of Various Version Control Software ==


{| class="wikitable sortable" style="text-align: center; width: auto;"
|-
! Software
! Programming language
! Storage method
! Scope of change
! Revision IDs
! Network protocols
! Source code size
|-
! [http://bazaar.canonical.com/en/ GNU Bazaar]
| Python, Pyrex, C
| Snapshot
| Tree
| Pseudorandom
| custom
| 4.1 MB
|-
! [http://www.bitkeeper.com/ BitKeeper]
| C
| Changeset
| Tree
| Changeset keys, numbers
| custom, HTTP, rsh, SSH, email
| Unknown
|-
! [http://www-03.ibm.com/software/products/en/clearcase ClearCase]
| C, Java, Perl
| Changeset
| File and Tree
| Numbers
| custom (CCFS), custom (MVFS filesystem driver), HTTP
| Unknown
|-
! [http://www.relisoft.com/co_op/ Code Co-op]
| C++
| Changeset
| Unknown
| User ID-Ordinal
| email (MAPI, SMTP/POP3, Gmail), local area network
| Unknown
|-
! [http://www.nongnu.org/cvs/ CVS]
| C
| Changeset
| File
| Numbers
| pserver, SSH
| 3.3 MB
|-
! [http://git-scm.com/ Git]
| C, shell scripts, Perl
| Snapshot
| Tree
| SHA-1 hashes
| custom (''git''), custom over SSH, HTTP, rsync, email, bundles
| 10.2 MB
|-
! [https://www.gnu.org/software/gnu-arch/ GNU arch]
| C, shell scripts
| Changeset
| Tree
| Numbers
| HTTP, WebDAV
| Unknown
|-
! [http://dev.libresource.org/ LibreSource Synchronizer]
| Java
| Changeset
| Unknown
| Timestamps
| HTTP, file system
| Unknown
|-
! [http://mercurial.selenic.com/ Mercurial]
| Python, C
| Changeset
| Tree
| Numbers
| custom over SSH, HTTP
| 1.2 MB
|-
! [http://www.perforce.com Perforce]
| C++, C
| Changeset
| Tree
| Numbers
| custom
| Unknown
|-
! [https://subversion.apache.org/ Subversion]
| C
| Changeset and Snapshot
| Tree
| Numbers
| custom over SSH, HTTP
| 5.2 MB
|-
! [http://msdn.microsoft.com/en-us/vstudio/ff637362.aspx Team Foundation Server]
| C++ and C#
| Changeset
| File and Tree
| Numbers
| SOAP over HTTP or HTTPS
| Unknown
|-
|}<ref>[https://en.wikipedia.org/wiki/Comparison_of_revision_control_software Comparison of revision control software]</ref>


== Developments and Feature Enhancements ==
== References ==
<references/>

Latest revision as of 02:31, 26 September 2014

Author(s): Ravish Roshan, Vikas Piddempally
Initial release: 7 April 2005
Written in: C, Bourne Shell, Tcl, Perl
Operating Systems: Linux, POSIX, Windows, OS X
Type: Version Control
License: GNU General Public License v2
Website: git-scm.com

Background

Git (/ɡɪt/) is a distributed revision control and source code management (SCM) system, originally designed and developed by Linus Torvalds in 2005 for Linux kernel development, with an emphasis on managing the Linux source code without requiring a central server. It has since become the most widely adopted version control system for software development. Git emphasizes speed, data integrity, and support for distributed, non-linear workflows.

In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were:

  • Speed
  • Simple design
  • Strong support for non-linear development (thousands of parallel branches)
  • Fully distributed
  • Able to handle large projects like the Linux kernel efficiently (speed and data size)

Since its birth in 2005, Git has evolved and matured to be easy to use while retaining these initial qualities. It is very fast, it is efficient with large projects, and it has a powerful branching system for non-linear development. Further, GitHub, a web-based hosting service for Git repositories, is at the time of writing the largest code host on the planet, with over 15.7 million repositories. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects.<ref>Getting started a short history of Git</ref>

Like the Linux kernel, Git is free software distributed under the terms of the GNU General Public License version 2.

Centralized Vs Distributed Version Control System

Centralized Version Control System

Centralized version control systems are based on the idea that there is a single “central” copy of your project somewhere (probably on a server), and programmers will “commit” their changes to this central copy.

“Committing” a change simply means recording the change in the central system. Other programmers can then see this change. They can also pull down the change, and the version control tool will automatically update the contents of any files that were changed.
Most modern version control systems deal with “changesets”, which are simply groups of changes (possibly to many files) that should be treated as a cohesive whole.
For example: a change to a C header file and the corresponding .c file should always be kept together.
Programmers no longer have to keep many copies of files on their hard drives manually, because the version control tool can talk to the central copy and retrieve any version they need on the fly.
Some of the most popular centralized version control systems are CVS, Subversion (or SVN) and Perforce.

Distributed Version Control System

A Distributed Revision Control System (DRCS) takes a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona fide repository. Distributed revision control synchronizes by exchanging patches (changesets) from peer to peer. This results in some important differences from a centralized system:

  • No canonical, reference copy of the codebase exists by default; only working copies.
  • Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server; communication is only necessary when pushing or pulling changes to or from other peers.
  • Each working copy effectively functions as a remote backup of the codebase and of its change-history, providing inherent protection against data loss.

The act of cloning an entire repository gives distributed version control tools several advantages over centralized systems:

  • Performing actions other than pushing and pulling changesets is extremely fast because the tool only needs to access the hard drive, not a remote server.
  • Committing new changesets can be done locally without anyone else seeing them. Once you have a group of changesets ready, you can push all of them at once.
  • Everything but pushing and pulling can be done without an internet connection. So you can work on a plane, and you won’t be forced to commit several bugfixes as one big changeset.
  • Since each programmer has a full copy of the project repository, they can share changes with one or two other people at a time if they want to get some feedback before showing the changes to everyone.<ref>Comparison of Revision Control Software</ref>

Why use Git? <ref>Why Git is better than X</ref>

Branching Model

The feature that most sets Git apart from nearly every other SCM is its branching model. Git allows you to have multiple local branches that are entirely independent of each other, and creating, merging, and deleting those lines of development takes seconds.
This means that you can do things like:

  • Create experimental branches.
  • Clear cut Branch delineation into Production, Development, Testing, Fixes, Patches, etc.
  • Seamless Navigation across feature branches and Code Versions.
  • Create and work on your own patches, features and tests without affecting the Production/Master branch.
  • Independent Merging and Management of one's content changes.

Everything is Local

There is very little outside of 'fetch', 'pull' and 'push' that communicates in any way with anything other than your hard disk. This not only makes most operations much faster than anyone may be used to, but it also allows you to work on stuff offline. Being able to branch, merge, commit and browse history of one's project while on the plane or train is very productive.

Git is Fast

Git is fast. Nearly everyone, including most hard-core users of other systems, gives Git this title. With Git, all operations are performed locally, which gives it a leg up on SVN and Perforce, both of which require network access for certain operations. However, even compared to the other DSCMs that also perform operations locally, Git is pretty fast.
Part of this is likely because it was built to work on the Linux kernel, which means that it has had to deal effectively with large repositories from day one. Additionally, Git is written in C, reducing the overhead of runtimes associated with higher-level languages. Another reason that Git is so fast is that the primary developers have made this a design goal of the application.

Git is Small

Git is really good at conserving disk space. A Git directory will (in general) barely be larger than an SVN checkout—in some cases actually smaller (apparently a lot can go in those .svn dirs).
The following numbers were taken from clones of the Django project in each of its semi-official Git mirrors at the same point in its history.

{| class="wikitable"
|+ Disk-space usage
|-
!  !! Git !! Mercurial !! Bazaar !! SVN
|-
! Repo alone
| 24M || 34M || 45M || -
|-
! Entire directory
| 43M || 53M || 64M || 61M
|}

The Staging Area

Unlike the other systems, Git has what it calls the "staging area" or "index". This is an intermediate area where you can set up what you want your commit to look like before you commit it.
The cool thing about the staging area, and what sets Git apart from all these other tools, is that you can easily stage some of your files as you finish them and then commit them without committing all the modified files in your working directory.

Distributed

One of the coolest features of any of the Distributed SCMs, Git included, is that it's distributed. This means that instead of doing a "checkout" of the current tip of the source code, you do a "clone" of the entire repository.
This means that even if you're using a centralized workflow, every user has what is essentially a full backup of the main server, each of which could be pushed up to replace the main server in the event of a crash or corruption. There is basically no single point of failure with Git unless there is only a single point.
This does not slow things down much, either. On average, an SVN checkout is only marginally faster than any of the DSCMs. Of the DSCMs that were tested, Git was the fastest.

{| class="wikitable"
|+ Cloning benchmarks
|-
! System !! Clone time
|-
| Git || 1m 59s
|-
| Mercurial || 2m 24s
|-
| Bazaar || 5m 11s
|-
| SVN || 1m 4s
|}

Any Workflow

One of the amazing things about Git is that, because of its distributed nature and strong branching system, you can implement pretty much any workflow you can think of relatively easily.

  • Subversion-Style Workflow
    A very common Git workflow, especially from people transitioning from a centralized system, is a centralized workflow. Git will not allow you to push if someone has pushed since the last time you fetched, so a centralized model where all developers push to the same server works just fine.
  • Integration Manager Workflow
    Another common Git workflow is where there is an integration manager—a single person who commits to the 'blessed' repository, and then a number of developers who clone from that repository, push to their own independent repositories and ask the integrator to pull in their changes. This is the type of development model you often see with open source or GitHub repositories.
  • Dictator and Lieutenants Workflow
    For more massive projects, you can set up your developers similar to the way the Linux kernel is run, where people are in charge of a specific subsystem of the project ('lieutenants') and merge in all changes that have to do with that subsystem. Then another integrator (the 'dictator') can pull changes from only his/her lieutenants and push those to the 'blessed' repository that everyone then clones from again.

Git Work Flow Terminology

Remote repository
A centralized Git server that hosts all your Git projects. A central repository (repo) is not required, but it is the best way to share your Git projects with other users and developers. Most people call the remote repository the origin, and keeping it up to date is highly recommended. In our case, GitHub is the remote repo. All of the parts described here work together as a single system.
Local repository
A local repo is a Git project on your computer. If you have a remote repository (GitHub), then the local repo is a clone (exact copy) of the remote Git project. When you make changes to your local repo, such as adding code, you need to update (push) the Git project to the remote repo.
Index (cache) - Staging
When you make changes to your files, you cannot commit them (take a snapshot in time) until you "stage" the files. Once you stage a file, you can then commit the changes. The purpose of the index (staging area) is to let you control what you want to commit. Typically, any code changes that are not complete can be left out of the commit.
Working directory
A working directory is the current Git project, that is, your source code together with the Git files. It sits on top of a local repo, in which you can make changes such as commits and branches.<ref>Git workflow is simple</ref>

Characteristics of Git

Strong support for non-linear development
Git supports rapid and convenient branching and merging, and includes powerful tools for visualizing and navigating a non-linear development history.
Distributed development
Like most other modern version control systems, Git gives each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch. Repositories can be easily accessed via the efficient Git protocol (optionally wrapped in ssh for authentication and security) or simply using HTTP - you can publish your repository anywhere without any special webserver configuration required.
Efficient handling of large projects
Git is very fast and scales well even when working with large projects and long histories. It is commonly an order of magnitude faster than most other version control systems, and several orders of magnitude faster on some operations. It also uses an extremely efficient packed format for long-term revision storage that currently tops any other open source version control system.
Cryptographic authentication of history
The Git history is stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. Also, tags can be cryptographically signed.
Toolkit-based design
Following the Unix tradition, Git is a collection of many small tools written in C, and a number of scripts that provide convenient wrappers. Git provides tools for both convenient human usage and easy scripting to perform new clever operations.<ref>Git Characteristics</ref>

How do I use Git?

Every commit that a user makes records his/her name and email address to identify the ‘owner’ of the commit, so the user should supply these values through Git's configuration.

git config --global user.name "Your Name"
git config --global user.email "your@email.com"

To get started with Git, a user needs to run the git init command in an empty directory; this initializes a Git repository in that folder, adding a .git folder within it. A repository is kind of like a code history book. It will hold all the past versions of your code, as well as the current one.

git init

A project managed by Git can be considered as a tree; a user can create different features on different branches, and everything will stay separate and safe. After creating new files in the repository, the user commits them to mark the completion of a feature and to get ready to push the changes to a remote repository. Before committing, however, the user needs to add the files to the staging area.

git add .

A user can also add specific files.

git add *.js
git add index.php

This adds the selected files to the staging area. Staging a file simply prepares it for a commit. With its index, Git lets you commit only some of the changes you have made since the last commit.

<ref>Git Basics Chapter 1</ref><ref>Git Basics Chapter 2</ref>
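
As a rough illustration of committing only part of your changes, the sketch below stages one of two modified files. The repository location, file names, and identity values are placeholders chosen for this example, not part of the original tutorial.

```shell
# Create a throwaway repository in a temporary directory (location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Commit two files so that both are tracked.
echo "draft" > finished.txt
echo "draft" > unfinished.txt
git add finished.txt unfinished.txt
git commit -q -m "Initial commit"

# Modify both files, but stage and commit only the one that is ready.
echo "final text" > finished.txt
echo "work in progress" > unfinished.txt
git add finished.txt
git commit -q -m "Finish finished.txt"

# Files changed by the last commit, and files still modified but uncommitted.
committed=$(git diff --name-only HEAD~1 HEAD)
pending=$(git diff --name-only)
echo "committed: $committed"
echo "pending:   $pending"
```

The second commit should contain only finished.txt, while unfinished.txt remains modified in the working directory until it is staged.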

After a user has staged the required files, they need to commit the files by giving a commit description.

git commit -m "Commit description"

Each commit should have an accompanying message, so everyone in the project knows why that code was committed. A user can also skip the separate staging command by asking Git to stage and commit in one step with a message. Note that the -a option stages only files that Git already tracks; new files must still be added with git add.

git commit -am "Commit description"
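
The following sketch shows one consequence of this shortcut: git commit -a stages only files that Git already tracks, so a brand-new file is left out. The file names and identity values are made up for the example.

```shell
# Throwaway repository with one tracked file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "first draft" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# Change the tracked file and create a new, untracked file.
echo "second draft, now longer" > tracked.txt
echo "brand new" > new.txt

# -a stages modified tracked files automatically; new.txt is not included.
git commit -q -am "Update tracked file"

changed=$(git diff --name-only HEAD~1 HEAD)
untracked=$(git ls-files --others --exclude-standard)
echo "in commit: $changed"
echo "untracked: $untracked"
```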

Git computes a 40-character SHA-1 hash for each commit from its contents: the snapshot of the files plus metadata such as the author, timestamp, message, and parent commit. Because the hash is derived from the content, two commits have the same hash only if all of these are identical. A user can check the status of the Git repository with the following command.

git status
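
To make the content-based hashing described above more concrete, here is a small sketch using git hash-object, which prints the ID Git would assign to a file's contents. The file names are arbitrary, and the example hashes file contents (blobs) rather than whole commits.

```shell
# Work in a throwaway repository.
dir=$(mktemp -d)
cd "$dir"
git init -q

# Two files with identical contents and one with different contents.
printf 'hello\n'  > a.txt
printf 'hello\n'  > b.txt
printf 'hello!\n' > c.txt

# The ID depends only on the contents, not on the file name.
id_a=$(git hash-object a.txt)
id_b=$(git hash-object b.txt)
id_c=$(git hash-object c.txt)

echo "a.txt: $id_a"
echo "b.txt: $id_b"
echo "c.txt: $id_c"
```

a.txt and b.txt should print the same ID, while c.txt prints a different one.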

To push the changes to a remote Git repository, a user can run git push as follows.

git push origin master

<ref>Easy Version Control System with Git</ref>
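
For readers without a hosted remote at hand, the push step can be tried locally. In this sketch a bare repository on disk stands in for a server such as GitHub. All paths and identity values are assumptions made for the example, and the branch name is read from the repository rather than assumed to be master.

```shell
work=$(mktemp -d)

# A bare repository plays the role of the remote ("origin").
git init -q --bare "$work/remote.git"

# Create a local repository and point it at the stand-in remote.
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
git remote add origin "$work/remote.git"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Push the current branch (master or main, depending on configuration).
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# After the push, the remote branch points at the same commit as the local one.
local_id=$(git rev-parse HEAD)
remote_id=$(git --git-dir="$work/remote.git" rev-parse "refs/heads/$branch")
echo "local:  $local_id"
echo "remote: $remote_id"
```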

Real Power-Git Branches

Branches allow you to diverge from the main line of development. This lets developers work on bug fixes and new features without interfering with the main (live) branch (master). Later you can merge branches with each other or with the master branch. The basics are quite simple:

Open up a new branch:

git checkout -b <branchname> 

To list all the branches, a user can use the following command.

 git branch 

To merge a branch into the main branch (commonly referred to as master), switch to the main branch and run the command below.

 git merge <branchname>

If the branch is not needed anymore, delete it using the following command.

 git branch -d <branchname>

This is the simplest example of branching. More advanced options exist, such as three-way merging (which combines two branch tips using their common ancestor) and rebasing. <ref>Git Rebasing</ref>

Merging & Branching in a Nutshell

In a nutshell, you can create a branch with git branch (branchname), switch into that context with git checkout (branchname), record commit snapshots while in that context, and then switch back and forth easily. When you switch branches, Git replaces the contents of your working directory with the snapshot of the latest commit on that branch, so you don't need multiple directories for multiple branches. You merge branches together with git merge. You can merge from the same branch multiple times over time, or alternatively you can delete a branch immediately after merging it.
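
The whole cycle described above can be sketched end to end in a scratch repository. The branch name feature, the file names, and the identity values are illustrative only. The initial branch name is read from the repository because it may be master or main depending on configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Remember the name of the initial branch.
main_branch=$(git symbolic-ref --short HEAD)

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Create and switch to a feature branch, then commit on it.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Switch back: the working directory matches the main branch again,
# so feature.txt is not listed here.
git checkout -q "$main_branch"
ls

# Merge the feature branch into the main branch, then delete it.
git merge -q feature
git branch -d feature
```

After the merge, feature.txt is present on the main branch, and git branch no longer lists feature.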

Comparison of Various Version Control Software

{| class="wikitable sortable" style="text-align: center; width: auto;"
|-
! Software !! Programming language !! Storage method !! Scope of change !! Revision IDs !! Network protocols !! Source code size
|-
! GNU Bazaar
| Python, Pyrex, C || Snapshot || Tree || Pseudorandom || custom || 4.1 MB
|-
! BitKeeper
| C || Changeset || Tree || Changeset keys, numbers || custom, HTTP, rsh, SSH, email || Unknown
|-
! ClearCase
| C, Java, Perl || Changeset || File and Tree || Numbers || custom (CCFS), custom (MVFS filesystem driver), HTTP || Unknown
|-
! Code Co-op
| C++ || Changeset || Unknown || User ID-Ordinal || email (MAPI, SMTP/POP3, Gmail), local area network || Unknown
|-
! CVS
| C || Changeset || File || Numbers || pserver, SSH || 3.3 MB
|-
! Git
| C, shell scripts, Perl || Snapshot || Tree || SHA-1 hashes || custom (git), custom over SSH, HTTP, rsync, email, bundles || 10.2 MB
|-
! GNU arch
| C, shell scripts || Changeset || Tree || Numbers || HTTP, WebDAV || Unknown
|-
! LibreSource Synchronizer
| Java || Changeset || Unknown || Timestamps || HTTP, file system || Unknown
|-
! Mercurial
| Python, C || Changeset || Tree || Numbers || custom over SSH, HTTP || 1.2 MB
|-
! Perforce
| C++, C || Changeset || Tree || Numbers || custom || Unknown
|-
! Subversion
| C || Changeset and Snapshot || Tree || Numbers || custom over SSH, HTTP || 5.2 MB
|-
! Team Foundation Server
| C++ and C# || Changeset || File and Tree || Numbers || SOAP over HTTP or HTTPS || Unknown
|}

<ref>Comparison of revision control software</ref>

References

<references/>