CSC/ECE 517 Fall 2013/ch1 1w6 zs: Difference between revisions

From Expertiza_Wiki
 
(75 intermediate revisions by 2 users not shown)


Generally, there are three kinds of [http://en.wikipedia.org/wiki/Revision_control version control systems]: local, centralized, and distributed. Each is illustrated in the figures below.
[[File:Local VCS.png ]]
 
'''Figure 1-1. Local Version Control'''
 


[[File:Centrailized VCS.png]]
 
'''Figure 1-2. Centralized Version Control System'''
 


[[File:Distributed VCS.png]]
 
'''Figure 1-3. Distributed Version Control System'''


=== History ===
=== History ===


[[File:snapshots.png]]
 
'''Figure 1-4. Git stores data as snapshots of the project over time'''
 


This is a major difference between Git and nearly all other VCSs. Git handles every aspect of version control at minimal cost. It is more like a miniature file system with a set of advanced tools built on top of it.


[[File:Three States.png]]
'''Figure 1-5. The three states in which a Git file can reside'''


The repo is where Git stores the data for your project. When you clone a repo from another computer, this is what gets copied.
  $ git help config
These commands are nice because you can access them anywhere, even offline. If the manual pages and this book aren’t enough and you need in-person help, you can try the #git or #github channel on the Freenode IRC server (irc.freenode.net). These channels are regularly filled with hundreds of people who are all very knowledgeable about Git and are often willing to help.
== Basic Operation on Git ==
There are generally two ways to get a Git repository. One is to take an existing project or directory and import it into Git. The other is to clone an existing Git repo from another server.
=== Initializing a Repository in an Existing Directory ===
If you’re starting to track an existing project in Git, you need to go to the project’s directory and type
$ git init
This creates a new subdirectory named .git that contains all of your necessary repository files, which is a Git repository skeleton. At this point, nothing in your project is tracked yet.
If you want to start version-controlling existing files (as opposed to an empty directory), you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a commit:
$ git add *.c
$ git add README
$ git commit -m 'initial project version'
At this point, you have a Git repository with tracked files and an initial commit.
=== Cloning an Existing Repo ===
If you want a copy of an existing Git repository, for example a project you want to contribute to, the command you need is "git clone". Git receives a copy of nearly all the data that the server has. Every version of every file in the history of the project is pulled down when you run git clone.
To clone a repository, use git clone [url]. For example, to clone the repository called ruby, you can do so like this:
$ git clone git://github.com/schacon/ruby.git
That creates a directory named ruby, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you go into the new ruby directory, you’ll see the project files in there, ready to be worked on or used. If you want to clone the repository into a directory named something other than ruby, you can specify that as the next command-line option:
$ git clone git://github.com/schacon/ruby.git myruby
That command does the same thing as the previous one, but the target directory is called myruby.
Git has a number of different transfer protocols you can use. The previous example uses the git:// protocol, but you may also see http(s):// or user@server:/path.git, which uses the SSH transfer protocol. These protocols are discussed in the On the Server section.
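As a rough illustration of how the URL forms mentioned above map to transfer protocols, here is a minimal Python sketch. The function name guess_transport and the example.com URL are made up for this illustration, and the rules cover only the forms named in the text; Git's own URL parsing handles more cases.

```python
import re

def guess_transport(url: str) -> str:
    """Guess which Git transfer protocol a clone URL would use (illustrative only)."""
    if url.startswith("git://"):
        return "git"
    if url.startswith(("http://", "https://")):
        return "http(s)"
    if url.startswith("ssh://"):
        return "ssh"
    # scp-like form such as user@server:/path.git, which uses SSH
    if re.match(r"^[^/@\s]+@[^/:\s]+:", url):
        return "ssh"
    # Anything else is treated here as a path on the local file system
    return "local"

for url in ("git://github.com/schacon/ruby.git",
            "https://example.com/project.git",   # hypothetical URL
            "user@server:/path.git",
            "/opt/git/project.git"):
    print(url, "->", guess_transport(url))
```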
=== Record Changes ===
Now that we have a repo for the project, we can start making changes and commit them to the repo each time the project reaches a state worth recording.
Each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven’t edited anything.
As you edit files, Git sees them as modified, because you’ve changed them since your last commit. You stage these modified files and then commit all your staged changes, and the cycle repeats. This cycle is illustrated in the figure below.
[[File:cycle.png]]
'''Figure 3-1. File status cycle'''
==== Check status of your file ====
You can use the "git status" command to check the status of each file. Suppose you add a simple README file to your project, and the file didn't exist before. When you run "git status", you see the untracked file like this:
$ vim README
$ git status
# On branch master
# Untracked files:
#  (use "git add <file>..." to include in what will be committed)
#
#  README
nothing added to commit but untracked files present (use "git add" to track)
Your new file appears under the "Untracked files" heading. This means that Git doesn't find it in any previous snapshot; if you want Git to include it, you have to tell it so explicitly.
==== Track your file ====
To track a new file such as the README file you just added, use the "git add" command:
$ git add README
Then you can run your status command again and you will get the following:
$ git status
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#  new file:  README
#
It’s under the “Changes to be committed” heading. If you commit now, the version of the file at the time you ran git add is what will go into the snapshot. The git add command takes either a file name or a directory name. If it's a directory, the command adds all the files in that directory recursively.
==== Staging Modified Files ====
If you change a previously tracked file called ruby.rb and then run your status command again, you get something that looks like this:
$ git status
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#  new file:  README
#
# Changes not staged for commit:
#  (use "git add <file>..." to update what will be committed)
#
#  modified:  ruby.rb
#
The ruby.rb file appears under a section named “Changes not staged for commit”. This means that a tracked file has been modified in the working directory but is not yet staged. To stage it, you run the git add command. The git add command is multi-purpose: you use it both to begin tracking new files and to stage modified ones. Now we run git add to stage the ruby.rb file, and then run git status again:
$ git add ruby.rb
$ git status
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#  new file:  README
#  modified:  ruby.rb
#
Both files are staged and will go into your next commit. At this point, suppose you remember one little change that you want to make in ruby.rb before you commit it. You open it again and make that change, and you’re ready to commit. However, let’s run git status one more time:
$ vim ruby.rb
$ git status
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#  new file:  README
#  modified:  ruby.rb
#
# Changes not staged for commit:
#  (use "git add <file>..." to update what will be committed)
#
#  modified:  ruby.rb
#
Ta dah! ruby.rb is listed as both staged and unstaged. Well, Git stages a file exactly as it is when you run the git add command. If you commit now, the version of ruby.rb as it was when you last ran the git add command is how it will go into the commit, not the version of the file as it looks in your working directory when you run git commit. If you modify a file after you run git add, you have to run git add again to stage the latest version of the file:
$ git add ruby.rb
$ git status
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#  new file:  README
#  modified:  ruby.rb
#
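The behaviour described above, where git add captures a file exactly as it is at that moment, can be modelled with a small Python sketch. ToyRepo and its methods are invented for this illustration and are not how Git is implemented; they only mimic the relationship between the working directory, the staging area, and the last commit.

```python
class ToyRepo:
    """Very small model of the working directory, staging area, and last commit."""

    def __init__(self):
        self.working = {}    # file name -> current content on disk
        self.staged = {}     # file name -> content captured by add()
        self.committed = {}  # file name -> content in the last commit

    def add(self, name):
        # Staging copies the file exactly as it is right now.
        self.staged[name] = self.working[name]

    def commit(self):
        # A commit records what is staged, not what is on disk.
        self.committed.update(self.staged)
        self.staged = {}

    def status(self, name):
        states = []
        if name in self.staged:
            states.append("staged")
        # Compare the file on disk with its staged copy, or else the committed one.
        baseline = self.staged.get(name, self.committed.get(name))
        if self.working.get(name) != baseline:
            states.append("modified" if baseline is not None else "untracked")
        return states or ["unmodified"]

repo = ToyRepo()
repo.working["ruby.rb"] = "v1"
repo.add("ruby.rb")
repo.working["ruby.rb"] = "v2"    # edit after staging
print(repo.status("ruby.rb"))     # ['staged', 'modified']
repo.commit()
print(repo.committed["ruby.rb"])  # v1
```

In this model the commit contains "v1", the content at the time of add, while the later edit remains in the working directory as a modification.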
==== Ignoring Files ====
You will have a class of files that you don’t want Git to automatically add or even show you as being untracked. Those files are usually automatically generated files like log files or files produced by your build system. In such cases, you can create a file listing patterns to match them named .gitignore. Here is an example .gitignore file:
$ cat .gitignore
*.[oa]
*~
The first line tells Git to ignore any files ending in .o or .a. Those files are object and archive files that may be the product of building your code. The second line tells Git to ignore all files that end with a tilde (~), which is used by many text editors such as Emacs to mark temporary files. You may also include a log, tmp, or pid directory; automatically generated documentation; and so on. Setting up a .gitignore file before you get going is generally a good idea so you don’t accidentally commit files that you really don’t want in your Git repository.
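To get a feel for what the two patterns match, the following Python sketch uses the standard fnmatch module as a stand-in for Git's pattern matching. This is only an approximation: real .gitignore rules also cover directories, negation, and path separators, which are not modelled here, and the file names are made up for the example.

```python
from fnmatch import fnmatchcase

patterns = ["*.[oa]", "*~"]  # the two patterns from the example .gitignore

def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches any pattern (approximation)."""
    return any(fnmatchcase(filename, p) for p in patterns)

for name in ("main.o", "libfoo.a", "notes.txt~", "main.c", "README"):
    print(name, is_ignored(name))
```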
==== Commit Changes ====
When your staging area is set up the way you want it, you can commit your changes. Anything that is still unstaged won’t go into this commit. They will stay as modified files on your disk. In this case, the last time you ran git status, you saw that everything was staged, so you’re ready to commit your changes. The simplest way to commit is to type git commit:
  $ git commit
Doing so launches your editor of choice. (This is set by your shell’s $EDITOR environment variable — usually vim or emacs).
The editor displays the following text (this example is a Vim screen):
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#  (use "git reset HEAD <file>..." to unstage)
#
#      new file:  README
#      modified:  ruby.rb
~
~
~
".git/COMMIT_EDITMSG" 10L, 283C
You can see that the default commit message contains the latest output of the git status command commented out and one empty line on top. You can remove these comments and type your commit message, or you can leave them there to help you remember what you’re committing. (For an even more explicit reminder of what you’ve modified, you can pass the -v option to git commit. Doing so also puts the diff of your change in the editor so you can see exactly what you did.) When you exit the editor, Git creates your commit with that commit message (with the comments and diff stripped out).
Alternatively, you can type your commit message inline with the commit command by specifying it after a -m flag, like this:
$ git commit -m "Story 182: Fix benchmarks for speed"
[master]: created 463dc4f: "Fix benchmarks for speed"
2 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 README
Now you’ve created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.
Remember that the commit records the snapshot you set up in your staging area. Anything you didn’t stage is still sitting there modified; you can do another commit to add it to your history. Every time you perform a commit, you’re recording a snapshot of your project that you can revert to or compare to later.
=== Commit History & Undo ===
==== View Commit History ====
After you have created several commits, or if you have cloned a repository with an existing commit history, you’ll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.
==== Undo ====
At any stage, you may want to undo something. Here, we’ll review a few basic tools for undoing changes that you’ve made. Be careful, because you can’t always revert some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.
===== Change Last Commit =====
One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message. If you want to try that commit again, you can run commit with the --amend option:
$ git commit --amend
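For example, if you commit and then realize you forgot to stage a file, you can run the following three commands. The second commit replaces the first, so you end up with a single commit: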
$ git commit -m 'initial commit'
$ git add forgotten_file
$ git commit --amend
== Branching ==
=== Introduction for Branching ===
Branching is a concept in version control that lets users diverge from the main line of development and work in parallel on different branches without interfering with one another. Branching stands out in Git because it is extremely lightweight compared with other version control software: it is fast and flexible. Mastering this feature is important for developing quickly and effectively.
Git stores data as a series of snapshots. Every commit in Git stores a pointer to the commit or commits that came directly before it. Similarly, a branch in Git is simply a pointer to one commit. "master" is the default branch name. It is used when you first make commits, and it always points to the last commit as you add more commits later.
[[File:Master_branch.png]]
'''Figure 4-1. The master branch pointing to the commit history'''
The command '''''git branch''''' is used to create a new branch, which creates a new pointer to the current commit.
$ git branch test
'''HEAD''' is a special pointer that lets Git remember the current branch. '''''git branch''''' only creates a new branch; it does not switch to it, so you are still on the master branch. To switch branches, use '''''git checkout'''''. The following command switches to the new test branch by moving the '''HEAD''' pointer to it.
$ git checkout test
From this point on, each new commit moves the test branch, which '''HEAD''' points to, forward. The master branch, however, still points to the commit you were on when you ran '''''git checkout''''' to switch branches, as shown in Figure 4-2.
[[File:HEAD_move.png]]
'''Figure 4-2. Pointer HEAD moves to another branch on command checkout'''
'''HEAD''' points to the current branch, and the working directory reflects the snapshot that branch points to. If we switch back to master and make changes and commits there as well, the two branches diverge from their common history, as shown in Figure 4-3.
[[File:Branch_diverge.png]]
'''Figure 4-3. Branch diverged'''
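The pointer behaviour described in this section can be sketched in a few lines of Python. This is a toy model under stated assumptions: the commit ids c0 to c4 are hypothetical, and the functions branch, checkout, and commit only imitate what the corresponding Git commands do to the branch and HEAD pointers.

```python
# Toy model: branches map names to commit ids, HEAD names the current branch.
branches = {"master": "c2"}
head = "master"
parents = {"c0": None, "c1": "c0", "c2": "c1"}  # hypothetical commit ids

def branch(name):
    branches[name] = branches[head]   # new pointer to the current commit

def checkout(name):
    global head
    head = name                       # only HEAD moves

def commit(new_id):
    parents[new_id] = branches[head]  # the new commit points to its parent
    branches[head] = new_id           # only the current branch advances

branch("test")
checkout("test")
commit("c3")
print(branches)                       # {'master': 'c2', 'test': 'c3'}

checkout("master")
commit("c4")
print(parents["c3"], parents["c4"])   # c2 c2
```

After the last step both c3 and c4 have c2 as their parent, which is the diverged history shown in Figure 4-3.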
A branch in Git is cheap to create and destroy because it is just a file containing the 40-character SHA-1 checksum of the commit it points to. Creating a branch is as fast as writing 41 bytes to a file (40 characters and a newline), which is a great advantage compared with other version control software.
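The 41-byte figure can be checked with a short Python sketch. It assumes the simple case in which a branch is stored as its own small file (for example under .git/refs/heads/); the commit id below is a stand-in computed from arbitrary bytes, not a real commit, and the file is written to a temporary directory rather than a real repository.

```python
import hashlib
import os
import tempfile

# A stand-in commit id: any SHA-1 digest is 40 hex characters long.
commit_id = hashlib.sha1(b"example commit").hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, "test")  # plays the role of a branch file
    with open(ref_path, "w", newline="\n") as f:
        f.write(commit_id + "\n")         # 40 characters plus a newline
    ref_size = os.path.getsize(ref_path)

print(len(commit_id), ref_size)           # 40 41
```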
=== Merging ===
Merging is a basic operation in Git that reconciles multiple sets of changes. It is an important feature of version control software because it lets multiple developers work on the same project. The command used to merge is:
$ git merge branch
Two situations can arise when Git merges branches.
In the first, the commit you are merging in is directly ahead of the commit you are on: your current commit can be reached by following the other commit's history. In this case there is no divergent work to reconcile, and all Git has to do is move the pointer forward, which is called a "fast forward".
In the second, the two commits are not on the same line of history, so there is divergent work to reconcile. In this case, Git does a three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two. Git creates a new snapshot from the result of the three-way merge and automatically creates a new commit that points to it.
Developers do not need to work out the best common ancestor themselves; Git determines it automatically, which saves time.
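The two cases above, and the search for a common ancestor, can be illustrated with a toy commit graph in Python. This is a sketch of the idea only, not Git's actual algorithm: the commit ids are hypothetical, and merge_base uses a simple brute-force rule (pick the common ancestor that no other common ancestor descends from).

```python
# Toy commit graph: commit id -> list of parent ids (all ids are hypothetical).
parents = {
    "c0": [], "c1": ["c0"], "c2": ["c1"],
    "c3": ["c2"],  # tip of a topic branch
    "c4": ["c2"],  # tip of master after more work
}

def ancestors(commit):
    """Return the commit together with everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_kind(current, other):
    """Classify what merging `other` into `current` would require."""
    if other in ancestors(current):
        return "already up to date"
    if current in ancestors(other):
        return "fast-forward"
    return "three-way merge"

def merge_base(a, b):
    """Pick the common ancestor that no other common ancestor descends from."""
    common = ancestors(a) & ancestors(b)
    best = [c for c in common
            if not any(c != o and c in ancestors(o) for o in common)]
    return best[0] if best else None

print(merge_kind("c2", "c3"))  # fast-forward
print(merge_kind("c4", "c3"))  # three-way merge
print(merge_base("c4", "c3"))  # c2
```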
A conflict can occur during a merge when the same part of the same file has been changed differently in the two branches. Git cannot merge such changes cleanly, and you get a merge conflict message that looks like this:
$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
'''''git status''''' is used to check which files are not merged.
[master*]$ git status
index.html: needs merge
# On branch master
# Changes not staged for commit:
#  (use "git add <file>..." to update what will be committed)
#  (use "git checkout -- <file>..." to discard changes in working directory)
#
#  unmerged:  index.html
#
You can open the conflicted files manually and fix the problem.
You can also use a graphical merge tool, which you start with the command:
$ git mergetool
If no merge tool is configured, the tools Git can use are listed after "merge tool candidates".
After you finish with the merge tool and exit, Git asks whether the merge was successful. If you answer yes, Git stages the file to mark it as resolved. To finish merging, type the command:
$ git commit
=== Manage your Branch ===
Several branch-management commands are useful when you work with branches.
The "git branch" command has been introduced before, but it can do more than create and delete branches. If you run "git branch" with no arguments, it lists all current branches.
$ git branch
  iss53
* master
  testing
The "*" that prefixes master indicates that master is the branch you currently have checked out.
To see the last commit on each branch, use "git branch -v":
$ git branch -v
  iss53  93b412c fix javascript issue
* master  7a98805 Merge branch 'iss53'
  testing 782fd34 add scott to the author list in the readmes
To filter the list to branches that have or have not been merged into the current branch, use the "--merged" and "--no-merged" options respectively.
The option for deleting a branch is "-d". Trying to delete a branch that has not been merged with "-d" will fail.
=== Branching Workflow and Remote Branching ===
Long-running branches and topic branches are two kinds of workflows that developers can adopt in their development cycle.
In the long-running branches workflow, there are typically three branches: master, develop, and topic. The master branch contains only code that is entirely stable, usually code that has been or is about to be released. The develop branch runs parallel to master and holds code that is being tested for stability, so it is not always stable. Once the code in develop is known to be stable, it can be merged into master. Topic branches are temporary branches whose code may still contain bugs and has not yet passed testing. Figure 4-4 shows the three branches.
[[File:long_running.png]]
'''Figure 4-4. Branches master, develop, and topic'''
In the topic branches workflow, each branch is short-lived and is used to develop one particular piece of functionality. Topic branches can be used in projects of any size. Their advantage is that all changes made within a branch relate to a single topic, which makes it easier to see what happened during code review. Figure 4-5 illustrates topic branches.
[[File:topic_branch.png]]
'''Figure 4-5. Topic branching'''
Branches stored in remote repositories are called remote branches. Your local references to them are updated only when you communicate with the remote over the network.
Pushing is used when you want to share your branch with others: you push your branch to a remote. The remote branch is not updated automatically; you need to push explicitly to update it.
== On the Server ==
A Git server is an intermediate repository that all co-developers have access to and can push to and pull from.
=== Protocols ===
Git can use four protocols to transfer data: Local, Secure Shell (SSH), Git, and HTTP.
*'''Local Protocol'''
The local protocol is the most basic method. Here the remote repository is in another directory on disk. The local protocol is used when every developer has access to a shared file system or, less commonly, when everyone logs in to the same computer. In either case, you can clone, push to, and pull from a local repository.
The advantages of the local protocol are that it is easy to set up a repository on a shared file system and convenient to grab work from others. The disadvantages are that shared access is generally harder to set up and reach from multiple locations than basic network access, and that it is not always the fastest option.
*'''The SSH Protocol'''
SSH is the most commonly used transport protocol. In most places SSH access to servers is already set up, and where it is not, it is easy to configure. SSH is also the only network protocol of the four that you can easily both read from and write to.
The SSH protocol has several advantages. First, it gives you authenticated write access. Second, it is easy to set up. Third, it is secure: all data transferred is encrypted and authenticated.
The disadvantage of SSH is that it cannot serve anonymous access: people must have access to your machine over SSH, even for read-only use, which makes it unsuitable for open source projects.
*'''The Git Protocol'''
The Git protocol is provided by a special daemon that comes packaged with Git. It offers a service similar to SSH but without any authentication.
The Git protocol is the fastest transfer protocol available, which makes it useful for public projects with heavy traffic. However, it provides no authentication, and it is also the most difficult protocol to set up.
*'''The HTTP/S Protocol'''
The HTTP/S protocol is simple to set up: all you need to do is put the bare Git repository under your HTTP document root and set up a specific post-update hook. HTTPS can be used to serve read-only repositories with the content encrypted. HTTP/S also works well with firewalls because it is so widely used.
The disadvantage of HTTP/S is that it is relatively inefficient for the client.
=== Getting Git on the Server===
To get Git on a server, the first thing you need to do is clone an existing repository into a new bare repository, using the command:
$ git clone --bare my_project my_project.git
Initialized empty Git repository in /opt/projects/my_project.git/
Now you need to put your bare repository on the server and set up your protocols. Assume your server is called "git.example.com" and you want to store your Git repositories under /opt/git. The command to copy your new repository to the server is:
$ scp -r my_project.git user@git.example.com:/opt/git
===Generate Public SSH Key===
SSH public keys are used to authenticate users to the Git server. Each user of the system should have an SSH key. You can check whether you already have one with:
$ cd ~/.ssh
$ ls
authorized_keys2  id_dsa      known_hosts
config            id_dsa.pub
You are looking for a pair of files named something and something.pub. The .pub file contains your public key, and the other file contains your private key. If you do not have them, you can run ssh-keygen to generate them.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/schacon/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/schacon/.ssh/id_rsa.
Your public key has been saved in /Users/schacon/.ssh/id_rsa.pub.
The key fingerprint is:
43:c5:5b:5f:b1:f1:50:43:ad:20:a6:92:6a:1f:9a:3a schacon@agadorlaptop.local
Each user should send their public key to the administrator of the Git server.
=== Public Access===
Sometimes you may want to host an open source project and allow anonymous read access to it. Rather than generating SSH keys for every reader, the easiest approach for a small setup is to run a static web server with the Git repositories under its document root and then enable the post-update hook.
To enable the hook:
$ cd project.git
$ mv hooks/post-update.sample hooks/post-update
$ chmod a+x hooks/post-update
Next, add a VirtualHost entry to your Apache configuration with the root directory of your Git projects as its document root.
<VirtualHost *:80>
    ServerName git.gitserver
    DocumentRoot /opt/git
    <Directory /opt/git/>
        Order allow,deny
        allow from all
    </Directory>
</VirtualHost>
Then, change the Unix group of /opt/git to www-data so that your web server has read access to the repositories.
$ chgrp -R www-data /opt/git
After restarting your Apache, you can clone your repositories:
$ git clone http://git.gitserver/project.git
In this way, HTTP-based read access to your projects for any number of users can be set up within minutes.
==Summary==
Git is free and powerful, and it helps developers work quickly and efficiently on projects of any size. Because it is built on snapshot-based data storage, Git offers many advantages over other version control software, such as cheap branching, a convenient staging area, and many workflow options. This synopsis introduced a brief history of Git and explained its basic operations. To learn more about Git, please visit http://git-scm.com/.
==Reference==
*http://git-scm.com/book
*http://en.wikipedia.org/wiki/Branching_(revision_control)
*http://en.wikipedia.org/wiki/Merge_(revision_control)

Latest revision as of 15:03, 18 September 2013

CSC/ECE 517 Fall 2013/ch1w6 zs

Synopsis of Git book

Introduction

Git is by far the best Version Control System on the market and it's free.

Version Control System

A version control system (VCS) keeps track of changes to a set of files in a project so that you can recall specific versions later. Almost any type of file on your computer can be placed under version control. A VCS lets you revert a file to a previous version or even revert the whole project to an earlier state. It lets you see what changed, when, and who made the change. It also makes it easy to recover when a project gets messed up.

Generally, there are three kinds of version control systems: local, centralized, and distributed. Each is illustrated in the figures below.

Figure 1-1. Local Version Control


Figure 1-2. Centralized Version Control System


Figure 1-3. Distributed Version Control System

History

Git was born with a bit of creative destruction and fiery controversy. The Linux kernel, an open-source software project, kept its changes as patches and archived files from 1991 to 2002. In 2002, the project switched to a distributed VCS called BitKeeper.

In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and BitKeeper's free-of-charge status was revoked. This prompted the Linux kernel community to develop its own tool, and that is the origin of Git.

Since its birth in 2005, Git has matured and become easy to use while keeping its original qualities. It is especially efficient for large projects, and it has a remarkable branching system for non-linear development.

What is Git

Git stores, processes, and thinks about information differently from most other VCSs. Systems like CVS treat the information they hold as a set of files and the changes made to each file over time. Git does not work this way. Instead, Git treats your data as a series of snapshots. Every time you commit, Git takes a snapshot of all your files. To avoid storing the same data twice, files that have not changed are stored as a link to the identical version already stored. This is illustrated below.

Figure 1-4. Git stores data as snapshots of the project over time


This is a major difference between Git and nearly all other VCSs. Git handles every aspect of version control at minimal cost. It is more like a miniature file system with a set of advanced tools built on top of it.

Most operations in Git are local and need no information from another computer. You always have the entire history of your project on your local computer, so most operations seem almost instantaneous. For example, you can browse your entire project history from your local database. Git can look up a file as it was at any point in the past and compute the difference from your current version locally. When you are offline, you can still make changes and commit them, and push once you are connected again. The same applies if you are off the VPN: you can still work. This may seem trivial, but it makes a huge difference.

Git knows about everything you do to your files. You can't lose information or end up with a corrupted file without Git detecting it. Git achieves this by checksumming everything with a SHA-1 hash, a 40-character string of hexadecimal characters. Git stores everything in its database addressed by the hash value of its contents rather than by file name.
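As a small illustration of content addressing, the following Python sketch computes the id Git gives to a file's content when it is stored as a blob: the SHA-1 of a short header ("blob", the content length, and a zero byte) followed by the content itself. The helper name git_blob_id is made up for this example.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 id Git assigns to file content stored as a blob."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))                  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(git_blob_id(b"some text\n")))  # 40

# The id depends only on the content, never on a file name.
print(git_blob_id(b"same") == git_blob_id(b"same"))       # True
print(git_blob_id(b"same") == git_blob_id(b"different"))  # False
```

Because the id is derived from the content, any change or corruption of the stored bytes produces a different hash, which is how Git can detect it.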

Generally, Git only adds data to its database. Once you have committed a snapshot, it is very hard to lose data, especially if you regularly push your database to another repo.

Important: a file in Git can reside in one of three states: committed, modified, or staged. Committed means that the file is safely stored in the Git database. Modified means that you have changed the file but have not yet committed it to the database. Staged means that you have marked a modified file in its current version to go into the next snapshot; the staging area works like a cache. The three states and how they relate are illustrated in the figure below.

Figure 1-5. The three states in which a Git file can reside


The repository is where Git stores the data for your project. When you clone a repository from another computer, this is what gets copied. The working directory is a single checkout of one version of the project. Its files are pulled out of the repository and placed on disk for you to use or modify. The staging area is a simple file, kept inside the repository, that stores information about what will go into your next commit.

Generally, Git works in the following three steps:

1. You modify files in your working directory.

2. You stage the files, adding snapshots of them to your staging area.

3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
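
The three steps above can be sketched as a short script. It works in a throwaway repository; the directory, the file name notes.txt, and the demo identity are all made up for illustration.

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # identity used only for this demo
git config user.email "demo@example.com"

echo "first draft" > notes.txt            # 1. modify a file in the working directory
git add notes.txt                         # 2. stage a snapshot of the file
git commit -q -m "Add notes"              # 3. store the staged snapshot permanently

git log --oneline                         # lists the single commit just created
```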

Getting Started on Git

Installing Git

The first step is to install Git. You can either build it from source or install a package for your platform (Windows, Mac, or Linux).

Install from Source

One advantage of installing from source is that you get the most recent version. Each release of Git tends to include UI enhancements, so building the latest version is worthwhile if you are comfortable compiling software. To start, Git depends on several libraries: curl, zlib, openssl, expat, and libiconv. If your system has yum (such as Fedora) or apt-get (such as a Debian-based system), you can use one of the commands below to install all the dependencies.

$ yum install curl-devel expat-devel gettext-devel \
 openssl-devel zlib-devel
$ apt-get install libcurl4-gnutls-dev libexpat1-dev gettext \
 libz-dev libssl-dev

Once this is done, download the latest version of Git from:

 http://git-scm.com/download

Then compile and install it, substituting the version number of the tarball you downloaded:

$ tar -zxf git-1.7.2.2.tar.gz
$ cd git-1.7.2.2
$ make prefix=/usr/local all
$ sudo make prefix=/usr/local install

Git is now installed on your computer. From then on, you can fetch the Git source through Git itself to pick up updates:

$ git clone git://git.kernel.org/pub/scm/git/git.git

Install on Linux

Installing Git on Linux from a binary package is much easier. If you are on Fedora, use yum:

$ yum install git-core

If you are on a Debian-based distribution like Ubuntu, use apt-get:

$ apt-get install git

Install on Windows

Installing Git on Windows is also very easy. Download the installer exe file from the msysGit page on GitHub and run it.

http://msysgit.github.com/

After installation you have both a command-line version and a GUI. Windows users are advised to use the msysGit shell, which supports many complex commands.

First time set-up

You will want to customize your Git environment once after installing. The settings persist across upgrades, and you can change them at any time by running the commands again.

Git comes with a tool called git config that lets you get and set the configuration variables that control how Git looks and operates. These variables can be stored in three different places:

1. /etc/gitconfig file: Contains values for every user on the system and all their repositories. If you pass the option --system to git config, it reads and writes from this file specifically.

2. ~/.gitconfig file: Specific to your user. You can make Git read and write to this file specifically by passing the --global option.

3. config file in the git directory (that is, .git/config) of whatever repository you’re currently using: Specific to that single repository. Each level overrides values in the previous level, so values in .git/config trump those in /etc/gitconfig.
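
The override order can be checked with a small experiment. This is a sketch meant to be run as a script rather than typed into your login shell: it points HOME at a scratch directory so your real ~/.gitconfig is not touched, and the names "Global Name" and "Project Name" are placeholders.

```shell
# Use a scratch HOME so the real user-level config stays untouched.
scratch=$(mktemp -d)
export HOME="$scratch"
unset XDG_CONFIG_HOME

git config --global user.name "Global Name"   # stored in $HOME/.gitconfig

mkdir "$scratch/project"
cd "$scratch/project"
git init -q
git config user.name "Project Name"           # stored in .git/config

git config --global user.name                 # value at the user level
effective=$(git config user.name)             # value Git uses in this repository
echo "$effective"
```

Inside the repository the value from .git/config wins, while other repositories of the same user would still see the --global value.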

Identity, Editor, Diff tool and Check

Set your user name and email address as soon as Git is installed. Every Git commit uses this information, and it is permanently recorded in the commits you create.

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

Because you passed the --global option, you only need to do this once.

You can also configure the text editor that Git opens when it needs you to type a message. By default Git uses your system's default editor, which is usually Vi or Vim. To use a different editor, such as Emacs, run:

$ git config --global core.editor emacs

You will also want a diff tool for detecting and resolving merge conflicts. There are many such tools. To use vimdiff, for example, type:

$ git config --global merge.tool vimdiff

Git also accepts kdiff3, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, and opendiff as valid merge tools.

To check your settings, use the --list option, which lists every setting Git can find at that point:

$ git config --list
user.name=Scott Chacon
user.email=schacon@gmail.com
color.status=auto
color.branch=auto
color.interactive=auto
color.diff=auto
...

Get help from Git

If you ever need help while using Git, there are three ways to get the manual page help for any of the Git commands:
$ git help <verb>
$ git <verb> --help
$ man git-<verb>

For example, you can get the manpage help for the config command by running

$ git help config

These commands are useful because you can access them anywhere, even offline. If the manual pages and this article aren’t enough and you need in-person help, you can try the #git or #github channel on the Freenode IRC server (irc.freenode.net). These channels are regularly filled with hundreds of people who are all very knowledgeable about Git and are often willing to help.

Basic Operation on Git

There are generally two ways to get a Git repository. One is to take an existing project or directory and import it into Git. The other is to clone an existing Git repository from another server.

Initializing a Repository in an Existing Directory

If you’re starting to track an existing project in Git, you need to go to the project’s directory and type

$ git init

This creates a new subdirectory named .git that contains all of your necessary repository files, which is a Git repository skeleton. At this point, nothing in your project is tracked yet.

If you want to start version-controlling existing files (as opposed to an empty directory), you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a commit:

$ git add *.c
$ git add README
$ git commit -m 'initial project version'

At this point, you have a Git repository with tracked files and an initial commit.

Cloning an Existing Repo

If you want a copy of an existing Git repository, for example a project you would like to contribute to, the command you need is git clone. Git receives a copy of nearly all the data that the server has: every version of every file in the project's history is pulled down when you run git clone.

To clone a repository, use git clone [url]. For example, to clone the repository called ruby, you can run:

$ git clone git://github.com/schacon/ruby.git

That creates a directory named ruby, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you go into the new ruby directory, you’ll see the project files in there, ready to be worked on or used. If you want to clone the repository into a directory named something other than ruby, you can specify that as the next command-line option:

$ git clone git://github.com/schacon/ruby.git myruby

That command does the same thing as the previous one, but the target directory is called myruby.
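
To try cloning without network access, git clone also accepts a local path. The sketch below builds a tiny source repository and clones it into a directory with a chosen name; the names source and mycopy are arbitrary and the identity is a placeholder.

```shell
work=$(mktemp -d)
cd "$work"

# Build a small repository to clone from.
git init -q source
cd source
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial project version"
cd ..

# Clone it; the second argument names the target directory.
git clone -q source mycopy
cd mycopy
ls                       # README is checked out and ready to work on
git log --oneline        # the full history came along with the clone
```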

Git has a number of different transfer protocols you can use. The previous example uses the git:// protocol, but you may also see http(s):// or user@server:/path.git, which uses the SSH transfer protocol. These protocols are discussed in the Protocols section below.

Record Changes

Now that you have a repository for the project, you can start making changes and commit snapshots of them whenever the project reaches a state you want to record.

Each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven’t edited anything.

As you edit files, Git sees them as modified, because you’ve changed them since your last commit. You stage these modified files and then commit all your staged changes, and the cycle repeats. This cycle is illustrated in the figure below.

Figure 3-1. File status cycle


Check status of your file

You can use the git status command to check the state of each file. Suppose you add a new README file that did not exist before. When you run git status, the file is listed as untracked:

$ vim README
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   README
nothing added to commit but untracked files present (use "git add" to track)

The new file appears under the "Untracked files" heading. This means Git did not find it in the previous snapshot. Git will not include it in a commit until you explicitly tell it to.

Track your file

To begin tracking a new file such as the README you just added, use the git add command:

$ git add README

Then you can run your status command again and you will get the following:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#

It’s under the “Changes to be committed” heading, which means it is staged. If you commit at this point, the version of the file as it was when you ran git add is what goes into the snapshot. The git add command accepts either a file name or a directory name; if it is a directory, the command adds all the files in that directory recursively.
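
For a more compact view of the same transition, git status has a --porcelain option that prints one line per file. The sketch below uses a throwaway repository; the README content is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "My Project" > README
before=$(git status --porcelain)   # "??" marks an untracked file
git add README
after=$(git status --porcelain)    # "A " marks a newly added, staged file

echo "$before"
echo "$after"
```

The two-letter code at the start of each line encodes the staged state in the first column and the working directory state in the second.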

Staging Modified Files

If you change a previously tracked file called ruby.rb and then run your status command again, you get something that looks like this:

$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#
#   modified:   ruby.rb
#

The ruby.rb file appears under a heading named “Changes not staged for commit”. This means a tracked file has been modified in the working directory but is not yet staged. To stage it, run the git add command. git add is a multipurpose command: you use it both to begin tracking new files and to stage modified files. Now run git add to stage the ruby.rb file, and then run git status again:

$ git add ruby.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   ruby.rb
#

Both files are staged and will go into your next commit. At this point, suppose you remember one little change that you want to make in ruby.rb before you commit it. You open it again and make that change, and you’re ready to commit. However, let’s run git status one more time:

$ vim ruby.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   ruby.rb
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#
#   modified:   ruby.rb
#

Ta dah! ruby.rb is listed as both staged and unstaged. This happens because Git stages a file exactly as it is when you run the git add command. If you commit now, the version of ruby.rb as it was when you last ran git add is what goes into the commit, not the version that is in your working directory when you run git commit. If you modify a file after you run git add, you have to run git add again to stage the latest version of the file:

$ git add ruby.rb
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   README
#   modified:   ruby.rb
#
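
The situation above, where one file is both staged and unstaged, can be reproduced in a throwaway repository. In this sketch the file contents (v1, v2, v3) and the demo identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "puts 'v1'" > ruby.rb
git add ruby.rb
git commit -q -m "Add ruby.rb"

echo "puts 'v2'" > ruby.rb
git add ruby.rb                    # stage version 2
echo "puts 'v3'" > ruby.rb         # edit again without staging

state=$(git status --porcelain)    # "MM": modified in the index and in the working directory
staged=$(git show :ruby.rb)        # content currently held in the staging area
echo "$state"
echo "$staged"
git diff --cached                  # v1 to v2: what the next commit would record
git diff                           # v2 to v3: still only in the working directory
```

Running git add ruby.rb once more would stage version 3 and empty the second diff.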

Ignoring Files

You will have a class of files that you don’t want Git to automatically add or even show you as being untracked. Those files are usually automatically generated files like log files or files produced by your build system. In such cases, you can create a file listing patterns to match them named .gitignore. Here is an example .gitignore file:

$ cat .gitignore
*.[oa]
*~

The first line tells Git to ignore any files ending in .o or .a. Those files are object and archive files that may be the product of building your code. The second line tells Git to ignore all files that end with a tilde (~), which is used by many text editors such as Emacs to mark temporary files. You may also include a log, tmp, or pid directory; automatically generated documentation; and so on. Setting up a .gitignore file before you get going is generally a good idea so you don’t accidentally commit files that you really don’t want in your Git repository.
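
The effect of these two patterns can be checked in a throwaway repository. The file names below are invented to match or miss the patterns.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same two patterns as in the example .gitignore.
printf '%s\n' '*.[oa]' '*~' > .gitignore

# Create files that should be ignored and one that should not.
touch main.c main.o libdemo.a 'notes.txt~'

visible=$(git status --porcelain)
echo "$visible"                    # only .gitignore and main.c are reported
```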

Commit Changes

When your staging area is set up the way you want it, you can commit your changes. Anything that is still unstaged won’t go into this commit. They will stay as modified files on your disk. In this case, the last time you ran git status, you saw that everything was staged, so you’re ready to commit your changes. The simplest way to commit is to type git commit:

 $ git commit

Doing so launches your editor of choice. (This is set by your shell’s $EDITOR environment variable — usually vim or emacs).

The editor displays the following text (this example is a Vim screen):

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   README
#       modified:   ruby.rb
~
~
~
".git/COMMIT_EDITMSG" 10L, 283C

You can see that the default commit message contains the latest output of the git status command commented out and one empty line on top. You can remove these comments and type your commit message, or you can leave them there to help you remember what you’re committing. (For an even more explicit reminder of what you’ve modified, you can pass the -v option to git commit. Doing so also puts the diff of your change in the editor so you can see exactly what you did.) When you exit the editor, Git creates your commit with that commit message (with the comments and diff stripped out).

Alternatively, you can type your commit message inline with the commit command by specifying it after a -m flag, like this:

$ git commit -m "Story 182: Fix benchmarks for speed"
[master]: created 463dc4f: "Fix benchmarks for speed"
2 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 README

Now you’ve created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.

Remember that the commit records the snapshot you set up in your staging area. Anything you didn’t stage is still sitting there modified; you can do another commit to add it to your history. Every time you perform a commit, you’re recording a snapshot of your project that you can revert to or compare to later.

Commit History & Undo

View Commit History

After you have created several commits, or if you have cloned a repository with an existing commit history, you’ll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.
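
A few common forms of git log are sketched below against a throwaway repository with three commits. The file name, commit messages, and identity are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create three small commits to look at.
for n in 1 2 3; do
  echo "line $n" >> history.txt
  git add history.txt
  git commit -q -m "Commit number $n"
done

git log --oneline                        # one line per commit, newest first
git log --stat -1                        # newest commit with per-file change statistics
git log -p -1                            # newest commit with its full diff
newest=$(git log -1 --pretty=format:%s)  # subject line of the newest commit
echo "$newest"
```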

Undo

At any stage, you may want to undo something. Here, we’ll review a few basic tools for undoing changes that you’ve made. Be careful, because you can’t always revert some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.

Change Last Commit

One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message. If you want to try that commit again, you can run commit with the --amend option:

$ git commit --amend

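For example, if you commit and then realize you forgot to stage a file that belongs in that commit, you can run the following three commands. You end up with a single commit, because the amended commit replaces the first one.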

$ git commit -m 'initial commit'
$ git add forgotten_file
$ git commit --amend
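
The sketch below confirms that amending replaces the last commit instead of adding a new one. It passes -m to --amend only so that no editor opens in a script; main.c is a hypothetical file name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "code" > main.c
echo "docs" > forgotten_file
git add main.c
git commit -q -m "initial commit"
before=$(git rev-parse HEAD)

git add forgotten_file
git commit -q --amend -m "initial commit"
after=$(git rev-parse HEAD)

git rev-list --count HEAD          # still a single commit
git ls-tree --name-only HEAD       # the commit now contains both files
```

The commit gets a new SHA-1 because its content changed, which is why amending commits that were already pushed needs care.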

Branching

Introduction for Branching

Branching is a concept in version control that lets you diverge from the main line of development and work in parallel on separate branches without the lines of work interfering with each other. Git's branching stands out because it is extremely lightweight compared with other version control software: creating and switching branches is fast and flexible. Mastering this feature is important for developing quickly and effectively.

Git stores data as a series of snapshots. Every commit stores a pointer to the commit or commits that came directly before it. Similarly, a branch in Git is simply a pointer to one commit. The default branch name is master. It is created when you first make commits, and it moves forward automatically to point to the latest commit each time you commit.

Figure 4-1. Master branching pointing to commit history

The git branch command creates a new branch, which is a new pointer to the commit you are currently on.

$ git branch test

HEAD is a special pointer that tells Git which branch you are currently on. The git branch command only creates a new branch; it does not switch to it, so you are still on the master branch. To switch branches, use git checkout. The following command switches to the new test branch by moving the HEAD pointer to it.

$ git checkout test

From now on, each new commit moves the test branch, which HEAD points to, forward. The master branch, however, still points to the commit you were on when you ran git checkout to switch branches, as shown in Figure 4-2.

Figure 4-2. Pointer HEAD moves to another branch on command checkout

HEAD points to the current branch, and the current branch points to the commit that your working directory reflects. If you then switch back to master and make changes and commits there as well, the project history diverges, as shown in Figure 4-3.

Figure 4-3. Branch diverged

A branch in Git is cheap to create and destroy because it is just a file containing the 40-character SHA-1 checksum of the commit it points to. Creating a branch is as fast as writing 41 bytes to a file (40 characters and a newline), which is a great advantage compared to other version control software.
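
The 41-byte claim can be checked directly. This sketch assumes the default setup, in which a new branch is stored as a loose file under .git/refs/heads and the repository uses SHA-1; packed refs or other storage backends keep the same information elsewhere.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

git branch test                             # create a branch at the current commit

ref_file=.git/refs/heads/test
cat "$ref_file"                             # the SHA-1 of the commit the branch points to
size=$(wc -c < "$ref_file" | tr -d ' ')     # 40 characters plus a newline
echo "$size bytes"
```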

Merging

Merging is a basic operation in Git that reconciles changes made on different branches. It is essential functionality in version control software because it lets multiple developers work on the same project. The command to merge a branch into the current one is:

$ git merge branch

Two situations can arise when Git merges branches.

In the first, the commit on your current branch is a direct ancestor of the commit you are merging in. There is no divergent work to reconcile, so Git only has to move the branch pointer forward. This is called a "fast forward".

In the second, the two commits are not on the same line of history, so there is divergent work to reconcile. In this case, Git does a three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two. Git creates a new snapshot from the result and automatically creates a new commit that points to it.

Developers do not need to find the best common ancestor themselves. Git determines it automatically, which saves time.
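
Both situations can be reproduced in a throwaway repository. In this sketch the branch names feature and topic, the file names, and the identity are invented; the script reads the default branch name instead of assuming master.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
main=$(git symbolic-ref --short HEAD)      # name of the default branch

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Fast forward: the default branch has not moved since feature was created.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
git checkout -q "$main"
git merge -q feature                       # only moves the branch pointer
ff_parents=$(git show -s --format=%p HEAD | wc -w | tr -d ' ')

# Three-way merge: both branches gain a commit after they split.
git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "topic work"
git checkout -q "$main"
echo more > more.txt
git add more.txt
git commit -q -m "more work on the default branch"
git merge -q -m "Merge topic" topic        # creates a new merge commit
merge_parents=$(git show -s --format=%p HEAD | wc -w | tr -d ' ')

echo "parents after fast forward: $ff_parents"
echo "parents after three-way merge: $merge_parents"
```

After the fast forward the tip is an ordinary commit with one parent, while the three-way merge leaves a commit with two parents, one for each branch tip.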

A conflict can occur during a merge when the same part of the same file was changed differently on the two branches. Git cannot merge these changes cleanly, so it pauses the merge and reports a conflict like this, leaving it to you to resolve:

$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.

git status is used to check which files are not merged.

[master*]$ git status
index.html: needs merge
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   unmerged:   index.html
#

You can open the files that have conflicts manually and fix the problem.

You can also resolve conflicts with a graphical merge tool by running:

$ git mergetool

The supported merge tools are listed in the output after "merge tool candidates".

After you exit the merge tool, Git asks whether the merge was successful. If you answer yes, Git stages the file to mark it as resolved. To finish the merge, type:

$ git commit
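
The whole conflict cycle, resolved by hand instead of with a merge tool, can be sketched as follows. The HTML content is invented, and the resolution simply writes the final text over the conflict markers.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
main=$(git symbolic-ref --short HEAD)      # name of the default branch

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "base"

# Change the same line differently on two branches.
git checkout -q -b iss53
echo "<h1>Contact us</h1>" > index.html
git commit -q -a -m "iss53 wording"
git checkout -q "$main"
echo "<h1>Welcome</h1>" > index.html
git commit -q -a -m "default branch wording"

git merge iss53 || echo "merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files still unmerged
echo "$conflicted"

# Resolve: write the final content, stage it, and conclude the merge.
echo "<h1>Welcome, contact us</h1>" > index.html
git add index.html
git commit -q -m "Merge iss53"
```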

Manage your Branch

Git provides several branch-management tools.

The git branch command was introduced earlier, but it does more than create and delete branches. If you run git branch with no arguments, it lists all current branches:

$ git branch
  iss53
* master
  testing

The "*" prefix on master indicates that master is the branch you currently have checked out.

To see the last commit on each branch, use "git branch -v":

$ git branch -v
  iss53   93b412c fix javascript issue
* master  7a98805 Merge branch 'iss53'
  testing 782fd34 add scott to the author list in the readmes

To filter the list to branches that you have or have not yet merged into the current branch, use the "--merged" and "--no-merged" options respectively.

To delete a branch, use the "-d" option. Deleting a branch that has not been merged fails with "-d", which protects you from losing work; the "-D" option forces the deletion.
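
These options can be tried in a throwaway repository. The branch names merged-topic and unmerged-topic are invented to make the outcome obvious.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
main=$(git symbolic-ref --short HEAD)      # name of the default branch

echo base > base.txt
git add base.txt
git commit -q -m "base"

git branch merged-topic                    # same commit as the default branch, so already merged
git checkout -q -b unmerged-topic
echo wip > wip.txt
git add wip.txt
git commit -q -m "work in progress"
git checkout -q "$main"

git branch --merged                        # lists the default branch and merged-topic
git branch --no-merged                     # lists unmerged-topic
git branch -d merged-topic                 # succeeds
git branch -d unmerged-topic || echo "refused: branch is not fully merged"
```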

Branching Workflow and Remote Branching

Long-running branches and topic branches are two kinds of workflows that developers can adopt in their development cycle.

In a workflow with long-running branches, there are typically three branches: master, develop, and topic. Only code that is entirely stable lives in the master branch, usually code that has been or is about to be released. The develop branch runs parallel to master and is used to test stability, so it is not always stable. When the code in develop is confirmed to be stable, it can be merged into master. Topic branches are temporary branches that may still contain bugs and have not yet passed testing. Figure 4-4 shows the three branches.

Figure 4-4. Branches master, develop, and topic

In a workflow that uses topic branches, each branch is short-lived and is used to develop one particular feature of the software. Topic branches can be used in projects of any size. Their advantage is that all the changes in a branch relate to one topic, which makes it easier to see what happened during code review. Figure 4-5 illustrates topic branching.

Figure 4-5. Topic branching

Branches stored in remote repositories are called remote branches. Your local references to them move only when you communicate with the remote over the network.

Pushing is how you share a branch with others: you push your branch to a remote. The remote branch is not updated automatically; you have to push explicitly to update it.
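
The sketch below shows that a branch reaches the remote only after an explicit push. A bare repository on the local disk stands in for the remote server; the names shared.git, dev, and serverfix are invented.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git              # stands in for the remote repository

git init -q dev
cd dev
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/shared.git"

git checkout -q -b serverfix
echo fix > fix.txt
git add fix.txt
git commit -q -m "server fix"

before=$(git ls-remote --heads origin)     # empty: committing did not update the remote
git push -q origin serverfix               # explicitly share the branch
after=$(git ls-remote --heads origin)      # now lists refs/heads/serverfix
echo "$after"
```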

On the Server

A Git server is an intermediate repository that all collaborators can access, push to, and pull from.

Protocols

Git can use four protocols to transfer data: Local, Secure Shell (SSH), Git, and HTTP.

  • Local Protocol

The Local protocol is the most basic. Here the remote repository is in another directory on the same disk. It is used when every developer has access to a shared file system or when everyone logs in to the same computer. In either case, you can clone from, push to, and pull from the local repository.

The advantages of the Local protocol are that it is easy to set up a repository on a shared file system and convenient to grab work from someone else's repository. The disadvantages are that shared access is generally harder to set up than basic network access and that it is not always the fastest option.

  • The SSH Protocol

The SSH protocol is the most commonly used transport protocol. In most cases SSH access to the server is already set up, and if it is not, setting it up is easy. SSH is also the network protocol that is easy to both read from and write to, whereas the HTTP and Git protocols are generally used for read-only access.

SSH has several advantages. First, it gives you authenticated write access. Second, it is easy to set up. Third, it is secure: all data transferred is encrypted and authenticated.

The disadvantage of SSH is that it cannot serve anonymous access. People must have access to your machine even to read the repository, which makes SSH alone unsuitable for open source projects.

  • The Git Protocol

The Git protocol is served by a special daemon that comes packaged with Git. It provides a service similar to SSH but with no authentication.

The Git protocol is the fastest transfer protocol, which makes it useful for high-traffic public projects. However, it provides no authentication, and it is also the most difficult protocol to set up.

  • The HTTP/S Protocol

The HTTP/S protocol is simple to set up: all you need to do is put the bare Git repository under your HTTP document root and set up a specific post-update hook. HTTPS can be used to serve read-only repositories with the content encrypted in transit. HTTP/S also works well with firewalls because it is so widely used.

The disadvantage of HTTP/S is that it is relatively inefficient for the client.

Getting Git on the Server

To get Git on a server, the first step is to clone an existing repository into a new bare repository, using the command:

$ git clone --bare my_project my_project.git
Initialized empty Git repository in /opt/projects/my_project.git/

Next, put the bare repository on the server and set up your protocols. Assume your server is called "git.example.com" and you want to store your Git repositories under /opt/git. The command to copy your new repository there is:

$ scp -r my_project.git user@git.example.com:/opt/git
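
Before copying anything to a server, it can help to see what a bare clone contains. The sketch below works entirely on the local disk; the README content and identity are placeholders.

```shell
work=$(mktemp -d)
cd "$work"

# A small project to export.
git init -q my_project
cd my_project
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"
cd ..

git clone -q --bare my_project my_project.git

ls my_project.git                  # Git's own files only, no checked-out README
is_bare=$(git --git-dir=my_project.git rev-parse --is-bare-repository)
echo "$is_bare"
```

A bare repository holds just the contents of what would normally be the .git directory, which is all a server needs.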

Generate Public SSH Key

SSH public keys are used to authenticate users to the Git server. Each user of the system should have an SSH key. You can check whether you already have one with:

$ cd ~/.ssh
$ ls
authorized_keys2  id_dsa       known_hosts
config            id_dsa.pub

Look for a pair of files named something and something.pub. The .pub file contains your public key and the other file holds your private key. If you do not have them, run ssh-keygen to create them.

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/schacon/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/schacon/.ssh/id_rsa.
Your public key has been saved in /Users/schacon/.ssh/id_rsa.pub.
The key fingerprint is:
43:c5:5b:5f:b1:f1:50:43:ad:20:a6:92:6a:1f:9a:3a schacon@agadorlaptop.local

Each user should send their public key to the administrator of the Git server.

Public Access

Sometimes you may want to host an open source project and allow anonymous read access to it. Rather than generating SSH keys for every reader, the easiest approach for a small setup is to run a static web server with the Git repositories under its document root and then enable the post-update hook.

To enable the hook:

$ cd project.git
$ mv hooks/post-update.sample hooks/post-update
$ chmod a+x hooks/post-update
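
The sample post-update hook runs git update-server-info, which writes the index files a static web server needs in order to serve the repository. The sketch below runs that command by hand in a local bare repository; the project names and identity are invented.

```shell
work=$(mktemp -d)
cd "$work"

git init -q project
cd project
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"
cd ..

git clone -q --bare project project.git
cd project.git

git update-server-info             # what the post-update hook does after each push
cat info/refs                      # one line per ref: commit SHA-1, then the ref name
```

With the hook enabled, these files are refreshed on every push, so clients fetching over plain HTTP always see the current branches.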

Next, add a VirtualHost entry to your Apache configuration with the document root set to the root directory of your Git projects.

<VirtualHost *:80>
    ServerName git.gitserver
    DocumentRoot /opt/git
    <Directory /opt/git/>
        Order allow,deny
        allow from all
    </Directory>
</VirtualHost>

Then change the Unix group of /opt/git to www-data so that your web server has read access to the repositories.

$ chgrp -R www-data /opt/git

After restarting your Apache, you can clone your repositories:

$ git clone http://git.gitserver/project.git

In this way, HTTP-based read access to your projects can be set up for any number of users within minutes.

Summary

Git is free and powerful, helping developers work quickly and efficiently on projects ranging from small to very large. Because it is built on snapshot-based storage, Git offers many advantages over other version control software, such as cheap branching, a convenient staging area, and a choice of workflows. This chapter gave a brief history of Git and listed and explained its basic operations. To learn more about Git, please visit http://git-scm.com/.

Reference