The local directory that contains all of the managed files is called the working tree.
The repository contains, among other things, a collection of commits. It also includes references to certain commits, called heads.
Unlike many other version control tools, Git does not record changes directly from the working tree to the repository. Before being commited to the repository, changes are first recorded in the index. This is a staging area where you add each individual change. When all the desired changes are staged they can be added to the repo in a single commit.
Your working copy of a repository is a complete repo: history, branches, etc. This not only means that you can have full version control even offline, but also that logs, branches, commit information, etc are local and fast.
- a set of file contents, called blobs, which is a snapshot of the working tree when the commit was made.
- references to parent commits (usually there is only one.)
- information about the author, date, etc.
- a SHA1 identifier.
In a Git repository history there is only one commit without any parent reference, the initial commit.
The history is a directed acyclic graph of commits, with the references to parent commits pointing back in time, ultimately reaching the first commit. You can start from any commit and follow the parent pointers all the way back to that first commit. This is how history works.
A head is a named reference to a commit. By default repos start with only one head, called master, but a repo can have any number of heads.
HEAD is the reference to what is currently checked out in your working tree.
If you are working on the branch feature
then HEAD
points to the last
commit in the branch history.
HEAD is the alias to the currently active head, head is a named reference to a commit in the repo. This distinction is important and a common convention in Git documentation.
- Git creates the commit object.
- The parent of the commit is the commit referenced by HEAD.
- If you are in a branch, the branch head is changed to point to the new commit.
- HEAD is updated to point to the new commit.
There are several ways of referencing a commit in git commands:
- The full SHA1 identifier of the commit object.
- Enough characters of the commit identifier.
- Named reference (a branch or tag name).
- Relative from a known commit:
HEAD^
is the parent ofHEAD
.HEAD^^
is the grand-parent ofHEAD
.e7b20f1b~4
is the commit 4 steps back in history frome7b20f1b
.
The full list is in the git-revisions manpage.
In Git a branch and a head are basically the same. A branch is represented by a head and every head represents a branch. Usually when the term "branch" is used, it means the head commit and the history that lead to it. When the term "head" is used it usually means the commit object that is referenced by the particular head.
The default branch is called master
.
When you create a branch, Git creates a head that point to the commit from which you are branching.
Assuming you are merging the branch feature into master:
- Git identifies the common ancestor of feature and master. Let's call this ancestor-commit.
- If feature references ancestor-commit, do nothing.
- If master references ancestor-commit, do a fast-forward.
- Otherwise, create a diff of ancestor-commit and feature.
- Attempt to apply the diff to master.
- If there are no conflicts, Git creates a new commit with two parents: the commits referenced by feature and master.
- If there are no conflicts, master and HEAD are updated to this new commit.
- If there are conflicts, Git inserts the conflict markers in the appropriate files, creates conflict files and no commit is created. The user is informed of the conflict.
After resolving the conflict you add the changes to the stage and manually commit. Git remembers that you are in the middle of a merge and sets the parents of the new commit correctly. It also pre-populates the commit message with a note on the merge and a list of files that had conflicts.
This is an optimization. In a fast-forward merge no merge commit is created, Git just updates the heads.
Git follows a distributed model of version control. To identify remote repositories Git uses a remote-specification. A remote-specification can be a path on the same filesystem, a remote repo over ssh, http, etc.
When you clone a repo Git performs the following steps:
- It creates a local directory. The default is the name of the remote repo.
- It initializes a repository in the new directory.
- All the commit objects and head references are copied from the remote repo.
- A remote repository reference is created (the default name is
origin
) and it points to the repo that you are cloning. - For each head reference in the remote repo a remote head is created in
the local repo. The remote heads are named
origin/<remote-head-name>
. - A remote head in the local repo, corresponding to the current active head in the remote repo is set to track it.
Remote repository references and remote heads work in a similar way as local heads. You can rebase, merge, show specific commits, show the logs, etc.
When you interact and modify a cloned repository most changes are done in your clone only. If you want to modify the remote repo you have to do it explicitly.
A pull
operation from a remote repository does a fetch
from the remote
repository reference, followed by a merge from the tracked branch into the
current active local branch. This can create some unnecessary merge commits,
which is the reason that some people are against pulling.
When you push to a remote repository Git does the following:
- Add the new commit objects to the remote repo.
- Updates the remote head to reference the same commit it references in the local repo.
- Updates the remote head reference in the local repo.
You should only push changes that will result in a fast-forward merge on the remote repository. Otherwise the remote repo will have dangling commits.
- Understanding Git Conceptually, by Charles Duan.
- Git from the bottom up, by John Wiegley.