git(7) Manual Page

NAME

git - the stupid content tracker

SYNOPSIS

git-<command> <args>

DESCRIPTION

This is reference information for the core git commands.

The Discussion section below contains many useful definitions and clarifications - read that first. Of the commands, I suggest reading git-update-cache(1) and git-read-tree(1) first - I wish I had!

David Greaves <david@dgreaves.com> 08/05/05

Updated by Junio C Hamano <junkio@cox.net> on 2005-05-05 to reflect recent changes.

Commands Overview

The git commands can helpfully be split into those that manipulate the repository, the cache and the working fileset and those that interrogate and compare them.

There are also some ancillary programs that can be viewed as useful aids for using the core commands but which are unlikely to be used by SCMs layered over git.

Manipulation commands

git-checkout-cache(1)
Copy files from the cache to the working directory
git-commit-tree(1)
Creates a new commit object
git-init-db(1)
Creates an empty git object database
git-merge-base(1)
Finds as good a common ancestor as possible for a merge
git-mkdelta(1)
Creates a delta object
git-mktag(1)
Creates a tag object
git-read-tree(1)
Reads tree information into the directory cache
git-update-cache(1)
Modifies the index or directory cache
git-write-blob(1)
Creates a blob from a file
git-write-tree(1)
Creates a tree from the current cache

Interrogation commands

git-cat-file(1)
Provide content or type information for repository objects
git-check-files(1)
Verify a list of files are up-to-date
git-diff-cache(1)
Compares content and mode of blobs between the cache and repository
git-diff-files(1)
Compares files in the working tree and the cache
git-diff-tree(1)
Compares the content and mode of blobs found via two tree objects
git-export(1)
Exports each commit and a diff against each of its parents
git-fsck-cache(1)
Verifies the connectivity and validity of the objects in the database
git-ls-files(1)
Information about files in the cache/working directory
git-ls-tree(1)
Displays a tree object in human readable form
git-merge-cache(1)
Runs a merge for files needing merging
git-rev-list(1)
Lists commit objects in reverse chronological order
git-rev-tree(1)
Provides the revision tree for one or more commits
git-tar-tree(1)
Creates a tar archive of the files in the named tree
git-unpack-file(1)
Creates a temporary file with a blob's contents

The interrogation commands may create files - and you can force them to touch the working file set - but in general they don't.

Ancillary Commands

Manipulators:

git-apply-patch-script(1)
Sample script to apply the diffs from git-diff-*
git-convert-cache(1)
Converts old-style GIT repository
git-http-pull(1)
Downloads a remote GIT repository via HTTP
git-local-pull(1)
Duplicates another GIT repository on a local system
git-merge-one-file-script(1)
The standard helper program to use with "git-merge-cache"
git-pull-script(1)
Script used by Linus to pull and merge a remote repository
git-prune-script(1)
Prunes all unreachable objects from the object database
git-resolve-script(1)
Script used to merge two trees
git-tag-script(1)
An example script to create a tag object signed with GPG
git-ssh-pull
Pulls from a remote repository over ssh connection

Interrogators:

git-diff-helper(1)
Generates patch format output for git-diff-*
git-ssh-push
Helper "server-side" program used by git-ssh-pull

Identifier Terminology

<object>
Indicates the sha1 identifier for any type of object
<blob>
Indicates a blob object sha1 identifier
<tree>
Indicates a tree object sha1 identifier
<commit>
Indicates a commit object sha1 identifier
<tree-ish>
Indicates a tree, commit or tag object sha1 identifier. A command that takes a <tree-ish> argument ultimately wants to operate on a <tree> object but automatically dereferences <commit> and <tag> objects that point at a <tree>.
<type>
Indicates that an object type is required. Currently one of: blob/tree/commit/tag
<file>
Indicates a filename - always relative to the root of the tree structure GIT_INDEX_FILE describes.
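
The <tree-ish> dereferencing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `store` dictionary and the `to_tree` helper are hypothetical stand-ins for the object database, and each entry records just an object type and the identifier it points at.

```python
def to_tree(store, oid):
    """Follow <tag> and <commit> objects until a <tree> is reached.

    `store` is a toy mapping: identifier -> (type, identifier pointed at).
    It is an assumption made for this sketch, not git's storage format.
    """
    while True:
        obj_type, target = store[oid]
        if obj_type == "tree":
            return oid
        if obj_type in ("commit", "tag"):
            oid = target  # dereference and look again
        else:
            raise TypeError("%s is a %s, not a tree-ish" % (oid, obj_type))


# Toy data: a tag pointing at a commit pointing at a tree.
store = {
    "t1": ("tree", None),
    "c1": ("commit", "t1"),
    "g1": ("tag", "c1"),
    "b1": ("blob", None),
}
print(to_tree(store, "g1"))
```

A blob is never a <tree-ish>, so the sketch raises an error for it rather than guessing.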

Symbolic Identifiers

Any git command accepting any <object> can also use the following symbolic notation:

HEAD
indicates the head of the repository (i.e. the contents of $GIT_DIR/HEAD)
<tag>
a valid tag name (i.e. the contents of $GIT_DIR/refs/tags/<tag>)
<head>
a valid head name (i.e. the contents of $GIT_DIR/refs/heads/<head>)
<snap>
a valid snapshot name (i.e. the contents of $GIT_DIR/refs/snap/<snap>)
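
As a rough illustration of the notation above, the following Python sketch resolves a symbolic name by reading the corresponding file under $GIT_DIR. The helper name `resolve_name` and the order in which tags, heads and snapshots are tried are assumptions made for this example; the text does not say which wins when names collide.

```python
import os
import re

HEX40 = re.compile(r"^[0-9a-f]{40}$")


def resolve_name(name, git_dir=None):
    """Hypothetical helper: map HEAD, a tag, a head or a snapshot to a sha1."""
    git_dir = git_dir or os.environ.get("GIT_DIR", ".git")
    if HEX40.match(name):
        return name  # already a full sha1 identifier
    if name == "HEAD":
        candidates = [os.path.join(git_dir, "HEAD")]
    else:
        # The lookup order below is an assumption of this sketch.
        candidates = [os.path.join(git_dir, "refs", kind, name)
                      for kind in ("tags", "heads", "snap")]
    for path in candidates:
        if os.path.isfile(path):  # follows the HEAD symlink, if it is one
            with open(path) as f:
                sha1 = f.read().strip()  # file holds hex SHA1 + newline
            if HEX40.match(sha1):
                return sha1
    raise KeyError(name)
```

Calling `resolve_name("HEAD")` inside a working tree would read $GIT_DIR/HEAD, or ./.git/HEAD when GIT_DIR is not set.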

File/Directory Structure

The git-core manipulates the following areas in the directory:

.git/         The base (overridden with $GIT_DIR)
  objects/    The object base (overridden with $GIT_OBJECT_DIRECTORY)
    ??/       'First 2 chars of object' directories

It can interrogate (but never updates) the following areas:

refs/       Directories containing symbolic names for objects
            (each file contains the hex SHA1 + newline)
  heads/    Commits which are heads of various sorts
  tags/     Tags, by the tag name (or some local renaming of it)
  snap/     ????
...         Everything else isn't shared
HEAD        Symlink to refs/heads/<something>

Higher level SCMs may provide and manage additional information in the GIT_DIR.

Terminology

Each line contains terms which you may see used interchangeably

object database, .git directory
directory cache, index
id, sha1, sha1-id, sha1 hash
type, tag

Environment Variables

Various git commands use the following environment variables:

The git Repository

These environment variables apply to all core git commands. Note that they may be used or overridden by SCMs sitting above git, so take care if using Cogito etc.

GIT_INDEX_FILE
This environment variable allows the specification of an alternate cache/index file. If not specified, the default of $GIT_DIR/index is used.
GIT_OBJECT_DIRECTORY
If the object storage directory is specified via this environment variable then the sha1 directories are created underneath - otherwise the default $GIT_DIR/objects directory is used.
GIT_ALTERNATE_OBJECT_DIRECTORIES
Due to the immutable nature of git objects, old objects can be archived into shared, read-only directories. This variable specifies a ":" separated list of git object directories which can be used to search for git objects. New objects will not be written to these directories.
GIT_DIR
If the GIT_DIR environment variable is set then it specifies a path to use instead of ./.git for the base of the repository.
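
The defaults and overrides described above can be summarised in a short Python sketch. The function names are hypothetical, and the sketch assumes the usual layout in which the first two hex characters of an object name form the directory and the remaining characters form the file name.

```python
import os


def index_path(env=None):
    """Path of the cache/index file: GIT_INDEX_FILE, else $GIT_DIR/index."""
    env = os.environ if env is None else env
    git_dir = env.get("GIT_DIR", ".git")
    return env.get("GIT_INDEX_FILE", os.path.join(git_dir, "index"))


def object_path(sha1, env=None):
    """Where a new object would be written (never into an alternate)."""
    env = os.environ if env is None else env
    git_dir = env.get("GIT_DIR", ".git")
    objdir = env.get("GIT_OBJECT_DIRECTORY", os.path.join(git_dir, "objects"))
    return os.path.join(objdir, sha1[:2], sha1[2:])


def search_paths(sha1, env=None):
    """Every place an existing object might be read from, primary first."""
    env = os.environ if env is None else env
    alternates = env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES", "")
    extra = [os.path.join(d, sha1[:2], sha1[2:])
             for d in alternates.split(":") if d]
    return [object_path(sha1, env)] + extra
```

Passing an explicit `env` dictionary instead of reading `os.environ` keeps the helpers easy to try out without touching the real environment.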

git Commits

GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
GIT_AUTHOR_DATE
GIT_COMMITTER_NAME
GIT_COMMITTER_EMAIL
see git-commit-tree(1)

git Diffs

GIT_DIFF_OPTS
GIT_EXTERNAL_DIFF
see the "generating patches" section in git-diff-cache(1), git-diff-files(1) and git-diff-tree(1)

Discussion

Cogito and GIT: Quick Introduction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This document describes the Cogito VCS as well as GIT. GIT itself is merely an extremely fast and flexible filesystem-based database designed to store directory trees with regard to their history. The top layer is an SCM-like tool, Cogito, which enables human beings to work with the database in a manner somewhat similar to other SCM tools (like CVS, BitKeeper or Monotone).

The Cogito Version Control System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cogito is a version control system layered on top of the git tree history storage system. We shall first describe some quick ways to get started using Cogito, then go over each available command one by one.

Quickstart

Downloading Cogito From Scratch

Cogito can be obtained as a tarball from

http://www.kernel.org/pub/software/scm/cogito/

Download and unpack the latest version, build with make, put the executables somewhere in your $PATH (or add your Cogito directory itself to your $PATH), and you're ready to go!

The following tools are required by Cogito:

bash, basic shell environment (sed, grep, textutils, mktemp, ...)
diff, patch
merge (e.g. from the RCS package)
libcurl

The following tools are optional but strongly recommended:

libcrypto (OpenSSL)
rsync

Starting a Fresh GIT Repository

If you want to start your own project using Cogito, there are two basic ways to do this. You may start a fresh repository with no files in it, or you may take an existing directory tree and turn it into a GIT repository.

Starting an Empty Repository

To create a new repository with no files in it, cd into an empty directory, and give the following command:

$ cg-init

Your editor will start up, and you will be asked to type in the initial commit description. Type something cute, and exit your editor.

That's it! You're now in your own GIT repository. Notice there is now a .git directory. Go into it and look around, but don't change anything in there. That's what Cogito commands are for.

Turning an Existing Directory Into a Repository

If you have a directory full of files, you can easily turn this into a GIT repository. In fact, it is virtually the same as starting an empty repository. Just cd into the directory you want converted into a GIT repository, and give the following command:

$ cg-init

Your editor starts up, you type in an initial commit message, exit your editor, and you're good to go. All of the files and directories within that directory are now part of a GIT archive.

Accessing Someone Else's GIT Repository

Creating the Repository

If you want to get started tracking an outside GIT repository, you first must have Cogito's executables on your $PATH. Next, you need the URL (or local directory path) of the repository you want to track. You can't just use the URL of a tarball, like the one given above for the Cogito source. The URL must point specifically to a .git directory somewhere. For instance, the URL for Cogito's self-hosting repository is

rsync://rsync.kernel.org/pub/scm/cogito/cogito.git

Notice that the final directory, cogito.git, is not called .git. That is fine. It still has the same content as your .git directory.

To clone the repository to your local filesystem, use the cg-clone command. cg-clone can be told to create a new directory for your repository, or to drop the repository into the current directory.

To have a new directory created, just invoke cg-clone with the URL. You can also include the directory in the command to specify exactly what the new directory should be called, as follows:

$ cg-clone rsync://rsync.kernel.org/pub/scm/cogito/cogito.git cogitodir

You will see a whole bunch of output, and when it is over there will be a new directory called cogitodir (or whatever name you chose) in the current directory. cd into it. Because we used the Cogito URL, you will see the Cogito source tree, with its own .git directory keeping track of everything.

If, instead, you want to clone the repository to the current directory, first make sure you are in an empty directory. Then give the following command:

$ cg-clone -s rsync://rsync.kernel.org/pub/scm/cogito/cogito.git

When you get your prompt back, do an ls to see the source tree and .git directory.

Tracking Others' Work

Of course, once you have cloned a repository, you don't just want to leave it at that. The upstream sources are constantly being updated, and you want to follow these updates. To do this, cd into the working tree directory (not the .git directory, but the directory that contains the .git directory), and give the following command:

$ cg-update

You don't use a URL anymore. Cogito knows which tree you're tracking, because this information is stored in the .git directory. The above command will track the origin branch, which represents the repository you originally cloned. But cg-update can also be used to track specific branches. See below for more discussion of branches, and how to track them.

When you give the above cg-update command, it performs two actions. First, it pulls all new changes from the upstream repository into your local repository. At that point, the changes exist in your local repository as part of the project history. The changes themselves are not yet visible in the files you see, but reside in the .git directory, downloaded and ready to be merged somewhere. The second thing cg-update does is merge these changes into the files you see and work with. The end result is that, when cg-update has finished, you will see all the upstream changes reflected in your local files, and the .git directory will be aware of the history of those changes as well.

It may be that you want to be aware of the history of the upstream work, but you don't yet want those changes merged with your own local files. To do this, give the following command:

$ cg-pull

This does the first part of cg-update's job, but skips the second part. Now your local files have not been changed, but your .git directory has been updated with the history of all the changes that have occurred in the upstream sources.

Using cg-pull is useful for a variety of purposes, for instance if you want to construct a diff against the latest version of the upstream sources, but don't want those changes to disturb your ongoing work. cg-pull will update your .git directory with the history you need to construct your diff, without merging that history into your tree and potentially breaking your changes.

Typically, if you are not making changes to the project yourself, but just want the latest version of a given project for your own use, you would use cg-update. cg-pull is strictly for development work.

Once you've done a cg-pull, you may decide you want to merge after all. In this case a cg-update command will do the trick; however, you will also update your local files with any further upstream changes that have occurred since your cg-pull. The alternative and much more powerful way is to use the cg-merge command, which we shall describe later.

Other Stuff

If there are any changes, two IDs will be printed (on the line saying "Tree change"). Pass those as parameters to cg-diff and you will get a diff describing the changes since the last time you pulled. You can also run

$ cg-diff -r origin:HEAD

which will show changes between the cloned branch and your current branch (should you make any modifications).

Note that you can also access Linus' official branch by adding it with the command

$ cg-branch-add name rsyncurl

(the rsyncurl can have a fragment part identifying a branch inside of the repository accessible over rsync). Then you can specify the name to cg-update and cg-pull, or use it anywhere where you could use the "origin" name.

When you do some local changes, you can do

$ cg-diff

to display them. Of course you will want to commit. If you added any new files, do

$ cg-add newfile1 newfile2 ...

first. Then examine your changes with cg-diff, or just show which files you changed with

$ cg-status

and feel free to commit with the

$ cg-commit

command, which will present you with the editor of your choice for composing the commit message.

It is nice to be able to examine the commit history. We have a tool for that too.

$ cg-log -r origin

will get you the history of my branch. cg-log with no arguments will default to the history of the current branch. Try prepending the "-c" and "-f" options.

Note that we missed out a lot of stuff here. There is already support for merging (cg-merge), moving your tree to an older commit (cg-seek), etc.

Getting Help

Cogito commands come with their own helpful documentation. To get help on cg-pull, for example, give this command:

$ cg-pull --help

or, for the same information, try this:

$ cg-help cg-pull

The "core GIT"
~~~~~~~~~~~~~~

GIT - the stupid content tracker

"git" can mean anything, depending on your mood.

Git is a stupid (but extremely fast) directory content manager. It doesn't do a whole lot, but what it _does_ do is track directory contents efficiently.

There are two object abstractions: the "object database", and the "current directory cache" aka "index".

The Object Database

The object database is literally just a content-addressable collection of objects. All objects are named by their content, which is approximated by the SHA1 hash of the object itself. Objects may refer to other objects (by referencing their SHA1 hash), and so you can build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is determined at object creation time, and which identifies the format of the object (i.e. how it is used, and how it can refer to other objects). There are currently five different object types: "blob", "tree", "commit", "tag" and "delta".

A "blob" object cannot refer to any other object, and is, as the tag implies, a pure storage object containing some user data. It is used to actually store the file data, i.e. a blob object is associated with some particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a directory structure. In addition, a tree object can refer to other tree objects, thus creating a directory hierarchy.

A "commit" object ties such directory hierarchies together into a DAG of revisions - each "commit" is associated with exactly one tree (the directory hierarchy at the time of the commit). In addition, a "commit" refers to one or more "parent" commit objects that describe the history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root" object, and is the point of an initial project commit. Each project must have at least one root, and while you can tie several different root objects together into one project by creating a commit object which has two or more separate roots as its ultimate parents, that's probably just going to confuse people. So aim for the notion of "one root object per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other objects. It contains the identifier and type of another object, a symbolic name (of course!) and, optionally, a signature.

A "delta" object is used internally by the object database to minimise disk usage. Instead of storing the entire contents of a revision, git can behave in a similar manner to RCS et al and simply store a delta.

Regardless of object type, all objects share the following characteristics: they are all deflated with zlib, and have a header that not only specifies their tag, but also provides size information about the data in the object. It's worth noting that the SHA1 hash that is used to name the object is the hash of the original data or the delta. (Historical note: in the dawn of the age of git the hash was the sha1 of the _compressed_ object.)

As a result, the general consistency of an object can always be tested independently of the contents or the type of the object: all objects can be validated by verifying that (a) their hashes match the content of the file and (b) the object successfully inflates to a stream of bytes that forms a sequence of <ascii tag without space> + <space> + <ascii decimal size> + <byte\0> + <binary object data>.
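
The two checks (a) and (b) can be sketched in Python with the standard hashlib and zlib modules. This is a minimal illustration, not the code git itself uses: the function names are made up for the example, and it assumes the hash is taken over the inflated byte stream in the layout given above, header included.

```python
import hashlib
import zlib


def encode_object(obj_type, data):
    """<ascii tag> + <space> + <ascii decimal size> + <byte\\0> + <data>."""
    header = obj_type.encode("ascii") + b" " + str(len(data)).encode("ascii")
    return header + b"\0" + data


def object_name(obj_type, data):
    """SHA1 of the uncompressed stream (assumed to include the header)."""
    return hashlib.sha1(encode_object(obj_type, data)).hexdigest()


def deflate_object(obj_type, data):
    """The bytes that would be stored on disk for this object."""
    return zlib.compress(encode_object(obj_type, data))


def validate_object(expected_sha1, deflated):
    """Check (b) the stream inflates and parses, then (a) the hash matches."""
    raw = zlib.decompress(deflated)
    header, sep, body = raw.partition(b"\0")
    tag, space, size = header.partition(b" ")
    if not sep or not space or not tag or not size.isdigit():
        raise ValueError("malformed object header")
    if int(size) != len(body):
        raise ValueError("size in header does not match the data")
    if hashlib.sha1(raw).hexdigest() != expected_sha1:
        raise ValueError("hash does not match the object name")
    return tag.decode("ascii"), body


name = object_name("blob", b"hello\n")
print(name, validate_object(name, deflate_object("blob", b"hello\n")))
```

Note that nothing in `validate_object` depends on the object type: the same superficial check applies to blobs, trees, commits and tags alike, which is the point the paragraph above makes.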

The structured objects can further have their structure and connectivity to other objects verified. This is generally done with the "git-fsck-cache" program, which generates a full dependency graph of all objects, and verifies their internal consistency (in addition to just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object

A "blob" object is nothing but a binary blob of data, and doesn't refer to anything else. There is no signature or any other verification of the data, so while the object is consistent (it _is_ indexed by its sha1 hash, so the data itself is certainly correct), it has absolutely no other attributes. No name associations, no permissions. It is purely a blob of data (i.e. normally "file contents").

In particular, since the blob is entirely defined by its data, if two files in a directory tree (or in multiple different versions of the repository) have the same contents, they will share the same blob object. The object is totally independent of its location in the directory tree, and renaming a file does not change the object that file is associated with in any way.

A blob is created with git-write-blob and its data can be accessed by git-cat-file.

Tree Object

The next hierarchical object type is the "tree" object. A tree object is a list of mode/name/blob data, sorted by name. Alternatively, the mode data may specify a directory mode, in which case instead of naming a blob, that name is associated with another TREE object.

Like the "blob" object, a tree object is uniquely determined by the set contents, and so two separate but identical trees will always share the exact same object. This is true at all levels, i.e. it's true for a "leaf" tree (which does not refer to any other trees, only blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it has no history, no signatures, no verification of validity, except that since the contents are again protected by the hash itself, we can trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you can trust the contents of a blob, but you don't know where those contents _came_ from.

Side note on trees: since a "tree" object is a sorted list of "filename+content", you can create a diff between two trees without actually having to unpack two trees. Just ignore all common parts, and your diff will look right. In other words, you can effectively (and efficiently) tell the difference between any two random trees by O(n) where "n" is the size of the difference, rather than the size of the tree.

Side note 2 on trees: since the name of a "blob" depends entirely and exclusively on its contents (i.e. there are no names or permissions involved), you can see trivial renames or permission changes by noticing that the blob stayed the same. However, renames with data changes need a smarter "diff" implementation.
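
Both side notes can be illustrated with a small Python sketch. Everything here is a toy model chosen for the example: trees are plain dictionaries held in a `store` keyed by made-up identifiers such as "t_src1", and the helper names `diff_trees` and `find_renames` are hypothetical. The point is only that a subtree whose identifier is unchanged is never opened, and that a trivial rename shows up as a removal and an addition of the same blob.

```python
TREE_MODE = "40000"  # mode used in this sketch to mark a subdirectory entry


def diff_trees(store, old_id, new_id, prefix=""):
    """Yield (path, old_entry, new_entry) for each blob path that differs."""
    if old_id == new_id:
        return  # same name means same content: skip the whole subtree
    old = store[old_id] if old_id else {}
    new = store[new_id] if new_id else {}
    for name in sorted(set(old) | set(new)):
        o, n = old.get(name), new.get(name)
        if o == n:
            continue  # common part, ignored without looking inside
        path = prefix + name
        old_sub = o[1] if o and o[0] == TREE_MODE else None
        new_sub = n[1] if n and n[0] == TREE_MODE else None
        old_blob = o if o and o[0] != TREE_MODE else None
        new_blob = n if n and n[0] != TREE_MODE else None
        if old_blob != new_blob:
            yield (path, old_blob, new_blob)
        if old_sub != new_sub:
            yield from diff_trees(store, old_sub, new_sub, path + "/")


def find_renames(changes):
    """Pair removed and added paths that carry the same blob identifier."""
    removed = {old[1]: path for path, old, new in changes if new is None}
    return [(removed[new[1]], path) for path, old, new in changes
            if old is None and new[1] in removed]


store = {
    "t_doc": {"readme": ("100644", "b3")},
    "t_src1": {"a.c": ("100644", "b1"), "b.c": ("100644", "b2")},
    "t_src2": {"a.c": ("100644", "b1"), "c.c": ("100644", "b2")},
    "root1": {"doc": (TREE_MODE, "t_doc"), "src": (TREE_MODE, "t_src1")},
    "root2": {"doc": (TREE_MODE, "t_doc"), "src": (TREE_MODE, "t_src2")},
}
changes = list(diff_trees(store, "root1", "root2"))
print(changes)
print(find_renames(changes))
```

In the toy data the "doc" subtree is shared by both roots, so the walk never descends into it. A rename that also changes the data would not be paired by `find_renames`, which is where the smarter "diff" implementation mentioned above would be needed.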

A tree is created with git-write-tree and its data can be accessed by git-ls-tree.

Commit Object

The "commit" object is an object that introduces the notion of history into the picture. In contrast to the other objects, it doesn't just describe the physical state of a tree, it describes how we got there, and why.

A "commit" is defined by the tree-object that it results in, the parent commits (zero, one or more) that led up to that point, and a comment on what happened. Again, a commit is not trusted per se: the contents are well-defined and "safe" due to the cryptographically strong signatures at all levels, but there is no reason to believe that the tree is "good" or that the merge information makes sense. The parents do not have to actually have any relationship with the result, for example.

Note on commits: unlike real SCMs, commits do not contain rename information or file mode change information. All of that is implicit in the trees involved (the result tree, and the result trees of the parents), and describing that makes no sense in this idiotic file manager.

A commit is created with git-commit-tree and its data can be accessed by git-cat-file.

Trust

An aside on the notion of "trust". Trust is really outside the scope of "git", but it's worth noting a few things. First off, since everything is hashed with SHA1, you _can_ trust that an object is intact and has not been messed with by external sources. So the name of an object uniquely identifies a known state - just not a state that you may want to trust.

Furthermore, since the SHA1 signature of a commit refers to the SHA1 signatures of the tree it is associated with and the signatures of the parent, a single named commit specifies uniquely a whole set of history, with full contents. You can't later fake any step of the way once you have the name of a commit.
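
The chaining effect can be seen in a small Python sketch. The commit text used here is deliberately simplified (a tree line, parent lines and a message, with no author or committer information), so the names it produces are not those of real commits; `commit_name` is a hypothetical helper for illustration only.

```python
import hashlib


def commit_name(tree, parents, message):
    """Name of a simplified commit: it covers the tree and parent names."""
    text = "tree %s\n" % tree
    text += "".join("parent %s\n" % p for p in parents)
    text += "\n" + message
    data = text.encode("utf-8")
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()


tree = "0" * 40  # placeholder tree name, an assumption of this sketch
root = commit_name(tree, [], "initial\n")
head = commit_name(tree, [root], "second\n")

# Rewrite the first step of history and rebuild the same second step on it.
forged_root = commit_name(tree, [], "forged\n")
forged_head = commit_name(tree, [forged_root], "second\n")
print(head != forged_head)
```

Because the parent's name is part of the data being hashed, any change to an earlier step produces a different name at every later step, which is why a single trusted commit name is enough to pin down the whole history.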

So to introduce some real trust in the system, the only thing you need to do is to digitally sign just _one_ special note, which includes the name of a top-level commit. Your digital signature shows others that you trust that commit, and the immutability of the history of commits tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just sending out a single email that tells the people the name (SHA1 hash) of the top commit, and digitally sign that email using something like GPG/PGP.

To assist in this, git also provides the tag object…

Tag Object

Git provides the "tag" object to simplify creating, managing and exchanging symbolic and signed tokens. The "tag" object at its simplest simply symbolically identifies another object by containing the sha1, type and symbolic name.

However it can optionally contain additional signature information (which git doesn't care about as long as there's less than 8k of it). This can then be verified externally to git.

Note that despite the tag features, "git" itself only handles content integrity; the trust framework (and signature provision and verification) has to come from outside.

A tag is created with git-mktag and its data can be accessed by git-cat-file.

Delta Object

The "delta" object is used internally by the object database to minimise storage usage by using xdeltas (byte level diffs). Deltas can form chains of arbitrary length as RCS does (although this is configurable at creation time). Most operations won't see or even be aware of delta objects, as they are automatically applied and appear as real git objects. In other words, if you write your own routines to look at the contents of the object database then you need to know about this - otherwise you don't. Actually, that's not quite true - one important area where deltas are likely to prove very valuable is in reducing bandwidth loads - so the more sophisticated network tools for git repositories will be aware of them too.

Finally, git repositories can (and must) be deltafied in the background - the work to calculate the differences does not take place automatically at commit time.

A delta can be created (or undeltafied) with git-mkdelta; its raw data cannot be accessed at present.

The "index" aka "Current Directory Cache"

The index is a simple binary file, which contains an efficient representation of a virtual directory content at some random time. It does so by a simple array that associates a set of names, dates, permissions and content (aka "blob") objects together. The cache is always kept ordered by name, and names are unique (with a few very specific rules) at any point in time, but the cache has no long-term meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with the current directory contents (in fact, most operations will depend on different ways to make the index _not_ be consistent with the directory hierarchy), but it has three very important attributes:

(a) it can re-generate the full state it caches (not just the directory structure: it contains pointers to the "blob" objects so that it can regenerate the data too)

As a special case, there is a clear and unambiguous one-way mapping from a current directory cache to a "tree object", which can be efficiently created from just the current directory cache without actually looking at any other data. So a directory cache at any one time uniquely specifies one and only one "tree" object (but has additional data to make it easy to match up that tree object with what has happened in the directory)

(b) it has efficient methods for finding inconsistencies between that cached state ("tree object waiting to be instantiated") and the current state.

(c) it can additionally efficiently represent information about merge conflicts between different tree objects, allowing each pathname to be associated with sufficient information about the trees involved that you can create a three-way merge between them.

Those are the three ONLY things that the directory cache does. It's a cache, and the normal operation is to re-generate it completely from a known tree object, or update/compare it with a live tree that is being developed. If you blow the directory cache away entirely, you generally haven't lost any information as long as you have the name of the tree that it described.

At the same time, the directory index is also the staging area for creating new trees, and creating a new tree always involves a controlled modification of the index file. In particular, the index file can have the representation of an intermediate tree that has not yet been instantiated. So the index can be thought of as a write-back cache, which can contain dirty information that has not yet been written back to the backing store.
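
The one-way mapping from a directory cache to a tree object can be sketched as follows. This is a toy model under stated assumptions: the index is reduced to a dictionary of path to (mode, blob identifier), tree names are hashes of a made-up text listing rather than git's real tree format, and `write_tree` is a hypothetical stand-in for what git-write-tree does.

```python
import hashlib


def write_tree(index):
    """Turn a flat index into nested trees; return (root name, store)."""
    store = {}

    def build(entries):
        here, subdirs = {}, {}
        for path, mode, oid in entries:
            head, sep, rest = path.partition("/")
            if sep:  # entry lives in a subdirectory: defer to a subtree
                subdirs.setdefault(head, []).append((rest, mode, oid))
            else:
                here[head] = (mode, oid)
        for name, sub in subdirs.items():
            here[name] = ("40000", build(sub))
        # Toy serialisation, sorted by name; not git's on-disk tree format.
        listing = "".join("%s %s %s\n" % (mode, name, oid)
                          for name, (mode, oid) in sorted(here.items()))
        tree_id = hashlib.sha1(listing.encode("utf-8")).hexdigest()
        store[tree_id] = here
        return tree_id

    flat = [(path, mode, oid) for path, (mode, oid) in sorted(index.items())]
    return build(flat), store


index = {
    "README": ("100644", "b0"),
    "a/x": ("100644", "b1"),
    "b/x": ("100644", "b1"),
}
root, store = write_tree(index)
print(root, len(store))
```

Only the index is consulted: no file in the working directory is read. In the toy data the directories "a" and "b" have identical contents, so they end up as one shared tree in `store`.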

The Workflow

Generally, all "git" operations work on the index file. Some operations work purely on the index file (showing the current state of the index), but most operations move data to and from the index file, either from the database or from the working directory. Thus there are four main combinations:

1) working directory -> index

You update the index with information from the working directory with the git-update-cache command. You generally update the index information by just specifying the filename you want to update, like so:

git-update-cache filename

but to avoid common mistakes with filename globbing etc, the command will not normally add totally new entries or remove old entries, i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no longer exist in the archive, or that new files should be added, you should use the "--remove" and "--add" flags respectively.

NOTE! A "--remove" flag does _not_ mean that subsequent filenames will necessarily be removed: if the files still exist in your directory structure, the index will be updated with their new status, not removed. The only thing "--remove" means is that update-cache will be considering a removed file to be a valid thing, and if the file really does not exist any more, it will update the index accordingly.

As a special case, you can also do "git-update-cache --refresh", which will refresh the "stat" information of each index entry to match the current stat information. It will _not_ update the object status itself, and it will only update the fields that are used to quickly test whether an object still matches its old backing store object.

2) index -> object database

You write your current index file to a "tree" object with the program

git-write-tree

that doesn't come with any options - it will just write out the current index into the set of tree objects that describe that state, and it will return the name of the resulting top-level tree. You can use that tree to re-generate the index at any time by going in the other direction:

3) object database -> index

You read a "tree" file from the object database, and use that to populate (and overwrite - don't do this if your index contains any unsaved state that you might want to restore later!) your current index. Normal operation is just

git-read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved earlier. However, that is only your _index_ file: your working directory contents have not been modified.

4) index -> working directory

You update your working directory from the index by "checking out" files. This is not a very common operation, since normally you'd just keep your files updated, and rather than write to your working directory, you'd tell the index files about the changes in your working directory (i.e. "git-update-cache").

However, if you decide to jump to a new version, or check out somebody else's version, or just restore a previous tree, you'd populate your index file with read-tree, and then you need to check out the result with

git-checkout-cache filename

or, if you want to check out all of the index, use "-a".

NOTE! git-checkout-cache normally refuses to overwrite old files, so if you have an old version of the tree already checked out, you will need to use the "-f" flag (_before_ the "-a" flag or the filename) to _force_ the checkout.
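
As a rough sketch of the sequence just described, the two steps can be wrapped in a small shell function. The name switch_to_tree is made up for this illustration and is not part of git. The function assumes the git commands are on your PATH and that you really do want to overwrite both the index and any files already checked out.

```shell
# Hypothetical helper (not a git command): make the index and the
# working directory match the tree named by $1.
switch_to_tree() {
    # object database -> index (overwrites the current index)
    git-read-tree "$1" || return 1
    # index -> working directory; "-f" must come before "-a"
    git-checkout-cache -f -a
}
```

You would call it as "switch_to_tree <sha1 of tree>".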

Finally, there are a few odds and ends which are not purely moving from one representation to the other:

5) Tying it all together

To commit a tree you have instantiated with "git-write-tree", you'd create a "commit" object that refers to that tree and the history behind it - most notably the "parent" commits that preceded it in history.

Normally a "commit" has one parent: the previous state of the tree before a certain change was made. However, sometimes it can have two or more parent commits, in which case we call it a "merge", due to the fact that such a commit brings together ("merges") two or more previous states represented by other commits.

In other words, while a "tree" represents a particular directory state of a working directory, a "commit" represents that state in "time", and explains how we got there.

You create a commit object by giving it the tree that describes the state at the time of the commit, and a list of parents:

git-commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin (either through redirection from a pipe or file, or by just typing it at the tty).

git-commit-tree will return the name of the object that represents that commit, and you should save it away for later use. Normally, you'd commit a new "HEAD" state, and while git doesn't care where you save the note about that state, in practice we tend to just write the result to the file ".git/HEAD", so that we can always see what the last committed state was.
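
The steps above can be strung together roughly as follows. This is only a sketch: commit_index is a hypothetical helper name, and the function assumes that you follow the ".git/HEAD" convention and that the file already names a previous commit (so it will not work for the very first commit, which has no parent). The commit message is read from stdin, as git-commit-tree expects.

```shell
# Hypothetical helper (not a git command): commit the current index
# on top of the commit named in .git/HEAD. Message comes from stdin.
commit_index() {
    tree=$(git-write-tree) || return 1        # index -> tree object
    parent=$(cat .git/HEAD) || return 1       # last committed state
    # stdin of this function is passed through to git-commit-tree
    commit=$(git-commit-tree "$tree" -p "$parent") || return 1
    echo "$commit" > .git/HEAD                # save the new top commit
    echo "$commit"
}
```

For example, 'echo "Describe the change" | commit_index' would print the name of the new commit object.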

6) Examining the data

You can examine the data represented in the object database and the index with various helper tools. For every object, you can use git-cat-file to examine details about the object:

git-cat-file -t <objectname>

shows the type of the object, and once you have the type (which is usually implicit in where you find the object), you can use

git-cat-file blob|tree|commit <objectname>

to show its contents. NOTE! Trees have binary content, and as a result there is a special helper for showing that content, called "git-ls-tree", which turns the binary content into a more easily readable form.
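
A minimal sketch of that lookup, combining the type query with the right display tool. The name show_object is hypothetical and not part of git; the function simply follows the rule above that trees go through git-ls-tree and everything else through git-cat-file.

```shell
# Hypothetical helper (not a git command): print an object in
# readable form, whatever its type.
show_object() {
    objtype=$(git-cat-file -t "$1") || return 1
    case "$objtype" in
        # trees have binary content, so use the special helper
        tree) git-ls-tree "$1" ;;
        # blobs and commits can be shown directly
        *)    git-cat-file "$objtype" "$1" ;;
    esac
}
```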

It's especially instructive to look at "commit" objects, since those tend to be small and fairly self-explanatory. In particular, if you follow the convention of having the top commit name in ".git/HEAD", you can do

git-cat-file commit $(cat .git/HEAD)

to see what the top commit was.

7) Merging multiple trees

Git helps you do a three-way merge, which you can expand to n-way by repeating the merge procedure an arbitrary number of times until you finally "commit" the state. The normal situation is that you'd only do one three-way merge (two parents), and commit it, but if you like, you can do multiple parents in one go.

To do a three-way merge, you need the two "commit" objects that you want to merge. You use those to find the closest common parent (a third "commit" object), and then use all three commit objects to find the state of the directory ("tree" object) at each of these points.

To get the "base" for the merge, you first look up the common parent of two commits with

git-merge-base <commit1> <commit2>

which will return the commit they are both based on. You should now look up the "tree" objects of those commits, which you can easily do with (for example)

git-cat-file commit <commitname> | head -1

since the tree object information is always the first line in a commit object.
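
If you want just the tree name rather than the whole first line, a small filter will do. This is a sketch only: tree_of_commit_text is a hypothetical name, and it relies on the first line of a commit object having the form "tree <sha1>".

```shell
# Hypothetical helper (not a git command): read the text of a commit
# object on stdin and print the name of its tree.
tree_of_commit_text() {
    # keep the first line, then drop the leading "tree " keyword
    head -n 1 | sed 's/^tree //'
}
```

It would be used as "git-cat-file commit <commitname> | tree_of_commit_text".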

Once you know the three trees you are going to merge (the one "original" tree, aka the common case, and the two "result" trees, aka the branches you want to merge), you do a "merge" read into the index. This will throw away your old index contents, so you should make sure that you've committed those - in fact you would normally always do a merge against your last commit (which should thus match what you have in your current index anyway).

To do the merge, do

git-read-tree -m <origtree> <target1tree> <target2tree>

which will do all trivial merge operations for you directly in the index file, and you can just write the result out with "git-write-tree".

NOTE! Because the merge is done in the index file, and not in your working directory, your working directory will no longer match your index. You can use "git-checkout-cache -f -a" to make the effect of the merge be seen in your working directory.

NOTE2! Sadly, many merges aren't trivial. If there are files that have been added, moved or removed, or if both branches have modified the same file, you will be left with an index tree that contains "merge entries" in it. Such an index tree can _NOT_ be written out to a tree object, and you will have to resolve any such merge clashes using other tools before you can write out the result.
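
Putting the merge steps of this section together gives something like the sketch below. The names tree_of and merge_commits are hypothetical and not part of git. The function only handles the trivial case: if merge entries are left in the index, git-write-tree is expected to fail and you must resolve the clashes yourself. It also leaves your working directory alone, so you would still run "git-checkout-cache -f -a" afterwards.

```shell
# Hypothetical helpers (not git commands).

# Print the tree name recorded in the commit named by $1.
tree_of() {
    git-cat-file commit "$1" | head -n 1 | sed 's/^tree //'
}

# Merge the commits $1 and $2 in the index and print the name of
# the resulting tree. Throws away the old index contents.
merge_commits() {
    base=$(git-merge-base "$1" "$2") || return 1
    orig=$(tree_of "$base")
    target1=$(tree_of "$1")
    target2=$(tree_of "$2")
    # do the trivial merge operations directly in the index
    git-read-tree -m "$orig" "$target1" "$target2" || return 1
    # write out the result; this step cannot succeed while
    # "merge entries" remain in the index
    git-write-tree
}
```

The tree name it prints could then be handed to git-commit-tree with both commits given as parents.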

Author

Written by Linus Torvalds <torvalds@osdl.org> and the git-list <git@vger.kernel.org>.

Documentation

Documentation by David Greaves, Junio C Hamano and the git-list <git@vger.kernel.org>.

GIT

Part of the git(7) suite