gitsubmodules — mounting one repository inside another

Synopsis

.gitmodules, $GIT_DIR/config
git submodule
git <command> --recurse-submodules

Description

A submodule is a repository embedded inside another repository. The submodule has its own history; the repository it is embedded in is called a superproject.

On the filesystem, a submodule usually (but not always; see Forms below) consists of (i) a Git directory located under the $GIT_DIR/modules/ directory of its superproject, (ii) a working directory inside the superproject’s working directory, and (iii) a .git file at the root of the submodule’s working directory pointing to (i).

Assuming the submodule has a Git directory at $GIT_DIR/modules/foo/ and a working directory at path/to/bar/, the superproject tracks the submodule via a gitlink entry in the tree at path/to/bar and an entry in its .gitmodules file (see gitmodules(5)) of the form submodule.foo.path = path/to/bar.

The gitlink entry contains the object name of the commit that the superproject expects the submodule’s working directory to be at.

The section submodule.foo.* in the .gitmodules file gives additional hints to Git’s porcelain layer. For example, the submodule.foo.url setting specifies where to obtain the submodule.
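The layout described above can be inspected directly. The following is a minimal sketch, assuming a throwaway setup: the repository names lib and super, the demo identity, and the temporary directory are all illustrative, and protocol.file.allow=always is passed only because recent Git versions refuse to clone a submodule from a local path otherwise.

```shell
#!/bin/sh
# Sketch: build a scratch superproject with one submodule and inspect it.
# All names (lib, super, foo, path/to/bar) are illustrative assumptions.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A tiny repository that plays the role of the submodule's upstream.
git init -q lib
git -C lib commit -q --allow-empty -m "initial commit"

# The superproject; --name chooses the name used under $GIT_DIR/modules/.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add --name foo "$tmp/lib" path/to/bar
git commit -q -m "add submodule foo at path/to/bar"

# (i) the submodule's Git directory lives inside the superproject's Git directory
test -d .git/modules/foo
# the .git file in the submodule's working directory points back to (i)
gitfile=$(cat path/to/bar/.git)
# the gitlink entry in the superproject's tree (mode 160000, type commit)
gitlink=$(git ls-tree HEAD path/to/bar)
# the .gitmodules entry that maps the name foo to its path
modpath=$(git config -f .gitmodules submodule.foo.path)

printf '%s\n' "$gitfile" "$gitlink" "$modpath"
```

The gitlink line starts with 160000 commit, followed by the object name of the commit the superproject expects the submodule to be at.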

Submodules can be used for at least two different use cases:

  1. Using another project while maintaining independent history. Submodules let you keep the working tree of another project within your own working tree while keeping the history of both projects separate. Also, since a submodule is fixed to a specific version chosen by the superproject, the other project can be developed independently without affecting the superproject, and the superproject can move to a new version only when desired.
  2. Splitting a (logically single) project into multiple repositories and tying them back together. This can be used to work around current limitations of Git’s implementation and to gain finer-grained access control:

    • Size of the Git repository: In its current form Git scales poorly for large repositories containing content that is not compressed by delta computation between trees. For example, you can use submodules to hold large binary assets, and these repositories can be cloned shallowly so that you do not keep a large history locally.
    • Transfer size: In its current form Git requires the whole working tree to be present. It does not allow partial trees to be transferred in fetch or clone. If the project you work on consists of multiple repositories tied together as submodules in a superproject, you can avoid fetching the working trees of the repositories you are not interested in.
    • Access control: Restricting user access to individual submodules lets you implement read/write policies for different users.

The Configuration of Submodules

Submodule operations can be configured using the following mechanisms (from highest to lowest precedence):

Forms

Submodules can take the following forms:

Active Submodules

A submodule is considered active,

  1. if submodule.<name>.active is set to true

    or

  2. if the submodule’s path matches the pathspec in submodule.active

    or

  3. if submodule.<name>.url is set.

and these are evaluated in this order.

For example:

[submodule "foo"]
  active = false
  url = https://example.org/foo
[submodule "bar"]
  active = true
  url = https://example.org/bar
[submodule "baz"]
  url = https://example.org/baz

In the above config only the submodules bar and baz are active: bar due to (1) and baz due to (3). foo is inactive because (1) takes precedence over (3).

Note that (3) is a historical artefact and is ignored if (1) or (2) specifies that the submodule is not active. In other words, if submodule.<name>.active is set to false, or if the submodule’s path is excluded by the pathspec in submodule.active, it does not matter whether the url is present. This is illustrated in the example that follows.

[submodule "foo"]
  active = true
  url = https://example.org/foo
[submodule "bar"]
  url = https://example.org/bar
[submodule "baz"]
  url = https://example.org/baz
[submodule "bob"]
  ignore = true
[submodule]
  active = b*
  active = :(exclude) baz

Here all submodules except baz (that is, foo, bar and bob) are active: foo because of its own active flag, and bar and bob because of the submodule.active pathspec, which makes any submodule whose path starts with b, except baz, active regardless of whether a .url field is present.
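Settings like these can also be written with git config rather than by editing the file. The sketch below records most of the second example in a scratch repository; it only stores the values and does not ask Git which submodules end up active. The repository location is a temporary directory, and the exclude pattern is written without a space after the magic signature, which is an assumption about the form the pathspec parser expects.

```shell
#!/bin/sh
# Sketch: store the example's activity settings in a scratch repository.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# (1) per-submodule flag
git config submodule.foo.active true
# (3) URLs on their own, the historical fallback
git config submodule.bar.url https://example.org/bar
git config submodule.baz.url https://example.org/baz
# (2) submodule.active is multi-valued; --add appends instead of overwriting
git config --add submodule.active 'b*'
git config --add submodule.active ':(exclude)baz'

# Read the values back.
patterns=$(git config --get-all submodule.active)
foo_active=$(git config --type=bool submodule.foo.active)
printf '%s\n' "$patterns" "$foo_active"
```

Using --add matters here: a plain git config submodule.active on a key that already has a value would replace it instead of adding a second pattern.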

Workflow for a Third Party Library

# add a submodule
git submodule add <url> <path>
# occasionally update the submodule to a new version:
git -C <path> checkout <new version>
git add <path>
git commit -m "update submodule to new version"
# See the list of submodules in a superproject
git submodule status
# See FORMS on removing submodules

Workflow for an Artificially Split Repo

# Enable recursion for relevant commands, such that
# regular commands recurse into submodules by default
git config --global submodule.recurse true
# Unlike the other commands below, clone still needs
# its own recurse flag:
git clone --recurse-submodules <URL> <directory>
cd <directory>
# Get to know the code:
git grep foo
git ls-files
# Get new code
git fetch
git pull --rebase
# change worktree
git checkout
git reset

Implementation Details

When cloning or pulling a repository containing submodules, the submodules are not checked out by default; you can instruct clone to recurse into submodules. The init and update subcommands of git submodule keep submodules checked out and at an appropriate revision in your working tree. Alternatively, you can set submodule.recurse so that checkout recurses into submodules.
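The default behaviour can be observed with a plain clone followed by an explicit update. This is a minimal sketch under assumed names: lib, super and plain are scratch repositories in a temporary directory, and protocol.file.allow=always is set only so that the submodule can be cloned from a local path on recent Git versions.

```shell
#!/bin/sh
# Sketch: a plain clone leaves the submodule unpopulated until it is updated.
# All repository names are illustrative assumptions.
set -e
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Upstream of the submodule, and a superproject that embeds it at lib/.
git init -q lib
git -C lib commit -q --allow-empty -m "initial commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C super commit -q -m "add submodule lib"

# Clone without recursing: the submodule is recorded but not checked out.
git clone -q "$tmp/super" plain
cd plain
before=$(git submodule status)

# Initialize and check out the recorded commit.
git -c protocol.file.allow=always submodule --quiet update --init
after=$(git submodule status)

# The first character of each status line is the state flag.
before_flag=$(printf '%s' "$before" | cut -c1)
after_flag=$(printf '%s' "$after" | cut -c1)
printf '%s\n%s\n' "$before" "$after"
```

In the status output a leading - marks a submodule that has not been initialized, and a leading space marks one checked out at the commit recorded in the superproject.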

See Also

git-submodule(1), gitmodules(5).

Git

Part of the git(1) suite

11/04/2019 Git 2.24.0 Git Manual