mount_namespaces — overview of Linux mount namespaces
For an overview of namespaces, see namespaces(7).
Mount namespaces provide isolation of the list of mount points seen by the processes in each namespace instance. Thus, the processes in each of the mount namespace instances will see distinct single-directory hierarchies.
The views provided by the /proc/[pid]/mounts, /proc/[pid]/mountinfo, and /proc/[pid]/mountstats files (all described in proc(5)) correspond to the mount namespace in which the process with the PID [pid] resides. (All of the processes that reside in the same mount namespace will see the same view in these files.)
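As a rough illustration of inspecting this per-namespace view, the following C sketch extracts the mount ID, parent ID, and mount point from lines in the mountinfo format described in proc(5). The names struct mount_entry, parse_mountinfo_line(), and print_mounts() are hypothetical helpers introduced only for this example, and the 255-byte limit on the mount point is an arbitrary assumption.

```c
#include <stdio.h>

/* Hypothetical record for the fields this sketch extracts. */
struct mount_entry {
    int  mount_id;
    int  parent_id;
    char mount_point[256];  /* assumed limit; longer paths are truncated */
};

/*
 * Parse one mountinfo line. The leading fields are:
 *   mount-ID parent-ID major:minor root mount-point ...
 * Returns 0 on success, -1 if the line does not match.
 * Note: the kernel escapes spaces in paths as \040; this sketch
 * leaves such escapes undecoded.
 */
int parse_mountinfo_line(const char *line, struct mount_entry *out)
{
    int n = sscanf(line, "%d %d %*s %*s %255s",
                   &out->mount_id, &out->parent_id, out->mount_point);
    return n == 3 ? 0 : -1;
}

/*
 * Print the mounts listed in the given file (for example,
 * "/proc/self/mountinfo"). Returns the number of entries printed,
 * or -1 if the file cannot be opened. Lines longer than the buffer
 * are not handled specially in this sketch.
 */
int print_mounts(const char *path)
{
    char line[4096];
    struct mount_entry e;
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_mountinfo_line(line, &e) == 0) {
            printf("%d %d %s\n", e.mount_id, e.parent_id, e.mount_point);
            count++;
        }
    }
    fclose(fp);
    return count;
}
```

Calling print_mounts("/proc/self/mountinfo") from two processes in the same mount namespace should list the same mounts, while processes in different mount namespaces may see different lists.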
A new mount namespace is created using either clone(2) or unshare(2) with the CLONE_NEWNS flag. When a new mount namespace is created, its mount point list is initialized as follows:
- If the namespace is created using clone(2), the mount point list of the child's namespace is a copy of the mount point list in the parent's namespace.
- If the namespace is created using unshare(2), the mount point list of the new namespace is a copy of the mount point list in the caller's previous mount namespace.
Subsequent modifications to the mount point list (mount(2) and umount(2)) in either mount namespace will not (by default) affect the mount point list seen in the other namespace (but see the following discussion of shared subtrees).
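The unshare(2) case can be sketched in C as follows. This is a minimal illustration, not a complete program: unshare_mount_ns() is a hypothetical helper name, and the check relies on the convention documented in namespaces(7) that two /proc/[pid]/ns/mnt links refer to the same namespace when their device and inode numbers match. It assumes that /proc is mounted and that the kernel provides /proc/self/ns/mnt.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/stat.h>

/* Two namespace links name the same namespace if device and inode match. */
static int same_ns(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Move the calling process into a new mount namespace whose mount
 * point list starts as a copy of the caller's previous namespace.
 *
 * Returns  1 if the caller is now in a different mount namespace,
 *          0 if unshare(2) failed (errno is set by unshare(2)),
 *         -1 if the namespace could not be inspected via /proc.
 */
int unshare_mount_ns(void)
{
    struct stat before, after;

    if (stat("/proc/self/ns/mnt", &before) == -1)
        return -1;

    if (unshare(CLONE_NEWNS) == -1)
        return 0;

    if (stat("/proc/self/ns/mnt", &after) == -1)
        return -1;

    return !same_ns(&before, &after);
}
```

A return value of 0 with errno set to EPERM typically means that the caller lacks the CAP_SYS_ADMIN capability that unshare(2) requires for CLONE_NEWNS.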
Restrictions on mount namespaces
Note the following points with respect to mount namespaces:
- Each mount namespace has an owner user namespace. As explained above, when a new mount namespace is created, its mount point list is initialized as a copy of the mount point list of another mount namespace. If the new namespace and the namespace from which the mount point list was copied are owned by different user namespaces, then the new mount namespace is considered less privileged.
- When creating a less privileged mount namespace, shared mounts are reduced to slave mounts. (Shared and slave mounts are discussed below.) This ensures that mount and unmount events performed in less privileged mount namespaces will not propagate to more privileged mount namespaces.
- Mounts that come as a single unit from a more privileged mount namespace are locked together and may not be separated in a less privileged mount namespace. (The unshare(2) CLONE_NEWNS operation brings across all of the mounts from the original mount namespace as a single unit, and recursive mounts that propagate between mount namespaces propagate as a single unit.)
- The mount(2) flags MS_RDONLY, MS_NOSUID, and MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME), become locked when propagated from a more privileged to a less privileged mount namespace, and may not be changed in the less privileged mount namespace.
A file or directory that is a mount point in one namespace but not in another may be renamed, unlinked, or removed (rmdir(2)) in the mount namespace in which it is not a mount point (subject to the usual permission checks). Consequently, the mount point is removed in the mount namespace where it was a mount point.
Previously (before Linux 3.18), attempting to unlink, rename, or remove a file or directory that was a mount point in another mount namespace would result in the error EBUSY. That behavior had technical problems of enforcement (e.g., for NFS) and permitted denial-of-service attacks against more privileged users (i.e., preventing individual files from being updated by bind mounting on top of them).
Mount namespaces first appeared in Linux 2.4.19.
Namespaces are a Linux-specific feature.
The propagation type assigned to a new mount point depends on the propagation type of the parent mount. If the mount point has a parent (i.e., it is a non-root mount point) and the propagation type of the parent is MS_SHARED, then the propagation type of the new mount is also MS_SHARED. Otherwise, the propagation type of the new mount is MS_PRIVATE.
Notwithstanding the fact that the default propagation type for new mount points is in many cases MS_PRIVATE, MS_SHARED is typically more useful. For this reason, systemd(1) automatically remounts all mount points as MS_SHARED on system startup. Thus, on most modern systems, the default propagation type is in practice MS_SHARED.
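To check which propagation type a mount actually has on a given system, one can look at the optional fields of /proc/[pid]/mountinfo, which proc(5) documents as tags such as shared:X, master:X, and unbindable placed before the "-" separator. The following C sketch classifies one mountinfo line. The function name mountinfo_propagation() and the PROP_* constants are hypothetical, and the sketch assumes that fields are separated by single spaces.

```c
#include <string.h>

/* Hypothetical bit flags; a result of 0 means the mount is private. */
#define PROP_SHARED     0x1
#define PROP_SLAVE      0x2
#define PROP_UNBINDABLE 0x4

/*
 * Classify the propagation type recorded in one mountinfo line.
 * Returns a bitmask of PROP_* values (a mount can be both shared
 * and slave), or -1 if the line is malformed.
 */
int mountinfo_propagation(const char *line)
{
    const char *p = line;
    int flags = 0;

    /* Skip the six fixed fields that precede the optional fields. */
    for (int i = 0; i < 6; i++) {
        p = strchr(p, ' ');
        if (p == NULL)
            return -1;
        p++;
    }

    /* Walk the optional fields until the "-" separator is reached. */
    while (*p != '\0') {
        size_t len = strcspn(p, " \n");

        if (len == 1 && p[0] == '-')
            return flags;
        if (strncmp(p, "shared:", 7) == 0)
            flags |= PROP_SHARED;
        else if (strncmp(p, "master:", 7) == 0)
            flags |= PROP_SLAVE;
        else if (len == 10 && strncmp(p, "unbindable", 10) == 0)
            flags |= PROP_UNBINDABLE;

        p += len;
        if (*p != ' ')
            break;      /* end of line without a separator */
        p++;
    }
    return -1;
}
```

On a system where the root mount has been made shared, the line for "/" would be expected to carry a shared:X tag, so this function would report PROP_SHARED for it.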
When unshare(1) is used to create a mount namespace, the goal is commonly to provide full isolation of the mount points in the new namespace. For this reason, unshare(1) (since util-linux version 2.27) reverses the step performed by systemd(1), by making all mount points private in the new namespace. That is, unshare(1) performs the equivalent of the following in the new mount namespace:
mount --make-rprivate /
To prevent this, one can use the --propagation unchanged option to unshare(1).
An application that creates a new mount namespace directly using clone(2) or unshare(2) may want to prevent propagation of mount events to other mount namespaces (as is done by unshare(1)). This can be done by changing the propagation type of mount points in the new namespace to either MS_SLAVE or MS_PRIVATE, using a call such as the following:
mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL);
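A propagation change of this kind can be wrapped in a small helper. The following C sketch is one possible shape, under the assumption that the caller is already in the mount namespace it wants to modify and has the necessary privilege. The name set_propagation() is hypothetical. Note that mount(2) takes five arguments; the source, filesystem type, and data arguments are ignored when only the propagation type is changed, so NULL is passed for them.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Change the propagation type of the mount at 'target'.
 * 'type' must be exactly one of MS_SHARED, MS_PRIVATE, MS_SLAVE,
 * or MS_UNBINDABLE. If 'recursive' is nonzero, MS_REC is added so
 * that all mounts in the subtree under 'target' are changed too.
 *
 * Returns 0 on success, or -1 with errno set.
 */
int set_propagation(const char *target, unsigned long type, int recursive)
{
    if (type != MS_SHARED && type != MS_PRIVATE &&
        type != MS_SLAVE && type != MS_UNBINDABLE) {
        errno = EINVAL;     /* rejected here, before calling mount(2) */
        return -1;
    }

    return mount(NULL, target, NULL,
                 type | (recursive ? MS_REC : 0), NULL);
}
```

With this helper, set_propagation("/", MS_SLAVE, 1) corresponds to the MS_SLAVE | MS_REC call on "/", and set_propagation("/", MS_PRIVATE, 1) corresponds to mount --make-rprivate /.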
For a discussion of propagation types when moving mounts (MS_MOVE) and creating bind mounts (MS_BIND), see Documentation/filesystems/sharedsubtree.txt.
unshare(1), clone(2), mount(2), pivot_root(2), setns(2), umount(2), unshare(2), proc(5), namespaces(7), user_namespaces(7), findmnt(8), pivot_root(8)
Documentation/filesystems/sharedsubtree.txt in the kernel source tree.
This page is part of release 5.04 of the Linux man-pages project. A description of the project, information about reporting bugs, and the latest version of this page, can be found at https://www.kernel.org/doc/man-pages/.