This was a pretty good read. I use containers quite a lot on my server at home to maintain a bunch of utilities. Mostly I'm using systemd-nspawn.
> User namespaces have been around since the 3.12 kernel, but few other container management systems use the feature to isolate their containers. Part of the reason for that is the difficulty in sharing files between containers because of the UID mapping. LXD is currently using shiftfs on Ubuntu systems to translate UIDs between containers and the parts of the host filesystem that are shared with it. Shiftfs is not upstream, however; there are hopes to add similar functionality to the new mount API before long.
This is exactly the problem I've run into. If you're trying to share files between containers or between the host and a container (e.g. with systemd-nspawn's "--bind" option), it's much harder to set your permissions properly and still access the files from inside the container if you're using user namespacing.
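To make the UID problem concrete, here is a minimal sketch of the translation a user namespace applies. It assumes the usual three-column format of /proc/<pid>/uid_map (first ID inside the namespace, first ID outside, range length). The function names and the example range starting at 100000 are my own illustrations, not anything systemd-nspawn or LXD prescribes.

```python
from typing import List, Optional, Tuple

# One entry per line of /proc/<pid>/uid_map:
# (first ID inside the namespace, first ID on the host, length of the range)
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse the contents of a uid_map file into a list of ranges."""
    entries: UidMap = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            entries.append((inside, outside, length))
    return entries


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Return the host UID behind a container UID, or None if unmapped."""
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


def to_container_uid(uid_map: UidMap, host_uid: int) -> Optional[int]:
    """Return the UID a host-owned file shows up as inside the container."""
    for inside, outside, length in uid_map:
        if outside <= host_uid < outside + length:
            return inside + (host_uid - outside)
    return None


if __name__ == "__main__":
    # Hypothetical mapping: container UIDs 0..65535 live at host UIDs 100000..165535.
    uid_map = parse_uid_map("0 100000 65536\n")
    print(to_host_uid(uid_map, 0))          # container root on the host
    print(to_container_uid(uid_map, 1000))  # a file owned by host UID 1000
```

With that mapping, a bind-mounted directory owned by host UID 1000 has no counterpart inside the container (the lookup returns None), so the container sees it as owned by the overflow user, typically "nobody". To make it accessible you would have to chown it into the 100000+ range on the host, which is exactly the awkward part.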
There's also the issue of creating the container in the first place. If you follow the instructions for creating a container on the Arch Wiki[1], the files will end up owned by the host's root (mostly), which is a problem when you then try to boot the container as a different (namespaced) user. I don't know of a straightforward way to create a namespaced container with systemd-nspawn, and I don't think there's any way to convert an existing container to a namespaced container.
Yet another problem that sometimes arises is that various distributions have user namespacing disabled (it has to be enabled when the kernel is built) - notably Arch [2], though this may have recently changed. This is apparently due to concerns that the namespacing code is buggy and can itself lead to privilege escalation vulnerabilities.
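If you want to check what your own kernel allows, something like the following rough sketch may help. It relies on three things I believe to be true but that you should verify on your system: /proc/self/ns/user only exists when the kernel was built with user namespace support, /proc/sys/user/max_user_namespaces set to 0 disables creating them, and /proc/sys/kernel/unprivileged_userns_clone is a distribution-specific knob (not upstream) that restricts them to root when set to 0. The function name and the returned strings are made up for this example.

```python
import os
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Read an integer sysctl-style file, or return None if it is absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def userns_status(proc_root: str = "/proc") -> str:
    """Give a best-effort summary of user namespace availability."""
    proc = Path(proc_root)

    # Assumption: this entry is only present when CONFIG_USER_NS was enabled at build time.
    if not os.path.lexists(proc / "self" / "ns" / "user"):
        return "not built in"

    # A limit of 0 means no user namespaces can be created at all.
    if read_int(proc / "sys" / "user" / "max_user_namespaces") == 0:
        return "disabled (max_user_namespaces=0)"

    # Distribution patch, may not exist on your kernel: 0 means root only.
    if read_int(proc / "sys" / "kernel" / "unprivileged_userns_clone") == 0:
        return "root only (unprivileged_userns_clone=0)"

    return "available"


if __name__ == "__main__":
    print(userns_status())
```

The proc_root parameter is only there so the check can be pointed at a fake directory tree for testing; on a real system the default is what you want.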
While it still needs work on performance and other things, Podman already supports rootless containers using user namespaces. It's actually pretty easy to set up too, especially on Arch:
I'm a little surprised that LXD doesn't cater more to NFS rather than bind mounts etc. for file sharing. I mean, if you already have private/secure networking, just use a network filesystem for your... network filesystem needs?
NFS currently cannot be used inside a user namespace.
There are patches floating around (similar to the work we did to allow FUSE) but they haven't made it upstream yet, and my understanding is that there are some tricky corner cases on NFSv3 which still need to be sorted (NFSv4 was easier due to already having UID mapping capabilities).
[1] https://wiki.archlinux.org/index.php/Systemd-nspawn#Examples
[2] https://bugs.archlinux.org/task/36969