get_mtu() returns int, but the "mtu" variable has type unsigned int.
This leads to a logic error in error handling, which can end up
with a strange -EINVAL error in lxc_veth_create(): the (mtu > 0)
condition is met, but the negative "mtu" value, once converted to
unsigned, is too large to be set as the MTU of a network device.
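A minimal sketch of the bug pattern. It assumes only what the message states: get_mtu() returns an int that is negative on failure. get_mtu_demo(), buggy_accepts_mtu() and fixed_accepts_mtu() are hypothetical names, not LXC code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for get_mtu(): returns an int that is
 * negative on failure (here -ENODEV). */
static int get_mtu_demo(void)
{
	return -ENODEV;
}

/* Buggy pattern: the int result is stored in an unsigned int, so the
 * negative error wraps to a huge positive value and passes the check. */
static int buggy_accepts_mtu(void)
{
	unsigned int mtu = get_mtu_demo();

	return mtu > 0;
}

/* Fixed pattern: keep the signed result and reject errors before use. */
static int fixed_accepts_mtu(void)
{
	int mtu = get_mtu_demo();

	if (mtu < 0)
		return 0;
	return mtu > 0;
}
```

In the buggy version, -ENODEV converted to unsigned int becomes a value close to UINT_MAX. That value satisfies (mtu > 0) but cannot be applied as an MTU.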
Issue #4232
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Maher Azzouzi [Sun, 25 Dec 2022 12:50:25 +0000 (13:50 +0100)]
Patching an incoming CVE (CVE-2022-47952)
lxc-user-nic in lxc through 5.0.1 is installed setuid root, and may
allow local users to infer whether any file exists, even within a
protected directory tree, because "Failed to open" often indicates
that a file does not exist, whereas "does not refer to a network
namespace path" often indicates that a file exists. NOTE: this is
different from CVE-2018-6556 because the CVE-2018-6556 fix design was
based on the premise that "we will report back to the user that the
open() failed but the user has no way of knowing why it failed";
however, in many realistic cases, there are no plausible reasons for
failing except that the file does not exist.
PoC:
> % ls /l
> ls: cannot open directory '/l': Permission denied
> % /usr/lib/x86_64-linux-gnu/lxc/lxc-user-nic delete lol lol /l/h/tt h h
> cmd/lxc_user_nic.c: 1096: main: Failed to open "/l/h/tt" <----- file does not exist.
> % /usr/lib/x86_64-linux-gnu/lxc/lxc-user-nic delete lol lol /l/h/t h h
> cmd/lxc_user_nic.c: 1101: main: Path "/l/h/t" does not refer to a network namespace path <---- file exists!
Fabrice Fontaine [Thu, 29 Dec 2022 13:42:45 +0000 (14:42 +0100)]
src/lxc/meson.build: fix build without apparmor
Don't build lsm/apparmor.c if apparmor is explicitly disabled by the
user to avoid the following build failure with gcc 4.8:
/home/buildroot/autobuild/run/instance-3/output-1/host/arm-buildroot-linux-gnueabi/sysroot/usr/include/bits/fcntl2.h: In function '__apparmor_process_label_open.isra.0':
/home/buildroot/autobuild/run/instance-3/output-1/host/arm-buildroot-linux-gnueabi/sysroot/usr/include/bits/fcntl2.h:50:24: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT in second argument needs 3 arguments
__open_missing_mode ();
^
Although struct nl_handler is zero-filled in netlink_open(),
there are two functions with exit paths that can be taken
before netlink_open() is called.
At the same time, a cleaner is already registered in them:
call_cleaner(netlink_close)
Two cases:
- netdev_get_flag
- lxc_ipvlan_create
If either function returns before netlink_open() is called,
the cleaner closes a random file descriptor, because it reads
(struct nl_handler)->fd from an uninitialized structure.
Let's just properly initialize this structure in all cases
to prevent this bug in the future.
Reported-by: coverity (CID #1517319 and #1517316)
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
conf: create separate peer group for container's root
Finally, we turn the rootfs into a shared mount. Note that this
doesn't reestablish mount propagation with the host's mount
namespace. Instead, we'll create a new peer group.
We're doing this because most workloads rely on the rootfs being
a shared mount. For example, systemd daemons like systemd-udevd run in
their own mount namespace. In that mount namespace the rootfs has been
made a dependent mount (MS_SLAVE) with the host rootfs as its dominating
mount. This means new mounts on the host propagate into the
respective services.
This breaks if we leave the container's rootfs a dependent mount,
because then both the container's rootfs and the service's rootfs
are dependent mounts with the host's rootfs as their dominating
mount. So if you were to mount over the rootfs from the host, it
would propagate not just into the container's mount namespace but
also into the service. That's nonsense semantics for
nearly all relevant use-cases. Instead, establish the container's
rootfs as a separate peer group, mirroring the behavior on the host.
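One way to observe the result is through the propagation tags in /proc/<pid>/mountinfo, documented in proc(5). "shared:N" marks membership in peer group N, and "master:N" marks a dependent (slave) mount of peer group N. A mount may carry both tags. Under the scheme described above, the container's "/" would be expected to carry a "shared:N" tag whose N differs from the host rootfs's peer group. The parser below is a hypothetical sketch, not LXC code, and assumes lines shorter than 4096 bytes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Propagation state of one mount, as reported in mountinfo. */
struct propagation {
	int shared_peer_group; /* N from "shared:N", or 0 if absent */
	int master_peer_group; /* N from "master:N", or 0 if absent */
};

/* Parse the optional fields of a single mountinfo line. Fields 1-6 are
 * fixed (mount ID, parent ID, major:minor, root, mount point, mount
 * options); optional fields follow and end at a lone "-". */
static struct propagation parse_propagation(const char *line)
{
	struct propagation p = { 0, 0 };
	char buf[4096];
	char *saveptr = NULL;
	char *tok;
	int field = 0;

	snprintf(buf, sizeof(buf), "%s", line);
	for (tok = strtok_r(buf, " ", &saveptr); tok;
	     tok = strtok_r(NULL, " ", &saveptr)) {
		field++;
		if (field <= 6)
			continue;
		if (strcmp(tok, "-") == 0)
			break;
		/* Each sscanf only assigns if the literal prefix matches. */
		sscanf(tok, "shared:%d", &p.shared_peer_group);
		sscanf(tok, "master:%d", &p.master_peer_group);
	}
	return p;
}
```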
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
cgroups: use userns_exec_full() during cgroup removal
When removing cgroups, we can't always use the minimal idmap: the user
may have specified a custom map for the container rather than a simple one.
Execute cgroup removal under the full map.
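A sketch of why the map matters, under stated assumptions. Inside a user namespace, the kernel only honours capabilities such as CAP_DAC_OVERRIDE over an inode whose owning uid and gid are both mapped into that namespace. If files in the container's cgroup are owned by IDs outside the minimal map, removal under that map may therefore fail. The maps used with it (a full 65536-ID range, and a minimal map covering only container root) and the helper host_to_ns() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* One uid_map line: namespace IDs [ns_start, ns_start + count) map to
 * host IDs [host_start, host_start + count). */
struct id_range {
	unsigned long ns_start;
	unsigned long host_start;
	unsigned long count;
};

/* Translate a host ID into the namespace; returns -1 if unmapped. */
static long host_to_ns(const struct id_range *map, size_t n,
		       unsigned long host_id)
{
	for (size_t i = 0; i < n; i++) {
		if (host_id >= map[i].host_start &&
		    host_id - map[i].host_start < map[i].count)
			return (long)(map[i].ns_start +
				      (host_id - map[i].host_start));
	}
	return -1;
}
```

With these hypothetical maps, a cgroup file owned by host uid 101000 (container uid 1000) is mapped under the full map but unmapped under the minimal one.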
Fixes: https://github.com/lxc/lxd/issues/11108
Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>
Mathias Gibbens [Sat, 19 Nov 2022 15:14:47 +0000 (15:14 +0000)]
tests: lxc-test-reboot: Fix build on ia64
Add the prototype for __clone2(...), which is used on ia64, and adjust the
code to use it via macro tests.
Verified that the code compiles properly on Debian's ia64 porterbox
(yttrium), but was unable to actually run it, as lxc-test-reboot requires
root privileges.
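A sketch of the macro-selected call, using hypothetical helper names (exit_with_arg(), spawn_with_stack()). The __clone2() prototype follows the clone(2) manual page. Unlike clone(), __clone2() takes the stack base and size rather than a pointer to the top of the stack:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifdef __ia64__
/* Prototype as given in clone(2); the optional trailing arguments are
 * pid_t *parent_tid, struct user_desc *tls and pid_t *child_tid. */
int __clone2(int (*fn)(void *), void *stack_base, size_t stack_size,
	     int flags, void *arg, ...);
#endif

/* Child entry point: the return value becomes the child's exit status. */
static int exit_with_arg(void *arg)
{
	return *(int *)arg;
}

/* Hypothetical helper: run fn(arg) in a child on a caller-provided stack.
 * clone() gets the top of the buffer because stacks grow down on most
 * architectures (HP PA is an exception not handled here); __clone2()
 * gets the base and size instead. */
static pid_t spawn_with_stack(int (*fn)(void *), void *arg, int flags,
			      void *stack, size_t stack_size)
{
	assert(stack != NULL && stack_size > 0);
#ifdef __ia64__
	return __clone2(fn, stack, stack_size, flags, arg);
#else
	return clone(fn, (char *)stack + stack_size, flags, arg);
#endif
}
```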
Aleksa Sarai [Fri, 28 Oct 2022 01:58:10 +0000 (12:58 +1100)]
build: drop build-time systemd dependency
On openSUSE, our packages are built in the Open Build Service, which does
not have a proper systemd installation that you can query to get the
systemdunitdir.
The simplest solution is to re-add the ability to explicitly set the
systemdunitdir (as was previously possible with the autotools build
system in pre-5.0 LXC).
Aleksa Sarai [Fri, 28 Oct 2022 01:50:41 +0000 (12:50 +1100)]
build: use cc.get_define to detect FS_CONFIG_* symbols
For some reason, openSUSE has a very strange layout in sys/mount.h where
the definitions of all the FS_CONFIG_* idents are present but are
ifdef'd out in such a way that they will never be defined in an actual
build:
#define FSOPEN_CLOEXEC 0x00000001
/* ... */
#ifndef FSOPEN_CLOEXEC
enum fsconfig_command
{
	FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
# define FSCONFIG_SET_FLAG FSCONFIG_SET_FLAG
	/* ... */
};
#endif
Unfortunately, while cc.has_header_symbol is faster, it cannot handle
this, which results in compilation errors on openSUSE because the
FS_CONFIG_* symbols are not actually defined when compiling, even though
the ident is present in the header. Switching to cc.get_define fixes
this issue.
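The self-contained C file below reproduces the quoted header layout in simplified form. It shows that FSCONFIG_SET_FLAG is never defined as far as the preprocessor is concerned, even though the identifier appears in the source text. Meson documents cc.get_define as returning a preprocessor symbol's value, or an empty string if it is not defined, so it sees the same result. have_fsconfig_set_flag is a hypothetical name:

```c
#include <assert.h>

/* Simplified copy of the header layout quoted above. */
#define FSOPEN_CLOEXEC 0x00000001

#ifndef FSOPEN_CLOEXEC
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
#define FSCONFIG_SET_FLAG FSCONFIG_SET_FLAG
};
#endif

/* The whole enum, including the #define, was skipped because
 * FSOPEN_CLOEXEC is already defined, so a preprocessor-level check
 * correctly reports the macro as absent. */
#ifdef FSCONFIG_SET_FLAG
static const int have_fsconfig_set_flag = 1;
#else
static const int have_fsconfig_set_flag = 0;
#endif
```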
Fixes: cbabe8abf11e ("build: check for FS_CONFIG_* header symbol in sys/mount.h")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Aleksa Sarai [Fri, 28 Oct 2022 01:44:39 +0000 (12:44 +1100)]
build: only build init.lxc.static if libcap is statically linkable
Without setting this, the default build will fail if you don't have the
static libcap library installed (on openSUSE this is packaged separately
from libcap-devel).
Aleksa Sarai [Fri, 28 Oct 2022 01:38:20 +0000 (12:38 +1100)]
build: fix handling of dependencies to fix build on openSUSE
Among other things, openSUSE places seccomp.h inside a non-default
include directory (/usr/include/seccomp/seccomp.h), which revealed
several issues with how dependencies were being handled previously.
The most notable issue is that the include cflags of our build
dependencies were not being provided to the recipes for static
executables (yet they still expected access to the dependency headers).
This also involved a minor cleanup of how these dependencies are
collected, and added liburing to the set of private pkg-config libs
(which I assume was an oversight?).
Aleksa Sarai [Fri, 28 Oct 2022 01:27:57 +0000 (12:27 +1100)]
cgroups: fix -Waddress warning
While in principle the pointer could overflow, GCC 12 considers this
impossible and issues the following warning:
../src/lxc/cgroups/cgfsng.c: In function ‘__cgfsng_delegate_controllers’:
../src/lxc/cgroups/cgfsng.c:3306:21: warning: the comparison will always evaluate as ‘true’ for the pointer operand in ‘it + 8’ must not be NULL [-Waddress]
3306 | if ((it + 1) && *(it + 1))
| ^
This removes the only build warning triggered when building on openSUSE.
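Pointer arithmetic on a valid pointer into an array never yields NULL, so in a NULL-terminated array only the element *(it + 1) needs testing. The hypothetical join_words() helper below is not the code in cgfsng.c. It uses that check to decide whether a separator follows the current element:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Join a NULL-terminated string array with single spaces. Returns the
 * resulting length, or -1 if out is too small. */
static int join_words(char *out, size_t out_len, char **list)
{
	size_t used = 0;

	if (out_len == 0)
		return -1;
	out[0] = '\0';

	for (char **it = list; *it; it++) {
		/* Test the next element, not the pointer it + 1. */
		int n = snprintf(out + used, out_len - used, "%s%s",
				 *it, *(it + 1) ? " " : "");
		if (n < 0 || (size_t)n >= out_len - used)
			return -1; /* truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```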
Po-Hsu Lin [Wed, 19 Oct 2022 06:17:29 +0000 (14:17 +0800)]
tests: lxc-test-checkpoint-restore: use trap to do cleanup
This test fails on Jammy's 5.15 kernel, and because of "set -e" it
never reaches the lxc-stop and lxc-destroy commands at the end
of this script, so the lxc-test-criu container is not removed.
Compose a cleanup() function and use trap to solve this problem.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
lxc-attach: Fix lost return codes of spawned processes that are killed
lxc-attach swallows the return codes of processes that are terminated
via a signal, and by default exits with a return code of 0 (i.e.
indicating success) even if the command it tried to execute was
terminated.
This patch fixes it by explicitly checking if the process was terminated
via a signal, and returning an appropriate exit code.
Note that we add 128 to the signal number to generate the exit code
because, by convention, the exit code is 128 + signal number. For example,
if a process is killed by signal 9, the exit code is 9 + 128 = 137.
Signed-off-by: Mohammed Ajmal Siddiqui <ajmalsiddiqui21@gmail.com>
Chen Qi [Thu, 25 Aug 2022 12:45:53 +0000 (05:45 -0700)]
use sd_bus_call_method_async to replace the asyncv one
The 10th parameter of sd_bus_call_method_asyncv() is of type
va_list, and supplying NULL for it causes a compilation error
(on some architectures va_list is not a pointer type). Just replace
it with sd_bus_call_method_async().
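The same split between a variadic function and its va_list ("v") variant can be sketched with hypothetical names around vsnprintf(). Callers with no extra arguments use the variadic form and simply pass nothing, instead of handing NULL to a va_list parameter:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* The "v" variant takes an already-initialised va_list. Depending on
 * the ABI, va_list may be an array, pointer or struct type, so NULL is
 * not a portable "no arguments" value for it. */
static int vformat_msg(char *buf, size_t len, const char *fmt, va_list ap)
{
	return vsnprintf(buf, len, fmt, ap);
}

/* The variadic wrapper builds the va_list itself. */
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vformat_msg(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}
```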
Stéphane Graber [Mon, 1 Aug 2022 21:45:52 +0000 (17:45 -0400)]
gitignore: Simplify
The move to meson has made it so that all rendered/built files are now
nicely self-contained. This lets us greatly simplify our gitignore,
effectively just ignoring release tarballs and the few usual temporary
files we may deal with during development.