Tumbleweed 0408 (systemd 246.13): a rolling update broke LXC again

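I ran zypper dup on my work machine and lxd is broken again: every centos7 container hangs in init, and strace shows it stuck in epoll_wait. Has anyone else run into this? The lxd version is 4.12-2.1; 4.11-1.2 is already gone from the repo, and I don't know how to roll back.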

At boot, you can pick an earlier snapshot from the grub menu.

You could also ask in the tg group; responses there tend to be quicker.

I'm on an xfs partition, so... no snapshot support.

Tumbleweed updated again, and lxd went up to 4.13.
None of the centos7 and ubuntu16.04 containers will start; they all hang in init.

@fengliqiang

Post the logs!

The ubuntu16 logs all look like this:

linux-aka6:/var/log/lxd # cat test-golang/console.log
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!] Failed to mount API filesystems, freezing.
Freezing execution.
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!] Failed to mount API filesystems, freezing.
Freezing execution.


The centos7 logs all look like this:

linux-aka6:/var/log/lxd # cat testcentos/console.log
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to CentOS Linux 7 (Core)!

Initializing machine ID from random generator.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Cannot determine cgroup we are running in: No such file or directory
Failed to allocate manager object: No such file or directory
[!!!] Failed to allocate manager object, freezing.


The cgroups of the two containers' init processes look like this:

ubuntu:

linux-aka6:~ # cat /proc/8763/cgroup 
0::/lxc.payload.test-golang

centos7:

linux-aka6:~ # cat /proc/7683/cgroup 
0::/lxc.payload.testcentos
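
A lone `0::` line in /proc/<pid>/cgroup is the cgroup v2 entry format, which suggests the host is on the unified hierarchy with no v1 controllers for the old systemd inside the containers to mount. Below is a minimal sketch of that check. The helper name `is_unified_only` is hypothetical (it is not part of LXC or systemd), and it only inspects text passed on stdin:

```shell
#!/bin/sh
# is_unified_only is a hypothetical helper name, not part of LXC or systemd.
# It reads the text of /proc/<pid>/cgroup on stdin and succeeds only if every
# entry is a cgroup v2 entry of the form "0::<path>".
is_unified_only() {
    awk -F: '
        NF >= 3 { seen = 1 }
        # A v1 entry has a non-zero hierarchy ID or names a controller.
        NF >= 3 && !($1 == "0" && $2 == "") { v1 = 1 }
        END { if (seen && !v1) exit 0; exit 1 }
    '
}

# Example using the centos7 output quoted above.
if printf '0::/lxc.payload.testcentos\n' | is_unified_only; then
    echo "unified (cgroup v2 only)"
else
    echo "legacy or hybrid (v1 entries present)"
fi
```

On a live host the same function could be fed with `is_unified_only < /proc/<pid>/cgroup`, using the container's init PID.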

Config files:

linux-aka6:/var/log/lxd # ll test-golang/
总用量 8
-rw------- 1 root root  376  4 月 16 11:36 console.log
-rw-r--r-- 1 root root    0  4 月 16 11:36 forkexec.log
-rw-r--r-- 1 root root    0  4 月 16 11:36 forkstart.log
-rw-r----- 1 root root 2063  4 月 16 11:36 lxc.conf
-rw-r----- 1 root root    0  4 月 16 11:36 lxc.log
-rw-r----- 1 root root    0  4 月 16 11:07 lxc.log.old
linux-aka6:/var/log/lxd # cat test-golang/lxc.conf 
lxc.log.file = /var/log/lxd/test-golang/lxc.log
lxc.log.level = warn
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/lxd/test-golang/console.log
lxc.mount.auto = proc:rw sys:rw cgroup:rw:force
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional 0 0
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/config sys/kernel/config none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/tracing sys/kernel/tracing none rbind,create=dir,optional 0 0
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional 0 0
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.arch = linux64
lxc.hook.version = 1
lxc.hook.pre-start = /proc/2918/exe callhook /var/lib/lxd "default" "test-golang" start
lxc.hook.stop = /usr/bin/lxd callhook /var/lib/lxd "default" "test-golang" stopns
lxc.hook.post-stop = /usr/bin/lxd callhook /var/lib/lxd "default" "test-golang" stop
lxc.tty.max = 0
lxc.uts.name = test-golang
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.apparmor.profile = lxd-test-golang_</var/lib/lxd>//&:lxd-test-golang_<var-lib-lxd>:
lxc.seccomp.profile = /var/lib/lxd/security/seccomp/test-golang
lxc.idmap = u 0 400000000 500000001
lxc.idmap = g 0 400000000 500000001
lxc.mount.auto = shmounts:/var/lib/lxd/shmounts/test-golang:/dev/.lxd-mounts
lxc.net.0.type = phys
lxc.net.0.name = eth0
lxc.net.0.flags = up
lxc.net.0.link = veth3a9373d4
lxc.rootfs.path = dir:/var/lib/lxd/containers/test-golang/rootfs
linux-aka6:/var/log/lxd # ll testcentos/
总用量 8
-rw------- 1 root root  728  4 月 16 11:13 console.log
-rw-r--r-- 1 root root    0  4 月 16 11:13 forkexec.log
-rw-r--r-- 1 root root    0  4 月 16 11:13 forkstart.log
-rw-r----- 1 root root 2052  4 月 16 11:13 lxc.conf
-rw-r----- 1 root root    0  4 月 16 11:13 lxc.log
-rw-r----- 1 root root    0  4 月 16 11:13 lxc.log.old
linux-aka6:/var/log/lxd # cat testcentos/lxc.conf 
lxc.log.file = /var/log/lxd/testcentos/lxc.log
lxc.log.level = warn
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/lxd/testcentos/console.log
lxc.mount.auto = proc:rw sys:rw cgroup:rw:force
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional 0 0
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/config sys/kernel/config none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/tracing sys/kernel/tracing none rbind,create=dir,optional 0 0
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional 0 0
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.arch = linux64
lxc.hook.version = 1
lxc.hook.pre-start = /proc/2918/exe callhook /var/lib/lxd "default" "testcentos" start
lxc.hook.stop = /usr/bin/lxd callhook /var/lib/lxd "default" "testcentos" stopns
lxc.hook.post-stop = /usr/bin/lxd callhook /var/lib/lxd "default" "testcentos" stop
lxc.tty.max = 0
lxc.uts.name = testcentos
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.apparmor.profile = lxd-testcentos_</var/lib/lxd>//&:lxd-testcentos_<var-lib-lxd>:
lxc.seccomp.profile = /var/lib/lxd/security/seccomp/testcentos
lxc.idmap = u 0 400000000 500000001
lxc.idmap = g 0 400000000 500000001
lxc.mount.auto = shmounts:/var/lib/lxd/shmounts/testcentos:/dev/.lxd-mounts
lxc.net.0.type = phys
lxc.net.0.name = eth0
lxc.net.0.flags = up
lxc.net.0.link = veth7f35db7b
lxc.rootfs.path = dir:/var/lib/lxd/containers/testcentos/rootfs

Found the cause.

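I found an image from March 21 (openSUSE-Tumbleweed-DVD-x86_64-Snapshot20210321-Media.iso), on which lxc works fine.
Then I upgraded packages one at a time and found that the failure reappears as soon as systemd is updated.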

Solution:

$ sudo rpm -Uvh --force systemd-246.11-1.1.x86_64.rpm  systemd-container-246.11-1.1.x86_64.rpm  systemd-lang-246.11-1.1.noarch.rpm  systemd-sysvinit-246.11-1.1.x86_64.rpm  udev-246.11-1.1.x86_64.rpm

Final solution:

The approach the moderator suggested is more reliable after all:

sudo sed -i 's|GRUB_CMDLINE_LINUX=""|GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"|g' /etc/default/grub
sudo cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
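
Note that the sed expression above only matches when GRUB_CMDLINE_LINUX is empty. If the line already carries other options, a variant along the following lines might be useful. This is a sketch under assumptions: `add_kernel_param` is a hypothetical helper (not part of grub2), it relies on GNU sed's `-i`, and it assumes the parameter contains no `|`, `&` or `\` characters. The demo runs on a scratch file, not on the real /etc/default/grub:

```shell
#!/bin/sh
# add_kernel_param is a hypothetical helper, not part of grub2.
# $1 = grub defaults file, $2 = kernel parameter (assumed free of |, & and \).
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on the GRUB_CMDLINE_LINUX line.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # First rule appends to a non-empty value, second rule fills an empty one.
    sed -i \
        -e "s|^GRUB_CMDLINE_LINUX=\"\(..*\)\"|GRUB_CMDLINE_LINUX=\"\1 ${param}\"|" \
        -e "s|^GRUB_CMDLINE_LINUX=\"\"|GRUB_CMDLINE_LINUX=\"${param}\"|" \
        "$file"
}

# Try it on a scratch file rather than the real /etc/default/grub.
scratch=$(mktemp)
printf 'GRUB_CMDLINE_LINUX="quiet"\n' > "$scratch"
add_kernel_param "$scratch" "systemd.unified_cgroup_hierarchy=0"
cat "$scratch"
rm -f "$scratch"
```

The grub2-mkconfig step from the original commands is still needed afterwards for the change to take effect.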


Yay! That revived the docker inside my LXD...


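Here are my steps. TW upgraded systemd from 246.11 to 246.13 in snapshot 0408, so you can add the history oss repo of an earlier snapshot (I picked the nju mirror at random) and use it to downgrade the packages...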

sudo zypper ar https://mirrors.nju.edu.cn/opensuse/history/20210406/tumbleweed/repo/oss/ nju_0406_oss
sudo zypper ref nju_0406_oss
sudo zypper in -f --from nju_0406_oss systemd udev
sudo zypper mr -d nju_0406_oss

Follow the prompts and choose the downgrade option to resolve the dependency conflicts, disable the repo, then reboot.


Default to the “unified” cgroup hierarchy. At this point, most users of cgroup (such as docker, libvirt, kubernetes) should be ready for this change. It’s still possible to switch back to the old “hybrid” hierarchy by passing “systemd.unified_cgroup_hierarchy=0” option to the kernel command line.
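
After a reboot, it may be worth confirming that the option named in this changelog entry actually reached the kernel command line. A minimal sketch, assuming a hypothetical helper `unified_disabled` that looks for the option as a whole word in a command-line string:

```shell
#!/bin/sh
# unified_disabled is a hypothetical helper, not part of systemd.
# It succeeds if the given kernel command line contains the option
# systemd.unified_cgroup_hierarchy=0 as a whole word.
unified_disabled() {
    for word in $1; do
        if [ "$word" = "systemd.unified_cgroup_hierarchy=0" ]; then
            return 0
        fi
    done
    return 1
}

# Check the running kernel's command line, if it is readable.
if [ -r /proc/cmdline ] && unified_disabled "$(cat /proc/cmdline)"; then
    echo "unified cgroup hierarchy disabled on the kernel command line"
else
    echo "option not present; systemd will use its built-in default"
fi
```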

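The systemd change quoted above (defaulting to the unified cgroup hierarchy) is what caused this. Downgrading isn't a great fix either, since an accidental upgrade could bring the newer version back at any time.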


If you'd rather not do it the way described above, you can configure lxc like this instead:

Actually, for vanilla LXC 3.0.4, addition of

lxc.cgroup.devices.allow =
lxc.cgroup.devices.deny =
lxc.init.cmd = /lib/systemd/systemd systemd.unified_cgroup_hierarchy=1

to a container config is sufficient to allow usual start-up of a Linux distro with a recent version of systemd in an LXC container running on a Linux host booted with systemd.unified_cgroup_hierarchy=1.
