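Since upgrading to KDE6, my laptop keeps freezing for no apparent reason. It starts with the network becoming unusable: after I click the network manager in the system tray, plasmashell hangs, and every network-related program, whether a browser or a command-line tool such as curl, hangs and cannot be quit. After a while, commands such as ls also hang in any directory and cannot be interrupted with Ctrl+C; strace shows them stuck in an ioctl system call. Later still, everything except the mouse cursor freezes. At that point the machine cannot be shut down: after switching to a tty, both poweroff and halt hang, so the only way out is the power button. dmesg gives three clues: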
First, shortly before the network failure there are many downshift messages:
Generic FE-GE Realtek PHY r8169-0-301:00: Downshift occurred from negotiated speed 1Gbps to actual speed 100Mbps, check cabling!
[ 1753.808368] r8169 0000:03:00.1 eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx
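These messages repeat many times, and on the desktop KDE keeps notifying me that the Ethernet connection is flipping between connected and disconnected. Then the kernel reports a BUG: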
[101813.990261] BUG: unable to handle page fault for address: 000000000000115e
[101813.990269] #PF: supervisor read access in kernel mode
[101813.990272] #PF: error_code(0x0000) - not-present page
[101813.990275] PGD 0 P4D 0
[101813.990279] Oops: 0000 [#1] PREEMPT SMP NOPTI
[101813.990282] CPU: 10 PID: 15323 Comm: kworker/10:2 Tainted: P W OE 6.8.1-1-default #1 openSUSE Tumbleweed a408dede100ecd8172a7eae2d0778227ac69e46d
At this point the machine starts showing the symptoms described above, followed by a large number of repeated workqueue lockup messages:
[101856.953486] BUG: workqueue lockup - pool cpus=10 node=0 flags=0x0 nice=0 stuck for 42s!
[101856.953513] Showing busy workqueues and worker pools:
[101856.953517] workqueue events: flags=0x0
[101856.953522] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
[101856.953529] pending: delayed_vfree_work, kfree_rcu_monitor, kernfs_notify_workfn
[101856.953547] workqueue events_unbound: flags=0x2
[101856.953554] pwq 32: cpus=0-15 flags=0x4 nice=0 active=2/512 refcnt=4
[101856.953560] pwq 32: cpus=0-15 flags=0x4 nice=0 active=2/512 refcnt=3
[101856.953565] in-flight: 19069:fsnotify_connector_destroy_workfn fsnotify_connector_destroy_workfn, 7379:fsnotify_mark_destroy_workfn fsnotify_mark_destroy_workfn BAR(20178)
[101856.953583] workqueue rcu_gp: flags=0x8
[101856.953588] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[101856.953592] pending: process_srcu
[101856.953600] workqueue mm_percpu_wq: flags=0x8
[101856.953604] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
[101856.953608] pending: vmstat_update, lru_add_drain_per_cpu BAR(135)
[101856.953618] workqueue pm: flags=0x4
[101856.953623] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[101856.953626] pending: pm_runtime_work
[101856.953632] workqueue cgroup_destroy: flags=0x0
[101856.953636] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/1 refcnt=2
[101856.953640] in-flight: 21110:css_free_rwork_fn
[101856.953672] workqueue usb_hub_wq: flags=0x4
[101856.953677] pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[101856.953681] pending: 2*hub_event [usbcore]
[101856.953726] workqueue gfx_low: flags=0xa0002
[101856.953731] pwq 32: cpus=0-15 flags=0x4 nice=0 active=1/1 refcnt=19
[101856.953734] pending: drm_sched_free_job_work [gpu_sched]
[101856.953743] inactive: drm_sched_run_job_work [gpu_sched]
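For anyone who wants to pull the same three clues out of a long dmesg capture, here is a minimal sketch. The patterns are copied from the lines quoted above; the function name summarize and the category names are my own choices for illustration, not part of any tool.

```python
import re
from collections import Counter

# Patterns copied from the dmesg lines quoted in this post. They are an
# assumption about what is worth counting, not an exhaustive list.
PATTERNS = {
    "downshift": re.compile(r"Downshift occurred|\(downshifted\)"),
    "page_fault": re.compile(r"BUG: unable to handle page fault"),
    "workqueue_lockup": re.compile(r"BUG: workqueue lockup"),
}

# dmesg timestamps look like "[ 1753.808368]" (seconds since boot).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Count matching lines per category and note the first timestamp seen."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if not pattern.search(line):
                continue
            counts[name] += 1
            match = TIMESTAMP.match(line)
            if match and name not in first_seen:
                first_seen[name] = float(match.group(1))
    return counts, first_seen


# A few of the lines quoted above, used here as sample input.
SAMPLE = [
    "[ 1753.808368] r8169 0000:03:00.1 eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx",
    "[101813.990261] BUG: unable to handle page fault for address: 000000000000115e",
    "[101856.953486] BUG: workqueue lockup - pool cpus=10 node=0 flags=0x0 nice=0 stuck for 42s!",
]

counts, first_seen = summarize(SAMPLE)
for name in PATTERNS:
    print(name, counts[name], first_seen.get(name, "n/a"))
```

To run it on a real capture, pass the lines of a saved dmesg output to summarize instead of SAMPLE. Comparing the first timestamps shows how long the link was flapping before the page fault.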
I have put the relevant excerpt here: Mozilla Community Pastebin/Vwe85ras (C)
My system information:
Operating System: openSUSE Tumbleweed 20240320
KDE Plasma Version: 6.0.2
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2
Kernel Version: 6.8.1-1-default (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800H with Radeon Graphics
Memory: 27.3 GiB of RAM
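Programs that ignore Ctrl+C while sitting inside a system call are usually in uninterruptible sleep (state D), so listing those processes may help narrow down what they are waiting on. Below is a minimal sketch that reads /proc/<pid>/stat and /proc/<pid>/wchan. The function names parse_stat and blocked_processes are hypothetical helpers of mine, and I am assuming (not certain) that the hung processes here really are in state D.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_processes(proc_root="/proc"):
    """Return sorted (pid, comm, wchan) tuples for processes in state 'D'."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited while we were reading it.
            continue
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        results.append((pid, comm, wchan))
    return results and sorted(results)


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

Running it shortly after the network stops working, before the whole system locks up, would show whether curl, ls and plasmashell are all blocked in the same kernel function.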