16:41 ity: Hi, amdgpu seems to be crashing at random just by running for a few days, and also prevents my intel iGPU from properly initializing the display stuff on boot (so, an HDMI plugged into it shows nothing), though it seems to respond to basic query ioctls. The crash in amdgpu seems to be "kernel NULL pointer dereference". The kernel stays up, but the iGPU's display stuff does not
16:41 ity: function & the AMD dGPU's output freezes up. I have a few stacktraces as well from the kernel log. I am currently on kernel 6.6.0 & firmware 20230918 as the later versions seem to have a regression where the dGPU does not work at all. I have a 7900 XTX - 1002:744c:1458:240e.
17:53 Remco: ity: Did you look through https://gitlab.freedesktop.org/drm/amd/-/issues/ ?
17:53 Remco: If your issue is not there, it's the place to add it :)
17:57 Remco: Reading https://gitlab.freedesktop.org/drm/amd/-/issues/3140 it might be your issue since the last comment says it happens intermittently on 6.6.14
18:23 ity: That *seems* like it might be a diff issue, namely I don't have a "hybrid" GPU, the GPU they are using is diff, their issue happens on boot rather than a few days after boot, and the stacktrace seems to come from openat rather than ioctl. The kernel log from mine: https://hastebin.skyra.pw/izelahaxet . I did check the issue tracker a few times, but I am not 100% sure which issues are
18:23 ity: or are not the same as mine. I am on the 6.6.0 kernel, not 6.6.14 also.
18:37 Remco: In that case the best way forward is to file an issue and to attach all the logs
18:40 Remco: https://gitlab.freedesktop.org/drm/amd/-/issues/2623
18:40 Remco: That has a similar stacktrace
18:42 Remco: Unfortunately also stalled because there is no bisect
20:43 ity: Bisect?
20:51 Remco: Bisecting is the process of finding which commit caused the issue to appear by doing a binary search
20:52 Remco: It involves compiling the kernel multiple times and checking each one for the issue
20:52 Remco: https://git-scm.com/docs/git-bisect
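The binary search Remco describes can be sketched in a few lines of Python. This is only an illustration of the idea, not how git implements `git bisect`: the names `first_bad`, `commits`, and `is_bad` are hypothetical, and the sketch assumes a linear history where the first commit is known good, the last is known bad, and every commit after the culprit is also bad. In a real kernel bisect, each `is_bad` check would mean building and booting that kernel and waiting to see whether the crash appears.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumptions (hypothetical setup, not git's actual implementation):
    commits[0] is known good, commits[-1] is known bad, and once a
    commit is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Sixteen made-up commits; pretend the bug was introduced in "c11".
    history = [f"c{i}" for i in range(16)]
    print(first_bad(history, lambda c: int(c[1:]) >= 11))
```

Each test halves the remaining range, so the number of kernels to build grows roughly with the base-2 logarithm of the number of commits between the good and bad versions.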
20:53 ity: Hmm, well I have been unable to find a kernel that *doesn't* have the issue yet
20:54 ity: And I am already on quite an old kernel
20:54 Remco: You've had it on 6.5 (or older) as well?
20:55 ity: I haven't tried 6.5 yet, but the GPU has been causing problems ever since I got this computer in early 2023
20:55 ity: I could try 6.5 I guess
21:03 ity: Though I need something better than pacman for that haha. I have been unable to get the arch initramfs working on a custom compiled kernel yet, and I am not experienced enough to get a proper kernel config from scratch. I mostly asked here to try to see if there is some fix that I could do, I guess I can file an issue & try to get something set up to be able to properly bisect
21:03 ity: this. It might also be a problem in the firmware, as it seems that the regression that prevents the computer from booting is in the firmware rather than the kernel (it's the firmware right after the one I am using in arch packages). Isn't the firmware just blobs? That isn't bisectable proper, is it? There is certainly more than one issue here, and I have no idea which one to focus on
21:03 ity: rn. The *random crash every few days* one is the least severe one as it lets me work still, so that's what I am trying to get fixed. There's also one where some ROCm thing in Blender Preferences brings down the system in a similar way. I am kinda lost in this thing to be fair. Should I file an issue to track this issue on old firmware & kernel, aka the one I sent the stacktrace for?
21:51 Remco: A kernel oops is always bad. File the issue and see what people have to add that you can try
21:54 ity: O okis
21:55 Remco: It also seems like 6.7 is the current arch kernel, and 6.8 is mainline so you could check whether the issue is still present while using pacman
21:56 ity: Well, 6.8 with latest firmware doesn't boot
21:56 ity: Should I test out 6.8 with the current firmware?
21:56 Remco: :(
21:57 Remco: Just do the issue first so people that actually can give directions
21:57 Remco: *that actually know
21:57 ity: O, for the current issue right? Not the fails to boot one
21:58 Remco: Yeah
21:58 ity: Okis ^^ Will put it there, though I am having a hard time wrapping my head around all the issues :/
22:00 Remco: One step at a time and you'll be fine
22:04 superkuh: (are you the #opera remco?)
22:04 Remco: Yes I am :)
22:05 superkuh: Ah. Hello! :)
22:07 Remco: Hello :)
22:36 ity: Remco: https://gitlab.freedesktop.org/drm/amd/-/issues/3158 made the issue, is what I wrote fine?
22:38 Remco: Looks fine to me
22:39 ity: Phew, haha!