00:24 dirbaio: hello! I've been debugging a cpu hog issue in nouveau where the base507c_ntfy_wait_begun function is busylooping for 8-16ms, being called 60 times a second
00:25 dirbaio: so it's approx 1 core pegged to 100%
00:25 dirbaio: for each plugged in external monitor
00:26 dirbaio: Arch linux, kernel 7.0.2, GP107M, Thinkpad X1 Extreme (with optimus enabled), using Sway.
00:28 dirbaio: so the interesting thing is
00:29 dirbaio: with Claude I've changed it to use an event to wait instead of busylooping. it's a ~100 line diff, and it works. cpu hog is gone, everything seems to work
00:29 dirbaio: but
00:30 karolherbst: yeah uhm.. maybe the "usleep_range(1, 2)" is a bit short
00:30 dirbaio: 1. I don't want to upstream it as-is because I don't know enough to know if it's slop or not (I'm an opensource maintainer as well, I don't want to do to others what I don't like being done to me)
00:31 karolherbst: so what "nvif_msec" is doing is to busy loop on the GPU timer
00:31 karolherbst: up to 2 seconds
00:31 dirbaio: 2. I find it suspicious that base507c_ntfy_wait_begun waits for so long. Is it even expected? Or is the real problem that it waits for so long, and changing it to use an event is just masking the issue?
00:31 karolherbst: until the code breaks out of it
00:31 dirbaio: hmm
00:31 karolherbst: and the `NVBO_TD32` basically reads from GPU memory
00:32 karolherbst: so it waits until the GPU (the disp engine here) writes a certain value at a certain address
00:32 dirbaio: if it's expected that it waits for so long, surely everyone using this gpu would've seen the cpu hog?
00:32 karolherbst: I suspect it's the `usleep_range(1, 2);` doing
00:33 karolherbst: so every loop iteration it tries to wait for 1-2 us based on CPU timers
00:33 karolherbst: those are then non busy waits
00:33 karolherbst: could maybe try with "usleep_range(100, 200);"
00:33 dirbaio: but simply increasing that would hurt latency, right?
00:34 karolherbst: probably, but maybe it doesn't matter in this case?
00:34 dirbaio: and with 100us it means the cpu is still waking up 10k times a second
00:34 karolherbst: this happens per atomic commit
00:34 karolherbst: *atomic KMS commit
00:35 dirbaio: i'm seeing it happen every frame
00:35 karolherbst: right... because the compositor also sends a new frame every frame :)
00:35 dirbaio: haha, yeah :)
00:35 karolherbst: _maybe_ booting with `nouveau.atomic=1` mitigates it, I don't really know if that would make a difference
00:35 karolherbst: but increasing the wait should help with the busy wait at least
00:36 karolherbst: but maybe Lyude would know what's a better solution here
00:36 dirbaio: this is the diff Claude came up with https://github.com/Dirbaio/linux/commit/0f1d715f1ef8cbe8c2b96b6658cb422daaad7176
00:36 karolherbst: I wonder if we could turn some of those into interrupt based waits
00:36 dirbaio: but I don't know if it's nonsense or not
00:36 dirbaio: it turns it into an interrupt-based wait yea
00:37 karolherbst: I don't know :)
00:37 karolherbst: but yeah.. it feels like it
00:37 karolherbst: maybe the code isn't good, but I think the idea in theory is a good one, unless there is something going on I'm not aware of
00:38 Lyude: I mean - I've never actually seen the display core take that long. are there other errors you see?
00:39 dirbaio: no, nothing
00:40 Lyude: I think I'd definitely have to poke nvidia to see if they might have an idea what is causing it to wait so long, since that makes me feel like we're doing something wrong - but also at the same time I'm fairly certain if we were doing something wrong we'd see an error from evo
00:40 dirbaio: here's the full dmesg https://gist.github.com/Dirbaio/d5914358e743eb6a05c55a8a8b4118c9
00:40 Lyude: Thinking about it more too - 8ms is _way_ too long for any plane update to take
00:40 Lyude: that's multiple frames worth of waiting
00:40 dirbaio: only suspicious thing is "nouveau 0000:01:00.0: drm: Disabling PCI power management to avoid bug"
00:41 dirbaio: but probably not related
00:41 Lyude: i know what that bit is, and yeah it's unrelated
00:41 Lyude: something something intel pm issue
00:41 dirbaio: and the desktop feels smooth and responsive, screens update at 60fps just fine. it's just hogging cpu
00:41 karolherbst: heh I've added that one
00:42 Lyude: actually - no, 8ms would be half a frame update
00:42 dirbaio: yea. it's like it's busylooping until next vblank
00:43 Lyude: next vblank should be 16ms
00:43 Lyude: given 60fps
00:43 karolherbst: Lyude: I think no matter what, a 1-2 us wait might be too short anyway given it's not _that_ time critical?
00:43 dirbaio: could be less if the compositor took some time to make the frame (?)
00:43 Lyude: definitely don't really want to take an ai code contribution here but i can go and write something up to hook this up, I think you do have the right solution
00:44 Lyude: karolherbst: yes - but honestly, I think if we have an interrupt that we can use that's probably what we should use. I will have to take a look at openrm though maybe
00:44 dirbaio: okay
00:44 karolherbst: yeah...
00:44 karolherbst: I'd also prefer an interrupt based solution
00:44 dirbaio: let me know if you want me to test or debug anything
00:44 dirbaio: i'm also willing to take time to write and submit a non-ai patch but i'm afraid I'll need lots of guidance...
00:44 Lyude: yeah no problem - feel free to poke me too if you don't hear anything for a while
00:44 Lyude: haha that's fine - honestly i'm not too worried about the complexity of hooking this up.
00:45 Lyude: we probably also need to double check too to make sure openrm is doing this (I assume they must be though)
00:45 Lyude: thanks for looking into this btw
00:45 Lyude: you've definitely saved a lot of time
00:46 dirbaio: :D
00:46 Lyude: will try to do this tomorrow when I sit down
00:47 Lyude: for work i mean
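
The figures quoted in the discussion above (a busy loop of 8-16 ms, called 60 times a second, and a proposed 100 us sleep per poll) can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope model, not nouveau code: the function names are hypothetical, and it assumes the loop spins for the whole quoted duration and that a sleeping poll loop wakes once per sleep interval.

```python
# Back-of-the-envelope model of the numbers quoted in the log.
# All names here are hypothetical; none of this is nouveau code.


def frame_period_ms(refresh_hz: float) -> float:
    """Time between two vblanks, in milliseconds."""
    return 1000.0 / refresh_hz


def core_utilization(busy_ms_per_call: float, calls_per_second: float) -> float:
    """Fraction of one CPU core spent spinning (1.0 == one core fully busy)."""
    return busy_ms_per_call * calls_per_second / 1000.0


def max_wakeups_per_second(sleep_us: float) -> float:
    """Upper bound on wakeups per second if each poll sleeps sleep_us
    and the loop is polling the whole time."""
    return 1_000_000.0 / sleep_us


if __name__ == "__main__":
    print(f"frame period at 60 Hz: {frame_period_ms(60):.2f} ms")
    for busy_ms in (8, 16):
        share = core_utilization(busy_ms, 60)
        print(f"{busy_ms} ms busy loop x 60/s: {share:.0%} of one core")
    for sleep_us in (100, 200):
        rate = max_wakeups_per_second(sleep_us)
        print(f"sleeping {sleep_us} us per poll: at most {rate:.0f} wakeups/s")
```

Under these assumptions, an 8-16 ms spin per frame at 60 frames per second accounts for roughly half to nearly all of one core per monitor, and a 100 us sleep still allows up to 10k wakeups per second while polling, which matches the estimates given in the chat.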
14:29 Nanonej: Hello there! I completed LFD103 a couple of weeks ago and, while waiting for the reply on the LKMP, I wanted to see if I could start on my goal of contributing to Rust for Linux. I saw on Zulip that Nova could be a good environment for beginners, so here I am :D
14:31 Nanonej: Any suggestion on where to start to understand the state of the project right now and start contributing on it?
14:32 Nanonej: I read https://rust-for-linux.com/nova-gpu-driver and so there was a TO DO there: https://docs.kernel.org/gpu/nova/core/todo.html , but it's still not clear where I could be of use or where to start
14:33 Mary: might be a question for Lyude and airlied[d] for when they are around ^^
14:42 Nanonej: Ok, maybe I should leave a message on the Zulip or shoot them an email? Not sure IRC is the best for this then
19:20 _lyude[d]: So I recently discovered that CRC generation doesn't always work (in particular, when gnome-shell is loaded) - and I've been trying to understand what appears to be the cause of this. So: display CRCs are one of the few portions of the GPU that still use old-style memory handles and don't actually have the ability to specify memory offsets, so as a result we
19:20 _lyude[d]: https://gitlab.freedesktop.org/lyudess/linux/-/blob/rust/gem-shmem/drivers/gpu/drm/nouveau/dispnv50/crc.c?ref_type=heads#L508 allocate two buffers (both 16 pages in length) with a custom fixed memory handle to store the CRCs in before actually enabling CRC readback. It appears that when gnome-shell is started, we manage to allocate the buffers - but `ctx->mem.addr` is `-1` - which appears to be a
19:20 _lyude[d]: result of us getting a non-contiguous allocation returned to us:
19:20 _lyude[d]: https://gitlab.freedesktop.org/lyudess/linux/-/blob/rust/gem-shmem/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.c?ref_type=heads#L68
19:20 _lyude[d]: airlied[d] do you have any idea why this would be/is this normal behavior for nouveau's mm? I would assume maybe I'm using too much memory on the GPU for it to be able to find a contiguous allocation, but I've also seen this on my desktop at home with 48 GB of vram - and I definitely don't think I've ever run anything that would get even close to the vram limit on there.
19:22 karolherbst[d]: the last time we thought we were running into out of memory situations, it was a race condition that resulted in ENOMEM being returned..
19:23 karolherbst[d]: I wouldn't be surprised if there are more things like that around
19:23 _lyude[d]: The thing is the allocation succeeds, it's just that we can't actually map it the normal way because there's no single memory address for it
19:23 karolherbst[d]: ahh...
19:23 _lyude[d]: It's interesting too because it's only when g-s is running. if I close g-s it works fine again
19:23 _lyude[d]: Which makes me wonder if this might be related to some memory shenanigans that get activated by gnome-shell using vulkan?
19:23 nazikiller8492: You should know that esdrastarsis hangs out in a Telegram channel (@linuxgamesbr) with his transphobic friend, Patola. Most of you have been mentioned and mocked there, especially those from the transgender community. Karol Herbst is already aware, but I wanted to make it public.
19:24 _lyude[d]: ok
19:24 _lyude[d]: well, I know vulkan uses a totally different memory API than opengl did
19:33 linkmauve: chikuwad[d], could you share your optimization course materials eventually? I’m quite interested in that, if it goes more in-depth than the ones I had during my masters’ degree. :)
19:33 linkmauve: Same for your advanced graphics course, I’ve never had any in mine!
19:34 chikuwad[d]: sure, I'm gonna be sharing it with another friend of mine once all lectures are uploaded, I'll share it here too
19:34 chikuwad[d]: is there a way I can reach you that's not as ephemeral as IRC? :P
19:34 linkmauve: You can write me over XMPP, or by email.
19:34 chikuwad[d]: advanced graphics course is unfortunately gonna be much later in the master's
19:34 chikuwad[d]: pls hold
19:34 onebigsucc: e-mail works, I don't have an XMPP
19:34 linkmauve: Both my nick @ my nick .fr.
19:35 onebigsucc: oh ew why did znc pick this nick
19:37 Sid127: much better
19:41 Sid127: linkmauve: sent you an e-mail
19:41 Sid127: I'll upload lecture slides as they're made available to me, course ends at the end of june
19:50 _lyude[d]: Is there any way one can request a contiguous memory allocation in the nouveau kernel module?
20:17 _lyude[d]: ooo - OK, I think we do actually have one that just requires I go through `nvif_mem_ctor_type()`
21:07 _lyude[d]: got it!
21:37 _lyude[d]: redsheep[d]: no - but that's the edid readback bug right?
21:37 _lyude[d]: i have time to start fixing that right now, it's honestly just kept slipping my mind
21:38 _lyude[d]: i think it was reported a while ago but admittedly I am terrible at keeping up with nouveau's bug tracker 🙁
21:39 _lyude[d]: unfort this is totally different though - I started digging into CRC issues because I will need CRCs working to try to reproduce the cursor flashing issue with it
21:43 _lyude[d]: yeah! especially since I won't be tied between two drivers lol
22:09 airlied[d]: _lyude[d]: looks like you figured it out
22:13 _lyude[d]: yep
22:14 _lyude[d]: already got a patch fixing it on the mailing list, along with the patch fixing vblank scanline position readback
22:14 _lyude[d]: should at least finally be good to go for trying to reproduce that awful cursor flickering bug on the other machine. assuming it hasn't just gone away thanks to fixing the vblank stuff