00:36fdobridge: <maba_kalox> Hello, I would like to contribute to NVK dev, where could I start?
00:36fdobridge: <maba_kalox> PS. not sure if this is the right place to ask, so feel free to show me the way 8)
00:47Fijxu: fdobridge: Go into the #dri-level channel
00:47Fijxu: oh it's a bridge, right
00:48Fijxu: maba_kalox: Join the #dri-devel room
00:56fdobridge: <gfxstrand> dakr: that sounds more complicated than just splitting into multiple jobs but I also don't care. 🤷🏻‍♀️
00:56fdobridge: <gfxstrand> This is a fine place. First question: what GPUs do you have?
02:05fdobridge: <maba_kalox> I have an RTX 3070 Ti.
02:05fdobridge: <gfxstrand> Okay, then you should be set to work on just about anything.
02:06fdobridge: <maba_kalox> And something between 1-2 years of embedded C dev (embedded Linux), so - close to no driver dev experience 8)
02:06fdobridge: <gfxstrand> First step is to build yourself a kernel. You'll need to pull from drm-misc-next
02:07fdobridge: <maba_kalox> Sure. I will follow up when it is done.
02:08fdobridge: <gfxstrand> https://cgit.freedesktop.org/drm-misc/log/
03:01fdobridge: <airlied> found the weird memory config
03:01fdobridge: <airlied> we are allocating 2MB for an 8k buffer
03:16fdobridge: <airlied> https://patchwork.freedesktop.org/patch/552332/
03:38fdobridge: <gfxstrand> Oops...
03:38fdobridge: <airlied> it seems to trip up the dma-buf tests in userspace, gonna track down where
03:49fdobridge: <airlied> uggh WS_BO_LOCAL not being a flag strikes again
03:52fdobridge: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24625 needs to land first
04:03fdobridge: <gfxstrand> @airlied you seem as likely as anyone to be able to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24619
04:23fdobridge: <airlied> halfway through a run, seems less flaky Pass: 198197, Fail: 678, Crash: 7, Skip: 817610, Flake: 8, Duration: 30:39, Remaining: 30:07
04:25fdobridge: <airlied> one flake per crash seems to line up with side swiping
04:32fdobridge: <airlied> also setting MESA_VK_IGNORE_CONFORMANCE_WARNING makes things a lot nicer to read
04:54fdobridge: <airlied> Pass: 393328, Fail: 1303, Crash: 11, Skip: 1620590, Flake: 11, Duration: 1:01:44, Remaining: 0
04:54fdobridge: <airlied> that is with danilo's fence fix and the memory alloc fix
04:55fdobridge: <airlied> ampere has some geometry shader + descriptors bug
05:56fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Where can I find those kernel fixes?
05:56fdobridge: <airlied> on the dri-devel mailing list
05:56fdobridge: <airlied> I think we've posted patchwork links in here
05:57fdobridge: <airlied> https://patchwork.freedesktop.org/patch/552311/ is the fence one
05:58fdobridge: <airlied> and I posted the memory one just up there
05:58fdobridge: <airlied> the memory one needs the mesa fix applied to not regress
06:07fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> So that Mesa change will explode unpatched kernels? 💥
06:09fdobridge: <airlied> no, vice versa
06:57fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://aur.archlinux.org/packages/nouveau-fw-gsp 🚀
07:27fdobridge: <youmukonpaku1337> Poggers
10:59doras: Any opinions from nouveau winsys/GBM experts about this issue? https://github.com/flathub/org.chromium.Chromium/pull/309#issuecomment-1674457260
11:05doras: The context is that hardware acceleration in Chromium with nouveau is broken, and some Electron apps are completely broken, and neither will work until this matter is resolved.
11:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How good is this warning? :triangle_nvk:
11:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1139523768595783741/image.png
11:46fdobridge: <youmukonpaku1337> how does nvk perform nowadays?
11:46fdobridge: <youmukonpaku1337> like
11:46fdobridge: <youmukonpaku1337> is it comparable to prop speeds
11:48fdobridge: <youmukonpaku1337> or is it still talos principle at 13fps speeds
11:48karolherbst: doras: 1. what do you mean by "completely broken"? They don't run at all, or what? 2. If somebody sends patches to improve the situation we'll gladly take them unless it's a regression. And I'm confused, because using Electron apps was never an issue
11:49fdobridge: <youmukonpaku1337> i suppose they might depend on hardware acceleration to work?
11:49fdobridge: <youmukonpaku1337> or try to use hardware acceleration regardless of whether it's broken, and break
11:49doras: karolherbst: it seems that Electron apps don't fall back to software rendering when the Chromium GPU process exits due to a fatal error, or something like that.
11:50doras: I haven't dug into the Electron internals to figure out how it's different from Chromium itself in this regard.
11:50karolherbst: so is this with any random electron app or some specific? Or does it need a newer electron version or anything?
11:51karolherbst: or is it really just the case when there is no hw acceleration available?
11:51fdobridge: <youmukonpaku1337> so my guess was right
11:54fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Definitely not yet
11:55doras: karolherbst: it's Element in my case. The issue stems from a change in Chromium upstream (https://chromium-review.googlesource.com/c/chromium/src/+/4380766)
11:57doras: The issue is mostly that it's not entirely clear if this is a nouveau or a Chromium issue, since the intended nature of the GEM handle that GBM returns is not clearly defined. This is what's discussed in the Flathub Chromium PR that I linked.
11:58doras: I feel that additional opinions may help establish a path forward.
11:59karolherbst: nouveau should behave like all the other drivers, I'd say
13:29fdobridge: <esdrastarsis> 40 FPS on my setup with reclocking on low settings
13:30fdobridge: <youmukonpaku1337> what about xonotic on zink on nvk
13:32fdobridge: <esdrastarsis> idk, I haven't tested the new uAPI yet
13:32fdobridge: <esdrastarsis> but looks like zink has rendering issues on supertuxkart
13:32fdobridge: <youmukonpaku1337> ioa
13:33fdobridge: <esdrastarsis> https://discord.com/channels/1033216351990456371/1034184951790305330/1138025811040018512
13:40fdobridge: <youmukonpaku1337> PFFT
13:40fdobridge: <youmukonpaku1337> SJWIWNENSKWJEN
13:40fdobridge: <youmukonpaku1337> that is goofy
13:43fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I can test Xonotic too
13:44fdobridge: <youmukonpaku1337> sure
17:08fdobridge: <georgeouzou> I get some weird MESA: error: ../src/nouveau/vulkan/nvk_queue_drm_nouveau.c:365: DRM_NOUVEAU_EXEC failed: No such device (VK_ERROR_DEVICE_LOST)
17:08fdobridge: <georgeouzou> when running the clear/multisample tests (grep -r clear vk-default/pipeline/monolithic.txt | grep multisample)
17:19fdobridge: <airlied> That's not weird, that is device lost
17:19fdobridge: <airlied> You crashed the channel by sending a bad pushbuf
17:21fdobridge: <georgeouzou> I built the main branch
17:21fdobridge: <georgeouzou> I did not make any further changes
17:22fdobridge: <georgeouzou> When I remove the *r8g8b8a8_unorm_r16g16b16a16_sfloat_r32g32b32a32_uint_d24_unorm_s8_uint* tests, it all works smoothly.
17:22fdobridge: <georgeouzou> I need to further analyze it
17:27fdobridge: <georgeouzou> Hmm, it only happens when all tests are run together. If I run only some of those it works ok
17:27fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How about D24S8?
17:28fdobridge: <georgeouzou> If I run only the d24_unorm_s8_uint group it works ok
17:29fdobridge: <georgeouzou> If I remove it from the others they work ok,
17:29fdobridge: <georgeouzou> but all together they explode 🙂
17:32fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> And how about RGBA32?
17:35fdobridge: <georgeouzou> If I remove or run only the rgba32 group it works ok
17:36fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Have you checked dmesg?
17:47fdobridge: <georgeouzou> I will check it further again later
17:47fdobridge: <georgeouzou> It consistently fails at least
17:51fdobridge: <georgeouzou> So it should be easier to fix
18:01fdobridge: <airlied> Probably a VM fault. What GPU is it?
18:06fdobridge: <georgeouzou> RTX 2070
18:32fdobridge: <georgeouzou> Do you think it's related to Turing only?
18:36fdobridge: <airlied> Not sure if we've seen it in CTS runs, but maybe it needs a certain order
19:48doras: karolherbst: I presume having to expose a new libdrm API to fix the issue with Chromium would not be optimal, would it? The problem is nouveau keeps its "global" GEM handle list in libdrm, while most drivers keep theirs in Mesa. The list is used to track exported GEM handles to allow managing a single reference count for such handles across multiple buffer objects.
19:51karolherbst: doras: there are plans to ditch libdrm for nouveau
19:52karolherbst: but anyway, I'm still not sure what's being requested here. Others know way more about this area than I do, so either somebody just makes that change or describes what needs to be changed
19:52doras: karolherbst: I see. The bug is that nouveau doesn't consider a GEM handle exported (doesn't add it to its "global" list) when a user queries for the handle. It simply returns it.
19:53doras: I can make the change, but wouldn't it be difficult/impossible to backport to stable versions if it depends on a new libdrm API?
19:54karolherbst: kinda depends. If you can make that change inside mesa that's good enough
19:57doras: karolherbst: I may be able to call an existing libdrm API to get the handle into the exported list, but it would call an unnecessary ioctl. It's a bit of an ugly hack, but maybe not too bad for stable versions?
19:58doras: Basically, it would call drmPrimeHandleToFD unnecessarily and then we'll close the fd we get to avoid any side-effects. Not a pretty solution, but may work.
19:59doras: This can be a Mesa-only change.
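(As a rough illustration of the hack doras describes: exporting the GEM handle once via drmPrimeHandleToFD puts it on libdrm-nouveau's exported-handle list, and the resulting dma-buf fd gets closed immediately, since only that bookkeeping side effect is wanted. A minimal sketch, not the actual Mesa patch; the helper name is made up:)

    #include <stdint.h>
    #include <unistd.h>
    #include <xf86drm.h>

    /* Hypothetical helper: export the handle once so libdrm-nouveau
     * records it as shared, then drop the fd we never needed. */
    static int
    mark_gem_handle_exported(int drm_fd, uint32_t gem_handle)
    {
       int prime_fd = -1;
       int ret = drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC, &prime_fd);
       if (ret)
          return ret;
       close(prime_fd); /* avoid leaking the unwanted dma-buf fd */
       return 0;
    }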
19:59karolherbst: I think it's easier to discuss a proper solution once I can get my head around what needs to be changed here. If the solution is to move libdrm into the tree and fix it there, so be it, but I'd rather just rewrite the entire layer anyway :D
20:00karolherbst: but whatever fixes it
20:01doras: karolherbst: when you have a few minutes, I summarized it briefly here: https://github.com/flathub/org.chromium.Chromium/pull/309#issuecomment-1667457051
20:03fdobridge: <airlied> Oh no, we've been phoronixed, I'm too scared to look
20:03fdobridge: <karolherbst🐧🦀> don't
20:03fdobridge: <karolherbst🐧🦀> oh wow 😄
20:03fdobridge: <karolherbst🐧🦀> it's bad
20:04fdobridge: <karolherbst🐧🦀> we get like 5% perf
20:04fdobridge: <karolherbst🐧🦀> maybe 1%
20:04fdobridge: <karolherbst🐧🦀> depends on the test 😄
20:04fdobridge: <karolherbst🐧🦀> anyway
20:04fdobridge: <karolherbst🐧🦀> without GSP it's no surprise
20:05fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> There's no GSP firmware for RTX 4000 series yet
20:05karolherbst: doras: couldn't we just call nouveau_bo_set_prime in both cases or something?
20:06doras: karolherbst: this is my hacky suggestion. It would call `drmPrimeHandleToFD` and provide us with a fd that we'll need to close.
20:07doras: Because the return value in the case of WINSYS_HANDLE_TYPE_KMS is the handle itself, not a fd.
20:08doras: So yes, we could.
20:08karolherbst: I literally have no idea about any of this
20:08doras: :)
20:08fdobridge: <airlied> Is there a Mesa issue?
20:09fdobridge: <airlied> I can maybe take a closer look if someone pokes me next week
20:10doras: airlied: are you asking about the nouveau thing? I can open a proper issue.
20:11fdobridge: <airlied> Yes, I think your hack is probably good enough
20:12doras: Sure. I'll check if it works and open a MR too.
20:12karolherbst: anybody with proper knowledge about that dma-buf/gbm/whatever stuff should feel free to merge and cc stable whatever fix is suitable here :P
20:25fdobridge: <gfxstrand> I almost didn't notice that we were even on the graphs. 🤣
20:48fdobridge: <gfxstrand> I will not get a phoronix account and troll the forums. I will not get a phoronix account and troll the forums....
20:51fdobridge: <karolherbst🐧🦀> 😄
20:51fdobridge: <karolherbst🐧🦀> depends entirely on if the usual shitheads join the fun or not
20:51fdobridge: <airlied> Don't get one under your real name 🙂
20:52fdobridge: <karolherbst🐧🦀> that was my real mistake
20:52fdobridge: <airlied> They will get there, I won't read comments after another hour :-p
20:52fdobridge: <karolherbst🐧🦀> they are already there
20:53fdobridge: <airlied> Ah only one so far
20:53fdobridge: <karolherbst🐧🦀> still
20:54fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1139663056855634042/Screenshot_20230811-155338.png
20:55fdobridge: <karolherbst🐧🦀> yeah... it's only going downhill from now on
20:56fdobridge: <karolherbst🐧🦀> give or take, but page 3+ is where it's just nonsense
21:03fdobridge: <gfxstrand> @airlied You keep saying "new compiler"... I should probably get back to that. 😅
21:04fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Hopefully the vkcube milestone can be reached fairly soon :ferris:
21:07fdobridge: <gfxstrand> :triangle_nvk:
21:08fdobridge: <gfxstrand> Yeah, I'll probably decide to care about 3D stages after I get spilling working.
21:47fdobridge: <airlied> Or not 🙂 seems like we can get quite far with no spilling!
21:55fdobridge: <karolherbst🐧🦀> we do have a couple of registers available
21:59fdobridge: <georgeouzou> I get the following:
21:59fdobridge: <georgeouzou> [ 349.114985] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000007fffd92000 engine 40 [gr] client 1d [HUB/SKED] reason 02 [PTE] on channel 2 [01ffe43000 deqp-vk[4490]]
21:59fdobridge: <georgeouzou> [ 349.114992] nouveau 0000:01:00.0: fifo:000000:0002:[deqp-vk[4490]] rc scheduled
21:59fdobridge: <georgeouzou> [ 349.114993] nouveau 0000:01:00.0: fifo:000000: rc scheduled
21:59fdobridge: <georgeouzou> [ 349.115000] nouveau 0000:01:00.0: fifo:000000:0002:0002:[deqp-vk[4490]] errored - disabling channel
21:59fdobridge: <georgeouzou> [ 349.115002] nouveau 0000:01:00.0: deqp-vk[4490]: channel 2 killed!
22:00fdobridge: <airlied> yeah that's a write fault
22:09HdkR: 256 encoded registers is a pretty good amount before needing to spill :D
22:11fdobridge: <georgeouzou> I managed to consistently reproduce this by linearly running a list of 67 tests, but I cannot pinpoint the cause yet.
22:26fdobridge: <georgeouzou> and the push dump: https://pastebin.com/eupuDfpu