IRC Logs of #nouveau on irc.freenode.net for 2025-06-03

14:22 karolherbst[d]: mhenning[d]: soo.. I tried to get my head around how to do the proper vec assignment across phis, but it's not quite apparent to me, how the code has to end up like to handle a case like this: https://gist.githubusercontent.com/karolherbst/25784f675d436e25da609027839e5465/raw/dbd380393826b801cb78c3dd5a257961a977407f/gistfile1.txt
14:22 karolherbst[d]: the `%r1608 = copy rZ` gets assigned to `r55`, which obviously wouldn't work out, but it's so early in the reg alloc code, that none of the information required to properly detect it's a vector is available yet. Should I have a pre pass that collects all the phi references, and then when assigning `%1608` it checks the phi map for the biggest vector or something? Like having a
14:22 karolherbst[d]: map of `phi_id -> [SSARef]` or something where this can be looked up?
14:43 gfxstrand[d]: We already have something that gathers phi webs.
14:51 karolherbst[d]: it doesn't work, because it's too late
14:52 karolherbst[d]: like when the "%r1608 = copy rZ" gets assigned, the PhiWebs structure is basically empty for that value, because nothing assigned to the phi node yet
15:02 karolherbst[d]: anyway.. I'll type my idea out and see if it works
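The pre-pass idea above (a `phi_id -> [SSARef]` map that is asked for the biggest vector) could look roughly like this. It is only a sketch: `SsaValue`, `PhiId` and `PhiVecMap` are hypothetical stand-ins made up for illustration, not NAK's actual types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration; NAK's real SSA value and
// phi id types are richer than plain integers.
type SsaValue = u32;
type PhiId = u32;

/// Pre-pass result: for each phi, every vector that one of its sources
/// or its destination is a component of.
#[derive(Default)]
struct PhiVecMap {
    vecs: HashMap<PhiId, Vec<Vec<SsaValue>>>,
}

impl PhiVecMap {
    /// Record that a value flowing through `phi` is part of `vec`.
    fn add_vec(&mut self, phi: PhiId, vec: &[SsaValue]) {
        self.vecs.entry(phi).or_default().push(vec.to_vec());
    }

    /// Largest vector recorded for `phi`, if any was recorded.
    fn biggest_vec(&self, phi: PhiId) -> Option<&[SsaValue]> {
        self.vecs
            .get(&phi)?
            .iter()
            .max_by_key(|v| v.len())
            .map(|v| v.as_slice())
    }
}
```

The lookup would then happen when the scalar copy is assigned, before any register has been picked for the phi itself.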
15:37 karolherbst[d]: uhh... if I have a SSARef, is there a good way to figure out how it's used by other instructions?
15:37 karolherbst[d]: Phis are all scalar, so...
15:58 mhenning[d]: karolherbst[d]: This isn't true - the PhiWebs have their equivalence classes completely populated before assignment starts
15:59 karolherbst[d]: mhenning[d]: but don't I need this information already when calling into `SSAUseMap::add_vec_use`?
15:59 karolherbst[d]: or rather, when `SSAUseMap::for_block` is getting called
16:00 karolherbst[d]: for OpCopy there is this `self.alloc_scalar` call that looks inside the ssausemap
16:01 pixelcluster[d]: s sause map
16:01 karolherbst[d]: so that `sum.find_vec_use_after` finds it
16:02 mhenning[d]: Oh, right you can either do that in a separate pass between generating the phi webs and assignment or you can build up the vec_use structure and then replace each vec_use's ssavalues with the equivalence classes representative later
16:02 karolherbst[d]: yeah... something like that
16:03 karolherbst[d]: it's just a pain I need this information that early in the pass
16:03 karolherbst[d]: but anyway
16:03 karolherbst[d]: already have the code to collect all the info
16:04 karolherbst[d]: should be enough for a prototype and we can figure out a better solution later
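The second option mhenning describes (build the vec_use structure first, then swap each value for its equivalence class representative) can be sketched with a small union-find. All names here (`PhiWebs` as a plain union-find, `canonicalize_vec_uses`) are assumptions for illustration and do not mirror the real NAK structures.

```rust
use std::collections::HashMap;

// Hypothetical stand-in; the real SSA value type is not a bare integer.
type SsaValue = u32;

/// Toy phi-web structure: a union-find over SSA values.
#[derive(Default)]
struct PhiWebs {
    parent: HashMap<SsaValue, SsaValue>,
}

impl PhiWebs {
    /// Representative of the class `v` belongs to (with path compression).
    fn find(&mut self, v: SsaValue) -> SsaValue {
        let p = *self.parent.get(&v).unwrap_or(&v);
        if p == v {
            return v;
        }
        let root = self.find(p);
        self.parent.insert(v, root);
        root
    }

    /// Merge the classes of `a` and `b`; the smaller id becomes the root.
    fn union(&mut self, a: SsaValue, b: SsaValue) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent.insert(ra.max(rb), ra.min(rb));
        }
    }
}

/// Rewrite every recorded vector use so that it refers to class
/// representatives instead of the individual SSA values.
fn canonicalize_vec_uses(webs: &mut PhiWebs, vec_uses: &mut [Vec<SsaValue>]) {
    for vec in vec_uses.iter_mut() {
        for v in vec.iter_mut() {
            *v = webs.find(*v);
        }
    }
}
```

With this shape, a scalar that only reaches a vector through a phi resolves to the same key as the vector component once the webs are fully populated.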
16:29 karolherbst[d]: looks like the collection is working properly now: (e.g. `%r1611 => {%r134 %r135 %r136 %r137}`)
16:29 karolherbst[d]: https://gist.githubusercontent.com/karolherbst/58f8d82e03dde3bdeb711ce80170aea3/raw/76a64d6f15081eafe66dd6d981764720f4f8627a/gistfile1.txt
16:36 karolherbst[d]: uhhh...
16:37 karolherbst[d]: I'm hitting the `assert!(comp < vec.comps());` assert inside `RegAllocator::alloc_scalar` now and that's going to be fun...
16:38 karolherbst[d]: `%r1578 => {%r177 %r178}` so yeah... I'll have to keep the information which component it belongs to? mhh
16:56 karolherbst[d]: the annoying part is, that just because it's aligned to component 1 in one vector, doesn't mean it has to align to the same component of another vector being part of the same phi of a scalar value...
16:56 karolherbst[d]: but I suspect that will never happen really..
16:57 karolherbst[d]: maybe something like component 2 and 4 of a vec2 and vec4, where the latter is also used as two separate vec2 (as with coop matrix results being used in a hfma2)
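One way to keep the component information mentioned above is to store the component index next to each vector, so that the same scalar can sit at different components of different vectors. This is a sketch under assumptions: `VecSlot` and `slot_for` are hypothetical names, not part of the allocator.

```rust
// Hypothetical stand-in; the real SSA value type is not a bare integer.
type SsaValue = u32;

/// A vector together with the component a given value occupies in it.
#[derive(Debug, Clone, PartialEq)]
struct VecSlot {
    vec: Vec<SsaValue>,
    comp: usize,
}

/// Find which component of `vec` the value `member` occupies.
/// Returns None when `member` is not part of `vec`, so a returned
/// `comp` is always smaller than the number of components.
fn slot_for(vec: &[SsaValue], member: SsaValue) -> Option<VecSlot> {
    let comp = vec.iter().position(|v| *v == member)?;
    Some(VecSlot { vec: vec.to_vec(), comp })
}
```

Keeping one `VecSlot` per vector, rather than one component index per scalar, covers the vec2/vec4 case where the alignment differs between vectors.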
17:35 phomes_[d]: I am running a build of current linux 6.16 tree and gsp 570. I get fault_addr and fault_type which is helpful. Some type 0 and a type 2. Do we know what those types are?
17:35 karolherbst[d]: 990/83 -> 904/83 (instr/regs)
17:36 phomes_[d]: also noticed new crashes with 570
17:36 karolherbst[d]: I still have tons of movs tho 😭
17:39 karolherbst[d]: https://gist.github.com/karolherbst/aac5a21bece601ad03c4d4c6134e2975
17:39 karolherbst[d]: this part looks nice tho: https://gist.github.com/karolherbst/aac5a21bece601ad03c4d4c6134e2975#file-b-after-L355
17:39 karolherbst[d]: that's peak performance hmma
17:40 karolherbst[d]: that part sucks: https://gist.github.com/karolherbst/aac5a21bece601ad03c4d4c6134e2975#file-b-after-L355
17:40 karolherbst[d]: dest == src2 for lower latencies
17:42 karolherbst[d]: even though it looks much better, it still looks horrible 😄
17:42 karolherbst[d]: and --correctness asserts, oof
17:43 karolherbst[d]: ahh it's something else
17:44 karolherbst[d]: but anyway.. that gave me like 8% more perf
17:46 karolherbst[d]: I think...
17:47 karolherbst[d]: let me get to the version with thousands of prmts
17:48 karolherbst[d]: the prmt version is faster, because it only has ~860 instructions 😭
17:48 karolherbst[d]: but optimizing away those prmts sounds more painful honestly...
18:55 gameborn_[d]: Hi, can someone give any hints regarding if nouveau exposes settings such as clockspeeds, power cap or fan control to userspace? I've found there's hwmon code for this, but an hwmon doesn't seem to exist on my ada GPU
18:55 gameborn_[d]: For context, I'm trying to see if I can get any functionality in https://github.com/ilya-zlobintsev/lact working with nouveau
19:08 tiredchiku[d]: it does not
19:08 tiredchiku[d]: not yet
20:11 kar1m0[d]: gameborn_[d]: No but I was thinking about doing so through the kernel gsp since ada has gsp enabled by default
20:18 gfxstrand[d]: gameborn_[d]: No. Fan control and power management is all handled by firmware without any kernel intervention. I'm sure the firmware reports something but we don't have it plumbed through to userspace yet.
20:19 gfxstrand[d]: phomes_[d]: Uh... I think the fault types are documented somewhere.
20:19 airlied: yes fault types are ROBUST_CHANNEL_*
20:20 gfxstrand[d]: phomes_[d]: Look in `$MESA/src/nouveau/headers/nvidia/hwref/hopper/dev_fault.h`
20:20 gfxstrand[d]: 0 is PDE, 2 is PTE
20:20 airlied[d]: you have to check the rcerror as well
21:07 phomes_[d]: one example is Rage 2:
21:07 phomes_[d]: `GSP 535
21:07 phomes_[d]: nouveau 0000:01:00.0: gsp: mmu fault queued
21:07 phomes_[d]: nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:144 type:31 scope:1 part:233
21:07 phomes_[d]: nouveau 0000:01:00.0: fifo:c00000:0012:0090:[WindowThread[229395]] errored - disabling channel
21:07 phomes_[d]: GSP 570
21:07 phomes_[d]: nouveau 0000:01:00.0: gsp: mmu fault queued
21:07 phomes_[d]: nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:19 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:00000040994d2000 fault_type:00000000
21:07 phomes_[d]: nouveau 0000:01:00.0: fifo:c00000:0013:0013:[WindowThread[16777]] errored - disabling channel`
21:08 phomes_[d]: what would rcerror be for this?
21:10 airlied[d]: type:31 is ROBUST_CHANNEL_
21:10 airlied[d]: ROBUST_CHANNEL_FIFO_ERROR_MMU_ERR_FLT
21:11 airlied[d]: which means the fault addr/type makes sense
21:11 mhenning[d]: snowycoder[d]: Okay, just got around to looking at this. I think the second algorithm is a lot better because it's easier to reason about. I'm not really concerned about the cost of 255 bytes/block - that's really not all that expensive in terms of memory usage
21:26 mhenning[d]: gfxstrand[d]: Implemented this in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35328
21:46 mangodev[d]: mhenning[d]: you're such a busy girl, you *always* have a fresh MR 😭
21:46 mhenning[d]: there's a lot to do around here 😛
21:48 mangodev[d]: fair, still a relatively new codebase
21:50 mangodev[d]: mhenning[d]: speaking of issues though
21:50 mangodev[d]: what does the red triangle emoji mean on the dxvk compat checklist?
21:51 mangodev[d]: iirc it was over `VK_FRAGMENT_INTERLOCK` or something along those lines
21:52 HdkR: Is it because barely anything uses PSI? :P
21:52 mangodev[d]: unsure
21:52 mangodev[d]: i just wonder what makes d3dvk12 in specific so slow on NVK compared to dx11 or native vulkan
21:53 mangodev[d]: i'm curious
21:53 mangodev[d]: how does ue5's native vk backend run compared to the dx12 one? does it run better, or just as bad?
21:54 mangodev[d]: (on NVK)
21:57 asdqueerfromeu[d]: mangodev[d]: Well that's actually a reference to Triang3l (because they implemented FSI for RADV)
21:59 mangodev[d]: asdqueerfromeu[d]: ahhhhh funny
21:59 HdkR: ah
21:59 mangodev[d]: ~~is it just a call for them to do the same thing for NVK~~
22:00 HdkR: Has the system even been RE'd for NVIDIA?
22:00 HdkR: The Switch emulators just did idiom replacements
22:25 snowycoder[d]: mhenning[d]: Thank you, I'm optimizing it a bit since it's still 5x slower than the previous pass (with big shaders it can take 1-2ms, not ideal)
22:35 mhenning[d]: snowycoder[d]: Are you measuring that time in release mode?
22:36 mhenning[d]: Also, it could be worth trying representing it as a map from register index to stack position instead of an array. (the map wouldn't need entries for registers that aren't waiting). Not sure if that would be faster or slower - it's sort of a trade-off in terms of how frequent tex ops are