IRC Logs of #nouveau on irc.freenode.net for 2025-03-13

01:10 airlied[d]: finally got a shader dump, wow that compile is pretty nice, and I've no idea how to get there 🙂
01:40 airlied[d]: hmm nvidia seem to use cs2r for 0 loads
01:41 karolherbst[d]: instead of c2r?
01:42 karolherbst[d]: ehh wait S2R is the normal one
01:43 airlied[d]: instead of mov rZ
01:43 karolherbst[d]: heh...
01:43 karolherbst[d]: I think they balance units with it
01:43 karolherbst[d]: but yes.. CS2R returns 2 when using with unsupported system values
01:43 karolherbst[d]: *0
01:44 airlied[d]: srZ is 255
01:44 karolherbst[d]: ohhh looks like CS2R is an alu instruction
01:44 karolherbst[d]: funky
01:45 karolherbst[d]: but only when .32
01:46 karolherbst[d]: I know they also use IMAD instead of MOV
01:46 karolherbst[d]: IMAD is on the fma unit
01:46 karolherbst[d]: but I suspect they optimize for more than just latency
01:46 karolherbst[d]: because a MOV would do as well
01:47 karolherbst[d]: like apparently things like bit flips matter in regards to power consumption
01:47 karolherbst[d]: which means if you reduce the amount of bit flips when executing instructions, you get more performance, because less heat
01:48 karolherbst[d]: not sure if that's the reason
01:48 karolherbst[d]: but I'm sure nvidia is crazy enough to care about such details
02:07 airlied[d]: I've filed a couple of issues for this stuff, so we can gather ideas in there
03:34 airlied[d]: karolherbst[d]: they use c2sr because it allows doing 64-bit
03:34 airlied[d]: cs2r even
03:35 gfxstrand[d]: The 64-bit one is even fast since it's intended for shader clock
03:35 gfxstrand[d]: Like the only thing you can read with it, though, are clock and zero
03:52 HdkR: Is reading zero only useful when setting 64-bit of data to zero, instead of using zr directly?
03:53 airlied[d]: yeah at least where I'm using it, it's to zero some temp regs
03:57 airlied[d]: of course this leads to the eternal question, avoid lowering things, or try and coalesce them later
04:31 airlied[d]: I pushed a nak pass to my wip branch, it even seems to work 😛
04:38 gfxstrand[d]: Woah!
04:38 gfxstrand[d]: Should I be afraid?
04:45 airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commit/129c3957489ec72996b09a5e37fb2d05a236f91d
04:45 airlied[d]: probably 🙂
04:48 airlied[d]: though I suspect it would be slightly better if I lowered to a 64-bit zero instead since I think reg alloc might split some of those
04:48 airlied[d]: just wanted to play about with the simpler idea first
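The "avoid lowering or coalesce later" trade-off above can be sketched as a tiny peephole pass. This is only an illustration under stated assumptions, not the pass from the linked commit: the toy IR, the opcode names `zero32`/`zero64`, and the rule that a 64-bit zero needs an even-aligned register pair are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy IR (not NAK's): ("zero32", r) writes 0 to register r,
# ("zero64", r) writes 0 to the pair (r, r + 1).
Instr = Tuple[str, int]


def coalesce_zero_pairs(instrs: List[Instr]) -> List[Instr]:
    """Merge back-to-back zero32 writes into one zero64 write.

    Assumption: a 64-bit write is only legal on an even-aligned pair,
    so (r4, r5) can merge but (r5, r6) cannot.
    """
    out: List[Instr] = []
    i = 0
    while i < len(instrs):
        op, reg = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if op == "zero32" and reg % 2 == 0 and nxt == ("zero32", reg + 1):
            out.append(("zero64", reg))
            i += 2  # consumed both halves of the pair
        else:
            out.append((op, reg))
            i += 1
    return out
```

In this sketch the merge only fires when register allocation already placed the two halves next to each other, which is the reason the discussion later turns to regalloc heuristics.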
05:13 tiredchiku[d]: airlied[d]: quick question for you, in the drm/nouveau/nvif headers, I can see a NVIF_MEM_UNCACHED flag <https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/include/nvif/mmu.h#L26>
05:13 tiredchiku[d]: is that wired up on the kernel side? as in, is it possible to request said uncached memory from userspace (mesa)?
05:14 airlied[d]: Don't think it's wired to uapi at all, not sure how validated the kernel code is
05:15 tiredchiku[d]: would it be possible to get that going :doomthink:
05:15 tiredchiku[d]: currently we're the only dGPU driver that doesn't expose an uncached gart type
05:16 tiredchiku[d]: and it was worked around in vkd3d-proton's code, but could cause breakage elsewhere
05:51 airlied[d]: doubt anyone will get to it prior to nova doing it
05:53 tiredchiku[d]: hm, okay
05:54 airlied[d]: patches welcome and all that, but I don't think it provides a major upside yet
06:07 airlied[d]: yeah keeping those 32-bits grouped into 64-bits would help later as well, so I probably do need to try and retain some info
06:48 airlied[d]: Or at least regalloc heuristics
07:31 tiredchiku[d]: gfxstrand[d]: ```
07:31 tiredchiku[d]: .cls_copy
07:31 tiredchiku[d]: .cls_eng2d
07:31 tiredchiku[d]: .cls_eng3d = 0xC697, //to-do: get all this info from openrm
07:31 tiredchiku[d]: .cls_m2mf
07:32 tiredchiku[d]: .cls_compute
07:32 tiredchiku[d]: ```
07:32 tiredchiku[d]: would all these classes be the same as eng3d on turing and above?
07:35 tiredchiku[d]: apparently not, if nv_push_dump.c is to be believed
07:36 tiredchiku[d]: ..that was a dumb question, sorry for the ping e-e
13:06 gfxstrand[d]: No. High byte is generation, low byte is class. 0xC6 is Volta maybe? And 0x97 means 3D. 0xC0 is compute. 0x6F is some generic stuff. Copy is 0xB5, maybe?
13:07 karolherbst[d]: nvidia doesn't guarantee the generation to match tho
13:07 karolherbst[d]: there are a few exceptions, but I don't remember which specifically
13:07 karolherbst[d]: airlied[d]: funky...
13:08 karolherbst[d]: I can see this being faster than writing 0 twice in a couple of corner cases
13:08 karolherbst[d]: ohh yeah.. it is with HMMA
13:09 karolherbst[d]: uhm.. MMA
13:09 karolherbst[d]: interesting
13:09 karolherbst[d]: but it's slower with alu
13:10 karolherbst[d]: gfxstrand[d]: and perf counters
13:11 karolherbst[d]: airlied[d]: it's not always the best, but you should check the scheduling tables for it
13:17 notthatclippy[d]: karolherbst[d]: We ran out of bits for any kind of fully consistent scheme a while back. Don't rely on the pattern.
13:18 karolherbst[d]: yeah, I've asked about this and got a proper answer being what you said 😄
13:18 karolherbst[d]: we do have exceptions in the published classes hence I asked about it
13:18 karolherbst[d]: though it was a mistake or something, but yeah...
13:27 notthatclippy[d]: AFAIK there is nothing in NV HW or SW that relies on a given bit pattern here anymore, so all that remains is a convention that can't always be honored.
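The class-number convention described above (high byte as generation, low byte as engine class) can be sketched as follows. The function names are hypothetical, the engine table only repeats the guesses made in the conversation, and, as noted above, the pattern is a convention that is not always honored, so this is for reading numbers by eye rather than for driver logic.

```python
from typing import Tuple

# Low-byte guesses taken from the discussion above; not an authoritative list.
ENGINE_GUESS = {
    0x97: "3D",
    0xC0: "compute",
    0x6F: "generic",
    0xB5: "copy (maybe)",
}


def split_class(cls: int) -> Tuple[int, int]:
    """Split a 16-bit class number into (generation byte, engine byte)."""
    return (cls >> 8) & 0xFF, cls & 0xFF


def describe_class(cls: int) -> str:
    """Render a class number using the convention; unknown bytes stay numeric."""
    gen, eng = split_class(cls)
    name = ENGINE_GUESS.get(eng, "unknown")
    return f"generation 0x{gen:02X}, engine 0x{eng:02X} ({name})"
```

For example, `describe_class(0xC697)` yields a 3D class under this convention.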
14:28 tiredchiku[d]: I just looked through all the headers until I found all that I needed
16:50 linkmauve: Hi, my dad uses an old laptop and, after upgrading from Ubuntu 22.04 to 24.04, he’s on a 01:00.0 VGA compatible controller: NVIDIA Corporation GT216M [GeForce GT 330M] (rev a2) and apparently got a freeze on boot looking like that:
16:50 linkmauve: https://partage.jabberfr.org/XNL5ysla3c7D1fWGKhCmQqgp/e6xXJODxS12O__SesmKtKA.jpg
16:50 linkmauve: https://partage.jabberfr.org/NMyj8nGgUskWoGkrtnAbX_2a/033lEcWYTkqvIxXDqTKDQg.jpg
16:50 linkmauve: With the cursor not moving apparently.
16:55 linkmauve: kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva5_fuc084 failed with error -2
16:55 linkmauve: kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva5_fuc084d failed with error -2
16:55 linkmauve: kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
16:55 linkmauve: kernel: nouveau 0000:01:00.0: msvld: init failed, -19
16:55 linkmauve: As well as a bunch of:
16:55 linkmauve: kernel: nouveau 0000:01:00.0: gr: DATA_ERROR 00000012 [RT_LINEAR_WITH_ZETA]
16:55 linkmauve: kernel: nouveau 0000:01:00.0: gr: 00100000 [] ch 3 [003fa80000 gnome-shell[1686]] subc 3 class 8597 mthd 0d78 data 00000004
16:56 karolherbst: the firmware fails don't matter
16:56 linkmauve: kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
16:56 linkmauve: kernel: nouveau 0000:01:00.0: disp: ERROR 5 [INVALID_STATE] 09 [] chid 1 mthd 0080 data 00000000
16:56 karolherbst: the other is userspace doing something wrong
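The numeric codes in the firmware messages above are negated errno values. A minimal sketch for decoding them, assuming the usual Linux errno numbering (the helper name is hypothetical):

```python
import errno


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel-style return code to its errno symbol."""
    return errno.errorcode.get(-code, "unknown")
```

On Linux, `-2` maps to `ENOENT` (the firmware file was not found) and `-19` maps to `ENODEV`.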
16:56 karolherbst: is this a dual GPU laptop?
16:56 linkmauve: Yes.
16:57 linkmauve: Using Mesa 23.0.4-0ubuntu1~22.04.1 apparently.
16:57 karolherbst: on some systems the wrong primary GPU is chosen and you can end up with such an error
16:57 linkmauve: Weird, 22.04 instead of 24.04.
16:57 karolherbst: well.. 23.0 ain't supported by us, so it's up to ubuntu to fix
16:57 linkmauve: Err, no, it seems to only have the Nvidia.
16:57 karolherbst: ahh
16:57 karolherbst: anyway.. linear buffers with depth don't work
16:57 linkmauve: This is a Nehalem Intel CPU (Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz), so no Intel GPU whatsoever.
16:58 karolherbst: so not even sure what userspace is doing here, because it sounds like broken modifier support?
16:58 linkmauve: karolherbst, is that a bug to report to gnome-shell then?
16:58 karolherbst: mhhhh
16:58 karolherbst: in theory you could claim it's a nouveau one not properly handling linear textures like that
16:58 karolherbst: in any case, you won't get a fixed mesa for that version
16:59 karolherbst: unless ubuntu fixes it on 23.0
16:59 karolherbst: I think we did something in newer mesa to get around this, but not 100% sure on it
16:59 karolherbst: if you can reproduce with mesa 24.3 we can take a look
17:00 karolherbst: in any case, the bug report belongs to ubuntu
17:01 linkmauve: Err sorry, it’s actually 24.2.8-1ubuntu1~24.04.1
17:02 karolherbst: wayland I guess?
17:02 karolherbst: ohh wait.. it's the nv50 driver... uhhh
17:02 karolherbst: that might have gotten broken
17:03 karolherbst: I _think_ I started to look into a similar issue on those older GPUs, but I couldn't really trigger it
17:04 linkmauve: I can try main, and reproduce as much as you want.
17:07 karolherbst: ahh that would be helpful
17:07 karolherbst: though
17:07 karolherbst: I am actually more interested in a `git bisect`
17:07 karolherbst: so if you can figure out what caused the regression that would help a ton
17:09 karolherbst: it also doesn't help that I think most of my nv50 era gpus are also falling apart :')
17:09 linkmauve: Ok, I’ll try that!
17:10 linkmauve: My dad will likely continue using this laptop “forever”, so I can test anything for you any time. :)
17:10 karolherbst: nice
17:54 gfxstrand[d]: karolherbst: We really should fix nouveau GL so it can render to linear.
17:54 karolherbst: probably
17:54 karolherbst: though I think this used to work
17:54 gfxstrand[d]: NVK works around it by rendering to a tiled shadow copy and copying at the end of the render pass.
17:55 karolherbst: yeah.. and something inside mesa handled that for us afaik, or maybe I misremember
17:55 gfxstrand[d]: But I thought the dri code had paths for that. 🤔
17:55 karolherbst: yeah, same
17:55 karolherbst: I think something broke it.. maybe
17:55 gfxstrand[d]: Oh, I bet it broke with modifiers
17:55 karolherbst: could also be a nv50 bug
17:55 karolherbst: possibly
17:55 karolherbst: this isn't nvc0 here, it's nv50
17:56 gfxstrand[d]: Modifiers assumes everyone can do linear. So if you advertise support at all, you might be getting `DRM_FORMAT_MOD_LINEAR`.
17:57 gfxstrand[d]: Maybe I shouldn't throw out that Fermi+SNB laptop I have in my office after all...
17:57 karolherbst: nv50 doesn't even implement `resource_create_with_modifiers`
17:57 karolherbst: nah
17:57 karolherbst: fermi is nvc0 🙃
17:58 karolherbst: though we also have in theory the same issue on nvc0, though I think we are fine there
17:58 karolherbst: otherwise we would have tons of more bug reports
17:59 gfxstrand[d]: Heh
18:00 gfxstrand[d]: I don't plan on getting anything older than Fermi, though. I'll leave that to other people.
18:00 gfxstrand[d]: Even that laptop might be headed to recycle
18:00 karolherbst: yeah... and somehow I couldn't recreate the conditions to run into the issue on my nv50 GPUs
18:00 karolherbst: I think two of my nv50 are also broken
18:01 karolherbst: maybe there is some weirdo random cap we need to enable/disable for nv50 to make it work.. who knows
18:02 karolherbst: or maybe implement resource_create_with_modifiers and don't report LINEAR?
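The idea of not reporting LINEAR can be sketched as a modifier negotiation. This is not Mesa's code: the function names and the tiled modifier values are made up for illustration, and only `DRM_FORMAT_MOD_LINEAR = 0` matches the real definition.

```python
from typing import Iterable, List, Optional

DRM_FORMAT_MOD_LINEAR = 0  # real value from drm_fourcc.h

# Hypothetical placeholder values standing in for vendor tiled modifiers.
FAKE_TILED_A = 0x1001
FAKE_TILED_B = 0x1002


def advertised_modifiers(tiled: Iterable[int], can_render_linear: bool) -> List[int]:
    """List modifiers in preference order, adding LINEAR only if renderable."""
    mods = list(tiled)
    if can_render_linear:
        mods.append(DRM_FORMAT_MOD_LINEAR)
    return mods


def choose_modifier(driver_mods: List[int], consumer_mods: Iterable[int]) -> Optional[int]:
    """Pick the first driver-preferred modifier the consumer also accepts."""
    accepted = set(consumer_mods)
    for mod in driver_mods:
        if mod in accepted:
            return mod
    return None
```

If the driver cannot render to linear, a consumer that only accepts LINEAR gets `None` here, which makes the failure explicit at allocation time instead of surfacing later as a `DATA_ERROR`.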
18:13 tiredchiku[d]: Tesla :blobcatnotlikethis:
18:49 airlied[d]: I've got a hacky nvc0 patch to not bind zs if cbuf is linear
18:50 airlied[d]: Just to avoid the crashes, for a customer
18:51 airlied[d]: There was a period of time on a bunch of GPUs where some misc kernel context handling error meant that nothing died when you did linear and Z
18:51 airlied[d]: That got fixed and a bunch of userspace broke
18:53 karolherbst[d]: sounds annoying
19:08 airlied[d]: Yes I only spent 2 months working it out with 2 Pascal cards and a lot of git bisecting
19:16 _lyude[d]: linkmauve: could you rebuild kernel with `NOUVEAU_DEBUG_PUSH` enabled, reboot with `drm.debug=0x16 nouveau.debug=disp=trace` and then send me the full dmesg
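The `drm.debug=0x16` value above is a bitmask of DRM debug categories. A small decoder sketch, assuming the category bits as they appear in the kernel's `drm_print.h` at the time of writing (worth checking against the kernel version actually in use); the helper name is hypothetical:

```python
from typing import List

# Assumed category bits, following the kernel's drm_print.h.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vbl",
    0x40: "state",
    0x80: "lease",
    0x100: "dp",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled in mask, low bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

Under that assumption, `0x16` enables the driver, KMS, and atomic categories.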
19:49 mhenning[d]: gfxstrand[d]: I think https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32311 is ready for a re-review when you get a chance
19:55 gfxstrand[d]: mhenning[d]: Yup! It's first on my list for Monday.
19:55 gfxstrand[d]: I'm off today and tomorrow.
19:56 gfxstrand[d]: I'm really excited to land that.
19:56 gfxstrand[d]: I just need to get out of fixing things and review code again.
19:57 gfxstrand[d]: But Maxwell is getting so close I can taste it. I might get distracted by that for another day or two.
19:57 mhenning[d]: Yeah, I'm looking forward to getting that in too
19:57 mhenning[d]: gfxstrand[d]: I think gitlab is down next week, so you have a pretty good excuse to delay reviewing a little longer
19:57 gfxstrand[d]: Right
19:58 gfxstrand[d]: I guess I'll keep fixing Maxwell and Tegra then.
20:01 gfxstrand[d]: I think I figured out fp64->fp16 conversion last night. I just need to type it out.
20:01 airlied[d]: I should maybe rebase the latency stuff onto the scheduler
20:02 gfxstrand[d]: Maybe not? I might rework some of the latency stuff, in which case I'll rebase your branch for you.
20:02 gfxstrand[d]: I've got a plan. I just need the graph from Mel's scheduler and then I need to do some typing.
20:03 gfxstrand[d]: But I've been waiting to do it until I had proper latency tables, which you've been so kind as to type.
20:04 gfxstrand[d]: So I'd say wait. We'll plan on the week after next being the week of latency stuff.
20:04 airlied[d]: I'll see then if I can persuade regalloc to give my 64bits adjacent regs 🙂
20:05 gfxstrand[d]: For what? Zeroing?
20:05 airlied[d]: So it avoids pointless movs
20:05 gfxstrand[d]: Ah
20:05 airlied[d]: Currently two phi values end up as a 64bit input and they get split regs so movs ensue
20:05 gfxstrand[d]: Yeah, it's not great about that right now. It's better than terrible but it's not great
20:06 gfxstrand[d]: We need to teach the vectorizer thing about phis
20:06 mhenning[d]: Yeah, we already have a heuristic for that in a single basic block, but it still needs to be taught how to see beyond the current block
20:08 mhenning[d]: I've wondered a bit if we want to support 64-bit phis, which would mean we could just stop narrowing to 32-bits in nir and then the heuristic doesn't actually need to be global - it can look at the phi
20:09 mhenning[d]: But I haven't looked too closely at how hard it is to add 64-bit phis throughout the backend
20:12 airlied[d]: I thought that maybe adding a strong that two phis make a 64-bit value might be useful, though I think for this case the phis are only used once in a 64-bit place, so the heuristic should figure it out
20:15 mhenning[d]: I can't parse that sentence
20:17 gfxstrand[d]: mhenning[d]: There's a lot of assumptions that each phi index is a 32-bit value. Especially now that we allow constants in phis, making them support vectors is tricky
20:23 mhenning[d]: Yeah, I know that assumption is scattered everywhere, but the places I've looked at would just require another simple nested loop for the vector case.
20:23 mhenning[d]: Maybe there are places where the assumption is baked in in a way that's harder to fix though
20:24 airlied[d]: oops lost a word, adding some sort of hint that two phis will be used for 64-bits instead of trying to work it out after we've split everything
20:26 mhenning[d]: ah, makes sense. note that lowering to 32-bit phis happens in nir right now, so you'd need to either lower later or somehow represent the hint in nir
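The hint idea above, telling the allocator that two 32-bit values will later be consumed as one 64-bit source, can be sketched with a toy allocator. This ignores liveness, spilling, and everything else a real register allocator does; the function and value names are hypothetical and this is not how NAK's RA is structured.

```python
from typing import Dict, Iterable, List, Tuple


def allocate_with_pair_hints(
    values: List[str], pair_hints: Iterable[Tuple[str, str]]
) -> Dict[str, int]:
    """Assign registers, placing hinted (lo, hi) pairs in aligned adjacent slots."""
    assignment: Dict[str, int] = {}
    next_reg = 0
    # Hinted pairs go first so each one lands on an even-aligned pair.
    for lo, hi in pair_hints:
        if lo in assignment or hi in assignment:
            continue  # a value can only take part in one pair in this toy
        assignment[lo] = next_reg
        assignment[hi] = next_reg + 1
        next_reg += 2
    # Everything else takes the next free register.
    for value in values:
        if value not in assignment:
            assignment[value] = next_reg
            next_reg += 1
    return assignment
```

With a hint for `("phi_lo", "phi_hi")`, the two halves end up adjacent, so a later 64-bit use would not need extra movs in this simplified model.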
20:35 gfxstrand[d]: I wish we had a better way to denote numbers of components. We could probably add `Imm64` and maybe a thing to cbuf
21:14 airlied[d]: I think having Imm64 or Zero64 could have been enough for this case also to get the reg assigned right
22:24 gfxstrand[d]: The big thing is having enough context to know how much data is being read for things like copy-prop, legalization, spilling, and RA.
22:25 gfxstrand[d]: Or we need to prove that it doesn't matter in those cases.
22:25 gfxstrand[d]: But we can do some of that with source types, maybe
22:40 gfxstrand[d]: gfxstrand[d]: Most of those replace registers with constants so that's kinda okay. FP64 stuff supports `Imm32` now. But it all feels fragile to me at the moment and I'd like to do better.
22:42 snowycoder[d]: If I can help a little bit with some features I'd like to do something on NAK now that I know it a bit better
22:45 snowycoder[d]: That, or I can start to fix issues with Kepler, but I still don't know much about what's missing
22:47 redsheep[d]: You have kepler hardware handy?
22:48 snowycoder[d]: Yep, I do
22:48 snowycoder[d]: Not the most performant, it's a GT710
22:49 redsheep[d]: Iirc it was images and having nak be able to compile to it
22:49 mhenning[d]: The big thing with kepler is that it's missing NAK support. If you want to start writing NAK support for kepler, that might be a good project that's an intermediate difficulty
22:51 mhenning[d]: Other than that, there's a bunch of stuff we could add to NAK, and we could pick one out
22:51 snowycoder[d]: mhenning[d]: I can try that, isn't image storage support a prerequisite though?
22:52 mhenning[d]: For conformance? yes. For getting some stuff running? no
22:52 snowycoder[d]: Ok, perfect!
22:54 gfxstrand[d]: snowycoder[d]: Everything. Everything is missing. 😅
22:54 gfxstrand[d]: mhenning[d]: It's not sufficient for conformance. Codegen can't pass the CTS.
22:55 gfxstrand[d]: Or maybe I read that backwards?
22:56 redsheep[d]: gfxstrand[d]: I have a feeling this is a "draw the rest of the owl" situation
22:56 gfxstrand[d]: In any case, you can forget about images for a bit. They're also actually not that hard. I just freaked about them because I thought we had to do tiling calculations. We don't. Format conversion is easy.
22:56 mhenning[d]: snowycoder[d]: Okay, to start nak work with kepler, you'll want to be writing an equivalent of src/nouveau/compiler/nak/sm50.rs (which is maxwell/pascal) except for kepler. The main thing to do here is getting the instruction encodings working. The old compiler has its instruction encodings here: src/nouveau/codegen/nv50_ir_emit_gk110.cpp
22:56 gfxstrand[d]: And now that we have unit tests, I would start by getting those to pass on Kepler.
22:57 gfxstrand[d]: They're simpler than the CTS and let you poke at things very directly.
22:57 mhenning[d]: gfxstrand[d]: Yeah, start with a skeleton that just panics on every instruction type and then add in instructions as you encounter them
22:58 mhenning[d]: Maybe start with a really simple compute example and then work on the unit tests
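The bring-up approach suggested above (a skeleton that fails loudly on every instruction, filled in as shaders hit missing ones) can be sketched like this. NAK itself is Rust and would panic rather than raise; the class, the opcode names, and the returned bit pattern below are placeholders, not real Kepler encodings.

```python
from typing import Callable, Dict


class EncoderSkeleton:
    """Dispatch table that rejects any opcode without a registered encoder."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., int]] = {}

    def register(self, op: str, handler: Callable[..., int]) -> None:
        """Add an encoder for one opcode once its encoding is understood."""
        self._handlers[op] = handler

    def encode(self, op: str, *args: int) -> int:
        """Encode one instruction or fail loudly, naming the missing opcode."""
        handler = self._handlers.get(op)
        if handler is None:
            raise NotImplementedError(f"no encoding for {op!r} yet")
        return handler(*args)


encoder = EncoderSkeleton()
# Placeholder only: 0 is not a real instruction encoding.
encoder.register("nop", lambda: 0)
```

Each `NotImplementedError` names the next opcode to port, so the set of handlers grows in the order the test shaders actually need them.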
22:58 snowycoder[d]: gfxstrand[d]: Wait, we have unit tests besides CTS and the NAK tests MR?
22:58 gfxstrand[d]: There are NAK hardware tests for piles of opcodes.
22:58 gfxstrand[d]: `hw_tests.rs`
22:59 snowycoder[d]: Ah, right! Ok it doesn't seem that hard now
23:00 mhenning[d]: You can still toggle between the old compiler and NAK, in case having it actually executing helps
23:08 gfxstrand[d]: Yeah. That can be helpful early on but I feel like it's less and less helpful the further you go.
23:09 gfxstrand[d]: It did help me figure out Maxwell geometry shaders, though.
23:10 gfxstrand[d]: gfxstrand[d]: Why? Because most errors bringing up new hardware are encoding errors and looking at codegen only helps if it's using the same instructions.
23:11 gfxstrand[d]: That or weird semantic differences and those are best figured out through the folding tests.
23:14 mhenning[d]: snowycoder[d]: Oh, another thing that can be useful for instruction encodings is this hack I have to run nvdisasm on the shaders we output. Can be pretty useful for debugging https://gitlab.freedesktop.org/mhenning/mesa/-/commit/751fcb42bc841daa98b237c665252aeb30dd9731
23:14 mhenning[d]: Might need modifications for kepler though, and also you would need to get an old cuda toolkit version in order to have an nvdisasm with kepler support
23:15 snowycoder[d]: Thanks! I tried something similar some weeks ago but I failed, might have had something to do with endianness
23:15 gfxstrand[d]: The nv-shader-tools project in the nouveau repo has an nvfuzz tool which is useful for fuzzing the disassembler to figure out encodings.
23:17 mhenning[d]: Yeah, that too. Although hopefully you won't need to RE new instructions since the old compiler already has a lot of it figured out
23:17 gfxstrand[d]: It's especially useful if codegen is missing something or you can see that the bits are there but it's not very well documented.
23:17 gfxstrand[d]: But generally, just look at the codegen code and copy those encodings.
23:19 mhenning[d]: Yeah. Oh, and envytools also has a disassembler, although I don't recommend looking at that because it's unreadable gibberish to me
23:19 gfxstrand[d]: Heh
23:19 snowycoder[d]: Thank you for the tips! I'll work on that in my free time :3
23:21 karolherbst[d]: ~~gonna be fun supporting `TEXS` in nak~~
23:23 gfxstrand[d]: 🤷🏻‍♀️
23:24 gfxstrand[d]: Way easier than in the codegen mess.
23:28 karolherbst[d]: hopefully
23:28 karolherbst[d]: oh right... I might have to start working on nvk/nak for work related things
23:28 karolherbst[d]: so I might actually at some point even deep dive into NAK and do random things 🙃
23:29 gfxstrand[d]: :transtada128x128:
23:29 karolherbst[d]: if dave doesn't beat me to it 😄
23:31 gfxstrand[d]: I just need to be on deck to actually review code.