IRC Logs of #nouveau on irc.freenode.net for 2025-06-08

00:26 esdrastarsis[d]: `VK_EXT_shader_float8` can only be supported in Ada or later, right?
00:35 gfxstrand[d]: snowycoder[d]: Not all texture destinations are 4 regs, though.
00:36 gfxstrand[d]: HdkR: Userspace. But it's not really new. We already do that. It's just a refactor and doing it a tiny bit more.
00:37 HdkR: Interesting. I guess the 64MB base size is usually fine for 32-bit things?
00:39 gfxstrand[d]: The GPU is 40-bit no matter what userspace is.
00:40 gfxstrand[d]: I'm not making CPU VA reservations
00:40 gfxstrand[d]: Though I kinda want to
00:48 HdkR: gfxstrand[d]: Ah, device side VA reservations okay. I'm fine with that :D
05:37 mupuf: _lyude[d]: I didn't check, but I would not be surprised. time do be like that
06:41 snowycoder[d]: gfxstrand[d]: On kepler they can use at most 4 registers. Bucket values can be invalid if they use less than 4.
06:41 snowycoder[d]: The cool thing is that every RegRef uses only 1 bucket since it cannot cross the 4-byte alignment
12:22 gfxstrand[d]: Yes. That I get.
12:22 gfxstrand[d]: Anyway, I'll give it a read tomorrow
12:27 snowycoder[d]: gfxstrand[d]: Thanks!
12:53 gfxstrand[d]: The case that concerns me (again, haven't read it) is if you have a tex that only writes r8..10 and then someone else uses r11. Do we end up with false dependences?
13:00 snowycoder[d]: gfxstrand[d]: Nope, in that case it would fetch bucket 2, being [0, 0, 0, 255], range 3..4 ([255]).
13:00 snowycoder[d]: Since the value is too high, it is discarded as not in the stack
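A minimal Python sketch of the bucket lookup snowycoder[d] describes here, for illustration only. The names (`BUCKET_REGS`, `INVALID`, `pending_entries`), the dictionary layout, and the reading of 255 as "no pending texture write" are assumptions, not NAK's actual data structures.

```python
BUCKET_REGS = 4   # registers per bucket (assumed from "at most 4 registers")
INVALID = 255     # assumed marker for "no pending texture write"


def pending_entries(buckets, first_reg, num_regs, stack_len):
    """Return the stack indices a register range still depends on.

    buckets maps a bucket index to a list of BUCKET_REGS per-register values.
    Values >= stack_len are treated as "not in the stack" and dropped.
    """
    bucket, offset = divmod(first_reg, BUCKET_REGS)
    if offset + num_regs > BUCKET_REGS:
        raise ValueError("register range crosses a bucket boundary")
    slots = buckets.get(bucket, [INVALID] * BUCKET_REGS)
    window = slots[offset:offset + num_regs]
    return [v for v in window if v < stack_len]


# A tex wrote r8..r10, so bucket 2 holds [0, 0, 0, 255].
buckets = {2: [0, 0, 0, INVALID]}
print(pending_entries(buckets, 11, 1, stack_len=1))  # r11: []
print(pending_entries(buckets, 8, 3, stack_len=1))   # r8..r10: [0, 0, 0]
```

Under these assumptions a later use of r11 reads only slot 3 of bucket 2, finds 255, and picks up no dependency on the texture instruction.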
16:19 calico: Hello guys, I just saw this article:
16:19 calico: q%l)o2V#h8L3L
16:19 calico: ohh shit
16:19 calico: posted my password
16:20 calico: https://www.gamingonlinux.com/2025/04/mesa-nvk-nvidia-vulkan-driver-now-vulkan-1-4-conformant-on-maxwell-pascal-and-volta-gpus/
16:20 calico: ok so I saw this article
16:20 calico: it says support of NVK on Kepler is nearly ready
16:20 calico: is it true?
16:33 karolherbst: calico: have fun changing your password then :P
16:33 karolherbst: calico: yes
16:33 karolherbst: thank gfxstrand[d] for that
16:34 calico: I remember some months ago someone said he won't work on that
16:34 calico: or maybe it was Fermi
16:34 calico: anyway
16:34 calico: it's very cool
16:35 calico: won't have to buy a GTX 1660 Ti just for getting full GPU compat on my motherboards from 2014 (i7-4790K)
16:37 karolherbst[d]: yeah probably fermi
16:37 karolherbst[d]: I'd be curious how well it performance with kepler reclocking enabled
16:38 karolherbst[d]: *performs
16:39 calico: probably near as good as with the proprietary drivers
16:40 karolherbst[d]: must be
16:42 calico: just sad my GigaByte GA-MA770-UD3 (i7-3770K) **seems** to be dead ... dead BIOS or PCH ... won't even POST, always bootloop
16:44 calico: the GTX 760 was originally on igt
16:44 calico: *it
16:46 calico: gfxstrand[d]: how hard was the port?
17:14 snowycoder[d]: calico: Almost, to have performance similar to the old drivers we need to merge 1 MR and to implement instruction scheduling.
17:14 snowycoder[d]: But feature-wise it handles all Vulkan 1.2
17:34 calico: snowycoder[d]: so most Steam games of the 2010-2019 era should work. YEAH
17:37 f_: calico: it is fermi, I've been told it needs a lot of work to get there
17:37 f_: and no one stepped up to do it
17:38 calico: f_: what?
17:38 f_: Too bad, but as long as the old nouveau GL drivers are still sort of maintained (in that someone takes a lil' look at it every once in a while) I'm fine
17:38 calico: my GTX 760 is Kepler: https://www.techpowerup.com/gpu-specs/geforce-gtx-760.c1857
17:38 f_: I was replying to:
17:38 f_: 18:34 <calico> I remember some months ago someone said he won't work on that
17:38 f_: 18:34 <calico> or maybe it was Fermi
17:39 calico: oh that
17:39 calico: yeah
17:39 calico: hmm
17:39 f_: overall it seems like Fermi is not getting much love these days
17:39 f_: I don't blame anyone for it, but that's just how it is
17:40 f_: (well, I do blame nvidia a bit :p)
17:40 calico: hmm
17:41 calico: on my laptop's gtx 1660 Minecraft always crashes with this error:
17:41 f_: I try my best to report the remaining bugs .. but whether someone will work on fixing them... It seems they're all busy on other (probably more exciting!) things (which is absolutely understandable)
17:42 f_: so yeah
17:42 pavlo_kozlenko[d]: snowycoder[d]: Maximum version planned 1.2.175?
17:42 f_: By the way does anyone know how can I get rid of the flickering on my laptop display?
17:42 calico: X Error of failed request: BadAlloc (insufficient resources for operation)
17:43 calico: here's the error ^
17:43 pavlo_kozlenko[d]: pavlo_kozlenko[d]: karolherbst[d]
17:43 f_: the top-right corner shows a line that keeps on flickering and sometimes the whole display glitches and flickers
17:43 f_: for a split second
17:43 karolherbst[d]: yeah.. I don't think Kepler will ever get 1.3
17:43 karolherbst[d]: I _think_ it's a hw limitation, something around memory model
17:45 f_:is happy to have nouveau around in the first place, btw.
17:45 f_: <3
17:45 gfxstrand[d]: calico: Yeah, it's Fermi that's probably not going to happen.
17:46 f_: What if I replace my fermi with something else? :D
17:47 f_: it's a socketed MXM card FWIW
17:47 karolherbst[d]: replacing MXM cards is pure pain
17:48 pavlo_kozlenko[d]: gfxstrand[d]: even vulkan 1.0?
17:48 karolherbst[d]: gfxstrand[d]: I just wanted to say that Dave might add Fermi support once Red Hat decides to drop the GL drivers and go zink only, but RHEL doesn't even support Fermi anymore 🙃
17:52 pavlo_kozlenko[d]: It's a pity that OpenGL 4.6 could not be implemented
17:52 f_: karolherbst: on this laptop? It's a socketed card, you just pop it out and replace it
17:52 pavlo_kozlenko[d]: karolherbst[d]: On the Kepler, will we have OpenGL 4.5 through zink?
17:53 f_: then you install coreboot on it with seabios and get it to pick the card's VBIOS
17:53 karolherbst[d]: OpenGL 4.6 is kinda implemented, but not advertised... maybe we just should..
17:53 f_: not the first time someone tries to replace the GPU on this laptop, but possibly the first time one flashes coreboot on such a setup
17:53 karolherbst[d]: f_: yeah, and then you replace your cooling system because it doesn't fit with the new card
17:54 f_: Ah, yes, this is all provided you get a card that fits
17:54 karolherbst[d]: yeah.. find a card that fits
17:54 pavlo_kozlenko[d]: karolherbst[d]: Make it possible to activate it through a variable, like I_WANT_BROKEN_MY_GL_DRIVER.=1
17:54 f_: I believe there are quite a few cards that do fit
17:54 karolherbst[d]: your specific laptop model?
17:54 f_: elitebook 8560w
17:54 f_: needs the smaller ones
17:55 f_: Something this size https://tpucdn.com/gpu-specs/images/c/1430-front.jpg
17:55 Jasper[m]: Some laptops do not do MXM-B (which is where the funner cards are), some laptops also explode if you replace the card (Dell for example :^))
17:55 karolherbst[d]: rough.. I mean.. worst case you customize the cooling yourself, just needs a bit of welding and...
17:56 karolherbst[d]: anyway.. replacing MXM cards is just a huge amount of fun
17:56 f_: Jasper[m]: oh, it's possible mine explodes if I replace the card, at least with the HPBIOS that's currently on there
17:57 f_: a BIOS I've been planning to replace anyway... the flash chip containing the EC fw and the BIOS is easily accessible though, so I can easily reprogram it
17:57 Jasper[m]: Nah idk if it will, it shouldn't. Dell just had some particular cases where it did...
17:57 f_: well, the bios did explode when I plugged in some usb stick with lubuntu on it
17:57 f_: it would just hang
17:57 f_: whenever it saw the usb
17:58 f_: karolherbst[d]: all sorts of fun! You could say it removes the fun in funderscore :D
17:58 f_: It was also all sorts of fun getting the MXM properly working in coreboot in the first place I heard
17:58 karolherbst[d]: I'd suggest getting a system with TB, but I think NVK performance is quite bad if your PCIe bus is slow...
17:59 karolherbst[d]: should probably do something about that
17:59 f_: TB?
17:59 karolherbst[d]: thunderbolt
17:59 f_: ah
17:59 karolherbst[d]: though
17:59 karolherbst[d]: uhm..
17:59 karolherbst[d]: I think there were "MXM to PCIe" adapters
17:59 karolherbst[d]: and then you have a uhm.. how is it called
17:59 karolherbst[d]: the slim cable thingies
17:59 f_: this laptop predates thunderbolt becoming popular
17:59 f_: expresscard?
17:59 f_: mPCIe?
18:02 karolherbst[d]: nah
18:02 f_: I've seen people hooking up expresscard to some external gpu
18:02 f_: not sure how fast that kind of stuff would be though
18:04 karolherbst[d]: https://ae01.alicdn.com/kf/S34d9d6a8be1945f084460ced3386b078U.jpg
18:04 f_: this is cursed
18:05 karolherbst[d]: very
18:05 f_: I mean
18:05 f_: very practical when I want to bring my laptop abroad
18:05 f_: ;p
18:06 f_: I'd like to take a moment to thank HP for hardwiring the iGPU to stay off as well
18:06 karolherbst[d]: I'm sure it's detachable
18:06 f_: karolherbst[d]: yeah, but then I got no graphics anymore ;)
18:06 karolherbst[d]: you do
18:06 karolherbst[d]: 😛
18:06 f_: no, the integrated gpu is hardwired to stay off
18:07 f_: it is not used for anything. The fermi handles all the graphics
18:07 karolherbst[d]: what
18:07 f_: so I can't use the integrated gpu at all
18:07 f_: thanks hp
18:08 karolherbst[d]: cursed
18:08 karolherbst[d]: why have a laptop when your GPU burns 100W away
18:08 f_: because it's a thick elitebook
18:09 f_: this is one of those very thick laptops hp produced back in 2011
18:09 f_: As you can imagine the battery life is not all that good when it's running, even though it lasts really long when in suspend
18:09 karolherbst[d]: ah yes, when it was acceptable that the battery drained in one hour
18:10 f_: not quite one hour, I think only several hours, but yes.
18:10 karolherbst[d]: maybe that's the reason why those fermi cards have such low clock states, always wondered why they allow like 100MHz or something
18:10 f_: so yeah I usually keep it plugged in to the wall when I can
18:11 f_: Yeah it's probably that
18:12 f_: But at least I can blame just one GPU when I have graphical issues instead of two :p
18:13 karolherbst[d]: the desktop must be slow tho
18:14 f_: karolherbst[d]: not very slow, but that's probably because I use sway and primarily TUI/CLI
18:14 f_: when I tried hyprland on it (yes, I know) it was indeed pretty slow
18:14 f_: so I went back to sway. And then the whole thing happened, so there's one more reason not to use it :p
18:15 f_: IIRC it seemed to be working okay in gnome, though I think this was with the (not so good anyway) proprietary driver
18:16 f_: Does fermi support reclocking right now?
18:16 f_: # cat pstate
18:16 f_: <...>
18:17 f_: AC: core 202 MHz memory 324 MHz
18:18 f_: Oh. Apparently not https://paste.debian.net/hidden/3d71f24d/
18:18 f_: (or I'm holding it wrong™!)
18:25 f_: according to https://nouveau.freedesktop.org/PowerManagement.html memory reclocking is "WIP", so, ok not there yet
18:27 pavlo_kozlenko[d]: AC: core 202 MHz memory 324 MHz
18:27 pavlo_kozlenko[d]: 0f: core 550 MHz memory 900 MHz
18:28 f_: pavlo_kozlenko[d]: yeah?
18:28 f_: lol I just ran strings on vbios.rom
18:28 f_: ERROR: Valid MXM Structure not found.
18:28 f_: POST halted for 30 seconds, P-State limited to P10...
18:28 pavlo_kozlenko[d]: What video card is this? Fermi?
18:28 f_: brings back memories
18:29 f_: pavlo_kozlenko[d]: fermi indeed
18:29 pavlo_kozlenko[d]: on mobile versions of the fermi there are problems with power management
18:30 f_: you mean, with the card itself or nouveau?
18:30 pavlo_kozlenko[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1381339576001364019/99e426a1-4159-44fd-addb-0d138948930f.png?ex=6847283a&is=6845d6ba&hm=17f2897ac50d759a06f90d41410a28510e36eadc8a5a8c43e16440766ea19514&
18:30 pavlo_kozlenko[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1381339576337170554/8d826142-3ce0-4d09-b64f-573fb4c2d59e.png?ex=6847283b&is=6845d6bb&hm=f760337ef018f491ecb90c384f78724c5aa2468dff55b12a42674b43eceb044e&
18:30 pavlo_kozlenko[d]: ambiguous redirection
18:30 pavlo_kozlenko[d]: and on the us locale there was something like "the function is not implemented".
18:30 pavlo_kozlenko[d]: gt 710m
18:31 f_: # echo 0f | tee pstate
18:31 f_: 0f
18:31 f_: tee: pstate: Function not implemented
18:31 f_: yep I see Function not implemented
18:31 pavlo_kozlenko[d]: karolherbst[d]
18:31 pavlo_kozlenko[d]: it`s hell)
18:32 f_: that said there's only one dri entry
18:32 f_: lrwxrwxrwx 1 root root 0 Jun 7 09:25 0 -> 0000:01:00.0
18:32 f_: drwxr-xr-x 19 root root 0 Jun 7 09:25 0000:01:00.0
18:32 f_: lrwxrwxrwx 1 root root 0 Jun 7 09:25 128 -> 0000:01:00.0
18:32 pavlo_kozlenko[d]: It seems like the list of frequencies is available, but we can't select them.
18:32 karolherbst: fermi doesn't support reclocking
18:32 f_: good to know
18:32 pavlo_kozlenko[d]: very bad news
18:32 dwfreed: pavlo_kozlenko[d]: you can't do an echo into multiple files like that
18:33 f_: Not "very bad" news I'd say, it was expected TBH
18:33 dwfreed: for file in /sys/kernel/debug/dri/*/pstate; do echo 0f > $file; done
18:33 f_: and this isn't a gaming card by any means anyway
18:33 f_: well probably gt710m is
18:33 dwfreed: gt710? no
18:33 f_: okay :p
18:34 pavlo_kozlenko[d]: but it is a much more powerful card than the intel 2500
18:34 dwfreed: no argument there
18:34 f_: My card at least is not a gaming card by any means whatsoever
18:35 f_: so reclocking not working is the least of my issues, given it sucks at gaming either way
18:35 pavlo_kozlenko[d]: not gaming, but it is necessary that at least the graphical interface does not slow down. For this you need to use maximum frequencies
18:35 dwfreed: karolherbst: what's kepler like? I've got a laptop with a GK107 (gt750m), and debian is basically dropping the nvidia 470 driver line (since it's no longer supported) so I'm probably going to have to go back to nouveau when I upgrade to trixie
18:35 f_: Least of my issues again, I use sway
18:35 pavlo_kozlenko[d]: And the most interesting thing is how to implement it
18:35 f_: not the kind of person that wants lots of animations here and there
18:35 pavlo_kozlenko[d]: GF117M
18:36 f_: dwfreed: have a look at https://nouveau.freedesktop.org/FeatureMatrix.html assuming it's actively updated
18:36 f_: kepler is NVE0
18:37 dwfreed: Yeah, I know, I've read it a few times
18:37 f_: alright ^^
18:37 pavlo_kozlenko[d]: "it's actively updated" What is the exact?
18:38 f_: 'Last edited Fri Jun 6 21:05:42 2025'
18:38 pavlo_kozlenko[d]: What was the last update
18:40 pavlo_kozlenko[d]: That's not what I meant, I'm talking about something like a commit where it says what specifically changed.
18:40 f_: Probably in a git repo somewhere.
18:41 f_: https://gitlab.freedesktop.org/nouveau/wiki/-/blob/main/sources/VideoAcceleration.mdwn
18:41 f_: oops wrong page
18:41 f_: https://gitlab.freedesktop.org/nouveau/wiki/-/commits/main/sources/FeatureMatrix.mdwn
18:45 pavlo_kozlenko[d]: What's that about Pascal, by the way?
18:45 pavlo_kozlenko[d]: Are Nvidia still assholes?
18:46 pavlo_kozlenko[d]: I'm talking about firmware.
19:57 snowycoder[d]: I'm trying to tackle Kepler instr_latencies, does anybody have pointers?
19:57 snowycoder[d]: From what I've seen the documentation is all reverse-engineered and contradictory (papers and envy-tools cite different things).
19:58 karolherbst[d]: yep
19:58 karolherbst[d]: it's all reverse engineered indeed
19:58 karolherbst[d]: I think the values are something that aren't wrong enough to cause bugs
19:59 karolherbst[d]: snowycoder[d]: just do whatever codegen is doing for now, and maybe we can improve things later or get proper info
20:19 snowycoder[d]: karolherbst[d]: Is there a chance to reuse the current infrastructure for reordering and scoreboarding? `InstrDeps` seems entirely different from whatever codegen does
20:30 karolherbst[d]: snowycoder[d]: It's not that different in principle
20:30 karolherbst[d]: you categorize the instruction in classes (which codegen does) and then each class has a relationship
20:31 karolherbst[d]: kinda need to translate from `operationClass` to whatever is done in nak
20:31 karolherbst[d]: but from a general pov you can just do how it's done in NAK and use the information in codegen
20:35 karolherbst[d]: though the code is quite hard to follow 🙃
20:37 snowycoder[d]: Thanks, I'll read it more carefully then
20:38 karolherbst[d]: I don't know if there is a strong relationship between the opcode classes on kepler as there is on newer gens... it's all a bit of a mystery for us
20:39 karolherbst[d]: anyway...
20:39 karolherbst[d]: you can't mess it up
20:39 karolherbst[d]: the hardware double checks it's correct and if it's not you just pay a heavy perf penalty
20:39 karolherbst[d]: which makes it even more annoying to reverse engineer
20:40 snowycoder[d]: karolherbst[d]: That to me still means messing it up:blobcatnotlikethis:
20:40 karolherbst[d]: might be able to read out the shader clock and execute instructions a million times or something
20:46 HdkR: Does Kepler have a shader clock system reg to read to do manual clock cycle investigations like that?
20:52 mhenning[d]: yes, there's a shader clock
20:55 mhenning[d]: snowycoder[d]: yes. I think kepler doesn't need any scoreboarding beyond the TexDepBar stuff you've already done. You should be able to implement a model of the instruction latencies (see eg. sm75_instr_latencies.rs) and then just use the delays from nak's IR
20:58 HdkR: Nice. I haven't looked much at Kepler so always fun to learn more :)
21:01 karolherbst[d]: mhenning[d]: uhm... I think there is _something_...
21:02 karolherbst[d]: mhhh
21:02 karolherbst[d]: nvm, I was thinking of Maxwell
21:04 snowycoder[d]: mhenning[d]: Hold on, isn't codegen doing scoreboarding in `nv50_ir_emit_nvc0.cpp`, with `SchedDataCalculator`?
21:06 karolherbst[d]: it's just cycles
21:19 mhenning[d]: snowycoder[d]: Yeah, I think that just calculates delays for each instruction.
21:20 mhenning[d]: although I admit I haven't read it especially closely
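A rough Python sketch of the class-based latency model discussed above: each instruction is sorted into a class, each class has a result latency, and the per-instruction delay falls out of when the next instruction can issue. Everything here is hypothetical: the class names, the `RAW_LATENCY` cycle counts (placeholders, not measured Kepler values), and the `assign_delays` helper do not come from codegen or NAK.

```python
from enum import Enum


class OpClass(Enum):
    ALU = "alu"
    SFU = "sfu"
    MEM = "mem"


# PLACEHOLDER cycle counts for illustration only, not real hardware numbers.
RAW_LATENCY = {OpClass.ALU: 6, OpClass.SFU: 12, OpClass.MEM: 20}


def assign_delays(instrs):
    """instrs is a list of (op_class, dst_reg_or_None, [src_regs]).

    Returns one delay per instruction: the number of cycles between issuing
    that instruction and issuing the next one (1 for the last instruction).
    """
    ready = {}          # register -> cycle at which its value is available
    issue_cycles = []
    cycle = 0
    for op_class, dst, srcs in instrs:
        # Stall until every source register has been produced.
        cycle = max([cycle] + [ready.get(s, 0) for s in srcs])
        issue_cycles.append(cycle)
        if dst is not None:
            ready[dst] = cycle + RAW_LATENCY[op_class]
        cycle += 1      # at most one instruction issues per cycle
    gaps = [b - a for a, b in zip(issue_cycles, issue_cycles[1:])]
    return gaps + [1]


program = [
    (OpClass.ALU, "r0", []),
    (OpClass.ALU, "r1", []),
    (OpClass.ALU, "r2", ["r0", "r1"]),
]
print(assign_delays(program))  # [1, 6, 1]
```

The second instruction gets the long delay because the third one has to wait for r1. A real model would also need write-after-read and write-after-write cases, which this sketch leaves out.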
22:08 airlied[d]: okay funky blackwell fails: NAK IR after calc_instr_deps:
22:08 airlied[d]: block.u 0 L1 [] -> {
22:08 airlied[d]: r0 = s2r sr[0x21] // delay=1 wr:0
22:08 airlied[d]: r1 = s2r sr[0x22] // delay=5 wr:1
22:08 airlied[d]: r2 = s2r sr[0x23] // delay=1 wr:2
22:08 airlied[d]: r0 = imnmx.u32 r0 r1 pF // delay=6 wt=000011
22:08 airlied[d]: r0 = imnmx.u32 r0 r2 pF // delay=6 wt=000100
22:08 airlied[d]: p0 = isetp.lt.u32 r0 0x4 // delay=13
22:08 airlied[d]: @!p0 exit // delay=1
22:08 airlied[d]: } -> [1]
22:08 airlied[d]: block 1 L3 [0] -> {
22:08 airlied[d]: r0 = mov 0x1 // delay=1
22:08 airlied[d]: r2..4 = ldc.b64 c[0x1][+0x0] // delay=6 wr:0
22:08 airlied[d]: null = atom.add.u32.global.a64.strong.gpu [r2..4+0x18] r0 // delay=2 wt=000001 rd:0
22:08 airlied[d]: exit // delay=1 wt=000001
22:08 airlied[d]: } -> []
22:08 airlied[d]: it's uint x = gl_LocalInvocationID.x;
22:08 airlied[d]: uint y = gl_LocalInvocationID.y;
22:08 airlied[d]: uint z = gl_LocalInvocationID.z;
22:08 airlied[d]: if (x < 4u && y < 4u && z < 4u)
22:08 airlied[d]: atomicAdd(ssbo.ua[6], 1);
22:08 airlied[d]: the atomic add is happening no matter what, so I get 256 of them instead of 64
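A small Python check of the expected count, for reference. It assumes a local workgroup size of 8x8x4 (the log only gives the total of 256 invocations, so the exact shape is a guess) and that `imnmx ... pF` selects the maximum, so the IR tests `max(x, y, z) < 4` in place of the three separate comparisons. The function names are illustrative.

```python
from itertools import product


def passes_glsl(x, y, z):
    # The condition as written in the GLSL source.
    return x < 4 and y < 4 and z < 4


def passes_nak(x, y, z):
    # The condition as the NAK IR appears to compute it: two max ops, one compare.
    return max(max(x, y), z) < 4


def count_atomics(local_size, pred):
    """Count invocations in one workgroup that would reach the atomicAdd."""
    ids = product(*(range(n) for n in local_size))
    return sum(1 for x, y, z in ids if pred(x, y, z))


LOCAL_SIZE = (8, 8, 4)  # ASSUMED shape; only the 256 total comes from the log
print(count_atomics(LOCAL_SIZE, passes_glsl))  # 64
print(count_atomics(LOCAL_SIZE, passes_nak))   # 64
```

Both forms agree on every invocation ID, so the max-based rewrite itself would not explain getting 256 atomics instead of 64.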
22:24 mhenning[d]: maybe try commenting out the call to opt_jump_thread? That will prevent it from moving the exit
22:25 mhenning[d]: or maybe gl_LocalInvocationID has moved?
22:52 airlied[d]: I already tested the ids in a simpler test, so not that
23:00 airlied[d]: the only other thing is maybe isetp is broken I suppose
23:09 airlied[d]: I expanded the test to a different atomic in the other side of the if, also get 256 of them
23:21 karolherbst[d]: what does the shader look like with nvdisasm?