00:00 clangcat[d]: down: Well yeah, sadly I doubt NVIDIA would go out of their way to help Nouveau maintain older cards, just because they're old. It sucks, because a lot of these cards are still fine.
00:00 clangcat[d]: Just older
00:05 down: yes, vulkan will give them a second (third?) wind, but they don't need it (NVIDIA F*CK YOU) because it doesn't give them any value other than respect from N-1 guys with a bunch of "junk". I'm just sad
00:23 mhenning[d]: NVK does not work with fermi and there are no plans to make it work. The old GL driver should still support those cards though.
00:25 down: a real pity, thanks for the answer
00:39 clangcat[d]: mhenning[d]: Yea figured that was the answer but wasn't 100% sure.
00:40 clangcat[d]: down: I mean, for me it's just that a lot of old hardware is still good. Not if you play the latest games, but still fine for general-purpose use and lighter games
00:41 rinlovesyou[d]: yep, it's annoying when support for older hardware gets dropped, especially when it's still perfectly fine for a lot of use cases
00:51 clangcat[d]: rinlovesyou[d]: Well, at the very least with Nouveau you can still have the cards working with OpenGL. Probably not at 100% of the performance, but better than nothing
01:20 redsheep[d]: down: For the record, at least from everything recent that I have observed, nouveau works as well as it does now because of, not in spite of, nvidia's help. Several Nvidia employees are active and frequently helpful in this channel.
01:20 redsheep[d]: Also, for Vulkan specifically it has been discussed before: Kepler and especially Fermi have hardware limitations that make full Vulkan, and therefore full DXVK, support highly difficult, if not outright impossible.
01:25 mrmx450: Hi, is this where we can submit piglit results?
01:27 redsheep[d]: mrmx450: Are you seeing fails with native OpenGL, or NVK+zink? What hardware?
01:29 mrmx450: I saw a couple errors with an mx450 turing/NV160 + intel hd graphics hybrid laptop
01:29 redsheep[d]: Do you see fewer issues with `NOUVEAU_USE_ZINK=1`?
01:30 mrmx450: I can test it out
01:32 mrmx450: Would the old results be helpful, or should I just wait until I get the new ones after running it with that variable?
01:32 redsheep[d]: For Turing, zink is the more active area for improvement, and more likely for the devs to be interested. There are also known issues there, though.
01:32 redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/zink/ci/zink-nvk-ga106-fails.txt?ref_type=heads
01:32 redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/zink/ci/zink-nvk-ga106-flakes.txt?ref_type=heads
01:32 redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/zink/ci/zink-nvk-ga106-skips.txt?ref_type=heads
01:33 redsheep[d]: If you're seeing issues not on those lists you'd probably just want to open an issue
01:34 mrmx450: I don't have an issue. This server just has people.freedesktop.org/~imirkin/ and nouveau.freedesktop.org/TestersWanted.html linked
01:38 redsheep[d]: Right, if you use `NOUVEAU_USE_ZINK=1` and you see fails that aren't covered by the lists I linked, opening an issue would be most appropriate. If there's something wrong with your specific configuration or card that's good to know, and it's most likely to get addressed if you're using zink.
04:22 mrmx450: Got 18799 pass, 8406 fail, will look through the results and share them on the issue tracker
07:59 demonkingofsalvation[d]: stupid question, but why doesn't nouveau have hardware acceleration for the newer graphics cards (unless I missed something)?
08:05 clangcat[d]: demonkingofsalvation[d]: For newer cards? Which cards do you mean exactly and what do you mean by hardware accel?
08:05 clangcat[d]: Like, they have an OpenGL driver and a Vulkan driver for hardware-accelerated rendering.
08:06 demonkingofsalvation[d]: clangcat[d]: like turing and above, as for hardware accel like, https://wiki.archlinux.org/title/Hardware_video_acceleration#NVIDIA
08:06 demonkingofsalvation[d]: unless the arch wiki is outdated
08:07 clangcat[d]: demonkingofsalvation[d]: I'm confused as to which line you mean?
08:07 demonkingofsalvation[d]: and the nouveau website says it's TODO:
08:07 demonkingofsalvation[d]: https://nouveau.freedesktop.org/FeatureMatrix.html
08:07 demonkingofsalvation[d]: clangcat[d]: my bad, wrong link
08:08 magic_rb[d]: OpenGL and Vulkan work best on Turing and up
08:08 magic_rb[d]: As for hw accel on video, not sure, haven't tested it
08:08 clangcat[d]: demonkingofsalvation[d]: Wait do you mean like video stuff?
08:08 demonkingofsalvation[d]: yeah, sorry if that wasn't clear
08:11 clangcat[d]: demonkingofsalvation[d]: Yeah, sorry, I automatically assumed you meant, uhhh, individual apps. As for why it's TODO, it's probably just a matter of needing devs/time to work on it, and I don't know where the Nouveau devs rank it in terms of priority; I imagine most of the initial work would go towards KMS/DRM and other features considered necessary before video accel.
08:11 clangcat[d]: Would be my guess anyway
08:14 demonkingofsalvation[d]: how difficult is it to implement?
08:15 demonkingofsalvation[d]: wanted to know if I could help the effort
08:22 clangcat[d]: demonkingofsalvation[d]: Video decoding? I, uhh, have no idea; I've never worked with video. But you could always look at how the driver supports it for other GPUs and try to port that to the more modern ones. Though talking to some of the Nouveau devs like airlied[d] or gfxstrand[d] could help, as they can probably inform you better than I can.
08:22 clangcat[d]: Especially about any potential reasons video decoding isn't done yet (i.e. if there are other things that need to be finished first, etc...).
08:23 clangcat[d]: But you should definitely try to help out if you can.
08:26 redsheep[d]: I think dwlsalmeida was looking at video acceleration? Might be mistaken. And yeah Cait is right, other things have come first but as more stuff has gotten buttoned up that is moving up the list, I'm sure
08:27 redsheep[d]: IIRC there was something about getting one frame successfully decoded a few months ago
08:28 tiredchiku[d]: afaik we're looking at doing vulkan video for hwaccel, which will take some time since vulkan video is also rather new
08:29 redsheep[d]: I know for me I can't ever tell if I have working video acceleration anyway unless I'm using obs, software has gotten so fast at it on avx512... So at least for a sample size of one (me) there's bigger fish to fry
08:30 HdkR: Lucky, got AVX512 hardware?
08:30 clangcat[d]: redsheep[d]: Yeah, idk anything about video accel, but I imagine it's just that some stuff needs to be finished before it can be done.
08:31 clangcat[d]: tiredchiku[d]: aaahhh the header file I've never touched
08:31 redsheep[d]: HdkR: Zen 4, yeah. Surprisingly good, turns out you don't actually need much of the full width backend for the benefits
08:37 demonkingofsalvation[d]: got it, I'll start looking into vulkan video
08:41 HdkR: redsheep[d]: The larger register file is really the big win for AVX512
08:44 redsheep[d]: HdkR: Yeah. Also AMD had the advantage of getting to see how people actually use it, unlike intel who had to just spend their transistors on their best guess.
09:11 airlied[d]: Yeah we have the class headers for video decode and I got a frame a while back and dwlsalmeida has some of it lined up. Also just been working on zink so vaapi can work on top of Vulkan video
13:36 gfxstrand[d]: demonkingofsalvation[d]: There are a few branches kicking around for Vulkan video but it's still very WIP
14:11 ahuillet[d]: I didn't know class headers had been published for it, this is good to hear.
14:25 babblebones[d]: Very nice, we may have to accelerate Vulkan video efforts for WiVRn XR runtime
14:25 babblebones[d]: Encoding on nouveau this early would be super nice
18:23 gfxstrand[d]: airlied[d]: I know you don't want to care about pre-GSP but this one is the reason I can't make it through a CTS run on Maxwell: https://gitlab.freedesktop.org/drm/nouveau/-/issues/378
18:23 gfxstrand[d]: I'm still trying to figure out what I did to cause the timeout
18:25 airlied[d]: I think skeggsb9778[d] can comment, but I think that's "welcome to Maxwell, have a nice day" territory
18:33 gfxstrand[d]: Yeah, IDK if we can do much about losing the channel. The fact that we then try to handle faults on it and blow up is pretty rough
18:39 gfxstrand[d]: But I also don't want anyone spending weeks on this
18:40 skeggsb9778[d]: There were a couple of GPUs I had that would reliably hit something like that with piglit, that I tried *many* times to resolve, and never managed to get channel recovery to work with
18:41 skeggsb9778[d]: Assuming it's the same bug, it looked like the whole memory controller fell down and it seemed like it needed a harder reset than just reinitialising a single engine
18:42 gfxstrand[d]: ugh
18:42 gfxstrand[d]: This is a 750 Ti
18:42 gfxstrand[d]: I've got a couple 980s I can switch to if that card is too old.
18:43 gfxstrand[d]: I need to do that eventually anyway so I can make sure the sparse ops work.
18:51 gfxstrand[d]: The good news is that I found the thing that was timing out contexts. 😄
18:52 gfxstrand[d]: I'm getting REALLY close to switching NVK/Maxwell to NAK.
18:52 gfxstrand[d]: At this point, I don't think it's worse
18:52 gfxstrand[d]: Different, but not worse
18:53 gfxstrand[d]: At which point I'm very tempted to rip out codegen support and say that if someone wants Kepler, they need to write NAK support for it.
18:59 asdqueerfromeu[d]: gfxstrand[d]: It was useful for finding a regression though
19:02 gfxstrand[d]: Which regression? That `gl_FragCoord.w` thing from the early NAK days?
19:04 sravn: gfxstrand[d]: Are we any closer with kepler support after your Maxwell work, or is it still a lot of work?
19:04 gfxstrand[d]: It's still a lot of work.
19:05 gfxstrand[d]: I mean, every API side bug I fix on Maxwell is a bugfix on Kepler but on the shader side it's still "someone needs to write a backend"
19:05 gfxstrand[d]: The good news, though, is that the tooling situation and overall separation of HW generations inside the compiler is a lot better now, so it should be pretty easy for someone to work on that in a non-invasive way.
19:06 gfxstrand[d]: And the control-flow stuff is pretty similar between Maxwell and Kepler, I think.
19:06 gfxstrand[d]: So it's getting easier but the work still needs to be done
19:06 sravn: OK, seems it's slowly becoming doable. But I'd better get a newer card rather than waiting.
19:06 gfxstrand[d]: Yeah, waiting on Kepler isn't a good plan.
19:07 gfxstrand[d]: Like, if someone wants to do that as a fun hobby project, I'll happily take the patches as long as they don't cause friction elsewhere.
19:07 gfxstrand[d]: But I don't think I'll be putting any time into it besides mentoring time and/or review.
19:07 karolherbst[d]: gfxstrand[d]: 750 Ti is using our FOSS firmware btw
19:08 karolherbst[d]: and that one might well be quite buggy in recovery situations too
19:08 asdqueerfromeu[d]: gfxstrand[d]: Yes
19:09 gfxstrand[d]: Yes, codegen was helpful for that one but I've not found codegen to be helpful since.
19:09 gfxstrand[d]: Like, even hacking on Maxwell I'm not really looking at codegen shaders besides trying to figure out a few scheduling bits.
19:10 gfxstrand[d]: I look at the code to get bit encodings for instructions but I'm not actually running it
19:11 gfxstrand[d]: And even there I don't trust it much. Once I get an instruction, I usually fuzz the assembler to get a more detailed view of how it works rather than trust codegen to make the right assumptions.
19:15 kuter7639[d]: does anyone know what `desc[][]` style addressing is? I think it was introduced with sm_80
19:18 gfxstrand[d]: I'd need more context
19:18 gfxstrand[d]: constant buffers are sometimes addressed that way
19:18 gfxstrand[d]: like `cx[ur40][0x40]`
19:19 skeggsb9778[d]: karolherbst[d]: That's true, though if it's related to what I mentioned, that happened with NV's gr fw too
19:20 kuter7639[d]: gfxstrand[d]: No, it's different from constant memory. As an example: `STG.E desc[UR4][R2.64], R24 ;`. I suspect it's an optimization for global memory transactions
19:20 gfxstrand[d]: Yeah, that looks like it's fetching the address from a descriptor at UR4 and addressing based on that.
19:21 gfxstrand[d]: I'm not sure of the exact semantics. In particular, I don't know what the descriptor format is
19:21 kuter7639[d]: in this case UR4 is a kernel argument: `ULDC.64 UR4, c[0x0][0x208] ;`
19:21 gfxstrand[d]: Yeah
19:22 gfxstrand[d]: So it's fetching a descriptor base address with that `ULDC`, and then the `STG.E` uses it as a base address of sorts.
19:22 gfxstrand[d]: At least that's what it looks like is going on.
19:22 gfxstrand[d]: I've not played around with the more complex addressing modes, though, so I'm not sure.
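A hypothetical minimal kernel that produces this `ULDC` + `STG.E desc[...]` pattern, for reference — the file and kernel names here are made up, and the exact SASS varies by CUDA version and architecture:
```cuda
// sketch.cu -- hypothetical example, not taken from the log.
// Inspect the SASS with something like:
//   nvcc -arch=sm_89 -cubin sketch.cu -o sketch.cubin && nvdisasm sketch.cubin
#include <stdint.h>

__global__ void store_val(uint64_t *out, uint64_t val)
{
    // With recent nvcc targeting sm_80+, a plain global store like this is
    // where the quoted pattern tends to show up:
    //     ULDC.64 UR4, c[0x0][...]        // uniform load from the param bank
    //     STG.E   desc[UR4][R2.64], R4    // store through the descriptor
    // Per the discussion above, desc[UR4] appears to supply a uniform
    // descriptor/base for the access while R2.64 carries the per-lane
    // address; the exact descriptor format isn't publicly documented.
    out[threadIdx.x] = val;
}
```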
19:23 kuter7639[d]: I think the idea is to group values per warp.
19:23 kuter7639[d]: but not sure
19:23 gfxstrand[d]: I don't think it has anything to do with that. It's just an indirect addressing mode.
19:23 kuter7639[d]: I mean like memory access coalescing.
19:23 kuter7639[d]: but ok
19:23 gfxstrand[d]: NVIDIA has a LOT of addressing modes
19:24 gfxstrand[d]: Textures have like 6
19:24 gfxstrand[d]: And LDC has a bunch
19:24 kuter7639[d]: Haven't really looked at any texture or surface stuff.
19:24 kuter7639[d]: they look complicated though
19:24 gfxstrand[d]: Yeah, there's a lot of complexity there.
19:25 kuter7639[d]: I am investigating the warpgroup matrix multiplication instructions in hopper ` QGMMA.64x128x32.F32.E4M3.E4M3 R24, gdesc[UR4], RZ, !UPT, gsb0 ; `
19:25 kuter7639[d]: It's super interesting
19:25 gfxstrand[d]: But the UR4 thing sort of is for grouping values per warp. It's typically more about HW internal bandwidth than anything, though. You only have to send 64 bits per memory op instead of 64x32, which saves on wires inside the shader
19:26 gfxstrand[d]: Oh, interesting.
19:26 gfxstrand[d]: In that case desc might be doing something totally different
19:26 gfxstrand[d]: I wasn't thinking about matrices
19:27 kuter7639[d]: No, this UR4 is different, I believe.
19:27 kuter7639[d]: gdesc here should be the tensor descriptor which is like a tagged pointer
19:27 kuter7639[d]: *matrix descriptor
19:27 kuter7639[d]: yet another addressing mode ...
19:27 gfxstrand[d]: Yeah
19:28 kuter7639[d]: also the R24 is used as a vector of 64 registers 🤯
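On the register math: PTX defines a warpgroup as 4 contiguous warps (128 threads), so the 64x128 f32 accumulator of that `QGMMA` is 8192 values, i.e. exactly 64 registers per thread — consistent with R24 naming a 64-register vector. Hopper's warpgroup MMA is only reachable via PTX `wgmma`, but the warp-level `nvcuda::wmma` API shows the same fragment-of-registers idea in miniature; a hedged sketch with made-up names:
```cuda
// wmma_sketch.cu -- hypothetical warp-level example, not Hopper wgmma.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a 16x16x16 fp16 MMA with an fp32 accumulator. The
// accumulator fragment holds 16*16 = 256 floats spread over 32 lanes,
// i.e. 8 registers per thread -- the same scheme, scaled down.
__global__ void mma_16x16x16(const half *a, const half *b, float *c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);  // 16 = leading dimension
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);     // fc = fa * fb + fc
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```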
19:30 kuter7639[d]: What I really dislike about CUDA is that it doesn't have a way of annotating that a value is going to be uniform across the warp. So you have to rely on the compiler's data-flow analysis to make sure it's using a uniform register
19:31 gfxstrand[d]: kuter7639[d]: Yup! That's the way the warp stuff works.
19:31 gfxstrand[d]: kuter7639[d]: Yes, but also the uniform stuff on NVIDIA is super limited to the point of being nearly useless. 😭
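A hypothetical sketch of that point — the kernel and names are made up, and whether anything here actually lands in a uniform register is entirely up to ptxas:
```cuda
// uniform_sketch.cu -- hypothetical example. CUDA has no qualifier spelling
// "this runtime value is warp-uniform"; placement in URx registers is
// decided solely by ptxas's data-flow analysis.
__global__ void gather(const float *table, const int *sel, float *out)
{
    // Easy case: kernel parameters are identical in every lane, so data
    // flow can prove `table` uniform and keep it on the uniform datapath
    // (e.g. loaded with ULDC from the parameter bank).
    const float *base = table;

    // Hard case: suppose the caller guarantees sel[threadIdx.x] holds the
    // same value for every lane. That's a program-level invariant the
    // compiler can't see, and there's no annotation to express it, so
    // `offset` stays in a regular per-lane register.
    int offset = sel[threadIdx.x];

    out[threadIdx.x] = base[offset + threadIdx.x];
}
```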
19:32 gfxstrand[d]: gfxstrand[d]: And... Maxwell just survived a CTS run!
19:32 gfxstrand[d]: Only ran 10% of the CTS but still
19:32 kuter7639[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1265753566925291550/image.png?ex=66a2a85e&is=66a156de&hm=28537abc3c70aa4275f98b54f6ebe263a4d4c9a88003f6c25154dfedc5d05f1e&
19:32 kuter7639[d]: gfxstrand[d]: I mean 64 registers of a single thread.
19:32 gfxstrand[d]: `Pass: 105952, Fail: 337, Crash: 294, Skip: 167458, Flake: 152, Duration: 51:54, Remaining: 0`
19:32 kuter7639[d]: here is the nvdisasm output
19:33 gfxstrand[d]: kuter7639[d]: Oh, wow. I mean, if you're going to burn half the register file on a single value, your matrix is probably it but yeesh
19:34 kuter7639[d]: I have no proof, but I think a warp group is all the sub-cores of an SM working together.
19:35 kuter7639[d]: Also, when I have time I want to generate instruction encodings for older architectures if there is interest. Like I did with https://kuterdinel.com/nv_isa_sm89/
19:35 kuter7639[d]: The problem is, my fuzzer takes a few hours on a 128-core machine
19:37 gfxstrand[d]: kuter7639[d]: Wow! That thing is gold. I'm going to bookmark that.
19:38 gfxstrand[d]: I'd be happy to run it on more generations if you wanted to throw me your script
19:38 gfxstrand[d]: I've got a 36T machine that I don't mind running over night
19:39 kuter7639[d]: https://github.com/kuterd/nv_isa_solver I want to clean it up a bit eventually.
19:39 kuter7639[d]: https://github.com/kuterd/nv_isa_solver/issues/1 here are the instructions:
19:40 kuter7639[d]: ```
cuobjdump --dump-sass --gpu-architecture sm_89 libcublasLt.so.12.5.3.2 > libcublasLt.sass
nv-isa-solver-scan --arch SM89 --cache_file 4090_cache.txt libcublasLt.sass
nv-isa-solver-populate-cache --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
```
19:41 gfxstrand[d]: And that just fuzzes the disassembler? It doesn't, like, dump symbols or something? (Worrying about clean-room here)
19:42 kuter7639[d]: the sass dump of cublas is used to discover instructions.
19:42 kuter7639[d]: just iterating over opcodes does not work because some instructions require certain modifiers to be set to decode properly
19:42 gfxstrand[d]: Yeah, we've hit that
19:43 kuter7639[d]: Another thing that might be useful for you: I made a web-based control code viewer. https://kuterdinel.com/nvidia-sass-control-code-viewer.html
19:45 gfxstrand[d]: For that, I have my `nvdis` wrapper (https://gitlab.freedesktop.org/gfxstrand/nv-shader-tools) which annotates the assembly with dependency information
19:46 gfxstrand[d]: And also generally makes it nicer to work with for my use-cases
19:47 kuter7639[d]: cool
19:49 kuter7639[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1265757761795260416/image.png?ex=66a2ac47&is=66a15ac7&hm=6cc28b0433614db8b33526c15f6bc5a79132fe1e64a845d3e65262396436bb5d&
19:49 kuter7639[d]: I made a tool to annotate register deps and anti-deps to try to figure out scheduling stuff. Nvidia's scheduling model is super difficult though ...
20:09 gfxstrand[d]: Yeah, the version in NAK is massively simplified and we really need to get real dependency information in there
20:12 kuter7639[d]: it's ... super difficult. Might consider some PTX fuzzing, because just looking at precompiled stuff might not be enough
22:28 gfxstrand[d]: I'm hoping I can convince nvidia to cough up docs in a form I can use. But I've also been thinking about how to reverse that information. It's really tricky.