00:00gfxstrand[d]: That's fine.
00:01gfxstrand[d]: I'm happy enough having triaged it pretty solidly to the kernel.
00:01gfxstrand[d]: It's pretty easy to repro with modesetting and any compositor
00:02gfxstrand[d]: For some reason, the initial X cursor is fine. I have no idea why. Maybe X uses the max size all the time for its internal one?
04:48orowith2os[d]: How's NVK on tegra X1 feeling rn :wires:
04:49orowith2os[d]: Came up in a discussion involving plasma mobile, arm drivers, and the switch being one of the most viable daily drivers there for having Nvidia drivers
04:55mhenning[d]: I don't think there's any nvk on tegra yet
04:56orowith2os[d]: Well, that's certainly some hardware I'd be willing to buy, if anybody here has any interest in making it work
05:57tiredchiku[d]: did a thing to make my life a bit easier :Frog:
05:57tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1347810525358329887/nqpRSi3.png?ex=67cd2dde&is=67cbdc5e&hm=fdb2dc30db1c27a7bcd57c14735ffaf56206044b7892ae00b36757fa6213e673&
06:00gfxstrand[d]: Nice!
06:01tiredchiku[d]: just simple .desktop files that run `systemctl reboot --boot-loader-entry`
06:02tiredchiku[d]: and point to my particular entries 😅
06:02tiredchiku[d]: much better than spamming the arrow keys to interrupt the bootloader at boot though
06:03gfxstrand[d]: Yeah
07:37marysaka[d]: orowith2os[d]: tl;dr: sync is broken on the kernel side with the new UAPI. I think the old UAPI also has the issue but it shows less... (you can sometimes see out-of-order frames on the GL driver, for example)
07:37marysaka[d]: I think what we are missing is syncpoint support, so that means adding something to make the host1x kernel driver less invasive and integrating that with nouveau
07:37marysaka[d]: On the NVK side, idk what the current state of Maxwell B is in general these days
10:38orowith2os[d]: marysaka[d]: "Maxwell b" was all I needed to hear, pretty sure that's broken too much for me to bother
10:38orowith2os[d]: Unless tegra has an exception here from Nvidia
10:39marysaka[d]: orowith2os[d]: I mean it's not too broken, I initially worked quite a bit on supporting Maxwell but because of the "no reclocking" thing never did more.
10:39marysaka[d]: Tegra doesn't have that reclocking problem however so you can get full perf here
10:40orowith2os[d]: Oh, never mind then
10:40marysaka[d]: So that might be the best target to work on Maxwell if we fix the sync issue
11:04orowith2os[d]: I *could* look at buying a few Jetson tx1 boards if anybody wanted to hack on Tegra
11:11marysaka[d]: I uuum have too many of them gathering dust but no time to do anything with them these days
11:14orowith2os[d]: marysaka[d]: :akipeek:
11:14orowith2os[d]: Spare one?
11:16marysaka[d]: I think I have 3 of them still around, one is a bit drunk the rest is okay
14:24gfxstrand[d]: orowith2os[d]: Maxwell B works great. I did a CTS run last week. There were like 12 fails.
14:25gfxstrand[d]: Maxwell A is the one that's broken and is going to need strange workarounds. I think I have a plan for it, though. I just need to do some typing.
14:25orowith2os[d]: Oh
14:25orowith2os[d]: Bleh
14:29gfxstrand[d]: I should probably fix the Maxwell B fails one day. It's some sample mask thing and f16 conversions.
14:52gfxstrand[d]: The f16 conversions are annoying, though. We have to go through f32 to do f64->f16 conversions and getting proper RTNE behavior is hard. :blobcatnotlikethis:
14:57gfxstrand[d]: Actually... Hm... 🤔
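A minimal sketch of why routing f64->f16 through f32 can break round-to-nearest-even (RTNE): the value is rounded twice, and the first rounding can land exactly on an f16 tie that the second rounding then resolves the wrong way. This uses Python's `struct` codes `'f'` (binary32) and `'e'` (binary16) as a stand-in for the hardware conversions; the helper names are made up for illustration and this is not how NVK or the shader compiler implements anything.

```python
import struct

def round_to(fmt, x):
    """Round a Python float (an f64) to the format named by a struct code:
    'f' is IEEE binary32, 'e' is IEEE binary16."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def f64_to_f16_direct(x):
    # Single rounding step, straight from f64 to f16.
    return round_to("e", x)

def f64_to_f16_via_f32(x):
    # Two rounding steps: f64 -> f32, then f32 -> f16.
    return round_to("e", round_to("f", x))

# Slightly above the midpoint between the f16 neighbours 1.0 and 1.0 + 2**-10.
# The extra 2**-30 is too small to survive the rounding to f32.
x = 1.0 + 2.0**-11 + 2.0**-30

direct = f64_to_f16_direct(x)    # rounds up: x is above the midpoint
via_f32 = f64_to_f16_via_f32(x)  # f32 step lands on the tie, which goes to even

print(direct, via_f32)
```

The two results differ by one f16 ULP, which is the kind of mismatch a conformance test for f64->f16 conversion would presumably catch.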
16:57marysaka[d]: plugged my Jetson TX1 back in and updated everything and yeah it's in worse shape than before... even just doing vulkaninfo doesn't work anymore
16:57marysaka[d]: [ 566.259937] nouveau 57000000.gpu: fifo: PBDMA0: 00040000 [PBENTRY] ch 3 [04002ad000 vulkaninfo[5851]] subc 0 mthd 0000 data 00000000
16:57marysaka[d]: [ 566.271900] nouveau 57000000.gpu: fifo:000000:0003:0003:[vulkaninfo[5851]] errored - disabling channel
16:57marysaka[d]: [ 566.281233] nouveau 57000000.gpu: vulkaninfo[5851]: channel 3 killed!
17:03marysaka[d]: GLES2 CTS is running fine with the old GL driver so I guess the sync issue only affects the new uAPI... unless we switched the old GL driver to it and I missed that
17:20gfxstrand[d]: Yeah, we need to fix that. Maybe I'll get motivated some time this summer.
17:21gfxstrand[d]: I'm sure we can make sync work without exposing host1x to userspace. I just need to give it a think/look.
17:22gfxstrand[d]: But I've got a bunch of in-flight (not yet public) Vulkan extensions I need to prototype first.
17:22gfxstrand[d]: And get color compression working
17:32marysaka[d]: *nods*
17:33marysaka[d]: the host1x doesn't need to be exposed to userspace, nvgpu only injects increment/wait when appropriate anyway
17:33marysaka[d]: but maybe it's not syncpoints, who knows
17:35marysaka[d]: the fact that the GL driver is currently running GLES2 CTS without a single fault kind of confuses me a lot... if sync was truly completely busted I would at least see that happen with the old uapi too I feel
17:40gfxstrand[d]: Yeah
17:44gfxstrand[d]: I suspect it's something really dumb.
17:44gfxstrand[d]: If bo_busy works, sync objects should as well.
17:45gfxstrand[d]: If bo_busy works then sync objects should as well.
17:45gfxstrand[d]: Okay, IDK what discord just did there. 🤷🏻‍♀️
17:46gfxstrand[d]: marysaka[d]: What's PBDMA?
17:47marysaka[d]: the thing that fetches command buffers
17:47gfxstrand[d]: Maybe we're just not handling uncached/non-coherent memory properly.
17:47marysaka[d]: https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/turing/tu104/dev_pbdma.ref.txt#L35
17:48gfxstrand[d]: gfxstrand[d]: Because that message makes it look like it's reading zeros from push buffers.
17:49marysaka[d]: yeah it seems that way... I was testing on 25.0.0 build from Fedora, currently building something against main
17:50marysaka[d]: 4GB of RAM is really tight for building our Rust libs, rustc is using 3GB with NIL :blobcatnotlikethis:
17:54gfxstrand[d]: I'm thinking we're not getting coherent maps. We don't even implement flush/invalidate so we depend on coherency pretty bad.
17:55gfxstrand[d]: marysaka[d]: Oh, yeah. The nouveau headers crate is not small. 😅
17:59karolherbst[d]: maybe we should get some people working on it to optimize this a bit, because there are a lot of things there to make the codegen smaller
17:59karolherbst[d]: or rather.. the file itself
18:00karolherbst[d]: though might not matter much for linking against it
18:01marysaka[d]: hmm gallium forces coherent if the kernel driver is a certain version
18:03marysaka[d]: okay forcing coherent did make the first submission work!
18:03marysaka[d]: now I sigbus on a memset of some LOCAL mappable memory
18:04marysaka[d]: but we have no VRAM here sooo I guess that's the thing isn't it
18:14marysaka[d]: yeah no, any access to something that is supposed to be mapped, even on GART, sigbuses it seems :aki_thonk:
18:16gfxstrand[d]: Oof
18:17gfxstrand[d]: Is it missing the map flag?
18:19marysaka[d]: it should have it...
18:19marysaka[d]: it's our zero page btw
18:20gfxstrand[d]: I'm not sure we want to actually force coherency all the time but it's a good first step.
18:21gfxstrand[d]: Eventually, I suspect we want to make non-coherent work, especially on Arm.
18:21marysaka[d]: yeah... I'm going to check how the gl driver is doing mapping maybe something is different
18:23karolherbst[d]: if there is no VRAM, the gl driver puts everything in GART memory
18:24karolherbst[d]: though maybe there is some issue with the new UAPI?
18:24karolherbst[d]: that's basically untested on tegra
18:42gfxstrand[d]: Maybe? But there's nothing special about BO creation. It's the same API for that.
18:43marysaka[d]: okay so hmm switching to coherent actually is regressing the thing
18:43marysaka[d]: actually we never set the pushbuf_domains
18:44marysaka[d]: (when creating the channel)
18:44marysaka[d]: let me dig into that a bit
18:45gfxstrand[d]: What are pushbuf domains?
18:46marysaka[d]: it describes the domain supposed to be used for the BO I think
18:46marysaka[d]: we don't need to set those but I don't think we actually grab the value returned by the kernel and do something about that
18:47marysaka[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/winsys/nouveau/drm/nouveau.c#L121
18:48gfxstrand[d]: Oh, that could be. A sort of "pushbufs must live in one of these domains" thing!
18:48marysaka[d]: *nods nods*
18:49marysaka[d]: I'm not sure where we allocate those yet...
18:49gfxstrand[d]: Looks like NVKMD needs a pushbuf flag then.
18:50gfxstrand[d]: Actually... We're gonna need a whole memory type for that for indirect. :blobcatnotlikethis:
18:51gfxstrand[d]: We should look at the kernel and see what it sets when. Maybe there are just funky Tegra rules?
18:51marysaka[d]: hmm looking at the kernel it does hardcode VRAM / GART for Tesla and later... so maybe not related? Will check where we try to allocate those
18:52gfxstrand[d]: Okay then I'm not so worried about that.
18:58marysaka[d]: Something is definitely broken here, if I force only GART to be coherent, I have a VRAM domain allocation that sigbuses on first access to the mapped page
18:58marysaka[d]: knowing that I never set the flag for it soo yeah it's fun ™️
19:07gfxstrand[d]: Woof
19:07gfxstrand[d]: Why do we have VRAM domain allocations at all?
19:07gfxstrand[d]: Just GART everything
19:11marysaka[d]: gfxstrand[d]: If I GART everything via NVK_DEBUG=gart, I get the same result :nya_flop:
19:12marysaka[d]: even forced all command buffer functions to have force_gart=true too
19:21marysaka[d]: wait a minute we never wait for BOs to be ready when mapping them?
19:25gfxstrand[d]: No, why would we? Everything is async maps.
19:27marysaka[d]: I was mostly comparing what gallium winsys was doing with pushbuf and the generic codepath for mapping does call the wait
19:28marysaka[d]: it also hardcodes RW for all mappings that are mmap'd
19:31marysaka[d]: yeah no it's not that hmm I'm out of ideas
19:35gfxstrand[d]: 😢
19:47gfxstrand[d]: I've got a Switch with Fedora on it. I could attempt something.
19:47gfxstrand[d]: Not today, though.
19:52Jasper[m]: gfxstrand[d]: More capable at that than you may think
19:53Jasper[m]: If you're interested you can likely buy a pair of joycons to maul for UART output
19:57gfxstrand[d]: ?
19:57Jasper[m]: The "wired" way the joycons are connected is just UART
19:58Jasper[m]: Over the pins
19:59Jasper[m]: https://gbatemp.net/threads/how-to-make-a-joycon-rail-uart-to-capture-boot-log.625784/
19:59gfxstrand[d]: Ah
20:00tiredchiku[d]: that's wild
20:10airlied[d]: I really doubt host1x stuff is remotely the problem on Tegra, but I also care less until a gsp one comes out 🙂
20:16Jasper[m]: tbf, GPU blobs for Tegra are upstream and should work by default
20:17Jasper[m]: But I can imagine you want to have it work over all cards/devices directly
20:25marysaka[d]: gfxstrand[d]: found it... it was really the coherent flag missing on pushbuf
20:25marysaka[d]: thing is that NVK_DEBUG_FORCE_GART was clearing my COHERENT flag 🙃
20:25karolherbst[d]: oh no
20:27marysaka[d]: I got vulkaninfo working and one of the smoke triangles without the push_sync so that's nice
20:27marysaka[d]: time to clean up my mess a bit
20:30marysaka[d]: karolherbst[d]: hmm okay no it's even weirder... I was only forcing it on the first alloc done for the init of the channel, if I try to do the same for other command buffer allocations it sigbuses 🙃
20:39gfxstrand[d]: marysaka[d]: Womp womp...
20:40orowith2os[d]: Faith you can't just say womp womp
20:45tiredchiku[d]: she did, what are you gonna do about it
20:45tiredchiku[d]: :smuggert:
20:48dwfreed: womp womp
20:51marysaka[d]: yeah no there must be a kernel bug tho
20:51marysaka[d]: the second coherent mapping always causes a sigbus
20:54gfxstrand[d]: Uh... That's no good
20:55marysaka[d]: and if you want a bigger joke... the gl driver only sets it once for fence allocation on the screen instance
20:55gfxstrand[d]: What?
20:55marysaka[d]: it's only used for the "fence" buffer from what I could tell
20:55marysaka[d]: probably contains pushbuf too tho didn't dig too much that side...
20:56gfxstrand[d]: What is NGL doing with a fence buffer?
20:56gfxstrand[d]: Also, how do they get working maps elsewhere?
20:56gfxstrand[d]: Maybe the coherent flag doesn't do what we think it does
20:58gfxstrand[d]: gfxstrand[d]: Seriously. This concerns me. Userspace shouldn't need a fence buffer on current DRM drivers.
20:59gfxstrand[d]: It should go through the kernel for that and only do it itself if it really needs crazy fast fences. And since nothing related to nouveau was ever crazy fast in the past, I'm dubious.
20:59marysaka[d]: It uses it against NVB06F_SEMAPHOREA
20:59marysaka[d]: so I suppose all sync come from here :aki_thonk:
20:59gfxstrand[d]: Sure. That's what you use to fence.
21:00gfxstrand[d]: Is NGL doing all its own syncing on Tegra?
21:00gfxstrand[d]: I hope not
21:00marysaka[d]: it's doing that for all
21:00marysaka[d]: the only "tegra" specific part is it detecting if it's an SoC or not to set is_uma
21:01gfxstrand[d]: orowith2os[d]: I believe it was a completely valid response to the given situation.
21:01gfxstrand[d]: https://dictionary.cambridge.org/us/dictionary/english/womp-womp
21:02gfxstrand[d]: (It amuses me greatly that that's actually in the dictionary.)
21:02mhenning[d]: at some point nouveau gl was busy looping on fences (on desktop at least). I forget if that got fixed or not
21:02gfxstrand[d]: 😭
21:03marysaka[d]: https://gitlab.freedesktop.org/mesa/mesa/-/commit/e212a80db37b0fc9d57beb91dbca1c43ae4476a0
21:03marysaka[d]: :blobcatnotlikethis:
21:04gfxstrand[d]: Well, that commit makes sense. But why does the kernel fall over if we have more than one coherent map?
21:04marysaka[d]: yeah idk ngl
21:04gfxstrand[d]: Well, it's progress at least
21:04marysaka[d]: but that's what I'm seeing, might be worth testing on dGPU too
21:05marysaka[d]: ah wait it will not because we are on x86 isn't it
21:05gfxstrand[d]: Yup
21:09mhenning[d]: ah, yeah karol fixed the busy waiting https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19543
21:10mhenning[d]: I still wouldn't assume that nouveau gl is doing fences in any sane way though
21:11gfxstrand[d]: I'm so glad we just use syncobj now...
21:13gfxstrand[d]: Does NGL do any cache management? Or is it trusting in something other than the coherent flag?
21:13mhenning[d]: I don't know (not sure if that question was directed at me or not)
21:15marysaka[d]: There is only one thing done for caches `DRM_NOUVEAU_GEM_CPU_PREP`
21:15marysaka[d]: aka sync with CPU when mapping
21:15gfxstrand[d]: Oh
21:15marysaka[d]: we have `DRM_NOUVEAU_GEM_CPU_FINI` but no one uses it
21:15marysaka[d]: and that syncs to device
21:15marysaka[d]: when coherent is set, we get uncached + those are stubbed
21:16marysaka[d]: so I suppose instead we should call `DRM_NOUVEAU_GEM_CPU_FINI` to ensure the device sees the command buffer content...?
21:16marysaka[d]: or just be coherent for now
21:16gfxstrand[d]: If we can be coherent for now, that's probably good enough. But if maps are failing...
21:17marysaka[d]: but I don't quite follow why we sigbus on the second coherent mapping... I will likely need to rebuild nouveau to check the page table state, it might just not have the mapping for some weird reason...
21:17gfxstrand[d]: IDK that I want to use the ioctls, regardless of what we do about coherency.
21:17gfxstrand[d]: We probably want to just do whatever the Arm version of CLFLUSH is.
21:18gfxstrand[d]: But we'll have to plumb that through the whole driver, which is annoying.
21:18marysaka[d]: yeah...
21:18gfxstrand[d]: We did that with ANV but eventually gave up because we kept screwing it up. :blobcatnotlikethis:
21:19gfxstrand[d]: It's definitely doable but ugh...
21:19gfxstrand[d]: Are the coherent maps we get at least write-combine?
21:20marysaka[d]: it only sets force_coherent https://github.com/search?q=repo%3Atorvalds%2Flinux%20force_coherent&type=code
21:20marysaka[d]: so it gets `ttm_uncached`
21:20gfxstrand[d]: If they at least have decent write performance, we can use them for most driver things.
21:21marysaka[d]: ttm_write_combined seems only mapped to AGP...?
21:23gfxstrand[d]: 🫤
21:24gfxstrand[d]: IDK if WC is even really a thing for system RAM.
21:26gfxstrand[d]: Looks like we have to call into the kernel if things aren't coherent.
21:26gfxstrand[d]: https://stackoverflow.com/questions/37332153/flush-a-cache-line-from-user-mode-on-armv7rpi2
21:26gfxstrand[d]: 😭
21:27gfxstrand[d]: Silly Arm...
21:28gfxstrand[d]: Oh... I bet I know why the new UAPI is different!
21:29gfxstrand[d]: I bet that the old pushbuf ioctl does Arm cache maintenance and the new one doesn't.
21:36gfxstrand[d]: That sounds annoying to hunt down but entirely plausible.
21:38gfxstrand[d]: Yup! Right here: https://github.com/torvalds/linux/blob/b7c90e3e717abff6fe06445b98be306b732bbd2b/drivers/gpu/drm/nouveau/nouveau_bo.c#L714
21:39gfxstrand[d]: The new UAPI doesn't do that because it doesn't have a list of buffers.
21:40gfxstrand[d]: So we have to do it ourselves via the CPU_PREP/FINI ioctls.
21:40marysaka[d]: yeah.... *sighs*
21:40marysaka[d]: that maps to FINI
21:40marysaka[d]: for the sigbus I'm writing some small ebpf probe to check but I have an idea where the issue is
21:41gfxstrand[d]: The bad news is that that's really annoying. The good news is that we know the problem and it's solvable. Also, we already have the ioctls and they already have a NOWAIT flag so we don't need to worry about them randomly syncing on us.
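The failure mode discussed above (the GPU fetching zeros from a push buffer the CPU has already written) can be illustrated with a toy model of a non-coherent mapping. This is only a sketch: `NonCoherentBuffer` and its methods are hypothetical names, the "cache" is a plain dictionary, and `sync_for_device()` merely stands in for the role the chat assigns to `DRM_NOUVEAU_GEM_CPU_FINI`. It does not call any real kernel interface.

```python
class NonCoherentBuffer:
    """Toy model: CPU writes sit in a write-back cache until they are
    explicitly cleaned; the device only ever reads backing RAM."""

    def __init__(self, size):
        self.ram = bytearray(size)   # what the device sees
        self.cpu_cache = {}          # offset -> dirty byte not yet in RAM

    def cpu_write(self, offset, data):
        # The write is visible to the CPU immediately, but not to the device.
        for i, byte in enumerate(data):
            self.cpu_cache[offset + i] = byte

    def sync_for_device(self):
        # Write every dirty byte back to RAM (hypothetical stand-in for the
        # "sync to device" step described in the chat).
        for offset, byte in self.cpu_cache.items():
            self.ram[offset] = byte
        self.cpu_cache.clear()

    def device_read(self, offset, size):
        return bytes(self.ram[offset:offset + size])


buf = NonCoherentBuffer(16)
buf.cpu_write(0, b"\x01\x02\x03\x04")   # "fill the push buffer"

stale = buf.device_read(0, 4)   # submit without cache maintenance
buf.sync_for_device()
fresh = buf.device_read(0, 4)   # submit after cache maintenance

print(stale.hex(), fresh.hex())
```

In the model, the first read returns all zeros, matching the `mthd 0000 data 00000000` symptom in the kernel log earlier, and the second read returns the bytes the CPU wrote.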
21:43gfxstrand[d]: I am a little annoyed that the ioctls work on whole BOs instead of taking a range, though.
21:43gfxstrand[d]: I really hope that doesn't bite us somehow.
22:27gfxstrand[d]: I think we may want new ioctls for cache management. I'm a little afraid that a client is going to have the GPU writing to one part of the buffer and be writing to another part from the CPU and expect their `vkFlushMappedMemoryRanges()` to not affect the GPU part. That's likely okay in some circumstances but I'm a little afraid that whole buffer flushes are going to bite us. I'm going to tag this issue with "new UAPI".
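To make the whole-buffer concern concrete, here is a small sketch of the arithmetic a ranged flush would need: widen the CPU-written range to cache-line boundaries, then check whether it touches the region the GPU is writing. The 64-byte line size, the buffer size, and both regions are assumptions picked for illustration, and `flush_span` and `overlaps` are hypothetical helpers, not existing driver or kernel functions.

```python
def flush_span(offset, size, bo_size, line=64):
    """Return (start, length) of the cache-line-aligned span covering
    [offset, offset + size), clamped to the end of the buffer."""
    if size <= 0 or offset < 0 or offset + size > bo_size:
        raise ValueError("range must be non-empty and inside the buffer")
    start = (offset // line) * line                         # align down
    end = min(-(-(offset + size) // line) * line, bo_size)  # align up
    return start, end - start

def overlaps(a, b):
    """True if two (start, length) spans share at least one byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

bo_size = 1 << 20                         # hypothetical 1 MiB buffer
cpu_span = flush_span(100, 50, bo_size)   # CPU wrote bytes 100..149
gpu_span = (512 * 1024, 4096)             # region the GPU is writing (made up)
whole_bo = (0, bo_size)                   # what a whole-BO ioctl would cover

print(cpu_span, overlaps(cpu_span, gpu_span), overlaps(whole_bo, gpu_span))
```

With these numbers the ranged span stays clear of the GPU-written region, while the whole-buffer span necessarily covers it, which is the situation the message above worries about.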
22:32marysaka[d]: couldn't track the origin of the sigbus on the coherent domain via ebpf tracing, all I could see is NO_PAGE being returned by nouveau in the fault handler but that applies to all mappings that were present
22:35gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12786
22:35gfxstrand[d]: I made a to-do list.
22:36marysaka[d]: Thank you :vibrate:
22:37marysaka[d]: I should probably do an issue for the COHERENT issue but that will wait another time
22:37marysaka[d]: I want to write a small reproducer for that issue
22:37gfxstrand[d]: Yeah
22:38gfxstrand[d]: Given what I'm reading in nouveau.ko today, I don't think we want to use coherent much at all. Maybe for query pools but that's about it.
22:40marysaka[d]: *nods*
22:41marysaka[d]: Anyway time to shut down that poor board, the fan really doesn't want to spin and it has been toasting all day :sweating:
22:44gfxstrand[d]: hehe
22:46orowith2os[d]: I am in awe
22:46gfxstrand[d]: But good debug work today. We figured out why Tegra is toasted and it has nothing to do with synchronization.
22:46orowith2os[d]: All I did was ask about NVK on Tegra. And y'all got here.
22:46orowith2os[d]: Wow.
22:46gfxstrand[d]: To be fair, it's not the first time we've poked at it or talked about it.
22:46marysaka[d]: you nerd-sniped me here :linatehe:
22:46gfxstrand[d]: Also that ^^
22:47orowith2os[d]: I guess NVK on the Switch is a real possibility in the future. ;)
22:47orowith2os[d]: *?
22:47Jasper[m]: gfxstrand[d]: Happy to hear that!
22:48gfxstrand[d]: orowith2os[d]: Possibly not that far in the future, depending on how motivated people get.
22:48gfxstrand[d]: Like, I could probably type everything in that MR in a day.
22:49gfxstrand[d]: Figuring out how to get my switch to boot my custom kernel might take a week, though. :frog_upside_down:
22:51gfxstrand[d]: And figuring out the SIGBUS issue is an unknown amount of time since that's a bug.
22:53gfxstrand[d]: orowith2os[d]: A little more than "womp womp"? 😉
22:57orowith2os[d]: gfxstrand[d]: *nixos, nixos, nixos, nixos, nixos...*
22:58magic_rb[d]: orowith2os[d]: Except when using chaotic-nyx for mesa master causes IFD which isnt supported by my CI system so i had to go back to mesa stable
22:58orowith2os[d]: IFD?
22:59orowith2os[d]: Ah
22:59orowith2os[d]: Building from source?
22:59magic_rb[d]: Yeah
22:59magic_rb[d]: Always
23:00orowith2os[d]: I don't think I've had problems with it, just had to make sure the cache is set up properly
23:00magic_rb[d]: https://github.com/chaotic-cx/nyx/blob/main/modules/nixos/mesa-git.nix#L58 will try to toggle this and see how many things blow up
23:00magic_rb[d]: I explicitly do *not* want to use the cache
23:00orowith2os[d]: Why do you need that?
23:00orowith2os[d]: (the override)
23:00orowith2os[d]: It should Just Work as is
23:00orowith2os[d]: Er, "why" for both of those
23:01magic_rb[d]: You mean the option i linked?
23:01magic_rb[d]: Its set to true by default which causes IFD which is not going to work with my CI
23:01magic_rb[d]: As for the cache, i dont trust it
23:01magic_rb[d]: And its cachix, I dont like cachix
23:02magic_rb[d]: Ive got a full CI system for a reason, a mesa build is not that slow
23:09orowith2os[d]: A full system rebuild is though.
23:10orowith2os[d]: I'm not sure how applicable that
23:10orowith2os[d]: Flag is, also
23:10orowith2os[d]: I've been using flakes, no special setup. No --impure.
23:27magic_rb[d]: The problem is the IFD not impure, thats not needed. Ill probably just go back to stable mesa, cause a full system rebuild might be a bit much
23:41gfxstrand[d]: orowith2os[d]: IDK how much that helps with installing kernels in Arm. That tends to be cursed regardless of distro. :blobcatnotlikethis:
23:42orowith2os[d]: gfxstrand[d]: You can't just have the kernel and a boot entry?
23:42orowith2os[d]: :wires:
23:42orowith2os[d]: If you can get to the bootloader, you should be golden
23:43gfxstrand[d]: As long as an upstream vanilla kernel works, sure.