08:40 tzimmermann: dakr, Lyude, ack on https://patchwork.freedesktop.org/patch/677907/?series=155285&rev=2 ?
09:29 chikuwad[d]: :o
09:30 chikuwad[d]: I broke something :D
09:30 chikuwad[d]: [ 3822.488066] nouveau 0000:01:00.0: gsp: Xid:13 Graphics SM Warp Exception on (GPC 1, TPC 0, SM 0): Illegal Instruction Encoding
09:30 chikuwad[d]: [ 3822.488129] nouveau 0000:01:00.0: gsp: Xid:13 Graphics Exception: ESR 0x50c730=0x9 0x50c734=0x0 0x50c728=0xc81ab60 0x50c72c=0x1174
09:30 chikuwad[d]: [ 3822.489664] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:14 gfid:0 level:2 type:13 scope:1 part:233 fault_addr:0000000000000000 fault_type:ffffffff
09:30 chikuwad[d]: [ 3822.489668] nouveau 0000:01:00.0: fifo:c00000:000e:000e:[deqp-vk[99057]] errored - disabling channel
09:30 chikuwad[d]: [ 3822.489671] nouveau 0000:01:00.0: deqp-vk[99057]: channel 14 killed!
09:31 chikuwad[d]: now to identify which instruction the gpu isn't happy with
09:56 chikuwad[d]: wait I might've done this all wrong
10:44 tzimmermann: thanks, dakr
18:04 mohamexiety[d]: ooook I have a weird question. if I explicitly blacklist the nvidia module in the kernel bootargs, is it possible for it to somehow not respect that and still get loaded..?
18:04 mohamexiety[d]: because like:
18:04 mohamexiety[d]: [ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt6)/vmlinuz-6.17.0-63.fc43.x86_64 root=UUID=987fc455-0d18-4d0e-861f-7f33259757df ro rootflags=subvol=root rhgb quiet rd.driver.blacklist=nvidia,nova_core modprobe.blacklist=nvidia,nova_core
18:05 mohamexiety[d]: and yet:
18:05 mohamexiety[d]: mohamed@fedora:~$ modinfo nvidia
18:05 mohamexiety[d]: filename: /lib/modules/6.17.0-63.fc43.x86_64/extra/nvidia/nvidia.ko
18:06 mohamexiety[d]: one other thing I am doing is I am early loading the NV modules in initramfs but I am also doing the same on arch and the kernel bootargs get respected, so what makes fedora different here?
18:09 mhenning[d]: yes, the proprietary userspace is very aggressive in trying to force-load the proprietary kernel driver
18:10 mhenning[d]: on arch, I've found that modprobe.blacklist isn't enough to prevent it from loading. The only thing that seems to work for me is `module_blacklist=nvidia` on the kernel command line or uninstalling the driver entirely
18:10 mohamexiety[d]: OH I am doing module_blacklist on arch
18:10 mohamexiety[d]: ok let me try that then
18:12 mhenning[d]: Even then, the proprietary driver has been deadlocking for me when booting under nouveau kernel module lately. The only setup I currently have working is installing/uninstalling the proprietary driver
18:12 mohamexiety[d]: oh the deadlock is fixable
18:12 mohamexiety[d]: 1 sec
18:12 mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13436#note_2983601 try this
18:13 mhenning[d]: Yeah, I'm aware of the bind mount thing, but I haven't tried it
18:13 mohamexiety[d]: that fixed it for me on both arch and fedora at least
18:13 mhenning[d]: honestly removing the package seems simpler to me
18:14 mohamexiety[d]: my internet is too slow to reliably keep doing that sadly
18:14 mohamexiety[d]: not sure why but the nv packages on fedora are like 11GB
18:15 mhenning[d]: I've been wondering if we could do something along the lines of the bind mount thing in a vulkan layer in order to have something that could plausibly be shipped to users
18:15 mhenning[d]: mohamexiety[d]: oh, on arch it always gets it from the disk cache, so no re-downloading, just creating/deleting files
18:16 mohamexiety[d]: what I really want to know is what led to NV being so hyper aggressive with loading tbh
18:16 mohamexiety[d]: like damn, the driver _really_ wants to load itself
18:16 mhenning[d]: yeah, I wish they would stop doing that
18:17 mohamexiety[d]: on arch even with the module blacklist I still get a lot of NVRM messages in `dmesg` indicating that it's trying to load but failing because nouveau is loaded and using the gpus
18:19 mhenning[d]: hmm. I'm actually not sure if it does that for me or not. I don't think so, but I'm not totally sure
18:20 mhenning[d]: I do think I've managed to get my kernel into weird states where both drivers were loaded in the past although I forget how that happened
18:30 mohamexiety[d]: and yep, module_blacklist worked. thanks! ❤️
18:37 chikuwad[d]: modprobe.blacklist only blocks the first modprobe attempt at boot iirc, while module_blacklist blacklists at the kernel level for as long as the param is part of the cmdline
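[editor's note] For reference, a minimal sketch of the two boot parameters discussed above (hypothetical GRUB config; module names are the ones from this log):

```
# /etc/default/grub -- hypothetical example.
# modprobe.blacklist= only tells modprobe itself to skip these modules;
# anything that loads the module another way can bypass it.
# module_blacklist= is enforced by the kernel's module loader, so the
# module cannot be loaded at all while the parameter is on the cmdline.
GRUB_CMDLINE_LINUX="module_blacklist=nvidia,nova_core"
# then regenerate the config, e.g.:
#   grub2-mkconfig -o /boot/grub2/grub.cfg   (Fedora)
#   grub-mkconfig -o /boot/grub/grub.cfg     (Arch)
```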
18:49 mhenning[d]: hmm. I wonder if that's just from /usr/lib/modules-load.d/nvidia-utils.conf or if other things try to modprobe too
21:26 karolherbst[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1425595163048280246/image.png?ex=68e82875&is=68e6d6f5&hm=22396b8138aed9e679cc2cc7acd7e956fa85f68826b0c5006281f81f4ef2ab15&
21:26 karolherbst[d]: 👀
21:29 karolherbst[d]: yes you can also read registers:
21:29 karolherbst[d]: (cuda-gdb) p/x $R10
21:29 karolherbst[d]: $2 = 0x104
21:37 sonicadvance1[d]: cuda-gdb is good stuff
21:37 karolherbst[d]: yeah...
21:37 mohamexiety[d]: nvk-gdb when
21:37 karolherbst[d]: you can even switch to any gpu thread
21:37 karolherbst[d]: like thread thread
21:38 karolherbst[d]: the mistake I'm making is to set shared memory size to 0 bytes, but... that's impossible 🙃
21:39 karolherbst[d]: ohhh...
21:39 karolherbst[d]: I'm dum dum
21:41 karolherbst[d]: sonicadvance1[d]: oh yeah, you don't know what's the cursed project I'm working on 🙃
21:41 sonicadvance1[d]: I think I've heard of it, if it's the one I'm thinking of then it'll be pretty funny.
21:42 sonicadvance1[d]: Otherwise no reason to investigate cuda-gdb 😛
21:42 mohamexiety[d]: beating ROCm with nvk
21:43 karolherbst[d]: sonicadvance1[d]: heh
21:43 karolherbst[d]: I'll probably submit the MR like this weekend 🙃
21:43 sonicadvance1[d]: ooo
21:44 sonicadvance1[d]: I'll need to watch the gitlab
21:44 karolherbst[d]: I pass like 95% of the CTS already
21:44 sonicadvance1[d]: 🎉
21:45 karolherbst[d]: but cuda-gdb will be such a major help debugging more complex applications crashing in GPU code randomly
21:46 karolherbst[d]: I wonder if I could integrate it more neatly...
21:46 karolherbst[d]: like I'd kinda like to see the PTX code the disassembly belongs to
21:46 sonicadvance1[d]: Yea, no other open-source solution currently. Although I guess once the interface is wired up then a oss version wouldn't be very hard.
21:46 sonicadvance1[d]: hah! I have the same request for seeing what x86 code correlates to my ARM code 😄
21:46 karolherbst[d]: 😄
21:46 sonicadvance1[d]: gdb needs more JIT helpers
21:46 karolherbst[d]: mhh maybe I'd need to pass debug flags to the JIT?
21:47 karolherbst[d]: there is `CU_JIT_OPTIMIZATION_LEVEL`, `CU_JIT_GENERATE_DEBUG_INFO` and `CU_JIT_GENERATE_LINE_INFO`
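[editor's note] A rough sketch of how those JIT options could be passed when loading PTX through the driver API. Untested: `cuModuleLoadDataEx` and the `CU_JIT_*` enums are real driver-API names, but whether debug info actually survives PTX input is exactly the open question in the next message.

```c
#include <stdint.h>
#include <cuda.h>  /* CUDA driver API; link against libcuda.so */

/* Hypothetical helper: load a NUL-terminated PTX string with debug info
 * requested from the JIT. Error handling elided for brevity. */
static CUmodule load_ptx_debug(const char *ptx)
{
    CUjit_option opts[] = {
        CU_JIT_GENERATE_DEBUG_INFO,  /* full debug info */
        CU_JIT_GENERATE_LINE_INFO,   /* line table only, cheaper */
        CU_JIT_OPTIMIZATION_LEVEL,
    };
    void *vals[] = {
        (void *)(uintptr_t)1,
        (void *)(uintptr_t)1,
        (void *)(uintptr_t)0,        /* -O0 so values stay inspectable */
    };
    CUmodule mod;
    cuModuleLoadDataEx(&mod, ptx, 3, opts, vals);
    return mod;
}
```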
21:54 karolherbst[d]: mhh doesn't seem to work with ptx input..
22:06 gfxstrand[d]: Building dEQP on my TX1. This is gonna take a while...
22:07 steel01[d]: Those poor a57's.
22:07 gfxstrand[d]: Could be worse. Could be TK1
22:07 steel01[d]: Hey now, a15 was smoking a lot of the aarch64 stuff back in the day. 😛
22:07 karolherbst[d]: mohamexiety[d]: it's not even _that_ hard, just needs somebody to spend the 2 months to write all the code lol
22:08 steel01[d]: steel01[d]: And might even still do so... It'd be interesting to see k1 vs some of the 'new' amlogic stuff that gets sold in atv devices and all.
22:09 steel01[d]: gfxstrand[d]: Is this to imply that nvk is doing things on gm20b now?
22:10 gfxstrand[d]: I've rebased my NVK branch and built it. I'm sure it's incomplete still. But I need to be able to test it
22:11 steel01[d]: Ah. So haven't even run vkcube or whatever the hello world thing is.
22:11 gfxstrand[d]: No
22:11 gfxstrand[d]: My hello world is dEQP-Vk.api.smoke.triangle
22:12 steel01[d]: If that deqp thing you're building is the same thing that google runs in vts, then... yeah. See you next week. The google list is over 1 million entries. 0_0
22:13 mohamexiety[d]: it's not that bad!
22:13 mohamexiety[d]: though I don't know how bad those cpu cores actually are :Nervous:
22:14 steel01[d]: They were upper mid tier. In 2015.
22:14 mohamexiety[d]: karolherbst[d]: eh the skill set needed to do all that is rough
22:14 airlied[d]: LTO will suck on that
22:14 karolherbst[d]: mohamexiety[d]: writing C code? yeah...
22:15 mohamexiety[d]: surely _a little bit more_ is needed than just that 😛
22:15 karolherbst[d]: well..
22:15 mohamexiety[d]: otherwise mesa and the kernel would be overflowing with devs
22:15 gfxstrand[d]: steel01[d]: Yeah. Doing a full CTS run will take a while. But first I've got to build everything.
22:15 karolherbst[d]: on the GPU side it's not complicated, you just whack the error reporting masks, upload a debug shader thingy and do some weird IPC communicating with the debugger
22:16 steel01[d]: Oh, so I have a reasonable comparison, gfxstrand[d]. Once you get gm20b running, how bad would you expect gp10b to be? I know you don't have a tx2, but if the tegra support is common between archs and stuff can be copy-pasta'd, I can do a general verification pass.
22:16 gfxstrand[d]: It should "just work"
22:17 steel01[d]: Oh, that'd be sweet.
22:17 marysaka[d]: GOB might be different maybe
22:17 marysaka[d]: But yeah
22:17 gfxstrand[d]: There's virtually no difference between Maxwell and Pascal that's visible to userspace.
22:17 gfxstrand[d]: But yeah, Tegra starts using desktop GOBs at some point. Not sure where that point is.
22:17 mohamexiety[d]: orin iirc
22:17 gfxstrand[d]: gm20 is definitely Tegra gobs
22:17 mohamexiety[d]: (as in, orin has a switch to do it)
22:18 gfxstrand[d]: Ah
22:18 gfxstrand[d]: Oh, and with gm20 I can finally hook up ASTC. 😄
22:18 steel01[d]: Iirc, something with surfaces switched to desktop style on xavier. Then more stuff happened on orin. Then blackwell seems to literally just be a desktop card electrically slapped on the die.
22:18 gfxstrand[d]: The one thing Tegra can do that desktop can't. :frog_upside_down:
22:18 mohamexiety[d]: or wait, xavier actually: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36336#note_3023388
22:18 gfxstrand[d]: Yeah, Blackwell has very little difference, I think.
22:19 mohamexiety[d]: but thor seems to use the older layout?
22:19 marysaka[d]: FP16 on SM53, anyone? :AkkoDerp:
22:19 mohamexiety[d]: idk. it's weird
22:19 gfxstrand[d]: marysaka[d]: Feel free?
22:20 mohamexiety[d]: steel01[d]: blackwell is a bit funny because Thor is blackwell_a (same lineage as the big DC gb200) while spark is blackwell_b (desktop cards)
22:20 gfxstrand[d]: If I can get cache maintenance working and enable ASTC, that should be enough for Android for now.
22:20 steel01[d]: I really *really* wish we could get xavier+ wired up on nouveau. But I can't even get the guy that actively wants oss support inside nvidia to release the firmware for someone to try. ><
22:20 mohamexiety[d]: iirc thor+ should be gsp so it should work. but the older ones are... yeah..
22:20 gfxstrand[d]: Oh, and someone needs to figure out how to hack steel01[d]'s Android build to actually be able to build NVK. Android does not like rust. :blobcatnotlikethis:
22:21 steel01[d]: It's pain. Serious pain.
22:21 airlied[d]: thor "gsp" is a bit different than "gsp"
22:21 airlied[d]: I should get back to GH100 enablement
22:21 steel01[d]: One of the lineage guys made a python script to fetch all the crates and store them locally. Cause the aosp build system blocks internet access during the build. But even after that, 25.2.x has more issues that I wasn't seeing on 25.1.x.
22:22 mohamexiety[d]: airlied[d]: oh TIL, sorry. I know that rn it uses some modified openrm, but I thought the plan was that thor would use bone stock normal openrm and thus uses the same gsp path as normal cards
22:22 gfxstrand[d]: I got it most of the way there. I just have some weird thing where bindgen blows up with weird C compiler errors
22:22 airlied[d]: hopefully they figure out a single path, but big companies gonna big company
22:23 steel01[d]: Once stuff is working enough to have expectation of android working (or is getting close at least), I can go bug the guy that plumbed the rest of the rust stuff. And hopefully between the group, we can get something reasonably CMed.
22:34 jja2000[d]: gfxstrand[d]: gl, last time I tried vkcube/vkinfo and some gl stuff through zink with that branch it still complained about not having memory
22:35 jja2000[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1367211282419286158
22:35 jja2000[d]: sorry, no valid memory type, I misremembered
22:58 karolherbst[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1425618306446004234/image.png?ex=68e83e02&is=68e6ec82&hm=ad419ab42f2fb749a52b21741022325ff2269fc737b892d68e25e931171a1086&
22:58 karolherbst[d]: luxmark speed run complete
23:11 gfxstrand[d]: How's the perf relative to native?
23:14 karolherbst[d]: what do you mean to native?
23:14 karolherbst[d]: nvidia's impl?
23:14 karolherbst[d]: 10% lower
23:15 mohamexiety[d]: that's very impressive actually tbh
23:15 karolherbst[d]: parts of the compiler are really cursed atm
23:15 karolherbst[d]: I haven't even enabled all those `.has_...` flags yet
23:16 karolherbst[d]: I should play around with the function support as well.. that might also help a lot
23:16 karolherbst[d]: atm it's all inlined and I bet with those ray tracing kernels it would help to not do that
23:17 karolherbst[d]: I'm more surprised that luxmark validated on first try after I got it to compile 🙃
23:17 karolherbst[d]: scratch memory was the last bit I had to wire up
23:19 airlied[d]: is the luxmark on rusticl/zink/nvk or something else?
23:19 karolherbst[d]: rusticl/cuda 🙃
23:19 airlied[d]: oh lulz
23:19 karolherbst[d]: yeah...
23:19 karolherbst[d]: started it after XDC 😄
23:19 karolherbst[d]: so I've written a nir to PTX thingy
23:20 mohamexiety[d]: WHAT
23:20 mohamexiety[d]: you did that in just that week? O_O
23:20 karolherbst[d]: it targets the cuda driver API
23:20 airlied[d]: but I want a PTX to NIR thingy
23:20 karolherbst[d]: so it doesn't even depend on the cuda runtime
23:20 karolherbst[d]: just libcuda.so which is shipped with the normal driver
23:21 karolherbst[d]: mohamexiety[d]: I men.. it's layered and the cuda driver API is actually competent
23:21 karolherbst[d]: while also very cursed...
23:21 karolherbst[d]: it's like GLX as you have to bind a context current to the thread and ... it's kinda annoying 🙃
23:22 karolherbst[d]: mohamexiety[d]: tbf.. I thought it would take me two weeks 🙃
23:22 mohamexiety[d]: airlied[d]: would any of the nir to ptx work help with the other way around? :thonk:
23:22 karolherbst[d]: no
23:22 mohamexiety[d]: karolherbst[d]: still, very impressive in either case
23:22 karolherbst[d]: though
23:22 karolherbst[d]: at least I understand PTX enough now 😄
23:23 karolherbst[d]: I just convert nir straight to a bunch of strings
23:24 karolherbst[d]: maybe I should check rusticl/zink/nvk as well
23:24 karolherbst[d]: but I suspect perf will not be that great
23:25 mhenning[d]: you did rusticl + cuda before rusticl + nvk ? 😝
23:25 karolherbst[d]: I mean rusticl + zink already exists 😛
23:26 gfxstrand[d]: rusticl + cuda was more or less done on a bet. Y'all can blame me. 😛
23:26 karolherbst[d]: 😄
23:26 gfxstrand[d]: Don't credit me, mind you. I didn't do any of the work. But go ahead and blame me.
23:26 karolherbst[d]: I uhm..
23:26 karolherbst[d]: soo how do I explain it
23:26 karolherbst[d]: parts of it is from you 🙃
23:27 mohamexiety[d]: ~~can i make a bet for dlss on nvk?~~
23:27 karolherbst[d]: I took your `nir_lower_cf` pass 😛
23:28 mhenning[d]: I assume you ignore the control flow barriers?
23:28 karolherbst[d]: ptx doesn't have them
23:28 gfxstrand[d]: I mean, sure. "Copy Faith code" is pretty standard practice for most Mesa development these days.
23:28 mhenning[d]: (that's really the hard part of lower_cf)
23:28 karolherbst[d]: and yeah, I removed that part
23:28 gfxstrand[d]: Just ask Alyssa. 🙃
23:29 karolherbst[d]: so I use that pass only for structured -> unstructured
23:29 mhenning[d]: yeah, makes sense
23:29 karolherbst[d]: which I technically didn't need to do, but the goto and goto_ifs map better to predicated bras I need in PTX
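[editor's note] For context, a predicated branch in PTX looks like this, which is roughly the shape an unstructured goto_if lowers to (register and label names made up):

```
setp.lt.s32 %p1, %r5, %r6;   // %p1 = (%r5 < %r6)
@%p1 bra    BB_2;            // branch only where the predicate holds
bra         BB_3;            // unconditional goto for the other edge
```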
23:29 rhed0x[d]: karolherbst[d]: now do cuda to nir >:)
23:30 karolherbst[d]: the best part is that I use the nir SSA index for the PTX register names, it's great
23:30 karolherbst[d]: I have a 1:1 mapping there
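[editor's note] As a toy illustration of that 1:1 mapping (not the actual rusticl code, just a sketch of the naming scheme where the NIR SSA def index becomes the PTX virtual register name):

```python
def ptx_reg(ssa_index: int) -> str:
    """Name a PTX virtual register directly after a NIR SSA def index."""
    return f"%r{ssa_index}"

def emit_iadd(dest_ssa: int, src0_ssa: int, src1_ssa: int) -> str:
    """Emit a 32-bit integer add as a PTX string, one line per SSA def."""
    return f"add.s32 {ptx_reg(dest_ssa)}, {ptx_reg(src0_ssa)}, {ptx_reg(src1_ssa)};"

print(emit_iadd(5, 3, 4))  # → add.s32 %r5, %r3, %r4;
```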
23:30 rhed0x[d]: (on a more serious note, this is pretty cool)
23:32 karolherbst[d]: though I haven't started image support yet 🙃
23:32 karolherbst[d]: I'm sure that's gonna be a massive time sink
23:33 mhenning[d]: can you rusticl -> cuda -> ZLUDA -> rocm ? 😛
23:33 karolherbst[d]: mhhhhh
23:33 karolherbst[d]: I don't know?
23:35 karolherbst[d]: ohh it does implement the full driver api
23:35 karolherbst[d]: so maybe?
23:35 karolherbst[d]: but why would I target ROCm, rusticl + radeonsi is already fast enough 😄
23:37 mhenning[d]: karolherbst[d]: to see how cursed you can make the stack
23:38 karolherbst[d]: mhh
23:39 karolherbst[d]: we have to throw in virtualization into the mix somehow
23:40 butterflies[d]: Is graphics supportable or is this targeting rusticl only
23:40 butterflies[d]: because having graphics would be amazing
23:40 karolherbst[d]: graphics isn't supportable
23:40 karolherbst[d]: I'm using this API: https://docs.nvidia.com/cuda/cuda-driver-api/
23:40 butterflies[d]: It is
23:40 mhenning[d]: yeah, ptx is compute-only
23:41 butterflies[d]: Rasteriser in GPU kernels :∆
23:41 karolherbst[d]: there is some graphics interop, but mostly for like cl_gl_sharing stuff
23:41 butterflies[d]: butterflies[d]: Which is what I was thinking about
23:41 karolherbst[d]: ehhh
23:41 karolherbst[d]: that's a bit too cursed 🙃
23:41 butterflies[d]: thinking about it if there was common GFX on top of compute only devices that could clean up llvmpipe quite a bit
23:41 butterflies[d]: and make it faster too
23:42 orowith2os[d]: karolherbst[d]: VM native?
23:42 orowith2os[d]: Run it inside of a VM, and pass the results outside of it
23:42 butterflies[d]: virtio-gpu native context for anything new please
23:42 karolherbst[d]: nah, more like zink + venus + moltenvk is always a great thing to do
23:42 orowith2os[d]: butterflies[d]: Yes, that
23:43 orowith2os[d]: For meme purposes, venus
23:43 butterflies[d]: karolherbst[d]: Zink + moltenvk has been broken since months
23:43 butterflies[d]: It used to work until Zink started to require nullDescriptor
23:43 karolherbst[d]: yeah but it's not zink + moltenvk, but zink + venus + moltenvk 😛
23:43 butterflies[d]: butterflies[d]: which isn't a concept present in Metal
23:43 karolherbst[d]: ahh
23:44 butterflies[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37659
23:45 sonicadvance1[d]: That's fine, just wait for KosmicKrisp instead 🙂
23:46 butterflies[d]: Oh we are already waiting for that, currently it has a very Android baseline-shaped feature set without even 16-bit buffer access
23:46 butterflies[d]: or BC textures
23:46 butterflies[d]: waiting for merge before sending all the PRs
23:47 sonicadvance1[d]: :BlobSweat: