00:37 mhenning[d]: somehow writing a bit of documentation is the most exhausting thing I did this week
09:09 monkey: Hi!
10:31 marysaka[d]: Nice, I guess the big question is what is the format used by cubin and if there are variants with only IR in them
10:31 marysaka[d]: Oh yeah
10:33 mohamexiety[d]: yooooooooooo that's so good!! ❤️
10:33 mohamexiety[d]: misyltoad[d]: both
10:34 mohamexiety[d]: they have generated code for released architectures and then ptx for the case of old DLSS + newer GPU (e.g. running DLSS 3 on Blackwell)
10:36 rhed0x[d]: how close is PTX to ISA?
10:36 rhed0x[d]: it's fairly close, isn't it?
10:40 marysaka[d]: yes and no, it has some sugar stuff and still has compat with old stuff that was removed, so it should not be too bad, just need a competent parser I guess
11:03 esdrastarsis[d]: misyltoad[d]: Classic plagfrog
11:20 marysaka[d]: :nya_sad:
11:20 marysaka[d]: what screen is that btw
11:21 marysaka[d]: maybe I might take the bullet and look at supporting my 4k 144hz screen someday because it's getting annoying...
11:23 marysaka[d]: yeah so I guess same issue as me :aki_thonk:
11:25 marysaka[d]: misyltoad[d]: I don't plug my test bench to this screen and it's basically always plugged to my KVM...
11:26 marysaka[d]: no they were incomplete
11:26 marysaka[d]: but tbh we should rebase those and finish that to have that around the corner
11:26 pac85[d]: marysaka[d]: Time to add nir intrinsics for every NV instruction so you can keep the bin intact through the compiler
11:26 marysaka[d]: I think sync was broken as hell tho
11:27 marysaka[d]: misyltoad[d]: the other annoying part is like... the uapi is completely incompatible between drivers
11:27 marysaka[d]: so we need to keep the interface up to date or whatever
11:27 marysaka[d]: sometimes some fields get moved, it's fun :AkkoDerp:
11:28 marysaka[d]: like envyhooks needs to be rebuilt on almost any update, it's so annoying...
11:32 mohamexiety[d]: weird thing is it should automatically downgrade
11:32 mohamexiety[d]: like I am running a 4k240 screen on nouveau, it just runs at 4k60
11:33 marysaka[d]: yeah it feels like automatic downgrade is just busted on some displays maybe? I don't know anything about that sadly...
11:33 marysaka[d]: my solution for the longest time was to plug in another card, another thing that worked was to force DP1.2 instead of 1.4 in the display settings
11:34 marysaka[d]: (I have a Sony InZone M9 for ref)
11:35 marysaka[d]: (that thing is a buggy mess sometimes, especially when switching inputs)
12:15 karolherbst[d]: I.... I found a bug in CUDA: `an illegal instruction was encountered` 🙃
12:17 karolherbst[d]: ahh found it 🙃
13:04 marysaka[d]: misyltoad[d]: that's even more weird :blobcatnotlikethis:
13:37 mohamexiety[d]: lets gooo
13:49 chikuwad[d]: :o
14:52 karolherbst[d]: this API is such a pain...
15:56 mohamexiety[d]: that shouldnt happen. wonder if maybe it's because CUDA gets higher shared mem splits?
15:56 mohamexiety[d]: nice
16:00 mohamexiety[d]: :vibrate: here's hoping
16:01 mohamexiety[d]: aw..
16:06 mhenning[d]: If you disassemble your shader, is it using constbufs or anything? You might need to set them up with the right data
16:07 mhenning[d]: misyltoad[d]: There's https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-trap
16:07 mhenning[d]: You still need to look at the disassembly - the compiler can automatically move things into const bufs
16:08 mhenning[d]: I'd guess it won't for something that simple but you really should take a look at the disasm
16:11 mhenning[d]: yeah, that's a start
16:19 karolherbst[d]: the driver offsets the kernel parameter
16:21 karolherbst[d]: do you parse out the grid size?
16:21 karolherbst[d]: mhhh
16:22 mhenning[d]: maybe also run with NVK_DEBUG=zero_memory to make it less likely you're looking at uninitialized memory
16:26 karolherbst[d]: does it matter?
16:26 karolherbst[d]: adding values to the counter
16:27 karolherbst[d]: like the "how many CS invocations did run" one
16:30 mhenning[d]: Is the QMD getting filled in?
16:32 mhenning[d]: hmm it looks like nak_fill_qmd is called by the dispatch_shader call
16:36 mohamexiety[d]: yeah that would do it
16:37 karolherbst[d]: yeah
16:38 karolherbst[d]: nvidia passes some sysvals through those
16:38 karolherbst[d]: and kernel parameters come right after that one
16:38 karolherbst[d]: also at index 0
16:38 karolherbst[d]: from what I can tell actual kernel args start at 0x160 or something
16:39 karolherbst[d]: or 0x164?
16:39 karolherbst[d]: dunno
16:39 karolherbst[d]: something like that
16:39 karolherbst[d]: misyltoad[d]: no idea 🙂
16:39 karolherbst[d]: ahhh
16:39 karolherbst[d]: I don't fill in that info 🙃
16:39 karolherbst[d]: but those are application level parameters
16:39 karolherbst[d]: not the internal ones
16:40 karolherbst[d]: let me see...
16:40 karolherbst[d]: `EIATTR_PARAM_CBANK` maybe?
16:40 karolherbst[d]: `/*0014*/ .short 0x0160` looks suspiciously like the offset
16:41 karolherbst[d]: mhhh
16:41 karolherbst[d]: `EIATTR_KPARAM_INFO` looks like internal params
16:41 karolherbst[d]: with offset + type?
16:41 karolherbst[d]: misyltoad[d]: I don't
16:42 karolherbst[d]: the CUDA driver thing does all of it, so I won't have to bother with it
16:42 karolherbst[d]: I don't even have to know about the offset, I just pass in a host buffer with the kernel args and that's it
16:47 karolherbst[d]: should be fine for cb0
16:47 karolherbst[d]: like that should fit into push constants
16:47 avhe[d]: karolherbst[d]: it depends on the uarch
16:47 avhe[d]: it's easy enough to see with godbolt https://godbolt.org/z/dKonf7s78
16:48 karolherbst[d]: ahh
16:49 avhe[d]: though i don't know if that's a hardware thing or simply they need more fields for later gens
16:49 karolherbst[d]: it's not a hardware thing
16:49 karolherbst[d]: probably just supporting more features that need more internal arguments
16:50 avhe[d]: yeah that seems the most likely
16:52 mhenning[d]: well, push constants get put into the root descriptor in NVK, and the nv compiler isn't going to know where they are
16:53 mhenning[d]: yes, nvk puts the root descriptor in cbuf 0
16:55 mhenning[d]: Take a look at nvk_root_descriptor_table - the "push" field is where push constants go in cb0
16:55 mhenning[d]: so it'll be offset according to that field in the struct
16:56 mhenning[d]: tbh if you wanted to hack at it you might be able to reorder that field so it's first and doesn't get an offset
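[editor's sketch] The point about the "push" field can be illustrated with a toy layout. This is a minimal sketch only: the two structs below are hypothetical stand-ins, not the real `nvk_root_descriptor_table`, and the field sizes (a 64-byte internal block, a 128-byte push area) are assumptions picked for illustration. It shows why a field that is not first in the struct lands at a non-zero offset in cb0, and why moving it to the front removes that offset.

```python
import ctypes

PUSH_SIZE = 128  # assumed size of the push-constant area, in bytes


class RootTableSketch(ctypes.Structure):
    # Hypothetical layout, NOT the real nvk_root_descriptor_table.
    _fields_ = [
        ("internal", ctypes.c_uint32 * 16),   # stand-in for driver-internal data
        ("push", ctypes.c_uint8 * PUSH_SIZE),  # push constants live here
    ]


class RootTablePushFirst(ctypes.Structure):
    # Same hypothetical fields, reordered so "push" comes first.
    _fields_ = [
        ("push", ctypes.c_uint8 * PUSH_SIZE),
        ("internal", ctypes.c_uint32 * 16),
    ]


def push_offset(struct_type):
    """Byte offset of the 'push' field, i.e. where it would sit in cb0."""
    return struct_type.push.offset
```

With these made-up sizes, `push_offset(RootTableSketch)` is 64 and `push_offset(RootTablePushFirst)` is 0; a precompiled binary that expects its arguments at a fixed cb0 offset would need the real layout to line up in the same way.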
16:57 karolherbst[d]: anyway, can use push constants or not depending on how big the kernel input buffer is
17:04 sonicadvance1[d]: :>
17:04 karolherbst[d]: yeah.. on nvidia you have 64k for it
17:05 karolherbst[d]: there isn't anything else anyway
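[editor's sketch] The size-based choice described above could look roughly like this. It is a sketch under stated assumptions: the function name and the returned labels are hypothetical, the 64 KiB figure is the "64k" quoted in the chat, and the default push limit of 128 bytes is only the minimum `maxPushConstantsSize` that Vulkan guarantees, so a real driver may report more.

```python
CBUF_MAX_BYTES = 64 * 1024      # "64k" const buffer size, as quoted in the chat
MIN_PUSH_CONSTANT_BYTES = 128   # Vulkan's guaranteed minimum maxPushConstantsSize


def pick_param_path(param_bytes, push_limit=MIN_PUSH_CONSTANT_BYTES):
    """Decide how a kernel input buffer of param_bytes could be passed.

    The labels returned here are hypothetical names for illustration.
    """
    if param_bytes < 0:
        raise ValueError("param_bytes must be non-negative")
    if param_bytes <= push_limit:
        return "push_constants"
    if param_bytes <= CBUF_MAX_BYTES:
        return "const_buffer"
    raise ValueError("kernel input buffer exceeds the 64 KiB const buffer")
```

For example, a 32-byte argument block would go through push constants, while a 4 KiB one would need a separately bound constant buffer.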
17:06 karolherbst[d]: the vulkan driver needs to fill the internal data, no?
17:06 mhenning[d]: misyltoad[d]: right, that's what I'm saying
17:06 karolherbst[d]: mhhh
17:07 karolherbst[d]: long-term you probably want to translate to nir, but yeah...
17:14 mohamexiety[d]: YOOOOO!
17:18 chikuwad[d]: :shocked:
17:28 sonicadvance1[d]: \o/
17:29 gfxstrand[d]: Nice!
17:29 mhenning[d]: karolherbst[d]: yeah, in terms of what makes it upstream, I'd rather see us have a ptx-to-nir frontend than have us try to carefully match the proprietary driver's descriptors to run unmodified binaries
17:30 karolherbst[d]: shouldn't be too hard, from my experience most of it maps quite nicely, just some cursed nvidia special things might need some lowering
17:30 karolherbst[d]: though per instruction flushing controls isn't something we can model in nir atm 🙃
17:31 karolherbst[d]: why?
17:32 karolherbst[d]: mhh guess that's fair
17:33 karolherbst[d]: right
17:33 gfxstrand[d]: Yeah, especially for something like DLSS
17:34 mhenning[d]: misyltoad[d]: well, that's the way cuda normally works - you ship ptx plus a few generations of pre-compiled binaries
17:34 mhenning[d]: even if there's no hand-tuning per architecture
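[editor's sketch] The "PTX plus a few generations of pre-compiled binaries" model can be sketched as a selection rule. This is a simplified illustration, not the CUDA driver's actual logic: the function name and the integer encoding of architectures (86 for sm_86, and so on) are assumptions, and it only accepts an exact-match binary, whereas the real loader has additional compatibility rules.

```python
def pick_image(device_sm, cubin_targets, ptx_targets):
    """Choose which embedded image to use for a device.

    device_sm:      device architecture as an int, e.g. 86 for sm_86
    cubin_targets:  set of architectures with a precompiled binary
    ptx_targets:    set of virtual architectures with embedded PTX
    Returns ("cubin", sm), ("ptx", target), or None if nothing fits.
    """
    # Prefer a binary built for exactly this architecture.
    if device_sm in cubin_targets:
        return ("cubin", device_sm)
    # Otherwise fall back to PTX, which can be compiled at load time
    # for any architecture at least as new as the one it targets.
    usable = [t for t in ptx_targets if t <= device_sm]
    if usable:
        return ("ptx", max(usable))
    return None
```

This matches the case mentioned earlier in the log: an older DLSS build with binaries only for released architectures would end up on the PTX path when run on a newer GPU.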
17:38 mohamexiety[d]: for dlss at least it's safe to assume hand-tuning though
17:38 karolherbst[d]: not sure
17:38 karolherbst[d]: their compiler is kinda good
17:39 karolherbst[d]: depends on the code
17:39 karolherbst[d]: but this is just some matrix muladd
17:59 karolherbst[d]: yeah
17:59 karolherbst[d]: you can have constant indices
17:59 karolherbst[d]: 0x0 is the texture and 0x5a is the sampler header location within the texture const buf
17:59 karolherbst[d]: (or reverse order)
17:59 karolherbst[d]: or is it c[0x0][0x5a] on modern gens?
17:59 karolherbst[d]: mhh
17:59 karolherbst[d]: they changed it
18:00 karolherbst[d]: misyltoad[d]: what SM level is this? sm86?
18:01 karolherbst[d]: yeahh
18:01 karolherbst[d]: then it's `c[0x0][0x5a]`
18:01 karolherbst[d]: two dest regs, two inputs, cb index, cb offset, type
18:01 karolherbst[d]: .SCR means scalar encoding
18:02 karolherbst[d]: for the store it's a vec dest, vec input and cb index, cb offset
18:02 karolherbst[d]: btw the .P is elements in pixels, there is also .B where it's bytes and no format conversion takes place
18:05 karolherbst[d]: on previous gens the const buf index was encoded as part of the 3d/compute state, but I guess you won't have DLSS running pre turing anyway
18:08 karolherbst[d]: ohh it might be shifted by 4
18:09 karolherbst[d]: ehh 2
18:09 karolherbst[d]: so `c[0x0][0x168]` which makes it a normal kernel input
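[editor's sketch] The arithmetic behind that last step, written out. The shift of 2 and the 0x160 base for kernel arguments are taken from the discussion above; as noted there, the base depends on the uarch, so treat both as assumptions rather than fixed constants. The function names are hypothetical.

```python
KERNEL_ARG_BASE = 0x160  # where kernel args seemed to start in cb0 (varies by uarch)


def encoded_to_cbuf_offset(encoded, shift=2):
    """Turn the encoded constant-buffer field into a byte offset."""
    return encoded << shift


def kernel_arg_offset(cbuf_offset, base=KERNEL_ARG_BASE):
    """Byte offset of a cb0 location relative to the start of the kernel args."""
    return cbuf_offset - base
```

With the values from the log, `encoded_to_cbuf_offset(0x5a)` gives 0x168, and `kernel_arg_offset(0x168)` gives 8, i.e. the handle sits 8 bytes into the kernel argument block.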
18:40 avhe[d]: misyltoad[d]: it's the elf section index
18:43 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1426641308780462182/image.png?ex=68ebf6c1&is=68eaa541&hm=888e56d491395d407a6b26f4c8fb23270eabb87dbe8be03c6fd65f21da3c65a5&
18:43 avhe[d]: using cuobjdump on the cubin should yield something like this
18:43 avhe[d]: the index is given in the first column
18:44 avhe[d]: well, section index is probably inaccurate... seems like it's the symbol index
18:45 avhe[d]: misyltoad[d]: not sure sorry... maybe something in .nv.info
19:04 mohamexiety[d]: updated the comp MR, final version now pending review passes and such :vibrate: :vibrate:
19:04 mohamexiety[d]: phomes_[d]: whenever you have time, would appreciate another game run just to verify none of the fix ups regressed anything wrt compression
19:08 phomes_[d]: mohamexiety[d]: I am away from my test box for the next week but I will test as soon as possible
19:28 mohamexiety[d]: phomes_[d]: yep, all good. thanks a lot! ❤️
19:29 mohamexiety[d]: misyltoad[d]: YOOO very great. congrats! :vibrate: 🎉
19:39 sonicadvance1[d]: Woo
20:25 steel01[d]: https://gitlab.freedesktop.org/drm/tegra/-/issues/8
20:25 steel01[d]: jja2000[d] Just now hit me that you might be able to corroborate this issue. Have you booted a non-l4t Linux on your tx2 devkit? If so, are the colors correct?
20:34 mohamexiety[d]: haha
20:36 jja2000[d]: steel01[d]: Yep seemed to be
20:36 jja2000[d]: Assuming llvmpipe and broken nouveau counts
20:37 jja2000[d]: I may recheck, it didn't look out of the ordinary before
20:39 steel01[d]: jja2000[d]: Mmm, wonder what the implications of that are. I've got broken colors on everything from console bit banging to dumb buffers in android recovery to nouveau rendering in the android ui. If the x11 driver or whatever wayland uses renders correctly, then... 0o What?
20:40 jja2000[d]: Do you have a good reference image to check? My Fedora install is a bit older from testing months ago
20:41 steel01[d]: I don't, no. Most of what I do is android. And for mainline verification I just have a super basic busybox ramdisk. I don't have any full Linux distro setup.
20:41 jja2000[d]: Image as in picture, the older install means you might be able to pinpoint a broken commit better
20:42 jja2000[d]: I realise that was a bit unclear from what I said hahaha
20:42 steel01[d]: The only reference I have from pure mainline is the CONFIG_LOGO output.
20:43 mohamexiety[d]: nice! that's promising
20:43 mohamexiety[d]: yeah that's already a lot of great stuff. epic work and thanks so much! ❤️
20:43 steel01[d]: jja2000[d]: If you've got both tx1 and tx2, then those side-by-side would be a reference. Tegra210 is correct for me.
20:44 jja2000[d]: Just TX2, X1 stuff is just the pixel, but that's not on dp/hdmi
20:44 steel01[d]: I've got all my stuff on a big tesmart kvm piped to pikvm. So I can switch between kvm inputs to see differences.
20:45 steel01[d]: Ah, mmm.
20:49 orowith2os[d]: steel01[d]: hacking on shield? :blobfoxpeek:
20:50 steel01[d]: orowith2os[d]: Not directly atm. Still waiting to get my units back with uart exposed in a couple weeks. Atm, I'm messing with several of the jetson devkits. Trying to get them as usable as I can. Unfortunately for xavier and orin, that's with no graphics acceleration. 🙁
20:51 steel01[d]: But it's all related. There's a lot shared between the tegra archs.
20:51 orowith2os[d]: Ah. Still waiting on my shield to have usable DRM stuff on mainline. Glad to hear progress ;)
20:51 jja2000[d]: One downside of having an older fedora install is the lack of fan... or cpu scaling...
20:51 steel01[d]: Well, gfxstrand[d] has nvk firing up on gm20b now. So that'll be a big jump once that gets merged.
20:52 jja2000[d]: Okay no steel01[d] that is darker
20:52 steel01[d]: Okay, so consistent. That's easier to deal with. But doesn't give any hints.
20:53 jja2000[d]: I'll check kernel and mesa version
20:53 steel01[d]: I'll have to bump my lkml question next week. Unfortunately, it seems like anyone that might have a clue how to fix it doesn't care enough to even respond. ><
20:54 steel01[d]: I expect the issue is with tegra-drm on the kernel side. Since the issue also happens on console rendering and dumb buffers when nouveau and mesa aren't even in play.
20:54 steel01[d]: Something in the nvdisplay handling specific to t186/t194.
20:58 jja2000[d]: jja2000[d]: It's kernel 6.14.5, mesa 25.0.4
20:58 jja2000[d]: I also have a different annoying issue where the signal can't stay up and it will sometimes turn on with a handful pixels on the left being pink
20:58 steel01[d]: So right in between my test cases. 6.12 and 6.17.
20:58 steel01[d]: Huh?
20:59 steel01[d]: No way... I thought that was a broken android rendering thing. You get that on Linux too?
20:59 jja2000[d]: sec
21:00 jja2000[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1426675808671367269/2025-10-11-22-59-41-270.mp4?ex=68ec16e3&is=68eac563&hm=373cc88e78eaf398fdcf45b44e9628bf9eb9a56ee9d2ed990bda3c1bef4dbf00&
21:00 steel01[d]: I've got a picture in this channel about broken 'scanlines'.
21:00 steel01[d]: That might be a related symptom to whatever the underlying issue is.
21:01 jja2000[d]: I'm upgrading the install and dtb first before testing further to at least get the fan working lmao
21:01 steel01[d]: Mmm. I have also had a flickering display, but that was related to attempted frame rate changes. Dropping to 24 when no interaction and jumping back to 60 when active again.
21:02 jja2000[d]: Also I may need to replace plasma with xfce
21:02 jja2000[d]: And the sdcard with a SATA ssd
21:09 gfxstrand[d]: steel01[d]: I'm gonna try and clean it up and merge next week. But it's blocked behind https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37803
21:10 gfxstrand[d]: There's also a panvk MR I'm supposed to be reviewing that's blocked by that, too.
21:12 gfxstrand[d]: Once I land what I have now, 64-bit NVK will mostly work in Tegra. 32-bit needs a kernel patch. We also need some kernel fixing for coherent BOs. I have no idea when I'm going to find time for either of those, though.
21:17 HdkR: Those poor 32-bit users.
21:18 jannau: I just opened my 2015 shield tv and apparently the HDD is structurally important. Instead of a 2.5" hdd there is an equally sized block of something
21:18 HdkR: Only support Thor and Spark so you don't need to think about 32-bit :P
21:18 jja2000[d]: HdkR booooooo
21:19 jja2000[d]: Second most sold Tegra SoC is prolly the 32-bit K1
21:24 gfxstrand[d]: Let's be honest, I only really care about TX1 and Orin. 🙃
21:24 jja2000[d]: booooooooo /s
21:25 jja2000[d]: These didn't get sold in too many consumer devices either way
21:25 gfxstrand[d]: More seriously, unless someone has a plan to kill off Linux 32-bit userspace, I kinda have to care.
21:25 jja2000[d]: I was mostly referring to chromebooks in the case of the K1
21:25 gfxstrand[d]: But also, 32-bit is easy.
21:26 HdkR: You need to support 32-bit for FEX anyway. It's basically the same thing :P
21:26 gfxstrand[d]: It's just copying+pasting some kernel code from the right places and doing what I did in panvk.
21:28 gfxstrand[d]: Chasing the coherent map bug scares me more because I have no idea what's wrong (it works sometimes) and that code is insane.
21:29 jja2000[d]: jja2000[d]: T124: Jetson, Bunch of chromebooks, Mocha, Shield K1
21:29 jja2000[d]: T132: Norrin devboard, Nexus 9
21:29 jja2000[d]: T186: Jetson, some teslas?
21:29 jja2000[d]: T194: Jetson
21:29 jja2000[d]: T210: Jetson, Switch, Pixel C, Jamboard
21:29 gfxstrand[d]: But once my main MR lands, I think we'll be in a place where someone other than me can help fill in the rest of the pieces.
21:30 gfxstrand[d]: I've even got the hooks in place to tie into the kernel. Someone just has to tie them to the new ioctl.
21:30 steel01[d]: jja2000[d]: T186 has a vr headset from... magic leap?
21:31 steel01[d]: gfxstrand[d]: Next project is getting orin loading in nouveau or nova then, right? 😉
21:32 steel01[d]: Or making mesa work with nvgpu.
21:33 steel01[d]: gfxstrand[d]: Intermittent bugs suck.
21:37 jja2000[d]: steel01[d]: did you ever use the onboard sata connector on the TX2? Do you reckon I can just stand up the drive on the board? :^)
21:37 steel01[d]: jja2000[d]: I've not, no. I only tried to use the one on the tk1 once. I wouldn't trust the header to hold the weight, though.
21:38 jja2000[d]: Welp, time to order a cable
21:38 jja2000[d]: Install on the sd card is too slow I'm afraid
21:40 steel01[d]: I've only used the internal emmc for android. And that's reasonably fast.
21:40 steel01[d]: Iirc, the fedora u-boot forces external sd. Probably because they think 16GB isn't big enough.
21:44 jja2000[d]: gfxstrand[d]: Do you have notes on reproduction? Or the dumps from the cts (if applicable)? The people doing upstreaming for mocha (and pre-desktop arch tegra) may be able to spend some time on it
21:44 jja2000[d]: steel01[d]: Honestly also better to not wear out the eMMC much
21:45 steel01[d]: I've abused mine plenty since I've got my devkits and not seen any issues yet. They're specced pretty decently.
21:50 gfxstrand[d]: jja2000[d]: Yeah, there are some CTS tests that 100% fail. It's pretty predictable.
22:47 jja2000[d]: Fun, I think it hit thermal shutdown when trying to upgrade. Fan also isn't spinning
22:48 jja2000[d]: oof
22:52 jja2000[d]: Kinda need the upgrade to F42 for meson 1.7.0, guess I should grab the usb fan again tomorrow
23:07 steel01[d]: Hmm. My change to enable the fan was merged. But it might have been since the kernel version you're using.
23:45 jja2000[d]: This was 6.16.10, should be in there (I checked)
23:45 jja2000[d]: Also in Fedora's arm64 config