01:17 gfxstrand[d]: I have no idea why this test is failing
02:19 mhenning[d]: have you tried asking it nicely to succeed
03:03 gfxstrand[d]: 🤣
03:03 gfxstrand[d]: No. Maybe I should try that. I've tried about everything else.
03:03 gfxstrand[d]: I have no idea what the difference between Kepler A and B here is
03:03 gfxstrand[d]: Could be a compiler bug but this shader is so darn simple it's hard to imagine what that would be.
03:51 HdkR: You've heard of first Kepler, but what about second Kepler?
05:14 gfxstrand[d]: 🤭
06:05 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1377890642272850071/image.png?ex=683a9c29&is=68394aa9&hm=a2fdc222edad3b6f089c3222c5ab54a4e3d91edb2aefca6fd8683f923f989a76&
06:05 mangodev[d]: mhenning[d]: no way, the strip thing worked
06:05 mangodev[d]: (this is the electron crash that previously gave very little info)
06:06 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1377890970653294622/image.png?ex=683a9c77&is=68394af7&hm=4e84d4e44ea59992f3316688c0c2c4225acdcb90931ae2b248314af697bb0424&
06:06 mangodev[d]: in order to send the whole stacktrace i'd have to copy multiple buffers of pure stacktrace 😭
06:09 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1377891570719916154/image.png?ex=683a9d06&is=68394b86&hm=a02830d41aa7f0f1cc78233986121c1e4b99c5c64cbe00dbf634e023527ddf47&
06:09 mangodev[d]: HMMMMMMMMMMMMM
06:10 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1377891900492611734/image.png?ex=683a9d55&is=68394bd5&hm=70bb96d7fd16ad59dfc190ffafb2400336bedb941980015a5f910458ea46bbdf&
06:10 mangodev[d]: it segfaults so frequently that the segfault interrupted the other segfault mid-segfault :|
06:14 mangodev[d]: wait oops
06:14 mangodev[d]: don't mind me
06:14 mangodev[d]: forgot you can't scroll in less
11:11 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1377967539002277888/image.png?ex=683ae3c6&is=68399246&hm=c68f7df8f51d642ecc174279a25a85dd201006b2b23d0e02dc68645260426942&
11:11 kar1m0[d]: turns out the gpu viewer app can see the nvk drivers
13:14 gfxstrand[d]: mangodev[d]: does it crash immediately? Or do you have enough time you could attach GDB?
13:55 kar1m0[d]: mangodev[d]: snowycoder[d] gfxstrand[d] guys, pls help. It seems that my GPU isn't detected for some reason: when I open games with MangoHud, both of my GPUs are at 0%, and there is a RAM issue that I don't really understand. Can I run some tests or check this somehow to fix it? This is the error Steam gives me:
13:55 kar1m0[d]: LowLevelFatalError [File:D:\Git\UEBuild_Win64_b1_release\Engine\Source\Runtime\D3D12RHI\Private\D3D12Util.cpp] [Line: 819]
13:55 kar1m0[d]: Out of video memory trying to allocate a rendering resource
13:55 kar1m0[d]: 0x00006fffffc0d0c7 kernelbase.dll!UnknownFunction []
13:55 kar1m0[d]: 0x000000014cffc0e3 b1-Win64-Shipping.exe!UnknownFunction []
13:55 kar1m0[d]: 0x000000014cfa8210 b1-Win64-Shipping.exe!UnknownFunction []
13:55 kar1m0[d]: Crash in runnable thread Background Worker #14
13:56 kar1m0[d]: both gpu 0 and gpu 1 are at 0%
13:56 kar1m0[d]: in mangohud
14:06 gfxstrand[d]: gpus being at 0% isn't surprising for NVK. That's just because it isn't reporting anything.
14:15 kar1m0[d]: it's just weird
14:15 kar1m0[d]: the issue I mean
14:15 kar1m0[d]: of insufficient ram/vram
14:59 gfxstrand[d]: Could be something else masquerading as that.
14:59 gfxstrand[d]: But also totally possible.
14:59 gfxstrand[d]: What GPU again?
14:59 kar1m0[d]: gfxstrand[d]: ada lovelace
14:59 kar1m0[d]: rtx 4080 laptop gpu
15:00 gfxstrand[d]: Okay, that shouldn't have any funny corners around memory heaps/types.
15:00 kar1m0[d]: kar1m0[d]: from the games that I have tested so far, only Lies of P managed to boot properly; War Thunder and Black Myth: Wukong gave me this error
15:01 kar1m0[d]: though they are GPU-heavy games
15:01 kar1m0[d]: and Lies of P only managed to boot on the second try
15:01 kar1m0[d]: steam often freezes
15:04 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1378026174818549852/Screenshot_20250530_180328.png?ex=683b1a62&is=6839c8e2&hm=74cbe954cef4845141ceed3f7cf57693a079af5f1198971a1974fa57b9b19f9a&
15:10 gfxstrand[d]: Can you build a debug build? If you do, it'll do some extra logging to stderr
15:22 kar1m0[d]: gfxstrand[d]: I honestly do not know how to do it 😅 but if you can send me a link to a guide I would be happy to do it
15:27 gfxstrand[d]: What distro?
15:27 kar1m0[d]: garuda linux which is arch based
15:30 gfxstrand[d]: You should be able to do a pretty standard mesa build then.
15:32 gfxstrand[d]: Clone and then do
15:32 gfxstrand[d]: ```bash
15:32 gfxstrand[d]: meson setup -Dprefix=$PWD/_install -Dvulkan-drivers=nouveau -Dgallium-drivers=zink -Dbuildtype=debug _build
15:32 gfxstrand[d]: ninja -C _build install
15:32 gfxstrand[d]: ```
15:32 gfxstrand[d]: and then prefix `VK_ICD_FILENAMES=/path/to/mesa/_install/share/vulkan/icd.d/nouveau_icd.x86_64.json` to your run command, where `/path/to/mesa` is the checkout you ran meson from
15:37 kar1m0[d]: gfxstrand[d]: I haven't cloned
15:37 kar1m0[d]: I installed binaries
15:38 kar1m0[d]: though I should probably have cloned
15:38 kar1m0[d]: mind sending me a link to it? gfxstrand[d]
15:41 gfxstrand[d]: A link to what? The repo? https://gitlab.freedesktop.org/mesa/mesa
15:43 kar1m0[d]: to what I have to git clone
15:43 kar1m0[d]: sorry
15:43 kar1m0[d]: I am bad at those things
15:43 kar1m0[d]: I get lost in the branch
15:43 mhenning[d]: mangodev[d]: Okay, cool. After one of the crashes, can you do `coredumpctl debug` and then
15:43 mhenning[d]: set logging on
15:43 mhenning[d]: bt full
15:43 mhenning[d]: quit
15:43 mhenning[d]: at the prompt? Once you do this, it will generate a gdb.txt file in the current working directory and you can upload it to the bug report.
15:49 gfxstrand[d]: kar1m0[d]: go somewhere and `git clone https://gitlab.freedesktop.org/mesa/mesa` and then it'll be in the `mesa` folder.
15:53 kar1m0[d]: gfxstrand[d]: meson configure -Dprefix=$PWD/_install -Dvulkan-drivers=nouveau -Dgallium-drivers=zink -Dbuildtype=debug_build for now there is no output
15:53 kar1m0[d]: I wonder how long it will take
15:54 kar1m0[d]: git cloned exactly the main mesa branch
15:54 gfxstrand[d]: kar1m0[d]: That should start spewing stuff immediately
15:54 gfxstrand[d]: Unless your machine is trying to silently install meson in the background or something. IDK if Arch does that sort of thing.
16:23 kar1m0[d]: gfxstrand[d]: do you need the full log?
16:24 kar1m0[d]: this part is interesting
16:24 kar1m0[d]: thread '<unnamed>' panicked at ../mesa-25.1.1/src/nouveau/compiler/nak/from_nir.rs:3350:18:
16:24 kar1m0[d]: Unsupported intrinsic instruction: rq_initialize
16:24 kar1m0[d]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
16:24 kar1m0[d]: thread '<unnamed>' panicked at ../mesa-25.1.1/src/nouveau/compiler/nak/from_nir.rs:3350:18:
16:24 kar1m0[d]: Unsupported intrinsic instruction: rq_initialize
16:24 kar1m0[d]: pid 9933 != 9932, skipping destruction (fork without exec?)
16:25 gfxstrand[d]: kar1m0[d]: That's the ticket!
16:25 kar1m0[d]: I have no idea what this is about
16:25 gfxstrand[d]: You have ray tracing turned on
16:25 gfxstrand[d]: Or at least the game is trying to compile ray tracing shaders
16:25 kar1m0[d]: it is compiling shaders
16:25 kar1m0[d]: but I never turned ray tracing on in that game
16:26 gfxstrand[d]: 😕
16:26 kar1m0[d]: I probably have to bypass the shader compilation?
16:26 gfxstrand[d]: Well, it's trying to compile shaders with ray queries in them
16:26 kar1m0[d]: to make it open the game so I can enter the settings
16:26 gfxstrand[d]: Is there an option to do that?
16:27 gfxstrand[d]: Did you run it with the blob driver before?
16:27 kar1m0[d]: I did run it on nvidia drivers before yes
16:27 kar1m0[d]: might need to clear shaders cache?
16:27 kar1m0[d]: so it will compile them with vulkan with nvk drivers
16:28 gfxstrand[d]: It may have auto-detected ray-tracing and turned it on by default
16:28 kar1m0[d]: might have leftovers from nvidia drivers
16:32 kar1m0[d]: okay I think it might be dlss gfxstrand[d]
16:32 kar1m0[d]: because
16:33 kar1m0[d]: I had it enabled
16:33 gfxstrand[d]: could be
16:33 kar1m0[d]: might it be causing the issue?
16:33 gfxstrand[d]: :shrug_anim:
16:33 gfxstrand[d]: Hard to know how game settings map to actual API features. You just have to try things.
16:37 gfxstrand[d]: gfxstrand[d]: Hah! It was an encoding error after all!
16:38 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35246/diffs?commit_id=01515099aa607b99e4bc3de79f4c5c0223abb9bf
16:38 gfxstrand[d]: mhenning[d]: I just noticed that we're setting the threshold for `nir_opt_peephole_select()` to zero. We can probably gain a good bit of perf back if we pick a better number there.
16:43 gfxstrand[d]: I don't know what number to pick and I know it's not as good as "real" predication but still...
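To make the threshold discussion concrete, here is a toy model (not NIR's actual heuristic or API) of what a peephole-select limit controls: an `if` gets flattened into both arms plus a select only when each arm is small enough. The `IfBlock` type and the pure instruction-count rule are assumptions for illustration; the real pass also looks at which kinds of instructions are in the arms.

```rust
/// Toy stand-in for an `if` whose arms contain `then_len` and `else_len`
/// instructions. Hypothetical type, not NIR's data structure.
#[derive(Debug, Clone, Copy)]
struct IfBlock {
    then_len: usize,
    else_len: usize,
}

/// Flatten an `if` into both arms plus a select only when each arm has at
/// most `limit` instructions (assumption: counts only, no instruction kinds).
fn should_flatten(b: IfBlock, limit: usize) -> bool {
    b.then_len <= limit && b.else_len <= limit
}

/// How many of the given `if`s a given threshold would flatten.
fn count_flattened(blocks: &[IfBlock], limit: usize) -> usize {
    blocks.iter().filter(|b| should_flatten(**b, limit)).count()
}

fn main() {
    let blocks = [
        IfBlock { then_len: 0, else_len: 0 },
        IfBlock { then_len: 2, else_len: 1 },
        IfBlock { then_len: 4, else_len: 3 },
        IfBlock { then_len: 20, else_len: 0 },
    ];
    for limit in [0, 2, 8] {
        println!(
            "limit {limit}: {} of {} ifs flattened",
            count_flattened(&blocks, limit),
            blocks.len()
        );
    }
}
```

In this model a limit of zero only flattens empty branches, which is why raising it can recover performance on branchy shaders, at the cost of always executing both arms.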
16:52 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1378053354055401482/nvk_test.mp4?ex=683b33b2&is=6839e232&hm=a29d0ad8a64d2ee35775513ea93992990c9c0bdef04679889c2f68f407589e25&
16:52 kar1m0[d]: Wanted to record a game test on NVK with OBS and this happens when I open the recording...
16:53 kar1m0[d]: getting around 75 fps on highest settings with nvk!
16:53 kar1m0[d]: pretty close to nvidia drivers
16:53 kar1m0[d]: don't mind the video quality I had to compress it
16:54 kar1m0[d]: because discord
17:01 mhenning[d]: gfxstrand[d]: oh, yeah I'm aware of that. could be worth tinkering with
17:02 mhenning[d]: for alu-only, I'm actually not sure if there's much of a gain to predication vs peephole_select (beyond whatever register pressure gets you)
17:05 gfxstrand[d]: But only if we can figure out how to RA better. 😭
17:05 gfxstrand[d]: I've actually got some ideas for that. I just need to play with it.
17:08 mhenning[d]: imo the post-ra predication handles it fine
17:09 mhenning[d]: that's actually the main reason I favor post-ra predication - it gets to just handle the liveness of intermediate values in the usual way that all `if`s are handled
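A minimal sketch of the general idea behind predication as if-conversion, assuming a toy instruction representation (this is not NAK's pass or IR). If the conversion happens after register allocation, the allocator has already handled the `if` like any other, so both arms write the same physical register (`r2` below) and the branch simply becomes guarded straight-line code.

```rust
/// A toy instruction: opcode text plus an optional guard predicate.
#[derive(Debug, Clone, PartialEq)]
struct Instr {
    guard: Option<(String, bool)>, // (predicate register, negated?)
    text: String,
}

/// Rewrite `if (pred) { then_ } else { else_ }` into straight-line code where
/// each instruction is guarded by `@pred` or `@!pred`. Illustrative only.
fn predicate_if(pred: &str, then_: &[&str], else_: &[&str]) -> Vec<Instr> {
    let mut out = Vec::new();
    for t in then_ {
        out.push(Instr { guard: Some((pred.to_string(), false)), text: t.to_string() });
    }
    for e in else_ {
        out.push(Instr { guard: Some((pred.to_string(), true)), text: e.to_string() });
    }
    out
}

/// Print an instruction in a SASS-like `@p0` / `@!p0` guard syntax.
fn render(i: &Instr) -> String {
    match &i.guard {
        Some((p, false)) => format!("@{p} {}", i.text),
        Some((p, true)) => format!("@!{p} {}", i.text),
        None => i.text.clone(),
    }
}

fn main() {
    // Both arms already target r2 because register allocation has run.
    let code = predicate_if("p0", &["fadd r2, r0, r1"], &["fmul r2, r0, r1"]);
    for i in &code {
        println!("{}", render(i));
    }
}
```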
17:34 mangodev[d]: gfxstrand[d]: thankfully, when electron crashes, it soft-crashes and self-recovers (hence why it can crash multiple times in a row)
17:34 mangodev[d]: i'll see if i can attach gdb sometime later today, but i'll probably want to know how to use it a little better (given before, i was just spamming the "next step" key because i didn't know how to make it properly continue the program :P)
17:35 gfxstrand[d]: hehe
17:35 mangodev[d]: perhaps it's a kopper issue (again)? given every time it crashed, it brought up egl sync
17:35 gfxstrand[d]: Really, you attach, continue, then once it crashes `bt full`.
17:35 gfxstrand[d]: Is it easy to repro?
17:36 mangodev[d]: gfxstrand[d]: that's why i haven't made a report yet
17:36 mangodev[d]: it's very inconsistent when it happens, so it's hard to make an issue
17:36 mangodev[d]: although it seems to crash far more when the GPU is under contention
17:36 snowycoder[d]: Texdepbar insertion works with only 2 custom graph algorithms (1 DFS, 1 Dijkstra)
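For reference, this is the textbook Dijkstra shortest-path algorithm on a made-up weighted graph. How the texdepbar pass actually uses Dijkstra (what the nodes and weights represent) isn't described in this log, so the graph below is purely illustrative.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Textbook Dijkstra over an adjacency list of (neighbor, weight) pairs.
/// Returns the shortest distance from `src` to every node (None = unreachable).
fn dijkstra(adj: &[Vec<(usize, u32)>], src: usize) -> Vec<Option<u32>> {
    let mut dist: Vec<Option<u32>> = vec![None; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[src] = Some(0);
    heap.push(Reverse((0u32, src)));
    while let Some(Reverse((d, u))) = heap.pop() {
        // Skip stale heap entries that a shorter path has superseded.
        if dist[u].map_or(false, |best| d > best) {
            continue;
        }
        for &(v, w) in &adj[u] {
            let nd = d + w;
            if dist[v].map_or(true, |cur| nd < cur) {
                dist[v] = Some(nd);
                heap.push(Reverse((nd, v)));
            }
        }
    }
    dist
}

fn main() {
    // Made-up graph: node -> [(neighbor, weight)]; node 4 is unreachable.
    let adj: Vec<Vec<(usize, u32)>> = vec![
        vec![(1, 2), (2, 5)],
        vec![(2, 1)],
        vec![(3, 3)],
        vec![],
        vec![],
    ];
    println!("{:?}", dijkstra(&adj, 0));
}
```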
17:38 gfxstrand[d]: mangodev[d]: That's plausible
17:38 mangodev[d]: also
17:38 mangodev[d]: question out of curiosity
17:38 mangodev[d]: is the mr for `redux` an optimization of sorts, using one hardware instruction instead of multiple others? or is it a new capability entirely
17:39 gfxstrand[d]: An optimization
17:39 gfxstrand[d]: But thanks for reminding me I need to review it!
17:39 HdkR: systemd-coredump to capture the crash regardless of attaching? :)
17:39 mangodev[d]: it *sounds* like a scalar swizzle (e.g. `vec.a`), although it feels like it may be more than just that ~~because that feels too simple to not be implemented in hardware yet~~
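A rough model of why `redux` is "an optimization": it is a reduction across the lanes of a warp, not a swizzle within one lane's vector. Without a dedicated instruction, a 32-lane reduction is typically built from shuffles in log2(32) = 5 butterfly rounds; a hardware warp-reduce (assumed here to be what the `redux` MR targets, for integer ops) produces the same result in one operation. This simulates both on the CPU; it is not driver code.

```rust
const WARP: usize = 32;

/// Model of a warp-wide butterfly reduction built from shuffles: each round,
/// every lane adds the value held by lane (i ^ offset). Returns the per-lane
/// values after all rounds plus the number of rounds taken.
fn butterfly_sum(lanes: [u32; WARP]) -> ([u32; WARP], usize) {
    let mut v = lanes;
    let mut rounds = 0;
    let mut offset = WARP / 2;
    while offset > 0 {
        let prev = v; // all lanes read before any lane writes, like a shuffle
        for i in 0..WARP {
            v[i] = prev[i].wrapping_add(prev[i ^ offset]);
        }
        rounds += 1;
        offset /= 2;
    }
    (v, rounds)
}

/// What a single warp-reduce instruction is assumed to compute: the same
/// total, in one step.
fn single_reduce_sum(lanes: [u32; WARP]) -> u32 {
    lanes.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let lanes: [u32; WARP] = std::array::from_fn(|i| i as u32);
    let (v, rounds) = butterfly_sum(lanes);
    println!("butterfly: {} rounds, lane 0 = {}", rounds, v[0]);
    println!("single reduce: {}", single_reduce_sum(lanes));
}
```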
17:40 * gfxstrand[d] is bad at remembering to review mhenning[d]'s MRs.
17:41 mhenning[d]: gfxstrand[d]: Is there anything I can do to make it easier to remember (beyond pestering you on irc)?
17:42 mangodev[d]: gfxstrand[d]: np! i've wanted to nudge you about some MRs that i've been interested in (primarily mhenning's :P), but i didn't know how rude it'd be since you're already busy with Kepler stuff, and also the fact that 9 times out of 10, it's the author themselves that asks for a review
17:44 kar1m0[d]: if there are any tests you need me to run on nvk just let me know guys
17:44 kar1m0[d]: I can run them on both kepler b and ada lovelace
17:51 mhenning[d]: mangodev[d]: honestly, if you wanted to remind Faith about reviews on my behalf, I'd appreciate it
17:52 mangodev[d]: mhenning[d]: i do like a lot of your MRs :D
17:52 mangodev[d]: i like performance and stability (as those are the main reasons i migrated to NVK as a daily-driver)
17:54 gfxstrand[d]: I'm happy for folks to remind me.
17:54 gfxstrand[d]: I was pretty low-bandwidth for the last 6 months but I'm pretty much back now so remind away.
17:55 mangodev[d]: i should try we ♥️ katamari reroll again sometime and see if it can hit the 60fps target on higher settings than lowest
17:55 mangodev[d]: it ran good on lowest, but I'd also *hope* so, since it's literally a ps2 game :P
17:55 mangodev[d]: enabling shadows and anti-aliasing (iirc msaa?) a couple months ago made my 1660 super *cry,* but now that the spill/fill regressions are mostly fixed (afaik), it should run a ton better
17:56 mangodev[d]: more importantly i need to test for that driver crash though
17:56 mangodev[d]: although i hope it's fixed now since many stability fixes have been made since then
17:57 mhenning[d]: mangodev[d]: maybe. the spill/fill stuff is the kind of thing that will help some games a ton and won't help other games at all
17:57 mhenning[d]: so it's worth trying
17:58 mangodev[d]: gfxstrand[d]: iirc there were even some of your own MRs that i was interested in :P
17:58 mangodev[d]: i constantly scan the repo for commits, MRs, and issues, so maybe i can be your alarm clock
17:59 mangodev[d]: i remember seeing some issues that looked quite out of date, because one of the checklist items was "rewrite NAK"
17:59 mangodev[d]: even though that has long since been merged
17:59 mangodev[d]: there's also the dxvk/vkd3d-proton support checklist with `fragment_interlock` unchecked and a red triangle next to it? never knew what the triangle meant
17:59 gfxstrand[d]: IDK about that. I'm pretty wary of other people trying too hard to project manage me.
18:00 gfxstrand[d]: But if there's an issue that's clearly been forgotten about, it's okay to ask.
18:00 mhenning[d]: gfxstrand[d]: there's a tiny fix in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35141 that's ready for review. and then https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33306 are really big ones that are still waiting too
18:00 gfxstrand[d]: But there are a lot of them that are left undone because they're a giant PITA for not huge benefit.
18:01 mangodev[d]: gfxstrand[d]: fair
18:01 mangodev[d]: some issues and MRs just feel forgotten about
18:01 mangodev[d]: "All good, gonna submit for merging in a bit" ***Open, last active 4 months ago***
18:01 mangodev[d]: though tbf i think some are forgotten for a reason
18:01 mangodev[d]: mainly small patches and such
18:03 gfxstrand[d]: I should skim through the issues again and clean them up
18:03 gfxstrand[d]: That might actually be a good project for this afternoon once I'm done playing with Maxwell UBOs
18:03 mangodev[d]: mhenning[d]: it's sad to see the prepass scheduler get more and more realistic in its benchmarks 😢
18:03 mangodev[d]: from -60% cycles down to -1%
18:04 mhenning[d]: part of that is just that other things are getting fixed
18:04 mangodev[d]: -6% feels like such a big drop from the previous reported -30% to -60% improvements in cycle count
18:05 mangodev[d]: mhenning[d]: fair
18:05 mangodev[d]: so were the previous gains just from fixed bugs and regressions?
18:05 mangodev[d]: and now that there aren't as many, it can't do as much?
18:06 mhenning[d]: yeah, some of the previous wins were the scheduler improving code that shouldn't have been there to begin with
18:06 mangodev[d]: i'd assume that both the prepass and postpass can have further work done in the future on them though
18:06 mangodev[d]: hence why they're "simple"
18:07 mangodev[d]: mhenning[d]: what about zculling? I remember it being brought up that it could help shadowmap perf, just for the MR to say it gives a whopping 3% improvement in a game heavy on shadows
18:08 mhenning[d]: zcull has some possible later improvements, but I'm focusing on other things right now because I think they'll be more impactful
18:08 mangodev[d]: why does horizon zero dawn work so well on NVK? most other 3D games can barely even run on the driver, yet zero dawn seems to run pretty well (going off benchmarks)?
18:08 mangodev[d]: mhenning[d]: ah, fair point
18:09 mangodev[d]: i guess the nice thing about these larger MRs isn't their immediate gains, but rather what they hold for the future
18:11 mhenning[d]: mangodev[d]: That's a good question. Something about their renderer works well with nvk, although I'm not totally sure what
18:12 mhenning[d]: Part of it is that I think they weren't really hitting the spilling bugs that were recently fixed
18:12 mhenning[d]: but it might be other things too
18:12 mangodev[d]: i've seen valheim apparently also works well (probably in part because native vulkan wayland)
18:13 mangodev[d]: noita works better than on proprietary, although it's a 2D cpu-heavy game so it's not really a fair point for NVK; all it means is fewer GL graphical bugs than prop. nvidia
18:15 mangodev[d]: blender vulkan backend works REALLY well
18:15 mangodev[d]: i wish cycles could get a vk compute or vk rt backend so that could be used in the future when NVK is more mature
18:16 mangodev[d]: i should probably file a bug on kde window artifacting on window morph though
18:17 mangodev[d]: on OpenGL applications, the window contents briefly become a jumbled mess of *stuff* during window morphing (maximizing, fullscreening, and tiling)
19:04 gfxstrand[d]: Ugh... Maxwell cache flags are such a mess with NAK...
19:25 mhenning[d]: gfxstrand[d]: yeah, part of me wonders if we should make the IR a bit closer to the hardware there. so, instead of having everything use volta+ cache flags, make things store an enum of either maxwell or volta cache flags
19:28 gfxstrand[d]: Maxwell is almost the same as Kepler
19:28 gfxstrand[d]: I'm reasonably happy with the select() function I did for Kepler.
19:28 gfxstrand[d]: But IDK what the right representation is
19:30 gfxstrand[d]: I like Volta's MemOrder stuff
19:31 gfxstrand[d]: IDK what to do with the eviction hints, though. There are a lot more of them than map to anything reasonable on Maxwell/Kepler.
19:31 gfxstrand[d]: And anything non-global is a weird special case.
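A hypothetical sketch of mhenning's suggestion: memory ops carry a per-generation cache-flags enum instead of everything using Volta+ flags. All type and variant names are illustrative (the Maxwell operators loosely follow PTX's `.ca`/`.cg` naming), and the mapping in `select_flags` is an assumption, not what NAK or the hardware documentation specifies. It does show the problem raised above: Volta+ eviction hints have nowhere to go on older parts.

```rust
/// Volta+-style memory ordering (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemOrder { Constant, Weak, Strong }

/// Volta+-style eviction hint (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Eviction { Normal, First, Last }

/// Maxwell/Kepler-style cache operator, loosely after PTX's .ca/.cg.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MaxwellCacheOp { CacheAll, CacheGlobal }

/// Hypothetical per-generation cache flags carried by a memory op.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheFlags {
    Maxwell(MaxwellCacheOp),
    Volta { order: MemOrder, eviction: Eviction },
}

/// Pick flags for a global access from generic intent (assumed mapping).
fn select_flags(sm: u32, order: MemOrder, eviction: Eviction) -> CacheFlags {
    if sm >= 70 {
        CacheFlags::Volta { order, eviction }
    } else {
        // Older parts have no per-access eviction hints, so `eviction` is dropped.
        let op = match order {
            MemOrder::Constant | MemOrder::Weak => MaxwellCacheOp::CacheAll,
            MemOrder::Strong => MaxwellCacheOp::CacheGlobal, // L2-only caching
        };
        CacheFlags::Maxwell(op)
    }
}

fn main() {
    println!("{:?}", select_flags(52, MemOrder::Strong, Eviction::First));
    println!("{:?}", select_flags(86, MemOrder::Strong, Eviction::First));
}
```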
19:33 mhenning[d]: the volta+ hardware actually has eviction hints on almost everything, but we can't always use them because nir is missing the flags in some cases
19:33 gfxstrand[d]: We could make `MemSpace::Global` contain more than just the address mode.
19:34 mhenning[d]: gfxstrand[d]: yeah, the memorder could go in there. I've wondered about doing that before
19:35 gfxstrand[d]: Eviction hints could, maybe, except we might want them on local for stack pop
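One possible shape for "make `MemSpace::Global` contain more than just the address mode", sketched under assumptions: the field and type names (`MemAddrType`, `MemEvictionPriority`, and so on) are illustrative and not copied from NAK. Global accesses carry the memory order and eviction hint, while `Local` keeps only an eviction hint, matching the stack-pop case mentioned above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemAddrType { A32, A64 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemOrder { Constant, Weak, Strong }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemEvictionPriority { Normal, First, Last }

/// Hypothetical memory-space enum where the per-access flags live inside
/// the variant they apply to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemSpace {
    Global { addr: MemAddrType, order: MemOrder, eviction: MemEvictionPriority },
    Local { eviction: MemEvictionPriority }, // e.g. hints for stack pops
    Shared,
}

impl MemSpace {
    /// Memory order only exists for global accesses in this sketch.
    fn order(&self) -> Option<MemOrder> {
        match self {
            MemSpace::Global { order, .. } => Some(*order),
            _ => None,
        }
    }

    /// Eviction hints are allowed on global and local accesses.
    fn eviction(&self) -> Option<MemEvictionPriority> {
        match self {
            MemSpace::Global { eviction, .. } | MemSpace::Local { eviction } => Some(*eviction),
            MemSpace::Shared => None,
        }
    }
}

fn main() {
    let g = MemSpace::Global {
        addr: MemAddrType::A64,
        order: MemOrder::Strong,
        eviction: MemEvictionPriority::First,
    };
    println!("{:?}: order {:?}, eviction {:?}", g, g.order(), g.eviction());
}
```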
19:36 gfxstrand[d]: Holy shit! ld.ci is frickin' magic on Maxwell
19:36 gfxstrand[d]: Not as good as ld.constant, I don't think, but really good.
19:37 HdkR: You like? :P
19:39 gfxstrand[d]: It only makes Unigine Heaven go like 4x faster
19:40 HdkR: Dang, not quite 10x
19:40 HdkR: What were you using before. ld.g or something?
19:41 HdkR: ldg? Whatever it is
19:45 gfxstrand[d]: `ld.ca`
19:45 gfxstrand[d]: Which is the normal "cache all" option
19:45 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265
19:46 HdkR: ah, yea, that wouldn't be as good :D
19:46 gfxstrand[d]: Unfortunately, we don't know what `.ci` is. There are no docs that mention it that we've been able to find.
19:46 gfxstrand[d]: We just know CUDA sets it for constant memory on Maxwell and Pascal.
19:49 gfxstrand[d]: And benchmarks say it's fast
19:49 gfxstrand[d]: Invariant maybe?
19:49 gfxstrand[d]: Invocation?
19:49 gfxstrand[d]: Immutable?
19:49 gfxstrand[d]: "I promise this is constant"?
19:51 mhenning[d]: incoherent?
19:51 HdkR: I forget, just remember that it's the constant load to use :D
19:52 mhenning[d]: anyway, since it corresponds to `__ldg()` I think we'll just need to refer to the docs for that
19:54 gfxstrand[d]: Incoherent would make sense
19:55 gfxstrand[d]: I suspect that's roughly what `.constant` does on Volta+. "Just grab it from a cache. Any cache! I don't care. Just grab something!"
19:58 mhenning[d]: oh, I guess ptx calls it ld.global.nc
19:59 mhenning[d]: so it's `__ldg()` (c++) -> `ld.global.nc` (ptx) -> `.ci` (maxwell) or `.constant` (volta)
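The lowering chain above, written as a small lookup. This is a hypothetical helper, not NAK code: the SM ranges encode only what the log says at this point (CUDA emits `.ci` on Maxwell and Pascal, and the same intent is `.constant` on Volta+); older GPUs return `None` because their behavior isn't settled yet here.

```rust
/// Which SASS load modifier a non-coherent, read-only global load
/// (CUDA `__ldg()` / PTX `ld.global.nc`) is assumed to lower to, by SM version.
fn nc_load_modifier(sm: u32) -> Option<&'static str> {
    match sm {
        50..=69 => Some(".ci"),             // Maxwell and Pascal
        70..=u32::MAX => Some(".constant"), // Volta and newer
        _ => None,                          // Kepler and older: not covered here
    }
}

fn main() {
    for sm in [35, 52, 61, 75, 89] {
        println!("sm_{sm}: {:?}", nc_load_modifier(sm));
    }
}
```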
20:00 mhenning[d]: and we have docs for the first two
20:01 gfxstrand[d]: The CUDA docs imply that `.nc` goes through the texture cache. That would make sense.
20:01 gfxstrand[d]: I wonder if that means we need a texture cache invalidate for them. I've never observed this.
20:03 mhenning[d]: gfxstrand[d]: I think that might depend on the architecture. So maxwell might be different from volta there
20:03 gfxstrand[d]: Yeah, I suspect so
20:04 gfxstrand[d]: We'll see if we have any CTS fails around using `.ci`
20:04 gfxstrand[d]: Okay, added a doc comment
20:04 marysaka[d]: Easy to check: open the PTX 5.0 docs and see how they refer to it
20:06 marysaka[d]: https://docs.nvidia.com/cuda/archive/8.0/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-ld-global-nc
20:06 gfxstrand[d]: > Load register variable d from the location specified by the source address operand a in the global state space, and optionally cache in non-coherent texture cache. Since the cache is non-coherent, the data should be read-only within the kernel's process.
20:07 mhenning[d]: I think the data and texture caches are combined on volta. Per the volta whitepaper: "For example, if shared memory is configured to 64 KB, texture and load/store operations can use the remaining 64 KB of L1." https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
20:09 mhenning[d]: oh maybe on maxwell too? "Maxwell combines the functionality of the L1 and texture caches into a single unit. As with Kepler, global loads in Maxwell are cached in L2 only, unless using the LDG read-only data cache mechanism introduced in Kepler." https://docs.nvidia.com/cuda/pdf/Maxwell_Tuning_Guide.pdf
20:09 HdkR: Yea, maxwell tunables let you fuss with it as well
20:09 HdkR: Maxwell was the first to offer combined I believe.
20:11 mhenning[d]: mhh that feels inconsistent with the ptx docs "On some architectures, the texture cache is larger, has higher bandwidth, and longer latency than the global memory cache. For applications with sufficient parallelism to cover the longer latency, ld.global.nc should offer better performance than ld.global on such architectures." https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-ld-global-nc
20:11 mhenning[d]: unless that note is for kepler or something?
20:13 mhenning[d]: oh, interesting. ptx's ld.global.nc requires sm_32 while __ldg requires sm_50
20:14 mhenning[d]: we should probably figure out what ld.global.nc becomes on kepler 2
20:14 HdkR: Actually, did Maxwell1 have it? Not sure there...
20:19 gfxstrand[d]: Kepler B doesn't appear to have `.ci`, unless it's a different op
20:22 gfxstrand[d]: Okay, there is a .ci
20:22 gfxstrand[d]: Let me hunt for it
20:23 gfxstrand[d]: Okay, how do I PTX to a binary?
20:24 mhenning[d]: `ptxas memaccess.ptx --gpu-name sm_89 -o hi.cubin && nvdisasm -hex hi.cubin`
20:31 mhenning[d]: gfxstrand[d]: okay, I managed to get compiler explorer to use it: https://godbolt.org/z/xjjnG8e3K
20:31 gfxstrand[d]: Same but it won't give me a binary
20:32 mhenning[d]: it generates LDG.E.P, which interestingly needs a TEXDEPBAR
20:32 gfxstrand[d]: And fuzzing the instruction space isn't turning anything up
20:32 gfxstrand[d]: Maybe it's only on ldg and we use ld?
20:33 mhenning[d]: 600210867f9c0801
20:34 gfxstrand[d]: Yeah, that's ldg
20:36 gfxstrand[d]: Yeah, that's a magic tex op alright
20:36 mhenning[d]: Both ld.global and ld (generic) seem to encode to the same instruction on kepler for me
20:37 gfxstrand[d]: with godbolt?
20:37 mhenning[d]: both of which are called `ld`
20:37 mhenning[d]: yes
20:37 gfxstrand[d]: Same bit pattern?
20:37 mhenning[d]: yep
20:37 gfxstrand[d]: Okay, so the fact that they're calling it ldg is a lie
20:37 gfxstrand[d]: it's a texture op
20:37 mhenning[d]: c4800000001c0808 for both
20:37 gfxstrand[d]: Doing a bit scan, it degrades to tex if you whack the wrong bit
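The "bit scan" above amounts to flipping one bit of a known encoding at a time and disassembling each variant to see what changes. A minimal sketch of generating those variants, starting from the Kepler `ldg` encoding mhenning posted earlier (`600210867f9c0801`); feeding them to a disassembler is left out, and no particular disassembler interface is assumed.

```rust
/// Every single-bit variant of a 64-bit instruction word, paired with the
/// bit index that was flipped.
fn single_bit_variants(word: u64) -> Vec<(u32, u64)> {
    (0..64).map(|bit| (bit, word ^ (1u64 << bit))).collect()
}

fn main() {
    let ldg = 0x600210867f9c0801u64; // encoding quoted in the log
    for (bit, w) in single_bit_variants(ldg).iter().take(4) {
        println!("bit {bit:2}: {w:016x}");
    }
}
```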
20:38 mhenning[d]: yeah, it looks like what kepler calls "ldg" is different from what volta calls "ldg"
20:39 mhenning[d]: which actually explains the weird `__ldg` naming in c++
20:39 gfxstrand[d]: And this is something different entirely
20:40 mhenning[d]: I'm guessing kepler's `ldg` is always the texop
20:40 gfxstrand[d]: yeah
20:40 gfxstrand[d]: I wonder if kepler A has anything like this
20:41 mhenning[d]: `ptxas /app/example.ptxas, line 28; error : Instruction 'ld.nc' requires .target sm_32 or higher`
20:42 mhenning[d]: so ptx doesn't expose it until 32
20:47 gfxstrand[d]: In NAK, I think this should probably be its own opcode. It also comes with restrictions like not supporting 64-bit
20:47 mhenning[d]: yeah, that sounds sensible
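A hypothetical sketch of why a separate opcode helps: the texture-path load's restrictions can be checked in one place instead of becoming special cases inside the general load op. The `Op` enum and widths are illustrative, not NAK's real definitions. The log only says it doesn't support "64-bit" without saying whether that means data width or addressing; this sketch assumes data width.

```rust
/// Toy IR ops: an ordinary global load and a dedicated texture-path "ldg".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    LdGlobal { bits: u32 },
    LdgTex { bits: u32 },
}

impl Op {
    /// Reject configurations the (assumed) hardware op can't encode.
    fn validate(&self) -> Result<(), String> {
        let bits = match *self {
            Op::LdGlobal { bits } | Op::LdgTex { bits } => bits,
        };
        if !matches!(bits, 8 | 16 | 32 | 64 | 128) {
            return Err(format!("invalid load width {bits}"));
        }
        // Restriction from the log, interpreted here as data width (assumption).
        if let Op::LdgTex { bits: 64 } = *self {
            return Err("texture-path ldg: 64-bit loads not supported".to_string());
        }
        Ok(())
    }
}

fn main() {
    for op in [Op::LdGlobal { bits: 64 }, Op::LdgTex { bits: 32 }, Op::LdgTex { bits: 64 }] {
        println!("{:?}: {:?}", op, op.validate());
    }
}
```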
20:48 gfxstrand[d]: I'm gonna start taking notes in an issue.
20:48 gfxstrand[d]: I've about got it R/E'd
21:17 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13276
21:20 gfxstrand[d]: And Kepler A and B are both submitted. We'll be conformant in 30 days. 🎉
21:20 gfxstrand[d]: I seriously did not expect Kepler to go this fast.
21:21 gfxstrand[d]: Guess I should probably write another blog post. :bim_giggle:
21:23 gfxstrand[d]: Well, Kepler is passing CTS, codegen has been purged from `src/nouveau`, Maxwell has 4x faster UBOs, and we merged some of mhenning[d]'s improvements. It's been a good week. 💜
21:28 sonicadvance1[d]: Soon all those Tegra X1 users can get away from the proprietary driver 😛
21:28 gfxstrand[d]: I'm headed home. Maxwell will be done CTSing my `ld.ci` MR in 45 minutes. Then I'll kick off a second run with `NVK_DEBUG=no_cbuf` just to be sure.
21:28 gfxstrand[d]: sonicadvance1[d]: That requires me to get my X1 working and land cache flushing first.
21:28 sonicadvance1[d]: Oh naur.
21:29 gfxstrand[d]: marysaka[d]: and I have no idea why mine doesn't like me. It can't see the GPU. Hers boots fine with the same config. I might just have a bad board or a bad compute unit.
21:29 gfxstrand[d]: Anyway, time to head home
21:29 marysaka[d]: gfxstrand[d]: ngl I wouldn't mind swapping board next time we meet
21:30 sonicadvance1[d]: I assume T214 versus T210 boards don’t cause a visible difference?
21:30 marysaka[d]: I have 2 around and both work fine etc.
21:30 marysaka[d]: sonicadvance1[d]: I don't think any Jetson TX1 shipped with T214
21:30 marysaka[d]: I have one EOL board
21:30 marysaka[d]: the other is the regular one with the camera module
21:31 sonicadvance1[d]: Oh, I thought it shipped on the nano or something, probably just misremembered.
21:31 marysaka[d]: oh might have shipped on nano but not certain, my nano fried so uum cannot check
21:31 sonicadvance1[d]: Haha, oops
21:32 gfxstrand[d]: gfxstrand[d]: Actually, we know it's not a bad board because it works fine with the stock ubuntu image from NVIDIA.
21:33 marysaka[d]: yeah no, it could be a different revision of the board (like an older variant, as I have the latest one that was sold during the "EOL sales")
21:33 marysaka[d]: or it could just be difference in fuses
21:34 marysaka[d]: in any cases, would love to dig into that because that's quite suspicious tbh
21:37 marysaka[d]: sonicadvance1[d]: but yeah, I'm certain it's not t210b01/t214, as I asked her to use my automated script that sets it up the same way I do (with an explicit flash of t210ref via L4T scripts)
21:42 i509vcb[d]: sonicadvance1[d]: Just in time for switch 2 release so that people can get T239 running after someone breaks it open
21:44 gfxstrand[d]: That should be fine. It's basically just Orin which is just mobile Ampere. Once the board boots and nouveau brings up the chip, NVK should be fine (modulo the previously mentioned cache flushing patches).
21:45 i509vcb[d]: Has orin kind of been abandoned in nouveau.ko since I last checked on it like 2 years ago?
21:45 marysaka[d]: Yeah, the Orin Nano is a cute target that could hopefully be brought up easily enough; I think the price of those dropped some months ago too
21:46 i509vcb[d]: yeah the dev board is like $250 or something like that now
21:46 gfxstrand[d]: But I do seriously need to get a Tegra board of some form working on my desk so I can sort that all out. Maybe I can find a TK1? 🤪
21:46 marysaka[d]: i509vcb[d]: It's another kind of GSP from what I recall and the openrm module for it is not the same as the one you use on desktop so there is that
21:46 i509vcb[d]: yeah I recall looking through nvgpu and it was gsp related but differed in a few ways
21:48 i509vcb[d]: did nvidia port ga10b over to openrm?
21:48 i509vcb[d]: I thought it was only working under nvgpu for some time
21:49 marysaka[d]: I mean original release of Orin was on openrm for sure
21:49 marysaka[d]: for source code public release I mean
21:50 marysaka[d]: tbh I could donate one of my two TX1 boards, they have mostly been gathering dust/taking up space since I stopped doing Falcon/TSEC shenanigans years ago...
21:50 marysaka[d]: and it's not like I'm lacking t210-based devices... *looks away from her 5 Switches on the shelf*
21:55 sonicadvance1[d]: marysaka[d]: Nice nice.
21:57 avhe[d]: i509vcb[d]: i ordered one in early feb and it still hasn't arrived :))
22:02 esdrastarsis[d]: gfxstrand[d]: What's the next step?
22:14 gfxstrand[d]: esdrastarsis[d]: I switch gears and work on implementing an extension I can't talk about publicly.
22:15 gfxstrand[d]: But it's supposed to be public soon, which is why I can go at it full tilt for a few weeks.
22:22 mhenning[d]: VK_EXT_FIXED_FUNCTION_FOG ? 😛
22:23 gfxstrand[d]: Shhhh! 🤫 You're not supposed to talk about that! That one's not scheduled for way later in the year.
22:31 snowycoder[d]: gfxstrand[d]: thanks for all the weird bugfixes, I would've spent another two months only on those
22:32 gfxstrand[d]: Yeah. With the exception of the one encoding error with pixld, most of them were API-side things that I wouldn't expect someone new(ish) to figure out.
22:32 HdkR: VK_EXT_quads to satisfy those people that are so terribly concerned about cache hitrate along the inner triangle edges :P
22:32 gfxstrand[d]: Like the sampler one? Yeah, that's not something you were going to guess.
22:33 gfxstrand[d]: HdkR: VK_EXT_vertex4f, actually.
22:34 HdkR: Oh snap
22:34 HdkR: Spoilers