08:06leonidsmirnov: I took the pseudo code on 37 41 and 57 (cause on IRC chat 3 member set was easier to follow), the first collision in my large/larger sequence happens at 65 not before than that, this is 5+9+13+17+21=65 , so hence you chunk those indexes at 4bit/powers each and place them to proper indexes (the needed procedures were demonstrated). And then you decode all combinations easily as was before
08:06leonidsmirnov: told indirectly, they start to ripple or self-permute, in the context of multiply instruction that works so: 1) the whole value is decoded to bitfield, then indexes are added to each four bit subfields accordingly, then answer is pinned as queue jumping procedure, once the stuff is being summed together, the answer is selected, as there are many of the same index degree fields available,
08:06leonidsmirnov: the selection is done based of the one that was placed on the hash in compute mode. For data we already described the procedure. Yes that works similar to as how they design viruses to get rid of weak people. So technically there are worms in world like those who got recruited to kill me there in cambodia, they failed many times to do that, and are soon all dead, sense or any strengths
08:06leonidsmirnov: those people never had, only fraud and amoral shit they do.
08:27leonidsmirnov: I do not understand what they try to scam with their reverse psychology type of thing, the strength of human force does not come through consuming steroids, it's a product of birth talent and practice, so the first is a prerequisite to train your muscle and bone and such nucleus, so you get a proper force handling to lift weights, they attack my nucleus in an attempt to reverse things, but
08:27leonidsmirnov: it is not producing the results that they are after, it will only do harm to them and to the rest of the world.
10:31x512[m]:sent a code block: https://matrix.org/oftc/media/v1/media/download/AUl9CFejOnOVylK8Jaaaix-v7GNP7XUgjwEjrKL6DV2MsrXJxpYvbEk-LAL0LvFv-Gu933RXBmr90kBDqDNDr2BCeY4MhB9AAG1hdHJpeC5vcmcvdWlpeW54eXJVZHVnVFVNR3pFYWlqRnd5
10:31x512[m]: What am I doing wrong?
10:41karolherbst: x512[m]: need to run meson subprojects update --reset
10:58x512[m]: OK, latest NVK still works on Haiku.
13:16bylaws[d]: https://redplait.blogspot.com/2025/03/nvidia-sass-disassembler.html?m=1
13:16bylaws[d]: Came across this
13:16bylaws[d]: Kinda neat
13:45snowycoder[d]: bylaws[d]: Oh wow that is a lot of data, there seems to be even scheduling information for instructions
14:00snowycoder[d]: Looking at Kepler where we don't have any scheduling info, I recently found that IADD uses either 9 cycles or 13 when writing the carry.
14:00snowycoder[d]: The files in that repo report that IADD is in the FAU class and:
14:00snowycoder[d]: FAU_OPS`{Rd} : 9
14:00snowycoder[d]: ...
14:00snowycoder[d]: FAU_OPS[?writeCC]`{writeCC} : {9 + 4}
14:01snowycoder[d]: Can we use this? Reverse engineering is always a gray legal area
14:23gfxstrand[d]: I would generally avoid anything that comes from reverse-engineering the driver binaries. That's very clearly against the EULA.
14:24snowycoder[d]: Isn't fuzzing nvdisasm a similar situation though?
14:25gfxstrand[d]: Fuzzing is a very different thing from cracking open the binary
14:29x512[m]: Wine/ReactOS developers claim that clean room reverse engineering is fine.
14:32ermine1716[d]: That's really thin ice
14:36gfxstrand[d]: I'm not a lawyer and I'd really like to not be put in the position of having to hire one. Regardless of whether or not clean-room engineering is legally defensible, if we ever get sued by NVIDIA it's game-over for nouveau. No one has the means+motivation to fight that behemoth.
14:40gfxstrand[d]: Wine/ReactOS are fine, not because everything is legal, but because Microsoft simply doesn't care. "Oh, no! Someone re-implemented Windows XP!" Big meh from them.
14:41ermine1716[d]: And yet they spent a lot of time checking their code when someone said that they have copyrighted stuff...
14:46snowycoder[d]: I would never use it if we don't all agree, but that data would be a goldmine for old chips (Kepler and Maxwell) where Nvidia doesn't provide instruction scheduling tables.
14:46snowycoder[d]: That could probably still be in the "who cares" section (we're talking about 10 year old cards), but the line is really thin.
14:51karolherbst[d]: gfxstrand[d]: they are also fine as their stance is: "if you ever saw MS source code or disassembled binaries: go away"
14:52karolherbst[d]: snowycoder[d]: no
14:53karolherbst[d]: also
14:53karolherbst[d]: those projects also _forbid_ discussing such topics
14:53karolherbst[d]: so if there is a reasonable belief you might have seen those things, you are also out
14:54karolherbst[d]: so strictly speaking, we wouldn't be able to accept any contributions from you anymore as it might be tainted as you've shown
14:54karolherbst[d]: so the question is, if we want to be as strict
14:58snowycoder[d]: I only checked with things we already knew to see if the data was real.
14:58snowycoder[d]: If you still think that might be a problem, I understand, my last desire is to endanger nouveau/NVK.
14:58gfxstrand[d]: I'm not gonna go that far. "I found a website and it looked interesting so I asked" isn't something I'm too worried about.
14:58karolherbst[d]: yeah.. it's always a question how paranoid you want to be
14:58gfxstrand[d]: But also, yes, I'm pretty uninterested in pulling any data from such projects.
14:59karolherbst[d]: the problem with discussing those things is, that it can also taint everybody reading it
14:59snowycoder[d]: I'm really sorry
14:59karolherbst[d]: _some_ companies are really impressively strict with it
14:59karolherbst[d]: not nvidia, but like nintendo e.g.
15:00karolherbst[d]: snowycoder[d]: don't worry, it's just... a lot of those things are just "unwritten rules" so some might just not be aware of all the details
15:01karolherbst[d]: we should probably have a proper document stating the rules more explicitly
15:01karolherbst[d]: (and also figure what those rules are)
15:01karolherbst[d]: *figure out
15:01karolherbst[d]: but the rule of thumb with anything legal is: if you don't know and you aren't a lawyer, you'll lose in court.
15:02karolherbst[d]: "it's probably fine" -> "you'll lose in court"
15:02karolherbst[d]: though
15:02karolherbst[d]: I doubt nvidia cares about old hardware
15:02karolherbst[d]: so it's fine really
15:02karolherbst[d]: just...
15:03karolherbst[d]: it's more about project integrity and having a firm stance on it to protect the reputation
15:03karolherbst[d]: like if nouveau would be known to do all those things, then nvidia would have been less willing to work with us
15:10karolherbst[d]: snowycoder[d]: just to be explicit. We won't reject MRs from you, because like.. as I said, not everybody knows, but we've explained it now, and we should probably have a write up somewhere that's visible enough for everybody to know before hand.
15:52mhenning[d]: snowycoder[d]: Just to expand on this a little, the general assumption is that reverse engineering is fine as long as you treat the proprietary code as a black box. That is, if you have copyright for the compiler's inputs then you also have copyright to the compiler's outputs and so disassembling the output is fine. Disassembling the compiler (or nvdisasm) itself is not fine because nvidia owns
15:52mhenning[d]: that code.
16:13redsheep[d]: mhenning[d]: That gives me kind of a strange idea. Has anyone ever tried bypassing nir and nak entirely by special casing specific inputs to spit out exactly the same thing Nvidia would?
16:13redsheep[d]: Seems like whatever performance that results in would give good confirmation of whether the compiler is the thing that needs more work in order to get nvk up to speed
16:16redsheep[d]: I'm imagining running an entire game on the shaders Nvidia would output. If it's still slow then the kernel or the GPU setup or something like that is the bigger issue
16:18mhenning[d]: The interface between the shaders and the rest of the driver isn't trivial, so doing that might be a lot more complicated than it sounds
16:24gfxstrand[d]: We may have to figure out enough of that interface for basic compute if we want DLSS to work but compute is a lot simpler than 3D in general.
16:25mohamexiety[d]: yeah I guess the idea is to rig in the ability to just consume NVIDIA-produced SASS
16:26mohamexiety[d]: (for DLSS and such. though there will be an issue for the cases where the DLSS version is older than a GPU gen. e.g., say DLSS 3 on Blackwell)
16:41notthatclippy[d]: As a first step, it would probably be very informative to benchmark nvk+nouveau vs nvk+openrm
16:46x512[m]: I do not like AI upscaling idea in general.
16:48mohamexiety[d]: I hate non-AI upscaling a lot more; the quality difference is night and day to say the least
16:49mohamexiety[d]: notthatclippy[d]: Yeah it would be nice to resume the effort on that front
16:49x512[m]: No upscaling is best.
16:51karolherbst[d]: sadly games start to rely on it, because it gets impossible to run those games on budget GPUs
16:51karolherbst[d]: like if you have the choice between "it runs like crap, and low quality" and "it runs decently and even looks fine" then the 2nd one is the better choice even if it's upscaled 😛
16:52x512[m]: I will run in a window, but still 1x scale.
16:52karolherbst[d]: also upscaling is nice for my nintendo switch 😛
16:54x512[m]: WuWa runs well enough on my RTX A2000. RTX A2000 is a budget GPU, I suppose?
16:54chikuwad[d]: I'd rather upscale and have more immersion than play windowed :3
16:54karolherbst[d]: mohamexiety[d]: fyi, it should be doable to support FSR on nvk, just need to add some hacks
16:54mohamexiety[d]: We already do support FSR
16:54chikuwad[d]: up to 3 already works, yeah
16:54karolherbst[d]: right...
16:54karolherbst[d]: I meant 4
16:54chikuwad[d]: s/3/3.1
16:55chikuwad[d]: karolherbst[d]: I heard something about FSR4 on non-AMD hardware being against AMD licenses
16:55x512[m]: DLSS is special hardware, not just shader or CUDA core?
16:55karolherbst[d]: chikuwad[d]: well.. doesn't matter if it still ends up running and nobody complains, does it?
16:55chikuwad[d]: x512[m]: yes, it uses the "tensor" cores for some things
16:55mohamexiety[d]: I don’t really think there’s much of a point to 4 given we can get DLSS but it could be nice I guess
16:55karolherbst[d]: x512[m]: matrix multiplication
16:55chikuwad[d]: karolherbst[d]: I suppose
16:55karolherbst[d]: mohamexiety[d]: I mean.. we already support it in principle
16:56karolherbst[d]: just need to add more matrix types
16:56ermine1716[d]: There was an issue about dlss support in nvk
16:56mohamexiety[d]: karolherbst[d]: Hm fair enough if it’s that simple :thonk:
16:56karolherbst[d]: mohamexiety[d]: well.. it goes a bit out of spec, so to advertise it properly, we need to support int8 C matrices and that's a bit of lowering...
16:56karolherbst[d]: or we just yolo it
16:57karolherbst[d]: probably a couple of days work to get it all running
16:58karolherbst[d]: the issue is, that it sucks 😄
16:59karolherbst[d]: ermine1716[d]: where is the MR tho 🙃
16:59karolherbst[d]: the issue with DLSS is that we get binary blobs, right?
17:00karolherbst[d]: ahh.. it uses CUDA
17:00karolherbst[d]: pain
17:00karolherbst[d]: there is also this "launch this cuda kernel" vulkan extension, but that's still pain
17:02x512[m]: redsheep[d]: "in would give good confirmation of whether the compiler is the thing that needs more work in order to get nvk up to speed"
17:03karolherbst[d]: anyway.. if somebody wants to implement it... I can certainly give pointers.
17:03x512[m]: It is also possible to run NVK on NVRM instead on Nouveau.
17:04x512[m]: It may help to test if Nouveau is doing something wrong.
17:04x512[m]: "launch this cuda kernel" Is it even possible without uvm.ko?
17:04chikuwad[d]: karolherbst[d]: fwiw that just got deprecated in dxvk-nvapi
17:05karolherbst[d]: x512[m]: sure
17:05chikuwad[d]: I think?
17:05chikuwad[d]: or is it just nvapi that stopped using it
17:05chikuwad[d]: uhh
17:05chikuwad[d]: https://github.com/jp7677/dxvk-nvapi/pull/297
17:05karolherbst[d]: pain
17:06chikuwad[d]: ok yeah
17:06chikuwad[d]: it's just nvapi no longer using it
17:06chikuwad[d]: BUT
17:06chikuwad[d]: > this extension was already excluded from newer Winevulkan versions.
17:06karolherbst[d]: supporting FSR4 will be far less work than that tbh
17:12chikuwad[d]: which nv gens support FP8 "natively" though
17:12chikuwad[d]: and which ones would need it to be emulated
17:14karolherbst[d]: dxvk emulates fp8 if not supported
17:36chikuwad[d]: oh yeah the vkd3d-proton env var
17:36chikuwad[d]: config
17:36chikuwad[d]: whatever
17:39ermine1716[d]: karolherbst[d]: Don't look at me like this
17:42chikuwad[d]: chikuwad[d]: DXIL_SPIRV_CONFIG=wmma_rdna3_workaround
19:03mohamexiety[d]: chikuwad[d]: Ada and Blackwell
19:19kar1m0[d]: mohamexiety[d]: I am looking at the power_budget.c file and I am not sure how the nvidia driver on windows is able to make my gpu work at 145w in performance mode while on linux the nvidia driver restricts my gpu to only 80w. I know that this has nothing to do with the nouveau drivers, but I am just wondering how it works on windows and not on linux.
19:19kar1m0[d]: Also how does nvidia exactly set the gpu power limitation in the driver? Was it ever discussed?
19:20kar1m0[d]: like I understand that the driver is pointing to vbios to get the info from it and then it is used in the gpu driver
19:20kar1m0[d]: but still
19:34snowycoder[d]: I tried to put an infinite loop in a vertex shader to debug things, but the kernel driver timed out (printing the stacktrace).
19:34snowycoder[d]: This isn't supposed to happen, right?
19:34snowycoder[d]: log extract: `WARNING: CPU: 16 PID: 63878 at drivers/gpu/drm/nouveau/nvkm/engine/fifo/nv50.c:228 nv50_runl_wait+0xe0/0xf0 [nouveau]`
19:39HdkR: Working as intended
19:39HdkR: Infinite loops are illegal :)
19:40kar1m0[d]: snowycoder[d]: Uhhhh why are you putting it in an infinite loop?
19:40snowycoder[d]: HdkR: yes but, should they block the kernel module and every other program?
19:40snowycoder[d]: The only way to get it unstuck is by unloading the kernel module.
19:40snowycoder[d]: It would be a pretty brutal DOS attack in WebGPU
19:42snowycoder[d]: kar1m0[d]: I wanted to check if a crash happened before or after a certain point of the shader.
19:42snowycoder[d]: That was a stupid idea, now I learned (but it worked!)
19:43kar1m0[d]: snowycoder[d]: You could just put printf no? To check where it stops
19:47esdrastarsis[d]: mohamexiety[d]: Does turing support fp16?
19:47snowycoder[d]: kar1m0[d]: I'm not sure it can be used with amber
19:49kar1m0[d]: snowycoder[d]: Why not?
19:55snowycoder[d]: kar1m0[d]: I never really used amber before, I think the extension is not supported and the `debugPrintfEXT` call gets silently removed
19:56marysaka[d]: esdrastarsis[d]: Yes
19:56esdrastarsis[d]: cool
19:56kar1m0[d]: snowycoder[d]: Aw man
20:30mohamexiety[d]: kar1m0[d]: I am not sure but it looks like a bug tbh
20:31mohamexiety[d]: and might be completely unrelated to the driver, but not sure. if the laptop OEM's software controls the exposed power limit, then it could be that on windows it only exposes 145W when you toggle something and the software isnt on linux or such
20:43gfxstrand[d]: snowycoder[d]: No. They shouldn't. That would be a security nightmare. So we kill any contexts that time out like that.
20:50karolherbst[d]: I mean.. we did get a CVE on it I think 🙃 not 100% tbh
20:51karolherbst[d]: wait we already have like 35 CVEs? impressive
20:53karolherbst[d]: there ya go: https://www.cve.org/CVERecord?id=CVE-2018-3979
20:53karolherbst[d]: `Vendor advised “issues are known already”` I think that was me 🙃
20:56snowycoder[d]: I assume that is a problem with the open firmware?
20:58karolherbst[d]: gpu recovery somewhere
20:59karolherbst[d]: like I dunno.. I find the score of "7.4" a little excessive there
21:17kar1m0[d]: mohamexiety[d]: Well I think it's a mix of proprietary code from windows OEM and Nvidia
21:18kar1m0[d]: Because my hp omen had omen hub on windows
21:18mohamexiety[d]: do you have to do anything special on omen hub to get gpu to 145W or was it just like that ootb? :thonk:
21:18kar1m0[d]: From which I enabled the performance mode which let my gpu go up to 145w
21:18mohamexiety[d]: ah
21:18kar1m0[d]: While on linux I cannot do that
21:18kar1m0[d]: The max I could possibly get was 85w but now it's 80w
21:19mohamexiety[d]: yeeeah I think the trick is figuring out how to do something like that on the linux side. there has to be some way to switch presets
21:19kar1m0[d]: mohamexiety[d]: Because when I tried to do the same through Nvidia-smi it told me that it isn't possible for this gpu or something
21:20kar1m0[d]: Basically an artificial restriction
21:20kar1m0[d]: Which sucks
21:24mohamexiety[d]: kar1m0[d]: it's because of how HP configured the thing sadly. you need to switch to the perf profile on linux somehow. do you know if this https://openomen.github.io/ works for your laptop?
21:58cubanismo[d]: gfxstrand[d]: Corporate gears have finally creaked through a full rotation, and the format modifier patches are posted now.
22:01ristovski[d]: mohamexiety[d]: How is the power profile even set on such laptops, ACPI?
22:01mohamexiety[d]: I genuinely have no clue tbh
22:02mohamexiety[d]: I never looked into it much
22:02mohamexiety[d]: I just know that some of these laptops have configurable power profiles for GPU/CPU which is what made me suggest that this is what’s at play
22:02gfxstrand[d]: cubanismo[d]: Yay! I'll look tomorrow
22:02ristovski[d]: Also OpenOmen seems to have been taken down? I see no releases and no code on the github
22:03ristovski[d]: ah "(In the works)"
22:04mohamexiety[d]: Oh damn I didn’t check and took it for granted :blobcatnotlikethis:
22:04mohamexiety[d]: Last update was in May 2025 at least
22:07ristovski[d]: Btw, have all of nouveau's reverse engineering efforts been in the likes of tracing mmio et al? I assume nvidia would be fine with that as it does not really infringe their terms of use/license, but not sure if anything more.. direct has been done
22:07ristovski[d]: Though with nvidia providing docs/headers, I assume that era of nouveau is sort of ancient history at this point
22:12x512[m]: snowycoder[d]: In theory Nvidia supports preemption at the hardware level, so infinite loops should not be a problem.
22:13x512[m]: It can be Nouveau kernel module bug.
22:14snowycoder[d]: Taking in consideration the CVE above too, it is quite likely
22:15snowycoder[d]: p.s. not long after I must've made some other illegal thing, nvkm decided to hang the whole kernel, requiring a reboot
22:31gfxstrand[d]: x512[m]: Preemption at that granularity is rarely "at the hardware level" and Nvidia has only had mid-shader preemption for 3D since Turing.
22:32snowycoder[d]: gfxstrand[d]: So everything pre-turing is vulnerable to DOS attacks?
22:33gfxstrand[d]: Yup
22:33gfxstrand[d]: And that's why we kill bad contexts
22:36gfxstrand[d]: And I don't know if it's working today even on Turing. Preemption at that granularity often requires userspace to play along
22:37gfxstrand[d]: It turns out preempting 4k threads and a rasterizer is hard, actually.
23:08x512[m]: Each channel can be preempted.
23:08x512[m]: I suppose it should work with official Nvidia kernel driver. Who knows about Nouveau.
23:09x512[m]: I heard that the Linux DRM subsystem has a fundamental design flaw: GPU jobs must not execute for a long time.
23:10x512[m]: DRM kernel fence semantics etc.
23:17cubanismo[d]: In various shadowy backroom conversations (j/k, mostly), most DRM devs have admitted the only reliable forward progress guarantee graphics-based dma_fence objects have is a watchdog/timeout mechanism.
23:19cubanismo[d]: Any hardware that can implement a loop in its shaders and can't preempt at instruction granularity is going to have that limitation AFAIK.
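A minimal sketch of the watchdog/timeout idea described above, in plain Python rather than kernel code. Everything here is hypothetical and only illustrative: `run_with_watchdog`, the `done` event (standing in for a fence signaling) and the `stop` event (standing in for the driver killing a context) are not nouveau or DRM APIs. A real shader cannot cooperatively check a stop flag; the flag only models the context being torn down.

```python
import threading
import time


def run_with_watchdog(job, timeout_s):
    """Run job(stop) in a thread; return "signaled" or "killed".

    Hypothetical model: `done` plays the role of a fence signaling,
    `stop` plays the role of the driver killing the context.
    """
    stop = threading.Event()
    done = threading.Event()

    def worker():
        job(stop)
        if not stop.is_set():
            done.set()  # job finished by itself: the "fence" signals

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    if done.wait(timeout_s):
        return "signaled"
    stop.set()  # watchdog fired: kill the offending context
    t.join()
    return "killed"


def quick_job(stop):
    """A job that completes immediately."""
    return None


def looping_job(stop):
    """Models an infinite loop that only ends when the context is killed."""
    while not stop.is_set():
        time.sleep(0.001)


if __name__ == "__main__":
    print(run_with_watchdog(quick_job, 1.0))
    print(run_with_watchdog(looping_job, 0.05))
```

The point of the sketch is that the waiter never depends on the job being well behaved: forward progress comes from the timeout, not from the job.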
23:19x512[m]: No need to have forward progress guarantee if preemption is working.
23:19x512[m]: A hung program will not affect others.
23:19cubanismo[d]: Right. Big if.
23:20x512[m]: I want Vulkan extension to control preemption (force execution halt without device lost, set execution timeout etc.).
23:21cubanismo[d]: IIRC, the windows equivalent thing (TDR) has a very aggressive timeout.
23:37x512[m]: RADV recently added userland fence and ring buffer feature...
23:37x512[m]: They somewhat managed to integrate it with DMA fence.
23:40airlied[d]: pretty sure we modeled our timeout on Windows TDR
23:42gfxstrand[d]: x512[m]: Sort of. IIRC it's only allowed for compute-only jobs which will never be mixed with graphics and where the only thing that's ever waiting on it is something else that uses userspace fences or a userspace CPU wait.
23:43airlied[d]: no, latest AMD has userland command submission for GFX, but I'm not across how they solved things or if they have
23:43gfxstrand[d]: I doubt they have
23:43airlied[d]: I think it's more of a build it and solve as we go type of problem
23:44airlied[d]: I expect for nova we should just go screw it and build userspace command submission and work out wtf to do with memory eviction
23:44x512[m]: How will it integrate with DMA fence?
23:45x512[m]: Wayland stuff etc. currently depends on it.
23:45gfxstrand[d]: airlied[d]: That's not too big of a problem. You can keep memory management separate and basically only have preempt fences.
23:45x512[m]: New Wayland protocol for userland fence?
23:45gfxstrand[d]: x512[m]: At the moment? YOLO.
23:46gfxstrand[d]: Long term, there's a few of us who have decent ideas but nothing written down or implemented yet.
23:47airlied[d]: it appears it still enters the kernel for signal/wait
23:49x512[m]: It is possible to make "userland timeline futex". 64 bit value in memory that can be waited/signaled both by CPU and hardware.
23:49airlied[d]: possible is one thing, useful is another
23:52x512[m]: GPU can write to the timeline futex and send an interrupt, and then kernel code will recheck the userland timeline futex waiter queue.
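A rough CPU-only sketch of the "timeline futex" semantics described above: a monotonically increasing value that any party can signal and that waiters block on until it reaches their target. The class name `TimelineFence` and its methods are assumptions made for illustration, not an existing kernel or Vulkan interface. A `threading.Condition` stands in for the shared memory word plus the interrupt-driven recheck of the waiter queue.

```python
import threading


class TimelineFence:
    """Hypothetical model of a timeline value shared by signalers and waiters.

    A real implementation would be a 64-bit word in memory written by the
    CPU or the GPU; Python ints are unbounded, so no wraparound is modeled.
    """

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    @property
    def value(self):
        with self._cond:
            return self._value

    def signal(self, value):
        """Advance the timeline; it never moves backwards."""
        with self._cond:
            if value > self._value:
                self._value = value
                # Stands in for "send interrupt, kernel rechecks waiters".
                self._cond.notify_all()

    def wait(self, target, timeout=None):
        """Block until value >= target; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= target, timeout)


if __name__ == "__main__":
    fence = TimelineFence()
    signaler = threading.Thread(target=fence.signal, args=(3,))
    signaler.start()
    print(fence.wait(2, timeout=2.0))  # waits for point 2, satisfied by 3
    signaler.join()
```

Because a waiter only compares against a target point, signaling a later point also releases everyone waiting on earlier ones, which is the property that makes a single timeline value usable by many waiters.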
23:52airlied[d]: if I was starting again from just the newest hw to support, we would of course be able to make vastly different decisions
23:53airlied[d]: if you have preemptability in a timely manner it would be workable
23:53airlied[d]: if we had page faults it would be easier again
23:53airlied[d]: but we barely have page fault and preemptability kinda still sucks
23:54airlied[d]: in the end any cross-vendor fencing mechanism will need compromises
23:55airlied[d]: and don't get me started on video codec hw with user queues
23:59x512[m]: airlied[d]: Latest AMD GPUs still have only global GFX ring buffers?