00:08 gfxstrand[d]: kk
00:09 gfxstrand[d]: But yeah, if you see bugs in sm20 which I copied from sm32, just fix them.
00:30 gfxstrand[d]: snowycoder[d]: FYI: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35149
00:42 gfxstrand[d]: I can't wait until we can delete all the `nvk_use_nak()` checks.
05:46 mangodev[d]: discord is segfaulting extremely often now
05:46 mangodev[d]: and it says *this* in `journalctl`…?
05:46 mangodev[d]: `u_blitter:548: Caught recursion. This is a driver bug.`
05:47 mangodev[d]: as well as `[2103:0525/003616.914757:ERROR:gpu_process_host.cc(953)] GPU process exited unexpectedly: exit_code=139`
05:57 kar1m0[d]: Haven't had issues in apps so far with the drivers
05:57 kar1m0[d]: mangodev[d]: Are you using gpu acceleration?
05:57 kar1m0[d]: Might be the issue
06:00 mangodev[d]: kar1m0[d]: yes, same with firefox
06:00 mangodev[d]: it runs smoothly until it randomly segfaults :/
06:01 mangodev[d]: seems to happen randomly
06:01 mangodev[d]: at least discord catches itself though
06:01 mangodev[d]: strange thing? discord is the only chromium application that segfaults
06:01 mangodev[d]: everything else works buttery smooth
06:02 mangodev[d]: with the exact same flags
06:03 kar1m0[d]: What gpu and distro
06:03 kar1m0[d]: Do you use
06:03 mangodev[d]: kar1m0[d]: gtx 1660 super, endeavouros
06:04 mangodev[d]: i'm using git mesa, not mesa from `extra/`
06:04 kar1m0[d]: mangodev[d]: Is it newer?
06:04 kar1m0[d]: What is the difference
06:05 mangodev[d]: kar1m0[d]: far newer
06:05 kar1m0[d]: mangodev[d]: Turing
06:06 kar1m0[d]: Not sure if they worked on Turing
06:06 kar1m0[d]: They are still on kepler a
06:06 mangodev[d]: `extra/` is 24.1.1 iirc, i'm on 24.2.0-devel
06:06 kar1m0[d]: And I have a kepler b and the drivers don't work
06:06 mangodev[d]: kar1m0[d]: …?
06:06 kar1m0[d]: mangodev[d]: Gpu architecture
06:06 mangodev[d]: i know
06:07 kar1m0[d]: Turing is newer than kepler
06:07 mangodev[d]: i also know
06:07 kar1m0[d]: So the development is a bit behind
06:07 mangodev[d]: …you know NVK *started* on turing, right?
06:07 kar1m0[d]: Heard of it yes
06:07 mangodev[d]: it has been working *backwards* (and slightly forwards) since
06:08 mangodev[d]: https://www.collabora.com/news-and-blog/news-and-events/nvk-enabled-for-maxwell,-pascal,-and-volta-gpus.html
06:08 mangodev[d]: > When we started work on NVK, we focused on Turing (GTX 16xx and RTX 2xxx series GPUs) and later. This was because those were the only GPUs with GSP support, enabling proper re-clocking and decent performance.
06:08 kar1m0[d]: mangodev[d]: So did you compile the drivers?
06:09 kar1m0[d]: I don't see any other way of getting the devel drivers without extra/
06:09 mangodev[d]: kar1m0[d]: yes, i compile them every time there's something relevant for my hardware merged
06:09 mangodev[d]: kar1m0[d]: `extra/` isn't devel, it's stable
06:09 kar1m0[d]: My cpu will not survive compilation
06:09 kar1m0[d]: mangodev[d]: I know
06:10 kar1m0[d]: snowycoder[d]: I have the stable version of the drivers, will the newer version than extra/ help? The git one
06:10 kar1m0[d]: If I compile it
06:10 kar1m0[d]: For Kepler B
06:13 mangodev[d]: kar1m0[d]: master branch has the commits required for kepler b, i'd *highly* recommend using it
06:13 mangodev[d]: 24.1.1 is too old for kepler support
07:56 kar1m0[d]: mangodev[d]: I'll compile it when I get home then
07:56 kar1m0[d]: Not sure how long it will take
07:56 kar1m0[d]: Is there any software I can use to get logs out of the drivers? mangodev[d]
07:57 kar1m0[d]: I want to test obs
09:17 magic_rb[d]: A fellow nixos user updated their kernel from 6.12.29 to 6.12.30 and lost external HDMI output. They're on a 1660 ti laptop with an intel igpu. Does this sound like a kernel bug or a mesa bug? Because they also bumped mesa from 25.0.6 to 25.1.1
09:17 magic_rb[d]: Wondering where we should open a bug report and/or bisect
09:19 kar1m0[d]: magic_rb[d]: Are there any logs you can check?
09:19 kar1m0[d]: Because it kind of can be both
09:20 karolherbst[d]: magic_rb[d]: kernel
09:20 karolherbst[d]: I'd try rebooting 🙃
09:20 karolherbst[d]: laptops are weird
09:20 kar1m0[d]: I use mesa 25.1.1 but I have an issue with nouveau since kepler b isn't fully supported yet
09:20 karolherbst[d]: I'd usually suggest figuring out on which GPU all the ports are
09:20 karolherbst[d]: HDMI might be on the dGPU
09:20 kar1m0[d]: karolherbst[d]: Because they use integrated and discrete graphics at the same time usually
09:21 karolherbst[d]: USB-C -> DP might be on the dGPU or iGPU
09:21 karolherbst[d]: on some laptops one is on the dGPU, the other on the iGPU
09:21 kar1m0[d]: karolherbst[d]: He said hdmi so I think it's the hdmi port
09:21 kar1m0[d]: Which usually is the dgpu output
09:21 karolherbst[d]: yeah, but who knows what GPU is used for it
09:21 karolherbst[d]: kar1m0[d]: usually, but not always
09:22 karolherbst[d]: "usually" means nothing if you track down bugs 😛
09:22 kar1m0[d]: Is it possible to check it? I know that nvidia drivers use the dgpu for sure
09:22 kar1m0[d]: karolherbst[d]: True
09:23 karolherbst[d]: uhm... yeah
09:23 kar1m0[d]: Might check the hdmi output when I get home to see if it's the drivers
09:23 kar1m0[d]: Though it might just default to the input
09:23 karolherbst[d]: `grep . /sys/class/drm/card*/status`
09:23 karolherbst[d]: might need to figure out what's card0/card1
09:23 karolherbst[d]: but...
09:24 karolherbst[d]: usually the one with eDP is the iGPU
09:24 karolherbst[d]: however
09:24 karolherbst[d]: there are laptops with eDP on both GPUs these days
09:24 karolherbst[d]: `ls -l /dev/dri/by-path/`
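The two commands above can be combined into one listing that shows, for each connector, its status and the PCI device behind its card. This is a minimal sketch, assuming the usual sysfs layout where `/sys/class/drm/cardN/device` is a symlink to the owning PCI device; the function name `list_connectors` is made up for illustration.

```python
from pathlib import Path


def list_connectors(sysfs_drm="/sys/class/drm"):
    """Return (connector, status, pci_device) tuples for every DRM connector."""
    rows = []
    for status_file in sorted(Path(sysfs_drm).glob("card*-*/status")):
        connector = status_file.parent
        # "card1-HDMI-A-1" -> "card1"
        card = connector.name.split("-", 1)[0]
        # cardN/device is a symlink to the PCI device that owns the card.
        device = (connector.parent / card / "device").resolve().name
        rows.append((connector.name, status_file.read_text().strip(), device))
    return rows


if __name__ == "__main__":
    for name, status, device in list_connectors():
        print(f"{name:24} {status:14} {device}")
```

The PCI address in the last column can then be matched against `lspci` output to see whether the HDMI connector sits on the iGPU or the dGPU.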
09:25 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1376128989952737312/20250524_213721.jpg?ex=6834337e&is=6832e1fe&hm=6cf24790b17b024fdb841cae15015aa66e54a02c4036a8bf43b09958efe8f041&
09:25 kar1m0[d]: karolherbst[d]: Since yesterday I checked and the drivers don't work at all in games for me
09:25 kar1m0[d]: It may be running on the dgpu, but the CPU usage is too high for that
09:25 kar1m0[d]: Are there any logs I can check from?
09:26 karolherbst[d]: mhh?
09:26 karolherbst[d]: the CPU usage will be higher when the game runs on your dGPU
09:27 kar1m0[d]: karolherbst[d]: I didn't get any vulkan shaders compilation even when I added `NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=1`
09:27 kar1m0[d]: In steam
09:27 karolherbst[d]: kar1m0[d]: it has nothing to do with shader compilations
09:28 kar1m0[d]: karolherbst[d]: Snowy told me to add it
09:28 karolherbst[d]: sure
09:28 karolherbst[d]: but high CPU usage doesn't have much to do with shader compilations
09:28 karolherbst[d]: most games aren't silly enough to compile shaders all the time
09:29 karolherbst[d]: you probably want to throw it into a CPU profiler and see where it spends most of the CPU time
09:29 kar1m0[d]: I am just saying it because on nvidia drivers it always compiles the shaders
09:29 kar1m0[d]: karolherbst[d]: How do I do that
09:29 karolherbst[d]: kar1m0[d]: yeah, but you can't make that conclusion
09:29 kar1m0[d]: Sorry I am pretty new not sure how to check those things
09:30 kar1m0[d]: My knowledge is small in terms of checking stuff related to drivers
09:31 karolherbst[d]: kar1m0[d]: good question... normally there are various cli and gui tools, not really sure what integrates well enough with steam
09:32 kar1m0[d]: karolherbst[d]: Mine is geforce 920m
09:32 karolherbst[d]: uhhhhh
09:32 kar1m0[d]: Which is kepler b
09:32 kar1m0[d]: The dgpu
09:33 tiredchiku[d]: mind you
09:33 tiredchiku[d]: karol
09:33 karolherbst[d]: 920m is so slow, it might not even be faster than the igpu
09:33 tiredchiku[d]: they're also trying to run the enhanced edition of witcher 3
09:33 karolherbst[d]: yeah...
09:33 tiredchiku[d]: which is dx12
09:33 karolherbst[d]: I mean...
09:33 kar1m0[d]: tiredchiku[d]: I pit dx11
09:33 karolherbst[d]: didn't know it's a 920m 😄
09:33 karolherbst[d]: that's going to suck either way
09:34 kar1m0[d]: karolherbst[d]: I enjoyed my 15fps of far cry 4 back in the day
09:34 tiredchiku[d]: tbf 920m should be able to do witcher 3, they're both from the same year
09:34 karolherbst[d]: mhhh
09:34 tiredchiku[d]: march 2015 for the 920m
09:34 karolherbst[d]: yeah in theory if you turn down the resolution enough
09:34 kar1m0[d]: tiredchiku[d]: It was able to run the old version on Windows 10 at 30fps on minimal settings
09:34 karolherbst[d]: the 920 is just slow
09:35 tiredchiku[d]: may 2015 for the witcher 3
09:35 karolherbst[d]: even for 2015
09:35 karolherbst[d]: yeah..
09:35 karolherbst[d]: the enhanced edition might be too much
09:36 kar1m0[d]: I will try to run some other games
09:36 kar1m0[d]: Less demanding
09:36 tiredchiku[d]: do you have it on gog or steam?
09:36 kar1m0[d]: tiredchiku[d]: Steam
09:36 snowycoder[d]: gfxstrand[d]: Thanks! I'll rebase the two MRs (also, I might need a new one for header fixes and RAM advertisements that don't fit with the other two)
09:37 karolherbst[d]: but yeah..
09:37 tiredchiku[d]: kar1m0[d]: right click > properties > betas
09:37 karolherbst[d]: gpu0 _might_ be the iGPU, but if you have a high CPU usage it's really difficult to do much about it
09:37 tiredchiku[d]: and select `classic`
09:37 kar1m0[d]: karolherbst[d]: I mean I have an rtx 4080 laptop now but the new nvidia drivers broke everything
09:37 tiredchiku[d]: it'll switch to the previous version of the game (not the enhanced edition)
09:37 karolherbst[d]: kar1m0[d]: try nouveau on it then 😄
09:38 magic_rb[d]: * did reboot multiple times. Tried two cables and two monitors, assuming the display was at fault at first. No luck.
09:38 magic_rb[d]: * hdmi is connected to the nvidia gpu (i know this from previous debugging for other issues with the same hardware)
09:38 kar1m0[d]: karolherbst[d]: Will games work on it even?
09:38 tiredchiku[d]: kar1m0[d]: manjaro again?
09:38 karolherbst[d]: kar1m0[d]: sure
09:38 kar1m0[d]: tiredchiku[d]: No garuda
09:38 karolherbst[d]: maybe not out of the box 🙃
09:38 tiredchiku[d]: weird
09:38 tiredchiku[d]: what driver package?
09:38 karolherbst[d]: dunno what distro is shipping competent enough nvk these days
09:38 magic_rb[d]: Looking at the kernel log, I get a spam of `Mai 25 11:35:10 grimm-nixos-ssd-2 kmscon[1875]: [0012.490436] ERROR: drm_shared: cannot set DRM-master (uterm_drm_video_wake_up() in ../src/uterm_drm_shared.c:758)`
09:38 kar1m0[d]: tiredchiku[d]: 570.153 the new one
09:38 tiredchiku[d]: and describe how it breaks
09:38 kar1m0[d]: Nvidia smi doesn't work
09:38 kar1m0[d]: And my gpu isn't working in games
09:39 kar1m0[d]: It defaults to the igpu
09:39 kar1m0[d]: And I can tell the difference
09:39 kar1m0[d]: Trust me
09:39 tiredchiku[d]: :doomthink:
09:39 tiredchiku[d]: is it using nvidia-dkms
09:39 tiredchiku[d]: I heard there were build failures on kernel 6.15
09:39 tiredchiku[d]: might be worth switching to the LTS kernel with `nvidia-lts` on it
09:39 tiredchiku[d]: for a while
09:39 kar1m0[d]: tiredchiku[d]: Yup but when I updated it should have recompiled the dkms but it just didn't do it nor did it give me any errors
09:40 tiredchiku[d]: tiredchiku[d]: worth a shot
09:40 kar1m0[d]: I probably will wait until it is patched
09:40 kar1m0[d]: tiredchiku[d]: I'll try
09:40 tiredchiku[d]: hang on let me update my system and see if it also breaks :BlobHajMlem:
09:41 kar1m0[d]: karolherbst[d]: I mean it will work but the question is how well it will work
09:41 tiredchiku[d]: also if it's using `nvidia-dkms`
09:41 tiredchiku[d]: you should switch to `nvidia-open-dkms`
09:41 kar1m0[d]: I play very dgpu demanding games
09:41 tiredchiku[d]: that's the recommended kernel driver now
09:41 tiredchiku[d]: tiredchiku[d]: so try this before you try switching to LTS kernel
09:42 kar1m0[d]: tiredchiku[d]: Garuda defaults to nvidia dkms and last time I tried switching drivers the system refused to fix the nvidia dkms
09:42 kar1m0[d]: Because garuda heavily relies on nvidia dkms
09:42 tiredchiku[d]: you can swap it for nvidia-open-dkms
09:42 tiredchiku[d]: no problem there
09:42 tiredchiku[d]: they're both provided by nvidia
09:42 tiredchiku[d]: and nvidia-open-dkms is better supported
09:42 kar1m0[d]: I'll try
09:43 kar1m0[d]: Hopefully I will not break anything
09:43 kar1m0[d]: tiredchiku[d]: What distro?
09:43 tiredchiku[d]: arch
09:43 kar1m0[d]: Hm
09:43 kar1m0[d]: Try to update
09:43 tiredchiku[d]: yeah it's building AUR packages
09:44 magic_rb[d]: magic_rb[d]: Ah it was kmscon
09:45 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1376134120157216860/image.png?ex=68343845&is=6832e6c5&hm=46853330215b48b061957f6abc001bb39aebf9c97cfc9a1b8a6c7a8732397bb2&
09:45 tiredchiku[d]: works on my 3070, nvidia-open
09:46 kar1m0[d]: tiredchiku[d]: Alright I will try to update then
09:46 tiredchiku[d]: switch to nvidia-open-dkms :bunnyo
09:46 kar1m0[d]: Also funny thing is that nvidia drivers on linux restrict the dgpu to 80w
09:46 kar1m0[d]: I can't use my dgpu at full power
09:46 kar1m0[d]: Which is 145w
09:46 kar1m0[d]: Nvidia...
09:47 kar1m0[d]: tiredchiku[d]: Yeah sure
09:47 magic_rb[d]: kar1m0[d]: Uh you need the dynamic power smth
09:47 kar1m0[d]: magic_rb[d]: The what
09:48 magic_rb[d]: Dynamic boost
09:48 kar1m0[d]: This tells me nothing
09:48 magic_rb[d]: > Whether to enable dynamic Boost balances power between the CPU and the GPU for improved performance on supported laptops using the nvidia-powerd daemon. For more information, see the NVIDIA docs, on Chapter 23. Dynamic Boost on Linux.
09:48 magic_rb[d]: > From nixos docs
09:48 kar1m0[d]: magic_rb[d]: I don't use nixos
09:49 magic_rb[d]: Doesnt matter
09:49 magic_rb[d]: I just dont have different docs
09:49 tiredchiku[d]: `systemctl enable --now nvidia-powerd`
09:49 kar1m0[d]: Alright I will look into it when I have more time
09:49 kar1m0[d]: Bye!
09:49 magic_rb[d]: Bye
10:28 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1376144914974244984/image.png?ex=68344253&is=6832f0d3&hm=8dbbb8e32d33ec34132376f23362cfad79c0d37ab825c6351d41465d7e129960&
10:28 kar1m0[d]: tiredchiku[d]: updating fixed the issue
10:57 snowycoder[d]: gfxstrand[d]: Other various Kepler fixes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35151
10:57 snowycoder[d]: With these and the other branches, there are only ~70 test failures (+ srgb) on sm32.
11:06 kar1m0[d]: snowycoder[d]: can I somehow pull the fixes and test them?
11:06 kar1m0[d]: or will it only be included in the next update of the drivers?
11:19 kar1m0[d]: tiredchiku[d]: sorry for the ping but does clinfo work on nouveau
11:19 kar1m0[d]: or is it an nvidia drivers only thing
11:19 tiredchiku[d]: it works on anything that supports OpenCL
11:19 kar1m0[d]: I will compile the newer version of the drivers
11:20 kar1m0[d]: and maybe send the clinfo here if it will help somehow
11:20 kar1m0[d]: not sure if it will though
11:35 snowycoder[d]: kar1m0[d]: You can compile a custom branch if you want
11:41 kar1m0[d]: snowycoder[d]: no idea how to do it
11:42 kar1m0[d]: do I just git clone it?
11:42 kar1m0[d]: and then compile
14:32 senseinakato: OK, I propose the first algorithm for data access now. 41+118-100(-34-68+2)=59 41+118-200(-136-68+4)=-41 the buddies are chosen from 136-68+4-118=82 at compile time, so in place you extract all 59s and 41s and add them at runtime for later use, now you do 41+118−114=45+107=152 and 193-152=41 where on duplicates you do 41+118-100=59 and 114-59=55, so now you subtract from 41+59, where as
14:32 senseinakato: the ones you need not you eliminate by subtracting -100, and hence the focal value is 100-41-55=4+referecepoint what was 114=118 which was biggest of the two buddies. Needs more testing, but so far works here.
14:36 snowycoder[d]: kar1m0[d]: Yes, I have a branch with all the patches that are floating around if you want.
14:36 snowycoder[d]: For compilation you need to follow this: https://docs.mesa3d.org/install.html
14:37 kar1m0[d]: snowycoder[d]: Sure I would like that
14:40 snowycoder[d]: kar1m0[d]: Here: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/commits/nak_sm32_all_the_fixes
14:40 snowycoder[d]: Last commit removes the need to use `NVK_I_WANT_A_BROKEN_VULKAN_DRIVER` and `NVK_USE_NAK`, but really don't expect much, the new compiler is awesome but without instruction scheduling it can't do much
14:41 kar1m0[d]: snowycoder[d]: Thanks
14:41 snowycoder[d]: No problem!
14:44 asdqueerfromeu[d]: snowycoder[d]: And/or advanced (that may be the A in Nouveau Advanced Kompiler)
14:48 senseinakato: snowycoder[d]: you say ISA compiler works you mean? Which is actually quite fantastic the way i see.
14:50 mohamexiety[d]: gfxstrand[d]: what would happen if I create a sparse VkImage with a format, and then create a VkImageView of that image with a different format? I am not really picking up on any leads with the weirdness with those test fails but one thing stands out is that there's a potential room where the new format could need different sparse block sizes?
14:50 mohamexiety[d]: but that would lead to a garbled output, not just a black image so I am not sure
14:54 senseinakato: Technically nothing from instruction scheduling is needed here, only some instructions and binary containers etc. Cause actually it would be best to modernize the stack, but i have said that this work i no longer share in public, but you are free of charge to do it on your own, cause i split the ways from here. I contributed a lot to open source from my labs, and got only hate. Vide
14:54 senseinakato: encoding to opencl stacks and drivers of inputs, i am sure i am better on my own, but i just showed what is going on in my strategy that people would not come to charge me for doing what i do, like maybe in future earning a bit more, cause i implement better strategy.
15:10 gfxstrand[d]: mohamexiety[d]: It shouldn't. The new format has to have the same number of bits per pixel and that's the thing that goes into determining the sparse block size.
15:11 mohamexiety[d]: yeah if the bits per pixel are the same then that part is fine
15:11 mohamexiety[d]: not sure what else could go wrong though :thonk:
15:13 snowycoder[d]: senseinakato: It is fantastic, but for Kepler every instruction waits 32 cycles for now.
15:13 snowycoder[d]: I don't think the compiler can magically speed that up😂
15:13 snowycoder[d]: Nothing fundamentally broken, I just need to figure out instruction scheduling and I have no idea where to begin
15:28 gfxstrand[d]: We'll get there
15:28 gfxstrand[d]: I think Fermi scheduling is easier
15:28 gfxstrand[d]: Kepler, rather
15:33 snowycoder[d]: I got a GTX 770 to test on :CoolShiba:
15:35 mohamexiety[d]: hm, looking closer at the test list, it seems that unorm <-> snorm (so float formats) of the same size pass, but anything else fails. so e.g. `'dEQP-VK.sparse_resources.image_sparse_residency.mutable.2d.r16_uint_r16_unorm_r16_snorm'` passes, but `'dEQP-VK.sparse_resources.image_sparse_residency.mutable.2d.r16_uint_r16_unorm_r8g8_unorm'` fails. same for conversion between float and int, that also fails
16:30 gfxstrand[d]: weird
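To illustrate the point about bits per pixel: the Vulkan spec's standard sparse image block shapes for 2D single-sample images depend only on the texel size, so the formats in the test names above that share a size also share a block shape. The sketch below is a plain lookup table, not NVK code. It assumes the driver uses the standard block shapes, and the format-to-bits table is filled in by hand for the formats mentioned in this log.

```python
# Standard sparse image block shapes (width, height, depth) for 2D,
# single-sample images, keyed by texel size in bits. Taken from the Vulkan
# spec's "Standard Sparse Image Block Shapes" table; each block is 64 KiB.
STANDARD_2D_BLOCK = {
    8: (256, 256, 1),
    16: (256, 128, 1),
    32: (128, 128, 1),
    64: (128, 64, 1),
    128: (64, 64, 1),
}

# Texel sizes of the formats that appear in the test names in this log.
FORMAT_BITS = {
    "r16_uint": 16,
    "r16_unorm": 16,
    "r16_snorm": 16,
    "r8g8_unorm": 16,
    "r32_uint": 32,
    "r8g8b8a8_srgb": 32,
}


def sparse_block(fmt):
    """Standard 2D sparse block shape for a format name from FORMAT_BITS."""
    return STANDARD_2D_BLOCK[FORMAT_BITS[fmt]]


if __name__ == "__main__":
    for fmt in FORMAT_BITS:
        print(fmt, sparse_block(fmt))
```

Under that assumption, `r16_uint` and `r8g8_unorm` get the same block shape, which matches the expectation that the block size is not what differs between the passing and failing cases.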
16:30 kar1m0[d]: snowycoder[d]: about the last commit, I struggle to find the directory of those files, can you tell me where it should be?
16:30 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1376236111554416651/2025-05-25_19-30-41.png?ex=68349742&is=683345c2&hm=e60482b51fb98d33b84cf36c471f32473cee4e29f02005e6118165ea60709463&
16:31 kar1m0[d]: I have no idea where to put it
16:31 snowycoder[d]: It should already be applied don't worry, you just need to compile it and it should work (I hope)
16:31 kar1m0[d]: I didn't compile it yet
16:32 kar1m0[d]: I installed the patch tho
16:33 snowycoder[d]: Ah sorry I got the link wrong, this is the branch: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/tree/nak_sm32_all_the_fixes?ref_type=heads
16:33 snowycoder[d]: There are a lot of commits on top of main, i suggest you git clone it separately (or add it as a remote)
16:37 kar1m0[d]: snowycoder[d]: oh yeah now I can git clone it
16:37 kar1m0[d]: thanks
16:49 kar1m0[d]: snowycoder[d]: where the hell is mesa directory
16:49 kar1m0[d]: I git cloned to it
16:50 kar1m0[d]: can't find it
16:51 kar1m0[d]: nvm found it
16:52 karolherbst[d]: gfxstrand[d]: ..... it's not for other reasons
16:52 karolherbst[d]: well
16:53 karolherbst[d]: fermi is easy, because the dual issue thing only matters on kepler really
16:53 karolherbst[d]: and it's a pain, because you need to schedule instructions of different classes to get dual issuing going or something
16:54 karolherbst[d]: and you can get 3 dual issues per block of 7
16:54 karolherbst[d]: and you can't dual issue 2 our of 3 instructions
16:54 karolherbst[d]: *out
16:54 karolherbst[d]: or uhm.. I mean 3
16:54 karolherbst[d]: must be 2 out of 4
16:54 karolherbst[d]: or uhm.. 4.. depending on how you look at it
16:55 karolherbst[d]: the issue is... we don't know for sure what the opcode classes are
16:55 karolherbst[d]: maybe we can ask nvidia 😄 lol
16:57 karolherbst[d]: anyway.. that makes placing instructions annoying, because you need to take into account this block layout for optimal dual issuing
16:58 kar1m0[d]: karolherbst[d]: and they will gladly tell us yes 🤣
16:58 karolherbst[d]: well.. why not
16:58 karolherbst[d]: we have that info for more modern gpus 🙃
16:59 mangodev[d]: good morning
16:59 mangodev[d]: how's the kepler stuff going?
16:59 karolherbst[d]: the question is rather, do we get somebody at nvidia to care about kepler lol
16:59 karolherbst[d]: though I think if it's a very specific question about dual issuing it might be possible, but...
17:00 mhenning[d]: I think we should just not worry about dual issue for now
17:00 karolherbst[d]: how is latency stuff working out pre volta anyway? hopes and prayers that the numbers are right?
17:00 mhenning[d]: get some basic scheduling working for kepler first
17:01 karolherbst[d]: ohh yeah.. ignoring dual issue is a good idea, it just matters for perf 😢
17:01 karolherbst[d]: and it's annoying
17:01 karolherbst[d]: I've written a post RA scheduling pass for codegen at some point
17:01 mhenning[d]: karolherbst[d]: Yes, the pre-volta stuff is just using somewhat arbitrary guesses
17:01 karolherbst[d]: which gave decent results, not sure why I never landed it.
17:02 karolherbst[d]: if the IR is in a state where no instruction gets added anymore, you can still swap instructions around as long as you take care about registers and all that, so maybe that's the way to go
17:02 karolherbst[d]: because doing that before RA, especially ssa reg alloc, sounds like a massive pita
17:03 mhenning[d]: yeah, nak already has a post-ra scheduler
17:03 gfxstrand[d]: I kinda copied from codegen badly and hacked until it worked.
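The "swap instructions around as long as you take care about registers" idea can be sketched as a hazard check between two neighbouring instructions after register allocation. This is a toy model, not NAK's or codegen's scheduler: the `Instr` type is hypothetical, and it only looks at register reads and writes, ignoring memory ordering, predicates, barriers and control flow, all of which a real pass has to respect.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instr:
    """Hypothetical post-RA instruction: an opcode plus the physical
    registers it writes (dsts) and reads (srcs)."""
    op: str
    dsts: frozenset = field(default_factory=frozenset)
    srcs: frozenset = field(default_factory=frozenset)


def can_swap(first, second):
    """True if two adjacent instructions have no register hazard between them."""
    raw = first.dsts & second.srcs  # second reads what first writes
    war = first.srcs & second.dsts  # second overwrites what first reads
    waw = first.dsts & second.dsts  # both write the same register
    return not (raw or war or waw)


if __name__ == "__main__":
    fadd = Instr("fadd", frozenset({"r0"}), frozenset({"r1", "r2"}))
    fmul = Instr("fmul", frozenset({"r3"}), frozenset({"r0", "r4"}))
    iadd = Instr("iadd", frozenset({"r5"}), frozenset({"r6"}))
    print(can_swap(fadd, fmul))  # fmul reads r0, which fadd writes
    print(can_swap(fadd, iadd))  # no shared registers
```

Because no instructions are added or removed, a pass built on a check like this can run after RA without invalidating the allocation.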
17:03 kar1m0[d]: sudo meson setup build
17:03 kar1m0[d]: The Meson build system
17:03 kar1m0[d]: Version: 1.8.0
17:03 kar1m0[d]: Source dir: /mesa
17:03 kar1m0[d]: Build dir: /mesa/build
17:03 kar1m0[d]: Build type: native build
17:03 kar1m0[d]: Project name: mesa
17:03 kar1m0[d]: Project version: 25.1.0-devel
17:03 kar1m0[d]: C compiler for the host machine: cc (gcc 15.1.1 "cc (GCC) 15.1.1 20250425")
17:03 kar1m0[d]: C linker for the host machine: cc ld.bfd 2.44.0
17:03 kar1m0[d]: C++ compiler for the host machine: c++ (gcc 15.1.1 "c++ (GCC) 15.1.1 20250425")
17:03 kar1m0[d]: C++ linker for the host machine: c++ ld.bfd 2.44.0
17:03 kar1m0[d]: Host machine cpu family: x86_64
17:03 kar1m0[d]: Host machine cpu: x86_64
17:03 kar1m0[d]: Checking for size of "void*" : 8
17:03 kar1m0[d]: Checking if "-mtls-dialect=gnu2" runs: YES
17:03 kar1m0[d]: Checking if "split TLSDESC" links: YES
17:03 kar1m0[d]: Did not find pkg-config by name 'pkg-config'
17:03 kar1m0[d]: Found pkg-config: NO
17:03 kar1m0[d]: Found CMake: /usr/bin/cmake (4.0.2)
17:03 kar1m0[d]: Run-time dependency libglvnd found: NO (tried pkgconfig and cmake)
17:03 kar1m0[d]: Run-time dependency vdpau found: NO (tried pkgconfig and cmake)
17:03 kar1m0[d]: Program glslangValidator found: YES (/usr/bin/glslangValidator)
17:03 kar1m0[d]: Run-time dependency libva found: NO (tried pkgconfig and cmake)
17:03 kar1m0[d]: meson.build:805: WARNING: add_languages is missing native:, assuming languages are wanted for both host and build.
17:03 kar1m0[d]: Rust compiler for the host machine: rustc -C linker=cc (rustc 1.86.0 "1.86.0")
17:03 kar1m0[d]: Rust linker for the host machine: rustc -C linker=cc ld.bfd 2.44.0
17:04 karolherbst[d]: yeah... for dual issue just needs to look at the sched blocks and...
17:04 kar1m0[d]: Program bindgen found: YES (/usr/bin/bindgen)
17:04 kar1m0[d]: Run-time dependency libclc found: NO (tried cmake)
17:04 kar1m0[d]: meson.build:849:12: ERROR: Dependency lookup for libclc with method 'pkgconfig' failed: Pkg-config for machine host machine not found. Giving up.
17:04 kar1m0[d]: snowycoder[d]
17:04 kar1m0[d]: I try to compile it
17:04 kar1m0[d]: even though I have libclc it tells me that I do not have it installed
17:04 karolherbst[d]: can't dual issue across blocks afaik
17:04 kar1m0[d]: even though I do
17:04 gfxstrand[d]: kar1m0[d]: Did you install the dev package?
17:05 mhenning[d]: karolherbst[d]: I thought it was even stricter and the instruction pointers need to be aligned?
17:05 kar1m0[d]: gfxstrand[d]: I git cloned snowy's latest commit https://gitlab.freedesktop.org/SnowyCoder/mesa/-/tree/nak_sm32_all_the_fixes?ref_type=heads
17:05 karolherbst[d]: mhenning[d]: you mean only the 1st/3rd/5th instruction?
17:05 gfxstrand[d]: I mean did you install libclc-dev or whatever your distro calls it.
17:05 kar1m0[d]: gfxstrand[d]: I installed it from aur
17:06 kar1m0[d]: maybe that is the issue
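Going by the log above, Meson reported `Found pkg-config: NO` before it ever got to libclc, so the libclc lookup may be failing because pkg-config itself is missing rather than because of the AUR package. A small pre-flight check along these lines can tell the two cases apart. It is only a sketch: `check_build_deps` is a made-up helper, and the `which`/`run` parameters exist just so it can be exercised without touching the real system.

```python
import shutil
import subprocess


def check_build_deps(modules=("libclc",), which=shutil.which, run=subprocess.run):
    """Return a list of human-readable problems; an empty list means all good."""
    tool = which("pkg-config")
    if tool is None:
        # Matches "Did not find pkg-config by name 'pkg-config'" in the log.
        return ["pkg-config itself is missing, so every pkg-config lookup will fail"]
    problems = []
    for mod in modules:
        # `pkg-config --exists NAME` exits non-zero when no NAME.pc is found.
        if run([tool, "--exists", mod]).returncode != 0:
            problems.append(f"{mod}: no .pc file found by {tool}")
    return problems


if __name__ == "__main__":
    for problem in check_build_deps():
        print(problem)
```

If the first message is printed, installing the distro's pkg-config (or pkgconf) package and re-running `meson setup` is the thing to try before reinstalling libclc.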
17:06 karolherbst[d]: I don't _think_ it's just specific slots, but who knows...
17:07 karolherbst[d]: if you get it wrong the hw will be angy at you, so maybe it's trivial to reverse engineer it
17:07 avhe[d]: Hi all, I'm reversing a CUDA program and I'm a bit confused by the following instructions:
17:07 avhe[d]: PRMT R9, R9, 0x9910, RZ ;
17:07 avhe[d]: As I understand, this is selecting "bytes" from each 32-bit reg operand (R9 and RZ), according to the pattern 0x9910 and putting them in R9.
17:07 avhe[d]: But what does the pattern represent? I especially can't make sense of the 0x99 high byte
17:07 avhe[d]: I found this snippet which does the same thing: <https://github.com/akrolik/rNdN/blob/20cda43e1d04185d8ceecc1f3d081855f0d621a8/src/Backend/Codegen/Generators/Instructions/Arithmetic/AddGenerator.cpp#L221>
17:08 avhe[d]: is this just constraining a register to int16 range?
17:12 mhenning[d]: avhe[d]: I think so, yes
17:13 mhenning[d]: the ptx docs describe prmt in detail https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-prmt
17:13 karolherbst[d]: it's always funny seeing the cuda docs
17:16 avhe[d]: mhenning[d]: Oh that makes sense. So the 0x9 just do sign extension
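For reference, the selector semantics from the PTX docs can be emulated in a few lines. This is a sketch of the default (generic) mode only, assuming the SASS operand order `PRMT Rd, Ra, selector, Rb`: each selector nibble picks one of the 8 source bytes with its low 3 bits, and the top bit of the nibble asks for that byte's sign bit replicated across the output byte instead of the byte itself.

```python
def prmt(a, b, selector):
    """Emulate PRMT in its default (generic) mode on 32-bit values."""
    # Bytes 0-3 come from a, bytes 4-7 from b.
    src = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        nibble = (selector >> (4 * i)) & 0xF
        byte = (src >> (8 * (nibble & 0x7))) & 0xFF
        if nibble & 0x8:
            # Top bit set: replicate the selected byte's sign bit over 8 bits.
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


if __name__ == "__main__":
    # 0x9910: keep bytes 0 and 1 of a, fill bytes 2 and 3 with the sign of byte 1.
    print(hex(prmt(0x12348001, 0, 0x9910)))
    print(hex(prmt(0x12347FFF, 0, 0x9910)))
```

With selector `0x9910` the two low nibbles copy the low 16 bits and the two `9` nibbles fill the upper half with the sign of byte 1, which is a 16-bit to 32-bit sign extension.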
17:16 karolherbst[d]: like the cuda folks really go into much detail in a couple of things and it's impressive 😄
17:17 avhe[d]: There are some really fun instructions. LOP3.LUT is crazy
17:23 karolherbst[d]: nah, it makes perfect sense actually
17:25 avhe[d]: Yeah I don't mean crazy in that sense. It's very nice conceptually to model every bitwise operation on 3 operands, with just a single 8-bit value
17:25 karolherbst[d]: heh...
17:26 karolherbst[d]: `LogicOp3::eval` is kinda overly complicated...
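The "every bitwise operation on 3 operands with a single 8-bit value" idea fits in a short emulation. This is a sketch, not NAK's `LogicOp3::eval`: the LUT is treated as a truth table indexed by `(a_bit << 2) | (b_bit << 1) | c_bit`, which is the convention you get by evaluating the desired function on the constants `0xF0`, `0xCC` and `0xAA`. The helper names `lop3` and `lut_for` are made up.

```python
MASK32 = 0xFFFFFFFF


def lop3(a, b, c, lut):
    """Apply the 3-input boolean function described by an 8-bit LUT, bitwise."""
    result = 0
    for idx in range(8):
        if (lut >> idx) & 1:
            # Minterm for this truth-table row: use each input or its complement.
            ta = a if idx & 4 else ~a
            tb = b if idx & 2 else ~b
            tc = c if idx & 1 else ~c
            result |= ta & tb & tc
    return result & MASK32


def lut_for(fn):
    """Derive the LUT for a Python function of three ints."""
    return fn(0xF0, 0xCC, 0xAA) & 0xFF


if __name__ == "__main__":
    xor3 = lut_for(lambda a, b, c: a ^ b ^ c)
    print(hex(xor3))
    print(hex(lop3(0x0F0F, 0x3333, 0x5555, xor3)))
```

Writing the LUT as `fn(0xF0, 0xCC, 0xAA)` is what makes the encoding pleasant: the constant for any expression falls out of evaluating that same expression once.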
17:26 kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1376250085679304735/image.png?ex=6834a445&is=683352c5&hm=19ad3c511954d870ae0346b741534d4ea509d341dc89f70fcda6fb0cca6d0e7d&
17:26 kar1m0[d]: tiredchiku[d]: I want to cry...
17:27 kar1m0[d]: also powerd doesn't seem to do anything
17:27 kar1m0[d]: maybe it's something with bios tho
17:27 kar1m0[d]: but my dgpu is still capped at 80w
17:29 tiredchiku[d]: you can uninstall garuda-nvidia-config I suppose
17:31 mangodev[d]: mangodev[d]: bumping this for the morning people
17:31 mangodev[d]: started getting a new error as of yesterday
17:36 mhenning[d]: mangodev[d]: could you please file a bug report?
17:41 mangodev[d]: mhenning[d]: i should
17:42 mangodev[d]: i forgot i had a freedesktop gitlab acct tbh :P
17:42 mangodev[d]: i should've realized because i use dark mode on the gitlab
17:42 mangodev[d]: which for some reason requires an acct for freedesktop
17:45 kar1m0[d]: does anyone test the drivers with ada lovelace?
17:45 kar1m0[d]: or whatever the architecture of nvidia 40s series is called
17:46 dwfreed: mangodev[d]: even gitlab.com appears to require an account to get dark mode
17:46 mhenning[d]: kar1m0[d]: yes
17:46 dwfreed: also my own company's gitlab is the same; so it seems all gitlabs require it, not just fd
17:47 kar1m0[d]: mhenning[d]: how is it so far?
17:47 kar1m0[d]: in terms of performance
17:49 mhenning[d]: performance is okay on some games, although typically still slower than proprietary
17:49 kar1m0[d]: in terms of fps how bad is it
17:49 kar1m0[d]: I just want to understand
17:51 mangodev[d]: dwfreed: strange, kde invent is dark mode by default (but their code blocks aren't)
17:51 mangodev[d]: (for some reason)
17:52 mhenning[d]: kar1m0[d]: as an example, on my 3060 (ampere, which is similar to ada), I get 47 fps average on horizon zero dawn's benchmark on normal settings, which I find playable
17:53 mhenning[d]: I don't know how proprietary compares
18:02 kar1m0[d]: mhenning[d]: that's decent
18:02 kar1m0[d]: mhenning[d]: I don't play horizon zero dawn but I can say that in ac valhalla on max settings I get 80 fps on average
18:02 kar1m0[d]: with proprietary drivers
18:04 kar1m0[d]: on rtx 4080 laptop
18:04 kar1m0[d]: and it's on 80w
18:25 snowycoder[d]: `dEQP-VK.image.mutable.2d.r32_uint_r8g8b8a8_srgb_draw_copy_resolve` fails depending on what tests are run before it, when I run it alone it Fails 0_o
18:26 snowycoder[d]: So, maybe it's related with uninitialized memory?
18:42 mhenning[d]: snowycoder[d]: You can try with `NVK_DEBUG=trash_memory` or `NVK_DEBUG=zero_memory` to see if they make a difference
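A small driver script makes it easy to compare the three configurations back to back. This is only a sketch: the `./deqp-vk` path is an assumption about where the CTS binary lives, and `debug_runs` / `run_all` are made-up helper names. The `NVK_DEBUG` values are the ones suggested above, and `--deqp-case` is the usual dEQP option for selecting a test.

```python
import os
import subprocess

CASE = "dEQP-VK.image.mutable.2d.r32_uint_r8g8b8a8_srgb_draw_copy_resolve"


def debug_runs(case=CASE, deqp="./deqp-vk"):
    """Build (mode, argv, env) for a baseline run plus each NVK_DEBUG mode."""
    runs = []
    for mode in (None, "trash_memory", "zero_memory"):
        env = dict(os.environ)
        if mode is None:
            env.pop("NVK_DEBUG", None)  # baseline: no debug flag at all
        else:
            env["NVK_DEBUG"] = mode
        runs.append((mode, [deqp, f"--deqp-case={case}"], env))
    return runs


def run_all():
    """Run each configuration and print its exit code."""
    for mode, argv, env in debug_runs():
        proc = subprocess.run(argv, env=env)
        print(f"NVK_DEBUG={mode}: exit code {proc.returncode}")
```

If the result flips between the trash and zero runs, uninitialized memory is a likely suspect; if all three behave the same, the cause is probably elsewhere.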
18:42 avhe[d]: avhe[d]: nvcc indeed seems to generate this prmt pattern on conversion to int16 <https://godbolt.org/z/Gdz3bvvWG>
18:43 mhenning[d]: sometimes it's something more subtle eg. if the shader header is set incorrectly the hardware might turn off a feature that we need for the shader
18:43 mhenning[d]: and then it works sometimes if a different shader is running concurrently which correctly requests the feature
18:44 mhenning[d]: avhe[d]: oh, that's cool, I actually didn't realize godbolt had cuda support
18:45 karolherbst[d]: mhhhh
18:45 avhe[d]: yeah I've been using it a lot to test my decompiled patterns
18:58 kar1m0[d]: karolherbst[d]: I got an idea to test drivers on my main laptop
18:59 kar1m0[d]: Ada lovelace architecture
18:59 kar1m0[d]: I want to compare the performance to proprietary drivers
19:01 kar1m0[d]: I wonder how playable the games would be on them
20:16 snowycoder[d]: mhenning[d]: I think that's it, the flags don't help and it only happens in mutable -.-
20:28 mhenning[d]: maybe try toggling on the uses 64 or writes memory bits then?
21:04 snowycoder[d]: mhenning[d]: The kernel driver usually throws an error for those, mmh, I might have an idea