00:11nebadon2025[d]: yea I don't know what to do
00:11nebadon2025[d]: everything feels broken
00:13nebadon2025[d]: trying xxmitsu copr for mesa-git
00:14nebadon2025[d]: see if anything changes
00:18nebadon2025[d]: yea this seems not right?
00:18nebadon2025[d]: ```nebadon@CERULEAN:~$ inxi -G
00:18nebadon2025[d]: Graphics:
00:18nebadon2025[d]: Device-1: NVIDIA GA102 [GeForce RTX 3090] driver: nouveau v: kernel
00:18nebadon2025[d]: Display: wayland server: Xwayland v: 24.1.6 compositor: kwin_wayland
00:18nebadon2025[d]: driver: gpu: nouveau resolution: 3840x2160~60Hz
00:18nebadon2025[d]: API: EGL v: 1.5 drivers: swrast,zink
00:18nebadon2025[d]: platforms: gbm,wayland,x11,surfaceless,device
00:18nebadon2025[d]: API: OpenGL v: 4.6 compat-v: 4.5 vendor: mesa v: 25.1.0-devel
00:18nebadon2025[d]: renderer: zink Vulkan 1.4(NVIDIA GeForce RTX 3090 (NVK GA102) (MESA_NVK))
00:18nebadon2025[d]: API: Vulkan v: 1.4.304 drivers: N/A surfaces: xcb,xlib,wayland
00:18nebadon2025[d]: Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
00:18nebadon2025[d]: de: kscreen-console,kscreen-doctor wl: wayland-info x11: xdriinfo,
00:18nebadon2025[d]: xdpyinfo, xprop, xrandr```
01:06redsheep[d]: orowith2os[d]: The 3090 can't run without gsp, just so you know. Turing was the last generation that worked without it.
01:06redsheep[d]: If the 3090 is working at all it's using gsp
01:07orowith2os[d]: Oh, I thought it was the 40 series that required it?
01:07redsheep[d]: Wait. Oh I might have it wrong, maybe 40 was the cutoff
01:07redsheep[d]: Yeah it was, you're right
01:11nebadon2025[d]: im wondering what I am missing or maybe mesa-git is broken at the moment?
01:11nebadon2025[d]: im not compiling it just using a repo
01:11nebadon2025[d]: ive done tons of testing so far
01:12nebadon2025[d]: but its probably been a month since i've really done anything on this machine
01:12nebadon2025[d]: i turn it on like once a month heh
01:12nebadon2025[d]: https://www.youtube.com/playlist?list=PLuGrTOaiLhBtDoQVMHVufs2_BBf1HSnEj
01:13redsheep[d]: So performance has regressed? Are you able to get reasonable performance in any other tests right now?
01:13nebadon2025[d]: most of that testing was like a year ago
01:14nebadon2025[d]: let me try some more games
01:14nebadon2025[d]: see if dxvk_hud works
01:14nebadon2025[d]: since mangohud seems 100% borked
01:15nebadon2025[d]: I tried every release of it from 0.7 to 0.8.1
01:15nebadon2025[d]: i even went back to 6.13.6 kernel from 6.14
01:16nebadon2025[d]: 2 different mesa-git repos
01:16nebadon2025[d]: feels like master-git might just be in a not good state or something
01:18redsheep[d]: It worked pretty well for me in deep rock galactic a month ago, and that is vkd3d as well. Maybe something merged that regressed things, kinda hard to check with gitlab down
01:18nebadon2025[d]: yea I tested few things about a month ago myself and it was fine
01:18redsheep[d]: If you have DRG maybe give that a spin. I was getting well over 100 fps in the starting area at 4k max
01:19nebadon2025[d]: today was the first time i booted into this Fedora i keep specifically for nvk testing
01:19nebadon2025[d]: since then
01:19nebadon2025[d]: trying GTA V
01:20nebadon2025[d]: thats been one i have used a lot for nvk
01:20nebadon2025[d]: it has worked pretty well like 60fps+ in 1440p
01:20nebadon2025[d]: though a bunch of objects are missing in the game in certain places
01:21redsheep[d]: GTA V has never been a test game for me, takes way too long to load
01:21nebadon2025[d]: it has a decent benchmark
01:21nebadon2025[d]: which is nice
01:21redsheep[d]: nebadon2025[d]: Good chance that's fixed with some of the patches recently
01:22redsheep[d]: I haven't seen things rendering incorrectly in a little bit
01:23nebadon2025[d]: ok game loaded and dxvk_hud works
01:23nebadon2025[d]: fps seems high
01:23nebadon2025[d]: but its jittery
01:23nebadon2025[d]: showing like 100+ fps
01:23nebadon2025[d]: but doesnt look smooth at all
01:23nebadon2025[d]: and objects are missing
01:24redsheep[d]: Ok so no chance you're software rendering or missing gsp
01:24nebadon2025[d]: the most prominent thing missing is the ferris wheel at boardwalk
01:24nebadon2025[d]: im not sure how to check for gsp
01:24redsheep[d]: Well if gitlab wasn't down I would say open an issue
01:24nebadon2025[d]: and I doubt im getting 100+fps with software render
01:24redsheep[d]: You should once it's back
01:25nebadon2025[d]: frametime graph is spikey for sure
01:25redsheep[d]: nebadon2025[d]: Yeah when I have hit issues that caused me to fall back to software rendering I was lucky to get 5 fps in minecraft
01:26nebadon2025[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351366036468797490/image.png?ex=67da1d32&is=67d8cbb2&hm=2f8afaa0a5f6b5e59df3df8942a024aec215be6888530d1771603ce5947b79b6&
01:26redsheep[d]: nebadon2025[d]: Nothing remotely resembling high fps is possible in pretty much any game without it
01:27nebadon2025[d]: a big chunk of water is missing too
01:27nebadon2025[d]: at the bridge/waterfall where the jet starts in benchmark
01:28redsheep[d]: Hmm. You built main really recently?
01:28nebadon2025[d]: im using a COPR repo
01:28nebadon2025[d]: it was updated like a couple hours ago
01:29redsheep[d]: If you run your entire desktop session with `NOUVEAU_USE_ZINK=0` do those spikes go away?
01:29redsheep[d]: You can put that in `/etc/environment`
01:30nebadon2025[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351367121954672760/Screenshot_20250317_212935.png?ex=67da1e35&is=67d8ccb5&hm=c4c867f5639b3c8f410f07b2b580ec65fd1613e40b2e2d6311c657fe9802cbd4&
01:30nebadon2025[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351367123817201744/Screenshot_20250317_212915.png?ex=67da1e35&is=67d8ccb5&hm=2fafc9d176e9248d2e78f69bf7484b70bb0347c8f94905d535809e05736590ea&
01:30nebadon2025[d]: i can try
01:30nebadon2025[d]: will that have an impact on DX10 games though?
01:30nebadon2025[d]: isnt zink just opengl only?
01:31redsheep[d]: So, those spikes look a bit like what might happen if your display server is slowing it down. Zink was recently defaulted to run your session
01:31nebadon2025[d]: you can see in those screenshots above though water missing on left and ferris wheel missing on right
01:31redsheep[d]: And might be breaking mangohud too. If those go away with that env variable then you will want to open issues
01:32nebadon2025[d]: ok let me try 1 sec
01:33nebadon2025[d]: /etc/environment doesnt exist
01:33redsheep[d]: Uh. Okay. /etc/profile then?
01:34nebadon2025[d]: that is there
01:34redsheep[d]: just export that variable at the bottom, I think that should do the trick
01:34nebadon2025[d]: i just put this at bottom and reboot?
01:34redsheep[d]: Yeah
01:34nebadon2025[d]: ok cool
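For reference, a minimal sketch of the two approaches discussed above (only `NOUVEAU_USE_ZINK` comes from the chat; the file formats are standard):
```
# /etc/environment (when it exists) takes plain KEY=value pairs, no shell syntax:
NOUVEAU_USE_ZINK=0

# /etc/profile is sourced as a shell script, so append the export form instead:
export NOUVEAU_USE_ZINK=0
```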
01:38nebadon2025[d]: still no mangohud
01:39redsheep[d]: What is the result if you go to terminal and `echo $NOUVEAU_USE_ZINK`
01:39redsheep[d]: Just to double check that it took
01:39nebadon2025[d]: says 0
01:40redsheep[d]: Okay, good. What about the spikes? I don't expect any change to incorrect rendering but maybe the spikes will be gone
01:41redsheep[d]: Guess I need to put gta on my list to test. Is this the legacy version? I haven't checked if enhanced even works on linux
01:41nebadon2025[d]: yea legacy
01:42nebadon2025[d]: i cant imagine enhanced works well lol
01:46nebadon2025[d]: graph seems less spikey
01:46nebadon2025[d]: but still feels jittery visually
01:47nebadon2025[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351371300190421134/image.png?ex=67da2219&is=67d8d099&hm=60587ef7164162437936e1d850d3686c9909f1a45e0902e702b1d07bcd90c9ba&
01:49redsheep[d]: Interesting, that's pretty near the same part of the benchmark and it kinda looks like the zink session was hindering performance less
01:49redsheep[d]: It's also just stuttering up a storm
01:50nebadon2025[d]: yea either way it didnt feel conducive to 80-90fps at 30ms frame times
01:50nebadon2025[d]: it looked way worse than what its reporting
01:54nebadon2025[d]: another thing i notice in that benchmark is it shows gpu only at 49% utilization
01:54nebadon2025[d]: which i guess feels about right for 90fps on a 3090
01:54nebadon2025[d]: it definitely should be around double that
02:01redsheep[d]: nebadon2025[d]: GPU utilization on nouveau is mostly lies right now. Kind of a vague notion but even less useful than that number normally is
02:02redsheep[d]: Only seen DXVK even attempt to give a number and it's not pulling that off of any data from the hardware
03:20airlied[d]: mohamexiety[d]: I think changing GART semantics is probably the best way to go, and see what blows up
03:25gfxstrand[d]: And I think it's fine to set a maximum at BO creation time and I'm happy to have NVK pass in a 64K or 2M alignment at BO creation if that helps.
03:27gfxstrand[d]: And I think we actually need that. If something like compression relies on 64K pages, we need to be able to tell the kernel that the BO must always be aligned, even if it gets swapped in and out and ends up with a different set of pages later.
04:12gfxstrand[d]: I've got Maxwell A CTS results!
04:12gfxstrand[d]: I'll put the conformance package together in the morning.
04:13gfxstrand[d]: Maxwell B is running. If that completes okay, I'll run Pascal tomorrow. Then I just have to figure out Volta FP64.
04:15gfxstrand[d]: If anyone has a Maxwell that does reclocking (I think the 750 Ti might) and wants to test Zink+NVK perf, that'd be cool. Once Maxwell is enumerating by default on NVK, we need to make the decision when/if to switch over to Zink.
04:16gfxstrand[d]: gfxstrand[d]: Here's hoping clocked down Maxwell B isn't just straight-up too slow to pass the CTS. :frog_upside_down:
04:40airlied[d]: gfxstrand[d]: any opinions on modelling things like ld, where it can take a urX and an rX and the instruction combines them instead of us doing the iadd? I've started hacking up something with a "con_addr" src that I extract if I see a previous iadd with a convergent source, but I'm wondering if we should avoid losing that info earlier rather than trying to work it out in the nir->nak conversion
04:41airlied[d]: (I also haven't gotten it to work yet)
09:32mohamexiety[d]: airlied[d]: Got it, thanks! Let’s see then
12:06rinlovesyou[d]: nebadon2025[d]: Nvk on my 2070 super can't even run Sonic unleashed at a stable 60fps, im jealous you're getting these kinds of frames in gta
12:25Jasper[m]: I can't even flash a newer u-boot image to my TX2 because the jetpack flashing scripts depend on python2
12:25Jasper[m]: My god that's annoying
12:50doitsujin[d]: redsheep[d]: fwiw we just calculate that based on QueueSubmit + associated timeline semaphore signal timings, generally works as a ballpark estimate but falls apart if Submit/Present stall for extended periods of time
13:14gfxstrand[d]: airlied[d]: Yeah... We need to come up with something better there. The from_nir hack is about at its limit at this point. I experimented a bit with this to try and fold more address calculations into ldg/stg and it got ugly fast.
14:49gfxstrand[d]: gfxstrand[d]: Verifying my submission package now...
14:50gfxstrand[d]: gfxstrand[d]: Maxwell B did not run. I was running RADV all night instead. 🤦🏻♀️
14:57gfxstrand[d]: Yay! Gitlab is back! <a:birthdaypartyparrot:841351708689170484>
14:57gfxstrand[d]: But don't push anything. The data migration is still in progress.
14:58karolherbst[d]: it's so fast
14:58karolherbst[d]: kinda
14:58karolherbst[d]: it was fast when I've tried
14:59Jasper[m]: You turning around from being positive to bargaining was also fast :P
14:59gfxstrand[d]: 🤣
15:00karolherbst[d]: yeah...
15:00karolherbst[d]: now it's slow, but it was fast 20 minutes ago!
15:04Jasper[m]: TX2 is still flashing...
15:14gfxstrand[d]: gfxstrand[d]: The submission verification package is such a pain....
15:18gfxstrand[d]: doesn't want to run conformance on Tegra...
15:25gfxstrand[d]: gfxstrand[d]: So much CPU time spent in python verifying test name lists...
15:29Jasper[m]: a bunch of A57's aren't necessarily processing monsters indeed
15:54gfxstrand[d]: gfxstrand[d]: Submitted! In 30(ish) days, Maxwell A will be conformant.
15:54gfxstrand[d]: Hopefully I'll submit Maxwell B and Pascal this week as well.
15:54gfxstrand[d]: I'm less sure about Volta.
16:04mohamexiety[d]: <a:vibrate:1066802555981672650>
16:04mohamexiety[d]: hopefully the volta fp64 stuff is a simple fix
16:05mohamexiety[d]: it's for bragging rights really given it's just one (1) GPU but might as well :KEKW:
16:06Jasper[m]: <Jasper[m]> "TX2 is still flashing..." <- Restarted and reenabled the logging bool that broke things before. Now I can actually see what's going on hahaha
16:19gfxstrand[d]: Is newer Kepler SM30? or SM35?
16:25mhenning[d]: I think kepler 2 is SM35
16:28gfxstrand[d]: Looks like 35 according to https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
16:53gfxstrand[d]: `vulkan-cts-1.4.1.3` was tagged mere hours after I submitted Maxwell A on `vulkan-cts-1.4.1.2`. Oh, well...
17:11gfxstrand[d]: My Maxwell B is getting toasty with no fans on
17:26Jasper[m]: @_oftc_marysaka[d]:matrix.org what OS do you run on the TX1?
17:38hisamiiii[d]: gfxstrand[d]: i have a quadro m1000m
17:38hisamiiii[d]: reclocks
17:38hisamiiii[d]: i can do that if i have idiotproof instructions and if that gpu counts :3
17:48hisamiiii[d]: (and yes 750ti should reclock. its gm107)
18:11gfxstrand[d]: IDK how idiotproof it is to run your whole desktop but running just an app or two is pretty easy.
18:12gfxstrand[d]: You just configure with `-Dvulkan-drivers=nouveau -Dgallium-drivers=zink`, build, and run with `NOUVEAU_USE_ZINK=1 meson devenv $MY_APP`
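Spelled out as one sequence, a sketch assuming a fresh mesa checkout (`$MY_APP` stands in for whatever GL app you want to test, e.g. `glxgears`):
```
git clone https://gitlab.freedesktop.org/mesa/mesa.git && cd mesa
meson setup build -Dvulkan-drivers=nouveau -Dgallium-drivers=zink
ninja -C build
# meson devenv runs the command with the build tree's drivers on the search paths
NOUVEAU_USE_ZINK=1 meson devenv -C build $MY_APP
```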
18:13airlied[d]: gfxstrand[d]: I'm thinking new nir intrinsics but I'll play around a bit more
18:14gfxstrand[d]: Yeah. My attempt at this (which IDK that I should push right now) added an `ldg_nv` intrinsic which supported `addr+imm`
18:15airlied[d]: I have to work out the encoding for ldg.e, my GPU doesn't like what I'm throwing at it now
18:15airlied[d]: Yeah I think I want convergent addr + divergent addr + addr + imm
18:16gfxstrand[d]: There's a lot of different forms. Some take a UR. Some take a cbuf. Some take a bindless cbuf.
18:17gfxstrand[d]: And then there are immediates which I think you can combine with most of those
18:17gfxstrand[d]: I'd be tempted to encode the immediate as a const index in the NIR intrinsic so it doesn't get lost.
18:18karolherbst[d]: they are signed 24 bit values btw
18:18gfxstrand[d]: yup
18:18gfxstrand[d]: Some of them are
18:18cwabbott: gfxstrand[d]: I saw you were wondering about the cache flushing primitives on 32-bit ARM, there are indeed the same instructions as aarch64 but they don't work in userspace iirc
18:18gfxstrand[d]: It depends on the intrinsic and the form. :frog_upside_down:
18:18cwabbott: very annoying
18:18gfxstrand[d]: cwabbott: Yeah, but the Arm CPU docs claim they do.
18:19karolherbst[d]: apparently there is a .U32 modifier if you specify an indirect offset
18:19gfxstrand[d]: marysaka[d]: is going to look into it some more.
18:20karolherbst[d]: have you figured out the `desc[...]` form?
18:21avhe[d]: dwlsalmeida: FYI I finally got my little hardware accel driver running on OGKM. There are a few issues I need to investigate but I'm pretty happy about this:
18:21avhe[d]: ```JCT-VC-HEVC_V1
18:21avhe[d]: | Test       | FFmpeg-H.265-Envideo | FFmpeg-H.265-CUDA |
18:21avhe[d]: | TOTAL      | 139/147              | 138/147           |
18:21avhe[d]: | TOTAL TIME | 9.422s               | 32.069s           |
18:21avhe[d]: JCT-VC-RExt
18:21avhe[d]: | Test       | FFmpeg-H.265-Envideo | FFmpeg-H.265-CUDA |
18:21avhe[d]: | TOTAL      | 25/49                | 22/49             |
18:21avhe[d]: | TOTAL TIME | 5.119s               | 9.049s            |```
18:22karolherbst[d]: well.. `desc` doesn't really help with anything.. it's weird
18:23airlied[d]: Is desc the one that needs the header?
18:24airlied[d]: I read about it reading some 64bit header with buffer info in it to do bounds checking
18:24karolherbst[d]: airlied[d]: I have no idea honestly, it sounds something related to caching
18:25karolherbst[d]: airlied[d]: that's bindless ubo stuff, no?
18:28airlied[d]: Yes, maybe I read about it in LDC, it's hard to keep track 🙂
18:29gfxstrand[d]: Is there a nouveau.ko variable for context timeouts?
18:30gfxstrand[d]: I'm going to have to increase the limit to get dEQP-VK.sparse_resources.buffer.ssbo.sparse_residency.buffer_size_2_24 to pass without GSP
18:32redsheep[d]: How big is your maxwell b?
18:33gfxstrand[d]: It's a 980, not that it matters. :frog_upside_down:
18:33redsheep[d]: Oh, yeah so 980ti would help but not much
18:33Jasper[m]: my card is much bigger than yoohooours
18:34Jasper[m]: (TX2 will not boot, I am a bit bored, so System of a Down lyrics it is)
18:34gfxstrand[d]: Oh, it's a Ti
18:34gfxstrand[d]: `#sessionInfo deviceName NVIDIA GeForce GTX 980 Ti (NVK GM200)`
18:34gfxstrand[d]: And it doesn't frickin' matter...
18:34karolherbst[d]: ~~enable reclocking and get screwed by the fans not moving anyway~~
18:34gfxstrand[d]: Worst case, I can disable sparse and run again.
18:34gfxstrand[d]: But I'd rather just bump the timeout
18:35marysaka[d]: karolherbst[d]: just put the test bench in the fridge it will be fiiine
18:36redsheep[d]: I mean, if you plug those fans into 12v from elsewhere to spin full speed and can reclock otherwise that is a solution
18:36redsheep[d]: And probably not a terribly hard one if the fan connector isn't buried
18:37karolherbst[d]: with this little trick you can push the thermal threshold to 120ºC
18:37karolherbst[d]: I'd have to check, but the configured hardware shut down limit is somewhere around 117ºC or so
18:37redsheep[d]: "We call it roastnvidia"
18:37airlied[d]: You have to hack the kernel
18:37karolherbst[d]: nah
18:37karolherbst[d]: the kernel doesn't configure it
18:38redsheep[d]: That is such an insanely high shut down limit
18:38karolherbst[d]: nouveau operates in full yolo mode and trusts the VBIOS init scripts to set it up
18:38karolherbst[d]: redsheep[d]: well it's the worst case
18:38karolherbst[d]: there is a lot more happening before that
18:38dwlsalmeida[d]: avhe[d]: Hey congrats!! I really need to get back to this :/
18:38gfxstrand[d]: airlied[d]: I found the `#define`. Do I need to do anything more than just change that?
18:38karolherbst[d]: around 105º or so there is a clock divider jumping in
18:39dwlsalmeida[d]: avhe[d]: Been working on something else for a while now
18:39karolherbst[d]: ohh Dave meant the timeout
18:39gfxstrand[d]: It's 10s currently. Let's bump that to 5 minutes
18:40karolherbst[d]: good luck
18:40karolherbst[d]: or talking about sparse, are you making use of the ISA support there?
18:41gfxstrand[d]: yup
18:41karolherbst[d]: ahh, cool
18:41gfxstrand[d]: But it doesn't exist on Maxwell A, which is how I got the tests to pass there.
18:41gfxstrand[d]: (Also, Maxwell A can reclock)
18:41karolherbst[d]: ohhh.. I see
18:42redsheep[d]: It's really a shame that maxwell B and pascal were some of the best selling cards in history
18:42gfxstrand[d]: Yeah, Maxwell B and Maxwell A are different GPU generations. They added sparse, swapped out the whole sampler, reworked MSAA, and who knows what all else. IDK why they kept calling it Maxwell.
18:42gfxstrand[d]: It should have been Pascal A
18:43karolherbst[d]: maxwell A also has the new sampler, no?
18:43marysaka[d]: also shader interlock happened :blobcatnotlikethis:
18:43gfxstrand[d]: nope
18:43gfxstrand[d]: Maxwell A did get formatted image load/store, though, which is nice.
18:43karolherbst[d]: it does have the new tex headers tho
18:43karolherbst[d]: it's an opt-in thing tho
18:44karolherbst[d]: but what's different with samplers?
18:44gfxstrand[d]: Oh, sampler==texture in my brain
18:44karolherbst[d]: ah yeah, but maxwell A has the new format
18:44karolherbst[d]: just need to enable it
18:45mhenning[d]: karolherbst[d]: Yeah, there's some stuff in maxwell a where you can toggle between kepler style and maxwell style
18:45gfxstrand[d]: Yeah, looks like we do. I think I tested the old thing on Maxwell A with it disabled. That's probably why I'm remembering it being a B thing
18:45karolherbst[d]: yeah, which is fair
18:45karolherbst[d]: good way to check if things would work on kepler at least with less variables
18:46karolherbst[d]: I think the flip also exists on maxwell b but doesn't do anything
18:48mhenning[d]: If anyone wants to test out a possible perf improvement, I'd be curious how this branch fares on different gpus: https://codeberg.org/mhenning/mesa/src/branch/watermark
18:49karolherbst[d]: okay.. I think I _might_ not know what those do..
18:49karolherbst[d]: do you know what's the default value?
18:50mhenning[d]: nope. Just using the values from a trace of the blob
18:50karolherbst[d]: should be able to read out everything via MME, but probably doesn't matter much.. let me see if I can find public information on what the term means
18:51avhe[d]: dwlsalmeida[d]: Yep this is just my little obsession lol
18:52avhe[d]: Anyway I'm working on some test failures atm. I added hevc range ext and vp9 high depth support, I'll try to figure out av1 at some point
18:55karolherbst[d]: probably doesn't matter much if copying from blob gives more perf
18:56karolherbst[d]: I don't know what that means in this specific case anyway
18:57mhenning[d]: Yeah, the main thing I'm worried about is whether nvidia uses different settings on different gpus
18:57karolherbst[d]: mhhhhh
18:57airlied[d]: gfxstrand[d]: there might also be a 15s thing somewhere for fence waits
18:57gfxstrand[d]: airlied[d]: That would be annoying.
18:57gfxstrand[d]: Once my kernel is done building, I'll find out.
18:58airlied[d]: Look in nouveau_fence.c
18:58gfxstrand[d]: Worst case, I'll disable the sparse features and run again.
18:58airlied[d]: I'm not at a PC yet
18:58karolherbst[d]: mhenning[d]: those values look suspiciously low anyway
18:58karolherbst[d]: like there are like 64k per SM
18:58karolherbst[d]: *regs
18:58karolherbst[d]: ohh
18:58karolherbst[d]: have you checked the value range of those methods?
18:59mhenning[d]: no
18:59karolherbst[d]: mhh both 16 bit...
18:59gfxstrand[d]: Yeah, I see a `15 * HZ` a few places
18:59mhenning[d]: I'm not sure what the units are
18:59karolherbst[d]: 16 bits are enough to fit in all SM registers
18:59karolherbst[d]: but yeah.. no idea what the unit might be...
19:00karolherbst[d]: there are also `VTG` variants of those, is nvidia setting them?
19:00mhenning[d]: Nope, doesn't show up in any of my traces
19:00gfxstrand[d]: gfxstrand[d]: Okay, changed all those to 10 min
19:00karolherbst[d]: mhhhh
19:00mhenning[d]: just the PS variants
19:03karolherbst[d]: anyway.. would be kinda weird to not have those values be per SM
19:08hisamiiii[d]: gfxstrand[d]: alright will try on the weekend if thats not too late
19:09gfxstrand[d]: Oh, we've got lots of time.
19:10gfxstrand[d]: I'm not planning to flip Zink on by default pre-Turing until at least Mesa 25.2.
19:10gfxstrand[d]: And IDK if we'll do it then. I suspect bindless-only samplers are going to be a perf problem. The cbuf situation isn't great there, either.
19:11mhenning[d]: karolherbst[d]: I was wondering if it was per-warp. We have a minimum of 4 regs/shader * 32 threads = 128 = 0x80 which is the low register watermark the blob uses
19:12gfxstrand[d]: Honestly, pre-Turing really needs CPU descriptors for perf. Fermi (if we enable it) needs them in order to work at all.
19:12karolherbst[d]: mhenning[d]: mhhh... maybe those settings are so that one full warp can always run?
19:13mhenning[d]: I don't know. Maybe
19:15redsheep[d]: I'm not sure I understand, are you two saying this watermark stuff is possibly needed to get high utilization?
19:15karolherbst[d]: mhh that 0x60 is suspicious
19:16karolherbst[d]: but no idea
19:17karolherbst[d]: I have some gens where I have multiple GPUs from
19:17karolherbst[d]: I could check if you throw me a script or so
19:17mhenning[d]: redsheep[d]: All I know for sure is that it mirrors what the blob does and it improves perf in my tests. My best guess is that these settings make it more likely for hardware to run vertex and pixel shaders at the same time, but I'm not sure
19:18karolherbst[d]: like I have a kepler titan but also the low end ones
19:18karolherbst[d]: if they make it dependent on the GPU I'm sure it'd show there
19:19mhenning[d]: karolherbst[d]: I'm just using envyhooks to dump the command buffers of some of the sascha willems demos
19:19mhenning[d]: I don't think that works on kepler
19:19karolherbst[d]: ehh.. right.. is kepler even supported with the new drivers...
19:20karolherbst[d]: mhhhhh
19:20mhenning[d]: But it would be enough to run envyhooks on vkcube on a few different cards and sending me the output
19:20redsheep[d]: I've never used envyhooks before but I could gather the info from Ada later tonight if you want to write some steps
19:20karolherbst[d]: I could imagine it being different between generations at least
19:21mhenning[d]: Or you can dump the pushbufs yourself if you want to see the args
19:21mhenning[d]: karolherbst[d]: Yeah, or we also have differences in max occupancy on some chips
19:22redsheep[d]: And I'll test if what you have helps perf or not, probably in like 10 hours
19:22tiredchiku[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12840
19:23mhenning[d]: redsheep[d]: You just need to follow the instructions in the readme here: https://gitlab.freedesktop.org/marysaka/envyhooks
19:23mhenning[d]: under the "Installing" and "Running" headings
19:23mhenning[d]: using vkcube should be enough
19:23marysaka[d]: oh we moved the repo to https://gitlab.freedesktop.org/nouveau/envyhooks btw
19:24marysaka[d]: (not that my fork is any different right now but you know)
19:24marysaka[d]: I really need to find the spoon to properly handle memory allocations detection on envyhooks, we could do way more than what we currently do
19:25mhenning[d]: and then zip the contents of the "dump_output" directory and send them to me along with your exact gpu model
19:25marysaka[d]: Also important you need rustup installed on your system because otherwise it doesn't pick the right rust version pinned on the repo
19:26marysaka[d]: (if you have a system-wide installation of rustc/cargo)
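Pieced together from the steps above, a rough sketch of the workflow (the actual build/run commands live in the README's "Installing" and "Running" sections, which are authoritative):
```
# rustup must be installed so the repo's pinned rust toolchain is picked up
git clone https://gitlab.freedesktop.org/nouveau/envyhooks.git
cd envyhooks
# build envyhooks and run vkcube under it per the README, then collect the dumps:
zip -r envyhooks-dump.zip dump_output/
# send the zip along with your exact GPU model
```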
19:28gfxstrand[d]: gfxstrand[d]: Looks like it passes in about 5s if I up the limits.
19:41hisamiiii[d]: gfxstrand[d]: oh yeah also am i supposed to compare vulkan vs gl with zink or vulkan zink vs prop
19:42gfxstrand[d]: I also found a hack for getting the CTS to not die. I have my script start runs at 45-min intervals to avoid all the "murder the GPU" tests running at the same time.
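As a sketch of that staggering trick (the real script isn't shown in the chat; the `deqp-vk` invocation and caselist splits here are hypothetical):
```
# start each CTS shard 45 minutes after the previous one so the
# "murder the GPU" tests never line up
for caselist in splits/*.txt; do
    ./deqp-vk --deqp-caselist-file="$caselist" &
    sleep $((45 * 60))
done
wait
```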
19:43gfxstrand[d]: hisamiiii[d]: Nouveau GL vs. Zink+NVK. Basically, I want to know that switching to Zink isn't going to regress GL for people.
19:43hisamiiii[d]: oh alright
19:43hisamiiii[d]: any test suite thats a good idea to use
19:45airlied[d]: dare I suggest running GL CTS on zink/nvk just in case there's any weirdness, or has that happened before?
19:51gfxstrand[d]: It's 4.6 conformant, at least on some GPU.
19:52gfxstrand[d]: https://www.khronos.org/conformance/adopters/conformant-products/opengl#submission_352
19:52gfxstrand[d]: Looks like I ran Turing and Ampere.
19:52mhenning[d]: Yeah, I'd like to see us passing gl cts on maxwell before we change the default to zink+nvk there
19:53mhenning[d]: but like you said we have time to do that
19:54Jasper[m]: TX2 is booting, but no display. Prolly because wayland isn't functioning with good ol' nouveau
19:54Jasper[m]: Step closer to ci/cd or just regular nvk testing
19:57gfxstrand[d]: mhenning[d]: Yeah, someone probably should. It's just a giant PITA. As much as I hate doing official Vulkan conformance runs, at least there the CTS can run headless and doesn't need 3 days of banging your head against window systems and configure flags to get it to work.
19:57airlied[d]: I think GL CTS finally dropped kc-cts
19:57gfxstrand[d]: Could be worse, I guess. At least you don't have to write your own window system code anymore. <a:shrug_anim:1096500513106841673>
19:58gfxstrand[d]: airlied[d]: That was a minor problem compared to getting EGL to pass, IIRC.
20:02mhenning[d]: Yeah, to be honest I'm not super concerned about being officially conformant before we flip the switch since the old gl driver never was, I'd just like us to be in a state where the cts typically passes with maybe a few flakes (since that's what the old gl driver can do)
20:03gfxstrand[d]: Yeah. And I suspect we can be conformant. There shouldn't be much difference from Zink's perspective.
20:09marysaka[d]: uuurgh we cannot push on gitlab yet right
20:10marysaka[d]: envyhooks will not build currently for new users because I never pushed the Cargo.lock and some dep now hard requires rustc 1.81 :blobcatnotlikethis:
20:10mhenning[d]: I was able to push earlier, but it's been up and down all day
20:11marysaka[d]: I will push the fixes then and repush them if it rolls back at some point
20:11snowycoder[d]: karolherbst[d]: Spoiler: it isn't 😦
20:14marysaka[d]: marysaka[d]: okay should be pushed now
20:34karolherbst[d]: marysaka[d]: there is a fix for that
20:34karolherbst[d]: in 1.85 🙃
20:34karolherbst[d]: https://doc.rust-lang.org/edition-guide/rust-2024/cargo-resolver.html
20:34karolherbst[d]: 1.84 actually
20:50redsheep[d]: snowycoder[d]: I think Karol meant the latest nvidia prop not supporting kepler, not referring to your kepler work
21:06gfxstrand[d]: snowycoder[d]: I just threw together a couple of patches that add the basic infra for SM20 and SM35. I'll push it once GitLab is back. I might do enough SM35 to get some basic unit tests passing once my machines free up from CTSing.
21:08gfxstrand[d]: I think my kepler is a GK110. I should probably pick up a GK10x, too.
21:10snowycoder[d]: redsheep[d]: Yep, I was saying the same thing, latest nvidia doesn't even recognize kepler.
21:10snowycoder[d]: It's hard to check how it compiles spir-v (I'm using old nvcc)
21:11snowycoder[d]: gfxstrand[d]: Nice, what infra does it add?
21:11snowycoder[d]: I'm getting some basic tests passing, others fail but at least it generates something that nvdisasm recognizes
21:11gfxstrand[d]: Oh, just the very basics. But if you've already got it generating something, you might already have typed all that
21:12gfxstrand[d]: I didn't realize you'd started typing
21:12snowycoder[d]: I only added sm30 encoder and plumbed it through (most ops panic though)
21:13Jasper[m]: I have one big kepler and several small ones
21:13Jasper[m]: I wish I had the income to have all of them running at the same time to help hahahaha
21:15Jasper[m]: GK20A, GK104, GK107, 2xGK208B
21:20snowycoder[d]: Jasper[m]: Wow, nice!
21:20Jasper[m]: Ngl if y'all need some specimen, I'd be willing to send them if the shipping isn't too steep
21:21Jasper[m]: I'm from the Netherlands, within the EU should be doable
21:22rinlovesyou[d]: mhenning[d]: just gave it a go and it made KDE start with llvmpipe
21:23gfxstrand[d]: snowycoder[d]: Based on CUDA docs, I think we want sm20 and sm35. I *think* SM30 is the same ISA as SM20.
21:24gfxstrand[d]: Actually, I think SM32 is the cutoff
21:25gfxstrand[d]: Or maybe not? I'm not sure if our `sm_for_chipset()` is right
21:28gfxstrand[d]: Okay, yeah, I think it's SM20 and SM35
21:29rinlovesyou[d]: rinlovesyou[d]: here's what happens when i try running `vulkaninfo` with your branch on my 2070 super:
21:29rinlovesyou[d]: ```WARNING: [../src/nouveau/vulkan/nvkmd/nouveau/nvkmd_nouveau_ctx.c:251] Code 0 : DRM_NOUVEAU_EXEC failed: No such device (VK_ERROR_DEVICE_LOST)
21:29rinlovesyou[d]: ERROR: [Loader Message] Code 0 : terminator_CreateDevice: Failed in ICD /usr/lib/libvulkan_nouveau.so vkCreateDevice call
21:29rinlovesyou[d]: ERROR: [Loader Message] Code 0 : vkCreateDevice: Failed to create device chain.
21:29rinlovesyou[d]: ERROR at /usr/src/debug/vulkan-tools/Vulkan-Tools/vulkaninfo/./vulkaninfo.h:1604:vkCreateDevice failed with ERROR_DEVICE_LOST```
21:29gfxstrand[d]: Okay, no, it's sm32 that's the split. That's TK1 which is definitely using the Kepler stuff.
21:29gfxstrand[d]: So we want `sm20.rs` and `sm32.rs`.
21:30gfxstrand[d]: snowycoder[d]: ^^
21:30nebadon2025[d]: I have a GTX 980m, GTX1660Ti Max-Q, RTX 3070 Ti Mobile, RTX 3090 and an RTX A5000, not sure if i can help in any way, but thats my hardware 🙂
21:30gfxstrand[d]: Gosh I wish this information were easier to find.
21:31gfxstrand[d]: nebadon2025[d]: How do you want to help?
21:32nebadon2025[d]: well I do a lot of game testing, been testing NVK as much as I can, pretty sure you follow me on Mastodon
21:32nebadon2025[d]: am a moderator over at gamingonlinux so actively involved with that
21:35nebadon2025[d]: anyway if there is something i can help out testing just let me know, if i can I will 🙂
21:39gfxstrand[d]: rinlovesyou[d]: Uh... wut? What's in dmesg?
21:40rinlovesyou[d]: Yeah gimme a sec, back on pc in a bit
21:40rinlovesyou[d]: It's very weird since it works fine with upstream, and my build script is the same so i can only assume it's the one commit they added
21:40gfxstrand[d]: `VK_ERROR_DEVICE_LOST` on init is pretty surprising. Either your GPU is already gone or we have a bad line in our init code.
21:41gfxstrand[d]: But my Turing cards are fine so that's surprising
21:41rinlovesyou[d]: Yeah works perfectly fine when i switch back to upstream
21:41gfxstrand[d]: What commit got added?
21:42gfxstrand[d]: And why are people feeling the need to carry patches?
21:42rinlovesyou[d]: mhenning[d]: check what i'm replying to, it's this fork
21:42rinlovesyou[d]: just wanted to give it a go
21:43gfxstrand[d]: Ah. Okay. I missed that.
21:43gfxstrand[d]: Too many conversations going on. :blobcatnotlikethis:
21:43rinlovesyou[d]: all good haha
21:43snowycoder[d]: Jasper[m]: Uhh, maybe in the future? Right now my 710 is doing fine but maybe other chipsets behave differently (an sm20 could be useful)
21:45rinlovesyou[d]: yeah it's definitely that commit. I'm going to see about dumping these values from the blob, i guess they just blow up on turing
21:49snowycoder[d]: gfxstrand[d]: Right now I'm doing "kepler 2" (so it should be a sm32?).
21:49snowycoder[d]: Sm20 has a completely different encoding, will do it after sm32
21:49gfxstrand[d]: Yeah, sm32
21:49Jasper[m]: <snowycoder[d]> "Jasper: Uhh, maybe in the future..." <- Sure! Any time
21:49gfxstrand[d]: old kepler is sm30
21:49gfxstrand[d]: TK1 is sm32
21:49gfxstrand[d]: new desktop Kepler is sm35 and sm37
21:49snowycoder[d]: Jasper[m]: Thank you! I'll ask it for sure for sm20 :3
21:49gfxstrand[d]: TK1 and new desktop Kepler are the same ISA
21:50snowycoder[d]: gfxstrand[d]: What a migraine 😦
21:50gfxstrand[d]: I just bought two more GPUs on eBay so I should now have a GK107, GK110, and GK208.
21:51redsheep[d]: Sounds like the kepler arc has begun
21:52gfxstrand[d]: Fortunately, my GK107 and GK208 are tiny. Not the giant monster that is my 780 Ti
21:52redsheep[d]: Never thought this day would arrive
21:52gfxstrand[d]: Old HW is fun to poke at from time to time. It's also great for onboarding new folks.
21:54redsheep[d]: If it can actually work well it would be nice not to have all of those cards end up being completely unusable on really old nvidia drivers
21:55gfxstrand[d]: The big question is what we're going to do about descriptors.
21:55redsheep[d]: What makes cpu descriptors difficult?
21:55gfxstrand[d]: Fermi can't do bindless and pre-Turing has some serious limitations on cbuf descriptors.
21:56gfxstrand[d]: redsheep[d]: Mostly it's just that that's not the way anything works right now.
21:57gfxstrand[d]: The way things are currently designed, we have a GPU descriptor buffer and we only ever write to it from the CPU. This means all the optimizations work with descriptor buffers as well.
21:58gfxstrand[d]: If we add actual CPU descriptors, it's basically new descriptor code.
21:58gfxstrand[d]: But also, I don't think we care about running DX12 titles on anything pre-Turing so it's probably fine.
21:59gfxstrand[d]: Zink being stuck on its descriptor set path isn't great, though. 😕
22:00rinlovesyou[d]: how *do* you guys make sense of these dumped command buffers
22:00gfxstrand[d]: Experience. <a:shrug_anim:1096500513106841673>
22:00rinlovesyou[d]: fair
22:00redsheep[d]: gfxstrand[d]: I mean, on the windows prop drivers maxwell dx12 works pretty alright, but that makes sense if it's just not reasonable to try to match that with all of the extra layers
22:01gfxstrand[d]: gfxstrand[d]: Which is to say that I really have no good explanation. Once you spend enough time working with a given vendor's GPUs, it all starts to make sense after a while.
22:02rinlovesyou[d]: i guess so, i haven't gone that deep into reverse engineering things, beyond some basic things. Always fascinating that you manage to make sense of what is, to me, a string of random bytes
22:02gfxstrand[d]: redsheep[d]: It's more that Maxwell and Pascal won't reclock so running a DX12 game on them doesn't make sense. Maxwell A is typically in the GTX 750s which aren't high-end cards you want to run DX12 games on. And a 780... Maybe capable of some stuff but meh?
22:03gfxstrand[d]: And Fermi can do DX12 in the same sense that Haswell can. Badly.
22:03redsheep[d]: Yeah, that's fair.
22:04redsheep[d]: Might has-well not try 🥁
22:48orowith2os[d]: gfxstrand[d]: Kepler, at least, only supports VK 1.2 on Linux, so that's fine :Shrug:
22:49Jasper[m]: ayyyy TX2 now has display and KDE is loaded (in Xorg), just no nouveau rn
22:49orowith2os[d]: actually, is there anything in hardware *stopping* Kepler from VK 1.3+ support....?
22:49orowith2os[d]: other than, absolutely crappy performance
22:50Jasper[m]: it's running llvmpipe though, cannot for the life of me find why it's not using nouveau (even when forced through xorg.conf)
22:52redsheep[d]: orowith2os[d]: The vulkan memory model extension that 1.3 requires is thought to not be possible on kepler
22:52orowith2os[d]: gfxstrand[d]: actually, question here: is nouveau allowed to use the nvidia gpu firmware for pre-Turing so long as it's shipped by nvidia? Like, install nvidia-prop, and just load the firmware from that package.
22:53orowith2os[d]: aiui, shipping it (aka, extracting it to linux-firmware) is where things always get blocked
22:53airlied[d]: the fw is embedded in the driver, and its like 5 firmwares
22:53redsheep[d]: orowith2os[d]: The nouveau kmd would have to be modified to be able to load and use that firmware, and there would be significant RE to learn how to use it, just to get a result that is potentially not upstreamable
22:54airlied[d]: and we've little to no idea how to extract them
22:54orowith2os[d]: bah
22:55redsheep[d]: I still think the idea of an out of tree nouveau fork where you have that extra work and you are expected to go and download the nvidia driver to make it work is a potential path, but that not being something that would ever just work out of the box is a drag
22:56airlied[d]: like we'd probably be fine with shipping it in nouveau if it worked, and had a fw extraction tool
22:56airlied[d]: I don't think we'd get the rights to redist the fw in l-f
22:56orowith2os[d]: have users run it manually so that the responsibility is on them?
22:56snowycoder[d]: gfxstrand[d]: Btw gfxstrand[d] some points on the ISA I've ported so far:
22:56snowycoder[d]: - There's QUADON and QUADPOP that we aren't using (they seem to be used for ddx, ddy on codegen, we're doing it differently so it might not be needed)
22:56snowycoder[d]: - We might use cache hints to speed up load/store for certain ops.
22:57redsheep[d]: If that could be upstreamed that would probably be more than good enough, given the driver boots the hardware. Just one more simple step to get it running well
22:57redsheep[d]: (from the ideal end user perspective)
22:58orowith2os[d]: maybe someone will find a vulnerability somewhere in maxwell and pascal one day that we can abuse to bypass the firmware requirements, who knows
22:58orowith2os[d]: surely nvidia wouldn't patch it out...
22:58redsheep[d]: I used to think that was an important thing to try to do, but I think all that would accomplish is letting you develop your own firmware, and why would you want to?
22:59orowith2os[d]: you'd probably end up doing even more RE, huh?
22:59redsheep[d]: I guess you could write one from scratch you can redistribute but that sounds awful
22:59orowith2os[d]: orowith2os[d]: compared to just extracting it from the nvidia driver on the end-users system and using that
22:59orowith2os[d]: linux-libre-like consequences
23:00orowith2os[d]: writing your own firmware that doesn't work :v
23:00Jasper[m]: Is there an easy way to test nvk while not having to completely figure out Fedora's packaging system?
23:00orowith2os[d]: was it linux-libre, or something else. I forget
23:00Jasper[m]: The TX2 is now booting
23:00orowith2os[d]: mesa-git and run it from there
23:00orowith2os[d]: or build mesa and `meson devenv`
23:01orowith2os[d]: there's a copr somewhere if you're fine with that
23:01redsheep[d]: Jasper[m]: If you can manage to build it but not install it you can use the ICD path environment variable to point individual processes at the build
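A concrete sketch of that approach (the json path assumes a default mesa build tree; `VK_DRIVER_FILES` is the Vulkan loader's driver-list variable):
```
# point the Vulkan loader at the freshly built, uninstalled NVK driver
VK_DRIVER_FILES=$HOME/mesa/build/src/nouveau/vulkan/nouveau_icd.x86_64.json vkcube
```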
23:02Jasper[m]: If anything specific needs testing as well, that would be good to know. I heard some stuff about CTSes as well.
23:02Jasper[m]: No idea how far along pascal support is supposed to be though
23:03gfxstrand[d]: orowith2os[d]: Doesn't work if it's signed (which it is)
23:03gfxstrand[d]: Jasper[m]: Just build anywhere and use `meson devenv`
23:03orowith2os[d]: gfxstrand[d]: see the "if someone finds a security vulnerability" comment
23:03gfxstrand[d]: ah
23:03orowith2os[d]: I doubt it'll happen though
23:04orowith2os[d]: maybe someone could extract the firmware and crack the key to it :v
23:04orowith2os[d]: anybody here with a quantum computer? /j
23:05gfxstrand[d]: Just get one of those GB racks. How good could their crypto have been back during the Maxwell days? 😂
23:05gfxstrand[d]: But if you have one of those, why are you hacking on Maxwell? 😂
23:08redsheep[d]: orowith2os[d]: Honestly I would be surprised if cracking it to put your own firmware on wasn't something already done by the crypto mining people. I thought I had heard something about exactly that but I couldn't find it later when I went looking.
23:09redsheep[d]: Still I view it as not being at all the ideal path to actually having an actionable way to address reclocking. Better to have it be like gsp, where we use what nvidia does.
23:09orowith2os[d]: https://www.reddit.com/r/linux_gaming/comments/15xiief/nvidia_bios_signature_lock_broken_what_caused/
23:09orowith2os[d]: this?
23:09redsheep[d]: Even if there's a step users have to take
23:11mhenning[d]: snowycoder[d]: Re: the cache hints, I think nouveau gl actually sets some cache flags in cases where we can't legally use them because the caches aren't coherent enough. (specifically, I remember this being a problem with ldg) So, be careful about copying what nouveau gl does for those
23:13airlied[d]: anyone got an example of using nvfuzz?
23:15redsheep[d]: orowith2os[d]: I think that's maybe what I was thinking of. Aren't vBIOS and the firmware different things though? I am not clear on whether this actually changes anything, even assuming someone wanted to create redistributable firmware from scratch
23:15orowith2os[d]: I'm not sure, but it's something
23:15airlied[d]: yeah vbios and fw don't overlap here
23:16Jasper[m]: @_oftc_gfxstrand[d]:matrix.org is there anything other than nvk/tegra I'd need to cherrypick for gp10?
23:19redsheep[d]: So yeah either live with trying to make non-redistributable firmware work, or somebody here needs to become a billionaire so we can argue there's a business case for nvidia creating better redistributable firmware 😛
23:20phomes_[d]: mhenning[d]: ```| API    | Game                             | Main | Mel  |
23:20phomes_[d]: | ------ | -------------------------------- | ---- | ---- |
23:20phomes_[d]: | vkd3d  | Age of empires IV                | 147  | 158  |
23:20phomes_[d]: | vkd3d  | Hitman benchmark                 | 67.5 | 67.4 |
23:20phomes_[d]: | vkd3d  | Atomic heart (looking into wall) | 34   | 38   |
23:20phomes_[d]: | vkd3d  | Deep Rock Galactic               | 53   | 59   |
23:20phomes_[d]: | dxvk   | Recipe for disaster              | 100  | 117  |
23:20phomes_[d]: | dxvk   | Palworld                         | 21   | 25   |
23:20phomes_[d]: | vulkan | Serious Sam 2017                 | 276  | 282  |
23:20phomes_[d]: | vulkan | Sniper elite 5                   | 36   | 39   |
23:20phomes_[d]: | vulkan | The Surge 2                      | 38   | 38   |
23:20phomes_[d]: | vulkan | X4 Foundations                   | 16   | 16   |
23:20phomes_[d]: | vulkan | Parkitect                        | 58   | 59   |```
23:21mhenning[d]: phomes_[d]: Thanks for testing, that looks like a good improvement
23:21mhenning[d]: phomes_[d]: What card do you have?
23:22phomes_[d]: this is on a 4070
23:24phomes_[d]: I have a sheet where I track these games roughly once a week to spot regressions or improvements. I should perhaps put that somewhere public
23:25redsheep[d]: If you want a tab on the game tracker that might be a good place
23:25mhenning[d]: phomes_[d]: That's cool, thanks for doing that
23:26gfxstrand[d]: airlied[d]: `nvfuzz SM70 72..74 XXXXXXXX XXXXXXXX XXXXXXXXX XXXXXXXXX`
23:26gfxstrand[d]: Jasper[m]: That should be it. gp10 should work on upstream apart from the cache flushing stuff.
23:31Jasper[m]: Thanks! It's compiling right now. Probably not gonna finish for a while, but if it works I'll report back
23:32Jasper[m]: old cores, 8GB RAM and an sdcard for storage is not a good combo
23:36Jasper[m]: Including 2 of those weird ass translator cores
23:39gfxstrand[d]: It really isn't
23:39gfxstrand[d]: I've got a few really fast USB drives for when I need to compile shit on Arm.
23:39gfxstrand[d]: Because unless it's got an M.2 slot, I/O sucks.
23:41Jasper[m]: If this ends up being something I do more often then yes, I'll probably switch over
23:42Jasper[m]: Don't have a PCIe x4 to NVMe adapter though
23:43magic_rb[d]: Rk3588 board are really good for arm things, they tend to have m.2 and also 2.5g ethernet, you could even do fully network mounted and it might not be that noticeable
23:43magic_rb[d]: \*boards
23:43gfxstrand[d]: Some of them have eMMC chips that are alright but those are usually a headache to flash.
23:44Jasper[m]: I have a ms devkit I haven't set up yet, that's about the most performance I have in a desktop/headless setup
23:45Jasper[m]: Though apparently that new Radxa Orion O6 board may be interesting. It's mini-itx and well enough set up to have gpu's work
23:46Jasper[m]: *gpus
23:46Jasper[m]: I have a chronic case of Dutch
23:46HdkR: I got the Orion with a Radeon in it, can confirm that it works pretty well.
23:46magic_rb[d]: Jasper[m]: Oh youre dutch? Where are you located
23:47Jasper[m]: The grey area inbetween Brabant, Belgium and Germany
23:47Jasper[m]: You were active in 4d1 right? I seem to remember your name
23:48magic_rb[d]: Nope, no clue what is 4d1
23:48magic_rb[d]: Im in amsterdam, originally from slovakia
23:49Jasper[m]: magic_rb[d]: Hm, might be a different person
23:49Jasper[m]: magic_rb[d]: Ahh yeah I'm quite a ways away from that
23:50magic_rb[d]: Probably, or im just having a major case of amnesia, but im pretty sure ive never been to that region of the EU :P
23:50magic_rb[d]: I only passed through there on my way to fosdem i guess
23:50Jasper[m]: Yeah possibly, I live near Eindhoven basically
23:51magic_rb[d]: I might be in eindhoven first week of april, want to grab a beer? Could talk tech
23:51snowycoder[d]: mhenning[d]: I only plumb through what is already available in the IR (we don't have cache hints in NAK yet)
23:51mohamexiety[d]: airlied[d]: ok having looked around some more today.. how do we want to do this? :thonk:
23:51mohamexiety[d]: first thing in order is to relax the requirement in the nouveau_bo.c code. but I am not sure what else needs changing after that tbh. where does the whole eviction/moving stuff code lie, and how does it interact with this stuff? I can't actually find any tie between `nouveau_bo->page` and host RAM things in the backend code which is scary
23:52mohamexiety[d]: like I have been digging through things and following threads of functions but I cant seem to come up with a plan
23:53mohamexiety[d]: do I just relax the requirement and watch how things explode? :KEKW:
23:55mohamexiety[d]: (also, things that rely on `NOUVEAU_GEM_DOMAIN_GART` dont seem to interact with this stuff as well aside from things in nouveau_bo.c)
23:55snowycoder[d]: mhenning[d]: also, can I ask something about the kepler ISA since you have experience?
23:55snowycoder[d]: `IADD.X` doesn't seem to use the carry flag (the iadd64 hw_test fails when carry=1).
23:55snowycoder[d]: could it be because between the `IADD` that sets the flag and the `IADD.X` that consumes it there's a scheduling instruction?
23:55snowycoder[d]: Otherwise I'm out of ideas
23:58mhenning[d]: snowycoder[d]: I wouldn't expect a scheduling instruction to mess that up (they're not really instructions, they're just encoded that way)
23:59mhenning[d]: Does NAK_DEBUG=serial help?
23:59mhenning[d]: It might be that we need longer instruction latencies, I think they've tended to go down over time, so kepler might need longer ones than what we're using
23:59Jasper[m]: @_oftc_magic_rb[d]:matrix.org I might, but I currently have no idea what I'm doing then