03:07 airlied[d]: there is no opt algebraic for intrinsics?
03:12 gfxstrand[d]: No, there isn't
03:12 gfxstrand[d]: What intrinsic did you have in mind?
03:15 airlied[d]: the one I'm adding for ldsm
03:34 gfxstrand[d]: Yeah, you have to write lowering/optimization code for that
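Since opt_algebraic only matches ALU expressions, intrinsic rewrites usually live in their own NIR pass. A rough sketch of the shape such a pass takes, assuming a reasonably recent Mesa NIR; the pass name and the intrinsic being matched are placeholders, not the actual ldsm lowering:
```c
#include "nir.h"
#include "nir_builder.h"

/* Hypothetical pass skeleton: match one intrinsic and rewrite it.  The real
 * ldsm optimization would inspect the intrinsic's sources/uses here and
 * build replacement instructions with the nir_builder. */
static bool
lower_foo_intrin(nir_builder *b, nir_intrinsic_instr *intrin, void *data)
{
   if (intrin->intrinsic != nir_intrinsic_load_shared) /* stand-in intrinsic */
      return false;

   b->cursor = nir_before_instr(&intrin->instr);
   /* ... emit the replacement, rewrite uses of the old def with
    * nir_def_rewrite_uses(), and remove the old instruction ... */

   return true; /* report progress */
}

bool
nak_nir_lower_foo(nir_shader *shader)
{
   return nir_shader_intrinsics_pass(shader, lower_foo_intrin,
                                     nir_metadata_none, NULL);
}
```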
04:06 airlied[d]: mhenning[d]: have you tried making the ldc patterns a(is_not_const)? I think it helps with some address calcs for me here
04:20 airlied[d]: ripping out the membar now gets me up to 17TF
04:28 mhenning[d]: airlied[d]: not sure what you mean by "ldc patterns" - ldc is generated by a lowering pass, not opt_algebraic
04:28 airlied[d]: in the nak algebraic
04:29 airlied[d]: (('iadd(is_used_by_non_ldc_nv)', 'a@32(is_not_const)', ('ishl', 'b@32', '#s@32')),
04:29 airlied[d]: ('lea_nv', a, b, s), 'nak->sm >= 70'),
04:29 airlied[d]: sorry lea patterns
04:29 mhenning[d]: Oh, the lea patterns?
04:29 airlied[d]: brain has too many TLAs
04:29 mhenning[d]: No, I don't think I tried that
04:30 mhenning[d]: the `is_used_by_non_ldc_nv` comes from avoiding some shaderdb regressions from those commits, but I didn't try too many variations for the patterns
04:31 airlied[d]: it seems to fold into some shared stores for me
04:31 airlied[d]: though that might be another thing I just did
04:31 airlied[d]: now that I read it
04:32 airlied[d]: but I needed that to avoid lea happening when I didn't want it
04:32 airlied[d]: I added some opt_algebraic patterns for ushr/iadd combos
04:32 mhenning[d]: Ah, yeah it's possible that `is_used_by_non_ldc_nv` check should be more general than just ldc - a few different loads can fold additions into them in the same way
04:38 mhenning[d]: To be a little more specific, it's possible the `is_used_by_non_ldc_nv` should also check for load_shared/load_scratch/load_global since I think they all have similar addressing modes on the hardware
04:40 airlied[d]: ah yes that might make sense
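A rough sketch of what that more general search helper could look like, following the usual nir_search helper calling convention and assuming a recent NIR where ALU instructions expose `def` directly. The helper name is made up, and whether NAK's constant-load intrinsic is spelled `ldc_nv` is an assumption that would need to match NAK's intrinsic definitions:
```c
#include "nir.h"
#include "nir_search_helpers.h"

/* Hypothetical generalization of is_used_by_non_ldc_nv: only block the
 * lea_nv fold when every use of the iadd is a load that can fold the add
 * into its own addressing mode. */
static bool
is_used_by_non_address_load_nv(UNUSED struct hash_table *ht,
                               const nir_alu_instr *alu,
                               UNUSED unsigned src,
                               UNUSED unsigned num_components,
                               UNUSED const uint8_t *swizzle)
{
   nir_foreach_use(use, &alu->def) {
      nir_instr *parent = nir_src_parent_instr(use);
      if (parent->type != nir_instr_type_intrinsic)
         return true;

      switch (nir_instr_as_intrinsic(parent)->intrinsic) {
      case nir_intrinsic_ldc_nv:        /* ldc (assumed spelling) */
      case nir_intrinsic_load_shared:   /* lds */
      case nir_intrinsic_load_scratch:  /* ldl */
      case nir_intrinsic_load_global:   /* ldg */
         break; /* these can fold the addition themselves */
      default:
         return true;
      }
   }
   return false;
}
```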
14:58 tiredchiku[d]: notthatclippy[d]: question for you: `/dev/nvidiactl` and `/dev/nvidia0`. I gather the former is for managing all nvidia GPUs on a system, and the latter for the first one plugged in, but is it possible to control `nvidia0` from `nvidiactl`? or am I expected to get a GPU handle and use that to figure out which GPU to refer to
14:59 marysaka[d]: I think all `nvidiaX` instances can be used to refer to any NVIDIA GPUs on the system but I might be wrong
15:02 tiredchiku[d]: trying to understand how to send commands/ioctls to openrm
15:02 tiredchiku[d]: and that has me a bit confused
15:03 tiredchiku[d]: looked at other projects that also do the same (LACT, libva-nvidia-driver), but they've hardcoded /dev/nvidia0
15:03 tiredchiku[d]: or rather, they use both fds
15:05 tiredchiku[d]: ..wait
15:05 tiredchiku[d]: I think LACT's codebase answers my question a bit
15:06 tiredchiku[d]: let device_fd = File::options()
15:06 tiredchiku[d]:     .read(true)
15:06 tiredchiku[d]:     .write(true)
15:06 tiredchiku[d]:     .open(format!("/dev/nvidia{minor_number}"))
15:06 tiredchiku[d]:     .context("Could not open nvidia device")?;
15:07 tiredchiku[d]: :doomthink:
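For reference, the C equivalent of what LACT is doing there is just a pair of open() calls; a minimal sketch, with the minor number hardcoded to 0 purely for illustration (real code would discover it):
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Control node: used for allocating the RM client and most control calls. */
    int ctl_fd = open("/dev/nvidiactl", O_RDWR);

    /* Per-GPU node; the minor number is an assumption here and would normally
     * be discovered (e.g. by scanning /dev or querying card info). */
    char path[32];
    snprintf(path, sizeof(path), "/dev/nvidia%d", 0);
    int dev_fd = open(path, O_RDWR);

    if (ctl_fd < 0 || dev_fd < 0) {
        perror("open");
        return 1;
    }

    /* ... talk to the driver via ioctls on these fds ... */

    close(dev_fd);
    close(ctl_fd);
    return 0;
}
```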
15:41 mohamexiety[d]: gfxstrand[d]: airlied[d] skeggsb9778[d]
15:41 mohamexiety[d]: heyy, with all the znvk things out of the way, going back to the page table stuff. last time we had the issue of `nouveau_bo->page` causing issues due to nvk setting `GART | LOCAL` on all allocations, which made the `nouveau_bo` code force 4KiB physical pages which in turn caused issues since the virtual page size was no longer 4KiB. the suggestion at the time was removing `nouveau_bo->page`
15:41 mohamexiety[d]: entirely, because it's not really something in the HW anyway. I did set out to do that, but held off because a few things came up:
15:41 mohamexiety[d]: - the biggest issue: removing it means that every time we map memory, we'd have to check every region (since allocations aren't contiguous) to make sure each block is aligned. The current way lets us do that only once, when the memory block is allocated.
15:41 mohamexiety[d]: - IIUC we don't _need_ to enforce 4KiB pages just because `GART` is set. we only need to do that when things actually are in sysmem.
15:41 mohamexiety[d]: - also IIUC, what `nouveau_bo->page` enforces is a maximum page size, not the actual page size. so even with `GART` we should have leeway to set a large maximum.
15:41 mohamexiety[d]: the thing is, while this does tell me that this probably isn't a good path forwards, it doesn't exactly tell me where to go from here :thonk:. from where I am standing after research and playing around, there seem to be 2 options:
15:41 mohamexiety[d]: - change the semantics of `GART` so that we only force 4KiB when things are actually in sysmem.
15:41 mohamexiety[d]: - stick to the current way of forcing 4KiB when the flag is set, but silently promote to bigger pages anything that is in VRAM (and demote it under high pressure, etc).
15:41 mohamexiety[d]: they both ultimately lead to the same thing but the method is a bit different. and honestly I am not sure how to go about doing either of them
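A toy sketch to make the second option concrete; this is not nouveau code, and the supported page sizes and the promotion rule are assumptions:
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the largest virtual page shift a mapping can
 * use.  Only clamp to 4KiB when the memory actually lives in sysmem; VRAM
 * placements may be silently promoted to whatever the region's address and
 * size alignment allows. */
static unsigned pick_page_shift(bool in_sysmem, uint64_t addr, uint64_t size)
{
    /* Page sizes assumed here: 2MiB, 64KiB, 4KiB. */
    static const unsigned shifts[] = { 21, 16, 12 };

    if (in_sysmem)
        return 12; /* keep the current 4KiB behaviour for GART-backed memory */

    for (unsigned i = 0; i < 3; i++) {
        uint64_t mask = (1ull << shifts[i]) - 1;
        if (((addr | size) & mask) == 0)
            return shifts[i];
    }
    return 12;
}
```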
15:49 notthatclippy[d]: tiredchiku[d]: If only it were simple... Very handwavy: /dev/nvidiaN is for initializing the GPU and mmapping memory, while nvidiactl is for most other things. There are probably dozens if not hundreds of exceptions in either direction. Search for the `NV_CTL_DEVICE_ONLY()` and `NV_ACTUAL_DEVICE_ONLY()` macros to see some of these.
15:50 tiredchiku[d]: pain
15:50 karolherbst[d]: ✨ technical debt ✨
15:51 tiredchiku[d]: also, to send an ioctl to the kernel modules, apparently I need a bunch of other stuff?
15:51 tiredchiku[d]: I see
15:51 tiredchiku[d]: NV0000_CTRL_OS_UNIX_EXPORT_OBJECT_TO_FD_PARAMS params = {
15:51 tiredchiku[d]:     .fd = export_fd,
15:51 tiredchiku[d]:     .flags = 0,
15:51 tiredchiku[d]:     .object = {
15:51 tiredchiku[d]:         .type = NV0000_CTRL_OS_UNIX_EXPORT_OBJECT_TYPE_RM,
15:51 tiredchiku[d]:         .data.rmObject = {
15:51 tiredchiku[d]:             .hDevice = hDevice,
15:51 tiredchiku[d]:             .hParent = hParent,
15:51 tiredchiku[d]:             .hObject = hObject
15:51 tiredchiku[d]:         }
15:51 tiredchiku[d]:     }
15:51 tiredchiku[d]: };
15:51 tiredchiku[d]: plastered into nvidia-vaapi-driver
15:52 tiredchiku[d]: and `const int ret = nv_rm_control(context->nvctlFd, context->clientObject, context->clientObject, NV0000_CTRL_CMD_GPU_GET_UUID_FROM_GPU_ID, 0, sizeof(uuidParams), &uuidParams);` just to get the GPU UUID
15:52 notthatclippy[d]: Depends on what you want to send and how robust you want to be.
15:52 tiredchiku[d]: where nv_rm_control is this function: https://github.com/elFarto/nvidia-vaapi-driver/blob/c519e97ef7af581c109f49b6973269fb16d1bc54/src/direct/nv-driver.c#L101C1-L120C2
15:52 tiredchiku[d]: notthatclippy[d]: huh
15:53 notthatclippy[d]: I keep this as a minimal example of something that talks to a GPU using nvidia.ko: https://gist.github.com/mtijanic/9c129900bfba774b39914ad11b0041f6
15:55 notthatclippy[d]: It performs the minimal setup necessary to talk to a GPU and then instructs it to run at the loudest pstate. It should work on most consumer single nv-gpu systems, but... if we were to write an internal version of the same thing, it'd probably be 10x the number of ioctls to enumerate all GPUs, pick which one to use, handle gaps in the numbering due to hotplugging, etc.
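For reference, both that gist and nvidia-vaapi-driver's `nv_rm_control` boil down to the same thing: wrap the control command in an `NVOS54_PARAMETERS` struct and issue `NV_ESC_RM_CONTROL` on the nvidiactl fd. A hedged sketch, assuming the headers from open-gpu-kernel-modules are on the include path (exact header paths may vary):
```c
#include <stdint.h>
#include <sys/ioctl.h>

#include "nvos.h"             /* NVOS54_PARAMETERS, NvP64 */
#include "nv_escape.h"        /* NV_ESC_RM_CONTROL */
#include "nv-ioctl-numbers.h" /* NV_IOCTL_MAGIC */

/* Returns the RM status (NV_OK == 0) on success, or -1 if the ioctl itself
 * failed. */
static int rm_control(int nvctl_fd, uint32_t hClient, uint32_t hObject,
                      uint32_t cmd, void *params, uint32_t params_size)
{
    NVOS54_PARAMETERS ctrl = {
        .hClient    = hClient,
        .hObject    = hObject,
        .cmd        = cmd,
        .params     = (NvP64)(uintptr_t)params,
        .paramsSize = params_size,
    };

    if (ioctl(nvctl_fd,
              _IOWR(NV_IOCTL_MAGIC, NV_ESC_RM_CONTROL, NVOS54_PARAMETERS),
              &ctrl) < 0)
        return -1;

    return (int)ctrl.status;
}
```
The UUID query above is then just `rm_control(nvctlFd, clientObject, clientObject, NV0000_CTRL_CMD_GPU_GET_UUID_FROM_GPU_ID, &uuidParams, sizeof(uuidParams))`.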
15:55 tiredchiku[d]: ok yeah, makes sense
15:56 tiredchiku[d]: I probably wanna aim for the latter, but at a proof of concept stage I should stick with getting the former running
15:56 tiredchiku[d]: i.e. minimal
15:58 notthatclippy[d]: Our internal userspace "libnvrmapi", so to speak, is around 7000 lines of code, and it exposes an interface similar to this to other userspace drivers: <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/rmapi/entry_points.c#L39-L70>
15:59 tiredchiku[d]: interesting
16:00 tiredchiku[d]: I'll go through that tomorrow, thanks :D
16:13 avhe[d]: tiredchiku[d]: This is my init code: https://paste.centos.org/view/2b3c5242
16:13 avhe[d]: I've tried to keep hardcoding to a minimum, basically looks for the first available device and opens the interfaces using data that's provided by NV_ESC_CARD_INFO and NV0000_CTRL_CMD_GPU_GET_ID_INFO_V2
16:16 avhe[d]: BTW notthatclippy[d] since you're around, I'm curious how complete Tegra support is in OGKM? I see a lot of special-casing in the codebase but I haven't heard of anyone actually using it
16:21 notthatclippy[d]: Getting both to build from the same external repo is a thing we want to do, but it's not there yet.
16:22 notthatclippy[d]: Internally it's the same codebase, but the builds and packaging are handled differently so what gets published are separate repos that aren't at all easy to merge.
16:24 notthatclippy[d]: Whereas by Tegra I mean just stuff like Orin that actually uses this code. For most of the other Tegra stuff, the upstream Linux driver is the source of truth AFAIK
16:27 avhe[d]: I see thanks
16:28 avhe[d]: It would be nice if older models were supported too... using my Jetson Nano is very painful on the super outdated Ubuntu L4T package
16:33 notthatclippy[d]: I lost track, but I don't think that actually uses RM in production at all. IIRC Orin was the first one in the last 10 years or so that shared the code.
16:33 notthatclippy[d]: And it's just the display related functionality that it shared.
16:34 avhe[d]: Yeah tegras use another kernel driver for most stuff. I don't think it's upstream, though they did upstream some uapi headers which is at least something
16:35 avhe[d]: Given that they broke the uapi between models before
16:37 notthatclippy[d]: Yeah, sorry, I probably have no clue what state that is upstream. It's all a different team, so to speak. We only interacted with those chips in presillicon.
16:38 karolherbst[d]: avhe[d]: there is a kernel tree for android, which has its own acceleration driver, but the tegra side of things is actually upstream. There was this google pixel product where they used nouveau in production on tegra
16:39 avhe[d]: Oh yeah the pixel C? Never knew this was running nouveau
16:39 karolherbst[d]: it's the reason why upstream nouveau even has tegra support somewhat
16:41 Jasper[m]: @_oftc_karolherbst[d]:matrix.org not the K1 technically? It released first and landed in ChromeOS devices pre-Pixel C
16:42 karolherbst[d]: not sure tbh
16:42 Jasper[m]: Also from what I gathered nvk got briefly looked at by @_oftc_marysaka[d]:matrix.org and @_oftc_gfxstrand[d]:matrix.org. I think what was perceived as a big hurdle got fixed?
16:43 karolherbst[d]: it needs somebody caring enough to fix it all up
16:43 karolherbst[d]: _but_ I think the same issues would also happen on any nvidia GPU with unified memory, just.. nvidia stopped doing those
16:44 karolherbst[d]: tesla was the last gen with those?
16:44 avhe[d]: Won't the digits have unified memory?
16:45 mohamexiety[d]: it does yeah
16:45 mohamexiety[d]: part of its whole appeal -- 128GB of unified memory
16:47 Jasper[m]: @_oftc_karolherbst[d]:matrix.org if my context clue recognition is good enough, it looks like that's being looked at: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33959 (when it comes back)
16:55 tiredchiku[d]: avhe[d]: thaaanks, it'll be helpful :3
16:56 avhe[d]: NP
17:00 gfxstrand[d]: And... I think that's the last Maxwell A bug fix.
17:01 gfxstrand[d]: If this 4x CTS run is good (it'll take a while), I'll start looking into putting together actual conformance packages for Maxwell+
17:01 gfxstrand[d]: Volta still has some fp64 issues, though, so IDK if that'll get conformance just yet.
17:01 gfxstrand[d]: But I *think* this should be good for Maxwell A through Pascal
17:02 marysaka[d]: nice, if you want I could pull out my TX1 for full CTS testing :nya_peek:
17:03 gfxstrand[d]: Oh, I didn't say I had Tegra fixed. 😛
17:03 gfxstrand[d]: This is Maxwell A desktop
17:04 marysaka[d]: right right I may be too excited for it :linatehe:
17:26 mohamexiety[d]: lets gooo! <a:vibrate:1066802555981672650>
17:46 gfxstrand[d]: I just ordered a TX1 on eBay so I'll be able to test Tegra stuff soon
17:46 gfxstrand[d]: I really need to quit my eBay habit. I have an entire plastic file box of GPUs. 😂
17:51 marysaka[d]: that's a mood >.>
17:52 mohamexiety[d]: :nervous:
17:56 HdkR: I feel called out from my pile of ARM devices. :D
18:00 Jasper[m]: @_oftc_gfxstrand[d]:matrix.org no Switchin'? :p
18:01 Jasper[m]: Considering there's some movement I will try to set my TX2 up with some haste. For TX1 I don't really have a nicely working device
18:02 Jasper[m]: Well I do, but the Switch is not a workhorse for me and the pixel is not that great for this job
18:06 gfxstrand[d]: I've got a Switch with Fedora on it.
18:06 gfxstrand[d]: But most of the Switch Fedora images have ancient kernels. :blobcatnotlikethis:
18:07 gfxstrand[d]: But someone's working on a modern, upstream nouveau-capable refresh
18:08 Jasper[m]: @_oftc_gfxstrand[d]:matrix.org I know, I'm in the Discord as well :p
18:08 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351255877071405126/rn_image_picker_lib_temp_d60f9d38-5a76-4aa5-ada0-629672abe264.jpg?ex=67d9b69a&is=67d8651a&hm=417fd839f3200d957f66738941a86a8a8a136b3f70c72f8edb9438da1bb84db5&
18:08 gfxstrand[d]: And that doesn't even include the GPUs that are currently plugged into desktops or the two eBay orders I have coming.
18:08 Jasper[m]: The released images with GPU Acceleration are based on Nvidia's downstream kernel which is 5.10 max iirc
18:09 gfxstrand[d]: Yeah. I need 6.6 at a minimum. Preferably torvalds/master
18:09 Jasper[m]: The tree they have for mainline is at newest 6.9 iirc, unless they updated to 6.13 at your request
18:09 gfxstrand[d]: 6.9 is useable
18:10 gfxstrand[d]: I can back/forward-port patches as needed to develop on 6.9
18:10 gfxstrand[d]: But also, I plan to use my TX1 and that should be able to run vanilla
18:11 Jasper[m]: Ah, checked #gitlab in their discord, they rebased to 6.13
18:11 gfxstrand[d]: Sweet
18:11 airlied[d]: gfxstrand[d]: Nice start, I've got a whole office full 🙂
18:12 gfxstrand[d]: https://tenor.com/view/jayma-mays-avalanche-epic-movie-narnia-wardrobe-gif-11624865
18:12 Jasper[m]: Also some ci/cd stuff to poop out an image I think? Lemme check.
18:13 airlied[d]: I threw out most pre dx10 stuff, and it's still that picture 🙂
18:13 gfxstrand[d]: lol
18:13 gfxstrand[d]: I made the mistake of buying myself a new Haswell on eBay
18:13 gfxstrand[d]: I probably shouldn't admit to that publicly
18:13 airlied[d]: I bought a Mac mini m2
18:13 airlied[d]: And an m4
18:13 airlied[d]: I bought the M2 just to debug the m4
18:13 gfxstrand[d]: I might be getting one of those. People want me to help out with Rust DRM stuff
18:15 airlied[d]: I think better power infrastructure should be my next investment
18:15 mohamexiety[d]: every day we inch closer to the Faith kernel dev arc :p
18:15 airlied[d]: How many power strips can you chain before your house insurance invalidates itself
18:15 gfxstrand[d]: No!
18:15 gfxstrand[d]: No kernel dev arc!
18:15 gfxstrand[d]: I already did that
18:16 gfxstrand[d]: Prototyping Xe was enough kernel dev for me, thanks.
18:17 gfxstrand[d]: I'm feeling good about this Maxwell run:
18:17 gfxstrand[d]: `Pass: 456244, Skip: 708752, Timeout: 4, Duration: 1:22:13, Remaining: 1:56:20`
18:18 Jasper[m]: @_oftc_gfxstrand[d]:matrix.org https://gitlab.com/l4t-community/gnu-linux/switchroot-pipeline/-/jobs/9391857453/artifacts/browse/nouveau/ here's the kernel stuff from that 6.9 build they did. Tree's here: https://gitlab.com/l4t-community/kernel/mainline/linux/-/branches (I think they compiled the Mariko branch, but pretty sure it includes patches for Icosa aswell)
18:18 gfxstrand[d]: IDK if I should submit Maxwell A and Maxwell B conformance separately or not. They feel pretty different.
18:18 gfxstrand[d]: Jasper[m]: Yeah, I've been chatting with folks over in their Discord
18:18 Jasper[m]: (Keep in mind, I have no idea how well this works if at all)
18:19 Jasper[m]: Ahh nice, didn't see any more traffic in fedora-support, but I guess y'all switched (pun not intended) to a different channel
18:19 gfxstrand[d]: We've been chatting in testing
18:20 Jasper[m]: Off limits to me, sad
22:23 orowith2os[d]: Is RE4 supposed to be running at, uh, 13 FPS on a 3090 with GSP (supposedly) enabled?
22:23 orowith2os[d]: I have someone running a session with NVK and the GSP cmdline added, and it's stuck there
22:28 esdrastarsis[d]: orowith2os[d]: 2005 or Remake?
22:28 orowith2os[d]: Remake, I'm assuming. It's using VKD3D
22:29 orowith2os[d]: Here's a screenshot of them running it:
22:29 orowith2os[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1351321506080358420/image0.png?ex=67d9f3b9&is=67d8a239&hm=5c4d4303eb0cfa4557c39315517af0ae62db02c852485a90d546c96815b083ea&
23:15 gfxstrand[d]: Ideally, no, but there's a lot of work yet to do on VKD3D perf
23:19 nebadon2025[d]: waves~~
23:19 orowith2os[d]: (this is the someone in question)
23:19 orowith2os[d]: :thumbsup:
23:34 nebadon2025[d]: does this look correct?
23:34 nebadon2025[d]: ```nebadon@CERULEAN:~/install$ inxi -G
23:34 nebadon2025[d]: Graphics:
23:34 nebadon2025[d]: Device-1: NVIDIA GA102 [GeForce RTX 3090] driver: nouveau v: kernel
23:34 nebadon2025[d]: Display: wayland server: Xwayland v: 24.1.6 compositor: kwin_wayland
23:34 nebadon2025[d]: driver: gpu: nouveau resolution: 3840x2160~60Hz
23:34 nebadon2025[d]: API: EGL v: 1.5 drivers: nouveau,swrast
23:34 nebadon2025[d]: platforms: gbm,wayland,x11,surfaceless,device
23:34 nebadon2025[d]: API: OpenGL v: 4.5 compat-v: 4.3 vendor: mesa v: 25.1.0-devel
23:34 nebadon2025[d]: renderer: NV172
23:34 nebadon2025[d]: API: Vulkan v: 1.4.310 drivers: N/A surfaces: xcb,xlib,wayland
23:34 nebadon2025[d]: Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
23:34 nebadon2025[d]: de: kscreen-console,kscreen-doctor wl: wayland-info x11: xdriinfo,
23:34 nebadon2025[d]: xdpyinfo, xprop, xrandr
23:34 nebadon2025[d]: ```
23:34 nebadon2025[d]: not sure what this is about `drivers: nouveau,swrast`
23:34 nebadon2025[d]: swrast?
23:39 nebadon2025[d]: I can't get anything to run if I use mangohud
23:42 orowith2os[d]: Swrast is just a software renderer
23:42 mhenning[d]: The Vulkan ... drivers: N/A looks wrong to me
23:43 nebadon2025[d]: I'm using the che mesa repo for Fedora
23:43 nebadon2025[d]: not sure if there is a better option.. it's what I have been using for a while though
23:44 nebadon2025[d]: https://copr.fedorainfracloud.org/coprs/che/mesa/
23:44 nebadon2025[d]: this one
23:45 nebadon2025[d]: mangohud is definitely not working now so odd
23:45 nebadon2025[d]: I even went back to the version I had working a bit ago, and that isn't working now either, hmm
23:45 nebadon2025[d]: vkcube runs though
23:45 orowith2os[d]: What does vulkaninfo say?
23:46 nebadon2025[d]: 800 pages?
23:46 nebadon2025[d]: lol
23:46 nebadon2025[d]: no errors though
23:46 nebadon2025[d]: ```Devices:
23:46 nebadon2025[d]: ========
23:46 nebadon2025[d]: GPU0:
23:46 nebadon2025[d]: apiVersion = 1.4.309
23:46 nebadon2025[d]: driverVersion = 25.0.99
23:47 nebadon2025[d]: vendorID = 0x10de
23:47 nebadon2025[d]: deviceID = 0x2204
23:47 nebadon2025[d]: deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
23:47 nebadon2025[d]: deviceName = NVIDIA GeForce RTX 3090 (NVK GA102)
23:47 nebadon2025[d]: driverID = DRIVER_ID_MESA_NVK
23:47 nebadon2025[d]: driverName = NVK
23:47 nebadon2025[d]: driverInfo = Mesa 25.1.0-devel
23:47 nebadon2025[d]: conformanceVersion = 1.4.0.0
23:47 nebadon2025[d]: deviceUUID = 7201de10-0422-0000-0100-000100000000
23:47 nebadon2025[d]: driverUUID = 69c28a97-7492-a05c-8afc-9dd5e25ebbca
23:47 nebadon2025[d]: GPU1:
23:47 nebadon2025[d]: apiVersion = 1.4.309
23:47 nebadon2025[d]: driverVersion = 25.0.99
23:47 nebadon2025[d]: vendorID = 0x10005
23:47 nebadon2025[d]: deviceID = 0x0000
23:47 nebadon2025[d]: deviceType = PHYSICAL_DEVICE_TYPE_CPU
23:47 nebadon2025[d]: deviceName = llvmpipe (LLVM 19.1.7, 256 bits)
23:47 nebadon2025[d]: driverID = DRIVER_ID_MESA_LLVMPIPE
23:47 nebadon2025[d]: driverName = llvmpipe
23:47 nebadon2025[d]: driverInfo = Mesa 25.1.0-devel (LLVM 19.1.7)
23:47 nebadon2025[d]: conformanceVersion = 1.3.1.1
23:47 nebadon2025[d]: deviceUUID = 6d657361-3235-2e31-2e30-2d6465766500
23:47 nebadon2025[d]: driverUUID = 6c6c766d-7069-7065-5555-494400000000
23:47 nebadon2025[d]: ```
23:47 mhenning[d]: yeah, that vulkaninfo output looks fine
23:48 nebadon2025[d]: ok this is odd
23:48 nebadon2025[d]: mangohud works with glxgears
23:48 nebadon2025[d]: but not vkcube
23:52 orowith2os[d]: I was gonna say, enable logging and check the kernel log for GSP just to be sure it's working
23:52 orowith2os[d]: But I don't know the specifics on that
23:52 orowith2os[d]: (unrelated to the mangohud issue, but more perf stuff)