07:13UrbanMusic: any info on getting DisplayPort working with the nouveau driver on an MSI GTX 660 Ti (10de:1183)?
07:16UrbanMusic: I have used the info on freedesktop, including the boot option drm.edid_firmware=DP-1:edid/myedid.bin
07:17UrbanMusic: I have tried with the nouveau and i2c modules in the initrd, and with them loaded the normal way
07:18airlied: Lyude: did you land those two patches I reviewed into fixes?
07:18airlied: UrbanMusic: it sounds like you have a regression, you might need to work out what kernel broke it
07:20UrbanMusic: it all works perfectly on kernel-6.6.26 with no drivers in initrd or drm edid boot option
07:21UrbanMusic: here is pastebin of 6.6.26 graphics and i2c sections https://pastebin.com/0SPHZNTi
07:21airlied: uggh so 6.7 was where there was some refactoring
07:22airlied: might be worth starting with filing an issue https://gitlab.freedesktop.org/drm/nouveau
07:22airlied: Lyude: ^ also what combo of debug flags is good here?
07:22UrbanMusic: and 2nd pastebin for same sections using both kernel-6.8.2-rt and kernel-6.9-rc2
07:23UrbanMusic: https://pastebin.com/QqPFhNpn
07:23airlied: I don't think it's config related
07:24airlied: it might be worth getting dmesg from 6.6.26 and 6.7 with drm.debug=0x116
07:24airlied: and trying to compare where it goes wrong
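A minimal sketch of the comparison airlied suggests, assuming each kernel's dmesg output has been saved to a text file. The function names and the default keyword filter are illustrative, not part of any nouveau tooling. Timestamps are stripped first so that identical messages logged at different times compare equal.

```python
import difflib
import re

# Matches the "[   12.345678] " timestamp prefix that dmesg puts on each line.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Drop dmesg timestamps so identical messages compare equal."""
    return [_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_dmesg(good_lines, bad_lines, keywords=("nouveau", "drm")):
    """Return a unified diff of the lines that mention any keyword.

    `keywords` is an assumed filter (case-insensitive); widen or narrow it
    depending on how noisy the logs are.
    """
    def relevant(lines):
        return [
            line for line in strip_timestamps(lines)
            if any(k in line.lower() for k in keywords)
        ]

    return list(
        difflib.unified_diff(
            relevant(good_lines),
            relevant(bad_lines),
            fromfile="good",
            tofile="bad",
            lineterm="",
        )
    )
```

To use it, read the two saved logs with `open(path).readlines()`, pass the working kernel's lines first and the broken kernel's lines second, and print the returned lines. The first hunk where the two logs diverge is usually the place to start reading.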
07:27UrbanMusic: thanks airlied, I'll do that. Do you know, or could you point me to where to look for, which modules/helpers nouveau uses for DisplayPort in post-6.7 kernels, so I can make sure I'm including them?
07:28UrbanMusic: okay, I can set drm.debug=0x116 with 6.6.26, and I've got 6.7, 6.8 and 6.9 kernels
07:30UrbanMusic: I have also got "options nouveau mst=0", which came from a forum post where they got it working
07:31UrbanMusic: I'll disable it while debugging
07:40airlied: UrbanMusic: they should all just be there, we don't mess with configs
07:47UrbanMusic: my distro generic-config only had modules auto-selected by drivers, so a few additional modules were missing: i2c, gpio and drm_bridge
07:48UrbanMusic: right, I've set that boot option for the 2 kernels. Back in a few mins; I'll get dmesg from them, then log back into the channel
08:01airlied: for nvidia it should select everything it needs itself
08:01airlied: gpio and bridge aren't needed
08:01airlied: i2c core should be all that is required
11:18fdobridge: <mohamexiety> ...I have a bit of a rust<->C skill issue here. I want to use a function in Rust that's from a C header, so I added that entrypoint in the already existing `meson.build` file, under the list of allowed functions like this:
11:18fdobridge: <mohamexiety> `'--allowlist-function', 'util_format_is_depth_or_stencil',`
11:18fdobridge: <mohamexiety> but when I try to compile (after regenerating the entire build directory) I get hit by the same "function not found in this scope" error
11:19fdobridge: <mohamexiety> when opening `nil_bindings.rs` which is autogenerated by rustbindgen, I find all the other entrypoints in the allowlist _except_ the one I added. so there's something I missed, but what?
13:22fdobridge: <gfxstrand> Yeah, that's because bindgen doesn't handle inline functions well. 😞
13:23fdobridge: <gfxstrand> Likely, the implementation of that function is trivial and you can just reimplement it in format.rs similar to `is_srgb()`
13:23ad__: hi, so posted a v2 of the backlight-issue on ADA LOVELACE (AD107 MaxQ/Mobile chipset)
13:23ad__: https://lists.freedesktop.org/archives/nouveau/2024-April/044390.html
13:24ad__: My laptop now works great with nouveau, which was my target, so no hurry at all on comments, but it would be nice to get any feedback.
13:31fdobridge: <gfxstrand> `46/60 sessions passed, conformance test FAILED`
13:41fdobridge: <gfxstrand> I'm just gonna run without fp16
13:53fdobridge: <marysaka> oh fp16 is failing?
13:53fdobridge: <gfxstrand> It's not your fault
13:53fdobridge: <gfxstrand> It's test bugs
13:53fdobridge: <gfxstrand> The desktop GL tests are not designed for working `mediump`.
13:53fdobridge: <zmike.> were there any other issues besides the ones you have tickets for
13:54fdobridge: <gfxstrand> The sin/cos tests, apparently.
13:54fdobridge: <gfxstrand> I think all the trig ones should get the highp treatment across the board
13:54fdobridge: <gfxstrand> hrm...
13:55fdobridge: <gfxstrand> I wonder if we have an fp16 swizzle bug.
13:55fdobridge: <zmike.> sounds right
13:55fdobridge: <gfxstrand> that's possible, I suppose
13:58fdobridge: <marysaka> It's quite possible we have another one of those yeah...
14:10fdobridge: <zmike.> @gfxstrand what is your exact command line for a failing run of https://gitlab.freedesktop.org/mesa/mesa/-/issues/10993
14:18fdobridge: <gfxstrand> Pretty sure it was just `./glcts -n TEST_NAME`
14:19fdobridge: <gfxstrand> You have to be in a Wayland build, though.
14:25fdobridge: <zmike.> I'm in a wayland build and it passes for me on every driver combo I've tested thus far except for radeonsi
14:25fdobridge: <zmike.> this is on my amd test machine
14:25fdobridge: <zmike.> it passes on zink too
14:26fdobridge: <zmike.> the difference I'm seeing is that radeonsi gets a swapchain format of `PIPE_FORMAT_B10G10R10A2_UNORM` while (zink/llvmpipe) use `PIPE_FORMAT_B8G8R8A8_UNORM`
14:26fdobridge: <zmike.> very strange
14:29fdobridge: <gfxstrand> That sounds plausible. It fails on iris for me
14:29fdobridge: <gfxstrand> As well as every Zink I tried.
14:30fdobridge: <zmike.> weird that zink fails for you and not me
14:32fdobridge: <mohamexiety> it's afraid of the chair
14:33fdobridge: <zmike.> makes sense to me
14:40fdobridge: <gfxstrand> I suspect it's because my Zink is running on a Wayland compositor running on Intel or something like that.
14:42fdobridge: <zmike.> oh
14:42fdobridge: <zmike.> that's illegal
14:43fdobridge: <zmike.> https://cdn.discordapp.com/attachments/1034184951790305330/1229441974268989540/9d9deb4712cc0eef6fe752decd8c9360.png?ex=662fb210&is=661d3d10&hm=233ad95e70a122babe5ba822c39e9cb2728fe27f56c58173c4877a34b61536a4&
14:46fdobridge: <gfxstrand> Wait, wut?
14:56fdobridge: <zmike.> you can't mix and match compositor/client between zink and not-zink
14:56fdobridge: <zmike.> otherwise you aren't guaranteed to get matching modifiers
14:56fdobridge: <gfxstrand> That's not how modifiers work.
14:56fdobridge: <gfxstrand> Modifiers exist precisely to let you mix and match
14:57fdobridge: <gfxstrand> If you can't mix and match, then someone's not implementing modifiers properly.
14:57fdobridge: <zmike.> well this is for the implicit modifiers case
14:57fdobridge: <zmike.> actually that probably doesn't affect anything here
14:57fdobridge: <zmike.> so nevermind
14:57fdobridge: <gfxstrand> Well yes, implicit is all kinds of problematic.
14:58fdobridge: <gfxstrand> For this case, 1010102 sounds like a very plausible culprit. What I saw looked very much like a rounding error. It could just be that the CTS shouldn't be run on anything but 8888 and 565
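To illustrate why a 10-bit swapchain could look like a rounding error to a test written with 8-bit channels in mind, here is a small sketch of UNORM quantization. It shows only that the same input value is stored differently at 8 and 10 bits per channel. It does not model the CTS test itself, and the helper name is hypothetical.

```python
def quantize_unorm(value, bits):
    """Round a value in [0, 1] to the nearest `bits`-wide UNORM code,
    then convert that code back to a float."""
    max_code = (1 << bits) - 1
    code = int(value * max_code + 0.5)  # round to nearest code
    return code / max_code


def readback_difference(value):
    """Difference between what an 8-bit and a 10-bit channel would return
    for the same written value."""
    return abs(quantize_unorm(value, 8) - quantize_unorm(value, 10))
```

For an input of 0.2, the 8-bit channel stores code 51 and the 10-bit channel stores code 205, so the two readbacks differ by a few ten-thousandths. That is small, but enough to trip a comparison whose tolerance was derived for 8888.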
14:58fdobridge: <gfxstrand> (and 565 is questionable)
15:09fdobridge: <samantas5855> 24 is the name of the vendor
15:09fdobridge: <samantas5855> for some reason
15:10fdobridge: <samantas5855> it should be NVK not the mesa version
15:47fdobridge: <gfxstrand> I can't wait until I get a successful 4.6 run...
15:47fdobridge: <gfxstrand> Mostly so I can have my test rig back. 😅
16:07fdobridge: <gfxstrand> @airlied If you wanted to look more at the fence signaling paths and figure out why I'm getting spurious timeouts, that'd be cool. They're making getting GL 4.6 results really frustrating. About one in 3 runs actually completes.
16:07fdobridge: <gfxstrand> Unfortunately, I don't have a reproducer besides "If you run the GL 4.6 CTS, it'll probably timeout after like 8 hours"
16:49Lyude: airlied: the two you reviewed? I did push them, I don't remember which branch it was
16:49Lyude: airlied: also for debugging that: 0x116
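For reference, `drm.debug` is a bitmask of message categories. The sketch below decodes it using the category bits as I understand them from the kernel's `drm_print.h`. Treat the table as an assumption and check it against the kernel actually being run.

```python
# Category bits for the drm.debug module parameter (assumed from drm_print.h).
DRM_DEBUG_BITS = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]
```

With this table, `decode_drm_debug(0x116)` yields DRIVER, KMS, ATOMIC and DP, which matches a DisplayPort modesetting problem: driver messages, modeset and atomic state, and DP helper output, without the much noisier CORE and VBL categories.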
16:51Lyude: btw airlied regarding the low memory issue from the other day w/r/t nouveau failing to do a level 7 allocation during runtime suspend: I tried running my patch for a while but once kvzalloc needed to actually allocate a noncontiguous page dma_map_single() refused to mmap it. So, I assume there's some specific way I should be mapping memory allocations to the GPU through the radix3
16:51Lyude: page table we have access to?
17:01Lyude: (maybe you'd also know dakr ? I'm basically just trying to switch nvkm_gsp_mem_ctor() from https://cgit.freedesktop.org/drm-misc/tree/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c over from using dma_alloc_coherent() to something else that will fall back to allowing for fragmented memory allocations since
17:01Lyude: https://cgit.freedesktop.org/drm-misc/tree/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c#n2017 the allocation that we do in this function has a decent chance of failing on systems with low memory during runtime suspend - which I've seen a few times already)
17:12ahuillet: Lyude : is that contiguous physical sysmem you're allocating that is giving you trouble?
17:12ahuillet: you shouldn't need it to be contig?
17:13Lyude: ahuillet: yeah I shouldn't, that's why I'm trying to figure out how to map noncontig memory here for the suspend/resume process
17:13Lyude: (I haven't worked on this area of the kernel very often so I'm not super familiar with this)
17:16Lyude: (also I may have said contiguous when I meant to say coherent)
18:36airlied: Lyude: I think there are some vmalloc to sgtable helpers around
18:38Lyude: alright, I think that gives me a decent idea of what to look for
18:38fdobridge: <airlied> @gfxstrand my main fear is that it's an actual timeout, but there could be yet another race somewhere
18:38Lyude: airlied: was that message aimed at me?
18:38Lyude: or is that regarding the GL stuff
18:39airlied: Lyude: nope that is GL stuff
18:39Lyude: ah gotcha
18:39airlied: dakr: can you cherry-pick the two nouveau fixes from misc-next into misc-fixes?
18:40fdobridge: <gfxstrand> @airlied I fear that, too, but I also don't have any reason why it would time out.
18:41Lyude: airlied: will make sure to get the right branch next time btw
18:44airlied: Lyude: you could probably cherry-pick them also actually
18:44airlied: I think as long as you use -x it should be fine
18:44Lyude: oh ok - I'll go do that now
18:47Lyude: airlied: done, should be pushing tip now
18:52fdobridge: <airlied> @gfxstrand my fear is that all the CPU threads end up blocking one thread from making progress, but 10s is a long time
18:52fdobridge: <airlied> are the timing out tests pretty random?
18:53fdobridge: <airlied> or are you doing single thread cts-runner?
18:56Lyude: ...huh. airlied, maybe I'm misunderstanding something here but I just noticed it looks like the actual large allocation which would hold all of the migrated vram during suspend/resume https://cgit.freedesktop.org/drm-misc/tree/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c#n2013 should be the one that's not happening through dma_alloc_coherent - and that the allocation which is
18:57Lyude: failing seems to be a much smaller one where we just write some various GSP arguments for the GPU to enter suspend. I guess I'm confused - would such a small allocation still be likely to fail in a low memory situation?
18:57Lyude: oh hold on nevermind
18:57Lyude: i see I misread the backtrace whoops
19:00airlied: Lyude: code in nvkm_firmware_ctor does vmalloc to sgtable stuff I think
19:01Lyude: ooo that would definitely clarify things
19:01Lyude: yep! there it is :)
19:02fdobridge: <mhenning> @airlied not sure if this is the same bug Faith is seeing, but single-threaded Vulkan CTS runs typically give me timeouts in dmesg (typically about 3 or 4 timeouts in a 17-hour run). Seems completely random which test triggers it, re-running the test always passes, and it doesn't happen with multithreaded runs
19:19Lyude: (I also think I get the general idea of how this mapping stuff works now :)
19:23Lyude: alright - patch v2 done: https://paste.centos.org/view/289c8a00 gonna spin up an RPM build and give it a try
19:39fdobridge: <airlied> okay single threaded dying is very strange alright
20:05fdobridge: <gfxstrand> Single-thread cts-runner. The only other thread is GNOME/Weston.
20:06fdobridge: <karolherbst🐧🦀> sounds like the stuff I hit with the gl driver years ago
20:09fdobridge: <ahuillet> where's the timeout coming from? some pushbuffer not making progress?
20:11fdobridge: <gfxstrand> I suspect it's an interrupt not making it through
20:12fdobridge: <airlied> yeah I thought I fixed a race in that area already, but maybe I only started debugging it
20:13fdobridge: <airlied> and it's a race on the intr block/allow stuff
20:13fdobridge: <karolherbst🐧🦀> I think it's some kind of overflow
20:14fdobridge: <karolherbst🐧🦀> it's not randomly happening, but after a while
20:14fdobridge: <airlied> https://paste.centos.org/view/raw/04d2e33b
20:15fdobridge: <airlied> might be worth a try if someone wants
20:15fdobridge: <airlied> just realised I have that in my local kernel
20:20fdobridge: <Sid> @ahuillet does openrm work without GSP? if yes, what firmware does it use then
20:20fdobridge: <ahuillet> it does not
20:20fdobridge: <Sid> what about the proprietary one?
20:20fdobridge: <Sid> wanna test all scenarios for this bug report I just opened on the forums
20:21fdobridge: <Sid> logic dictates proprietary *should* work, but I haven't used the proprietary one much since openRM released and am unsure 😅
20:22fdobridge: <ahuillet> pretty sure it all goes through GSP on recent boards at least
20:23fdobridge: <Sid> okie thanks
20:24fdobridge: <Sid> just wondering if I should add anything to this report: https://forums.developer.nvidia.com/t/pci-pm-fails-with-modesetting-enabled-in-multi-gpu-config/289720
20:25fdobridge: <ahuillet> "PCI power management" being RTD3?
20:25fdobridge: <Sid> correct
20:27fdobridge: <ahuillet> I assume nvidia_drm.modeset=1 implies that the GPU is driving the console and therefore can't be put to sleep?
20:27fdobridge: <Sid> but the GPU is not the primary GPU
20:27fdobridge: <ahuillet> except because it's not the primary GPU you're saying that's not the case in this instance?
20:27fdobridge: <Sid> primary is intel iGPU
20:27fdobridge: <Sid> yeah
20:27fdobridge: <ahuillet> please update the report with exact laptop model, and dmidecode information
20:28fdobridge: <Sid> on it
20:28fdobridge: <ahuillet> I don't know this stuff well enough to answer, but I suspect the "it's not primary therefore it can sleep" assumption may not be as straightforward as that
20:28fdobridge: <ahuillet> so it might well not be a bug, but it would be good to get this looked at.
20:28fdobridge: <Sid> I mean, it does sleep correctly in x11
20:29fdobridge: <Sid> if I follow the instructions in the readme: https://download.nvidia.com/XFree86/Linux-x86_64/550.67/README/dynamicpowermanagement.html
20:29fdobridge: <ahuillet> that is not obvious from your report? (or maybe I can't read)
20:29fdobridge: <Sid> will add
20:31fdobridge: <mtijanic> If you're bored, you could bpftrace the calls into openrm and/or gsp with and without the modeset toggle.
20:31fdobridge: <mtijanic> Actually, NVreg_RmMsg=":" will do the job just as well as bpftrace on openrm
20:32fdobridge: <mtijanic> Just make sure your kernel log buffer is large enough as it can get _very_ spammy.
20:33fdobridge: <Sid> added everything to the `More Info` section of the report
20:34fdobridge: <Sid> will do tomorrow, it is 0204 where I am and I have an 0800 class 😅
20:34fdobridge: <mtijanic> At the very high level, the GPU will go to sleep if either kernel or nvidia-powerd (or something) tells it to, _and nothing is calling into it to prevent it_.
20:34fdobridge: <Sid> mhm
20:35fdobridge: <Sid> manually `echo`ing `suspend` to `/proc/driver/nvidia/suspend` also did nothing, fwiw
20:35fdobridge: <Sid> with modeset enabled, that is
20:36fdobridge: <mtijanic> Also, do you know if it doesn't go to sleep at all, or it gets woken up immediately?
20:36fdobridge: <Sid> unsure
20:36fdobridge: <Sid> I do know nvidia-smi is not the right way to check for sleep, since it wakes up the device, which is why I used `grep . /sys/bus/pci/devices/*/power/runtime_status`
20:39fdobridge: <mtijanic> Hrm, I'm not sure if there's a way to poke at the device info without waking it on linux. We have it on Windoze but I think it was just too much hassle to port that particular toggle to linux. But maybe one of those non-rmapi ioctls can give the info..
20:39fdobridge: <mtijanic> I'll check in a bit.
20:40fdobridge: <Sid> I think using the sysfs interface doesn't wake up the device, because if I set that command to `watch -n 0.1` it keeps showing `suspended` on the GPU with modesetting disabled
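A rough Python equivalent of the `grep . /sys/bus/pci/devices/*/power/runtime_status` check Sid describes, for anyone who wants to log the state over time. The function name and the `"unreadable"` placeholder are illustrative. Whether reading this file is guaranteed not to wake the device is Sid's observation above, not something this sketch verifies.

```python
import glob
import os


def read_runtime_status(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI device address to its runtime PM status string.

    Devices whose status file cannot be read are reported as "unreadable".
    """
    statuses = {}
    pattern = os.path.join(sysfs_root, "*", "power", "runtime_status")
    for path in sorted(glob.glob(pattern)):
        # .../<device>/power/runtime_status -> <device>
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as f:
                statuses[device] = f.read().strip()
        except OSError:
            statuses[device] = "unreadable"
    return statuses
```

Calling `read_runtime_status()` in a loop with a short sleep gives roughly the same view as `watch -n 0.1`, and makes it easy to record a timestamp whenever the GPU's entry changes. That would help answer mtijanic's question of whether the device never sleeps or is woken again immediately.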
20:41fdobridge: <mtijanic> I'm more thinking about it possibly giving incorrect data..
20:42fdobridge: <Sid> if my laptop's fans are anything to go by, I'd trust it :myy_TinyGiggle:
20:42fdobridge: <Sid> but yeah
20:42fdobridge: <Sid> funky issue
20:42fdobridge: <mtijanic> I trust your bug report, it sounds totally plausible.
20:43fdobridge: <Sid> I imagine it'll also be a problem for enterprise, which run things headless and with GPU arrays
20:43fdobridge: <mtijanic> There's just a dozen ways I can see it happening, just off the top of my head.
20:43fdobridge: <Sid> yeah, also fair
20:46fdobridge: <mtijanic> For example, as Mary found out the hard way the other day, any userspace access to mmaped GPU memory will wake it.
20:46fdobridge: <mtijanic> So if your toggle ends up invoking anything that occasionally touches VRAM, no sleep for you.
20:46fdobridge: <mtijanic> Even if there's no apparent communication with the driver at all.
20:47fdobridge: <mtijanic> (this goes through the fault() handler)
20:48fdobridge: <mtijanic> Also, 99% of ioctls (by type, not by frequency) will wake it even if all they do is read cpuside state that's not tied to any GPU.
20:49fdobridge: <karolherbst🐧🦀> though a driver can do it more fine grained if it wants to
20:50fdobridge: <Sid> :notlikethishisui:
20:51fdobridge: <karolherbst🐧🦀> but anyway, all those fancy "stat" apps are responsible for 50% of all battery drain
20:52fdobridge: <mtijanic> Oh, _absolutely_. Dynamic power management is very much happy-path optimized and straying from that is a good way to get Dwarf Fortress levels of Fun real quick. But, it's been getting better, bumpy ride that it was with GSP.
20:53fdobridge: <mtijanic> Not fun suspending a GPU when most of the state is kept on the GPU/GSP.
20:53fdobridge: <karolherbst🐧🦀> main reason we return EAGAIN or whatever on all hwmon interfaces in nouveau when the gpu sleeps
20:54fdobridge: <Sid> I should check if the same behavior exists on nouveau with and without GSP
20:54fdobridge: <Sid> but
20:54fdobridge: <Sid> that's a job for tomorrow
20:54fdobridge: <mtijanic> Recent NV drivers actually have a sysmem page that contains the instrumentation stats that the GPU pushes to. If the GPU is sleepy, it doesn't get updated.
20:54fdobridge: <karolherbst🐧🦀> nice
20:54fdobridge: <mtijanic> But moving the libs and tools to use that is a bumpy road.
20:54fdobridge: <karolherbst🐧🦀> we still need to wire up hwmon for gsp though
20:55fdobridge: <mtijanic> You're still prototyping on 535?
20:55fdobridge: <Sid> oh
20:55fdobridge: <Sid> will /sys/bus/pci/devices/*/power/runtime_status not work?
20:56fdobridge: <karolherbst🐧🦀> it should, why?
20:56fdobridge: <Sid> then I don't need hwmon to test :3
20:56fdobridge: <karolherbst🐧🦀> true
20:57fdobridge: <mtijanic> You can't really ask the GSP if the GPU is suspended or not though.
20:57fdobridge: <airlied> it's quite hard to do fine grained pm wakeups, due to lock ordering
20:57fdobridge: <airlied> if you try and wake up a gpu after you've taken a bunch of locks, it doesn't end well
20:57fdobridge: <Sid> not aiming for that either, just wanna observe behavior with various scenarios
20:58fdobridge: <airlied> hence why it happens at ioctl time
21:30fdobridge: <karolherbst🐧🦀> the pci_dev already knows
22:21Lyude: we need more of an ability to back off and suspend userspace requests tbh
22:21Lyude: it's an issue with DP aux devices as well