13:55fdobridge_: <!DodoNVK (she) 🇱🇹> OpenBSD developers are already exploding from all of this :ferris:
13:55fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1193915909618802738/message.txt?ex=65ae73d6&is=659bfed6&hm=180ab3356138c890c6ef7c43ba11dc59809e86442644d8b0f2943fe3ab112a5b&
14:14fdobridge_: <gfxstrand> Why? Are they stripping the decorations? Just complaining that they're there? Forcing the warnings on anyway?
14:14fdobridge_: <!DodoNVK (she) 🇱🇹> I've heard OpenBSD has a proper code cleanup policy
14:21fdobridge_: <karolherbst🐧🦀> pain 😄
14:21fdobridge_: <karolherbst🐧🦀> sooo.. they have to remove dead_code in projects?
14:23fdobridge_: <gfxstrand> That's not my problem. Especially when working in Rust, you end up adding and removing little helpers pretty frequently. So we either leave them with an `#[allow(dead_code)]` or we're constantly deleting and retyping them.
14:23fdobridge_: <gfxstrand> So, yeah, if OpenBSD wants to add bullshit requirements, that's on them.
14:24fdobridge_: <gfxstrand> They're free to carry a patch that deletes all the dead code.
14:42fdobridge_: <karolherbst🐧🦀> imagine they file a bug for that
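[Editor's note: a minimal sketch of the `#[allow(dead_code)]` pattern gfxstrand describes above. The helper names are hypothetical and are not taken from NVK or Mesa; the point is only that one helper is kept around with no current caller, and the attribute silences the `dead_code` lint rustc would otherwise raise for it.]

```rust
// Hypothetical helpers, for illustration only; not taken from NVK or Mesa.

/// Rounds `value` up to the next multiple of `align` (a power of two).
/// This helper has a caller (`padded_size`), so it needs no attribute.
fn align_up(value: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Rounds `value` down to a multiple of `align` (a power of two).
/// Nothing calls this at the moment. Without the attribute below, rustc
/// would report it under the `dead_code` lint; with it, the helper can
/// stay in the tree until it is needed again.
#[allow(dead_code)]
fn align_down(value: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    value & !(align - 1)
}

/// The only current user of the helpers: pads a size to 16 bytes.
fn padded_size(bytes: u32) -> u32 {
    align_up(bytes, 16)
}
```

[The alternative mentioned in the chat is to delete `align_down` whenever its last caller goes away and retype it later; a downstream that wants no dead code could carry a patch that removes such items.]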
14:47fdobridge_: <gfxstrand> ```
14:47fdobridge_: <gfxstrand> [551582.033064] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:16 type:69 scope:1 part:233
14:47fdobridge_: <gfxstrand> [551582.033069] nouveau 0000:01:00.0: fifo:c00000:0002:0010:[deqp-vk[570832]] errored - disabling channel
14:47fdobridge_: <gfxstrand> [551582.033072] nouveau 0000:01:00.0: deqp-vk[570832]: channel 16 killed!
14:47fdobridge_: <gfxstrand> ```
14:47fdobridge_: <gfxstrand> Do we know how to decode that?
14:48fdobridge_: <Sid> this is the same job timeouts thing we've been talking about for a while now
14:57fdobridge_: <gfxstrand> So, useless. 😢 That's what I was worried about.
14:58fdobridge_: <gfxstrand> I send substantially fewer commands to the GPU and now it can't finish. *sigh*
14:59fdobridge_: <gfxstrand> Ugh... Maybe I have to rebind all the shaders if I rebind any of them?
14:59fdobridge_: <Sid> that bug's been there for a while, it's just also very random
15:00fdobridge_: <gfxstrand> In this case, it went from 100% pass to 100% fail so whatever I'm hitting isn't random.
15:00fdobridge_: <gfxstrand> But it's likely an entirely different bug
15:00fdobridge_: <gfxstrand> The "channel timed out" is a pretty generic error
15:02fdobridge_: <gfxstrand> I'm still really confused by all this. Why does it want them all bound at the same time? What, exactly, is it that I have to set all at the same time?
15:02fdobridge_: <gfxstrand> Just enable/disable? Or do I need to set addresses and everything else?
15:02fdobridge_: <gfxstrand> Or is not changing shaders causing something else to batch up badly?
15:03fdobridge_: <karolherbst🐧🦀> are you doing compute stuff in the meantime?
15:03fdobridge_: <karolherbst🐧🦀> though that would be weird to mess with it...
15:03fdobridge_: <gfxstrand> Nope
15:03fdobridge_: <gfxstrand> This test is literally just vertex+fragment
15:04fdobridge_: <karolherbst🐧🦀> mhhh
15:04fdobridge_: <karolherbst🐧🦀> I don't think you need to rebind the same shaders again
15:04fdobridge_: <gfxstrand> It doesn't even use any descriptors. Just push constants.
15:10fdobridge_: <Sid> ...huh
15:10fdobridge_: <Sid> do you have a kernel version or so where it didn't happen?
15:13fdobridge_: <Sid> because it's been happening to me since 6.7-rc7 at the very least, I can't really confirm prior versions because I constantly ran into prime-related bugs
15:13fdobridge_: <Sid> I *can* test older tags with the patchset that fixes prime if you want me to
15:22fdobridge_: <gfxstrand> I think it's a very different bug from the one you're seeing
15:22fdobridge_: <gfxstrand> It has nothing to do with prime and it's 100% reproducible.
15:22fdobridge_: <gfxstrand> Just because two things throw the same dmesg error doesn't mean they're related.
15:27fdobridge_: <gfxstrand> Ugh... This might be a test bug.
15:46fdobridge_: <redsheep> Maybe what you're seeing is instead related to the rare crash with that same error that I've seen with zink+NVK on Minecraft without prime?
15:47fdobridge_: <gfxstrand> But also the bug I'm seeing may affect other things. 🤔
15:47fdobridge_: <gfxstrand> I found it and it's pretty nasty
15:48fdobridge_: <gfxstrand> Especially Zink since I think it uses secondaries quite a bit.
15:50fdobridge_: <gfxstrand> I'll know in an hour or so if this fixes the hangs I'm seeing in my ESO branch
16:17EisNerd: did you see I updated #188 with my findings
16:17EisNerd: if I can try anything additional, just ping me, here (if my connection is ok) or in the issue
16:28karolherbst: EisNerd: I left a comment on the bug
16:30fdobridge_: <zmike.> No
16:31fdobridge_: <gfxstrand> Oh, it doesn't? I thought it did for some reason.
16:32fdobridge_: <zmike.> You must be thinking of zink's evil twin
16:34fdobridge_: <Sid> oink
16:36fdobridge_: <!DodoNVK (she) 🇱🇹> ANGLE? 🍩
16:36fdobridge_: <zmike.> No secondaries there either
17:59dakr: gfxstrand, https://lore.kernel.org/all/df7d110b-a50c-4293-b5d4-45913fa6909e@infradead.org/T/
18:00dakr: I'm about to merge this one. Guess you'd want to update mesa correspondingly for consistency?
19:43fdobridge_: <gfxstrand> dakr: If it's just a header docs update, I'm not too worried about it. We can pull it into Mesa whenever or we can pull it the next time there's actual new UAPI we want.
19:44fdobridge_: <gfxstrand> It's only functional changes that require the kernel/userspace dance.
19:46dakr: gfxstrand: sure, was just a fyi in case you prefer to keep it consistent :)
19:47fdobridge_: <gfxstrand> Thanks! I appreciate the heads up.
21:33EisNerd: ok completed bisecting (sorry for missing the merges)
21:58Lyude: EisNerd: i'm here now, you said you bisected the problem you've been hitting with modesetting on ampere?
22:11EisNerd: hopefully this really helps to nail this down this time and to provide a quick patch for current 6.10
22:12EisNerd: Lyude: modesetting is the nouveau kernel module?
22:13EisNerd: and ampere is among others NVIDIA GK106GLM [Quadro K2100M]
22:13EisNerd: then yes
22:15Lyude: EisNerd: sorry-I meant modesetting as in display issues, but it is handled by the kernel module yeah
22:17EisNerd: the issue is plainly, that since linux 6.whatever I needed to blacklist nouveau, to get my system usable, so I can't use my secondary gpu (optimus)
22:18EisNerd: some people even got stuck with kernel 5 as they have such a gpu as the only gpu
22:22EisNerd: drivers/gpu/drm/nouveau/nvif/conn.c
22:22EisNerd: nvif_conn_hpd_status
22:23EisNerd: based on kernel crash output
22:28EisNerd: here is the output https://nopaste.net/vA9NFPKSTu
22:29EisNerd: I missed a bit, so this is complete https://nopaste.net/HiLZGhBeYg
22:52EisNerd: Lyude: can you see anything in this added code that looks likely to cause a null pointer issue?
22:54Lyude: could you maybe get the whole log? also, what was the commit sha you ended up with?
22:54EisNerd: see here https://github.com/torvalds/linux/commit/32dd9236698bcd2ffdb69954b167a851fd50182a#diff-a21b40e25476e35cf0127ffdb4db5e4ab9df3c7a74d665d2152dae31e27646bb (sorry)
22:54Lyude: np
22:55EisNerd: https://gitlab.freedesktop.org/drm/nouveau/-/issues/188#note_2229276
22:57Lyude: EisNerd: do you still have the kernel built? might also help a bit if we could run the dmesg through ./scripts/decode_stacktrace.sh in the kernel tree (assuming you're in the main kernel source directory): dmesg | ./scripts/decode_stacktrace.sh vmlinux . .
22:57Lyude: (can also forward the dmesg to that script through other means)
22:57Lyude: if not it's not a huge deal, but it would speed things up a bit
22:57EisNerd: do I need to boot it?
22:58Lyude: nope - as long as you've got the source tree and the log you can just run it through that script. if it's in a text file for instance it'd be something like
22:58Lyude: ./scripts/decode_stacktrace.sh vmlinux . . < /home/lyudess/some_log.txt
22:59Lyude: it should just run the stacktrace through a symbol decoder so it gives us actual line numbers
23:00Lyude: if you can't figure it out though it's not a huge deal. this hopefully should be enough info to go off either way
23:00EisNerd: I still have the kernel.log
23:04Lyude: that should work
23:04EisNerd: do I just need the stack?
23:07Lyude: yeah - it should translate the backtrace into a bunch of source code lines - as long as you've still got the output from building the kernel from source available
23:08Lyude: i have a feeling whatever this issue is it might not even be present in later versions of the kernel
23:08Lyude: since I don't see nvif_conn_hpd_status in the mainline kernel anymore
23:10Lyude: EisNerd: any chance you might be able to/have checked what running a later kernel yields?
23:11EisNerd: it misses the module path
23:11Lyude: (also - might not be worth decoding that stacktrace quite yet after all if it's this old of a kernel)
23:12EisNerd: I can try with 6.6.8 or check if I find a dump
23:13Lyude: probably a good idea tbh
23:14EisNerd: ok can you tell me how to provide the module path?
23:14Lyude: oh sorry, I thought you were talking about ./scripts/decode_stacktrace.sh for some reason. what exactly do you mean by it not finding the module path?
23:15EisNerd: the script is asking for the module path to decode
23:15EisNerd: WARNING! Modules path isn't set, but is needed to parse this symbol
23:16Lyude: oh - what directory are you running the script in?
23:16EisNerd: the kernel build dir
23:18Lyude: try replacing vmlinux . . with vmlinux auto
23:18EisNerd: same result
23:18Lyude: if that doesn't work I'm not totally sure, but we probably need to just see a trace with a newer kernel anyway honestly so I don't think it's worth bothering at this point
23:18EisNerd: ok
23:19Lyude: since if things are still broken with a newer kernel and that codepath's gone it's likely a slightly different issue you're hitting now
23:20EisNerd: ok let me try it without X active
23:30EisNerd: ok back, same issue
23:31EisNerd: but other stack
23:31Lyude: could I see that?
23:32EisNerd: https://nopaste.net/HiW6ZX5zBf
23:33EisNerd: same area
23:34EisNerd: there seems to be something in the new/modified DP code that does not work with these chips
23:34EisNerd: maybe the DP that isn't wired to it (as I can drive the DP with my Intel)
23:47Lyude: we must have changed that code really recently then since it seems like it's still calling that symbol
23:49EisNerd: Lyude: is there a way to avoid this?
23:49Lyude: i'll have to see - unfortunately I'm still not sure if this is in the latest kernel or not from that
23:49EisNerd: so some workaround that prevents the oops and makes it usable again
23:49EisNerd: this is 6.6.8
23:50Lyude: you can always disable the driver but besides that unfortunately I don't know of any
23:52EisNerd: should I append this to the issue?
23:52Lyude: EisNerd: yeah - but I think we changed that code very recently, since like I said it's still calling to nvif_conn_hpd_status and that's not in 6.7
23:52Lyude: yeah - probably a good idea
23:54Lyude: i'll have to try taking a closer look soon and maybe see if I've got a machine I can reproduce this on
23:54Lyude: if it's easy to reproduce I definitely should have some working kepler hardware
23:54EisNerd: Lyude: otherwise leave me a message
23:54Lyude: sure thing
23:54EisNerd: in Gentoo it is quite easy to change some source bits
23:58EisNerd: Ok you should get a notice
23:58EisNerd: I need to get some sleep, thx for your help
23:58Lyude: np!