10:49 fdobridge: <k​arolherbst🐧🦀> `gpio: GPU is missing power, check its power cables.` glad that this check still works :ferrisUpsideDown:
13:00 fdobridge: <k​arolherbst🐧🦀> please test https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27853
13:00 fdobridge: <k​arolherbst🐧🦀> @airlied @gfxstrand mind giving it a review?
13:08 fdobridge: <S​id> :o
13:08 fdobridge: <S​id> what does it do
13:09 fdobridge: <S​id> like, what should I test for
13:13 fdobridge: <k​arolherbst🐧🦀> just doing gl/vaapi/vdpau stuff
13:25 fdobridge: <S​id> huh
15:10 dakr: karolherbst, "so it seems if you call DRM_NOUVEAU_VM_INIT and then just terminate the process, the kernel dead locks inside `drm_postclose`"
15:11 dakr: I tried that and can't get it to lock up.
15:11 karolherbst: mhh
15:11 karolherbst: I'll check it out today anyway, just getting a new kernel to compile
15:11 dakr: Any other specific conditions?
15:11 karolherbst: dakr: did you try on 6.7 though?
15:12 dakr: v6.7.7, even without the other fix
15:12 karolherbst: mhh
15:12 karolherbst: maybe you need to allocate a channel as well?
15:12 karolherbst: I triggered it within the gl driver
15:12 dakr: let me try real quick.
15:13 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/tree/nouveau/vm_bind?ref_type=heads
15:13 karolherbst: just assert somewhere after the VM_BIND thing
15:15 dakr: yeah, so I can't get it to fail, there must be some other condition with mesa.
15:15 dakr: Will try with your branch later.
15:16 karolherbst: dakr: don't worry, I'll try in a couple of minutes with and without your fix
15:36 karolherbst: dakr: that branch doesn't trigger it either 🙃 need to revert the glsl_type_singleton_init_or_ref commit on there and then just run any gl application and it triggers...
15:36 karolherbst: now testing your patch
15:39 karolherbst: dakr: mind pasting your patch again? :D
15:40 dakr: diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40 dakr: index 50589f982d1a..75545da9d1e9 100644
15:40 dakr: --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40 dakr: +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40 dakr: @@ -708,10 +708,11 @@ nouveau_drm_device_fini(struct drm_device *dev)
15:40 dakr: }
15:40 dakr: mutex_unlock(&drm->clients_lock);
15:40 dakr:
15:40 dakr: - nouveau_sched_fini(drm);
15:40 dakr: -
15:40 dakr: nouveau_cli_fini(&drm->client);
15:40 dakr: nouveau_cli_fini(&drm->master);
15:40 dakr: +
15:40 dakr: + nouveau_sched_fini(drm);
15:40 dakr: +
15:40 dakr: nvif_parent_dtor(&drm->parent);
15:40 karolherbst: IRC moments
15:40 dakr: mutex_destroy(&drm->clients_lock);
15:41 dakr: kfree(drm);
15:41 karolherbst: mind using a pastebin website the next time or something? :D IRC really messes up patches
15:44 karolherbst: dakr: on what branch would that apply? Because it doesn't on drm-misc-fixes
15:45 dakr: karolherbst, v6.7, v6.8 got a rework, the patch would be for v6.7 only.
15:45 karolherbst: I see
15:45 karolherbst: hitting the issue on drm-misc-fixes as well btw
15:45 dakr: Ok, then that's something else.
15:46 karolherbst: I have a serial console on my desktop, so maybe that throws something out
15:47 karolherbst: it didn't before, probably because I haven't waited long enough for the hangchecker to kick in
16:28 fdobridge: <b​abblebones> How's the state of nouveau displayport audio looking?
16:28 fdobridge: <k​arolherbst🐧🦀> should work unless you hit bugs
16:30 karolherbst: dakr: okay so yeah.. this one is a little annoying as it doesn't seem to get picked up by the hang checker. Maybe I need to turn on more debugging things, but anyway, your patch doesn't help
16:31 fdobridge: <r​inlovesyou> oh real?
16:32 fdobridge: <k​arolherbst🐧🦀> yeah, nouveau just has to wire some stuff up, but it's up to the intel hda audio driver to do the real thing
16:32 fdobridge: <k​arolherbst🐧🦀> it's also just sometimes buggy
16:32 fdobridge: <r​inlovesyou> i see, a few weeks ago it wasn't working, lemme test it now
16:32 fdobridge: <k​arolherbst🐧🦀> I mean.. your bug is probably not fixed
16:33 fdobridge: <r​inlovesyou> is that something on our side we might be able to fix? or is that an issue with the actual audio driver?
16:34 fdobridge: <r​inlovesyou> dp audio certainly works on proprietary
16:34 fdobridge: <k​arolherbst🐧🦀> I suspect nouveau needs to do something, but I also don't really have the time to look into it
16:35 fdobridge: <r​inlovesyou> i doubt i can do anything, but do you know roughly where i should be looking? (edited)
16:37 fdobridge: <k​arolherbst🐧🦀> not really
16:38 fdobridge: <k​arolherbst🐧🦀> it's unknown what the actual problem here is. It's probably better to figure out what nvidia is doing to make it all work
16:38 fdobridge: <k​arolherbst🐧🦀> might involve some firmware call
16:39 fdobridge: <k​arolherbst🐧🦀> maybe some call into the audio driver
16:39 fdobridge: <k​arolherbst🐧🦀> dunno
16:58 fdobridge: <S​id> *sweat*
17:01 fdobridge: <r​inlovesyou> :')
17:01 fdobridge: <S​id> no rin go look at #lounge
17:01 fdobridge: <r​inlovesyou> kek
17:32 karolherbst: dakr: I think it's this line: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_drm.c?h=v6.7.8#n1185
17:32 karolherbst: the mutex_lock
17:32 karolherbst: but no idea why that happens
17:33 karolherbst: mhhhh
17:33 dakr: karolherbst, that's pretty weird.
17:33 karolherbst: actually...
17:33 karolherbst: like..
17:33 karolherbst: if the VM_INIT ioctl fails, does it unlock this lock?
17:33 dakr: Can't think of any relevant other locks we're holding there..
17:33 fdobridge: <g​fxstrand> Ugh... Something's not right with trying to set this register. Sometimes I set it and the test passes and sometimes the test fails.
17:34 karolherbst: dakr: it doesn't :D
17:34 karolherbst: or does it?
17:34 karolherbst: ehhh...
17:34 karolherbst: nah it should.. mhh, that's weird
17:34 dakr: I think it does..
17:35 karolherbst: maybe it's a different one...
17:35 dakr: no lockdep splat?
17:35 karolherbst: lockdep disabled, because it doesn't really work on nouveau anyway (we really should fix that)
17:35 karolherbst: mhhh
17:35 karolherbst: I think the old submit ioctl might cause it
17:36 dakr: At least without GSP and Pascal lockdep was working fine for me.
17:36 karolherbst: let me try it then...
17:37 karolherbst: maybe the way things got reworked makes it work now
17:38 karolherbst: ohhh...
17:38 karolherbst: mhhh
17:38 karolherbst: dakr: I think it needs to call nouveau_abi16_put
17:38 karolherbst: for the `(unlikely(nouveau_cli_uvmm(cli)))` case
17:40 karolherbst: dakr: https://gist.githubusercontent.com/karolherbst/8175d4830f13cc15d97a7af1982fa9a8/raw/e8d3185fa999db938064bf16969629e6a4a3d665/tmp.patch
17:40 karolherbst: testing that shortly
17:40 dakr: Oh, indeed. Good catch.
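The bug class discussed above (an early return that skips the matching "put", leaving a mutex held so a later lock attempt blocks forever) can be sketched in plain C. This is a minimal user-space model, not the kernel code: `fake_client`, `fake_get`, `fake_put`, `fake_ioctl_buggy` and `fake_ioctl_fixed` are hypothetical names, and a boolean stands in for the real mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-client object guarded by a mutex. */
struct fake_client {
    bool locked;    /* models the mutex state */
    bool uses_uvmm; /* models the "client already uses the new VM API" check */
};

/* Models a "get" helper that returns with the lock held. */
static struct fake_client *fake_get(struct fake_client *cli)
{
    assert(!cli->locked); /* a second get would block forever on a real mutex */
    cli->locked = true;
    return cli;
}

/* Models a "put" helper: drops the lock and passes the return code through. */
static int fake_put(struct fake_client *cli, int ret)
{
    cli->locked = false;
    return ret;
}

/* Buggy shape: the early return skips the put, so the lock stays held. */
static int fake_ioctl_buggy(struct fake_client *cli)
{
    struct fake_client *c = fake_get(cli);

    if (c->uses_uvmm)
        return -ENOSYS;
    return fake_put(c, 0);
}

/* Fixed shape: every exit path after the get goes through the put. */
static int fake_ioctl_fixed(struct fake_client *cli)
{
    struct fake_client *c = fake_get(cli);

    if (c->uses_uvmm)
        return fake_put(c, -ENOSYS);
    return fake_put(c, 0);
}
```

In this model the buggy variant returns the error with `locked` still true, which is the state that would make a later lock acquisition on process teardown hang. The fixed variant returns the same error code with the lock released.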
17:42 fdobridge: <k​arolherbst🐧🦀> might want to see how nvidia does it?
17:42 fdobridge: <k​arolherbst🐧🦀> that regs address might come up in traces
17:44 fdobridge: <g​fxstrand> Yeah, I need to look at their programming.
17:45 fdobridge: <g​fxstrand> The immediate problem is, I think, power management screwing it up on my laptop. I seem to be able to more reliably reproduce both behaviors on the desktop
17:45 fdobridge: <k​arolherbst🐧🦀> mhhhh
17:45 fdobridge: <k​arolherbst🐧🦀> do we touch that reg in the kernel?
17:46 fdobridge: <k​arolherbst🐧🦀> though that shouldn't matter...
17:46 fdobridge: <k​arolherbst🐧🦀> that reg is context switched
17:46 fdobridge: <k​arolherbst🐧🦀> unless the kernel does stuff per context, then it matters
18:32 karolherbst: dakr: sent the fix and another UAPI thing to the ML
18:51 fdobridge: <z​mike.> So where should I be requesting vkcts coverage to hit this?
18:53 fdobridge: <z​mike.> Or @gfxstrand if you file a coverage ticket you can cc me and I'll assign
18:56 fdobridge: <g​fxstrand> To hit what? This hang? Let me file one quick.
19:00 fdobridge: <g​fxstrand> @zmike. https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4984
19:02 fdobridge: <g​fxstrand> I *think* that should reproduce it. It's hard to be 100% sure but, based on my triage, I think that should tickle the bug.
19:04 fdobridge: <z​mike.> Cool
19:06 HdkR: Nice. NVIDIA GPU has arrived. I'll tinker with that after I do a FEX release
19:07 fdobridge: <g​fxstrand> \o/
19:08 fdobridge: <z​mike.> @gfxstrand same goes for any other coverage gaps you find
19:57 HdkR: https://cdn.discordapp.com/attachments/702130548419723294/1214290204370800670/IMG_1500.jpg?ex=65f892e1&is=65e61de1&hm=e54ca54e396ebe4d51fa1b345f3193bd14f5efd9aa8a2df6fd418d64ebd6e853&
19:57 HdkR: So cute, so smol
20:04 fdobridge: <m​ohamexiety> daaaaaaaamn
20:05 fdobridge: <t​om3026> never seen a gt 1030 ? 😄
20:07 fdobridge: <m​ohamexiety> I have one, actually \:p
20:07 fdobridge: <m​ohamexiety> just never seen a 4000 SFF before
20:10 fdobridge: <m​ohamexiety> funny old picture I had. the difference in size is hilarious (this is also one of the slimmer 3080s)
20:10 fdobridge: <m​ohamexiety> https://cdn.discordapp.com/attachments/1034184951790305330/1214303904091607141/20230317_180850.png?ex=65f89fa3&is=65e62aa3&hm=d7289dec57c1ae1ca34b553a253d0806b914b343b35cefd0cb741b9a626e9655&
20:10 fdobridge: <S​id> gt 1030, manufactured e-waste
20:22 fdobridge: <g​fxstrand> 419ea8 doesn't show up at all in the xxd of a dump. 😢
20:22 fdobridge: <k​arolherbst🐧🦀> that's like.. unfortunate
20:23 fdobridge: <g​fxstrand> I should see if I can make a crucible reproducer and throw it at the blob
20:23 fdobridge: <k​arolherbst🐧🦀> might be they workaround it on the kernel level...
20:23 fdobridge: <g​fxstrand> Maybe?
20:24 fdobridge: <k​arolherbst🐧🦀> or it's a different reg 😄
20:24 fdobridge: <k​arolherbst🐧🦀> maybe dump all the registers in the 0x418000+ area
20:24 fdobridge: <g​fxstrand> *sigh* (edited)
20:25 fdobridge: <g​fxstrand> The weird bit is that bit 14 of `0x419ea8` definitely does disable the exception.
20:25 fdobridge: <g​fxstrand> But it also breaks subgroups so WTH
20:25 fdobridge: <k​arolherbst🐧🦀> yeah.. that's a bit wtf
20:25 fdobridge: <k​arolherbst🐧🦀> in what way does it break them though?
20:27 fdobridge: <g​fxstrand> Still figuring that out
20:27 fdobridge: <g​fxstrand> It appears that at least `vote.any` is a bit busted
20:29 fdobridge: <g​fxstrand> Here's the shader:
20:29 fdobridge: <g​fxstrand> ```glsl
20:29 fdobridge: <g​fxstrand> #version 450
20:29 fdobridge: <g​fxstrand> #extension GL_KHR_shader_subgroup_vote: enable
20:29 fdobridge: <g​fxstrand> layout(location = 0) out uint result;
20:29 fdobridge: <g​fxstrand> layout(set = 0, binding = 4, std430) readonly buffer Buffer1
20:29 fdobridge: <g​fxstrand> {
20:30 fdobridge: <g​fxstrand> bool data[];
20:30 fdobridge: <g​fxstrand> };
20:30 fdobridge: <g​fxstrand> void main (void)
20:30 fdobridge: <g​fxstrand> {
20:30 fdobridge: <g​fxstrand> uint tempRes;
20:30 fdobridge: <g​fxstrand> bool valueEqual = bool(1.25 * float(data[gl_SubgroupInvocationID]) + 5.0);
20:30 fdobridge: <g​fxstrand> bool valueNoEqual = bool(subgroupElect());
20:30 fdobridge: <g​fxstrand> tempRes = subgroupAllEqual(bool(1)) ? 0x1 : 0;
20:30 fdobridge: <g​fxstrand> tempRes |= 0x2;
20:30 fdobridge: <g​fxstrand> tempRes |= subgroupAllEqual(data[0]) ? 0x4 : 0;
20:30 fdobridge: <g​fxstrand> tempRes |= subgroupAllEqual(valueEqual) ? 0x8 : 0x0;
20:30 fdobridge: <g​fxstrand> tempRes |= subgroupAllEqual(valueNoEqual) ? 0x0 : 0x10;
20:30 fdobridge: <g​fxstrand> if (subgroupElect()) tempRes |= 0x2 | 0x10;
20:30 fdobridge: <g​fxstrand> result = tempRes;
20:30 fdobridge: <g​fxstrand> }
20:30 fdobridge: <g​fxstrand> ```
20:30 fdobridge: <g​fxstrand> The bit that's missing in the output is `0x4` so it's that `subgroupAllEqual(data[0])` that's going wrong. That or the `data[0]` load itself but that seems unlikely given that the subgroup tests are literally the only tests to fail.
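Reading the failing check off the result word can be sketched as a small C helper. It assumes, based only on the shader text above, that an invocation passing every check ends up with all five flag bits set (0x1 through 0x10). The names `EXPECTED_RESULT`, `missing_bits` and `check_name` are hypothetical and are not part of the CTS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a fully passing invocation sets bits 0x1, 0x2, 0x4, 0x8, 0x10. */
#define EXPECTED_RESULT UINT32_C(0x1f)

/* Returns the flag bits that were expected but are absent from result. */
static uint32_t missing_bits(uint32_t result)
{
    return EXPECTED_RESULT & ~result;
}

/* Maps one flag bit back to the shader expression that sets it. */
static const char *check_name(uint32_t bit)
{
    switch (bit) {
    case 0x01: return "subgroupAllEqual(bool(1))";
    case 0x02: return "unconditional 0x2";
    case 0x04: return "subgroupAllEqual(data[0])";
    case 0x08: return "subgroupAllEqual(valueEqual)";
    case 0x10: return "!subgroupAllEqual(valueNoEqual)";
    default:   return "unknown";
    }
}
```

With a result of 0x1b, `missing_bits` yields 0x4, which `check_name` maps to the `subgroupAllEqual(data[0])` term, matching the triage in the message above.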
20:34 fdobridge: <g​fxstrand> But other tests have a different failure pattern
20:36 fdobridge: <g​fxstrand> `VOTE.EQ P1, P1` is the one that's failing. Maybe VOTE stops liking P1? That makes no sense
20:41 fdobridge: <k​arolherbst🐧🦀> mhhh
20:41 fdobridge: <k​arolherbst🐧🦀> data is an ssbo, right?
20:41 fdobridge: <g​fxstrand> Hrm... There's a SET_SHADER_EXCEPTIONS method...
20:41 fdobridge: <g​fxstrand> If I just disable all shader exceptions there, both tests pass
20:42 fdobridge: <k​arolherbst🐧🦀> ohh, is nvidia using that?
20:42 fdobridge: <k​arolherbst🐧🦀> I totally forgot that method exists...
20:43 fdobridge: <g​fxstrand> Okay, I found all their falcons
20:49 fdobridge: <g​fxstrand> ```
20:49 fdobridge: <g​fxstrand> 0x418800 bit 0 -> true
20:49 fdobridge: <g​fxstrand> 0x419a08 bit 4 -> false
20:49 fdobridge: <g​fxstrand> 0x419ba4 bit 3 -> false
20:49 fdobridge: <g​fxstrand> 0x419a04 bit 0 -> true
20:49 fdobridge: <g​fxstrand> 0x419a04 bit 1 -> true
20:49 fdobridge: <g​fxstrand> 0x418e40 bits 24..28 -> 0x7
20:49 fdobridge: <g​fxstrand> 0x418e50 bits 12..14 -> 0x2
20:49 fdobridge: <g​fxstrand> 0x418e64 bits 0..16 -> 0x935
20:49 fdobridge: <g​fxstrand> 0x419ea8 bit 2 -> false
20:49 fdobridge: <g​fxstrand> 0x419ea8 bit 14 -> false
20:49 fdobridge: <g​fxstrand> 0x419ea8 bit 15 -> false
20:50 fdobridge: <g​fxstrand> 0x419ea8 bit 18 -> false
20:50 fdobridge: <g​fxstrand> 0x419ea8 bit 24 -> false
20:50 fdobridge: <g​fxstrand> ```
20:50 fdobridge: <g​fxstrand> So they do set `0x419ea8`, I just missed it.
20:50 fdobridge: <g​fxstrand> Now to figure out what those bits do...
21:01 fdobridge: <g​fxstrand> ```
21:01 fdobridge: <g​fxstrand> 0x418800 bit 0 -> true // gr_pri_gpcs_setup_debug_poly_offset_nan_is_zero
21:01 fdobridge: <g​fxstrand> 0x419a08 bit 4 -> false // Unknown gr_pri_gpcs_tpcs_tex_samp_dbg
21:01 fdobridge: <g​fxstrand> 0x419ba4 bit 3 -> false // sm_disp_ctrl_ld_is_nop
21:01 fdobridge: <g​fxstrand> 0x419a04 bit 0 -> true // Unknown tpcs_tex_lod_dbg bit
21:01 fdobridge: <g​fxstrand> 0x419a04 bit 1 -> true // tpcs_tex_lod_dbg_cubeseam_aniso
21:01 fdobridge: <g​fxstrand> 0x418e40 bits 24..28 -> 0x7 // Unknown gr_pri_gpcs_swdx_tc_bundle_ctrl
21:01 fdobridge: <g​fxstrand> 0x418e50 bits 12..14 -> 0x2 // Unknown gr_pri_gpcs_swdx_tc_bundle_ctrl
21:01 fdobridge: <g​fxstrand> 0x418e64 bits 0..16 -> 0x935 // Unknown gr_pri_gpcs_swdx_tc_bundle_addr
21:01 fdobridge: <g​fxstrand> 0x419ea8 bit 2 -> false // api_stack_error_report
21:01 fdobridge: <g​fxstrand> 0x419ea8 bit 14 -> false // oor_addr_report
21:01 fdobridge: <g​fxstrand> 0x419ea8 bit 15 -> false // misaligned_addr_report
21:01 fdobridge: <g​fxstrand> 0x419ea8 bit 18 -> false // invalid_const_addr_pdc_report
21:01 fdobridge: <g​fxstrand> 0x419ea8 bit 24 -> false // Unknown
21:01 fdobridge: <g​fxstrand> ```
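The five `0x419ea8` entries in the list above can be folded into a single mask. The C sketch below only does the bit arithmetic on a plain integer. It does not touch hardware, and `sm_report_bits`, `bits_to_mask` and `clear_report_bits` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions listed as "-> false" for 0x419ea8 in the dump above. */
static const unsigned sm_report_bits[] = { 2, 14, 15, 18, 24 };
static const size_t sm_report_bit_count =
    sizeof(sm_report_bits) / sizeof(sm_report_bits[0]);

/* Builds a mask with one bit set per listed position. */
static uint32_t bits_to_mask(const unsigned *bits, size_t count)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < count; i++) {
        assert(bits[i] < 32); /* shifting by 32 or more is undefined */
        mask |= UINT32_C(1) << bits[i];
    }
    return mask;
}

/* Clears the listed report bits in a register value, leaving the rest alone. */
static uint32_t clear_report_bits(uint32_t reg_value)
{
    return reg_value & ~bits_to_mask(sm_report_bits, sm_report_bit_count);
}
```

The combined mask for bits 2, 14, 15, 18 and 24 is 0x0104c004, so a read-modify-write that mirrors the listed settings would AND the current register value with the complement of that mask.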
21:04 fdobridge: <g​fxstrand> Programming `0x419ea8` exactly the same as the blob and I still have subgroup fails. 🙀
23:32 fdobridge: <g​fxstrand> Uh... so... vote appears to be broken in fragment shaders...
23:34 fdobridge: <g​fxstrand> Or maybe we need to set another helper bit.
23:54 Lyude: 24 files changed, +1802 -2 diff, wow it's been ages since I've written that much code
23:55 gfxstrand: :D
23:55 karolherbst: :D