10:49fdobridge: <karolherbst🐧🦀> `gpio: GPU is missing power, check its power cables.` glad that this check still works :ferrisUpsideDown:
13:00fdobridge: <karolherbst🐧🦀> please test https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27853
13:00fdobridge: <karolherbst🐧🦀> @airlied @gfxstrand mind giving it a review?
13:08fdobridge: <Sid> :o
13:08fdobridge: <Sid> what does it do
13:09fdobridge: <Sid> like, what should I test for
13:13fdobridge: <karolherbst🐧🦀> just doing gl/vaapi/vdpau stuff
13:25fdobridge: <Sid> huh
15:10dakr: karolherbst, "so it seems if you call DRM_NOUVEAU_VM_INIT and then just terminate the process, the kernel dead locks inside `drm_postclose`"
15:11dakr: I tried that and can't get it to lock up.
15:11karolherbst: mhh
15:11karolherbst: I'll check it out today anyway, just getting a new kernel to compile
15:11dakr: Any other specific conditions?
15:11karolherbst: dakr: did you try on 6.7 though?
15:12dakr: v6.7.7, even without the other fix
15:12karolherbst: mhh
15:12karolherbst: maybe you need to allocate a channel as well?
15:12karolherbst: I triggered it within the gl driver
15:12dakr: let me try real quick.
15:13karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/tree/nouveau/vm_bind?ref_type=heads
15:13karolherbst: just assert somewhere after the VM_BIND thing
15:15dakr: yeah, so I can't get it to fail, there must be some other condition with mesa.
15:15dakr: Will try with your branch later.
15:16karolherbst: dakr: don't worry, I'll try in a couple of minutes with and without your fix
15:36karolherbst: dakr: that branch doesn't trigger it either 🙃 need to revert the glsl_type_singleton_init_or_ref commit on there and then just run any gl application and it triggers...
15:36karolherbst: now testing your patch
15:39karolherbst: dakr: mind pasting your patch again? :D
15:40dakr: diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40dakr: index 50589f982d1a..75545da9d1e9 100644
15:40dakr: --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40dakr: +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
15:40dakr: @@ -708,10 +708,11 @@ nouveau_drm_device_fini(struct drm_device *dev)
15:40dakr: }
15:40dakr: mutex_unlock(&drm->clients_lock);
15:40dakr:
15:40dakr: - nouveau_sched_fini(drm);
15:40dakr: -
15:40dakr: nouveau_cli_fini(&drm->client);
15:40dakr: nouveau_cli_fini(&drm->master);
15:40dakr: +
15:40dakr: + nouveau_sched_fini(drm);
15:40dakr: +
15:40dakr: nvif_parent_dtor(&drm->parent);
15:40karolherbst: IRC moments
15:40dakr: mutex_destroy(&drm->clients_lock);
15:41dakr: kfree(drm);
15:47karolherbst: mind using a pastebin website the next time or something? :D IRC really messes up patches
15:44karolherbst: dakr: on what branch would that apply? Because it doesn't on drm-misc-fixes
15:45dakr: karolherbst, v6.7, v6.8 got a rework, the patch would be for v6.7 only.
15:45karolherbst: I see
15:45karolherbst: hitting the issue on drm-misc-fixes as well btw
15:45dakr: Ok, then that's something else.
15:46karolherbst: I have a serial console on my desktop, so maybe that throws something out
15:47karolherbst: it didn't before, probably because I haven't waited long enough for the hangchecker to kick in
16:28fdobridge: <babblebones> How's the state of nouveau displayport audio looking?
16:28fdobridge: <karolherbst🐧🦀> should work unless you hit bugs
16:30karolherbst: dakr: okay so yeah.. this one is a little annoying as it doesn't seem to get picked up by the hang checker. Maybe I need to turn on more debugging things, but anyway, your patch doesn't help
16:31fdobridge: <rinlovesyou> oh real?
16:32fdobridge: <karolherbst🐧🦀> yeah, nouveau just has to wire some stuff up, but it's up to the intel hda audio driver to do the real thing
16:32fdobridge: <karolherbst🐧🦀> it's also just sometimes buggy
16:32fdobridge: <rinlovesyou> i see, a few weeks ago it wasn't working, lemme test it now
16:32fdobridge: <karolherbst🐧🦀> I mean.. your bug is probably not fixed
16:33fdobridge: <rinlovesyou> is that something on our side we might be able to fix? or is that an issue with the actual audio driver?
16:34fdobridge: <rinlovesyou> dp audio certainly works on proprietary
16:34fdobridge: <karolherbst🐧🦀> I suspect nouveau needs to do something, but I also don't really have the time to look into it
16:35fdobridge: <rinlovesyou> i doubt i can do anything, but do you know roughly where i should be looking? (edited)
16:37fdobridge: <karolherbst🐧🦀> not really
16:38fdobridge: <karolherbst🐧🦀> it's unknown what the actual problem here is. It's probably better to figure out what nvidia is doing to make it all work
16:38fdobridge: <karolherbst🐧🦀> might involve some firmware call
16:39fdobridge: <karolherbst🐧🦀> maybe some call into the audio driver
16:39fdobridge: <karolherbst🐧🦀> dunno
16:58fdobridge: <Sid> *sweat*
17:01fdobridge: <rinlovesyou> :')
17:01fdobridge: <Sid> no rin go look at #lounge
17:01fdobridge: <rinlovesyou> kek
17:32karolherbst: dakr: I think it's this line: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_drm.c?h=v6.7.8#n1185
17:32karolherbst: the mutex_lock
17:32karolherbst: but no idea why that happens
17:33karolherbst: mhhhh
17:33dakr: karolherbst, that's pretty weird.
17:33karolherbst: actually...
17:33karolherbst: like..
17:33karolherbst: if the VM_INIT ioctl fails, does it unlock this lock?
17:33dakr: Can't think of any relevant other locks we're holding there..
17:33fdobridge: <gfxstrand> Ugh... Something's not right with trying to set this register. Sometimes I set it and the test passes and sometimes the test fails.
17:34karolherbst: dakr: it doesn't :D
17:34karolherbst: or does it?
17:34karolherbst: ehhh...
17:34karolherbst: nah it should.. mhh, that's weird
17:34dakr: I think it does..
17:35karolherbst: maybe it's a different one...
17:35dakr: no lockdep splat?
17:35karolherbst: lockdep disabled, because it doesn't really work on nouveau anyway (we really should fix that)
17:35karolherbst: mhhh
17:35karolherbst: I think the old submit ioctl might cause it
17:36dakr: At least without GSP and Pascal lockdep was working fine for me.
17:36karolherbst: let me try it then...
17:37karolherbst: maybe the way things got reworked makes it work now
17:38karolherbst: ohhh...
17:38karolherbst: mhhh
17:38karolherbst: dakr: I think it needs to call nouveau_abi16_put
17:38karolherbst: for the `(unlikely(nouveau_cli_uvmm(cli)))` case
17:40karolherbst: dakr: https://gist.githubusercontent.com/karolherbst/8175d4830f13cc15d97a7af1982fa9a8/raw/e8d3185fa999db938064bf16969629e6a4a3d665/tmp.patch
17:40karolherbst: testing that shortly
17:40dakr: Oh, indeed. Good catch.
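The lock leak discussed above can be sketched in plain C. This is a minimal model, not the kernel code: `struct client`, `abi16_get`, `abi16_put`, `old_submit_buggy` and `old_submit_fixed` are hypothetical stand-ins, the mutex is reduced to a boolean, and the `-ENOSYS` return value is an assumption. It only illustrates why an early return between the "get" (lock) and the "put" (unlock) leaves the lock held, so that the next user of the same lock (here, a second call; in the discussion, the close path) gets stuck.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-client state; not the kernel struct. */
struct client {
    bool locked;   /* models the client mutex */
    bool has_uvmm; /* models "VM_INIT was called on this client" */
};

/* Models the "get" helper: takes the client lock.
 * A real mutex would block forever here; the model returns NULL instead. */
static struct client *abi16_get(struct client *cli)
{
    if (cli->locked)
        return NULL;
    cli->locked = true;
    return cli;
}

/* Models the "put" helper: drops the lock and passes the return code through. */
static int abi16_put(struct client *cli, int ret)
{
    cli->locked = false;
    return ret;
}

/* Buggy shape: the early return skips the put, so the lock stays held. */
int old_submit_buggy(struct client *cli)
{
    if (!abi16_get(cli))
        return -EDEADLK; /* stands in for "hangs on mutex_lock" */
    if (cli->has_uvmm)
        return -ENOSYS;  /* BUG: lock is never released */
    return abi16_put(cli, 0);
}

/* Fixed shape: every exit after a successful get goes through put. */
int old_submit_fixed(struct client *cli)
{
    if (!abi16_get(cli))
        return -EDEADLK;
    if (cli->has_uvmm)
        return abi16_put(cli, -ENOSYS);
    return abi16_put(cli, 0);
}
```

In this model, calling `old_submit_buggy` on a client with `has_uvmm` set leaves `locked` true, and any later attempt to take the lock fails; `old_submit_fixed` returns the same error but leaves the lock free.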
17:42fdobridge: <karolherbst🐧🦀> might want to see how nvidia does it?
17:42fdobridge: <karolherbst🐧🦀> that regs address might come up in traces
17:44fdobridge: <gfxstrand> Yeah, I need to look at their programming.
17:45fdobridge: <gfxstrand> The immediate problem is, I think, power management screwing it up on my laptop. I seem to be able to more reliably reproduce both behaviors on the desktop
17:45fdobridge: <karolherbst🐧🦀> mhhhh
17:45fdobridge: <karolherbst🐧🦀> do we touch that reg in the kernel?
17:46fdobridge: <karolherbst🐧🦀> though that shouldn't matter...
17:46fdobridge: <karolherbst🐧🦀> that reg is context switched
17:46fdobridge: <karolherbst🐧🦀> unless the kernel does stuff per context, then it matters
18:32karolherbst: dakr: sent the fix and another UAPI thing to the ML
18:51fdobridge: <zmike.> So where should I be requesting vkcts coverage to hit this?
18:53fdobridge: <zmike.> Or @gfxstrand if you file a coverage ticket you can cc me and I'll assign
18:56fdobridge: <gfxstrand> To hit what? This hang? Let me file one quick.
19:00fdobridge: <gfxstrand> @zmike. https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4984
19:02fdobridge: <gfxstrand> I *think* that should reproduce it. It's hard to be 100% sure but, based on my triage, I think that should tickle the bug.
19:04fdobridge: <zmike.> Cool
19:06HdkR: Nice. NVIDIA GPU has arrived. I'll tinker with that after I do a FEX release
19:07fdobridge: <gfxstrand> \o/
19:08fdobridge: <zmike.> @gfxstrand same goes for any other coverage gaps you find
19:57HdkR: https://cdn.discordapp.com/attachments/702130548419723294/1214290204370800670/IMG_1500.jpg?ex=65f892e1&is=65e61de1&hm=e54ca54e396ebe4d51fa1b345f3193bd14f5efd9aa8a2df6fd418d64ebd6e853&
19:57HdkR: So cute, so smol
20:04fdobridge: <mohamexiety> daaaaaaaamn
20:05fdobridge: <tom3026> never seen a gt 1030 ? 😄
20:07fdobridge: <mohamexiety> I have one, actually \:p
20:07fdobridge: <mohamexiety> just never seen a 4000 SFF before
20:10fdobridge: <mohamexiety> funny old picture I had. the difference in size is hilarious (this is also one of the slimmer 3080s)
20:10fdobridge: <mohamexiety> https://cdn.discordapp.com/attachments/1034184951790305330/1214303904091607141/20230317_180850.png?ex=65f89fa3&is=65e62aa3&hm=d7289dec57c1ae1ca34b553a253d0806b914b343b35cefd0cb741b9a626e9655&
20:10fdobridge: <Sid> gt 1030, manufactured e-waste
20:22fdobridge: <gfxstrand> 419ea8 doesn't show up at all in the xxd of a dump. 😢
20:22fdobridge: <karolherbst🐧🦀> that's like.. unfortunate
20:23fdobridge: <gfxstrand> I should see if I can make a crucible reproducer and throw it at the blob
20:23fdobridge: <karolherbst🐧🦀> might be they workaround it on the kernel level...
20:23fdobridge: <gfxstrand> Maybe?
20:24fdobridge: <karolherbst🐧🦀> or it's a different reg 😄
20:24fdobridge: <karolherbst🐧🦀> maybe dump all the registers in the 0x418000+ area
20:24fdobridge: <gfxstrand> *sigh* (edited)
20:25fdobridge: <gfxstrand> The weird bit is that bit 14 of `0x419ea8` definitely does disable the exception.
20:25fdobridge: <gfxstrand> But it also breaks subgroups so WTH
20:25fdobridge: <karolherbst🐧🦀> yeah.. that's a bit wtf
20:25fdobridge: <karolherbst🐧🦀> in what way does it break them though?
20:27fdobridge: <gfxstrand> Still figuring that out
20:27fdobridge: <gfxstrand> It appears that at least `vote.any` is a bit busted
20:29fdobridge: <gfxstrand> Here's the shader:
20:29fdobridge: <gfxstrand> ```glsl
20:29fdobridge: <gfxstrand> #version 450
20:29fdobridge: <gfxstrand> #extension GL_KHR_shader_subgroup_vote: enable
20:29fdobridge: <gfxstrand> layout(location = 0) out uint result;
20:29fdobridge: <gfxstrand> layout(set = 0, binding = 4, std430) readonly buffer Buffer1
20:29fdobridge: <gfxstrand> {
20:30fdobridge: <gfxstrand> bool data[];
20:30fdobridge: <gfxstrand> };
20:30fdobridge: <gfxstrand> void main (void)
20:30fdobridge: <gfxstrand> {
20:30fdobridge: <gfxstrand> uint tempRes;
20:30fdobridge: <gfxstrand> bool valueEqual = bool(1.25 * float(data[gl_SubgroupInvocationID]) + 5.0);
20:30fdobridge: <gfxstrand> bool valueNoEqual = bool(subgroupElect());
20:30fdobridge: <gfxstrand> tempRes = subgroupAllEqual(bool(1)) ? 0x1 : 0;
20:30fdobridge: <gfxstrand> tempRes |= 0x2;
20:30fdobridge: <gfxstrand> tempRes |= subgroupAllEqual(data[0]) ? 0x4 : 0;
20:30fdobridge: <gfxstrand> tempRes |= subgroupAllEqual(valueEqual) ? 0x8 : 0x0;
20:30fdobridge: <gfxstrand> tempRes |= subgroupAllEqual(valueNoEqual) ? 0x0 : 0x10;
20:30fdobridge: <gfxstrand> if (subgroupElect()) tempRes |= 0x2 | 0x10;
20:30fdobridge: <gfxstrand> result = tempRes;
20:30fdobridge: <gfxstrand> }
20:30fdobridge: <gfxstrand> ```
20:30fdobridge: <gfxstrand> The bit that's missing in the output is `0x4` so it's that `subgroupAllEqual(data[0])` that's going wrong. That or the `data[0]` load itself but that seems unlikely given that the subgroup tests are literally the only tests to fail.
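To make the expected output of that shader concrete, here is a small CPU-side reference model in C. It is a sketch under stated assumptions: a subgroup of 32 invocations, all of them active, with the lowest lane being the elected one. `SUBGROUP_SIZE`, `all_equal`, `reference_result` and `missing_bits` are hypothetical names introduced for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32u /* assumption: 32 invocations, all active */

/* CPU stand-in for subgroupAllEqual(): true if every lane holds the same value. */
static bool all_equal(const bool v[SUBGROUP_SIZE])
{
    for (unsigned i = 1; i < SUBGROUP_SIZE; i++)
        if (v[i] != v[0])
            return false;
    return true;
}

/* Value the shader should write to `result` under the assumptions above. */
uint32_t reference_result(const bool data[SUBGROUP_SIZE])
{
    bool one[SUBGROUP_SIZE], data0[SUBGROUP_SIZE];
    bool value_equal[SUBGROUP_SIZE], value_no_equal[SUBGROUP_SIZE];

    for (unsigned i = 0; i < SUBGROUP_SIZE; i++) {
        one[i] = true;                 /* bool(1) */
        data0[i] = data[0];            /* every lane reads the same element */
        /* 1.25 * {0,1} + 5.0 is never zero, so this is true on every lane */
        value_equal[i] = (1.25f * (float)data[i] + 5.0f) != 0.0f;
        value_no_equal[i] = (i == 0);  /* subgroupElect(): lowest active lane */
    }

    uint32_t res = 0;
    res |= all_equal(one) ? 0x1 : 0;
    res |= 0x2;
    res |= all_equal(data0) ? 0x4 : 0;
    res |= all_equal(value_equal) ? 0x8 : 0;
    res |= all_equal(value_no_equal) ? 0 : 0x10;
    /* The elected lane ORs in 0x2 | 0x10, which are already set here. */
    return res;
}

/* Bits that the reference expects but the observed value lacks. */
uint32_t missing_bits(uint32_t expected, uint32_t actual)
{
    return expected & ~actual;
}
```

Under these assumptions every lane should produce `0x1f`. Comparing that against an observed value with `missing_bits` isolates the failing check; a result lacking only the `0x4` bit points at `subgroupAllEqual(data[0])`, as described above.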
20:34fdobridge: <gfxstrand> But other tests have a different failure pattern
20:36fdobridge: <gfxstrand> `VOTE.EQ P1, P1` is the one that's failing. Maybe VOTE stops liking P1? That makes no sense
20:41fdobridge: <karolherbst🐧🦀> mhhh
20:41fdobridge: <karolherbst🐧🦀> data is an ssbo, right?
20:41fdobridge: <gfxstrand> Hrm... There's a SET_SHADER_EXCEPTIONS method...
20:41fdobridge: <gfxstrand> If I just disable all shader exceptions there, both tests pass
20:42fdobridge: <karolherbst🐧🦀> ohh, is nvidia using that?
20:42fdobridge: <karolherbst🐧🦀> I totally forgot that method exists...
20:43fdobridge: <gfxstrand> Okay, I found all their falcons
20:49fdobridge: <gfxstrand> ```
20:49fdobridge: <gfxstrand> 0x418800 bit 0 -> true
20:49fdobridge: <gfxstrand> 0x419a08 bit 4 -> false
20:49fdobridge: <gfxstrand> 0x419ba4 bit 3 -> false
20:49fdobridge: <gfxstrand> 0x419a04 bit 0 -> true
20:49fdobridge: <gfxstrand> 0x419a04 bit 1 -> true
20:49fdobridge: <gfxstrand> 0x418e40 bits 24..28 -> 0x7
20:49fdobridge: <gfxstrand> 0x418e50 bits 12..14 -> 0x2
20:49fdobridge: <gfxstrand> 0x418e64 bits 0..16 -> 0x935
20:49fdobridge: <gfxstrand> 0x419ea8 bit 2 -> false
20:49fdobridge: <gfxstrand> 0x419ea8 bit 14 -> false
20:49fdobridge: <gfxstrand> 0x419ea8 bit 15 -> false
20:50fdobridge: <gfxstrand> 0x419ea8 bit 18 -> false
20:50fdobridge: <gfxstrand> 0x419ea8 bit 24 -> false
20:50fdobridge: <gfxstrand> ```
20:50fdobridge: <gfxstrand> So they do set `0x419ea8`, I just missed it.
20:50fdobridge: <gfxstrand> Now to figure out what those bits do...
21:01fdobridge: <gfxstrand> ```
21:01fdobridge: <gfxstrand> 0x418800 bit 0 -> true // gr_pri_gpcs_setup_debug_poly_offset_nan_is_zero
21:01fdobridge: <gfxstrand> 0x419a08 bit 4 -> false // Unknown gr_pri_gpcs_tpcs_tex_samp_dbg
21:01fdobridge: <gfxstrand> 0x419ba4 bit 3 -> false // sm_disp_ctrl_ld_is_nop
21:01fdobridge: <gfxstrand> 0x419a04 bit 0 -> true // Unknown tpcs_tex_lod_dbg bit
21:01fdobridge: <gfxstrand> 0x419a04 bit 1 -> true // tpcs_tex_lod_dbg_cubeseam_aniso
21:01fdobridge: <gfxstrand> 0x418e40 bits 24..28 -> 0x7 // Unknown gr_pri_gpcs_swdx_tc_bundle_ctrl
21:01fdobridge: <gfxstrand> 0x418e50 bits 12..14 -> 0x2 // Unknown gr_pri_gpcs_swdx_tc_bundle_ctrl
21:01fdobridge: <gfxstrand> 0x418e64 bits 0..16 -> 0x935 // Unknown gr_pri_gpcs_swdx_tc_bundle_addr
21:01fdobridge: <gfxstrand> 0x419ea8 bit 2 -> false // api_stack_error_report
21:01fdobridge: <gfxstrand> 0x419ea8 bit 14 -> false // oor_addr_report
21:01fdobridge: <gfxstrand> 0x419ea8 bit 15 -> false // misaligned_addr_report
21:01fdobridge: <gfxstrand> 0x419ea8 bit 18 -> false // invalid_const_addr_pdc_report
21:01fdobridge: <gfxstrand> 0x419ea8 bit 24 -> false // Unknown
21:01fdobridge: <gfxstrand> ```
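The single-bit entries for `0x419ea8` in the list above amount to a read-modify-write that clears bits 2, 14, 15, 18 and 24 and leaves everything else alone. A minimal sketch in C of that bit manipulation follows; `reg_set_bit` and `apply_419ea8_overrides` are hypothetical helper names, and the code only transforms a 32-bit value, it does not touch hardware or show how the register is actually read or written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return `val` with bit number `bit` (0..31) forced on or off. */
static inline uint32_t reg_set_bit(uint32_t val, unsigned bit, bool on)
{
    uint32_t mask = UINT32_C(1) << bit;
    return on ? (val | mask) : (val & ~mask);
}

/* Apply the five "-> false" entries listed for 0x419ea8 to a register value. */
uint32_t apply_419ea8_overrides(uint32_t val)
{
    val = reg_set_bit(val, 2, false);  /* api_stack_error_report */
    val = reg_set_bit(val, 14, false); /* oor_addr_report */
    val = reg_set_bit(val, 15, false); /* misaligned_addr_report */
    val = reg_set_bit(val, 18, false); /* invalid_const_addr_pdc_report */
    val = reg_set_bit(val, 24, false); /* unknown */
    return val;
}
```

The multi-bit entries such as `bits 24..28 -> 0x7` are left out on purpose, because the list does not say whether the upper bound of a range is inclusive.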
21:04fdobridge: <gfxstrand> Programming `0x419ea8` exactly the same as the blob and I still have subgroup fails. 🙀
21:06mandavoska: general purpose "c'X" storage what would that missing word be? perhaps clock source and storage is also incorrect presumption
21:06mandavoska: basically three terms that i do not know what they are gpcs, pdc, tpcs last is something with textures and but..
21:07fdobridge: <!DodoNVK (she) 🇱🇹> Is subgroups the next XFB?
21:08fdobridge: <gfxstrand> no
21:09fdobridge: <gfxstrand> I'm sure this has an explanation. I just don't know what that explanation is yet
21:10mandavoska: but the registers have functionality that report errors
21:13mandavoska: considering that they report and configure error reporting this could be configuration source also
21:15mandavoska: but nvidias terminology due to partly missing documentation other than mwk ones so to speak mmiotrace annotations that i did not so carefully look at and envytools i was the weakest in
21:15mandavoska: i just know that streaming media engines
21:15mandavoska: are their compute engines
21:16mandavoska: and processing elements and those popular terms i know
21:16mandavoska: pe is simd
21:16fdobridge: <!DodoNVK (she) 🇱🇹> mandavoska: Doesn't PE mean Portable Executable? /s
21:17mandavoska: i did bunch of hardware languages simulations and research on verilog , i have pretty good imagination as to how hardware works too, however that is how i know that software has yet much performance under the hood, cause a userspace hardware simulation layer can be done yet
21:18mandavoska: it means portable executable too, but in nvidia terminology it's processing element
21:18mandavoska: its part of the shader engine sm
21:18mandavoska: sm is either compute engine or cu, and pe is either simd or one workitem
21:19mandavoska: or things like this
21:19mandavoska: people refer to it as simd lane also
21:21mandavoska: difference with cpus is that graphics cards work on simds and one lane is 16 processing elements
21:21mandavoska: they are called as threads on intel
21:21mandavoska: but that is something different than a cpu core or thread
21:21mandavoska: cause i worked on miaow graphics card code
21:22mandavoska: its gcn clone
21:23mandavoska: its an inherent property of alu unit, where one thread works on multiple data
21:23mandavoska: as the graphics often needs
21:25mandavoska: in that case simd though is single instruction multiple data, but all know that anyways, you have those pixels in sequence, and it is very effective to do so
21:33mandavoska: gpu is extremely difficult piece of hardware , it took a whole 5years session to study it to perfection, so opencl 2.1 is extremely complex api the maximum you can do there to optimize every part, but on one paradigm that we badly need, it offers no value compared to 1.2 and 3.0
21:46colonization: this paradigm i already introduced, clock units are resistors, but opening the self-timed world would allow to use all the hardware as weapons, like hypersonic, ultrasound etc. So that is why i do not vote for opening this regime, and i had developed an alternative performance model, which core model has been under or in full extend description by me so and hence has been fully described. I no longer have any details to share about its
21:46colonization: working procedures
23:32fdobridge: <gfxstrand> Uh... so... vote appears to be broken in fragment shaders...
23:34fdobridge: <gfxstrand> Or maybe we need to set another helper bit.
23:54Lyude: 24 files changed, +1802 -2 diff, wow it's been ages since I've written that much code
23:55gfxstrand: :D
23:55karolherbst: :D