00:04 mhenning[d]: skeggsb9778: Hi, do you remember what prompted you to write the if statement here? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fifo.c?h=v6.16#n514
00:04 mhenning[d]: I'm running into an issue where a channel with NOUVEAU_FIFO_ENGINE_CE allocated from userspace runs into job timeouts and I'm wondering if this might be related. (NOUVEAU_FIFO_ENGINE_GR works fine with no other changes)
00:09 airlied[d]: do you see standalone GRCEs?
00:10 mhenning[d]: That's a good question
00:11 mhenning[d]: Since NOUVEAU_FIFO_ENGINE_CE behaves differently from NOUVEAU_FIFO_ENGINE_GR I would assume so?
00:15 airlied[d]: no, a GRCE is different to a CE
00:15 airlied[d]: a GRCE is a CE that is grouped with a GR, but it sounds like there were sometimes lone GRCEs reported that weren't linked to a GR at all
00:16 airlied[d]: but you should still have plain CE I think
00:16 airlied[d]: probably need to dump the table gsp gives to know for sure
00:33 mhenning[d]: oh, yeah I had assumed that a "GRCE without GR" was a CE
00:33 mhenning[d]: but yeah, I'll take another look at it tomorrow
01:07 airlied[d]: mhenning[d]: what hw are you seeing it on?
01:08 mhenning[d]: 3060
01:08 airlied[d]: I'm just running talos principle benchmark on turing right now on your branch
01:10 mhenning[d]: Could you try with NVK_DEBUG=push_sync ? That seems to make it more likely on my machine
01:19 airlied[d]: goes slower but doesn't die, will give it a few runs
01:24 mhenning[d]: hmm, it dies immediately in the loading screen here
01:24 mhenning[d]: that is, before the main menu even opens
01:31 airlied[d]: diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
01:31 airlied[d]: index 9f345a008717..bf3ffcc85379 100644
01:31 airlied[d]: --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
01:31 airlied[d]: +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
01:31 airlied[d]: @@ -389,9 +389,11 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
01:31 airlied[d]: rcu_read_lock();
01:31 airlied[d]: prev = rcu_dereference(f->channel);
01:31 airlied[d]: local = prev && prev->cli->drm == chan->cli->drm;
01:31 airlied[d]: +#if 0
01:31 airlied[d]: if (local && (prev == chan ||
01:31 airlied[d]: fctx->sync(f, prev, chan) == 0))
01:32 airlied[d]: must_wait = false;
01:32 airlied[d]: +#endif
01:32 airlied[d]: rcu_read_unlock();
01:32 airlied[d]: if (!must_wait)
01:32 airlied[d]: continue;
01:32 airlied[d]: mhenning[d]: does hacking the kernel like the above change it?
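(For context, reading the surrounding nouveau_fence_sync() code, so take with a grain of salt: with that #if 0 in place must_wait never gets cleared, so every dependency on another channel's fence falls back to a CPU-side dma-fence wait instead of the on-GPU fctx->sync() fast path; the hack trades the suspect inter-channel semaphore sync for a blocking wait.)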
02:18 airlied[d]: okay got 3050 to trigger it
02:37 mhenning[d]: I don't have time to test more stuff tonight but let me know if you want me to test anything tomorrow.
03:23 airlied[d]: okay I can confirm burning cross context fence sync seems to fix it
04:44 karolherbst[d]: airlied[d]: I'll probably wire up the matrix size lowering on nvk today
04:45 airlied[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1405412004256284803/0001-nouveau-fence-add-a-wfi-to-the-fence-sync-on-volta.patch?ex=689ebb6d&is=689d69ed&hm=ef19bed9ab6ae229c798dd6c94ffb5c115e59fb0c1977843686401ed250ac4b7&
04:45 airlied[d]: mhenning[d]: can you see if that patch makes any consistent difference?
05:03 mhenning[d]: I'll try it tomorrow. Is the change from ACQ_CIRC_GEQ to ACQ_STRICT_GEQ intentional? It seems like a different change.
05:07 airlied[d]: it was intentional because it seemed to help, but it might just be papering over it
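For reference, a minimal sketch in plain C of how those two acquire modes are usually understood (semantics inferred from the method names, not copied from the class headers): ACQ_CIRC_GEQ does a wrap-tolerant greater-or-equal on the 32-bit payload, while ACQ_STRICT_GEQ is a plain unsigned compare, so the two only diverge once the sequence counter wraps.

#include <stdbool.h>
#include <stdint.h>

/* Wrap-tolerant compare: true once sem has reached or passed wait_val
 * modulo 2^32, so a wrapping counter keeps working. */
static bool acq_circ_geq(uint32_t sem, uint32_t wait_val)
{
	return (int32_t)(sem - wait_val) >= 0;
}

/* Strict compare: plain unsigned >=, which goes false again if the
 * counter wraps back below wait_val. */
static bool acq_strict_geq(uint32_t sem, uint32_t wait_val)
{
	return sem >= wait_val;
}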
05:07 airlied[d]: another idea is to maybe look at why we have if (!(engines & NVKMD_ENGINE_3D))
05:07 airlied[d]: barriers &= ~NVK_BARRIER_RENDER_WFI;
05:07 airlied[d]: because it might be necessary for userspace to WFI there for transfer queues
05:15 mhenning[d]: Oh, yeah that wfi is dispatched to the graphics queue which is why we skip it without graphics
05:15 mhenning[d]: but yeah, it's possible we should have a wfi on the userspace side.
05:25 airlied[d]: uvm seems to insist that wfi/membar is emitted after transfers
05:25 airlied[d]: on the copy engine
05:25 airlied[d]: and it even mentions Ampere as when it matters
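If a wait on the copy engine really is needed for transfer-only queues, a minimal sketch of the userspace direction being discussed (the NVK_BARRIER_COPY_WFI bit and its handling are hypothetical, illustrative names, not actual NVK code):

/* Hypothetical sketch, not actual NVK code: instead of just dropping the
 * render WFI on queues without 3D, request an equivalent wait on the copy
 * engine so transfers are idle before the fence/semaphore releases. */
if (!(engines & NVKMD_ENGINE_3D)) {
	barriers &= ~NVK_BARRIER_RENDER_WFI;
	barriers |= NVK_BARRIER_COPY_WFI;  /* hypothetical new barrier bit */
}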
09:12 phomes_[d]: Serious Sam still crash the same way with that kernel patch
09:53 airlied[d]: phomes_[d]: yeah I expect we need the userspace change probably
11:43 karolherbst[d]: airlied[d]: so seems like your pass chokes on arrays derefs in `vk_cooperative_matrix_perf`
11:44 karolherbst[d]: though maybe I did something wrong? but I doubt it..
11:46 airlied[d]: I wouldn't be surprised, I've only tested the CTS, though I thought I got arrays right
11:48 karolherbst[d]: it's asserting in nir_build_deref_follower
11:48 karolherbst[d]: but let me run the CTS first...
11:49 karolherbst[d]: airlied[d]: the changes: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/6fe3d45aafb18ac430e48a63583454ee72ef391b
11:49 karolherbst[d]: tldr: dim_gran needs to go 😄
11:50 karolherbst[d]: it'd be cool if load_store lowering wouldn't rely on the `dim_gran` value, then it could go completely
11:51 karolherbst[d]: though maybe I messed up split_cmat_length ...
11:51 karolherbst[d]: tried to make it work without relying on `dim_gran` at all, but let me use the one I stash into the split_mat thing..
11:54 karolherbst[d]: anyway.. `split_cmat_length` is kinda weird because it's doing something that's already done earlier, no?
11:54 karolherbst[d]: anyway, it's not that
11:56 karolherbst[d]: airlied[d]: anyway.. maybe you want to advertise a matrix size not supported by hardware in radv, like we do with NVK, and run the CTS lowering that via `nir_lower_cooperative_matrix_flexible_dimensions`?
11:56 karolherbst[d]: I'm seeing other issues in the CTS as well
11:57 karolherbst[d]: ohh wait.. `cmat_length` doesn't have a src.. uhhh...
12:03 karolherbst[d]: okay..
12:03 karolherbst[d]: CTS: ` Failed: 0/11644 (0.0%)`
12:03 karolherbst[d]: `vk_cooperative_matrix_perf` still broken
12:05 karolherbst[d]: anyway, dumped the info on the MR
12:06 karolherbst[d]: airlied[d]: it's not about arrays, it's about array derefs on coop-matrix directly
12:06 karolherbst[d]: apparently that's legal
12:06 karolherbst[d]: but
12:06 karolherbst[d]: `nir_build_deref_follower` seems to choke on it
12:55 misyltoad[d]: Is there anyone working on Vulkan Video for NVK?
12:55 misyltoad[d]: Would be super interested in testing stuff there at some point when H265 is hooked up
13:18 kar1m0[d]: might be wrong but
13:18 kar1m0[d]: isn't this what you mean? or at least similar to it
13:18 kar1m0[d]: https://nouveau.freedesktop.org/VideoAcceleration.html
13:20 chikuwad[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31867
13:20 chikuwad[d]: misyltoad[d]
13:22 misyltoad[d]: looks like just decode rn ah
13:25 esdrastarsis[d]: misyltoad[d]: Faith
13:25 mohamexiety[d]: misyltoad[d]: yeah so far it's just decode only. joint effort between dwlsalmeida and gfxstrand[d] (who's working on merging it soon)
13:27 dwlsalmeida[d]: misyltoad[d]: I'm pretty sure the mpeg stuff is working, we're lacking on the open source codecs front though
13:27 misyltoad[d]: thats cool
13:28 misyltoad[d]: i should have clarified that i was interested in encode though x)
13:29 dwlsalmeida[d]: yeah, I think we should ideally get that MR merged before trying to get encode working
13:30 gfxstrand[d]: I'm hoping to get back to it today. I got distracted by Android a bit but now my brain has melted and I need to write code again.
13:30 mohamexiety[d]: that android stuff (if it's the image stuff) was scary tbh
13:33 dwlsalmeida[d]: misyltoad[d]: do you have a few cycles to spend on this?
13:33 dwlsalmeida[d]: there may be something you can do to help
13:34 misyltoad[d]: dwlsalmeida[d]: As much as I would like to, I don't think I have anywhere near enough knowledge of NV hardware to help much there :P
13:36 dwlsalmeida[d]: right, we have to figure out why this thing is 10x slower than my previous C implementation
13:37 dwlsalmeida[d]: I managed to isolate this to downloading the decoded frame from the GPU
13:37 dwlsalmeida[d]: ideally we'd have to nail this down further
13:37 dwlsalmeida[d]: or it won't be useful at all to users
13:40 dwlsalmeida[d]: people have noticed this already, e.g. Stephane has commented about it on the MR
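One general Vulkan thing worth ruling out for slow readbacks (a generic note, not a diagnosis of this MR): if the mapped allocation is HOST_VISIBLE but not HOST_CACHED, CPU reads go through an uncached/write-combined mapping and can easily be an order of magnitude slower than copying into a cached staging buffer first. A small self-contained helper to check which kind of memory type an allocation landed in (pdev and type_index are placeholder names):

#include <stdbool.h>
#include <stdint.h>
#include <vulkan/vulkan.h>

/* Returns true when the given memory type index is host-cached, i.e. CPU
 * reads of a mapping won't go over an uncached/write-combined path. */
static bool memory_type_is_host_cached(VkPhysicalDevice pdev, uint32_t type_index)
{
	VkPhysicalDeviceMemoryProperties props;
	vkGetPhysicalDeviceMemoryProperties(pdev, &props);
	return (props.memoryTypes[type_index].propertyFlags &
	        VK_MEMORY_PROPERTY_HOST_CACHED_BIT) != 0;
}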
14:45 mhenning[d]: dwlsalmeida[d]: gfxstrand[d] There's a contributor here who seems to be willing to help with the video effort btw: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36719
15:16 mhenning[d]: airlied[d]: where is this? grep is failing me at the moment
15:59 jja2000[d]: mhenning[d]: Ayyy yeah that's Diogo, he tried to get Tegra in a better spot with the mesa driver as well, but the MR did not gain too much traction
15:59 jja2000[d]: I'll see if I can get him to join here
16:34 x512[m]: cubanismo[d]: How am I supposed to signal one sem_surf from another sem_surf?
16:51 cubanismo[d]: I'm not sure what that means.
16:52 cubanismo[d]: They don't signal things. They can themselves be signaled.
16:53 cubanismo[d]: And waited upon
17:25 Calvin75: Hello
17:27 x512[m]: cubanismo[d]: For example, how do I make it signal some other sem_surf in mem_mapper than the one specified in the constructor?
17:29 cubanismo[d]: They can't do that directly. The expectation is you'd use indirection if you had to.
17:29 x512[m]: Make a dummy channel submission?
17:30 cubanismo[d]: Yeah, if you don't have any other work or didn't have a spare CPU thread around I guess.
17:30 cubanismo[d]: Generally you don't need to wait on work except in the context of doing more work though
17:31 cubanismo[d]: You could certainly extend the auto-increment mechanism to signal arbitrary other semaphore surfaces, but due to the recursive nature, thar be dragons with locking and stuff.
17:32 cubanismo[d]: We haven't run into a need in the proprietary driver, but of course dummy submissions are very cheap with userspace submission.
17:34 cubanismo[d]: Are you trying to run NVK on OpenRM?
17:35 mohamexiety[d]: yup
17:35 mohamexiety[d]: x512 is trying to port NVK to HaikuOS on OpenRM
17:35 mohamexiety[d]: there's also a separate effort based on their work to run NVK on openrm on linux, but that's been stalled for a while now
17:40 x512[m]: NVK on Haiku with OpenRM is already functional, but all submissions are currently blocking, Vulkan semaphores/fences are not implemented yet.
17:51 x512[m]: Source code if someone cares: https://github.com/X547/mesa/commits/mesa-nvk-r2
17:54 steel01[d]: jja2000[d]: What pr was that?
18:00 cubanismo[d]: x512[m]: That'll be interesting.
18:00 cubanismo[d]: Looking forward to the Phoronix benchmarks
18:02 jja2000[d]: steel01[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20811 it's been a while
18:02 x512[m]: Many things are already working: https://discuss.haiku-os.org/t/haiku-nvidia-porting-nvidia-driver-for-turing-gpus/16520/59
18:04 cubanismo[d]: Very cool
18:05 cubanismo[d]: Does it run on aarch64?
18:06 x512[m]: Haiku itself currently does not run on aarch64. Only on x86[_64] and riscv64.
18:06 x512[m]: In theory an Nvidia GPU may work on RISC-V boards on Haiku, but that's not tested yet.
18:25 steveonfire: There does not appear to be such a science or field in maths, at least not one accessible to me. The last message where I tried to do combinations I could not parse again later on. It looks like I am doing the combinations in reverse somehow. In other words, I can not make the calculations in combinatory math roots (I just do not have enough steam or intelligence/experience to do those, I have not read that
18:25 steveonfire: many math books). https://www.calculatorsoup.com/calculators/discretemathematics/permutationsreplacement.php that is a simple power-based method. And it might suit, but I have no idea if what I did with it in the context of the quadrants examples is correct. n-objects 32 r-sample 4 = 1048576, whereas 64 and 8 = 281474976710656, so n-objects 8 r-sample 4 = 4096, but 16 and 4 is 65536, so those
18:25 steveonfire: are the 32-bit and 64-bit multiplicands and multipliers respectively getting to max coverage. Such as 1048576*4096 is the 32-bit max, whereas 281474976710656*65536 is the 64-bit max. I am sceptical, I can not understand especially where that 64 and 8 comes from, to get to 2^48, but 2^20, which is 4 from 32, seems plausible.
18:40 cubanismo[d]: I don't know if our official RISC-V support is even out yet. I saw it was announced. I guess in theory you just recompile the open files for RISC-V, but I don't know if there were specific fixes needed or anything. Usually that stuff doesn't just work. Bugs are uncovered, etc., but I didn't do any work on the RISC-V port myself.
18:41 chikuwad[d]: even if openrm runs, do userspace risc-v libs exist
18:42 notthatclippy[d]: OpenRM should work on risc-v. Userspace obviously doesn't, but haiku implements their own anyway
18:43 x512[m]: NVK with NVRM backend should compile for RISC-V.
18:45 mangodev[d]: kinda lost in this conversation
18:45 mangodev[d]: what is openrm?
18:47 redsheep[d]: The open Nvidia kernel driver. Can also be called ogk
18:48 x512[m]: This thing: https://github.com/NVIDIA/open-gpu-kernel-modules
18:50 x512[m]: It is kind of annoying that OpenRM uses heap allocation while interrupts are disabled. The Haiku kernel does not support that directly.
18:51 cubanismo[d]: Hehe, sorry, sem_surf specifically relies on this.
18:52 cubanismo[d]: I'm aware it's sort of nasty
18:52 cubanismo[d]: But it works on Linux.
18:53 x512[m]: Currently some workaround allocation pool is implemented to solve this problem.
18:54 cubanismo[d]: Yeah, my plan if we ever needed it on platforms that don't allow malloc there had been to replace various maps/lists with fixed-size structures.
18:54 cubanismo[d]: You could pick reasonable maximums for all that stuff and not waste too much memory.
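A minimal sketch of that fixed-size approach in generic C (illustrative names only, nothing here is from OpenRM or Haiku): entries are pre-allocated up front and alloc/free just push and pop a free list, so nothing touches the heap while interrupts are off. Locking is omitted for brevity; a real pool would guard the free list with a spinlock.

#include <stddef.h>

#define POOL_ENTRIES 64

struct pool_entry {
	struct pool_entry *next;
	unsigned char data[128];        /* fixed maximum object size */
};

static struct pool_entry pool_storage[POOL_ENTRIES];
static struct pool_entry *pool_free_list;

/* Build the free list once, at init time, while allocating is still fine. */
static void pool_init(void)
{
	for (size_t i = 0; i < POOL_ENTRIES; i++) {
		pool_storage[i].next = pool_free_list;
		pool_free_list = &pool_storage[i];
	}
}

/* Safe with interrupts disabled: only pops the pre-built free list. */
static void *pool_alloc(void)
{
	struct pool_entry *e = pool_free_list;

	if (!e)
		return NULL;            /* pool exhausted; caller must handle it */
	pool_free_list = e->next;
	return e->data;
}

static void pool_free(void *p)
{
	struct pool_entry *e =
		(struct pool_entry *)((char *)p - offsetof(struct pool_entry, data));

	e->next = pool_free_list;
	pool_free_list = e;
}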
19:01 mhenning[d]: airlied[d]: No, this doesn't make a difference on my machine
19:14 ermine1716[d]: So a card with a RISC-V processor on board can't run on RISC-V?
19:19 mhenning[d]: airlied[d]: This also does not seem to change anything.
19:38 airlied[d]: Weird, I guess I'm just getting lucky that it works here, though I suspect that area is where it's busted
20:13 juliosanchez: My guess is that it's based off a text/word lesson/assignment/problem, it's worded with some statements, and since 4 bits is symmetric and has no multiplier when another 4 is met, it gets cancelled out and simplified somehow in a sequence. So 64/16 gets simplified to 64/8 and 32/8 to 32/4, so the word problem is likely a mix of combination and permutation wording. Since there is no
20:13 juliosanchez: simplification and cancellation in permutations with replacement. So there _is_ some small chance that those calculations hold, but I would not be entirely surprised out of my boots if they do.
22:20 airlied[d]: karolherbst[d]: going to try and rework it all into a single pass today, I did half of it yesterday but got distracted