02:17 mangodev[d]: gfxstrand[d]: i feel lucky i'm in that weird between period of cards that have GSP but *not* reBAR
02:30 mhenning[d]: mangodev[d]: (In this case, Faith is talking about the bar instruction, which is different from rebar)
02:40 mangodev[d]: mhenning[d]: ahhh okay
02:40 mangodev[d]: good to know
02:41 mangodev[d]: oh wait that's right
02:41 mangodev[d]: didn't NVK start supporting rebar over a year ago?
02:41 mangodev[d]: so that's not even a concern
02:58 mangodev[d]: anything of note the `vulkan-beta` build flag does for NVK?
03:14 airlied[d]: no
03:34 mangodev[d]: okay good because i already built my drivers w/o it
03:41 mangodev[d]: i wonder
03:41 mangodev[d]: what's a good way to test why moving the mouse between monitors divebombs render perf? is there a way to see the cause (whether to blame zink or nvk)?
04:07 gfxstrand[d]: Are the two monitors plugged into the same card?
04:11 mangodev[d]: gfxstrand[d]: yes
04:11 mangodev[d]: i don't *have* another graphics device
04:12 tiredchiku[d]: I wonder if the hardware cursor size is changing between monitors
04:13 mangodev[d]: maybe
04:13 mangodev[d]: it's being weird as hell
04:13 mangodev[d]: it's only when *crossing* monitors
04:13 mangodev[d]: not "it's laggy on X monitor"
04:13 mangodev[d]: it specifically lags when moving between monitors
04:15 mangodev[d]: i also want to find out why spectacle is so unbearably laggy on nvk
04:15 mangodev[d]: i take screenshots a lot, and a 10fps screenshotting menu is not ideal (especially because i haven't seen it on other drivers)
04:16 mangodev[d]: would enabling gpuvis markers help diagnose the issues?
04:19 gfxstrand[d]: mangodev[d]: Okay, that's weird
04:19 gfxstrand[d]: Are you dragging an app between monitors or just the cursor itself?
04:20 mangodev[d]: the perf is fine on both monitors (although admittedly prone to stutter and lag from gpu usage elsewhere on the system)
04:20 mangodev[d]: but specifically moving the cursor (with *and* without dragging a window or file) lags the entire compositor by a substantial amount
04:20 tiredchiku[d]: spectacle works fine for me on my system, fwiw
04:21 mangodev[d]: when mangohud didn't crash on start, unlocked glxgears went from about 150fps down to 20 just by moving my mouse between monitors
04:22 mangodev[d]: it's not when the mouse is present on both, it's specifically when the mouse anchor moves from one monitor to the other
04:22 mangodev[d]: and this did not happen on proprietary nvidia
04:23 gfxstrand[d]: That's weird
04:23 gfxstrand[d]: What compositor?
04:25 gfxstrand[d]: I wonder if something funny is going on with Zink and the two monitors.
04:25 mangodev[d]: gfxstrand[d]: kwin, latest
04:25 gfxstrand[d]: That's annoyingly plausible
05:02 gfxstrand[d]: I wonder if Zink isn't massively over-synchronizing in some cases. If the cursor is somehow an external texture and the two monitors are being drawn separately... Yeah, their two vblanks could be fighting.
05:03 gfxstrand[d]: Especially if it's somehow also putting that cursor in a plane.
05:03 gfxstrand[d]: That sounds a little crazy, though.
05:06 mhenning[d]: mangodev[d]: If you stop your mouse cursor so it's half on one monitor, half on the other, does performance remain poor?
05:07 mhenning[d]: Also might be worth filing a bug on https://gitlab.freedesktop.org/mesa/mesa/-/issues so it doesn't get lost
05:07 mangodev[d]: mhenning[d]: no
05:08 mangodev[d]: only when the origin of the mouse crosses monitor bounds
05:08 mangodev[d]: mhenning[d]: i should definitely have created an account by now but i haven't :P
05:10 tiredchiku[d]: https://tenor.com/view/do-it-now-gif-8259067036335066552
05:13 mangodev[d]: mangodev[d]: (the "origin" referring to the actual pixel your cursor is over, as opposed to the bounds of the sprite)
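A minimal sketch of the distinction mangodev draws between the cursor "origin" (the hotspot pixel) and the sprite bounds: the hotspot lies in exactly one output even when the sprite overlaps two, so "crossing monitors" means the owning output changes. This is an illustrative model only, not KWin code; `Output`, `output_for_hotspot`, and `crosses_output` are hypothetical names, and outputs are assumed to be non-overlapping rectangles in global coordinates.

```rust
/// A hypothetical output (monitor) rectangle in global compositor coordinates.
#[derive(Clone, Copy, Debug)]
pub struct Output {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

impl Output {
    /// True if pixel (px, py) lies inside this output (half-open bounds).
    pub fn contains(&self, px: i32, py: i32) -> bool {
        px >= self.x && px < self.x + self.width && py >= self.y && py < self.y + self.height
    }
}

/// Index of the output that owns the cursor hotspot, if any.
/// Only the hotspot is tested; the sprite may still overlap several outputs.
pub fn output_for_hotspot(outputs: &[Output], hx: i32, hy: i32) -> Option<usize> {
    outputs.iter().position(|o| o.contains(hx, hy))
}

/// True when moving the hotspot from `prev` to `next` changes the owning output,
/// i.e. the transition that was reported to coincide with the slowdown.
pub fn crosses_output(outputs: &[Output], prev: (i32, i32), next: (i32, i32)) -> bool {
    output_for_hotspot(outputs, prev.0, prev.1) != output_for_hotspot(outputs, next.0, next.1)
}
```

With two 1920x1080 outputs side by side, a hotspot at x = 1919 belongs to the first output and one at x = 1920 to the second, regardless of how wide the cursor sprite is.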
05:15 mhenning[d]: are you on wayland or X?
05:15 mangodev[d]: mhenning[d]: kwin wayland 6.3.4
05:16 mangodev[d]: i have been pondering labwc, but i'd have to set it up when i have plenty of time
05:17 tiredchiku[d]: mangodev[d]: or you can install xfce 4.20
05:17 tiredchiku[d]: with labwc
05:17 mhenning[d]: Okay, yeah that's an odd issue
05:18 tiredchiku[d]: (they use labwc as their default compositor)
05:18 mangodev[d]: tiredchiku[d]: i mean
05:18 mangodev[d]: *ehhhhh*
05:18 mangodev[d]: i'll just do the proper setup and be more direct to the compositor/wm
05:19 mangodev[d]: also because i like the theming opportunities (and it's an opportunity for me to try out waybar w/o having to dive in deep into tiling wms yet)
05:55 jannau: mangodev[d]: does the problem go away if you force kwin to use software cursors via `KWIN_FORCE_SW_CURSOR=1`? has to be set before kwin starts
05:57 mangodev[d]: i'll do so when i get more time
05:57 mangodev[d]: heading off for the night soon
09:05 karolherbst[d]: gfxstrand[d]: well.. I can tell what CCTL combinations are invalid, and it also depends on the things that get invalidated 🙃
13:52 gfxstrand[d]: Ah, phoronix comments...
13:52 gfxstrand[d]: Nothing horrible this time. Just a bunch of people arguing about reclocking and what GPUs are actually supported.
13:55 Lynne: the comments are bad, but the article text is the worst
13:55 Lynne: in every single report mentioning ffmpeg they've made a mistake
13:56 gfxstrand[d]: Oh, yeah
13:57 gfxstrand[d]: He also never picks up or links my blog posts, even though there's almost always a blog post to go along with things like this.
13:58 HdkR: I personally love the inevitable first comment on my project's release posts saying that I'm wasting time on ARM and effort should be focused instead on the standard ISA RISC-V :P
13:59 HdkR: They're very meme worthy.
14:01 gfxstrand[d]: My favorite are always the ones where the original article is about Mesa and they manage to get side-tracked into an argument about network connectors before finishing off page 1.
14:01 HdkR: :D
14:02 HdkR: The NVIDIA "superNIC" has a GPU attached to it, need to support that in mesa.
14:02 gfxstrand[d]: Ah, codegen...
14:02 gfxstrand[d]: `ERROR: no viable spill candidates left`
14:03 gfxstrand[d]: I'm running my Kepler A with codegen right now. If it doesn't go down in flames, that'll give me a baseline to possibly find bugs which aren't image related. It may also tell me that NAK is now strictly better than codegen and we can delete codegen now.
14:04 mohamexiety[d]: HdkR: still funny to me that the GSP is likely the strongest/best RISC-V core available outside of things like Tenstorrent's accelerators. this is how _good_ the R-V ecosystem and devboard situation is :KEKW:
14:04 dwfreed: gfxstrand[d]: you could try directing him to your blog posts?
14:04 HdkR: mohamexiety[d]: pffft, haha :D
14:04 mohamexiety[d]: and it's just a management core, it's not even meant to be more general!
14:04 snowycoder[d]: gfxstrand[d]: found some funky code in sm50: OpShf has a 32-bit shift immediate form but also calls `copy_alu_src_if_i20_overflow` on shift.
14:04 snowycoder[d]: (also, it can be reduced to an i20 without any copy with the comments you left in the MR)
14:05 gfxstrand[d]: snowycoder[d]: That's entirely plausible. I tried to audit sm50 but probably missed some things in review. Reviewing sm32 actually caught a couple issues in sm20.
14:06 gfxstrand[d]: I kind of hate how much legalization code is duplicated between the three. But also, they're different so... :shrug_anim:
14:08 gfxstrand[d]: Some of the immediate stuff might be able to be done in common code since it's less about HW encoding and more about what bits of the immediate actually matter.
14:09 snowycoder[d]: gfxstrand[d]: What are you thinking about?
14:31 snowycoder[d]: snowycoder[d]: Whoops, totally misread the code. Still, we could optimize it by removing the copy, will drop an MR
14:32 gfxstrand[d]: snowycoder[d]: Shifts can either be masked or min'd, depending on .wrap. `shfl` immediates can be masked. `prmt` only uses the bottom 16 bits. Those are all the ones I can think of
14:35 snowycoder[d]: gfxstrand[d]: I was just extracting the `shfl` code from the reduce to share with encoders.
14:35 snowycoder[d]: `prmt` can be shared, but it's just a mask
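The immediate-reduction idea above (drop the bits the hardware ignores, then check whether the value fits the 20-bit immediate field) can be sketched as follows. This is a hypothetical helper, not NAK's legalization code; `ImmUse`, `reduce_imm`, and `fits_i20` are invented names. It assumes a 32-bit shift whose amount is treated as unsigned, masked to the low 5 bits with `.wrap` and clamped at 32 without it, and a `prmt` selector of which only the low 16 bits matter, as described in the discussion.

```rust
/// Hypothetical classification of how an immediate source is consumed.
#[derive(Clone, Copy, Debug)]
pub enum ImmUse {
    /// 32-bit shift amount; `wrap` selects masking instead of clamping.
    Shift { wrap: bool },
    /// PRMT selector: only the low 16 bits are read.
    PrmtSel,
}

/// Returns an equivalent immediate with the ignored bits removed
/// (assumed semantics, see lead-in).
pub fn reduce_imm(imm: u32, kind: ImmUse) -> u32 {
    match kind {
        // .wrap: amount taken modulo 32, so only the low 5 bits matter.
        ImmUse::Shift { wrap: true } => imm & 0x1f,
        // No .wrap: any amount >= 32 behaves like 32, so clamp it.
        ImmUse::Shift { wrap: false } => imm.min(32),
        ImmUse::PrmtSel => imm & 0xffff,
    }
}

/// True if `imm`, reinterpreted as a signed 32-bit value, fits a signed 20-bit field.
pub fn fits_i20(imm: u32) -> bool {
    let v = imm as i32;
    (-(1 << 19)..(1 << 19)).contains(&v)
}
```

After reduction, every shift amount is at most 32 and every selector at most 0xffff, so all of them fit in an i20 and no copy to a register is needed for these cases.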
15:16 snowycoder[d]: I think SM32 doesn't like `s2r` with `NAK_SV_INVOCATION_INFO` (aka `SR29`).
15:16 snowycoder[d]: Codegen solves this with an `isberd` from `rz`.
15:16 snowycoder[d]: I have no idea what either of them mean
15:17 snowycoder[d]: gfxstrand[d]: does sm20 work with `dEQP-VK.draw.renderpass.multiple_interpolation.structured.with_sample_decoration.8_samples`?
15:18 snowycoder[d]: P.S. After this I would love to add more comments to the Ops in ir.rs, some of them are quite obscure
15:21 gfxstrand[d]: snowycoder[d]: No, that one seems to fail here, too.
15:25 asdqueerfromeu[d]: gfxstrand[d]: Deleting codegen implies deleting nouveau Gallium (which is still used for pre-Kepler hardware)
15:26 gfxstrand[d]: asdqueerfromeu[d]: Just deleting codegen support from NVK
15:26 gfxstrand[d]: And then moving codegen back to src/gallium/drivers/nouveau
15:28 gfxstrand[d]: Codegen isn't looking so hot...
15:28 gfxstrand[d]: `Pass: 381113, Fail: 35185, Crash: 2328, Skip: 709371, Flake: 3, Duration: 1:27:13, Remaining: 2:04:57`
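To put "isn't looking so hot" in numbers, a small sketch that parses a summary line like the one above and computes the pass rate among tests that actually ran. The field format is assumed from this single line only; `parse_summary` and `pass_rate_bp` are hypothetical helpers, not part of deqp-runner, and flakes are counted as executed tests.

```rust
use std::collections::HashMap;

/// Parses `Key: value` pairs separated by ", " and keeps only numeric values,
/// so duration fields such as `Duration: 1:27:13` are skipped.
pub fn parse_summary(line: &str) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for field in line.trim().trim_matches('`').split(", ") {
        if let Some((key, value)) = field.split_once(": ") {
            if let Ok(n) = value.trim().parse::<u64>() {
                counts.insert(key.trim().to_string(), n);
            }
        }
    }
    counts
}

/// Pass rate among executed (non-skipped) tests, in basis points (1/10_000), rounded down.
pub fn pass_rate_bp(counts: &HashMap<String, u64>) -> u64 {
    let get = |k: &str| counts.get(k).copied().unwrap_or(0);
    let ran = get("Pass") + get("Fail") + get("Crash") + get("Flake");
    if ran == 0 {
        return 0;
    }
    get("Pass") * 10_000 / ran
}
```

For the codegen run above this gives 9103 basis points: roughly 91% of the 418,629 executed tests passed, with fails and crashes making up most of the remaining ~9%.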
15:30 gfxstrand[d]: snowycoder[d]: isberd is a read from ISBE. ISBE is a magic thing which re-maps I/O. It's a little odd that they'd just stash the info in the first entry but not totally crazy.
15:31 karolherbst[d]: it's the backwards addressed thing, right?
15:31 gfxstrand[d]: It's also a little odd that they give us two `u8`s and expect us to extract them and multiply them together. I don't ask. I just type the compiler code.
15:31 karolherbst[d]: or was that something else...
15:32 gfxstrand[d]: It's the thing that re-maps per-vertex I/O
15:32 karolherbst[d]: there was this thing where you read from offset 3 to get 0..3
15:32 gfxstrand[d]: Inputs, more specifically
15:32 karolherbst[d]: I'm sure Mary knows what I mean.. but I think it was IO related, maybe not ISBE
15:34 marysaka[d]: offset 0x0 (aka reading from 0x3) on Turing at least mean reading the primitive count
15:35 marysaka[d]: after that you have all primitives indices
15:35 marysaka[d]: See slide 25 (https://indico.freedesktop.org/event/6/contributions/307/attachments/216/295/XDC2024-NVK-MeshShading.pdf)
15:36 marysaka[d]: I should probably write that down on the wiki at some point
15:38 marysaka[d]: karolherbst[d]: Internal Stage Buffer Entry (ISBE)
15:38 gfxstrand[d]: I can't tell exactly what the nv50 lowering is doing. It looks like maybe it's just assuming `shl vtx 2` instead of the `SV_INVOCATION_INFO` nonsense.
15:38 gfxstrand[d]: But hard to tell
15:40 marysaka[d]: gfxstrand[d]: To me if you want to grab the info it should be at offset 0x4 + vtx
15:40 marysaka[d]: no need for the weird inverted thing like on the primitive count because we are reading only one byte from there
15:46 mhenning[d]: snowycoder[d]: Some of the ops have brief descriptions here: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html or for kepler, here: https://docs.nvidia.com/cuda/archive/10.2/cuda-binary-utilities/index.html
15:47 mhenning[d]: but yeah, adding brief descriptions for some of them could be helpful.
15:47 mhenning[d]: I also find it useful to look at from_nir.rs and see what nir opcodes map to a specific instruction
15:49 snowycoder[d]: mhenning[d]: I do too, but it becomes complex for some (e.g. isberd)
15:51 mhenning[d]: yep, none of that's a replacement for documentation, just wanted to be sure you were aware of those resources
16:00 snowycoder[d]: mhenning[d]: Thanks for the resources then! (I did not know NVIDIA had old versions archived)
16:14 gfxstrand[d]: For ALU stuff, documentation is half of why I did `Foldable`. Often the exact semantics are tricky and depend on non-obvious bits. If we fold it and we test it, the exact semantics are right there.
16:14 gfxstrand[d]: For non-ALU stuff, the best we can do is a good doc comment.
16:19 gfxstrand[d]: Like the 32-bit shuffles I just added are super easy. `shf`, not so much.
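A minimal sketch of the "fold it and test it" idea: each ALU op carries an executable model of its semantics, so a constant-folding test doubles as documentation of the non-obvious bits. The trait below is a hypothetical simplification and does not reproduce the signature of NAK's real `Foldable`; `Shl32` is an invented op. It assumes the shift semantics described earlier in the log: with `wrap`, the amount is masked to 5 bits; without it, amounts of 32 or more are clamped, which for a left shift yields 0.

```rust
/// Hypothetical, simplified version of a folding trait: evaluate the op on constants.
pub trait Foldable {
    fn fold(&self, srcs: &[u32]) -> u32;
}

/// Hypothetical 32-bit left shift whose out-of-range behavior depends on `wrap`.
pub struct Shl32 {
    pub wrap: bool,
}

impl Foldable for Shl32 {
    fn fold(&self, srcs: &[u32]) -> u32 {
        let (x, amt) = (srcs[0], srcs[1]);
        if self.wrap {
            // Amount taken modulo 32 (masked to the low 5 bits).
            x << (amt & 31)
        } else if amt >= 32 {
            // Clamped: shifting left by 32 or more clears every bit.
            0
        } else {
            x << amt
        }
    }
}
```

The two variants agree for in-range amounts and differ only once the amount reaches 32, which is exactly the kind of detail a fold plus a test pins down.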