00:20phomes_[d]: I looked into the misrendering in The surge 2 again. I think I understand the issue now. I added some details to the issue https://gitlab.freedesktop.org/mesa/mesa/-/issues/11447
00:21phomes_[d]: maybe a compiler person would know if adding round to zero like the prop driver would be the right thing to do
00:49gfxstrand[d]: phomes_[d]: Interesting. I think that's lowered in common code at the moment but we can add our own where we have more control if we want.
00:59orowith2os[d]: snowycoder[d]: new room almost set up, moved the Kepler PC in here. Any updates on anything I can do, test-wise or anything?
02:00airlied[d]: karolherbst[d]: I wonder about how legal splitting a 16 col load from uvec4 which is 16 bytes aligned down to 8 byte load, wondering if the alignment is going to work
04:20airlied[d]: karolherbst[d]: I don't have the right brain today for deref calcs! I'll see if I can get it to focus later 🙂
06:57soininenmatti: It's again you got understood in maybe slightly crooked way, You say in order for me to heal i need to stay away from the people who broke me, where as compared to you i am not even broken,i phrase this in correct way, if you show your face on the wrong territory or near me again, you get shot. And my life is not about how many shots has undisputed analweight champion/queen of the world
06:58soininenmatti: gotten from various bums to the ass, this is nothing i think about I simply do not care enough to see trash already anyhow, but to vandalize and harass our businesses or me, is not allowed to neither to those bums or "champions", you can give nobel prizes to your aids queens and kings or king pins with fatty dicks, show a dick pic to nobel committee, I do not care about this, it's just my
06:58soininenmatti: territory and surroundings have different rules to engage with/to. It never botheres me what you commit for your own life, if Ryan is obsessed with wine or his x64toaarch64 translator so'll be it. I already said there is a better way to show that i am not as mentally ill as he thinks of.
07:36karolherbst[d]: airlied[d]: the matrix itself is int8, so it does all work out, but the deref needs a cast, and the stride needs to be scaled
07:36karolherbst[d]: airlied[d]: I can fix it up, just was too tired yesterday
07:38airlied[d]: yeah I tried doing that today, but brain wasn't helping, at least for the col_offset
07:42airlied[d]: I'll test patch on radv when you figure it out 🙂
08:22karolherbst[d]: yeah.. I also failed at the col_offset part 😄
08:50karolherbst[d]: Okay, I have a good idea to make this work nicely
09:00karolherbst[d]: got rid of the dep of being `nir_deref_type_array` at least without causing regressions by using the original deref.. so my casting is at least correct
09:00karolherbst[d]: in principle
09:12karolherbst[d]: ahh yeah.. I used the wrong type 🥲
11:05snowycoder[d]: gfxstrand[d]: gfxstrand[d] I think I finally know why we are getting an OutOfRange warning in that geometry pass.
11:05snowycoder[d]: The TessellationControl shader is executed 4 times since `layout(vertices = 4) out;`, but the patch has only 3 vertices (it's a triangle).
11:05snowycoder[d]: `gl_in[gl_InvocationID]` will go out-of-bound for the fourth invocation since `gl_in` is effectively an array of length `gl_MaxPatchVertices` (=3) while `gl_InvocationId`=3.
11:05snowycoder[d]: This also explains why the same does not happen for quads or isolines (`gl_MaxPatchVertices`=4)
11:15snowycoder[d]: If this is correct then why aren't we seeing any strange behaviour on newer GPUs? We are reading an out-of-bound value from an attribute array.
11:15snowycoder[d]: This should be undefined behaviour right? I can't find the appropriate Vulkan specification rule
11:26snowycoder[d]: orowith2os[d]: I've been mostly polishing our Vulkan conformance tests, but there are a lot of games that straight up don't work.
11:26snowycoder[d]: I don't know what the next step is, I need to isolate the banding bug first.
11:26snowycoder[d]: Try using the Vulkan driver on main 🤷♂️
11:54karolherbst[d]: 😭 I just debugged an issue for 2 hours where I simply had to move a line of code around
11:54karolherbst[d]: it works
12:12gfxstrand[d]: snowycoder[d]: That sounds entirely plausible. We could go out of our way to smash patch vertex counts based on topology but that sounds annoying.
12:14gfxstrand[d]: Wait, do patch counts go in SPH or are they state?
12:26karolherbst[d]: SPH
12:31marysaka[d]: I think the position of it also changed after SM35 if my memory is correct :aki_thonk:
13:19gfxstrand[d]: BTW: Rust build is badly broken on new rustc. I'm fixing.
13:19karolherbst[d]: 1.89? Ah yeah...
13:20gfxstrand[d]: Yeah, IDK if it's just 1.89 or if it's also my bindgen
13:20karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36747
13:20karolherbst[d]: might want to pick some of that
13:21karolherbst[d]: but yeah.. `mismatched_lifetime_syntaxes` is new in 1.89 and it's... throwing a lot of warnings around
13:21gfxstrand[d]: There's a few more than just that. They removed some stuff from the preamble and changed some import rules
13:22karolherbst[d]: that shouldn't happen if the edition doesn't change?
13:22gfxstrand[d]: heh
13:22karolherbst[d]: maybe it picks 2024 for you?
13:22gfxstrand[d]: Could be
13:22gfxstrand[d]: I'm not seeing an edition being passed to rustc
13:22karolherbst[d]: the default flag thing is kinda broken if you add languages in the build later.. it's.. kinda a mess
13:23karolherbst[d]: yeah.. I have a `-Drust_std=2021` thing locally passed to meson
13:23gfxstrand[d]: Oh
13:24gfxstrand[d]: Yeah, that fixes everything. 😭
13:24karolherbst[d]: we do have it set as default options. but...
13:24karolherbst[d]: meson bug (tm)
13:24karolherbst[d]: gfxstrand[d]: fyi: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/frontends/rusticl/meson.build?ref_type=heads#L64-82
13:25karolherbst[d]: in case you care about migrating to 2024 eventually, could already enforce some of the errors
13:26karolherbst[d]: there is also an option in 1.85? to turn them all on
13:27karolherbst[d]: but I'd rather not add any flags that were added after 1.82
13:31gfxstrand[d]: Do we know what version of syn to update to that doesn't spew all these lifetime warnings?
13:40karolherbst[d]: no idea
13:40karolherbst[d]: but it might be easier to simply disable all warnings in subprojects.. I just don't think there is a simple way?
13:44gfxstrand[d]: Anyway... https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36847 cleans up a bunch of it
13:46gfxstrand[d]: With that, it's down to just proc-macro2 complaining
13:46karolherbst[d]: why not just disable all warnings on bindgen?
13:47gfxstrand[d]: I guess maybe we could?
13:48karolherbst[d]: it's interesting, because `unnecessary_transmutes` already gets added for 1.88+ 2 lines below your change
13:48gfxstrand[d]: weird
13:48karolherbst[d]: but yeah, I use `'--raw-line', '#![allow(warnings)]',` and that works great
13:48gfxstrand[d]: IDK why meson isn't picking up my rustc version properly
13:48karolherbst[d]: gfxstrand[d]: need to throw away your build directory, because it's cached
13:49karolherbst[d]: or use that --wipe flag or someting
13:49karolherbst[d]: anyway... allowing all warnings will just solve this issue forever
13:50karolherbst[d]: and won't have to fiddle with what's supported by rustc or not
14:05gfxstrand[d]: This unsafe in unsafe thing is annoying
14:09snowycoder[d]: gfxstrand[d]: The issue goes away if either:
14:09snowycoder[d]: - I add an `if (gl_InvocationID >= 3) return;` at the start of the TessControl test shader
14:09snowycoder[d]: - I patch `threads_per_input_primitive` to 3
14:09snowycoder[d]: - I patch `NV9097, SET_PATCH` to 4
14:09snowycoder[d]: So I guess my theory is confirmed.
14:11gfxstrand[d]: Yeah
14:12snowycoder[d]: I don't think the real solution is to change patch vertex count though, that would mess up real programs.
14:12snowycoder[d]: I think the warning can be safely ignored, programs reading out-of-bounds attributes cause undefined behaviour, it's just strange do find that kind of UB in conformance tests.
14:12gfxstrand[d]: Yeah. We can also probably fix the test
14:20karolherbst[d]: gfxstrand[d]: I've enabled that warning in rusticl since forever, but yeah 😄
14:47gfxstrand[d]: And... bindgen's bitfield stuff doesn't 2024 lint clean. 😩
14:48gfxstrand[d]: mostly unsafe stuff
14:50gfxstrand[d]: And... `--wrap-usafe-ops` doesn't fix it
14:52gfxstrand[d]: Okay, `--wrap-unsafe-op` plus new bindgen is enough
14:56gfxstrand[d]: Do I need to do anything to enable clippy or is a bunch of `-Aclippy::` enough?
15:00gfxstrand[d]: karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36849 if you want an easy review
15:00karolherbst[d]: gfxstrand[d]: need to run the clippy build target
15:06gfxstrand[d]: gfxstrand[d]: I'm adding clippy fixes now
15:12karolherbst[d]: ~~50 commits later~~
15:33gfxstrand[d]: lol
16:09gfxstrand[d]: karolherbst[d]: Okay, I'm done for now if you want to give it a skim
16:10snowycoder[d]: The `Robust Buffer Access` extension covers vertex input range checking: "access to vertex input data is bounds checked against the bound vertex buffer range".
16:10snowycoder[d]: Does this apply to TCS? Guess I'll open an issue to VK-GL-CTS and let them figure it out
16:32triang3l[d]: snowycoder[d]: This is more of a general chat question rather than Nouveau-specific, but the vertex input stage feeds either the tessellation control stage or the vertex stage depending on, as far as I remember, the presence of the former
16:33triang3l[d]: Vertex input is conceptually a separate pipeline stage, not a part of a shader stage. Even though on some hardware it may be baked into the vertex shader, but in Vulkan itself it's separate
16:38snowycoder[d]: triang3l[d]: So that does not describe input patch vertex access in the Tessellation Control Shader, right?
16:41triang3l[d]: snowycoder[d]: Reads from the vertex buffers have the same robustness for both TES control points and VS vertices
16:42triang3l[d]: but if you're referring to out-of-bounds indexing of control points themselves in the shader, that's another story
16:50snowycoder[d]: triang3l[d]: I found a deqp-vk test that accesses patch control points (`gl_in` in GLSL) with an out-of-bounds index in the Tessellation Control Shader, here are more details: https://github.com/KhronosGroup/VK-GL-CTS/issues/537.
16:50snowycoder[d]: Why would it change between TES and TCS control points access?
17:00triang3l[d]: snowycoder[d]: Vertex input data is the data that fills custom attributes inside `gl_in`, it's not `gl_in` itself
17:01triang3l[d]: it's loaded _before_ the TES or VS
17:01triang3l[d]: (even though in hardware it may be loaded _in_ TES/VS, but logically it's before TES/VS)
17:02snowycoder[d]: Ohh, ok, so If I request 300 vertices from a VkBuffer that can only contain 200, the missing vertices will be filled with 0
17:02triang3l[d]: snowycoder[d]: Yeah, but not exactly 0, what will be loaded depends on the level of robustness supported and enabled
17:04snowycoder[d]: triang3l[d]: Thanks, that is much clearer!
17:04snowycoder[d]: Still, executing untrusted code in a TCS or Geometry shader can cause undefined behaviour by out-of-bounds attribute access.
17:05triang3l[d]: According to https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_robustness2.html it's:
17:05triang3l[d]: > Most accesses must be tightly bounds-checked, out of bounds writes must be discarded, out of bound reads must return zero. Rather than allowing multiple possible (0,0,0,x) vectors, the out of bounds values are treated as zero, and then missing components are inserted based on the format as described in Conversion to RGBA and vertex input attribute extraction.
17:05triang3l[d]: when robustBufferAccess2 is enabled, but Vulkan 1.0's robustBufferAccess gives more freedom to the implementation
17:06triang3l[d]: snowycoder[d]: It shouldn't affect other processes at least, but yes
17:09triang3l[d]: snowycoder[d]: Yeah, though maybe there are specific rules about out-of-bounds input array access too, I haven't researched that yet, but the specification should possibly have some info regarding that
17:21gfxstrand[d]: Patch isn't going to have anything to do with robustness
17:26gfxstrand[d]: snowycoder[d]: So... it's fine to have 4 vertices out and 3 in, I think. But if it's reading from vertex 3 of the input data, that's wrong.
17:30gfxstrand[d]: I'm having trouble finding a VU about patch control points. This corner of the spec isn't incredibly precise.
17:37gfxstrand[d]: But I'm pretty sure the best case scenario if you read an input vertex that doesn't exist is that you get garbage
17:38gfxstrand[d]: Feel free to make an MR against the vk-gl-cts repo to fix it if you feel so inclined.
17:38gfxstrand[d]: Or don't
17:41snowycoder[d]: gfxstrand[d]: I didn't see many merged MRs so I got a bit scared, but I guess I can try it
17:42gfxstrand[d]: Oh, that's because most CTS development goes through an internal gerrit. But they do accept external MRs. It'll just be an awkward process.
17:52mhenning[d]: if anyone wants to play with a small perf improvement: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36823
17:54karolherbst[d]: yeah, I wanted to see if it helps with the coop matrix stuff given it's just compute and I don't know if nvk might run stuff on the 3d queue there or not...
17:54karolherbst[d]: and if it matters
17:56mhenning[d]: yeah, we definitely switch to the graphics subchannel a bunch during compute-only workloads right now, that should help improve the situation
17:58gfxstrand[d]: Ugh... I can't even build the CTS anymore. 😩
17:59karolherbst[d]: mhenning[d]: I _think_ it also helps not allocating 3D if it's a compute only thing, but not sure if your MR is enough to allow that?
17:59karolherbst[d]: I think it helps for context switching a bit
18:00karolherbst[d]: if the CTS is doing compute only tests a few times, might even speed it up
18:04mhenning[d]: karolherbst[d]: yeah, we're not quite there yet but I'm working on it
18:11mohamexiety[d]: mhenning[d]: marysaka[d] this might help with the sottr stuff since that looked like it had an insane amount of WFIs :thonk:
19:19phomes_[d]: mhenning[d]: I see improvements primarily in vkd3d games. AoE4: +7 fps. Deep Rock Galactic: +5 fps. Atomic heart +1 fps. X-com2 +1 fps. No problems found
19:41karolherbst[d]: okay, let's see if I see a 50% perf increase or not, lol
19:42marysaka[d]: mohamexiety[d]: oooh going to poke at it a bit :aki_thonk:
19:42karolherbst[d]: okay, it makes 0 difference
19:42karolherbst[d]: at least for `vk_cooperative_matrix_perf`
19:45karolherbst[d]: kinda want to land this one (it's a trivial MR): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36391
19:46karolherbst[d]: mhenning[d]: up for reviewing some of the address calculation stuff I've written? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36113 Though I kinda still want to make it useful for ldc/ldcx, but haven't gotten to it yet...
19:47karolherbst[d]: ehh I should throw out unreated stuff..
19:47karolherbst[d]: done
20:19mhenning[d]: karolherbst[d]: I'll take a look tomorrow.
20:40airlied[d]: karolherbst[d]: the fixup in your branch passes on radv at least, I'll fold it in
20:41karolherbst[d]: cool
20:43karolherbst[d]: I think with my changes it also makes it easier for the compiler to optimize, because all split matrices are based on the same base deref (as the array one doesn't need to be adjusted anymore), but not sure it even matters
20:44karolherbst[d]: I hoped I could ditch more ~~of Mary's~~ code, but sadly it's just around 100 loc in Nak 🙃
20:44karolherbst[d]: I'm sure Mary spend most of the time making the matrix size lowering work...
20:44marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1407465361846304830/before.jpg?ex=68a633c3&is=68a4e243&hm=2bde33c1dcfbe851d273964e8b7e4d9be27073f909747a867f68450234fbd941&
20:44marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1407465362122997891/after.jpg?ex=68a633c3&is=68a4e243&hm=d975d4458842776f76c3ba1069240af53f4d8be2facd94a004881d7d9e54912a&
20:44marysaka[d]: mhenning[d]: for sottr (before the patches and after)
20:45marysaka[d]: it's quite weird tho it felt more responsive on the first few section of the benchmark (overall at 70 fps instead of ~58)
20:46marysaka[d]: karolherbst[d]: yeah it was most of the time I lost there 😄
20:47karolherbst[d]: marysaka[d]: well.. we are going to remove it 🙂 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36841/diffs?commit_id=617bb5aeb80c6d9d2210da3c1e32de22ffe2f02e
20:47marysaka[d]: karolherbst[d]: that's way better :AkkoYay:
20:48karolherbst[d]: now it's a common pass doing all the lowering and splitting 😄
20:49karolherbst[d]: it's a bit annoying because the pass also splits it for all the other operations, but I think we can optimize everything enough...
20:49karolherbst[d]: I need to look into setting proper alignment information for the load/store vectorizer actually...
20:50karolherbst[d]: Dave wrote a patch to load the elements in pairs, but it's kinda painful, because then you have int8 where it's 4 elements
20:50karolherbst[d]: and the other types being different again..
20:50karolherbst[d]: I kinda wnat to make it work through vectorization, because that should be able to combine more stuff
20:51karolherbst[d]: now that most of my MRs are merged, I should look into more perf tomorrow
20:55karolherbst[d]: what alignment guarantees do we get btw?
20:57karolherbst[d]: ahh env spec
21:12karolherbst[d]: okay.. looks like it's either 16bytes or a full row/column
21:12karolherbst[d]: the lower value
21:12karolherbst[d]: yeah.. we can be a bit smarter with that assumption
21:23mangodev[d]: mhenning[d]: i really need to redo my patch set and rebuild my drivers again
21:23mangodev[d]: this MR and faith's WSI sync MR may be really promising for desktop performance and usability
21:24mangodev[d]: i wanna learn how to `apitrace` so i can better debug why firefox has weird issues with partial present and goes completely kaput after about 2 hours of video playback
23:00snowycoder[d]: gfxstrand[d]: The VK-GL-CTS wiki is unclear, should I target the vulkan-cts-1.4.0 (the earliest active branch) or vulkan-cts-1.3.8 (the earliest non-discontinued branch)?
23:00snowycoder[d]: (I'm asking you since you have numerous contributions to VK-GL-CTS)
23:18gfxstrand[d]: Earliest active
23:18snowycoder[d]: Thanks!