00:24gfxstrand: Sometimes NIR is disturbingly clever....
00:24gfxstrand: bool(1.25 * float(data[gl_SubgroupInvocationID]) + 5.0) becomes load_const(true) when data[] is an array of bool
00:25gfxstrand: I guess range analysis is going to town.
00:25gfxstrand: It's awesome and correct but it scared the crap out of me when looking at this CTS test where one of my vote ops was just... missing.
00:26jenatali: Wow
00:26gfxstrand: Also, joke's on the CTS test author for thinking they'd outsmarted the constant folder
00:29CounterPillow: that is impressive
01:14alyssa: oh I'm seeing it
01:14alyssa: data[] is nonnegative -> 1.25 * float(..) is nonnegative -> 1.25 * float(..) + 5.0 is positive -> bool(positive) is true
01:14alyssa: nice :D
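
The chain of reasoning above can be checked by brute force, since a bool array element converts to only 0.0 or 1.0. This is a minimal C sketch, not NIR code: `vote_input` is a hypothetical name, and GLSL's bool(float) conversion is modeled as a comparison against zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shader expression
 * bool(1.25 * float(data[i]) + 5.0); not taken from NIR or the CTS. */
static bool vote_input(bool data)
{
    /* float(bool) is 0.0 or 1.0, so the product is 0.0 or 1.25 (nonnegative). */
    float scaled = 1.25f * (data ? 1.0f : 0.0f);

    /* Adding 5.0 gives 5.0 or 6.25, which is strictly positive. */
    float biased = scaled + 5.0f;

    /* GLSL bool(float) is true for any nonzero value. */
    return biased != 0.0f;
}
```

Both possible inputs give true, which is why the whole expression can be replaced by a constant. A range analysis reaches the same result without enumerating inputs, by tracking only the sign of each intermediate value.
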
01:24Ristovski: Huh, so I have a prog that allocates aligned memory and maps it with GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, then inside a compute shader I modify the buffer. The writes are visible in the host's buffer (i.e. CPU side) on llvmpipe, but not on radeonsi :( I already use GL_MAP_COHERENT_BIT so what gives?
08:13glehmann: gfxstrand: if they wrote it as int it probably wouldn't have been constant folded by NIR tho :P
08:33babylonian: I already gave the procedure, which was 334 reduced to value 76 , meaning you take bounded inverse of 334 which is 178 add 76 to it which is 254, then runtime when it generates 334 through the summation of operands the focal element is at 76, while other same values if so to visualize are at 254, so you add 512-245=258 bu times 2 of that, you get 334 and 258, and when subtracted from n times 334, you filter out the per alu one transition
08:33babylonian: that really transitioned, cause 76+76+76 minus 76+76 is 76, it's very simple proof to the eligibility of the algorithm. now when we do 76 to 334 transition would the same procedure work, let's try 436+334 which is inverse of 76 +334 is 770+76=846-512 is 334 and 770+254*2 is off entirely so but even any other method would not work except just add 512 to make the value always bigger if that is not the case that transition from is bigger than
08:33babylonian: transition to value.
08:34babylonian: and the first part would always then work
08:39babylonian: but there is a typo, 512-254
08:39tursulin: airlied, sima: would you be able to apply msg id 20240228142240.2539358-1-tvrtko.ursulin@linux.intel.com. directly to drm-fixes so my email update reaches Linus' tree as soon as possible? It has collected all the acks from the Intel side.
08:40babylonian: that way one can add prefixes integer codes and translate the results based of adding operands
08:42babylonian: that is also something that hardware would do at the very bottom of the stuff, but it presumably does that through ffm and bus cascades.
08:42babylonian: err fft
08:42babylonian: fast fourier transform
08:44sima: tursulin, can do
08:46tursulin: sima: great, thanks!
08:49dolphin: airlied, sima: cherry picking new fixes is a bit borked now that drm-intel-fixes is behind, any ETA for pulling the previous fixes? I could then rebase on top of drm-fixes
08:50sima: dolphin, I guess I could do that right now too while I'm working on drm-fixes
08:50dolphin: that'd be great
08:51sima: dolphin, https://lore.kernel.org/dri-devel/ZeGOUTfiA0_FNKLg@jlahtine-mobl.ger.corp.intel.com/ this one?
08:52dolphin: yes
08:55babylonian: to some degree it makes sense that the other direction is not achievable cause you can not make up bits out of nothing, most importantly sampling goes wrong if the intermediate is short of something not over the bound
08:56babylonian: so it's safe to presume that any algorithm to make the value bigger through transition except by known constant and switch the operands does not exist
08:57babylonian: that is how things work as to you need to allocate memory or storage based of what compiler analyses
09:04babylonian: cause technically it would need to resize either focal or mask by 512 then inside runtime, and there is a way to do that, but only in serial fashion, it will immediately serialize everything
09:06babylonian: so my tests to find an integer transition from smaller to bigger have all failed, and i do not think that is possible in parallel fashion like said unless the constant is allocated by compiler analysis
09:18zzoon[m]: dj-death: really sorry for annoying but, without the check VK_QUERY_RESULT_WITH_STATUS_BIT_KHR, dEQP-VK.video.encode.h264_query_with_status fails.
09:24sima: tursulin, dolphin done
09:24sima: airlied, ^^ I pushed some things to drm-fixes
09:25dj-death: zzoon[m]: ah my bad, I missed that VK_QUERY_RESULT_WITH_STATUS_BIT_KHR was a new video flag
09:26zzoon[m]: yeah, thanks for confirmation.
09:28dj-death: zzoon[m]: I would say move it begin the switch statement like the other VK_QUERY_RESULT_WITH_AVAILABILITY_BIT
09:28dj-death: zzoon[m]: I think it should be after
09:28dj-death: spec seems to say one or the other
09:31zzoon[m]: that's what i understand.
10:13pq: melissawen, mairacanal, did you see https://lists.freedesktop.org/archives/dri-devel/2024-March/444250.html ? There is a puzzle about drm_fixp2int_round().
10:18MrCooper: oh wow, Helen is already going for the kernel top-level .gitlab-ci.yml end boss, nice :)
10:27mupuf: MrCooper: I wish she hadn't, as it seems super premature when the DRM subsystem as a whole has not migrated to it
10:27MrCooper: yeah maybe, and it seems to be too early for Linus
10:36Ristovski: pro tip: always read khr specs before going to sleep so you get an idea in a dream and it works irl
14:27mripard: karolherbst: hey, did anything out of the ordinary happen when you merged that patch to drm-misc-fixes?
14:27mripard: (aside from the prompt to switch to gitlab)
14:41melissawen: pq, I'll take a look. At first glance it seems to me a slight misunderstanding of the fractional part plus a casting issue.
14:41melissawen: AFAIR vkms had a pending implementation of `rounding half up` that I left when solving a kernel compilation issue by moving to drm_fixed helpers:
14:41melissawen: https://patchwork.freedesktop.org/patch/502241/?series=108364&rev=1
14:42melissawen: basically, it was missing the FIXED_TO_INT_ROUND(a) proposed by Igor, so this was what the drm_fixp2int_round() was supposed to do
14:42alyssa: does pipe_cap_shareable_shaders allow sharing shaders compiled in a GLES context with a GL context?
14:42alyssa: this seems... problematic
14:44alyssa: if so, it means that e.g. the pipe_context_no_lod_bias flag I added is bogus
14:44alyssa: since then we're forced to assume any shader is potentially used by GL, even if it's a totally gles app
14:46karolherbst: mripard: well.. I don't have a pub key set up, so I had to manually switch to the https:// urls
14:47karolherbst: I think it would make sense to be able to configure dim to either prefer https:// or ssh:// URLs?
14:47karolherbst: but besides that things just got pushed
14:47mripard: awesome, and yeah, that would make sense
14:48mripard: can you open an issue for that here: https://gitlab.freedesktop.org/drm/maintainer-tools ?
14:51pq: melissawen, #define FIXED_TO_INT_ROUND(a) (((a) + (1 << (SHIFT - 1))) >> SHIFT) looks like what I'd expect.
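
For reference, the round-half-up macro quoted above can be tried out in isolation. This is a hedged sketch under stated assumptions: the chat does not say what SHIFT is, so 32 fractional bits in a signed 64-bit value is assumed here, the constant is widened to 64 bits to avoid overflowing a 32-bit int, and `fixed_from_quarters` is a hypothetical helper added only for the example.

```c
#include <stdint.h>

/* Assumption: 32 fractional bits in a signed 64-bit value; the chat does not
 * say what SHIFT is in the patch under discussion. */
#define SHIFT 32

/* Same shape as the quoted macro, with a 64-bit constant so that
 * 1 << (SHIFT - 1) does not overflow a 32-bit int. */
#define FIXED_TO_INT_ROUND(a) (((a) + (INT64_C(1) << (SHIFT - 1))) >> SHIFT)

/* Hypothetical helper: build a fixed-point value equal to n / 4. */
static int64_t fixed_from_quarters(int64_t n)
{
    return n * (INT64_C(1) << (SHIFT - 2));
}
```

Adding half of one unit before shifting turns the truncating shift into rounding: 2.25 stays at 2, while 2.5 and 2.75 both become 3. Only nonnegative inputs are considered here, because right-shifting a negative signed value is implementation-defined in C.
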
15:02alyssa: gfxstrand: nir_io_semantics is broken for compact clip/cull, yeehaw (-:
15:22jenatali: alyssa: broken how?
15:23alyssa: jenatali: there's no explicit way to know if the offset is in scalars or vec4s
15:24alyssa: the implicit way is hardcoding that "my driver uses compact clip/cull" and then special casing it in the compiler based on the io_semantics::location
15:24jenatali: Yeah, ok
15:24jenatali: So "broken" meaning special, not impossible to use
15:24jenatali: Tess factors would behave the same but those can't be indirect at least
15:24jenatali: ... I think
15:24alyssa: Less special and more.... poorly defined
15:33alyssa: jenatali: Wait, no
15:33alyssa: Broken
15:33alyssa: I was right the first time >.<
15:33alyssa: 32 %80 = @load_interpolated_input (%61, %79 (0x1)) (base=0, component=0, dest_type=float32, io location=VARYING_SLOT_CULL_DIST0 slots=8)
15:33jenatali: How so?
15:33alyssa: this is supposed to load culldist[4]
15:33alyssa: but if we have an indirect index, for the same op we get index 4 instead of 1
15:34alyssa: (-:
15:34jenatali: ... ew
15:35jenatali: How would you load culldist[5]?
15:35alyssa: with direct? 0x1 + comp=1
15:35alyssa: with indirect? 0x5
15:35alyssa: (-:
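
The mismatch described above can be made concrete with a small C sketch. The struct and function names are hypothetical and are not NIR's API. They only model the two conventions from the conversation: a direct access whose offset counts vec4 slots with a separate component, and an indirect access whose offset counts scalars. That the component stays 0 in the indirect case is an assumption.

```c
#include <stdbool.h>

/* Illustration of the two conventions described in the chat; these types and
 * names are hypothetical and are not NIR's API. */
struct io_address {
    unsigned offset;    /* value of the offset source on the load */
    unsigned component; /* component index on the intrinsic */
};

/* Direct access as described: offset counts vec4 slots, component picks the scalar. */
static struct io_address direct_address(unsigned array_index)
{
    struct io_address a = { array_index / 4, array_index % 4 };
    return a;
}

/* Indirect access as described: offset counts scalars.
 * Assumption: component stays 0 in this case. */
static struct io_address indirect_address(unsigned array_index)
{
    struct io_address a = { array_index, 0 };
    return a;
}

/* A backend can only decode the address if it knows which unit the offset uses. */
static unsigned scalar_index(struct io_address a, bool offset_in_vec4s)
{
    return offset_in_vec4s ? a.offset * 4 + a.component
                           : a.offset + a.component;
}
```

With these definitions culldist[5] is (offset 1, component 1) when direct and (offset 5, component 0) when indirect, matching the numbers in the chat. Decoding either one with the wrong unit gives a different element, and nothing in the address itself says which unit applies.
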
15:36alyssa: fixing this would require touching common code with effects on other drivers, which I can't do until this regression https://gitlab.freedesktop.org/mesa/mesa/-/issues/9986 is fixed
15:36alyssa: I guess I can just keep a patch downstream since upstream isn't interested in common code work ..
15:45jenatali: alyssa: FWIW I have a pass that would maybe improve things. Though I guess it doesn't handle indirects... If splits them into 2 4-component vars
15:45jenatali: It splits*
15:45jenatali: At some point I need to actually let indirects get to the backend
15:46zmike: alyssa: pretty sure I had to handle that when I did the explicit io conversion in zink
15:46zmike: I remember indirect clip/cull being frustrating
15:48alyssa: zmike: yeah, I mean... the NIR core is fundamentally broken here
15:48alyssa: there's no way to do it 'right' in the driver without fixing that
15:49alyssa: but we can't fix the NIR core without adjusting every driver supporting this stuff
15:49zmike: 'right' just means drawing the green triangles instead of the red ones
15:49alyssa: which I can't do without a way to test those kinds of patches without hurting myself
15:49alyssa: so... I guess it'll stay broken..
15:49zmike: oh yeah that ci issue is super annoying
15:49zmike: cc DavidHeidelberg again so I don't open a new ticket for it this week
15:50zmike: but I almost did last week
15:57alyssa: maybe i'll add a nir_shader_option for 'wants_working_compact_vars' and let drivers transition one by one?
15:59zmike: maybe put up a MR with an example change in >= 1 driver and then add that as a last resort if people don't move in a reasonable time
16:00alyssa: nod
17:48karolherbst: airlied: `nir_lower_cl_images` is broken for llvmpipe where there are multiple functions :')
18:38DavidHeidelberg: zmike: thx for the ping, it seems that change is really unpleasant (even for me), so I'll try to fix it as it annoys almost everyone
21:02gfxstrand: alyssa: Yeah, we probably need a "compact" bit in semantics
22:19jenatali: alyssa: There's a comment in nir_lower_io.c: /* We always lower indirect dereferences for "compact" array vars. */
22:23gfxstrand: Which I have patches to fix if anyone has the courage to review them. :)
22:33jenatali: gfxstrand: I can try
22:33jenatali: I happen to be looking at a crash due to compact arrays right now anyway...
22:56DavidHeidelberg: dcbaker: see, yes, we are undefining it https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27967
22:56DavidHeidelberg: (I would comment on MR, but I'm on the phone)
22:58Lyude: so: when it comes to destroying a kernel device structure that was created by a kernel driver, the driver itself is usually the last one to hold (and subsequently drop) a reference to said device - which then unregisters it from userspace correct?
23:00Lyude: mainly just asking as I'm trying to make sure the drm plane/crtc/encoder types in my rust kernel bindings are both: always valid (likely through all references to said objects holding at least one reference to the owning drm_device), but also don't mistakenly keep the device alive past the point driver teardown starts
23:02jenatali: I'm tempted to write a windbg extension to disassemble nir from another process... Getting a crash in shader compilation is really hard to debug when you have to manually poke around at instructions and links
23:26alyssa: jenatali: comment's wrong then :~)
23:43dcbaker: DavidHeidelberg: ah, okay. That makes sense
23:47alyssa: jenatali: ...nir_print_shader?
23:48jenatali: alyssa: From another process