IRC Logs of #dri-devel on irc.freenode.net for 2024-07-08

05:13 Company: does GL have an equivalent to VkAttachmentLoadOp ?
07:49 tzimmermann: mripard, hi. will there be a drm-misc-next-fixes PR this week?
11:21 mripard: tzimmermann, mlankhorst, daniels: I've started to work on a gitlab issue template to ask for drm-misc commit rights: https://gitlab.freedesktop.org/mripard/kernel/-/tree/drm-misc-templates?ref_type=heads
11:23 mripard: one of the issues I think is that our gitlab tier can only handle a single assignment through the template, so with this mlankhorst is always going to be the one assigned
11:23 mripard: daniels: do you know if there's anything (like a bot maybe?) we can use to set the assignees after the fact?
12:13 mareko: zmike: zink will need fmulz to fix incorrect rendering in this: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11464
12:14 zmike: hm ok
12:14 zmike: thanks
12:15 cwabbott: zink can just not set lower_fpow
12:20 mareko: indeed
12:37 luc: how should GLX cope with a call to glXQueryDrawable(dpy, drawable, GLX_SWAP_INTERVAL_EXT, &...) towards a driver not supporting GLX_EXT_swap_control (e.g. drisw) ?
12:39 zamundaaa[m]: Oh, stupid matrix bridge had me unauthenticated again. In case the message got lost, here it is again:
12:39 zamundaaa[m]: vsyrjala: I just found out about the SIZE_HINTS property for cursor planes, that's quite useful. In the future though, could you please cc wayland-devel about such uAPI additions? I would've really liked to have it implemented a year ago instead of only finding out about it by accident
12:41 luc: https://github.com/flightlessmango/MangoHud/blob/master/src/gl/inject_glx.cpp#L145 this line will run into a SEGV due to a null pdraw->psc->driScreen->getSwapInterval
12:54 Hazematman: (sending again as I was having some matrix<-> irc issues) Hey I have an open MR for Android integration of llvmpipe/lavapipe that also improves some of the Android build docs. Roman Stratiienko has given me some feedback on it but is there anyone else doing Android work that could take a look and review my changes? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344
13:31 alyssa: IHV poll - what is fmin(+0.0, -0.0) on your platform?
13:31 alyssa: it's seemingly impl-defined in SPIR-V
13:32 MrCooper: luc: BadValue error? GLX_SWAP_INTERVAL_EXT isn't a valid attribute for glXQueryDrawable without GLX_EXT_swap_control
13:32 zmike: eric_engestrom: is there any way I can get the full process name to print in hangs here? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/60799479
13:45 eric_engestrom: "for process shader_runner" but I guess you want not just the process but the whole command line
13:46 zmike: yes
13:46 eric_engestrom: but you'd have to ask the amd kernel devs
13:46 zmike: ah
14:44 glehmann: alyssa: https://gitlab.khronos.org/spirv/SPIR-V/-/merge_requests/296#note_471367
14:46 glehmann: it's no longer implementation defined, though I kind of disagree with the reasoning because it retroactively changes required behavior for VK_KHR_shader_float_controls (the old one, not 2)
14:49 alyssa: glehmann: ahah, excellent, thanks :)
14:49 alyssa: I mean, sucks to be AGX, but it makes the path forward for NIR very obvious :)
15:49 alyssa: glehmann: wait, I'm still not seeing where in the spirv spec the strict behaviour is required.
15:50 alyssa: I'm reading https://registry.khronos.org/SPIR-V/specs/unified1/GLSL.std.450.html , is there somewhere else with stronger text?
15:50 alyssa: That says: "Result is... either x or y if both x and y are zeros" and "fmin(-0, ±0) = -0"
15:51 glehmann: nmin(-0, ±0) = -0. nmax(+0, ±0) = +0.
15:51 alyssa: Oh. NMin vs FMin. Got it.
15:51 glehmann: fmin has the same note
15:51 alyssa: Wait, no, even for nmin?
15:52 alyssa: Yeah, even for NMin, I'm not seeing text requiring "fmin(+0, -0) = -0"
15:52 glehmann: are you really looking at https://registry.khronos.org/SPIR-V/specs/unified1/GLSL.std.450.html
15:52 alyssa: Yes
15:52 alyssa: I see the text "Result is... either x or y if both x and y are zeros... nmin(-0, ±0) = -0."
15:53 alyssa: That notably allows "nmin(+0, -0) = +0" as a valid implementation.
15:53 glehmann: oh right
15:53 alyssa: (The implementation "return the first source if the sources are equal", in fact.)
15:53 glehmann: so the text and the examples given contradict each other
15:53 glehmann: great
15:53 alyssa: which examples...?
15:54 glehmann: fmin(-0, ±0) = -0
15:54 vsyrjala: zamundaaa[m]: if you want kms uapi changes cc:d to somewhere other than dri-devel then i think you need to propose a change to MAINTAINERS or something. asking a random collection of people to cc uapi stuff to various places probably won't work very well
15:54 vsyrjala: zamundaaa[m]: or start reading dri-devel
15:54 alyssa: I don't think that's an example, I think that's an additional requirement
15:54 alyssa: and I agree with that requirement
15:55 alyssa: but fully specifying AMD behaviour would require an additional line "fmin(+0, -0) = -0"
15:55 glehmann: ah
15:55 glehmann: well, that's worse
15:55 alyssa: without that addition, either -0 or +0 is valid
15:55 alyssa: I don't know if this is an oversight
15:55 glehmann: because the opcode should be commutative
15:55 alyssa: should is doing a lot of lifting there =D
15:56 glehmann: well it is in NIR, so you have no choice there
15:56 glehmann: we are not going to change that
15:56 alyssa: I mean. If it's not commutative in SPIR-V, maybe it shouldn't be in NIR? but also I would like to know what the SPIR-V spec actually intends before anything changes
15:57 alyssa: If this is a spir-v spec bug, we should fix the spir-v spec
15:57 alyssa: If this is actually intended, I think there are bogus Vulkan CTS tests
15:57 glehmann: non commutative fmin would be a nightmare for opt_algebraic
15:57 alyssa: yeah, agreed. wasn't a real proposal.
15:57 alyssa: but regardless of what NIR does, I think there's *some* khronos bug here...
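A minimal sketch of the point made above, assuming plain Python floats stand in for shader floats and ignoring NaN handling entirely. The function names are hypothetical and not taken from NIR, SPIR-V, or any driver. The first variant is the "return the first source if the sources are equal" implementation: it satisfies fmin(-0, ±0) = -0 but is not commutative for signed zeros. The second adds the extra rule fmin(+0, -0) = -0.

```python
import math


def is_neg_zero(x: float) -> bool:
    """True only for -0.0 (0.0 == -0.0 in Python, so check the sign bit)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def fmin_first_if_equal(x: float, y: float) -> float:
    """Hypothetical fmin: pick y only if it is strictly smaller, else x.

    Since -0.0 and +0.0 compare equal, the first source wins for zeros.
    """
    return y if y < x else x


def fmin_strict(x: float, y: float) -> float:
    """Hypothetical fmin that also orders -0.0 below +0.0."""
    if x == 0.0 and y == 0.0:
        return x if is_neg_zero(x) else y
    return y if y < x else x


if __name__ == "__main__":
    for a, b in [(-0.0, 0.0), (0.0, -0.0)]:
        print(a, b, fmin_first_if_equal(a, b), fmin_strict(a, b))
```

Both variants agree whenever the first operand is -0; they differ only for fmin(+0, -0), which is the case the quoted spec text leaves open.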
15:58 glehmann: we should request a clarification
15:58 zamundaaa[m]: vsyrjala: reading dri-devel and finding the bits relevant for compositors isn't a realistic option, most people do not have time for that
15:59 zamundaaa[m]: If we could get a keyword in the commit message or standardize cc-ing some mailing lists on specific topics that would of course be better though
15:59 alyssa: hmm, maybe the vk cts doesn't actually test the min(+0, -0) case
16:00 glehmann: the question is if that's intended
16:00 alyssa: yeah
16:01 alyssa:files ticket
16:08 alyssa: https://gitlab.khronos.org/spirv/SPIR-V/-/issues/800
16:11 glehmann: two new min/max spirv issues in one hour :)
16:12 alyssa:high fives
16:33 Company: glehmann: how many broken spec issues do you find per hour roughly?
16:38 glehmann: not enough, probably
16:40 Company: I am wondering at this point how good the specs are
16:40 Company: ie if it's more about coding to the specs or coding to the implementations
16:41 Company: like, Wayland is more about the implementations and you can ask the few people involved for clarifications
16:41 Company: and web stuff is all about the specs and there's a big process behind making sure they cover all corner cases
16:41 Company: and I have no idea where in between GL and Vulkan live
16:42 Company: and if they are similar in that or both are different
16:42 jenatali: Sigh this stupid Matrix bridge keeps disconnecting without telling me that I need to re-auth
16:42 jenatali: I was just trying to say that the specs are pretty good
16:43 jenatali: They have to be, since there are so many different implementations of the spec
16:44 Company: with the less common implementations, they don't conform very well to the specs and people code against the implementations
16:44 Company: at least according to Google
16:45 jenatali: For GL? Yeah I can 100% believe that. For Vulkan, the spec is much more thorough, and much more alive
16:45 Company: I have the benefit of being on Linux where there's about 1.5 implementations, too
16:46 jenatali: For GL, yeah, 90% of the stuff is Mesa's frontend or ANGLE. For Vulkan the different implementations even in Mesa are pretty much independent
16:47 Company: that's because there isn't much to do
16:48 Company: and I was thinking of nvidia for the .5 - angle is more a wrapper and suffers from the backend it uses in my experience
16:48 Company: kinda like zink
16:49 Company: and zink is one of the things that forces the Mesa Vulkan impls to actually agree on stuff
16:49 karolherbst: jenatali: one further question about mapping, how are you dealing with CL_USE_HOST_PTR, because this one is an annoying special case, because you have to return a pointer based on the host_ptr, but that's a bit weird if the application would map for reading at the same offset multiple times
16:49 Company: and they share nir and the wsi I think?
16:50 jenatali: karolherbst: I just store the pointer. When the app maps, I copy to staging, map that, and then CPU memcpy it to the app's actual memory
16:50 jenatali: It's terrible :D
16:50 jenatali: D3D doesn't (or at least didn't) really support importing host memory like that
16:50 karolherbst: uhhh
16:51 alyssa: Company: vulkan is very much "coding to the CTS"
16:51 alyssa: luckily the vulkan cts is really great
16:51 alyssa: but not perfect ;) ;)
16:51 jenatali: It really is
16:51 karolherbst: jenatali: but are you refcounting the pointer or are you hoping that no application will map at the same offset multiple times?
16:51 jenatali: karolherbst: Not sure I'm following why I'd need to refcount?
16:51 jenatali: The act of mapping is what copies contents into the user pointer
16:51 karolherbst: because in theory, I think if the app would map at 0x100 4 times, you'd return the same pointer 4 times, but the application might except being able to unmap that 4 times?
16:51 karolherbst: dunno
16:52 jenatali: Yeah that seems fine?
16:52 karolherbst: s/except/expect/
16:52 karolherbst: yeah, but that means you have to count how often you returned the same pointer
16:52 jenatali: Why?
16:53 karolherbst: If the buffer object is created with CL_MEM_USE_HOST_PTR set in mem_flags, the following will be true: The pointer value returned by clEnqueueMapBuffer will be derived from the host_ptr specified when the buffer object is created.
16:53 jenatali: Right
16:53 karolherbst: and nothing tells you to fail the map if it already was mapped for reading
16:53 jenatali: ... right?
16:54 karolherbst: so.. you have the application calling clEnqueueMapBuffer(... offset: 0x100) 4 times
16:54 karolherbst: and 4 times you'd e.g. return 0x35341200
16:54 karolherbst: and the application will unmap that
16:54 karolherbst: 4 times
16:54 karolherbst: probably
16:54 jenatali: Yep. And?
16:54 karolherbst: do you have to return an error on the 5th or the 2nd unmap?
16:55 jenatali: The 5th
16:55 karolherbst: so you have to refcount
16:55 Company: alyssa: does that cts greatness go for core stuff only or also for wsi things?
16:55 jenatali: Oh, I'm misremembering the CL API. Let me re-read what I wrote :)
16:55 alyssa: Company: probably mostly core
16:56 alyssa: given that WSI in HK is currently totally broken..........
16:56 karolherbst: jenatali: I don't think it's explicitly specced like that tho
16:57 karolherbst: there is also `CL_MEM_MAP_COUNT`, but that's for the total memory object, not specific offsets
16:57 karolherbst: I wonder if that needs spec clarification
16:57 jenatali: karolherbst: Ok so what I do is each map pushes an "outstanding map" into a queue
16:57 Company: alyssa: dmabuf import/export are the challenging things - and getting swapchains and GTK using the same wl_surface to not trample on each other
16:58 jenatali: And then unmap pops the first map out of the queue and undoes it. So when a write map gets unmapped, it'll trigger some additional copies in some cases (e.g. USE_HOST_PTR) so I need to track which maps need that
16:58 jenatali: karolherbst: https://github.com/microsoft/OpenCLOn12/blob/master/src/openclon12/resources.cpp#L1431
16:58 karolherbst: jenatali: right.. in practise none of this matters, because 1. you return a pointer to a memory allocation covering the whole range, so `size` doesn't even matter. 2. it's mapped for reading, so no write back happens, so you won't have to do anything on `unmap` in the first place
16:58 karolherbst: but still...
16:58 karolherbst: in theory this has some weird issues
16:59 karolherbst: e.g. in what order do you release those maps, because they can all have different sizes
16:59 jenatali: Yeah
16:59 karolherbst: anyway.. for CL_MEM_USE_HOST_PTR none of those issues matter one way or the other
17:00 jenatali: They do for me :)
17:00 karolherbst: you still should return a pointer pointing into the host_ptr region
17:00 jenatali: Yes, I do
17:00 karolherbst: ohh.. you'd fail on the 2nd unmap instead of the 5th I guess
17:01 jenatali: No, I track the right number and fail on the 5th
17:01 karolherbst: right number at the specific offset or generally?
17:02 jenatali: But a sequence of (write, read, write, read) that gets unmapped 4 times, those unmaps would do different things depending on the sequence in which they're supposed to correspond
17:02 karolherbst: that's not an issue
17:02 karolherbst: you can't overlap with a mapping with CL_MAP_WRITE
17:02 karolherbst: (and the other write thing)
17:02 jenatali: Oh ok
17:02 karolherbst: in theory you should fail the map if it overlaps with a writable region
17:03 jenatali: And the maps are keyed based on the pointer that's returned. So unmap of a pointer that wasn't returned will fail
17:03 karolherbst: yeah
17:03 karolherbst: just need to remember if a specific pointer was returned multiple times
17:03 karolherbst: I guess...
17:03 karolherbst: it doesn't even matter, because the pointer remains to be valid either way
17:03 karolherbst: well..
17:03 jenatali: Right that's what I'm saying, per-pointer I have a list of the map tasks that are still outstanding, i.e. haven't been unmapped
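A rough sketch of the bookkeeping described above, assuming maps are keyed by the returned pointer and each pointer holds a queue of outstanding maps. The class and method names are hypothetical; this is not the OpenCLOn12 code and it performs no real mapping or copying. Pointers are modelled as plain integers derived from a made-up host_ptr value.

```python
from collections import defaultdict, deque


class MapTracker:
    """Hypothetical per-pointer tracking of outstanding buffer maps."""

    def __init__(self, host_ptr: int):
        self.host_ptr = host_ptr
        # returned pointer -> queue of (size, writable) for maps not yet unmapped
        self.outstanding = defaultdict(deque)

    def map(self, offset: int, size: int, writable: bool = False) -> int:
        """Record a map and return a pointer derived from host_ptr."""
        ptr = self.host_ptr + offset
        self.outstanding[ptr].append((size, writable))
        return ptr

    def unmap(self, ptr: int) -> bool:
        """Pop the oldest outstanding map for ptr.

        Returns whether that map was writable (i.e. whether a write-back
        copy would be needed). Raises ValueError if nothing is outstanding.
        """
        queue = self.outstanding.get(ptr)
        if not queue:
            raise ValueError("pointer has no outstanding map")
        _size, writable = queue.popleft()
        if not queue:
            del self.outstanding[ptr]
        return writable


if __name__ == "__main__":
    tracker = MapTracker(0x35341100)
    ptrs = [tracker.map(0x100, 16) for _ in range(4)]
    print([hex(p) for p in ptrs])
    for _ in range(4):
        tracker.unmap(ptrs[0])
    try:
        tracker.unmap(ptrs[0])
    except ValueError as err:
        print("5th unmap fails:", err)
```

With this shape, four maps at the same offset return the same pointer four times, four unmaps succeed, and the fifth fails, which matches the behaviour discussed in the conversation.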
17:03 karolherbst: kinda
17:04 karolherbst: I see
17:04 karolherbst: I just dislike this part of the spec, because uhhh... those corner cases are weird
17:05 jenatali: The CL spec as a whole is weird
17:06 karolherbst: yeah.. fair
17:26 glehmann: jenatali: is WaveActiveMax(NaN) defined behavior in D3D12?
17:27 jenatali: Let me see if I can find out
17:28 jenatali: glehmann: https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics#type-waveactivemin-type-expr - I think this is the "spec" page for these ops and isn't explicit about behavior with specials
17:30 jenatali: glehmann: https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#22.10.10%20max I believe this table of specials should apply then
17:30 glehmann: okay, so it's NaN then
17:31 jenatali: That's what I'd expect, yeah
17:31 glehmann: SPIR-V is undefined result for some reason
17:31 jenatali: Hm, though WARP is going to return not-NaN
17:32 jenatali: It's a question of whether the hardware/driver is setting an initial value as a constant, and then doing min/max on all values, or if it's setting initial value as the value from the first active lane, and then min/max on subsequent values
17:41 jenatali: glehmann: Yeah WARP for WaveActiveMax(NaN) is going to give you -inf
17:42 glehmann: it also could set the initial value to constant NaN, that's what I wanted to change aco to, but then I noticed the weird spirv rules around exclusive scans (which require -inf for the first invocation)
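A small sketch of the initial-value choices discussed above, assuming a NaN-ignoring binary max (if one operand is NaN, the other is returned) and modelling the active lanes of a wave as a Python list. All names are hypothetical, and this does not claim to reproduce what WARP, aco, or any hardware actually does.

```python
import math


def fmax_ignore_nan(a: float, b: float) -> float:
    """Binary max that returns the other operand if one is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


def wave_max_const_init(lanes, init=-math.inf):
    """Reduce starting from a constant initial value."""
    acc = init
    for value in lanes:
        acc = fmax_ignore_nan(acc, value)
    return acc


def wave_max_first_lane(lanes):
    """Reduce starting from the first active lane's value."""
    acc = lanes[0]
    for value in lanes[1:]:
        acc = fmax_ignore_nan(acc, value)
    return acc


if __name__ == "__main__":
    nan = float("nan")
    print(wave_max_const_init([nan]))            # constant -inf start
    print(wave_max_const_init([nan], init=nan))  # constant NaN start
    print(wave_max_first_lane([nan]))            # first-lane start
```

When every active lane holds NaN, the -inf start yields -inf while the NaN start and the first-lane start both yield NaN; all three agree as soon as one lane holds a non-NaN value.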
17:44 glehmann: alyssa: doesn't agx have reductions/exclusive_scans in hw? I wonder what that does for fmin/fmax
17:45 alyssa: glehmann: ....Ouch. Didn't consider that.
17:45 alyssa: Not sure what it does there.
17:45 alyssa: It's possible that those are correct
17:45 jenatali: glehmann: Yeah makes sense. D3D doesn't have min/max scans. I wonder if this is why
17:46 alyssa: since the reductions/scans are real binary fmin/fmax instructions
17:46 alyssa: whereas nir_op_fmin/fmax is implemented with a comparison-and-select instruction
17:46 alyssa: (flt+bcsel fused into 1 instruction, with a special mode to make NaNs do the right thing)
17:46 glehmann: the issue is that at this point I don't even know what correct means
17:46 alyssa: (which is why denorms need to be flushed manually)
17:46 alyssa: ...Yeah
17:47 alyssa: Especially since this is only 754-2019, and M1 was released 2020 so the ALUs predate 2019.
17:47 alyssa: (does Apple participate in IEEE?)
17:48 alyssa: "Metal is compliant to a subset of the IEEE 754 standard" :melt:
17:48 glehmann: well, NaN behavior didn't change, so does your scan use -inf or NaN as identity internally
17:50 glehmann: that's the odd thing. to me it would have been obvious that the identity is NaN because that's... the actual identity of min/max. No idea why apparently multiple people chose +-inf
17:51 alyssa: mm..
17:55 glehmann: alyssa: I found another min/max spir-v issue: nmax(-Inf, y) = y.
17:55 glehmann: that's not true if y is NaN
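To make the objection concrete, here is a minimal sketch assuming the NaN rule "if one operand is NaN, the other operand is the result". The nmax function below is a hypothetical Python stand-in, not the SPIR-V opcode itself.

```python
import math


def nmax(x: float, y: float) -> float:
    """Hypothetical NaN-avoiding max: a single NaN operand is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y


if __name__ == "__main__":
    y = float("nan")
    result = nmax(-math.inf, y)
    # Under this rule the result is -inf, so "nmax(-Inf, y) = y" fails for NaN y.
    print(result, math.isnan(result))
```

For any non-NaN y the identity nmax(-Inf, y) = y holds in this sketch; it only breaks when y is NaN.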
17:56 alyssa:melts
17:56 alyssa: true.
17:58 glehmann: not sure if I want to pollute your issue with that, but opening yet another one also feels weird
18:02 alyssa: good day for ieee
18:03 glehmann: can't wait for spir-v to get ieee-754-2019 minimum/maximum for maximum confusion
18:09 jenatali: glehmann: Looks like NVIDIA's D3D driver turns WaveActiveMax(NaN) to -FLT_MAX, and WaveActiveMax(-inf) to 0. So....... that's not great :)
18:11 alyssa: jenatali: excuse me
18:11 alyssa: ??
18:11 jenatali: Mhmm
18:15 jenatali: Oh and WARP actually will give you different answers depending on whether it's a full wave or a partial one. That's fun
18:16 glehmann: that's the same situation aco is in rn
18:23 jenatali: NVIDIA's D3D driver gives sane results with a full wave FWIW, the garbage is only with inactive threads
18:25 alyssa: 🤡
23:08 Lyude: Are enable_vblank and disable_vblank hooks that atomic drivers should be using?
23:26 Lyude: ah ok nvm, I think I just misread something :P