11:55mlankhorst: danvet/airlied: so far I've never been able to fast-forward drm-misc-fixes this cycle, should I rebase, backmerge or just let it be since nobody complains?
12:17mairacanal: display folks, i'm having a problem when using non-blocking commits: I cannot guarantee that plane->state->fb == state->fb will be true for both prepare_fb and cleanup_fb. Sometimes plane->state->fb == state->fb is true for prepare_fb (then I don't increase the refcount) and false for cleanup_fb (then I
12:17mairacanal: decrease the refcount). I was wondering: should the driver guarantee this condition, or is it really something we cannot guarantee?
12:18emersion: mairacanal: maybe cleanup_fb runs after plane->state->fb has been updated to the new value?
12:19vsyrjala: you should not consult plane->state. get the old/new state from the commit
12:19emersion: right
12:20emersion: but tbh not sure this check is really needed? i'd just inc/dec the refcount always
12:20pq: conditional reffing sounds... strange
12:21mairacanal: "but tbh not sure this check is really needed?" -> yeah, it works just fine without it
12:21mairacanal: i was thinking about just removing this check, but I don't really know how to justify such a change
12:22pq: I can think of a few suggestions: the old code was a) more complicated than necessary, b) fragile, c) inconsistent
12:23pq: maybe even buggy, if it causes problems?
12:23emersion: mairacanal: i think what ville said should be enough
12:23emersion: but ideally that would be documented somewhere
12:23emersion: can't find where though
12:24emersion: vsyrjala: is there a place where this is spelled out?
12:26javierm: emersion: I also couldn't find it in the docs, I remember adding 30b1a0797e0b ("drm/ssd130x: Use drm_atomic_get_new_plane_state()") but only because vsyrjala pointed it out to me as well
12:28javierm: btw, dri-devel ML is still down right? Or am I the only one who isn't getting emails from the list?
12:30vsyrjala: i'm not sure what's spelled out in the docs. i think the get_existing_state() stuff at least is marked deprecated. but sadly seems to be lots of users left, probably equally many doing foo->state as well
12:32javierm: a patch for the docs and a gpu/todo.rst entry to fix the existing users would be nice
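For reference, a minimal sketch of the pattern vsyrjala suggests, not taken from any particular driver: fetch the old/new plane state from the commit's drm_atomic_state via the accessors instead of dereferencing plane->state, which a non-blocking commit may already have swapped by the time cleanup_fb runs. The callback name and body are hypothetical; only the accessors are the real DRM helpers.

```c
#include <drm/drm_atomic.h>
#include <drm/drm_plane.h>

static void example_plane_atomic_update(struct drm_plane *plane,
                                        struct drm_atomic_state *state)
{
	struct drm_plane_state *old_state =
		drm_atomic_get_old_plane_state(state, plane);
	struct drm_plane_state *new_state =
		drm_atomic_get_new_plane_state(state, plane);

	/*
	 * old_state->fb and new_state->fb are stable for this commit,
	 * unlike plane->state->fb, so any reference taken for
	 * new_state->fb in prepare_fb can be dropped unconditionally
	 * for old_state->fb in cleanup_fb.
	 */
}
```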
15:27sima: mlankhorst, yeah -rc1 is a bit old, but imo it's more important for -next since there's a lot more patches there and so higher chances for unbisectable regression
15:28sima: I guess if you want to roll forward once just to stay in sync after a month, that should be ok imo, even if there's not a direct reason
15:28sima: i.e. imo backmerging -rc6 is ok if you're on -rc1 still
17:02karolherbst: jenatali: was there something I had to consider when lowering huge dispatches into various smaller ones?
17:02jenatali: karolherbst: Yeah, you need to add offsets for the workgroup id
17:03jenatali: Otherwise each dispatch will re-number those starting from 0
17:03karolherbst: okay.. so with API offsets they always start at 0, but when I set offsets internally to lower them, I have to account for them not starting at 0 in later dispatches
17:03karolherbst: that makes sense I suppose
17:04karolherbst: and I guess same for the global id
17:05jenatali: karolherbst: Global ID is computed based on the local one
17:05karolherbst: ehh wait
17:05karolherbst: yeah..
17:05karolherbst: so
17:05karolherbst: `global_work_offset` simply impacts the global ID of threads
17:05jenatali: The global offsets are only needed to reflect the global offset functionality at the API
17:06karolherbst: okay.. yeah, so the workgroup_id is the oddball here
17:06jenatali: Yep
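As an illustration of the chunking being described, a rough host-side sketch with hypothetical names: the grid is split into pieces that fit the hardware limit, and each piece is given a base workgroup id so the shader does not see workgroup ids restarting at 0 in every chunk.

```c
#include <stdint.h>

#define MAX_GROUPS_PER_DISPATCH 65535u  /* e.g. a 16-bit grid limit */

static void dispatch_1d_chunked(uint32_t total_groups,
                                void (*launch)(uint32_t base_group,
                                               uint32_t num_groups))
{
	for (uint32_t base = 0; base < total_groups;
	     base += MAX_GROUPS_PER_DISPATCH) {
		uint32_t n = total_groups - base;

		if (n > MAX_GROUPS_PER_DISPATCH)
			n = MAX_GROUPS_PER_DISPATCH;

		/*
		 * "base" is what the shader adds to its hardware-provided
		 * workgroup id (and, transitively, to the global id).
		 */
		launch(base, n);
	}
}
```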
17:06karolherbst: I'll probably do shader variants for this and will also optimize the offsets == 0 case while at it I guess :D
17:07karolherbst: I wonder if I want to precompile the offset shader and lazy compile the lowered one or just lazy compile them both...
17:07karolherbst: mhh
17:07jenatali: Yep, I do shader variants and optimize the offsets == 0 case as well
17:08jenatali: I do both lazily right now, but that's mainly just because the local group size also needs to be part of my variant key and I can't even make a guess at that until enqueue time
17:08karolherbst: so in the worst case there are 4 variants, but like offsets == 0 and small enough grids are like 99.9% of the cases...
17:08karolherbst: well..
17:08karolherbst: on the v3d driver I need huge grids lowered as the hardware has 16 bit limits on the grid :')
17:09jenatali: Yeah D3D has that limit as well
17:09karolherbst: I think I can lazy compile all variants, as I don't see how it would impact anything anyway
17:09karolherbst: while the GPU executes the main variant, the CPU can compile the others...
17:10karolherbst: the only case which might make sense to compile ahead of time might be the api offset one but uhh...
17:10karolherbst: whatever
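A hypothetical sketch of the variant scheme being discussed (all names made up): the key records whether any offsets are non-zero, so the common offsets == 0 case gets the simpler shader, plus the local group size, which is only known at enqueue time and hence forces lazy compilation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct cs_variant_key {
	bool     has_offsets;     /* any base workgroup/global id != 0 */
	uint16_t local_size[3];   /* only known at enqueue time */
};

struct cs_variant {
	struct cs_variant_key key;
	void *compiled;           /* driver shader handle (placeholder) */
	struct cs_variant *next;
};

/*
 * Look up an already-compiled variant in a tiny linked-list cache; the
 * caller compiles and prepends a new one on a miss. Keys are assumed to
 * be zero-initialized (memset) so memcmp over padding is safe.
 */
static struct cs_variant *
cs_variant_find(struct cs_variant *cache, const struct cs_variant_key *key)
{
	for (struct cs_variant *v = cache; v; v = v->next)
		if (memcmp(&v->key, key, sizeof(*key)) == 0)
			return v;
	return NULL;
}
```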
17:11karolherbst: jenatali: I need to set `has_base_workgroup_id` to true and go from there, right?
17:11jenatali: Yep
17:12karolherbst: and `has_cs_global_id` needs to be false?
17:12karolherbst: or true?
17:12consolers: on wayland i was trying to use mpv --background=0/0/0/0 --alpha=yes to have a transparent background, and it works but when the background is white #ffffff, i.e. 1/1/1/0 (last number is the alpha) the whole thing becomes opaque. this is reported to work correctly with nvidia
17:13jenatali: karolherbst: Only needs to be true if the API sets it to true IIRC
17:13consolers: but I get the same sort of behaviour in xlib, which is also repeated here https://stackoverflow.com/questions/39906128/how-to-create-semi-transparent-white-window-in-xlib --- if the background is white, it becomes opaque
17:13karolherbst: what API?
17:13jenatali: Wait, that's just an unrelated thing, isn't it?
17:13karolherbst: yeah..
17:13karolherbst: I'm asking because there is this pattern in lower_system_values: has_base_workgroup_id || !has_cs_global_id
17:14karolherbst: but I think I need has_cs_global_id to be true, and then drivers will lower it themselves if needed
17:15jenatali: karolherbst: It's unrelated
17:15consolers: is this a bug in xorg or mesa or what layer
17:15jenatali: If you set that you're using local offsets, then the global ID is always computed from local IDs
17:15jenatali: Otherwise the has_cs_global_id flag determines whether that happens
17:15karolherbst: you mean if I don't set it
17:16karolherbst: if I keep has_cs_global_id false, then setting has_base_workgroup_id to true would enable the global_invocation_id/index lowering
17:16karolherbst: if I read the code correctly
17:16jenatali: It's an or - when local ID offsets are used, it doesn't matter what that field is
17:16consolers: or intel
17:17karolherbst: jenatali: ohh.. uhhh.. that's bad tho
17:17jenatali: Why?
17:17karolherbst: why would I want that lowering?
17:17jenatali: Because when you run your second dispatch, and use a workgroup ID offset, then you want the global ID to automatically also be offset
17:18karolherbst: but that already happens via has_base_global_invocation_id, no?
17:18jenatali: I guess you could add the global offsets on the CPU side, sure
17:18jenatali: It seemed simpler to just forward through the global offsets from the CL API to that sysval, and then when using local offsets you don't need to touch the global ones again
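A plain-C model (not the literal NIR pass) of the semantics jenatali describes: when a base workgroup id is in play, or the hardware has no native global id, global_invocation_id is rebuilt from the workgroup id, so the per-chunk offset and the CL global_work_offset both propagate into it automatically.

```c
#include <stdint.h>

struct cs_ids {
	uint32_t hw_workgroup_id;           /* restarts at 0 in every chunk */
	uint32_t base_workgroup_id;         /* per-chunk offset from the host */
	uint32_t local_invocation_id;
	uint32_t workgroup_size;
	uint32_t base_global_invocation_id; /* CL global_work_offset */
};

/* One axis of the lowered global id, as the shader would compute it. */
static uint32_t lowered_global_invocation_id(const struct cs_ids *ids)
{
	uint32_t workgroup_id = ids->hw_workgroup_id + ids->base_workgroup_id;

	return workgroup_id * ids->workgroup_size +
	       ids->local_invocation_id +
	       ids->base_global_invocation_id;
}
```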
17:19karolherbst: yeah.. I was thinking of just using the API offsets + fixing the workgroup_id
17:19jenatali: The kernel just re-computes them
17:20karolherbst: the thing is, that I kinda want drivers to use nir_intrinsic_load_global_invocation_id_zero_base when they have it
17:20karolherbst: so lowering that unconditionally kinda sounds wrong
17:20jenatali: Yeah, that's what I do too (SV_DispatchThreadID), unless you're lowering huge groups
17:20karolherbst: I see
17:22karolherbst: I think just setting base_global_id and base_workgroup_id is a bit simpler from my perspective here, so I still let drivers do as they please. I might have to look at some of the codegen, but I might also want to set `has_cs_global_id` regardless of the lowering already...
17:22karolherbst: mhh...
17:22jenatali: If you want to change the semantics there to require the dispatcher to supply local offsets + global offsets, instead of just local and letting the global be computed, I can accommodate that but it is a breaking change
17:24karolherbst: I think I'll first figure out what the nir looks like and see what I think would be best here, but for some drivers changing the semantics might allow for better code.
17:24karolherbst: I'll have to see
17:25jenatali: I wasn't terribly worried about the codegen for the case of huge dispatches
17:25karolherbst: yeah.. but I'm hitting it with the CTS alone on v3d
17:25karolherbst: 256 * 65535 is the max thread count on 1D there
17:26karolherbst: so I'm constantly hitting it
17:28jenatali: Oh, D3D's is at least 1024 * 64K
17:30karolherbst: also.. I currently limit to 32 max threads $because_driver_limitations
17:30karolherbst: so 32 * 65535 :)
17:31karolherbst: v3d recompiles shaders based on bound textures/samplers/images, so that's kinda pain
17:31jenatali: Oh well that'll do it...
17:35jenatali: karolherbst: Anyway I'm open to changes here, just let me know if you think they're necessary
17:35karolherbst: yeah, thanks :)
17:40karolherbst: jenatali: btw.. do you know what applications use SPIR (not SPIR-V)? I might be willing to wire it up if it's important enough (even if only through wine)
17:40jenatali: karolherbst: No idea
17:41jenatali: I hooked up the compiler side of it but haven't hooked it into the runtime yet
17:41karolherbst: ohh
17:41karolherbst: I kinda thought you had a real reason to do it
17:52jenatali: Nah, just while I was in the area I exposed the methods to start with LLVM IR
17:56karolherbst: I see...
18:38karolherbst: jenatali: btw.. any reason for `clc_compile_c_to_spir` to still exist? Could nuke that part at least I think...
18:39jenatali: karolherbst: It's 12 lines of code. Doesn't seem that bad to me to keep?
18:39jenatali: Oh I guess the helper's a bit more too
17:41jenatali: *shrugs*
23:07jenatali: Is enunes still the right contact for lima runners? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/52910034 seems busted
23:09jenatali: Thanks!