00:49 karolherbst: EdB_: I figured out why array samplers were a bit broken: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/frontends/clover/llvm/codegen/common.cpp#L55
00:49 karolherbst: there are a few places and now idea if you fixed those up as well..
00:49 karolherbst: same for the spirv one :/
02:55 Chaekyung: What is "UAV" short for? as in "Xonotic no longer shows a perf gain because we removed UAVs"
02:56 jekstrand: Unordered Access View
02:56 jekstrand: It's a D3D term
02:56 jekstrand: Storage buffers/images in Vulkan terminology
02:58 Chaekyung: Why were they removed/disabled?
02:59 jekstrand: HDC and atomics aren't friends
03:03 Chaekyung: Can I simply use a (not) quote like "
03:03 Chaekyung: "Unordered Access View is a D3D term. It is storage buffers/images in Vulkan terminology. They were removed because HDC and atomics aren't friends."
03:03 Chaekyung: I don't have to rewrite it or use it at all, it seems more suitable that way. if that's ok
03:05 jekstrand: Uh... why am I being (not) quoted?
03:06 Chaekyung: You're an expert on the subject? The context would be https://linuxreviews.org/Mesa_Just_Got_A_Significant_Performance_Boost_For_Intel_Tiger_Lake_Chips
03:07 jekstrand: Ah
03:07 jekstrand: I guess, sure.
03:08 Chaekyung: Thank you.
03:09 jekstrand: yw
06:25 airlied: karolherbst, jekstrand : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7231
06:25 airlied: fyi, basic amd support for nir/clover, the last patch in the series is a horror
06:25 airlied: I expect I don't want to lower kernel inputs
07:05 danvet_: airlied, EMULTIHOP sounds horrible :-)
07:06 airlied: danvet_: finding an errno is hard :-P
07:07 danvet_: I mean my understanding is that könig's plan is that the drive controls this all
07:07 danvet_: like when it wants a vram placment, but bo is in system
07:07 airlied: danvet_: the driver is controlling it :-)
07:07 danvet_: it first sets up the tt placement
07:07 danvet_: then the vram placement
07:07 danvet_: I guess we'll see how much he jumps
07:08 airlied: I read his ideas today and didn't produce working code from them
07:08 airlied: so maybe this will give him a start to dig more
07:08 airlied: it at least removes a lot of the ttm->driver->ttm->driver->ttm roundtrips
07:09 airlied: but yeah the driver could try and detect pre bo validation that multi hops are required
07:10 airlied: then stage them one after the other
07:13 airlied: danvet_: the only bit of that patch I really like is the diffstat
07:38 pinchartl: sravn: btw, when you lose patches, lore.kernel.org is very useful. it allows downloading them as .mbox. and it also offers an NNTP interface
08:10 mripard: danvet_: sravn: ack, I'll do a PR today then
08:13 danvet_: airlied, yeah it might trick könig into doing the work, the best kind of rfc :-)
09:31 MrCooper: ajax: would also be interesting how many of those commits touched .gitlab-ci* only
09:32 MrCooper: danvet_ agd5f: maybe anyone who wants to add another amdgpu module param should be required to read through the output of "modinfo -p amdgpu" and explain what each of them does :P
13:53 karolherbst: uhm.. can we remove TGSI_OPCODE_BGNSUB and TGSI_OPCODE_ENDSUB?
13:53 karolherbst: looks like dead code to me
13:53 karolherbst: and would allow us to remove quite a bit of code overall
13:53 karolherbst: and no driver supports it anyway
13:54 dcbaker[m]: Need to ask vmware. Seems like every other this kind of question gets asked it's for their internal stuff
13:55 karolherbst: yeah well..
13:55 karolherbst: maybe I just remove it and create an MR for it
13:55 karolherbst: and ask them to explain there
13:55 karolherbst: doesn't seem worth to keep it for "internal stuff"
13:55 karolherbst: oh well
14:00 kisak: not wanting to ruffle any feathers, but it's a fact that out of tree drivers are second class citizens. Offuscated "internal stuff" of an out-of-tree driver is grounds for delaying dead code in the spirit of cooperation, but not blocking it
14:00 kisak: ^delaying the removal of dead code
14:01 karolherbst: kisak: this is userspace though
14:01 karolherbst: and drivers for VMs do pass around shaders directly
14:02 karolherbst: for vmware I think they pass the TGSI to the hypervisor
14:02 karolherbst: and all of that is open source afaik
14:02 karolherbst: but yeah...
14:02 karolherbst: it's annoying
14:02 karolherbst: and I would require documentation and pointing out _where_ it's used
14:02 karolherbst: otherwise I'd see it as unused
14:03 daniels: karolherbst: virgl also passes TGSI :(
14:03 karolherbst: right
14:03 karolherbst: but I doubt it passes subroutine stuff
14:03 karolherbst: anyway
14:03 karolherbst: there is no way to generate a TGSI with it
14:03 daniels: yeah, virglrenderer doesn't use it at least
14:03 dcbaker[m]: karolherbst: they have a dx10/11 target as well
14:04 karolherbst: so?
14:04 karolherbst: mesa doesn't generate TGSI with it
14:04 karolherbst: there is simply now way it would reach the driver
14:04 karolherbst: *no
14:05 dcbaker[m]: True enough
14:05 dcbaker[m]: It's 7am here, maybe I should wait till after coffee
14:49 karolherbst: uff.. that removes quite a lot of code actually
15:04 karolherbst: dcbaker[m]: just removing the enum values: +16 -401 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7235
15:04 karolherbst: and there are a bunch of cleanups I can do in nouveau as well
15:05 dcbaker[m]: I would mention someone at vmware just to be sure
15:06 karolherbst: yeah.. probably
15:21 dcbaker[m]: Wow. They're going to open source it?
15:22 FLHerne: Maybe the reduction in complaints is worth it for them :p
15:22 karolherbst: :D
15:22 karolherbst: well, still doesn't answer the question how they end up with those
15:23 karolherbst: or is there another magic component where they pass in TGSIs with those?
15:23 karolherbst: then they should handle it inside their stack
15:23 karolherbst: why bother mesa with it
15:23 bnieuwenhuizen: I think mesa also gets them as input but just lowers them?
15:24 karolherbst: yes, it's lowered in glsl ir
15:26 karolherbst: I think I would probably fine to just drop it in the codegen/tgsi code, but.. oh well
15:26 karolherbst: now I am at 516 locs removed, so
15:26 karolherbst: it's a bit
15:32 dcbaker[m]: Presumably they're d3d10 state tracker emits it and they consumes it directly. I assume after they release that there will be a flurry of people interested in it
15:34 karolherbst: mhhh
15:35 karolherbst: yeah.. I somehow mistook it for the driver doing d3d9/10 stuff..
15:35 karolherbst: anyway, they can just handle it inside their state tracker
15:36 karolherbst: maybe..
15:36 karolherbst: dunno
15:37 karolherbst: dcbaker[m]: best is also this check_no_subroutines function which apperantly makes sure there isn't a TGSI with those opcodes...
15:38 karolherbst: but seems like that's tesselation only? mhh
15:42 dcbaker[m]: I still think the best plan is to just get everything using NIR, then we can just treat tgsi as the serialization format that virgl and svga use
15:42 karolherbst: probably
15:42 dcbaker[m]: I guess nine uses it too
15:42 karolherbst: but doesn't virgl has a strange copy of tgsi somewhere?
15:42 karolherbst: nine supports nir already afaik
15:42 karolherbst: but yeah, I guess removing tgsi from drivers is the way to go
15:42 dcbaker[m]: Through ttn I think
15:42 karolherbst: mhh
17:00 ajax: woo ntt merged, nice work anholt
17:01 dcbaker[m]: \o/
17:04 krh: yeah, good stuff
17:10 danvet_: robclark, I'm also wondering why you're checking for async_update in can_do_async
17:10 danvet_: since I think you should never see that there
17:13 robclark: danvet_: maybe we only need to check for async cursor update
17:16 chadv: question: How do I force an app to use a specific GL driver in a dual-gpu system? context: My machine has a discrete AMD gpu with an Intel gpu. I want to run an app with Intel OpenGL. I've disconnected all displays from the AMD gpu.
17:17 bnieuwenhuizen: doesn't it do that already?
17:17 bnieuwenhuizen: IIRC most GL stuff with dri3 uses the same GPU as the X server
17:17 chadv: I know that I can blacklist amdgpu on the kernel cmdline. But I'm specifically asking how to do this when both amdgpu and i915 are loaded.
17:18 chadv: bnieuwenhuizen: I've only recently begun working with a dual gpu machine. So I just needed to confirm my assumptions.
17:18 chadv: With an X server, sure. I assume the same holds for Wayland too.
17:18 bnieuwenhuizen: the real question is what to do if X itself picks the wrong GPU :P
17:19 chadv: Yes... initially, my gnome-shell primary display was connected to amd. Then I disconnected it and switched to Intel. I wonder if gnome-shell also switched gpus. I wish I knew a way to check.
17:20 bnieuwenhuizen: glxinfo ?
17:20 chadv: sigh. i'm making this more difficult than it should be.
17:23 chadv: Next issue. How do I ensure that vulkan apps choose the right VkDevice? apps have freedom choice there.
17:23 chadv: maybe i could set VK_ICD_FILENAMES, or find some other env var in the vulkan_loader docs.
17:23 bnieuwenhuizen: VK_ICD_FILENAMES to the right driver
17:24 bnieuwenhuizen: or just use the mesa device select layer which defaults to making the X/wayalnd device the first device listed
17:24 chadv: ahh, thanks.
17:25 chadv: bnieuwenhuizen: btw, thanks for deqp-runner.
18:02 xexaxo1: chadv: the most reliable way is to use DRI_PRIME=gpu_path_id_tag_as_seen_in_udev
18:03 xexaxo1: chadv: for example: udevadm info /dev/dri/card0 | grep ID_PATH_TAG -> E: ID_PATH_TAG=pci-0000_00_02_0
18:12 shfbsdbvf: Hi, I wanted to report few bugs but then I thought that maybe I should check if they are not fixed in latest version
18:13 shfbsdbvf: What do I need for it? Just latest mesa or kernel and llvm should also be latest?
18:13 shfbsdbvf: The bugs are in radv
18:14 ajax: depends on the bug, but start with just mesa
18:14 xexaxo1: shfbsdbvf: my first bet would be mesa. newish radv uses aco (instead of llvm)
18:15 xexaxo1: that would cut down the testing surface a bit :-P
18:17 shfbsdbvf: Thanks, also those 2 bugs are in unreal engine. I think there's a chance that if I report them none will check them because you need ~100gb of free space and ~4 hours to build it.
18:18 shfbsdbvf: Maybe I should add printfs to all radv functions, read the output and make a minimal program that reproduces this bug?
18:19 xexaxo1: shfbsdbvf: might want to wire the vulkan validation layers to catch any obvious issues
18:19 shfbsdbvf: xexaxo1: sorry, what layers?
18:23 xexaxo1: shfbsdbvf: see https://vulkan-tutorial.com/Drawing_a_triangle/Setup/Validation_layers and https://github.com/KhronosGroup/Vulkan-ValidationLayers
18:24 xexaxo1: former provides a nice summary/overview, while the latter (in docs) has documentation about each layer.
18:24 shfbsdbvf: Thanks, I'll try them
18:25 xexaxo1: ... one of which being a "printf"-alike, IIRC. odds are your distro alreayd ships a package with the vulkan layers
18:28 xexaxo1: shfbsdbvf: there's no need to rebuild unreal engine in order to use them.
18:28 shfbsdbvf: I meant that to experience the bug you have to build it, they don't ship binaries for linux
18:31 xexaxo1: ack. explicitly pointing out, since the example in the first page may be confusing
19:05 karolherbst: curro: any opinions on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069/diffs?commit_id=c712049c1a4aeadc6dbc462aaa911c896305b8d4 ?
19:06 karolherbst: I think the patch is fine, just wondering if you prefer checking it before calling the constructor
19:09 curro: karolherbst: might be okay, but I didn't verify that all the entry points calling that helper are specified to return CL_INVALID_VALUE in that case
19:10 karolherbst: yeah.. there might be some corner cases where it would be fine I guess
19:11 karolherbst: guess I will take a closer look and see where it might cause issues
19:12 airlied: jekstrand: haven't reached the point in CL where you want to push functions into the backend? :-0
19:13 anholt: MrCooper: any idea why the pages job on https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/216013 is blocked? couldn't find it in the dag, and I tried running an earlier container job in case there was some implicit dependency on previous stage due to pages not declaring dependencies.
19:13 jekstrand: airlied: luxmark4 makes a case for it....
19:13 karolherbst: yeah.. nouveau potentially is able to make use of it
19:13 karolherbst: just...
19:13 karolherbst: that's like dead code for centuries
19:14 airlied: jekstrand: do you have a calling convention? :-)
19:14 jekstrand: airlied: Not yet.
19:15 karolherbst: do we need a calling convention?
19:15 jekstrand: Of some form
19:15 jekstrand: That or global RA
19:15 karolherbst: ohh mhh
19:15 karolherbst: right
19:15 karolherbst: but that's up to the backend
19:15 airlied: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7231/diffs?commit_id=a6dd7660698e157738fc3b8bf074c7400681e658 btw
19:15 karolherbst: I was more refering to between objects
19:15 jekstrand: In any case, not a problem I plan on solving this week.
19:15 airlied: seems to point that I don't want to lower kernel inputs :-P
19:15 karolherbst: I think it would be safe to assume we always link everything together, no?
19:16 karolherbst: or do we want to have multiple objects whith calls between them?
19:16 airlied: karolherbst: I haven't looked into if rocm or lvl0 allow that
19:16 karolherbst: doesn't matter
19:16 karolherbst: it's all driver internal only
19:16 airlied: well just whether it's considered something people expect to have
19:17 karolherbst: mhhh
19:18 karolherbst: lvl0 doesn't seem to have "libraries" does it?
19:18 karolherbst: but I guess it could still have it's native binaries as internal libs
19:19 jekstrand: the Intel OpenCL driver likes to inline everything but it will use "real" function calls if given something truly absurd like luxmark4.
19:19 karolherbst: right
19:19 karolherbst: my question was if you have several object files uploaded
19:19 karolherbst: or is it all one big thing
19:19 airlied: yeah heuristics ftw :-P
19:20 karolherbst: but I guess because nobody wants to bother with relocations you probably don't want to have that
19:20 airlied:wouldn't be shocked if the heuristic is (if lumxark4)
19:20 karolherbst: :p
19:21 jekstrand: hehe
19:21 jekstrand: I suspect not but I don't know that codebase
19:21 karolherbst: anyway, function calls is something nouveau more or less supports (and reserves registers and stuff)
19:21 karolherbst: but uff..
19:21 karolherbst: it's more of an RA thing
19:22 karolherbst: and I think it's only works based on a handful regs
19:22 karolherbst: not sure what happens if you haver deeper call chains with 50 of regs
19:22 karolherbst: anyway
19:22 karolherbst: that's all up to the backend to deal with it
19:22 airlied: spilling :-P
19:23 karolherbst: yeah.. fun
19:23 karolherbst: or just sill function args
19:23 karolherbst: *spill
19:23 karolherbst: ufff
19:24 karolherbst: anyway, this also leads to the problem of having a compilation order
19:24 karolherbst: so you want to compile inner stuff first
19:24 karolherbst: and so on
19:24 karolherbst: or at least you need to know at some point where to move values
19:25 karolherbst: airlied: you are so lucky that llvm does all that stuff for you :p
19:25 airlied: karolherbst: yeah for llvmpipe I can liekly just tur off nir inlining, might be a good test
19:26 airlied: aco might be more fun in the future :-P
19:28 karolherbst: yep
19:33 Emmy: an old article from 2013 mentioned vblank_mode=4, for adaptive vsync. which isnt an option anymore, how does on set this stuff nowdays?
19:39 karolherbst: Emmy: I would be surprised if things were even implemneted back then
19:41 Emmy: I see, so using FreeSync isnt an option yet? (besides X11's variable refresh rate option thing.. or i might be confused.)
19:41 karolherbst: no idea about the current state, just I am quite positive we hadn't it implemented 7 years ago
19:42 Emmy: Ah ok, thanks!
19:48 xexaxo1: Emmy: got a link to the article? iirc vblank_mode=4 was never a thing
19:52 Emmy: xexaxo1, sometimes i think phoronix is a bot ;) .. guess it never was merged or unfinished. https://www.phoronix.com/scan.php?page=news_item&px=MTU0NDE
19:54 xexaxo1: Emmy: actually, the article says patches were sent... and seemingly they never got merged :-\
19:55 xexaxo1: seems like there's no comments although, the patch only covers glx/dri2 - egl* and glx/dri3 are not implemented
20:10 FLHerne: Emmy: "besides X11's variable refresh rate option thing" sounds possibly confused
20:10 FLHerne: https://wiki.archlinux.org/index.php/Variable_refresh_rate is a fairly good description
20:11 FLHerne: Well, of how to use it
20:11 FLHerne: Unless your games aren't running under X...
20:45 DPA: Xorg with atomic enabled now fails to pick up the linear modifier of the mxsfb in linux-next.
20:45 DPA: drm_format_modifier::formats is 0 here, which I think is wrong: https://gitlab.freedesktop.org/xorg/xserver/-/blob/master/hw/xfree86/drivers/modesetting/drmmode_display.c#L2047
20:48 DPA: If this was 3, for example, it would pick up the modifier.
20:58 vsyrjala: mxsfb looks to be missing .format_mod_supported(), and iirc the code which creates the blob is broken w/o that
21:00 vsyrjala: i think there are three options: a) make .format_mod_supported() mandatory, b) assume any format works when populating the blob, c) don't create the blob at all
21:01 vsyrjala: from the kernel side at least. not sure if modesetting is expecting to be able to operate without the blob or not
21:09 DPA: Thanks. It doesn't look like Xorg would get it right without the blob, so I think the other two options would be preferable.
21:10 karolherbst: curro: mhh, reading the spec it is really unclear what to do once regions/origins/offsets are not set
21:10 karolherbst: for example https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueReadImage.html
21:10 karolherbst: but if NULL would be allowed it would contradict other rules, like for 2D images certain elements have to be 0
21:10 DPA: vsyrjala: I could probably figure out how to do a), but I don't understand drm/dri good enough to do b.
21:10 karolherbst: the spec is the same for the buffer functions
21:11 karolherbst: or we just assume 0?
21:11 karolherbst: in case it's null
21:11 karolherbst: there are some offsets where we could be as nice and just accept the input
21:11 karolherbst: but then again...
21:12 karolherbst: maybe have a vector_or_zero function where we think having 0 makes sense and everywhere else throw an error
21:12 karolherbst: like region has to be something
21:12 karolherbst: offsets and origins can be 0
21:13 airlied: karolherbst: I'm not sure either ofthem can be NULL
21:14 airlied: origin or region
21:14 karolherbst: yeah, but the spec also doesn't define an error :p
21:14 karolherbst: I think it makes sense to assume 0 on NULL
21:14 karolherbst: that's more or less intuitive and follows some other functions
21:14 karolherbst: local_work_size in clEnqueueNDRangeKernel eg is the same
21:15 karolherbst: so the runtime just decides what's good
21:15 karolherbst: but right, that's different than 0
21:15 karolherbst: :/
21:15 karolherbst: uff...
21:15 airlied: just ask on the public opencl docs
21:16 airlied: raise an issue, someone usually comes along
21:16 karolherbst: will do
21:20 curro: karolherbst: seems like undefined behavior if there is nothing sane one can do... but then it should be okay to return a made-up error code as in EdB_'s patch
21:20 karolherbst: okay
21:20 karolherbst: anyway, filed a bug: https://github.com/KhronosGroup/OpenCL-Docs/issues/478
21:20 gitbot: KhronosGroup issue 478 in OpenCL-Docs "Unclear requirements for region/offset/origin arguments in various functions" [Open]
21:21 curro: cool
21:22 karolherbst: ehh, should have checked the 3.0 spec
21:22 karolherbst: nope
21:22 karolherbst: "CL_INVALID_VALUE if any region array element is 0." is all I see
21:22 airlied: "CL_​INVALID_​VALUE if values in origin and region do not follow rules described in the argument description for origin and region."
21:22 airlied: oh thats 2.2
21:22 karolherbst: well
21:23 curro: but there is no rule saying it may not be null :P
21:23 karolherbst: the description is what's above that
21:23 karolherbst: right
21:23 karolherbst: I mean.. the spec tells us to segfault though :p
21:23 karolherbst: because we have to check each element
21:23 vsyrjala: DPA: i think a patch for b) or c) should a few lines. a) would require changing all the drivers
21:24 airlied: karolherbst: maybe segfault is fine :-P
21:24 karolherbst: :p
21:24 curro: karolherbst: yeah x|, anyway EdB_'s patch should be fine then
21:24 karolherbst: CL_NO_ERROR=1 :p
21:25 karolherbst: okay cool
21:25 karolherbst: anyway, I think the image MR is fully reviewed then?
21:25 curro: seems like it yeah
21:25 karolherbst: nice
21:26 karolherbst: next: 1.2 features :/
21:26 karolherbst: found a few places which needs more fixing
21:26 karolherbst: image arrays are annoying
21:26 airlied: karolherbst: printf? :-P
21:26 karolherbst: especially 1Darray
21:26 airlied: oh 1.2 image features
21:26 karolherbst: yeah
21:26 karolherbst: I have the patches, they just need more love
21:29 karolherbst: sooo. merge?
21:29 curro: yep go for it
21:44 karolherbst: anholt: btw, ntt broke scons-win64
21:44 anholt: sigh
21:45 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/5101684
21:59 airlied: I thought scons was just a let vmware fix it
22:01 karolherbst: I didn't say anholt should fix it, just wanted to point it out :p
22:08 karolherbst: duh...
22:08 karolherbst: today CI hates me
22:09 jekstrand: karolherbst: Having to update all thos llvmpipe CL test lists?
22:09 karolherbst: that and flakes
22:10 karolherbst: just got a softpipe timeout
22:10 karolherbst: but I hit retry quick enough as it seems
22:10 karolherbst: nice
22:10 karolherbst: image support merged :)
22:11 AndrewR: karolherbst, \0/
22:13 karolherbst: ohh ehh
22:13 karolherbst: somehow I managed to drop the vector patch...
22:13 karolherbst: "oops"
22:13 jekstrand: karolherbst: \o/
22:13 karolherbst: had it removed locally, but I was sure I didn't push it.. guess I messed up.. oh well
22:16 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241 :D
22:18 karolherbst: soo, now cleaning up that mess
22:18 daniels: karolherbst: exciting! :)
22:44 karolherbst: mhhh
22:44 karolherbst: so we had this "CL_RGB or CL_RGBx. This format can only be used if channel data type =CL_UNORM_SHORT_565, CL_UNORM_SHORT_555, or CL_UNORM_INT_101010." thing in CL 1.2
22:45 karolherbst: but this is gone with CL 2.0
22:45 karolherbst: :/
22:45 karolherbst: that's kind of a lead to fix some annoying crash with llvmpipe exposing those
22:46 airlied: CL 2.0 does have some language in those formats
22:46 airlied: "The channel order must be CL_​RGB or CL_​RGBx."
22:47 karolherbst: but that's the other way around
22:47 karolherbst: CL_SNORM_INT8 was forbidden in combination with CL_RGB and CL_RGBx
22:47 airlied: yeah so it's leagl for others now
22:47 karolherbst: but what about this in 2.0?
22:47 karolherbst: sure?
22:47 karolherbst: the CTS crashes :p
22:47 airlied: is that the CTS bug or ours though?
22:47 karolherbst: there is also this sentence
22:47 airlied: though I'd just not expose them from clover
22:48 karolherbst: "The number of bits per element determined by the image_channel_data_type and image_channel_order must be a power of two."
22:48 karolherbst: CTS
22:48 karolherbst: does an aligned malloc with an alignment of 12
22:49 karolherbst: but CL_SNORM_INT8 * CL_RGB = 24 bits
22:49 karolherbst: or am I wrong here?
22:49 karolherbst: and that's npot
22:49 karolherbst: so it doesn't meet the "bust be a power of two" req
22:49 karolherbst: *must
22:50 airlied: number of bits per element
22:50 airlied: is per channel
22:50 karolherbst: yeah
22:50 karolherbst: no
22:50 karolherbst: I think?
22:50 karolherbst: not sure if element != channel or not
22:51 daniels: karolherbst: https://github.com/KhronosGroup/OpenCL-CTS/pull/827
22:51 gitbot: KhronosGroup issue (Pull request) 827 in OpenCL-CTS "Use power-of-two alignment values for allocating pixel data" [Closed]
22:51 daniels: have you got that merged?
22:51 karolherbst: ufff
22:51 karolherbst: daniels: so it's legal, just the CTS was broken?
22:52 karolherbst:needs to update headers again :/
22:53 airlied: karolherbst: element means channel
22:53 airlied: I've never heard it used any other way, but this is CL so who knows
23:16 airlied: okay now lowering kernel inputs lets me avoid the two qsorts :-P
23:17 airlied: not lowering
23:23 airlied: karolherbst, jekstrand ; if I'm adding a new lowering options for CL things, do I just need to update noveeau and iris, we have no others we care about do we? (maybe inform jenetali)
23:24 karolherbst: airlied: mhh, I kind of prefer no api specific options though.. what do you want to add?
23:24 karolherbst: if it's CL specific clover should set it
23:24 karolherbst: not drivers
23:26 karolherbst: or it's not CL specific and then it doesn't matter ;)
23:26 karolherbst: but anyway, I prefer clover to handle those things
23:26 airlied: I need two options
23:27 airlied: one to not lower global_group_size and one to avoid the clover kernel input lowering
23:27 airlied: neither are CL specific
23:27 karolherbst: global_group_size goes into nir_lower_compute_system_values_options
23:28 karolherbst: and that's constructed inside clover
23:28 karolherbst: airlied: also, why do you need to avoid kernel input lowering
23:28 airlied: karolherbst: unfortunately not
23:28 airlied: it's lowered earlier
23:29 karolherbst: by what?
23:29 airlied: karolherbst: lower system values
23:29 karolherbst: this just turns it into an intrinsic
23:29 airlied: karolherbst: I've pinged you twice pointed you at the radeonsi patch
23:29 airlied: that is the reason for avoiding kernel input loweirng
23:29 karolherbst: ehh, because llvm is stupid and does its own stuff?
23:30 airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7231/diffs?commit_id=830a391aa42e34be82511e7ecce095b223683390
23:30 airlied: llvm has a shader ABI I can't really change
23:30 karolherbst: kernels != functions though
23:30 airlied: llvm has a shader ABI I can't really change
23:30 airlied: it expects a kernel to have a function entry point
23:31 airlied: with parameters
23:31 airlied: I can rebuild it at the cost of two qsorts or I can just not lower
23:31 karolherbst: because it does the wrapping itself?
23:32 airlied: it does it's own magic, bypassing it would mean changing ghe backend abi
23:32 karolherbst: mhh
23:33 karolherbst: the thing is, we don't need any magic
23:33 airlied: karolherbst: the global group size lowering isn't do an intrisc
23:33 airlied: it lowers it to a multiply
23:33 airlied: of two other intrinsics
23:33 karolherbst: it shouldn't
23:34 airlied: it does
23:34 airlied: build_global_group_size
23:34 airlied: my patch addes an intrinsic
23:34 airlied: actually I should fix my patch to move the lowering options then
23:34 karolherbst: you could move it to lower compute system values
23:34 karolherbst: yeah
23:35 karolherbst: but I don't see why that change is needed at all?
23:35 airlied: it's shader ABI is what
23:35 karolherbst: huh?
23:35 karolherbst: really?
23:35 airlied: doh
23:35 airlied: llvm has a shader ABI I can't really change
23:35 karolherbst: how broken is the shader ABI :/
23:35 airlied: there we go :-P
23:35 airlied: it's fixed in a large rock
23:36 airlied: it works for ROCm and non-nir clover
23:36 karolherbst: sure it works, but there is no benefit
23:36 airlied: karolherbst: but I can't change it
23:36 karolherbst: so you need to pass it as a func arg or something?
23:37 karolherbst: just trying to understand how that's ABI
23:37 airlied: I need to extract it from a magic struct
23:37 airlied: to launch a kernel you fill out a magic structure of parameters
23:38 airlied: then you can get acess to that via an llvm intrinsic
23:38 karolherbst: that sounds horrible broken and suffering from compiler is external to runtime :/
23:38 karolherbst: is there no such "eat this llvm and stop being annoying" flag? :/
23:39 airlied: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/radeonsi/si_compute.c#L42
23:39 airlied: see the dispatch packet
23:39 karolherbst: right, sure
23:40 karolherbst: uff
23:40 karolherbst: but yeah, I guess that's fine...
23:40 karolherbst: mhhh
23:40 airlied: the non-kernel calling convention might be possiblebut I expect it's a fairly mess hole to dig
23:40 karolherbst: wait...
23:40 karolherbst: and how does lowering break this?
23:41 airlied: grid_Size_x is the multipled value I need for get_global_size
23:41 airlied: so I need to access 3 uint32_t from that struct directly, not two multiplied values from elsewhere
23:42 karolherbst: ohh
23:42 karolherbst: so you don't have "amount of groups" but just the size directly...
23:42 karolherbst: and how do you get the num_work_groups thing?
23:43 karolherbst: does it divide inside the kernel then?
23:43 karolherbst: or is that some hw value it can read from?
23:43 airlied: get_num_groups does a divide
23:43 airlied: https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ockl/src/workitem.cl#L140
23:43 karolherbst: ....
23:43 karolherbst: seriously?
23:43 karolherbst: *sigh*
23:43 airlied: that's the libclc equiv code from rocm
23:44 karolherbst: sure
23:44 karolherbst: just.. they prevent doing a mul, by having to use divide elsewhere?
23:44 karolherbst: some priorities
23:44 airlied: yeah it's horrible, not sure how it came about
23:44 karolherbst: proabbly "nobody uses num_work_groups"
23:44 airlied: though I expect it's one of those we made an ABI now stick with it problems
23:44 airlied: and that
23:45 karolherbst: yeah, okay, I guess we could make that conditional
23:45 karolherbst: I just wished we wouldn't have to deal with too many driver specific flags
23:46 airlied: maybe once oradeonsi can use aco and aco grows kernel support it can be removed :-P
23:46 karolherbst: airlied: for the kernel arg lowering, I'd add a spirv_options thing
23:46 karolherbst: and I guess some way (pipe flag?) to turn that stuff on
23:47 karolherbst: that would be the cleanest solution imho
23:47 airlied: okay so not a NIR thing, make sense I suppsoe
23:47 karolherbst: nir_compiler_options doens't care about the API, and I'd like to keep it this way
23:48 karolherbst: although we already have the shader type inside the function pointer
23:48 karolherbst: but...
23:49 karolherbst: mhhhh
23:49 karolherbst: airlied: although this kernel arg thing is more of a shader type ABI, nothing API specific right?
23:49 karolherbst: for lvl0 you'd have to do the same I mean
23:50 airlied: yeah it's not CL specific
23:50 karolherbst: maybe in this case I'd be fine with sticking it in the nir_compiler_options
23:50 karolherbst: I just don't want any API business in there
23:52 karolherbst: but I think the general idea was to not make this struct grow for ever :D
23:52 karolherbst: it's kind of huge already
23:52 jekstrand: airlied: That's all we have AFAIK
23:55 airlied: uggh compute caps aren't default