00:49karolherbst: EdB_: I figured out why array samplers were a bit broken: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/frontends/clover/llvm/codegen/common.cpp#L55
00:49karolherbst: there are a few places and now idea if you fixed those up as well..
00:49karolherbst: same for the spirv one :/
02:55Chaekyung: What is "UAV" short for? as in "Xonotic no longer shows a perf gain because we removed UAVs"
02:56jekstrand: Unordered Access View
02:56jekstrand: It's a D3D term
02:56jekstrand: Storage buffers/images in Vulkan terminology
02:58Chaekyung: Why were they removed/disabled?
02:59jekstrand: HDC and atomics aren't friends
03:03Chaekyung: Can I simply use a (not) quote like "
03:03Chaekyung: "Unordered Access View is a D3D term. It is storage buffers/images in Vulkan terminology. They were removed because HDC and atomics aren't friends."
03:03Chaekyung: I don't have to rewrite it or use it at all, it seems more suitable that way. if that's ok
03:05jekstrand: Uh... why am I being (not) quoted?
03:06Chaekyung: You're an expert on the subject? The context would be https://linuxreviews.org/Mesa_Just_Got_A_Significant_Performance_Boost_For_Intel_Tiger_Lake_Chips
03:07jekstrand: I guess, sure.
03:08Chaekyung: Thank you.
06:25airlied: karolherbst, jekstrand : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7231
06:25airlied: fyi, basic amd support for nir/clover, the last patch in the series is a horror
06:25airlied: I expect I don't want to lower kernel inputs
07:05danvet_: airlied, EMULTIHOP sounds horrible :-)
07:06airlied: danvet_: finding an errno is hard :-P
07:07danvet_: I mean my understanding is that könig's plan is that the drive controls this all
07:07danvet_: like when it wants a vram placment, but bo is in system
07:07airlied: danvet_: the driver is controlling it :-)
07:07danvet_: it first sets up the tt placement
07:07danvet_: then the vram placement
07:07danvet_: I guess we'll see how much he jumps
07:08airlied: I read his ideas today and didn't produce working code from them
07:08airlied: so maybe this will give him a start to dig more
07:08airlied: it at least removes a lot of the ttm->driver->ttm->driver->ttm roundtrips
07:09airlied: but yeah the driver could try and detect pre bo validation that multi hops are required
07:10airlied: then stage them one after the other
07:13airlied: danvet_: the only bit of that patch I really like is the diffstat
07:38pinchartl: sravn: btw, when you lose patches, lore.kernel.org is very useful. it allows downloading them as .mbox. and it also offers an NNTP interface
08:10mripard: danvet_: sravn: ack, I'll do a PR today then
08:13danvet_: airlied, yeah it might trick könig into doing the work, the best kind of rfc :-)
09:31MrCooper: ajax: would also be interesting how many of those commits touched .gitlab-ci* only
09:32MrCooper: danvet_ agd5f: maybe anyone who wants to add another amdgpu module param should be required to read through the output of "modinfo -p amdgpu" and explain what each of them does :P
13:53karolherbst: uhm.. can we remove TGSI_OPCODE_BGNSUB and TGSI_OPCODE_ENDSUB?
13:53karolherbst: looks like dead code to me
13:53karolherbst: and would allow us to remove quite a bit of code overall
13:53karolherbst: and no driver supports it anyway
13:54dcbaker[m]: Need to ask vmware. Seems like every other this kind of question gets asked it's for their internal stuff
13:55karolherbst: yeah well..
13:55karolherbst: maybe I just remove it and create an MR for it
13:55karolherbst: and ask them to explain there
13:55karolherbst: doesn't seem worth to keep it for "internal stuff"
13:55karolherbst: oh well
14:00kisak: not wanting to ruffle any feathers, but it's a fact that out of tree drivers are second class citizens. Offuscated "internal stuff" of an out-of-tree driver is grounds for delaying dead code in the spirit of cooperation, but not blocking it
14:00kisak: ^delaying the removal of dead code
14:01karolherbst: kisak: this is userspace though
14:01karolherbst: and drivers for VMs do pass around shaders directly
14:02karolherbst: for vmware I think they pass the TGSI to the hypervisor
14:02karolherbst: and all of that is open source afaik
14:02karolherbst: but yeah...
14:02karolherbst: it's annoying
14:02karolherbst: and I would require documentation and pointing out _where_ it's used
14:02karolherbst: otherwise I'd see it as unused
14:03daniels: karolherbst: virgl also passes TGSI :(
14:03karolherbst: but I doubt it passes subroutine stuff
14:03karolherbst: there is no way to generate a TGSI with it
14:03daniels: yeah, virglrenderer doesn't use it at least
14:03dcbaker[m]: karolherbst: they have a dx10/11 target as well
14:04karolherbst: mesa doesn't generate TGSI with it
14:04karolherbst: there is simply now way it would reach the driver
14:05dcbaker[m]: True enough
14:05dcbaker[m]: It's 7am here, maybe I should wait till after coffee
14:49karolherbst: uff.. that removes quite a lot of code actually
15:04karolherbst: dcbaker[m]: just removing the enum values: +16 -401 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7235
15:04karolherbst: and there are a bunch of cleanups I can do in nouveau as well
15:05dcbaker[m]: I would mention someone at vmware just to be sure
15:06karolherbst: yeah.. probably
15:21dcbaker[m]: Wow. They're going to open source it?
15:22FLHerne: Maybe the reduction in complaints is worth it for them :p
15:22karolherbst: well, still doesn't answer the question how they end up with those
15:23karolherbst: or is there another magic component where they pass in TGSIs with those?
15:23karolherbst: then they should handle it inside their stack
15:23karolherbst: why bother mesa with it
15:23bnieuwenhuizen: I think mesa also gets them as input but just lowers them?
15:24karolherbst: yes, it's lowered in glsl ir
15:26karolherbst: I think I would probably fine to just drop it in the codegen/tgsi code, but.. oh well
15:26karolherbst: now I am at 516 locs removed, so
15:26karolherbst: it's a bit
15:32dcbaker[m]: Presumably they're d3d10 state tracker emits it and they consumes it directly. I assume after they release that there will be a flurry of people interested in it
15:35karolherbst: yeah.. I somehow mistook it for the driver doing d3d9/10 stuff..
15:35karolherbst: anyway, they can just handle it inside their state tracker
15:37karolherbst: dcbaker[m]: best is also this check_no_subroutines function which apperantly makes sure there isn't a TGSI with those opcodes...
15:38karolherbst: but seems like that's tesselation only? mhh
15:42dcbaker[m]: I still think the best plan is to just get everything using NIR, then we can just treat tgsi as the serialization format that virgl and svga use
15:42dcbaker[m]: I guess nine uses it too
15:42karolherbst: but doesn't virgl has a strange copy of tgsi somewhere?
15:42karolherbst: nine supports nir already afaik
15:42karolherbst: but yeah, I guess removing tgsi from drivers is the way to go
15:42dcbaker[m]: Through ttn I think
17:00ajax: woo ntt merged, nice work anholt
17:04krh: yeah, good stuff
17:10danvet_: robclark, I'm also wondering why you're checking for async_update in can_do_async
17:10danvet_: since I think you should never see that there
17:13robclark: danvet_: maybe we only need to check for async cursor update
17:16chadv: question: How do I force an app to use a specific GL driver in a dual-gpu system? context: My machine has a discrete AMD gpu with an Intel gpu. I want to run an app with Intel OpenGL. I've disconnected all displays from the AMD gpu.
17:17bnieuwenhuizen: doesn't it do that already?
17:17bnieuwenhuizen: IIRC most GL stuff with dri3 uses the same GPU as the X server
17:17chadv: I know that I can blacklist amdgpu on the kernel cmdline. But I'm specifically asking how to do this when both amdgpu and i915 are loaded.
17:18chadv: bnieuwenhuizen: I've only recently begun working with a dual gpu machine. So I just needed to confirm my assumptions.
17:18chadv: With an X server, sure. I assume the same holds for Wayland too.
17:18bnieuwenhuizen: the real question is what to do if X itself picks the wrong GPU :P
17:19chadv: Yes... initially, my gnome-shell primary display was connected to amd. Then I disconnected it and switched to Intel. I wonder if gnome-shell also switched gpus. I wish I knew a way to check.
17:20bnieuwenhuizen: glxinfo ?
17:20chadv: sigh. i'm making this more difficult than it should be.
17:23chadv: Next issue. How do I ensure that vulkan apps choose the right VkDevice? apps have freedom choice there.
17:23chadv: maybe i could set VK_ICD_FILENAMES, or find some other env var in the vulkan_loader docs.
17:23bnieuwenhuizen: VK_ICD_FILENAMES to the right driver
17:24bnieuwenhuizen: or just use the mesa device select layer which defaults to making the X/wayalnd device the first device listed
17:24chadv: ahh, thanks.
17:25chadv: bnieuwenhuizen: btw, thanks for deqp-runner.
18:02xexaxo1: chadv: the most reliable way is to use DRI_PRIME=gpu_path_id_tag_as_seen_in_udev
18:03xexaxo1: chadv: for example: udevadm info /dev/dri/card0 | grep ID_PATH_TAG -> E: ID_PATH_TAG=pci-0000_00_02_0
18:12shfbsdbvf: Hi, I wanted to report few bugs but then I thought that maybe I should check if they are not fixed in latest version
18:13shfbsdbvf: What do I need for it? Just latest mesa or kernel and llvm should also be latest?
18:13shfbsdbvf: The bugs are in radv
18:14ajax: depends on the bug, but start with just mesa
18:14xexaxo1: shfbsdbvf: my first bet would be mesa. newish radv uses aco (instead of llvm)
18:15xexaxo1: that would cut down the testing surface a bit :-P
18:17shfbsdbvf: Thanks, also those 2 bugs are in unreal engine. I think there's a chance that if I report them none will check them because you need ~100gb of free space and ~4 hours to build it.
18:18shfbsdbvf: Maybe I should add printfs to all radv functions, read the output and make a minimal program that reproduces this bug?
18:19xexaxo1: shfbsdbvf: might want to wire the vulkan validation layers to catch any obvious issues
18:19shfbsdbvf: xexaxo1: sorry, what layers?
18:23xexaxo1: shfbsdbvf: see https://vulkan-tutorial.com/Drawing_a_triangle/Setup/Validation_layers and https://github.com/KhronosGroup/Vulkan-ValidationLayers
18:24xexaxo1: former provides a nice summary/overview, while the latter (in docs) has documentation about each layer.
18:24shfbsdbvf: Thanks, I'll try them
18:25xexaxo1: ... one of which being a "printf"-alike, IIRC. odds are your distro alreayd ships a package with the vulkan layers
18:28xexaxo1: shfbsdbvf: there's no need to rebuild unreal engine in order to use them.
18:28shfbsdbvf: I meant that to experience the bug you have to build it, they don't ship binaries for linux
18:31xexaxo1: ack. explicitly pointing out, since the example in the first page may be confusing
19:05karolherbst: curro: any opinions on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069/diffs?commit_id=c712049c1a4aeadc6dbc462aaa911c896305b8d4 ?
19:06karolherbst: I think the patch is fine, just wondering if you prefer checking it before calling the constructor
19:09curro: karolherbst: might be okay, but I didn't verify that all the entry points calling that helper are specified to return CL_INVALID_VALUE in that case
19:10karolherbst: yeah.. there might be some corner cases where it would be fine I guess
19:11karolherbst: guess I will take a closer look and see where it might cause issues
19:12airlied: jekstrand: haven't reached the point in CL where you want to push functions into the backend? :-0
19:13anholt: MrCooper: any idea why the pages job on https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/216013 is blocked? couldn't find it in the dag, and I tried running an earlier container job in case there was some implicit dependency on previous stage due to pages not declaring dependencies.
19:13jekstrand: airlied: luxmark4 makes a case for it....
19:13karolherbst: yeah.. nouveau potentially is able to make use of it
19:13karolherbst: that's like dead code for centuries
19:14airlied: jekstrand: do you have a calling convention? :-)
19:14jekstrand: airlied: Not yet.
19:15karolherbst: do we need a calling convention?
19:15jekstrand: Of some form
19:15jekstrand: That or global RA
19:15karolherbst: ohh mhh
19:15karolherbst: but that's up to the backend
19:15airlied: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7231/diffs?commit_id=a6dd7660698e157738fc3b8bf074c7400681e658 btw
19:15karolherbst: I was more refering to between objects
19:15jekstrand: In any case, not a problem I plan on solving this week.
19:15airlied: seems to point that I don't want to lower kernel inputs :-P
19:15karolherbst: I think it would be safe to assume we always link everything together, no?
19:16karolherbst: or do we want to have multiple objects whith calls between them?
19:16airlied: karolherbst: I haven't looked into if rocm or lvl0 allow that
19:16karolherbst: doesn't matter
19:16karolherbst: it's all driver internal only
19:16airlied: well just whether it's considered something people expect to have
19:18karolherbst: lvl0 doesn't seem to have "libraries" does it?
19:18karolherbst: but I guess it could still have it's native binaries as internal libs
19:19jekstrand: the Intel OpenCL driver likes to inline everything but it will use "real" function calls if given something truly absurd like luxmark4.
19:19karolherbst: my question was if you have several object files uploaded
19:19karolherbst: or is it all one big thing
19:19airlied: yeah heuristics ftw :-P
19:20karolherbst: but I guess because nobody wants to bother with relocations you probably don't want to have that
19:20airlied:wouldn't be shocked if the heuristic is (if lumxark4)
19:21jekstrand: I suspect not but I don't know that codebase
19:21karolherbst: anyway, function calls is something nouveau more or less supports (and reserves registers and stuff)
19:21karolherbst: but uff..
19:21karolherbst: it's more of an RA thing
19:22karolherbst: and I think it's only works based on a handful regs
19:22karolherbst: not sure what happens if you haver deeper call chains with 50 of regs
19:22karolherbst: that's all up to the backend to deal with it
19:22airlied: spilling :-P
19:23karolherbst: yeah.. fun
19:23karolherbst: or just sill function args
19:24karolherbst: anyway, this also leads to the problem of having a compilation order
19:24karolherbst: so you want to compile inner stuff first
19:24karolherbst: and so on
19:24karolherbst: or at least you need to know at some point where to move values
19:25karolherbst: airlied: you are so lucky that llvm does all that stuff for you :p
19:25airlied: karolherbst: yeah for llvmpipe I can liekly just tur off nir inlining, might be a good test
19:26airlied: aco might be more fun in the future :-P
19:33Emmy: an old article from 2013 mentioned vblank_mode=4, for adaptive vsync. which isnt an option anymore, how does on set this stuff nowdays?
19:39karolherbst: Emmy: I would be surprised if things were even implemneted back then
19:41Emmy: I see, so using FreeSync isnt an option yet? (besides X11's variable refresh rate option thing.. or i might be confused.)
19:41karolherbst: no idea about the current state, just I am quite positive we hadn't it implemented 7 years ago
19:42Emmy: Ah ok, thanks!
19:48xexaxo1: Emmy: got a link to the article? iirc vblank_mode=4 was never a thing
19:52Emmy: xexaxo1, sometimes i think phoronix is a bot ;) .. guess it never was merged or unfinished. https://www.phoronix.com/scan.php?page=news_item&px=MTU0NDE
19:54xexaxo1: Emmy: actually, the article says patches were sent... and seemingly they never got merged :-\
19:55xexaxo1: seems like there's no comments although, the patch only covers glx/dri2 - egl* and glx/dri3 are not implemented
20:10FLHerne: Emmy: "besides X11's variable refresh rate option thing" sounds possibly confused
20:10FLHerne: https://wiki.archlinux.org/index.php/Variable_refresh_rate is a fairly good description
20:11FLHerne: Well, of how to use it
20:11FLHerne: Unless your games aren't running under X...
20:45DPA: Xorg with atomic enabled now fails to pick up the linear modifier of the mxsfb in linux-next.
20:45DPA: drm_format_modifier::formats is 0 here, which I think is wrong: https://gitlab.freedesktop.org/xorg/xserver/-/blob/master/hw/xfree86/drivers/modesetting/drmmode_display.c#L2047
20:48DPA: If this was 3, for example, it would pick up the modifier.
20:58vsyrjala: mxsfb looks to be missing .format_mod_supported(), and iirc the code which creates the blob is broken w/o that
21:00vsyrjala: i think there are three options: a) make .format_mod_supported() mandatory, b) assume any format works when populating the blob, c) don't create the blob at all
21:01vsyrjala: from the kernel side at least. not sure if modesetting is expecting to be able to operate without the blob or not
21:09DPA: Thanks. It doesn't look like Xorg would get it right without the blob, so I think the other two options would be preferable.
21:10karolherbst: curro: mhh, reading the spec it is really unclear what to do once regions/origins/offsets are not set
21:10karolherbst: for example https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueReadImage.html
21:10karolherbst: but if NULL would be allowed it would contradict other rules, like for 2D images certain elements have to be 0
21:10DPA: vsyrjala: I could probably figure out how to do a), but I don't understand drm/dri good enough to do b.
21:10karolherbst: the spec is the same for the buffer functions
21:11karolherbst: or we just assume 0?
21:11karolherbst: in case it's null
21:11karolherbst: there are some offsets where we could be as nice and just accept the input
21:11karolherbst: but then again...
21:12karolherbst: maybe have a vector_or_zero function where we think having 0 makes sense and everywhere else throw an error
21:12karolherbst: like region has to be something
21:12karolherbst: offsets and origins can be 0
21:13airlied: karolherbst: I'm not sure either ofthem can be NULL
21:14airlied: origin or region
21:14karolherbst: yeah, but the spec also doesn't define an error :p
21:14karolherbst: I think it makes sense to assume 0 on NULL
21:14karolherbst: that's more or less intuitive and follows some other functions
21:14karolherbst: local_work_size in clEnqueueNDRangeKernel eg is the same
21:15karolherbst: so the runtime just decides what's good
21:15karolherbst: but right, that's different than 0
21:15airlied: just ask on the public opencl docs
21:16airlied: raise an issue, someone usually comes along
21:16karolherbst: will do
21:20curro: karolherbst: seems like undefined behavior if there is nothing sane one can do... but then it should be okay to return a made-up error code as in EdB_'s patch
21:20karolherbst: anyway, filed a bug: https://github.com/KhronosGroup/OpenCL-Docs/issues/478
21:20gitbot: KhronosGroup issue 478 in OpenCL-Docs "Unclear requirements for region/offset/origin arguments in various functions" [Open]
21:22karolherbst: ehh, should have checked the 3.0 spec
21:22karolherbst: "CL_INVALID_VALUE if any region array element is 0." is all I see
21:22airlied: "CL_INVALID_VALUE if values in origin and region do not follow rules described in the argument description for origin and region."
21:22airlied: oh thats 2.2
21:23curro: but there is no rule saying it may not be null :P
21:23karolherbst: the description is what's above that
21:23karolherbst: I mean.. the spec tells us to segfault though :p
21:23karolherbst: because we have to check each element
21:23vsyrjala: DPA: i think a patch for b) or c) should a few lines. a) would require changing all the drivers
21:24airlied: karolherbst: maybe segfault is fine :-P
21:24curro: karolherbst: yeah x|, anyway EdB_'s patch should be fine then
21:24karolherbst: CL_NO_ERROR=1 :p
21:25karolherbst: okay cool
21:25karolherbst: anyway, I think the image MR is fully reviewed then?
21:25curro: seems like it yeah
21:26karolherbst: next: 1.2 features :/
21:26karolherbst: found a few places which needs more fixing
21:26karolherbst: image arrays are annoying
21:26airlied: karolherbst: printf? :-P
21:26karolherbst: especially 1Darray
21:26airlied: oh 1.2 image features
21:26karolherbst: I have the patches, they just need more love
21:29karolherbst: sooo. merge?
21:29curro: yep go for it
21:44karolherbst: anholt: btw, ntt broke scons-win64
21:59airlied: I thought scons was just a let vmware fix it
22:01karolherbst: I didn't say anholt should fix it, just wanted to point it out :p
22:08karolherbst: today CI hates me
22:09jekstrand: karolherbst: Having to update all thos llvmpipe CL test lists?
22:09karolherbst: that and flakes
22:10karolherbst: just got a softpipe timeout
22:10karolherbst: but I hit retry quick enough as it seems
22:10karolherbst: image support merged :)
22:11AndrewR: karolherbst, \0/
22:13karolherbst: ohh ehh
22:13karolherbst: somehow I managed to drop the vector patch...
22:13jekstrand: karolherbst: \o/
22:13karolherbst: had it removed locally, but I was sure I didn't push it.. guess I messed up.. oh well
22:16karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241 :D
22:18karolherbst: soo, now cleaning up that mess
22:18daniels: karolherbst: exciting! :)
22:44karolherbst: so we had this "CL_RGB or CL_RGBx. This format can only be used if channel data type =CL_UNORM_SHORT_565, CL_UNORM_SHORT_555, or CL_UNORM_INT_101010." thing in CL 1.2
22:45karolherbst: but this is gone with CL 2.0
22:45karolherbst: that's kind of a lead to fix some annoying crash with llvmpipe exposing those
22:46airlied: CL 2.0 does have some language in those formats
22:46airlied: "The channel order must be CL_RGB or CL_RGBx."
22:47karolherbst: but that's the other way around
22:47karolherbst: CL_SNORM_INT8 was forbidden in combination with CL_RGB and CL_RGBx
22:47airlied: yeah so it's leagl for others now
22:47karolherbst: but what about this in 2.0?
22:47karolherbst: the CTS crashes :p
22:47airlied: is that the CTS bug or ours though?
22:47karolherbst: there is also this sentence
22:47airlied: though I'd just not expose them from clover
22:48karolherbst: "The number of bits per element determined by the image_channel_data_type and image_channel_order must be a power of two."
22:48karolherbst: does an aligned malloc with an alignment of 12
22:49karolherbst: but CL_SNORM_INT8 * CL_RGB = 24 bits
22:49karolherbst: or am I wrong here?
22:49karolherbst: and that's npot
22:49karolherbst: so it doesn't meet the "bust be a power of two" req
22:50airlied: number of bits per element
22:50airlied: is per channel
22:50karolherbst: I think?
22:50karolherbst: not sure if element != channel or not
22:51daniels: karolherbst: https://github.com/KhronosGroup/OpenCL-CTS/pull/827
22:51gitbot: KhronosGroup issue (Pull request) 827 in OpenCL-CTS "Use power-of-two alignment values for allocating pixel data" [Closed]
22:51daniels: have you got that merged?
22:51karolherbst: daniels: so it's legal, just the CTS was broken?
22:52karolherbst:needs to update headers again :/
22:53airlied: karolherbst: element means channel
22:53airlied: I've never heard it used any other way, but this is CL so who knows
23:16airlied: okay now lowering kernel inputs lets me avoid the two qsorts :-P
23:17airlied: not lowering
23:23airlied: karolherbst, jekstrand ; if I'm adding a new lowering options for CL things, do I just need to update noveeau and iris, we have no others we care about do we? (maybe inform jenetali)
23:24karolherbst: airlied: mhh, I kind of prefer no api specific options though.. what do you want to add?
23:24karolherbst: if it's CL specific clover should set it
23:24karolherbst: not drivers
23:26karolherbst: or it's not CL specific and then it doesn't matter ;)
23:26karolherbst: but anyway, I prefer clover to handle those things
23:26airlied: I need two options
23:27airlied: one to not lower global_group_size and one to avoid the clover kernel input lowering
23:27airlied: neither are CL specific
23:27karolherbst: global_group_size goes into nir_lower_compute_system_values_options
23:28karolherbst: and that's constructed inside clover
23:28karolherbst: airlied: also, why do you need to avoid kernel input lowering
23:28airlied: karolherbst: unfortunately not
23:28airlied: it's lowered earlier
23:29karolherbst: by what?
23:29airlied: karolherbst: lower system values
23:29karolherbst: this just turns it into an intrinsic
23:29airlied: karolherbst: I've pinged you twice pointed you at the radeonsi patch
23:29airlied: that is the reason for avoiding kernel input loweirng
23:29karolherbst: ehh, because llvm is stupid and does its own stuff?
23:30airlied: llvm has a shader ABI I can't really change
23:30karolherbst: kernels != functions though
23:30airlied: llvm has a shader ABI I can't really change
23:30airlied: it expects a kernel to have a function entry point
23:31airlied: with parameters
23:31airlied: I can rebuild it at the cost of two qsorts or I can just not lower
23:31karolherbst: because it does the wrapping itself?
23:32airlied: it does it's own magic, bypassing it would mean changing ghe backend abi
23:33karolherbst: the thing is, we don't need any magic
23:33airlied: karolherbst: the global group size lowering isn't do an intrisc
23:33airlied: it lowers it to a multiply
23:33airlied: of two other intrinsics
23:33karolherbst: it shouldn't
23:34airlied: it does
23:34airlied: my patch addes an intrinsic
23:34airlied: actually I should fix my patch to move the lowering options then
23:34karolherbst: you could move it to lower compute system values
23:35karolherbst: but I don't see why that change is needed at all?
23:35airlied: it's shader ABI is what
23:35airlied: llvm has a shader ABI I can't really change
23:35karolherbst: how broken is the shader ABI :/
23:35airlied: there we go :-P
23:35airlied: it's fixed in a large rock
23:36airlied: it works for ROCm and non-nir clover
23:36karolherbst: sure it works, but there is no benefit
23:36airlied: karolherbst: but I can't change it
23:36karolherbst: so you need to pass it as a func arg or something?
23:37karolherbst: just trying to understand how that's ABI
23:37airlied: I need to extract it from a magic struct
23:37airlied: to launch a kernel you fill out a magic structure of parameters
23:38airlied: then you can get acess to that via an llvm intrinsic
23:38karolherbst: that sounds horrible broken and suffering from compiler is external to runtime :/
23:38karolherbst: is there no such "eat this llvm and stop being annoying" flag? :/
23:39airlied: see the dispatch packet
23:39karolherbst: right, sure
23:40karolherbst: but yeah, I guess that's fine...
23:40airlied: the non-kernel calling convention might be possiblebut I expect it's a fairly mess hole to dig
23:40karolherbst: and how does lowering break this?
23:41airlied: grid_Size_x is the multipled value I need for get_global_size
23:41airlied: so I need to access 3 uint32_t from that struct directly, not two multiplied values from elsewhere
23:42karolherbst: so you don't have "amount of groups" but just the size directly...
23:42karolherbst: and how do you get the num_work_groups thing?
23:43karolherbst: does it divide inside the kernel then?
23:43karolherbst: or is that some hw value it can read from?
23:43airlied: get_num_groups does a divide
23:43airlied: that's the libclc equiv code from rocm
23:44karolherbst: just.. they prevent doing a mul, by having to use divide elsewhere?
23:44karolherbst: some priorities
23:44airlied: yeah it's horrible, not sure how it came about
23:44karolherbst: proabbly "nobody uses num_work_groups"
23:44airlied: though I expect it's one of those we made an ABI now stick with it problems
23:44airlied: and that
23:45karolherbst: yeah, okay, I guess we could make that conditional
23:45karolherbst: I just wished we wouldn't have to deal with too many driver specific flags
23:46airlied: maybe once oradeonsi can use aco and aco grows kernel support it can be removed :-P
23:46karolherbst: airlied: for the kernel arg lowering, I'd add a spirv_options thing
23:46karolherbst: and I guess some way (pipe flag?) to turn that stuff on
23:47karolherbst: that would be the cleanest solution imho
23:47airlied: okay so not a NIR thing, make sense I suppsoe
23:47karolherbst: nir_compiler_options doens't care about the API, and I'd like to keep it this way
23:48karolherbst: although we already have the shader type inside the function pointer
23:49karolherbst: airlied: although this kernel arg thing is more of a shader type ABI, nothing API specific right?
23:49karolherbst: for lvl0 you'd have to do the same I mean
23:50airlied: yeah it's not CL specific
23:50karolherbst: maybe in this case I'd be fine with sticking it in the nir_compiler_options
23:50karolherbst: I just don't want any API business in there
23:52karolherbst: but I think the general idea was to not make this struct grow for ever :D
23:52karolherbst: it's kind of huge already
23:52jekstrand: airlied: That's all we have AFAIK
23:55airlied: uggh compute caps aren't default