12:47pmoreau: karolherbst: Hey, could you please merge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10256? Francisco seems to be okay with it. (Sorry if it does double-post, but it looks like it didn’t go through the first time as I wasn’t logged in.)
14:58karolherbst: pmoreau: assigned to marge
15:00karolherbst: soo... do we have tearing with the nouveau ddx and dri3?
15:01karolherbst: I am under the impression that usually with dri3 there shouldn't be any...
15:17pmoreau: karolherbst: Thanks!
15:57karolherbst: pmoreau: you need to take care of the llvmpipe stuff
15:58karolherbst: -api/clsetkernelarg/set kernel argument for cl_int3: fail
15:58karolherbst: as you fixed this test, you also need to update the ci fail list
16:03pmoreau: Ah, I’ll look at it in a bit.
16:28pmoreau: karolherbst: Can I keep your Rb on that patch if I just modify it to remove that test from the CI fail list?
16:29pmoreau: Thanks; new version pushed
17:25karolherbst: https://gist.githubusercontent.com/karolherbst/33729c7b46bdace79f247a948e852950/raw/27d72aaa3f7914d004e3ac1f975efdcbd5b9307b/gistfile1.txt :3
17:32karolherbst: not sure if we should kill the application, but better to do random shit than to spin forever
17:33karolherbst: and this spin timeout does hurt CL probably anyway
17:33karolherbst: I think
17:36karolherbst: and here the mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12189 :)
22:53pmoreau: Do we not use the actual launch block size for configuring the register allocator? `info->prop.cp.numThreads` which is used to compute the number of threads per block used by the register allocator, is initialised for both NIR and TGSI frontends based a fixed block size being specified in the shader. And if nothing was specified (which probably happens relatively often with OpenCL), we default to 512 (1024 on Kepler+) threads, even if the
22:53pmoreau: kernel was launched with a block size of 1.
22:54karolherbst: pmoreau: that's correct
22:54karolherbst: but we should know if when actually launching the shader :/
22:54karolherbst: but yeah...
22:55karolherbst: but does it actually matter all that much?
22:55karolherbst: or is that a problem in case it's a complex shader and we wouldn't have enough thrads?
22:58pmoreau: It seems stupid to start using quite a bit of private memory due to spilling when we still have registers available.
22:58karolherbst: yeah.. but we don't know how many threads are used when it gets launched...
22:59karolherbst: CL added this hint thing for a reason
22:59karolherbst: so one could do variants
22:59karolherbst: so maybe we should just compile twice if the kernel contains the size hint
22:59karolherbst: and select based on how the kernel gets enqueued
23:00karolherbst: but it doesn't matter
23:00karolherbst: with CL we always have the fixed size as we compile once we launch
23:00karolherbst: or not?
23:00pmoreau: We do know how many threads are used when it gets launched: we have the grid size and block size specified!
23:01pmoreau: Or rather, we at the moment only compile as the kernel gets launched, so we have all the data available.
23:01karolherbst: when linking we only link to the ... nir?
23:02karolherbst: most of the work is already done by then, so... "meh"
23:02pmoreau: Might even be only the SPIR-V modules being linked.
23:02karolherbst: don't think so
23:03karolherbst: compiler::link_program calls "nir::spirv_to_nir(spirv::link_program(ms, dev, opts, log), dev, log);"
23:03karolherbst: so we even linked in libclc and DCEed everything
23:03karolherbst: it's not perfect, but also not terrible
23:04karolherbst: but in theory we could compile to the final ISA earlier, I just don't really see the point especially with caching
23:04karolherbst: we could add some caching to clover
23:04karolherbst: atm we only cache libclc
23:06pmoreau: Ignoring the actual block size: `type: 5, local: 1320, shared: 0, gpr: 16, inst: 2853, bytes: 22824`
23:06pmoreau: Using the block size: `type: 5, local: 0, shared: 0, gpr: 47, inst: 1680, bytes: 13440`
23:06karolherbst: we don't pass in this info?
23:06karolherbst: pmoreau: try chaning the block size in the next launch and see what happens :D
23:07karolherbst: the problem is, that the nir doesn't contain that info
23:07pmoreau: But we get that info from clover
23:08karolherbst: pmoreau: you have to adjust kernel::exec_context::bind as well
23:08karolherbst: the issue is changing stuff
23:08karolherbst: atm we only recreate the state if the local or input mem reqs changed
23:09pmoreau: Ah right
23:09karolherbst: uhm... how can the input size even change ¯\_(ツ)_/¯
23:09karolherbst: but yeah..
23:10karolherbst: I think the idea was to keep the kernel generic so we don't recompile all the time
23:10karolherbst: as I said... maybe we should just keep two copies and create them on demand
23:10pmoreau: Also, if I do not relax the number of threads some of the CTS tests fail due to the register allocator giving up trying to find space for all the registers. 🙃
23:10karolherbst: or a few more
23:10karolherbst: RA is busted
23:11karolherbst: the spilling logic
23:11karolherbst: it's .... just broken
23:11karolherbst: not in code
23:11karolherbst: the algorithm is
23:12pmoreau: It’s interesting how some things have changed since I last looked at OpenCL: I now have 4 tests that are passing while they were previously triggering some MP_TIMEOUT before. And another one that now triggers a null pointer dereference.
23:13karolherbst: mhh maybe I have the TGSI somewherre
23:15pmoreau: Ideally, would we want to trigger re-creating the state if the block size changes? Or would it be better to avoid it by having a different API than the one we have currently?
23:19karolherbst: no idea
23:19karolherbst: the issue are really applications changing the size all the time
23:19pmoreau: Well, I currently have the issue without even changing the size once. :-)
23:20karolherbst: what do you mean?
23:20karolherbst: atm we only compile a generic kernel which works with all sizes, no?
23:21pmoreau: We compile a generic kernel that assumes the biggest block size, resulting in unnecessary spills and sometimes the register allocator even giving up.
23:21karolherbst: pmoreau: ahh.. found it: https://gist.github.com/karolherbst/8244b9dd0c1281efc6b9aeb3f913f66e
23:22karolherbst: so tell me why RA should even _try_ to spill there :p
23:24pmoreau: Those are some nice moves you have there :-D
23:25karolherbst: took me a while to reduce a shader so much it actually makes no sense anymore