11:28Venemo: karolherbst: I'm trying to use rusticl to reproduce that bug, but it doesn't wanna work. I'm trying this: RUSTICL_ENABLE=radeonsi OCL_ICD_VENDORS=/etc/OpenCL/vendors/rusticl.icd clinfo ==> and it doesn't find any devices
11:28Venemo: karolherbst: what am I missing?
11:29karolherbst: Venemo: might just want to use a devenv, but that _should_ just work. Any errors or something?
11:29Venemo: no errors, just no devices found
11:30karolherbst: Venemo: but it does find the platform?
11:30Venemo: no
11:31karolherbst: do you have a libRusticlOpenCL.so.1 file somewhere?
11:31Venemo: https://paste.centos.org/view/raw/aa53a3e5
11:31karolherbst: normally that should be in your library paths
11:31Venemo: yes, I've got: /usr/lib64/libRusticlOpenCL.so.1 /usr/lib64/libRusticlOpenCL.so.1.0.0
11:32karolherbst: mhhh.. does LD_DEBUG=libs show anything interesting?
11:33Venemo: karolherbst: https://paste.centos.org/view/raw/cbb17ab5
11:33karolherbst: does the /etc/OpenCL/vendors/rusticl.icd file even exist?
11:34karolherbst: or rather.. does it have "libRusticlOpenCL.so.1" in it?
11:34Venemo: it's in the paste
11:34karolherbst: ahh
11:34karolherbst: ohhh
11:34karolherbst: it's the khronos loader...
11:34karolherbst: uhhh
11:35Venemo: is there a different one?
11:35karolherbst: yeah....
11:35karolherbst: OCL_ICD_FILENAMES=/usr/lib64/libRusticlOpenCL.so.1
11:36karolherbst: well.. only the file name should do as well
11:36Venemo: I thought that OCL_ICD_FILENAMES should point to the icd file
11:36karolherbst: with the khronos one it points to a directory with the icd files...
11:36karolherbst: there is also the ocl-icd loader which is normally used
11:36karolherbst: the khronos one wasn't open source from the start, so ocl-icd was written and used a lot
11:37karolherbst: but now I think some are using the khronos one and the env variables don't match 100%
11:37Venemo: this works: RUSTICL_ENABLE=radeonsi OCL_ICD_FILENAMES=/usr/lib64/libRusticlOpenCL.so.1 clinfo
11:37karolherbst: yeah.. it's a disaster, but with a meson devenv it should just work on either
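The loader mismatch discussed above can be sketched as a small helper that reads the library name out of an .icd file and puts it in OCL_ICD_FILENAMES, which is the variable that worked with the Khronos loader in this chat. The function names library_from_icd_text and rusticl_env are hypothetical, and whether ocl-icd treats OCL_ICD_FILENAMES the same way is an assumption, not something the chat confirms.

```python
import os


def library_from_icd_text(text: str) -> str:
    """Return the first non-empty line of an .icd file's contents.

    An .icd file such as /etc/OpenCL/vendors/rusticl.icd normally holds
    just the name or path of the driver library.
    """
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line
    raise ValueError("the .icd contents do not name a library")


def rusticl_env(library: str, driver: str = "radeonsi", base=None) -> dict:
    """Build an environment that points the ICD loader at one library.

    Hypothetical helper: it sets the two variables used in the chat and
    leaves everything else from `base` (or os.environ) untouched.
    """
    env = dict(os.environ if base is None else base)
    env["RUSTICL_ENABLE"] = driver
    env["OCL_ICD_FILENAMES"] = library
    return env


library = library_from_icd_text("libRusticlOpenCL.so.1\n")
print(rusticl_env(library, base={}))
```

The resulting dict could be passed as env= to subprocess.run(["clinfo"], ...) on a machine that has clinfo and rusticl installed.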
11:37pendingchaos: is libRusticlOpenCL.so.1 in your LD_LIBRARY_PATH?
11:37Venemo: yes
11:37Venemo: well, I had no idea about the different loaders...
11:38karolherbst: yeah.....
11:38Venemo: I also don't know why this one is installed
11:38karolherbst: ocl-icd is "ICD loader Name OpenCL ICD Loader"
11:39karolherbst: so without the "Khronos"
11:39Venemo: right, but how did the khronos one end up in my system?
11:39karolherbst: it's kinda the preferred one these days
11:39Venemo: fwiw, this is a relatively fresh install of latest fedora
11:40karolherbst: so your distro might just use that one
11:40karolherbst: ocl-icd is.. a bit buggy in weird places
11:40karolherbst: I should probably switch as well...
11:41Venemo: aha
11:42karolherbst: they want to add layers to OpenCL and ocl-icd doesn't support that
11:43Venemo: okay, I managed to get it to run, there are at least 2 problems
11:44Venemo: firstly, the shader fails validation and is printed. and the crash happens while printing it
11:45karolherbst: for which issue is that again?
11:45Venemo: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13061
11:47karolherbst: ahh the memory corruption one
11:48Venemo: not sure if it's a memory corruption or just a bug
11:48karolherbst: well.. "call ɘ\E1\8D\20\25\33\32\33\2C\20\25\33\31\35\0D\0A\65\72\72\6F\72\3A\20\73\72\63\2D\3E\73\73\61\2D\3E\6E\75\6D\5F\63\6F\6D\70\6F\6E\65\6E\74\73\20\3D\3D\20\6E\75\6D\5F\63\6F\6D\70\6F\6E\65\6E\74\73\20\28\2E\2E\2F\73\72\63\2F\63\6F\6D\70\69\6C\65\72
11:48karolherbst: \2F\6E\69\72\2F\6E\69\72\5F\76\61\6C\69\64\61\74\65\2E\63\3A\32\30\35\29\0D\0A\0D\0A\20\20\20\20\20\20\20\20\33\32\20\20\20\20\25\33\32\34\20\3D\20\40\6C\6F\61\64\5F\64\65\72\65\66\20\28\25\33\32\33\29\20\28\61\63\63\65\73\73\3D\6E\6F\6E\65
11:48karolherbst: \29\0D\0A\20\20\20\20\20\20\20\20\33\32\20\20\20\20\25\33\32\35\20\3D\20\6C\6F\61\64\5F\63\6F\6E\73\74\20\28\30\78\30\30\30\30\30\30\30\30\20\3D\20\30\2E\30\30\30\30\30\30\29\0D\0A\20\20\20\20\20\20\20\20\31\20\20\20\20\20\25\33\32\36\20\3D
11:48karolherbst: \20\69\65\71\20\25\33\32\34\2C\20\25\33\32\35\20\28\30\78\30\29\0D\0A\20\20\20\20\20\20\20\20\31\20\20\20\20\20\25\33\32\37\20\3D\20\6C\6F\61\64\5F\63\6F\6E\73\74\20\28\74\72\75\65\29\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20
11:48karolherbst: \20\20\20\20\2F\2F\20\73\75\63\63\73\3A\20\62\31\31\36\20\62\32\32\32\0D\0A\20\20\20\20\20\20\20\20\69\66\20\25\33\32\36\20\7B\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\62\6C\6F\63\6B\20\62\31\31\36\3A\20\20\2F\2F\20\70\72\65\64\73\3A\20\62
11:48karolherbst: \31\31\35\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\36\34\20\20\20\20\25\33\32\38\20\3D\20\64\65\72\65\66\5F\76\61\72\20\26\72\65\74\75\72\6E\5F\74\6D\70\23\39\37\20\28\66\75\6E\63\74\69\6F\6E\5F\74\65\6D\70\20\75\69\6E\74\29\0D\0A\20\20\20
11:48karolherbst: \20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\63\61\6C\6C\20\C9\98\E1\8D\20\25\33\32\38\2C\20\25\31\31\0D\0A\65\72\72\6F\72\3A\20\73\72\63\2D\3E\73\73\61\2D\3E\6E\75\6D\5F\63\6F\6D\70\6F\6E\65\6E\74\73\20\3D\3D\20\6E\75
11:48karolherbst: \6D\5F\63\6F\6D\70\6F\6E\65\6E\74\73\20\28\2E\2E\2F\73\72\63\2F\63\6F\6D\70\69\6C\65\72\2F\6E\69\72\2F\6E\69\72\5F\76\61\6C\69\64\61\74\65\2E\63\3A\32\30\35\29\0D\0A\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\33\32\20\20\20\20\25\33\32\39\20
11:48karolherbst: \3D\20\40\6C\6F\61\64\5F\64\65\72\65\66\20\28\25\33\32\38\29\20\28\61\63\63\65\73\73\3D\6E\6F\6E\65\29\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\31\20\20\20\20\20\25\33\33\30\20\3D\20\69\65\71\20\25\33\32\39\2C\20\25\33\32\35\20\28\30\78\30
11:48karolherbst: \29\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\2F\2F\20\73\75\63\63\73\3A\20\62\31\31\37\20\62\32\32\30\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\69\66\20\25\33\33\30\20\7B\0D\0A\20\20\20\20\20\20\20\20
11:48karolherbst: \20\20\20\20\20\20\20\20\62\6C\6F\63\6B\20\62\31\31\37\3A\20\20\2F\2F\20\70\72\65\64\73\3A\20\62\31\31\36\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\36\34\20\20\20\20\25\33\33\31\20\3D\20\64\65\72\65\66\5F\76\61\72\20\26\72\65\74
11:48Venemo: oof
11:48karolherbst: \75\72\6E\5F\74\6D\70\23\39\38\20\28\66\75\6E\63\74\69\6F\6E\5F\74\65\6D\70\20\75\69\6E\74\29\0D\0A\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\20\63\61\6C\6C\20\C9\98\E1\8D
11:48karolherbst: "
11:48karolherbst: ehhh...
11:48karolherbst: copy paste fail 🙃
11:48Venemo: don't paste that
11:48Venemo: :P
11:48karolherbst: but yeah....
11:48karolherbst: I thought it's one line :D
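The pasted validation output above is readable once the \XX hex escapes are turned back into bytes. A minimal sketch of such a decoder follows; decode_escaped is a hypothetical name, and treating the bytes as UTF-8 is an assumption (the corrupted function name may not be valid text at all, so undecodable bytes are replaced rather than raising).

```python
import re


def decode_escaped(text: str) -> str:
    """Turn \\XX hex escapes into bytes and decode the result as UTF-8.

    Characters that are not part of an escape are kept as they are.
    """
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\\([0-9A-Fa-f]{2})", text):
        # Copy any literal text that sits between two escapes.
        out += text[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return out.decode("utf-8", errors="replace")


# A short fragment taken from the paste above.
sample = r"\65\72\72\6F\72\3A\20\73\72\63\2D\3E\73\73\61"
print(decode_escaped(sample))
```

Joining the pasted lines (without the timestamps and nicks) and feeding them through the same function should recover the NIR dump and the nir_validate.c error message.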
11:48Venemo: it seems that the crash comes from trying to print the name of a param
11:48karolherbst: yeah
11:48karolherbst: and the function is already freed probably
11:48karolherbst: hence the weird name probably
11:49Venemo: so, either there is a bug (and there shouldn't be any param) or... who knows
11:49karolherbst: I think it's a use-after-free somehow
11:49karolherbst: or something
11:49Venemo: it seems to work fine on Raphael and fails on Oland
11:49karolherbst: if you do NIR_DEBUG=print you'll see that the calls did point to valid functions at some point
11:50karolherbst: anyway... maybe throw in libasan and see if that gives you anything
11:50Venemo: I'll try that
11:51Venemo: karolherbst: do you still work for Red Hat?
11:52karolherbst: I do, why?
11:54Venemo: they borked llvm in fedora so it's no longer possible to have the x86_64 and i686 versions installed alongside each other. can you please ping the appropriate people and point them to this bug? https://bugzilla.redhat.com/show_bug.cgi?id=2365079
11:54Venemo: this makes it really troublesome to compile 32 and 64 bit mesa alongside each other
11:57karolherbst: ohh right.. I've run into it as well...
11:57Venemo: they made an MR, then closed it, and left it like that for weeks
11:57karolherbst: Venemo: Tom is the right person for that, so maybe ping Tom or so
11:58Venemo: Since I didn't get any response, I think it would be better if the ping came from a colleague
11:59Venemo: I wouldn't even know where to ping him
11:59karolherbst: looks like the author of that PR also works at RH
12:00karolherbst: I see, well I can ping, because it was annoying me as well 🙃
12:00Venemo: thx
12:00Venemo: btw, with asan, I get a different issue
12:00Venemo: OpenCL-Benchmark-Linux: ../src/compiler/nir/nir_gather_info.c:951: gather_func_info: Assertion `impl || !"nir_shader_gather_info only works with linked shaders"' failed.
12:02Venemo: and after that, asan complains about python leaking memory...
12:03karolherbst: mhhh
12:03karolherbst: did you build mesa with libasan as well?
12:04Venemo: I used this config: meson setup build64debug --libdir lib64 --prefix $HOME/mesa -Dgallium-drivers=radeonsi,llvmpipe,softpipe,zink -Dvulkan-drivers=amd -Dgallium-rusticl=true -Db_sanitize=address -Dbuildtype=debug
12:04karolherbst: mhhh
12:05karolherbst: impl is NULL in the assertion?
12:05Venemo: yes
12:05karolherbst: it does work with other GPUs and llvmpipe, right?
12:06Venemo: it works with Rembrandt (GFX10.3) but fails with Oland (GFX6)
12:06karolherbst: mhhh
12:06karolherbst: which nir_gather_info call is this in?
12:07Venemo: here is the backtrace: https://paste.centos.org/view/raw/670431b2
12:07karolherbst: mhhh
12:08karolherbst: mind running with `NIR_DEBUG=print` and pastebin the last shader?
12:08karolherbst: the shader should be fully linked at this point, so kinda curious on what's going on there
12:09Venemo: I'm going to get a coffee, will ping you when I'm back
12:09karolherbst: I suspect it's something going wrong in vtn_opencl.c and the linking messing up...
12:12karolherbst: ohh mhh.. I think _Z11__clc_isnanf isn't linked in...
12:14karolherbst: `function_link_pass` might not find the _Z11__clc_isnanf function...
12:14karolherbst: and this is all triggered when the device does not support ffma
12:25karolherbst: yeah... it triggers when lower_ffma32 is set to true
12:25karolherbst: (but there is also an infinite opt loop thing going on... *sigh*)
12:25karolherbst: with recent main that is or so
12:56Venemo: karolherbst: I'm back, is there anything more I can do to help?
12:56karolherbst: Venemo: don't think so
12:58Venemo: I find it really weird that lower_ffma32 would trigger this, because then it would happen on GFX7-9 as well
12:59karolherbst: it should happen there as well
12:59karolherbst: well..
12:59karolherbst: not GFX9
12:59Venemo: GFX9 doesn't have fma
13:00Venemo: or wait, maybe it does?
13:00karolherbst: it does
13:00karolherbst: gfx8 as well
13:00karolherbst: soo.. the reason for this is, that in OpenCL ffma exists and it means ffma
13:00karolherbst: not anything else
13:00karolherbst: so on hardware not having actual ffma, there is emulation code
13:00karolherbst: and that code is a bit massive and includes a call to is_nan
13:01karolherbst: and the is_nan call isn't properly added to the shader
13:01karolherbst: at some point the function stub gets removed, hence the weird function names
13:01Venemo: why is that a call? it should be just 1 alu instr
13:01karolherbst: I think the impl = pointer also points to weird stuff
13:01karolherbst: Venemo: it's a builtin function
13:02Venemo: aha
13:02karolherbst: on a spirv level it's an extended instruction set
13:02karolherbst: for opencl
13:02karolherbst: and ffma is one of those
13:02Venemo: okay
13:02karolherbst: and vtn_opencl either emits nir ffma, or a function call depending on the device
13:03Venemo: i understand
13:03karolherbst: didn't bother with implementing ffma emulation if libclc (the opencl builtin lib we are using) already has it as a spirv function
13:03karolherbst: so yeah...
13:03Venemo: I'm glad you figured it out
13:04Venemo: I'm going to plug this gpu out now
13:04karolherbst: thanks for the help tho
13:04Venemo: sorry it took me so long to get to it
13:06karolherbst: that reminds me, that lower_ffma32 could be optimized for GFX8...
13:06karolherbst: GFX8 has it, but it's slower than fmul+fadd, but hw ffma there is still faster than the software emulation
13:07karolherbst: but also who cares
13:11Venemo: honestly, it sounds like something that could be optimized in the backend if anyone cares to do it
13:12Venemo: that said, the result of an fma may not be the same as a fmul+fadd
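The point that an fma and an fmul+fadd can give different results comes down to rounding once versus rounding twice. A minimal sketch follows, using Python doubles rather than the 32-bit floats discussed here, and exact rational arithmetic as a stand-in for a hardware fma; both function names are hypothetical and only finite inputs are handled.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    """Round after the multiply and again after the add."""
    return a * b + c


# a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_then_add(a, b, c))        # the small term is lost in the first rounding
print(fused_multiply_add(a, b, c))  # the small term survives the single rounding
```

With these inputs the separate multiply and add returns zero while the fused form keeps the tiny negative remainder, which is why the two cannot be swapped freely where exact fma semantics are required.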
13:26karolherbst: *sigh*, I see what's the problem now
13:26karolherbst: %_Z11__clc_isnanf and %_Z11__clc_isnand are not defined in the spirv library, but imported, but nothing provides an impl
13:27karolherbst: Venemo: yeah, but CL has a real ffma thing and it must be ffma and nothing else
13:27karolherbst: and it's not optional
13:52HdkR: ffma should never be optional these days :)
13:56karolherbst: aaaand fixed
13:56karolherbst: not our bug (tm)
14:08karolherbst: Venemo: fyi https://pagure.io/fesco/issue/3414
14:53Venemo: karolherbst: nice, thx
15:34glehmann: HdkR: adreno didn't get that message
15:36HdkR: uh oh