16:22HdkR: Really need to get Karol on a bouncer sometime so I can throw things at them when not around
19:03HdkR: karolherbst: https://cdn.discordapp.com/attachments/765304672579092511/1028814485017866301/Screenshot_2022-10-09_16-40-55.png There you go
19:03karolherbst: HdkR: nice
19:04karolherbst: though interesting it renders soo badly
19:04karolherbst: but maybe the emulation of the JIT code isn't all there yet?
19:04HdkR: It /seems/ like the blend between frames is done CPU side instead of OpenCL side
19:05karolherbst: well. it's a ray tracer
19:05HdkR: It reproduced without thunks and on different hardware, so likely a CPU emulation bug
19:05karolherbst: from what I can tell, the benchmark fires off traces and increases precision on the output image with each one fired
19:06karolherbst: HdkR: how well is llvmpipe supported?
19:06karolherbst: could try this next
19:06HdkR: Right, I think their accumulation happens outside of CL for some reason though. Watching the first frames come in is telling
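The progressive refinement described above can be sketched as a running mean over per-pass samples. This only illustrates the general technique, not LuxMark's actual code; `accumulate` and the sample values are hypothetical.

```python
def accumulate(mean, sample, n):
    """Fold the n-th sample (1-based) into a running per-pixel mean."""
    return [m + (s - m) / n for m, s in zip(mean, sample)]

# Three noisy passes over a two-pixel "image" (made-up values).
passes = [[0.0, 1.0], [1.0, 1.0], [0.5, 1.0]]
image = [0.0, 0.0]
for n, sample in enumerate(passes, start=1):
    image = accumulate(image, sample, n)

print(image)  # [0.5, 1.0]
```

With this scheme a single badly wrong sample is only damped by 1/n, so an error in one pass stays visible for many passes afterwards.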
19:06HdkR: llvmpipe should work, I guess with wiring up rusticl or something?
19:07karolherbst: yeah
19:07karolherbst: LP_CL=1 and it should just work
19:07karolherbst: HdkR: luxmark also has a C++ version of the run. You can select that under "Mode"
19:08HdkR: ah true, that path renders fine
19:09HdkR: I don't think I have rusticl built and installed to be able to use LP_CL
19:09karolherbst: yeah... need to set up rust and all of that
19:10karolherbst: HdkR: how many rays/s did you get with pocl?
19:10karolherbst: though the score is based on that, and it's high enough...
19:10karolherbst: I'd expect that pocl's highly optimized JIT code is triggering emulation bugs
19:10karolherbst: pocl is like 10x faster than llvmpipe
19:10HdkR: That was claiming 2463K in the picture
19:11karolherbst: mhhh
19:11karolherbst: I already see it coming, I'll buy an M1 this year...
19:11HdkR: Nah, wait for M2 Pro/Max at this rate
19:12karolherbst: probably
19:12HdkR: :P
19:13HdkR: oop, need to update meson to get rusticl apparently
19:13karolherbst: yeah
19:13karolherbst: https://docs.mesa3d.org/rusticl.html?highlight=rusticl
19:16HdkR: also /really/ new llvmspirvlib
19:17HdkR: Requires 15, only have 13 on Ubuntu repos atm
19:17karolherbst: it needs to match the LLVM version you use for mesa
19:17HdkR: ah
19:18karolherbst: long term we'll be able to get rid of it once LLVM's own SPIR-V target becomes competent
19:18HdkR: I was using a really new clang for testing if a bug was fixed, so that makes sense
19:18karolherbst: yeah.. but it also builds in really fast
19:29HdkR: Oh nice, rusticl emmintrin.h conversion problems
19:29karolherbst: okay. so here is the thing
19:29karolherbst: you can't have multiple versions of libllvm installed
19:30karolherbst: or libclang or something
19:30karolherbst: systemwide I mean
19:30HdkR: lol wtf
19:30karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7268
19:31karolherbst: something something loading order something
19:33karolherbst: I know it's fine to have the new LLVM inside LD_LIBRARY_PATH when building
19:33karolherbst: but I don't mess with PATH
19:33karolherbst: huh.. I actually do..
19:33karolherbst: guess then it really only matters what's inside /usr
19:34HdkR: Might be a bit tricky for me to set this up with my current environment
19:37karolherbst: HdkR: might make things easier to just use rustup and use bindgen from there...
19:38karolherbst: ehhh... though not sure it even has bindgen
19:38karolherbst: it does actually
19:39karolherbst: ehh.. cargo I mean
19:39karolherbst: HdkR: "cargo install bindgen"
19:39karolherbst: and then you need to add ~/.cargo/bin to your PATH
19:39karolherbst: though not 100% sure it gets around the LLVM issue
19:42HdkR: Fixed it by building a new spirv-llvm-translator that matched the llvm-14 I had installed
19:42karolherbst: ahh yeah :)
19:42karolherbst: getting rid of a custom LLVM always helps
19:42karolherbst: okay...... OpenCL conformance run on iris go
19:44HdkR: That's llvmpipe+rusticl working
19:44HdkR: And also with FEX without the weirdo blend problem
19:44karolherbst: :)
19:46HdkR: Let this finish on my TR and then kick it over to the ARM board and see if it does the same
19:46karolherbst: not even sure it's a weird blend problem. Some parts of the image are static and what's white might be the rays always returning full white
19:46HdkR: It's interesting since the first frame looks fine but then the fireflies start growing
19:46karolherbst: not 100% sure on what's ray traced here, but the parts you saw correct are also correct if you really mess up a lot
19:46karolherbst: heh
19:47karolherbst: could be precision being off
19:48HdkR: Could be if it is assuming some precision of some transcendental routines
19:48karolherbst: yeah
19:48karolherbst: we have the same issue on radv and anv
19:49karolherbst: the inner of the ball and the socket are just very noisy there
19:51karolherbst: guess the best way of fixing those issues in fex is to run the math/integer CTS tests
19:51HdkR: That would be cool
19:51HdkR: If we can find a bug through those we could reduce the case down to something our ASM suite can eat
19:53karolherbst: yeah.. the tests are really nice, they only differ in the ALU stuff they use
19:53karolherbst: and there are tests for each CL builtin
19:54HdkR: Okay, what's isl and how do I destroy it?
19:55HdkR:deletes intel drivers
19:56karolherbst: :D
19:56HdkR: Seems like they decided to hardcode -msse2 in their driver again.
19:56karolherbst: yeah.. without discrete GPUs that might be a safe assumption, but..
19:57HdkR: Need to slap their driver in to the ARM builders
19:57karolherbst: we should just build all drivers on all platforms
19:58HdkR: Created a tracking issue for it
19:59karolherbst: today I learned that Mesa's spirv_to_nir emits some instructions incorrectly :(
20:02HdkR: oops
20:02karolherbst: yeah.. but it was FMod so I am not even surprised people never noticed
20:02karolherbst: (And the spec being super explicit on what to do was 1.3 in the first place)
20:02karolherbst: but I always had those spirv vs clc inconsistency and never figured out what's wrong
20:03karolherbst: turns out OpenCL C fmod is actually FRem, and SPIR-V's FMod is like fmod but with a copysign
20:03karolherbst: it makes no sense
20:04HdkR: Sounds like the silly behaviour that x87 has FPREM and FPREM1
20:06karolherbst: yep
20:06karolherbst: exactly that
20:06karolherbst: though you still need the copysign on top
20:07karolherbst: seems like both actually need the sign copied..
20:08karolherbst: maybe our lowering is weirdly broken
20:09karolherbst: ahh yeah.. fmod complains now.. *sigh*
20:09karolherbst: guess I'll ignore the spirv ext for now
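The two remainder flavours discussed above differ in which operand supplies the sign of the result. Below is a minimal Python sketch, assuming the spec definitions: OpenCL C `fmod` and SPIR-V `OpFRem` truncate (sign of the dividend), while SPIR-V `OpFMod` follows the sign of the divisor. `truncated_rem` and `floored_mod` are hypothetical helper names, and none of this is Mesa's lowering code.

```python
import math

def truncated_rem(x, y):
    """Remainder with the sign of the dividend x: x - y * trunc(x / y)."""
    return math.fmod(x, y)

def floored_mod(x, y):
    """Remainder with the sign of the divisor y: x - y * floor(x / y)."""
    return x - y * math.floor(x / y)

for x, y in [(5.5, 2.0), (-5.5, 2.0), (5.5, -2.0)]:
    print(x, y, truncated_rem(x, y), floored_mod(x, y))
# 5.5 2.0 1.5 1.5
# -5.5 2.0 -1.5 0.5
# 5.5 -2.0 1.5 -0.5
```

The two only disagree when the operands have different signs, and then the results differ in magnitude as well as sign, so a bare copysign on one does not produce the other.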
20:10karolherbst: HdkR: the compile shouldn't fail without rustfmt though
20:11karolherbst: you simply get one lined bindings
20:11HdkR: It completely errored out compiling saying the executable couldn't be found
20:11karolherbst: strange
20:11karolherbst: I know there are warnings printed, but it always compiled for me
20:12HdkR: Different environments doing different things I guess
20:12karolherbst: yeah.. feels that way
20:12HdkR: Making the user experience better is always an improvement :D
20:13karolherbst: requiring rustfmt isn't even a bad thing, because otherwise you might accidentally open the bindings rs files and your editor just dies
20:14HdkR: hah
20:21HdkR: oop
20:21HdkR: llvm explode!
20:21karolherbst: oh wow
20:21HdkR: `Cannot select: v4f32 = fp_extend ...`
20:22HdkR: womp womp
20:23karolherbst: heh
20:24HdkR: I guess no one in AArch64 land hit that case before
20:24karolherbst: not surprised :D
20:24HdkR: Looks like it is trying to fp_extend a v4i16
20:25karolherbst: ahh.. yeah... that makes somewhat sense
20:25karolherbst: we treat fp16 pretty much like int16
20:25karolherbst: shouldn't be terribly hard to fix in llvmpipe
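The fp16-as-int16 point can be illustrated in Python. The same 16 bits mean very different things as an integer and as a half float, and a float extend only makes sense on the latter. This shows the representation issue only, not llvmpipe's code; `half_bits_to_float` is a hypothetical helper.

```python
import struct

def half_bits_to_float(bits):
    """Reinterpret a 16-bit integer as an IEEE 754 half and widen it."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

bits = 0x3C00  # bit pattern of 1.0 in half precision
print(bits, half_bits_to_float(bits))  # 15360 1.0
```

A backend that carries the value around typed as a 16-bit integer has to reinterpret the bits like this before it can extend to a wider float type.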
20:29HdkR: Maybe once turnip works it'll be something to not care about :P
20:30karolherbst: well... we need a working software fallback, but I'd rather have that to be pocl
20:30karolherbst: pocl is just incredibly fast
20:39karolherbst: HdkR: what's missing from turnips side though?
20:41karolherbst: ohh wait.. turnip is vulkan
20:41karolherbst: this should all just work on freedreno
20:42HdkR: Oh right, freedreno side. dunno. I guess it relates to the log saying "msm: missing driver"?
20:42karolherbst: yeah.. need to add it to src/gallium/frontends/rusticl/meson.build
20:42karolherbst: and that should be it
20:42karolherbst: might have to enable it the same way as for clover
20:42karolherbst: ehh
20:42karolherbst: targets
20:42karolherbst: not frontends
20:43karolherbst: seems like it should just work actually
20:44karolherbst: not having lower_uniforms_to_ubo is a bit of a pita though, because I want to slowly move towards drivers just using that
20:44karolherbst: shouldn't matter for main
20:46HdkR: Only device I can see is llvmpipe
20:46karolherbst: mhh
20:46karolherbst: it might miss some callbacks.. let me check
20:47karolherbst: it ran with clover in the past, so I don't think much is missing
20:47HdkR: Even on x86 desktop with radeon
20:47karolherbst: that's not wired up at all
20:47karolherbst: clear_buffer = u_default_clear_buffer
20:47karolherbst: that's missing
20:48karolherbst: if that's still not it, we might have to debug and see where exactly it bails
20:48karolherbst: and I might even want to add a "force load" option
21:57karolherbst: HdkR: had any luck with the clear_buffer thing?
21:58HdkR: Oh, I got distracted by other things since I didn't know where a clear buffer would even end up being placed
21:58karolherbst: near buffer_subdata
21:58karolherbst: in the driver, you know, like the gallium API callbacks
21:59HdkR: grepping for those gives me 461 and 121 lines to search through :D
21:59karolherbst: wait a moment
21:59karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/freedreno/freedreno_resource.c#L1769
21:59karolherbst: there
22:00karolherbst: just need the same with clear_buffer
22:00HdkR: Let's see
22:01HdkR: Looks like it still fell down to llvmpipe only
22:01karolherbst: heh...
22:02karolherbst: mhhh... could be something stupid then...
22:03karolherbst: let's just force it then... I'll write a patch
22:03HdkR: Coolio, At least then I can get a backtrace to a nullptr when it tries to do something unsupported
22:03karolherbst: yep
22:03karolherbst: that's the idea
22:04karolherbst: HdkR: did you get any error btw?
22:04karolherbst: though the assert stuff might not be hooked up...
22:04HdkR: Just `msm: driver missing`
22:04karolherbst: ohh still?
22:04HdkR: yep
22:05karolherbst: did you add freedreno to https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/targets/rusticl/meson.build#L44 ?
22:07HdkR: oop, nope, missed that
22:07HdkR: Now it panics at `Context missing features. This should never happen!`
22:07karolherbst: okay
22:07karolherbst: that's easy to figure out then
22:07HdkR: Would be nice if it said which missing feature
22:08karolherbst: yeah... I am planning on doing that
22:08karolherbst: ahh.. clear_texture
22:09karolherbst: same thing as with clear_buffer, just use u_default_clear_texture
22:09karolherbst: though that doesn't exist...
22:09karolherbst: util_clear_texture
22:09karolherbst: if any of the others were missing, GL compute wouldn't work :)
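Reporting which callback is missing, as wished for above, could look roughly like this. It is a Python sketch with hypothetical names; the callback list is a small illustrative subset taken from this conversation, and the real check lives in rusticl's context.rs.

```python
# Illustrative subset of callback names mentioned in the discussion;
# the actual list in rusticl is longer.
REQUIRED_CALLBACKS = ("buffer_subdata", "clear_buffer", "clear_texture")

def missing_callbacks(context):
    """Return the names of required callbacks the driver left unset."""
    return [name for name in REQUIRED_CALLBACKS if context.get(name) is None]

# A hypothetical driver context that only wires up buffer_subdata.
context = {"buffer_subdata": lambda *args: None}
missing = missing_callbacks(context)
if missing:
    print("Context missing features: " + ", ".join(missing))
```

Naming the unset entries avoids having to compare the driver against the list by hand.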
22:10HdkR: ah
22:10HdkR: FD660 inside of clinfo!
22:10karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/frontends/rusticl/mesa/pipe/context.rs#L413
22:10karolherbst: that's the list, so it's not even that hard to check
22:10karolherbst: \o/
22:11HdkR: ooo fancy, a crash
22:11karolherbst: heh...
22:12HdkR: ir3 compiler crash
22:12karolherbst: some unknown op or something?
22:12HdkR: Lemme rebuild debug to get a backtrace
22:12karolherbst: could also be some vec8/vec16 stuff
22:12karolherbst: with radeonsi it was mostly that and then just random ops
22:23HdkR: nope, not a vec8/16 thing, only deal with a vec4 at this crash
22:23karolherbst: heh
22:23karolherbst: what instruction?
22:24HdkR: Inspecting, looks like an intrinsic is claiming to have a destination but doesn't have one allocated
22:24karolherbst: huh
22:24karolherbst: somehow I saw that in the past as well...
22:25HdkR: load_kernel_input
22:25karolherbst: ahh...
22:25karolherbst: but I thought ir3 is handling that just fine
22:25HdkR: Claims to be
22:25karolherbst: mind pasting the nir?
22:25HdkR: How do I dump the nir?
22:25karolherbst: p nir_print_shader(nir, stdout)
22:26karolherbst: just need to find the pointer to the nir somewhere
22:27karolherbst: there is this assert which also is a little annoying: compile_assert(ctx, !(offset & 0x3));
22:29karolherbst: HdkR: could probably just choose the closest frame in rusticl and do a "p nir.print()" instead
22:30HdkR: https://gist.github.com/Sonicadvance1/5a5be747670a1e3a2efc8dfe96a3faed Found it, just had to walk back a million frames
22:31karolherbst: HdkR: if I had to guess it's the last load_kernel_input
22:32karolherbst: yeah.. freedreno also doesn't handle unpack_64_2x32_split_x
22:32karolherbst: huh.. what lowering am I missing
22:33HdkR: How soon until zink is wired up? :P
22:33karolherbst: uhh.. next year? There is a lot of stuff I need to land/clean up
22:33karolherbst: and the stuff I have is already a mess and it fails 10% of the tests
22:33HdkR: ah
22:34karolherbst: freedreno should work, just need to lower away the vectored load and stuff... should be trivial, just need to find it
22:34karolherbst: okay.. so lower_pack_split is set to true
22:37karolherbst: okay, got it
22:39karolherbst: I think I even have a patch to fix that
22:44HdkR: Nice
22:45karolherbst: uhhh... can't use nir_lower_io_to_scalar_early as it doesn't deal with 64 bit stuff and is very graphics specific...
22:45karolherbst: ohhhh
22:46karolherbst: .lower_uniforms_to_ubo = true,
22:46karolherbst: that makes everything sooooo simple
22:46karolherbst: wait
22:46karolherbst: HdkR: from my remote pull 2874999401132f45d68eae0ffb735cd8ba2b4c16 and fc640a21ff8fb700eca6409a0485e61ff7dfa856
22:47HdkR: Time to pick some cherries
22:47karolherbst: it's part of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581
22:47karolherbst: but the entire thing breaks on pre a600 hardware
22:48HdkR: Well I only have a600 anyway :D
22:48karolherbst: yeah..
22:48karolherbst: the reason is also stupid.. apparently older hardware needs to know at compile time the amount of shared memory used :(
22:48karolherbst: but that MR is very critical to smooth performance
22:48HdkR: ah
22:49karolherbst: instead of creating the compute state object on every kernel launch, I create it once and reuse that for all launch_grids
22:49karolherbst: but CL has variable shared mem bound as kernel args... it's very annoying
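The trade-off described above can be sketched as a cache. State is built once per kernel and reused, unless the hardware needs the shared memory size baked in at compile time, in which case the size has to become part of the cache key. All names here are hypothetical and this is not rusticl's actual design.

```python
class ComputeStateCache:
    """Caches expensive per-kernel state, keyed by what must be baked in."""

    def __init__(self, create_state, bake_shared_mem):
        self._create_state = create_state
        self._bake_shared_mem = bake_shared_mem
        self._cache = {}

    def get(self, kernel, shared_mem):
        # If the shared memory size is a launch-time parameter, one state
        # per kernel is enough; otherwise every size needs its own state.
        key = (kernel, shared_mem if self._bake_shared_mem else None)
        if key not in self._cache:
            self._cache[key] = self._create_state(kernel, shared_mem)
        return self._cache[key]

created = []

def create_state(kernel, shared_mem):
    # Stand-in for the expensive compile step; just records the call.
    created.append((kernel, shared_mem))
    return object()

newer_hw = ComputeStateCache(create_state, bake_shared_mem=False)
for size in (0, 256, 1024):
    newer_hw.get("render", size)
print(len(created))  # 1
```

With `bake_shared_mem=True` the same three launches would build three states, which is the cost the older hardware imposes.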
22:54HdkR: lol, no longer crashing in userspace at least!
22:54HdkR: Now it is just GPU hangcheck faulting
22:55HdkR: Although with this in place it looks like we can at least shovel things over to the Adreno people to solve :D
22:55HdkR: Oh heck yea, Result of 4 on Luxmark
22:57HdkR: I like that the FD660 claims to have 9999 compute units
23:01karolherbst: :D
23:01karolherbst: pull the entire MR
23:02karolherbst: this should wait until all kernels are compiled upfront
23:02karolherbst: unless the problem is kernels running too long
23:02karolherbst: robclark might know what to do with the hangcheck
23:03HdkR: https://cdn.discordapp.com/attachments/765304672579092511/1029167039237075075/Screenshot_2022-10-10_16-01-24.png
23:03karolherbst: lol
23:04HdkR: Let's test that full PR
23:04HdkR: I think the kernel is actually faulting though
23:04karolherbst: drm drivers tend to kill jobs running too long though, so maybe that's what's happening here? At least if that's what you meant by hang check
23:04karolherbst: ahh
23:05karolherbst: yeah, that would be bad
23:05karolherbst: only sane way of debugging this is to run the CTS and check what's broken
23:06HdkR: oop, freedreno doesn't compile with the pr
23:06karolherbst: annoying
23:06karolherbst: nvm then, if it faults, the MR won't change that
23:08HdkR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581/diffs#9b3b921b0d7ce5a249c0c9ac58022d948bdd3b95_129_130 Think this needs to be variable_shared_mem
23:08karolherbst: ahh, correct
23:08karolherbst: I even fixed that locally
23:08karolherbst: on some branch
23:09HdkR: fixed, built, installed, and same result as expected
23:10karolherbst: yeah..
23:10karolherbst: hard to say what's wrong
23:10karolherbst: probably something silly though
23:12karolherbst: HdkR: but the rendered pixels are actually correct
23:12HdkR: Interesting isn't it
23:13karolherbst: you can clearly see it with the brown dots, that they are where you expect them to be: https://twitter.com/karolherbst/status/1578033584719306755/photo/1
23:14karolherbst: what kind of error did you get? or just some generic "timeout" thingy in dmesg?
23:14HdkR: `unhandled context fault...`
23:14karolherbst: huh
23:14HdkR: Big crashy issues
23:15karolherbst: does it work with clover? :D
23:15karolherbst: freedreno actually got some testing on clover, so it might as well just work there
23:16HdkR: I don't even have clover built
23:17karolherbst: just add 'gallium-opencl=icd' and point the ICD var to mesa.icd instead
23:17karolherbst: that should be all
23:17HdkR: eeeh
23:17HdkR: But that would be playing with the past
23:17karolherbst: sure, but that might help us figure out what's wrong
23:17karolherbst: but if it fails the same way, then...
23:18karolherbst: clover has a different lowering pipeline, but it is also different in a few other areas
23:18HdkR: I expect pain regardless, there's a /bunch/ of random applications that fail the same way
23:18karolherbst: ahh
23:18karolherbst: yeah.. wouldn't be surprised if something is just broken somewhere
23:19HdkR: Wouldn't be surprised if a newer kernel also just fixes some bugs
23:19HdkR: Still on 5.15
23:21karolherbst: heh
23:21karolherbst: might not be the case depending on what's wrong
23:21karolherbst: but maybe freedreno also has a super low timeout?
23:21karolherbst: dunno
23:21HdkR: Raised the timeout and it didn't resolve it, so non-issue. Timeout just comes from fault hang
23:23karolherbst: mhh
23:24karolherbst: on the other hand, freedreno never ran anything real...
23:24HdkR: It's been running some x86 games :D
23:24karolherbst: :D
23:24karolherbst: true
23:25HdkR: Although GL 3.3 is rough, zink is helping out a lot there
23:25karolherbst: you could try the zink branch..
23:25karolherbst: but that's so much WIP and everything.. maybe it even works
23:26karolherbst: HdkR: I assume you don't have physical address buffers wired up in turnip?
23:26HdkR: It has BDA, I don't know if anything is needed on top of that
23:26karolherbst: should be enough
23:27karolherbst: that's the only thing rusticl wants from zink
23:27karolherbst: ehh.. zink wants from the vulkan driver for rusticl
23:27HdkR: Turnip supports pretty much everything that anything cares about, so it's pretty good
23:27HdkR: on a6xx anyway
23:27karolherbst: yeah.. so maybe that just works then
23:28karolherbst: worth a try I guess
23:28karolherbst: the branch more or less runs on anv and radv
23:28karolherbst: "rusticl/zink" that is
23:29HdkR: yes, rusticl+zink+{anv,radv}