16:22 HdkR: Really need to get Karol on a bouncer sometime so I can throw things at them when not around
19:03 HdkR: karolherbst: https://cdn.discordapp.com/attachments/765304672579092511/1028814485017866301/Screenshot_2022-10-09_16-40-55.png There you go
19:03 karolherbst: HdkR: nice
19:04 karolherbst: though interesting it renders soo badly
19:04 karolherbst: but maybe the emulation of the JIT code isn't all there yet?
19:04 HdkR: It /seems/ like the blend between frames is done CPU side instead of OpenCL side
19:05 karolherbst: well. it's a ray tracer
19:05 HdkR: It reproduced without thunks and on different hardware, so likely a CPU emulation bug
19:05 karolherbst: from what I can tell, the benchmark fires off traces and increases precision on the output image with each one fired
19:06 karolherbst: HdkR: how well is llvmpipe supported?
19:06 karolherbst: could try this next
19:06 HdkR: Right, I think their accumulation happens outside of CL for some reason though. Watching the first frames come in is telling
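
The progressive refinement described above (each batch of traced rays sharpens the output image) is commonly done by keeping a running mean per pixel. The following is a minimal Python sketch of that idea only. The function name `accumulate` and the sample values are made up for illustration, and nothing here is taken from LuxMark's source or says where it actually performs this step.

```python
def accumulate(mean, sample, n):
    """Fold the n-th sample (1-based) into a running per-pixel mean."""
    return [m + (s - m) / n for m, s in zip(mean, sample)]


# Three hypothetical passes over a three-pixel "image".
passes = [
    [1.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [1.0, 1.0, 0.5],
]

image = [0.0, 0.0, 0.0]
for n, sample in enumerate(passes, start=1):
    image = accumulate(image, sample, n)

# Each pixel now holds the average of its three samples.
print(image)
```

With this scheme a single wildly wrong sample is never discarded, only averaged down over later passes, which is one way a per-sample arithmetic bug could remain visible as bright noise.
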
19:06 HdkR: llvmpipe should work, I guess with wiring up rusticl or something?
19:07 karolherbst: yeah
19:07 karolherbst: LP_CL=1 and it should just work
19:07 karolherbst: HdkR: luxmark also has a C++ version of the run. You can select that under "Mode"
19:08 HdkR: ah true, that path renders fine
19:09 HdkR: I don't think I have rusticl built and installed to be able to use LP_CL
19:09 karolherbst: yeah... need to set up rust and all of that
19:10 karolherbst: HdkR: how many rays/s did you get with pocl?
19:10 karolherbst: though the score is based on that, and it's high enough...
19:10 karolherbst: I'd expect that pocl's highly optimized JIT code is triggering emulation bugs
19:10 karolherbst: pocl is like 10x faster than llvmpipe
19:10 HdkR: That was claiming 2463K in the picture
19:11 karolherbst: mhhh
19:11 karolherbst: I already see it coming, I'll buy an M1 this year...
19:11 HdkR: Nah, wait for M2 Pro/Max at this rate
19:12 karolherbst: probably
19:12 HdkR: :P
19:13 HdkR: oop, need to update meson to get rusticl apparently
19:13 karolherbst: yeah
19:13 karolherbst: https://docs.mesa3d.org/rusticl.html?highlight=rusticl
19:16 HdkR: also /really/ new llvmspirvlib
19:17 HdkR: Requires 15, only have 13 on Ubuntu repos atm
19:17 karolherbst: it needs to match the LLVM version you use for mesa
19:17 HdkR: ah
19:18 karolherbst: long term we'll be able to get rid of it once LLVM's own SPIR-V target becomes competent
19:18 HdkR: I was using a really new clang for testing if a bug was fixed, so that makes sense
19:18 karolherbst: yeah.. but it also builds really fast
19:29 HdkR: Oh nice, rusticl emmintrin.h conversion problems
19:29 karolherbst: okay. so here is the thing
19:29 karolherbst: you can't have multiple versions of libllvm installed
19:30 karolherbst: or libclang or something
19:30 karolherbst: systemwide I mean
19:30 HdkR: lol wtf
19:30 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7268
19:31 karolherbst: something something loading order something
19:33 karolherbst: I know it's fine to have the new LLVM inside LD_LIBRARY_PATH when building
19:33 karolherbst: but I don't mess with PATH
19:33 karolherbst: huh.. I actually do..
19:33 karolherbst: guess then it really only matters what's inside /usr
19:34 HdkR: Might be a bit tricky for me to set this up with my current environment
19:37 karolherbst: HdkR: might make things easier to just use rustup and use bindgen from there...
19:38 karolherbst: ehhh... though not sure it even has bindgen
19:38 karolherbst: it does actually
19:39 karolherbst: ehh.. cargo I mean
19:39 karolherbst: HdkR: "cargo install bindgen"
19:39 karolherbst: and then you need to add ~/.cargo/bin to your PATH
19:39 karolherbst: though not 100% sure it gets around the LLVM issue
19:42 HdkR: Fixed it by building a new spirv-llvm-translator that matched the llvm-14 I had installed
19:42 karolherbst: ahh yeah :)
19:42 karolherbst: getting rid of a custom LLVM always helps
19:42 karolherbst: okay...... OpenCL conformance run on iris go
19:44 HdkR: That's llvmpipe+rusticl working
19:44 HdkR: And also with FEX without the weirdo blend problem
19:44 karolherbst: :)
19:46 HdkR: Let this finish on my TR and then kick it over to the ARM board and see if it does the same
19:46 karolherbst: not even sure it's a weird blend problem. Some parts of the image are static and what's white might be the rays always returning full white
19:46 HdkR: It's interesting since the first frame looks fine but then the fireflies start growing
19:46 karolherbst: not 100% sure on what's ray traced here, but the parts you saw correct are also correct if you really mess up a lot
19:46 karolherbst: heh
19:47 karolherbst: could be precision being off
19:48 HdkR: Could be if it is assuming some precision of some transcendental routines
19:48 karolherbst: yeah
19:48 karolherbst: we have the same issue on radv and anv
19:49 karolherbst: the inner of the ball and the socket are just very noisy there
19:51 karolherbst: guess the best way of fixing those issues in fex is to run the math/integer CTS tests
19:51 HdkR: That would be cool
19:51 HdkR: If we can find a bug through those we could reduce the case down to something our ASM suite can eat
19:53 karolherbst: yeah.. the tests are really nice, they only differ in the ALU stuff they use
19:53 karolherbst: and there are tests for each CL builtin
19:54 HdkR: Okay, what's isl and how do I destroy it?
19:55 * HdkR deletes intel drivers
19:56 karolherbst: :D
19:56 HdkR: Seems like they decided to hardcode -msse2 in their driver again.
19:56 karolherbst: yeah.. without discrete GPUs that might be a safe assumption, but..
19:57 HdkR: Need to slap their driver in to the ARM builders
19:57 karolherbst: we should just build all drivers on all platforms
19:58 HdkR: Created a tracking issue for it
19:59 karolherbst: today I learned that mesa's spirv_to_nir emits some instructions incorrectly :(
20:02 HdkR: oops
20:02 karolherbst: yeah.. but it was FMod so I am not even surprised people never noticed
20:02 karolherbst: (And the spec being super explicit on what to do was 1.3 in the first place)
20:02 karolherbst: but I always had those spirv vs clc inconsistency and never figured out what's wrong
20:03 karolherbst: turns out OpenCL C fmod is actually FRem, and spirv's fmod is like fmod but with a copysign
20:03 karolherbst: it makes no sense
20:04 HdkR: Sounds like the silly behaviour that x87 has FPREM and FPREM1
20:06 karolherbst: yep
20:06 karolherbst: exactly that
20:06 karolherbst: though you still need the copysign on top
20:07 karolherbst: seems like both actually need the sign copied..
20:08 karolherbst: maybe our lowering is weirdly broken
20:09 karolherbst: ahh yeah.. fmod complains now.. *sigh*
20:09 karolherbst: guess I'll ignore the spirv ext for now
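
The two remainder flavours being compared here differ only in which operand supplies the sign of the result. The Python sketch below assumes the core SPIR-V definitions (OpFRem takes the sign of the dividend, OpFMod takes the sign of the divisor) and that OpenCL C `fmod` behaves like C's `fmod`. The helper names `frem` and `fmod_spirv` are made up, and this is not Mesa's lowering. The last two lines illustrate the FPREM/FPREM1 comparison: truncating versus round-to-nearest remainder.

```python
import math


def frem(x, y):
    """Remainder whose sign follows the dividend x (C fmod, SPIR-V OpFRem)."""
    return math.fmod(x, y)


def fmod_spirv(x, y):
    """Remainder whose sign follows the divisor y (SPIR-V OpFMod)."""
    r = math.fmod(x, y)
    # If the truncated remainder has the wrong sign, shift it by one divisor.
    if r != 0.0 and (r < 0.0) != (y < 0.0):
        r += y
    return r


print(frem(-7.0, 3.0), fmod_spirv(-7.0, 3.0))
print(frem(7.0, -3.0), fmod_spirv(7.0, -3.0))

# x87 analogy: FPREM truncates the quotient, FPREM1 rounds it to nearest.
print(math.fmod(5.0, 3.0))
print(math.remainder(5.0, 3.0))
```
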
20:10 karolherbst: HdkR: the compile shouldn't fail without rustfmt though
20:11 karolherbst: you simply get one lined bindings
20:11 HdkR: It completely errored out compiling saying the executable couldn't be found
20:11 karolherbst: strange
20:11 karolherbst: I know there are warnings printed, but it always compiled for me
20:12 HdkR: Different environments doing different things I guess
20:12 karolherbst: yeah.. feels that way
20:12 HdkR: Making the user experience better is always an improvement :D
20:13 karolherbst: requiring rustfmt isn't even a bad thing, because otherwise you might accidentally open the bindings rs files and your editor just dies
20:14 HdkR: hah
20:21 HdkR: oop
20:21 HdkR: llvm explode!
20:21 karolherbst: oh wow
20:21 HdkR: `Cannot select: v4f32 = fp_extend ...`
20:22 HdkR: womp womp
20:23 karolherbst: heh
20:24 HdkR: I guess no one in AArch64 land hit that case before
20:24 karolherbst: not surprised :D
20:24 HdkR: Looks like it is trying to fp_extend a v4i16
20:25 karolherbst: ahh.. yeah... that makes somewhat sense
20:25 karolherbst: we treat fp16 pretty much like int16
20:25 karolherbst: shouldn't be terribly hard to fix in llvmpipe
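
Carrying fp16 values around as 16-bit integers is harmless for storage, but a float extension needs the bits to be read as a half-precision float rather than converted as an integer value. A small Python sketch of the difference, using the standard `struct` module's `e` (binary16) format. The helper name `half_bits_to_float` is made up, and this only illustrates the bit reinterpretation, not what llvmpipe or LLVM do internally.

```python
import struct


def half_bits_to_float(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 half and widen it."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


bits = 0x3C00                         # bit pattern of 1.0 in fp16

as_integer = float(bits)              # value conversion of the integer
as_half = half_bits_to_float(bits)    # bitcast to half, then extend

print(as_integer, as_half)
```
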
20:29 HdkR: Maybe once turnip works it'll be something to not care about :P
20:30 karolherbst: well... we need a working software fallback, but I'd rather have that to be pocl
20:30 karolherbst: pocl is just incredibly fast
20:39 karolherbst: HdkR: what's missing from turnips side though?
20:41 karolherbst: ohh wait.. turnip is vulkan
20:41 karolherbst: this should all just work on freedreno
20:42 HdkR: Oh right, freedreno side. dunno. I guess it relates to the log saying "msm: missing driver"?
20:42 karolherbst: yeah.. need to add it to src/gallium/frontends/rusticl/meson.build
20:42 karolherbst: and that should be it
20:42 karolherbst: might have to enable it the same way as for clover
20:42 karolherbst: ehh
20:42 karolherbst: targets
20:42 karolherbst: not frontends
20:43 karolherbst: seems like it should just work actually
20:44 karolherbst: not having lower_uniforms_to_ubo is a bit of a pita though, because I want to slowly move towards drivers just using that
20:44 karolherbst: shouldn't matter for main
20:46 HdkR: Only device I can see is llvmpipe
20:46 karolherbst: mhh
20:46 karolherbst: it might miss some callbacks.. let me check
20:47 karolherbst: it ran with clover in the past, so I don't think much is missing
20:47 HdkR: Even on x86 desktop with radeon
20:47 karolherbst: that's not wired up at all
20:47 karolherbst: clear_buffer = u_default_clear_buffer
20:47 karolherbst: that's missing
20:48 karolherbst: if that's still not it, we might have to debug and see where exactly it bails
20:48 karolherbst: and I might even want to add a "force load" option
21:57 karolherbst: HdkR: had any luck with the clear_buffer thing?
21:58 HdkR: Oh, I got distracted by other things since I didn't know where a clear buffer would even end up being placed
21:58 karolherbst: near buffer_subdata
21:58 karolherbst: in the driver, you know, like the gallium API callbacks
21:59 HdkR: grepping for those gives me 461 and 121 lines to search through :D
21:59 karolherbst: wait a moment
21:59 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/freedreno/freedreno_resource.c#L1769
21:59 karolherbst: there
22:00 karolherbst: just need the same with clear_buffer
22:00 HdkR: Let's see
22:01 HdkR: Looks like it still fell down to llvmpipe only
22:01 karolherbst: heh...
22:02 karolherbst: mhhh... could be something stupid then...
22:03 karolherbst: let's just force it then... I'll write a patch
22:03 HdkR: Coolio, At least then I can get a backtrace to a nullptr when it tries to do something unsupported
22:03 karolherbst: yep
22:03 karolherbst: that's the idea
22:04 karolherbst: HdkR: did you get any error btw?
22:04 karolherbst: though the assert stuff might not be hooked up...
22:04 HdkR: Just `msm: driver missing`
22:04 karolherbst: ohh still?
22:04 HdkR: yep
22:05 karolherbst: did you add freedreno to https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/targets/rusticl/meson.build#L44 ?
22:07 HdkR: oop, nope, missed that
22:07 HdkR: Now it panics at `Context missing features. This should never happen!`
22:07 karolherbst: okay
22:07 karolherbst: that's easy to figure out then
22:07 HdkR: Would be nice if it said which missing feature
22:08 karolherbst: yeah... I am planning on doing that
22:08 karolherbst: ahh.. clear_texture
22:09 karolherbst: same thing as with clear_buffer, just use u_default_clear_texture
22:09 karolherbst: though that doesn't exist...
22:09 karolherbst: util_clear_texture
22:09 karolherbst: if any of the others were missing, GL compute wouldn't work :)
22:10 HdkR: ah
22:10 HdkR: FD660 inside of clinfo!
22:10 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/frontends/rusticl/mesa/pipe/context.rs#L413
22:10 karolherbst: that's the list, so it's not even that hard to check
22:10 karolherbst: \o/
22:11 HdkR: ooo fancy, a crash
22:11 karolherbst: heh...
22:12 HdkR: ir3 compiler crash
22:12 karolherbst: some unknown op or something?
22:12 HdkR: Lemme rebuild debug to get a backtrace
22:12 karolherbst: could also be some vec8/vec16 stuff
22:12 karolherbst: with radeonsi it was mostly that and then just random ops
22:23 HdkR: nope, not a vec8/16 thing, only deal with a vec4 at this crash
22:23 karolherbst: heh
22:23 karolherbst: what instruction?
22:24 HdkR: Inspecting, looks like an intrinsic is claiming to have a destination but doesn't have one allocated
22:24 karolherbst: huh
22:24 karolherbst: somehow I saw that in the past as well...
22:25 HdkR: load_kernel_input
22:25 karolherbst: ahh...
22:25 karolherbst: but I thought ir3 is handling that just fine
22:25 HdkR: Claims to be
22:25 karolherbst: mind pasting the nir?
22:25 HdkR: How do I dump the nir?
22:25 karolherbst: p nir_print_shader(nir, stdout)
22:26 karolherbst: just need to find the pointer to the nir somewhere
22:27 karolherbst: there is this assert which also is a little annoying: compile_assert(ctx, !(offset & 0x3));
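
The condition in that assert, `!(offset & 0x3)`, holds exactly when the two lowest bits of the offset are zero, i.e. when the offset is a multiple of 4 bytes. A tiny Python sketch of the same check follows. The function name `is_dword_aligned` is made up, and this says nothing about why ir3 requires it.

```python
def is_dword_aligned(offset):
    """True when offset is a multiple of 4, i.e. its low two bits are clear."""
    return (offset & 0x3) == 0


aligned = [o for o in range(9) if is_dword_aligned(o)]
print(aligned)
```
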
22:29 karolherbst: HdkR: could probably just choose the closest frame in rusticl and do a "p nir.print()" instead
22:30 HdkR: https://gist.github.com/Sonicadvance1/5a5be747670a1e3a2efc8dfe96a3faed Found it, just had to walk back a million frames
22:31 karolherbst: HdkR: if I had to guess it's the last load_kernel_input
22:32 karolherbst: yeah.. freedreno also doesn't handle unpack_64_2x32_split_x
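
For context on the op named above: the split unpack ops take one 64-bit value and return one of its 32-bit halves. The Python sketch below assumes `_split_x` is the low half and `_split_y` the high half, which is worth checking against nir_opcodes.py. The functions are plain-Python stand-ins named after the NIR ops, not Mesa code.

```python
MASK32 = 0xFFFFFFFF


def unpack_64_2x32_split_x(value):
    """Low 32 bits of a 64-bit value (assumed meaning of the _x variant)."""
    return value & MASK32


def unpack_64_2x32_split_y(value):
    """High 32 bits of a 64-bit value (assumed meaning of the _y variant)."""
    return (value >> 32) & MASK32


def pack_64_2x32_split(lo, hi):
    """Inverse operation: rebuild the 64-bit value from its two halves."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


value = 0x1122334455667788
lo = unpack_64_2x32_split_x(value)
hi = unpack_64_2x32_split_y(value)
print(hex(lo), hex(hi))
```
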
22:32 karolherbst: huh.. what lowering am I missing
22:33 HdkR: How soon until zink is wired up? :P
22:33 karolherbst: uhh.. next year? There is a lot of stuff I need to land/clean up
22:33 karolherbst: and the stuff I have is already a mess and it fails 10% of the tests
22:33 HdkR: ah
22:34 karolherbst: freedreno should work, just need to lower away the vectored load and stuff... should be trivial, just need to find it
22:34 karolherbst: okay.. so lower_pack_split is set to true
22:37 karolherbst: okay, got it
22:39 karolherbst: I think I even have a patch to fix that
22:44 HdkR: Nice
22:45 karolherbst: uhhh... can't use nir_lower_io_to_scalar_early as it doesn't deal with 64 bit stuff and is very graphics specific...
22:45 karolherbst: ohhhh
22:46 karolherbst: .lower_uniforms_to_ubo = true,
22:46 karolherbst: that makes everything sooooo simple
22:46 karolherbst: wait
22:46 karolherbst: HdkR: from my remote pull 2874999401132f45d68eae0ffb735cd8ba2b4c16 and fc640a21ff8fb700eca6409a0485e61ff7dfa856
22:47 HdkR: Time to pick some cherries
22:47 karolherbst: it's part of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581
22:47 karolherbst: but the entire thing breaks on pre a600 hardware
22:48 HdkR: Well I only have a600 anyway :D
22:48 karolherbst: yeah..
22:48 karolherbst: the reason is also stupid.. apparently older hardware needs to know at compile time the amount of shared memory used :(
22:48 karolherbst: but that MR is very critical to smooth performance
22:48 HdkR: ah
22:49 karolherbst: instead of creating the compute state object on every kernel launch, I create it once and reuse that for all launch_grids
22:49 karolherbst: but CL has variable shared mem bound as kernel args... it's very annoying
22:54 HdkR: lol, no longer crashing in userspace at least!
22:54 HdkR: Now it is just GPU hangcheck faulting
22:55 HdkR: Although with this in place it looks like we can at least shovel things over to the Adreno people to solve :D
22:55 HdkR: Oh heck yea, Result of 4 on Luxmark
22:57 HdkR: I like that the FD660 claims to have 9999 compute units
23:01 karolherbst: :D
23:01 karolherbst: pull the entire MR
23:02 karolherbst: this should wait until all kernels are compiled upfront
23:02 karolherbst: unless the problem is kernels running too long
23:02 karolherbst: robclark might know what to do with the hangcheck
23:03 HdkR: https://cdn.discordapp.com/attachments/765304672579092511/1029167039237075075/Screenshot_2022-10-10_16-01-24.png
23:03 karolherbst: lol
23:04 HdkR: Let's test that full PR
23:04 HdkR: I think the kernel is actually faulting though
23:04 karolherbst: drm drivers tend to kill jobs running too long though, so maybe that's what's happening here? At least if that's what you meant by hang check
23:04 karolherbst: ahh
23:05 karolherbst: yeah, that would be bad
23:05 karolherbst: only sane way of debugging this is to run the CTS and check what's broken
23:06 HdkR: oop, freedreno doesn't compile with the pr
23:06 karolherbst: annoying
23:06 karolherbst: nvm then, if it faults, the MR won't change that
23:08 HdkR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581/diffs#9b3b921b0d7ce5a249c0c9ac58022d948bdd3b95_129_130 Think this needs to be variable_shared_mem
23:08 karolherbst: ahh, correct
23:08 karolherbst: I even fixed that locally
23:08 karolherbst: on some branch
23:09 HdkR: fixed, built, installed, and same result as expected
23:10 karolherbst: yeah..
23:10 karolherbst: hard to say what's wrong
23:10 karolherbst: probably something silly though
23:12 karolherbst: HdkR: but the rendered pixels are actually correct
23:12 HdkR: Interesting isn't it
23:13 karolherbst: you can clearly see it with the brown dots, that they are where you expect them to be: https://twitter.com/karolherbst/status/1578033584719306755/photo/1
23:14 karolherbst: what kind of error did you get? or just some generic "timeout" thingy in dmesg?
23:14 HdkR: `unhandled context fault...`
23:14 karolherbst: huh
23:14 HdkR: Big crashy issues
23:15 karolherbst: does it work with clover? :D
23:15 karolherbst: freedreno actually got some testing on clover, so it might as well just work there
23:16 HdkR: I don't even have clover built
23:17 karolherbst: just add 'gallium-opencl=icd' and point the ICD var to mesa.icd instead
23:17 karolherbst: that should be all
23:17 HdkR: eeeh
23:17 HdkR: But that would be playing with the past
23:17 karolherbst: sure, but that might help us figure out what's wrong
23:17 karolherbst: but if it fails the same way, then...
23:18 karolherbst: clover has a different lowering pipeline, but it is also different in a few other areas
23:18 HdkR: I expect pain regardless, there's a /bunch/ of random applications that fail the same way
23:18 karolherbst: ahh
23:18 karolherbst: yeah.. wouldn't be surprised if something is just broken somewhere
23:19 HdkR: Wouldn't be surprised if a newer kernel also just fixes some bugs
23:19 HdkR: Still on 5.15
23:21 karolherbst: heh
23:21 karolherbst: might not be the case depending on what's wrong
23:21 karolherbst: but maybe freedreno also has a super low timeout?
23:21 karolherbst: dunno
23:21 HdkR: Raised the timeout and it didn't resolve it, so non-issue. Timeout just comes from fault hang
23:23 karolherbst: mhh
23:24 karolherbst: on the other hand, freedreno never ran anything real...
23:24 HdkR: It's been running some x86 games :D
23:24 karolherbst: :D
23:24 karolherbst: true
23:25 HdkR: Although GL 3.3 is rough, zink is helping out a lot there
23:25 karolherbst: you could try the zink branch..
23:25 karolherbst: but that's so much WIP and everything.. maybe it even works
23:26 karolherbst: HdkR: I assume you don't have physical address buffers wired up in turnip?
23:26 HdkR: It has BDA, I don't know if anything is needed on top of that
23:26 karolherbst: should be enough
23:27 karolherbst: that's the only thing rusticl wants from zink
23:27 karolherbst: ehh.. zink wants from the vulkan driver for rusticl
23:27 HdkR: Turnip supports pretty much everything that anything cares about, so it's pretty good
23:27 HdkR: on a6xx anyway
23:27 karolherbst: yeah.. so maybe that just works then
23:28 karolherbst: worth a try I guess
23:28 karolherbst: the branch more or less runs on anv and radv
23:28 karolherbst: "rusticl/zink" that is
23:29 HdkR: yes, rusticl+zink+{anv,radv}